Creating an Amazon EMR Cluster

This tutorial shows how to create a new Amazon EMR cluster.

1. Log in to AWS and go to the EMR console

Sign in to the AWS Management Console with your AWS account and open the Amazon EMR console at

https://console.aws.amazon.com/elasticmapreduce/

2. Click Create Cluster

The Create Cluster page is divided into the following sections:

· General Configuration

· Software Configuration

· Hardware Configuration

· Security and Access

General Configuration

In the General Configuration section, configure the following settings.

Cluster name: Enter a descriptive name for your cluster.

Logging: This determines whether Amazon EMR captures detailed log data to Amazon S3.

Launch mode: Cluster

Software Configuration

In the Software Configuration section, configure the following settings.

Vendor: Amazon

Release: Choose the EMR version emr-4.5.0

Applications: Choose Spark

Spark: Spark 1.6.1 on Hadoop 2.7.2 YARN with Ganglia 3.7.2

Hardware Configuration

In the Hardware Configuration section, configure the following settings.

Instance type: Select your instance type.

Number of instances: Select the number of instances for the EMR cluster. One instance is dedicated to the master node and the remaining instances are core nodes.

Note: The minimum number of nodes is 3.
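The master/core split described above can be sketched as a small Python helper. This is only an illustration: the function name split_instances is hypothetical, the instance type m3.xlarge is an assumed example value, and the dictionary keys follow the InstanceGroups shape accepted by the boto3 EMR run_job_flow call.

```python
MIN_NODES = 3  # minimum cluster size stated in the note above


def split_instances(total, instance_type):
    """Return instance groups: 1 master node, all remaining nodes as core.

    Hypothetical helper; the output follows the InstanceGroups shape
    accepted by boto3's EMR run_job_flow call.
    """
    if total < MIN_NODES:
        raise ValueError(
            "cluster needs at least %d nodes, got %d" % (MIN_NODES, total)
        )
    return [
        {"Name": "Master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": total - 1},
    ]


if __name__ == "__main__":
    # "m3.xlarge" is an assumed example instance type, not from the tutorial.
    for group in split_instances(3, "m3.xlarge"):
        print(group["InstanceRole"], group["InstanceCount"])
```

With a total of 3, the helper yields one master and two core nodes; a total below 3 raises an error instead of producing an undersized cluster.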

Security and Access

EC2 key pair: Choose the key pair used to access your EMR instances.

Permissions: Default. You can choose the default IAM roles.

Review your configuration and, if you are satisfied with the settings, click Create Cluster.
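For readers who prefer scripting, the same console settings can be approximated with the AWS SDK for Python (boto3). The tutorial itself only covers the console, so treat this as a sketch under assumptions: the cluster name, log bucket, key pair name, and instance type below are placeholders, the region us-east-1 is inferred from the script-runner path used later, and EMR_DefaultRole / EMR_EC2_DefaultRole are assumed to be the default IAM roles in your account.

```python
import json


def build_cluster_request(name, log_uri, key_name, instance_type, instance_count):
    """Assemble keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,                        # Cluster name
        "LogUri": log_uri,                   # Logging: S3 location for log data
        "ReleaseLabel": "emr-4.5.0",         # Release
        "Applications": [{"Name": "Spark"}, {"Name": "Ganglia"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,      # 1 master + remaining core nodes
            "Ec2KeyName": key_name,               # EC2 key pair
            "KeepJobFlowAliveWhenNoSteps": True,  # Launch mode: Cluster
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default EMR service role
    }


def create_cluster(request, region="us-east-1"):
    """Submit the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the request can be built without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_cluster_request(
        name="my-emr-cluster",
        log_uri="s3://my-log-bucket/emr-logs/",
        key_name="my-key-pair",
        instance_type="m3.xlarge",
        instance_count=3,
    )
    print(json.dumps(request, indent=2))
```

Running the script only prints the request so it can be reviewed first; calling create_cluster(request) would actually launch the cluster and return its ID.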

Add HDFS Permission for Tomcat:

After creating the cluster, add a step to run on it.

Go to the created EMR cluster and click Add Steps.

Enter the following parameters and click Add.

Step type: Custom JAR

Name: ideata app

JAR location: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar

Arguments: s3://bda-ideata/tomcat_hdfs_permission.sh

Action on failure: Continue
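The step parameters above map onto a step definition that could be submitted with boto3 instead of the console. This is a sketch, not part of the original procedure: the function names are hypothetical, the region us-east-1 is assumed from the JAR location, and the cluster ID must be supplied by you.

```python
SCRIPT_RUNNER = "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar"
PERMISSION_SCRIPT = "s3://bda-ideata/tomcat_hdfs_permission.sh"


def build_permission_step():
    """Return the Custom JAR step definition with the values listed above."""
    return {
        "Name": "ideata app",
        "ActionOnFailure": "CONTINUE",  # Action on failure: Continue
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER,           # JAR location
            "Args": [PERMISSION_SCRIPT],    # Arguments
        },
    }


def add_step(cluster_id, step, region="us-east-1"):
    """Add the step to a running cluster; needs boto3 and AWS credentials."""
    import boto3  # imported here so the step can be built without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


if __name__ == "__main__":
    print(build_permission_step())
```

The script runner JAR downloads the shell script given as its argument and executes it on the master node, which is why the script's S3 path is passed in Args.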

Verify that the step you added completed successfully by opening the Steps drop-down and checking that its status is “Completed”.
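The same check can be scripted by polling the step state until it reaches a terminal value. The sketch below assumes a boto3 EMR client is passed in and that you know the cluster ID and step ID; the function name wait_for_step and the 30-second polling interval are illustrative choices.

```python
import time

# Step states after which the status no longer changes.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(emr, cluster_id, step_id, poll_seconds=30):
    """Poll describe_step until the step reaches a terminal state.

    `emr` is expected to be a boto3 EMR client, e.g. boto3.client("emr").
    Returns the final state string; "COMPLETED" means success.
    """
    while True:
        response = emr.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = response["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)


# Example usage (requires boto3, AWS credentials, and real IDs):
#   import boto3
#   emr = boto3.client("emr", region_name="us-east-1")
#   print(wait_for_step(emr, "<cluster-id>", "<step-id>"))
```

Any result other than COMPLETED means the step did not succeed; because the step's action on failure is Continue, the cluster keeps running in that case and the step logs should be inspected.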