Tuning an Amazon EMR cluster for best performance with Ideata

Enabling CPU Scheduling

CPU scheduling is not enabled by default. To enable CPU scheduling, set the following property in the /etc/hadoop/conf/capacity-scheduler.xml file on the ResourceManager and NodeManager hosts:

Replace the DefaultResourceCalculator, which schedules containers on memory alone, with the DominantResourceCalculator, which takes both memory and CPU (vcores) into account.

Property: yarn.scheduler.capacity.resource-calculator

Value: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator

1. Log in to the EMR master node and update the following property in the /etc/hadoop/conf/capacity-scheduler.xml file:

<property>

<name>yarn.scheduler.capacity.resource-calculator</name>

<!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->

<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>

</property>

2. Reboot the master node with the following command so that YARN restarts with the new setting:

sudo reboot
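Step 1 can also be scripted instead of done by hand in vi; run the script before the reboot in step 2. The sketch below uses Python's standard xml.etree.ElementTree module and assumes the usual Hadoop layout of <property> elements with <name> and <value> children under <configuration>. The function names set_hadoop_property and enable_dominant_resource_calculator are hypothetical, not part of Hadoop or EMR. The helper updates the value if the property already exists and appends a new <property> block otherwise.

```python
import xml.etree.ElementTree as ET

RESOURCE_CALCULATOR = "yarn.scheduler.capacity.resource-calculator"
DOMINANT = "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with property `name` set to `value` (hypothetical helper).

    Assumes the usual Hadoop layout:
    <configuration><property><name>..</name><value>..</value></property>...</configuration>
    """
    # Keep comments inside <configuration>, e.g. the commented-out default value.
    # (insert_comments requires Python 3.8+.)
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # The property is not present yet, so append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    return ET.tostring(root, encoding="unicode")


def enable_dominant_resource_calculator(
    path: str = "/etc/hadoop/conf/capacity-scheduler.xml",
) -> None:
    """Rewrite the file in place (needs root).

    Anything outside <configuration>, such as a top-level license comment or an
    <?xml-stylesheet?> line, is not written back; Hadoop does not need it.
    """
    with open(path, encoding="utf-8") as f:
        updated = set_hadoop_property(f.read(), RESOURCE_CALCULATOR, DOMINANT)
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0"?>\n' + updated + "\n")
```

Calling enable_dominant_resource_calculator() as root on the master node makes the same change as step 1. Keep a backup copy of the original file first, since the rewrite is not byte-for-byte faithful to your hand-formatted layout.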

Configuring EMRFS

The EMR File System (EMRFS) and the Hadoop Distributed File System (HDFS) are both installed on your EMR cluster.

EMRFS is an implementation of the Hadoop file system interface that lets EMR clusters read and write data directly in Amazon S3.

When consistent view is enabled, EMRFS verifies list consistency for the objects tracked in its metadata, retrying a specific number of times (5 by default). If the number of retries is exceeded, the originating job fails. To avoid this, override the default EMRFS configuration with the steps below. Note that the last property, fs.s3.consistent set to false, turns consistent view off altogether; the other three settings only take effect while consistent view is enabled.

Step 1: Log in to the EMR master node.

Step 2: Add the following properties inside the <configuration> element of /usr/share/aws/emr/emrfs/conf/emrfs-site.xml:

sudo vi /usr/share/aws/emr/emrfs/conf/emrfs-site.xml

<property>

<name>fs.s3.consistent.throwExceptionOnInconsistency</name>

<value>false</value>

</property>

<property>

<name>fs.s3.consistent.retryPolicyType</name>

<value>fixed</value>

</property>

<property>

<name>fs.s3.consistent.retryPeriodSeconds</name>

<value>10</value>

</property>

<property>

<name>fs.s3.consistent</name>

<value>false</value>

</property>

After the change, your emrfs-site.xml file should look like this:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>fs.s3.consistent.throwExceptionOnInconsistency</name>

<value>false</value>

</property>

<property>

<name>fs.s3.consistent.retryPolicyType</name>

<value>fixed</value>

</property>

<property>

<name>fs.s3.consistent.retryPeriodSeconds</name>

<value>10</value>

</property>

<property>

<name>fs.s3.consistent</name>

<value>false</value>

</property>

</configuration>
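A typo in a hand-edited XML file is easy to miss, so it can help to confirm that the edited emrfs-site.xml is well-formed and holds the intended values before rerunning jobs. The sketch below parses a Hadoop-style configuration file with Python's standard xml.etree.ElementTree module and reports each expected property that is missing or has a different value. The names EXPECTED, read_properties and check_properties are hypothetical helpers for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The four EMRFS overrides shown in the file above.
EXPECTED = {
    "fs.s3.consistent.throwExceptionOnInconsistency": "false",
    "fs.s3.consistent.retryPolicyType": "fixed",
    "fs.s3.consistent.retryPeriodSeconds": "10",
    "fs.s3.consistent": "false",
}


def read_properties(xml_text: str) -> Dict[str, str]:
    """Map each <property>'s <name> to its <value>; raises ET.ParseError if malformed."""
    root = ET.fromstring(xml_text)
    return {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.findall("property")
    }


def check_properties(xml_text: str, expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Return one message per expected property that is missing or different."""
    actual = read_properties(xml_text)
    problems = []
    for name, want in expected.items():
        got = actual.get(name)
        if got != want:
            problems.append(f"{name}: expected {want!r}, found {got!r}")
    return problems
```

For example, reading /usr/share/aws/emr/emrfs/conf/emrfs-site.xml and passing its contents to check_properties should return an empty list when all four overrides are in place.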