Connect to EMR Cluster in Ideata Application

By default, the Ideata application runs Spark in embedded mode. You can switch it to cluster mode by connecting it to a Spark cluster on Amazon EMR. Once your EMR cluster is up and running, log in to the Ideata application, go to the admin Spark settings, and fill in the following details to attach the Ideata application to the EMR cluster:

  • Select “Amazon EMR YARN Cluster”

  • Select Spark Environment as Amazon EMR 4.5

  • Provide the EMR cluster master's public IP address and set the master port to 8020

  • Add advanced configurations that match your cluster specification to get the best performance, such as:

o executor and driver memory (for example, if each cluster node has 16 GB of RAM, you can allocate at most 10 GB)

o YARN and executor cores (for example, if each cluster node has 4 cores, you can allocate 4)

o Spark driver host: the EC2 public IP address where the Ideata application is running

o Spark executor instances (the number of instances/nodes in your cluster)
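
The advanced settings above can be sketched as a small Python helper that turns a cluster specification into key/value pairs. This is only an illustration: the function name build_spark_settings is hypothetical, the mapping of the Ideata form fields to the standard Spark properties (spark.executor.memory, spark.driver.memory, spark.executor.cores, spark.driver.host, spark.executor.instances) is an assumption, and the node count of 3 and the IP address 203.0.113.10 are placeholder values. The 16 GB / 10 GB and 4-core figures are the examples given in the list.

```python
def build_spark_settings(node_ram_gb, node_cores, node_count, driver_host, memory_gb):
    """Return Spark properties for the advanced configuration (hypothetical helper)."""
    # Leave part of each node's RAM for the OS and YARN overhead.
    if memory_gb >= node_ram_gb:
        raise ValueError("memory_gb must be smaller than the RAM of a cluster node")
    return {
        "spark.executor.memory": f"{memory_gb}g",
        "spark.driver.memory": f"{memory_gb}g",
        "spark.executor.cores": str(node_cores),
        # Public IP of the EC2 instance running the Ideata application.
        "spark.driver.host": driver_host,
        # One executor per cluster node, as suggested in the list above.
        "spark.executor.instances": str(node_count),
    }


# Example values: 16 GB / 4-core nodes (from the text), 3 nodes and a
# documentation-range IP address (both placeholders).
settings = build_spark_settings(
    node_ram_gb=16,
    node_cores=4,
    node_count=3,
    driver_host="203.0.113.10",
    memory_gb=10,
)

for key, value in settings.items():
    print(f"{key}={value}")
```

Replace the placeholder values with the figures for your own cluster, and check the resulting property names against the fields that the Ideata Spark settings page actually exposes.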