Stands for “Elastic MapReduce”
EMR helps create Hadoop clusters (big data) to analyze and process vast amounts of data
- Clusters can be made of hundreds of EC2 instances
- It comes bundled with Apache Spark, HBase, Presto, Flink…
- It takes care of all the provisioning and configuration
- Supports auto-scaling and integrates with Spot Instances
Use cases: data processing, machine learning, web indexing, big data…
Node Types & Purchasing
- Master node: manages the cluster, coordinates work, monitors health - long running
- Core node: runs tasks and stores data - long running
- Task node (optional): only runs tasks, stores no data - usually Spot
Purchasing
- On-demand: reliable
- Reserved (min 1 year): cost savings
- Spot: cheaper, less reliable
A cluster can be long-running or transient (temporary)
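
Example (sketch): the node types and purchasing options above can be laid out as a cluster request. This is a minimal illustration, not a complete configuration. The helper names (build_instance_groups, build_cluster_request), the m5.xlarge instance type, and the choice of On-Demand for master and core nodes are assumptions made for the example. The dictionary keys follow the shape that boto3's EMR run_job_flow call accepts, but a real request needs more fields (release label, IAM roles, applications), so nothing is sent to AWS here.

```python
# Hypothetical helpers: they only build plain dicts, no AWS call is made.

INSTANCE_TYPE = "m5.xlarge"  # placeholder instance type, an assumption


def build_instance_groups(core_count, task_count=0):
    """Pair each node type with a purchasing option, as in the notes."""
    groups = [
        # Master node: long running, so use a reliable purchasing option.
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": INSTANCE_TYPE, "InstanceCount": 1},
        # Core nodes: run tasks and store data, also long running.
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": INSTANCE_TYPE, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes are optional and only run tasks, so Spot is a common fit.
        groups.append(
            {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": INSTANCE_TYPE, "InstanceCount": task_count}
        )
    return groups


def build_cluster_request(name, core_count, task_count=0, transient=False):
    """Return a partial request dict for a long-running or transient cluster."""
    return {
        "Name": name,
        "Instances": {
            "InstanceGroups": build_instance_groups(core_count, task_count),
            # A transient cluster shuts down once it has no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
    }


if __name__ == "__main__":
    request = build_cluster_request("demo", core_count=2, task_count=4,
                                    transient=True)
    for group in request["Instances"]["InstanceGroups"]:
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

With transient=True the cluster is meant to terminate after its work finishes. With the default transient=False it stays up as a long-running cluster.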