Hadoop

Intermediate

The Hadoop open-source software framework is widely used for reliable, scalable distributed computing across clusters of machines.

This competency area includes implementing advanced parallelism with Combiners, implementing Counters, and performing basic queries and subqueries in Hive, among other skills.

Key Competencies:

  1. Implement advanced parallelism in MapReduce using a Combiner - Understand the difference between a reducer and a combiner; use custom Writable data types. Applicable for Developer.
  2. Use Partitioners to control how intermediate keys are distributed across reducers - Configure the right partitioning scheme for the use case. Applicable for Developer.
  3. Implement Counters - Log mapper and reducer statistics, and define custom statistics with counters in code. Applicable for Developer.
  4. Configure Map, Shuffle/Reduce, and Job parameters - Optimize disk space, memory, and other resource usage. Applicable for Administration, Developer.
  5. Configure High Availability for the NameNode using the Quorum Journal Manager (QJM) - Designate machines to run the JournalNodes for HA. Applicable for Administration, Developer.
  6. Install and set up Hive for data warehousing with Hadoop - Set up Hive to work with a Hadoop installation. Applicable for Operations, Developer.
  7. Perform basic queries and subqueries in Hive - Run basic queries using Beeline or the HCatalog CLI. Applicable for Analyst, Developer.
  8. Execute windowing and analytic functions and aggregations in Hive - Perform joins, window operations, grouping, and rollups. Applicable for Analyst, Developer.
  9. Configure Hadoop Ozone for object storage - Work with Ozone using the command line and client libraries. Applicable for Operations, Developer.
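Item 4 is typically exercised through mapred-site.xml. A trimmed sketch of commonly tuned map/shuffle/reduce and job parameters (the values shown are placeholders, not recommendations):

```xml
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value> <!-- buffer used when sorting map output -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container memory per map task -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value> <!-- container memory per reduce task -->
</property>
<property>
  <name>mapreduce.job.reduces</name>
  <value>8</value> <!-- number of reduce tasks for the job -->
</property>
```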
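For NameNode HA with QJM (item 5), the core hdfs-site.xml settings name the standby pair and point them at a quorum of JournalNodes. A trimmed sketch, assuming a nameservice called mycluster and three JournalNode hosts (the hostnames are placeholders):

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

An odd number of JournalNodes (at least three) is used so the active NameNode can write edits to a majority even if one JournalNode fails.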
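
The role of the combiner in item 1 can be illustrated with a small word-count simulation. This is a pure-Python sketch, not the Hadoop Java API; all function names are illustrative. The key idea is that the combiner runs reduce-style aggregation on each mapper's local output, so fewer intermediate pairs are shuffled to the reducers.

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for each word, as in classic word count.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Combiner: locally sum counts on the mapper side, before the shuffle.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_all(partitions):
    # Reducer: final aggregation across all mappers' (combined) output.
    totals = defaultdict(int)
    for pairs in partitions:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)

lines = ["the quick the fox", "the lazy dog the"]
raw = [mapper(l) for l in lines]       # per-mapper intermediate pairs
combined = [combine(p) for p in raw]   # fewer pairs cross the network
result = reduce_all(combined)
```

Because the combiner and reducer here apply the same associative, commutative operation (summation), the final result is identical with or without the combiner; only the volume of shuffled data changes.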
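For item 2, Hadoop's default HashPartitioner routes a key to a reducer via hash(key) mod numReduceTasks. The pure-Python sketch below mimics that routing and shows a custom scheme (the year-based partitioner and its key format are made up for illustration) that pins related keys to the same reducer.

```python
def hash_partition(key, num_reducers):
    # Mirrors Hadoop's default HashPartitioner: hash(key) mod numReduceTasks.
    return hash(key) % num_reducers

def year_partition(key, num_reducers):
    # Hypothetical custom scheme: route "YYYY-label" keys by year, so all
    # records for one year land on the same reducer.
    year = int(key.split("-")[0])
    return year % num_reducers

keys = ["2023-sales", "2023-returns", "2024-sales"]
buckets = {k: year_partition(k, 4) for k in keys}
```

Note that the partitioner decides which reducer receives a key; the number of reducers itself is a job setting (mapreduce.job.reduces in the real API).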
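Counters (item 3) let map and reduce tasks report statistics back to the job; in the Java API they are incremented with context.getCounter(group, name).increment(1). A pure-Python stand-in for that pattern (counter group and record format are invented for the example):

```python
from collections import Counter

counters = Counter()  # stands in for Hadoop's job-wide counter aggregation

def mapper(record):
    # Increment a custom counter while mapping, analogous to
    # context.getCounter("Quality", "MALFORMED").increment(1) in Java.
    if "," not in record:
        counters["Quality.MALFORMED"] += 1
        return []
    counters["Quality.VALID"] += 1
    key, value = record.split(",", 1)
    return [(key, value)]

records = ["a,1", "b,2", "bad-record", "a,3"]
output = [pair for r in records for pair in mapper(r)]
```

After the job finishes, these totals would appear alongside Hadoop's built-in counters (records read, bytes written, and so on) in the job summary.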
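The windowing functions in item 8 are easiest to grasp by their semantics: SUM(amount) OVER (PARTITION BY account ORDER BY ts) yields a per-account running total. The pure-Python simulation below reproduces that one query (the transactions table and its columns are made up for the example):

```python
from itertools import groupby

rows = [  # (account, ts, amount) -- a made-up "transactions" table
    ("a", 1, 10), ("a", 2, 5), ("b", 1, 7), ("a", 3, 1), ("b", 2, 3),
]

# Equivalent HiveQL:
#   SELECT account, ts, amount,
#          SUM(amount) OVER (PARTITION BY account ORDER BY ts) AS running
#   FROM transactions;
def running_sum(rows):
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for account, group in groupby(ordered, key=lambda r: r[0]):
        total = 0
        for _, ts, amount in group:
            total += amount
            out.append((account, ts, total))
    return out

result = running_sum(rows)
```

Unlike GROUP BY, which collapses each account to one row, the window function keeps every input row and attaches the running aggregate to it.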