Hadoop

Intermediate

The Hadoop open-source software framework is widely used for reliable, scalable distributed computing across clusters of machines.

This competency area includes implementing advanced parallelism with Combiners, implementing Counters, and performing basic queries and subqueries in Hive, among other skills.

Key Competencies:

  1. Implement advanced parallelism in MapReduce using a Combiner - Understand the difference between a reducer and a combiner; use custom Writable data types. Applicable for Developer.
  2. Use Partitioners to control how intermediate keys are distributed across reducers - Configure the right partitioning scheme for the use case. Applicable for Developer.
  3. Implement Counters - Log mapper and reducer statistics, and define custom statistics with counters in code. Applicable for Developer.
  4. Configure Map, Shuffle/Reduce, and Job parameters - Optimize disk space, memory, and other resource usage. Applicable for Administration, Developer.
  5. Configure High Availability for the NameNode using the Quorum Journal Manager (QJM) - Designate machines to run the JournalNodes for HA. Applicable for Administration, Developer.
  6. Install and set up Hive for data warehousing with Hadoop - Set up Hive to work with a Hadoop installation. Applicable for Operations, Developer.
  7. Perform basic queries and subqueries in Hive - Run basic queries using Beeline or the HCatalog CLI. Applicable for Analyst, Developer.
  8. Execute windowing and analytic functions and aggregations in Hive - Perform joins, window operations, grouping, and rollups. Applicable for Analyst, Developer.
  9. Configure Hadoop Ozone for object storage - Work with Ozone using the command line and client libraries. Applicable for Operations, Developer.
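Item 4 is typically exercised through mapred-site.xml. A trimmed sketch of commonly tuned map/shuffle/reduce and job parameters (the values shown are placeholders, not recommendations):

```xml
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value> <!-- buffer used when sorting map output -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container memory per map task -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value> <!-- container memory per reduce task -->
</property>
<property>
  <name>mapreduce.job.reduces</name>
  <value>8</value> <!-- number of reduce tasks for the job -->
</property>
```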
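For NameNode HA with QJM (item 5), the core hdfs-site.xml settings name the standby pair and point them at a quorum of JournalNodes. A trimmed sketch, assuming a nameservice called mycluster and three JournalNode hosts (the hostnames are placeholders):

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

An odd number of JournalNodes (at least three) is used so the active NameNode can write edits to a majority even if one JournalNode fails.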
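
The role of the combiner in item 1 can be illustrated with a small word-count simulation. This is a pure-Python sketch, not the Hadoop Java API; all function names are illustrative. The key idea is that the combiner runs reduce-style aggregation on each mapper's local output, so fewer intermediate pairs are shuffled to the reducers.

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for each word, as in classic word count.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Combiner: locally sum counts on the mapper side, before the shuffle.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_all(partitions):
    # Reducer: final aggregation across all mappers' (combined) output.
    totals = defaultdict(int)
    for pairs in partitions:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)

lines = ["the quick the fox", "the lazy dog the"]
raw = [mapper(l) for l in lines]       # per-mapper intermediate pairs
combined = [combine(p) for p in raw]   # fewer pairs cross the network
result = reduce_all(combined)
```

Because the combiner and reducer here apply the same associative, commutative operation (summation), the final result is identical with or without the combiner; only the volume of shuffled data changes.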
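For item 2, Hadoop's default HashPartitioner routes a key to a reducer via hash(key) mod numReduceTasks. The pure-Python sketch below mimics that routing and shows a custom scheme (the year-based partitioner and its key format are made up for illustration) that pins related keys to the same reducer.

```python
def hash_partition(key, num_reducers):
    # Mirrors Hadoop's default HashPartitioner: hash(key) mod numReduceTasks.
    return hash(key) % num_reducers

def year_partition(key, num_reducers):
    # Hypothetical custom scheme: route "YYYY-label" keys by year, so all
    # records for one year land on the same reducer.
    year = int(key.split("-")[0])
    return year % num_reducers

keys = ["2023-sales", "2023-returns", "2024-sales"]
buckets = {k: year_partition(k, 4) for k in keys}
```

Note that the partitioner decides which reducer receives a key; the number of reducers itself is a job setting (mapreduce.job.reduces in the real API).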
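Counters (item 3) let map and reduce tasks report statistics back to the job; in the Java API they are incremented with context.getCounter(group, name).increment(1). A pure-Python stand-in for that pattern (counter group and record format are invented for the example):

```python
from collections import Counter

counters = Counter()  # stands in for Hadoop's job-wide counter aggregation

def mapper(record):
    # Increment a custom counter while mapping, analogous to
    # context.getCounter("Quality", "MALFORMED").increment(1) in Java.
    if "," not in record:
        counters["Quality.MALFORMED"] += 1
        return []
    counters["Quality.VALID"] += 1
    key, value = record.split(",", 1)
    return [(key, value)]

records = ["a,1", "b,2", "bad-record", "a,3"]
output = [pair for r in records for pair in mapper(r)]
```

After the job finishes, these totals would appear alongside Hadoop's built-in counters (records read, bytes written, and so on) in the job summary.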
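The windowing functions in item 8 are easiest to grasp by their semantics: SUM(amount) OVER (PARTITION BY account ORDER BY ts) yields a per-account running total. The pure-Python simulation below reproduces that one query (the transactions table and its columns are made up for the example):

```python
from itertools import groupby

rows = [  # (account, ts, amount) -- a made-up "transactions" table
    ("a", 1, 10), ("a", 2, 5), ("b", 1, 7), ("a", 3, 1), ("b", 2, 3),
]

# Equivalent HiveQL:
#   SELECT account, ts, amount,
#          SUM(amount) OVER (PARTITION BY account ORDER BY ts) AS running
#   FROM transactions;
def running_sum(rows):
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for account, group in groupby(ordered, key=lambda r: r[0]):
        total = 0
        for _, ts, amount in group:
            total += amount
            out.append((account, ts, total))
    return out

result = running_sum(rows)
```

Unlike GROUP BY, which collapses each account to one row, the window function keeps every input row and attaches the running aggregate to it.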