Hadoop

Basic

The Hadoop open-source software framework is widely used for reliable and scalable distributed computing on a cluster of machines.

This competency area includes running a single-node cluster in standalone and pseudo-distributed modes, running shell commands to interface with HDFS, and performing parallel processing tasks, among others.

Key Competencies:

  1. Single-node cluster in standalone mode - Install a Java environment, download the Hadoop release, and run Hadoop as a standalone process. Applicable for Operations, Developer.
  2. Single-node cluster in pseudo-distributed mode - Configure SSH, format the NameNode, start the NameNode and DataNode daemons, and configure and run the YARN daemons. Applicable for Operations, Developer.
  3. Monitor cluster and jobs - Monitor NameNodes and DataNodes using the NameNode web interface, and monitor resource usage using the ResourceManager. Applicable for Administration, Developer.
  4. Run shell commands to interface with HDFS - Run basic list and view commands on HDFS, run commands to store and read data from HDFS, copy files to and from Hadoop. Applicable for Operations, Developer.
  5. Perform parallel processing tasks using MapReduce - Set up the Mapper and Reducer classes to process data stored in HDFS, derive from the right base classes, and set up the basic configuration to run MapReduce jobs. Applicable for Developer.
  6. Schedule and manage tasks with YARN - Use the FIFO scheduler, capacity scheduler, and fair scheduler, configure task queues and submit MapReduce tasks to a specific queue. Applicable for Administration, Developer.
  7. Set up and configure a Hadoop cluster on a cloud platform - Configure a simple Hadoop cluster using Amazon EMR, Azure HDInsight, or Google Cloud Dataproc. Applicable for Administration, Developer.
  8. Run MapReduce jobs on Hadoop on a cloud platform - Run MapReduce jobs on Amazon EMR, Azure HDInsight, or Google Cloud Dataproc. Configure cloud bucket storage in place of HDFS storage. Applicable for Operations, Developer.
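
As a rough illustration of competency 4, the sketch below assembles the `hdfs dfs` commands typically used to create a directory, store a file, list it, view it, and copy it back. It only builds the command lines; the `run` helper executes one only if an `hdfs` client is found on the PATH. The file name `sales.csv`, the directory `/user/demo/input`, and the helper names `hdfs_cmd` and `run` are assumptions made for this example, not part of the competency definition.

```python
import shutil
import subprocess


def hdfs_cmd(action, *args):
    """Build the argv list for an `hdfs dfs` subcommand without running it."""
    return ["hdfs", "dfs", f"-{action}", *args]


def run(argv):
    """Run a command if the `hdfs` client is installed; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)


# Hypothetical paths, used only for illustration.
LOCAL_FILE = "sales.csv"
HDFS_DIR = "/user/demo/input"

COMMANDS = [
    hdfs_cmd("mkdir", "-p", HDFS_DIR),                        # create the target directory
    hdfs_cmd("put", LOCAL_FILE, HDFS_DIR),                    # copy a local file into HDFS
    hdfs_cmd("ls", HDFS_DIR),                                 # list the directory
    hdfs_cmd("cat", f"{HDFS_DIR}/{LOCAL_FILE}"),              # view the stored file
    hdfs_cmd("get", f"{HDFS_DIR}/{LOCAL_FILE}", "copy.csv"),  # copy back to local disk
]

if __name__ == "__main__":
    # Print the commands rather than executing them.
    for argv in COMMANDS:
        print(" ".join(argv))
```

On a machine with a running cluster, passing each entry of `COMMANDS` to `run` would execute it; on any other machine `run` returns `None` and nothing is changed.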
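
To make the map, shuffle, and reduce phases behind competency 5 concrete, here is a minimal local simulation of a word count. It is not the Hadoop Java API and does not touch HDFS; the function names `mapper`, `shuffle`, `reducer`, and `word_count` are illustrative assumptions, and a real job would instead derive from Hadoop's Mapper and Reducer base classes and be submitted to the cluster.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce phase: collapse the values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    print(word_count(sample))
```

The split into three functions mirrors the division of labor in a MapReduce job: the developer writes the map and reduce logic, while the framework handles grouping values by key and distributing the work across machines.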