
Showing posts from January, 2015

Hadoop Part 1: Hello World

Hadoop Hello World: The Word Count Code

The word count program is the simplest way to get started with the MapReduce framework. Its task is as follows: given several text files, count the number of times each word appears across the entire set.

It consists primarily of three parts:

Driver  : The driver contains the configuration details for the Hadoop job, for example the input path, the output path, the number of reducers, the mapper class name and the reducer class name.
Mapper  : In word count, the mapper emits <word, 1> for each word appearing in the document.
Reducer : In word count, the reducer sums the list of 1s that the shuffle and sort phase has grouped under each word, e.g. <word, [1,1,1,1,1,1]>, and emits <word, 6>.

The easiest way to build the code below is to create an Eclipse Java project and add the relevant Hadoop JAR files to its build path.

package com.kush;
import java.io.IOException;
import java.util.*;
import org.apache.had
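The Hadoop listing is cut off in this excerpt, so as a complement, here is a minimal sketch of the same map, shuffle/sort and reduce dataflow written in plain Java with no Hadoop classes at all. It is not the post's code and not Hadoop's API: the class and method names (WordCountSimulation, map, shuffleAndSort, reduce, wordCount) are hypothetical, and splitting words on whitespace with StringTokenizer is an assumption. It shows how <word, 1> pairs become <word, [1,1,...]> groups and then <word, count> totals.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the word count dataflow (no Hadoop classes used).
public class WordCountSimulation {

    // Map phase: emit <word, 1> for every whitespace-separated word in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line); // assumption: split on whitespace
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle and sort: group all values by word, keeping words in sorted order.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the list of 1s for one word, e.g. [1, 1, 1] -> 3.
    static int reduce(List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return sum;
    }

    // Runs map over every input line, then shuffle/sort, then reduce per word.
    static TreeMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffleAndSort(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello world", "hello hadoop hello");
        System.out.println(wordCount(input)); // prints {hadoop=1, hello=3, world=1}
    }
}
```

In a real Hadoop job the framework performs the shuffle and sort step across machines, and only the mapper, reducer and driver are written by hand; the TreeMap here simply stands in for that grouping on a single machine.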