STUDY SOURCE CODE: EPISODE 5 – HADOOP.MAPREDUCE.JOBTRACKER

There is a huge difference between MapReduce 1 and MapReduce 2 (YARN). MapReduce 1 uses a JobTracker and TaskTrackers to manage the whole process, while YARN splits that work across a ResourceManager, per-application ApplicationMasters, and so on, to improve performance.

Since MapReduce 1 has already played its part in Hadoop's history, we will study how MapReduce works from a historical perspective.

I downloaded the older Hadoop 1.2.1 release. You can also browse the source code online in the Apache Subversion repository.

Once you have all of this set up and dive into the directory mapred/org/apache/hadoop/mapred, you will be amazed at how many classes it holds: about 200. That tells us this is the central place where all the MapReduce magic happens. Now let's start this journey with the JobTracker, which is the "central location for submitting and tracking MR jobs in a network environment". It has about 5000 lines of code. Since this is not a tutorial, just some random study notes by a Java layman, I will first post the basic structure of this class, like the table of contents of a book, based on the author's comments and marked by line number.

JobTracker:

0 Preliminaries
1500 Real JobTracker
-properties
-constructors
2300 Lookup tables for JobInProgress and TaskInProgress
-create
-remove
-mark
2515 Accessor
-jobs
-tasks
2909 InterTrackerProtocol
-heartbeat
-update..
3534 JobSubmissionProtocol
-getjobid
-submitjob
-killjob
-setjobpriority
-getjobprofile
-status
4354 Job Tracker Methods
4408 Methods to track TaskTrackers (TT)
-update
-lost
-refresh
4697 Main (debug)
4876 Check the job if it has invalid requirements
4995 MXBean implementation
-blacklist
-graylist
5110 JobTracker SafeMode
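
To make the outline a bit more concrete, here is a toy model of the two faces of the JobTracker listed above: the job submission side (get a job id, submit, kill, status) and the inter-tracker side (heartbeats, and marking silent TaskTrackers as lost). This is only a sketch under my own assumptions, not the Hadoop 1.2.1 code: the class ToyJobTracker, its method signatures, the string job ids and the expiry rule are all made up for illustration, and the real class does far more (scheduling, recovery, blacklisting, safe mode).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT the real org.apache.hadoop.mapred.JobTracker.
// Every name and signature here is hypothetical.
public class ToyJobTracker {
    public enum JobState { RUNNING, KILLED }

    // Lookup table: job id -> current state.
    private final Map<String, JobState> jobs = new LinkedHashMap<>();
    // Lookup table: tracker name -> time of its last heartbeat (ms).
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long expiryMillis;
    private int nextJobNumber = 1;

    public ToyJobTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // ---- job submission side ----

    public synchronized String getNewJobId() {
        return "job_" + nextJobNumber++;
    }

    public synchronized void submitJob(String jobId) {
        jobs.put(jobId, JobState.RUNNING);
    }

    // Returns true only if the job existed and was still running.
    public synchronized boolean killJob(String jobId) {
        if (jobs.get(jobId) != JobState.RUNNING) {
            return false;
        }
        jobs.put(jobId, JobState.KILLED);
        return true;
    }

    // Returns null for an unknown job id.
    public synchronized JobState getJobStatus(String jobId) {
        return jobs.get(jobId);
    }

    // ---- inter-tracker side ----

    // A tracker reports in; we only record when we last heard from it.
    public synchronized void heartbeat(String trackerName, long nowMillis) {
        lastSeen.put(trackerName, nowMillis);
    }

    // Drops and returns every tracker that has been silent for longer
    // than expiryMillis.
    public synchronized List<String> expireLostTrackers(long nowMillis) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > expiryMillis) {
                lost.add(e.getKey());
            }
        }
        lastSeen.keySet().removeAll(lost);
        return lost;
    }

    public synchronized int liveTrackerCount() {
        return lastSeen.size();
    }

    public static void main(String[] args) {
        ToyJobTracker jt = new ToyJobTracker(1000L);
        String id = jt.getNewJobId();
        jt.submitJob(id);
        jt.heartbeat("tracker_a", 0L);
        jt.heartbeat("tracker_b", 900L);
        System.out.println(id + " is " + jt.getJobStatus(id));
        System.out.println("lost: " + jt.expireLostTrackers(1500L));
    }
}
```

Passing the clock in as a parameter instead of calling System.currentTimeMillis() is just my choice to keep the toy easy to poke at. Even in this tiny form you can see why everything sits behind synchronized lookup tables: clients and TaskTrackers all talk to the same single object.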

datafireball
