I am afraid I do not understand the timing results of a Map-Reduce job. For example, a job I am running gives me the following results from the job tracker.
Finished in: 1mins, 39sec
CPU time spent (ms) 150,460 152,030 302,490
The entries under "CPU time spent (ms)" are for Map, Reduce and Total respectively. But how is "CPU time spent" measured, and what does it signify? Is it the cumulative time spent across all of the mappers and reducers assigned to the job? Is it possible to get other timings from the framework, such as the time spent in shuffle, sort and partition? If so, how?
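To make it concrete what I mean by "measuring from the framework", this is the kind of counter lookup I have pieced together so far in the driver, after the job completes. It is only a sketch, assuming the new org.apache.hadoop.mapreduce API; the counter names TaskCounter.CPU_MILLISECONDS, JobCounter.SLOTS_MILLIS_MAPS and JobCounter.SLOTS_MILLIS_REDUCES are what I found in the docs and may vary by Hadoop version:

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.JobCounter;
import org.apache.hadoop.mapreduce.TaskCounter;

// ... in the driver, after job.waitForCompletion(true) has returned:
Counters counters = job.getCounters();

// Cumulative CPU time over all map and reduce task attempts (ms);
// presumably this is the figure the job tracker page reports?
long cpuMillis = counters.findCounter(TaskCounter.CPU_MILLISECONDS).getValue();

// Wall-clock slot time charged to map and reduce tasks (ms)
long mapSlotMillis = counters.findCounter(JobCounter.SLOTS_MILLIS_MAPS).getValue();
long reduceSlotMillis = counters.findCounter(JobCounter.SLOTS_MILLIS_REDUCES).getValue();

System.out.println("CPU ms: " + cpuMillis
    + ", map slot ms: " + mapSlotMillis
    + ", reduce slot ms: " + reduceSlotMillis);

But I do not see counters there for shuffle, sort or partition time, which is why I am asking whether those can be obtained at all.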
A second question that bothers me: I have seen some posts here (Link1, Link2) which suggest using getTime() in the driver class:
long start = new Date().getTime();
boolean status = job.waitForCompletion(true);
long end = new Date().getTime();
System.out.println("Job took " + (end - start) + " milliseconds");
Is this not just what the first entry in the Job Tracker output already provides? Is it necessary at all? What is the best way to time a Hadoop job, especially when I want to measure IO time and compute time per node / per stage?
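The only framework-side alternative to the Date-based timing that I have come across is reading the job's own timestamps, roughly like this (again just a sketch; I am assuming Job.getStatus() and the JobStatus getStartTime()/getFinishTime() getters are available in my Hadoop version):

// In the driver, after the job has finished; getStatus() can throw
// IOException/InterruptedException, which the driver's main already declares.
long startMs = job.getStatus().getStartTime();
long finishMs = job.getStatus().getFinishTime();
System.out.println("Framework-reported runtime: " + (finishMs - startMs) + " ms");

If that is essentially the same number as the "Finished in" entry, then the Date().getTime() wrapper seems redundant, but it still tells me nothing about IO versus compute time per node or per stage.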