@parallel will take the jobs to be done and divvy them up among the available workers right away. Note that in the help from ?@parallel we get: "The specified range is partitioned ... across all workers."
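To make that static split concrete, here is a rough sketch of the partitioning idea. This is just an illustration, not Julia's actual internal code, and static_chunks is a name I made up:

function static_chunks(r, nworkers)
    len = length(r)
    chunk = cld(len, nworkers)          # jobs per worker, rounded up
    [r[(i - 1) * chunk + 1 : min(i * chunk, len)] for i = 1:nworkers]
end

static_chunks(1:12, 2)                  # gives [1:6, 7:12]: each worker a fixed block

Each worker gets its contiguous block up front, regardless of how long any individual job turns out to take.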
pmap, by contrast, will start each worker on a job. As soon as a worker finishes a job, it gives it the next available job. It is similar to queue-based multiprocessing, as is common in Python, for instance. Thus it's not so much a case of "redistributing" work as of only handing it out at the right time and to the right worker in the first place.
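The essence of that scheduler is a shared job counter that idle workers pull from. The pattern below is condensed from the pmap-style example in the parallel computing chapter of the Julia manual; naive_pmap is my own name for it, and note that the argument order of remotecall_fetch shown here is the newer one (older versions took the worker id first):

function naive_pmap(f, jobs)
    n = length(jobs)
    results = Vector{Any}(n)
    i = 1
    nextidx() = (idx = i; i += 1; idx)   # hand out the next job index
    @sync for p in workers()
        @async while true                # one feeder task per worker
            idx = nextidx()
            idx > n && break
            results[idx] = remotecall_fetch(f, p, jobs[idx])
        end
    end
    results
end

Because each feeder task only requests a new index after its worker returns, a fast worker naturally churns through more of the queue.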
I cooked up the following example, which I believe illustrates this. In this somewhat silly example, we have two workers, one of which is slow and the other of which is twice as fast. Ideally, we would want to give the fast worker twice as much work as the slow worker (or, more realistically, we would have fast and slow jobs, but the principle is exactly the same). pmap will accomplish this, but @parallel won't.
For each test, I initialize the following:
addprocs(2)

@everywhere begin
    function parallel_func(idx)
        workernum = myid() - 1       # worker 2 -> 1, worker 3 -> 2
        sleep(workernum)             # so worker 2 is twice as fast as worker 3
        println("job $idx")
    end
end
Now, for the @parallel test, I run the following:

@parallel for idx = 1:12
    parallel_func(idx)
end

and get back the printed output:
julia> From worker 2: job 1
From worker 3: job 7
From worker 2: job 2
From worker 2: job 3
From worker 3: job 8
From worker 2: job 4
From worker 2: job 5
From worker 3: job 9
From worker 2: job 6
From worker 3: job 10
From worker 3: job 11
From worker 3: job 12
It's almost sweet. The workers have "shared" the work evenly. Note that each worker has completed 6 jobs, even though worker 2 is twice as fast as worker 3. It may be touching, but it is inefficient: worker 3 needs 12 seconds for its six two-second jobs while worker 2 sits idle after 6 seconds, so the whole run takes about 12 seconds when 8 would do.
Then, for the pmap test, I run the following:

pmap(parallel_func, 1:12)
and get the output:
From worker 2: job 1
From worker 3: job 2
From worker 2: job 3
From worker 2: job 5
From worker 3: job 4
From worker 2: job 6
From worker 2: job 8
From worker 3: job 7
From worker 2: job 9
From worker 2: job 11
From worker 3: job 10
From worker 2: job 12
Now, note that worker 2 has performed 8 jobs and worker 3 has performed 4. This is exactly in proportion to their speeds, and it is what we want for optimal efficiency. pmap is a hard taskmaster - from each according to his ability.
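If you want to check the wall-clock difference rather than eyeball the job counts, something like the following should work. This is a sketch under the same two-worker setup; quiet_func is a hypothetical variant I'm adding just to drop the printing, and the @sync is needed because a bare @parallel loop returns before the workers finish:

@everywhere quiet_func(idx) = sleep(myid() - 1)   # same per-job cost, no printing

@elapsed @sync @parallel for idx = 1:12   # static split: expect roughly 12 seconds
    quiet_func(idx)
end

@elapsed pmap(quiet_func, 1:12)           # dynamic split: expect roughly 8 seconds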
Thus, the recommendations in the Julia docs make sense. If you have small, simple jobs, then these issues with @parallel are less likely to cause problems. For bigger or more complex jobs, though, pmap has advantages.