Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


julia - What exactly is the difference between @parallel and pmap?

As the title states: what exactly is the difference between @parallel and pmap? I don't mean the obvious difference, that one is a macro for a loop and the other works on functions. I mean how exactly do their implementations differ, and how should I use this knowledge to choose between them?

The reason I ask is that a lot of the applications I write could use either construct: I could write a loop and calculate something with @parallel, or wrap what would have been in the loop into a function and call pmap on that. I have been following the advice of using @parallel for things which are quick to evaluate and pmap for calls where each task takes much longer (as it states in the documentation), but I feel that if I have a better understanding of what it's doing I'd be able to make better choices.

For example: does @parallel divide up the work before evaluating? I noticed that if I run a parallel loop where each inner call takes a random amount of time, @parallel can take a long time because at the end I have very few processes still working. pmap on the same microtests doesn't seem to have this: is pmap re-distributing the work as needed?

Other questions like this all stem from my ignorance of exactly how pmap differs from @parallel.


1 Answer


@parallel will take the jobs to be done and divvy them up amongst the available workers right away. Note that the help text from ?@parallel says: "The specified range is partitioned ... across all workers." pmap, by contrast, will start each worker on a single job. Once a worker finishes a job, it is given the next available job. It is similar to the queue-based multiprocessing that is common in Python, for instance. Thus, it's not so much a case of "redistributing" work but rather of only giving it out at the right time and to the right worker in the first place.
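To make the "partitioned across all workers" part concrete, here is a minimal sketch in plain Julia of splitting a range into contiguous chunks up front. The helper name static_chunks is hypothetical and only for illustration; the real macro's exact split may differ in detail, but for 12 jobs on 2 workers it gives the same 1:6 / 7:12 division that shows up in the output further down.

```julia
# Hypothetical helper (not part of Julia): split a range into contiguous
# chunks, one per worker, before any work has started.
function static_chunks(r::UnitRange{Int}, nworkers::Int)
    base, extra = divrem(length(r), nworkers)
    chunks = UnitRange{Int}[]
    start = first(r)
    for w in 1:nworkers
        # The first `extra` workers take one additional iteration each.
        len = base + (w <= extra ? 1 : 0)
        push!(chunks, start:(start + len - 1))
        start += len
    end
    return chunks
end

chunks = static_chunks(1:12, 2)
println(chunks)
```

The point is that the assignment is fixed before anyone knows how long each job will take, so a slow worker is stuck with its whole chunk.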

I cooked up the following example, which I believe illustrates this. In this somewhat silly example, we have two workers, one of which is slow and the other of which is twice as fast. Ideally, we would want to give the fast worker twice as much work as the slow worker. (Or, more realistically, we would have fast and slow jobs, but the principle is exactly the same.) pmap will accomplish this, but @parallel won't.

For each test, I initialize the following:

addprocs(2)

@everywhere begin
    function parallel_func(idx)
        workernum = myid() - 1   # worker 2 -> 1, worker 3 -> 2
        sleep(workernum)         # worker 3 sleeps twice as long, so it is the slow one
        println("job $idx")
    end
end

Now, for the @parallel test, I run the following:

@parallel for idx = 1:12
    parallel_func(idx)
end

And get back print output:

From worker 2:  job 1
From worker 3:  job 7
From worker 2:  job 2
From worker 2:  job 3
From worker 3:  job 8
From worker 2:  job 4
From worker 2:  job 5
From worker 3:  job 9
From worker 2:  job 6
From worker 3:  job 10
From worker 3:  job 11
From worker 3:  job 12

It's almost sweet. The workers have "shared" the work evenly. Note that each worker has completed 6 jobs, even though worker 2 is twice as fast as worker 3. It may be touching, but it is inefficient.

For the pmap test, I run the following:

pmap(parallel_func, 1:12)

and get the output:

From worker 2:  job 1
From worker 3:  job 2
From worker 2:  job 3
From worker 2:  job 5
From worker 3:  job 4
From worker 2:  job 6
From worker 2:  job 8
From worker 3:  job 7
From worker 2:  job 9
From worker 2:  job 11
From worker 3:  job 10
From worker 2:  job 12

Now, note that worker 2 has performed 8 jobs and worker 3 has performed 4. This is exactly in proportion to their speed, which is what we want for optimal efficiency. pmap is a hard taskmaster: from each according to their ability.
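The 6/6 versus 8/4 split can be reproduced with a small idealized model that needs no worker processes at all. This is only a sketch under stated assumptions: the function names simulate_static and simulate_dynamic are made up for illustration, cost[w] is the assumed time worker w needs per job (1 and 2 seconds, as in the sleep-based example), and communication overhead is ignored.

```julia
# Static schedule: jobs are divided evenly up front, regardless of speed.
function simulate_static(njobs, cost)
    nw = length(cost)
    base, extra = divrem(njobs, nw)
    counts = [base + (w <= extra ? 1 : 0) for w in 1:nw]
    # Total time is set by whichever worker finishes its chunk last.
    return counts, maximum(counts .* cost)
end

# Dynamic schedule: each job goes to whichever worker becomes idle first.
function simulate_dynamic(njobs, cost)
    nw = length(cost)
    free_at = zeros(Float64, nw)   # time at which each worker is next idle
    counts = zeros(Int, nw)
    for _ in 1:njobs
        w = argmin(free_at)        # ties go to the lower-numbered worker
        free_at[w] += cost[w]
        counts[w] += 1
    end
    return counts, maximum(free_at)
end

cost = [1.0, 2.0]                  # assumed per-job times for workers 2 and 3
println(simulate_static(12, cost))
println(simulate_dynamic(12, cost))
```

Under this model the static schedule gives each worker 6 jobs and finishes after 12 seconds, while the dynamic one gives 8 and 4 jobs and finishes after 8 seconds. Ties are broken arbitrarily here, so which worker runs which job can differ from the real pmap output even though the counts agree.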

Thus, the recommendations in the Julia docs make sense. If you have small, simple jobs, then it is more likely that these issues with @parallel won't cause problems. For bigger or more complex jobs, though, pmap has advantages.
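A note for anyone trying this on Julia 1.0 or later: @parallel was renamed to @distributed, and both it and pmap now live in the Distributed standard library. Below is a rough adaptation of the test above for current versions. It assumes a fresh session, so that the two new workers get ids 2 and 3; the sleeps are scaled down to tenths of a second, and returning myid() from the function is an addition of mine so that the pmap result records which worker ran each job.

```julia
using Distributed

addprocs(2)
worker_ids = workers()

@everywhere function parallel_func(idx)
    # Higher-numbered workers sleep longer, so they act as the slow ones.
    sleep(0.1 * (myid() - 1))
    println("job $idx")
    return myid()
end

# Range is split across workers up front; @sync waits for the loop to finish.
@sync @distributed for idx = 1:12
    parallel_func(idx)
end

# Jobs are handed out one at a time as workers become free.
owners = pmap(parallel_func, 1:12)
println(owners)
```

The interleaving of the printed lines and the exact contents of owners depend on timing, so they may vary from run to run.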

