r - Is there an efficient way to parallelize mapply?

Question

Welcome To Ask or Share your Answers For Others

r - Is there an efficient way to parallelize mapply?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Is there an efficient way to parallelize mapply?

I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now. Not surprisingly, I'm looking for ways to speed things up ;-)

After some research, I noticed that only one core is currently used and parallelization is the thing to do. Digging deeper, I came to the conclusion (maybe incorrectly?) that the package foreach isn't really meant for my problem because too much overhead is produced (see, for example, SO). A good alternative seems to be multicore for Unix machines. In particular, the pvec function seems to be the most efficient one after I checked the help page.

However, if I understand it correctly, this function only takes one vector and splits it up accordingly. I need a function that can be parallized, but takes multiple vectors (or a data.frame instead), just like the mapply function does. Is there anything out there that I missed?

Here is a small example of what I want to do: (Note that I include a plyr example here because it can be an alternative to the base mapply function and it has a parallelize option. However, it is slower in my implementation and internally, it calls foreach to parallelize, so I think it won't help. Is that correct?)

library(plyr)
library(foreach)
n <- 10000
df <- data.frame(P   = rnorm(n, mean=100, sd=10),
                 B0  = rnorm(n, mean=40,  sd=5),
                 CF1 = rnorm(n, mean=30,  sd=10),
                 CF2 = rnorm(n, mean=30,  sd=5),
                 CF3 = rnorm(n, mean=90,  sd=8))

get_uniroot <- function(P, B0, CF1, CF2, CF3) {

  uniroot(function(x) {-P + B0 + CF1/x + CF2/x^2 + CF3/x^3}, 
          lower = 1,
          upper = 10,
          tol   = 0.00001)$root

}

system.time(x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
   #user  system elapsed 
   #0.91    0.00    0.90 
system.time(x2 <- mdply(df, get_uniroot))
   #user  system elapsed 
   #5.85    0.00    5.85
system.time(x3 <- foreach(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3, .combine = "c") %do% {
    get_uniroot(P, B0, CF1, CF2, CF3)})
   #user  system elapsed 
  # 10.30    0.00   10.36
all.equal(x1, x2$V1) #TRUE
all.equal(x1, x3)    #TRUE

Also, I tried to implement Ryan Thompson's function chunkapply from the SO link above (only got rid of doMC part, because I couldn't install it. His example works, though, even after adjusting his function.), but didn't get it to work. However, since it uses foreach, I thought the same arguments mentioned above apply, so I didn't try it too long.

#chunkapply(get_uniroot, list(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3))
#Error in { : task 1 failed - "invalid function value in 'zeroin'"

PS: I know that I could just increase tol to reduce the number of steps that are necessary to find a uniroot. However, I already set tol as big as possible.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:30:49+0000

I'd use the parallel package that's built into R 2.14 and work with matrices. You could then simply use mclapply like this:

dfm <- as.matrix(df)
result <- mclapply(seq_len(nrow(dfm)),
          function(x) do.call(get_uniroot,as.list(dfm[x,])),
          mc.cores=4L
          )
unlist(result)

This is basically doing the same mapply does, but in a parallel way.

But...

Mind you that parallelization always counts for some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function calculates significantly longer than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider to cut your data frame in bigger chunks, and combine both mapply and mclapply. A possible way to do this is:

ncores <- 4
id <- floor(
        quantile(0:nrow(df),
                 1-(0:ncores)/ncores
        )
      )
idm <- embed(id,2)

mapply_uniroot <- function(id){
  tmp <- df[(id[1]+1):id[2],]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
result <-mclapply(nrow(idm):1,
                  function(x) mapply_uniroot(idm[x,]),
                  mc.cores=ncores)
final <- unlist(result)

This might need some tweaking, but it essentially breaks your df in exactly as many bits as there are cores, and run the mapply on every core. To show this works :

> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE

Categories

r - Is there an efficient way to parallelize mapply?

r - Is there an efficient way to parallelize mapply?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags