r - apply() is slow - how to make it faster or what are my alternatives?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a fairly large data frame, about 10 million rows. It has columns x and y, and I want to compute

hypot <- function(x) {sqrt(x[1]^2 + x[2]^2)}

for each row. Using apply, this would take a long time (about 5 minutes, extrapolating from smaller sizes) and a lot of memory.

That is too much for me, so I've tried a few things:

Compiling the hypot function reduces the time by about 10%.
Using functions from plyr greatly increases the running time.

What's the fastest way to do this?

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:47:34+0000

What about with(my_data, sqrt(x^2 + y^2))?
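As a minimal sketch of what that one-liner does, here is the same expression on a tiny data frame. The values in my_data are made up for illustration (three right triangles with known hypotenuses); only the with() call comes from the suggestion above.

```r
# Hypothetical toy data, not from the question: three right triangles.
my_data <- data.frame(x = c(3, 5, 8), y = c(4, 12, 15))

# with() evaluates the expression using the data frame's columns, so
# x^2 + y^2 and sqrt() each run once over whole columns rather than
# once per row.
h <- with(my_data, sqrt(x^2 + y^2))
print(h)  # 5, 13, 17
```

The result is a plain numeric vector with one element per row, so it can be assigned straight back as a new column if needed.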

set.seed(101)
d <- data.frame(x=runif(1e5),y=runif(1e5))

library(rbenchmark)

Two different per-row functions, the second (hypot2) taking advantage of vectorization:

hypot <- function(x) sqrt(x[1]^2+x[2]^2)
hypot2 <- function(x) sqrt(sum(x^2))
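To see how the two definitions differ, the sketch below calls them on single vectors, which is roughly what apply(d, 1, ...) hands to the function for each row. The input vectors are made-up examples.

```r
hypot  <- function(x) sqrt(x[1]^2 + x[2]^2)
hypot2 <- function(x) sqrt(sum(x^2))

# Hypothetical single "row" with two components.
one_row <- c(3, 4)
a <- hypot(one_row)   # indexes each element separately
b <- hypot2(one_row)  # squares the whole vector at once, then sums

# With more than two components, hypot silently ignores the extras,
# while hypot2 uses every element.
three_d <- c(1, 2, 2)
a3 <- hypot(three_d)   # sqrt(1 + 4)
b3 <- hypot2(three_d)  # sqrt(1 + 4 + 4) = 3

print(c(a, b, a3, b3))
```

Both versions are still called once per row under apply(), so hypot2 only trims the work done inside each call.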

Try compiling these too:

library(compiler)
chypot <- cmpfun(hypot)
chypot2 <- cmpfun(hypot2)

benchmark(sqrt(d[,1]^2+d[,2]^2),
          with(d,sqrt(x^2+y^2)),
          apply(d,1,hypot),
          apply(d,1,hypot2),
          apply(d,1,chypot),
          apply(d,1,chypot2),
          replications=50)

Results:

                       test replications elapsed relative user.self sys.self
5       apply(d, 1, chypot)           50  61.147  244.588    60.480    0.172
6      apply(d, 1, chypot2)           50  33.971  135.884    33.658    0.172
3        apply(d, 1, hypot)           50  63.920  255.680    63.308    0.364
4       apply(d, 1, hypot2)           50  36.657  146.628    36.218    0.260
1 sqrt(d[, 1]^2 + d[, 2]^2)           50   0.265    1.060     0.124    0.144
2  with(d, sqrt(x^2 + y^2))           50   0.250    1.000     0.100    0.144

As expected, the with() solution and the column-indexing solution à la Tyler Rinker are essentially identical; hypot2 is nearly twice as fast as the original hypot (but still about 150 times slower than the vectorized solutions). As already pointed out by the OP, compilation doesn't help very much.
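If the data had more than two columns, one possible whole-table analogue of hypot2 is sqrt(rowSums(d^2)). This variant was not part of the benchmark above, so no timing is claimed for it; the sketch below only checks, on a smaller made-up sample, that it agrees with the with() and apply() results.

```r
set.seed(101)
# Smaller sample than in the benchmark, just for a quick equivalence check.
d <- data.frame(x = runif(1000), y = runif(1000))

v_with    <- with(d, sqrt(x^2 + y^2))            # vectorized, column by name
v_rowsums <- unname(sqrt(rowSums(d^2)))          # vectorized, any number of columns
v_apply   <- unname(apply(d, 1, function(r) sqrt(sum(r^2))))  # one call per row

# All three approaches should give the same numbers.
same <- isTRUE(all.equal(v_with, v_rowsums)) &&
        isTRUE(all.equal(v_with, v_apply))
print(same)
```

unname() is used so the comparison depends only on the values, not on any row names attached to the results.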
