Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - lapply, dplyr, and using values within lists

I am trying to calculate the mean of a vector in a list, conditional on the value of another vector in that list. Here is a simple example:

> library(dplyr)
> df1 <- 1:10
> df2 <- rep(0:1, 5)
> 
> df3 <- bind_cols(df1, df2)
> df3
# A tibble: 10 x 2
    ...1  ...2
   <int> <int>
 1     1     0
 2     2     1
 3     3     0
 4     4     1
 5     5     0
 6     6     1
 7     7     0
 8     8     1
 9     9     0
10    10     1

Basically, I want to calculate the mean of column 1 for the rows where column 2 == 0. That is simple for one data frame, but I would like to do it across a few dozen data frames. For this I am using lapply. I first create a list of all my data frames (for simplicity, just one):
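For a single data frame, the condition can be expressed with logical indexing: select column 1 where column 2 equals 0, then take the mean. A minimal sketch in base R (no packages needed); the names one_df, col_1 and col_2 are hypothetical stand-ins for df3 and its ...1 and ...2 columns:

```r
# Hypothetical stand-in for df3, built with base R
one_df <- data.frame(col_1 = 1:10, col_2 = rep(0:1, 5))

# Keep col_1 only on rows where col_2 == 0, then average those values
mean_when_zero <- mean(one_df$col_1[one_df$col_2 == 0])
print(mean_when_zero)
# [1] 5
```

The rows with col_2 == 0 have col_1 values 1, 3, 5, 7 and 9, so the mean is 5.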

> z = list(df3)

df3 now contains both df1 and df2 as columns. What I can't figure out is the lapply syntax: how do I calculate the mean of df1 based on the value of df2? I imagine something like this:

tot_mean <- lapply(z[[1]], FUN = function(x) {
  mean(x[[df1]][[df2==1]])  
})

or more generally:

tot_mean <- lapply(z[[1]], FUN = function(x) {
  mean(df1 if df2 == 0)
})

In addition, I would then like to drop df2 from the result, so that the only value left is the mean of df1 where df2 equals 0.

I suspect the issue is related to how lapply goes through the list (i.e. it goes through df1 first and calculates the mean, then through df2 and calculates the mean). I don't necessarily need to use lists; I would be happy to keep df3 as a data frame, but I am not sure how to set up a for loop that runs through different data frames and calculates a mean.
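The suspicion above can be checked directly: a data frame is a list of columns, so lapply(z[[1]], ...) calls the function once per column rather than once per data frame. A small sketch of that behaviour, followed by the for loop asked about; list_of_dfs, col_1 and col_2 are hypothetical names and the data is made up for illustration:

```r
# Two hypothetical data frames with the same layout as df3
list_of_dfs <- list(
  a = data.frame(col_1 = 1:10, col_2 = rep(0:1, 5)),
  b = data.frame(col_1 = c(2, 4, 6, 8), col_2 = c(0, 0, 1, 1))
)

# lapply over ONE data frame visits its columns: one result per column
per_column <- lapply(list_of_dfs$a, function(column) mean(column))
print(names(per_column))

# A for loop over the list visits whole data frames instead
means <- numeric(length(list_of_dfs))
names(means) <- names(list_of_dfs)
for (i in seq_along(list_of_dfs)) {
  x <- list_of_dfs[[i]]
  # Mean of col_1 restricted to rows where col_2 == 0
  means[i] <- mean(x$col_1[x$col_2 == 0])
}
print(means)
```

Here per_column holds the unconditional means of col_1 and col_2 (the behaviour described in the question), while means holds one conditional mean per data frame.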

Thank you!

Question from: https://stackoverflow.com/questions/66058121/lapply-dplyr-and-using-values-within-lists

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…

1 Answer

If list_of_dfs is a list of data frames, and for each data frame in the list you want to calculate the mean of the first column where the second column is 0, it would be:

lapply(list_of_dfs, function(x) mean(x[[1]][x[[2]] == 0]))

If you want to use column names inside [[, put them in quotes:

lapply(list_of_dfs, function(x) mean(x[["col_1"]][x[["col_2"]] == 0]))

(df usually means "data frame". It's weird to me that you are using df as part of a name to refer to individual columns, a whole data frame, and a list of data frames, so I've changed the names to try to make it clearer.) – Gregor Thomas

(From Gregor in the comments. Thanks, Gregor!)
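The one-liners above can be checked end to end. A minimal base R sketch with made-up data; list_of_dfs, col_1 and col_2 follow the answer's illustrative naming. sapply is swapped in for the last call (it is not part of the original answer) to simplify the list of single means into a named numeric vector, which matches the goal of keeping only the conditional mean for each data frame:

```r
# Hypothetical list of data frames
list_of_dfs <- list(
  a = data.frame(col_1 = 1:10, col_2 = rep(0:1, 5)),
  b = data.frame(col_1 = c(2, 4, 6, 8), col_2 = c(0, 0, 1, 1))
)

# The answer's approach by position: a list with one mean per data frame
by_position <- lapply(list_of_dfs, function(x) mean(x[[1]][x[[2]] == 0]))

# The same by column name (names must be quoted inside [[ ]])
by_name <- lapply(list_of_dfs, function(x) mean(x[["col_1"]][x[["col_2"]] == 0]))

# sapply simplifies the list of single means to a named numeric vector
as_vector <- sapply(list_of_dfs, function(x) mean(x[["col_1"]][x[["col_2"]] == 0]))
print(as_vector)
```

Indexing by position works when every data frame has the same column order; indexing by name is safer when the order might differ.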
