I am trying to calculate the mean of one vector in a list, conditional on the value of another vector in that list. Here is a simple example:
> df1 <- seq(1:10)
> df2 <- rep(0:1, 5)
>
> df3 <- bind_cols(df1, df2)
> df3
# A tibble: 10 x 2
    ...1  ...2
   <int> <int>
 1     1     0
 2     2     1
 3     3     0
 4     4     1
 5     5     0
 6     6     1
 7     7     0
 8     8     1
 9     9     0
10    10     1
Basically, I want to calculate the mean of column 1 where column 2 == 0. This is very simple for one data frame, but I would like to do it across a few dozen data frames. For this I am using the lapply function. I first create a list of all my data frames (for simplicity, just one):
> z = list(df3)
df3 now contains both df1 and df2. The part I can't figure out is the lapply function syntax: how do I calculate the mean of df1 based on the df2 value? I imagine something like this:
tot_mean <- lapply(z[[1]], FUN = function(x) {
mean(x[[df1]][[df2==1]])
})
or more generally:
tot_mean <- lapply(z[[1]], FUN = function(x) {
mean(df1 if df2 == 0)
})
In addition, my goal would be to then remove df2 from the list; the only value left would be the mean df1 value when df2 equals 0.
I get the sense that the issue is related to how lapply goes through the list here (i.e. it goes through df1 first and calculates the mean, then goes through df2 and calculates the mean). I don't necessarily need to use lists. I would be happy to keep df3 as a data frame, but I am not sure how to set up a for loop to run through different data frames and calculate a mean.
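To make the intent concrete, here is a minimal base-R sketch of the result I am after. It assumes every data frame keeps the values in its first column and the 0/1 flag in its second. The names `cond_mean`, `dfs`, `value` and `flag` are made up for illustration and are not part of my real data. It applies lapply to the list itself rather than to z[[1]], so each element passed to the function is a whole data frame instead of a single column:

```r
# Hypothetical helper: mean of column 1 over the rows where column 2 equals `flag`
cond_mean <- function(d, flag = 0) {
  mean(d[[1]][d[[2]] == flag])
}

# Same data as df3 above, built with base R so no packages are needed
df3 <- data.frame(value = 1:10, flag = rep(0:1, 5))

# Hypothetical list of data frames; the second is a shifted copy of the first
dfs <- list(a = df3, b = transform(df3, value = value + 10))

# Iterate over the list, so each `d` is a full data frame
tot_mean <- lapply(dfs, cond_mean)

# The same thing written as a for loop
tot_mean_loop <- numeric(length(dfs))
for (i in seq_along(dfs)) {
  tot_mean_loop[i] <- cond_mean(dfs[[i]])
}
```

Each element of tot_mean would then hold only the conditional mean, with the flag column no longer present.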
Thank you!
question from:
https://stackoverflow.com/questions/66058121/lapply-dplyr-and-using-values-within-lists