Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

R: Replacing NA values by mean of hour with dplyr

I'm learning the dplyr package in R and I really like it. But now I'm dealing with NA values in my data.

I would like to replace each NA with the average profit for the corresponding hour. Here is a simple example:

#create an example
day = c(1, 1, 2, 2, 3, 3)
hour = c(8, 16, 8, 16, 8, 16)
profit = c(100, 200, 50, 60, NA, NA)
shop.data = data.frame(day, hour, profit)

#calculate the average for each hour
library(dplyr)
mean.profit <- shop.data %>%
  group_by(hour) %>%
  summarize(mean=mean(profit, na.rm=TRUE))

> mean.profit
Source: local data frame [2 x 2]

  hour mean
1    8   75
2   16  130

Can I use a dplyr command, similar to base R's transform, to replace the NAs for day 3 in profit with 75 (for 8:00) and 130 (for 16:00)?



1 Answer

Try grouping by hour and filling the NAs inside mutate with ifelse:

  shop.data %>%
    group_by(hour) %>%
    mutate(profit = ifelse(is.na(profit), mean(profit, na.rm = TRUE), profit))

  #   day hour profit
  #1   1    8    100
  #2   1   16    200
  #3   2    8     50
  #4   2   16     60
  #5   3    8     75
  #6   3   16    130
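
To see why this works, it can help to look at the value that mean(profit, na.rm = TRUE) takes inside a grouped mutate. The sketch below is only for inspection: it rebuilds the example data from the question and adds a helper column, hour.mean, which is a name made up here and not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
shop.data <- data.frame(
  day    = c(1, 1, 2, 2, 3, 3),
  hour   = c(8, 16, 8, 16, 8, 16),
  profit = c(100, 200, 50, 60, NA, NA)
)

# hour.mean is a hypothetical helper column, added only for inspection.
# After group_by(hour), mean() is evaluated once per hour and the single
# result is recycled to every row of that hour.
with.mean <- shop.data %>%
  group_by(hour) %>%
  mutate(hour.mean = mean(profit, na.rm = TRUE)) %>%
  ungroup()

with.mean$hour.mean
# 75 130 75 130 75 130
```

Every row carries the mean of its own hour, so ifelse only has to pick between that value and the existing profit.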

Or you could use replace, which overwrites only the NA positions:

  shop.data %>%
    group_by(hour) %>%
    mutate(profit = replace(profit, is.na(profit), mean(profit, na.rm = TRUE)))
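
A third variant, not from the original answer, is dplyr's coalesce(), which returns the first non-missing value at each position. The sketch below assumes the same example data as in the question; the result name filled is chosen here for illustration, and the trailing ungroup() is an optional extra step so that later operations are not applied per hour by accident.

```r
library(dplyr)

# Same example data as in the question
shop.data <- data.frame(
  day    = c(1, 1, 2, 2, 3, 3),
  hour   = c(8, 16, 8, 16, 8, 16),
  profit = c(100, 200, 50, 60, NA, NA)
)

filled <- shop.data %>%
  group_by(hour) %>%
  # coalesce() keeps profit where it is present and otherwise falls back
  # to the mean of the non-missing profits in the same hour
  mutate(profit = coalesce(profit, mean(profit, na.rm = TRUE))) %>%
  ungroup()

filled$profit
# 100 200 50 60 75 130
```

As with the two approaches above, an hour whose profits are all NA would get NaN, because the mean of an empty set of values is NaN in R.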
