Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Count total missing values by group?

EDIT: input

I'm very new to this.

My problem is similar to the one in this question: group by and then count missing variables?

Taking the input data from that question:

df1 <- data.frame(
  Z = sample(LETTERS[1:5], size = 10000, replace = TRUE),
  X1 = sample(c(1:10, NA), 10000, replace = TRUE),
  X2 = sample(c(1:25, NA), 10000, replace = TRUE),
  X3 = sample(c(1:5, NA), 10000, replace = TRUE))

As one user proposed there, it's possible to use summarise_each to count the missing values per column within each group:

df1 %>% 
  group_by(Z) %>% 
  summarise_each(funs(sum(is.na(.))))
#Source: local data frame [5 x 4]
#
#       Z    X1    X2    X3
#  (fctr) (int) (int) (int)
#1      A   169    77   334
#2      B   170    77   316
#3      C   159    78   348
#4      D   181    79   326
#5      E   174    69   341  

However, I would like to get only the total number of missing values per group, summed across all columns.

I also tried the approach from this question, but it didn't work: R count NA by group

Ideally, it should give me something like:

#       Z    sumNA 
#  (fctr)   (int) 
#1      A    580
#2      B    563
#3      C    585
#4      D    586
#5      E    584  

Thanks in advance.

1 Answer

You can use a tidyverse approach: group the data, then sum is.na() over the column within each group.

library(tidyverse)
# Sample data: five groups, three rows each, with NAs in groups "b" and "d"
dat <- data.frame(group = rep(c("a", "b", "c", "d", "g"), 3),
                  y = rep(c(1, NA, 2, NA, 3), 3))


dat %>% 
  group_by(group) %>% 
  summarise(sumNA = sum(is.na(y)))

Output:

  group sumNA
  <fct> <int>
1 a         0
2 b         3
3 c         0
4 d         3
5 g         0
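
A base R cross-check of the same count may be useful for readers who don't have the tidyverse installed. This is only a sketch that reuses the sample data above; the name na_by_group is made up for illustration. tapply() applies sum() to the logical vector is.na(y) within each group.

```r
# Same sample data as above
dat <- data.frame(group = rep(c("a", "b", "c", "d", "g"), 3),
                  y = rep(c(1, NA, 2, NA, 3), 3))

# is.na() marks missing entries; tapply() sums those marks per group
na_by_group <- tapply(is.na(dat$y), dat$group, sum)
na_by_group
```

The result is a named vector with one count per group, which should agree with the sumNA column above.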

Edit

However, if you have more than one column, you can use summarize_all, or summarize_at if you'd like to specify the columns (thank you @bschneidr for the comment). This gives one NA count per column and group:

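# Sample data
set.seed(123)
dat <- data.frame(group = sample(letters[1:4], 10, replace = TRUE),
                  x = sample(c(1, NA), 10, replace = TRUE),
                  y = sample(c(1, NA), 10, replace = TRUE),
                  z = sample(c(1, NA), 10, replace = TRUE))

# funs() was deprecated in dplyr 0.8.0; a named list with a lambda replaces it.
# The list name "NA" becomes the suffix of each output column (x_NA, y_NA, z_NA).
dat %>%
  group_by(group) %>%
  summarize_all(list("NA" = ~ sum(is.na(.))))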

# A tibble: 4 x 4
  group  x_NA  y_NA  z_NA
  <fct> <int> <int> <int>
1 a         1     1     0
2 b         3     2     2
3 c         0     1     1
4 d         1     4     2
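
Neither snippet above returns the single per-group total asked for in the question. One possible way to get it, sketched here under assumptions, is to count the NAs in each row first and then sum those row counts within each group. The data frame toy and the names value_cols, n_na, and na_totals are made up for illustration; Z and sumNA follow the question.

```r
library(dplyr)

# Small hypothetical data set so the result can be checked by hand
toy <- data.frame(
  Z  = c("A", "A", "B", "B", "B"),
  X1 = c(1, NA, NA, 4, 5),
  X2 = c(NA, NA, 3, 4, NA),
  X3 = c(1, 2, 3, NA, NA)
)

# Columns to scan for missing values: everything except the grouping column
value_cols <- setdiff(names(toy), "Z")

# is.na() on a data frame gives a logical matrix; rowSums() counts TRUEs per row
toy$n_na <- rowSums(is.na(toy[value_cols]))

# Sum the per-row counts within each group to get one total per group
na_totals <- toy %>%
  group_by(Z) %>%
  summarise(sumNA = sum(n_na))

na_totals
```

For this toy data the totals are 3 for group A and 4 for group B. Applied to df1 from the question, the same steps should give one sumNA value per level of Z.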
