dplyr - Check if all the elements in the Vector are available in the groups in R data frame

Question

asked Jan 27, 2021 in Technique by 深蓝

I have a data frame in R as follows:

df <- data.frame("location" = c("IND","IND","IND","US","US","US"), type = c("butter","milk","cheese","milk","cheese","yogurt"), quantity = c(2,3,4,5,6,7))

I have a vector as follows:

typeVector <- c("butter","milk","cheese","yogurt")

I need to check whether all four types in the vector are present in the data frame for each location group. If any type is missing from a group, I need to add a row for that location and the missing type, with quantity set to 0.

This is my expected output:

dfOutput <- data.frame("location" = c("IND","IND","IND","IND","US","US","US","US"), type = c("butter","milk","cheese","yogurt","butter","milk","cheese","yogurt"), quantity = c(2,3,4,0,0,5,6,7))

How can I achieve this in R using the dplyr package?
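If you only need the "check" part first, i.e. which required types each location is missing, a minimal dplyr sketch could look like the following. The names `check`, `all_present` and `missing` are illustrative choices, not part of the question.

```r
library(dplyr)

# Data from the question
df <- data.frame(location = c("IND","IND","IND","US","US","US"),
                 type = c("butter","milk","cheese","milk","cheese","yogurt"),
                 quantity = c(2,3,4,5,6,7))
typeVector <- c("butter","milk","cheese","yogurt")

# Per location: are all required types present, and which ones are missing?
check <- df %>%
  group_by(location) %>%
  summarise(all_present = all(typeVector %in% type),
            missing = paste(setdiff(typeVector, type), collapse = ", "),
            .groups = "drop")
print(check)
```

Here both locations report `all_present = FALSE`, with IND missing yogurt and US missing butter.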

1 Answer

深蓝 · Answer 1 · 2021-01-26T20:29:50+0000

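library(dplyr)
distinct(df, location) %>%                          # one row per location
  tidyr::crossing(type = typeVector) %>%            # every location/type pair
  full_join(df, ., by = c("location", "type")) %>%  # pairs missing from df get NA quantity
  ungroup() %>%                                     # no-op here; nothing is grouped
  mutate(quantity = coalesce(quantity, 0))          # replace those NAs with 0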
#   location   type quantity
# 1      IND butter        2
# 2      IND   milk        3
# 3      IND cheese        4
# 4       US   milk        5
# 5       US cheese        6
# 6       US yogurt        7
# 7      IND yogurt        0
# 8       US butter        0
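tidyr also provides complete(), which performs the same expand, join and fill sequence in one call. This is a swapped-in alternative, not the method used in the answer above; it assumes a tidyr version whose complete() accepts named value vectors such as `type = typeVector` and a `fill = list(...)` argument. The arrange() step is an addition that orders the types as in typeVector, so the rows match the question's expected output.

```r
library(dplyr)
library(tidyr)

df <- data.frame(location = c("IND","IND","IND","US","US","US"),
                 type = c("butter","milk","cheese","milk","cheese","yogurt"),
                 quantity = c(2,3,4,5,6,7))
typeVector <- c("butter","milk","cheese","yogurt")

# complete() expands location x typeVector, keeps the original rows,
# and fills quantity with 0 in the newly created rows.
res <- df %>%
  complete(location, type = typeVector, fill = list(quantity = 0)) %>%
  arrange(location, match(type, typeVector))  # order types as in typeVector
print(res)
```

Like coalesce(), `fill` also replaces NAs that were already in the data, so the same caveat about pre-existing NAs applies.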

Steps:

Create a temporary frame containing every combination of location and the types in typeVector:

distinct(df, location) %>%
  tidyr::crossing(type = typeVector)
# # A tibble: 8 x 2
#   location type  
#   <chr>    <chr> 
# 1 IND      butter
# 2 IND      cheese
# 3 IND      milk  
# 4 IND      yogurt
# 5 US       butter
# 6 US       cheese
# 7 US       milk  
# 8 US       yogurt
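For comparison, the same grid can be built in base R with expand.grid(). This is an alternative sketch, not part of the answer; `grid` is an illustrative name. Note that expand.grid() varies its first argument fastest, so the row order differs from crossing(), which sorts the values.

```r
df <- data.frame(location = c("IND","IND","IND","US","US","US"),
                 type = c("butter","milk","cheese","milk","cheese","yogurt"),
                 quantity = c(2,3,4,5,6,7))
typeVector <- c("butter","milk","cheese","yogurt")

# Every combination of the distinct locations and the required types
grid <- expand.grid(location = unique(df$location),
                    type = typeVector,
                    stringsAsFactors = FALSE)
print(grid)
```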

Join this back onto the original data; the new rows (combinations absent from df) get NA for quantity:

... %>%
  full_join(df, ., by = c("location", "type"))
#   location   type quantity
# 1      IND butter        2
# 2      IND   milk        3
# 3      IND cheese        4
# 4       US   milk        5
# 5       US cheese        6
# 6       US yogurt        7
# 7      IND yogurt       NA
# 8       US butter       NA

Replace these NAs with 0 using mutate() and coalesce(). (Note: coalesce() replaces every NA in quantity, so if the original data already contains NA quantities that you want to keep as NA, this step needs to be adjusted.)
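One way to adjust it, sketched under the assumption that pre-existing NAs should be kept: build only the missing pairs with anti_join(), give those a quantity of 0, and append them with bind_rows(). This is a swapped-in technique, not the answer's own. The NA in the IND cheese row below is hypothetical test data added for illustration, and `added` and `out` are illustrative names.

```r
library(dplyr)

# Question data, with a hypothetical pre-existing NA (IND cheese)
df <- data.frame(location = c("IND","IND","IND","US","US","US"),
                 type = c("butter","milk","cheese","milk","cheese","yogurt"),
                 quantity = c(2,3,NA,5,6,7))
typeVector <- c("butter","milk","cheese","yogurt")

# Only the location/type pairs that are absent from df, with quantity 0
added <- distinct(df, location) %>%
  tidyr::crossing(type = typeVector) %>%
  anti_join(df, by = c("location", "type")) %>%
  mutate(quantity = 0)

# Append them; the original rows, including the NA, are left untouched
out <- bind_rows(df, added)
print(out)
```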
I tend to call ungroup() at the end of grouped pipelines. Nothing is grouped in this pipeline, so it is not necessary here, but if you forget that a frame is still grouped and do further work on it, you may get different results, or at least slightly less efficient code.
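To illustrate the kind of difference a leftover grouping can make, here is a small sketch (the `share` column is hypothetical, not from the question): the same mutate() computes per-location shares on a grouped frame and overall shares once the frame is ungrouped.

```r
library(dplyr)

df <- data.frame(location = c("IND","IND","IND","US","US","US"),
                 type = c("butter","milk","cheese","milk","cheese","yogurt"),
                 quantity = c(2,3,4,5,6,7))

grouped <- df %>% group_by(location)

# Still grouped: sum(quantity) is taken within each location
share_grouped <- grouped %>% mutate(share = quantity / sum(quantity))

# Ungrouped: sum(quantity) is taken over all rows
share_ungrouped <- grouped %>% ungroup() %>% mutate(share = quantity / sum(quantity))

print(share_grouped$share)
print(share_ungrouped$share)
```

The grouped shares sum to 1 within each location (2 in total), while the ungrouped shares sum to 1 overall.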
