Short version
How to do the operation
df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two)
in a more natural way?
Long version
Given a data frame
df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"),
value = c(3,2,1,2,3,1,22),
itemid = c(1:6, 6))
for many itemid and groupid pairs we have a value, for some itemids
there are groupids where there is no value. I want to add a default
value for those cases. E.g. for the itemid 1 and groupid "two" there
is no value, I want to add a row where this gets a default value.
The following tidyr code achieves this, but it feels like a strange
way to do it (the default value added here is 0).
df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two)
I am looking for suggestions on how to do this in a more natural way.
Since in some weeks looking at the above code I would likely be confused
about its effect I wrote a function wrapping it:
#' Add default values for missing groups
#'
#' Given data about items where each item is identified by an id, and every
#' item can have a value in every group; add a default value for all groups
#' where an item doesn't have a value yet.
add_default_value <- function(data, id, group, value, default) {
id = as.character(substitute(id))
group = as.character(substitute(group))
value = as.character(substitute(value))
groups <- unique(as.character(data[[group]]))
# spread checks that the columns outside of group and value uniquely
# determine the row. Here we check that that already is the case within
# each group using only id. I.e. there is no repeated (id, group).
id_group_cts <- data %>% group_by_(id, group) %>% do(data.frame(.ct = nrow(.)))
if (any(id_group_cts$.ct > 1)) {
badline <- id_group_cts %>% filter(.ct > 1) %>% top_n(1, .ct)
stop("There is at least one (", id, ", ", group, ")",
" combination with two members: (",
as.character(badline[[id]]), ", ", as.character(badline[[group]]), ")")
}
gather_(spread_(data, group, value, fill = default), group, value, groups)
}
Last note: reason for wanting this is, my groups are ordered (week1, week2, ...)
and I am looking to have every id have a value in every group so that after
sorting the groups per id I can use cumsum to get a weekly running total that
is also shown in the weeks where the running total didn't increase.
See Question&Answers more detail:
os