I have a data set like this:
id <- rep(c("A", "B", "C", "D", "E"), 5)
year <- rep(2001:2005, each = 5)
status <- c(0, 0, 2, 0, 4, 0, 0, 3, 0, 1, 0, 4, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 4)
dt <- data.frame(year, id, status)
Notice that in year 2003, ids B and D have status > 0, while in every other year their status is 0.
My objective is to find the ids whose status is > 1 in 2003 and 0 in all other years, and to keep their observations. If an id does not have observations for every year, I will not consider it, even though in this data every id is present in all years.
What I did is a long process and not efficient:
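library(dplyr)
library(data.table)
# ids with status > 1 in 2003
id1 <- dt %>% filter(year == 2003 & status > 1)
id1 <- id1[["id"]]
dt1 <- dt[dt$id %in% id1, ]
# look at the remaining years, treating status 1 as 0
dt2 <- dt1 %>% filter(year != 2003)
dt2 <- dt2 %>% mutate(st2 = case_when(status == 1 ~ 0, TRUE ~ status))
# flag ids whose recoded status takes a single value across those years
dt2 <- setDT(dt2)[, fact := +(uniqueN(st2) == 1), id]
dt2 <- dt2 %>% filter(fact == 1) %>% filter(st2 == 0)
id2 <- dt2[["id"]]
# keep all rows for the ids that passed both checks
dt <- dt1[dt1$id %in% id2, ]
rm(id1, id2, dt1, dt2)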
I think this gives me my desired output, but it is not efficient for repetitive work. I would really appreciate your help finding a nicer way to do this.
Note: I am new to R and programming, so apologies for the unorganized question.
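For reference, the rule stated above (status > 1 in 2003, status 0 in every other year, and all years present for the id) could perhaps be written as a single grouped filter. This is only a sketch, assuming dplyr is installed; `all_years` and `result` are names made up here for illustration. It also takes the strict reading of the rule, in which a status of 1 outside 2003 does not count as 0, which differs from the recoding step in the code above.

```r
library(dplyr)

# Same example data as in the question
id <- rep(c("A", "B", "C", "D", "E"), 5)
year <- rep(2001:2005, each = 5)
status <- c(0, 0, 2, 0, 4, 0, 0, 3, 0, 1, 0, 4, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 4)
dt <- data.frame(year, id, status)

# Hypothetical helper: every year that appears anywhere in the data
all_years <- unique(dt$year)

result <- dt %>%
  group_by(id) %>%
  filter(
    all(all_years %in% year),          # the id has a row for every year
    any(year == 2003 & status > 1),    # status > 1 in 2003
    all(status[year != 2003] == 0)     # status is exactly 0 in all other years
  ) %>%
  ungroup()
```

Each condition evaluates to a single TRUE or FALSE per id, so an id is either kept with all of its rows or dropped entirely. If a status of 1 in the other years should be tolerated, as in the recoding above, the last condition could be relaxed to `all(status[year != 2003] <= 1)`.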
Thanks for your help!!!!
question from:
https://stackoverflow.com/questions/65836946/selecting-rows-conditioned-on-other-columns-of-data-frame-in-r