Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


conditional statements - Delete duplicated rows in R with conditions in other columns

This is a small subset of my data:

Id var1 var2
1   POS NA
1   NA  NEG
2   NEG NA
2   NA  NEG
3   POS NA
3   NA  NEG
4   POS POS
5   POS NA

My desired output:

Id var1 var2
1   POS  NEG
2   NEG  NEG
3   POS  NEG
4   POS  POS
5   POS  NA

I would simply like to remove the duplicated Ids and keep one row per unique Id, with the non-missing value in var1 and var2. Does anyone see how to do this? Help would be greatly appreciated. Thank you!

Question from: https://stackoverflow.com/questions/65940513/delete-duplicated-rows-in-r-with-conditions-in-other-columns


1 Answer


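You could try a solution with na.omit. This function drops NA values from a vector, so applying it within each group leaves only the non-missing value of each column. The examples below assume your data frame is named df.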
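To try the approaches below, the sample from the question can be rebuilt as a data frame. This is only a minimal sketch: it assumes var1 and var2 are character columns (not factors) and that the missing entries are real NA values rather than the string "NA".

```r
# Sample data from the question; a bare NA inside a character vector
# is stored as a missing character value.
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE  # assumption: keep the columns as character
)
```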

In base R:

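aggregate(. ~ Id,
          data = df,
          FUN = function(x) {
            y <- na.omit(x)           # drop NA values within the group
            y[length(y) == 0] <- NA   # all-NA group: return NA, not an empty vector
            y
          },
          na.action = na.pass)        # keep rows containing NA so FUN can see them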

Note that y[length(y) == 0] <- NA is included so that a group with no non-missing value (such as var2 for Id 5) returns NA rather than character(0).
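The effect of that line can be checked in isolation. The sketch below uses made-up vectors (y and z are illustrative names, not part of the answer) to show what happens when a group is entirely NA and when it is not.

```r
# na.omit() on an all-NA vector leaves nothing behind.
y <- na.omit(c(NA_character_, NA_character_))
length(y)                 # 0

# The logical index is TRUE only when y is empty; assigning through it
# extends y to length one and fills it with NA.
y[length(y) == 0] <- NA
length(y)                 # 1
is.na(y)                  # TRUE

# With at least one value present the index is FALSE and nothing changes.
z <- na.omit(c(NA, "NEG"))
z[length(z) == 0] <- NA
as.character(z)           # "NEG"
```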


With dplyr:

library(dplyr)

df %>% 
  group_by(Id) %>%
  summarise(across(everything(), ~ first(na.omit(.))))

Here first() returns the first value left in each group once the NAs are removed, or NA if nothing is left. across(everything(), ...) applies this to every non-grouping column.


With data.table:

library(data.table)

setDT(df)[, lapply(.SD, na.omit), by = Id]
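In the data.table call above, na.omit returns a zero-length vector for a group whose values are all NA (var2 for Id 5). data.table may fill it with NA and warn about it, depending on the version. One way to sidestep this is a small helper that always returns exactly one value per group. The sketch below is an alternative to the answer's code rather than part of it; first_non_na is a hypothetical name, and the sample data is rebuilt so the block runs on its own.

```r
# Hypothetical helper: first non-missing value of x, or NA if there is none.
first_non_na <- function(x) x[!is.na(x)][1]

# Sample data from the question, assuming character columns.
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)

# Base R: collapse var1 and var2 to one row per Id.
res <- aggregate(df[c("var1", "var2")], by = df["Id"], FUN = first_non_na)
res

# The same helper could be used with data.table:
# setDT(df)[, lapply(.SD, first_non_na), by = Id]
```

Because the helper returns a single value even for an all-NA group, every group yields one row without relying on padding.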
