r - How to merge two different groupings if they are not disjoint with dplyr

Question

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Suppose that I have two sets of identifiers id1 and id2 in a data frame. How can I create a new identifier id3 that works as follows:

I treat id1 as the stricter key, so observations are grouped first by id1 and then by id2. If two sets of rows with different id2 values have some rows that share the same id1, both sets should get the same value of id3 (the exact value of id3 doesn't matter much).

# id3 here is the desired result, shown for illustration
df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7),
                 id3 = c(1, 1, 2, 2, 2, 3))

Rows 1 and 2 are grouped together because they have the same id1. Rows 3, 4 and 5 are grouped together because 3 and 4 have the same id1 and 4 and 5 have the same id2.

Can someone help? I would rather have a solution with dplyr that encompasses a general case in which there is an arbitrary number of possible values in the id columns.

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:17:13+0000

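This is a graph theory problem. Each distinct value of id1 and each distinct value of id2 is a separate node, and each row of df is a link between one id1 node and one id2 node. You want to find which connected component (cluster) each id belongs to. Because the graph is built as undirected, weak and ordinary connectivity are the same thing here. The prefixes in the code keep an id1 value and an id2 value that happen to be equal (such as 2) from being treated as the same node.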

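library(igraph)
library(dplyr)  # needed for %>%, mutate() and select()

# Prefix the ids so that equal values of id1 and id2 become distinct nodes
df <- df %>% mutate(from = paste0('id1', '_', id1), to = paste0('id2', '_', id2))
# Each row is an undirected edge between its id1 node and its id2 node
dg <- graph_from_data_frame(df %>% select(from, to), directed = FALSE)
# Look up each row's component number by the name of its id1 node
df <- df %>% mutate(id3 = components(dg)$membership[from])
df %>% select(id1, id2, id3)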

#>   id1 id2 id3
#> 1   1   4   1
#> 2   1   3   1
#> 3   2   1   2
#> 4   2   2   2
#> 5   5   2   2
#> 6   6   7   3
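
Since the question asks for dplyr specifically, here is a minimal sketch of the same idea without igraph. It is not the method from the answer above: it swaps the graph library for a simple label-propagation loop, in which every row starts with its own label and the smallest label is repeatedly spread within each id1 group and then within each id2 group until nothing changes. The helper name merge_groups is hypothetical, and the sketch assumes the columns are named id1 and id2 as in the question.

```r
library(dplyr)

df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7))

# Hypothetical helper (not part of dplyr or igraph)
merge_groups <- function(data) {
  # Start with one label per row
  out <- data %>% mutate(id3 = row_number())
  repeat {
    # Spread the smallest label within id1 groups, then within id2 groups
    nxt <- out %>%
      group_by(id1) %>% mutate(id3 = min(id3)) %>%
      group_by(id2) %>% mutate(id3 = min(id3)) %>%
      ungroup()
    # Stop once a full pass changes nothing
    if (identical(nxt$id3, out$id3)) break
    out <- nxt
  }
  # Renumber the surviving labels as 1, 2, 3, ...
  out %>% mutate(id3 = dense_rank(id3))
}

res <- merge_groups(df)
res
```

For the example data this should give the same grouping as the igraph version (rows 1 and 2 together, rows 3 to 5 together, row 6 alone). The loop may need several passes on long chains of linked ids, so for large data the igraph approach is likely the better choice.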
