Suppose that I have two sets of identifiers, id1 and id2, in a data frame. How can I create a new identifier id3 that works as follows?
I consider id1 as the stricter key, so that observations are first grouped by id1 and then by id2. If two sets of rows with different values of id2 have some elements that share the same id1, these two sets should have the same value of id3 (the exact value in id3 doesn't matter much).
df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7),
                 id3 = c(1, 1, 2, 2, 2, 3))
Rows 1 and 2 are grouped together because they have the same id1. Rows 3, 4 and 5 are grouped together because rows 3 and 4 have the same id1 and rows 4 and 5 have the same id2.
Can someone help? I would rather have a solution with dplyr that covers the general case, in which there is an arbitrary number of possible values in the id columns.
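One way to read the rule above is as a connected-components problem: two rows belong to the same group if they can be linked through a chain of shared id1 or id2 values. The following is only a minimal base R sketch of that idea, not a definitive solution. The helper name link_ids is hypothetical, and the sketch assumes that neither column contains NA. It starts with every row in its own group and repeatedly pulls rows that share an id1 or an id2 down to the smallest label until nothing changes.

```r
# Hypothetical helper: group rows that are linked through shared id1 or id2 values.
# Assumes id1 and id2 have the same length and contain no NA.
link_ids <- function(id1, id2) {
  g <- seq_along(id1)  # start with each row in its own group
  repeat {
    # Give all rows with the same id1 the smallest label among them,
    # then do the same for rows with the same id2.
    new_g <- ave(g, id1, FUN = min)
    new_g <- ave(new_g, id2, FUN = min)
    if (all(new_g == g)) break  # no label changed, so the groups are stable
    g <- new_g
  }
  match(g, unique(g))  # relabel groups as 1, 2, 3, ... in order of appearance
}

df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7))
df$id3 <- link_ids(df$id1, df$id2)
df$id3
# 1 1 2 2 2 3
```

Because link_ids takes and returns plain vectors, it should also work inside a dplyr pipeline, for example as mutate(df, id3 = link_ids(id1, id2)). The loop can need several passes when groups are joined through long chains of alternating id1 and id2 matches, so for large data a graph library that computes connected components directly may be faster.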