Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
398 views
in Technique[技术] by (71.8m points)

r - Consolidate duplicate rows

I have a data frame where one column is species' names, and the second column is abundance values. Due to the sampling procedure, some species appear more than once (i.e., there is more than one row with Species X in it). I would like to consolidate those entries and sum their abundances.

For example, given this data frame:

set.seed(6)
df=data.frame(
  x=c("sp1","sp2","sp3","sp3","sp4","sp2","sp3"),
  y=rpois(7,2)); df

which produces:

    x y
1 sp1 2
2 sp2 4
3 sp3 1
4 sp3 1
5 sp4 3
6 sp2 5
7 sp3 5

I would like to instead produce:

    x y
1 sp1 2    
2 sp2 9     (5+4)
3 sp3 7     (5+1+1)
5 sp4 3

Thanks in advance for any help you can provide!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This works:

library(plyr)
ddply(df,"x",numcolwise(sum))

in words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (dd in ddply stands for "take a d ata frame as input, return a d ata frame")

Another, possibly clearer, approach:

aggregate(y~x,data=df,FUN=sum)

See quick/elegant way to construct mean/variance summary table for a related (slightly more complex) question.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...