I want to convert a dataframe from wide format to long format.
Here is a toy example:
mydata <- data.frame(ID = 1:5, ZA_1 = 1:5, ZA_2 = 5:1,
                     BB_1 = rep(3, 5), BB_2 = rep(6, 5), CC_7 = 6:2)
ID ZA_1 ZA_2 BB_1 BB_2 CC_7
1 1 5 3 6 6
2 2 4 3 6 5
3 3 3 3 6 4
4 4 2 3 6 3
5 5 1 3 6 2
Some variables will remain as they are (here only ID) and some will be transformed to long format (here all the other variables, which all end with _1, _2 or _7).
To transform it to long format I'm using data.table's melt and dcast, in a generic way that detects the variables automatically. Other solutions are welcome too.
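library(data.table)
setDT(mydata)
# columns WITHOUT a _1 ... _7 suffix are the id variables
idvars = grep("_[1-7]$", names(mydata), invert = TRUE)
temp <- melt(mydata, id.vars = idvars)
# split each name into its stem (var) and its suffix (measure), then go wide by stem
nuevo <- dcast(
  temp[, `:=`(var = sub("_[1-7]$", '', variable),
              measure = sub('.*_', '', variable), variable = NULL)],
  ... ~ var, value.var = 'value')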
ID measure BB CC ZA
1 1 3 NA 1
1 2 6 NA 5
1 7 NA 6 NA
2 1 3 NA 2
2 2 6 NA 4
2 7 NA 5 NA
3 1 3 NA 3
3 2 6 NA 3
3 7 NA 4 NA
4 1 3 NA 4
4 2 6 NA 2
4 7 NA 3 NA
5 1 3 NA 5
5 2 6 NA 1
5 7 NA 2 NA
As you can see, the columns are reordered alphabetically, but I would prefer to keep the original order as far as possible, for example by using the order in which each variable first appears.
ID ZA_1 ZA_2 BB_1 BB_2 CC_7
Should be
ID ZA BB CC
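One possible way to get this order, sketched here under the assumption that the suffix pattern "_[1-7]$" identifies every measured column: compute the stems in order of first appearance from the wide column names and pass them to data.table's setcolorder() after the dcast. The name stems is only illustrative, and idvars is taken as column names (value = TRUE) rather than positions.

```r
library(data.table)

mydata <- data.table(ID = 1:5, ZA_1 = 1:5, ZA_2 = 5:1,
                     BB_1 = rep(3, 5), BB_2 = rep(6, 5), CC_7 = 6:2)

# id columns by name (value = TRUE), so they can be reused for reordering
idvars <- grep("_[1-7]$", names(mydata), invert = TRUE, value = TRUE)

temp <- melt(mydata, id.vars = idvars)
temp[, `:=`(var = sub("_[1-7]$", "", variable),
            measure = sub(".*_", "", variable),
            variable = NULL)]
nuevo <- dcast(temp, ... ~ var, value.var = "value")

# stems in order of first appearance in the wide table: "ZA" "BB" "CC"
stems <- unique(sub("_[1-7]$", "", setdiff(names(mydata), idvars)))

# reorder by reference: id columns, the suffix column, then the stems
setcolorder(nuevo, c(idvars, "measure", stems))
names(nuevo)
```

Because unique() keeps the first occurrence of each value, the stems come out in the order of the wide table rather than alphabetically.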
I don't mind whether the idvars columns all come together at the beginning or stay in their original positions. For example,
ID ZA_1 ZA_2 TEMP BB_1 BB_2 CC_2 CC_1
would be
ID ZA TEMP BB CC
or
ID TEMP ZA BB CC
I prefer the last option.
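Both target orders can be derived from the wide column names alone. The following is a minimal sketch in base R for the second example; the names wide_names, is_measure, in_place and ids_first are illustrative, and the suffix pattern is assumed to be the same "_[1-7]$" as above.

```r
wide_names <- c("ID", "ZA_1", "ZA_2", "TEMP", "BB_1", "BB_2", "CC_2", "CC_1")

# TRUE for columns that carry a _1 ... _7 suffix
is_measure <- grepl("_[1-7]$", wide_names)

# stems in order of first appearance
stems <- unique(sub("_[1-7]$", "", wide_names[is_measure]))

# first option: every column keeps the position of its first appearance
in_place <- unique(sub("_[1-7]$", "", wide_names))

# last (preferred) option: id columns first, then the stems
ids_first <- c(wide_names[!is_measure], stems)

print(in_place)
print(ids_first)
```

Either vector, with the suffix column ("measure" in the code above) added at the desired position, could then be handed to setcolorder() on the long table.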
Another problem is that everything gets transformed to character.
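In the toy example, the column that ends up as character appears to be measure, because sub() returns a character vector; melt() also coerces the value column to a single common type, which becomes character as soon as one measured column is character. A possible workaround, sketched on a small hypothetical long table (long and cols are made-up names), is to convert the affected columns back with type.convert():

```r
library(data.table)

# hypothetical long result in which every non-id column came out as character
long <- data.table(ID = c(1L, 1L, 2L),
                   measure = c("1", "2", "7"),
                   ZA = c("1", "5", NA),
                   BB = c("3", "6", NA))

# let R pick a suitable type for each column; as.is = TRUE avoids factors
cols <- setdiff(names(long), "ID")
long[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]

sapply(long, class)
```

This only recovers types that can be inferred from the text; columns holding genuinely mixed content would stay character.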