I have two multivariate time series x and y, both covering approximately the same range in time (one starts two years before the other, but they end on the same date). Both series have missing observations in the form of empty columns next to the date column, and also in the sense that one of the series has several dates that are not found in the other, and vice versa.
I would like to create a data frame (or similar) with a column that lists all the dates found in x OR y, without duplicate dates. For each date (row), I would like to horizontally stack the observations from x next to the observations from y, with NA's filling the missing cells. Example:
>x
"1987-01-01" 7.1 NA 3
"1987-01-02" 5.2 5 2
"1987-01-06" 2.3 NA 9
>y
"1987-01-01" 55.3 66 45
"1987-01-03" 77.3 87 34
# result I would like
"1987-01-01" 7.1 NA 3 55.3 66 45
"1987-01-02" 5.2 5 2 NA NA NA
"1987-01-03" NA NA NA 77.3 87 34
"1987-01-06" 2.3 NA 9 NA NA NA
What I have tried: with the zoo package, I've tried the merge.zoo method, but this seems to just stack the two series next to each other, with the dates (as numbers, e.g. "1987-01-02" shown as 6210) from each series appearing in two separate columns.
I've sat for hours getting almost nowhere, so all help is appreciated.
EDIT: some code included below as per suggestion from Soumendra
atcoa <- read.csv(file = "ATCOA_full_adj.csv", header = TRUE)
atcob <- read.csv(file = "ATCOB_full_adj.csv", header = TRUE)
atcoa$date <- as.Date(atcoa$date)
atcob$date <- as.Date(atcob$date)
# only number of observations and the observations themselves differ
>str(atcoa)
'data.frame': 6151 obs. of 8 variables:
$ date :Class 'Date' num [1:6151] 6210 6213 6215 6216 6217 ...
$ max : num 4.31 4.33 4.38 4.18 4.13 4.05 4.08 4.05 4.08 4.1 ...
$ min : num 4.28 4.31 4.28 4.13 4.05 3.95 3.97 3.95 4 4.02 ...
$ close : num 4.31 4.33 4.31 4.15 4.1 3.97 4 3.97 4.08 4.02 ...
$ avg : num NA NA NA NA NA NA NA NA NA NA ...
$ tot.vol : int 877733 89724 889437 1927113 3050611 846525 1782774 1497998 2504466 5636999 ...
$ turnover : num 3762300 388900 3835900 8015900 12468100 ...
$ transactions: int 12 9 24 17 31 26 34 35 37 33 ...
>atcoa[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1 1987-01-02 4.31 4.28 4.31 NA 877733 3762300 12
# using timeSeries package
ts.atcoa <- timeSeries::as.timeSeries(atcoa, format = "%Y-%m-%d")
ts.atcob <- timeSeries::as.timeSeries(atcob, format = "%Y-%m-%d")
>str(ts.atcoa)
Time Series:
Name: object
Data Matrix:
Dimension: 6151 7
Column Names: a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
Row Names: 1970-01-01 01:43:30 ... 1970-01-01 04:12:35
Positions:
Start: 1970-01-01 01:43:30
End: 1970-01-01 04:12:35
With:
Format: %Y-%m-%d %H:%M:%S
FinCenter: GMT
Units: a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
Title: Time Series Object
Documentation: Wed Aug 17 13:00:50 2011
>ts.atcoa[1:1, ]
GMT
a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1970-01-01 01:43:30 4.31 4.28 4.31 NA 877733 3762300 12
# The following will create an object of class "data frame" and mode "list", which contains observations for the days mutual for the two series
>ts.atco <- timeSeries::merge(atcoa, atcob) # produces same result as base::merge, apparently
>ts.atco[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions b.max b.min b.close b.avg b.tot.vol b.turnover b.transactions
1 1989-08-25 7.92 7.77 7.79 NA 269172 2119400 19 7.69 7.56 7.64 NA 81176693 593858000 12
EDIT: problem solved by (using zoo package)
atcoa <- read.zoo(read.csv(file = "ATCOA_full_adj.csv", header = TRUE))
atcob <- read.zoo(read.csv(file = "ATCOB_full_adj.csv", header = TRUE))
names(atcoa) <- c("a.max", "a.min", "a.close",
"a.avg", "a.tot.vol", "a.turnover", "a.transactions")
names(atcob) <- c("b.max", "b.min", "b.close",
"b.avg", "b.tot.vol", "b.turnover", "b.transactions")
atco <- merge.zoo(atcoa, atcob)
Thank you all for your help.
See Question&Answers more detail:
os