I am wanting to conduct granger causality between several different pairs of variables for country year data. I seem to be able to get it working outside of a loop/function, but am having some trouble integrating it with the remainder of my code. I've provided code for a minimal working example and desired output below. Any help would be greatly appreciated. Thankyou in advance!
EDIT: I should have specified more clearly in the original post. Each column in this generated data contains time series data for multiple countries. I want to average across the countries and then perform granger on those variables) using the same method detailed below
Code to Simulate Time Series Data
library(tidyindexR)
library(tidyverse)
library(dplyr)
library(reshape2)
library(lmtest)
simulateCountryData = function(N=90, NEACH = 30, SEED=100){
# Set seed for reproducibility
set.seed(SEED)
#"""
# This function simulates
# N is the total number of observations you want across all subjecs/geocodes
# NEACH is the number of observations per geocode/sample
# see https://blog.revolutionanalytics.com/2009/02/how-to-choose-a-random-number-in-r.html for more reources
#"""
variableOne<-rnorm(N,sample(1:100, NEACH),0.5)
variableOne[variableOne<0]<-0
variableTwo<-rnorm(N,sample(100:1, NEACH),0.5)
variableTwo[variableTwo<0]<-0
variableThree<-rnorm(N,sample(1:100, NEACH),0.5)
variableThree[variableTwo<0]<-0
variableFour<-rnorm(N,sample(1:100, NEACH),0.5)
variableFour[variableFour<0]<-0
variableFive<-rnorm(N,sample(1:100, NEACH),0.5)
variableFive[variableFive<0]<-0
# Defining the subjects/geocodes
# Makes a sequence of 1:N/NEACH such that the geocode appears NEACH times (i.e. 1,60/3,each=3)
geocodeNum<-factor(rep(seq(1,N/NEACH),each=NEACH))
# Defining the years/time periods to be repeated
# Lists the tear 3x each (3 years per subject/geocode)
year<-rep(seq(2000,2000+NEACH-1,1),N/NEACH)
# Putting it all together
AllData<-data.frame(geocodeNum,
year,
variableOne,
variableTwo,
variableThree,
variableFour,
variableFive)
return(AllData)
}
mySimData = simulateCountryData()
Code to extract pearson correlations and format in dataframe
credit goes to the-mad-statter who kindly provided the answer here
# matrix of unique pairs coded as numeric
mx_combos <- combn(1:length(myVariables), 2)
# list of unique pairs coded as numeric
ls_combos <- split(mx_combos, rep(1:ncol(mx_combos), each = nrow(mx_combos)))
# for each pair in the list, create a 1 x 4 dataframe
ls_rows <- lapply(ls_combos, function(p) {
# lookup names of variables
v1 <- myVariables[p[1]]
v2 <- myVariables[p[2]]
# perform the cor.test()
htest <- cor.test(mySimData[[v1]], mySimData[[v2]])
# record pertinent info in a dataframe
data.frame(Var1 = v1,
Var2 = v2,
Pval = htest$p.value,
Rval = unname(htest$estimate))
})
# row bind the list of dataframes
my_output = dplyr::bind_rows(ls_rows)
This function produces the following dataframe:
Var1 Var2 Pval Rval
variableFive variableFour 0.749283286 -0.03415477
variableFive variableOne 0.865119584 -0.01815724
variableFive variableThree 0.004120881 -0.29960240
variableFive variableTwo 0.024713897 0.23666723
variableFour variableOne 0.249864859 0.12254729
variableFour variableThree 0.587474758 0.05794634
variableFour variableTwo 0.624329543 0.05231733
variableOne variableThree 0.056216554 -0.20200708
variableOne variableTwo 0.368598424 -0.09589547
variableThree variableTwo 0.056192121 -0.20202672
I would like to add the p values of a granger causality analysis between each pairwise variable in the my_output
dataframe as a additional column. At the moment, I can extract the pvalues for a specific pairwise comparison, but am having trouble figuring out how do that for all pairwise variables in an easy way (as in the real example there are many any more variables). If someone could point me n the right direction or provide a solution, that would be amazing.
Code for granger Analysis
This code seems to average data for each year across countries and then perform granger causality between the variables. I just need a bit of help figuring how how to change this to work with the my_output
dataframe.
a= mySimData %>%
select(geocodeNum, year,variableFive) %>%
group_by(year) %>%
summarize(mean(variableFive))
b = mySimData %>%
select(geocodeNum, year,variableTwo) %>%
group_by(year) %>%
summarize(mean(variableTwo))
c = left_join(a,b)
d = grangertest(c$`mean(variableFive)` ~ c$`mean(variableTwo)`)
d[2,4]
Desired output
Ideally, the output should look something like this (albeit with correct P values for the granger):
Var1 Var2 Pval Rval Granger
variableFive variableFour 0.749283286 -0.03415477 0.050
variableFive variableOne 0.865119584 -0.01815724 0.021
variableFive variableThree 0.004120881 -0.29960240 0.090
variableFive variableTwo 0.024713897 0.23666723 0.042
variableFour variableOne 0.249864859 0.12254729 0.050
variableFour variableThree 0.587474758 0.05794634 0.021
variableFour variableTwo 0.624329543 0.05231733 0.092
variableOne variableThree 0.056216554 -0.20200708 0.046
variableOne variableTwo 0.368598424 -0.09589547 0.072
variableThree variableTwo 0.056192121 -0.20202672 0.033
Thanks again for reading.