Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
226 views
in Technique[技术] by (71.8m points)

granger causality analysis for pairwise variables in dataframe using R

I am wanting to conduct granger causality between several different pairs of variables for country year data. I seem to be able to get it working outside of a loop/function, but am having some trouble integrating it with the remainder of my code. I've provided code for a minimal working example and desired output below. Any help would be greatly appreciated. Thankyou in advance!

EDIT: I should have specified more clearly in the original post. Each column in this generated data contains time series data for multiple countries. I want to average across the countries and then perform granger on those variables) using the same method detailed below

Code to Simulate Time Series Data

library(tidyindexR)
library(tidyverse)
library(dplyr)
library(reshape2)
library(lmtest)

simulateCountryData = function(N=90, NEACH = 30, SEED=100){
        
        # Set seed for reproducibility
        set.seed(SEED)
        
        #""" 
        # This function simulates 
        # N is the total number of observations you want across all subjecs/geocodes
        # NEACH is the number of observations per geocode/sample
        # see https://blog.revolutionanalytics.com/2009/02/how-to-choose-a-random-number-in-r.html for more reources
        #"""
        
        variableOne<-rnorm(N,sample(1:100, NEACH),0.5)
        variableOne[variableOne<0]<-0

        variableTwo<-rnorm(N,sample(100:1, NEACH),0.5)
        variableTwo[variableTwo<0]<-0
        
        variableThree<-rnorm(N,sample(1:100, NEACH),0.5)
        variableThree[variableTwo<0]<-0
        
        variableFour<-rnorm(N,sample(1:100, NEACH),0.5)
        variableFour[variableFour<0]<-0
        
        variableFive<-rnorm(N,sample(1:100, NEACH),0.5)
        variableFive[variableFive<0]<-0
        
        # Defining the subjects/geocodes
        # Makes a sequence of 1:N/NEACH such that the geocode appears NEACH times (i.e. 1,60/3,each=3)
        geocodeNum<-factor(rep(seq(1,N/NEACH),each=NEACH))
        
        # Defining the years/time periods to be repeated 
        # Lists the tear 3x each (3 years per subject/geocode)
        year<-rep(seq(2000,2000+NEACH-1,1),N/NEACH)
        
        # Putting it all together
        AllData<-data.frame(geocodeNum,
                            year,
                            variableOne,
                            variableTwo,
                            variableThree,
                            variableFour,
                            variableFive)
        
        return(AllData)
}

mySimData = simulateCountryData()

Code to extract pearson correlations and format in dataframe credit goes to the-mad-statter who kindly provided the answer here

# matrix of unique pairs coded as numeric
mx_combos <- combn(1:length(myVariables), 2)
# list of unique pairs coded as numeric
ls_combos <- split(mx_combos, rep(1:ncol(mx_combos), each = nrow(mx_combos)))
# for each pair in the list, create a 1 x 4 dataframe
ls_rows <- lapply(ls_combos, function(p) {
        # lookup names of variables
        v1 <- myVariables[p[1]]
        v2 <- myVariables[p[2]]
        # perform the cor.test()
        htest <- cor.test(mySimData[[v1]], mySimData[[v2]])
        # record pertinent info in a dataframe
        data.frame(Var1 = v1,
                   Var2 = v2,
                   Pval = htest$p.value,
                   Rval = unname(htest$estimate))
})
# row bind the list of dataframes

my_output = dplyr::bind_rows(ls_rows)

This function produces the following dataframe:

        Var1          Var2        Pval        Rval
variableFive  variableFour 0.749283286 -0.03415477
variableFive   variableOne 0.865119584 -0.01815724
variableFive variableThree 0.004120881 -0.29960240
variableFive   variableTwo 0.024713897  0.23666723
variableFour   variableOne 0.249864859  0.12254729
variableFour variableThree 0.587474758  0.05794634
variableFour   variableTwo 0.624329543  0.05231733
 variableOne variableThree 0.056216554 -0.20200708
 variableOne   variableTwo 0.368598424 -0.09589547

variableThree variableTwo 0.056192121 -0.20202672

I would like to add the p values of a granger causality analysis between each pairwise variable in the my_output dataframe as a additional column. At the moment, I can extract the pvalues for a specific pairwise comparison, but am having trouble figuring out how do that for all pairwise variables in an easy way (as in the real example there are many any more variables). If someone could point me n the right direction or provide a solution, that would be amazing.

Code for granger Analysis This code seems to average data for each year across countries and then perform granger causality between the variables. I just need a bit of help figuring how how to change this to work with the my_output dataframe.

a= mySimData %>% 
        select(geocodeNum, year,variableFive) %>%
        group_by(year) %>%
        summarize(mean(variableFive))

b = mySimData %>% 
        select(geocodeNum, year,variableTwo) %>%
        group_by(year) %>%
        summarize(mean(variableTwo))

c = left_join(a,b)

d = grangertest(c$`mean(variableFive)` ~ c$`mean(variableTwo)`)
d[2,4]

Desired output

Ideally, the output should look something like this (albeit with correct P values for the granger):

        Var1          Var2        Pval        Rval Granger
variableFive  variableFour 0.749283286 -0.03415477   0.050
variableFive   variableOne 0.865119584 -0.01815724   0.021
variableFive variableThree 0.004120881 -0.29960240   0.090
variableFive   variableTwo 0.024713897  0.23666723   0.042
variableFour   variableOne 0.249864859  0.12254729   0.050
variableFour variableThree 0.587474758  0.05794634   0.021
variableFour   variableTwo 0.624329543  0.05231733   0.092
variableOne variableThree 0.056216554 -0.20200708   0.046
variableOne   variableTwo 0.368598424 -0.09589547   0.072
variableThree   variableTwo 0.056192121 -0.20202672   0.033

Thanks again for reading.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can apply the function to all combinations in combn function itself, so no need to have a separate lapply call.

cols <- grep('variable', names(mySimData), value = TRUE)

result <- do.call(rbind, combn(cols, 2, function(p) {
  v1 <- p[1]
  v2 <- p[2]
  # perform the cor.test()
  htest <- cor.test(mySimData[[v1]], mySimData[[v2]])
  val1 <- aggregate(mySimData[[v1]], list(mySimData$year), mean)[[2]]
  val2 <- aggregate(mySimData[[v2]], list(mySimData$year), mean)[[2]]
  #perform grangertest
  mod <- lmtest::grangertest(val1, val2)
  # record pertinent info in a dataframe
  data.frame(Var1 = v1,
             Var2 = v2,
             Pval = htest$p.value,
             Rval = unname(htest$estimate), 
             Granger = mod[2, 4])
}, simplify = FALSE))

result

#            Var1          Var2     Pval     Rval Granger
#1    variableOne   variableTwo 0.368598 -0.09590  0.8309
#2    variableOne variableThree 0.056217 -0.20201  0.2831
#3    variableOne  variableFour 0.249865  0.12255  0.8860
#4    variableOne  variableFive 0.865120 -0.01816  0.8842
#5    variableTwo variableThree 0.056192 -0.20203  0.9049
#6    variableTwo  variableFour 0.624330  0.05232  0.7993
#7    variableTwo  variableFive 0.024714  0.23667  0.7060
#8  variableThree  variableFour 0.587475  0.05795  0.2510
#9  variableThree  variableFive 0.004121 -0.29960  0.2813
#10  variableFour  variableFive 0.749283 -0.03415  0.8396

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...