
Climate Change Model PowerPoint Presentation


Uploaded by gaetan in the Science & Technology category. Available for free download.


Climate Change Model Presentation Transcript

Slide 1 - Climate Change Models to Estimate and Forecast Temperature. Gaetan Lion, September 2021.
Slide 2 - Content: 1. Introduction; 2. Data; 3. Baseline trend models; 4. CO2 models; 5. Out-of-sample forecasts; 6. Replicating IPCC scenarios; 7. Granger Causality, VAR, IRFs; 8. VAR Forecast.
Slide 3 - 1. Introduction. This presentation discloses the modeling of global temperature* associated with, or caused by, a rising concentration of CO2 in parts per million (ppm). Other variables will also be explored and tested for inclusion within these Climate Change models. The objectives are: to assess the information imparted by CO2 concentration to a model estimating and predicting temperature; to test the accuracy of such models in fitting the historical temperature data and in forecasting temperature within an out-of-sample testing framework; to replicate the most recent IPCC scenarios; and to better understand the relationship between CO2 concentration and temperature and attempt to demonstrate causality of CO2 -> temperature. * Measured as the temperature anomaly over the 1850 – 1900 average global temperature.
Slide 4 - 2. Data. [Chart: NOAA vs NASA temperature anomaly in degrees Celsius, 1880 to 2020.] Annual temperature history from 1880 to 2020. The two overlapping series are from NASA and the NOAA. We will use the average of the two series. Temperature records are captured as a “temperature anomaly.” The latter represents the difference in temperature between a specific year and the average over the 1850 – 1900 period, when the level of industrialization and the CO2 concentration were much lower. The temperature anomaly is measured in degrees Celsius (equivalently, Centigrade).
Slide 5 - Annual CO2 concentration in parts per million from 1880 to 2020. Data from 1880 to 1958 is derived from ice core analysis, through a cooperative effort between three scientific teams from Australia and France. Data from 1958 to 2020 is from the NOAA.
Slide 6 - We understand that comparing two level variables, without detrending them, can lead to spurious correlations and regressions. However, when two level variables are cointegrated, the above caveat no longer applies. We will disclose cointegration testing for these two variables later. As observed, this scatter plot shows a pretty strong correlation between the two variables.
Slide 7 - The relationship between CO2 and temperature can be split over two periods. The first one (1880 – 1970), with CO2 concentration ranging from 290 to 325 ppm, is associated with a not so strong linear relationship between the two variables. The second one (1971 – 2020), with CO2 concentration ranging from 325 to close to 420 ppm, is associated with a very strong linear relationship. For the purpose of our modeling, we will not split the data, as the related regression parameters are pretty stable (intercept and slope of the regression equations shown on the scatter plots).
Slide 8 - Checking the Autocorrelation of the Residuals of the Ordinary Least Squares (OLS) Cointegration Regression: Temperature ~ CO2. Given that we are using level variables, the residual autocorrelation captured by the ACF and PACF graphs is reasonably low. At the outset, this suggests that these two variables (CO2 and temperature) may indeed be cointegrated. The PACF graph at the bottom is the one used to select the number of yearly lags for the unit root testing that confirms whether these residuals are indeed stationary (do not have a unit root). Even though within the PACF graph only lag 1 crosses the line of statistical significance ( > 0.2), we will use up to lag 4 to be more conservative.
Slide 9 - Testing the residuals of the OLS Cointegration Regression Temperature ~ CO2 for stationarity. We used 4 lags for each of the unit root tests. In each case, the respective unit root tests confirmed that the Cointegration Regression residuals were stationary. This confirmation allows us to proceed in modeling the relationship between CO2 and temperature using level variables, knowing that these two variables are explicitly cointegrated. Further residual model testing often includes testing for autocorrelation, heteroskedasticity, and normal distribution. However, any related residual issues do not bias the regression coefficients. They may affect the reliability of the regression coefficients' confidence intervals and their statistical significance. However, if such regression coefficients are associated with t-stats > 2.5 or 3.0, statistical significance is typically not an issue (even after adjusting with Robust Standard Errors). Additionally, in some cases, as we’ll see, we are not explicitly concerned with levels of statistical significance, as long as the variable makes good sense in terms of explaining how the climate system works, and the variable's regression coefficient has the appropriate sign.
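The two-step check described on Slides 8 and 9 (fit Temperature ~ CO2 in levels, then test the residuals for a unit root) can be sketched as follows. This is a minimal illustration, not the presentation's actual procedure: the data are synthetic stand-ins, the helper names are hypothetical, and the Dickey-Fuller regression uses zero augmentation lags for brevity, whereas the slides used 4 lags. The resulting statistic would still need to be compared against cointegration-specific critical values, which are not computed here.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def dickey_fuller_stat(e):
    """t-statistic of rho in diff(e)_t = rho * e_(t-1) + error (no constant, no lags)."""
    lagged = e[:-1]
    diff = [e[t] - e[t - 1] for t in range(1, len(e))]
    s_ll = sum(l * l for l in lagged)
    rho = sum(l * d for l, d in zip(lagged, diff)) / s_ll
    resid = [d - rho * l for l, d in zip(lagged, diff)]
    sigma2 = sum(r * r for r in resid) / (len(diff) - 1)
    return rho / (sigma2 / s_ll) ** 0.5

# Synthetic stand-in data (NOT the NOAA/NASA/ice-core series): 141 yearly points.
random.seed(1)
co2 = [290 + 0.007 * t * t for t in range(141)]
temp = [-3.0 + 0.01 * c + random.gauss(0, 0.1) for c in co2]

# Step 1: cointegration regression in levels.
intercept, slope = simple_ols(co2, temp)
residuals = [ti - (intercept + slope * ci) for ci, ti in zip(co2, temp)]

# Step 2: unit root statistic on the residuals; strongly negative values
# point toward stationary residuals.
df_stat = dickey_fuller_stat(residuals)
print(round(slope, 4), round(df_stat, 2))
```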
Slide 10 - 3. Baseline trend models. Within this section we will develop models that do not use CO2 as an exogenous variable but simply various trend variables (counting 1, 2, 3, 4,…). This is to test whether the mere passing of time is the driving trend, rather than CO2 as a causal factor. This is a pretty good test of whether a level-based model is truly valid and not another example of a spurious regression using level variables.
Slide 11 - [Chart: Temperature anomaly, historical fit of a simple Trend model, Actual vs. Trend, 1880 to 2020.] This model uses a single Trend variable (counting 1, 2, 3, etc.) to estimate the temperature over time. It does not use any exogenous information. As shown, this Trend model is pretty terrible. Notice how it greatly underestimates temperatures at the outset from 1880 to 1900 and at the end from 2005 to 2020. In between, from 1901 to 2004, it typically overestimates temperatures.
Slide 12 - The Trend Model residuals are pretty awful looking. [Charts: Trend Model residuals vs. CO2 concentration (ppm); Trend Model residuals vs. Model Estimate.] A good model should have a residual curve (red dashed line) that is flat, straight, and sits at the 0.00 level. This would indicate residuals that are stationary and mean reverting around the 0.00 level. These residuals are far away from meeting that standard. They are clearly nonstationary.
Slide 13 - [Chart: Temperature anomaly, historical fit of a polynomial Trend model, Actual vs. Trend 2, 1880 to 2020.] If we simply add a second Trend variable that represents the square of the Trend variable, we actually get a surprisingly good historical fit of the data in the absence of any information from CO2. While the Trend variable runs 1, 2, 3, etc., the Trend square variable runs 1, 4, 9, etc. The combination of those two variables makes for a very good polynomial regression equation that fits the J-shaped curve of the data very well. We call this model Trend 2.
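The Trend 2 structure (an intercept, a trend counter, and its square) can be sketched with a small least-squares solver. This is an illustration under stated assumptions: the series below is synthetic with made-up coefficients, not the fitted values from the slides, and `solve` and `ols` are hypothetical helper names.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients via the normal equations (rows include the constant)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Trend counts 1..141 (one value per year, 1880 to 2020); Trend square is 1, 4, 9, ...
trend = list(range(1, 142))
design = [[1.0, float(t), float(t * t)] for t in trend]

# Synthetic J-shaped series with made-up coefficients (assumption, for illustration only).
temp = [-0.2 - 0.004 * t + 0.00009 * t * t for t in trend]

b0, b1, b2 = ols(design, temp)
fitted = [b0 + b1 * t + b2 * t * t for t in trend]
print(b0, b1, b2)
```

Because the squared trend reaches values near 20,000, its coefficient is necessarily tiny, which is the same scaling effect the deck notes when describing the Trend 2 model.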
Slide 14 - The Trend 2 Model residuals are far better looking. The residual trend line (red) relative to Model Estimates on the x-axis within the right-hand graph is now perfectly flat, straight, and on the 0.00 line, as it should be. Even that same residual trend line when using CO2 concentration on the x-axis is actually reasonably flat. It looks like this model captures a good deal of the information imparted by the CO2 variable. We now have a pretty competitive Baseline Trend model to assess the validity of our upcoming CO2 models.
Slide 15 - Description of the Trend 2 model. The square of the trend (trend2) is a very large number, so the resulting regression coefficient is very small: 0.000085. All the goodness-of-fit measures are very high. And, the resulting model errors are pretty low. This is kind of amazing given that we have just used trend variables to fit the temperature history starting back in 1880.
Slide 16 - 4. CO2 Models. We will introduce two CO2 based models to estimate and forecast temperature. The first one will be our simple linear OLS Cointegration Regression using CO2 as our stand-alone exogenous variable. The second one will be a more complete model that will also include the influence on temperature from the Pacific Decadal Oscillation, with warm years due to El Nino and cold years due to La Nina. This model will also include another intervention variable covering the years from 1940 to 1970, before sulfate aerosols were heavily regulated. Sulfates have a lowering effect on temperature that partly counters the rising effect of CO2.
Slide 17 - CO2 model description. Notice the extremely high t-stat of the CO2 coefficient, leaving no doubt as to the statistical significance of this variable.
Slide 18 - The CO2 model has very good looking residuals (flat red lines).
Slide 19 - The more complete CO2 based model. The El Nino variable has a p-value of 0.155, not statistically significant at the Alpha < 0.10 level. However, within a sports betting market, this same p-value would correspond to one team being favored with odds of roughly 5.5-to-1 of winning (0.845 / 0.155). That would be a pretty good bet. In view of the above, we are comfortable including the El Nino variable in our model. It also makes sense to include both the years that have a positive impact on temperature (El Nino) and the ones that have a negative impact (La Nina).
Slide 20 - The complete Model residuals are still reasonably good looking (fairly flat red curves).
Slide 21 - Model Competition regarding the fit of Temperature history. Whether looking at a measure of variance explanation (Adjusted R Square), one-observation prediction (Predicted R Square) or model errors (RMSE and Mean Absolute Error), the three models are very close. The CO2 model and the Trend 2 model are just about dead even on all counts. The Model that includes the other variables such as El Nino and La Nina is fractionally more accurate. If we stopped our analysis now, one could prematurely conclude that the trend (including the trend square variable) just about explains everything regarding the progressive increase in temperature from 1880 to 2020, and that the two CO2 based models really do not add much information, if any, beyond capturing this trend. This could lead one to assess our CO2 based models as “spurious.” Additional analysis will show otherwise: including a CO2 variable greatly improves the prediction accuracy of such a model. Fitting the historical data is one thing. Making reasonably accurate predictions is far more challenging and useful.
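The four comparison measures named on this slide can be computed as in the sketch below, shown for a one-regressor model. The definitions are assumptions where the slides are silent: RMSE divides by n, and Predicted R Square is taken as 1 - PRESS/SST, with PRESS obtained by refitting the model with each observation left out in turn. The five data points are arbitrary illustration values.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_metrics(x, y, n_params=2):
    """Adjusted R2, Predicted R2 (leave-one-out), RMSE and MAE for y ~ x."""
    n = len(y)
    a, b = simple_ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum(r * r for r in resid)
    r2 = 1 - sse / sst
    # PRESS: squared error of predicting each point from a fit that excludes it.
    press = 0.0
    for i in range(n):
        ai, bi = simple_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (ai + bi * x[i])) ** 2
    return {
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - n_params),
        "pred_r2": 1 - press / sst,
        "rmse": (sse / n) ** 0.5,
        "mae": sum(abs(r) for r in resid) / n,
    }

# Arbitrary illustration data (not from the presentation).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = fit_metrics(x, y)
print(metrics)
```

Predicted R Square can never exceed the ordinary R Square, since each left-out prediction error is at least as large as the corresponding in-sample residual; a wide gap between the two is one early hint of overfitting.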
Slide 22 - [Chart: Temperature anomaly, historical fit, 1880 to 2020; Actual, CO2 model, Model, Trend 2.] When you visually compare the historical fit of the three models, they all fit the underlying J-curve long-term trend of temperature increase over the 1880 to 2020 period. The more complete Model that includes the El Nino and La Nina variables does match some of the volatility or oscillations in the annual temperature data much better than the other two models. Otherwise, again, there is really not much difference between the three models.
Slide 23 - [Chart: Temperature anomaly, historical fit since 1990; Actual, CO2 model, Model, Trend 2.] Focusing on the more recent period from 1990 until 2020, we can observe a similarly good fit for the three models. The more complete Model has a slightly better fit by better capturing the temperature oscillations associated with El Nino/La Nina. However, notice how the Trend 2 model starts to underestimate the temperature level starting in 2014. This may be the first indication that CO2 does impart some valuable information to this temperature model.
Slide 24 - 5. Out-of-sample forecasts. Fitting historical data is one thing. And, one way or another, it is often relatively easy, even in the case of fitting the historical temperature level from 1880 to 2020, as we have seen. Predicting observations using out-of-sample forecasts, also called Hold Out testing, is far more difficult and is a far more relevant test of a model's predictive accuracy. With such models, you often run into a situation where a model fits the historical data really well, but predicts really poorly (in Hold Out testing). This is a classic situation of model overfitting. It happens all the time. Within this section we will test whether our models are overfit, or whether they instead provide predictive information.
Slide 25 - Cross Validation test. Cross validation is a rigorous form of out-of-sample forecast testing. In our case, we removed 14 observations from the data to create a 14-year prediction window. And, we did this exercise 10 times to cover the 141 yearly observations within our complete data set. So, the first prediction window went from 1880 to 1893. We used a model with history from 1894 to 2020 to attempt to predict the 1880 – 1893 years. The second prediction window was from 1894 to 1907. We used a model with history in all other years outside the prediction window. And, we continued this process until using the most recent 14 years as our prediction window. The table compares the Mean Absolute Error (MAE) of each of our three models when we first used the entire data set to fit the history. Next, it discloses the average MAE of the 10 cross validation prediction windows. And, next we look at the ratio or multiple of the cross validation MAE divided by the MAE during history. The cross validation MAE is normally expected to be higher than the MAE during history. If that multiple is greater than 1.5, you may be dealing with a model that is overfit. As shown, all three of our models perform well on this count, with very little deterioration during the cross validation test. Again, the complete Model is a bit better than the other two. And, at the margin, our CO2 model did a bit better than the Trend 2 model during cross validation.
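The windowed cross validation described above can be sketched as follows. Several things here are assumptions: the data are synthetic, the model is a plain one-regressor OLS, and because 10 windows of 14 years cover only 140 of the 141 observations, this sketch lets the last window absorb the extra year (the slides do not say how that observation was handled).

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mae(x, y, a, b):
    """Mean Absolute Error of the line a + b*x against y."""
    return sum(abs(yi - (a + b * xi)) for xi, yi in zip(x, y)) / len(y)

def make_windows(n_obs, n_windows):
    """Contiguous index windows; the last one absorbs any remainder (assumption)."""
    size = n_obs // n_windows
    windows = []
    for w in range(n_windows):
        start = w * size
        end = n_obs if w == n_windows - 1 else start + size
        windows.append(list(range(start, end)))
    return windows

def cross_validation_mae(x, y, n_windows):
    """Average MAE over windows, each predicted by a model fit on all other years."""
    errors = []
    for window in make_windows(len(y), n_windows):
        held = set(window)
        train = [i for i in range(len(y)) if i not in held]
        a, b = simple_ols([x[i] for i in train], [y[i] for i in train])
        errors.append(mae([x[i] for i in window], [y[i] for i in window], a, b))
    return sum(errors) / len(errors)

# Synthetic stand-in data (NOT the actual series), one point per year 1880 to 2020.
random.seed(7)
years = list(range(1880, 2021))
co2 = [290 + 0.007 * t * t for t in range(141)]
temp = [-3.0 + 0.01 * c + random.gauss(0, 0.1) for c in co2]

a, b = simple_ols(co2, temp)
history_mae = mae(co2, temp, a, b)
cv_mae = cross_validation_mae(co2, temp, 10)
multiple = cv_mae / history_mae  # the slide's rule of thumb flags values above 1.5
windows = make_windows(141, 10)
print(round(history_mae, 4), round(cv_mae, 4), round(multiple, 3))
```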
Slide 26 - 2006 – 2020 Out-of-sample Hold Out Test. [Chart: Temperature anomaly, Hold Out 2006 - 2020; Actual, CO2 model, Model, Trend 2.] When we attempt to forecast the recent period (2006 – 2020) using historical data (1880 – 2005), the Trend 2 model greatly underestimates the increase in temperature over the recent period ( + 0.12 vs. + 0.33 for actuals). The two CO2 based models do a lot better, with respective temperature increases ranging from + 0.25 to + 0.28.
Slide 27 - 1990 – 2005 Out-of-sample Hold Out Test. [Chart: Temperature anomaly, Hold Out 1990 - 2005; Actual, CO2 Model, Model, Trend 2.] This is the exact same pattern as the prior Hold Out Test. On a begin-to-end point basis, the Trend 2 model greatly underestimates the temperature increase over the 1990 – 2005 period. Notice how the simpler CO2 model does better than the more complete Model on a begin-to-end point basis. This was also true in the Hold Out test on the previous slide. The repeated relative failure of the Trend 2 model is not so surprising. Polynomial regressions are notoriously good at fitting historical data, but often not so good the minute you do some out-of-sample testing.
Slide 28 - 1982 – 2020 Out-of-sample Hold Out Test. [Chart: Temperature anomaly, Hold Out 1982 - 2020; Actual, CO2 Model, Model, Trend 2.] This is an unusually long Hold Out test where we removed the 39 most recent years of the data (1982 – 2020). Just by knowing the CO2 concentration level, we would have come up with an excellent begin-to-end point estimation of the overall temperature increase over this 39-year period (CO2 Model). And, that estimation is far superior to the estimation from the other two models.
Slide 29 - Why is the complete Model a distant second to the simpler CO2 based model? It is because the Pacific Decadal Oscillation that captures the El Nino (+) and La Nina (-) is not so decadal. It is very volatile and is captured as a 3-month moving average that can often fluctuate between an El Nino (+) and a La Nina (-) phenomenon within the same year. Therefore, the yearly based capture of those phenomena is highly inaccurate.
Slide 30 - Attempt to improve Hold Out with a Robust Quantile Regression. [Chart: Temperature anomaly, Hold Out 1982 - 2020; Actual, CO2 Model, Robust Model.] Linear regressions such as our CO2 models can be affected by outliers in either the Y variable (temperature), due to variables not included in the model (El Nino/La Nina, influence of other greenhouse gases, etc.), or the X variable (non-linear changes or random jumps in the CO2 concentration variable). To remedy the issue of regression coefficients being influenced or distorted by outliers in the historical data, we use robust regressions that are more resistant to the influence of such outliers. A common robust regression method is Quantile Regression, which regresses to the Median instead of the Mean and therefore much reduces the influence of outliers. However, as shown, such a Robust Quantile Regression did not improve the Hold Out performance of our regular CO2 model. To the contrary, the regular CO2 model generated a better set of predictions over this Hold Out period. This gives us some comfort that this CO2 model is pretty well specified, not overly influenced by outliers within its historical data, and able to make really pretty good predictions over a 39-year period. Prediction success over such a long period (just assuming we know the accurate value of CO2 concentration) is very rare for such time series models.
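To illustrate why regressing to the median resists outliers, here is a minimal sketch of median (least absolute deviation) regression for a single regressor. The fitting method is an assumption chosen for brevity: iteratively reweighted least squares, which approximates the median regression line and is not necessarily the algorithm behind the slides' Robust Quantile Regression. The data are a toy line with one planted outlier.

```python
def weighted_line(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def median_regression(x, y, iterations=200, eps=1e-8):
    """Approximate median regression by reweighting each point by 1/|residual|."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        w = [1.0 / max(abs(yi - (a + b * xi)), eps) for xi, yi in zip(x, y)]
    return a, b

# Toy data: an exact line y = 1 + 2x with one large outlier planted at the end.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[9] += 50.0

ols_intercept, ols_slope = weighted_line(x, y, [1.0] * len(x))  # ordinary OLS
lad_intercept, lad_slope = median_regression(x, y)
print(round(ols_slope, 3), round(lad_slope, 3))
```

In this toy case the OLS slope is pulled well above 2 by the single outlier, while the median regression stays close to the underlying line. As the slide notes, that robustness did not translate into better Hold Out performance for the actual temperature data.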
Slide 31 - 6. Replicating IPCC scenarios.
Slide 32 - IPCC Scenarios. Within its most recent assessment, the IPCC has developed 5 different scenarios. The most benign one is called SSP1-1.9, whereby CO2 concentration would remain relatively flat between 400 and 450 ppm, and the temperature anomaly would remain close to + 1.5 degrees Celsius. The most severe one is called SSP5-8.5, whereby CO2 concentration would continue increasing rapidly to 1100 ppm by the end of the century, and the temperature anomaly would reach about + 4.4 degrees Celsius. Source: IPCC Technical Summary 2021. The large gray letters are part of the following statement: “accepted version subject to final editing.”
Slide 33 - Attempting to replicate the IPCC scenarios. [Chart: Replicating IPCC Scenarios; temperature anomaly in degrees Celsius vs. CO2 concentration (ppm) from 300 to 1200; CO2 model and LN(CO2) model.] Our CO2 linear model appears to greatly overshoot the IPCC scenarios when using true out-of-sample CO2 concentrations that are far higher than what the model was trained on (much greater than 420 ppm and going up to 1200 ppm). However, using a very similar model structure and simply using LN(CO2) generates a curve that looks like it may very well replicate the IPCC scenarios. We will look at that in greater detail on the next slide. Note how the two models are very close when using CO2 concentrations that the linear CO2 model was trained on, ranging from 300 to 400 ppm.
Slide 34 - As shown on the graph, the LN(CO2) model temperature estimates with CO2 concentration up to 1200 ppm come very close to the ones generated by the IPCC scenarios. The graph highlights the temperature estimates for the most benign IPCC scenario, SSP1-1.9, and the most severe one, SSP5-8.5. The model slightly underestimates the former and is pretty much right on the money for the latter (the most severe scenario).
Slide 35 - Why did we not use LN(CO2) instead of CO2 to estimate and forecast temperature earlier? It is for a simple reason. When CO2 is < 420 ppm, historically there is a very strong linear relationship between CO2 and temperature. That linear relationship is much stronger and better fitting than a logarithmic relationship between the two variables. We tested a logarithmic model with LN(CO2). It was pretty good, but it came a distant second to the linear CO2 model when conducting out-of-sample Hold Out testing. Over the longer term, going forward, and with true out-of-sample CO2 concentration levels (much above 420 ppm), the scientific community within the IPCC assesses that the CO2 vs. temperature relationship follows a logarithmic curve. That’s a very good thing. If the relationship continued to be linear, our survival would become increasingly unlikely.
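The contrast between the linear and LN(CO2) specifications can be sketched with a toy example: two models that are nearly indistinguishable over the 290 to 420 ppm training range can diverge sharply at 1100 ppm. Everything numeric below is an assumption for illustration only. The synthetic temperatures are generated from a purely logarithmic curve with a made-up sensitivity of 2.0 degrees per doubling of CO2, which is neither the presentation's fitted value nor an IPCC figure.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

SENSITIVITY = 2.0  # hypothetical degrees Celsius per doubling of CO2
co2 = [float(c) for c in range(290, 421, 10)]  # training range, 290 to 420 ppm
temp = [SENSITIVITY * math.log2(c / 290.0) for c in co2]  # synthetic anomaly

a_lin, b_lin = simple_ols(co2, temp)                         # Temperature ~ CO2
a_log, b_log = simple_ols([math.log(c) for c in co2], temp)  # Temperature ~ LN(CO2)

def predict_linear(c):
    return a_lin + b_lin * c

def predict_log(c):
    return a_log + b_log * math.log(c)

# In-sample, the two specifications are hard to tell apart...
max_gap_in_sample = max(abs(predict_linear(c) - predict_log(c)) for c in co2)
# ...but far outside the training range the straight line keeps climbing.
print(round(max_gap_in_sample, 3),
      round(predict_linear(1100.0), 2),
      round(predict_log(1100.0), 2))
```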
Slide 36 - Description of the CO2 Model vs. the LN(CO2) one. As shown below, regarding the historical fit of the temperature data, both models are very close. The Adjusted R Squares are nearly even at 0.89. And, the respective model Standard Errors, between 0.117 and 0.118 degrees Celsius, are also very close.
Slide 37 - 1982 – 2020 Out-of-sample Hold Out Test. CO2 Model vs. LN(CO2) Model. [Chart: Temperature anomaly, Hold Out 1982 - 2020; Actual, CO2 Model, LN(CO2) Model.] As shown on our long out-of-sample Hold Out test (1982 – 2020), the CO2 model performs much better than the LN(CO2) model. This is especially true if we look at it from a begin-to-end point perspective. The CO2 model just about meets the endpoint in 2020, when the temperature anomaly is + 1.00 degree Celsius. Meanwhile, the LN(CO2) model misses it by almost 0.2 degree Celsius.
Slide 38 - 7. Granger Causality, VAR, IRFs. We will use the mentioned statistical methods to attempt to assess the causality of the CO2 concentration on temperature. Based on the work disclosed so far, we already know there is a very strong association, or correlation, between the two. But, is this association truly causal? Demonstrating causality in any such models is most often extremely challenging. Often one can’t demonstrate true causality or even Granger causality (a less absolute definition of causality that merely entails that one variable is the chronological predecessor of another without necessarily causing the other).
Slide 39 - The steps to evaluate Granger Causality in this particular case. (1) Does CO2 Granger cause temperature? Run the Granger Causality test: CO2 -> temperature. (2) Test in which direction this causality manifests itself. Run the Granger Causality test in the reverse causal direction: temperature -> CO2. This sounds absurd, but there may be ecosystem explanations supporting why this may be so. The math is agnostic on stuff like that. Granger Causality just checks if A causes B more than B causes A to confirm the causality direction. (3) What is the sign of this causality? Obviously, we want CO2 concentration to cause rising temperatures, not declining ones. To check that, we will observe the directional signs of the CO2 variables' regression coefficients embedded in the underlying Vector Autoregression (VAR) model. If they sum up to a strong positive value, you have confirmed your hypothesis that CO2 causes rising temperatures. Otherwise, you have not. (4) Next, check out the Impulse Response Function (IRF) graphs to visualize how an unanticipated shock in CO2 concentration reverberates on temperature over the next 10 years. (5) Next, explore the Forecast Error Variance Decomposition (FEVD) to evaluate how much information CO2 truly imparts to these VAR models. Only once you have completed all five steps will you have drawn a complete picture of the Granger causality between two variables. Many practitioners stop after the very first step in a hurry to confirm their hypothesis, while being less than enthusiastic about pursuing the next steps that may not confirm their hypothesis.
Slide 40 - Does CO2 Granger cause Temperature? Yes, it does. We ran a set of Granger Causality tests. You start with a baseline autoregressive model that just includes 1 yearly lag of the temperature to estimate the temperature history. Next, you develop a second model by adding the 1-year lag of CO2 to also estimate the temperature history. Finally, you test with an F test and a Chi Square test whether the residuals of the second model including the CO2 lag are much lower than the residuals of the baseline autoregressive model. If they are indeed lower at a statistically significant level, you conclude that CO2 does Granger cause temperature. You repeat this procedure up to including 4 yearly lags (we did not contemplate using more lags; beyond 4 yearly lags, we may likely start overfitting the model on the autoregressive properties of the respective time series). As shown, both the series of F tests and Chi Square tests using models with up to 4 lags all confirm that CO2 clearly Granger causes temperature. Indeed, in all cases the resulting p-values are essentially zero, allowing us to reject the null hypothesis that there is no statistically significant difference between the two sets of residuals (baseline autoregressive model vs. model including the CO2 lags).
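The nested-model comparison behind this test can be sketched as follows: fit the baseline autoregression, fit the model that adds the other variable's lags, and form an F statistic from the two residual sums of squares. This is an illustration under assumptions: the series are synthetic (built so that x genuinely leads y), the helper names are hypothetical, and the p-value step is omitted because it requires the F distribution, which the Python standard library does not provide.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(rows, y):
    """Residual sum of squares of the OLS fit of y on the design rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def granger_f(y, x, lags):
    """F statistic for 'lags of x add information about y beyond lags of y'."""
    target = y[lags:]
    restricted, unrestricted = [], []
    for t in range(lags, len(y)):
        own = [y[t - k] for k in range(1, lags + 1)]
        other = [x[t - k] for k in range(1, lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - (1 + 2 * lags)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic series (assumption): x is a drifting random walk and y depends on lagged x.
random.seed(0)
x, y = [0.0], [0.0]
for _ in range(140):
    y.append(0.5 * y[-1] + 0.3 * x[-1] + random.gauss(0, 0.1))
    x.append(x[-1] + 0.5 + random.gauss(0, 1.0))

f_x_to_y = granger_f(y, x, 1)  # does x Granger cause y?
f_y_to_x = granger_f(x, y, 1)  # reverse direction
print(round(f_x_to_y, 1), round(f_y_to_x, 2))
```

Running both directions mirrors the first two steps of the procedure on the earlier slide: the comparison only supports a direction of Granger causality when the statistic is clearly larger one way than the other.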
Slide 41 - Does CO2 Granger cause Temperature… more than Temperature Granger causes CO2? Yes, it does. When you run all the Granger causality tests in the other direction, all the F test and Chi Square test statistics are a lot lower, and the resulting p-values are much higher. In several of the Granger causality tests, we can’t reject the null hypothesis that any difference in residuals between the baseline autoregressive model and the model that includes the temperature lags is just due to randomness.
Slide 42 - # of lags selection for the VAR models using Information Criteria. The models described earlier that include lags of both CO2 and temperature to establish causality in either direction are essentially unrestricted Vector Autoregression (VAR) models. When used for other purposes, on a stand-alone basis, such models are also called Autoregressive Distributed Lag (ARDL) models, a popular model structure in social sciences and econometrics. As a side note, when using level variables one should typically use other forms of VAR (not unrestricted). But, given that the residuals of our unrestricted VAR models are uncorrelated, we should be OK to proceed as is. To select the best number of lags for our VAR models, we will check the output of information criteria generated by an R function. The lower the information criterion value, the better the model fit and specification. As shown, two of the information criteria select the VAR models with 2 lags. And, the other two select the VAR models with 4 lags. But, notice that all four models (with lags ranging from 1 up to 4 yearly lags) have very close information criteria values. In essence, they are very competitive with each other. So, we will often look at all four models.
Slide 43 - Does the CO2 vs. Temperature causal relationship have the appropriate positive sign? … Well, here it gets a bit foggy. Observing the signs of the CO2 lags' regression coefficients leads us to answer the above question with much nuance. The VAR models with 2 and 3 lags both have one CO2 coefficient with the wrong negative sign. The VAR with 4 lags has two coefficients with the wrong sign. In some cases, we can accept coefficients with the wrong sign, considering that the CO2 -> temperature relationship may have some mean-reverting properties that would cause this reversal in coefficient signs. Yet, when we look at the overall Granger causality effect of CO2 on temperature (associated with an unexpected upward shock in CO2), this net effect seems very small, at around 0.005 to 0.006 regardless of the VAR we use. We derive this net effect by summing the CO2 lags' regression coefficients. But, at least this net effect is positive.
Slide 44 - Impulse Response Functions. The cumulative Impulse Response Function over the next 10 yearly periods, describing the impact on temperature in response to an unanticipated upward shock of a one-unit increase in CO2 concentration, is rather unsettling. When using a VAR model with only 1 lag, the IRF graph makes much sense, as it illustrates CO2 having a positive impact on temperature level (left graph). But, the graph on the right, which describes the same IRF for a VAR with 2 lags, suggests that an upward shock in CO2 would have a negative impact on temperature level. The IRF graphs for the VARs with 3 and 4 lags looked nearly identical to the IRF graph for the VAR with 2 lags (right-hand graph), with the negative sign.
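For a VAR with 1 lag, the impulse response at horizon h is given by the h-th power of the coefficient matrix, and the cumulative response is the running sum. The sketch below illustrates that mechanic only. The coefficient matrix is hypothetical (not the estimated VAR from the slides), and the responses are the simple non-orthogonalized kind, which may differ from how the deck's IRF graphs were produced.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def impulse_responses(coef, horizon):
    """Return [I, A, A^2, ...]; entry [h][i][j] is the effect on variable i,
    h periods after a one-unit shock to variable j (non-orthogonalized)."""
    n = len(coef)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    out = [power]
    for _ in range(horizon):
        power = mat_mul(coef, power)
        out.append(power)
    return out

# Hypothetical VAR(1) coefficients, variable order [temperature, CO2]:
#   temperature_t = 0.90 * temperature_(t-1) + 0.005 * CO2_(t-1) + ...
#   CO2_t         = 1.00 * CO2_(t-1) + ...
A = [[0.90, 0.005],
     [0.00, 1.000]]

irf = impulse_responses(A, 10)
temp_response = [m[0][1] for m in irf]  # temperature response to a CO2 shock

cumulative = []
running = 0.0
for r in temp_response:
    running += r
    cumulative.append(running)
print([round(r, 4) for r in temp_response])
```

With more than one lag, the same idea applies to the companion-form matrix, and negative CO2 lag coefficients can then pull the response below zero, which is the pattern the slide describes for the VARs with 2 to 4 lags.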
Slide 45 - Forecast Error Variance Decomposition (FEVD). For the VAR with 1 lag fitting temperature, the table indicates that the autoregressive lag of temperature provides the vast majority of the information to fit temperature as the Y dependent variable, and that the exogenous CO2 lag 1 variable provides very little information to the model. All the other VAR models with up to 4 lags had the exact same FEVD profile, with the lags of the temperature variable providing over 99% of the information to the model and the exogenous CO2 lags providing very little information to these VAR models.
Slide 46 - Why did some of the later steps of our Granger Causality Analysis show ambivalent results? The first couple of steps showed pretty convincing mathematical results that CO2 does Granger cause temperature. However, as shown, the later steps ranged from ambivalent to disproving. This is probably due to a couple of phenomena. The first one is generic to these types of analysis. It is common to confirm Granger causality through the first couple of steps of such analysis. But confirmation through all 5 steps is much less common. The second phenomenon, potentially specific to this modeling exercise, is that the temperature level variable has a very high level of autocorrelation. And, within VAR models this strong autocorrelation of temperature probably has much reduced the explanatory impact of CO2. Thus, the temperature lags partly crowded out the CO2 ones in terms of estimating temperature levels with VAR models. More specifically, the temperature autocorrelation at lag 1 is 0.9518, a bit higher than the CO2 vs. temperature correlation at lag 1 of 0.9453. One would think we could resolve this situation by detrending the variables and dealing with yearly changes in temperature and CO2 concentration. But, there is too much volatility in the yearly change variables to demonstrate any explicit relationship between the two variables. I had done such an exercise years ago. And, it would only serve as a means to demonstrate that there is no Granger causal relationship between the two variables.
Slide 47 - 47 8. VAR Forecast Here we will revisit forecasting the temperature anomaly over the 1982 – 2020 period using a model trained on 1880 – 1981 data. But, using VAR structures, we will now attempt to conduct this forecast with no information whatsoever (no info regarding prospective CO2 concentration levels). This type of forecast testing is so challenging that it borders on the absurd. Imagine actually forecasting a time series variable (S&P 500, GDP, CPI, etc.) over the next 39 years without any exogenous information over those prospective years. That would probably be close to impossible.
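To make "no information whatsoever" concrete: a VAR with 1 lag forecasts by feeding its own previous forecast back in, for both variables, year after year. The sketch below shows that recursion only. The intercepts, coefficients, and 1981 starting values are hypothetical placeholders, not the fitted model from this presentation.

```python
import math

def var1_forecast(c, a, y_last, steps):
    """Iterate y_t = c + A y_(t-1) forward, using only the model's own output."""
    path = []
    y = list(y_last)
    for _ in range(steps):
        y = [c[i] + sum(a[i][j] * y[j] for j in range(len(y)))
             for i in range(len(y))]
        path.append(y)
    return path

# State vector: [temperature anomaly, LN(CO2)]. All numbers are assumptions.
C = [-8.5935, 0.004]       # hypothetical intercepts
A = [[0.5, 1.5],           # temperature depends on its own lag and on LN(CO2) lag
     [0.0, 1.0]]           # LN(CO2) drifts upward on its own
Y_1981 = [0.3, math.log(340.0)]  # hypothetical last training observation

forecast_path = var1_forecast(C, A, Y_1981, steps=39)  # 1982 through 2020

temp_2020, ln_co2_2020 = forecast_path[-1]
print("2020 temperature forecast:", round(temp_2020, 2))
print("2020 CO2 forecast (ppm):", round(math.exp(ln_co2_2020), 1))
```

Because the CO2 path is itself forecast by the model, any error in the projected CO2 level carries through to the temperature forecast.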
Slide 48 - Revisiting our best 1982 – 2020 forecast with the CO2 Model This was our best temperature anomaly forecast so far over the 1982 – 2020 period, using data from 1880 to 1981 to train our CO2-based model. As shown, this is a remarkably good forecast. It implies that if you had known the CO2 concentration over this period (1982 – 2020), you could have generated a pretty good estimate of the temperature anomaly over the same period. Notice that all the CO2 model estimates of the temperature anomaly fall well within the 95% Prediction Interval. This is an unusually good outcome. [Figure: Temperature Anomaly. Hold Out 1982 - 2020. With 95% Prediction Interval. Series: Actual, CO2 Model, Lower, Upper. X-axis: years 1982 to 2020.]
Slide 49 - A VAR model w/ 1 lag using LN(CO2) can predict with no info whatsoever! [Figure: Temperature Anomaly. VAR w/ LN(CO2) forecast 1 lag, 1982 - 2020. P.I. 95%. Series: Actual, VAR fcst, Lower, Upper. X-axis: years 1982 to 2020.] Just using LN(CO2) instead of CO2 as our second Z variable within a VAR model with 1 lag generates a surprisingly good forecast of the temperature anomaly over the 1982 – 2020 period, with no information whatsoever regarding this period! This is rather astonishing. As shown, the VAR forecast overestimates temperature by only about 0.1 degree Celsius at the onset in 1982 and in 2020. That is a very small error given that the model is not fed any information.
Slide 50 - Comparing our CO2 Model vs. VAR (with LN(CO2)) forecasts [Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, VAR. X-axis: years 1982 to 2020.] OK, the VAR model does overestimate the temperature anomaly a bit relative to the OLS Cointegration Regression (CO2 Model). But the VAR overestimation is really pretty small considering that the VAR model generated a 39-year forecast with no info whatsoever. By contrast, the CO2 model was fed the precise CO2 concentration level over that entire period. That is a huge difference.
Slide 51 - Why did the VAR (w/ LN(CO2)) overestimate temperature? [Figure: CO2 (ppm). VAR (w/ LN(CO2)) forecast 1 lag, 1982 - 2020 with P.I. 95%. Series: Actual, VAR fcst, lower, upper. X-axis: years 1982 to 2020.] This question is a little perplexing because we observed earlier that using LN(CO2) instead of CO2 within our earlier OLS regressions resulted in the LN(CO2) model underestimating temperature over the Hold Out period (1982 – 2020) by quite a bit. But when we use this same LN(CO2) variable within this VAR model, instead of underestimating temperature, it actually overestimates it by a little bit. Part of the reason is that this same VAR model overestimates CO2 concentration. Remember that in the former Hold Out tests with the standard OLS regressions, the models were trained over the 1880 – 1981 period and then fed the CO2 concentration over the 1982 – 2020 period. With this VAR model, we are dealing with a rather extraordinary situation: it was trained over the 1880 – 1981 period, and it was not provided any information over the Hold Out period (1982 – 2020). Yet it was asked to forecast temperature over that same period. That is a very challenging situation.
Slide 52 - Conclusion Using CO2 concentration to estimate and forecast temperature anomaly levels was, on many counts, surprisingly successful. More complex models using additional variables associated with the Pacific Decadal Oscillation (El Nino (+); La Nina (-)) proved not so successful. They could fit the historical data, but they turned out to be inferior in forecasting compared to the simpler model using just CO2 concentration. Using the natural log of CO2 as an independent variable was surprisingly successful for replicating the IPCC scenarios, and also for forecasting the temperature anomaly over the 1982 – 2020 period with no info whatsoever using a VAR model with one lag. When it came to a full-fledged Granger causality analysis, our results were much humbler. We could confirm Granger causality through the first two steps (Granger causality and its relationship direction). But the subsequent steps turned out to be rather ambivalent (VAR regression coefficient signs, IRFs, FEVD).