Linear model for two asset return series


First we’ll get the prices of two ETFs: an S&P 500 tracker (SPY) and the US 7-10 year Treasury ETF (IEF).

library(zoo)
library(ggplot2)
library(tseries)
 
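# Note: some tseries releases may expect quote="Adjusted" rather than
# "AdjClose"; check ?get.hist.quote for your installed version.
spy <- get.hist.quote(instrument="SPY", start="2003-01-01",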
                      end=Sys.Date(), quote="AdjClose",
                      provider="yahoo", origin="1970-01-01",
                      compression="d", retclass="zoo")
ief <- get.hist.quote(instrument="IEF", start="2003-01-01",
                      end=Sys.Date(), quote="AdjClose",
                      provider="yahoo", origin="1970-01-01",
                      compression="d", retclass="zoo")
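z <- merge.zoo(spy,ief)
# Name the columns explicitly so the formula SPY ~ IEF used below can find them
colnames(z) <- c("SPY", "IEF")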

For the regression we convert the prices into daily log returns:

z.logrtn <- diff(log(z))
z.logrtn.df <- as.data.frame(z.logrtn)
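As a quick illustration of what diff(log(...)) computes, here is a minimal sketch on a made-up price matrix. The prices are hypothetical, not market data, and a plain base R matrix stands in for the zoo object; diff() works down each column in the same way.

```r
# Hypothetical toy prices (made up for illustration, not market data)
prices <- cbind(SPY = c(100, 110, 99),
                IEF = c(50, 50.5, 51.005))

# Row t holds log(P_t / P_{t-1}) for each column
log_rtn <- diff(log(prices))

# The equivalent simple returns, P_t / P_{t-1} - 1
simple_rtn <- exp(log_rtn) - 1

print(round(log_rtn, 4))
print(round(simple_rtn, 4))

# Log returns add up over time: their sum is the log of the total price ratio
total_spy <- sum(log_rtn[, "SPY"])
print(all.equal(total_spy, log(99 / 100)))
```

The additivity shown in the last step is one common reason for preferring log returns over simple returns.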

Now we run the linear regression, modelling the daily S&P 500 returns with the daily Treasury returns as the independent variable.

lm.fit <- lm(SPY ~ IEF,data=z.logrtn)

Looking at the results, the “beta” is about -1.07, so over this sample a 1% return on Treasuries is associated, on average, with roughly a -1% return on the S&P 500. Beware: this relationship is unreliable. The R-squared is only about 0.15, and if you look at different periods of time there are periods when beta turns positive.
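One way to see how a beta can change sign over time is to re-estimate it on a rolling window. The sketch below is only an illustration under stated assumptions: the returns are simulated (not SPY or IEF data), the true beta is set to flip from -1 to +0.5 halfway through, and the 60-observation window is an arbitrary choice.

```r
set.seed(1)
n     <- 500
width <- 60   # rolling window length (arbitrary choice for illustration)

# Simulated returns (assumption): the true beta is -1 in the first half
# and +0.5 in the second half
ief       <- rnorm(n, sd = 0.004)
true_beta <- rep(c(-1, 0.5), each = n / 2)
spy       <- true_beta * ief + rnorm(n, sd = 0.002)

# OLS slope of spy on ief within each window: cov(x, y) / var(x)
roll_beta <- sapply(width:n, function(end) {
  idx <- (end - width + 1):end
  cov(spy[idx], ief[idx]) / var(ief[idx])
})

# Beta estimated on the first and on the last window
print(round(c(first = roll_beta[1], last = roll_beta[length(roll_beta)]), 2))
```

With the zoo objects from this post, zoo's rollapply() with by.column = FALSE could play the same role as the sapply() loop.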

> summary(lm.fit)

Call:
lm(formula = SPY ~ IEF, data = z.logrtn)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.095435 -0.005149  0.000380  0.005328  0.127657 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.0005794  0.0002025   2.861  0.00425 ** 
IEF         -1.0742832  0.0460546 -23.326  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01123 on 3079 degrees of freedom
Multiple R-squared:  0.1502,	Adjusted R-squared:  0.1499 
F-statistic: 544.1 on 1 and 3079 DF,  p-value: < 2.2e-16
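The numbers in this summary are tied together by a few identities that hold for any one-regressor least squares fit: the slope is cov(x, y) / var(x), R-squared is the squared correlation, the t value is the estimate divided by its standard error, and the F-statistic is the squared t value of the slope. In the output above, -1.0742832 / 0.0460546 is about -23.33, and 23.326^2 is about 544.1. The sketch below checks the same identities on simulated data; the series x and y are stand-ins, not the actual ETF returns.

```r
set.seed(42)
# Simulated stand-ins (assumption): x plays the role of IEF, y of SPY
x <- rnorm(200, sd = 0.004)
y <- 0.0005 - 1 * x + rnorm(200, sd = 0.01)

fit <- lm(y ~ x)
s   <- summary(fit)

beta <- unname(coef(fit)["x"])
tval <- s$coefficients["x", "t value"]

# Slope equals cov(x, y) / var(x)
print(all.equal(beta, cov(x, y) / var(x)))
# R-squared equals the squared correlation
print(all.equal(s$r.squared, cor(x, y)^2))
# t value equals estimate / standard error
print(all.equal(tval, s$coefficients["x", "Estimate"] /
                      s$coefficients["x", "Std. Error"]))
# With a single regressor, F equals t^2
print(all.equal(unname(s$fstatistic["value"]), tval^2))
```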

I found a great thread on StackOverflow on annotating a regression plot so that it shows the regression equation and R^2. You simply pass in the fitted linear model and the function returns the annotation text as a plotmath string, which can then be rendered with parse=TRUE in geom_text() or, as used here, annotate().

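lm_eqn <- function(m) {
  # Format the coefficients and R^2 as strings; unname() drops the coefficient
  # names so they do not leak into the deparsed expression
  l <- list(a = format(unname(coef(m)[1]), digits = 2),
            b = format(unname(abs(coef(m)[2])), digits = 2),
            r2 = format(summary(m)$r.squared, digits = 3))

  # Pick the sign shown in front of the slope term
  if (coef(m)[2] >= 0) {
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, l)
  } else {
    eq <- substitute(italic(y) == a - b %.% italic(x)*","~~italic(r)^2~"="~r2, l)
  }

  # Return a plotmath string suitable for parse = TRUE
  as.character(as.expression(eq))
}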

In our example you can plot the regression like so:

ggplot(data = z.logrtn.df, aes(x = IEF, y = SPY)) +
  geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
  geom_point() +
  annotate("text", x=mean(z.logrtn.df$IEF), y=Inf, label=lm_eqn(lm.fit), colour="black", size=5, parse=TRUE, vjust=1) +
  theme_bw()

Which looks like this:

[Figure: SPY vs IEF]