5 5th Tutorial
5.1 Recap
Recall that the simple linear model is defined as follows: \[y_i = \beta_0 +\beta_1~x_i + \epsilon_i ~~~~~~~~i =1,2,...,n\] This formula can be re-written in matrix form as: \[\textbf{Y} = \textbf{X}~\beta+\epsilon\] where \(\textbf{Y} = (y_1,\dots,y_n)^T\) is the vector of responses, \(\beta= [\beta_0, \beta_1]^T\), \(\epsilon\) is the vector of errors, and \(\textbf{X}\) is the design matrix: \[\textbf{X}= \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}\]
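As a quick illustration, the design matrix can be assembled in R with cbind() (a minimal sketch with made-up predictor values, not data from this tutorial):
x = c(2, 4, 6)     # illustrative predictor values
X = cbind(1, x)    # design matrix: a column of ones, then the predictor
X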
5.2 Exercises
Exercise 5.1 Using the matrix framework, calculate the least squares estimate of \(\beta\).
Solution
Let us start by reminding ourselves of the following facts:
1) \(\dfrac{d}{dx} Ax = A^T\) where A is a matrix of constants.
2) \(\dfrac{d}{dx} x^TAx = 2Ax\) whenever A is symmetric.
3) \(\sum x_i^2 = \textbf{x}^T~\textbf{x}\)
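Fact 3) is easy to check numerically in R (a quick sanity check with an arbitrary vector):
x = c(1, 2, 3)
sum(x^2)       # 14
t(x) %*% x     # a 1x1 matrix, also containing 14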
Now, recall that \(SSR(\beta) = \sum_{i=1}^{n}(y_i-\beta_0-\beta_1 x_i)^2\), which can be rewritten in matrix form as \(SSR(\beta) = (\textbf{Y}-\textbf{X}~\beta)^T~(\textbf{Y}-\textbf{X}~\beta)\). We aim to find \(\underset{\beta}{\min}~ SSR\). To solve this optimisation problem we need to calculate the gradient of the SSR with respect to \(\beta\):
\[\begin{align}
\dfrac{d}{d~\beta}~SSR &= \dfrac{d}{d~\beta}~(\textbf{Y}-\textbf{X}~\beta)^T~(\textbf{Y}-\textbf{X}~\beta)\\
&= \dfrac{d}{d~\beta}~(\textbf{Y}^T\textbf{Y}-\textbf{Y}^T~\textbf{X}\beta-\beta^T~\textbf{X}^T~\textbf{Y}+\beta^T~\textbf{X}^T~\textbf{X}~\beta)\\
&= \dfrac{d}{d~\beta}~(\textbf{Y}^T\textbf{Y}-2~\textbf{Y}^T~\textbf{X}\beta+\beta^T~\textbf{X}^T~\textbf{X}~\beta)\\
&= (-2~\textbf{Y}^{T}\textbf{X})^T+2~\textbf{X}^T\textbf{X}\beta\\
&= -2~\textbf{X}^T\textbf{Y} + 2~\textbf{X}^T\textbf{X}\beta
\end{align}\]
Here the third line uses the fact that \(\textbf{Y}^T\textbf{X}\beta\) is a scalar and therefore equals its transpose \(\beta^T\textbf{X}^T\textbf{Y}\), and the fourth line applies facts 1) and 2) above (noting that \(\textbf{X}^T\textbf{X}\) is symmetric).
Therefore, \(\hat\beta\) will be the least squares estimator of \(\beta\) if \(-2~\textbf{X}^T\textbf{Y} + 2\textbf{X}^T\textbf{X}\hat\beta=0\) (since the SSR is a convex function of \(\beta\), this stationary point is indeed a minimum).
To be able to isolate \(\hat\beta\) it is necessary for \(\textbf{X}^T\textbf{X}\) to be invertible; therefore we need \(\textbf{X}\) to be of full rank. Then
\[\hat\beta= (\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T\textbf{Y} \]
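A quick way to convince ourselves of this formula is to compare it with R's built-in lm() on simulated data (a minimal sketch; the true parameter values 3 and 2 are illustrative):
set.seed(1)
x = runif(20, 0, 10)
y = 3 + 2 * x + rnorm(20)            # simulate from y = 3 + 2x + noise
X = cbind(1, x)                      # design matrix
solve(t(X) %*% X) %*% t(X) %*% y     # least squares estimate by hand
coef(lm(y ~ x))                      # should agree with the line above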
Exercise 5.2 Prove that \(\hat{\beta}\) is unbiased.
Solution \[\begin{align} E(\hat\beta) & = E((\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T\textbf{Y})\\ & = (\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T~E(\textbf{Y})\\ & = (\textbf{X}^T\textbf{X})^{-1} \textbf{X}^T~E(\textbf{X}\beta+\epsilon)\\ & = (\textbf{X}^T\textbf{X})^{-1} \textbf{X}^T~(\textbf{X}\beta+E(\epsilon))\\ & = (\textbf{X}^T\textbf{X})^{-1} \textbf{X}^T~(\textbf{X}\beta+\textbf{0}) = \beta \end{align}\]
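Unbiasedness can also be checked empirically with a small Monte Carlo simulation (a sketch with illustrative parameter values): averaging \(\hat\beta\) over many simulated data sets should recover the true \(\beta\).
set.seed(2)
x = runif(30, 0, 10)
X = cbind(1, x)
beta = c(3, 2)                             # true parameter vector
estimates = replicate(5000, {
  y = X %*% beta + rnorm(30)               # generate Y = X beta + epsilon
  drop(solve(t(X) %*% X) %*% t(X) %*% y)   # beta hat for this sample
})
rowMeans(estimates)                        # should be close to c(3, 2)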
Exercise 5.3 Calculate the variance of \(\hat{\beta}\).
Solution \[\begin{align} var(\hat\beta) & = var((\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T\textbf{Y})\\ & = (\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T~var(\textbf{Y})~((\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T)^T\\ & = (\textbf{X}^T\textbf{X})^{-1} \textbf{X}^T~var(\textbf{X}\beta+\epsilon)~((\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T)^T\\ & = (\textbf{X}^T\textbf{X})^{-1} \textbf{X}^T~var(\epsilon)~((\textbf{X}^T\textbf{X})^{-1}~ \textbf{X}^T)^T\\ & = \sigma^2~ (\textbf{X}^T\textbf{X})^{-1} \textbf{X}^T~\textbf{X}~((\textbf{X}^T\textbf{X})^{-1})^T\\ & = \sigma^2~ (\textbf{X}^T~\textbf{X})^{-1} \end{align}\] where we used \(var(\textbf{A}\textbf{Y})=\textbf{A}~var(\textbf{Y})~\textbf{A}^T\), \(var(\epsilon)=\sigma^2~\textbf{I}\), and, in the last step, the symmetry of \((\textbf{X}^T\textbf{X})^{-1}\).
Now, for the simple linear model, \[\textbf{X}^T~\textbf{X}=\begin{bmatrix} n & \sum_{i=1}^{n}x_i \\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \\ \end{bmatrix}\]
Since \(\det(\textbf{X}^T\textbf{X}) = n\sum_{i=1}^{n}x_i^2-\left(\sum_{i=1}^{n}x_i\right)^2 = n\sum_{i=1}^{n}(x_i-\bar{x})^2\), we obtain \[(\textbf{X}^T~\textbf{X})^{-1} = \dfrac{1}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \begin{bmatrix} \frac{1}{n}\sum_{i=1}^{n}~x_i^2 & -\bar{x}\\ -\bar{x} & 1 \end{bmatrix}\]
Hence the variance matrix of \(\hat\beta\) can be written as:
\[var(\hat\beta)=\dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \begin{bmatrix} \frac{1}{n}\sum_{i=1}^{n}~x_i^2 & -\bar{x}\\ -\bar{x} & 1\end{bmatrix}= \begin{bmatrix} var(\hat{\beta_0}) & cov(\hat{\beta_0},\hat{\beta_1})\\ cov(\hat{\beta_0},\hat{\beta_1}) & var(\hat{\beta_1}) \end{bmatrix}\]
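As a sanity check, this closed form (with \(\sigma^2\) replaced by its usual estimate \(SSR/(n-2)\)) can be compared against vcov() from a fitted lm object (a sketch on simulated data):
set.seed(3)
x = runif(25, 0, 10)
y = 1 + 0.5 * x + rnorm(25)
fit = lm(y ~ x)
sigma2_hat = sum(resid(fit)^2) / (25 - 2)   # estimate of sigma^2
X = cbind(1, x)
sigma2_hat * solve(t(X) %*% X)              # the formula above, with sigma^2 estimated
vcov(fit)                                   # should match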
Exercise 5.4 Consider the following data set:
X | 77 | 54 | 71 | 72 | 81 | 94 | 96 | 99 | 83 | 67 |
Y | 82 | 38 | 78 | 34 | 37 | 85 | 99 | 99 | 79 | 67 |
- Calculate \(\hat{\beta}\).
Y = c(82,38,78,34,37,85,99,99,79,67)
X = matrix(c(1,1,1,1,1,1,1,1,1,1,77,54,71,72,81,94,96,99,83,67), nc=2, byrow = FALSE)
X
## [,1] [,2]
## [1,] 1 77
## [2,] 1 54
## [3,] 1 71
## [4,] 1 72
## [5,] 1 81
## [6,] 1 94
## [7,] 1 96
## [8,] 1 99
## [9,] 1 83
## [10,] 1 67
beta_hat = solve(t(X) %*% X) %*% t(X) %*% Y   # least squares estimate (X^T X)^{-1} X^T Y
beta_hat
## [,1]
## [1,] -29.26661
## [2,] 1.24769
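As a check, the same estimate is returned by R's built-in lm() (the second column of X is the predictor):
coef(lm(Y ~ X[, 2]))   # intercept and slope should match beta_hat above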
- Calculate \(cov(\hat\beta_0,\hat\beta_1)\). One way of doing this is to compute the whole covariance matrix \(\hat\sigma^2~(\textbf{X}^T\textbf{X})^{-1}\), where \(\hat\sigma^2 = SSR/(n-2)\) is the usual estimate of \(\sigma^2\):
sigma2_hat = t(Y - X %*% beta_hat) %*% (Y - X %*% beta_hat) / (10 - 2)   # SSR/(n-2)
sigma2_hat
## [,1]
## [1,] 347.855
cov_beta = as.numeric(sigma2_hat) * solve(t(X) %*% X)   # estimated covariance matrix of beta hat
cov_beta
## [,1] [,2]
## [1,] 1240.79251 -15.1890052
## [2,] -15.18901 0.1912973
Therefore, the covariance between \(\hat\beta_0\) and \(\hat\beta_1\) is \(-15.19\).
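As a check, the same covariance matrix is returned by vcov() applied to the fitted model:
vcov(lm(Y ~ X[, 2]))   # estimated covariance matrix of the coefficients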
5.3 Coursework
Attempt Problem 2 in the past exam paper here.