3 3rd Tutorial

3.1 Recap

We want to test $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ at level $\alpha$. Rejecting $H_0$ says that the fitted model $\hat{\beta}_0 + \hat{\beta}_1 X$ has some ability to explain the variation observed in $Y$ (i.e. there is a linear relationship between the explanatory variable and the response variable). Recall that the following ANOVA table can be used to test for the existence of regression:

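| Source of Variation | SS   | df    | MS  | F                            |
|---------------------|------|-------|-----|------------------------------|
| Regression          | SSR  | $1$   | MSR | $F_0 = \frac{MSR}{MSE}$      |
| Error               | SSE  | $n-2$ | MSE |                              |
| Total               | SSTO | $n-1$ |     |                              |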

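where

$$
\begin{aligned}
SSR &= \sum (\hat{y}_i - \bar{y})^2 = \hat{\beta}_1^2 S_{xx} \\
SSE &= \sum (\hat{y}_i - y_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}} \\
SSTO &= \sum (y_i - \bar{y})^2 = S_{yy} \\
MSR &= \frac{SSR}{1} \\
MSE &= \frac{SSE}{n-2}
\end{aligned}
$$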

To assess how well the regression line fits the data, one can use the coefficient of determination $R^2$, defined as follows:

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

which measures the proportion of variability explained by the regression.
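
The quantities in the ANOVA table can be computed directly from raw data. The following is a minimal sketch, not part of the original tutorial: the function name `anova_simple_regression` and the five data points are invented purely for illustration.

```python
def anova_simple_regression(x, y):
    """Return the ANOVA quantities for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx            # slope estimate
    ssr = b1 ** 2 * sxx       # SSR = b1^2 * Sxx
    ssto = syy                # SSTO = Syy
    sse = ssto - ssr          # SSE = Syy - Sxy^2 / Sxx
    msr = ssr / 1
    mse = sse / (n - 2)
    return {
        "SSR": ssr, "SSE": sse, "SSTO": ssto,
        "MSR": msr, "MSE": mse,
        "F0": msr / mse, "R2": ssr / ssto,
    }


# Toy data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

table = anova_simple_regression(x, y)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The test rejects $H_0$ when `F0` exceeds the tabulated $F_{1-\alpha,1,n-2}$ value, which has to be looked up separately since it is not computed here.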

3.2 Exercises

Exercise 3.1 A second-hand car dealer has 10 cars for sale. He decides to investigate the relation between the cars' age $X$ (in years) and mileage $Y$ (in thousands of miles) using the simple linear regression model. The dealer reported the following: the mean and standard deviation of the cars' mileage are 40.6 and 11.87153, respectively; the correlation coefficient between $X$ and $Y$ is 0.9687105; and the estimated simple regression model is $\hat{Y} = 8.892 + 7.733X$.

  1. Obtain $S_{YY}$, $S_{XX}$ and $S_{XY}$.
  2. Construct a 90% CI for the slope.
  3. Use ANOVA to test the significance of the linear relationship.
  4. What proportion of the total variation in mileage is explained by age?
  5. Compute the 95% CI for the mean mileage of a car aged 7 years.


Solution 3.1:
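a)

$$S_Y^2 = 11.87153^2 = \frac{S_{YY}}{n-1} \implies S_{YY} = 11.87153^2 \times 9 = 1268.399$$

We know that $\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}}$, so $S_{XY} = \hat{\beta}_1 S_{XX}$.

Then,

$$r_{XY} = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{\hat{\beta}_1 S_{XX}}{\sqrt{S_{XX} S_{YY}}} = \hat{\beta}_1 \sqrt{\frac{S_{XX}}{1268.399}}$$

Therefore:

$$0.9687105 = 7.733 \sqrt{\frac{S_{XX}}{1268.399}} \implies S_{XX} = 19.9$$

and $S_{XY} = \hat{\beta}_1 S_{XX} \approx 153.9$.

b)

$$
\begin{aligned}
SSE &= S_{YY} - \frac{S_{XY}^2}{S_{XX}} = 78.187 \\
MSE &= \frac{SSE}{n-2} = 9.773375 \\
S(\hat{\beta}_1) &= \sqrt{\frac{MSE}{S_{XX}}} = 0.7
\end{aligned}
$$

Therefore, with $t_{(0.95,\,8)} = 1.860$, the 90% CI for the slope is:

$$\left(\hat{\beta}_1 \pm t_{(1-\alpha/2,\,n-2)} \times S(\hat{\beta}_1)\right) = (6.432, 9.035)$$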
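
As a numerical check on parts a) and b), the same steps can be sketched in Python. This is a sketch under stated assumptions: the variable names are illustrative, and the critical value `t_crit` is a table value for $t_{(0.95,\,8)}$ typed in by hand rather than computed. Because the script carries full precision instead of the rounded $S_{XX} = 19.9$, its results agree with the hand calculation only up to rounding.

```python
from math import sqrt

# Values reported in the exercise
n = 10
s_y = 11.87153        # standard deviation of mileage
r_xy = 0.9687105      # correlation between age and mileage
b1 = 7.733            # estimated slope

# Assumption: t_{0.95, 8} taken from a t table
t_crit = 1.8595

# Part a): recover the sums of squares
syy = s_y ** 2 * (n - 1)
sxx = (r_xy * sqrt(syy) / b1) ** 2   # from r = b1 * sqrt(Sxx / Syy)
sxy = b1 * sxx

# Part b): 90% confidence interval for the slope
sse = syy - sxy ** 2 / sxx
mse = sse / (n - 2)
se_b1 = sqrt(mse / sxx)
ci_low = b1 - t_crit * se_b1
ci_high = b1 + t_crit * se_b1

print(f"Syy = {syy:.3f}, Sxx = {sxx:.3f}, Sxy = {sxy:.3f}")
print(f"SSE = {sse:.3f}, MSE = {mse:.3f}, S(b1) = {se_b1:.3f}")
print(f"90% CI for the slope: ({ci_low:.3f}, {ci_high:.3f})")
```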


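c)

| Source of Variation | SS       | df | MS      | F       |
|---------------------|----------|----|---------|---------|
| Regression          | 1190.22  | 1  | 1190.22 | 121.787 |
| Error               | 78.187   | 8  | 9.77    |         |
| Total               | 1268.399 | 9  |         |         |

Since $F_{0.95,1,8} = 5.32 < 121.787$, $H_0$ is rejected, which implies that a linear relationship exists between age and mileage.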
d) $R^2 = r_{XY}^2 = 0.9384$, so about 93.84% of the total variation in mileage is explained by age.
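
The ANOVA test and the coefficient of determination can be reproduced from the two sums of squares alone. The sketch below assumes the SSR and SSE values from the table in this solution; `f_crit` is a table value for $F_{0.95,1,8}$ entered by hand, and all variable names are illustrative.

```python
# Sums of squares taken from the ANOVA table of this exercise
n = 10
ssr = 1190.22
sse = 78.187

# Assumption: F_{0.95; 1, 8} taken from an F table
f_crit = 5.32

df_reg, df_err = 1, n - 2
msr = ssr / df_reg
mse = sse / df_err
f0 = msr / mse                 # observed F statistic
r2 = ssr / (ssr + sse)         # proportion of variation explained
reject_h0 = f0 > f_crit

print(f"MSR = {msr:.2f}, MSE = {mse:.4f}, F0 = {f0:.2f}")
print(f"R^2 = {r2:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Computing $R^2$ as SSR / (SSR + SSE) rather than squaring the reported correlation gives the same answer up to rounding, which is a quick consistency check on the table.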
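e) From $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ we get $\bar{X} = \frac{40.6 - 8.892}{7.7337} = 4.1$.

The fitted value at $X_h = 7$ is $\hat{Y}_h = 8.892 + 7.7337 \times 7 = 63.0279$.

$$S(\hat{Y}_h) = \sqrt{MSE \times \left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}}\right)} = \sqrt{5.107} = 2.260$$

With $t_{(0.975,\,8)} = 2.306$, the 95% CI is:

$$\left(\hat{Y}_h \pm t_{(1-\alpha/2,\,n-2)} S(\hat{Y}_h)\right) = (57.816, 68.240)$$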
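
The interval for the mean mileage at age 7 can be checked numerically. This is a sketch, assuming the values of MSE, $S_{XX}$ and $\bar{X}$ obtained in this solution and a hand-entered table value for $t_{(0.975,\,8)}$; the variable names are illustrative. The standard error is the square root of $MSE \times \left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}}\right)$.

```python
from math import sqrt

# Quantities from the exercise and the earlier parts of the solution
n = 10
b0, b1 = 8.892, 7.7337   # intercept and (more precise) slope estimate
x_bar = 4.1
sxx = 19.9
mse = 9.773375
x_h = 7                  # age at which the mean mileage is estimated

# Assumption: t_{0.975, 8} taken from a t table
t_crit = 2.306

y_h = b0 + b1 * x_h                                   # fitted value
var_y_h = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)    # estimated variance
se_y_h = sqrt(var_y_h)                                # standard error

ci_low = y_h - t_crit * se_y_h
ci_high = y_h + t_crit * se_y_h

print(f"Fitted value: {y_h:.4f}")
print(f"Variance: {var_y_h:.3f}, standard error: {se_y_h:.3f}")
print(f"95% CI for the mean mileage: ({ci_low:.3f}, {ci_high:.3f})")
```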

3.3 Coursework

  1. Attempt Problem 3 and Problem 4 in the past exam paper here
  2. Attempt Problem 2 and Problem 3 in the past exam paper here