3 3rd Tutorial
3.1 Recap
We want to test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$ at level $\alpha$. Rejecting $H_0$ means that the fitted model $\hat\beta_0 + \hat\beta_1 X$ has some ability to explain the variability observed in $Y$, i.e. there is a linear relationship between the explanatory variable and the response variable. Recall that the ANOVA table can be used to test for the existence of regression:
Source of Variation | SS | df | MS | F |
---|---|---|---|---|
Regression | SSR | 1 | MSR | $F_0 = MSR/MSE$ |
Error | SSE | $n-2$ | MSE | |
Total | SSTO | $n-1$ | | |
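where
$$
\begin{aligned}
SSR &= \sum (\hat y_i - \bar y)^2 = \hat\beta_1^2 S_{xx} \\
SSE &= \sum (\hat y_i - y_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}} \\
SSTO &= \sum (y_i - \bar y)^2 = S_{yy} \\
MSR &= \frac{SSR}{1} \\
MSE &= \frac{SSE}{n-2}
\end{aligned}
$$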
To assess how well the regression line fits the data, one can use the coefficient of determination $R^2$, defined as follows:
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$
which measures the proportion of the variability in $Y$ explained by the regression.
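The recap formulas can be sketched in a few lines of Python. This is only an illustration: the function name `anova_simple_regression` and the toy data are invented here and are not part of the tutorial.

```python
def anova_simple_regression(sxx, sxy, syy, n):
    """ANOVA quantities for simple linear regression from corrected sums."""
    ssr = sxy ** 2 / sxx      # regression sum of squares, equals beta1^2 * Sxx
    ssto = syy                # total sum of squares
    sse = ssto - ssr          # error sum of squares, Syy - Sxy^2 / Sxx
    msr = ssr / 1             # one regression degree of freedom
    mse = sse / (n - 2)       # n - 2 error degrees of freedom
    return {"SSR": ssr, "SSE": sse, "SSTO": ssto, "MSR": msr,
            "MSE": mse, "F0": msr / mse, "R2": ssr / ssto}

# Toy data, invented purely for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

table = anova_simple_regression(sxx, sxy, syy, n)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The statistic `F0` would then be compared with the $F_{1-\alpha,\,1,\,n-2}$ critical value from a table or a statistics library.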
3.2 Exercises
Exercise 3.1 A second-hand car dealer has 10 cars for sale. He decides to investigate the relation between a car's age $X$ (in years) and its mileage $Y$ (in thousands of miles) using the simple linear regression model. The dealer reported the following: the mean and standard deviation of the cars' mileage are 40.6 and 11.87153, respectively. The correlation coefficient between $X$ and $Y$ is 0.9687105. The estimated simple regression model is $\hat Y = 8.892 + 7.733X$.
- Obtain $S_{YY}$, $S_{XX}$ and $S_{XY}$.
- Construct a 90% CI for the slope.
- Use ANOVA to test the significance of the linear relationship.
- What proportion of the total variation in mileage is explained by age?
- Compute the 95% CI for the mileage of cars aged 7 years.
Solution 3.1:
a)
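$$S_Y^2 = 11.87153^2 = \frac{S_{YY}}{n-1} \;\Rightarrow\; S_{YY} = 11.87153^2 \times 9 = 1268.399$$
Since $\hat\beta_1 = \dfrac{S_{XY}}{S_{XX}}$, we have $S_{XY} = \hat\beta_1 S_{XX}$.
Then,
$$r_{XY} = \frac{S_{XY}}{\sqrt{S_{XX}}\,\sqrt{S_{YY}}} = \frac{\hat\beta_1 S_{XX}}{\sqrt{S_{XX}}\,\sqrt{S_{YY}}} = \frac{\hat\beta_1 \sqrt{S_{XX}}}{\sqrt{1268.399}}$$
Therefore:
$$0.9687105 = \frac{7.733\,\sqrt{S_{XX}}}{\sqrt{1268.399}} \;\Rightarrow\; S_{XX} = 19.9$$
and $S_{XY} = \hat\beta_1 S_{XX} = 7.733 \times 19.9 \approx 153.9$.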
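As a quick numerical check of part a), the three sums can be recovered from the reported summary statistics. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import math

n = 10
s_y = 11.87153       # reported standard deviation of mileage
r_xy = 0.9687105     # reported correlation between age and mileage
beta1 = 7.733        # reported slope of the fitted line

syy = s_y ** 2 * (n - 1)                      # S_YY = (n - 1) * S_Y^2
sxx = (r_xy * math.sqrt(syy) / beta1) ** 2    # solve r = beta1 * sqrt(Sxx) / sqrt(Syy)
sxy = beta1 * sxx                             # S_XY = beta1 * S_XX

print(f"S_YY = {syy:.3f}, S_XX = {sxx:.2f}, S_XY = {sxy:.1f}")
```

Small differences in the last digits relative to the hand calculation come from rounding $S_{XX}$ to 19.9 before computing $S_{XY}$.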
b)
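$$SSE = S_{YY} - \frac{S_{XY}^2}{S_{XX}} = 78.187, \qquad MSE = \frac{SSE}{n-2} = 9.773375, \qquad S(\hat\beta_1) = \sqrt{\frac{MSE}{S_{XX}}} = 0.7$$
Therefore, with $\alpha = 0.10$ and $t_{(0.95,\,8)} = 1.8595$, the 90% CI for the slope is:
$$\hat\beta_1 \pm t_{(1-\alpha/2,\,n-2)} \times S(\hat\beta_1) = (6.432,\ 9.035)$$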
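The interval in part b) can be reproduced approximately as follows. This sketch makes two assumptions: the critical value $t_{(0.95,\,8)} = 1.8595$ is hardcoded from a t table, and the slope is taken as 7.7337, the unrounded value that appears in part e) of the solution.

```python
import math

n = 10
syy, sxx = 1268.399, 19.9    # sums obtained in part a)
beta1 = 7.7337               # assumption: unrounded slope, as used in part e)
t_crit = 1.8595              # assumption: t(0.95; 8) read from a t table

sse = syy - beta1 ** 2 * sxx          # equals Syy - Sxy^2 / Sxx since Sxy = beta1 * Sxx
mse = sse / (n - 2)                   # error mean square
se_beta1 = math.sqrt(mse / sxx)       # standard error of the slope
ci = (beta1 - t_crit * se_beta1, beta1 + t_crit * se_beta1)

print(f"SSE = {sse:.3f}, MSE = {mse:.3f}, S(beta1) = {se_beta1:.3f}")
print(f"90% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The endpoints agree with the hand calculation up to rounding of the intermediate values.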
c)
Source of Variation | SS | df | MS | F |
---|---|---|---|---|
Regression | 1190.22 | 1 | 1190.22 | 121.787 |
Error | 78.187 | 8 | 9.77 | |
Total | 1268.399 | 9 | | |

$$F_{0.95,\,1,\,8} = 5.32 < 121.787 = F_0$$
Therefore $H_0$ is rejected, which implies the existence of a linear relationship between age and mileage.
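The ANOVA entries and the test decision can be checked with a short sketch. The critical value $F_{0.95,\,1,\,8} \approx 5.318$ is hardcoded from an F table, which is an assumption of this example rather than something computed here.

```python
n = 10
ssto = 1268.399     # total sum of squares, S_YY
sse = 78.187        # error sum of squares from part b)

ssr = ssto - sse            # regression sum of squares
msr = ssr / 1               # regression mean square (1 df)
mse = sse / (n - 2)         # error mean square (n - 2 df)
f0 = msr / mse              # observed F statistic
f_crit = 5.318              # assumption: F(0.95; 1, 8) read from an F table
reject_h0 = f0 > f_crit     # reject H0: beta1 = 0 when F0 exceeds the critical value

print(f"SSR = {ssr:.2f}, MSR = {msr:.2f}, MSE = {mse:.2f}, F0 = {f0:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```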
d)
$$R^2 = r_{XY}^2 = 0.9687105^2 = 0.9384$$
so about 93.8% of the total variation in mileage is explained by age.
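The step $R^2 = r_{XY}^2$ is not specific to this data set; it holds for any simple linear regression. A short derivation, using only the recap identities $SSR = \hat\beta_1^2 S_{XX}$ and $SSTO = S_{YY}$ together with $\hat\beta_1 = S_{XY}/S_{XX}$:

$$
R^2 = \frac{SSR}{SSTO} = \frac{\hat\beta_1^2 S_{XX}}{S_{YY}} = \frac{S_{XY}^2}{S_{XX}^2} \cdot \frac{S_{XX}}{S_{YY}} = \frac{S_{XY}^2}{S_{XX}\, S_{YY}} = r_{XY}^2
$$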
e)
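The fitted value is $\hat Y_h = 8.892 + 7.7337 \times 7 = 63.0279$.
From $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$ we get $\bar X = \dfrac{40.6 - 8.892}{7.7337} = 4.1$.
The standard error of the estimated mean response is the square root of its estimated variance:
$$S(\hat Y_h) = \sqrt{MSE \times \left(\frac{1}{n} + \frac{(X_h - \bar X)^2}{S_{XX}}\right)} = \sqrt{5.107} = 2.260$$
The 95% CI, with $t_{(0.975,\,8)} = 2.306$, is:
$$\hat Y_h \pm t_{(1-\alpha/2,\,n-2)}\, S(\hat Y_h) = 63.0279 \pm 2.306 \times 2.260 = (57.82,\ 68.24)$$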
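A minimal sketch of the interval for the mean response at $X_h = 7$, taking the standard error as the square root of the variance term $MSE\,(1/n + (X_h - \bar X)^2 / S_{XX})$. The critical value $t_{(0.975,\,8)} = 2.306$ is hardcoded from a t table, which is an assumption of this example.

```python
import math

n = 10
mse, sxx = 9.773375, 19.9     # from part b) and part a)
b0, b1 = 8.892, 7.7337        # intercept and slope used in the fitted value
ybar = 40.6                   # reported mean mileage
x_h = 7                       # age at which the mean mileage is estimated
t_crit = 2.306                # assumption: t(0.975; 8) read from a t table

xbar = (ybar - b0) / b1                                # from b0 = ybar - b1 * xbar
y_hat = b0 + b1 * x_h                                  # fitted value at x_h
var_fit = mse * (1 / n + (x_h - xbar) ** 2 / sxx)      # estimated variance of the fit
se_fit = math.sqrt(var_fit)                            # standard error = sqrt(variance)
ci = (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)

print(f"xbar = {xbar:.2f}, fitted value = {y_hat:.4f}")
print(f"variance = {var_fit:.3f}, standard error = {se_fit:.3f}")
print(f"95% CI for the mean response: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Note that 5.107 is the variance term; using it directly in place of its square root would make the interval roughly twice as wide as it should be.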