3 3rd Tutorial
Exercise 3.1 Appendix C in the textbook gives data related to the number of tractors in 30 serially numbered villages of Doraha development block in Punjab (India). Select (1) WR and (2) WOR simple random sample of 10 villages using direct approach method.
Village No. | Village name | Tractors | Tube wells | Irrigated area |
---|---|---|---|---|
1 | Ajnaud | 20 | 102 | 281 |
2 | Aracha | 19 | 126 | 337 |
3 | Afjullapur | 5 | 31 | 77 |
4 | Alampur | 10 | 39 | 108 |
5 | Bishanpur | 12 | 85 | 191 |
6 | Shahpur | 21 | 130 | 208 |
7 | Sirthala | 44 | 222 | 698 |
8 | Sultanpur | 12 | 70 | 180 |
9 | Sihora | 20 | 219 | 458 |
10 | Hole | 10 | 115 | 288 |
11 | Kotli Afgana | 11 | 41 | 161 |
12 | Katari | 30 | 133 | 512 |
13 | Kartarpur | 15 | 55 | 123 |
14 | Katana Sahib | 3 | 50 | 100 |
15 | Kaddon | 36 | 249 | 675 |
16 | Gobindpura | 24 | 119 | 258 |
17 | Gurditpura | 17 | 95 | 143 |
18 | Gidrhi | 15 | 100 | 267 |
19 | Ghadani Kalan | 76 | 551 | 1178 |
20 | Ghadani K.hurd | 35 | 209 | 523 |
21 | Ghalouti | 46 | 380 | 583 |
22 | Ghanrash | 16 | 160 | 310 |
23 | Chankoian Kalan | 7 | 33 | 129 |
24 | Chankoian K.hurd | 21 | 142 | 184 |
25 | Chapran | 7 | 64 | 91 |
26 | Jaipura | 22 | 130 | 240 |
27 | Jahangir | 5 | 27 | 80 |
28 | Jandali | 21 | 118 | 429 |
29 | Jargarhi | 38 | 193 | 583 |
30 | Jarag | 59 | 360 | 888 |
\[\begin{align} &N=30\\ \ &\bar{Y}=\frac{\sum_{i=1}^{N} Y_i}{N}=22.5667\\ \ &\sigma^2 = \frac{\sum_{i=1}^{N}~(Y_i-\bar{Y})^2}{N}=273.3789\\ \ &\sigma=\sqrt{273.3789} = 16.5342 \end{align}\]
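The population values above can be reproduced directly from the Tractors column. The following is a minimal base-R sketch; the variable names (`tractors`, `sigma2`, `S2`) are illustrative choices, not names used by the textbook.

```r
# Tractor counts for the 30 villages, in serial-number order (from the table above)
tractors <- c(20, 19, 5, 10, 12, 21, 44, 12, 20, 10,
              11, 30, 15, 3, 36, 24, 17, 15, 76, 35,
              46, 16, 7, 21, 7, 22, 5, 21, 38, 59)

N      <- length(tractors)               # population size
Ybar   <- mean(tractors)                 # population mean
sigma2 <- sum((tractors - Ybar)^2) / N   # population variance (divisor N)
sigma  <- sqrt(sigma2)                   # population standard deviation
S2     <- sigma2 * N / (N - 1)           # population mean square (divisor N - 1)

round(c(N = N, Ybar = Ybar, sigma2 = sigma2, sigma = sigma, S2 = S2), 4)
```

Note that R's built-in `var()` divides by N - 1, so `var(tractors)` returns the mean square (282.8057, the value used in the WOR variance formula later), not the population variance with divisor N.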
Here the village is the sampling unit. The villages in the population are already serially numbered; otherwise, numbering them would be the first step of sample selection. Refer to Appendix B and use the first column, dropping the last two digits of each four-digit number. The first random number formed this way is 11, and the subsequent random numbers are \(5, 26, 11, \ldots, 3\).
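The selection step can be sketched in R as follows. This is only an illustration under stated assumptions: the in-range numbers follow the order reported in this tutorial, while the out-of-range values (47, 83, 65, 0, 92) are hypothetical fillers standing in for table entries that do not correspond to any village; the actual contents of Appendix B are not reproduced here.

```r
# Two-digit random numbers as they might be read off a random number table.
# Values in 1..30 follow the order reported in this tutorial;
# 47, 83, 65, 0 and 92 are hypothetical out-of-range fillers.
random_numbers <- c(11, 47, 5, 26, 83, 11, 11, 24, 65, 12, 22, 0, 9, 3, 92, 7, 25)

N <- 30   # population size
n <- 10   # sample size

# Keep only numbers that correspond to a village serial number (1..N)
valid <- random_numbers[random_numbers >= 1 & random_numbers <= N]

wr_sample  <- valid[1:n]           # WR: repeated numbers are kept
wor_sample <- unique(valid)[1:n]   # WOR: repeats are skipped, later numbers fill in

wr_sample
wor_sample
```

In practice one would usually let R generate the numbers, for example `sample(1:30, 10, replace = TRUE)` for WR and `sample(1:30, 10)` for WOR; the resulting sample then depends on the random seed.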
(1) By selecting the first 10 random numbers from 1 to 30, without discarding repetitions (WR), we obtain the serial numbers of the villages in the sample. These are given below along with their variable values (number of tractors).
Village | 11 | 5 | 26 | 11 | 11 | 24 | 12 | 22 | 9 | 3 |
---|---|---|---|---|---|---|---|---|---|---|
Tractors | 11 | 12 | 22 | 11 | 11 | 21 | 30 | 16 | 20 | 5 |
One can see that village 11 appears three times in this with-replacement simple random sample, which is allowed because WR sampling permits repeated selection of a unit.
\[\begin{align} &n=10\\ &\bar{y}=\frac{\sum_{i=1}^{n} y_i}{n}=15.9\\ &s^2 = \frac{\sum_{i=1}^{n} (y_i-\bar{y})^2}{n-1}=53.8778\\ &\text{Sampling variance:}~V(\bar{y}) = \frac{\sigma^2}{n}=\frac{273.3789}{10} = 27.3379\\ &\text{Standard error of the sample mean:}~SE(\bar{y})=\sqrt{V(\bar{y})} = 5.2286 \end{align}\]
The approximate \(95\%\) confidence interval for the population mean \(\bar{Y}\) is given by \[\bar{y}\pm2~SE(\bar{y})\] \[\Rightarrow \bar{Y} \in [5.4428,26.3572]\]
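The WR calculations for part (1) can be checked with a few lines of base R. This is a sketch only: the sample values are copied from the table above, `sigma2` is the population variance computed earlier, and the variable names are illustrative. The last line also computes the sample-based estimate \(s^2/n\) of the sampling variance.

```r
y      <- c(11, 12, 22, 11, 11, 21, 30, 16, 20, 5)  # tractors in the WR sample
sigma2 <- 273.3789                                   # population variance (divisor N)

n    <- length(y)
ybar <- mean(y)                # sample mean
s2   <- var(y)                 # sample mean square (divisor n - 1)
V    <- sigma2 / n             # sampling variance of the mean under SRSWR
SE   <- sqrt(V)                # standard error of the sample mean
ci   <- ybar + c(-2, 2) * SE   # approximate 95% confidence limits
v    <- s2 / n                 # estimate of V using the sample alone

round(c(ybar = ybar, s2 = s2, V = V, SE = SE, lower = ci[1], upper = ci[2], v = v), 4)
```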
An unbiased estimator of \(V(\bar{y})\) is: \[v(\bar{y})=\frac{s^2}{n}=\frac{53.8778}{10}= 5.38778\]
(2) In a without-replacement sample, any repetition (village 11 in the present case) is omitted, and another random number is selected as its replacement. The next random numbers from 1 to 30 are 7 and 25. Thus the WOR simple random sample of 10 villages from the population under study has the following serial numbers:
Village | 11 | 5 | 26 | 24 | 12 | 22 | 9 | 3 | 7 | 25 |
---|---|---|---|---|---|---|---|---|---|---|
Tractors | 11 | 12 | 22 | 21 | 30 | 16 | 20 | 5 | 44 | 7 |
\[\begin{align} &n=10\\ &\bar{y}=\frac{\sum y_i}{n}=18.8\\ &s^2 = \frac{\sum (y_i-\bar{y})^2}{n-1}=135.7333\\ &\text{Sampling variance:}~V(\bar{y}) = \frac{N-n}{Nn}S^2=\frac{30-10}{30\times 10}\times 282.8057 = 18.8537,\quad\text{where}~S^2=\frac{N}{N-1}\sigma^2=282.8057\\ &\text{Standard error of the sample mean:}~SE(\bar{y})=\sqrt{V(\bar{y})} = 4.3421 \end{align}\]
The \(95\%\) confidence interval for the population mean \(\bar{Y}\) is given by \[\bar{y}\pm2~SE(\bar{y})\] \[\Rightarrow \bar{Y} \in [10.1158,27.4842]\]
An unbiased estimator of \(V(\bar{y})\) is \[v(\bar{y})=\frac{N-n}{Nn}~s^2=\frac{30-10}{30\times10}\times135.7333= 9.0489\]
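The WOR calculations for part (2) follow the same pattern, with the factor \((N-n)/(Nn)\) replacing \(1/n\). A minimal base-R sketch, with sample values copied from the table above and illustrative variable names:

```r
y  <- c(11, 12, 22, 21, 30, 16, 20, 5, 44, 7)  # tractors in the WOR sample
N  <- 30                                        # population size
S2 <- 282.8057                                  # population mean square (divisor N - 1)

n    <- length(y)
ybar <- mean(y)                # sample mean
s2   <- var(y)                 # sample mean square (divisor n - 1)
fpc  <- (N - n) / (N * n)      # 1/n times the finite population correction
V    <- fpc * S2               # sampling variance of the mean under SRSWOR
SE   <- sqrt(V)                # standard error of the sample mean
ci   <- ybar + c(-2, 2) * SE   # approximate 95% confidence limits
v    <- fpc * s2               # unbiased estimate of V from the sample

round(c(ybar = ybar, s2 = s2, V = V, SE = SE, lower = ci[1], upper = ci[2], v = v), 4)
```

Because \((N-n)/(Nn) < 1/n\), the WOR variance is always smaller than the WR variance computed from the same mean square.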
Exercise 3.2 The height (in cm) of 6 students of M.Sc., majoring in statistics, from Punjab Agricultural University, Ludhiana was recorded during 1985. The data, so obtained, are given below:
Student | Name | Height |
---|---|---|
1 | Sarjinder Singh | 168 |
2 | Gurmeet Singh | 175 |
3 | Varinder Kumar | 185 |
4 | Sukhjinder Singh | 173 |
5 | Denivder Kumar | 171 |
6 | Gulshan Kumar | 172 |
- Calculate (a) population mean \(\bar{Y}\), and (b) population variance \(\sigma^2\).
- Enumerate all possible SRS with-replacement samples of size \(n=2\). Obtain the sampling distribution of the mean, and hence show that:
- \(E(\bar{y}) = \bar{Y}\)
- \(V(\bar{y}) =\frac{\sigma^2}{n}\)
- \(E(s^2) = \sigma^2\)
- \(E[v(\bar{y})] = V(\bar{y})\)
Solution by R:
1.a
## [1] 174
1.b
## [1] 28.66667
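The two values in part 1 can be obtained as in the sketch below. It assumes the heights that appear in the enumerated samples later in this tutorial (168, 175, 185, 173, 171, 172); the variable names are illustrative.

```r
Y <- c(168, 175, 185, 173, 171, 172)   # heights in cm of the six students

N      <- length(Y)
Ybar   <- mean(Y)                  # 1.a population mean
sigma2 <- sum((Y - Ybar)^2) / N    # 1.b population variance (divisor N)

Ybar
sigma2
```

`var(Y)` is not used for 1.b because it divides by N - 1 and would return the mean square \(S^2\) instead of \(\sigma^2\).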
2.a
## [1] 36
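#install.packages("tidyverse")
library(tidyverse)
# Heights (cm) of the six students, as used in the enumerated samples
Y <- c(168, 175, 185, 173, 171, 172)
# All 6 x 6 = 36 ordered with-replacement samples of size 2, one per row
sample <- crossing(Var1 = Y, Var2 = Y)
# A single random WR draw of size 2, for illustration only (not used below)
samples <- sample(Y, 2, replace = TRUE)
# Mean of each of the 36 samples
(ybars <- apply(sample, 1, mean))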
## [1] 168.0 169.5 170.0 170.5 171.5 176.5 169.5 171.0 171.5 172.0 173.0 178.0
## [13] 170.0 171.5 172.0 172.5 173.5 178.5 170.5 172.0 172.5 173.0 174.0 179.0
## [25] 171.5 173.0 173.5 174.0 175.0 180.0 176.5 178.0 178.5 179.0 180.0 185.0
# In apply(), MARGIN = 1 means rows and MARGIN = 2 means columns; rowMeans(sample) gives the same result
(unbiased_mean <- mean(ybars))
## [1] 174
## [1] TRUE
2.b
## [1] 14.33333
## [1] 14.33333
2.c
#install.packages("matrixStats")
library(matrixStats)
ssquare <- rowVars(as.matrix(sample))
mean(ssquare)
## [1] 28.66667
2.d
## [1] 14.33333
## [1] 14.33333
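All four identities of part 2 can be verified in one pass over the 36 equally likely samples. The sketch below uses base R only (`expand.grid` in place of `crossing`, which orders the rows differently but yields the same set of samples); the names `samples`, `E_ybar`, `V_ybar`, `E_s2` and `E_v` are illustrative.

```r
Y <- c(168, 175, 185, 173, 171, 172)   # heights in cm
N <- length(Y)
n <- 2
sigma2 <- sum((Y - mean(Y))^2) / N     # population variance (divisor N)

# All N^n = 36 ordered WR samples, one per row
samples <- as.matrix(expand.grid(first = Y, second = Y))

ybars <- rowMeans(samples)          # sample means
s2    <- apply(samples, 1, var)     # sample mean squares (divisor n - 1)
v     <- s2 / n                     # variance estimates v(ybar)

table(ybars) / nrow(samples)        # sampling distribution of the mean

E_ybar <- mean(ybars)               # expectation of ybar
V_ybar <- mean((ybars - E_ybar)^2)  # variance of ybar over all samples
E_s2   <- mean(s2)                  # expectation of s^2
E_v    <- mean(v)                   # expectation of v(ybar)

all.equal(E_ybar, mean(Y))          # E(ybar) = Ybar
all.equal(V_ybar, sigma2 / n)       # V(ybar) = sigma^2 / n
all.equal(E_s2, sigma2)             # E(s^2) = sigma^2
all.equal(E_v, V_ybar)              # E[v(ybar)] = V(ybar)
```

Since every sample has probability 1/36, expectations reduce to plain averages over the rows.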
Exercise 3.3 From the data given in Example 3.2, enumerate all the SRS without-replacement samples of size \(n=2\), and write down the sampling distribution of the mean. Using this distribution, show that:
- \(E(\bar{y})=\bar{Y}\)
- \(Var(\bar{y})=\frac{N-n}{Nn}S^2\)
- \(E(s^2)=S^2\)
- \(E(v(\bar{y}))=V(\bar{y})\)
Solution by R: a.
## [1] 15
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 168 168 168 168 168 175 175 175 175 185 185 185 173 173
## [2,] 175 185 173 171 172 185 173 171 172 173 171 172 171 172
## [,15]
## [1,] 171
## [2,] 172
## [1] 171.5 176.5 170.5 169.5 170.0 180.0 174.0 173.0 173.5 179.0 178.0 178.5
## [13] 172.0 172.5 171.5
# In apply(), MARGIN = 1 means rows and MARGIN = 2 means columns; here each column is a sample, so colMeans(samples) gives the same result
(unbiased_mean <- mean(ybars))
## [1] 174
## [1] 34.4
## [1] 11.46667
## [1] 11.46667
- Similarly to what was done previously
## [1] 34.4
## [1] 17.2
## [1] 17.2
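The four identities for the without-replacement case can be checked with the base-R sketch below, which enumerates the 15 unordered samples with `combn`. Variable names are illustrative. Note that \(v(\bar{y})\) is taken here as \(\frac{N-n}{Nn}s^2\); with that factor its average comes to 11.46667 and matches \(V(\bar{y})\), whereas averaging \(s^2/n\) without the finite population correction gives 17.2.

```r
Y <- c(168, 175, 185, 173, 171, 172)   # heights in cm
N <- length(Y)
n <- 2
S2 <- var(Y)                           # population mean square (divisor N - 1)

# All choose(6, 2) = 15 WOR samples, one per column
samples <- combn(Y, n)

ybars <- colMeans(samples)          # sample means
s2    <- apply(samples, 2, var)     # sample mean squares
fpc   <- (N - n) / (N * n)          # 1/n times the finite population correction
v     <- fpc * s2                   # variance estimates v(ybar)

table(ybars) / ncol(samples)        # sampling distribution of the mean

E_ybar <- mean(ybars)               # expectation of ybar
V_ybar <- mean((ybars - E_ybar)^2)  # variance of ybar over all samples
E_s2   <- mean(s2)                  # expectation of s^2
E_v    <- mean(v)                   # expectation of v(ybar)

all.equal(E_ybar, mean(Y))          # E(ybar) = Ybar
all.equal(V_ybar, fpc * S2)         # Var(ybar) = (N - n) / (N n) * S^2
all.equal(E_s2, S2)                 # E(s^2) = S^2
all.equal(E_v, V_ybar)              # E[v(ybar)] = V(ybar)
```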
Village | Tractors |
---|---|
43 | 14 |
61 | 8 |
58 | 15 |
62 | 39 |
47 | 9 |
34 | 14 |
11 | 11 |
43 | 19 |
5 | 12 |
53 | 18 |
Average number of tractors per village in the block is:
\[\bar{y}=\frac{1}{10}\sum_{i=1}^{10}y_i = \frac{159}{10}=15.9\] To find the standard error, we first calculate the sample mean square:
\[s^2=\frac{1}{n-1}\left(\sum_{i=1}^{n}y_i^2-n\bar{y}^2\right)=78.3222\] The estimate of the variance is given by: \[v(\bar{y})=\frac{s^2}{n}=\frac{78.3222}{10}= 7.83222\]
The estimated standard error of the sample mean is: \[se(\bar{y})=\sqrt{7.83222}=2.7986\]
The lower and upper limits of the confidence interval for \(\bar{Y}\) are given by:
\[\begin{align} \bar{y}\pm~2\times~se(\bar{y}) \Rightarrow \bar{Y} \in [10.3028 , 21.4972] \end{align}\]
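These last calculations can be reproduced with the short base-R sketch below, using the ten tractor counts from the table above and the computational form of the sample mean square. Variable names are illustrative.

```r
y <- c(14, 8, 15, 39, 9, 14, 11, 19, 12, 18)   # tractors in the 10 sampled villages

n    <- length(y)
ybar <- mean(y)                              # sample mean
s2   <- (sum(y^2) - n * ybar^2) / (n - 1)    # sample mean square, computational form
v    <- s2 / n                               # estimated variance of the mean
se   <- sqrt(v)                              # estimated standard error
ci   <- ybar + c(-2, 2) * se                 # confidence limits for the population mean

all.equal(s2, var(y))                        # same value as R's built-in var()
round(c(ybar = ybar, s2 = s2, v = v, se = se, lower = ci[1], upper = ci[2]), 4)
```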