3 3rd Tutorial

Exercise 3.1 Appendix C in the textbook gives data on the number of tractors in 30 serially numbered villages of the Doraha development block in Punjab (India). Select (1) a with-replacement (WR) and (2) a without-replacement (WOR) simple random sample of 10 villages using the direct approach method.

Village No. Village name Tractors Tube wells Irrigated area
1 Ajnaud 20 102 281
2 Aracha 19 126 337
3 Afjullapur 5 31 77
4 Alampur 10 39 108
5 Bishanpur 12 85 191
6 Shahpur 21 130 208
7 Sirthala 44 222 698
8 Sultanpur 12 70 180
9 Sihora 20 219 458
10 Hole 10 115 288
11 Kotli Afgana 11 41 161
12 Katari 30 133 512
13 Kartarpur 15 55 123
14 Katana Sahib 3 50 100
15 Kaddon 36 249 675
16 Gobindpura 24 119 258
17 Gurditpura 17 95 143
18 Gidrhi 15 100 267
19 Ghadani Kalan 76 551 1178
20 Ghadani K.hurd 35 209 523
21 Ghalouti 46 380 583
22 Ghanrash 16 160 310
23 Chankoian Kalan 7 33 129
24 Chankoian K.hurd 21 142 184
25 Chapran 7 64 91
26 Jaipura 22 130 240
27 Jahangir 5 27 80
28 Jandali 21 118 429
29 Jargarhi 38 193 583
30 Jarag 59 360 888

\[\begin{align} &N=30\\ \ &\bar{Y}=\frac{\sum_{i=1}^{N} Y_i}{N}=22.5667\\ \ &\sigma^2 = \frac{\sum_{i=1}^{N}~(Y_i-\bar{Y})^2}{N}=273.3789\\ \ &\sigma=\sqrt{273.3789} = 16.5342 \end{align}\]
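The population figures above can be checked with a few lines of base R. This is only a minimal sketch: the vector name `tractors` is introduced here for illustration and holds the Tractors column of the table.

```r
# Tractors column of the 30 villages, in serial order (copied from the table)
tractors <- c(20, 19, 5, 10, 12, 21, 44, 12, 20, 10,
              11, 30, 15, 3, 36, 24, 17, 15, 76, 35,
              46, 16, 7, 21, 7, 22, 5, 21, 38, 59)

N      <- length(tractors)               # population size
Ybar   <- mean(tractors)                 # population mean
sigma2 <- sum((tractors - Ybar)^2) / N   # population variance (divisor N)
sigma  <- sqrt(sigma2)                   # population standard deviation

round(c(N = N, Ybar = Ybar, sigma2 = sigma2, sigma = sigma), 4)
```

Note that `var(tractors)` would divide by N - 1 instead, which is why the sum of squares is divided by N explicitly here.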

Here the village is the sampling unit. The villages in the population are already serially numbered; otherwise, numbering them would be the first step in selecting the sample. Refer to Appendix B and use the first column, dropping the last two digits of each four-digit number. The first random number formed this way is 11, and the subsequent random numbers are \(5, 26, 11, \ldots, 3\).
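The same selection rule can be imitated in R. The sketch below is an illustration under stated assumptions, not the textbook procedure itself: the four-digit numbers come from R's random number generator (with an arbitrary seed) instead of Appendix B, so the villages it selects will differ from those in this solution. The names `four_digit`, `valid`, `wr_sample` and `wor_sample` are hypothetical.

```r
N <- 30   # villages are serially numbered 1..N
n <- 10   # required sample size

set.seed(1)  # arbitrary seed, used only to make the draw repeatable

# Stand-in for a column of four-digit numbers from a random number table
four_digit <- sample(0:9999, size = 500, replace = TRUE)

# Drop the last two digits, leaving a number between 0 and 99
two_digit <- four_digit %/% 100

# Direct approach: keep numbers in 1..N and discard all others
valid <- two_digit[two_digit >= 1 & two_digit <= N]

# WR sample: the first n valid numbers, repetitions kept
wr_sample <- valid[1:n]

# WOR sample: repetitions skipped, the first n distinct valid numbers
wor_sample <- unique(valid)[1:n]

wr_sample
wor_sample
```

Base R's `sample(1:N, n, replace = TRUE)` and `sample(1:N, n)` give equivalent samples in one call; the longer version is shown only because it mirrors the table lookup step by step.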

(1) Taking the first 10 random numbers between 1 and 30, without discarding repetitions (WR), gives the serial numbers of the villages in the sample. They are listed below with their variable values (number of tractors).

Village 11 5 26 11 11 24 12 22 9 3
Tractors 11 12 22 11 11 21 30 16 20 5

The 11th village appears three times in this with-replacement simple random sample, where repeated selection of a unit is permitted.

\[\begin{align} &n=10\\ &\bar{y}=\frac{\sum_{i=1}^{n} y_i}{n}=15.9\\ &s^2 = \frac{\sum_{i=1}^{n} (y_i-\bar{y})^2}{n-1}=53.8778\\ &\text{Sampling variance:}~V(\bar{y}) = \frac{\sigma^2}{n}=\frac{273.3789}{10} = 27.3379\\ &\text{Standard error of the sample mean:}~SE(\bar{y})=\sqrt{V(\bar{y})} = 5.2286 \end{align}\]

The \(95\%\) confidence interval for the population mean \(\bar{Y}\) is given by \[\bar{y}\pm2~SE(\bar{y})\] \[\Rightarrow \bar{Y} \in [5.4428,26.3572]\]
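The WR calculations above can be reproduced in R as a check. This is a sketch in which the vector name `y_wr` is an assumption made for illustration, and the population variance is typed in from the earlier result.

```r
# Tractors in the WR sample (villages 11, 5, 26, 11, 11, 24, 12, 22, 9, 3)
y_wr   <- c(11, 12, 22, 11, 11, 21, 30, 16, 20, 5)
n      <- length(y_wr)
sigma2 <- 273.3789               # population variance computed earlier

ybar <- mean(y_wr)               # sample mean
s2   <- var(y_wr)                # sample mean square (divisor n - 1)
V    <- sigma2 / n               # sampling variance of ybar under SRSWR
SE   <- sqrt(V)                  # standard error of ybar
ci   <- ybar + c(-2, 2) * SE     # interval ybar -/+ 2 SE

round(c(ybar = ybar, s2 = s2, V = V, SE = SE), 4)
round(ci, 4)
```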

An unbiased estimator of \(V(\bar{y})\) is: \[v(\bar{y})=\frac{s^2}{n}=\frac{53.8778}{10}= 5.38778\]

(2) In a without-replacement sample, any repeated selection (the 11th village in the present case) is omitted, and another random number is drawn as its replacement. The next random numbers from 1 to 30 are 7 and 25. Thus the WOR simple random sample of 10 villages consists of the villages with the following serial numbers:

Village 11 5 26 24 12 22 9 3 7 25
Tractors 11 12 22 21 30 16 20 5 44 7

\[\begin{align} &n=10\\ &\bar{y}=\frac{\sum_{i=1}^{n} y_i}{n}=18.8\\ &s^2 = \frac{\sum_{i=1}^{n} (y_i-\bar{y})^2}{n-1}=135.7333\\ &S^2=\frac{N}{N-1}\sigma^2=\frac{30}{29}\times 273.3789=282.8057\\ &\text{Sampling variance:}~V(\bar{y}) = \frac{N-n}{Nn}S^2=\frac{30-10}{30\times 10}\times 282.8057 = 18.8537\\ &\text{Standard error of the sample mean:}~SE(\bar{y})=\sqrt{V(\bar{y})} = 4.3421 \end{align}\]

The \(95\%\) confidence interval for the population mean \(\bar{Y}\) is given by \[\bar{y}\pm2~SE(\bar{y})\] \[\Rightarrow \bar{Y} \in [10.1158,27.4842]\]

An unbiased estimator of \(V(\bar{y})\) is \[v(\bar{y})=\frac{N-n}{Nn}~s^2=\frac{30-10}{30\times10}\times135.7333= 9.0489\]
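For the WOR sample the same steps apply, with the population mean square \(S^2\) (divisor \(N-1\)) and the factor \(\frac{N-n}{Nn}\) replacing \(\frac{1}{n}\). The following R sketch recomputes the quantities; `y_wor` and `fpc_factor` are names introduced here for illustration only.

```r
# Tractors in the WOR sample (villages 11, 5, 26, 24, 12, 22, 9, 3, 7, 25)
y_wor  <- c(11, 12, 22, 21, 30, 16, 20, 5, 44, 7)
N      <- 30
n      <- length(y_wor)
sigma2 <- 273.3789                    # population variance (divisor N)

S2         <- N / (N - 1) * sigma2    # population mean square (divisor N - 1)
fpc_factor <- (N - n) / (N * n)       # multiplier for the variance of ybar under SRSWOR

ybar <- mean(y_wor)                   # sample mean
s2   <- var(y_wor)                    # sample mean square
V    <- fpc_factor * S2               # sampling variance of ybar
SE   <- sqrt(V)                       # standard error of ybar
ci   <- ybar + c(-2, 2) * SE          # interval ybar -/+ 2 SE
v    <- fpc_factor * s2               # unbiased estimator of V from the sample

round(c(ybar = ybar, s2 = s2, S2 = S2, V = V, SE = SE, v = v), 4)
round(ci, 4)
```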

Exercise 3.2 The height (in cm) of 6 students of M.Sc., majoring in statistics, from Punjab Agricultural University, Ludhiana was recorded during 1985. The data, so obtained, are given below:

Student Name Height
1 Sarjinder Singh 168
2 Gurmeet Singh 175
3 Varinder Kumar 185
4 Sukhjinder Singh 173
5 Denivder Kumar 171
6 Gulshan Kumar 172
  1. Calculate (a) the population mean \(\bar{Y}\), and (b) the population variance \(\sigma^2\).
  2. Enumerate all possible SRS with replacement samples of size n=2. Obtain the sampling distribution of the mean, and hence show that
     a. \(E(\bar{y}) = \bar{Y}\)
     b. \(V(\bar{y}) =\frac{\sigma^2}{n}\)
     c. \(E(s^2) = \sigma^2\)
     d. \(E[v(\bar{y})] = V(\bar{y})\)

Solution by R:
1.a

Y <- c(168,175,185,173,171,172)
N <- 6
(Ybar <- mean(Y))
## [1] 174

1.b

(Sigmasquare <- var(Y)*((N-1)/N))
## [1] 28.66667

2.a

n <- 2
(No.s<- N^n) #Number_of_all_possible_samples
## [1] 36
#install.packages("tidyverse")
library(tidyverse)
# All N^n ordered pairs (y_i, y_j): every possible WR sample of size 2, one per row
sample <- crossing(Var1 = Y, Var2 = Y)
# Mean of each sample (row)
(ybars <- apply(sample, 1, mean))
##  [1] 168.0 169.5 170.0 170.5 171.5 176.5 169.5 171.0 171.5 172.0 173.0 178.0
## [13] 170.0 171.5 172.0 172.5 173.5 178.5 170.5 172.0 172.5 173.0 174.0 179.0
## [25] 171.5 173.0 173.5 174.0 175.0 180.0 176.5 178.0 178.5 179.0 180.0 185.0
# In apply(), MARGIN = 1 means rows and 2 means columns; rowMeans(sample) is equivalent
(unbiased_mean <- mean(ybars))
## [1] 174
all.equal(unbiased_mean, Ybar) # E(ybar) equals Ybar, as required in (a)
## [1] TRUE

2.b

Ybarsquare <- (ybars)^2
(Variance_Of_ybar <- mean(Ybarsquare) -(unbiased_mean)^2)
## [1] 14.33333
(Sigmasquare/n) 
## [1] 14.33333
# that's the required in b

2.c

#install.packages("matrixStats")
library(matrixStats)
ssquare <- rowVars(as.matrix(sample))
mean(ssquare) 
## [1] 28.66667
# that's the required in c

2.d

(Sigmasquare/n)
## [1] 14.33333
(mean(ssquare/n)) 
## [1] 14.33333
# that's the required in d
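The exercise also asks for the sampling distribution of the mean itself. One way to tabulate it, sketched here with base R only (`expand.grid()` in place of `crossing()`, and the hypothetical names `pairs_wr` and `dist_wr`), is to count how often each value of the sample mean occurs among the 36 equally likely samples.

```r
Y <- c(168, 175, 185, 173, 171, 172)
N <- length(Y)
n <- 2

# All N^n = 36 ordered WR samples, one per row
pairs_wr <- expand.grid(first = Y, second = Y)
ybars    <- rowMeans(pairs_wr)

# Sampling distribution: each distinct mean with its probability
counts  <- table(ybars)
dist_wr <- data.frame(ybar = as.numeric(names(counts)),
                      prob = as.numeric(counts) / nrow(pairs_wr))
dist_wr

# Moments computed from the distribution
E_ybar <- sum(dist_wr$ybar * dist_wr$prob)
V_ybar <- sum((dist_wr$ybar - E_ybar)^2 * dist_wr$prob)
sigma2 <- mean((Y - mean(Y))^2)   # population variance (divisor N)

c(E_ybar = E_ybar, V_ybar = V_ybar, sigma2_over_n = sigma2 / n)
```

Working from the distribution table rather than from the raw list of 36 means gives the same expectation and variance, since each sample has probability 1/36.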

Exercise 3.3 From the data given in Exercise 3.2, enumerate all the SRS without replacement samples of size n=2, and write down the sampling distribution of the mean. Using this distribution, show that:

  1. \(E(\bar{y})=\bar{Y}\)
  2. \(V(\bar{y})=\frac{N-n}{Nn}S^2\)
  3. \(E(s^2)=S^2\)
  4. \(E(v(\bar{y}))=V(\bar{y})\)

Solution by R: a.

Y <- c(168,175,185,173,171,172)
N <- length(Y) # population size
n <- 2
(No.s<- choose(N,n)) # number of all possible WOR samples
## [1] 15
(sampless <- combn(Y,n))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]  168  168  168  168  168  175  175  175  175   185   185   185   173   173
## [2,]  175  185  173  171  172  185  173  171  172   173   171   172   171   172
##      [,15]
## [1,]   171
## [2,]   172
(ybars= apply(sampless,2, mean))
##  [1] 171.5 176.5 170.5 169.5 170.0 180.0 174.0 173.0 173.5 179.0 178.0 178.5
## [13] 172.0 172.5 171.5
# In apply(), MARGIN = 1 means rows and 2 means columns; colMeans(sampless) is equivalent
(unbiased_mean <- mean(ybars))
## [1] 174
b.

(Ssquare <- var(Y)) # var() divides by N - 1, so this is S^2
## [1] 34.4
Ybarsquare <- (ybars)^2
(Variance_Of_ybar <- mean(Ybarsquare) -(unbiased_mean)^2)
## [1] 11.46667
(N-n)/(N*n) * Ssquare
## [1] 11.46667
c. Proceeding as in Exercise 3.2:

ssquare <- colVars(as.matrix(sampless)) # s^2 of each sample (column); needs library(matrixStats)
mean(ssquare) 
## [1] 34.4
# that's the required in (c)

d.

(N-n)/(N*n) * Ssquare
## [1] 11.46667
(mean((N-n)/(N*n) * ssquare)) 
## [1] 11.46667
# that's the required in (d)
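The sampling distribution of the mean for the 15 WOR samples can be written down in the same way. The sketch below uses base R only; the names `samples_wor` and `dist_wor` are introduced here for illustration.

```r
Y <- c(168, 175, 185, 173, 171, 172)
N <- length(Y)
n <- 2

# All choose(N, n) = 15 WOR samples, one per column
samples_wor <- combn(Y, n)
ybars       <- colMeans(samples_wor)

# Sampling distribution: each distinct mean with its probability
counts   <- table(ybars)
dist_wor <- data.frame(ybar = as.numeric(names(counts)),
                       prob = as.numeric(counts) / ncol(samples_wor))
dist_wor

# Moments computed from the distribution, compared with the SRSWOR formula
E_ybar <- sum(dist_wor$ybar * dist_wor$prob)
V_ybar <- sum((dist_wor$ybar - E_ybar)^2 * dist_wor$prob)
S2     <- var(Y)                          # population mean square (divisor N - 1)

c(E_ybar = E_ybar, V_ybar = V_ybar, formula = (N - n) / (N * n) * S2)
```
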
Village Tractors
43 14
61 8
58 15
62 39
47 9
34 14
11 11
43 19
5 12
53 18

The estimated average number of tractors per village in the block is:

\[\bar{y}=\frac{1}{10}\sum_{i=1}^{10}y_i = \frac{159}{10}=15.9\]

To find the standard error, we first calculate the sample mean square:

\[s^2=\frac{1}{n-1}\left(\sum_{i=1}^{n}y_i^2-n\bar{y}^2\right)=\frac{3233-10\times 15.9^2}{9}=78.3222\]

The estimate of the variance is given by: \[v(\bar{y})=\frac{s^2}{n}=\frac{78.3222}{10}= 7.83222\]

The estimated standard error of the sample mean is: \[se(\bar{y})=\sqrt{7.83222}=2.7986\]

The lower and upper limits of the confidence interval for \(\bar{Y}\) are given by:

\[\begin{align} \bar{y}\pm~2\times~se(\bar{y}) \Rightarrow \bar{Y} \in [10.3028 , 21.4972] \end{align}\]
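These estimates can be recomputed in R from the ten sampled values. This is a minimal sketch in which the vector name `y` is an assumption; it uses the computational form of \(s^2\) with the sum of squares.

```r
# Tractors in the ten sampled villages (from the table above)
y <- c(14, 8, 15, 39, 9, 14, 11, 19, 12, 18)
n <- length(y)

ybar <- mean(y)                              # estimated mean per village
s2   <- (sum(y^2) - n * ybar^2) / (n - 1)    # sample mean square
v    <- s2 / n                               # estimated variance of ybar
se   <- sqrt(v)                              # estimated standard error
ci   <- ybar + c(-2, 2) * se                 # interval ybar -/+ 2 se

round(c(ybar = ybar, s2 = s2, v = v, se = se), 4)
round(ci, 4)
```

The built-in `var(y)` returns the same value as `s2`, which is a quick way to confirm the hand calculation.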






