3 3rd Tutorial

Exercise 3.1 Appendix C in the textbook gives data on the number of tractors in 30 serially numbered villages of the Doraha development block in Punjab (India). Select (1) a with-replacement (WR) and (2) a without-replacement (WOR) simple random sample of 10 villages using the direct approach method.

Village No.	Village name	Tractors	Tube wells	Irrigated area
1	Ajnaud	20	102	281
2	Aracha	19	126	337
3	Afjullapur	5	31	77
4	Alampur	10	39	108
5	Bishanpur	12	85	191
6	Shahpur	21	130	208
7	Sirthala	44	222	698
8	Sultanpur	12	70	180
9	Sihora	20	219	458
10	Hole	10	115	288
11	Kotli Afgana	11	41	161
12	Katari	30	133	512
13	Kartarpur	15	55	123
14	Katana Sahib	3	50	100
15	Kaddon	36	249	675
16	Gobindpura	24	119	258
17	Gurditpura	17	95	143
18	Gidrhi	15	100	267
19	Ghadani Kalan	76	551	1178
20	Ghadani K.hurd	35	209	523
21	Ghalouti	46	380	583
22	Ghanrash	16	160	310
23	Chankoian Kalan	7	33	129
24	Chankoian K.hurd	21	142	184
25	Chapran	7	64	91
26	Jaipura	22	130	240
27	Jahangir	5	27	80
28	Jandali	21	118	429
29	Jargarhi	38	193	583
30	Jarag	59	360	888

\[\begin{align} &N=30\\ \ &\bar{Y}=\frac{\sum_{i=1}^{N} Y_i}{N}=22.5667\\ \ &\sigma^2 = \frac{\sum_{i=1}^{N}~(Y_i-\bar{Y})^2}{N}=273.3789\\ \ &\sigma=\sqrt{273.3789} = 16.5342 \end{align}\]
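The population figures above can be reproduced with a few lines of R. This is only a sketch: the vector name `tractors` is an assumption made here for illustration, and its values are copied from the Tractors column of the table.

```r
# Tractor counts for the 30 villages, in serial-number order (from the table above)
tractors <- c(20, 19, 5, 10, 12, 21, 44, 12, 20, 10,
              11, 30, 15, 3, 36, 24, 17, 15, 76, 35,
              46, 16, 7, 21, 7, 22, 5, 21, 38, 59)

N <- length(tractors)                     # population size
Ybar <- mean(tractors)                    # population mean
sigma2 <- sum((tractors - Ybar)^2) / N    # population variance (divisor N)
sigma <- sqrt(sigma2)                     # population standard deviation

round(c(N = N, Ybar = Ybar, sigma2 = sigma2, sigma = sigma), 4)
```

Note that `var(tractors)` would use the divisor N - 1 instead, which is why the variance is computed explicitly here.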

Here the village is the sampling unit. The villages in the population are already serially numbered; otherwise, numbering them would be the first step in sample selection. Refer to Appendix B and use the first column, dropping the last two digits of each four-digit number to form two-digit random numbers. The first random number formed this way is 11, and the subsequent ones are \(5, 26, 11, \ldots, 3\).
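The selection rule can be sketched in R. The function name `direct_approach` and the vector `two_digit` are hypothetical: the numbers below are made up for illustration and are not the entries of Appendix B. The sketch assumes that random numbers falling outside 1 to N are skipped, and that repeated numbers are kept for WR sampling but dropped for WOR sampling.

```r
# Direct approach (sketch): read two-digit random numbers in order and keep
# only those that correspond to a serial number between 1 and N.
direct_approach <- function(two_digit, N, n, replace = TRUE) {
  keep <- two_digit[two_digit >= 1 & two_digit <= N]  # skip numbers outside 1..N
  if (!replace) keep <- unique(keep)                   # WOR: drop repeated numbers
  if (length(keep) < n) stop("not enough usable random numbers")
  keep[seq_len(n)]                                     # first n usable numbers
}

# Hypothetical two-digit random numbers (NOT taken from Appendix B)
two_digit <- c(11, 47, 5, 26, 83, 11, 30, 62, 2, 19, 5, 99, 14)

wr_draw  <- direct_approach(two_digit, N = 30, n = 5, replace = TRUE)
wor_draw <- direct_approach(two_digit, N = 30, n = 5, replace = FALSE)
wr_draw
wor_draw
```

With these made-up numbers the WR draw keeps the second occurrence of 11, while the WOR draw skips it and moves on to the next usable number.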

(1) Taking the first 10 random numbers from 1 to 30 without discarding repetitions (WR) gives the serial numbers of the villages in the sample. These are listed below along with the values of the study variable (number of tractors).

Village	11	5	26	11	11	24	12	22	9	3
Tractors	11	12	22	11	11	21	30	16	20	5

Village 11 has been selected three times in the with-replacement simple random sample, where repeated selection of units is permitted.

\[\begin{align} &n=10\\ \ &\bar{y}=\frac{\sum_{i=1}^{n} y_i}{n}=15.9\\ \ &s^2 = \frac{\sum_{i=1}^{n} (y_i-\bar{y})^2}{n-1}=53.8778\\ \ &\text{Sampling variance:}~V(\bar{y}) = \frac{\sigma^2}{n}=\frac{273.3789}{10} = 27.3379\\ \ &\text{Standard error of the sample mean:}~SE(\bar{y})=\sqrt{V(\bar{y})} = 5.2286 \end{align}\]

An approximate \(95\%\) confidence interval for the population mean \(\bar{Y}\) is given by \[\bar{y}\pm2~SE(\bar{y})\] \[\Rightarrow \bar{Y} \in [5.4428,26.3572]\]
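The WR calculations above can be checked with a short R sketch. The variable names are chosen here for illustration; the sample values come from the table for part (1), and the population variance is the value 273.3789 obtained earlier.

```r
# Tractor counts of the WR sample (villages 11, 5, 26, 11, 11, 24, 12, 22, 9, 3)
y_wr <- c(11, 12, 22, 11, 11, 21, 30, 16, 20, 5)
sigma2 <- 273.3789                 # population variance (divisor N), from above

n <- length(y_wr)
ybar <- mean(y_wr)                 # sample mean
s2 <- var(y_wr)                    # sample mean square (divisor n - 1)
V_ybar <- sigma2 / n               # sampling variance of ybar under SRSWR
SE_ybar <- sqrt(V_ybar)            # standard error of ybar
ci <- ybar + c(-2, 2) * SE_ybar    # approximate 95% confidence limits

round(c(ybar = ybar, s2 = s2, V = V_ybar, SE = SE_ybar), 4)
round(ci, 4)
```

The multiplier 2 follows the text; `qnorm(0.975)` would give the slightly smaller normal-theory value.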

An unbiased estimator of \(V(\bar{y})\) is \[v(\bar{y})=\frac{s^2}{n}=\frac{53.8778}{10}= 5.38778\]

(2) In a without-replacement sample, any repeated number (village 11 in the present case) is omitted, and a further random number is selected in its place. The two repeated draws of village 11 are dropped, so two replacements are needed; the next random numbers from 1 to 30 are 7 and 25. The WOR simple random sample of 10 villages therefore consists of the following serial numbers:

Village	11	5	26	24	12	22	9	3	7	25
Tractors	11	12	22	21	30	16	20	5	44	7

\[\begin{align} &n=10\\ \ &\bar{y}=\frac{\sum y_i}{n}=18.8\\ \ &s^2 = \frac{\sum (y_i-\bar{y})^2}{n-1}=135.7333\\ \ &S^2 = \frac{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}{N-1}=282.8057\\ \ &\text{Sampling variance:}~V(\bar{y}) = \frac{N-n}{Nn}S^2=\frac{30-10}{30\times 10}\times 282.8057 = 18.8537\\ \ &\text{Standard error of the sample mean:}~SE(\bar{y})=\sqrt{V(\bar{y})} = 4.3421 \end{align}\]

An approximate \(95\%\) confidence interval for the population mean \(\bar{Y}\) is given by \[\bar{y}\pm2~SE(\bar{y})\] \[\Rightarrow \bar{Y} \in [10.1158,27.4842]\]

An unbiased estimator of \(V(\bar{y})\) is \[v(\bar{y})=\frac{N-n}{Nn}~s^2=\frac{30-10}{30\times10}\times135.7333= 9.0489\]
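The WOR calculations can be checked the same way. This is a sketch with illustrative variable names; `S2` is the value 282.8057 used in the sampling variance above, that is, the population mean square with divisor N - 1.

```r
# Tractor counts of the WOR sample (villages 11, 5, 26, 24, 12, 22, 9, 3, 7, 25)
y_wor <- c(11, 12, 22, 21, 30, 16, 20, 5, 44, 7)
N <- 30
S2 <- 282.8057                     # population mean square (divisor N - 1)

n <- length(y_wor)
fpc <- (N - n) / (N * n)           # factor (N - n) / (N n) used under SRSWOR
ybar <- mean(y_wor)                # sample mean
s2 <- var(y_wor)                   # sample mean square (divisor n - 1)
V_ybar <- fpc * S2                 # sampling variance of ybar
SE_ybar <- sqrt(V_ybar)            # standard error of ybar
ci <- ybar + c(-2, 2) * SE_ybar    # approximate 95% confidence limits
v_ybar <- fpc * s2                 # unbiased estimate of V(ybar)

round(c(ybar = ybar, s2 = s2, V = V_ybar, SE = SE_ybar, v = v_ybar), 4)
round(ci, 4)
```

Compared with the WR case, the only structural change is the factor `fpc`, which replaces 1/n.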

Exercise 3.2 The height (in cm) of 6 students of M.Sc., majoring in statistics, from Punjab Agricultural University, Ludhiana was recorded during 1985. The data, so obtained, are given below:

Student	Name	Height
1	Sarjinder Singh	168
2	Gurmeet Singh	175
3	Varinder Kumar	185
4	Sukhjinder Singh	173
5	Denivder Kumar	171
6	Gulshan Kumar	172

Calculate (a) population mean \(\bar{Y}\), and (b) population variance \(\sigma^2\).
Enumerate all possible simple random samples of size \(n=2\) drawn with replacement. Obtain the sampling distribution of the mean, and hence show that

\(E(\bar{y}) = \bar{Y}\)
\(V(\bar{y}) =\frac{\sigma^2}{n}\)
\(E(s^2) = \sigma^2\)
\(E[v(\bar{y})] = V(\bar{y})\)

Solution by R:
1.a

Y <- c(168,175,185,173,171,172)
N <- 6
(Ybar <- mean(Y))

## [1] 174

1.b

(Sigmasquare <- var(Y)*((N-1)/N))

## [1] 28.66667

2.a

n <- 2
(No.s<- N^n) #Number_of_all_possible_samples

## [1] 36

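#install.packages("tidyverse")
library(tidyverse)
# crossing() (from tidyr) lists all N^n = 36 ordered pairs, one sample per row.
# The object is called `sample` here, which is also the name of a base R function.
sample <- crossing(Var1 = Y, Var2 = Y)
# mean of each sample (row)
(ybars <- apply(sample, 1, mean))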

##  [1] 168.0 169.5 170.0 170.5 171.5 176.5 169.5 171.0 171.5 172.0 173.0 178.0
## [13] 170.0 171.5 172.0 172.5 173.5 178.5 170.5 172.0 172.5 173.0 174.0 179.0
## [25] 171.5 173.0 173.5 174.0 175.0 180.0 176.5 178.0 178.5 179.0 180.0 185.0

# in apply(), 1 means "over rows" and 2 means "over columns"; rowMeans(sample) gives the same result
(unbiased_mean <- mean(ybars))

## [1] 174

unbiased_mean==Ybar # E(ybar) equals Ybar, as required in (a)

## [1] TRUE

2.b

Ybarsquare <- (ybars)^2
(Variance_Of_ybar <- mean(Ybarsquare) -(unbiased_mean)^2)

## [1] 14.33333

(Sigmasquare/n)

## [1] 14.33333

# V(ybar) equals sigma^2/n, as required in (b)

2.c

#install.packages("matrixStats")
library(matrixStats)
ssquare <- rowVars(as.matrix(sample))
mean(ssquare)

## [1] 28.66667

# E(s^2) equals sigma^2, as required in (c)

2.d

(Sigmasquare/n)

## [1] 14.33333

(mean(ssquare/n))

## [1] 14.33333

# E[v(ybar)] equals V(ybar), as required in (d)
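The exercise also asks for the sampling distribution of the mean itself. A minimal base R sketch, assuming the same six heights, tabulates each distinct value of the sample mean with its probability; `expand.grid()` is used here in place of `crossing()` so that no extra package is needed, and the names `wr_samples` and `dist_wr` are illustrative.

```r
Y <- c(168, 175, 185, 173, 171, 172)   # heights used in the solution above

# All N^2 = 36 ordered samples of size 2 drawn with replacement
wr_samples <- expand.grid(first = Y, second = Y)
ybars <- rowMeans(wr_samples)

# Sampling distribution: each distinct value of ybar with its probability
dist_wr <- table(ybars) / nrow(wr_samples)
dist_wr

# Expectation and variance computed from the distribution itself
values <- as.numeric(names(dist_wr))
probs  <- as.numeric(dist_wr)
E_ybar <- sum(values * probs)
V_ybar <- sum((values - E_ybar)^2 * probs)
c(E_ybar = E_ybar, V_ybar = V_ybar)
```

Computing the moments from the tabulated distribution gives the same values as averaging over the 36 equally likely samples.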

Exercise 3.3 From the data given in Exercise 3.2, enumerate all simple random samples of size \(n=2\) drawn without replacement, and write down the sampling distribution of the mean. Using this distribution, show that:

\(E(\bar{y})=\bar{Y}\)
\(V(\bar{y})=\frac{N-n}{Nn}S^2\)
\(E(s^2)=S^2\)
\(E(v(\bar{y}))=V(\bar{y})\)

Solution by R: a.

Y <- c(168,175,185,173,171,172)
N <- length(Y) # population size, 6
n <- 2
(No.s<- choose(N,n)) # number of possible samples without replacement

## [1] 15

(sampless <- combn(Y,n))

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]  168  168  168  168  168  175  175  175  175   185   185   185   173   173
## [2,]  175  185  173  171  172  185  173  171  172   173   171   172   171   172
##      [,15]
## [1,]   171
## [2,]   172

(ybars= apply(sampless,2, mean))

##  [1] 171.5 176.5 170.5 169.5 170.0 180.0 174.0 173.0 173.5 179.0 178.0 178.5
## [13] 172.0 172.5 171.5

# in apply(), 1 means "over rows" and 2 means "over columns"; colMeans(sampless) gives the same result
(unbiased_mean <- mean(ybars))

## [1] 174

(Ssquare <- var(Y))

## [1] 34.4

Ybarsquare <- (ybars)^2
(Variance_Of_ybar <- mean(Ybarsquare) -(unbiased_mean)^2)

## [1] 11.46667

(N-n)/(N*n) * Ssquare

## [1] 11.46667

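As before, the variance of each sample is computed and then averaged over all samples (colVars() comes from the matrixStats package loaded earlier):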

ssquare <- colVars(as.matrix(sampless))
mean(ssquare)

## [1] 34.4

# E(s^2) equals S^2, as required in (c)

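# V(ybar) under sampling without replacement
(N-n)/(N*n) * Ssquare

## [1] 11.46667

# average of v(ybar) = (N-n)/(N*n) * s^2 over all 15 samples
(mean((N-n)/(N*n) * ssquare))

## [1] 11.46667

# E[v(ybar)] equals V(ybar), as required in (d)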
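The sampling distribution of the mean under sampling without replacement can be tabulated in the same spirit. This is a base R sketch with illustrative names (`wor_samples`, `dist_wor`), assuming the same six heights.

```r
Y <- c(168, 175, 185, 173, 171, 172)
N <- length(Y)
n <- 2

# All choose(N, n) = 15 unordered samples, one per column
wor_samples <- combn(Y, n)
ybars <- colMeans(wor_samples)

# Sampling distribution: each distinct value of ybar with its probability
dist_wor <- table(ybars) / ncol(wor_samples)
dist_wor

# Moments computed from the distribution, compared with the formula
values <- as.numeric(names(dist_wor))
probs  <- as.numeric(dist_wor)
E_ybar <- sum(values * probs)
V_ybar <- sum((values - E_ybar)^2 * probs)
V_formula <- (N - n) / (N * n) * var(Y)   # (N - n) / (N n) * S^2
all.equal(V_ybar, V_formula)
```

Two different samples share the mean 171.5, so the distribution has 14 distinct values rather than 15.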

Village	Tractors
43	14
61	8
58	15
62	39
47	9
34	14
11	11
43	19
5	12
53	18

Average number of tractors per village in the block is:

\[\bar{y}=\frac{1}{10}\sum_{i=1}^{10}y_i = \frac{159}{10}=15.9\] To find the standard error, we first calculate the sample mean square

\[s^2=\frac{1}{n-1}\left(\sum_{i=1}^{n}y_i^2-n\bar{y}^2\right)=78.3222\] The estimate of the variance is given by: \[v(\bar{y})=\frac{s^2}{n}=\frac{78.3222}{10}= 7.83222\]

The estimated standard error of the sample mean is: \[se(\bar{y})=\sqrt{7.83222}=2.7986\]

The lower and upper limits of the confidence interval for \(\bar{Y}\) are given by:

\[\begin{align} \bar{y}\pm~2\times~se(\bar{y}) \Rightarrow \bar{Y} \in [10.3028 , 21.4972] \end{align}\]
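These figures can be verified with a short R sketch. The variable names are illustrative, the values are the ten tractor counts from the table above, and the variance estimate \(s^2/n\) is the one used in the text.

```r
# Tractor counts of the 10 sampled villages (from the table above)
y <- c(14, 8, 15, 39, 9, 14, 11, 19, 12, 18)

n <- length(y)
ybar <- mean(y)                             # sample mean
s2 <- (sum(y^2) - n * ybar^2) / (n - 1)     # shortcut formula, equals var(y)
v_ybar <- s2 / n                            # estimated variance of ybar
se_ybar <- sqrt(v_ybar)                     # estimated standard error
ci <- ybar + c(-2, 2) * se_ybar             # approximate confidence limits

round(c(ybar = ybar, s2 = s2, v = v_ybar, se = se_ybar), 4)
round(ci, 4)
```

The parentheses in the expression for `s2` matter: the whole difference is divided by n - 1, not only the sum of squares.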
