5 5th Tutorial

Exercise 5.1 The number of colleges in 12 districts of a state are 8, 10, 6, 7, 7, 9, 11, 5, 6, 8, 9, and 11. List all possible samples of size 3 that can be selected from this population of 12 units using LS and CS sampling. Also, determine the average of corresponding sample means in both the cases. Are the two averages equal to the population mean? If yes, what does it indicate about the bias in the two estimators?

Solution Linear systematic sampling:

Random start (r)	Serial	y values	Sample mean
1	(1, 5, 9)	(8,7,6)	7.00
2	(2, 6,10)	(10,9,8)	9.00
3	(3,7,11)	(6,11,9)	8.67
4	(4,8,12)	(7,5,11)	7.67

The average of the sample means is $E({\bar{y}}_{sys})=\dfrac{1}{k} \sum_{r=1}^{k}{\bar{y}}_r=\dfrac{1}{4}(7+9+8.67+7.67)=8.08$ Population mean is $\bar{Y}=\dfrac{1}{N}\sum_{i=1}^{N}{Y_i}=\dfrac{1}{12}(8+10+……+11)= 8.08$ The average of the sample means is equal to the population mean, so it’s unbiased estimator.

For Circular systematic sampling:

Random start (r)	Serial	y values	Sample mean
1	(1, 5, 9)	(8,7,6)	7.00
2	(2, 6,10)	(10,9,8)	9.00
3	(3,7,11)	(6,11,9)	8.67
4	(4,8,12)	(7,5,11)	7.67
5	(5,8,1)	(7,6,8)	7.00
6	(6,9,2)	(9,8,10)	9.00
7	(7,10,3)	(11,9,6)	8.67
8	(8,11,4)	(5,11,7)	7.67
9	(9,12,5)	(6,8,7)	7.00
10	(10,1,6)	(8,10,9)	9.00
11	(11,2,7)	(9,6,11)	8.66
12	(12,3,8)	(11,7,5)	7.67

The average of the sample means is $E({\bar{y}}_{sys})=\dfrac{1}{N}~ \sum_{r=1}^{k}{\bar{y}}_r=\dfrac{1}{12}(7+9+8.67+7.67+….+7.67)=8.08$ Population mean is $\bar{Y}=\dfrac{1}{N}\sum_{i=1}^{N}{Y_i}=\dfrac{1}{N}(8+10+……+11)= 8.08$ The average of the sample means is equal to the population mean, so it’s unbiased estimator.
The R codes for LS.

y = c(8, 10, 6, 7, 7, 9, 11, 5, 6, 8, 9,  11)
# Compute the population mean and variance
(ybar=mean(y))

## [1] 8.083333

(yvar= var(y))

## [1] 3.901515

# perform all systematic samples
n = 3 ; N = length(y) ; k= N/n
sys_sample = matrix(0,n,k)
for(i in 1:k){
  sys_sample[,i] = y[seq(i,N,k)]
}
sys_sample

##      [,1] [,2] [,3] [,4]
## [1,]    8   10    6    7
## [2,]    7    9   11    5
## [3,]    6    8    9   11

sys_mean = apply(sys_sample,2,mean)
sys_mean

## [1] 7.000000 9.000000 8.666667 7.666667

# Compare their mean with the population mean
mean(sys_mean)

## [1] 8.083333

# Compute the variance of the sample mean 
(var_sys_mean = var(sys_mean)*((k-1)/k))

## [1] 0.6319444

sys_var = apply(sys_sample,2, var)
sys_var

## [1] 1.000000 1.000000 6.333333 9.333333

mean(sys_var)

## [1] 4.416667

S = sd(y)
varSRS = ((N-n)/(N*n))* S^2
varSRS

## [1] 0.9753788

(varSRS/var_sys_mean)*100

## [1] 154.3457

Exercise 5.2 Many trees along a canal have been uprooted by a storm. This damage persists along a $35 km$ stretch. The Department of Irrigation is interested in estimating total number of these damaged trees. Each one kilometer segment along the canal has been divided into 5 equal parts by stone markers. Thus, the entire 35 km long stretch is divided into 175 equal segments. Twenty five of these segments are selected using LS sampling with a sampling interval of 7 segments. The information regarding number of uprooted trees $(y)$ obtained from this l-in-7 systematic sample is given in the following table:

Selected	y	Selected	y	Selected	y
6	4	62	3	118	23
13	17	69	8	125	12
20	11	76	5	132	8
27	6	83	13	139	17
34	8	90	9	146	6
41	16	97	16	153	5
48	21	104	17	160	8
55	13	111	9	167	10
NA	NA	NA	NA	174	15

Estimate the total number of uprooted trees, and also determine the confidence interval for it.
Solution

The population size $N=175$, and sampling interval $k=7$. They use linear systematic sampling for the selection. Let the random number $r$ selected from $1$ to $k(=7)$ be $6$.

The sample mean is $\bar{y}_{sys}=\dfrac{1}{7}~\sum_{i=1}^{n}{y_i}=\dfrac{1}{7}(4+\ldots+15)= 11.2$

The estimate of total number of uprooted trees is $y^\prime_{sys}=N\times~{\bar{y}}_{sys} =1960$. The estimate of variance from the equation
\[v(\bar{y}_{sys})=\dfrac{N-n}{2Nn(n-1)} \sum_{i=1}^{n-1}(y_{i+1}-y_{i})^2=0.9207\]

Then estimate of variance $V\left(\ {y^\prime}_{sys}\right)\ =N^2\times0.9207=281966.88$ Using the estimate for total number of uprooted trees and the estimate of its variance, we now calculate the confidence interval for population total

\[ N\times{\bar{y}}_{sys}\pm2\times\sqrt{V\left({\bar{y}}_{sys}\right)}\times N= [\ 1624.161,\ 2295.839\ ]\]

Y = c(4,17,11,6,8,16,21,13,3,8,5,13,9,16,17,9,23,12  ,8,17,6,5,8,10,15)

n = length(Y) ; ybar = mean(Y) ; Ytotal = N*ybar ; N=175
# Compute the population variance
(X = sum(diff(Y)^2))

## [1] 1289

(var_sysmean = X* (N-n)/(2*N*n*(n-1)) )

## [1] 0.9207143

# Compute the population variance 
(Ytotal_var = N^2 * var_sysmean)

## [1] 28196.88

(sd = sqrt(var_sysmean))

## [1] 0.9595386

#CI for sample total 
(CIL = N*(ybar-2*sd))

## [1] 1624.161

(CIU = N*(ybar+2*sd))

## [1] 2295.839

Exercise 5.3 It is desired to estimate the average per day rent for single occupancy rooms in well known hotels of a state. In all, there are 192 such hotels in the state and these are listed in a book entitled “A Guide to Visitors”. The investigator selected a l-in-8 sample of hotels and rang up the managers of sampled hotels.

The information on rent (in rupees) so obtained is given below:

Hotel	Rent	Hotel	Rent	Hotel	Rent
1	100	9	90	17	125
2	120	10	110	18	85
3	125	11	125	19	90
4	115	12	80	20	105
5	110	13	70	21	130
6	80	14	125	22	95
7	130	15	130	23	135
8	120	16	105	24	140

Estimate the average per day rent along with the confidence limits for it.
Solution
The population size $N=192$, and sampling interval $k=8$. They use linear systematic sampling for the selection.

The estimate of the average per day is the sample mean which is $\bar{y}_{sys}=\dfrac{1}{24}\sum_{i=1}^{n}{y_i}=\dfrac{1}{24}(100+\ldots+140)= 110$

The estimate of variance for random population is \[V({\bar{y}}_{sys})=\dfrac{N-n}{Nn(n-1)} \sum_{i=1}^{n}(y_{i}-\bar{y})^2= 14.34556\]

Using the estimate of the average per day and the estimate of its variance, we now calculate the confidence interval for population total

$\bar{y}_{sys}\pm2\ast\sqrt{V\left({\bar{y}}_{sys}\right)} = [\ 102.4249,\ 117.5751\ ]$

y = c(100, 120, 125, 115, 110,  80, 130, 120, 90, 110, 125,  80,  70, 125, 130, 105, 125,  85,  90, 105, 130,  95, 135, 140)
sum(y)

## [1] 2640

n= length(y)
N= 192
(k = N/n)

## [1] 8

#  Compute sample mean 
(ybar = mean(y))

## [1] 110

# Compute the variance of the sample mean 
X = sum((y-mean(y))^2)
(var_sysmean = ((N-n)/(N*n*(n-1)))*X)

## [1] 14.34556

(sd = sqrt(var_sysmean))

## [1] 3.787554

# CI for sample mean 
(CIL = ybar - 2*sd)

## [1] 102.4249

(CIU = ybar + 2*sd)

## [1] 117.5751

Exercise 5.4 Assume the data that we have from 100 observations as follows

set.seed(111)
y = sample(1:40,100,replace=TRUE)

and consider the simple random sampling without replacement (SRS) with $n=20$.

Compute
1. the population mean and variance.
2. Perform all 1-in-5 systematic samples (LS).
3. Compute their means.
4. Compute their variances.
5. Compute all systematic sample mean.
6. Verify that systemic mean is unbiased estimator of the population mean.
7. Compute the variance of the systematic sample mean $Var({\bar{y}}_{sys})$
8. Compute the variance of the SRS mean $Var(\bar{y})$ with $n=20$.
9. Find the relative efficiency of the variance of the simple random mean, $Var(\bar{y})$, and the variance of the systematic mean,$Var(\bar{y}_{sys})$.
Solution

The population mean is $\bar{Y}=20.69$, the population variance is $V({\bar{y}}_{sys})= 120.9635$
The population size N=100, and sampling interval $k=5$. We use linear systematic sampling for the selection. Let the random number $r$ selected from $1$ to $k(=5)$. Each sample has a size of $(n=20)$.
The systematic means for all samples are \[20.25,\ 22.55,\ 20.80,\ 18.10,\ 21.75\]
The variances of the sample means are \[158.7237\ ,93.1026,\ 119.8526,\ 115.8842,\ 130.6184\]
The all systematic sample mean ${\bar{y}}_{sys}=\dfrac{1}{5}~ \sum_{r=1}^{k}{\bar{y}}_r= \dfrac{1}{5}~(20.25+\ldots+\ 21.75)=\ 20.69$
From answer 5, the all systematic sample mean is equal to the population mean, therefore it’s unbiased estimator.
The estimate of variance is $V({\bar{y}}_{sys})= 2.2994$
For simple random sampling without replacement with $n=20$ and $ Var({y})=4.8385$
The relative efficiency of the variance of the simple random mean, $Var(\bar{y})=$, and the variance of the systematic mean, $Var(\bar{y}_{sys}) = 210.4263$ which it shows that the systematic sampling estimator ${\bar{y}}_{sys}$ for the given population, is $210.426$ times more efficient than the one based on SRS
That can be computed using the following pieces of code

set.seed(111)
y = sample(1:40,100,replace=TRUE)
# Compute the population mean 
(ybar = mean(y))

## [1] 20.69

# Compute the population variance
(yvar = var(y))

## [1] 120.9635

# perform all 1-in-5 systematic samples 
k = 5
n = 20
N = 100
sys_samples = matrix(0,n,k)
for(i in 1:k){
  sys_samples[,i] = y[seq(i,N,k)]
}
print(sys_samples)

##       [,1] [,2] [,3] [,4] [,5]
##  [1,]   14   20   19   25    5
##  [2,]   35    8   26   10   36
##  [3,]    8   14   26   25   24
##  [4,]    7   21   15    1    9
##  [5,]   40   25   35    7   36
##  [6,]   28   38   25    4   31
##  [7,]    6   30    5   37    6
##  [8,]   36   28   18   27   29
##  [9,]   23   32    1   16   39
## [10,]    4   25   27   24   16
## [11,]   15   27   29   24   30
## [12,]    1   29   25   26   18
## [13,]   32   19   33   25   27
## [14,]   28   28   11   31   12
## [15,]   31    6   39   31    6
## [16,]   18    9   12   18    6
## [17,]   13    5   31    6   22
## [18,]    6   27    7    4   24
## [19,]   23   35   26   13   23
## [20,]   37   25    6    8   36

sys_mean = apply(sys_samples,2,mean)
sys_mean

## [1] 20.25 22.55 20.80 18.10 21.75

# Compare their mean to the population mean
mean(sys_mean)

## [1] 20.69

# which proves that systemic mean is unbiased estimator of the population mean.
# Compute the variance of the systematic sample mean 
var_sys_mean = var(sys_mean)*((k-1)/k)
var_sys_mean

## [1] 2.2994

# Compute the variances of the systematic samples 
sys_var = apply(sys_samples,2,var)
mean(sys_var)

## [1] 123.6363

# Compute the variance of the SRS mean V(ybar)
S = sd(y)
(varSRS= (N-n)/(N*n)*S^2)

## [1] 4.838541

# The relative efficiency of the variance of the simple random mean
varSRS/var_sys_mean*100

## [1] 210.4263

Selected	y	Selected	y	Selected	y
6	4	62	3	118	23
13	17	69	8	125	12
20	11	76	5	132	8
27	6	83	13	139	17
34	8	90	9	146	6
41	16	97	16	153	5
48	21	104	17	160	8
55	13	111	9	167	10
NA	NA	NA	NA	174	15

Hotel	Rent	Hotel	Rent	Hotel	Rent
1	100	9	90	17	125
2	120	10	110	18	85
3	125	11	125	19	90
4	115	12	80	20	105
5	110	13	70	21	130
6	80	14	125	22	95
7	130	15	130	23	135
8	120	16	105	24	140

Selected	y	Selected	y	Selected	y
6	4	62	3	118	23
13	17	69	8	125	12
20	11	76	5	132	8
27	6	83	13	139	17
34	8	90	9	146	6
41	16	97	16	153	5
48	21	104	17	160	8
55	13	111	9	167	10
NA	NA	NA	NA	174	15

Hotel	Rent	Hotel	Rent	Hotel	Rent
1	100	9	90	17	125
2	120	10	110	18	85
3	125	11	125	19	90
4	115	12	80	20	105
5	110	13	70	21	130
6	80	14	125	22	95
7	130	15	130	23	135
8	120	16	105	24	140

Selected	y	Selected	y	Selected	y
6	4	62	3	118	23
13	17	69	8	125	12
20	11	76	5	132	8
27	6	83	13	139	17
34	8	90	9	146	6
41	16	97	16	153	5
48	21	104	17	160	8
55	13	111	9	167	10
NA	NA	NA	NA	174	15

Hotel	Rent	Hotel	Rent	Hotel	Rent
1	100	9	90	17	125
2	120	10	110	18	85
3	125	11	125	19	90
4	115	12	80	20	105
5	110	13	70	21	130
6	80	14	125	22	95
7	130	15	130	23	135
8	120	16	105	24	140