Robustness of Generalized Row-Column Designs Against Missing Observation(s)
Authors: Anindita Datta, Seema Jaggi, Eldho Varghese#, Cini Varghese,
Arpan Bhowmik, and Mohd. Harun
ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012
#ICAR-Central Marine Fisheries Research Institute, Kochi-682 018


1. Introduction

When the experimental material is heterogeneous in two directions, two-dimensional (double) blocking of the experimental units is recommended to control or reduce experimental error. The two blocking systems are generally referred to as row blocking and column blocking, and the resulting designs are termed Row-Column (RC) designs. These designs are used to control variability in field and animal experiments. Most row-column designs developed in the literature have exactly one unit at the intersection of each row and column. However, when the number of treatments is large and experimental resources are limited, Generalized Row-Column (GRC) designs are used, in which there is more than one unit in each row-column intersection. A GRC design is an arrangement of v treatments in p rows and q columns such that the intersection of each row and column (a cell) consists of more than one unit.

As an illustration, consider an experiment conducted to compare the colour intensities of apple sauce (Edmondson, 1998). The treatments consist of all combinations of 12 blends of apple sauce with 4 concentrations of cinnamon, and the treatments could be stored for 4 different lengths of time. The GRC design shown below was used, in which rows represent cinnamon concentrations, columns represent storage times and symbols represent blends.

Rows (Cinnamon      Columns (Storage Time)
Concentrations)     I             II            III           IV
I                   (1, 5, 9)     (2, 6, 10)    (3, 7, 11)    (4, 8, 12)
II                  (2, 7, 10)    (1, 8, 9)     (4, 5, 12)    (3, 6, 11)
III                 (3, 8, 12)    (4, 7, 11)    (1, 6, 10)    (2, 5, 9)
IV                  (4, 6, 11)    (3, 5, 12)    (2, 8, 9)     (1, 7, 10)

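
As a quick sanity check on the layout above, the following Python sketch encodes the 4 x 4 arrangement of cells and tests whether every blend occurs exactly once in each row and each column. The names DESIGN, line_contents and is_complete_in_rows_and_columns are illustrative and are not part of the original study.

```python
# Apple sauce design from the text: 4 rows x 4 columns, 3 blends per cell.
DESIGN = [
    [(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)],
    [(2, 7, 10), (1, 8, 9), (4, 5, 12), (3, 6, 11)],
    [(3, 8, 12), (4, 7, 11), (1, 6, 10), (2, 5, 9)],
    [(4, 6, 11), (3, 5, 12), (2, 8, 9), (1, 7, 10)],
]


def line_contents(design):
    """Return the sorted treatments of every row and of every column."""
    rows = [sorted(t for cell in row for t in cell) for row in design]
    cols = [sorted(t for row in design for t in row[j])
            for j in range(len(design[0]))]
    return rows, cols


def is_complete_in_rows_and_columns(design, v):
    """True if each of the v treatments occurs once in every row and column."""
    full = list(range(1, v + 1))
    rows, cols = line_contents(design)
    return all(r == full for r in rows) and all(c == full for c in cols)


print(is_complete_in_rows_and_columns(DESIGN, v=12))  # prints True
```

For this layout the check succeeds: each of the 12 blends appears once per cinnamon concentration and once per storage time.
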
These designs have been studied in the literature under different names, such as semi-Latin squares, in which there are n rows and n columns and each row-column intersection is a cell of k units [Bailey and Monod (2001)]; Trojan squares [Bailey (1988, 1992), Edmondson (1998, 2002)]; generalized incomplete Trojan-type designs [Jaggi et al. (2010, 2016)]; and row-column designs with multiple units per cell [Datta et al. (2014, 2015, 2016)].

In usual practice, these trials are conducted under controlled conditions, and it is assumed that no disturbances occur while conducting the experiment or recording the observations. In reality, disturbances such as missing observations or outliers in the data may occur during experimentation. These may lead to wrong interpretation of the results or to less precise comparisons among the treatments. To overcome such situations, designs that are insensitive (robust) to missing observations or outliers are required.

2. Methodology

A GRC design is considered here with v treatments arranged in p rows and q columns, with k units (plots) in each row-column intersection (cell), giving n = pqk experimental units or observations in total. The following three-way classified model with treatments, rows and columns is considered:

$$Y_{l(ij)} = \mu + \tau_{l(ij)} + \alpha_i + \beta_j + e_{l(ij)}; \quad i = 1, 2, \ldots, p; \; j = 1, 2, \ldots, q; \; l = 1, 2, \ldots, k \qquad (2.1)$$

where $Y_{l(ij)}$ is the response from the lth unit in the cell at the intersection of the ith row and jth column, $\mu$ is the general mean, $\tau_{l(ij)}$ is the effect of the treatment appearing in the lth unit of that cell, $\alpha_i$ is the ith row effect and $\beta_j$ is the jth column effect. The error terms $e_{l(ij)}$ are independently and identically distributed, following a normal distribution with mean zero and constant variance.

A GRC design is robust against loss of observations if the loss of efficiency of the residual design, compared to the original design, is small. If $C_d$ is the information matrix for estimating the treatment effects of a GRC design d and $C_{d^*}$ is that of the residual design d* after the observations are lost, then the efficiency E of the residual design relative to the original design is given by

$$E = \frac{\text{Harmonic mean of the non-zero eigenvalues of } C_{d^*}}{\text{Harmonic mean of the non-zero eigenvalues of } C_d}$$
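
For reference, the quantities in this ratio can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the source: $X_\tau$ denotes the n x v matrix of treatment indicators, $Z$ the matrix whose columns are the all-ones vector and the row and column indicators, and $\lambda_1, \ldots, \lambda_m$ the non-zero eigenvalues of the information matrix concerned. Under an additive model with treatment, row and column effects, the usual projection form is:

```latex
C_d = X_\tau^{\top}\left(I_n - Z (Z^{\top} Z)^{-} Z^{\top}\right) X_\tau ,
\qquad
\mathrm{HM}(C_d) = \frac{m}{\sum_{i=1}^{m} 1/\lambda_i} ,
\qquad
E = \frac{\mathrm{HM}(C_{d^*})}{\mathrm{HM}(C_d)} .
```

Here $(Z^{\top} Z)^{-}$ is any generalized inverse, and m = v - 1 when the design is connected. The same expressions apply to the residual design, with the rows of $X_\tau$ and $Z$ that correspond to the lost observations removed.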

A GRC design is said to be robust if the efficiency of the resulting design after loss of information is more than 95%.

SAS code has been written in PROC IML to calculate the information matrix (C-matrix) of the treatment effects, its eigenvalues, and the harmonic mean of the non-zero eigenvalues of the C-matrix for both the original and the residual GRC design.
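
The PROC IML program itself is not reproduced here. The NumPy sketch below shows one way the same quantities could be computed; it is not the authors' code. It assumes that units are given as 0-based (row, column, treatment) triples, and the function names, the tolerance used to decide which eigenvalues count as non-zero, and the choice of the apple sauce layout from the Introduction as input are all illustrative assumptions.

```python
import numpy as np


def information_matrix(units, v):
    """C-matrix of the v treatments, adjusted for mean, rows and columns.

    units: list of (row, column, treatment) triples, all 0-based.
    """
    n = len(units)
    p = 1 + max(r for r, _, _ in units)
    q = 1 + max(c for _, c, _ in units)
    X = np.zeros((n, v))              # treatment indicators
    Z = np.zeros((n, 1 + p + q))      # mean, row and column indicators
    Z[:, 0] = 1.0
    for u, (r, c, t) in enumerate(units):
        X[u, t] = 1.0
        Z[u, 1 + r] = 1.0
        Z[u, 1 + p + c] = 1.0
    # Z @ pinv(Z) is the orthogonal projector onto the column space of Z.
    residual_maker = np.eye(n) - Z @ np.linalg.pinv(Z)
    return X.T @ residual_maker @ X


def harmonic_mean_nonzero(C, tol=1e-8):
    """Harmonic mean of the eigenvalues of C that exceed tol."""
    eig = np.linalg.eigvalsh((C + C.T) / 2.0)   # symmetrize against round-off
    nonzero = eig[eig > tol]
    return len(nonzero) / np.sum(1.0 / nonzero)


def efficiency(units, missing, v):
    """E = HM(C_d*) / HM(C_d); missing holds indices into units."""
    lost = set(missing)
    residual = [u for i, u in enumerate(units) if i not in lost]
    hm_residual = harmonic_mean_nonzero(information_matrix(residual, v))
    hm_full = harmonic_mean_nonzero(information_matrix(units, v))
    return hm_residual / hm_full


# Apple sauce layout from the Introduction: 4 x 4 cells of 3 blends each.
CELLS = [
    [(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)],
    [(2, 7, 10), (1, 8, 9), (4, 5, 12), (3, 6, 11)],
    [(3, 8, 12), (4, 7, 11), (1, 6, 10), (2, 5, 9)],
    [(4, 6, 11), (3, 5, 12), (2, 8, 9), (1, 7, 10)],
]
UNITS = [(i, j, t - 1) for i, row in enumerate(CELLS)
         for j, cell in enumerate(row) for t in cell]

hm_full = harmonic_mean_nonzero(information_matrix(UNITS, v=12))
e_one_missing = efficiency(UNITS, missing=[len(UNITS) - 1], v=12)
print(round(hm_full, 4), round(e_one_missing, 4))
```

For this particular layout every blend occurs once in each row and column, so the C-matrix of the full design reduces to 4I - J/3 and its harmonic mean is 4. The second printed value is the efficiency after the last unit of the last cell is lost.
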


3. Results

Here, the robustness of different classes of GRC designs (Bailey, 1992) against the loss of one or more observations within a cell has been investigated using the efficiency criterion defined in the Methodology section. A design is considered highly robust against missing observation(s) if the loss in efficiency of the residual design is not more than 5%, and robust if the loss in efficiency is between 5% and 10%.
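
The two thresholds can be expressed as a small helper. This is a minimal sketch: the function name and the label "not robust" for losses above 10% are assumptions, since the text does not name that case, and the treatment of the exact boundaries (5% counted as highly robust, 10% as robust) is one reading of the definition.

```python
def classify_robustness(efficiency):
    """Label a residual design by its efficiency E relative to the original."""
    if efficiency >= 0.95:        # loss in efficiency of at most 5%
        return "highly robust"
    if efficiency >= 0.90:        # loss in efficiency between 5% and 10%
        return "robust"
    return "not robust"           # assumed label; not defined in the text


for e in (0.98, 0.93, 0.85):
    print(e, classify_robustness(e))
```

The comparison is made on E directly rather than on 1 - E, which avoids floating-point surprises at the 5% boundary.
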

The robustness of the series of designs described below has been investigated against the loss of some or all observations in the last column. Without loss of generality, the missing observations are taken to be in the last column, since the columns can always be interchanged. Table 3.1 gives the parameters of the designs considered, i.e., the number of treatments (v ≤ 10), number of rows (p), number of columns (q), replication (r) and cell sizes (k), together with the number of missing observation(s), the unit or cell of the last column from which they are missing, and the efficiency (E) of the residual design relative to the original design.

Datta et al. (2015) developed this series of GRC designs with unequal cell sizes. The series is constructed from a balanced incomplete block (BIB) design with parameters v*, b* (even), r*, k*, λ*. The resulting design has parameters v = v*, p = 2 rows of size (v*b*)/2 each, q = b* columns of size v* each, r = b*, k1 = k* and k2 = v* - k*.

Example 3.1: Consider a BIB design with parameters v* = 5, b* = 10, r* = 4, k* = 2, λ* = 1. The following is a GRC design with parameters v = 5, p = 2 rows of size 25 each, q = 10 columns of size 5 each, r = 10, k1 = 2 and k2 = 3.

Rows   Columns
       I          II         III        IV         V          VI         VII        VIII       IX         X
I      (1, 2)     (1, 3)     (1, 4)     (1, 5)     (2, 3)     (3, 4, 5)  (2, 4, 5)  (2, 3, 5)  (2, 3, 4)  (1, 4, 5)
II     (1, 3, 5)  (1, 3, 4)  (1, 2, 5)  (1, 2, 4)  (1, 2, 3)  (2, 4)     (2, 5)     (3, 4)     (3, 5)     (4, 5)

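
The stated parameters of Example 3.1 can be checked directly from the layout. The sketch below reads each row as ten cells of size k1 = 2 or k2 = 3, which is the partition implied by the row size of 25 and the column size of 5; the variable and function names are illustrative.

```python
from collections import Counter

# Example 3.1 read as 2 rows x 10 columns of cells (sizes 2 and 3).
ROW_I = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4, 5), (2, 4, 5), (2, 3, 5), (2, 3, 4), (1, 4, 5)]
ROW_II = [(1, 3, 5), (1, 3, 4), (1, 2, 5), (1, 2, 4), (1, 2, 3),
          (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
DESIGN = [ROW_I, ROW_II]


def summarize(design):
    """Return replications, row sizes, column sizes and distinct cell sizes."""
    replication = Counter(t for row in design for cell in row for t in cell)
    row_sizes = [sum(len(cell) for cell in row) for row in design]
    col_sizes = [sum(len(row[j]) for row in design)
                 for j in range(len(design[0]))]
    cell_sizes = sorted({len(cell) for row in design for cell in row})
    return replication, row_sizes, col_sizes, cell_sizes


replication, row_sizes, col_sizes, cell_sizes = summarize(DESIGN)
print(dict(replication))   # each of the 5 treatments is replicated 10 times
print(row_sizes)           # [25, 25]
print(col_sizes)           # ten columns of size 5
print(cell_sizes)          # [2, 3]
```

The output agrees with v = 5, r = 10, rows of size 25, columns of size 5 and cell sizes k1 = 2 and k2 = 3.
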
Table 3.1 lists, for the GRC designs obtained from the above series, the design parameters, the number of missing observations and the unit or cell from which they are missing, and the harmonic means (HM) of the non-zero eigenvalues of the information matrices of the original design (Cd) and the residual design (Cd*) under the three-way model. The efficiency (E) of the residual design relative to the original design is also given.

Table 3.1: Parameters and efficiency of the residual design

S. No.   v   p   q   r  k1  k2  No. missing  HM(Cd)  HM(Cd*)     E  Units missing (last column)
1        5   2  10  10   2   3            1    8.50     8.29  0.98  last unit in last cell
2        5   2  10  10   2   3            2    8.50     8.17  0.96  any two units from last cell
3        5   2  10  10   2   3            3    8.50     7.93  0.93  all units of last cell
4        5   2  10  10   2   3            2    8.50     8.01  0.94  last unit of each cell of last column
5        5   2  10  10   2   3            5    8.50     7.69  0.91  last unit of each cell of last column and all units of last cell
6        9   2  12  12   3   6            1   12.00    11.86  0.99  last unit in last cell
7        9   2  12  12   3   6            2   12.00    11.72  0.98  any two units from last cell
8        9   2  12  12   3   6            3   12.00    11.59  0.97  any three units from last cell
9        9   2  12  12   3   6            4   12.00    11.47  0.96  any four units from last cell
10       9   2  12  12   3   6            5   12.00    11.34  0.94  any five units from last cell
11       9   2  12  12   3   6            6   12.00    11.21  0.93  all units of last cell
12       9   2  12  12   3   6            2   12.00    11.73  0.98  last unit of each cell of last column
13       9   2  12  12   3   6            9   12.00    11.09  0.92  last unit of each cell of last column and all units of last cell
14       9   2  18  18   4   5            1   18.00    17.86  0.99  last unit in last cell
15       9   2  18  18   4   5            2   18.00    17.73  0.99  any two units from last cell
16       9   2  18  18   4   5            3   18.00    17.60  0.98  any three units from last cell
17       9   2  18  18   4   5            4   18.00    17.47  0.97  last four units from last cell
18       9   2  18  18   4   5            5   18.00    17.34  0.96  all units of last cell
19       9   2  18  18   4   5            2   18.00    17.73  0.99  last unit of each cell of last column
20       9   2  18  18   4   5            9   18.00    17.35  0.96  last unit of each cell of last column and all units of last cell
21      10   2  30  30   3   7            1   29.76    29.64  1.00  last unit in last cell
22      10   2  30  30   3   7            2   29.76    29.52  0.99  last two units from last cell
23      10   2  30  30   3   7            3   29.76    29.40  0.99  any three units from last cell
24      10   2  30  30   3   7            4   29.76    29.29  0.98  any four units from last cell
25      10   2  30  30   3   7            5   29.76    29.19  0.98  any five units from last cell
26      10   2  30  30   3   7            6   29.76    29.09  0.98  any six units from last cell
27      10   2  30  30   3   7            7   29.76    28.95  0.97  all units of last cell
28      10   2  30  30   3   7            2   29.76    29.53  0.99  last unit of each cell of last column
29      10   2  30  30   3   7            8   29.76    28.84  0.97  last unit of each cell of last column and all units of last cell

It can be seen from Table 3.1 that the efficiency of the residual design is quite high in most cases. Of the 29 cases tabulated, 23 have an efficiency of at least 95%, so the designs are highly robust against those losses, whereas in the remaining 6 cases (S. No. 3, 4, 5, 10, 11 and 13) the efficiency lies between 91% and 94%, so the designs are only robust.

4. Conclusion

The series of GRC designs investigated is found to be robust against loss of observations. Efficiency decreases as the number of missing observations increases, and the extent of the loss depends on the size of the design: smaller designs are more affected by missing observations.
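
The dependence on design size can be illustrated with the four rows of Table 3.1 in which a single observation is lost (S. No. 1, 6, 14 and 21). The sketch below recomputes the percentage loss in efficiency from the tabulated harmonic means and orders the designs by their total number of units n = vq (q columns of size v); the names used are illustrative, and the losses are based on the rounded values printed in the table.

```python
# (v, q, HM(Cd), HM(Cd*)) for the one-missing-observation rows of Table 3.1.
SINGLE_LOSS = [
    (5, 10, 8.50, 8.29),
    (9, 12, 12.00, 11.86),
    (9, 18, 18.00, 17.86),
    (10, 30, 29.76, 29.64),
]


def loss_by_size(rows):
    """Return (n, percentage loss in efficiency) pairs sorted by size n."""
    out = []
    for v, q, hm_full, hm_residual in rows:
        n = v * q                                   # q columns of size v
        loss = 100.0 * (1.0 - hm_residual / hm_full)
        out.append((n, round(loss, 2)))
    return sorted(out)


for n, loss in loss_by_size(SINGLE_LOSS):
    print(n, loss)
```

The loss from one missing observation falls from roughly 2.5% for the design with 50 units to about 0.4% for the design with 300 units, in line with the statement that smaller designs are more affected.
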

References:


Bailey, R. A. (1988). Semi-Latin squares. Journal of Statistical Planning and Inference, 18, 299-312.

Bailey, R. A. (1992). Efficient semi-Latin squares. Statistica Sinica, 2, 413-437.

Bailey, R. A. and Monod, H. (2001). Efficient semi-Latin rectangles: Designs for plant disease experiments. Scandinavian Journal of Statistics, 28, 257-270.

Datta, A., Jaggi, S., Varghese, C. and Varghese, E. (2014). Structurally incomplete row-column designs with multiple units per cell. Statistics and Applications, 12(1&2), 71-79.

Datta, A., Jaggi, S., Varghese, C. and Varghese, E. (2015). Some series of row-column designs with multiple units per cell. Calcutta Statistical Association Bulletin, 67(265-266), 89-99.

Edmondson, R. N. (1998). Trojan square and incomplete Trojan square designs for crop research. Journal of Agricultural Science, 131, 135-142.

Jaggi, S., Varghese, C., Varghese, E. and Sharma, V. K. (2010). Generalized incomplete Trojan-type designs. Statistics and Probability Letters, 80, 706-710.

Jaggi, S., Varghese, C. and Varghese, E. (2016). A series of generalized incomplete Trojan-type designs. Journal of Combinatorics, Information and System Sciences: American Journal, 40(1-4), 53-60.


About Author / Additional Info:
• Working as a scientist since 2012
• Published around 35 research papers in national and international journals of repute
• Served as a resource person at different institutes
• Received the IARI Merit Medal for outstanding academic performance in Ph.D.
• Received the Dr. G.R. Seth Young Scientist Award-2015 from the Indian Society of Agricultural Statistics
• Received Krishi Vigyan Gaurav (honorary title) from ARCC and BKAS