Error Statistics

The Chi-Square Test

What is Chi-Square?

As is already clear from the Binomial Experiment (Mayo, ch. 5, Figure 5.2), any statistical test comes down to calculating the probability that a certain result occurs according to the null hypothesis. A convenient and much-used method for this purpose is the chi-square (χ²) test, first proposed by Karl Pearson; it is especially useful when the experimental results are classified into several mutually exclusive groups. Since Mayo says nothing about this, let me add this note.

Karl Pearson (1857-1936)

As an example, take Mendel's famous experiment with peas (the material is drawn from Harald Cramér, The Elements of Probability Theory, Wiley, 1955, p. 219). With this example you can get an idea of what chi-square is, and how a chi-square test goes. In one of his experiments, Mendel obtained 556 peas, classified according to shape (either Round or angular) and color (either Yellow or green); Round and Yellow are dominant, the capital letters signifying a dominant gene. Thus, according to Mendel's law, the following 16 genotypes have equal probability (our students should be familiar with this logical pattern!).

RRYY, RRYg, RRgY, RRgg, RaYY, RaYg, RagY, Ragg,

aRYY, aRYg, aRgY, aRgg, aaYY, aaYg, aagY, aagg.
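The 9:3:3:1 phenotype proportions that follow from these 16 equiprobable genotypes can be checked mechanically. Here is a minimal sketch in Python (not part of the original note; the allele letters R/a and Y/g follow the list above):

```python
from collections import Counter
from fractions import Fraction

# One shape allele from each parent (R = Round, a = angular)
# and one color allele from each parent (Y = Yellow, g = green):
# 4 x 4 = 16 equiprobable genotypes.
shapes = [s1 + s2 for s1 in "Ra" for s2 in "Ra"]
colors = [c1 + c2 for c1 in "Yg" for c2 in "Yg"]

def phenotype(shape, color):
    """The phenotype is governed by the dominant allele (R over a, Y over g)."""
    s = "Round" if "R" in shape else "angular"
    c = "Yellow" if "Y" in color else "green"
    return s + " & " + c

counts = Counter(phenotype(s, c) for s in shapes for c in colors)

# Each genotype has probability 1/16, so the phenotype probabilities
# come out as 9/16, 3/16, 3/16, 1/16; scaling by 556 peas gives the
# expected frequencies used later in the note.
for ph, n in counts.items():
    print(ph, Fraction(n, 16), "-> expected frequency:", 556 * n / 16)
```

Running this reproduces the proportions 9 : 3 : 3 : 1 and the expected frequencies 312.75, 104.25, 104.25, 34.75.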

However, since phenotypes are governed by dominant genes, we should expect the following proportions for the four phenotypes: 9 (Round & Yellow) : 3 (Round & green) : 3 (angular & Yellow) : 1 (angular & green), i.e. 9/16, 3/16, 3/16, and 1/16.

Now, Mendel obtained the following data by his experiment:

phenotype          observed frequency   expected frequency   difference
Round & Yellow            315                 312.75             2.25
Round & green             108                 104.25             3.75
angular & Yellow          101                 104.25            -3.25
angular & green            32                  34.75            -2.75
Total                     556                 556.00             0.00

All right. How should we check whether or not these data fit Mendel's expectation (hypothesis)? Here the chi-square test works well. Let us define a quantity called the chi-square (χ²) as follows: (1) first, take the square of each difference and divide it by the corresponding expected frequency; then (2) take the sum over all groups. This is the chi-square of these data. In short, the chi-square is the sum of the weighted squared differences. We take the square because we wish to treat all differences (positive as well as negative) on a par.

Since we have 4 groups, and since the fourth is determined once 3 of them are given (the total is fixed at 556), we say this distribution has "3 degrees of freedom". And for our data, the value of chi-square is:

χ² = (2.25)²/312.75 + (3.75)²/104.25 + (-3.25)²/104.25 + (-2.75)²/34.75

   = 5.0625/312.75 + 14.0625/104.25 + 10.5625/104.25 + 7.5625/34.75

   = 0.016 + 0.135 + 0.101 + 0.218 = 0.470
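The arithmetic above is easy to reproduce on a computer. A minimal sketch, using the observed and expected frequencies from the table:

```python
# Observed and expected frequencies from Mendel's table
# (Round & Yellow, Round & green, angular & Yellow, angular & green).
observed = [315, 108, 101, 32]
expected = [312.75, 104.25, 104.25, 34.75]

# Chi-square: sum of (observed - expected)^2 / expected over all groups.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 3))  # prints 0.47
```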

Now this value is useful in that from it we can obtain the probability, given the null hypothesis, of a deviation at least as large as the one observed (see the figure of the chi-square distribution). In any standard textbook of statistics you can find a table of the chi-square distribution, and from this table you can obtain that probability. In our case, since 0.470 falls between 0.352 and 0.584 in the table (for 3 degrees of freedom), the probability falls between 0.90 and 0.95. That is, given Mendel's hypothesis, it is quite likely that we obtain data deviating at least this much, so the observed fit is a good one. Thus, on the customary standard of significance level, it may be judged that this hypothesis passed the test. See the graph of the chi-square distribution (with 3 degrees of freedom) and its relation to the probability of an error; to the extent this probability is small, the test is severe.

[This is the curve for the degree of freedom 3]

[Table of the chi-square distribution; look at the line for 3 degrees of freedom]


Readers who need more explanation should consult Ishikawa (1997), ch. 4, pp. 63-71. There, a similar problem (in terms of dice) is treated in more detail. Try to calculate the probabilities yourself (on a computer)!

Ishikawa (1997). [Japanese bibliographic entry; the title and publisher are garbled in the source.]




Last modified Jan. 26, 2003. (c) Soshichi Uchii

suchii@bun.kyoto-u.ac.jp