Error Statistics

How to Calculate? Normal Approximation


How to obtain the standard deviation for a binomial distribution?

In Mayo's book, examples of binomial experiments appear occasionally, but unfortunately numerical examples are scarce. However, it is often necessary to refer to numerical examples in order to understand general or abstract points in terms of concrete cases. So let me show you how to make the simple calculations for obtaining the crucial statistical quantities. Since the binomial distribution is conceptually simple and can easily be related to the normal distribution, we will concentrate on the binomial distribution.

Suppose the following binomial distribution: in each trial, success occurs with probability p and failure with probability 1 − p.

Then the mean can easily be obtained, namely p. All we need in addition is either the variance or the standard deviation (√variance):

variance = p(1 − p),  standard deviation = √(p(1 − p))

These are the values for a single trial. For the case of n trials, we have to multiply the mean and the variance by the factor n (so that the standard deviation is multiplied by √n):

mean = np,  variance = np(1 − p),  standard deviation = √(np(1 − p))
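If you prefer to let a computer do the arithmetic, the following minimal Python sketch (my own illustration, not anything in Mayo's book; the function names are arbitrary) computes these quantities:

```python
from math import sqrt

def binomial_mean(n, p):
    """Mean number of successes in n trials, each with success probability p."""
    return n * p

def binomial_sd(n, p):
    """Standard deviation: the square root of the variance n*p*(1-p)."""
    return sqrt(n * p * (1 - p))

print(binomial_mean(1, 0.5), binomial_sd(1, 0.5))      # single trial: 0.5, 0.5
print(binomial_mean(100, 0.5), binomial_sd(100, 0.5))  # 100 trials: 50.0, 5.0
```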

Now, suppose you wish to know the probability that the observed number of successes in 100 trials is no greater than 60, given that p = 0.5 (Mayo's example on p. 159). How can we obtain this probability? Usually we calculate it by using the numerical table of the normal distribution (which, remember, gives values for the standardized normal distribution; the binomial distribution can be approximated by it if the number of trials is large enough), in the following manner:

(1) First, we've got to standardize our problem. For this, notice that the standard deviation in our case is √(100×0.5×0.5) = 5.

(2) Then, since the 60 successes deviate by 10 from the mean (50),

deviation/standard deviation = 10/5 = 2.

(3) Finally, this value 2 is the value of the x-coordinate in the numerical table (see the Table below); and corresponding to this value, the table gives the probability 0.9773, which means that the probability that the deviation is no greater than 10 (i.e., that the number of successes is no greater than 60) is 0.9773.

(4) However, this value may be a bit different from the correct value (Mayo gives 0.97), since ours is a normal approximation; but it is sufficient for giving you a rough idea. (If we asked the same question about a normal distribution itself, the preceding probability would be exact.)

Using the same method of approximation, you can calculate the probability that the observed number of successes in 100 trials is no greater than 60, given that p = 0.52. In this case the mean is 52, so the deviation is now 8,

deviation/standard deviation = 8/√(100×0.52×0.48) = 8/√24.96 ≈ 1.601

and corresponding to this is the probability 0.9452 (reading the table at 1.60), which means the probability is a bit smaller than in the previous case of p = 0.5.
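You can check both of these calculations with a short Python sketch (again my own illustration, not Mayo's). It computes the normal-approximation probability just described, and also the exact binomial sum, which is easy for a computer though it was impractical with printed tables:

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Cumulative probability of the standardized normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def approx_at_most(k, n, p):
    """Normal approximation to P(number of successes <= k) in n trials."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return normal_cdf((k - mean) / sd)

def exact_at_most(k, n, p):
    """Exact binomial probability P(number of successes <= k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(approx_at_most(60, 100, 0.50))  # about 0.977 (z = 2; the table gives 0.9773)
print(exact_at_most(60, 100, 0.50))   # roughly 0.98; the approximation is only rough
print(approx_at_most(60, 100, 0.52))  # about 0.945 (z = 1.601)
```

(The computed values may differ from the printed table in the last digit, since the table is read only at two decimal places of x.)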


The Normal Distribution, Numerical Table
(each entry gives the probability that a standardized normal variable is less than x, where x is the row value plus the column heading)

x       0.00     0.01     0.02     ...
...     ...      ...      ...      ...
1.6     0.9452   0.9463   0.9474   ...
...     ...      ...      ...      ...
2.0     0.9773   0.9778   0.9783   ...
...     ...      ...      ...      ...

With this technique of calculation, you can now easily confirm the following point: setting a significance level for your experiment by no means settles the question "What sort of experiment should be performed to test the given null hypothesis?", for there are a number of different ways to fulfill the given significance level. And that is a problem for any theory of statistical inference in the Error Statistics camp. It is quite unfortunate that Mayo continues her arguments without mentioning this basic problem; thus uninitiated readers cannot see one of the crucial points of her book, and of Error Statistics in general.
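To see this concretely, here is a rough Python sketch (my illustration, using the normal approximation throughout; the function name and the choice of trial numbers are mine) that finds, for several different numbers of trials, the smallest cutoff whose one-sided tail probability under p = 0.5 stays within 0.03. Every resulting (n, cutoff) pair is a different experiment satisfying the same significance level:

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def cutoff_for_level(n, p0=0.5, alpha=0.03):
    """Smallest k with P(successes > k) <= alpha under the normal
    approximation to the binomial distribution with parameters n, p0."""
    mean, sd = n * p0, sqrt(n * p0 * (1 - p0))
    k = int(mean)
    while 1 - normal_cdf((k - mean) / sd) > alpha:
        k += 1
    return k

for n in (50, 100, 200, 400):
    print(n, cutoff_for_level(n))  # e.g. 100 trials give the cutoff 60 used above
```

(For 50 trials this criterion gives the cutoff 32; the cutoff 33 used below is slightly more demanding, with a tail probability of about 0.012.)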

Now, continuing with the same binomial distribution (say, the example of the lady tasting tea), suppose you wish to settle the question at the significance level 0.03 (in other words, the cutoff point is about 2 standard deviations). Then why not choose a shorter experiment? Say, 50 trials with a cutoff of 33 successes should be sufficient for achieving your goal. For then,

deviation/standard deviation = 8/√(50×0.5×0.5) = 8/√12.5 ≈ 2.26.

Of course, Mayo knows this very well, and should certainly be ready to give you the reason why this experiment may not be good. But without this sort of illustration, it is hard for many readers to grasp the significance of "severity", "type II error", and many other things. In this respect, such authors as Harald Cramér (The Elements of Probability Theory, Wiley, 1955), for instance, are to be imitated: before touching on general and important problems, they have already given concrete numerical examples; see, for instance, Cramér's remark on "significance" on page 212, and the preceding example on page 201.

The trouble is, however, that in general there are for any given hypothesis a large number of different possible tests on the same level p %. We have already seen an example of this in 14.4, ... (Cramer 1955, 212)

Now, in order to complete our numerical examples (probabilities are calculated by the normal approximation), let us compare the severity of the two tests (100 trials vs. 50 trials) with respect to the two hypotheses (p = 0.50 vs. p = 0.52); the severity is calculated according to Mayo's revised (unofficially assumed) definition.

observed frequency of successes (e means "greater than f")

                    f = 60/100 (100 trials)    f = 33/50 (50 trials)
H0: p = 0.50        P(¬e, H0) = 0.9773         P(¬e, H0) = 0.9881
                    P(e, H0)  = 0.0227         P(e, H0)  = 0.0119
H': p = 0.52        P(¬e, H') = 0.9452         P(¬e, H') = 0.9762
                    P(e, H')  = 0.0548         P(e, H')  = 0.0238

If the significance level (cutoff point) is 0.03:

For f = 60/100: given e, H0 is rejected and H' accepted; severity of passing H' = 0.9773.

For f = 33/50: given e, H0 is rejected and H' accepted, since the significance level is determined by H0; but if H' were the null hypothesis, what? (Maybe the test would then be completely different.) Severity of passing H' = 0.9881.
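The entries in this table can be reproduced by a few more lines of Python (again a sketch of mine, using the normal approximation only; exact binomial values would differ slightly):

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_e(cutoff, n, p):
    """P(e): the probability that the observed number of successes
    exceeds the cutoff, by the normal approximation."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return 1 - normal_cdf((cutoff - mean) / sd)

for n, cutoff in ((100, 60), (50, 33)):
    for p in (0.50, 0.52):
        pe = prob_e(cutoff, n, p)
        # P(not-e, H) and P(e, H), as in the table above
        print(n, cutoff, p, round(1 - pe, 4), round(pe, 4))
```

(The last digits may differ slightly from the table above, since the code evaluates the normal distribution exactly instead of reading a printed table at two decimals of x.)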

No doubt this is an artificial example, and we know that other methods of estimation (such as confidence intervals) are available to the Error Statistician; still, this can make clear what Mayo should spell out:

(1) Is this really what she wants?

(2) What is she going to say for the 50-trial case? Is it a better test since the severity is higher, or is it a bad test since P(e, H') is too low?

(3) What is the point of severity: the comparison of the severities of different tests, or the comparison of severity for different hypotheses within the same test?

I am afraid Mayo's way of exposition tends to prevent (most of) us from understanding the book; for instance, the full picture of the Neyman-Pearson theory (with type I and type II errors) does not appear until chapter 11, so how can she expect us to grasp the significance of severity in chapter 6?




Last modified Jan. 26, 2003. (c) Soshichi Uchii

suchii@bun.kyoto-u.ac.jp