Error Statistics

How does Mayo try to obtain severity?


Since some readers may think that my criticism of Mayo's analysis of Perrin's experiments is unfair, let me continue my argument a bit further. I said, "the consideration of severity for the Perrin case is given only informally, in obscure terms, and it does not conform to the manner she proposed" (Severity for Perrin?). Here is what she actually said. First, after showing that the null hypothesis j (for the given specific experimental model) passes a particular test, she adds:

Perrin's argument to this effect has weight only because he was able to argue further that if the model was inadequate (if j' was true), we would very often get differences statistically significant from what is typical under j. In other words, he needed to argue that j had passed a severe test. The multiple experiments for which Perrin stresses the need ... are deliberately designed so that if one misses an error, another is likely to find it. ... (228)

All right. But we have to remember that the null hypothesis j and its rival j' were localized by reference to a specific experiment E, with a given sample (and the data obtained from that experiment are modeled in a certain manner, for testing j). We expected Mayo to ascribe severity to this particular test (or to the set of the hypotheses and the test), but we were wrong; Mayo is suggesting that severity is obtainable with a series of experiments. Maybe so, but we have got to keep the same hypotheses j and j' through all such experiments; otherwise her talk of passing j by severe tests does not make sense. For the reader's convenience, let me quote again what j and j' are:

j: The data from E approximates a random sample from the hypothesized Normal process M.

j': The sample displacements of data from E are characteristic of systematic (nonchance) effects.

This j is obtained as a part of a larger (more general and richer in content) hypothesis H (222), and relativized to experiment E. Then how do you test these hypotheses with different experiments? If experiment E is specified, it seems to me that all you can do is to use different models of the same data from the same experiment. But, to my surprise, Mayo's interpretation is definitely different. She continues:

As is typical, Perrin's argument for severity was substantiated by reference to other tests. The overall argument goes beyond any single statistical significance test. Here is where the multitude of deliberately varied additional tests plays a particularly important role. ... (228)

Although Mayo says "Perrin's argument for severity", you should not take this literally, since Perrin himself does not mention severity at all; this is merely Mayo's interpretation. And what is more important is this: what is "the multitude of deliberately varied additional tests"? According to Mayo,

To this end, Perrin describes numerous sets of statistical analyses. Some made use of the same 500 measured displacements, only modified in different ways; others involved further recorded displacements on the same gamboge preparation. Still others dealt with totally distinct gamboge experiments, where the key features were deliberately varied. ... While the same question is being asked (Is j adequate?), by phrasing it differently each test is designed to check if there are mistakes in the answers from other analyses. (228-229)

Thus, although the possibility of modeling the same data differently is mentioned, it is abundantly clear that what Mayo has in mind is far more varied, including experiments with different samples and with different settings! I just wonder how these different experiments can provide tests for j and j' relativized to E, let alone severe tests!

The source of the difficulties should now be clear: Mayo failed to state clearly that "E" (for experiment) is a variable, so that both j and j' are general hypotheses to that extent. Thus, if Mayo says that j passed in experiment Ei, this means "a particular instance of j passed that test"; that is, we may accept that "with this sample in such-and-such a setting, displacements are normally distributed". And it is well known that for testing a general hypothesis (such as "the Brownian motion in general can be regarded as completely irregular"), you have got to examine various instances, changing the conditions surrounding these instances, as Perrin in fact did with respect to the nature of emulsion, viscosity, the size and the mass of the grains, etc.; this surely eliminates a certain range of systematic effects. But the effect of this sort of variety of instances must be explicated in probabilistic terms, according to Mayo's proposed severity requirement, and as I see it, Mayo has not done this. In short, Mayo should have distinguished carefully between the test of a general hypothesis and the test of its instances, and should have related these two. This leads us to distinguish between (1) severity applied to (tests of) a general hypothesis and (2) severity applied to (tests of) a singular hypothesis.

And this immediately produces a new distinction: the falsity of a general hypothesis j and the falsity of an instance of j. For example, the negation of "All men are mortal" and the negation of "Socrates is mortal" are quite different; and likewise the negation of j and the negation of an instance of j are quite different. To continue our simple example, in order to falsify "All men are mortal", any single counterexample, say the immortality of Jupiter, is enough, but it is irrelevant to the falsity of the specific instance "Socrates is mortal"; in order to deny the latter, you've got to show Socrates is immortal. Likewise, there are many specific ways (that is, kinds of dependency on systematic effects) to deny j, but given a specific instance of j, there is no such freedom. The systematic effect may be due to temperature, to the size of the grains, to the viscosity of the emulsion, etc., etc., and hence Perrin paid attention to such factors in his individual experiments. Therefore, what he passed by these tests were specific instances of j, not j itself; what is rejected is not the negation of j (i.e. j') but the negation of a specific instance of j. But Mayo consistently used (see pp. 223-230) j and j' for the hypotheses in specific experiments (i.e. for instances of j etc.), disregarding the preceding distinction completely, which makes her discussion quite confusing and sloppy.

Anyway, Mayo continues her argument (or interpretation of what Perrin did) still further:

The argument goes like this: if experiment E was not correctly described by the hypothesis j (i.e., if j' were true), there would not be an equal chance of being displaced by an amount in either direction for each particle; there would be some dependencies. But we know what it is like to interact with a mechanism with such dependencies. We know what would be expected in those sorts of experiments---we can even "display" it. (230)

Yes, given a specific alternative (which implies j'), we can even display how displacements are different from those actually observed; but with j' alone, we cannot do this, and we cannot provide probabilistic predictions either (notice that j' is far weaker than the specific alternative implying j'; the problem of "catchall" appears again). In other words, Perrin excluded literally only several kinds of systematic effects by his experiments; so how can Mayo obtain the low probability (in the frequentist's sense)* of passing j in case j is false? Mayo's discussion is not informative as regards this question; maybe she has to invoke background information, auxiliary hypotheses, etc.; I do not know. It may well be the case that Mayo decides not to say "systematic effects" unless the results significantly deviate from the normal curve; but in this case she cannot speak of severity without circularity (since setting the significance level defines the alternative hypothesis j'). Thus, despite what Mayo said in these several pages, I do not see how Mayo can reconstruct Perrin's inference in terms of her notion of severity. Her account needs reworking, in view of the distinctions she has neglected.
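The dependence on a specific alternative can be illustrated with a small simulation. What follows is only a sketch under assumptions that appear neither in Mayo's text nor in Perrin's data: a simple two-sided sign test stands in for the check of "an equal chance of being displaced by an amount in either direction", the sample size of 500 echoes the 500 displacements Mayo mentions, and the three alternatives (a large drift, a small drift, a spread that grows during the run) are hypothetical ways in which j' might be true. The function names and all numerical settings are illustrative choices.

```python
import math
import random

N = 500  # sample size, echoing the 500 displacements mentioned in the text


def sign_test_rejects(sample, z_crit=1.96):
    """Two-sided sign test: are positive and negative displacements equally likely?"""
    n = len(sample)
    positives = sum(1 for x in sample if x > 0)
    z = (positives - n / 2) / math.sqrt(n / 4)  # normal approximation to the binomial
    return abs(z) > z_crit


def rejection_rate(draw_sample, trials=400, seed=0):
    """Relative frequency with which the test rejects over repeated simulated runs."""
    rng = random.Random(seed)
    return sum(sign_test_rejects(draw_sample(rng)) for _ in range(trials)) / trials


def under_j(rng):
    # j: displacements behave like a random sample from a Normal process
    return [rng.gauss(0.0, 1.0) for _ in range(N)]


def large_drift(rng):
    # hypothetical specific alternative: a marked drift in one direction
    return [rng.gauss(0.2, 1.0) for _ in range(N)]


def small_drift(rng):
    # hypothetical specific alternative: a very slight drift
    return [rng.gauss(0.02, 1.0) for _ in range(N)]


def changing_spread(rng):
    # hypothetical specific alternative: symmetric, but the spread grows during the run
    return [rng.gauss(0.0, 1.0 + 2.0 * i / N) for i in range(N)]


scenarios = {
    "j true": under_j,
    "large drift": large_drift,
    "small drift": small_drift,
    "changing spread": changing_spread,
}
rates = {name: rejection_rate(draw) for name, draw in scenarios.items()}

for name, rate in rates.items():
    print(f"{name:16s} rejection rate = {rate:.3f}")
```

In this sketch the test rejects almost always under the large drift, but under the small drift or the changing spread it rejects about as rarely as when j is true. The probability of passing j when j is false is therefore fixed only once a specific alternative is named; j' alone, as a catchall, yields no such number.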

I have already suggested other ways (by no means refined) to interpret the importance of multiple measurements or tests (including Argument from Coincidence); but for a better way to reconstruct Perrin's argument within Mayo's Error Statistics, see How to reconstruct Perrin's argument? (which occurred to me on June 17).

*Notice that this low probability must be obtained from "random experiments" for the frequentists such as Neyman, Pearson, and Mayo. See pp. 165-167. Thus Perrin's several experiments for checking the effect of temperature etc. are not sufficient for inferring this probability; moreover, Perrin's experiments are not designed for this purpose in the first place.




Last modified Jan. 27, 2003. (c) Soshichi Uchii

suchii@bun.kyoto-u.ac.jp