Error Statistics

Mayo on the "reliability" of a hypothesis

Some readers may feel that my criticism of Mayo as "not a conscientious frequentist" rests on a misunderstanding and is hence unfair. So let us examine anew what she says on the crucial points, and see whether or not my criticism is unfair.

In the first place, we have to keep in mind how she defined the "reliability" of a hypothesis.

Learning that hypothesis H is reliable, I propose, means learning that what H says about certain experimental results will often be close to the results actually produced---that H will or would often succeed in specified experimental applications. ... This knowledge, I argue, results from procedures (e.g., severe tests) whose reliability is of precisely the same variety. My aim will be to show how passing a severe test teaches about experimental distributions or processes, and how this, in turn, grounds experimental knowledge. (10)

Although there is still some looseness in the notions of "closeness" and "success", this does not affect her argument, nor mine. Now, in view of what she says in later parts of the book, we can understand the "reliability of procedures" as follows:

(1) A procedure or a test is reliable if it frequently detects errors when errors are indeed present, and does not signal errors when there are none. (See, e.g., 18-19, 64)

Thus, she presents what she calls the argument from error:

It is learned that an error is absent when (and only to the extent that) a procedure of inquiry (which may include several tests) having a high probability of detecting the error if (and only if) it exists nevertheless fails to do so, but instead produces results that accord well with the absence of the error. (64)
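Her condition (1) and the argument from error can be put schematically (the notation is mine, not Mayo's). Writing E for the presence of the error and D for the event that the procedure signals it, the reliability of the procedure amounts to

\[ P(D \mid E) \ \text{is high}, \qquad P(D \mid \neg E) \ \text{is low}, \]

and the argument from error licenses the inference to "the error is absent" when such a procedure nevertheless fails to signal it. On her frequency view, these probabilities are relative frequencies in repeated applications of the procedure.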

Remember that she holds the frequency view of probability (like Wes Salmon's). The reliability of a procedure is then transferred to a hypothesis H, via her definition of the "reliability of a hypothesis" stated in the first quotation. Thus,

(2) A hypothesis H is reliable if it frequently succeeds in experimental prediction.

The reliability in (2) is assured by the procedure applied to H, if H passes its tests; and the reliability of the procedure is defined by (1). It looks as if she has nicely succeeded in defining "reliability" in terms of frequency (or relative frequency, to be precise). And of course, the degree of reliability may also be obtained by referring to the probability in the argument from error.

It looks, also (although Mayo herself does not suggest this), as if we can easily extend Mayo's method of defining "reliability" to single cases, such as weather forecasts and bets. The forecast "It will rain tomorrow" is reliable because it is produced by a reliable forecasting system; and the determination that "The fair betting ratio for a head on this coin is 1/2" is reliable because a reliable procedure assures us that this coin is not biased. Moreover, the numerical forecast "The chance of rain tomorrow morning is 75 percent" can also be reliable for the same reason. Then, you may imagine that the alleged difficulty for the frequentist in dealing with the probability of single events is now solved! But is there not something funny going on in this sort of extension? Isn't it too easy to "solve", in such a manner, one of the hardest problems for the frequentist?

I am not trying to attribute this "easy" extension to Mayo, but to draw the reader's attention to the underlying problem in Mayo's error statistics and its use for epistemological argument. If you do not want to extend Mayo's method indefinitely, where should you stop? She does not discuss this question; but her implicit assumption seems to be this: you may talk about the reliability of something, be it a hypothesis, a rule, or a procedure, if its applications can be repeated in similar circumstances, and if you can meaningfully talk about statistics of success or failure. But recall that the frequentist usually takes the limit of relative frequency as his/her definition of probability; thus statistics is not enough, and you have to assume the convergence of the relative frequency. Then, even if we may assume the probability as given, Mayo's "application" of its value to hypotheses or rules as a measure of "reliability" begs at least two questions:

(A) Why is the limit of relative frequency applicable to finite cases (including single cases), which have a structure different from that of the infinite sequence which defines the probability?

(B) Where does the prescriptive element of "reliability" come from, given that "reliability" provides a "guide of life", or practical guidance?

Let us examine these two questions more closely. As regards (A), I presume we do not have to spend much space proving that a finite sequence and an infinite sequence have different mathematical structures. And when we talk about the application of some mathematical concept to empirical phenomena, we assume that it is a common structure between the two that validates this application. Think of any calculation, geometrical measurement, or the structural design of a building; these all satisfy this assumption. But for applications of probability (as a limit of relative frequency) to a finite sequence, this assumption is not satisfied; and the notorious case of single events is nothing but a special case of this difficulty (which Wes Salmon named the problem of the "short run").
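For reference, the frequentist definition at issue can be written out (in the usual notation, which I supply): if m_n(A) is the number of occurrences of an attribute A in the first n trials of an infinite sequence, then

\[ P(A) \;=\; \lim_{n \to \infty} \frac{m_n(A)}{n}. \]

A finite sequence, let alone a single case, simply has no such limit as part of its structure; that is the mismatch to which question (A) points.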

One may argue that this difficulty can be resolved by calculating the probability that the relative frequency in a finite sequence comes within some interval. But this argument is circular, in that the probability invoked here is again the limit of relative frequency in another infinite sequence; thus the same difficulty recurs with this probability, generating an infinite regress. Reichenbach and Wes Salmon clearly saw this difficulty, and that is why they appealed to the "pragmatic vindication" of the straight rule (which extrapolates an observed relative frequency as the value of the limit); they simply "posit" that value (with no probability assigned to it) until a new value is obtained on a larger body of evidence. Mayo simply disregards this difficulty when she introduces "reliability", and stops at the reliability of the procedure. Thus, in speaking of "reliability" in the day-to-day practice of science (which is primarily concerned with finite cases), she has either (i) to stick to the frequency view and admit the infinite regress, or (ii) to join Reichenbach and Salmon in their "pragmatic vindication", thereby renouncing "reliability" at some level, or (iii) to appeal to the Bayesian notion of probability (which clearly has a prescriptive or pragmatic import) or something like it.
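The regress can be displayed concretely with, say, the Chebyshev bound for Bernoulli trials (my illustration, not Mayo's or Salmon's): for n independent trials with probability p of the attribute,

\[ P\!\left( \left| \frac{m_n(A)}{n} - p \right| < \varepsilon \right) \;\ge\; 1 - \frac{p(1-p)}{n \varepsilon^{2}}. \]

On the frequency view, the outer P must itself be read as a limiting relative frequency over an infinite sequence of n-trial blocks; so the question of applying it to our one finite block arises again at the next level, and so on without end.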

With the preceding argument, we can also see the alternative positions Mayo may adopt as regards question (B). If she chooses (i), she has to admit she has no answer; if (ii), "reliability" has a pragmatic import provided by the pragmatic vindication (a success in the long run), but its main thrust comes from the expected convergence of relative frequency in an infinite sequence, and it may not be relevant to a finite case (the "short run"); and finally, if (iii), her position is a curious mixture of the frequency view and Bayesianism, contrary to her intention.


The preceding has been my initial reaction to Mayo's position. But readers who have finished this book may object that she addresses herself to similar questions in later chapters; in particular, she discusses Peircean error correction (the self-correcting thesis, in chapter 12), and this may suggest an answer to my criticism. I am aware of this, and I went over her argument, but could not find a satisfactory answer. So let me briefly say why I find her answer unsatisfactory.

Mayo proposes a new reading of Peirce's theory of induction, and criticizes the standard interpretation (in terms of the Straight Rule for inferring the limit of relative frequency, and the distinction between Quantitative and Qualitative induction) by saying that it largely misses Peirce's real point. Mayo, instead, understands Peirce in terms of severe testing, in conformity with her own philosophy of experiment. Her discussion is interesting; but let me go straight to the heart of her assertion, which appears in section 12.4. As she understands Peirce's main contention, it is the self-corrective character of induction that matters.

Induction corrects its premises by checking, correcting, or validating its own assumptions. One way that induction corrects its premises is by correcting and improving upon the accuracy of its data. The idea is a fundamental part of what allows induction---understood as severe testing---to be genuinely ampliative. It is why, in an important sense, statistical considerations allow one to come out with more than is put in. At times, even "garbage in" need not mean "garbage out". (435)

The examples Mayo quotes from Peirce are astronomical data and statistics of ages. Utilizing statistical machinery, we can obtain more accurate data than those actually collected. Then Mayo continues as follows:

As with the star catalogue in astronomy, the data thus corrected are more accurate than the original data. That is Peirce's main point. The thrust of the thesis that induction corrects its own premises is easy to put in terms of our error statistical framework: by means of an informal tool kit of key errors and their causes, coupled with systematic tools to model them, experimental inquiry checks and corrects its own assumptions for the purpose of carrying out some other (primary) inquiry.

These cases of correcting premises underscore what I have maintained for Peircean self-correction generally. It is not a matter of saying that with enough data we will get better and better estimates of the star positions or the distribution of ages in a population. It is a matter of being able to employ methods right now to detect and correct mistakes in a given inquiry. (436)

But notice that this is possible because we use such methods as, for example, least squares, which were already justified in terms of probability (roughly, the method of least squares gives the best estimate obtainable from the collected data, in the sense that this estimate is the most probable one). Reichenbach and Wes Salmon certainly knew this much. Their question, however, is concerned with how we can justify our use of probability (the frequency interpretation, in particular) which may be employed in statistical methods, such as least squares. Thus I think Mayo completely misses the question.
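To recall the textbook justification (my summary, not Mayo's wording): given measurements x_1, ..., x_n of a quantity \theta with independent errors, least squares takes the estimate minimizing the sum of squared residuals,

\[ \hat{\theta} \;=\; \arg\min_{\theta} \sum_{i=1}^{n} (x_i - \theta)^2 \;=\; \frac{1}{n} \sum_{i=1}^{n} x_i, \]

and this choice is vindicated probabilistically: under Gaussian errors \hat{\theta} is the most probable (maximum-likelihood) value, and in any case it is unbiased with the least variance among linear unbiased estimates (the Gauss-Markov theorem). The vindication thus presupposes a probability model already in place; it cannot, on pain of circularity, supply the interpretation of probability that Reichenbach and Salmon were asking about.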

In view of this, my initial response does not change even after reading her later chapters. She did not wrestle with the question Reichenbach and Wes Salmon tried to solve.




November 24, 2000; last modified Jan. 26, 2003. (c) Soshichi Uchii

suchii@bun.kyoto-u.ac.jp