Error Statistics

Argument from Coincidence


In Perrin's book Atoms, the following sort of argument often appears: in order to measure a physical magnitude, such as the density of the granular material or the radius of the grains, one kind of method is not enough; several different methods should be tried in order to obtain reliable results. This may be a piece of "common sense" for experimental researchers. But how should philosophers of science analyze the reasoning behind this instruction or practice? No doubt Mayo's book begins with this sort of motivation.

Now, Perrin used gamboge (which is prepared from a dried vegetable latex) for his study of Brownian motion, and one of the most impressive parts of his work is how carefully he prepared samples for observation. He spent several months in order to obtain the desired samples (95)! But that is not the end of the story; he still had to determine the relevant physical quantities of the samples.

As for the density, he applied three different methods, and for the given lot of gamboge grains he obtained the three values: 1.1942, 1.194, and 1.195. They are very close, but why can we be so sure that these values are close to the true value?

As for the radius of the grains, he used three methods with respect to five kinds of samples, and obtained the following data (98).

             method 1    method 2    method 3
sample I       0.50        --          0.49
sample II      0.46        0.46        0.45
sample III     0.371       0.3667      0.3675
sample IV      --          0.212       0.213
sample V       --          0.14        0.15

Again, the values in the same row are very close, but why can we conclude from this that these values are reliable?


As Mayo has already discussed, Ian Hacking presented and analyzed the "argument from coincidence" (Hacking 1983, 200-202), which seems closely related to the preceding problem.

Two physical processes---electron transmission and fluorescent re-emission---are used to detect the bodies. These processes have virtually nothing in common between them. They are essentially unrelated chunks of physics. It would be a preposterous coincidence if, time and again, two completely different physical processes produced identical visual configurations which were, however, artifacts of the physical processes rather than real structures in the cell. (201)

Thus Hacking's point is that it is extremely improbable that two independent processes (of measurement) produce the same (visual) artifact, so that it is quite reasonable to disregard this possibility. Although Hacking applies this argument to qualitative results, there is no reason why it cannot be applied to quantitative results such as Perrin's; we may then consider that the same sort of argument underlies Perrin's measurements. That is to say, since it is extremely improbable that such similar values would be obtained as artifacts of three different methods of measurement, we may safely conclude that the true value lies within a small interval around them. This is the argument from coincidence, applied to Perrin's measurements.
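To make the improbability claim a little more concrete, here is a minimal sketch of such a coincidence calculation. It is my illustration, not anything in Perrin or Hacking: the "artifact" possibility is modelled, quite crudely, as each method returning a value drawn independently and uniformly from a plausible range of radii, and "agreement" as a spread no larger than a small tolerance; the particular numbers are assumptions.

import random

# Crude model of the "artifact" possibility (an assumption for
# illustration only): if each of three unrelated methods merely
# produced an artifact, modelled as a radius drawn independently and
# uniformly from a plausible range, how often would the three values
# agree as closely as Perrin's figures do?
PLAUSIBLE_RANGE = (0.1, 1.0)   # hypothetical range of candidate radii (microns)
TOLERANCE = 0.01               # "agreement": largest minus smallest value <= this
TRIALS = 1_000_000

def spread_of_three_artifacts():
    values = [random.uniform(*PLAUSIBLE_RANGE) for _ in range(3)]
    return max(values) - min(values)

hits = sum(spread_of_three_artifacts() <= TOLERANCE for _ in range(TRIALS))
print(f"chance of coincidental agreement within {TOLERANCE}: about {hits / TRIALS:.5f}")

On these assumptions the agreement occurs only a few times in ten thousand trials, and however the range and tolerance are varied within reason, the coincidence remains very improbable; that improbability is just what the argument exploits.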

Now, Mayo's severity criterion (SC) can be applied to this case, since it is analogous to estimating the correct value of a parameter. But how should it be applied? I presume the comparison of severity is primarily between two alternative ways of measurement: measurement by a single method vs. measurement by two or more methods. Since the latter can reasonably be expected to detect errors more frequently, it is more reliable, and hence the same hypothesis (the experimental hypothesis about the true value) is tested more severely by the latter than by the former. Repeated measurements and statistical analyses of the data obtained (such as the method of "least squares") can be regarded as ways of achieving this.

To see how this is in accord with (SC), let

h: Error is present, and

j: Error is absent.

Then, writing e for the result that an error is detected, the probabilities according to T1 (single method) and T2 (multiple methods) compare as follows:

P(e | h) is lower and P(~e | h) is higher for T1,

P(e | h) is higher and P(~e | h) is lower for T2,

respectively; so that if the hypothesis j passes the test T2 (with result ~e), then j scores higher as regards severity (recall that severity is calculated in terms of the probability of ~e on ~j, i.e. on h); that is, j passes the more severe test (see the b-version of SC).
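A toy calculation may make this comparison concrete. Suppose, purely for illustration, that each independent method detects a present error with the same probability p and that the methods do not share their errors; then an error escapes T2 only if every method misses it, so P(~e | h) falls as methods are added. The figure p = 0.7 below is an arbitrary assumption.

# Toy numbers, not Perrin's data or Mayo's own calculation: each method
# detects a present error with probability p, and the methods are
# assumed to fail independently of one another.
p = 0.7  # hypothetical per-method chance of detecting an existing error

def miss_probability(k, p):
    """P(~e | h): an existing error escapes all k independent methods."""
    return (1 - p) ** k

for k, label in [(1, "T1 (single method)"), (3, "T2 (three methods)")]:
    miss = miss_probability(k, p)
    print(f"{label}: P(e|h) = {1 - miss:.3f}, P(~e|h) = {miss:.3f}")

On these numbers P(~e | h) drops from 0.300 for T1 to 0.027 for T2; a passing result ~e is thus far less probable under h when T2 is used, which is exactly the sense in which T2 tests j more severely.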

Thus if a certain value is obtained in the latter way, by T2, that value is more reliable, since it is obtained by the more reliable procedure (but notice that we are jumping from j to another hypothesis v, which gives a specific value). Further, I do not doubt that severity in this context can be handled in quantitative terms, since the two ways of measurement can be evaluated, in principle, in terms of error probabilities (notice that measurement belongs to the case of estimation of a parameter, where the range of possible values, i.e. of possible hypotheses, is well defined). This seems to be the way Mayo wishes to adopt. But the problem is, of course, that Mayo seems to go far beyond this, and by means of an inappropriate definition.
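To indicate how the comparison might be made quantitative, here is one minimal sketch, again on assumptions of my own rather than anything in Mayo or Perrin: suppose each method delivers an unbiased measurement of the radius with a normally distributed error of known standard deviation; then the probability that the reported value (a single reading, or the average of three) falls within a given distance of the true value is a straightforward error probability.

from statistics import NormalDist

# Hypothetical error model (not Perrin's own error analysis): each method
# measures the radius with an unbiased, normally distributed error of
# standard deviation SIGMA; averaging k independent methods shrinks the
# standard error to SIGMA / sqrt(k).
SIGMA = 0.01   # assumed spread of a single method's error (microns)
DELTA = 0.005  # how close to the true value we require the estimate to be

for k, label in [(1, "single method"), (3, "average of three methods")]:
    se = SIGMA / k ** 0.5
    coverage = 2 * NormalDist().cdf(DELTA / se) - 1
    print(f"{label}: P(|estimate - true value| <= {DELTA}) = {coverage:.3f}")

The averaged estimate lands within the stated distance of the true value far more often (about 0.61 against 0.38 on these numbers), so the more elaborate procedure is the more reliable one in exactly the error-probability sense invoked above.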


As a historical remark, I wish to add that an idea similar to Hacking's was presented by Popper in a somewhat different context, when he discussed the falsifiability of statistical or probabilistic hypotheses: he proposed "that we take the methodological decision never to explain physical effects, i.e. reproducible regularities, as accumulations of accidents" (Popper 1959, 199).


References

Hacking, I. (1983) Representing and Intervening, Cambridge University Press.

Perrin, J. (1990) Atoms, Ox Bow Press (reprint).

Popper, K. (1959) The Logic of Scientific Discovery, Harper.




Last modified January 27, 2003. (c) Soshichi Uchii

suchii@bun.kyoto-u.ac.jp