Probability is an empirical measurement of an ensemble of events. It means: Given a set of N independent events, the probability of a specific event is, to a degree of certainty, the number of times the specific event occurred divided by the total N, as N becomes large. By “a degree of certainty,” it is meant simply that the uncertainty in the measurement can be made smaller and smaller by increasing N. (Since this is an inductive process, it has the characteristics of induction, including the requirement of objectively determining when N is large enough to achieve certainty of the probability measure.)
Let me work out the math to calculate the degree of certainty. Consider a coin tossed N times, and suppose that M of the tosses came up heads (H). To simplify the math (by keeping it in the discrete domain), suppose I know that the coin has been designed so that its “true” probability of heads on a single toss, r, is either p or q. Let $H_{M,N}$ denote the event of obtaining M heads from N tosses, and let P(A/B) denote the conditional probability of A given B.
Using Bayes’ theorem,
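$$P(r = p \,/\, H_{M,N}) = \frac{P(H_{M,N} \,/\, r = p)\, P(r = p)}{P(H_{M,N} \,/\, r = p)\, P(r = p) + P(H_{M,N} \,/\, r = q)\, P(r = q)}$$

where

$$P(H_{M,N} \,/\, r = p) = \binom{N}{M}\, p^{M} (1-p)^{N-M}$$

$$P(H_{M,N} \,/\, r = q) = \binom{N}{M}\, q^{M} (1-q)^{N-M}$$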
If one knows P(r = p), the prior probability that the true probability is p, one can calculate $P(r = p \,/\, H_{M,N})$, the degree of certainty for the probability estimate r = p given the empirical data. The problem is that to calculate the degree of certainty of a probability estimate based on empirical data, one needs another probability number. To take a concrete example, suppose I know that my coin has a ‘true’ probability of heads of either 0.3 or 0.4 for a single toss. I toss the coin 100 times and get 33 heads, so that N = 100, M = 33, p = 0.3, q = 0.4. If I take P(r = 0.3) to be 0.5, then the degree of certainty that r = 0.3 works out to be 69.7%. The problem is that the value of 0.5 for P(r = 0.3) is still arbitrary. It has no basis in empirical data.
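The concrete example can be checked with a few lines of Python. This is only a minimal sketch of the two-hypothesis calculation; the function name `posterior_p` and its parameter names are mine, chosen for illustration.

```python
from math import comb

def posterior_p(n, m, p, q, prior_p=0.5):
    """Posterior probability that r = p, given m heads in n tosses,
    when r is known to be either p or q (two-hypothesis Bayes' theorem)."""
    # Binomial likelihood of m heads in n tosses under each hypothesis
    like_p = comb(n, m) * p**m * (1 - p)**(n - m)
    like_q = comb(n, m) * q**m * (1 - q)**(n - m)
    prior_q = 1 - prior_p  # the two priors must sum to 1
    return like_p * prior_p / (like_p * prior_p + like_q * prior_q)

# N = 100, M = 33, p = 0.3, q = 0.4, P(r = 0.3) = 0.5
print(f"{posterior_p(100, 33, 0.3, 0.4):.1%}")  # prints 69.7%
```

Changing `prior_p` changes the answer, which is exactly the arbitrariness at issue.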
One can extend this to the continuous domain, where r may take any value between 0 and 1. To get a degree of certainty measure, one will need a prior probability distribution for the “true” probability and this distribution will have to be arbitrary. Just as I used a value of 0.5 in my concrete example, one may take this distribution to be the uniform distribution. I have not worked out the math for this case, but it should be easy to do so.
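For reference, here is a sketch of how that calculation would start, assuming the uniform prior density f(r) = 1 on [0, 1]. The symbols f (a probability density), s (a dummy integration variable), and a, b (interval endpoints) are introduced here for illustration. Bayes' theorem for densities gives

$$f(r \,/\, H_{M,N}) = \frac{\binom{N}{M}\, r^{M} (1-r)^{N-M}}{\int_0^1 \binom{N}{M}\, s^{M} (1-s)^{N-M}\, ds} = \frac{(N+1)!}{M!\,(N-M)!}\; r^{M} (1-r)^{N-M}$$

which is the density of a Beta(M+1, N−M+1) distribution. Since r is now continuous, a degree of certainty can only be attached to an interval, not to a single value:

$$P(a \le r \le b \,/\, H_{M,N}) = \int_a^b f(r \,/\, H_{M,N})\, dr$$

A different prior density would give a different answer, so the uniform choice plays the same role here that the value 0.5 played in the discrete example.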
Anyway, it turns out that as one increases N and M proportionately, the degree of certainty for the probability estimate r = M / N rises toward 100% very quickly, irrespective of the arbitrarily chosen prior probabilities. Practically, this is a very useful feature, and it is what Stephen refers to when he writes that the uncertainty in the measurement can be made smaller and smaller by increasing N. But does it change the epistemological status of probability calculations? I don’t think so. As long as N is finite (that is, always), the degree of certainty is arbitrary. At some level, probability calculations always depend on an arbitrary choice of equal likelihood. To see this, just consider Bayes’ theorem above. Its denominator is a weighted average of the likelihoods, where the weights are the prior (or unconditional) probabilities. These unconditional probabilities are usually themselves estimated from other empirical data. Regardless, the calculation of an average assumes an equality of significance of the numbers being averaged. My position is that this assumption of equality is an arbitrary assumption. By using more and more empirical data, one can drive this assumption deeper and deeper, but unless one develops a physical theory, that is, a cause and effect relationship, one cannot get rid of it.
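To illustrate both halves of this point, here is a small sketch that reuses the coin with r equal to either 0.3 or 0.4 and keeps M / N fixed at 0.33 while N grows. The function name `posterior_p_log` and the particular priors and sample sizes are my own choices for illustration. It works with the logarithm of the likelihood ratio, since the binomial coefficient cancels and the raw powers would underflow for large N.

```python
from math import exp, log

def posterior_p_log(n, m, p, q, prior_p):
    """Posterior probability that r = p (versus r = q), computed via the
    log-likelihood ratio to avoid underflow when n is large."""
    # log of P(H | r = q) / P(H | r = p); the binomial coefficient cancels
    log_ratio_q_over_p = m * log(q / p) + (n - m) * log((1 - q) / (1 - p))
    prior_odds_q = (1 - prior_p) / prior_p
    # Note: exp() would overflow if the data strongly favoured q and n were
    # huge; that does not happen for the values used below.
    return 1.0 / (1.0 + prior_odds_q * exp(log_ratio_q_over_p))

# Keep M / N = 0.33 and try three different arbitrary priors for r = 0.3
for n in (100, 1000, 10000):
    m = 33 * n // 100
    row = [posterior_p_log(n, m, 0.3, 0.4, prior) for prior in (0.1, 0.5, 0.9)]
    print(n, ["{:.4f}".format(x) for x in row])
```

At N = 100 the three priors give quite different degrees of certainty (roughly 20%, 70% and 95%), while by N = 1000 all three exceed 99%. The dependence on the prior shrinks, but for any finite N it is still there.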