I spent most of yesterday learning about statistics, specifically Margins of Error and Confidence Levels, in preparation for representing a client in Tribunal on Tuesday.
I needed to produce a document that explained these concepts to someone with no prior knowledge.
Here's what I produced. Let me know what you think:
A large part of the Appellants’ case rests on the accuracy of The Commissioners’ observation exercise. To understand what effect this has, it is important to understand a couple of fundamental principles of statistics. These are detailed below.
Relevant figures from The Commissioners’ assessment:
Sample Size (N) = 31 (number of meals observed)
Proportion (p) = 0.4028 (proportion of total meals that were not declared, as per Mrs. Clements’ recalculation in her letter of 10 August 2003)
1) Margin of error (M)
All sampling exercises are prone to error. The larger the sample, the smaller the possible error. The only way to avoid error would be to observe or sample the whole population. Clearly this is infeasible, but it is important to be able to quantify what the error is in a sampling exercise.
Two formulae are commonly used to calculate a margin of error. Both of the formulae shown here correspond to a confidence level of 95% (described below).
a) Simple formula.
The quickest and simplest formula is:
M = 1 / SQRT(N) (1 divided by the Square Root of N)
In this instance this is 1 / SQRT(31), which equals 17.96%
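The arithmetic can be checked with a few lines of Python. This is only a sketch; the function name `simple_margin_of_error` is made up for illustration and is not part of the assessment.

```python
import math

def simple_margin_of_error(sample_size: int) -> float:
    """Rough 95% margin of error, 1 / sqrt(N), returned as a proportion."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 1 / math.sqrt(sample_size)

# N = 31 meals observed, as in the figures above.
margin = simple_margin_of_error(31)
print(f"{margin:.2%}")  # 17.96%
```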
b) Advanced formula
A more complex formula, which is adjusted for the result achieved, is:
M = 1.96 * SQRT((p * (1-p))/N)
1.96 is a constant governed by the required confidence level; here it gives a 95% confidence level (approximately two standard deviations). A constant of 2.576 would give a 99% confidence level.
In this instance this is 1.96 * SQRT((0.4028 * (1-0.4028))/31), which equals 17.27%
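For anyone who wants to rerun this calculation, the advanced formula can be sketched in Python as follows. The function name `margin_of_error` and its parameter names are illustrative choices, not anything taken from the assessment.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """M = z * sqrt(p * (1 - p) / n), returned as a proportion."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.4028 and N = 31, as in the figures above; z = 1.96 for 95% confidence.
m = margin_of_error(0.4028, 31)
print(f"{m:.2%}")
```

Note that the margin is widest when p is 0.5, which is why the simple formula, which ignores p, always gives a slightly larger figure.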
c) An even more complex formula can be used to adjust for the population size (i.e. the total number of meals sold during the period).
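One common form of this adjustment is the finite population correction, which multiplies the margin by SQRT((P - N) / (P - 1)), where P is the population size. The sketch below assumes that form. The total number of meals sold is not stated here, so `HYPOTHETICAL_TOTAL_MEALS` is a placeholder value chosen purely for illustration, and the function names are likewise made up.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Unadjusted margin: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

def finite_population_margin(p: float, n: int, population: int,
                             z: float = 1.96) -> float:
    """Margin adjusted by the finite population correction."""
    if population <= 1 or n > population:
        raise ValueError("population must exceed 1 and be at least n")
    correction = math.sqrt((population - n) / (population - 1))
    return margin_of_error(p, n, z) * correction

# ASSUMPTION: placeholder population size, not a figure from the case.
HYPOTHETICAL_TOTAL_MEALS = 1000
adjusted = finite_population_margin(0.4028, 31, HYPOTHETICAL_TOTAL_MEALS)
print(f"{adjusted:.2%}")
```

The correction only makes a noticeable difference when the sample is a sizeable fraction of the population; for a sample of 31 out of a much larger total it changes the margin very little.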
The effect of this is that the true answer to the sampling exercise could fall anywhere within 17.96 (or, using the advanced formula, about 17.3) percentage points of the answer achieved by the exercise calculation. This means that, based on the 31 meals observed, the suppression level could lie anywhere between 22.32% and 58.24% (40.28 plus and minus 17.96). The calculated figure of 40.28% is the single most likely value, and values become gradually less likely towards either end of the range, but the sample is too small to rule out any figure within it. A suppression rate of 22.32% is consistent with the observations, just as a rate of 40.28% is.
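The range quoted above is simply the calculated proportion plus and minus the margin of error. A minimal sketch, using the rounded figures from the text and an illustrative function name:

```python
def confidence_interval(p: float, margin: float) -> tuple[float, float]:
    """Return (lower, upper) bounds, clamped to the valid range 0 to 1."""
    lower = max(0.0, p - margin)
    upper = min(1.0, p + margin)
    return lower, upper

# 40.28% calculated suppression, 17.96% margin from the simple formula.
low, high = confidence_interval(0.4028, 0.1796)
print(f"{low:.2%} to {high:.2%}")  # 22.32% to 58.24%
```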
2) Confidence Level
There is always a possibility that the samples taken give a result outside of this margin of error. For example, if by chance the Officers had observed every single undeclared meal during the period, the correct level of suppression would be just 5.8% (observed undeclared meals as a percentage of total meals sold during the period).
The confidence level is the probability that the true result will fall within the margin of error. The above calculations have been done to give a confidence level of 95%. This means that there is a 1 in 20 chance that the true level of suppression in the observed period falls outside of the margin of error.
A margin of error calculation using a higher confidence level produces a wider margin of error. Here, reworking the second formula for a 99% confidence level (a constant of approximately 2.576) would give a margin of error of 22.69%.
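The constant for any confidence level can be derived from the standard normal distribution rather than looked up. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper names `z_for_confidence` and `margin_at_confidence` are illustrative only.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided z value, e.g. 0.95 gives roughly 1.96."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_at_confidence(p: float, n: int, confidence: float) -> float:
    """Advanced formula with the constant derived from the confidence level."""
    return z_for_confidence(confidence) * math.sqrt(p * (1 - p) / n)

# Compare the two confidence levels discussed above for p = 0.4028, N = 31.
for level in (0.95, 0.99):
    z = z_for_confidence(level)
    m = margin_at_confidence(0.4028, 31, level)
    print(f"{level:.0%}: z = {z:.3f}, margin = {m:.2%}")
```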
3) Random Sample
The margin of error calculation assumes that a truly random sample has been used. If the sample has been biased by any factor, the true error could greatly exceed the result of the calculation. It is impossible to quantify the effect of a bias on the sample.
4) Summary
Ignoring the effect of any bias on the sample, the true suppression level could plausibly lie anywhere between 22.32% and 58.24%. The calculated figure of 40.28% is the midpoint of this range and the best single estimate, but the sample is too small to exclude any other figure within the range.
Further, there is a 1 in 20 chance that the true level of undeclared meals during the observed period falls outside of this range.
The only absolute certainty is that the level of undeclared meals is at least 5.8% (observed undeclared meals as a percentage of total meals sold during the period).