The analytical form of the f(t) function is rather complicated, which is why a table of pre-calculated Student’s coefficients is usually used instead. In that case all you need to know is the number of degrees of freedom, f = n - 1, and the required confidence probability.
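The table lookup can be sketched in a few lines of Python. This is only an illustration: the dictionary holds the standard two-sided Student’s coefficients for a confidence probability of 0.95, and the names `STUDENT_T_095` and `student_coefficient` are made up for this example.

```python
# Two-sided Student's coefficients for a confidence probability of 0.95,
# keyed by the number of degrees of freedom f = n - 1 (standard table values).
STUDENT_T_095 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def student_coefficient(n_runs):
    """Return the 0.95 Student's coefficient for a sample of n_runs results."""
    degrees_of_freedom = n_runs - 1
    if degrees_of_freedom not in STUDENT_T_095:
        raise ValueError("no table entry for f = %d" % degrees_of_freedom)
    return STUDENT_T_095[degrees_of_freedom]


print(student_coefficient(10))  # f = 9, prints 2.262
```

For other confidence probabilities or larger samples the table would have to be extended from a statistics reference.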

And now that we have gone through all this theoretical mathematics, let’s return to our initial goal.

When we compared the average performance coefficients of the tested HDDs, I got the impression that the difference between them was too small. And I immediately asked myself: is this difference statistically significant? Since few testers can afford to repeat the HDD tests many times, no one can actually guarantee that the average value will not shift to the right or to the left when the sample is small (only a few repeated experiments). In that case HDDs that really perform close to one another may simply swap places in the rating…

So, we need to figure out the width of the confidence interval within which the average HDD speed obtained from 10 experiments will fall with a given confidence probability (take, for instance, 0.95). Once we have the width of the confidence interval, we will check whether the confidence intervals of different HDDs overlap.
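Here is a minimal Python sketch of that plan. It assumes the usual form of the interval, mean ± t·S/√n, with the Student’s coefficient 2.262 for 10 runs at 0.95. The scores and the function names are hypothetical and serve only to show the mechanics.

```python
import math
import statistics

T_095_F9 = 2.262  # Student's coefficient for 0.95 and f = 10 - 1 = 9


def confidence_interval(results, t_coefficient=T_095_F9):
    """Return (low, high) bounds for the mean of the repeated results.

    The default coefficient is only valid for samples of 10 results.
    """
    n = len(results)
    mean = statistics.mean(results)
    s = statistics.stdev(results)  # sample standard deviation S
    half_width = t_coefficient * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical scores from 10 runs on two drives (not measured data).
drive_a = [50.1, 49.8, 50.4, 50.0, 49.9, 50.2, 50.3, 49.7, 50.0, 50.1]
drive_b = [50.3, 50.6, 50.1, 50.4, 50.2, 50.5, 50.0, 50.3, 50.4, 50.2]

ci_a = confidence_interval(drive_a)
ci_b = confidence_interval(drive_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

If the two intervals overlap, the difference between the average results of the two drives cannot be considered reliable at the chosen confidence probability.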

First, let’s check how S (the standard deviation, calculated as the square root of the dispersion) depends on the type of test:

As we can see, the deviation of the results is the highest in the Copy trace. The smallest deviation, on the contrary, is observed in General Hard Disk Drive Usage. I wonder if it has anything to do with the time it takes to run the entire trace?

Just in case, let’s also check whether the difference in dispersion is random or meaningful. The criterion for determining the significance of the dispersion difference at a certain significance level (1 - 0.95 = 0.05) is known as the F-criterion (Fisher’s criterion) and is based on Fisher’s distribution.

The value of F for the two samples in question is calculated as the ratio of their dispersions, S1²/S2², with the larger dispersion placed in the numerator. If the obtained F is lower than Fcritical at the given level, then we can consider the experimental results represented by both samples equally precise.
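The comparison can be sketched as follows. The critical value 3.18 is the tabulated upper 5% point of Fisher’s distribution for 9 and 9 degrees of freedom, which is an assumption that both samples contain 10 results. The function names are hypothetical.

```python
import statistics

# Tabulated upper 5% point of Fisher's distribution for 9 and 9 degrees of
# freedom (assumes two samples of 10 results each).
F_CRITICAL_9_9 = 3.18


def f_ratio(sample_1, sample_2):
    """Return the ratio of sample dispersions, larger one in the numerator."""
    var_1 = statistics.variance(sample_1)  # S1 squared
    var_2 = statistics.variance(sample_2)  # S2 squared
    # Raises ZeroDivisionError if the smaller dispersion is exactly zero.
    return max(var_1, var_2) / min(var_1, var_2)


def equally_precise(sample_1, sample_2, f_critical=F_CRITICAL_9_9):
    """Return True if the dispersions do not differ at the given level."""
    return f_ratio(sample_1, sample_2) < f_critical
```

For samples of a different size, `f_critical` has to be taken from the table row and column that match their degrees of freedom.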

To simplify all these calculations, we will take the samples with the maximal and minimal S for each type of test: if this pair passes Fisher’s criterion, then all the other results will surely pass as well.
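The reason this shortcut works is that the pair with the maximal and minimal S always gives the largest possible F, provided all samples are of the same size and therefore share one Fcritical. A small sketch with hypothetical S values (not measured data) illustrates this:

```python
import math
from itertools import combinations


def worst_case_f(std_devs):
    """Return F for the pair with the largest and the smallest S."""
    return (max(std_devs) / min(std_devs)) ** 2


# Hypothetical standard deviations of several drives in one test.
s_values = [0.12, 0.20, 0.15, 0.18]

# F for every possible pair of drives, larger S in the numerator.
largest_pairwise_f = max(
    (max(a, b) / min(a, b)) ** 2 for a, b in combinations(s_values, 2)
)

# The extreme pair yields the same value as the exhaustive search.
print(math.isclose(worst_case_f(s_values), largest_pairwise_f))
```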

The calculations revealed that our results can be considered equally precise.

And now only two things remain to be done.