And debunked here.
Let's look at a model statistical problem.
There are two automated production lathes next to one another. The product is monitored for length.
Both machines have been characterized as having a size distribution with a standard deviation of 0.1 for individual parts.
Current measurements indicate the sample mean for lathe A is 1.00.
Current measurements indicate the sample mean for lathe B is 0.95.
Now, the question is: is the true mean for lathe B different from the true mean for lathe A?
If we take a single piece from each machine and compare their lengths, we cannot tell whether there is a difference between the two machines.
Now, if we take a large sample from each machine, we can detect whether there is actually a difference.
For example, let's use a sample size of 10 for each machine.
sigma(mean) = sigma(individual part) / sqrt(n)
Therefore, the standard deviation of the mean is about 0.032.
Three sigma, the most common threshold for calling two means different, is then roughly 0.1. The observed difference of 0.05 is well inside that, so it is lost in the statistical noise.
Now, let's increase the sample size to 101.
Applying the same calculation, we find the standard deviation of the mean is approximately 0.01.
Now the 0.05 difference between the means is about 5 sigma(mean), and the test indicates there really is a difference between the two means.
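For anyone who wants to check the arithmetic, here is a minimal sketch in Python (the variable names are mine, and it assumes the usual sigma/sqrt(n) formula for the standard error of the mean):

```python
import math

sigma_part = 0.1             # standard deviation of individual parts (both lathes)
mean_a, mean_b = 1.00, 0.95  # current sample means for lathe A and lathe B
diff = abs(mean_a - mean_b)

for n in (10, 101):
    sigma_mean = sigma_part / math.sqrt(n)   # standard error of the mean
    print(f"n = {n:3d}: sigma(mean) = {sigma_mean:.3f}, "
          f"3-sigma threshold = {3 * sigma_mean:.3f}, "
          f"observed difference = {diff / sigma_mean:.1f} sigma(mean)")
```

At n = 10 the three-sigma threshold (about 0.095) swallows the 0.05 difference; at n = 101 that same difference sits about 5 standard errors out.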
Now that we have worked through this example, we can understand the principle underlying the study on finger length ratios. If you want to test for a significant difference between the mean finger length ratios of the two groups, use a bigger sample.
It does not matter that the individual distributions overlap; a sufficiently large sample size allows a difference between the mean values to be detected.
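Here is a rough simulation sketch of that last point, assuming normal distributions with the same numbers as the lathe example. It compares the sample means using the standard error of the difference between two means, sigma*sqrt(2/n), which is the slightly more careful version of the comparison above; the function and names are mine, not from any particular study.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def separation(n, mu_a=1.00, mu_b=0.95, sigma=0.1):
    """Draw n parts from each (simulated) lathe and report how many
    standard errors apart the two sample means land."""
    a = [random.gauss(mu_a, sigma) for _ in range(n)]
    b = [random.gauss(mu_b, sigma) for _ in range(n)]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    se_diff = sigma * math.sqrt(2.0 / n)  # std. error of the difference of two means
    return (mean_a - mean_b) / se_diff

for n in (1, 10, 100, 1000):
    print(f"n = {n:4d}: means differ by {separation(n):5.1f} standard errors")
```

Even though almost every individual part from lathe B falls inside lathe A's range, the separation between the sample means grows with sqrt(n), so by the time n is in the hundreds the difference stands out by many standard errors.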