This is one of those counter-intuitive mathematical facts about statistics that drive math students to distraction and gamblers to bankruptcy.
Remember that 97% to 99% of a general sample of men will be ‘straight’. If your algorithm simply declared each and every individual ‘straight’, it would operate with 97% to 99% accuracy, misidentifying only the 1% to 3% of the sample who are ‘gay’. Accuracy here is just the percentage of correct calls, and every error this trivial algorithm makes is a “false negative” (a ‘gay’ man labelled ‘straight’).
When the test instead correctly identifies 90% of each group (close to the quoted 91%), it gives back “false positive” results as well as “false negatives”. Consider a hypothetical sample of 1,000 men; allow for ten of them to be ‘gay’ and 990 of them to be ‘straight’. Nine of the ten ‘gay’ men would be correctly identified; 891 of the 990 ‘straight’ men would be correctly identified. That leaves 100 individuals misidentified: one false negative, and 99 false positives. In my hypothetical sample, the algorithm labels 108 individuals ‘gay’ when only 9 actually are; that is, more than nine times out of ten, a ‘gay’ label from my hypothetical test would be wrong. Dr. Cox’s statement applies to his algorithm and samples, which are more complex than my simplified example.
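The arithmetic above can be checked with a few lines of code. This is just a sketch of my simplified example (90% per-group rates on a 1,000-man sample), not the study’s actual method; the rate and sample-size numbers are the hypothetical ones from this thread.

```python
# Hypothetical sample: 1,000 men, 10 'gay', 990 'straight'.
population = 1000
gay = 10
straight = population - gay          # 990

# Assume the test is right 90% of the time on each group.
sensitivity = 0.90                   # fraction of 'gay' men correctly flagged
specificity = 0.90                   # fraction of 'straight' men correctly cleared

true_pos  = round(gay * sensitivity)        # 9 'gay' men correctly flagged
false_neg = gay - true_pos                  # 1 'gay' man missed
true_neg  = round(straight * specificity)   # 891 'straight' men correctly cleared
false_pos = straight - true_neg             # 99 'straight' men wrongly flagged

accuracy  = (true_pos + true_neg) / population   # 0.90 overall
flagged   = true_pos + false_pos                 # 108 men labelled 'gay'
precision = true_pos / flagged                   # only ~8% of flagged men are 'gay'

# Baseline: declare everyone 'straight' -- 99% accurate on this sample,
# which is why raw accuracy is so misleading with a rare class.
baseline_accuracy = straight / population        # 0.99

print(accuracy, flagged, round(precision, 3), baseline_accuracy)
```

The striking part is that the do-nothing baseline (99%) beats the 90% test on raw accuracy, while the test’s positive calls are wrong over 90% of the time; that is the base-rate effect in one picture.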
Thanks, that makes sense.
I assumed that when they said 91% accurate, that figure took false positives and false negatives into account. If it doesn’t, then it isn’t really 91% accuracy.