Eugene Volokh has deftly isolated the major flaw in the recent L.A. Times article on DNA, cold cases, and statistics.
In my original post I quoted the language from the article that most disturbed me:
At Puckett’s trial earlier this year, the prosecutor told the jury that the chance of such a coincidence was 1 in 1.1 million.
Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.
In Puckett’s case, it was 1 in 3.
. . . .
In every cold hit case, the panels advised, police and prosecutors should multiply the Random Match Probability (1 in 1.1 million in Puckett’s case) by the number of profiles in the database (338,000). That’s the same as dividing 1.1 million by 338,000.
For Puckett, the result was dramatic: a 1-in-3 chance that the search would link an innocent person to the crime.
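For concreteness, the arithmetic behind that figure is easy to reproduce. Here is a quick sketch in Python (the numbers are the article’s; the variable names are mine):

```python
# Numbers reported in the L.A. Times article.
random_match_probability = 1 / 1_100_000   # chance an unrelated person's profile matches
database_size = 338_000                    # profiles searched in the cold-hit run

# The multiplication the panels recommended:
result = random_match_probability * database_size
print(result)   # ~0.307 -- the "1 in 3" the article reports
```

Note what that product actually is: the expected number of coincidental matches in a database of people unrelated to the crime. Whether it can fairly be described as “the probability that the database search had hit upon an innocent person” is exactly what is in dispute.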
In my original post I said:
It seems to me that the conclusion does not logically follow at all. The formulation simply can’t be right. The suggestion appears to be that the larger the database, the greater the chance that the hit you receive will be to an innocent person. I think that the larger the database, the greater the probability of getting a hit. Then, once you have the hit, the question becomes: how likely is it that the hit is just a coincidence?
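My point about database size is easy to check numerically. Here is a minimal sketch (my own illustration, reusing the article’s random match probability) showing that the chance of getting at least one coincidental hit grows with the size of the database searched:

```python
# Chance of at least one coincidental hit as the database grows.
RMP = 1 / 1_100_000                        # random match probability from the article

for db_size in (50_000, 338_000, 1_000_000, 5_000_000):
    p_any_hit = 1 - (1 - RMP) ** db_size   # P(at least one innocent profile matches)
    print(f"{db_size:>9,} profiles: P(coincidental hit) = {p_any_hit:.3f}")
```

The separate question (given a hit, how likely is it to be a coincidence?) depends on information that the multiplication never uses; I return to it after Volokh’s excerpt below.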
Volokh lays bare the absurdity of the L.A. Times’s formulation with an excellent example:
Here’s one way of seeing this: Let’s say that the prosecution comes up with a vast amount of other evidence against Puckett — he admitted the crime in a letter to a friend; items left at the murder site are eventually tied to him; and more. He would still, though, have been found through a search of a 338,000-item DNA database, looking for a DNA profile that is possessed by 1/1,100,000 of the population — and under the article’s assertion, “the probability that the database search had hit upon an innocent person” would still have been “1 in 3.”
Despite all the other evidence that the police would have found, and even if the prosecutors didn’t introduce the DNA evidence, there would be, under the article’s description, a 1/3 chance that the search had hit upon an innocent person (Puckett), and thus a 1/3 chance that Puckett was innocent, presumably more than enough for an acquittal. That can’t, of course, be right. But that just reflects the fact that 1/3 is not “the probability that the database search had hit upon an innocent person.” It’s the probability that a search would have come up with someone innocent if the rapist wasn’t in the database.
I think that’s exactly it. I believe the reason is that the 1-in-3 figure is computed on the assumption that the guilty man is not in the database; once the database may actually contain the perpetrator, that pure probability of finding an innocent person no longer describes what the search tells us.
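To see how, here is a back-of-the-envelope Bayesian sketch (my own, with hypothetical priors; none of this comes from the article or from Volokh’s post). It confirms Volokh’s reading of the 1-in-3 figure, and it shows that the probability that the matched man is innocent turns on the prior chance that the true perpetrator is in the database at all, a number the Times’s formula never touches:

```python
import math

RMP = 1 / 1_100_000    # random match probability (from the article)
DB_SIZE = 338_000      # database size (from the article)

lam = RMP * DB_SIZE    # expected coincidental hits, ~0.307 (the Times's "1 in 3")

# Volokh's reading: the chance a search turns up somebody even though the
# perpetrator is NOT in the database.
p_hit_given_perp_absent = 1 - (1 - RMP) ** DB_SIZE
print(f"P(some hit | perp not in DB) = {p_hit_given_perp_absent:.3f}")   # ~0.26

# Given exactly ONE hit, how likely is it a coincidence? That depends on the
# prior chance the perpetrator is in the database. The priors below are
# hypothetical; a Poisson approximation is used for the coincidental matches.
for prior in (0.1, 0.5, 0.9):
    p_one_coincidence = (1 - prior) * lam * math.exp(-lam)  # perp absent, one false match
    p_one_true_match = prior * math.exp(-lam)               # perp present, no false matches
    p_innocent = p_one_coincidence / (p_one_coincidence + p_one_true_match)
    print(f"prior P(perp in DB) = {prior:.0%}: P(hit is innocent | one hit) = {p_innocent:.2f}")
```

With a 90 percent prior, the lone hit is almost certainly the perpetrator; with a 10 percent prior, it is probably a coincidence. No single “1 in 3” can stand in for that range, which is why the Times’s number cannot be “the probability that the database search had hit upon an innocent person.”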
I think Eugene has identified an actual error in the piece, not just a matter that’s open to debate. I doubt the Times will ever correct it; the paper has a history of declining to correct errors when the explanation of the error is long and difficult, even when the error is unquestionable. Still, when I have more time, I’ll follow up on this.
Read Volokh’s entire post, which has other illuminating insights, here. Previous posts on this subject here, here, and here.