I have noticed a third error in the L.A. Times article on DNA, statistics, and cold hits. The article said:
Typically, prosecutors rely on FBI statistics to estimate the rarity of a particular DNA profile in the general population. This calculation is known as the Random Match Probability.
The chance that two unrelated people will share the same 13 markers can be as remote as 1 in a quadrillion — a number with 15 zeros. Because the match in Puckett’s case involved only 5 1/2 genetic locations, the chance it was coincidental was higher but still remote: 1 in 1.1 million.
Even setting aside the issue of database searches, it is incorrect to say that “the chance it [the match in Puckett’s case] was coincidental was . . . 1 in 1.1 million.” The statement would be wrong even if no database had ever been used. It is an example of a statistical fallacy known as the “Prosecutor’s Fallacy.” (For detailed discussions of the Prosecutor’s Fallacy and ways that numbers can be misrepresented by English formulations, read here, here, and here.)
It’s important to emphasize that my complaint has nothing to do with the fact that a database search was used. It is true that the article goes on to explain that Puckett’s attorney thought this number misrepresented the probabilities because the match was made through a database. But that is irrelevant to the particular complaint I am making here about the Prosecutor’s Fallacy. Even if the match had not been made through a database, it would be incorrect to say that 1 in 1.1 million represents the chance that the match to Puckett was a coincidence. (I predict that this paragraph of my post will be the one most widely ignored by the comments this post is likely to generate.)
This is a hard concept to explain, and it has much less to do with the actual math than the way that the math is expressed in English.
A simple example makes the point. Let’s say the odds of winning the lottery are 1 in 100 million. That makes it close to certain that you won’t win if you buy only one ticket.
Now assume you bought one ticket and you won. Your jealous friend says that the chance of your numbers matching being a coincidence was 1 in 100 million. By phrasing the probabilities this way, your friend is saying that it was close to certain that you would win. When you say “the chance that this event resulted from a coincidence is 1 in 100 million,” you’re saying the event was almost certain to happen.
Your friend’s statement is very similar to the way the L.A. Times article phrased the probabilities, and it is an example of the Prosecutor’s Fallacy. By using the formulation “the chance of this match being a coincidence is 1 in 100 million,” the speaker is taking extremely low odds and making them sound extremely high. What your friend should have said, and what he meant to say, was this: “the chance you’d win was 1 in 100 million.”
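For readers who like to see the arithmetic, here is a rough sketch of the difference between the two statements. It is my own illustration, not anything from the article. The function name and the idea of a "rigged" draw are hypothetical: I am assuming that the only alternative to a lucky win is a rigged draw, and that a rigged draw always produces your numbers.

```python
def chance_win_was_luck(p_win_by_luck, p_rigged):
    """Probability that a win was pure luck, GIVEN that you won (Bayes' rule).

    Hypothetical assumptions: the draw is either honest or rigged in your
    favor, and a rigged draw always wins.
    """
    honest_and_won = (1.0 - p_rigged) * p_win_by_luck
    rigged_and_won = p_rigged * 1.0
    return honest_and_won / (honest_and_won + rigged_and_won)


# The chance you'd win with one ticket: 1 in 100 million.
P_WIN = 1 / 100_000_000

# If rigging is impossible, a win is certainly a coincidence.
print(chance_win_was_luck(P_WIN, p_rigged=0.0))  # prints 1.0

# Even with a tiny (made-up) chance of rigging, the answer is roughly 0.99,
# nowhere near 1 in 100 million.
print(chance_win_was_luck(P_WIN, p_rigged=1e-10))
```

Under these assumptions, "the chance you'd win" is 1 in 100 million, while "the chance your win was a coincidence" is close to 100 percent. The two phrases describe different numbers.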
Now: you want a real coincidence? This very distinction recently cropped up in the news, and the L.A. Times covered it and temporarily showed some hint of understanding it. In an article about a Ninth Circuit decision regarding prosecutorial error in characterizing the meaning of DNA results, the L.A. Times discussed the distinction in this way:
The error stemmed from the prosecution expert wrongly conflating two very different mathematical probabilities: The probability that the crime scene evidence matched a person selected at random from the population and the probability that the defendant was guilty.
This formulation is confusing because of the use of the word “guilty” as an imprecise shorthand for the phrase “the person who donated the crime scene DNA.” Still, this shows that the reporters — one of whom, Jason Felch, co-wrote last Sunday’s article on the Puckett case — understand the distinction, or at least show the capacity to understand it.
The original article from last Sunday, about the Puckett case, mis-expresses the concept at another point, but blames it on the prosecution:
Puckett insisted he was innocent, saying that although DNA at the crime scene happened to match his, it belonged to someone else.
At Puckett’s trial earlier this year, the prosecutor told the jury that the chance of such a coincidence was 1 in 1.1 million.
Did he really? Or did the reporters mischaracterize what the prosecutor told the jury? I don’t know. If the prosecutor actually expressed the odds that way, he raised an appeal point for the defense based on the Prosecutor’s Fallacy. But I’m not convinced the error wasn’t the reporters’, given that they gave the same misleading description elsewhere in the article.
(I should note that Eugene Volokh caught this iteration of the error in his previous post on this topic. But because the error is attributed to the prosecutor in this quote, I didn’t notice that the reporters themselves had made the exact same error elsewhere in the article.)
I do know this: the paper has continued to describe random match probability in this misleading way. In an article published yesterday — after the publication of the article describing the case about the Prosecutor’s Fallacy — the deck headline reads:
A long-time scientific controversy centers on how to calculate the probability that such a match would be the result of coincidence.
This is wrong. Granted, it’s hard to express the concept in a headline. But this formulation presumes that you have a match, and you’re talking about the probability that it is the result of a coincidence. Using our lottery example, it would be like saying: “the controversy centers on how to calculate the probability that a particular person having won would be the result of coincidence.” That is not what random match probability addresses. It addresses instead the frequency with which a particular profile appears in a population of unrelated individuals. The database adjustment in question addresses the probability that, if a database is composed of individuals unrelated to the person who donated the crime scene DNA, a database search will nevertheless result in a match to the crime scene DNA.
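To make the database question concrete, here is a minimal sketch of the calculation described above. It assumes every profile in the database is unrelated to the person who donated the crime scene DNA, and that each profile matches independently with the random match probability. The function name is mine, and the database size is a made-up figure for illustration only.

```python
def chance_of_any_database_match(random_match_probability, database_size):
    """Probability that a search of a database of unrelated people
    produces at least one match to the crime scene DNA.

    Assumes each profile matches independently with the same probability.
    """
    chance_no_profile_matches = (1.0 - random_match_probability) ** database_size
    return 1.0 - chance_no_profile_matches


RMP = 1 / 1_100_000            # the figure quoted in the article
HYPOTHETICAL_DB_SIZE = 300_000  # illustrative only, not from the article

# With these inputs the result is a bit under 1 in 4.
print(chance_of_any_database_match(RMP, HYPOTHETICAL_DB_SIZE))
```

Note what this number answers: how likely a search is to turn up some match among people who did not leave the DNA. It still does not say how likely it is that a particular match is a coincidence.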
Again, saying “you had a 1 in 100 million chance of winning” is not the same as saying “the chance your victory resulted from a coincidence is 1 in 100 million.”
This is not trivial. It is important, because people need to understand that the random match probability is not the chance that the defendant is innocent, or (put another way) that it is a coincidence that he is sitting in the defendant’s chair.
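Here is a rough sketch of why the two numbers differ. It is an illustration under assumptions I am supplying, not a model of the Puckett case: I assume a pool of other people who could have left the DNA, each unrelated to the defendant and each as likely a source as he is before the DNA is considered, and I assume the true source always matches. The function name and pool sizes are hypothetical.

```python
def chance_defendant_is_not_source(random_match_probability, other_possible_sources):
    """Probability the defendant did NOT leave the DNA, given that he matches.

    Hypothetical assumptions: everyone in the pool (defendant included) is
    equally likely to be the source beforehand, the others are unrelated to
    him, and the true source always matches.
    """
    # Expected number of other people who would match by coincidence.
    expected_coincidental_matches = random_match_probability * other_possible_sources
    # Bayes' rule with equal priors: coincidental matches vs. the one true match.
    return expected_coincidental_matches / (1.0 + expected_coincidental_matches)


RMP = 1 / 1_100_000  # the random match probability quoted in the article

# With 1.1 million other possible sources, the answer is about 0.5.
print(chance_defendant_is_not_source(RMP, 1_100_000))

# With only 10 other possible sources, it is tiny.
print(chance_defendant_is_not_source(RMP, 10))
```

The random match probability is the same in both calls. The chance that the match is a coincidence is not, because it depends on how many other people could have been the source.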
It is important to be accurate about these concepts, and the L.A. Times — probably in an admirable effort to simplify them — keeps mucking them up. Which is, coincidentally (?!), the same thing they accuse the courts of doing.
Now, I feel for the reporters. In writing my posts critical of the article, I have myself at times used formulations that are either unclear or do not precisely represent the statistics involved. I have had to write more than one update clarifying my position or removing language I feared might be inaccurate. (God help me, some commenter might even make me do so with this post!) Expressing these concepts in clear, precise, and accurate English is, as I said yesterday, like walking a tightrope.
To me, this shows how easily people can be misled by these concepts, which is why a major newspaper with a Sunday circulation of over 1 million needs to be extra careful when writing about them on the front page of its Sunday edition.
And it illustrates the point that when you make these mistakes, it’s important to admit them. I’ll be writing to the paper about this error as well (I already wrote to them about the first two errors two days ago, in an e-mail I reproduced here). I hope they get around to correcting the errors prominently, given the prominence of the original article.