Patterico's Pontifications

5/7/2008

Author of DNA Article Responds to Accusation That the Other Side Was Not Fairly Represented

Filed under: Crime,Dog Trainer — Patterico @ 6:31 am

One of my complaints about the recent L.A. Times article on DNA, cold hits, and statistics is that I believe it inadequately portrayed the extent of disagreement regarding the need for the statistical adjustment discussed in the article (multiplying a random match probability like 1 in 1.1 million by the size of a database like 338,000). I’ll once again show you the image of the front page to remind you how strongly the paper portrayed the adjustment as the product of a wide consensus among leading experts:
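The adjustment the article describes can be sketched in a few lines of Python (my own illustration of the arithmetic; the variable names are mine, and the article itself contains no code):

```python
# The adjustment discussed in the article: multiply the random match
# probability (RMP) by the number of profiles searched to estimate the
# expected number of coincidental ("cold") hits.
rmp = 1 / 1_100_000      # random match probability: 1 in 1.1 million
db_size = 338_000        # number of profiles in the database

expected_coincidental_matches = rmp * db_size
print(round(expected_coincidental_matches, 2))  # 0.31 -- roughly "1 in 3"
```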

[Image: dna-on-front-page.JPG (the L.A. Times front page)]

Yet, as I have previously noted, the L.A. Times‘s own expert mathematician Keith Devlin of Stanford says that “the relevant scientific community (in this case statisticians) have not yet reached consensus on how best to compute the reliability metric for a cold hit.” And Prof. David Kaye, who was on the 1996 committee that recommended the adjustment, actually told me that the contrary view is more widely accepted:

[The L.A. Times‘s] description portrays one approach to the issue as if it is the consensus in the scientific literature. It is not. There is disagreement about the need to adjust a random-match probability. Furthermore, if one counts the number of peer-reviewed articles on the subject, the dominant view is that adjustment is not necessary.

Jason Felch, one of the authors of the L.A. Times article, responded to this portion of my complaint, and authorized me to quote him:

That brings us to your second point: that we did not portray the full scientific debate in the article. You are right in saying that a debate persists among statisticians (as it does in most complex scientific questions.) The 1996 National Research Council and, after its dissolution, the FBI’s DNA Advisory Board carefully weighed the arguments of the various statistical camps — the Bayesians and frequentists, but also those who favor likelihood ratios or the first NRC’s approach, which defense attorneys are arguing for now and is more conservative than the NRCII’s adjustment. Both the NRC and the DAB concluded the RMPxDATABASE approach was best for cold hit cases. In the forensic field, these two bodies are the source of authority on questions of science — the NRCII is referred to as the “bible” of forensic DNA. But their recommendations are not being followed. This is the point we make in the article, while acknowledging there is not unanimity of opinion.

For the courts, the question is: is there enough of a consensus on this issue that a generally accepted practice has emerged? If the answer is no, the law (Kelly-Frye here in California, Daubert in other states) holds that the evidence should not be presented in courts. So there’s a lot at stake in the question. Not surprisingly, many in the field argue that the issue is not a lack of consensus, but a debate over which of several accurate scientific approaches is more appropriate. So far, the courts have agreed. This is what the California Supreme Court will weigh. We are likely to explore some of these complexities in our upcoming coverage of that case.

I’m not convinced that the paper “acknowledg[ed] there is not unanimity of opinion” in a way that was meaningful to readers. The article never even mentioned the Donnelly/Balding approach that Prof. Kaye says constitutes the majority opinion of peer-reviewed articles. Readers were told only in passing, deep in the article, that the adjustment discussed in the article “has been widely but not universally embraced by scientists.” As for how the article portrayed general scientific acceptance of the adjustment, I refer you once again to the image of the front page above.

But while I might disagree with Mr. Felch, I thank him for his response.

P.S. I am working on a proposed e-mail to Mr. Felch that questions the article’s assertion that there was a “1 in 3” chance that “the database search had hit upon an innocent person” in selecting Puckett.

4 Responses to “Author of DNA Article Responds to Accusation That the Other Side Was Not Fairly Represented”

  1. I laid out the math in my comments #39 and #41 in your most recent thread.

    The short version is: the probability that it’s a good match, as opposed to an innocent hit, when only a single hit is returned, depends on how likely the DB is to contain the perpetrator’s DNA.

    If the DB is 100% likely to contain his DNA, the chance of an innocent match is 0. If he’s in there, and there’s only 1 result, it’s going to be him!

    If the DB is 0% likely to contain his DNA, the chance of an innocent match is 100%. If he’s not in the DB, it can’t spit him out as an answer!

    If the DB has a probability P of containing his DNA, and 1-in-1.1M people will have matching DNA by chance, and the DB is of size 338k, then the chance that a single returned hit is a guilty match is:

    73.54 x P / ( (73.54 x P) + 22.60 x (1-P) )

    (Here 73.54% is the chance that none of the 338k profiles matches by coincidence, and 22.60% is the chance that exactly one does.)
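This formula is just Bayes’ theorem. A minimal Python sketch (my own reconstruction of the commenter’s math; the constants fall out of the binomial calculation below, and all names are mine):

```python
# Given a single hit from the database, how likely is it the true source?
# 0.7354 = chance that NO innocent profile matches by coincidence:
#          (1 - 1/1.1e6) ** 338_000
# 0.2260 = chance that EXACTLY ONE innocent profile matches:
#          338_000 * (1/1.1e6) * (1 - 1/1.1e6) ** 337_999
rmp = 1 / 1_100_000
n = 338_000

p_no_innocent_hit = (1 - rmp) ** n                   # ~0.7354
p_one_innocent_hit = n * rmp * (1 - rmp) ** (n - 1)  # ~0.2260

def p_guilty_given_single_hit(p_in_db):
    """Posterior probability the lone hit is the perpetrator, given a
    prior probability p_in_db that the perpetrator's DNA is in the DB."""
    guilty = p_no_innocent_hit * p_in_db
    innocent = p_one_innocent_hit * (1 - p_in_db)
    return guilty / (guilty + innocent)

print(round(p_guilty_given_single_hit(0.01), 3))  # 0.032 -- about a 3% chance of guilt
print(round(p_guilty_given_single_hit(0.50), 3))  # 0.765 -- roughly a 1-in-4 chance of innocence
```

As the prior P approaches 1, a lone hit is nearly certain to be the perpetrator; as P approaches 0, a lone hit is almost certainly coincidental, which is the commenter’s point.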

    If there’s only a 1% chance the DB would have his DNA in it, then a single match returned carries only about a 3% chance the “match” is guilty.

    At 50/50 odds, that’s a 1-in-4 chance of innocence.

    At 70/30 odds, that’s roughly a 1-in-9 chance of innocence.

    Assuming 50/50 odds, which is what the LAT does, is wrong. You can’t make the assumption that there’s a 50% chance every defendant is going to be in the DB. That actually undercounts the likelihood of an innocent match in many cases, because for certain types of crimes, the DNA DB is unlikely to contain the perp! The LAT: biased in favor of the police!

    So you should ask the question: based on the circumstances of the crime, what is the likelihood that the perp is in the DB? That means looking to the type of crime committed (a stranger-rape-murder 3 decades ago) and the type of DB (a sex offender DNA registry). The police could probably figure out these statistics if they wanted to–they can probably tell you how many 3-decades-old stranger-rape-murder perpetrators they believe are in the DB vs. how many they believe remain at large (or have since died without being caught).*

    But the police aren’t going to bother with that. They want to use their investigation to get to the bottom of innocence/guilt and not rely on statistics. It would be unfair to criminal defendants if the sole evidence against them was a DNA match that could belong to about 30 California men, and a police expert testifying that, based on the circumstances of the crime, there is probably a 90% chance that the perp is in the DB, so therefore the suspect is guilty. This testimony would be confusing and probably prejudicial to the defendant. It’s really only of use to journalists and to people sizing up our criminal justice system as a whole (law profs, think tanks, etc.).

    * if the police come back with a: we think there’s a 10% chance 50% of them are in the DB, and a 20% chance 70% of them are in the DB, and a 30% chance 80% are in the DB, and a 40% chance 90% of them are in the DB–if that’s what they come back with–then you’re on your own, because I’m all mathed out.

    Daryl Herbert (4ecd4c)

  2. I have a question as well as a comment.
    When a cold hit is made and that person is apprehended, how often is a new DNA test done on the identified person to confirm that there was no error in the database? I would hope that a retest is routine; in fact, I would hope it is required, preferably by law.

    It seems to me that the question is: how reliable is the match between the crime scene DNA and the DNA of the individual identified? This has nothing to do with the size of the database. The size of the database has NO influence on the likelihood that the match is in error. People seem to be confusing the database size with the contents of the database, which are wholly independent. No matter how big the database gets, none of the data is changed by the increase in database size. The only database that is relevant is the one consisting of all the people in the world, including all those who have died since a given crime was committed. This is the only database that counts and the only one controlling the reliability of the identification.

    When a search of a database is made, what is searched for is: is there a match between the crime scene DNA and any of the DNA samples in the database? The question of reliability is the chance of error when comparing two samples of DNA, not where or how the non-crime-scene DNA was obtained.

    Paul Guantonio (87132e)

  3. Paul, I don’t think you are following the discussion. The issue arose because the kind of comparison that could be done was a limited one, using a limited number of loci. This changed the statistics of the comparison.

    SPQR (26be8b)

  4. As I’ve written before: if the set of loci in the sample DNA has 1.1M variations, then there is about a 1/3 chance of finding a match by coincidence in a set of 338K profiles. The odds against a particular profile matching are not impressive when the number of profiles is that large.

    One would be very surprised to be dealt a royal flush: the chance of a royal flush is 1 in 649,740. But not if one were dealt 200,000 hands.

    The DNA match in this case was a good lead and supportive evidence, but not close to proof. The LAT is basically right – there is roughly a 1/3 chance of this search turning up a false positive. It’s 1.1M to one against this particular man matching by chance, but only about 3 to 1 against someone in the database matching.
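Both numbers in this comment can be checked with a short Python sketch (mine, not the commenter’s; note that the chance of at least one coincidental hit comes out nearer 1 in 4, while the expected number of hits, 0.31, is what rounds to the article’s “1 in 3”):

```python
from math import comb

# Chance of at least one coincidental match when searching 338,000 profiles
# with a 1-in-1.1-million random match probability.
rmp = 1 / 1_100_000
n = 338_000
p_at_least_one = 1 - (1 - rmp) ** n
print(round(p_at_least_one, 2))  # 0.26 -- about 1 in 4

# The royal flush analogy: 4 royal flushes out of C(52, 5) possible hands.
p_royal = 4 / comb(52, 5)
print(round(1 / p_royal))  # 649740
# Chance of at least one royal flush in 200,000 independent hands:
print(round(1 - (1 - p_royal) ** 200_000, 2))  # 0.26 -- quite plausible
```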

    Rich Rostrom (7c21fc)

