Patterico’s Pontifications

5/8/2008

The L.A. Times’s Errors in Its Piece on DNA and Cold Hits

Filed under: Crime, Dog Trainer, General — Patterico @ 11:10 pm

I have sent the following e-mail to the authors of that L.A. Times piece on DNA and cold hits:

Mr. Felch and Ms. Dolan,

I believe your recent front-page article on DNA cold case statistics misstated the meaning of the math you discuss.

Your article said:

Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett’s case, it was 1 in 3.

The 1-in-3 number does not pertain to the probability that the database search had hit upon an innocent person. Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.

If we ignore the existence of independent evidence of Puckett’s guilt, the statistical chance Puckett is innocent depends in part on the probability that the database contains the guilty party. Your article gives no information on what this probability is (although the fact that the database consists of California-based felons suggests that the chances are better than one would find in a purely random database). Without knowing the probability that the database contains the guilty party, you can’t conclude that the 1-in-3 figure accurately represents the chances Puckett is innocent. Your article confuses two distinct concepts and requires correction.

You state:

In every cold hit case, the panels advised, police and prosecutors should multiply the Random Match Probability (1 in 1.1 million in Puckett’s case) by the number of profiles in the database (338,000). That’s the same as dividing 1.1 million by 338,000.

Actually, you have that upside down. Multiplying (1 in 1.1 million) by 338,000 is the same as dividing 338,000 by 1.1 million — not dividing 1.1 million by 338,000.
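The arithmetic is easy to check. Here is a minimal Python sketch that uses only the two figures quoted from the article; the variable names are made up for illustration.

```python
# Figures quoted from the article.
random_match_probability = 1 / 1_100_000   # "1 in 1.1 million"
database_size = 338_000                     # profiles in the database

# The recommended adjustment: random match probability times database size.
adjusted = random_match_probability * database_size

# Multiplying (1 in 1.1 million) by 338,000 is 338,000 divided by 1.1 million.
same_thing = database_size / 1_100_000

# What the article described instead: 1.1 million divided by 338,000.
upside_down = 1_100_000 / database_size

print(f"RMP x database size   = {adjusted:.4f}")     # about 0.31, roughly 1 in 3
print(f"338,000 / 1.1 million = {same_thing:.4f}")   # the same number
print(f"1.1 million / 338,000 = {upside_down:.4f}")  # about 3.25
```

The upside-down quotient, about 3.25, is where the "3" in "1 in 3" comes from. It is the reciprocal of the adjusted probability, not the probability itself.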

Your article continues:

For Puckett, the result was dramatic: a 1-in-3 chance that the search would link an innocent person to the crime.

Again, this is wrong. There is a 1-in-3 chance that the search would link someone to the crime. Whether that person is innocent or not depends on the likelihood that the database contains the guilty party (as well as the quality of other evidence tying that defendant to the crime).

I am not the only person saying this. A similar point was made by Eugene Volokh in this post. And I made the point in more detail in this blog post of mine.

I think the paper owes readers at least two corrections — one of the 1-in-3 statistic, and one on the upside-down division. Given the prominence of the error on the 1-in-3 statistic, which appeared on the front page of the Sunday paper, I hope your paper will make an effort to give this correction the prominence it deserves.

cc: Readers’ Representative

I’ll let you know what I hear in response.

P.S. When I say “Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.” I meant to express this concept: “Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match, period. If we get a single match, we won’t know whether it was to an innocent person or a guilty person without learning more.” In other words, without prior knowledge of the likelihood that the database has the guilty person, all we know is the chance of a hit — not the chance that a single hit has come back to an innocent person.

P.P.S. I just changed the last phrase from “not the chance of a hit to an innocent person” to “not the chance that a single hit has come back to an innocent person.” That more accurately expresses what I was trying to say.

Expressing statistical concepts in accurate English is like walking a tightrope.
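One way to see why the prior matters is a toy Bayes calculation. The sketch below is an illustration under stated assumptions, not anything taken from the article or the NRC reports: it assumes the true source always matches, every other profile matches independently with the random match probability, there is no lab error, and the search returned exactly one hit. The prior values are hypothetical, since the article gives no figure for the chance that the database contains the guilty party.

```python
def prob_single_hit_is_innocent(prior_source_in_db, rmp, db_size):
    """Chance that a lone database hit is NOT the true source (toy model)."""
    # Exactly one hit when the source is in the database:
    # the source matches and the other db_size - 1 profiles do not.
    like_in = (1 - rmp) ** (db_size - 1)
    # Exactly one hit when the source is absent:
    # one of db_size innocent profiles matches by coincidence.
    like_out = db_size * rmp * (1 - rmp) ** (db_size - 1)
    numerator = (1 - prior_source_in_db) * like_out
    denominator = prior_source_in_db * like_in + numerator
    return numerator / denominator


RMP = 1 / 1_100_000   # figure from the article
DB_SIZE = 338_000     # figure from the article

# Hypothetical priors that the database contains the true source.
for prior in (0.0, 0.1, 0.5, 0.9, 1.0):
    p = prob_single_hit_is_innocent(prior, RMP, DB_SIZE)
    print(f"prior = {prior:.1f} -> P(single hit is innocent) = {p:.3f}")
```

In this toy model the answer runs from 1 all the way down to 0 as the prior moves from 0 to 1. That is the point of the P.S.: the 1-in-3 figure by itself cannot tell you the chance that a single hit has come back to an innocent person.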

5/7/2008

Is That a Growing Consensus, Or Are You Just Unhappy to See Me Still in the Race?

Filed under: 2008 Election, Dog Trainer — Patterico @ 9:23 pm

An article in the L.A. Times has this amusing lede:

Hillary Rodham Clinton may be short on delegates, money and time, but she faced an even more ominous and intractable impediment Wednesday: a growing consensus in the media that her bid for the Democratic presidential nomination is doomed.

Wow. A “growing consensus in the media.”

Wow.

You can’t fight City Hall, and you sure can’t fight a “growing consensus in the media.”

Except when that growing consensus is wrong, of course. Which happens an awful lot.

My Proposed E-Mail to the Authors of the L.A. Times Piece on DNA and Cold Hits

Filed under: Crime, Dog Trainer, General — Patterico @ 6:57 am

It might seem a little odd for me to vet an e-mail I am planning to send by publishing a draft of it on a public website that receives thousands of hits every day. But hey, odd is fun! And so I invite you to read this draft (as yet unsent) of a letter to the authors of the recent L.A. Times article on DNA, cold hits, and statistics.

I’d like readers to review it before I send it because I am not a statistics expert, and although I consulted with more than one during the process of drafting this, I want to make sure I have made no mathematical or logical misstatements.

Here it is:

Mr. Felch and Ms. Dolan,

After discussions with numerous people with statistical expertise, I am reasonably confident (that is, as confident as a layman like myself can be) that your recent front-page article on DNA cold case statistics gravely misstated the meaning of the math you discuss.

Your article said:

Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett’s case, it was 1 in 3.

I don’t believe the math in question supports the statement that there was a “1 in 3” chance that “the database search had hit upon an innocent person” in selecting Puckett.

The starting point for my analysis was this post by Eugene Volokh, a UCLA law professor and blogger. Prof. Volokh agrees with me that your formulation is wrong. He supports his position with clear reasoning and examples; I commend his post to you. My e-mail to you (which I am blogging on my site) merely expands on Prof. Volokh’s argument as it relates to the article.

(To keep the discussion simple, I will assume there are no issues relating to data corruption or human error. I’ll also stick with the numbers used in your article: a random match probability of 1 in 1.1 million, and a database of 338,000.)

The logic behind the database adjustment was expressed in a report from the National Research Council as follows:

Recommendation 5.1 proposes multiplying the random-match probability (P) by the number of people in the database (N). If the person who left the evidence DNA was not in the database of felons, then the probability that at least one of the profiles in the database would also match the incriminating profile cannot exceed NP.

The clear working assumption here is that the database consists of “innocent” people: nobody in the database left the evidence DNA.

This makes sense, at least in a hypothetical case where the jury is informed that the authorities came to suspect the defendant because of a database hit. There is a certain “what are the chances?!” quality of DNA evidence that presumes the defendant was under suspicion before the DNA comparison was done. In other words, if the defendant is before the jury because of a database hit, and the jury knows it, the jury may be “wowed” by the fact of the hit. But the impact of this “wow” factor is likely lessened considerably if the jury is told that, in a hypothetical search of a database of completely innocent people, there is a 1/3 chance of a hit.

Thus, it seems clear to me that the idea of the adjustment is to communicate to the jury the likelihood of a false positive, based on the assumption that the true donor of the incriminating profile is not in the database.

My understanding is bolstered by an e-mail I received from Prof. David Kaye, who served on the 1996 NRC committee that recommended the adjustment. In that e-mail, Prof. Kaye stated:

[T]he statisticians who favor an adjustment to the random-match probability are considering [the question:] What is the chance that a search of a database will turn up exactly one match when the source of the crime-scene DNA is someone who is unrelated to everyone in the database?

He restated the question in this way:

What is the chance that a database composed entirely of innocent people (with respect to [the] crime being investigated) will show a match?

Note that the fundamental assumption of the hypothetical is that everyone in the database is innocent. Then, and only then, can one use the adjusted figure recommended by the committees as a (very rough) approximation of the chances of a false positive.
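To see how rough the approximation is, here is a short Python sketch. It assumes, as the hypothetical does, that everyone in the database is innocent, and it adds the simplifying assumption that each profile matches independently with the random match probability. The function names are hypothetical.

```python
def prob_at_least_one_match(rmp, db_size):
    """P(one or more coincidental matches) among db_size innocent profiles."""
    return 1 - (1 - rmp) ** db_size


def prob_exactly_one_match(rmp, db_size):
    """P(exactly one coincidental match) among db_size innocent profiles."""
    return db_size * rmp * (1 - rmp) ** (db_size - 1)


RMP = 1 / 1_100_000   # random match probability from the article
DB_SIZE = 338_000     # database size from the article

upper_bound = DB_SIZE * RMP                       # the NP figure
at_least_one = prob_at_least_one_match(RMP, DB_SIZE)
exactly_one = prob_exactly_one_match(RMP, DB_SIZE)

print(f"NP (upper bound)      = {upper_bound:.4f}")
print(f"P(at least one match) = {at_least_one:.4f}")
print(f"P(exactly one match)  = {exactly_one:.4f}")
```

Under these assumptions NP comes out a little above the chance of at least one match, which is consistent with the NRC's statement that the probability "cannot exceed NP." All three numbers describe a database with no guilty party in it.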

If, by contrast, you start with the assumption that you don’t know whether the suspect is in the database or not, then the 1/3 number tells you nothing about whether a single hit from the database is a hit to a) the true donor of the incriminating DNA or b) an innocent person who happens to share the same profile (i.e. a “false positive”).

It’s important to keep in mind that what we’re talking about here is the situation where a database search is conducted, and has resulted in only one hit. The question is: what can we say, statistically, about that one hit?

In the case where you don’t know whether the database contains the true donor, or “guilty” person (speaking very loosely), the meaning of a single hit from that database is a function of the likelihood that the true donor is in the database and (given that only one hit was received) the likelihood that nobody else with that profile is in the database.

If you don’t know whether the true donor (or “guilty” person) is in the database or not, the 1/3 number is merely an expression of the likelihood of a hit — any hit. It’s not an expression of the chances that any resultant hit is a hit to an “innocent” person.

Again, I am not a statistics expert, and (perhaps as a result) I don’t know whether it is possible to tell juries anything statistically meaningful about the likelihood that the person in front of them is innocent. (Neither does Prof. Volokh, for what it’s worth.) But I feel fairly confident that the 1/3 number is not an expression of the probability that the person sitting in front of jurors is “innocent.”

Thus, I believe that your article is wrong to say, in the statement quoted above, that “the probability that the database search had hit upon an innocent person” in Puckett’s case was “1 in 3.”

That is simply not so, I believe.

If I’m right, I think The Times needs to correct this misimpression. What’s more, I think any correction should be very prominent, given the extreme prominence of the error (or what I believe to be an error) on the front page of the paper’s Sunday edition.

I hope you will see your way clear to discussing these issues with knowledgeable experts. I also hope that you will issue an appropriate and prominent correction if, after reflection and consultation with experts, you believe I have correctly analyzed the issue.

I look forward to your response.

P.S. I should note that my argument does not address the fact that guilt is not automatic once it is determined that the suspect is the donor of the DNA at the crime scene, just as innocence is not automatic once it is determined that he is not the donor. I assume you are aware of the difference between source attribution and guilt, and left out an explanation of the difference for space reasons.

Nor does my argument address the fact that the 1/3 number is an approximation of an approximation. (Prof. Volokh’s post has more details on the relevant statistics.) I also presume you were aware of this, and believe that the 1/3 number is simply a conservative simplification of the more complex equation that Prof. Volokh sets forth in his post.

My argument has nothing to do with these relatively minor quibbles. One could argue that ignoring them is necessary to keep the issue straightforward and simple. My problem is that, these minor issues aside, the way you have expressed the meaning of the adjusted number is (I believe) so misleading as to be fairly termed an error.

Please let me know what you think. I remain humble on the issue because of my lack of expertise in the field.

Author of DNA Article Responds to Accusation That the Other Side Was Not Fairly Represented

Filed under: Crime, Dog Trainer — Patterico @ 6:31 am

One of my complaints about the recent L.A. Times article on DNA, cold hits, and statistics is that I believe it inadequately portrayed the extent of disagreement regarding the need for the statistical adjustment discussed in the article (multiplying a random match probability like 1 in 1.1 million by the size of a database like 338,000). I’ll once again show you the image of the front page to remind you how strongly the paper portrayed the adjustment as the product of a wide consensus among leading experts:

[Image: dna-on-front-page.JPG]

Yet, as I have previously noted, the L.A. Times’s own expert mathematician Keith Devlin of Stanford says that “the relevant scientific community (in this case statisticians) have not yet reached consensus on how best to compute the reliability metric for a cold hit.” And Prof. David Kaye, who was on the 1996 committee that recommended the adjustment, actually told me that the contrary view is more widely accepted:

[The L.A. Times’s] description portrays one approach to the issue as if it is the consensus in the scientific literature. It is not. There is disagreement about the need to adjust a random-match probability. Furthermore, if one counts the number of peer-reviewed articles on the subject, the dominant view is that adjustment is not necessary.

Jason Felch, one of the authors of the L.A. Times article, responded to this portion of my complaint, and authorized me to quote him:

That brings us to your second point: that we did not portray the full scientific debate in the article. You are right in saying that a debate persists among statisticians (as it does in most complex scientific questions.) The 1996 National Research Council and, after its dissolution, the FBI’s DNA Advisory Board carefully weighed the arguments of the various statistical camps — the Bayesians and frequentists, but also those who favor likelihood ratios or the first NRC’s approach, which defense attorneys are arguing for now and is more conservative than the NRCII’s adjustment. Both NRC and the DAB concluded the RMPxDATABASE approach was best for cold hit cases. In the forensic field, these two bodies are the source of authority on questions of science — The NRCII is referred to as the “bible” of forensic DNA. But their recommendations are not being followed. This is the point we make in the article, while acknowledging there is not unanimity of opinion.

For the courts, the question is: is there enough of a consensus on this issue that a generally accepted practice has emerged? If the answer is no, the law (Kelly-Frye here in California, Daubert in other states) holds that the evidence should not be presented in courts. So there’s a lot at stake in the question. Not surprisingly, many in the field argue that the issue is not a lack of consensus, but a debate among which of several accurate scientific approaches is more appropriate. So far, the courts have agreed. This is what the California Supreme Court will weigh. We are likely to explore some of these complexities in our upcoming coverage of that case.

I’m not convinced that the paper “acknowledg[ed] there is not unanimity of opinion” in a way that was meaningful to readers. The article never even mentioned the entire Donnelly/Balding hypothesis that Prof. Kaye says constitutes the majority opinion of peer-reviewed articles. Readers were told only in passing, deep in the article, that the adjustment discussed in the article “has been widely but not universally embraced by scientists.” As for how the article portrayed general scientific acceptance of the adjustment, I refer you once again to the image of the front page above.

But while I might disagree with Mr. Felch, I thank him for his response.

P.S. I am working on a proposed e-mail to Mr. Felch that questions the article’s assertion that there was a “1 in 3” chance that “the database search had hit upon an innocent person” in selecting Puckett.

5/6/2008

Volokh on DNA and Cold Hits

Filed under: Crime, Dog Trainer, General — Patterico @ 7:05 am

Eugene Volokh has deftly isolated the major flaw in the recent L.A. Times article on DNA, cold cases, and statistics.

In my original post I quoted the language from the article that most disturbed me:

At Puckett’s trial earlier this year, the prosecutor told the jury that the chance of such a coincidence was 1 in 1.1 million.

Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett’s case, it was 1 in 3.

. . . .

In every cold hit case, the panels advised, police and prosecutors should multiply the Random Match Probability (1 in 1.1 million in Puckett’s case) by the number of profiles in the database (338,000). That’s the same as dividing 1.1 million by 338,000.

For Puckett, the result was dramatic: a 1-in-3 chance that the search would link an innocent person to the crime.

In my original post I said:

It seems to me that the conclusion does not logically follow at all. The formulation simply can’t be right. The suggestion appears to be that the larger the database, the greater the chance is that the hit you receive will be a hit to an innocent person. I think that the larger the database, the greater the probability of getting a hit. Then, once you have the hit, the question becomes: how likely is it that the hit is just a coincidence?

Volokh explains the ridiculous nature of the L.A. Times’s formulation with an excellent example:

Here’s one way of seeing this: Let’s say that the prosecution comes up with a vast amount of other evidence against Pickett — he admitted the crime in a letter to a friend; items left at the murder site are eventually tied to him; and more. He would still, though, have been found through a search of a 338,000-item DNA database, looking for a DNA profile that is possessed by 1/1,100,000 of the population — and under the article’s assertion, “the probability that the database search had hit upon an innocent person” would still have been “1 in 3.”

Despite all the other evidence that the police would have found, and even if the prosecutors didn’t introduce the DNA evidence, there would be, under the article’s description, a 1/3 chance that the search had hit upon an innocent person (Pickett), and thus a 1/3 chance that Pickett was innocent, presumably more than enough for an acquittal. That can’t, of course, be right. But that just reflects the fact that 1/3 is not “the probability that the database search had hit upon an innocent person.” It’s the probability that a search would have come up with someone innocent if the rapist wasn’t in the database.

I think that’s exactly it. I believe the reason is that inclusion of a known guilty person in the database corrupts the math involved in pure probabilities of finding an innocent person.

I think Eugene has hit upon an actual error in the piece with this, and not just a matter that’s open to debate. I don’t think they would ever correct it, because they have a history of failing to correct errors if the explanation of the error is long and difficult — even if it’s unquestionably an error. Still, when I have more time, I’ll follow up on this more.

Read Volokh’s entire post, which has other illuminating insights, here. Previous posts on this subject here, here, and here.

5/5/2008

Law Professor: L.A. Times Article on DNA Portrays One View As the Consensus View

Filed under: Crime, Dog Trainer, General — Patterico @ 6:28 am

Prof. David Kaye says on his blog that yesterday’s L.A. Times article on DNA, cold hits, and statistics is not balanced, and portrays one side of a debate as though it is the only valid viewpoint:

[A]n article in the May 3 Los Angeles Times claims to have uncovered a national scandal of sorts. The reporters describe a recent “cold hit” case that they say

is emblematic of a national problem, The Times has found. [¶] Prosecutors and crime labs across the country routinely use numbers that exaggerate the significance of DNA matches in “cold hit” cases, in which a suspect is identified through a database search. [¶] Jurors are often told that the odds of a coincidental match are hundreds of thousands of times more remote than they actually are, according to a review of scientific literature and interviews with leading authorities in the field.

The article maintains that

[I]n cold hit cases, the investigation starts with a DNA match found by searching thousands, or even millions, of genetic profiles in an offender database. Each individual comparison increases the chance of a match to an innocent person. [¶] Nevertheless, police labs and prosecutors almost always calculate the odds as if the suspect had been selected randomly from the general population in a single try. [¶] The problem will only grow as the nation’s criminal DNA databases expand. They already contain 6 million profiles.

This description portrays one approach to the issue as if it is the consensus in the scientific literature. It is not. There is disagreement about the need to adjust a random-match probability. Furthermore, if one counts the number of peer-reviewed articles on the subject, the dominant view is that adjustment is not necessary.

(My emphasis.)

So according to Prof. Kaye, the dominant view according to peer-reviewed articles on the subject is portrayed as the minority view (indeed, I note that the view is hardly discussed, as if nobody takes it seriously).

Prof. Kaye’s post has more excellent insights on the right way to view this controversy. Go here to read it.

Previous posts on this subject here and here.

5/4/2008

Follow-Up on DNA and Cold Hits

Filed under: Crime, Dog Trainer — Patterico @ 6:43 pm

This is a follow-up to this morning’s post on DNA, cold hits, and statistics.

Prof. David Kaye, whom I cited in this morning’s post, has responded to my e-mail and given me permission to quote him.

Thanks for your inquiry. This is a surprisingly subtle statistical question. I have devoted two chapters to it in a forthcoming book to be published by Harvard University Press and have circulated a manuscript on the California cases and the general issue to law reviews. I served on the 1996 NRC committee that recommended adjustment, but I now find it difficult to defend that recommendation. Basically, there are two distinct questions:

Question 1. What is the chance that a database composed entirely of innocent people (with respect to crime being investigated) will show a match? For databases that are small relative to the number of people who could have committed the crime, the NRC adjustment makes sense. The British experience mentioned in article shows that this chance is much larger than the random match probability. But why is this “innocent database” probability important when considering what the evidence of a match to a named individual proves?

Question 2. How much does fact that the defendant identified by a trawl through the database matches – and no one else in the database does — change the odds that he is the source of the DNA at the crime-scene? This is the question that is of interest to a jury trying to weigh the evidence. It is the one that Peter Donnelly and other statisticians have addressed. The answer is that the single match in the database raises the odds even more (but only slightly more) than does testing a single person at random and finding that he matches. As you point out, in the limit of a database that includes every person on earth, the evidence of a single match in the database becomes conclusive. How can the value of the evidence possibly decline as small databases get slightly bigger, then somehow switch direction and get immensely stronger as they get bigger still?

The discussion of the issue in the news and the courts is oversimplified and misleading (but entertaining). The manuscript of the law review article is attached. Feel free to quote from it as “submitted for publication.”

With best wishes,

DHK

I quoted from Prof. Kaye’s article in comments to the previous post. Let me quote one of those passages here, because I think it sheds light on the issue:

We can approach this question in two steps. First, we consider what the import of the DNA evidence would be if it consisted only of the one match between the defendant’s DNA and the crime-scene sample (because he was the only person tested). Then, we compare the impact of the match when the data from the trawl are added to give the full picture. . . . In the database trawl case . . . [i]f anything, the omitted evidence makes it more probable that the defendant is the source. On reflection, this result is entirely natural. When there is a trawl, the DNA evidence is more complete. It includes not only the fact that the defendant matches, but also the fact that other people were tested and did not match. The more people who are excluded, the more probable it is that any one of the remaining individuals — including the defendant — is the source. Compared to testing only the defendant, trawling therefore increases the probability that the defendant is the source. A database search is more probative than a single-suspect search.

Interesting.
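The passage above can be illustrated with a toy calculation. The sketch below is an illustrative simplification, not the analysis in Prof. Kaye's manuscript or in the statisticians' papers. It assumes a purely hypothetical pool of 10 million people who could have left the DNA, each equally likely beforehand to be the source, independent coincidental matches, and no lab error. Only the 1-in-1.1-million and 338,000 figures come from the L.A. Times article.

```python
def posterior_single_suspect(rmp, population):
    """P(defendant is the source) when only he was tested and he matched."""
    # Every one of the other population - 1 people remains a possible source.
    return 1 / (1 + (population - 1) * rmp)


def posterior_after_trawl(rmp, population, db_size):
    """P(defendant is the source) when a trawl of db_size profiles hit only him."""
    # Everyone else in the database was tested and excluded, so only the
    # population - db_size untested people remain as alternative sources.
    return 1 / (1 + (population - db_size) * rmp)


RMP = 1 / 1_100_000        # from the article
DB_SIZE = 338_000          # from the article
POPULATION = 10_000_000    # hypothetical pool of possible sources

single = posterior_single_suspect(RMP, POPULATION)
trawl = posterior_after_trawl(RMP, POPULATION, DB_SIZE)
whole_world = posterior_after_trawl(RMP, POPULATION, POPULATION)

print(f"single-suspect test    : {single:.4f}")
print(f"trawl of 338,000       : {trawl:.4f}")
print(f"trawl of everyone      : {whole_world:.4f}")
```

In this toy model the trawl figure is slightly higher than the single-suspect figure, and it reaches 1 when the database covers the whole pool. That matches the direction of Prof. Kaye's remarks, though the actual numbers depend entirely on the made-up pool size and the equal-prior assumption.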

I should note that Prof. Kaye’s exposition of the two relevant questions is similar to, but somewhat different from, the questions that I posed in my original post. In an attempt to illustrate what I believed to be the questions addressed by the two competing camps, I posited two similar questions: 1. What are the chances that a search of this database will turn up a match with the DNA profile? and 2. What are the chances that any one person whose DNA matches a DNA profile is indeed the person who left the DNA from which the profile is taken?

Prof. Kaye’s questions state the issue in a more refined and, I believe, more accurate manner. As to my questions, he says in a follow-up e-mail:

The answer to your question #1 depends on the chance that the database contains the source (and, if “a match” means exactly one match, no one else with the matching type). That is not the question that the statisticians who favor an adjustment to the random-match probability are considering. The proposed statistical adjustment relates to the following modified version of your #1:

1′. What is the chance that a search of a database will turn up exactly one match when the source of the crime-scene DNA is someone who is unrelated to everyone in the database?

Likewise, the statisticians who argue that the database search is better evidence than the single-suspect search (and they are the majority of those writing on the topic) focus on a variation of your second question:

2.’ What is the chance that the named individual whose DNA matches is the source?

I confess that I did not read the coin example you provided too closely. I suspect that it is correct. I have an example along these lines in my article (inspired by an example in the Donnelly-Friedman article).

Thus, I think the thrust of your remarks are on target, but some of the details of your analysis could be refined.

I thank Prof. Kaye for his correspondence. And yes, the coin example was rather long.

Incidentally, I have an e-mail in to Prof. Peter Donnelly, the Oxford statistician whom I cited in my earlier post. He is out until May 13.

I also received a nice e-mail from Jason Felch, one of the authors of the L.A. Times article, in a response to an e-mail I sent him. I have asked him for permission to quote from the e-mail and am awaiting his reply.

5/3/2008

Statistical Probability in Cold Hit DNA Cases

Filed under: Crime, Dog Trainer, General — Patterico @ 10:20 pm

The L.A. Times has an interesting article about the application of probability measures to “cold hit” cases made from DNA databases. I find the statistical arguments made in the article to be unconvincing, but due to my lack of training in this area, I remain completely humble about my ability to properly analyze the issue. However, experts have widely divergent opinions on the matter — a fact you’d never learn reading the article.

The article begins by describing a 1970s rape/murder scene. A match was made from badly deteriorated DNA that bore only 5 1/2 of the possible 13 markers available. When all 13 markers are available for a match, the probability of a random person bearing the same profile can run to 1 in a quadrillion, a denominator roughly 150,000 times the number of people on the planet. Because only some of the 13 markers were available in this case, the figure was only 1 in 1.1 million.

This is known as a “random match probability” and the article describes it as the “rarity of a particular DNA profile in the general population.”

At Puckett’s trial earlier this year, the prosecutor told the jury that the chance of such a coincidence was 1 in 1.1 million.

Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett’s case, it was 1 in 3.

The article restates the proposition again later in the article:

In every cold hit case, the panels advised, police and prosecutors should multiply the Random Match Probability (1 in 1.1 million in Puckett’s case) by the number of profiles in the database (338,000). That’s the same as dividing 1.1 million by 338,000.

For Puckett, the result was dramatic: a 1-in-3 chance that the search would link an innocent person to the crime.

It seems to me that the conclusion does not logically follow at all. The formulation simply can’t be right. The suggestion appears to be that the larger the database, the greater the chance is that the hit you receive will be a hit to an innocent person. I think that the larger the database, the greater the probability of getting a hit. Then, once you have the hit, the question becomes: how likely is it that the hit is just a coincidence?

An example makes it simpler.

Let’s say the random match probability for a DNA profile is one in 13.4 billion. In such a case, it seems very unlikely that the hit you get will come back to a different person than the person who left the DNA at the crime scene. Now assume that your database contains all 6.7 billion people on the planet. It’s virtually certain that you will get a hit, of course. But if you got a hit — only one hit — you would intuitively feel certain that you had the right person from that hit.

Yet the logic of the article would seem to say you take 13.4 billion and divide it by the size of the database (6.7 billion), making a 1-in-2 chance (50%) that you have the wrong person (an “innocent person”).

I say hogwash. And I think my example shows why it’s confusing and potentially misleading to use the word “innocent” in these calculations.

My off-the-cuff reaction — and keep in mind, I have no experience in statistics — is that the people who advocate this approach are measuring the question:

1. What are the chances that a search of this database will turn up a match with the DNA profile?

when the truly relevant question is, instead:

2. What are the chances that any one person whose DNA matches a DNA profile is indeed the person who left the DNA from which the profile is taken?

There is a third, rather silly question whose answer seems obvious, but which I will raise for the purposes of relating to an analogy I will make:

3. Once a match has been made through the database, what is the chance that the person whose DNA provided the match will match the DNA profile?

This last one is obviously almost 100%, the lack of complete certainty owing purely to human error; taking human error out of the equation for a theoretical analysis, it’s a tautology: a match is a match.

It seems to me that this is a useful analogy: everyone knows a coin has a 50/50 chance of coming up heads. If I give you a room that has 10,000 coins that were randomly tossed in the air and have landed on the ground, the chances that at least one of those coins landed heads are very nearly 100% (question 1). But the chances that any one of those coins was going to come up heads before it was tossed are still 50% (question 2).

Now, if I tell you to go find me a coin that has come up heads, then the chances it did come up heads are (absent human error) 100% (question 3). But, the chances that it was going to come up heads before it was tossed are still 50% . . . and always will be, no matter how many coins are in the room. You’re almost certain to find one with heads in a room with a larger database (thousands of coins), but the chances that it was going to come up heads always remain the same.
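
The coin numbers are easy to verify. A short Python sketch, assuming fair coins tossed independently (the function name is just for illustration):

```python
def chance_at_least_one_heads(number_of_coins, p_heads=0.5):
    """Chance that at least one of the tossed coins shows heads (question 1)."""
    return 1 - (1 - p_heads) ** number_of_coins


for coins in (1, 10, 10_000):
    # The per-coin chance (question 2) is 0.5 no matter how many coins there are.
    print(coins, chance_at_least_one_heads(coins), 0.5)
```

With 10,000 coins the first figure is so close to 1 that ordinary floating point prints it as 1.0, while the per-coin figure stays at 0.5 throughout.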

Applying the analogy to a DNA database, it seems to me that the size of the database increases your chances of a hit. But the chances that the profile obtained from your hit is a coincidence will always remain the same, and will always be a function of the number of loci and their frequency in the relevant populations.
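
As a rough illustration of how a profile's frequency depends on the loci, here is a sketch of the standard “product rule,” in which the genotype frequencies at each locus are multiplied together, assuming the loci are independent. The per-locus frequencies below are made-up numbers for illustration only, not figures from any real case or population database.

```python
def profile_frequency(genotype_frequencies):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    frequency = 1.0
    for locus_frequency in genotype_frequencies:
        frequency *= locus_frequency
    return frequency


# Hypothetical frequencies for a three-locus profile.
hypothetical_loci = [0.1, 0.05, 0.2]
print(profile_frequency(hypothetical_loci))  # about 0.001, i.e. 1 in 1,000

# Adding a fourth hypothetical locus makes the profile rarer.
print(profile_frequency(hypothetical_loci + [0.1]))  # about 0.0001
```

The size of the database appears nowhere in this calculation.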

The L.A. Times article makes it sound as though it’s quite well accepted that jurors are constantly being misled:

Jurors are often told that the odds of a coincidental match are hundreds of thousands of times more remote than they actually are, according to a review of scientific literature and interviews with leading authorities in the field.

. . . .

[B]ecause database searches involve hundreds of thousands or millions of comparisons, experts say using the general-population statistic can be misleading.

The closest you get to an acknowledgement that not everybody agrees is a passing reference to the fact that this assertion “has been widely but not universally embraced by scientists.”

“Not universally” is quite the understatement. Apparently, there is a debate raging about this among statisticians. Law professor David H. Kaye explains that, while many agree with the analysis described in the L.A. Times article, there is a theory out there that the use of the database “actually increases the probative value of the match.” (I have an e-mail in to Professor Kaye to ask him for further comment.)

The argument to which Professor Kaye refers was made in a Michigan Law Review article by Peter Donnelly, Professor of Statistical Science and Head of the Department of Statistics at the University of Oxford, and Richard D. Friedman, a law professor at the University of Michigan. The first page of their law review article is here. An earlier version of the argument was apparently made by Donnelly with David Balding in a paper titled “Evaluating DNA Profile Evidence When the Suspect is Identified Through a Database Search,” according to mathematician Keith Devlin of Stanford.

Devlin appears to agree with the approach described in the L.A. Times article. However, he says:

Personally, I (together with the collective opinion of the NRC II committee) find it hard to accept Donnelly’s argument, but his view does seem to establish quite clearly that the relevant scientific community (in this case statisticians) have not yet reached consensus on how best to compute the reliability metric for a cold hit.

You’d never know that reading the L.A. Times article, which implies that all but the most rabid pro-law enforcement shills agree that jurors are being given bogus statistics.

[UPDATE: For proof as to how conclusively the paper portrays this point of view, look at this image of what appears on the front page of today’s Sunday paper:

[Image: dna-on-front-page.JPG]

Tell me where in that image you see any hint that “the relevant scientific community (in this case statisticians) have not yet reached consensus” as mathematician Devlin states.]

The paper wraps up the article by suggesting that the real probability of a coincidence is not 1 in 1.1 million, but 1 in 3:

In the end, however, jurors said they found the 1-in-1.1-million general-population statistic Merin had emphasized to have been the most “credible” and “conservative.” It was what allowed them to reach a unanimous verdict.

“I don’t think we’d be here if it wasn’t for the DNA,” said Joe Deluca, a 35-year-old martial arts instructor.

Asked whether the jury might have reached a different verdict if it had been given the 1-in-3 number, Deluca didn’t hesitate.

“Of course it would have changed things,” he said. “It would have changed a lot of things.”

By the way, in the case described in the L.A. Times article, there was more than just the cold hit. In addition to the fact that the defendant was a serial rapist who described his rapes as “making love” — the same terminology used by the murderer — the prosecution also showed the following:

[Defendant] Puckett “happened to be in San Francisco in 1972,” Merin told jurors in his opening argument. Merin could not place Puckett in [victim] Sylvester’s neighborhood on the day of the slaying. But Puckett had applied for a job near the medical center where Sylvester worked.

With the court lights dimmed and a photo of Sylvester’s naked body displayed on a screen, Merin argued that Puckett’s 1977 sexual assaults showed an “MO” consistent with Sylvester’s killing.

In each of those crimes, Puckett had posed as a police officer to gain the woman’s trust. The absence of forced entry to Sylvester’s apartment indicated her killer had also used a ruse, Merin said.

Puckett had kidnapped his victims by holding a knife or ice pick to their necks, leaving scratches similar to those found on Sylvester’s neck — what Merin called “his signature.”

I now throw open the matter for discussion.

UPDATE: Radley Balko has posted on this. He agrees with the L.A. Times experts. I have posted some counterarguments in his comments.

UPDATE x2: Follow-up post here with helpful responses from Prof. Kaye.

UPDATE x3: Statistics always opens the possibility of using language that doesn’t describe what’s really going on. For example, in this post I referred to “random match probability” as “in essence, the chance that two unrelated people will share the same genetic markers.” I’m not comfortable that this is right, and have removed the sentence. Random match probability refers to the expected frequency of a set of markers appearing in a population of unrelated individuals. I think it’s best to stick with that definition.

4/26/2008

Stein Twitters Out a Column

Filed under: Dog Trainer — Patterico @ 1:28 am

Joel Stein:

I’ve had thoughts that weren’t complete enough for a column and ended up as my Facebook status.

Yeah, well. You’ve had thoughts that weren’t complete enough for a column and ended up as a column.

4/24/2008

L.A. Times Gives Bill Ayers a Puff Piece

Filed under: 2008 Election, Dog Trainer, General — Patterico @ 7:32 am

The L.A. Times gives Bill Ayers a little unrebutted puff piece today, allowing Ayers to claim, without any opposing viewpoint, that he is being misrepresented in the media.

Ayers, of course, is the Weather Underground terrorist (now a professor who spoke fondly of bombings as recently as September 2001) who has been described as “friendly” with Obama. In February, Obama strategist David Axelrod described Obama and Ayers as friendly acquaintances:

“Bill Ayers lives in his neighborhood. Their kids attend the same school,” [Axelrod] said. “They’re certainly friendly, they know each other, as anyone whose kids go to school together.”

This article further explains:

Ayers was loosely involved in Obama’s election as an Illinois state senator in the late 1990s, when he was introduced to local activists at a meeting in his house. He also donated $200 to Obama’s reelection campaign in 2001.

Obama served with Ayers on the board of the Woods Fund, a philanthropic foundation, for three years and shared a platform with him at two academic conferences.

The L.A. Times allows Ayers to portray his terrorist attitude as ancient history, in a piece titled Ex-radical William Ayers keeps low profile. The deck headline reads: “The Weatherman founding member, now a professor, says he wants to avoid fueling his ‘cartoon’ media image. So he won’t be discussing his ties to Obama.” And the lede reads as follows:

William Ayers, a former radical leader turned academic and school reformer, has never been hesitant to speak his mind.

Although there has been no public response from him since his ties to Barack Obama — the two neighbors served on a charity board together for three years — were referenced during last week’s Democratic debate in Philadelphia, Ayers said Wednesday that he has a good reason.

He doesn’t want to feed the flawed “narrative” out in the media, he said, one that has commentators on Ayers’ own blog wondering why someone hasn’t shot him dead yet.

“It’s a cartoon” that people are reacting to, said Ayers, a professor of education, in a brief chat at his University of Illinois at Chicago office.

There’s one little detail left out of this puff piece: just how “ex” are the pro-bombing views of this “ex-radical”? Not as “ex” as they ought to be, according to this New York Times article published, inappropriately enough, on September 11, 2001 (although the interview obviously occurred earlier):

“I don’t regret setting bombs,” Bill Ayers said. “I feel we didn’t do enough.” Mr. Ayers, who spent the 1970’s as a fugitive in the Weather Underground, was sitting in the kitchen of his big turn-of-the-19th-century stone house in the Hyde Park district of Chicago. The long curly locks in his Wanted poster are shorn, though he wears earrings. He still has tattooed on his neck the rainbow-and-lightning Weathermen logo that appeared on letters taking responsibility for bombings. And he still has the ebullient, ingratiating manner, the apparently intense interest in other people, that made him a charismatic figure in the radical student movement.

Tattoos can be removed, if a person really wants to do it. But why would they, when they still believe in the principles of the criminal organization whose murderous goals are symbolized by that tattoo?

How can any responsible profile of Bill Ayers leave out the tidbit that he spoke approvingly of the concept of setting bombs only 6 1/2 years ago? Or fail to speak to even one person who might have pointed that out?

But then, who said this profile was bound to be responsible? It was published in the L.A. Times.

4/22/2008

Mary McNamara Shows Humor and Class in Addressing George Washington Mistake

Filed under: Dog Trainer, General — Patterico @ 8:48 pm

The Los Angeles Times has this correction:

HBO: A critic’s notebook in Saturday’s Calendar section that mentioned “John Adams” and other HBO shows said that George Washington served only one term as president. He served two terms.

Good enough. But better still is that the writer, Mary McNamara, has a good sense of humor about it:

Oh, if only I could claim it was all a ploy by Calendar editors to gauge readership. But when I wrote in Saturday’s story about HBO that George Washington stepped down from the presidency after serving only one term, it was just a stupid, blind error, the sort that leaves you smiting your forehead, literally and repeatedly, the moment it is pointed out to you.

For the six or seven people living in the Los Angeles Basin who did not e-mail to correct me, he served two terms, not one. And my daddy was a history teacher! Ever since the first e-mail hit my box (on Friday afternoon, about two seconds after the story went up on the website), I have been bathed in hot shame. But I want to thank you, well, most of you, for the gentle tone you took — most clever subject line award goes to: Is a TV Critic Smarter Than a 5th Grader? — though I certainly deserved all those incredulous exclamation marks as well. And yes, I did go to college. Graduated even.

Also, for the record, we entertainment writers are held just as accountable for flubbed historical references as any other journalist. The correction runs today online and in tomorrow’s print edition, and I will try to comfort myself with the knowledge that a good, strong dose of humility is always good for the soul. Especially the soul of a critic.

Well played, Mary McNamara. It would be refreshing to see more human reactions like this when the Big Faceless Newspapers correct their mistakes.

One more thing, though: it’s amazing that a mistake in an item that runs on Saturday, which is noticed by the author within two seconds, takes until Monday to correct online (and until Tuesday to correct in the paper). That means the error took two to three days to correct. That’s really even more embarrassing than the error itself.

I mean, I make mistakes all the time, but they’re far less embarrassing when you catch them quickly and correct them quickly. But then, I have full control over what I publish, even after it goes up. Apparently newspapers like the L.A. Times aren’t quite so nimble — and that’s a gross understatement.

Dinosaur Media: the term works on so many levels!

4/21/2008

L.A. Times: Patterico a “prominent Angeleno” who weighs in on Special Order 40

Filed under: Deport the Criminals First, Dog Trainer, Public Policy — Justin Levine @ 12:17 pm

[posted by Justin Levine]

Perhaps he is too bashful to admit it, but Patterico’s favorite newspaper labels him as a “prominent Angeleno” in today’s edition which asks several people their views on Special Order 40 in Los Angeles. (His actual views on the subject are well worth reading too - apart from the side issue of what he is labeled as.)

[Justin Levine]
