Patterico's Pontifications

5/8/2008

The L.A. Times’s Errors in Its Piece on DNA and Cold Hits

Filed under: Crime,Dog Trainer,General — Patterico @ 11:10 pm

I have sent the following e-mail to the authors of that L.A. Times piece on DNA and cold hits:

Mr. Felch and Ms. Dolan,

I believe your recent front-page article on DNA cold case statistics misstated the meaning of the math you discuss.

Your article said:

Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett’s case, it was 1 in 3.

The 1-in-3 number does not pertain to the probability that the database search had hit upon an innocent person. Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.

If we ignore the existence of independent evidence of Puckett’s guilt, the statistical chance Puckett is innocent depends in part on the probability that the database contains the guilty party. Your article gives no information on what this probability is (although the fact that the database consists of California-based felons suggests that the chances are better than one would find in a purely random database). Without knowing the probability that the database contains the guilty party, you can’t conclude that the 1-in-3 figure accurately represents the chances Puckett is innocent. Your article confuses two distinct concepts and requires correction.

You state:

In every cold hit case, the panels advised, police and prosecutors should multiply the Random Match Probability (1 in 1.1 million in Puckett’s case) by the number of profiles in the database (338,000). That’s the same as dividing 1.1 million by 338,000.

Actually, you have that upside down. Multiplying (1 in 1.1 million) by 338,000 is the same as dividing 338,000 by 1.1 million — not dividing 1.1 million by 338,000.

Your article continues:

For Puckett, the result was dramatic: a 1-in-3 chance that the search would link an innocent person to the crime.

Again, this is wrong. There is a 1-in-3 chance that the search would link someone to the crime. Whether that person is innocent or not depends on the likelihood that the database contains the guilty party (as well as the quality of other evidence tying that defendant to the crime).

I am not the only person saying this. A similar point was made by Eugene Volokh in this post. And I made the point in more detail in this blog post of mine.

I think the paper owes readers at least two corrections — one of the 1-in-3 statistic, and one on the upside-down division. Given the prominence of the error on the 1-in-3 statistic, which appeared on the front page of the Sunday paper, I hope your paper will make an effort to give this correction the prominence it deserves.

cc: Readers’ Representative

I’ll let you know what I hear in response.

P.S. When I say “Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.” I meant to express this concept: “Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match, period. If we get a single match, we won’t know whether it was to an innocent person or a guilty person without learning more.” In other words, without prior knowledge of the likelihood that the database has the guilty person, all we know is the chance of a hit — not the chance that a single hit has come back to an innocent person.

P.P.S. I just changed the last phrase from “not the chance of a hit to an innocent person” to “not the chance that a single hit has come back to an innocent person.” That more accurately expresses what I was trying to say.

Expressing statistical concepts in accurate English is like walking a tightrope.
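
For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes, as a simplification not stated in the article, that each of the 338,000 profiles matches independently with probability 1 in 1.1 million, and that none of them belongs to the actual source of the DNA. The function names are illustrative only.

```python
# Assumption (for illustration): every profile in the database is an
# independent trial with the same random match probability (RMP), and
# the true source of the DNA is NOT among them.
RMP = 1 / 1_100_000   # random match probability quoted in the article
DB_SIZE = 338_000     # number of profiles in the database


def expected_matches(rmp: float, n: int) -> float:
    """Expected number of coincidental matches: RMP multiplied by database size."""
    return rmp * n


def prob_at_least_one(rmp: float, n: int) -> float:
    """Probability that the search returns one or more coincidental matches."""
    return 1 - (1 - rmp) ** n


def prob_exactly_one(rmp: float, n: int) -> float:
    """Probability that the search returns exactly one coincidental match."""
    return n * rmp * (1 - rmp) ** (n - 1)


# Multiplying (1 in 1.1 million) by 338,000 is 338,000 / 1.1 million,
# which is about 0.307, or roughly 1 in 3.25.
print(expected_matches(RMP, DB_SIZE))
# Dividing the other way round gives about 3.25, which is not a probability.
print(1_100_000 / 338_000)
print(prob_at_least_one(RMP, DB_SIZE))   # about 0.26
print(prob_exactly_one(RMP, DB_SIZE))    # about 0.23
```

Under these assumptions the "1 in 3" figure is the expected number of coincidental matches. The chance of at least one such match, and of exactly one, both come out somewhat lower.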

51 Responses to “The L.A. Times’s Errors in Its Piece on DNA and Cold Hits”

I think I’d go with that letter, too. (Though the paper may decide it’s too trivial a correction to deal with.)
Karl Lembke (521de2) — 5/8/2008 @ 11:31 pm
misstated the meaning of the math you discuss.

Great job – and easily correctable without requiring pages of figures.
Apogee (366e8b) — 5/9/2008 @ 12:26 am
This statement is demonstrably wrong:

The 1-in-3 number does not pertain to the probability that the database search had hit upon an innocent person. Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.

If the database contains the guilty person’s DNA sample then the probability of at least one match becomes unity. If the database does not contain the guilty person’s DNA then the probability is 1 in 3 of at least one match. Thus the actual probability of a match is somewhere between 1/3 and 1 depending on what the probability is that the offender’s DNA is in the database. The range of this can be measured by examining the statistics of like cases.

Should it turn out that for that type of crime there is a 70% chance the offender’s DNA is already in the database then we can work out the probability of both a match, and the probability of innocence/guilt.

Take the two distinct cases:

For the offender in database: probability of match is 1.
For the offender not in the database, the probability of a match is .333

Overall probability of a match:
1.0*0.7 + 0.333*(1.0-0.7) = .70 + .10 = .80

Probability that, given a match, the matched person is not the offender: .10/.80 = 1 in 8
Probability that, given a match, the matched person is the offender: .70/.80 = 7 in 8

Not bad. A lot more than the 2 in 3 indicated in the article. But wait. Let’s look at the case where the offender’s DNA is less likely to be in the database. Let’s say it is shown to be there only 10% of the time. Going through the same calculations:

Overall probability of a match:
1.0*0.1 + 0.333*(1.0-0.1) = .10 + .30 = .40

Probability that, given a match, the matched person is not the offender: .10/.40 = 1 in 4
Probability that, given a match, the matched person is the offender: .30/.40 = 3 in 4

So we can see how the probabilities shift depending on the database coverage. A final note. There are probabilities that the database may match multiple people and, to be rigorous, multiple scenarios would have to be examined. Does the search terminate on one match, or does it do an exhaustive search reporting all matches? Both alter the numbers. However, the purpose of the exercise was to demonstrate error and I think it sufficient for that purpose.

Of course the real question that remains is what percentage of people in the database, non-matching and selected at random, could be excluded by subsequent investigation. The probability of a false positive would then be reduced by this factor.
doug (fbba00) — 5/9/2008 @ 12:50 am
Your points are of mild interest. Sorry, but even after reading all this, I still don’t see it. Yes, the writer is befuddled in his fractions. Yes, the chance of hitting on a guilty party is more likely in this database than, say, the phone book, and certainly cannot possibly hit an “innocent”, except possibly of this particular crime. But again, so what?

The real problem that the article exposed is a completely clueless use of statistics in a particular trial.

The odds of the next 4 rolls of a pair of [fair] dice being snakeyes are 1 in 1.7 million. The odds someone did it last week in Las Vegas are almost certain. Presenting the one case for the other is what happened here.
Kevin Murphy (0b2493) — 5/9/2008 @ 12:56 am
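
A minimal sketch of the dice comparison in the comment above. The 1-in-1.7-million figure follows from fair dice. The number of four-roll sequences thrown in a week is not given in the thread, so `HYPOTHETICAL_ATTEMPTS` is a made-up value used only to show how the two questions differ.

```python
from fractions import Fraction

# Chance that one roll of a pair of fair dice shows snake eyes (1 and 1).
p_snake_eyes = Fraction(1, 36)

# Chance that four specific rolls in a row are all snake eyes.
p_four_in_a_row = p_snake_eyes ** 4   # 1/1679616, about 1 in 1.7 million


def prob_at_least_once(p: Fraction, attempts: int) -> float:
    """Chance the event happens at least once in `attempts` independent tries."""
    return 1 - (1 - float(p)) ** attempts


# Hypothetical count of four-roll sequences thrown in a week; an
# assumption for illustration, not a figure from the thread.
HYPOTHETICAL_ATTEMPTS = 10_000_000

print(p_four_in_a_row)
print(prob_at_least_once(p_four_in_a_row, HYPOTHETICAL_ATTEMPTS))
```

The first number answers "what are the odds for this particular sequence of rolls?" and the second answers "what are the odds it happened to somebody?". With enough attempts the second approaches certainty even though the first stays tiny.
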
Presenting the one case for the other is what happened here.

If you were talking about the Times article, you would also be correct.
Apogee (366e8b) — 5/9/2008 @ 1:22 am
The 1-in-3 number does not pertain to the probability that the database search had hit upon an innocent person. Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.

No, no no, a thousand times no. There is no little green man hiding in the database fixing the odds so that 1 in 3 searches will always result in a hit regardless of the probability that the killer is in there. The 1 in 3 number is an error rate, not a total hit rate. The latter is necessarily higher. How much higher, we don’t know without knowing the odds of a true hit. Call that value X.

Prior to running the search, there was a 1 in 3 chance of matching an innocent person, and a 1 in X chance of matching the truly guilty party. Neither had any influence on the other. Unless the value of X was 1, there was a possibility that the search could turn up a true hit only, a false hit only, or both. After the fact, we know that this particular search turned up only one hit; therefore, we can state with certainty that this time around, either the killer was in the database or someone was nabbed randomly, but not both. To know which is more likely, we need to know the value of X. If it is 3 (one-third of all killers are in the DB), then one of two things happened, both of which had a 1 in 3 chance of occurring. In that case, the odds of Puckett’s innocence are 50-50, and the Times’s only sin was understatement. If the value of X is 1.5 (a 2-in-3 chance: two-thirds of all killers are in the DB), then it is twice as likely that Puckett was guilty rather than innocent. But unless the value of X is excruciatingly close to 1, it’s nowhere near the one in a million figure cited by that disingenuous prosecutor, and the Times is guilty of, at worst, a technical foul.
Xrlq (62cad4) — 5/9/2008 @ 4:26 am
OK, you’re right about the slip in the fraction. But they did use the correct fraction to calculate the expected number of false positives. This is slightly greater than the probability of false positives occurring but close enough for the purposes of the article.

As Doug has pointed out you made a slip in the letter.

The article makes the mistake of confusing the false positive rate with the probability of innocence. You correctly point out that you need to know the probability of the culprit being in the data base before you can calculate the probability that the match is from the culprit.

But the point of the article is that the prosecution used the random match rate in a very misleading way. They presented it as if it was equivalent to the false positive rate, which under the circumstances it was not. This point is correct. You need to use caution in interpreting results from this sort of search.

There is a temptation to believe in things which would make your efforts more successful, especially in these frustrating cases. I suppose that is why many people in law enforcement trust polygraphs. In a cold case such as this you want to believe that you have made a breakthrough. You have to accept the limitations of your techniques.
Lloyd Flack (ddd1ac) — 5/9/2008 @ 4:39 am
Patterico,

You don’t understand, the LATimes doesn’t believe anyone is guilty except for meat eating, gun owning REPUBLICANS.
PCD (5c49b0) — 5/9/2008 @ 4:56 am
I don’t think Doug is saying anything different from what I said.

If the database contains the guilty person’s DNA sample then the probability of at least one match becomes unity. If the database does not contain the guilty person’s DNA then the probability is 1 in 3 of at least one match. Thus the actual probability of a match is somewhere between 1/3 and 1 depending on what the probability is that the offender’s DNA is in the database. The range of this can be measured by examining the statistics of like cases.

Perhaps the way I expressed it was less than clear, but when I said

Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.

I meant to express this:

Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match. If we get a single match, we won’t know whether it was to a guilty person or not because we don’t have prior knowledge whether the guilty person is in the database or not.

[To elaborate:]

If we had prior knowledge that he was not in the database, we could say there is a 1 in 3 chance of a hit, and any hit would be to an innocent person.

If we had prior knowledge that he was, we could say there is a certain chance of a hit, and that the hit is to a guilty person.

With no prior knowledge either way, we know only that there is a 1 in 3 chance of a hit. If we get a hit, we don’t know whether the hit is to a guilty person or an innocent person.

So maybe I wasn’t entirely clear about *how* I said it, but I think the concept I was trying to express was correct. Using the numbers in the article, it doesn’t change the percentages significantly to remove one possible donor from the world’s pool of possible donors. Put another way, the rest of the database doesn’t know whether the guilty guy is there or not.
Patterico (4bda0b) — 5/9/2008 @ 6:31 am
See my P.S.
Patterico (4bda0b) — 5/9/2008 @ 6:35 am
Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett’s case, it was 1 in 3.

The seminal problems with the above statements are manifold:

1) Leading scientists don’t consider the statistics the “most significant”. This is an intentional overstatement of facts. In point of fact, leading scientists would not agree with the conclusions in this article. Period.

2) The second problem with this statement is that it intends to convey “innocence” with “marker match” at a level of 1 out of 3. This is also a gross overstatement of the facts. You are NOT going to get a 33% hit on markers on innocent persons if the guilty party is in the database searched. In fact, depending upon the specific markers, the database, and the crime…that number may fluctuate substantially.

3) If we ignore the existence of independent evidence of Puckett’s guilt,… we are completely distorting the statistics to slander the investigation and prosecution of this crime. One cannot, MUST not eliminate the corroborating evidence when one discusses “innocence” of a party.

The existence of compounding pieces of evidence works to ameliorate the STATISTICAL chances that the “matched” person was a) not available to commit the crime; b) not physically able to commit the crime; etc.

If you have a database consisting of only those persons available and capable of committing the particular crime…AND then there is a match…the likelihood of 1 in 3 being a “random hit” is ridiculously overstated.
cfbleachers (4040c7) — 5/9/2008 @ 6:38 am
Journalists have never been known for their math skills. That’s why we’re mostly English majors.
Bradley J. Fikes (1c6fc4) — 5/9/2008 @ 6:47 am
leading scientists

The invocation of generic experts – a necessary ingredient for all weak articles.
Amphipolis (fdbc48) — 5/9/2008 @ 7:01 am
Cfbleachers,

The article got it partially right. The false positive rate is far larger than the random match rate that the prosecutor quoted. Using the random match rate was misleading.

Where the LA Times got it wrong was in confusing the false positive rate with the probability of innocence.

Your second point is wrong. See Patterico’s post above. The probability of getting a false positive has nothing to do with the probability of the suspect being in the data base. Both can happen.

As for the third point, the discussion has been about how strong the evidence of guilt from a data base match is, not how strong the other evidence is. The argument has been that in this case proof of guilt will have to depend primarily on the other evidence. Yes, the data base hit means that he was one of a fairly small number of persons who might have committed the crime. It would not be conclusive evidence by itself.
Lloyd Flack (ddd1ac) — 5/9/2008 @ 7:02 am
I am fascinated by most of this discussion, but the real point of the LAT article was to criticize the technique of searching the database for matches. But this technique is only a high tech version of all investigation work. You start with the population, and narrow the field of suspects by eliminating those who couldn’t have done it. Searching the database eliminated 337,999 potential suspects because their DNA did not match the crime scene. There has to be more evidence to convict than that, and in this case there was.
I am just as convinced of his guilt because he has been convicted of crimes with the same MO, and can’t be eliminated as a suspect by other evidence.
Mike S (d3f5fd) — 5/9/2008 @ 7:45 am
P.S. When I say “Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match — whether that match is to an innocent person or a guilty one.” I meant to express this concept: “Rather, the 1-in-3 number pertains to the probability that a database search will result in a single match, period.

Still wrong. The 1-in-3 number pertains to the probability that the database will result in a random, i.e., false match. The probability that the database will result in one match total is unknown, but irrelevant. All that is relevant are the following:
1. What are the odds of a false hit? Answer: 1 in 3.
2. What are the odds of a true hit? Answer: 1 in x, where x represents the odds that the killer was in the database to begin with.
3. Was this one hit (Puckett) more likely to have been a true hit or a false hit, and how much more likely?
The answer depends entirely on the value of x. If x = 0 (no possibility of killer being in the DB), then we know for certain that Puckett was a false hit. Similarly, if x = 1 (absolute certainty that the killer was in the DB), then we know for certain that Puckett was a true hit. If x = 3, then the odds of his guilt vs. innocence are 50-50, as we know one of two equally probable outcomes (each stood a 1 in 3 chance) has occurred.

Bottom line: your email should have focused entirely on the fact that we are comparing the odds of two types of hits occurring, rather than the odds that one or the other might have occurred in a vacuum.
Xrlq (b71926) — 5/9/2008 @ 7:57 am
X,

It’s a question of what is known when you do the search.

In your example, if you don’t know the value of x, then the chance of any hit is 1 in 3.

If you do know the value of x going in, it changes the equation. It also changes what you can say about the meaning of a single hit resulting from the search.
Patterico (a87c8f) — 5/9/2008 @ 8:14 am
1 of every 1.1 million coins was minted in 1972.

There are 6000 coins in the world made in 1972. Only one is green.

Here’s a room with 338,000 coins. What are the chances a coin will be found with a 1972 date? About 1 in 3.

We have found a single 1972 coin in the room. What are the chances it’s the green one? Dunno. Depends on the chances the green coin was in the room. We can’t say it’s a 1 in 3 chance the coin is green without knowing more.

If we had a scout go in the room and look for the green coin, determine it’s not there, and not look at dates, and we then did the search, we’d expect a roughly 1 in 3 chance of finding a 1972 coin. And once we had it, we would know it’s not green.

[UPDATE: Ah, but the chance of a “single hit” is not the same as a chance of a “hit.” That part of my post was poorly worded and inaccurate. — P]
Patterico (deb6c3) — 5/9/2008 @ 8:26 am
In your example, if you don’t know the value of x, then the chance of any hit is 1 in 3.

Wrong. The chance of a random (read: false) hit is 1 in 3. The chances of getting any hit is necessarily higher, as it includes both the 1 in 3 chances of getting a false hit, plus any chance x may give us of getting a true one.
Xrlq (b71926) — 5/9/2008 @ 8:35 am
Reaction of an average person to this thread: MEGO!

We, of the great unwashed, rely on the professionals to sort this out, and to use their best judgement in coming to a consensus that can be applied to the law in a neutral manner.

Now, I know that some of the preceding thoughts are asking a lot; but, when the freedom and property of real people are at risk, I think it is a fair request.
Too many times, the “professionals” within the judicial system seem to be children playing at very sophisticated games, knowing there is no down-side risk for them personally.
Whether or not that perception is accurate, it is out there and should be addressed – perhaps by arguments being carried out on a somewhat higher plane than what we see in the media, and in some televised trials.
Another Drew (f9dd2c) — 5/9/2008 @ 8:43 am
I like this letter better as well Patterico. You are on much more solid ground here. I think you could have brought up the issue of 1-in-4 vs. 1-in-3, but YMMV as they say.

Xrlq,

You are arguing about a conditional probability whereas what Patterico is talking about is an unconditional probability. That is,

Prob(Match|KIB) d.n.e. Prob(Match).

d.n.e. = does not equal. As has been noted many, many times before by me, Karl and Daryl (and I’ll also note we don’t all agree on certain aspects of this case–especially Daryl and I) all agree we do need to know P(Match). P(Match|KIB) assuming no false negatives is going to be 1. You’ll also need that number to update your prior (probability) for guilt.

Prior to running the search, there was a 1 in 3 chance of matching an innocent person, and a 1 in X chance of matching the truly guilty party.

True, but I consider this a trivial observation. Prior to running the trawl we could argue the probability that any of the people in the database is guilty is 1/N where N is the number of people in the database. Once we observe the results of the trawl, that probability is either going to remain the same (no hits) or for some members of the database their probability of guilt will go up, while others go down (by the theorem of total probability). This is kind of a “No duh,” observation.

After the fact, we know that this particular search turned up only one hit, therefore, we can state with certainty that this time around, either the killer was in the database or someone was nabbed randomly, but not both.

So what? The point still remains that we need to know the probability of a DNA match that is not conditioned on anything else. It was in every formulation of Bayes theorem that was put forward in that thread (IIRC).

cfbleachers,

Calm down dude, Patterico is the last person I’d accuse of ignoring corroborating evidence of guilt. The point he is making is: “If we set that data aside for the moment and look at this situation….” Also, please keep in mind we can do sequential updating with Bayes theorem. Let me be explicit here.

We are ultimately interested in P(G|DNA=1). G = guilt and DNA=1 is that we have 1 DNA match. Initially the investigator had no reason to think any member of the database was any more likely to be guilty than the next. So P(G) could be set to 1/N. Now we get our DNA trawl results. Ah ha! One hit. Woohoo, a possible breakthrough. Now we want to update our prior via,

P(G|DNA=1) = P(DNA=1|G)*P(G)/P(DNA=1).

Now, I think the 1-in-3 number is too high and favor 0.226 as was discussed in previous threads. Further, we agree that P(DNA=1|G) = 1. So we do the arithmetic and get,

P(G|DNA=1) = 0.000013091.

Not very good, but it is much higher than our initial probability for Puckett of 0.000002959. Our revised probability is 4.42 times as large. So you go and investigate Puckett because everyone else’s probability of being guilty just dropped like a rock.

Now, we could argue that the 1/338,000 is too high. Maybe some of the people in that database could not have possibly committed the crime in question in 1972. So you could remove these people and go with a lower prior probability. In the previous thread it was argued that the killer was in the database with probability of 0.066. This is close to 1/15. Using this prior we’d get

P(G|DNA=1) = 0.295.

We approach unity for the above conditional probability as our prior of guilt (for Puckett) approaches 0.226. What this prior says is that prior to the trawl, the investigators would have had to have reason to suspect Puckett; since they didn’t, clearly that prior is too high.

Once you do settle on a prior (or even a range of priors for sensitivity analysis) then you bring in the additional evidence. For example, suppose we have a case where we get two hits. And that we used the uniform distribution as our prior. Now the probability that one of these two is guilty,

P(G(i)|DNA=2), i = 1,2.

Is the same. So we investigate both individuals and find out that at the time of the crime in question individual 1 was incarcerated for another crime. We’d update our P(G|DNA=2) again using Bayes theorem and it should be obvious for individual 1 his probability is going to drop substantially.

So, Patterico is just fine on that last point of yours.
Steve Verdon (4c0bd6) — 5/9/2008 @ 8:47 am
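
A minimal sketch of the Bayes update in the comment above, taking that comment's inputs at face value: P(DNA=1|G) = 1, P(DNA=1) = 0.226, and priors of 1/338,000 and roughly 1/15. The function and variable names are illustrative, and the sketch only reproduces the arithmetic; it does not settle which prior is appropriate.

```python
def posterior_guilt(prior: float, p_match_given_guilt: float, p_match: float) -> float:
    """Bayes' theorem: P(G | DNA=1) = P(DNA=1 | G) * P(G) / P(DNA=1)."""
    return p_match_given_guilt * prior / p_match


P_SINGLE_MATCH = 0.226   # P(DNA=1) as favored in the comment
P_MATCH_IF_GUILTY = 1.0  # assumes no false negatives, as in the comment

# Uniform prior over the 338,000 people in the database.
uniform_prior = 1 / 338_000
post_uniform = posterior_guilt(uniform_prior, P_MATCH_IF_GUILTY, P_SINGLE_MATCH)
print(post_uniform)                  # about 0.0000131
print(post_uniform / uniform_prior)  # update factor, equal to 1 / 0.226

# Prior of about 1/15 (the 0.066 figure cited from the earlier thread).
post_one_in_15 = posterior_guilt(1 / 15, P_MATCH_IF_GUILTY, P_SINGLE_MATCH)
print(post_one_in_15)                # about 0.295
```

Because P(DNA=1|G) is taken to be 1, the update simply divides the prior by 0.226, which is why the posterior reaches 1 when the prior reaches 0.226.
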
I incorrectly stated the probabilities in the low percentage database coverage. It is serious in that the correct numbers point to a higher probability of a false positive than the 1 in 3 the Times article indicates.

Overall probability of a match:
1.0*0.1 + 0.333*(1.0-0.1) = .10 + .30 = .40

Probability that, given a match, the matched person is not the offender: .10/.40 = 1 in 4
Probability that, given a match, the matched person is the offender: .30/.40 = 3 in 4

Oops. Got the order wrong on the 10% coverage case. It’s correct on the earlier 70% coverage example. The 10% case should read:

Overall probability of a match:
1.0*0.1 + 0.333*(1.0-0.1) = .10 + .30 = .40

Probability that, given a match, the matched person is not the offender: .30/.40 = 3 in 4
Probability that, given a match, the matched person is the offender: .10/.40 = 1 in 4

So here we have a different problem. When the a priori probability the database contains the DNA of the guilty party is low, the probability that a match is a false positive becomes larger than 1 in 3.

As stated earlier, this probability is an upper limit that must be reduced by investigative exclusion.
doug (fbba00) — 5/9/2008 @ 9:03 am
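
A minimal sketch of the two-case calculation in the comment above, using its simplifying assumptions: a match is certain when the offender is in the database, and a false match has probability 1/3 when the offender is not. The 70% and 10% coverage figures are the comment's own hypotheticals, and the function name is illustrative.

```python
def match_breakdown(p_in_db: float, p_false: float = 1 / 3):
    """Return (P(match), P(not offender | match), P(offender | match)).

    p_in_db: prior probability that the offender's DNA is in the database.
    p_false: probability of a coincidental match when the offender is absent.
    """
    p_true_match = 1.0 * p_in_db             # offender present, match certain
    p_false_match = p_false * (1 - p_in_db)  # offender absent, coincidental hit
    p_match = p_true_match + p_false_match
    return p_match, p_false_match / p_match, p_true_match / p_match


for coverage in (0.7, 0.1):
    p_match, p_innocent, p_offender = match_breakdown(coverage)
    print(coverage, round(p_match, 2), round(p_innocent, 3), round(p_offender, 3))
```

With 70% coverage this gives a 1-in-8 chance the matched person is not the offender; with 10% coverage it gives 3 in 4, in line with the corrected figures.
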
Pat, I like this letter much better as well. I think it conveys the fundamentals that potential jurors need to be aware of in a case like this.

#19 Xrlq:

The chance of a random (read: false) hit is 1 in 3. The chances of getting any hit is necessarily higher, as it includes both the 1 in 3 chances of getting a false hit,

I have some heartburn with your wording here. I just want to make sure that you understand there is no such thing as a “false” hit here. Either the portion of the sample DNA fragment exactly matches the relevant portion of a recorded sample in the database, or it doesn’t. The random match probability remains 1:1.1 million, so the chances of a match are going to remain the database size multiplied by the random match probability, unless you’ve managed to skew the database by recording only the descendants of a single progenitor or something.
EW1(SG) (84e813) — 5/9/2008 @ 9:24 am
Pat, I think what you say is correct, and I was just about to post this on your “proposed email” thread, which tries to say the same thing:

Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett’s case, it was 1 in 3.

This last statement is false because the “1 in 3” instead only represents the probability that the one dna match which should be found if you sampled 1.1 million members of the total population you are looking at will be in the data base – which in fact here consists of only about 1/3 of the 1.1 million needed. So the data base correspondingly has a 1 in 3 chance of containing a match, that’s all.

And you don’t know the probability that this one db match is a false positive unless you also know the total matches likely based upon the total population number you are considering, which in Puckett’s case was much greater than 1.1 million people, meaning more likely matches than just Puckett’s at a rate of 1/1.1 million total population.

So, Pat, why don’t you just flat-out tell the LAT that the 1/3 number is simply and only the chance that there is a dna match in the actual data base, which itself is within a much larger population [tp] containing all of the possible matches = “total matches” [tm], a total which depends entirely upon the actual number tp represents, given the “random match probability”[rmp], which I assume is the frequency of expected matches within an even larger theoretical population – perhaps the U.S.’s or the World’s? Whatever, the rmp = 1/1.1 million people in the tp of interest.

If I recall correctly, the likelihood that the one match found in the data base is a false positive is actually: [tm-1]/tm, where tm = likely total matches in the tp.

So that: the probability that any match found anywhere within the tp is “guilty” [Pg] = 1/tm, which has nothing to do with how many matches are actually found within the db, or who the matches are – that is, lacking any other consideration as to whether being in the db makes a “match” there more likely to have committed another particular crime compared to a match not in the db.
tm is found by multiplying tp by the frequency of the dna match, rmp: tm = [tp][rmp].

For example, if tp = 18.7 million people, and rmp = 1 match/1.1 million people, then
tm = 17 and Pg = 1/tm = 1/17 = the Probability that any match found within the tp is the guilty one. The probability that the/any match is a false positive is [17-1]/17 = 16/17.

Also, maybe tell the LAT that the rmp, = 1/1.1 million, is not the chance that an innocent match will be “found” by Law Enforcement instead of a guilty one. LE would have a very difficult time of actually finding any of the matches at all by a random search or process, such as by randomly picking someone up off the street. Randomly finding the guilty match would be even less likely, because there is only one of this kind of match vs, say, 17 total. LE simply[?] had a match within their db, which at least made finding Puckett much easier. [They still haven’t actually found the other matches yet, have they?]
J. Peden (4938ac) — 5/9/2008 @ 9:39 am
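
A minimal sketch of the formulas in the comment above: tm = [tp][rmp], Pg = 1/tm, and a false-positive chance of [tm-1]/tm. The 18.7 million population is the comment's own example, and the approach treats every matching person in the population as equally likely to be the source. Function names are illustrative.

```python
def total_matches(tp: float, rmp: float) -> float:
    """Expected number of people in the total population whose profile matches."""
    return tp * rmp


def prob_guilty(tm: float) -> float:
    """Chance that any one match is the guilty one, if all matches are equally likely."""
    return 1 / tm


def prob_false_positive(tm: float) -> float:
    """Chance that any one match is not the guilty one."""
    return (tm - 1) / tm


RMP = 1 / 1_100_000   # random match probability
TP = 18_700_000       # example total population from the comment

tm = total_matches(TP, RMP)
print(tm)                       # about 17
print(prob_guilty(tm))          # about 1/17
print(prob_false_positive(tm))  # about 16/17
```
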
EW1(SG), I’m not going to get into a semantic debate. If you don’t like the words “true” and “false,” feel free to substitute other labels more to your liking. The point is that a distinction must be drawn between the hits I call true (those matched to the actual donor) and those I call false (those matched randomly to others). False (random) hits are the 1 in 3 factor. True (actual match to donor) hits are a function of how likely the donor is to be in the database. Neither factor influences the other.

Either the portion of the sample DNA fragment exactly matches the relevant portion of a recorded sample in the database, or it doesn’t. The random match probability remains 1:1.1 million

Right, but if the true donor is in the database, his match isn’t random. That’s why you can’t conflate the two. If some random schmoe is added to the database, the odds increase by 1 / 1.1 million that there will be a hit. If the true donor is added, the odds increase by 1.

Think about the most obvious example, where we have a 100% chance that the true donor is in the database. We run the test on the entire database. Do you really think there is only a 1 in 3 chance that we will find a match?!
Xrlq (62cad4) — 5/9/2008 @ 10:40 am
There’s lots of ways to skew the “sample”/database. That’s why researchers calculate P-values. For example, here, what if the James-Younger gang were in the database? They were all siblings or cousins.

But we are not looking at the efficacy of a drug or the predictability of a heart attack. We are looking for suspects of a crime and the database is a useful place to start looking. Jesse was killed by Bob Ford and Cole is doing forty years in prison so we check out Frank.
nk (8f20b5) — 5/9/2008 @ 11:00 am
This is why most journalists are unqualified to do anything BUT opinion crap pieces. They are generally incompetent in all fields. A communications degree is probably the #1 choice of students incapable of entering college without affirmative action, unless you count ethnic studies. It undoubtedly has dragged down an already lame and pointless degree.
martin (cd5d90) — 5/9/2008 @ 12:40 pm
There is a 1-in-3 chance that the search would link someone to the crime.

Wrong.

There is a 1-in-3 chance the search would link an innocent person to the crime.

There is a P chance that the search would link the guilty party to the crime, where P is the likelihood that the killer would be in the DB.

If P = 90%, then there is better than a 90% chance that the search would link someone to the crime.
Daryl Herbert (4ecd4c) — 5/9/2008 @ 12:56 pm
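The arithmetic behind the "better than 90%" claim above can be sketched in a few lines. This is only an illustration under stated assumptions: the true donor always produces a hit if he is in the database, random hits occur independently of whether he is there, and the roughly 1-in-3 figure is taken at face value. The function name and the sample values of P are hypothetical, not from the case.

```python
def prob_any_hit(p_donor_in_db, p_false_hit):
    """Chance that the search returns at least one hit.

    Assumption (illustrative): the true donor always matches if present,
    and random hits are independent of the donor's presence.
    """
    # No hit at all requires both: donor absent AND no random hit.
    p_no_hit = (1 - p_donor_in_db) * (1 - p_false_hit)
    return 1 - p_no_hit


# The roughly 1-in-3 figure: 338,000 profiles times a 1-in-1.1-million RMP.
P_FALSE_HIT = 338_000 / 1_100_000

# Hypothetical values of P, the chance the killer is in the database.
for p in (0.0, 0.5, 0.9, 1.0):
    print(f"P = {p:.0%}: chance of at least one hit = {prob_any_hit(p, P_FALSE_HIT):.1%}")
```

With P set to zero the result falls back to the random-hit figure alone, which is the only situation in which "1 in 3" describes the chance of linking someone to the crime.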
Daryl, but doesn’t that go only to the usefulness of the database? In other words, are the police wasting their time with it or not? Whether the blind hog found an acorn is an independent issue. Did the totality of the evidence prove the defendant guilty beyond a reasonable doubt?
nk (8f20b5) — 5/9/2008 @ 1:11 pm
How can we know? The jury was led to believe the odds were 1 in a million that the DNA would randomly match to Puckett, when in fact the odds were 1 in 3 that they would match to someone in a database full of sex offenders. If the jury had properly been instructed on that point, they may well have concluded that the DNA semi-match, in conjunction with other evidence, established proof beyond reasonable doubt. But since they were improperly instructed, they may well have convicted based on the DNA evidence alone.
Xrlq (b71926) — 5/9/2008 @ 2:11 pm
Yup. It’s an investigative tool. Not evidence. Its potential for prejudice outweighs its probative value.
nk (8f20b5) — 5/9/2008 @ 2:25 pm
The LAT’s reporters cannot even bother to confirm the gender of an interviewed man’s partner, and call that man an “open homosexual” when in fact the partner is female and the man heterosexual.

You expect them to be able to pin down a complicated scientific story?
seaPea (3c8938) — 5/9/2008 @ 2:52 pm
Yup. It’s an investigative tool. Not evidence. Its potential for prejudice outweighs its probative value.

That depends on how it’s presented. If the prosecution had truthfully advised the jury that the odds of a false hit were roughly 1 in 3, that would still mean that the evidence is probative, just not probative enough to sustain a conviction on its own. I would imagine that most admissible non-DNA evidence falls into this category as well. Every piece of evidence doesn’t have to be a smoking gun, it just can’t be presented as though it were a smoking gun if it’s not.
Xrlq (62cad4) — 5/9/2008 @ 4:10 pm
#25 Xrlq: I’m not meaning to debate semantics, and in truth have only been following certain parts of the discussion “with half an ear,” as much of it has little relevance to a criminalist, and even less to a jury. What concerned me in your earlier post was this statement:

The chances of getting any hit is necessarily higher, as it includes both the 1 in 3 chances of getting a false hit, plus any chance x may give us of getting a true one.

where I got the mistaken impression that you were conflating the two conditions.

I actually think we are on the same page, but what little I know of the subject I learned from an extremely competent criminalist so my viewpoint isn’t necessarily that of the man off the street.

As a working assumption, a criminalist isn’t going to worry about whether the criminal is in the database or not when running the sample against it. After all, to illustrate, with an RMP of 1:1.1 million, it’s possible that as many as 10 people in the LA County area alone match (and yes, I know the crime occurred in San Francisco). And it’s entirely possible there are no matches in the database. It’s only after a match is found that the likelihood of an uninvolved party is considered and accounted for by corroboration. So in that sense, all hits are “true” until eliminated.
EW1(SG) (84e813) — 5/9/2008 @ 4:32 pm
#26 nk:

There’s lots of ways to skew the “sample”/database.

Actually, there aren’t and I gave a very poor example above without thinking.

The value of DNA testing as an identification tool comes precisely because of its measurement of independently heritable traits, so a set of quints would skew the database but anybody else wouldn’t.
EW1(SG) (84e813) — 5/9/2008 @ 4:51 pm
Er, and they would have to be identical quints, at that.
EW1(SG) (84e813) — 5/9/2008 @ 4:55 pm
This is so reminiscent of the disconnect between law and medicine regarding the insanity defense. Insanity is not a medical term. So the doctor is talking about “passive-aggressive personality disorder” and “cannabis and alcohol-induced dissociative state” and the jury is supposed to find whether “as a result of mental disease or mental defect, the defendant lacked substantial capacity to appreciate the criminality of his conduct”.
nk (8f20b5) — 5/9/2008 @ 5:18 pm
There is a ceiling on the probability that the culprit is in the data base. It is the probability that the culprit survived long enough to be included in the data base. Considering how long ago this case was, the chance that the culprit has died before he could be included in the data base is far from negligible.
Lloyd Flack (ddd1ac) — 5/9/2008 @ 8:04 pm
nk wrote: Daryl, but doesn’t that go only to the usefulness of the database? In other words, are the police wasting their time with it or not? Whether the blind hog found an acorn is an independent issue. Did the totality of the evidence prove the defendant guilty beyond a reasonable doubt?

No. It’s not an “independent issue.”

The probability that a search will return one or more hits is not independent from the probability that the killer is in the DB.
Daryl Herbert (452002) — 5/9/2008 @ 9:17 pm
If the prosecution had truthfully advised the jury that the odds of a false hit were roughly 1 in 3

The prosecution could not “truthfully” do so, because it isn’t true.
Daryl Herbert (452002) — 5/9/2008 @ 10:24 pm
Steve Verdon: I replied to your last comment directed to me on the previous thread.

If you want to keep discussing it, I would prefer to continue the conversation on whichever thread is the most recent, just so we only have to look at one place to talk to each other.
Daryl Herbert (452002) — 5/9/2008 @ 10:33 pm
And, actually, there are other problems with the Times and Patterico’s “analysis.”

If the chance of a random comparison being a match is indeed 1 in 1.1 million, then the chance against such a match is 0.999999091. Since each comparison is independent, the calculation of the probability against ANY matches in N tries against X samples is (0.999999091)^N. If there are 338,000 samples to test against, the chance that NO MATCH occurs is 73.5%. And 26.5% is therefore the chance that SOME matches occur.

If you want the chance that exactly R matches occur in N samples, you need to use the formula:

C(N,R) * (p^R) * ((1-p)^(N-R)) where C(N,R) is the combination of N things taken R at a time, and p is the probability of a single trial.

IF R = 1, this simplifies to N*p*((1-p)^(N-1)) or with these numbers:

(338000/1100000) * ((1-(1/1100000))^337999)

or about 0.226, or 22.6%, is the chance for exactly 1 match, and about 4% is the chance that more-than-one match occurs (26.5%-22.6%).

So, the whole article is silly, as is the correction. The odds of only one match occurring are not 1/3rd, and further it has nothing to do with dividing 338,000 by 1.1 million or the other way around. It’s not a division problem at all.

Lastly, the fact that only one match occurs is not even of interest because it is 5 times more likely than more-than-one match.

All this says is that the DNA testing in this case is hardly conclusive. Convicting someone on this basis is the equivalent of convicting on the basis of hair color or race. At best the DNA test corroborated other evidence.

By the way, the defense attorney gets even lower marks on this test because he had a slam dunk rebuttal.
Kevin Murphy (0b2493) — 5/10/2008 @ 1:57 am
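The binomial figures in the comment above are easy to check numerically. The sketch below assumes, as the comment does, that each of the 338,000 comparisons is independent with a match probability of 1 in 1.1 million and that none of the profiles belongs to the actual donor. The variable and function names are illustrative.

```python
from math import comb

N = 338_000          # profiles in the database
p = 1 / 1_100_000    # random match probability for a single comparison


def binom_pmf(n, r, prob):
    """Probability of exactly r matches in n independent comparisons."""
    return comb(n, r) * prob**r * (1 - prob) ** (n - r)


p_none = binom_pmf(N, 0, p)    # no random match anywhere in the database
p_one = binom_pmf(N, 1, p)     # exactly one random match
p_some = 1 - p_none            # at least one random match
p_multi = p_some - p_one       # two or more random matches

print(f"N * p (the multiplication shortcut): {N * p:.4f}")
print(f"P(no match):                         {p_none:.4f}")
print(f"P(at least one match):               {p_some:.4f}")
print(f"P(exactly one match):                {p_one:.4f}")
print(f"P(more than one match):              {p_multi:.4f}")
```

The first line printed is the simple product of database size and RMP, which slightly overstates the chance of at least one random match; the gap between the two is the difference between "1 in 3" and the binomial result.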
Lastly, the fact that only one match occurs is not even of interest because it is 5 times more likely than more-than-one match.

It’s of interest because it allows us to exclude the possibility of both a true and a false hit.
Xrlq (62cad4) — 5/10/2008 @ 6:02 am
The whole problem with the article (and with much of the debate that has followed it), is the looseness with which some terms are used to convey concepts.

“Innocence” is being used too broadly or inappropriately and the attempt to convey the concept that DNA evidence will wrongly “convict” 1 out of every 3 defendants in a case where it is used, …is the journalistic equivalent of jury tampering before the fact.

Given the LA Times’s sordid history, I believe this has a 1 out of 1.25 chance of being the intention.

Without knowing which specific markers and also the specific composition of the database tested against those markers…we seem to be taking random markers vs. a random database.

The chance of a 5.5 marker match on a RANDOM database of 350,000 RANDOM individuals…that would produce ONE individual…and that individual did not match all of the 13 markers…produces one statistical number. Let’s call that number “X”

Change the database to include all Hopi tribal women. Or Italian-American babies. Or Scottish farmboys. Will that produce a number that is greater or lesser than “X”, because the database is composed of a different subset of individuals?

Now, let’s change those 5.5 markers…for an entirely different group of 5.5 markers. Apply it to our random database and give our “conclusive” number the designation “Y”.

Will “Y” always equal “X”?

How about against different databases?

Here’s the thing…if the markers rule out and rule in different “things”…they rule out and rule in different people.

If the database includes some commonality, that may rule out or rule in a higher or lower percentage of people…depending upon the markers.

Are KNOWN OFFENDERS more likely to commit crimes of this type than the “random general public”?

Does that subset database already have built into it a higher likelihood of criminality than the same sized random database?

Would that fact give a higher return on a “hit” of GUILTY parties based upon 5.5 markers where only one person is returned…and that person was known to be available to commit the crime and in the area at the time of the violent act?
cfbleachers (4040c7) — 5/10/2008 @ 7:17 am
If the chance of a random comparison being a match is indeed 1 in 1.1 million, then the chance against such a match is 0.999999091. Since each comparison is independent, the calculation of the probability against ANY matches in N tries against X samples is (0.999999091)^N. If there are 338,000 samples to test against, the chance that NO MATCH occurs is 73.5%. And 26.5% is therefore the chance that SOME matches occur.

Kevin,

The 26.5% number was covered by Eugene Volokh in his post, which I linked in mine. In the longer version of my post, I included all the caveats so that people like yourself couldn’t come along later and claim that I was an idiot because the number was really 26.5% instead of 1 in 3. Everyone who has read all my posts and followed the links understands that we are talking about an approximation of an approximation.

The 1 in 3 number does accurately represent the multiplication adjustment recommended by the scientific committees as a conservative and simplified way of expressing the chances of finding a match in a database of unrelated individuals who did not donate the crime scene DNA.
Patterico (4bda0b) — 5/10/2008 @ 10:33 am
#44 cfbleachers:

Change the database to include all Hopi tribal women. Or Italian-American babies. Or Scottish farmboys. Will that produce a number that is greater or lesser than “X”, because the database is composed of a different subset of individuals?

No. As I mentioned above, one of the reasons these particular markers are used is that they are independently heritable, i.e. not associated with any subset of the population.
EW1(SG) (84e813) — 5/10/2008 @ 11:02 am
Expressing statistical concepts in accurate English is like walking a tightrope.

Indeed!
And that’s one part of one discipline.
You can understand why professional scientists cringe over what laypeople do to scientific concepts when they try to express them in English.
Karl Lembke (7ae576) — 5/10/2008 @ 1:24 pm
Patterico–

Sorry, I didn’t follow all the links, and I saw no reference to the actual values in any of the posts or comments. The 1/3rd number AND the idea that the calculation was a simple matter of division was prominent however. I confess to skimming.

I think that my pique is more towards 1) the LA Times article’s incorrect use of statistics while correcting someone else’s use of statistics; 2) the trial’s reported use of meaningless pseudo-statistics to convict someone; and 3) your proposed correction focusing on minor errors and ignoring the main issues.

I must admit though that there are about 97 sides to this discussion by now, and perhaps the statistics themselves are no longer meaningful.
Kevin Murphy (805c5b) — 5/10/2008 @ 3:14 pm
your proposed correction focusing on minor errors and ignoring the main issues.

Kevin, I love ya, but that’s a load of horse hockey.

The CENTRAL POINT made by the article was that the jurors weren’t told that the chances the search HAD HIT on an innocent person were 1 in 3.

That statistic is NOT what the jury should have been told, even according to the scientific committees cited in the article.

That is my focus, and it is hardly a focus on a side issue. It’s the MAIN issue — and even someone with a touch of Prosecution Derangement Syndrome should be able to see that.

Also, since you have been skimming this, maybe I should repeat something that I think most skimmers have been missing:

THE JURY WAS NOT TOLD THERE HAD BEEN A HIT FROM A DATABASE.

You do understand that, right?
Patterico (4bda0b) — 5/10/2008 @ 4:19 pm
49

“THE JURY WAS NOT TOLD THERE HAD BEEN A HIT FROM A DATABASE.”

I consider this sufficient to reverse (assuming the defense wanted them told).
James B. Shearer (fc887e) — 5/10/2008 @ 4:38 pm
What James said. That is by far the biggest problem here. If the prosecutor had said:

Ladies and gentlemen of the jury, there’s this thing called the ‘prosecutor’s fallacy’ that means blah blah blah blah blah. Don’t really understand that crap, all I know is that we had no clue who to suspect for this murder, so we went a-fishin’ in a huge database of known sex offenders. Each little experiment had a 1 in a million chance of randomly matching someone, but we ran the experiment about one-third of a million times. Do the math.

There would have been little or no risk of the jury convicting Puckett on the basis of the partial DNA match alone. They would have seen it as probative (which it is, unless the odds of the killer actually being in the DB are insanely low) but not dispositive (which it isn’t, unless the odds of the killer actually being in the DB are insanely high).
Xrlq (62cad4) — 5/11/2008 @ 12:19 pm
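The "probative but not dispositive" point above can be made concrete with a small Bayes calculation. This is a sketch under assumptions not stated in the article: the true donor always matches if he is in the database, the other profiles match independently at the RMP, exactly one hit was returned, and all non-DNA evidence is ignored. The prior values in the loop are hypothetical, since nobody in the thread knows the real chance that the killer is in the database.

```python
def prob_hit_is_innocent(prior_donor_in_db, n_profiles=338_000, rmp=1 / 1_100_000):
    """Chance that a lone database hit is NOT the true donor.

    Derivation (illustrative model):
      donor in DB:     one hit means donor matched and no one else did
                       -> weight  prior * (1 - rmp)^(n - 1)
      donor not in DB: one hit means exactly one random match
                       -> weight  (1 - prior) * n * rmp * (1 - rmp)^(n - 1)
    The common factor (1 - rmp)^(n - 1) cancels in the ratio.
    """
    innocent_weight = (1 - prior_donor_in_db) * n_profiles * rmp
    guilty_weight = prior_donor_in_db
    return innocent_weight / (innocent_weight + guilty_weight)


# Hypothetical priors for "the killer is somewhere in the database".
for prior in (0.01, 0.10, 0.50, 0.90):
    print(f"prior = {prior:.0%}: chance the single hit is innocent = "
          f"{prob_hit_is_innocent(prior):.1%}")
```

Under this model the answer swings from nearly certain innocence to nearly certain guilt as the prior moves from very low to very high, which is why a single figure such as "1 in 3" cannot stand in for the chance that the person hit is innocent.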

51 Responses to “The L.A. Times’s Errors in Its Piece on DNA and Cold Hits”
