Patterico's Pontifications

8/12/2008

L.A. Times Statistics Expert Says L.A. Times “Mischaracterized” Crucial Statistic in DNA “Cold Hit” Article

Filed under: Crime,Dog Trainer,General — Patterico @ 9:39 pm



A statistics expert cited by the L.A. Times has now publicly claimed that the paper “mischaracterized” a central statistic in a Page One article on DNA, statistics, and cold hits.

Prof. David Kaye was recently described by L.A. Times reporters Jason Felch and Maura Dolan as “an expert on science and the law at Arizona State University and former member of a national committee that studied forensic DNA.”

Prof. Kaye is also now on record, in a soon-to-be-published paper, as saying that those same reporters “mischaracterized” a critical probability in a May article about DNA “cold hit” cases.

Prof. Kaye’s article is scheduled to appear in the journal “Law, Probability, and Risk” in September 2008. Alert readers will recognize the error cited by Prof. Kaye as one that I have repeatedly complained about on this blog. Here’s Prof. Kaye:

Diana Sylvester, a “22-year-old San Francisco nurse had been sexually assaulted and stabbed in the heart” in her San Francisco apartment over thirty years ago. A DNA database match from a highly degraded semen sample led investigators to “John Puckett, an obese, wheelchair-bound 70-year-old with a history of rape.” The jury heard that the random-match probability for the match at five or so loci was about 1 in 1.1 million. It did not learn that the California database had 338,000 profiles in it, making np almost 1 in 3 — a number that would render the match almost worthless to the prosecution (and that the reporters mischaracterized as “the probability that the database search had hit upon an innocent person”).

(All emphasis in this post is mine.)

Yes, they did indeed mischaracterize the probability, as I have been arguing for months.
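
For anyone who wants to check the arithmetic behind Kaye’s “almost 1 in 3,” here is a minimal sketch in Python, using only the two numbers quoted above:

    p = 1.0 / 1_100_000      # random-match probability reported to the jury
    n = 338_000              # profiles in the California database

    print(n * p)             # 0.307..., Kaye's np, "almost 1 in 3"
    print(1 - (1 - p) ** n)  # 0.265...: chance an all-innocent database yields at least one hit

The two results differ slightly because np is an expected number of hits, while the second line is the probability of at least one hit; at these values they tell the same story.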

I’ll send a link to this post, and Prof. Kaye’s article, to reporter Felch and to Jamie Gold. They have dismissed my complaints on this issue in the past. But it seems to me that they have to pay attention, now that an expert they have cited has stated in an academic paper that their article “mischaracterized” its central statistic.

P.S. You shouldn’t misread Prof. Kaye’s phrasing to indicate that he believes the match was indeed “almost worthless to the prosecution.” Further details, set forth in the extended entry, will dispel that notion. Those details are of interest mainly to those intensely interested in this topic. But I know there are a few of you among the regular readers here.

[Extended entry]

Prof. Kaye goes on to argue that the 1 in 3 statistic should not be of particular interest to jurors:

If logic were the life of the law, the np statistic would not be permitted. The figure of np = 1/3 in People v. Puckett, for instance, is an estimate of the probability that a database of profiles of n = 338,000 individuals would yield a hit to someone (not necessarily Puckett) if it were composed exclusively of individuals who are not the source of the crime-scene DNA (and who are not identical twins of the true source). [I will interpose here that I have repeatedly emphasized that the 1/3 statistic most properly and accurately represents a probability related to a database of innocent individuals. I have been ridiculed for that statement, but Prof. Kaye’s statement supports me. — Patterico] Unlike the random-match probability of p = 1/1,100,000, this number is not part of a likelihood ratio that is of interest to the jury. The legal issue, as the Supreme Court stated in Nelson, is not whether the database is innocent, but only whether the one defendant named Puckett is guilty or innocent. The likelihood ratio for the match with respect to Puckett as compared to a randomly selected individual is closer to 1,100,000 than to 3. (Kaye 2009). Thus, it is hard to see how the 1/3 figure is of much benefit to a juror seeking a reasonable explanation of the probative force of the evidence.

Prof. Kaye goes on to acknowledge that “the innocent-database-match probability is not completely irrelevant.” The whole thing makes interesting reading, and I recommend that you download it. My main point in this postscript is that Prof. Kaye clearly does not take the position that the 1/3 statistic is determinative and should be given great weight by jurors. To the contrary, he believes that statistic should be mostly (if not completely) irrelevant to jurors.
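
To make the likelihood-ratio point concrete, here is a hedged sketch of the odds-form Bayesian updating Prof. Kaye is describing. The prior odds below are an illustrative assumption of mine, not a number from his paper:

    lr = 1_100_000               # likelihood ratio, roughly 1/p, for the DNA match
    prior_odds = 1 / 2_000_000   # illustrative prior: one suspect among ~2 million possible sources

    posterior_odds = prior_odds * lr              # Bayes' rule in odds form
    print(posterior_odds / (1 + posterior_odds))  # ~0.355 posterior probability

Note that the 1/1,100,000 figure enters the calculation as a likelihood ratio, exactly as Kaye says, while the 1/3 database figure does not appear in it anywhere.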

122 Responses to “L.A. Times Statistics Expert Says L.A. Times “Mischaracterized” Crucial Statistic in DNA “Cold Hit” Article”

  1. I don’t know if there is a judicial council or some governing body that sets policy within the L.A. Criminal Courts. If there is, this entire DNA value/probability issue needs to be on the next agenda.

    My admiration for you only grows, Pat.

    If there were a Pulitzer, or some such, for independently published work, you would win this year.

    Ed (841b4a)

  2. I agree with Ed but you can forget about a nice write-up in the LA Times when you win that award.

    DRJ (a5243f)

  3. As DNA evidence becomes more common, someone is going to have to develop a standard rule on how this is used. I’m not a lawyer but have done a lot of expert witness work and am very aware of how difficult it can be to explain science to jurors. I’ve been retired for years but still get asked to review cases and even testify because I seem to be able to explain medical facts to juries. An awful lot of doctors do not make good witnesses. Having been an engineer at one time, I’m aware that they are even worse, if possible.

    That sort of thing probably explains people like Edwards who made a fortune convincing juries that birth trauma causes cerebral palsy when all the science shows it is unrelated.

    Mike K (155601)

  4. #2 – DRJ – Patterico’s not interested in the Pulitzer. They only give those to Lefties.

    He wants a Nobel!

    Apogee (366e8b)

  5. Patterico,

    Please read the last paragraph of the paper. You have to consider the number of matches to the gene profile among potential suspects who are not in the database. Since we would expect about two others in the region to match the genetic profile, the chance that Puckett was the guilty party is about one in three if you only look at the DNA data. Now you might quite reasonably claim that there was other evidence that made Puckett more likely than the unidentified possible matches not in the database. But evidence distinguishing Puckett from genetic look-alikes is needed before you can prove guilt beyond reasonable doubt.

    Some of the dispute discussed in this paper is the frequentist versus Bayesian dispute in statistics. They look at subtly different things. The Bayesian is looking at the chance of error in this case and the frequentist is looking at the reliability of the procedure.

    However, I think there are some mistakes here over the handling of finite-sample statistics. There was a mistake in the LA Times, but the main point was correct. The random match probability is misleading in this case and you are looking for a reason to have more confidence in it than is warranted. You have to live with uncertainty.
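
    Here is a rough sketch of the one-in-three reasoning above; the size of the pool of potential suspects is an assumption, chosen only to reproduce the “about two others” estimate:

        p = 1.0 / 1_100_000    # random-match probability at the typed loci
        pool = 2_200_000       # assumed pool of plausible alternative sources (illustrative)

        expected_others = pool * p        # about 2 coincidental matches expected
        print(1 / (1 + expected_others))  # ~1/3: chance the matched man is the source, on the DNA alone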

    Lloyd Flack (e48d4e)

  6. Obviously, my job just got harder given that I’m now arguing against Kaye and not just against you. Nevertheless…

    Unlike the random-match probability of p = 1/1,100,000, this number is not part of a likelihood ratio that is of interest to the jury. The legal issue, as the Supreme Court stated in Nelson, is not whether the database is innocent, but only whether the one defendant named Puckett is guilty or innocent. The likelihood ratio for the match with respect to Puckett as compared to a randomly selected individual is closer to 1,100,000 than to 3. (Kaye 2009). Thus, it is hard to see how the 1/3 figure is of much benefit to a juror seeking a reasonable explanation of the probative force of the evidence.

    The entire database is indeed “on trial,” as all 338,000 individuals were originally at risk of prosecution.

    It looks like the good professor just fell for a macro version of the Monty Hall problem. Remember that time you picked Door #1, Monty opened Door #2, and asked you if you wanted to change your answer? You (the generic you, not you personally) really, really wanted to think that the chances of the prize being behind Door #3 were 1 in 3. After all, those were the odds of the prize being behind any of the doors, and since Monty wasn’t going to pick Door #1 anyway, only Door #3 was “on trial,” right? The answer, of course, is wrong. Doors #2 and #3 were both on trial, as from your perspective, Monty was free to choose between them depending on the location of the prize. Once Monty had made his choice, Door #3 wasn’t random anymore, and the proper analysis was no longer that Door #3 had the same 1 in 3 chance that any of the doors originally had, but rather, that there was a 2 in 3 chance that one of the doors available to Monty (let’s call them “the database”) contained the prize, and that if so, Monty had just signaled which of those two doors it was. Based on the selection criteria, the 2 in 3 odds that originally applied to “the database” collectively now apply to Door #3 alone, the one door in the database that Monty has identified as a possible candidate for the prize.

    The same occurred here. Puckett wasn’t a previously identified suspect, nor was he chosen at random. He was chosen for one reason and one reason only: the cops went on a 338,000-record fishing expedition, and he was the unlucky fish that happened to be caught. In other words, the booby prize of a false match was behind Door #234,567, and Monty just opened Doors 1-234,566 and 234,568-338,000 for us.

    If that sounds too counterintuitive (as the original Monty Hall problem did when only 3 rather than 338,000 records were involved), consider this. If another search like Puckett’s is conducted on another case, there is only a 1 in 3 chance we will get the wrong guy. However, there is a very high chance that this unlucky soul will be wrongly convicted, because no matter who he turns out to be, prosecutors will be free to argue that there was only a 1 in 1.1 million chance we would have wrongly nabbed him.
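
    For anyone who doubts the classic three-door result itself, here is a quick simulation; it checks only the standard puzzle, not the database extension, which readers can judge for themselves:

        import random

        def trial(switch):
            doors = [0, 0, 0]
            doors[random.randrange(3)] = 1   # prize behind one door at random
            pick = random.randrange(3)
            # Monty opens a door that is neither the pick nor the prize
            monty = next(d for d in range(3) if d != pick and doors[d] == 0)
            if switch:
                pick = next(d for d in range(3) if d != pick and d != monty)
            return doors[pick]

        n = 100_000
        print(sum(trial(True) for _ in range(n)) / n)    # ~0.667 win rate when switching
        print(sum(trial(False) for _ in range(n)) / n)   # ~0.333 when staying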

    Xrlq (62cad4)

  7. However, there is a very high chance that this unlucky soul will be wrongly convicted, because no matter who he turns out to be, prosecutors will be free to argue that there was only a 1 in 1.1 million chance we would have wrongly nabbed him.

    They will be free to argue a statistical fallacy?

    Once again, you sound like the LAT authors, who continually described the RMP in terms of a 1 in 1.1 million chance that the match to Puckett was a coincidence. That is a fallacious way to put it.

    Patterico (cc3b34)

  8. Since we would expect about two others in the region to match the genetic profile, the chance that Puckett was the guilty party is about one in three if you only look at the DNA data.

    I don’t think you can say that the chance that Puckett was the guilty party was one in three based on available data.

    The random match probability is misleading in this case and you are looking for a reason to have more confidence in it than is warranted. You have to live with uncertainty.

    I worry that by discussing at such length a case based on degraded DNA, we are creating uncertainty in people’s minds regarding RMP in a non-degraded sample, where there should be essentially no doubt at all.

    Patterico (cc3b34)

  9. I worry that by discussing at such length a case based on degraded DNA, we are creating uncertainty in people’s minds regarding RMP in a non-degraded sample, where there should be essentially no doubt at all.

    True, if the DNA is not degraded there should be little chance of an erroneous match whether it is a cold hit or not. However, in cold hit cases, because of age and other factors, you will tend to have a higher than usual proportion with degraded samples. In such cases you need to be cautious about the interpretation of results and the probabilities given.

    Lloyd Flack (ddd1ac)

  10. True, if the DNA is not degraded there should be little chance of an erroneous match whether it is a cold hit or not. However, in cold hit cases, because of age and other factors, you will tend to have a higher than usual proportion with degraded samples. In such cases you need to be cautious about the interpretation of results and the probabilities given.

    You always have to be cautious.

    The question will always be, at how many loci was there a match? What distresses me is that jurors are now, apparently, starting to question the significance of 13-loci matches. As if the numbers involved there don’t mean anything. That is truly distressing.

    Patterico (cc3b34)

  11. Let’s suppose that as a result of the Puckett case, the LAPD goes through its Cold Case Files and identifies 100 additional unsolved rapes for which degraded semen was recovered and stored.

    Further assume that for each semen sample, only the same five STRs that were typed in the Puckett case can be reliably typed.

    When searching the database of 338,000, some number of these cold cases will return hits. Let’s look at that.

    * First, assume that none of the actual perpetrators are represented in the database. In that event, using the “1 in 3” figure, about 27 cases will return a match or multiple matches with an innocent person or people. There will be about 22 single matches and about 5 cases with two matches. (I am making a guess-estimate of the Poisson distribution, which accounts for the fact that not all 33 matches will be single matches; there will be some random double hits, and perhaps even a triple hit. A quick numerical check appears at the end of this comment.)

    * Overlaid on top of this pattern will be the hits that are due to the perpetrators’ DNA being in the database for some cases. Neglecting the instances where perps’ twins or close relatives are in the database (which shouldn’t be ignored in real life), some of the cases that would otherwise have been “0 hit” will thus be “1 hit”, and some that would otherwise have been “1 hit” will be “2 hit.”

    There is no way of looking at any one of these {28 to (28 + X)} cases with hits, and knowing which contains a hit from a guilty party.

    So routine police work must take over at this point. Gender (half of the “innocent” hits will be women), alibi, past criminal record, eyewitness description.

    When some number of these (28 + X) cases go to trial, it seems to me that it would be misleading to tell the jury that the DNA evidence suggests that there is a “1 in 3” chance that the accused is innocent of the crime, caught up in a dragnet.

    It also seems to me to be misleading to assert that “1 in 3” is a ratio that should not be of interest to a jury. If the ratio was “1 in ten billion,” as it might be in a typical case where all 14 STRs in the semen were analyzable, that would be relevant!

    In conclusion, this seems to be a situation where the judicial system is going to have to present a complex reality to the jury and hope for an outcome based on sophisticated reasoning. In the proceedings of the fraction of the (28 + X) cases that go to trial, the quality, thoroughness, and persuasiveness of the police work that led to the charges ought to be crucial. In that, these cases would greatly resemble the run-of-the-mill cases that the justice system handles every day.
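
    Here is a minimal check of the Poisson guess-estimate flagged in the first bullet, using the round 1/3 figure as the mean number of innocent hits per search:

        from math import exp

        lam = 1 / 3     # expected innocent hits per cold-case search
        cases = 100

        p0 = exp(-lam)                 # P(no innocent hit) ~ 0.717
        p1 = lam * exp(-lam)           # P(exactly one hit) ~ 0.239
        print(cases * (1 - p0))        # ~28 cases with at least one innocent hit
        print(cases * p1)              # ~24 single-hit cases
        print(cases * (1 - p0 - p1))   # ~4.5 cases with two or more hits

    The exact values run slightly above the round numbers in the first bullet (and drop to about 26 if you use the more precise np of 0.307 instead of 1/3), but the shape of the estimate is the same.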

    AMac (90ab22)

  12. My bad on proofing the preceding comment. All instances of “28” should have been changed to “27” to correspond to my guess of what the Poisson Distribution would yield.

    AMac (90ab22)

  13. I worry that by discussing at such length a case based on degraded DNA, we are creating uncertainty in people’s minds regarding RMP in a non-degraded sample, where there should be essentially no doubt at all.

    I agree, to a point. However, hearing prosecutors and the FBI defend the weakest cases as though they were strong doesn’t do much to build the public’s confidence in the more numerous, legitimately solid cases, either.

    I’d be willing to go as far as to say that DNA evidence is generally strong enough to withstand either a typical degradation or the database effect, just not both. If Puckett had matched all 13 loci, or if he had been a previously identified suspect rather than the product of a trawl, the odds would easily be a million to one or better in favor of his guilt. But put those two problems together, and you’ve got a system that ran a 1 in 3 chance of getting it wrong in his case, and an even worse chance in the future as the database continues to grow.

    But right now, no one seems to understand what any of these numbers mean, and jurors end up either grossly overestimating or grossly underestimating the reliability of any DNA evidence before them. So we’re left with one guy getting convicted under a seriously flawed model that ran a 1 in 3 chance of getting the wrong guy, while another matches all 13 loci and walks. I guess the average is right, though.

    Xrlq (b71926)

  14. The use of statistics by lawyers to influence jurors is appalling. There are 2 immutable “facts” regarding Cold Case files. First, biological specimens degrade over time. Second, the technology and precision of DNA testing will continue to improve over time. Only careful analysis of the data and facts should be allowed in the courtroom, not emotional pleas for innocence or whatever. This area is ripe for nationalization of DNA convictions to take it out of the hands of unethical lawyers (not all are, of course) and uninformed judges (not all are, of course). Leave statistics to the mathematicians.

    FOB (fd470d)

  15. The lists that they do the terrorist data mining from, and for which your Democrat types wanted to prosecute the phone companies, are analogous to the DNA databank in some interesting ways.

    With the phone lists, the libs argue that we are all potential perps but we all have unique “DNA” with 10 loci and no degraded searches need be done.

    So, if your Democrats can succeed at fooling the public by convincing them that their rights are being trampled on by being part of a “suspects list”, you don’t stand a chance of making them overcome the sophistry of the xrlq trolls.

    j curtis (c84b9e)

  16. I tend to think that the main reason why the 1 in 1,000,000 figure is nonsense is that the probability of laboratory error is almost certainly greater than that.

    Pink Pig (2b901e)

  17. So you would suggest we have high confidence in 13 loci matches? 12? 11? 9?

    Those original statistical likelihoods for occurrence at each locus were empirically researched, but there is a small probability that they are wrong. And I had thought that some of the statistical likelihoods that form the basis of confidence are being profoundly questioned as a result of random trawling being performed in some of the DNA databases. (ref: Kathryn Troyer, although her example does come from the LAT)

    FBI estimates of 9 loci matches are 1:113 billion, but matches have been found by Ms. Troyer in reasonably small databases, so what should I conclude?

    I am fine with using dna as we now use it for exonerating people (up to a point). But unless I am convinced of the individual orthogonal probabilities, then I would be skeptical of convicting based on it in relative isolation.

    Please let me know why I should have high confidence in high loci matches (9-13), and whether I should have confidence that the matches fit the FBI estimates. For example, if a nine loci match probability is 1:50K, it is still relevant.

    Thanks

    EntropyIncreases (10b345)

  18. j curtis #14,

    You’re way in over your depth, here.

    nk (e69fdd)

  19. #15 and #16,

    It’s what I hope to litigate. Hope. If I can get the state to nolle prosse or if the client accepts a favorable plea, that’ll be the end of the case.

    nk (e69fdd)

  20. Regardless of the stats in the current discussion, we’d all be better off if no journalist without at least a master’s degree in stats were allowed to report anything more than descriptives, and none of them should be allowed either to interpret them or to suggest any policy based on them.

    In the past 30 years I doubt I have seen 10 instances of properly interpreted stats in the MSM, and damned few properly reported stats above the level of descriptives (and those usually aren’t on point).

    JorgXMcKie (c6778e)

  21. Shockingly…if there were 1.1 million samples the possibility of a hit would be almost 1 to 1.

    Christian Lindke (aa4f59)

  22. EntropyIncreases:

    So you would suggest we have high confidence in 13 loci matches? 12? 11? 9?

    Yes, yes and yes, with the caveat that if the suspect was selected by way of a database search, the database effect will have to be taken into account. I don’t know the likelihood of a false match in a database search matching 9 loci, but given that you have to go all the way down to 5 1/2 loci to get the 1 in 3 figure, I reckon that a 9 loci database search would still be pretty reliable.

    Those original statistical likelihoods for occurrence at each locus were empirically researched, but there is a small probability that they are wrong. And I had thought that some of the statistical likelihoods that form the basis of confidence are being profoundly questioned as a result of random trawling being performed in some of the DNA databases. (ref: Kathryn Troyer, although her example does come from the LAT)

    Troyer’s findings deviated less than 50% from what the FBI estimates predict. We don’t know the standard error, or how many relatives may be in the database, either of which (or a combination of the two) could account for that relatively slight variation. Worst case scenario, FBI estimates really are off by about 50%, making DNA evidence “only” about 75% as reliable as we think it is now. That could be a real problem for close cases like Puckett’s, where significant degradation and the database effect intersect to make the odds of a false match unusually high. It’s basically a non-issue in most other cases where more loci match, or where there wasn’t a database search to begin with. Who cares if the odds in a particular case are really 1 in 1 billion vs. “only” 1 in 750 million?

    FBI estimates of 9 loci matches are 1:113 billion, but matches have been found by Ms. Troyer in reasonably small databases, so what should I conclude?

    Apparently, I should conclude that Patterico was right after all, and that no one does read past the first page of the L.A. Times. Go back and read the whole cotton pickin’ article, or at least page 1 of the online (not print) edition, and then you come back and tell us what you should conclude. It’s one thing to note, correctly, that if you match enough random samples against each other, something is virtually guaranteed to randomly match to something else. It’s quite another to extrapolate that the odds that someone will randomly match to somebody (a near certainty, given the cross product of all the records involved) apply to the odds that someone will randomly match to the one DNA sample at issue in a particular case. They’re not in the same league.

    Xrlq (b71926)

  23. NK:

    j curtis #14,
    You’re way in over your depth, here.

    I’m not sure if “here” refers to this discussion, this blog, Earth, or somewhere else, but regardless of the definition I agree with your statement wholeheartedly. Like Dumptaster, j curtis is one of those “conservatives” I’d like to convert to liberalism so they could run around discrediting someone else for a change.

    Xrlq (b71926)

  24. I think you’re right that the LA Times phrasing is wrong.

    But I think the point they’re trying to make is valid.

    If there’s a 1/3 chance of getting a match with a database full of innocents, then the fact that Puckett matched at 5.5 loci is inadequate for a conviction.

    It’s probably adequate probable cause to look for further evidence, but if the DB match is the only evidence you don’t have him beyond a reasonable doubt.

    Certainly it should be phrased differently, both in the Times and to the Jury.

    See how this sounds.

    There is not a 1/3 probability that the man is innocent; that probability is unknowable. But there IS a 1/3 probability that the DB search does not indicate guilt.

    Sam (c71bb1)

  25. You’re way in over your depth, here.

    Comment by nk

    About what? Your comment seems trollish if you can’t prove your point.

    j curtis (c84b9e)

  26. Thanks xrlq, most of your response appears constructive. I can tell your reading was as close as you presume mine was.

    If I found a match of 144 (100 of which would be expected, 44 unexpected) in a sample of 100K matching on 9 loci, then if DNA were degraded to just those 9 loci, there would be 144 matches…

    So then the key would be what are the odds of the sample being degraded to those specific 9 loci. And then how do those match to the odds the FBI attributes to it. If each locus has particular occurrence odds and are orthogonal from other loci, then I would expect rare things to be rare, not common. So something else is at work.

    If my DNA were in a database somewhere, and I matched 144 different people on 9 loci, I would certainly hope that those other 144 were not violent.

    It bothers me that the FBI doesn’t want any culling of those DNA databases. I trust it could be done in an anonymous way to protect privacy while verifying the FBI statistics. So what are their objections? Get a few people to validate what can be.

    EntropyIncreases (10b345)

  27. EntropyIncreases,

    You are simply one of many victims of the LAT’s misleading presentation of the Arizona database problem. The number of expected matches is not very different from the number that was actually found. Scroll down my main page and you’ll see an explanatory post. David Kaye has more on his blog as well, very recently.

    Xrlq argued that the LAT presentation wasn’t misleading. You are living proof that it was.

    What you are overlooking is that when every sample in a 65,000 person database is compared to every other sample, there are over 2 billion comparisons done. Plus, with 9 loci, there are over 700 ways there can be a match (because there are more than 9 loci). Ultimately, the numbers are very close (given the huge products involved) to what the FBI’s product rule would predict.
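
    A two-line check of the “over 2 billion” and “over 700” figures, assuming the 13-locus profiles discussed elsewhere in this thread:

        from math import comb

        print(comb(65_000, 2))   # 2,112,467,500 pairwise comparisons in a 65,000-profile database
        print(comb(13, 9))       # 715 ways to choose which 9 of 13 loci agree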

    Patterico (6c454a)

  28. Xrlq is close to articulating the problem here–is the evidence identifying, or merely corroborative, and are you talking random populations, vs. non-random?

    Where the evidence is corroborative, i.e. there are other, independent pieces of evidence showing that a person is guilty, then the question really is, “what are the odds that a completely randomly selected person, with all these other pieces of evidence tending to show guilt, would not have a match”. Then it makes sense to say that the odds would be very, very small.

    The problem, statistically, is very different when it is the database that leads one to the suspect, and that database is comprised of felons, which is, in no way, a random database of the population because of sex and racial skewing. Then you have to ask, what are the odds that there is any SINGLE match IN THAT DATABASE. Of course, there will be different odds as to two matches, three matches, etc.

    Now, the problem here is that the statistics at issue are talking about different things. One is likelihood of a corroborative match in a random population, the second as a primary identifying match among a non-random database.

    I will say that the LA Times article could have done a better job of laying out the issues, but I don’t think non-statistically inclined people would have come out less confused. Statistical analysis out of non-random populations is very difficult and, like the Monty Hall paradox, yield counter-intuitive results, and felon databases are non-random. Where the matches are only partial, because of degradation, you have very difficult questions of statistical analysis.

    Cyrus

    Cyrus Sanai (4df861)

  29. The Virus returns. Maybe David Petranos Esp and MKDP will join us too. Please, please, please, please, please, please, pretty please.

    JD (75f5c3)

  30. Please don’t, JD.

    nk (e69fdd)

  31. For a further explanation (using very rough numbers) see here. It explains where I got that “over 700” number.

    Patterico (961c14)

  32. Sorry guys, couldn’t resist. I will go back to learning. Y’all are some pretty bright individuals. I love just reading these threads.

    JD (75f5c3)

  33. Xrlq,

    I agree with Ira that it doesn’t seem that you can analogize to Monty Hall, because you don’t know that the prize is behind any of the doors.

    Patterico (114736)

  34. Is Cyrus Sanai also Ira?

    DRJ (a5243f)

  35. #31, #33

    The point is not the prize, it’s the lack of randomness in Monty Hall.

    As for the article linked in #31, this pretty much illustrates the point. On low loci matches, it is wrong to say that person X has a 1 in 1.1 million chance of being the ONLY person who matches. On the much more unlikely 9 loci match, the Arizona study shows 122 matches out of a 65,000 population. That means if you were scouring the database for a nine loci match, you would get 122 names. From that, a cold hit investigator picks one guy to prosecute, and btw, he’s a felon!

    So the question is, what does the match tell us about certainty? Using the 9 loci Arizona example, look at the different numbers depending on the question asked.

    The odds that a cold hit investigation is bringing a case against someone who matches are 100%!

    The odds that the defendant is not guilty, assuming the crime was committed by someone in the database: 122/65,000 (about one in 500).

    The odds that a randomly selected individual matches any person in the population of 65,000? Not surprisingly, it depends on the population of the other pool and whether the 65,000 is a true subset or not.

    Now, in the five loci situation, which is what was at issue in the LA Times article, I find it perfectly reasonable to believe that 1/3 of the entries would have a match to another, which is the LA Times claim.

    What does this mean? Using a five loci match, then picking the first guy who looks good on it, is not assurance that some other felon is not just as good. The better the match, the better the odds. However, using low loci matches out of felon databases can create a misidentification, since the selection of the defendant is about as far from a random selection out of a random population as possible.

    Low loci matches, like any high frequency population characteristic, are obviously a very, very useful investigative tool. But it is the attempt to bring them into the courtroom to prove certainty that is creating some of the confusion. A high loci DNA match is something qualitatively different from a low loci match; by entering low loci matches into the court, the prosecution is to an extent undermining the credibility of the technique.

    That being said, the area does cry out for better study and legal decision-making as to where to draw the line on partial match evidence.

    Are there guidelines in the LADA office, Patterico? What’s the lowest number of loci matches you would go with out of the California database?

    Cyrus Sanai (4df861)

  36. “If my DNA were in a database somewhere, and I matched 144 different people on 9 loci, I would certainly hope that those other 144 were not violent. ”

    No! No sample of DNA matched 144 other samples in the Arizona database.

    There were 144 matching pairs at 9 or more loci (122 at 9). Each pair did not match each other pair. There were not 144 or 122 people with the same DNA at 9 loci.

    Please learn more about the facts. Perhaps then you will not draw incorrect conclusions.

    Patterico (9118cc)

  37. DRJ # 34: I think that Pat, in comment # 33, was probably responding to my comment #139 in the “Jurors, Likely Misled by L.A. Times, Acquit Man Accused of Sexually Assaulting an Elderly Woman” thread.

    Ira (28a423)

  38. Thanks, Ira. I’m having trouble keeping up.

    DRJ (a5243f)

  39. “On the much more unlikely 9 loci match, the Arizona study shows 122 matches out of a 65,000 population. That means if you were scouring the database for a nine loci match, you would get 122 names.”

    Flatly false. You’re 100 percent wrong. Guess you got misled by the LAT too. That’s OK. You’re not alone.

    Patterico (58f30f)

  40. #39

    I’m starting to think you are the one who does not grasp the statistics.

    In the article you linked to, the author states that in a database of 65,000, there are 122 pairs of people who match at 9 loci.

    So if you look for the number of instances of a pair of nine loci matches, you get 122 pairs (actually 244 individual names, but it’s only meaningful in matches).

    What do you think that number means?

    Cyrus Sanai (4df861)

  41. I think it means that out of about 2.1 billion pairs of samples in the database, there were 122 pairs that matched at nine loci. I infer that the probability that a randomly selected pair shows nine matching loci is 122/2.1 billion, or about 1 in 17.3 million. I further infer that the probability that a randomly selected profile matches some other profile in the database simply due to chance is about 1 − exp(−65,000/17.3 million), or about 1 in 266.
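
    In code, that back-of-the-envelope inference looks like this:

        from math import comb, exp

        n = 65_000
        pairs = comb(n, 2)                 # ~2.11 billion pairs of samples
        p_pair = 122 / pairs               # ~1 in 17.3 million per random pair
        p_someone = 1 - exp(-n * p_pair)   # ~1 in 266: a given profile matches some other profile
        print(1 / p_pair, 1 / p_someone)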

    W. Krebs (55a367)

  42. In the article you linked to, the author states that in a database of 65,000, there are 122 pairs of people who match at 9 loci.

    That’s entirely different from “That means if you were scouring the database for a nine loci match, you would get 122 names.”

    All it means, as far as I can see, is that if you only look at nine loci you will get two people out of 65,000. If you look at another nine loci you will get two more people. If you look at another nine, two more. And so on, and so forth, until you get to 122 pairs.

    Am I wrong?

    nk (e69fdd)

  43. #41

    Database we are talking about is the AZ database in the link Patterico gave. That’s 65,000 felon names.

    #42

    I don’t think that’s what the chart says. I read it as stating that in a database of 65,000 names, there are 122 matched pairs (244 names) at nine loci.

    However, this exchange I think brings up part of the problem in the LA Times story, Patterico’s complaint, and the area as a whole. Absent using statistical notation, it can be very confusing to discuss these matters, and even when statistical notation is used, experts can debate what the appropriate question is.

    When I studied statistics academically, I learned the three sources of problems that most commonly afflict statistical studies:

    1. Lack of randomness in base population and failure to properly correct for it.

    2. No means of directly measuring quality being analyzed, or bad evidence of same even if measurable.

    3. Lack of rigor in framing the question to be answered.

    My interest was in economics, but all statistics involving human behavior have the same issues. In these DNA tests, you always have 1, you have 2 in the case of degraded DNA, and it is 3 that is the subject of the LA Times article and Patterico’s criticisms.

    Interestingly enough, this subject was discussed at the Ninth Circuit’s Judicial Conference, which I covered, and Peter Neufeld of the Innocence Project had some interesting comments about how bad or dishonest DNA testing had, in the past, led to convictions of the wrong guy. Not a problem with the technology as much as how the tests were carried out and what they were represented to mean.

    Cyrus Sanai (4df861)

  44. I read it as stating that in a database of 65,000 names, there are 122 matched pairs (244 names) at nine loci.

    Doesn’t “matched pairs” imply that there are nine loci for every pair, up to 122 pairs, out of 65,000 people, but there is no third person who fits into the exact same nine loci? So like I said above, if you take 65,000 people and look at any nine loci, you will find two people who share them. Vary the loci but stay at nine and you will find two more. Vary them again, staying at nine, and two more. You run out at 122 pairs?

    nk (e69fdd)

  45. 44

    Think of it this way. You have 17.3 million boxes. You dump 65,000 people at random into the boxes. Then most boxes remain empty. Of the remaining boxes, most have exactly one person in them, but a few (about 122) have 2. The expected number of boxes with 3 people is about 0.15, so there most likely won’t be any.
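
    A quick check of the box arithmetic, using the 1-in-17.3-million rate from #41:

        from math import comb

        boxes = 17_300_000   # distinct 9-locus "boxes" at the estimated match rate
        people = 65_000

        print(comb(people, 2) / boxes)        # ~122 expected boxes with 2 people
        print(comb(people, 3) / boxes ** 2)   # ~0.15 expected boxes with 3 people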

    James B. Shearer (fc887e)

  46. That’s not useful. What if I were searching for a liver to transplant instead of a rapist?

    nk (e69fdd)

  47. nk,

    I think you have it exactly right. The facts have been misstated by Sanai and EntropyIncreases (I assume not intentionally) as saying something different: that if you start with a profile with nine loci, and run it through a database, you’ll get 122 matches. That is decidedly *not* what was found, although many have misinterpreted it in that way.

    Patterico (05a609)

  48. 46

    You would use a different database intended for that purpose.

    James B. Shearer (fc887e)

  49. Thank you. Somebody who knows what she’s talking about told me that these “samples” (databases) are very useful if you want to study criminals. They are not useful for anything else. Including a probability that someone is the perpetrator of a crime.

    nk (e69fdd)

  50. There are two separate issues here. One is the chance that the individual found by a data base search is in fact not the culprit. The other is the risk that a database search will come up with a false match.

    Usually a database will only cover a small part of the population. Most of the risk of an erroneous match comes from the possibility that the correct match is to someone not in the database. Increasing the size of the database actually reduces this risk, but not significantly until most of the relevant population is in the database.

    What increasing the size of the database does is increase the chance of either a correct or incorrect match. It does not increase the chance of any given match being incorrect. If the number of likely matches outside the database is negligible then no harm is done. If it is not then increasing the size of the database increases the chance that you will put an innocent person on trial rather than putting no one on trial at all as well as increasing the chance of putting the guilty person on trial.

    There is a temptation to treat a database search as a magic wand to solve a cold case. If the DNA is highly degraded then the results should be treated with caution. You may need other evidence before you can be sure enough that you have found the culprit.

    Lloyd Flack (e48d4e)

  51. #43

    Given 65,000 names in the AZ database, there will be 65,000*64,999/2 = 2,112,467,500 pairs of names in the database.

    I should add that I assume that the database was searched exhaustively for all possible 9-locus pairs.

    #45

    Given that the expected number of triple matches of nine loci is 0.15, the probability of seeing one or more such matches is also about 0.15.

    W. Krebs (55a367)

  52. wait, has there ever been a case where someone was tried and convicted with having a dna hit against them, and this person was later shown to be innocent?

    funny thing about that, the pro-criminal community would never want that to happen because it cuts at the premise of their “innocent project” schemes. Where they are sneaking trick DNA samples into old case files.

    j curtis (c84b9e)

  53. Where they are sneaking trick DNA samples into old case files.

    What are you drinking? Whatever it is, I want to avoid it.

    Lloyd Flack (e48d4e)

  54. #47

    Patterico,

    No, that is absolutely not what I am contending or wrote. You are not reading what I stated correctly.

    I am stating that according to the stats on that page, in one database, out of 65,000 sets of DNA, there are 244 persons whose DNA matches another’s at nine loci, according to a particular statistical examination.

    That’s not my evidence, it is what the data is said to represent.

    I am not saying that if you run any particular set of DNA, you will get 244 (or even 122) nine loci matches, based on that representation, though that appears to be what you THINK or WANT me to be stating. What I wrote is that out of any run where a match occurred at nine loci, there is a 1 in 500 (approximately) chance that there is a second person in that database who also matches, again based on THAT data. 499 times out of 500, the match would be unique in that 65,000 person database. But as you make the search more general, the number of additional matches goes up exponentially. That’s why the author of the linked article wanted to talk about nine loci (very small numbers of matches) as opposed to five loci (the subject of the LA Times article).

    It is statistically a very different thing to state the chances that A will match B to some degree out of a database, versus the chances that A will match any one of the database samples to some degree, versus the number of times there will be two samples matching to some degree in that database.

    The issue is that you want to talk only about the first statement, which is the chance that out of living people, a randomly chosen DNA sample from anyone on earth might match a randomly chosen DNA sample in a database of Y felon samples to X loci. That number is really, really small, and to you, that’s the only number you think is important.

    But this is NOT a random sample, it is NOT a random database, and the question is not merely, what are the odds of a match, but also what are the odds this is the ONLY match on a partial DNA test (which is different, by the way, from a false match or false positive–that is just error). That’s why the pairing analysis is important.

    The reality is that IF there is one match in a database on a cold case, that person becomes a suspect. The chances of any felon’s match against a random population: minute. But the next question is, again, what are the odds that he is the ONLY person who matches to that degree, either in the database or the general population.

    Frankly, the only way to cleanly determine the correct statistical odds is to test the felon database against a much larger sample of which the felon database is a subset. In the absence of that, felon-database-to-felon-database analysis is what is left; obviously, that analysis OVERSTATES the number of possible incomplete matches, though to what extent you don’t know until you test it.

    I return again to my statement about improper use of statistics: one of the three major sources of error is looking at the wrong question. On partial match DNA, the question of how many other persons might also be matches is a question just as important as the odds of that particular defendant matching that particular sample to X loci.

    Cyrus Sanai (4df861)

  55. has there ever been a case where someone was tried and convicted with having a dna hit against them, and this person was later shown to be innocent?

    What are you drinking? Whatever it is, I want to avoid it.

    I just want to learn about the example so I can read up about it. Are you ready to present the example now?

    j curtis (c84b9e)

  56. #55

    According to Peter Neufeld of the Innocence Project at the Ninth Circuit’s recent judicial conference talk, there have been such instances, but the tests were improperly done, so I gathered it was possible to go back and redo them correctly. That does not go to the reliability of the evidence though–badly done or dishonestly done tests are always a risk.

    Again, we are talking about PARTIAL MATCH evidence, where the number of matches is low. That does not reflect on full (or nearly full) match evidence.

    Part of where I kind of agree with Patterico about the LA Times articles is that the distinction needs to be made clear that the questions arise with degraded DNA samples, and NOT with full or nearly full match. Properly done full match evidence is very strong. The question that arises is where the match is not so good, and that subject–how good is good enough–is an important one. But again, the back and forth on this board, and the confusion Patterico shows about my contentions, shows the problem is the very fine distinctions at play here.

    The other point, again, is whether the DNA evidence is corroborative vs. identifying. In the SF case, it appeared that the evidence was, along with the guy’s record, the only evidence linking him to the crime, and what led to him. That is very different from a situation where you have a suspect for other reasons, and bingo, that guy is a match. Where the DNA is the prime identifying evidence with nothing else pointing to him, you MUST look at the chance that some other felon or person might match. If there is already other evidence that excludes most other people, then it is a different situation.

    Cyrus Sanai (4df861)

  57. “On the much more unlikely 9 loci match, the Arizona study shows 122 matches out of a 65,000 population. That means if you were scouring the database for a nine loci match, you would get 122 names. From that, a cold hit investigator picks one guy to prosecute, and btw, he’s a felon!”

    Sure makes it sound like you’re saying that if you have a nine loci profile and run it through a database, you’ll get 122 names. If that’s not what you meant, blame yourself for poor phrasing, and not me for reading your ambiguous language differently from the way you now claim you intended it to be read.

    Patterico (cc3b34)

  58. “But again, the back and forth on this board, and the confusion Patterico shows about my contentions, shows the problem is the very fine distinctions at play here.”

    That’s rich. You use imprecise and ambiguous language (which is also undeniably wrong, as under no scenario are there 122 “names” but rather “pairs”) and blame *me* for the confusion.

    Patterico (2783f9)

  59. Cyrus is never imprecise, ambiguous or wrong. Have you learned nothing Patterico?

    How dare you!

    daleyrocks (d9ec17)

  60. Again, we are talking about PARTIAL MATCH evidence, where the number of matches is low.

    Do you have any examples of convictions based on partial match evidence without any corroborative evidence, Cyrus Sanai?

    j curtis (c84b9e)

  61. #57
    Patterico,

    If you had read the entirety of my post, particularly concerning the 1 in 500 issue, that would have made it clear that I was stating that in 1 of 500 cases there would be a second match, not 122 matches for every nine loci run.

    Taken out of context, yeah, the sentence you quoted could be interpreted as you did, which is why any discussion on this issue needs to have examples or statistical formulas, which I provided.

    One more point.

    In any complicated case, it is rarely the case that all the evidence points one way, and sometimes the evidence points in multiple directions. However, law enforcement, once they have decided who the perp is, then focus their investigation on building the case against the individual and putting all contrary evidence in the category of “things we have to disclose and disprove”. This bias is rational, since law enforcement can’t infinitely investigate every lead and have to conserve resources; but it is what leads to the Steven Hatfill situations, innocents labeled as guilty.

    The use of highly degraded DNA evidence to identify perpetrators carries this risk in spades; once the match is made, the police then will do everything they can to prove guilt, even if there are other possibilities, and this leads to the possibility of convictions of innocent people. The question that needs to be analyzed is, how many matches are good enough given the size of the databases, the skewed sample, etc. And good enough means, “has a small enough chance that there is at least one other match that courts should find acceptable.” Is one in 500, as seems to be the case in nine loci matches out of the AZ database, good enough? Maybe. Five loci matches in the California database good enough? The LA Times story suggested “no”. Patterico disagrees, but he is a prosecutor, so he would.

    I don’t have an opinion on where the line should be drawn, since I don’t have access to the data. I do know that the answer will be different for each database, and I do know that the question to be asked is not “What are the odds that a random person’s DNA matches in this number of loci to the degraded sample”, but rather, “What are the odds that the DNA match of this defendant in this number of loci to the degraded sample is the ONLY match out of the relevant population”.

    Now, I’m not sure where Kaye would end up on this. I think he is right that the denominator is not 3. If one out of every three samples at five loci has a paired match in the database, and the database is “guilty” (i.e., someone in the database committed the crime), then the denominator is something around 6 to 5 (since at that number there will be cases of triplets, quadruplets, etc.)

    But I think he is wrong to say that the number of interest to the jury, where the evidence is identifying, is the odds that a random sample had that match. It would be different if the evidence was corroborative. I can’t tell from the story which is the case, but maybe Patterico knows or infers more than I do or can.

    Cyrus Sanai (4df861)

  62. Look, this is just the LA Times. You can’t expect them to be as good journalistically as, say, the National Enquirer. Maybe the Enquirer will purchase the Times and we’ll have more believable news in this city.

    Jack (bd6c91)

  63. Cyrus,

    You make good points but I’m sad to say that “why didn’t the police look there instead of here?” does not work as a defense. I suppose that I could, conceivably, Perry Mason style, send Paul Drake out to investigate the other suspect and dramatically convict him while acquitting my client but that’s not the reality.

    I admit that it has been successful in a handful of death penalty cases in post-conviction proceedings but those are rare exceptions.

    nk (e69fdd)

  64. I’m getting a big kick out of Cyrus lecturing an Assistant District Attorney about what law enforcement does. The pretentiousness, it is amazing.

    daleyrocks (d9ec17)

  65. “Five loci matches in the California database good enough? The LA Times story suggested “no”. Patterico disagrees, but he is a prosecutor, so he would.”

    Citation for your claims in the quoted passage regarding my opinion that a five-loci match in the California database is “good enough.”

    You also misstate what law enforcement does. In close cases, I constantly re-evaluate the evidence to make sure I have the right person.

    So I don’t really appreciate your putting words in my mouth re my opinions on the sufficiency of matches, nor your ignorant pontifications about how we in law enforcement do our jobs.

    Patterico (5e161d)

  66. That’s fine, daleyrocks. I’m learning from Patterico, Xrlq, Cyrus and James B. Shearer. Mostly how to talk about DNA in a way people understand. I’m basically a slow person so the more this goes on the better for me.

    nk (e69fdd)

  67. Fine with me nk. I don’t like the condescension.

    daleyrocks (d9ec17)

  68. Funny; neither do I. Especially coming from someone who has no idea what he’s talking about.

    Patterico (cc3b34)

  69. #65

    Point taken about my inference from your criticisms. That leads back to the question that I posed that you did not answer: what does the LADA consider adequate?

    I DO NOT misstate what law enforcement does. My brother is in it, I know many former prosecutors, and there is no lack of journalistic descriptions of the process. The Steven Hatfill case shows how this process goes awry. I can send you some good articles discussing how the FBI went down the wrong path on this.

    Also, your attitude is the snotty one, Patterico. You have never worked as a journalist, but you presume to make criticisms of people who do it for a living. Why? Because you believe that your law enforcement expertise gives you particular knowledge that can be applied to a different information gathering and persuasion activity.

    Your qualifications for your job are the same as mine: a law degree, and my credentials beat most. Also, I’ve done academic work in statistics.

    But more important, nothing I say here about law enforcement is a secret, or even original. If you don’t recognize where the law enforcement investigation process can go awry, you cannot be doing your job properly in terms of exercising prosecutorial discretion.

    See http://www.latimes.com/news/science/la-na-probe29-2008jun29,0,5805080.story

    Cyrus Sanai (4df861)

  70. There is an interesting habeas corpus case in which a Ninth Circuit panel freed a convicted killer prosecuted by Patterico’s office, on grounds that the confession may have been illegally obtained and that the LA Superior Court judge (and thus by implication the DA’s office) failed to consider the evidence as to the illegal acquisition of the confession (which was litigated directly in the original state proceedings). See Taylor v. Maddox, 366 F.3d 992 (9th Cir. 2004). The author of the opinion castigating LA County law enforcement? None other than this blog’s fave, Alex Kozinski.

    This was a case where the cops decided who was guilty, and then according to Kozinski, kept grinding on the defendant until he gave a confession. This is precisely the kind of blinkering that Patterico denies happens in his office. Well, Judge Kozinski and two of his colleagues do not agree.

    But hey, I guess Judge Kozinski does not know what he is talking about either when it comes to law enforcement.

    Cyrus Sanai (4df861)

  71. In comment #52, j curtis asks, “has there ever been a case where someone was tried and convicted with having a dna hit against them, and this person was later shown to be innocent?” Here is a web page reporting on a case of a false match based on a six-loci test:

    http://www.forensic-evidence.com/site/EVID/EL_DNAerror.html

    The suspect was a match under the six-loci test, and the British police were certain they had their man. The suspect apparently was incarcerated for 6 months and wasn’t freed until after the DNA test was redone to include additional loci pursuant to a demand by his counsel.

    Ira (28a423)

  72. “This is precisely the kind of blinkering that Patterico denies happens in his office.”

    Cyrus – Can you demonstrate where Patterico says this?

    Are you a criminal lawyer or do you just know some and watch them on TV?

    daleyrocks (d9ec17)

  73. As a follow-up to my comment #71, I think that whenever a prosecution is relying on DNA to identify a suspect, the judge and jury need to be told how many people are likely walking around with DNA which would also match the test sample. This way we can avoid misinterpretation of the “one chance in whatever” expressions of the statistics.

    Ira (28a423)

  74. Ira #71,

    Well, we had a father here held in jail for nine months as a suspect in his toddler’s rape and murder before the crime lab processed the rapist’s semen and proved it was not his.

    nk (e69fdd)

  75. Sanai says:

    However, law enforcement, once they have decided who the perp is, then focus their investigation on building the case against the individual and putting all contrary evidence in the category of “things we have to disclose and disprove”.

    Note the categorical nature of the indictment. Then he says (my emphasis):

    If you don’t recognize where the law enforcement investigation process can go awry, you cannot be doing your job properly in terms of exercising prosecutorial discretion.

    That’s called moving the goalposts, Mr. Sanai. We’re not stupid here, and we can see when someone is making a dishonest argument.

    I recognize where law enforcement CAN go awry. I have never, ever denied that — and indeed I have done posts about that before.

    What I contest is your ignorant assertion about what law enforcement does — phrased in a sweeping way so as to sound categorical.

    Your qualifications for your job are the same as mine: a law degree, and my credentials beat most.

    How many years of law enforcement experience do you have, Mr. Sanai? I rather suspect that it is less than mine (over 10 years), and very probably none at all.

    It’s great that you have a brother who has something to do with law enforcement, and that you know some former prosecutors, and that you read about it from time to time. If you think that means that your knowledge stacks up against mine, you go ahead and make that argument and we’ll let the audience decide.

    As for a comparison to my criticism of journalists: if an experienced journalist tells me that I misunderstand something about the basics of how journalists gather information, I’ll listen to that person and see if I can learn something. I won’t allow them to use their superior experience to justify distorting the facts, and indeed, pointing out distortions, omissions, and misrepresentations is what I generally do here. But I wouldn’t pretend to know more about journalism than an experienced journalist. I simply don’t.

    Ultimately, if you had said that law enforcement sometimes goes awry, I would have had no problem with your claims. But you come from the realm of the civil lawyer, where every claim must be sweeping and overstated. So you painted with a broad brush, accused me of opinions I don’t have, and made categorical claims about my profession that aren’t always true.

    In short, my friend, you’d do well to show a little more humility. As for who is being snotty, I’ll leave that one to the audience to decide as well.

    Patterico (cc3b34)

  76. It just keeps getting better. Like Captain Ahab, Cyrus summons his Great White Whale, Judge Kozinski, from the deep. Hilarious.

    daleyrocks (d9ec17)

  77. Here is a recent article on Slate about forensic science error and what can be done about it. Note the section on cognitive error.

    http://www.slate.com/id/2197284/

    Cyrus Sanai (4df861)

  78. Mr. Sanai,

    Could you draw on your vast statistical expertise to explain just what in the world you mean by this?

    The odds that the defendant is not guilty assuming the crime was committed by someone in the database [are] 122/65,000 (about one in 500).

    What exactly is the point of dividing 65,000 by 122?

    Could you please elaborate on this at length? Try to leave pomposity and personal shots at others completely out of your answer, and just explain the mathematical purpose of dividing 65,000 by 122.

    Never mind that it makes no real-world sense at all to *assume* the crime was committed by someone in the database. I still want to hear the reasoning behind dividing that one number by the other number. Because I can’t make sense of it, any way you slice it.

    There aren’t 65,000 pairs of names in the database. As someone pointed out above, there are over 2 billion. And that’s before you factor in the fact that there are over 700 ways to combine nine loci.

    Nor are there 122 people in the database who pair up at nine loci. If there are 122 pairs, and each pair is to a different profile, then there are 244 people who pair up with another person.

    So I’m still left befuddled as to the reason someone would divide 65,000 by 122, and what such a person thinks the result would mean.

    I’m not saying there’s no point to it. I’m just saying that, if there is, you haven’t clearly explained it yet.

    Patterico (cc3b34)

  79. As it stands, the 1 in 500 number sounds a lot like the “analysis” of this editorial, which also claims

    If 600,000 other Americans were walking around with the same DNA — as measured by the “9-locus test” — should such a “match” be sufficient to send a man to prison for life?

    To get 600,000, they apparently divided 300,000,000 (the number of Americans) by 500 — suggesting that they thought that if you ran a single profile through a database, you’d get 122 hits (1 in 532, or roughly 1 in 500), and if you ran that profile through a complete database of all Americans, you’d get 600,000 hits.

    This suggests that people who cite a 1 in 500 number, as you did, are expressing exactly the fallacy I accused you of expressing.

    But feel free to clarify. Do you think the editors of the Las Vegas Review-Journal are correct to talk about 600,000 people walking around with the same DNA?
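
    For what it’s worth, here is a back-of-the-envelope Python sketch contrasting the editorial’s apparent arithmetic with what the Arizona pair counts discussed in this thread would imply, setting aside the question of which nine loci match:

    # The editorial's apparent arithmetic: treat "1 in 500" as a
    # population frequency and scale it to the U.S. population.
    us_population = 300_000_000
    print(us_population / 500)               # 600,000.0

    # The Arizona figures suggest something very different: 122 matching
    # pairs out of roughly 65,000 * 64,999 / 2 pairwise comparisons gives
    # a per-pair match probability, and THAT is the number to scale.
    pairs = 65_000 * 64_999 // 2             # about 2.1 billion pairs
    per_pair_prob = 122 / pairs              # about 1 in 17 million
    print(round(us_population * per_pair_prob))   # about 17, not 600,000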

    Patterico (cc3b34)

  80. Cyrus, is your brother still Assistant County Counsel up in Yamhill County, or are you speaking of other “law enforcement” experience?

    daleyrocks (d9ec17)

  81. 71-Ira

    Although such an incident seems plausible, it surprises me that no serious papers seem to have been written about it, considering that it would seem to be something criminal defense lawyers would want to reference often.

    What was the name of the innocent guy who was arrested and what was his lawyer’s name? I would expect that at least the supposedly heroic lawyer’s name would be mentioned in at least one story about this.

    j curtis (c84b9e)

  82. 81

    The guy’s name is Raymond Easton. Note he spent a few hours (not six months) in jail before being bailed out and eventually cleared.

    James B. Shearer (fc887e)

  83. j curtis in #81 writes, “What was the name of the innocent guy who was arrested and what was his lawyer’s name? I would expect that at least the supposedly heroic lawyer’s name would be mentioned in at least one story about this.” I agree, one would.
     In any event, the debate here and in a related thread on Patterico’s blog shows that the bandying about of “one in whatever” statistics may be misleading regardless of who is pushing them. The real statistic that should be discussed is how many people would be expected to actually match the loci in the specimen. If 10,000 people would match, that’s not a helpful match in determining guilt. If only one or two persons would be expected to match, then we’re on to something.

    Ira (28a423)

  84. Cyrus?

    Patterico (cc3b34)

  85. Was my question too hard, Cyrus?

    Patterico (cc3b34)

  86. Does this make sense:

    When we get a match from a crime scene sample against a felon in a database, we proceed more “validly” if we do a two-stage process.

    Stage 1, law enforcement calculates the probability that the database gave them a good suspect so they can focus their attention and resources on him.

    Stage 2, at trial, the match is more “valid” if the database is kept out of the picture and we look at the occurrence of those crime-scene markers in the entire population. (Because the database is not designed to reflect the entire population.)

    Using my liver transplant example, we have a database of donors and we find the one that best matches the donee. We take it to the immunologist and he says a) “Wonderful” or b) “Sorry, that is in fact the closest matching donor we have but not close enough. The liver will be rejected because we still do not have the compatibility needed”.

    nk (e69fdd)

  87. that *may be in fact* the closest matching donor we have

    nk (e69fdd)

  88. I think that approach would work, with the caveat that in this case, no version of RMP should be presented to the jury. But if you’re going to tell them anything about the likelihood of generating a false positive, then whatever you tell them about those odds absolutely MUST take the database effect into account, whether jurors are told about the database itself or not.

    Xrlq (b71926)

  89. Patterico – Cyrus has a history on this blog of taking a powder to avoid admitting he’s wrong or when the questions become too embarrassing. That’s just who he is.

    daleyrocks (d9ec17)

  90. I mentioned earlier the father who stayed in jail for nine months before the Illinois crime lab processed the crime-scene DNA. So I can see where you might want the already-processed DNA in the database as evidence, to avoid having a trial a year and a half or more after the crime was committed and more than nine months after the defendant was arrested. But if you have a good chain of evidence for the database information, which can be established by a motion in limine, for or against admission, outside the presence of the jury, the trier of fact need not concern itself with whether the match was from a database or from a recent sample from the defendant.

    😉 And a daughter story almost on topic. I bought my daughter an “EyeClops” which is a magnifying camera that you connect to your TV. She asked for a toothpick so she can get some cells from her mouth to look at with it. She is only six. Where did she get that idea from?

    nk (e69fdd)

  91. 82: “Note he spent a few hours (not six months) in jail before being bailed out and eventually cleared.”

    Unreal. The clown who wrote that piece just felt that it was more advantageous for his agenda to have the innocent dude spending six months in jail instead of a mere few hours. No way it could have been an accidental typo.

    j curtis (c84b9e)

  92. Well, he may just be busy. I’ll give him the benefit of the doubt.

    Meanwhile, while we wait, if anyone else has a clue what Cyrus is talking about, feel free to chime in.

    Patterico (ab9e29)

  93. Meanwhile, while we wait, if anyone else has a clue what Cyrus is talking about, feel free to chime in.

    I believe that

    1) he still insists that 122 pairs matching at nine loci found in a database of 65,000 are the same as 244 individuals with the same nine loci. See my comments above.

    2) He has no clue about what a criminal trial with even only a minimally competent defense attorney is like. Let alone with even only a minimally ethical prosecutor in a horrendously overworked jurisdiction who does not have the luxury, even if he had the inclination, to railroad anyone. The Nifongs are few and far between, and even a doofus like me can beat them.

    nk (e69fdd)

  94. And to forestall nonsense about overcharging, I know that someone who reaches into my car to steal my cell phone can be charged with as little as attempted theft, punishable by up to six months; burglary of an auto, by up to five years; or robbery, by up to ten years. Write to your legislator.

    nk (e69fdd)

  95. “Well, he may just be busy. I’ll give him the benefit of the doubt.”

    Patterico – No doubt when he decides to favor us again with his presence he will tell us he was tied up saving the planet or some such other minor inconvenience. The funny thing is I wouldn’t have thought it would take a man with such a lofty opinion of himself this long to save the planet.

    daleyrocks (d9ec17)

  96.      j curtis in #91 (about ## 71 and 82) writes, “The clown who wrote that piece just felt that it was more advantageous for his agenda to have the innocent dude spending six months in jail instead of a mere few hours. No way it could have been an accidental typo.” jc, you’re right, it could not be a typo. The piece says, “It is only when the suspect’s solicitor demanded a retest using additional markers, after the suspect had been in jail for months, that further testing was done,” and “The U.K.’s Daily News of February 11, 2000, reported that when the mistake was discovered some six months later, the arrestee was released without an apology and given a brief letter stating that charges against him were being dropped because ‘there was not enough evidence to provide a realistic chance of conviction.’”
         According to
    http://news.bbc.co.uk/2/hi/uk_news/636209.stm
    the publication which originally published the story was the Daily Mail.
         I could not find the original story. According to the BBC web page, this is the story (and note that the writer should have used the word “precise” instead of the word “accurate”):

    DNA matching failure

    The failure of a supposedly “foolproof” DNA test is highlighted by the Daily Mail.

    The paper reports that Raymond Easton from Swindon in Wiltshire is suing the Greater Manchester police after being accused of a burglary in Bolton.

    Police said his DNA matched samples found at the scene of the crime and said the chance of him being the wrong man was 37,000,000 to one.

    But when his solicitor demanded that a more sophisticated DNA test be carried out, it put him in the clear.

    The more basic test has now been dropped, in favour of the more accurate version. The Mail says the FBI, which uses a similar DNA profiling technique, believes the implications of the case are “mind-blowing”.
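
     By the same population-times-RMP arithmetic sketched earlier in the thread, a 37,000,000-to-one figure is not as airtight as it sounds once a whole database is trawled. A quick Python sketch, using a rough U.K. population figure that is my assumption, not a number from the story:

    uk_population = 60_000_000    # rough assumption, for illustration only
    rmp = 1 / 37_000_000          # the figure quoted by the police

    # Expected number of people in the country matching by sheer chance:
    print(round(uk_population * rmp, 1))   # about 1.6 -- so a database
                                           # trawl can land on an innocent man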

    Ira (28a423)

  97. I’m throwing in this link without comment (for the moment). Well, maybe one comment. I think the problems with forensic science may be larger than just how to describe the probabilities involved.

    Bad science in court

    Karl Lembke (cf197b)

  98. He’s leaving comments on other threads.

    He’s dodging the question because he screwed up.

    Patterico (cc3b34)

  99. 96

    I feel like there is a 35% chance that the incident is urban legend. It has so many of the markings of one.

    The BBC reports the thing as one of those “what other papers are saying” and even has one of those “hey buddy, we’re not responsible for the claims other media outlets make” notices on the page.

    If no incident like this had ever occurred, you would have lawyers trying to invent it…and the invention would look a whole lot like this incident.

    You can invent a fictional victim and give him a name but it would be very difficult for one to invent a fictional constable or solicitor and give him a name in the story because that would make it too easy to trace and impeach.

    j curtis (c84b9e)

  100. This was real. Probably the only thing that went right was the DNA testing, which in the end cleared the father, and even that took much longer than it should have.

    nk (e69fdd)

  101.      j curtis, in comment #99 states that he thinks there is a 35% chance that the Raymond Easton story is urban legend. Hmmm, what does that statistic mean? 😎
         Don’t forget there is the apparently independent follow-up story in the link James Shearer gave us in comment #82.
         nk provides in comment #100 a link to an article about the Riley Fox murder case. DNA is shown once again to be an extremely effective tool to eliminate suspects and an aid to investigation. The real horror of the story is the official negligence and apparent malfeasance.

    Ira (28a423)

  102. Hmmm, what does that statistic mean?

    It’s based on my feeling that I’d need to get 2:1 odds before I’d bet money on it.

    j curtis (c84b9e)

  103. There’s a capital case (nothing I have to do with, thank God) on its way to trial in Illinois. The State has 1) “accomplice” testimony and 2) a nine-loci match from saliva from a chicken bone at the fast-food restaurant where the murders occurred. Illinois gives a jury instruction to the effect that accomplice testimony is untrustworthy and, essentially, that the jury should not convict on the basis of accomplice testimony alone without independent corroboration. Is a nine-loci match corroboration?

    nk (53f203)

  104. #78

    I’m going to respond to this one first, so that the discussion does not get out of hand. Once I get this cleared up, I’ll move on to the others.

    It’s clear we are talking about different databases.

    When I was discussing the hypothetical I presented, I used the mathematical analysis from the link YOU gave, http://dna-view.com/ArizonaMatch.htm. That explicitly refers to a database of 65,000 felons, in which there are 122 nine-loci pair matches out of what must be presumed was a particular base run. If I am misreading that page, please state where.

    So the database is NOT 2 billion. There is NO existing database that carries 2 billion complete DNA entries. I don’t know where you get 2 billion from; I assume that is your base number for all males on earth. But that is not the right database or even denominator. The extent of US jurisdiction is maybe 250 million; you don’t rationally include everyone living in China for a crime committed in Arizona.

    The problem is, and I will say this again, that you are confusing the use of identifying evidence with corroborative evidence. When you are saying, “It is this x-loci DNA match that proves THIS is the guy,” and there is nothing else other than that person’s past felony (which got him in the database), then it is EXTREMELY relevant to the investigator and the jury how many other x-loci matches there may be in a relevant population; you get an idea about this by looking at the x-loci matches in the database. Saying that the relevant statistic is the chance that a random person, selected out of x billion on the earth, would have the same x-loci match is not enough, where there is a statistically significant chance (let’s call it one third in the case of a five-loci match) that even in the small database used, there is at least ONE other match.

    Now, it would be different if the evidence is corroborative. So let’s say in a rape case there are three witnesses seeing the defendant in the area, plus fingerprint evidence, plus a security camera showing the defendant entering the crime scene at the relevant time, and there are only six other people who entered the crime scene at the relevant time. Well, at that point it might be meaningful to say that the odds of a randomly selected person matching this DNA sample at 5 loci are 1 in x million, because the DNA evidence is not what is solely identifying the defendant–there is other evidence, completely independent of the DNA, showing that the particular person might have done it, and cutting out nearly all of the other relevant population. Then the contention you are making is, “What are the odds that a person who entered this crime scene, left his fingerprint on the scene, and was seen by three independent witnesses–a person who contends that he just randomly happened to be there–had a DNA match with the rape kit evidence?” In that case, presenting those very small odds makes more sense, because the issue of other potential matches is excluded by the other evidence.

    So, here we go:

    1. Do you contend that I am misinterpreting the size of the database or the scale of the one match odds at nine loci based on the data at http://dna-view.com/ArizonaMatch.htm?

    2. Do you agree, or disagree, about the distinction between identifying and corroborative evidence? If so, please explain why it is not important, with partial-match DNA evidence, to look at the statistics concerning the possibility of at least one other match in a relevant population, as estimated by statistical analysis of the felon database. In other words, why is it not relevant, on a five-loci match, that there is a one-third chance of at least one other five-loci match (or, in the case of nine loci, one in 500 or so)?

    I read you to say that #2 is never important, and if I read you correctly, I think you are really misguided, and are making precisely the kind of focus errors that led to the Hatfill screw-up by the FBI, the Kozinski-led habeas reversal in Taylor v. Maddox (where the confession was the ONLY evidence, and your office prosecuted the crime), and the Mississippi cases discussed in the Slate piece. Here is the lede from that latter article:

    “Between them, Kennedy Brewer and Levon Brooks served more than 30 years in Parchman Penitentiary in Mississippi. Brewer was sentenced to death, Brooks to life without parole. The crimes for which each was convicted are remarkably similar: A female toddler was abducted from her home, raped, murdered, and abandoned in the woods. In each case, Mississippi District Attorney Forrest Allgood decided early on that the boyfriend of the girl’s mother was the culprit. In each case, he asked Dr. Steven Hayne to perform the autopsy. And in each case, Dr. Hayne called in Dr. Michael West to perform some analysis of bite marks on the children. West claimed to have found bite marks that had been missed by other medical professionals and then testified in court that he could definitively match these marks to the teeth of the men Allgood suspected of committing the murders.

    In each case, West was wrong. Two weeks ago, Mississippi Attorney General Jim Hood announced that police had arrested 51-year-old Albert Johnson for the toddlers’ murders. Johnson’s DNA matched that found at the scene in both crimes.”

    Here is a link to the earlier article in Reason, where the same journalist puts Mississippi’s forensics disaster into context: http://www.reason.com/news/show/122458.html.

    By the way, the conclusion that should be drawn from what I am saying is as follows: (1) your criticism, and the criticism of the expert, as to the relevance of the 1/3 chance of at least one 5-loci match, which is what the LA Times was discussing, is wrong; and (2) in partial-evidence DNA cases, the Daubert line should be drawn at nine loci or more (1 in 500 being not bad) for identifying evidence, and maybe a bit less for corroborative purposes.
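
    For concreteness, a minimal Python sketch of where that one-third figure comes from, using the Puckett numbers quoted earlier in the thread:

    # np = (database size) x (random-match probability): the expected
    # number of chance hits that a search of an all-innocent database
    # would return. Figures are the ones quoted for the Puckett case.
    n = 338_000          # profiles in the California database
    p = 1 / 1_100_000    # random-match probability at five or so loci

    print(round(n * p, 2))   # 0.31 -- the "almost 1 in 3" figure at issue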

    Cyrus Sanai (4df861)

  105. Cyrus,

    There were 122 matches out of all possible 9-loci paired comparisons, and there were approximately 65,000 × 65,000 / 2 = 2,112,500,000 such comparisons. That is, sample 1 was compared to samples 2 to 65,000, sample 2 was compared to sample 1 and samples 3 to 65,000, and so on. So there are over 2 billion pairs of records in a database which has 65,000 records. The 122 matches out of these pairs suggest that the chance of any given two records matching by chance is 1 in 17.3 million, and the chance of any given suspect having a 9-loci chance match in the database is approximately 1 in 266.
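
    A short Python sketch reproducing that arithmetic:

    from math import comb

    n = 65_000
    pairs = comb(n, 2)                  # n * (n - 1) / 2 = 2,112,467,500
    per_pair = 122 / pairs              # chance two given records match
    per_suspect = (n - 1) * per_pair    # chance a given suspect matches
                                        # some other record in the database

    print(f"per-pair odds:    1 in {1 / per_pair:,.0f}")     # ~17,315,000
    print(f"per-suspect odds: 1 in {1 / per_suspect:,.0f}")  # ~266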

    Lloyd Flack (ddd1ac)

  106. I think Patterico, like the “expert” David Kaye, is applying mutually inconsistent theories to determine the probability of an innocent match vs. a guilty one. The proper analysis is to acknowledge that where an individual record is selected because it generated a match, then we haven’t really changed the database’s odds; all we’ve done is put a face on them. If the database was “framed” (contains a random match to an unrelated individual), Puckett was “framed” (is that random, unrelated individual). And if the database was “guilty” (contains a record from the perp), he is that guilty man.

    Hogwash, sez Patterico. As the California Supreme Court glibly put it, “The database isn’t on trial, this guy is, nyah nyah nyah nyah nyah.” Fine. Let’s just forget the whole process by which the record was selected, and pretend some lab tech had just fished it out at random. If that’s the fiction we’re going to operate under, then that’s the rule we’ll have to live and die by. What are the odds that this one individual record will be a random match? 1 in 1.1 million. What are the odds it will be a match to the perp? 1 in 338,000, if you are absolutely certain that the perp was in the database to begin with. Otherwise, more remote still. Meaning that the L.A. Times’s error consisted of saying that the odds in favor of Puckett’s guilt were “only” 2 to 1, when in fact, under ideal circumstances, they may have been as high as 3 to 1!

    Indeed, the only way to make the Puckett case look like anything remotely like a fair trial is to apply the database effect when discussing the odds of a guilty match (if the database is guilty, then he IS guilty), while ignoring that database effect when discussing the odds of an innocent one (if the database was framed, he has a 1 in 338,000 chance of being the framed individual).
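
    In Python terms, a sketch of that comparison, under the generous assumption (stated above) that the perp is certainly in the database:

    # Pretend the record was fished out at random, then compare the chance
    # it is the perp's against the chance it is a coincidental match.
    p_random_match = 1 / 1_100_000   # random-match probability
    p_perp = 1 / 338_000             # chance this one record is the perp's,
                                     # if the perp is surely in the database

    print(f"{p_perp / p_random_match:.2f} to 1")   # about 3.25 to 1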

    Xrlq (62cad4)

  107. Lloyd,

    Thank you for re-explaining the math to Mr. Sanai. I explained it previously, but he clearly thinks himself so superior that he isn’t even bothering to understand my arguments. His characterization of my arguments is so far off it’s quite incredible.

    However, as I have already noted, the article I linked shows that 2 billion is a severe understatement of the number of possible ways that two profiles can match at nine loci in a 65,000-person database. There are 715 ways two profiles can match at nine loci. So you have to multiply that 2 billion number by 715.

    That’s a significant point to understand. And it’s all at my link.
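
    A quick Python check of the 715 figure, assuming the profiles carry the standard 13 CODIS loci (the 13 is my assumption; the 715 is from the linked article):

    from math import comb

    print(comb(13, 9))        # 715 ways to choose which 9 of 13 loci match

    # Scaling the ~2.1 billion profile pairs by those 715 ways:
    print(f"{comb(13, 9) * comb(65_000, 2):,}")   # about 1.5 trillion
                                                  # pair-locus combinations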

    Cyrus has no clue what he’s talking about. More reading and less lecturing, Mr. Sanai.

    Patterico (b50d44)

  108. Xrlq, the longer you discuss this, the less sense you make.

    And now David Kaye is an “expert” in quotes — someone who doesn’t know statistics as well as the great Xrlq.

    Patterico (aef41d)

  109. You don’t understand != I’m not making sense. It is a bit difficult juggling alternative theories, though, so how ’bout this. For the moment, let’s set aside the 1 in 3 problem you are trying to wish out of existence, and focus instead on the odds that the killer is in the database. Neither you nor I know those odds, so let’s assign them a variable G, which could have a value anywhere between 0 (the killer is definitely not in the database) and 1 (the killer definitely is in the database). What were the odds that John Puckett was the individual whose actual DNA was found in Diana Sylvester’s bedroom?

    a) 100%. Everyone knows that if your DNA matches you must be guilty.
    b) G. Puckett was only one match among the 338,000 records, so it stands to reason that if the database was “guilty,” he was too.
    c) G divided by 338,000. G represents the probability that the killer is somewhere in the database, so to calculate the odds that he’ll turn up in any particular record, divide G by the number of records in the database. After all, the database isn’t on trial, only this individual is.
    d) 1 in 6.684 billion. That’s how many people there are in the world, and the jury’s not supposed to know about the database, so let’s give them the odds that a truly random sample taken anywhere in the world would be from this particular individual.
    e) Other (explain).

    I’ll deal with Kaye’s “expertise” later. For now, suffice it to say that if every law professor who holds himself out as an expert were entitled to such deference, I’d have to give far more deference to the critical law “experts” at my elite alma mater than to “experts” from a middle-tier school like ASU. If he were really that much of an “expert,” why hasn’t Harvard or Yale snatched him up yet?

    Xrlq (62cad4)

  110. Oh yeah, I almost forgot – I never claimed to be an expert in statistics. Therefore, anyone who doesn’t know a hell of a lot more about statistics than I do is not an expert.

    Xrlq (62cad4)

  111. “You don’t understand != I’m not making sense. It is a bit difficult juggling alternative theories, though, so how ’bout this. For the moment, let’s set aside the 1 in 3 problem you are trying to wish out of existence . . .”

    That’s where I stopped reading.

    Anyone who understands what Xrlq is saying and *doesn’t* want to be a prick about how he says it, chime in.

    Patterico (13a59a)

  112. “Oh yeah, I almost forgot – I never claimed to be an expert in statistics. Therefore, anyone who doesn’t know a hell of a lot more about statistics than I do is not an expert.”

    Put another way, statistics experts *and* many non-statistics experts know more than you.

    But, if statistics experts aren’t professors at Harvard or Yale, we can safely ignore their opinions.

    Patterico (81f508)

  113. I do agree that you can have credentials and still be a moron. Stanford professor and NPR “Math Guy” Keith Devlin couldn’t even figure out that the LAT got a fraction upside down. The article I linked earlier in this thread conclusively shows Devlin to be a clown, and based on my correspondence with him I heartily concur.

    Patterico (c438d7)

  114. OK, time to go back to the time-honored “writing for the enemy.” My understanding is that you think it would be appropriate to say that the entire database ran 1 in 3 odds of randomly matching to Sylvester’s murderer, but when discussing the odds that Puckett himself would, the appropriate figure is 1 in 1.1 million. Correct or incorrect?

    Xrlq (62cad4)

    Put another way, statistics experts *and* many non-statistics experts know more than you.

    Absolutely. Anyone with a B.A. in stats knows more about statistics than I do, but that doesn’t make him an expert. From where I sit, anyone whose understanding of statistics doesn’t absolutely floor me is not an expert. That category certainly includes anyone who can’t understand why the “random match probability” factor contemplates a sample that was chosen at … um … random. I don’t know much about stats but I do know that much.

    Xrlq (62cad4)

    “My understanding is that you think it would be appropriate to say that the entire database ran 1 in 3 odds of randomly matching to Sylvester’s murderer, but when discussing the odds that Puckett himself would, the appropriate figure is 1 in 1.1 million. Correct or incorrect?”

    Incorrect.

    On what planet would the database run 1 in 3 odds of randomly matching to the murderer?

    Based on stats alone, and not geography, nature of the people in the database, etc., a database would have to be 2 billion strong to run those odds.

    Patterico (58487c)

  117. Or, if by “match” you simply mean “have the same profile as,” then the odds that Puckett matches the murderer are 100 percent.

    Patterico (5619e3)

  118. Also:

    Cyyyyyyyyyyrus!

    Cyyyyyyyyyyyyyyyyyyrus!

    We miss you!

    We think you’re running from the question!

    Patterico (e2c4a0)

  119. “My understanding is that you think it would be appropriate to say that the entire database ran 1 in 3 odds of randomly matching to Sylvester’s murderer, but when discussing the odds that Puckett himself would, the appropriate figure is 1 in 1.1 million. Correct or incorrect?”

    I think you might be conflating two concepts here.

    I think it would be appropriate to say that an innocent database of the same size would run 1 in 3 odds of containing a profile matching Sylvester’s murderer.

    When discussing the odds that any randomly selected person in the world whose DNA profile is unknown would match that profile, the appropriate figure is 1 in 1.1 million.

    When discussing the odds that Puckett matches the profile, those are 1 in 1.

    When discussing the odds that Puckett is the killer, there is no way to express those odds with certainty.

    However, when discussing the odds that a randomly selected person with that profile is the killer, the odds — taking into account only statistics and nothing else — seem to be roughly 1 in 6000.

    If you know you have only one hit in a database of 338,000, it is my tentative conclusion that the database factor may lower those odds to 1 in 18,000.
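
    For the curious, a Python sketch of where the “roughly 1 in 6000” figure comes from, using the 6.684 billion world-population figure quoted above:

    world_population = 6_684_000_000
    rmp = 1 / 1_100_000               # random-match probability at issue

    matching_people = world_population * rmp   # about 6,076 people
    print(f"1 in {matching_people:,.0f}")      # odds that a random person
                                               # with the profile is the killer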

    Patterico (f68cb7)

  120. My tentative conclusion is that the number of hits is irrelevant. The only real information a second hit buys you is absolute certainty that somebody matched randomly.

    That said, I now agree with your view that 1 in 1.1 million can be the appropriate figure to use, iff it is presented properly, but still think that unless the jury is absolutely clobbered with warnings, it will be misapplied. We’ll see if I’m right. In particular, here’s hoping j curtis and cfbleachers participate.

    Xrlq (62cad4)

  121. Guys: I read Pat’s site almost daily. Much of what is posted and commented on here makes sense to me. That is, I can follow the arguments and come to a conclusion when claims are in opposition.

    I am not a lawyer, but I am also not an idiot. When Pat claims the LAT has screwed the pooch again, I am able to examine his argument and proofs to form an independent judgment.

    Yet after reading the comments to his original post, I feel like I am, in fact, an idiot. Claimant A says X. Claimant B says Y. Claimant C says Z. I realize this is a blog for attorneys. I admit my ignorance of the subject is not any fault of the lawyer community here. But is it not possible for lawyers to speak English?

    James B. Shearer at #45, and elsewhere above, speaks to me. Nearly all of the other comments speak to (or past) other lawyers. Perhaps that is as it should be at Pat’s place.

    Before reading the comments, I had some belief that DNA evidence was strong juju in proving guilt or innocence at trial. Now, I know I know nothing, and would be–as a juror–evaluating conflicting assertions regarding DNA evidence on the basis of style points.

    At age seventy, it’s unlikely I have sufficient time left for law school, a biology degree with specialization in genetics and another degree in statistics. Perhaps someone can recommend a book on DNA as Evidence for Dummies, preferably written by a “country lawyer”. I do not want to be floored by a subject matter expert, whether as a juror or an ordinary Joe. I want to be persuaded the S.M. expert is not blowing smoke.

    Considering my age, maybe a pamphlet would be more appropriate.

    Thanks to all for making this an interesting place.

    gnholb (710dbc)

  122. Gnholb, I doubt there is a book on DNA evidence for dummies, but FWIW this discussion is really about statistics, not DNA as such. The results of the Arizona search may seem counterintuitive, but even if we take them as gospel (i.e., forget standard error, and assume no blood relatives were in the database to muck up the numbers) they don’t change the FBI estimates all that much. They’re just counterintuitive because stats are confusing, not because DNA is. We could get equally confusing, equally counterintuitive results discussing statistics on baseball, or anything else.

    One way to keep things in perspective is to recall an occasion in the early 1990s when LA radio host Mark Germain, then known as “Mr. KFI,” had a doting female caller tell him he was “one in a million.” In response, Germain cracked that this means there are 7 people in the greater L.A. area who are exactly like him. That’s how juries should look at any figures predicting the likelihood of a single match. I’m not convinced that they do (though so far, my readers seem to be doing a lot better with the problem than I think 12 random jurors would).

    The problem with David Kaye’s statement that “If logic were the life of the law, the np statistic [i.e., the one taking into account the database effect] would not be permitted” is that there is nothing logical about assuming everyone else will think logically. In a perfectly rational world, the np statistic would indeed be unnecessary, as 12 perfectly rational jurors would interpret the one in a million figure exactly the way Germain facetiously interpreted his. From this, a perfectly rational jury would deduce that a 5.5-loci DNA match can make us pretty certain that there are about 6,000 people in the world, give or take a standard deviation, who could have done the crime, but that it doesn’t tell us a damned thing about which of those 6,000 was the one who actually did it.

    There’s only one problem with this approach: we do not live in a perfectly rational world.

    Xrlq (b71926)

