Patterico's Pontifications

8/21/2008

Freakonomics Guy: Surprisingly Unexpected Odds Are, Surprisingly, Expected!

Filed under: Crime,Dog Trainer,General — Patterico @ 11:40 am

Freakonomics guy Steven Levitt takes a look at the “Arizona DNA database” issue raised by a recent L.A. Times article, and pronounces himself surprised to learn that the numbers . . . aren’t surprising:

When I heard about this, I wondered if the F.B.I. is totally off its rocker when it comes to the probabilities it gives about DNA matches. Is it possible that the F.B.I. is right about the statistics it cites, and that there could be 122 nine-out-of-13 matches in Arizona’s database?

Perhaps surprisingly, the answer turns out to be yes.

Of course, it shouldn’t be surprising at all. Levitt reveals that you would expect to find about 100 matches at nine loci. Instead, 122 were found, which is likely an inconsequential difference given the magnitude of the numbers we’re discussing.

But these numbers are set forth in the article. So why did Levitt, an experienced statistician, wonder whether the FBI was “totally off its rocker”?

The answer lies in the overly dramatic way the issue was portrayed by the L.A. Times.

On the front page, the paper portrayed the findings as surprising, announcing with great fanfare that dozens of matches were found with probabilities like 1 in 113 billion. It seemed to defy impossible odds, we were told.

Only on page A20 were we told that most of these stunningly unexpected matches were, in fact, expected — because the analyst was comparing every pair to every other pair. As Levitt explains, that means that about 1.4 trillion comparisons were done, making a 1 in 113 billion match anything but unexpected.
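The size of that comparison count can be checked with a few lines of arithmetic. The sketch below assumes a database of exactly 65,000 profiles (the round figure Levitt is quoted as using in the comments) and counts every way to pick nine of the 13 loci for every pair of profiles. The variable names are illustrative only.

```python
from math import comb

n_profiles = 65_000   # assumed round database size
n_loci = 13           # loci in a full profile
n_matching = 9        # loci that must match

# Every unordered pair of profiles in the database.
pairs = comb(n_profiles, 2)

# Every way to choose which nine of the 13 loci match.
locus_subsets = comb(n_loci, n_matching)

# Total pair-by-subset comparisons in an all-against-all search.
comparisons = pairs * locus_subsets

print(f"{pairs:,} pairs x {locus_subsets} subsets = {comparisons:,} comparisons")
```

Rounding the pair count down to 2 billion gives roughly 1.4 trillion; with the unrounded count the product is roughly 1.5 trillion. Either way, the number of comparisons is more than ten times 113 billion.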

This sort of manipulation of the readership helps dramatize a story and sell papers. But it also has real-world consequences, as the last link shows.

More details to come in the comments.

24 Responses to “Freakonomics Guy: Surprisingly Unexpected Odds Are, Surprisingly, Expected!”

  1. But about 98% of your readers said “Birthday Problem” 3 sentences in. Why was this guy surprised?

    Kevin Murphy (805c5b)

  2. Levitt sez:

    “Note, however, that if we start with DNA from a crime scene and then go search the Arizona database for matches, we aren’t doing 2 billion searches, we are doing “only” 46 million (65,000 people times 715 different combos of 9 loci), so we will have a false positive rate of “only” one in 279.”

    Uh, not really.

    If you have degraded DNA with only nine loci to compare, it becomes irrelevant that there are 715 ways for nine loci to match. The match must be at those nine loci.

    If, by contrast, you have non-degraded DNA, then you’ll get a certain number of matches at nine loci — but they won’t match at the other four loci. So it won’t really be a false positive.

    So even Levitt’s post, which is designed to show that the FBI’s numbers are probably correct, skews things unfairly to the defense side.

    Oh, I also wanted to thank Arthur K. for the pointer.

    Patterico (e5f365)

  3. I agree with this, though:

    “What I find interesting about this article and these calculations is that they show how the same sets of basic statistical relationships can appear much more or less convincing depending on how they are portrayed. When we hear that there are 112 matches out of 65,000 people, it seems like DNA fingerprinting is not nearly as good as we think — but that is largely because we aren’t thinking about the fact that 65,000 people imply 2 billion pairs of people.”

    Yup. And when the LAT doesn’t make that clear on the front page, the conclusion — even by the Freakonomics guy — is that “DNA fingerprinting is not nearly as good as we think.”

    That’s what the foreman of a recent jury thought.

    Too bad he didn’t turn to A20.

    Patterico (39abd6)

  4. I really enjoyed reading that book, Freakonomics.

    JD (75f5c3)

  5. The interested reader should note that I already provided the math discussed by Levitt, in comments 27 and 107 to this post. There, I correctly apply the 715 multiplier that Levitt gets partially right and partially wrong in his post.

    Patterico (660322)

  6. Call me cynical, but I doubt Levitt ever really thought the FBI was off its rocker or was truly surprised to learn they were not. If you’ve read Freakonomics, you’ll know that the whole book reads that way. Nothing is ever merely mildly interesting or kinda-sorta counterintuitive. Everything is “wowie zowie, I can’t believe this is true but it is, OMG!!!!!!!!!!!!” This article strikes me as more of the same.

    Xrlq (b71926)

  7. “Of course, it shouldn’t be surprising at all …”

    I don’t really agree with this. Many people find the answer to the birthday problem (how many people do you need in a group for it to be more likely than not that two people in the group share a birthday) surprising. Expecting the general population to never be surprised by the answer to a fairly complicated problem (which they have not seen before) is unrealistic.

    James B. Shearer (fc887e)

  8. My point, James B. Shearer, is that the LAT should not have dramatized the problem as creating UNEXPECTED!!!!1!! results, when they were in fact mostly expected.

    I thought that was quite clear in my post.

    Patterico (6d4b05)

  9. I think expert (dna) witnesses have to be prepared to explain the birthday problem to juries.

    Arthur (81fd36)

  10. I have a degree in engineering and 2 Masters degrees. I can’t follow this crap. How is a jury supposed to?

    My experience as a juror says they won’t follow it. They’ll vote on what their “feelings” tell them. In other words the best lawyer wins regardless of the real facts. The last case I sat in on as a juror pitted “expensive suit corporate lawyer” against “folksy Ben Matlock”. Ben crushed the suit but had no basis for his victory when you come down to it. Both lied through their teeth and played the game. Some system we have.

    quasimodo (6b5b4c)

  11. I’m wondering when Cyrus will deign to comment on this.

    daleyrocks (d9ec17)

  12. Quasimodo:

    I have a degree in engineering…

    Well there’s yer problem right there!

    Seriously, look at it this way: Suppose your employer holds a lottery. 1,000 people each are assigned a unique number from 1 to 1,000; then one of those numbers is drawn from a jar, and the winner gets a month’s holiday.

    Now, the probability that somebody will win that lottery is 100%; it’s guaranteed. But that doesn’t change the fact that the probability that you personally will win the lottery is still — 1 out of 1,000.

    Dafydd

    Dafydd ab Hugh (db2ea4)

  13. Quasimodo:

    Oops, I hit submit too soon. I meant to continue…

    Similarly, mathematically, if your database of DNA samples is big enough, then the probability that some pair of people would match each other approaches 100%.

    But that doesn’t change the fact that the probability that a particular person would, by sheer mischance, match the DNA evidence gathered at the scene of a crime is still many billions to one.

    On the birthday problem, everything changes if you specify the birthday: Asking, “How many people must you have in the room before the odds favor two people sharing a birthday?” is completely different from asking, “How many people must you have in the room before the odds favor somebody having October 31st as his birthday?” The former number is much, much smaller than the latter.

    Dafydd

    Dafydd ab Hugh (db2ea4)
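The two birthday questions above can be put side by side numerically. This is a minimal sketch assuming 365 equally likely birthdays and ignoring leap years; the function names are made up for illustration.

```python
def p_shared_birthday(n, days=365):
    """Chance that at least two of n people share some birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def p_specific_birthday(n, days=365):
    """Chance that at least one of n people has one named birthday."""
    return 1.0 - ((days - 1) / days) ** n


def smallest_group(prob_fn, threshold=0.5):
    """Smallest group size at which prob_fn exceeds the threshold."""
    n = 1
    while prob_fn(n) <= threshold:
        n += 1
    return n


any_pair = smallest_group(p_shared_birthday)      # any two people match
named_day = smallest_group(p_specific_birthday)   # someone matches Oct. 31
print(any_pair, named_day)
```

Under those assumptions the first question is answered by 23 people and the second by 253, which is the gap between "some pair in the database matches" and "this particular profile matches".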

  14. Dafydd – That is the first explanation that made sense to me. Thanks.

    JD (5f0e11)

  15. Dafydd – I second JD’s comment. Well said. You have an error in #12, however. I’m self employed, and there’s no way I could send myself on a month’s vacation.

    Also, would you please make the same comment over at the Freakonomics posting? Reading the comments there, most people completely invert Levitt’s point.

    Apogee (366e8b)

  16. Dafydd, I third JD’s comment. My god, I have read every comment on the subject since Patterico first posted on it and generally my eyes glaze over and my brain starts threatening to explode. With a liberal arts background, statistics is all Greek yet in two comments you have provided a clear, succinct analogy which is understandable and, bonus points, an easy to grasp visual.

    Seriously, my brain thanks you.

    Dana (084de8)

  17. 8 “… LAT should not have dramatized the problem as creating UNEXPECTED!!!!1!! results, when they were in fact mostly expected.”

    But they are only expected if you are familiar with this sort of problem. Most people aren’t. So there is a sense in which the results are surprising or unexpected.

    James B. Shearer (fc887e)

  18. “But they are only expected if you are familiar with this sort of problem. Most people aren’t. So there is a sense in which the results are surprising or unexpected.”

    Yeah. If the region’s Premier Newspaper does a right crappy job of explaining it.

    Patterico (0ebc02)

  19. “Yeah. If the region’s Premier Newspaper does a right crappy job of explaining it.”

    Or if you are too pompous to try to understand it or admit that you are wrong.

    daleyrocks (d9ec17)

  20. #13 and following posts:

    It is correct that the questions are different and mean different things.

    I thought Levitt’s article was interesting in where he was exact versus loosely metaphoric.

    Here’s his key statement:

    “Note, however, that if we start with DNA from a crime scene and then go search the Arizona database for matches, we aren’t doing 2 billion searches, we are doing “only” 46 million (65,000 people times 715 different combos of 9 loci), so we will have a false positive rate of “only” one in 279.”

    First, the matches are not “false positives”, strictly speaking. A “false positive” or a “false negative” is an incorrect test result due to error or random chance. Matches at x loci are not false positives–they are correct test results.

    Putting that aside, Levitt calls the rate 1 in 279, whereas I thought it was half that rate, 1 in 558; that’s because Levitt counts every match as 2 because he looks at the positives as X matching Y and Y matching X, while I counted these reciprocal matches as one.

    Nonetheless, the risk of an accurate test misidentifying due to two or more matches at 9 loci is NOT 1 in billions, as Patterico keeps arguing, but 1 in hundreds (279 or 558 depending on whether each match counts for 2 or 1) using the Arizona data.

    One other point, re Patterico #2. The odds are not affected by the degrading of the DNA, provided that the degradation is random, which it most certainly will be. Random degradation is the same as random non-matches. It does not affect the odds of nine loci matches.

    Cyrus Sanai (4df861)

  21. “Nonetheless, the risk of an accurate test misidentifying due to two or more matches at 9 loci is NOT 1 in billions, as Patterico keeps arguing, but 1 in hundreds (279 or 558 depending on whether each match counts for 2 or 1) using the Arizona data.”

    Wrong. I have already explained this. If your sample is not degraded, matching at only nine loci won’t misidentify anyone, because if there is no match at the other loci, there will be no misidentification.

    By contrast, if the DNA is degraded such that only nine loci are available to compare, then the trawl will create only 65,000 comparisons and not 46 million.

    I pointed this out to Levitt in an e-mail and he agreed, and said he will likely do a follow-up post to clarify.

    Cyrus, I tried explaining this math to you in another thread. Lloyd Flack and I explained that there are over 2 billion pairs in the database, and I further explained that there are 715 times that number (i.e. the 1.4 trillion mentioned here by Levitt) when you look at every way to pair up nine of 13 loci. You didn’t understand and replied with some incoherent nonsense about how there is no database of 2 billion people on the planet — showing that the subtleties of the conversation are 100 percent lost on you.

    I can’t waste my valuable time trying to explain it to you further, as it’s clear that you don’t want to learn. It’s all in Levitt’s post — barring the inaccuracy I already discussed above.

    “One other point, re Patterico #2. The odds are not affected by the degrading of the DNA, provided that the degradation is random, which it most certainly will be. Random degradation is the same as random non-matches. It does not affect the odds of nine loci matches.”

    I have no idea what you’re trying to say here, but I don’t much care. If there are only nine loci available to match, that most certainly does affect the odds, as Levitt acknowledged to me in an e-mail. Now you don’t multiply the pairs by 715, so you’re left with only 65,000 comparisons.

    Patterico (2e5ace)
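The difference between the two trawl scenarios is a factor of 715, which a short calculation makes concrete. The sketch below assumes 65,000 profiles and, purely for illustration, a 7.5% chance of a coincidental match at any single locus (the per-locus figure used in the next comment); real per-locus frequencies vary, so the outputs are rough orders of magnitude, not FBI numbers.

```python
from math import comb

n_profiles = 65_000     # assumed round database size
p_locus = 0.075         # assumed chance of a coincidental match at one locus

# Chance that nine specified loci all match by coincidence.
p_nine_specified = p_locus ** 9

# Full 13-locus crime-scene profile: any nine of the 13 loci may match.
full_profile_comparisons = n_profiles * comb(13, 9)

# Degraded sample: only nine loci exist, so there is one way to match them.
degraded_comparisons = n_profiles

expected_full = full_profile_comparisons * p_nine_specified
expected_degraded = degraded_comparisons * p_nine_specified

print(f"full profile: {full_profile_comparisons:,} comparisons, "
      f"about 1 in {1 / expected_full:,.0f}")
print(f"degraded:     {degraded_comparisons:,} comparisons, "
      f"about 1 in {1 / expected_degraded:,.0f}")
```

With these assumed inputs the 46 million comparison case lands close to the one-in-279 figure quoted above, while the 65,000 comparison case comes out around one in 200,000.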

  22. I think Levitt’s calculations are a bit too loose. The probability of matching any nine of thirteen specified loci is about 1 in 25 million, if the probability of a random match is 7.5% per locus. The expected number of nine-of-thirteen matches in the AZ database is then (2.1 billion)/(25 million) = 83.

    If the observed number of nine-of-thirteen matches is 122, then that contributes 18.3 to a chi-square statistic for goodness of fit of the number of matches to a binomial distribution. This is too large for me to be comfortable.

    W. Krebs (7c80ac)
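The figures in this comment can be reproduced with a short script. It is a sketch under assumptions: the 1-in-25-million figure is read as the binomial chance of exactly nine matching loci out of 13, the database is taken as 65,000 profiles, and the chi-square contribution is the usual single-cell term (observed minus expected, squared, over expected).

```python
from math import comb

p_locus = 0.075        # per-locus match probability stated in the comment
n_profiles = 65_000    # assumed round database size
observed = 122         # nine-locus matches reported for the database

# Binomial chance that exactly nine of 13 loci match for one pair.
p_exactly_nine = comb(13, 9) * p_locus ** 9 * (1 - p_locus) ** 4

# Unordered pairs of profiles, then the expected number of such matches.
pairs = comb(n_profiles, 2)
expected = pairs * p_exactly_nine

# Contribution of this one cell to a chi-square goodness-of-fit statistic.
chi_square_term = (observed - expected) ** 2 / expected

print(f"about 1 in {1 / p_exactly_nine / 1e6:.1f} million per pair")
print(f"expected matches: {expected:.1f}")
print(f"chi-square contribution: {chi_square_term:.1f}")
```

The calculation treats every pair as unrelated individuals, so any relatives in the database would push the observed count above the expected one.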

  23. Mr. Krebs,

    The thing is, it’s almost certainly true that some of those matches are related. We’re told in the article that the matches at 11 and 12 loci were due to the matched parties being related. Why wouldn’t there be some related matches at nine loci? There probably are.

    The FBI numbers assume a population of unrelated individuals. This is always how the number is related to jurors.

    So there’s no reason to be concerned.

    Do you know what the standard error is? I don’t. So even if you make the counterfactual assumption that all nine loci are unrelated, I’m still not sure we have enough information to know whether to be uncomfortable or not.

    Patterico (3bb36b)

  24. W. Krebs:

    I haven’t read through every comment here… but isn’t everyone tacitly assuming that the DNA match is the only evidence against the defendant?

    Because if it isn’t, if that just gave the DA’s office probable cause to issue search warrants, and those warrants yielded other evidence — bloodstained clothing that matches the victim’s blood, or property from the victim, or a prior relation to the victim (the defendant is the victim’s former husband, hint hint), a history of violence between defendant and victim, evidence the defendant wore shoes whose tracks were found in blood at the scene… then the DNA evidence takes on greatly added significance.

    I suppose it’s possible that somebody could be convicted of a crime based upon nothing, nothing at all, but a DNA match to some blood or hair found at the scene; but as a juror, I would be very, very reluctant to do so.

    Never say never, but boy would Patterico have to make one heck of a case! For one thing, the DNA match only tells us the defendant was there, not that he’s the one who committed the crime.

    I would want to see more evidence than that: Some sort of connection between the two, physical evidence besides DNA at the scene, prior history, perhaps prior convictions for similar crimes. Something!

    Dafydd

    Dafydd ab Hugh (db2ea4)

