Patterico's Pontifications

5/5/2008

Law Professor: L.A. Times Article on DNA Portrays One View As the Consensus View

Filed under: Crime,Dog Trainer,General — Patterico @ 6:28 am

Prof. David Kaye says on his blog that yesterday’s L.A. Times article on DNA, cold hits, and statistics is not balanced, and portrays one side of a debate as though it is the only valid viewpoint:

[A]n article in the May 3 Los Angeles Times claims to have uncovered a national scandal of sorts. The reporters describe a recent “cold hit” case that they say

is emblematic of a national problem, The Times has found. [¶] Prosecutors and crime labs across the country routinely use numbers that exaggerate the significance of DNA matches in “cold hit” cases, in which a suspect is identified through a database search. [¶] Jurors are often told that the odds of a coincidental match are hundreds of thousands of times more remote than they actually are, according to a review of scientific literature and interviews with leading authorities in the field.

The article maintains that

[I]n cold hit cases, the investigation starts with a DNA match found by searching thousands, or even millions, of genetic profiles in an offender database. Each individual comparison increases the chance of a match to an innocent person. [¶] Nevertheless, police labs and prosecutors almost always calculate the odds as if the suspect had been selected randomly from the general population in a single try. [¶] The problem will only grow as the nation’s criminal DNA databases expand. They already contain 6 million profiles.

This description portrays one approach to the issue as if it is the consensus in the scientific literature. It is not. There is disagreement about the need to adjust a random-match probability. Furthermore, if one counts the number of peer-reviewed articles on the subject, the dominant view is that adjustment is not necessary.

(My emphasis.)
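The adjustment under debate concerns the difference between the random-match probability for a single comparison and the chance of at least one coincidental hit when an evidence profile is run against an entire database. A minimal sketch of the two quantities (the match probability here is illustrative, not a figure from the article; only the 6 million database size comes from the quoted text):

```python
def database_match_probability(p, n):
    """P(at least one coincidental hit) across n independent
    comparisons, each with random-match probability p:
    1 - (1 - p)^n, which is roughly n*p when n*p is small."""
    return 1 - (1 - p) ** n

p = 1e-7          # illustrative single-comparison random-match probability
n = 6_000_000     # profiles in the national databases, per the article

print(f"Single comparison:   {p:.1e}")
print(f"Full database search: {database_match_probability(p, n):.3f}")
print(f"Approximation n*p:    {n * p:.3f}")
```

The dispute Prof. Kaye describes is not over this arithmetic, but over whether the database-search figure (rather than the single-comparison figure) is the right number to give a jury.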

So according to Prof. Kaye, the view that dominates the peer-reviewed literature is portrayed as the minority view (indeed, I note that this view is hardly discussed, as if nobody takes it seriously).

Prof. Kaye’s post has more excellent insights on the right way to view this controversy. Go here to read it.

Previous posts on this subject here and here.

18 Comments

  1. I agree that scientifically (as the science stands now), the LAT’s assertion is a red herring. Let’s change DNA to fingerprints. What on earth difference does it make if the fingerprints on the knife are matched to all fingerprints on file with the FBI or only to the people who were on The Orient Express with the victim? The odds of a duplicate fingerprint have been established as have the odds of a duplicate DNA match.

    When there is only a partial print or when there are only five and a half markers, the issue is still not scientific — it is a matter of rules of evidence and criminal procedure. What is needed is an honest explanation, from both sides, to the trier of fact as to what either the partial fingerprint or the partial DNA match means.

    Comment by nk (1e7806) — 5/5/2008 @ 6:46 am

  2. Actually, if there is one person who knows forensic DNA it’s Barry Scheck. I’ll see if I can dig up something from him on this issue.

    Comment by nk (1e7806) — 5/5/2008 @ 7:02 am

  3. Barring problems in the software’s code and problems with data entry, the number of false positives should grow linearly with database size, because the error rate itself should not grow. However, linear growth is still seriously problematic if the error rate isn’t incredibly small.

    Why do you not address the fact that a false positive rate of even .000001, over a body of 300,000,000 possible entries, will give you 300 possible suspects based on the DNA evidence alone? The software used would have to have a false positive rate of .00000001 to get it down to just 3 possible suspects. Maybe the current top DNA-processing software packages are that good, or better, but maybe they aren’t.

    Such is the potential problem with false positives in a database that large. The problem is exacerbated by finding that you have DNA matches from across the country. How do you know that the guy from several states away didn’t know the victim?

    Comment by MikeT (2a040c) — 5/5/2008 @ 7:07 am
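The arithmetic in the comment above can be checked directly; a quick sketch (the rates and the population size are the commenter's hypotheticals, not measured figures):

```python
def expected_false_positives(rate, population):
    """Expected number of coincidental matches when every entry
    is compared: rate * population."""
    return rate * population

population = 300_000_000  # the commenter's hypothetical body of entries
for rate in (1e-6, 1e-8):
    # 1e-6 yields roughly 300 candidates; 1e-8 yields roughly 3
    hits = expected_false_positives(rate, population)
    print(f"false positive rate {rate:.0e}: ~{hits:.0f} possible suspects")
```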

  4. If the error rate isn’t small enough, you cannot afford to investigate all of the possibilities that you get.

    Comment by MikeT (2a040c) — 5/5/2008 @ 7:09 am

  5. Mike T,

    I could not agree more. Science should be constantly revisiting this issue, not only correcting errors but always being willing to reverse the entire theory.

    But now the standard is “within a reasonable degree of scientific certainty”.

    Which is not the same as “reasonable doubt”. ;)

    Comment by nk (1e7806) — 5/5/2008 @ 7:21 am

  6. MikeT…
    Given your numbers, and a possible match of 300 people nation-wide, I would assume that simple, preliminary investigative routine would eliminate at least 90% from consideration. Now, you’ve got a list you can deal with, utilizing the most preliminary techniques. Just how many of those on the list would be unable to show that they were nowhere near the crime scene; had no history of similar behaviour; etc.?
    The Times has constructed a straw-man, and many are willing to line-up to salute it.
    BTW, OJ did it!

    Comment by Another Drew (f9dd2c) — 5/5/2008 @ 8:12 am

  7. Thanks to Occasional Reader and his mention of dependent and independent probabilities, I finally figured out what was bothering me. Of course, he understood what was going on, but just didn’t bother to explain it in terms that a dummy like me could understand, but that’s the kind of guy he is.

    A big part of the problem here is that the wrong problem is being solved. What DNA is useful for is identifying an individual, and that’s all! Random Match Probability comes into play only in deciding the confidence we place in the identification itself. Patterico has been right in declaiming that

    Innocence and guilt have to do with factors other than a DNA match — the issue is what the match means. So when we’re discussing pure math, using the word just confuses things.

    but I was troubled by the coin example. Suppose, instead of a DNA sample, we have a photograph. And the photograph that we have might not be perfect, but we have a pretty good idea that the guy in the photo did something. So we look through the mugshots we have on file to see if the photo matches anybody.

    Now, we know that there are 338,000 photos in our file and we know we have a photo. What are the odds we will match our photo to a picture in the file? Given that all we know is we have a photo and 338,000 photos on file?

    The truth is, we don’t know, because we don’t have any way to figure that out. There are some things that we can infer just from having the photograph: well, we have a picture of someone! So obviously, that someone exists. We can rule out anyone who lived before photographs came about. We might be able to rule out some more people on things we see in the photo itself: if the subject is male, that rules out women. There are some other deductions we can make, but you get the idea.

    The thing we still don’t know after all that, is “What are the chances that the photo matches one in our file of photos on hand?” In fact, we just won’t know until we look through the file to see if there is a match, whether there is one or not. (Which is why the Times story is multiplying apples by small block Chevys and trying to come up with tangerines.) So far, knowing that we have the photo, and from some things we can see in the photo, we’ve been able to exclude some people as being outside our interest, but we haven’t been able to calculate what the chances are that our photo matches one on file. My point being that DNA evidence is exclusionary, not inclusionary.

    Now, we have looked through our file and we think we have a match. This is where RMP comes into play: it tells us how much we can rely on the photo we have in hand. If the photo in hand shows 70% of the subject’s face, but the scar on his chin is obscured, we can still compare the photo to the one we have on file; we just can’t say with as much certainty that the guy in the photo matches the one on file. Unlike a binary event like a coin toss, we can only say with a certain level of confidence what the strength of our match between the photo and our photo on file is.

    One other thing that we can’t know, is whether or not there is somebody else out there that looks remarkably like our guy in the photo, but without the scar on the chin. All we can say after we have identified every quantifiable characteristic we can think of is that we’ve managed to exclude a certain number of people from consideration. If our guy is blonde, for example, we can exclude all non-blondes, and women, and so on; but if we haven’t managed to exclude the remainder of people on the planet, there is a chance that there is someone that looks like our guy ‘out there’ someplace. Not that there is, but that there might be. And that is what the disagreement Dr. Kaye is discussing revolves around: just how precise is our ability to exclude people based on their DNA?

    Hope this helps somebody, but whether it does or not, it sure kept me busy while waiting in line at the grocery store this morning.

    Comment by EW1(SG) (84e813) — 5/5/2008 @ 8:15 am

  8. Given your numbers, and a possible match of 300 people nation-wide, I would assume that simple, preliminary investigative routine would eliminate at least 90% from consideration. Now, you’ve got a list you can deal with, utilizing the most preliminary techniques. Just how many of those on the list would be unable to show that they were nowhere near the crime scene; had no history of similar behaviour; etc.?
    The Times has constructed a straw-man, and many are willing to line-up to salute it.

    That is a big assumption there. In most cases you might be right, but it doesn’t take a very large number of cases to be taxing on the police in most jurisdictions. If they have 30 suspects, and have ruled out the first 10 that are local, how can a typical small town police force be expected to figure out if the next 20, which are farther away and thus more expensive to investigate, aren’t suspects as well? Society will now expect them to do that because there will be a list of possible suspects.

    Comment by MikeT (2a040c) — 5/5/2008 @ 10:15 am

  9. #3 Mike T:

    Barring problems in the software’s code and problems with data entry,

    The database is actually populated by the results of an electrophoretic process resulting in an electropherogram.

    So although there are quality control monitoring issues in populating the database, there are no software/data entry issues to deal with. And false positives are easily identifiable by the criminalist through the use of the electropherogram.

    Current “PCR-based” methods also result in a larger remaining sample than you started with, so retesting a properly stored sample isn’t too terribly problematic either.

    Comment by EW1(SG) (84e813) — 5/5/2008 @ 10:36 am

  10. MikeT…
    I grant that there could be large numbers to look at; but, I think the first level of analysis would eliminate much of the list (I think 90% would be a minimum – the number would most likely be much higher). We wouldn’t be investigating juveniles (pre-teens) from 2M miles away would we, or seniors in retirement/hospice settings?
    Again, we need to stop assembling straw-men – they don’t test well for DNA.

    Comment by Another Drew (f9dd2c) — 5/5/2008 @ 10:45 am

  11. In this particular case maybe some straw men are good for working out the ethical and legal kinks in this type of evidence?

    Maybe not for sensationalized newspaper articles (THERES A .00001% CHANCE THE MAN IS GONNA KICK IN YER DOOR AND SEND YOU UP FOR 20 COZ YER DOPPELGANGER IN SERBIA OFFED A MOB BUDDY OF HIS!!!!! HIDE!!!!)….

    “the dominant view according to peer-reviewed articles on the subject is portrayed as the minority view” Actually, vote counting isn’t necessarily a good way to determine what the prevailing view in the scientific community is.

    There is a statistical method that people use to determine a “consensus” view, one that does a bit better job than simple vote counting (i.e., counting the number of papers), called meta-analysis. It would eliminate some things like multiple papers with the same information, slightly repackaged or added on to (results only count once no matter how many papers they appear in), and it parses out the influence of different effects on a subject, e.g., how much influence lab sanitation (there’s a lot of DNA lying around labs) would have on one’s results, etc. It would be pretty interesting to see what a meta-analysis of this topic would produce. One useful thing this stats tool does is show whether or not “the truth lies somewhere in the middle,” coz scientifically speaking sometimes it just doesn’t!

    Comment by EdWood (06cafa) — 5/5/2008 @ 11:41 am

  12. nk — your example (a partial print matched to the known occupants of the train) illustrates a subtle problem: we do not always know who’s on the train at the time of the murder (we only know those that we know about, there may have been others), and we probably don’t know when the fingerprint was made. If the fingerprint was made by some innocent before or after the killing, or if the killer was not known to be on the train, the partial print matching someone who obviously has opportunity is going to cause more problems than it solves, because attention will be focused on convicting the partial-print matchee, rather than finding other suspects.

    Comment by htom (412a17) — 5/5/2008 @ 12:18 pm

  13. Kaye is dodging the issue, IMO. He postulates a scenario with one unique match and then goes on to talk about that scenario. Fine. I don’t disagree with him there. However, some key points:

    1. Kaye, in one post, notes that as the size of the database increases the number of false positives also increases.

    2. Once you have more than one “hit” on a database the probability of guilt for any given one of the hits goes down.

    Let me explain the last one. Suppose we have a comprehensive database (i.e., it covers everyone). Further, let’s assume fraud and laboratory mistakes are not a factor. Now, we get two hits for a crime that, for the sake of argument, we know was committed by one person. In this case, one person is guilty and the other innocent. Picking either one at random means you’ll have a 0.5 probability of getting the wrong man. When there are 3 hits, the probability goes to 0.67. With 10 hits you have only a 0.9 probability of getting the right guy.

    Ignoring this, and simply going with the 1 in a million probability of match is somewhat misleading in the absence of other evidence.

    I really don’t see why this is so freaking hard to understand.

    Comment by Steve Verdon (4c0bd6) — 5/5/2008 @ 2:34 pm

  14. Whoops,

    With 10 hits you have only a 0.9 probability of getting the right guy.

    That should read,

    With 10 hits you have only a 0.1 probability of getting the right guy.

    Comment by Steve Verdon (4c0bd6) — 5/5/2008 @ 2:41 pm
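The point in comments #13 and #14 (with the correction applied) reduces to simple arithmetic: with k equally plausible hits and exactly one true source, a random pick is right only 1/k of the time. A small sketch of those figures:

```python
def p_right_guy(hits):
    """With `hits` database matches and exactly one true source,
    and no other evidence to distinguish them, a random pick is
    correct with probability 1/hits and wrong with (hits-1)/hits."""
    return 1 / hits

for k in (2, 3, 10):
    # 2 hits -> 0.50 right; 3 hits -> 0.33 right (0.67 wrong); 10 hits -> 0.10 right
    print(f"{k} hits: P(right) = {p_right_guy(k):.2f}, "
          f"P(wrong) = {1 - p_right_guy(k):.2f}")
```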

  15. Blah, blah, blah, math, etc. (Okay, seriously: interesting reading here in the comments.)

    To me, though, the eye-catcher is yet another L.A. Times article that’s easily debunked if you have *any* familiarity with its subject.

    Ask friends and family who are expert or even just “knowledgeable” in their respective fields: “How’s the L.A. Times’ coverage of your area of expertise?”

    Then chuckle at the answers.

    Comment by ScottH (219e11) — 5/5/2008 @ 2:56 pm

  16. To me, though, the eye-catcher is yet another L.A. Times article that’s easily debunked if you have *any* familiarity with its subject.

    It hasn’t been debunked, the issue has been side stepped, IMO.

    Comparing a database with worldwide coverage and a very good match (as Patterico does) to the real-world situation of small-coverage databases with less-than-good matches is not a debunking, but a non sequitur. Kaye’s points are good, but not entirely relevant. Once you have more than one hit from a database then you have a problem absent other evidence. Even with other evidence, the lack of wide coverage also presents a problem: the actual perpetrator might be outside the database, yet there could still be a match with someone in the database.

    Where exactly have these issues been debunked?

    Comment by Steve Verdon (94c667) — 5/5/2008 @ 3:04 pm

  17. I really don’t see why this is so freaking hard to understand.

    That’s not fair, Steve, you know that statistics is counter-intuitive to most people without a rigorous math background and even difficult to many with.

    Comment by SPQR (26be8b) — 5/6/2008 @ 12:21 pm

  18. Regardless of the validity or invalidity of the math, there is something important to remember: there is a strong bias among lay people and courts against having guilt or innocence itself ( as opposed to evidentiary facts ) be expressed as a mathematical formula.

    Comment by SPQR (26be8b) — 5/6/2008 @ 12:33 pm
