Stockholm Bioinformatics Center, SBC
Lecture notes: Molecular Bioinformatics 2001, Uppsala University

Lecture 22 Jan 2001 Per Kraulis

Hidden Markov models

3. Sensitivity and specificity

How can we measure how good a pattern is at detecting a particular sequence feature? There are two different, complementary measures that can be used to describe how well a pattern performs:

We want a pattern to be both sensitive and specific, so that all true matches are found, and nothing but true matches. But as usual, nature will rarely allow us to find such a pattern; we must try to find a good compromise.

A regular expression can easily be designed that has 100% sensitivity: just use the expression that matches anything:.* (dot, star). Everything will be identified as matching this expression, so all true positives will also be identified, and none will be left out (i.e. no false negatives).

Conversely, it is easy to prepare a regexp that is 100% specific: just use a regexp that exactly matches one of the known members of the domain family. Only that single member will be predicted, and no others, (i.e. no false positives).

Let us use an analogy: we have an exam (svenska: "tentamen") to check that a class of students actually have learned what they should about bioinformatics. The exam must of course be designed so that those students that actually have learned their stuff should pass, while those who haven't learned anything will fail.

If the exam is too difficult, then those who manage to pass it will most likely really know their stuff. But there will be many students among those who fail (for stupid reasons, like being too stressed by the difficult questions) who really should have passed it but didn't (false negatives). The exam is specific, but it is not sensitive.

If the exam is too easy, then there will be many among those who pass it who have just made wild guesses, and really haven't learned anything. But on the other hand, of the students who really have learned what they should, not many will fail (even the stress-sensitive will get some answers right). The exam is sensitive, but it is not specific.

Let us define this in a little more technical, precise terms. We define the notions of true and false positive and negative matches in the following way:

The pattern says
Yes, a match No, not a match
Reality Yes, a match True positive; TPos False negative; FNeg
No, not a match False positive; FPos True negative; TNeg

If we use these definitions, then the following holds:

Sensitivity = TPos / (TPos + FNeg)

Specificity = TPos / (TPos + FPos)

Warning: in the medical context (diagnosis), specificity is defined differently: there it is TNeg / (FPos + TNeg).

Copyright © 2001 Per Kraulis $Date: 2001/01/22 12:13:49 $