Conversion between normalized profile scores (NScore) and random match probabilities

Currently, two programs for profile-to-sequence comparison report NScores when used with appropriately scaled profiles (such as those contained in PROSITE).

These programs usually display two different scores per comparison. One of them is the 'raw score', an integer number, which is calculated directly from the profile and the given sequence.

Most profiles also contain scaling information derived from the score statistics obtained by running the profile against the total protein database. This scaling information is used to calculate the normalized 'NScore', a real value that translates directly to the number of matches that can be expected in a database of a given size.

The following table gives some examples of how to convert NScores into expected numbers of chance matches for the SwissProt database and the nonredundant (nr) protein database. The calculation is based on a database size of

            Expected chance matches in
   NScore    SwissProt   nonredundant
     7.0       1.8         5.8
     7.5       0.58        1.82
     8.0       0.18        0.58
     8.5       0.058       0.182
     9.0       0.018       0.058
     9.5       0.006       0.0182
    10.0       0.0018      0.0058
    10.5       0.0006      0.0018
    ...and so on...

The NScore of a match is derived from the negative decadic (base-10) logarithm of the expected number of matches of the given quality (or better) in a random database of the given size. When the expected number of matches is much smaller than 1 (NExp << 1), it converges to the probability of finding such a match in the database. Since the number of expected matches depends on the size of the database, the decadic logarithm of the database size must be subtracted from the NScore to obtain it:

-log10(NExp) = NScore - log10(DBsize)

where NExp is the expected number of chance matches and DBsize is the size of the database in characters.
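
The relation above can be solved for NExp, giving NExp = DBsize * 10^(-NScore). The following Python sketch illustrates the conversion. The function names are hypothetical and are not part of any of the profile programs. The database sizes are not stated in the text; the values used here (1.8e7 and 5.8e7 characters) are assumptions back-calculated from the first row of the table. The conversion from NExp to a probability assumes that chance matches follow a Poisson distribution, which is also an assumption and not something the text states.

```python
import math


def expected_matches(nscore: float, db_size: float) -> float:
    """Solve -log10(NExp) = NScore - log10(DBsize) for NExp."""
    if db_size <= 0:
        raise ValueError("db_size must be positive")
    return db_size * 10.0 ** (-nscore)


def nscore_for(nexp: float, db_size: float) -> float:
    """Inverse relation: the NScore that yields nexp chance matches."""
    if nexp <= 0 or db_size <= 0:
        raise ValueError("nexp and db_size must be positive")
    return math.log10(db_size) - math.log10(nexp)


def match_probability(nexp: float) -> float:
    """Probability of at least one chance match.

    Assumes a Poisson model: P = 1 - exp(-NExp).
    For NExp << 1 this is approximately NExp itself.
    """
    return -math.expm1(-nexp)


# Assumed database sizes in characters, inferred from the table row
# for NScore 7.0 (they are NOT given explicitly in the text).
ASSUMED_DB_SIZES = {"SwissProt": 1.8e7, "nonredundant": 5.8e7}

if __name__ == "__main__":
    for nscore in (7.0, 8.0, 9.0, 10.0):
        cells = [
            f"{name}: {expected_matches(nscore, size):.4g}"
            for name, size in ASSUMED_DB_SIZES.items()
        ]
        print(f"NScore {nscore:4.1f}   " + "   ".join(cells))
```

Each increase of the NScore by 1.0 lowers the expected number of chance matches by a factor of 10, and each increase by 0.5 lowers it by a factor of about 3.16 (the square root of 10), which is the pattern seen in the table.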