Information on profile databases available for the ISREC ProfileScan server

This server uses the pfscan program to search a single sequence against currently available profile databases. All searchable databases contain generalized profiles. If this is not the native format of the database, the entries have been converted to the generalized profile format.


PROSITE
Amos Bairoch's PROSITE database now includes profile entries describing protein families and domains that are not easily covered by PROSITE's own regular-expression-based syntax. The PROSITE profiles, which are constructed here at the ISREC, are intended to be of particularly high quality. The process of profile construction and iterative refinement is described in this poster, presented at the 1995 ISMB meeting in Cambridge.
The list of available profiles is relatively small at the moment, but many more profiles are in the works, and we continually update the list of PROSITE profiles available on the server.

PROSITE Pre-release
We now also offer searches against our large collection of pre-release profiles. These profiles will eventually go into PROSITE but are, for various reasons, stuck in the pipeline. The most frequent reason is a lack of adequate documentation, which is an absolute requirement for any PROSITE entry. The pre-release profiles are nevertheless of the same quality as the final PROSITE profiles. You can browse the available profiles and check the status of the documentation project in this list.

Pfam
Pfam is a collection of protein motifs and families initially compiled by Erik Sonnhammer and Sean Eddy. It is now maintained by the Bioinformatics group at the Sanger Centre. Pfam is subdivided into two parts. Pfam A (Release 2.0) contains 527 high-quality families, generated using the HMMER Hidden Markov Model package. Pfam B contains a number of protein families automatically extracted from SwissProt and is currently not available for searches on this server. More information is available on the Pfam homepage at the Sanger Centre.
The ISREC ProfileScan server offers searches against two different versions of the Pfam A collection. Both are converted directly from HMMER's interchange format into the generalized profile format and contain the full information of the corresponding HMMs. One set, called "PfamA bitscaled", generates the same kind of scores as the HMMs themselves (apart from minor round-off errors). The other set, called "PfamA NScore", has been rescaled in the same way as the above-mentioned PROSITE profiles, by analysing the score distribution obtained with a randomized database. This version of Pfam A returns NScores as a means of assessing the statistical significance of a match.
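
The idea of rescaling against a randomized database can be pictured with a small sketch. It is not the procedure used for the server. It assumes, purely for illustration, that a normalized score is a linear function of the raw score, and that the two coefficients are fitted so that the normalized score approximates -log10 of the fraction of randomized-database scores at or above a given raw score. The function names, the fitting method, and the sample data are all hypothetical.

```python
import math
import random


def fit_linear_rescaling(random_scores):
    """Fit nscore = r1 + r2 * raw by least squares (illustrative assumption).

    The target value for each raw score is -log10 of the fraction of
    randomized-database scores that are at least as high.
    """
    scores = sorted(random_scores, reverse=True)
    n = len(scores)
    xs = scores
    ys = [-math.log10(rank / n) for rank in range(1, n + 1)]

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all random scores are identical; cannot fit a line")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r2 = cov_xy / var_x
    r1 = mean_y - r2 * mean_x
    return r1, r2


def normalized_score(raw, r1, r2):
    """Apply the fitted linear rescaling to a raw profile score."""
    return r1 + r2 * raw


# Hypothetical stand-in for the raw scores of one profile
# against a randomized database.
rng = random.Random(0)
background = [rng.expovariate(0.1) for _ in range(5000)]

r1, r2 = fit_linear_rescaling(background)
print(normalized_score(20.0, r1, r2), normalized_score(80.0, r1, r2))
```

Under these assumptions, a higher normalized score corresponds to a raw score that is rarer among random matches, which is the sense in which such a score helps in judging significance.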


Domains and families covered by PROSITE and Pfam
Since PROSITE and Pfam are independent projects, there is some overlap between the two databases. Generally, Pfam focusses on 'classical' domains, with a high proportion of extracellular modules. In contrast, the PROSITE profile collection emphasizes domains in intracellular proteins, such as proteins involved in signal transduction, DNA repair, cell cycle regulation and apoptosis.


Go to the ProfileScan page