Information on profile databases available for the ISREC ProfileScan server
This server uses the pfscan program to search a
single sequence against currently available profile databases. All searchable
databases contain generalized profiles. If this is not the native format
of the database, the entries have been converted to the generalized profile
format.
-
PROSITE
-
Amos Bairoch's PROSITE database now includes
profile entries describing protein families and domains that are not easily
covered by PROSITE's own regular-expression-based pattern syntax. The
PROSITE profiles, which are constructed here at
the ISREC, are intended to be of particularly high quality. The process of
profile construction and iterative refinement is described in
this poster, presented at the 1995 ISMB meeting
in Cambridge.
The list of available profiles is relatively small at the moment, but many
more profiles are in the works and we continually update the list of PROSITE
profiles available on the server.
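
To make the contrast with profiles concrete, here is a minimal sketch of how a
pattern written in PROSITE's regular-expression-style syntax can be translated
into an ordinary Python regular expression. The function name
prosite_to_regex and the example pattern are illustrative assumptions, and the
sketch handles only the common constructs (x, [..], {..}, repeat counts and the
< and > anchors). A pattern of this kind gives a yes/no answer at each position,
whereas a profile assigns position-specific scores.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into Python regex syntax (sketch)."""
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        repeat = ""
        if element.endswith(")"):
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":  # any residue
            core = "."
        elif element.startswith("{"):  # forbidden residues
            core = "[^" + element[1:-1] + "]"
        else:  # a single residue or an [ABC] set of allowed residues
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Illustrative pattern in PROSITE syntax and a made-up test sequence.
example_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
example_regex = prosite_to_regex(example_pattern)
test_sequence = "MKCAACAAALAAAAAAAAHAAAHKK"
print(example_regex)
print(bool(re.search(example_regex, test_sequence)))
```

Families whose conserved positions tolerate many substitutions or variable
gaps quickly make such expressions unwieldy, which is the situation the
profile entries are meant to address.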
-
PROSITE Pre-release
-
We now also offer searches against our large collection of pre-release
profiles. These profiles will eventually go into PROSITE but, for various
reasons, are held up in the pipeline. The most frequent reason is a lack of
adequate documentation, which is an absolute requirement for any PROSITE entry.
However, the pre-release profiles are of the same quality as the final PROSITE
profiles. You can browse the available profiles and check the status of the
documentation project in this
list.
-
Pfam
-
Pfam is a collection of protein motifs and families initially compiled by
Erik Sonnhammer and
Sean Eddy. It is now maintained by the
Bioinformatics group at the Sanger Centre.
Pfam is subdivided into two parts. Pfam A (Release 2.0)
contains 527 high-quality families, generated using the HMMER hidden Markov
model package. Pfam B contains a number of protein families
automatically extracted from SwissProt and is currently not available for
searches on this server. More information is available on the
Pfam homepage at the Sanger Centre.
The ISREC ProfileScan server offers searches against two different versions
of the Pfam A collection. Both are converted directly from HMMER's interchange
format into the generalized profile format and contain the full information
of the corresponding HMMs. One set, called "PfamA bitscaled", generates
the same kind of scores as the HMMs themselves (apart from minor round-off
errors). The other set, called "PfamA NScore", has been rescaled in the
same way as the above-mentioned PROSITE profiles, by analysing the score
distribution obtained with a randomized database. This version of PfamA returns
NScores as a means of assessing the statistical
significance of a match.
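
As a rough illustration of rescaling against a randomized database, the
Python sketch below fits a linear function that maps raw scores to normalized
scores from the high-scoring tail of a set of random-match scores. It is not
the procedure used at the ISREC. The linear form, the definition of the target
value (minus log10 of the fraction of random sequences scoring at least as
high), the tail fraction, the simulated scores and all function names are
assumptions made for the example.

```python
import math
import random


def fit_linear_normalization(random_scores, tail_fraction=0.1):
    """Fit nscore = r1 + r2 * raw on the high-scoring tail of random scores.

    The target for each tail score is -log10 of the fraction of random
    sequences scoring at least that high (an assumed definition).
    """
    ranked = sorted(random_scores, reverse=True)
    total = len(ranked)
    tail = max(2, int(total * tail_fraction))
    xs = ranked[:tail]
    ys = [-math.log10((rank + 1) / total) for rank in range(tail)]

    # Ordinary least-squares fit of ys against xs.
    mean_x = sum(xs) / tail
    mean_y = sum(ys) / tail
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r2 = sxy / sxx
    r1 = mean_y - r2 * mean_x
    return r1, r2


def normalize(raw_score, r1, r2):
    """Apply the fitted linear normalization to a raw score."""
    return r1 + r2 * raw_score


# Simulated raw scores standing in for matches against shuffled sequences.
rng = random.Random(0)
simulated = [rng.expovariate(0.05) for _ in range(5000)]
r1, r2 = fit_linear_normalization(simulated)
for raw in (50.0, 150.0, 250.0):
    print(f"raw {raw:6.1f} -> normalized {normalize(raw, r1, r2):5.2f}")
```

Under these assumptions, a higher normalized score corresponds to a match that
is less likely to arise by chance, and scores from different profiles become
comparable on a common scale.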
-
Domains and families covered by PROSITE and Pfam
-
Since PROSITE and Pfam are independent projects, there is some overlap between
the two databases. Generally, Pfam focusses on 'classical' domains
with a high proportion of extracellular modules. In contrast, the
PROSITE profile collection emphasizes domains in intracellular proteins,
in particular proteins involved in signal transduction, DNA repair, cell
cycle regulation and apoptosis.