Performing a Search
1. Go to the Block Searcher sequence entry page. As with other tool pages, this page has a number of links to help and information. None of the default parameters need to be changed for this game.
2. The service can also email results to an address you enter. Do not use this option during the game; the sequences used generally return results within a minute.
3. Use the default Blocks+ database, which has a fairly large number of computer-generated blocks. The Block Searcher can also search the Prints database, which contains hand-crafted blocks, but Prints is less useful for the game because it is much smaller.
4. The server also has default parameters that set the cutoff E-value at 5, return block summaries with alignments, determine the type of sequence entered automatically (whether DNA or protein), set the program to search both DNA strands, and use the standard genetic code.
5. Copy the FASTA formatted sequence below into the sequence entry window, click the "Perform Search" button, and wait for the results.
>unknown protein
MTACGNVPIFKDGKGCGSCYEVRCKEKPECSGNPVTVFITDMNYEPI
APYHFDLSGKAFGSLAKPGLNDKLRHCGIMDVEFRRVRCKYPAGQKI
VFHIEKGCNPNYVAVLVKFVADDGDIVLMEIQDKLSAEWKPMKLSWG
AIWRMDTAKALKGPFSIRLTSESGKKVIAKDIIPANWRPDAVYTSNVQFY
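Before pasting, it can help to confirm that the record is well-formed FASTA and that it will be read as protein rather than DNA. The following is a minimal Python sketch of such a check. The function names (parse_fasta, guess_sequence_type) and the letter-set heuristic are assumptions made for illustration; they are not the method the Block Searcher server uses to determine the sequence type.

```python
# Illustrative pre-submission check; not part of the Block Searcher service.

QUERY = """>unknown protein
MTACGNVPIFKDGKGCGSCYEVRCKEKPECSGNPVTVFITDMNYEPI
APYHFDLSGKAFGSLAKPGLNDKLRHCGIMDVEFRRVRCKYPAGQKI
VFHIEKGCNPNYVAVLVKFVADDGDIVLMEIQDKLSAEWKPMKLSWG
AIWRMDTAKALKGPFSIRLTSESGKKVIAKDIIPANWRPDAVYTSNVQFY
"""

# Assumed alphabets: nucleotide codes (with N for "any base") and the
# 20 standard amino acids plus a few common ambiguity codes.
NUCLEOTIDES = set("ACGTUN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY") | set("BXZ")


def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must begin with a '>' header line")
    header = lines[0][1:].strip()
    # Sequence lines are joined so that line wrapping does not matter.
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def guess_sequence_type(sequence):
    """Crude heuristic: only nucleotide letters means DNA, else protein."""
    letters = set(sequence)
    if letters and letters <= NUCLEOTIDES:
        return "DNA"
    if letters and letters <= AMINO_ACIDS:
        return "protein"
    return "unknown"


header, sequence = parse_fasta(QUERY)
print(header, len(sequence), guess_sequence_type(sequence))
```

A query made up only of the letters A, C, G, T, U, and N would be reported as DNA by this heuristic, so a short protein that happens to use only those letters would be misclassified; the sketch is a sanity check, not a substitute for the server's own detection.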
Interpreting Results
1. The result page begins with an introduction that contains explanations about the databases used to build Blocks+ and the arrangement and meaning of the search results lower on the page. Please read this introduction for further information.
2. The next section, labelled "Hits", begins with the name and size of the query, the number of blocks searched, the number of alignments done and the cutoff E-values used.
3. Next are the hits, listed by family (a group of related proteins), each with a link to the database entry. Listed with each hit are the strand used for the alignment if the query was DNA (1 is always listed for protein), the number of blocks matched in that family out of the total number of blocks in the family, the anchor E-value, and the combined E-value. The anchor E-value is the E-value of the highest-scoring block, and the combined E-value is the statistical combination of the E-values of all similar blocks from that family.
4. For more than one block in a family to be included as a hit to the query, the blocks have to occur in the query in the same order and with approximately the same spacing as in the family. That is, the regions of the proteins that make up the blocks are in the same order in every protein of the family, and they must be in this same order in the query to be reported as hits.
5. The last part of the result page contains pairwise alignments of the hits with the query and also shows graphically the order of the blocks in the family.
6. The anchor E-values for high-scoring hits to blocks will be smaller than those seen with high-scoring hits in BLAST because blocks are very short sequences. However, the combined E-values for significant hits should be in the same range as, or just slightly lower than, the significant ranges found in BLAST. The following general conclusions can be drawn from the block hits and their E-values:
If the combined E-value of a hit is less than 1 × 10^-35, the query or its gene product is very likely to be related to the family of proteins from which the block(s) was built. This is further supported if the name/description of the hit is the same or similar to the name/description of the top BLAST hits for the query. This evidence supports the conclusion (used with evidence from other analysis tools) that the sample is a terrestrial contaminant.
If the combined E-value of a hit is between 0.01 and 1 × 10^-35, the query or its gene product is possibly related to the family of proteins from which the block(s) was built. This probability greatly increases if the name/description of the hit is the same or similar to the name/description of the top BLAST hits for the query.
If the combined E-value of a hit is between 1 and 0.01, the query or its gene product has a slight possibility of being related to the family of proteins from which the block(s) was built. Again, this probability greatly increases if the name/description of the hit is the same or similar to the name/description of the top BLAST hits for the query.
If the combined E-value of a hit is above 1, the query or its gene product is not likely to be related to the family of proteins from which the block(s) was built. This, along with evidence from other analysis tools, can support a conclusion of "Origin - extra-terrestrial."
7. The results from the analysis of the sequence above with Block Searcher show a top hit with a combined E-value of 1.8 × 10^-94 and a second hit with a combined E-value of 1.8 × 10^-32. Both hits have descriptions containing the words "Lol pI signature", with the first describing it as a major pollen allergen. The top BLAST hits for this sequence are all major pollen allergens. This sequence therefore indicates terrestrial contamination.
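The combined E-value ranges described in step 6 can be written as a small lookup, sketched here in Python. The function name and the returned labels are illustrative assumptions. The text does not say which category a value exactly equal to 0.01 or 1 belongs to, so placing each boundary value in the more significant category is also an assumption of this sketch.

```python
# Illustrative encoding of the combined E-value ranges from step 6.

def interpret_combined_evalue(e_value):
    """Map a combined E-value to the tutorial's relatedness category."""
    if e_value < 0:
        raise ValueError("an E-value cannot be negative")
    if e_value < 1e-35:
        return "very likely related"
    if e_value <= 0.01:  # boundary handling is an assumption
        return "possibly related"
    if e_value <= 1:     # boundary handling is an assumption
        return "slight possibility of being related"
    return "not likely related"


# The two combined E-values reported for the example query in step 7.
for e_value in (1.8e-94, 1.8e-32):
    print(e_value, "->", interpret_combined_evalue(e_value))
```

The category alone is not a verdict. As the steps above note, a match between the hit description and the top BLAST hits strengthens the conclusion, and the final call on origin should draw on evidence from the other analysis tools as well.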
© Copyright 2000 The Southwest Biotechnology and Informatics Center (SWBIC) / Regents of New Mexico State University. All rights reserved.