Performing a Search
1. Go to the ClustalW sequence entry page. Look over the page to become familiar with the information available and the data required for a search. Each entry window and drop-down menu title is a link to information about that part of the form. For the purposes of this game, none of the parameters need to be changed for a ClustalW multiple alignment.
2. When performing Clustal alignments outside of this game (http://www2.ebi.ac.uk/clustalw/), an email address needs to be entered only if the results are to be returned by email. This should only be necessary when aligning many sequences or very long sequences. The sequences provided in the game should align quickly (30 seconds to 1 minute), so no email address is needed.
3. If desired, a sequence title may be entered, but it is not necessary for the alignment to succeed.
4. If the alignment will only be viewed on a screen, a color alignment can be produced by selecting yes in the "Color Alignment" drop-down menu. For the game, do not change this option. This tutorial only explains the results obtained with a non-color alignment.
5. ClustalW can generate phylogenetic trees when a properly formatted alignment is used as input. Phylogenetics is beyond the scope of this game, but further information can be found elsewhere on the Internet.
6. The remaining parameters control how the program scores and builds the alignments. Their defaults have been optimized for general alignments, so they will be used unchanged.
7. Paste the sequences below into the sequence entry window. During the game, copy the preformatted sequences from the game notebook page "Reformated Sequences for ClustalW Input" into the sequence entry window instead. Remember that these sequences should be in concatenated FASTA format: one ">" header line per sequence, followed by the lines of that sequence.
>query
MKNTLLKLGVCVSLLGITPFVSTISSVQAERTVEHKVIKNETGTISISQLNKNVW
VHTELGYFSGEAVPS
TDVIITHAHADRIGGMKTLKERGIKAHSTALT
FGNMKVETFYPGKGHTEDNIVVWLPQYQILAGGCLVKSASSKDLGNV
NEWSTSIENVLKRYGNINLVVPGHGEVGDRGLLLHTLDLLK
>gi|2984094
MGGFLFFFLLVLFSFSSEYPKHVKETLRKITDRIYGVFGVYEQVSYENRGFISNAY
FYVADDGVLVVDALSTYKLGKELIESIRSVTNKPIRFLVVTHYHTDHFYGAKAFR
EVGAEVIAHEWAFDYISQPSSYNFFLARKKILKEHLEGTELTPPTITLTKNLNVYLQ
VGKEYKRFEVLHLCRAHTNGDIVVWIPDEKVLFSGDIVFDGRLPFLGSGNSRTWL
VCLDEILKMKPRILLPGHGEALIGEKKIKEAVSWTRKYIKDLRETIRKLYEEGCDVE
CVRERINEELIKIDPSYAQVPVFFNVNPVNAYYVYFEIENEILMGE
>gi|115023|sp|P10425|
MKKNTLLKVGLCVSLLGTTQFVSTISSVQASQKVEQIVIKNETGTISISQLNKNVW
VHTELGYFNGEAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTD
VIITHAHADRIGGITALKERGIKAHSTALTAELAKKSGYEEPLGDLQTVTNLKFGNTK
VETFYPGKGHTEDNIVVWLPQYQILAGGCLVKSAEAKNLGNVADAYVNEWSTSIE
NMLKRYRNINLVVPGHGKVGDKGLLLHTLDLLK
>gi|115030|sp|P25910|
MKTVFILISMLFPVAVMAQKSVKISDDISITQLSDKVYTYVSLAEIEGWGMVPSNGM
IVINNHQAALLDTPINDAQTEMLVNWVTDSLHAKVTTFIPNHWHGDCIGGLGYLQR
KGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTVSLDGMPLQCYYLGGGHATDNIV
VWLPTENILFGGCMLKDNQATSIGNISDADVTAWPKTLDKVKAKFPSARYVVPGH
GDYGGTELIEHTKQIVNQYIESTSKP
>gi|282554|pir||S25844
MTVEVREVAEGVYAYEQAPGGWCVSNAGIVVGGDGALVVDTLSTIPRARRLAEWV
DKLAAGPGRTVVNTH
RVDWGEIELRPPNVTFRDRLTLHVGERQVE
VVMSGVTPFALFGSVAGTLAALDRLAELEPEVVVGGHGPVAGP
QRLAADAVDRRLTPLQAARRADLGAFAGLLDAERLVANLHRAHEELLGGHVRDAM
EI
8. Click on the "Run ClustalW" button and wait for the results.
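Concatenated FASTA simply means several FASTA records placed one after another: each record starts with a ">" header line, and every line after it, up to the next ">", belongs to that one sequence. Line breaks inside a record do not matter, which is why the uneven line lengths in the example sequences above are harmless. The Python sketch below shows one way to build and check such input before pasting it; the function names to_fasta and parse_fasta and the 60-character line width are our own illustrative choices, not part of ClustalW or the game.

```python
def to_fasta(records, width=60):
    """Join (name, sequence) pairs into one concatenated FASTA string.

    The 60-character line width is an arbitrary choice; FASTA readers
    ignore line breaks inside a record.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        # Drop any whitespace and use upper-case residue letters.
        seq = "".join(seq.split()).upper()
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Split concatenated FASTA text back into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # continuation line of the current record
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    text = ">query\nMKNT\nLL\n>gi|2984094\nMGGF\n"
    print(parse_fasta(text))
```

For example, parse_fasta(">query\nMKNT\nLL\n>gi|2984094\nMGGF\n") returns [("query", "MKNTLL"), ("gi|2984094", "MGGF")]: the two short lines of the first record are joined into one sequence.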
Interpreting Results
1. Remember that, unlike the pairwise alignments in BLAST, which align only the best-matching segments of the hit and query sequences, ClustalW finds the best alignment over the full length of each submitted sequence. The sequences formatted by the game are the full-length sequences of the query and hits as they appear in GenBank or GenPept.
2. The first section of the ClustalW Results page contains two buttons. One returns the user to the sequence entry page to run another analysis. The other (labelled JalView) opens JalView, a Java applet alignment editor. Because not all game users may have Java-enabled browsers, the game does not support this option and concentrates on the alignment returned in the user's browser window. (JalView displays the ClustalW results as a color alignment with a histogram showing the degree of similarity among all the sequences. This alignment can be edited for publication.)
3. The next section of the results page (labelled "Pairwise Scores") gives some information about how the program ran. It lists the names and lengths of the sequences read, and the score obtained when each sequence was aligned pairwise with every other sequence.
4. The section labelled "Your guide tree:" shows the guide tree, which can be used to construct a phylogenetic tree. The alignment is built by first aligning the two most similar sequences and then adding the remaining sequences in descending order of similarity, following the branching order of the guide tree.
5. The next section shows the actual multiple alignment of the input sequences. Each block of lines shows the sequence names, each followed by 50 characters of that sequence (residues, or dashes for gaps). The next block shows the next 50 positions of the alignment, and so on.
6. Below each block of aligned protein sequences there is a line containing the characters ., :, and *. These characters mean the following:
* = this column of the alignment contains identical amino acid residues in all sequences (or identical bases if DNA sequences are aligned)
: = this column of the alignment contains different but highly conserved (very similar) amino acids
. = this column of the alignment contains different amino acids that are somewhat similar
blank = this column of the alignment contains dissimilar amino acids or gaps (or different bases if DNA sequences are aligned)
7. With aligned DNA sequences, only the * character appears; it means that the same base is found at that position in all sequences in the alignment.
8. There are many uses for ClustalW alignments, but for this game (review number 1 in this section) the alignment provides three main pieces of information:
If a fair proportion (at least 30-40%) of the alignment columns show similarity (., :, or *) spread over the full lengths of the sequences, the sequences are likely to be related.
If a larger proportion (at least 50-60%) of the columns show similarity within one or more segments of the alignment, the sequences are likely to share one or more functional domains.
If there are few similarities (less than 20-25%) in the alignment, the sequences are not likely to be functionally related, and the hits seen in the BLAST output are most likely chance similarities to different parts of the query sequence. This does not prove that the query sequence is unrelated to every hit sequence; it only means that the query is not likely to be functionally related to any known terrestrial family of proteins.
9. For the DNA sequence multiple alignment performed for this tutorial, around 70% of the nucleotides are identical, spread over the full lengths of the sequences. This indicates that all of the sequences are related and, if these sequences were top BLAST hits, that the sample is likely to be a terrestrial contaminant.
10. Go to this ClustalW result page to see an alignment that illustrates the last possibility in number 8 above.
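Because the alignment described in step 5 is printed in blocks, each complete aligned sequence has to be rebuilt by joining its pieces from every block. The Python sketch below assumes the plain-text CLUSTAL layout described above: a header line starting with "CLUSTAL", then blocks of "name  residues" lines, with the symbol line indented under each block. The function name parse_clustal_blocks and the sample text are made up for illustration.

```python
def parse_clustal_blocks(text):
    """Rebuild full aligned sequences from block-formatted CLUSTAL text."""
    pieces = {}  # name -> list of chunks, in order of first appearance
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # header line or blank separator between blocks
        if line[0].isspace():
            continue  # indented symbol line (*, :, .) under a block
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]  # ignore any trailing residue count
        pieces.setdefault(name, []).append(chunk)
    return {name: "".join(chunks) for name, chunks in pieces.items()}


# Made-up two-block sample in the layout described in step 5.
SAMPLE = """CLUSTAL W multiple sequence alignment

seqA      MKNT-LL
seqB      MKST-LL
          **.* **

seqA      KLGV
seqB      KIGV
          *:**
"""

if __name__ == "__main__":
    for name, seq in parse_clustal_blocks(SAMPLE).items():
        print(name, seq)
```

Run on the sample, this prints seqA MKNT-LLKLGV and seqB MKST-LLKIGV. Sequence names such as gi|115023|sp|P10425| contain no spaces, so splitting on whitespace keeps them intact.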
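The symbol line described in steps 6 and 7 can be approximated column by column. The Python sketch below assumes the "strong" and "weak" residue groups that, to our knowledge, the Clustal documentation lists for the : and . symbols; treat these group lists as an assumption and check them against the ClustalW version you use. Columns containing a gap are left blank, and with is_dna=True only * is ever produced, as step 7 describes. The names column_symbol and conservation_line are hypothetical.

```python
# Residue groups used for ':' (strong) and '.' (weak); assumed to match the
# lists in the Clustal documentation, so verify against your ClustalW version.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column, is_dna=False):
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # any gap leaves the column blank
    if len(residues) == 1:
        return "*"  # identical residue (or base) in every sequence
    if is_dna:
        return " "  # DNA alignments only ever use '*'
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall in one weak group
    return " "


def conservation_line(aligned, is_dna=False):
    """Build the symbol line for equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol("".join(col), is_dna) for col in zip(*aligned))


if __name__ == "__main__":
    print(repr(conservation_line(["MKNT-LL", "MKST-LL"])))  # '**.* **'
```

In the example, N and S differ but share a weak group, so their column gets ".", and the gap column is left blank.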
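The rules of thumb in step 8 can be turned into a rough score once a symbol line is available. This hypothetical Python sketch counts the fraction of columns marked *, :, or . over the whole alignment and within the best-scoring stretch of columns. The 50-column window and the cut-offs (the lower ends of the 30-40%, 50-60%, and 20-25% ranges given above) are assumptions, and the verdict is only a hint, not a substitute for reading the alignment itself.

```python
def similarity_fraction(symbols):
    """Fraction of columns marked '*', ':' or '.' in a symbol line."""
    if not symbols:
        raise ValueError("empty symbol line")
    return sum(1 for c in symbols if c in "*:.") / len(symbols)


def best_window_fraction(symbols, window=50):
    """Highest similarity fraction in any run of `window` columns."""
    window = min(window, len(symbols))
    return max(similarity_fraction(symbols[start:start + window])
               for start in range(len(symbols) - window + 1))


def rough_verdict(symbols, window=50):
    """Apply the step 8 rules of thumb using hypothetical cut-offs."""
    overall = similarity_fraction(symbols)
    if overall >= 0.30:
        return "likely related over the full length"
    if best_window_fraction(symbols, window) >= 0.50:
        return "may share one or more functional domains"
    if overall < 0.25:
        return "probably not functionally related"
    return "unclear; inspect the alignment by eye"


if __name__ == "__main__":
    # 30 identical columns followed by 170 dissimilar ones.
    print(rough_verdict("*" * 30 + " " * 170))
```

In the example only 15% of all columns are marked, but the first 50 columns are 60% marked, so the sketch reports a possible shared domain rather than an unrelated pair. Checking the full-length rule first mirrors the order of the three cases listed in step 8.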
© Copyright 2000 The Southwest Biotechnology and Informatics Center (SWBIC) / Regents of New Mexico State University. All rights reserved.