Question
Part III: FASTA Algorithm Instructions: 1. Go to https://www.ebi.ac.uk/Tools/sss/ and select FASTA, then protein. 2. Choose UniProtKB/TrEMBL as your database (step 1) 3. Paste in your STS protein query sequence (step 2) 4. Click More options... on step 3 and run your queries for the following choices of parameters: Sco

Found 10 similar results for your question:

Part III: Maximum parsimony methods
1. What is the key assumption of maximum parsimony methods?
2. How does this differ from distance matrix methods?
3. What are the advantages of maximum parsimony methods?
4. What are the disadvantages of maximum parsimony methods?

Part 1: Smith-Waterman Algorithm
Instructions:
1. Copy the "STS protein query sequence" from the week 3 links page. (This is the one-letter code for the protein encoded by the resveratrol synthase gene; the enzyme catalyzes the final step in the resveratrol synthesis pathway.) Be sure to include the ">STS query" on the first line, or the program won't accept it.
2. Go to https://www.ebi.ac.uk/Tools/sss/ and choose SSEARCH, then "protein". This will do a Smith-Waterman local alignment on your protein sequence.
3. Choose the UniProtKB/TrEMBL database (under Step 1 on the page).
4. Paste the STS protein query sequence into the paste window.
5. Choose SSEARCH under step 3, then click "More options". Here you will find a number of parameters, including the substitution matrix. Search UniProt using three different substitution matrices:
   • BLOSUM 50
   • PAM 120
   • PAM 250
   Be patient, the calculation will take a while.
Questions:
1. What is the name and score of the best hit for each matrix?
2. What is the e-value of the best hit for each matrix?
3. Why do you think the results may have changed?
4. What do e-values mean and how do we interpret them?
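For orientation, the local-alignment recurrence behind SSEARCH can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function name `smith_waterman_score` is hypothetical, scoring uses a flat match/mismatch value instead of a BLOSUM or PAM matrix, and gaps carry a single linear penalty instead of separate open and extend costs. It is not how the EBI service is implemented.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Assumptions (not from the assignment): flat match/mismatch scores
    and a linear gap penalty, chosen only to keep the sketch short.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core "ACGT" scores 4 * 2 = 8; the unrelated flanks are ignored.
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8
```

Because every cell is floored at zero, poorly matching regions do not drag the score down, which is why a local search can report a strong hit on a single conserved domain.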

Part I: Learning about molecular phylogenies
1. What is the basic assumption underlying a molecular phylogeny?
2. Why must we distinguish between gene trees and species trees?
3. Why don't genes always evolve by a series of bifurcations (i.e., by a series of single base changes)?
4. What are the four steps to constructing a molecular phylogeny?
5. What is an orthologous sequence?
6. What is a paralogous sequence?
7. What is a xenologous sequence?
8. Which type of sequences should you use for a species phylogeny?
9. What is the difference between multiple sequence alignments to discover motifs, etc., vs for constructing phylogenies?
10. Why is Clustal W not a very good choice for constructing species phylogenies?
11. Please use the supplemental material on the links page to answer the following questions.
   - What is a phylogenetic tree composed of?
   - What is the difference between rooted and unrooted phylogenetic trees?
   - What are the two major groups of analyses used to examine phylogenetic relationships?
   - What is a paraphyletic grouping?
   - What happens if a multiple alignment is poor?
   - What is the best way to deal with parts of an alignment that are uncertain due to gaps?
   - What sorts of phylogenies are best constructed using DNA sequence alignments?
   - What sorts of phylogenies are best constructed using protein alignments?
   - What sorts of phylogenies are best constructed using ribosomal RNA sequence alignments?
   - What is a homoplasy?
   - Why can't we simply construct all possible trees, score each one, then pick the one with the best score?

Part V: Tree evaluation
1. What are the three basic ways to resample the data for tree-building?
2. What is jackknife resampling?
3. What is bootstrap resampling?
4. How does it differ from jackknife resampling?
5. Recreate one of your ML trees except use Bootstrap Resampling as a method of tree evaluation.
6. How did your tree change?
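The two resampling schemes named above can be sketched on a toy alignment. This is a minimal illustration, not what MEGA does internally: the function names, the dictionary representation of an alignment, and the `keep_fraction` parameter are all assumptions made for the example. Bootstrap draws alignment columns with replacement until the original length is reached; jackknife keeps a subset of columns without replacement.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns WITH replacement, keeping the length.

    alignment: dict mapping sequence name -> aligned string (equal lengths).
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def jackknife_columns(alignment, rng, keep_fraction=0.5):
    """Keep a random subset of columns WITHOUT replacement (hypothetical 50% default)."""
    length = len(next(iter(alignment.values())))
    kept = sorted(rng.sample(range(length), int(length * keep_fraction)))
    return {name: "".join(seq[c] for c in kept) for name, seq in alignment.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
toy = {"seq1": "ABCDEFGH", "seq2": "abcdefgh"}
print(bootstrap_columns(toy, rng))
print(jackknife_columns(toy, rng))
```

In a real analysis, a tree is rebuilt from each pseudo-replicate and the support for a branch is the fraction of replicates in which that branch appears.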

Part II: Distance matrix methods
1. Answer the following questions:
   a. What is the general approach used by distance matrix methods to construct a phylogeny?
   b. What are the main differences between UPGMA and neighbor-joining methods?
2. Take your protein sequences from the links page and import them into MEGA.
3. Align them via Clustal W and save the alignment.
4. Use your alignment to construct UPGMA and Neighbor-Joining trees.
   a. What are the differences and similarities between the trees?
5. Repeat the above with an alignment based on MUSCLE instead of Clustal W.
   a. How does this change the results?
   b. Why do you think the results are different?
6. Include labeled screenshots of your different trees.

Part II: Needleman-Wunsch Algorithm
Instructions:
1. Go to https://www.ebi.ac.uk/Tools/sss/ and select GGSEARCH, then protein. This will do a Needleman-Wunsch global alignment on your protein sequence.
2. Choose UniProtKB/TrEMBL as your database (step 1).
3. Paste in your STS protein query sequence (step 2).
4. Click More options... on step 3, choose BLOSUM50 as your scoring matrix, and then click Submit.
Questions:
1. What is the name and score of your top hit?
2. How do these results differ from what you got with the Smith-Waterman algorithm (SSEARCH)?
3. Why do you think the results differed?
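As a rough illustration of what "global" means here, the Needleman-Wunsch recurrence can be sketched as follows. The function name `needleman_wunsch_score` is hypothetical, and the flat match/mismatch scores and linear gap penalty are simplifying assumptions; GGSEARCH itself uses a substitution matrix such as BLOSUM50 with separate gap open and extend penalties.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Assumptions (not from the assignment): flat match/mismatch scores
    and a linear gap penalty, chosen only to keep the sketch short.
    """
    # First row: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # No floor at zero: every residue must be accounted for.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]  # score of aligning all of a against all of b


# Four matches (+8) plus four forced flank mismatches (-4).
print(needleman_wunsch_score("XXACGTXX", "YYACGTYY"))  # 4
```

Since the whole of both sequences must be aligned end to end, unrelated flanking regions lower the score, which is one reason a global search can rank hits differently from a local one.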

3. RNAFold also uses partition-function methods (AKA thermodynamic ensemble methods). What is a partition function and how is it related to free energy? What are the advantages and disadvantages of calculating partition functions?
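For reference, the standard statistical-mechanics relations behind this question can be written as below. The symbols are assumptions introduced for the illustration: $\mathcal{S}$ is the set of secondary structures the sequence can form, $\Delta G(s)$ is the free energy of structure $s$, $R$ is the gas constant, and $T$ is the absolute temperature.

```latex
% Partition function: Boltzmann-weighted sum over all structures
Z = \sum_{s \in \mathcal{S}} e^{-\Delta G(s) / RT}

% Free energy of the whole ensemble
\Delta G_{\text{ensemble}} = -RT \ln Z

% Equilibrium probability of one particular structure
P(s) = \frac{e^{-\Delta G(s) / RT}}{Z}
```

Under these definitions, the minimum free energy structure is only the single most probable member of the ensemble, and its probability $P(s)$ can still be small when many alternative structures have similar energies.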

Recombinant DNA You have received a PCR product from an unknown organism that caused fever and a strange rash in young child. The family of the child has two kittens, and the physician treating the child suspects an infection with Bartonella. Strangely, the growth characteristics do not match those of known Bartonella species. The clinical lab has amplified an 825 base pair fragment of the unknown organism's genome and has asked you to use the PCR product to investigate the identity of the unknown bacteria. Once you received the DNA you decided to have the PCR product sent out for sequencing. Bioinformatics 1) Using a Blastn search determine the top 3 most similar DNA sequences and using Blastx determine the top 3 most similar proteins. What gene did the clinical lab PCR amplify? • What is the function of the protein? • Based on the results from this gene what organism did you isolated? And what species is your organism most similar to?

1. For BLAST/FASTA tell us how many significant results were found, and which sequences were most closely related and who they came from. Identify and try to explain any unexpected similarities and any differences between the searches using DNA versus amino acid sequences.

Part III: FASTA Algorithm
Instructions:
1. Go to https://www.ebi.ac.uk/Tools/sss/ and select FASTA, then protein.
2. Choose UniProtKB/TrEMBL as your database (step 1).
3. Paste in your STS protein query sequence (step 2).
4. Click More options... on step 3 and run your queries for the following choices of parameters:

   Scoring Matrix   Gap Open   Gap Extend
   BLOSUM 50        -10        -2
   BLOSUM 80        -10        -2
   BLOSUM 80        0          0
   BLOSUM 80        -64        -16

Questions:
1. Did these calculations take as long as the Smith-Waterman search? If so, why?
2. Were the results different from the Smith-Waterman search? Why do you think this happens?
3. What is the effect of changing to a higher cutoff BLOSUM matrix (i.e. from 50 to 80) and what does it mean?
4. What is the effect of changing the Gap Open and Gap Extend parameters? Why do you think you observed what you did?