Question

Part II: Finding prokaryotic genes

1. Go to FGENESB - Bacterial Operon and Gene Prediction (softberry.com).

2. Select "Escherichia coli K-12" as the closest organism, then paste the E. coli query into the window and click "process."
• How many genes does it find?
• How many are on the + strand?
• How many are on the - strand?

Open a new window and go to https://www.ncbi.nlm.nih.gov/nuccore/NC_000913.3, the GenBank flatfile for the Escherichia coli str. K-12 substr. MG1655 genomic sequence. Under "Change region shown," choose "selected region" from "begin" to "10000," then click "update view" to see the annotation for this region.
• Did FGENESB find any genes not listed on the GenBank flatfile?
• Do the starts and stops agree with those listed on the GenBank flatfile?
• If not, which ones are different?

3. Now go to https://en.wikipedia.org/wiki/GeneMark
• How does GeneMark find genes?

4. Now go to http://opal.biology.gatech.edu/GeneMark/gmhmmp.cgi
• Paste the E. coli sequence into the window, then select species "Escherichia coli_K_12_substr_MG 1655." Output format should be "GFF." Output options should be "PDF." Then click "Start GeneMark.HMM™."
• When the output comes up, click on the link to "coordinates of predicted genes."
• How many genes does it find? How many are on the + strand? How many are on the - strand?
• Which genes were not listed on the GenBank flatfile?
• Do the starts and stops of the genes they both found agree with those listed on the GenBank flatfile? If not, which ones are different?
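The counting and comparison questions above can also be checked with a short script instead of by eye. The following is a minimal sketch, not part of the assignment. It assumes the predictions are available as tab-separated GFF text with nine columns and one "CDS" row per gene, and that the GenBank annotation has been typed in by hand as (left, right, strand) tuples. The function names and all coordinates below are hypothetical toy values, not real FGENESB, GeneMark, or GenBank output. Genes are paired by strand and stop coordinate, which is one common convention (an assumption here), so that a prediction with a different start still pairs with its annotated gene.

```python
from collections import Counter

def parse_gff(text, feature_type="CDS"):
    """Return (left, right, strand) tuples for rows of the given feature type."""
    genes = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comment/header lines
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # skip malformed rows and other feature types
        genes.append((int(cols[3]), int(cols[4]), cols[6]))
    return genes

def start_coord(gene):
    """Translation start: left end on the + strand, right end on the - strand."""
    left, right, strand = gene
    return left if strand == "+" else right

def stop_coord(gene):
    """Translation stop: right end on the + strand, left end on the - strand."""
    left, right, strand = gene
    return right if strand == "+" else left

def compare(predicted, reference):
    """Split predictions into exact matches, same-stop/different-start, and unmatched."""
    ref_by_stop = {(g[2], stop_coord(g)): g for g in reference}
    same, different_start, not_in_reference = [], [], []
    for gene in predicted:
        ref = ref_by_stop.get((gene[2], stop_coord(gene)))
        if ref is None:
            not_in_reference.append(gene)
        elif start_coord(ref) == start_coord(gene):
            same.append(gene)
        else:
            different_start.append((gene, ref))
    return same, different_start, not_in_reference

# Toy data only: these rows and coordinates are invented for illustration.
TOY_GFF = "\n".join([
    "##gff-version 2",
    "toy_seq\ttoy_predictor\tCDS\t100\t400\t.\t+\t0\tgene_id=1",
    "toy_seq\ttoy_predictor\tCDS\t450\t900\t.\t+\t0\tgene_id=2",
    "toy_seq\ttoy_predictor\tCDS\t1000\t1600\t.\t-\t0\tgene_id=3",
    "toy_seq\ttoy_predictor\tCDS\t2000\t2300\t.\t+\t0\tgene_id=4",
])
TOY_REFERENCE = [(100, 400, "+"), (480, 900, "+"), (1000, 1600, "-")]

predicted = parse_gff(TOY_GFF)
counts = Counter(strand for _, _, strand in predicted)
same, different_start, not_in_reference = compare(predicted, TOY_REFERENCE)

print("total genes:", len(predicted))
print("+ strand:", counts["+"], " - strand:", counts["-"])
print("same start and stop:", same)
print("same stop, different start:", different_start)
print("not in reference:", not_in_reference)
```

With the toy input, the script reports four predicted genes (three on the + strand, one on the - strand), one gene whose start differs from the reference, and one prediction with no reference counterpart. To use it on real results, replace TOY_GFF with the saved GFF output and TOY_REFERENCE with the CDS coordinates read from the GenBank flatfile, writing a complement(a..b) entry as (a, b, "-").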

Fig: 1