
Part II: Finding prokaryotic genes

1. Go to FGENESB - Bacterial Operon and Gene Prediction (softberry.com)

2. Select "Escherichia coli K-12" as the closest organism, then paste the E. coli query sequence into the window and click "process".

• How many genes does it find?

• How many are on the + strand?

• How many are on the - strand?

Open a new window and go to https://www.ncbi.nlm.nih.gov/nuccore/NC_000913.3, the GenBank flatfile for the Escherichia coli str. K-12 substr. MG1655 genomic sequence.

You will need to open "Change region shown", set "Selected region" from "begin" to "10000", then click "Update View" to see the annotation for this region.

• Did FGENESB find any genes not listed on the GenBank flatfile?

• Do the starts and stops agree with those listed on the GenBank flatfile?

• If not, which ones are different?
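One common convention when comparing predictions with an annotation is to treat two calls as the same gene when they share a strand and a stop codon; a shared stop with a different start is then a start-site disagreement. The Python sketch below assumes each gene has been copied by hand as its left and right coordinates plus strand (GenBank writes minus-strand genes as `complement(left..right)`). The `Gene` record, the `compare` function, and all coordinates are hypothetical illustrations, not FGENESB or GenBank output.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical record: 1-based coordinates with left <= right, strand "+" or "-"."""
    left: int
    right: int
    strand: str

    @property
    def start(self) -> int:
        # The start codon sits at the left end on "+" and at the right end on "-".
        return self.left if self.strand == "+" else self.right

    @property
    def stop(self) -> int:
        # The stop codon sits at the opposite end from the start.
        return self.right if self.strand == "+" else self.left


def compare(predicted, reference):
    """Match genes by (strand, stop); return matches, start disagreements, and unmatched genes."""
    ref = {(g.strand, g.stop): g for g in reference}
    pred = {(g.strand, g.stop): g for g in predicted}
    matched = [(p, ref[key]) for key, p in pred.items() if key in ref]
    start_differs = [(p, r) for p, r in matched if p.start != r.start]
    only_predicted = [p for key, p in pred.items() if key not in ref]
    only_reference = [r for key, r in ref.items() if key not in pred]
    return matched, start_differs, only_predicted, only_reference


# Made-up coordinates purely for illustration.
reference = [Gene(100, 400, "+"), Gene(500, 900, "-"), Gene(1000, 1300, "+")]
predicted = [Gene(130, 400, "+"), Gene(500, 900, "-"), Gene(1400, 1700, "-")]

matched, start_differs, only_predicted, only_reference = compare(predicted, reference)
print("same gene, different start:", start_differs)
print("predicted but not annotated:", only_predicted)
print("annotated but not predicted:", only_reference)
```

Keying on the stop coordinate rather than the start is deliberate: gene finders tend to disagree about start codons far more often than about stop codons, so matching on stops separates "different gene" from "same gene, different start".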

3. Now go to https://en.wikipedia.org/wiki/GeneMark

• How does GeneMark find genes?
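In the published GeneMark approach, protein-coding DNA is modeled with three-periodic (inhomogeneous) Markov chains, where the probability of each nucleotide depends on the preceding bases and on its position within a codon, while non-coding DNA is modeled with a homogeneous Markov chain; GeneMark.hmm embeds such models in a hidden Markov model. The Python sketch below illustrates only the core scoring idea under simplifying assumptions: a low chain order, add-one pseudocounts, toy training sequences invented for the example, and uppercase A/C/G/T input. The function names are hypothetical and this is not GeneMark's implementation, which uses higher-order chains with parameters trained on real genomes.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train(seqs, k, periodic):
    """Estimate P(base | previous k bases[, codon position]) with add-one pseudocounts."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))
    for s in seqs:
        for i in range(k, len(s)):
            # Periodic (coding) model: condition on codon position i % 3.
            # Homogeneous (non-coding) model: a single position class, 0.
            pos = i % 3 if periodic else 0
            counts[(pos, s[i - k:i])][s[i]] += 1
    return dict(counts)


def log_prob(model, seq, k, periodic, frame=0):
    """Log-likelihood of seq; frame is the codon position of seq[0] for periodic models."""
    total = 0.0
    for i in range(k, len(seq)):
        pos = (i + frame) % 3 if periodic else 0
        row = model.get((pos, seq[i - k:i]))
        # Unseen context: fall back to a uniform distribution over ACGT.
        p = row[seq[i]] / sum(row.values()) if row else 1 / len(ALPHABET)
        total += math.log(p)
    return total


def coding_score(window, coding, noncoding, k):
    """Return (best_frame, log-odds of coding vs. non-coding) over the three frames."""
    background = log_prob(noncoding, window, k, periodic=False)
    scores = [log_prob(coding, window, k, periodic=True, frame=f) - background
              for f in range(3)]
    best = max(range(3), key=scores.__getitem__)
    return best, scores[best]


# Toy training data, invented for illustration only.
K = 1
coding = train(["GCTGCTGCTGCTGCTGCTGCT"], K, periodic=True)
noncoding = train(["ATATATATATAT"], K, periodic=False)

print(coding_score("GCTGCTGCTGCT", coding, noncoding, K))  # frame 0, positive log-odds
print(coding_score("ATATATATATAT", coding, noncoding, K))  # negative log-odds
```

A positive log-odds means the window looks more like the coding model than the background; the frame with the highest score suggests the reading frame. GeneMark.hmm goes further by using an HMM to choose gene boundaries across the whole sequence rather than scoring fixed windows.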

4. Now go to http://opal.biology.gatech.edu/GeneMark/gmhmmp.cgi

• Paste the E. coli sequence into the window, then select the species "Escherichia coli_K_12_substr_MG1655". Set the output format to "GFF" and the output options to "PDF", then click "Start GeneMark.HMM™".

• When the output comes up, click the link to "coordinates of predicted genes".

• How many genes does it find?

• How many are on the + strand?

• How many are on the - strand?

• Which genes were not listed on the GenBank Flatfile?

• Do the starts and stops of the genes that both programs found agree with those listed on the GenBank flatfile? If not, which ones are different?
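GFF output is tab-separated with one feature per line; the start, end, and strand are in columns 4, 5, and 7, and lines beginning with "#" are comments. Counting the strand column gives the + and - totals. A minimal Python sketch, assuming the GeneMark.hmm output has been saved as text: `count_strands` is a hypothetical helper, the sample lines are made up, and the "CDS" feature name is an assumption, so check the actual third column of your output before filtering on it.

```python
def count_strands(gff_text, feature=None):
    """Count GFF feature lines per strand; optionally keep only one feature type (column 3)."""
    counts = {"+": 0, "-": 0}
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a nine-column GFF feature line
        if feature is not None and cols[2] != feature:
            continue
        if cols[6] in counts:
            counts[cols[6]] += 1
    return counts


# Hypothetical GeneMark.hmm-style output; real coordinates and attributes will differ.
sample = "\n".join([
    "# made-up header line",
    "seq\tGeneMark.hmm\tCDS\t100\t400\t.\t+\t0\tgene_id 1",
    "seq\tGeneMark.hmm\tCDS\t500\t900\t.\t-\t0\tgene_id 2",
    "seq\tGeneMark.hmm\tCDS\t1000\t1300\t.\t+\t0\tgene_id 3",
])

counts = count_strands(sample, feature="CDS")
print(counts, "total:", sum(counts.values()))
```

Filtering on one feature type avoids double counting if the output lists both a gene line and a CDS line for each prediction.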
