DNA as a Language
IGS 350/550 Computer Laboratory
M. Rice / M. Weir
Illustrated again are the cDNA and genomic DNA sequences of the Drosophila engrailed gene, and an alignment of these two sequences. [engrailed is a gene required for embryo development.]
- What is the definition of an open reading frame (ORF)? [Think of a definition involving the start codon ATG, and the stop codons TAA, TAG, and TGA.]
- [Homework due next Friday] Write pseudocode for a program that identifies all ORFs of a DNA string. Convert your pseudocode to a program, e.g. in C++ or Python. Run your program on the engrailed cDNA and genomic sequences [you might initially work with the first 200 bases of the sequence]. Your group should hand in a print-out of the output for the engrailed cDNA.
For comparison, an example of a function that identifies all the ATG codons in a DNA sequence is illustrated in C++ and in Python.
- Compare your output with the output of the ORF finder at NCBI using the same sequences. You can click on the ORF boxes to display the ORF amino acids.
- Here is the actual ORF used for the Engrailed protein. Use the codon tables to work out by hand the first ten amino acids of Engrailed. (See jkimball for a codon table and a description of translation.)
- Why are other ORFs of the engrailed cDNA not actually used to make the Engrailed protein? For example, there are many copies of the start codon ATG (corresponding to methionine) within the long ORF.
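The question above turns on how many ATG codons a sequence contains. As a minimal sketch (not the course's own example function, which is not reproduced here), the following Python function lists every ATG in a DNA string, in all three reading frames. The function name find_atg_positions and the toy sequence are assumptions made for illustration; the toy sequence is not part of engrailed.

```python
def find_atg_positions(dna):
    """Return the 0-based index of every ATG in dna, in any reading frame."""
    dna = dna.upper()
    positions = []
    start = dna.find("ATG")
    while start != -1:
        positions.append(start)
        # Resume one base further on so that every occurrence is reported.
        start = dna.find("ATG", start + 1)
    return positions


if __name__ == "__main__":
    toy = "ATGAAATGCATG"  # made-up sequence, for illustration only
    for pos in find_atg_positions(toy):
        # pos % 3 is the reading frame (0, 1, or 2) of this ATG.
        print(pos, "frame", pos % 3)
```

Grouping the reported positions by pos % 3 shows which ATG codons lie in the same frame as the long ORF and which fall in the other two frames.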
Kozak (for many organisms) and Cavener (for Drosophila) [see assigned background reference: Cavener, D.R. (1987) Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res. 15:1353-1361] have examined the frequencies of different bases at positions near the known start (ATG) codons of large numbers of proteins. In particular, Cavener collected the sequences that occur in the 10 positions upstream (5') (-10 to -1) of the AUG that initiates translation for many fly genes. Here is a partial listing of these sequences.
- [Optional] Calculate the frequency of occurrence of each nucleotide at each position using the Cavener data. Compare the 10 bp upstream of the actual translation start (AUG) of engrailed with the 10 bp upstream of each internal AUG codon. (The 10 bp immediately upstream of the actual AUG used for initiating translation are: GTCGAAACCA.)
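The frequency calculation described above can be sketched in Python as follows. This is one possible approach, not a prescribed solution: the function name position_frequencies is an assumption, and the four short sequences in the example are made-up placeholders, not entries from the Cavener listing. In practice each input would be a 10-base upstream sequence (positions -10 to -1).

```python
from collections import Counter


def position_frequencies(sequences):
    """Return one dict per position, mapping each base to its frequency.

    All sequences must have the same length. U is treated as T so that
    RNA-style and DNA-style listings can be mixed.
    """
    if not sequences:
        return []
    cleaned = [s.upper().replace("U", "T") for s in sequences]
    length = len(cleaned[0])
    if any(len(s) != length for s in cleaned):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in cleaned)
        # Divide by the number of sequences; symbols other than A, C, G, T
        # count toward the total but are not reported.
        freqs.append({b: counts.get(b, 0) / len(cleaned) for b in "ACGT"})
    return freqs


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA", "ACTA"]  # made-up placeholder data
    for offset, column in enumerate(position_frequencies(toy)):
        print(offset - len(toy[0]), column)  # label positions -4 .. -1
```

With the frequencies in hand, the bases of GTCGAAACCA, and of the 10 bp upstream of each internal AUG, can be looked up position by position to see which context resembles the Drosophila pattern more closely.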
Copyright 2005 Wesleyan University