An Introduction to Drosophila Genomics
In this lab, you will get a chance to integrate different concepts
of GENE and MAP. You have considered genes variously as genetically
mappable units, information residing on chromosomes, that which
underlies phenotypes, scorable markers, historically discovered
mutants, translatable coding regions, clonable segments of DNA, and
sequenced strings of nucleotides. (You can probably add more.)
Today, you will choose one of the markers you have scored in
Drosophila, and follow it through various databases to find
out:
- its cytological map position
- its genetic map position
- what is known about alleles of the gene
- what other genes are known to be nearby
- the gene's DNA sequence, and how it would be cut by a
restriction enzyme (useful if you wanted to clone it into a
vector)
- the gene's protein sequence (by conceptual translation)
- whether there are recognizable features in the sequence, and
what their likely functions are
- whether there are similar sequences (implying homologous
genes) in other species, and what is known about these potential
homologs.
At the end of this exercise, you will know more than you probably
ever wanted to know about the gene you picked, but also have a feel
for the difference between kinds of maps, the power of databases, and
the strengths and weaknesses of some biological database searching
tools.
Choose a partner and one of the Drosophila genes below, and
let's get started.
| brown (bw) | forked (f) | vermilion (v) | white (w) |
| cinnabar (cn) | Stubble (Sb) | scarlet (st) | eyeless (ey) |
The major resource you will use is FlyBase, the on-line
incarnation of what used to be known as The Redbook - a big red book
called "Genetic Variations of Drosophila melanogaster" which
listed all known Drosophila mutants, and what was known about
them. The FlyBase database has all of the original Redbook
information, plus molecular information and links to other databases.
There are links directly to relevant sections of the Berkeley
Drosophila Genome Project, which has genomic sequence, cloned
and sequenced cDNAs, and annotation of predicted genes; and to
searches of international sequence databases such as SWISS-PROT
(protein) and GenBank (nucleotide).
Open FlyBase (www.Flybase.org) in a new window, and start by
searching for your gene in the "genes" section.
Find your gene among the query results.
- Verify that the abbreviation matches, so you know you have the
right gene.
- If you are on a page with several genes in a table, click on
your gene's symbol to get to a brief report on the gene. You might
bookmark this synopsis page, since you'll want to come back to it.
Information from many sources is summarized on this page, with
links to more details.
About the Gene and its Mutants.
- Near the top of the synopsis page, you should be able to work
out the different kinds of map information. C. B. Bridges worked
out a scheme for naming the banding patterns when he drew detailed
maps of the larval salivary gland chromosomes in the 1930s. He
assigned 20 numbered sections to each major chromosome arm: 1-20
for the X chromosome (also known as chromosome 1), 21-40 for 2L,
41-60 for 2R, 61-80 for 3L, and 81-100 for 3R, with 101 and 102
left over for the tiny 4th chromosome. L and R denote the left and
right arms of the metacentric 2nd and 3rd chromosomes. Major bands
in each section are labeled with capital letters, and the smaller
bands in between with numbers. There is often some uncertainty in
reading fine bands, so a range may be listed. For example, 21D1-2
refers to a specific double band near the left end of chromosome
2.
Genetic map distances are listed in recombinational map
units starting at the left tip of each chromosome.
- Down the right-hand side are different sorts of "available
reports." Use them to determine:
- How many alleles of this gene are known?
- When was the oldest allele found? (Follow the links back to
information about alleles - the original mutant generally has
superscript "1").
- The supplier of mutant fly stocks for most labs in this
country is the Bloomington Indiana Stock Center. Under "stocks",
could you order a fly with this mutation from Bloomington (if you
had an account there)? What stock number would you ask for? (There
may be many, if the marker is used in different combinations. Just
list one.)
- What can you tell about the phenotype of mutants, and where
the gene is expressed? (briefly)
- On the schematic map of the gene region,
you can see other genes that have been identified nearby on the
same chromosome. Find the nearest genes to the left and right of
your gene. Click on them to find out more about them. How were
they identified? Do they have mutant phenotypes, or are they cDNAs
or genes predicted from the sequencing and annotating
project?
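If you would like to check your reading of a cytological position,
the Bridges numbering described above can be sketched in a few lines
of Python. This is only an illustration, not part of FlyBase: the
function name parse_cytological_position and the simple "section,
letter, band or band range" format it accepts are assumptions made
for this example.

```python
import re

# Numbered sections per chromosome arm, as listed in the text above.
ARM_RANGES = [
    ("X", 1, 20),
    ("2L", 21, 40),
    ("2R", 41, 60),
    ("3L", 61, 80),
    ("3R", 81, 100),
    ("4", 101, 102),
]


def parse_cytological_position(position):
    """Split a position such as '21D1-2' into arm, section, letter, bands.

    Hypothetical helper: it only understands the simple form
    <section><capital letter><band> or <section><capital letter><band>-<band>.
    """
    match = re.fullmatch(r"(\d{1,3})([A-Z])(\d+)(?:-(\d+))?", position.strip())
    if match is None:
        raise ValueError(f"unrecognized cytological position: {position!r}")

    section = int(match.group(1))
    letter = match.group(2)
    first_band = int(match.group(3))
    # A single band is reported as a range of one band.
    last_band = int(match.group(4)) if match.group(4) else first_band

    for arm, low, high in ARM_RANGES:
        if low <= section <= high:
            return {
                "arm": arm,
                "section": section,
                "letter": letter,
                "bands": (first_band, last_band),
            }
    raise ValueError(f"section {section} is outside the range 1-102")
```

Calling parse_cytological_position("21D1-2") places the example from
the text in section 21, which falls on arm 2L.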
About the Gene Product - the DNA and protein sequences.
- Start by retrieving the sequence in usable format. You may be
able to get it under "transcript". Otherwise, set the choices
after Sequence: get to TRANSCRIPT and FASTA, then click GET. That
way, you'll get the sequence uninterrupted by basepair numbering
and other non-nucleotide annotations. Again, bookmark this page
for future reference. (You can also try specific known
transcripts. Just make sure it's all A, T, C, and G. Proteins will
be dealt with later.)
- Open a new window to Webcutter. This is just one of many
programs that will do an in
silico restriction digest for you: that is, will find by
computer the sites that a given restriction enzyme would find and
cleave enzymatically in your sequence. Copy and paste your FASTA
sequence into the box halfway down the page, and choose enzyme
conditions. Start simple, with just EcoRI as the enzyme, to see
how many EcoRI sites there are in your gene. Now you can repeat
with another enzyme, or look at many enzymes at a time.
- Now look for open reading frames: an ATG followed by
translatable codons. The reading frame closes at the first
in-frame stop codon. Paste your sequence from before into the NCBI
open reading frame (ORF) finder
(http://www.ncbi.nih.gov/gorf/orfig.cgi). It will show you ORFs in
all 6 possible reading frames (why 6?). You can see the
DNA->protein translation by double clicking on one of the ORF
shaded boxes (choose the longest ORF). The translation will show
up in the one-letter amino acid abbreviation code. It should match
the sequence you get from "polypeptides" on the synopsis page, but
may not. (Can you think of reasons why not?)
- Are there recognized protein domains for your protein listed on the synopsis page? Do they relate to the presumed function of your gene?
- The Gene Ontology section is based on observed conserved protein domains. It illustrates predicted molecular functions, biological processes, and cellular components. By following some of the links, you will see the shared properties predicted for this gene.
- You may investigate sequence similarities between the protein product(s) of your gene and others using BLASTP (http://www.ncbi.nlm.nih.gov/BLAST). (BLAST, the Basic Local Alignment Search Tool, is an algorithm for finding regions of local similarity between sequences.) Copy the amino acid sequence from FlyBase, then paste it into the appropriate section of BLAST (Standard protein-protein BLAST [blastp]). The initial output shows conserved protein domains with links. You have to click the FORMAT button to view the detailed results of the BLASTP search, and may have to refresh a couple of times if the server is busy. You can mouse over the matches found to discover their sources, and to see where the similarities listed in "protein domains" above come from. (Gene names will appear at the top as you mouse over the alignment bars.)
- "Linkout" leads you to some associated databases with information pertaining to your gene. For example, the "Yale Dev. expression" link provides gene expression data. Try exploring these links.
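For the curious, the in silico digest and ORF search described above
can be approximated in plain Python. This is a minimal sketch, not
what Webcutter or the NCBI ORF finder actually run: the helper names
(read_fasta, find_sites, find_orfs, translate) are made up for this
example, it assumes a single FASTA record containing only A, T, C,
and G, and it uses the standard genetic code.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(text):
    """Return the sequence of a single FASTA record, uppercase, header dropped."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return "".join(l for l in lines if l and not l.startswith(">")).upper()


def find_sites(sequence, site="GAATTC"):
    """Return 0-based start positions of a recognition site (default: EcoRI)."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def reverse_complement(sequence):
    """Return the opposite strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def find_orfs(sequence):
    """Return (strand, frame, start, end) for ORFs in all six reading frames.

    An ORF opens at the first ATG in a frame and closes at the first
    in-frame stop codon. Coordinates are 0-based on the strand scanned,
    and 'end' is exclusive and includes the stop codon.
    """
    orfs = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i
                elif orf_start is not None and codon in STOP_CODONS:
                    orfs.append((strand, frame, orf_start, i + 3))
                    orf_start = None
    return orfs


def translate(sequence):
    """Translate codon by codon into one-letter amino acid code ('*' = stop)."""
    return "".join(
        CODON_TABLE.get(sequence[i:i + 3], "X")  # 'X' marks an unrecognized codon
        for i in range(0, len(sequence) - 2, 3)
    )
```

Positions here are counted from 0, so they may be offset from what a
web tool reports. The three frames on each of the two strands are
also one way to think about the "why 6?" question.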
[there is a report page to go with this lab]
Copyright 2005 Wesleyan University (mw/lfa)