BIO 256 - Computer Applications in Biology


Assignment 5 - Genomics on the World Wide Web

Important References:


One of the areas of biotechnology that has changed the most in the last ten years is the acquisition and use of nucleotide and protein sequences. The cost of such sequences, and the time necessary to obtain them, have both dropped remarkably, and the number of known sequences has increased almost exponentially. In fields as disparate as the Human Genome Project, evolutionary biology, and forensic science, the ease of acquiring sequences has been revolutionary.

The general discipline of dealing with biological data is called bioinformatics, and the part of bioinformatics that deals with sequences is genomics. The World Wide Web has transformed genomics from the realm of a few esoteric specialists into a resource available to everyone. In this exercise, you will use the genomics tools of the National Center for Biotechnology Information (NCBI) to perform some simple tasks. If you want to learn more about the field, check out the references above, where you can teach yourself enough about the subject to put it on your résumé.

Two common tasks in genomics are to find the sequence for a specific gene of a specific organism (to answer, perhaps, the question "Has anyone sequenced this before?") and to take a specific sequence and find genes and organisms that match ("Whose DNA is this? What is it likely to do?"). For the former question, NCBI administers GenBank and other databases of sequence information, which you can search by species or by gene. For the latter question, there is BLAST. In the words of the NCBI web page, "BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA." What this means is that you can give BLAST a sequence, and it will find the likeliest matches in its databases.
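
The "local alignment" in BLAST's name means finding the best-matching stretch shared by two sequences, rather than forcing them to line up end to end. BLAST itself uses fast shortcuts (short exact word matches that are then extended) to approximate this; the exact method it approximates is the Smith-Waterman algorithm. The Python sketch below computes a Smith-Waterman score for two DNA strings. The scoring values (+2 for a match, -1 for a mismatch, -2 for a gap) are illustrative assumptions, not BLAST's actual defaults, and the function name is our own.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b.

    Scoring values are illustrative assumptions, not BLAST's defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # skip a letter of a (gap in b)
            left = curr[j - 1] + gap    # skip a letter of b (gap in a)
            # Local alignment: a cell never drops below zero, so a poorly
            # matching region simply restarts the alignment.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared stretch GATTACA (7 matches x 2 points) gives the best score.
print(local_alignment_score("TTTTGATTACATTTT", "CCGATTACACC"))  # prints 14
```

Sequences with nothing in common score 0 rather than a negative number, which is what lets a local search pick out a short matching region inside otherwise unrelated sequences.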


Assignment

Here is the assignment:

  1. Using the Taxonomy Browser search (look under "Taxonomy"), find the amino acid sequence for the myoglobin protein in the American pika, Ochotona princeps (this small relative of rabbits and hares was the subject of my baccalaureate research, back when I was a zoologist). There are two sequences; you want the second one, labeled "MYOI". You must have the report in the default format - you do not want the FASTA or Graphic format. Mail the resulting page to me. The subject of the mail message must be bio256a5-1 (lower case, no spaces). Be sure to include your name in the email message.

  2. In the standard genetic code, the codon AGA codes for arginine. In vertebrate mitochondria, it does something different. What does it code for? (Start out at the Taxonomy Browser and hit the button on the left called Genetic Codes.) Copy it exactly if you're not sure what it means (and then find out what it means - you're a biotechnologist). Put this information in the same email message as task 3 below.

  3. Using BLAST, find the most likely source of the sequence below (hint: it is an organism that a former Chair of Biological Sciences has worked with--but don't ask him; that's cheating). The data are already in the FASTA format that BLAST prefers, so you can copy and paste them into the data entry box in BLAST: copy the block below, click on the links to get to BLAST (the Basic BLAST Search), and then paste into the box. There are a lot of check boxes and other options on that page--you can leave them at the default values. When you submit your request (by clicking the "Search" button), you will be on an electronic queue, and the resulting web page will tell you how to see the results. Scroll down until you see "Sequences producing significant alignments"; you want the one with the highest score, which is usually the top one in the list. Do not email the web results; they are huge, and you will receive no credit if you send them. Instead, send me an email with the subject bio256a5-2, and in the body of the message tell me
    • The name of the organism
    • The name of the gene
    • The vertebrate mitochondrial meaning of the code AGA (from item 2).
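
Item 2 turns on how a three-letter codon maps to an amino acid. As a sketch, assuming the layout NCBI's Genetic Codes page uses for each code (a 64-letter "AAs" string whose positions follow the codon order TTT, TTC, TTA, TTG, TCT, ... GGG, that is, T, C, A, G at each of the three positions, with * marking a stop codon), the Python below builds the standard code and translates a short made-up sequence. To compare another code, you could paste that code's AAs string from the NCBI page in place of STANDARD_AAS; the names build_code and translate are our own, not an NCBI tool.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), one letter per codon.
# '*' marks a stop codon. Codons are ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_code(aas: str) -> dict[str, str]:
    """Map each of the 64 codons to the amino-acid letter at its position in aas."""
    if len(aas) != 64:
        raise ValueError("a genetic code string must have exactly 64 letters")
    codons = ("".join(c) for c in product("TCAG", repeat=3))
    return dict(zip(codons, aas))


def translate(dna: str, table: dict[str, str]) -> str:
    """Translate dna codon by codon; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


standard = build_code(STANDARD_AAS)
print(standard["AGA"])                      # prints R (arginine), as item 2 says
print(translate("ATGAGATGGTAA", standard))  # prints MRW*
```

Keeping each code as a single 64-letter string makes two codes easy to compare position by position, which is how differences such as the one item 2 asks about show up.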

Summary of assignment

Check off   Format               Content                Subject line
[ ]         Forwarded web page   Myoglobin sequence     bio256a5-1
[ ]         Email message        Three answers above    bio256a5-2

>unknown sequence
CCCCGATTTTTTTGGAACTAGAATCCTGTGAAATTCGGGGTAGGAGGGGGGGGGGGGGGGGGGGGGGGGG
GGGAAATCGTTCTGTCCAATATTTATGTCCAATTTATCCCTTATCTCCGTTTTACGCACCTAACCAAATA
TTCTTTTCACTATTATTATATTTTATTTTGTCGTAAACATTCTTCAAATGTTTCAATACTTTTGCATTGG
TCCTAATCCCTTCATTTTAGGACGTGACCGCCGCAGTGGCGGTCATCAAGACCAAGCGCACCATCCAATT
CGTGGACTGGTGCCCCACAGGATTCAAGTGCGGAATCAACTACCAAGCGCCCTCCGTGGTCCCTGGAGGA
GATCTCGCGAAGGTGCAGCGCGCCTTGTGCATGATTTCCAACACGACCGCGATCGCCGAAGTGTTCAGCC
GCATTGACCACAAGTTCGACCTCATGTACGCCAAGCGTGCGTTTGTGCACTGGTACGTCGGAGAGGGTAT
CCTTTAATTCCCCCCC
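
The sequence block above is a single FASTA record: a header line beginning with ">", followed by sequence lines that are simply joined together. If you want to check what you copied before pasting it into BLAST, the Python sketch below splits FASTA text into header and sequence and reports the length and base counts. It is a minimal parser of our own, not an NCBI tool, and the sample record in it is a made-up toy; paste the real block in its place to check that.

```python
from collections import Counter


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text.

    Header lines start with '>'; every following non-blank line up to the
    next header is joined, with all whitespace removed, into the sequence.
    """
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append("".join(line.split()).upper())
    if header is not None:  # finish the last record
        records.append((header, "".join(chunks)))
    return records


# Made-up toy record for illustration only.
sample = """>toy example
ACGTAC
GGTT
"""
for name, seq in parse_fasta(sample):
    counts = Counter(seq)
    # prints: toy example 10 {'A': 2, 'C': 2, 'G': 3, 'T': 3}
    print(name, len(seq), dict(sorted(counts.items())))
```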