BIO 256 - Computer Applications in Biology
Assignment 5 - Genomics on the World Wide Web
One of the areas of biotechnology that has changed the most in the last ten years is the acquisition and use of nucleotide and protein sequences. The cost of such sequences, and the time necessary to obtain them, have both dropped remarkably, and the number of known sequences has increased almost exponentially. In fields as disparate as the Human Genome Project, evolutionary biology, and forensic science, the ease of acquiring sequences has been revolutionary.
The general discipline of dealing with biological data is called bioinformatics, and the part of bioinformatics that deals with sequences is genomics. The World Wide Web has transformed genomics from the realm of a few esoteric specialists into a resource available to everyone. In this exercise, you will use the genomics tools of the National Center for Biotechnology Information (NCBI) to perform some simple tasks. If you want to learn more about the field, check out the references above, where you can teach yourself enough about the subject to put it on your résumé.
Two common tasks in genomics are to find the sequence for a specific gene of a specific organism (to answer, perhaps, the question "Has anyone sequenced this before?") and to take a specific sequence and find genes and organisms that match it ("Whose DNA is this? What is it likely to do?"). For the first task, NCBI administers GenBank and other databases of sequence information, which you can search by species or by gene. For the second task, there is BLAST. In the words of the NCBI web page, "BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA." What this means is that you can give BLAST a sequence, and it will find the likeliest matches in its databases.
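To make the idea of a similarity search concrete, here is a toy sketch in Python. It is not how BLAST works internally; BLAST uses a much faster local-alignment heuristic with statistical scoring. This sketch simply ranks a few database entries by how many short subsequences (k-mers) they share with the query. The sequences, the entry names, and the function names `kmers` and `rank_matches` are all invented for illustration and are not real GenBank records or NCBI functions.

```python
def kmers(sequence, k=4):
    """Return the set of all length-k substrings of the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def rank_matches(query, database, k=4):
    """Rank database entries by the number of k-mers shared with the query."""
    query_kmers = kmers(query, k)
    scores = []
    for name, sequence in database.items():
        shared = len(query_kmers & kmers(sequence, k))
        scores.append((shared, name))
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


# Made-up sequences for illustration only (not real GenBank records).
database = {
    "entry_A": "ATGGCGTACGTTAGC",
    "entry_B": "TTTTTTTTTTTTTTT",
    "entry_C": "ATGGCGTACCCCAGC",
}
query = "GGCGTACGTTA"

for shared, name in rank_matches(query, database):
    print(name, shared)
```

The entry that contains the whole query scores highest, the entry that shares only part of it comes second, and the unrelated entry scores zero. A real search tool adds the things this toy leaves out: tolerance for mismatches and gaps, and an estimate of how likely a match is to occur by chance.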
Here is the assignment:
Summary of assignment
>unknown sequence
CCCCGATTTTTTTGGAACTAGAATCCTGTGAAATTCGGGGTAGGAGGGGGGGGGGGGGGGGGGGGGGGGG
GGGAAATCGTTCTGTCCAATATTTATGTCCAATTTATCCCTTATCTCCGTTTTACGCACCTAACCAAATA
TTCTTTTCACTATTATTATATTTTATTTTGTCGTAAACATTCTTCAAATGTTTCAATACTTTTGCATTGG
TCCTAATCCCTTCATTTTAGGACGTGACCGCCGCAGTGGCGGTCATCAAGACCAAGCGCACCATCCAATT
CGTGGACTGGTGCCCCACAGGATTCAAGTGCGGAATCAACTACCAAGCGCCCTCCGTGGTCCCTGGAGGA
GATCTCGCGAAGGTGCAGCGCGCCTTGTGCATGATTTCCAACACGACCGCGATCGCCGAAGTGTTCAGCC
GCATTGACCACAAGTTCGACCTCATGTACGCCAAGCGTGCGTTTGTGCACTGGTACGTCGGAGAGGGTAT
CCTTTAATTCCCCCCC
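Before pasting a sequence into BLAST, it can help to check that it copied intact. The sketch below is one way to do that in Python; it is not part of the NCBI tools. It reads a single FASTA record (a header line starting with ">" followed by sequence lines), joins the sequence, and counts each base. The function names `parse_fasta` and `base_counts` are made up for this illustration, and the sample record holds only the beginning of the unknown sequence.

```python
from collections import Counter


def parse_fasta(text):
    """Return (header, sequence) for a single FASTA record.

    The header is the line starting with '>'; every other non-blank
    line is sequence, with whitespace removed and letters upper-cased.
    """
    header = ""
    pieces = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
        else:
            pieces.append("".join(line.split()).upper())
    return header, "".join(pieces)


def base_counts(sequence):
    """Count how many times each letter occurs in the sequence."""
    return Counter(sequence)


# Sample: the header plus the start of the unknown sequence.
sample = """>unknown sequence
CCCCGATTTTTTTGGAACTAGAATCCTGTGAAATTCGG
"""

header, sequence = parse_fasta(sample)
counts = base_counts(sequence)
print(header)
print(len(sequence))
print(sorted(counts.items()))

# Any letter other than A, C, G, or T suggests a copy-and-paste problem.
unexpected = set(sequence) - set("ACGT")
print(unexpected)
```

For the sample, the set of unexpected letters is empty. If you run the same check on the full sequence and see other characters, copy the sequence again before submitting it.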