Lecture 6
Bio 451 Molecular Biology Techniques
Nature of Nucleic Acids; DNA Analysis: Agarose Gel Electrophoresis, Restriction Enzymes, DNA Sequencing, PCR, DNA Microarrays (ORF, cDNA)
Some notes were taken from “Recombinant DNA,” 2nd Ed, J. Watson, M. Gilman, J. Witkowski and M. Zoller (1998). Published by W.H. Freeman and Co.
· DNA is synthesized from deoxyribonucleotide triphosphates. That is, each of the “building blocks” of DNA is made up of a base (A, T, C or G), the sugar deoxyribose and a phosphate group. DNA is synthesized in the 5’ to 3’ direction with the aid of DNA polymerase, which catalyzes the addition of the 5’ phosphate group of the incoming nucleotide to the 3’ hydroxyl group of the previous nucleotide.
· RNA differs from DNA in the following ways: RNA is synthesized from ribonucleotide triphosphates. That is, each “building block” of RNA is made up of a base (A, U, C or G), the sugar ribose and a phosphate group. RNA is synthesized 5’ to 3’ by RNA polymerase (in eukaryotes the polymerase may be one of three: RNA pol I, RNA pol II, or RNA pol III; these synthesize rRNA, mRNA and tRNA, respectively).
· A relaxed circular DNA molecule can be twisted into a negatively supercoiled molecule by the action of DNA gyrase. The reverse reaction is catalyzed by topoisomerase. The strain in the negatively supercoiled form may be relieved by local disruption of the double helix to produce single-stranded regions, thereby returning the molecule to its relaxed form. The degree of supercoiling determines how compact the circular DNA molecule will become: molecules with increasing degrees of supercoiling are increasingly compact and therefore migrate more rapidly than their “relaxed” equivalents during electrophoresis through a gel. Agarose gel electrophoresis can therefore separate DNA molecules with different amounts of supercoiling. Completely supercoiled DNA has the greatest mobility in agarose gels; molecules with progressively fewer superhelical turns migrate progressively more slowly.
· The restriction-modification system was recognized in the 1960s. Bacteria can produce two enzymes characteristic of a given strain. One enzyme, a restriction endonuclease (RE), cuts double-stranded DNA; the other, a modification enzyme (DNA methyltransferase), adds methyl groups to the DNA so that sites otherwise susceptible to cleavage by the RE are no longer susceptible. Thus, both enzymes recognize the same DNA sequence.
· Two classes of restriction endonucleases are considered here. Class I RE cleave at random distances from the recognition sequence and produce randomly sized DNA fragments. Class II RE cleave DNA at very specific sites, generating a unique and specific set of DNA fragments. Class II RE recognize palindromes (sequences that read the same in the 5’ to 3’ direction on both strands). Thus, when the recognition site is comprised of either four or six bases, the nucleotide sequence exhibits a two-fold axis of symmetry. Restriction enzymes cut to produce either “sticky ends” or “blunt ends.” In most cases, class II RE produce products with 5’ terminal phosphate and 3’ terminal hydroxyl residues.
· The name of the RE derives from the genus and species of the bacterium: the first letter is the first letter of the genus name and the next two letters are the first two letters of the species name (e.g. Eco RI = Escherichia coli). In many cases, a bacterial strain is found to contain more than one RE. In this case, each enzyme is assigned a roman numeral (e.g. Eco RI, Eco RII, Eco RV, Hind III, etc.).
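The two-fold symmetry of a class II recognition site can be checked by comparing the site with its own reverse complement. The sketch below is only an illustration: the function names and the test sequence are made up for this example, and GAATTC is used as the Eco RI recognition site.

```python
# Sketch: test a recognition site for two-fold symmetry and locate the
# site in a DNA string. Function names and the example DNA are
# illustrative assumptions, not part of the lecture.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """True when both strands read the same in the 5' to 3' direction."""
    return site.upper() == reverse_complement(site)


def find_sites(dna: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    example = "TTGAATTCAAGGATCCGAATTCTT"  # made-up sequence
    print(is_palindromic_site("GAATTC"))
    print(find_sites(example, "GAATTC"))
```

Searching one strand is enough for a palindromic site, because the same site is necessarily present at the same place on the opposite strand.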
· Physical Mapping of DNA. RE digestion permits the purification of specific DNA sequences and the mapping of sequences relative to one another. RE mapping can be achieved by digesting the DNA with a single RE followed by double or triple digests.
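To see why double digests help in mapping, it can be useful to predict the fragment sizes expected from a candidate map and compare them with the bands on the gel. The sketch below assumes a linear molecule; the molecule length, the enzyme names A and B, and the cut positions are hypothetical values chosen only for illustration.

```python
# Sketch: predicted fragment sizes for single and double digests of a
# LINEAR DNA molecule. All positions and lengths are made-up examples.

def digest_linear(length: int, cut_positions: list[int]) -> list[int]:
    """Return fragment sizes (bp), largest first, for the given cut positions."""
    # Ignore duplicate positions and anything outside the molecule.
    cuts = sorted({p for p in cut_positions if 0 < p < length})
    edges = [0] + cuts + [length]
    sizes = [right - left for left, right in zip(edges, edges[1:])]
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    LENGTH = 10_000             # hypothetical 10 kb fragment
    ENZYME_A = [3_000]          # hypothetical cut site for enzyme A
    ENZYME_B = [1_000, 7_500]   # hypothetical cut sites for enzyme B

    print("A alone:", digest_linear(LENGTH, ENZYME_A))
    print("B alone:", digest_linear(LENGTH, ENZYME_B))
    print("A + B:  ", digest_linear(LENGTH, ENZYME_A + ENZYME_B))
```

In the double digest, the fragment from the B digest that contains the A site disappears and is replaced by two smaller fragments whose sizes add up to it, which places the A site relative to the B sites.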
· Cloning of DNA. RE permit the isolation of a particular DNA sequence by creating a fragment that can then be inserted into a region of a bacterial plasmid or viral DNA that will serve as a vector.
· Hybridization. A particular gene may be located along a DNA molecule by hybridizing a particular DNA or mRNA probe to DNA fragments generated by RE digestion. E. Southern developed a technique in which the DNA restriction fragments separated by agarose gel electrophoresis were denatured (strands separated) in place by placing the entire gel into an alkali solution. The denatured DNA fragments can then be transferred and bound to a nitrocellulose membrane in precisely their arrangement in the gel by causing the fragments to diffuse out of the gel onto the membrane. After transfer, the membrane is placed in a solution containing a labeled DNA or RNA probe. The probe will base-pair (hybridize) only with the complementary sequence of DNA.
· Sequence Analysis of DNA. The DNA molecule to be sequenced is cleaved by an RE into a number of fragments. These fragments are then separated by electrophoresis, isolated, and may be radioactively labeled at the 5’ end using T4 polynucleotide kinase (an enzyme that can transfer the γ-32P phosphate from [γ-32P]ATP to the 5’ end of a DNA chain). After labeling, the fragment is cleaved by a second restriction enzyme to separate the two labeled 5’ ends. DNA sequencing can then be carried out by the Maxam and Gilbert technique or the Sanger dideoxy method (see below).
· Hybridization can occur between DNA-DNA or DNA-RNA. Hybridization is very useful to examine the expression of genomes and genomic organization.
· Hybridization is based on complementarity between strands of nucleic acids. As the complementary strands collide, they base pair (A-T and C-G). Initially the nucleic acids (NA) base pair over a short region. This initial base pairing is called nucleation. Once this region of base pairing is established the continuation of base pairing to the end of the complementary region is relatively fast (compared to the rate of nucleation).
· The kinetics of hybridization were initially worked out for hybridization in solution, where both reacting molecules are in solution (as compared to membrane hybridization, where one of the reacting NA is immobilized on a membrane). The extent of hybridization depends on the product of the initial concentration of the target DNA (Co) and time (t), written as Cot. Cot1/2 is the value of Cot at which the reaction has proceeded to 1/2 completion.
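The relation between Cot and the extent of reassociation can be sketched numerically. The code below assumes ideal second-order kinetics for a single class of sequences, in which the fraction of DNA still single-stranded is 1 / (1 + Cot/Cot1/2). That rate law is the standard textbook idealization and is an assumption added here; the notes themselves only define Cot and Cot1/2.

```python
# Sketch: fraction of DNA remaining single-stranded as a function of Cot,
# assuming ideal second-order reassociation of one sequence class.
# The function name and the example Cot1/2 value are illustrative.

def fraction_single_stranded(cot: float, cot_half: float) -> float:
    """Return C/Co for a given Cot value, where cot_half is Cot1/2."""
    if cot < 0 or cot_half <= 0:
        raise ValueError("cot must be >= 0 and cot_half must be > 0")
    return 1.0 / (1.0 + cot / cot_half)


if __name__ == "__main__":
    cot_half = 10.0  # hypothetical Cot1/2
    for cot in (0.0, 1.0, 10.0, 100.0, 1000.0):
        remaining = fraction_single_stranded(cot, cot_half)
        print(f"Cot = {cot:7.1f}  single-stranded fraction = {remaining:.3f}")
```

By construction the function returns 0.5 when Cot equals Cot1/2, which matches the definition of Cot1/2 as the point of half completion.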
· Other factors influencing hybridization are temperature, ionic strength, mismatched base pairs and probe complexity.
· Temperature. Maximum rate occurs at 20-25°C below the Tm for DNA-DNA hybrids and 10-15°C below the Tm for DNA-RNA hybrids. Tm is the temperature at which 1/2 of the strands have hybridized (or reassociated) with each other.
· Ionic strength. Optimum rate occurs at 1.5 M Na+.
· Mismatched base pairs. Each 10% of mismatching reduces the hybridization rate by a factor of 2.
· Probe complexity refers to the number of non-repetitive sequences; repetitive sequences increase the hybridization rate. A probe might be a cDNA copy of an mRNA, an RNA, or a synthetic oligonucleotide prepared from knowledge of the protein sequence and the genetic code.
Procedure for solution hybridization
· The target DNA may be cut with restriction enzymes. The probe, for example a cDNA, is labeled, either radioactively or non-radioactively. If double-stranded nucleic acids are used, the reactants are heated at 100°C for 1 min to separate the strands. The reactants are then incubated at the optimum temperature for various times (depending on the Cot value desired). At various time points, the extent of hybridization is measured. For example, S1 nuclease, which digests single-stranded nucleic acids, can be used to leave only the hybrids intact. Hybrids can be collected on a filter that can be counted for radioactivity, or assayed for binding of a non-radioactive probe.
· Examples: One can show the organization of a genome into highly repetitive, moderately repetitive, and unique sequences. In this case strands of DNA are dissociated and then allowed to reassociate for varying periods of time or at varying NA concentrations (according to the Cot value desired). The % of hybridization (or % of single-stranded NA remaining) is determined for each time point and plotted to give a Cot curve that shows three classes of sequences.
· Another example of hybridization is subtractive hybridization, used to remove DNA sequences that are common to closely related cell types (e.g. remove B cell sequences from T cell sequences, leaving sequences unique to T cells). cDNA from T cells is hybridized with excess mRNA from B cells, and hybrids are separated by chromatography.
· The ribonuclease protection assay, used to determine expression of a particular gene in a tissue, is another example of hybridization in solution. In this technique, all the mRNAs are isolated from a tissue and hybridized with an RNA probe specific for the gene of interest. The probe finds its complementary RNA sequences and hybridizes, then the non-hybridized sequences are degraded by S1 nuclease or RNase, and the hybrids may be detected on a gel.
Membrane Hybridization (filter hybridization)
· One of the hybridizing molecules is immobilized on a nylon or nitrocellulose membrane (Southern blotting for DNA, Northern blotting for RNA) by capillary action, vacuum transfer or electrophoretic transfer. The NA is then fixed to the membrane by baking or UV crosslinking. The membrane is then prehybridized to prevent the probe from binding non-specifically to the holes and crevices in the membrane (the same binding sites that bind DNA during the blotting procedure). Some prehybridization solutions include blocking reagents such as Denhardt’s solution (Ficoll, polyvinylpyrrolidone, and bovine serum albumin), denatured salmon sperm DNA (or some other type of non-homologous DNA), Blotto (Bovine Lacto Transfer Technique Optimizer), heparin, or yeast tRNA. “Current Protocols” recommends prehybridizing for at least 3 h with nitrocellulose membranes, and 15 min for nylon membranes (longer times are also fine). The prehybridization solution is then replaced by hybridization buffer containing labeled probe. Hybridization is done in sealed plastic bags, or in a hybridization oven (incubator) with rotating bottles.
· Kinetics of membrane hybridization are difficult to predict because you do not know exactly how much of the immobilized DNA (or RNA) is available for hybridization. “Maniatis” recommends that you hybridize for a time sufficient to reach 1-3X the Cot1/2.
Number of hrs to reach Cot1/2 = (1/x) × (y/5) × (z/10)
where x = weight of probe (in µg); y = complexity (for most probes, the complexity is proportional to the length of the probe in kilobases); z = volume of reaction (in ml).
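As a quick arithmetic check, the rule of thumb above can be wrapped in a small function. This is a direct transcription of the formula as written in these notes, not an independent derivation; the function name, parameter names and example values are illustrative assumptions.

```python
# Sketch: hours needed to reach Cot1/2 in a membrane hybridization,
# using the formula exactly as given in these notes.

def hours_to_cot_half(probe_ug: float, complexity_kb: float, volume_ml: float) -> float:
    """(1/x) * (y/5) * (z/10), with x in micrograms, y in kb, z in ml."""
    if probe_ug <= 0 or complexity_kb <= 0 or volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    return (1.0 / probe_ug) * (complexity_kb / 5.0) * (volume_ml / 10.0)


if __name__ == "__main__":
    # Hypothetical example: 0.1 microgram of a 1 kb probe in 10 ml.
    t_half = hours_to_cot_half(probe_ug=0.1, complexity_kb=1.0, volume_ml=10.0)
    print(f"Cot1/2 reached after about {t_half:.1f} h")
    print(f"1-3X Cot1/2 corresponds to {t_half:.1f}-{3 * t_half:.1f} h")
```

The function makes the scaling easy to see: halving the amount of probe or doubling the reaction volume doubles the time needed.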
· In practice, most hybridizations are done “overnight.”
· In membrane hybridization, the hybridization “rate” is probably less important than hybrid stability, since most membrane hybridization reactions are done long enough that the factors affecting rate are not a consideration. Hybrid stability is expressed as the melting temperature, or Tm, which for membrane hybridization is the temperature at which the probe dissociates from the target DNA.
· Tm is determined by Na+ concentration, % G+C, concentration of formamide, and the length of the hybrid in base pairs.
For DNA-DNA hybrids, Tm = 81.5°C + 16.6(log [Na+]) + 0.41(%GC) - 0.61(%formamide) - 500/(length of the hybrid in base pairs).
For RNA-DNA hybrids, Tm = 79.8°C + 18.5(log [Na+]) + 0.58(%GC) - 11.8(%GC)² - 0.56(%formamide) - 820/(length of the hybrid in base pairs).
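The DNA-DNA formula is easy to evaluate by hand, but a short function makes it simpler to see how each term moves the Tm. The sketch below implements only the DNA-DNA equation given above, taking the logarithm as base 10 and [Na+] in mol/l; the function name and the example values are illustrative assumptions.

```python
# Sketch: Tm of a DNA-DNA hybrid from the empirical formula in these notes.
import math


def tm_dna_dna(na_molar: float, gc_percent: float,
               formamide_percent: float, length_bp: int) -> float:
    """Return Tm in degrees C for a DNA-DNA hybrid."""
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("na_molar and length_bp must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


if __name__ == "__main__":
    # Hypothetical 500 bp hybrid, 50% GC, 1 M Na+, with and without formamide.
    no_formamide = tm_dna_dna(1.0, 50.0, 0.0, 500)
    with_formamide = tm_dna_dna(1.0, 50.0, 50.0, 500)
    print(f"Tm without formamide: {no_formamide:.1f} C")
    print(f"Tm in 50% formamide:  {with_formamide:.1f} C")
```

In this example the only difference between the two results is the formamide term, 0.61 × 50 = 30.5°C.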
· Formamide destabilizes nucleic acid duplexes, reducing the Tm by an average of 0.6°C per 1% formamide for DNA-DNA hybrids. Use of formamide allows you to carry out the hybridization at a lower temperature (usually 50% formamide is used and the hybridization is done at 42°C rather than 68°C). Formamide is advantageous for hybridizations on nitrocellulose membranes, which are brittle and may fall apart at higher temperatures. Formamide is also useful with RNA-DNA hybrids because the RNA probe can bind very tightly even to heterologous (non-specific) sequences; the destabilizing effect of formamide helps reduce non-specific hybridization.
· It is recommended that membrane hybridization be done at 20-25°C below the Tm for DNA-DNA hybrids and 10-15°C below the Tm for DNA-RNA hybrids. As done in the lab, a typical DNA-DNA hybridization is carried out at 68°C (in the absence of formamide) in high salt: 5X SSC buffer (0.75 M NaCl, 0.075 M sodium citrate).
· Washing is done to remove probe that is not completely hybridized with the target DNA and to destabilize mismatched heteroduplexes. Washing is done at increasing “stringency” to approach the Tm of the hybrid; maximum stringency is 5°C below the Tm. Stringency is increased by decreasing the salt concentration or by increasing the temperature. As done in the lab, a typical washing procedure is to wash twice with 2X SSC, 0.1% SDS for 5 min at room temperature, and twice with 0.1% SDS for 15 min at 68°C. SDS acts as a blocking reagent in the wash solution.
· Sometimes “reduced” rather than “high” stringency is desirable. Reduced stringency washes would be used if you wanted to detect closely related but not identical sequences, for example members of a gene family or closely related genes in a different organism.
· Stringency of hybridization is determined by the combination of factors (temperature, salt, and organic solvent concentration) that influence the ability of two polynucleotide strands to hybridize. At high stringency, only perfectly complementary strands will hybridize. At reduced stringency, some mismatches can be tolerated.
Dot blot hybridization is useful to screen many samples.
For example, it can be used to follow the expression of genes over time, to compare the expression of genes in cancerous vs non-cancerous cells, or to examine gene expression in cells stimulated with hormones, inducers, etc. A “minifold” apparatus allows you to spot many samples for hybridization. A membrane is placed below the sample wells, samples of DNA (or RNA) are spotted on the membrane through the sample wells, then the DNA (or RNA) is fixed to the membrane. The membrane can then be hybridized to a probe as is done with Southern blots.
· Maxam and Gilbert technique. Each fragment to be sequenced (DNA labeled at its 5’ end) is separated into 4 aliquots. Each aliquot is subjected to chemical degradation specific for one of the four nucleotides. The four reaction mixtures are then electrophoresed on a polyacrylamide gel to separate the fragments based on their size. The smaller fragments migrate ahead of the larger ones. The fragments can then be detected by X-ray film. Since the separation was based on size, the sequence can be read upward from the bottom of the gel by the bands appearing in each of the four tracks.
· Sanger DNA sequencing procedure (1977). 2’,3’-dideoxynucleotides of each of the four bases are prepared. These molecules can be incorporated into DNA by E. coli DNA polymerase because they have a normal 5’ triphosphate; however, because a ddNTP lacks a 3’ hydroxyl group, once it is incorporated into a growing DNA strand it cannot form a phosphodiester bond with the next incoming dNTP. Growth of that particular DNA chain stops. To sequence by the Sanger method, the following are needed: the DNA strand to be sequenced (the template), a short, labeled piece of DNA (the primer) that is complementary to the end of the template, a carefully controlled ratio of one particular ddNTP to its normal deoxynucleotide, and the other three dNTPs. The DNA strand to be sequenced, along with labeled primer, is split into four DNA polymerase reactions, each containing one of the four ddNTPs. When DNA polymerase is added, normal polymerization begins from the primer; when a ddNTP is incorporated, the growth of that chain stops. If the correct ratio of ddNTP:dNTP is chosen, a series of labeled strands will result, the lengths of which depend on the location of that particular base relative to the end of the DNA. The resultant labeled fragments are separated by size on a polyacrylamide gel containing urea.
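The logic of reading a sequencing gel can be sketched with an idealized simulation of the four dideoxy reactions. The code below assumes that every possible chain-terminated product is present in its lane and that the gel resolves fragments differing by a single nucleotide; the function names, the primer length and the example sequence are illustrative assumptions.

```python
# Sketch: idealized Sanger reactions. For each ddNTP lane, a terminated
# fragment ends wherever that base occurs in the newly synthesized strand.

def sanger_lanes(new_strand: str, primer_len: int = 20) -> dict[str, list[int]]:
    """Return the fragment lengths expected in each of the four lanes.

    `new_strand` is the strand synthesized from the primer, written 5' to 3',
    and must contain only A, C, G and T.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(primer_len + position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence from the shortest band (gel bottom) upward."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = sanger_lanes("GATTACA")  # made-up sequence
    for base in "ACGT":
        print(f"dd{base}TP lane: fragment lengths {lanes[base]}")
    print("Sequence read from the gel:", read_gel(lanes))
```

The sequence recovered this way is that of the newly synthesized strand, which is complementary to the template. The same bottom-to-top reading applies to the four tracks of a Maxam and Gilbert gel.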
· With the advent of DNA sequencing techniques and the consequent advent of the discipline of genomics (see below and table 24.1), the human genome sequence is almost known in its entirety. Now we can begin to look for differences among individuals. So far, most of these are differences in single nucleotides, so we call them single-nucleotide polymorphisms, or SNPs (pronounced “snips”).
· With the ability to sequence DNA, scientists have been able to amass databases that contain a large number of DNA sequences. These DNA sequences are from the genomes of a variety of organisms, and represent both specific genes and short regions that have been sequenced but have no known function as yet. From the DNA sequence of a gene, a potential open reading frame (see below) can be determined, and then the amino acid encoded by each set of three bases can be determined. Thus, protein sequence can be deduced directly from DNA sequence.
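Deducing a protein sequence from DNA amounts to reading the coding strand three bases at a time and looking each codon up in the genetic code. The sketch below builds the standard genetic code table from its conventional compact ordering (bases in the order T, C, A, G) and translates until the first stop codon. The function name and the example sequences are illustrative, and the code assumes the input is already in the correct reading frame.

```python
# Sketch: translate a coding-strand DNA sequence with the standard genetic code.
from itertools import product

BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG ("*" marks a stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGAAATGGTGA"))  # made-up sequence
```

Any trailing bases that do not fill a complete codon are ignored, and a character other than A, C, G or T raises a KeyError.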
· Databases. A variety of DNA and protein sequence databases exist. Some commonly used ones are: GenBank, EMBL, DDBJ, PIR, SWISS-PROT, PRF, and PDB. Each database has a different method for obtaining sequences, and this affects the type of sequences present in the database. Some are large compilations of many sequences. Some contain only specific types of sequences, such as vector sequences for introducing DNA into organisms, and some contain sequences from only one organism.
· GenBank, for example, is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research 1998 Jan 1; 26(1): 1-7). The complete release notes from the current version of GenBank are available. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which is comprised of the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. Many journals require submission of sequence information to a database prior to publication so that an accession number may appear in the paper. PDB (the Brookhaven Protein Data Bank), on the other hand, is a protein data bank that contains sequences derived from three-dimensional structures.
· Comparing Sequences. A wealth of information can be obtained by comparing sequences. On the single gene level, such comparison can reveal the function of an unknown gene sequence by its similarity to genes of known sequence. It can reveal homologous genes in other organisms. And it can reveal shared domains in different proteins, to name just a few uses.
· The process of sequence comparison involves comparing a query sequence with all the sequences within a database. This requires that a large number of comparisons be made. Moreover, the sequence of a homologous gene, a gene with the same function, or a gene with a similar domain seldom matches the query sequence identically. With amino acid sequences, there are chemical similarities between different amino acids; some are basic, acidic, polar or non-polar. Often amino acids can be replaced with a similar type and the protein will still form the same domain and maintain the same function. For this reason, different search programs have been developed to compare sequences and to assess when sequences have sufficient similarity.
· A commonly used method for sequence comparison is the Basic Local Alignment Search Tool (BLAST). BLAST is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm which seeks local as opposed to global alignments and is therefore able to detect relationships among sequences which share only isolated regions of similarity (Altschul et al., 1990).
· Results from these searches indicate the specific sequences from within the databases that have similarity and the regions within the sequences that are similar. Two parameters that are usually indicated for the results are the percent identity, that is the percentage of sequence with identical bases or amino acids, and percent similarity, that is the percentage of the sequence with similar amino acids.
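The two parameters can be illustrated on a pair of sequences that have already been aligned. The sketch below is not how BLAST scores alignments; it simply counts columns. The grouping of amino acids into basic, acidic, polar and non-polar classes is a rough textbook grouping assumed here for illustration (real programs use substitution matrices), identical residues are counted as similar, and a gap character "-" never counts as a match.

```python
# Sketch: percent identity and percent similarity for two pre-aligned,
# equal-length sequences. The residue grouping is an illustrative assumption.

GROUPS = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "polar": set("STNQCY"),
    "nonpolar": set("AVLIMFWPG"),
}


def _group_of(residue: str):
    """Return the name of the class a residue belongs to, or None."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    return None


def _check_aligned(a: str, b: str) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")


def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns with identical residues."""
    _check_aligned(a, b)
    a, b = a.upper(), b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def percent_similarity(a: str, b: str) -> float:
    """Percentage of columns with identical or same-class amino acids."""
    _check_aligned(a, b)
    a, b = a.upper(), b.upper()
    similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        group = _group_of(x)
        if x == y or (group is not None and group == _group_of(y)):
            similar += 1
    return 100.0 * similar / len(a)


if __name__ == "__main__":
    query, hit = "MKVLDE", "MRVLEQ"  # made-up aligned peptides
    print(f"identity:   {percent_identity(query, hit):.1f}%")
    print(f"similarity: {percent_similarity(query, hit):.1f}%")
```

Because identical residues also count as similar, percent similarity is never lower than percent identity under this definition.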
· Other uses for computer sequence comparison and analysis include prediction of the secondary structure of proteins from the primary sequence; isoelectric point determination; hydrophobicity and hydrophilicity profiles of a protein; display of 3D structures for proteins; determination of restriction endonuclease cleavage sites within a DNA sequence; and prediction of potential secondary structure from an RNA primary sequence.
Open reading frames (ORF)
· Genes are DNA sequences that code for a functional product, whether RNA (rRNA or tRNA) or protein. Protein-coding genes on a particular DNA sequence are detected by looking for an Open Reading Frame (ORF), a stretch of codons that starts with the initiation codon ATG (AUG in the mRNA) and continues uninterrupted to a stop codon. However, not all sequences that start with ATG will code for a functional product.
· Gene expression can be determined by analyzing the presence of particular mRNAs within a tissue. A cDNA can be synthesized from the mRNA by reverse transcriptase. The cDNA can in turn be amplified by PCR and used to make a probe for use in microarrays (see below).
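A simple scan for candidate open reading frames, as described at the start of this section, can be sketched as follows. The code looks only at the three forward reading frames of the given strand, treats ATG as the start codon and TAA, TAG and TGA as stop codons, and reports coordinates rather than protein sequences. The function name, the minimum-length parameter and the example sequence are illustrative assumptions. As noted above, an ORF found this way is only a candidate; it need not encode a functional product.

```python
# Sketch: find candidate ORFs (ATG ... stop) in the three forward frames.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 0-based with end exclusive.

    Each ORF runs from an ATG through the first in-frame stop codon
    (the stop codon is included). `min_codons` is the minimum number of
    codons before the stop codon, counting the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    sequence = "CCATGGCCTAAGG"  # made-up sequence
    for start, end in find_orfs(sequence):
        print(start, end, sequence[start:end])
```

A fuller search would also scan the reverse complement, since a gene can lie on either strand.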
Polymerase Chain Reaction (PCR)
· PCR was devised by Kary Mullis in the mid-1980’s. A major problem in analyzing genes is that they are rare targets in a complex genome that in mammals may contain as many as 40,000 to 50,000 genes. PCR has enabled us to produce enormous numbers of copies of a specified DNA sequence without resorting to cloning.
· PCR amplifies specific regions of DNA. PCR exploits certain features of DNA replication. DNA polymerase uses single-stranded DNA as a template for the synthesis of a complementary new strand. The single-stranded DNA templates can be produced by simply heating double-stranded DNA to temperatures near boiling. DNA polymerase also requires a small section of double-stranded DNA to initiate (prime) synthesis. Therefore, the starting point for DNA synthesis can be specified by supplying an oligonucleotide primer that anneals to the template at that point. Both DNA strands can serve as templates for synthesis provided an oligonucleotide primer is supplied for each strand. For a PCR, the primers are chosen to flank the region of DNA that is to be amplified so that the newly synthesized strands of DNA, starting at each primer, extend beyond the position of the primer on the opposite strand.
· The net result of a PCR is that by the end of n cycles, the reaction contains a theoretical maximum of 2^n double-stranded DNA molecules (per starting template molecule) that are copies of the DNA sequence between the primers. Thus, PCR amplifies the DNA of interest in an exponential fashion.
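The exponential growth described above is easy to put into numbers. The sketch below computes the theoretical maximum of 2^n copies per starting molecule. The optional `efficiency` parameter is an assumption added for illustration and is not part of the notes: it models each cycle copying only a fraction of the templates, so that the yield becomes (1 + efficiency)^n.

```python
# Sketch: theoretical yield of a PCR after a given number of cycles.

def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` cycles; efficiency 1.0 is perfect doubling."""
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} cycles: {pcr_copies(1, n):,.0f} copies per starting molecule")
    # Hypothetical 90% per-cycle efficiency, for comparison.
    print(f"30 cycles at 90% efficiency: {pcr_copies(1, 30, 0.9):,.0f} copies")
```

Even a modest drop in per-cycle efficiency lowers the final yield substantially, because the shortfall compounds at every cycle.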
· The bacterium Thermus aquaticus lives in water at a temperature of 75°C. Its DNA polymerase (Taq polymerase) has a temperature optimum of 72°C and is reasonably stable even at 94°C. Taq polymerase can be added just once at the start of a reaction and will remain active through a complete set of amplification cycles. This development has allowed the automation of the PCR through the use of thermal cyclers, which are heating blocks that can be programmed to carry out the time and temperature cycles for a PCR.
· PCR can be used to study the pattern of gene expression: mRNA is converted to cDNA using reverse transcriptase, and the cDNA then serves as the template for the PCR.
· Through DNA sequencing, molecular biologists have been able to sequence whole genomes, and hence a new discipline has been born: genomics. Genomics refers to the study of the structure and function of whole genomes. Analysis of the expression of specific genes has also become possible on a large scale with the advent of a new technique: DNA microarrays and microchips.
DNA Microarray (Genome Chip): www.gene-chips.com/
Protocols for DNA chips: http://www.bio.davidson.edu/biology/gcat/protocols/gcatprobes.html
· Molecular biologists are now able to spot many tiny volumes (0.25-1 nanoliter) of DNA on glass or nylon surfaces. This allows many different DNAs to be spotted on one chip, called a DNA microarray. The spots are very small (100-150 µm in diameter), and the centers of the spots are only 200-250 µm apart. After spotting, the DNAs are air dried and covalently attached by UV radiation to a thin silane layer on top of the glass.
· A microchip or oligonucleotide array is made when synthetic oligonucleotide sequences are placed on the chip.
· The oligonucleotides on a microchip or the cDNAs on a microarray can be hybridized to labeled RNA isolated from cells (or to corresponding cDNAs) to determine gene expression.
· A similar but much more complex quest is to learn about an organism’s proteome, that is, the properties and activities of all the proteins that organism makes in its lifetime. This field is therefore called proteomics.