Deoxyribonucleic acid (DNA) is a polymeric molecule (one composed of a chain of individual units) that encodes the genetic instructions used in the development and functioning of all known living organisms and many viruses.
In eukaryotic organisms like animals, plants and fungi, most DNA is stored in the nucleus, which protects it from the harsh environment in the rest of the cell.
The monomers (the single units which are chained together) that make up DNA are called nucleotides. Each nucleotide is composed of a nucleobase, which can be cytosine (C), guanine (G), adenine (A) or thymine (T), together with a sugar molecule called deoxyribose and a phosphate group.
How DNA is read: RNA and transcription
The central dogma of molecular biology states that the flow of information starts with DNA, which is transcribed into RNA, which is then translated into a protein (DNA → RNA → protein).
One major type of RNA is messenger RNA (mRNA). Genes are transcribed into mRNA in the nucleus and mRNA then leaves the nucleus to be translated into a protein.
So why is a ‘middle man’ between DNA and protein required? There are several reasons, including protection of the underlying DNA and the fact that many mRNA copies can be active at the same time. If we return to the book analogy, when reading the instructions stored in DNA, rather than taking the whole book, cells make photocopies of the instruction they need and work from that. In this way, the original instructions can be kept safely in their protected position in the nucleus.
Once outside the nucleus, mRNA is translated into proteins by complexes called ribosomes. These read the mRNA three nucleotides (one codon) at a time and use it to chain together amino acids into proteins.
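The way a ribosome reads mRNA can be sketched in a few lines of Python. This is only an illustration, not a model of the real machinery: it assumes the mRNA is an uppercase string of A, U, G and C, that reading starts at the first AUG, and that it stops at the first stop codon. The names `CODON_TABLE` and `translate` are made up for this example; the table itself is the standard genetic code, with `*` marking stop codons.

```python
# Standard genetic code, listed in the conventional U, C, A, G codon order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every three-letter codon to a one-letter amino acid code ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string into one-letter amino acid codes.

    Simplifying assumptions: reading starts at the first AUG and ends at
    the first stop codon (or when fewer than three bases remain).
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Codons read: AUG (M), GCC (A), AAA (K), UAA (stop).
print(translate("GGAUGGCCAAAUAAGG"))  # prints MAK
```

The example sequence is invented purely to show the start and stop behaviour.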
The chemical structure of RNA is very similar to that of DNA, but differs in three main ways:
Unlike double-stranded DNA, RNA is a single-stranded molecule in many of its biological roles and has a much shorter chain of nucleotides.
While DNA contains deoxyribose, RNA contains ribose. This makes the molecule less stable.
The complementary base to adenine is not thymine, as it is in DNA, but rather uracil, which is an unmethylated form of thymine.
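The base-pairing difference can be made concrete with a short Python sketch. It assumes the DNA template strand is written 3'→5', so that pairing base by base gives the mRNA in its usual 5'→3' direction. The function names and the example sequences are hypothetical.

```python
# Pairing rules when RNA is built against a DNA template:
# adenine in the template pairs with uracil in the RNA, not thymine.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') for a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5)


def coding_strand_to_mrna(coding_5_to_3: str) -> str:
    """Shortcut: the mRNA matches the coding strand with T replaced by U."""
    return coding_5_to_3.replace("T", "U")


template = "TACCGGATT"  # made-up template strand, written 3'->5'
coding = "ATGGCCTAA"    # its complementary coding strand, written 5'->3'

print(transcribe(template))           # prints AUGGCCUAA
print(coding_strand_to_mrna(coding))  # prints AUGGCCUAA
```

Both routes give the same mRNA, which is why sequences are often quoted from the coding strand with T simply swapped for U.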
Function of different DNA regions
Not all DNA has the same purpose. There are lots of ways to characterise the difference in function between different regions, but one broad distinction is between coding sequences and non-coding DNA.
DNA is called 'coding' if it contributes to the mRNA sequence that will be read by the translation machinery to produce proteins. Coding sequences are found within genes in the form of exons. In some organisms, coding sequences make up only a small percentage of the genome (roughly 1 to 2% in humans). Whilst only representing a small amount of total genetic content, changes in coding sequences, whether by mutation or genetic engineering, often have a larger effect on cells as they change the structure of a protein.
Non-coding DNA refers to sequences that do not contribute directly to protein formation via transcription and translation. These non-coding sequences can have a huge variety of roles:
Regulatory elements: These are DNA sequences which help control the expression of specific genes. Regulatory elements include promoters, which lie just upstream of the coding sequences of genes, as well as enhancers, which can be much further away from the genes they control (sometimes as far as 1 Mb away).
Introns: Coding sequences within genes are often not present in a single continuous stretch. Instead, different coding sequences that contribute to the same protein are split apart by sequences of non-coding DNA called introns. Through a process called splicing, the coding sequences (exons) of mRNA are stitched together and the introns separating them are removed.
Non-coding RNA: Some non-coding DNA is transcribed but the subsequent RNA is not translated into a protein. These non-coding RNAs can have a diverse range of roles within the cells, including aiding with translation (ribosomal RNA and transfer RNA) as well as playing roles in regulating gene expression e.g. microRNA and siRNA.
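As a rough illustration of splicing, the Python sketch below keeps the exon segments of a pre-mRNA string and drops everything between them. The sequence and the exon coordinates are invented for the example, and the `splice` function is a hypothetical helper; real splicing is carried out by the spliceosome, which recognises signals in the RNA itself rather than being handed coordinates.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as (start, end) index pairs, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up pre-mRNA: exon 1, then an intron, then exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"  # invented intron; starts with GU and ends with AG
exon_2 = "AAAUAA"
pre_mrna = exon_1 + intron + exon_2

# Exon positions within the pre-mRNA (assumed known for this sketch).
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))  # prints AUGGCCAAAUAA
```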
Here is a full list of different DNA functions from the iGEM Registry of Standard Biological Parts, which aims at standardizing and modularizing DNA sequences. With this library, you can start constructing your own DNA segments to perform new functions in organisms such as yeast or E. coli.
|Part List||Short Description|
|Promoters||A promoter is a DNA sequence that tends to recruit transcriptional machinery and lead to transcription of the downstream DNA sequence.|
|Ribosome binding sites||A ribosome binding site (RBS) is an RNA sequence found in mRNA to which ribosomes can bind and initiate translation.|
|Protein domains||Protein domains are portions of proteins cloned in frame with other protein domains to make up a protein coding sequence. Some protein domains might change the protein's location, alter its degradation rate, target the protein for cleavage, or enable it to be readily purified.|
|Protein coding sequences||Protein coding sequences encode the amino acid sequence of a particular protein. Note that some protein coding sequences only encode a protein domain or half a protein. Others encode a full-length protein from start codon to stop codon. Coding sequences for gene expression reporters such as LacZ and GFP are also included here.|
|Translational units||Translational units are composed of a ribosome binding site and a protein coding sequence. They begin at the site of translational initiation, the RBS, and end at the site of translational termination, the stop codon.|
|Terminators||A terminator is an RNA sequence that usually occurs at the end of a gene or operon mRNA and causes transcription to stop.|
|DNA||DNA parts provide functionality to the DNA itself. DNA parts include cloning sites, scars, primer binding sites, spacers, recombination sites, conjugative transfer elements, transposons, origami, and aptamers.|
|Plasmid backbones||A plasmid is a circular, double-stranded DNA molecule, typically containing a few thousand base pairs, that replicates within the cell independently of the chromosomal DNA. A plasmid backbone is defined as the plasmid sequence beginning with the BioBrick suffix, including the replication origin and antibiotic resistance marker, and ending with the BioBrick prefix.|
|Plasmids||A plasmid is a circular, double-stranded DNA molecule, typically containing a few thousand base pairs, that replicates within the cell independently of the chromosomal DNA. If you're looking for a plasmid or vector to propagate or assemble plasmid backbones, please see the set of plasmid backbones. There are a few parts in the Registry that are only available as circular plasmids, not as parts in a plasmid backbone; you can find them here. Note that these plasmids largely do not conform to the BioBrick standard.|
|Primers||A primer is a short single-stranded DNA sequence used as a starting point for PCR amplification or sequencing. Although primers are not actually available via the Registry distribution, we include commonly used primer sequences here.|
|Composite parts||Composite parts are combinations of two or more BioBrick parts.|
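To give a feel for how modular parts combine into a composite part, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Part` class, the `compose` helper, the part names and the short placeholder sequences are not taken from the Registry. It also ignores the scar sequences that real BioBrick assembly leaves between joined parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """A hypothetical stand-in for a registry part."""
    name: str
    kind: str       # e.g. "promoter", "rbs", "coding", "terminator"
    sequence: str   # DNA sequence, written 5'->3'


def compose(name: str, parts: list[Part]) -> Part:
    """Build a composite part by joining part sequences in order.

    Simplification: plain concatenation, with no assembly scars.
    """
    return Part(
        name=name,
        kind="composite",
        sequence="".join(part.sequence for part in parts),
    )


# Placeholder parts with invented names and sequences.
promoter = Part("promoter_demo", "promoter", "TTGACA")
rbs = Part("rbs_demo", "rbs", "AGGAGG")
cds = Part("cds_demo", "coding", "ATGGCCTAA")
terminator = Part("terminator_demo", "terminator", "GCGCTTTT")

device = compose("device_demo", [promoter, rbs, cds, terminator])
print(device.kind, len(device.sequence))  # prints composite 29
```

Ordering the list as promoter, RBS, coding sequence, terminator mirrors the order in which these elements act on a gene, from transcription start to transcription stop.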
Complementary DNA - cDNA
Complementary DNA (cDNA) is DNA synthesized from a messenger RNA (mRNA) template in a reaction catalysed by the enzyme reverse transcriptase; the first strand made is single-stranded and can then be converted into double-stranded DNA. As the RNA template has already been spliced, cDNA lacks introns and carries only the sequence present in the mature mRNA, including the DNA that codes directly for the protein of interest.
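Reverse transcription can be sketched as taking the reverse complement of the mRNA, with uracil pairing to adenine. The Python below is an illustration under simple assumptions: sequences are uppercase strings written 5'→3', the function names are hypothetical, and the example mRNA is invented.

```python
# Pairing rules for copying RNA into DNA, and DNA into DNA.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first cDNA strand (5'->3') for an mRNA written 5'->3'."""
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(mrna))


def second_strand(cdna: str) -> str:
    """Return the DNA strand complementary to the first cDNA strand (5'->3')."""
    return "".join(DNA_TO_DNA_PAIR[base] for base in reversed(cdna))


mrna = "AUGGCCUAA"  # made-up spliced mRNA
first = first_strand_cdna(mrna)
print(first)                 # prints TTAGGCCAT
print(second_strand(first))  # prints ATGGCCTAA
```

The second strand reads like the original mRNA with U replaced by T, which is why a cDNA clone carries the intron-free coding sequence.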
cDNA is often used to clone eukaryotic genes into prokaryotes. When scientists want to express a specific protein in a cell that does not normally express that protein, they will transfer the cDNA that codes for the protein to the recipient cell. cDNA is also produced naturally by retroviruses (such as HIV-1, HIV-2 and Simian Immunodeficiency Virus) and then integrated into the host's genome, where it forms a provirus.