Steps in building a metagenomics data set
- First, a sample must be collected and its DNA extracted.
- Second, the DNA must be amplified until there are enough copies to sequence; the resulting prepared DNA is called the sequencing library.
- Third, the sequencing library is sequenced to yield the raw DNA data.
On the Illumina HiSeq platform, this raw data consists of sequencing "reads": strings of 100 to 150 letters (A, T, G, and C), with a quality score attached to each letter. The fourth step is usually to assemble the reads into longer strings called contigs, with the goal of building one contig for each physically distinct piece of DNA (microbial chromosomes and plasmids) in the metagenome.
Assembly is a critical step, but because DNA sequences contain both global and local repeats, each physical piece of DNA is usually still represented by many contigs that cannot be joined into a single one. This assembly problem is a major research topic at the intersection of computer science and microbiology. Once the reads have been assembled, genes can be predicted from the contigs, and the contigs can be clustered into draft genomes (a step often called binning) using common machine-learning algorithms. Bioinformaticians usually analyze the gene content and draft genomes in collaboration with microbiologists and molecular biologists.
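The clustering of contigs into draft genomes can be illustrated with a minimal sketch. This is not the method of any particular binning tool; it assumes that each contig is summarized by its tetranucleotide (4-mer) frequency profile, which tends to be similar across contigs from the same genome, and that plain k-means groups those profiles. The names `tetra_freq`, `kmeans`, and `bin_contigs` are hypothetical, and real binners typically also use read coverage and more careful initialization.

```python
from itertools import product
import math

# All 256 tetranucleotides (4-mers) over A, C, G, T, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetra_freq(seq):
    """Return the normalized 4-mer frequency vector of a contig.

    Windows containing non-ACGT characters (e.g. N) are skipped.
    """
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def kmeans(vectors, k, iterations=20):
    """Plain k-means with Euclidean distance.

    Centroids start as the first k vectors so the result is deterministic;
    this simple initialization is an assumption of the sketch.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bin_contigs(contigs, k):
    """Assign each contig to one of k hypothetical draft-genome bins."""
    return kmeans([tetra_freq(s) for s in contigs], k)
```

For example, `bin_contigs(contigs, 2)` on a mix of GC-rich and AT-rich contigs should place the two kinds in separate bins, because their 4-mer profiles barely overlap.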
Metagenomics relies heavily on the Illumina HiSeq sequencing platforms, which deliver the most sequence data at the lowest cost per base.
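Each letter of an Illumina read carries a quality score. No file format is named here, so the sketch below assumes the common FASTQ layout (four lines per read: header, sequence, separator, qualities) with the Phred+33 encoding that Illumina adopted from pipeline version 1.8. A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q = 20 means roughly a 1% chance that the base call is wrong. The function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding by default: score = ASCII code - 33.
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (read_id, sequence, scores)."""
    header, sequence, plus, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, phred_scores(quality)
```

For a HiSeq read of 100 to 150 letters, the quality line has the same length as the sequence, giving one score per base.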
Metagenomic assembly is a key step and remains an active area of research and development.
De Bruijn graph-based assemblers such as IDBA, Ray Meta, ALLPATHS-LG, and MetaVelvet-SL, among others, are popular open-source tools for assembling short reads from metagenomes. Each assembler has its own strengths. Assembly quality, as measured by contig length, is ultimately limited by read length: a repeat region that no read can span prevents a contig from being extended, because the assembler cannot tell which sequence follows the repeat.
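The core idea behind de Bruijn graph assemblers can be shown with a toy sketch. It is not the algorithm of IDBA, Ray Meta, ALLPATHS-LG, or MetaVelvet-SL: reads are cut into k-mers, each k-mer becomes an edge between its two overlapping (k-1)-mers, and contigs are read off as maximal non-branching paths. It assumes error-free reads from a single strand and ignores reverse complements, coverage, and paired-end information, all of which real assemblers use. The function names are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every k-mer (substring of length k) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Each distinct k-mer adds one edge from its prefix to its suffix.
    """
    edges = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            edges[kmer[:-1]].add(kmer[1:])
    return edges


def assemble_contigs(reads, k):
    """Return contigs as maximal non-branching paths (unitigs), sorted.

    A path stops wherever a node has more than one successor or more than
    one predecessor, which is what an unresolved repeat creates. Isolated
    circular paths (every node non-branching) are not reported here.
    """
    out_edges = build_graph(reads, k)
    in_degree = defaultdict(int)
    for node, successors in out_edges.items():
        for nxt in successors:
            in_degree[nxt] += 1
    nodes = set(out_edges) | set(in_degree)

    def is_simple(node):
        # An interior contig node has exactly one way in and one way out.
        return in_degree[node] == 1 and len(out_edges.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if is_simple(start):
            continue  # interior node; reached by walking from a path start
        for nxt in out_edges.get(start, ()):
            contig = start + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(out_edges[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)
```

With k = 4, the overlapping reads `ACGTTGCA` and `TTGCATGG` assemble into the single contig `ACGTTGCATGG`. The reads `TTCCGAGG` and `AACCGATT`, which share the repeat `CCGA`, instead yield five short contigs: because the graph only records (k-1)-letter overlaps, the repeat creates a branch and the contigs break around it. Neighboring contigs overlap by k-1 letters.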
Importance of metagenomics for several industries