Exome Sequencing: Introduction and Applications
Authors: Sandhya Sanand1, Parampreet Kaur1, Raj Kiran2, Kishor Gaikwad1
1ICAR - National Research Centre on Plant Biotechnology, New Delhi - 110012.
2ICAR - National Bureau of Plant Genetic Resources, New Delhi-110012


Summary

Exome sequencing is a rapidly expanding technique with large-scale implications for genome biology. In this article, we give a brief introduction to exome sequencing and its applications in various fields, along with its advantages and disadvantages.

Introduction

The exome is the complete set of exons in a genome, i.e., the sequences that are retained in the mature RNA after transcription rather than being removed by splicing, and it includes the protein-coding part of the genome. Depending on the species, the protein-coding exons constitute about 1-2% of the whole genome. The exonic regions of a genome can be selectively sequenced using two technological alternatives:

  1. Solution-based hybridization approaches: Biotinylated oligonucleotide probes (baits) are mixed with fragmented DNA samples and selectively hybridize to the target regions of the genome. The probes, together with the fragments bound to them, are then pulled down using magnetic streptavidin beads, and the non-targeted portion of the genome is washed away. The target regions are then enriched by PCR, followed by sequencing and bioinformatics analysis.
  2. Array-based hybridization approaches: These are similar to the solution-based approach except that the probes are bound to a high-density microarray. Array-based methods were the first to be used for exome capture (Albert et al. 2007), but solution-based methods are quicker, easier and more cost effective, require less input DNA, and are hence more efficient.
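
How well a capture worked is often summarized by the share of sequenced reads that land inside the targeted regions. The following is a minimal sketch of that calculation, not part of any vendor's pipeline: the function name `on_target_fraction`, the interval layout and the example coordinates are all hypothetical, and a read is counted as on-target simply when its start position falls inside a target interval.

```python
from bisect import bisect_right


def on_target_fraction(read_positions, targets):
    """Return the fraction of reads whose start falls inside a target interval.

    read_positions: iterable of (chrom, pos) tuples.
    targets: dict mapping chrom -> sorted, non-overlapping, half-open
             (start, end) intervals (hypothetical layout for this sketch).
    """
    # Pre-compute the interval starts per chromosome for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in targets.items()}
    hits = 0
    total = 0
    for chrom, pos in read_positions:
        total += 1
        ivs = targets.get(chrom)
        if not ivs:
            continue  # no targets on this chromosome
        # Index of the last interval starting at or before pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < ivs[i][1]:
            hits += 1
    return hits / total if total else 0.0


# Made-up example: two target exons on chr1 and four reads.
targets = {"chr1": [(100, 200), (500, 620)]}
reads = [("chr1", 150), ("chr1", 300), ("chr1", 510), ("chr2", 150)]
fraction = on_target_fraction(reads, targets)
print(fraction)  # two of the four reads are on target: 0.5
```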

    Exome Capture platforms
    Several popular ready-to-use exome capture platforms for the preparation and sequencing of exome-enriched libraries are commercially available, mostly from Agilent, Roche NimbleGen and Illumina. In addition to human kits, NimbleGen offers capture kits for the maize, barley, wheat and soy exomes. Agilent also supports non-human species, with a kit available for the soy exome. The NimbleGen platform offers the greatest bait density and adequately covers most of its targeted bases. Illumina platforms are better at targeting untranslated regions. The Agilent platform displays better efficiency across low-GC targets, possibly because of its lower number of PCR cycles, its longer baits and/or its use of RNA probes, whereas NimbleGen and Illumina use DNA probes.
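
To make "low-GC targets" concrete, the sketch below computes the GC fraction of a bait or target sequence. It is only an illustration: the function name `gc_fraction` and the two short sequences are made up for this example and do not come from any capture kit.

```python
def gc_fraction(seq):
    """Return the fraction of called bases (A, C, G, T) that are G or C."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


# Made-up sequences standing in for a low-GC and a high-GC target.
baits = {
    "bait_low_gc": "ATATTTAAGCATATTA",
    "bait_high_gc": "GCGCGGCCATGCGCGC",
}
for name, seq in baits.items():
    print(name, gc_fraction(seq))
# bait_low_gc 0.125
# bait_high_gc 0.875
```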

    Exome sequencing applications:
    Exome sequencing offers a wide range of applications in population genetics/genomics, evolutionary studies, gene isolation and comparative genomics.
  1. Clinical applications: The identification of the causal variant of a rare form of inflammatory bowel disease in an infant (Worthey et al. 2011) was an early successful clinical application of whole exome sequencing. Exome sequencing is especially well suited to applications that require high coverage of the analyzed regions in order to identify low-frequency sequence variants. Clinical applications of exome sequencing include:
  • Identification of new genetic markers for health traits
  • Identification of somatic mutations in cancer genome
  • Identification of mosaic mutations in disease-related genes
  • Identification of mitochondrial DNA heteroplasmy
  • Identification of sequence variants in mixed DNA samples (e.g., in forensic genetics)
Exome sequencing has been used to detect causative variants for a number of diseases, as reviewed by Warr et al. (2015). Examples include Alzheimer disease, maturity-onset diabetes of the young, high myopia, autosomal recessive polycystic kidney disease, immunodeficiency leading to infection with human herpesvirus 8 (which causes Kaposi sarcoma), and a number of cancer predisposition mutations.
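
The link between high coverage and the detection of low-frequency variants (somatic, mosaic or heteroplasmic) can be illustrated with a simple binomial model. The sketch below is an assumption-laden illustration rather than a variant caller: it ignores sequencing errors and mapping bias, and the function name, the 5% allele fraction and the threshold of five supporting reads are hypothetical choices.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads variant-supporting reads) at a given depth.

    Assumes each read independently carries the variant with probability
    allele_fraction (binomial model, no sequencing error).
    """
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical scenario: a variant present in 5% of the DNA molecules,
# called only if at least 5 reads support it.
for depth in (30, 100, 500):
    print(depth, round(detection_probability(depth, 0.05, 5), 3))
```

Under these assumptions the probability of seeing enough supporting reads rises steeply with depth, which is why targeted, high-depth sequencing suits this kind of application.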

  2. Crop improvement: Exome sequencing is a very practical method for assessing genomic alterations in exon sequences and can therefore be a handy tool for generating resources for crop improvement, including:
  • Biodiversity studies and identification of new species
  • Single gene identification (cloning by sequencing)
  • Dynamics of host-pathogen interactions
  • Study of crop evolution
  • Trait improvement and marker development
  • Management of symbiotic cropping strategy
  • Variant discovery for crop improvement
  3. Health traits: Exome sequencing has also been used in other mammals to discover variants associated with health traits. For example, Ahonen et al. (2013) combined exome sequencing with a genome-wide association study to identify a frameshift mutation causing blindness in Phalène dogs. Similarly, exome sequencing in cattle has identified strong candidate variants for haplotypes associated with reduced fertility in Holsteins, which can be used to selectively breed against these detrimental haplotypes.
Whole Genome Sequencing Vs Whole Exome Sequencing


  • WGS permits the search for SNVs, indels, structural variants (SVs) and CNVs in both the coding and non-coding parts of the genome, whereas WES misses regulatory sequences such as promoters and enhancers.
  • Sequence coverage is more reliable in WGS than in WES; WGS can therefore detect structural variants more precisely, and it avoids the reference sequence bias introduced by the probe sequences used in WES.
  • WGS shows greater coverage uniformity than WES.
  • WGS can take advantage of longer read lengths and produces much larger data sets. The target probes designed for exome sequencing are less than 120 nt long, so sequencing with longer reads offers little benefit in WES. In the long run, WGS demands more time and storage space for data storage and analysis.
  • WGS is more universal and covers the entire genome, whereas WES is limited and requires prior knowledge of the location and sequence of the features to be targeted.
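
Coverage uniformity can be quantified in several ways. The sketch below uses one simple convention, assumed here for illustration: the fraction of positions whose depth is at least 20% of the mean depth. The function name and both depth lists are made up; they only mimic an evenly covered region and an unevenly captured one.

```python
def uniformity(depths, fraction_of_mean=0.2):
    """Fraction of positions covered at >= fraction_of_mean * mean depth."""
    if not depths:
        return 0.0
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        return 0.0  # nothing was sequenced, so nothing is usefully covered
    threshold = fraction_of_mean * mean_depth
    return sum(d >= threshold for d in depths) / len(depths)


# Made-up per-position depths, not real data.
wgs_like = [28, 31, 30, 29, 32, 30, 27, 33]   # even coverage around 30x
wes_like = [120, 5, 90, 0, 150, 60, 3, 132]   # uneven capture efficiency

print(uniformity(wgs_like))  # 1.0
print(uniformity(wes_like))  # 0.625
```
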
Advantages of exome sequencing: Excluding repetitive and other non-coding genomic sequences that are not pertinent confers important advantages over randomly sequencing the entire genome:

  • More samples can be multiplexed for a given sequencing capacity
  • High sequencing depth facilitates identification of orthologs and paralogs in a population
  • Avoids the difficulty of determining the functional impact of variants in noncoding regions
  • Identifies variants across a wide range of applications
  • Achieves comprehensive coverage of coding regions
  • Cost-effective in comparison to whole-genome sequencing: WES targets protein-coding regions, so the sequenced targets represent less than 2% of the genome. Sequencing only this targeted region at high depth lowers sequencing, storage and analysis costs. The reduced cost makes it reasonable to sequence more samples, which allows large population-based comparisons, and to resequence samples, which improves accuracy
  • Yields a smaller, more manageable data set for faster, easier analysis compared to whole-genome approaches
  • Transferability of exome capture between species
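
The cost and multiplexing advantages follow from simple arithmetic on how many bases must be sequenced per sample. The sketch below works through one hypothetical scenario; the genome size, the depths (30x for WGS, 100x for WES) and the 60% on-target rate are assumed values chosen for illustration, and only the 2% exome fraction echoes the upper bound given in this article.

```python
def bases_required(target_size_bp, mean_depth, on_target_rate=1.0):
    """Raw bases to sequence to reach mean_depth over the target.

    on_target_rate accounts for reads lost to off-target capture.
    """
    return target_size_bp * mean_depth / on_target_rate


GENOME_BP = 3_000_000_000  # assumed genome size (roughly human-sized)
EXOME_FRACTION = 0.02      # upper bound on the exome share of the genome

# Assumed depths and capture efficiency, for illustration only.
wgs_bases = bases_required(GENOME_BP, mean_depth=30)
wes_bases = bases_required(GENOME_BP * EXOME_FRACTION, mean_depth=100,
                           on_target_rate=0.6)

print(f"WGS: {wgs_bases / 1e9:.0f} Gb per sample")
print(f"WES: {wes_bases / 1e9:.0f} Gb per sample")
print(f"Exomes per genome-equivalent of sequencing: {wgs_bases / wes_bases:.1f}")
```

With these assumed numbers, one whole genome at 30x costs about as much raw sequencing as nine exomes at 100x, even after allowing for off-target reads. Different depths or capture efficiencies would change the ratio.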

    Limitations of exome sequencing:
  • Inability to comprehensively represent genomic variants
  • Genomic regions which are functional but not yet recognized are not included
  • Exclusion of noncoding variants, for example in regulatory regions, that could have a major impact on a trait
  • Incomplete gene coverage, as capture designs do not target 100% of the genes in a genome
  • Limited ability to detect the following: copy number variation (deletions as well as duplications), epistatic interactions, epigenetic changes, mosaic mutations, mutations in repetitive or high-GC regions, mutations in genes with corresponding pseudogenes or other highly homologous sequences, and triplet repeat disorders

    Conclusion: Exome sequencing is an excellent and practical (cost- and time-efficient) tool with a wide range of applications in both human and agricultural sciences. However, exome sequencing by itself is not sufficient; an integrative approach that combines multiple omics technologies and bioinformatics is required to fully harness its potential.


References:
Albert TJ, Molla MN, Muzny DM, Nazareth L and Wheeler D (2007) Direct selection of human genomic loci by microarray hybridization. Nat. Methods 4: 903-905.

Worthey EA, Mayer AN, Syverson GD, Helbling D, Bonacci BB et al (2011) Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet. Med. 13: 255-262.

Ahonen SJ, Arumilli M and Lohi H (2013) A CNGB1 frameshift mutation in Papillon and Phalène dogs with progressive retinal atrophy. PLoS One 8: e72122.

Warr A, Robert C, Hume D, Archibald A, Deeb N and Watson M (2015) Exome Sequencing: Current and Future Perspectives. G3: Genes, Genomes, Genetics 5: 1543-1550.

https://blog.genohub.com/2015/02/21/whole-genome-sequencing-wgs-vs-whole-exome-sequencing-wes/


About Author / Additional Info:
I am currently working as a Scientist at ICAR-National Research Center On Plant Biotechnology, New Delhi