Single Nucleotide Polymorphisms (SNPs) in complex plant genomes - Discovery and Genotyping
Authors: Dr. SV Amitha Mithra and Dr. Amolkumar U. Solanke
NRC on Plant Biotechnology, LBS Building, IARI, Pusa Campus, New Delhi


Introduction:

Single Nucleotide Polymorphisms (SNPs) are the most abundant class of molecular markers available to researchers across varied fields of biology, including agriculture, population genetics, evolutionary biology and molecular biology. They are increasingly used in genotyping applications such as detection of quantitative trait loci and mapping in a number of plants (Hyten et al. 2008; Yan et al. 2010). Their abundance, comparatively low cost per data point and amenability to high-throughput genotyping, which together allow dense coverage of the genome, make them the most sought-after markers. Their main drawback is that, being biallelic in most cases, they have a lower polymorphism information content (PIC) than the other 'marker of choice', the multi-allelic 'simple sequence repeats' (SSRs), which were all-pervasive until the SNP era began (Mammadov et al. 2012). However, SSRs are neither as evenly distributed across the genome nor as abundant as SNPs, and their genotyping cost remains high.
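The difference in information content between a biallelic SNP and a multi-allelic SSR can be made concrete with a small calculation. The sketch below uses the commonly used PIC formula, PIC = 1 - sum(p_i^2) - sum over allele pairs of 2 * p_i^2 * p_j^2, where p_i are allele frequencies. The function name `pic` and the allele frequencies are illustrative assumptions, not values from any dataset.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one marker.

    freqs: allele frequencies of the marker (must sum to 1).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared allele frequencies.
    squared = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - squared - cross


# Biallelic SNP at its most informative (both alleles equally frequent).
snp_pic = pic([0.5, 0.5])
# Hypothetical SSR with five equally frequent alleles.
ssr_pic = pic([0.2] * 5)

print(round(snp_pic, 3), round(ssr_pic, 3))  # 0.375 0.768
```

Under this formula a biallelic marker cannot exceed a PIC of 0.375, whereas a marker with more alleles can approach 1, which is why many SNPs are needed to match the information from a few SSRs.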

Techniques for large scale SNP discovery/genotyping:
Until the advent of Next Generation Sequencing (NGS) technologies, SNPs were discovered and genotyped in two distinct steps. A number of techniques are used for genotyping. The most important are: Single Base Extension (SBE) based techniques coupled with either restriction enzymes or PCR-based amplification (the SNaPshot reaction run on capillary sequencers, and oligonucleotide hybridization coupled with SBE in the Infinium assay of Illumina); allele-specific primer extension and ligation using locus-specific and allele-specific oligonucleotides, followed by PCR (GoldenGate assay of Illumina); competitive allele-specific PCR (KASPar assay); and the purely hybridization-based technique from Affymetrix. Once NGS came into the picture, it became possible to sequence genomes or amplicons from a large number of genotypes and identify SNPs directly, thus combining discovery and genotyping in a single step (Elshire et al. 2011).

Issues in SNP discovery in complex genomes:

Major crop species have the advantage of being amenable to controlled crossing and selfing, with little or no inbreeding depression, high fecundity and short generation times. Despite these advantages, and although SNP markers and NGS technology were expected to revolutionize mapping and lead to faster gene isolation and functional genomics, SNP discovery in many complex genomes is fraught with difficulties. Of the several reasons, the most important is the polyploid nature of these crop species. Among the major food crops, all except rice and maize (which are diploid, like humans) have higher ploidy levels or very little genomic information available: for instance, bread wheat and oats are hexaploids; cotton, ragi (finger millet), potato, Brassica juncea and B. napus are tetraploids. The higher the ploidy level, the more difficult it is to assemble genomes and develop physical maps. The bioinformatic tools available for assembly and SNP detection tend to give ambiguous results because they cannot discriminate true (allelic) SNPs from homeologous SNPs, that is, sequence differences between the subgenomes. The presence of paralogous loci (duplications) aggravates this problem and gives rise to many false SNPs. When heterozygosity is also prevalent, discovering real SNPs is still more challenging.

Secondly, some crops have very large genomes. For instance, though rice and barley are similar in morphological and molecular complexity, the barley genome is roughly 11 times bigger than that of rice. This is known as the C-value enigma (earlier, the C-value paradox), a well-known concept in molecular genetics. The increase in size may be due to repetitive sequences and their composition. Retroelements, which comprise more than 50% of the maize genome, are one of the important reasons for huge genome sizes. Thus, though sequencing is possible, the downstream processes of covering the entire genome, assembly, data management, analysis and annotation remain a great challenge.
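To illustrate why homeologous differences are mistaken for SNPs, the following sketch models an idealized case in which reads from every subgenome copy of a locus are aligned to a single reference position. It assumes equal read depth from each chromosome copy, which real data only approximate; the function name `pileup_alt_fraction`, the subgenome labels and the alleles are hypothetical.

```python
def pileup_alt_fraction(subgenome_alleles, ref_allele, reads_per_copy=10):
    """Fraction of non-reference reads when all copies of a locus
    pile up at one reference position.

    subgenome_alleles: dict mapping a subgenome label to the pair of
    alleles carried on its two homologous chromosomes.
    """
    alt_reads = 0
    total_reads = 0
    for alleles in subgenome_alleles.values():
        for allele in alleles:
            total_reads += reads_per_copy
            if allele != ref_allele:
                alt_reads += reads_per_copy
    return alt_reads / total_reads


# Fully homozygous hexaploid line: subgenome B carries a fixed difference
# (T) relative to subgenomes A and D (C). No allelic variation exists.
inbred_hexaploid = {"A": ("C", "C"), "B": ("T", "T"), "D": ("C", "C")}

# True heterozygote at a single-copy locus in a diploid, for comparison.
diploid_het = {"A": ("C", "T")}

print(pileup_alt_fraction(inbred_hexaploid, "C"))  # one third of reads show T
print(pileup_alt_fraction(diploid_het, "C"))       # half of reads show T
```

In this idealized model the homozygous hexaploid shows a mixed-base signal at the position even though no allelic variation is present, and the same signal would recur in every line of the species. A SNP caller that expects diploid behaviour may report it as a heterozygous SNP.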

Way forward:

Understanding the issues involved in SNP discovery in complex genomes paves the way for developing and using better algorithms that eliminate such problems and identify true SNPs. For example, to distinguish homeologous SNPs from homologous (allelic) ones, SNP haplotypes are used rather than single SNPs, which considerably decreases false discovery. To eliminate the problem caused by paralogous loci, one can compare the average coverage of the genome with that of individual regions; regions of very high coverage are mostly due to duplications. Moreover, these regions tend to have more SNPs than others (Hirsch and Buell 2013). Longer reads, paired-end reads and large-insert libraries can also improve the quality of results from NGS platforms. Utilizing the cytogenetic stocks already available, developing appropriate new stocks such as nullisomic and monosomic deletion or substitution lines, and using them in conjunction with these promising technologies would be a good strategy to decode complex genomes.
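The coverage comparison described above can be sketched as a simple filter. This is a minimal illustration, not a published pipeline: the function name `flag_high_coverage`, the two-fold threshold and the depth values are assumptions, and the median is used as the genome-wide baseline instead of the average because it is less distorted by the very regions being searched for.

```python
from statistics import median


def flag_high_coverage(region_depth, fold=2.0):
    """Return the names of regions whose read depth exceeds
    `fold` times the median depth across all regions.

    region_depth: dict mapping a region name to its mean read depth.
    """
    baseline = median(region_depth.values())
    return sorted(
        name for name, depth in region_depth.items() if depth > fold * baseline
    )


# Hypothetical mean read depths for five regions.
depths = {
    "region_1": 28.0,
    "region_2": 31.0,
    "region_3": 95.0,  # about three times the typical depth
    "region_4": 30.0,
    "region_5": 29.0,
}

suspect_regions = flag_high_coverage(depths)
print(suspect_regions)  # ['region_3']
```

SNPs called inside the flagged regions would then be treated with suspicion or excluded, since reads from duplicated loci are likely to have collapsed there.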

References:
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K et al. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6(5): e19379. doi:10.1371/journal.pone.0019379.
Hirsch CN and Buell CR (2013) Tapping the promise of genomics in species with complex, nonmodel genomes. Annu. Rev. Plant Biol. 64: 89-110.
Hyten DL, Song Q, Choi IY, Yoon MP, Specht JE, Matukumalli LK, Nelson RL, Shoemaker RC, Young ND and Cregan PB (2008) High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. Theor. Appl. Genet. 116: 945-952.
Mammadov J, Aggarwal R, Buyyarapu R and Kumpatla S (2012) SNP markers and their impact on plant breeding. Int. J. Plant Genomics 2012: 728398. doi:10.1155/2012/728398.
Yan J, Yang X, Shah T, Sanchez-Villeda H, Li J, Warburton M, Zhou Y, Crouch JH and Xu Y (2010) High-throughput SNP genotyping with the GoldenGate assay in maize. Mol. Breeding 25: 441-451.
