Proceedings
of the XLV Italian Society of Agricultural Genetics - SIGA Annual Congress
Salsomaggiore Terme, Italy - 26/29 September, 2001
ISBN 88-900622-1-5
Poster Abstract
SEQUENCE DIVERSITY AND SNP MARKER DEVELOPMENT IN NORWAY SPRUCE
DEGLI IVANISSEVICH S.*,
MORGANTE M.*,**
* Dipartimento di Produzione
Vegetale e Tecnologie Agrarie Università degli Studi di Udine, via delle
Scienze 208, 33100 Udine
stefania.ivanissevich@dpvta.uniud.it
** DuPont Crop Genetics, Molecular Genetics Group, Delaware Technology
Park 200, Newark, DE, USA
SNPs, molecular markers, Norway spruce, genetic diversity
Direct analysis of genetic
variation at the sequence level (Single Nucleotide Polymorphisms, SNPs) offers
several advantages over other types of DNA marker systems. SNPs are rapidly
becoming the marker of choice for many applications in genome analysis due to
their abundance (especially important in linkage disequilibrium based mapping
approaches) and to the fact that high throughput genotyping methods are being developed
for their analysis. An additional advantage of this approach lies in
the phylogenetic information gathered through sequence variation analysis, which
allows inferences to be drawn on allele and population history that cannot be
obtained with any of the other marker systems available. However, information
on the frequency and distribution of SNPs in plants is limited so far.
With the aim of developing
SNP markers in Norway spruce, we designed 60 primer pairs on cDNA sequences.
In such a large (1C = 15 × 10^9 bp) and highly repetitive (80% repetitive
DNA) genome, before even attempting to identify SNPs one has to find
single-copy regions. EST sequences provide an attractive source of such regions
even if the frequency of SNPs may be lower in the protein-coding portions.
Introns and untranslated regions should therefore be preferentially targeted,
which should also yield locus-specific amplification products more often,
especially since the presence of large gene families has been reported for conifers.
Norway spruce is an
outcrossing, highly heterozygous species with very large effective population
sizes. Based on isozyme and microsatellite data it appears to carry high levels
of variability, most of which (>95%) resides within populations. Conifers in
general are considered among the most genetically variable plant species. We
therefore set out to first estimate the levels and distribution of DNA sequence
variation in expressed portions of the spruce genome and secondly to verify the
feasibility of SNP marker development from EST sequences. We amplified fragments
300-500 bp long from DNA extracted from seed endosperms (megagametophytes),
which are haploid tissues, and sequenced the PCR products directly.
The use of haploid tissue and of direct sequencing offers several advantages,
namely the possibility of direct identification of sequence haplotypes
(multilocus haploid genotypes) without the need for their statistical
reconstruction, the possibility of distinguishing real allelic polymorphism from
sequence variation between different gene family members (based on the
assumption that no polymorphism should be observed within any individual
haploid tissue), and the almost complete elimination of false SNPs due to
mutations introduced by Taq polymerase.
Panels of 12 endosperms
representative of different European spruce populations were used. The
sequences were aligned with alignment software, single point mutations were
identified, their frequencies were estimated, and the haplotypes were determined.
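The steps just described (locating single point mutations in an alignment, estimating their frequencies, and reading off haplotypes) can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in the study: the function names `find_snps` and `haplotypes` and the toy alignment are hypothetical, and it assumes the sequences are haploid, already aligned to equal length, and that columns with gaps or ambiguous bases are simply skipped.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def find_snps(aligned):
    """Return {position: Counter of alleles} for polymorphic columns.

    `aligned` is a list of equal-length haploid sequences. Columns that
    contain a gap or an ambiguous base are skipped (an assumption of
    this sketch, not a rule taken from the abstract).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    snps = {}
    for pos in range(length):
        column = [seq[pos].upper() for seq in aligned]
        if any(base not in VALID_BASES for base in column):
            continue
        counts = Counter(column)
        if len(counts) > 1:  # more than one allele: a candidate SNP
            snps[pos] = counts
    return snps

def haplotypes(aligned, snp_positions):
    """Count haplotypes, read directly from the haploid sequences."""
    return Counter(
        "".join(seq[p].upper() for p in snp_positions) for seq in aligned
    )

# Hypothetical toy alignment of four haploid (megagametophyte) sequences.
toy = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACATACGTAC",
    "ACATACGTTC",
]
snps = find_snps(toy)
haps = haplotypes(toy, sorted(snps))
for pos, counts in sorted(snps.items()):
    freqs = {base: n / len(toy) for base, n in counts.items()}
    print(pos, freqs)
print(dict(haps))
```

Because each megagametophyte is haploid, the haplotype is simply the string of alleles at the polymorphic positions, so no statistical phase reconstruction is needed in this sketch.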
Based on preliminary data
from 13 EST loci, the frequency of nucleotide changes appears to be high, with
an average of one SNP every 88 bases overall and one SNP every 30 bp for the
introns. These frequencies, which are more than an order of magnitude greater than
those observed in humans, appear to be even higher than those observed in
maize, which is commonly considered a species with extremely high levels of
variability. We will present data on additional loci, as well as estimates of
relative frequencies of transitions versus transversions, synonymous and
nonsynonymous substitutions, insertion/deletion events, different population
diversity parameters, and the number
and distribution of haplotypes. These data will be discussed in light of the
characteristics of spruce populations and of our findings on nucleotide
substitution rates in the Pinaceae as well as used to derive inferences on the
past population history. The possibilities for the development of SNP markers
for practical applications such as genetic mapping of traits using whole genome
linkage disequilibrium based methods or candidate gene association studies will
also be discussed.
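Two of the summary statistics mentioned above can be illustrated with a short Python sketch: classifying a substitution as a transition or a transversion, and computing nucleotide diversity (the average number of pairwise differences per site) as one example of a population diversity parameter. The abstract does not state which diversity parameters were estimated, so the choice of nucleotide diversity is an assumption, and the function names and toy alignment are hypothetical. The sketch also assumes gap-free, equal-length haploid sequences.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def substitution_type(a, b):
    """Classify a difference between two bases.

    Purine<->purine or pyrimidine<->pyrimidine changes are transitions;
    purine<->pyrimidine changes are transversions.
    """
    a, b = a.upper(), b.upper()
    if a == b or a not in BASES or b not in BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def nucleotide_diversity(aligned):
    """Average pairwise differences per site across all sequence pairs."""
    length = len(aligned[0])
    pairs = list(combinations(aligned, 2))
    total_diffs = sum(
        sum(x != y for x, y in zip(s, t)) for s, t in pairs
    )
    return total_diffs / (len(pairs) * length)

# Hypothetical toy alignment of four haploid sequences (not study data).
toy = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACATACGTAC",
    "ACATACGTTC",
]
pi = nucleotide_diversity(toy)
print(substitution_type("G", "A"))  # the G/A site in the toy alignment
print(substitution_type("A", "T"))  # the A/T site in the toy alignment
print(round(pi, 4))
```

In the toy alignment the six sequence pairs differ at 7 sites in total over 10 aligned positions, so the per-site diversity is 7/60.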