DK3133_book.fm Page 242 Tuesday, April 12, 2005 4:01 PM
242
Zhan and McDonald
tions of population genetic models (Brown, 1996). Empirical studies of fungal population
genetics have lagged behind those for other eukaryotes because of the lack of morphological genetic markers. The advent of molecular marker technologies in the last 20 years
has made empirical studies of fungal population genetics possible. Molecular marker
technologies provide a large number of highly variable markers, making it possible to
rapidly differentiate almost all levels of genetic variation in fungal populations. Correct
and efficient detection of genetic variation also requires that sample sizes are large enough
and sampling strategies are appropriate (McDonald, 1997), to ensure that fungal samples
included in the analysis are a fair reflection of true populations.
However, describing population genetic structure is only the initial step for population genetic studies of fungi (McDonald, 1997). The ultimate goals are to understand the
evolutionary history of a population, to reveal the processes and mechanisms through
which evolutionary changes have occurred, and to predict the evolutionary potential of
fungal species by scrutinizing the evolutionary signals recorded in the current population
genetic structure. These goals cannot be achieved solely through direct observation of
fungal populations and must rely on rigorous statistical inference. The ready availability
of powerful desktop computers and sophisticated analytical software has made it increasingly easy to compute population genetic parameters and develop complex evolutionary
models. But the choice of appropriate analytical tools is an important issue for all population geneticists. All analytical approaches developed for estimating population and evolutionary parameters have advantages and disadvantages, depending on specific genetic
assumptions. The choice of analytical methods should be based on the genetic properties
of the marker system and the questions being addressed. Choosing an inappropriate
analytical method could lead to false inferences regarding population genetic parameters
and the evolutionary history of fungal species or populations.
In this chapter, we will focus our discussion on the properties of mathematical and
statistical approaches currently used by mycologists and plant pathologists to estimate
population genetic structure and to infer migration and recombination of fungal species.
We also point out some problems associated with these mathematical and statistical
methods, as well as some solutions. Analytical methods for inferring mutation rates,
effective population sizes, and natural selection are not covered, as they are not the focus
of current population genetics studies in fungi. Issues relating to the choice of genetic
markers and sampling strategies have already been thoroughly discussed in other reviews
(e.g., Brown, 1996; McDonald, 1997). To limit the length of this discussion, we will keep
the use of mathematical formulae to a minimum. Interested readers are advised to consult
the cited publications and textbooks (e.g., Weir, 1996) for additional details.
12.2 EVOLUTIONARY FORCES DETERMINING POPULATION DYNAMICS OF FUNGI
12.2.1 Mutation
Mutation is the ultimate source of genetic variation, creating new DNA sequence variation
upon which other evolutionary forces act (Freeman and Herron, 2001). Mutation results
from base pair substitutions, small or large deletions, transposition events, and structural
alteration of chromosomes. Mathematical models show that in a finite population, the
probability that an allele will be lost after one generation of reproduction is $(1 - f_i)^{2N_e}$ for
diploid fungi and $(1 - f_i)^{N_e}$ for haploid fungi, where $f_i$ is the frequency of allele i and $N_e$
is the effective population size. Thus, the ancestral lineage of any allele can be lost through
random genetic drift in each generation of reproduction. After many generations, all
Methods for Estimating Population Genetic Structure of Fungi
ancestral alleles except one will eventually disappear from the population. Therefore, in
the absence of mutation, all populations will eventually become fixed for the same
allele/trait and no further evolution will occur for this gene.
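The loss probability quoted above is easy to evaluate numerically. The following is a minimal sketch; the function name and the example numbers are illustrative assumptions, not taken from the chapter.

```python
def prob_allele_lost(freq, ne, ploidy=2):
    """Probability that an allele at frequency `freq` is absent after one generation.

    Implements (1 - f_i)^(ploidy * Ne): 2*Ne gene copies are drawn for
    diploids and Ne gene copies for haploids.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    return (1.0 - freq) ** (ploidy * ne)


# Hypothetical example: an allele present as a single copy (frequency 1/200)
# in a diploid population with Ne = 100.
p_single_copy = prob_allele_lost(1 / 200, ne=100, ploidy=2)
```

With these assumed numbers the single-copy allele has a chance of a little over one in three of disappearing in the very next generation, which illustrates how quickly drift alone can remove rare alleles.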
Although mutation creates new alleles in fungal populations, the initial frequencies
of the new mutants are usually very low. In the infinite allele model, the initial frequency
for a new mutant allele is $1/(2N_e)$ for diploid fungi and $1/N_e$ for haploid fungi, as this model
assumes each allele emerges through a unique mutation. The frequency of a new mutant
can be increased through recurrent mutation, but this is a slow process. For example, at
least 10,000 generations are needed for recurrent mutation alone to raise a mutant allele to a
frequency of 1%, assuming a mutation rate of $10^{-6}$ per generation and no reversion. Thus, the principal
forces determining whether a mutant persists in a fungal population are natural selection
and effective population size.
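The figure of roughly 10,000 generations can be checked with a short calculation. The sketch below assumes the usual one-way mutation recursion, $p_t = 1 - (1 - p_0)(1 - u)^t$, with no selection, drift, or reversion; the recursion and the function name are assumptions added here, as the chapter does not state them.

```python
import math


def generations_to_reach(target_freq, mutation_rate, start_freq=0.0):
    """Generations for one-way recurrent mutation to lift an allele to `target_freq`.

    Solves p_t = 1 - (1 - p_0) * (1 - u)**t for t.
    """
    return (math.log((1.0 - target_freq) / (1.0 - start_freq))
            / math.log1p(-mutation_rate))


t_one_percent = generations_to_reach(0.01, 1e-6)
```

Under these assumptions the answer is slightly above 10,000 generations, in line with the statement in the text.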
12.2.2 Migration
Migration (gene flow), here defined as the exchange of genetic information among geographic populations through the movement of gametes or individuals from one place to
another (Slatkin, 1987), plays many roles in the population genetic structure and evolution
of fungal species. First, like mutation, migration serves as a source of genetic variation.
Because mutation is usually a rare event, a specific mutant allele may arise in one fungal
population but not in others. Migration increases the genetic variation of local fungal
populations by introducing novel alleles and allele combinations from neighboring populations. Evolutionary theory suggests that geographic isolation is one of the main steps
leading to speciation. Regular migration acts as a constraining force on evolution by
homogenizing the gene frequencies among geographically separated fungal populations,
thus preventing the accumulation of population differences that could lead to speciation.
On the other hand, occasional exchange of migrants can accelerate the evolutionary process
by spreading superior genes or gene combinations to neighboring populations (Lande,
1984). In agricultural systems, migration is considered a critical factor for promoting
pathogen evolution by spreading new virulence and fungicide resistance alleles among
fungal populations across a large geographical area (McDonald and Linde, 2002).
The agents of migration in fungi can be mycelium, sexual spores, or asexual spores,
as well as fertilizing agents such as microconidia and spermatia (Burnett, 2003) carried
by wind, water, or insects. The principal determinants affecting migration rate of a fungal
species are the type of migration unit and the mechanism used for dispersal. Fungal species
dispersed by rain splash or free-water movement usually have lower potential for long-distance migration than fungal species carried by wind or insect vectors. For some fungi,
such as rusts, migration units (spores) can be carried by air currents for thousands of
kilometers (Watson and de Sousa, 1983), leading to intercontinental migration. Human
activities, such as movement of contaminated machinery and international travel or trade,
might also have a substantial impact on migration for many fungi.
12.2.3 Mating Systems
Fungi exhibit a wide array of recombination strategies. Recombination in fungi can occur
through both meiosis and mitosis. Meiotic recombination usually involves the fusion of
genomes donated by two opposite mating types during sexual reproduction while mitotic
recombination does not involve sex. Heterokaryosis and parasexuality may provide additional mechanisms of recombination for fungal species (Burnett, 2003). Mating systems
affect recombination rates in populations and play a key role in the dissipation of linkage
disequilibrium. Populations exhibiting little or no recombination are expected to show a
significant degree of nonrandom association among unlinked alleles. In contrast, large
recombining populations are expected to exhibit random associations among neutral loci
as a result of the reassortment of unlinked genes during meiosis (Hartl and Clark, 1997).
When coupled with selection, recombination can have a large impact on genetic variation
(Braverman et al., 1995; Nordborg et al., 1996) and population subdivision (Felsenstein,
1974; Zhan and McDonald, 2003). First, recombination increases genetic variation in local
populations by creating new genotypes through reshuffling existing genes. Even a small
number of outcrossing events can generate a significant amount of genotype variation. In
plants, for example, Kannenberg and Allard (1967) showed that a single outcross resulted
in a significant number of new genotypes. Second, recombination can increase genetic
variation by creating new alleles through intragenic rearrangement. Third, recombination
increases genetic variation of local populations (Braverman et al., 1995; Nordborg et al.,
1996) and decreases the level of population subdivision (Felsenstein, 1974; Zhan and
McDonald, 2003) by limiting the effects of hitchhiking or background selection on linked
alleles at other loci.
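The dissipation of linkage disequilibrium mentioned above can be illustrated with the textbook recursion $D_t = D_0 (1 - r)^t$, where r is the recombination fraction between two loci (0.5 for unlinked loci). This is a sketch under the assumption of a large, randomly mating population; the recursion is a standard result and is not given in the chapter itself.

```python
def ld_after(d0, recombination_fraction, generations):
    """Linkage disequilibrium remaining after `generations` of random mating."""
    return d0 * (1.0 - recombination_fraction) ** generations


# Unlinked loci (r = 0.5): the association halves every sexual generation.
unlinked = [ld_after(0.25, 0.5, t) for t in range(4)]

# Strictly clonal reproduction can be mimicked with r = 0: no decay at all.
clonal = ld_after(0.25, 0.0, 100)
```

The contrast between the two cases mirrors the expectation that populations with little or no recombination retain nonrandom associations among unlinked alleles.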
12.2.4 Natural Selection
The targets of natural selection are phenotypes, not genotypes. For selection to occur in
natural populations, three conditions must be satisfied. First, individuals in natural populations must differ in their ability to survive and reproduce. Second, more offspring
must be produced than can survive and reproduce. Third, phenotypes with higher fitness
are present in excess at reproductive age and, therefore, overcontribute to the gene pool
of the following generations.
While the conditions needed for selection to occur appear simple, the effects of
natural selection on the genetic dynamics of fungal populations are complex. Natural
selection can increase or decrease the level of genetic variation and population differentiation, depending on the types of natural selection operating on natural populations (Hartl
and Clark, 1997). Directional selection favoring the same phenotypes in populations
inhabiting different environments may lead to a rapid decrease in genetic variation of
fungal populations while homogenizing allele frequencies among geographically isolated
populations. On the other hand, balancing selection tends to increase the amount of genetic
variation either through selection for rare alleles (frequency-dependent selection) or by
overdominance (heterozygote advantage) in diploids or dikaryotic fungi. Divergent selection for different phenotypes among genetically isolated fungal populations may also lead
to population subdivision.
Selection may play a more significant role in the population genetic dynamics of
plant pathogenic fungi (Leung et al., 1993), primarily due to cultural practices, such as
monoculture, frequent changes in host populations, and intensive applications of fungicides
in agricultural ecosystems. Directional selection for corresponding virulent or fungicide-resistant mutants in a fungal population can rapidly change the genetic structure of fungal
populations, causing the loss of effectiveness of new resistance genes or new fungicides
in just a few years. Natural selection is expected to be more efficient at changing allele
frequencies in haploid fungi than in diploid or dikaryotic fungi.
12.2.5 Genetic Drift
Genetic drift shares many evolutionary features with natural selection. Both directional
selection and genetic drift are the principal evolutionary forces leading to fixation of rare
mutant alleles. And both processes can purge genetic variation in local fungal populations
and generate differentiation among subpopulations. However, genetic drift is a stochastic process
that affects entire genomes, while natural selection is a deterministic process that affects
only the loci it acts on directly and loci strongly associated with them.
No natural populations are exempt from genetic drift. The degree of genetic drift in
a fungal population is determined by its effective size. Genetic drift is more severe in
small than in large populations. Thus, fungal populations with large effective sizes tend
to have higher genetic variation, as more alleles can emerge through mutation and fewer
alleles will be lost due to random drift (Kimura, 1983). In agricultural ecosystems, genetic
drift is expected to play a more important role in the spatial and temporal genetic structure
of fungal pathogen populations due to large fluctuations in fungal population sizes associated with epidemic cycles.
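The dependence of drift on effective size can be made concrete with the standard Wright–Fisher expectation for neutral gene diversity in the absence of mutation, $H_t = H_0 (1 - 1/(cN_e))^t$, with c = 2 for diploids and c = 1 for haploids. The formula is a textbook result assumed here for illustration; it does not appear in the chapter, and the numbers below are hypothetical.

```python
def expected_diversity(h0, ne, generations, ploidy=2):
    """Expected gene diversity after drift alone (no mutation, migration, selection)."""
    return h0 * (1.0 - 1.0 / (ploidy * ne)) ** generations


# Hypothetical comparison: 50 generations in a small versus a large population.
small_population = expected_diversity(0.5, ne=10, generations=50)
large_population = expected_diversity(0.5, ne=1000, generations=50)
```

Under these assumptions the small population retains only a small fraction of its initial diversity, whereas the large population remains almost unchanged.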
In small or repeatedly bottlenecked populations, genetic drift may cause accumulation or random fixation of deleterious mutations, resulting in a decline in overall population
fitness. Such fitness declines can further reduce effective population size and accelerate
the accumulation of additional mutations, leading to extinction of a species or population.
12.3 MATHEMATICAL AND STATISTICAL METHODS TO ESTIMATE POPULATION AND EVOLUTIONARY PARAMETERS
12.3.1 Genetic Variation
In population genetic studies of fungi with mixed sexual and asexual reproduction, both
gene variation and genotype variation must be considered. Genotype here refers to genetically distinct individuals, often defined in fungi based on unique DNA fingerprints. The
amount of gene variation in fungal populations is measured by the numbers and frequencies
of alleles at individual loci and is affected by population age, migration, mating system,
population size, and natural selection. For example, old populations tend to have higher
amounts of gene variation due to the accumulation of mutations over time. On the other
hand, the amount of genotype variation in a fungal population is affected mainly by mating
and reproductive systems. Populations exhibiting largely asexual reproduction tend to have
low genotype variation arrayed as a limited number of clonal lineages, whereas sexual
populations are expected to exhibit a high degree of genotype variation. Genotype variation
is an important population parameter only for organisms with clonal reproduction or
facultative sexual reproduction, such as fungi. In obligate sexual organisms such as mammals, each individual (except identical twins) in a population has a unique genotype, and
genotypic diversity should always be at its theoretical maximum.
12.3.1.1 Estimating Gene Variation
Several approaches have been used to measure and compare the amount of gene variation
in fungal populations. One approach is to measure the proportion of polymorphic loci
across the genome (e.g., Carter et al., 2001; Lewinsohn et al., 2001; Chen et al., 2002;
Plante et al., 2002; Flier et al., 2003). Polymorphic loci are defined as those where the
most common allele is present at a frequency of less than 0.95 or 0.99 (Hartl and Clark,
1997), depending on the preference of the researchers. Previous studies indicate that the
distribution of genetic variation among loci for most organisms follows an L shape, where
the majority of loci in the genome show little or no variation (Nei, 1987). This implies
that to ensure 95% probability of detecting polymorphism for a less variable locus, at
least ~30 diploid individuals should be assayed for the 0.95 criterion of polymorphism
and ~150 for the 0.99 criterion. These numbers increase to ~60 and ~300, respectively,
for haploid fungi. Many sample sizes used in studies of fungal population genetics are
much smaller than these expectations and, thus, are likely to underestimate the level of
polymorphism.
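One way to arrive at sample sizes of this order is sketched below. It assumes a worst-case locus whose most common allele sits exactly at the criterion frequency q, so that a sample of m gene copies appears monomorphic with probability $q^m$; this reading is an assumption, as the chapter does not show the derivation.

```python
import math


def min_gene_copies(criterion, power=0.95):
    """Smallest number of gene copies m satisfying 1 - criterion**m >= power."""
    return math.ceil(math.log(1.0 - power) / math.log(criterion))


def min_individuals(criterion, ploidy, power=0.95):
    """Individuals needed when each contributes `ploidy` gene copies."""
    return math.ceil(min_gene_copies(criterion, power) / ploidy)


sample_sizes = {
    (criterion, ploidy): min_individuals(criterion, ploidy)
    for criterion in (0.95, 0.99)
    for ploidy in (1, 2)
}
```

The resulting values (30 and 150 diploid individuals, 59 and 299 haploid individuals) agree with the approximate numbers given in the text.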
A second approach is to measure allelic richness, i.e., the mean number of alleles
at each locus (e.g., Elias and Schneider, 1992; Boisselier-Dubayle et al., 1996; Huss, 1996;
Purwantara et al., 2000; Johannesson et al., 2001a; Zhan et al., 2003). When calculating
allelic richness of populations, all loci should be taken into account, including monomorphic loci. Measures of allelic richness should only be applied to data generated using
multiallelic markers such as restriction fragment length polymorphisms (RFLPs) and
microsatellites. For data generated with biallelic polymerase chain reaction (PCR)-based
markers, for example, amplified fragment length polymorphism (AFLP) and random
amplified polymorphic DNA (RAPD), measures of allele richness are limited because
these markers have a maximum of only two possible alleles. Both theoretical and experimental studies have shown that allelic richness is very sensitive to sample sizes (Nei et
al., 1975; Leberg, 1992; Spencer et al., 2000), especially for highly variable loci. In the
long term, allelic richness is thought to be a more accurate reflection of evolutionary
potential of populations than other measures of gene variation (e.g., Petit et al., 1998).
A third approach is to measure the amount of heterozygosity (e.g., Hantula et al.,
1998; Vainio et al., 1998; Hogberg and Stenlid, 1999; Smith and Sivasithamparam, 2000;
Johannesson et al., 2001b). The amount of heterozygosity is often presented for a single
locus or as an average across several loci and can only be used for diploid or dikaryotic fungi.
The fourth and most widely used approach to measure gene variation in fungal
populations, however, is Nei’s (1973) gene diversity (e.g., Goodwin et al., 1994; Leuchtmann and Schardl, 1998; Rosewich et al., 1999; Pimentel et al., 2000; Gagne et al., 2001;
Douhan et al., 2002; Mahuku et al., 2002; Zhan et al., 2003). Nei’s gene diversity is a
joint function of allele richness and allele frequency. It measures the potential amount of
heterozygosity in random mating diploid populations. Nei’s gene diversity can be used to
measure gene variation for any organism as long as allele frequencies can be determined.
When estimating gene diversity of a fungal population, data from several loci should be
averaged, including monomorphic loci. Like allelic richness, measures of gene diversity
are affected by sample sizes, but are less sensitive than allelic richness (Nei et al., 1975;
Leberg, 1992; Spencer et al., 2000).
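Allelic richness and Nei's gene diversity, as described above, can be computed from allele counts in a few lines. The sketch below uses the uncorrected form $1 - \sum_i p_i^2$ and hypothetical counts; the function names are illustrative, and small-sample corrections are deliberately left out.

```python
def gene_diversity(counts):
    """Nei's gene diversity, 1 - sum(p_i^2), from allele counts at one locus."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def mean_gene_diversity(loci):
    """Average over all loci, monomorphic loci included."""
    return sum(gene_diversity(counts) for counts in loci) / len(loci)


def allelic_richness(loci):
    """Mean number of alleles per locus; a monomorphic locus counts as 1."""
    return sum(sum(1 for c in counts if c > 0) for counts in loci) / len(loci)


# Hypothetical allele counts: a biallelic locus, a monomorphic locus,
# and a locus with four equally frequent alleles.
loci = [[12, 8], [20], [5, 5, 5, 5]]
diversity = mean_gene_diversity(loci)
richness = allelic_richness(loci)
```

Dropping the monomorphic locus from `loci` raises both averages, which is why excluding such loci inflates estimates of gene variation.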
After gene variation has been calculated for a fungal population, it is common to
compare it with the gene variation of other populations and even other fungal species,
quantitatively or qualitatively. Results from these types of comparisons are routinely used
as arguments to support hypotheses of genetic drift or natural selection, as well as to
indicate the evolutionary potential of a particular fungal population or species. When
comparing the amount of gene variation in fungal populations, several parameters have
to be considered, especially when comparing data from different laboratories, including:
1. What types of genetic markers were used. The types of genetic markers chosen
have a strong impact on the estimate of gene variation. The maximum genetic
variation, in terms of Nei’s gene diversity, is 1 – 1/k, where k is the number of
alleles. For data generated with biallelic PCR markers, such as RAPD and AFLP,
the maximum number of alleles is two and the maximum value of Nei’s gene
diversity is 0.50. When data are based on multiallelic markers, such as RFLPs
and microsatellites, the number of alleles at one locus can be very large and
gene diversity can be close to the theoretical maximum of 1.0. Therefore,
comparisons of gene variation can be largely meaningless if different genetic
marker systems are used.
2. Whether monomorphic loci are included. In many population genetic studies
of fungi, researchers have considered only data from polymorphic loci. For
RFLP and microsatellite markers, these loci are chosen randomly from a
genomic library during a prescreening procedure and may represent the most
variable parts of the genome. Data from less variable (monomorphic) loci are
either discarded or are not included in surveys. This exclusion of monomorphic
loci not only inflates measures of genetic variation of fungal species, but also
makes comparisons from different laboratories difficult or impossible.
3. Whether sample sizes are the same. Unequal sample sizes are commonly
encountered in population surveys, even in data sets generated from the same
laboratory. As all four measures of gene variation are affected by sample size,
quantitative comparisons of gene variation among fungal populations should be
based on standardized sample sizes. Although some researchers have noted the
relationship between gene variation and sample sizes, the implications of this
relationship are apparently not widely understood, and the majority of researchers have not attempted to correct for differences in sample size. Several strategies
can be used to correct for differences in sample size among populations (Leberg,
2002). In our laboratory, we have used three methods to make this correction:
a. First, we have standardized different fungal samples with θ ($4N_e u$ for diploid
organisms and $2N_e u$ for haploid organisms), the expected number of new
alleles emerging through mutation each generation (Zhan et al., 2001). The
θ of a fungal population was estimated from the observed number of alleles
in a sample using the sampling theory developed by Ewens (1972). This
strategy does not compare allelic richness directly. Rather, it compares the
potential of generating new mutants under an infinite neutral allele model.
By assuming constant mutation rates across populations, this strategy can
also be used to compare effective population sizes among populations. A
drawback of this strategy is that it requires large sample sizes to make a
reasonable estimate.
b. A second strategy is to standardize populations with resampling by taking
many random subsamples of the same size from the original data set of each
population and calculating gene variation as the mean value of the resamples
(Zhan et al., 2003). In a global survey of the wheat pathogen Mycosphaerella
graminicola, we found that populations from the Middle East displayed the
highest amount of gene variation and we hypothesized that this region is the
center of origin of this fungus. However, initially we did not know whether
differences in gene variation among regional populations reflected differences
in sample sizes or whether the differences were statistically significant. We
used the resampling strategy to distinguish between these possibilities. For
each resampling replication, the total number of alleles and their frequencies
were recorded from a random sample of the same size as the smallest population and the gene diversity was calculated. This procedure was repeated
100 times. The mean and variance for gene diversity and allele number in
each population were calculated and used for a t-test. When using random
resampling strategies to standardize sample size, it is important to standardize
them according to the population with the smallest sample sizes. Reducing
the larger samples to the mean of all populations, for instance, would reduce
but not eliminate bias. The advantage of this procedure is that it provides a
statistical test for the difference while standardizing sample sizes.
c. A third method is to standardize estimates of allelic richness (Zhan and
McDonald, 2003) with a sample coverage method by estimating the number
of undetected alleles according to sample sizes and the frequency distribution
of alleles (Huang and Weir, 2001). This method estimates the total number
of alleles in a population regardless of sample size and is a good indicator
of true gene variation.
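The first two standardization strategies above can be sketched as follows. For θ, the sketch assumes the estimate is obtained by equating the observed number of alleles k to its expectation under Ewens's (1972) sampling theory, $E[k] = \sum_{i=0}^{n-1} \theta/(\theta + i)$, and solving numerically. For resampling, it assumes subsamples are drawn without replacement. Both choices, the function names, and the data are illustrative assumptions rather than the authors' actual procedure.

```python
import random
from collections import Counter
from statistics import mean, variance


def expected_alleles(theta, n):
    """Expected number of alleles in a sample of n gene copies (Ewens, 1972)."""
    return sum(theta / (theta + i) for i in range(n))


def estimate_theta(k_observed, n, tol=1e-9):
    """Solve expected_alleles(theta, n) = k_observed for theta by bisection."""
    if not 1 < k_observed < n:
        raise ValueError("need 1 < k_observed < n for a finite, positive estimate")
    lo, hi = 1e-12, 1.0
    while expected_alleles(hi, n) < k_observed:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_alleles(mid, n) < k_observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def resample_diversity(alleles, size, replicates=100, seed=0):
    """Mean and variance of gene diversity and allele number over random subsamples."""
    rng = random.Random(seed)
    diversities, allele_numbers = [], []
    for _ in range(replicates):
        counts = Counter(rng.sample(alleles, size))  # drawn without replacement
        diversities.append(1.0 - sum((c / size) ** 2 for c in counts.values()))
        allele_numbers.append(len(counts))
    return {
        "diversity": (mean(diversities), variance(diversities)),
        "alleles": (mean(allele_numbers), variance(allele_numbers)),
    }


# Hypothetical single-locus data for two populations of unequal sample size.
large = ["a"] * 40 + ["b"] * 30 + ["c"] * 20 + ["d"] * 10
small = ["a"] * 15 + ["b"] * 10 + ["c"] * 5
smallest = min(len(large), len(small))
summary = {name: resample_diversity(pop, smallest)
           for name, pop in (("large", large), ("small", small))}
```

Every population is reduced to the size of the smallest sample, as recommended above, and the returned means and variances are the quantities that would feed into a t-test.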
12.3.1.2 Estimating Genotype Variation
Many methods have been used to estimate genotypic diversity in fungal populations. These
methods can be grouped roughly into three categories. As explained below, all of these
have some inherent problems, and a better estimator of genotype diversity would represent
a significant advance for fungal population studies.
The first category is based on Nei’s measure of gene diversity (Nei, 1973) and its
analogs. This approach has two problems. The first is that Nei’s gene diversity measures
the expected degree of heterozygosity assuming populations are random mating. As mentioned earlier, genotypic diversity in obligate sexual organisms should always be at its
maximum. Any departure from this expectation can be due to nonrandom mating or asexual
reproduction. Therefore, Nei’s gene diversity is not a good measure of genotypic diversity.
When we compared different analytical methods, we found that Nei’s method usually gave
the highest estimates of genotypic diversity (J.Z., unpublished results), suggesting that
this method biases the estimates upward. The second problem is that heterozygosity
measures only a part of genotype diversity (Gregorius, 1978). Homozygosity does not
necessarily decrease the amount of genotypic diversity in populations. For example, in
populations undergoing obligate sexual reproduction, all individuals usually have distinct
multilocus genotypes and contribute equally to the level of genotype variation, regardless
of whether they are homozygous or heterozygous for particular loci.
The second category is represented by Stoddart and Taylor (1980) using Kimura
and Crow’s (1964) effective number of alleles as an index of genotypic diversity. This
index of genotypic diversity has a minimum value of one and maximum value of n, where
n is the sample size. This measure is strongly associated with sample size, making
comparisons among populations of unequal sample size impossible. This problem can be
circumvented by normalizing estimated genotypic diversity against sample size, resulting
in the percentage of the theoretical maximum (Chen et al., 1994), but normalizing is
thought to work well only when populations have high diversity (Grünwald et al., 2003).
Furthermore, the index of Stoddart and Taylor (1980) emphasizes the most common
genotypes contributing to the genotypic diversity of populations while underrepresenting
the contribution of genotypes that are rare.
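The index of Stoddart and Taylor described above is commonly written as $G = 1/\sum_i p_i^2$, where $p_i$ is the frequency of the ith multilocus genotype. The sketch below assumes that form and uses hypothetical genotype counts to show the range from 1 to n and the normalization by sample size.

```python
def stoddart_taylor_g(genotype_counts):
    """Genotypic diversity G = 1 / sum(p_i^2) from counts of each genotype."""
    n = sum(genotype_counts)
    return 1.0 / sum((c / n) ** 2 for c in genotype_counts)


def normalized_g(genotype_counts):
    """G expressed as a fraction of its theoretical maximum, the sample size n."""
    return stoddart_taylor_g(genotype_counts) / sum(genotype_counts)


clonal_sample = [20]                  # one genotype sampled 20 times
sexual_sample = [1] * 20              # every isolate has a unique genotype
mixed_sample = [17, 1, 1, 1]          # one dominant clone plus three rare genotypes

g_values = [stoddart_taylor_g(s) for s in (clonal_sample, sexual_sample, mixed_sample)]
```

In the mixed sample the three rare genotypes barely move G away from 1, which illustrates how the index is dominated by the most common genotypes.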
The third category is based on the Shannon–Wiener index (Shannon and Weaver,
1949; Muller et al., 2002; Sigler and Turco, 2002), also called the Shannon (Steffenson
and Webster, 1992; Kolmer 1993; Purwantara et al., 2000) or Shannon–Weaver (e.g.,
Burdon and Jarosz, 1992; Meijer et al., 1994; Garbeva et al., 2003) index. Because the
first version of the Shannon–Wiener index was published in 1948 (Shannon, 1948), a year
before publication of the book The Mathematical Theory of Communication, coauthored
by Shannon and Weaver (1949), and is built upon the work of Wiener (1948), some
researchers (e.g., Spellerberg and Fedor, 2003) are against using the names Shannon or
Shannon–Weaver index.
The Shannon–Wiener index, ranging in value from zero to infinity in theory, was
originally developed for telecommunication to estimate the average uncertainty in predicting letters of a message, but has been widely used in many fields, including geography
(e.g., Haines-Young and Chopping, 1996; Nagendra, 2002), ecology (e.g., Floder et al.,
2002; Stampe and Daehler, 2003), and population genetics (e.g., Drenth et al., 1993; Kwon
and Morden, 2002). This index emphasizes the measure of species (genotype) richness
(Hurlbert, 1971) and the importance of species or genotypes with a low or rare frequency
(Figure 12.1). This index is also affected by sample size and suffers problems similar to
the Stoddart and Taylor index when normalized according to sample size (Grünwald et
al., 2003). More discussion on the estimation of genotypic diversity and its associated
problems can be found in a recent publication (Grünwald et al., 2003).
Figure 12.1 The contribution of an event’s frequency (genotypes, species, etc.) to the estimate
of the Shannon–Wiener index. [Plot of −p ln(p) (y-axis, 0.00 to 0.40) against frequencies of events, e.g., genotypes (x-axis, 0.00 to 1.00).]
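The Shannon–Wiener index is the sum of the per-genotype terms $-p_i \ln(p_i)$. A minimal sketch with hypothetical counts is given below; it also evaluates the single-term contribution at a few frequencies to show where that contribution peaks.

```python
import math


def contribution(p):
    """Contribution -p * ln(p) of one genotype at frequency p (0 when p is 0)."""
    return -p * math.log(p) if p > 0 else 0.0


def shannon_wiener(genotype_counts):
    """Shannon-Wiener index, the sum of -p_i * ln(p_i) over genotypes."""
    n = sum(genotype_counts)
    return sum(contribution(c / n) for c in genotype_counts)


# The per-genotype contribution is largest at p = 1/e (about 0.37), so
# genotypes at low to intermediate frequencies add the most to the index.
peak = contribution(1 / math.e)
index_even = shannon_wiener([5, 5, 5, 5])     # four equally common genotypes
index_skewed = shannon_wiener([17, 1, 1, 1])  # one dominant clone
```

For a sample of n isolates the index cannot exceed ln(n), the value reached when every isolate has a distinct genotype, which is one reason it depends on sample size.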
12.3.2 Population Subdivision
When genetic data are available for several fungal populations, it is natural to ask what
degree of population subdivision exists among the populations surveyed. Population subdivision, here defined as the difference in allele frequencies among subpopulations, can
be generated by genetic drift and natural selection. In the absence of sufficient gene flow
among populations, stochastic changes in allele frequencies among fungal populations can
lead to random fixation of neutral alleles, leading to nonadaptive differentiation (Wright,
1938). On the other hand, divergent selection for various ecological or physiological
characters among genetically isolated fungal populations may lead to adaptive population
subdivision. Population subdivision caused by natural selection usually is limited to the
loci directly affected by natural selection and closely linked genes, unless fungal populations are dominated by asexual reproduction.
The simplest way to measure the magnitude of subdivision among fungal populations
is by testing for homogeneity in gene frequencies using two-dimensional contingency $\chi^2$
tables (e.g., Rosewich et al., 1998; James et al., 1999; McDonald et al., 1999; Tenzer and
Gessler, 1999; Salamati et al., 2000), such as introduced by Workman and Niswander
(1970). This type of chi-squared test has (r – 1) × (c – 1) degrees of freedom, where r
refers to the number of alleles at a locus and c refers to the number of fungal populations.
The p values generated by the contingency chi-squared test provide implicit statistical
information on whether the fungal populations compared differ in their allele frequencies
and at what levels. Furthermore, the sensitivity of the test increases with sample size.
Because small populations are more prone to sampling error, this property provides a way
to autocorrect the potential type I error associated with small samples.
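The contingency chi-squared test described above can be sketched for a single locus as follows. The table layout (one row per allele, one column per population) and the counts are hypothetical, and the sketch stops at the test statistic and its degrees of freedom rather than computing a p value.

```python
def contingency_chi2(table):
    """Chi-squared statistic and degrees of freedom for an alleles x populations table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under equal allele frequencies across populations.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts of two alleles (rows) in two populations (columns).
counts = [[10, 20], [20, 10]]
statistic, df = contingency_chi2(counts)
```

The statistic is then compared with the chi-squared distribution on `df` degrees of freedom; with one degree of freedom the 5% critical value is about 3.84, so this hypothetical table would reject homogeneity.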
Three problems may be encountered when using the contingency $\chi^2$ test. First, when
some alleles have small expected values under the hypothesis of equal frequencies across
populations, it is necessary to combine the less frequent alleles into one category. Otherwise, the resulting $\chi^2$ value may be too large, causing a false rejection of the null hypothesis.
Many statisticians recommend that $\chi^2$ should not be performed if the expected count in
the cells is less than five, but this criterion is hard to realize unless sample sizes are large
and loci are moderately variable. In our laboratory, we usually combine into a single
category alleles with frequencies lower than 0.05 for studies with large sample sizes (>50
individuals per population) and 0.10 for studies with small sample sizes (<30 individuals
per population). Second, a contingency $\chi^2$ test is conducted for each locus, and statistical
inference regarding population subdivision may not be consistent across randomly chosen
loci. In many experimental studies in fungal population genetics, it is common to find that
the null hypothesis of equal allele frequency across populations is rejected for one or two
loci, but not for other loci. This inconsistency could be expected if the loci chosen have
undergone different evolutionary processes. For example, some of the loci could be under
selection or linked to selected loci, while others are selectively neutral. In this case, further
tests for neutrality of suspect loci may be required. On the other hand, information from
several loci can be joined to give a combined χ2 value. This combined χ2 value has (c –
1) ∑ (ai – 1) degrees of freedom, where c is the number of fungal populations surveyed
and ai is the number of alleles at the ith locus. Third, when multiple fungal populations
are compared, rejection of the null hypothesis of equal frequencies indicates only that there
is some degree of genetic differentiation among the populations surveyed. The rejection
could be caused by one or a few extreme populations, and the contingency χ2 test would
not identify the responsible populations. To identify the extreme populations, more tedious
procedures such as pair-wise comparisons between fungal populations may be needed.
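The pooling of rare alleles and the combination of χ2 values across loci can be sketched as follows. This is an illustration under stated assumptions: allele frequency is taken here as the overall frequency across all sampled populations, the function names are hypothetical, and the counts are invented.

```python
def contingency_chi2(counts):
    """Chi-squared statistic and degrees of freedom for one locus.

    counts[i][j] = number of isolates in population i with allele j.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(col_totals) - 1) * (len(counts) - 1)


def pool_rare_alleles(counts, threshold=0.05):
    """Merge alleles whose overall frequency is below threshold into one category."""
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(col_totals)
    keep = [j for j, t in enumerate(col_totals) if t / grand_total >= threshold]
    rare = [j for j in range(len(col_totals)) if j not in keep]
    pooled = []
    for row in counts:
        new_row = [row[j] for j in keep]
        if rare:
            new_row.append(sum(row[j] for j in rare))  # single pooled category
        pooled.append(new_row)
    return pooled


def combined_chi2(loci):
    """Sum chi-squared values and degrees of freedom over loci.

    With c populations at every locus, the summed degrees of freedom equal
    (c - 1) * sum(a_i - 1), where a_i is the number of alleles at locus i.
    """
    total_chi2, total_df = 0.0, 0
    for counts in loci:
        chi2, df = contingency_chi2(counts)
        total_chi2 += chi2
        total_df += df
    return total_chi2, total_df


locus_1 = [[40, 8, 1, 1], [35, 12, 2, 1]]   # two rare alleles
locus_2 = [[30, 20], [20, 30]]
pooled_1 = pool_rare_alleles(locus_1, threshold=0.05)
total_chi2, total_df = combined_chi2([pooled_1, locus_2])
print(pooled_1, total_chi2, total_df)
```

Pair-wise comparisons between populations can reuse contingency_chi2 on two-row subsets of the same tables.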
Another approach widely used to measure subdivision among fungal populations is
the fixation index, FST and its analogs (e.g., Carlier et al., 1996; Milgroom et al., 1996;
Purwantara et al., 2000; Douhan et al., 2002; Garcia et al., 2002; Johannesson et al., 2001b;
Kauserud and Schumacher, 2003; LoBuglio and Taylor, 2002; Urbanelli et al., 2003). The
term fixation was first coined by Wright (1921) to quantify the reduction of heterozygosity
compared with a random mating population. The deficiency of heterozygosity in a population could result from nonrandom mating within local populations. It also could result
from differences in allele frequencies among populations due to cryptic population subdivision. If two subdivided populations are combined in an analysis, there often is an
excess in homozygosity (reduction of heterozygosity) called the Wahlund effect. The
degree of this excess reflects the amount of genetic differentiation between the two
subpopulations. The excess homozygosity in the combined population will reduce to that
expected after only one generation of random mating. This reduction in homozygosity
after one generation of random mating is called the Wahlund principle. Wright (1951)
used the formula (1 – FIT) = (1 – FIS)(1 – FST) to partition the fixation index of a subdivided
population, where FIT is the total reduction of heterozygosity (also called the inbreeding
coefficient) in subdivided populations, and FIS and FST are the lack of heterozygosity due
to nonrandom mating within local populations and to population subdivision, respectively.
FST has a value between 0 and 1. When FST equals one, the populations are completely
differentiated (fixed for different alleles), and when FST equals zero, the populations have identical allele frequencies.
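A small numerical illustration of the Wahlund effect and of Wright's partition may help. The sketch below assumes a diploid locus with two alleles, two equally sized subpopulations, and random mating within each; the heterozygosity symbols (h_i observed, h_s expected within subpopulations, h_t expected in the combined population) and the allele frequencies are assumptions introduced for the example.

```python
def f_statistics(h_i, h_s, h_t):
    """Fixation indices from observed and expected heterozygosities."""
    f_is = (h_s - h_i) / h_s   # deficit due to nonrandom mating within subpopulations
    f_st = (h_t - h_s) / h_t   # deficit due to subdivision
    f_it = (h_t - h_i) / h_t   # total deficit
    return f_is, f_st, f_it


# Frequency of one allele in each of two subpopulations (made-up values)
freqs = [0.2, 0.8]

# Mean expected heterozygosity within subpopulations
h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)

# Expected heterozygosity if the combined sample were one random-mating population
p_bar = sum(freqs) / len(freqs)
h_t = 2 * p_bar * (1 - p_bar)

# Random mating within subpopulations: observed equals expected within
h_i = h_s

f_is, f_st, f_it = f_statistics(h_i, h_s, h_t)
print(f_is, f_st, f_it)
print(1 - f_it, (1 - f_is) * (1 - f_st))  # the two sides of Wright's identity
```

Here the whole heterozygote deficit of the combined sample comes from subdivision, which is the Wahlund effect described above.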
One advantage of using F statistics to measure the level of population differentiation
is that FST also provides a convenient way to estimate gene flow, provided that populations
are in drift–migration equilibrium and mutation is negligible. However, the F statistics
also have a few deficiencies. First, an FST estimate carries no built-in measure of statistical significance. To test whether
an estimate is statistically significant, further analysis using a Fisher exact test or permutation is required. In practice, it is common to conclude that populations show little
differentiation if FST is less than 0.05, moderate differentiation if FST is between 0.05 and
0.15, large differentiation if FST is between 0.15 and 0.25, and very large differentiation
if FST is larger than 0.25, but these values are arbitrary. Second, though FST can be calculated
for any population if there are two alleles at one locus, in the presence of multiple alleles,
the formula (1 – FIT) = (1 – FIS)(1 – FST) is no longer valid unless there is no selection
(Nei, 1965). Third, FST can only be used for diploid organisms. For haploid fungi, no
measure of heterozygosity is possible. Fourth, FST does not take sample sizes and number
of subpopulations into account.
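The permutation approach mentioned above can be sketched as follows. This is an assumption-laden illustration rather than a prescribed procedure: it uses haploid allele labels, a simple diversity-based differentiation statistic in place of FST, and hypothetical function names and data. Individuals are reshuffled among populations to build the null distribution.

```python
import random


def gene_diversity(alleles):
    """1 - sum of squared allele frequencies in a list of allele labels."""
    n = len(alleles)
    counts = {}
    for a in alleles:
        counts[a] = counts.get(a, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def differentiation(pops):
    """(H_T - H_S) / H_T with H_S weighted by sample size (illustrative statistic)."""
    pooled = [a for p in pops for a in p]
    h_t = gene_diversity(pooled)
    if h_t == 0.0:
        return 0.0  # no variation at all, nothing to partition
    h_s = sum(len(p) * gene_diversity(p) for p in pops) / len(pooled)
    return (h_t - h_s) / h_t


def permutation_p_value(pops, n_perm=999, seed=0):
    """Observed statistic and the share of permutations at least as extreme."""
    rng = random.Random(seed)
    observed = differentiation(pops)
    pooled = [a for p in pops for a in p]
    sizes = [len(p) for p in pops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if differentiation(shuffled) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement as one of the permutations
    return observed, (hits + 1) / (n_perm + 1)


pop_a = ["A"] * 18 + ["B"] * 2
pop_b = ["A"] * 3 + ["B"] * 17
observed, p_value = permutation_p_value([pop_a, pop_b])
print(observed, p_value)
```

The smallest attainable p value is 1/(n_perm + 1), so the number of permutations sets the resolution of the test.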
To overcome these problems, several analogs of FST have been developed and widely
used to measure population subdivision. One of these analogs is GST (Nei, 1972, 1973).
GST is an averaged FST over alleles and pairs of populations. Nei (1973) stated that GST
can be applied to any populations without assumptions regarding the pattern of evolutionary forces, such as mutation, selection, and migration, and to sexual or asexual organisms
of any ploidy as long as allele frequencies can be estimated. Furthermore, GST can be used
to quantify population subdivision across many hierarchical levels as long as an appropriate
sampling strategy is used. Like FST, GST (Nei, 1972, 1973) carries no built-in measure of statistical
significance, and further analysis is required for statistical inference. It also does not
explicitly take into account differences in sample sizes and number of subpopulations (Nei
and Chesser, 1983).
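A minimal sketch of a GST calculation from allele frequencies is given below. It assumes the common gene-diversity form GST = (HT – HS)/HT, with HS the mean within-population gene diversity and HT the gene diversity computed from mean allele frequencies, each averaged over loci before taking the ratio. Populations are weighted equally and no correction for sample size is applied; the function names and frequencies are hypothetical.

```python
def gene_diversity(freqs):
    """1 - sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def g_st(loci):
    """G_ST over one or more loci.

    loci is a list of loci; each locus is a list with one allele-frequency
    list per population, alleles in the same order in every population.
    Assumes at least one locus is polymorphic (H_T > 0).
    """
    h_s_sum = 0.0
    h_t_sum = 0.0
    for pops in loci:
        n_pops = len(pops)
        # Mean within-population gene diversity at this locus
        h_s_sum += sum(gene_diversity(f) for f in pops) / n_pops
        # Gene diversity of the total, from unweighted mean frequencies
        mean_freqs = [sum(col) / n_pops for col in zip(*pops)]
        h_t_sum += gene_diversity(mean_freqs)
    h_s = h_s_sum / len(loci)
    h_t = h_t_sum / len(loci)
    return (h_t - h_s) / h_t


locus = [[0.2, 0.8], [0.8, 0.2]]   # two populations, two alleles (made-up)
print(g_st([locus]))
```

Because only allele frequencies are needed, the same function applies to haploid data, which is one reason GST is convenient for fungi.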
Another FST analog is θST, proposed by Cockerham (1969, 1973) and Weir and
Cockerham (1984). θST is estimated from each allele separately. When there are more than
two alleles present at a locus, information from the different alleles can be combined by
taking their geometric means (Weir and Cockerham, 1984). θST emphasizes the effects of
sample sizes and number of subpopulations on the estimation of population subdivision.
This statistic assumes that populations are random mating and have equal effective size.
Unlike GST, values of θST are not always positive. A negative θST value indicates that the
populations are so similar in gene frequencies that no population subdivision can be
detected, but the negative value cannot be used to estimate the amount of gene flow.
More recently, analysis of molecular variance (AMOVA) based on the conventional
theory of analysis of variance (Excoffier et al., 1992) has become widely used to study
spatial population structure of fungi (e.g., Hellgren and Hogberg, 1995; Hamelin, 1996;
Norden, 1997; Pimentel et al., 2000; Vainio and Hantula, 2000; Viji et al., 2001; Barrins
et al., 2002; Coates et al., 2002; Jamaux-Despreaux and Peros, 2003; Kauserud and
Schumacher, 2003; Samils et al., 2003; Urbanelli et al., 2003). This analysis produces
estimates of variance components by partitioning the total sum of the squared distances
between all pairs of haplotypes hierarchically and generates φ statistics. The method can
accommodate various types of genetic assumptions on the evolution of populations
(Excoffier et al., 1992) and can provide a statistical test on the significance of the differentiation index. But the method is computation-intensive. AMOVA relates all differences
among DNA haplotypes to step-wise mutation events and estimates evolutionary divergence (distance) based on the number of mutational steps between haplotypes. AMOVA
was developed for analysis of molecular data from nonrecombining regions, such as
mitochondrial genomes, and is probably valid for asexual fungal populations. Some
researchers have used AMOVA to analyze molecular data from recombining regions, thus
invalidating the fundamental assumptions of the method.
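The partitioning idea behind AMOVA can be illustrated for the simplest case of a single hierarchical level (haplotypes within populations). The sketch below is a simplified reading of the approach, not a replacement for dedicated software: haplotypes are strings of equal length, the number of differing positions is used as the squared distance between two haplotypes, and no permutation test is included. All names and data are assumptions.

```python
from itertools import combinations


def n_differences(h1, h2):
    """Number of positions at which two haplotypes differ."""
    return sum(a != b for a, b in zip(h1, h2))


def ssd(haplotypes):
    """Sum of squared deviations: pairwise distances summed, divided by sample size."""
    n = len(haplotypes)
    return sum(n_differences(a, b) for a, b in combinations(haplotypes, 2)) / n


def phi_st(pops):
    """One-level phi_ST from a list of populations (lists of haplotype strings).

    Assumes at least two populations and some variation among haplotypes.
    """
    pooled = [h for p in pops for h in p]
    n_total, n_pops = len(pooled), len(pops)

    ssd_total = ssd(pooled)
    ssd_within = sum(ssd(p) for p in pops)
    ssd_among = ssd_total - ssd_within

    # Mean squared deviations, as in a one-way analysis of variance
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)

    # Average sample size adjusted for unequal population sizes
    n0 = (n_total - sum(len(p) ** 2 for p in pops) / n_total) / (n_pops - 1)

    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    return var_among / (var_among + var_within)


pops = [["0000", "0001", "0000"], ["1111", "1110", "1111"]]
print(phi_st(pops))
```

As with other variance-component estimators, the value can be negative when populations are no more different than random groupings of the same haplotypes.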
Using F statistics (FST and its analogs) to quantify population subdivision assumes
a K allele or infinite allele mutation model. These models state that mutation events should
be rare and independent of the prior states of ancestral alleles so that any genetic similarity
among natural populations can be attributed to historical association or gene flow. These
assumptions may be reasonable for RFLP, AFLP, and RAPD loci, but they may not apply
to many microsatellite loci. Experimental studies from humans and other eukaryotic
species have shown that mutation rates are very high for many microsatellite loci (e.g.,
Udupa and Baum, 2001) and the size of new mutant alleles depends on the size of their
ancestral alleles (for a review, see Li et al., 2002). Therefore, FST and its analogs may not
be appropriate for estimating population subdivision from microsatellite data and may overestimate the actual level of genetic similarity among populations. For such data, Slatkin
(1995) proposed using RST, a quantity analogous to FST that takes into account
differences in allele lengths. Results from computer simulations showed that R statistics
performed better for microsatellite data than F statistics (Slatkin, 1995).
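The idea of using allele lengths rather than allele identities can be illustrated with a simplified variance-ratio calculation in the spirit of RST. This sketch assumes a single locus, allele sizes recorded as repeat counts, equal weighting of populations, and sample variances throughout; it is an illustration, not a reproduction of any published estimator, and the data are invented.

```python
from statistics import variance


def r_st(pops):
    """(V_total - V_within) / V_total for allele sizes at one locus.

    pops is a list of populations, each a list of allele sizes (repeat
    counts). Each population needs at least two alleles, and the pooled
    sample must show some size variation.
    """
    pooled = [size for p in pops for size in p]
    v_within = sum(variance(p) for p in pops) / len(pops)  # mean within-population variance
    v_total = variance(pooled)                             # variance in the pooled sample
    return (v_total - v_within) / v_total


# Two populations whose alleles differ by several repeat units (made-up)
pop_1 = [10, 10, 11, 11]
pop_2 = [14, 14, 15, 15]
print(r_st([pop_1, pop_2]))
```

Two populations with entirely different but similarly sized alleles would look fully differentiated under an identity-based measure, whereas a size-based measure treats them as close.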
Many researchers also use genetic distance to measure the similarity between fungal
populations (e.g., Weir et al., 1998; James et al., 1999; Six and Paine, 1999; Lakrod et
al., 2000; Valverde et al., 2000; Vandemark et al., 2000; Bock et al., 2002; Hurtado and
Ramstedt, 2002; Plante et al., 2002; Says-Lesage et al., 2002). There are several types of
genetic distance, but Nei’s genetic distance (1972, 1978) is most commonly used. Nei’s
genetic distance is defined as D = –ln I = –ln [JXY/(JX JY)^0.5], where I and JXY are the
normalized genetic identity and the probability of identity by descent between populations
X and Y, and JX and JY are the probability of identity by descent within each population,
respectively. This quantity is similar to GST in many ways. It measures the accumulated
nucleotide substitutions per locus between populations. Because genetic distance is thought
to be linearly related to the time since populations diverged from the same ancestor, the
advantage of this quantity is that it can be used to calculate divergence time between two
fungal populations, provided that all assumptions for GST are satisfied and the rate of
nucleotide substitution is constant over time. Furthermore, although Nei’s genetic distance
has been used primarily to measure genetic similarity of populations within species, it can
also be used to measure similarity between distantly related species (Nei, 1972).
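For a single locus, the definition above can be written out directly. In this sketch JX and JY are computed as sums of squared allele frequencies within each population and JXY as the sum of products of the corresponding frequencies in the two populations; the function name and the frequencies are illustrative assumptions.

```python
import math


def nei_distance(x, y):
    """Nei's standard genetic distance at one locus.

    x and y are allele-frequency lists for populations X and Y, with
    alleles in the same order in both.
    """
    j_x = sum(p * p for p in x)
    j_y = sum(q * q for q in y)
    j_xy = sum(p * q for p, q in zip(x, y))
    if j_xy == 0.0:
        return math.inf  # no shared alleles: identity is zero
    identity = j_xy / math.sqrt(j_x * j_y)   # normalized identity I
    return -math.log(identity)


x = [0.9, 0.1]
y = [0.6, 0.4]
print(nei_distance(x, y))
```

Identical frequency vectors give I = 1 and a distance of zero, while populations that share no alleles give I = 0 and an unbounded distance.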
To estimate genetic distance, data from several loci can be combined, based on either
the geometric mean of I or the arithmetic means of JXY, JX, and JY. How best to combine
loci depends on the level of genetic similarity between the compared fungal populations
and the evolutionary rates among the loci considered. Nei (1974) proposed using the
geometric mean of I to weight different loci if the evolutionary rates among loci were
different and genetic identity (I) between populations was high. Otherwise, the arithmetic
mean of the J values should be used. This recommendation appears to be largely ignored in
empirical studies of fungal population genetics, possibly because evolutionary
rates of the loci used are not available in most cases. Because many molecular analyses of
eukaryotic genomes have revealed that the evolutionary rates vary across genomes (Kimura,
1983; Kusumi et al., 2002), we believe that researchers should use the geometric mean to
weight genetic distance among loci in cases where evolutionary rates of loci are unknown
and the populations compared are highly similar. When data from different markers are
combined, it is important to include all loci, including monomorphic loci (Nei, 1972).
However, the majority of researchers, including ourselves (Zhan et al., 2003), have included
only polymorphic loci. Excluding monomorphic loci from analysis of genetic distance
undoubtedly overestimates the genetic differences between fungal populations.
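The two ways of combining loci, and the effect of leaving out monomorphic loci, can be sketched as follows. The example assumes one polymorphic locus and one locus fixed for the same allele in both populations; function names and frequencies are hypothetical, and the populations are assumed to share at least one allele at every locus so that no logarithm of zero arises.

```python
import math


def identities(x, y):
    """Return (J_XY, J_X, J_Y) for one locus from two allele-frequency lists."""
    j_x = sum(p * p for p in x)
    j_y = sum(q * q for q in y)
    j_xy = sum(p * q for p, q in zip(x, y))
    return j_xy, j_x, j_y


def distance_arithmetic(loci):
    """Distance from arithmetic means of J_XY, J_X, and J_Y over loci."""
    totals = [0.0, 0.0, 0.0]
    for x, y in loci:
        for k, value in enumerate(identities(x, y)):
            totals[k] += value
    j_xy, j_x, j_y = (t / len(loci) for t in totals)
    return -math.log(j_xy / math.sqrt(j_x * j_y))


def distance_geometric(loci):
    """Distance from the geometric mean of the per-locus identities I."""
    log_sum = 0.0
    for x, y in loci:
        j_xy, j_x, j_y = identities(x, y)
        log_sum += math.log(j_xy / math.sqrt(j_x * j_y))
    return -log_sum / len(loci)


polymorphic = ([0.9, 0.1], [0.6, 0.4])
monomorphic = ([1.0], [1.0])            # same allele fixed in both populations

d_poly_only = distance_geometric([polymorphic])
d_arith = distance_arithmetic([polymorphic, monomorphic])
d_geo = distance_geometric([polymorphic, monomorphic])
print(d_poly_only, d_arith, d_geo)
```

With either weighting, adding the monomorphic locus lowers the estimated distance, which shows how restricting the analysis to polymorphic loci inflates it.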
12.3.3 Inference of Evolutionary Forces
Migration, the amount of gene flow among fungal populations, is usually estimated
indirectly based on the distribution of allele frequencies among subpopulations. Currently,
two indirect methods are widely used to derive the amount of migration among fungal
populations. One of them is Wright’s FST statistic for estimating the standardized variance
of allele frequencies among local populations (Wright, 1951). Wright (1951) showed that
in an infinite island model, the level of population subdivision is a function of effective
population size and the average rates of migration among subpopulations. If the migration
rate is small and the effective population size large, the relation between population