
Chapter 12. Analytical and Experimental Methods for Estimating Population Genetic Structure of Fungi



DK3133_book.fm Page 242 Tuesday, April 12, 2005 4:01 PM






Zhan and McDonald



tions of population genetic models (Brown, 1996). Empirical studies of fungal population

genetics have lagged behind those for other eukaryotes because of the lack of morphological genetic markers. The advent of molecular marker technologies in the last 20 years

has made empirical studies of fungal population genetics possible. Molecular marker

technologies provide a large number of highly variable markers, making it possible to

rapidly differentiate almost all levels of genetic variation in fungal populations. Correct

and efficient detection of genetic variation also requires that sample sizes are large enough

and sampling strategies are appropriate (McDonald, 1997), to ensure that fungal samples

included in the analysis are a fair reflection of true populations.

However, describing population genetic structure is only the initial step for population genetic studies of fungi (McDonald, 1997). The ultimate goals are to understand the

evolutionary history of a population, to reveal the processes and mechanisms through

which evolutionary changes have occurred, and to predict the evolutionary potential of

fungal species by scrutinizing the evolutionary signals recorded in the current population

genetic structure. These goals cannot be achieved solely through direct observation of

fungal populations and must rely on rigorous statistical inference. The ready availability

of powerful desktop computers and sophisticated analytical software has made it increasingly easy to compute population genetic parameters and develop complex evolutionary

models. But the choice of appropriate analytical tools is an important issue for all population geneticists. All analytical approaches developed for estimating population and evolutionary parameters have advantages and disadvantages, depending on specific genetic

assumptions. The choice of analytical methods should be based on the genetic properties

of the marker system and the questions being addressed. Choosing an inappropriate

analytical method could lead to false inferences regarding population genetic parameters

and the evolutionary history of fungal species or populations.

In this chapter, we will focus our discussion on the properties of mathematical and

statistical approaches currently used by mycologists and plant pathologists to estimate

population genetic structure and to infer migration and recombination of fungal species.

We also point out some problems associated with these mathematical and statistical

methods, as well as some solutions. Analytical methods for inferring mutation rates,

effective population sizes, and natural selection are not covered, as they are not the focus

of current population genetics studies in fungi. Issues relating to the choice of genetic

markers and sampling strategies have already been thoroughly discussed in other reviews

(e.g., Brown, 1996; McDonald, 1997). To limit the length of this discussion, we will keep

the use of mathematical formulae to a minimum. Interested readers are advised to look

at cited publications and textbooks (e.g., Weir, 1996) for additional details.



12.2 EVOLUTIONARY FORCES DETERMINING POPULATION DYNAMICS OF FUNGI



12.2.1 Mutation

Mutation is the ultimate source of genetic variation, creating new DNA sequence variation

upon which other evolutionary forces act (Freeman and Herron, 2001). Mutation results

from base pair substitutions, small or large deletions, transposition events, and structural

alteration of chromosomes. Mathematical models show that in a finite population, the

probability that an allele will be lost after one generation of reproduction is $(1 - f_i)^{2N_e}$ for
diploid fungi and $(1 - f_i)^{N_e}$ for haploid fungi, where $f_i$ is the frequency of allele $i$ and $N_e$
is the effective population size. Thus, the ancestral lineage of any allele can be lost through

random genetic drift in each generation of reproduction. After many generations, all






ancestral alleles except one will eventually disappear from the population. Therefore, in

the absence of mutation, all populations will eventually become fixed for the same

allele/trait and no further evolution will occur for this gene.
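The loss probabilities quoted above can be evaluated directly. The following is a minimal sketch, assuming the expressions $(1 - f_i)^{2N_e}$ and $(1 - f_i)^{N_e}$ given in the text; the function name `prob_allele_lost` and the example population size are illustrative and do not come from the chapter.

```python
def prob_allele_lost(freq, n_e, ploidy=1):
    """Probability that an allele at frequency `freq` leaves no copies in the
    next generation, (1 - f)^(ploidy * Ne).

    ploidy=1 for haploid fungi, ploidy=2 for diploid fungi.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    return (1.0 - freq) ** (ploidy * n_e)


# Hypothetical example: one new mutant copy in a population with Ne = 100.
# Haploid: starting frequency 1/Ne among Ne gene copies.
p_loss_haploid = prob_allele_lost(1 / 100, 100, ploidy=1)
# Diploid: starting frequency 1/(2Ne) among 2Ne gene copies.
p_loss_diploid = prob_allele_lost(1 / 200, 100, ploidy=2)

print(f"haploid: {p_loss_haploid:.3f}, diploid: {p_loss_diploid:.3f}")
```

In both cases a single new copy has a little more than a one-in-three chance of disappearing in the very next generation, which illustrates how quickly drift alone can remove rare alleles.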

Although mutation creates new alleles in fungal populations, the initial frequencies

of the new mutants are usually very low. In the infinite allele model, the initial frequency

for a new mutant allele is $1/(2N_e)$ for diploid fungi and $1/N_e$ for haploid fungi, as this model

assumes each allele emerges through a unique mutation. The frequency of a new mutant

can be increased through recurrent mutation, but this is a slow process. For example, at

least 10,000 generations are needed to raise a recurrent mutant to a
frequency of 1%, assuming a mutation rate of $10^{-6}$ per generation and no reversion. Thus, the principal

forces determining whether a mutant persists in a fungal population are natural selection

and effective population size.
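The figure of roughly 10,000 generations can be checked with a short calculation. This sketch assumes a simple one-way mutation model with no reversion, drift, or selection, in which the mutant frequency after $t$ generations is $p_t = 1 - (1 - u)^t$ for mutation rate $u$; the model choice and the function name are assumptions made for illustration.

```python
import math


def generations_to_reach(target_freq, mutation_rate):
    """Generations of one-way recurrent mutation needed to raise a mutant
    from frequency 0 to `target_freq`, using p_t = 1 - (1 - u)^t."""
    # log1p(-x) computes log(1 - x) accurately for very small x.
    t = math.log1p(-target_freq) / math.log1p(-mutation_rate)
    return math.ceil(t)


gens = generations_to_reach(0.01, 1e-6)
print(gens)  # a little over 10,000 generations
```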

12.2.2 Migration

Migration (gene flow), here defined as the exchange of genetic information among geographic populations through the movement of gametes or individuals from one place to

another (Slatkin, 1987), plays many roles in the population genetic structure and evolution

of fungal species. First, like mutation, migration serves as a source of genetic variation.

Because mutation is usually a rare event, a specific mutant allele may arise in one fungal

population but not in others. Migration increases the genetic variation of local fungal

populations by introducing novel alleles and allele combinations from neighboring populations. Evolutionary theory suggests that geographic isolation is one of the main steps

leading to speciation. Regular migration acts as a constraining force on evolution by

homogenizing the gene frequencies among geographically separated fungal populations,

thus preventing the accumulation of population differences that could lead to speciation.

On the other hand, occasional exchange of migrants can accelerate the evolutionary process

by spreading superior genes or gene combinations to neighboring populations (Lande,

1984). In agricultural systems, migration is considered a critical factor for promoting

pathogen evolution by spreading new virulence and fungicide resistance alleles among

fungal populations across a large geographical area (McDonald and Linde, 2002).

The agents of migration in fungi can be mycelium, sexual spores, or asexual spores,

as well as fertilizing agents such as microconidia and spermatia (Burnett, 2003) carried

by wind, water, or insects. The principal determinants affecting migration rate of a fungal

species are the type of migration unit and the mechanism used for dispersal. Fungal species

dispersed by rain splash or free-water movement usually have lower potential for long-distance migration than fungal species carried by wind or insect vectors. For some fungi,
such as rusts, migration units (spores) can be carried by air currents for thousands of

kilometers (Watson and de Sousa, 1983), leading to intercontinental migration. Human

activities, such as movement of contaminated machinery and international travel or trade,

might also have a substantial impact on migration for many fungi.

12.2.3 Mating Systems

Fungi exhibit a wide array of recombination strategies. Recombination in fungi can occur

through both meiosis and mitosis. Meiotic recombination usually involves the fusion of

genomes donated by two opposite mating types during sexual reproduction while mitotic

recombination does not involve sex. Heterokaryosis and parasexuality may provide additional mechanisms of recombination for fungal species (Burnett, 2003). Mating systems

affect recombination rates in populations and play a key role in the dissipation of linkage

disequilibrium. Populations exhibiting little or no recombination are expected to show a

significant degree of nonrandom association among unlinked alleles. In contrast, large






recombining populations are expected to exhibit random associations among neutral loci

as a result of the reassortment of unlinked genes during meiosis (Hartl and Clark, 1997).

When coupled with selection, recombination can have a large impact on genetic variation

(Braverman et al., 1995; Nordborg et al., 1996) and population subdivision (Felsenstein,

1974; Zhan and McDonald, 2003). First, recombination increases genetic variation in local

populations by creating new genotypes through reshuffling existing genes. Even a small

number of outcrossing events can generate a significant amount of genotype variation. In

plants, for example, Kannenberg and Allard (1967) showed that a single outcross resulted

in a significant number of new genotypes. Second, recombination can increase genetic

variation by creating new alleles through intragenic rearrangement. Third, recombination

increases genetic variation of local populations (Braverman et al., 1995; Nordborg et al.,

1996) and decreases the level of population subdivision (Felsenstein, 1974; Zhan and

McDonald, 2003) by limiting the effects of hitchhiking or background selection on linked

alleles at other loci.

12.2.4 Natural Selection

The targets of natural selection are phenotypes, not genotypes. For selection to occur in

natural populations, three conditions must be satisfied. First, individuals in natural populations must differ in their ability to survive and reproduce. Second, more offspring

must be produced than can survive and reproduce. Third, phenotypes with higher fitness

are present in excess at reproductive age and, therefore, overcontribute to the gene pool

of the following generations.

While the conditions needed for selection to occur appear simple, the effects of

natural selection on the genetic dynamics of fungal populations are complex. Natural

selection can increase or decrease the level of genetic variation and population differentiation, depending on the types of natural selection operating on natural populations (Hartl

and Clark, 1997). Directional selection favoring the same phenotypes in populations

inhabiting different environments may lead to a rapid decrease in genetic variation of

fungal populations while homogenizing allele frequencies among geographically isolated

populations. On the other hand, balancing selection tends to increase the amount of genetic

variation either through selection for rare alleles (frequency-dependent selection) or by

overdominance (heterozygote advantage) in diploids or dikaryotic fungi. Divergent selection for different phenotypes among genetically isolated fungal populations may also lead

to population subdivision.

Selection may play a more significant role in the population genetic dynamics of

plant pathogenic fungi (Leung et al., 1993), primarily due to cultural practices, such as

monoculture, frequent changes in host populations, and intensive applications of fungicides

in agricultural ecosystems. Directional selection for corresponding virulent or fungicide-resistant mutants in a fungal population can rapidly change the genetic structure of fungal

populations, causing the loss of effectiveness of new resistance genes or new fungicides

in just a few years. Natural selection is expected to be more efficient at changing allele

frequencies in haploid fungi than in diploid or dikaryotic fungi.

12.2.5 Genetic Drift

Genetic drift shares many evolutionary features with natural selection. Both directional

selection and genetic drift are the principal evolutionary forces leading to fixation of rare

mutant alleles. And both processes can purge genetic variation in local fungal populations

and generate differentiation among subpopulations. Genetic drift is a stochastic process

that affects entire genomes, while natural selection is a deterministic process and affects

only loci it directly acts on or is strongly associated with.






No natural populations are exempt from genetic drift. The degree of genetic drift in

a fungal population is determined by its effective size. Genetic drift is more severe in

small than in large populations. Thus, fungal populations with large effective sizes tend

to have higher genetic variation, as more alleles can emerge through mutation and fewer

alleles will be lost due to random drift (Kimura, 1983). In agricultural ecosystems, genetic

drift is expected to play a more important role in the spatial and temporal genetic structure

of fungal pathogen populations due to large fluctuations in fungal population sizes associated with epidemic cycles.

In small or repeatedly bottlenecked populations, genetic drift may cause accumulation or random fixation of deleterious mutations, resulting in a decline in overall population

fitness. Such fitness declines can further reduce effective population size and accelerate

the accumulation of additional mutations, leading to extinction of a species or population.



12.3 MATHEMATICAL AND STATISTICAL METHODS TO ESTIMATE POPULATION AND EVOLUTIONARY PARAMETERS



12.3.1 Genetic Variation

In population genetic studies of fungi with mixed sexual and asexual reproduction, both

gene variation and genotype variation must be considered. Genotype here refers to genetically distinct individuals, often defined in fungi based on unique DNA fingerprints. The

amount of gene variation in fungal populations is measured by the numbers and frequencies

of alleles at individual loci and is affected by population age, migration, mating system,

population size, and natural selection. For example, old populations tend to have higher

amounts of gene variation due to the accumulation of mutations over time. On the other

hand, the amount of genotype variation in a fungal population is affected mainly by mating

and reproductive systems. Populations exhibiting largely asexual reproduction tend to have

low genotype variation arrayed as a limited number of clonal lineages, whereas sexual

populations are expected to exhibit a high degree of genotype variation. Genotype variation

is an important population parameter only for organisms with clonal reproduction or

facultative sexual reproduction, such as fungi. In obligate sexual organisms such as mammals, each individual (except identical twins) in a population has a unique genotype, and

genotypic diversity should always be at its theoretical maximum.

12.3.1.1 Estimating Gene Variation



Several approaches have been used to measure and compare the amount of gene variation

in fungal populations. One approach is to measure the proportion of polymorphic loci

across the genome (e.g., Carter et al., 2001; Lewinsohn et al., 2001; Chen et al., 2002;

Plante et al., 2002; Flier et al., 2003). Polymorphic loci are defined as those where the

most common allele is present at a frequency of less than 0.95 or 0.99 (Hartl and Clark,

1997), depending on the preference of the researchers. Previous studies indicate that the

distribution of genetic variation among loci for most organisms follows an L shape, where

the majority of loci in the genome show little or no variation (Nei, 1987). This implies

that to ensure 95% probability of detecting polymorphism for a less variable locus, at

least ~30 diploid individuals should be assayed for the 0.95 criterion of polymorphism

and ~150 for the 0.99 criterion. These numbers increase to ~60 and ~300, respectively,

for haploid fungi. Many sample sizes used in studies of fungal population genetics are

much smaller than these expectations and, thus, are likely to underestimate the level of

polymorphism.
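Sample sizes of this order can be reproduced with a simple binomial argument. The sketch below assumes that the most common allele sits exactly at the polymorphism threshold $q$ (0.95 or 0.99) and that gene copies are sampled independently, so a sample of $m$ gene copies shows only the common allele with probability $q^m$. Whether the authors used exactly this argument is an assumption, and the function name is illustrative.

```python
import math


def min_sample_size(common_freq, power=0.95, ploidy=1):
    """Smallest number of individuals such that a sample contains at least one
    copy of a non-common allele with probability >= `power`, when the most
    common allele has frequency `common_freq`.

    ploidy=2: each individual contributes two gene copies (diploid).
    ploidy=1: each individual contributes one gene copy (haploid).
    """
    # Need common_freq ** m <= 1 - power, where m is the number of gene copies.
    gene_copies = math.ceil(math.log(1.0 - power) / math.log(common_freq))
    return math.ceil(gene_copies / ploidy)


for q in (0.95, 0.99):
    diploid = min_sample_size(q, ploidy=2)
    haploid = min_sample_size(q, ploidy=1)
    print(f"criterion {q}: {diploid} diploid or {haploid} haploid individuals")
```

Under these assumptions the calculation gives 30 and 150 diploid individuals, and 59 and 299 haploid individuals, in line with the approximate values quoted in the text.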






A second approach is to measure allelic richness, i.e., the mean number of alleles

at each locus (e.g., Elias and Schneider, 1992; Boisselier-Dubayle et al., 1996; Huss, 1996;

Purwantara et al., 2000; Johannesson et al., 2001a; Zhan et al., 2003). When calculating

allelic richness of populations, all loci should be taken into account, including monomorphic loci. Measures of allelic richness should only be applied to data generated using

multiallelic markers such as restriction fragment length polymorphisms (RFLPs) and

microsatellites. For data generated with biallelic polymerase chain reaction (PCR)-based

markers, for example, amplified fragment length polymorphism (AFLP) and random

amplified polymorphic DNA (RAPD), measures of allelic richness are of limited value because

these markers have a maximum of only two possible alleles. Both theoretical and experimental studies have shown that allelic richness is very sensitive to sample sizes (Nei et

al., 1975; Leberg, 1992; Spencer et al., 2000), especially for highly variable loci. In the

long term, allelic richness is thought to be a more accurate reflection of evolutionary

potential of populations than other measures of gene variation (e.g., Petit et al., 1998).

A third approach is to measure the amount of heterozygosity (e.g., Hantula et al.,

1998; Vainio et al., 1998; Hogberg and Stenlid, 1999; Smith and Sivasithamparam, 2000;

Johannesson et al., 2001b). The amount of heterozygosity is often presented for a single

locus or as an average across several loci and can only be used for diploid or dikaryotic fungi.

The fourth and most widely used approach to measure gene variation in fungal

populations, however, is Nei’s (1973) gene diversity (e.g., Goodwin et al., 1994; Leuchtmann and Schardl, 1998; Rosewich et al., 1999; Pimentel et al., 2000; Gagne et al., 2001;

Douhan et al., 2002; Mahuku et al., 2002; Zhan et al., 2003). Nei’s gene diversity is a

joint function of allele richness and allele frequency. It measures the potential amount of

heterozygosity in random mating diploid populations. Nei’s gene diversity can be used to

measure gene variation for any organism as long as allele frequencies can be determined.

When estimating gene diversity of a fungal population, data from several loci should be

averaged, including monomorphic loci. Like allelic richness, measures of gene diversity

are affected by sample sizes, but are less sensitive than allelic richness (Nei et al., 1975;

Leberg, 1992; Spencer et al., 2000).
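Allelic richness and Nei's gene diversity are both simple functions of the allele counts at each locus. The sketch below uses the standard single-locus form of Nei's (1973) gene diversity, $H = 1 - \sum_i p_i^2$, and averages over all loci, monomorphic ones included, as the text recommends. The data set and function names are hypothetical, and no sample-size correction is applied.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's gene diversity at one locus: 1 - sum of squared allele frequencies."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def allelic_richness(alleles):
    """Number of distinct alleles observed at one locus."""
    return len(set(alleles))


def mean_over_loci(loci, statistic):
    """Average a per-locus statistic over ALL loci, monomorphic ones included."""
    return sum(statistic(alleles) for alleles in loci.values()) / len(loci)


# Hypothetical haploid sample: eight isolates scored at three loci.
loci = {
    "locus1": ["a", "a", "a", "a", "b", "b", "b", "b"],  # two alleles, equal frequency
    "locus2": ["x", "x", "x", "x", "x", "x", "x", "x"],  # monomorphic
    "locus3": ["p", "p", "q", "q", "r", "r", "s", "s"],  # four alleles, equal frequency
}

mean_h = mean_over_loci(loci, gene_diversity)    # (0.5 + 0.0 + 0.75) / 3
mean_a = mean_over_loci(loci, allelic_richness)  # (2 + 1 + 4) / 3
print(f"mean gene diversity = {mean_h:.3f}, mean allelic richness = {mean_a:.3f}")
```

Dropping the monomorphic `locus2` from the average would raise the mean gene diversity from about 0.42 to 0.625, which is the inflation that arises when only polymorphic loci are reported.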

After gene variation has been calculated for a fungal population, it is common to

compare it with the gene variation of other populations and even other fungal species,

quantitatively or qualitatively. Results from these types of comparisons are routinely used

as arguments to support hypotheses of genetic drift or natural selection, as well as to

indicate the evolutionary potential of a particular fungal population or species. When

comparing the amount of gene variation in fungal populations, several parameters have

to be considered, especially when comparing data from different laboratories, including:

1. What types of genetic markers were used. The types of genetic markers chosen
   have a strong impact on the estimate of gene variation. The maximum genetic
   variation, in terms of Nei’s gene diversity, is $1 - 1/k$, where $k$ is the number of
   alleles. For data generated with biallelic PCR markers, such as RAPD and AFLP,
   the maximum number of alleles is two and the maximum value of Nei’s gene
   diversity is 0.50. When data are based on multiallelic markers, such as RFLPs
   and microsatellites, the number of alleles at one locus can be very large and
   gene diversity can be close to the theoretical maximum of 1.0. Therefore,
   comparisons of gene variation can be largely meaningless if different genetic
   marker systems are used.

2. Whether monomorphic loci are included. In many population genetic studies
   of fungi, researchers have considered only data from polymorphic loci. For
   RFLP and microsatellite markers, these loci are chosen randomly from a
   genomic library during a prescreening procedure and may represent the most
   variable parts of the genome. Data from less variable (monomorphic) loci are
   either discarded or are not included in surveys. This exclusion of monomorphic
   loci not only inflates measures of genetic variation of fungal species, but also
   makes comparisons from different laboratories difficult or impossible.

3. Whether sample sizes are the same. Unequal sample sizes are commonly

encountered in population surveys, even in data sets generated from the same

laboratory. As all four measures of gene variation are affected by sample size,

quantitative comparisons of gene variation among fungal populations should be

based on standardized sample sizes. Although some researchers have noted the

relationship between gene variation and sample sizes, the implications of this

relationship are apparently not widely understood, and the majority of researchers have not attempted to correct for differences in sample size. Several strategies

can be used to correct for differences in sample size among populations (Leberg,

2002). In our laboratory, we have used three methods to make this correction:

a. First, we have standardized different fungal samples with θ ($4N_e u$ for diploid
organisms and $2N_e u$ for haploid organisms), the expected number of new

alleles emerging through mutation each generation (Zhan et al., 2001). The

θ of a fungal population was estimated from the observed number of alleles

in a sample using the sampling theory developed by Ewens (1972). This

strategy does not compare allelic richness directly. Rather, it compares the

potential of generating new mutants under an infinite neutral allele model.

By assuming constant mutation rates across populations, this strategy can

also be used to compare effective population sizes among populations. A

drawback of this strategy is that it requires large sample sizes to make a

reasonable estimate.

b. A second strategy is to standardize populations with resampling by taking

many random subsamples of the same size from the original data set of each

population and calculating gene variation as the mean value of the resamples

(Zhan et al., 2003). In a global survey of the wheat pathogen Mycosphaerella

graminicola, we found that populations from the Middle East displayed the

highest amount of gene variation and we hypothesized that this region is the

center of origin of this fungus. However, initially we did not know whether

differences in gene variation among regional populations reflected differences

in sample sizes or whether the differences were statistically significant. We

used the resampling strategy to distinguish between these possibilities. For

each resampling replication, the total number of alleles and their frequencies

were recorded from a random sample of the same size as the smallest population and the gene diversity was calculated. This procedure was repeated

100 times. The mean and variance for gene diversity and allele number in

each population were calculated and used for a t-test. When using random

resampling strategies to standardize sample size, it is important to standardize

them according to the population with the smallest sample sizes. Reducing

the larger samples to the mean of all populations, for instance, would reduce

but not eliminate bias. The advantage of this procedure is that it provides a

statistical test for the difference while standardizing sample sizes.

c. A third method is to standardize estimates of allelic richness (Zhan and

McDonald, 2003) with a sample coverage method by estimating the number

of undetected alleles according to sample sizes and the frequency distribution

of alleles (Huang and Weir, 2001). This method estimates the total number






of alleles in a population regardless of sample size and is a good indicator

of true gene variation.
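The resampling strategy described in item (b) above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' actual procedure: subsamples are drawn without replacement, the data are hypothetical single-locus allele lists, and the final t-test on the resampled means is omitted. The names `resampled_stats`, `pop_a`, and `pop_b` are invented for the example.

```python
import random
from collections import Counter
from statistics import mean, variance


def gene_diversity(alleles):
    """Nei's gene diversity at one locus: 1 - sum of squared allele frequencies."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def resampled_stats(alleles, size, replicates=100, seed=0):
    """Draw `replicates` random subsamples of `size` gene copies (without
    replacement) and summarize gene diversity and allele number."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diversities, allele_numbers = [], []
    for _ in range(replicates):
        subsample = rng.sample(alleles, size)
        diversities.append(gene_diversity(subsample))
        allele_numbers.append(len(set(subsample)))
    return {
        "diversity_mean": mean(diversities),
        "diversity_var": variance(diversities),
        "alleles_mean": mean(allele_numbers),
    }


# Hypothetical data: allele calls at one locus in two populations.
pop_a = ["a"] * 30 + ["b"] * 20 + ["c"] * 10  # 60 isolates
pop_b = ["a"] * 12 + ["b"] * 8                # 20 isolates

# Standardize to the smallest sample, as the text recommends.
smallest = min(len(pop_a), len(pop_b))
stats_a = resampled_stats(pop_a, smallest)
stats_b = resampled_stats(pop_b, smallest)
print(stats_a)
print(stats_b)
```

Because `pop_b` is already the smallest sample, every subsample of it is the whole sample and its variance is zero; only the larger population is actually thinned.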

12.3.1.2 Estimating Genotype Variation



Many methods have been used to estimate genotypic diversity in fungal populations. These

methods can be grouped roughly into three categories. As explained below, all of these

have some inherent problems, and a better estimator of genotype diversity would represent

a significant advance for fungal population studies.

The first category is based on Nei’s measure of gene diversity (Nei, 1973) and its

analogs. This approach has two problems. The first is that Nei’s gene diversity measures

the expected degree of heterozygosity assuming populations are random mating. As mentioned earlier, genotypic diversity in obligate sexual organisms should always be at its

maximum. Any departure from this expectation can be due to nonrandom mating or asexual

reproduction. Therefore, Nei’s gene diversity is not a good measure of genotypic diversity.

When we compared different analytical methods, we found that Nei’s method usually gave

the highest estimates of genotypic diversity (J.Z., unpublished results), suggesting that

this method biases the estimates upward. The second problem is that heterozygosity

measures only a part of genotype diversity (Gregorius, 1978). Homozygosity does not

necessarily decrease the amount of genotypic diversity in populations. For example, in

populations undergoing obligate sexual reproduction, all individuals usually have distinct

multilocus genotypes and contribute equally to the level of genotype variation, regardless

of whether they are homozygous or heterozygous for particular loci.

The second category is represented by Stoddart and Taylor (1980) using Kimura

and Crow’s (1964) effective number of alleles as an index of genotypic diversity. This

index of genotypic diversity has a minimum value of one and maximum value of n, where

n is the sample size. This measure is strongly associated with sample size, making

comparisons among populations of unequal sample size impossible. This problem can be

circumvented by normalizing estimated genotypic diversity against sample size, resulting

in the percentage of the theoretical maximum (Chen et al., 1994), but normalizing is

thought to work well only when populations have high diversity (Grünwald et al., 2003).

Furthermore, the index of Stoddart and Taylor (1980) emphasizes the most common

genotypes contributing to the genotypic diversity of populations while underrepresenting

the contribution of genotypes that are rare.
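The behaviour of the Stoddart and Taylor index is easy to see in a small calculation. The sketch below uses the effective-number form $G = 1 / \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th multilocus genotype, and normalizes by the sample size $n$ as described in the text. The genotype labels and function names are hypothetical.

```python
from collections import Counter


def stoddart_taylor_g(genotypes):
    """Genotypic diversity G = 1 / sum(p_i^2); ranges from 1 to the sample size n."""
    n = len(genotypes)
    return 1.0 / sum((count / n) ** 2 for count in Counter(genotypes).values())


def normalized_g(genotypes):
    """G expressed as a fraction of its theoretical maximum, the sample size n."""
    return stoddart_taylor_g(genotypes) / len(genotypes)


clonal = ["g1"] * 10                     # a single clone: G = 1
unique = [f"g{i}" for i in range(10)]    # every isolate distinct: G = n = 10
skewed = ["g1"] * 8 + ["g2", "g3"]       # one dominant clone plus two rare genotypes

for name, sample in (("clonal", clonal), ("unique", unique), ("skewed", skewed)):
    print(f"{name}: G = {stoddart_taylor_g(sample):.2f}, G/n = {normalized_g(sample):.2f}")
```

In the skewed sample three genotypes are present, yet G is only about 1.5, which shows how the index is driven by the common genotype and gives little weight to rare ones.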

The third category is based on the Shannon–Wiener index (Shannon and Weaver,

1949; Muller et al., 2002; Sigler and Turco, 2002), also called the Shannon (Steffenson

and Webster, 1992; Kolmer 1993; Purwantara et al., 2000) or Shannon–Weaver (e.g.,

Burdon and Jarosz, 1992; Meijer et al., 1994; Garbeva et al., 2003) index. Because the

first version of the Shannon–Wiener index was published in 1948 (Shannon, 1948), a year

before publication of the book The Mathematical Theory of Communication, coauthored

by Shannon and Weaver (1949), and is built upon the work of Wiener (1948), some

researchers (e.g., Spellerberg and Fedor, 2003) are against using the names Shannon or

Shannon–Weaver index.

The Shannon–Wiener index, ranging in value from zero to infinity in theory, was

originally developed for telecommunication to estimate the average uncertainty in predicting letters of a message, but has been widely used in many fields, including geography

(e.g., Haines-Young and Chopping, 1996; Nagendra, 2002), ecology (e.g., Floder et al.,

2002; Stampe and Daehler, 2003), and population genetics (e.g., Drenth et al., 1993; Kwon

and Morden, 2002). This index emphasizes the measure of species (genotype) richness

(Hurlbert, 1971) and the importance of species or genotypes with a low or rare frequency



(Figure 12.1). This index is also affected by sample size and suffers problems similar to
the Stoddart and Taylor index when normalized according to sample size (Grünwald et
al., 2003). More discussion on the estimation of genotypic diversity and its associated
problems can be found in a recent publication (Grünwald et al., 2003).

[Figure 12.1 The contribution of an event’s frequency (genotypes, species, etc.) to the estimate
of the Shannon–Wiener index. x-axis: frequencies of events (e.g., genotypes), 0.00 to 1.00;
y-axis: −p ln(p), 0.00 to 0.40.]
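The weight that the Shannon–Wiener index gives to rare genotypes follows from the shape of the per-genotype term $-p \ln p$ plotted in Figure 12.1. The following sketch computes the index as $H' = -\sum_i p_i \ln p_i$ with natural logarithms and evaluates the single-term contribution at a few frequencies. The function names and sample data are illustrative.

```python
import math
from collections import Counter


def shannon_wiener(genotypes):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over genotype frequencies."""
    n = len(genotypes)
    return -sum((c / n) * math.log(c / n) for c in Counter(genotypes).values())


def contribution(p):
    """Contribution -p * ln(p) of one genotype at frequency p (0 when p is 0)."""
    return 0.0 if p == 0 else -p * math.log(p)


# A genotype at frequency 0.05 contributes about three times as much as
# one at frequency 0.95; the term peaks at p = 1/e.
rare, common, peak = contribution(0.05), contribution(0.95), contribution(1 / math.e)
print(f"p=0.05: {rare:.3f}  p=0.95: {common:.3f}  p=1/e: {peak:.3f}")

unique = [f"g{i}" for i in range(10)]
print(f"H' for 10 distinct genotypes: {shannon_wiener(unique):.3f}")
```

For a sample in which every isolate is distinct, the index equals $\ln n$, so its maximum grows with sample size; this is one reason the index is sensitive to the number of isolates sampled.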

12.3.2 Population Subdivision

When genetic data are available for several fungal populations, it is natural to ask what

degree of population subdivision exists among the populations surveyed. Population subdivision, here defined as the difference in allele frequencies among subpopulations, can

be generated by genetic drift and natural selection. In the absence of sufficient gene flow

among populations, stochastic changes in allele frequencies among fungal populations can

lead to random fixation of neutral alleles, leading to nonadaptive differentiation (Wright,

1938). On the other hand, divergent selection for various ecological or physiological

characters among genetically isolated fungal populations may lead to adaptive population

subdivision. Population subdivision caused by natural selection usually is limited to the

loci directly affected by natural selection and closely linked genes, unless fungal populations are dominated by asexual reproduction.

The simplest way to measure the magnitude of subdivision among fungal populations

is by testing for homogeneity in gene frequencies using two-dimensional contingency $\chi^2$
tables (e.g., Rosewich et al., 1998; James et al., 1999; McDonald et al., 1999; Tenzer and
Gessler, 1999; Salamati et al., 2000), as introduced by Workman and Niswander
(1970). This type of chi-squared test has $(r - 1) \times (c - 1)$ degrees of freedom, where $r$
refers to the number of alleles at a locus and $c$ refers to the number of fungal populations.

The p values generated by the contingency chi-squared test provide implicit statistical

information on whether the fungal populations compared differ in their allele frequencies

and at what levels. Furthermore, the sensitivity of the test increases with sample size.

Because small populations are more prone to sampling error, this property provides a way

to autocorrect the potential type I error associated with small samples.
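For a single locus, the contingency chi-squared statistic and its degrees of freedom can be computed from a table of allele counts. This is a minimal sketch using the standard Pearson statistic, with rows as alleles and columns as populations; the counts are invented and the function name is hypothetical.

```python
def contingency_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    table[i][j] is the count of allele i in population j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under equal allele frequencies across populations.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical counts: two alleles (rows) in two populations (columns).
differentiated = [[30, 10],
                  [20, 40]]
homogeneous = [[10, 20],
               [30, 60]]

chi2_diff, df_diff = contingency_chi2(differentiated)
chi2_same, df_same = contingency_chi2(homogeneous)
print(f"differentiated: chi2 = {chi2_diff:.2f}, df = {df_diff}")
print(f"homogeneous:    chi2 = {chi2_same:.2f}, df = {df_same}")
```

A p value would then be read from the chi-squared distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(chi2, df)` if SciPy is available.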

Three problems may be encountered when using the contingency $\chi^2$ test. First, when

some alleles have small expected values under the hypothesis of equal frequencies across

populations, it is necessary to combine the less frequent alleles into one category. Otherwise, the resulting $\chi^2$ value may be too large, causing a false rejection of the null hypothesis.
Many statisticians recommend that the $\chi^2$ test should not be performed if the expected count in
any cell is less than five, but this criterion is hard to meet unless sample sizes are large

and loci are moderately variable. In our laboratory, we usually combine into a single

category alleles with frequencies lower than 0.05 for studies with large sample sizes (>50

individuals per population) and 0.10 for studies with small sample sizes (<30 individuals

per population). Second, a contingency χ2 test is conducted for each locus, and statistical

inference regarding population subdivision may not be consistent across randomly chosen

loci. In many experimental studies in fungal population genetics, it is common to find that

the null hypothesis of equal allele frequency across populations is rejected for one or two

loci, but not for other loci. This inconsistency could be expected if the loci chosen have

undergone different evolutionary processes. For example, some of the loci could be under

selection or linked to selected loci, while others are selectively neutral. In this case, further

tests for neutrality of suspect loci may be required. On the other hand, information from

several loci can be joined to give a combined χ2 value. This combined χ2 value has (c – 1) × ∑ (a_i – 1) degrees of freedom, summed over loci, where c is the number of fungal populations surveyed and a_i is the number of alleles at the ith locus. Third, when multiple fungal populations

are compared, rejection of the null hypothesis of equal frequencies indicates only that there

is some degree of genetic differentiation among the populations surveyed. The rejection

could be caused by one or a few extreme populations, and the contingency χ2 test would

not identify the responsible populations. To identify the extreme populations, more tedious

procedures such as pair-wise comparisons between fungal populations may be needed.
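Two of the bookkeeping steps described above, pooling rare alleles and counting the degrees of freedom of a combined test, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the default threshold of 0.05, and the example counts are hypothetical, and the pooled category may itself remain rare, in which case a larger threshold or larger samples would be needed.

```python
def pool_rare_alleles(counts, threshold=0.05):
    """Merge alleles whose overall frequency is below `threshold` into one category.

    counts[i][j] is the number of isolates carrying allele i in population j.
    Returns a new table with the common alleles first and the pooled row last.
    """
    grand_total = sum(sum(row) for row in counts)
    kept, pooled = [], None
    for row in counts:
        if sum(row) / grand_total < threshold:
            # Add this rare allele to the pooled category
            pooled = list(row) if pooled is None else [a + b for a, b in zip(pooled, row)]
        else:
            kept.append(list(row))
    if pooled is not None:
        kept.append(pooled)
    return kept


def combined_degrees_of_freedom(num_populations, alleles_per_locus):
    """(c - 1) * sum(a_i - 1) for a chi-squared value combined over loci."""
    return (num_populations - 1) * sum(a - 1 for a in alleles_per_locus)


# Hypothetical locus with two common and two rare alleles in two populations
example_counts = [[48, 50],
                  [50, 47],
                  [2, 1],
                  [0, 2]]
print(pool_rare_alleles(example_counts))           # rare rows merged into one
print(combined_degrees_of_freedom(4, [3, 2, 5]))   # 4 populations, 3 loci
```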

Another approach widely used to measure subdivision among fungal populations is

the fixation index, FST and its analogs (e.g., Carlier et al., 1996; Milgroom et al., 1996;

Purwantara et al., 2000; Douhan et al., 2002; Garcia et al., 2002; Johannesson et al., 2001b;

Kauserud and Schumacher, 2003; LoBuglio and Taylor, 2002; Urbanelli et al., 2003). The

term fixation was first coined by Wright (1921) to quantify the reduction of heterozygosity

compared with a random mating population. The deficiency of heterozygosity in a population could result from nonrandom mating within local populations. It also could result

from differences in allele frequencies among populations due to cryptic population subdivision. If two subdivided populations are combined in an analysis, there often is an

excess in homozygosity (reduction of heterozygosity) called the Wahlund effect. The

degree of this excess reflects the amount of genetic differentiation between the two

subpopulations. The excess homozygosity in the combined population will reduce to that

expected after only one generation of random mating. This reduction in homozygosity

after one generation of random mating is called the Wahlund principle. Wright (1951)

used the formula (1 – FIT) = (1 – FIS)(1 – FST) to partition the fixation index of a subdivided

population, where FIT is the total reduction of heterozygosity (also called the inbreeding

coefficient) in subdivided populations, and FIS and FST are the lack of heterozygosity due

to nonrandom mating within local populations and to population subdivision, respectively.

FST has a value between 0 and 1. An FST of one indicates that populations are completely isolated, and an FST of zero indicates that the populations have identical allele frequencies.
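Wright's partition can be checked numerically for a single biallelic locus in a diploid organism. The sketch below is an illustration only: it assumes equally weighted subpopulations, applies no sample-size correction, and uses hypothetical genotype counts and a hypothetical function name. It computes the average observed heterozygosity (H_I), the average expected heterozygosity within subpopulations (H_S), and the expected heterozygosity of the combined population (H_T).

```python
def f_statistics(genotype_counts):
    """Return (F_IS, F_ST, F_IT) for one biallelic locus.

    genotype_counts is a list of (n_AA, n_Aa, n_aa) tuples, one per
    subpopulation. Subpopulations are weighted equally (an assumption).
    """
    observed_het, expected_het, allele_freqs = [], [], []
    for n_AA, n_Aa, n_aa in genotype_counts:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)       # frequency of allele A
        allele_freqs.append(p)
        observed_het.append(n_Aa / n)         # observed heterozygosity
        expected_het.append(2 * p * (1 - p))  # expected under random mating

    h_i = sum(observed_het) / len(observed_het)
    h_s = sum(expected_het) / len(expected_het)
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_t = 2 * p_bar * (1 - p_bar)

    return 1 - h_i / h_s, 1 - h_s / h_t, 1 - h_i / h_t


# Hypothetical subpopulations, each in Hardy-Weinberg proportions (p = 0.8 and 0.2)
f_is, f_st, f_it = f_statistics([(64, 32, 4), (4, 32, 64)])
print(f_is, f_st, f_it)
```

In this example each subpopulation mates randomly, so F_IS is zero, yet the combined sample shows a deficit of heterozygotes (the Wahlund effect), which appears entirely in F_ST and F_IT.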

One advantage of using F statistics to measure the level of population differentiation

is that FST also provides a convenient way to estimate gene flow, provided that populations

are in drift–migration equilibrium and mutation is negligible. However, the F statistics

also have a few deficiencies. First, an estimate of FST does not come with a test of statistical significance. To test whether

an estimate is statistically significant, further analysis using a Fisher exact test or permutation is required. In practice, it is common to conclude that populations show little

differentiation if FST is less than 0.05, moderate differentiation if FST is between 0.05 and

0.15, large differentiation if FST is between 0.15 and 0.25, and very large differentiation






if FST is larger than 0.25, but these values are arbitrary. Second, though FST can be calculated

for any population if there are two alleles at one locus, in the presence of multiple alleles,

the formula (1 – FIT) = (1 – FIS)(1 – FST) is no longer valid unless there is no selection

(Nei, 1965). Third, FST can only be used for diploid organisms. For haploid fungi, no

measure of heterozygosity is possible. Fourth, FST does not take sample sizes and number

of subpopulations into account.

To overcome these problems, several analogs of FST have been developed and widely

used to measure population subdivision. One of these analogs is GST (Nei, 1972, 1973).

GST is an averaged FST over alleles and pairs of populations. Nei (1973) stated that GST

can be applied to any populations without assumptions regarding the pattern of evolutionary forces, such as mutation, selection, and migration, and to sexual or asexual organisms

of any ploidy as long as allele frequencies can be estimated. Furthermore, GST can be used

to quantify population subdivision across many hierarchical levels as long as an appropriate

sampling strategy is used. Like FST, GST (Nei, 1972, 1973) does not come with a test of statistical significance, and further analysis is required to obtain a statistical inference. It also does not

explicitly take into account differences in sample sizes and number of subpopulations (Nei

and Chesser, 1983).
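For haploid data, GST at a single locus can be computed from gene diversities as (H_T – H_S)/H_T, and a permutation test can supply the missing significance test. The sketch below makes several assumptions that are not from the text: equal sample sizes (so that pooling isolates gives the average allele frequencies), no sample-size correction of the kind discussed by Nei and Chesser (1983), and hypothetical function names and data.

```python
import random
from collections import Counter


def gene_diversity(alleles):
    """Gene diversity, 1 - sum(p_i^2), from a list of allele labels."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def g_st(populations):
    """G_ST = (H_T - H_S) / H_T; populations is a list of allele lists."""
    h_s = sum(gene_diversity(pop) for pop in populations) / len(populations)
    pooled = [allele for pop in populations for allele in pop]
    h_t = gene_diversity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def permutation_p_value(populations, n_permutations=999, seed=0):
    """Shuffle isolates among populations and count G_ST values >= observed."""
    rng = random.Random(seed)
    observed = g_st(populations)
    sizes = [len(pop) for pop in populations]
    pooled = [allele for pop in populations for allele in pop]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            # Rebuild populations of the original sizes from the shuffled pool
            shuffled.append(pooled[start:start + size])
            start += size
        if g_st(shuffled) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical haploid samples of 20 isolates each
pop_a = ["A"] * 18 + ["B"] * 2
pop_b = ["A"] * 2 + ["B"] * 18
observed_gst, p_value = permutation_p_value([pop_a, pop_b])
print(observed_gst, p_value)
```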

Another FST analog is θST, proposed by Cockerham (1969, 1973) and Weir and

Cockerham (1984). θST is estimated from each allele separately. When there are more than

two alleles present at a locus, information from the different alleles can be combined by

taking their geometric means (Weir and Cockerham, 1984). θST emphasizes the effects of

sample sizes and number of subpopulations on the estimation of population subdivision.

This statistic assumes that populations are random mating and have equal effective size.

Unlike GST, values of θST are not always positive. A negative θST value indicates that the

populations are so similar in gene frequencies that no population subdivision can be

detected, but the negative value cannot be used to estimate the amount of gene flow.

More recently, analysis of molecular variance (AMOVA) based on the conventional

theory of analysis of variance (Excoffier et al., 1992) has become widely used to study

spatial population structure of fungi (e.g., Hellgren and Hogberg, 1995; Hamelin, 1996;

Norden, 1997; Pimentel et al., 2000; Vainio and Hantula, 2000; Viji et al., 2001; Barrins

et al., 2002; Coates et al., 2002; Jamaux-Despreaux and Peros, 2003; Kauserud and

Schumacher, 2003; Samils et al., 2003; Urbanelli et al., 2003). This analysis produces

estimates of variance components by partitioning the total sum of the squared distances

between all pairs of haplotypes hierarchically and generates φ statistics. The method can

accommodate various types of genetic assumptions on the evolution of populations

(Excoffier et al., 1992) and can provide a statistical test on the significance of the differentiation index. But the method is computation-intensive. AMOVA relates all differences

among DNA haplotypes to step-wise mutation events and estimates evolutionary divergence (distance) based on the number of mutational steps between haplotypes. AMOVA

was developed for analysis of molecular data from nonrecombining regions, such as

mitochondrial genomes, and is probably valid for asexual fungal populations. Some

researchers have used AMOVA to analyze molecular data from recombining regions, thus

invalidating the fundamental assumptions of the method.
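The variance partitioning behind AMOVA can be illustrated for the simplest case of a single hierarchical level (haplotypes within populations). The sketch below is a simplified reading of the approach of Excoffier et al. (1992), not a replacement for dedicated software: it treats the number of differing sites between two haplotypes as their squared distance, omits the permutation test of significance, and uses a hypothetical function name and data. It assumes at least two populations and some variation among haplotypes.

```python
def phi_st(populations):
    """One-level AMOVA estimate of phi_ST.

    populations is a list of lists of equal-length haplotype strings
    (for example, presence/absence of restriction sites coded as 0/1).
    """
    def squared_distance(h1, h2):
        # Number of sites at which two haplotypes differ
        return sum(a != b for a, b in zip(h1, h2))

    def sum_of_squared_deviations(haplotypes):
        n = len(haplotypes)
        total = sum(squared_distance(a, b) for a in haplotypes for b in haplotypes)
        return total / (2 * n)

    pooled = [h for pop in populations for h in pop]
    n_total, n_pops = len(pooled), len(populations)

    ssd_total = sum_of_squared_deviations(pooled)
    ssd_within = sum(sum_of_squared_deviations(pop) for pop in populations)
    ssd_among = ssd_total - ssd_within

    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)

    # Average sample size adjusted for unequal population sizes
    n0 = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (n_pops - 1)

    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    return var_among / (var_among + var_within)


# Hypothetical extreme case: no variation within populations, fixed differences between them
example_phi = phi_st([["000", "000", "000"], ["111", "111", "111"]])
print(example_phi)
```

As with θST, the among-population variance component, and therefore the φ statistic, can be negative when populations are very similar.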

Using F statistics (FST and its analogs) to quantify population subdivision assumes

a K allele or infinite allele mutation model. These models state that mutation events should

be rare and independent of the prior states of ancestral alleles so that any genetic similarity

among natural populations can be attributed to historical association or gene flow. These

assumptions may be reasonable for RFLP, AFLP, and RAPD loci, but they may not apply

to many microsatellite loci. Experimental studies from humans and other eukaryotic

species have shown that mutation rates are very high for many microsatellite loci (e.g.,






Udupa and Baum, 2001) and the size of new mutant alleles depends on the size of their

ancestral alleles (for a review, see Li et al., 2002). Therefore, FST and its analogs may not

be appropriate for estimating population subdivision of microsatellite data and may overestimate the actual level of genetic similarity among populations. In this case, Slatkin

(1995) proposed to use RST, a quantity analogous to FST, but which takes into account

differences in allele lengths. Results from computer simulations showed that R statistics

performed better for microsatellite data than F statistics (Slatkin, 1995).
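The idea behind RST can be sketched as the proportion of the variance in allele size that lies among populations. The code below is a deliberately simplified illustration and not Slatkin's (1995) exact estimator: it weights populations equally, uses ordinary sample variances, and ignores the sample-size corrections of the published method. The function name and the allele sizes are hypothetical.

```python
from statistics import variance


def r_st(populations):
    """Simplified R_ST for one microsatellite locus.

    populations is a list of lists of allele sizes (for example, repeat
    counts), with at least two alleles sampled per population.
    """
    # Average variance of allele size within populations
    s_within = sum(variance(pop) for pop in populations) / len(populations)
    # Variance of allele size in the pooled sample
    pooled = [size for pop in populations for size in pop]
    s_total = variance(pooled)
    return (s_total - s_within) / s_total


# Hypothetical populations with well-separated allele sizes
example_rst = r_st([[10, 11, 10, 11], [20, 21, 20, 21]])
print(example_rst)
```

Because the statistic depends on how far apart allele sizes are, and not only on whether alleles are identical, two populations sharing no alleles but differing by one repeat would score as much more similar than the pair in this example.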

Many researchers also use genetic distance to measure the similarity between fungal

populations (e.g., Weir et al., 1998; James et al., 1999; Six and Paine, 1999; Lakrod et

al., 2000; Valverde et al., 2000; Vandemark et al., 2000; Bock et al., 2002; Hurtado and

Ramstedt, 2002; Plante et al., 2002; Says-Lesage et al., 2002). There are several types of

genetic distance, but Nei’s genetic distance (1972, 1978) is most commonly used. Nei’s

genetic distance is defined as D = –ln I = –ln [JXY/(JX JY)^0.5], where I and JXY are the

normalized genetic identity and the probability of identity by descent between populations

X and Y, and JX and JY are the probability of identity by descent within each population,

respectively. This quantity is similar to GST in many ways. It measures the accumulated

nucleotide substitutions per locus between populations. Because genetic distance is thought

to be linearly related to the time since populations diverged from the same ancestor, the

advantage of this quantity is that it can be used to calculate divergence time between two

fungal populations, provided that all assumptions for GST are satisfied and the rate of

nucleotide substitution is constant over time. Furthermore, although Nei’s genetic distance

has been used primarily to measure genetic similarity of populations within species, it can

also be used to measure similarity between distantly related species (Nei, 1972).

To estimate genetic distance, data from several loci can be combined, based on either

the geometric mean of I or the arithmetic means of JXY, JX, and JY. How to weight different

loci is dependent on the level of genetic similarity between the compared fungal populations

and the evolutionary rates among the loci considered. Nei (1974) proposed using the

geometric mean of I to weight different loci if the evolutionary rates among loci were

different and genetic identity (I) between populations was high. Otherwise, the arithmetic

mean of the J values should be used. Apparently, this argument is largely ignored in

empirical studies of fungal population genetics, possibly due to the fact that evolutionary

rates of loci used are not available in most cases. Because many molecular analyses of

eukaryotic genomes have revealed that the evolutionary rates vary across genomes (Kimura,

1983; Kusumi et al., 2002), we believe that researchers should use the geometric mean to

weight genetic distance among loci in cases where evolutionary rates of loci are unknown

and the populations compared are highly similar. When data from different markers are

combined, it is important to include all loci, including monomorphic loci (Nei, 1972).

However, the majority of researchers, including ourselves (Zhan et al., 2003), have included

only polymorphic loci. Excluding monomorphic loci from analysis of genetic distance

undoubtedly overestimates the genetic differences between fungal populations.
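The two ways of combining loci, and the effect of dropping monomorphic loci, can be illustrated with a short sketch. The function names and allele frequencies below are hypothetical, and the code takes allele frequencies as given, without the small-sample correction of Nei (1978). At each locus, JX and JY are the sums of squared allele frequencies within populations X and Y, and JXY is the sum of the products of the frequencies of shared alleles.

```python
import math


def identity_components(freqs_x, freqs_y):
    """Return (J_X, J_Y, J_XY) at one locus from allele-frequency dicts."""
    alleles = set(freqs_x) | set(freqs_y)
    j_x = sum(f ** 2 for f in freqs_x.values())
    j_y = sum(f ** 2 for f in freqs_y.values())
    j_xy = sum(freqs_x.get(a, 0.0) * freqs_y.get(a, 0.0) for a in alleles)
    return j_x, j_y, j_xy


def nei_distance_arithmetic(loci):
    """D from the arithmetic means of J_X, J_Y, and J_XY over loci."""
    components = [identity_components(x, y) for x, y in loci]
    j_x = sum(c[0] for c in components) / len(components)
    j_y = sum(c[1] for c in components) / len(components)
    j_xy = sum(c[2] for c in components) / len(components)
    return -math.log(j_xy / math.sqrt(j_x * j_y))


def nei_distance_geometric(loci):
    """D from the geometric mean of the per-locus identities I.

    Fails if two populations share no alleles at some locus (I = 0).
    """
    log_identities = []
    for x, y in loci:
        j_x, j_y, j_xy = identity_components(x, y)
        log_identities.append(math.log(j_xy / math.sqrt(j_x * j_y)))
    return -sum(log_identities) / len(log_identities)


# Hypothetical data: one polymorphic locus and one monomorphic locus
polymorphic = ({"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8})
monomorphic = ({"A": 1.0}, {"A": 1.0})

d_all_loci = nei_distance_arithmetic([polymorphic, monomorphic])
d_polymorphic_only = nei_distance_arithmetic([polymorphic])
d_geometric = nei_distance_geometric([polymorphic, monomorphic])
print(d_all_loci, d_polymorphic_only, d_geometric)
```

With these frequencies, the distance computed from the polymorphic locus alone is larger than the distance computed from both loci, which is the overestimation described above.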

12.3.3 Inference of Evolutionary Forces

Migration, the amount of gene flow among fungal populations, is usually estimated

indirectly based on the distribution of allele frequencies among subpopulations. Currently,

two indirect methods are widely used to derive the amount of migration among fungal

populations. One of them is Wright’s FST statistic for estimating the standardized variance

of allele frequencies among local populations (Wright, 1951). Wright (1951) showed that

in an infinite island model, the level of population subdivision is a function of effective

population size and the average rates of migration among subpopulations. If the migration

rate is small and the effective population size large, the relation between population


