**OPTION 1** Hardy-Weinberg Exact Tests

(adapted from the original Genepop 4.0 documentation)

Three distinct tests are available, all concerned with the same null hypothesis Ho (= random union of gametes). They differ in how the rejection zone is constructed. For the Probability-test (sub-option 3), the probability of the observed sample is used to define the rejection zone, and the P-value of the test is the sum of the probabilities of all tables (with the same allelic counts) that have the same or a lower probability. This is the "exact HW test" of Haldane (1954), Weir (1990b), Guo and Thompson (1992) and others. When the alternative hypothesis (H1) of interest is heterozygote excess or deficiency, more powerful tests than the probability-test can be used (see Rousset and Raymond, 1995). One of them, the score test (U test), is available here, either for H1 = heterozygote deficiency (sub-option 1) or H1 = heterozygote excess (sub-option 2). The multisample versions of these two tests are accessible through sub-options 4 and 5.
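The definition of the Probability-test P-value can be illustrated with a minimal sketch for the simplest case, a single locus with two alleles, where all tables with the same allelic counts can be listed by varying the number of heterozygotes. This is not Genepop's code: the function names are hypothetical, and the two-allele restriction is an assumption made to keep the example short.

```python
from fractions import Fraction
from math import factorial


def hw_table_probability(n_aa, n_ab, n_bb):
    """Probability of one genotype table, conditional on its allele counts,
    under random union of gametes (two-allele case)."""
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n_bb + n_ab           # copies of allele B
    num = factorial(n) * 2 ** n_ab * factorial(n_a) * factorial(n_b)
    den = (factorial(n_aa) * factorial(n_ab) * factorial(n_bb)
           * factorial(2 * n))
    return Fraction(num, den)       # exact arithmetic avoids spurious ties


def hw_probability_test(n_aa, n_ab, n_bb):
    """P-value: sum of the probabilities of all tables with the same allele
    counts whose probability is no larger than that of the observed table."""
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    p_obs = hw_table_probability(n_aa, n_ab, n_bb)
    p_value = Fraction(0)
    # The heterozygote count has the same parity as n_a and cannot exceed
    # the count of the rarer allele.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        p = hw_table_probability((n_a - het) // 2, het, (n_b - het) // 2)
        if p <= p_obs:
            p_value += p
    return float(p_value)


if __name__ == "__main__":
    # Hypothetical sample: 10 AA, 4 AB, 6 BB individuals.
    print(hw_probability_test(10, 4, 6))
```

With more than two alleles the same rule applies, but the set of tables grows quickly, which is why enumeration becomes impractical and a Markov chain estimate is used instead.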

Two distinct algorithms are available:

- The complete enumeration method, as described by Louis and Dempster (1987). This algorithm works for fewer than five alleles. As an exact P-value is calculated by complete enumeration, no standard error is computed.
- A Markov chain (MC) algorithm to estimate without bias the exact P-value of this test (Guo and Thompson, 1992).

The MC algorithm is controlled by three parameters:

- the dememorization number. Enter 1000 if you have no other idea (i.e. the default option). Values below 100 or above 10,000 are not allowed for the web version.
- the number of batches (B). We suggest 100 for a first trial (i.e. the default option for sub-options 1-3) or 20 for sub-options 4-5. Values below 10 or above 32,767 are not allowed.
- The number of iterations per batch (C). We suggest 1000 for a first trial (i.e. the default option). Values below 400 or above 2,147,483,647 (32,767 for sub-options 4 and 5) are not allowed.
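The limits and defaults listed above can be collected in a small checker. This is only a sketch: `check_mc_parameters` is a hypothetical helper, not part of Genepop, and it simply encodes the web-version ranges quoted in the list.

```python
def check_mc_parameters(sub_option, dememorization=1000, batches=None,
                        iterations=1000):
    """Validate Markov chain parameters against the web-version limits
    quoted in the text, filling in the suggested defaults."""
    if sub_option not in (1, 2, 3, 4, 5):
        raise ValueError("sub_option must be between 1 and 5")
    multisample = sub_option in (4, 5)
    if batches is None:
        # Suggested first trial: 100 batches (sub-options 1-3), 20 (4-5).
        batches = 20 if multisample else 100
    max_iterations = 32_767 if multisample else 2_147_483_647
    limits = {
        "dememorization": (dememorization, 100, 10_000),
        "batches": (batches, 10, 32_767),
        "iterations": (iterations, 400, max_iterations),
    }
    for name, (value, low, high) in limits.items():
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {"dememorization": dememorization, "batches": batches,
            "iterations": iterations}


if __name__ == "__main__":
    print(check_mc_parameters(3))
    print(check_mc_parameters(4))
```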

**NB**. Much higher values for the MC parameters are allowed for the PC version of Genepop. For greater control, download the software to a local machine. Visit http://kimura.univ-montp2.fr/%7Erousset/Genepop.htm.

For all tests concerned with sub-options 1-3, there are three possible cases. The number of distinct alleles at each locus in each sample is:

- no more than 4: Genepop will give you the choice between the complete enumeration and the MC method. If you have fewer than 1000 individuals per sample, complete enumeration is recommended; otherwise, the MC method could be much faster. There are no general rules, however: results are highly variable and also depend on allele frequencies.
- always 5 or more: Genepop will automatically perform only the MC method.
- sometimes higher than 4, sometimes not: where the number of alleles is 4 or lower, Genepop will give you the choice between both methods. In the other situations (5 alleles or more in some samples), the MC method will be performed automatically.

Whether one wants enumeration or MC methods to be performed can be specified on the input form.
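The selection rule described in the three cases above can be summarized, for a single locus in a single sample, by the following sketch. The function name and the `prefer_enumeration` flag are hypothetical stand-ins for the choice made on the input form.

```python
def choose_algorithm(n_alleles, prefer_enumeration=True):
    """Return the algorithm used for one locus in one sample.

    Complete enumeration is only offered for 4 alleles or fewer;
    with 5 alleles or more the Markov chain (MC) method is always used.
    """
    if n_alleles < 1:
        raise ValueError("n_alleles must be at least 1")
    if n_alleles <= 4 and prefer_enumeration:
        return "enumeration"
    return "MC"


if __name__ == "__main__":
    # Hypothetical allele counts for one locus across three samples.
    for n in (3, 4, 7):
        print(n, choose_algorithm(n))
```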

Several important results are provided for each test by this option:

- the P-value associated with Ho (or '-' if no data were available, or only one allele was present, or two alleles were detected but one was represented by only one copy)
- the standard error (S.E.) of this estimate (only if a MC method was used)
- two estimates of Fis: Weir & Cockerham's (1984) estimate (W&C) and Robertson & Hill's (1984) estimate (R&H). The latter has a lower variance under the null hypothesis.
- the number of 'steps': for the complete enumeration algorithm this is the number of different genotypic matrices considered, and for the Markov chain algorithm the number of switches (changes of genotypic matrix) performed

If the S.E. is too large (say S.E. > 0.01), it is wise to run the analysis again with a
larger number of batches (if you tried 100 for the first trial, use 200 next, with
C = 1000). How close the estimate is to the true value depends on the product B × C: the larger,
the better.
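To see why more batches reduce the S.E., here is a sketch of the usual batch-means calculation: each batch yields one P-value estimate, the overall estimate is their mean, and the S.E. is the standard deviation of the batch estimates divided by the square root of B. That Genepop computes its S.E. exactly this way is an assumption; the function name is hypothetical.

```python
from math import sqrt


def batch_estimate(batch_p_values):
    """Combine per-batch P-value estimates into (overall estimate, S.E.)
    using the batch-means method."""
    b = len(batch_p_values)
    if b < 2:
        raise ValueError("at least two batches are needed for an S.E.")
    mean = sum(batch_p_values) / b
    # Sample variance of the batch estimates (divisor B - 1).
    variance = sum((p - mean) ** 2 for p in batch_p_values) / (b - 1)
    return mean, sqrt(variance / b)


if __name__ == "__main__":
    # Hypothetical per-batch estimates from a short run.
    estimate, se = batch_estimate([0.041, 0.055, 0.048, 0.052, 0.044])
    print(estimate, se)
```

Under this calculation, doubling B with comparable batch-to-batch spread shrinks the S.E. by a factor of about the square root of 2.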

For sub-option 3, a global test across loci or across samples is constructed using Fisher's method. This method (sometimes conservative because discrete probabilities are analyzed) is only performed for convenience, and its relevance should first be established (e.g. statistical independence of loci).
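As a reminder of what Fisher's method computes, the sketch below combines k independent P-values through the statistic -2 Σ ln(p), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name is hypothetical and this is not Genepop's code; the closed-form tail probability used here is valid because the number of degrees of freedom is even.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Return (chi-square statistic, combined P-value) for k independent
    P-values, using Fisher's method with 2k degrees of freedom."""
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one P-value is required")
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    statistic = -2.0 * sum(log(p) for p in p_values)
    half = statistic / 2.0
    # Upper tail of chi-square with 2k d.f.:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, exp(-half) * total


if __name__ == "__main__":
    # Hypothetical per-locus P-values.
    print(fisher_combined_p([0.08, 0.20, 0.03]))
```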

General statistical theory shows that there is no uniformly better way to combine P-values of different tests. When an alternative model is specified, it is possible to find a better way of combining results from different data sets than Fisher's method, and usually not by combining P-values. In the present context one such method is the multisample score test of Rousset & Raymond (1995), which defines a global test across loci and/or across samples generalizing the tests of sub-options 1 and 2. The global tests are performed by sub-options 4 and 5, only by the MC algorithm. Independence of loci is also assumed for these global tests. The output file reports global P-value estimates and standard errors per population, per locus, and over all loci and populations. For each global P-value, the average number of switches per test combined is also reported. Since it is tempting to reduce the chain length parameters in this option, special care is needed in checking this accuracy diagnostic (see appendix).

This option generates several large temporary files. The space used temporarily by Genepop can be estimated as: (# of loci + # of pop + 1) * batches * (iterations per batch) * 8 bytes. For example, it will require about 240 MB of temporary hard disk space if you have 10 loci, 50 samples and if you use a chain of 500,000 steps (100 batches of 5000 iterations).
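The temporary space estimate above is easy to check before launching a long run. A minimal sketch, with a hypothetical function name and 8 bytes (octets) per stored value as stated in the formula:

```python
def temp_space_bytes(n_loci, n_pops, batches, iterations_per_batch):
    """Estimated temporary disk space, in bytes, for sub-options 4 and 5:
    (loci + populations + 1) * batches * iterations per batch * 8."""
    return (n_loci + n_pops + 1) * batches * iterations_per_batch * 8


if __name__ == "__main__":
    # The example from the text: 10 loci, 50 samples,
    # 100 batches of 5000 iterations.
    size = temp_space_bytes(10, 50, 100, 5000)
    print(f"{size / 1e6:.0f} MB")
```

For the example in the text this gives 244,000,000 bytes, consistent with the quoted figure of about 240 MB.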

Results are returned via your web browser, and you can then save them to your local machine. You may also choose to have them emailed to you.

*Last Modified on December 1, 2020 by Eleanor Morgan*
