A large family size is required to detect sexual heterogeneity of the recombination fraction using dominant and codominant DNA markers

Maximum likelihood methods for testing sexual heterogeneity of the recombination fraction using dominant and codominant DNA markers are developed. The methods are tested by Monte Carlo computer simulation. Two crosses were investigated: 1) Ab/aB × Ab/aB and 2) Ab/aB × Ab/αB, in which A, a, and α are alleles at a codominant marker and B and b are alleles at a dominant marker. For the first cross, estimates of the recombination fraction are biased by more than 6 cM if family size is small (50). Also, the empirical rate of type I error under homogeneity ranged between 0.0 and 3.7% when using the tabulated threshold corresponding to a 5% significance level. Even for large family sizes, the rate of type I error was close to 0 when the expected rate was 1 or 5%. Empirical power was very low for the self-fertilization cross, which required large family sizes (2,000) and large differences in the recombination fraction between the two sexes (larger than 20 cM). For the second cross (Ab/aB × Ab/αB), both parents are heterozygous but do not have identical genotypes at the codominant marker. Estimates of the recombination fraction for the two sexes were nearly unbiased in the presence of heterogeneity. The rate of type I error was similar to its expected value for this cross. For a low recombination fraction, empirical power for this cross was higher than 50% for a family size of 500 and a true difference in recombination fraction between the two sexes of approx. 10 cM.

Additional key words: dominant markers, heterogeneity, linkage analysis, recombination fraction.


Introduction
Genetic maps have been constructed for most plant and farm animal species. In species for which research resources are large, the genetic markers of choice are usually microsatellites. In contrast, for species without enough resources for genomics research, such as many plants and aquaculture species, the markers widely used are Amplified Fragment Length Polymorphism (AFLP) or Random Amplified Polymorphic DNA (RAPD). These markers are essentially dominant, which means that homozygotes for the dominant allele and heterozygotes cannot be distinguished. In addition, genetic maps for those species utilize microsatellites in order to link dominant markers to known locations in the genome. It has been established that estimates from linkage analysis using dominant markers are seriously biased when the markers are in repulsion phase and the number of offspring is small (Säll and Nilsson, 1994). It is not known whether estimates of the recombination fraction are biased when the linkage analysis includes both codominant and dominant loci.
Heterogeneity occurs when the recombination fraction differs between individuals or between individuals of different sexes (sexual heterogeneity). Evidence of sexual heterogeneity has been reported in several plant species, such as Pinus radiata (Moran et al., 1983), maize (Robertson, 1984), Pinus taeda (Groover et al., 1995) and species of the genus Populus (Yin et al., 2001). Differences in the recombination rate between the two sexes have also been reported for aquaculture species such as rainbow trout and Atlantic salmon (Sakamoto et al., 2000; Moen et al., 2004).
A general procedure for testing sexual heterogeneity was developed by Wu et al. (2002). However, they did not investigate whether estimates of the recombination fraction using dominant and codominant markers are biased under sexual heterogeneity, nor did they investigate power for alternative family sizes when dominant and codominant markers are used in the same linkage analysis.
The objectives of this paper are 1) to develop maximum likelihood equations to test for heterogeneity of the recombination fraction in males and females using a codominant and a dominant marker, 2) to determine by Monte Carlo simulation whether estimates of the recombination fraction in the two sexes are biased when one of the markers is dominant and the other is codominant, 3) to evaluate empirically the magnitude of type I errors in the test of sexual heterogeneity of the recombination fraction, and 4) to compute the empirical power of detection of heterogeneity for alternative sizes of a large full-sib family.

Theory

Maximum likelihood estimation under sexual heterogeneity using a codominant and a dominant marker
The structure considered is a full-sib family of relatively large size, as often found in many plant and fish species. Consider a codominant marker (with alleles A, a and α) and a dominant marker (with alleles B and b). Offspring at the dominant marker can be BB, Bb, or bb, but BB and Bb individuals cannot be distinguished from each other. Let $c_s$ be the recombination fraction in sires and $c_d$ the recombination fraction in dams. Two types of crosses were considered: 1) Ab/aB × Ab/aB and 2) Ab/aB × Ab/αB. The first corresponds to self-fertilization or to the cross between two outbred plants heterozygous for the same alleles at the codominant marker, so that the parental origin of the codominant alleles cannot always be traced in the offspring. The second allows all offspring to be informative at the codominant marker. These crosses represent all situations where meioses from both sexes can be used to estimate recombination fractions using one dominant and one codominant marker. For simplicity, and given the relatively large size of the full-sib family, linkage phases were assumed to be known.

Cross Ab/aB × Ab/aB
The gametes contributed by the sire and dam and the corresponding frequencies of the offspring genotypes are given in Table 1. Using this table, the likelihood equation can be written as:

$$L(c_s, c_d) = k \prod_{i} \prod_{j} \phi_{ij}^{\,n_{ij}}$$

where k is a constant, $n_{ij}$ is the count of offspring with genotype «i» at the codominant marker (i = AA, Aa, aa) and genotype «j» at the dominant marker (j = B-, bb), and $\phi_{ij}$ is the expected frequency of that genotype class in Table 1; for example, $\phi_{aabb} = \tfrac{1}{4} c_d c_s$.
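The likelihood for this cross can be illustrated with a minimal Python sketch. The class frequencies below are derived here from the gamete frequencies of each parent under known linkage phase; they are not copied from Table 1, and the function names `cross1_freqs` and `log_likelihood` are hypothetical.

```python
import math

def cross1_freqs(cs, cd):
    """Expected offspring class frequencies for Ab/aB x Ab/aB (phase known).

    Keys are (codominant genotype, dominant phenotype); derived by pairing
    sire gametes Ab, aB (each (1-cs)/2) and AB, ab (each cs/2) with the
    corresponding dam gametes (same form, with cd).
    """
    return {
        ("AA", "bb"): (1 - cs) * (1 - cd) / 4,
        ("AA", "B-"): (cs + cd - cs * cd) / 4,
        ("Aa", "bb"): (cs + cd - 2 * cs * cd) / 4,
        ("Aa", "B-"): (2 - cs - cd + 2 * cs * cd) / 4,
        ("aa", "bb"): cs * cd / 4,
        ("aa", "B-"): (1 - cs * cd) / 4,
    }

def log_likelihood(counts, freqs):
    """Multinomial log-likelihood, omitting the additive constant ln k."""
    return sum(n * math.log(freqs[g]) for g, n in counts.items() if n > 0)
```

Under these derived frequencies, swapping `cs` and `cd` leaves every class frequency unchanged, so in this sketch the data from this cross carry no information on which sex has the larger recombination fraction.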

Cross Ab/aB × Ab/αB
The gametes contributed by the sire and dam and the corresponding frequencies of the offspring genotypes are given in Table 2. The likelihood equation is:

$$L(c_s, c_d) = k \prod_{i} \prod_{j} \phi_{ij}^{\,n_{ij}}$$

where k is a constant, $n_{ij}$ is the count of offspring with genotype «i» at the codominant marker (i = AA, Aα, Aa, aα) and genotype «j» at the dominant marker (j = B-, bb), and $\phi_{ij}$ is the expected frequency of that genotype class in Table 2.
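A comparable sketch for this cross is given below. It assumes that the first parent listed (Ab/aB) is the sire and the second (Ab/αB) is the dam; the frequencies are again derived from the parental gamete frequencies under known phase rather than taken from Table 2, and the function name `cross2_freqs` is hypothetical.

```python
def cross2_freqs(cs, cd):
    """Expected offspring class frequencies for Ab/aB (sire) x Ab/αB (dam).

    Keys are (codominant genotype, dominant phenotype), with the sire allele
    written first in the genotype. Each codominant genotype has total
    frequency 1/4; the bb class requires a b-carrying gamete from both parents.
    """
    bb = {
        "AA": (1 - cs) * (1 - cd) / 4,  # sire Ab (parental), dam Ab (parental)
        "Aα": (1 - cs) * cd / 4,        # sire Ab (parental), dam αb (recombinant)
        "aA": cs * (1 - cd) / 4,        # sire ab (recombinant), dam Ab (parental)
        "aα": cs * cd / 4,              # sire ab (recombinant), dam αb (recombinant)
    }
    freqs = {}
    for genotype, f_bb in bb.items():
        freqs[(genotype, "bb")] = f_bb
        freqs[(genotype, "B-")] = 0.25 - f_bb
    return freqs
```

In contrast to the cross between parents carrying the same codominant alleles, these frequencies change when `cs` and `cd` are exchanged, which is consistent with all offspring being informative at the codominant marker.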

Testing linkage heterogeneity in the sexes
Heterogeneity of the recombination fraction between sexes can be tested using a likelihood ratio test (LRT):

$$LRT = -2 \ln \left[ \frac{L(\hat{c})}{L(\hat{c}_s, \hat{c}_d)} \right]$$

where $L(\hat{c}_s, \hat{c}_d)$ is the maximum likelihood under heterogeneity and $L(\hat{c})$ is the maximum likelihood under homogeneity, i.e., when $c_s = c_d$. Under the null hypothesis of homogeneity, LRT is asymptotically distributed as a χ² with one degree of freedom.
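The test statistic and its tabulated p-value can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be maximized log-likelihoods obtained elsewhere. For one degree of freedom the upper tail of the χ² distribution reduces to the complementary error function, so only the standard library is needed.

```python
import math

def lrt_statistic(loglik_heterogeneity, loglik_homogeneity):
    """LRT = 2 * [ln L(cs_hat, cd_hat) - ln L(c_hat)]."""
    return 2.0 * (loglik_heterogeneity - loglik_homogeneity)

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
```

The tabulated thresholds of about 3.841 (P < 0.05) and 6.635 (P < 0.01) correspond to p-values of 0.05 and 0.01 under this function.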

Monte Carlo simulation
A full-sib family of varying size (50, 100, 500 or 2,000 offspring) was simulated in order to investigate how progeny size might affect bias, type I error, and power of detection of heterogeneity of the recombination fraction in the two sexes. Linkage phase was assumed to be known. The probability of transmission of either allele at each marker from sire and dam to offspring was assumed to be 1/2.
The gametic contribution of the sire to each offspring was simulated using random drawings from a uniform distribution. For the cross Ab/aB × Ab/aB, the gametic counts for sires were generated as follows. If a drawing was in the interval between 0 and $\tfrac{1}{2}(1 - c_s)$, the gametic count for Ab was increased by one. If the drawing was between $\tfrac{1}{2}(1 - c_s)$ and $\tfrac{1}{2}$, the gametic count for AB was increased by one. If the drawing was between $\tfrac{1}{2}$ and $\tfrac{1}{2}(1 + c_s)$, the gametic count for ab was increased by one. If the drawing was between $\tfrac{1}{2}(1 + c_s)$ and 1, the gametic count for aB was increased by one. The gametes from the dams were generated following the same procedure with $c_d$. The offspring genotypes were constructed from the sire and dam contributions. The offspring genotypes from the cross Ab/aB × Ab/αB were generated following the same principles. Progeny genotypic counts were used to estimate the recombination fractions by grid search. Each simulation set was replicated 1,000 times. Simulated recombination fractions for sires and dams were 0.05, 0.15, 0.25 and 0.35, in all combinations. The LRT for heterogeneity of the recombination fraction between the sexes was computed for each replicate.
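The simulation of one replicate for the cross Ab/aB × Ab/aB can be sketched in Python as below. The interval boundaries follow the description above; the grid step of 0.01, the grid range (0.01 to 0.50), the class frequencies (derived from the parental gamete frequencies under known phase) and all function names are assumptions made for illustration, since the text does not specify them.

```python
import math
import random
from collections import Counter

def draw_gamete(c, rng):
    """Draw one gamete from an Ab/aB parent with recombination fraction c."""
    u = rng.random()
    if u < 0.5 * (1 - c):
        return "Ab"   # non-recombinant
    if u < 0.5:
        return "AB"   # recombinant
    if u < 0.5 * (1 + c):
        return "ab"   # recombinant
    return "aB"       # non-recombinant

def offspring_class(sire_gamete, dam_gamete):
    """Observable class: codominant genotype plus dominant phenotype."""
    codominant = "".join(sorted(sire_gamete[0] + dam_gamete[0]))  # AA, Aa or aa
    dominant = "bb" if sire_gamete[1] == dam_gamete[1] == "b" else "B-"
    return codominant, dominant

def simulate_family(size, cs, cd, rng):
    """Genotypic class counts for one simulated full-sib family."""
    return Counter(
        offspring_class(draw_gamete(cs, rng), draw_gamete(cd, rng))
        for _ in range(size)
    )

def cross1_freqs(cs, cd):
    """Expected class frequencies for Ab/aB x Ab/aB (derived, phase known)."""
    return {
        ("AA", "bb"): (1 - cs) * (1 - cd) / 4,
        ("AA", "B-"): (cs + cd - cs * cd) / 4,
        ("Aa", "bb"): (cs + cd - 2 * cs * cd) / 4,
        ("Aa", "B-"): (2 - cs - cd + 2 * cs * cd) / 4,
        ("aa", "bb"): cs * cd / 4,
        ("aa", "B-"): (1 - cs * cd) / 4,
    }

def log_likelihood(counts, cs, cd):
    """Multinomial log-likelihood, omitting the additive constant."""
    freqs = cross1_freqs(cs, cd)
    return sum(n * math.log(freqs[g]) for g, n in counts.items() if n > 0)

def grid_search(counts, step=0.01):
    """Return the (cs, cd) grid point with the highest log-likelihood."""
    grid = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    return max(
        ((cs, cd) for cs in grid for cd in grid),
        key=lambda pair: log_likelihood(counts, *pair),
    )
```

A replicate would then be obtained with, for example, `grid_search(simulate_family(500, 0.05, 0.35, random.Random(1)))`; the estimate under homogeneity can be found by restricting the same grid to `cs == cd`.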
Bias of the estimates of the recombination fraction in sex j was computed as:

$$\mathrm{Bias}(\hat{c}_j) = \frac{1}{rep} \sum_{i=1}^{rep} \hat{c}_{ji} - c_j$$

where $\hat{c}_{ji}$ is the estimate of the recombination fraction for the i-th replicate and the j-th sex (j = 1, 2), rep is the number of replicates in the simulation and $c_j$ is the true (simulated) recombination fraction in sex j. The variance of the estimates for sex j, $V(\hat{c}_j)$, was computed as the variance of the $\hat{c}_{ji}$ across replicates about their mean. The mean square error (MSE) of the estimates for sex j was computed as:

$$\mathrm{MSE}(\hat{c}_j) = \frac{1}{rep} \sum_{i=1}^{rep} \left( \hat{c}_{ji} - c_j \right)^2$$

The empirical rate of type I error was computed in simulation sets in which the simulated (true) recombination fractions were identical in males and females, as the number of replicates with an LRT exceeding the thresholds at P < 0.01 and P < 0.05 for a χ² with 1 degree of freedom, divided by the total number of replicates. Empirical power was computed in the simulation sets in which the simulated recombination fractions for the male and the female were different, as the number of replicates that were significant at the tabulated thresholds, divided by the total number of replicates.
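The summary statistics described above can be sketched in Python as follows. The function names are hypothetical, and the use of the number of replicates (rather than the number of replicates minus one) as the divisor of the variance is an assumption, chosen so that the mean square error equals the variance plus the squared bias exactly.

```python
def bias(estimates, true_c):
    """Mean of the estimates across replicates minus the simulated value."""
    return sum(estimates) / len(estimates) - true_c

def variance(estimates):
    """Variance across replicates (divisor = number of replicates: assumption)."""
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)

def mse(estimates, true_c):
    """Mean squared deviation of the estimates from the simulated value."""
    return sum((e - true_c) ** 2 for e in estimates) / len(estimates)

def rejection_rate(lrt_values, threshold):
    """Fraction of replicates whose LRT exceeds a tabulated threshold.

    This is the empirical type I error when cs == cd was simulated and the
    empirical power when cs != cd was simulated.
    """
    return sum(v > threshold for v in lrt_values) / len(lrt_values)
```

Comparing `variance` with `mse` for the same set of estimates shows the contribution of bias directly, because the two differ by the squared bias.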

Results
First, recombination fractions were estimated using the full model (sexual heterogeneity) under the null hypothesis (homogeneity). Estimates were generally biased in the cross Ab/aB × Ab/aB and unbiased in the cross Ab/αB × Ab/aB. For example, bias in the cross Ab/aB × Ab/aB for a family size of 50 varied from 2 to 12 cM.
Second, the recombination fractions in the two sexes were estimated using the full model under the alternative hypothesis (heterogeneity). The estimates of the recombination fractions in sires and dams and their variances, together with the mean square errors, are presented in Table 3 for the cross Ab/aB × Ab/aB and in Table 4 for the cross Ab/αB × Ab/aB. If the family size is relatively small (50 or 250 offspring), the estimates of the recombination fraction in the two sexes for the Ab/aB × Ab/aB cross are biased. The bias can also be seen by comparing the variances of the estimates of the recombination fraction with their corresponding mean square errors. For small family sizes, the bias can be more than 0.06 units of recombination fraction (approx. 6 cM) and tends to be negative. Therefore, the loss of information when both parents are heterozygous for the same alleles at the codominant marker affects the estimation of the recombination fraction. For large family sizes (500 or 2,000), estimates of the recombination fractions are nearly unbiased, since the averages of the estimates are similar to the simulated values. Also, there is good agreement between the variances of the estimates for the two sexes and their respective mean square errors. If the codominant marker is fully informative (cross Ab/αB × Ab/aB), the estimates of the recombination fractions in the two sexes are nearly unbiased (Table 4). Bias is 0.0082 or less in all situations simulated, and it decreases as family size increases. In addition, the variances of the estimates of the recombination fractions were very similar to their corresponding mean square errors.
The next step was to evaluate whether the maximum likelihood method using tabulated values is appropriate for testing heterogeneity of the recombination fraction between sexes. The empirical percentage of significant results at P < 0.01 and P < 0.05 when simulating the null hypothesis (homogeneity) in a variety of scenarios is given in Table 5. That is, the empirical type I error was computed for each set and compared with its expected value of 1 or 5%. For the cross Ab/aB × Ab/aB, the rate at which heterogeneity is declared when in reality it does not exist is clearly lower than expected for any sample size. For example, for an expected rate of type I error of 5%, the empirical values in the simulation ranged between 0.0 and 3.7%. It seems that the rate of type I error is much lower than expected when the recombination fraction is below 0.05. In the cross Ab/αB × Ab/aB, the estimates of the recombination fractions under homogeneity were in good agreement with the simulated values, and the rate of type I error was generally in good agreement with its expected value.
The next relevant question is which family size is needed to detect heterogeneity when heterogeneity is real. The empirical power, computed as the percentage of significant results at P < 0.05 and P < 0.01, is presented in Table 6 for both types of crosses. Power to detect heterogeneity in the cross Ab/aB × Ab/aB is very low for all cases considered except when the number of offspring is large and the difference between sexes is also large. For example, for 2,000 offspring and recombination fractions of 0.05 and 0.35, the power is 88.5% and 96.4% at significance levels of P < 0.01 and P < 0.05, respectively. This number of offspring is feasible in many plant and fish species. However, using such a large family may not be practical given the cost and effort required. On the other hand, for the cross Ab/αB × Ab/aB, power to detect heterogeneity at a significance level of P < 0.05 is higher than 50% for differences in recombination rate between the sexes of 0.10 (approx. 10 cM) and a family size of 500, except when the recombination rates in the two sexes were high (over 0.25).

Discussion
Heterogeneity of the recombination fraction between sexes using linkage information on dominant and codominant markers is investigated in this paper. The first step was to construct maximum likelihood equations that allow testing heterogeneity of the recombination fraction in the two sexes. The methods were implemented in software and tested by computer simulation.
In a second step, bias in estimating the recombination fraction was investigated. It can be concluded from the results of the simulation experiments using crosses Ab/aB × Ab/aB that 1) sexual heterogeneity can be detected when in reality there is no heterogeneity, and 2) estimation of the recombination fraction is biased for small progeny groups. Bias in estimating the recombination fraction has already been reported when the two markers are dominant and in repulsion phase (Säll and Nilsson, 1994) or when non-informative offspring are neglected in half-sib families without genotype information from the dams (Gómez-Raya, 2001). The bias reported in this paper is likely due to limited family size, since it decreases as family size increases in the simulation. This is a general problem of maximum likelihood estimators, which are only asymptotically unbiased. Linkage reports might therefore provide estimates of recombination fractions (and genetic distances) with different bias depending on the family size of each study. In addition, the family size required for unbiased estimation of the recombination fraction is prohibitive for economic reasons. Typing more codominant markers, such as microsatellites, in the chromosomal region is more cost-efficient. Bias for the other cross (Ab/αB × Ab/aB) was much smaller, even for small family sizes.
A third step was to evaluate the magnitude of the empirical type I error for the two crosses investigated. The computations in the simulation made use of threshold values tabulated from a χ² distribution. This represents the most common situation, in which the researcher performs the test using tabulated values. More advanced computational methods can use permutations of the same data to generate the distribution of the test statistic under the null hypothesis. The underestimation of the empirical threshold values in the first type of cross would support the need for such methods in those situations, even for relatively large family sizes. However, generally good agreement was found for the cross Ab/αB × Ab/aB, which would support the use of tabulated values.
On the other hand, linkage phase was assumed to be known in this study. This may not be the general situation found in practice. However, with the large family sizes required for testing heterogeneity, the maximum likelihood of the most likely phase would be much higher than that of the less likely phase. Consequently, the impact of this assumption on the results should be rather small.
The general conclusion of this paper is that linkage heterogeneity between the two sexes can be reliably detected using family sizes of at least 500 offspring, as long as the genotypes at the codominant marker are not identical in the two parents. For smaller family sizes and other genotype configurations, power for detection is only high when the difference in the recombination fraction between the two sexes is rather large (more than 20 cM). The-

Table 4. Simulation results from the cross Ab/αB × Ab/aB, where α is an allele different from A and a. Bias in estimating the recombination fraction in sires [Bias($\hat{c}_s$)] and dams [Bias($\hat{c}_d$)], variance of the estimates of the recombination fractions for sires [V($\hat{c}_s$)] and dams [V($\hat{c}_d$)], and mean square error of the estimates for sires [MSE($\hat{c}_s$)] and dams [MSE($\hat{c}_d$)]. $c_s$ = simulated recombination fraction in sires. $c_d$ = simulated recombination fraction in dams. Size = number of full-sib offspring in the family. All values in the table except the size are multiplied by 100.

Table 1. Gametes from sire and dam and resulting offspring genotypes with their expected frequencies. $c_s$: recombination fraction in sires. $c_d$: recombination fraction in dams. The cross is Ab/aB × Ab/aB.

Table 2. Gametes from sire and dam and resulting offspring genotypes with their expected frequencies. $c_s$: recombination fraction in sires. $c_d$: recombination fraction in dams. The cross is Ab/aB × Ab/αB.

Table 3. Simulation results from the cross Ab/aB × Ab/aB. Bias in estimating the recombination fraction in sires [Bias($\hat{c}_s$)] and dams [Bias($\hat{c}_d$)], variance of the estimates of the recombination fractions for sires [V($\hat{c}_s$)] and dams [V($\hat{c}_d$)], and mean square error of the estimates for sires [MSE($\hat{c}_s$)] and dams [MSE($\hat{c}_d$)]. $c_s$ = simulated recombination fraction in sires. $c_d$ = simulated recombination fraction in dams. Size = number of full-sib offspring in the family. All values in the table except the size are multiplied by 100.

Table 5. Empirical rate of type I error when testing heterogeneity between sexes at significance levels of 0.05 (P < 0.05) and 0.01 (P < 0.01), under true homogeneity of the recombination fraction between sexes and for varying numbers of offspring, in crosses Ab/aB × Ab/aB and Ab/αB × Ab/aB. c: simulated recombination fraction. All values in the table except the size are multiplied by 100.

Table 6. Empirical power for detection of heterogeneity between sexes at significance levels of 0.05 (P < 0.05) and 0.01 (P < 0.01), for different true recombination fractions in sires ($c_s$) and dams ($c_d$) and for varying numbers of offspring.