Crossing Over…Markov Meets Mendel

Chromosomal crossover is a biological mechanism to combine parental traits. It is perhaps the first mechanism ever taught in any introductory biology class. The formulation of crossover, and resulting recombination, came about 100 years after Mendel's famous experiments. To a great extent, this formulation is consistent with the basic genetic findings of Mendel. More importantly, it provides a mathematical insight for his two laws (and corrects them). From a mathematical perspective, and while it retains similarities, genetic recombination guarantees diversity so that we do not rapidly converge to the same being. It is this diversity that made the study of biology possible. In particular, the problem of genetic mapping and linkage—one of the first efforts towards a computational approach to biology—relies heavily on the mathematical foundation of crossover and recombination. Nevertheless, as students we often overlook the mathematics of these phenomena. Emphasizing the mathematical aspect of Mendel's laws through crossover and recombination will prepare the students to make an early realization that biology, in addition to being experimental, IS a computational science. This can serve as a first step towards a broader curricular transformation in teaching biological sciences. I will show that a simple and modern treatment of Mendel's laws using a Markov chain will make this step possible, and it will only require basic college-level probability and calculus. My personal teaching experience confirms that students WANT to know Markov chains because they hear about them from bioinformaticists all the time. This entire exposition is based on three homework problems that I designed for a course in computational biology. A typical reader is, therefore, an instructional staff member or a student in a computational field (e.g., computer science, mathematics, statistics, computational biology, bioinformatics). 
However, other students may easily follow by omitting the mathematically more elaborate parts. I kept those as separate sections in the exposition.


Mendel and High School Biology
Sexually reproducing organisms generally combine heritable traits from two parents. The biological process that combines those traits is called meiosis. While mutations could occur during meiosis, most of the variation arises from the combinations of parental traits. How do these parental traits combine? The dominant theory at the time was that some sort of blending or averaging took place. However, such a mode of inheritance would result in an average of all ancestors after only a modest number of generations (imagine repeatedly mixing colors). Instead, by performing experiments on plants, Mendel pointed out the existence of discrete elements that combine but do not mix. Figure 1 shows the simulated number of types of individuals as a function of time. Averaging, with traits taking real values in $[1,10]$, is used on one population, and the model described in the section "A Simple Model", with elements (later called alleles) taking discrete values in $\{0,1\}$, is used on another. Mutations are ignored. In both cases, a population size of 100 is kept constant for the entire duration of the simulation (100 time steps). The simulation is repeated 1,000 times to obtain an average for each time step.
Mendel formulated the concept of a gene (unit of inheritance), and hypothesized that inheritance is governed by the following two laws of genetics:
1. Segregation: Each sexually reproducing organism has two alleles (copies) for each gene, one inherited from each parent; and in turn will contribute, with equal probability ($1/2$), only one of these two alleles.
2. Independent assortment: Alleles of different genes are inherited independently (later deemed not so accurate).
The state of a gene, the genotype, is determined by the two alleles. The resulting trait, the phenotype, is then a function of this state. When the alleles are the same, the gene, or equivalently the genotype, is homozygous; otherwise, it is heterozygous. For example, if an allele can be either a or A, then the possible genotypes are aa, aA, Aa, and AA. Table 1 shows the possible segregations of parental genotypes when at least one of them is heterozygous.
In a dominant/recessive mode where A is dominant, the corresponding phenotype is obtained as a function of the genotype as shown in Table 2, leading to a 3:1 ratio, a 1:1 ratio, and a 1:0 ratio of dominant to recessive phenotypes, respectively.
Students often overlook that these ratios are not simply based on counting the entries, but are the result of the segregation law: each allele is contributed with equal probability, i.e., $1/2$, resulting in a probability of $1/2 \cdot 1/2 = 1/4$ for each entry in the tables. Table 3 shows another example involving two heterozygous dominant/recessive genotypes that lead to a 9:3:3:1 ratio of phenotypes. In addition to the segregation law, students should be reminded that this ratio assumes that the law of independent assortment holds: alleles of different genes are inherited independently, resulting in a probability of $1/2 \cdot 1/2 = 1/4$ for each assortment (refer to the next section for a mathematical definition of independence), thus a probability of $1/4 \cdot 1/4 = 1/16$ for each entry in the table.
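As a minimal sketch (not part of the original exposition), the ratios can be recomputed from the two laws by enumerating allele contributions, each with probability 1/2, instead of counting table entries. The function names `phenotype`, `cross`, and `dihybrid` are hypothetical, and uppercase letters are assumed to denote dominant alleles.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def phenotype(genotype):
    """Dominant/recessive mode: one uppercase allele is enough to dominate."""
    return "dominant" if any(allele.isupper() for allele in genotype) else "recessive"


def cross(parent1, parent2):
    """Phenotype distribution for one gene.

    Segregation law: each parent contributes either of its two alleles
    with probability 1/2, so every pairing has probability 1/4.
    """
    dist = Counter()
    for a, b in product(parent1, parent2):
        dist[phenotype(a + b)] += Fraction(1, 2) * Fraction(1, 2)
    return dist


def dihybrid(parent1, parent2):
    """Joint phenotype distribution for two genes.

    Assumes independent assortment: joint probabilities are products
    of the single-gene probabilities.
    """
    first = cross(parent1[0], parent2[0])
    second = cross(parent1[1], parent2[1])
    return {(f, s): first[f] * second[s] for f in first for s in second}


mono = cross("Aa", "Aa")                       # one heterozygous gene per parent
di = dihybrid(("Aa", "Bb"), ("Aa", "Bb"))      # two heterozygous genes per parent
print(dict(mono))
print(di)
```

Multiplying the printed probabilities by 4 (one gene) or 16 (two genes) gives the familiar integer ratios; the second call relies on independent assortment, which the later sections revisit.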

Chromosome, Crossover, and Recombination
About 100 years later, it was established that the physical structure underlying Mendel's laws is the chromosome (for simplicity, a long molecule of DNA). This discovery matched Mendel's experiments really well: In diploid organisms like us, chromosomes come in pairs (thus the name diploid), one from each parent! With few exceptions, each chromosome of the pair has copies of the same genes (special stretches of DNA) arranged in the same order: the alleles! In an attempt to explain experimental results and confirm Mendel's laws, chromosomal crossover was formulated and described by Thomas Morgan (coincidentally, his student John Northrop was a teacher of botany at Hunter College, the author's institution), but demonstrated only about 20 years later. Crossover is a mechanism that occurs at the early stages of the meiotic prophase, and combines the two chromosomes of the pair into one, a process called genetic recombination. During this process, the chromosome of the pair that is the source of the allele alternates every so often. Exactly when the switch (the crossover) happens is almost arbitrary.
When two alleles come from different chromosomes of the pair, their corresponding genes are said to recombine (can you identify the recombinations in Table 3?). Figure 2 illustrates a genetic recombination with one crossover.

A Slight Discrepancy and Genetic Linkage
Mendel's laws (segregation and independent assortment) dictate that genetic recombination occurs with a probability of $1/2$. Let's re-examine why this holds true. Let a and A be the two alleles of gene $i$ on the two chromosomes. Similarly, let b and B represent the same for gene $j$, respectively. Chromosomal crossover will result in recombination of gene $i$ and gene $j$ if one of the two assortments aB and Ab occurs. Since each allele is contributed with equal probability (segregation), both a and B are contributed with probability $1/2$. Since alleles of different genes are inherited independently (independent assortment), the assortment aB occurs with probability $1/2 \cdot 1/2 = 1/4$ (refer to the next section for a mathematical definition of independence). The same analysis applies for the assortment Ab, leading to an overall recombination probability of $1/4 + 1/4 = 1/2$.
However, it has been observed that some pairs of genes show a correlation in their alleles, e.g., their probability of recombination is less than $1/2$. In this case, there is a linkage between the genes. How can we now incorporate this notion into the mathematics of Mendel's laws, which so far have relied on the fact that genes are not correlated (assorted independently)? Fortunately, a simple probabilistic model based on Figure 2 (1 crossover) will capture the effect of linkage, and as a result, alleles that are near each other on a chromosome will tend to be inherited together. The inaccuracy of Mendel's law of independent assortment lies therein. Nevertheless, one should still expect that genes which are far from each other on a chromosome (or on different chromosomes altogether) will assort independently, as Mendel once observed. It will require a better probabilistic model to reflect those two contradictory behaviors (genetic linkage and independence); the later introduction of the Markov chain will take care of this. But first, I will present a simple probabilistic model for genetic linkage. And before doing so, let's review some basic mathematics.

What Do We Need to Know? Probability
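Let $S=\{1,\ldots,n\}$. A subset of $S$, $E \subseteq S$, is considered as an event (but not all events are subsets of $S$). Given a variable $x$, define the following probabilities of events:
$$P(x=i) = \frac{1}{n}, \quad 1 \le i \le n \quad \text{(uniformly random)}$$
$$P(E) = P(x \in E) = \frac{|E|}{n}$$
where $|\cdot|$ denotes the size of a set. So $P(S)=1$. The negation of an event will always satisfy:
$$P(\text{not } E) = 1 - P(E)$$
Given two events $E_1$ and $E_2$:
$$P(E_1 \text{ or } E_2) = P(E_1) + P(E_2) \quad \text{if } E_1 \text{ and } E_2 \text{ are exclusive}$$
$$P(E_1 \text{ and } E_2) = P(E_1)P(E_2) \quad \text{if } E_1 \text{ and } E_2 \text{ are independent}$$
For instance, if $E_1$ is an event of probability $q$ and $E_2 \subseteq S$, then, when the two are independent, $P(E_1 \text{ and } E_2) = q|E_2|/n$. In general, however, $E_1$ and $E_2$ may not be independent. So we define the probability of $E_2$ conditional on $E_1$, i.e., the probability of $E_2$ given that $E_1$ occurs:
$$P(E_2 \mid E_1) = \frac{P(E_1 \text{ and } E_2)}{P(E_1)}$$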

Matrix Multiplication
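I will assume some familiarity with matrices. If, however, this notion is unfamiliar, the parts of the exposition that use matrices may be skipped. Only $2 \times 2$ matrices will be considered in this exposition. The multiplication of $2 \times 2$ matrices is defined below.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \end{pmatrix}$$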

Geometric Series
One of the series that is almost invariably covered in basic calculus is the geometric series. For $a \ne 1$,
$$\sum_{i=0}^{k} a^i = 1 + a + a^2 + \ldots + a^k = \frac{1-a^{k+1}}{1-a}$$

Exponential Limit
This is one of the basic expressions covered when studying limits.
$$\lim_{n \to \infty}\left(1+\frac{a}{n}\right)^n = e^a, \quad e = 2.71828183\ldots$$
Therefore, $(1+a/n)^n \approx e^a$ for large $n$.

Logarithm
Here's the definition of natural logarithm and some of its properties:
$$\ln a = b \Leftrightarrow a = e^b$$
$$\ln a^b = b \ln a$$

Harmonic Series
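Another famous encounter is the harmonic series and its approximation. For large $n$,
$$\sum_{i=1}^{n} \frac{1}{i} = 1 + \frac{1}{2} + \ldots + \frac{1}{n} \approx \ln n + \gamma, \quad \gamma = 0.5772\ldots$$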

A Simple Model
Motivated by Figure 2, a uniform 1-crossover model can be constructed as follows: Consider a chromosome with $n$ genes, i.e., $n$ alleles on each chromosome of the pair. A crossover $x$ is equal to $i$ if it separates gene $i$ and gene $i+1$, where gene $n+1$ is hypothetical when $x=n$, i.e., no crossover. Assume that $x$ is uniform in $\{1,\ldots,n\}$ (thus the name of the model).

Linkage
Based on the above setting, $x$ takes any value in $\{1,\ldots,n\}$ with probability $1/n$. Two genes at a distance $0 \le d < n$, say $i$ and $i+d$, will recombine if $x$ is in $\{i,\ldots,i+d-1\}$, i.e., with probability $1/n + \ldots + 1/n$ ($d$ times),
$$p_d = \frac{d}{n}$$
This confirms that genes within a close distance (small $d$) on the chromosome are less likely to be subject to recombination (genetic linkage). Genes that are far apart (large $d$) have a high probability (up to $1-1/n$) of recombination, but are they independent (see "What Is Wrong?" section)?
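The counting argument for the uniform 1-crossover model can be checked by brute force. The sketch below is an illustration under the model's stated assumptions (a single crossover position x, uniform in {1, ..., n}, with x = n meaning no crossover); the function name `recombination_probability` is hypothetical.

```python
from fractions import Fraction


def recombination_probability(n, i, d):
    """Exact P(genes i and i+d recombine) in the uniform 1-crossover model.

    The crossover x separates gene x from gene x+1, so genes i and i+d
    end up on different chromosomes exactly when i <= x <= i+d-1.
    """
    if not (1 <= i and 0 <= d and i + d <= n):
        raise ValueError("need 1 <= i and i + d <= n with d >= 0")
    favorable = sum(1 for x in range(1, n + 1) if i <= x <= i + d - 1)
    return Fraction(favorable, n)


# Close genes recombine rarely; distant genes recombine often.
for d in (1, 5, 9):
    print(d, recombination_probability(10, 1, d))
```

The result depends only on the distance d and on n, not on the position i, which is the linkage behavior the model is meant to capture.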

Segregation
To find the probability that a given allele of gene $i$ is inherited, let $E$ with probability $q$ be the event that the recombination process starts on the given chromosome of the pair. This event and that genes 1 and $i$ recombine (an event of probability $(i-1)/n$) are independent. The probability of inheriting the given allele is:
$$P(E \text{ and genes 1 and } i \text{ do not recombine}) + P(\text{not } E \text{ and genes 1 and } i \text{ recombine})$$
The addition is justified by the exclusivity of the events: a given allele is inherited when the process starts on the given chromosome and genes 1 and $i$ do not recombine, or when the process starts on the other chromosome and genes 1 and $i$ recombine. Due to the independence of $E$ and recombination, the above becomes:
$$q\left(1-\frac{i-1}{n}\right) + (1-q)\frac{i-1}{n}$$
A reasonable assumption is that $q=1/2$ and, in this case, the above evaluates to $1/2$ for every $i$, as predicted by the segregation law.

Genetic Mapping
Genetic mapping is the problem of placing the genes along the chromosome in their correct relative order. The bad news: It is hard! The good news: Genetic linkage can be used to infer genetic mapping. Though obsolete (it has been done), genetic mapping can be considered to be the first effort towards a computational approach to biology. How does it work?
In the uniform 1-crossover model, genetic linkage tells us that the probability of recombination of two genes is proportional to the distance between these genes. Now consider the genotyping depicted in Table 4 where frequency of recombination can be used as a measure of distance. In a way analogous to Table 4, analyzing the frequency of different pairs of the phenotypes A, B, and C might reveal, for instance, that B and C recombine more often than A and B; therefore, we infer that B is closer to A than to C. Such arguments help us to derive the gene order on the chromosome (relative order, not exact distances). While it may be hard to set up the experiment and obtain enough offspring to estimate probabilities, such arguments were definitely behind the construction of the early genetic maps, e.g., the first map of the human genome (all the chromosomes) in 1987.

What Is Wrong?
The reader may choose to skip this section and move on to the next. The uniform 1-crossover model is very insightful in explaining Mendel's law of segregation with independent assortment corrected to reflect genetic linkage. However, it suffers from a few deficiencies.

Linkage: OK But…
Nothing is seriously wrong about this aspect. By assigning lower probabilities of recombination for smaller distances, the distance between two genes justifies their linkage when they do not assort independently. However, the actual probability of recombination may not necessarily be proportional to distance or have a dependence on the chromosome length, as in $p_d = d/n$ (but more on this in the Markov section).

Segregation: Too Sensitive
The probability of inheriting a given allele is contingent on the probability that the recombination process starts on the given chromosome of the pair, previously called $q$. If $q=1/2$, the probability of inheriting a given allele is $1/2$, as it should be by the segregation law. While this is a biologically reasonable assumption on $q$, the segregation law stands very sensitive to this particular choice. A slight deviation from $q=1/2$ could result in a similar deviation in the probability of inheriting the given allele. Let $q=1/2-\epsilon$, then this probability for gene $i$ is (from the "Segregation" section):
$$\left(\frac{1}{2}-\epsilon\right)\left(1-\frac{i-1}{n}\right) + \left(\frac{1}{2}+\epsilon\right)\frac{i-1}{n} = \frac{1}{2} + \epsilon\left(\frac{2(i-1)}{n}-1\right)$$
When $i=n$, i.e., $(i-1)/n \approx 1$, this is approximately $1/2+\epsilon$. If the starting of the recombination process favors one chromosome, $\epsilon$ can be large, say close to $1/2$ ($q \approx 0$). The above probability becomes arbitrarily close to 1. This means that the given allele will be inherited almost always.
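To make the sensitivity concrete, here is a small sketch that evaluates the inheritance probability of the uniform 1-crossover model, taken here to be q(1 - (i-1)/n) + (1 - q)(i-1)/n as in the "Segregation" section. The function name `inherit_probability` and the sample values of n and epsilon are illustrative assumptions.

```python
from fractions import Fraction


def inherit_probability(n, i, q):
    """P(a given allele of gene i is inherited) in the uniform 1-crossover model.

    q is the probability that the process starts on the given chromosome;
    (i - 1) / n is the probability that genes 1 and i recombine.
    """
    r = Fraction(i - 1, n)
    return q * (1 - r) + (1 - q) * r


n = 100
half = Fraction(1, 2)
eps = Fraction(1, 10)

# With q = 1/2 every gene obeys the segregation law exactly.
print([inherit_probability(n, i, half) for i in (1, 50, 100)])

# With q = 1/2 - eps the bias grows along the chromosome.
print([float(inherit_probability(n, i, half - eps)) for i in (1, 50, 100)])
```

Under these assumptions, the deviation from 1/2 is -epsilon at gene 1 and close to +epsilon at gene n, which is the sensitivity described above.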

Independent Assortment: Breaks
Despite genetic linkage, one should still expect that genes which are far from each other on the chromosome will assort independently. Because each chromosome can be treated separately, this independence is certainly true for genes that are on different chromosomes altogether. But on the same chromosome, the probability of recombination $p_d = d/n$ implies, for instance, that recombination of gene 1 and gene $n$ occurs with a probability of $(n-1)/n \approx 1$ for large values of $n$. Therefore, gene 1 and gene $n$ are highly correlated, and thus dependent (they will almost always recombine).
In retrospect, two genes $i$ and $j$ recombine when the alleles of the two genes are inherited from different chromosomes. Since the probability of inheriting a given allele is $1/2$ when the segregation law holds, independence then dictates that the probability of recombination of gene $i$ and gene $j$ must be equal to $1/2$. To see this, let $E_i$ and $E_j$ represent the events of inheriting a given allele for gene $i$ and gene $j$, respectively, then:
$$P(\text{genes } i \text{ and } j \text{ recombine}) = P(E_i \text{ and not } E_j \text{ or } E_j \text{ and not } E_i)$$
$$= P(E_i \text{ and not } E_j) + P(E_j \text{ and not } E_i)$$
$$= P(E_i)[1-P(E_j)] + P(E_j)[1-P(E_i)]$$
where addition is justified by exclusivity of events, and the last equality follows from the assumption that gene $i$ and gene $j$ are independent. When the segregation law holds, $P(E_i)=P(E_j)=1/2$ and the above expression evaluates to $1/2 \cdot 1/2 + 1/2 \cdot 1/2 = 1/4 + 1/4 = 1/2$. Assuming $q$ in the previous section is $1/2$, genes are independent if and only if $d=n/2$. Therefore, the law of independent assortment fails when genes are on the same chromosome. Now, why do we insist that the model must satisfy, among other properties, the law of independent assortment? Well, first because it is a correct law for distant genes. And second, since the probability of recombination increases with distance due to genetic linkage, the law of independent assortment tells us that the probability of recombination increases up to $1/2$, but cannot exceed $1/2$ (this statement excludes hotspots, which are regions on the chromosome that experience a high probability of recombination even at small distances). It is important for students to make this realization, which will come in handy when solving genetic mapping problems, as illustrated in the section "A Computational Example of Genetic Mapping".

Generalization: Not Easy
One might consider extending the uniform 1-crossover model in an attempt at generalization to mimic the actual biological process. However, I will show that extending this model in the most natural way (mathematically, that is) will break the linkage property. For this purpose, consider a uniform 2-crossover model. Let $x_1$ be the first crossover which is uniform in $\{1,\ldots,n\}$ (as before), and $x_2$ be the second crossover which, conditional on $x_1$, is uniform in $\{x_1,\ldots,n\}$. Therefore, $x_1$ and $x_2$ are not independent, for $x_2$ cannot precede $x_1$. The choice of $x_2 \ge x_1$ simplifies the math, but making $x_2 > x_1$ does not change the results. Now, why even bother to show that this model, which is more difficult to analyze than its predecessor, does not work? Well, my experience in teaching has been the following: While it is important to show students what works, it is equally important to show them what does not work.
With this in mind, all we need is a counterexample, so consider gene 1 and gene $d+1$ (these two genes are at a distance $d$ from each other). The probability of a recombination of gene 1 and gene $d+1$ is:
$$P(x_1 \le d \text{ and } x_2 > d)$$
Using conditional probability and the harmonic series approximation, the "Uniform 2-crossover Model" section shows that when $n-d$ is large, this probability is approximately
$$\frac{n-d}{n}\left[\ln n - \ln(n-d)\right]$$
We can rewrite the above as:
$$-\frac{n-d}{n}\ln\frac{n-d}{n}$$
This is not an increasing function of $d$: writing $y=(n-d)/n$, the expression $-y \ln y$ has derivative $-\ln y - 1$, which vanishes at $y=1/e$.
Therefore, we have the highest probability of recombination when $(n-d)/n = 1/e$, i.e., $d = n(1-1/e)$. Note that in this case $n-d = n/e$, which is large (as required above) when $n$ is large. This means that gene 1 is most likely to recombine with a gene located at a distance approximately 63% of the chromosome length (see Figure 3). While this is an interesting result, it stands as a pure mathematical endeavor with no biological basis.
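The non-monotonic behavior of the uniform 2-crossover model can also be observed numerically. The following sketch sums the exact probability P(x1 <= d and x2 > d), assuming x1 uniform in {1, ..., n} and x2 uniform in {x1, ..., n} given x1, and compares it with the logarithmic approximation -y ln y for y = (n-d)/n. All function names are hypothetical.

```python
import math


def two_crossover_probability(n, d):
    """Exact P(x1 <= d and x2 > d) for genes 1 and d+1.

    Given x1, the second crossover has n - x1 + 1 equally likely
    positions, of which n - d lie beyond d.
    """
    return sum((1 / n) * (n - d) / (n - x1 + 1) for x1 in range(1, d + 1))


def log_approximation(n, d):
    """Approximation -(y ln y) with y = (n - d) / n, valid when n - d is large."""
    y = (n - d) / n
    return -y * math.log(y)


def most_likely_distance(n):
    """Distance d in {1, ..., n-1} that maximizes the exact probability."""
    return max(range(1, n), key=lambda d: two_crossover_probability(n, d))


n = 1000
print(most_likely_distance(n), n * (1 - 1 / math.e))
print(two_crossover_probability(n, 900), two_crossover_probability(n, 632))
```

The second print compares a very distant pair of genes with a pair near the maximizing distance, which illustrates why the linkage property breaks in this extension.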

A Better Model: When Markov Meets Mendel
While the uniform 1-crossover model captures the essentials of segregation and linkage, it is lacking in some important aspects. First, the probability that a given allele is inherited (should be $1/2$) depends on an implicit parameter of the model ($q$ in the "Segregation" section must be $1/2$). Second, genes exhibit the linkage property but they are almost never independent, as this would require a probability of recombination equal to $1/2$ (see "A Slight Discrepancy and Genetic Linkage" section). From the "Linkage" section, this probability is expressed as $d/n$, implying that only genes at a distance equal to half the chromosome length are independent. Moreover, the probability of recombination depends on the chromosome length and, therefore, two chromosomes that are locally similar but have different lengths exhibit different local recombination behavior. This is not biologically justifiable. Finally, a generalization (with uniformity maintained) to mimic the real biological process with multiple crossovers is not conceivable. A better mathematical model is needed to rectify the above deficiencies. In principle, the model should satisfy the following three laws with multiple crossovers:
1. Segregation: The probability that a given allele of the gene is inherited is $1/2$.
2. Linkage (missed by Mendel): The probability of recombination of two genes is an increasing function of the distance between them, so it is higher for distant genes. Nevertheless, it should not depend on the chromosome length.
3. Independent assortment: This is impossible due to linkage where distance is a determining factor in the recombination. The alternative is to require genes to be asymptotically independent. As a result, the probability of recombination must approach $1/2$ when the distance between the two genes becomes large.
Being a computer scientist by training and not a biologist, when I first suggested to my students a model based on a Markov chain, I called it the jumping model of recombination. I also expressed to them my concern that it may not be real, but as it turned out, it made perfect sense. To be loyal to my first terminology, I will call it here the jumping model.

The Jumping Model
The jumping model is based on a Markov chain. A Markov chain consists of a set of states with probabilities of transition between them (thus the jumping term). For computer scientists, this is often illustrated as a directed weighted graph with vertices representing the states and directed edges representing the transitions between states. The weight of an edge is the probability of the corresponding transition. This is shown in Figure 4 for a Markov chain with two states. Operationally, one would start at a given state and follow transitions in discrete time steps as indicated by their probabilities, thus changing state from one step to another. Let $a_{kl}$ be the probability of transition from state $k$ to state $l$, and $x_i$ be the state at time step $i$. Figure 4 shows a transition probability $p$ between the two states (and $1-p$ to the same state, because the transition probabilities of a given state must sum up to 1). A generalized notion of a transition is captured by a conditional probability with the following property:
Markov property: For $j > i$,
$$P(x_j = l \mid x_i = k \text{ and } x_{i-1} = \ldots) = P(x_j = l \mid x_i = k)$$
When $j=i+1$, this probability is the transition probability $a_{kl} = P(x_{i+1}=l \mid x_i=k)$. In the event ($x_i=k$ and $x_{i-1}=\ldots$) only $x_i=k$ is relevant. In other words, the probability of a state at a given time depends only on the most recently known state. What is the biological significance of the Markov chain in Figure 4? Each state represents a chromosome of the pair, and time in the Markov chain corresponds to genes on the chromosome. A transition between states in one time step signifies a crossover, and the probability of such a crossover is $p$. Therefore, $x_i$ represents a crossover when $x_i \ne x_{i+1}$. One could then inquire about the probability of being in a given state at a given time. The event of being in a given state at time $i$ parallels the event that the corresponding chromosome is the source of the allele for gene $i$. This is illustrated in Figure 5 by conceptually duplicating the chain for each gene to reflect the change of state over time.
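The operational description of the chain translates directly into a simulation. The sketch below is an illustration, not the author's code: it walks along n genes, switching chromosomes with probability p between consecutive genes, and estimates how often two genes end up with alleles from different chromosomes. The names `simulate_gamete` and `estimate_recombination`, the starting probability q, and the default trial count and seed are all assumptions made for the example.

```python
import random


def simulate_gamete(n, p, q, rng):
    """Return the source chromosome (0 or 1) of the allele for each of n genes.

    The walk starts on chromosome 0 with probability q and, between any
    two consecutive genes, crosses over with probability p.
    """
    state = 0 if rng.random() < q else 1
    sources = [state]
    for _ in range(n - 1):
        if rng.random() < p:  # crossover: jump to the other chromosome
            state = 1 - state
        sources.append(state)
    return sources


def estimate_recombination(n, p, q, i, d, trials=20000, seed=0):
    """Monte Carlo estimate of P(genes i and i+d recombine), genes numbered from 1."""
    rng = random.Random(seed)
    recombined = 0
    for _ in range(trials):
        sources = simulate_gamete(n, p, q, rng)
        if sources[i - 1] != sources[i + d - 1]:
            recombined += 1
    return recombined / trials


for d in (1, 2, 5, 15):
    print(d, estimate_recombination(n=20, p=0.1, q=0.5, i=3, d=d))
```

For adjacent genes (d = 1) the estimate should come out near p, since those two genes recombine exactly when a crossover falls between them.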
A useful representation of a Markov chain is by a matrix $P$ where $P_{kl}$ (the term in the $k$th row and $l$th column of $P$) is the probability of transition from state $k$ to state $l$; therefore, every row in $P$ must add up to 1. If we call the states in Figure 4 state 1 and state 2, then our Markov chain can be expressed as:
$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$$
In this matrix, $P_{kl}$ can be interpreted as $P(x_{i+1}=l \mid x_i=k) = a_{kl}$. Why is this matrix representation useful? Let's multiply $P$ by itself:
$$P^2 = \begin{pmatrix} (1-p)^2+p^2 & 2p(1-p) \\ 2p(1-p) & (1-p)^2+p^2 \end{pmatrix}$$
Note for instance that $P^2_{12} = 2p(1-p)$ is equal to $P(x_{i+2}=2 \mid x_i=1)$, because to transition from 1 to 2 in two time steps we can transition from 1 to 1 to 2 with probability $(1-p)p$ or from 1 to 2 to 2 with probability $p(1-p)$. As it turns out, $P(x_{i+d}=l \mid x_i=k) = P^d_{kl}$. The proof of this fact is in the "Markov Transitional Probabilities" section and uses conditional probability and the Markov property. Thus, every row in $P^d$ must also add up to 1.
A final note: because $P$ is a symmetric matrix ($P_{12}=P_{21}$), all powers of $P$ are symmetric matrices. Therefore, $P^d_{kl} = P^d_{lk}$, which now implies that every column in $P^d$ must also add up to 1. We can finally establish that the probability of recombination is
$$p_d = P(x_i=1 \text{ and } x_{i+d}=2 \text{ or } x_i=2 \text{ and } x_{i+d}=1) = P(x_i=1)P^d_{12} + P(x_i=2)P^d_{21} = P^d_{12}$$
where the last step uses $P^d_{12}=P^d_{21}$ and $P(x_i=1)+P(x_i=2)=1$.
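The properties of the powers of P can be explored with a few lines of code. This sketch assumes the two-state chain of Figure 4, i.e., a transition matrix with 1-p on the diagonal and p off the diagonal; the helper names `matmul2`, `transition_matrix`, and `matrix_power` are hypothetical, and plain nested lists are used to avoid any library dependency.

```python
def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def transition_matrix(p):
    """Two-state chain: stay with probability 1 - p, cross over with probability p."""
    return [[1 - p, p], [p, 1 - p]]


def matrix_power(P, d):
    """P multiplied by itself d times (d >= 0), starting from the identity."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(d):
        result = matmul2(result, P)
    return result


P = transition_matrix(0.1)
for d in (1, 2, 10, 50):
    Pd = matrix_power(P, d)
    # The off-diagonal entry is the d-step probability of ending on the
    # other chromosome, i.e., the probability of recombination at distance d.
    print(d, Pd[0][1], sum(Pd[0]), Pd[0][1] == Pd[1][0])
```

Printing the row sum and the symmetry check alongside each power makes it easy to see that every power stays a symmetric matrix whose rows add up to 1.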

Segregation and Independent Assortment
Following the logic of previous sections, the probability that a given allele of gene $i$ is inherited is:
$$q(1-p_{i-1}) + (1-q)p_{i-1}$$
Again, if $q=1/2$ the above probability is $1/2$, which makes the jumping model subject to the same sensitivity to $q$ as the uniform 1-crossover model. However, this can now be alleviated. The theory of Markov chains tells us that $P^d$ will converge for large values of $d$ and all rows of $P^d$ become identical. Therefore, the rows will define a steady state probability for each state. In other words, the effect of $q$ will be washed out. This theory will not be presented here, but Figure 6 shows a few powers of a given matrix $P$.
Because $P^d$ is symmetric in our case,
$$P^d \to \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} \quad \text{(convergence)}$$
Since rows and columns of $P^d$ must both add up to 1, $p_{i-1} = P^{i-1}_{12} = P^{i-1}_{21}$ converges to $1/2$ for large enough $i$. By exchanging the roles of $q$ and $p_{i-1}$ in the expression for the probability of inheriting a given allele, we also get $1/2$, maintaining the segregation law for large enough distances when $q \ne 1/2$.
In addition, since both $P(x_{i+d}=l \mid x_i=k)$ and $P(x_{i+d}=l)$ approach $1/2$, we have that $P(x_{i+d}=l \mid x_i=k) \approx P(x_{i+d}=l)$ for large $d$. This makes $P(x_i=k \text{ and } x_{i+d}=l) \approx P(x_i=k)P(x_{i+d}=l)$ when $d$ is large. Therefore, genes $i$ and $i+d$ are asymptotically independent, confirming the law of independent assortment for large enough distances.

Linkage (and Hotspots!)
The previous sections show that $p_d = P^d_{12}$ and that $P^d_{12}$ converges to $1/2$ for large values of $d$, thus establishing the laws of segregation and independent assortment. However, we wish to determine $p_d$ for every value of $d$. This will reestablish the above results. This time, however, and instead of using the theory of matrices (e.g., eigen decomposition) to study how $P^d$ evolves, I will revert to elementary mathematics. Two genes at a distance $d$ from each other will recombine if and only if their chromosome experiences an odd number of crossovers along that distance. This is equivalent to the event of making an odd number of transitions between the two states of the Markov chain during $d$ time steps. Let $E_d$ be this event (thus $p_d = P(E_d)$). It is not hard to see that
$$E_d = (E_{d-1} \text{ and not } E_1) \text{ or } (\text{not } E_{d-1} \text{ and } E_1)$$
where $E_1$ here refers to the last of the $d$ time steps. Observe that $p_1 = P(E_1) = p$. Therefore, we can write:
$$p_d = p_{d-1}(1-p) + (1-p_{d-1})p$$
The Markov property is essential to justify the multiplication by $1-p$ and $p$ in the above equation because it makes $E_1$ independent of the history $E_{d-1}$. Technically, $P(E_1 \mid E_{d-1})$ does depend on the state at time step $d-1$, but given the symmetry in our Markov chain, it is always $p$. By rearranging and taking care of the special case when $d=1$ we get:
$$p_d = (1-2p)p_{d-1} + p \quad (d > 1), \qquad p_1 = p$$
It is easy to verify that the solution
$$p_d = \frac{1-(1-2p)^d}{2}$$
satisfies the above recurrence with a base case $p_1 = p$ (following the pattern of the recurrence, we can retrieve the above expression if we replace $d$ by $d-1$, multiply by $(1-2p)$, and add $p$).
While it is easy to verify the solution, obtaining it should not remain a wild guess. By working out a few iterations for $p_d$, the "Recurrence for $p_d$" section shows how to derive the solution using a geometric series.
The mathematically savvy could verify that $1-2p$ is an eigenvalue of $P$, and that the same expression could have been obtained through an eigen decomposition of $P$.
• When $d$ is large (and $0 < p < 1$), $(1-2p)^d$ goes to zero, causing $p_d$ to converge to $1/2$. This convergence was discussed in the previous section, and should not be surprising by now.
• When $0 < p < 1/2$ ($1-2p$ is positive), $(1-2p)^d$ lies between zero and one and shrinks as $d$ grows, causing $p_d$ to increase with $d$ (linkage). This increase, however, is not linear as in the uniform 1-crossover model; therefore, it is biologically more realistic.
• When $p > 1/2$ ($1-2p$ is negative), the sign of $(1-2p)^d$ alternates, causing $p_d$ to alternate between a typical value (below $1/2$, for even $d$) and a high value (above $1/2$, for odd $d$) (hotspots, first time captured).
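The three regimes can be checked numerically. The sketch below assumes the recurrence p_d = (1-2p) p_{d-1} + p with p_1 = p and the closed form p_d = (1 - (1-2p)^d) / 2, as reconstructed from the surrounding derivation; the function names are hypothetical.

```python
def p_recurrence(p, d):
    """Recombination probability at distance d >= 1, iterating the recurrence."""
    value = p  # base case: p_1 = p
    for _ in range(d - 1):
        value = (1 - 2 * p) * value + p
    return value


def p_closed_form(p, d):
    """Closed-form candidate (1 - (1 - 2p)^d) / 2."""
    return (1 - (1 - 2 * p) ** d) / 2


# Linkage regime (p < 1/2): values increase toward 1/2.
print([round(p_closed_form(0.1, d), 4) for d in range(1, 8)])

# Hotspot regime (p > 1/2): values alternate around 1/2.
print([round(p_closed_form(0.9, d), 4) for d in range(1, 8)])

# The recurrence and the closed form agree up to rounding error.
print(max(abs(p_recurrence(0.3, d) - p_closed_form(0.3, d)) for d in range(1, 50)))
```

Comparing the second list with the first at even distances shows the same values, which reflects the fact that (1-2p)^d is unchanged when p is replaced by 1-p and d is even.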
The jumping model captures the essential biology of crossover and recombination through the laws of segregation, linkage, and independent assortment. In addition, it reveals the non-typical high recombination probabilities of hotspots. Hotspots are regions on the chromosome that experience a high probability of recombination even at small distances. Therefore, depending on the parameter p, the jumping model embodies two modes of chromosomal recombination.
While a hotspot does not present a difficult concept, it is usually misinterpreted by students as a region with high probability of recombination. This is true if the region is too small (a peak in Figure 7), which is biologically typical of hotspots. However, if the region is large enough, there can be a high probability of recombination only if there is a corresponding low probability, as seen by the alternating pattern in Figure 7. What is interesting about the jumping model (which may not be true biologically) is that this low probability is the typical one for the given distance when $p$ is replaced with $1-p$. This is also confirmed by the expression we derived for $p_d$, because when $p > 1/2$ and $p_d < 1/2$, $d$ is even and, therefore, $(1-2p)^d = (2p-1)^d$. The alternation itself should be intuitive because a high probability of recombination at a small distance must be driven by a high probability of crossover, which in turn means a high probability of crossing over back to the same chromosome. The jumping model captures this fact through the parameter $p$ with a threshold of $1/2$ as a high probability of crossover.

Back to the Days of Morgan
Morgan established that the probability of recombination as a function of distance is the following:
$$\frac{1-e^{-2d}}{2}$$
which does not account for hotspots. In addition, the notion of distance in the above expression is not the same as ours.
To see this, assume that $p$ is close to zero in the jumping model (no hotspots) and, therefore, $1/p$ is large. Using the exponential limit,
$$p_d = \frac{1-(1-2p)^d}{2} = \frac{1-\left[\left(1-\frac{2}{1/p}\right)^{1/p}\right]^{dp}}{2} \approx \frac{1-e^{-2\lambda}}{2}, \quad \lambda = \frac{d}{1/p}$$
where $d$ is the distance and $1/p$ is the average distance until the next crossover (because a crossover occurs with probability $p$). So $\lambda$ is the average number of crossovers between the two genes, and this is how Morgan defined his distance.
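A quick numerical comparison shows how close the two expressions are when p is small. The sketch assumes the jumping-model closed form (1 - (1-2p)^d) / 2 and the exponential expression (1 - e^{-2 lambda}) / 2 with lambda = d p, the average number of crossovers between the two genes; the function names and sample values are illustrative.

```python
import math


def jumping_probability(p, d):
    """Jumping-model recombination probability at distance d."""
    return (1 - (1 - 2 * p) ** d) / 2


def exponential_probability(lam):
    """Recombination probability as a function of lam, the average number of crossovers."""
    return (1 - math.exp(-2 * lam)) / 2


p = 0.001  # small crossover probability, so 1 / p is large
for d in (10, 100, 500, 2000):
    lam = d * p
    print(d, lam, jumping_probability(p, d), exponential_probability(lam))
```

As p grows toward 1/2 the agreement degrades, and for p above 1/2 the exponential form cannot reproduce the alternating pattern at all, consistent with the remark that it does not account for hotspots.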

Why This Way?
I could have simply argued that the probability of recombination $p_d$ is $(1-e^{-2d})/2$, and that this is consistent with the laws of inheritance. Therefore, I will list what I believe are important aspects of this exposition.
• There is a rapid prototyping with a simple uniform 1-crossover model that reflects the essential biological properties of crossover and recombination (though not perfectly). This allows the student to quickly make a connection between the biology and the mathematics.
• There is no need for advanced calculus or probability (e.g., no mention of Poisson processes or probability distributions other than uniform).
• The jumping model also provides the insight that the probability of crossover must be less than $1/2$ to observe the typical behavior of recombination (linkage), hence giving the correct impression that $p$ is rather small.
• The jumping model can be described (not necessarily analyzed) very easily and satisfies all the required biological properties of crossover and recombination. Therefore, a student can effectively retain and communicate the recombination process.

A Computational Example of Genetic Mapping
Consider the hypothetical family in Table 5 where alleles take values in $\{0,1\}$ (inspired by a homework assigned by Bonnie Berger at MIT).
To map the genes (genetic mapping), we count the number of recombinations, both paternal and maternal, for each pair of genes, AB, AC, and BC. Then we estimate the probabilities of recombination and relate them to distances.

First Attempt
Since $1-a/2 > 1/2$ (for AB and AC), and it is not generally assumed that genes represent hotspots, we might suspect that our knowledge of the alleles of gene A is wrong. It is more plausible that the alleles of gene A are 1,0 for the father and mother, as shown in Table 6.
This will make $P(AB) \approx P(AC) \approx a/2$ and will keep $P(BC) \approx a$. Since the probability of recombination of distant genes is higher, the order of genes is B, A, C or C, A, B.
This solution puts B and C at equal distances from A and, therefore, makes the distance from B to C twice the distance from A to B (and that from A to C). However, doubling the distance should not double the probability of recombination unless the probability is a linear function of distance as in the uniform 1-crossover model. We may adopt this model here if we know in advance that only one crossover occurs; this conditioning makes the crossover uniform even when the underlying model is the jumping one (because of the symmetry in the Markov chain). For this argument to work we will also need $x=1$; otherwise, we observe a double crossover for Offspring i in Table 6.

Second Attempt
If we believe that our knowledge of the alleles in Table 5 is correct, then the genes are in a hotspot region. The obtained probabilities $1-a/2$ and $a$ must correspond to the alternating pattern in Figure 7. Therefore, the order is again B, A, C or C, A, B, with A situated at equal distances from B and C. But are the probabilities consistent? In the jumping model, one could easily show that $(1-2p_d)^2 = 1-2p_{2d}$. Therefore, we must verify that $[1-2(1-a/2)]^2 = 1-2a+a^2 \approx 1-2a$, so we will need $a$ to be small enough. Note also that if $a$ is small enough, the probability that B and C recombine is $P(AB)[1-P(AC)] + [1-P(AB)]P(AC) \approx a(1-a/2) = a-a^2/2 \approx a$, which is consistent. Moreover, the probability of a double crossover is $P(AB)P(AC) \approx (1-a/2)^2 = 1-a+a^2/4 \approx 1-a$, which is the proportion of offspring in Table 5
We can easily generalize those attempts to obtain a geometric series:
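The consistency check in this attempt rests on the identity (1 - 2 p_d)^2 = 1 - 2 p_{2d}, which follows from the jumping-model expression p_d = (1 - (1-2p)^d) / 2. The sketch below verifies the identity numerically and applies it to the hotspot reading, where the probability at distance d is taken to be 1 - a/2. The function names and the sample value of a are assumptions for illustration; Table 5 itself is not reproduced here.

```python
def p_closed_form(p, d):
    """Jumping-model recombination probability at distance d."""
    return (1 - (1 - 2 * p) ** d) / 2


def doubled_distance(p_d):
    """Probability at distance 2d implied by the probability at distance d.

    Uses (1 - 2 p_d)^2 = 1 - 2 p_{2d}, solved for p_{2d}.
    """
    return (1 - (1 - 2 * p_d) ** 2) / 2


# The identity holds for any crossover probability p and distance d.
for p in (0.05, 0.3, 0.8):
    for d in (1, 2, 5):
        assert abs(doubled_distance(p_closed_form(p, d)) - p_closed_form(p, 2 * d)) < 1e-12

# Hotspot reading: P(AB) = P(AC) = 1 - a/2 should imply P(BC) close to a.
a = 0.1
print(doubled_distance(1 - a / 2), a - a ** 2 / 2)
```

The two printed numbers coincide, and both are close to a when a is small, matching the approximation used in the text.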

Conclusion
I am not aware of any other exposition of chromosomal crossover, recombination, genetic linkage, hotspots, and genetic mapping that takes the approach outlined herein. The approach represents a simple and modern treatment of an ancient subject, without a compromise of its scientific and mathematical integrity.
The reader should find an insightful explanation with a focus on reinforcing the ideas by exposing them in different settings. In addition, there is an attempt to introduce the reader to the process of modeling by showing what works and what doesn't. Most importantly, this should provide an early chance to convey to our students that biology is a computational science.
Disclaimer: I ignored some of the biological detail in favor of simplicity and consistency. Keep in mind, however, that in biology there is always an exception to the rule!

Further Readings
There is no explicit referencing in the text. This is intentional. I used what everyone would now consider folklore from biology, probability, and calculus. All can be found in textbooks, even elementary ones. For the interested reader, however, and in addition to any introductory texts on probability and calculus, here is a list (in alphabetical order by author) of book chapters that will provide enough background for further endeavors.