Rotation of sex combs in Drosophila melanogaster requires precise and coordinated spatio-temporal dynamics from forces generated by epithelial cells

The morphogenesis of sex combs (SCs), a male trait in many species of fruit flies, is an excellent system in which to study the cell biology, genetics and evolution of a trait. In Drosophila melanogaster, where the incipient SC rotates from horizontal to a vertical position, three signal comb properties have been documented: length, final angle and shape (linearity). During SC rotation, in which many cellular processes are occurring both spatially and temporally, it is difficult to distinguish which processes are crucial for which attributes of the comb. We have used a novel approach combining simulations and experiments to uncover the spatio-temporal dynamics underlying SC rotation. Our results indicate that 1) the final SC shape is primarily controlled by the inhomogeneity of initial cell size in cells close to the immature comb, 2) the final angle is primarily controlled by later cell expansion and 3) a temporal sequence of cell expansion mitigates the malformations generally associated with longer rotated SCs. Overall, our work has linked together the morphological diversity of SCs and the cellular dynamics behind such diversity, thus providing important insights on how evolution may affect SC development via the behaviours of surrounding epithelial cells.


Data analyses of experimental results of Figure 4
There are 5 experimental data points for each of ♀wt, ♂ wt and ♂ bab PR7 2 (Table A). Each data point consists of 4 coordinates: initial area of EP1, initial area of EP2, final area of EP1 and final area of EP2. We apply a combination of Monte Carlo simulations and bootstrapping to construct artificial data points based on these 5 actual experimental data points for each genotype for statistical analyses. First, we perform statistical bootstrapping: we construct 5000 extra sets of data points based on the 5 experimental data points. Each set of these data points consists of the same number of data points (5) as the original. Each data point of a set comes from random drawing of the 5 data points of the original experimental data set with replacement. Next, we perform Monte Carlo simulations on each of the 5000 bootstrapped data sets for each of ♀wt, ♂ wt and ♂ bab PR7 2 : for each of these bootstrapped data sets, we generate 1 million artificial data points from a multivariate Gaussian distribution with means and covariances identical to the means and covariances of that data set. Therefore, there are 4 sample mean values (each for initial EP1, initial EP2, final EP1, final EP2) and 4 × 4 = 1 6 covariance values arranged in a positive semi-definite symmetric matrix required for the generation of these 1 million artificial data points. Finally, we derive quantities and perform confidence interval estimates based on the 5000×3 realizations with a total of 15000 million artificial data points from the bootstrapped data sets of all three genotypes (both detailed below).
For each bootstrapped data set with the 1 million Monte Carlo simulated data points, we define the inhomogeneity coefficient C inhomog eneity as, C inhomog eneity ≡ F inal ar ea of E P2 /I nitial ar ea of E P2 F inal ar ea of E P1 /I nitial ar ea of E P1 . (1) C inhomog eneity is a vector of 1 million elements. In equation 1, C inhomog eneity is formed by element-wise division of two vectors, E P2 F inal E P2 I nitial (numerator) and E P1 F inal E P1 I nitial (denominator), which are themselves formed by element-wise division of the respective individual data points within the same bootstrapped set. For example, the first element of E P2 F inal E P2 I nitial is obtained by dividing the value of the final area of EP2 of the first data point with the initial area of EP2 of the first data point. Therefore, an element of C inhomog eneity > 1 means that, for that particular simulated data point, EP2 expands more (or contracts less) as a proportion of its initial size than EP1, and vice versa for an element of C inhomog eneity < 1 . Thus, C inhomog eneity captures the inhomogeneity of distal cell expansion during SC rotation.
To explore the effects of being ♀wt, ♂ wt or ♂ bab PR7 2 has on the inhomogeneity of expansion of distal cells, we define ∆C inhomog eneity as the difference of C inhomog eneity between any of the two of ♂ wt, ♀wt and ♂ bab PR7 2 (therefore, two bootstrapped data sets, each of 1 million artificial data points, are required for arguments of ∆C inhomog eneity ): Thus, ∆C inhomog eneity (s1 , s2 ) (a vector of 1 million elements) captures how much such an asymmetry of distal cell expansion varies between s1 and s2 where s1 and s2 can be any of ♀wt, ♂ wt and ♂ bab PR7 2 . In obtaining ∆C inhomog eneity (s1 , s2 ), we randomly shuffle the elements of C inhomog eneity (s2 ) before the vector subtraction to avoid potential artifacts resulting from the seeds of the pseudo random number generator. Figure A shows how the values of C inhomog eneity , ∆C inhomog eneity (s1 , s2 ) and other related quantities based on three example bootstrapped data sets can be visualized as distributions. For statistical analyses, there is a total of 5000× 3 such C inhomog eneity distributions. We have also used 5000× 3 such ∆C inhomog eneity (s1 , s2 ) distributions (5000 for each pair of (s1 , s2 ), s1 ̸ = s2 ), out of a total of 3× 5000 × 5000 possible combinations, not counting interchange of indices.
In the main text, we used derived quantities such as Pr ( E P2 I nitial E P1 I nitial < 1 ) or Pr (C inhomog eneity > 1 ). These probability values for each realization of bootstrapped data sets are defined as the proportion of area of the relevant distribution that satisfies the said criterion. These probability values are themselves random variables with a probability distribution that can be visualized by calculating every realization of the probability value from every bootstrapped data set.
Thus, confidence intervals of these probability values and other derived quantities can be estimated in such a manner (Table B).

Data analyses of ABASCT SD statistics of simulated intact SC without temporal dynamics
We first perform ANOVA on the ABASCT SD data of simulated intact SC without temporal dynamics (i.e. delay=0 mcs) to test the equality of the means of ABASCT SD values between SC lengths (5, 7, 9 or 11-tooth SCs). The resulting F-test statistic has a value of 120 with Pr(>F)< 2 × 1 0 −1 6 , which shows at least one of the means is significantly different from the other three. Post-hoc Tukey's test is then used to pairwise compare the ABASCT SD data of each group (SC lengths). The results are shown in Table C and Figure E.

Calculation of p-value in Table D
In Table D we estimated the p-value between the breaking statistics of two sets of SC simulations. Given that 0 ≤ A ≤ B ≤ 4 8 , the p-values quoted in Table D are related to the probability that any two sets of simulations with intact ratio of at most A 4 8 and intact ratio of at least B 4 8 share a common probability distribution. These values are calculated as follows. Assuming that each simulation is independent with the rotated SC having a probability P of being intact, in a set of 48 simulations the probability of getting A intact SCs is thus, where Therefore, the probability that a set of simulations has at most A intact SCs is simply, Similarly, the probability for a set of simulations to get at least B intact SCs is, Multiplying equations 5 and 6 gives the probability that both events happen together, To derive the p-value, one needs the maximum of G (A, B, 4 8 , P) along the P-axis ("worst case scenario"). One method to find such values is to differentiate G (A, B, 4 8 , P) with respect to P and set the expression to zero, and look for the values of P that solve equation 8 and satisfy the second derivative condition d 2 G (A,B,4 8 ,P) dP 2 < 0 . Once we get these values we can determine whether any of those values of P or the "end-point" values (i.e. P = 0 or 1) yields the maximum of equation 7 for 0 ≤ P ≤ 1 .
In our case, we use an alternative numerical approach of scanning the numerical maximal value of G by increasing 0 ≤ P ≤ 1 step-wise. The p-value reported in each row of Table D is two times this numerical maximum.   Each histogram shows counts of a single SC length (5, 7, 9 or 11-tooth SC). . Each histogram shows counts of a single SC length (5, 7, 9 or 11-tooth SC).      Table D: Results of hypothesis testing on the equality of SC breaking probabilities between two sets of simulations. The (null) hypothesis H 0 (A, B) tested in each row is "a simulation set that has at most A intact SCs and another simulation set that has at least B intact SCs share a common probability distribution of SC breakage".