Between-tumor and within-tumor heterogeneity in invasive potential

For women with access to healthcare and early detection, breast cancer deaths are caused primarily by metastasis rather than growth of the primary tumor. Metastasis has been difficult to study because it happens deep in the body, occurs over years, and involves a small fraction of cells from the primary tumor. Furthermore, within-tumor heterogeneity relevant to metastasis can also lead to therapy failures and is obscured by studies of bulk tissue. Here we exploit heterogeneity to identify molecular mechanisms of metastasis. We use “organoids”, groups of hundreds of tumor cells taken from a patient and grown in the lab, to probe tumor heterogeneity, with potentially thousands of organoids generated from a single tumor. We show that organoids have the character of biological replicates: within-tumor and between-tumor variation are of similar magnitude. We develop new methods based on population genetics and variance components models to build between-tumor and within-tumor statistical tests, using organoids analogously to large sibships and vastly amplifying the test power. We show great efficiency for tests based on the organoids with the most extreme phenotypes and potential cost savings from pooled tests of the extreme tails, with organoids generated from hundreds of tumors having power predicted to be similar to bulk tests of hundreds of thousands of tumors. We apply these methods to an association test for molecular correlates of invasion, using a novel quantitative invasion phenotype calculated as the spectral power of the organoid boundary. These new approaches combine to show a strong association between invasion and protein expression of Keratin 14, a known biomarker for poor prognosis, with p = 2 × 10−45 for within-tumor tests of individual organoids and p < 10−6 for pooled tests of extreme tails. Future studies using these methods could lead to discoveries of new classes of cancer targets and development of corresponding therapeutics. All data and methods are available under an open source license at https://github.com/baderzone/invasion_2019.

The total spectral power M/2 k=1 P k is not invariant to scaling. Multiplying each (x j , y j ) 2 by a scale factor ρ yields a scale factor ρ 2 in each P k . Experimental procedures were 3 designed to yield approximately similar sized organoids. We therefore normalized the P k 4 terms to remove the scale factor, focusing on shape rather than scale. Normalization 5 was performed by re-scaling each organoid to a unit circle. A unit circle centered at the Thus for the unit circle and k ∈ {0, ±1, . . . , ±M/2}, we find thatx k = M/2 for k = ±1 11 andx k = 0 otherwise. Similarly,ŷ k = M/2 for k = ±1 and 0 otherwise. Finally, for a 12 circle of radius r centered at the origin, P k = r 2 M 2 /2 for k = ±1 and 0 otherwise. We 13 therefore divide each term P k by P 1 to normalize the power spectrum. Since the k = 1 14 term always contributes a value of 1 to the sum, it is omitted from further consideration. 15 We incorporated two additional factors in the spectral characterization of invasion 16 based on means and parametric derivatives estimated at the midpoint of each interval: 17 y j ≡ (y j +1/2 + y j −1/2 )/2, (4) with j ∈ {1/2, 3/2, . . . , M − 1/2}. The mappingsx andȳ smooth artifacts in the 18 boundary due to pixelation. A zigzag or staircase motif with (x, y) coordinates 19 (0, 0), (1, 0), (1, 1) (2, 1), for example, is mapped to the lineȳ =x − 1/2.

20
The transform ofx is the cosine low-pass filter. We have used the periodic property that e 2πijk/M x j is 22 identical for j = 0 and j = M . Using this filter, P k → cos 2 (πk/M )P k , which smoothly 23 switches the power to 0 as the frequency approaches its largest magnitude, |k| = M/2.
The mappingsẋ andẏ are the local tangents to the parametric curve, normalized by 25 the constant distance 2π/M between adjacent interpolation points. Derivatives of 26 parametric curves are related to curvature, which is constant for a non-invasive round 27 shape and varies over the boundary for an invasive structure. With a similar approach, 28 the transform ofẋ is 29 We therefore introduce the weight (M/π) 2 sin 2 (πk/M ). For low frequencies, 30 sin(πk/M ) ≈ πk/M , andx k → ikx k , the analogous result for continuous coordinates.

31
In this limit, the weighting factor is k 2 .

32
We combined these transforms and the normalization by the power of the first mode 33 to arrive at a weighted spectral power, w, defined as For small values of k, and defining the spacing 2π/M as λ, the leading terms of the 35 weighting and filtering factors are The normal mixture models are PLOS 3/7 The maximum likelihood parameters for these normal mixture models are The resulting model scores are For S 0 , all N observations contribute to the two parameters µ 0 andσ 2 0 . For S 1 , all N 43 observations contribute to estimatingσ 2 W , and N t observations contribute to each 44 estimatedμ t . For S 2 , N t observations contribute to eachμ t andσ 2 t . For tumors with a 45 single organoid, we used the shared variance estimate rather than the within-tumor Unbiased estimates of means are The notationμ rather thanμ indicates unbiased estimates rather than maximum 52 likelihood estimates. For means these estimates are identical, but for variances they are 53 different. The sums of squares Σ 0 for the total population, Σ M for the modeled 54 PLOS 4/7 between-tumor effects, Σ t for individual tumors, and Σ W for within-tumor are with Σ M + Σ W = Σ 0 . The unbiased estimates of variance for H 0 and H 1 , and the 56 variance σ 2 M from the modeled between-tumor effects, are The ANOVA test statistic, Q 1 , is Under the null hypothesis of equal means, µ 1 = µ 2 = . . . = µ T , Q 1 is a random variable 59 following an F -distribution, If the null hypothesis is rejected, then the invasiveness of an organoid may be 61 described as a random variable y composed of a between-tumor effect B and a 62 within-tumor effect W , with The unbiased estimatorσ 2 B is Of the total variance,σ 2 B is estimated to arise from between-tumor heterogeneity, and
Tumor means and organoid deviations from the mean are These transformations define two models, a between-tumor model for the means and a 70 within-tumor model for the deviations: Here we have represented the parameter β from the mixed effects model as two separate 72 parameters, β B for the between-tumor test and β W for the within-tumor tests. These 73 tests are conducted separately as follows.

74
For the between-tumor test, means and sums of squares are Under the null hypothesis that β B = 0, the test statisticQ 2 B ∼ F 1,T −1 , which for large 76 T is distributed approximately as χ 2 1 . Under the alternative hypothesis,Q 2 B is 77 distributed as a non-central χ 2 1 distribution whose expected value depends on the true 78 fraction of variance explained by the between-tumor model, Define the cumulative probability integral Φ(z) as PLOS 6/7 with Φ −1 (α) giving the value of z with lower-tail area α. A two-tailed test of β B = 0 81 with type I error controlled at level α corresponds to |Q B | > z I , with Given the true value Q B , the type II error rate (or false-negative rate) is Φ(z I − |Q B |), 83 with corresponding quantile The relationship between the type I and II error rates, the true effect size R B , and the 85 population size (number of tumors T ) is summarized as This expression depends explicitly on the number of tumors, T . It depends implicitly on 87 the number of organoids per tumor through the factor R 2 B , which is inversely 88 proportional to the variance of the estimated tumor-level invasiveness. This variance is 89 σ 2 B + σ 2 W /N t . We use R 2 B to represent the variance explained in the limit of large N t ,

90
and for simplicity assume that the number of organoids per tumor is the same for all 91 tumors, N t = N/T for all t. For a particular choice of N t , the variance explained is 92 reduced by the factor 1 + (σ 2 W /N t σ 2 B ). The explicit dependence of power on T and N t is 93 then where R 2 B represents the variance explained in the limit of large N t and perfect 95 knowledge of the tumor mean invasiveness.

96
The within-tumor test follows a similar pattern. The sums of squares and estimates 97 are In this case,Q 2 W ∼ F 1,N −T −1 . The relationship between error rates, effect size, and 99 population size is