Pairing statistics and melting of random DNA oligomers: Finding your partner in superdiverse environments

Understanding the pairing statistics in solutions populated by a large number of distinct, mutually interacting solute species is a challenging topic, relevant to modeling the complexity of real biological systems. Here we describe, both experimentally and theoretically, the formation of duplexes in a solution of random-sequence DNA (rsDNA) oligomers of length L = 8, 12, 20 nucleotides. rsDNA solutions are formed by $4^L$ distinct molecular species, leading to a variety of pairing motifs that depend on sequence complementarity and range from strongly bound, fully paired defectless helices to weakly interacting mismatched duplexes. Experiments and theory consistently reveal hybridization statistics characterized by a prevalence of partially defected duplexes, with a distribution of type and number of pairing errors that depends on temperature. We find that, despite the enormous multitude of inter-strand interactions, defectless duplexes are formed, involving a fraction of up to 15% of the rsDNA chains at the lowest temperatures. Experiments and theory are limited here to equilibrium conditions.

The solid-phase synthesis of 20N rsDNA was previously characterized by MALDI-TOF, as described in the S.I. of Ref. [1], where we found that the width of the mass distribution was compatible with the expectation for a random-sequence system.
In the context of this work, we also characterized the rsDNA oligomers by HPLC. HPLC experiments were performed at T = 40 °C on DNA solutions prepared at $c_{\mathrm{DNA}}$ = 0.1 g/l. In Fig A(a) we show HPLC traces of 12N compared with the single-strand oligomer 12ss (CTATGCCACCTA) and the self-complementary 12-mer DD (CGCGAATTCGCG). The main HPLC peaks of 12ss and DD are separated by much more than their width, and both occur within the broader range of the 12N peak profile. The broadness of the 12N peak is the signature of a variety of DNA sequences eluting at slightly different times because of both their different A, T, C and G contents and their base order. Note also that the center of the 12N trace coincides with the main peak of 12ss, which has $f_{CG}$ = 0.5. In Fig A(b) we show HPLC traces of the three rsDNA oligomers used in this work, 8N, 12N and 20N. All HPLC traces share the same shape, with their centers occurring at later times for longer oligomers, as expected.
Although HPLC and MALDI do not demonstrate the actual full randomness of the rsDNA synthesis, they clearly support the notion that the distribution of sequences is broad. Indeed, for the results and observations of this study to hold it is not necessary that the randomness is perfect, but rather that the distributions of defects are the same as in a random system, a condition that could be fulfilled even in a system with imperfect, but still broad, randomness.
It is interesting to compare the number of distinct molecular species in the rsDNA solutions with the actual number of molecules in the experimental samples. Among the systems considered, the most extreme diversity is that of 20N, where the number of sequences is $4^{20} \approx 1.1 \cdot 10^{12}$. Among the samples used in the experiments, the ones with the smallest total number of molecules are those used in melting experiments with the 1 cm cuvette, in which we use 1 mL of solution at a concentration of c = 0.04 g/l ≈ 6 µM. In this case the total number of molecules in the sample is $N \approx 4 \cdot 10^{16}$, thus granting ≈ $10^{4}$ replicas per molecular type.
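The comparison between sequence diversity and sample size can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: it assumes that all sequences are equally populated and takes the total number of molecules quoted in the text as an input rather than recomputing it. The function names are hypothetical.

```python
def n_species(length: int) -> int:
    """Number of distinct sequences of a given length over the 4-letter DNA alphabet."""
    return 4 ** length


def copies_per_species(n_molecules: float, length: int) -> float:
    """Average number of copies of each sequence, assuming equal populations."""
    return n_molecules / n_species(length)


# Diversity of the three rsDNA systems used in this work.
for L in (8, 12, 20):
    print(f"L = {L:2d}: {n_species(L):.3e} distinct sequences")

# Total number of molecules in the most dilute sample, as quoted in the text.
N_SAMPLE = 4e16
print(f"20N copies per sequence: {copies_per_species(N_SAMPLE, 20):.1e}")
```

With the quoted value of N, the 20N system ends up with a few times $10^{4}$ copies per sequence, consistent with the order of magnitude stated above.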

II. MEASUREMENT OF RSDNA CONCENTRATION
The absorbance A of a DNA solution is proportional to the oligomer concentration and to the path length [2]: $A = \varepsilon\, c_{\mathrm{DNA}}\, l$, where $l$ is the optical path length, $c_{\mathrm{DNA}}$ is the DNA molar concentration and $\varepsilon$ is the molar extinction coefficient of the DNA in the sample. The extinction coefficient $\varepsilon_i$ of a particular sequence $i$ is computed according to the NN model [2] from $\varepsilon_i(\mathrm{ext}_1)$ and $\varepsilon_i(\mathrm{ext}_2)$, which represent the contributions of the first and last bases of the DNA sequence, and from $\varepsilon_i(q)$, the extinction coefficients of the pairs of bases at positions $q$ and $q+1$ along the sequence $i$.
As introduced in the theoretical section, we consider mixtures of rsDNA oligomers in which all possible sequences of length L are equally populated; we therefore need the average $\langle \varepsilon_i \rangle_i$ of the extinction coefficient over all possible sequences in the rsDNA system. This average can be computed for rsDNA oligomers of generic length L using the NN parameters of Ref. [2]. For the rsDNA system of length L = 12, we get $\varepsilon_{12N} = 116041\ (\mathrm{cm\,M})^{-1}$. We characterize the concentration in melted conditions at high temperature, in order to avoid the hypochromicity term, which is unknown in the case of rsDNA hybridization involving pairing errors. We set $c_{\mathrm{DNA}}$ in the experimental samples by UV absorbance.
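As a minimal sketch of how the concentration follows from a measured absorbance, the snippet below inverts the proportionality between absorbance, concentration and path length, using the average 12N extinction coefficient quoted above. The absorbance value in the example is hypothetical and the function name is an assumption made for illustration.

```python
EPS_12N = 116041.0  # (cm M)^-1, average extinction coefficient of 12N quoted in the text


def molar_concentration(absorbance: float, path_length_cm: float,
                        epsilon: float = EPS_12N) -> float:
    """Molar concentration from A = epsilon * c * l (Beer-Lambert law)."""
    if path_length_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    return absorbance / (epsilon * path_length_cm)


# Hypothetical reading: A = 0.5 in the 10 um microfluidic cell (10 um = 1e-3 cm).
c = molar_concentration(0.5, 10e-4)
print(f"c = {c * 1e3:.2f} mM")
```

Converting this molar concentration to g/l would additionally require the average molar mass of the oligomers, which is not given in this section.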

III. UV ABSORBANCE: EXPERIMENTAL SETUP
To perform melting experiments at high DNA concentration, two technical upgrades were necessary: 1) a quartz microfluidic cell (shown in Fig B) with path length l = 10 µm, which allowed us to increase $c_{\mathrm{DNA}}$ up to 20-30 g/l; 2) a Quantum Northwest Peltier hot/cold stage (shown in Fig B) combined with a metal T-shaped cell holder, which allowed us to explore a wider temperature range, T = 0-90 °C. Indeed, access to the lowest temperature range is crucial for the analysis of most melting curves, as explained below. Evaporation was prevented by sealing the two nozzles of the cell with EPDM corks.
Because of the complexity of the new cell holder and the need for accurate T measurements, we carefully calibrated the thermostatic system. With repeated heating and cooling ramps in the interval -6 °C to 100 °C at 1 °C/min, we measured T with a thermistor in contact with the microfluidic cell as a function of the inner Peltier probe temperature, $T_{\mathrm{peltier}}$, as shown in Fig B. Melting experiments on dilute DNA solutions ($c_{\mathrm{DNA}}$ = 0.04 g/l) were performed using a standard quartz cuvette hosted in the same cell holder. In this case T was measured directly by a thermistor immersed in the DNA solution. In dilute melting experiments, the solutions in the cuvette were covered with mineral oil to prevent evaporation. In all experiments a constant nitrogen flux was used to prevent condensation on both the cuvette and the microfluidic cell at low T.
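One simple way to use such a calibration is to store the measured pairs of Peltier probe and thermistor temperatures and interpolate between them. The sketch below is an illustration under assumptions, not the procedure actually used here: it interpolates piecewise-linearly, ignores any difference between heating and cooling ramps, and uses made-up calibration points.

```python
from bisect import bisect_right


def make_calibration(t_peltier, t_cell):
    """Return a function mapping the Peltier probe temperature to the cell temperature.

    Uses piecewise-linear interpolation between calibration points (an assumption;
    the functional form of the real calibration is not specified in the text).
    """
    pairs = sorted(zip(t_peltier, t_cell))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    if len(xs) < 2 or len(set(xs)) != len(xs):
        raise ValueError("need at least two distinct calibration temperatures")

    def cell_temperature(tp):
        if not xs[0] <= tp <= xs[-1]:
            raise ValueError("temperature outside the calibrated range")
        # Index of the right end of the interval that contains tp.
        i = min(bisect_right(xs, tp), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (tp - x0) / (x1 - x0)

    return cell_temperature


# Hypothetical calibration points (degrees Celsius), for illustration only.
to_cell = make_calibration([-6.0, 20.0, 50.0, 100.0], [-4.5, 20.0, 48.5, 96.0])
print(to_cell(35.0))
```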

IV. ANALYSIS OF UV ABSORBANCE DATA
To analyze and fit the absorbance data A(T), we first applied a Savitzky-Golay smoothing filter (sgolayfilt Matlab function) with polynomial order 3. Fig Da shows such curves for several heating and cooling ramps performed consecutively on the same rsDNA solution. It is evident that no evaporation occurs and that the experiments are reproducible. Secondly, each smoothed A(T) is fitted following standard protocols [3] in order to remove the contribution of the absorbance drift at high temperature, $A_{HT}(T)$, and at low temperature, $A_{LT}(T)$, so as to extract the fraction of rsDNA single strands θ: $\theta(T) = \frac{A(T) - A_{LT}(T)}{A_{HT}(T) - A_{LT}(T)}$. Although our aim is only to remove the high- and low-T drifts, the fit needed to determine them requires adopting a specific functional shape for θ(T) and for the drifts $A_{HT}(T)$ and $A_{LT}(T)$; the experimental absorbance data are then fitted by $A(T) = \theta(T)\,A_{HT}(T) + [1 - \theta(T)]\,A_{LT}(T)$.
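The drift-removal step can be illustrated with a simplified sketch. Note that it is not the procedure described above: instead of fitting the whole curve with a functional shape for θ, it assumes linear drifts and fits them separately on hand-picked low-T and high-T windows, then rescales the absorbance between the two baselines. The smoothing step is omitted, the data are synthetic, and all names and window limits are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = m*x + q; returns (m, q)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, my - m * mx


def unpaired_fraction(T, A, low_T_max, high_T_min):
    """Fraction of single strands from A(T), assuming linear low-T and high-T drifts."""
    low = [(t, a) for t, a in zip(T, A) if t <= low_T_max]
    high = [(t, a) for t, a in zip(T, A) if t >= high_T_min]
    if len(low) < 2 or len(high) < 2:
        raise ValueError("each baseline window needs at least two points")
    m_lt, q_lt = linear_fit(*zip(*low))    # low-temperature drift A_LT(T)
    m_ht, q_ht = linear_fit(*zip(*high))   # high-temperature drift A_HT(T)
    theta = []
    for t, a in zip(T, A):
        a_lt = m_lt * t + q_lt
        a_ht = m_ht * t + q_ht
        theta.append((a - a_lt) / (a_ht - a_lt))
    return theta


# Synthetic melting curve (illustration only): sigmoidal transition centred at 45 C.
def synthetic_absorbance(t):
    th = 1.0 / (1.0 + math.exp((45.0 - t) / 3.0))
    return th * (0.0010 * t + 1.20) + (1.0 - th) * (0.0005 * t + 1.00)


T = [0.5 * i for i in range(181)]          # 0 to 90 C in 0.5 C steps
A = [synthetic_absorbance(t) for t in T]
theta = unpaired_fraction(T, A, low_T_max=15.0, high_T_min=75.0)
print(f"theta at 45 C: {theta[90]:.3f}")
```

Fitting the baselines on separate windows only works when the curve is flat enough at both ends, which is one reason why access to the lowest temperatures matters for broad transitions.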

V. CHARACTERIZATION OF A* AND B* FLUORESCENCE
Here we describe the fluorescence signal that enabled us to detect the hybridization of DNA strands labeled with the fluorophores FAM and TexasRed, and discuss why we identify it as Contact-Mediated Quenching. We studied the absorption and fluorescence emission spectra of TexasRed and FAM linked to the complementary DNA strands A and B, respectively, in solutions having different compositions: A*, B*, A*+B*, A*+B and A+B*, where the asterisk marks the presence of the fluorescent tag on the molecule. Fig E shows the fluorescence intensity $I_f$ measured with a fluorimeter in the range from λ = 470 nm to λ = 700 nm, with excitation at $\lambda_{\mathrm{FAM}}$ = 492 nm and $\lambda_{\mathrm{TexasRed}}$ = 593 nm. We find the emission peaks of the two fluorophores where expected: at λ ≈ 520 nm for FAM and λ ≈ 610 nm for TexasRed. Although the two fluorophores would be suitable for FRET, with TexasRed emission following FAM excitation, here the dominant effect is the quenching of both fluorophores upon formation of the A*B* duplex. Indeed, Fig E shows that: i) the pairing of fluorophore-conjugated strands with untagged strands leaves the fluorescent emission unchanged, as is apparent when comparing the emission from the A* and A*B solutions, and from the B* and AB* solutions; ii) the formation of A*B* duplexes has a dramatic effect on $I_f$ of both fluorophores. Upon A*B* duplexing, the emission of TexasRed, excited at $\lambda_{\mathrm{TexasRed}}$ = 593 nm, is reduced to about half of the signal of A* or A*B. Analogously, the emission of FAM, excited at $\lambda_{\mathrm{FAM}}$ = 492 nm, is also reduced to approximately half of the signal of B* or AB*. We exploit this marked change in $I_f$ to obtain information on the formation of A*B* duplexes when immersed in the sea of competitive interactions of rsDNA.
To gain understanding of the nature of the phenomenon, we measured the absorption spectra of these systems. In Fig F we can appreciate that A*B* duplexing partially modifies the absorbance of both fluorophores, but not enough to justify the drop in emission described above. In particular, the A*B* absorbance is higher than that of B* and AB*.
A useful clue for the explanation of the A*B* fluorescence quenching can be found when considering the sizes of the involved molecules: in Fig G we

VI. CONTACT-QUENCHING DATA ANALYSIS
An example of the raw fluorescence signal of a CQ experiment performed with a qPCR machine is shown in Fig Ha, where we plot the fluorescence intensity $I_f$ obtained by measuring 5 replicas (in 5 different wells) of the same sample, with 2 cooling ramps for each. While the two ramps are similar for each replica, the amplitude of the collected signal differs markedly between replicas. This might be due to different sensitivities of the […] Fig Hb. The difference between normalized curves enables evaluating the standard deviation of the errors on this signal. Since the measurement of $N_f$ becomes unstable above 70 °C, we excluded this region from the data analysis.
Having characterized CQ in solutions of A* and B*, we explored $N_f$ in solutions where only a fraction of the molecules carries a fluorescent tag. We thus prepared mixtures with a fixed concentration of A* ($c_{A^*}$ = 100 nM) and concentrations of B* and B such that $c_{B^*} + c_B = c_{A^*}$. In this way, at low T all the A* are paired with either B or B*, depending on the fraction $f_{B^*} = c_{B^*}/c_{A^*}$. Fig I shows $N_f$ for TexasRed (the fluorophore of A*) at T = 20 °C measured for various values of $f_{B^*}$ ranging from 0 (A*B duplexes only) to 1 (A*B* duplexes only). We find $N_f$(T = 20 °C) to depend linearly on $f_{B^*}$, as expected from the summation of the fluorescent emissions of individual duplexes. This behaviour enables determining, through CQ, the fraction $\theta_{A^*B^*}$ of paired A*B* in the rsDNA competitive environment as $\theta_{A^*B^*} = \frac{N_f^{(u)} - N_f}{N_f^{(u)} - N_f^{(p)}}$, where $N_f^{(p)}$ and $N_f^{(u)}$ are the reference signals for the paired and unpaired A*B*. The reference signals are obtained from a fit in which $N_{LT}(T) = m_{LT} T + q_{LT}$ and $N_{HT}(T) = m_{HT} T + q_{HT}$ are the linear drifts at low and high temperatures, respectively, and $\Theta_{A^*B^*}(T)$ has the functional shape expected for DNA melting curves, where H and S are here fitting parameters, while in the melting curve expression they depend on the enthalpic and entropic contributions to the duplex stability, on the DNA concentration and on other numerical parameters.
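Because the normalized signal is a linear combination of the quenched and unquenched emissions, the paired fraction follows from a simple rescaling between the two reference levels. The sketch below illustrates this under the assumption of strictly additive emissions; the reference values are hypothetical (chosen to mimic the roughly twofold quenching reported above) and the function names are not from the original analysis.

```python
def paired_fraction(n_f: float, n_paired: float, n_unpaired: float) -> float:
    """Fraction of A* quenched in A*B* duplexes, from the normalized signal n_f.

    n_paired and n_unpaired are the reference signals for fully paired and
    fully unpaired A*B* at the same temperature.
    """
    if n_paired == n_unpaired:
        raise ValueError("reference signals must differ")
    return (n_unpaired - n_f) / (n_unpaired - n_paired)


def mixed_signal(f_b_star: float, n_paired: float, n_unpaired: float) -> float:
    """Signal of A* fully paired with a B*/B mixture, assuming additive emissions.

    A*B duplexes are taken to emit like unquenched A*.
    """
    return f_b_star * n_paired + (1.0 - f_b_star) * n_unpaired


# Hypothetical reference levels: quenching halves the emission.
N_P, N_U = 0.5, 1.0
for f in (0.0, 0.3, 1.0):
    signal = mixed_signal(f, N_P, N_U)
    print(f, signal, paired_fraction(signal, N_P, N_U))
```

In this idealized case the recovered fraction coincides with $f_{B^*}$, which is the linearity observed in the mixing experiment.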
From the fitting procedure we extract $\theta_{A^*B^*}$. We thus determined the correct ΔH and ΔS for the A*B* duplex by fitting the data with the general expression for complementary strands at the same concentration c [5]. Given the significant stabilizing effect of the terminal fluorophores in A*B*, it is reasonable to assume that any pair involving a strand A* or B* and an rsDNA oligomer will also have a modified energetic contribution due to the presence of the fluorophores. We thus assume that the pairing free energy $\Delta G_{A^*B^*}$ of A*B* differs from that of AB, $\Delta G_{AB}$, because of the distinct contributions of the two fluorophores, $\Delta G_{\mathrm{TexasRed}}$ and $\Delta G_{\mathrm{FAM}}$: $\Delta G_{A^*B^*} = \Delta G_{AB} + \Delta G_{\mathrm{TexasRed}} + \Delta G_{\mathrm{FAM}} = \Delta G_{AB} + 2\Delta G_{\mathrm{fluo}}$.
We also assume the effect of the two fluorophores to be equal: $\Delta G_{\mathrm{TexasRed}} = \Delta G_{\mathrm{FAM}} = \Delta G_{\mathrm{fluo}}$. In line with these assumptions, the pairing free energy of A* or B* with an rsDNA strand includes the contribution of a single fluorophore and equals $\Delta G^{(\alpha)}_{f_{CG}} + \Delta G_{\mathrm{fluo}}$, where $\Delta G^{(\alpha)}_{f_{CG}}$ is the pairing energy computed as described in the main text, with α the pairing quality of the duplex that A* or B* is forming with the rsDNA oligomer, and $f_{CG}$ is the fraction of C or G bases of the labeled DNA strands.
Figure caption: $T_m$ vs A*B* concentration measured by CQ for the following systems (dots): 8A*-8B* in 150 mM NaCl (red), 8A*-8B* in 1 mM NaCl (green), 12A*-12B* in 150 mM NaCl (light blue), 12A*-12B* in 1 mM NaCl (purple). Dash-dot lines and shaded regions are the best fits and their uncertainties from Eq. (13). Dotted lines: $T_m$ predicted by the Nearest Neighbor model using the energetic parameters for the AB duplex.
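The free-energy bookkeeping implied by these assumptions can be sketched as follows. The numerical values are hypothetical and only illustrate the arithmetic: the per-fluorophore contribution is taken as half of the difference between the labeled and unlabeled duplex free energies, and is then added once to the pairing energy of a labeled strand with an rsDNA oligomer. Function names are assumptions made for this example.

```python
def fluorophore_contribution(dg_labeled_duplex: float, dg_unlabeled_duplex: float) -> float:
    """Per-fluorophore term from dG(A*B*) = dG(AB) + 2 dG_fluo (equal-fluorophore assumption)."""
    return (dg_labeled_duplex - dg_unlabeled_duplex) / 2.0


def labeled_rsdna_pairing(dg_alpha: float, dg_fluo: float) -> float:
    """Pairing free energy of A* or B* with an rsDNA strand: one fluorophore contributes."""
    return dg_alpha + dg_fluo


# Hypothetical free energies in kcal/mol, for illustration only.
DG_AB = -10.0        # unlabeled AB duplex
DG_ASTAR_BSTAR = -12.0  # labeled A*B* duplex, stabilized by the two fluorophores

dg_fluo = fluorophore_contribution(DG_ASTAR_BSTAR, DG_AB)
print(dg_fluo)                                # contribution of a single fluorophore
print(labeled_rsdna_pairing(-6.0, dg_fluo))   # labeled strand with a partially matched rsDNA strand
```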