Aptamer-based multiplexed proteomic technology for biomarker discovery.

BACKGROUND
The interrogation of proteomes ("proteomics") in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology and medicine.


METHODOLOGY/PRINCIPAL FINDINGS
We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay measures 813 proteins with low limits of detection (1 pM median), 7 logs of overall dynamic range (~100 fM-1 µM), and 5% median coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding signature of DNA aptamer concentrations, which is quantified on a DNA microarray. Our assay takes advantage of the dual nature of aptamers as both folded protein-binding entities with defined shapes and unique nucleotide sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to rapidly discover unique protein signatures characteristic of various disease states.


CONCLUSIONS/SIGNIFICANCE
We describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.

Interrogation of the human proteome in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology. We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay allows us to measure ~800 proteins with very low limits of detection (1 pM average), 7 logs of overall dynamic range, and 5% average coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding DNA aptamer concentration signature, which is then quantified with a DNA microarray. In essence, our assay takes advantage of the dual nature of aptamers as both folded binding entities with defined shapes and unique sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to discover unique protein signatures characteristic of various disease states. More generally, we describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.
Proteins present in blood are an immediate measure of an individual's phenotype and state of wellness. Secreted proteins, released from diseased cells and surrounding tissues, contain important biological information with the potential to transform early diagnostic, prognostic, therapeutic, and even preventative decisions in medicine.
We will realize the full power of proteomics only when we can measure and compare the proteomes of many individuals to identify biomarkers of human health and disease and track the blood-based proteome of an individual over time. Because the human proteome contains an estimated 20,000 proteins, plus post-translational variants, that span a concentration range of ~12 logs, there is great technical difficulty in identifying and quantifying valid biomarkers. Proteomic measurements demand extreme sensitivity, specificity, dynamic range, and accurate quantification.
The desire to profile the changes in protein expression at large scale is not new.
Attempts at high-content proteomics began with 2-D gels and now mostly employ mass spectrometry (MS) and antibody-based technologies 1 . MS can deliver specific analytical capabilities, but its sensitivity is limited typically to nM protein concentrations which leaves much of the proteome in plasma undetected. Techniques like Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA) and Multiple Reaction Monitoring (MRM) can be more sensitive, but are still limited to tens of protein measurements 2 . In addition, problems of cost, throughput, and reproducibility remain a challenge. Due to these limitations, MS biomarker studies cannot yet be efficiently scaled to measure with sufficient sensitivity thousands of proteins in thousands of samples, and such studies therefore miss the greatest opportunity for discovery.
In contrast to 2-D gels and MS, antibody-based methods are much more sensitive and can be used to detect analytes in the sub-nM range. This is enabled by the high affinity of antibodies for their targets which is generally in the nM to pM range.
However, non-specific binding of antibodies to non-cognate proteins, other macromolecules, and surfaces requires the use of sandwich-type assays where the second antibody contributes to enhanced specificity through an independent binding event. In other words, technologies such as Enzyme-Linked Immuno-Sorbent Assays (ELISAs) attain high sensitivity by combining the specificity of two different antibodies to the same protein, requiring that both bind to elicit a signal 1 . Although broadly used in single-analyte tests, it has recently become clear that such assays cannot be multiplexed above a few tens of simultaneous measurements 3,4 in large part because cross-reactivity of secondary antibodies to surface-immobilized proteins (including primary antibodies) dramatically erodes specificity 1 . This inherent characteristic compromises the performance of antibody-based arrays including printed antibodies, sandwich formats, and bead-based arrays 1,5 . A recently reported proximity ligation assay that relies on antibody sandwich formation in solution followed by ligation of antibody-tethered nucleic acids and PCR amplification has been multiplexed with six analytes 3 .
Given these challenges, we set out to develop a proteomics array technology analogous to the highly-successful nucleic acid hybridization microarray. To create this technology, we developed a new class of DNA-based aptamers enabled by a versatile chemistry technology that endows nucleotides with protein-like functional groups.
These modifications greatly expand the repertoire of targets accessible to aptamers. The resulting technology provides efficient, large-scale selection of exquisite protein binding reagents selected specifically for use in highly-multiplexed proteomics arrays. Here we present the development of these unique reagents in the context of our high-content, high-performance, low-cost proteomics array, and demonstrate the potential of the platform to identify biomarkers from clinically-relevant samples.
Aptamers are a class of nucleic acid-based molecules discovered twenty years ago 6,7 and have since been employed in diverse applications including therapeutics 8 , catalysis 9 , and now proteomics. Aptamers are short single-stranded oligonucleotides that fold into diverse and intricate molecular structures that bind with high affinity and specificity to proteins, peptides, and small molecules [10][11][12] . Aptamers are selected in vitro from enormously large libraries of randomized sequences by the process of Systematic Evolution of Ligands by EXponential enrichment (SELEX) 6,7 . A SELEX library with 40 random sequence positions has 4^40 (~10^24) possible combinations, and a typical selection screens 10^14 to 10^15 unique molecules. This is on the order of 10^5 times larger than standard peptide or protein combinatorial molecular libraries 13 .
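The library-size arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name fraction_sampled is hypothetical, and the calculation assumes that every molecule in a selection carries a distinct sequence, which gives an upper bound on coverage.

```python
# Sketch: size of the sequence space for a 40-position random region
# versus the number of molecules screened in one selection (values from the text).
random_positions = 40
bases = 4  # four nucleotides per position
sequence_space = bases ** random_positions  # 4^40, roughly 1.2e24

def fraction_sampled(n_molecules, space):
    """Upper bound on the fraction of sequence space covered,
    assuming every screened molecule has a unique sequence."""
    return n_molecules / space

for screened in (10**14, 10**15):
    print(f"{screened:.0e} molecules cover at most "
          f"{fraction_sampled(screened, sequence_space):.1e} of {sequence_space:.2e} sequences")
```

Even at the upper end of the screened range, a single selection therefore samples well under one billionth of the possible 40-mers.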
Based on the collective knowledge of the aptamer field that has developed since its inception 6,7 , we hypothesized that aptamers could make exceptional reagents for high-content proteomics. There were many examples of high affinity RNA and DNA aptamers selected against human proteins 12 . However, there were also examples of difficult protein targets for which standard RNA and DNA SELEX did not yield high affinity aptamers. With two key innovations, we created a new class of aptamer, the Slow Off-rate Modified Aptamer (SOMAmer), which enabled efficient selection of high-affinity aptamers for almost any protein target.
The first innovation was motivated by the idea that aptamers can be endowed with protein-like properties by adding functional groups that mimic amino acid side-chains to expand their chemical diversity 14 . Eaton and colleagues developed the technology to efficiently synthesize nucleotides modified with diverse functional groups and to utilize them in SELEX 14,15 . This innovation was used to select catalysts, including the first RNA-catalyzed carbon-carbon bond formation 9,16 . Building on this work, we developed modified deoxyribonucleotides and SELEX methods 17 to select modified DNA aptamers from libraries that incorporate one of four dUTPs modified at the 5-position ( Fig. 1a and Supplementary Information (SI)).
To test whether modified nucleotides improve SELEX, we compared selections with modified and unmodified nucleotides targeting thirteen "difficult" human proteins that repeatedly failed SELEX with unmodified DNA. As a control, we included GA733-1 protein, which had yielded high-affinity aptamers with unmodified DNA SELEX. The results (Supplementary Table 1) show that only SELEX with modified nucleotides yielded high-affinity aptamers to these difficult proteins. It is worth noting that, depending on the protein, certain modifications worked better than others (Supplementary Table 1), illustrating the benefit of applying multiple modifications against the same target to ensure a high probability of success. Based on these results, we adopted modified nucleotide SELEX exclusively in our standard selections. To date, we have selected high-affinity aptamers (with most Kd values lower than nM, see
The second innovation was a solution to the principal challenge of identifying a second element of specificity beyond binding of a second ligand for use in high-content arrays. Inspired by classic kinetic theory of specific binding in complex mixtures 18,19 , we employed kinetic manipulations to help overcome the problem of non-specific SOMAmer-protein binding. To achieve this second element of specificity, we selected for aptamers with slow dissociation rates (t1/2 > 30 min, Fig. 1c) that allow selective disruption of non-specific (or non-cognate) binding interactions by using a large excess of a polyanionic competitor. This kinetic challenge works well for two reasons. First, dissociation rates of non-cognate SOMAmer-protein interactions are generally much faster (half-lives of a few minutes or less). Second, since all aptamers are polyanions, another polyanion at high concentration (e.g., dextran sulfate) can serve as a common competitor that dramatically minimizes rebinding events in a multiplex assay.
In contrast, a common non-denaturing competitor of all antibody-antigen interactions or, more generally, protein-protein interactions, is not known.
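The effect of the kinetic challenge can be illustrated with a simple first-order decay calculation. The sketch below is not the authors' model: it assumes single-exponential dissociation with rebinding fully blocked by the competitor, and the challenge duration (15 min) and the non-cognate half-life (2 min) are hypothetical values chosen only to match the ranges quoted in the text.

```python
def fraction_bound(t_min, half_life_min):
    """Fraction of complexes still intact after t_min minutes,
    assuming first-order dissociation and no rebinding."""
    return 0.5 ** (t_min / half_life_min)

challenge_min = 15.0  # hypothetical challenge duration
cognate = fraction_bound(challenge_min, 30.0)     # t1/2 = 30 min, the lower bound from the text
noncognate = fraction_bound(challenge_min, 2.0)   # hypothetical "few minutes" half-life
enrichment = cognate / noncognate

print(f"cognate remaining:     {cognate:.3f}")
print(f"non-cognate remaining: {noncognate:.4f}")
print(f"enrichment factor:     {enrichment:.0f}")
```

Under these assumptions about 71% of cognate complexes survive the challenge while fewer than 1% of non-cognate complexes do, which is the kind of selective disruption the paragraph describes.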
The current array measures 813 human proteins (Supplementary Table 2).
These proteins represent a wide range of sizes, physicochemical properties (e.g., pI range of 4-11 as shown in Fig. 1d), and biological functions from a variety of molecular pathways and gene families (Fig. 1e). Thus, SOMAmer technology enables an efficient and scalable pipeline to generate unbiased content for proteomics arrays.
To create our high-content proteomics discovery array, we developed a novel assay ( Fig. 2) which transforms a complex proteomic sample (e.g., plasma, serum, conditioned media, cell lysates, etc.) into a quantified protein signature. The assay leverages equilibrium binding and kinetic challenge 1 . Both are carried out in solution, not on a surface, to take advantage of more favorable kinetics of binding and dissociation 1 . Briefly, the sample is incubated with a mixture of SOMAmers each containing a biotin, a photocleavable group, and a fluorescent tag followed by capture of all SOMAmer-protein complexes on streptavidin beads (catch-1) (Fig. 2a, 2b-1,2).
After stringent washing of the beads to remove unbound proteins and labeling of bead-associated proteins with biotin under controlled conditions (Fig. 2b-3), the complexes are released from the beads back into solution by UV light irradiation and diluted into a high concentration of dextran sulfate, an anionic competitor. Note that the biotin that was originally part of the SOMAmer now remains on beads. The anionic competitor coupled with dilution selectively disrupts non-cognate complexes (see Fig. 3a) and since only the proteins now contain biotin, the complexes are re-captured on a second set of beads (catch-2) from which unbound SOMAmers are removed by a second stringent washing (Fig. 2b-5). The SOMAmers that remain attached to beads are eluted under high pH-denaturing conditions and hybridized to sequence-specific complementary probes printed on a standard DNA microarray (Fig. 2b-6,7).
The result is a mixture of SOMAmers that quantitatively reflects protein concentrations in the original sample. The modified nucleotides in SOMAmers are designed to maintain canonical base-pairing 17,20 (in a DNA duplex, adducts at the 5-position of pyrimidines are directed toward the major groove of DNA) and hybridize effectively to unmodified DNA oligonucleotides on the array (this, of course, is also required for replication during SELEX). Thus, our assay takes advantage of the dual nature of aptamers as molecules capable of both folding into complex three-dimensional structures, which is the basis of their unique binding properties, and hybridization to specific capture probes.
The assay uses one SOMAmer per analyte rather than a sandwich of binding reagents and thus depends on equilibrium binding and kinetics for specificity. A key contribution to specificity in the assay is the difference in dissociation rates between cognate and non-cognate interactions, as illustrated in Fig. 3a. This translates to substantial enrichment in the specific signal following kinetic challenge and the two-bead capture steps.
It is worth noting that the use of sequential capture of protein-SOMAmer complexes on two sets of streptavidin beads, first through biotin-labeled SOMAmers (catch-1) and then through biotin-labeled proteins (catch-2), substantially reduces nonspecific interactions. As shown in Fig. 3b, eluate from catch-1 beads generally contains the target protein as well as several other proteins that bind SOMAmers nonspecifically. Eluate from catch-2 beads contains only the target protein in substantially pure form, along with its cognate SOMAmer (for these experiments, reversible protein attachment to monomeric avidin catch-2 beads is used) (SI). This is likely due, in part, to a reduction in the amount of total protein following catch-1 bead washing (only SOMAmer-bound or surface-bound proteins remain) as well as to release and recapture of complexes on separate beads in a reversed orientation (attachment through biotin on proteins).
Finally, capture of SOMAmers on a hybridization array permits quantitative determination of the protein present in the original sample by converting the assay signal (relative fluorescence units, RFUs) to analyte concentration (Fig. 3c). These results show that specific SOMAmer-protein interactions can be detected efficiently in highly complex mixtures like serum or plasma.
With this format, we achieved our goal of developing a high-content, high-performance proteomics technology to power biomarker discovery in human disease.
To assess the quantitative performance of the technology, we determined reproducibility and limits of quantification (LOQ). The assay analyzes 96 samples per run. We collected serum samples from 18 healthy volunteers and assayed five replicates of each sample in a single run and repeated this three times. The results show an overall low median CV of ~5% for intra-run and inter-run CV. We also determined the LOQ values of a representative subset (356) of target proteins in the context of all 813 SOMAmers (SI). The median lower limit of quantification (LLOQ) was ~1 pM, with LLOQs as low as 100 fM for some proteins, a median upper limit of quantification (ULOQ) of ~1.5 nM, and a median range of quantification (ROQ) of >3 logs. We found consistent performance in serum for proteins with low endogenous concentrations when titrated into 10% serum and plasma (SI). Overall, we achieve an ROQ for all proteins in a sample of ~7 logs (~100 fM to 1 µM) with three sample dilutions that span ~2.5 logs.
The content of the discovery array is flexible and highly scalable, permitting us to continue adding content as our SOMAmer menu increases. This highly multiplexed technology therefore has the requisite reproducibility, sensitivity and range for high-content proteomics studies and unbiased biomarker discovery.
To demonstrate the utility of the platform in discovery of disease-related biomarkers, we analyzed plasma from subjects with chronic kidney disease (CKD), the slow loss of kidney function over time. CKD is a recently recognized global public health problem that is "common, harmful, and treatable" with an estimated prevalence of nearly 10% worldwide 21 . Early intervention in CKD can substantially improve prognosis, which is otherwise poor 21-24 . To achieve early diagnosis, predictive and non-invasive CKD biomarkers are needed. Such markers also would be useful for monitoring disease progression and guiding treatment 21-24 .
We chose CKD as a test case because kidney physiology provides filtration of serum molecules based on size (molecular mass) and charge 25 ; thus CKD might lead to an increase in the concentration of small proteins (MW <45 kDa). Disease progression is expected to be accompanied by an overall increase in plasma concentration of small proteins.
We obtained and analyzed plasma samples from 42 subjects with CKD. Eleven subjects had early-stage CKD based on estimated GFR (eGFR, defined as stages 1-2, median creatinine clearance 70 ml/min/m², range 62-97 ml/min/m²) and 31 had late-stage CKD (stages 3-5, median creatinine clearance 25 ml/min/m², range 7-49 ml/min/m²) 26 . We measured 614 human proteins (array size at the time analyses were conducted) simultaneously for each sample and compared the results of early- to late-stage CKD (Fig. 4a).
We identified 60 proteins that varied significantly between the two groups, using the Mann-Whitney test, with a q-value (false discovery rate-corrected p-value) of 4.2 × 10^-4 (Supplementary Table 10). Eleven proteins with the most highly significant variation (q-values <3.5 × 10^-7) are highlighted in Fig. 4a and shown in Table 1. Nine out of eleven are relatively small proteins (<25 kDa). For all eleven proteins, there is an inverse correlation between eGFR and protein concentration (Fig. 4b), which supports the notion that these proteins are biomarkers for CKD progression. It is also worth noting that two of the eleven proteins, cystatin C and β2-microglobulin, are important known biomarkers of CKD 22-24 and two additional proteins, complement factor D and TNF sR-I, have been reported to have elevated concentrations in CKD 27,28 .
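The statistical comparison described above can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the p-value uses the normal approximation to the Mann-Whitney U statistic (tie-corrected, no continuity correction), the q-values use the Benjamini-Hochberg procedure (the text does not say which false discovery rate estimate was used, so this is an assumption), and the protein measurements are invented placeholder numbers.

```python
import math

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    q, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from largest p to smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        q[idx] = running_min
    return q

# Hypothetical RFU values for two proteins in early- vs late-stage groups.
early = {"protein_A": [100, 110, 95, 105, 98], "protein_B": [50, 62, 55, 58, 51]}
late = {"protein_A": [180, 210, 195, 170, 220], "protein_B": [52, 60, 57, 49, 61]}
names = list(early)
pvals = [mann_whitney_p(early[n], late[n]) for n in names]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(name, f"p={p:.4f}", f"q={q:.4f}")
```

With groups as small as these, an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.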
Accumulation in plasma of some small proteins appears to be a major change in the proteome. However, the concentration of many low molecular weight proteins did not change appreciably with disease progression (Fig. 4c); pI also was uncorrelated with an increase in plasma concentration as a function of eGFR (data not shown). The surprising fact that the biomarkers are not simply ranked according to their molecular masses shows that reduced kidney function is complex. The accumulation of some (but not all) low molecular weight proteins, sometimes called "middle molecules", in plasma of patients with impaired renal filtration has long been implicated in the pathology of kidney disease 29 . High-content proteomic analysis provides a means of unbiased discovery of such proteins and their relationship to disease progression. This example demonstrates our ability to discover biomarkers to build diagnostic signatures of disease states for which there is an important medical need 21-24 .
Combining multiple biomarkers might create a high resolution picture of CKD to help develop diagnostic tools.
In conclusion, we have presented the first highly multiplexed and efficient aptamer-based proteomics array technology that simultaneously measures large numbers of proteins ranging from low to high abundance in serum. In CKD, we have identified a multitude of biomarkers with large differences in concentration between early- and late-stage disease. Therefore, these biomarkers represent good candidates for use (alone or in combination) in diagnostic tests for CKD progression. A study of more than 500 additional patients at risk for cardiovascular disease (whose eGFRs were also determined) confirmed and extended the biomarkers associated with reduced filtration in this first CKD study (data not shown).
We have also conducted clinical studies in which no biomarkers have emerged.
For example, in our prospective multicenter breast cancer study, we compared plasma proteomic signatures measuring 813 proteins (current array) of 336 women with suspicious mammogram findings. Based on breast biopsy results, 32 women had ductal carcinoma in situ, 57 had invasive breast cancer and 247 had benign disease. There were no statistically significant differences in the proteomic profile among these three groups. These results, while disappointing, demonstrate that biomarkers are not identified by chance merely because thousands of measurements are made. Of course, it is possible that with a larger array, new biomarkers of breast cancer will emerge.
Our experience to date suggests that in most cases, biomarker discovery using our technology lies somewhere between these extremes, with potentially useful biomarkers identified in many critical medical areas including cancer, cardiovascular conditions, neurological disorders, and infectious diseases. Frequently, the distributions of biomarker concentrations in two populations overlap considerably, which creates the impetus for combining multiple biomarkers to achieve the most accurate diagnosis. In an accompanying paper 30 , we report the first large-scale application of our technology to discover and verify a novel biomarker panel for a major medical condition, lung cancer, in one of the largest and most comprehensive proteomic biomarker studies to date.

METHODS SUMMARY
We used rationally-designed, chemically-modified nucleotides to create a new class of aptamer, Slow Off-rate Modified Aptamers (SOMAmers), to use as protein binding reagents. We developed new SELEX methods, selected high affinity SOMAmers for >800 human proteins, and developed a highly multiplexed protein affinity assay that uses standard DNA quantification technologies as the final readout. We used affinity capture to demonstrate the specificity of SOMAmers for their target proteins. Assay reproducibility was measured with multiple technical replicates of serum and plasma.
We determined the limits and range of quantification of target proteins with six-point, multiplexed standard curves of purified proteins in buffer. To demonstrate the utility of our new proteomics platform to discover potential biomarkers, we profiled 614 proteins in plasma samples from subjects with early-stage and late-stage CKD. The final readout was a custom DNA microarray. We compared the resulting measurements with the non-parametric Mann-Whitney U Test because the data were non-normally distributed ordinal variables. For significance, we used an alpha cutoff of 4.3 × 10^-4 for the q-value (p-value corrected for multiple comparisons using a false discovery rate estimate).
Sixty of the 614 measured proteins were significantly different between early- and late-stage CKD subjects.

Supplementary Information
Aptamer-based multiplexed proteomic technology for biomarker discovery

SELEX with modified nucleotides
In order to select SOMAmers with the novel modified nucleotides described above, we developed new methods to incorporate modified nucleotides into the SELEX process. This included the synthesis of random libraries with modified nucleotides and the enzymatic amplification of SELEX pools that contain modified nucleotides. Several enzymes were screened for the ability to incorporate these modified nucleotides, as well as to amplify a modified template. We used Thermococcus kodakaraensis (KOD) DNA polymerase for PCR with a slightly modified buffer, although at low efficiency.
Additionally, conditions have been defined to amplify selected DNA using a two-step process to avoid potential amplification biases. These methods are detailed in the Materials and Methods section below.
To test whether modified nucleotides improve SELEX to human proteins, we compared selections with modified and unmodified nucleotides against thirteen difficult human proteins for which unmodified SELEX had failed. As a control, we included a protein that previously yielded high-affinity SOMAmers with unmodified SELEX. The results of this experiment are shown in Table 1. Only SELEX with modified nucleotides yielded high-affinity SOMAmers to the difficult proteins.
We have selected SOMAmers to > 1000 human proteins. We first implemented 5-benzylaminocarbonyl-dU (BndU) in our high-throughput SELEX pipeline, and our success rate for selections against a diverse set of human proteins rose from < 30% to > 50%. This supported our hypothesis that we could develop one SELEX protocol that would work repeatedly for very different proteins. Since then, we have incorporated four modified nucleotides: BndU, 5-naphthylmethylaminocarbonyl-dU (NapdU), 5-tryptaminocarbonyl-dU (TrpdU), and 5-isobutylaminocarbonyl-dU (iBudU).
Since the incorporation of these modified nucleotides into SELEX experiments, our overall success rate (pool Kd < ~30 nM) is ~84% (1204/1428) for high quality SOMAmers to a wide range of human proteins. The 813 human proteins measured by the current array are shown in Table 2 (at the end of this document).

SOMAmer Specificity
We assessed the specificity of select SOMAmers for the targets they were selected against in an affinity binding assay that mimics our multiplexed proteomics assay. The experimental method is outlined in Figure 1 and detailed below in the Materials and Methods section. This experiment mimics Catch 1 and Catch 2 in the proteomics assay and then uses a third step to capture the bound SOMAmer-protein complex with an oligo that is complementary to a portion of the SOMAmer and acts as an affinity tag. This "Catch 3" step is analogous to the DNA microarray hybridization step in the proteomics assay. The captured complexes are then disrupted and the proteins are eluted and analyzed by denaturing poly-acrylamide gel electrophoresis (PAGE), as shown in Figure 3b (main document). Not all complexes are able to be captured in the

Assay Reproducibility
To assess the reproducibility of our proteomic measurements, we measured multiple replicate samples of serum and plasma in a single assay run. Each run spans an entire 96-well microtiter plate. Multiple replicate runs are performed and CVs are calculated for each analyte as a measure of reproducibility.
Three independent automated assay runs (A, B, and C) were initiated and completed on two different days. Each run comprised samples from eighteen different individuals run in five replicates along with six no-protein buffer controls to assess assay background. See Materials and Methods below for detailed methods.
Reproducibility Measuring Plasma. Intra- and inter-plate CVs for each dilution mix are displayed in Figure 3 and summarized in Table 3 below. The median intra- and inter-plate CVs are 3.8% and 4.3% for SOMAmers in the 10% mix, 4.4% and 5.5% for SOMAmers in the 1% mix, and 5.6% and 6.8% for SOMAmers in the 0.03% mix.
Table 3. Summary statistics for intra- and inter-run CVs broken out by plasma dilution (10%, 1%, and 0.03%).
Reproducibility Measuring Serum. The CV for each SOMAmer was computed for each sample by averaging over the replicates and then averaging these CVs over all the samples. Both intra- and inter-plate CVs were computed for each dilution mix and are displayed in Figure 4 and summarized in Table 4 below. The median intra- and inter-plate CVs are 4.3% and 5.0% for SOMAmers in the 10% mix, 4.2% and 4.9% for SOMAmers in the 1% mix, and 5.3% and 6.4% for SOMAmers in the 0.03% mix.
Figure 4. The distributions for intra-run and inter-run CVs for serum. The cdfs for the intra-run CVs are on the left and the inter-run CVs are on the right for the three dilution mixes, 10% (red), 1% (blue), and 0.03% (green). The inter-run CVs are only slightly higher than the intra-run CVs.
Table 4. Summary statistics for intra- and inter-run CVs for serum.
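The per-SOMAmer CV calculation described above (a CV per sample across replicates, then an average over samples) can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the sample standard deviation (n - 1 denominator) is assumed, the RFU values are invented, and the exact way inter-plate CVs were pooled is not specified in the text and is not reproduced here.

```python
import statistics

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def mean_replicate_cv(samples):
    """Average of per-sample CVs for one SOMAmer.
    `samples` is a list of replicate-measurement lists, one per individual."""
    return statistics.mean([cv(replicates) for replicates in samples])

# Hypothetical RFU values for one SOMAmer: three individuals, five replicates each.
rfu = [
    [1020, 980, 1005, 995, 1010],
    [2100, 2050, 2200, 2150, 2080],
    [510, 495, 505, 500, 490],
]
print(f"intra-run CV: {100 * mean_replicate_cv(rfu):.1f}%")
```

The median of this quantity across all SOMAmers in a dilution mix would then correspond to the summary values reported in Tables 3 and 4.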

Limits and Ranges of Quantification
In order to determine the quantitative performance of our proteomics platform, we generated precision profiles for 356 analytes. The overall results are presented in Table 5 at the end of this document. A precision profile, which shows the variation in %CV for calculated concentration as a function of analyte concentration, provides an analytic measurement of assay performance and establishes the limits of quantification (LOQ), both the upper and lower limits of quantification (ULOQ and LLOQ), which define the dynamic range for analyte measurements. We have focused on optimizing and assessing LLOQs in the assay. Therefore, in some cases ULOQ measurements did not plateau in the measured range and represent a minimum estimate of ULOQ and, therefore, a minimum estimate of the range of quantification (ROQ).

Buffer Experiments.
The LOQ experiments measured six-point standard curves spanning six logs in concentration, from 10 nM to 10 fM, for a series of analytes in a multiplexed fashion. A set of proteins was combined at the highest concentration and serially diluted 1:10 to create a set of standards that spanned six logs in concentration.
Each analyte concentration was measured eight times to determine the assay error at each concentration. A typical dose-response curve from the data set is displayed in Figure 5. Two distinct approaches were used to compute precision profiles from these data.
The first approach modeled the standard deviation for calculated concentrations, σ_x, obtained by averaging the eight replicates at each concentration, with a quadratic function from which the precision profile was directly obtained. Figure 6 illustrates this approach for the analyte displayed in Figure 5. The second approach is to model the standard deviation of the assay response, σ_logRFU, with a quadratic function and then use the dose-response function to compute the variance in concentration from the response variance. This is not easily accomplished for the dose-response function used here but linearizing the function at a concentration x leads to the following simplification (Eq. 2 and 3).
Typically, the assay CV in response units (σ_logRFU/logRFU) is fairly constant, so using a quadratic function to model σ_RFU as a function of concentration should suffice. Figure 7 illustrates this for the data in Figure 5. We produced the full precision profile for each SOMAmer tested using both numerical approaches outlined above. The results for the analyte shown in Figure S5 are presented in Figure S8 for (a) modeling σ_x directly (blue line) and (b) modeling σ_logRFU, from which σ_x is computed (red line). Both methods give essentially the same result in this case for LLOQ and ULOQ. This particular analyte shows a remarkable five-log quantification range at a 20% CV cutoff, with an LLOQ of 0.4-0.6 pM and a ULOQ of 40-50 nM. In general there is good agreement between the two methods for computing precision profiles, and the assay response (σ_logRFU) method was used to calculate the values shown in Table 5.
Table 6 at the end of this document and in Figures S9-S11.
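The response-based approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used for Table 5: the four-parameter logistic form, the quadratic in log10(concentration) for σ_logRFU, the grid search, and every numeric value below are hypothetical. It propagates σ_logRFU to σ_x through the local slope of the curve and reports where the CV falls at or below the 20% cutoff.

```python
import math

def log_rfu(x, bottom, top, c, b):
    """Assumed four-parameter logistic: log10(RFU) versus concentration x (M)."""
    return bottom + (top - bottom) / (1.0 + (c / x) ** b)

def slope(x, bottom, top, c, b):
    """Analytic derivative d(log10 RFU)/dx of log_rfu."""
    u = (c / x) ** b
    return (top - bottom) * b * u / (x * (1.0 + u) ** 2)

def sigma_log_rfu(x, q):
    """Assumed quadratic model of the response SD in log10(concentration)."""
    lx = math.log10(x)
    return q[0] + q[1] * lx + q[2] * lx * lx

def precision_profile(params, q, lo=1e-14, hi=1e-8, n=601):
    """Return (concentration, CV) pairs on a log-spaced grid from lo to hi."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    profile = []
    for i in range(n):
        x = 10 ** (math.log10(lo) + i * step)
        # First-order propagation: sigma_x ~ sigma_logRFU / |dy/dx|
        sigma_x = sigma_log_rfu(x, q) / abs(slope(x, *params))
        profile.append((x, sigma_x / x))
    return profile

def limits_of_quantification(profile, cutoff=0.20):
    """Lowest and highest grid concentrations with CV at or below the cutoff.

    Assumes the quantifiable region is a single contiguous interval.
    """
    ok = [x for x, cv in profile if cv <= cutoff]
    return (min(ok), max(ok)) if ok else (None, None)

# Hypothetical curve: bottom, top (log10 RFU), midpoint (M), Hill slope.
params = (2.0, 5.0, 1e-10, 1.0)
# Hypothetical constant response SD of 0.02 log10 units.
q = (0.02, 0.0, 0.0)

lloq, uloq = limits_of_quantification(precision_profile(params, q))
print(f"LLOQ ~ {lloq:.2e} M, ULOQ ~ {uloq:.2e} M")
```

The grid resolution (0.01 log units here) bounds how precisely the crossing points are located; a root finder on CV(x) minus the cutoff would be a natural refinement.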
The LOQs determined from spiking analytes into plasma and buffer agree well. Both the LLOQs and the ULOQs are consistent between the two fluids (see Figures S9 and S10). The ranges of quantification are also in good agreement (see Figure 11) and are centered at 3 logs of dynamic range. In general, these results illustrate that the quantitative behavior of our multiplexed assay, as characterized by precision profiles, is not affected by fluid matrix effects, allowing us to use protein spikes into buffer to assess that behavior.
These data were computed by modeling σ_logRFU as described above.
LLOQ. The LLOQs were computed from the precision profiles of the analytes measured in buffer. Greater than 95% of the analytes examined produced precision profiles with quantification ranges below a 20% CV cutoff. The distribution of determined LLOQs is shown in Figure 12 and summarized in Table 7. The median LLOQ is 0.9 pM and the inter-quartile range is 0.3 pM to 3.9 pM. Over half of the analytes examined have LLOQs that are < 1.0 pM. Although some analytes appear quantitative below 10 fM, these need to be verified with lower measurements.
ULOQ. The distribution of determined ULOQs computed from the buffer precision profiles is given in Figure S13 and summarized in Table 8. The median ULOQ is 1.5 nM and the inter-quartile range is 0.7 nM to 4.5 nM. Although some analytes in the present analysis appear to be quantitative above 10 nM, these results need to be verified by making measurements higher than those in this study. For example, albumin's dose-response curve is still increasing at 10 nM, so an accurate determination of the ULOQ is not possible from these data.
Range of Quantification. The range of quantification (ROQ) can be defined as the difference between log ULOQ and log LLOQ. Based on the median LLOQ and ULOQ, the expected median quantification range is ~3 logs. Figure S14 shows the distribution of the quantification range, which is summarized in Table 9.
The median range is 3.2 logs, consistent with the LOQs discussed above, and the center 50% have ranges from 2.8 to 3.7 logs. For those analytes that exceed four logs of quantification, additional measurements are required for verification.
Figure S14. Distribution of the log quantification range for 356 analytes. The cumulative distribution function for the log of the quantification range is displayed in the plot.
Table 9. Summary of the log quantification range for 356 analytes. A summary of the quantification range data is presented in the table.
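As a quick arithmetic check on the definition ROQ = log ULOQ - log LLOQ, the short sketch below plugs in the median LLOQ (0.9 pM) and median ULOQ (1.5 nM) reported for the buffer experiments. The function name is an illustrative choice, and the difference of two medians is only a rough guide to the median of the per-analyte ranges.

```python
import math

def range_of_quantification(lloq_molar, uloq_molar):
    """ROQ in log10 units: log10(ULOQ) - log10(LLOQ)."""
    return math.log10(uloq_molar) - math.log10(lloq_molar)

# Median LLOQ and ULOQ quoted in the text, converted to molar.
roq = range_of_quantification(0.9e-12, 1.5e-9)
print(f"{roq:.1f} logs")  # prints "3.2 logs"
```

The same function applied to the single analyte discussed earlier (LLOQ of 0.4-0.6 pM, ULOQ of 40-50 nM) gives roughly 4.8 to 5.1 logs, in line with the five-log range quoted for it.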
A summary of the calculated LLOQ, ULOQ, and quantitative range for each analyte is provided in Table 5 at the end of this document. The analytes have been grouped by their dilution mixes and sorted with respect to LLOQ. In addition to the quantification metrics, the four parameters used in the dose-response curves are provided. An entry of 10^-6 M for LLOQ or ULOQ indicates that a limit has not been established from the data. There are sixteen analytes for which either both LLOQ and ULOQ are undetermined, so the range is denoted as zero, or the standard curves were not properly fit, so no parameters are listed. Both of these cases occur at the end of the lists.
Table 10 lists sixty potential CKD biomarkers identified in this study by comparing early- to late-stage disease with the Mann-Whitney U-test, using an alpha of 4.3E-04 for the q-value (false discovery rate corrected p-value).
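The per-analyte comparison behind Table 10 can be illustrated with a small, self-contained sketch. Several things here are assumptions rather than the study's procedure: the text does not say how q-values were computed, so the Benjamini-Hochberg adjustment is used as one common choice; the p-value uses the plain normal approximation without tie or continuity correction; and the analyte names and measurements are invented.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_p(a, b):
    """Two-sided p-value from the normal approximation to the U statistic."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues):
    """Step-up adjusted p-values (one common definition of q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Invented measurements for two hypothetical analytes.
early = {"analyte_A": [1, 2, 3, 4, 5, 6], "analyte_B": [1, 3, 5, 7, 9, 11]}
late = {"analyte_A": [7, 8, 9, 10, 11, 12], "analyte_B": [2, 4, 6, 8, 10, 12]}

names = sorted(early)
pvals = [mann_whitney_p(early[k], late[k]) for k in names]
qvals = benjamini_hochberg(pvals)

alpha = 4.3e-4  # cutoff quoted in the text
flagged = [k for k, qv in zip(names, qvals) if qv <= alpha]
print(dict(zip(names, qvals)), flagged)
```

With only six samples per group, even the perfectly separated toy analyte cannot reach a cutoff as strict as 4.3E-04, which shows how cohort size limits what such a screen can flag.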

SOMAmer development and SELEX
Selection methods have been developed for use with poly-His-tagged, biotinylated, and non-tagged proteins. Many variations on these protocols have been used to select the >800 SOMAmers for the proteomic platform, such as alternating selection conditions to increase stringency for slow off-rate SOMAmers or performing the equilibrium steps in solution rather than with targets pre-immobilized. The following protocol is representative and was used for the selection described in the main text of this paper (results shown in Table 1). Selection methods are further detailed in our patents and published patent applications 1,2.

SomaLogic Proteomic Affinity Assay Method
All steps were performed at room temperature unless otherwise indicated.
[...] PMT setting and the XRD option enabled at 0.05. The resulting TIFF images were processed using Agilent feature extraction software version 10.5.1.1 with the GE1_105_Dec08 protocol.

Serum and Plasma Reproducibility Studies
For each plate, five aliquots of plasma or serum from 18 individuals were thawed and plated as described below. Six wells containing only buffer were run on every plate. Serum and plasma samples were run on separate plates because they require slightly different buffers, as indicated above. Three plates of each sample type were run over the course of several days, using different lots of buffers and other reagents of the kind that might be expected to change within a large study.

Limits of Quantification (LOQ) Experiment
For the LOQ experiments, four different sets of protein mixes were prepared for each of the three SOMAmer mixes, 10%, 1% or 0.03%, for a total of 12 mixes and 356 proteins. The proteins for each mix were chosen to avoid combining known protein binding partners and known protease-substrate pairs.
The proteins were diluted into SB17T containing 2 µM Z-Block_2 so that each protein was at a final concentration of 20 nM.  S15).