Encoding information in synthetic metabolomes

Eamonn Kennedy; Christopher E. Arcadia; Joseph Geiser; Peter M. Weber; Christopher Rose; Brenda M. Rubenstein; Jacob K. Rosenstein

doi:10.1371/journal.pone.0217364

Abstract

Biomolecular information systems offer exciting potential advantages and opportunities to complement conventional semiconductor technologies. Much attention has been paid to information-encoding polymers, but small molecules also play important roles in biochemical information systems. Downstream from DNA, the metabolome is an information-rich molecular system with diverse chemical dimensions which could be harnessed for information storage and processing. As a proof of principle of small-molecule postgenomic data storage, here we demonstrate a workflow for representing abstract data in synthetic mixtures of metabolites. Our approach leverages robotic liquid handling for writing digital information into chemical mixtures, and mass spectrometry for extracting the data. We present several kilobyte-scale image datasets stored in synthetic metabolomes, which can be decoded with accuracy exceeding 99% using multi-mass logistic regression. Cumulatively, >100,000 bits of digital image data were written into metabolomes. These early demonstrations provide insight into some of the benefits and limitations of small-molecule chemical information systems.


Citation: Kennedy E, Arcadia CE, Geiser J, Weber PM, Rose C, Rubenstein BM, et al. (2019) Encoding information in synthetic metabolomes. PLoS ONE 14(7): e0217364. https://doi.org/10.1371/journal.pone.0217364

Editor: Andrew C. Gill, University of Lincoln, UNITED KINGDOM

Received: March 9, 2019; Accepted: May 10, 2019; Published: July 3, 2019

Copyright: © 2019 Kennedy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Mass spectra from this work may be downloaded from Metabolomics Workbench data repository (study ST001173). Raw data is also available from the Brown Digital Repository (DOI: 10.26300/jwv9-ew20).

Funding: This research was supported by funding from the Defense Advanced Research Projects Agency (DARPA W911NF-18-2-0031) to BMR and JKR. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have submitted a provisional patent application (62/791,504) related to this work.

Introduction

The metabolome is the complete set of small molecules found in a biological system [1]. The properties of this set of compounds are an amplified and dynamic measure of an organism’s genome, transcriptome, proteome, and environment [2]. This makes the metabolome an incredibly information-rich system, which displays diverse chemical, structural and biological dimensions [3–5].

Although much remains to be understood, improved protocols and more efficient mass spectrometry (MS) have enabled metabolomic disease screening and drug discovery [6–12]. These technologies are supported by continually improving statistical tools and databases [13, 14]. As these tools advance, they may also suggest exciting alternative applications for metabolomics.

For inspiration, we observe that researchers have mimicked living systems by using DNA [15] for long-term archival information storage [16, 17], building on rapid advances in sequencing technology. Given recent progress in proteomic and metabolic profiling tools [18–21], it is timely to explore if the metabolome can also be used in a complementary way for information representations.

Whereas DNA and proteins are often large molecules which exist in small numbers, metabolites are higher in number, smaller in mass, and more structurally and energetically diverse. Like DNA, metabolites are biologically ubiquitous, and their primary pathways and processes are conserved across species [22]. The power of DNA as an information carrier comes from the combinatorial complexity that can exist within one polymer [23]. By contrast, the power of the metabolome is in the diversity of many co-existing molecules which can interact, or be acted upon, in complex combinations [5].

Non-genomic molecular data storage has also been demonstrated using fluorescent dyes on polymer films [24] and rotaxanes [25]. Other demonstrations have utilized collections of fluorophores which interact with information-bearing compounds in statistically identifiable ways [26]. However, all of these methods encode information into the state of a single compound at one time.

In this paper, we encode abstract binary data into the chemical composition of thousands of spatially arrayed nanoliter volumes (Fig 1a). Each volume (‘spot’) contains a prescribed mixture from a library of purified metabolites—a synthetic metabolome. A key strength of this work is that it can be applied to any chemical library. Metabolites hold particular potential, because they provide access to well-regulated interconversion networks, materials, and databases which could facilitate computational operations on chemical data. The presence or absence of one metabolite in one spot encodes one bit of information. Therefore, the total number of bits stored in one spot is equal to the number of available library elements [27].
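The mapping from bits to mixtures can be sketched in a few lines. This is an illustrative sketch, not the authors' software: the library names (m1 to m4) and both helper functions are hypothetical. Bits are laid out spot by spot with one column per library compound, so each spot's row of bits lists exactly which compounds to dispense there, and a library of L compounds gives L bits per spot.

```python
import numpy as np

def bits_to_mixtures(bits, library):
    """Split a flat bit array into spots; bit j of a spot says whether library[j] is dispensed."""
    n_lib = len(library)
    if len(bits) % n_lib:
        raise ValueError("pad the bit array to a multiple of the library size first")
    table = np.asarray(bits, dtype=np.uint8).reshape(-1, n_lib)   # rows = spots, cols = compounds
    return [[library[j] for j in np.flatnonzero(row)] for row in table]

def mixtures_to_bits(mixtures, library):
    """Inverse mapping: one presence/absence bit per (spot, compound) pair."""
    index = {name: j for j, name in enumerate(library)}
    table = np.zeros((len(mixtures), len(library)), dtype=np.uint8)
    for s, contents in enumerate(mixtures):
        for name in contents:
            table[s, index[name]] = 1
    return table.ravel()

library = ["m1", "m2", "m3", "m4"]          # hypothetical 4-compound library
spots = bits_to_mixtures([0, 1, 0, 1, 1, 1, 0, 0], library)
print(spots)                                 # [['m2', 'm4'], ['m1', 'm2']]
```

The first spot reproduces the [0 1 0 1] example of Fig 1a: it receives only the 2nd and 4th compounds.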


Fig 1. Writing and reading data encoded in mixtures of metabolites.

(a) Binary image data is mapped onto a set of metabolite mixtures, with each bit determining the presence/absence of one compound in one mixture. For example, a spot mapped to four bits with values [0 1 0 1] would contain the 2nd and 4th metabolites at that location. (b) Small volumes of the mixtures are spotted onto a steel plate and the solvent is evaporated (scale bars: 5 mm). This chemical dataset is analyzed by MALDI mass spectrometry (b, bottom). Using the observed mass spectrum peaks, decisions are made about which metabolites are present. These decisions are assembled from the array of spots to recover the original image. The image shown is the Rhode Island Hope Regiment Colors [28].

https://doi.org/10.1371/journal.pone.0217364.g001

We recover the encoded data from metabolite mixtures using mass spectrometry (Fig 1b). The data acquisition is inherently parallelized, because a single mass spectrum provides information on every compound in a mixture. Noise characterization and logistic regression strategies for recovering the data are presented, along with examples of chemically encoded digital images. Raw error rates <1% are achieved with kilobyte-scale data sets using a simple peak analysis, illustrating the viability of both writing and reading metabolomic information. We use these experimental demonstrations to consider the benefits and limitations of encoding data into a biochemical medium in which interactions and interconversions can occur.

Materials and methods

Chemical library preparation

Reagent-grade samples of 36 distinct metabolic compounds (Table A in S1 File) were diluted in dimethyl sulfoxide (DMSO, anhydrous), each to a nominal concentration of 25 mM. Some metabolites were initially dissolved in an alternative solvent (de-ionized water with or without 0.5 M or 1 M hydrochloric acid) to facilitate solvation in DMSO. A 10 μL aliquot of each compound was placed in a 384-well microplate (Labcyte 384LDV).

Data mixture preparation

The chemical data mixtures were prepared on a 76 × 120 mm² stainless steel MALDI plate. An acoustic liquid handler (Labcyte Echo 550) was employed to transfer the compounds from the library wellplate onto the MALDI plate. The nominal droplet transfer volume is 2.5 nL, but to reduce variability, we typically use 2 droplets (5 nL) per compound. The destinations of the droplets are programmed to match a standard 2.25 mm pitch 1536-spot (32 × 48) target.
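The nominal volumes and pitch above imply a per-spot amount and an array footprint that are easy to check. A back-of-envelope sketch, assuming every droplet delivers exactly its nominal 2.5 nL of 25 mM stock and that the 48 columns run along the 120 mm side of the plate (both assumptions, not measurements):

```python
# Nominal quantities quoted in the Methods
DROPLET_NL = 2.5              # nominal acoustic droplet volume, nL
DROPLETS_PER_COMPOUND = 2
STOCK_MM = 25.0               # nominal stock concentration, mM
PITCH_MM = 2.25               # spot pitch, mm
ROWS, COLS = 32, 48

volume_l = DROPLET_NL * DROPLETS_PER_COMPOUND * 1e-9       # 5 nL expressed in litres
moles_per_compound = STOCK_MM * 1e-3 * volume_l            # (mol/L) * L
print(f"{moles_per_compound * 1e12:.0f} pmol per compound per spot")

# Spot centres relative to the first spot, in mm
centres = [(c * PITCH_MM, r * PITCH_MM) for r in range(ROWS) for c in range(COLS)]
span_x = (COLS - 1) * PITCH_MM   # centre-to-centre extent along the long side
span_y = (ROWS - 1) * PITCH_MM   # centre-to-centre extent along the short side
print(len(centres), span_x, span_y)
```

On these assumptions each compound contributes about 125 pmol per spot, and the 1536 spot centres span 105.75 × 69.75 mm, which fits on the 76 × 120 mm² plate.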

After spotting the compounds onto the MALDI plate, a MALDI matrix material was added to each location. We selected 9-aminoacridine for its compatibility with metabolite libraries, its low background in the small-molecule regime, and its support for both positive and negative ion modes. The MALDI plate is left to dry and crystallize overnight (∼10 hours). Once dried, the plate can be stored in a humidity-controlled cabinet or analyzed by MALDI-FT-ICR mass spectrometry.

Mass analysis of data plates

A Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometer (SolariX 7T, Bruker) was used to analyze the crystallized metabolite data mixtures. The exact resolution is a function of the measurement time allocated per spectrum. For these experiments, we typically used 0.5-1 s, yielding a resolution of <0.001 Da. The instrument is run in MALDI mode and is configured to serially measure the mass spectrum of each mixture on the 32 × 48 grid. Acquisition for a full plate takes <2 hours.

To read the encoded data from the mass spectra, the probability that a metabolite is present is modeled as a function of multiple predictor masses. A multinomial logistic regression computes this probability from the natural exponential of an offset plus the sum of the SNRs at all identifying masses, where each SNR is multiplied by a trained weight coefficient. A limited-memory BFGS (L-BFGS) algorithm was used to train the regression, and classification accuracy scores were computed for inputs consisting of the n best peaks per metabolite. This process was repeated for every metabolome constituent.
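Written out, one way to express the model described above for a single metabolite k is the standard binary logistic form below. This is a sketch under stated assumptions: s_{k,i} denotes the background-normalized SNR at the i-th best identifying peak of metabolite k, the β are the trained weights, and b_k is the written bit. The second line is the negative log-likelihood over N training spots, the kind of smooth objective that L-BFGS is designed to minimize; calling a metabolite present when the probability exceeds 0.5 is our assumption, not stated in the text.

$$
P(b_k = 1 \mid \mathbf{s}_k) = \frac{\exp\left(\beta_{k,0} + \sum_{i=1}^{n} \beta_{k,i}\, s_{k,i}\right)}{1 + \exp\left(\beta_{k,0} + \sum_{i=1}^{n} \beta_{k,i}\, s_{k,i}\right)}
$$

$$
\hat{\boldsymbol{\beta}}_k = \arg\min_{\boldsymbol{\beta}} \sum_{j=1}^{N} \left[ \log\left(1 + e^{z_{kj}}\right) - b_{kj}\, z_{kj} \right], \qquad z_{kj} = \beta_{k,0} + \sum_{i=1}^{n} \beta_{k,i}\, s_{kj,i}
$$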

Results

Writing synthetic metabolomes

Our synthetic metabolome is a diverse set of 36 compounds (Table A in S1 File) including vitamins, nucleosides, nucleotides, amino acids, sugars, and metabolic pathway intermediates (all purchased from Sigma-Aldrich). To write data into mixtures of metabolites, we use an acoustic liquid handler (Echo 550, Labcyte) to transfer pure metabolic solutions in 2.5 nL increments to pre-defined locations on a steel MALDI plate. A 2.25 mm pitch grid was chosen for compatibility with standard wellplate protocols. This produces a spatial array of different mixtures of metabolites where the presence (or absence) of each compound in each mixture encodes one bit of information.

After evaporating the solvent, each data plate contains up to 1536 dried spots (Fig 1b), which we can analyze using Matrix Assisted Laser Desorption Ionization (MALDI) mass spectrometry (MS). To prescreen each compound in the synthetic metabolome, a plate was written with combinatorial mixtures of all 36 metabolites across 1400 unique spots (Fig A in S1 File). Since MALDI protocols are chemically specific, we do not expect the same identification accuracy across the whole compound library under one set of conditions. We use this pre-screen to determine the MS identification accuracy for every metabolite with the same protocol.

Ion cyclotron mass spectrometry of metabolite mixtures

A Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometer (SolariX 7T, Bruker) was used to analyze the array of crystallized mixtures. In FT-ICR MS, a pulsed RF field excites ions into a periodic orbit with a frequency that is determined by the magnetic field strength and the ion’s mass-to-charge ratio (m/z) [29]; measuring this frequency enables much finer mass resolution than time-of-flight (ToF) instruments. In these experiments, the mass resolution is typically 0.001 Da (Fig B in S1 File). Using FT-ICR MS, metabolites can be discriminated even if their masses are only milli-Daltons apart.
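For intuition about why FT-ICR resolves milli-dalton differences, the idealized cyclotron relation f = eB/(2π·(m/z)) can be evaluated for the protonated matrix ion (m/z 195.0916) in a 7 T magnet. This is a simplified sketch: real instruments measure a reduced cyclotron frequency shifted by the trapping fields and rely on calibration, so the numbers are approximate. The result is roughly 551 kHz, and a 1 mDa change in m/z shifts it by under 3 Hz. Because the frequency resolution of a T-second transient is on the order of 1/T, distinguishing such shifts calls for transients of the order of a second.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C (exact in SI)
DALTON = 1.66053906660e-27     # atomic mass constant, kg (CODATA 2018)

def cyclotron_frequency_hz(mz, b_tesla=7.0):
    """Idealized f = eB / (2*pi*(m/z)), with m/z in daltons per elementary charge."""
    return E_CHARGE * b_tesla / (2 * math.pi * mz * DALTON)

f = cyclotron_frequency_hz(195.0916)                    # protonated 9-aminoacridine
shift = f - cyclotron_frequency_hz(195.0916 + 0.001)    # an ion 1 mDa heavier
print(f"{f / 1e3:.1f} kHz; 1 mDa shift = {shift:.2f} Hz")
```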

In Fig 2(a), one positive-ion MALDI-FT-ICR mass spectrum is shown for a spot which included guanosine (go), together with the 9-aminoacridine (9A) matrix. Protonated matrix peaks are identified at peaks 1 and 6 (blue), along with guanosine adducts at peaks 2 ([M + Na]⁺), 3 ([M + K]⁺), 4 ([M + 2K − H]⁺) and 5 ([M + IPA + H]⁺, where IPA is isopropyl alcohol). The observed intensities vary by adduct and species. In Fig 2(b), the intensity of the first peak (protonated matrix at m/z = 195.0916 ± 0.001) is shown across 1024 spots.
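Expected m/z values for the ions labeled in Fig 2(a) can be computed from monoisotopic element masses. A sketch, assuming singly charged cations and the standard molecular formulas C13H10N2 for 9-aminoacridine and C10H13N5O5 for guanosine; the helper names are hypothetical. For the protonated matrix it gives m/z ≈ 195.0917, within the ±0.001 window used for Fig 2(b). Other adducts, such as the IPA adduct, can be added in the same way.

```python
# Monoisotopic masses (u) of the most abundant isotopes
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956,
        "Na": 22.9897692809, "K": 38.96370668}
ELECTRON = 0.00054857990946   # electron mass, u

def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

def cation_mz(mass, add=(), lose=()):
    """m/z of a singly charged cation: add/remove atoms, then remove one electron."""
    return mass + sum(MONO[a] for a in add) - sum(MONO[x] for x in lose) - ELECTRON

aminoacridine = neutral_mass({"C": 13, "H": 10, "N": 2})        # 9A matrix
guanosine = neutral_mass({"C": 10, "H": 13, "N": 5, "O": 5})    # go

ions = {
    "[9A + H]+": cation_mz(aminoacridine, add=["H"]),
    "[go + Na]+": cation_mz(guanosine, add=["Na"]),
    "[go + K]+": cation_mz(guanosine, add=["K"]),
    "[go + 2K - H]+": cation_mz(guanosine, add=["K", "K"], lose=["H"]),
}
for name, mz in ions.items():
    print(f"{name:16s} {mz:.4f}")
```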


Fig 2. Analyzing chemical data plates with mass spectrometry.

(a) Positive mode MALDI-FT-ICR mass spectrum of one spot containing Guanosine (go) and the MALDI matrix, 9-Aminoacridine (9A). Automated analysis of each spot used 4x averaged 1-second acquisitions. go ions (2, 3, 4, 5, in red) are present, along with two protonated matrix peaks (1, 6, in blue). (b) The intensity of the protonated matrix (peak 1) at m/z = 195.0916 ± 0.001 is shown graphically for a MALDI plate with 1024 independent mixture spots. Protonated aminoacridine is positively identified in 1020 spots (99.6%).

https://doi.org/10.1371/journal.pone.0217364.g002

Many open-access tools are available for metabolite peak detection and assignment from MS spectra [21]. To clearly relate the mass spectra to binary data, we consider a rudimentary detection scheme: if the intensity of a metabolite’s mass peak is above a particular threshold, then it is declared present, and the binary state of its address is set to 1 (or to 0, if its mass peak is absent). This approach identified the protonated matrix peak in 1020 out of 1024 spots (≈ 99.6%) in Fig 2(b).
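The threshold rule can be sketched as a lookup in a narrow m/z window around each target mass. This is an illustrative implementation under assumptions: the ±0.001 window mirrors the tolerance quoted for the matrix peak, the spectrum is synthetic, and call_presence is a hypothetical helper.

```python
import numpy as np

def call_presence(mz, intensity, target_mz, threshold, tol=0.001):
    """Return 1 if the strongest point within target_mz ± tol exceeds threshold, else 0."""
    window = np.abs(mz - target_mz) <= tol
    if not window.any():
        return 0                                   # no data in the window: treat as absent
    return int(intensity[window].max() > threshold)

# Synthetic spectrum: weak noise plus one peak at the protonated-matrix m/z
rng = np.random.default_rng(0)
mz = np.linspace(190.0, 200.0, 100_001)            # 0.0001 Da grid
intensity = 0.1 * rng.normal(size=mz.size)
intensity += 10.0 * np.exp(-0.5 * ((mz - 195.0916) / 0.0003) ** 2)

print(call_presence(mz, intensity, 195.0916, threshold=3.0))   # 1: peak present
print(call_presence(mz, intensity, 195.5000, threshold=3.0))   # 0: only noise
```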

As an initial demonstration, we selected a library subset of 6 metabolites, which were used to encode a 6,142-pixel binary image of a Nubian ibex [30] into an array of 1024 mixtures (Fig C in S1 File). After pseudo-random interleaving, the data was mapped onto the presence or absence of sorbitol (so), glutamic acid (ga), tryptophan (tp), cytidine (cd), guanosine (go) and 2-deoxyguanosine hydrate (gh). The plate was written and then analyzed using FT-ICR MS as detailed in the Methods.
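The interleaving step is not specified in detail. One simple possibility, assumed here for illustration, is to zero-pad the bitstream to the plate capacity (6,142 image bits in 1024 spots × 6 compounds = 6,144 bits, leaving 2 padding bits) and apply a seeded pseudo-random permutation, which the reader inverts with the same seed. The helpers and the random stand-in image are hypothetical. One plausible benefit is that spatially clustered read errors are scattered across the decoded image instead of forming contiguous blocks.

```python
import numpy as np

def interleave(bits, n_spots, n_lib, seed=0):
    """Zero-pad to the plate capacity, then apply a seeded pseudo-random permutation."""
    capacity = n_spots * n_lib
    if len(bits) > capacity:
        raise ValueError("data exceed plate capacity")
    padded = np.zeros(capacity, dtype=np.uint8)
    padded[: len(bits)] = bits
    perm = np.random.default_rng(seed).permutation(capacity)
    return padded[perm].reshape(n_spots, n_lib), perm     # row s = bits written to spot s

def deinterleave(table, perm, n_bits):
    """Invert the permutation (flat[perm[i]] = shuffled[i]) and drop the padding."""
    shuffled = table.ravel()
    flat = np.empty_like(shuffled)
    flat[perm] = shuffled
    return flat[:n_bits]

image_bits = np.random.default_rng(1).integers(0, 2, 6142)   # stand-in for the ibex pixels
table, perm = interleave(image_bits, n_spots=1024, n_lib=6)
print(table.shape)                                            # (1024, 6)
```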

Fig 3a presents a spatial map and histogram of the spectral background noise observed in 240 independent spots. Before further analysis, we divide each spectrum by its background σ, which allows more direct comparison of signal strength at multiple locations. Signal strength is a complex function of the sample preparation, analyte and adduct. After normalization, peaks of interest for the 6 metabolites are shown in Fig 3b. The first row is a spot whose data contains the six bits [1 0 0 0 0 0], and thus only the m/z peak associated with the first metabolite (sorbitol) is present. Similarly, five other ‘one-hit’ patterns are shown, each of which can be decoded without error.
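The text does not state how the background σ is estimated. A minimal sketch, assuming a robust estimator: because most points in a high-resolution spectrum are background, the median absolute deviation (MAD) scaled by 1.4826 approximates σ for Gaussian noise while largely ignoring the sparse peaks. The function name and the synthetic spectrum are illustrative.

```python
import numpy as np

def normalize_by_noise(intensity):
    """Divide a spectrum by a robust (MAD-based) estimate of its background sigma."""
    deviation = np.abs(intensity - np.median(intensity))
    sigma = 1.4826 * np.median(deviation)          # MAD -> sigma for Gaussian noise
    return intensity / sigma, sigma

rng = np.random.default_rng(1)
spectrum = rng.normal(0.0, 5.0, 200_000)           # background with sigma = 5 (arbitrary units)
peaks = rng.choice(spectrum.size, 20, replace=False)
spectrum[peaks] += 500.0                           # a few strong peaks
snr, sigma = normalize_by_noise(spectrum)
print(f"estimated sigma = {sigma:.2f}")            # should be near the true value of 5
```

The returned σ per spot is also the quantity one would map across the plate, as in the heatmap of Fig 3a.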


Fig 3. Spectral background and noise considerations.

(a) Heatmap of the standard deviation of the MALDI-FT-ICR-MS spectral background noise from 240 unique spots of distinct mixtures, and a histogram of the background intensity standard deviation. (b) Spectra for six metabolites, normalized by the noise standard deviation. The m/z is cropped to six ranges of interest. Six ‘one-hit’ mixtures are plotted, one for each metabolite. (c) To assign presence/absence, we choose an intensity threshold at an appropriate m/z. As shown here, a 3σ threshold applied to the tryptophan [2M + K]⁺ peak yielded a discrimination accuracy of 96%. (d) A hit map of the same tryptophan peak illustrating recovery using the 3σ threshold. Interestingly, the few errors are clustered at the edges of the plate.

https://doi.org/10.1371/journal.pone.0217364.g003

A threshold of 3σ was chosen as the intensity required to declare the presence of a metabolite. For example, for the tryptophan [2M + K]⁺ ion (Fig 3c), this threshold yields 96% correct classification. This detection scheme can also be visualized for each spot on the plate, as shown in Fig 3d. Clustering of errors at the edges of the plate suggests that small misalignments between the MALDI laser positions and the droplet spotting locations were a source of error.
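The 3σ decision and the edge-clustering check can be expressed on normalized SNR values arranged by plate position. A sketch under assumptions: the 32 × 32 synthetic plate, the simulated edge attenuation, and both helper names are hypothetical; edge_error_fraction simply reports what share of all errors lies on the outermost ring of spots.

```python
import numpy as np

def threshold_accuracy(snr, truth, threshold=3.0):
    """Fraction of spots whose thresholded call matches the written bit, plus the error map."""
    errors = (snr > threshold) != truth.astype(bool)
    return 1.0 - errors.mean(), errors

def edge_error_fraction(errors):
    """Share of all errors that fall on the outermost ring of spots."""
    edge = np.zeros_like(errors, dtype=bool)
    edge[0, :] = edge[-1, :] = True
    edge[:, 0] = edge[:, -1] = True
    return errors[edge].sum() / max(errors.sum(), 1)

rng = np.random.default_rng(2)
truth = rng.integers(0, 2, size=(32, 32))                       # 1024 written bits
snr = np.abs(rng.normal(0.0, 0.5, truth.shape)) + 6.0 * truth
snr[[0, -1], :] *= 0.3                                           # simulate weak signal at plate edges
snr[:, [0, -1]] *= 0.3
acc, errors = threshold_accuracy(snr, truth, threshold=3.0)
print(f"accuracy {acc:.3f}, edge share of errors {edge_error_fraction(errors):.2f}")
```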

Statistical analysis of data plates

In practice, one compound will be associated with multiple peaks, with varying signal-to-noise ratios and usefulness. For a given metabolome, we should determine which m/z peaks best identify each library element. Each high-resolution FT-ICR mass spectrum contains ∼2 × 10⁶ m/z points. Since most of the spectral space is background, it is helpful to first reduce the number of features to those which may be statistically useful. A total of 1,444 candidate peaks found in the ensemble average of all mass spectra were tested to determine how accurately the intensity at each m/z classified the encoded data values (Fig 4a).
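One way to screen candidate m/z features in this spirit is to score each feature by the best accuracy any single intensity threshold achieves against the known written bits, then keep the top-ranked features. This is a hypothetical sketch (function names, synthetic data, and tie handling are assumptions). It evaluates every possible threshold at once by sorting each feature and using cumulative counts.

```python
import numpy as np

def best_threshold_accuracy(values, truth):
    """Best accuracy of the rule 'present if value > t' over all thresholds t."""
    order = np.argsort(values, kind="stable")
    v, y = values[order], truth[order].astype(int)
    n = v.size
    # Splitting after the first k sorted spots calls those k absent and the rest present.
    absent_before = np.concatenate([[0], np.cumsum(1 - y)])         # correct "absent" calls
    present_after = y.sum() - np.concatenate([[0], np.cumsum(y)])   # correct "present" calls
    correct = absent_before + present_after
    # A threshold cannot separate tied values, so only split between distinct values.
    valid = np.ones(n + 1, dtype=bool)
    valid[1:n] = v[1:] > v[:-1]
    return correct[valid].max() / n

def rank_peaks(features, truth, top_n=5):
    """Score every candidate m/z column; return indices and scores of the best ones."""
    scores = np.array([best_threshold_accuracy(features[:, j], truth)
                       for j in range(features.shape[1])])
    best = np.argsort(scores)[::-1][:top_n]
    return best, scores[best]

rng = np.random.default_rng(3)
truth = rng.integers(0, 2, 500)
features = np.column_stack([
    np.abs(rng.normal(0.0, 1.0, 500)),                   # unrelated peak
    np.abs(rng.normal(0.0, 0.5, 500)) + 5.0 * truth,     # strongly discriminating peak
    np.abs(rng.normal(0.0, 1.0, 500)) + 1.0 * truth,     # weakly discriminating peak
])
best, scores = rank_peaks(features, truth, top_n=3)
print(best, scores.round(3))
```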


Fig 4. Identifying discriminating peaks.

(a) The read recovery for different masses across the MS spectrum is shown for ga. (b) The histogram of adducts associated with peaks from the data in Fig 3 indicates sodiated ions are predominant. (c) For each metabolite, we plot the number of peaks achieving a given detection accuracy. With the exception of sorbitol, each metabolite has at least 10 identifying peaks with >97% accuracy. (d) The error of the single best performing mass for each metabolite. (e) Using only the best performing mass from (d), the error rate for the six metabolites across 1024 locations (6144 bits) is shown as a function of the SNR cutoff. These mixtures encoded the 6142-bit image shown in (f). In (g), we recover the image with a 2.5σ decision threshold, producing approximately 2% cumulative read/write error.

https://doi.org/10.1371/journal.pone.0217364.g004

Although these peaks were identified without chemical bias, many features can be attributed to known metabolite adduct ions (although some are synthesis byproducts or derive from the substrate matrix). A histogram of the associated adduct masses is shown in Fig 4b. H, Na, Na-H and K adducts are all frequently observed.

The number of peaks achieving detection accuracy in the range of 70-100% is shown in Fig 4c. Selecting the best performing peak for each metabolite, and applying a detection threshold of 2.5σ, was sufficient to recover data at about 2% cumulative read/write error (Fig 4e). The corresponding input and output data images are shown in Fig 4f and 4g. The simplicity and success of the overall read and write process is encouraging, but there is still significant room for improvement.

Decoding data from multiple peaks using logistic regression

Assuming that the discriminating peaks are partially uncorrelated (Fig D in S1 File), it is reasonable to seek improvement by utilizing multiple m/z peaks per metabolite. Such strategies will become increasingly important in more complex metabolomes.

Using techniques similar to those for the 6-kilobit ibex image, we encoded a 17,424-bit image of a cat from an Egyptian tomb [31] using 1,452 spots containing data mixtures from a 12-metabolite subset of the library (Fig 5a). We used this dataset to extend the decoding scheme to incorporate multiple m/z features. After identifying the set of statistically discriminating peaks, we performed a logistic regression using between 1 and 16 of the best-performing peaks. Multi-mass regression achieved a read accuracy of 97.7% for the whole cat image (Fig 5c). Cumulative read error rates for the data in Figs 4 and 5 are shown as a function of the number of masses used in the logistic regression. For the earlier ibex dataset, these techniques achieved an error rate of <0.5%. However, repeated measurement of spots can cause data loss: each successive read of a data plate added <1% error (Fig E in S1 File). Training the regression on a different plate achieved the same accuracy, indicating that the model was not overfit (Fig F in S1 File). Overall, these demonstrations show that the metabolome is a viable and robust medium for representing digital information.
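A minimal sketch of multi-peak decoding for one metabolite, assuming a binary (present/absent) logistic model with a small L2 penalty, fit by minimizing the log-loss with SciPy's L-BFGS-B optimizer (run without bounds, so it behaves as plain L-BFGS). The per-peak gains and the synthetic SNR data are hypothetical. The point is that several individually mediocre peaks can combine into a better classifier than the best single peak.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_logistic(snr, present, l2=1e-3):
    """Fit P(present) = expit(b0 + snr @ w) by minimizing L2-penalized log-loss with L-BFGS."""
    X = np.hstack([np.ones((snr.shape[0], 1)), snr])      # bias column + one column per peak
    y = present.astype(float)

    def loss_and_grad(theta):
        z = X @ theta
        loss = np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * l2 * theta[1:] @ theta[1:]
        grad = X.T @ (expit(z) - y)
        grad[1:] += l2 * theta[1:]                          # the bias is not penalized
        return loss, grad

    result = minimize(loss_and_grad, np.zeros(X.shape[1]), jac=True, method="L-BFGS-B")
    return result.x

def predict(theta, snr):
    """Call a metabolite present where the fitted probability exceeds 0.5."""
    return expit(theta[0] + snr @ theta[1:]) > 0.5

rng = np.random.default_rng(0)
n = 2000
present = rng.integers(0, 2, n)
gains = np.array([1.5, 1.2, 1.0, 0.8])                      # hypothetical per-peak signal strengths
snr = np.abs(rng.normal(0.0, 1.0, (n, gains.size))) + present[:, None] * gains
train, test = slice(0, n // 2), slice(n // 2, None)

accuracy = {}
for k in (1, 4):                                            # best single peak vs. four peaks
    theta = fit_logistic(snr[train, :k], present[train])
    accuracy[k] = np.mean(predict(theta, snr[test, :k]) == present[test])
    print(f"{k} peak(s): test accuracy {accuracy[k]:.3f}")
```

Training and evaluating on separate halves of the data loosely mirrors the check above that a regression trained on one plate still performs well on another.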


Fig 5. Logistic regression for multi-peak molecular data readout.

Improvements over single-peak classification can be achieved with logistic regression utilizing multiple identifying masses per metabolite. (a) (i) A 17,424-bit image written into 1452 mixtures from a 12-metabolite subset of the library. (ii) The image recovered using one discriminating mass per metabolite. (iii) The image recovered using a regression combining 16 peaks per molecule. (b) Some compounds achieve low error rates even with single peaks. However, other molecules do not have an isolated m/z peak that shows high accuracy by itself. For these compounds, multi-peak logistic regression improves classification. (c) Cumulative read error rates for the two images as a function of the number of masses used in the logistic regression.

https://doi.org/10.1371/journal.pone.0217364.g005

Discussion

One advantage of molecular data storage is its high storage density. To date, demonstrations using DNA have reached about 214 petabytes per gram [32], although this is still orders of magnitude from theoretical limits [33]. For moderate amounts of data, an encoded metabolome written using a large small-molecule library could improve on this number [34], thanks to its increased chemical diversity. Our experiments highlight several limitations and potential benefits that warrant further discussion.

Statistically discriminating m/z features were used to classify the metabolite mixtures and recover the data at 98-99.5% accuracy using a simple analysis. Further development can take advantage of the wide range of sophisticated analysis technologies for metabolic profiling, including artificial neural networks, genetic algorithms, and self-organizing maps [35]. The inclusion of these methods, in conjunction with error correcting codes, leaves ample headroom for improved data recovery from more complex mixtures.

In terms of data rates, we demonstrated write speeds of 5 bits/sec, and aggregate read speeds of 11 bits/sec. We have performed little optimization of either the read or write times, and as the size of the metabolite library is increased, the MS read speed in particular has significant room to improve.

Looking forward, it would be interesting to consider the upper bound on information capacity using all known metabolites (∼ 10⁵ [14]). Even if only a fraction are stable, detectable, and display unique masses, this conservatively predicts hundreds of bits per spectral acquisition, which could all be read in parallel. As sub-zeptomole MS and nanomolar concentration detection have been available for nearly two decades [36, 37], detection at this level of complexity seems plausible.

Improvements in spatial density, and perhaps write speed, could come from reducing the volume and pitch of spots. There are opportunities for high-density multilayer printing. To avoid storage density limits arising from finite transfer volumes, the precise mixture of metabolites associated with one spot can be pre-mixed in one well of an intermediary data plate. Transfer of 2.5 nL from the intermediary plate well to one spot means that hundreds of metabolites can be present in a nL volume on the plate. There is also room to extend this work using larger libraries for higher capacity, or by storing multiple bits per complex, leveraging oligomerization [38].

In terms of density, we elected to use millimeter-scale arrays compatible with commercial instrumentation. Scaling the mixture spots down to diffraction-limited laser spot scales could improve data storage density by 6 orders of magnitude. Theoretically, this could facilitate extension from kilobyte- to gigabyte-scale data sets per plate. However, the true limit of data storage density depends on the available instrumentation.
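The density estimate follows from area scaling: spots per unit area grow as 1/pitch², so a roughly 1000-fold smaller pitch (from 2.25 mm to a hypothetical 2.25 μm) gives the quoted factor of 10⁶. A back-of-envelope sketch, assuming a full 1536-spot plate and one bit per compound for the 36-compound library:

```python
PLATE_SPOTS = 1536          # 32 x 48 array used here
LIBRARY_SIZE = 36           # compounds in the full library (one bit each per spot)
PITCH_NOW_M = 2.25e-3       # current spot pitch
PITCH_SMALL_M = 2.25e-6     # hypothetical micrometre-scale pitch

area_gain = (PITCH_NOW_M / PITCH_SMALL_M) ** 2   # spots per unit area scale as 1/pitch^2
bits_now = PLATE_SPOTS * LIBRARY_SIZE
bits_scaled = bits_now * area_gain
print(f"gain {area_gain:.0e}: {bits_now / 8e3:.1f} kB -> {bits_scaled / 8e9:.1f} GB per plate")
```

On these assumptions, about 6.9 kB per plate would become about 6.9 GB, consistent with the kilobyte-to-gigabyte extrapolation above.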

FT-ICR MS and other high-resolution MS instruments (such as orbital traps) have a finite ion capacity per acquisition, so the number of compounds cannot be arbitrarily increased without ion competition. Metabolites with lower ionization efficiency may go undetected even when present in a large, competitive mixture. Therefore, to increase the number of metabolites per spot, future work may need to screen libraries for ionization efficiency. Alternatively, other read strategies (e.g. nanopores [39–41]) could provide higher sensitivity.

A likely source of error in more complex mixtures will be interactions between metabolites [5]. However, interspecies networks may also have benefits, such as opportunities for overwriting or transforming data, which hints at possibilities for synthetic metabolomic computation. One recurring challenge in metabolomics is obtaining trustworthy ‘ground truth’ samples. Perhaps by considering metabolomes as more abstract and mutable stores of information, we can develop new tools that allow us to overcome statistical biases, establish ground truths, and tease out subtle interactions and interconversion rates in well-regulated synthetic metabolomes.

Conclusion

‘Omics’ technologies have grown out of genomics to encompass other complex information-rich systems like the metabolome. It is natural to ask whether there exist complementary opportunities to make use of metabolites’ structural diversity and interactivity. As a proof of principle of postgenomic small-molecule information storage, we have experimentally encoded >100,000 bits of digital images into synthetic metabolomes (Table B in S1 File), and we are confident that this number can be increased significantly in the future. One novel contribution is the demonstration of data storage in a mixture of dissimilar molecules, which can improve information capacity and read times through diversity and parallelism. Perhaps more importantly, this work offers a new perspective on small-molecule chemical information, and it introduces possibilities for synthetic metabolomic computation and establishing metabolic ‘ground truths’ through interrogation of synthetic metabolomes.

Supporting information

S1 File. Supporting information.

Additional details about library compounds, read error rates, dataset sizes, repeated reads, data plates, error correlations, training cross-validation, and adducts.

https://doi.org/10.1371/journal.pone.0217364.s001

(PDF)

Acknowledgments

The authors are grateful for support from Sherief Reda, Eunsuk Kim, and Jason Sello. This research was supported by funding from the Defense Advanced Research Projects Agency (DARPA W911NF-18-2-0031, BMR and JKR). The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

References

1. Kell D. B. & Oliver S. G. The metabolome 18 years on: a concept comes of age. Metabolomics. 12(9), 148 (2016). pmid:27695392
2. Manzoni C. Kia D. A. Vandrovcova J. Hardy J. Wood N. W. Lewis P.A. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Briefings in Bioinformatics. 19(2), 286–302 (2016).
3. Schmölz L. Birringer M. Lorkowski S. & Wallert M. Complexity of vitamin E metabolism. World J. Biol Chem. 7(1), 14–43 (2016). pmid:26981194
4. Meiser J. Weindl D. & Hiller K. Complexity of dopamine metabolism. Cell Commun. Signal. 11, 34 (2013).
5. Sung J. Kim S. Jill Taar Cabatbat J. Jang S. Jin Y. Jung G. Y. et al. Global metabolic interaction network of the human gut microbiota for context-specific community-scale analysis. Nature Communications. 8, 15393 (2017).
6. Dettmer K. Aronov P. A. & Hammock B.D. Mass spectrometry-based metabolomics. Mass Spectrom Rev. 26(1), 51–78 (2007). pmid:16921475
7. Zhang A. Sun H. Yan G. Wang P. & Wang X. Mass spectrometry-based metabolomics: applications to biomarker and metabolic pathway research. Biomed Chromatogr. 30(1), 7–12 (2016). pmid:25739660
8. Park B. K. Boobis A. Clarke S. Goldring C. E. Jones D. Kenna J. G. et al. Managing the challenge of chemically reactive metabolites in drug development. Nature Reviews Drug Discovery. 10, 292–306 (2011). pmid:21455238
9. Sumner L. W. Lei Z. Nikolau B. J. & Saito K. Modern plant metabolomics: advanced natural product gene discoveries, improved technologies, and future prospects. Nat. Prod. Rep. 32(2), 212–229 (2015). pmid:25342293
10. Gowda G. A. & Djukovic D. Overview of mass spectrometry-based metabolomics: opportunities and challenges. Methods Mol Biol. 1198, 3–12 (2014). pmid:25270919
11. Matsuda F. Technical Challenges in Mass Spectrometry-Based Metabolomics. Mass Spectrom (Tokyo) 5(2), S0052 (2016).
12. Zampieri M. Sekar K. Zamboni N. & Sauer U. Frontiers of high-throughput metabolomics. Current Opinion in Chem. Bio. 36, 15–23 (2017).
13. Brown M. Dunn W. B. Dobson P. Patel Y. Winder C. L. Francis-McIntyre S. et al. Mass spectrometry tools and metabolite-specific databases for molecular identification in metabolomics. Analyst. 134(7), 1322–32 (2009). pmid:19562197
14. Wishart D. S. Feunang Y. D. Marcu A. Guo A. C. Liang K. Vázquez-Fresno R. et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46(D1), D608–D617 (2018).
15. Davis J. Microvenus. Art J. 55, 70 (1996).
16. Church G. M. Gao Y. & Kosuri S. Next-Generation Digital Information Storage in DNA. Science. 337(6102), 1628 (2012).
17. De Silva P. Y. & Ganegoda G. U. New Trends of Digital Data Storage in DNA. Biomed. Res. Int. 8072463 (2016). pmid:27689089
18. Cox J. & Mann M. Quantitative, High-Resolution Proteomics for Data-Driven Systems Biology. Ann. Rev. of Biochem. 80, 273–299 (2011).
19. Nagaraj N. Wisniewski J. R. Geiger T. Cox J. Kircher M. Kelso J. et al. Deep proteome and transcriptome mapping of a human cancer cell line. Molecular Systems Biology 7(1), 548 (2011). pmid:22068331
20. Roumpeka D. D. Wallace R. J. Escalettes F. Fotheringham I. & Watson M. A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data. Front. Genet. 8, 23 (2017). pmid:28321234
21. Hiller K. Hangebrauk J. Jäger C. Spura J. Schreiber K. & Schomburg D. MetaboliteDetector: comprehensive analysis tool for targeted and nontargeted GC/MS based metabolome analysis. Anal Chem. 81(9) 3429–39 (2009). pmid:19358599
22. Peregrín-Alvarez J. M. Sanford C. Parkinson J. The conservation and evolutionary modularity of metabolism. Genome Biol. 10(6): R63 (2009). pmid:19523219
23. Cartwright J. H. E., Giannerini S. & Gonzalez D. L. DNA as information: at the crossroads between biology, mathematics, physics and chemistry. Phil. Trans. Roy. Soc. of London A: M., Phys. and Eng. Sci. 374 (2064), (2016).
24. Wei P. Li B. Leon A.D. & Pentzer E. Beyond binary: optical data storage with 0, 1, 2, and 3 in polymer films. J. Mat. Chem. C. 5(23), 5780–5786 (2017).
25. Green J. E. Choi J. W. Boukai A. Bunimovich Y. Johnston-Halperin E. DeIonno E. et al. A 160-kilobit molecular electronic memory patterned at 10¹¹ bits per square centimetre. Nature. 445, 414, (2007). pmid:17251976
26. Sarkar T. Selvakumar K. Motiei L. & Margulies D. Message in a molecule. Nat. Comm. 7, 11374 (2016).
27. Arcadia, C. E. Hokchhay, T. Dombroski, A. Ferguson, K. Chen, S. L. Kim, E. et al. Parallelized Linear Classification with Volumetric Chemical Perceptrons. IEEE International Conference on Rebooting Computing (ICRC) (2018).
28. The Rhode Island Hope Regiment Colors 1781. Rhode Island State House, 82 Smith St. Providence, RI. USA.
29. Nikolaev E. N. Kostyukevich Y. I. & Vladimirov G. N. Fourier transform ion cyclotron resonance (FT ICR) mass spectrometry: Theory and simulations. Mass Spectrom Rev. 35(2), 219–258 (2016). pmid:24515872
30. Unknown artist. ‘Ibex or Gazelle, Block Print’, 13th or 14th century Egyptian. Ink and white pigment on paper. Accession 2016.624. Gallery 454. Metropolitan Museum of Art. Fifth Avenue, NY. USA.
31. Wilkinson C. K. ‘Cat Killing a Serpent’, 1921. Facsimile made with tempera on paper. Accession 30.4.1. Gallery 135. Metropolitan Museum of Art, Fifth Avenue, NY. USA.
32. Erlich Y. & Zielinski D. DNA Fountain enables a robust and efficient storage architecture. Science. 355, 950–954 (2017). pmid:28254941
33. Rose C. & Wright G. Inscribed matter as an energy-efficient means of communication with an extraterrestrial civilization. Nature 431, 47–49 (2004). pmid:15343327
34. Rosenstein J. K. Rose C. Reda S. Weber P. M. Kim E. Sello J. et al. Principles of Information Storage in Molecular Mixtures. arXiv:1905.02187 (submitted).
35. Kouskoumvekaki I. & Panagiotou G. Navigating the Human Metabolome for Biomarker Identification and Design of Pharmaceutical Molecules. J Biomed Biotechnol. 525497 (2011). pmid:20936122
36. Tang Y. Pingitore F. Mukhopadhyay A. Phan R. Hazen T. C. & Keasling J. D. Pathway Confirmation and Flux Analysis of Central Metabolic Pathways in Desulfovibrio vulgaris Hildenborough using Gas Chromatography-Mass Spectrometry and Fourier Transform-Ion Cyclotron Resonance Mass Spectrometry. J. Bacteriol. 189, 940–949 (2007). pmid:17114264
37. Belov M. E. Gorshkov M. V. Udseth H. R. Anderson G. A. & Smith R. D. Zeptomole-Sensitivity Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry of Proteins. Anal. Chem. 72(10), 2271–2279 (2000). pmid:10845374
38. Martens S. Landuyt A. Espeel P. Devreese B. Dawyndt P. & Du Prez F. Multifunctional sequence-defined macromolecules for chemical data storage. Nat. Commun. 9, 4451 (2018). pmid:30367037
39. Kennedy E. Dong Z. Tennant C. & Timp G. Reading the primary structure of a protein with 0.07 nm³ resolution using a subnanometre-diameter pore. Nat. Nanotechnol. 11(11), 968 (2016).
40. Arcadia C. E. Reyes C. C. & Rosenstein J. K. In Situ Nanopore Fabrication and Single-Molecule Sensing with Microscale Liquid Contacts. ACS Nano 11(5), 4907–4915 (2017). pmid:28485922
41. Galenkamp N. S. Soskine M. Hermans J. Wloka C. & Maglia G. Direct electrical quantification of glucose and asparagine from bodily fluids using nanopores. Nat. Commun. 9, 4085 (2018). pmid:30291230

Supporting information

S1 File. Supporting information.

Acknowledgments

References

Cookie Preference Center

Customize Your Cookie Preference