Complete Genes May Pass from Food to Human Blood

  • Sándor Spisák ,

    Affiliations Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary, Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America

  • Norbert Solymosi,

    Affiliations Department of Physics of Complex Systems, Eötvös University, Budapest, Hungary, Department of Animal Hygiene, Herd Health and Veterinary Ethology, Szent István University, Budapest, Hungary

  • Péter Ittzés,

    Affiliation Department of Physics of Complex Systems, Eötvös University, Budapest, Hungary

  • András Bodor,

    Affiliation Department of Physics of Complex Systems, Eötvös University, Budapest, Hungary

  • Dániel Kondor,

    Affiliation Department of Physics of Complex Systems, Eötvös University, Budapest, Hungary

  • Gábor Vattay,

    Affiliation Department of Physics of Complex Systems, Eötvös University, Budapest, Hungary

  • Barbara K. Barták,

    Affiliation 2nd Department of Internal Medicine, Semmelweis University, Budapest, Hungary

  • Ferenc Sipos,

    Affiliation 2nd Department of Internal Medicine, Semmelweis University, Budapest, Hungary

  • Orsolya Galamb,

    Affiliation 2nd Department of Internal Medicine, Semmelweis University, Budapest, Hungary

  • Zsolt Tulassay,

    Affiliations Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary, 2nd Department of Internal Medicine, Semmelweis University, Budapest, Hungary

  • Zoltán Szállási,

    Affiliation Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America

  • Simon Rasmussen,

    Affiliation Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark

  • Thomas Sicheritz-Ponten,

    Affiliation Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark

  • Søren Brunak,

    Affiliation Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark

  • Béla Molnár,

    Affiliations Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary, 2nd Department of Internal Medicine, Semmelweis University, Budapest, Hungary

  • István Csabai

    Affiliations Department of Physics of Complex Systems, Eötvös University, Budapest, Hungary, Department of Physics and Astronomy, The Johns Hopkins University, Baltimore, Maryland, United States of America



Our bloodstream is considered to be an environment well separated from the outside world and the digestive tract. According to the standard paradigm, large macromolecules consumed with food cannot pass directly to the circulatory system. During digestion, proteins and DNA are thought to be degraded into small constituents, amino acids and nucleotides, respectively, and then absorbed by a complex active process and distributed to various parts of the body through the circulation system. Here, based on the analysis of over 1000 human samples from four independent studies, we report evidence that meal-derived DNA fragments which are large enough to carry complete genes can avoid degradation and, through an unknown mechanism, enter the human circulation system. In one of the blood samples the relative concentration of plant DNA is higher than that of human DNA. The plant DNA concentration shows a surprisingly precise log-normal distribution in the plasma samples, while a non-plasma (cord blood) control sample was found to be free of plant DNA.


We are constantly exposed to foreign DNA from various sources, such as benign or malicious microbes in and on our body, pollen in the inhaled air and, in the largest amount, the daily food supply. DNA molecules are ubiquitous in large numbers in all raw and unprocessed food. Depending on the extent of processing, various fractions of DNA molecules of varying size may be present in the consumed product, even in processed food such as corn chips and chocolate [1].

The uptake and fate of foreign DNA ingested with the daily food intake in the gastrointestinal tract of mammals is not completely understood. Though exogenous nucleotides are essential at least for maintaining host immunity to allergenic tissues and restoring specific immune responses to foreign antigens [2], the amount of DNA in food is relatively low compared to other constituents and does not have significant nutritional value, hence nutritional studies rarely deal with this issue. The final step of uptake of nucleotides in the epithelium of the gastrointestinal tract is a relatively well understood complex process [3]. In contrast, understanding the degradation of long chains of DNA and the possible uptake of larger fragments faces many methodological challenges, and very few studies have been conducted on the digestion of food-derived DNA within the 6–8 m long digestive tract of adult humans [1]. Animal feeding studies have demonstrated that a minor amount of fragmented dietary DNA may resist the digestive process (for a recent review see [4]) and there are sporadic reports in the literature claiming that orally administered small fragments of bacterial DNA [5] or plant RNA [6] can cross the intestinal barrier, but no studies have explored whether large DNA segments can pass from natural food intake to the circulatory system.

Blood is not free of DNA. White blood cells have nuclei that contain genetic material, which accounts for the dominant part of the DNA in a full blood sample. Beyond the DNA contained in the white blood cells, the cell-free blood plasma contains DNA, too. This so-called circulating cell-free DNA (cfDNA) is an ideal target for testing the presence of foreign DNA, since most of the human “background” is removed together with the cellular fraction.

Characteristics of Cell-free DNA

Circulating cell-free DNA (cfDNA), defined as extracellular DNA occurring in body fluids, was discovered in the human bloodstream and first described in 1948 by Mandel and Metais [7], but its origin and possible role are still controversial. The cfDNAs are mostly double-stranded molecules with fragment sizes ranging widely from 180 bp up to 21 kbp [8], [9]. The shorter fragments are thought to be related to the histone octamer structure and the apoptotic degradation process, while necrosis results in much larger fragments. Through phagocytosis of apoptotic cells, macrophages may release the degraded DNA fragments into the bloodstream. These cfDNA fragments circulate as nucleoprotein complexes, and in healthy individuals the main part of cfDNA is found adsorbed to the surface of blood cells [10], [11].

The cfDNA concentration in healthy people is between 0 and 100 ng/ml with a mean of 13±3 ng/ml. This level is increased by an order of magnitude in various types of cancer, up to a mean of 180±38 ng/ml [12]. How the circulating cfDNA is then eliminated from the blood remains unknown in general, but altered nucleotide metabolism was observed in cancer patients. According to one hypothesis, the increased cfDNA concentration is caused by reduced DNase activity in the plasma of cancer patients [13], and indeed treatment of tumor-bearing mice with ultra-low doses of nucleases significantly decreased liver and lung metastasis [14]. On the other hand, according to Holdenrieder et al. [15], the efficiency of plasma nucleases is limited because the structure of nucleoprotein complexes is able to protect the cfDNA from degradation.

Studying the clearance of fetal DNA from maternal blood after birth using PCR, Lo et al. [16] observed a relatively short mean cfDNA half-life (16.3 min, range 4–30 min). Two phases can be distinguished in the elimination process: an initial rapid tissue uptake phase and a second, slower DNase-mediated phase [16], [17].
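To give a feel for what a 16.3 min half-life implies, the following minimal sketch assumes simple single-exponential decay. That is a simplification of the two-phase clearance described above, and the function names are illustrative only, not part of the cited studies.

```python
import math

def fraction_remaining(minutes: float, half_life_min: float = 16.3) -> float:
    """Fraction of the initial cfDNA left after `minutes`.

    Assumes single-exponential decay (a simplification; the text
    describes a fast and a slow clearance phase).
    """
    if half_life_min <= 0:
        raise ValueError("half_life_min must be positive")
    return 0.5 ** (minutes / half_life_min)

def minutes_until(fraction: float, half_life_min: float = 16.3) -> float:
    """Time needed for the signal to fall to `fraction` of its start value."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return half_life_min * math.log2(1 / fraction)

if __name__ == "__main__":
    # Fraction left two hours after the source of cfDNA stops.
    print(fraction_remaining(120))
    # Minutes until only 1% of the initial amount remains.
    print(minutes_until(0.01))
```

Under this assumption the signal from a single event fades within a few hours, which is why the time between a meal and blood collection could matter for any food-derived cfDNA.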

The cfDNA fragments circulating in the plasma are a mostly uniform sample of the whole genome; however, some fragments are over-represented. Increased DNA integrity was observed in plasma samples from cancer patients, due to a higher ratio of β-actin fragments with lengths of 400 bp compared to samples from patients with non-neoplastic diseases, which may be caused by the different origin and degradation rate of the cfDNA [18].

The Origin of the Cell Free DNA

There are many, sometimes contradictory, theories concerning the release of cfDNA and its distribution in the body. We are also only at the first steps of uncovering the cellular and molecular mechanisms that transfer cfDNA from cells to blood. Initially cfDNA was attributed to pathogens, later to different pathological conditions such as cancer, inflammation and autoimmune disease, and finally it was shown to be present in the plasma of subjects with normal physiological conditions [19], [20], too. Our current understanding is that apoptotic cells – which are present in healthy individuals, too – are the primary source. Additionally, in different diseases (inflammation, autoimmune disease, trauma and cancer) necrotic cells may increase the cfDNA level [8], [21].

There is an alternative theory, which suggests that white blood cells are the main source of cfDNA. Lee et al. [22] attribute the higher concentration in serum than in plasma samples to the lysis of white blood cells during clotting. Also, in lymphocytes, DNA with lower molecular weight than genomic DNA can form a complex with glycoproteins and be actively released into the bloodstream to act as a signaling molecule in different signal transduction pathways [23], [24].

Numerous groups have demonstrated that the genetic and epigenetic alterations of cfDNA in cancer patients can be detected [25], and a possible role in genometastasis has been suggested [26], too. If the issues concerning the great variations in sensitivity and specificity and the mismatch between the cancer profiles from cfDNA studies and other methods [20], [27] were resolved then cfDNA monitoring could be a promising tool in cancer diagnostics.

Foreign Sources of cfDNA

There is evidence that beyond the human cells of the subject other organisms can contribute to the cfDNA budget.

Other humans: Predominant donor origin was proved in patients receiving sex-mismatched bone marrow transplants using quantification of Y-chromosome sequences of plasma and serum cfDNA [28]. Cell-free DNA of the fetus can be detected in maternal plasma, promising non-invasive prenatal testing of fetal genetic conditions [29]. Though the fetal DNA is in relatively low concentration compared to the maternal cfDNA, fetal DNA has a lower molecular weight. With fragment size separation fetal DNA can be enriched [30] to a level that makes diagnostics possible. Note that in our study we use a similar technique and find that different sized cfDNA fractions may indeed have different origins.


Viruses: Virus DNA has been identified in plasma samples from patients with different virus-related tumors (lung, gastric, head and neck cancer) [31]–[33]; however, the virus DNA concentration could not be related to the size of the solid tumor and no viral DNA could be identified in cervical cancer [34].


Bacteria: Using 16S rDNA analysis, Jiang et al. [35] have shown that the bacterial DNA level in human plasma correlates with immune activation and the magnitude of immune restoration in antiretroviral-treated HIV-infected persons. Citrobacter freundii and Pseudomonas aeruginosa sequences were identified in patients with acute pancreatitis by a PCR and sequencing based approach [36].


Food: DNA from consumed food is usually not considered as a possible source of cfDNA, since during food digestion all macromolecules are thought to be degraded to elementary constituents such as amino acids and nucleotides, which are then transferred to the circulatory system through several complex active processes [3]. There are, however, animal studies, mainly focusing on the GMO issue [4], supporting the idea that small fragments of nucleic acids may pass to the bloodstream and even get into various tissues. For example, foreign DNA fragments were detected by PCR based techniques in the digestive tract and leukocytes of rainbow trout fed genetically modified soybean [37], and other studies report similar results in goats [38], pigs [39], [40] and mice [5].

Results and Discussion

As a first step we have surveyed the composition of cfDNA in samples from 200 human individuals pooled into four groups based on colonoscopy diagnosis as having inflammatory bowel disease (IBD), adenoma (AD), colorectal cancer (CRC) or as negative (NEG). To avoid contamination we have used a contained blood collection and plasma separation system. During the nucleic acid isolation, a laminar flow hood with HEPA filter and filtered pipetting tips were used. Since we separated DNA from particulates at an early stage, the only possibility of contamination would have been in the form of free DNA, which we find very improbable.

Since the sequencing technique produces relatively short reads (50 nt), it is not possible to estimate the original fragment size from the sequencing data alone. To be able to infer the foreign cfDNA fragment distribution, prior to sequencing each sample has been separated into three fractions according to their average DNA length. Fraction 1 contained intact DNA above 10 kb (10 thousand base pairs), fraction 2 fragments between 200 bp and 10 kb (smear) and fraction 3 segments around 200 bp long (nucleosomal DNA). After barcoding, fragment libraries were sequenced on a SOLiD IV Next Generation Sequencing (NGS) system yielding 50 nt long reads, a total of 86.6 Gbases. Sequencing data is publicly available here: Despite the relatively short lengths of the NGS reads, the separation into fractions and barcoding made it possible to identify the original size of the DNA fragments in the blood. On average 71.1% of the reads could be mapped to the human reference genome. The goal of the original study was to find (human) genetic differences between the four groups, according to the stage of their disease, but the relatively large amount of unmapped reads urged us to explore their origin, which is the subject of this article. By discarding the cellular DNA during sample preparation, using cfDNA only, and in this second step discarding the human-matching short reads, we have significantly enhanced the detection of any non-human DNA that may be present.

Before searching for traces of foreign genomes we have discarded the reads which matched the reference human genome. In this way we have excluded most of the possible homologous sequence reads which could give a false positive signal from low-complexity, repetitive or evolutionarily conserved human sequences. During the initial alignment to the human genome we have used permissive parameter settings (“-n 3”) of the Bowtie NGS aligner tool [41] that allowed alignments with several mismatches. This made it possible to identify reads which had mutations compared to the reference genome or which had read errors during the sequencing process. On the other hand, during the alignment to foreign genomes, to reduce the possibility of chance alignments, we have used a more stringent criterion, the “-n 0” switch of Bowtie, and accepted alignments only with a perfect match in the first 28 nt long seed region. To further reduce the possibility of false positives and chance matches to homologous, evolutionarily conserved human segments, we have aligned the reads matching the tomato genome against the whole refseq genomic collection of NCBI using BLASTN [42] with default settings (blast2 -p blastn -i sample.fa -d refseq_genomic -m 7 -o results.xml). The resulting GenBank IDs were joined to the NCBI taxonomy database to associate them with classes and divisions.
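The asymmetry between the permissive human filter and the stringent foreign-genome search can be illustrated with a toy mismatch check. This is only a sketch of the seed criterion described above, not Bowtie itself: the function names are hypothetical, the read is compared to a single pre-chosen reference window, and quality values, colorspace encoding and index search are all ignored.

```python
def seed_mismatches(read: str, ref_window: str, seed_len: int = 28) -> int:
    """Count mismatches within the first `seed_len` bases of a read.

    `ref_window` is the reference substring the read is compared to;
    it must have the same length as the read.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    pairs = zip(read[:seed_len], ref_window[:seed_len])
    return sum(1 for a, b in pairs if a != b)

def accepts(read: str, ref_window: str, max_seed_mismatches: int,
            seed_len: int = 28) -> bool:
    """True if the seed has at most `max_seed_mismatches` mismatches."""
    return seed_mismatches(read, ref_window, seed_len) <= max_seed_mismatches

if __name__ == "__main__":
    read = "A" * 50
    ref = "AAAAACAAAA" + "A" * 40      # one mismatch inside the seed
    # Permissive setting (up to 3 seed mismatches): treated as human, discarded.
    print(accepts(read, ref, max_seed_mismatches=3))   # True
    # Stringent setting (perfect seed): not counted as a foreign match.
    print(accepts(read, ref, max_seed_mismatches=0))   # False
```

The design choice is conservative in both directions: a read that is even roughly human is removed, and a read is only counted as foreign if its seed matches the foreign genome exactly.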

When the sequences were tested against the chloroplast genome collection of NCBI (Table 1), over 25,000 sequence reads (Table 2) aligned to plant chloroplasts, among which Solanum tuberosum (potato) and/or the closely related Solanum lycopersicum (tomato) were the most abundant. Calculating the statistics for the tomato chloroplast alone, 127,885 of the 155,461 nucleotides in the plastome (about 82%) are covered by at least one read for the IBD sample. The average coverage is 6.3, which is higher than the sample's coverage of 4.9 for the human genome (see Figure 1). We have found hints of the presence of DNA from other food related species (e.g. chicken), but due to the larger genetic homology between vertebrates, larger samples would be needed for convincing results; these results will be discussed elsewhere.
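The two coverage figures quoted above (fraction of positions covered, and average depth) can be computed from read alignment positions as in the following sketch. It assumes 0-based start positions and treats the genome as linear, although chloroplast genomes are circular. The function name is illustrative.

```python
def coverage_stats(read_starts, read_len, genome_len):
    """Return (breadth, mean_depth) for reads aligned to a linear genome.

    breadth    = fraction of positions covered by at least one read
    mean_depth = total aligned bases divided by genome length
    """
    depth = [0] * genome_len
    for start in read_starts:
        # Clip reads that would run past the end of the genome.
        for pos in range(start, min(start + read_len, genome_len)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / genome_len, sum(depth) / genome_len

if __name__ == "__main__":
    # Toy example: two overlapping 4 nt reads on a 10 nt genome.
    print(coverage_stats([0, 2], read_len=4, genome_len=10))  # (0.6, 0.8)
    # Breadth reported in the text for the tomato plastome.
    print(round(127885 / 155461, 3))  # 0.823
```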

Figure 1. Coverage of the tomato chloroplast in the IBD sample.

Small gray dots indicate the counts of alignments at individual nucleotide positions, darker shades are the result of several overlapping points. The orange line is the smoothed coverage of the tomato chloroplast, while the short gray dash indicates the average coverage level of the human genome for the same sample.

Table 2. The initial number of sequence reads and the ones matching the chloroplast genome collection.

The number of aligning short reads shows large differences between the various samples (see Table 2). Most of the matches are in the 1st fraction of IBD, which contains the longest (>10 kb) intact DNA segments. This is surprising in the light of the current paradigm [43], which assumes that during digestion and absorption DNA is degraded to nucleotides. Our results show not just that some of the DNA can avoid complete degradation, but that fragments large enough to carry complete genes can pass from the digestive tract to blood. As shown in Table 2, the BLAST verification is consistent with the original findings: for chloroplast target sequences the dominant part of BLAST hits matched plants only (i.e. not any other species in NCBI RefSeq). The bacterial matching reads can be the result of the genetic homology of the chloroplast genome and bacterial genomes, or may indicate the presence of bacterial DNA in the samples.

All these results strengthen our conclusion that the meal-derived DNA fragments are able to avoid the total degradation in the gastrointestinal tract and enter the circulation through a previously unknown mechanism.

Validation on Independent Samples

NGS technology is evolving so fast and sequences are produced at such a rate that detailed understanding of all the information hidden in them cannot keep pace with data collection; hence already analyzed data may provide new insights for another research question. So, to confirm our discovery we have searched the publicly available NGS archives [44]–[46] for circulating cell-free DNA sequencing data. Compared to nuclear genome sequencing studies, plasma DNA data is very rare in the archives. We have found altogether 909 samples from 907 individuals in three studies with accession numbers DRP000446, SRP009039 and SRP016573. The analysis of these independent NGS data confirms our hypothesis that the presence of foreign DNA in human plasma is not unusual, though it shows large variation from subject to subject. The SRP016573 study also provides a natural ‘negative control sample’ and eliminates the possibility that the results are mere statistical artifacts, since no trace of plant DNA was found in cord blood samples while more than 1000 reads were detected in the maternal plasma.

Independent sample from subject with inflammation shows high plant DNA concentration.

The original goal of the DRP000446 study [47] was to detect potential pathogens in patients with Kawasaki disease, which is an autoimmune disease that involves the inflammation of blood vessels. The authors of the study collected 6 DNA samples: two from a formalin-fixed paraffin-embedded lymph node biopsy sample, one from a pharyngeal swab sample and three from serum specimens at different stages of the disease. We have analyzed the sequencing data for the three serum samples DRR001355, DRR001356 and DRR001357. The total number of reads in the three samples is only 3.2 M, which is much less than the 1732 M in our study, but since a different sequencer (Illumina Genome Analyzer II) was used, the reads are longer (81 nt) than the 50 nt long reads in our study (ABI SOLiD 4 System), further reducing the probability of false positives. Using the same pipeline as above, we have discarded the reads which match the human genome, then aligned the remaining ones to the chloroplast database. The largest number of unique positions was found for Brassica rapa (NC_015139), followed closely by orange (NC_008334). We provide the coverage map in Figure 2. For Brassica rapa, 27742 nucleotide positions of the total 180852 are covered. Counting the multiply covered regions, the average coverage is 0.56. Note, however, that the coverage is less uniform than for our IBD sample: the 16S and 23S rRNA regions are overrepresented. This indicates that some of the matching DNA fragments may originate from some other related species which is missing from the chloroplast genome collection we use. Also, since chloroplasts evolved from endosymbiotic bacteria, bacterial genome fragments may align to this evolutionarily conserved region. Indeed, if we BLAST all the 1634 reads that matched the chloroplast genomes against the refseq database, 733 of them also match various bacterial genomes, but 894 do not match any other organisms, just plants. The coverage for Brassica rapa (orange spikes in Figure 2) without the bacterial reads is more uniform. Though the presence of the chloroplast genome is less definitive in this sample than in our samples, the ratio of chloroplast matching reads to total reads is even higher. The initial number of reads for the pooled IBD samples was 478 M and, after BLAST filtering of non-plant sequences, there were 23649 matches for the chloroplast genomes, i.e. 49 matches per million reads (49 ppm); for the other 3 samples these ratios are around or below 1 ppm. For the DRP000446 sample the corresponding ratio is 1634/3.2 M = 497 ppm (272 ppm without bacterial tags). Note that inflammation is present both in the IBD patients and in the Kawasaki disease subject, hence from these samples we cannot exclude the possibility that the presence of food DNA in high concentration is linked to inflammation.
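The ppm ratios used throughout are simply matching reads per million total reads. A minimal sketch of the arithmetic, using the read counts quoted above (the function name is illustrative):

```python
def ppm(matching_reads: float, total_reads: float) -> float:
    """Matching reads per million total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1e6 * matching_reads / total_reads

if __name__ == "__main__":
    # Pooled IBD sample: 23649 chloroplast matches out of 478 M reads.
    print(round(ppm(23649, 478e6)))   # 49
    # DRP000446 serum samples with the rounded total of 3.2 M reads.
    print(round(ppm(1634, 3.2e6)))    # 511
```

With the rounded total of 3.2 M the second ratio comes out near 511 ppm rather than the quoted 497 ppm. The quoted value would correspond to roughly 3.29 M reads, so the difference is presumably due to rounding of the read total in the text.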

Figure 2. Brassica rapa chloroplast coverage pileup for the DRP000446 study.

The gray spikes show the counts of alignments at individual nucleotide positions (vertical scale is logarithmic). 27742 nucleotide positions of the total 180852 are covered. There are two regions, around 100,000 and 135,000, where the coverage is more than 10 times higher than in other parts of the chloroplast. These are the regions where the ribosomal RNA genes are found, which share very similar sequence with other chloroplasts and bacterial genomes. Indeed, if we BLAST all the 1634 reads that matched the chloroplast genomes against the NCBI reference sequence database, 733 of them also match various bacterial genomes, but 894 do not match any other organisms, just plants. Removing those alignments that match bacterial genomes too (orange spikes) makes the distribution more uniform.

The amount of plant DNA in 903 individual maternal plasma samples is log-normally distributed and hints at dietary patterns.

Since the IBD sample was a pooled sample of 50 individuals, we do not know how many individual samples contributed to the chloroplast matching reads. In the SRP009039 study [48], plasma DNA of 903 healthy pregnant women with ages ranging from 20 to 45 years was sequenced to study the possibility of prenatal noninvasive diagnosis of fetal trisomy. Depending on the platform, Illumina GAIIx or Illumina HiSeq 2000, the length of the reads is 36 nt or 50 nt, respectively. Though the individual read count is relatively small, typically in the 1 M–14 M reads/sample range, the samples are individually identified, so in contrast to our pooled samples we can look for individual differences. As for the previous samples, we have tested for the presence of plant chloroplast DNA and got the largest coverage for soybean (Glycine max, NC_007942.1), with uniform coverage.

The overall average chloroplast DNA ratio is 1.481 ppm but there is a very large variation from sample to sample, so we visualize their cumulative distribution on a logarithmic scale in Figure 3. The numbers of reads per sample are in the range of 940,929–12,827,703 with an average of 2,483,480, so it is not possible to detect concentrations below 0.078 ppm for the largest, and below 0.35 ppm for the average sized sample. In 75% of the samples we could detect plant DNA, and for 220 of the total 903 subjects there are no aligning reads at all, most probably because of the wide distribution and the low coverage. Down to the above mentioned cutoff value the data can be fitted with the following log-normal distribution:

$$f(x;\mu,\sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\,\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right) \qquad (1)$$

with only two free parameters, the location parameter $\mu$ and the scale parameter $\sigma$, the analogs of mean and standard deviation, respectively. If we take into account the finite size of the samples, even the cutoff break around 0.35 ppm can be modeled. The gray shaded band in Figure 3 is the result of the simulation of 300 realizations of the log-normal process, taking into account the actual sizes of the samples. Though the log-normal distribution is ubiquitous in almost all disciplines [49], the precise agreement between the data and the model is quite surprising. The trend may be explained by exponential decay dynamics of foreign cfDNA with randomly varying half-lives or waiting times between consumption and blood sample collection.
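The finite-sample effect described above can be sketched as follows: each subject's true concentration is drawn from a log-normal distribution, and the observed value is a read count out of that sample's total reads, so nothing below one read per sample can be seen. This is a minimal illustration, not the authors' simulation. The Poisson read-count model, the helper names and the values of `mu` and `sigma` are assumptions, since the fitted parameters are not given in the text.

```python
import math
import random

def detection_limit_ppm(n_reads: int) -> float:
    """Smallest nonzero concentration observable: one read in n_reads."""
    return 1e6 / n_reads

def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson variate (Knuth's method; normal approximation if large)."""
    if lam > 30:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_observed_ppm(sample_sizes, mu, sigma, rng):
    """One realization of observed ppm values for the given sample sizes."""
    observed = []
    for n_reads in sample_sizes:
        true_ppm = rng.lognormvariate(mu, sigma)   # hypothetical parameters
        hits = poisson(true_ppm * 1e-6 * n_reads, rng)
        observed.append(1e6 * hits / n_reads)
    return observed

if __name__ == "__main__":
    print(round(detection_limit_ppm(12_827_703), 3))  # 0.078
    rng = random.Random(1)
    sizes = [940_929, 2_483_480, 12_827_703] * 100
    # 300 realizations, as in the text; mu and sigma are made-up values.
    runs = [simulate_observed_ppm(sizes, mu=-0.5, sigma=1.5, rng=rng)
            for _ in range(300)]
    zero_share = sum(v == 0 for v in runs[0]) / len(sizes)
    print(f"share of samples with no matching read: {zero_share:.2f}")
```

Sorting each realization and taking the pointwise minimum and maximum of the cumulative curves would give a band comparable in spirit to the shaded region described for Figure 3.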

Figure 3. The cumulative distribution of plant DNA amount for over 900 subjects.

The distribution (black dots) can be fitted with a log-normal distribution (red curve) above the sensitivity cutoff (0.35 ppm). The gray shaded band is the result of the simulation of 300 realizations of the log-normal process, taking into account the varying sizes of the samples. Among the independent samples (larger dots), the ones from patients with inflammatory diseases (IBD, DRP000446) have the largest concentration. For the SRP016573 sample only the maternal plasma concentration is shown; full blood samples with 0.001 ppm and 0.004 ppm and cord blood samples with zero alignments are omitted from the figure.

There are alignments to several plant species, and since the samples are from over 900 different subjects we can test the individual differences. This can be considered a test of contamination, too. If the external DNA really originates from food, we expect different plants to dominate different samples, according to the different diets of the subjects, while lab contamination would most probably result in the same composition in all samples. In Figure 4 we show how the number of matching reads is distributed between subjects and different plant species. To make the visualization of the broad distribution possible, only plants with at least 50 and samples with at least 10 aligning reads are shown. The clustering algorithm recovers the taxonomic groups of plants. The first three species (beans) are members of the Fabaceae family, and the next eight species belong to the Brassicaceae family. These two families are distantly related in the Eurosids clade. There are four members from the Solanaceae family (including potato and tobacco) and one from the Convolvulaceae (Ipomoea, Cuscuta) family. These two families are members of the Solanales order. The remaining eight species are from the Poaceae family of the Monocots clade [50]. All these 24 plants are often consumed by humans or are close relatives of frequently eaten species, while many non-edible plants which were part of the aligned chloroplast database do not show up on the list (see Table 1 for the complete list of aligned species). Note that, on the one hand, non-edible but genetically related species can show up, and on the other hand, not all the frequently eaten plant species are part of our chloroplast genome collection. We suspect that the only outlier, the non-edible Ipomoea purpurea (morning glory), shows up because of its similarity to the genome of Ipomoea batatas (sweet potato) or Ipomoea aquatica (kangkong, or Chinese spinach), a common ingredient in Southeast Asian dishes. Though the number of reads is too small to reconstruct the diet of the individuals, subjects with high Poaceae, high Fabaceae and “high everything except Poaceae” levels can be grouped together. We consider this pattern in food genome coverage further evidence that the signal is not a statistical artifact.
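The thresholding step and the contamination argument can be sketched on a toy count table. The sketch assumes the thresholds apply to row and column totals, which the text does not state explicitly. The species and sample names below are made-up illustration data, and the clustering itself is not reproduced.

```python
def filter_counts(counts, min_plant_reads=50, min_sample_reads=10):
    """Keep plants and samples whose total read counts reach the thresholds.

    counts: dict mapping plant -> dict mapping sample -> read count.
    Totals are computed on the unfiltered table (an assumption).
    """
    plant_totals = {p: sum(row.values()) for p, row in counts.items()}
    sample_totals = {}
    for row in counts.values():
        for sample, n in row.items():
            sample_totals[sample] = sample_totals.get(sample, 0) + n
    plants = [p for p, t in plant_totals.items() if t >= min_plant_reads]
    samples = [s for s, t in sample_totals.items() if t >= min_sample_reads]
    return {p: {s: counts[p].get(s, 0) for s in samples} for p in plants}

def dominant_plant(filtered, sample):
    """Plant with the most reads in a given sample."""
    return max(filtered, key=lambda p: filtered[p][sample])

if __name__ == "__main__":
    toy = {
        "soybean": {"s1": 60, "s2": 1},
        "rice":    {"s1": 2,  "s2": 55},
        "pine":    {"s1": 1,  "s3": 3},
    }
    table = filter_counts(toy)
    print(sorted(table))                 # ['rice', 'soybean']
    print(dominant_plant(table, "s1"))   # soybean
    print(dominant_plant(table, "s2"))   # rice
```

If the signal came from a shared laboratory contaminant, the dominant species would be expected to be the same in every sample. Different dominant species across samples, as in the toy table, is the pattern the argument above relies on.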

Figure 4. Heatmap of the number of chloroplast matching reads for the SRP009039 study.

From the total 903 subjects the ones with the largest number of matches are shown (only the plant genomes with more than 50, and only the subjects with more than 10 matching reads), the rows are the plant species, the columns are the samples. The automatic clustering recovers the related plant species and the subjects can be also grouped by the food types.

Cord blood is free of plant DNA while it can be detected in mother's plasma.

Four samples, from fetal umbilical cord blood, maternal plasma and the mother's and father's peripheral blood, were analyzed at high sequencing depth in the SRP016573 study to noninvasively infer fetal genotype and haplotype and identify Mendelian-disorder genes and complex disease-associated markers [51]. The setup of this study is ideal for testing for plant DNA. According to our previous results we expect to find some plant DNA in the maternal plasma sample. Though peripheral full blood samples should contain traces of plant DNA, we expect its relative concentration to be much smaller due to the higher amount of human DNA from blood cells. Though the maternal blood reaches the fetal chorion, and fetal DNA can be detected in the mother's blood, there is no direct fluid exchange between mother and fetus. So, even though some plasma DNA may have passed from mother to child, its concentration would be much smaller in cord blood. We can use the cord blood sample as a natural ‘negative control’: if the plant DNA signal were the result of some contamination during the processing or a statistical artifact, it should show up in this sample, too. Beyond high sequencing depth, the paired layout of this study with 2×100 nt long reads (Illumina HiSeq 2000) further diminishes the chance of false positives.

While 1110 reads (0.703 ppm) from the maternal plasma aligned to chloroplast genomes, only 3 reads (0.004 ppm) from the father's and 1 read (0.001 ppm) from the mother's full blood sample matched them. There was not a single chloroplast matching read among the 560 M from the umbilical cord blood (see Table 3).

Table 3. The number of sequence reads in the samples and the number and ratio of chloroplast matching ones.


The analysis of all the publicly available circulating cell-free DNA sequencing data of over 1000 human subjects confirms our hypothesis that the presence of foreign DNA in human plasma is not unusual. It shows large variation from subject to subject following strikingly well a log-normal distribution with the highest concentration in patients with inflammation (Kawasaki disease, IBD). These findings could lead to a revision of our view of degradation and absorption mechanisms of nucleic acids in the human body.

Materials and Methods

Here we describe the details for the NEG, IBD, AD, CRC samples. For DRP000446, SRP009039 and SRP016573 samples see the cited papers and the description at the archives, respectively.

Anticoagulated blood was collected from the antecubital veins of fifty healthy individuals (median age, 42.6 years) who had negative colonoscopy. Blood was also collected from three diseased groups, including inflammatory bowel disease (48) (median age, 35.2 years), colorectal adenoma (35) (median age, 53.4 years) and colorectal cancer (37) (median age, 64.9 years), with positive macroscopic and pathological findings. The study plan was prepared in accordance with current legislation and the World Medical Association Declaration of Helsinki. Ethics Committee approval was obtained (Nr.: TUKEB 2009/037. Semmelweis University Regional and Institutional Committee of Science and Research Ethics, Budapest, Hungary) and written informed consent was provided by all patients.

Plasma Separation and Cell Free DNA Isolation

Whole-blood samples were collected into Vacutainer tubes (BD Medical Systems), and plasma was separated by double centrifugation (2×1500 g for 10 min) at 4°C within 1 h of blood collection. The purified plasma fraction was stored at −80°C. cfDNA was extracted from plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen) following the manufacturer's instructions with modification. Briefly, cfDNA was isolated from 5 ml plasma without the addition of carrier RNA. Quantification of cfDNA was performed using the Qubit dsDNA HS Assay fluorometric Kit (Invitrogen). Eluates were pooled in equivalent amounts from each sample and concentrated to a final volume of 200 µl using the QIAamp Circulating Nucleic Acid Kit (Qiagen). To achieve the optimal sample volume (50 µl), a SpeedVac (Eppendorf) concentrator was used.
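
The pooling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads "equivalent amount" as equal DNA mass per sample, the function name is hypothetical, and the concentrations are invented placeholders standing in for Qubit readings, not values from the study.

```python
def pooling_volumes_ul(concentrations_ng_per_ul, mass_per_sample_ng):
    """Volume (in microlitres) to take from each eluate so that every
    sample contributes the same DNA mass to the pool."""
    volumes = {}
    for sample, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = mass_per_sample_ng / conc
    return volumes


# Hypothetical Qubit readings in ng/µl (placeholders, NOT study data).
hypothetical_concentrations = {"sample_A": 0.5, "sample_B": 1.0, "sample_C": 2.0}

volumes = pooling_volumes_ul(hypothetical_concentrations, mass_per_sample_ng=10.0)
pool_volume_ul = sum(volumes.values())

for sample, vol in volumes.items():
    print(f"{sample}: take {vol:.1f} µl")
print(f"pool volume before concentration: {pool_volume_ul:.1f} µl")
```

The more dilute an eluate is, the larger the volume it contributes, which is why the pooled sample then has to be concentrated down to a fixed working volume.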

SOLiD Fragment Library Preparation and Sequencing

DNA fragment library resequencing was performed on the SOLiD IV system. A total of 3–5 µg cfDNA was pooled from each group, and three fractions were separated via electrophoresis using a SYBR Safe 1% TBE agarose gel (Invitrogen) and recovered with the QIAquick Gel Extraction Kit (Qiagen). The three fractions were separated according to sequence length: the first fraction is the intact DNA above 10 kb, the second is between 200 bp and 10 kb (smear) and the third is around 200 bp (nucleosomal DNA). In the case of fractions 1 and 2, physical fragmentation was optimized and performed with a Covaris S2 instrument. The fractions were labeled with individual barcodes (Life Technologies). The size-selected DNA (100 µl) was end-repaired by adding 40 µl 5× End-Polishing Buffer, 4 µl dNTP mix (10 mM), 2 µl End Polishing Enzyme 1 (10 U/µl), 8 µl End Polishing Enzyme 2 (5 U/µl) and 46 µl MQ distilled water to a total volume of 200 µl and incubating for 30 min at room temperature. The DNA (180 µl) was purified with AMPure XP beads (70 µl) (Agencourt). Nick translation and amplification (15 cycles) were performed to amplify the ligated and purified DNA using Platinum PCR Amplification Mix (Life Technologies). The distribution of the amplified and non-ligated DNA was checked using the Agilent High Sensitivity DNA kit (Agilent). Size selection was performed using the E-gel 2% system (Invitrogen) with the following parameters: required size: 200–250 bp, iBase program: Run E Gel DC, run time: 16 min, 200 ng 50 bp ladder. Library quantification was performed by TaqMan assay, followed by ePCR. After quantification, 7×10⁸ beads were loaded onto the sequencing slides. The sequencing yielded 50 nt long reads, a total of 86.6 Gbases.
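
Two figures in this paragraph can be cross-checked with basic arithmetic. The Python sketch below is illustrative and rests on two assumptions: the component volumes of the end-repair reaction are read as microlitres, and every sequenced read is taken to be exactly 50 nt long, so the resulting read count is an estimate derived here and not a number reported by the study.

```python
# End-repair reaction components, volumes assumed to be in microlitres.
END_REPAIR_MIX_UL = {
    "size-selected DNA": 100,
    "5x End-Polishing Buffer": 40,
    "dNTP mix (10 mM)": 4,
    "End Polishing Enzyme 1": 2,
    "End Polishing Enzyme 2": 8,
    "MQ distilled water": 46,
}

# The components should add up to the stated total reaction volume.
total_volume_ul = sum(END_REPAIR_MIX_UL.values())

# Sequencing yield: 86.6 Gbases in reads assumed to be exactly 50 nt each.
READ_LENGTH_NT = 50
TOTAL_YIELD_BASES = 86.6e9
approx_reads = TOTAL_YIELD_BASES / READ_LENGTH_NT

print(f"end-repair reaction volume: {total_volume_ul} µl")
print(f"approximate number of reads: {approx_reads:.3e}")
```

The component volumes sum to the stated 200 µl, and the yield corresponds to roughly 1.7 billion 50 nt reads.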

Author Contributions

Conceived and designed the experiments: SS NS IC. Performed the experiments: SS OG FS BKB. Analyzed the data: SS NS PI AB GV FS ZS SR TSP SB DK IC. Contributed reagents/materials/analysis tools: SS GV ZT BM IC. Wrote the paper: SS NS ZS SR TSP SB IC.


  1. Nielsen K, Daffonchio D (2010) Unintended Horizontal Transfer of Recombinant DNA. TWN Biotechnology & Biosafety Series no. 13, TWN (ISBN: 978-967-5412-34-9).
  2. Van Buren CT, Kulkarni AD, Rudolph FB (1994) The role of nucleotides in adult nutrition. J Nutr 124: 160S–164S.
  3. Sanderson IR, He Y (1994) Nucleotide uptake and metabolism by intestinal epithelial cells. J Nutr 124: 131S–137S.
  4. Rizzi A, Raddadi N, Sorlini C, Nordgård L, Nielsen KM, et al. (2012) The stability and degradation of dietary DNA in the gastrointestinal tract of mammals: implications for horizontal gene transfer and the biosafety of GMOs. Crit Rev Food Sci Nutr 52: 142–161.
  5. Schubbert R, Hohlweg U, Renz D, Doerfler W (1998) On the fate of orally ingested foreign DNA in mice: chromosomal association and placental transmission to the fetus. Mol Gen Genet 259: 569–576.
  6. Zhang L, Hou D, Chen X, Li D, Zhu L, et al. (2012) Exogenous plant MIR168a specifically targets mammalian LDLRAP1: evidence of cross-kingdom regulation by microRNA. Cell Res 22: 107–126.
  7. Mandel P, Metais P (1948) Les acides nucléiques du plasma sanguin chez l'homme. CR Acad Sci Paris 142: 241–243.
  8. Jahr S, Hentze H, Englisch S, Hardt D, Fackelmayer FO, et al. (2001) DNA fragments in the blood plasma of cancer patients: quantitations and evidence for their origin from apoptotic and necrotic cells. Cancer Res 61: 1659–1665.
  9. Stroun M, Anker P, Lyautey J, Lederrey C, Maurice PA (1987) Isolation and characterization of DNA from the plasma of cancer patients. Eur J Cancer Clin Oncol 23: 707–712.
  10. Skvortsova TE, Rykova EY, Tamkovich SN, Bryzgunova OE, Starikov AV, et al. (2006) Cell-free and cell-bound circulating DNA in breast tumours: DNA quantification and analysis of tumour-related gene methylation. Br J Cancer 94: 1492–1495.
  11. Ponomaryova AA, Rykova EY, Cherdyntseva NV, Skvortsova TE, Dobrodeev AY, et al. (2011) RARβ2 gene methylation level in the circulating DNA from blood of patients with lung cancer. Eur J Cancer Prev 20: 453–455.
  12. Leon SA, Shapiro B, Sklaroff DM, Yaros MJ (1977) Free DNA in the serum of cancer patients and the effect of therapy. Cancer Res 37: 646–650.
  13. Cherepanova AV, Tamkovich SN, Bryzgunova OE, Vlassov VV, Laktionov PP (2008) Deoxyribonuclease activity and circulating DNA concentration in blood plasma of patients with prostate tumors. Ann N Y Acad Sci 1137: 218–221.
  14. Patutina O, Mironova N, Ryabchikova E, Popova N, Nikolin V, et al. (2011) Inhibition of metastasis development by daily administration of ultralow doses of RNase A and DNase I. Biochimie 93: 689–696.
  15. Holdenrieder S, Stieber P, Chan LY, Geiger S, Kremer A, et al. (2005) Cell-free DNA in serum and plasma: comparison of ELISA and quantitative PCR. Clin Chem 51: 1544–1546.
  16. Lo YM, Zhang J, Leung TN, Lau TK, Chang AM, et al. (1999) Rapid clearance of fetal DNA from maternal plasma. Am J Hum Genet 64: 218–224.
  17. Minchin RF, Carpenter D, Orr RJ (2001) Polyinosinic acid and polycationic liposomes attenuate the hepatic clearance of circulating plasmid DNA. J Pharmacol Exp Ther 296: 1006–1012.
  18. Wang BG, Huang HY, Chen YC, Bristow RE, Kassauei K, et al. (2003) Increased plasma DNA integrity in cancer patients. Cancer Res 63: 3966–3968.
  19. Bendich A, Wilczok T, Borenfreund E (1965) Circulating DNA as a possible factor in oncogenesis. Science 148: 374–376.
  20. Jung K, Fleischhacker M, Rabien A (2010) Cell-free DNA in the blood as a solid tumor biomarker: a critical appraisal of the literature. Clin Chim Acta 411: 1611–1624.
  21. Lo YM, Rainer TH, Chan LY, Hjelm NM, Cocks RA (2000) Plasma DNA as a prognostic marker in trauma patients. Clin Chem 46: 319–323.
  22. Lee TH, Montalvo L, Chrebtow V, Busch MP (2001) Quantitation of genomic DNA in plasma and serum samples: higher concentrations of genomic DNA found in serum than in plasma. Transfusion 41: 276–282.
  23. Rogers JC, Boldt D, Kornfeld S, Skinner A, Valeri CR (1972) Excretion of deoxyribonucleic acid by lymphocytes stimulated with phytohemagglutinin or antigen. Proc Natl Acad Sci USA 69: 1685–1689.
  24. Stroun M, Lyautey J, Lederrey C, Mulcahy HE, Anker P (2001) Alu repeat sequences are present in increased proportions compared to a unique gene in plasma/serum DNA: evidence for a preferential release from viable cells? Ann N Y Acad Sci 945: 258–264.
  25. Warren JD, Xiong W, Bunker AM, Vaughn CP, Furtado LV, et al. (2011) Septin 9 methylated DNA is a sensitive and specific blood test for colorectal cancer. BMC Med 9: 133.
  26. Garcia-Olmo DC, Dominguez C, Garcia-Arranz M, Anker P, Stroun M, et al. (2010) Cell-free nucleic acids circulating in the plasma of colorectal cancer patients induce the oncogenic transformation of susceptible cultured cells. Cancer Res 70: 560–567.
  27. Ziegler A, Zangemeister-Wittke U, Stahel RA (2002) Circulating DNA: a new diagnostic gold mine? Cancer Treat Rev 28: 255–271.
  28. Lui YY, Chik KW, Chiu RW, Ho CY, Lam CW, et al. (2002) Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin Chem 48: 421–427.
  29. Lo YM, Corbetta N, Chamberlain PF, Rai V, Sargent IL, et al. (1997) Presence of fetal DNA in maternal plasma and serum. Lancet 350: 485–487.
  30. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR (2010) Analysis of the size distributions of fetal and maternal cell-free DNA by paired-end sequencing. Clin Chem 56: 1279–1286.
  31. Ngan RK, Yip TT, Cheng WW, Chan JK, Cho WC, et al. (2002) Circulating Epstein-Barr virus DNA in serum of patients with lymphoepithelioma-like carcinoma of the lung: a potential surrogate marker for monitoring disease. Clin Cancer Res 8: 986–994.
  32. Lo YM, Chan WY, Ng EK, Chan LY, Lai PB, et al. (2001) Circulating Epstein-Barr virus DNA in the serum of patients with gastric carcinoma. Clin Cancer Res 7: 1856–1859.
  33. Capone RB, Pai SI, Koch WM, Gillison ML, Danish HN, et al. (2000) Detection and quantitation of human papillomavirus (HPV) DNA in the sera of patients with HPV-associated head and neck squamous cell carcinoma. Clin Cancer Res 6: 4171–4175.
  34. Dong SM, Pai SI, Rha SH, Hildesheim A, Kurman RJ, et al. (2002) Detection and quantitation of human papillomavirus DNA in the plasma of patients with cervical carcinoma. Cancer Epidemiol Biomarkers Prev 11: 3–6.
  35. Jiang W, Lederman MM, Hunt P, Sieg SF, Haley K, et al. (2009) Plasma levels of bacterial DNA correlate with immune activation and the magnitude of immune restoration in persons with antiretroviral-treated HIV infection. J Infect Dis 199: 1177–1185.
  36. de Madaria E, Martinez J, Lozano B, Sempere L, Benlloch S, et al. (2005) Detection and identification of bacterial DNA in serum from patients with acute pancreatitis. Gut 54: 1293–1297.
  37. Chainark P, Satoh S, Hirono I, Aoki T, Endo M (2008) Availability of genetically modified feed ingredient: investigations of ingested foreign DNA in rainbow trout Oncorhynchus mykiss. Fisheries Science 74: 380–390.
  38. Tudisco R, Mastellone V, Cutrignelli MI, Lombardi P, Bovera F, et al. (2010) Fate of transgenic DNA and evaluation of metabolic effects in goats fed genetically modified soybean and in their offsprings. Animal 4: 1662–1671.
  39. Mazza R, Soave M, Morlacchini M, Piva G, Marocco A (2005) Assessing the transfer of genetically modified DNA from feed to animal tissues. Transgenic Res 14: 775–784.
  40. Sharma R, Damgaard D, Alexander TW, Dugan ME, Aalhus JL, et al. (2006) Detection of transgenic and endogenous plant DNA in digesta and tissues of sheep and pigs fed Roundup Ready canola meal. J Agric Food Chem 54: 1699–1709.
  41. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.
  42. Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7: 203–214.
  43. Rhoades R, Tanner G (2003) Medical Physiology. Lippincott Williams & Wilkins.
  44. NCBI Sequence Read Archive. URL Accessed 2013 July 1.
  45. EBI European Nucleotide Archive. URL Accessed 2013 July 1.
  46. DDBJ Sequence Read Archive. URL Accessed 2013 July 1.
  47. Katano H, Sato S, Sekizuka T, Kinumaki A, Fukumoto H, et al. (2012) Pathogenic characterization of a cervical lymph node derived from a patient with Kawasaki disease. Int J Clin Exp Pathol 5: 814–823.
  48. Jiang F, Ren J, Chen F, Zhou Y, Xie J, et al. (2012) Noninvasive Fetal Trisomy (NIFTY) test: an advanced noninvasive prenatal diagnosis methodology for fetal autosomal and sex chromosomal aneuploidies. BMC Med Genomics 5: 57.
  49. Limpert E, Stahel W, Abbt M (2001) Log-normal distributions across the sciences: Keys and clues. BioScience 51 (5): 341.
  50. Chase MW, Fay MF, Reveal JL, Soltis DE, Soltis PS, et al. (2009) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society 160: 105.
  51. Chen S, Ge H, Wang X, Pan X, Yao X, et al. (2013) Haplotype-assisted accurate noninvasive fetal whole genome recovery through maternal plasma sequencing. Genome Med 5: 18.