Dinosaur Peptides Suggest Mechanisms of Protein Survival

Eleven collagen peptide sequences recovered from chemical extracts of dinosaur bones were mapped onto molecular models of the vertebrate collagen fibril derived from extant taxa. The dinosaur peptides localized to fibril regions protected by the close packing of collagen molecules, and contained few acidic amino acids. Four peptides mapped to collagen regions crucial for cell-collagen interactions and tissue development. Dinosaur peptides were not represented in more exposed parts of the collagen fibril or regions mediating intermolecular cross-linking. Thus functionally significant regions of collagen fibrils that are physically shielded within the fibril may be preferentially preserved in fossils. These results show empirically that structure-function relationships at the molecular level could contribute to selective preservation in fossilized vertebrate remains across geological time, suggest a ‘preservation motif’, and bolster current concepts linking collagen structure to biological function. This non-random distribution supports the hypothesis that the peptides are produced by the extinct organisms and suggests a chemical mechanism for survival.


Introduction
While it is widely accepted that proteins have the potential to survive significantly longer periods of time than DNA [1], persistence of original bone proteins in fossils at least 68 million years old is controversial [2,3], despite multiple lines of evidence supporting this hypothesis [4,5,6,7,8,9]. Current temporal limits for survival of original biomaterials [10,11] are based upon theoretical kinetics and laboratory experiments designed to simulate protein diagenesis through exposure to harsh conditions (e.g. low pH and high temperature [10,12]) and predict complete degradation of measurable biomolecules in well under a million years if degradation proceeds at simulated rates. Modeled degradation of DNA [13] places temporal limits of ~100,000 years (at a constant 10°C), whereas models of protein degradation (e.g. [1,14]) extend this to a few million years (at a constant 10°C). However, these predictions have been surpassed (e.g. [15]), supporting the suggestion that current models may not be appropriate, in part because they do not consider the molecules in their native state (i.e., folded, closely-packed, cross-linked or, in the case of bone, stabilized by association with the mineral phase [16]). Recovery of what appear to be cells, blood vessels and tissues from multiple fossils from varying ages and depositional settings [4], and protein sequence data from two dinosaurs [5,6,7,9], also suggest that these models may be incomplete. Examining endogenous biomolecules other than DNA avoids synthetic amplification and reduces contamination issues that significantly impeded early ancient DNA research.
Technological improvements in recent years, including soft ionization mass spectrometry, allow increased detection of minute traces of biomolecules that may persist for extended periods of time via crystal encapsulation [17,18], even in the presence of exogenous contamination that precluded earlier forms of analysis such as amino acid composition analyses and stable isotope analyses [13].
The possibility of using information contained in ancient molecules to address contemporary questions of basic biology and ecology is intriguing, and has unexpected potential beyond paleontology. For example, identifying the elements of the collagen fibril most resistant to degradation in fossils may lead to the rational design of collagenous scaffolds with enhanced in vivo longevities to support tendon or bone regeneration in humans. Similarly, identifying naturally occurring modifications on these molecules that contribute to preservation may also shed light on molecular-based disease processes. We show here that molecular preservation is linked to protein function, and discuss how sequences of ancient peptides can test models of molecular function in extant organisms. In addition, we show how models of extant protein function suggest a mechanism for the survival of proteins in exceptionally well preserved fossils.

Results and Discussion
Type I collagen peptides were extracted and sequenced from ~68-million-year-old fossils of Tyrannosaurus rex (Museum of the Rockies [MOR] 1125) [5,7] (Fig. 1). However, despite multiple lines of evidence to support the presence of collagen, including in situ antibody binding, the endogeneity of MOR 1125 peptides was disputed, and the sequences instead were suggested to arise from microbial invasion [19], extant collagens introduced in laboratory experiments [2], or even statistical artifact [3]. Collagen peptide sequences were subsequently derived from a second dinosaur, Brachylophosaurus canadensis (MOR 2598) [9], supported by many of the earlier lines of evidence as well as independent replication of data in multiple labs.
Surprisingly, advances in collagen biology also support the authenticity of the fossil peptides. The molecular structure of collagen favors preservation. The triple-helical arrangement and intra- and intermolecular cross-links confer stability upon this ubiquitous structural molecule [20,21,22,23,24,25]. Additionally, when collagen is surrounded by or adsorbed to mineral surfaces, as in bone, its preservation potential is greatly enhanced (e.g. [18,26,27,28,29,30]). In fibrillar collagens, individual triple-helical molecules aggregate, forming a fibril with a characteristic 67 nm banding pattern that is readily recognized by electron microscopy (Fig. 2) [31,32]. Within each 67 nm wide D-period, segments of neighboring molecules are referred to as monomers 1-5 (Fig. 2), and specific functional regions have been mapped to each monomer using a variety of experimental approaches [33,34,35].
The stability and unique function conferred by the triple-helical structure of collagen have been known for over forty years, but just how molecules assemble into microfibrils to form the massive cable-like fibrils in tissues has been less well understood. However, recent advances in technology have allowed molecular resolution images of type I collagen microfibrils and fibrils [35,36]. This new information, coupled with non-random distribution of collagen functional sequences and mutations [33], has led to the formation of a testable model linking structure to function in this massive protein assemblage. Discrete cell- and matrix-interaction domains have been identified, and collagen-binding ligands that cooperatively carry out fibril functions have been recognized.
We reasoned that the functional role of particular molecular regions may contribute to their preferential resistance to biological degradation throughout the lifetime of an individual organism. This property not only needs to remain highly conserved across species but also may render those regions resistant to degradation in the burial environment. Thus, molecular models for differential functions of collagen fibril domains or sequences may provide a chemical or structural rationale for preservation. We mapped eleven fossil-derived peptide sequences from two dinosaurs, Tyrannosaurus rex and Brachylophosaurus canadensis [7,9,37], on molecular models of extant human and rat collagens [33,34] (Table 1, Figs. 3 and 4).
These peptides represent eight sequences which localize to seven regions of the monomer, and comprise less than fifteen percent of the length of the collagen triple helix. They were non-randomly distributed in several respects (Fig. 3 and Statistical Analyses [see Materials and Methods]). In particular, fossil sequences mapped to regions of the protein partly shielded by tight molecular packing (Fig. 4) [34], which may physically stabilize and protect them from enzymatic degradation, thus contributing to their preservation. Comparing the amino acid compositions of fossil peptides with sequences of the entire human protein for predicted properties such as hydrophobicity, polarity and charge revealed that most fossil peptides were from regions of collagen which contain relatively few acidic residues [38], and eight of the peptides (five sequences) lacked such residues altogether, which would limit their solubility and propensity for proteolytic degradation (Table 1). Also, five peptides mapped to a uniquely hydrophobic fibril region [39]. The results imply that the most stable regions of the protein are those with a more hydrophobic, less acidic nature. That the more exposed, charged regions of collagen with high densities of trypsin cleavage sites yielded fewer fossil peptides suggests their susceptibility to proteolysis in early diagenesis, and supports non-random degradation and preservation patterns for the diverse type I collagen sequence set in fossil bone. It is also interesting to note that perhaps the least stable region, the hydroxyproline-deficient, thermally labile domain located towards the C-terminal end of the molecule [40], is not represented by any of the fossil peptides.
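The acidic-residue comparison described above can be illustrated with a short calculation. The sketch below is not the authors' analysis: it simply counts aspartate (D) and glutamate (E) in one-letter sequences, ignoring hydroxylation and other modifications. The two example strings are invented placeholders, not the dinosaur peptides listed in Table 1, and the function names are hypothetical.

```python
ACIDIC = frozenset("DE")  # aspartate and glutamate, one-letter codes


def acidic_count(sequence: str) -> int:
    """Number of acidic residues (D or E) in a one-letter sequence."""
    return sum(1 for residue in sequence.upper() if residue in ACIDIC)


def acidic_fraction(sequence: str) -> float:
    """Fraction of residues that are acidic; 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return acidic_count(sequence) / len(sequence)


# Hypothetical collagen-like strings, for illustration only.
examples = {
    "toy_peptide_without_acidic": "GPAGPAGPVGPAGAR",
    "toy_peptide_with_acidic": "GEAGPDGPEGPKGDR",
}

for name, seq in examples.items():
    print(name, acidic_count(seq), round(acidic_fraction(seq), 3))
```

Applied to real peptide sequences and to a reference set of peptides of comparable length, the same two functions would give the kind of per-peptide comparison summarized in Table 1.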
All fossil-derived peptides mapped to monomers 2, 3, and 4 on the extant collagen models. The remaining monomers, 1 and 5, are joined across microfibrillar layers by intermolecular cross-links that, while stabilizing the molecule and protecting it from enzymatic attack, may also hinder peptide extraction. In fact, the only position where alpha 1 chain peptides (Peptides 3 and 8) colocalize with an alpha 2 chain peptide (Peptide 11) mapped to the integrin binding site that promotes cell-collagen interactions, angiogenesis, and osteoblast differentiation; its fibril location and association with severe mutations also suggest its crucial nature [33] and hence strong selective pressure for conservation of sequence. One peptide (Peptide 4) mapped to the Matrix Metalloproteinase-1 (MMP-1) cleavage domain, which is crucial for collagen remodeling and is also a site for fibronectin binding. In living tissues, the integrin binding site and MMP-1 cleavage/fibronectin binding sequences are somewhat buried under the surface of the collagen fibril; thus, fibril proteolysis or injury may be needed to render them available for cell-collagen interactions and tissue regeneration [35]. The molecularly "sheltered" environment required to protect crucial biological function may also account for enhanced survival of those protein regions in fossils. Although the majority of the dinosaur peptides are from highly conserved regions of the molecule, both of the alpha 2 chain peptides are highly variable [41,42]. That they are not exclusively from sequences with a high similarity to residues in public databases suggests that the peptides were not identified solely because they derive from highly conserved sequences; thus, the gaps in our model are not simply due to the lack of peptide identification due to divergence from known organisms. Additional preservation potential may be conferred by association with biomineral, especially if some regions of the collagen molecule are more intimately associated with mineral than others. Conversely, the absence of peptide matches elsewhere in the molecule may be due to lack of response to trypsin resulting from unusual post-mortem modifications which may also confer resistance to proteolytic degradation and contribute to preservation over time [20]. Additional collagen sequences may have survived over time, but because of chemical modification or lack of representation in current databases, may not have been recognized by existing search algorithms and therefore not identified in original analyses.
Table 1. Chemical characteristics of fossil peptides. Dinosaur peptide sequences were obtained from the literature and their alpha chain location and amino acid positions on the human collagen model determined. The prevalence of acidic residues (bolded, underlined) in the peptides was lower than predicted for "average" peptides of comparable lengths from pepsinized human collagen [38], implying that regions of collagen with a less acidic nature were preferentially preserved in the fossils. doi:10.1371/journal.pone.0020381.t001
Our results add to the evidence provided by sequence data [5,7,9,37], molecular phylogenetic analyses [8,9], microstructure [4,6,9,43] and immunoreactivity to anti-collagen antibodies [6,9,43], that supports persistence of elements of native collagen fibril structure across geological time in some fossils. Most of the peptide sequences aligned perpendicularly with one or more other sequences on the fibril model, implying that neighboring triple-helical segments, or fragments thereof, may have been preserved en bloc. If supported by further peptide recovery and mapping, this observation would validate current models of collagen monomer arrangement in the fibril [35,44].
Mapping the distribution of fossil collagen peptides observed using mass spectrometry to models of collagen function demonstrates that preservation of fossil-derived collagen sequences concurs with current concepts of collagen biology, and provides a molecular mechanism for the preservation of this protein in fossil bone. Moreover, these findings support the endogenous source and longevity of fossil-derived peptides, because peptides arising from recent contamination are expected to be more concentrated and random in distribution. They would not be expected to be over-represented in regions that so well reflect collagen fibril structure/function relationships in native vertebrate tissue [33,34].
Finally, by showing that functionally crucial protein regions are more stable than others over geologic time, we provide insight into selective pressures constraining the molecular structure, function, and hence sequence, of collagen. Paleoproteomics therefore not only holds significant promise for elucidating evolutionary relationships between extinct and extant organisms, but is potentially useful for enhancing our understanding of protein function in living animals. Also, elucidating molecular functions of extant proteins may help predict proteins or protein regions most likely to preserve in fossils, as has also been shown for the highly conserved and structurally sheltered mineral-binding mid-region of the bone protein osteocalcin [45]. As technologies continue to improve in both sensitivity and resolution, the recovery of additional protein sequences from fossils will be enhanced. The understanding of preferential preservation driven by molecular function may be used to adapt search algorithms to optimize studies of ancient molecules recovered from multiple extinct taxa. The recovery of additional sequences, allowed by these advances, may shed further light on the biology of extracellular matrix superstructures of living organisms.

Materials and Methods

Peptide mapping on collagen models
Human microfibril. The two dimensional expanded schematic of the human collagen fibril D-period used here was as presented previously [33]. Positions of select binding sites and functional domains from the D-period ligand binding and mutation map [33] are indicated by symbols placed next to the relevant sequences on the schematic, and the positions of dinosaur peptide sequences were mapped to homologous human sequences according to their linear distance from the N-terminus of the collagen triple helix.
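Mapping by linear distance from the N-terminus can be sketched as integer arithmetic on residue positions. The example below assumes a D-period of 234 residues, a value taken from the general collagen literature rather than from this text, and by default assumes that numbering starts at the first residue of the triple helix with no telopeptide offset. The published map [33] may use a different registration, and the function names are hypothetical.

```python
D_PERIOD_RESIDUES = 234  # assumed D-period length in residues (not stated in the text)


def monomer_index(position: int, offset: int = 0) -> int:
    """Monomer segment (1-based) containing a 1-based residue position.

    `offset` shifts the numbering, e.g. to account for an N-telopeptide.
    """
    if position < 1:
        raise ValueError("position must be >= 1")
    return (position + offset - 1) // D_PERIOD_RESIDUES + 1


def d_period_coordinate(position: int, offset: int = 0) -> int:
    """Position within the D-period (1..234).

    Segments of neighboring molecules with similar coordinates lie
    side by side in the fibril under this simplified registration.
    """
    if position < 1:
        raise ValueError("position must be >= 1")
    return (position + offset - 1) % D_PERIOD_RESIDUES + 1


# The integrin binding site is defined in the text as residues 502-510.
for pos in (502, 510):
    print(pos, monomer_index(pos), d_period_coordinate(pos))
```

Two peptides from different monomers co-localize across the fibril, in this simplified picture, when their ranges of D-period coordinates overlap.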
Rat microfibril. The three dimensional collagen microfibril model used in this study was composed from the packing structure of rat tendon type I collagen molecules in situ [35,36]. This molecular model was constructed based on the primary sequences of the α1 and α2 chains of rat collagen, and the superhelical parameters were established from crystallographic structure determinations of collagen-like peptides constrained within the lower resolution fiber diffraction molecular envelope [35]. To map the position of the dinosaur peptide sequences on the three-dimensional rat microfibril, solvent-accessible surface calculation and rendering were performed using SPOCK [46] with the default probe size of 0.14 nm to compose a molecular outline. The Cα "worm" traces of relevant portions of individual triple helices were marked (see Fig. 4 for color key) to indicate the positions of peptide sequences from either Tyrannosaurus rex or Brachylophosaurus canadensis, or both (where they co-localized on the collagen molecule). The significant homology between vertebrate collagen protein sequences justifies the approach of localizing functional domains of human type I collagen on the rat type I collagen microfibril.

Statistical Analysis of Peptide Distributions on Collagen
We show the alignment of the eleven dinosaur peptides with homologous sequences on the human collagen map (Fig. 3). By visual inspection, the peptide locations appear to be non-random in several ways. For example, there appears to be co-localization between peptides from the two species on the collagen monomer at three positions. The most interesting finding is that at one of these positions, the alpha 1 chain peptide also co-localizes with its matching alpha 2 chain peptide which occurs at the integrin binding site. Also, all peptides map to Monomers 2, 3, and 4, but not to Monomers 1 and 5. We evaluated the statistical significance of these and other seemingly non-random features through their comparison to a null hypothesis of completely random alignment of the peptides to the collagen map. The null distribution of random alignment was calculated via simulation: a large number (m = 100,000) of simulated maps were generated where the eleven peptides were randomly placed. Each map was generated by sampling eleven random numbers from a discrete uniform distribution (with replacement) among all possible map locations. The uniqueness of a given feature of the peptide alignment to the collagen map was evaluated by calculating the proportion of random maps sharing that feature. We refer to this proportion as the randomization p-value, and deem features with an exceedingly small p-value to be significant (i.e. very few random maps share that feature). We calculated the randomization p-value for nine features of the peptide alignment to the human collagen map. In calculating our threshold for declaring significance, we must account for the fact that we are performing multiple tests (for nine different features). We use the conservative Bonferroni correction to determine our significance threshold, which divides the nominal significance level of 0.05 by the number of tests performed. Thus, our p-value threshold for declaring significance was 0.05/9 ≈ 0.0056.
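The randomization procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the map length (1014 positions), the uniform peptide length (15 residues), and all function names are assumptions made for the example. The feature tested (at least three peptides overlapping residues 502-510) is only one possible reading of the integrin-site feature, so the printed value is not expected to reproduce the published p-values.

```python
import random


def random_map(peptide_lengths, map_length, rng):
    """Place each peptide at a uniformly random start (sampling with replacement)."""
    placements = []
    for length in peptide_lengths:
        start = rng.randint(1, map_length - length + 1)  # both bounds inclusive
        placements.append((start, start + length - 1))
    return placements


def count_overlapping(placements, site_start, site_end):
    """Number of placed peptides that overlap the closed interval [site_start, site_end]."""
    return sum(1 for start, end in placements
               if start <= site_end and end >= site_start)


def randomization_p_value(peptide_lengths, map_length, has_feature,
                          n_sim=100_000, seed=0):
    """Proportion of simulated random maps that share the feature of interest."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_sim)
               if has_feature(random_map(peptide_lengths, map_length, rng)))
    return hits / n_sim


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    lengths = [15] * 11   # hypothetical: eleven peptides of 15 residues each
    map_length = 1014     # hypothetical number of positions on the map

    def at_least_three_at_site(placements):
        return count_overlapping(placements, 502, 510) >= 3

    p = randomization_p_value(lengths, map_length, at_least_three_at_site)
    threshold = bonferroni_threshold(0.05, 9)
    print(f"p = {p:.4f}, threshold = {threshold:.4f}, significant = {p < threshold}")
```

Passing the feature as a predicate keeps the simulation loop identical for all nine features; only the function that inspects a simulated map changes.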
As detailed below, two of the nine features were found to be significantly non-random by this criterion and seven were found to not be significant:

Significant Features
Significant Feature #1. Localization to the integrin (cell) binding site: p-value = 0.0024. Details: Three of eleven peptides (two unique sequences) were observed to overlap with the integrin binding site of the fibril, which we define as comprising residues 502-510.
Significant Feature #2. Co-localization between the two species: p-value = 0.0034. Details: Three pairs of peptides (three unique sequences) from the two species co-localized on the collagen monomer.

Non-Significant Features
Non-Significant Feature #1. Overlap zone vs. gap zone: p-value = 0.022. Details: Ten of eleven peptides (seven unique sequences) localized to the overlap zone.
Non-Significant Feature #2. Cell interaction domain: p-value = 0.212. Details: Three of eleven peptides (two unique sequences) localized to the cell interaction domain.
Non-Significant Feature #4. Co-localization of peptides: p-value = 0.036. Details: Four of the eleven peptides (four unique sequences) did not overlap with any other peptides.
Non-Significant Feature #5. Overlap with cross-links: p-value = 0.097. Details: Five of the eleven peptides (three unique sequences) overlapped with the intermolecular cross-links.
Non-Significant Feature #6. Overlap with any functional domain: p-value = 0.014. Details: Eight of eleven peptides (five unique sequences) co-localized with at least one of the following functional domains: the central integrin binding site; the MMP-1 cleavage site; decorin ligation sequences; and sequences overlapping the intermolecular cross-links or aligning with them across the fibril.
Non-Significant Feature #7. Overlap with the master control region: p-value = 0.018. Details: Ten of eleven peptides (seven unique sequences) occupied the master control region, a fibril zone where most of the collagen fibril's crucial functional sequences are located.