The quest for a generic bird target to detect the presence of bird in food products and considerations for paleoprotein analysis


Abstract

It can be important for consumers to know whether food products contain animal material and, if so, of which species. Food products with animal material as an ingredient often contain collagen type 1. LC-MS/MS (Liquid Chromatography–tandem Mass Spectrometry) was applied as a technique to detect bird material generically. Unlike for fish, for example, which have experienced longer divergence times, it is still possible to find generic LC-MS targets for avian type 1 collagen. After theoretical target selection using 83 collagen 1α2 bird sequences from 33 orders and construction of a common ancestor sequence of birds, experimental evidence was provided by analyzing extracts from 10 extant bird species. Two suitable options were identified. The combination of VGPIGPAGNR and VGPIGAAGNR (pheasant only) covers all investigated birds and was not found in other species. The peptide EGPVGFpGADGR covers all investigated birds, but also occurs in several species of crocodiles and turtles. The presence of the generic peptide (combination) was confirmed in food products, proving the principle, and it can therefore be used to detect the presence of bird. Furthermore, it is shown how the use of constructed ancestor sequences could benefit the field of paleoproteomics, in the interpretation of collagen MS/MS spectra of ancient species. Our theoretical analysis and assessment of reported Brachylophosaurus canadensis collagen 1α2 MS/MS data provided support for several previous peptide sequence assignments, but we also propose that our constructed ancestral bird sequence GPpGESGAVGPAGPIGSR may fit the MS/MS data better than the original assignment GLPGESGAVGPAGPpGSR.


Introduction

Collagen type 1 forms the structural and mechanical scaffolding of skin, bones, tendons, blood vessel walls, cornea and other connective tissues. It is the most abundant protein in vertebrates [1] and consists of triple helices [2]. Usually the helices are heterotrimers of two collagen 1α1 chains and one collagen 1α2 chain [3], although skin type 1 collagens from several bony fish additionally contain a 1α3 chain in the heterotrimer along with 1α1 and 1α2 [4–7]. Collagen analysis is performed in several fields, e.g. in medical research [8,9], food chemistry [10] and paleoprotein analysis [11–14]. Food products with animal material as an ingredient often contain collagen type 1, either due to its ubiquity and high abundance in (extracts from) animal tissues and/or by the intended addition of gelatin, which is partly hydrolyzed collagen often obtained from skin or bone [15,16]. For religious or lifestyle reasons it may be important for consumers to know whether food products contain animal material and, if so, of which species. To determine animal species, suitable and reliable analytical techniques should be applied, targeting informational biomolecules such as DNA or protein [17]. In order to find generic targets to detect the presence of bird in food products, the main goal of this study, it is necessary to perform a detailed analysis of their molecular evolution in birds and to construct ancestral sequences. An ancestor sequence of avian collagen represents collagen from the past and, compared to sequences of extant birds, will bear a greater resemblance to collagen extracted from fossilized bones of ancient birds and non-avian dinosaur species. Therefore, the approach presented in this study is also relevant to paleontologists investigating the sequence of collagen in fossilized tissues.

The treatment conditions in the gelatin production process destroy most of the DNA, severely reducing the sensitivity of DNA-based methods or leading to the occurrence of false negative results [10]. The performance of ligand binding assays (LBA) depends on the 3D structure of analyte proteins [18], which is affected by gelatin production conditions or by food processing in general. Responsible use of LBA in (processed) food analysis may require knowledge of reagent-analyte interactions at the molecular level to assess the suitability of an assay, especially with regard to its selectivity and the potential change in affinity for analyte proteins that have a changed 3D structure after food processing. It is preferable to use bottom-up protein LC-MS/MS (Liquid Chromatography–tandem Mass Spectrometry) as a detection technique over DNA-techniques and LBA, because the primary protein structures often remain largely intact during food processing. This is beneficial for reliable detection, especially when smaller target peptides are cleaved from the primary structure. Moreover, the largely intact primary structures of collagen contribute to the intrinsic properties of gelatin, i.e. the enabling of gelatinization. Targets that differ in a single amino acid can easily be distinguished using LC-MS/MS by their retention time, the m/z of precursor ions and/or the m/z of product ions after fragmentation [19]. Several strategies can be applied to detect animal species: one strategy is to select unique targets per species, which are absent in other species; another approach is to find generic targets for a whole group of animal species [20], such as birds. The latter approach was followed in this study. The goal of the study was to identify generic bird targets to add these as modules to the TrustGel method [21].

Phylogenetic studies indicate an early emergence of the three fibrillar collagen clades A to C, before the eumetazoan radiation [22]. Therefore collagen type 1 chains, belonging to the A clade, are present in all bird species and provide a good basis for finding a generic bird target. Eumetazoan radiation occurred approximately 530 million years ago [23], well before the evolution of modern birds. Approximately 165–150 million years ago, birds evolved from theropod dinosaurs and continuously served as a vehicle for protein evolution. The mass extinction event at the end of the Cretaceous, 66 million years ago, decimated the number of bird species and lineages. After that, birds explosively diversified into more than 10,000 species today [24–26]. The closest extant relatives to birds are, in descending order, crocodiles, turtles and lizards (including snakes and geckos) [27].

Previously, we developed method modules for the individual detection of quantitative porcine and bovine targets [21] and for several fish species [7], amongst others. Generic bird targets should cover as many bird species as possible, but should be different from other animal species. An advantage of a generic target is that fewer transitions need to be added to a targeted, quantitative LC-MS method compared to when species are detected individually, leaving more room for detection of other targets in the same method. A disadvantage is that it cannot be known whether a generic target truly covers an entire group when genetic information is not available for every species in the group and for a sufficient number of individuals, to cover the variation within a species. However, these aspects will also impair the detection of individual species with unique targets. The theoretical target selection was performed using 83 bird sequences, including the bird species most important in food products, several sequences from related animal groups and database searches. Experimental support was provided by analyzing 10 bird species using non-targeted LC-MS/MS. Targets were selected from collagen 1α2, due to 1) the high abundance of collagen type 1 in general and because 2) less (reliable) genetic information could be retrieved from databases for avian collagen 1α1 than for collagen 1α2. Using collagen 1α1 as target source would reduce the quality of the theoretical target selection process, compared to collagen 1α2. Additionally, gene loss has not been reported for collagen 1α2 in birds, in contrast to collagen 1α1 [28]. Ultimately, one generic target and one target combination met the selection criteria. Their presence was experimentally confirmed in extracts from 10 extant bird species, in chicken soup and in chicken broth, as proof of principle for food products. 
Finally, it was shown how our combined food chemistry and molecular evolution approach could benefit paleoprotein analysis, in the interpretation of collagen MS/MS spectra from ancient species.

Materials and methods

Bird products (ostrich, goose, duck, turkey, chicken, pheasant, guinea fowl, pigeon, partridge, quail, various cuts) were purchased from local supermarkets. Collagen from these birds was extracted by placing several grams per product in milliQ water in an oven at 100°C for 2 days. After centrifugation, 4 ml extract was added to 3 ml pentane, shaken and centrifuged. After 1 hour at < -70°C the pentane layer was removed. Aliquots were digested with trypsin without reduction or alkylation because the collagen GXY domain, from which the peptide targets of interest were cleaved, contains no disulfide bridges. Chicken soup was homemade and was sampled by taking an aliquot of the gelatinous part, after the soup had cooled. Chicken broth (brand A (1.3% chicken meat powder) and brand B (3.1% chicken meat powder)) and beef broth (2.3% beef extract) tablets were purchased from local supermarkets. Tablets were dissolved in 100 mM ammonium bicarbonate at 95°C for 3 hours. After centrifugation, aliquots were digested with trypsin. The beef broth sample served as a negative control.

Samples were analyzed using a combination of a UHPLC (Ultimate 3000, Dionex) and a Q-Exactive mass spectrometer (ThermoElectron). Separation was achieved on an Acquity HSS T3 column (2.1 × 100 mm, 1.8 μm, Waters, Milford, PA, USA) at a temperature of 40°C with an injection volume of 10 μl. Mobile phases consisted of milliQ water (A) and acetonitrile (B), both containing 0.1% formic acid. A binary gradient from 2% to 30% B was applied at a flow rate of 0.5 ml min⁻¹, followed by a column wash and equilibration. The total run time was 18 minutes. All peptides were analyzed using electrospray ionization in positive mode (HESI source) using a full-scan data-dependent method with a range of m/z 200–2000. Other settings were: resolution (35,000), spray voltage (3.0 kV), capillary temperature (320°C), heater temperature (350°C), S-Lens RF Level (50 V), AGC target (1e6), and maximum IT (150 ms). The top 5 ions were subjected to data-dependent scans at normalized collision energies of 15, 25 and 35. XCalibur software version 3 (ThermoScientific) was used for data acquisition. Data analysis was performed manually.

Sequences of 83 avian collagen 1α2 mRNA or cDNA entries were obtained from the NCBI nucleotide database, accessed in September 2021 (see Table 1). We preferred to use database nucleotide sequences, due to their higher reliability compared to protein sequences. For comparison, crocodile, turtle, snake, mammalian, amphibian and fish collagen 1α2 sequences were added to the set, as well as a constructed common ancestral bird collagen 1α2 sequence. The sequences were translated to protein using Microsoft Excel version 2103. Collagen 1α2 sequences were only included in the data file if the GXY domain was 1014 codons in length (excluding the subsequent GGG triplet) and if there was a glycine codon in each first GXY position, to promote the inclusion of high quality sequences [29]. The sequence of Dromaius novaehollandiae contained missing information, namely GGK at codon position 781 and CCY at position 1001. These codons were adapted to GGT and CCT, respectively, as these were the majority codons at the indicated positions. The bird species in the data set are from 33 orders. A coding DNA estimation of the collagen 1α2 GXY domain of the common bird ancestor was composed using the sequences of one species per order, marked with an asterisk in Table 1. Although there are differences in age between orders, the same weight was given to each order to calculate the estimation. Of the 1014 positions, 1007 had majority codons (present in 17 or more of the 33 species) that were automatically selected for the ancestral sequence. There were 5 positions with a most abundant non-majority codon, which was selected. Finally, there were 2 positions that exhibited 2 non-majority codons of equal abundance. At position 611 (15x CCT, 15x CCC, 1x CCA and 2x GCC) CCC was selected for the ancestral sequence, because of its slightly higher probability compared to CCT when only single nucleotide changes are considered. 
At position 863 (16x AAC, 16x AGC and 1x AGT) AGC was selected for the same reason. The constructed common ancestral sequence of birds is reported in Fig 1 and was used to assess the genericity and suitability of bird collagen 1α2 targets.
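The codon selection described above can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure (the study used Microsoft Excel): the function names are hypothetical, and the tie-break is one possible reading of "slightly higher probability when only single nucleotide changes are considered", namely preferring the tied codon that lies within one nucleotide change of the largest number of observed codons.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_codon(codons):
    """Pick one codon for a single alignment column (one codon per order)."""
    counts = Counter(codons)
    top = max(counts.values())
    tied = sorted(c for c, n in counts.items() if n == top)
    if len(tied) == 1:
        return tied[0]  # most abundant codon, whether or not it is a majority

    # Assumed tie-break (our interpretation, not stated in this form in the
    # text): count the observed codons reachable from the candidate by at
    # most one nucleotide change and keep the best-supported candidate.
    def support(candidate):
        return sum(n for c, n in counts.items() if hamming(c, candidate) <= 1)

    return max(tied, key=support)


def ancestral_cds(sequences):
    """Column-wise consensus over aligned, equal-length coding sequences."""
    n_codons = len(sequences[0]) // 3
    columns = ([s[3 * i:3 * i + 3] for s in sequences] for i in range(n_codons))
    return "".join(consensus_codon(col) for col in columns)


# The two tied positions reported in the text.
pos_611 = ["CCT"] * 15 + ["CCC"] * 15 + ["CCA"] + ["GCC"] * 2
pos_863 = ["AAC"] * 16 + ["AGC"] * 16 + ["AGT"]
print(consensus_codon(pos_611), consensus_codon(pos_863))  # CCC AGC
```

Under this reading the sketch reproduces both reported choices: GCC is one change away from CCC but two from CCT, and AGT is one change away from AGC but two from AAC.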

Fig 1. Constructed coding DNA sequence of the common bird ancestor.

Constructed coding DNA sequence of the avian common ancestral collagen 1α2 GXY domain. Glycine codons in the 1st GXY position are highlighted in light green and other glycine codons in dark green.

Table 1. Overview of collagen 1α2 data sources.

The species marked with an asterisk were included in the common bird ancestor estimation.

Results and discussion

Theoretical target selection

The set of 83 bird collagen 1α2 sequences that met the selection criteria were visualized in codon, codon group and amino acid usage tables [29], to aid in the identification of a generic peptide (see S1 File). Together with several more distant species, the similarities of the birds’ cDNA sequences are shown in Fig 2, by the number of mutual nucleotide differences. Additionally, it was calculated which amino acids were fully conserved, regarding the 83 species. After construction of the common bird ancestral cDNA sequence, see Fig 1, which was also included in the comparison of Fig 2, the ancestral sequence was translated to protein and in silico digested with trypsin, resulting in the formation of 79 peptides containing part of the GXY domain, as summarized in Table 2.
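As a rough illustration of the in silico digestion step, the sketch below applies the common trypsin rule (cleavage C-terminal to K or R, unless the next residue is P). The function name is hypothetical and the test fragment is simply the two candidate peptides from positions 375–396 joined together, with hydroxyproline written as plain P; it is not the full ancestral sequence.

```python
def tryptic_digest(protein):
    """Split a one-letter protein sequence after K/R, except before P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in "KR" and (is_last or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder without K/R
    return peptides


# Positions 375-396 of the GXY domain as described in the text.
fragment = "EGPVGFPGADGR" + "VGPIGPAGNR"
print(tryptic_digest(fragment))  # ['EGPVGFPGADGR', 'VGPIGPAGNR']
```

Missed cleavages and post-translational modifications are ignored here; a real workflow would normally consider both.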

Fig 2. Nucleotide differences between birds and other species.

Distance table of 90 compared animal species, including 83 bird species and the constructed common ancestor of birds, with respect to the collagen 1α2 GXY domain, indicating, from green via yellow to red, the increasing amount of nucleotide differences.
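A distance table of this kind can be approximated by counting mismatched nucleotides between every pair of aligned sequences. The sketch below assumes gap-free, equal-length coding sequences (as implied by the fixed 1014-codon GXY domain); the function names and the three short sequences are made up for illustration.

```python
def nucleotide_differences(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_table(named_sequences):
    """Pairwise difference counts, keyed by (name, name)."""
    names = list(named_sequences)
    return {
        (p, q): nucleotide_differences(named_sequences[p], named_sequences[q])
        for p in names
        for q in names
    }


# Toy input, not real collagen sequences.
toy = {"bird_a": "GGTCCTGCT", "bird_b": "GGTCCCGCT", "reptile": "GGACCCGCA"}
table = distance_table(toy)
print(table[("bird_a", "bird_b")], table[("bird_a", "reptile")])  # 1 3
```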

Table 2. Results of the theoretical assessment of tryptic peptides.

The peptides were derived from the constructed common bird ancestor collagen 1α2 GXY domain. Assignments are explained in the main text.

Analogue human collagen 1α2 (Uniprot entry P08123) contains 11 amino acids and 1 tryptic cleavage site N-terminal and 15 amino acids and 1 tryptic cleavage site C-terminal of the GXY domain. The selection of generic tryptic targets for birds was performed in two rounds. The first round focused mainly on genericity of the target, unambiguity (meaning the target is present in a single form) and analyzability; the second round was aimed at uniqueness versus non-bird species. The following criteria were applied during the first selection round:

  A. The peptide target should contain a maximum of 1 N, Q or M residue. Whereas full target unambiguity is highly desirable for quantitative targets, for qualitative targets it may be allowed that the sequence contains an amino acid that can be partially modified, e.g. N or Q (deamidation) or M (oxidation). We chose to avoid peptide targets containing more than one amino acid that could be partially modified, due to the resulting increase in the number of possible forms to be monitored.
  B. The length of the peptide should be at least 7 residues to provide sufficient uniqueness versus species other than birds.
  C. The peptide target should contain a maximum of 1 amino acid that has not been fully conserved, with respect to the 83 bird species.
  D. A maximum of 2 amino acid variations was allowed in the single non-conserved amino acid (see criterion C), again to limit the number of possible forms to be monitored and to reduce the probability that bird species that are not part of the set of 83 would show unexplored variations.
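The four first-round criteria can be expressed as a small screening function. The sketch below is a hedged illustration: the function name is hypothetical, the criteria are checked in the order listed and only the first failing one is reported, and the most common variant stands in for the ancestral peptide as the reference (an assumption; the study used the constructed ancestor). Input is one aligned peptide per species.

```python
from collections import Counter


def assign(variants):
    """Return an assignment label for aligned peptide variants (one per species)."""
    reference = Counter(variants).most_common(1)[0][0]

    # Partially modifiable residues: at most one N, Q or M.
    n_modifiable = sum(reference.count(aa) for aa in "NQM")
    if n_modifiable > 1:
        return f"A{n_modifiable}"

    # Minimum length of 7 residues.
    if len(reference) < 7:
        return f"B{len(reference)}"

    # Number of alignment columns that are not fully conserved.
    variable = [set(col) for col in zip(*variants) if len(set(col)) > 1]
    if len(variable) > 1:
        return f"C{len(variable)}"

    # One variable column: report how many residues occur there
    # (D2 passes the first round, D3 and higher do not).
    if len(variable) == 1:
        return f"D{len(variable[0])}"

    return "E"  # fully conserved


# Variant counts as reported in the text for the 83 bird species.
print(assign(["VGPIGPAGNR"] * 82 + ["VGPIGAAGNR"]))                      # D2
print(assign(["EGPVGFpGADGR"] * 83))                                     # E
print(assign(["VGApGPAGAR"] * 60 + ["IGApGPAGAR"] * 22 + ["IGGpGPAGAR"]))  # C2
```

The three printed labels agree with the assignments discussed in the text; cleavage-site conservation and the second-round database search are outside the scope of this sketch.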

The results of the target selection are summarized in Table 2. For each of the 79 tryptic peptides of the constructed common bird ancestral sequence, the assignment is given in the third column. The assignments consist of a letter A to D, indicating which of the aforementioned criteria was not met (not exhaustive) and a number. For A assignments the number indicates the amount of N, Q or M amino acids in the sequence, for B assignments it indicates the peptide length, for C assignments the number of non-conserved residues in the peptide and for D assignments the number of amino acid variations in the single non-conserved residue. Fully conserved peptides were designated E. Only D2 and E assignments entered the second round of selection. It should be noted that all D2 and E peptides also exhibited fully conserved preceding cleavage sites and absence of P in the amino acid C-terminal to the peptide, which is essential for their release from the primary structure by trypsin digestion. There were a total of 4 D2 and 6 E candidates. In the second round, the candidate peptides were subjected to protein blast search. The applied criterion was that no 100% hits should be obtained with animal species other than bird species. It was expected that hits with non-avian species would be obtained, especially for the E candidates. Hits with many other species were obtained for 5 out of 6 E candidates. However, the E candidate EGPVGFpGADGR had a limited number of hits, only with several species of crocodiles and turtles. It should be noted that p represents hydroxyproline which is often present at the third GXY position. For 3 out of 4 D2 candidates hits with many non-avian species were obtained, while there were no hits for the peptide VGPIGPAGNR. The only two remaining candidates represent positions 375–386 and 387–396 of the GXY domain. Fig 3 illustrates why this part of the sequence is suitable as generic bird target. 
Positions 375–396 of the bird sequence are more similar to the sequences of crocodiles and turtles than to the sequence of other reported species, as expected from the evolutionary relations. The peptide EGPVGFpGADGR is exactly the same between the constructed common ancestor of birds and the crocodile and turtle sequences shown, but differs from the next closest species groups of snakes, lizards and geckos as well as from the amphibian and fish species. When it can be excluded that a sample contains crocodile or turtle material, by tracing the product’s origin, the peptide EGPVGFpGADGR is suitable as generic bird peptide, especially as it is fully conserved considering the 83 bird species listed in Table 1. Ideally, a generic bird peptide should not occur in crocodile and turtle. As can be seen from Table 2, no other fully conserved E peptide is available with respect to the 83 investigated birds, but the VGPIGPAGNR peptide is the most suitable D2 candidate. First, the constructed common ancestral peptide of birds differs crucially from crocodiles and turtles in that there is an N residue at the 9th position, whereas an A residue occurs in the crocodile and turtle species. Second, there is an I residue at the 4th position, which is often T in crocodiles and turtles. As mentioned previously, VGPIGPAGNR is not fully conserved as it occurs in 82 of the 83 bird species investigated. Only in Phasianus colchicus (common pheasant) the P residue at the 6th position has changed to an A residue, resulting in the sequence VGPIGAAGNR. The combination of VGPIGPAGNR and VGPIGAAGNR can therefore be used to generically investigate the presence of birds, as it covers the whole group and differs from other animal groups.

Fig 3. Species comparison of positions 374–397 of the collagen 1α2 GXY domain.

The left sequence is from the constructed common ancestor of birds and the other sequences are from several reptile, frog and fish species, for comparison. On the left of each cell the codon is mentioned and on the right of each cell the codon group [21], which corresponds to amino acid (without the subscript). Compared to the common ancestor of birds, yellow cells correspond to amino acid differences and orange cells to codon group differences that do not lead to amino acid differences. Tryptic cleavage sites are presented as thick lines between cells.

Experimental assessment

Extracts from ostrich, goose, duck, turkey, chicken, pheasant, guinea fowl, pigeon, partridge and quail were part of the experimental data set. In Fig 4 MS/MS spectra are presented of digested chicken and pheasant extract, showing both versions of the D2 candidate peptide. VGPIGAAGNR was determined experimentally in pheasant, while VGPIGPAGNR was observed in all other analyzed birds, see the result summary in Table 3. All relevant chromatograms and MS/MS spectra are reported in S2 File. A complicating aspect of the combination VGPIGPAGNR/VGPIGAAGNR is that it contains asparagine, which can be deamidated [30], especially in highly processed foods, which negatively affects the unambiguity. Therefore, it is necessary to also monitor the deamidated forms, which are 1 Da higher in mass, when examining the presence of bird material in processed food samples, bringing the total number of forms to be monitored to 4. An advantage of VGPIGPAGNR/VGPIGAAGNR is that it is shorter than EGPVGFpGADGR, theoretically making the combination more suitable to detect collagen if it were present in a more hydrolyzed form. It should be emphasized that the peptides were selected from 83 species classified into 33 orders. Since it is not yet possible to obtain a complete overview of sequences of birds and other species, it is advisable to monitor EGPVGFpGADGR in addition to VGPIGPAGNR/VGPIGAAGNR when the presence of birds is investigated. In addition, (functionally irrelevant) amino acid changes may have occurred in individuals within a species or be fixed in any bird species, exemplified by the pheasant change to VGPIGAAGNR, which will diminish the suitability of a generic target. On the other hand, (processed) food products often contain material from numerous individuals, which increases the suitability of a generic target.


Fig 4. MS/MS spectra of tryptic collagen 1α2 peptides from a) chicken (precursor m/z 469.26) and b) pheasant extract (precursor m/z 456.26). The chicken peptide VGPIGPAGNR is nearly generic for birds.

Table 3. Results of the experimental assessment of generic bird peptides.

The presence of VGPIGPAGNR / VGPIGAAGNR (and/or deamidated) and EGPVGFpGADGR in several bird samples and negative control beef broth is indicated.

After having determined the presence of EGPVGFpGADGR and VGPIGPAGNR/VGPIGAAGNR in the extracts of 10 different bird species, we decided to investigate the presence of the same peptides in processed food products: homemade chicken soup, chicken broth tablets and a beef broth tablet as negative control. The results of these analyses are also summarized in Table 3. As expected, VGPIGPAGNR and EGPVGFpGADGR were detected in the chicken food products, but not in the bovine food product. In Fig 5 chromatograms are presented of a chicken and beef broth sample to illustrate this. The chicken samples also showed a deamidated form of VGPIGPAGNR. Deamidated VGPIGPAGNR ([M+2H]²⁺ ions, m/z 469.756) has the same nominal mass as the 2nd isotope of VGPIGPAGNR ([M+2H]²⁺ ions, m/z 469.766). However, the molecular species are separated by their LC retention time, their exact mass and part of the fragment ions and thus can be easily distinguished. The bird peptides were clearly absent in beef broth. Instead, the bovine peptides GETGPAGPAGPIGPVGAR and GIpGEFGLpGPAGAR were detected, confirming the presence of bovine collagen 1α1 and 1α2 in the beef broth negative control sample. Finally, an MS/MS spectrum of EGPVGFpGADGR ([M+2H]²⁺ ions, m/z 587.778) obtained from chicken broth brand A is presented in Fig 6. The fragment ions were as expected: mainly b and y type ions, including water loss related to the N-terminal glutamic acid [31]. All relevant chromatograms and MS/MS spectra are reported in S2 File.
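The precursor m/z values quoted above can be checked from standard monoisotopic residue masses. The sketch below is illustrative: the function name is hypothetical, only the residues needed here are tabulated, lowercase p denotes hydroxyproline (proline plus one oxygen), and deamidation is modelled as a shift of +0.98402 Da per converted asparagine.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "V": 99.06841,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "R": 156.10111,
    "p": 113.04768,  # hydroxyproline
}
WATER = 18.01056
PROTON = 1.007276
DEAMIDATION = 0.98402   # N -> D mass shift
ISOTOPE_SPACING = 1.00335  # 13C minus 12C


def precursor_mz(peptide, charge=2, deamidations=0):
    """m/z of the protonated peptide ion, optionally with deamidated residues."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    neutral += deamidations * DEAMIDATION
    return (neutral + charge * PROTON) / charge


chicken = precursor_mz("VGPIGPAGNR")
print(f"{chicken:.3f}")                                    # 469.264
print(f"{precursor_mz('VGPIGAAGNR'):.3f}")                 # 456.256
print(f"{precursor_mz('VGPIGPAGNR', deamidations=1):.3f}")  # 469.756
print(f"{chicken + ISOTOPE_SPACING / 2:.3f}")              # 469.766
print(f"{precursor_mz('EGPVGFpGADGR'):.3f}")               # 587.778
```

The deamidated form and the second isotope peak differ by only about 0.01 m/z units at charge 2, which is why exact mass, retention time and fragment ions are all used to tell them apart.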

Fig 5. Extracted chromatograms of bird targets.

Extracted chromatograms showing the signals for a) VGPIGPAGNR (2nd isotope) and deamidated VGPIGPAGNR (m/z 469.75 to 469.77) and b) EGPVGFpGADGR (m/z 587.775 to 587.785) in chicken (green lines, brand B) and beef (black lines) broth tablets.

Fig 6. MS/MS spectrum of EGPVGFpGADGR.

MS/MS spectrum of the tryptic collagen 1α2 peptide EGPVGFpGADGR (precursor m/z 587.78), which is generic for birds regarding the investigated bird species, but also occurs in crocodile and turtle species. The spectrum was obtained from chicken broth (brand A).

Target coverage

Depending on the level of required genericity and uniqueness versus other animal species, it can be considered to use other peptides from Table 2 during an experiment. As an example the C2-assigned VGApGPAGAR is discussed. The peptide contains 2 non-conserved amino acids, regarding the 83 investigated bird species, both of which exhibit two amino acid variations. The amino acid in the 1st position is V in 60 species and I in 23 species. The amino acid in the 3rd position is A in 82 species and G in 1 species, giving a total of 3 forms. The main form VGApGPAGAR occurs in 60 species, including duck; the second most abundant form IGApGPAGAR occurs in 22 species, including chicken. Finally only Phaethon lepturus or white-tailed tropicbird contains IGGpGPAGAR. The combination VGApGPAGAR/IGApGPAGAR is sufficient to analyze chicken in duck and vice versa when it can be assumed that there are no other species present. When other (bird) species might be present, this combination is not suitable. Moreover, VGApGPAGAR provides hits with many non-bird species (e.g. rat and mouse) and therefore the peptide is not suitable to generically investigate the presence of bird material in food products. In all cases, peptide targets should be assessed regarding the required genericity and uniqueness, in relation to the goal of the study.

The peptide combination VGApGPAGAR/IGApGPAGAR is a clear example of a protein sequence part that could lead to confusing results when subjected to phylogenetic analysis. In a previous study it was established that changes in the GXY domain of type 1 collagens appear to be mainly the result of genetic drift and that back changes in a species’ lineage can also occur within a functionally restricted space [29]. Whilst most of the bird orders investigated in the present study exhibited either the VGApGPAGAR or IGApGPAGAR form, the orders passeriformes, galliformes, apodiformes and anseriformes showed both forms within the species set. This could indicate that the two peptide forms have interchanged more than once during evolution and that the direction has not always been purely divergent. Alternatively, the peptide may not have been fully fixed to either form when the mentioned orders split off, which may persist even in the present, also for single species. Yet, fixation of a mutation is never an end result, exemplified by the white-tailed tropicbird sequence IGGpGPAGAR showing further divergence originating from a different position in the peptide.

Collagen past, present and future

We presented several options to generically detect bird collagen 1α2, based on the current variation in the protein sequence. The divergence process, based mainly on genetic drift [29], will proceed in the future and, therefore, it is expected that more and more bird species will not contain the generic peptides presented in this study at some point in the future. Species will inevitably drift further apart, although this process is very slow. For example, in humans (including the intermediate ancestors to a constructed common Euarchontoglires ancestor) an average of 1.1 nucleotide changes appear to have been fixed in the collagen 1α1 GXY domain (3042 nucleotides) per million years. This amounts to 3.5 × 10⁻¹⁰ changes per nucleotide per year, excluding back changes [29]. Although the variation in collagen sequences within populations can be high [32], it is very rare for variations to become fixed. Moreover, only part of the nucleotide changes result in amino acid changes, which can be detected by protein LC-MS. The factor between nucleotide and amino acid changes is not constant, reducing the suitability of protein sequences for quantitative assessment of evolutionary relations [29]. Although collagen sequence changes in the more recent past appear to mainly have been governed by genetic drift, it is conceivable that future changes again will be more influenced by selective pressure, e.g. due to a change in external conditions. Such a change would have to be quite drastic to really affect the required collagen properties for tissue structure. Besides functional and informational restriction, the maintenance of code robustness [33] may also play a role during divergence, exemplified by codon usage bias structures. Overall, the current divergence status of bird collagen 1α2 makes it possible to apply generic detection using protein LC-MS. 
For fish species, however, it was observed that it does not seem feasible to select a comprehensive generic collagen type 1 target, due to the much longer divergence times [7]. The selection of generic peptides for fish species is still possible, but at a lower level, e.g. for fish families. Unfortunately, the taxonomy nomenclature of species types, such as birds or fish, is quite confusing because the organization levels “orders” and “families” do not represent similar divergence times for birds and fish. This effect is especially visible in protein domains that are mainly governed by genetic drift, but can be obscured in domains under high selective pressure.

To aid in finding candidate targets, a bird ancestral sequence was constructed, which estimates the collagen 1α2 GXY sequence of birds in the past, from which the present sequences have diverged. The constructed ancestral sequence may also be useful for identification of collagen in fossilized tissues of birds and related species using paleoprotein analysis [34] as it has been observed that collagens can be preserved for millions of years [11,12]. In a previous study [13], five collagen 1α2 peptide sequences were reported for a specimen of Brachylophosaurus canadensis (age 80 million years), a hadrosaur species, see Table 4.

Table 4. Evaluation of reported Brachylophosaurus canadensis collagen 1α2 peptides.

The corresponding sequences from the constructed common ancestor of birds are in the right column. Amino acid differences are highlighted in yellow.

Three of the five peptide sequences are exactly the same between Brachylophosaurus canadensis and the constructed common bird ancestor, and for one peptide there was a single threonine-alanine difference. The bottom peptide from Table 4 exhibited two differences, leucine-proline and proline-isoleucine. These types of changes are not unexpected in collagen 1α2. In a previous study, we found that, when amino acid changes occur, a selection of codon groups, such as A, P, V, T, S1, and I, is predominantly involved in changes between closely related 1α2 collagens [35], as part of a larger change infrastructure. Leucines are slightly less involved. Again, hydroxyproline is often present at the third GXY position instead of proline. Therefore, we further investigated the reported MS/MS spectrum for Brachylophosaurus canadensis GLPGESGAVGPAGPpGSR [12]. It was deduced that, depending on the obtained resolution, our constructed ancestral bird sequence GPpGESGAVGPAGPIGSR (which has exactly the same mass as the reported GLPGESGAVGPAGPpGSR) may fit the MS/MS data better than the original assignment, as it would explain the ions observed at nominal m/z 1424. These ions were assigned as “Potentially co-eluting contaminating ion” but could be assigned as y₁₆⁺ ions of GPpGESGAVGPAGPIGSR. The unassigned peak at nominal m/z 712 could then represent y₁₆²⁺ ions of GPpGESGAVGPAGPIGSR. This finding indicates that the construction of ancestral sequences using sequences of extant species could be helpful in the structural elucidation of paleoproteins, linking the research fields food chemistry, molecular evolution and paleontology, and providing a strong combination of disciplines to support paleontology in the 21st century and beyond.
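The mass argument above can be reproduced with a short calculation. The sketch below is an illustration under stated assumptions: function names are hypothetical, standard monoisotopic residue masses are used, lowercase p denotes hydroxyproline, and a y ion is taken as the sum of its residues plus water plus one proton per charge.

```python
# Monoisotopic residue masses in Da (standard values); p = hydroxyproline.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "I": 113.08406, "E": 129.04259,
    "R": 156.10111, "p": 113.04768,
}
WATER = 18.01056
PROTON = 1.007276


def neutral_mass(peptide):
    """Monoisotopic mass of the uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def y_ion_mz(peptide, n, charge=1):
    """m/z of the y ion holding the n C-terminal residues."""
    return (neutral_mass(peptide[-n:]) + charge * PROTON) / charge


ancestral = "GPpGESGAVGPAGPIGSR"  # constructed bird ancestor
reported = "GLPGESGAVGPAGPpGSR"   # original assignment

# Same residue composition, so the precursor masses coincide.
print(abs(neutral_mass(ancestral) - neutral_mass(reported)) < 1e-6)  # True

print(f"{y_ion_mz(ancestral, 16, 1):.2f}")  # 1424.71
print(f"{y_ion_mz(ancestral, 16, 2):.2f}")  # 712.86
print(f"{y_ion_mz(reported, 16, 1):.2f}")   # 1408.68
```

Under these assumptions the y16 ions of the ancestral sequence fall at nominal m/z 1424 and 712, while the y16 ion of the original assignment is about 16 Da lighter, because its first two residues (G, L) carry the mass that the ancestral sequence keeps in the fragment as isoleucine.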


Generic LC-MS bird targets were identified after theoretical target selection, using a set of 83 bird collagen 1α2 sequences of 33 orders and a constructed common ancestral sequence, followed by experimental assessment of extracts from 10 bird species. Two tryptic target peptides passed the selection criteria, which were aimed at genericity, unambiguity, analyzability and uniqueness vs. non-bird species. The combination of VGPIGPAGNR and VGPIGAAGNR (pheasant only) covers all the investigated birds and was not found in other species using protein BLAST. It should be noted that the deamidated forms must be monitored in combination with the unmodified forms, as deamidation of N can occur during food processing. The peptide EGPVGFpGADGR covers all the investigated birds, but also occurs in several species of crocodiles and turtles. It is therefore suitable as a generic bird target only when the presence of these species can be excluded. The presence of the generic peptide (combination) was confirmed in chicken soup and broth, with beef broth as negative control sample, providing proof of principle in food products. The constructed common ancestral bird sequence was also used to evaluate elucidated dinosaur sequences, demonstrating that the use of ancestral sequences could be helpful in paleoprotein analysis.
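To illustrate why the deamidated forms need their own transitions, the sketch below computes precursor m/z values for both target peptides with and without deamidation of the single asparagine. It is an illustration only: the 2+ precursor charge state, the rounded monoisotopic masses and the helper name `precursor_mz` are assumptions, not settings reported in this study.

```python
# Monoisotopic residue masses in Da (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "V": 99.06841,
    "I": 113.08406, "N": 114.04293, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728
DEAMIDATION = 0.98402  # mass shift for N -> D (amide NH2 replaced by OH)


def precursor_mz(seq, charge=2, deamidated=False):
    """Precursor m/z; `deamidated` applies one N -> D mass shift."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    if deamidated:
        mass += DEAMIDATION
    return (mass + charge * PROTON) / charge


# Charge state 2+ is an assumption made for this example.
for seq in ("VGPIGPAGNR", "VGPIGAAGNR"):
    print(seq,
          round(precursor_mz(seq), 4),
          round(precursor_mz(seq, deamidated=True), 4))
```

At charge 2+ the deamidated precursor sits about 0.49 m/z units above the unmodified one, so a method that monitors only the unmodified form would miss the deamidated fraction of the target.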

Supporting information

S1 File. Calculations and data.

Calculations and data regarding the generic targets, the common bird ancestor and the distance table.


S2 File. Chromatograms and MS/MS spectra.

All relevant chromatograms and MS/MS spectra of the bird samples and negative control beef broth.



Gerard Wolfis is acknowledged for preparing chicken soup.


  1. Makareeva E, Leikin S. Collagen Structure, Folding and Function. In: Osteogenesis Imperfecta. Academic Press, Cambridge, MA; 2014. ISBN: 978-0-12-397165-4.
  2. Shoulders MD, Raines RT. Collagen structure and stability. Annual Review of Biochemistry 2009; 78: 929–958. pmid:19344236
  3. Bellamy G, Bornstein P. Evidence for procollagen, a biosynthetic precursor of collagen. Proceedings of the National Academy of Sciences USA 1971; 68: 1138–1142.
  4. Kimura S, Ohno Y. Fish type I collagen: tissue-specific existence of two molecular forms, (α1)2α2 and α1α2α3, in Alaska pollack. Comparative Biochemistry and Physiology B 1987; 88: 409–413.
  5. Piez KA. Characterization of a collagen from codfish skin containing three chromatographically different α chains. Biochemistry 1965; 4: 2590–2596.
  6. Kimura S. The interstitial collagens of fish. In: Biology of Invertebrate and Lower Vertebrate Collagens. Springer, Boston, MA; 1985. ISBN: 978-1-4684-7636-1. pp 397–408.
  7. Kleinnijenhuis AJ, Van Holthoon FL, Van der Steen B. Identification of collagen 1α3 in teleost fish species and typical collision induced internal fragmentations. Food Chemistry: X 2022; 14: 100333.
  8. Marini JC, Forlino A, Cabral WA, Barnes AM, San Antonio JD, Milgrom S, et al. Consortium for osteogenesis imperfecta mutations in the helical domain of type I collagen: regions rich in lethal mutations align with collagen binding sites for integrins and proteoglycans. Human Mutation 2007; 28: 209–221. pmid:17078022
  9. Nuytinck L, Freund M, Lagae L, Pierard GE, Hermanns-Le T, De Paepe A. Classical Ehlers-Danlos syndrome caused by a mutation in type I collagen. American Journal of Human Genetics 2000; 66: 1398–1402. pmid:10739762
  10. Grundy HH, Reece P, Buckley M, Solazzo CM, Dowle AA, Ashford D, et al. A mass spectrometry method for the determination of the species of origin of gelatin in foods and pharmaceutical products. Food Chemistry 2016; 190: 276–284.
  11. Asara JM, Schweitzer MH, Freimark LM, Phillips M, Cantley LC. Protein sequences from Mastodon and Tyrannosaurus rex revealed by mass spectrometry. Science 2007; 316: 280–285. pmid:17431180
  12. Schweitzer MH, Zheng W, Organ CL, Avci R, Suo Z, Freimark LM, et al. Biomolecular characterization and protein sequences of the Campanian hadrosaur B. canadensis. Science 2009; 324: 626–631. pmid:19407199
  13. Schroeter ER, DeHart CJ, Cleland TP, Zheng W, Thomas PM, Kelleher NL, et al. Expansion for the Brachylophosaurus canadensis collagen I sequence and additional evidence of the preservation of cretaceous protein. Journal of Proteome Research 2017; 16: 920–932. pmid:28111950
  14. Chen F, Welker F, Shen C-C, Bailey SE, Bergmann I, Davis S, et al. A late Middle Pleistocene Denisovan mandible from the Tibetan Plateau. Nature 2019; 569: 409–412. pmid:31043746
  15. Stevens P. Gelatine. In: Imeson A, editor. Food Stabilisers, Thickeners and Gelling Agents. Chichester, United Kingdom: Blackwell Publishing Ltd.; 2010. pp 116–144.
  16. Vergauwen B, Stevens P, Prawitt J, Olijve J, Brouns E, Babel W. Gelatin. Ullmann’s Encyclopedia of Industrial Chemistry. Weinheim, Germany: Wiley-VCH; 2016. pp 1–22.
  17. Rohman A, Windarsih A, Erwanto Y, Zakaria Z. Review on analytical methods for analysis of porcine gelatine in food and pharmaceutical products for halal authentication. Trends in Food Science & Technology 2020; 101: 122–132.
  18. Cross TG, Hornshaw MP. Can LC and LC-MS ever replace immunoassays? Journal of Applied Bioanalysis 2016; 2: 108–116.
  19. Dupree EJ, Jayathirtha M, Yorkey H, Mihasan M, Petre BA, Darie CC. A Critical Review of Bottom-Up Proteomics: The Good, the Bad, and the Future of This Field. Proteomes 2020; 8: 14. pmid:32640657
  20. Kleinnijenhuis AJ, van Holthoon FL, van Dongen WD. Integrated hemolysis monitoring for bottom-up protein bioanalysis. Bioanalysis 2020; 12: 1231–1241. pmid:32915066
  21. Kleinnijenhuis AJ, Van Holthoon FL, Herregods G. Validation and theoretical justification of an LC-MS method for the animal species specific detection of gelatin. Food Chemistry 2018; 243: 461–467. pmid:29146366
  22. Exposito J-Y, Valcourt U, Cluzel C, Lethias C. The fibrillar collagen family. International Journal of Molecular Sciences 2010; 11: 407–426. pmid:20386646
  23. Peterson KJ, Butterfield NJ. Origin of the Eumetazoa: Testing ecological predictions of molecular clocks against the Proterozoic fossil record. Proceedings of the National Academy of Sciences USA 2005; 102: 9547–9552. pmid:15983372
  24. Brusatte SL, O’Connor JK, Jarvis ED. The Origin and Diversification of Birds. Current Biology 2015; 25: R888–R898. pmid:26439352
  25. Xu X, Zhou Z, Dudley R, Mackem S, Chuong C-M, Erickson GM, et al. An integrative approach to understanding bird origins. Science 2014; 346: 1253293. pmid:25504729
  26. Prum RO. Who’s Your Daddy? Science 2008; 322: 1799–1800.
  27. Field DJ, Gauthier JA, King BL, Pisani D, Lyson TR, Peterson KJ. Toward consilience in reptile phylogeny: microRNAs support an archosaur, not a lepidosaur affinity for turtles. Evolution & Development 2014; 16: 189–196.
  28. Haq F, Ahmed N, Qasim M. Comparative genomic analysis of collagen gene diversity. 3 Biotech 2019; 9: 83. pmid:30800594
  29. Kleinnijenhuis AJ. Visualization of Genetic Drift Processes Using the Conserved Collagen 1α1 GXY Domain. Journal of Molecular Evolution 2019; 87: 106–130.
  30. Androutsou M-E, Nteli A, Gkika A, Avloniti M, Dagkonaki A, Probert L, et al. Characterization of Asparagine Deamidation in Immunodominant Myelin Oligodendrocyte Glycoprotein Peptide Potential Immunotherapy for the Treatment of Multiple Sclerosis. International Journal of Molecular Sciences 2020; 21: 7566. pmid:33066323
  31. Neta P, Pu Q-L, Kilpatrick L, Yang X, Stein SE. Dehydration versus deamination of N-terminal glutamine in collision-induced dissociation of protonated peptides. Journal of the American Society for Mass Spectrometry 2007; 18: 27–36. pmid:17005415
  32. Chan T-F, Poon A, Basu A, Addleman NR, Chen J, Phong A, et al. Natural variation in four human collagen genes across an ethnically diverse population. Genomics 2008; 91: 307–314. pmid:18272325
  33. Ofria C, Adami C, Collier TC. Selective pressures on genomes in molecular evolution. Journal of Theoretical Biology 2003; 222: 477–483. pmid:12781746
  34. Schroeter ER, Cleland TP, Schweitzer MH. Deep Time Paleoproteomics: Looking Forward. Journal of Proteome Research 2022; 21: 9–19. pmid:34918935
  35. Kleinnijenhuis AJ, Van Holthoon FL. Domain-specific proteogenomic analysis of collagens to evaluate de novo sequencing results and database information. Journal of Molecular Evolution 2018; 86: 293–302. pmid:29721580