Protein and Peptide Composition of Male Accessory Glands of Apis mellifera Drones Investigated by Mass Spectrometry

In honeybees, reproductive females usually mate early in life with more than 10 males in free flight, often within 10 minutes, and then store male gametes for up to five years. Because of this extreme polyandry and mating in free flight, special adaptations in males are likely. We present here the results of an investigation of the protein content of four types of male reproductive glands from the Western honeybee (Apis mellifera) drone, namely seminal vesicles (secretion in ejaculate) as well as bulbus, cornua and mucus glands (secretions for the mating plug). Using high-resolution, high-accuracy mass spectrometry and a combination of database searching and de novo sequencing, it was possible to identify 50 different proteins in total in all of these glands except the mucus gland. Most of the proteins are unique to a specific gland type; only one of them (H9KEY1/ATP synthase subunit O) was found in three glands, and 7 proteins were found in two types of glands. The identified proteins represent a wide variety of biological functions and can be assigned to several physiological classes, such as protection, energy generation and maintenance of optimal conditions, associated mainly with the vesicula seminalis; signaling, cuticle proteins, icarapin and apolipoproteins, located mainly in the bulbus and cornua glands; and some other classes. Most of the discovered proteins had not been found in earlier investigations of semen, seminal fluid and reproductive gland tissue of the bee drone. Moreover, we report the gland of origin of each protein. Thus, the presented data might shed light on the role of each reproductive gland.


Introduction
The ability to store significant amounts of male gametes for a long period of time (several years) is a characteristic feature of the queens of social Hymenoptera [1]. The queens of the honeybee (Apis mellifera) mate only once, early in their adult life, and accumulate an amount of sperm sufficient for their complete lifetime [2][3][4]. Sperm storage by female insects is an [...] However, this effect is far smaller when the sperm is activated [10]. In total, about a hundred proteins have been identified in honeybee seminal fluid so far [21,29,30]. They can be divided into several functional groups: signaling and defense, e.g. antioxidant, metal-binding and pheromone-binding proteins; energy production, e.g. amino acid synthetases, ATP or NADH synthases; and protein structure/function proteins, e.g. proteases and heat shock proteins. Several representatives of lipid metabolism proteins, phosphate catabolism proteins and transport proteins were also found. Apart from functional classes, the proteins found can be attributed to three physiological categories: proteins involved in maintaining the environment for sperm survival (energy, phosphate binding and protein folding), proteins providing sperm with its physiological needs (energy, protection from dangerous substances) and proteins influencing female behavior [29,30].
Our goal was to investigate the protein and peptide content of four different reproduction-related glands of the honeybee drone separately in order to identify the functional classes and the morphological origin of proteins synthesized by the glands. We employed database searching, using all bee proteins reported so far in public protein databases, as well as de novo sequencing to complement the database search results. Both approaches were based on high-accuracy, high-resolution mass spectrometry (MS) measurements, allowing precise structure determination.

Gland Preparation
Predominantly mature drones were collected from a hive of Western honeybees (Apis mellifera carnica) kept at the apiary of the Oberursel Bee Research Institute. From each drone, the reproductive apparatus was dissected and each gland was separated and rinsed three times in phosphate buffered saline. Tissue samples were stored at -20°C until further processing.

Extraction of Gland Content
Two glands of each type were carefully mixed with 30 μL of 0.1% TFA, avoiding destruction of the gland tissue, and sonicated for 5 min. The solution was left overnight at 4°C to complete the extraction of gland contents. The samples were then purified by ZipTip C18 using the standard protocol, completely dried in a SpeedVac and reconstituted in 1% formic acid (FA) before HPLC-MS analysis. No enzymatic digestion was performed before LC-MS analysis.

Tandem Mass Spectrometry
All experiments were done on a 7-Tesla Finnigan LTQ-FT Ultra mass spectrometer (Thermo Fisher Scientific GmbH, Bremen, Germany) consisting of a linear quadrupole ion trap and a Fourier transform ion cyclotron resonance (FTICR) mass spectrometer, equipped with a nano electrospray ionization source run in positive ion mode (spray voltage 2.2 kV, ion transfer tube temperature 275°C). The HPLC system consisted of a solvent degasser, a nano flow pump and an autosampler (Ultimate, Dionex/LCPackings, Idstein, Germany). Chromatographic separation of the peptides took place in a 15-cm analytical column with an internal diameter of 75 μm filled with C18, 3 μm, 100 Å stationary phase. Samples were pre-focused on the trap column (Dionex, C18 PepMap, i.d. 300 μm, length 5 mm) and eluted by the following multistep gradient: 4-40% B in 7 min, 40-95% B in 35 min, and isocratic 95% B for 5 min. Solvent A was 2% acetonitrile/water with 0.1% FA; solvent B was 80% acetonitrile/water with 0.1% FA. The mass spectrometer was operated in data-dependent mode, fragmenting the three most intense ions by CID (isolation width 2 u, normalized collision energy 30). Survey MS spectra (m/z 200-2000) were acquired in the FTICR cell with a mass resolution of 100,000 (@m/z 400), while MS2 spectra were recorded with a mass resolution of 50,000 (@m/z 400). The mass spectrometry data have been deposited to the ProteomeXchange Consortium [31] via the PRIDE partner repository with the dataset identifier PXD001993 and DOI 10.6019/PXD001993.

Amino Acid Sequence Analysis
First, LC-MS data were processed by Thermo Proteome Discoverer version 1.3 (Thermo Fisher Scientific GmbH, Bremen, Germany) using the SEQUEST search engine [32] embedded in this software. The reference database included proteins of the genus Apis obtained from UniProt as well as frequently observed contaminants (cRAP set: http://www.thegpm.org/crap/index.html dated 29.02.2012). Unspecific cleavages were used to form theoretical peptides; oxidation of methionine was specified as a variable modification. Fragmentation spectra were grouped with a 4 ppm tolerance for the parent ion and a 1 min RT window prior to the search. The same mass tolerance was used for the parent ion in SEQUEST, while for the fragments the tolerance was set to 0.01 Da. Database search results were filtered to FDR < 0.05 (estimated by the target-decoy approach) and further validated by manual de novo sequencing. In the second step, manual de novo interpretation of all spectra was performed to find peptides not discovered by SEQUEST. Two home-built programs, written in Python 2.7.5, were used to assist de novo sequencing. In addition to standard Python packages, Pyteomics [33] was used; the GUI was built using the PyQt4 (v. 4.9.6) package. The first program (Mass2aa) was used to fit a mass provided by the user to an amino acid composition consisting of the 20 standard amino acids, including several of the most common modifications. The mass can be treated as a complete peptide with a free acid or amide C-terminus, as a b- or y-ion, or as a bare amino acid combination. The second program (FuzzyMatch) was employed to match incomplete sequencing results to a database of proteins. The software can operate with permutation blocks, e.g. when the position of amino acids in the sequence is not definite. When a large part of the sequence is unclear, it can account for small changes in the sequence, such as an amino acid substitution or a modification in the N- or C-terminal part. A detailed description is provided in S1 Text.
The same protein database as for the SEQUEST search was used for matching de novo sequencing results.
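The mass-to-composition fitting described for Mass2aa can be illustrated with a minimal sketch. This is not the authors' program: the function name `fit_compositions`, its parameters and the brute-force enumeration are assumptions made for illustration, modifications and terminal groups are omitted, and isoleucine is folded into leucine because the two have identical mass. The query value 202.0788 Da is the y18/y19 mass difference reported in the Results.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da. Isoleucine is represented by L,
# since the two isomers cannot be told apart by mass.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "N": 114.042927, "D": 115.026943, "Q": 128.058578, "K": 128.094963,
    "E": 129.042593, "M": 131.040485, "H": 137.058912, "F": 147.068414,
    "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def fit_compositions(mass, tol_ppm=10.0, max_residues=3):
    """Return (composition, error in ppm) for every residue combination
    of up to `max_residues` amino acids whose summed mass fits `mass`."""
    hits = []
    for n in range(1, max_residues + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            total = sum(RESIDUE_MASS[aa] for aa in combo)
            error_ppm = (mass - total) / total * 1e6
            if abs(error_ppm) <= tol_ppm:
                hits.append(("".join(combo), round(error_ppm, 2)))
    return hits


# Bare residue combination (no terminal groups) for a mass difference
# between two consecutive fragment ions.
hits = fit_compositions(202.0788)
for composition, error_ppm in hits:
    print(composition, error_ppm)
```

A composition is reported without order, so a hit such as an alanine/methionine pair stands for both AM and MA; ordering has to come from the fragment ion series or from the database match.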

Results and Discussion
In the first step, the HPLC-MS data for each gland were analyzed with a database searching algorithm (SEQUEST). This produced a number of matches, which were subsequently checked for correctness. The important features taken into consideration were: (1) the completeness of rational explanations of all intense peaks in the spectrum, (2) the length of consecutive ion series, and (3) the interconnection between ion series (e.g. complementarity of ions, internal ions). An example of a spectrum poorly identified by SEQUEST and subsequently corrected using the de novo sequencing approach illustrated in this manuscript is presented in S2 Text.
The list of proteins identified by database search is provided in Table 1. Protein identifications based on only one peptide are also included in the table, since all peptide identifications were additionally checked manually for correctness, minimizing the number of false positive identifications. Moreover, some of the identifications receive additional support in the second step of data analysis. Although the secretion of the mucus glands forms the major part of the mating sign, only low molecular mass compounds were identified in this gland, while proteins were discovered in the three other gland types. In total, 6 proteins (1 unique) were found in bulbus, 13 (8 unique) in cornua and 18 (16 unique) in vesicula seminalis. Overall, 31 proteins with 73 corresponding peptides (2.4 peptides/protein) were found. S1 Table contains additional details on the identified peptides and proteins.
In the next step, de novo sequencing was applied to all unidentified spectra. This process can identify peptides with modifications, unusual peptides and peptides missed by SEQUEST. Analysis of fragmentation spectra was performed manually. Home-built software was used to match sequencing results with the database of proteins. Several interesting examples of this analysis can be found below and in S3 Text. The fragmentation spectrum of the triply charged precursor of the same peptide (Fig 3) shows a different fragmentation pattern; most of the cleavages take place in the N-terminal part. Some of the ions are common to the two fragmentation spectra. With the help of this spectrum it is possible to find the structure of the 506 Da N-terminal gap discussed above. The only possible structure of the b2-ion according to the accurate mass is a pair of alanine residues. A weak noise-level peak corresponding to the cleavage between these two amino acids (y16, m/z 1713.7759 and m/z 857.3939) gives this hypothesis additional support. No additional cleavages were observed in the C-terminal part; therefore its structure remains unclear. Using the two fragmentation spectra of the same peptide, it is possible to establish the partial sequence AAHPEEDDGGQ(PR)(425.2620). Matching it against the database gave three hits with the identical sequence AAHPEEDDGGQPRPPGR-OH. All three of them are related to different types of structural cuticle protein. It is worth mentioning that the order of the proline-arginine pair is as predicted by the proline rule. The difference in mass between the database peptide and the measured one is -0.9839 Da. This observed mass difference should be due to some modification of the non-sequenced C-terminal part.
Its sequence is PPGR, and the most probable explanation of the mass shift is an amidation of the C-terminus, since none of the amino acids (proline, glycine, arginine) present in this part of the sequence has any known modification that can lead to this mass change. The accuracy of the match for the C-terminal part (425.2620 Da) is 0.17 ppm, and for the full peptide it is −0.12 ppm. Most of the observed cleavages in the spectrum in Fig 4 represent fragmentation of the N-terminal part of the peptide. The interesting feature of this spectrum is an unusual mass difference between y18 and y19 (202.0788 Da), which can be attributed either to a VC or to an AM pair (5.95 ppm). During database matching neither of these sequences was found; however, a closely related sequence can be found: ALEWNAAH. The mass difference can be explained as oxidation of tryptophan to hydroxytryptophan (indicated as Wox in the sequence). The difference between theoretical and experimental mass is +0.0046 Da. The resulting three hits are related to the same three proteins as in the previous case (Figs 2 and 3), while the peptide sequence is ALEWNAAHPEEDDGGQPRPPGR. The difference between the mass of the database peptide and the measured mass is -0.9825 Da. Since this peptide is related to the same proteins, it is reasonable to assume that a C-terminal amidation leads to the mass shift. Mass errors for the C-terminal non-sequenced part and the whole peptide are 1.01 ppm and 0.53 ppm, respectively.
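The reasoning for the first cuticle peptide (permutation block, database hit, amidated C-terminal remainder) can be retraced with a short sketch. It is only an illustration under stated assumptions: the helper names `block_pattern` and `y_ion_mz` are hypothetical, the database is reduced to the single hit quoted in the text, and the value 425.2620 is interpreted here as the m/z of a singly protonated y-type fragment, which the text does not state explicitly.

```python
import re
from itertools import permutations

RESIDUE_MASS = {"P": 97.052764, "G": 57.021464, "R": 156.101111}  # Da
PROTON = 1.007276
WATER = 18.010565    # terminal H + OH (free-acid C-terminus)
AMMONIA = 17.026549  # terminal H + NH2 (amidated C-terminus)


def block_pattern(partial):
    """Compile a partial sequence into a regex. Residues in parentheses
    form a permutation block whose internal order is unknown."""
    parts = []
    for fixed, block in re.findall(r"([A-Z]+)|\(([A-Z]+)\)", partial):
        if fixed:
            parts.append(fixed)
        else:
            orders = sorted({"".join(p) for p in permutations(block)})
            parts.append("(?:" + "|".join(orders) + ")")
    return re.compile("".join(parts))


def y_ion_mz(sequence, amidated):
    """m/z of a singly protonated y-type fragment (assumed ion type)."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + (AMMONIA if amidated else WATER) + PROTON


database_peptide = "AAHPEEDDGGQPRPPGR"  # hit quoted in the text
match = block_pattern("AAHPEEDDGGQ(PR)").match(database_peptide)
unsequenced = database_peptide[match.end():]  # part not covered by fragments

observed = 425.2620
theoretical = y_ion_mz(unsequenced, amidated=True)
error_ppm = (observed - theoretical) / theoretical * 1e6
amidation_shift = AMMONIA - WATER  # about -0.9840 Da

print(unsequenced, round(theoretical, 4), round(error_ppm, 2),
      round(amidation_shift, 4))
```

Under these assumptions the calculated shift for replacing the C-terminal OH with NH2 agrees with the reported differences of -0.9839 Da and -0.9825 Da, and the amidated PPGR fragment reproduces the quoted 0.17 ppm agreement.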
The fragmentation spectrum presented in [...] ATNALKRLLP-OH (both leucine and isoleucine are indicated with L). This sequence can be found in the 10 kDa heat shock protein. The mass error is 0.08 ppm for the complete peptide. Fig 6 shows the fragmentation spectrum of a peptide from vesicula seminalis with an accurate measured mass of 1359.7779 Da (uncharged). Using the prominent and long series of y-ions, supported by complementary b-ions (Fig 6), it is possible to establish the complete sequence of the peptide, apart from L/I assignment, which is not possible with the fragmentation method applied. This sequence is not present in any of the known proteins in the database, though the best match (FNVMGLGEPIRF) can be found in three proteins annotated as Glutathione S-transferase S4. The sequences differ by two single amino acid substitutions at positions 2 and 4, namely N→P and M→K. Both substitutions are quite rare in proteins and, therefore, can be of some interest. The mass error for the complete peptide is 1.10 ppm.
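Finding a closest database match that tolerates a few substitutions and ignores the L/I ambiguity can be sketched as a sliding-window comparison. This is not the FuzzyMatch implementation; `best_window` is a hypothetical helper, the query is reconstructed by applying the two stated substitutions to the database sequence (assuming the arrows point from database to measured peptide), and the flanking residues of the toy protein are invented.

```python
def normalize(sequence):
    """Collapse isoleucine into leucine, which CID cannot distinguish."""
    return sequence.replace("I", "L")


def best_window(query, protein):
    """Slide the query along the protein and return
    (mismatch count, start index, 1-based mismatch positions)
    for the window with the fewest substitutions."""
    q, p = normalize(query), normalize(protein)
    best = None
    for start in range(len(p) - len(q) + 1):
        window = p[start:start + len(q)]
        diffs = [i + 1 for i, (a, b) in enumerate(zip(q, window)) if a != b]
        if best is None or len(diffs) < best[0]:
            best = (len(diffs), start, diffs)
    return best


# Assumed de novo result: database sequence with N2 -> P and M4 -> K.
query = "FPVKGLGEPLRF"
# Toy protein: the database peptide with invented flanking residues.
protein = "MKT" + "FNVMGLGEPIRF" + "AGE"

mismatches, start, positions = best_window(query, protein)
print(mismatches, start, positions)
```

A real search would rank such windows across every protein in the database and would also have to handle permutation blocks and mass gaps, which this sketch leaves out.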
The complete list of all proteins found in male accessory glands of the bee drone is presented in Table 2 (additional details can be found in S2 Table). The number of identified peptides and proteins is almost doubled after the de novo sequencing step (131 peptides, 50 proteins, 2.6 peptides/protein). Some of the identified peptides contain at least one modification (see S2 Table). All newly added peptides can be found in at least one protein from the database. In several cases new sequences can be attributed to more than one protein, but these proteins seem to be homologous and share the same function. De novo sequencing complements the database search results not only by adding new proteins, but also by giving additional support for the proteins found by SEQUEST, thus providing more than one peptide as evidence per protein.
The distribution of proteins found in each gland is as follows: bulbus, 12 (6 unique) proteins; cornua, 15 (7 unique) proteins; vesicula seminalis, 32 (28 unique) proteins. Most of the proteins are unique to a specific gland type; only one of them (H9KEY1/ATP synthase subunit O) was found in all three glands, and 7 proteins were found in the two types of glands which contribute to the mating sign. The major part of the identified proteins originates from vesicula seminalis, the only gland whose secretion contributes to the seminal fluid.
The main functional classes of the proteins found are the same as reported earlier [21,30,34], but a direct comparison of the protein lists shows only partial overlap (Table 3). Collins et al. [21] investigated the protein composition of semen and seminal vesicles, and 8 proteins from the current investigation were found in their work as well. The tissue of origin matches our observation in all cases, except for ATP synthase (H9K918), which we identified only in the cornua gland. Baer et al. [30] focused on seminal-fluid protein composition, and there are only two proteins in common with our findings. Arginine kinase (H9K1E2) was found in the bulbus gland and the seminal vesicles; the second one is citrate synthase 1, which was identified in seminal vesicles. Chan et al. [34] published a protein atlas of the honey bee that included proteomic analysis of complete bee drone testis and mucus gland. The contents of the mucus gland were investigated by us as well; however, no proteins could be identified. There is little overlap between their observations and earlier reports [21,30] (Table 3). Twenty proteins identified in the current investigation were also found in the analysis of Chan et al. [34]. The vast majority of these proteins were present in both organs (testis and mucus gland); only three proteins (H9K6R4, H9KDD2 and Q5XUU6) are located in testis only. Since testis was analyzed as one sample, it is not possible to compare the gland localization of the identified proteins. According to our observations, most of the common proteins are present in vesicula seminalis. The only protein identified in all three glands (ATP synthase, H9KEY1) was reported by Chan et al. as well [34]. Although the overlap with the bee protein atlas [34] is substantial, a significant part of the proteins found in the current investigation was not reported earlier.
The functional variety of overlapping proteins is quite broad and includes protein folding, carbohydrate and lipid metabolism, phosphorylation/phosphate metabolism, oxidoreductase activity, and structural activity.
Five proteins found in bulbus and cornua glands (A5A5E4, H9KBS5, H9KGM8, H9KU41, H9KUC2), both by de novo and database methods, are annotated as cuticular proteins. Both glands are of ectodermal origin, and bulbus glands produce chitinous plates [18]. The remaining proteins can be separated into the same functional groups as reported by Baer et al. [29,30] and Collins et al. [21]. The defense group includes proteins involved in degradation of potentially harmful substances, e.g. glutathione S-transferases, peroxiredoxin, and several oxidoreductases. One of the glutathione S-transferases belongs to the sigma class. It is known that members of this class have high activity towards lipid peroxidation products and were described in metabolically active tissues (e.g. flight muscles) in flies, in bee venom [35] and also in queen spermatheca and drone semen [12]. The proteins of this group are found mostly in seminal vesicles, the only gland contributing to the production of ejaculate.
It was shown that sugars and phospholipids are the primary sources of energy for sperm [36,37]. Therefore, the presence in vesicula seminalis of proteins involved in degradation of side-products is reasonable. Proteins involved in metabolism of carbohydrates and lipids were found as well, e.g., malate dehydrogenase, apolipoproteins, citrate synthase, enoyl-CoA hydratase and others. The seminal vesicles contain proteins associated with carbohydrate metabolism, while bulbus and cornua glands contain apolipoproteins (one type each). Moors and Billen [26] reported that lipids form the major part of cornual secretion. The identification of apolipoprotein-III-like protein and electron transfer flavoprotein subunit beta-like supports this observation. The adjacent group of identified proteins involved in energy generation includes arginine kinase, aspartate aminotransferase, fructose-bisphosphate aldolase, citrate synthase, and ATP-related proteins. The largest number of them was found in seminal vesicles, indicating the high energy demand of the sperm, although a smaller number was also found in the bulbus and cornua glands. The major part of the protein-processing proteins, e.g., heat shock proteins and disulfide isomerase, was identified in seminal vesicles. They might be involved in conditioning of the seminal fluid environment. Several representatives of this class were also found in bulbus and cornua glands. A relatively large group of proteins found almost exclusively in seminal vesicles belongs to the structural/motor group, e.g. myosin, actin, tubulin and troponin; representatives of this class were also reported by Collins and Baer [21,30]. The group of signaling proteins consists of circadian-controlled protein (H9K7H5) and juvenile hormone transporter, which were both found in the cornua gland. This type of protein might mediate female-controlling functions of sperm.
The function of some identified proteins remains uncertain and might be the target of further investigations.
Whether the identified proteins are likely to be secreted was assessed with TargetP [38]. Eleven proteins were predicted to be secreted and two contained a mitochondrial targeting signal. The localization of the remaining proteins could not be predicted. Thus, only about 20% of the proteins are predicted to be secreted. A similar ratio was also found for human and bee seminal proteins in earlier studies [30]. Prediction results are provided in Table 4.
The list of potentially secreted proteins includes all cuticle proteins, icarapin, apolipoproteins, which are associated with lipid metabolism, both proteins with signaling functions, and disulfide isomerase, which belongs to the protein processing group.
It should be noted that the prediction of secretory proteins is based solely on the presence or absence of specific amino acid motifs in the protein sequence, which target it to a specific pathway. However, several other factors could influence the behavior of the protein, not to mention the potential inaccuracy of the prediction.

Conclusions
The contents of four different reproduction-related glands of the honeybee drone (bulbus, cornua, mucus and seminal vesicles) were analyzed separately; proteins were found in only three of them, the exception being the mucus gland. The combination of database search and de novo sequencing allowed the identification of a total of 50 proteins. The de novo sequencing approach significantly increased the number of peptides and proteins identified in the samples and allowed peptide modifications not included in the database search to be established. The largest number of proteins was found in seminal vesicles, the only gland that contributes to semen proteins. There are not many proteins in common between the various gland types; most of them were found in one specific gland only. Compared with earlier investigations of the seminal fluid of the bee drone and the content of some reproductive glands [21,30], most of the proteins found in this study were not reported earlier. All proteins can be subdivided into the following functional groups: a defense group that includes proteins involved in degradation of potentially harmful substances, proteins involved in metabolism of carbohydrates and lipids, proteins involved in energy generation, proteins with protein processing and protein conditioning functions, structural/motor proteins, signaling proteins, and several proteins with unknown function. Defense, carbohydrate metabolism, motor and protein conditioning proteins, as well as a significant part of the energy production proteins, were found to be located in seminal vesicles, indicating the necessity to provide male gametes with energy and protection. Cuticle proteins are unique to bulbus and cornua glands and might be related to the glands' production of chitinous plates. Moreover, both cornua and bulbus glands contain lipid metabolism proteins; representatives of this functional class were not found in other glands.
The unique feature of cornua glands is the presence of two signaling proteins. Using TargetP [38], 11 proteins were predicted to be secreted and 2 proteins to be located in mitochondria. The secreted group contains proteins involved in lipid metabolism, protein processing proteins and both signaling proteins.
Supporting Information
S1 Table. Detailed list of proteins identified only by database search (SEQUEST) in reproduction-related glands of the bee (A. mellifera) drone. (DOCX)
S2 Table. Detailed list of proteins identified by combination of database search (SEQUEST) and de novo sequencing in reproduction-related glands of the bee (A. mellifera) drone. (DOCX)
S1 Text. Description of the home-built supporting software.