The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific

Categories of Recruitment Metadata

The recruitment metadata distinguishes eight general categories based on the relative placement of paired-end sequencing reads (mated reads) when recruited to a reference sequence, compared with their known orientation and separation on the clone from which they were derived. Assuming the orientation is correct, two mated reads can be recruited closer together than expected, farther apart than expected, or within the expected distance given the size of the clone from which the sequences were derived. These reads are categorized as “short,” “long,” or “good,” respectively. Alternatively, the mated reads may be recruited in a mis-oriented fashion, which takes precedence over issues of separation. These reads are categorized as “normal,” “anti-normal,” or “outie.” There are two further categories. “No mate” indicates that no mated read was available for recruitment, possibly due to sequencing error. “Missing” mates, perhaps the most useful of the recruitment categories, indicate that a mated sequence was available but was not recruited to the reference. “Missing” mates therefore identify breaks in synteny between the environmental data and the reference sequence.
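The decision order described above (mate availability first, then orientation, then separation) can be sketched as a small classifier. This is an illustrative sketch only, not the expedition's actual pipeline. The names `Hit` and `classify_mate_pair`, the use of the outer span as the separation measure, and the `min_span`/`max_span` bounds are all hypothetical. The text does not define the three mis-oriented categories, so the sketch assumes the convention common in assembler terminology: "normal" means both reads on the forward strand, "anti-normal" means both on the reverse strand, and "outie" means the reads point away from each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """Placement of one recruited read on the reference (hypothetical type)."""
    start: int   # leftmost reference coordinate
    end: int     # rightmost reference coordinate (exclusive)
    strand: str  # '+' (forward) or '-' (reverse)


def classify_mate_pair(read: Hit, mate: Optional[Hit], mate_available: bool,
                       min_span: int, max_span: int) -> str:
    """Assign one of the eight recruitment categories to a recruited read.

    mate_available: whether a mated read exists at all.
    mate: the mate's placement, or None if it was not recruited.
    min_span, max_span: assumed bounds on the expected clone size.
    """
    if not mate_available:
        return "no mate"
    if mate is None:
        return "missing"

    # Order the two reads by position on the reference.
    left, right = sorted((read, mate), key=lambda h: h.start)

    # Orientation is checked first: mis-orientation overrides separation.
    # ASSUMPTION: these definitions are not given in the text.
    if left.strand == "+" and right.strand == "+":
        return "normal"
    if left.strand == "-" and right.strand == "-":
        return "anti-normal"
    if left.strand == "-" and right.strand == "+":
        return "outie"

    # Correct orientation (reads face each other): compare separation
    # with the expected clone size. ASSUMPTION: outer span is the measure.
    span = right.end - left.start
    if span < min_span:
        return "short"
    if span > max_span:
        return "long"
    return "good"
```

For example, with an assumed clone size range of 3,000 to 5,000 bp, `classify_mate_pair(Hit(100, 900, "+"), Hit(3200, 4000, "-"), True, 3000, 5000)` yields "good", because the reads face each other and span 3,900 bp. Passing `None` as the mate while `mate_available` is true yields "missing", the category the text singles out as marking breaks in synteny.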

