Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota

Adaptation of organisms to environmental change may be facilitated by the creation of new genes. New genes without homologs in other lineages are known as taxonomically-restricted orphan genes and may result from divergence or de novo formation. Previously, we have extensively characterized the evolution and origin of such orphan genes in the nematode model organism Pristionchus pacificus. Here, we employ large-scale transcriptomics to establish potential functional associations and to measure the degree of transcriptional plasticity among orphan genes. Specifically, we analyzed 24 RNA-seq samples from adult P. pacificus worms raised on 24 different monoxenic bacterial cultures. Based on coexpression analysis, we identified 28 large modules that harbor 3,727 diplogastrid-specific orphan genes and that respond dynamically to different bacteria. These coexpression modules have distinct regulatory architecture and also exhibit differential expression patterns across development, suggesting a link between bacterial response networks and development. Phylostratigraphy revealed a considerably high number of family- and even species-specific orphan genes in certain coexpression modules. This suggests that new genes are not attached randomly to existing cellular networks and that integration can happen very fast. Integrative analysis of protein domains, gene expression and ortholog data facilitated the assignment of biological labels for 22 coexpression modules, with one of the largest, fast-evolving modules being associated with spermatogenesis. In summary, this work presents the first functional annotation for thousands of P. pacificus orphan genes and reveals insights into their integration into environmentally responsive gene networks.


Response:
We absolutely agree with the reviewer. We added these concerns to the discussion section. Nevertheless, also in agreement with the reviewer, we would argue that this data set is still valid to infer coexpression modules and that our main conclusions are therefore not substantially affected. This is further supported by our systematic analysis of subsampled data sets (Supplemental Fig S5), which shows that the number of environmentally responsive orphan genes consistently stays high across all combinations of data sets.

Comment:
Although the authors initially (in their results) propose interpretations that, one may argue, go beyond what can be reasonably concluded from their results, the discussion is very sensible and highlights key limitations of the study, thus providing a fair and insightful account of their research.
They further highlight the fact that the main value of the study lies as a resource and a predictive tool to guide future investigations of orphan genes, which I agree with. Beyond that, the various ways they analyzed their data and compared them to other published datasets have clear merit and will be a source of inspiration for others.

Response:
We again thank the reviewer for this comment. We hope that the additional analyses (e.g. subsampling RNA-seq data sets, simulations of network integration) better support our conclusions. Nevertheless, we went through the results section and adjusted our interpretations to be more in line with the discussion.

Comment: Specific issues 1) Some methods are minimal, particularly for sample preparation, and would not allow for replication of the findings. Additional references to the standard protocols used, and details about the sequencing facility used (one cannot assume that an Illumina platform is standard equipment, yet) are necessary.

Response:
We have expanded the corresponding methods section.

Comment:
2) The authors should also be careful in their use of the word nematode (generalization) in lieu of P. pacificus. This is particularly problematic when looking at interactions with bacteria as adult P. pacificus are parasites and not bacterivorous predators, by contrast with C. elegans for instance. C. elegans adult responses to bacteria likely reflect influences from both a dietary and a gut microbiota component. Generalizing statements to "nematodes" that include species with distinct ecologies is inaccurate.
Response: For clarification, we would like to point out that there is no evidence that P. pacificus are parasites. Nevertheless, we went through the text again and removed any unnecessary generalizations.

Comment:
3) Lines 298-299: « suggesting that the transcriptomic response of nematodes to various environmental microbiota do not strictly reflect phylogeny ». This is seemingly true for P. pacificus but perhaps misleading as it could be that strain-specific changes are obscuring phylum-specific changes (a greater coverage of bacterial diversity might reveal that). This could also be due to a methodological limitation: the limited ability to detect changes led to overlooking phylum-specific changes.
Response: Thanks, we have added this alternative explanation to the corresponding results section.

Comment: Most of the variation seems to come from a couple of isolates LRB104, 80 and 17, which may dwarf other changes and lower the discovery rate of other interesting effects. Would the authors consider rerunning the analysis excluding at least LRB104?
Response: Thanks for this suggestion. We recomputed the coexpression network without LRB104 (Supplemental Fig S4), but that has only a slight impact on gene module assignment. In addition, we have systematically explored the structure of the coexpression networks as a function of the number of RNA-seq samples. These results show that the number of environmentally responsive orphan genes stays in the range of thousands of genes irrespective of the data set.
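
To illustrate the logic of such a subsampling check, the following is a minimal sketch and not the pipeline used in the study. It assumes a genes-by-samples expression matrix and swaps in a deliberately simple module definition (connected components of a Pearson-correlation graph above a threshold), since the actual module-detection method is not specified in this letter. All names (`modules_from_samples`, `subsample_orphan_counts`, `r_min`, `min_size`) and the toy data are hypothetical.

```python
import numpy as np

def modules_from_samples(expr, sample_idx, r_min=0.9, min_size=3):
    """Group genes into modules using only the chosen samples (columns).

    Assumption: a module is a connected component of the graph that links
    two genes whenever their Pearson correlation is >= r_min.
    """
    corr = np.corrcoef(expr[:, sample_idx])  # gene-by-gene correlation
    adj = corr >= r_min                      # NaN entries compare as False
    n_genes = expr.shape[0]
    seen = np.zeros(n_genes, dtype=bool)
    modules = []
    for start in range(n_genes):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:                         # depth-first traversal
            g = stack.pop()
            comp.append(g)
            for nb in np.flatnonzero(adj[g] & ~seen):
                seen[nb] = True
                stack.append(int(nb))
        if len(comp) >= min_size:
            modules.append(sorted(comp))
    return modules

def subsample_orphan_counts(expr, is_orphan, n_keep, n_draws, rng):
    """Count orphan genes that fall into any module, per random subsample."""
    counts = []
    for _ in range(n_draws):
        idx = rng.choice(expr.shape[1], size=n_keep, replace=False)
        in_module = np.zeros(expr.shape[0], dtype=bool)
        for module in modules_from_samples(expr, idx):
            in_module[module] = True
        counts.append(int((in_module & is_orphan).sum()))
    return counts

# Toy data (hypothetical): 8 genes x 24 samples, two perfectly coexpressed groups.
a = np.arange(24, dtype=float)
b = np.where(np.arange(24) % 2 == 0, 1.0, -1.0)
expr = np.vstack([k * a + 1 for k in (1, 2, 3, 4)] +
                 [k * b + 5 for k in (1, 2, 3, 4)])
is_orphan = np.array([False, True, True, False, False, True, False, False])

rng = np.random.default_rng(0)
print(subsample_orphan_counts(expr, is_orphan, n_keep=20, n_draws=5, rng=rng))
```

Leaving out a single sample (for example `np.delete(np.arange(24), 0)` as `sample_idx`) corresponds to the exclusion of one bacterial condition, while varying `n_keep` corresponds to exploring the network as a function of the number of RNA-seq samples.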

Comment: Fig2, could the authors also provide a reordered ranking of the bacterial list based on phylogeny to help visualize whether one or more modules may change expression based on bacterial phylogeny? Not all transcriptional changes may be equal in terms of biological relevance and key modules that could be related to known immune or metabolic pathways might actually map phylogeny when others don't. The whole transcriptome picture may be masking that.
Response: Thanks for the suggestion. We have grouped the bacteria by genus and class. In addition, we have downsampled the largest clusters to allow better visualization of the smaller modules. This may be used to screen for modules that agree better with the phylogeny than others.

Comment: 4) Could the authors specify in the fig 3 legend and/or the text what control condition they used to monitor gene expression in the 28 modules over time? I assume it is E. coli OP50 but it needs to be made clear there for non-nematologists. The authors should also specify the nematode species in the legend. It would also be very interesting to identify the timing of larval stages either on the left or the right of the Fig3 plot.
Response: Thanks, we added the control condition in the legend and labeled the larval stages.

Comment: [...] that removal of the data for the most strongly affecting bacteria LRB104 does not change module membership. Repeating this with other expression profiles (removing data for different strains and recomputing the graph) could lend further confidence in the described model.

Response:
We have done a more systematic analysis of the coexpression networks from subsampled data (Supplemental Fig S5). This showed that with fewer RNA-seq samples, network modules actually tend to get larger and this consequently increases the number of orphan genes. Thus, the full data set with all 24 RNA-seq samples yields the most conservative lower estimate for the number of orphan genes that are part of environmentally responsive gene modules. These results have been included in the corresponding results section.

Comment: [...] fig. 3 the authors suggest that developmental signatures in co-expression modules on different bacteria may represent effects of different bacteria on developmental timing, but the authors indicated earlier that all worms harvested were adults of the same, relatively precise stage. If so, developmental delays are not an option here. This runs the risk of confusing the reader.