Deep Evolutionary Comparison of Gene Expression Identifies Parallel Recruitment of Trans-Factors in Two Independent Origins of C4 Photosynthesis
Identification of homologues and quantification of gene expression after de novo assembly. (A) Correlation in quantification derived from reciprocal best BLAST (RBB) hits in the de novo assembly and reference, summed over all transcript isoforms per reference gene locus. (B) The Spearman correlation in transcript abundances between the reference-guided estimation and estimates generated using different transcript orthology assignment methods on the same de novo assembled transcriptome. “RBB only” means that only the reciprocal best BLAST transcripts were selected. E-value cut-offs (e.g. 1e-5) indicate the fixed value at which sequences were determined to be homologues. OrthoMCL indicates that OrthoMCL was used to cluster and identify orthologous transcript groups. Finally, the black bar indicates the effect of varying the percentile cut-off on the abundance estimate accuracy of the conditional orthology assignment method. (C) The conditional orthology assignment method begins by performing all-versus-all BLAST searches of the assembled transcripts against a reference proteome. (D) The reciprocating hits (indicated by blue lines) are selected for self-training. (E) The reciprocating hits are binned according to assembled transcript length and a quadratic model is fit to the e-value and length data. (F) Non-reciprocating hits that fall above the curve are accepted as putative homologues; non-reciprocating hits that fall below the curve are rejected. (G) Correlation in quantification derived from conditionally assigned transcripts using the species' own reference genome. (H) Correlation in quantification derived from conditionally assigned transcripts using an intermediary reference genome. For full details, validation and explanation please see the supplementary methods (Text S1).
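Panels (C) to (F) can be read as a small self-training procedure, and the following is a minimal sketch of that idea rather than the authors' implementation (which is described in Text S1). Several details are assumptions made for illustration only: the function names `fit_length_threshold` and `accept_hits` are hypothetical, e-values are converted to a score of -log10(e-value) so that "above the curve" means a stronger hit, bins contain equal numbers of reciprocating hits, and the curve is fit to a chosen percentile of the scores in each bin.

```python
import numpy as np


def fit_length_threshold(lengths, evalues, n_bins=10, percentile=5.0):
    """Fit a quadratic score threshold as a function of transcript length.

    Inputs are the reciprocating (self-training) hits only. The binning
    scheme, the percentile and the -log10 transform are assumptions.
    """
    lengths = np.asarray(lengths, dtype=float)
    # Clip so that e-values reported as 0 do not produce log10(0).
    scores = -np.log10(np.clip(np.asarray(evalues, dtype=float), 1e-300, None))

    # Bin edges placed at length quantiles, so bins hold similar numbers of hits.
    edges = np.quantile(lengths, np.linspace(0.0, 1.0, n_bins + 1))
    centres, cutoffs = [], []
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # The last bin is closed on the right so the longest transcript is kept.
        upper = lengths <= hi if i == n_bins - 1 else lengths < hi
        mask = (lengths >= lo) & upper
        if not mask.any():
            continue
        centres.append(lengths[mask].mean())
        cutoffs.append(np.percentile(scores[mask], percentile))

    if len(centres) < 3:
        raise ValueError("need at least three non-empty bins for a quadratic fit")
    # Coefficients are returned highest power first, as np.polyval expects.
    return np.polyfit(centres, cutoffs, deg=2)


def accept_hits(coeffs, lengths, evalues):
    """Return True for non-reciprocating hits that lie on or above the curve."""
    scores = -np.log10(np.clip(np.asarray(evalues, dtype=float), 1e-300, None))
    return scores >= np.polyval(coeffs, np.asarray(lengths, dtype=float))
```

Under these assumptions the acceptance threshold depends on transcript length, so a short transcript is not held to the same e-value as a long one, in contrast to the fixed e-value cut-offs compared in panel (B). The `percentile` argument plays the role of the percentile cut-off whose effect is shown by the black bar.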