De novo Assembly of a 40 Mb Eukaryotic Genome from Short Sequence Reads: Sordaria macrospora, a Model Organism for Fungal Morphogenesis
(A) Comparison of partly orthologous polyketide biosynthesis clusters from S. macrospora (scaffold_17, S.m.) and P. nodorum (supercontig 16, P.n.; data for P. nodorum are from the Stagonospora nodorum database at http://www.broadinstitute.org/annotation/genome/stagonospora_nodorum/Home.html). The six genes for which an ortholog is present in both clusters are shown in green; orthology is indicated by gray bars between the genes. Genes without an ortholog in the other cluster are shown in blue. (B) Percent identity from a BLASTP comparison (e-value ≤ 10⁻⁵) of S. macrospora proteins versus P. nodorum proteins. Mean percent protein identity was calculated for (I) all proteins with a significant hit (e-value ≤ 10⁻⁵; 7424 proteins), (II) all proteins that contain a Pfam domain from one of the five Pfam domain families represented among the orthologous proteins from the cluster (137 proteins; the domains are adh_short, FAD_binding_3, p450, PAL, and UbiA; Table S8), and (III) the orthologous proteins from the cluster (six proteins, shown in green in A). The mean percent sequence identity for the orthologous proteins from the cluster is significantly higher (p = 0.001) than either of the other two means, as indicated by an asterisk.
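The group means in panel B could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: it assumes the BLASTP results are in the standard 12-column tabular format, that the best hit per query is the one with the highest bit score, and that the function names, protein IDs, and toy data are purely illustrative. The significance test behind p = 0.001 is not specified in the caption and is not reproduced here.

```python
from statistics import mean

EVALUE_CUTOFF = 1e-5  # threshold stated in the caption


def best_hits(blast_lines, cutoff=EVALUE_CUTOFF):
    """Return {query_id: percent_identity} for the best significant hit per query.

    Assumes 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    "Best" is taken to mean highest bit score (an assumption).
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query = fields[0]
        pident = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue > cutoff:
            continue  # hit is not significant at the chosen threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (pident, bitscore)
    return {query: pident for query, (pident, _) in best.items()}


def mean_identity(hits, protein_ids=None):
    """Mean percent identity over all hits, or over a subset of protein IDs."""
    if protein_ids is None:
        values = list(hits.values())
    else:
        values = [hits[p] for p in protein_ids if p in hits]
    return mean(values) if values else None


# Toy input (made-up IDs and values, for illustration only)
toy = [
    "protA\tx1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-50\t200",
    "protA\tx2\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-20\t100",
    "protB\ty1\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-6\t50",
    "protC\tz1\t30.0\t100\t70\t0\t1\t100\t1\t100\t0.01\t20",
]
hits = best_hits(toy)
all_mean = mean_identity(hits)                 # group (I): all significant hits
subset_mean = mean_identity(hits, ["protA"])   # e.g. group (III): a chosen subset
```

Groups (II) and (III) would be obtained by passing the corresponding lists of protein IDs (proteins with one of the five Pfam domains, or the six cluster orthologs) as `protein_ids`.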