Table 1.
Comparison of maintained* open-source IT tools and their functionalities for the full-stack DBTL cycle.
Fig 1.
Conversion of natural language lab protocols for iterative design-build-test-learn cycles to literate protocols using teemi.
Natural language protocols (left, blue) comprehensible to humans are converted into computer code (right, yellow) that can be understood by both computers and humans. In teemi, each procedure in a natural language protocol is linked to the name of a Python module in the literate protocol, which lowers the programming skill needed to adopt teemi. See also S1 Fig for more details. Created with Biorender.com.
Table 2.
Overview of the notebooks created for this work.
Fig 2.
Design and characteristics of the constituent DNA parts used as experimental testbed for teemi.
(A) The ten-step biosynthetic pathway converting geraniol to strictosidine. The G8H step is highlighted in a dashed box [26]. (B-C) Rooted phylogenetic trees of G8H (B) and CPR (C) protein representatives. Uniprot identifiers are shown in parentheses. Catharanthus roseus (Cro), Rauvolfia serpentina (Rse), Olea europaea (Oeu), Camptotheca acuminata (Cac), Vinca minor (Vmi), Cinchona calisaya (Cca), Ophiorrhiza pumila (Opu), Swertia mussotii (Smu), Artemisia annua (Aan), Arabidopsis thaliana (Ath), Catharanthus longifolius (Clo), Amsonia hubrichtii (Ahu), and Aspergillus niger (Ani). (D-E) Temporal resolution of transcript abundances for candidate genes [34], whose promoters were chosen to control the expression of genes encoding G8H (D) and CPR (E) homologs. (F) Combinatorial assembly and genome integration strategy.
Fig 3.
Design, characterization and modeling of design-build-test-learn cycle I.
(A) Outline of the stochastic sampling and test workflow for data generation. Created with Biorender.com. (B) The distribution and counts of parts from the 167 strains that were accepted as input for machine learning in the first learning phase of the first DBTL cycle. (C) The distribution of observed strictosidine titers relative to reference strain MIA-CH-A2. Below the bar plot, the distribution of parts is shown for each of the 238 analyzed strains. (D) Cross-validated predictions vs average normalized strictosidine production. All values are ranked.
Table 3.
Machine-learning model characteristics.
Fig 4.
Design, characterization and modeling of design-build-test-learn cycle II.
(A) The distribution and counts of parts from the strains that were accepted as input for machine learning in the second learning phase of the second DBTL cycle. (B) The distribution of observed strictosidine titers relative to reference strain MIA-CH-A2. Below the bar plot, the distribution of parts is shown for each of the 240 analyzed strains. (C) Cross-validated predictions vs average normalized strictosidine production. All values are ranked.
Fig 5.
Learning curves and top-ranking strain designs from the iterative engineering cycles.
Learning curves from the first (A) and second (B) DBTL cycles, illustrating the mean absolute error (MAE) of the best-performing deep learning and XGBoost models used in cycles I and II, respectively, in relation to the number of data points (blue line), and the cross-validation holdout prediction MAE together with the standard deviations of the 10 models created (yellow line). The points are based on 10 models created with randomly shuffled data in partitions of 33, 67, and 100% (dbtl1) and 20, 40, 60, 80, and 100% (dbtl2) of the available data, chosen to give partitions of the same size. (C) Average strictosidine production for the top-20 strains from the first and second DBTL cycles. Genotypes are shown (left) with their respective color codes (middle) and average strictosidine production (right). For the strictosidine production, the light and dark blue colors correspond to strain designs that were first found in the first and second DBTL cycle, respectively.
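The learning-curve procedure described for panels A and B can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `learning_curve`, the 20% holdout split, the synthetic data, and the ordinary least-squares stand-in model (used here in place of the deep learning and XGBoost models) are all assumptions made for illustration. It only shows the general idea of repeatedly shuffling the data, training on growing partitions, and recording the training MAE and the holdout MAE with its standard deviation over 10 repeats.

```python
import numpy as np

def learning_curve(X, y, fractions, n_repeats=10, holdout=0.2, seed=0):
    """Hypothetical helper: for each data fraction, return
    (mean training MAE, mean holdout MAE, std of holdout MAE)."""
    rng = np.random.default_rng(seed)
    results = {}
    for frac in fractions:
        n = int(round(frac * len(y)))           # size of this partition
        train_mae, test_mae = [], []
        for _ in range(n_repeats):
            idx = rng.permutation(len(y))[:n]   # random shuffle, keep n points
            n_test = max(int(round(holdout * n)), 1)
            test_idx, train_idx = idx[:n_test], idx[n_test:]
            # Stand-in model (assumption): least squares with an intercept.
            A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
            A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
            coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
            train_mae.append(np.mean(np.abs(A_train @ coef - y[train_idx])))
            test_mae.append(np.mean(np.abs(A_test @ coef - y[test_idx])))
        results[frac] = (
            float(np.mean(train_mae)),
            float(np.mean(test_mae)),
            float(np.std(test_mae)),
        )
    return results

# Synthetic data for illustration only (not the strain measurements).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + data_rng.normal(scale=0.1, size=200)

curve = learning_curve(X, y, fractions=(0.2, 0.4, 0.6, 0.8, 1.0))
for frac, (tr, ho, sd) in curve.items():
    print(f"{frac:.0%} of data: train MAE={tr:.3f}, holdout MAE={ho:.3f} +/- {sd:.3f}")
```

Plotting the first value against the partition size would give a curve analogous to the blue line, and the second value with its standard deviation would give one analogous to the yellow line.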