Deep reinforcement learning for the control of microbial co-cultures in bioreactors

doi:10.1371/journal.pcbi.1007783

Fig 1

Reinforcement learning for the control of two auxotrophic species in a chemostat.

(A) The basic reinforcement learning loop: the agent interacts with its environment through actions, and observes the state of the environment along with a reward. The agent acts to maximise the total reward it receives (the return). (B) System of two auxotrophs, each dependent on a different nutrient, competing for a common carbon source. (C) Diagram of a chemostat. The state observed by the reinforcement learning agent consists of the populations of the two bacterial strains; the actions taken by the agent control the concentrations of the auxotrophic nutrients flowing into the reactor. (D) Representative system trajectory. The agent’s actions, taken at discrete time points (circles), influence the state dynamics (black arrows), with the aim of fulfilling the reward condition (moving to the centre of the green circle). The state comprises the continuously defined abundances of the two microbial populations, N₁ and N₂. The agent’s actions set the inflow of auxotrophic nutrients to the reactor. At each time step, the agent’s reward depends on the distance between the current state and the target state.

doi: https://doi.org/10.1371/journal.pcbi.1007783.g001
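
The loop described in panels (A), (C) and (D) can be illustrated with a minimal sketch. Everything in it beyond the caption is an assumption made for illustration: the Monod-type growth model, all parameter values, the target state, the use of negative Euclidean distance as the reward, and the names `Chemostat`, `placeholder_policy` and `run_episode`. The policy is a hand-written rule standing in for the agent, not a deep reinforcement learning method and not the authors' implementation.

```python
import math

# All values below are hypothetical placeholders, not parameters from the paper.
MU_MAX = 1.0        # maximum growth rate (1/h)
K_AUX = 0.5         # half-saturation constant, auxotrophic nutrients
K_CARBON = 0.5      # half-saturation constant, shared carbon source
YIELD_AUX = 1.0     # biomass yield per unit auxotrophic nutrient
YIELD_CARBON = 1.0  # biomass yield per unit carbon
DILUTION = 0.3      # dilution rate of the chemostat (1/h)
CARBON_IN = 5.0     # carbon concentration in the inflow
TARGET = (1.0, 1.0) # assumed target populations (N1, N2)


def monod(conc, half_sat):
    """Monod saturation term, between 0 and 1."""
    return conc / (half_sat + conc)


class Chemostat:
    """Toy environment: two auxotrophs sharing one carbon source."""

    def __init__(self):
        self.n = [0.5, 0.5]      # populations N1, N2 (the observed state)
        self.c_aux = [0.1, 0.1]  # auxotrophic nutrient concentrations
        self.c0 = 1.0            # carbon concentration

    def observe(self):
        return tuple(self.n)

    def step(self, action, duration=1.0, dt=0.01):
        """Hold the inflow concentrations `action` fixed for `duration`
        and integrate the dynamics with forward Euler."""
        for _ in range(int(round(duration / dt))):
            mu = [MU_MAX * monod(self.c_aux[i], K_AUX) * monod(self.c0, K_CARBON)
                  for i in range(2)]
            dn = [(mu[i] - DILUTION) * self.n[i] for i in range(2)]
            dc = [DILUTION * (action[i] - self.c_aux[i]) - mu[i] * self.n[i] / YIELD_AUX
                  for i in range(2)]
            dc0 = (DILUTION * (CARBON_IN - self.c0)
                   - sum(mu[i] * self.n[i] for i in range(2)) / YIELD_CARBON)
            for i in range(2):
                self.n[i] = max(self.n[i] + dt * dn[i], 0.0)
                self.c_aux[i] = max(self.c_aux[i] + dt * dc[i], 0.0)
            self.c0 = max(self.c0 + dt * dc0, 0.0)
        state = self.observe()
        # Assumed reward: negative Euclidean distance to the target state.
        reward = -math.hypot(state[0] - TARGET[0], state[1] - TARGET[1])
        return state, reward


def placeholder_policy(state):
    """Stand-in for the agent: feed more of a nutrient when its strain
    is below target. This is a fixed rule, not a learned policy."""
    return tuple(max(0.0, 1.0 + 2.0 * (TARGET[i] - state[i])) for i in range(2))


def run_episode(n_steps=20):
    """One pass through the observe/act/reward loop; returns the final
    state and the accumulated reward (the return)."""
    env = Chemostat()
    state = env.observe()
    total_reward = 0.0
    for _ in range(n_steps):
        action = placeholder_policy(state)
        state, reward = env.step(action)
        total_reward += reward
    return state, total_reward


if __name__ == "__main__":
    final_state, episode_return = run_episode()
    print("final populations:", final_state)
    print("return:", episode_return)
```

In a learning setting, `placeholder_policy` would be replaced by an agent that updates its behaviour from the observed rewards; the environment interface (observe, act at discrete time points, receive a distance-based reward) would stay the same.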