AlphaFlow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning

Paper (Nature Communications): https://doi.org/10.1038/s41467-023-37139-y

Abstract

Closed-loop, autonomous experimentation enables accelerated and material-efficient exploration of large reaction spaces without the need for user intervention. However, autonomous exploration of advanced materials with complex, multi-step processes and data-sparse environments remains a challenge. In this work, we present AlphaFlow, a self-driven fluidic lab capable of autonomous discovery of complex multi-step chemistries. AlphaFlow uses reinforcement learning integrated with a modular microdroplet reactor capable of performing reaction steps with variable sequence, phase separation, washing, and continuous in-situ spectral monitoring. To demonstrate the power of reinforcement learning for high-dimensional, multi-step chemistries, we use AlphaFlow to discover and optimize synthetic routes for shell growth of core-shell semiconductor nanoparticles, inspired by colloidal atomic layer deposition (cALD). Without prior knowledge of conventional cALD parameters, AlphaFlow successfully identified and optimized a novel multi-step reaction route, with up to 40 parameters, that outperformed conventional sequences. Through this work, we demonstrate the capabilities of closed-loop, reinforcement learning-guided systems in exploring and solving challenges in multi-step nanoparticle syntheses, while relying solely on in-house generated data from a miniaturized microfluidic platform. Further application of AlphaFlow to multi-step chemistries beyond cALD can accelerate fundamental knowledge generation as well as the discovery and optimization of synthetic routes.

(a) Illustration of an RL-based feedback loop between the learning agent and the automated experimental environment.
(b) Schematic of the full reactor system with (I) reagent injection, (II) droplet oscillation, (III) optical sampling, (IV) phase separation, (V) waste collection, and (VI) refill modules.
(c) Schematics of individual module functions corresponding to (i) formulation, (ii) synthesis, (iii) characterization, and (iv) phase separation.
(d) General flow diagram of the learning agent's condition selection process.
(e, f) Block diagrams of the two reaction space exploration campaigns: sequence selection (e) and volume-time optimization (f).

P1, P2, P3, and P4 denote an arbitrary set of injection reagents, which in this study are oleylamine, sodium sulfide, cadmium acetate, and formamide, respectively. Sequence selection was performed with constant reagent injection volumes and reaction times, directing the system to select the order in which reagents are injected. Volume-time optimization then used the autonomously learned order of reagent injections from the sequence selection campaign and set the system to identify optimal injection volumes and reaction times for each of the twenty steps.
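
The two campaigns described in the caption can be illustrated with a small toy sketch. This is not the AlphaFlow agent or its reward: `ToyEnvironment`, its reward rule, `run_sequence_selection`, and `volume_time_parameters` are hypothetical names invented for illustration, and the learner is a simple epsilon-greedy estimate of immediate reward rather than the method used in the paper. Only the reagent labels (P1 to P4), their chemical identities, and the twenty-step count come from the caption. Reading the volume-time campaign as one volume and one time per step gives 2 x 20 = 40 parameters, which would match the "up to 40 parameters" in the abstract, but that mapping is an inference.

```python
import random
from typing import Dict, List, Optional, Tuple

# Reagent labels and identities as given in the caption.
REAGENTS: Dict[str, str] = {
    "P1": "oleylamine",
    "P2": "sodium sulfide",
    "P3": "cadmium acetate",
    "P4": "formamide",
}
N_STEPS = 20  # number of injection steps stated in the caption


class ToyEnvironment:
    """Stand-in for the automated fluidic platform. The reward is invented."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def reset(self) -> None:
        self.history = []

    def step(self, action: str) -> float:
        # Hypothetical reward: 1.0 if the reagent differs from the previous one.
        reward = 1.0 if not self.history or self.history[-1] != action else 0.0
        self.history.append(action)
        return reward


def run_sequence_selection(
    env: ToyEnvironment, episodes: int = 200, epsilon: float = 0.2, seed: int = 0
) -> Tuple[List[str], float]:
    """Epsilon-greedy agent: state = previous reagent, action = next reagent.

    Volumes and times are held constant, so only the injection order is chosen.
    """
    rng = random.Random(seed)
    actions = list(REAGENTS)
    states: List[Optional[str]] = [None] + actions
    q = {(s, a): 0.0 for s in states for a in actions}
    counts = {key: 0 for key in q}
    best_seq: List[str] = []
    best_return = float("-inf")
    for _ in range(episodes):
        env.reset()
        prev: Optional[str] = None
        total = 0.0
        for _ in range(N_STEPS):
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(prev, a)])  # exploit
            reward = env.step(action)
            counts[(prev, action)] += 1
            # Running mean of the immediate reward observed for (state, action).
            q[(prev, action)] += (reward - q[(prev, action)]) / counts[(prev, action)]
            prev = action
            total += reward
        if total > best_return:
            best_seq, best_return = list(env.history), total
    return best_seq, best_return


def volume_time_parameters(sequence: List[str]) -> List[str]:
    """Name one volume and one time parameter per step of a fixed sequence."""
    names: List[str] = []
    for i, reagent in enumerate(sequence, start=1):
        names.append(f"step{i:02d}_{reagent}_volume")
        names.append(f"step{i:02d}_{reagent}_time")
    return names


if __name__ == "__main__":
    sequence, score = run_sequence_selection(ToyEnvironment())
    print("learned order:", " ".join(sequence))
    print("toy return:", score)
    print("parameters to optimize:", len(volume_time_parameters(sequence)))
```

Splitting the problem this way mirrors the caption: the discrete ordering is learned first with everything else held fixed, and the continuous volume and time settings are tuned only for that chosen order, which keeps each campaign's search space much smaller than a joint search would be.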