Materials
CaCl2, paraformaldehyde and tri-sodium citrate dihydrate were obtained from Sigma Aldrich. NaOH, thymol and copper(II) sulfate pentahydrate were obtained from Fisher Scientific. 1,3-Dihydroxyacetone dimer was obtained from Fluorochem EU. Resorcinol was obtained from TCI Europe NV. Formaldehyde solutions were prepared by depolymerization of paraformaldehyde at 60 °C; the final formaldehyde concentration was determined by titration with sodium sulfite and phenolphthalein45. For all aqueous solutions, ultrapure water, obtained from an Elga Purelab Chorus 1, was used. Before use, water was degassed by stirring under vacuum for 10–15 min. Ion mobility mass spectrometry experiments were performed with a timsToF instrument (Bruker Daltonics) equipped with an electrospray ionization source operating in positive mode.
Flow reactions
A CSTR (volume 435 μl) with five inlets and an outlet was fabricated from poly(methyl methacrylate) by the Radboud TechnoCentre. LabM8 syringe pumps with BD Plastipak syringes were used to control input flow rates (a schematic drawing and a photograph of the set-up are provided in Extended Data Fig. 1). Syringes were loaded with the specified solutions and connected to the reactor with filled tubing. When the flow of water was divided between two syringes, the flows were fused using a Y connector before reaching the reactor. The reactor and a small outlet tubing were filled according to the initial conditions of the experiment. Once filled, the outlet tubing was capped with a one-way flow check valve and the system was allowed to build up pressure to overcome the crack pressure of the valve. Subsequently, the reactor output was diluted with a water flow (0.8 ml min−1) controlled by a Bruker Elute HPG 1300 high-performance liquid chromatography system. The dilution flow was merged with the reactor outlet with a Y connector. With a subsequent Y connector, the flow was diverted between the instrument and a Restek RT-25020 backpressure regulator connected to a waste line. The backpressure regulator provided a constant pressure of 2 bars in the reservoir.
Inputs to the reactor were controlled by changing the flow rates of selected syringes. For a desired input concentration Cin, the flow rate can be calculated as F = Ftot Cin/Csyr, with Ftot the total flow rate of the system (217.5 μl min−1 in all experiments, corresponding to a residence time of 2 min) and Csyr the concentration of the selected syringe.
Flow inputs
Experimental conditions were selected based on previously published research4, to create high compositional diversity over the used concentration range. The general workflow consisted of first generating a desired input function u(t), and then scaling the generated function to a suitable input profile. To do so, the function was first mean-centred, scaled with a manually chosen factor, which was chosen to maximize amplitude without generating negative flows. After scaling, a baseline value of the corresponding syringe was added, so the profile fluctuated around the initial input concentration. The flow rate of water was used to counterbalance changes in flow rate, to ensure a constant total flow rate of 217.5 μl min−1, corresponding to a residence time of 2 min. All reactions were allowed to equilibrate at steady state for at least 30 min before starting flow profiles, which included another initial 30 min of steady state. Details of the parameters used for generating the various flow profiles are provided in the respective Methods sections.
Mass spectrometry
Trapped ion mobility spectrometry (TIMS) experiments were performed using an N2 carrier gas by scanning inverse ion mobilities from 0.4 Vs cm−2 to 0.84 Vs cm−2. The ramp time was set to 500 ms and the accumulation time to 20 ms to minimize ion activation in the TIMS region. The mass range scanned by the time-of-flight (ToF) analyser was set to m/z 50–650. A complete description of the instrumental parameters is available in Supplementary Information section 1.1.
Ion intensity extraction
A list of ions with reference m/z and inverse mobilities was established based on the most intense signals observed (Supplementary Information section 1.2). Ion chromatograms were then extracted for mass- and mobility-selected ions based on the reference list of ions using the TimsPy library46. Ion chromatograms were extracted with a mass width of 0.02 Da and a mobility width of 0.006 Vs cm−2.
Nonlinear classification
Varying input concentrations of formaldehyde and NaOH were applied with constant concentrations of DHA (50 mM) and CaCl2 (15 mM). For every input in the nonlinear classification dataset, ion signals were collected for 30 min (Supplementary Information section 3.1 and Supplementary Figs. 5 and 6). The output in the last 10 min of this period were averaged to reduce noise and used as steady-state data, resulting in 106-dimensional vectors for all 132 inputs. These vectors were subsequently normalized to remove the mean and scaled to unit variance across features. For the selected nonlinear classification tasks, a linear support vector classifier was trained to obtain classifications of the inputs. For every task, a stratified leave-five-out cross-validation was performed, with 520 repeats in total, with every input as part of the test set 20 times (20 repeats of 26 random splits, five inputs per split), and the Φ score was calculated over the test set for every repeat as
$$\varPhi =\frac{{\rm{TP}}\times {\rm{TN}}-{\rm{FP}}\times {\rm{FN}}}{\sqrt{\left({\rm{TP}}+{\rm{FP}}\right)\left({\rm{TP}}+{\rm{FN}}\right)\left({\rm{TN}}+{\rm{FP}}\right)\left({\rm{TN}}+{\rm{FN}}\right)}}$$
where TP denotes the number of true positives, TN denotes true negatives, FP denotes false positives and FN denotes false negatives. This score returns +1 for perfect predictions, and −1 for completely wrong predictions. The reported Φ accuracy was then obtained as (Φ + 1)/2, and averaged over all 520 repeats. More information is available in Supplementary Information sections 3.2–3.6, and code is provided in the analysis/classification.ipynb notebook.
Complex dynamics prediction
A fluctuating DHA flow profile was sampled from a normal distribution with a mean of 36.25 μl min−1 and a standard deviation of 10.36 μl min−1, corresponding to a mean input concentration of 50 mM, with a standard deviation of approximately 14.268 mM. Each flow rate was held constant for 60 s before switching, with an inversely fluctuating water input to ensure the total flow rate remained constant. The formaldehyde, NaOH and CaCl2 inputs were held constant at 50 mM, 30 mM and 15 mM, respectively.
For the in silico simulation of the carbon metabolism of E. coli, a Systems Biology Markup Language (SBML) model from ref. 36 was adapted to include inflow and outflow terms for every substrate of the form \(\varnothing \to X\left({k}_{{\rm{f}}}{X}_{{\rm{in}}}\right)\) and \(X\to \varnothing \left({k}_{{\rm{f}}}X\right)\). The flow constant (residence time) was set to \({k}_{{\rm{f}}}=0.5\,{\min }^{-1}\), and the inflow concentrations \({X}_{{\rm{in}}}\) were set to the initial concentrations of the model. This modified SBML file was subsequently compiled into a C++ module by the AMICI computational package47 and loaded as a Python module.
To generate the training and test sets, the model was first run for 1,000 min until a steady state was reached. Then, for every step in the fluctuating input pattern, the DHA input flow concentration was set to the corresponding value of the physical fluctuating flow profile. The model was simulated with this input flow for the duration of the physical flow profile (1 min) before the new DHA input flow was set. For every step, the simulation was initialized at the final state of the previous step. By appending the results of all simulation steps, a complete record of the behaviour of the network under fluctuating conditions was obtained.
Next, the same DHA input flow was used as input into the formose reservoir. The response of the formose reservoir was collected every 500 ms, after which the output was averaged over bins of 10 s to reduce noise. The recorded formose reservoir response was trained on the individual substrate time series of the model for 30 min, using a ridge regression algorithm with the regularization strength set to α = 5 × 10−5. The trained weights were then used to predict the substrate time series directly from the reservoir output for the remainder of the measurement time (code available in the analysis/dynamics.ipynb notebook).
Absolute scaled error
To compare predictions for the dynamic tasks to the true values over time, we calculated the absolute scaled error (ASE), which is the absolute error between predictions and true values, divided by the mean absolute error of a naive mean forecast based on the training data. This error measure produces scale-invariant values that can be used to compare predictions across different data scales.
$${\rm{ASE}}(t)=\frac{| \,\widehat{y}(t)-y(t)| }{\frac{1}{{T}_{{\rm{train}}}}\mathop{\sum }\limits_{t}^{{T}_{{\rm{train}}}}| \,y(t)-{\bar{y}}_{{\rm{train}}}| }$$
where ŷ(t) is the prediction at time t, y(t) is the true value at time t and \({\bar{y}}_{{\rm{train}}}\) is the mean value of the train set.
Forecasting
DHA, NaOH and formaldehyde inputs were simultaneously varied according to the dynamics of a Lorenz attractor (ρ = 28, σ = 10, β = 8/3) over the duration of 8 h, with a constant concentration of CaCl2 (15 mM). The x, y and z axes were scaled to the NaOH, DHA and formaldehyde inputs by 1.4, 1.0 and 1.3, respectively. For a more detailed description of the flow profile used, see Supplementary Information section 5.1. The reservoir response was measured every 500 ms, after which the output was averaged over bins of 10 s to reduce noise. Next, a ridge regression algorithm was used with the regularization strength set to α = 5 × 10−5 to train the formose response on the input flows 2 min (120 s) into the future for a duration of 30 min. The trained weights were then used to forecast the input flows 2 min into the future directly from the reservoir output for the remainder of the measurement time (code available in the analysis/forecast.ipynb notebook).
Mutual information
Mutual information is defined for a pair of random variables X and Y as
$$I(X{\rm{;}}Y)=\sum _{Y}\sum _{X}{P}_{(X,Y)}(x,y)\log \,\left(\frac{{P}_{(X,Y)}(x,y)}{{P}_{X}(x){P}_{Y}(y)}\right)$$
with PX and PY the marginal distributions, and P(X,Y) the joint distribution of the random variables. This formula was adapted to calculate the mutual information between a time-dependent input signal u(t) and a single ion output signal at a different time x(t + δt) as
$${I}_{U{\rm{;}}X}\left({\rm{\delta }}t\right)={\sum }_{t}{P}_{U,X}\left({x}_{t+{\rm{\delta }}t},{u}_{t}\right)\mathrm{ln}\frac{{P}_{U,X}\left({x}_{t+{\rm{\delta }}t},{u}_{t}\right)}{{P}_{X}\left({x}_{t+{\rm{\delta }}t}\right){P}_{U}\left({u}_{t}\right)}$$
(1)
where u(t) is a time-dependent input flow and x(t) is a single ion trace over time. An implementation from the Scikit-learn computational package48,49 was used to perform the calculations (code available in the analysis/mutual_information.ipynb notebook).
Chemical read-out for the formose reaction
Benedict’s reagent was prepared by dissolving sodium citrate (8.65 g) and Na2CO3 (5.0 g) in 40 ml of water. CuSO4·5H2O was dissolved in 5 ml of water and slowly mixed with the sodium citrate, sodium carbonate solution. The solution was further diluted to 50 ml total volume. Seliwanoff’s reagent was prepared by dissolving resorcinol (25 mg) or thymol (25 mg) in HCl (3 M, 50 ml). Polytetrafluoroethylene (PTFE) tubing, 1/16” outside diameter × 0.032” inside diameter, was used as a plug-flow reactor, with a total volume of 480 μl. Unless otherwise specified, concentrations were 50 mM for DHA, 30 mM for NaOH, 15 mM for CaCl2 and 100 mM for formaldehyde, with a 4 min residence time. Output was sampled by connecting the plug-flow reactor to a Bio-Rad drop former. Droplets were collected in a liquid nitrogen-cooled microplate, with two droplets (70 μl) per well. To each well, 150 μl of colorimetric reagent (Benedict’s, Seliwanoff’s resorcinol or Seliwanoff’s thymol) was added. The microplates were heated to 100 °C using a Grant-bio PHMP-100, and pictures were taken at regular intervals, using a Nikon Z5 with a Laowa 100-mm F2.8 CA-Dreamer Macro 2X. Images were adapted to plots using OpenCV-Python50.
To achieve a direct read-out of the formose reservoir, we added reagents to the reaction mixture. The overall sum of the concentration of compounds in the mixture ‘multiplied’ by the reaction with the added reagent results in a specific colorimetric response depending on both mixture composition and reagent. As Extended Data Fig. 5 shows, each combination results in a specific hue or colour. The reagents thus function as a fixed read-out layer to the reaction mixture, chemically setting the read-out weights. The reagents tested here have different mechanisms for their colorimetric response: Benedict’s reagent functions under basic conditions, through the oxidation of Cu(II) to Cu(I), and the associated colour changes from blue to red; Seliwanoff’s reagent (both for resorcinol and thymol) functions under acidic conditions, where compounds are first dehydrated towards furfural derivates, followed by a condensation reaction with resorcinol or thymol, forming a coloured dye. Seliwanoff’s reagent classically uses resorcinol, but, in principle, other phenolic compounds can be used. We demonstrate colorimetric responses for a grid consisting of varying DHA and NaOH inputs. For Benedict’s reagent, we observe increasing sensitivity with decreasing NaOH:DHA, compared to Seliwanoff’s reagent, for which we observe increasing sensitivity with increasing NaOH:DHA for both the resorcinol- and the thymol-based reagents.
The chemical read-out allows for the identification of different environmental inputs, especially if combinations of different reagents or reaction times are used. Potentially, this approach may be further extended to incorporate a feedback mechanism that can change the amount and type of reagents and modify other system hyperparameters to perform a specific computational task, either through the inclusion of an in-the-loop computer or directly through physical learning51.