Preprints
P. Piho, P. Thomas (2023) “Feedback between stochastic gene networks and population dynamics enables cellular decision making” biorxiv
D. Volteras, V. Shahrezaei, P. Thomas (2023) “Global transcription regulation revealed from dynamical correlations in time-resolved single-cell RNA-sequencing”
bioRxiv
Peer reviewed
W. Tang, A.C.S. Jørgensen, S. Marguerat, P. Thomas, V. Shahrezaei (2023) “Modelling capture efficiency of single cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics” Bioinformatics btad395
Abstract
Gene expression is characterised by stochastic bursts of transcription that occur at brief and random periods of promoter activity. The kinetics of gene expression burstiness differs across the genome and is dependent on the promoter sequence, among other factors. Single-cell RNA sequencing (scRNA-seq) has made it possible to quantify the cell-to-cell variability in transcription at a global genome-wide level. However, scRNA-seq data is prone to technical variability, including low and variable capture efficiency of transcripts from individual cells. Here, we propose a novel mathematical theory for the observed variability in scRNA-seq data. Our method captures burst kinetics and variability in both cell size and capture efficiency, which allows us to propose several likelihood-based and simulation-based methods for the inference of burst kinetics from scRNA-seq data. Using both synthetic and real data, we show that the simulation-based methods provide an accurate, robust and flexible tool for inferring burst kinetics from scRNA-seq data. In particular, in supervised manner, a simulation-based inference method based on neural networks proves to be accurate and useful in application to both allele and non-allele specific scRNA-seq data.
F. A. Hughes, A. Barr, P. Thomas (2022) “Patterns of interdivision time correlations reveal hidden cell cycle factors” eLife 11, e80927
Abstract
The time taken for cells to complete a round of cell division is a stochastic process controlled, in part, by intracellular factors. These factors can be inherited across cellular generations which gives rise to, often non-intuitive, correlation patterns in cell cycle timing between cells of different family relationships on lineage trees. Here, we formulate a framework of hidden inherited factors affecting the cell cycle that unifies known cell cycle control models and reveals three distinct interdivision time correlation patterns: aperiodic, alternator, and oscillator. We use Bayesian inference with single-cell datasets of cell division in bacteria, mammalian and cancer cells, to identify the inheritance motifs that underlie these datasets. From our inference, we find that interdivision time correlation patterns do not identify a single cell cycle model but generally admit a broad posterior distribution of possible mechanisms. Despite this unidentifiability, we observe that the inferred patterns reveal interpretable inheritance dynamics and hidden rhythmicity of cell cycle factors. This reveals that cell cycle factors are commonly driven by circadian rhythms, but their period may differ in cancer. Our quantitative analysis thus reveals that correlation patterns are an emergent phenomenon that impact cell proliferation and these patterns may be altered in disease.
P. Thomas, V. Shahrezaei (2021) “Coordination of gene expression noise with cell size: analytical results for agent-based models of growing cell populations” Royal Soc Interface 18, 20210274
Summary
We often seek simple, effective models to understand the behaviour of complex systems. To understand gene expression in cells, for example, it is often assumed that effective dilution reactions and extrinsic noise sources affect intracellular concentrations. We develop a theory of cellular agents that grow and divide to investigate how intracellular reaction networks are affected by stochasticity in cell size dynamics and division. We show that stochastic concentration homeostasis - an extension of deterministic concentration homeostasis - guarantees the validity of the effective models, and demonstrate where effective cell models fail and where they perform well.
Abstract
The chemical master equation and the Gillespie algorithm are widely used to model the reaction kinetics inside living cells. It is thereby assumed that cell growth and division can be modelled through effective dilution reactions and extrinsic noise sources. We here re-examine these paradigms through developing an analytical agent-based framework of growing and dividing cells accompanied by an exact simulation algorithm, which allows us to quantify the dynamics of virtually any intracellular reaction network affected by stochastic cell size control and division noise. We find that the solution of the chemical master equation – including static extrinsic noise – exactly agrees with the agent-based formulation when the network under study exhibits *stochastic concentration homeostasis*, a novel condition that generalises concentration homeostasis in deterministic systems to higher order moments and distributions. We illustrate stochastic concentration homeostasis for a range of common gene expression networks. When this condition is not met, we demonstrate by extending the linear noise approximation to agent-based models that the dependence of gene expression noise on cell size can qualitatively deviate from the chemical master equation. Surprisingly, the total noise of the agent-based approach can still be well approximated by extrinsic noise models.
J. Kuntz, P. Thomas, G.B. Stan, M. Barahona (2021) “Approximations of countably-infinite linear programs over bounded measure spaces” SIAM J Optimisation 31, 604–625.
Abstract
We study a class of countably infinite linear programs (CILPs) whose feasible sets are bounded subsets of appropriately defined spaces of measures. The optimal value, optimal points, and minimal points of these CILPs can be approximated by solving finite-dimensional linear programs. We show how to construct finite-dimensional programs that lead to approximations with easy-to-evaluate error bounds, and we prove that the errors converge to zero as the size of the finite-dimensional programs approaches that of the original problem. We discuss the use of our methods in the computation of the stationary distributions, occupation measures, and exit distributions of Markov chains.
J. Kuntz, P. Thomas, G.B. Stan, M. Barahona (2021) “Stationary distributions of continuous-time Markov chains: a review of theory and truncation-based approximations” SIAM Review 63, 3–64.
Summary
We review the mathematical properties of truncation-based approximation methods to compute stationary distributions of the chemical master equation and general CTMCs.
Abstract
Computing the stationary distributions of a continuous-time Markov chain (CTMC) involves solving a set of linear equations. In most cases of interest, the number of equations is infinite or too large, and the equations cannot be solved analytically or numerically. Several approximation schemes overcome this issue by truncating the state space to a manageable size. In this review, we first give a comprehensive theoretical account of the stationary distributions and their relation to the long-term behaviour of CTMCs that is readily accessible to non-experts and free of irreducibility assumptions made in standard texts. We then review truncation-based approximation schemes for CTMCs with infinite state spaces paying particular attention to the schemes' convergence and the errors they introduce, and we illustrate their performance with an example of a stochastic reaction network of relevance in biology and chemistry. We conclude by elaborating on computational trade-offs associated with error control and several open questions.
M. Tonn, P. Thomas, M. Barahona, & D.A. Oyarzún (2020). “Computation of single-cell metabolite distributions using mixture models” Front Cell Develop Biol, 8, 1596.
Abstract
Metabolic heterogeneity is widely recognized as the next challenge in our understanding of non-genetic variation. A growing body of evidence suggests that metabolic heterogeneity may result from the inherent stochasticity of intracellular events. However, metabolism has been traditionally viewed as a purely deterministic process, on the basis that highly abundant metabolites tend to filter out stochastic phenomena. Here we bridge this gap with a general method for prediction of metabolite distributions across single cells. By exploiting the separation of time scales between enzyme expression and enzyme kinetics, our method produces estimates for metabolite distributions without the lengthy stochastic simulations that would be typically required for large metabolic models. The metabolite distributions take the form of Gaussian mixture models that are directly computable from single-cell expression data and standard deterministic models for metabolic pathways. The proposed mixture models provide a systematic method to predict the impact of biochemical parameters on metabolite distributions. Our method lays the groundwork for identifying the molecular processes that shape metabolic heterogeneity and its functional implications in disease.
P. Thomas (2020) “Stochastic modelling approaches for single-cell analyses” in Systems Medicine: Integrative Qualitative and Computational Approaches (ed. By O Wolkenhauer), Vol.3, pg. 45-55 Academic Press
Abstract
Single-cell analyses are becoming increasingly important in cell biology and personalized approaches to medicine. Such analyses frequently reveal heterogeneity that exists within and between cells. We give a concise overview of stochastic methods used to analyse non-genetic heterogeneity in models of cell populations and examine several analytical results on the determinants of gene expression noise. We then review models that advanced our understanding of stochastic phenomena in cellular decision making, stem cell differentiation, tissue homoeostasis and cell cycle dynamics.
W. Tang, F. Bertaux, P. Thomas, C. Stefanelli, M. Saint, S. Marguerat, V. Shahrezaei (2020) “bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data” Bioinformatics 36, 1174-1181
Abstract
Normalization of single-cell RNA-sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability, high amounts of missing observations and batch effect typical of scRNA-seq datasets make this task particularly challenging. There is a need for an efficient and unified approach for normalization, imputation and batch effect correction. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We first validate our assumptions by showing this model can reproduce different statistics observed in real scRNA-seq data. We demonstrate using publicly available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule fluorescence in situ hybridization measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared with other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalization, imputation and true count recovery of gene expression measurements from scRNA-seq data.
J. Kuntz, P. Thomas, G.B. Stan, M. Barahona (2019) “Bounding the stationary distributions of the chemical master equation via mathematical programming” J Chem Phys 151, 034109.
Abstract
The stochastic dynamics of biochemical networks are usually modeled with the chemical master equation (CME). The stationary distributions of CMEs are seldom solvable analytically, and numerical methods typically produce estimates with uncontrolled errors. Here, we introduce mathematical programming approaches that yield approximations of these distributions with computable error bounds which enable the verification of their accuracy. First, we use semidefinite programming to compute increasingly tighter upper and lower bounds on the moments of the stationary distributions for networks with rational propensities. Second, we use these moment bounds to formulate linear programs that yield convergent upper and lower bounds on the stationary distributions themselves, their marginals, and stationary averages. The bounds obtained also provide a computational test for the uniqueness of the distribution. In the unique case, the bounds form an approximation of the stationary distribution with a computable bound on its error. In the nonunique case, our approach yields converging approximations of the ergodic distributions. We illustrate our methodology through several biochemical examples taken from the literature: Schlögl’s model for a chemical bifurcation, a two-dimensional toggle switch, a model for bursty gene expression, and a dimerization model with multiple stationary distributions.
M. Tonn, P. Thomas, M. Barahona, D. Oyarzún (2019) “Stochastic modelling reveals mechanisms of metabolic heterogeneity” Commun Biol 2, 108
Abstract
Phenotypic variation is a hallmark of cellular physiology. Metabolic heterogeneity, in particular, underpins single-cell phenomena such as microbial drug tolerance and growth variability. Much research has focussed on transcriptomic and proteomic heterogeneity, yet it remains unclear if such variation permeates to the metabolic state of a cell. Here we propose a stochastic model to show that complex forms of metabolic heterogeneity emerge from fluctuations in enzyme expression and catalysis. The analysis predicts clonal populations to split into two or more metabolically distinct subpopulations. We reveal mechanisms not seen in deterministic models, in which enzymes with unimodal expression distributions lead to metabolites with a bimodal or multimodal distribution across the population. Based on published data, the results suggest that metabolite heterogeneity may be more pervasive than previously thought. Our work casts light on links between gene expression and metabolism, and provides a theory to probe the sources of metabolite heterogeneity.
J. Kuntz, P. Thomas, G.B. Stan, M. Barahona (2019) “The exit time finite state projection scheme: bounding exit distributions and occupation measures of continuous-time Markov chains” SIAM Sci Comput 41, A748-A769
Abstract
We introduce the exit time finite state projection (ETFSP) scheme, a truncation-based method that yields approximations to the exit distribution and occupation measure associated with the time of exit from a domain (i.e., the time of first passage to the complement of the domain) of time-homogeneous continuous-time Markov chains. We prove that (i) the computed approximations bound the measures from below; (ii) the total variation distances between the approximations and the measures decrease monotonically as states are added to the truncation; and (iii) the scheme converges, in the sense that, as the truncation tends to the entire state space, the total variation distances tend to zero. Furthermore, we give a computable bound on the total variation distance between the exit distribution and its approximation, and we delineate the cases in which the bound is sharp. We also revisit the related finite state projection scheme and give a comprehensive account of its theoretical properties. We demonstrate the use of the ETFSP scheme by applying it to two biological examples: the computation of the first passage time associated with the expression of a gene, and the fixation times of competing species subject to demographic noise.
P. Thomas (2019) “Intrinsic and extrinsic noise in gene expression in lineage trees” Nat Sci Rep 9, 474
Abstract
Cell-to-cell heterogeneity is driven by stochasticity in intracellular reactions and the population dynamics. While these sources are usually studied separately, we develop an agent-based framework that accounts for both factors while tracking every single cell of a growing population. Apart from the common intrinsic variability, the framework also predicts extrinsic noise without the need to introduce fluctuating rate constants. Instead, extrinsic fluctuations are explained by cell cycle fluctuations and differences in cell age. We provide explicit formulas to quantify mean molecule numbers, intrinsic and extrinsic noise statistics in two-colour experiments. We find that these statistics differ significantly depending on the experimental setup used to observe the cells. We illustrate this fact using (i) averages over an isolated cell lineage tracked over many generations as observed in the mother machine, (ii) population snapshots with known cell ages as recorded in time-lapse microscopy, and (iii) snapshots with unknown cell ages as measured from static images or flow cytometry. Applying the method to models of stochastic gene expression and feedback regulation elucidates that isolated lineages, as compared to snapshot data, can significantly overestimate the mean number of molecules, overestimate extrinsic noise but underestimate intrinsic noise and have qualitatively different sensitivities to cell cycle fluctuations.
B.M.C. Martins, A.K. Tooke, P. Thomas, J.C.W. Locke (2018) “Cell size control driven by the circadian clock and environment in cyanobacteria” PNAS 115, E11415-E11424
Summary
When and at what size to divide are critical decisions, requiring cells to integrate internal and external cues. While it is known that the 24-h circadian clock and the environment modulate division timings across organisms, how these signals combine to set the size at which cells divide is not understood. Iterating between modeling and experiments, we show that, in both constant and light−dark conditions, the cyanobacterial clock produces distinctly sized and timed subpopulations. These arise from continuous coupling of the clock to the cell cycle, which, in light−dark cycles, steers cell divisions away from dawn and dusk. Stochastic modeling allows us to predict how these effects emerge from the complex interactions between the environment, clock, and cell size control.
Abstract
How cells maintain their size has been extensively studied under constant conditions. In the wild, however, cells rarely experience constant environments. Here, we examine how the 24-h circadian clock and environmental cycles modulate cell size control and division timings in the cyanobacterium Synechococcus elongatus using single-cell time-lapse microscopy. Under constant light, wild-type cells follow an apparent sizer-like principle. Closer inspection reveals that the clock generates two subpopulations, with cells born in the subjective day following different division rules from cells born in subjective night. A stochastic model explains how this behavior emerges from the interaction of cell size control with the clock. We demonstrate that the clock continuously modulates the probability of cell division throughout day and night, rather than solely applying an on−off gate to division, as previously proposed. Iterating between modeling and experiments, we go on to identify an effective coupling of the division rate to time of day through the combined effects of the environment and the clock on cell division. Under naturally graded light−dark cycles, this coupling narrows the time window of cell divisions and shifts divisions away from when light levels are low and cell growth is reduced. Our analysis allows us to disentangle, and predict the effects of, the complex interactions between the environment, clock, and cell size control.
P. Thomas, G. Terradot, V. Danos, A.Y. Weisse (2018) “Sources, propagation and consequences of stochasticity in cellular growth” Nat Commun 9, 4528
Abstract
Phenotypic variation is a hallmark of cellular physiology. Metabolic heterogeneity, in particular, underpins single-cell phenomena such as microbial drug tolerance and growth variability. Much research has focussed on transcriptomic and proteomic heterogeneity, yet it remains unclear if such variation permeates to the metabolic state of a cell. Here we propose a stochastic model to show that complex forms of metabolic heterogeneity emerge from fluctuations in enzyme expression and catalysis. The analysis predicts clonal populations to split into two or more metabolically distinct subpopulations. We reveal mechanisms not seen in deterministic models, in which enzymes with unimodal expression distributions lead to metabolites with a bimodal or multimodal distribution across the population. Based on published data, the results suggest that metabolite heterogeneity may be more pervasive than previously thought. Our work casts light on links between gene expression and metabolism, and provides a theory to probe the sources of metabolite heterogeneity.
M. Voliotis, P. Thomas, C.G. Bowsher, R. Grima (2018) “Stochastic simulation of biomolecular networks in dynamic environments” in Quantitative Biology:Theory, Computational Methods, and Models (q-bio textbook, ed. by B. Munsky, W.S. Hlavacek, L.S. Tsimring ), MIT Press
Abstract
Phenotypic variation is a hallmark of cellular physiology. Metabolic heterogeneity, in particular, underpins single-cell phenomena such as microbial drug tolerance and growth variability. Much research has focussed on transcriptomic and proteomic heterogeneity, yet it remains unclear if such variation permeates to the metabolic state of a cell. Here we propose a stochastic model to show that complex forms of metabolic heterogeneity emerge from fluctuations in enzyme expression and catalysis. The analysis predicts clonal populations to split into two or more metabolically distinct subpopulations. We reveal mechanisms not seen in deterministic models, in which enzymes with unimodal expression distributions lead to metabolites with a bimodal or multimodal distribution across the population. Based on published data, the results suggest that metabolite heterogeneity may be more pervasive than previously thought. Our work casts light on links between gene expression and metabolism, and provides a theory to probe the sources of metabolite heterogeneity.
P. Thomas (2018) “Analysis of cell size homeostasis at the single cell and population level.” Front Phys 6, 64
Abstract
Growth pervades all areas of life from single cells to cell populations to tissues. Cell size often fluctuates significantly from cell to cell and from generation to generation. Here we present a unified framework to predict the statistics of cell size variations within a lineage tree of a proliferating population. We analytically characterize (i) the distributions of cell size snapshots, (ii) the distribution within a population tree, and (iii) the distribution of lineages across the tree. Surprisingly, these size distributions differ significantly from observing single cells in isolation. In populations, cells seemingly grow to different sizes, typically exhibit less cell-to-cell variability and often display qualitatively different sensitivities to cell cycle noise and division errors. We demonstrate the key findings using recent single-cell data and elaborate on the implications for the ability of cells to maintain a narrow size distribution and the emergence of different power laws in these distributions.
P. Thomas (2017) “Making sense of snapshot data: ergodic principle for clonal cell populations” J Royal Soc Interface 14, 20170467
Abstract
Population growth is often ignored when quantifying gene expression levels across clonal cell populations. We develop a framework for obtaining the molecule number distributions in an exponentially growing cell population taking into account its age structure. In the presence of generation time variability, the average acquired across a population snapshot does not obey the average of a dividing cell over time, apparently contradicting ergodicity between single cells and the population. Instead, we show that the variation observed across snapshots with known cell age is captured by cell histories, a single-cell measure obtained from tracking an arbitrary cell of the population back to the ancestor from which it originated. The correspondence between cells of known age in a population with their histories represents an ergodic principle that provides a new interpretation of population snapshot data. We illustrate the principle using analytical solutions of stochastic gene expression models in cell populations with arbitrary generation time distributions. We further elucidate that the principle breaks down for biochemical reactions that are under selection, such as the expression of genes conveying antibiotic resistance, which gives rise to an experimental criterion with which to probe selection on gene expression fluctuations.
M. Priestman, P. Thomas, B.D. Robertson, V. Shahrezaei (2017) “Mycobacteria modify their cell size control under sub-optimal carbon sources” Front Cell Dev Biol 5, 64
Abstract
The decision to divide is the most important one that any cell must make. Recent single cell studies suggest that most bacteria follow an “adder” model of cell size control, incorporating a fixed amount of cell wall material before dividing. Mycobacteria, including the causative agent of tuberculosis Mycobacterium tuberculosis, are known to divide asymmetrically resulting in heterogeneity in growth rate, doubling time, and other growth characteristics in daughter cells. The interplay between asymmetric cell division and adder size control has not been extensively investigated. Moreover, the impact of changes in the environment on growth rate and cell size control have not been addressed for mycobacteria. Here, we utilize time-lapse microscopy coupled with microfluidics to track live Mycobacterium smegmatis cells as they grow and divide over multiple generations, under a variety of growth conditions. We demonstrate that, under optimal conditions, M. smegmatis cells robustly follow the adder principle, with constant added length per generation independent of birth size, growth rate, and inherited pole age. However, the nature of the carbon source induces deviations from the adder model in a manner that is dependent on pole age. Understanding how mycobacteria maintain cell size homoeostasis may provide crucial targets for the development of drugs for the treatment of tuberculosis, which remains a leading cause of global mortality.
A. Andreychenko, L. Bortolussi, R. Grima, P. Thomas, V. Wolf (2017) “Distribution approximations for the chemical master equation: comparison of the method of moments and the system size expansion” Modeling Cellular Systems, 39-66
Abstract
The stochastic nature of chemical reactions has resulted in an increasing research interest in discrete-state stochastic models and their analysis. A widely used approach is the description of the temporal evolution of such systems in terms of a chemical master equation (CME). In this paper we study two approaches for approximating the underlying probability distributions of the CME. The first approach is based on an integration of the statistical moments and the reconstruction of the distribution based on the maximum entropy principle. The second approach relies on an analytical approximation of the probability distribution of the CME using the system size expansion, considering higher order terms than the linear noise approximation. We consider gene expression networks with unimodal and multimodal protein distributions to compare the accuracy of the two approaches. We find that both methods provide accurate approximations to the distributions of the CME while having different benefits and limitations in applications.
F. Fröhlich, P. Thomas, A. Kazeroonian, F.J. Theis, R. Grima, J. Hasenauer (2016) “Inference for stochastic chemical kinetics using moment equations and system size expansion” PLoS Comput Biol 12, e1005030
Summary
In this manuscript, we introduce efficient methods for parameter estimation for stochastic processes. The stochasticity of chemical reactions can influence the average behavior of the considered system. For some biological systems, a microscopic, stochastic description is computationally intractable but a macroscopic, deterministic description too inaccurate. This inaccuracy manifests itself in an error in parameter estimates, which impede the predictive power of the proposed model. Until now, no rigorous analysis on the magnitude of the estimation error exists. We show by means of two simulation examples that using mesoscopic descriptions based on the system size expansions and moment-closure approximations can reduce this estimation error compared to inference using a macroscopic description. This reduction is most pronounced in an intermediate volume regime where the influence of stochasticity on the average behavior is moderately strong. For the JAK/STAT pathway where experimental data is available, we show that one parameter that was not structurally identifiable when using a macroscopic description becomes structurally identifiable when using a mesoscopic description for parameter estimation.
Abstract
Quantitative mechanistic models are valuable tools for disentangling biochemical pathways and for achieving a comprehensive understanding of biological systems. However, to be quantitative the parameters of these models have to be estimated from experimental data. In the presence of significant stochastic fluctuations this is a challenging task as stochastic simulations are usually too time-consuming and a macroscopic description using reaction rate equations (RREs) is no longer accurate. In this manuscript, we therefore consider moment-closure approximation (MA) and the system size expansion (SSE), which approximate the statistical moments of stochastic processes and tend to be more precise than macroscopic descriptions. We introduce gradient-based parameter optimization methods and uncertainty analysis methods for MA and SSE. Efficiency and reliability of the methods are assessed using simulation examples as well as by an application to data for Epo-induced JAK/STAT signaling. The application revealed that even if merely population-average data are available, MA and SSE improve parameter identifiability in comparison to RRE. Furthermore, the simulation examples revealed that the resulting estimates are more reliable for an intermediate volume regime. In this regime the estimation error is reduced and we propose methods to determine the regime boundaries. These results illustrate that inference using MA and SSE is feasible and possesses a high sensitivity.
M. Voliotis, P. Thomas, R. Grima, C.G. Bowsher (2016) “Stochastic simulation of biomolecular networks in dynamic environments” PLoS Comput Biol 12, e1004923
Summary
Simulation algorithms have become indispensable tools in modern quantitative biology, providing deep insight into many biochemical systems, including gene regulatory networks. However, current stochastic simulation approaches handle the effects of fluctuating extracellular signals and upstream processes poorly, either failing to give qualitatively reliable predictions or being very inefficient computationally. Here we introduce the Extrande method, a novel approach for simulation of biomolecular networks embedded in the dynamic environment of the cell and its surroundings. The method is accurate and computationally efficient, and hence fills an important gap in the field of stochastic simulation. In particular, we employ it to study a bacterial decision-making network and demonstrate that robustness to fluctuations from upstream signaling places strong constraints on the design of networks determining cell fate.
Abstract
Simulation of biomolecular networks is now indispensable for studying biological systems, from small reaction networks to large ensembles of cells. Here we present a novel approach for stochastic simulation of networks embedded in the dynamic environment of the cell and its surroundings. We thus sample trajectories of the stochastic process described by the chemical master equation with time-varying propensities. A comparative analysis shows that existing approaches can either fail dramatically, or else can impose impractical computational burdens due to numerical integration of reaction propensities, especially when cell ensembles are studied. Here we introduce the Extrande method which, given a simulated time course of dynamic network inputs, provides a conditionally exact and several orders-of-magnitude faster simulation solution. The new approach makes it feasible to demonstrate—using decision-making by a large population of quorum sensing bacteria—that robustness to fluctuations from upstream signaling places strong constraints on the design of networks determining cell fate. Our approach has the potential to significantly advance both understanding of molecular systems biology and design of synthetic circuits.
P. Thomas (2015) “Systematic approximation methods for stochastic biochemical kinetics” PhD Thesis, University of Edinburgh
Abstract
Experimental studies have shown that the protein abundance in living cells varies from few tens to several thousands molecules per species. Molecular fluctuations roughly scale as the inverse square root of the number of molecules due to the random timing of reactions. It is hence expected that intrinsic noise plays an important role in the dynamics of biochemical networks. The Chemical Master Equation is the accepted description of these systems under well-mixed conditions. Because analytical solutions to this equation are available only for simple systems, one often has to resort to approximation methods. A popular technique is an expansion in the inverse volume to which the reactants are confined, called van Kampen's system size expansion. Its leading order terms are given by the phenomenological rate equations and the linear noise approximation that quantify the mean concentrations and the Gaussian fluctuations about them, respectively. While these approximations are valid in the limit of large molecule numbers, it is known that physiological conditions often imply low molecule numbers. We here develop systematic approximation methods based on higher terms in the system size expansion for general biochemical networks. We present an asymptotic series for the moments of the Chemical Master Equation that can be computed to arbitrary precision in the system size expansion. We then derive an analytical approximation of the corresponding time-dependent probability distribution. Finally, we devise a diagrammatic technique based on the path-integral method that allows to compute time-correlation functions. We show through the use of biological examples that the first few terms of the expansion yield accurate approximations even for low number of molecules. The theory is hence expected to closely resemble the outcomes of single cell experiments.
P. Thomas, R. Grima (2015) “Approximate probability distributions of the master equation” Phys Rev E 92, 012120
Abstract
Master equations are common descriptions of mesoscopic systems. Analytical solutions to these equations can rarely be obtained. We here derive an analytical approximation of the time-dependent probability distribution of the master equation using orthogonal polynomials. The solution is given in two alternative formulations: a series with continuous and a series with discrete support, both of which can be systematically truncated. While both approximations satisfy the system size expansion of the master equation, the continuous distribution approximations become increasingly negative and tend to oscillations with increasing truncation order. In contrast, the discrete approximations rapidly converge to the underlying non-Gaussian distributions. The theory is shown to lead to particularly simple analytical expressions for the probability distributions of molecule numbers in metabolic reactions and gene expression systems.
P. Thomas, C. Fleck, R. Grima, N. Popovic (2014) “System size expansion using Feynman rules and diagrams” J Phys A: Math Theor 47, 455007 • Selected for J Phys A: Highlights 2014 and IOPselect
Abstract
Few analytical methods exist for quantitative studies of large fluctuations in stochastic systems. In this article, we develop a simple diagrammatic approach to the chemical master equation that allows us to calculate multi-time correlation functions which are accurate to any desired order in van Kampenʼs system size expansion. Specifically, we present a set of Feynman rules from which this diagrammatic perturbation expansion can be constructed algorithmically. We then apply the methodology to derive in closed form the leading order corrections to the linear noise approximation of the intrinsic noise power spectrum for general biochemical reaction networks. Finally, we illustrate our results by describing noise-induced oscillations in the Brusselator reaction scheme which are not captured by the common linear noise approximation.
P. Thomas, N. Popovic and R. Grima (2014) “Phenotypic switching in gene regulatory networks” PNAS 111:6994
Summary
Phenotypes of an isogenic cell population are determined by states of low and high gene expression. These are often associated with multiple steady states predicted by deterministic models of gene regulatory networks. Gene expression is, however, regulated via stochastic molecular interactions at transcriptional and translational levels. Here, we show that intrinsic noise can induce multimodality in a wide class of regulatory networks whose corresponding deterministic description lacks multistability, thus offering a plausible alternative mechanism for phenotypic switching without the need for ultrasensitivity or highly cooperative interactions. We derive a general analytical framework for quantifying this phenomenon, thereby elucidating how cells encode decisions, and how they retain memory of transient environmental signals, using common gene regulatory motifs.
Abstract
Noise in gene expression can lead to reversible phenotypic switching. Several experimental studies have shown that the abundance distributions of proteins in a population of isogenic cells may display multiple distinct maxima. Each of these maxima may be associated with a subpopulation of a particular phenotype, the quantification of which is important for understanding cellular decision-making. Here, we devise a methodology which allows us to quantify multimodal gene expression distributions and single-cell power spectra in gene regulatory networks. Extending the commonly used linear noise approximation, we rigorously show that, in the limit of slow promoter dynamics, these distributions can be systematically approximated as a mixture of Gaussian components in a wide class of networks. The resulting closed-form approximation provides a practical tool for studying complex nonlinear gene regulatory networks that have thus far been amenable only to stochastic simulation. We demonstrate the applicability of our approach in a number of genetic networks, uncovering previously unidentified dynamical characteristics associated with phenotypic switching. Specifically, we elucidate how the interplay of transcriptional and translational regulation can be exploited to control the multimodality of gene expression distributions in two-promoter networks. We demonstrate how phenotypic switching leads to birhythmical expression in a genetic oscillator, and to hysteresis in phenotypic induction, thus highlighting the ability of regulatory networks to retain memory.
P. Thomas, A. V. Straube, J. Timmer, C. Fleck, R. Grima (2013) “Signatures of nonlinearity in single cell noise-induced oscillations” J Theor Biol 335:222
Abstract
A class of theoretical models seeks to explain rhythmic single cell data by postulating that they are generated by intrinsic noise in biochemical systems whose deterministic models exhibit only damped oscillations. The main features of such noise-induced oscillations are quantified by the power spectrum which measures the dependence of the oscillatory signal's power with frequency. In this paper we derive an approximate closed-form expression for the power spectrum of any monostable biochemical system close to a Hopf bifurcation, where noise-induced oscillations are most pronounced. Unlike the commonly used linear noise approximation which is valid in the macroscopic limit of large volumes, our theory is valid over a wide range of volumes and hence affords a more suitable description of single cell noise-induced oscillations. Our theory predicts that the spectra have three universal features: (i) a dominant peak at some frequency, (ii) a smaller peak at twice the frequency of the dominant peak and (iii) a peak at zero frequency. Of these, the linear noise approximation predicts only the first feature while the remaining two stem from the combination of intrinsic noise and nonlinearity in the law of mass action. The theoretical expressions are shown to accurately match the power spectra determined from stochastic simulations of mitotic and circadian oscillators. Furthermore it is shown how recently acquired single cell rhythmic fibroblast data displays all the features predicted by our theory and that the experimental spectrum is well described by our theory but not by the conventional linear noise approximation.
P. Thomas, H. Matuschek, R. Grima (2013) “How reliable is the linear noise approximation of gene regulatory networks?” BMC Genomics 14(Suppl 4):S5
Abstract
The linear noise approximation (LNA) is commonly used to predict how noise is regulated and exploited at the cellular level. These predictions are exact for reaction networks composed exclusively of first order reactions or for networks involving bimolecular reactions and large numbers of molecules. It is however well known that gene regulation involves bimolecular interactions with molecule numbers as small as a single copy of a particular gene. It is therefore questionable how reliable are the LNA predictions for these systems.
We implement in the software package intrinsic Noise Analyzer (iNA), a system size expansion based method which calculates the mean concentrations and the variances of the fluctuations to an order of accuracy higher than the LNA. We then use iNA to explore the parametric dependence of the Fano factors and of the coefficients of variation of the mRNA and protein fluctuations in models of genetic networks involving nonlinear protein degradation, post-transcriptional, post-translational and negative feedback regulation. We find that the LNA can significantly underestimate the amplitude and period of noise-induced oscillations in genetic oscillators. We also identify cases where the LNA predicts that noise levels can be optimized by tuning a bimolecular rate constant whereas our method shows that no such regulation is possible. All our results are confirmed by stochastic simulations.
The software iNA allows the investigation of parameter regimes where the LNA fares well and where it does not. We have shown that the parametric dependence of the coefficients of variation and Fano factors for common gene regulatory networks is better described by including terms of higher order than LNA in the system size expansion. This analysis is considerably faster than stochastic simulations due to the extensive ensemble averaging needed to obtain statistically meaningful results. Hence iNA is well suited for performing computationally efficient and quantitative studies of intrinsic noise in gene regulatory networks.
P. Thomas, R. Grima, A. V. Straube (2012) “Rigorous elimination of fast stochastic variables from the linear noise approximation using projection operators” Phys Rev E 86, 041110
Abstract
The linear noise approximation (LNA) offers a simple means by which one can study intrinsic noise in monostable biochemical networks. Using simple physical arguments, we have recently introduced the slow-scale LNA (ssLNA), which is a reduced version of the LNA under conditions of timescale separation. In this paper we present the first rigorous derivation of the ssLNA using the projection operator technique and show that the ssLNA follows uniquely from the standard LNA under the same conditions of timescale separation as those required for the deterministic quasi-steady-state approximation. We also show that the large molecule number limit of several common stochastic model reduction techniques under timescale separation conditions constitutes a special case of the ssLNA.
P. Thomas, H. Matuschek, R. Grima (2012) “Computation of biochemical pathway fluctuations beyond the linear noise approximation using iNA” In Proceedings IEEE Int Conf Bioinformatics Biomed, pp. 1-5
Abstract
The linear noise approximation is commonly used to obtain intrinsic noise statistics for biochemical networks. These estimates are accurate for networks with large numbers of molecules. However it is well known that many biochemical networks are characterized by at least one species with a small number of molecules. We here describe version 0.3 of the software intrinsic Noise Analyzer (iNA) which allows for accurate computation of noise statistics over wide ranges of molecule numbers. This is achieved by calculating the next order corrections to the linear noise approximation's estimates of variance and covariance of concentration fluctuations. The efficiency of the methods is significantly improved by automated just-in-time compilation using the LLVM framework leading to a fluctuation analysis which typically outperforms that obtained by means of exact stochastic simulations. iNA is hence particularly well suited for the needs of the computational biology community.
P. Thomas, H. Matuschek, R. Grima (2012) “Intrinsic Noise Analyzer: A software package for the exploration of stochastic biochemical kinetics using the system size expansion” PloS One 7, e38518
Abstract
The accepted stochastic descriptions of biochemical dynamics under well-mixed conditions are given by the Chemical Master Equation and the Stochastic Simulation Algorithm, which are equivalent. The latter is a Monte-Carlo method, which, despite enjoying broad availability in a large number of existing software packages, is computationally expensive due to the huge amounts of ensemble averaging required for obtaining accurate statistical information. The former is a set of coupled differential-difference equations for the probability of the system being in any one of the possible mesoscopic states; these equations are typically computationally intractable because of the inherently large state space. Here we introduce the software package intrinsic Noise Analyzer (iNA), which allows for systematic analysis of stochastic biochemical kinetics by means of van Kampen’s system size expansion of the Chemical Master Equation. iNA is platform independent and supports the popular SBML format natively. The present implementation is the first to adopt a complementary approach that combines state-of-the-art analysis tools using the computer algebra system Ginac with traditional methods of stochastic simulation. iNA integrates two approximation methods based on the system size expansion, the Linear Noise Approximation and effective mesoscopic rate equations, which to-date have not been available to non-expert users, into an easy-to-use graphical user interface. In particular, the present methods allow for quick approximate analysis of time-dependent mean concentrations, variances, covariances and correlations coefficients, which typically outperforms stochastic simulations. These analytical tools are complemented by automated multi-core stochastic simulations with direct statistical evaluation and visualization. We showcase iNA’s performance by using it to explore the stochastic properties of cooperative and non-cooperative enzyme kinetics and a gene network associated with circadian rhythms. The software iNA is freely available as executable binaries for Linux, MacOSX and Microsoft Windows, as well as the full source code under an open source license.
P. Thomas, A.V. Straube, R. Grima (2012) “The slow-scale linear noise approximation: an accurate, reduced stochastic description of biochemical networks under timescale separation conditions” BMC Syst Biol 6, 39 • Selected as editor’s pick 2012
Abstract
It is well known that the deterministic dynamics of biochemical reaction networks can be more easily studied if timescale separation conditions are invoked (the quasi-steady-state assumption). In this case the deterministic dynamics of a large network of elementary reactions are well described by the dynamics of a smaller network of effective reactions. Each of the latter represents a group of elementary reactions in the large network and has associated with it an effective macroscopic rate law. A popular method to achieve model reduction in the presence of intrinsic noise consists of using the effective macroscopic rate laws to heuristically deduce effective probabilities for the effective reactions which then enables simulation via the stochastic simulation algorithm (SSA). The validity of this heuristic SSA method is a priori doubtful because the reaction probabilities for the SSA have only been rigorously derived from microscopic physics arguments for elementary reactions.
We here obtain, by rigorous means and in closed-form, a reduced linear Langevin equation description of the stochastic dynamics of monostable biochemical networks in conditions characterized by small intrinsic noise and timescale separation. The slow-scale linear noise approximation (ssLNA), as the new method is called, is used to calculate the intrinsic noise statistics of enzyme and gene networks. The results agree very well with SSA simulations of the non-reduced network of elementary reactions. In contrast the conventional heuristic SSA is shown to overestimate the size of noise for Michaelis-Menten kinetics, considerably under-estimate the size of noise for Hill-type kinetics and in some cases even miss the prediction of noise-induced oscillations. A new general method, the ssLNA, is derived and shown to correctly describe the statistics of intrinsic noise about the macroscopic concentrations under timescale separation conditions. The ssLNA provides a simple and accurate means of performing stochastic model reduction and hence it is expected to be of widespread utility in studying the dynamics of large noisy reaction networks, as is common in computational and systems biology.
P. Thomas, A.V. Straube, R. Grima (2011) “Communication: Limitations of the stochastic quasi-steady-state approximation in open biochemical reaction networks” J Chem Phys 135, 181103
Abstract
It is commonly believed that, whenever timescale separation holds, the predictions of reduced chemical master equations obtained using the stochastic quasi-steady-state approximation are in very good agreement with the predictions of the full master equations. We use the linear noise approximation to obtain a simple formula for the relative error between the predictions of the two master equations for the Michaelis-Menten reaction with substrate input. The reduced approach is predicted to overestimate the variance of the substrate concentration fluctuations by as much as 30%. The theoretical results are validated by stochastic simulations using experimental parameter values for enzymes involved in proteolysis, gluconeogenesis, and fermentation.
R. Grima, P. Thomas, A.V. Straube (2011) “How accurate are the nonlinear chemical Fokker-Planck and chemical Langevin equations?” J Chem Phys 135, 084103 • Selected as editor’s choice 2011
Abstract
The chemical Fokker-Planck equation and the corresponding chemical Langevin equation are commonly used approximations of the chemical master equation. These equations are derived from an uncontrolled, second-order truncation of the Kramers-Moyal expansion of the chemical master equation and hence their accuracy remains to be clarified. We use the system-size expansion to show that chemical Fokker-Planck estimates of the mean concentrations and of the variance of the concentration fluctuations about the mean are accurate to order Ω(-3∕2) for reaction systems which do not obey detailed balance and at least accurate to order Ω(-2) for systems obeying detailed balance, where Ω is the characteristic size of the system. Hence, the chemical Fokker-Planck equation turns out to be more accurate than the linear-noise approximation of the chemical master equation (the linear Fokker-Planck equation) which leads to mean concentration estimates accurate to order Ω(-1∕2) and variance estimates accurate to order Ω(-3∕2). This higher accuracy is particularly conspicuous for chemical systems realized in small volumes such as biochemical reactions inside cells. A formula is also obtained for the approximate size of the relative errors in the concentration and variance predictions of the chemical Fokker-Planck equation, where the relative error is defined as the difference between the predictions of the chemical Fokker-Planck equation and the master equation divided by the prediction of the master equation. For dimerization and enzyme-catalyzed reactions, the errors are typically less than few percent even when the steady-state is characterized by merely few tens of molecules.
P. Thomas, A.V. Straube, R. Grima (2010) “Stochastic theory of large-scale enzyme-reaction networks: Finite copy number corrections to rate equation models” J Chem Phys 133, 195101
Abstract
Chemical reactions inside cells occur in compartment volumes in the range of atto- to femtoliters. Physiological concentrations realized in such small volumes imply low copy numbers of interacting molecules with the consequence of considerable fluctuations in the concentrations. In contrast, rate equation models are based on the implicit assumption of infinitely large numbers of interacting molecules, or equivalently, that reactions occur in infinite volumes at constant macroscopic concentrations. In this article we compute the finite-volume corrections (or equivalently the finite copy number corrections) to the solutions of the rate equations for chemical reaction networks composed of arbitrarily large numbers of enzyme-catalyzed reactions which are confined inside a small subcellular compartment. This is achieved by applying a mesoscopic version of the quasisteady-state assumption to the exact Fokker–Planck equation associated with the Poisson representation of the chemical master equation. The procedure yields impressively simple and compact expressions for the finite-volume corrections. We prove that the predictions of the rate equations will always underestimate the actual steady-state substrate concentrations for an enzyme-reaction network confined in a small volume. In particular we show that the finite-volume corrections increase with decreasing subcellular volume, decreasing Michaelis–Menten constants, and increasing enzyme saturation. The magnitude of the corrections depends sensitively on the topology of the network. The predictions of the theory are shown to be in excellent agreement with stochastic simulations for two types of networks typically associated with protein methylation and metabolism.