I find reductionism invaluable when modelling systems biology. My background in physics has hammered into me that a model should be as large as it needs to be but no larger. I will by no means argue against heavy reductionism, but I recently came to realise that I have been thinking about models in a way which may not be accurate.

When approaching a complicated biological issue with a reductionistic view, I often ask myself “What basic interaction logic is causing the dynamics we observe experimentally?”. From there, I can either try to reduce our pre-existing knowledge about the system to a minimal reaction network that is capable of reproducing what I’m after, or I can ignore the known interactions and start building a hypothesis based on what dynamical properties the experimental data implies must exist. Do the dynamics seem bistable? Then I would need a positive feedback somewhere. Is the system responding to changes in its input rather than to absolute levels? Then it must have some mechanism of precise adaptation.
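As a minimal sketch of that second kind of reasoning (with entirely made-up parameter values), a single species that activates its own production through a Hill term is about the smallest positive-feedback motif that can show bistability:

```python
# Minimal sketch of "positive feedback can give bistability": one species X that
# activates its own production via a Hill term. Parameter values are assumed,
# chosen only so that two stable states exist.
import numpy as np
from scipy.integrate import solve_ivp

k0, k1, K, n, d = 0.05, 1.0, 0.5, 4, 1.0    # basal production, max feedback, threshold, Hill coefficient, decay

def rhs(t, x):
    return k0 + k1 * x**n / (K**n + x**n) - d * x   # production + autoactivation - linear degradation

for x0 in (0.1, 0.6):                        # start below / above the unstable threshold
    sol = solve_ivp(rhs, (0.0, 50.0), [x0])
    print(x0, "->", float(sol.y[0, -1]))     # settles into a low (~0.05) or a high (~1) state
```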

Regardless of how I conduct my reasoning, I will end up with a small reaction network which I will use to represent the action of a much larger, real network. This method helps cut to the heart of which biochemical architecture is actually important for the observed dynamics. However, during model definition, I would take a rather dogmatic approach to translating a hypothesis into mathematics. I (and, I think, most systems biologists) would build my systems of differential equations following a specific pattern of writing down production, transfer and degradation terms. These terms are populated with parameters which have dimensions and supposedly have some meaning in the real biological system under study. A degradation rate parameter, for example, is mappable to the rate at which a molecule is degraded. Or is it?
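To make that pattern explicit, the canonical form for a species $A$ (with generic, dimensioned rate constants) looks something like

$$\frac{\mathrm{d}[A]}{\mathrm{d}t} \;=\; \underbrace{k_{\text{prod}}}_{\text{production}} \;-\; \underbrace{k_{\text{deg}}\,[A]}_{\text{degradation}},$$

where $k_{\text{deg}}$ is nominally the rate at which $A$ is degraded.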

We assume this simple mappability between model terms and reality, but is it an accurate way of thinking? I don’t think it is news to most systems biologists that the mapping is imperfect, but I also don’t think the issue is explored as fully as it should be.

A model which fully enumerates all the reactions that make up a biological system can be defined in our dogmatic way without a problem (given that the system is fully known). Its parameters have a real correspondence with measurables in the biological system. And, if you populate the model with measured parameters and initial conditions, it will behave just like the biological system does. However, a reductionist model is different. Consider what would happen if you reduced a system with a hundred components and intermediate steps to a model where two components, $A$ and $B$, interact with each other. If we now define our model using the same methods as we would for the fully enumerated model, we end up with the same kinds of production and degradation rate parameters. If we first measured the production and degradation rates of $A$ and $B$ and then used these values in the model, what would we get? The model would most likely not be able to recapitulate the observed dynamics.
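Here is a minimal sketch of that failure mode, with entirely made-up rates: in the “full” model, $A$ drives $B$ through a chain of intermediates, while the reduced model lets $A$ drive $B$ directly using the same measured production and degradation rates. Both versions reach the same steady state, but the reduced model misses the lag that the intermediates introduce.

```python
# Sketch (assumed rates): a "full" model where A drives B through a chain of
# intermediates, versus a reduced model where A drives B directly but reuses the
# directly measured production/degradation rates.
import numpy as np
from scipy.integrate import solve_ivp

k_prod_A, k_deg_A = 1.0, 0.5      # production and degradation of A (assumed "measured" values)
k_act, k_deg_B = 1.0, 0.5         # activation of the pathway by A, degradation of B
k_step, n_steps = 2.0, 8          # rate and number of intermediate steps in the full model

def full(t, y):
    A, inter, B = y[0], y[1:-1], y[-1]
    dA = k_prod_A - k_deg_A * A
    d_inter = np.empty(n_steps)
    d_inter[0] = k_act * A - k_step * inter[0]
    d_inter[1:] = k_step * inter[:-1] - k_step * inter[1:]
    dB = k_step * inter[-1] - k_deg_B * B
    return np.concatenate(([dA], d_inter, [dB]))

def reduced(t, y):
    A, B = y
    return [k_prod_A - k_deg_A * A,
            k_act * A - k_deg_B * B]   # same measured parameters, intermediates removed

t = np.linspace(0.0, 20.0, 200)
y_full = solve_ivp(full, (0.0, 20.0), np.zeros(n_steps + 2), t_eval=t).y
y_reduced = solve_ivp(reduced, (0.0, 20.0), [0.0, 0.0], t_eval=t).y
# y_full[-1] (B in the full model) lags well behind y_reduced[-1]:
# the measured rates alone do not recapitulate the full model's dynamics.
```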

If we instead do parameter inference the other way around, tuning the parameters such that the model dynamics fit the data, we may get model/data agreement. But do we then get agreement between $A$'s degradation rate in the model and the one we measure in the system? Probably not. This is because, when we fitted the reduced model to data, we forced the terms that we let remain to perform actions which in reality are performed by many separate reactions. If the real system has a time delay between the production of $A$ and $B$'s perception of the subsequent signal, then this delay will somehow have to be baked into some of the remaining terms. The degradation rate constant of $A$ is then not doing what we defined it to do, which is to represent the degradation of $A$. Rather, it is being fitted to best approximate the combination of $A$'s degradation and all of the actions that would have been performed by the bits of the network that we simplified away. The mappability of parameters, or even of entire terms, to reality is thus compromised.
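Continuing the sketch above (still with made-up numbers): if we fit the reduced model's two $B$-parameters to the full model's $B$ trajectory, the fitted degradation rate has to absorb the chain's delay and drifts away from the value we “measured”.

```python
# Continuing the sketch above: fit the reduced model's B-parameters to the full
# model's B trajectory and compare the fitted degradation rate with the
# "measured" k_deg_B = 0.5.
from scipy.optimize import minimize

def reduced_B(params):
    k_act_fit, k_deg_fit = params
    def rhs(t, y):
        A, B = y
        return [k_prod_A - k_deg_A * A, k_act_fit * A - k_deg_fit * B]
    return solve_ivp(rhs, (0.0, 20.0), [0.0, 0.0], t_eval=t).y[1]

loss = lambda p: np.sum((reduced_B(p) - y_full[-1]) ** 2)
fit = minimize(loss, x0=[1.0, 0.5], method="Nelder-Mead")
print(fit.x)   # the fitted rates compensate for the missing delay and no longer match the measured values
```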

This is not the end of the world, but it may warrant some introspection into what we are doing when we model. Even with the mappability broken, I still think that one can glean a lot of information from these models. And I think that the reductionistic approach is a net positive. However, I question whether the method of model definition which I have seen taught is actually justifiable. The terms of a model are generally drawn from a library of terms which represent common biochemical processes or small sets thereof (the Michaelis-Menten function, the Hill function, linear degradation/dilution, diffusion, etc.). But if we know that the terms and parameters of the reductionist model are not mappable to single biochemical events, why do we define them as if they were?

I’m not trying to throw the baby out with the bathwater here. I fully concede that the whole point of reductionism is to reduce the complexity of a problem. It would, therefore, not be a good idea to invent new terms that re-introduce the full complexity of the system. But I do think it would be a good idea to expand our library of terms to include ones which concisely represent the actions of larger sub-networks. For example, an arbitrarily long linear multi-step pathway can be accurately represented by a two-parameter gamma-distributed delay (sketched below). I also think that one should conceptually separate the fully enumerated model from the reductionist model. They are fundamentally different; why should they share the same building blocks? Furthermore, both the education and the nomenclature surrounding systems biology modelling tend to conflate the meaning of mathematical terms for these two ways of modelling. Let’s try to stop that.
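To make the gamma-delay point concrete, here is a minimal sketch (chain length and rate are assumptions): the output flux of an $n$-step linear first-order chain driven by some input is exactly the input convolved with a gamma kernel of shape $n$ and scale $1/k$, i.e. the whole chain behaves as a two-parameter gamma-distributed delay.

```python
# Sketch (assumed chain length and rate): an n-step linear first-order chain
# driven by an input pulse produces the same output as the input convolved with
# a gamma(shape=n, scale=1/k) delay kernel.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import gamma

n_steps, k = 20, 2.0                              # assumed chain length and per-step rate
u = lambda t: 1.0 if t < 1.0 else 0.0             # input: a unit pulse of duration 1

def chain(t, x):
    dx = np.empty(n_steps)
    dx[0] = u(t) - k * x[0]                       # first intermediate is driven by the input
    dx[1:] = k * x[:-1] - k * x[1:]               # each step feeds the next at rate k
    return dx

t_grid = np.linspace(0.0, 25.0, 1000)
sol = solve_ivp(chain, (0.0, 25.0), np.zeros(n_steps), t_eval=t_grid, max_step=0.02)
chain_output = k * sol.y[-1]                      # flux leaving the last step

# The same output, written as the input convolved with a gamma(n, 1/k) delay kernel.
dt = t_grid[1] - t_grid[0]
kernel = gamma.pdf(t_grid, a=n_steps, scale=1.0 / k)
delay_output = np.convolve([u(t) for t in t_grid], kernel)[: len(t_grid)] * dt

print(np.max(np.abs(chain_output - delay_output)))   # small; discretisation error only
```

The appeal of such a term is that its two parameters (shape and rate, equivalently the delay's mean and variance) explicitly stand in for a whole sub-network, rather than pretending to be individual reaction rates.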