Advice to a Friend
Engineering is an entirely pragmatic discipline. We attempt to do things, fail (sometimes), and try to learn the lessons from those failures. We embody these lessons in codes of practice, guidance and best practice. At our most elevated, we manage successfully to extract from our experience general principles that, subject to confirmation from properly organised experimentation, have some sort of scientific validity. When it comes to building complex, large-scale systems there is a substantial body of experience and much hard-won knowledge about how such projects should be conceived, organised and conducted. True, there are disagreements, and we certainly do not get it right all the time; nevertheless, many of the critical risks in such projects are typically avoided through good engineering practice.
It is the duty of engineers, then, to sound a warning when we see somebody blithely undertaking a highly complex, large-scale project while disregarding the most obvious steps that are necessary to assure success or, at any rate, to avoid costly failures. In this case it is the biomedical and life sciences which are being called to attention.
The scientific programme they have embarked on is to computationally model the human physiome from genes to physiological processes, by way of cells, metabolic and organ systems. This entails harnessing genetic, proteomic and experimental data in support of this model. In its ultimate realisation the programme will deploy the model as the basis for personalised medicine, and the suitability of medical treatments to an individual case will be determined by reference to the model. This programme is scientifically an immensely challenging one: there are large gaps in the required knowledge, and the relationships between phenomena at the different scales are poorly understood. It is not, however, the towering scale of the science challenge that concerns me but rather the absence of an engineering perspective.
If I am to characterise existing efforts in this space, they could best be described as 'hacking': low-level, ad hoc and impossible to build on.
The immense scale and complexity of the endeavour should be clear, even if we assume the data were in place and the science wholly understood, which, of course, it is not. We are looking at computational models that are larger and much more complex than anything we have built hitherto. The data infrastructure required must handle volumes greater, and structures more elaborate, than anything we have tackled before. And this is before taking into account the implications of personalisation, let alone the considerations associated with applying the model in medical settings. These add up, in short, to the largest computational engineering challenge ever faced.
Now don't get me wrong. I am 'up' for the challenge and do not suggest that this is (from an engineering standpoint; I cannot speak for the science) beyond achievability, though it does perhaps lie at the boundaries of the state of the art. What I ask is that we build the proper engineering foundations first.
What are these engineering foundations? Well, first we need to specify requirements. In this context that means we need to indicate precisely what 'properties' the models must be capable of reasoning about. We need an understanding of how the models are to be used. We also need some real sense of the 'volumetrics': just how large a problem we are facing. We know it is big, but we have no empirically well-founded sense of the dimensions and scale.
Next we need an architecture - a structural and conceptual framework to build on. Such an architecture must incorporate some principles of modularity and definitions of key interfaces. No large engineering project can be undertaken without the ability to reduce complexity by dividing a system into parts. Given that the models we are interested in are multi-dimensional, the types of modularity required are necessarily going to be sophisticated. In particular we need to arrive at some notion of a 'component' that has engineering utility but also corresponds to (or can be mapped onto) biological reality.
We need a plan. We need to marshal our resources. We cannot simply work in an uncoordinated manner at different levels on different phenomena and believe that everything will just 'come together'. I do not expect a unified programme of work, but some sort of roadmap with intermediate targets could perhaps be achieved. In some sense the process of agreeing the roadmap might itself yield valuable outcomes.
We need a framework of constructional tools. These tools would include languages, frameworks for data management, collaborative environments and, critically, an infrastructure for recording progress and managing the versioning and configuration of the model and its component parts.
Finally, we need to have an organised means of testing the model at the component, sub-system and integrated system level. A test set, test harness, test plan, test metrics and rigorous test process are essential in any well-conceived engineering project.
I invite the biomedical and life sciences communities to attend to these engineering foundations or risk the larger scientific programme going awry.