AI - Super models

The impact of AI/data-driven drug discovery is evolving rapidly in life sciences.
Harnessing the beneficial potential of this technology, however, requires its optimal integration into R&D, says Biorelate’s Dr Ben Sidders.

Data-driven, AI-enabled drug discovery is becoming a reality, reflected in the rise of ‘AI-first’ companies such as Recursion and Insilico Medicine. Traditional pharmaceutical companies are also embracing AI across their businesses.

Although the perceived value of AI in drug discovery and development has previously fallen short of the hype, a number of point solutions are now having a positive impact within aspects of R&D, providing pointers about how to successfully embed AI at its core.

Today, in target discovery, knowledge graphs are proving adept at integrating a vast number of data sources into a query-able structure, which can be used to make informed and relatively unbiased target prioritisation decisions.
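
As a rough illustration of what a 'query-able structure' can mean in practice, the sketch below stores a handful of evidence triples and ranks targets for a disease by how many distinct sources support them. Every gene, disease and source name is hypothetical, and the scoring rule (count the distinct sources) is an assumption made for the example, not a description of any production knowledge graph.

```python
from collections import defaultdict

# Hypothetical evidence triples: (target, disease, source). Illustrative only.
EDGES = [
    ("GENE_A", "disease_x", "literature"),
    ("GENE_A", "disease_x", "genetics"),
    ("GENE_A", "disease_x", "crispr_screen"),
    ("GENE_B", "disease_x", "literature"),
    ("GENE_B", "disease_x", "literature"),  # repeated source, counted once
    ("GENE_C", "disease_y", "genetics"),
]

def prioritise_targets(edges, disease):
    """Rank targets for a disease by number of distinct evidence sources."""
    sources = defaultdict(set)
    for target, dis, source in edges:
        if dis == disease:
            sources[target].add(source)
    # Sort by evidence breadth (descending), then by name for a stable order.
    return sorted(
        ((target, len(found)) for target, found in sources.items()),
        key=lambda item: (-item[1], item[0]),
    )

ranking = prioritise_targets(EDGES, "disease_x")
print(ranking)
```

Counting distinct sources rather than raw mentions is one simple way to stop a heavily published gene from dominating the ranking, which is the kind of bias the integrated view is meant to reduce.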

Challenges remain, however. Predicting synergistic drug combinations has been the topic of extensive research, and every flavour of AI model has been assessed with only limited success and almost no translational relevance.

Nor are we any nearer to being able to predict the effect of a drug on a given patient without first running a clinical trial.

Progress will require a structured and integrated approach to AI-enabled R&D transformation, spanning data, model, culture and validation considerations.

So far, AI has found most success where the data set is large, complete and, in many cases, generated specifically to solve the problem at hand.

The UNI foundation model for computational pathology, for instance, was trained on >100 million images from 20 tissue types.

In contrast, one of the largest data sets available to train models for drug combination synergy prediction has 910 combinations of 118 drugs, around five orders of magnitude smaller.

This problem is further exacerbated when we look at data from clinical trial cohorts, which is often sparse, and inconsistent in what is measured.

For example, one trial might collect demographics and data for a specific blood-based biomarker; another might also collect genomic data.

Then there are differences in the analysis pipelines applied to all these data. Re-processing and harmonising all of these data types are highly labour intensive, and often only the start of the process.

The features used to train models may not be derived directly from the harmonised data and may need significant further manipulation.
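
The harmonisation and feature-derivation steps described above can be sketched as follows. The two trial records, their field names and the target schema are all hypothetical. The only factual assumption is the unit identity 1 µg/L = 1 ng/mL. Missing measurements are kept as explicit gaps rather than imputed, which is a design choice for the example rather than a recommendation.

```python
# Hypothetical records from two trials that measured different things.
TRIAL_A = [{"patient": "A1", "age": 61, "sex": "F", "biomarker_ng_ml": 4.2}]
TRIAL_B = [{"patient": "B1", "age_years": 55, "gender": "male",
            "biomarker_ug_l": 3.1, "mutated_genes": ["GENE_A"]}]

# Assumed common schema that both trials are mapped onto.
COMMON_FIELDS = ["patient", "age", "sex", "biomarker_ng_ml", "mutated_genes"]

def harmonise_a(rec):
    row = dict.fromkeys(COMMON_FIELDS)  # every field defaults to None
    row.update(rec)                     # trial A already uses the schema names
    return row

def harmonise_b(rec):
    row = dict.fromkeys(COMMON_FIELDS)
    row["patient"] = rec["patient"]
    row["age"] = rec["age_years"]
    row["sex"] = {"male": "M", "female": "F"}[rec["gender"]]
    row["biomarker_ng_ml"] = rec["biomarker_ug_l"]  # 1 ug/L equals 1 ng/mL
    row["mutated_genes"] = rec["mutated_genes"]
    return row

cohort = [harmonise_a(r) for r in TRIAL_A] + [harmonise_b(r) for r in TRIAL_B]

def to_features(row):
    """Derive model features; missing genomics stays explicit, not imputed."""
    genes = row["mutated_genes"]
    return {
        "age": row["age"],
        "is_female": int(row["sex"] == "F"),
        "biomarker_ng_ml": row["biomarker_ng_ml"],
        "has_genomics": int(genes is not None),
        "gene_a_mutated": None if genes is None else int("GENE_A" in genes),
    }

features = [to_features(r) for r in cohort]
for f in features:
    print(f)
```

Even in this toy case, the genomic feature exists for only one of the two patients, which is the sparsity problem in miniature.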

The underlying issue is that pharma’s data, particularly that from clinical trials, was not generated for AI. To exploit data in a meaningful way using AI, companies must develop a data strategy – and be willing to fund and generate data on clinical cohorts if possible – to build useful data of the required scale.

While AI models excel at classification and predictive problems, if AI is to revolutionise drug discovery it must incorporate causality.

Predicting that a drug might work in a new indication is valuable, but it is not the same as explaining why the drug will work in that indication.

To support internal and regulatory decision-making, it is essential to have explainable biology that underpins a mechanistic understanding of the particular drug or disease biology.

The integration of prior knowledge and data-driven insights offers a promising solution.

AI combined with highly accurate causal relationships can yield both a broader array of promising targets and a mechanistic understanding of their biological role in disease.

Cause-and-effect relationships can be mined from the literature and created from experimental data.

These relationships, defining the regulatory interactions between two biological entities, can be combined into structural causal models – a framework to represent and analyse the causal relationships between variables.

Such models provide a systematic way to model how changes in one variable can lead to changes in another.
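
A minimal sketch of a structural causal model, under stated assumptions, might look like the following. The four variables, the linear equations and their coefficients are invented for illustration. The point is only the mechanism: each variable is computed from its direct causes, and an intervention replaces a variable's equation with a fixed value so that the change propagates downstream but not upstream.

```python
# Each variable is a function of its direct causes (its parents).
# Signs encode regulation: the drug inhibits the target, the target
# activates the pathway, and the pathway drives the disease score.
# All coefficients are made up for illustration.
EQUATIONS = {
    "drug_dose":       lambda v: 0.0,
    "target_activity": lambda v: 1.0 - 0.8 * v["drug_dose"],
    "pathway_signal":  lambda v: 0.5 * v["target_activity"],
    "disease_score":   lambda v: 2.0 * v["pathway_signal"],
}
# Causal (topological) order: parents are always evaluated before children.
ORDER = ["drug_dose", "target_activity", "pathway_signal", "disease_score"]

def simulate(interventions=None):
    """Evaluate the model in causal order; interventions override equations."""
    interventions = interventions or {}
    values = {}
    for name in ORDER:
        if name in interventions:
            values[name] = interventions[name]  # do(name = value)
        else:
            values[name] = EQUATIONS[name](values)
    return values

baseline = simulate()
treated = simulate({"drug_dose": 1.0})
pathway_blocked = simulate({"pathway_signal": 0.0})

print("baseline disease score:", baseline["disease_score"])
print("treated disease score: ", treated["disease_score"])
```

Blocking the pathway directly lowers the disease score while leaving target activity untouched, which is the distinction between intervening on a variable and merely observing a correlation with it.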

These could be used during the training process of more expansive foundation models, but also to build specific mechanistic models that further describe the output from an upstream finding.

The output from all AI solutions should be validated, experimentally if appropriate, with two provisos. First, the R&D function should be set up so that all data feeds back to the AI model.

This helps to mitigate some of the challenges described above, while ensuring that the model can be continually improved.

For example, every result from the CRISPR screening group should find its way back into the knowledge graph so that future queries can benefit from that data.
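
The feedback loop in the CRISPR example can be sketched in a few lines. The in-memory graph, the function names and the gene and disease labels are all hypothetical. A real knowledge graph would sit in a database with provenance attached to every edge.

```python
from collections import defaultdict

# Hypothetical in-memory knowledge graph:
# target -> set of (evidence_type, disease, supports_link).
graph = defaultdict(set)
graph["GENE_A"].add(("literature", "disease_x", True))

def record_crispr_result(graph, gene, disease, hit):
    """Write one screening result, positive or negative, back into the graph."""
    graph[gene].add(("crispr_screen", disease, hit))

def supporting_evidence(graph, gene, disease):
    """Count the evidence edges that support a gene-disease link."""
    return sum(1 for _, d, supports in graph[gene] if d == disease and supports)

before = supporting_evidence(graph, "GENE_A", "disease_x")
record_crispr_result(graph, "GENE_A", "disease_x", hit=True)
record_crispr_result(graph, "GENE_B", "disease_x", hit=False)  # negatives are kept too
after = supporting_evidence(graph, "GENE_A", "disease_x")

print(before, "->", after)
```

Storing the negative result for the second gene matters as much as the hit: a later query can then distinguish 'tested and inactive' from 'never tested'.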

Second, there needs to be a triage-based validation model. While an AI system is able to identify hundreds of targets, too many to validate in depth, the challenge is to stay open to ‘left-field’ opportunities that AI might highlight.

Orthogonal in silico approaches might be used to go from 1,000 to 100 targets, but to go from 100 to 10 the team should adopt the quickest, most high-throughput experiment to yield the next rung of supporting evidence.
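
The funnel described above can be sketched as two successive filters. Everything here is a placeholder: the target names are generated, and both scoring functions return seeded random numbers standing in for an independent computational check and a fast experimental readout respectively.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# 1,000 hypothetical targets with a made-up AI ranking score.
candidates = [{"target": f"T{i:04d}", "ai_score": rng.random()}
              for i in range(1000)]

def top_n(items, key, n):
    """Keep the n items with the highest value for the given key."""
    return sorted(items, key=lambda item: item[key], reverse=True)[:n]

def orthogonal_in_silico_score(item):
    """Placeholder for an independent computational check (random here)."""
    return rng.random()

def quick_assay_readout(item):
    """Placeholder for a fast, high-throughput experiment (random here)."""
    return rng.random()

# Stage 1: 1,000 -> 100 using orthogonal in silico evidence.
for c in candidates:
    c["in_silico"] = orthogonal_in_silico_score(c)
shortlist_100 = top_n(candidates, "in_silico", 100)

# Stage 2: 100 -> 10 using the quickest available experiment.
for c in shortlist_100:
    c["assay"] = quick_assay_readout(c)
shortlist_10 = top_n(shortlist_100, "assay", 10)

print([c["target"] for c in shortlist_10])
```

Each stage ranks on new evidence rather than on the original AI score, so an unexpected candidate is judged on what the next experiment shows instead of being discarded for looking unfamiliar.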

Underlying many of the data, model and validation issues up to now has been the culture of the organisation and its failure to fully adapt to an AI-driven way of thinking or working.

While there are increasing efforts to bridge this gap, upskilling or recruiting talent with AI expertise is essential.

At the same time, data scientists must be educated in the decision-making process of R&D, and must understand and develop methods that directly support it. More could also be done to build the understanding that AI will raise the productivity level of all R&D researchers, and is therefore an opportunity and not a threat.
