June 2026 • PharmaTimes Magazine • 14

// DIVERSITY //


Trial blazing!

How AI and real-world data are closing the clinical trial diversity gap

Despite regulatory uncertainty around the US Food and Drug Administration’s Diversity Action Plans (DAPs), most trial sponsors remain firmly committed to building more representative studies.

The missing piece has not been intent; it has been the operational tools to act on it at scale.

AI and real-world data (RWD) are now providing exactly that.
Clinical trial diversity has long sat at the intersection of scientific necessity, ethical obligation and commercial pragmatism.

For decades, the argument has been clear: trials that fail to enrol populations reflective of the real-world patient community yield data that generalises poorly, labels that restrict prescribing and outcomes that underserve the most vulnerable.
Yet progress has been frustratingly slow. This is beginning to change, and the catalyst is not solely regulatory pressure.

A revolution in artificial intelligence (AI) and RWD is giving trial sponsors the operational capability to move diversity from a box-ticking exercise to a core component of trial strategy.

Regulatory backdrop: ambiguous but not absent

The FDA’s June 2024 draft guidance on Diversity Action Plans (DAPs) set out clear expectations: studies should specify enrolment goals disaggregated by race, ethnicity, sex and age.

The legal basis, the Food and Drug Omnibus Reform Act (FDORA), is established.
What remains uncertain is the timing of final guidance, and with it the 180-day implementation clock.

In practice, this ambiguity has created a split market.

Large pharmaceutical companies have largely absorbed DAPs into standard operating procedures, treating them as a scientific baseline rather than a compliance obligation.

Many have public diversity commitments and track enrolment demographics as a standard metric alongside efficacy endpoints.

Biotechs present a more complex picture.

Many are focused on rare diseases and niche populations where diversity obligations interact uncomfortably with finite funding runways.

The instinct to find the fastest path to market, for shareholders and for patients, can create genuine tension with broader enrolment goals.

Yet the penalty for getting this wrong can be severe, and it is not confined to biotechs.

One company recently received approval for a therapy following a pivotal phase 3 trial, but with a label restricted to the narrow population it had enrolled.

Expanding that label will require entirely new trials, at a cost of hundreds of millions of dollars and years of delay that could have been avoided with more thoughtful enrolment from the outset.

The business case for diversity in clinical trials is no longer theoretical.

A restricted label is not just a scientific failure; it is a commercial one.

Against this backdrop, the lull in US regulatory enforcement should not be read as permission to pause.

It is an opportunity to build the right foundations, and AI and RWD are making that task considerably more achievable.

AI and RWD revolution

The phrase ‘AI and real-world data’ risks becoming a catch-all that obscures more than it reveals.

In the context of clinical trial diversity, these tools address specific, well-defined operational challenges across the trial life cycle.

It is worth being precise about where they add value.

1. Feasibility modelling and protocol design

The most common failure mode in diversity planning is leaving it too late.
By the time a protocol is finalised and a site network activated, the parameters that determine who can be enrolled are largely fixed.

Retrospective fixes, such as protocol amendments, site additions and targeted outreach, are expensive, slow and often insufficient.

AI-powered feasibility tools change this by integrating patient-availability data from the outset.

Drawing on large-scale RWD – including electronic health records (EHR), insurance claims, laboratory data and disease registries – these platforms can model the likely patient pool for a given indication across geographies, demographics and site networks before a single patient is screened.

Critically, they can stress-test inclusion and exclusion (I/E) criteria against real-world patient populations.

Many I/E criteria that appear clinically reasonable in isolation turn out to disproportionately exclude specific demographic groups: patients with comorbidities more prevalent in certain ethnic populations, or age thresholds that inadvertently exclude the elderly.

AI can surface these trade-offs, allowing protocol teams to make informed decisions about where scientific rigour is genuinely required and where criteria can be broadened without compromising data quality.

In one example, a sponsor identified that proposed I/E constraints were excluding a significant portion of eligible patients from under-represented groups.

Protocol amendments made before study initiation increased the predicted enrolment rate by 10% to 20% and broadened the eligible participant base, changes that would have been far more costly to implement mid-trial.

  • AI-powered feasibility tools can simulate enrolment scenarios by demographic group before protocol lock
  • I/E criteria analysis identifies exclusionary parameters that may not be scientifically necessary
  • Geographic modelling highlights where under-represented populations exist but are not being reached by proposed site networks
  • Early intervention is substantially cheaper than mid-trial protocol amendments.
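
As a rough illustration of the I/E stress test described above, the sketch below computes, for each criterion taken alone, the share of each demographic group it would exclude, then flags criteria where that share differs sharply between groups. Everything in it is an assumption made for illustration: the cohort is synthetic, the group labels, criteria and thresholds are hypothetical, and a real platform would run over EHR or claims data with clinically reviewed definitions.

```python
from collections import defaultdict

# Synthetic cohort for illustration only; real inputs would come from RWD sources.
PATIENTS = [
    {"group": "A", "age": 45, "egfr": 90, "hba1c": 7.2},
    {"group": "A", "age": 60, "egfr": 80, "hba1c": 8.0},
    {"group": "A", "age": 72, "egfr": 70, "hba1c": 7.5},
    {"group": "A", "age": 75, "egfr": 65, "hba1c": 6.9},
    {"group": "B", "age": 50, "egfr": 58, "hba1c": 8.4},
    {"group": "B", "age": 55, "egfr": 50, "hba1c": 7.8},
    {"group": "B", "age": 62, "egfr": 85, "hba1c": 9.1},
    {"group": "B", "age": 68, "egfr": 95, "hba1c": 7.0},
]

# Hypothetical criteria: name -> predicate that is True when the patient passes.
CRITERIA = {
    "age_18_to_70": lambda p: 18 <= p["age"] <= 70,
    "egfr_at_least_60": lambda p: p["egfr"] >= 60,
    "hba1c_below_10": lambda p: p["hba1c"] < 10,
}


def exclusion_rates(patients, criteria):
    """Return {criterion: {group: share of that group failing the criterion}}."""
    totals = defaultdict(int)
    excluded = {name: defaultdict(int) for name in criteria}
    for patient in patients:
        totals[patient["group"]] += 1
        for name, passes in criteria.items():
            if not passes(patient):
                excluded[name][patient["group"]] += 1
    return {
        name: {group: excluded[name][group] / totals[group] for group in totals}
        for name in criteria
    }


def flag_disparities(rates, max_gap):
    """List criteria whose exclusion rate differs between groups by more than max_gap."""
    flagged = []
    for name, by_group in rates.items():
        if max(by_group.values()) - min(by_group.values()) > max_gap:
            flagged.append(name)
    return flagged
```

Calling `flag_disparities(exclusion_rates(PATIENTS, CRITERIA), 0.2)` on this toy cohort picks out the age and kidney-function thresholds, the kind of finding a protocol team would then review for scientific necessity. A flag is a prompt for that discussion, not a verdict that the criterion should be dropped.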

2. Site selection and network design
Site selection has historically been driven by a familiar shortlist: high-enrolling sites with experienced investigators, existing infrastructure and established site-sponsor relationships.

The result is a network that reflects the demographics of its past patients rather than the demographics needed for the trial at hand.

RWD is enabling a fundamentally different approach.

By analysing patient population data at the site level (including the demographic composition of the catchment population, disease prevalence, standard-of-care patterns and historical enrolment performance), sponsors can identify sites that serve under-represented communities and have the patient pool to support diverse enrolment.

AI adds a further layer by integrating social determinants of health (SDOH) data: transport access, insurance status, language prevalence and community health infrastructure.

A site in a demographically diverse urban area may be poorly served by public transport, making it effectively inaccessible to the very patients it serves.
These factors, invisible in traditional site selection databases, become legible through SDOH analysis.

The output is not just a ranked list of sites, but a diversity-stratified network, one designed from the start to reach the patient populations that matter scientifically and commercially.

Site selection built on historical enrolment performance perpetuates demographic bias.

RWD-driven selection can break that cycle.
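
To make the contrast concrete, the sketch below ranks the same three sites in two ways: by past enrolment, and by an estimate of how many eligible under-represented patients can realistically reach the site. The sites, the figures and the single `transit_access` index are all hypothetical; a real SDOH model would combine many more factors, and multiplying the three inputs together is only one simple way of combining them.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    eligible_patients: int         # estimated eligible patients in the catchment
    underrepresented_share: float  # share of those from under-represented groups (0 to 1)
    transit_access: float          # hypothetical SDOH accessibility index (0 to 1)
    past_enrolment: int            # patients enrolled in the sponsor's prior trials


# Illustrative figures only, not drawn from any real site database.
SITES = [
    Site("Academic centre", 400, 0.10, 0.9, 120),
    Site("Urban clinic", 300, 0.60, 0.3, 10),
    Site("Community hospital", 250, 0.50, 0.8, 25),
]


def reachable_underrepresented(site):
    """Estimate eligible under-represented patients who can realistically attend."""
    return site.eligible_patients * site.underrepresented_share * site.transit_access


def rank_by_history(sites):
    """Traditional ranking: the sites that enrolled most in the past come first."""
    return sorted(sites, key=lambda s: s.past_enrolment, reverse=True)


def rank_by_reach(sites):
    """Diversity-aware ranking: the sites with the largest reachable pool come first."""
    return sorted(sites, key=reachable_underrepresented, reverse=True)
```

In this toy data the academic centre leads the historical ranking but falls to last place on reach. The urban clinic, despite its diverse catchment, is held back by poor transport access, which is the effect SDOH analysis is meant to expose.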

3. Patient identification and real-time matching
Even the most thoughtfully designed trial and carefully selected site network will underperform if patient identification remains slow, inefficient and demographically skewed.

This is where the combination of AI and RWD is perhaps most transformative.
Every day, billions of data points, such as EHR entries, lab results, prescription claims and imaging records, enter healthcare systems.

Historically, the latency in these data streams meant that analyses were retrospective, often months behind clinical reality.

That latency is collapsing. Near-real-time querying of RWD is now possible, enabling clinical teams to identify potentially eligible patients as they enter the healthcare system rather than waiting for periodic data pulls.

AI supercharges this capability in two important ways.

First, it can parse unstructured data (clinical notes, discharge summaries, referral letters) that contain highly relevant patient history but have historically been inaccessible to systematic analysis.

Natural language processing (NLP) models can extract and match this information against I/E criteria at scale, dramatically expanding the pool of identifiable candidates.

Second, AI can apply matching algorithms that are expressly calibrated to surface patients from under-represented groups.

These algorithms flag candidates who meet eligibility criteria but might not appear in structured query results because their conditions are recorded differently, their healthcare interactions are less frequent or their demographic characteristics are under-represented in training data.

This is not about lowering the scientific bar; it is about removing the invisible filters that have always existed in manual identification processes.

The results are measurable. In one deployment, RWD triggers drove 64% of randomised patients across 25 sites within a four-month recruitment period, accelerating timelines by two months.

Critically, real-time matching reached patients who would otherwise have gone unidentified, reducing both screen fail rates and the burden on site staff.

  • NLP models extract eligibility-relevant information from unstructured clinical notes at scale
  • Real-time RWD querying identifies eligible patients as they enter the healthcare system
  • Matching algorithms fine-tuned for demographic diversity surface under-represented candidates
  • Targeted physician outreach and referral pathways convert identified patients to screened candidates
  • Reduced screen fail rates and site burden improve the economics of diversity-focused enrolment.
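
The gap between structured queries and free-text notes can be shown with a deliberately simple sketch. Here a regular expression stands in for an NLP model, which is a large simplification: it has no negation handling, so "no T2DM" would match, and it knows only two spellings. The patient records, the age range and the function names are hypothetical; the only external fact relied on is that ICD-10 category E11 denotes type 2 diabetes mellitus.

```python
import re

# Stand-in for a clinical NLP model: matches a couple of common ways of
# writing the diagnosis. Hypothetical and far from exhaustive.
T2DM_PATTERN = re.compile(r"\b(type\s*(2|ii)\s*diabetes|t2dm)\b", re.IGNORECASE)

# Synthetic records for illustration only.
PATIENTS = [
    {"id": "p1", "codes": {"E11.9"}, "age": 54, "note": "Routine follow-up."},
    {"id": "p2", "codes": set(), "age": 61, "note": "Hx of T2DM, managed with metformin."},
    {"id": "p3", "codes": set(), "age": 47, "note": "Seen for ankle sprain."},
    {"id": "p4", "codes": set(), "age": 82, "note": "Type II diabetes noted at referral."},
]


def has_t2dm_structured(patient):
    """True when a coded ICD-10 E11 diagnosis is present."""
    return any(code.startswith("E11") for code in patient["codes"])


def has_t2dm_in_note(patient):
    """True when the free-text note mentions the diagnosis."""
    return bool(T2DM_PATTERN.search(patient["note"]))


def eligible(patient, use_notes):
    """Apply a hypothetical criteria set: age 18 to 75 with a T2DM diagnosis."""
    in_age_range = 18 <= patient["age"] <= 75
    has_diagnosis = has_t2dm_structured(patient) or (
        use_notes and has_t2dm_in_note(patient)
    )
    return in_age_range and has_diagnosis


def candidates(patients, use_notes):
    """Return the ids of patients who pass the criteria."""
    return [p["id"] for p in patients if eligible(p, use_notes)]
```

With `use_notes=False` only the patient with a coded diagnosis is returned. Switching it on also surfaces the patient whose diagnosis appears only in a clinical note, while the age criterion still excludes the 82-year-old, so the scientific bar is unchanged.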

4. Patient engagement and reducing systemic barriers
Identifying patients is not the same as enrolling them.

Under-represented populations face well-documented barriers to trial participation: distrust of the medical establishment; practical obstacles such as time off work and transport; language and health literacy challenges; and a lack of community-level awareness that trials are even an option.

AI and digital health tools are increasingly being deployed to address these barriers directly.

AI-driven patient engagement platforms can deliver personalised, culturally appropriate communications in the patient’s preferred language, with messaging adjusted to the specific concerns of different demographic communities.

Digital consent processes, remote monitoring capabilities and decentralised trial elements can reduce the burden of participation, making enrolment viable for patients who would have been excluded by the practical demands of traditional site-based trials.

These tools do not replace the community engagement and trust-building work that is essential for reaching underserved populations.

But they provide the infrastructure that makes such outreach scalable and sustainable across a global trial network.

Two principles that apply regardless of company size

The operational reality for a top ten pharmaceutical company looks very different from that of a Series B start-up.

But the underlying principles that make AI and RWD effective for diversity are size-agnostic.

Two stand out.

Start before the protocol is finalised

The sponsors who have made the most consistent progress on trial diversity share one characteristic: they begin the work before protocol finalisation.

Feasibility assessments that incorporate real-world demographic data allow teams to set realistic enrolment goals, identify gaps in the site network and design protocols that do not inadvertently exclude the patients they most need to enrol.

This is not just good practice for diversity.

It is good trial design.

Early feasibility work consistently reduces mid-trial amendments, improves enrolment velocity and produces data that is more generalisable and therefore more commercially valuable.

Meet patients where they are

Once a trial is operational, the opportunity to influence diversity outcomes does not end.

Real-time RWD and AI-powered matching allow clinical teams to continuously identify and engage eligible patients throughout the recruitment period.

This is particularly valuable for reaching patients who interact with healthcare systems infrequently or who are served by sites outside the traditional network.

The combination of digital outreach, community-level engagement infrastructure and real-time data creates a recruitment model that is responsive to the actual distribution of patients, not just the distribution of historically active trial sites.

What sponsors should do now

The regulatory environment will clarify in time.

In the interim, sponsors that invest in building AI and RWD capabilities for diversity will be better positioned on multiple dimensions: scientifically, commercially and ethically.

The practical steps are well defined:

  • Integrate RWD-based demographic feasibility analysis into protocol development as standard, before I/E criteria are finalised
  • Audit site selection processes for demographic bias and incorporate SDOH data into network design
  • Deploy AI-powered patient identification tools explicitly calibrated to surface under-represented populations
  • Invest in decentralised and hybrid trial elements that reduce practical barriers to participation
  • Engage community-level partners and patient advocacy organisations early; digital tools amplify human relationships but do not replace them
  • Track diversity metrics as a standard enrolment KPI, reported alongside efficacy and safety data.
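
The last step, tracking diversity as an enrolment KPI, can be as simple as comparing each group's share of enrolment with its goal. The sketch below assumes goals are expressed as target shares of total enrolment; the group names, targets and function name are hypothetical.

```python
from collections import Counter

# Hypothetical enrolment goals, expressed as target shares of total enrolment.
TARGET_SHARE = {"group_a": 0.50, "group_b": 0.30, "group_c": 0.20}


def enrolment_gaps(enrolled_groups, target_share):
    """Return {group: actual share minus target share}; negative means under-enrolled."""
    counts = Counter(enrolled_groups)
    total = len(enrolled_groups)
    if total == 0:
        # Nothing enrolled yet: every group is short by its full target.
        return {group: -target for group, target in target_share.items()}
    return {
        group: counts[group] / total - target
        for group, target in target_share.items()
    }


def under_enrolled(gaps, tolerance):
    """List groups whose share falls short of target by more than the tolerance."""
    return [group for group, gap in gaps.items() if gap < -tolerance]
```

Recomputed as each patient is randomised, a measure like this lets a team see drift from its goals during recruitment rather than at database lock.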

Closing thoughts

There is a strong scientific and moral consensus that the pharmaceutical industry needs to move forward on diversity, in clinical trials and in patient care.

This is occurring independently of regulation. The tools to act on that consensus at scale now exist and are proven in deployment. The choice for sponsors is not whether to address diversity, but when and how.

Those that build the right data infrastructure and AI capabilities now will find that the challenge of diverse enrolment becomes, over time, not a burden to be managed but a competitive advantage to be leveraged.

And for the patients at the heart of this initiative, those who have historically been excluded from the trials that determine how their conditions are treated, the stakes could not be higher.

This article is based on research and industry experience supporting clinical trial sponsors on diversity strategy.

Suzanne Caruso is General Manager & Executive Vice President, Clinical & Strategic Intelligence at Norstella

Daniel Chancellor is VP of Thought Leadership at Norstella

Claire Riches is VP of Clinical Solutions at Citeline

Fenwick Eckhardt is Associate Director, Solution Consulting Operations at Citeline


Five alive: The points of difference

Accurate efficacy and safety data: different demographic groups metabolise medicines differently due to genetic variation, enzyme levels and metabolic rates, so diverse trials ensure a drug’s safety and effectiveness are understood across the full population who will use it

Identification of group‑specific side effects: some adverse reactions occur more often or more severely in certain genetic or ethnic groups, and broad representation helps researchers detect these risks early

Optimal dosage calibration: a one‑size‑fits‑all dose can cause under‑dosing or toxicity in some sub‑populations, so diverse participation allows more precise dosing based on factors such as body composition, age and biological sex

Better understanding of disease variations: many conditions present differently or carry different risk burdens across demographics, meaning diverse trials reveal insights that support tailored clinical understanding

Reduction of health disparities: underserved communities often face higher disease burdens but lower research representation, and inclusive trials help ensure medical advances address the needs of these populations.