// CLINICAL TRIALS //
Strengthening data integrity in trials with subjective endpoints
Clinical trials that rely on subjective or semi-subjective endpoints, such as clinician-reported outcomes or patient-reported symptoms, face distinct challenges when it comes to data integrity.
Variability in interpretation, inconsistent scoring and gaps in training can all compromise outcomes, particularly in international or multicentre studies.
This article highlights some lessons learned in the dermatology field, which relies heavily on subjective outcomes, and outlines strategies for improving data integrity.
By implementing outcome assessment training across operational teams, embedding quality thinking from the start-up phase and anticipating interrater variation, teams can help reduce inconsistencies and strengthen the reliability of findings in any field where trial endpoints are difficult to quantify and vulnerable to interpretation.
In dermatology, many trials rely on visible signs and well-defined scoring systems to assess disease severity, including assessments of erythema, lesion thickness and the percentage of body surface area (BSA) affected.
These scores feed directly into composite outcomes and play a central role in determining treatment efficacy.
Even widely used tools such as the Psoriasis Area and Severity Index (PASI) and the Eczema Area and Severity Index (EASI), while structured, rely heavily on clinician interpretation of visual signs, leaving considerable room for variation.
Two investigators might interpret the same skin presentation quite differently, or a single investigator might apply scores inconsistently across different visits.
Even the slightest nuance can affect the composite outcome, influencing data integrity in ways that are hard to detect.
This type of variability may not register as a protocol deviation, but when the data are reviewed holistically, discrepancies in how scoring is applied can raise broader concerns about consistency and reliability.
In composite scoring systems, a small inconsistency in one input, such as misjudging the area affected, can have a knock-on effect on the overall score.
This ‘domino effect’ may skew the apparent severity of a condition or the perceived efficacy of a treatment, particularly in endpoint models where multiple observations are combined to determine response.
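To make the arithmetic concrete, here is a minimal sketch in Python of how a single borderline judgement propagates through the PASI composite. The regional weights, the 0-4 severity scale and the 0-6 area scale follow the standard published PASI definition; the patient scores themselves are hypothetical.

```python
# Minimal sketch of PASI composite scoring, using the standard regional
# weights; the patient scores below are hypothetical.

REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2,
                  "trunk": 0.3, "lower_limbs": 0.4}

def pasi(scores):
    """scores maps region -> (erythema, induration, desquamation, area).

    Each severity sign is rated 0-4; the area score is 0-6, where 2
    means 10-29% of the region is affected and 3 means 30-49%.
    """
    return sum(REGION_WEIGHTS[region] * (e + i + d) * area
               for region, (e, i, d, area) in scores.items())

# Two raters agree on every sign except the trunk area, which sits
# near the 30% boundary between area scores 2 and 3.
rater_a = {"head": (2, 2, 1, 2), "upper_limbs": (2, 1, 1, 3),
           "trunk": (2, 2, 2, 2), "lower_limbs": (3, 2, 2, 3)}
rater_b = dict(rater_a, trunk=(2, 2, 2, 3))  # only the trunk area differs

print(f"Rater A PASI: {pasi(rater_a):.1f}")  # 15.4
print(f"Rater B PASI: {pasi(rater_b):.1f}")  # 17.2
```

In this hypothetical case, one borderline call about affected area shifts the composite by 1.8 points; near a responder threshold such as PASI 75, a divergence of that size can change how a patient's response is classified.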
Rater training for investigators and their teams can help to address this issue, ensuring that all sites, and clinicians within the same site, follow the same procedures.
This can help investigators assess their patients with as much objectivity as possible, and assist CROs in standardising procedures across sites and countries.
It is also important to identify rater drift (a gradual decline in interrater reliability over the course of a trial) and remediate it through further training to ‘recalibrate’ investigators.
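One way to make drift visible, sketched below under illustrative assumptions, is to periodically compare each rater's ordinal scores against a calibrated reference rating and track an agreement statistic such as quadratically weighted kappa (available in scikit-learn as cohen_kappa_score). The scores and the 0.6 retraining trigger here are hypothetical, not validated cut-offs.

```python
# Illustrative drift check: weighted kappa between one rater and a
# calibrated reference, computed per study quarter. All data and the
# retraining threshold are hypothetical.
from sklearn.metrics import cohen_kappa_score

# Ordinal severity scores (0-4) on the same cases, grouped by quarter.
reference = {"Q1": [0, 1, 2, 3, 4, 2, 1, 3],
             "Q2": [1, 2, 3, 0, 4, 2, 3, 1]}
rater     = {"Q1": [0, 1, 2, 3, 4, 2, 2, 3],   # close agreement early on
             "Q2": [3, 4, 4, 2, 4, 4, 4, 3]}   # scores creeping upward

RECALIBRATION_THRESHOLD = 0.6  # illustrative trigger, not a validated cut-off

for quarter in reference:
    # Quadratic weights penalise large disagreements more than
    # near-misses, which suits ordinal clinical scales.
    kappa = cohen_kappa_score(reference[quarter], rater[quarter],
                              weights="quadratic")
    flag = "  <- schedule recalibration" if kappa < RECALIBRATION_THRESHOLD else ""
    print(f"{quarter}: weighted kappa = {kappa:.2f}{flag}")
    # With these sample scores: Q1 ~0.96 (well calibrated),
    # Q2 ~0.35 (flagged for retraining).
```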
In practice, these issues are not limited to the raters themselves.
Operational team members, such as project managers, clinical research associates (CRAs) and clinical trial managers (CTMs), also play a vital role in recognising when data may be affected by inconsistent scoring.
When these team members are trained in the outcome assessments being used, they are better equipped to spot potential problems and initiate discussions with sites that could prevent bigger issues later on.
Familiarising operational staff with the structure and intent of outcome assessments enables them to take a more proactive approach to quality, complementing site-level training without being reliant on it.
Operational awareness of clinical assessments also contributes to a broader culture of accountability.
When everyone involved in the trial understands how endpoints are derived, they’re more likely to engage critically with the data and raise questions early.
This creates a feedback loop where quality issues are identified faster, and cross-functional teams are empowered to respond more effectively.
The opportunity to minimise scoring variability starts early in the process, during site selection, feasibility assessment and protocol design.
At this stage, teams can assess whether sites are equipped and experienced with the specific outcomes in use, and whether they understand how small differences in scoring can affect final results.
In studies with complex or nuanced outcome assessments, it can be helpful to involve operational, medical and data-focused team members in these early conversations.
Ensuring that everyone involved fully understands the scoring methodology, including site-facing teams, helps to create alignment before recruitment begins.
This approach is particularly important in early-phase studies or when supporting smaller biotech sponsors who may need additional input into how outcomes are operationalised across sites.
Although these approaches were developed in dermatology, they are directly applicable to other indications that rely on subjective outcomes.
Rheumatology, for example, often uses complex scoring systems in which up to 90 elements can feed into a single result. The risk of inconsistency is correspondingly greater, but the same principles apply: the assessments may be more granular and multifactorial, yet the challenge remains one of ensuring consistency across raters, time points and locations.
Outcome assessment training across the wider team, early alignment on scoring expectations and careful attention to site performance are transferable strategies that help reduce variability in any therapeutic area where human judgement is part of the endpoint.
In the longer term, digital imaging tools and artificial intelligence (AI)-based scoring systems may also help to reduce variability in some assessments.
While not a replacement for rater training, these technologies could eventually support more objective interpretation of visual data, although validation and practical implementation remain ongoing challenges.
Innovative imaging modalities such as 3D total-body photography are showing great promise in clinical research, as are wearable biosensors that monitor itch and sleep patterns; both also enable remote assessment, supporting the growth of decentralised studies.
AI has made incredible inroads in modern medicine, especially in diagnostic imaging, and is starting to show its applicability in this context.
For example, AI can potentially analyse images, optimising pattern detection, improving the interpretation of results and, ultimately, increasing diagnostic accuracy.
AI has a lot to offer in dermatology, especially with respect to advanced image analysis and automatic interpretation of skin lesions, but it still has a long way to go before it becomes a mainstay in clinical trials.
Ongoing limitations include the quality and diversity of training data, potential biases compounded by a lack of algorithmic transparency, and legal and ethical questions that still need to be addressed.
Developing robust data in clinical trials with subjective endpoints is undoubtedly challenging, but by no means impossible.
By blending rigorous methodology, ongoing rater training and, when appropriate, the latest technologies, research teams can minimise variability and protect data integrity.
Rapidly addressing interrater variation and sources of bias through continuous education is key, as is embedding quality from the earliest planning stages.
A critical part of this effort is ensuring that operational teams, and not just investigators, understand how outcomes are measured and how inconsistencies arise.
This broader awareness enables earlier detection, more meaningful site interactions and better support for consistent scoring throughout the trial.
Janet Overvelde is Senior Director of Project Management at Indero