Identifying Reader Fatigue in Oncology Trial Imaging

Reader fatigue in radiology can lead to a high rate of medical errors. Of the more than 11,000 preventable deaths in English NHS acute hospitals each year, over 8,000 are attributed to diagnostic error.

Interpreting medical images is challenging because the task is repetitive, and errors (false negatives) are relatively common. Radiologists are also subject to visual fatigue from continuous, prolonged exposure to computer screens while interpreting images. Radiological errors have been found to be more likely at the end of a shift than at the beginning.

Blinded Independent Central Review (BICR) is one method recommended by the FDA for registration oncology trials (1). The objective evaluation of clinical indicators involving radiological images is an important end-point factor in oncology trials. During BICR, readers are routinely monitored using different models to ensure quality reads.

“Reviewer Disagreement Index may be used as an early indicator for potential reader quality and fatigue if a particular reader is concurrently reading on multiple trials.”

– Manish Sharma, MD, VP Medical Imaging, Calyx

Reader Monitoring Approaches

Monitoring of reviewers is not only mandatory from a regulatory viewpoint but also critical for proactive intervention throughout the trial. Monitoring safeguards high-quality radiological assessments as well as adherence to specific clinical trial protocols. The BICR process always carries inherent variability because reviewers differ in background and training and are subject to human factors. Despite this inherent variability, there is a lack of meaningful methods for tracking and proactively improving reviewer performance.

“Double read with adjudication” is one of the most frequently used models in oncology trial reviews. This method permits better radiological assessments than a single read model. If monitored properly, double read with adjudication makes the BICR process robust enough that variability and even minor errors do not affect the study outcome.

Adjudication Rate (AR) has been used as a metric to track reviewer performance, but it does not accurately identify reviewer performance issues because it depends on many external variables such as the adjudication trigger, endpoint, indication, and tumor burden. Adjudicator Agreement Rate (AAR) is a relative performance indicator for a given reviewer, where a higher AAR suggests better reader performance.

Evaluating Reader Fatigue with Reviewer Disagreement Index

In an earlier blog, we presented the Reviewer Disagreement Index (RDI), which counts the subjects for which the adjudicator disagreed with the reviewer and expresses that count relative to the total number of cases read. RDI therefore reflects both the overall AR for a study and the individual AAR for each reader. It gives the proportion of disagreed cases for a given reader out of all cases read, as defined in the equation below. A low RDI value indicates better reader performance and a high RDI value indicates poorer reader performance.

$$\mathrm{RDI} = \frac{\text{number of cases where adjudicator disagreed with given reader}}{\text{total number of all cases read}}$$
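As a minimal sketch of the definition above, the Python snippet below computes RDI for one reader from a list of case records. The `CaseRead` record layout and the function names are hypothetical, chosen only for illustration. The AR and AAR helpers assume the common readings of those terms (AR as the share of the reader's cases that went to adjudication, AAR as the share of adjudicated cases where the adjudicator sided with the reader); the source does not spell these formulas out. Under those assumptions, RDI equals AR × (1 − AAR), which is one way to see how RDI combines the two metrics.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseRead:
    """One case read by a given reviewer (hypothetical record layout)."""
    case_id: str
    adjudicated: bool                          # True if the case went to adjudication
    adjudicator_agreed: Optional[bool] = None  # None when the case was not adjudicated


def reviewer_disagreement_index(reads: List[CaseRead]) -> float:
    """RDI = cases where the adjudicator disagreed with the reader / all cases read."""
    if not reads:
        raise ValueError("RDI is undefined for a reader with no cases")
    disagreed = sum(1 for r in reads if r.adjudicated and r.adjudicator_agreed is False)
    return disagreed / len(reads)


def adjudication_rate(reads: List[CaseRead]) -> float:
    """Assumed definition: adjudicated cases / all cases read."""
    if not reads:
        raise ValueError("AR is undefined for a reader with no cases")
    return sum(1 for r in reads if r.adjudicated) / len(reads)


def adjudicator_agreement_rate(reads: List[CaseRead]) -> float:
    """Assumed definition: cases where the adjudicator agreed / adjudicated cases."""
    adjudicated = [r for r in reads if r.adjudicated]
    if not adjudicated:
        raise ValueError("AAR is undefined when no case was adjudicated")
    return sum(1 for r in adjudicated if r.adjudicator_agreed) / len(adjudicated)


# Illustrative, made-up numbers: 40 cases read, 10 adjudicated,
# adjudicator agreed with this reader on 6 and disagreed on 4.
reads = (
    [CaseRead(f"c{i}", adjudicated=False) for i in range(30)]
    + [CaseRead(f"a{i}", adjudicated=True, adjudicator_agreed=True) for i in range(6)]
    + [CaseRead(f"d{i}", adjudicated=True, adjudicator_agreed=False) for i in range(4)]
)

rdi = reviewer_disagreement_index(reads)
ar = adjudication_rate(reads)
aar = adjudicator_agreement_rate(reads)
print(f"RDI={rdi:.3f}  AR={ar:.3f}  AAR={aar:.3f}")  # RDI=0.100  AR=0.250  AAR=0.600
```

In this made-up example the reader's RDI is 4 / 40 = 0.10, which matches AR × (1 − AAR) = 0.25 × 0.40.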

Not surprisingly, read quality may be related to the number of studies a reader reads on and to how that reader's cases are distributed across those studies. Spreading cases across too many studies can be a major contributor to reader fatigue, for example when a reader covers a few cases each on 10 different trials within a 2-hour reading session. The best-case scenario is a balance that gives the reader a majority of cases on the same study within a reading session.
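One way to put a number on how a reading session is spread across studies is sketched below. The function name `session_mix` and the two summary measures (count of distinct studies and share of cases on the most-read study) are illustrative assumptions, not metrics defined in the source.

```python
from collections import Counter
from typing import List, Tuple


def session_mix(study_ids: List[str]) -> Tuple[int, float]:
    """Summarize one reading session.

    `study_ids` holds one entry per case read, naming the study it belongs to.
    Returns (number of distinct studies, share of cases on the most-read study).
    """
    if not study_ids:
        raise ValueError("empty reading session")
    counts = Counter(study_ids)
    dominant_share = counts.most_common(1)[0][1] / len(study_ids)
    return len(counts), dominant_share


# Made-up sessions for illustration only.
fragmented = [f"trial_{i}" for i in range(10) for _ in range(2)]  # 2 cases on each of 10 trials
focused = ["trial_A"] * 16 + ["trial_B"] * 4                       # mostly one trial

print(session_mix(fragmented))  # (10, 0.1)
print(session_mix(focused))     # (2, 0.8)
```

A session like `focused`, where most cases come from one study, is closer to the balance described above than the `fragmented` session.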

The plots below show a weak correlation between decreasing timepoints (Y-axis) and increasing RDI (X-axis). RDI can give early insight into trends when the ratio of RDI on each study is monitored per week or per reading session, together with its interplay with the overall RDI and the RDI on other studies.

A recent study suggests that RDI can also be used as a good surrogate for reader fatigue (3). RDI may be tracked on a daily, weekly, or monthly basis against read volume to monitor reader fatigue, which affects variability and thus read quality. An increasing RDI trend may be used as an early indicator of potential reader quality and fatigue issues if a particular reader is concurrently reading on multiple trials.
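The periodic monitoring described above could be sketched as follows. The weekly counts are made-up numbers, and the rule used here (flag a reader when RDI rises strictly over the last three weeks) is an illustrative assumption, not a threshold taken from the cited study.

```python
from typing import List, Tuple


def weekly_rdi(weekly_counts: List[Tuple[int, int]]) -> List[float]:
    """Compute RDI per week.

    Each tuple is (cases where the adjudicator disagreed with the reader,
    total cases read) for one week. Weeks with no cases read are skipped.
    """
    return [disagreed / total for disagreed, total in weekly_counts if total > 0]


def rising_trend(values: List[float], window: int = 3) -> bool:
    """True when the last `window` values are strictly increasing."""
    recent = values[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))


# Made-up weekly counts for one reader: (disagreed cases, cases read).
counts = [(2, 40), (3, 42), (5, 45), (8, 50)]
rdi_series = weekly_rdi(counts)

if rising_trend(rdi_series):
    print("RDI has risen for three consecutive weeks; review this reader's workload")
```

The same two functions can be applied per study or per reading session by changing what each tuple in `counts` represents.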

References

1. Guidance for Industry Developing Medical Imaging Drug and Biologic Products. Part 3: Design, Analysis, and Interpretation of Clinical Studies. US Department of Health and Human Services. Food and Drug Administration. Center for Drug Evaluation and Research. Center for Biologics Evaluation and Research; 2004.

2. Clinical Trial Imaging Endpoints Process Standards Guidance for Industry Draft. US Department of Health and Human Services. Food and Drug Administration. Center for Drug Evaluation and Research. Center for Biologics Evaluation and Research. March 2015 Revision 1.

3. Manish Sharma, Madhuri Madasu, Sree Sudha Kota, Surabhi Bajpai, Yibin Shao, Srinivas Pasupuleti, Michael O’Connor, “Using reader disagreement index as a tool for monitoring impact on read quality due to reader fatigue in central reviewers,” Proc. SPIE 12035, Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment, 120350J (4 April 2022); doi: 10.1117/12.2613082