Self-supervised Text-vision Alignment for Automated Brain MRI Abnormality Detection: A Multicenter Study (ALIGN Study).

Wood, DA.; Guilhem, E.; Kafiabadi, S.; Al Busaidi, A.; Dissanayake, K.; Hammam, A.; Mansoor, N.; Townend, M.; Agarwal, S.; Wei, Y.; Mazumder, A.; Barker, GJ.; Sasieni, P.; Ourselin, S.; Cole, JH.; Nair, N.; Geetha, A.; Onyekwuluje, C.; Dineen, R.; Dhillon, P.; Costigan, C.; Fatania, K.; Igra, M.; Nichols, R.; Saada, J.; Juette, A.; Sultana, R.; Spohr, H.; Booth, TC.

LTHT Author

Fatania, Kavi
Igra, Mark

LTHT Department

Radiology

Publication Date

2025

Item Type

Journal Article

Abstract

Purpose: To develop a self-supervised text-vision framework to detect abnormalities on brain MRI scans by leveraging free-text neuroradiology reports, eliminating the need for expert-labeled training datasets.

Materials and Methods: This retrospective and prospective multicenter study included 81,936 brain MRI examinations and corresponding radiology reports for adult patients at two UK National Health Service (NHS) hospitals from January 2008 to December 2019 for training and internal testing, and 1,369 examinations collected prospectively between March 2022 and March 2024 at four separate NHS hospitals for external testing (clinicaltrials.gov NCT043681). A neuroradiology language model (NeuroBERT) was trained using self-supervised tasks to generate report embeddings. Convolutional neural networks (one per MRI sequence) were trained to map scans to these report embeddings by minimizing mean squared error loss. The framework then detected abnormalities in new examinations by scoring scans against query sentences using text-image similarity. Model diagnostic performance was assessed using the area under the receiver operating characteristic curve (AUC).

Results: The framework achieved an AUC of 0.95 (95% CI: 0.94, 0.97) for normal versus abnormal classification and generalized to the four external sites with examination-level AUCs of 0.90 (95% CI: 0.86, 0.93), 0.87 (95% CI: 0.83, 0.90), 0.86 (95% CI: 0.83, 0.90), and 0.85 (95% CI: 0.81, 0.89). In five zero-shot classification tasks (acute stroke, multiple sclerosis, intracranial hemorrhage, meningioma, and hydrocephalus), the framework achieved a mean AUC of 0.89 (range, 0.77-0.93). For visual-semantic image retrieval, mean precision was 0.84 among the top 15 images across seven pathologies.

Conclusion: The self-supervised text-vision framework accurately detected brain MRI abnormalities without expert-labeled datasets.

© The Author(s) 2025. Published by the Radiological Society of North America under a CC BY 4.0 license.
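The scoring step described in the abstract (comparing a scan's predicted embedding with query-sentence embeddings, then evaluating with AUC) can be illustrated with a minimal sketch. This is not the authors' implementation. The use of cosine similarity, the "abnormal minus normal" scoring rule, the function names, and the two-dimensional toy vectors are all assumptions made for illustration; the abstract states only that scans are scored against query sentences using text-image similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def abnormality_score(scan_embedding, abnormal_query, normal_query):
    """Assumed scoring rule: positive when the scan embedding is closer to
    the 'abnormal' query embedding than to the 'normal' query embedding."""
    return (cosine_similarity(scan_embedding, abnormal_query)
            - cosine_similarity(scan_embedding, normal_query))


def auc(scores, labels):
    """Rank-based AUC: the fraction of (abnormal, normal) pairs in which the
    abnormal case has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy embeddings standing in for model outputs.
abnormal_query = [1.0, 0.0]   # e.g. embedding of an "abnormal" query sentence
normal_query = [0.0, 1.0]     # e.g. embedding of a "normal" query sentence
scan_embeddings = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
labels = [1, 0, 1, 0]         # 1 = abnormal, 0 = normal (made-up ground truth)

scores = [abnormality_score(e, abnormal_query, normal_query)
          for e in scan_embeddings]
print(auc(scores, labels))
```

Because the queries are free-text sentences, the same scoring function could in principle be reused for a different finding by swapping the query embeddings, which is the sense in which the abstract's classification tasks are zero-shot.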