Automated confidence estimation in deep learning auto-segmentation for brain organs at risk on MRI for radiotherapy.

Alzahrani, NM.; Henry, AM.; Al-Qaisieh, BM.; Murray, LJ.; Nix, MG.

LTHT Author

Alzahrani, Nouf
Henry, Ann
Al-Qaisieh, Bashar
Murray, Louise
Nix, Michael

LTHT Department

Oncology
Medical Physics & Engineering
Leeds Cancer Centre

Publication Date

2024

Item Type

Journal Article

Abstract

PURPOSE: We have built a novel AI-driven QA method called AutoConfidence (ACo) to estimate segmentation confidence on a per-voxel basis without gold-standard segmentations, enabling robust, efficient review of automated segmentation (AS). We have demonstrated this method on brain organ-at-risk (OAR) AS on MRI, using internal and external (third-party) AS models.
METHODS: Thirty-two retrospective, MRI-planned glioma cases were randomly selected from a local clinical cohort for ACo training. A generator was trained adversarially to produce internal autosegmentations (IAS), together with a discriminator that estimates voxel-wise IAS uncertainty given the input MRI. Confidence maps for each proposed segmentation were produced for operator use in AS editing and were compared with "difference to gold-standard" error maps. Nine cases were used to test ACo performance on IAS and to validate it on predictions from two external deep learning segmentation models [an external model with low-quality AS (EM-LQ) and an external model with high-quality AS (EM-HQ)]. Matthews correlation coefficient (MCC), false-positive rate (FPR), false-negative rate (FNR), and visual assessment were used for evaluation. Edge removal and geometric distance corrections were applied to obtain more useful, clinically relevant confidence maps and performance metrics.
RESULTS: ACo showed generally excellent performance on both internal and external segmentations across all OARs except the lenses. MCC was higher on IAS and on low-quality external segmentations (EM-LQ) than on high-quality ones (EM-HQ). On IAS and EM-LQ, average MCC (excluding lenses) ranged from 0.6 to 0.9, while average FPR and FNR were ≤0.13 and ≤0.21, respectively. For EM-HQ, average MCC ranged from 0.4 to 0.8, while average FPR and FNR were ≤0.37 and ≤0.22, respectively.
CONCLUSION: ACo was a reliable predictor of uncertainty and errors on AS generated both internally and externally, demonstrating its potential as an independent, reference-free QA tool, which could help operators deliver robust, efficient autosegmentation in the radiotherapy clinic.