Automated extraction of ESGO operative report fields from free-text surgical notes using large language models in advanced ovarian cancer.

All Authors

Laios, A.
Plakou, G.
Quaranta, M.
Kaufmann, A.
Theophilou, G.
Kalampokis, E.
Fotopoulou, C.

LTHT Author

Laios, Alexandros
Quaranta, Michela
Kaufmann, Angelika
Theophilou, Georgios

LTHT Department

Oncology
Leeds Cancer Centre
Gynaecological Oncology

Non Medic

Publication Date

2026

Item Type

Journal Article

Language

Subject

ARTIFICIAL INTELLIGENCE; OVARIAN NEOPLASMS; GENITAL NEOPLASMS, FEMALE

Subject Headings

Abstract

OBJECTIVE: To determine whether large language models (LLMs) can automatically extract organ-level disease involvement to populate the Surgical Findings section of the European Society of Gynaecological Oncology (ESGO) Operative Report for advanced ovarian cancer.

METHODS: We retrospectively collected 300 operative notes from cytoreductive surgeries performed at a tertiary ESGO-accredited center. Each note was interrogated to identify disease involvement across 35 pre-defined ESGO anatomical sites. For each site, LLMs were tasked with classifying whether disease was present. Their accuracy was compared with expert annotations using F1 scores. Four modern models were selected based on their state-of-the-art performance and suitability for clinical text interpretation. Operative notes were converted into sets of binary (yes/no) questions corresponding to each anatomical site. Models were tested both in their basic form and after targeted enhancement strategies to reduce common errors. These enhancements included adding a clinical terminology list, providing clearer task instructions, and showing a small number of examples.

RESULTS: The models showed good baseline accuracy, with the two top-performing systems achieving F1 scores of 0.851 (95% confidence interval [CI] 0.841 to 0.861) and 0.864 (95% CI 0.854 to 0.873). Following optimization strategies, accuracy increased further, reaching 0.897 (95% CI 0.888 to 0.906) and 0.875 (95% CI 0.866 to 0.884). Performance was highest for key clinical sites, including the omentum, right diaphragm (95%), and ovaries (92%). Lower accuracy was observed for complex anatomical sites such as bowel (small bowel 73%, large bowel 61%) and peritoneal sites (pouch of Douglas 82%, abdominal wall 68%). Frequent errors involved laterality, overlapping anatomical regions, and ambiguous abbreviations. Optimization strategies improved distinction between closely related sites (rectosigmoid vs large bowel/mesentery) and reduced left/right errors.

CONCLUSIONS: With enhancement strategies, LLMs demonstrated near-human performance in extracting ESGO-compliant operative information. Integrating model-assisted extraction into surgical workflows may reduce reporting time, improve completeness, and help standardize operative documentation.
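
The evaluation described in the abstract (one yes/no question per anatomical site, scored against expert annotations with F1) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the site list, prompt wording, and function names (`build_questions`, `parse_answer`, `f1_score`) are assumptions, and the abstract does not state how F1 was aggregated across sites or how the confidence intervals were derived, so neither is reproduced here.

```python
# Illustrative sketch; all names and the prompt wording are hypothetical.

# Hypothetical subset of sites; the study used 35 pre-defined ESGO sites.
SITES = ["omentum", "right diaphragm", "small bowel"]


def build_questions(note: str, sites: list[str]) -> list[dict]:
    """Turn one operative note into one binary (yes/no) question per site."""
    return [
        {
            "site": site,
            "prompt": (
                f"Operative note:\n{note}\n\n"
                f"Is disease present at the {site}? Answer yes or no."
            ),
        }
        for site in sites
    ]


def parse_answer(text: str) -> bool:
    """Map a free-text model reply to a binary label (True means 'yes')."""
    return text.strip().lower().startswith("yes")


def f1_score(gold: list[bool], pred: list[bool]) -> float:
    """F1 for the positive ('disease present') class over paired labels."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 (or undefined) by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up labels (not study data).
gold = [True, True, False, False]
pred = [parse_answer(a) for a in ["Yes.", "no", "yes", "No"]]
score = f1_score(gold, pred)
```

In this sketch, the enhancement strategies mentioned in the abstract (a terminology list, clearer instructions, a few examples) would correspond to extra text prepended to each prompt; their actual content is not given in the record.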

Journal

International Journal of Gynecological Cancer