An NLP-driven framework for automated radiology–pathology concordance assessment in breast biopsy

Closed Access

Date

2026

Journal Title

Journal ISSN

Volume Title

Publisher

MDPI Publishing

Access Rights

info:eu-repo/semantics/openAccess

Research Projects

Organizational Units

Journal Issue

Abstract

Background/Objectives: To develop and assess the feasibility of a natural language processing (NLP) framework that automates radiology-pathology concordance assessment in breast biopsy through machine learning-based analysis of unstructured reports.
Methods: This retrospective study included 766 paired radiology and pathology reports from ultrasound- or mammography-guided breast biopsies (August 2020 to May 2024). Reports underwent translation, normalization, tokenization, lemmatization, and synonym expansion, followed by structured encoding of BI-RADS and pathology categories. Three models were trained: a Decision Tree, a LightGBM classifier, and a fine-tuned BioBERT model. Concordance labels were defined by multidisciplinary consensus. Performance metrics included accuracy, sensitivity, specificity, F1-score, area under the curve (AUC), and Cohen's kappa. SHapley Additive exPlanations (SHAP) analysis was used to identify influential features.
Results: Among 766 cases, 707 (92.3%) were concordant and 59 (7.7%) were initially discordant. After excluding B3 lesions (n = 46), 13 true discordant cases remained (1.7%). Including B3 lesions thus raised the proportion of clinically non-concordant or indeterminate cases from 1.7% to 7.7%, indicating that the apparent performance of the models is likely sensitive to case definition and dataset composition. BI-RADS 4a was the most common category (31.3%), and benign pathology (B2) accounted for 64.4% of biopsies. Within this dataset, LightGBM yielded the highest apparent AUC (0.999), while BioBERT showed the strongest agreement with expert consensus (κ = 0.89). SHAP analysis identified clinically meaningful terms such as calcification, hypoechoic, ductal, and carcinoma as key contributors to model predictions. Given the very limited number of true discordant cases, these performance estimates, the AUC in particular, are likely unstable and should be regarded as preliminary, requiring validation in larger, multi-center cohorts.
Conclusions: This study presents a proof-of-concept NLP-based framework for radiology-pathology concordance assessment. The models showed promising performance in identifying potentially discordant cases; however, because true discordant samples were few, the findings should be considered preliminary and require further validation in larger, multi-center datasets before clinical implementation.
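
The case counts and the agreement statistic named in the abstract can be illustrated with a minimal sketch. The counts below are taken from the abstract; the helper names (pct, cohen_kappa) and the ten-case label lists are hypothetical and serve only to show how Cohen's kappa is computed. They are not the study's data or code.

```python
# Counts reported in the abstract.
TOTAL = 766
CONCORDANT = 707
INITIALLY_DISCORDANT = 59
B3_LESIONS = 46


def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Discordant cases remaining once B3 lesions are excluded.
true_discordant = INITIALLY_DISCORDANT - B3_LESIONS


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters assign a single identical category throughout.
    """
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (1 = discordant, 0 = concordant), NOT study data.
expert = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(pct(CONCORDANT, TOTAL), pct(INITIALLY_DISCORDANT, TOTAL))
print(true_discordant, pct(true_discordant, TOTAL))
print(round(cohen_kappa(expert, model), 3))
```

In the toy example a single disagreement on one of two positive cases lowers kappa markedly even though raw agreement is 90%, which is consistent with the abstract's caution that estimates based on very few discordant cases are unstable.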

Description

Keywords

Natural Language Processing, Radiology–Pathology Concordance, Breast Biopsy, Machine Learning, Artificial Intelligence

Source

Diagnostics

WoS Quartile

Q1

Scopus Quartile

Q2

Volume

16

Issue

9

Citation

Esmerer, E., Nazlı, M. A., Uzun-Per, M., Gümüş Değidiben, M., Söyleyici, M., Tahir, E., & Bal, M. (2026). An NLP-driven framework for automated radiology–pathology concordance assessment in breast biopsy. Diagnostics, 16(9), pp. 1-15. https://doi.org/10.3390/diagnostics16091249