An NLP-driven framework for automated radiology–pathology concordance assessment in breast biopsy

Esmerer, Emel; Nazlı, Mehmet Ali; Uzun-Per, Meryem; Gümüş Değidiben, Melike; Söyleyici, Merve; Tahir, Eren; Bal, Mert

dc.authorid	0000-0002-8273-976X
dc.authorid	0000-0003-4605-7822
dc.authorid	0000-0002-4958-4575
dc.authorid	0009-0000-9534-7581
dc.authorid	0009-0008-4804-2927
dc.authorid	0009-0000-0289-1781
dc.authorid	0000-0001-6250-929X
dc.contributor.author	Esmerer, Emel
dc.contributor.author	Nazlı, Mehmet Ali
dc.contributor.author	Uzun-Per, Meryem
dc.contributor.author	Gümüş Değidiben, Melike
dc.contributor.author	Söyleyici, Merve
dc.contributor.author	Tahir, Eren
dc.contributor.author	Bal, Mert
dc.date.accessioned	2026-05-18T08:19:19Z
dc.date.available	2026-05-18T08:19:19Z
dc.date.issued	2026
dc.department	Faculties, Faculty of Engineering and Natural Sciences, Department of Computer Engineering
dc.description.abstract	Background/Objectives: To develop and assess the feasibility of a natural language processing (NLP) framework for automated assessment of radiology-pathology concordance in breast biopsy using machine learning-based analysis of unstructured reports. Methods: This retrospective study included 766 paired radiology and pathology reports from ultrasound- or mammography-guided breast biopsies (August 2020-May 2024). Reports underwent translation, normalization, tokenization, lemmatization, and synonym expansion, followed by structured encoding of BI-RADS and pathology categories. Three models were trained: a Decision Tree, a LightGBM classifier, and a fine-tuned BioBERT model. Concordance labels were defined by multidisciplinary consensus. Performance metrics included accuracy, sensitivity, specificity, F1-score, area under the curve (AUC), and Cohen's kappa. SHapley Additive exPlanations (SHAP) analysis was used to identify influential features. Results: Among 766 cases, 707 (92.3%) were concordant and 59 (7.7%) were initially discordant. After excluding B3 lesions (n = 46), 13 true discordant cases remained (1.7%). Counting B3 lesions therefore raised the proportion of clinically non-concordant or indeterminate cases from 1.7% to 7.7%, indicating that the apparent performance of the models is likely sensitive to case definition and dataset composition. BI-RADS 4a was the most common category (31.3%), and benign pathology (B2) accounted for 64.4% of biopsies. Within this dataset, LightGBM yielded the highest apparent AUC (0.999), while BioBERT showed the strongest agreement with expert consensus (κ = 0.89). SHAP analysis identified clinically meaningful terms such as calcification, hypoechoic, ductal, and carcinoma as key contributors to model predictions. Given the very limited number of true discordant cases, these performance estimates are likely unstable and should be regarded as preliminary. Conclusions: This study presents a proof-of-concept NLP-based framework for radiology-pathology concordance assessment. The models showed promising performance in identifying potentially discordant cases, but the findings require validation in larger, multi-center datasets before clinical implementation.
dc.identifier.citation	Esmerer, E., Nazlı, M. A., Uzun-Per, M., Gümüş Değidiben, M., Söyleyici, M., Tahir, E., & Bal, M. (2026). An NLP-driven framework for automated radiology–pathology concordance assessment in breast biopsy. Diagnostics, 16(9), pp. 1-15. https://doi.org/10.3390/diagnostics16091249
dc.identifier.doi	10.3390/diagnostics16091249
dc.identifier.endpage	15
dc.identifier.issn	2075-4418
dc.identifier.issue	9
dc.identifier.pmid	PMID: 42121953
dc.identifier.scopus	2-s2.0-105038468832
dc.identifier.scopusquality	Q2
dc.identifier.startpage	1
dc.identifier.uri	https://doi.org/10.3390/diagnostics16091249
dc.identifier.uri	https://hdl.handle.net/20.500.13055/1483
dc.identifier.volume	16
dc.identifier.wos	WOS:001764065400001
dc.identifier.wosquality	Q1
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.indekslendigikaynak	PubMed
dc.indekslendigikaynak.other	SCI-E - Science Citation Index Expanded
dc.institutionauthor	Uzun-Per, Meryem
dc.institutionauthorid	0000-0002-4958-4575
dc.language.iso	en
dc.publisher	MDPI Publishing
dc.relation.ispartof	Diagnostics
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Natural Language Processing
dc.subject	Radiology–Pathology Concordance
dc.subject	Breast Biopsy
dc.subject	Machine Learning
dc.subject	Artificial Intelligence
dc.title	An NLP-driven framework for automated radiology–pathology concordance assessment in breast biopsy
dc.type	Article
dspace.entity.type	Publication
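
The case counts quoted in the abstract can be re-derived with a few lines of Python. The sketch below recomputes the reported percentages and then illustrates why the abstract calls estimates based on 13 true discordant cases unstable. The Wilson score interval is an illustrative choice made here, not a method the record attributes to the authors, and the "all 13 cases detected" scenario is hypothetical rather than a reported result.

```python
from math import sqrt

# Case counts as reported in the abstract.
TOTAL = 766
CONCORDANT = 707
INITIALLY_DISCORDANT = 59
B3_LESIONS = 46


def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# True discordant cases are those left after excluding B3 lesions.
true_discordant = INITIALLY_DISCORDANT - B3_LESIONS

print(pct(CONCORDANT, TOTAL))            # 92.3
print(pct(INITIALLY_DISCORDANT, TOTAL))  # 7.7
print(pct(true_discordant, TOTAL))       # 1.7

# Hypothetical scenario (not a reported result): a model that flags all 13
# true discordant cases. The interval on its sensitivity is still wide.
low, high = wilson_interval(true_discordant, true_discordant)
print(round(low, 2), round(high, 2))
```

Even in the best hypothetical case, the lower bound of the interval stays near 0.77, which is consistent with the abstract's caution that performance figures derived from so few discordant cases are preliminary.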

Collections

Computer Engineering Department Collection
PubMed-Indexed Publications Collection
Scopus-Indexed Publications Collection
WoS-Indexed Publications Collection