Browse by Author "Esmerer, Emel"
Publication
An NLP-driven framework for automated radiology–pathology concordance assessment in breast biopsy
(MDPI Publishing, 2026)
Esmerer, Emel; Nazlı, Mehmet Ali; Uzun-Per, Meryem; Gümüş Değidiben, Melike; Söyleyici, Merve; Tahir, Eren; Bal, Mert
Background/Objectives: To develop and assess the feasibility of a natural language processing (NLP) framework for automated assessment of radiology-pathology concordance in breast biopsy using machine learning-based analysis of unstructured reports. Methods: This retrospective study included 766 paired radiology and pathology reports from ultrasound- or mammography-guided breast biopsies (August 2020-May 2024). Reports underwent translation, normalization, tokenization, lemmatization, and synonym expansion, followed by structured encoding of BI-RADS and pathology categories. Three models were trained: a Decision Tree, a LightGBM classifier, and a fine-tuned BioBERT model. Concordance labels were defined by multidisciplinary consensus. Performance metrics included accuracy, sensitivity, specificity, F1-score, area under the curve (AUC), and Cohen's kappa. SHapley Additive exPlanations (SHAP) analysis was used to identify influential features. Results: Among 766 cases, 707 (92.3%) were concordant and 59 (7.7%) were initially discordant. After excluding B3 lesions (n = 46), 13 true discordant cases remained (1.7%). Including B3 lesions increased the share of clinically non-concordant or indeterminate cases from 1.7% to 7.7%, indicating that the apparent performance of the models is likely sensitive to case definition and dataset composition. BI-RADS 4a was the most common category (31.3%), and benign pathology (B2) accounted for 64.4% of biopsies. Within this dataset, LightGBM yielded the highest apparent AUC (0.999), while BioBERT showed the strongest agreement with expert consensus (κ = 0.89); given the extremely small number of true discordant cases, the AUC estimate is likely unstable and should be interpreted with caution. SHAP analysis identified clinically meaningful terms such as calcification, hypoechoic, ductal, and carcinoma as key contributors to model predictions. Given the very limited number of true discordant cases, these performance estimates are likely unstable and should be regarded as preliminary, requiring validation in larger, multi-center cohorts. Conclusions: This study presents a proof-of-concept NLP-based framework for radiology-pathology concordance assessment. The models showed promising performance in identifying potentially discordant cases; however, given the limited number of true discordant samples, these findings should be considered preliminary and require further validation in larger, multi-center datasets before clinical implementation.
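The case counts reported in the abstract can be checked with a few lines of arithmetic, and the same sketch shows how Cohen's kappa is computed for a binary concordant/discordant label. This is only an illustration under stated assumptions: the helper names `pct` and `cohen_kappa` are hypothetical and not taken from the study, and no confusion matrix is given in the abstract, so the kappa function is defined but not applied to study data.

```python
# Counts reported in the abstract.
total = 766
concordant = 707
b3_lesions = 46

initially_discordant = total - concordant            # 59 cases
true_discordant = initially_discordant - b3_lesions  # 13 cases


def pct(count, n=total):
    """Share of all cases as a percentage, rounded to one decimal (hypothetical helper)."""
    return round(100 * count / n, 1)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 agreement table (hypothetical helper).

    tp/tn: both raters agree (discordant / concordant);
    fp/fn: the two raters disagree.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement derived from each rater's marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected)


print(pct(concordant), pct(initially_discordant), pct(true_discordant))
```

With only 13 true discordant cases, a single reclassified case shifts sensitivity by roughly 1/13, or about 7.7 percentage points, which is one way to see why the abstract calls its estimates preliminary.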












