Scientific accuracy of large language models in tilted implant dentistry: A guideline-based comparative evaluation

Authors: Yıldız, Mehmet S.; Alkap, Melek; Özdal, Umut; Özdal Zincir, Özge
Date: 2026-04-28
Publication year: 2026
Citation: Yıldız, M. S., Alkap, M., Özdal, U., & Özdal Zincir, Ö. (2026). Scientific accuracy of large language models in tilted implant dentistry: A guideline-based comparative evaluation. The Journal of Craniofacial Surgery, pp. 1-5. https://doi.org/10.1097/SCS.0000000000012768
ISSN: 1049-2275, 1536-3732
DOI: https://doi.org/10.1097/SCS.0000000000012768
URI: https://hdl.handle.net/20.500.13055/1448
PMID: 42011987
Pages: 1-5
Quartile: Q3
Type: Article
Language: en
Access rights: info:eu-repo/semantics/closedAccess
Keywords: Artificial Intelligence; Implant Dentistry; Large Language Models; Tilted Dental Implants

Abstract

Tilted dental implant systems are widely used in the rehabilitation of anatomically compromised jaws and are supported by international consensus guidelines. Concurrently, large language models (LLMs) are increasingly accessed as informational tools in implant dentistry; however, their scientific accuracy and adherence to guideline-based principles in advanced implant concepts remain insufficiently explored. This study evaluated the scientific accuracy, guideline conformity, and clinical consistency of responses generated by 4 LLMs regarding tilted dental implant systems. A total of 120 guideline-based questions covering 8 predefined domains (definition, indications, contraindications, advantages, surgical procedure content, prosthetic procedure content, complications, and prognosis/survival) were developed in accordance with ITI, EAO, and AAOMS consensus reports. Each question was independently submitted to ChatGPT-5.2, Copilot, DeepSeek, and Gemini, and all responses were anonymized and evaluated by a multidisciplinary expert panel using a structured ordinal scoring system. Overall, scientific accuracy scores were high across all models, with near-ceiling performance observed in domains related to indications, advantages, procedural content, and prognosis. Statistically significant between-model differences were identified in the definition (P = 0.003), contraindications (P = 0.006), and complications (P < 0.001) domains, with DeepSeek and Gemini demonstrating consistently higher scores in complication-related content compared with ChatGPT and Copilot. Within-model analyses further revealed significant domain-dependent variability across all LLMs. Although LLMs demonstrate a strong capacity to reproduce established, guideline-based knowledge regarding tilted implant systems, limitations remain in safety-critical domains requiring nuanced clinical judgment. Accordingly, LLMs should be regarded as adjunctive educational tools rather than substitutes for expert decision-making in craniofacial implantology.