Instruction Tuned Large Language Models for Assisting Brain Surgery Through Procedural Alignment and Decision Support Using PPO Reinforcement Learning
DOI: https://doi.org/10.52783/jns.v14.2918

Keywords: Instruction tuning, Large language models, Brain surgery, Proximal Policy Optimization, Reinforcement learning, Procedural alignment

Abstract
The integration of large language models (LLMs) into clinical decision-making remains a critical challenge, especially in high-risk domains such as neurosurgery. This study presents a framework that leverages instruction-tuned LLMs optimized with Proximal Policy Optimization (PPO) reinforcement learning to assist brain surgery through procedural alignment and decision support. We begin by fine-tuning a transformer-based LLM on domain-specific surgical protocols and neurosurgical dialogue datasets using supervised instruction tuning. To further enhance procedural adherence and mitigate hallucinations, we introduce a reward model guided by expert-annotated signals such as factual accuracy, stepwise protocol fidelity, and relevance to the surgical context. PPO is then employed to iteratively refine the model's responses through a feedback loop, optimizing both language coherence and domain-specific reliability. Experimental evaluations on simulated neurosurgical benchmarks show that our model outperforms both instruction-tuned and PPO-only baselines in procedural accuracy and decision-support relevance. The results indicate that reinforcement learning from human feedback, when tailored to surgical requirements, significantly improves the trustworthiness and alignment of LLM outputs. This research marks a critical step toward the deployment of explainable, reliable AI assistants for neurosurgical procedures.
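The abstract describes two core components: a composite reward built from expert-annotated signals (factual accuracy, stepwise protocol fidelity, contextual relevance) and a PPO feedback loop that refines the instruction-tuned policy. The sketch below is a minimal, illustrative rendering of those two pieces in PyTorch; the reward weights, signal names, and toy values are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: a composite expert-signal reward and the standard
# PPO clipped surrogate objective. Weights and toy values are assumed.
import torch

def composite_reward(factual, protocol_fidelity, relevance, w=(0.4, 0.4, 0.2)):
    # Weighted sum of expert-annotated signals in [0, 1]; weights are illustrative.
    return w[0] * factual + w[1] * protocol_fidelity + w[2] * relevance

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # PPO clipped surrogate: L = -E[min(r * A, clip(r, 1-eps, 1+eps) * A)],
    # where r = exp(logp_new - logp_old) is the policy probability ratio.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with per-token log-probabilities for one sampled response.
logp_old = torch.tensor([-1.2, -0.8, -2.1])
logp_new = torch.tensor([-1.0, -0.9, -1.8])
reward = composite_reward(factual=0.9, protocol_fidelity=0.8, relevance=0.7)
advantages = torch.full((3,), reward - 0.5)  # reward minus a simple baseline
print(ppo_clipped_loss(logp_new, logp_old, advantages))
```

In a full RLHF pipeline the per-token advantages would come from a learned value head, typically combined with a KL penalty against the supervised instruction-tuned policy rather than the fixed baseline used in this toy example.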
License

This work is licensed under a Creative Commons Attribution 4.0 International License.