Instruction Tuned Large Language Models for Assisting Brain Surgery Through Procedural Alignment and Decision Support Using PPO Reinforcement Learning

Authors

  • G. Ramana Murthy
  • A. Jansi Rani
  • Lavanya S.
  • P. Ravi Kumar
  • K. Karunambiga

DOI:

https://doi.org/10.52783/jns.v14.2918

Keywords:

Instruction tuning, Large language models, Brain surgery, Proximal Policy Optimization, Reinforcement learning, Procedural alignment

Abstract

The integration of large language models (LLMs) into clinical decision-making remains a critical challenge, especially in high-risk domains such as neurosurgery. This study presents a framework that leverages instruction-tuned LLMs, optimized with Proximal Policy Optimization (PPO) reinforcement learning, to assist brain surgery through procedural alignment and decision support. We begin by fine-tuning a transformer-based LLM on domain-specific surgical protocols and neurosurgical dialogue datasets using supervised instruction tuning. To further enhance procedural adherence and mitigate hallucinations, we introduce a reward model guided by expert-annotated signals such as factual accuracy, stepwise protocol fidelity, and relevance to the surgical context. PPO then iteratively refines the model's responses through a feedback loop, optimizing for both language coherence and domain-specific reliability. Experimental evaluations on simulated neurosurgical benchmarks show that our model outperforms both instruction-tuned and PPO-only baselines in procedural accuracy and decision-support relevance. The results indicate that reinforcement learning from human feedback, when tailored to surgical requirements, significantly improves the trustworthiness and alignment of LLM outputs. This research is a step toward deploying explainable, reliable AI assistants for neurosurgical procedures.
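For context, the optimization loop described above follows the standard RLHF recipe: the reward model's score is shaped by a KL penalty that anchors the policy to the instruction-tuned reference model, and the policy is then updated with PPO's clipped surrogate objective. In standard notation (the paper's own symbols are not given on this page):

r(x, y) = r_\phi(x, y) - \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\left(\rho_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\right)\right], \qquad \rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

The short Python sketch below illustrates these two quantities in isolation. It is not the authors' implementation; the function names, tensor shapes, and hyperparameter values (beta, eps) are illustrative assumptions.

# Minimal sketch of the KL-shaped reward and clipped PPO loss used in RLHF.
# All names and values are illustrative, not taken from the paper.
import torch

def shaped_reward(rm_score, logp_policy, logp_ref, beta=0.1):
    # Reward-model score minus a KL penalty that keeps the policy close
    # to the instruction-tuned reference model.
    return rm_score - beta * (logp_policy - logp_ref)

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    # Clipped surrogate: clamp the policy ratio to [1 - eps, 1 + eps] so a
    # single batch cannot push the policy arbitrarily far from the snapshot.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random tensors standing in for per-token log-probabilities.
torch.manual_seed(0)
logp_new = torch.randn(8, requires_grad=True)         # current policy
logp_old = logp_new.detach() + 0.05 * torch.randn(8)  # pre-update snapshot
advantage = torch.randn(8)                            # advantage estimates
loss = ppo_clip_loss(logp_new, logp_old, advantage)
loss.backward()  # in practice, gradients flow into the policy network

In a full pipeline, shaped_reward would score each sampled response before advantage estimation, so the model is rewarded for protocol fidelity while being penalized for drifting from its supervised starting point.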




Published

2025-04-02

How to Cite

1. Murthy GR, Rani AJ, S L, Kumar PR, Karunambiga K. Instruction Tuned Large Language Models for Assisting Brain Surgery Through Procedural Alignment and Decision Support Using PPO Reinforcement Learning. J Neonatal Surg [Internet]. 2025 Apr. 2 [cited 2025 Sep. 21];14(5):166-75. Available from: https://jneonatalsurg.com/index.php/jns/article/view/2918