AI-Driven Multimodal Fusion of Neuroimaging and Speech Analysis for Early Detection of Alzheimer’s Disease Biomarkers

DOI: https://doi.org/10.52783/jns.v14.3905

Abstract
Alzheimer’s Disease (AD) is a progressive, debilitating neurodegenerative disorder that is frequently diagnosed only at advanced stages due to the lack of reliable, efficient early-stage biomarkers. Existing diagnostic techniques, including cerebrospinal fluid (CSF) analysis and positron emission tomography (PET), are invasive, costly, and inaccessible in low-resource settings. Although structural and functional neuroimaging (MRI/fMRI) and speech analysis have each individually shown promise in detecting AD, their synergistic potential remains largely unexplored. To our knowledge, this study is the first to present a hybrid artificial intelligence (AI) framework that combines convolutional neural networks (CNNs) for neuroimaging analysis with transformer-based natural language processing (NLP) for speech pattern evaluation, detecting early AD biomarkers with high sensitivity and specificity.
MRI/fMRI scans were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and we collected a novel speech dataset comprising verbal fluency, picture description, and spontaneous speech tasks from AD patients, mild cognitive impairment (MCI) subjects, and healthy controls. Our multimodal fusion model uses a 3D CNN feature extractor for neuroimaging data and a fine-tuned BERT transformer for linguistic and paralinguistic features of speech (e.g., semantic coherence, syntactic complexity, and pause frequency). An attention-based fusion layer assigns dynamic weights to the contributions of the imaging and speech modalities, optimizing biomarker detection.
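The attention-based fusion step described above can be illustrated with a minimal NumPy sketch. This is an assumed, simplified formulation (scalar attention scores from learned projection vectors, softmax-normalized into per-modality weights); the paper's actual layer, its dimensions, and its parameterization are not specified here, and the variable names (`img_feat`, `speech_feat`, `w_img`, `w_speech`) are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fusion(img_feat, speech_feat, w_img, w_speech):
    """Score each modality embedding with a learned projection vector,
    softmax the scores into dynamic weights, and return the weighted sum."""
    scores = np.array([img_feat @ w_img, speech_feat @ w_speech])
    weights = softmax(scores)          # dynamic modality weights, sum to 1
    fused = weights[0] * img_feat + weights[1] * speech_feat
    return fused, weights

rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=d)      # stand-in for a 3D-CNN imaging embedding
speech = rng.normal(size=d)   # stand-in for a BERT speech embedding
fused, w = attention_fusion(img, speech, rng.normal(size=d), rng.normal(size=d))
print(fused.shape, w)
```

In a trained model, `w_img` and `w_speech` would be learned parameters, so the relative weighting of imaging versus speech can shift per subject rather than being fixed a priori.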
The experimental results showed that our model accurately differentiated early AD from MCI with an accuracy of 92.3% (AUC: 0.96), a marked improvement in classification performance over unimodal approaches (MRI-only: 82.1% accuracy; speech-only: 76.5% accuracy). The model identified hippocampal atrophy and lexical repetition as the most discriminative features. Longitudinal validation in a 3-year follow-up cohort showed a strong correlation between AI-predicted risk scores and clinical progression, measured as decline in Mini-Mental State Examination (MMSE) scores (r=0.85, p<0.001).
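The longitudinal validation statistic above is a Pearson correlation between predicted risk and MMSE decline. As a quick reference for how that coefficient is computed, here is a self-contained sketch on toy data (the `risk` and `decline` values below are illustrative placeholders, not the study's data):

```python
import numpy as np

def pearson_r(x, y):
    # Pearson correlation: covariance of x and y over the product
    # of their standard deviations, computed from centered vectors
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return (xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym))

# toy example: predicted risk scores vs. MMSE point decline at follow-up
risk = [0.2, 0.4, 0.5, 0.7, 0.9]
decline = [1.0, 2.1, 2.4, 3.9, 4.8]
print(round(pearson_r(risk, decline), 3))  # ≈ 0.995 on this toy data
```

An r of 0.85 on real follow-up data, as reported, indicates that subjects assigned higher risk scores tended to show steeper MMSE decline.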
This study contributes:
- A novel multimodal AI framework for early AD detection using non-invasive, cost-effective data.
- Empirical validation of speech and neuroimaging fusion, surpassing unimodal benchmarks.
- Clinical interpretability through saliency maps and attention weights, aligning with known AD pathology.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.