
A comparative analysis of machine learning algorithms to predict Alzheimer’s disease

Original author: Morshedul Bari Antor (2021), DOI: 10.1155/2021/9917919


Jay Darji

Jr. technical researcher

November 16, 2022


Alzheimer's disease is a progressive neurological disorder that causes brain cells to die and the brain to shrink (atrophy). Alzheimer's disease is the most common cause of dementia, which is defined as a progressive decline in cognitive, behavioral, and social skills that impairs a person's ability to function independently. Approximately 5.8 million people in the United States aged 65 years and older live with Alzheimer's disease. Of those, 80% are 75 years old and older. Out of the approximately 50 million people worldwide with dementia, between 60% and 70% are estimated to have Alzheimer's disease.

Although there is no cure for Alzheimer's disease (AD), early diagnosis is important because it allows therapies that can slow progression and improve management of the disease to begin sooner. A machine learning system can support this by predicting the disease. The main aim of the paper is to recognize dementia in patients, and it presents the results and analysis of dementia detection with several machine learning models. The authors used the Open Access Series of Imaging Studies (OASIS) dataset to develop the system. The dataset is small but contains informative features. It was analyzed and used to train several machine learning models for prediction: support vector machine (SVM), logistic regression, decision tree, and random forest. Among the trained models, SVM achieved the best results in detecting dementia. The system is simple and could help identify dementia in patients.


Magnetic Resonance Imaging (MRI) data from OASIS were used to detect dementia. The dataset has 15 features, of which 8 were used for model building: gender, age, years of education (EDUC), socio-economic status (SES), Mini-Mental State Examination (MMSE), estimated total intracranial volume (eTIV), normalized whole brain volume (nWBV), and Atlas scaling factor (ASF).


All categorical features were converted to numerical values, and all features were then scaled using standardization (z-score). Following this, the strategy for handling missing data was determined from an analysis of the correlation between features. Because the features with missing values had a strong influence on the target, the missing values were replaced with the median of the respective feature.
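The preprocessing steps described above can be illustrated with a minimal sketch that uses only the Python standard library. The column values and the helper names (`impute_median`, `standardize`) are hypothetical and are not taken from the paper. The sketch also imputes before scaling, simply because a z-score needs complete values; that ordering is an assumption.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit variance (z-score).

    Assumes the column is not constant, otherwise the standard deviation is 0.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical toy columns standing in for real dataset features.
gender = ["M", "F", "F", "M", "F", "M", "F"]
ses = [2, None, 3, 1, None, 4, 2]

# Categorical -> numerical (the 1/0 coding is an arbitrary choice here).
gender_encoded = [1 if g == "M" else 0 for g in gender]

# Fill missing entries with the median, then apply the z-score.
ses_filled = impute_median(ses)
ses_scaled = standardize(ses_filled)
```

The median is used rather than the mean because it is less sensitive to outliers, which matters in a small dataset.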


The dataset was then divided into training and testing sets with an 80:20 split using stratified random sampling, which kept the class proportions balanced across the two sets. Finally, the training data was fed to several classifiers: support vector machine (SVM), logistic regression, decision tree, and random forest. The performance of each model was evaluated before and after tuning the model parameters.
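To make the idea of a stratified 80:20 split concrete, here is a small standard-library sketch. The function name `stratified_split`, the seed, and the toy labels are assumptions for illustration only; the paper does not describe its implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same class shares in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random sampling within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 40 demented (1) and 60 non-demented (0) subjects.
labels = [1] * 40 + [0] * 60
train_idx, test_idx = stratified_split(labels)
```

Because sampling happens separately within each class, the test set here contains 8 positive and 12 negative cases, the same 40:60 ratio as the full toy dataset.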

Results & discussion

Since this is a classification problem, the models were evaluated with accuracy, recall, area under the curve (AUC), and the confusion matrix. Across all models, the support vector machine (SVM), specifically the support vector classifier, gave the best overall result on every metric, although the decision tree classifier produced the best true positive result. Grid search was used to fine-tune the models, so the reported results are the best found within the searched parameter settings for this dataset. Some more complex models, such as the random forest classifier, suffered from overfitting.
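The evaluation metrics named above can be computed from predictions as in the following sketch. The labels, scores, and the 0.5 decision threshold are made-up toy values, not results from the paper, and the AUC is computed with the simple pairwise definition rather than any particular library routine.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def recall(tp, fn):
    """Share of truly positive cases that the model detected."""
    return tp / (tp + fn)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: true labels and model scores for 8 subjects.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, fn, tn), recall(tp, fn), auc(y_true, scores))
```

Recall is especially relevant for a screening task like this one, since a false negative means a dementia case goes undetected.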


Before fine-tuning, the SVM classifier had 85% training accuracy and 73% testing accuracy. After fine-tuning, the results from the same models improved significantly, with the SVM classifier reaching 92% test accuracy. Random forest, decision tree, and logistic regression reached 81%, 80%, and 74% accuracy in detecting dementia, respectively.
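Grid search, the tuning method mentioned above, tries every combination of candidate parameter values and keeps the best-scoring one. The sketch below shows the mechanism only. The parameter names `C` and `gamma`, their candidate values, and the scoring function are all assumptions for illustration; the summary does not state which parameters were tuned, and a real run would train a model and score it by cross-validation on the training data.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(params):
    """Hypothetical stand-in for 'train with these settings, return validation accuracy'."""
    return 1.0 - abs(params["C"] - 10) / 100 - abs(params["gamma"] - 0.1)

# Assumed candidate values; not taken from the paper.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
```

The search is exhaustive, so its cost grows with the product of the list lengths (12 combinations here), which is manageable for a small dataset and a handful of parameters.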

Impact of the research

The ML system can give the general public an indication of the likelihood of dementia in adult patients simply by entering MRI data. The hope is that it will help patients receive early treatment for dementia and improve their quality of life.
