Jay Darji

Jr. Technical researcher

Systematic evaluation of machine learning algorithms for neuroanatomically-based age prediction in youth.

Published on: April 03, 2024

Original author: Amirhossein, et al., 2022 (doi: 10.1002/hbm.26010)

Human longevity refers to the actual duration of an individual’s life, and it has been a subject of fascination, study, and speculation throughout human history. Application of machine learning (ML) algorithms to structural magnetic resonance imaging (sMRI) data has yielded behaviorally meaningful estimates of the biological age of the brain (brain age). The choice of ML approach for estimating brain age in youth is important because age-related brain changes in this age group are dynamic. However, the comparative performance of the available ML algorithms has not been systematically appraised. A variety of machine learning approaches have been tested to predict human age from MRI scans. Among these, tree-based models and algorithms with nonlinear kernels are especially promising and have been used to construct accurate models for age prediction. In this article, Extreme Gradient Boosting (mean absolute error, MAE, of 1.49 years), Random Forest Regression (MAE of 1.58 years), and Support Vector Regression (SVR) with a Radial Basis Function (RBF) kernel (MAE of 1.64 years) emerged as the three most accurate models.
Methodology: This study was carried out using T1-weighted scans from seven separate cohorts: Autism Brain Imaging Data Exchange (ABIDE), ABIDE II, ADHD-200, Human Connectome Project Development (HCP-D), Child Mind Institute (CMI), Adolescent Brain Cognitive Development (ABCD), and the Pediatric Imaging, Neurocognition, and Genetics (PING) Data Repository. The data were collected at multiple sites in 8 countries. Only psychiatrically healthy participants with high-quality anatomical brain scans from each cohort were included. A total of 2105 samples were used to train the ML models, while data from ABCD (n = 4078) and PING (n = 594) were used for testing. After standard preprocessing of the images, a total of 21 different algorithms were trained on them. Models were trained separately for males and females.
Hyperparameter tuning (when required) was performed in the combined training set (n = 2105), using a grid search in a fivefold cross-validation scheme across five repeats. In each cross-validation fold, 80% of the training sample was used to train the model and the remaining 20% was used to evaluate the candidate hyperparameters. Subsequently, the model was re-trained on the whole training dataset using the optimal hyperparameters identified through cross-validation. Finally, the generalizability of the model was tested in two hold-out datasets (ABCD, n = 4078, and PING, n = 594). The primary accuracy measure for each algorithm was the Mean Absolute Error (MAE), which is the average absolute difference between the neuroimaging-predicted age and the chronological age.
Results: In the present study, the authors undertook a comprehensive comparison of machine learning algorithms for sMRI-based age prediction as a proxy for the biological age of the brain in youth. They found that the linear algorithms, except Elastic Net Regression, performed poorly, while Extreme Gradient Boosting (XGBoost), Random Forest (RF) Regression, and Support Vector Regression (SVR) with a Radial Basis Function (RBF) kernel emerged as the three top-performing models in males and females, both in cross-validation and in the combined hold-out datasets. The best accuracy was an MAE of 1.49 years with XGBoost, followed by RF Regression with an MAE of 1.58 years and SVR with an RBF kernel with an MAE of 1.64 years. In summary, using a wide range of ML algorithms on geographically diverse datasets of young people, they showed that tree-based algorithms, followed by nonlinear kernel-based algorithms, offer robust, accurate, and generalizable solutions for predicting age from brain morphological features.
Impact of the research: The findings of the present study can be used as a guide for quantifying brain maturation during development and its contribution to functional and behavioral outcomes. They can also guide the choice of methodology when quantifying brain age in youth.
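The tuning scheme described in the methodology (grid search inside fivefold cross-validation with five repeats, scored by MAE) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the k-nearest-neighbour regressor is a toy stand-in for the 21 algorithms in the study, the data are invented, and all function names are hypothetical.

```python
import random


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between chronological and predicted age."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for each repeat
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # about 1/n_folds of the samples
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def knn_predict(train_x, train_y, x, k):
    """Toy stand-in model (assumption): mean age of the k nearest samples."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def grid_search_k(xs, ys, candidate_ks):
    """Return the k with the lowest cross-validated MAE, plus all scores."""
    scores = {}
    for k in candidate_ks:
        fold_maes = []
        for train_idx, test_idx in repeated_kfold_indices(len(xs)):
            train_x = [xs[i] for i in train_idx]
            train_y = [ys[i] for i in train_idx]
            predictions = [knn_predict(train_x, train_y, xs[i], k) for i in test_idx]
            fold_maes.append(mean_absolute_error([ys[i] for i in test_idx], predictions))
        scores[k] = sum(fold_maes) / len(fold_maes)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Invented toy data: one morphological feature that shrinks with age.
rng = random.Random(42)
ages = [rng.uniform(8, 21) for _ in range(60)]
feature = [3.2 - 0.03 * age + rng.gauss(0, 0.02) for age in ages]

best_k, cv_scores = grid_search_k(feature, ages, candidate_ks=[1, 3, 5, 9])
print(best_k, round(cv_scores[best_k], 2))
```

After the search, the chosen setting would be refit on the full training set and scored once on the hold-out data, mirroring the retraining step described above.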

DeepMAge: A methylation aging clock developed with deep learning.

Published on: November 22, 2023

Original author: Fedor Galkin, et al., 2021 (doi: 10.14336/AD.2020.1202)

Human longevity refers to the actual duration of an individual’s life, and it has been a subject of fascination, study, and speculation throughout human history. The quest for understanding and extending the human lifespan has driven scientific research, medical advancements, and lifestyle changes. Aging may be influenced by various factors such as environmental, biochemical, and evolutionary constraints. Aging clocks have proven to be invaluable tools for biogerontology since they offer a unique opportunity to quantify the aging process. This ability is essential for testing geroprotective interventions and studying age-related diseases. A variety of machine learning approaches have been tested to predict human age from molecular-level features. Among these, deep learning, or neural networks, is a promising approach that has been used to construct accurate clocks using blood biochemistry, transcriptomics, and microbiomics data. In this article, a DNA methylation (DNAm) aging clock based on a deep neural network is proposed. Its median absolute error (MedAE) was 2.77 years in an independent verification set of 1,293 samples from 15 studies.
Methodology: This study was carried out using datasets collected from the publicly available Gene Expression Omnibus repository. Overall, 32 studies were used with 6,411 DNAm profiles in total. Among these, 17 studies and 4,930 samples were included in the training set. The other 15 studies and 1,293 profiles were used in the verification set. Only the 24,538 CpG sites shared between the 450K and 27K platforms were used, excluding sex chromosome sites and sites with orthologous sequences on multiple chromosomes. Approximately 17% of the samples used in this project were associated with integer age values. To avoid introducing this bias into the model, 0.5 years was added to each integer age, while non-integer (float) ages were left unchanged.
They used the 353 regression coefficients (plus intercept) published in the original paper by Horvath et al. to reconstruct the linear regression model. The model was then used to estimate the logarithmically transformed age. Additionally, a de novo elastic net model was trained using a protocol from Horvath et al. For DeepMAge itself, a neural network was first trained on 4,930 samples with 24,538 features, and then deep feature selection and gradient-based feature selection methods were applied to find the most important features. The final model was trained on the 1,000 most important features with 5-fold cross-validation. Model parameters were optimized with a grid search, and the MAE loss function was minimized with backpropagation.
Results: They trained the deep neural network DeepMAge using a collection of 4,930 blood DNAm profiles from the control cohorts of 17 studies, obtaining a cross-validated median absolute error (MedAE) of 2.24 years. DeepMAge was slightly less accurate when tested on control cohorts from 15 independent datasets (1,293 samples), with a MedAE of 2.77 years. In the study on tauopathic frontotemporal dementia and palsy, cases were predicted to be 1.00 year older than controls. People with inflammatory bowel diseases (IBD) were predicted by DeepMAge to be 1.23 years older than controls. Women with ovarian cancer were predicted to be 1.70 years older. Multiple sclerosis patients were predicted to be 2.10 years older. Notably, people with congenital CHARGE and Kabuki syndromes were predicted to be 5.28 years younger than controls. These results may indicate a faster pace of aging in people with these pathologies (except for CHARGE and Kabuki syndromes).
Conclusion: Numerous quantitative models of epigenetic aging exist. This study developed DeepMAge, the first deep learning DNAm aging clock, which significantly outperforms more widespread linear models in multiple settings.
Impact of the research: DeepMAge is sensitive to diseases, a property that makes it valuable as a potential health marker. DNA methylation clocks can be used as biomarkers of healthy and unhealthy aging, and to predict disease risk.
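Two small steps from the methodology and results above lend themselves to a short sketch: the 0.5-year adjustment applied to integer ages, and the median absolute error (MedAE) used to score the clock. The function names and toy numbers below are illustrative assumptions, not code or data from the paper.

```python
from statistics import median


def adjust_age(age):
    """Add 0.5 years to integer-valued ages; leave non-integer ages unchanged.

    Assumption: a value such as 34.0 is treated as an integer age.
    """
    return age + 0.5 if float(age).is_integer() else float(age)


def median_absolute_error(true_ages, predicted_ages):
    """Median of the absolute differences between true and predicted ages."""
    return median(abs(t - p) for t, p in zip(true_ages, predicted_ages))


# Invented example values, for illustration only.
reported_ages = [34, 51.3, 62, 47.8]
training_targets = [adjust_age(a) for a in reported_ages]
print(training_targets)
```

Unlike the mean absolute error, the median is not pulled upward by a few badly predicted samples, which is one reason aging-clock studies often report it.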

Lung cancer prediction using robust machine learning and image enhancement methods on extracted gray-level co-occurrence matrix features.

Published on: July 12, 2023

Original author: Hussain, L., et al., 2022 (doi: 10.3390/app12136517)

In the present era, cancer is a leading cause of death in both men and women worldwide, with low survival rates due to inefficient diagnostic techniques. In 2021, about 1,898,160 new cancer cases and 608,570 cancer deaths were projected in the United States, and about 85% of lung cancer cases are non-small cell lung cancer (NSCLC). NSCLC can be treated with radiofrequency (RF) ablation and stereotactic body radiotherapy (SBRT). Lung cancer has two main types, NSCLC and small cell lung carcinoma (SCLC). The two types spread in different ways and require different treatment accordingly. NSCLC spreads relatively slowly, whereas SCLC, which is related to smoking, propagates very rapidly, forms tumors, and spreads to different parts of the body. Deaths from SCLC are proportional to the number of cigarettes smoked. In medical image processing, image enhancement can further improve prediction performance. This study aimed to improve lung cancer image quality by applying various image enhancement methods, such as image adjustment, gamma correction, contrast stretching, thresholding, and histogram equalization. The authors extracted gray-level co-occurrence matrix (GLCM) features from the enhanced images, then applied and optimized robust machine learning classification algorithms: decision tree (DT), naïve Bayes, and support vector machine (SVM) with Gaussian, radial basis function (RBF), and polynomial kernels. Without image enhancement, the highest performance was obtained using SVM with polynomial and RBF kernels, with an accuracy of 99.89%.
Methodology: They used a public dataset provided by the Lung Cancer Alliance (LCA). The database images are in Digital Imaging and Communications in Medicine (DICOM) format. There were 76 patients, with a total of 945 images, including 377 from NSCLC and 568 from SCLC.
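Two of the enhancement methods named above, gamma correction and contrast stretching, are simple per-pixel transforms. The sketch below assumes intensities already normalized to the range 0 to 1 and uses hypothetical function names; the parameter values are arbitrary and are not taken from the study.

```python
def gamma_correct(image, gamma):
    """Raise each normalized intensity to the power gamma.

    gamma < 1 brightens dark regions; gamma > 1 darkens them.
    """
    return [[pixel ** gamma for pixel in row] for row in image]


def contrast_stretch(image, low, high):
    """Linearly map [low, high] onto [0, 1], clipping values outside it."""
    span = high - low
    return [
        [min(1.0, max(0.0, (pixel - low) / span)) for pixel in row]
        for row in image
    ]


# Tiny invented image with intensities in [0, 1].
image = [[0.0, 0.25, 0.5], [0.5, 0.75, 1.0]]
brighter = gamma_correct(image, gamma=0.5)
stretched = contrast_stretch(image, low=0.25, high=0.75)
```

Either output can then be passed to feature extraction in place of the raw image.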
To avoid overfitting, the researchers enlarged the dataset with several image data augmentation methods: geometric transformations, photometric shifting, and primitive data augmentation. The geometric transformations include rotation, flipping, cropping, shearing, and translation. Good-quality images with improved visual effects can further help diagnostic systems. In this study, they applied different image enhancement methods, based on various factors, to improve image quality before computing features. They extracted 22 gray-level co-occurrence matrix (GLCM) features and fed them to machine learning algorithms.
Results: This study was specifically conducted to improve lung cancer prediction performance by optimizing feature extraction and machine learning methods. The authors first extracted the 22 GLCM features and applied various ML algorithms to distinguish NSCLC from SCLC. Then an image enhancement method, i.e., image adjustment, was applied before extracting the GLCM features, and the machine learning algorithms were run again. The results revealed that prediction performance improved with image enhancement for all the classification algorithms. The classification algorithms used were SVM, decision tree, and naïve Bayes. Performance was measured in terms of specificity, sensitivity, negative predictive value (NPV), positive predictive value (PPV), area under the receiver operating characteristic curve (AUC), and accuracy, using 10-fold cross-validation. SVM with Gaussian and RBF kernels provided the highest performance, with sensitivity (99.82%), specificity and PPV (100%), and accuracy (99.89%), followed by SVM with a polynomial kernel, with sensitivity and NPV (100%), specificity (99.72%), and accuracy (99.89%). The decision tree yielded an accuracy of 99.35%, whereas naïve Bayes yielded an accuracy of 88.57%.
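The performance measures listed above all derive from the four cells of a binary confusion matrix. A minimal sketch follows, with a hypothetical function name and invented labels; which class the study treated as positive is not stated here, so the choice of `positive=1` is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, PPV, NPV and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Invented labels for illustration: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Under k-fold cross-validation these values would be computed per fold and then averaged, or computed once on the pooled out-of-fold predictions.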
Conclusion: This study was conducted to distinguish between NSCLC and SCLC by first extracting hand-crafted texture features and then employing supervised machine learning algorithms: naïve Bayes, decision tree, and SVM with RBF, Gaussian, and polynomial kernels. The results revealed that the proposed methods are robust and can support expert radiologists in the further diagnosis and prognosis of lung cancer.
Impact of the research: The research contributes to improving the accuracy and reliability of lung cancer diagnosis, particularly in distinguishing between NSCLC and SCLC. Accurate diagnosis is crucial for selecting the appropriate treatment and improving patient outcomes. The application of image enhancement techniques and texture feature extraction improves the performance of lung cancer detection. This can potentially lead to earlier detection of lung cancer, allowing for timely interventions and better treatment outcomes.
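The hand-crafted texture features mentioned above come from a gray-level co-occurrence matrix, which counts how often pairs of gray levels occur at a fixed pixel offset. The sketch below builds a normalized, symmetric GLCM for one offset and derives three widely used features. The 22 features used in the study are not enumerated here, so the three shown are illustrative, and the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """Weighted sum of squared gray-level differences."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Sum of squared matrix entries (angular second moment)."""
    return sum(value ** 2 for row in p for value in row)


def homogeneity(p):
    """Entries weighted by closeness to the matrix diagonal."""
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(len(p)) for j in range(len(p)))


# Tiny invented image quantized to three gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
]
p = glcm(image, levels=3)
features = [contrast(p), energy(p), homogeneity(p)]
```

In practice such features are usually computed for several offsets and angles and then concatenated into one feature vector per image.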

A hybrid CNN-GLCM classifier for detection and grade classification of brain tumour.

Published on: March 29, 2023

Original author: Akila, et al., 2022 (doi: 10.21203/rs.3.rs-531022/v1)

A tumor is a volume of irregular and abnormal cells affecting the function of nearby healthy cells in the human body. Meningioma is the most commonly occurring tumor in adults with high-risk factors. Meningiomas arise in the meninges, the protective membranes surrounding the brain, of which the dura mater is the outermost layer. Tumors are generally classified as benign (noncancerous) or malignant (cancerous). A benign tumor does not spread to other parts of the body, whereas a malignant tumor may appear as a mass that crowds out healthy cells by drawing nourishment from the body tissues. Early identification and diagnosis of brain tumors using a supervised approach plays an essential role in the field of medicine. In this paper, an automated computer-aided method using a deep learning architecture named CNN Deep Net is proposed for the detection, classification, and diagnosis of meningioma brain tumors. The proposed method includes preprocessing, classification, and segmentation of the primary occurring brain tumor in adults. The CNN Deep Net architecture extracts features internally from enhanced images and classifies them into normal and abnormal (tumor) images. Based on its feature attributes, the CNN Deep Net classifier then grades each detected tumor image as either benign (low grade) or malignant (high grade). This classification approach with a grading system is evaluated both quantitatively and qualitatively.
Methodology: The performance of the proposed CNN Deep Net is observed on the MR brain images of the open-access BRATS datasets. This dataset includes ground truth images obtained from expert radiologists. The paper used 600 brain images from the dataset, comprising 340 normal and 260 abnormal brain images. Meningioma tumor prediction and classification in MR images is carried out using the proposed CNN Deep Net.
All dataset images, originally 512 × 512 pixels, are resized to 256 × 256 pixels so that they share the same dimensions. The tumor regions are segmented using a global thresholding approach combined with the connected component method. Each preprocessed brain MRI image is classified as either normal or abnormal using the proposed CNN Deep Net classifier.
Results & discussion: Both subjective and objective comparisons of classification and segmentation are explained in terms of sensitivity, specificity, accuracy, precision, F-score, Dice similarity index (DSI), etc. The proposed CNN Deep Net is analyzed by means of the classification rate, which is the ratio of the number of correctly classified objects to the total number of objects involved. In the proposed method, the classification rate of normal images and affected tumor images is 99.5% when 340 normal MRI images and 206 affected tumor MRI images are considered for classification. The proposed method thus achieves an average classification rate of 99.5% and a validation accuracy of 99.4%. The sensitivity, specificity, accuracy, precision, and F-score of the proposed CNN Deep Net are better than those of conventional machine learning methods.
Impact of the research: The NCIS and WHO have reported that every year around 13,000 people are affected by the tumor. The death rate increases every year because of late diagnosis and semi-automated grading systems. Detecting and classifying tumors manually is the greatest challenge that pathologists encounter. With the emergence of machine learning techniques in medical imaging analysis, detection and classification approaches have become easier and more accurate. An image analysis scheme, CNN Deep Net, is proposed for the detection and classification of meningioma tumors with high accuracy and classification rate.
Conclusion: A novel diagnostic system using a GLCM-CNN classifier is proposed that achieves a high classification rate and accuracy. The proposed CNN Deep Net architecture achieved a classification rate of 99.5%, with a specificity of 98.6% and a sensitivity of 97.2%.
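The segmentation step in the methodology, global thresholding combined with the connected component method, can be sketched without any imaging library. This is a minimal illustration under stated assumptions: the threshold value, the 4-connectivity, the rule of keeping the largest region, and all function names are hypothetical choices, not details from the paper.

```python
from collections import deque


def global_threshold(image, threshold):
    """Binary mask: 1 where intensity exceeds a single global threshold."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def connected_components(mask):
    """Label 4-connected foreground regions; return labels and region sizes."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                size = 0
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                sizes[next_label] = size
    return labels, sizes


def largest_component(mask):
    """Keep only the largest connected region (assumed selection rule)."""
    labels, sizes = connected_components(mask)
    if not sizes:
        return [[0] * len(row) for row in mask]
    keep = max(sizes, key=sizes.get)
    return [[1 if label == keep else 0 for label in row] for row in labels]


# Tiny invented intensity image: one bright blob and one isolated bright pixel.
image = [
    [10, 200, 210, 10],
    [10, 220, 10, 10],
    [10, 10, 10, 190],
    [10, 10, 10, 10],
]
mask = global_threshold(image, threshold=128)
region = largest_component(mask)
```

Filtering by region size is one common way to discard small bright artifacts that pass the threshold but are unlikely to be the tumor.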

A comparative analysis of machine learning algorithms to predict Alzheimer’s disease.

Published on: November 16, 2022

Original author: Morshedul Bari Antor, et al., 2021 (doi: 10.1155/2021/9917919)

Alzheimer's disease (AD) is a progressive neurological disorder that causes brain cells to die and the brain to shrink (atrophy). It is the most common cause of dementia, which is defined as a progressive decline in cognitive, behavioral, and social skills that impairs a person's ability to function independently. Approximately 5.8 million people in the United States aged 65 years and older live with Alzheimer's disease. Of those, 80% are 75 years old or older. Of the approximately 50 million people worldwide with dementia, between 60% and 70% are estimated to have Alzheimer's disease. Although there is no cure for AD, early diagnosis is important for starting therapies that can slow its progression and improve its management, and a machine learning system can help by predicting the disease. The main aim is to recognize dementia in patients. This paper presents the results and analysis of dementia detection using various machine learning models. The authors used the Open Access Series of Imaging Studies (OASIS) dataset to develop the system. The dataset is small, but it contains some significant values. The dataset was analyzed and used with several machine learning models for prediction: support vector machine (SVM), logistic regression, decision tree, and random forest. Among the trained models, SVM attained the best results in detecting dementia. The system is simple and can help people by detecting dementia among them.
Methodology: Magnetic Resonance Imaging (MRI) data from OASIS were used to detect dementia. The dataset had 15 features, of which 8 were incorporated into the model-building task.
The 8 features are gender, age, years of education (EDUC), socio-economic status (SES), mini-mental state examination (MMSE) score, estimated total intracranial volume (eTIV), normalized whole brain volume (nWBV), and Atlas scaling factor (ASF). All categorical features were converted to numerical values, and all features were then scaled using standardization (z-score). Following this, the strategy for handling missing data was determined based on an analysis of the correlation between features. Since the features with missing values had a high impact on the target, missing values were replaced with the median value of the respective feature. The dataset was then divided into training and testing data with an 80:20 split using stratified random sampling, which kept the class proportions balanced between the two subsets. Finally, the training data were fed to various classifiers: SVM, logistic regression, decision tree, and random forest. The performance of each model was evaluated before and after tuning the model parameters.
Results & discussion: Since this is a classification problem, the evaluation metrics were accuracy, recall, area under the curve (AUC), and the confusion matrix. After compiling the results from all the models, it is evident that the SVM, more specifically the support vector classifier, gave the best overall result across all metrics. However, the decision tree classifier gave the best true positive result. Some more complex models, such as the random forest classifier, suffered from overfitting. Before fine-tuning, the SVM classifier had 85% training accuracy and 73% testing accuracy. Grid search was used to fine-tune the models, so the reported results are the best found within the searched parameter grid. After fine-tuning, the results from the same models improved significantly: the SVM classifier yielded 92% test accuracy, while random forest, decision tree, and logistic regression gave 81%, 80%, and 74% accuracy in detecting dementia, respectively.
Impact of the research: The ML system can give the general public an idea of the possibility of dementia in adult patients simply by inputting MRI data. Hopefully, it will help patients get early treatment for dementia and improve their quality of life.
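The preprocessing steps described in the methodology (median imputation, z-score standardization, and a stratified 80:20 split) can be sketched with the standard library alone. The function names and toy values are hypothetical, and missing values are assumed to be represented as `None`.

```python
import random
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def z_score(values):
    """Standardize to zero mean and unit variance (needs non-constant input)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class contributes test_fraction of its samples."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Invented toy column with two missing entries, plus invented class labels.
raw_feature = [2, None, 4, 1, 3, None, 2, 5, 1, 3, 4, 2, 3, 1, 5]
labels = [0] * 10 + [1] * 5

scaled = z_score(impute_median(raw_feature))
train_idx, test_idx = stratified_split(labels)
```

A common refinement is to compute the median, mean, and standard deviation on the training split only and reuse them for the test split, so that no test information leaks into preprocessing.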
