Keshtvarz Hesam Abadi A, Hajizadeh E, Pourhoseingholi M, Nazemalhossein Mojarad E. Comparison of Random Forest and Logistic Regression Methods in Predicting Mortality in Colorectal Cancer Patients and its Related Factors. irje 2019; 14(4): 375-383
URL: http://irje.tums.ac.ir/article-1-6201-en.html
1- Master's Student of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
2- Professor of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran, hajizadeh@modares.ac.ir
3- Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
4- Gastroenterology and Liver Disease Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Abstract:
Background and Objectives: This study aimed to predict mortality among Iranian patients with colorectal cancer and to identify the factors affecting their mortality, using random forest and logistic regression methods.
Methods: This retrospective study used data on 304 patients from the colorectal cancer registry of the Gastroenterology and Liver Research Center of Shahid Beheshti University of Medical Sciences, covering the years 2009 to 2014. The data were analyzed with random forest and logistic regression methods in R software version 3.4.3.
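As a rough illustration of the logistic regression side of the comparison, the sketch below fits a logistic model by batch gradient descent on the mean log-loss. It is a minimal stdlib-only sketch, not the authors' procedure: the abstract only states that R 3.4.3 was used (R's `glm` fits by iteratively reweighted least squares instead), and the toy data, the function names `fit_logistic` and `predict_proba`, and the learning-rate and epoch settings are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent on mean log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed outcome.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    # Predicted probability of death for each row of X.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Hypothetical toy data: one predictor (a stage-like code 1..4), outcome 1 = death.
X = [[1], [1], [2], [2], [3], [3], [4], [4]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
probs = predict_proba(w, b, X)
```

Because deaths are more frequent at higher codes in this toy data, the fitted coefficient comes out positive and predicted risk rises with the code.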
Results: The random forest method selected ten important variables related to colorectal cancer deaths. Several criteria, including the area under the receiver operating characteristic (ROC) curve (AUC), were used to compare random forest with logistic regression. According to both criteria, the five most important variables ranked by random forest were cancer stage, age at diagnosis, patient's age, HLA, and degree of differentiation (tumor differentiation). On all comparison criteria, random forest performed better than logistic regression (AUC: 98% for random forest vs. 80% for logistic regression).
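The AUC used to compare the two models can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor (the Mann-Whitney interpretation), with ties counted as one half. The sketch below computes it directly from that definition; the function name `auc` and the example scores are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]  # e.g. patients who died
    neg = [s for s, l in zip(scores, labels) if l == 0]  # e.g. survivors
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks from two models for the same six patients.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # perfectly separates deaths
scores_b = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7]  # 6 of 9 pairs ranked correctly
auc_a = auc(scores_a, labels)  # 1.0
auc_b = auc(scores_b, labels)  # 6/9
```

The pairwise loop is O(positives × negatives), which is negligible for a cohort of a few hundred patients; a rank-based formula gives the same value faster on large data.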
Conclusion: Cancer stage, age at diagnosis, patient's age, HLA, and degree of differentiation appear to be the most important factors affecting mortality in colorectal cancer, suggesting that early diagnosis and screening programs could increase patients' longevity.
Type of Study: Research | Subject: General
Received: 2019/04/16 | Accepted: 2019/04/16 | Published: 2019/04/16