Search published articles


Showing 8 results for Data Mining

Taha Samad Soltani, Mostafa Langarizadeh, Maryam Zolnoori
Volume 9, Issue 3 (9-2015)
Abstract

Background and Aim: Data mining is an important means of gaining a deeper understanding of medical data and helps to solve problems in the diagnosis and treatment of diseases. One of its most important applications is the examination of existing data patterns. The present study aims to examine the existing data patterns of patients with asthma.
Materials and Methods: This study was performed on 258 patients with respiratory symptoms who were referred to Imam Khomeini and Masih Daneshvari Hospitals in 2009. All records were entered into Excel, and its data mining add-ins were used. Analyses such as key influencers, cluster analysis of patients, and exception detection were carried out.
Results: The most common clinical sign of asthma among subjects was severe coughing, which was highly affected by thrills. The data were aggregated into 5 clusters for more general analyses. Their common denominator was then identified, and the records with exceptional features were determined. Then, following cost analysis and setting the threshold value at 612, a questionnaire was developed based on data features for the diagnosis of asthma.
Conclusion: The developed framework for data mining and analysis is an appropriate tool for knowledge extraction based on the data and their relationships. It can also identify and fill the existing gap in medical decision-making when using clinical guidelines.

Sajad Mazaheri, Maryam Ashoori, Zeynab Bechari
Volume 11, Issue 3 (9-2017)
Abstract

Background and Aim: Heart disease is now very common and is a major cause of mortality. Proper and early diagnosis of this disease is very important. Its diagnostic methods and treatments are very expensive and have many side effects. Therefore, researchers are looking for cheaper ways to diagnose it with high precision. This study aimed to identify a model for the treatment of heart disease.
Materials and Methods: In this descriptive cross-sectional study, the sampling method was census. The sample consisted of data from Khatam and Ali Ibn Abi Talib Hospitals in Zahedan. The data were compiled in an Excel file, and Clementine 12.0 software was used for data analysis. The C5.0, C & R Tree, CHAID, and QUEST algorithms and an artificial neural network were applied to the collected data.
Results: The accuracy of 76.04 achieved by the C & R Tree algorithm indicates that the decision tree algorithms performed better than the neural network.
Conclusion: This study aimed to provide a model for predicting a suitable heart disease treatment in order to reduce treatment costs and help physicians provide better-quality services. Given the considerable risks of invasive diagnostic procedures such as angiography, and the successful experiences of data analysis in medicine, this study has presented a model based on data analysis techniques. The model could be improved by providing a decision support system that helps physicians increase the accuracy of diagnosis in the treatment of diseases.

Seyed Abbas Mahmoodi, Kamal Mirzaie, Seyed Mostafa Mahmoodi
Volume 11, Issue 3 (9-2017)
Abstract

Background and Aim: Gastric cancer is the second leading cause of cancer death in the world. Due to the prevalence of the disease and the high mortality rate of gastric cancer in Iran, the factors affecting the development of this disease should be taken into account. In this research, two data mining techniques, the Apriori and ID3 algorithms, were used to investigate the factors involved in gastric cancer.
Materials and Methods: The data set in this study was collected from 490 people referred to Imam Reza Hospital in Tabriz, including 220 patients with gastric cancer and 270 healthy individuals. The best rules for this data set were extracted with the Apriori algorithm, implemented in MATLAB. The ID3 algorithm was also used to investigate these factors.
Results: The results showed that a history of gastroesophageal reflux has the greatest impact on the incidence of this disease. Some rules extracted with the Apriori algorithm can serve as a model to predict patient status and the incidence of the disease and to investigate the factors affecting it. The prediction accuracy achieved with the ID3 algorithm was 85.56, which is a very good result for the prediction of gastric cancer.
Conclusion: Data mining is very useful, especially for medical data, because of the large volume of data and the unknown relationships between the systemic, personal, and behavioral features of patients. The results of this study could help physicians to identify the factors contributing to the disease and to predict its incidence.
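
The rule extraction described in this abstract rests on two standard quantities, support and confidence, which Apriori uses to filter candidate rules. The following is a minimal Python sketch of that idea, not the study's MATLAB implementation; the toy records and the item names (`reflux`, `smoking`, `cancer`) are hypothetical and are not taken from the study's data set.

```python
from itertools import combinations

# Hypothetical toy records: each set lists the attributes present for one person.
# Illustrative only; these are not values from the study.
records = [
    {"reflux", "smoking", "cancer"},
    {"reflux", "cancer"},
    {"smoking"},
    {"reflux", "smoking", "cancer"},
    {"reflux"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise search with Apriori pruning, up to max_size items per set."""
    items = sorted(set().union(*transactions))
    frequent = {}
    previous = None  # frequent itemsets found at the previous level
    for size in range(1, max_size + 1):
        current = {}
        for candidate in combinations(items, size):
            # Apriori property: every (size - 1)-subset must already be frequent.
            if previous is not None and any(
                sub not in previous for sub in combinations(candidate, size - 1)
            ):
                continue
            s = support(candidate, transactions)
            if s >= min_support:
                current[candidate] = s
        frequent.update(current)
        previous = current
    return frequent

if __name__ == "__main__":
    print(frequent_itemsets(records, min_support=0.6))
    print(confidence({"reflux"}, {"cancer"}, records))
```

A rule such as "reflux implies cancer" would be kept only if both its support and its confidence clear thresholds chosen by the analyst; the thresholds used in the study are not given in the abstract.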

Reza Safdari, Maliheh Kadivar, Parinaz Tabari, Hala Shawky Own
Volume 11, Issue 5 (1-2018)
Abstract

Background and Aim: Neonatal jaundice is of great concern to clinicians worldwide because it is one of the most common conditions requiring clinical care. The aim of this study is to use data classification algorithms to predict the type of jaundice in neonates and thereby prevent irreparable damage in the future.
Materials and Methods: This descriptive study used a neonatal jaundice dataset collected in Cairo, Egypt. After preprocessing the data, classification algorithms such as decision tree, Naïve Bayes, and kNN (k-Nearest Neighbors) were applied, compared, and analyzed in the Orange application.
Results: The decision tree (precision of 94%), Naïve Bayes (precision of 91%), and kNN (precision of 89%) were all able to classify the types of neonatal jaundice. Of these, the decision tree was the most precise classification algorithm.
Conclusion: Classification algorithms can be used in clinical decision support systems to help physicians make decisions about the types of specific diseases, so that physicians can care for patients appropriately and the probable risks for patients can be decreased.
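
Of the classifiers compared in this abstract, kNN is the simplest to show in a few lines: a new case receives the majority label of its k nearest training cases. The sketch below is a plain Python illustration rather than the Orange workflow used in the study; the two features and all values are made up and carry no clinical meaning.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature_1, feature_2) -> jaundice type.
# Feature names, values, and labels are assumptions for illustration only.
train = [
    ((5.0, 3.0), "physiological"),
    ((6.0, 4.0), "physiological"),
    ((5.5, 2.0), "physiological"),
    ((15.0, 1.0), "pathological"),
    ((16.0, 1.5), "pathological"),
    ((14.0, 0.5), "pathological"),
]

if __name__ == "__main__":
    print(knn_predict(train, (15.5, 1.0)))
    print(knn_predict(train, (5.2, 3.0)))
```

In practice the features would usually be scaled before computing distances, since kNN is sensitive to differences in units.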

Mohammad Reza Shahraki, Mahboubeh Mesgar
Volume 13, Issue 1 (5-2019)
Abstract

Background and Aim: The liver, one of the largest internal organs in the body, is responsible for many vital functions, including purifying blood, regulating the body's hormones, and preserving glucose. Therefore, disruptions in its functioning will sometimes be irreparable. Early prediction of liver diseases will help their early and effective treatment. Given the importance of liver diseases and the increasing number of patients, the present study aimed to predict liver disease using data mining algorithms.
Materials and Methods: This descriptive study was performed using 721 records from liver patients in Zahedan. After preprocessing, the data were analyzed with data mining techniques such as Support Vector Machine (SVM), CHAID, Exhaustive CHAID, and boosting C5.0, using IBM SPSS Modeler 18 data mining software.
Results: The validity obtained for the prediction of liver disease was 94.09 for boosting C5.0, 88.71 for Exhaustive CHAID, 87.09 for SVM, and 85.47 for CHAID. The boosting C5.0 algorithm thus performed better than the other algorithms.
Conclusion: According to the rules created by the boosting C5.0 algorithm, the likelihood of a new person developing liver disease can be predicted with high precision.

Raoof Nopour, Mohammad Shirkhoda, Sharareh Rostam Niakan Kalhori
Volume 14, Issue 2 (5-2020)
Abstract

Background and Aim: Colorectal cancer is one of the most common gastrointestinal cancers among human beings and the most important cause of death in the world. Based on the risk of colorectal cancer for individuals, using an appropriate screening program can help to prevent the disease. Therefore, the purpose of this study was to design a model for screening colorectal cancer based on risk factors to increase the survival rate of the disease on the one hand and to reduce the mortality rate on the other.
Materials and Methods: By reviewing articles and patients' records, 38 risk factors were detected. To determine the most important risk factors clinically, the content validity ratio (CVR) was used; for the collected data, the Spearman correlation coefficient and logistic regression analysis were applied for statistical analyses. Then, four algorithms (J-48, J-RIP, PART, and REP-Tree) were used for data mining and rule generation. Finally, the most common model was obtained by comparing the performance of the algorithms.
Results: After comparing the performance of algorithms, the J-48 algorithm with an F-Measure of 0.889 was found to be better than the others.
Conclusion: The results of evaluating J-48 data mining algorithm performance showed that this algorithm could be considered as the most appropriate model for colorectal cancer risk prediction.
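
The abstract does not say which CVR formula was applied. As a minimal sketch, assuming Lawshe's widely used definition, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating an item as essential and N is the panel size, the calculation for one risk factor could look like this. The function name and the example counts are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's content validity ratio for a single item.

    n_essential: number of experts who rated the item as essential.
    n_experts:   total number of experts on the panel.
    Returns a value from -1 (no expert) to +1 (every expert).
    """
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half

if __name__ == "__main__":
    # Hypothetical example: 8 of 10 experts rate a risk factor as essential.
    print(content_validity_ratio(8, 10))
```

Items whose CVR falls below a critical value that depends on the panel size are usually dropped; the cutoff used in the study is not reported in the abstract.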

Mostafa Shanbehzadeh, Hadi Kazemi-Arpanahi, Raoof Nopour
Volume 16, Issue 2 (5-2022)
Abstract

Background and Aim: Breast cancer is one of the most common and aggressive malignancies in women. Timely diagnosis of breast cancer plays an important role in preventing the progression of the disease, enabling timely treatment, and consequently reducing the mortality rate of these patients. Machine learning has the potential to diagnose diseases quickly and cost-effectively. This study aims to design a clinical decision support system (CDSS) based on the rules extracted from the best-performing decision tree algorithm to diagnose breast cancer in a timely and effective manner.
Materials and Methods: The data of 597 people with suspected breast cancer (255 patients and 342 healthy people) were retrospectively extracted from the electronic database of Ayatollah Taleghani Hospital in Abadan city, with 24 characteristics mainly pertaining to lifestyle and medical history. After the most important variables were selected using the Pearson chi-square test and one-way analysis of variance (P<0.05), the performance of selected data mining algorithms, including RF, J-48, DS, RT, and XGBoost, was evaluated for breast cancer diagnosis in Weka 3.4 software. Finally, the breast cancer diagnostic system was designed based on the best model, using the C# programming language and Dot Net Framework V3.5.4.
Results: Fourteen variables, including personal history of breast cancer, breast sampling, chest X-ray, high blood pressure, increased LDL blood cholesterol, presence of a mass in the upper inner quadrant of the breast, hormone therapy with estrogen, hormone therapy with estrogen-progesterone, family history of breast cancer, age, history of other cancers, waist-to-hip ratio, and fruit and vegetable consumption, showed a significant relationship with the output class at P<0.05. Based on the performance evaluation of the selected algorithms, the RF model, with sensitivity, specificity, accuracy, and F-measure equal to 0.97, 0.99, 0.98, and 0.974, respectively, and AUC=0.936, performed better than the other selected algorithms and was suggested as the best model for breast cancer diagnosis.
Conclusion: It seems that using modifiable variables such as lifestyle and reproductive-hormonal characteristics as input to the RF algorithm to design the CDSS can detect breast cancer cases with optimal accuracy. In addition, the proposed system can be effectively adapted to real clinical environments for quick and effective disease diagnosis.
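
The sensitivity, specificity, accuracy, and F-measure reported in this abstract all derive from the four cells of a binary confusion matrix. A minimal Python sketch of those standard definitions follows; the counts in the example are hypothetical and are not the study's confusion matrix, which the abstract does not report.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp: true positives, fp: false positives,
    tn: true negatives, fn: false negatives.
    """
    sensitivity = tp / (tp + fn)            # recall on the positive class
    specificity = tn / (tn + fp)            # recall on the negative class
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    # F-measure (F1): harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_measure": f_measure,
    }

if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in classification_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.3f}")
```

The sketch assumes every denominator is non-zero, that is, both classes are present and at least one case is predicted positive.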

Atefeh Abbasi, Somayeh Nasiri, Sayyed Mostafa Mostafavi, Abbas Habibolahi
Volume 19, Issue 4 (11-2025)
Abstract

Background and Aim: Neonatal hypoxic-ischemic encephalopathy (HIE) is a clinical syndrome characterized by impaired brain function resulting from oxygen deprivation and reduced cerebral blood flow. Developing predictive models can serve as valuable tools for physicians in forecasting disease outcomes and facilitating early interventions. The present study was conducted with the aim of constructing a predictive model for neonatal hypoxic-ischemic encephalopathy using data mining algorithms.
Materials and Methods: This applied study was conducted using a descriptive approach. In the first stage, the factors influencing the prediction of neonatal hypoxic-ischemic encephalopathy were identified through expert surveys. In the second stage, data pertaining to 4,000 neonates were collected from the Iman system, available in the database of the Ministry of Health and Medical Education, during the years 2020–2021. Following preprocessing, a dataset comprising 3,962 records with 13 features was extracted. Subsequently, predictive models were developed using algorithms including artificial neural networks, decision tree variants, random forest, support vector machines, logistic regression, and Bayesian networks. Model construction was performed using the Python programming language within the Anaconda environment. Finally, performance evaluation and comparison were carried out using metrics such as accuracy, precision, specificity, F1-score, and the Area Under the Curve (AUC).
Results: The findings of the study revealed that the Area Under the Receiver Operating Characteristic Curve (AUROC) for models developed using logistic regression, artificial neural networks, random forest, Bayesian networks, support vector machines, and decision trees were 86%, 86%, 84%, 82%, 76%, and 74%, respectively. The highest performance was achieved by the logistic regression algorithm, with an accuracy of 81%, sensitivity of 85%, and specificity of 96%. The greatest sensitivity was observed in logistic regression, artificial neural networks, and support vector machines, whereas the naïve Bayesian algorithm demonstrated the lowest performance metrics. In the predictive model for hypoxic-ischemic encephalopathy, the most influential feature was the first-minute Apgar score, while the least influential factor was delivery outside the hospital.
Conclusion: The findings of the present study indicated that the predictive model for neonatal hypoxic-ischemic encephalopathy based on the logistic regression algorithm demonstrated superior performance. It is anticipated that the application of practical data-driven algorithms for neonates with hypoxic-ischemic encephalopathy will play a crucial role in the rapid identification of the condition and the provision of appropriate treatment. Such approaches can enable healthcare professionals to act within the critical window of opportunity, thereby improving the quality of care, preventing disease progression, and reducing the severity of adverse outcomes.
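
The AUROC values compared in this abstract can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition in plain Python; it is an illustration under that assumption, not the study's Python code, and the labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive case).
    scores: predicted scores, higher meaning more likely positive.
    Tied positive/negative pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))

if __name__ == "__main__":
    # Hypothetical labels and model scores for illustration only.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
    print(round(auroc(labels, scores), 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a few thousand records; rank-based formulas give the same value more efficiently.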



© 2026 , Tehran University of Medical Sciences, CC BY-NC 4.0
