Background and Aim: Breast cancer is one of the most common and aggressive malignancies in women. Timely diagnosis plays an important role in preventing disease progression, enabling prompt treatment, and consequently reducing patient mortality. Machine learning has the potential to diagnose diseases quickly and cost-effectively. This study aims to design a clinical decision support system (CDSS) based on rules extracted from the best-performing decision tree algorithm in order to diagnose breast cancer in a timely and effective manner.
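The aim rests on turning a fitted decision tree into explicit if-then rules that a CDSS can apply. The following is a minimal Python sketch of that step, assuming a binary tree whose internal nodes each test one numeric feature against a threshold. The classes `Node` and `Leaf`, the helper functions, and the toy tree (features `feature_a`, `feature_b` and their thresholds) are hypothetical illustrations, not the study's model. Each root-to-leaf path becomes one rule. For an ensemble such as RF, every tree would yield its own rule set and the trees' votes would be combined.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    feature: str      # feature tested at this node
    threshold: float  # split point
    left: Tree        # followed when value <= threshold
    right: Tree       # followed when value > threshold


Tree = Union[Leaf, Node]


def extract_rules(tree: Tree, conditions: tuple[str, ...] = ()) -> list[str]:
    """Walk every root-to-leaf path and emit one IF ... THEN ... rule per leaf."""
    if isinstance(tree, Leaf):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN class = {tree.label}"]
    left = extract_rules(tree.left, conditions + (f"{tree.feature} <= {tree.threshold}",))
    right = extract_rules(tree.right, conditions + (f"{tree.feature} > {tree.threshold}",))
    return left + right


def classify(tree: Tree, record: dict[str, float]) -> str:
    """Follow the tree for one record; equivalent to firing the single matching rule."""
    while isinstance(tree, Node):
        tree = tree.left if record[tree.feature] <= tree.threshold else tree.right
    return tree.label


# Hypothetical toy tree: feature names and thresholds are invented for illustration.
toy_tree = Node("feature_a", 0.5,
                Leaf("healthy"),
                Node("feature_b", 50.0, Leaf("healthy"), Leaf("cancer")))

for rule in extract_rules(toy_tree):
    print(rule)
```

Running it prints one rule per leaf. Because the rules partition the feature space, a record always satisfies exactly one of them, which is why `classify` can walk the tree directly instead of scanning the rule list.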
Materials and Methods: Data on 597 individuals suspected of having breast cancer (255 patients and 342 healthy individuals), described by 24 features mainly pertaining to lifestyle and medical history, were retrospectively extracted from the electronic database of Ayatollah Taleghani Hospital in Abadan. After the most important variables were selected using Pearson's chi-square test and one-way analysis of variance (ANOVA) (P<0.05), the performance of selected data mining algorithms, namely random forest (RF), J48, decision stump (DS), random tree (RT), and XGBoost, was evaluated for breast cancer diagnosis in Weka 3.4. Finally, a breast cancer diagnostic system was designed based on the best model using the C# programming language and .NET Framework V3.5.4.
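The variable-screening step can be sketched as follows. The sketch assumes each categorical predictor is binary (present/absent) and is tested against the binary outcome, so Pearson's chi-square test has one degree of freedom and its p-value reduces to `erfc(sqrt(x / 2))`. For continuous predictors such as age, the one-way ANOVA F statistic compares between-group with within-group variance. The function names and example counts are hypothetical, not data from the study. No Yates continuity correction is applied here, whereas some packages (for example SciPy's `chi2_contingency`) apply it to 2x2 tables by default, so p-values may differ slightly.

```python
from __future__ import annotations

import math


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for a 2x2 contingency table.

    Rows: feature absent/present; columns: healthy/cancer.
    Assumes every row and column total is non-zero.
    With 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


def one_way_anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def keep_feature(p_value: float, alpha: float = 0.05) -> bool:
    """Retain a variable when its association with the outcome is significant."""
    return p_value < alpha


# Hypothetical counts: rows = feature absent/present, columns = healthy/cancer.
stat, p = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, p = {p:.4f}, keep = {keep_feature(p)}")
# Hypothetical continuous values for the healthy and cancer groups.
print(f"F = {one_way_anova_f([[1, 2, 3], [4, 5, 6]]):.2f}")
```

The F statistic would then be referred to the F distribution with (k - 1, n - k) degrees of freedom to obtain its p-value, which is left to a statistics library here to keep the sketch dependency-free.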
Results: Fourteen variables, including personal history of breast cancer, history of breast sampling, history of chest X-ray, high blood pressure, elevated LDL cholesterol, presence of a mass in the upper inner quadrant of the breast, estrogen hormone therapy, estrogen-progesterone hormone therapy, family history of breast cancer, age, history of other cancers, waist-to-hip ratio, and fruit and vegetable consumption, showed a significant relationship with the output class (P<0.05). In the performance evaluation, the RF model outperformed the other selected algorithms, with a sensitivity, specificity, accuracy, and F-measure of 0.97, 0.99, 0.98, and 0.974, respectively, and an AUC of 0.936; it was therefore proposed as the best model for breast cancer diagnosis.
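The reported measures follow standard definitions over a confusion matrix in which cancer is the positive class. The sketch below computes them, together with AUC estimated as the probability that a randomly chosen patient receives a higher model score than a randomly chosen healthy individual (ties counted as one half). The function names and all counts and scores in the usage lines are hypothetical, since the abstract does not report the confusion matrix. Tools such as Weka also report per-class and weighted-average figures, so values computed this way need not match a published table exactly.

```python
from __future__ import annotations


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary diagnostic measures with cancer as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),                 # true-positive rate (recall)
        "specificity": tn / (tn + fp),                 # true-negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        # Harmonic mean of precision and sensitivity, written as 2TP / (2TP + FP + FN).
        "f_measure": 2 * tp / (2 * tp + fp + fn),
    }


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical confusion-matrix counts and scores, for illustration only.
print(diagnostic_metrics(tp=90, fp=20, tn=80, fn=10))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a dataset of a few hundred records; a rank-based computation gives the same value more efficiently for larger data.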
Conclusion: Using modifiable variables, such as lifestyle and reproductive-hormonal characteristics, as inputs to an RF-based CDSS appears to allow breast cancer cases to be detected with high accuracy. In addition, the proposed system can be adapted for use in real clinical settings to support rapid and effective diagnosis.