Showing 15 results for Logistic Regression
M.a Pohrhoseingholi, H Alavi Majd, A.r Abadi, S Parvanehvar,
Volume 1, Issue 1 (12-2005)
Abstract
Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data: they have generally focused on missing outcome values. But covariate values can also be missing.
Materials and Methods: In this paper we study the imputation of missing values by the EM algorithm with an auxiliary variable and compare the results with complete-case analysis in a logistic regression model dealing with factors that influence the choice of the delivery method.
Our data came from a cross-sectional study of factors associated with the choice of the delivery method in pregnant women. The sample size in this cross-sectional study was 365 and the data were collected through interviews, using questionnaires covering several demographic variables, delivery history, attitude, and some social factors. We used standard deviations to compare the efficiency of the two methods.
Results: The results show that maximum likelihood analysis by the EM algorithm is more effective than complete-case analysis.
Conclusion: The problem of missing data is common in surveys, and it causes bias and decreased model efficacy. Here we show that the EM algorithm for imputation in logistic regression with missing values for a discrete covariate is more effective than complete-case analysis.
On the other hand, if missing values occur for a continuous covariate, then other methods must be used or the variable must be converted into a discrete one.
Ma Pourhosseingholi, Y Mehrabi, H Alavi-Majd, P Yavari,
Volume 1, Issue 3 (2-2006)
Abstract
Background and Objectives: Logistic regression is one of the most widely used generalized linear models for analysis of the relationships between one or more explanatory variables and a categorical response. Strong correlations among explanatory variables (multicollinearity) reduce the efficiency of the model to a considerable degree. In this study we used latent variables to reduce the effects of multicollinearity in the analysis of a case-control study.
Methods: Our data came from a case-control study in which 300 women with breast cancer were compared to 300 controls. Five highly correlated quantitative variables were selected to assess the effect of multicollinearity. First, an ordinary logistic regression model was fitted to the data. Then, to remove the effect of multicollinearity, two latent variables were generated using factor analysis and principal components analysis. The parameters of the logistic regression were estimated using these latent variables as explanatory variables. We used the estimated standard errors of the parameters to compare the efficiency of the models.
Results: The logistic regression based on five primary variables produced unusual odds ratio estimates for age at first pregnancy (OR=67960, 95%CI: 10184-453503) and for total length of breast feeding (OR=0). On the other hand, the parameters estimated for logistic regression on latent variables generated by both factor analysis and principal components analysis were statistically significant (P<0.003). The standard errors were smaller than with ordinary logistic regression on original variables. The factors and components generated by the two methods explained at least 85% of the total variance.
Conclusions: This research showed that the standard errors of the estimated parameters in logistic regression based on latent variables were considerably smaller than those of the model based on the original variables. Therefore, models including latent variables could be more efficient when there is multicollinearity among the risk factors for breast cancer.
A Ahmadi, J Hasanzadeh, A Rajaefard,
Volume 4, Issue 2 (9-2008)
Abstract
Background & Objectives: Hypertension is one of the most prevalent and important risk factors for cardiovascular diseases. The aim of this research was to determine the factors related to hypertension in Kohrang.
Methods: This survey was a population-based case-control study. The study population consisted of 415 patients with hypertension (cases) and 415 controls without any history of cardiovascular and/or cerebrovascular diseases or hypertension. Systematic random sampling was used. The chi-square test and a conditional logistic regression model were used, and the data were analyzed with Stata.
Results: Family history of hypertension, age over 60, no physical activity, and BMI ≥ 30 were identified as risk factors, with odds ratios of 2.33 (95% CI 1.58-3.47), 2.01 (95% CI 1.24-2.67), 1.8 (95% CI 1.2-2.7), and 1.66 (95% CI 1.32-2.07), respectively (p<0.05). Fish consumption, unsaturated fat consumption, and literacy were identified as protective factors, with odds ratios of 0.516 (95% CI 0.35-0.69), 0.514 (95% CI 0.36-0.72), and 0.28 (95% CI 0.17-0.45), respectively (p<0.01).
Conclusions: The findings of this study highlight the need for health policy makers to plan appropriate health promotion programmes.
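The odds ratios and 95% confidence intervals reported above come from a conditional logistic regression model. As a simpler illustration of how such a figure is formed, the following minimal sketch computes an unadjusted odds ratio and its Wald interval from a single 2x2 table. The function name `odds_ratio_wald_ci` and all counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases,    b: unexposed cases
    c: exposed controls, d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not data from the study).
or_est, ci_lower, ci_upper = odds_ratio_wald_ci(a=40, b=60, c=20, d=80)
print(f"OR = {or_est:.2f}, 95% CI {ci_lower:.2f}-{ci_upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at roughly the 5% level. A regression model additionally adjusts each odds ratio for the other covariates, which this sketch does not do.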
Z Asadollahi, P Jafari, M Rezaeian,
Volume 10, Issue 1 (6-2014)
Abstract
Background & Objectives: Due to the increasing tendency to measure the quality of life in recent years and the extensive quality of life questionnaires, it is important to determine the appropriate method of analyzing data derived from these studies. The aim of the present study was to introduce ordinal logistic regression models as an appropriate method for analyzing the data of quality of life.
Methods: The data were derived from a cross-sectional quality of life survey of 938 students. For data analysis, two binary logistic regression models and ordinal logistic regression models were used, and the results of these models were compared.
Results: The goodness-of-fit results showed that all three models fit well. Based on the ordinal logistic regression models, three of the explanatory variables were statistically associated with the response, while based on the binary logistic regression model, after combining two categories of the response variable, only two variables were significant. Therefore, combining the categories of the response variable should be avoided as much as possible because it may lead to data loss due to ignoring some of the response categories.
Conclusion: It is concluded that, due to the nature of the response variable, ordinal logistic regression models are recommended for analyzing quality of life data, considering the fewer parameter estimates and easier interpretation of the results.
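For orientation, one common form of ordinal logistic regression is the proportional odds (cumulative logit) model, written below. The abstract does not state which form was fitted, so this is an assumption; the symbols are illustrative: Y is a response with J ordered categories, x is the covariate vector, the alpha_j are category thresholds, and beta is a single coefficient vector shared across all thresholds.

```latex
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  = \log \frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
  = \alpha_j - \boldsymbol{\beta}^{\top}\mathbf{x},
  \qquad j = 1, \dots, J-1
```

Under this form, one set of slopes serves every cut point of the response, whereas fitting a separate binary model at each cut point estimates a separate set of slopes each time. This is one way to read the reference to fewer parameter estimates.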
A Asadabadi, A Bahrampour, Aa Haghdoost,
Volume 10, Issue 3 (12-2014)
Abstract
Background and Objectives: In recent years, considerable attention has been paid to statistical models for classification of medical data according to various diseases and their outcomes. Artificial neural networks have been successfully used for pattern recognition and prediction since they are not based on prior assumptions in clinical studies. This study compared two statistical models, artificial neural network and logistic regression, to predict the survival of patients with breast cancer.
Methods: Two models were applied to cancer registry data from Kerman, southeast Iran, to predict survival. The data of 712 breast cancer patients aged 15 to 85 years were used in this study. The logistic regression and three-layer perceptron neural network models were compared in terms of predicting survival. Sensitivity, specificity, prediction accuracy, and the area under the ROC curve were used for comparing the two models.
Results: In this study, the sensitivity and specificity of the logistic regression and artificial neural network models were (0.594, 0.70) and (0.621, 0.723), respectively. Prediction accuracy and the area under the ROC curve for the two models were (0.688, 0.725) and (0.70, 0.725), respectively.
Conclusion: Although there were insignificant differences in the performance of the two models for predicting the survival of the patients with breast cancer, the corresponding results of artificial neural network were more appropriate for predicting survival in such data.
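The four comparison criteria named above (sensitivity, specificity, prediction accuracy, and area under the ROC curve) can be computed from a model's predicted probabilities as in the minimal sketch below. The labels, scores, 0.5 threshold, and function names are hypothetical and serve only to illustrate the definitions; they are not the study's data or code.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def auc_by_pairs(y_true, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case has the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.35, 0.7, 0.65]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = classification_metrics(y_true, y_pred)
auc = auc_by_pairs(y_true, scores)
print(metrics, round(auc, 3))
```

Sensitivity, specificity, and accuracy depend on the chosen cut-off, while the area under the ROC curve summarizes performance across all cut-offs, which is why two models can rank differently on the two kinds of criteria.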
H Akbarein, Ar Bahonar, S Bokaie, N Mosavar, A Rahimi-Foroushani, H Sharifi, As Makenali, Nd Rokni, B Marhamati-Khameneh, S Broumanfar,
Volume 10, Issue 3 (12-2014)
Abstract
Background & Objectives: Bovine tuberculosis (BTB) is one of the most important zoonoses. Mycobacterium bovis is the causative agent of BTB in cattle. The current study was conducted to investigate the determinants of BTB in dairy farms covered by the tuberculin screening test.
Methods: A herd-level case-control study was carried out in 124 dairy farms (62 cases and 62 controls) in the provinces of Tehran, Alborz, Hamedan, Isfahan, Qazvin, Qom, Mazandaran, and Semnan. The control farms were individually matched with case farms by farm capacity and distance. Statistical analyses were done in Stata 11.2 using conditional logistic regression.
Results: Proper management of manure (OR=0.12, 95% CI: 0.03-0.49), regular flaming of stalls (OR=0.21, 95% CI: 0.04-0.92), and complete fencing around the farm (OR=0.17, 95% CI: 0.03-0.81) decreased the risk of infection, while the presence of rodents (rats) (OR=4.90, 95% CI: 1.04-23.01) increased it. The interactions among these variables were not statistically significant.
Conclusion: According to the results, there is an essential need to pay more attention to rodent control in farms.
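Because each control farm was individually matched to a case farm, the analysis used conditional logistic regression. For the special case of 1:1 matching with a single binary exposure, the conditional estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below illustrates only that special case; the function name and the pair counts are hypothetical and are not data from the study.

```python
import math


def matched_pair_odds_ratio(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched pairs with a binary exposure.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    Only discordant pairs carry information about the odds ratio.
    """
    case_only = sum(1 for case_exp, ctrl_exp in pairs if case_exp and not ctrl_exp)
    ctrl_only = sum(1 for case_exp, ctrl_exp in pairs if ctrl_exp and not case_exp)
    odds_ratio = case_only / ctrl_only
    # Large-sample standard error of log(OR) based on the discordant pairs.
    se_log_or = math.sqrt(1 / case_only + 1 / ctrl_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical matched pairs, for illustration only.
example_pairs = (
    [(True, False)] * 20    # case exposed, control unexposed
    + [(False, True)] * 5   # control exposed, case unexposed
    + [(True, True)] * 10   # concordant: both exposed
    + [(False, False)] * 15 # concordant: neither exposed
)
mp_or, mp_lower, mp_upper = matched_pair_odds_ratio(example_pairs)
print(f"matched OR = {mp_or:.2f}, 95% CI {mp_lower:.2f}-{mp_upper:.2f}")
```

With few discordant pairs the interval becomes wide, which is consistent with the broad intervals that small matched studies tend to report.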
M Aram Ahmadi, A Bahrampour,
Volume 11, Issue 3 (11-2015)
Abstract
Background and Objectives: Diabetes is a chronic and common metabolic disease which has no curative treatment. Logistic regression (LR) is a multivariate statistical technique used for analysis and prediction. Discriminant analysis (DA) is a method for separating observations by the levels of the dependent variable; once the discriminant functions have been constructed, it can allocate any new observation to a group. The aim of this study was to compare these two methods and determine the effective variables in type 2 diabetes.
Methods: The data included 5357 persons obtained through a cohort study in Kerman, southeastern Iran, in 2009-11. Diabetes was considered the response variable. After removing collinear and correlated variables, the independent variables included height, waist circumference, age, gender, occupation, education, drugs, systolic blood pressure, HDL, LDL, drug abuse, activities, and triglyceride. Sensitivity, specificity, accuracy, and the ROC curve were applied for determining and comparing the prediction power of the models.
Results: In the reduced model containing the significant variables extracted from the full model, the sensitivity of the LR model and DA was 74% and 22.4%, the specificity was 71.1% and 95.4%, the prediction accuracy was 71.5% and 85.3%, and the area under the ROC curve was 80.3% and 80.1%, respectively. Simulation showed that the sensitivity, specificity, accuracy, and area under the ROC curve were 99.18%, 98.49%, 98.59%, and 99.9% for the LR model and 92.62%, 99.19%, 98.26%, and 99.56% for DA, respectively.
Conclusion: The results showed that the risk factors of diabetes in the reduced logistic regression model were waist circumference, age, gender, LDL level, systolic pressure, and drugs. Also, the sensitivity of the LR model was higher than that of DA, while DA had a higher specificity and prediction accuracy. Comparison of the ROC curves showed that the predicted values were rather similar in the two models, and the two models were the same asymptotically.
F Zayeri, Sh Seyedagha, H Aghamolaie, F Boroumand, P Yavari,
Volume 12, Issue 2 (8-2016)
Abstract
Background and Objectives: Breast cancer is one of the most common malignancies in women which accounts for the highest number of deaths after lung cancer. The aim of the current study was to compare the logistic regression and classification tree models in determining the risk factors and prediction of breast cancer.
Methods: We used data from a case-control study conducted on 303 patients with breast cancer and 303 controls. In the first step, we included 16 potential risk factors for breast cancer in both the logistic regression and classification tree models. Then, the area under the ROC curve (AUC), sensitivity, and specificity were used for comparing these models.
Results: Of the 16 variables included in the models, 5 variables were statistically significant in both models. Sensitivity, specificity, and AUC were 71%, 69%, and 74.7% for the logistic regression and 63.3%, 68.8%, and 71.1% for the classification tree, respectively.
Conclusion: The obtained results suggest that the classification tree has more power for separating patients from healthy people. Menopausal status, number of breast cancer cases in the family, and maternal age at the first live birth were significant indicators in both models.
F Zayeri, M Amini, H Hasanzadeh,
Volume 13, Issue 4 (3-2018)
Abstract
Background and Objectives: Shift work as a pervasive phenomenon in various industrial sectors is one of the most stressful factors in the workplace. Considering the contradictory reports on the relationship of shift work and hypertension, the main objective of the present study was to investigate the relationship between these two variables among petrochemical industry staff of Mahshahr, Iran.
Methods: In this longitudinal study, 3254 petrochemical staff were investigated during 2008-2011. According to their work schedule, the workers were divided into two groups of shift work and day work (1872 day workers and 1382 shift workers). The aim of this research was to assess the effect of shift work on hypertension after adjusting for confounding variables such as gender, age, body mass index, and smoking. The data were analyzed using a random-effects logistic regression model.
Results: Of 3254 subjects (3142 males and 112 females), 37.85% (860 subjects) were hypertensive. The random-effects model, controlling for covariates, showed no significant relationship between shift work and hypertension (OR=1.04, 95% CI: 0.98-1.10). Moreover, the variance of the random effects was significant.
Conclusion: Generally, according to the results of this study, shift work is not a significant risk factor for hypertension.
F Feizmanesh, Aa Safaei,
Volume 14, Issue 3 (12-2018)
Abstract
Background and Objectives: Pulmonary embolism is a potentially fatal and prevalent event that has led to a gradual increase in the number of hospitalizations in recent years. For this reason, it is one of the most challenging diseases for physicians. The main purpose of this paper was to report a research project comparing different data mining algorithms to select the most accurate model for predicting pulmonary embolism in hospitalized patients. This model would provide the knowledge needed by the medical staff for better decision making.
Methods: In this research, we designed a prediction model using different machine learning methods that would best predict the probability of pulmonary embolism in patients at risk. Among data mining algorithms, Bayesian network, decision tree (J48), logistic regression (LR), and sequential minimal optimization (SMO) were used. The data used in the study included risk factors and past history of patients admitted to the Lung Department of Shariati Hospital, Tehran, Iran.
Results: The results showed that the accuracy and specificity of all prediction models were satisfactory. The Bayesian model had the highest sensitivity in predicting pulmonary embolism.
Conclusion: Although the results showed little difference in the performance of the prediction models, the Bayesian model is a more appropriate tool for predicting the occurrence of pulmonary embolism in hospitalized patients in this type of data. It can be considered a supportive approach alongside medical decisions to improve disease prediction.
S Dehghani, A Abadi, M Namdari, Z Ghorbani,
Volume 14, Issue 4 (3-2019)
Abstract
Background and Objectives: Periodontal disease is one of the most common oral health problems. Clinical attachment loss (CAL) occurs in severe periodontal cases (CAL>3mm). In this study, we applied a classic regression model and models that consider the hierarchical structure of the data to estimate and compare the effect of different factors on CAL.
Methods: This cross-sectional study was performed in 375 pregnant women and 192 mothers of three-year-old children. The data were gathered from 16 health networks of Shahid Beheshti University of Medical Sciences, Tehran, Iran. CAL was determined for 6 teeth per person by a dentist according to the WHO standard oral health examination form. Three-level and ordinary logistic regression analyses were applied for data analysis using Stata 14.
Results: Of 3,402 examined teeth, 6.3% had CAL>3mm. Based on the obtained results, the odds of CAL>3mm were 2.4 times higher in the third trimester compared to non-pregnant women. The odds of CAL>3mm were 2.86 times higher in women without daily floss use compared to women with routine daily floss use. Posterior teeth were more likely to have CAL>3mm than anterior teeth (OR = 1.65) (P-value < 0.05).
Conclusion: According to the AIC index, the multilevel logistic regression model has a better fit than the ordinary logistic regression model and can estimate the coefficients of factors related to CAL>3mm more precisely. The use of the ordinary logistic regression model with hierarchical data can result in underestimated standard errors of the estimated parameters.
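The model comparison above rests on the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L is the maximized log-likelihood; the model with the lower AIC is preferred. The sketch below shows the arithmetic only. The log-likelihoods and parameter counts are hypothetical and are not values reported by the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted values, for illustration only.
aic_ordinary = aic(log_likelihood=-760.0, n_params=5)
# The multilevel model spends extra parameters on random-effect variances.
aic_multilevel = aic(log_likelihood=-700.0, n_params=7)

preferred = "multilevel" if aic_multilevel < aic_ordinary else "ordinary"
print(aic_ordinary, aic_multilevel, preferred)
```

The penalty term 2k means that the additional variance parameters of a multilevel model must buy a sufficiently large gain in log-likelihood before that model is preferred.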
Am Keshtvarz Hesam Abadi, E Hajizadeh, Ma Pourhoseingholi, E Nazemalhossein Mojarad,
Volume 14, Issue 4 (3-2019)
Abstract
Background and Objectives: The purpose of this study was to predict the mortality rate of colorectal cancer in Iranian patients and determine the effective factors on the mortality of patients with colorectal cancer using random forest and logistic regression methods.
Methods: In this retrospective study, data from 304 patients in the colorectal cancer registry of the Gastroenterology and Liver Research Center of Shahid Beheshti University of Medical Sciences during the years 2009 to 2014 were used. Data analysis was performed using random forest and logistic regression methods in R version 3.4.3.
Results: Ten important variables related to colorectal cancer deaths were selected by the random forest method. Several criteria, such as the area under the receiver operating characteristic curve (AUC), were used to compare the random forest method with logistic regression. According to both criteria, the five most important variables ranked by random forest were cancer stage, age of diagnosis, patient's age, HLA, and degree of differentiation (tumor differentiation). In terms of the different criteria, the random forest method performed better than logistic regression (the area under the ROC curve was 98% for random forest and 80% for logistic regression).
Conclusion: Variables such as cancer stage, age of diagnosis, patient's age, HLA, and degree of differentiation are considered the most important factors affecting mortality in colorectal cancer, and patients' longevity can be increased through early diagnosis of cancer and screening programs.
M Chehrazi, R Omani Samani, E Tehraninejad, H Chehrazi, A Arabipoor,
Volume 14, Issue 4 (3-2019)
Abstract
Background and Objectives: Analysis of ordinal outcome data can lead to biased estimates and large variance when the data are sparse. The objective of this study was to compare parameter estimates of an ordinal regression model under maximum likelihood and under a Bayesian framework with generalized Gibbs sampling. The models were used to analyze ovarian hyperstimulation syndrome data.
Methods: This study used the data from 138 patients of a phase III clinical trial comparing the efficacy of intravenous albumin and cabergoline in the prevention of ovarian hyperstimulation syndrome. The original study was done between 2010 and 2011 at Royan Institute. We compared maximum likelihood and Bayesian estimation with generalized Gibbs sampling for an ordinal regression model based on confidence intervals and standard errors. The models were fitted using R version 3.3.2.
Results: Markov chain Monte Carlo estimation reduced the standard errors of the estimates and consequently produced narrower confidence intervals. Autocorrelations for the generalized Gibbs sampler reached zero in a shorter time than those for the standard Gibbs sampler.
Conclusion: It seems that the confidence intervals of an ordinal regression model are shorter for the generalized Gibbs sampler compared to standard Gibbs sampling and maximum likelihood. Further studies are suggested to confirm these results.
Hr Bahrami Taghanaki, E Mosa Farkhani, R Eftekhari Gol, P Bahrami Taghanaki, S Bokaei, A Taghipour, B Beygi,
Volume 16, Issue 3 (11-2020)
Abstract
Background and Objectives: Diabetes is considered as one of the most common endocrine disorders worldwide. The aim of this study was to investigate the factors associated with diabetic complications.
Methods: A case-control study was performed on the data of 70089 diabetic patients (4622 cases and 53613 controls) extracted from the SINA Electronic Health Record (SinaEHR®) in a population covered by Mashhad University of Medical Sciences in 2018. The effect of independent variables on the likelihood of diabetic complications was investigated using single-variable and multivariate logistic regression models with the control of the potential confounding effects.
Results: Using the multivariate logistic regression, the odds of developing diabetic complications were 0.35 (0.31-0.38) for living in the city, 0.73 (0.67-0.79) for living in the suburbs, and 0.31 (0.28-0.33) for living in rural areas relative to the metropolises, 0.84 (0.78-0.91) for illiterate subjects, 0.70 (0.66-0.75) for physical activity, 1.51 (1.34-1.71) for stage 1 hypertension and 1.87 (1.43-2.44) for stage 2 hypertension relative to normal blood pressure, 0.79 (0.74-0.85) for uncontrolled low density lipoprotein, and 1.42 (1.33-1.51) for uncontrolled hemoglobin A1C.
Conclusion: Various risk factors were identified that increase the odds of diabetic complications. The most important risk factors were uncontrolled glycosylated hemoglobin and stage 1 and 2 hypertension. Control of these factors can reduce the chance of diabetic complications in diabetic patients.
Nasrin Talkhi, Nooshin Akbari Sharak, Zahra Rajabzadeh, Maryam Salari, Seyed Masoud Sadati, Mohammad Taghi Shakeri,
Volume 18, Issue 3 (12-2022)
Abstract
Background and Objectives: Due to the high prevalence of COVID-19 disease and its high mortality rate, it is necessary to identify the symptoms, demographic information and underlying diseases that effectively predict COVID-19 death. Therefore, in this study, we aimed to predict the mortality behavior due to COVID-19 in Khorasan Razavi province.
Methods: This study collected data from 51,460 patients admitted to the hospitals of Khorasan Razavi province from 25 March 2017 to 12 September 2014. Logistic regression and neural network methods, both of which are machine learning methods, were used to identify survivors and non-survivors of COVID-19.
Results: Decreased consciousness, cough, PO2 level less than 93%, age, cancer, chronic kidney diseases, fever, headache, smoking status, and chronic blood diseases were the most important predictors of death. The accuracy of the artificial neural network model was 89.90% in the test phase. Also, the sensitivity, specificity, and area under the ROC curve of this model were 76.14%, 91.99%, and 77.65%, respectively.
Conclusion: Our findings highlight the importance of some demographic information, underlying diseases, and clinical signs in predicting survivors and non-survivors of COVID-19. Also, the neural network model provided high accuracy in prediction. However, medical research in this field will lead to complementary results by using other methods of machine learning and their high power.