Classification and regression tree for characterising smoking patterns among adults: evidence from global adult tobacco survey, Bangladesh
Jahangirnagar University, Savar, Dept. of Statistics, Bangladesh
Publication date: 2018-03-01
Tob. Induc. Dis. 2018;16(Suppl 1):A296
ABSTRACT
Background:
Tobacco consumption is a preventable public health problem. Many
tobacco-related studies have employed logistic regression, and they have mostly
analysed categorical variables with dichotomous outcomes. In comparison with
logistic regression, classification and regression tree (CART), a data mining
technique, has not been widely applied in tobacco-related research, although it
offers considerable advantages over other methods. Therefore, this study
examines smoking patterns among adults using the CART method and compares the
findings with those of other, traditional techniques.
Methods:
The dataset covered a nationally representative sample of 9,629 respondents
extracted from the Global Adult Tobacco Survey, Bangladesh. CART was used
because of its suitability relative to other techniques such as binary logistic
regression, multinomial logistic regression, chi-squared automatic interaction
detector (CHAID), and quick, unbiased, efficient statistical tree (QUEST).
Results:
CART was used to characterise cigarette smoking behaviour among adults aged
15 years and above in Bangladesh. The algorithm builds a tree model to classify
"average number of cigarettes smoked per day" using selected attributes as
predictors. CART was found to be easier to understand than other data mining
techniques. A logistic regression model requires a parametric assumption (PA)
about the dependent variable. However, this PA is often restrictive when the
data are a mixture of categorical and continuous variables. Therefore, CART is
appropriate because it: (i) is purely non-parametric and independent of
distributional assumptions; (ii) can handle both continuous and categorical
data; (iii) can use skewed or multi-modal data without requiring the
independent variables to be normally distributed; (iv) can handle missing data;
(v) is a relatively automatic "machine learning" method; (vi) needs less input
for analysis; and (vii) produces a visual tree whose results are simple to
interpret even for non-statisticians.
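To illustrate how a CART-style tree can pick a split on either a continuous or a categorical predictor, the sketch below searches one predictor for the binary split with the lowest weighted Gini impurity. It is a minimal sketch under stated assumptions, not the analysis carried out in this study: the six records, the predictor names (age, residence) and the outcome labels (light, heavy) are hypothetical and are not taken from the Global Adult Tobacco Survey data. Categorical predictors are split one category against the rest, which is a simplification of the subset search a full CART implementation performs.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Return (weighted_gini, rule) for the best binary split on one feature.

    Numeric features are split as `value <= threshold`; categorical (string)
    features are split as `value == category` (one category versus the rest).
    Returns None if no split separates the rows into two non-empty groups.
    """
    values = sorted({row[feature] for row in rows})
    numeric = all(isinstance(v, (int, float)) for v in values)
    # For a numeric feature, the largest value would send every row left.
    candidates = values[:-1] if numeric else values
    best = None
    for v in candidates:
        if numeric:
            goes_left = [row[feature] <= v for row in rows]
            rule = f"{feature} <= {v}"
        else:
            goes_left = [row[feature] == v for row in rows]
            rule = f"{feature} == {v}"
        left = [y for y, flag in zip(labels, goes_left) if flag]
        right = [y for y, flag in zip(labels, goes_left) if not flag]
        if not left or not right:
            continue
        # Impurity of the two child nodes, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, rule)
    return best


# Hypothetical toy records: one continuous and one categorical predictor.
rows = [
    {"age": 19, "residence": "urban"},
    {"age": 23, "residence": "rural"},
    {"age": 27, "residence": "urban"},
    {"age": 34, "residence": "rural"},
    {"age": 41, "residence": "urban"},
    {"age": 52, "residence": "rural"},
]
# Hypothetical outcome classes for "average number of cigarettes smoked per day".
labels = ["light", "light", "light", "heavy", "heavy", "heavy"]

# The root split is the best rule across all predictors.
root = min((best_split(rows, labels, f) for f in ("age", "residence")),
           key=lambda result: result[0])
print(root)
```

No distributional assumption enters the search: each candidate rule is scored only by how cleanly it separates the outcome classes, and a full tree would repeat the same search within each resulting subgroup.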
Conclusions:
Among the techniques used so far to characterise smoking patterns among
adults, CART compared favourably on all aspects considered and is suggested for
future research.