Starting from:

$30

Factors Affecting Children’s First Smoking

Factors Affecting Children’s First Smoking

1 Introduction
The dataset is coming from the 2014 American National Youth Tobacco Survey, it provides the data related to the
situation of smoking among young Americans. In this report, we will focus on addressing two hypotheses. First of
all, whether geographic variation(between states) in the mean age children first try cigarettes is substantially greater
than variation amongst schools. Secondly, whether two non-smoking children have the same probability of trying
cigarettes within the next month, given the same confounders and random effects.
2 Method
We will fit a Weibull survival model to analyze the hypotheses given the Weibull distribution is approximately a good
fit in our model. Then, we add school and state as random effects in our model in order to analyze the first hypothesis.
In addition, some confounding variables should be added and treated as fixed effects, such as sex, rural/urban and
ethnicity. Given the following model:
Zijk|Yijk, Aijk, Schoolij , Statei = min(Yijk, Aijk)
Eijk, Aijk, Schoolij , Statei = I(Yijk < Aijk)
Yijk|Schoolij , statei ∼ W eibull(λijk, α)
λijk = exp(−ηijk)
ηijk = β0 + β1IRural,ijk + β2IF emale,ijk + β3IBlack,ijk + β4ISpanic,ijk + β5IAsian,ijk
+β6INative,ijk + β7IP acif ic,ijk + Schoolij + Statei
Statei ∼ N(0, σ2
u
)
Schoolij simN(0, σ2
v
)
Noted Aijk is individual’s age. When Eijk = 1, then Yijk = Zijk. When Eijk = 0, then Zijk < Y ijk < ∞
Where Yijk is the age of child k first tried cigarette in state i and school j. Xijkβ/(β0 +... +β7IP acif ic,ijk) contains
an intercept, and all the fixed effects, including sex, rural/urban and ethnicity. Statei and Schoolij are both random
effects.
0.025quant mean 0.975quant
0.271 1 3.694
Table 1: Mean and 95%CI of shape prior
In terms of prior distribution, we firstly put Log-Normal(log(1), 2/3) for shape(α) parameter, it is consistent with our
prior assumption based on Table 1. Since Weibull shape parameter that allows for the mean value 1 can make a flat
hazard function, also, the shape parameter can not be 4 or 5, because they are not in the 95% CI. In addition, we
use penalized complexity prior that follows the exponential distribution, for State σ
2
u
and School σ
2
v
, with P(σu
0.7) = 0.01 and P(σv 0.13) = 0.01. (Mathematically, 0.7 is computed by log(10)/3 and 0.13 is computed by
log(1.5)/3 since 3 standard deviations account for about 99% data, then σu ∼ exp(6.6) and σv ∼ exp(35.4)) . They
satisfy two assumptions: the variability in the rate of smoking initiation between states with some states having
double or triple the rate of smoking update compared other states but unlikely to see at 10; the ‘worst’ schools are
1
expected to have at most 50% greater rate than the ‘healthiest’ schools. Based on the Figure 1 shown below, our prior
chosen looks appropriate.
Figure 1: Posterior and Prior Distribution for Shape, School and State
3 Result
mean 0.025quant 0.975quant
(Intercept) 1.857 1.960 1.758
RuralUrbanRural 0.893 0.947 0.843
SexF 1.051 1.072 1.030
Raceblack 1.057 1.094 1.022
Racehispanic 0.967 0.994 0.941
Raceasian 1.213 1.299 1.136
Racenative 0.912 0.989 0.844
Racepacific 0.882 1.019 0.775
SD for school 0.144 0.122 0.173
SD for state 0.059 0.027 0.103
alpha parameter for weibullsurv 2.985 2.901 3.070
Table 2: Posterior Means and Quantiles for Model Parameters.
2
The Table 2 provides the natural scale of the parameters of fixed effects. Based on our model shown above, we simply
take the negative and exponential of original estimators since λijk = exp(−ηijk).
3.1 Hypothesis 1: variation between state and school
To addressing the first hypothesis, we can focus on the estimators of standard deviation(SD) for school and state
shown in the Table 2. First of all, both 95% CI of SD do not included 0, which indicates state and school as random
effects are significant. In other words, the age children first try cigarettes differ from both state and school. If we
want to compare the variation, we found that the mean of standard deviation for school is more than twice the mean
of standard deviation for state given their value are 0.144 and 0.059 respectively. Thus, tobacco control programs
should target the schools with the earliest smoking ages and not concern themselves with finding particular states
where smoking is a problem.
3.2 Hypothesis 2: probability of trying cigarettes for non-smoking children
Figure 2: The Cumulative Hazard Function
To addressing the second hypothesis, we can first look at the estimated alpha(α)/shape parameter in the Table 2. If
two non-smoking children have the same probability of trying cigarettes within the next month, irrespective of their
ages but provided the same confounders(sex, rural/urban, ethnicity) and random effects(school and state), then the
cigarette smoking must have a flat hazard function. In other words, the shape parameter should be approximately 1.
We find that the estimated alpha parameter is very far from 1 and not even inside the 95% CI. Another way to check
whether the hazard function is flat can look at the cumulative hazard function(Figure 2), we find that the plot shows
a non-linear pattern. It supports the hazard function is not flat. Also, we notice that the cumulative hazard function
increases more quickly when age getting older. In summary, age will affect the probability of non-smoking children
trying cigarettes within the next month, and older the age, higher the probability.
3
4 Conclusion
Based on the analysis of the result of the dataset from the 2014 American National Youth Tobacco Survey, we have
addressed two hypothesis. First of all, variation among schools in the mean age children first try cigarettes is substantially greater than geographic variation(between states). Thus, tobacco control programs should target the particular
school instead of the state. Secondly, non-smoking children have the different probability of trying cigarettes within
the next month given their different ages, even though they have the same sex, ethnicity and etc. More specific, the
older children are more likely to smoke in the next month. In general, tobacco control programs should pay more
attention to older children in particular schools.
5 Appendix
smokeFile = Pmisc::downloadIfOld("http://pbrown.ca/
teaching/appliedstats/data/smoke.RData")
load(smokeFile)
smoke = smoke[smoke$Age 9, ]
forInla = smoke[, c("Age", "Age_first_tried_cigt_smkg",
"Sex", "Race", "state", "school", "RuralUrban")]
forInla = na.omit(forInla)
forInla$school = factor(forInla$school)
library("INLA")
forSurv = data.frame(time = (pmin(forInla$Age_first_tried_cigt_smkg,
forInla$Age) - 4)/10,
event = forInla$Age_first_tried_cigt_smkg <=
forInla$Age)
# left censoring
forSurv[forInla$Age_first_tried_cigt_smkg == 8, "event"] = 2
smokeResponse = inla.surv(forSurv$time, forSurv$event)
fitS2 = inla(smokeResponse ~ RuralUrban + Sex + Race +
f(school, model = "iid", hyper = list(prec = list(prior = "pc.prec",
param = c(0.13, 0.01)))) +
f(state, model = "iid",hyper = list(prec = list(prior = "pc.prec",
param = c(0.7,0.01)))),
control.family = list(variant = 1,hyper = list(alpha = list(prior = "normal",
param = c(log(1),(2/3)^(-2))))),
control.mode = list(theta = c(8,2, 5), restart = TRUE),
data = forInla, family = "weibullsurv",verbose = TRUE,
control.compute=list(config = TRUE))
library(xtable)
alpha = rbind(c("0.025quant","mean","0.975quant"),
exp(qnorm(c(0.025, 0.5, 0.975), mean = log(1), sd = 2/3)))
print(xtable(alpha,digits = 3))
res = rbind(exp(-fitS2$summary.fixed[, c("mean", "0.025quant","0.975quant")]),
Pmisc::priorPostSd(fitS2)$summary[,c("mean", "0.025quant", "0.975quant")],
fitS2$summary.hyper[1,c("mean", "0.025quant", "0.975quant")])
print(xtable(res,digits = 3))
library(survival)
fitS2$priorPost = Pmisc::priorPost(fitS2)
for (Dparam in fitS2$priorPost$parameters) {
4
do.call(matplot, fitS2$priorPost[[Dparam]]$matplot)
do.call(legend, fitS2$priorPost$legend)}
forSurv1 = data.frame(time = (pmin(
forInla$Age_first_tried_cigt_smkg,forInla$Age) - 4)/10,
event = forInla$Age_first_tried_cigt_smkg <= forInla$Age)
forSurv1$event = as.numeric(forSurv1$event)
hazEst = survfit(Surv(time*10+4, event) ~ 1, data=forSurv1)
xSeqNatural = seq(4, 100, len=1000)
xSeqTrans = (xSeqNatural-4)/10
densHaz = Pmisc::sampleDensHaz(fit = fitS2, x = xSeqTrans, n = 20)
matplot(xSeqNatural, densHaz[, "cumhaz", ], type = "l",
lty = 1, col = "#FF000020", ylim = c(0.001,1), xlim = c(0,20),
xlab = "Age",ylab = "Cumulative Hazard Function")
lines(hazEst, fun = "cumhaz")
5
Are female pedestrians safer than male in UK?
Xinqi Shen
1 Introduction
The dataset is coming from the department of transport in the UK, it consists of all pedestrians involved in motor
vehicle accidents with either fatal or slight injuries(pedestrians with moderate injuries have been removed). In this
report, we will addressing the hypothesis that whether women trend to be, on average, safer as pedestrians than
men, particular as teenagers and in early adulthood.
2 Method
We will fit a conditional logistic regression model. First of all, we treat fatal accidents as cases and slight injuries as
controls. Then, in order to adjust for time of day, lighting conditions, and weather, we split our data into different
strata based on the same conditions. Last but not least, we include sex and the interaction between sex and age in
our model for analyzing our hypothesis. Given the following model:
logit[pr(Yij = 1)] = αi + Xijβ
logit[pr(Yij = 1)|Zij = 1] = α

i + Xijβ
α

i = αi + log[pr(Zij = 1|Yij = 1)/pr(Zij = 1|Yij = 0)]
Where Yij is either 1 or 0 representing fatal or slight injuries. Noted that i is strata, then Yi1 is case i with j 1 are
controls. α

i
is some constant value for strata i. Xijβ contains age and the interaction term between sex and age.
Zij is ’in the study’ indicators.
3 Result
coef exp(coef) se(coef) z Pr(|z|)
age0 - 5 0.132 1.142 0.044 3.008 0.003
age6 - 10 -0.320 0.726 0.041 -7.822 0.000
age11 - 15 -0.383 0.682 0.041 -9.305 0.000
age16 - 20 -0.443 0.642 0.040 -10.958 0.000
age21 - 25 -0.268 0.765 0.042 -6.355 0.000
age36 - 45 0.412 1.509 0.039 10.648 0.000
age46 - 55 0.768 2.156 0.039 19.709 0.000
age56 - 65 1.212 3.361 0.038 32.023 0.000
age66 - 75 1.797 6.033 0.036 49.447 0.000
ageOver 75 2.396 10.976 0.035 68.124 0.000
age26 - 35:sexFemale -0.448 0.639 0.052 -8.573 0.000
age0 - 5:sexFemale 0.028 1.029 0.055 0.517 0.605
age6 - 10:sexFemale -0.177 0.838 0.051 -3.490 0.000
age11 - 15:sexFemale -0.250 0.779 0.047 -5.295 0.000
age16 - 20:sexFemale -0.279 0.756 0.052 -5.364 0.000
age21 - 25:sexFemale -0.369 0.691 0.063 -5.828 0.000
age36 - 45:sexFemale -0.448 0.639 0.052 -8.679 0.000
age46 - 55:sexFemale -0.376 0.686 0.048 -7.792 0.000
age56 - 65:sexFemale -0.237 0.789 0.040 -5.878 0.000
age66 - 75:sexFemale -0.143 0.866 0.032 -4.429 0.000
ageOver 75:sexFemale -0.126 0.882 0.027 -4.606 0.000
Table 1: Estimated Model Parameters
1
Figure 1: Estimated Model Parameters
Both the Table 1 and the Figure 1 provide the information about the odds ratio involved in motor vehicle accidents
with fatal injuries for male and female. Noted that, we should look at exp(coef) column and the reference group here
is male with age 26-35. Then, the exponential coefficients of age term provide the odds ratio for male between some
age group and the reference age group. However, the exponential coefficients of interaction terms provide the odds
ratio between female and male in the same age group. Thus, based on the results among interaction terms, we found
that except women with age 0-5, all the odds ratio for women are less than 1. In other words, in general, women
tend to be safer as pedestrians than men. Also, we noticed that the smallest odds ratio for women is in age 26-45,
provided the value 0.639. Hence, women pedestrians are safer than men, particularly in middle-aged.
4 Conclusion
Based on the analysis of the result of the dataset from the department of transport in the UK, we have addressed our
hypothesis. Thus, we can conclude that women tend to be, on average, safer as pedestrians than men, particular in
middle-aged(26-45). Anyway, everyone needs to raise security awareness.
5 Appendix
library(Pmisc)
library("survival")
pedestrainFile = Pmisc::downloadIfOld(
"http://pbrown.ca/teaching/appliedstats/data/pedestrians.rds")
pedestrians = readRDS(pedestrainFile)
pedestrians = pedestrians[!is.na(pedestrians$time),]
pedestrians$y = pedestrians$Casualty_Severity == "Fatal"
pedestrians$timeCat = format(pedestrians$time, "%Y_%b_%a_h%H")
pedestrians$strata = paste(pedestrians$Light_Conditions,
pedestrians$Weather_Conditions, pedestrians$timeCat)
theTable = table(pedestrians$strata, pedestrians$y)
onlyOne = rownames(theTable)[which(theTable[, 1] ==
0 | theTable[, 2] == 0)]
x = pedestrians[!pedestrians$strata %in% onlyOne, ]
summary(glm(y ~ sex + age + Light_Conditions + Weather_Conditions,
data = x, family = "binomial"))$coef[1:4, ]
theClogit = clogit(y ~ age + age:sex + strata(strata), data = x)
2
theCoef = rbind(as.data.frame(summary(theClogit)$coef),
`age 26 - 35` = c(0, 1, 0, NA, NA))
theCoef$sex = c("Male", "Female")[1 + grepl("Female",
rownames(theCoef))]
theCoef$age = as.numeric(gsub("age|Over| - [[:digit:]].*|[:].*",
"", rownames(theCoef)))
theCoef = theCoef[order(theCoef$sex, theCoef$age),]
res = summary(theClogit)$coef[,1:5]
library(xtable)
print(xtable(res,digits = 3))
matplot(theCoef[theCoef$sex == "Male", "age"],
exp(as.matrix(theCoef[theCoef$sex ==
"Male", c("coef", "se(coef)")]) %*% Pmisc::ciMat(0.99)),
log = "y", type = "l", col = "black", lty = c(1,
2, 2), xaxs = "i", yaxs = "i",ylab='Odds Ratio', xlab='(a) male')
matplot(theCoef[theCoef$sex == "Female", "age"],
exp(as.matrix(theCoef[theCoef$sex ==
"Female", c("coef", "se(coef)")]) %*% Pmisc::ciMat(0.99)),
log = "y", type = "l", col = "black", lty = c(1,
2, 2), xaxs = "i",ylab='Odds Ratio', xlab='(b) female')
3

More products