Starting from:

$35

SS 2864b  Jarque-Bera statistic


SS2864b 

1. [30] Jarque-Bera statistic is defined as
JBn =
nγ2
n
6
+
n(κn − 3)2
24
,
where γn =
1
ns3
n
Pn
i=1(Xi − X¯)
3 and κn =
1
ns4
n
Pn
i=1(Xi − X¯)
4 are standardized sample skewness
and kurtosis respectively, and X¯ and sn are the sample mean and standard deviation respectively. If X1, . . . , Xn are iid r.v.’s from Normal population, then JBn follows χ
2
(2) distribution
(asymptotically). However, when n is small (less than 200), the p-value calculated from χ
2
(2)
is not accurate. In the following you will use both χ
2
(2) and Monte Carlo simulation to find
its p-value.
(a) [6] Write a function JB with argument x and the return value should be the JBn. Test
your function with x=rnorm(100).
(b) [6] Simulate the first data set x1 from
n=100; eps=0; x1=rnorm(n, 2, 1+5*rbinom(n,1,eps))
and two additional data sets x2 with eps=0.01 and x3 with eps=0.05, respectively. Use
qqnorm and qqline on those three data sets and comment your findings.
(c) [6] Use Monte Carlo simulation to simulate JBn values under normal assumption (H0).
For each simulation, generate a normal data as x=rnorm(n) and then use the function
JB from (a) to compute JBn. Repeat it K times. You should write a function JB.MC
with arguments n and K = 50000 (n is the sample size) to implement such a procedure
and the return values should be a vector of those JBn (K of them).
(d) [6] Generate one output data set from JB.MC with n = 100 and plot its histogram
(as density with more break points or you can plot the density curve generated by the
function density). Then overlay χ
2
(2) density with col=2. Comment your findings, in
particularly, addressing their right tail behaves/matches.
(e) [6] Use the function JB from (a) with x1, x2, x3 as inputs to compute their JBn values.
Then, for each calculated JBn value, compute their p-values (right tails) through χ
2
(2)
distribution (the CDF of χ
2
(2) in R is pchisq(q,df=2)) and the output data set from (d).
Comment your findings, in particularly, addressing their differences. Do those p-value
match what you have found in (b)? You may run (b) a few times to get some consistency
and set a proper seed.
2. [30] Implement bootstrap procedure for AR(1) time series models. AR(1) is of the form
Xt = θXt−1 + εt
,
2
where {εt} is a sequence of iid errors with mean 0 and variance σ
2
. A least estimator of θ is
ˆθ =
Pn
t=1 Xt−1Xt
Pn
t=1 X2
t
.
Do the following steps to implement the bootstrap inference for θ. No looping is allowed except
(c)(ii).
(a) [6] Write a function theta.est with argument x to compute ˆθ, where x is the vector of
observations. Notice that X0 = 0 is assumed in the formula. Test your function with
x=huron-mean(huron) (the same time series huron used in Assignment 5. So please
download huron data set from owl) and save the result to my.est.
(b) [6] Use the centered huron data x and its theta estimator my.est from (a) to compute the
residuals as
εˆt = Xt − ˆθXt−1, t = 1, 2, . . . , n
and save the sample mean centered residuals ˆε1−¯ε, . . . , ˆ εˆn−¯εˆ to a vector my.resid, where
¯εˆ is the mean of those ˆεt
. Check the distribution of my.resid by its histogram or density
and qqnorm (qqline) and comment your findings.
(c) [12] Write a function one.boot with arguments eps=my.resid and theta=my.est to
compute one bootstrap estimation of θ. The procedures are
i. Resample eps with replacement to get a vector called eps.star.
ii. Compute x

t = ˆθx∗
t−1 + ε

t
, t = 1, . . . , n (assume that x

0 = 0).
iii. Compute and return the least estimator ˆθ
∗ based on x

1
, . . . , x∗
n by calling the function
theta.est from (a).
Test your function with default values to see if it works and test again to see if a different value is produced. Then run 10000 bootstraps (using replicate with one.boot) to
produce a vector output.
(d) [6] Use the output from (c) to find the histogram or density curve of ˆθ
∗ − ˆθ and comment
your findings. Then construct 95% confidence interval for θ. You need to use the fact that
ˆθ − θ ≈ ˆθ
∗ − ˆθ in distribution.
Notice that this inference is for the huron time series only.
3. [30] NASA’s GISS Surface Temperature Analysis (GISTEMP) is an estimate of global surface
temperature change recorded monthly. A file, GLB.Ts dSST.csv, can be downloaded from
owl in Data sets folder. It contains monthly temperatures from 1880 to 2019. Please do the
following steps to carry out some basic modelings. Any looping such as for, while, repeat is
not allowed.
3
(a) [5] Import the dataset into R as a data frame. Create another data frame to keep only Jan,
..., Dec 12 columns. Then use apply function to generate a vector of yearly average temperatures from 1880 to 2019 and save it as yearly.temp. Plot it against year=1880:2019
(as a line) and comment your findings (any pattern changes).
(b) [7] Construct an objective function based on (sum of square)
Xn
i=1
(yearly.temp[i] − a − b ∗ year[i])2
and use nlminb with start values (-10,0.1) to estimate a, b (saved as ls.est). You cannot
use lm to find a, b though you can use it to check if your answer is right or not. Then compute the residuals as resid=yearly.temp-ls.est[1]-ls.est[2]*year. For model diagnostic
checking based on residuals, do the following steps.
i. Add the fitted line (the fitted values computed as fitted=ls.est[1]+ls.est[2]*year)
as col=2 to the original yearly temperature plot and comment how good or bad the
fitted line is.
ii. Scatter plot of resid against year and comment out if there are any patterns or it is
completely random.
iii. Use qqnorm and qqline to resid to see if it is normally distributed. Comment your
findings.
(c) [7] Redo (b) with an objective function based on
Xn
i=1
(yearly.temp[i] − a − b ∗ year[i] − c ∗ (year[i])2
)
2
with start values (300, -1, 0.1). Notice that you have to add this option scale=c(1, 300,
1000) to nlminb otherwise it doesn’t converge (check the usage of scale in nlminb’s
help). For the fitted line, choose col=3.
(d) [7] Create a dummy vector as dummy= c(rep(0,85), rep(1,55)). Redo (b) with an objective function based on
Xn
i=1
(yearly.temp[i] − a − b ∗ year[i] − c ∗ dummy[i] − d ∗ dummy[i] ∗ year[i])2
with start values (-1, 1, -1, 1). For the fitted line, choose col=4.
(e) [4] Comment the similarity and difference among three models. You can overlay three
fitted lines together in a new plot to comment. Which is the best fitted model? What are
your conclusions of rising temperature rates per decade?
Disclaim: Three models from above do not take the consideration of yearly.temp as a
time series. Further study is needed for more accurate statistical modelings.
4

More products