1. Medical image estimation.
Suppose $x_i$, $i = 1, \ldots, n$ are independent Poisson random variables with
$$P(x_i = k) = \frac{e^{-\mu_i} \mu_i^k}{k!}$$
with unknown means $\mu_i$. The variables $x_i$ represent the number of times that one of $n$
possible independent events occurs during a certain period. In emission tomography, they
may represent the number of photons emitted by $n$ sources.
We consider an experiment designed to determine the means $\mu_i$. The experiment involves
$m$ detectors. If event $i$ occurs, it is detected by detector $j$ with probability $p_{ji}$. We assume
the probabilities $p_{ji}$ are given, with $p_{ji} \ge 0$ and $\sum_{j=1}^m p_{ji} \le 1$. The total number of events
recorded by detector $j$ is denoted by $y_j$,
$$y_j = \sum_{i=1}^n y_{ji}, \qquad j = 1, \ldots, m.$$
Formulate the maximum likelihood estimation problem of estimating the means $\mu_i$, based
on observed values of $y_j$, $j = 1, \ldots, m$. Does the likelihood function have a
unique maximizer? (Hint: the variables $y_{ji}$ have a Poisson distribution with means $p_{ji}\mu_i$.
The sum of $n$ independent Poisson variables with means $\lambda_1, \ldots, \lambda_n$ has a Poisson distribution with mean $\lambda_1 + \cdots + \lambda_n$.)
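By the hint, $y_j$ is Poisson with mean $\lambda_j = \sum_i p_{ji}\mu_i$, so the log-likelihood (dropping the constant $\log y_j!$ terms) is $\sum_j (y_j \log \lambda_j - \lambda_j)$. A minimal numerical sketch of the resulting problem, with made-up detection probabilities `P` and synthetic counts standing in for real observations:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic setup: n = 3 sources, m = 4 detectors. P[j, i] = p_ji is
# made up here; each column sums to <= 1 as required.
rng = np.random.default_rng(0)
P = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.4, 0.2],
              [0.1, 0.3, 0.5],
              [0.1, 0.1, 0.2]])
mu_true = np.array([2.0, 5.0, 3.0])
y = rng.poisson(P @ mu_true)          # observed detector counts

def neg_log_lik(mu):
    lam = P @ mu                      # detector means lambda_j
    return np.sum(lam - y * np.log(lam))  # log(y_j!) constant dropped

# The objective is convex in mu, so a bounded quasi-Newton solve suffices.
res = minimize(neg_log_lik, x0=np.ones(3), bounds=[(1e-8, None)] * 3)
print(res.x)  # ML estimate of the means mu_i
```

Whether the maximizer is unique hinges on the structure of $P$, which is what the question above asks you to examine.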
2. Logistic regression.
Given $n$ observations $(x_i, y_i)$, $i = 1, \ldots, n$, with $x_i \in \mathbb{R}^p$, $y_i \in \{0, 1\}$, and parameters $a \in \mathbb{R}^p$ and
$b \in \mathbb{R}$, consider the log-likelihood function for logistic regression:
$$\ell(a, b) = \sum_{i=1}^n \left\{ y_i \log h(x_i; a, b) + (1 - y_i) \log\bigl(1 - h(x_i; a, b)\bigr) \right\}$$
(a) Derive the Hessian H of this function and show that H is negative semi-definite (this
implies that $\ell$ is concave and has no local maxima other than the global one).
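As a numeric sanity check (not a substitute for the derivation), one can evaluate the Hessian under the usual model $h(x; a, b) = 1/(1 + e^{-(a^T x + b)})$ — an assumption here, since $h$ is not written out above — in which case $H = -\sum_i h_i(1 - h_i)\, z_i z_i^T$ with $z_i = (x_i, 1)$:

```python
import numpy as np

# Random data, made up purely to test negative semi-definiteness of H
# at an arbitrary parameter value.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
a, b = rng.normal(size=2), 0.5
Z = np.hstack([X, np.ones((50, 1))])       # z_i = (x_i, 1) as rows
h = 1.0 / (1.0 + np.exp(-(Z @ np.append(a, b))))
H = -(Z * (h * (1 - h))[:, None]).T @ Z    # 3x3 Hessian of l(a, b)
eigvals = np.linalg.eigvalsh(H)
print(eigvals)  # all should be <= 0 up to round-off
```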
(b) Use data logit-x.dat and logit-y.dat from Tsquare, which contain the predictors $x_i \in \mathbb{R}^2$
and responses $y_i \in \{0, 1\}$, respectively, for the logistic regression problem. Implement
Newton’s method for optimizing $\ell(a, b)$ and apply it to fit a logistic regression model
to the data. Initialize Newton’s method with a = 0, b = 0. Plot the value of the log
likelihood function versus iterations. (You may use load logit-x.dat to load data.)
What are the coefficients a and b from your fit?
(c) Find a value of the step size that gives you convergence, and another (larger) value
of the step size for which your algorithm diverges.
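A minimal sketch of damped Newton's method for $\ell(a, b)$, on synthetic data since logit-x.dat and logit-y.dat are not reproduced here; `step = 1.0` is the pure Newton step of part (b), and part (c) amounts to varying it:

```python
import numpy as np

# Synthetic two-class data; the true coefficients below are made up.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = (X @ np.array([1.5, -1.0]) + 0.3 + rng.normal(size=100) > 0).astype(float)
Z = np.hstack([X, np.ones((100, 1))])      # rows (x_i, 1), theta = (a, b)

theta = np.zeros(3)                        # initialize a = 0, b = 0
step = 1.0                                 # pure Newton step
lls = []
for _ in range(15):
    h = 1.0 / (1.0 + np.exp(-np.clip(Z @ theta, -30, 30)))
    lls.append(np.sum(y * np.log(h + 1e-12)
                      + (1 - y) * np.log(1 - h + 1e-12)))
    grad = Z.T @ (y - h)                   # gradient of log-likelihood
    H = -(Z * (h * (1 - h))[:, None]).T @ Z
    theta = theta - step * np.linalg.solve(H, grad)  # Newton update
print(theta, lls[-1])  # fitted (a, b) and final log-likelihood
```

Plotting `lls` versus iteration gives the curve requested in (b).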
3. Locally weighted linear regression.
Consider a linear regression problem in which we want to weight different training examples
differently. Specifically, suppose we want to minimize
$$J(\theta) = \frac{1}{2} \sum_{i=1}^n w_i (\theta^T x_i - y_i)^2.$$
In class, we have worked out what happens for the case where all the weights are the same.
In this problem, we will generalize some of those ideas to the weighted setting, and also
implement the locally weighted linear regression algorithm.
(a) Show that $J(\theta)$ can also be written as
$$J(\theta) = (X\theta - y)^T W (X\theta - y)$$
for an appropriate diagonal matrix $W$, matrix $X$, and vector $y$. State clearly what
these matrices and vectors are.
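A quick numeric check of this identity, under the standard (assumed) choices: $X$ with rows $x_i^T$, $y = (y_1, \ldots, y_n)$, and $W = \mathrm{diag}(w_i/2)$ to absorb the $1/2$ factor:

```python
import numpy as np

# Random problem instance, made up only to verify the two forms agree.
rng = np.random.default_rng(3)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.1, 2.0, size=n)
theta = rng.normal(size=p)

J_sum = 0.5 * np.sum(w * (X @ theta - y) ** 2)   # summation form
r = X @ theta - y
J_mat = r @ np.diag(w / 2) @ r                   # matrix form
print(abs(J_sum - J_mat))  # ~ 0
```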
(b) Suppose we have samples $(x_i, y_i)$, $i = 1, \ldots, n$ of $n$ independent examples, but in
which the $y_i$'s were observed with different variances, and
$$p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\!\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma_i^2}\right)$$
i.e. $y_i$ has mean $\theta^T x_i$ and variance $\sigma_i^2$ (where the $\sigma_i^2$ are fixed, known constants). Show
that finding the maximum likelihood estimate of $\theta$ reduces to solving a weighted linear
regression problem. State clearly what the $w_i$'s are in terms of the $\sigma_i^2$'s.
(c) Use data rx.dat and ry.dat, which contain the predictors $x_i$ and responses $y_i$, respectively, for our problem. Implement gradient descent for (unweighted) linear regression
that we derived in class on this dataset, and plot on the same figure the data and the
straight line resulting from your fit. (Remember to include the intercept term.)
(d) Implement locally weighted linear regression on this dataset, using gradient descent,
and plot on the same figure the data and the line resulting from your fit, using the
following weights:
$$w_i = \exp(-x_i^2 / 20).$$
Plot $J(\theta)$ versus iterations.
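A sketch of gradient descent for the weighted objective, on synthetic data since rx.dat and ry.dat are not reproduced here. The gradient of $J(\theta) = \frac{1}{2}\sum_i w_i(\theta^T x_i - y_i)^2$ is $\sum_i w_i(\theta^T x_i - y_i)\,x_i$; the step size below is an assumption chosen small enough to converge:

```python
import numpy as np

# Synthetic 1-D data with an intercept; the true line y = 2x + 1 is made up.
rng = np.random.default_rng(4)
x = rng.uniform(-5, 5, size=60)
y = 2.0 * x + 1.0 + rng.normal(size=60)
X = np.column_stack([x, np.ones_like(x)])   # intercept term included
w = np.exp(-x ** 2 / 20.0)                  # weights from part (d)

theta = np.zeros(2)
lr = 0.01 / len(x)                          # assumed step size
history = []                                # J(theta) per iteration
for _ in range(2000):
    r = X @ theta - y                       # residuals
    history.append(0.5 * np.sum(w * r ** 2))
    theta -= lr * X.T @ (w * r)             # gradient step
print(theta, history[-1])
```

Plotting `history` versus iteration gives the $J(\theta)$ curve requested above.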
4. Exponential family and Fisher information.
A PDF $f(x \mid \theta)$ of a random variable is said to belong to an exponential family if we can
write
$$f(x \mid \theta) = g(x)\, e^{\beta(\theta) + h(x)\gamma(\theta)}$$
for some $g(x)$, $\beta(\theta)$, $h(x)$ and $\gamma(\theta)$.
(a) Show that the Bernoulli, Binomial, Poisson, Exponential, and Gaussian distributions all
belong to an exponential family. Their PDFs are given by
$$\text{Bernoulli: } f(x \mid p) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$$
$$\text{Binomial: } f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x \in \{0, 1, \ldots, n\}$$
$$\text{Poisson: } f(x \mid \lambda) = e^{-\lambda}\lambda^x / x!, \quad x \in \{0, 1, \ldots\}$$
$$\text{Exponential: } f(x \mid \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$
$$\text{Gaussian: } f(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}, \quad x \in \mathbb{R}^p$$
(b) Find the Fisher information for the Bernoulli distribution.
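Whatever closed form you derive can be checked numerically: the Bernoulli score is $\frac{d}{dp}\log f(x \mid p) = x/p - (1-x)/(1-p)$, and taking the exact expectation of its square over $x \in \{0, 1\}$ recovers the Fisher information (which should come out to $1/(p(1-p))$):

```python
# Exact expectation over x in {0, 1}; no sampling needed.
for p in (0.2, 0.5, 0.9):
    score = lambda x: x / p - (1 - x) / (1 - p)   # d/dp log f(x|p)
    info = p * score(1) ** 2 + (1 - p) * score(0) ** 2
    print(p, info, 1 / (p * (1 - p)))             # the two should match
```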
5. House price dataset.
The HOUSES dataset contains a collection of recent real estate listings in and around San
Luis Obispo county. The dataset is provided in RealEstate.csv.
The dataset contains the following fields:
• MLS: Multiple listing service number for the house (unique ID).
• Location: city/town where the house is located. Most locations are in San Luis
Obispo county and northern Santa Barbara county (Santa Maria-Orcutt, Lompoc,
Guadalupe, Los Alamos), but there are some out-of-area locations as well.
• Price: the most recent listing price of the house (in dollars).
• Bedrooms: number of bedrooms.
• Bathrooms: number of bathrooms.
• Size: size of the house in square feet.
• Price/SQ.ft: price of the house per square foot.
• Status: type of sale. Three types are represented in the dataset: Short Sale, Foreclosure, and Regular.
Fit a linear regression model to predict Price using the remaining factors (except Status), for
each of the three types of sales: Short Sale, Foreclosure, and Regular, respectively.
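One way to organize the three fits is a least-squares solve per Status group. The sketch below assumes the column names listed above and leaves out MLS and Location as identifiers; the toy DataFrame stands in for `pd.read_csv("RealEstate.csv")`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for RealEstate.csv with the columns described above.
df = pd.DataFrame({
    "Status":      ["Short Sale", "Foreclosure", "Regular"] * 4,
    "Bedrooms":    [3, 2, 4, 3, 2, 5, 3, 4, 2, 3, 3, 4],
    "Bathrooms":   [2, 1, 3, 2, 1, 3, 2, 2, 1, 2, 2, 3],
    "Size":        [1500, 900, 2200, 1400, 1000, 2600,
                    1600, 1900, 800, 1300, 1500, 2100],
    "Price/SQ.ft": [200.0, 150, 250, 210, 160, 240,
                    205, 220, 145, 190, 200, 230],
})
df["Price"] = df["Size"] * df["Price/SQ.ft"]

features = ["Bedrooms", "Bathrooms", "Size", "Price/SQ.ft"]
coefs = {}
for status, grp in df.groupby("Status"):
    A = np.column_stack([grp[features].to_numpy(float),
                         np.ones(len(grp))])      # intercept column
    beta, *_ = np.linalg.lstsq(A, grp["Price"].to_numpy(float), rcond=None)
    coefs[status] = beta                          # one fit per sale type
print({k: np.round(v, 2) for k, v in coefs.items()})
```

With the real file, replace the toy DataFrame construction with a single `pd.read_csv` call; the group-by-and-solve loop is unchanged.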