Homework 2
Part 1
PCA
1. PCA can be explained from two different perspectives. What are the two perspectives?
2. The first principal direction is the direction in which the projections of the data points have the largest variance in the input space. We use $\lambda_1$ to represent the first/largest eigenvalue of the covariance matrix, $w_1$ to denote the corresponding principal vector/direction ($w_1$ has unit length, i.e., its L2 norm is 1), $\mu$ to represent the sample mean, and $x$ to represent a data point. The deviation of $x$ from the mean $\mu$ is $x - \mu$. The transform $y = \mathrm{PCA}(x)$ is implemented in sk-learn with "whiten=True", and the number of components/elements of $y$ is usually less than the number of components/elements of $x$ (see the sketch after this question).
(1) What is the scalar projection of $x$ in the direction of $w_1$?
(2) What is the scalar projection of the deviation $x - \mu$ in the direction of $w_1$?
(3) What is the first component of $y$? Hint: it is a function of $w_1$, $x$, $\mu$, and $\lambda_1$.
(4) Assuming $y$ has only one component, we then do the inverse transform to recover the input: $\tilde{x} = \mathrm{PCA}^{-1}(y)$. Compute $\tilde{x}$ using $y$, $\mu$, $\lambda_1$, and $w_1$: $\tilde{x} = ?$
(5) Assuming $x$ and $y$ have the same number of elements, and we do the inverse transform to recover the input, $\tilde{x} = \mathrm{PCA}^{-1}(y)$, what is the value of $x - \tilde{x}$?
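For concreteness, here is a minimal sketch (toy random data; `mean_`, `components_`, and `explained_variance_` are standard sk-learn attributes) showing how the symbols above map onto sk-learn's PCA with whiten=True; you can use it to check your answers to (1)-(4).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # toy data: 200 samples, 5 features

pca = PCA(n_components=2, whiten=True).fit(X)
mu = pca.mean_                           # sample mean (mu above)
W = pca.components_                      # rows are the unit-length directions w_i
lam = pca.explained_variance_            # eigenvalues lambda_i of the covariance matrix

# the transform is the scalar projection of each deviation x - mu onto w_i,
# rescaled by 1/sqrt(lambda_i) because of whitening
Y_manual = (X - mu) @ W.T / np.sqrt(lam)
print(np.allclose(pca.transform(X), Y_manual))   # True
```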
3. Show that PCA is a linear transform of $x - \mu$ (use the definition on http://mathworld.wolfram.com/LinearTransformation.html).
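This is not a proof, but a quick numerical check of the two defining properties (additivity and homogeneity) for the fitted PCA map applied to deviations $d = x - \mu$; the data and the helper `f` are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
pca = PCA(n_components=2, whiten=True).fit(X)

def f(d):
    """Apply the fitted PCA map to a single deviation d = x - mu (1-D array)."""
    return pca.transform((d + pca.mean_)[None, :])[0]

d1, d2 = X[0] - pca.mean_, X[1] - pca.mean_
a, b = 2.0, -3.0
# linearity: f(a*d1 + b*d2) == a*f(d1) + b*f(d2)
print(np.allclose(f(a * d1 + b * d2), a * f(d1) + b * f(d2)))   # True
```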
Maximum Likelihood Estimation and NLL loss
(This is a general method to estimate parameters of a PDF using data samples)
4. Maximum Likelihood Estimation when the PDF is an exponential distribution.
Suppose we have $N$ i.i.d. (independent and identically distributed) data samples $\{x_1, x_2, x_3, \ldots, x_N\}$ generated from a PDF which is assumed to be an exponential distribution; $x_n \in \mathbb{R}^+$ for $n = 1, \ldots, N$, which means they are positive scalars. This is the PDF:
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Your task is to build an NLL (negative log likelihood) loss function to estimate the parameter $\lambda$ of the PDF from the data samples.
(1) Write the NLL loss function: it is a function of the parameter $\lambda$.
(2) Take the derivative of the loss with respect to $\lambda$, and set the result to 0. After some calculations, you will obtain an equation of the form $\lambda = ******$.
Hint: review the NLL material in the GMM lecture.
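As a sanity check, here is a sketch (synthetic data with a known rate; scipy is assumed available) that builds the NLL numerically and minimizes it; the numerical minimizer should agree with your closed-form answer from (2).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / 3.0, size=1000)    # synthetic samples, true lambda = 3

def nll(lam):
    # -sum_n log(lam * exp(-lam * x_n)) = -N log(lam) + lam * sum_n x_n
    return -len(x) * np.log(lam) + lam * x.sum()

res = minimize_scalar(nll, bounds=(1e-6, 100.0), method="bounded")
print(res.x)    # numerical estimate of lambda; compare with your answer from (2)
```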
5. Maximum Likelihood Estimation when the PDF is histogram-like.
A histogram-like PDF $p(x)$ is defined on a 1-dimensional (1D) space that is divided into fixed regions/intervals, so $p(x)$ takes a constant value $h_i$ in the $i$-th region. There are $K$ regions. Thus, $\{h_1, h_2, \ldots, h_K\}$ is the set of (unknown) parameters of the PDF. Also, $\sum_{i=1}^{K} h_i \Delta_i = 1$, where $\Delta_i$ is the width of the $i$-th region.
Now, we have a dataset of $N$ samples $\{x_1, x_2, x_3, \ldots, x_N\}$, and $n_i$ is the number of samples in the $i$-th region.
The task is to find the best parameters of the PDF using the samples.
(1) Write the NLL loss function: it is a function of the parameters.
Note: this is a constrained optimization problem, so we use a Lagrange multiplier to convert the constrained optimization into an unconstrained one. Thus, add $\beta \left( \sum_{i=1}^{K} h_i \Delta_i - 1 \right)$ to the loss function, where $\beta$ is the Lagrange multiplier.
(2) Take the derivative of the loss with respect to $h_i$, set it to 0, and obtain the best parameters along with the value of $\beta$.
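Here is a sketch (synthetic uniform data; the bin layout is an arbitrary choice) that minimizes the same constrained NLL numerically with scipy; the result should match the closed-form $h_i$ you derive in (2).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 4.0, size=500)      # toy 1-D samples
edges = np.linspace(0.0, 4.0, 9)         # K = 8 fixed regions
n = np.histogram(x, bins=edges)[0]       # n_i: sample count in the i-th region
delta = np.diff(edges)                   # Delta_i: width of the i-th region

def nll(h):
    # each sample in region i contributes -log(h_i), so the loss is -sum_i n_i log(h_i)
    return -np.sum(n * np.log(h))

# constraint: sum_i h_i * Delta_i = 1 (p(x) must integrate to 1)
cons = {"type": "eq", "fun": lambda h: np.sum(h * delta) - 1.0}
h0 = np.full(len(n), 1.0 / delta.sum())  # feasible uniform starting point
res = minimize(nll, h0, bounds=[(1e-9, None)] * len(n), constraints=[cons])
print(res.x)    # compare with the closed-form h_i from (2)
```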
Part 2
Complete the tasks in H2P2T1.ipynb and H2P2T2.ipynb.
Note: it is very time-consuming to fit a GMM to high-dimensional data, and therefore PCA + GMM is the "standard" approach; a minimal sketch is below.
Due date: September 30, 2022