---
title: "Multivariate Statistical Analysis - Exercise Session 9"
date: "`r format(Sys.time(), '%d.%m.%Y')`"
output:
  bookdown::pdf_document2:
    extra_dependencies: ["float"]
    latex_engine: xelatex
    number_sections: no
    toc: no
header-includes:
- \usepackage{amsmath}
- \usepackage{siunitx}
---
```{r, echo=FALSE}
# Figure placement
knitr::opts_chunk$set(fig.pos = "H", out.extra = "")
```
# Problem 1: Linear Discriminant Analysis
Install package \texttt{MASS} if you haven't yet.
```{r, eval=FALSE}
install.packages("MASS")
```
Import the package \texttt{MASS} and read the data. Keep only the species versicolor and virginica, i.e., drop the first 50 observations. We also drop the now-unused factor level setosa with the function \texttt{droplevels}.
```{r}
library(MASS)
data(iris)
iris <- droplevels(iris[-(1:50), ])
dim(iris)
head(iris)
```
(ref:captionscatter) Pairwise scatter plots of variables.
Firstly, we visualize the data in Figure \@ref(fig:scatter).
```{r scatter, fig.cap="(ref:captionscatter)"}
pairs(iris[, 1:4], pch = c(16, 17)[iris$Species], gap = 0, upper.panel = NULL,
      col = c(rgb(0, 0, 1, 0.5), rgb(1, 0, 0, 0.5))[iris$Species])
par(xpd = TRUE)
legend(0.75, 0.75, legend = levels(iris$Species), pch = c(16, 17),
       col = c(rgb(0, 0, 1, 0.5), rgb(1, 0, 0, 0.5)), cex = 1)
```
## a) Perform Fisher's linear discriminant analysis
With the function \texttt{lda} from the package \texttt{MASS} we can perform Fisher's linear discriminant analysis. \texttt{lda} is used much like the function \texttt{lm}: the input can be given as a formula
$$
\texttt{y} \sim \texttt{x1} + \texttt{x2} + \dots + \texttt{xg},
$$
where response \texttt{y} is the grouping factor and \texttt{x1}, \texttt{x2}, \ldots, \texttt{xg} are the discriminators. In this case,
$$
\begin{split}
&\texttt{y} = \texttt{Species}, \\
&\texttt{x1} = \texttt{Sepal.Length}, \\
&\texttt{x2} = \texttt{Sepal.Width}, \\
&\texttt{x3} = \texttt{Petal.Length} \quad\text{and} \\
&\texttt{x4} = \texttt{Petal.Width}.
\end{split}
$$
Below we perform Fisher's linear discriminant analysis and extract the discriminant vector $a$.
```{r}
# Species ~ . is a shorthand notation for
# Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_lda <- lda(Species ~ ., data = iris)
a_lda <- iris_lda$scaling
a_lda
```
Next we perform Fisher's linear discriminant analysis manually. Let
$$
\begin{split}
&W = \sum_{i=1}^g (n_i - 1) \textrm{Cov}(X_i) \quad\text{and} \\
&B = \sum_{i=1}^g n_i(\bar x_i - \bar x)(\bar x_i - \bar x)^T.
\end{split}
$$
In the case when $g = 2$, we can express $B$ as
$$
B = \frac{n_1n_2}{n} dd^T,
$$
where $d = \bar x_1 - \bar x_2$. Then vector $a$ is obtained by computing
$$
a = W^{-1} d.
$$
In the general case $g \in \{2, 3, \ldots\}$, the solution $a$ equals the eigenvector of $W^{-1} B$ corresponding to its largest eigenvalue.
```{r}
# cov(X_1)
s1 <- cov(iris[1:50, 1:4])
# cov(X_2)
s2 <- cov(iris[51:100, 1:4])
d1 <- colMeans(iris[1:50, 1:4])
d2 <- colMeans(iris[51:100, 1:4])
d <- d1 - d2
d <- as.matrix(d)  # one-column matrix
b <- (50 * 50) / 100 * d %*% t(d)
w <- 49 * (s1 + s2)
l <- solve(w) %*% b
# First eigenvector of l, i.e., the one corresponding to the largest eigenvalue
a_manual <- as.numeric(eigen(l)$vectors[, 1])
a_manual2 <- solve(w) %*% d
```
By comparing \texttt{a\_lda}, \texttt{a\_manual} and \texttt{a\_manual2} we can see that they are equal up to scale.
```{r}
c(norm(a_lda, type = "2"), norm(a_manual, type = "2"),
  norm(a_manual2, type = "2"))
a_lda <- a_lda / norm(a_lda, type = "2")
a_manual2 <- a_manual2 / norm(a_manual2, type = "2")
data.frame(a_lda = a_lda, a_manual = a_manual, a_manual2 = a_manual2)
```
## b) In which group will a new flower be classified?
```{r}
newobs <- data.frame(Sepal.Length = 6, Sepal.Width = 3, Petal.Length = 4,
                     Petal.Width = 1)
predict(iris_lda, newdata = newobs)$class
```
The new flower is classified as \texttt{versicolor}.
We can also predict the group of the new observation $x$ by hand. The new observation $x$ is allocated to the population whose mean score is closest to $a^Tx$. That is, $x$ is allocated to group $j$ if
$$
|a^Tx - a^T\bar x_j| < |a^Tx - a^T\bar x_i|, \quad\text{for all $i\neq j$}.
$$
```{r}
# new observation is classified as versicolor
abs(t(a_lda) %*% (as.numeric(newobs) - d1)) <
  abs(t(a_lda) %*% (as.numeric(newobs) - d2))
```
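The allocation rule above compares scores on the discriminant axis. Below is a self-contained sketch (object names are local to this chunk) that recomputes the fit and compares the new flower's score with the two group mean scores:

```{r}
library(MASS)

# Recompute the two-species data and the discriminant vector
data(iris)
iris2 <- droplevels(iris[-(1:50), ])
a <- lda(Species ~ ., data = iris2)$scaling

# Scores of the new flower and of the two group means on the discriminant axis
newobs <- c(Sepal.Length = 6, Sepal.Width = 3, Petal.Length = 4,
            Petal.Width = 1)
score_new <- sum(a * newobs)
score_means <- c(versicolor = sum(a * colMeans(iris2[1:50, 1:4])),
                 virginica = sum(a * colMeans(iris2[51:100, 1:4])))

# The group whose mean score is closest to the new flower's score
names(which.min(abs(score_new - score_means)))
```

Since the two groups have equal sizes, the estimated priors are equal, and this nearest-mean-score rule agrees with \texttt{predict}.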
## c) Leave-one-out cross-validation
Below we perform leave-one-out cross-validation.
```{r}
d_cv <- lda(Species ~ ., data = iris, CV = TRUE)
```
The results of leave-one-out cross-validation can be cross-tabulated. Here two versicolor observations are classified as virginica and one virginica observation is classified as versicolor.
```{r}
result <- data.frame(est = d_cv$class, truth = iris[, 5])
table(result)
```
Thus the misclassification rate is $3/100 = 0.03$. We can also perform leave-one-out cross-validation manually.
```{r}
predicted <- rep(NA, 100)
for (i in 1:100) {
  train <- iris[-i, ]
  test <- iris[i, ]
  predicted[i] <- predict(lda(Species ~ ., data = train), newdata = test)$class
}
predicted <- factor(predicted, levels = c(1, 2),
                    labels = c("versicolor", "virginica"))
sum(predicted != iris$Species) / nrow(iris)
```
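As a cross-check, the following self-contained sketch recomputes both procedures and compares the manual loop with the built-in \texttt{CV = TRUE} classifications observation by observation:

```{r}
library(MASS)

data(iris)
iris2 <- droplevels(iris[-(1:50), ])

# Built-in leave-one-out cross-validation
cv_builtin <- lda(Species ~ ., data = iris2, CV = TRUE)$class

# Manual leave-one-out cross-validation
cv_manual <- character(100)
for (i in 1:100) {
  fit_i <- lda(Species ~ ., data = iris2[-i, ])
  cv_manual[i] <- as.character(predict(fit_i, newdata = iris2[i, ])$class)
}

# Agreement between the two procedures, and the misclassification rate
all(cv_manual == as.character(cv_builtin))
mean(cv_manual != iris2$Species)
```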
# Problem 2: Fisher's Linear Discriminant Function
Show that the solution for the problem:
$$
\max_a \left\{ \frac{a^TBa}{a^TWa}\right\},
$$
is obtained by setting $a$ equal to the eigenvector of $W^{-1}B$ that corresponds to the largest eigenvalue.
$$
W = \text{within-group dispersion} \quad\text{and}\quad
B = \text{between-group dispersion}.
$$
Since the vector $a$ can be scaled arbitrarily without affecting the ratio, we can formulate the problem as follows:
$$
\max_a\left\{a^TBa\right\} \quad\text{s.t.}\quad a^TWa = 1.
$$
Let $W^{1/2}$ be the symmetric square root of $W$; it exists since $W$ is symmetric and positive definite. Let $z = W^{1/2} a$, so that $a = W^{-1/2} z$. Then
$$
\begin{split}
&a^TBa = \left(W^{-1/2} z\right)^TB\left(W^{-1/2} z\right)
= z^TW^{-1/2}BW^{-1/2}z \quad\text{and} \\
&a^TWa = \left(W^{-1/2} z\right)^TW\left(W^{-1/2} z\right)
= z^T\underbrace{W^{-1/2}WW^{-1/2}}_{=I}z = z^Tz.
\end{split}
$$
Note that $W^{-1/2}BW^{-1/2}$ is symmetric, and hence its spectral decomposition exists ($W^{-1/2}BW^{-1/2} = \Gamma\Lambda\Gamma^T$ with $\Gamma^T\Gamma = I$). Denote $w = \Gamma^Tz$. Then
$$
\begin{split}
&z^TW^{-1/2}BW^{-1/2}z = z^T\Gamma\Lambda\Gamma^Tz = w^T\Lambda w,
\quad\text{and} \\
&z^Tz = z^T\Gamma\Gamma^Tz = w^Tw.
\end{split}
$$
Now we can reformulate the problem as
$$
\max_w\left\{w^T\Lambda w\right\} = \max_w\left\{\sum_{i=1}^p
\lambda_iw_i^2\right\} \quad\text{s.t.}\quad w^Tw = 1.
$$
Since $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p$, the maximum is attained by choosing the first element of $w$ to be one and the rest to be zero. This means that $z = \Gamma w = \gamma_1$, where $\gamma_1$ is the first eigenvector of $W^{-1/2}BW^{-1/2}$, and hence $a = W^{-1/2}z = W^{-1/2}\gamma_1$.
Note that for any two matrices $A\in\mathbb{R}^{n\times p}$ and $C\in\mathbb{R}^{p\times n}$, the non-zero eigenvalues of $AC$ and $CA$ are the same and have the same multiplicities (Theorem A.6.2 of Mardia, Kent and Bibby). Setting $A = W^{-1/2}B$ and $C = W^{-1/2}$, the non-zero eigenvalues of $CA = W^{-1}B$ are the same as those of $AC = W^{-1/2}BW^{-1/2}$. Hence, $\lambda_1$ is the largest eigenvalue of $W^{-1}B$. Since $\gamma_1$ is the eigenvector of $W^{-1/2}BW^{-1/2}$ corresponding to the largest eigenvalue $\lambda_1$, we have that
$$
W^{-1}B(W^{-1/2}\gamma_1) = W^{-1/2}(W^{-1/2}BW^{-1/2}\gamma_1)
= W^{-1/2}\lambda_1\gamma_1 = \lambda_1(W^{-1/2}\gamma_1).
$$
This shows that $a = W^{-1/2}\gamma_1$ is an eigenvector of $W^{-1}B$ corresponding to its largest eigenvalue $\lambda_1$, which completes the proof.
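The result can also be checked numerically. The self-contained sketch below (reusing the two-species iris data from Problem 1) builds $W$ and $B$, takes the leading eigenvector of $W^{-1}B$, and confirms that its ratio $a^TBa / a^TWa$ equals the largest eigenvalue and is not exceeded by random directions:

```{r}
data(iris)
iris2 <- droplevels(iris[-(1:50), ])
X <- as.matrix(iris2[, 1:4])
g <- iris2$Species

# Within- and between-group matrices W and B
xbar <- colMeans(X)
W <- matrix(0, 4, 4)
B <- matrix(0, 4, 4)
for (lev in levels(g)) {
  Xi <- X[g == lev, ]
  W <- W + (nrow(Xi) - 1) * cov(Xi)
  di <- colMeans(Xi) - xbar
  B <- B + nrow(Xi) * di %*% t(di)
}

# Ratio a^T B a / a^T W a for a candidate direction a
ratio <- function(a) as.numeric(t(a) %*% B %*% a / (t(a) %*% W %*% a))

# Leading eigenvector of W^{-1} B
eig <- eigen(solve(W) %*% B)
a_star <- Re(eig$vectors[, 1])

ratio(a_star)  # equals the largest eigenvalue, up to numerical error
Re(eig$values[1])
# Random directions never exceed the maximum attained by a_star
max(replicate(1000, ratio(rnorm(4))))
```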