Canonical correlation analysis involves partitioning a collection of variables into two sets, an x-set and a y-set. The objective is then to find linear combinations \(\eta = a^T x\) and \(\phi = b^T y\) such that \(\eta\) and \(\phi\) have the largest possible correlation. Such linear combinations can give insight into the relationships between the two sets of variables.
Canonical correlation analysis has certain maximal properties similar to those of principal component analysis. However, whereas principal component analysis considers interrelationships within a set of variables, the focus of canonical correlation is on the relationship between two groups of variables.
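As a small illustration of this maximality property, the sketch below uses simulated data and R's built-in cancor function; the data, the seed, and the weight vectors c(1, 1) and c(1, 1, 1) are arbitrary choices made only for this example. The first canonical correlation is at least as large (in absolute value) as the correlation of any other pair of linear combinations.

# A minimal sketch with simulated data (illustrative only, not the car data used below)
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 2), ncol = 2)                 # x-set with two variables
y <- cbind(x[, 1] + rnorm(n), rnorm(n), rnorm(n))   # y-set with three variables
# Correlation of one arbitrary pair of linear combinations
cor(x %*% c(1, 1), y %*% c(1, 1, 1))
# The first canonical correlation is the maximum over all such pairs
cancor(x, y)$cor[1]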

Problem 1. Canonical Correlation Analysis

Load CAR.txt into your R workspace. It contains evaluations of cars from several international manufacturers in eight different categories. The scores range from 1 (very good) to 6 (very bad). Perform a canonical correlation analysis using the following groups:
\[\begin{equation} X = (Price, Value) \\ Y = (Economy, Service, Design, Sport, Safety, Easy h.) \end{equation}\]
Note that cars with a score of 6 for the variable Value lose their value the fastest over time.

Read, modify, and look at the data
car <- read.table("CAR.txt",header=T,sep="\t")
str(car)
## 'data.frame':    24 obs. of  10 variables:
##  $ Type   : Factor w/ 22 levels "Audi","BMW","Citroen",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Model  : Factor w/ 20 levels "","1.3","100",..: 3 8 10 1 19 13 1 1 17 7 ...
##  $ Economy: num  3.9 4.8 3 5.3 2.1 2.3 2.5 4.6 3.2 2.6 ...
##  $ Service: num  2.8 1.6 3.8 2.9 3.9 3.1 3.4 2.4 3.9 3.3 ...
##  $ Value  : num  2.2 1.9 3.8 2.2 4 3.4 3.2 1.6 4.3 3.7 ...
##  $ Price  : num  4.2 5 2.7 5.9 2.6 2.6 2.2 5.5 2 2.8 ...
##  $ Design : num  3 2 4 1.7 4.5 3.2 3.3 1.3 4.3 3.7 ...
##  $ Sport  : num  3.1 2.5 4.4 1.1 4.4 3.3 3.3 1.6 4.5 3 ...
##  $ Safety : num  2.4 1.6 4 3.3 4.4 3.6 3.3 2.8 4.7 3.7 ...
##  $ Easy.h.: num  2.8 2.8 2.6 4.3 2.2 2.8 2.4 3.6 2.9 3.1 ...
summary(car)
##       Type        Model       Economy         Service          Value      
##  Opel   : 2          : 5   Min.   :2.100   Min.   :1.600   Min.   :1.600  
##  VW     : 2   1.3    : 1   1st Qu.:2.575   1st Qu.:2.400   1st Qu.:2.175  
##  Audi   : 1   100    : 1   Median :3.100   Median :2.900   Median :3.200  
##  BMW    : 1   19     : 1   Mean   :3.254   Mean   :3.021   Mean   :3.104  
##  Citroen: 1   200    : 1   3rd Qu.:3.825   3rd Qu.:3.425   3rd Qu.:3.725  
##  Ferrari: 1   306    : 1   Max.   :5.300   Max.   :4.700   Max.   :5.500  
##  (Other):16   (Other):14                                                  
##      Price           Design         Sport           Safety     
##  Min.   :1.500   Min.   :1.30   Min.   :1.100   Min.   :1.400  
##  1st Qu.:2.600   1st Qu.:2.95   1st Qu.:3.075   1st Qu.:2.800  
##  Median :2.900   Median :3.20   Median :3.300   Median :3.200  
##  Mean   :3.246   Mean   :3.20   Mean   :3.450   Mean   :3.279  
##  3rd Qu.:4.050   3rd Qu.:3.55   3rd Qu.:3.925   3rd Qu.:3.725  
##  Max.   :5.900   Max.   :4.80   Max.   :5.800   Max.   :5.900  
##                                                                
##     Easy.h.     
##  Min.   :1.600  
##  1st Qu.:2.400  
##  Median :2.650  
##  Mean   :2.737  
##  3rd Qu.:2.925  
##  Max.   :4.300  
## 
# Convert X and Y to matrices for future operations
# (note that e.g. matrix multiplication is not possible for data.frame objects)

X <- as.matrix(car[,c(6,5)]) # X = (Price, Value)
Y <- as.matrix(car[,c(3,4,7:10)]) # Y = (Economy, Service, Design, Sport, Safety, Easy.h)

# Combine X and Y
XY <- cbind(X,Y)
# You can give rownames to your matrix, might help later in the interpretation part:
rownames(XY) <- paste(car$Type, car$Model)

a) Compute the sample canonical vectors with corrected scaling.

Let \(x\) be a \(p\)-variate random vector and let \(y\) be a \(q\)-variate random vector. In canonical correlation analysis the aim is to find the canonical vectors \(\alpha\) and \(\beta\) such that the linear combinations \(\alpha^T x\) and \(\beta^T y\) have the largest possible correlation. In other words, we want to maximize \(\rho_k\):

\[\begin{equation} \rho_k = |corr(u_k, v_k)| \end{equation}\]

such that \(var(u_k) = var(v_k) = 1\), where \(u_k = \alpha_k^{T}x\) and \(v_k = \beta_k^{T}y\). The canonical vectors \(\alpha_k\) and \(\beta_k\) are found by computing the eigenvectors of matrices \(M_1\) and \(M_2\), where:

\[\begin{equation} \left\{\begin{matrix} M_1 = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \\ M_2 = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \end{matrix}\right. \end{equation}\]
R <- cov(XY) 
R11 <- R[1:2,1:2]  # cov(X)
R22 <- R[3:8,3:8]  # cov(Y)
R21 <- R[3:8,1:2]  # cov(Y,X)
R12 <- R[1:2,3:8]  # cov(X,Y)
R11.inv <- solve(R11)
R22.inv <- solve(R22)

# Matrix M_1
M1 <- R11.inv %*% R12 %*% R22.inv %*% R21

# Matrix M_2
M2 <- R22.inv %*% R21 %*% R11.inv %*% R12

The non-zero eigenvalues of \(M_1\) and \(M_2\) are the same:

eigen(M1)$values
## [1] 0.9592139 0.8202120
eigen(M2)$values[1:2]
## [1] 0.9592139 0.8202120

The eigenvectors of \(M_1\) and \(M_2\) are computed with the function eigen. However, if we use these directly as our canonical vectors, the variances of the resulting linear combinations will not satisfy the requirement \(var(u_k) = var(v_k) = 1\).

alpha_1 <- eigen(M1)$vectors[,1]
alpha_2 <- eigen(M1)$vectors[,2]

beta_1 <- eigen(M2)$vectors[,1]
beta_2 <- eigen(M2)$vectors[,2]

# Compute scores u_k and v_k
u_1 <- X %*% alpha_1
u_2 <- X %*% alpha_2
v_1 <- Y %*% beta_1
v_2 <- Y %*% beta_2

c(var(u_1), var(u_2), var(v_1), var(v_2))
## [1] 2.0691881 0.2496118 1.5235109 0.5523109
In order to achieve unit variance, we need to divide each canonical vector by the standard deviation of the corresponding score vector. It can be shown that the variances of the transformed variables may be expressed using the covariance matrices: \[\begin{align} \left\{\begin{matrix} var(u_1) &= \alpha_1^T \Sigma_{11} \alpha_1\\ var(u_2) &= \alpha_2^T \Sigma_{11} \alpha_2\\ var(v_1) &= \beta_1^T \Sigma_{22} \beta_1\\ var(v_2) &= \beta_2^T \Sigma_{22} \beta_2 \end{matrix}\right. \end{align}\]
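As a quick check, these quadratic forms reproduce the variances of the score vectors printed above:

# Check: the quadratic forms equal var(u_1), var(u_2), var(v_1), var(v_2) from above
c(alpha_1 %*% R11 %*% alpha_1, alpha_2 %*% R11 %*% alpha_2,
  beta_1 %*% R22 %*% beta_1, beta_2 %*% R22 %*% beta_2)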

Here we rescale the canonical vectors

alpha_1 <- alpha_1 / c(sqrt(alpha_1 %*% R11 %*% alpha_1))
alpha_2 <- alpha_2 / c(sqrt(alpha_2 %*% R11 %*% alpha_2))
beta_1  <- beta_1 / c(sqrt(beta_1 %*% R22 %*% beta_1))
beta_2  <- beta_2 / c(sqrt(beta_2 %*% R22 %*% beta_2))

# Re-compute scores u_k and v_k
u_1 <- X %*% alpha_1
u_2 <- X %*% alpha_2
v_1 <- Y %*% beta_1
v_2 <- Y %*% beta_2

c(var(u_1), var(u_2), var(v_1), var(v_2))
## [1] 1 1 1 1

b) Compute the score vectors corresponding to the canonical vectors and the sample canonical correlations.
# Score vectors:
# (Original values weighted by the scaled eigenvectors.)
eta1 <- X %*% alpha_1
eta2 <- X %*% alpha_2
fii1 <- Y %*% beta_1
fii2 <- Y %*% beta_2

# Sample canonical vectors scaled such that the score vectors have unit variance (var = 1); the scaling makes them unique up to sign.
score_matrix <- cbind(eta1, eta2, fii1, fii2)
colnames(score_matrix) <- c('eta1', 'eta2', 'fii1', 'fii2')
apply(score_matrix, 2, var)
## eta1 eta2 fii1 fii2 
##    1    1    1    1
# Matrix of the canonical correlations
round(cor(score_matrix), 2)
##      eta1 eta2 fii1 fii2
## eta1 1.00 0.00 0.98 0.00
## eta2 0.00 1.00 0.00 0.91
## fii1 0.98 0.00 1.00 0.00
## fii2 0.00 0.91 0.00 1.00
# Canonical correlations
# r1 = 0.98
# r2 = 0.91 (canonical correlations are reported as absolute values; the sign of a
#            score vector depends on the arbitrary sign of the corresponding eigenvector)
# The relationship within both pairs is quite strong.

# Note that the canonical correlations are also the square roots of the non-zero eigenvalues of M1,M2
round(sqrt(eigen(M1)$values),2)
## [1] 0.98 0.91
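The same canonical correlations can also be obtained with R's built-in cancor function. Note that cancor scales (and possibly signs) its coefficient vectors differently from the unit-variance convention used above, so only the correlations are compared here:

# Built-in alternative (coefficient scaling/signs may differ from the manual computation)
round(cancor(X, Y)$cor, 2)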

c) Interpret the first pair of canonical variables and plot the corresponding scores.
# Context.
# u1 and v1:
# We call u1 the value index of the car
# and v1 the quality index of the car.
paste0(round(alpha_1,2), colnames(X), collapse = ' ')
## [1] "0.32Price -0.62Value"
paste0(round(beta_1,2), colnames(Y), collapse = ' ')
## [1] "0.43Economy -0.21Service 0Design -0.47Sport -0.22Safety -0.4Easy.h."
# In the x-set, Price and Value have opposite signs.
# In the y-set, Economy has the same sign as Price, while Service, Sport, Safety and
# Easy handling have the same sign as Value (Design has a negligible weight).
plot(eta1, fii1,
     xlab="'Value index' of the car",
     ylab="'Quality' of the car",
     pch = 16, col = 4, cex = 1.25,
     xlim = c(-4,1), ylim = c(-5,-1))
text(eta1,fii1,labels=paste(car$Type,car$Model), pos = 2)

# In the plot, x-axis:
# The leftmost point would be a car with Value = 6 (very bad) and Price = 1 (very good)
# --> On the left we have cheap cars that lose value fast (e.g. Wartburg, Trabant)

# The rightmost point would be a car with Value = 1 (very good) and Price = 6 (very bad)
# --> On the right we have cars that are expensive but lose value slowly (e.g. BMW, Jaguar, Ferrari)

# We interpret u1 as the value index of the car, such that high scores on the x-axis indicate that
# the car is a "worthy" investment. Note also that Value has almost twice as much weight in the
# scores here as Price.

# 6 = very bad, 1 = very good
# v1 = 0.43*Economy - 0.21*Service + 0*Design - 0.47*Sport - 0.22*Safety - 0.4*Easy.h

# y-axis:
# Here, Economy, Sport and Easy handling have the largest weights in this component;
# Design has a negligible effect on the scores.

# The uppermost points have very bad Economy and very good Service, Sport, Safety and Easy.h
# --> The uppermost cars use a lot of fuel but the other qualities are good
# (e.g. BMW, Ferrari)

# The bottommost points have very good economy but the other qualities are very bad
# --> The bottommost cars are economical but other qualities are bad. 

# v1 can be interpreted as a quality index of the car, such that you get a sporty car that is easy
# to drive at the expense of economy.
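# As a further check on this interpretation, the cars can be ranked by their scores
# on the first pair (using the row names defined earlier for readability):
sort(setNames(round(eta1[, 1], 2), rownames(XY)))  # value index u1, low to high
sort(setNames(round(fii1[, 1], 2), rownames(XY)))  # quality index v1, low to high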

# The second component can be interpreted similarly; it might be more difficult (but not impossible). For example, Ferrari and Wartburg have similar profiles.
paste0(round(alpha_2,2), colnames(X), collapse = ' ')
## [1] "-1.41Price -1.42Value"
paste0(round(beta_2,2), colnames(Y), collapse = ' ')
## [1] "-0.46Economy -0.7Service 0.06Design 0Sport 0.3Safety -1.01Easy.h."
# Note the similar weights on Price and Value.
plot(eta2, fii2,
     xlab="u2",
     ylab="v2",
     pch = 16, col = 4, cex = 1.25,
     xlim = c(-12,-6))
text(eta2,fii2,labels = paste(car$Type,car$Model), pos = 2)

# The x-axis might reflect the average of Value and Price:
sort(rowSums(XY[,1:2]) / 2)
##           VW Golf         VW Passat          Hyundai         Opel Corsa 
##              2.30              2.65              2.70              2.80 
##       Opel Vectra       Ford Fiesta      Nissan Sunny            Volvo  
##              2.95              3.00              3.00              3.05 
##       Lada Samara      Mercedes 200          Audi 100       Peugeot 306 
##              3.15              3.20              3.20              3.20 
##        Renault 19    Toyota Corolla        Citroen AX         Mazda 323 
##              3.20              3.20              3.25              3.25 
##          Fiat Uno            Rover  Mitsubishi Galant      BMW 5 series 
##              3.30              3.30              3.35              3.45 
##       Trabant 601           Jaguar       Wartburg 1.3          Ferrari  
##              3.50              3.55              3.60              4.05
# Looking at the original data might be helpful:
# e.g., why do Wartburg and Ferrari display similarities?
car[c(24,4),]
##        Type Model Economy Service Value Price Design Sport Safety Easy.h.
## 24 Wartburg   1.3     3.7     4.7   5.5   1.7    4.8   5.2    5.5     4.0
## 4   Ferrari           5.3     2.9   2.2   5.9    1.7   1.1    3.3     4.3
# Or why the two VWs are similar
car[c(22,23),]
##    Type  Model Economy Service Value Price Design Sport Safety Easy.h.
## 22   VW   Golf     2.4     2.1   2.0   2.6    3.2   3.1    3.1     1.6
## 23   VW Passat     3.1     2.2   2.1   3.2    3.5   3.5    2.8     1.8