The function mjca computes a multiple or joint correspondence analysis based on the eigenvalue decomposition of the Burt matrix. The lambda option selects the scaling variant desired for reporting inertias. \(lambda="indicator"\) gives multiple correspondence analysis based on the correspondence analysis of the indicator matrix, with corresponding inertias (eigenvalues) as presented in lecture slides.
# Argument 'lambda' controls the scaling method, "indicator" is as presented in the lecture slides.
# Argument 'reti' control whether the indicator matrix should be returned or not; let's set it to TRUE
tea.mca <- mjca(tea, lambda = "indicator", reti = T)
2. Overview of the mjca object, and some key details
You can get an overview of values stored in the object by calling name(). Explanations and additional details on these values may be obtained by calling help(mjca).
# Items saved into the mjca object:
names(tea.mca)
## [1] "sv" "lambda" "inertia.e" "inertia.t" "inertia.et"
## [6] "levelnames" "factors" "levels.n" "nd" "nd.max"
## [11] "rownames" "rowmass" "rowdist" "rowinertia" "rowcoord"
## [16] "rowpcoord" "rowctr" "rowcor" "colnames" "colmass"
## [21] "coldist" "colinertia" "colcoord" "colpcoord" "colctr"
## [26] "colcor" "colsup" "subsetcol" "Burt" "Burt.upd"
## [31] "subinertia" "JCA.iter" "indmat" "call"
# Variables and variable categories (factors, levels):
tea.mca$factors
## factor level
## Tea1 "Tea" "black"
## Tea2 "Tea" "Earl Grey"
## Tea3 "Tea" "green"
## How1 "How" "alone"
## How2 "How" "lemon"
## How3 "How" "milk"
## How4 "How" "other"
## how1 "how" "tea bag"
## how2 "how" "tea bag+unpackaged"
## how3 "how" "unpackaged"
## sugar1 "sugar" "No.sugar"
## sugar2 "sugar" "sugar"
## where1 "where" "chain store"
## where2 "where" "chain store+tea shop"
## where3 "where" "tea shop"
## always1 "always" "always"
## always2 "always" "Not.always"
# Singular values (eigen values, since lambda = 'indicator'):
tea.mca$sv^2
## [1] 0.27976178 0.25774772 0.22013794 0.18792961 0.16876495 0.16368666
## [7] 0.15288834 0.13838682 0.11569167 0.08612637 0.06221147
# Cumulative proportion of variation explained by k components:
cumsum(tea.mca$sv^2 / sum(tea.mca$sv^2))
## [1] 0.1525973 0.2931870 0.4132622 0.5157693 0.6078229 0.6971065 0.7805002
## [8] 0.8559839 0.9190885 0.9660665 1.0000000
3. The summary and analysis of the MCA results
# Mass: mass (weight) of the point
# QLT: quality of representation of the variable:category
# INR: The inertia of the point
# k=1: Principal coordinates of the first dimension
# COR: The relative contribution of the principal axis to the point inertia
# CTR: The absolute contribution of the point to the inertia of the axis
S <- summary(tea.mca)
barplot(data.frame(S$scree)[,3], ylim = c(0,20), names.arg = paste('PC',1:11), main = 'Screeplot', las = 2, xlab = 'Component', ylab = '% of variation explained');box()
S$columns
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 Tea:black 41 72 66 -446 65 29 143 7 3
## 2 Tea:Earl Grey 107 135 32 250 113 24 111 22 5
## 3 Tea:green 18 144 74 -464 27 14 -974 117 67
## 4 How:alone 108 118 30 22 1 0 -251 117 27
## 5 How:lemon 18 84 75 -682 58 31 464 27 15
## 6 How:milk 35 43 64 331 29 14 229 14 7
## 7 How:other 5 144 82 -289 3 1 2141 142 89
## 8 how:tea bag 94 639 45 616 497 128 -329 142 40
## 9 how:tea bag+unpackaged 52 519 66 -371 63 26 1001 457 203
## 10 how:unpackaged 20 667 92 -1943 515 270 -1057 152 87
## 11 sugar:No.sugar 86 62 41 -238 60 17 40 2 1
## 12 sugar:sugar 81 62 44 254 60 19 -42 2 1
## 13 where:chain store 107 715 38 533 506 108 -343 209 49
## 14 where:chain store+tea shop 43 705 75 -481 81 36 1333 624 299
## 15 where:tea shop 17 699 95 -2164 520 279 -1269 179 104
## 16 always:always 57 13 53 -109 6 2 118 7 3
## 17 always:Not.always 109 13 28 57 6 1 -62 7 2
4. Visualization and interpretation of the results
plot(tea.mca, arrows = c(T,T))
Note that, the figure is messy and typically it is better to save high-resolution images with ‘pdf’ function for example: pdf(‘filename.pdf’, width = width_in_incehes, height = height_in_inches)
5. Note how the MCA is essentially CA conducted using the indicator matrix
plot(ca(tea.mca$indmat),
arrows = c(F,T), map = 'symmetric', what = c('none','all'))
6. The plot can be done manually as well to increase tuning possibilities
# Obtain the principal coordinates scaled with the corresponding eigen value:
tea.covariates <- sweep(tea.mca$colcoord[,1:2],2, tea.mca$sv[1:2],'*')
# Make a normalization function, i.e., to scale values from 0 to 1 (this is for visulization purposes):
normalize <- function(x) {
(x - min(x)) / (max(x) - min(x))
}
# Save the mjca summary object:
a <- summary(tea.mca)
# Generate the scatter plot. Note how the point size is now scaled according to quality of representation:
plot(tea.covariates, xlim = c(-2.5,1), ylim = c(-1.5,2.5), pch = 16, col = 2,
cex = normalize(a$columns$` qlt`) + 1)
# Add arrows. Slight transparency is added to increase visibility.
arrows(rep(0,17), rep(0,17), tea.covariates[,1], tea.covariates[,2],
length = 0.05, col = rgb(1,0,0,0.25))
# "Cross-hair" is added, i.e., dotted lines crossing x and y axis at 0.
abline(h = 0, v = 0, lty = 3)
# Add variable:category names to the plot.
text(tea.covariates, tea.mca$levelnames, pos = 2, cex = 0.75)
\[ \begin{aligned} V=T^TT,\quad where\enspace T_{i,pl}=\frac{x_{ipl} - n_{pl}/n}{\sqrt{Pn_{pl}}} \end{aligned} \]
# T matrix:
n_pl <- unlist(apply(tea,2,table))
x_ipl <- tea.mca$indmat
n <- nrow(tea)
P <- ncol(tea)
K <- length(n_pl)
T_mat <- (x_ipl - matrix(rep(n_pl,n), ncol = 17, byrow = T)/n) /
matrix(rep(sqrt(P*n_pl),n), ncol = 17, byrow = T)
V_mat <- t(T_mat) %*% T_mat
# Trace(V) = K/P - 1 = sum(T^2)
sum(diag(V_mat))
## [1] 1.833333
K/P - 1
## [1] 1.833333
sum(T_mat^2)
## [1] 1.833333
# Non-zero eigen values of V are the same as the eigen values returned by 'mjca' function:
eigen(V_mat)$values[1:11]
## [1] 0.27976178 0.25774772 0.22013794 0.18792961 0.16876495 0.16368666
## [7] 0.15288834 0.13838682 0.11569167 0.08612637 0.06221147
tea.mca$sv^2
## [1] 0.27976178 0.25774772 0.22013794 0.18792961 0.16876495 0.16368666
## [7] 0.15288834 0.13838682 0.11569167 0.08612637 0.06221147
# And, the trace(V) = sum of the eigen values of V:
sum(eigen(V_mat)$values[1:11])
## [1] 1.833333