Overview

  • This course is an introduction to multivariate statistical analysis. The goal is to learn the basics of common multivariate data analysis techniques and to apply the methods in practice. The R software is used in the exercises of this course. The topics of the course are multivariate location and scatter, principal component analysis, bivariate correspondence analysis, multiple correspondence analysis, canonical correlation analysis, discriminant analysis, classification, and clustering.

    Note that all the lectures and exercise classes are given on campus. Remote attendance is not possible. 

    Before the course starts, make sure that you know how to calculate univariate means, medians, variances, and maximum and minimum values. Familiarize yourself with correlation coefficients and common graphical presentations of data (boxplots, scatter plots, histograms, bar plots, pie charts). Learn to calculate the multivariate mean vector and covariance matrix. Make sure that you know what a cumulative distribution function, a probability density function, and a probability mass function are. Make sure that you know what the expected value of a random variable is. Read about univariate and multivariate normal distributions and elliptical distributions. Make sure that you know what is meant by centrally symmetric distributions and skewed distributions. Recall what the determinant, eigenvectors, and eigenvalues of a matrix are, and make sure that you know what is meant by a symmetric matrix and a positive definite matrix.
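
    As a quick self-check for the mean vector and covariance matrix mentioned above, here is a minimal sketch. It is written in plain Python purely for illustration (the course itself uses R). The function names, the toy data, and the choice of the sample covariance with divisor n - 1 are assumptions made for this example, not course material.

```python
def mean_vector(rows):
    """Return the componentwise mean of a list of equal-length observations."""
    n = len(rows)
    p = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]


def covariance_matrix(rows):
    """Return the p x p sample covariance matrix (assumed divisor: n - 1)."""
    n = len(rows)
    p = len(rows[0])
    m = mean_vector(rows)
    return [
        [
            sum((row[j] - m[j]) * (row[k] - m[k]) for row in rows) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


# Toy data invented for illustration: 4 observations of 2 variables.
toy_data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
toy_mean = mean_vector(toy_data)
toy_cov = covariance_matrix(toy_data)
```

    Each row of `toy_data` is one observation and each column one variable, so `toy_mean` has one entry per variable and `toy_cov` is a symmetric 2 x 2 matrix.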

    How to pass this course?

    You are expected to:

    -Attend the lectures and be active - not compulsory, no points, but highly recommended. 

    -Submit your project work on time - THIS IS COMPULSORY - max 6 points.

    -Take the exam - max 24 points. 

    -Participate in the weekly exercises (group 1, group 2, group 3 OR group 4) - not compulsory, but highly recommended - max 3 points.

    -Be ready to present your homework solutions in the exercise group - not compulsory, but highly recommended - max 3 points.

    Max total points = 6 + 24 + 3 + 3 = 36. You need at least 16 points in order to pass the course.

    How to get a good grade?

    -Attend the lectures and be active!

    -Work hard on your project work.

    -Be active in the exercises!

    -Study for the exam!

    Grading is based on the total points as follows: 16p -> 1, 20p -> 2, 24p -> 3, 28p -> 4, 32p -> 5.
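
    The point-to-grade mapping above can be sketched as a small lookup. This is only an illustration in Python: it assumes that each listed value is the minimum total for that grade, and the function name `course_grade` and the use of 0 to denote a fail are hypothetical choices for this example.

```python
def course_grade(total_points):
    """Map total course points (0-36) to a grade; 0 denotes a fail here."""
    # (minimum points, grade), checked from the highest grade downwards.
    # Assumption: each listed value is the lower bound for that grade.
    thresholds = [(32, 5), (28, 4), (24, 3), (20, 2), (16, 1)]
    for minimum, grade in thresholds:
        if total_points >= minimum:
            return grade
    return 0


# Hypothetical student: project 4 + exam 18 + exercises 3 + homework 2.
example_total = 4 + 18 + 3 + 2
example_grade = course_grade(example_total)
```

    With these hypothetical points the total is 27, which falls between the thresholds for grades 3 and 4.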



  • Lecture materials are placed here. Note that there is a reference list at the end of each lecture slide set. If you wish to study a topic in more detail, you can consult the literature in the reference list.

    Note that all the lectures are given on campus. Remote attendance is not possible. 

  • Exercises

    Participate in the weekly exercises (group 1, group 2, group 3 OR group 4) - not compulsory, but highly recommended - max 3 points. If you attend 2-3 times, you get 1 point. If you attend 4-5 times, you get 2 points. If you attend at least 6 times (out of 11 times), you get 3 points.
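
    The attendance bands above can be written as a short function. This is a minimal Python sketch for illustration only: the name `participation_points` is hypothetical, and it assumes that attending fewer than 2 sessions gives 0 points, which the text implies but does not state.

```python
def participation_points(sessions_attended):
    """Return exercise participation points for the number of sessions attended."""
    if sessions_attended >= 6:
        return 3
    if sessions_attended >= 4:
        return 2
    if sessions_attended >= 2:
        return 1
    # Assumption: 0 or 1 attended sessions earn no points.
    return 0
```

    For example, `participation_points(5)` gives 2 points, and attending all 11 sessions still gives the maximum of 3.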

    In order to earn the exercise points, you have to arrive at the exercise session on time and write your name on the participation list. You cannot get any exercise points without attending the exercises.

    Exercise session 11 is reserved for the project work and for summarizing the contents of the course.

    Attending all the exercise sessions, including the last one, is highly recommended.

    Note that all the exercise classes are given on campus. There are no remote groups. 

    Homework

    Solve the homework problems and be ready to present your solutions in the exercise group - not compulsory, but highly recommended - max 3 points. Note that your solution does not have to be perfect or even correct --- trying your very best is enough!

    If you solve your homework assignments 2-3 times, you get 1 point. If you solve your homework assignments 4-5 times, you get 2 points. If you solve your homework assignments at least 6 times (out of 10 times), you get 3 points.

    In order to earn the homework points, you have to arrive at the exercise session on time and write your name on the homework list. You cannot get any homework points without attending the exercises.

    The exercise points are valid until the end of November 2023.

    Project Work 

    Submit your project work on time as a single PDF file - THIS IS COMPULSORY - max 6 points.

    Find a multivariate (at least 3-variate) dataset (for example from Statistics Finland (Tilastokeskus) or the OECD, or collect one yourself, ...), set a research question, and perform a multivariate analysis. Write a report (max 10 pages), and submit it below before Friday 14.4.2023 at 12.00! Note that the deadline is at noon, not midnight!

    Note that the project work has to be conducted individually. Group work is not allowed.

    Goals of the project work:

    -Description of the research questions

    -Description of the dataset

    -Univariate and bivariate statistical analysis to present the variables

    -Application of your chosen multivariate statistical methods to answer the research questions (justification and output)

    -Conclusions and answers to the question raised at the beginning

    -Critical evaluation of the analysis

    Remember that finding nothing is also a finding!

    Note that you will automatically get 0 points for the exam if you do not submit your project work on time!

    About grading of the project work: 

    The maximum is 6 points, divided as follows.

    Intro (description of the research question and of the data source or data collection) --- max 0.5 p.

    Univariate analysis (description of the variables, summary statistics, visualization) --- max 1 p.

    Bivariate analysis (analysis of bivariate dependencies, visualization) --- max 1 p.

    Multivariate analysis --- max 3 p. This is divided into selection of the method --- max 1 p.; technical implementation --- max 1 p.; and presentation and interpretation of the results --- max 1 p.

    Critical evaluation (discussion of possible sources of bias, etc.) --- max 0.5 p.

    If the report is not polished (blurry images, text running into the margins, etc.), this may lead to a deduction of 1 p.

    Note that you don't have to attach any R code to your project work.