Topic outline

  • To complete this unit you need to

    1. Watch the video lectures
    2. Read the materials for written assignment 2 (optional)
    3. Complete the unit 3 discussion forum task
    4. Submit written assignment 2 (optional)
    5. View the model answer and instructor's comments for written assignment 2 (optional)
    6. Participate in the seminar (mandatory)
    7. Participate in the computer class (optional)
    8. Submit data analysis assignment 1 (mandatory)
    9. Complete your caption assignments for unit 3 and all earlier units
    10. Submit reflection and feedback for unit 3

    The unit discusses assumptions and principles behind regression analysis. After this unit, you should

    1. Understand that all statistical techniques make assumptions, some of which are empirically testable and others not, and that some assumptions are more important than others.
    2. Understand how a log transformation can be applied to model non-linear, relative effects.
    3. Have a basic understanding of the concept of endogeneity and why it is a serious challenge for non-experimental research.
    4. Have a basic understanding of how and why regression results can be visualized using marginal prediction plots.
    5. Understand the relationship between a linear model and its correlation matrix, and why this relationship is useful when learning about linear models such as regression.

    • Choice icon

      Please indicate your preferred participation mode for this seminar. The in-person seminar will be organized only if at least two people sign up.

    • Forum icon
    • Video lectures

    • Topic 1: Revisiting unit 2 concepts

    • H5P icon
    • Topic 2: More on the use of regression analysis

    • H5P icon
    • H5P icon
    • Topic 3: Statistical tests after regression

    • H5P icon
    • H5P icon
    • H5P icon
    • H5P icon
    • Topic 4: Model implied correlation matrix and misunderstandings of regression

    • H5P icon
    • Because your group is "Reader of quantitative research", a video about model implied covariance matrix will not be shown.

    • H5P icon

      How to calculate a covariance matrix. This rule is useful because it allows us to see that the variance of Y is a sum of all these different sources of variation.

      Note: This video contains errors and will be re-recorded.

      Click to view transcript

      In this video, I will extend the previous video's principle to covariance matrices. A correlation matrix is a special case of a covariance matrix that has been scaled so that the variance of each variable is 1, so a correlation matrix is kind of like a standardized version of a covariance matrix. Some features of linear models are better understood in the covariance metric, so knowing the same set of rules in covariance form is useful. Let's take a look at the covariance between X1 and Y. We calculate the covariance of X1 and Y the same way as we calculated the correlation, except that we take the unstandardized regression coefficients here: previously we were working with standardized regression coefficients, but these are now unstandardized because we are working on the raw metric instead of the correlation metric. So we have the X1-to-Y path: beta 1 goes here.

      Then another way from X1 to Y is to first travel one covariance, from X1 to X2, and then take the regression path, so we get that term. Then from X1 to X3: one covariance, and then to Y. We sum those together, and that gives us the covariance between X1 and Y. It is the same math that we had in the correlation example, but instead of working with correlations, we work with covariances. Things get more interesting when we look at the variance of Y, which is given by the equation here. The idea is that we go from Y to each source of variance of Y and then come back. So we go from Y to X1, take the variance of X1, and come back: that is the variance of X1 times beta 1 squared. In the correlation metric we just took beta 1 times beta 1, beta 1 squared, because the variance in a correlation matrix is one, so we just ignored that.

      When we go from Y to X1, over the covariance to X2, and back through beta 2, we get that term here, and we count it both ways. This rule is useful because it allows us to see that the variance of Y is a sum of all these different sources of variation: we get variation due to X1, covariance due to X1 and X2, and variation due to the error term. So the variance of Y is the sum of all the variances and covariances of the explanatory variables, plus the variance of U, the error term, which is uncorrelated with all the explanatory variables. This covariance form of the model-implied correlation matrix rule is useful when you start working with more complicated models, such as confirmatory factor analysis models.
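      The path rules described in the transcript can be checked numerically. Below is a small sketch (not part of the course materials; all numbers are made-up example values): cov(X1, Y) is the direct path beta 1 times var(X1), plus one covariance "travel" to each other predictor followed by its path, and var(Y) is the coefficient-weighted sum of all variances and covariances of the predictors plus the error variance.

```python
import numpy as np

# Model: Y = b1*X1 + b2*X2 + b3*X3 + U, with U uncorrelated with the X's.
# Example (made-up) covariance matrix of (X1, X2, X3), coefficients, and
# error variance.
S_x = np.array([[2.0, 0.5, 0.3],
                [0.5, 1.5, 0.4],
                [0.3, 0.4, 1.0]])
b = np.array([0.7, -0.2, 0.4])
var_u = 0.6

# Path rule for cov(X1, Y): direct path b1*var(X1), plus one covariance
# "travel" from X1 to each other predictor followed by that predictor's path.
cov_x1_y = b[0]*S_x[0, 0] + b[1]*S_x[0, 1] + b[2]*S_x[0, 2]

# Path rule for var(Y): go from Y to every source of variation and back,
# i.e. sum over i, j of b_i * b_j * cov(Xi, Xj), plus the error variance.
var_y = b @ S_x @ b + var_u

# Cross-check by simulation: generate data consistent with S_x and the model.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(S_x)                      # so (Z @ L.T) has cov S_x
X = rng.standard_normal((1_000_000, 3)) @ L.T
Y = X @ b + np.sqrt(var_u)*rng.standard_normal(1_000_000)

print(cov_x1_y, np.cov(X[:, 0], Y)[0, 1])       # path rule vs simulated
print(var_y, Y.var(ddof=1))                     # path rule vs simulated
```

      With enough simulated observations, the sample covariance of X1 and Y and the sample variance of Y should agree closely with the path-rule values, which is the point of the rule: the model's coefficients and the covariances among the predictors fully determine the model-implied covariance matrix.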


    • H5P icon
    • H5P icon
    • Topic 5: Regression assumptions and diagnostics

    • H5P icon
    • H5P icon
    • H5P icon
    • H5P icon
    • H5P icon
    • File icon
      Correct answers for Regression diagnostics and analysis workflow video tasks File PDF
    • H5P icon
    • URL icon

      Check that you have completed your captioning assignments for this unit and all previous units. This item will be marked as completed once the course staff have reviewed your captions.

    • Assignments and model answers

      Model answers are shown only to students whose assignments have been graded.

    • Turnitin Assignment 2 icon
    • Reflection and feedback Unit 3

      Reflection is a key element of learning. The end of the unit is a good time to look back at what you have learned, where you did well, and what you can still improve on. After you have completed all parts of the unit and received grades and feedback for all your submitted work, fill in the short feedback form below.

    • Materials

    • Additional resources