To complete this unit, you need to:
- Watch the video lectures
- Read the materials for written assignment 4 (optional)
- Complete the unit 5 discussion forum task
- Submit written assignment 4 (optional)
- View the model answer for written assignment 4 and the instructor's comments on written assignment 4 (optional)
- Participate in the seminar (mandatory)
- Submit the reflection and feedback for unit 5
The unit addresses two important violations of regression assumptions: non-linearity and non-independence of observations. Thus far we have used log transformations and quadratic terms for modeling non-linear effects. In this unit we introduce the generalized linear model (GLM), which is in many cases more appropriate than manual transformations of variables. This modeling technique covers the most commonly used single dependent variable models as special cases (e.g. logistic regression, Poisson regression, tobit regression). Maximum likelihood estimation is introduced for those students who want to do quantitative research themselves.
The second topic is non-independence of observations, which can arise e.g. in longitudinal datasets or datasets where people are nested in teams. These kinds of data are often analyzed with panel data regressions (GLS random effects, GLS fixed effects) or multilevel modeling (i.e. HLM). However, in many cases these advanced techniques are unnecessary, and normal regression with cluster-robust standard errors would do fine. Our primary objective with non-independent data is to understand the different kinds of effects that can be estimated from such data and how the within effect can be estimated with normal regression.
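As a rough illustration of the last point, the sketch below estimates a within effect with ordinary regression by first subtracting each team's mean from both variables, and then computes a cluster-robust standard error for that slope. The data, the team labels, and all function names are hypothetical and made up for this example. The standard error uses the plain sandwich formula without finite-sample corrections, so statistical software that applies such corrections would report a somewhat different number.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (team, x, y). Teams with higher x have lower y
# overall, but within each team y increases with x.
data = [
    ("A", 1, 1.1), ("A", 2, 1.9), ("A", 3, 3.0),
    ("B", 4, -2.0), ("B", 5, -0.9), ("B", 6, 0.1),
    ("C", 7, -5.1), ("C", 8, -4.0), ("C", 9, -3.1),
]
teams = [row[0] for row in data]
x = [float(row[1]) for row in data]
y = [float(row[2]) for row in data]


def mean(values):
    return sum(values) / len(values)


def ols_slope(x_values, y_values):
    """Slope from a simple regression of y on x (intercept included)."""
    mx, my = mean(x_values), mean(y_values)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x_values, y_values))
    sxx = sum((xi - mx) ** 2 for xi in x_values)
    return sxy / sxx


def demean_by_cluster(clusters, values):
    """Subtract each cluster's own mean from its observations."""
    grouped = defaultdict(list)
    for c, v in zip(clusters, values):
        grouped[c].append(v)
    means = {c: mean(vs) for c, vs in grouped.items()}
    return [v - means[c] for c, v in zip(clusters, values)]


def cluster_robust_se(clusters, x_w, y_w, slope):
    """Sandwich standard error for a one-regressor model on demeaned data.

    Residual-weighted x values are summed within each cluster before
    squaring, which allows errors to correlate inside a cluster.
    No finite-sample correction is applied (an assumption of this sketch).
    """
    scores = defaultdict(float)
    for c, xi, yi in zip(clusters, x_w, y_w):
        scores[c] += xi * (yi - slope * xi)
    sxx = sum(xi ** 2 for xi in x_w)
    return sqrt(sum(s ** 2 for s in scores.values())) / sxx


# Pooled regression mixes within-team and between-team variation.
pooled_slope = ols_slope(x, y)

# Within effect: regression on team-demeaned variables.
x_within = demean_by_cluster(teams, x)
y_within = demean_by_cluster(teams, y)
within_slope = ols_slope(x_within, y_within)
within_se = cluster_robust_se(teams, x_within, y_within, within_slope)

print("pooled slope:", round(pooled_slope, 3))
print("within slope:", round(within_slope, 3))
print("cluster-robust SE of within slope:", round(within_se, 4))
```

In this toy dataset the pooled slope and the within slope have opposite signs, which shows why it matters to decide which effect is of interest before choosing the model.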
After this unit you should:
- Understand that the reason for using GLM models is that you want to model non-linear relationships. While GLM models are often used when the dependent variable is non-continuous (binary, count, categorical), the distribution of the dependent variable is not a reason to use GLM per se, but non-linear models often make sense in these cases.
- Understand the interpretation of the three most common GLM curves: the line (normal regression), the exponential curve (log link), and the S-curves (logistic and probit), and how the effects of other variables are interpreted when these curves are used.
- Understand why plotting is essential when interpreting non-linear models.
- (Only students who want to use statistical analysis techniques in their own research:) Understand the principle of maximum likelihood estimation.
- Understand the challenges that non-independent data pose for regression analysis.
- Be able to explain the within effect and contextual effect and understand when these would be of interest.
- Be able to estimate the within effect using normal regression and cluster robust standard errors.
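The curves named in the objectives above can be compared with a small numerical sketch. It applies each curve to the same linear predictor and reports how much the predicted value changes when x increases by one unit, starting from different values of x. The coefficient values and function names are hypothetical, chosen only for illustration. Nothing is estimated here; the sketch only shows the shapes of the curves.

```python
from math import erf, exp, sqrt


def identity_curve(eta):
    """The line: normal regression."""
    return eta


def exponential_curve(eta):
    """The exponential curve: log link."""
    return exp(eta)


def logistic_curve(eta):
    """S-curve used in logistic regression."""
    return 1.0 / (1.0 + exp(-eta))


def probit_curve(eta):
    """S-curve used in probit regression (standard normal CDF)."""
    return 0.5 * (1.0 + erf(eta / sqrt(2.0)))


def one_unit_effect(curve, b0, b1, x_value):
    """Change in the predicted value when x goes from x_value to x_value + 1."""
    return curve(b0 + b1 * (x_value + 1)) - curve(b0 + b1 * x_value)


# Hypothetical coefficients for the linear predictor b0 + b1 * x.
B0, B1 = 0.0, 1.0

curves = [
    ("line", identity_curve),
    ("exponential", exponential_curve),
    ("logistic", logistic_curve),
    ("probit", probit_curve),
]

for name, curve in curves:
    effects = [round(one_unit_effect(curve, B0, B1, x0), 3) for x0 in (0, 2, 4)]
    print(name, "effect of +1 in x at x = 0, 2, 4:", effects)
```

Only the line gives the same effect at every starting point. With the exponential curve the effect grows with x, and with the S-curves it shrinks as the prediction approaches its upper limit, so a single coefficient does not describe the effect on the original scale, and plotting the predictions is the practical way to read these models.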
Some of the video materials in this unit belong to a larger set of materials developed for an advanced course and can go into some technical detail (e.g. the exponential models for counts video). Instead of understanding all the details, it is important to understand the big picture: when the use of these techniques would be appropriate and how the results are interpreted.