TU-L0022 - Statistical Research Methods D, Lecture, 25.10.2022-29.3.2023
This course space end date is set to 29.03.2023.
Data analysis assignment 2 (optional)
Instructions
This exercise is a continuation of the previous exercise. If you want, you can use your submission to the previous exercise as a starting point.
Add the categorical variable occupation type to the model (if you have not done so already) and compare the effects between the categories. Then carry out a moderation analysis and a mediation analysis on the Prestige dataset used in class, using the approaches presented by Baron and Kenny (1986) and a statistical software of your choice. Answer the following three research questions:
- Are there systematic differences in income between the occupation categories?
- Are women-dominated occupations rewarded less for prestigiousness than men-dominated occupations?
- To what extent is the positive relationship between education and income mediated by prestigiousness?
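The models below are one way to write down these two analyses. They are a sketch that assumes the variable names of the R Prestige data: income, prestige, education, and women (the percentage of women in the occupation). If you model the logarithm, use lnincome in place of income.

The moderation question becomes a test of the interaction coefficient β3. The mediation question follows the three Baron and Kenny regressions.

```latex
\begin{aligned}
&\text{Moderation (research question 2):} \\
&\text{income}_i = \beta_0 + \beta_1\,\text{prestige}_i + \beta_2\,\text{women}_i
  + \beta_3\,(\text{prestige}_i \times \text{women}_i) + \varepsilon_i \\[6pt]
&\text{Mediation (research question 3):} \\
&\text{income}_i   = c_0 + c\,\text{education}_i + \varepsilon_{1i} \\
&\text{prestige}_i = a_0 + a\,\text{education}_i + \varepsilon_{2i} \\
&\text{income}_i   = b_0 + c'\,\text{education}_i + b\,\text{prestige}_i + \varepsilon_{3i} \\[6pt]
&\text{indirect effect} = a\,b = c - c', \qquad
 \text{proportion mediated} = \frac{a\,b}{c}
\end{aligned}
```

A negative β3 would indicate that the income return to prestige is smaller in occupations with a larger share of women. With OLS estimated on the same sample, c = c' + ab holds exactly. The indirect effect can therefore be read either as the product ab or as the drop from c to c'.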
You can model either income or, if you find it necessary, the logarithm of income. Alternatively, you can use Poisson quasi-maximum likelihood (QML) regression with robust standard errors. This is not covered in the screencasts, but you can find an explanation and examples in R and Stata in this article.
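The sketch below shows, without dependencies, what Poisson QML with robust standard errors computes. It assumes you pass a design matrix whose rows start with 1.0 (for example education, prestige and occupation dummies) and income as the outcome. The function names are hypothetical. In practice you would use glm(..., family = poisson) with a sandwich covariance estimator in R (for example from the sandwich package), or poisson with vce(robust) in Stata.

This sketch uses the plain HC0 sandwich. Packaged routines may apply small-sample corrections, so their standard errors can differ slightly. The coefficients are on the log scale: exp(b_j) is the multiplicative change in expected income per unit change in x_j.

```python
import math


def mat_inv(A):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def poisson_qml(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: Poisson QML (log link) via Newton-Raphson, HC0 sandwich SEs.
    X: rows of regressors with a leading 1.0; y: non-negative outcome (need not be integer)."""
    k = len(X[0])
    b = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)  # start at the intercept-only fit

    def fitted(beta):
        # E[y | x] = exp(x'beta)
        return [math.exp(sum(xj * bj for xj, bj in zip(x, beta))) for x in X]

    def bread(mu):
        # X' diag(mu) X: the negative Hessian of the Poisson log-likelihood
        return [[sum(x[i] * x[j] * m for x, m in zip(X, mu)) for j in range(k)] for i in range(k)]

    for _ in range(max_iter):
        mu = fitted(b)
        score = [sum(x[j] * (yi - m) for x, yi, m in zip(X, y, mu)) for j in range(k)]
        H_inv = mat_inv(bread(mu))
        step = [sum(H_inv[i][j] * score[j] for j in range(k)) for i in range(k)]
        b = [bi + si for bi, si in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break

    mu = fitted(b)
    H_inv = mat_inv(bread(mu))
    # "Meat" of the sandwich: X' diag((y - mu)^2) X, which does not assume Var(y) = mu
    meat = [[sum(x[i] * x[j] * (yi - m) ** 2 for x, yi, m in zip(X, y, mu)) for j in range(k)]
            for i in range(k)]
    V = [[sum(H_inv[i][a] * meat[a][c] * H_inv[c][j] for a in range(k) for c in range(k))
          for j in range(k)] for i in range(k)]
    return b, [math.sqrt(max(V[i][i], 0.0)) for i in range(k)]
```

The estimator only assumes that the conditional mean exp(x'b) is correctly specified. For that reason the sandwich standard errors do not require income to actually follow a Poisson distribution.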
If you choose a non-linear model, provide evidence that the functional form is correct by plotting the observations together with the marginal prediction curves. You can also try spline regression, following the example shown in the article mentioned above. (This is not covered in the screencasts.)
Interpret the results using both confidence intervals and p-values. Calculate the confidence interval for the mediation (indirect) effect using both bootstrapping and the Sobel test or one of its variants. You can also try simultaneous equations estimation (sem in Stata, or sem in the lavaan R package; not covered in the screencasts).
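The sketch below compares the two interval types for the indirect effect ab, using only the standard library. All function names are hypothetical.

- **Sobel test.** It takes a, b and their standard errors as reported by your regression software. It uses SE(ab) = sqrt(b² SE(a)² + a² SE(b)²) and a normal approximation.
- **Percentile bootstrap.** It resamples occupations with replacement and re-estimates ab each time. It assumes the inputs are the education, prestige and income columns of the Prestige data.

```python
import math
import random


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (rows of X start with 1.0)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def indirect_effect(edu, prest, inc):
    """a*b: a from prestige ~ education, b from income ~ education + prestige."""
    a = ols([[1.0, e] for e in edu], prest)[1]
    b = ols([[1.0, e, p] for e, p in zip(edu, prest)], inc)[2]
    return a * b


def sobel(a, se_a, b, se_b, z_crit=1.959964):
    """First-order Sobel test; returns z, two-sided p-value and a normal-theory 95% CI."""
    ab = a * b
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = ab / se_ab
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return z, p, (ab - z_crit * se_ab, ab + z_crit * se_ab)


def bootstrap_ci(edu, prest, inc, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for a*b: resample cases (occupations) with replacement."""
    rng = random.Random(seed)
    n = len(edu)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(indirect_effect([edu[i] for i in idx], [prest[i] for i in idx],
                                     [inc[i] for i in idx]))
    draws.sort()
    return draws[int((1 - level) / 2 * reps)], draws[int((1 + level) / 2 * reps) - 1]
```

The Sobel interval is symmetric by construction. The bootstrap interval can be asymmetric, which matters because the sampling distribution of a product ab is often skewed.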
To answer whether there are systematic differences in income between the occupation categories, you need to test whether the regression coefficients of the occupation dummy variables differ from each other. This is not demonstrated in the screencasts, but it is explained in the video Testing linear hypotheses after regression (8:57). The Stata command for the test is test, and the R command is linearHypothesis (in the car package). Read the documentation of these commands to understand their use. If you use SPSS for this assignment, you need to calculate the test by hand, because SPSS does not support this test. To get the required covariance matrix of the estimates, click "Statistics" in the regression dialog and then check the box "Covariance matrix" in the "Linear regression: Statistics" dialog that appears.
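The "by hand" calculation for SPSS users boils down to a Wald test. The sketch below assumes that the coefficient vector b and its covariance matrix V have been copied from your regression output; wald_test is a hypothetical helper name.

- **One restriction.** For H0: b_prof = b_wc, the statistic is W = (b_prof - b_wc)² / (V_prof,prof + V_wc,wc - 2 V_prof,wc).
- **Several restrictions.** With q restrictions, F = W/q is the statistic that test in Stata and linearHypothesis in R report after OLS. Its p-value comes from the F(q, n - k) distribution, which the standard library does not provide.
- **Reported p-value.** The sketch returns the large-sample chi-square p-value instead.

```python
import math


def wald_test(b, V, R, r=None):
    """Wald test of H0: R b = r for one or two linear restrictions.
    b: coefficients, V: their covariance matrix, R: list of restriction rows.
    Returns (W, F = W / q, chi-square p-value)."""
    q, k = len(R), len(b)
    r = r if r is not None else [0.0] * q
    d = [sum(R[i][j] * b[j] for j in range(k)) - r[i] for i in range(q)]
    # M = R V R', the covariance matrix of R b
    M = [[sum(R[i][j] * V[j][m] * R[l][m] for j in range(k) for m in range(k))
          for l in range(q)] for i in range(q)]
    if q == 1:
        W = d[0] ** 2 / M[0][0]
        p = math.erfc(math.sqrt(W / 2))  # upper tail of chi-square with 1 df
    elif q == 2:
        det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
        W = (M[1][1] * d[0] ** 2 - (M[0][1] + M[1][0]) * d[0] * d[1] + M[0][0] * d[1] ** 2) / det
        p = math.exp(-W / 2)  # upper tail of chi-square with 2 df
    else:
        raise ValueError("this sketch handles only one or two restrictions")
    return W, W / q, p


# Hypothetical numbers (not Prestige estimates), coefficient order [intercept, b_prof, b_wc]:
# wald_test([1000.0, 3.0, 1.0], [[4.0, 0, 0], [0, 1.0, 0.5], [0, 0.5, 1.0]], [[0, 1, -1]])
# gives W = 4.0 for H0: b_prof = b_wc.
```

With blue-collar as the reference category, R = [[0, 1, 0], [0, 0, 1]] tests jointly whether both dummy coefficients are zero. R = [[0, 1, -1]] compares professional with white-collar occupations.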
Document your thought process: how you explored the data, how you checked the assumptions, and how the model evolved. The submitted report should be prepared according to the instructions that you can find here.
You can load the data and generate a log of income in the following ways.
Stata:
* Load the Prestige data, replacing any data in memory
use https://stats.idre.ucla.edu/stat/stata/examples/ara/prestige, clear
* Natural logarithm of income
gen lnincome = log(income)
R:
install.packages("car")                     # only needed once
library(car)                                # also attaches carData, which contains Prestige
Prestige$lnincome <- log(Prestige$income)   # natural logarithm of income
The dataset is from
Fox, J. (1997). Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks, Calif: SAGE Publications, Inc.
The dataset can be downloaded directly from
http://www.ats.ucla.edu/stat/spss/examples/ara/default.htm
http://socserv.socsci.mcmaster.ca/jfox/Books/Applied-Regression/
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173-1182.