TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Data analysis assignment 3 (optional)
Instructions
Delmar and Wiklund (2008) study the relationship between small business managers' growth motivation and firm growth. One of their hypotheses is:
Hypothesis 2: Growth motivation at T1 has a positive effect on growth at T2.
Your task is to do a replication study using the provided data. The data are from the Orbis database, available to all Aalto students (https://primo.aalto.fi/permalink/358AALTO_INST/ngpgq9/alma997722844406526), and from a longitudinal survey of software companies in Finland. The data from the Orbis database are self-explanatory. The survey data are from question number 4 of the survey form, which is provided as a part of the data package: "How well do the following statements describe the growth of your firm?"
(The data are anonymized by shuffling the company identifiers in both datasets in the same way.)
Your task is to combine the data, assess the dimensionality of the survey scale using exploratory factor analysis, construct one or more summed scales from the survey items and assess their reliabilities, and then finally do a regression analysis of realized growth on growth motivation and control variables. This exercise will familiarize you with simple data management tasks as well as arguing reliability and construct validity.
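The scale construction and reliability steps described above can be sketched in base R. This is only an illustration under stated assumptions: the items `q4_1` to `q4_3` are simulated and hypothetical (they are not the course data), the eigenvalue check is a quick look at dimensionality rather than a full exploratory factor analysis, and `cronbach_alpha` is a helper written here for illustration, not a function from the psych package.

```r
# Simulated, hypothetical survey items: three items that share one
# common factor plus independent noise (NOT the course data).
set.seed(1)
n <- 200
common <- rnorm(n)
items <- data.frame(
  q4_1 = common + rnorm(n),
  q4_2 = common + rnorm(n),
  q4_3 = common + rnorm(n)
)

# Quick dimensionality check: eigenvalues of the item correlation matrix.
# One dominant eigenvalue is consistent with a single underlying dimension.
eigenvalues <- eigen(cor(items))$values
print(eigenvalues)

# Summed scale: here the mean of the items for each respondent.
growth_motivation <- rowMeans(items)

# Cronbach's alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum).
cronbach_alpha <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)
  item_var <- apply(x, 2, var)
  total_var <- var(rowSums(x))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

alpha_hat <- cronbach_alpha(items)
print(alpha_hat)
```

With real data you would run the same steps on the actual question 4 items, and you would normally prefer the packaged functions listed in the command table, because they also report item-level statistics.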
If you want, you can extend the analysis in a few different ways:
- In addition to exploratory factor analysis, do a confirmatory factor analysis. If you choose to do this analysis, pay attention to the chi2 statistic and diagnose the model by inspecting the residuals.
- The independence of observations assumption is violated in the data because they are repeated observations over time. You can optionally take this into consideration by using cluster-robust standard errors. Moreover, regression gives the population average effect, which is rarely of interest because it does not have a clear causal interpretation. As an alternative, you can apply a model that produces the within effect.
Both these extensions are demonstrated in the model answer.
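To make the difference between the pooled (population average) estimate and the within effect concrete, here is a minimal base R sketch. It is not the model answer. The panel is simulated, and the variable names (`firm`, `motivation`, `growth`) and effect sizes are assumptions chosen so that an unobserved firm-level effect is correlated with motivation. The within effect is obtained by subtracting firm means before fitting, which gives the same slope as a regression with one dummy variable per firm.

```r
# Simulated, hypothetical panel: 100 firms observed for 4 years each.
set.seed(2)
n_firms <- 100
n_years <- 4
firm <- rep(seq_len(n_firms), each = n_years)
firm_effect <- rnorm(n_firms)[firm]                    # unobserved, constant within firm
motivation <- firm_effect + rnorm(n_firms * n_years)   # correlated with the firm effect
growth <- 0.5 * motivation + 2 * firm_effect + rnorm(n_firms * n_years)
panel <- data.frame(firm, motivation, growth)

# Pooled OLS: mixes between-firm and within-firm variation.
pooled_fit <- lm(growth ~ motivation, data = panel)

# Within transformation: subtract each firm's mean from its observations.
panel$motivation_w <- panel$motivation - ave(panel$motivation, panel$firm)
panel$growth_w <- panel$growth - ave(panel$growth, panel$firm)
within_fit <- lm(growth_w ~ motivation_w - 1, data = panel)

# Equivalent slope from a regression with firm dummies.
lsdv_fit <- lm(growth ~ motivation + factor(firm), data = panel)

print(coef(pooled_fit)["motivation"])   # pulled away from 0.5 by the firm effect
print(coef(within_fit)["motivation_w"]) # close to the simulated within effect of 0.5
print(coef(lsdv_fit)["motivation"])     # same slope as within_fit
```

The standard errors reported by `lm` for the demeaned model do not account for the estimated firm means or for clustering, so treat this sketch as a way to see the point estimates only. Cluster-robust standard errors require additional tooling that is not shown here.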
Document your analysis: what was the purpose of each analysis step and how did you interpret the results. The submitted report should be prepared according to instructions that you can find here.
Suggested outline of the analysis process and commands
The table below lists the subtasks and the Stata and R commands that you can use to complete the assignment. This is just one possible way to do the assignment, and you are free to do it in any other way.
Subtask | Stata commands and links | R commands and links
---|---|---
Prepare the Orbis data | |
Load the data | insheet | read.csv
Explore the data | | stem, pairs, summary, head, cor
Create new identifier variable | The data need to be set up as a panel a bit later, and this requires numerical ID variables, but the raw data have text identifiers (e.g. FI12345678). Use the seq function of the egen command. | Not applicable to R
Reshape from wide to long | reshape | melt, cast (reshape library), str_sub (stringr library), as.numeric
Set up the data as panel | xtset | Not applicable to R
Ensure that all variables that contain numeric data are stored as numeric and not as text | describe, destring | as.numeric, gsub, as.character
Generate new company level variables and transform existing variables if needed | generate, replace. You need to define at least one new variable for growth. Use the relative change of revenue over one or more years. | R does not have a convenient built-in function for lagged variables. You need to either sort the data and shift the observation vectors yourself, or use the slide function in the DataCombine package.
Drop unnecessary variables | drop, keep | subset, Extract ([])
Save the data on disk | save | Not needed in R because you can have multiple datasets in memory
Prepare the survey data | |
Load the data | insheet | read.csv
Explore the data | | stem, pairs, summary, head, cor
Do a factor analysis of the survey data | factor, rotate | fa (from the psych package; you also need the GPArotation package)
Calculate one or more summed scales and assess their reliabilities | alpha | alpha (from the psych package)
Merge the datasets | |
Prepare the datasets for merge | You need to merge the two datasets by company identifier and year. The variables on which you merge the two datasets need to have identical names in both datasets. Also, you need to make sure that there are no duplicate observations in the data on the identifying variables. Commands: rename, duplicates list, duplicates drop | names, duplicated
Merge the datasets | merge | merge
Analyze the full data | |
Descriptive statistics and correlations | correlate, summarize | summary, cor
Run regression models and compare the results | regress, estimates store, estimates table, estimates clear | lm, screenreg (from the texreg package)
Post-estimation diagnostics | Stata documentation for regression postestimation and regression postestimation plots | plot.lm, plot, residuals, avPlots (from the car package)
Data exclusions (e.g. outliers) and transformations, if needed | replace, drop | subset, Extract ([])
Other issues | The commands for reshaping and merging require that there are no duplicate observations. Commands: duplicates list, duplicates drop |
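
The data management steps in the table (reshape from wide to long, compute growth from lagged revenue, check for duplicates, merge) can be sketched in base R as follows. All data and column names here (`company`, `revenue.2018`, `growth_motivation`, and so on) are made up for illustration and will differ from the actual data package. The sketch uses the base `reshape` and `ave` functions instead of the reshape and DataCombine packages named in the table, and it assumes that each company has one row per consecutive year with no gaps.

```r
# Hypothetical Orbis-style data in wide format: one revenue column per year.
orbis_wide <- data.frame(
  company = c("FI00000001", "FI00000002"),
  revenue.2018 = c(100, 200),
  revenue.2019 = c(110, 180),
  revenue.2020 = c(132, 198),
  stringsAsFactors = FALSE
)

# Reshape from wide to long: one row per company and year.
orbis_long <- reshape(
  orbis_wide,
  direction = "long",
  varying = list(c("revenue.2018", "revenue.2019", "revenue.2020")),
  v.names = "revenue",
  timevar = "year",
  times = 2018:2020,
  idvar = "company"
)

# Sort by company and year so that shifting gives the previous year's value.
orbis_long <- orbis_long[order(orbis_long$company, orbis_long$year), ]

# Lagged revenue within each company (NA for the first year of each company).
orbis_long$revenue_lag <- ave(
  orbis_long$revenue, orbis_long$company,
  FUN = function(x) c(NA, head(x, -1))
)

# Growth as the relative change of revenue from the previous year.
orbis_long$growth <- (orbis_long$revenue - orbis_long$revenue_lag) / orbis_long$revenue_lag

# Hypothetical survey data with the same identifying variables.
survey <- data.frame(
  company = c("FI00000001", "FI00000002"),
  year = c(2019, 2019),
  growth_motivation = c(4.0, 2.5),
  stringsAsFactors = FALSE
)

# Check for duplicates on the identifying variables before merging.
stopifnot(!any(duplicated(orbis_long[c("company", "year")])))
stopifnot(!any(duplicated(survey[c("company", "year")])))

# Merge by company identifier and year (keeps only matching rows by default).
merged <- merge(orbis_long, survey, by = c("company", "year"))
print(merged)
```

Note that `merge` drops non-matching rows unless you set `all`, `all.x`, or `all.y`, so compare the row counts before and after the merge to see how many observations were lost.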
Delmar, F., & Wiklund, J. (2008). The Effect of Small Business Managers' Growth Motivation on Firm Growth: A Longitudinal Study. Entrepreneurship Theory and Practice, 32(3), 437-457. doi:10.1111/j.1540-6520.2008.00235.x