Data analysis assignment 3 (optional)

Instructions

Delmar and Wiklund (2008) study the relationship between small business managers' growth motivation and firm growth. One of their hypotheses is:

Hypothesis 2: Growth motivation at T1 has a positive effect on growth at T2.

Your task is to do a replication study using the provided data. The data come from two sources: the Orbis database, which is available to all Aalto students (https://primo.aalto.fi/permalink/358AALTO_INST/ngpgq9/alma997722844406526), and a longitudinal survey of software companies in Finland. The data from the Orbis database are self-explanatory. The survey data are from question number 4 of the survey form, which is provided as a part of the data package: "How well do the following statements describe the growth of your firm?"

(The data are anonymized by shuffling the company identifiers in both datasets in the same way.)

Download the data files here

Your task is to combine the data, assess the dimensionality of the survey scale using exploratory factor analysis, construct one or more summed scales from the survey items and assess their reliabilities, and finally run a regression analysis of realized growth on growth motivation and control variables. This exercise will familiarize you with simple data management tasks and with arguing for reliability and construct validity.
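The scale construction and reliability step can be sketched in base R as follows. This is only an illustration on simulated data: the item names (q4_1 to q4_3) and the helper cronbach_alpha are hypothetical, and the real item names come from the survey data package. The sketch computes Cronbach's alpha directly from its definition instead of calling alpha from the psych package.

```r
# Simulated responses only: three hypothetical items (q4_1..q4_3) that share
# one common factor. Replace these with the actual survey items.
set.seed(1)
n <- 200
common <- rnorm(n)
items <- data.frame(
  q4_1 = common + rnorm(n),
  q4_2 = common + rnorm(n),
  q4_3 = common + rnorm(n)
)

# Scale score as the mean of the items (the sum divided by the item count).
scale_score <- rowMeans(items)

# Cronbach's alpha from its definition:
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score)
cronbach_alpha <- function(x) {
  x <- as.matrix(x[complete.cases(x), ])  # listwise deletion of missing rows
  k <- ncol(x)
  item_var <- apply(x, 2, var)
  total_var <- var(rowSums(x))
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

alpha_hat <- cronbach_alpha(items)
print(alpha_hat)
```

Computing the statistic by hand makes it clear what the reported number depends on: the number of items and how strongly they covary. For the report itself, the packaged function gives additional diagnostics such as alpha with each item dropped.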

If you want, you can extend the analysis in a few different ways:

In addition to exploratory factor analysis, do a confirmatory factor analysis. If you choose to do this analysis, pay attention to the chi² statistic and diagnose the model by inspecting the residuals.
The independence of observations assumption is violated in the data because they contain repeated observations of the same companies over time. You can optionally take this into consideration by using cluster-robust standard errors. Moreover, regression gives the population average effect, which is rarely of interest because it does not have a clear causal interpretation. As an alternative, you can apply a model that produces the within effect.

Both these extensions are demonstrated in the model answer.
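As a rough illustration of the within effect, the sketch below uses simulated panel data in base R. It is not the model answer: the variable names and the data-generating process are assumptions made for the example. It compares a pooled regression with a regression on firm-demeaned variables and with the equivalent regression that includes a dummy variable for each firm.

```r
# Simulated panel: 50 hypothetical firms observed for 4 years each.
set.seed(42)
n_firms <- 50
n_years <- 4
firm <- rep(seq_len(n_firms), each = n_years)
firm_effect <- rnorm(n_firms)[firm]  # stable unobserved firm trait
motivation <- firm_effect + rnorm(n_firms * n_years)
# True within-firm slope is 0.5; the firm trait also drives growth directly.
growth <- 0.5 * motivation + 2 * firm_effect + rnorm(n_firms * n_years)
panel <- data.frame(firm, motivation, growth)

# Pooled regression mixes between-firm and within-firm variation.
fit_pooled <- lm(growth ~ motivation, data = panel)

# Within transformation: subtract each firm's own mean from every variable.
panel$motivation_w <- panel$motivation - ave(panel$motivation, panel$firm)
panel$growth_w <- panel$growth - ave(panel$growth, panel$firm)
fit_within <- lm(growth_w ~ motivation_w - 1, data = panel)

# The same slope results from including one dummy variable per firm.
fit_lsdv <- lm(growth ~ motivation + factor(firm), data = panel)

b_pooled <- unname(coef(fit_pooled)["motivation"])
b_within <- unname(coef(fit_within)["motivation_w"])
b_lsdv <- unname(coef(fit_lsdv)["motivation"])
print(c(pooled = b_pooled, within = b_within, lsdv = b_lsdv))
```

The standard errors that lm reports for the demeaned model do not account for the estimated firm means, so they should not be used as they are. Packages such as plm (within models) and sandwich (cluster-robust covariance matrices) handle this; check their documentation for the exact usage.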

Document your analysis: state the purpose of each analysis step and how you interpreted the results. The submitted report should be prepared according to instructions that you can find here.

Suggested outline of the analysis process and commands

The table below lists the subtasks and the Stata and R commands that you can use to complete the assignment. This is just one possible way to do the assignment, and you are free to do it in any other way.

Subtask	Stata commands and links	R commands and links
Prepare the Orbis data
Load the data	insheet	read.csv
Explore the data	UCLA website on data exploration	stem, pairs, summary, head, cor
Create new identifier variable	The data need to be set up as a panel later, which requires a numeric ID variable, but the raw data have text identifiers (e.g. FI12345678). Use the seq function of the egen command.	Not applicable to R
Reshape from wide to long	reshape	melt, cast (reshape library), str_sub (stringr library), as.numeric
Set up the data as panel	xtset	Not applicable to R
Ensure that all variables that contain numeric data are stored as numeric and not as text	describe, destring	as.numeric, gsub, as.character
Generate new company-level variables and transform existing variables if needed	generate, replace. You need to define at least one new variable for growth. Use the relative change of revenue over one or more years. See the Stata documentation on lags and leads.	R does not have a convenient built-in function for lagged variables. You need to either sort the data and shift the observation vectors yourself, or use the slide function in the DataCombine package.
Drop unnecessary variables	drop, keep	subset, Extract ([])
Save the data on disk	save	Not needed in R because you can have multiple datasets in memory
Prepare the survey data
Load the data	insheet	read.csv
Explore the data	UCLA website on data exploration	stem, pairs, summary, head, cor
Do a factor analysis of the survey data	factor, rotate	fa (from the psych package; you also need the GPArotation package)
Calculate one or more summed scales and assess their reliabilities	alpha	alpha (from the psych package)
Merge the datasets
Prepare the datasets for merge	You need to merge the two datasets by company identifier and year. The variables on which you merge must have identical names in both datasets. Also make sure that no two observations share the same values of the identifying variables. rename, duplicates list, duplicates drop	names, duplicated
Merge the datasets	merge	merge
Analyze the full data
Descriptive statistics and correlations	correlate, summarize	summary, cor
Run regression models and compare the results	regress, estimates store, estimates table, estimates clear	lm, screenreg (from the texreg package)
Post-estimation diagnostics	Stata documentation for regression postestimation and regression postestimation plots	plot.lm, plot, residuals, avPlots (from the car package)
Data exclusions (e.g. outliers) and transformations, if needed.	replace, drop	subset, Extract ([])
Other issues	The commands for reshaping and merging require that there are no duplicate observations. duplicates list, duplicates drop
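The data management steps in the table (reshape from wide to long, convert text to numeric, compute growth from a lagged value, check for duplicates, and merge) can be sketched in base R as follows. The toy data, the column names (company, revenue2010, motivation and so on) and the "n.a." missing-value code are assumptions made for the illustration; the actual files may be organized differently. The sketch uses the base reshape function in place of melt and cast from the reshape library.

```r
# Toy wide-format data standing in for the Orbis file (hypothetical columns).
orbis_wide <- data.frame(
  company = c("FI0000001", "FI0000002"),
  revenue2010 = c("100", "200"),
  revenue2011 = c("110", "n.a."),
  revenue2012 = c("121", "260"),
  stringsAsFactors = FALSE
)

# Wide to long: one row per company and year.
orbis_long <- reshape(
  orbis_wide,
  direction = "long",
  varying = c("revenue2010", "revenue2011", "revenue2012"),
  v.names = "revenue",
  timevar = "year",
  times = 2010:2012,
  idvar = "company"
)

# Revenue was read as text; mark the assumed missing code and convert.
orbis_long$revenue[orbis_long$revenue == "n.a."] <- NA
orbis_long$revenue <- as.numeric(orbis_long$revenue)

# Sort, then shift revenue by one row within each company to get the lag.
# This assumes every company has one row for each consecutive year.
orbis_long <- orbis_long[order(orbis_long$company, orbis_long$year), ]
lag1 <- function(x) c(NA, head(x, -1))
orbis_long$revenue_lag <- ave(orbis_long$revenue, orbis_long$company, FUN = lag1)
orbis_long$growth <- orbis_long$revenue / orbis_long$revenue_lag - 1

# Toy survey data with a hypothetical summed scale called motivation.
survey <- data.frame(
  company = c("FI0000001", "FI0000001", "FI0000002"),
  year = c(2011L, 2012L, 2012L),
  motivation = c(4.0, 3.5, 2.0),
  stringsAsFactors = FALSE
)

# The merge keys must be unique in both datasets.
stopifnot(!any(duplicated(orbis_long[, c("company", "year")])))
stopifnot(!any(duplicated(survey[, c("company", "year")])))

# Keep only company-years that appear in both datasets.
merged <- merge(orbis_long, survey, by = c("company", "year"))
print(merged)
```

This sketch matches motivation and growth from the same year. Because the hypothesis concerns motivation at T1 and growth at T2, the timing of the growth variable relative to the survey year needs to be chosen deliberately before the merge.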

Delmar, F., & Wiklund, J. (2008). The Effect of Small Business Managers' Growth Motivation on Firm Growth: A Longitudinal Study. Entrepreneurship Theory and Practice, 32(3), 437-457. doi:10.1111/j.1540-6520.2008.00235.x