TUL0022 Statistical Research Methods D, Lecture, 25.10.2022–29.3.2023
This course space end date is set to 29.03.2023.
Topic outline

This is a blended learning course that contains both online and in-person elements. While online participation may be possible, we follow a schedule, which means that completing the course through independent self-study for credits is not possible.
The goal of the course is to develop an understanding of how statistical methods are used in management and other social research and how results are usually presented in journal articles. The course is designed for both those interested in just reading and understanding research done with statistical methods and for those who already use or plan to use statistical research methods in their own work.
During the course we will go through empirical papers published in the Academy of Management Journal, Strategic Management Journal, and other high-quality journals and analyze how these papers were done. The methods and research designs used in these papers cover a majority of the basic methods and designs used in these journals.
The analysis techniques covered during the course include regression analysis, its application and extensions to binary, count, and categorical variables, and factor analysis, including both exploratory and confirmatory factor analysis. Confirmatory factor analysis is explained at a surface level that is sufficient for its basic application and for evaluating published results. Extensions of these techniques, such as structural regression models (structural equation models) and multilevel models or other similar techniques for non-independent observations (e.g. longitudinal or multilevel data), are briefly introduced, but a more thorough study of these techniques is left for advanced courses.
The course consists of eight units, each of which takes two to four weeks and contains video lectures, online and in-person discussions, and assignments. The number of credits varies between 5 and 8 depending on which assignments students choose to complete. The content of each course component is explained later in the course brochure. The data analysis assignments can be completed with Stata, R, or SPSS, but SPSS is not recommended.
This course is targeted at industrial engineering and management doctoral students. For DIEM students, the course TUL0000 Research Methods in Industrial Engineering and Management is a prerequisite. There are no strict prerequisites for students outside DIEM, but a background of a basic course in research methods at the Ph.D. level is expected.
The course will also run as a parallel instance at the University of Jyväskylä under the course code JSBJ1310, but using this MyCourses instance.

A discussion forum for general issues that are not specific to any of the units. Please post unit-specific questions to the forums of each unit.
If you think that you are receiving too many email notifications from the forums, you can reduce the number of emails by unsubscribing or switching to digest emails.


To complete this unit you need to
Participate in the introductory lecture (optional) or watch the introduction videos
Create a username at Zotero.org and register your Zotero account to access course materials
 Read the instructions for written assignments and plagiarism policy.
 Complete the unit 1 discussion forum task according to the instructions that will be posted to the forum.
 Choose a group to indicate what you want to learn. The course content may be customized based on the choice.
Choose your preferred participation option: in-person or online teaching
Sign up for the pre-exam
Study for the pre-exam and complete the pre-exam.
The purpose of the introduction unit is to familiarize you with how a blended learning course works. The unit starts with an introductory lecture, followed by a short introductory forum task. The introductory unit concludes with the course pre-exam.

To complete this unit you need to
 Watch the video lectures
 Read the materials for the written assignment 1 (mandatory)
 Complete the unit 2 discussion forum task
 Return written assignment 1 (mandatory)
 View the model answer for written assignment 1 and instructor's comments to written assignment 1 (mandatory)
 Start working on data analysis assignment 1 (mandatory)
 Participate in the seminar (mandatory)
 Participate in the computer class (optional)
 Complete your caption assignments for unit 2 and all earlier units
 Submit reflection and feedback for unit 2
The unit introduces the principles of causal inference and basics of linear regression models. After this unit, you should
Understand the three conditions for causality and why you cannot interpret regression results as evidence of causality unless the regression is used as part of a research design that takes the temporal order of cause and effect into account and, more importantly, eliminates rival explanations.
Understand that regression (and other analysis) results need to be interpreted and explained to a reader in a way that makes them understandable to people who do not know statistical analysis very well. You should also know that the p-value is not the main result but simply indicates which regression coefficients you should probably pay more attention to when interpreting the results.
 Understand that statistical analysis is not rocket science, but a collection of fairly intuitive ideas and simple mathematics. (If you go beyond regression, then the math can get a bit more complicated.)
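As a small illustration of the last point, the sketch below (in Python with made-up data; the course itself uses Stata, R, or SPSS) shows that the core of simple regression really is simple mathematics: the least squares slope is just the ratio of a covariance to a variance.

```python
# A minimal sketch of ordinary least squares with one predictor.
# The toy data are invented for illustration.

def ols_simple(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Covariance of x and y and variance of x; the 1/(n-1) factors
    # cancel in the ratio, so plain sums of products suffice.
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    var_x = sum((xi - x_mean) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Data generated exactly on the line y = 1 + 2*x, so OLS recovers it
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
a, b = ols_simple(x, y)
print(a, b)  # 1.0 2.0
```

The same two lines of arithmetic are what any statistical package computes for a bivariate regression; multiple regression generalizes this with matrix algebra.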

To complete this unit you need to
 Watch the video lectures
 Read the materials for the written assignment 2 (optional)
 Complete the unit 3 discussion forum task
 Return written assignment 2 (optional)
View the model answer for written assignment 2 and instructor's comments to written assignment 2 (optional)
 Participate in the seminar (mandatory)
 Participate in the computer class (optional)
 Submit data analysis assignment 1 (mandatory)
 Complete your caption assignments for unit 3 and all earlier units
 Submit reflection and feedback for unit 3
The unit discusses assumptions and principles behind regression analysis. After this unit, you should
 Understand that all statistical techniques make assumptions, of which some are empirically testable and others are not, and that some assumptions are more important than others.
 Understand how log transformation can be applied to model nonlinear, relative effects
 Have a basic understanding of the concept of endogeneity and why it is a serious challenge for nonexperimental research.
 Have a basic understanding of how and why regression results can be visualized using marginal prediction plots.
Understand the relationship between the linear model and the correlation matrix, and why knowing this relationship is very useful when learning about linear models such as regression.
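The log-transformation point above can be made concrete with a short sketch (Python, hypothetical coefficients): in a model log(y) = b0 + b1*x, a one-unit increase in x multiplies the predicted y by exp(b1) regardless of the starting level, which is exactly what a relative (percentage) effect means.

```python
import math

# Made-up coefficients for illustration only
b0, b1 = 2.0, 0.10

def predicted_y(x):
    # Back-transform the linear predictor of log(y) to the y scale
    return math.exp(b0 + b1 * x)

# The multiplicative effect of a one-unit increase in x is the same
# at low and high values of x, and equals exp(b1).
ratio_low = predicted_y(1) / predicted_y(0)
ratio_high = predicted_y(11) / predicted_y(10)
print(ratio_low, ratio_high, math.exp(b1))
```

With b1 = 0.10, each unit of x raises the predicted y by about 10.5%, which is why log-transformed outcomes are read as relative effects.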

To complete this unit you need to
 Watch the video lectures
 Read the materials for the written assignment 3 (optional)
 Complete the unit 4 discussion forum task
 Return written assignment 3 (optional)
View the model answer for written assignment 3 and instructor's comments to written assignment 3 (optional)
 Participate in the seminar (mandatory)
 Participate in the computer class (optional)
 Submit data analysis assignment 2 (optional)
 Complete your caption assignments for unit 4 and all earlier units
 Submit reflection and feedback for unit 4
The unit continues from the previous unit with additional issues in linear regression models, focusing on common applications of the technique: mediation and moderation. We will also introduce instrumental variables. After this unit you should:
Understand the concepts of mediation and moderation models and why these kinds of models are useful for research.
Know how interaction terms can be used in regression to estimate moderation models.
Understand why plotting and marginal effects are essential for interpreting moderation model results, and know why centering variables is unnecessary and even counterproductive for proper interpretation of moderation models.
Know the two most common strategies for estimating mediation models: the Baron and Kenny causal steps approach and the simultaneous equations approach.
Have a basic understanding of what instrumental variables are, how they can be applied to test and control for endogeneity, and how instrumental variables can be used in two-stage least squares estimation. You should be aware of the instrument relevance criterion and the instrument exclusion criterion.
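The role of an interaction term can be sketched in a few lines (Python, made-up coefficients): in y = b0 + b1*x + b2*m + b3*x*m, the marginal effect of x is b1 + b3*m, so the slope of x changes with the moderator m. This is also why a single coefficient table is hard to read and plotting is recommended.

```python
# Hypothetical coefficients of a moderation model, for illustration only
b0, b1, b2, b3 = 1.0, 0.5, 0.2, 0.3

def yhat(x, m):
    # Linear model with an interaction (product) term
    return b0 + b1 * x + b2 * m + b3 * x * m

def marginal_effect_of_x(m):
    # Slope in x at a given moderator value, computed from two
    # predictions one unit apart (the model is linear in x)
    return yhat(1, m) - yhat(0, m)

print(marginal_effect_of_x(0))  # b1 = 0.5 when m = 0
print(marginal_effect_of_x(2))  # b1 + b3*2 = 1.1 when m = 2
```

Because the effect of x is a different number at each value of m, interpreting the "main effect" b1 alone only describes the special case m = 0.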

To complete this unit you need to
 Watch the video lectures
 Read the materials for the written assignment 4 (optional)
 Complete the unit 5 discussion forum task
 Return written assignment 4 (optional)
View the model answer for written assignment 4 and instructor's comments to written assignment 4 (optional)
Participate in the seminar (mandatory)
Complete your caption assignments for unit 5 and all earlier units
Submit reflection and feedback for unit 5
The unit addresses two important violations of regression assumptions: nonlinearity and non-independence of observations. So far we have used log transformations and quadratic terms for modeling nonlinear effects. In this unit we introduce the generalized linear model (GLM), which is in many cases more appropriate than manual transformations of variables. This modeling technique covers the most commonly used single-dependent-variable models as special cases (e.g. logistic regression, Poisson regression, tobit regression). Maximum likelihood estimation is introduced for those students who want to do quantitative research themselves.
The second topic is non-independence of observations, which can arise e.g. in longitudinal datasets or datasets where people are nested in teams. These kinds of data are often analyzed with panel data regressions (GLS random effects, GLS fixed effects) or multilevel modeling (i.e. HLM). However, in many cases these advanced techniques are unnecessary, and normal regression with cluster-robust standard errors would do fine. Our primary objective with non-independent data is to understand the different kinds of effects that can be estimated from such data and how the within effect can be estimated with normal regression.
After this unit you should:
Understand that the reason for using GLM models is that you want to model nonlinear relationships. While GLM models are often used when the dependent variable is non-continuous (binary, count, categorical), the distribution of the dependent variable is not a reason to use GLM per se; rather, nonlinear models often make sense in these cases.
Understand the interpretation of the three most common GLM curves: the line (normal regression), the exponential curve (log link), and the S-curves (logistic and probit), and how the effects of other variables are interpreted when these curves are used.
 Understand why plotting is essential when interpreting nonlinear models.
 (Only students that want to use statistical analysis techniques in their own research:) Understand the principle of maximum likelihood estimation.
Understand the challenges that non-independent data pose for regression analysis
 Be able to explain the within effect and contextual effect and understand when these would be of interest.
 Be able to estimate the within effect using normal regression and cluster robust standard errors.
Some of the video materials in this unit belong to a larger set of materials developed for an advanced course and can go into some technical detail (e.g. the exponential models for counts video). Instead of understanding all the details, it is important to understand the big picture: when the use of these techniques would be appropriate and how the results are interpreted.
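The within effect mentioned above can be sketched with toy data (Python, invented numbers): subtracting each cluster's mean from both x and y removes all between-cluster level differences, after which an ordinary regression slope on the demeaned values is the within effect.

```python
# Toy clustered data: every cluster follows y = cluster_intercept + 2*x,
# with very different intercepts, so the within effect is 2 by construction.
clusters = {
    "a": {"x": [1, 2, 3], "intercept": 10},
    "b": {"x": [4, 5, 6], "intercept": 20},
}

xd, yd = [], []
for c in clusters.values():
    ys = [c["intercept"] + 2 * xi for xi in c["x"]]
    x_mean = sum(c["x"]) / len(c["x"])
    y_mean = sum(ys) / len(ys)
    # Demean within the cluster: level differences between clusters vanish
    xd += [xi - x_mean for xi in c["x"]]
    yd += [yi - y_mean for yi in ys]

# OLS slope through the origin on the demeaned data = within effect
within_slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
print(within_slope)  # 2.0
```

In practice the same demeaning is what fixed effects estimators do internally; the remaining task is to use cluster-robust standard errors so that inference accounts for the non-independence.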

To complete this unit you need to
 Watch the video lectures
 Read the materials for the written assignment 5 (optional)
 Complete the unit 6 discussion forum task
 Return written assignment 5 (optional)
View the model answer for written assignment 5 and instructor's comments to written assignment 5 (optional)
Participate in the seminar (mandatory)
Participate in the computer class (optional)
Complete your caption assignments for unit 6 and all earlier units
Submit reflection and feedback for unit 6
The unit discusses the concept of measurement, which arises from the efforts to quantify abstract concepts such as innovativeness. After this unit, you should:
 Know what conceptualization and operationalization are and why they are critically important for measurement.
 Understand the concepts of reliability and validity of measurement.
Understand the two approaches to measurement reliability assessment, the distinct-tests and test-retest approaches, and what assumptions each of these procedures makes.
 Understand that coefficient alpha quantifies the reliability of the scale score, that alpha is not always an ideal reliability statistic, and why increasing alpha does not always mean that reliability is increased.
 Understand the concepts of predictive validity, content validity, and construct validity and how these concepts are related to one another and measurement validity.
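The coefficient alpha mentioned above is simple enough to compute by hand; the sketch below (Python, invented item scores) applies the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the scale total).

```python
from statistics import pvariance

# Invented responses: rows are items, columns are five respondents
items = [
    [4, 3, 5, 2, 4],  # item 1
    [3, 3, 4, 2, 5],  # item 2
    [4, 2, 5, 3, 4],  # item 3
]
k = len(items)

# Scale score: each respondent's total over the k items
totals = [sum(scores) for scores in zip(*items)]

# Coefficient alpha from item variances and total-score variance
item_var_sum = sum(pvariance(item) for item in items)
alpha = k / (k - 1) * (1 - item_var_sum / pvariance(totals))
print(round(alpha, 2))  # 0.87 for these toy numbers
```

Note how alpha depends only on variances, which is one reason it is not always an ideal reliability statistic: adding near-duplicate items inflates it without genuinely improving measurement.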

To complete this unit you need to
 Watch the video lectures
 Read the materials for the written assignment 6 (mandatory)
 Complete the unit 7 discussion forum task
 Return written assignment 6 (mandatory)
View the model answer for written assignment 6 and instructor's comments to written assignment 6 (mandatory)
Participate in the seminar (mandatory)
Participate in the computer class (optional)
Submit data analysis assignment 3 (optional)
Complete your caption assignments for unit 7 and all earlier units
Submit reflection and feedback for unit 7
The unit continues from the previous unit by introducing confirmatory factor analysis (CFA), structural regression models, and some of their postestimation tools. In the seminar, we also discuss conceptualization and scale development processes. After this unit, you should:
 Understand the difference between exploratory and confirmatory factor analysis.
 Have a basic understanding of the workflow for a confirmatory factor analysis.
Know what the chi^{2} statistic in a CFA tests, know why it is the most important statistic that the analysis provides, and have a basic understanding of what to do when the chi^{2} test rejects the model (which it often does).
Know the basic idea of structural regression models. (This analysis technique is a bit more advanced and is not addressed on the course in detail, but it is useful to know the basic idea because the technique is widely used.)
 Know why the idea of "formative measurement" is problematic.
 Know when and why indices constructed from nonscale variables can be useful and how such indices can be justified.
 Understand the importance of conceptualization as a first step in measurement development and have a basic understanding of a commonly used scale development procedure. (This is not covered in the videos, but will be discussed in the seminar.)

To complete this unit you need to
 Watch the video lectures
 Read the materials for the written assignment 7 (mandatory)
 Complete the unit 8 discussion forum task
 Return written assignment 7 (mandatory)
View the model answer for written assignment 7 and instructor's comments to written assignment 7 (optional)
Participate in the seminar (mandatory)
Complete your caption assignments for unit 8 and all earlier units
Submit reflection and feedback for unit 8
The unit addresses research design and current issues and debates in quantitative management research. This unit concludes with a full-day seminar.

The learning diary should be updated during the course. The final version must be returned one week after the last lecture.

This page contains additional materials that can be useful when preparing for the course.
A series of MOOCs that covers the basics of statistics
While the course covers some basics of statistical inference, we focus mostly on how these tools are used in management research. It may therefore be helpful to review some of the basics of statistics before the class. The best way to do this is to watch some of the lectures in the "Data analysis and statistical inference" online course on Coursera:
https://www.coursera.org/course/statistics
The following courses are relevant:
 Introduction to Probability and Data
 Inferential Statistics
 Linear Regression and Modeling
Getting familiar with a statistical software
Because we have limited time to work with computers in the class, it is highly recommended that you familiarize yourself with the statistical software that you plan to use before the start of the first class.
Stata
Check out the getting started manual and work through the sample session (Chapter 1)
 Getting Started with Stata for Windows
 Getting Started with Stata for Mac
 Getting Started with Stata for Unix
Stata also has a YouTube channel with tutorial videos. The following video is a good starting point:
Tour of the Stata 14 interface
For more, see http://www.stata.com/links/videotutorials/
R and RStudio
If you plan to use R for the data analysis exercises, it is recommended that you do an online tutorial before starting the class. R has a learning curve, and without learning the basics before the course, we would end up spending too much time learning R itself instead of learning how to use R for data analysis.
DataCamp provides good, free online tutorials.
In particular, this course is likely to be useful:
https://www.datacamp.com/courses/introduction-to-data
The MOOCs listed above also use R and provide tutorials on its use.
You should also probably take a look at these books:
Kabacoff, R. (2011). R in Action: Data Analysis and Graphics with R. Shelter Island, NY: Manning.
Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.
Wickham's book presents a more modern take on R. The book is available at http://r4ds.had.co.nz
SPSS
While it is possible to complete the course using SPSS, this is not a good idea because SPSS is not a good choice for serious data analysis. However, if you just want to do the first assignment, which is mandatory for passing the course, that is doable with SPSS.
For Finnish students
Mikko Ketokivi's book is an excellent resource for the basics of quantitative research:
Ketokivi, M. (2015). Tilastollinen päättely ja tieteellinen argumentointi (2nd ed.). Helsinki: Gaudeamus.

The course has computer assignments during which you will work on data analysis problems using statistical software. The purpose of these assignments is to familiarize you with statistical software, teach you how to interpret analysis results, and teach some basic principles of reproducible research.
Each submission consists of two parts:
Analysis file (Stata do-file, R file, or SPSS syntax file)
 A report based on the analysis file
You need to upload the two files into separate tabs in the TurnItIn activity. Uploading zip files is not allowed because files contained in compressed packages cannot be commented on online.
Anonymous grading
The course applies anonymous grading, which means that the instructor cannot see the students' identities in the grading system.
Do not include your name, student number, or any other identifying information in any submission.
Do not include any information that would allow your university to be identified either (course code, course name, name of the university).
Analysis file
You may use the statistical software through its menus or by typing commands, but at the end you need to go through your analysis log and record the commands that you ran in an analysis file that will reproduce all the relevant analyses. Using an analysis file is important for the reproducibility of research, for two reasons:
 Other researchers may be interested in the details of your data and analysis, and an analysis file completely documents all your analyses.
 If you submit an article to an academic journal, the reviewers often suggest changes to the analyses. Maintaining an analysis file will make these changes much easier to implement.
To verify that your analysis file works, it is recommended that you restart or otherwise completely reset the statistical software before running the final version of the analysis file. The file produces an analysis log, which will form the basis of your report.
Report
The analysis file is then converted to a report, which documents what exactly you did, why, and how you interpreted the results.
The workflow for producing a report is demonstrated in the screencast for the first data analysis assignment and consists of the following three steps:
 Prepare and run one analysis file that contains all analyses that you did.
 Export the analysis log as a Word document
Add headings and normal style paragraphs to the Word document where you explain your interpretation of the analyses.
The purpose of the comments that you write is to document your thought process: how you explored the data, how you interpreted the results, how you checked the assumptions, and how the model evolved. Include a conclusions section where you explain your answer to the research question based on the analyses and assess the size of the effects.
Producing a report using Stata
Start by installing the user-written MarkDoc module and its dependencies:
net install github, from("https://haghish.github.io/github/")
github install haghish/markdoc, stable
You need to do the steps listed above only once.
Add the following line to the beginning of your analysis file:
log using report, replace
Then after each graphics command, add the following:
img
Add the following line to the end of your analysis file:
log close
markdoc report, export(docx) mini replace
Running the analysis file after these modifications will produce a log file in Word format.
It is a common problem that in Windows the report does not contain any images. This happens because the MarkDoc package does not work well with network drives. Make sure that your working directory is set to your computer instead of a network drive. The current working directory is shown in the bottom left corner of the Stata main window. If the path starts with "//", you are working on a network drive. One way to fix this issue is to run the following code, which will create a temporary directory for you and set that as a working directory:
tempfile tf
mkdir "`tf'"
cd "`tf'"
Producing a report using RStudio
After preparing the analysis file, use the Compile notebook feature to compile an MS Word type document.
Producing a report using SPSS
After preparing the analysis file, first close all output windows. Then select the full analysis file (ctrl+a / cmd+a) and run the file. In the new output window that appears, click on the Export button and export the log in "Word/RTF" format.
Screencasts
The screencasts for the first assignment, available here, demonstrate the process of creating a report using R, SPSS, and Stata.


This course contains a lot of video materials that are also available on Mikko Rönkkö's YouTube channel at https://www.youtube.com/mronkko. We use a crowdsourcing model, where students help manage the materials. Each video has a set of materials on the course page, a YouTube version, and slides and a transcript on OSF.