Where does the analysis begin? What should you do once you have collected the data? Well, get acquainted with it. A common point of departure is to plot the data in a graph and take a first look at its main features. Where are most of the observations concentrated? Is there a shape that you can recognise? Are there any outliers?
After that, you can move to calculating descriptive statistics to gain an understanding of what kind of data you are dealing with. You will find a table presenting these numbers in pretty much all research articles that apply quantitative analysis. The table typically presents measures of central tendency (the mean, median and mode) and measures of variability (the range, and the standard deviation or variance). Beyond individual variables, we may describe the relationship between two different variables with measures of dependence, such as correlation. When two variables are positively correlated, ‘the larger the first, the larger the second’; and when they are negatively correlated, ‘the larger the first, the smaller the second’.
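To make these measures concrete, here is a minimal Python sketch that computes them for a small sample. The figures and the variable names (revenue, staff) are invented for illustration and do not come from any of the readings; the helper pearson_r is written out by hand so that the calculation behind the correlation coefficient is visible.

```python
import statistics
from math import sqrt

# Hypothetical sample (assumption): yearly revenue in millions and
# number of staff for six companies.
revenue = [2.0, 3.5, 3.5, 5.0, 8.0, 14.0]
staff = [10, 18, 20, 24, 45, 70]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / sqrt(variation_x * variation_y)


# Measures of central tendency and variability for one variable.
summary = {
    "mean": statistics.mean(revenue),
    "median": statistics.median(revenue),
    "mode": statistics.mode(revenue),
    "range": max(revenue) - min(revenue),
    "sd": statistics.stdev(revenue),  # sample standard deviation
}

# Measure of dependence between two variables.
r = pearson_r(revenue, staff)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"correlation between revenue and staff: {r:.2f}")
```

In this invented sample the mean (6.0) lies well above the median (4.25), which hints that one large company is pulling the average up. This is the kind of first impression a descriptive table is meant to give.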
Calculating and presenting these will give the researcher, and the reader, a first impression of the phenomenon. For instance, the table could provide information on the average revenue of the companies and how much this varies among the companies included in the sample. It indicates what might be a high or a low value (as a rule of thumb, values that are more than one standard deviation away from the mean are considered high or low). It also tells us to what extent the variables are statistically associated.
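The one-standard-deviation rule of thumb can be sketched in a few lines of Python. The revenue figures are again invented, and the labels "high", "low" and "typical" are only illustrative names, not established terminology.

```python
import statistics

# Hypothetical revenues in millions (assumption, for illustration only).
revenue = [2.0, 3.5, 3.5, 5.0, 8.0, 14.0]

mean = statistics.mean(revenue)
sd = statistics.stdev(revenue)  # sample standard deviation


def label(value):
    """Classify a value by its distance from the mean, in standard deviations."""
    if value > mean + sd:
        return "high"
    if value < mean - sd:
        return "low"
    return "typical"


labels = [label(v) for v in revenue]
for value, tag in zip(revenue, labels):
    print(value, tag)
```

Here only the largest company falls more than one standard deviation above the mean; none falls that far below it.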
Descriptive statistics focus on the data at hand. To make predictions about the variables and their relationships within the wider population, we must immerse ourselves in the world of inferential statistics. This includes an abundance of different methods that suit different types of questions and data, and that have their specific limitations. The shared idea, however, is to draw conclusions from the data in order to infer how different entities compare across time or groups in reality.
Inferential statistics offers ways of testing our hypotheses. This is done through null hypothesis significance testing (NHST). The null hypothesis states that the predictors (independent variables) that we have included in our model do not have an effect on the outcome (dependent variable). Basically, it says that our model and hypotheses are rubbish, or at least that they are not supported by our data. NHST estimates how likely it is that we would obtain a result at least as strong as the one we observed, and which thus seems to support the alternative hypothesis (the one we have formulated to state that there is a particular effect), if there ultimately is no effect. When this probability is very small, we reject the null hypothesis.
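The logic of NHST can be illustrated with a permutation test, which is only one of many possible test procedures and is chosen here because it needs no formulas beyond the mean. The sketch below assumes two invented groups of six companies and asks: if group membership made no difference (the null hypothesis), in what share of all possible regroupings would the difference in means be at least as large as the one observed? All names and numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical outcome scores for two groups of six companies (assumption).
group_a = [12, 14, 15, 13, 16, 11]
group_b = [18, 21, 19, 22, 20, 23]


def mean_difference(pooled, first_indices):
    """Difference in means between the chosen indices and the remaining ones."""
    first = [pooled[i] for i in first_indices]
    second = [v for i, v in enumerate(pooled) if i not in first_indices]
    return sum(first) / len(first) - sum(second) / len(second)


def exact_permutation_p(a, b):
    """Two-sided p-value: share of regroupings at least as extreme as observed."""
    pooled = a + b
    observed = abs(mean_difference(pooled, set(range(len(a)))))
    splits = list(combinations(range(len(pooled)), len(a)))
    as_extreme = sum(
        1
        for split in splits
        if abs(mean_difference(pooled, set(split))) >= observed - 1e-12
    )
    return as_extreme / len(splits)


p_value = exact_permutation_p(group_a, group_b)
reject_null = p_value < 0.05  # 0.05 is a conventional threshold, not a law

print(f"p-value: {p_value:.4f}, reject null hypothesis: {reject_null}")
```

Because the two invented groups do not overlap at all, only 2 of the 924 possible regroupings are as extreme as the observed one, so the p-value is about 0.002 and the null hypothesis of no difference is rejected at the conventional 5% level.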
Inferential statistics also deals with statistical models that are used to predict an outcome. They follow the same basic procedure: choosing a model that ‘fits’ the data, and then using that model to make predictions about the wider population. In order to fit a model, you will need a function that describes how the predictors form the outcome, and you will need to define the error function that describes the difference between your data and the model’s prediction. (Don’t worry: there are analyses that test how well your model actually fits.) Then you use this model to predict what the population values would look like.
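As a minimal sketch of this procedure, the Python example below uses a straight line as the model function and the sum of squared differences as the error function; this is the ordinary least squares approach, assumed here because it is the most common choice. The data, and the idea that x is advertising spend and y is revenue, are invented for illustration.

```python
# Hypothetical data (assumption): advertising spend (x) and revenue (y)
# for five companies.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def predict(slope, intercept, value):
    """Model function: how the predictor forms the outcome."""
    return intercept + slope * value


def sum_squared_error(slope, intercept):
    """Error function: squared gaps between the data and the model's predictions."""
    return sum((yi - predict(slope, intercept, xi)) ** 2 for xi, yi in zip(x, y))


def fit_least_squares(xs, ys):
    """Slope and intercept that minimise the sum of squared errors."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_least_squares(x, y)
fitted_error = sum_squared_error(slope, intercept)
guess_error = sum_squared_error(2.5, 0.0)  # an arbitrary alternative line
forecast = predict(slope, intercept, 6.0)  # prediction for an unseen value

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
print(f"error of fitted line: {fitted_error:.3f}, of the guess: {guess_error:.3f}")
print(f"predicted outcome at x = 6: {forecast:.2f}")
```

The fitted line has a much smaller error than the arbitrary guess, which is what ‘fitting’ means here: of all straight lines, it is the one that stays closest to the data.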
Linear regression, multiple regression, logistic regression… These are all ways of doing predictive analysis; they predict an outcome from predictor variables and error. Choose the one that best suits your question and data, and that leaves the least error. Are you interested in a bivariate relationship, i.e. whether there is a relationship between two variables? Choose a simple linear regression model. It formulates a mathematical function that describes a linear relationship between the variables. Visually, this would mean drawing a straight line through the ‘cloud of data’ in a way that best describes it.
Or do you think there are multiple independent variables at play, and you want to know the effect of each while the others are held constant (partial correlations)? Then choose multiple regression. Or is your dependent variable binary, and you are explaining it with nominal, ordinal, interval or ratio independent variables? Then logistic regression is the method for you.
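To show how a binary outcome changes the model, here is a minimal logistic regression sketch in plain Python. It is fitted with simple gradient descent, which is an assumption made for readability; statistical programs typically use more refined estimation routines. The data (hours of preparation and whether funding was won) and all names are invented.

```python
import math

# Hypothetical data (assumption): hours of pitch preparation and whether
# the entrepreneur won funding (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
funded = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate weight and bias by gradient descent on the average log-loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for xi, yi in zip(xs, ys):
            error = sigmoid(weight * xi + bias) - yi
            grad_w += error * xi
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


weight, bias = fit_logistic(hours, funded)


def probability(x):
    """Predicted probability of the outcome being 1 for a given predictor value."""
    return sigmoid(weight * x + bias)


print(f"P(funded | 1.0 hours) = {probability(1.0):.2f}")
print(f"P(funded | 4.5 hours) = {probability(4.5):.2f}")
```

Unlike linear regression, the model predicts a probability rather than a value of the outcome itself, so the prediction always stays between 0 and 1.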
In practice, there are many statistical programs (SPSS, R, Stata, Matlab, SAS) that are used to analyse the data. And, as already mentioned, this is a whole world that you can spend a lifetime studying.
Before getting excited about the enormous possibilities of statistical analysis, remember that studies relying on correlation do not necessarily tell us anything about causality. For us to know that a particular independent variable really affects the dependent variable (‘causes it’), more criteria than their joint variation must be fulfilled. Think about the relationship between ice cream sales and drownings: they correlate, because both increase in hot weather, but they do not have a causal relationship. A causal relationship should have a temporal sequence (cause → effect), the effect should be consistent, and its strength should be of relevance. In fact, there is still debate on what suffices as evidence of causality.
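The ice cream example can be simulated to show how a third variable produces a correlation without causation. In the Python sketch below, temperature drives both ice cream sales and drownings, and neither affects the other. All numbers are invented, and the simulation only illustrates the principle; it is not an analysis of real data.

```python
import random
from math import sqrt

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical daily data (assumption): temperature drives both variables.
temperature = [rng.uniform(0, 30) for _ in range(1000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / sqrt(variation_x * variation_y)


r_raw = pearson_r(ice_cream, drownings)
r_ice_temp = pearson_r(ice_cream, temperature)
r_drown_temp = pearson_r(drownings, temperature)

# Partial correlation: the association that remains between ice cream and
# drownings once temperature is held constant.
r_partial = (r_raw - r_ice_temp * r_drown_temp) / sqrt(
    (1 - r_ice_temp ** 2) * (1 - r_drown_temp ** 2)
)

print(f"raw correlation: {r_raw:.2f}")
print(f"correlation with temperature held constant: {r_partial:.2f}")
```

The raw correlation is strong, but it all but disappears once temperature is taken into account. A correlation table alone cannot reveal this; the researcher has to think about which other variables might lie behind an association.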
Cardon, M. S. & Kirk, C. P. 2013. Entrepreneurial passion as mediator of the self‐efficacy to persistence relationship. Entrepreneurship Theory and Practice, 39(5), 1027-1050.
Zhang, Y. & Shaw, J. D. 2012. Publishing in AMJ--Part 5: Crafting the Methods and Results. Academy of Management Journal, 55(1), 8-12.
Descriptive and Inferential Statistics
Describing one variable (at a time): mean, median, and mode, range, and standard deviation
Correlation – describing two variables
Testing Hypotheses: Null Hypothesis Significance Testing (NHST)
Simple Linear Regression
Exercise 7.1 – Comprehend
After getting acquainted with the materials (readings, videos), explain briefly (max 1 page) what the purpose of data visualisation, descriptive statistics and inferential statistics is.
Exercise 7.2 – Critique
Read the Results and Limitations sections of the article by Cardon and Kirk (2013), and answer the questions on it: How do the authors analyse their data; which analytical techniques do they use? Do you think the authors sufficiently justify why they have chosen these methods?
Please check. Did you gain an understanding of the following?
- How to present descriptive statistics
- Basic principle of inferential statistics
- Difference between correlation and causality
If you can answer everything with a confident Yes! then you have achieved the learning objective of this session.