MS-C1620 Statistical Inference

To be solved at home before the exercise session.

1. Go to the website that lists pairs of variables that have no causal relationship but still exhibit a large correlation. Pick one of the datasets and figure out how the data are presented: how are the plots constructed from the \((x_i, y_i)\) data (they are not scatter plots of the two variables in question), how are the individual pairs \((x_i, y_i)\) represented in the plots, and what are the lines going through the points?
2. Let \(x, y, \varepsilon\) be random variables such that \[y = x + \varepsilon,\] where \(\mathrm{Var}(x) = 1\), \(\mathrm{Var}(\varepsilon) = \sigma^2 > 0\), and \(x\) and \(\varepsilon\) are independent. (Interpretation: \(x\) and \(y\) have a perfect linear relationship, but the observed value of \(y\) is contaminated by noise or measurement error \(\varepsilon\) with variance \(\sigma^2\).) Compute the Pearson correlation \(\rho\) between \(x\) and \(y\) and investigate how it behaves as \(\sigma^2\) increases. Interpret this behavior.
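
A sketch of one possible route for problem 2, using only the stated assumptions (independence of \(x\) and \(\varepsilon\) gives \(\mathrm{Cov}(x, \varepsilon) = 0\)):
\[\mathrm{Cov}(x, y) = \mathrm{Cov}(x, x + \varepsilon) = \mathrm{Var}(x) + \mathrm{Cov}(x, \varepsilon) = 1, \qquad \mathrm{Var}(y) = \mathrm{Var}(x) + \mathrm{Var}(\varepsilon) = 1 + \sigma^2,\]
\[\rho = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{1}{\sqrt{1 + \sigma^2}}.\]
Under this sketch, \(\rho\) is close to 1 for small \(\sigma^2\) and decreases monotonically towards 0 as \(\sigma^2 \to \infty\), even though the underlying relationship between \(x\) and the noise-free part of \(y\) stays perfectly linear.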

To be solved at the exercise session.

The file data_dependency.txt contains seven bivariate data sets (the columns xi and yi, where \(i = 1,2, \ldots ,7\), always form a pair).
1. Read the file into R using the command read.table.
2. Draw a scatter plot for each pair of variables.
3. Calculate the Pearson and Spearman correlations of the pairs and compare them to the scatter plots.
4. The underlying distributions of samples 5-7 are identical except for the variance of yi (the variance is highest in sample 7). What happens to the correlation coefficients as the variance increases, and why?
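
The steps above could be sketched in R roughly as follows. This is a minimal sketch, not a model solution: it assumes that the file has a header row and that the columns are literally named x1, y1, ..., x7, y7, and the helper names `pair_correlations` and `plot_pairs` are made up for illustration.

```r
# Pearson and Spearman correlations for each pair (x_i, y_i).
# ASSUMPTION: columns are named x1, y1, ..., x7, y7.
pair_correlations <- function(dat, pairs = 1:7) {
  out <- t(sapply(pairs, function(i) {
    x <- dat[[paste0("x", i)]]
    y <- dat[[paste0("y", i)]]
    c(pearson  = cor(x, y, method = "pearson"),
      spearman = cor(x, y, method = "spearman"))
  }))
  data.frame(pair = pairs, out)
}

# One scatter plot per pair, arranged in a 2 x 4 grid.
plot_pairs <- function(dat, pairs = 1:7) {
  old <- par(mfrow = c(2, 4))
  on.exit(par(old))  # restore the previous plotting layout
  for (i in pairs) {
    plot(dat[[paste0("x", i)]], dat[[paste0("y", i)]],
         xlab = paste0("x", i), ylab = paste0("y", i),
         main = paste("Pair", i))
  }
}

# Runs only if the data file is in the working directory.
# ASSUMPTION: the file has a header row (header = TRUE).
if (file.exists("data_dependency.txt")) {
  dat <- read.table("data_dependency.txt", header = TRUE)
  plot_pairs(dat)
  print(pair_correlations(dat))
}
```

Printing the correlations next to the scatter plots makes it easier to spot pairs where the two coefficients disagree, for example a monotone but non-linear relationship, where Spearman tends to be closer to 1 in absolute value than Pearson.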

(Optional) Also use the tests given on slides 6.16 and 6.20 to test the null hypothesis \(H_0: \rho = 0\) for the Pearson correlation in problem 2e. How do the results compare to those of the permutation test?

(Optional) Simulate the distribution of the sample Pearson correlation \(\hat{\rho}\) under normality by generating multiple datasets of size \(n\) from a bivariate normal distribution of your choice. Then transform the sample Pearson correlations as \(\hat{\rho} \mapsto \operatorname{arctanh}(\hat{\rho})\) and inspect the distribution of the transformed values. Does it look normal? (It should for large \(n\), as per slide 6.13.)
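
One way the simulation might be set up in base R is sketched below. The sample size, the number of replications, the true correlation of 0.5 and the function name `simulate_correlations` are all arbitrary choices made for illustration. The bivariate normal pairs are built from two independent standard normals rather than with an external package.

```r
# Simulate sample Pearson correlations from a bivariate normal distribution
# with standard normal margins and correlation rho.
# ASSUMPTION: n_rep, n and rho are illustrative values, not prescribed ones.
simulate_correlations <- function(n_rep = 2000, n = 50, rho = 0.5) {
  replicate(n_rep, {
    z1 <- rnorm(n)
    z2 <- rnorm(n)
    x <- z1
    y <- rho * z1 + sqrt(1 - rho^2) * z2  # Cor(x, y) = rho by construction
    cor(x, y)
  })
}

set.seed(1620)
r_hat <- simulate_correlations()
z_hat <- atanh(r_hat)  # the arctanh transformation of the sample correlations

# Visual checks, shown only in an interactive session.
if (interactive()) {
  par(mfrow = c(1, 2))
  hist(z_hat, breaks = 40, main = "arctanh of sample correlations")
  qqnorm(z_hat)
  qqline(z_hat)
}
```

Repeating the run with a small `n` and a `rho` close to 1, and comparing the histograms of `r_hat` and `z_hat`, shows how skewed the untransformed correlations can be.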