* This do-file creates some of the graphs used in Lecture 3 of Principles of Empirical Analysis
* Matti Sarvimäki, January 2022

clear all
set seed 12345

global home "/Users/sarvimm1/Dropbox (Aalto)/teaching/Principles_of_Empirical_Analysis/matti"
global data "$home/data"
global output "$home/graphs"


* --- Reading in the FLEED teaching data ---
import delimited using "$data/Fleed_tyossakaynti/fleed_puf_julk.csv", clear

* clean up the data a bit
rename shtun id
gen year=1995+vuosi
gen woman=(sukup==2)
replace woman=. if sukup==.
gen age=year-syntyv
rename svatva earn
order id year earn age woman
sort id year
replace earn=earn/1000
keep if year>=2009

* previous year's income
sort id year
gen earn_t1=earn[_n-1] if id==id[_n-1]

* Level of education
* see "Detailed description" here: https://www.stat.fi/en/luokitukset/koulutus/
tostring ktutk, replace
gen edul=substr(ktutk,1,1)
destring edul, replace
replace edul=5 if edul==6
replace edul=0 if edul==.
label define edul 0 "Less/unknown", add
label define edul 3 "Secondary", add
label define edul 5 "Bachelor", add
label define edul 7 "Master", add
label define edul 8 "Lis./PhD", add
label values edul edul

* Use only year 2010
keep if year==2010


*--- Graphs and tables ---

*- Example 1: Conditional mean
tabstat earn, by(edul) stat(mean N) format(%9.0f)

*- Example 2: Cross tabulation
tabulate edul woman
tabulate edul woman, cell nofreq

*- Example 3: scatterplots

* Income vs previous year's income
corr earn*, cov
corr earn*
* correlation coefficient rounded to two decimals, used in the graph annotation
local rho=round(r(rho)*100)/100
di `rho'

local opt "xlab(0(20)100) ylab(0(10)100) xtitle(Income in 2009) ytitle(Income in 2010) xsize(16) ysize(20)"
scatter earn earn_t1, mcolor(navy%25) msize(vsmall) `opt'
graph export "$output/scatter_income.png", replace
scatter earn earn_t1, mcolor(navy%25) msize(vsmall) `opt' text(1 85 "Correlation: `rho'")
graph export "$output/scatter_income2.png", replace

* note that scatter plots with large data can get very large if using
* vector graphics format (e.g. pdf), so I'm using a raster format (png) instead

* Perfect correlation
gen x=earn*42
local opt "ylab(0(10)100) xtitle(42 * Income in 2010) ytitle(Income in 2010) xsize(16) ysize(20)"
scatter earn x, mcolor(navy%25) msize(vsmall) `opt'
graph export "$output/scatter_income3.png", replace
drop x

* Zero correlation 1: random variable
gen random=uniform()
corr earn random
local opt "ylab(0(10)100) xtitle(Random number) ytitle(Income in 2010) xsize(16) ysize(20)"
scatter earn random, mcolor(navy%25) msize(vsmall) `opt'
graph export "$output/scatter_income4.png", replace

* Zero correlation 2: nonlinear relationship
* (the parabola is symmetric around x=2500, so the linear correlation is zero)
gen x=_n
replace x=. if x>=5000
gen y=50*x-.01*x^2
corr y x
local opt "xtitle(x) ytitle(y = 50x - 0.01x{superscript:2}) xsize(16) ysize(20)"
scatter y x, mcolor(navy%25) msize(vsmall) `opt'
graph export "$output/scatter5.png", replace

* Regression
regress earn earn_t1
local opt "xlab(0(20)100) ylab(0(10)100) xtitle(Income in 2009) ytitle(Income in 2010) xsize(16) ysize(20)"
twoway (scatter earn earn_t1, mcolor(navy%25) msize(vsmall)) (lfit earn earn_t1, lw(thick)), `opt' legend(off) text(1 85 "Y = 2.49 + .93X")
graph export "$output/scatter_income_reg.png", replace

*---

*-- Example 4: age and income
corr earn age

local opt "xlab(15(5)70) ylab(0(10)100) xtitle(age in 2010) ytitle(income in 2010) xsize(16) ysize(20) msize(tiny) mc(navy%25)"
scatter earn age, `opt'
graph export "$output/scatter_income_age1.png", replace
scatter earn age, `opt' jitter(10)
graph export "$output/scatter_income_age2.png", replace

local opt "xlab(15(5)70) ylab(0(10)100) xtitle(age in 2010) ytitle(income in 2010) xsize(16) ysize(20) legend(off)"
twoway (scatter earn age, msize(tiny) mc(navy%25) jitter(10)) (lfit earn age, lw(thick)), `opt'
graph export "$output/scatter_income_age2b.png", replace

* mean income by age (collapse replaces the data, hence preserve/restore)
preserve
collapse (mean) earn, by(age)
local opt "xlab(15(5)70) ylab(0(10)100) xtitle(age in 2010) ytitle(income in 2010) xsize(16) ysize(20)"
scatter earn age, `opt'
graph export "$output/scatter_income_age3.png", replace
twoway (scatter earn age) (lfit earn age, lw(thick)), `opt' legend(off)
graph export "$output/scatter_income_age3b.png", replace
twoway (scatter earn age) (qfit earn age, lw(thick)), `opt' legend(off)
graph export "$output/scatter_income_age3c.png", replace
restore

* quadratic age profile
gen age_sq=age^2
regress earn age age_sq
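
* --- Optional sketch (not part of the original lecture graphs) ---
* A possible follow-up to the quadratic regression above: the fitted profile
* earn = b0 + b1*age + b2*age^2 has its turning point at age = -b1/(2*b2).
* This is a maximum only if the coefficient on age_sq is negative, which is
* an assumption here and should be checked in the regression output.
* The name "peak_age" is a hypothetical label chosen for illustration.
display "Turning point of the fitted age profile: " -_b[age]/(2*_b[age_sq])
* nlcom reports the same quantity with a delta-method standard error
nlcom (peak_age: -_b[age]/(2*_b[age_sq]))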