* This do-file creates the graphs used in Lecture 2 of Principles of Empirical Analysis
* Matti Sarvimäki, January 2021

clear all
global home "[add your folder here]"
global data "$home/data"
global output "$home/graphs"

* ---------------------------
* --- FLEED TEACHING DATA ---
* ---------------------------

* reading in the data
import delimited using "$data/Fleed_tyossakaynti/fleed_puf_julk.csv", clear

* first look
browse shtun vuosi sukup syntyv svatva if vuosi==15

* clean up the data a bit
rename shtun id
gen year=1995+vuosi
gen woman=(sukup==2)
replace woman=. if sukup==.
gen age=year-syntyv
rename svatva earn
order id year earn age woman
keep if year==2010
browse id year earn age woman

* drop missing observations and summarize the data
drop if earn==.
summarize earn, detail
di "Coefficient of variation (CV): " r(sd)/r(mean)

* calculate quantile ratios
di r(p90)/r(p10)
di r(p90)/r(p50)
di r(p50)/r(p10)

* histograms
local opt "xsize(16) ysize(20) col(navy) lw(vvthin) lcol(white) ylab(none) xtitle("")"
hist earn, `opt' disc
graph export "$output/histogram1.pdf", replace
hist earn, `opt' bin(50)
graph export "$output/histogram2.pdf", replace
hist earn, `opt' bin(10)
graph export "$output/histogram3.pdf", replace

* Kernel density estimates
local opt "xsize(16) ysize(20) lw(thick) lcol(navy) ylab(none) xtitle("") title("")"
kdensity earn, `opt'
graph export "$output/kdensity1.pdf", replace
kdensity earn, `opt' bw(1000)
graph export "$output/kdensity2.pdf", replace
kdensity earn, `opt' bw(10000)
graph export "$output/kdensity3.pdf", replace

* CDF of earned income
* (distplot is a user-written command; if it is missing, install it with: ssc install distplot)
local opt "xsize(18) ysize(20) xtitle("") lc(navy) ylab(0(.05)1, grid glw(thin)) xlab(0(20000)100000, grid glw(thin)) ytitle(Fraction below x)"
distplot earn, `opt'
graph export "$output/cdf4.pdf", replace

* ----------------------------------------
* --- ILLUSTRATIVE DISTRIBUTION GRAPHS ---
* ----------------------------------------

* Drawing the example PDFs
local opt "xsize(16) ysize(20) ylab(none) yscale(off) xtitle("") title("") legend(off) yline(0, lc(black)) xlab(-4(1)4)"
twoway (function y=normalden(x,0,1), range(-4 -1) lw(none) recast(area) fcol(navy%25)) ///
       (function y=normalden(x,0,1), range(-4 4) lw(thick) lc(navy)) ///
       , `opt' text(0.05 -1.5 "15.9%")
graph export "$output/cdf1.pdf", replace
twoway (function y=normalden(x,0,1), range(-4 0) lw(none) recast(area) fcol(navy%25)) ///
       (function y=normalden(x,0,1), range(-4 4) lw(thick) lc(navy)) ///
       , `opt' text(0.1 -.8 "50%")
graph export "$output/cdf2.pdf", replace

* ... and the CDF (there must be a more elegant way to do this, but I'm in a hurry....)
preserve
drop *
set obs 10000
egen x=fill(-4(.1)4)
drop if x>4
gen pdf=normalden(x)
* running sum of the density; dividing by 10 multiplies by the grid step (0.1) to approximate the CDF
gen cdf=pdf in 1
replace cdf=cdf[_n-1]+pdf if cdf[_n-1]!=.
replace cdf=cdf/10
scatter cdf x, c(l) m(none) lw(thick) xlab(-4(1)4, grid glw(thin)) ylab(0(.05)1, grid glw(thin)) xsize(16) ysize(20) xtitle("") ytitle(Fraction below x)
graph export "$output/cdf3.pdf", replace
restore
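
* --- Possible shortcut for the standard normal CDF graph (a sketch, not part of the original lecture code) ---
* The CDF graph above approximates the CDF with a running sum of the density.
* Stata's built-in normal() function returns the standard normal CDF directly,
* so the same curve could probably be drawn as below.
* Assumptions: the variable name cdf_exact and the file name cdf3_alt.pdf are
* made up for illustration, and the grid (-4 to 4 in steps of 0.1) mirrors the one used above.
preserve
drop _all
set obs 81
gen x=-4+(_n-1)*.1
gen cdf_exact=normal(x)
line cdf_exact x, lw(thick) xlab(-4(1)4, grid glw(thin)) ylab(0(.05)1, grid glw(thin)) xsize(16) ysize(20) xtitle("") ytitle(Fraction below x)
graph export "$output/cdf3_alt.pdf", replace
restore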