clear

* Create paths to the folders "data" and "output" with the global command
* Read more: help global
global data "C:\Users\aapo.kivinen\Dropbox (Aalto)\PhD\teaching\Principles_of_Economic_Analysis\aapo\session one\data"
global output "C:\Users\aapo.kivinen\Dropbox (Aalto)\PhD\teaching\Principles_of_Economic_Analysis\aapo\session one\output"

/*
Import the dataset into Stata.
csv files are imported with import delimited. The option clear tells Stata
to discard any dataset that is already in memory (without clear, an error
would occur in that case). The delim option gives the character that
separates the variables. The delimiter was chosen on the StatFin webpage
when the file was downloaded.
*/
import delimited "$data\004_115c.csv", clear delim(";")

* Look at the imported data
describe
browse

* Give a variable a shorter name:
rename maintypeofactivity main_activity

* Drop a variable:
drop sex

* Change one value of a (string) variable.
* The oldest age group is coded as "100 -", which cannot be read as a number.
replace age = "100" if age == "100 -"

* Now we can change the variable type from string to numeric.
* More info: help destring
destring age, replace

* Drop the population outside the labor force: younger than 15 or older than 74:
drop if (age < 15 | age > 74)

* Check which main activity categories there are in the data
codebook main_activity

* Drop an irrelevant group:
drop if main_activity == "0-14 years old"

* Rename the variable so that all year columns share the same stub (v):
rename population31dec v4

* Transform into "long" format. This is a difficult concept (at least to me) to grasp
* See: help reshape
reshape long v, i(main_activity age) j(year)

* Rename a variable
rename v population

* Turn the column index (4, 5, ...) into the actual calendar year:
replace year = year + 1983

* Transform age into "wide" format: one population variable per age
reshape wide population, i(main_activity year) j(age)

/*
Aggregate the data into 10-year age bins.
This is done with a for-loop, locals and the egen function rowtotal
(called rsum in older versions of Stata).
The loop runs over the values 15, 25, ..., 65, stored in the local bottom.
Inside the loop we create a second local, top, equal to bottom + 9.
Finally, we create a new variable that holds the row sum of the variables
from bottom to top, e.g. population15 to population24.
(Don't worry if you are confused here, this is not easy.)
*/
local var = "population"
forvalues bottom = 15(10)74 {
    local top = `bottom' + 9
    egen `var'`bottom'`top' = rowtotal(`var'`bottom'-`var'`top')
}

* Drop the redundant (old) single-age variables
drop population15-population74

* Save this file as a tempfile
* See more: help tempfile
tempfile emp
save `emp'

* Import the second dataset into Stata
import delimited "$data\009_123x.csv", clear
rename currentpriceseuro v2

* Transform into "long" format
reshape long v, j(year) i(transaction)
drop transaction

* Correct the years:
replace year = year + 1978
drop if year < 1987
rename v gdp

* Merge with the employment statistics
* (one GDP row per year, many employment rows per year)
merge 1:m year using `emp'
drop _merge

* Sort the dataset
sort main_activity year

save "$data\cleaned_data.dta", replace
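
/*
Optional toy example (not part of the data cleaning above).
The reshape steps and the age-bin loop are the hardest parts of this file,
so here is a minimal sketch of the same steps on a tiny dataset.
Everything in it is made up for illustration: the variable names group and
pop, the ages 15-18, the 2-year bins and all the numbers are assumptions,
not StatFin data. Note that it clears the data in memory, so run it only
after the cleaned data has been saved.
*/
clear
input str1 group age v4 v5
"A" 15 1 10
"A" 16 2 20
"A" 17 3 30
"A" 18 4 40
end

* Wide -> long: the columns v4 and v5 become rows, indexed by year (4 and 5)
reshape long v, i(group age) j(year)
rename v pop
replace year = year + 1983

* Long -> wide in the other dimension: one column per age (pop15, ..., pop18)
reshape wide pop, i(group year) j(age)

* 2-year bins: bottom takes the values 15 and 17, top is bottom + 1
forvalues bottom = 15(2)18 {
    local top = `bottom' + 1
    egen pop`bottom'`top' = rowtotal(pop`bottom'-pop`top')
}

* One row per group and year, with the single-age columns and the bin sums
list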