clear all

*Set up the directory where the file is and where the final Excel file is saved
cd "Your_directory_name_here"

*Point c: import the data
import delimited "fleed_puf_julk.csv", delimiters(",")

*Point d: keep only the relevant variables
keep vuosi sukup syntyv ktutk svatva tyotu

*Point e: generate earnings, which is the sum of svatva and tyotu
gen earnings = svatva+tyotu

*Point f: replace missing earnings with 0
replace earnings=0 if earnings==.

*Point g: convert year numbers to calendar years by adding 1994 to each year.
* Rename vuosi to year
replace vuosi = vuosi+1994
rename vuosi year

*Point h: a person's age is the current year minus the birth year
gen age = year-syntyv

*Point i: generate a dummy for working-aged (18-65) persons
gen working_aged=0
replace working_aged=1 if age>=18 & age<=65

*Point j: drop individuals who are not working-aged
drop if working_aged==0

*Point k: convert the gender indicator to a dummy variable
gen female = 0
replace female = 1 if sukup==2

*Point l: drop redundant variables
drop svatva tyotu syntyv working_aged sukup

*Point m-n: convert education to a string
tostring ktutk, replace force

*Point o: extract the first digit from ktutk and convert it to numeric:
gen edu_level = substr(ktutk,1,1)
destring edu_level, replace force

*Point p: create a dummy for tertiary education
gen tertiary=0
replace tertiary = 1 if edu_level>=5 & edu_level<=8

*Point q: generate mean earnings by year-gender-tertiary categories:
egen mean_earnings = mean(earnings), by(year female tertiary)

*Point r: retain only one observation per year-gender-tertiary group:
drop earnings
duplicates drop year female tertiary, force

*Point s-w: convert the data into a form in which we have one observation
*per year and a separate variable for each of the four earnings categories.
*The procedure given in the instructions is quite tedious. The one below is simpler.
* I am not an efficient coder but at least the one below works :)

*Create variables for the earnings categories:
gen earnings_f_college=.
replace earnings_f_college = mean_earnings if female==1 & tertiary==1
gen earnings_m_college=.
replace earnings_m_college = mean_earnings if female==0 & tertiary==1
gen earnings_f_highschool=.
replace earnings_f_highschool = mean_earnings if female==1 & tertiary==0
gen earnings_m_highschool=.
replace earnings_m_highschool = mean_earnings if female==0 & tertiary==0

*Collapse the data to contain only one observation per year.
*Each new variable is non-missing in exactly one row per year, and (mean)
*ignores missing values, so the collapse keeps that single value.
collapse (mean) earnings_f_college earnings_f_highschool earnings_m_college earnings_m_highschool, by(year)

*Point x: export to Excel:
export excel using "earnings edu finland.xlsx", firstrow(variables) replace

exit, clear STATA