clear all

/*
Applied Microeconometrics I
Stata tutorial 2023
Atte Pudas
*/

*Part I

*Note: I am using Stata 18
*Note: the step numbers below refer to the document "Stata_tutorial_session_2023_slides.pdf"

*First, create a new folder on your computer for the tutorial files.
*Then download all related files, including the datasets, from the mycourses page.

*Show the current working directory
pwd

*Specify directory: cd
*Change your working directory to the folder where your working files are located:
*cd "directory_name"
*(all files you save during the Stata session without specifying another directory will be
*saved here)

*Step 5.1: Define the folder from which the data is retrieved
cd "YOUR FOLDER PATH HERE"
*How do you find the path of the dataset file on your computer? Go to the new folder we created for the tutorial.
*Then right-click the .dta file that you want to load into Stata and click Properties. You will see the path on the Location row.

*Step 5.2: Now, we want to create a log file.
log using "tutorial_log.log", replace

*Step 5.3: It's time to load the data into Stata.
use "mini_fleed_data.dta", clear

*Let's execute our do-file for the first time. You can do it by clicking the Execute button on the toolbar at the top or by pressing Ctrl+D.
*You can execute only part of your code by highlighting the rows you want to run.

*Step 6: Now you can see all the variables in the variable manager window. Check that the data loaded correctly!
browse

*Step 7: Some descriptives of our variables.
describe

*Step 8: Count the observations.
count

*You can always ask Stata for help.
help describe

*Step 9: Now, let's look at the contents of a variable.
codebook main_activity
*There are two main types of variables: "numerical" and "string". Mind the data type, because the two are handled differently in Stata.
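
*Optional sketch (not part of the slides): one way to check whether a variable is
*stored as numeric or as string. This assumes mini_fleed_data.dta is loaded and that
*age is numeric and main_activity is a string, which is how they are used in this do-file.
describe age main_activity
*"confirm" stops with an error if a variable is not of the stated type.
*"capture" suppresses that error and stores the return code in _rc (0 means the check passed).
capture confirm string variable main_activity
display _rc
capture confirm numeric variable age
display _rc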
*Now let's generate some new variables

*Step 10.1
gen agesq = age^2
browse age agesq

*Step 10.2
gen activity=0
replace activity=1 if main_activity=="Employed" | main_activity=="Unemployed"
*The vertical bar "|" stands for "inclusive or" (at least one is true)
*The ampersand "&" stands for "and" (both are true)

*You can also create the same dummy variable with a single command
gen activity2 = (main_activity=="Employed" | main_activity=="Unemployed")
br activity main_activity

*You can remove variables from the data
drop activity2

*String variables: note the quotation marks!
gen kieli = "suomi" if native_language == "finnish"
br kieli native_language

*Step 11: Summary statistics for the activity variable
sum activity

*Step 12
gen unemployed=.
replace unemployed=0 if main_activity=="Employed"
replace unemployed=1 if main_activity=="Unemployed"
*For numerical variables, . stands for a missing/empty value; for string variables, "" does

*We can replace all values of the variable kieli with an empty value
replace kieli = ""
drop kieli

*Step 13: Summary statistics for the unemployed variable
sum unemployed

*Step 14: Unemployment and activity rates of women between ages 25 and 54
sum activity unemployed if female == 1 & (age >= 25 & age <= 54)

*Step 15: Let's create a table
table children_under_7 female, statistic(mean unemployed) statistic(frequency)

*Step 16: Now, let's look at the relationship between having children under 7 years old and the employment of women

*Step 16.1
reg months_employment children_under_7

*Step 16.2
reg months_employment children_under_7 if female==1 //women
reg months_employment children_under_7 if female==0 //men

*Step 16.3: Categorize children
xi: reg months_employment i.children_under_7 if female==1
*The categorized variable is not a string, so we can do the same without the xi prefix
reg months_employment i.children_under_7 if female==1

*Step 17: Adding controls
xi: reg months_employment children_under_7 i.age i.region i.education i.native_language if female==1

*Step 18: Close the log file
log close

* You can convert the log-file into pdf
translate tutorial_log.log tutorial_log.pdf, replace