clear all

/*
	Applied Microeconometrics I
	Stata tutorial 2023
	Atte Pudas
*/

*Part I

*Note: I am using Stata 18
*Note: the step numbers below refer to the document "Stata_tutorial_session_2023_slides.pdf"

*First, create a new folder on your computer for the tutorial files.
*Then download all related files, including the datasets, from the MyCourses page.

*Current working directory
pwd
 
*Specify directory: cd
*Change your working directory to the folder where your working files are located:
*cd "directory_name"
*(all files you save during the Stata session without specifying another directory
*will be saved here)

*Step 5.1: Define folder from which the data is retrieved 
cd "YOUR FOLDER PATH HERE"

*How do you find the path of the dataset file on your computer? Go to the new folder we created for the tutorial.
*Then right-click the .dta file that you want to load into Stata and click Properties. The path is shown on the Location row.
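
*A minimal sketch (not part of the slides): once the working directory is set, you can check
*that Stata finds the dataset there. The file name is the one loaded in Step 5.3.
capture confirm file "mini_fleed_data.dta"
if _rc != 0 {
	display as error "mini_fleed_data.dta not found in the working directory. Check the cd command above."
}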

*Step 5.2: Now, we want to create a log file.

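*capture log close closes a log left open by an earlier, interrupted run (and does nothing otherwise).
capture log close
log using "tutorial_log.log", replace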

*Step 5.3: it's time to load the data into Stata.

use "mini_fleed_data.dta", clear

*Let's execute our do file for the first time. You can do it by clicking the Execute button on the toolbar at the top or by pressing Ctrl+D.
*You can execute only part of your code by highlighting the rows you want to run.

*Step 6: Now you can see all the variables in the Variables window. Check that the data loaded correctly!
browse

*Step 7: Some descriptives of our variables. 
describe

*Step 8: Count the observations. 
count

*You can always ask help from Stata.
help describe

*Step 9: Now, let's look at the contents of a variable.
codebook main_activity
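
*A possible complement to codebook (a sketch, not in the slides): tabulate lists every
*category with its frequency, and the missing option also shows empty values.
tabulate main_activity, missing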

*There are two main types of variables: numeric and string. Mind the type, because the two are handled differently in Stata.
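*For example (a sketch using variables from this tutorial): describe reports the storage
*type of each variable (str# means string), and ds can list all string variables.
describe main_activity age
ds, has(type string)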

*Now let's generate some new variables
*Step 10.1
gen agesq = age^2 
browse age agesq

*Step 10.2
gen activity=0
replace activity=1 if main_activity=="Employed" | main_activity=="Unemployed"
*Vertical bar "|" stands for "inclusive or" (at least one is true)
*The ampersand "&" stands for "and" (both are true)
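*A quick illustration of the two operators (a sketch; the counts depend on the data):
count if main_activity=="Employed" | main_activity=="Unemployed" //either condition holds
count if main_activity=="Employed" & female==1 //both conditions hold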

* you can also create the same dummy variable with a single command:
gen activity2 = (main_activity=="Employed" | main_activity=="Unemployed")

br activity main_activity
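
*A sketch of a sanity check: assert stops the do file with an error if the two dummies
*differ in any observation, and the cross-tabulation shows how each category was coded.
*Note that observations with an empty main_activity are coded 0 here, not missing.
assert activity == activity2
tabulate main_activity activity, missing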

* you can remove variables from the data
drop activity2 

*String variables: Note the quotation marks!
gen kieli = "suomi" if native_language == "finnish"
br kieli native_language

*Step 11: summary statistics for activity variable
sum activity
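*Because activity is a 0/1 dummy, its mean is the share of active persons. A sketch of how
*to reuse r(mean), the result stored by the summarize command above:
display "Activity rate: " %5.3f r(mean)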

*Step 12
gen unemployed=.
replace unemployed=0 if main_activity=="Employed"
replace unemployed=1 if main_activity=="Unemployed"

*For numerical variables, . stands for a missing/empty value; for string variables, it is ""
* we can replace all values of the variable kieli with an empty value
replace kieli = ""
drop kieli
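
*Note that Stata treats numeric missing values as larger than any number, so a condition
*such as unemployed > 0 is also true for missing observations. A sketch of the difference:
count if unemployed > 0 //also counts observations where unemployed is missing
count if unemployed > 0 & !missing(unemployed) //only observations with unemployed == 1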

*Step 13: summary statistics for unemployed variable
sum unemployed

*Step 14: Unemployment and Activity rates of women between age 25 and 54
sum activity unemployed if female == 1 & (age >= 25 & age <= 54)
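*The same restriction can be written more compactly with inrange(), which is true when
*age lies between 25 and 54 (inclusive). A sketch that should reproduce the output above:
sum activity unemployed if female == 1 & inrange(age, 25, 54)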

*Step 15: Let's create a table
table children_under_7 female, statistic(mean unemployed) statistic(frequency)
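*The statistic() options above follow the table syntax introduced in Stata 17. In older
*versions, a similar summary for women could be sketched with tabstat (an alternative,
*not the method used in the slides):
tabstat unemployed if female == 1, by(children_under_7) statistics(mean n)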

*Step 16: Now, let's look at the relationship between having children under 7 years old and employment of women
*Step 16.1
reg months_employment children_under_7
*Step 16.2
reg months_employment children_under_7 if female==1 //women
reg months_employment children_under_7 if female==0 //men
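*A sketch of an equivalent single regression (not in the slides), assuming female is a 0/1
*numeric variable. The coefficient on children_under_7 is the slope for men, and adding
*the interaction coefficient to it gives the slope for women.
reg months_employment i.female##c.children_under_7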

*Step 16.3: treat children_under_7 as a categorical variable
xi: reg months_employment i.children_under_7 if female==1
* children_under_7 is numeric (not a string), so factor-variable notation (i.) also works without the xi: prefix
reg months_employment i.children_under_7 if female==1
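*A sketch of a follow-up check: testparm tests whether the children_under_7 category
*dummies are jointly zero in the regression just estimated.
testparm i.children_under_7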

*Step 17: adding controls
xi: reg months_employment children_under_7 i.age i.region i.education i.native_language if female==1
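*An alternative sketch without the xi: prefix: factor-variable notation (i.) does not accept
*string variables, so a string such as native_language is first converted with encode.
*The name native_language_num is hypothetical, chosen for this example only. The other
*controls are left out because their storage types are not shown in this do file.
encode native_language, generate(native_language_num)
reg months_employment children_under_7 i.native_language_num if female==1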

*Step 18: close the log file
log close

* You can convert the log file into a PDF
translate tutorial_log.log tutorial_log.pdf, replace