# Stata basics

Jaakko Markkanen

## Dyndoc template (remember to add option `embedimage`)!

### Printing in Stata

The print command in Stata is `display`, abbreviated `di`. Let’s start by printing a familiar message:

~~~~
<<dd_do>>
di as text "Hello World!"
<</dd_do>>
~~~~

### Help

The most important Stata command is `help`. It searches the Stata documentation for the term you give it.

```
help help
help regress
```

### Macros and scalars

Stata has two types of “variables” called local and global macros (not to be confused with actual Stata variables, which hold data). A local macro exists only until the do-file or interactive session that defined it ends. A global macro persists until you exit Stata.

Let’s start by saving your working directory with macros. Create a folder called ECON-C4100 somewhere on your computer and copy the path to Stata:

~~~~
<<dd_do>>
local userPath "C:/ECON-C4100"
global userPath "C:/ECON-C4100"
<</dd_do>>
~~~~

Macros are referenced with `` `localName' `` and `$globalName`. We can print them by typing:

~~~~
<<dd_do>>
di "`userPath'"
di "$userPath"
<</dd_do>>
~~~~

One crucial thing in statistical programming is to know which folder your working directory points at. Let's set our working directory to `$userPath`:

~~~~
<<dd_do>>
cd "$userPath"
// or cd "`userPath'"
<</dd_do>>
~~~~

You can print the location of the current working directory with the `pwd` command:

~~~~
<<dd_do>>
pwd
<</dd_do>>
~~~~

On Windows, `cd` without arguments prints the current working directory, but on macOS or Linux it changes the working directory to the user's home directory.

To print the Stata system directories you can type:

~~~~
<<dd_do>>
sysdir
<</dd_do>>
~~~~

You can combine macros easily:

~~~~
<<dd_do>>
local userPathTwice `userPath' $userPath
di "`userPathTwice'"
<</dd_do>>
~~~~

You can also erase a macro by assigning nothing to it:

~~~~
<<dd_do>>
local userPathTwice
<</dd_do>>
~~~~

Macros can be used as counters:

~~~~
<<dd_do>>
local one_ 1
local two_ 2
local three_ `one_' + `two_'
di `three_'
<</dd_do>>
~~~~

However, notice how the double quotes `""` change the output. Without `=`, the macro stores the text `1 + 2`; `di` evaluates that expression when it is unquoted, but prints it literally when it is quoted:

~~~~
<<dd_do>>
di "`three_'"
<</dd_do>>
~~~~

You can also use scalars.
They are referred to directly by name, without any quotes or `$`, and hold a single value (a number or a string):

~~~~
<<dd_do>>
scalar one_ = 1
scalar two_ = 2
scalar three_ = one_ + two_
di three_
scalar drop three_
<</dd_do>>
~~~~

### Commenting and line breaks in Stata

Adding comments to your code is easy:

~~~~
<<dd_do>>
// Comment, one line
* Comment, one line
/*
Comment,
multiple lines
*/
<</dd_do>>
~~~~

The default character that marks the end of a command is the newline (line break). You can change it to `;` and back easily:

~~~~
<<dd_do>>
#delimit ;
#delimit cr
<</dd_do>>
~~~~

When using the standard line break, we need to use `///` to tell Stata that a command continues on the next line.

### Importing data to Stata

Next I demonstrate how to import text and Excel data into Stata. We obtain our data directly through the Statistics Finland API (I’ve created the links beforehand). We could also download the data to disk and replace the address with the file path. More on that later...

~~~~
<<dd_do>>
local delimiter "tab" // our data is tab-delimited
import delimited using ///
    "https://pxnet2.stat.fi:443/PXWeb/sq/2c23b351-c9a5-4946-b9c1-06c8146e7119" , ///
    delimiter(`delimiter') clear
<</dd_do>>
~~~~

Let’s describe our data in memory:

~~~~
<<dd_do>>
describe _all
<</dd_do>>
~~~~

In Stata, the data is stored as variables. Think of them as vectors, or as the columns of a matrix. Notice how the month variable has storage type `str7`. That means it’s a string variable. We can turn it into a time variable that Stata understands:

~~~~
<<dd_do>>
generate temp = monthly(month, "YM")
drop month
rename temp month
label variable month "Month"
format month %tm
<</dd_do>>
~~~~

Above, we first create a new variable with the `generate` (or `gen`) command. Then we delete the original string variable with the `drop` command and rename the variable temp back to month with the `rename` command. Finally, we give the new month variable its old label and apply the monthly display format `%tm`.

~~~~
<<dd_do>>
describe _all
<</dd_do>>
~~~~

We could also do the same with Excel data.
This time Statistics Finland gives us a direct link to an Excel file:

~~~~
<<dd_do>>
import excel using ///
    "https://pxnet2.stat.fi/PXWeb/sq/feec4f38-7ddb-4c9f-b091-a0688c3f7b89" , ///
    cellrange(A3:F194) firstrow clear
keep A B Pointfigure
rename A month
rename B commodity
// Convert the string month to a Stata monthly date, as before
generate temp = monthly(month, "YM")
drop month
rename temp month
label variable month "Month"
format month %tm
<</dd_do>>
~~~~

Excel files are often quite awkward. Notice that we must specify the exact cell range; otherwise the data would include unwanted metadata from Statistics Finland. The `keep` command is the opposite of `drop`: it retains the listed variables and discards the rest.

Finally, to demonstrate the dynamic options of Stata, let’s draw a graph:

~~~~
<<dd_do>>
line Pointfigure month
<</dd_do>>
~~~~

<<dd_graph>>
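
The string-to-date conversion used in both imports can also be tried without a network connection. The following is a minimal sketch on a hand-typed dataset: the three rows and the variable name `pointfigure` are made up for illustration, and the `"2015M01"` date format is an assumption based on the `str7` month strings described above.

```stata
clear
// Hypothetical data: three months of an illustrative index series
input str7 month double pointfigure
"2015M01" 100.0
"2015M02" 100.4
"2015M03" 100.9
end

// Same steps as in the imports above: string -> Stata monthly date
generate temp = monthly(month, "YM")
drop month
rename temp month
label variable month "Month"
format month %tm

list
line pointfigure month
```

If the conversion fails, `temp` is filled with missing values (`.`), so a quick `list` after `generate` is a cheap way to check that the mask `"YM"` matches the strings.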