Suppose we assume that the data comes from the following misspecified indirect utility
\[\begin{align*} u_{ijt} = \alpha p_{jt} + \beta^{(1)}x_{jt} + \beta^{(2)}l_j + \beta^{(3)} roof_{j} + \xi_{jt} + \epsilon_{ijt} \end{align*}\]
Denote the mean utility \[\delta_{jt} = \alpha p_{jt} + \beta^{(1)}x_{jt} + \beta^{(2)}l_j + \beta^{(3)} roof_{j} + \xi_{jt} \] Then we have
\[\begin{align*} u_{ijt} = \delta_{jt} + \epsilon_{ijt} \end{align*}\]
From lecture 2 slide 14, the probability of consumer i choosing product j in market t is
\[\begin{align*} Pr(y_{ijt} = 1) &= Pr(u_{ijt} \ge u_{ikt}, \; \forall k, \; j \neq k)\\ &= Pr(\epsilon_{ikt}-\epsilon_{ijt} \leq \delta_{jt} - \delta_{kt}, \; \forall k, \; j \neq k) \end{align*}\]
We’re assuming \(\epsilon\) is distributed type 1 extreme value. Then the probability becomes
\[\begin{align*} P(y_{ijt}=1) = P_{ijt} = \frac{\text{exp}(\delta_{jt})}{1 + \sum_{k=1}^J \text{exp}(\delta_{kt})} \end{align*}\]
For details of how to derive this, see Train (2009) Chapter 3. Note that this is the same for all consumers. Then the empirical counterpart of the choice probability \(P_{ijt}\) is the market share \(s_{jt}\)
\[\begin{align*} s_{jt} = \frac{\text{exp}(\delta_{jt})}{\sum_{k=0}^J \text{exp}(\delta_{kt})} = \frac{\text{exp}(\delta_{jt})}{1 + \sum_{k=1}^J \text{exp}(\delta_{kt})} \end{align*}\]
Since the mean utility of the outside good is normalized to zero (\(\delta_{0t} = 0\)), its market share is
\[\begin{align*} s_{0t} = \frac{exp(0)}{\sum_{k=0}^J \text{exp}(\delta_{kt})} = \frac{1}{1 + \sum_{k=1}^J \text{exp}(\delta_{kt})} \end{align*}\]
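The two share formulas can be evaluated directly. Below is a minimal sketch in base R; the function name `logit_shares` and the mean utilities are hypothetical and chosen only for illustration.

```r
# Logit market shares from a vector of mean utilities (inside goods only).
# The outside good's mean utility is normalized to zero, so exp(0) = 1.
logit_shares <- function(delta) {
  denom <- 1 + sum(exp(delta))
  list(inside = exp(delta) / denom, outside = 1 / denom)
}

# Hypothetical mean utilities for J = 4 inside goods
delta_example <- c(1.0, -0.5, -3.6, 0.2)
shares_example <- logit_shares(delta_example)
shares_example$inside
shares_example$outside

# Inside and outside shares add up to one
sum(shares_example$inside) + shares_example$outside
```

The outside share is always strictly positive here, which is what makes the log share ratio well defined.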
Take natural logarithms of the choice probabilities
\[\begin{align*} \text{ln}(s_{jt}) &= \text{ln}(\text{exp}(\delta_{jt})) - \text{ln}(\sum_{k=0}^J \text{exp}(\delta_{kt})) = \delta_{jt} - \text{ln}(\sum_{k=0}^J \text{exp}(\delta_{kt})) \\ \text{ln}(s_{0t}) &= \text{ln}(1) -\text{ln}(\sum_{k=0}^{J} \text{exp}(\delta_{kt})) = -\text{ln}(\sum_{k=0}^{J} \text{exp}(\delta_{kt}))\\ \end{align*}\]
Now, subtract the log of the outside good’s market share from the log of product j’s market share. Note that the number of inside goods is 4, so we can substitute \(J=4\)
\[\begin{align*} \text{ln}(s_{jt}) - \text{ln}(s_{0t}) &= \delta_{jt} - \text{ln}(\sum_{k=0}^4 exp(\delta_{kt})) -[- \text{ln}(\sum_{k=0}^{4} exp(\delta_{kt}))]\\ \text{ln}(s_{jt}/s_{0t}) &= \delta_{jt} =\alpha p_{jt} + \beta^{(1)}x_{jt} + \beta^{(2)}l_j + \beta^{(3)} roof_{j} + \xi_{jt} \end{align*}\]
The purpose of deriving this equation is that now we have a nice linear expression that we can estimate using OLS and standard linear IV methods. Moreover, we only need to observe market level data.
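As a quick sanity check on the inversion, the sketch below takes the four inside shares of market 1 as printed in the data preview, computes \(\text{ln}(s_{jt}/s_{0t})\), and then plugs the resulting mean utilities back into the share formula. The variable names are hypothetical; only base R is used.

```r
# Inside shares of market 1, copied from the printed data preview
s_inside <- c(0.498415102, 0.102175147, 0.004850799, 0.216493489)
s_outside <- 1 - sum(s_inside)

# Inversion: the log share ratio equals the mean utility delta_jt
delta_hat <- log(s_inside / s_outside)
delta_hat

# Plugging delta_hat back into the logit formula returns the observed shares
s_implied <- exp(delta_hat) / (1 + sum(exp(delta_hat)))
all.equal(s_implied, s_inside)
```

The inversion therefore needs nothing beyond market-level shares, which is the point made above.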
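Note that the Excel file has the market shares of the inside goods (taking into account the outside good) but not the market share of the outside good. Thus, in order to run the regression on the estimation equation derived in 4.1. b), we must first compute the outside good’s market share in each market and then construct the dependent variable \(\text{ln}(s_{jt}/s_{0t})\).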
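# Packages: readxl (read_xlsx), data.table, magrittr (%>%), fixest (feols, etable), ggplot2.
library(readxl)
library(data.table)
library(magrittr)
library(fixest)
library(ggplot2)
# Loads the data and converts it into a data.table object using the pipe %>%,
# which passes an object on to the next function.
# data.table is an enhanced version of data.frame.
boat_dt <-
read_xlsx("boat_data.xlsx") %>%
data.table(.)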
head(boat_dt)
## prices shares length quality cost_shifter roof firm_ids
## <num> <num> <num> <num> <num> <num> <num>
## 1: 5.858942 0.498415102 9.336710 1.2291334 0.8022839 1 1
## 2: 4.600115 0.102175147 7.092352 0.7524592 1.5204211 1 2
## 3: 1.000218 0.004850799 6.159682 0.3415478 0.3349984 0 3
## 4: 3.233179 0.216493489 5.780847 0.7570901 0.7599571 0 4
## 5: 3.827983 0.354432180 9.336710 0.5739109 0.1647224 1 1
## 6: 2.976261 0.136502418 7.092352 0.8794283 0.7626693 1 2
## market_ids
## <num>
## 1: 1
## 2: 1
## 3: 1
## 4: 1
## 5: 2
## 6: 2
Calculate the outside good’s market share for each market and create the dependent variable.
# Outside good market share = 1 - sum of the inside goods' market shares within
# the market.
boat_dt[, outside_good_ms := 1 - sum(shares), by = market_ids]
# Dependent variable = ln(market share/outside good market share)
# note that in R, log is the natural logarithm
boat_dt[, ln_sj_s0 := log(shares / outside_good_ms)]
head(boat_dt[, .(shares, outside_good_ms, market_ids, ln_sj_s0)])
## shares outside_good_ms market_ids ln_sj_s0
## <num> <num> <num> <num>
## 1: 0.498415102 0.1780655 1 1.029282011
## 2: 0.102175147 0.1780655 1 -0.555462791
## 3: 0.004850799 0.1780655 1 -3.603007839
## 4: 0.216493489 0.1780655 1 0.195409217
## 5: 0.354432180 0.3555943 2 -0.003273548
## 6: 0.136502418 0.3555943 2 -0.957448236
Estimate the logit model. Note that there is no constant in the estimation equation!
# You might as well use lm. I use feols to get a clean output using etable
# Note that R includes an intercept by default, so we have to include -1 in the equation to run the regression without a constant.
logit <-
feols(ln_sj_s0 ~ -1 + quality + prices + length + roof, data = boat_dt)
etable(logit)
## logit
## Dependent Var.: ln_sj_s0
##
## quality 1.092*** (0.0329)
## prices -0.1248*** (0.0254)
## length -0.2148*** (0.0120)
## roof 1.758*** (0.0526)
## _______________ ___________________
## S.E. type IID
## Observations 4,000
## R2 0.32147
## Adj. R2 0.32096
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that all the estimates are badly biased. The true parameter values are quality = 2, prices = -2, length = 0.5 and (mean of) roof = 4. The estimated price coefficient (-0.125) is far too close to zero, and the length coefficient even has the wrong sign.
The endogeneity problem arises because part of product quality, \(\xi_{jt}\), is unobserved by the econometrician. The higher the unobserved quality, the higher the price consumers are willing to pay for the good, all else equal. Price is therefore positively correlated with \(\xi_{jt}\), which is the error term of the estimation equation, so prices are endogenous and OLS biases the price coefficient upward (here, toward zero).
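The direction of the bias can be illustrated with a small simulation. This is a sketch on made-up data, not the boat data: all parameter values and variable names are hypothetical, and the 2SLS estimate is computed by hand with two `lm` calls so that the example runs in base R.

```r
set.seed(1)
n <- 5000
alpha_true <- -2                     # hypothetical price coefficient
cost <- rnorm(n)                     # excluded cost shifter (the instrument)
xi <- rnorm(n)                       # quality unobserved by the econometrician
price <- 3 + cost + 0.5 * xi + rnorm(n, sd = 0.5)  # price rises with xi
y <- alpha_true * price + xi         # plays the role of ln(s_j / s_0)

# OLS ignores the correlation between price and xi
alpha_ols <- unname(coef(lm(y ~ price))["price"])

# Manual 2SLS: regress price on the instrument, then y on the fitted prices.
# Only the point estimate is valid; the second-stage standard errors are not.
price_hat <- fitted(lm(price ~ cost))
alpha_iv <- unname(coef(lm(y ~ price_hat))["price_hat"])

c(ols = alpha_ols, iv = alpha_iv)
```

In this design the OLS estimate is pulled toward zero, while the instrumented estimate stays close to the value used to generate the data.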
A firm’s own cost shifter and the observed quality of the other firms in the market are useful instruments for price.
Generate instruments from competitors’ qualities and cost shifters:
# Sum of competitors' qualities
boat_dt[, sum_obs_quality_iv := sum(quality) - quality, by = market_ids]
# Sum of competitors' cost shifters
boat_dt[, sum_cost_shifter_iv := sum(cost_shifter) - cost_shifter, by = market_ids]
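The `sum(x) - x` pattern above is a leave-one-out sum within each market. A minimal base R sketch on a made-up toy table shows what it produces; `toy` and `sum_others` are hypothetical names, and `ave()` is used in place of data.table grouping so that the example runs without extra packages.

```r
toy <- data.frame(
  market_ids = c(1, 1, 1, 2, 2, 2),
  quality    = c(1.0, 2.0, 3.0, 0.5, 0.5, 4.0)
)

# ave() returns the group sum aligned with the original rows;
# subtracting the own value leaves the sum over competitors only
toy$sum_others <- ave(toy$quality, toy$market_ids, FUN = sum) - toy$quality
toy
```

Each row ends up with the total quality of its competitors in the same market, excluding its own quality.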
Estimate 2SLS:
#Use all instruments
logit_2sls <-
feols(ln_sj_s0 ~ -1 + quality + length + roof |
prices ~ cost_shifter + sum_obs_quality_iv + sum_cost_shifter_iv,
data = boat_dt)
#Use only own cost shifter and sum of the competitors' observed qualities
logit_2sls_2<-
feols(ln_sj_s0 ~ -1 + quality + length + roof |
prices ~ cost_shifter + sum_obs_quality_iv,
data = boat_dt)
etable(logit, logit_2sls, logit_2sls_2)
## logit logit_2sls logit_2sls_2
## Dependent Var.: ln_sj_s0 ln_sj_s0 ln_sj_s0
##
## quality 1.092*** (0.0329) 1.616*** (0.0483) 1.614*** (0.0483)
## prices -0.1248*** (0.0254) -1.712*** (0.0559) -1.707*** (0.0563)
## length -0.2148*** (0.0120) 0.4306*** (0.0243) 0.4287*** (0.0245)
## roof 1.758*** (0.0526) 3.111*** (0.0825) 3.107*** (0.0827)
## _______________ ___________________ __________________ __________________
## S.E. type IID IID IID
## Observations 4,000 4,000 4,000
## R2 0.32147 -0.33965 -0.33572
## Adj. R2 0.32096 -0.34065 -0.33672
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All estimates move toward the true parameter values but still fall short of them: the price coefficient, for example, is about -1.71 against a true value of -2.
Price elasticity can be expressed using the derivative of market share w.r.t. price
\[\eta_{jj}= \frac{\Delta s_{jt}/s_{jt}}{\Delta p_{jt}/p_{jt}}= \frac{\Delta s_{jt}}{\Delta p_{jt}}\frac{p_{jt}}{s_{jt}}= \frac{\partial s_{jt}}{\partial p_{jt}}\frac{p_{jt}}{s_{jt}}\]
Let us derive the formula for the price elasticity. First substitute
\[ s_{jt} = \frac{\text{exp}(\delta_{jt})}{1 + \sum_{k=1}^J \text{exp}(\delta_{kt})} \] We get
\[\begin{align*} \eta_{jj}&=\frac{\partial s_{jt}}{\partial p_{jt}}\frac{p_{jt}}{s_{jt}} =\frac{\partial \left( \frac{\text{exp}(\delta_{jt})}{1 + \sum_{k=1}^J \text{exp}(\delta_{kt})}\right)}{\partial p_{jt}}\frac{p_{jt}}{s_{jt}} \\ &=\frac{\frac{\partial \text{exp}(\delta_{jt})}{\partial p_{jt}} (1 +\sum_{k=1}^J \text{exp}(\delta_{kt})) - \text{exp}(\delta_{jt}) \frac{\partial}{\partial p_{jt}}(1 +\sum_{k=1}^J \text{exp}(\delta_{kt}))}{(1 + \sum_{k=1}^J \text{exp}(\delta_{kt}))^2} \frac{p_{jt}}{s_{jt}} \end{align*}\]
Substitute \(\frac{\partial \text{exp}(\delta_{jt})}{\partial p_{jt}} = \alpha \cdot \text{exp}(\delta_{jt})\) and \(\frac{\partial}{\partial p_{jt}}(1 +\sum_{k=1}^J \text{exp}(\delta_{kt}))= \alpha \cdot \text{exp}(\delta_{jt})\), since the derivatives of the other products’ mean utilities w.r.t. \(p_{jt}\) are zero
\[\begin{align*} \eta_{jj}&= \frac{\alpha \cdot\text{exp}(\delta_{jt}) (1 +\sum_{k=1}^J \text{exp}(\delta_{kt})) - \text{exp}(\delta_{jt}) \cdot \alpha \cdot \text{exp}(\delta_{jt})}{(1 + \sum_{k=1}^J \text{exp}(\delta_{kt}))^2} \frac{p_{jt}}{s_{jt}}\\ &= \left[ \alpha \frac{\text{exp}(\delta_{jt}) }{1 + \sum_{k=1}^J \text{exp}(\delta_{kt})} - \alpha \left( \frac{\text{exp}(\delta_{jt})}{1 + \sum_{k=1}^J \text{exp}(\delta_{kt})}\right)^2 \right]\frac{p_{jt}}{s_{jt}}\\ &= \left( \alpha s_{jt} - \alpha s_{jt}^2 \right) \frac{p_{jt}}{s_{jt}}\\ &= \alpha(1-s_{jt})s_{jt} \frac{p_{jt}}{s_{jt}} \\ &= \alpha(1-s_{jt}) p_{jt} \end{align*}\]
The elasticity only depends on firm \(j\)’s price and market share as well as \(\alpha\). I use the estimate for alpha from logit_2sls_2 specification to calculate this.
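The closed form can be checked against a numerical derivative. The sketch below uses hypothetical prices, a hypothetical price coefficient and a hypothetical non-price part of the mean utility; `share_j` is an illustrative helper, not a function used elsewhere in this solution.

```r
# Logit share of product j given prices, with delta = alpha * p + other
share_j <- function(p, alpha, other, j) {
  delta <- alpha * p + other
  exp(delta[j]) / (1 + sum(exp(delta)))
}

alpha_ex <- -1.7                     # hypothetical price coefficient
p_ex <- c(5.9, 4.6, 1.0, 3.2)        # hypothetical prices
other_ex <- c(9, 6, 0.5, 4.5)        # hypothetical non-price part of delta
j <- 1

s_j <- share_j(p_ex, alpha_ex, other_ex, j)
eta_formula <- alpha_ex * (1 - s_j) * p_ex[j]

# Central finite difference of s_j with respect to p_j
h <- 1e-6
p_up <- p_ex; p_up[j] <- p_up[j] + h
p_dn <- p_ex; p_dn[j] <- p_dn[j] - h
ds_dp <- (share_j(p_up, alpha_ex, other_ex, j) -
          share_j(p_dn, alpha_ex, other_ex, j)) / (2 * h)
eta_numeric <- ds_dp * p_ex[j] / s_j

c(formula = eta_formula, numeric = eta_numeric)
```

The two numbers agree up to the finite-difference error, which is a cheap way to catch sign or algebra slips in the derivation.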
alpha <- logit_2sls_2$coefficients["fit_prices"]
# calculates price elasticity
boat_dt[, elasticity_jt := alpha * (1 - shares) * prices]
# plots
logit_plot <-
ggplot(data = boat_dt,
aes(
x = elasticity_jt,
group = as.factor(firm_ids),
fill = as.factor(firm_ids)
)) +
geom_density(alpha = 0.4) +
ggtitle("Logit Price Elasticities Across Markets")
logit_plot
Notice that the distributions of the price elasticities are very similar for firms 1 and 2 (the boats with a roof) and for firms 3 and 4 (the boats with no roof).
The average elasticities by firm are
boat_dt[, .("Average elasticity" = mean(elasticity_jt)), by = firm_ids]
## firm_ids Average elasticity
## <num> <num>
## 1: 1 -5.600759
## 2: 2 -5.289546
## 3: 3 -4.072071
## 4: 4 -3.934288
Conlon and Mortimer (2021) define the diversion ratio as follows: “As the price of j increases, some consumers leave product j, and a subset of these consumers switch to a substitute product k. The diversion ratio, \(D_{jk}\) , is defined as the ratio of the switchers to the leavers.”
Note that the diversion ratio from product j to product k only depends on the market shares of those two products.
\[\begin{align*} D_{jk} = \frac{s_{kt}}{1 - s_{jt}} \end{align*}\]
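The formula follows from the definition once the share derivatives are written out. The following is a short sketch under the logit assumptions above; the cross-price derivative \(\partial s_{kt}/\partial p_{jt} = -\alpha s_{jt}s_{kt}\) is not derived in the text and comes from applying the same quotient rule that was used for the own-price derivative:
\[\begin{align*} D_{jk} = -\frac{\partial s_{kt}/\partial p_{jt}}{\partial s_{jt}/\partial p_{jt}} = \frac{\alpha s_{jt}s_{kt}}{\alpha s_{jt}(1-s_{jt})} = \frac{s_{kt}}{1-s_{jt}} \end{align*}\]
The price coefficient \(\alpha\) cancels, which is why no estimate is needed to compute the logit diversion ratios.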
logit_div_13 <-
boat_dt[firm_ids == 3, shares] / boat_dt[firm_ids == 1, 1 - shares]
logit_div_31 <-
boat_dt[firm_ids == 1, shares] / boat_dt[firm_ids == 3, 1 - shares]
data.table("D_13" = mean(logit_div_13),
"D_31" = mean(logit_div_31))
## D_13 D_31
## <num> <num>
## 1: 0.1921416 0.3445739
These substitution patterns are unrealistic because the logit model does not take into account that firm 1’s boat has a roof while firm 3’s does not. We are assuming that IIA holds, which is not the case in the true model. With random coefficients, the ratio of the choice probabilities of two goods depends on the whole market, including the attributes of all other goods, so IIA does not hold. As a result, the logit model gives incorrect substitution patterns.
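The IIA property of the plain logit is easy to see numerically. In the sketch below (hypothetical mean utilities, illustrative helper `inside_shares`), product 2 becomes more attractive, yet the ratio of the shares of products 1 and 3 does not move.

```r
inside_shares <- function(delta) exp(delta) / (1 + sum(exp(delta)))

delta_base <- c(1.0, 0.4, -0.8, -0.3)    # hypothetical mean utilities
delta_shock <- delta_base
delta_shock[2] <- delta_shock[2] + 1.5   # product 2 becomes more attractive

s_base <- inside_shares(delta_base)
s_shock <- inside_shares(delta_shock)

# Ratio s_1 / s_3 before and after the change to product 2
c(before = s_base[1] / s_base[3], after = s_shock[1] / s_shock[3])

# Products 1 and 3 lose the same proportion of their share
c(product_1 = s_shock[1] / s_base[1], product_3 = s_shock[3] / s_base[3])
```

Product 2 draws share from products 1 and 3 in proportion to their existing shares, regardless of how similar they are to it.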
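Our estimation equation is \[\begin{align*} \text{ln}(s_{jt}) - \text{ln}(s_{0t}) &= \delta_{jt} + \sigma \text{ln}(s_{jt/g})\\ \text{ln}(s_{jt}/s_{0t}) &= \alpha p_{jt} + \beta^{(1)}x_{jt} + \beta^{(2)}l_j + \beta^{(3)} roof_{j} + \sigma \text{ln}(s_{jt/g}) + \xi_{jt} \end{align*}\] where \(s_{jt/g}\) is product j’s market share within its nest g and \(\sigma\) is the nesting parameter.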
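The relation between the log share ratio, the mean utility and the within-nest share can be verified on simulated shares. The sketch below assumes the usual nested logit parametrization with a single nesting parameter; the function `nested_logit_shares` and all numerical values are hypothetical.

```r
# Nested logit shares with the outside good normalized to zero
nested_logit_shares <- function(delta, nest, sigma) {
  e <- exp(delta / (1 - sigma))
  D_g <- tapply(e, nest, sum)                  # sum of e within each nest
  D_j <- as.numeric(D_g[as.character(nest)])   # nest sum aligned with products
  s_within <- e / D_j                          # within-nest share
  denom <- 1 + sum(D_g^(1 - sigma))
  s_nest <- D_j^(1 - sigma) / denom            # share of the product's nest
  list(share = s_within * s_nest, within = s_within, outside = 1 / denom)
}

delta_ex <- c(1.0, 0.4, -0.8, -0.3)   # hypothetical mean utilities
nest_ex <- c(1, 1, 2, 2)              # products 1-2 and 3-4 share a nest
sigma_ex <- 0.25                      # hypothetical nesting parameter

sh <- nested_logit_shares(delta_ex, nest_ex, sigma_ex)
lhs <- log(sh$share / sh$outside)
rhs <- delta_ex + sigma_ex * log(sh$within)
all.equal(lhs, rhs)
```

Because the within-nest share is itself a function of \(\xi_{jt}\), it is endogenous in the regression and needs its own instrument.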
There are multiple options for instruments; the candidates are constructed in the code below.
The within-nest market share is instrumented with the within-nest competitor’s observed quality.
boat_dt[, group := fifelse(firm_ids %in% c(1, 2), 1, 2)]
head(boat_dt[, .(firm_ids, group)])
## firm_ids group
## <num> <num>
## 1: 1 1
## 2: 2 1
## 3: 3 2
## 4: 4 2
## 5: 1 1
## 6: 2 1
boat_dt[, within_nest_ms := shares / sum(shares), by = .(group, market_ids)]
# Generate own nest's competitor's cost shifter IV
boat_dt[,
own_n_comp_costs := sum(cost_shifter) - cost_shifter,
by = .(market_ids, group)]
# Generate own nest's summed cost shifter, is used below
boat_dt[,
n_cost_shifter := sum(cost_shifter),
by = .(group, market_ids)]
# Generate other group's competitors' summed cost shifter IV
other_groups_cost_shift <- boat_dt[, .(n_cost_shifter, group, market_ids)]
other_groups_cost_shift[, merge_group := fifelse(group == 1, 2, 1)]
other_groups_cost_shift[, group := NULL]
other_groups_cost_shift[, group := merge_group] #switch groups
other_groups_cost_shift[, other_n_cost_s := n_cost_shifter]
boat_dt[other_groups_cost_shift, on = c("group", "market_ids"),
other_n_cost_s := i.other_n_cost_s] # i. prefix: take the column from the joined table
# Within nest competitor's observed quality IV for within nest market share
boat_dt[,
own_n_obs_quality := sum(quality) - quality,
by = .(group, market_ids)]
# Sum of observed quality within nest, is used below
boat_dt[,
n_obs_quality := sum(quality),
by = .(group, market_ids)]
# Generate other group's competitors' summed observed quality IV
other_groups_quality <- boat_dt[, .(n_obs_quality, group, market_ids)]
other_groups_quality[, merge_group := fifelse(group == 1, 2, 1)]
other_groups_quality[, group := NULL]
other_groups_quality[, group := merge_group] #switch groups
other_groups_quality[, other_n_qual := n_obs_quality]
boat_dt[other_groups_quality, on = c("group", "market_ids"),
other_n_qual := i.other_n_qual]
#Remember our instruments are:
# 1. own cost shifter
# 2. within nest competitor's cost shifter
# 3. other nest's competitors' summed cost shifters
# 4. within nest competitor's observed quality
# 5. other nest's competitors' summed quality.
#Riku used 1. 2. and 4.
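With only two nests, the "other nest" sums built above by switching group labels and joining can be cross-checked with a simpler identity: the market total minus the own-nest total. A base R sketch on a made-up toy table (all names and values hypothetical):

```r
toy <- data.frame(
  market_ids   = c(1, 1, 1, 1, 2, 2, 2, 2),
  group        = c(1, 1, 2, 2, 1, 1, 2, 2),
  cost_shifter = c(1, 2, 3, 4, 0.5, 0.5, 1, 2)
)

# Sum over the whole market and over the product's own nest, row-aligned
market_sum <- ave(toy$cost_shifter, toy$market_ids, FUN = sum)
nest_sum <- ave(toy$cost_shifter, toy$market_ids, toy$group, FUN = sum)

# What remains is the summed cost shifter of the other nest
toy$other_n_cost_s <- market_sum - nest_sum
toy
```

The same subtraction would work for the observed-quality instrument; with more than two nests it would give the sum over all other nests combined.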
Estimate specifications with and without IVs. Tuomas found that the model works better if he didn’t use other firms’ cost shifters as IVs.
#with roof in mean quality
nl<-
feols(ln_sj_s0 ~ -1 + quality + length + roof + prices + log(within_nest_ms),
data = boat_dt)
#without roof in mean quality
nl_nr<-
feols(ln_sj_s0 ~ -1 + quality + length + prices + log(within_nest_ms),
data = boat_dt)
#instrument using own cost shifter,
# observed quality of within nest competitor and
# observed quality of other nest's competitors
nl_2sls<-
feols(ln_sj_s0 ~ -1 + quality + length + roof |
prices + log(within_nest_ms) ~ cost_shifter + own_n_obs_quality +
other_n_qual,
data = boat_dt)
#same without roof in mean quality
nl_2sls_nr<-
feols(ln_sj_s0 ~ -1 + quality + length |
prices + log(within_nest_ms) ~ cost_shifter + own_n_obs_quality +
other_n_qual,
data = boat_dt)
#instrument using all generated instruments:
# 1. own cost shifter
# 2. within nest competitor's cost shifter
# 3. other nest's competitors' summed cost shifters
# 4. within nest competitor's observed quality
# 5. other nest's competitors' summed quality.
nl_2sls_all <-
feols(ln_sj_s0 ~ -1 + quality + length + roof |
prices + log(within_nest_ms) ~ cost_shifter
+ own_n_comp_costs
+ own_n_obs_quality
+ other_n_cost_s
+ other_n_qual,
data = boat_dt)
#Same without roof
nl_2sls_all_nr <-
feols(ln_sj_s0 ~ -1 + quality + length |
prices + log(within_nest_ms) ~ cost_shifter
+ own_n_comp_costs
+ own_n_obs_quality
+ other_n_cost_s
+ other_n_qual,
data = boat_dt)
Below are results without roof in mean utility compared with the logit results. The results are very similar whether we use the three instruments (3rd column) or the full set of instruments (last column).
etable(logit_2sls, nl_nr, nl_2sls_nr, nl_2sls_all_nr)
## logit_2sls nl_nr nl_2sls_nr
## Dependent Var.: ln_sj_s0 ln_sj_s0 ln_sj_s0
##
## prices -1.712*** (0.0559) 0.1313*** (0.0185) -1.289*** (0.0689)
## quality 1.616*** (0.0483) 0.4716*** (0.0259) 0.9330*** (0.0507)
## length 0.4306*** (0.0243) -0.0083 (0.0101) 0.6249*** (0.0300)
## roof 3.111*** (0.0825)
## log(within_nest_ms) 0.9727*** (0.0152) 0.6690*** (0.0450)
## ___________________ __________________ __________________ __________________
## S.E. type IID IID IID
## Observations 4,000 4,000 4,000
## R2 -0.33965 0.57183 -0.13569
## Adj. R2 -0.34065 0.57150 -0.13654
##
## nl_2sls_all_nr
## Dependent Var.: ln_sj_s0
##
## prices -1.285*** (0.0636)
## quality 0.9621*** (0.0489)
## length 0.6091*** (0.0286)
## roof
## log(within_nest_ms) 0.5912*** (0.0396)
## ___________________ __________________
## S.E. type IID
## Observations 4,000
## R2 -0.16355
## Adj. R2 -0.16442
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Below are results with roof in mean utility compared with the logit results. Remember the true parameter values are quality = 2, prices = -2, length = 0.5 and (mean of) roof = 4. We see that the IV estimates of the nesting parameter are between 0 and 1 (0.24 and 0.30), suggesting that products within a nest are closer substitutes for each other than products in different nests.
etable(logit_2sls, nl, nl_2sls, nl_2sls_all)
## logit_2sls nl nl_2sls
## Dependent Var.: ln_sj_s0 ln_sj_s0 ln_sj_s0
##
## prices -1.712*** (0.0559) -0.0799*** (0.0185) -1.442*** (0.0692)
## quality 1.616*** (0.0483) 0.6804*** (0.0248) 1.415*** (0.0582)
## length 0.4306*** (0.0243) -0.0259** (0.0093) 0.3791*** (0.0222)
## roof 3.111*** (0.0825) 1.115*** (0.0397) 2.710*** (0.1066)
## log(within_nest_ms) 0.8631*** (0.0144) 0.2430*** (0.0530)
## ___________________ __________________ ___________________ __________________
## S.E. type IID IID IID
## Observations 4,000 4,000 4,000
## R2 -0.33965 0.64239 0.01259
## Adj. R2 -0.34065 0.64204 0.01160
##
## nl_2sls_all
## Dependent Var.: ln_sj_s0
##
## prices -1.382*** (0.0546)
## quality 1.370*** (0.0488)
## length 0.3682*** (0.0200)
## roof 2.621*** (0.0864)
## log(within_nest_ms) 0.2984*** (0.0392)
## ___________________ __________________
## S.E. type IID
## Observations 4,000
## R2 0.07960
## Adj. R2 0.07868
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I use estimates from the nl_2sls (with roof in the mean utility and only three instruments).
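Written out in the notation above, the own-price elasticity that the code below computes is the following. The text does not state it, so it is given here as a sketch of the standard nested logit expression, with \(s_{jt/g}\) the within-nest share:
\[\begin{align*} \eta_{jj} = \alpha p_{jt}\left(\frac{1}{1-\sigma} - \frac{\sigma}{1-\sigma}s_{jt/g} - s_{jt}\right) \end{align*}\]
Setting \(\sigma = 0\) gives back the logit elasticity \(\alpha(1-s_{jt})p_{jt}\).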
alpha <- nl_2sls$coefficients["fit_prices"]
names(alpha) <- NULL
sigma <- nl_2sls$coefficients["fit_log(within_nest_ms)"]
names(sigma) <- NULL
boat_dt[, n_elasticity_jt := alpha * prices *
((1 / (1 - sigma)) - (sigma / (1 - sigma) * within_nest_ms) - shares)
]
# plots
n_logit_plot <-
ggplot(data = boat_dt,
aes(
x = n_elasticity_jt,
group = as.factor(firm_ids),
fill = as.factor(firm_ids)
)) +
geom_density(alpha = 0.4) +
ggtitle("Nested Logit Price Elasticities Across Markets")
par(mar = c(4, 4, .1, .1))
logit_plot
n_logit_plot
boat_dt[, .("Average elasticity" = mean(n_elasticity_jt)), by = firm_ids]
## firm_ids Average elasticity
## <num> <num>
## 1: 1 -5.695437
## 2: 2 -5.484764
## 3: 3 -4.065844
## 4: 4 -3.933168
These are quite close to the average logit price elasticities: slightly further from 0 for firms 1 and 2 and essentially unchanged for firms 3 and 4.
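# Cross-nest diversion: D_jk = (1 - sigma) * s_k / (1 - sigma * s_{j/g} - (1 - sigma) * s_j),
# which reduces to the logit ratio s_k / (1 - s_j) when sigma = 0.
D_13_n <-
boat_dt[firm_ids == 3, shares * (1 - sigma)] /
boat_dt[firm_ids == 1, 1 - sigma * within_nest_ms - (1 - sigma) * shares]
D_31_n <-
boat_dt[firm_ids == 1, shares * (1 - sigma)] /
boat_dt[firm_ids == 3, 1 - sigma * within_nest_ms - (1 - sigma) * shares]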
data.table("D_13" = mean(logit_div_13),
"D_31" = mean(logit_div_31),
"D_13_n" = mean(D_13_n),
"D_31_n" = mean(D_31_n))
## D_13 D_31 D_13_n D_31_n
## <num> <num> <num> <num>
## 1: 0.1921416 0.3445739 0.09510307 0.2321172
The nested logit diversion ratios are smaller. This makes sense since products 1 and 3 are in different nests and the nested logit model is able to capture the fact that there is less substitution between products with roof and without roof. Therefore, we expect the nested logit diversion ratios to be closer to the true substitution patterns.
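A closed form for the cross-nest diversion ratio can be checked against finite differences of simulated shares. The sketch below assumes the usual nested logit parametrization; the helper `nl_shares`, the parameter values and the prices are all hypothetical, and the closed form is derived from the share derivatives rather than taken from the text.

```r
# Product shares in a nested logit, outside good normalized to zero
nl_shares <- function(delta, nest, sigma) {
  e <- exp(delta / (1 - sigma))
  D_j <- ave(e, nest, FUN = sum)             # nest sum aligned with products
  D_g <- tapply(e, nest, sum)                # one sum per nest
  (e / D_j) * D_j^(1 - sigma) / (1 + sum(D_g^(1 - sigma)))
}

alpha_ex <- -1.4; sigma_ex <- 0.25           # hypothetical parameters
p_ex <- c(5.9, 4.6, 1.0, 3.2)                # hypothetical prices
other_ex <- c(8, 6, 0.5, 4)                  # hypothetical non-price utility
nest_ex <- c(1, 1, 2, 2)
shares_at <- function(p) nl_shares(alpha_ex * p + other_ex, nest_ex, sigma_ex)

s <- shares_at(p_ex)
s_within_1 <- s[1] / sum(s[nest_ex == 1])

# Closed form for diversion from product 1 to product 3 (different nests)
D_13_formula <- (1 - sigma_ex) * s[3] /
  (1 - sigma_ex * s_within_1 - (1 - sigma_ex) * s[1])

# Finite differences: switchers to product 3 over leavers from product 1
h <- 1e-6
p_up <- p_ex; p_up[1] <- p_up[1] + h
ds <- (shares_at(p_up) - s) / h
D_13_numeric <- -ds[3] / ds[1]

c(formula = D_13_formula, numeric = D_13_numeric, logit = s[3] / (1 - s[1]))
```

In this example the cross-nest diversion ratio lies below the logit ratio computed from the same shares, in line with the interpretation above.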
Conlon, C., & Mortimer, J. H. (2021). Empirical properties of diversion ratios. The RAND Journal of Economics, 52(4), 693-726.