4. Estimating misspesified models with market-level data

4.1 Derive the logit estimation equation from the choice probabilities.

Suppose we assume that the data comes from the following misspesified indirect utility

\[\begin{align*} u_{ijt} = \alpha p_{jt} + \beta^{(1)}x_{jt} + \beta^{(2)}l_j + \beta^{(3)} roof_{j} + \xi_{jt} + \epsilon_{ijt} \end{align*}\]

Utility with mean utility \(\delta_{jt}\) is

\[\begin{align*} u_{ijt} = \delta_{jt} + \epsilon_{ijt} \end{align*}\]

From the lecture 2

\[\begin{align*} Pr(y_{ijt} = 1) &= Pr(u_{ijt} \ge u_{ikt}, \; \forall k, \; j \neq k)\\ &= \int I(\epsilon_{ikt}-\epsilon_{ijt} \leq \delta_{jt} - \delta_{kt})f(\epsilon)d\epsilon \end{align*}\]

Assuming \(\epsilon\) is type 1 extreme value we have

\[\begin{align*} P(y_{ijt}=1) = P_{ijt} = \frac{\text{exp}(\delta_{jt})}{1 + \sum_k^J \text{exp}(\delta_{kt})} \end{align*}\]

What is the empirical counterpart of the \(P_{ijt}\)?

\[\begin{align*} s_{jt} = \frac{\text{exp}(\delta_{jt})}{1 + \sum_k^J \text{exp}(\delta_{kt})} \end{align*}\]

And after normalizing the outside good to zero

\[\begin{align*} s_{0t} = \frac{1}{1 + \sum_k^J \text{exp}(\delta_{kt})} \end{align*}\]

Take natural logarithms of the choice probabilities (\(ln(1)=0\))

\[\begin{align*} \text{ln}(s_{jt}) &= \delta_{jt} - \text{ln}(\sum_k^4 exp(\delta_{kt})) \\ \text{ln}(s_{0t}) &= -\text{ln}(\sum_k^{4} exp(\delta_{kt}))\\ \text{ln}(s_{jt}) - \text{ln}(s_{0t}) &= \delta_{jt} - \text{ln}(\sum_k^4 exp(\delta_{kt})) + \text{ln}(\sum_k^{4} exp(\delta_{kt}))\\ \text{ln}(s_{jt}/s_{0t}) &= \delta_{jt} =\alpha p_{jt} + \beta^{(1)}x_{jt} + \beta^{(2)}l_j + \beta^{(3)} roof_{j} + \xi_{jt} \end{align*}\]

We have a nice linear expression that we can estimate using OLS and standard linear IV methods. Moreover, we only need to observe market level data.

4.2 Estimate logit demand without any instruments and explain what is the endogeneity issue at hand.

  • The thing to notice is that we need to divide market shares, \(s_{jt}\) with outside good’s share \(s_{0t}\).
  • Moreover, we want to omit the intercept.
# Loads data and converts it into a data.table object using a pipe %>%.
# %>% passes an object to the next line.
boat_dt <-
  readxl::read_xlsx("../boat_data.xlsx") %>%
  data.table(.)

head(boat_dt)
##      prices      shares   lenght   quality cost_shifter roof firm_ids
## 1: 5.858942 0.498415102 9.336710 1.2291334    0.8022839    1        1
## 2: 4.600115 0.102175147 7.092352 0.7524592    1.5204211    1        2
## 3: 1.000218 0.004850799 6.159682 0.3415478    0.3349984    0        3
## 4: 3.233179 0.216493489 5.780847 0.7570901    0.7599571    0        4
## 5: 3.827983 0.354432180 9.336710 0.5739109    0.1647224    1        1
## 6: 2.976261 0.136502418 7.092352 0.8794283    0.7626693    1        2
##    market_ids
## 1:          1
## 2:          1
## 3:          1
## 4:          1
## 5:          2
## 6:          2

Calculate outside good’s market share and create logged dependent variable.

boat_dt[, outside_good_ms := 1 - sum(shares), by = market_ids]
boat_dt[, ln_sj_s0 := log(shares / outside_good_ms)]

head(boat_dt[, .(outside_good_ms, market_ids, ln_sj_s0)])
##    outside_good_ms market_ids     ln_sj_s0
## 1:       0.1780655          1  1.029282011
## 2:       0.1780655          1 -0.555462791
## 3:       0.1780655          1 -3.603007839
## 4:       0.1780655          1  0.195409217
## 5:       0.3555943          2 -0.003273548
## 6:       0.3555943          2 -0.957448236

Estimate the logit model.

logit <-
  feols(ln_sj_s0 ~ -1 + quality + prices + lenght + roof, data = boat_dt)

etable(logit)
##                               logit
## Dependent Var.:            ln_sj_s0
##                                    
## quality           1.092*** (0.0329)
## prices          -0.1248*** (0.0254)
## lenght          -0.2148*** (0.0120)
## roof              1.758*** (0.0526)
## _______________ ___________________
## S.E. type                       IID
## Observations                  4,000
## R2                          0.32147
## Adj. R2                     0.32096
  • All estimates are badly biased.
  • True model is quality = 2, prices = -2, length = 0.5 and roof = 4.
  • Price is positively correlated with unobservable quality \(\xi_{jt}\), and unobservable quality is positively correlated with market shares.

4.3 Estimate logit using 2SLS. Can you use the observed quality and costs shifter as instruments? Why?

  • Observed quality of other firms at the market and firm’s own cost shifter are useful instruments.
  • cost shifter affects prices through the equilibrium conditions, mc, and it is uncorrelated with unobserved quality by assumption.
    • Here we must think that the cost shifter is something that does not correlate with unobservable quality. An example could be the distance from factory to the store or the price of some neccessary component or labor costs.
  • Other firms’ observed quality affects firm’s own prices, as firm must price lower if competitors sell high quality goods.
  • Other firms’ observed quality is uncorrelated with unobservable quality of my good by assumption.
  • The assumption is motivated the fact that we take the market structure, including goods there, as exogenous. We do not allow firms to adjust their products in response to competitors products. We have a static model.
boat_dt[, sum_obs_quality_iv := sum(quality) - quality, by = market_ids]
boat_dt[, sum_cost_shifter_iv := sum(cost_shifter) - cost_shifter, by = market_ids]
logit_2sls <-
  feols(ln_sj_s0 ~ -1 + quality + lenght + roof | 
        prices ~ sum_obs_quality_iv + cost_shifter + sum_cost_shifter_iv,
      data = boat_dt)

logit_2sls_no_sum_cs <-
  feols(ln_sj_s0 ~ -1 + quality + lenght + roof | 
        prices ~ sum_obs_quality_iv + cost_shifter,
      data = boat_dt)

etable(logit, logit_2sls, logit_2sls_no_sum_cs)
##                               logit         logit_2sls logit_2sls_no_su..
## Dependent Var.:            ln_sj_s0           ln_sj_s0           ln_sj_s0
##                                                                          
## quality           1.092*** (0.0329)  1.616*** (0.0483)  1.614*** (0.0483)
## prices          -0.1248*** (0.0254) -1.712*** (0.0559) -1.707*** (0.0563)
## lenght          -0.2148*** (0.0120) 0.4306*** (0.0243) 0.4287*** (0.0245)
## roof              1.758*** (0.0526)  3.111*** (0.0825)  3.107*** (0.0827)
## _______________ ___________________ __________________ __________________
## S.E. type                       IID                IID                IID
## Observations                  4,000              4,000              4,000
## R2                          0.32147           -0.33965           -0.33572
## Adj. R2                     0.32096           -0.34065           -0.33672
  • All estimates look better, while not good. ** True model is prices = -2, quality = 2, length = 0.5 and roof = 4.
  • Our model is still misspesified as we are omitting the consumer spesific heterogeneity with respect to how much they like boats with roofs.
  • Still this is much better in terms of price and very easy to implement.

4.4 Calculate the price elasticity in each market for each firm.

Let us derive the formula for the price elasticity.

\[\begin{align*} \frac{\partial s_{jt}}{\partial p_{jt}} \frac{p_{jt}}{s_{jt}} &= \frac{\frac{\partial}{\partial p_{jt}} e^{V_{jt}} (1 +\sum_k^4 e^{V_{kt}}) - \frac{\partial}{\partial p_{jt}}(1 +\sum_k^4 e^{V_{kt}}) e^{V_{jt}}}{(1 + \sum_k^4 e^{V_{kt}})^2} \frac{p_{jt}}{s_{jt}} \\ &= \frac{e^{V_{jt}} \frac{\partial V_{jt}}{\partial p_{jt}} (1 +\sum_k^4 e^{V_{kt}}) - e^{2V_{jt}} \frac{\partial V_{jt}}{\partial p_{jt}}}{(1 +\sum_k^4 e^{V_{kt}})^2} \frac{p_{jt}}{s_{jt}} \\ &= \alpha s_{jt} - \alpha s_{jt}^2 = \alpha(1 - s_{jt})s_{jt} \frac{p_{jt}}{s_{jt}} = \alpha(1 - s_{jt})p_{jt} \end{align*}\]

The elasticity only depends on firm \(j\)’s price and market share as well as \(\alpha\). I use estimate for alpha from logit_2sls spesification.

alpha <- logit_2sls$coefficients["fit_prices"]
names(alpha) <- NULL# remove column name

# calcluates price elasticity
boat_dt[, elasticity_jt := alpha * (1 - shares) * prices]

# plots
logit_plot <-
  ggplot(data = boat_dt,
         aes(
           x = elasticity_jt,
           group = as.factor(firm_ids),
           fill = as.factor(firm_ids)
         )) + 
  geom_density(alpha = 0.4) +
  ggtitle("Logit Price Elasticities Across Markets")

logit_plot

boat_dt[, .("Average elasticity" = mean(elasticity_jt)), by = firm_ids]
##    firm_ids Average elasticity
## 1:        1          -5.616266
## 2:        2          -5.304192
## 3:        3          -4.083346
## 4:        4          -3.945181

4.5 Calculate Logit Diversion Ratios

\[\begin{align*} D_{jk} = \frac{s_{kt}}{1 - s_{jt}} \end{align*}\]

Conlon and Mortimer (2021) define the diversion ratio as follows: “Diversion ratio \(D_{jk}\) is the fraction of consumers who leave product \(j\) after a price increase and switch to product \(k\).”

logit_div_13 <- 
  boat_dt[firm_ids == 3, shares] / boat_dt[firm_ids == 1, 1 - shares]

logit_div_31 <- 
  boat_dt[firm_ids == 1, shares] / boat_dt[firm_ids == 3, 1 - shares]

logit_div_12 <- 
  boat_dt[firm_ids == 2, shares] / boat_dt[firm_ids == 1, 1 - shares]

logit_div_43 <- 
  boat_dt[firm_ids == 3, shares] / boat_dt[firm_ids == 4, 1 - shares]

data.table("D_13" = mean(logit_div_13),
           "D_31" = mean(logit_div_31),
           "D_12" = mean(logit_div_12),
           "D_43" = mean(logit_div_43))
##         D_13      D_31      D_12      D_43
## 1: 0.1921416 0.3445739 0.3438125 0.1513089

These substitution patterns are lacking as I do not take into account that firm 3 has roof in it’s boat, while 2 does not.

4.6 Write down the nested logit estimation equation.

Our estimation equation is \[\begin{align*} ln(s_{jtg}) - ln(s_{0t}) &= \delta_{jt} + \sigma ln(\bar{s_{jt/g}}) + \xi_{jt}\\ ln(s_{jtg}/s_{0t}) &= \alpha p_{jt} + \beta^{(1)}x_{jt} + \beta^{(2)}l_j + \beta^{(3)} roof_{j} + \sigma ln(\bar{s_{jt/g}}) + \xi_{jt} \end{align*}\]

  • exogenous variables: observed quality and length.

  • endogenous variables: price and within group market share \(s_{jt/g}\)

  • take the nest structure into account when instrumenting.

  • We instrument price with own cost shifter, within nets competitors cost shifter and other nest’s competitors’ summed cost shifters, as well as withing nest competitor’s observed quality and other nests competitors summed quality.

  • Market share is instrumented with withing nest competitor’s observed quality and other nests competitors’ summed observed quality.

boat_dt[, group := fifelse(firm_ids %in% c(1, 2), 1, 2)]

head(boat_dt[, .(firm_ids, group)])
##    firm_ids group
## 1:        1     1
## 2:        2     1
## 3:        3     2
## 4:        4     2
## 5:        1     1
## 6:        2     1
boat_dt[, within_nest_ms := shares / sum(shares), by = .(group, market_ids)]

# cost shifter IV for within nest market share
boat_dt[,
        own_n_comp_costs := sum(cost_shifter) - cost_shifter,
        by = .(market_ids, group)]

boat_dt[,
        n_cost_shifter := sum(cost_shifter),
        by = .(group, market_ids)]

other_groups_cost_shift <- boat_dt[, .(n_cost_shifter, group, market_ids)]

other_groups_cost_shift[, merge_group :=  fifelse(group == 1, 2, 1)]
other_groups_cost_shift[, group := NULL]
other_groups_cost_shift[, group := merge_group]
other_groups_cost_shift[, other_n_cost_s := n_cost_shifter]

boat_dt[other_groups_cost_shift, on = c("group", "market_ids"),
        other_n_cost_s := other_n_cost_s]

# obs quality shifter IV for within nest market share
boat_dt[,
        own_n_obs_quality := sum(quality) - quality,
        by = .(group, market_ids)]

boat_dt[,
        n_obs_quality := sum(quality),
        by = .(group, market_ids)]

other_groups_quality <- boat_dt[, .(n_obs_quality, group, market_ids)]

other_groups_quality[, merge_group :=  fifelse(group == 1, 2, 1)]
other_groups_quality[, group := NULL]
other_groups_quality[, group := merge_group]
other_groups_quality[, other_n_qual := n_obs_quality]

boat_dt[other_groups_quality, on = c("group", "market_ids"),
        other_n_qual := i.other_n_qual]

head(boat_dt[,
             .(firm_ids,
               within_nest_ms,
               market_ids,
               own_n_obs_quality,
               other_n_qual,
               other_n_cost_s)]
     )
##    firm_ids within_nest_ms market_ids own_n_obs_quality other_n_qual
## 1:        1     0.82987545          1         0.7524592     1.098638
## 2:        2     0.17012455          1         1.2291334     1.098638
## 3:        3     0.02191518          1         0.7570901     1.981593
## 4:        4     0.97808482          1         0.3415478     1.981593
## 5:        1     0.72195397          2         0.8794283     2.171100
## 6:        2     0.27804603          2         0.5739109     2.171100
##    other_n_cost_s
## 1:       1.094955
## 2:       1.094955
## 3:       2.322705
## 4:       2.322705
## 5:       1.483692
## 6:       1.483692

Estimate spesifications with and without IVs. I find that the model works better if I do not use other firms’ cost shifters as IVs.

n_logit<-
  feols(ln_sj_s0 ~ -1 + quality + lenght + roof + prices + log(within_nest_ms),
      data = boat_dt)

n_logit_2sls<-
  feols(ln_sj_s0 ~ -1 + quality + lenght + roof | 
        prices + log(within_nest_ms) ~ cost_shifter + own_n_obs_quality +
          other_n_qual,
      data = boat_dt)

n_logit_2sls_no_r<-
  feols(ln_sj_s0 ~ -1 + quality + lenght | 
        prices + log(within_nest_ms) ~ cost_shifter + own_n_obs_quality +
          other_n_qual,
      data = boat_dt)

n_logit_2sls_all_ivs <-
  feols(ln_sj_s0 ~ -1 + quality + lenght + roof | 
        prices + log(within_nest_ms) ~ cost_shifter + own_n_obs_quality +
          other_n_qual + other_n_qual + own_n_comp_costs + other_n_cost_s,
      data = boat_dt)

n_logit_2sls_all_ivs_no_r <-
  feols(ln_sj_s0 ~ -1 + quality + lenght | 
        prices + log(within_nest_ms) ~ cost_shifter + own_n_obs_quality +
          other_n_qual + other_n_qual + own_n_comp_costs + other_n_cost_s,
      data = boat_dt)

etable(n_logit, n_logit_2sls, n_logit_2sls_no_r, n_logit_2sls_all_ivs, logit_2sls, n_logit_2sls_all_ivs_no_r)
##                                 n_logit       n_logit_2sls  n_logit_2sls_no_r
## Dependent Var.:                ln_sj_s0           ln_sj_s0           ln_sj_s0
##                                                                              
## quality              0.6804*** (0.0248)  1.415*** (0.0582) 0.9330*** (0.0507)
## lenght               -0.0259** (0.0093) 0.3791*** (0.0222) 0.6249*** (0.0300)
## roof                  1.115*** (0.0397)  2.710*** (0.1066)                   
## prices              -0.0799*** (0.0185) -1.442*** (0.0692) -1.289*** (0.0689)
## log(within_nest_ms)  0.8631*** (0.0144) 0.2430*** (0.0530) 0.6690*** (0.0450)
## ___________________ ___________________ __________________ __________________
## S.E. type                           IID                IID                IID
## Observations                      4,000              4,000              4,000
## R2                              0.64239            0.01259           -0.13569
## Adj. R2                         0.64204            0.01160           -0.13654
##                     n_logit_2sls_all..         logit_2sls n_logit_2sls_all..
## Dependent Var.:               ln_sj_s0           ln_sj_s0           ln_sj_s0
##                                                                             
## quality              1.370*** (0.0488)  1.616*** (0.0483) 0.9621*** (0.0489)
## lenght              0.3682*** (0.0200) 0.4306*** (0.0243) 0.6091*** (0.0286)
## roof                 2.621*** (0.0864)  3.111*** (0.0825)                   
## prices              -1.382*** (0.0546) -1.712*** (0.0559) -1.285*** (0.0636)
## log(within_nest_ms) 0.2984*** (0.0392)                    0.5912*** (0.0396)
## ___________________ __________________ __________________ __________________
## S.E. type                          IID                IID                IID
## Observations                     4,000              4,000              4,000
## R2                             0.07960           -0.33965           -0.16355
## Adj. R2                        0.07868           -0.34065           -0.16442

4.7 Calculate elasticities for nested logit.

I use estimates from the n_logit_2sls.

alpha <- n_logit_2sls$coefficients["fit_prices"]
names(alpha) <- NULL

sigma <- n_logit_2sls$coefficients["fit_log(within_nest_ms)"]
names(sigma) <- NULL

boat_dt[, n_elasticity_jt := alpha * prices *
          ((1 / (1 - sigma)) - (sigma / (1 - sigma) * within_nest_ms) - shares)
        ]
# plots
n_logit_plot <-
  ggplot(data = boat_dt,
         aes(
           x = n_elasticity_jt,
           group = as.factor(firm_ids),
           fill = as.factor(firm_ids)
         )) + 
  geom_density(alpha = 0.4) +
  ggtitle("Nested Logit Price Elasticities Across Markets")

par(mar = c(4, 4, .1, .1))
logit_plot
n_logit_plot

boat_dt[, .("Average elasticity" = mean(n_elasticity_jt)), by = firm_ids]
##    firm_ids Average elasticity
## 1:        1          -5.695437
## 2:        2          -5.484764
## 3:        3          -4.065844
## 4:        4          -3.933168
D_13_n <-
  boat_dt[firm_ids == 3, shares * (1 - sigma)] /
  boat_dt[firm_ids == 1, 1 - sigma * within_nest_ms + (1 - sigma) * shares]

D_31_n <-
  boat_dt[firm_ids == 1, shares * (1 - sigma)] /
  boat_dt[firm_ids == 3, 1 - sigma * within_nest_ms + (1 - sigma) * shares]

D_12_n <-
  boat_dt[firm_ids == 2, shares * (1 - sigma)] /
  boat_dt[firm_ids == 1, 1 - sigma * within_nest_ms + (1 - sigma) * shares]


data.table("D_13" = mean(logit_div_13),
           "D_31" = mean(logit_div_31),
           "D_13_n" = mean(D_13_n),
           "D_31_n" = mean(D_31_n))
##         D_13      D_31     D_13_n    D_31_n
## 1: 0.1921416 0.3445739 0.09510307 0.2321172

These are the better diversion ratios as they take into account whether some boats have roofs or not.

References

Conlon, C., & Mortimer, J. H. (2021). Empirical properties of diversion ratios. The RAND Journal of Economics, 52(4), 693-726.

Appendix

2SLS nested logit without roof.

n_logit_estimates <-
  feols(ln_sj_s0 ~ -1 + quality + lenght | 
        prices + log(within_nest_ms) ~ cost_shifter + own_n_obs_quality + other_n_qual + own_n_comp_costs,
      data = boat_dt)

etable(n_logit_estimates)
##                      n_logit_estimates
## Dependent Var.:               ln_sj_s0
##                                       
## prices              -1.485*** (0.0722)
## log(within_nest_ms) 0.4943*** (0.0444)
## quality              1.048*** (0.0543)
## lenght              0.6886*** (0.0323)
## ___________________ __________________
## S.E. type                          IID
## Observations                     4,000
## R2                            -0.40837
## Adj. R2                       -0.40943