Micro Enterprise Exposure and Bank Profitability During the COVID Period

A Causal Assessment Using Difference-in-Differences, PSM and Fixed Effects Models

Objective

The objective of this analysis is to examine whether a higher exposure to micro-sized firms within the corporate loan portfolio affects bank profitability, focusing in particular on the period surrounding the COVID-19 shock.

Lending to very small firms is often considered riskier and more costly for banks due to higher monitoring costs, weaker collateral, and greater default risk compared to lending to larger firms. At the same time, micro-enterprises play an important role in economic development, especially in emerging economies.

This project therefore investigates whether banks with higher micro exposure (the share of lending to micro-sized firms within the total corporate loan portfolio) experienced a different change in return on assets (ROA) during the COVID period than banks with lower micro exposure.

Data

The analysis uses bank-level panel data for Brazilian financial institutions.
The data originate from the Central Bank of Brazil's IFData database:

https://www3.bcb.gov.br/ifdata/

For convenience, the dataset used in this project was downloaded from the following public repository, which compiles and organizes the IFData bank-level information:

https://github.com/dkgaraujo/brazilianbanks

The dataset contains quarterly financial statement and balance sheet information for Brazilian banks from 2014 Q1 onwards; the analysis uses quarters up to 2022 Q2.

Methodology

To estimate the causal effect of micro enterprise lending exposure on profitability, the analysis combines several empirical methods commonly used in applied econometrics:

  • Difference-in-Differences (DiD) to compare changes in profitability between banks with high and low micro exposure over time.
  • Propensity Score Matching (PSM) to construct a control group of banks with similar observable characteristics.
  • Fixed Effects Panel Regression to control for unobserved bank-specific heterogeneity.

By combining matching with panel regressions, the analysis attempts to reduce selection bias and isolate the impact of micro enterprise lending on bank profitability.

R code, Cell 1
library(tidyverse)
library(fixest)
library(MatchIt)

load("brazilian_banks_201403_onwards.rda")
R code, Cell 2
# ============================================================
# Data preparation function
# ------------------------------------------------------------
# Purpose:
#   Create a clean panel dataset for the corporate lending analysis.
#
# Main steps:
#   1. Construct Micro exposure variables
#   2. Apply basic sample restrictions
#   3. Drop observations with missing values in key variables
#   4. Create transformed balance-sheet and profitability measures
#   5. Define treatment status using micro exposure in a pre-treatment quarter
#   6. Return the final estimation dataset
# ============================================================

make_data <- function(
  pre_q_char,                         # e.g. "2019-12-31"
  data_raw = brazilian_banks_201403_onwards,
  EMP_COL = NULL
) {

  # ----------------------------------------------------------
  # 0. Convert the reference quarter to Date format
  #    and set the last allowed quarter in the sample
  # ----------------------------------------------------------
  pre_q  <- as.Date(pre_q_char)
  last_q <- as.Date("2022-06-30")

  # ----------------------------------------------------------
  # 1. Construct lending variables and apply initial filtering
  # ----------------------------------------------------------
  data <- data_raw %>%
    dplyr::mutate(
      micro = Credit_portfolio_of_micro_sized_borrower,
      small = Credit_portfolio_of_small_sized_borrower,
      med   = Credit_portfolio_of_medium_sized_borrower,
      large = Credit_portfolio_of_large_sized_borrower
    ) %>%
    dplyr::filter(
      !is.na(Legal_Person_Loans_Total),
      Legal_Person_Loans_Total != 0,
      Quarter <= last_q
    ) %>%
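    dplyr::mutate(
      num = dplyr::coalesce(micro, 0),
      # Sum of the four size buckets; kept for reference only and not used below
      denom = dplyr::coalesce(micro, 0) +
                  dplyr::coalesce(small, 0) +
                  dplyr::coalesce(med, 0) +
                  dplyr::coalesce(large, 0),
      # Exposure is measured against total loans to legal persons
      micro_exposure = num / Legal_Person_Loans_Total
    )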

  # ----------------------------------------------------------
  # 2. Keep only observations with non-missing values in the
  #    variables required for the empirical analysis
  # ----------------------------------------------------------
  needed_vars <- c(
    "FinInst",
    "Quarter",
    "Net_Income_qtr",
    "Total_Assets",
    "Equity",
    "Total_Deposits",
    "Legal_Person_Loans_Total",
    "Income_Statement__Other_Operating_Income_and_Expenses__Administrative_Expenses_qtr",
    "TD",
    "Headquarters___State",
    "Segment",
    "micro_exposure"
  )

  if (!is.null(EMP_COL)) {
    needed_vars <- c(needed_vars, EMP_COL)
  }

  needed_vars <- intersect(needed_vars, names(data))

  data_clean <- data %>%
    dplyr::filter(
      dplyr::if_all(dplyr::all_of(needed_vars), ~ !is.na(.))
    )

  # ----------------------------------------------------------
  # 3. Create transformed variables used in the regressions
  # ----------------------------------------------------------
  df_final <- data_clean %>%
    dplyr::mutate(
      log_assets      = log(Total_Assets),
      eq_assets       = Equity / Total_Assets,
      dep_assets      = Total_Deposits / Total_Assets,
      loans_assets    = Legal_Person_Loans_Total / Total_Assets,
      adminexp_assets = -Income_Statement__Other_Operating_Income_and_Expenses__Administrative_Expenses_qtr / Total_Assets,
      state           = as.factor(Headquarters___State),
      segment         = as.factor(Segment),
      TD              = as.factor(TD),
      roa             = 400 * Net_Income_qtr / Total_Assets
    ) %>%
    dplyr::arrange(Quarter, FinInst)

  # ----------------------------------------------------------
  # 4. Define treatment status based on micro exposure in the
  #    chosen pre-treatment quarter
  #
  #    Control group: micro exposure < 5%
  #    Treated group: micro exposure >= 50%
  #    Middle group: dropped
  # ----------------------------------------------------------
  df_pre <- df_final %>%
    dplyr::filter(Quarter == pre_q) %>%
    dplyr::mutate(
      treatment = dplyr::case_when(
        micro_exposure < 0.05 ~ 0L,
        micro_exposure >= 0.50 ~ 1L,
        TRUE ~ NA_integer_
      )
    )

  selected_banks <- df_pre %>%
    dplyr::filter(!is.na(treatment)) %>%
    dplyr::distinct(FinInst, treatment)

  # ----------------------------------------------------------
  # 5. Merge treatment labels back to the full panel and keep
  #    only the variables needed for the final dataset
  # ----------------------------------------------------------
  df_out <- df_final %>%
    dplyr::inner_join(selected_banks, by = "FinInst") %>%
    dplyr::select(
      FinInst,
      Quarter,
      treatment,
      log_assets,
      eq_assets,
      dep_assets,
      loans_assets,
      adminexp_assets,
      roa,
      state,
      segment,
      TD,
      micro_exposure,
      dplyr::any_of(EMP_COL)
    ) %>%
    dplyr::arrange(Quarter, FinInst)

  return(df_out)
}

# Create the final dataset using 2019 Q4
df_out <- make_data(pre_q_char = "2019-12-31")

Data Preparation

The code above constructs the panel dataset used in the empirical analysis.

First, the raw supervisory data are processed to construct the key variable of interest: micro exposure. This variable measures the share of lending to micro-sized firms within the total corporate loan portfolio of each bank. It is calculated as the ratio of the credit portfolio of micro-sized borrowers to total loans to legal persons.

Several filtering steps are then applied. Observations are removed if total corporate lending (Legal_Person_Loans_Total) is missing or equal to zero, and the sample is restricted to quarters up to 2022 Q2. In addition, observations with missing values in the main balance sheet and institutional variables are dropped. The variables required to remain non-missing include:

  • FinInst
  • Quarter
  • Net_Income_qtr
  • Total_Assets
  • Equity
  • Total_Deposits
  • Legal_Person_Loans_Total
  • Income_Statement__Other_Operating_Income_and_Expenses__Administrative_Expenses_qtr
  • TD
  • Headquarters___State
  • Segment
  • micro_exposure

Next, several transformed variables used in the econometric models are created:

  • Bank size: log_assets = log(Total_Assets)
  • Capitalization: eq_assets = Equity / Total_Assets
  • Funding structure: dep_assets = Total_Deposits / Total_Assets
  • Loan intensity: loans_assets = Legal_Person_Loans_Total / Total_Assets
  • Administrative costs: adminexp_assets = -Administrative_Expenses / Total_Assets
  • Profitability: roa = 400 × Net_Income_qtr / Total_Assets (quarterly net income annualized, × 4, and expressed in percent of total assets, × 100)

Institutional characteristics are also converted to categorical variables, including the bank's state of headquarters, business segment, and institution type (TD).
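
As a quick illustration of these transformations, the sketch below applies them to two hypothetical banks. The data frame `toy` and all of its values are invented for illustration and are not taken from IFData; only the formulas mirror the ones listed above.

```r
# Hypothetical quarterly figures for two banks (illustrative values only)
toy <- data.frame(
  FinInst        = c("A", "B"),
  Net_Income_qtr = c(2.5, -1.0),
  Total_Assets   = c(1000, 400),
  Equity         = c(150, 40),
  stringsAsFactors = FALSE
)

toy$log_assets <- log(toy$Total_Assets)
toy$eq_assets  <- toy$Equity / toy$Total_Assets

# Quarterly income x 4 (annualized) x 100 (percent of assets)
toy$roa <- 400 * toy$Net_Income_qtr / toy$Total_Assets

toy[, c("FinInst", "eq_assets", "roa")]
```

Bank A earns 2.5 on assets of 1000 in the quarter, which corresponds to an annualized ROA of 1 percent; bank B has an annualized ROA of -1 percent.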

Finally, banks are classified into treatment groups based on their micro exposure in the pre-treatment quarter (2019 Q4). Banks with very low exposure (micro_exposure < 5%) form the control group, while banks with very high exposure (micro_exposure ≥ 50%) form the treated group. Banks with intermediate exposure are excluded in order to create a clear contrast between low- and high-exposure institutions.
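
The classification rule can be sketched on a handful of hypothetical banks. The example uses base R `ifelse()` in place of `dplyr::case_when()` so that it runs without extra packages; the data frame `pre` and its exposure values are assumptions made purely for illustration.

```r
# Hypothetical pre-treatment exposures (illustrative values only)
pre <- data.frame(
  FinInst        = c("A", "B", "C", "D"),
  micro_exposure = c(0.01, 0.05, 0.30, 0.50),
  stringsAsFactors = FALSE
)

# < 5% -> control (0), >= 50% -> treated (1), everything else -> NA (dropped)
pre$treatment <- ifelse(pre$micro_exposure < 0.05, 0L,
                 ifelse(pre$micro_exposure >= 0.50, 1L, NA_integer_))

# Keep only banks that fall into one of the two groups
selected <- pre[!is.na(pre$treatment), c("FinInst", "treatment")]
selected
```

Note the boundary behaviour: a bank with exactly 5% exposure is not below the control threshold and is therefore dropped, while a bank with exactly 50% exposure is treated.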

The resulting dataset is an unbalanced panel of banks (the number of banks observed differs from quarter to quarter) with treatment status, financial characteristics, and profitability measures that can be used in the subsequent matching and difference-in-differences analysis.
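
Whether a bank-quarter panel is balanced can be checked by counting observations per bank. The following is a minimal sketch on a hypothetical three-bank panel (the object `panel` and its contents are assumptions for illustration); the same logic could be applied to `df_out`.

```r
# Hypothetical bank-quarter panel: bank "C" is missing in the second quarter
panel <- data.frame(
  FinInst = c("A", "A", "B", "B", "C"),
  Quarter = as.Date(c("2019-12-31", "2020-03-31",
                      "2019-12-31", "2020-03-31",
                      "2019-12-31")),
  stringsAsFactors = FALSE
)

n_quarters   <- length(unique(panel$Quarter))
obs_per_bank <- table(panel$FinInst)

# A panel is balanced only if every bank appears in every quarter
is_balanced <- all(obs_per_bank == n_quarters)
is_balanced
```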

R code, Cell 3
df_out %>%
  group_by(Quarter, treatment) %>%
  summarise(n = n(), .groups = "drop") %>%
  tidyr::pivot_wider(
    names_from = treatment,
    values_from = n,
    names_prefix = "treat_"
  )
Output
A tibble: 34 × 3
Quarter      treat_0  treat_1
<date>         <int>    <int>
2014-03-31       141       79
2014-06-30       145       82
2014-09-30       146       88
2014-12-31       149       87
2015-03-31       148       86
2015-06-30       148       87
2015-09-30       152       89
2015-12-31       149       89
2016-03-31       152       94
2016-06-30       151       96
2016-09-30       156       98
2016-12-31       158       99
2017-03-31       163       99
2017-06-30       160       94
2017-09-30       164       99
2017-12-31       159       89
2018-03-31       165      100
2018-06-30       164       95
2018-09-30       172      101
2018-12-31       174      100
2019-03-31       175      104
2019-06-30       178       95
2019-09-30       182      108
2019-12-31       186      108
2020-03-31       182      106
2020-06-30       175      103
2020-09-30       176      100
2020-12-31       168       98
2021-03-31       170       97
2021-06-30       158       83
2021-09-30       167       94
2021-12-31       154       78
2022-03-31       163       94
2022-06-30       165       83

Sample Size Over Time

The code above reports the number of banks in the control and treated groups in each quarter. The control group (low micro exposure) contains between 141 and 186 banks per quarter, while the treated group (high micro exposure) contains between 78 and 108.

The counts vary from quarter to quarter, and both groups peak in 2019 Q4, the quarter in which treatment status is defined. Both groups nevertheless remain well populated throughout the period.

R code, Cell 4
plot_means <- function(
  data,
  varname,
  shock_date = as.Date("2019-12-31"),
  pretty_name = NULL
) {
  
  if (!("Quarter" %in% names(data))) stop("The dataset must contain a 'Quarter' column.")
  if (!("treatment" %in% names(data))) stop("The dataset must contain a 'treatment' column.")
  if (!(varname %in% names(data))) stop(paste0("Variable '", varname, "' not found."))
  
  if (is.null(pretty_name)) pretty_name <- varname
  
  df_plot <- data %>%
    dplyr::group_by(Quarter, treatment) %>%
    dplyr::summarise(
      mean_value = mean(.data[[varname]], na.rm = TRUE),
      .groups = "drop"
    ) %>%
    dplyr::mutate(
      treatment = factor(treatment, levels = c(0, 1), labels = c("Control", "Treated"))
    )
  
  p <- ggplot2::ggplot(
    df_plot,
    ggplot2::aes(
      x = Quarter,
      y = mean_value,
      color = treatment,
      group = treatment
    )
  ) +
    ggplot2::geom_line(linewidth = 0.9) +
    ggplot2::geom_point(size = 1.8) +
    ggplot2::geom_vline(
      xintercept = shock_date,
      linetype = "dashed",
      color = "black",
      linewidth = 0.6
    ) +
    ggplot2::scale_x_date(
      date_breaks = "1 year",
      date_labels = "%Y"
    ) +
    ggplot2::labs(
      title = pretty_name,
      x = NULL,
      y = NULL,
      color = NULL
    ) +
    ggplot2::theme_minimal(base_size = 12) +
    ggplot2::theme(
      plot.title = ggplot2::element_text(face = "bold", hjust = 0.5),
      axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
      legend.position = "bottom",
      panel.grid.minor = ggplot2::element_blank()
    )
  
  return(p)
}
R code, Cell 5
library(patchwork)

vars <- c(
  "roa",
  "log_assets",
  "eq_assets",
  "dep_assets",
  "loans_assets",
  "adminexp_assets"
)

plots <- lapply(vars, function(v) {
  plot_means(df_out, v, pretty_name = v)
})

combined_plot <- wrap_plots(plots, ncol = 2) +
  plot_layout(guides = "collect") &
  theme(legend.position = "bottom")

combined_plot

ggplot2::ggsave(
  "bank_variables_over_time.png",
  plot = combined_plot,
  width = 14,
  height = 12,
  dpi = 300
)
Output
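[Figure: quarterly group means of roa, log_assets, eq_assets, dep_assets, loans_assets and adminexp_assets for control and treated banks, with a dashed vertical line at 2019-12-31; saved as bank_variables_over_time.png]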

Preliminary Patterns

The figures show that bank profitability (ROA) declines after the onset of the COVID period. At the same time, the other continuous variables (bank size, capitalization, deposit funding, loan intensity, and administrative expenses) also evolve over time and do not move identically across the two groups.

This suggests that the decline in ROA cannot be attributed to micro exposure alone. Part of the difference in profitability may also be related to other bank characteristics, which motivates the use of a more formal empirical strategy in the next step.

R code, Cell 6
# ============================================================
# Construct a pre–post dataset
# ------------------------------------------------------------
# The function extracts bank characteristics in a pre-period
# and combines them with profitability in a later post-period.
# It then computes the change in ROA between the two periods.
# ============================================================

make_pre_post_df <- function(df, pre_q_char, post_q_char) {

  pre_q  <- as.Date(pre_q_char)
  post_q <- as.Date(post_q_char)

  # --- 1. Data from the pre-treatment quarter ---
  df_pre <- df %>%
    dplyr::filter(Quarter == pre_q)

  # --- 2. ROA from the post-treatment quarter ---
  df_post <- df %>%
    dplyr::filter(Quarter == post_q) %>%
    dplyr::select(FinInst, roa_post = roa)

  # --- 3. Merge the two datasets by bank ---
  df_out <- df_pre %>%
    dplyr::inner_join(df_post, by = "FinInst") %>%
    dplyr::mutate(
      roa_diff = roa_post - roa   # change in ROA between periods
    )

  return(df_out)
}

# Create the pre–post dataset
pre_post_df <- make_pre_post_df(
  df_out,
  pre_q_char  = "2019-12-31",
  post_q_char = "2021-09-30"
)

# ============================================================
# Collapse small categorical groups
# ------------------------------------------------------------
# Categories with very few observations are grouped into
# an "Other" category to avoid extremely small cells.
# ============================================================

collapse_categories <- function(x, min_n = 5) {
  tab <- table(x)
  valid_levels <- names(tab)[tab >= min_n]

  x_new <- ifelse(x %in% valid_levels, as.character(x), "Other")
  factor(x_new)
}

# Apply the collapsing to categorical variables
pre_post_df <- pre_post_df %>%
  dplyr::mutate(
    state   = collapse_categories(state,   min_n = 5),
    segment = collapse_categories(segment, min_n = 5)
  )

Pre–Post Dataset Construction

For the difference-in-differences analysis, one pre-treatment and one post-treatment quarter are selected. Bank characteristics are taken from the pre-treatment quarter (2019 Q4), while profitability is measured in the post period (2021 Q3). This allows the construction of the change in return on assets (ROA) between the two periods for each bank. Because the two quarters are joined by bank, only banks observed in both quarters are retained (261 banks in the regressions reported below).
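
The pre-post construction can be illustrated with a small hypothetical panel. This sketch uses base R `merge()` instead of `dplyr::inner_join()`; the data frame `toy` and its ROA values are assumptions for illustration only.

```r
# Hypothetical panel: bank "C" has no observation in the post quarter
toy <- data.frame(
  FinInst = c("A", "A", "B", "B", "C"),
  Quarter = as.Date(c("2019-12-31", "2021-09-30",
                      "2019-12-31", "2021-09-30",
                      "2019-12-31")),
  roa     = c(2.0, 1.5, 1.0, 1.2, 3.0),
  stringsAsFactors = FALSE
)

pre  <- toy[toy$Quarter == as.Date("2019-12-31"), c("FinInst", "roa")]
post <- toy[toy$Quarter == as.Date("2021-09-30"), c("FinInst", "roa")]
names(post)[2] <- "roa_post"

# merge() keeps only banks present in both quarters (inner join by default)
pre_post <- merge(pre, post, by = "FinInst")
pre_post$roa_diff <- pre_post$roa_post - pre_post$roa
pre_post
```

Bank C drops out because it has no post-period observation, which is the mechanism by which the pre-post sample can be smaller than the pre-treatment cross-section.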

In addition, rare categories in the state and segment variables are grouped into an "Other" category to avoid very small groups in the subsequent analysis.
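
To show what the collapsing does, the sketch below repeats the `collapse_categories()` definition from the cell above and applies it to a hypothetical vector of state codes (the vector `states` is invented for illustration).

```r
# Same definition as in the notebook cell above
collapse_categories <- function(x, min_n = 5) {
  tab <- table(x)
  valid_levels <- names(tab)[tab >= min_n]

  x_new <- ifelse(x %in% valid_levels, as.character(x), "Other")
  factor(x_new)
}

# Hypothetical state codes: SP (6 banks), MG (5), AC (2), RO (1)
states <- factor(c(rep("SP", 6), rep("MG", 5), "AC", "AC", "RO"))

collapsed <- collapse_categories(states, min_n = 5)
table(collapsed)
```

With `min_n = 5`, SP and MG are kept, while the three banks in AC and RO are pooled into "Other".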

R code, Cell 7
did_simple <- lm(roa_diff  ~ treatment, data = pre_post_df)
summary(did_simple)
Output
Call:
lm(formula = roa_diff ~ treatment, data = pre_post_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-79.447  -1.167   0.882   3.429  46.447 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.7000     0.8802  -0.795   0.4271  
treatment    -3.2815     1.4666  -2.237   0.0261 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.37 on 259 degrees of freedom
Multiple R-squared:  0.01896,	Adjusted R-squared:  0.01517 
F-statistic: 5.006 on 1 and 259 DF,  p-value: 0.02611
R code, Cell 8
# Separate object name so the baseline model from the previous cell is not overwritten
did_controls <- lm(roa_diff ~ treatment + segment + log_assets + eq_assets + TD + loans_assets + dep_assets + state, data = pre_post_df)
summary(did_controls)
Output
Call:
lm(formula = roa_diff ~ treatment + segment + log_assets + eq_assets + 
    TD + loans_assets + dep_assets + state, data = pre_post_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-65.986  -2.581   0.212   2.697  49.589 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -37.395958  12.390427  -3.018  0.00282 **
treatment     -0.010944   2.055587  -0.005  0.99576   
segment198    -2.819954   3.745939  -0.753  0.45231   
segment199    -2.553810   3.235691  -0.789  0.43074   
segment9      -0.004881   2.567771  -0.002  0.99848   
segmentOther  -2.217640   5.650745  -0.392  0.69508   
log_assets     1.584399   0.502393   3.154  0.00182 **
eq_assets     -0.327503   4.125805  -0.079  0.93680   
TDI            3.889095   2.703594   1.438  0.15161   
loans_assets   1.594258   0.918024   1.737  0.08375 . 
dep_assets     1.590948   3.082658   0.516  0.60627   
stateDF        0.883257   6.191549   0.143  0.88668   
stateES        3.209862   5.456761   0.588  0.55693   
stateGO        3.651265   5.323461   0.686  0.49346   
stateMG        1.425543   3.980504   0.358  0.72056   
stateOther    -0.021271   4.427790  -0.005  0.99617   
statePB        1.200158   6.119028   0.196  0.84467   
statePR       -3.406259   4.613399  -0.738  0.46104   
stateRJ        0.853528   4.423344   0.193  0.84715   
stateRO        2.842554   5.143940   0.553  0.58105   
stateRS        1.811756   4.137031   0.438  0.66183   
stateSC        4.419439   3.983573   1.109  0.26837   
stateSP        1.276674   3.784586   0.337  0.73616   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.21 on 238 degrees of freedom
Multiple R-squared:  0.1246,	Adjusted R-squared:  0.04367 
F-statistic:  1.54 on 22 and 238 DF,  p-value: 0.06223

Difference-in-Differences Estimation

To estimate the relationship between micro exposure and the change in profitability, a simple difference-in-differences style regression is estimated. The outcome variable is the change in return on assets between the pre- and post-period:

$\Delta ROA_i = ROA_{i,post} - ROA_{i,pre}$

Baseline specification

The simplest specification compares the change in profitability between treated and control banks:

$\Delta ROA_i = \alpha + \beta \, Treatment_i + \varepsilon_i$

where $Treatment_i$ equals 1 for banks with high micro exposure and 0 for banks with low micro exposure.

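The estimated coefficient is negative and statistically significant at the 5% level (p ≈ 0.026):

$\hat{\beta} \approx -3.28$

Since ROA is measured in annualized percent, this suggests that ROA at banks with high micro exposure fell by about 3.3 percentage points more between the pre- and post-period than at banks with low micro exposure.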
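
Regressing the change in ROA on a treatment dummy is numerically the same as the textbook two-by-two difference in differences of group means. The sketch below checks this on simulated data; all variable values are generated for illustration and the effect size of -1.0 is an arbitrary assumption, not an estimate from the bank data.

```r
set.seed(1)

# Simulated banks (illustrative only): half control, half treated
n         <- 200
treatment <- rep(c(0L, 1L), each = n / 2)
roa_pre   <- rnorm(n, mean = 2)
roa_post  <- roa_pre - 0.5 - 1.0 * treatment + rnorm(n)
roa_diff  <- roa_post - roa_pre

# Coefficient from the first-difference regression
beta_lm <- unname(coef(lm(roa_diff ~ treatment))["treatment"])

# (post - pre) for treated minus (post - pre) for control
did_means <-
  (mean(roa_post[treatment == 1]) - mean(roa_pre[treatment == 1])) -
  (mean(roa_post[treatment == 0]) - mean(roa_pre[treatment == 0]))

# The two numbers coincide up to floating-point error
all.equal(beta_lm, did_means)
```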

Specification with controls

To account for differences in bank characteristics, the regression is extended with control variables:

$\Delta ROA_i = \alpha + \beta \, Treatment_i + \gamma_1 \log(Assets_i) + \gamma_2 EqAssets_i + \gamma_3 LoansAssets_i + \gamma_4 DepAssets_i + \delta_{Segment_i} + \delta_{State_i} + \delta_{TD_i} + \varepsilon_i$

After controlling for bank characteristics and institutional factors, the treatment coefficient is close to zero and statistically insignificant ($\hat{\beta} \approx -0.01$, p ≈ 0.996).

R code, Cell 9
library(cobalt)

psm_seg <- matchit(
  treatment ~ log_assets + eq_assets + dep_assets +
    loans_assets + adminexp_assets,
  data   = pre_post_df,
  method = "nearest",
  ratio  = 5,
  caliper = 0.05,
  distance = "logit"
)

# Balance statistics for the matched sample (the fitted object is psm_seg)
summary(psm_seg)$sum.matched

love.plot(psm_seg, binary = "std")
Output
Warning message:
"glm.fit: fitted probabilities numerically 0 or 1 occurred"
Warning message:
"Not all treated units will get 5 matches."
A matrix: 6 × 7 of type dbl
                 Means Treated  Means Control  Std. Mean Diff.  Var. Ratio    eCDF Mean    eCDF Max  Std. Pair Dist.
distance            0.50287199     0.49788463      0.020489572   1.0330588  0.009203805  0.05263158       0.03056565
log_assets         18.14627799    18.22556895     -0.041620399   1.6681495  0.047146602  0.15526316       0.73568112
eq_assets           0.37605038     0.35898041      0.059821078   1.3921704  0.043456342  0.12105263       0.82959483
dep_assets          0.29396987     0.31787599     -0.086338510   0.8662411  0.062380007  0.15570175       1.06369353
loans_assets        0.09906939     0.09844823      0.004687441   1.1059185  0.055059824  0.16096491       1.32801049
adminexp_assets     0.01634512     0.01627290      0.002807322   1.3650929  0.045992136  0.12324561       0.59399229
[Figure: Love plot of standardized mean differences for the matching covariates, produced by love.plot(psm_seg)]

Propensity Score Matching

To improve the comparability of treated and control banks, a propensity score matching (PSM) procedure is applied. The propensity score is estimated using a logistic model where treatment status is explained by observable bank characteristics:

$treatment_i = f(\log(assets_i), eq\_assets_i, dep\_assets_i, loans\_assets_i, adminexp\_assets_i)$

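Banks in the treated group are then matched to similar banks in the control group using nearest-neighbour matching with up to 5 control banks per treated bank (as the warning above notes, not every treated bank receives 5 matches) and a caliper of 0.05, which MatchIt by default measures in standard deviations of the propensity score, to avoid poor matches. The matched sample contains 104 banks.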
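
The mechanics of nearest-neighbour matching with a caliper can be sketched in base R. This is a deliberately simplified stand-in for `matchit()`, not a reimplementation of it: it matches one control per treated unit, with replacement, on simulated data with a single covariate. All object names and values are assumptions for illustration.

```r
set.seed(42)

# Simulated data (illustrative only): one covariate drives treatment
n     <- 300
x     <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.8 * x))

# Step 1: propensity score from a logistic regression
ps <- unname(fitted(glm(treat ~ x, family = binomial())))

# Step 2: caliper expressed in standard deviations of the propensity score
caliper <- 0.05 * sd(ps)

treated  <- which(treat == 1)
controls <- which(treat == 0)

# Step 3: for each treated unit, find the closest control (with replacement)
nearest <- sapply(treated, function(i) {
  controls[which.min(abs(ps[controls] - ps[i]))]
})
gap <- abs(ps[treated] - ps[nearest])

# Step 4: discard pairs whose distance exceeds the caliper
keep    <- gap <= caliper
matched <- data.frame(
  treated = treated[keep],
  control = nearest[keep],
  gap     = gap[keep]
)

nrow(matched)  # number of treated units that found a match within the caliper
```

Treated units without a sufficiently close control are dropped, which is why a tight caliper can shrink the matched sample considerably.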

The balance of the covariates after matching is evaluated using the standardized mean differences, which are visualized in the Love plot. The goal of the matching procedure is to ensure that treated and control banks have similar observable characteristics before comparing profitability outcomes.
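
A standardized mean difference can be computed by hand as follows. The helper `smd()` is hypothetical and assumes one common convention, scaling by the unweighted standard deviation of the treated group; the exact scaling used by MatchIt and cobalt depends on their settings and may differ. The data are invented for illustration.

```r
# Hypothetical helper: (weighted mean treated - weighted mean control) / SD treated
smd <- function(x, treat, w = rep(1, length(x))) {
  m_t <- weighted.mean(x[treat == 1], w[treat == 1])
  m_c <- weighted.mean(x[treat == 0], w[treat == 0])
  (m_t - m_c) / sd(x[treat == 1])
}

x     <- c(10, 12, 14, 9, 11, 13, 20)
treat <- c(1, 1, 1, 0, 0, 0, 0)
w     <- c(1, 1, 1, 1, 1, 1, 0)  # weight 0 drops the poorly matched control

smd_raw     <- smd(x, treat)      # before weighting
smd_matched <- smd(x, treat, w)   # after weighting
c(raw = smd_raw, matched = smd_matched)
```

In this toy example the raw difference is -0.625 treated-group standard deviations, and it becomes 0.5 once the outlying control receives zero weight.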

Categorical variables such as bank segment, state, and institution type were also considered in the matching procedure. However, including these variables made it difficult to obtain a stable and well-balanced matching sample. Given that previous exploratory analysis and regression results did not indicate that these variables play a major role in explaining profitability differences, the final matching specification focuses on continuous balance sheet characteristics.

R code, Cell 10
matched_df <- match.data(psm_seg)
R code, Cell 11
model_ate <- lm(roa_diff ~ treatment, data = matched_df, weights = weights)

summary(model_ate)
Output
Call:
lm(formula = roa_diff ~ treatment, data = matched_df, weights = weights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-80.421  -0.641   1.579   3.656  28.163 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.037      1.584  -1.286    0.201
treatment     -0.970      2.620  -0.370    0.712

Residual standard error: 12.86 on 102 degrees of freedom
Multiple R-squared:  0.001342,	Adjusted R-squared:  -0.008448 
F-statistic: 0.1371 on 1 and 102 DF,  p-value: 0.7119
R code, Cell 12
model_ate <- lm(roa_diff ~ treatment + log_assets + eq_assets + loans_assets + dep_assets, data = matched_df, weights = weights)

summary(model_ate)
Output
Call:
lm(formula = roa_diff ~ treatment + log_assets + eq_assets + 
    loans_assets + dep_assets, data = matched_df, weights = weights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-60.269  -5.496  -0.970   4.548  26.936 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -76.5227    19.2285  -3.980 0.000132 ***
treatment     -0.6139     2.3326  -0.263 0.792968    
log_assets     3.7068     0.9258   4.004 0.000121 ***
eq_assets      6.0627     6.5558   0.925 0.357343    
loans_assets  23.8734     7.7156   3.094 0.002572 ** 
dep_assets     7.5525     4.5936   1.644 0.103351    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.44 on 98 degrees of freedom
Multiple R-squared:  0.241,	Adjusted R-squared:  0.2023 
F-statistic: 6.224 on 5 and 98 DF,  p-value: 4.708e-05

Treatment Effect After Matching

After constructing the matched sample, the treatment effect is estimated using weighted regressions that incorporate the matching weights.

The baseline specification estimates:

$\Delta ROA_i = \alpha + \beta \, Treatment_i + \varepsilon_i$

In the matched sample, the estimated treatment coefficient is small and statistically insignificant (p ≈ 0.71):

$\hat{\beta} \approx -0.97$

A second specification additionally controls for bank characteristics:

$\Delta ROA_i = \alpha + \beta \, Treatment_i + \gamma_1 \log(Assets_i) + \gamma_2 EqAssets_i + \gamma_3 LoansAssets_i + \gamma_4 DepAssets_i + \varepsilon_i$

In this specification the treatment effect remains close to zero and statistically insignificant (p ≈ 0.79):

$\hat{\beta} \approx -0.61$

Overall, the results suggest that once banks are matched on observable characteristics, there is no evidence that higher micro exposure systematically reduces bank profitability.
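
A weighted regression on a treatment dummy returns the difference between the weighted group means, which is what the matching weights are used for here. The sketch below verifies this on a small hypothetical sample; the outcome values and weights are invented for illustration.

```r
# Hypothetical matched sample: two treated banks, four weighted controls
y <- c(1.0, -2.0, 0.5, 3.0, 1.5, -1.0)   # change in ROA
d <- c(1, 1, 0, 0, 0, 0)                 # treatment indicator
w <- c(1, 1, 0.5, 0.5, 1, 1)             # matching weights

# Coefficient from the weighted regression
beta_wls <- unname(coef(lm(y ~ d, weights = w))["d"])

# Difference in weighted means, computed directly
diff_wm <- weighted.mean(y[d == 1], w[d == 1]) -
           weighted.mean(y[d == 0], w[d == 0])

# The two numbers coincide up to floating-point error
all.equal(beta_wls, diff_wm)
```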

R code, Cell 13
plot_matched_quarterly_means <- function(
  df_out,
  matched_df,
  varname,
  shock_date = as.Date("2019-12-31"),
  pretty_name = NULL
) {
  # ----------------------------------------------------------
  # 0. Basic checks
  # ----------------------------------------------------------
  if (!(varname %in% names(df_out))) {
    stop(paste("Variable", varname, "not found in df_out."))
  }
  if (!("weights" %in% names(matched_df))) {
    stop("matched_df must contain a 'weights' column.")
  }
  if (!("FinInst" %in% names(matched_df))) {
    stop("matched_df must contain a 'FinInst' column.")
  }
  if (!("treatment" %in% names(matched_df))) {
    stop("matched_df must contain a 'treatment' column.")
  }
  if (!("Quarter" %in% names(df_out))) {
    stop("df_out must contain a 'Quarter' column.")
  }
  if (!("treatment" %in% names(df_out))) {
    stop("df_out must contain a 'treatment' column.")
  }

  if (is.null(pretty_name)) pretty_name <- varname

  # ----------------------------------------------------------
  # 1. Keep only matched banks
  # ----------------------------------------------------------
  matched_banks <- unique(matched_df$FinInst)

  # ----------------------------------------------------------
  # 2. Extract treatment and matching weights
  # ----------------------------------------------------------
  weights_info <- matched_df %>%
    dplyr::select(FinInst, treatment, weights)

  # ----------------------------------------------------------
  # 3. Rebuild the full panel for matched banks only
  # ----------------------------------------------------------
  df_panel <- df_out %>%
    dplyr::filter(FinInst %in% matched_banks) %>%
    dplyr::left_join(weights_info, by = c("FinInst", "treatment"))

  # ----------------------------------------------------------
  # 4. Compute weighted quarterly means
  # ----------------------------------------------------------
  df_plot <- df_panel %>%
    dplyr::group_by(Quarter, treatment) %>%
    dplyr::summarise(
      weighted_mean = weighted.mean(.data[[varname]], weights, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    dplyr::mutate(
      treatment = factor(treatment, levels = c(0, 1), labels = c("Control", "Treated"))
    )

  # ----------------------------------------------------------
  # 5. Plot
  # ----------------------------------------------------------
  p <- ggplot2::ggplot(
    df_plot,
    ggplot2::aes(
      x = Quarter,
      y = weighted_mean,
      color = treatment,
      group = treatment
    )
  ) +
    ggplot2::geom_line(linewidth = 0.9) +
    ggplot2::geom_point(size = 1.8) +
    ggplot2::geom_vline(
      xintercept = shock_date,
      linetype = "dashed",
      color = "black",
      linewidth = 0.6
    ) +
    ggplot2::scale_x_date(
      date_breaks = "1 year",
      date_labels = "%Y"
    ) +
    ggplot2::labs(
      title = pretty_name,
      x = NULL,
      y = NULL,
      color = NULL
    ) +
    ggplot2::theme_minimal(base_size = 12) +
    ggplot2::theme(
      plot.title = ggplot2::element_text(face = "bold", hjust = 0.5),
      axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
      legend.position = "bottom",
      panel.grid.minor = ggplot2::element_blank()
    )

  return(p)
}

library(patchwork)

pretty_names <- c(
  roa = "ROA",
  log_assets = "Log assets",
  eq_assets = "Equity / Assets",
  dep_assets = "Deposits / Assets",
  loans_assets = "Loans / Assets",
  adminexp_assets = "Administrative expenses / Assets"
)

vars <- names(pretty_names)

plots <- lapply(vars, function(v) {
  plot_matched_quarterly_means(
    df_out = df_out,
    matched_df = matched_df,
    varname = v,
    pretty_name = pretty_names[[v]]
  )
})

combined_matched_plot <- wrap_plots(plots, ncol = 2) +
  patchwork::plot_layout(guides = "collect") &
  ggplot2::theme(legend.position = "bottom")

combined_matched_plot
Output
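[Figure: weighted quarterly means of ROA, Log assets, Equity / Assets, Deposits / Assets, Loans / Assets and Administrative expenses / Assets for matched control and treated banks, with a dashed vertical line at 2019-12-31]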

Balance After Matching

The figures show that after matching the treated and control groups follow much more similar paths over time. The main balance sheet variables and profitability measures move broadly together, suggesting that the matching procedure substantially improved comparability between the two groups.

However, the alignment is not perfect and some differences remain. This indicates that while propensity score matching reduces observable differences between banks, it cannot completely eliminate all sources of heterogeneity.

Fixed Effects Estimation

As a final step, fixed effects regressions are estimated using the full panel dataset. These models exploit the panel structure of the data and control for unobserved time-invariant bank characteristics.

Four specifications are estimated: the full sample with and without balance sheet control variables, and the matched sample obtained from the propensity score matching procedure, again with and without control variables. Because the bank and quarter fixed effects absorb the main effects of treatment and post, only the interaction treatment × post is identified, and it is the coefficient of interest in every model.

The four models allow a comparison between unmatched and matched samples as well as between parsimonious and control-rich specifications.
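
The following sketch illustrates two properties of the two-way fixed effects setup on a simulated, balanced panel: the main effects of treatment and post are collinear with the bank and quarter dummies, and the interaction coefficient equals the two-by-two difference in differences of cell means. Base R `lm()` with explicit dummies stands in for `feols()` here, and the panel, its dimensions, and the effect size of -1.5 are assumptions for illustration. The exact equality with the cell-mean difference in differences relies on the panel being balanced, which the actual bank data are not.

```r
set.seed(123)

# Simulated balanced panel (illustrative only): 20 banks, 6 quarters
panel <- expand.grid(id = 1:20, t = 1:6)
panel$treatment  <- as.integer(panel$id > 10)   # constant within bank
panel$post       <- as.integer(panel$t > 3)     # constant within quarter
panel$treat_post <- panel$treatment * panel$post
panel$y <- 0.3 * panel$id + 0.2 * panel$t -
  1.5 * panel$treat_post + rnorm(nrow(panel))

# Bank and quarter dummies play the role of the fixed effects
fit <- lm(y ~ factor(id) + factor(t) + treatment + post + treat_post,
          data = panel)

# Main effects are absorbed by the dummies and are returned as NA
coef(fit)[c("treatment", "post")]

# The interaction equals the 2x2 difference in differences of cell means
cell <- function(g, p) mean(panel$y[panel$treatment == g & panel$post == p])
did  <- (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
all.equal(unname(coef(fit)["treat_post"]), did)
```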

R code, Cell 14
df_out <- df_out %>%
  dplyr::mutate(
    post = ifelse(Quarter > as.Date("2019-12-31"), 1L, 0L)
  )

fe_model1 <- feols(
  roa ~ treatment * post + log_assets + eq_assets  + loans_assets + dep_assets + state + TD + segment  | FinInst + Quarter,
  data = df_out
)
summary(fe_model1)
Output
The variables 'treatment', 'post', 'stateAL', 'stateAM', 'stateAP', 'stateBA'
and 23 others have been removed because of collinearity (see $collin.var).

OLS estimation, Dep. Var.: roa
Observations: 8,717
Fixed-effects: FinInst: 294,  Quarter: 34
Standard-errors: IID 
                Estimate Std. Error   t value   Pr(>|t|)    
log_assets      4.889480   0.298112 16.401481  < 2.2e-16 ***
eq_assets      17.624027   1.396976 12.615840  < 2.2e-16 ***
loans_assets   -0.079003   0.183406 -0.430756 6.6666e-01    
dep_assets      1.231668   1.144717  1.075958 2.8198e-01    
stateCE        40.188755   3.235005 12.423089  < 2.2e-16 ***
stateMG        -4.183376   4.392313 -0.952431 3.4091e-01    
stateRS         9.030908   4.312259  2.094241 3.6269e-02 *  
TDI             1.262167   0.703956  1.792964 7.3015e-02 .  
segment196     10.191690   8.777445  1.161123 2.4563e-01    
segment197      3.231386   1.620738  1.993775 4.6209e-02 *  
segment198      3.475873   1.497635  2.320907 2.0316e-02 *  
treatment:post -2.561198   0.417317 -6.137300 8.7749e-10 ***
... 29 variables were removed because of collinearity (treatment, post
and 27 others [full set in $collin.var])
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 8.20626     Adj. R2: 0.282665
                Within R2: 0.063698
R code, Cell 15
# Full sample, no controls
fe_model2 <- feols(
  roa ~ treatment * post | FinInst + Quarter,
  data = df_out
)
summary(fe_model2)
Output
The variables 'treatment' and 'post' have been removed because of collinearity
(see $collin.var).

OLS estimation, Dep. Var.: roa
Observations: 8,717
Fixed-effects: FinInst: 294,  Quarter: 34
Standard-errors: IID 
               Estimate Std. Error  t value   Pr(>|t|)    
treatment:post -2.22683   0.426907 -5.21619 1.8701e-07 ***
... 2 variables were removed because of collinearity (treatment and
post)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 8.46708     Adj. R2: 0.237342
                Within R2: 0.003233
R code, Cell 16
# Matched sample, no controls.
# df_panel must already exist at this point; Cell 17 rebuilds it.
fe_model3 <- feols(
  roa ~ treatment * post | FinInst + Quarter,
  data = df_panel
)
summary(fe_model3)
Output
The variables 'treatment' and 'post' have been removed because of collinearity
(see $collin.var).

OLS estimation, Dep. Var.: roa
Observations: 3,098
Fixed-effects: FinInst: 104,  Quarter: 34
Standard-errors: IID 
               Estimate Std. Error  t value Pr(>|t|) 
treatment:post -0.87331   0.645671 -1.35256   0.1763 
... 2 variables were removed because of collinearity (treatment and
post)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.75912     Adj. R2: 0.276621
                Within R2: 6.177e-4
R code, Cell 17
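# Matched sample: keep the banks selected by PSM and join weights_info
df_panel <- df_out %>%
  dplyr::filter(FinInst %in% matched_banks) %>%
  dplyr::left_join(weights_info, by = c("FinInst", "treatment"))

# Matched sample with controls (segment is not included here).
# No weights argument is passed to feols(), so the fit is unweighted.
fe_model4 <- feols(
  roa ~ treatment * post + log_assets + eq_assets + loans_assets +
    dep_assets + TD + state | FinInst + Quarter,
  data = df_panel
)
summary(fe_model4)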
Output
The variables 'treatment', 'post', 'stateAL', 'stateAM', 'stateAP', 'stateBA'
and 21 others have been removed because of collinearity (see $collin.var).

OLS estimation, Dep. Var.: roa
Observations: 3,098
Fixed-effects: FinInst: 104,  Quarter: 34
Standard-errors: IID 
               Estimate Std. Error   t value   Pr(>|t|)    
log_assets      2.24033   0.491480  4.558324 5.3658e-06 ***
eq_assets       7.05519   2.486078  2.837878 4.5724e-03 ** 
loans_assets   -8.16951   2.100127 -3.890008 1.0245e-04 ***
dep_assets     -2.15688   1.981625 -1.088438 2.7649e-01    
TDI            -1.58299   1.610045 -0.983196 3.2559e-01    
stateMG        -6.54715   4.267084 -1.534337 1.2505e-01    
treatment:post -1.00735   0.647378 -1.556051 1.1980e-01    
... 27 variables were removed because of collinearity (treatment, post
and 25 others [full set in $collin.var])
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.69005     Adj. R2: 0.288   
                Within R2: 0.018332

Fixed Effects Results

The models above are difference-in-differences specifications with bank and quarter fixed effects, estimated on the panel dataset. The specification can be written as

$ROA_{it} = \alpha_i + \lambda_t + \beta (Treatment_i \times Post_t) + X_{it}\gamma + \varepsilon_{it}$

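where $\alpha_i$ denotes bank fixed effects, $\lambda_t$ quarter fixed effects, and $X_{it}$ the vector of control variables (omitted in the specifications without controls). The coefficient of interest is $\beta$ on the interaction term $Treatment_i \times Post_t$, which captures whether banks with higher micro exposure experienced a different change in profitability after the onset of COVID (quarters after 2019-12-31). The main effects of $Treatment_i$ and $Post_t$ do not appear separately because they are absorbed by the bank and quarter fixed effects, which is why the estimation output reports them as removed because of collinearity.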
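The absorption of the main effects can be illustrated on simulated data. The following is a minimal sketch in base R, not a re-estimation of the models in this report: the panel dimensions, the variable names `bank` and `quarter`, and the true effect of $-1$ are all assumptions chosen for illustration. It fits the dummy-variable version of the two-way fixed effects model with `lm()` and shows that only the interaction coefficient is identified.

```r
# Simulated balanced panel; all names and parameter values are illustrative.
set.seed(42)
n_banks    <- 40
n_quarters <- 12
true_beta  <- -1

panel <- expand.grid(bank = seq_len(n_banks), quarter = seq_len(n_quarters))
panel$treatment <- as.integer(panel$bank <= n_banks / 2)       # constant within bank
panel$post      <- as.integer(panel$quarter > n_quarters / 2)  # constant within quarter

bank_effect    <- rnorm(n_banks)
quarter_effect <- rnorm(n_quarters)
panel$roa <- bank_effect[panel$bank] +
  quarter_effect[panel$quarter] +
  true_beta * panel$treatment * panel$post +
  rnorm(nrow(panel), sd = 0.5)

# Dummy-variable version of the two-way fixed effects model. The factors are
# listed first so that lm() drops the redundant treatment and post columns.
fit <- lm(roa ~ factor(bank) + factor(quarter) + treatment * post, data = panel)

# treatment and post come back as NA (collinear with the dummies);
# the interaction is estimated.
est <- coef(fit)[c("treatment", "post", "treatment:post")]
print(est)
```

Because `treatment` is a sum of bank dummies and `post` is a sum of quarter dummies, neither adds information once the fixed effects are in the model. Only the interaction varies both across banks and over time.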

In the full sample, the estimated effect is negative and statistically significant at the 0.1% level under the reported IID standard errors. Without additional controls the coefficient is approximately

$\hat{\beta} \approx -2.23$

and remains similar when balance sheet controls are included:

$\hat{\beta} \approx -2.56$

This suggests that banks with higher micro exposure experienced a larger decline in ROA after the shock.

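However, when the analysis is repeated on the matched sample, the estimated effect becomes smaller and is no longer statistically significant at conventional levels ($p \approx 0.18$ without controls, $p \approx 0.12$ with controls). Without controls the coefficient is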

$\hat{\beta} \approx -0.87$

and with controls

$\hat{\beta} \approx -1.01$.

Overall, the fixed effects results are consistent with the earlier findings: once banks with similar observable characteristics are compared, there is no strong evidence that higher micro exposure systematically reduces bank profitability.
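The regression output above reports IID standard errors. In bank panels, errors for the same bank are often correlated over time, and in that case IID standard errors can understate the uncertainty around $\hat{\beta}$. The sketch below illustrates the difference on simulated data in base R. It is not a re-analysis of the IFData panel: the data-generating process, the helper `demean2`, and all parameter values are assumptions made for illustration, and the clustered variance omits small-sample corrections.

```r
# Simulated balanced panel; names and parameter values are illustrative only.
set.seed(7)
n_banks    <- 40
n_quarters <- 12

panel <- expand.grid(bank = seq_len(n_banks), quarter = seq_len(n_quarters))
panel$treatment <- as.integer(panel$bank <= n_banks / 2)
panel$post      <- as.integer(panel$quarter > n_quarters / 2)

# Each bank gets its own persistent post-period shock, which makes the
# errors correlated within a bank over time.
bank_effect <- rnorm(n_banks)
bank_shock  <- rnorm(n_banks)
panel$roa <- bank_effect[panel$bank] -
  1 * panel$treatment * panel$post +
  bank_shock[panel$bank] * panel$post +
  rnorm(nrow(panel), sd = 0.5)

# Two-way within transformation (valid as written for a balanced panel).
demean2 <- function(v, i, t) v - ave(v, i) - ave(v, t) + mean(v)
y <- demean2(panel$roa, panel$bank, panel$quarter)
x <- demean2(panel$treatment * panel$post, panel$bank, panel$quarter)

beta_hat <- sum(x * y) / sum(x^2)
resid    <- y - beta_hat * x

# IID standard error: residual variance divided by the variation in x.
df_resid <- nrow(panel) - (n_banks + n_quarters - 1) - 1
se_iid   <- sqrt(sum(resid^2) / df_resid / sum(x^2))

# Cluster-robust standard error: sum the scores within each bank first.
# No small-sample correction is applied in this sketch.
bank_scores <- tapply(x * resid, panel$bank, sum)
se_cluster  <- sqrt(sum(bank_scores^2)) / sum(x^2)

print(c(beta = beta_hat, se_iid = se_iid, se_cluster = se_cluster))
```

With fixest, standard errors clustered by bank can usually be requested on an already fitted model, for example with `summary(fe_model2, cluster = ~FinInst)`. That call was not run for this report, so its effect on the significance of the full-sample estimates is unknown.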

Summary

| Method | Sample | Specification | Treatment effect |
|---|---|---|---|
| DID | Full sample | No controls | $-3.28$ |
| DID | Full sample | With controls | $-0.01$ |
| DID | Matched sample | No controls | $-0.97$ |
| DID | Matched sample | With controls | $-0.61$ |
| FE | Full sample | No controls | $-2.23$ |
| FE | Full sample | With controls | $-2.56$ |
| FE | Matched sample | No controls | $-0.87$ |
| FE | Matched sample | With controls | $-1.01$ |

The estimates obtained from the full sample are relatively dispersed, likely reflecting the volatility of ROA and the influence of outliers. By contrast, the matched sample produces more consistent results across specifications.
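The difference in dispersion can be quantified directly from the point estimates in the summary table. The short base R sketch below recomputes the mean and the range of the four estimates per sample. The data frame `est` is simply a transcription of the table, and the object names are illustrative.

```r
# Point estimates transcribed from the summary table above.
est <- data.frame(
  method   = rep(c("DID", "FE"), each = 4),
  sample   = rep(rep(c("Full sample", "Matched sample"), each = 2), times = 2),
  controls = rep(c("No", "Yes"), times = 4),
  effect   = c(-3.28, -0.01, -0.97, -0.61, -2.23, -2.56, -0.87, -1.01)
)

# Mean and range (max minus min) of the estimates within each sample.
mean_by_sample  <- tapply(est$effect, est$sample, mean)
range_by_sample <- tapply(est$effect, est$sample, function(v) diff(range(v)))

print(round(mean_by_sample, 3))   # Full sample: -2.02, Matched sample: -0.865
print(round(range_by_sample, 2))  # Full sample: 3.27,  Matched sample: 0.40
```

The full-sample estimates span about 3.3 percentage points, while the matched-sample estimates span 0.4. These are descriptive summaries of four point estimates each and say nothing about their statistical precision.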

Across the matched regressions, the point estimates lie between about $-0.6$ and $-1.0$ percentage points. The fixed effects estimates on the matched sample are not statistically significant at conventional levels, and some confounding factors may remain uncontrolled, so these results are best read as suggestive. They are consistent with banks with higher micro-enterprise exposure experiencing, on average, a decline in return on assets of up to roughly one percentage point following the onset of COVID, but an effect of zero cannot be ruled out.