Objective
The objective of this analysis is to examine whether a higher exposure to micro-sized firms within the corporate loan portfolio affects bank profitability, focusing in particular on the period surrounding the COVID-19 shock.
Lending to very small firms is often considered riskier and more costly for banks due to higher monitoring costs, weaker collateral, and greater default risk compared to lending to larger firms. At the same time, micro-enterprises play an important role in economic development, especially in emerging economies.
This project therefore investigates whether banks with higher micro exposure—defined as the share of lending to micro-sized firms within the total corporate loan portfolio—experienced a different change in return on assets (ROA) during the COVID period compared to banks with lower micro exposure.
Data
The analysis uses bank-level panel data for Brazilian financial institutions.
The original data originate from the Central Bank of Brazil's IFData database:
https://www3.bcb.gov.br/ifdata/
For convenience, the dataset used in this project was downloaded from the following public repository, which compiles and organizes the IFData bank-level information:
https://github.com/dkgaraujo/brazilianbanks
The dataset contains quarterly financial statement and balance sheet information for Brazilian banks starting in 2014 Q1. This analysis uses quarters up to 2022 Q2.
Methodology
To estimate the causal effect of micro-enterprise lending exposure on profitability, the analysis combines several empirical methods commonly used in applied econometrics:
- Difference-in-Differences (DiD) to compare changes in profitability over time between banks with high and low micro exposure.
- Propensity Score Matching (PSM) to construct a control group of banks with similar observable characteristics.
- Fixed Effects Panel Regression to control for unobserved, time-invariant bank-specific heterogeneity.
By combining matching with panel regressions, the analysis attempts to reduce selection bias and isolate the impact of micro-enterprise lending on bank profitability.
library(tidyverse)
library(fixest)
library(MatchIt)
load("brazilian_banks_201403_onwards.rda")
# ============================================================
# Data preparation function
# ------------------------------------------------------------
# Purpose:
# Create a clean panel dataset for the corporate lending analysis.
#
# Main steps:
# 1. Construct Micro exposure variables
# 2. Apply basic sample restrictions
# 3. Drop observations with missing values in key variables
# 4. Create transformed balance-sheet and profitability measures
# 5. Define treatment status using micro exposure in a pre-treatment quarter
# 6. Return the final estimation dataset
# ============================================================
make_data <- function(
pre_q_char, # e.g. "2019-12-31"
data_raw = brazilian_banks_201403_onwards,
EMP_COL = NULL
) {
# ----------------------------------------------------------
# 0. Convert the reference quarter to Date format
# and set the last allowed quarter in the sample
# ----------------------------------------------------------
pre_q <- as.Date(pre_q_char)
last_q <- as.Date("2022-06-30")
# ----------------------------------------------------------
# 1. Construct lending variables and apply initial filtering
# ----------------------------------------------------------
data <- data_raw %>%
dplyr::mutate(
micro = Credit_portfolio_of_micro_sized_borrower,
small = Credit_portfolio_of_small_sized_borrower,
med = Credit_portfolio_of_medium_sized_borrower,
large = Credit_portfolio_of_large_sized_borrower
) %>%
dplyr::filter(
!is.na(Legal_Person_Loans_Total),
Legal_Person_Loans_Total != 0,
Quarter <= last_q
) %>%
dplyr::mutate(
num = dplyr::coalesce(micro, 0),
denom = dplyr::coalesce(micro, 0) +
dplyr::coalesce(small, 0) +
dplyr::coalesce(med, 0) +
dplyr::coalesce(large, 0),
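# Micro exposure = micro-borrower credit / total loans to legal persons.
# 'denom' (sum of the four size buckets) is only a cross-check and is not used below.
micro_exposure = num / Legal_Person_Loans_Total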
)
# ----------------------------------------------------------
# 2. Keep only observations with non-missing values in the
# variables required for the empirical analysis
# ----------------------------------------------------------
needed_vars <- c(
"FinInst",
"Quarter",
"Net_Income_qtr",
"Total_Assets",
"Equity",
"Total_Deposits",
"Legal_Person_Loans_Total",
"Income_Statement__Other_Operating_Income_and_Expenses__Administrative_Expenses_qtr",
"TD",
"Headquarters___State",
"Segment",
"micro_exposure"
)
if (!is.null(EMP_COL)) {
needed_vars <- c(needed_vars, EMP_COL)
}
needed_vars <- intersect(needed_vars, names(data))
data_clean <- data %>%
dplyr::filter(
dplyr::if_all(dplyr::all_of(needed_vars), ~ !is.na(.))
)
# ----------------------------------------------------------
# 3. Create transformed variables used in the regressions
# ----------------------------------------------------------
df_final <- data_clean %>%
dplyr::mutate(
log_assets = log(Total_Assets),
eq_assets = Equity / Total_Assets,
dep_assets = Total_Deposits / Total_Assets,
loans_assets = Legal_Person_Loans_Total / Total_Assets,
adminexp_assets = -Income_Statement__Other_Operating_Income_and_Expenses__Administrative_Expenses_qtr / Total_Assets,
state = as.factor(Headquarters___State),
segment = as.factor(Segment),
TD = as.factor(TD),
roa = 400 * Net_Income_qtr / Total_Assets
) %>%
dplyr::arrange(Quarter, FinInst)
# ----------------------------------------------------------
# 4. Define treatment status based on micro exposure in the
# chosen pre-treatment quarter
#
# Control group: micro exposure < 5%
# Treated group: micro exposure >= 50%
# Middle group: dropped
# ----------------------------------------------------------
df_pre <- df_final %>%
dplyr::filter(Quarter == pre_q) %>%
dplyr::mutate(
treatment = dplyr::case_when(
micro_exposure < 0.05 ~ 0L,
micro_exposure >= 0.50 ~ 1L,
TRUE ~ NA_integer_
)
)
selected_banks <- df_pre %>%
dplyr::filter(!is.na(treatment)) %>%
dplyr::distinct(FinInst, treatment)
# ----------------------------------------------------------
# 5. Merge treatment labels back to the full panel and keep
# only the variables needed for the final dataset
# ----------------------------------------------------------
df_out <- df_final %>%
dplyr::inner_join(selected_banks, by = "FinInst") %>%
dplyr::select(
FinInst,
Quarter,
treatment,
log_assets,
eq_assets,
dep_assets,
loans_assets,
adminexp_assets,
roa,
state,
segment,
TD,
micro_exposure,
dplyr::any_of(EMP_COL)
) %>%
dplyr::arrange(Quarter, FinInst)
return(df_out)
}
# Create the final dataset using 2019 Q4
df_out <- make_data(pre_q_char = "2019-12-31")
Data Preparation
The code above constructs the panel dataset used in the empirical analysis.
First, the raw supervisory data are processed to construct the key variable of interest: micro exposure. This variable measures the share of lending to micro-sized firms within the total corporate loan portfolio of each bank. It is calculated as the ratio of the credit portfolio of micro-sized borrowers to total loans to legal persons.
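The ratio can be illustrated with a minimal base-R sketch. The helper `micro_share` is hypothetical and does not appear in the original code. Missing micro-borrower credit is treated as zero, which mirrors `coalesce(micro, 0)`. A missing or zero denominator returns `NA`, whereas `make_data()` drops such rows.

```r
# Hypothetical helper: share of micro-borrower credit in total corporate loans.
# Missing micro credit counts as zero (like coalesce(micro, 0));
# missing or zero denominators give NA (make_data() filters these rows out).
micro_share <- function(micro, corporate_total) {
  micro <- ifelse(is.na(micro), 0, micro)
  ifelse(is.na(corporate_total) | corporate_total == 0,
         NA_real_,
         micro / corporate_total)
}

# Toy example with made-up numbers
micro_share(c(30, NA, 5), c(100, 80, 0))
```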
Several filtering steps are then applied. Observations are removed if total corporate lending (Legal_Person_Loans_Total) is missing or equal to zero, and the sample is restricted to quarters up to 2022 Q2. In addition, observations with missing values in the main balance sheet and institutional variables are dropped. The variables required to remain non-missing include:
- FinInst
- Quarter
- Net_Income_qtr
- Total_Assets
- Equity
- Total_Deposits
- Legal_Person_Loans_Total
- Income_Statement__Other_Operating_Income_and_Expenses__Administrative_Expenses_qtr
- TD
- Headquarters___State
- Segment
- micro_exposure
Next, several transformed variables used in the econometric models are created:
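- Bank size: log_assets = log(Total_Assets)
- Capitalization: eq_assets = Equity / Total_Assets
- Funding structure: dep_assets = Total_Deposits / Total_Assets
- Loan intensity: loans_assets = Legal_Person_Loans_Total / Total_Assets
- Administrative costs: adminexp_assets = -Administrative_Expenses / Total_Assets, where Administrative_Expenses is short for the long income statement variable listed above. The minus sign makes higher values indicate higher costs.
- Profitability: roa = 400 × Net_Income_qtr / Total_Assets, i.e. quarterly net income annualized (× 4) and expressed in percent (× 100)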
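The scaling of ROA is easiest to see with a worked example. The sketch below uses a hypothetical helper, `roa_annualized`, and made-up numbers. It shows that the factor 400 converts a quarterly income ratio into an annual percentage.

```r
# Hypothetical helper: annualized ROA in percent from quarterly net income.
# 400 = 4 quarters x 100 (percent), as in roa = 400 * Net_Income_qtr / Total_Assets.
roa_annualized <- function(net_income_qtr, total_assets) {
  400 * net_income_qtr / total_assets
}

# Toy example: quarterly net income of 2.5 on total assets of 1,000
roa_annualized(2.5, 1000)   # 1, i.e. an annualized ROA of 1%
```

The treatment effects reported later are therefore in percentage points of annualized ROA.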
Institutional characteristics are also converted to categorical variables, including the bank's state of headquarters, business segment, and institution type (TD).
Finally, banks are classified into treatment groups based on their micro exposure in the pre-treatment quarter (2019 Q4). Banks with very low exposure (micro_exposure < 5%) form the control group, while banks with very high exposure (micro_exposure ≥ 50%) form the treated group. Banks with intermediate exposure are excluded in order to create a clear contrast between low- and high-exposure institutions.
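The classification rule can be sketched in base R as follows. The helper `assign_treatment` is hypothetical and reproduces the `case_when()` thresholds in `make_data()`: below 5% is control, 50% or more is treated, and anything in between is `NA` and therefore dropped.

```r
# Hypothetical helper mirroring the case_when() rule in make_data():
# 0 = control (exposure < 5%), 1 = treated (exposure >= 50%), NA = dropped.
assign_treatment <- function(micro_exposure) {
  out <- rep(NA_integer_, length(micro_exposure))
  out[!is.na(micro_exposure) & micro_exposure < 0.05] <- 0L
  out[!is.na(micro_exposure) & micro_exposure >= 0.50] <- 1L
  out
}

assign_treatment(c(0.01, 0.05, 0.30, 0.50, 0.92))
# 0 NA NA 1 1
```

Banks that are not observed in 2019 Q4 receive no label. The `inner_join()` in `make_data()` therefore removes them from the whole panel.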
The resulting dataset is an unbalanced panel of banks (the number of banks observed varies across quarters). It contains treatment status, financial characteristics, and profitability measures for use in the subsequent matching and difference-in-differences analysis.
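You can check directly whether a panel is balanced: every bank must appear exactly once in every quarter. The sketch below uses a hypothetical helper, `is_balanced_panel`, on toy data. With the real data it would be called as `is_balanced_panel(df_out$FinInst, df_out$Quarter)`.

```r
# Hypothetical helper: TRUE if every unit is observed exactly once per period.
is_balanced_panel <- function(id, period) {
  counts <- table(id, period)
  all(counts == 1)
}

# Toy panel: bank "B" is missing in the second quarter
id     <- c("A", "A", "B")
period <- as.Date(c("2019-12-31", "2020-03-31", "2019-12-31"))
is_balanced_panel(id, period)   # FALSE
```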
df_out %>%
group_by(Quarter, treatment) %>%
summarise(n = n(), .groups = "drop") %>%
tidyr::pivot_wider(
names_from = treatment,
values_from = n,
names_prefix = "treat_"
)
| Quarter | treat_0 | treat_1 |
|---|---|---|
| <date> | <int> | <int> |
| 2014-03-31 | 141 | 79 |
| 2014-06-30 | 145 | 82 |
| 2014-09-30 | 146 | 88 |
| 2014-12-31 | 149 | 87 |
| 2015-03-31 | 148 | 86 |
| 2015-06-30 | 148 | 87 |
| 2015-09-30 | 152 | 89 |
| 2015-12-31 | 149 | 89 |
| 2016-03-31 | 152 | 94 |
| 2016-06-30 | 151 | 96 |
| 2016-09-30 | 156 | 98 |
| 2016-12-31 | 158 | 99 |
| 2017-03-31 | 163 | 99 |
| 2017-06-30 | 160 | 94 |
| 2017-09-30 | 164 | 99 |
| 2017-12-31 | 159 | 89 |
| 2018-03-31 | 165 | 100 |
| 2018-06-30 | 164 | 95 |
| 2018-09-30 | 172 | 101 |
| 2018-12-31 | 174 | 100 |
| 2019-03-31 | 175 | 104 |
| 2019-06-30 | 178 | 95 |
| 2019-09-30 | 182 | 108 |
| 2019-12-31 | 186 | 108 |
| 2020-03-31 | 182 | 106 |
| 2020-06-30 | 175 | 103 |
| 2020-09-30 | 176 | 100 |
| 2020-12-31 | 168 | 98 |
| 2021-03-31 | 170 | 97 |
| 2021-06-30 | 158 | 83 |
| 2021-09-30 | 167 | 94 |
| 2021-12-31 | 154 | 78 |
| 2022-03-31 | 163 | 94 |
| 2022-06-30 | 165 | 83 |
Sample Size Over Time
The code above reports the number of banks in the control and treated groups in each quarter. The control group (low micro exposure) contains between 141 and 186 banks per quarter. The treated group (high micro exposure) contains between 78 and 108 banks.
The counts vary from quarter to quarter as banks enter or leave the sample or have missing values in key variables. Even so, the sample composition remains relatively stable across the period.
plot_means <- function(
data,
varname,
shock_date = as.Date("2019-12-31"),
pretty_name = NULL
) {
if (!("Quarter" %in% names(data))) stop("The dataset must contain a 'Quarter' column.")
if (!("treatment" %in% names(data))) stop("The dataset must contain a 'treatment' column.")
if (!(varname %in% names(data))) stop(paste0("Variable '", varname, "' not found."))
if (is.null(pretty_name)) pretty_name <- varname
df_plot <- data %>%
dplyr::group_by(Quarter, treatment) %>%
dplyr::summarise(
mean_value = mean(.data[[varname]], na.rm = TRUE),
.groups = "drop"
) %>%
dplyr::mutate(
treatment = factor(treatment, levels = c(0, 1), labels = c("Control", "Treated"))
)
p <- ggplot2::ggplot(
df_plot,
ggplot2::aes(
x = Quarter,
y = mean_value,
color = treatment,
group = treatment
)
) +
ggplot2::geom_line(linewidth = 0.9) +
ggplot2::geom_point(size = 1.8) +
ggplot2::geom_vline(
xintercept = shock_date,
linetype = "dashed",
color = "black",
linewidth = 0.6
) +
ggplot2::scale_x_date(
date_breaks = "1 year",
date_labels = "%Y"
) +
ggplot2::labs(
title = pretty_name,
x = NULL,
y = NULL,
color = NULL
) +
ggplot2::theme_minimal(base_size = 12) +
ggplot2::theme(
plot.title = ggplot2::element_text(face = "bold", hjust = 0.5),
axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
legend.position = "bottom",
panel.grid.minor = ggplot2::element_blank()
)
return(p)
}
library(patchwork)
vars <- c(
"roa",
"log_assets",
"eq_assets",
"dep_assets",
"loans_assets",
"adminexp_assets"
)
plots <- lapply(vars, function(v) {
plot_means(df_out, v, pretty_name = v)
})
combined_plot <- wrap_plots(plots, ncol = 2) +
plot_layout(guides = "collect") &
theme(legend.position = "bottom")
combined_plot
ggplot2::ggsave(
"bank_variables_over_time.png",
plot = combined_plot,
width = 14,
height = 12,
dpi = 300
)
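[Figure: quarterly means of roa, log_assets, eq_assets, dep_assets, loans_assets and adminexp_assets for control and treated banks; dashed line at 2019 Q4. Saved as bank_variables_over_time.png.]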
Preliminary Patterns
The figures show that bank profitability (ROA) declines after the onset of the COVID period. At the same time, the other continuous variables — including bank size, capitalization, deposit funding, loan intensity, and administrative expenses — also evolve over time and do not move identically across the two groups.
This suggests that the decline in ROA cannot be attributed to micro exposure alone. Part of the difference in profitability may also be related to other bank characteristics, which motivates the use of a more formal empirical strategy in the next step.
# ============================================================
# Construct a pre–post dataset
# ------------------------------------------------------------
# The function extracts bank characteristics in a pre-period
# and combines them with profitability in a later post-period.
# It then computes the change in ROA between the two periods.
# ============================================================
make_pre_post_df <- function(df, pre_q_char, post_q_char) {
pre_q <- as.Date(pre_q_char)
post_q <- as.Date(post_q_char)
# --- 1. Data from the pre-treatment quarter ---
df_pre <- df %>%
dplyr::filter(Quarter == pre_q)
# --- 2. ROA from the post-treatment quarter ---
df_post <- df %>%
dplyr::filter(Quarter == post_q) %>%
dplyr::select(FinInst, roa_post = roa)
# --- 3. Merge the two datasets by bank ---
df_out <- df_pre %>%
dplyr::inner_join(df_post, by = "FinInst") %>%
dplyr::mutate(
roa_diff = roa_post - roa # change in ROA between periods
)
return(df_out)
}
# Create the pre–post dataset
pre_post_df <- make_pre_post_df(
df_out,
pre_q_char = "2019-12-31",
post_q_char = "2021-09-30"
)
# ============================================================
# Collapse small categorical groups
# ------------------------------------------------------------
# Categories with very few observations are grouped into
# an "Other" category to avoid extremely small cells.
# ============================================================
collapse_categories <- function(x, min_n = 5) {
tab <- table(x)
valid_levels <- names(tab)[tab >= min_n]
x_new <- ifelse(x %in% valid_levels, as.character(x), "Other")
factor(x_new)
}
# Apply the collapsing to categorical variables
pre_post_df <- pre_post_df %>%
dplyr::mutate(
state = collapse_categories(state, min_n = 5),
segment = collapse_categories(segment, min_n = 5)
)
Pre–Post Dataset Construction
For the difference-in-differences analysis, one pre-treatment quarter and one post-treatment quarter are selected. Bank characteristics are taken from the pre-treatment quarter (2019 Q4), and profitability is also measured in the post-treatment quarter (2021 Q3). This gives, for each bank observed in both quarters, the change in return on assets (ROA) between the two periods.
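The construction performed by `make_pre_post_df()` can be reproduced in base R on a toy panel. The bank identifiers and ROA values below are made up. `merge()` acts as an inner join, so a bank missing in either quarter drops out, as it does with `dplyr::inner_join()`.

```r
# Toy panel with made-up ROA values for two banks and two quarters
panel <- data.frame(
  FinInst = c("A", "A", "B", "B"),
  Quarter = as.Date(c("2019-12-31", "2021-09-30", "2019-12-31", "2021-09-30")),
  roa     = c(1.2, 0.4, 2.0, 2.5)
)

pre  <- panel[panel$Quarter == as.Date("2019-12-31"), ]
post <- panel[panel$Quarter == as.Date("2021-09-30"), c("FinInst", "roa")]
names(post)[names(post) == "roa"] <- "roa_post"

# Inner join by bank, then the change in ROA (post minus pre)
pre_post <- merge(pre, post, by = "FinInst")
pre_post$roa_diff <- pre_post$roa_post - pre_post$roa
pre_post[, c("FinInst", "roa", "roa_post", "roa_diff")]
```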
In addition, rare categories in the state and segment variables are grouped into an "Other" category to avoid very small groups in the subsequent analysis.
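A toy run of `collapse_categories()` (copied from the code above) shows the effect. The state labels and counts are made up. Levels with fewer than `min_n` observations are merged into "Other".

```r
# Same function as in the notebook
collapse_categories <- function(x, min_n = 5) {
  tab <- table(x)
  valid_levels <- names(tab)[tab >= min_n]
  x_new <- ifelse(x %in% valid_levels, as.character(x), "Other")
  factor(x_new)
}

# Made-up states: SP (6) and RJ (5) are kept, AC (1) and AP (2) become "Other"
state <- factor(c(rep("SP", 6), rep("RJ", 5), "AC", "AP", "AP"))
table(collapse_categories(state, min_n = 5))
# Other: 3, RJ: 5, SP: 6
```

One design caveat: if a genuine level were already called "Other", it would be pooled with the collapsed levels.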
did_simple <- lm(roa_diff ~ treatment, data = pre_post_df)
summary(did_simple)
Call:
lm(formula = roa_diff ~ treatment, data = pre_post_df)
Residuals:
Min 1Q Median 3Q Max
-79.447 -1.167 0.882 3.429 46.447
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7000 0.8802 -0.795 0.4271
treatment -3.2815 1.4666 -2.237 0.0261 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.37 on 259 degrees of freedom
Multiple R-squared: 0.01896, Adjusted R-squared: 0.01517
F-statistic: 5.006 on 1 and 259 DF, p-value: 0.02611
# Extended specification with bank characteristics and institutional controls
did_controls <- lm(roa_diff ~ treatment + segment + log_assets + eq_assets + TD + loans_assets + dep_assets + state, data = pre_post_df)
summary(did_controls)
Call:
lm(formula = roa_diff ~ treatment + segment + log_assets + eq_assets +
TD + loans_assets + dep_assets + state, data = pre_post_df)
Residuals:
Min 1Q Median 3Q Max
-65.986 -2.581 0.212 2.697 49.589
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -37.395958 12.390427 -3.018 0.00282 **
treatment -0.010944 2.055587 -0.005 0.99576
segment198 -2.819954 3.745939 -0.753 0.45231
segment199 -2.553810 3.235691 -0.789 0.43074
segment9 -0.004881 2.567771 -0.002 0.99848
segmentOther -2.217640 5.650745 -0.392 0.69508
log_assets 1.584399 0.502393 3.154 0.00182 **
eq_assets -0.327503 4.125805 -0.079 0.93680
TDI 3.889095 2.703594 1.438 0.15161
loans_assets 1.594258 0.918024 1.737 0.08375 .
dep_assets 1.590948 3.082658 0.516 0.60627
stateDF 0.883257 6.191549 0.143 0.88668
stateES 3.209862 5.456761 0.588 0.55693
stateGO 3.651265 5.323461 0.686 0.49346
stateMG 1.425543 3.980504 0.358 0.72056
stateOther -0.021271 4.427790 -0.005 0.99617
statePB 1.200158 6.119028 0.196 0.84467
statePR -3.406259 4.613399 -0.738 0.46104
stateRJ 0.853528 4.423344 0.193 0.84715
stateRO 2.842554 5.143940 0.553 0.58105
stateRS 1.811756 4.137031 0.438 0.66183
stateSC 4.419439 3.983573 1.109 0.26837
stateSP 1.276674 3.784586 0.337 0.73616
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.21 on 238 degrees of freedom
Multiple R-squared: 0.1246, Adjusted R-squared: 0.04367
F-statistic: 1.54 on 22 and 238 DF, p-value: 0.06223
Difference-in-Differences Estimation
To estimate the relationship between micro exposure and the change in profitability, a simple difference-in-differences style regression is estimated. The outcome variable is the change in return on assets between the pre- and post-period:
$\Delta ROA_i = ROA_{i,post} - ROA_{i,pre}$
Baseline specification
The simplest specification compares the change in profitability between treated and control banks:
$\Delta ROA_i = \alpha + \beta \, Treatment_i + \varepsilon_i$
where $Treatment_i$ equals 1 for banks with high micro exposure and 0 for banks with low micro exposure.
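A small simulation makes the difference-in-differences interpretation of $\beta$ concrete. The data and effect sizes below are made up. With a binary treatment, the OLS coefficient in the $\Delta ROA$ regression equals the difference in mean ROA changes between treated and control banks. In a balanced two-period panel, it also equals the treatment × post interaction in a stacked regression.

```r
set.seed(1)
n <- 40
treatment <- rep(c(0, 1), each = n / 2)
roa_pre   <- rnorm(n, mean = 2)
# Made-up data: common change -0.5, extra change -1 for treated banks
roa_post  <- roa_pre - 0.5 - 1 * treatment + rnorm(n, sd = 0.3)
roa_diff  <- roa_post - roa_pre

# (1) Regression of the first difference on treatment
fd <- lm(roa_diff ~ treatment)

# (2) Stacked two-period data with a treatment x post interaction
stacked <- data.frame(
  roa       = c(roa_pre, roa_post),
  treatment = rep(treatment, 2),
  post      = rep(c(0, 1), each = n)
)
did <- lm(roa ~ treatment * post, data = stacked)

# (3) Difference in mean ROA changes between the two groups
dd <- mean(roa_diff[treatment == 1]) - mean(roa_diff[treatment == 0])

c(first_difference    = coef(fd)[["treatment"]],
  stacked_interaction = coef(did)[["treatment:post"]],
  diff_in_means       = dd)
```

All three point estimates coincide. The standard errors differ, however, because the stacked regression ignores the correlation between a bank's pre and post observations.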
The estimated coefficient is negative and statistically significant at the 5% level (p ≈ 0.026):
$\hat{\beta} \approx -3.28$
This suggests that banks with high micro exposure experienced a larger decline in profitability between the pre- and post-period, by about 3.3 percentage points of annualized ROA.
Specification with controls
To account for differences in bank characteristics, the regression is extended with control variables:
$\Delta ROA_i = \alpha + \beta \, Treatment_i + \gamma_1 \log(Assets_i) + \gamma_2 EqAssets_i + \gamma_3 LoansAssets_i + \gamma_4 DepAssets_i + \delta_{Segment_i} + \delta_{State_i} + \delta_{TD_i} + \varepsilon_i$
After controlling for bank characteristics and institutional factors, the treatment coefficient falls to $\hat{\beta} \approx -0.01$ and is statistically insignificant (p ≈ 0.996).
library(cobalt)
psm_seg <- matchit(
treatment ~ log_assets + eq_assets + dep_assets +
loans_assets + adminexp_assets,
data = pre_post_df,
method = "nearest",
ratio = 5,
caliper = 0.05,
distance = "logit"
)
summary(psm_seg)$sum.matched
love.plot(psm_seg, binary = "std")
Warning message: "glm.fit: fitted probabilities numerically 0 or 1 occurred"
Warning message: "Not all treated units will get 5 matches."
| | Means Treated | Means Control | Std. Mean Diff. | Var. Ratio | eCDF Mean | eCDF Max | Std. Pair Dist. |
|---|---|---|---|---|---|---|---|
| distance | 0.50287199 | 0.49788463 | 0.020489572 | 1.0330588 | 0.009203805 | 0.05263158 | 0.03056565 |
| log_assets | 18.14627799 | 18.22556895 | -0.041620399 | 1.6681495 | 0.047146602 | 0.15526316 | 0.73568112 |
| eq_assets | 0.37605038 | 0.35898041 | 0.059821078 | 1.3921704 | 0.043456342 | 0.12105263 | 0.82959483 |
| dep_assets | 0.29396987 | 0.31787599 | -0.086338510 | 0.8662411 | 0.062380007 | 0.15570175 | 1.06369353 |
| loans_assets | 0.09906939 | 0.09844823 | 0.004687441 | 1.1059185 | 0.055059824 | 0.16096491 | 1.32801049 |
| adminexp_assets | 0.01634512 | 0.01627290 | 0.002807322 | 1.3650929 | 0.045992136 | 0.12324561 | 0.59399229 |
[Figure: Love plot of standardized mean differences of the covariates before and after matching.]
Propensity Score Matching
To improve the comparability of treated and control banks, a propensity score matching (PSM) procedure is applied. The propensity score is estimated using a logistic model where treatment status is explained by observable bank characteristics:
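$\Pr(Treatment_i = 1 \mid X_i) = \Lambda(\theta_0 + \theta_1 \log(Assets_i) + \theta_2 EqAssets_i + \theta_3 DepAssets_i + \theta_4 LoansAssets_i + \theta_5 AdminExpAssets_i)$
where $\Lambda(\cdot)$ is the logistic function.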
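Treated banks are then matched to similar control banks using nearest-neighbour matching without replacement. Each treated bank receives up to 5 control banks, subject to a caliper of 0.05 that rules out poor matches; by default, MatchIt measures the caliper in standard deviations of the propensity score. As the warnings above indicate, not all treated banks obtain 5 matches. The glm.fit warning also shows that some banks have estimated propensity scores of essentially 0 or 1, which points to limited overlap between the two groups.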
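The matching step can be illustrated with a simplified base-R sketch on simulated data. The covariates `x1` and `x2` and the helper `greedy_match` are hypothetical. The sketch estimates a logit propensity score, then greedily assigns up to `k` of the nearest unused controls to each treated unit, within a caliper expressed in standard deviations of the score. MatchIt's `method = "nearest"` follows the same idea but differs in details such as the matching order (its `m.order` option), so results would not be identical.

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + 0.8 * x1 - 0.5 * x2))
dat <- data.frame(treat, x1, x2)

# 1. Propensity score from a logit model
ps_fit <- glm(treat ~ x1 + x2, family = binomial(link = "logit"), data = dat)
dat$ps <- fitted(ps_fit)

# 2. Greedy 1:k nearest-neighbour matching without replacement, with a
#    caliper of caliper_sd standard deviations of the propensity score.
#    Treated units are processed in data order.
greedy_match <- function(ps, treat, k = 5, caliper_sd = 0.05) {
  cal <- caliper_sd * sd(ps)
  available <- which(treat == 0)
  matches <- list()
  for (i in which(treat == 1)) {
    dist <- abs(ps[available] - ps[i])
    cand <- available[dist <= cal]             # controls inside the caliper
    cand <- cand[order(abs(ps[cand] - ps[i]))] # nearest first
    chosen <- head(cand, k)
    if (length(chosen) > 0) {
      matches[[as.character(i)]] <- chosen
      available <- setdiff(available, chosen)  # no reuse of controls
    }
  }
  matches
}

m <- greedy_match(dat$ps, dat$treat, k = 5, caliper_sd = 0.05)
length(m)          # treated units with at least one match
table(lengths(m))  # number of controls each matched treated unit received
```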
The balance of the covariates after matching is evaluated using the standardized mean differences, which are visualized in the Love plot. The goal of the matching procedure is to ensure that treated and control banks have similar observable characteristics before comparing profitability outcomes.
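The standardized mean difference (SMD) can be computed with the sketch below. The helper `smd` is hypothetical and standardizes by the pooled SD $\sqrt{(s_T^2 + s_C^2)/2}$. MatchIt and cobalt may use a different denominator, for example the treated-group SD when the estimand is the ATT, so their numbers need not match. In the matched table above, every covariate has an absolute SMD below 0.1, a commonly used rule-of-thumb threshold.

```r
# Hypothetical helper: (weighted) standardized mean difference for one covariate.
# Denominator: pooled unweighted SD sqrt((var_t + var_c) / 2).
smd <- function(x, treat, w = rep(1, length(x))) {
  m_t <- weighted.mean(x[treat == 1], w[treat == 1])
  m_c <- weighted.mean(x[treat == 0], w[treat == 0])
  s   <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m_t - m_c) / s
}

x     <- c(1, 2, 3, 4, 2, 3, 4, 5)
treat <- c(1, 1, 1, 1, 0, 0, 0, 0)
smd(x, treat)   # -0.7745967
```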
Categorical variables such as bank segment, state, and institution type were also considered in the matching procedure. However, including these variables made it difficult to obtain a stable and well-balanced matching sample. Given that previous exploratory analysis and regression results did not indicate that these variables play a major role in explaining profitability differences, the final matching specification focuses on continuous balance sheet characteristics.
matched_df <- match.data(psm_seg)
model_ate <- lm(roa_diff ~ treatment, data = matched_df, weights = weights)
summary(model_ate)
Call:
lm(formula = roa_diff ~ treatment, data = matched_df, weights = weights)
Weighted Residuals:
Min 1Q Median 3Q Max
-80.421 -0.641 1.579 3.656 28.163
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.037 1.584 -1.286 0.201
treatment -0.970 2.620 -0.370 0.712
Residual standard error: 12.86 on 102 degrees of freedom
Multiple R-squared: 0.001342, Adjusted R-squared: -0.008448
F-statistic: 0.1371 on 1 and 102 DF, p-value: 0.7119
model_ate <- lm(roa_diff ~ treatment + log_assets + eq_assets + loans_assets + dep_assets, data = matched_df, weights = weights)
summary(model_ate)
Call:
lm(formula = roa_diff ~ treatment + log_assets + eq_assets +
loans_assets + dep_assets, data = matched_df, weights = weights)
Weighted Residuals:
Min 1Q Median 3Q Max
-60.269 -5.496 -0.970 4.548 26.936
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -76.5227 19.2285 -3.980 0.000132 ***
treatment -0.6139 2.3326 -0.263 0.792968
log_assets 3.7068 0.9258 4.004 0.000121 ***
eq_assets 6.0627 6.5558 0.925 0.357343
loans_assets 23.8734 7.7156 3.094 0.002572 **
dep_assets 7.5525 4.5936 1.644 0.103351
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.44 on 98 degrees of freedom
Multiple R-squared: 0.241, Adjusted R-squared: 0.2023
F-statistic: 6.224 on 5 and 98 DF, p-value: 4.708e-05
Treatment Effect After Matching
After constructing the matched sample, the treatment effect is estimated using weighted regressions that incorporate the matching weights.
The baseline specification estimates:
$\Delta ROA_i = \alpha + \beta \, Treatment_i + \varepsilon_i$
In the matched sample, the estimated treatment coefficient is small and statistically insignificant:
$\hat{\beta} \approx -0.97$
A second specification additionally controls for bank characteristics:
$\Delta ROA_i = \alpha + \beta \, Treatment_i + \gamma_1 \log(Assets_i) + \gamma_2 EqAssets_i + \gamma_3 LoansAssets_i + \gamma_4 DepAssets_i + \varepsilon_i$
In this specification the treatment effect remains close to zero and statistically insignificant:
$\hat{\beta} \approx -0.61$
Overall, the results suggest that once banks are matched on observable characteristics, there is no statistically significant evidence that higher micro exposure systematically reduces bank profitability.
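The baseline weighted regression on the matched sample has a simple interpretation. With a binary treatment, the weighted OLS coefficient equals the difference between the weighted mean ROA changes of the two groups. The toy values and weights below are made up to illustrate this.

```r
# Made-up changes in ROA, treatment labels and matching weights
roa_diff  <- c(-1.0, 0.5, -2.0, -3.0, 1.0, -0.5)
treatment <- c(0, 0, 0, 1, 1, 1)
w         <- c(0.5, 1.0, 1.5, 1, 1, 1)

fit <- lm(roa_diff ~ treatment, weights = w)

# Weighted difference in group means
wdiff <- weighted.mean(roa_diff[treatment == 1], w[treatment == 1]) -
  weighted.mean(roa_diff[treatment == 0], w[treatment == 0])

c(lm = coef(fit)[["treatment"]], weighted_diff = wdiff)
```

The standard errors from `lm()` here are conventional OLS ones. After matching, cluster-robust standard errors by matched set are often recommended instead.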
plot_matched_quarterly_means <- function(
df_out,
matched_df,
varname,
shock_date = as.Date("2019-12-31"),
pretty_name = NULL
) {
# ----------------------------------------------------------
# 0. Basic checks
# ----------------------------------------------------------
if (!(varname %in% names(df_out))) {
stop(paste("Variable", varname, "not found in df_out."))
}
if (!("weights" %in% names(matched_df))) {
stop("matched_df must contain a 'weights' column.")
}
if (!("FinInst" %in% names(matched_df))) {
stop("matched_df must contain a 'FinInst' column.")
}
if (!("treatment" %in% names(matched_df))) {
stop("matched_df must contain a 'treatment' column.")
}
if (!("Quarter" %in% names(df_out))) {
stop("df_out must contain a 'Quarter' column.")
}
if (!("treatment" %in% names(df_out))) {
stop("df_out must contain a 'treatment' column.")
}
if (is.null(pretty_name)) pretty_name <- varname
# ----------------------------------------------------------
# 1. Keep only matched banks
# ----------------------------------------------------------
matched_banks <- unique(matched_df$FinInst)
# ----------------------------------------------------------
# 2. Extract treatment and matching weights
# ----------------------------------------------------------
weights_info <- matched_df %>%
dplyr::select(FinInst, treatment, weights)
# ----------------------------------------------------------
# 3. Rebuild the full panel for matched banks only
# ----------------------------------------------------------
df_panel <- df_out %>%
dplyr::filter(FinInst %in% matched_banks) %>%
dplyr::left_join(weights_info, by = c("FinInst", "treatment"))
# ----------------------------------------------------------
# 4. Compute weighted quarterly means
# ----------------------------------------------------------
df_plot <- df_panel %>%
dplyr::group_by(Quarter, treatment) %>%
dplyr::summarise(
weighted_mean = weighted.mean(.data[[varname]], weights, na.rm = TRUE),
.groups = "drop"
) %>%
dplyr::mutate(
treatment = factor(treatment, levels = c(0, 1), labels = c("Control", "Treated"))
)
# ----------------------------------------------------------
# 5. Plot
# ----------------------------------------------------------
p <- ggplot2::ggplot(
df_plot,
ggplot2::aes(
x = Quarter,
y = weighted_mean,
color = treatment,
group = treatment
)
) +
ggplot2::geom_line(linewidth = 0.9) +
ggplot2::geom_point(size = 1.8) +
ggplot2::geom_vline(
xintercept = shock_date,
linetype = "dashed",
color = "black",
linewidth = 0.6
) +
ggplot2::scale_x_date(
date_breaks = "1 year",
date_labels = "%Y"
) +
ggplot2::labs(
title = pretty_name,
x = NULL,
y = NULL,
color = NULL
) +
ggplot2::theme_minimal(base_size = 12) +
ggplot2::theme(
plot.title = ggplot2::element_text(face = "bold", hjust = 0.5),
axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
legend.position = "bottom",
panel.grid.minor = ggplot2::element_blank()
)
return(p)
}
library(patchwork)
pretty_names <- c(
roa = "ROA",
log_assets = "Log assets",
eq_assets = "Equity / Assets",
dep_assets = "Deposits / Assets",
loans_assets = "Loans / Assets",
adminexp_assets = "Administrative expenses / Assets"
)
vars <- names(pretty_names)
plots <- lapply(vars, function(v) {
plot_matched_quarterly_means(
df_out = df_out,
matched_df = matched_df,
varname = v,
pretty_name = pretty_names[[v]]
)
})
combined_matched_plot <- wrap_plots(plots, ncol = 2) +
patchwork::plot_layout(guides = "collect") &
ggplot2::theme(legend.position = "bottom")
combined_matched_plot
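[Figure: weighted quarterly means of ROA, Log assets, Equity / Assets, Deposits / Assets, Loans / Assets and Administrative expenses / Assets for matched control and treated banks; dashed line at 2019 Q4.]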
Balance After Matching
The figures show that after matching the treated and control groups follow much more similar paths over time. The main balance sheet variables and profitability measures move broadly together, suggesting that the matching procedure substantially improved comparability between the two groups.
However, the alignment is not perfect and some differences remain. This indicates that while propensity score matching reduces observable differences between banks, it cannot completely eliminate all sources of heterogeneity.
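The weighted quarterly means plotted above can be reproduced in base R. This is a minimal sketch on a made-up two-quarter panel, in which each bank carries its matching weight in every quarter, as in `plot_matched_quarterly_means()`.

```r
# Made-up matched panel: two control banks, two treated banks, two quarters
panel <- data.frame(
  Quarter   = rep(as.Date(c("2019-12-31", "2020-03-31")), each = 4),
  treatment = rep(c(0, 0, 1, 1), 2),
  weights   = rep(c(0.5, 1.5, 1, 1), 2),
  roa       = c(2, 4, 1, 3, 1, 3, 0, 1)
)

# Weighted mean ROA for each quarter x treatment cell
wm <- sapply(split(panel, list(panel$Quarter, panel$treatment)),
             function(d) weighted.mean(d$roa, d$weights))
wm
```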
Fixed Effects Estimation
As a final step, fixed effects regressions are estimated using the full panel dataset. These models exploit the panel structure of the data and control for unobserved time-invariant bank characteristics.
Four specifications are estimated. Two use the full sample: one without additional controls and one with balance sheet and institutional controls. The same pair is then estimated on the matched sample, which consists of the full quarterly panels of the banks retained by the propensity score matching. The matching weights are not used in these fixed effects regressions.
This results in four models in total, allowing a comparison between unmatched and matched samples as well as between parsimonious and control-rich specifications.
df_out <- df_out %>%
dplyr::mutate(
post = ifelse(Quarter > as.Date("2019-12-31"), 1L, 0L)
)
fe_model1 <- feols(
roa ~ treatment * post + log_assets + eq_assets + loans_assets + dep_assets + state + TD + segment | FinInst + Quarter,
data = df_out
)
summary(fe_model1)
The variables 'treatment', 'post', 'stateAL', 'stateAM', 'stateAP', 'stateBA' and 23 others have been removed because of collinearity (see $collin.var).
OLS estimation, Dep. Var.: roa
Observations: 8,717
Fixed-effects: FinInst: 294, Quarter: 34
Standard-errors: IID
Estimate Std. Error t value Pr(>|t|)
log_assets 4.889480 0.298112 16.401481 < 2.2e-16 ***
eq_assets 17.624027 1.396976 12.615840 < 2.2e-16 ***
loans_assets -0.079003 0.183406 -0.430756 6.6666e-01
dep_assets 1.231668 1.144717 1.075958 2.8198e-01
stateCE 40.188755 3.235005 12.423089 < 2.2e-16 ***
stateMG -4.183376 4.392313 -0.952431 3.4091e-01
stateRS 9.030908 4.312259 2.094241 3.6269e-02 *
TDI 1.262167 0.703956 1.792964 7.3015e-02 .
segment196 10.191690 8.777445 1.161123 2.4563e-01
segment197 3.231386 1.620738 1.993775 4.6209e-02 *
segment198 3.475873 1.497635 2.320907 2.0316e-02 *
treatment:post -2.561198 0.417317 -6.137300 8.7749e-10 ***
... 29 variables were removed because of collinearity (treatment, post
and 27 others [full set in $collin.var])
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 8.20626 Adj. R2: 0.282665
Within R2: 0.063698
fe_model2 <- feols(
roa ~ treatment * post | FinInst + Quarter,
data = df_out
)
summary(fe_model2)
The variables 'treatment' and 'post' have been removed because of collinearity (see $collin.var).
OLS estimation, Dep. Var.: roa
Observations: 8,717
Fixed-effects: FinInst: 294, Quarter: 34
Standard-errors: IID
Estimate Std. Error t value Pr(>|t|)
treatment:post -2.22683 0.426907 -5.21619 1.8701e-07 ***
... 2 variables were removed because of collinearity (treatment and
post)
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 8.46708 Adj. R2: 0.237342
Within R2: 0.003233
# Rebuild the matched panel in the global environment: matched_banks,
# weights_info and df_panel created inside plot_matched_quarterly_means()
# are local to that function. df_out now also contains 'post'.
matched_banks <- unique(matched_df$FinInst)
weights_info <- matched_df %>%
dplyr::select(FinInst, treatment, weights)
df_panel <- df_out %>%
dplyr::filter(FinInst %in% matched_banks) %>%
dplyr::left_join(weights_info, by = c("FinInst", "treatment"))
fe_model3 <- feols(
roa ~ treatment * post | FinInst + Quarter,
data = df_panel
)
summary(fe_model3)
The variables 'treatment' and 'post' have been removed because of collinearity (see $collin.var).
OLS estimation, Dep. Var.: roa
Observations: 3,098
Fixed-effects: FinInst: 104, Quarter: 34
Standard-errors: IID
Estimate Std. Error t value Pr(>|t|)
treatment:post -0.87331 0.645671 -1.35256 0.1763
... 2 variables were removed because of collinearity (treatment and
post)
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.75912 Adj. R2: 0.276621
Within R2: 6.177e-4
fe_model4 <- feols(
roa ~ treatment * post + log_assets + eq_assets + loans_assets + dep_assets + TD + state | FinInst + Quarter,
data = df_panel
)
summary(fe_model4)
The variables 'treatment', 'post', 'stateAL', 'stateAM', 'stateAP', 'stateBA' and 21 others have been removed because of collinearity (see $collin.var).
OLS estimation, Dep. Var.: roa
Observations: 3,098
Fixed-effects: FinInst: 104, Quarter: 34
Standard-errors: IID
Estimate Std. Error t value Pr(>|t|)
log_assets 2.24033 0.491480 4.558324 5.3658e-06 ***
eq_assets 7.05519 2.486078 2.837878 4.5724e-03 **
loans_assets -8.16951 2.100127 -3.890008 1.0245e-04 ***
dep_assets -2.15688 1.981625 -1.088438 2.7649e-01
TDI -1.58299 1.610045 -0.983196 3.2559e-01
stateMG -6.54715 4.267084 -1.534337 1.2505e-01
treatment:post -1.00735 0.647378 -1.556051 1.1980e-01
... 27 variables were removed because of collinearity (treatment, post
and 25 others [full set in $collin.var])
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.69005 Adj. R2: 0.288
Within R2: 0.018332
Fixed Effects Results
The fixed effects models are difference-in-differences regressions with bank and quarter fixed effects, estimated on the quarterly panel. The specification can be written as
$ROA_{it} = \alpha_i + \lambda_t + \beta (Treatment_i \times Post_t) + X_{it}\gamma + \varepsilon_{it}$
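where $\alpha_i$ denotes bank fixed effects, $\lambda_t$ quarter fixed effects, $X_{it}$ the control variables, and $Post_t = 1$ from 2020 Q1 onward. The coefficient of interest is $\beta$ on the interaction term $Treatment_i \times Post_t$. It captures whether banks with high micro exposure experienced a different change in profitability after the onset of COVID. Treatment status does not vary within a bank, so it is absorbed by the bank fixed effects. The same holds for most state, segment, and TD categories, which are largely constant within a bank. $Post_t$ is absorbed by the quarter fixed effects. This is why fixest reports these variables as collinear and drops them.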
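As a sanity check on this specification, the sketch below simulates a noise-free panel with bank effects, quarter effects and a known interaction effect. All values are made up. It estimates the two-way fixed effects model with explicit dummies via `lm()`. The interaction coefficient recovers the true $\beta$, while `treatment` and `post` on their own are absorbed by the dummies. `fixest::feols()` with `| FinInst + Quarter` estimates the same coefficient through a within transformation. On the real panel, bank-clustered standard errors (for example `cluster = ~FinInst` in `feols()`) are a common alternative to the IID errors reported above.

```r
banks    <- paste0("B", 1:6)
quarters <- 1:8
panel <- expand.grid(FinInst = banks, Quarter = quarters, stringsAsFactors = FALSE)
panel$treatment <- as.integer(panel$FinInst %in% c("B4", "B5", "B6"))
panel$post      <- as.integer(panel$Quarter > 4)

alpha  <- setNames(c(1, 2, 0.5, 1.5, 3, 2.5), banks)  # made-up bank effects
lambda <- 0.2 * quarters                               # made-up quarter effects
beta   <- -1                                           # true DiD effect

# Noise-free outcome: bank effect + quarter effect + beta * treatment * post
panel$roa <- unname(alpha[panel$FinInst]) + lambda[panel$Quarter] +
  beta * panel$treatment * panel$post

# Two-way fixed effects via dummies; only the interaction is identified
fit <- lm(roa ~ I(treatment * post) + factor(FinInst) + factor(Quarter), data = panel)
coef(fit)[["I(treatment * post)"]]   # -1 up to floating-point error
```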
In the full sample, the estimated effect is negative and statistically significant. Without additional controls the coefficient is approximately
$\hat{\beta} \approx -2.23$
and remains similar when balance sheet controls are included:
$\hat{\beta} \approx -2.56$
This suggests that banks with higher micro exposure experienced a larger decline in ROA after the shock.
However, when the analysis is repeated on the matched sample, the estimated effect becomes smaller and statistically insignificant. Without controls the coefficient is
$\hat{\beta} \approx -0.87$
and with controls
$\hat{\beta} \approx -1.01$.
Overall, the fixed effects results are consistent with the earlier findings: once banks with similar observable characteristics are compared, there is no strong evidence that higher micro exposure systematically reduces bank profitability.
Summary
| Method | Sample | Specification | Treatment Effect (pp of annualized ROA) | Significant at 5%? |
|---|---|---|---|---|
| DiD | Full sample | No controls | $-3.28$ | Yes |
| DiD | Full sample | With controls | $-0.01$ | No |
| DiD | Matched sample | No controls | $-0.97$ | No |
| DiD | Matched sample | With controls | $-0.61$ | No |
| FE | Full sample | No controls | $-2.23$ | Yes |
| FE | Full sample | With controls | $-2.56$ | Yes |
| FE | Matched sample | No controls | $-0.87$ | No |
| FE | Matched sample | With controls | $-1.01$ | No |
The estimates from the full sample are dispersed, ranging from about -3.3 to 0. This likely reflects the volatility of ROA and the influence of outliers. By contrast, the matched sample produces more consistent results across specifications.
Across the matched regressions, the point estimates lie between about -0.6 and -1.0 percentage points of annualized ROA, and none is statistically significant at conventional levels. Some confounding factors may not be fully controlled for. With that caveat, the results suggest that banks with higher micro-enterprise exposure may have experienced a modest decline in return on assets following the onset of COVID, of up to roughly one percentage point. This decline cannot, however, be statistically distinguished from zero.