Replication notebook

Statistical Tests and Diagnostic Checks

Replication notebook for the descriptive statistics, normality tests, LiNGAM residual diagnostics, and VAR lag-order checks reported in the paper.

Overview

This notebook reproduces the descriptive statistics and diagnostic tests reported in the paper for the event-day return data. The input is the one-minute log-return dataset created in the data-preparation notebook and saved as full_dataset.csv.

The workflow has five main parts. First, the notebook loads the cleaned return panel, converts the timestamp column to New York time, and keeps the AI-related stock universe used in the event study. Second, it computes descriptive statistics for the demarketed event-day returns: mean, variance, skewness, excess kurtosis, and the estimated Student-t shape parameter. Third, it applies the Jarque-Bera and Shapiro-Wilk normality tests to the same demarketed return series. Fourth, it repeats the descriptive and normality-test calculations for the LiNGAM residuals obtained from the demarketed data. Finally, it compares VAR(0) and VAR(1) specifications by BIC for each event day.

The outputs correspond to the statistical evidence reported in Table 2 of the main text and Tables C.1-C.2 in Appendix C. The code is written to be robust to missing tickers, empty event windows, and estimation failures: in these cases the relevant entries are returned as NA rather than stopping the whole replication.

R code Cell 2
Show code
# ============================================================
# Required package: R2DAG
# ============================================================
# The replication code uses functions from the author's R package
# R2DAG. The package is available from GitHub and can be installed
# with the command below.
#
# dependencies = FALSE and upgrade = "never" are used to avoid
# automatically modifying the user's existing R package environment.
# After installation, the package is loaded for the LiNGAM-based
# residual extraction used later in the notebook.

devtools::install_github(
  "espanm/R2DAG",
  dependencies = FALSE,
  upgrade = "never"
)

library(R2DAG)

Load the event-study return data

This cell reads the one-minute log-return dataset from the user-specified folder. The datetime column is converted to New York time because the event windows are defined in U.S. market time. The non-AI benchmark tickers are removed here, so the following statistical tests are computed only for the stock universe used in the AI-event network analysis.

R code Cell 4
Show code
library(readr)
library(xts)

# Read the one-minute log-return dataset created by the data-preparation notebook.
# Replace [YOUR FOLDER] with the folder where full_dataset.csv is stored.
df <- read_csv("[YOUR FOLDER]/full_dataset.csv")

# Event windows are defined in Eastern/New York trading time.
df$datetime <- as.POSIXct(df$datetime, tz = "America/New_York")

# These non-AI tickers were included in the broad data file but are not used in
# this notebook's event-day statistical tables.
non_ai_tickers <- c("WMT", "COST", "CVX", "ABBV", "PG", "MRK", "PM", "MCD", "PEP", "T")

# Drop only those non-AI tickers that are actually present in the data.
cols_to_drop <- intersect(non_ai_tickers, colnames(df))

df <- df[, !colnames(df) %in% cols_to_drop]

# Convert the data frame into an xts object for convenient intraday subsetting.
data_merged <- xts(df[, -1], order.by = df$datetime)
Output 1
A szükséges csomag betöltődik: zoo


Kapcsolódás csomaghoz: 'zoo'


The following objects are masked from 'package:base':

    as.Date, as.Date.numeric


Rows: 469139 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (30): NVDA, MSFT, AMZN, META, GOOG, TSLA, AAPL, AVGO, AMD, MU, ORCL, IB...
dttm  (1): datetime

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Descriptive statistics for demarketed event-day returns

This block computes the distributional summary statistics for each stock and each event day after removing the market component. For every stock, the one-minute return is regressed on the S&P 500 return over the same event-day window, and the residuals are treated as demarketed returns.

For each event, the notebook reports the cross-sectional median, minimum, and maximum across stocks for the mean, variance, skewness, excess kurtosis, and estimated Student-t shape parameter. These results correspond to Table 2 in Section 3 of the paper.

R code Cell 6
Show code
library(xts)
library(moments)
library(readr)

# The SP500 column is required because demarketing is done by regressing
# each stock return on the market return.
stopifnot("SP500" %in% colnames(data_merged))

# Event dates used in the empirical analysis.
events <- c(
  "ChatGPT launch" = "2022-11-30",
  "NVIDIA Hopper/H100" = "2022-03-22",
  "GitHub Copilot preview" = "2021-06-29",
  "Claude Opus 4.6" = "2026-02-05"
)

# Extract the regular U.S. trading session for a given event day.
get_event_window <- function(data, date_str,
                             start_hms = "09:30:00",
                             end_hms = "16:00:00") {
  from <- paste0(date_str, " ", start_hms)
  to <- paste0(date_str, " ", end_hms)
  data[paste0(from, "/", to)]
}

# Estimate the degrees-of-freedom parameter of a Student-t density by maximum
# likelihood. The transformation nu = 2 + exp(theta) keeps the fitted degrees
# of freedom above 2, so the variance of the fitted Student-t distribution exists.
fit_student_t_df <- function(z) {
  z <- na.omit(as.numeric(z))

  if (length(z) < 30) return(NA_real_)

  sdz <- sd(z)
  if (!is.finite(sdz) || sdz <= 0) return(NA_real_)

  negloglik_t <- function(par, x) {
    mu <- par[1]
    sigma <- exp(par[2])
    nu <- 2 + exp(par[3])

    ll <- dt((x - mu) / sigma, df = nu, log = TRUE) - log(sigma)

    if (any(!is.finite(ll))) return(1e12)
    -sum(ll)
  }

  # Start the optimization at the sample mean, sample standard deviation,
  # and an initial Student-t degrees-of-freedom value of 8.
  init <- c(
    mean(z),
    log(sdz + 1e-8),
    log(8 - 2)
  )

  fit <- tryCatch(
    optim(
      par = init,
      fn = negloglik_t,
      x = z,
      method = "BFGS",
      hessian = FALSE,
      control = list(maxit = 2000, reltol = 1e-10)
    ),
    error = function(e) NULL
  )

  if (is.null(fit)) return(NA_real_)
  if (!is.list(fit) || is.null(fit$convergence) || fit$convergence != 0) return(NA_real_)

  nu_hat <- 2 + exp(fit$par[3])

  if (!is.finite(nu_hat)) return(NA_real_)
  nu_hat
}

# Compute all descriptive statistics for one stock on one event day.
compute_one_asset_one_event <- function(day_data, ticker) {
  stat_names <- c(
    "Mean",
    "Variance",
    "Skewness",
    "Excess Kurtosis",
    "Student-t density shape parameter"
  )

  # Return NA values if the requested ticker is not available.
  if (!(ticker %in% colnames(day_data))) {
    out <- rep(NA_real_, length(stat_names))
    names(out) <- stat_names
    return(out)
  }

  # Keep the stock and market return only, then drop missing observations.
  sub <- day_data[, c(ticker, "SP500")]
  sub <- na.omit(sub)

  if (NROW(sub) < 30) {
    out <- rep(NA_real_, length(stat_names))
    names(out) <- stat_names
    return(out)
  }

  y <- as.numeric(sub[, ticker])
  x <- as.numeric(sub[, "SP500"])

  # Demarket the stock return by taking residuals from y_t = alpha + beta * SP500_t + eps_t.
  fit_lm <- lm(y ~ x)
  eps <- resid(fit_lm)

  if (length(eps) < 30 || sd(eps) <= 0 || !is.finite(sd(eps))) {
    out <- rep(NA_real_, length(stat_names))
    names(out) <- stat_names
    return(out)
  }

  c(
    Mean = mean(eps),
    Variance = var(eps),
    Skewness = moments::skewness(eps),
    `Excess Kurtosis` = moments::kurtosis(eps) - 3,
    `Student-t density shape parameter` = fit_student_t_df(eps)
  )
}

# All stock columns except the market index are treated as assets.
asset_names <- setdiff(colnames(data_merged), "SP500")

event_asset_results <- list()
panel_results <- list()

for (ev_name in names(events)) {
  ev_date <- events[[ev_name]]
  day_data <- get_event_window(data_merged, ev_date)

  # Compute a stock-by-statistic table for the current event.
  # If the event window is unavailable, keep the table structure and fill it with NA.
  if (NROW(day_data) == 0) {
    stats_mat <- matrix(NA_real_, nrow = 5, ncol = length(asset_names))
    rownames(stats_mat) <- c(
      "Mean",
      "Variance",
      "Skewness",
      "Excess Kurtosis",
      "Student-t density shape parameter"
    )
    colnames(stats_mat) <- asset_names
  } else {
    stats_mat <- sapply(asset_names, function(tkr) {
      compute_one_asset_one_event(day_data, tkr)
    })
  }

  # Store the asset-level results in case the detailed table is needed later.
  event_asset_results[[ev_name]] <- as.data.frame(stats_mat)

  # Collapse the cross-section into median, minimum, and maximum values,
  # matching the summary format used in the paper.
  panel_mat <- t(apply(stats_mat, 1, function(z) {
    z <- as.numeric(z)
    c(
      median = if (all(is.na(z))) NA_real_ else median(z, na.rm = TRUE),
      min = if (all(is.na(z))) NA_real_ else min(z, na.rm = TRUE),
      max = if (all(is.na(z))) NA_real_ else max(z, na.rm = TRUE)
    )
  }))

  panel_results[[ev_name]] <- panel_mat
}

# Combine the four event-level summary tables into one publication-style table.
final_table <- cbind(
  panel_results[[1]],
  panel_results[[2]],
  panel_results[[3]],
  panel_results[[4]]
)

final_table <- as.data.frame(final_table)

colnames(final_table) <- c(
  "Event1_median", "Event1_min", "Event1_max",
  "Event2_median", "Event2_min", "Event2_max",
  "Event3_median", "Event3_min", "Event3_max",
  "Event4_median", "Event4_min", "Event4_max"
)

rownames(final_table) <- c(
  "Mean",
  "Variance",
  "Skewness",
  "Excess Kurtosis",
  "Student-t density shape parameter"
)

# Rename columns with readable event labels.
final_table_named <- final_table

colnames(final_table_named) <- c(
  "ChatGPT_median", "ChatGPT_min", "ChatGPT_max",
  "Hopper_median", "Hopper_min", "Hopper_max",
  "Copilot_median", "Copilot_min", "Copilot_max",
  "Claude_median", "Claude_min", "Claude_max"
)

# Format small mean and variance values in scientific notation, while keeping
# the other statistics in a compact four-decimal format.
format_final_table <- function(df) {
  out <- as.data.frame(matrix("", nrow = nrow(df), ncol = ncol(df)))
  rownames(out) <- rownames(df)
  colnames(out) <- colnames(df)

  sci_rows <- c("Mean", "Variance")

  for (i in seq_len(nrow(df))) {
    rn <- rownames(df)[i]

    if (rn %in% sci_rows) {
      out[i, ] <- sapply(df[i, ], function(x) {
        if (is.na(x)) {
          NA_character_
        } else {
          formatC(x, format = "e", digits = 4)
        }
      })
    } else {
      out[i, ] <- sapply(df[i, ], function(x) {
        if (is.na(x)) {
          NA_character_
        } else {
          sprintf("%.4f", round(x, 4))
        }
      })
    }
  }

  out
}

final_table_named_fmt <- format_final_table(final_table_named)

print(final_table_named_fmt, quote = FALSE)
Output 1
                                  ChatGPT_median ChatGPT_min ChatGPT_max
Mean                                 -4.9894e-22 -2.0054e-19  1.2324e-19
Variance                              8.1051e-07  1.4686e-07  5.5381e-06
Skewness                                  0.4426     -0.4505      2.0525
Excess Kurtosis                           4.6714      2.0593     28.0396
Student-t density shape parameter         3.3362      2.0461      5.8412
                                  Hopper_median  Hopper_min Hopper_max
Mean                                 3.5415e-22 -1.9467e-19 1.6892e-19
Variance                             6.9809e-07  1.6558e-07 2.1532e-06
Skewness                                 0.1709     -0.5343     1.2112
Excess Kurtosis                          3.3660      0.8303    12.3028
Student-t density shape parameter        3.6660      2.9305     8.6867
                                  Copilot_median Copilot_min Copilot_max
Mean                                  3.6358e-21 -1.2752e-19  2.8638e-19
Variance                              2.1119e-07  7.9631e-08  1.0426e-06
Skewness                                 -0.1976     -1.0853      1.6568
Excess Kurtosis                           4.3651      2.0837     12.1919
Student-t density shape parameter         3.3842      2.3939      8.0000
                                  Claude_median  Claude_min Claude_max
Mean                                 6.2562e-20 -3.7660e-19 9.8345e-19
Variance                             2.2330e-06  4.0941e-07 4.2332e-06
Skewness                                 0.3702     -1.3297     3.8295
Excess Kurtosis                          9.2251      3.3324    48.8524
Student-t density shape parameter        3.0090      2.2148     4.1176

Normality tests for demarketed event-day returns

This block applies the Jarque--Bera and Shapiro--Wilk tests to the demarketed stock-return residuals. The null hypothesis in both tests is normality. As above, the results are first computed stock by stock within each event window and are then summarized across stocks using the median, minimum, and maximum. These values are reported in Table 2 in Section 3.

R code Cell 8
Show code
library(xts)
library(tseries)

# The market index is required to construct demarketed residuals.
stopifnot("SP500" %in% colnames(data_merged))

events <- c(
  "ChatGPT launch" = "2022-11-30",
  "NVIDIA Hopper/H100" = "2022-03-22",
  "GitHub Copilot preview" = "2021-06-29",
  "Claude Opus 4.6" = "2026-02-05"
)

# Extract the regular trading session for the selected event date.
get_event_window <- function(data, date_str,
                             start_hms = "09:30:00",
                             end_hms = "16:00:00") {
  from <- paste0(date_str, " ", start_hms)
  to <- paste0(date_str, " ", end_hms)
  data[paste0(from, "/", to)]
}

# Compute normality-test results for one stock on one event day.
compute_test_results <- function(day_data, ticker) {
  out_na <- c(
    `Jarque and Bera p-value` = NA_real_,
    `Jarque and Bera statistic` = NA_real_,
    `Shapiro - Wilk p-value` = NA_real_,
    `Shapiro - Wilk statistic` = NA_real_
  )

  if (!(ticker %in% colnames(day_data))) {
    return(out_na)
  }

  # Demarketing is done on the same observations used in the event-day test.
  sub <- day_data[, c(ticker, "SP500")]
  sub <- na.omit(sub)

  if (NROW(sub) < 8) {
    return(out_na)
  }

  y <- as.numeric(sub[, ticker])
  x <- as.numeric(sub[, "SP500"])

  fit_lm <- lm(y ~ x)
  eps <- resid(fit_lm)
  eps <- eps[is.finite(eps)]

  if (length(eps) < 8 || sd(eps) <= 0 || !is.finite(sd(eps))) {
    return(out_na)
  }

  # Jarque--Bera is based on skewness and kurtosis.
  jb_out <- tryCatch(
    tseries::jarque.bera.test(eps),
    error = function(e) NULL
  )

  # Shapiro--Wilk is an additional finite-sample normality diagnostic.
  sw_out <- tryCatch(
    shapiro.test(eps),
    error = function(e) NULL
  )

  c(
    `Jarque and Bera p-value` = if (is.null(jb_out)) NA_real_ else unname(jb_out$p.value),
    `Jarque and Bera statistic` = if (is.null(jb_out)) NA_real_ else unname(jb_out$statistic),
    `Shapiro - Wilk p-value` = if (is.null(sw_out)) NA_real_ else unname(sw_out$p.value),
    `Shapiro - Wilk statistic` = if (is.null(sw_out)) NA_real_ else unname(sw_out$statistic)
  )
}

asset_names <- setdiff(colnames(data_merged), "SP500")

event_test_results <- list()
panel_results <- list()

for (ev_name in names(events)) {
  ev_date <- events[[ev_name]]
  day_data <- get_event_window(data_merged, ev_date)

  # Compute stock-level test results. Missing event windows are represented by NA tables.
  if (NROW(day_data) == 0) {
    res_mat <- matrix(NA_real_, nrow = 4, ncol = length(asset_names))
    rownames(res_mat) <- c(
      "Jarque and Bera p-value",
      "Jarque and Bera statistic",
      "Shapiro - Wilk p-value",
      "Shapiro - Wilk statistic"
    )
    colnames(res_mat) <- asset_names
  } else {
    res_mat <- sapply(asset_names, function(tkr) {
      compute_test_results(day_data, tkr)
    })
  }

  event_test_results[[ev_name]] <- as.data.frame(res_mat)

  # Summarize the cross-section of tickers for the table in the paper.
  panel_mat <- t(apply(res_mat, 1, function(z) {
    z <- as.numeric(z)
    c(
      median = if (all(is.na(z))) NA_real_ else median(z, na.rm = TRUE),
      min = if (all(is.na(z))) NA_real_ else min(z, na.rm = TRUE),
      max = if (all(is.na(z))) NA_real_ else max(z, na.rm = TRUE)
    )
  }))

  panel_results[[ev_name]] <- panel_mat
}

# Bind all event summaries into a single table.
final_table_named <- cbind(
  panel_results[["ChatGPT launch"]],
  panel_results[["NVIDIA Hopper/H100"]],
  panel_results[["GitHub Copilot preview"]],
  panel_results[["Claude Opus 4.6"]]
)

final_table_named <- as.data.frame(final_table_named)

colnames(final_table_named) <- c(
  "ChatGPT_median", "ChatGPT_min", "ChatGPT_max",
  "Hopper_median", "Hopper_min", "Hopper_max",
  "Copilot_median", "Copilot_min", "Copilot_max",
  "Claude_median", "Claude_min", "Claude_max"
)

rownames(final_table_named) <- c(
  "Jarque and Bera p-value",
  "Jarque and Bera statistic",
  "Shapiro - Wilk p-value",
  "Shapiro - Wilk statistic"
)

print(round(final_table_named, 6))
Output 1
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 

Output 2
                          ChatGPT_median ChatGPT_min  ChatGPT_max Hopper_median
Jarque and Bera p-value         0.000000    0.000000     0.000000      0.000000
Jarque and Bera statistic     354.614785   73.427884 13009.262112    211.014680
Shapiro - Wilk p-value          0.000000    0.000000     0.000004      0.000000
Shapiro - Wilk statistic        0.937985    0.741686     0.975916      0.952831
                          Hopper_min  Hopper_max Copilot_median Copilot_min
Jarque and Bera p-value      0.00000    0.003482       0.000000    0.000000
Jarque and Bera statistic   11.32037 2554.930752     340.595405   73.094691
Shapiro - Wilk p-value       0.00000    0.047569       0.000000    0.000000
Shapiro - Wilk statistic     0.88531    0.992501       0.937917    0.855082
                          Copilot_max Claude_median Claude_min  Claude_max
Jarque and Bera p-value      0.000000       0.00000   0.000000 0.00000e+00
Jarque and Bera statistic 2419.491530    1391.81089 183.801578 3.97348e+04
Shapiro - Wilk p-value       0.000001       0.00000   0.000000 0.00000e+00
Shapiro - Wilk statistic     0.973139       0.89029   0.723845 9.65171e-01

Descriptive statistics for demarketed LiNGAM residuals

This block repeats the descriptive-statistics exercise after applying the LiNGAM-based contemporaneous identification step. The input to LiNGAM is the demarketed event-day return matrix. The residuals returned by R2_network() are then interpreted as the estimated independent structural shocks.

The summary statistics are computed for each structural residual series and then summarized across assets by event. These results correspond to Table C.2 in Appendix C.

R code Cell 10
Show code
library(xts)
library(moments)
library(readr)

# The SP500 column is needed for the first-stage demarketing regression.
stopifnot("SP500" %in% colnames(data_merged))

events <- c(
  "ChatGPT launch" = "2022-11-30",
  "NVIDIA Hopper/H100" = "2022-03-22",
  "GitHub Copilot preview" = "2021-06-29",
  "Claude Opus 4.6" = "2026-02-05"
)

# Extract the regular U.S. trading session for each event day.
get_event_window <- function(data, date_str,
                             start_hms = "09:30:00",
                             end_hms   = "16:00:00") {
  from <- paste0(date_str, " ", start_hms)
  to   <- paste0(date_str, " ", end_hms)
  data[paste0(from, "/", to)]
}

# Fit a Student-t density to one residual series and return the estimated
# degrees-of-freedom parameter. Smaller values indicate heavier tails.
fit_student_t_df <- function(z) {
  z <- na.omit(as.numeric(z))

  if (length(z) < 30) return(NA_real_)

  sdz <- sd(z)
  if (!is.finite(sdz) || sdz <= 0) return(NA_real_)

  negloglik_t <- function(par, x) {
    mu    <- par[1]
    sigma <- exp(par[2])
    nu    <- 2 + exp(par[3])

    ll <- dt((x - mu) / sigma, df = nu, log = TRUE) - log(sigma)

    if (any(!is.finite(ll))) return(1e12)
    -sum(ll)
  }

  init <- c(
    mean(z),
    log(sdz + 1e-8),
    log(8 - 2)
  )

  fit <- tryCatch(
    optim(
      par = init,
      fn = negloglik_t,
      x = z,
      method = "BFGS",
      hessian = FALSE,
      control = list(maxit = 2000, reltol = 1e-10)
    ),
    error = function(e) NULL
  )

  if (is.null(fit)) return(NA_real_)
  if (!is.list(fit) || is.null(fit$convergence) || fit$convergence != 0) return(NA_real_)

  nu_hat <- 2 + exp(fit$par[3])

  if (!is.finite(nu_hat)) return(NA_real_)
  nu_hat
}

# Compute descriptive statistics for a single residual vector.
compute_stats_from_vector <- function(z) {
  stat_names <- c(
    "Mean",
    "Variance",
    "Skewness",
    "Excess Kurtosis",
    "Student-t density shape parameter"
  )

  z <- na.omit(as.numeric(z))

  if (length(z) < 30 || sd(z) <= 0 || !is.finite(sd(z))) {
    out <- rep(NA_real_, length(stat_names))
    names(out) <- stat_names
    return(out)
  }

  c(
    Mean = mean(z),
    Variance = var(z),
    Skewness = moments::skewness(z),
    `Excess Kurtosis` = moments::kurtosis(z) - 3,
    `Student-t density shape parameter` = fit_student_t_df(z)
  )
}

# Construct the demarketed return matrix for one event window.
# Each stock is regressed separately on SP500, and the residuals are collected.
get_demarketed_matrix <- function(day_data, asset_names) {
  needed <- c(asset_names, "SP500")
  sub <- day_data[, needed]
  sub <- na.omit(sub)

  if (NROW(sub) < 30) return(NULL)

  x_mkt <- as.numeric(sub[, "SP500"])

  eps_mat <- sapply(asset_names, function(tkr) {
    y <- as.numeric(sub[, tkr])
    fit <- tryCatch(lm(y ~ x_mkt), error = function(e) NULL)

    if (is.null(fit)) {
      rep(NA_real_, length(y))
    } else {
      resid(fit)
    }
  })

  eps_mat <- as.matrix(eps_mat)
  colnames(eps_mat) <- asset_names
  eps_mat <- eps_mat[complete.cases(eps_mat), , drop = FALSE]

  if (nrow(eps_mat) < 30) return(NULL)
  eps_mat
}

# Run the LiNGAM-based R2_network routine and extract the residuals.
# These residuals are the estimated structural shocks used in the appendix table.
get_lingam_residuals_from_R2_network <- function(X) {
  X <- as.matrix(X)

  if (nrow(X) < 30 || ncol(X) < 2) return(NULL)

  X_df <- as.data.frame(X)

  fit <- tryCatch(
    R2_network(
      data = X_df,
      directed = TRUE,
      amat = FALSE,
      dag_method = "lingam",
      standardize_for_dag = FALSE
    ),
    error = function(e) NULL
  )

  if (is.null(fit)) return(NULL)
  if (is.null(fit$lingam_residuals)) return(NULL)

  E <- as.matrix(fit$lingam_residuals)

  if (!all(dim(E) == dim(X))) return(NULL)

  colnames(E) <- colnames(X)
  E
}

asset_names <- setdiff(colnames(data_merged), "SP500")

event_asset_results <- list()
panel_results <- list()

for (ev_name in names(events)) {

  ev_date <- events[[ev_name]]
  day_data <- get_event_window(data_merged, ev_date)

  stat_names <- c(
    "Mean",
    "Variance",
    "Skewness",
    "Excess Kurtosis",
    "Student-t density shape parameter"
  )

  # Keep a fixed table shape even if a window, demarketing step, or LiNGAM fit fails.
  if (NROW(day_data) == 0) {
    stats_mat <- matrix(NA_real_, nrow = length(stat_names), ncol = length(asset_names))
    rownames(stats_mat) <- stat_names
    colnames(stats_mat) <- asset_names

  } else {
    demarketed_X <- get_demarketed_matrix(day_data, asset_names)

    if (is.null(demarketed_X)) {
      stats_mat <- matrix(NA_real_, nrow = length(stat_names), ncol = length(asset_names))
      rownames(stats_mat) <- stat_names
      colnames(stats_mat) <- asset_names

    } else {
      lingam_E <- get_lingam_residuals_from_R2_network(demarketed_X)

      if (is.null(lingam_E)) {
        stats_mat <- matrix(NA_real_, nrow = length(stat_names), ncol = length(asset_names))
        rownames(stats_mat) <- stat_names
        colnames(stats_mat) <- asset_names

      } else {
        stats_mat <- sapply(asset_names, function(tkr) {
          compute_stats_from_vector(lingam_E[, tkr])
        })
      }
    }
  }

  event_asset_results[[ev_name]] <- as.data.frame(stats_mat)

  # Summarize residual statistics across assets for each event.
  panel_mat <- t(apply(stats_mat, 1, function(z) {
    z <- as.numeric(z)
    c(
      median = if (all(is.na(z))) NA_real_ else median(z, na.rm = TRUE),
      min    = if (all(is.na(z))) NA_real_ else min(z, na.rm = TRUE),
      max    = if (all(is.na(z))) NA_real_ else max(z, na.rm = TRUE)
    )
  }))

  panel_results[[ev_name]] <- panel_mat
}

final_table <- cbind(
  panel_results[[1]],
  panel_results[[2]],
  panel_results[[3]],
  panel_results[[4]]
)

final_table <- as.data.frame(final_table)

colnames(final_table) <- c(
  "Event1_median", "Event1_min", "Event1_max",
  "Event2_median", "Event2_min", "Event2_max",
  "Event3_median", "Event3_min", "Event3_max",
  "Event4_median", "Event4_min", "Event4_max"
)

rownames(final_table) <- c(
  "Mean",
  "Variance",
  "Skewness",
  "Excess Kurtosis",
  "Student-t density shape parameter"
)

final_table_named <- final_table
colnames(final_table_named) <- c(
  "ChatGPT_median", "ChatGPT_min", "ChatGPT_max",
  "Hopper_median", "Hopper_min", "Hopper_max",
  "Copilot_median", "Copilot_min", "Copilot_max",
  "Claude_median", "Claude_min", "Claude_max"
)

# Use the same formatting convention as in the demarketed-return table.
format_final_table <- function(df) {
  out <- as.data.frame(matrix("", nrow = nrow(df), ncol = ncol(df)))
  rownames(out) <- rownames(df)
  colnames(out) <- colnames(df)

  sci_rows <- c("Mean", "Variance")

  for (i in seq_len(nrow(df))) {
    rn <- rownames(df)[i]

    if (rn %in% sci_rows) {
      out[i, ] <- sapply(df[i, ], function(x) {
        if (is.na(x)) {
          NA_character_
        } else {
          formatC(x, format = "e", digits = 4)
        }
      })
    } else {
      out[i, ] <- sapply(df[i, ], function(x) {
        if (is.na(x)) {
          NA_character_
        } else {
          sprintf("%.4f", round(x, 4))
        }
      })
    }
  }

  out
}

final_table_named_fmt <- format_final_table(final_table_named)

print(final_table_named_fmt, quote = FALSE)
Output 1
                                  ChatGPT_median ChatGPT_min ChatGPT_max
Mean                                 -4.9894e-22 -2.2805e-19  1.8492e-20
Variance                              6.0181e-07  1.4686e-07  5.5381e-06
Skewness                                  0.2820     -0.6586      2.0525
Excess Kurtosis                           4.0854      1.3027     23.5446
Student-t density shape parameter         3.9736      2.0461      5.8412
                                  Hopper_median  Hopper_min Hopper_max
Mean                                 1.5483e-21 -1.1279e-20 2.8251e-19
Variance                             6.9809e-07  1.6558e-07 1.8148e-06
Skewness                                 0.1281     -0.5343     1.2112
Excess Kurtosis                          3.4682      0.8303    12.3028
Student-t density shape parameter        3.5242      2.7519     8.6867
                                  Copilot_median Copilot_min Copilot_max
Mean                                 -1.8119e-21 -6.0059e-20  1.6810e-19
Variance                              1.7731e-07  7.5973e-08  8.7952e-07
Skewness                                 -0.0194     -1.0374      1.4039
Excess Kurtosis                           4.1727      1.8080      9.3242
Student-t density shape parameter         3.4243      2.5331      8.0000
                                  Claude_median  Claude_min Claude_max
Mean                                 1.1203e-21 -6.9405e-20 6.4325e-19
Variance                             1.7719e-06  4.0941e-07 4.2332e-06
Skewness                                 0.3702     -1.3297     3.3859
Excess Kurtosis                          9.2251      2.4755    44.1258
Student-t density shape parameter        3.1939      2.1753     5.1292

Normality tests for demarketed LiNGAM residuals

This block applies the Jarque--Bera and Shapiro--Wilk normality tests to the LiNGAM residuals. The procedure is the same as in the previous normality-test section, but the tested series are now the estimated structural shocks obtained after demarketing and contemporaneous LiNGAM identification. These results are reported in Table C.2 in Appendix C.

R code Cell 12
Show code
library(xts)
library(readr)
library(tseries)

# SP500 is required for the first-stage demarketing step.
stopifnot("SP500" %in% colnames(data_merged))

events <- c(
  "ChatGPT launch" = "2022-11-30",
  "NVIDIA Hopper/H100" = "2022-03-22",
  "GitHub Copilot preview" = "2021-06-29",
  "Claude Opus 4.6" = "2026-02-05"
)

# Extract the regular trading session for one event day.
get_event_window <- function(data, date_str,
                             start_hms = "09:30:00",
                             end_hms   = "16:00:00") {
  from <- paste0(date_str, " ", start_hms)
  to   <- paste0(date_str, " ", end_hms)
  data[paste0(from, "/", to)]
}

# First-stage demarketing: regress each stock on SP500 and collect residuals.
get_demarketed_matrix <- function(day_data, asset_names) {
  needed <- c(asset_names, "SP500")
  sub <- day_data[, needed]
  sub <- na.omit(sub)

  if (NROW(sub) < 8) return(NULL)

  x_mkt <- as.numeric(sub[, "SP500"])

  eps_mat <- sapply(asset_names, function(tkr) {
    y <- as.numeric(sub[, tkr])

    fit <- tryCatch(
      lm(y ~ x_mkt),
      error = function(e) NULL
    )

    if (is.null(fit)) {
      rep(NA_real_, length(y))
    } else {
      resid(fit)
    }
  })

  eps_mat <- as.matrix(eps_mat)
  colnames(eps_mat) <- asset_names
  eps_mat <- eps_mat[complete.cases(eps_mat), , drop = FALSE]

  if (nrow(eps_mat) < 8) return(NULL)
  eps_mat
}

# Run the LiNGAM-based network routine and extract its residual matrix.
get_lingam_residuals_from_R2_network <- function(X) {
  X <- as.matrix(X)

  if (nrow(X) < 8 || ncol(X) < 2) return(NULL)

  X_df <- as.data.frame(X)

  fit <- tryCatch(
    R2_network(
      data = X_df,
      directed = TRUE,
      amat = FALSE,
      dag_method = "lingam",
      standardize_for_dag = FALSE
    ),
    error = function(e) NULL
  )

  if (is.null(fit)) return(NULL)
  if (is.null(fit$lingam_residuals)) return(NULL)

  E <- as.matrix(fit$lingam_residuals)

  if (!all(dim(E) == dim(X))) return(NULL)

  colnames(E) <- colnames(X)
  E
}

# Apply the two normality tests to one residual vector.
compute_test_results_from_vector <- function(z) {
  z <- as.numeric(z)
  z <- z[is.finite(z)]

  out <- c(
    `Jarque-Bera statistic` = NA_real_,
    `Jarque-Bera p-value`   = NA_real_,
    `Shapiro-Wilk W`        = NA_real_,
    `Shapiro-Wilk p-value`  = NA_real_
  )

  if (length(z) < 8 || sd(z) <= 0 || !is.finite(sd(z))) {
    return(out)
  }

  jb_res <- tryCatch(
    tseries::jarque.bera.test(z),
    error = function(e) NULL
  )

  if (!is.null(jb_res)) {
    out["Jarque-Bera statistic"] <- as.numeric(jb_res$statistic)
    out["Jarque-Bera p-value"]   <- jb_res$p.value
  }

  sw_res <- tryCatch(
    shapiro.test(z),
    error = function(e) NULL
  )

  if (!is.null(sw_res)) {
    out["Shapiro-Wilk W"]       <- as.numeric(sw_res$statistic)
    out["Shapiro-Wilk p-value"] <- sw_res$p.value
  }

  out
}

asset_names <- setdiff(colnames(data_merged), "SP500")

event_test_results <- list()
panel_test_results <- list()

for (ev_name in names(events)) {

  ev_date <- events[[ev_name]]
  day_data <- get_event_window(data_merged, ev_date)

  test_names <- c(
    "Jarque-Bera statistic",
    "Jarque-Bera p-value",
    "Shapiro-Wilk W",
    "Shapiro-Wilk p-value"
  )

  # Preserve the output structure even when the window, demarketing, or LiNGAM step fails.
  if (NROW(day_data) == 0) {
    test_mat <- matrix(NA_real_, nrow = length(test_names), ncol = length(asset_names))
    rownames(test_mat) <- test_names
    colnames(test_mat) <- asset_names

  } else {
    demarketed_X <- get_demarketed_matrix(day_data, asset_names)

    if (is.null(demarketed_X)) {
      test_mat <- matrix(NA_real_, nrow = length(test_names), ncol = length(asset_names))
      rownames(test_mat) <- test_names
      colnames(test_mat) <- asset_names

    } else {
      lingam_E <- get_lingam_residuals_from_R2_network(demarketed_X)

      if (is.null(lingam_E)) {
        test_mat <- matrix(NA_real_, nrow = length(test_names), ncol = length(asset_names))
        rownames(test_mat) <- test_names
        colnames(test_mat) <- asset_names

      } else {
        test_mat <- sapply(asset_names, function(tkr) {
          compute_test_results_from_vector(lingam_E[, tkr])
        })
      }
    }
  }

  event_test_results[[ev_name]] <- as.data.frame(test_mat)

  # Summarize the cross-section for each event and each test statistic.
  panel_mat <- t(apply(test_mat, 1, function(z) {
    z <- as.numeric(z)
    c(
      median = if (all(is.na(z))) NA_real_ else median(z, na.rm = TRUE),
      min    = if (all(is.na(z))) NA_real_ else min(z, na.rm = TRUE),
      max    = if (all(is.na(z))) NA_real_ else max(z, na.rm = TRUE)
    )
  }))

  panel_test_results[[ev_name]] <- panel_mat
}

final_test_table <- cbind(
  panel_test_results[[1]],
  panel_test_results[[2]],
  panel_test_results[[3]],
  panel_test_results[[4]]
)

final_test_table <- as.data.frame(final_test_table)

colnames(final_test_table) <- c(
  "ChatGPT_median", "ChatGPT_min", "ChatGPT_max",
  "Hopper_median", "Hopper_min", "Hopper_max",
  "Copilot_median", "Copilot_min", "Copilot_max",
  "Claude_median", "Claude_min", "Claude_max"
)

rownames(final_test_table) <- c(
  "Jarque-Bera statistic",
  "Jarque-Bera p-value",
  "Shapiro-Wilk W",
  "Shapiro-Wilk p-value"
)

print(round(final_test_table, 6))
Output 1
                      ChatGPT_median ChatGPT_min ChatGPT_max Hopper_median
Jarque-Bera statistic     297.210466   32.394849 9008.259039    218.906617
Jarque-Bera p-value         0.000000    0.000000    0.000000      0.000000
Shapiro-Wilk W              0.946944    0.741686    0.979000      0.952306
Shapiro-Wilk p-value        0.000000    0.000000    0.000019      0.000000
                      Hopper_min  Hopper_max Copilot_median Copilot_min
Jarque-Bera statistic   11.32037 2554.930752     332.153586   54.115589
Jarque-Bera p-value      0.00000    0.003482       0.000000    0.000000
Shapiro-Wilk W           0.88531    0.992501       0.943136    0.886036
Shapiro-Wilk p-value     0.00000    0.047569       0.000000    0.000000
                      Copilot_max Claude_median Claude_min   Claude_max
Jarque-Bera statistic 1540.884458   1391.810890  100.78234 32385.271163
Jarque-Bera p-value      0.000000      0.000000    0.00000     0.000000
Shapiro-Wilk W           0.977795      0.895098    0.73438     0.971066
Shapiro-Wilk p-value     0.000011      0.000000    0.00000     0.000001

BIC comparison between VAR(0) and VAR(1)

This final block compares a contemporaneous-only specification, VAR(0), with a first-order dynamic specification, VAR(1), for each event day. The comparison is based on the Bayesian Information Criterion computed from the residual covariance matrix of the fitted system. The preferred model is the one with the lower BIC. These results correspond to Table C.1 in Appendix C.

R code Cell 14
Show code
library(xts)

events <- c(
  "ChatGPT launch" = "2022-11-30",
  "NVIDIA Hopper/H100" = "2022-03-22",
  "GitHub Copilot preview" = "2021-06-29",
  "Claude Opus 4.6" = "2026-02-05"
)

# The market index is needed for the demarketing step before the VAR comparison.
stopifnot("SP500" %in% colnames(data_merged))

stock_names <- setdiff(colnames(data_merged), "SP500")

# For the BIC comparison, use the full event calendar day rather than only
# the regular trading-session subset.
get_one_day <- function(x, date_string) {
  x[paste0(date_string, " 00:00:00/", date_string, " 23:59:59")]
}

# Construct the demarketed return matrix for one event day.
get_demarketed_day <- function(day_data, stock_names) {
  needed <- c(stock_names, "SP500")
  sub <- day_data[, needed]
  sub <- na.omit(sub)

  if (NROW(sub) < 5) return(NULL)

  market_vec <- as.numeric(sub[, "SP500"])

  eps_mat <- sapply(stock_names, function(nm) {
    y <- as.numeric(sub[, nm])

    fit <- tryCatch(
      lm(y ~ market_vec),
      error = function(e) NULL
    )

    if (is.null(fit)) {
      rep(NA_real_, length(y))
    } else {
      resid(fit)
    }
  })

  eps_mat <- as.matrix(eps_mat)
  colnames(eps_mat) <- stock_names
  eps_mat <- eps_mat[complete.cases(eps_mat), , drop = FALSE]

  if (nrow(eps_mat) < 5 || ncol(eps_mat) < 2) return(NULL)

  eps_mat
}

# Compute BIC values for VAR(0) and VAR(1) using the same multivariate sample.
compute_var0_var1_bic <- function(X, type = "const") {
  X <- as.matrix(X)

  if (nrow(X) < 5 || ncol(X) < 2) return(NULL)

  K <- ncol(X)
  N <- nrow(X)

  # VAR(1) uses X_{t-1} to explain X_t. The same effective sample is used
  # for both VAR(0) and VAR(1), which makes the BIC comparison consistent.
  Y <- X[-1, , drop = FALSE]
  Xlag <- X[-N, , drop = FALSE]
  T_eff <- nrow(Y)

  if (T_eff < 3) return(NULL)

  # Fit a multivariate system equation-by-equation using least squares.
  fit_system <- function(Ymat, Xreg = NULL, type = "const") {
    if (is.null(Xreg)) {
      if (type == "const") {
        Z <- matrix(1, nrow = nrow(Ymat), ncol = 1)
      } else if (type == "none") {
        Z <- NULL
      } else {
        stop("Only type = 'const' or type = 'none' is implemented.")
      }
    } else {
      if (type == "const") {
        Z <- cbind(1, Xreg)
      } else if (type == "none") {
        Z <- Xreg
      } else {
        stop("Only type = 'const' or type = 'none' is implemented.")
      }
    }

    if (is.null(Z)) {
      U <- Ymat
      n_star <- 0
    } else {
      Bhat <- tryCatch(qr.solve(Z, Ymat), error = function(e) NULL)
      if (is.null(Bhat)) return(NULL)

      U <- Ymat - Z %*% Bhat
      n_star <- ncol(Z)
    }

    # Residual covariance and log-determinant term of the Gaussian system likelihood.
    Sigma_u <- crossprod(U) / nrow(Ymat)

    det_out <- tryCatch(
      determinant(Sigma_u, logarithm = TRUE),
      error = function(e) NULL
    )
    if (is.null(det_out) || !is.finite(as.numeric(det_out$modulus))) return(NULL)

    logdet <- as.numeric(det_out$modulus)

    # BIC per observation for a K-variable system with n_star regressors per equation.
    BIC_val <- logdet + (log(nrow(Ymat)) / nrow(Ymat)) * K * n_star

    list(
      T_eff = nrow(Ymat),
      K = K,
      n_star = n_star,
      logdet = logdet,
      BIC = BIC_val
    )
  }

  var0 <- fit_system(Ymat = Y, Xreg = NULL, type = type)
  var1 <- fit_system(Ymat = Y, Xreg = Xlag, type = type)

  if (is.null(var0) || is.null(var1)) return(NULL)

  list(
    var0 = var0,
    var1 = var1
  )
}

bic_results <- lapply(names(events), function(ev_name) {
  d <- events[[ev_name]]

  raw_day <- get_one_day(data_merged, d)
  if (NROW(raw_day) == 0) {
    return(data.frame(
      event = ev_name,
      date = d,
      T_eff = NA_integer_,
      K = NA_integer_,
      BIC_VAR0 = NA_real_,
      BIC_VAR1 = NA_real_,
      preferred_BIC = NA_character_
    ))
  }

  demarketed_day <- get_demarketed_day(raw_day, stock_names)
  if (is.null(demarketed_day)) {
    return(data.frame(
      event = ev_name,
      date = d,
      T_eff = NA_integer_,
      K = NA_integer_,
      BIC_VAR0 = NA_real_,
      BIC_VAR1 = NA_real_,
      preferred_BIC = NA_character_
    ))
  }

  bic <- compute_var0_var1_bic(demarketed_day, type = "const")
  if (is.null(bic)) {
    return(data.frame(
      event = ev_name,
      date = d,
      T_eff = NA_integer_,
      K = NA_integer_,
      BIC_VAR0 = NA_real_,
      BIC_VAR1 = NA_real_,
      preferred_BIC = NA_character_
    ))
  }

  data.frame(
    event = ev_name,
    date = d,
    T_eff = bic$var0$T_eff,
    K = bic$var0$K,
    BIC_VAR0 = bic$var0$BIC,
    BIC_VAR1 = bic$var1$BIC,
    preferred_BIC = ifelse(bic$var1$BIC < bic$var0$BIC, "VAR(1)", "VAR(0)")
  )
})

bic_results <- do.call(rbind, bic_results)
print(bic_results, digits = 6)
Output 1
                   event       date T_eff  K BIC_VAR0 BIC_VAR1 preferred_BIC
1         ChatGPT launch 2022-11-30   389 19 -273.514 -270.431        VAR(0)
2     NVIDIA Hopper/H100 2022-03-22   389 19 -277.480 -273.856        VAR(0)
3 GitHub Copilot preview 2021-06-29   389 19 -293.707 -290.479        VAR(0)
4        Claude Opus 4.6 2026-02-05   389 19 -257.576 -255.320        VAR(0)