Cox Proportional Hazards Model with KL Divergence for Data Integration

Fits a Cox proportional hazards model that incorporates external information via a Kullback–Leibler (KL) divergence penalty. External information can be supplied either as external risk scores (RS) or as external coefficients (beta). The tuning parameter(s) etas control the strength of integration.

Usage

coxkl(
  z,
  delta,
  time,
  stratum = NULL,
  RS = NULL,
  beta = NULL,
  etas,
  tol = 1e-04,
  Mstop = 100,
  backtrack = FALSE,
  message = FALSE,
  data_sorted = FALSE,
  beta_initial = NULL
)

Arguments

z: Numeric matrix of covariates with rows representing observations and columns representing predictor variables. All covariates must be numeric.
delta: Numeric vector of event indicators (1 = event, 0 = censored).
time: Numeric vector of observed event or censoring times. No sorting required.
stratum: Optional numeric or factor vector defining strata.
RS: Optional numeric vector or matrix of external risk scores. Length (or number of rows) must equal the number of observations. If not supplied, beta must be provided.
beta: Optional numeric vector of external coefficients (e.g., from prior studies). Length must equal the number of columns in z. Use zeros to represent covariates without external information. If not supplied, RS must be provided.
etas: Numeric vector of tuning parameters controlling the reliance on external information. Larger values place more weight on the external source.
tol: Convergence tolerance for the optimization algorithm. Default is 1e-4.
Mstop: Maximum number of iterations for the optimization algorithm. Default is 100.
backtrack: Logical; if TRUE, backtracking line search is applied during optimization. Default is FALSE.
message: Logical; if TRUE, progress messages are printed during model fitting. Default is FALSE.
data_sorted: Logical; if TRUE, input data are assumed to be already sorted by stratum and time. Default is FALSE.
beta_initial: Optional numeric vector of length p giving the starting value for the first eta. If NULL, a zero vector is used.

Value

An object of class "coxkl" containing:

eta: the fitted \(\eta\) sequence.
beta: estimated coefficient matrix (\(p \times |\eta|\)).
linear.predictors: matrix of linear predictors.
likelihood: vector of partial likelihoods.
data: a list containing the input data used in fitting (z, time, delta, stratum, data_sorted).

Details

If beta is supplied (length ncol(z)), external risk scores are computed internally as RS = z %*% beta. If RS is supplied, it is used directly. Data are optionally sorted by stratum (or a single stratum if NULL) and increasing time when data_sorted = FALSE. Estimation proceeds over the sorted data, and the returned linear.predictors are mapped back to the original order. Optimization uses warm starts across the (ascending) etas grid and supports backtracking line search when backtrack = TRUE.

Internally, the routine computes a stratum-wise adjusted event indicator (delta_tilde) and maximizes a KL-regularized partial likelihood. The current implementation fixes lambda = 0 in the low-level optimizer and exposes etas as the primary tuning control.

Examples

data(ExampleData_lowdim)

train_dat_lowdim <- ExampleData_lowdim$train
beta_external_good_lowdim <- ExampleData_lowdim$beta_external_good
eta_list <- generate_eta(method = "exponential", n = 10, max_eta = 5)

model <- coxkl(z = train_dat_lowdim$z,
               delta = train_dat_lowdim$status,
               time = train_dat_lowdim$time,
               stratum = train_dat_lowdim$stratum,
               beta = beta_external_good_lowdim,
               etas = eta_list)