
This function performs cross-validation for the high-dimensional Cox model with a Kullback–Leibler (KL) penalty. It tunes the parameter eta (the weight given to external information) using a user-specified cross-validation criterion. For each eta it also evaluates a lambda path (either supplied or generated) and selects the best lambda.

Usage

cv.coxkl_enet(
  z,
  delta,
  time,
  stratum = NULL,
  RS = NULL,
  beta = NULL,
  etas,
  alpha = 1,
  lambda = NULL,
  nlambda = 100,
  lambda.min.ratio = ifelse(n < p, 0.05, 0.001),
  nfolds = 5,
  cv.criteria = c("V&VH", "LinPred", "CIndex_pooled", "CIndex_foldaverage"),
  c_index_stratum = NULL,
  message = FALSE,
  seed = NULL,
  ...
)

Arguments

z

Numeric matrix of covariates with rows representing individuals and columns representing predictors.

delta

Numeric vector of event indicators (1 = event, 0 = censored).

time

Numeric vector of observed times (event or censoring).

stratum

Optional factor or numeric vector indicating strata.

RS

Optional numeric vector or matrix of external risk scores. If not provided, beta must be supplied.

beta

Optional numeric vector of external coefficients (length equal to ncol(z)). If not provided, RS must be supplied.

etas

Numeric vector of candidate eta values to be evaluated.

alpha

Elastic-net mixing parameter in (0, 1]. Default = 1 (lasso penalty).

lambda

Optional numeric scalar or vector of penalty parameters. If NULL, a decreasing path is generated using nlambda and lambda.min.ratio.

nlambda

Integer number of lambda values to generate when lambda is NULL. Default 100.

lambda.min.ratio

Ratio of the smallest to the largest lambda when generating a sequence (i.e., when lambda is NULL). Default = 0.05 when n < p, otherwise 0.001, where n = nrow(z) and p = ncol(z).
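As an illustration of how nlambda and lambda.min.ratio define a path, here is a minimal sketch. It assumes the path is log-spaced from a data-driven lambda_max down to lambda_max * lambda.min.ratio (the usual glmnet convention); the exact spacing used by cv.coxkl_enet is not stated here, and the helper names make_lambda_path and default_ratio are hypothetical.

```r
# Hypothetical helper: decreasing lambda path, ASSUMED to be log-spaced
# from lambda_max down to lambda_max * lambda.min.ratio.
# lambda_max is normally derived from the data; here it is simply passed in.
make_lambda_path <- function(lambda_max, nlambda = 100,
                             lambda.min.ratio = 0.001) {
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

# Default ratio rule from the usage section, with n = nrow(z), p = ncol(z)
default_ratio <- function(n, p) ifelse(n < p, 0.05, 0.001)

# Example: n = 100 subjects, p = 200 predictors, so the ratio is 0.05
lam <- make_lambda_path(lambda_max = 0.5, nlambda = 100,
                        lambda.min.ratio = default_ratio(n = 100, p = 200))
```

With these inputs the path starts at 0.5 and ends at 0.025 after 100 strictly decreasing steps.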

nfolds

Integer; number of cross-validation folds. Default = 5.

cv.criteria

Character string specifying the cross-validation criterion. Choices are:

  • "V&VH" (default): the Verweij and van Houwelingen cross-validated partial log-likelihood, reported as a loss.

  • "LinPred": loss based on cross-validated linear predictors.

  • "CIndex_pooled": pool all held-out predictions and compute one overall C-index.

  • "CIndex_foldaverage": average C-index across folds.
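The two C-index criteria differ only in where the division happens. The sketch below uses Harrell's C-index, computed as a numerator (concordant pairs, with risk-score ties counted as 1/2) over a denominator (comparable pairs). It assumes that a pair is comparable when the earlier time is an event, optionally restricted to the same stratum, and it ignores tied event times for simplicity. The function names are hypothetical and do not reproduce the package's internals.

```r
# Harrell's C-index pieces for one fold: num = concordant pairs, den = comparable pairs.
# Pair (i, j) is comparable if delta[i] == 1 and time[i] < time[j] (same stratum).
cindex_parts <- function(time, delta, lp, stratum = NULL) {
  if (is.null(stratum)) stratum <- rep(1, length(time))
  num <- 0
  den <- 0
  for (i in which(delta == 1)) {
    j <- which(time > time[i] & stratum == stratum[i])
    den <- den + length(j)
    # higher linear predictor should mean earlier failure; ties count 1/2
    num <- num + sum(lp[i] > lp[j]) + 0.5 * sum(lp[i] == lp[j])
  }
  c(num = num, den = den)
}

# Pooled: add numerators and denominators over folds, divide once
cindex_pooled <- function(parts_list) {
  tot <- Reduce(`+`, parts_list)
  unname(tot["num"] / tot["den"])
}

# Fold-average: one C-index per fold, then the mean
cindex_foldaverage <- function(parts_list) {
  mean(vapply(parts_list, function(p) unname(p["num"] / p["den"]), numeric(1)))
}

# Two toy test folds
fold1 <- cindex_parts(time = c(1, 2, 3), delta = c(1, 1, 0), lp = c(3, 2, 1))
fold2 <- cindex_parts(time = c(1, 2, 3, 4), delta = c(1, 0, 1, 0), lp = c(1, 2, 3, 4))
```

Here fold1 is perfectly concordant (3/3) and fold2 is fully discordant (0/4). Pooling therefore gives 3/7, while averaging gives 0.5, because the pooled version weights each fold by its number of comparable pairs.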

c_index_stratum

Optional stratum vector. Used only when cv.criteria is "CIndex_pooled" or "CIndex_foldaverage" to compute a stratified C-index while the fitted model is non-stratified; if supplied, it must be identical to stratum. Default NULL.

message

Logical; whether to print progress messages. Default = FALSE.

seed

Optional integer random seed for fold assignment.

...

Additional arguments passed to coxkl_enet.

Value

An object of class "cv.coxkl_enet":

integrated_stat.full_results

Data frame with columns eta, lambda, and the aggregated CV score for each lambda under the chosen cv.criteria. For loss criteria, an additional column holds the transformed loss (Loss = -2 * score); for C-index criteria, the score column is named CIndex_pooled or CIndex_foldaverage.

integrated_stat.best_per_eta

Data frame with the best lambda (per eta) according to the chosen cv.criteria (minimizing loss or maximizing C-index).

integrated_stat.betahat_best

Matrix of coefficient vectors (columns) corresponding to the best lambda for each eta.

external_stat

Scalar baseline statistic computed from the external risk score RS under the same cv.criteria.

criteria

The evaluation criterion used (as provided in cv.criteria).

alpha

The elastic-net mixing parameter used.

nfolds

Number of folds.

Details

Data are sorted by stratum and then by time. External information must be supplied through RS or beta; if beta is given (with length ncol(z)), the risk score is computed as RS = z %*% beta. alpha must lie in (0, 1].
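The preprocessing described above can be sketched as follows. This is a minimal illustration, not the package's code: prep_data is a hypothetical name, and a missing stratum is treated as a single stratum, in line with the warning shown in the example below.

```r
# Hypothetical preprocessing: derive RS from beta if needed, then sort by stratum and time.
prep_data <- function(z, delta, time, stratum = NULL, RS = NULL, beta = NULL) {
  if (is.null(RS)) {
    if (is.null(beta) || length(beta) != ncol(z))
      stop("Supply RS, or beta with length ncol(z).")
    RS <- as.vector(z %*% beta)  # external risk score from external coefficients
  }
  if (is.null(stratum)) stratum <- rep(1, nrow(z))  # treat all data as one stratum
  ord <- order(stratum, time)  # sort by stratum first, then by time
  list(z = z[ord, , drop = FALSE], delta = delta[ord], time = time[ord],
       stratum = stratum[ord],
       RS = if (is.matrix(RS)) RS[ord, , drop = FALSE] else RS[ord])
}

# Toy data: 3 subjects, 2 covariates
z <- matrix(c(1, 0, 2, 1, 0, 1), nrow = 3)
out <- prep_data(z, delta = c(1, 0, 1), time = c(5, 2, 8), beta = c(0.5, -1))
out$time  # 2 5 8
out$RS    # 0.0 -0.5 0.0
```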

For each candidate eta, a decreasing lambda path is used (generated from nlambda/lambda.min.ratio if lambda = NULL); CV folds are created by get_fold. Each fold fits coxkl_enet on the training split (full lambda path) and evaluates the chosen criterion on the test split.
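The fold loop can be pictured with the sketch below. The fold assignment here is a hypothetical stand-in for get_fold (simple balanced random labels); the actual scheme in get_fold, for example whether it balances events across folds, may differ. The model fit and evaluation are left as comments because they depend on coxkl_enet.

```r
# Hypothetical stand-in for get_fold(): balanced random fold labels.
# Note: set.seed() changes the global RNG state when seed is supplied.
assign_folds <- function(n, nfolds = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(seq_len(nfolds), length.out = n))
}

folds <- assign_folds(n = 12, nfolds = 5, seed = 1)
for (k in seq_len(5)) {
  train_idx <- which(folds != k)  # rows used to fit the model
  test_idx  <- which(folds == k)  # held-out rows for evaluation
  # fit coxkl_enet() on train_idx over the full lambda path,
  # then evaluate cv.criteria on test_idx
}
```

With 12 subjects and 5 folds, the folds receive 3, 3, 2, 2 and 2 subjects in some random order.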

The cross-validated score is aggregated differently for each value of cv.criteria ("V&VH", "LinPred", "CIndex_pooled", or "CIndex_foldaverage"):

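  • "V&VH": for each fold, computes pl(full) - pl(train), the partial log-likelihood of the full data minus that of the training data, both evaluated at the coefficients fitted on the training split, and sums these differences across folds (reported as a loss via Loss = -2 * score).

  • "LinPred": collects the test-fold (out-of-fold) linear predictors from all folds and evaluates the partial log-likelihood of the full data at these predictors (reported as Loss = -2 * score).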

  • "CIndex_pooled": pools comparable-pair numerators/denominators across folds to compute one C-index.

  • "CIndex_foldaverage": averages the per-fold stratified C-index.
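The two loss criteria can be illustrated with a Breslow-type partial log-likelihood. This is a minimal sketch for a single lambda, assuming no stratification and one coefficient vector per fold (beta_list[[k]] fitted without fold k); the function names are hypothetical and the package may handle ties and strata differently.

```r
# Breslow-type partial log-likelihood at linear predictor lp.
pl <- function(time, delta, lp, stratum = rep(1, length(time))) {
  out <- 0
  for (i in which(delta == 1)) {
    risk <- time >= time[i] & stratum == stratum[i]  # risk set at time[i]
    out <- out + lp[i] - log(sum(exp(lp[risk])))
  }
  out
}

# V&VH: sum over folds of pl(full) - pl(train), both at the training-fold beta
vvh_score <- function(z, time, delta, folds, beta_list) {
  score <- 0
  for (k in seq_along(beta_list)) {
    train <- folds != k
    b <- beta_list[[k]]
    score <- score + pl(time, delta, as.vector(z %*% b)) -
      pl(time[train], delta[train], as.vector(z[train, , drop = FALSE] %*% b))
  }
  score
}

# LinPred: assemble out-of-fold linear predictors, then one full-data pl
linpred_score <- function(z, time, delta, folds, beta_list) {
  lp <- numeric(nrow(z))
  for (k in seq_along(beta_list)) {
    test <- folds == k
    lp[test] <- as.vector(z[test, , drop = FALSE] %*% beta_list[[k]])
  }
  pl(time, delta, lp)
}

# Toy check with all-zero coefficients (each fold's "fit" is the null model)
z  <- matrix(1:8, nrow = 4)
tt <- c(1, 2, 3, 4); dd <- c(1, 1, 0, 1); ff <- c(1, 2, 1, 2)
b0 <- list(c(0, 0), c(0, 0))
vvh_score(z, tt, dd, ff, b0)      # -2 * log(6)
linpred_score(z, tt, dd, ff, b0)  # -log(12)
```

Either score is then reported as a loss, Loss = -2 * score, so that smaller is better for both criteria.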

The best lambda is selected per eta (min loss / max C-index), and the function returns full results, the per-eta optimum, corresponding coefficients, and an external baseline from RS.
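Per-eta selection amounts to a grouped arg-min (for losses) or arg-max (for C-indices). The sketch below runs on a made-up table whose layout only resembles full_results; the column name "Loss" and the helper best_per_eta are assumptions for illustration. For the C-index criteria, pass col = "CIndex_pooled" or "CIndex_foldaverage" with maximize = TRUE.

```r
# Toy table shaped like full_results (values are invented for illustration)
full_results <- data.frame(
  eta    = rep(c(0, 1), each = 3),
  lambda = rep(c(0.3, 0.1, 0.03), times = 2),
  Loss   = c(10, 8, 9, 7, 7.5, 8)
)

# Hypothetical helper: keep the best row within each eta
best_per_eta <- function(res, col = "Loss", maximize = FALSE) {
  do.call(rbind, lapply(split(res, res$eta), function(d) {
    d[if (maximize) which.max(d[[col]]) else which.min(d[[col]]), ]
  }))
}

best <- best_per_eta(full_results)
best$lambda  # 0.1 for eta = 0, 0.3 for eta = 1
```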

Examples

data(ExampleData_highdim) 

train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external

etas <- generate_eta(method = "exponential", n = 10, max_eta = 100)

cv_res <- cv.coxkl_enet(z = train_dat_highdim$z,
                        delta = train_dat_highdim$status,
                        time = train_dat_highdim$time,
                        stratum = NULL,
                        RS = NULL,
                        beta = beta_external_highdim,
                        etas = etas,
                        alpha = 1.0)
#> Warning: Stratum not provided. Treating all data as one stratum.