Cross-Validation for CoxKL Model with elastic net & lasso penalty
cv.coxkl_enet.RdThis function performs cross-validation on the high-dimensional Cox model with
Kullback–Leibler (KL) penalty.
It tunes the parameter eta (external information weight) using user-specified
cross-validation criteria, while also evaluating a lambda path
(either provided or generated) and selecting the best lambda per eta.
Usage
cv.coxkl_enet(
z,
delta,
time,
stratum = NULL,
RS = NULL,
beta = NULL,
etas,
alpha = 1,
lambda = NULL,
nlambda = 100,
lambda.min.ratio = ifelse(n < p, 0.05, 0.001),
nfolds = 5,
cv.criteria = c("V&VH", "LinPred", "CIndex_pooled", "CIndex_foldaverage"),
c_index_stratum = NULL,
message = FALSE,
seed = NULL,
...
)Arguments
- z
Numeric matrix of covariates with rows representing individuals and columns representing predictors.
- delta
Numeric vector of event indicators (1 = event, 0 = censored).
- time
Numeric vector of observed times (event or censoring).
- stratum
Optional factor or numeric vector indicating strata.
- RS
Optional numeric vector or matrix of external risk scores. If not provided,
betamust be supplied.- beta
Optional numeric vector of external coefficients (length equal to
ncol(z)). If not provided,RSmust be supplied.- etas
Numeric vector of candidate
etavalues to be evaluated.- alpha
Elastic-net mixing parameter in \((0,1]\). Default =
1(lasso penalty).- lambda
Optional numeric scalar or vector of penalty parameters. If
NULL, a decreasing path is generated usingnlambdaandlambda.min.ratio.- nlambda
Integer number of lambda values to generate when
lambdaisNULL. Default100.- lambda.min.ratio
Ratio of the smallest to the largest lambda when generating a sequence (when
lambdaisNULL). Default0.05whenn < p, otherwise1e-3.- nfolds
Integer; number of cross-validation folds. Default =
5.- cv.criteria
Character string specifying the cross-validation criterion. Choices are:
"V&VH"(default):"V&VH"loss."LinPred": loss based on cross-validated linear predictors."CIndex_pooled": pool all held-out predictions and compute one overall C-index."CIndex_foldaverage": average C-index across folds.
- c_index_stratum
Optional stratum vector. Used only when
cv.criteriais"CIndex_pooled"or"CIndex_foldaverage"to compute a stratified C-index while the fitted model is non-stratified; if supplied, it must be identical tostratum. DefaultNULL.- message
Logical; whether to print progress messages. Default =
FALSE.- seed
Optional integer random seed for fold assignment.
- ...
Additional arguments passed to
coxkl_enet.
Value
An object of class "cv.coxkl_enet":
integrated_stat.full_resultsData frame with columns
eta,lambda, and the aggregated CV score for eachlambdaunder the chosencv.criteria. For loss criteria, an additional column with the transformed loss (Loss = -2 * score); for C-index criteria, a column namedCIndex_pooledorCIndex_foldaverage.integrated_stat.best_per_etaData frame with the best
lambda(pereta) according to the chosencv.criteria(minimizing loss or maximizing C-index).integrated_stat.betahat_bestMatrix of coefficient vectors (columns) corresponding to the best
lambdafor eacheta.external_statScalar baseline statistic computed from the external risk score
RSunder the samecv.criteria.criteriaThe evaluation criterion used (as provided in
cv.criteria).alphaThe elastic-net mixing parameter used.
nfoldsNumber of folds.
Details
Data are sorted by stratum and time. External info must be from RS or
beta (if beta given with length ncol(z), RS = z %*% beta); alpha \(\in (0,1]\).
For each candidate eta, a decreasing lambda path is used (generated from nlambda/lambda.min.ratio
if lambda = NULL); CV folds are created by get_fold. Each fold fits coxkl_enet
on the training split (full lambda path) and evaluates the chosen criterion on the test split.
Aggregation follows the code paths for "V&VH", "LinPred", "CIndex_pooled", or "CIndex_foldaverage":
"V&VH": sumspl(full) - pl(train)across folds (reported as loss viaLoss = -2 * score)."LinPred": aggregates test-fold linear predictors and evaluates partial log-likelihood on full data (reported asLoss = -2 * score)."CIndex_pooled": pools comparable-pair numerators/denominators across folds to compute one C-index."CIndex_foldaverage": averages the per-fold stratified C-index.
The best lambda is selected per eta (min loss / max C-index), and the function returns full results,
the per-eta optimum, corresponding coefficients, and an external baseline from RS.
Examples
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external
etas <- generate_eta(method = "exponential", n = 10, max_eta = 100)
cv_res <- cv.coxkl_enet(z = train_dat_highdim$z,
delta = train_dat_highdim$status,
time = train_dat_highdim$time,
stratum = NULL,
RS = NULL,
beta = beta_external_highdim,
etas = etas,
alpha = 1.0)
#> Warning: Stratum not provided. Treating all data as one stratum.