Cross-Validation for CoxKL Model with Elastic Net & Lasso Penalty
Performs k-fold cross-validation to tune the hyperparameters of the high-dimensional Cox proportional hazards model with a Kullback–Leibler (KL) divergence penalty.
This function primarily tunes the external information weight eta.
For each candidate eta, it selects the optimal regularization
parameter lambda by internal cross-validation.
Usage
cv.coxkl_enet(
  z,
  delta,
  time,
  stratum = NULL,
  RS = NULL,
  beta = NULL,
  etas,
  alpha = 1,
  lambda = NULL,
  nlambda = 100,
  lambda.min.ratio = ifelse(n < p, 0.05, 0.001),
  nfolds = 5,
  cv.criteria = c("V&VH", "LinPred", "CIndex_pooled", "CIndex_foldaverage"),
  c_index_stratum = NULL,
  message = FALSE,
  seed = NULL,
  ...
)
Arguments
- z
Numeric matrix of covariates. Rows represent individuals and columns represent predictors.
- delta
Numeric vector of event indicators (1 = event, 0 = censored).
- time
Numeric vector of observed times (event or censoring).
- stratum
Optional numeric or factor vector indicating strata. If NULL, all subjects are assumed to be in the same stratum.
- RS
Optional numeric vector or matrix of external risk scores. If not provided, beta must be supplied.
- beta
Optional numeric vector of external coefficients (length equal to ncol(z)). If provided, it is used to compute external risk scores. If not provided, RS must be supplied.
- etas
Numeric vector of candidate eta values to be evaluated.
- alpha
Elastic-net mixing parameter in \((0,1]\). Default is 1 (lasso penalty).
- lambda
Optional numeric vector of lambda values. If NULL, a path is generated automatically.
- nlambda
Integer. Number of lambda values to generate if lambda is NULL. Default is 100.
- lambda.min.ratio
Numeric. Ratio of the smallest to the largest lambda. The default depends on sample size versus dimension (0.05 if n < p, otherwise 0.001).
- nfolds
Integer. Number of cross-validation folds. Default is 5.
- cv.criteria
Character string specifying the cross-validation criterion for selecting both eta and lambda. Choices are:
"V&VH" (default): V&VH loss.
"LinPred": Loss based on cross-validated linear predictors.
"CIndex_pooled": Pooled C-index.
"CIndex_foldaverage": Average C-index across folds.
- c_index_stratum
Optional stratum vector. Required only when cv.criteria is set to "CIndex_pooled" or "CIndex_foldaverage", and a stratified C-index needs to be computed while the fitted model is non-stratified. Default is NULL.
- message
Logical. Whether to print progress messages. Default is FALSE.
- seed
Optional integer. Random seed for reproducible fold assignment.
- ...
Additional arguments passed to the underlying fitting function
coxkl_enet.
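The relationship between beta and RS, and the role of nlambda and lambda.min.ratio, can be sketched in base R. This is only an illustration under stated assumptions: the external risk score is taken to be the linear predictor z %*% beta, and the lambda path is taken to be log-spaced from a largest value down to lambda.min.ratio times that value, as is conventional for penalized regression paths. Neither detail is confirmed by this page, and z_demo, beta_ext, and lambda_max are made-up placeholders.

```r
# Simulated inputs; all names are illustrative, not package objects.
set.seed(1)
n <- 6
p <- 3
z_demo <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta_ext <- c(0.5, -0.2, 0)   # hypothetical external coefficients, length ncol(z_demo)

# Assumed form of the external risk score: one linear predictor per individual.
RS_demo <- as.vector(z_demo %*% beta_ext)

# Assumed log-spaced lambda path (glmnet-style convention).
lambda_max <- 1               # placeholder; in practice this is derived from the data
nlambda <- 100
lambda.min.ratio <- ifelse(n < p, 0.05, 0.001)
lambda_path <- exp(seq(log(lambda_max),
                       log(lambda_max * lambda.min.ratio),
                       length.out = nlambda))
```

Under these assumptions, supplying beta = beta_ext or RS = RS_demo would carry the same external information, which is why only one of the two needs to be given.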
Value
An object of class "cv.coxkl_enet". A list containing:
- best
A list with the optimal parameters:
best_eta: The selected eta value.
best_lambda: The selected lambda value.
best_beta: The coefficient vector corresponding to the best eta and lambda.
criteria: The criterion used for selection.
- integrated_stat.full_results
A data.frame containing the performance metric for every combination of eta and lambda.
- integrated_stat.best_per_eta
A data.frame containing the best lambda and corresponding score for each candidate eta.
- integrated_stat.betahat_best
A matrix of coefficients where each column corresponds to the optimal model for a specific eta.
- criteria
The selection criterion used.
- alpha
The elastic net mixing parameter used.
- nfolds
The number of folds used.
Details
The function iterates through the provided etas. For each eta,
it performs cross-validation (based on nfolds) to select the optimal lambda
and computes the corresponding cross-validation score.
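The two-level selection described above can be pictured with a toy grid. The sketch below does not call the package; the matrix cv_loss holds invented cross-validation losses (lower is better), and the column names of best_per_eta are illustrative rather than the names the function returns.

```r
# Hypothetical CV loss for each (eta, lambda) pair; the values are made up.
etas <- c(0, 1, 10)
lambdas <- c(0.1, 0.01, 0.001)
cv_loss <- matrix(c(5.0, 4.2, 4.6,
                    4.8, 3.9, 4.1,
                    5.1, 4.4, 4.0),
                  nrow = length(etas), byrow = TRUE,
                  dimnames = list(eta = etas, lambda = lambdas))

# Step 1: for each eta (row), find the lambda with the smallest loss.
best_idx <- apply(cv_loss, 1, which.min)
best_per_eta <- data.frame(
  eta = etas,
  lambda = lambdas[best_idx],
  score = cv_loss[cbind(seq_along(etas), best_idx)]
)

# Step 2: among those per-eta winners, keep the eta with the smallest score.
best <- best_per_eta[which.min(best_per_eta$score), ]
```

For a criterion where larger values are better, such as a C-index, the same logic would apply with which.max in place of which.min.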
The available criteria for selection are:
"V&VH": The Verweij & Van Houwelingen partial likelihood loss (default).
"LinPred": Loss based on the prognostic performance of the linear predictor.
"CIndex_pooled": Harrell's C-index computed by pooling linear predictors across folds.
"CIndex_foldaverage": Harrell's C-index computed within each fold and averaged.
Examples
if (FALSE) { # \dontrun{
# Load the example data and extract the training set and external coefficients
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external

# Candidate eta values
eta_list <- generate_eta(method = "exponential", n = 50, max_eta = 100)

# Cross-validate over eta (and lambda within each eta) using the pooled C-index
cv.coxkl_enet_est <- cv.coxkl_enet(
  z = train_dat_highdim$z,
  delta = train_dat_highdim$status,
  time = train_dat_highdim$time,
  stratum = train_dat_highdim$stratum,
  beta = beta_external_highdim,
  etas = eta_list,
  alpha = 1,
  cv.criteria = "CIndex_pooled"
)
} # }
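The difference between the "CIndex_pooled" and "CIndex_foldaverage" criteria listed under Details can be illustrated without the package. The sketch below uses a simplified, unstratified version of Harrell's C-index (harrell_c is a hypothetical helper that ignores tied event times) and invented held-out linear predictors; it is not the package's implementation.

```r
# Simplified Harrell's C-index: a pair (i, j) is comparable when subject i
# has an event and subject j is observed for longer. Higher lp = higher risk.
harrell_c <- function(time, delta, lp) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (delta[i] != 1) next
    for (j in seq_len(n)) {
      if (time[j] > time[i]) {
        comparable <- comparable + 1
        # Ties in the linear predictor count as half-concordant.
        concordant <- concordant + (lp[i] > lp[j]) + 0.5 * (lp[i] == lp[j])
      }
    }
  }
  concordant / comparable
}

# Made-up held-out data: two folds whose linear predictors sit on different scales.
time  <- c(2, 4, 6, 8, 1, 3, 5, 7)
delta <- c(1, 1, 0, 1, 1, 0, 1, 1)
lp    <- c(1.0, 0.5, 0.2, 0.1, 3.0, 2.5, 2.2, 2.1)
fold  <- c(1, 1, 1, 1, 2, 2, 2, 2)

# Pooled: one C-index over all held-out predictions together.
c_pooled <- harrell_c(time, delta, lp)

# Fold average: one C-index per fold, then the mean.
c_by_fold <- sapply(split(seq_along(time), fold), function(idx) {
  harrell_c(time[idx], delta[idx], lp[idx])
})
c_foldaverage <- mean(c_by_fold)
```

In this toy data the ranking is perfect within each fold, so the fold average equals 1, while the pooled value is lower because pairs formed across folds compare predictions from differently scaled models.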