Cross-Validated CLR with Mahalanobis Distance Transfer Learning and Elastic Net Penalty
cv.ncc_MDTL_enet.Rd
Performs K-fold cross-validation (CV) to jointly select the integration
parameter eta and the Elastic Net penalty parameter lambda
for Conditional Logistic Regression with Mahalanobis distance transfer
learning and Elastic Net penalty, implemented via ncc_MDTL_enet.
This function is designed for 1:m matched case-control settings where each stratum (matched set) contains exactly one case and \(m\) controls.
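Because the function assumes a 1:m design, it can help to confirm the structure of the data before calling it. The following is a minimal sketch; check_matched_sets is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of the package): verify the 1:m matched design.
check_matched_sets <- function(y, stratum) {
  stopifnot(length(y) == length(stratum), all(y %in% c(0, 1)))
  stratum <- factor(stratum)                  # drops unused levels, if any
  cases_per_set <- tapply(y, stratum, sum)    # number of cases in each matched set
  set_sizes <- tapply(y, stratum, length)     # size of each matched set, i.e. 1 + m
  list(
    one_case_each = all(cases_per_set == 1),
    bad_sets = names(cases_per_set)[cases_per_set != 1],
    m = unique(as.vector(set_sizes)) - 1      # more than one value means unequal set sizes
  )
}

# Toy data: three matched sets, each with one case and two controls (1:2)
y <- c(1, 0, 0, 0, 1, 0, 0, 0, 1)
stratum <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
chk <- check_matched_sets(y, stratum)
```

Here chk$one_case_each flags whether every set has exactly one case, and chk$bad_sets names any sets that do not.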
Arguments
- y
Numeric vector of binary outcomes (0 = control, 1 = case).
- z
Numeric matrix of covariates.
- stratum
Numeric or factor vector defining the matched sets. Required.
- beta
Numeric vector of external coefficients (length ncol(z)). Required.
- vcov
Optional numeric matrix (ncol(z) x ncol(z)) used as the weighting matrix \(Q\). If NULL, defaults to the identity matrix.
- etas
Numeric vector of candidate tuning values for \(\eta\). Required.
- alpha
Elastic Net mixing parameter in \((0,1]\). Default NULL (set to 1 with a warning if not supplied).
- lambda
Optional numeric vector of lambda values. If NULL, a lambda path is generated automatically for each eta.
- nlambda
Integer. Number of lambda values. Default 100.
- lambda.min.ratio
Smallest lambda as a fraction of lambda.max. Default depends on sample size relative to number of covariates.
- nfolds
Number of cross-validation folds. Default 5.
- criteria
Character string specifying the CV performance criterion. One of "loss" (default), "AUC", "CIndex", or "Brier".
- message
Logical. If TRUE, prints progress messages. Default FALSE.
- seed
Optional integer seed for reproducible fold assignment. Default NULL.
- ...
Additional arguments passed to ncc_MDTL_enet.
Value
A list of class "cv.ncc_MDTL_enet" containing:
- best
A list with the global best \((\eta, \lambda)\): best_eta, best_lambda, best_beta, criteria.
- integrated_stat.full_results
A data.frame with the CV score for every \((\eta, \lambda)\) combination.
- integrated_stat.best_per_eta
A data.frame with the best lambda and score for each eta.
- integrated_stat.betahat_best
Matrix of full-data coefficients at the best lambda for each eta.
- criteria
The CV criterion used.
- alpha
The Elastic Net mixing parameter.
- nfolds
The number of folds used.
Details
Cross-validation is performed at the stratum level: each matched set is
treated as an indivisible unit and assigned to a single fold using
get_fold_cc.
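The idea of stratum-level fold assignment can be sketched as follows. This is an illustration only: assign_stratum_folds is a hypothetical stand-in, and the actual behaviour and signature of get_fold_cc may differ.

```r
# Hypothetical stand-in for stratum-level fold assignment; get_fold_cc may differ.
assign_stratum_folds <- function(stratum, nfolds = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sets <- unique(stratum)
  stopifnot(length(sets) >= nfolds)
  # Shuffle fold labels across matched sets so fold sizes differ by at most one set
  set_fold <- sample(rep_len(seq_len(nfolds), length(sets)))
  # Give every observation the fold label of its matched set
  set_fold[match(stratum, sets)]
}

stratum <- rep(1:10, each = 3)   # toy data: 10 matched sets of size 3 (1:2 matching)
folds <- assign_stratum_folds(stratum, nfolds = 5, seed = 42)
table(folds)                     # number of observations per fold
```

Because folds are drawn per matched set rather than per observation, a case is never separated from its controls, which the conditional likelihood requires.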
For each candidate eta, a full lambda path is fit on the complete
data, and then K-fold CV is used to evaluate each lambda along this path.
The function performs a 2D search over \((\eta, \lambda)\).
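The selection step of such a 2D search can be sketched as below. The scores are toy numbers made up for illustration, not package output, and the rectangular grid is a simplifying assumption: when lambda is NULL, each eta gets its own lambda path, so the real grid need not share lambda values across eta.

```r
# Toy CV scores for illustration only; not output from the package.
etas <- c(0, 1, 5)
lambdas <- c(0.10, 0.05, 0.01)
grid <- expand.grid(lambda = lambdas, eta = etas)   # lambda varies fastest
grid$score <- c(1.30, 1.22, 1.25,   # eta = 0
                1.28, 1.15, 1.19,   # eta = 1
                1.31, 1.21, 1.24)   # eta = 5

# "loss" and "Brier" are minimised; "AUC" and "CIndex" would use which.max instead
lower_is_better <- TRUE
pick <- if (lower_is_better) which.min else which.max

# Best lambda within each eta
best_per_eta <- do.call(
  rbind,
  lapply(split(grid, grid$eta), function(d) d[pick(d$score), ])
)

# Global best over the whole (eta, lambda) grid
best <- grid[pick(grid$score), ]
```

In this toy grid the global best is eta = 1 with lambda = 0.05; best_per_eta holds one row per candidate eta.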
The criteria argument controls the CV performance metric:
- "loss": Average negative conditional log-likelihood on held-out strata (lower is better).
- "AUC": Matched-set AUC based on within-stratum comparisons (higher is better).
- "CIndex": Alias for "AUC" in the 1:m matched setting.
- "Brier": Conditional Brier score based on within-stratum softmax probabilities (lower is better).
Examples
if (FALSE) { # \dontrun{
data(ExampleData_cc_highdim)
train_cc <- ExampleData_cc_highdim$train
y <- train_cc$y
z <- train_cc$z
sets <- train_cc$stratum
beta_ext <- ExampleData_cc_highdim$beta_external
eta_list <- generate_eta(method = "exponential", n = 30, max_eta = 20)
cv_fit <- cv.ncc_MDTL_enet(
  y = y,
  z = z,
  stratum = sets,
  beta = beta_ext,
  vcov = NULL,
  etas = eta_list,
  alpha = 1,
  nfolds = 5,
  criteria = "loss",
  seed = 42
)
cv_fit$best$best_eta
cv_fit$best$best_lambda
} # }