Cross-Validated CLR with Mahalanobis Distance Transfer Learning
cv.ncc_MDTL.Rd
Performs K-fold cross-validation (CV) to select the integration parameter
eta for conditional logistic regression with Mahalanobis distance
transfer learning, implemented via ncc_MDTL.
This function is designed for 1:m matched case-control settings, where each stratum (matched set) contains exactly one case and \(m\) controls.
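To make the expected data layout concrete, here is a toy 1:2 matched structure (illustrative data only, not from the package):

```r
# Three matched sets, each with one case and m = 2 controls.
set.seed(1)
stratum <- rep(1:3, each = 3)          # matched-set labels
y <- rep(c(1, 0, 0), times = 3)        # one case per stratum
z <- matrix(rnorm(9 * 2), ncol = 2)    # two covariates

# Each stratum contains exactly one case:
tapply(y, stratum, sum)
```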
Usage
cv.ncc_MDTL(
y,
z,
stratum,
beta,
vcov = NULL,
etas = NULL,
tol = 1e-04,
Mstop = 100,
nfolds = 5,
criteria = c("loss", "AUC", "CIndex", "Brier"),
message = FALSE,
seed = NULL,
...
)
Arguments
- y
Numeric vector of binary outcomes (0 = control, 1 = case).
- z
Numeric matrix of covariates.
- stratum
Numeric or factor vector defining the matched sets. Required.
- beta
Numeric vector of external coefficients (length ncol(z)). Required.
- vcov
Optional numeric matrix (ncol(z) x ncol(z)) used as the weighting matrix \(Q\). Typically the precision matrix of the external estimator. If NULL, defaults to the identity matrix.
- etas
Numeric vector of candidate tuning values for \(\eta\). Required.
- tol
Convergence tolerance passed to ncc_MDTL. Default 1e-4.
- Mstop
Maximum number of Newton-Raphson iterations passed to ncc_MDTL. Default 100.
- nfolds
Number of cross-validation folds. Default 5.
- criteria
Character string specifying the CV performance criterion. One of "loss" (default), "AUC", "CIndex", or "Brier".
- message
Logical. If TRUE, prints progress messages. Default FALSE.
- seed
Optional integer seed for reproducible fold assignment. Default NULL.
- ...
Additional arguments passed to ncc_MDTL.
Value
A list of class "cv.ncc_MDTL" containing:
- internal_stat
A data.frame with one row per eta and the CV metric for the chosen criteria.
- beta_full
Matrix of coefficients from the full-data fit (columns correspond to etas).
- best
A list with best_eta, best_beta, and criteria.
- criteria
The criterion used for selection.
- nfolds
The number of folds used.
Details
Cross-validation is performed at the stratum level: each matched set is
treated as an indivisible unit and assigned to a single fold using
get_fold_cc. This ensures that the conditional likelihood is
well-defined within each training and test split.
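The stratum-level assignment described above can be sketched as follows. This is an illustrative stand-in, not the package's get_fold_cc; the actual implementation may differ:

```r
# Assign each matched set (stratum) to a single CV fold, so no
# stratum is split across training and test data.
assign_strata_to_folds <- function(stratum, nfolds = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ids <- unique(stratum)
  # Deal shuffled fold labels out to the stratum ids round-robin.
  fold_of_id <- sample(rep_len(seq_len(nfolds), length(ids)))
  # Map each observation back to its stratum's fold.
  fold_of_id[match(stratum, ids)]
}

stratum <- rep(1:6, each = 3)  # six 1:2 matched sets
folds <- assign_strata_to_folds(stratum, nfolds = 3, seed = 1)

# Every matched set lands in exactly one fold:
tapply(folds, stratum, function(f) length(unique(f)))  # all 1
```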
The criteria argument controls the CV performance metric:
- "loss": Average negative conditional log-likelihood on held-out strata (lower is better).
- "AUC": Matched-set AUC based on within-stratum comparisons (higher is better).
- "CIndex": Alias for "AUC" in the 1:m matched setting.
- "Brier": Conditional Brier score based on within-stratum softmax probabilities (lower is better).
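As a hedged sketch of what the "loss" criterion computes (not the package's internal code): for each held-out matched set, the conditional likelihood of the observed case is a within-stratum softmax of the linear predictors, and the loss averages its negative log over strata.

```r
# Average negative conditional log-likelihood over matched sets,
# assuming exactly one case per stratum (illustrative only).
matched_loss <- function(y, z, stratum, beta) {
  lp <- drop(z %*% beta)                     # linear predictors
  neg_ll <- tapply(seq_along(y), stratum, function(idx) {
    p <- exp(lp[idx] - max(lp[idx]))         # stabilized softmax
    p <- p / sum(p)
    -log(p[y[idx] == 1])                     # probability of the case
  })
  mean(unlist(neg_ll))
}

set.seed(1)
z <- matrix(rnorm(18 * 2), ncol = 2)
y <- rep(c(1, 0, 0), times = 6)              # 1:2 matched sets
stratum <- rep(1:6, each = 3)
matched_loss(y, z, stratum, beta = c(0.5, -0.25))
```

With a zero coefficient vector every member of a 1:m set gets probability 1/(m + 1), so the loss reduces to log(m + 1), which is a useful sanity check.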
Examples
if (FALSE) { # \dontrun{
data(ExampleData_cc)
train_cc <- ExampleData_cc$train
y <- train_cc$y
z <- train_cc$z
sets <- train_cc$stratum
beta_ext <- ExampleData_cc$beta_external
eta_list <- generate_eta(method = "exponential", n = 50, max_eta = 10)
cv_fit <- cv.ncc_MDTL(
y = y,
z = z,
stratum = sets,
beta = beta_ext,
vcov = NULL,
etas = eta_list,
nfolds = 5,
criteria = "loss",
seed = 42
)
cv_fit$best$best_eta
} # }