Cox Proportional Hazards Model Integrated with External Individual-level Data and Elastic Net Penalty
Fits a series of penalized stratified Cox models that integrate an external
individual-level dataset via a composite likelihood weight eta, while
applying an Elastic Net (Lasso + Ridge) penalty for variable selection and
regularization in high-dimensional settings.
Usage
cox_indi_enet(
z_int,
delta_int,
time_int,
stratum_int = NULL,
z_ext,
delta_ext,
time_ext,
stratum_ext = NULL,
etas,
alpha = 1,
lambda = NULL,
nlambda = 100,
lambda.min.ratio = NULL,
lambda.early.stop = FALSE,
tol = 1e-04,
Mstop = 1000,
max.total.iter = (Mstop * nlambda),
group = NULL,
group.multiplier = NULL,
standardize = TRUE,
nvar.max = NULL,
group.max = NULL,
stop.loss.ratio = 0.01,
actSet = TRUE,
actIter = Mstop,
actGroupNum = NULL,
actSetRemove = FALSE,
returnX = FALSE,
trace.lambda = FALSE,
message = FALSE,
...
)
Arguments
- z_int
Numeric matrix of covariates for the internal dataset (\(n_{\text{int}} \times p\)).
- delta_int
Numeric vector of event indicators for the internal dataset (1 = event, 0 = censored).
- time_int
Numeric vector of survival times for the internal dataset.
- stratum_int
Optional stratum identifiers for the internal dataset. Default NULL assigns all internal observations to a single stratum.
- z_ext
Numeric matrix of covariates for the external dataset (\(n_{\text{ext}} \times p\)). Must have the same number of columns as z_int.
- delta_ext
Numeric vector of event indicators for the external dataset (1 = event, 0 = censored).
- time_ext
Numeric vector of survival times for the external dataset.
- stratum_ext
Optional stratum identifiers for the external dataset. Default NULL assigns all external observations to a single stratum.
- etas
Numeric vector of nonnegative external weights. eta = 0 gives an internal-only penalized fit. The vector is sorted internally in ascending order.
- alpha
The Elastic Net mixing parameter, with \(0 < \alpha \le 1\). alpha = 1 is the lasso penalty, and alpha close to 0 approaches ridge. Defaults to 1.
- lambda
Optional numeric vector of penalty parameters applied to all eta values. If NULL, a lambda path is generated automatically for each eta.
- nlambda
Integer. The number of lambda values to generate per eta. Default is 100.
- lambda.min.ratio
Numeric. The ratio of the smallest to the largest lambda in the sequence. Default is 0.05 if \(n_{\text{all}} < p\), and 1e-3 otherwise.
- lambda.early.stop
Logical. If TRUE, stops the lambda path early if the loss improvement is small. Default FALSE.
- tol
Numeric. Convergence tolerance for the coordinate descent optimization. Default is 1e-4.
- Mstop
Integer. Maximum coordinate descent iterations per lambda. Default is 1000.
- max.total.iter
Integer. Maximum total iterations across the entire lambda path. Default is Mstop * nlambda.
- group
Integer vector defining group membership for grouped penalties. Default treats each variable as its own group (standard Elastic Net / Lasso).
- group.multiplier
Numeric vector. Multiplicative factors for penalties applied to each group.
- standardize
Logical. If TRUE, the stacked design matrix z_all is standardized internally. Coefficients are returned on the original scale. Default TRUE.
- nvar.max
Integer. Maximum number of active variables allowed. Defaults to p.
- group.max
Integer. Maximum number of active groups allowed. Defaults to the total number of unique groups.
- stop.loss.ratio
Numeric. Threshold for early stopping based on loss ratio. Default is 1e-2.
- actSet
Logical. If TRUE, uses an active-set strategy for coordinate descent. Default TRUE.
- actIter
Integer. Maximum iterations for active set refinement. Default is Mstop.
- actGroupNum
Integer. Limit on active groups in the active set strategy.
- actSetRemove
Logical. Whether to allow removal from the active set. Default FALSE.
- returnX
Logical. If TRUE, the standardized design matrix object std.Z is included in the returned result. Default FALSE.
- trace.lambda
Logical. If TRUE, prints the lambda sequence progress. Default FALSE.
- message
Logical. If TRUE, shows a progress bar over the etas loop. Default FALSE.
- ...
Additional arguments (currently unused).
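The interplay of nlambda and lambda.min.ratio can be sketched as follows. This is a minimal illustration, assuming the path is log-spaced between the largest lambda and lambda.min.ratio times that value, as is conventional for penalized regression paths. The page does not state the spacing, and the helper make_lambda_path and the value lambda_max = 2 are hypothetical, not part of the package.

```r
# Hypothetical helper: builds a decreasing, log-spaced lambda path.
# The default ratio mirrors the documented rule:
# 0.05 if n_all < p, and 1e-3 otherwise.
make_lambda_path <- function(lambda_max, nlambda = 100, n_all, p,
                             lambda.min.ratio = NULL) {
  if (is.null(lambda.min.ratio)) {
    lambda.min.ratio <- if (n_all < p) 0.05 else 1e-3
  }
  exp(seq(log(lambda_max),
          log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

# High-dimensional toy case (n_all < p), so the ratio defaults to 0.05
path <- make_lambda_path(lambda_max = 2, nlambda = 5, n_all = 50, p = 200)
range(path)
```

A larger ratio in the n_all < p case keeps the path away from very small penalties, where a high-dimensional fit tends to become unstable.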
Value
An object of class "cox_indi_enet" containing:
- eta
Sorted sequence of \(\eta\) values used.
- beta
Named list of length length(etas). Each element is a matrix of estimated coefficients (\(p \times n_\lambda\)) on the original covariate scale, with columns named by the corresponding lambda values.
- lambda
Named list of length length(etas). Each element is the vector of lambda values actually used for that eta.
- alpha
The Elastic Net mixing parameter used.
- linear.predictors_int
List of matrices (\(n_{\text{int}} \times n_\lambda\)) of internal linear predictors in the original observation order, one per eta.
- linear.predictors_ext
List of matrices (\(n_{\text{ext}} \times n_\lambda\)) of external linear predictors in the original observation order, one per eta.
- group
Factor vector of group assignments for each covariate.
- group.multiplier
Numeric vector of group penalty multipliers used.
- data
List of the original input data used.
Details
The fitted objective is
$$\ell_{\eta,\lambda}(\beta) = \ell_{\text{int}}(\beta) + \eta \, \ell_{\text{ext}}(\beta) - \text{Pen}_{\lambda,\alpha}(\beta),$$
where \(\text{Pen}_{\lambda,\alpha}\) is the Elastic Net penalty. This is equivalent
to fitting a penalized stratified Cox model on the stacked data with observation
weights 1 (internal) and eta (external), while keeping internal and external
strata separated (no mixing of risk sets across cohorts).
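The stacked-data view described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the package's internal code: the toy values, the names w_all and stratum_all, and the "int_"/"ext_" label prefixes are assumptions made here.

```r
# Toy inputs (hypothetical values, for illustration only)
z_int <- matrix(c(1, 0, 2, 1,  0, 1, 1, 3), ncol = 2)
z_ext <- matrix(c(0, 3, 1,  2, 0, 1), ncol = 2)
stratum_int <- c(1, 1, 2, 2)
stratum_ext <- c(1, 1, 1)
eta <- 0.5

# Both cohorts must share the same covariates
stopifnot(ncol(z_int) == ncol(z_ext))

# Stack the design matrices: internal rows first, then external rows
z_all <- rbind(z_int, z_ext)

# Observation weights: 1 for internal rows, eta for external rows
w_all <- c(rep(1, nrow(z_int)), rep(eta, nrow(z_ext)))

# Prefix the stratum labels so that an internal stratum "1" and an
# external stratum "1" never share a risk set
stratum_all <- c(paste0("int_", stratum_int), paste0("ext_", stratum_ext))
table(stratum_all)
```

Relabeling the strata is what prevents risk sets from mixing across cohorts, even when both datasets use the same stratum identifiers.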
- If alpha = 1, the penalty is Lasso.
- If alpha is close to 0, the penalty approaches Ridge.
- If eta = 0, external data is effectively ignored and the model reduces to a standard Elastic Net Cox model on internal data only.
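To make the role of alpha concrete, here is a minimal sketch of an Elastic Net penalty. It assumes the common parametrization \(\lambda\,(\alpha\|\beta\|_1 + \tfrac{1-\alpha}{2}\|\beta\|_2^2)\). The exact form used by cox_indi_enet is not stated on this page, and the sketch ignores group structure and group.multiplier; enet_penalty is a hypothetical helper.

```r
# Hypothetical helper: Elastic Net penalty under the assumed parametrization
# lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
enet_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha > 0, alpha <= 1, lambda >= 0)
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

beta <- c(0.5, -1, 0)
enet_penalty(beta, lambda = 0.2, alpha = 1)     # pure Lasso term
enet_penalty(beta, lambda = 0.2, alpha = 0.01)  # dominated by the Ridge term
```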
The function fits one full lambda path per eta value. Standardization is
performed once on the stacked design matrix, before the loop over eta. When
lambda is NULL, the lambda sequence is recomputed for each eta, because
\(\lambda_{\max}\) depends on the weighted score at \(\beta = 0\).
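The dependence of the largest lambda on eta can be illustrated with a small base R sketch that evaluates the weighted, stratified Cox score at \(\beta = 0\). The assumptions are: Breslow-style risk sets, no standardization, and toy data made up for this example. The helper cox_score_at_zero is hypothetical, and the package's exact scaling of \(\lambda_{\max}\) (for example by sample size or alpha) is not reproduced here.

```r
# Hypothetical helper: weighted, stratified Cox score at beta = 0
cox_score_at_zero <- function(z, time, delta, weight, stratum) {
  score <- numeric(ncol(z))
  for (s in unique(stratum)) {
    idx <- which(stratum == s)
    for (i in idx[delta[idx] == 1]) {
      if (weight[i] == 0) next                  # zero-weight events add nothing
      risk <- idx[time[idx] >= time[i]]         # risk set within the stratum
      w <- weight[risk]
      zbar <- colSums(z[risk, , drop = FALSE] * w) / sum(w)
      score <- score + weight[i] * (z[i, ] - zbar)
    }
  }
  score
}

# Toy data (hypothetical values)
z_int <- matrix(c(1, 0, 2, 1,  0, 1, 1, 3), ncol = 2)
time_int <- c(2, 4, 1, 3); delta_int <- c(1, 0, 1, 1)
z_ext <- matrix(c(0, 3, 1,  2, 0, 1), ncol = 2)
time_ext <- c(5, 1, 2);    delta_ext <- c(1, 1, 0)

z_all <- rbind(z_int, z_ext)
time_all <- c(time_int, time_ext)
delta_all <- c(delta_int, delta_ext)
stratum_all <- c(rep("int", 4), rep("ext", 3))  # cohorts kept separate

# Largest absolute score component for each eta
max_abs_score <- sapply(c(0, 1), function(eta) {
  w_all <- c(rep(1, 4), rep(eta, 3))
  max(abs(cox_score_at_zero(z_all, time_all, delta_all, w_all, stratum_all)))
})
max_abs_score
```

Because the external contribution to the score scales with eta, the smallest penalty that shrinks every coefficient to zero changes with eta, which is why a shared lambda grid is not generated by default.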
Examples
if (FALSE) { # \dontrun{
## Load example individual-level data
data(ExampleData_indi)
z_int <- ExampleData_indi$internal$z
delta_int <- ExampleData_indi$internal$status
time_int <- ExampleData_indi$internal$time
stratum_int <- ExampleData_indi$internal$stratum
z_ext <- ExampleData_indi$external$z
delta_ext <- ExampleData_indi$external$status
time_ext <- ExampleData_indi$external$time
stratum_ext <- ExampleData_indi$external$stratum
## Generate a sequence of eta values
eta_list <- generate_eta(method = "exponential", n = 10, max_eta = 5)
## Fit the composite-likelihood Elastic Net Cox model path
fit.cox_indi_enet <- cox_indi_enet(
z_int = z_int,
delta_int = delta_int,
time_int = time_int,
stratum_int = stratum_int,
z_ext = z_ext,
delta_ext = delta_ext,
time_ext = time_ext,
stratum_ext = stratum_ext,
etas = eta_list,
alpha = 1, # Lasso penalty
nlambda = 100
)
## Coefficient matrix for the first eta value
fit.cox_indi_enet$beta[[1]]
} # }
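As a follow-up, the documented return structure (lists beta and lambda, one element per eta, with coefficient columns named by lambda) suggests a simple way to pull out one coefficient vector. The sketch below runs on a mock object with made-up values rather than a real fit. The helper pick_coef is hypothetical, and it indexes the lists by position because the page does not state how the list elements are named.

```r
# Mock object mimicking the documented structure (values are made up)
lambdas <- c(0.5, 0.1, 0.02)
mock_beta <- matrix(c(0,   0,    0,
                      0.3, 0,    0,
                      0.4, -0.1, 0),
                    nrow = 3,
                    dimnames = list(c("x1", "x2", "x3"),
                                    as.character(lambdas)))
fit <- list(eta    = c(0, 1),
            beta   = list(mock_beta, mock_beta),
            lambda = list(lambdas, lambdas))

# Hypothetical helper: coefficients at the lambda closest to a target value
pick_coef <- function(fit, eta_index, lambda_target) {
  lam <- fit$lambda[[eta_index]]
  j <- which.min(abs(lam - lambda_target))
  fit$beta[[eta_index]][, j]
}

b <- pick_coef(fit, eta_index = 2, lambda_target = 0.1)
names(b)[b != 0]   # variables selected at this (eta, lambda) pair
```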