Methods for Transfer-learning Based Integrated Cox Models
The survkl package implements a transfer-learning
procedure that integrates external summary information with newly
collected time-to-event data under a Cox proportional hazards model.
This vignette summarizes the underlying methodology: the internal Cox
model, the external summary information, the partial likelihood-based
Kullback–Leibler (KL) transfer-learning objective, and the regularized
extension for high-dimensional data.
Cox Proportional Hazards Model for the Target Cohort
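Let $T_i$ denote the death time and $C_i$ the censoring time for patient $i$, $i = 1, \ldots, n$, where $n$ is the total sample size of the target (internal) cohort. The observed survival time is $X_i = \min(T_i, C_i)$, and the death indicator is $\delta_i = I(T_i \le C_i)$. Let $Z_i$ be a $p$-dimensional covariate vector for the $i$-th patient. We assume that, conditional on $Z_i$, $T_i$ is independently censored by $C_i$. Consider the Cox proportional hazards model

$$\lambda(t \mid Z_i) = \lambda_0(t) \exp\{r(Z_i, \beta)\},$$

where $\lambda_0(t)$ is an arbitrarily unspecified baseline hazard function, $r(Z_i, \beta)$ specifies the log-relative-risk relationship between the covariates and the hazard function, and $\beta$ is a vector of regression parameters. Under the standard linear specification, $r(Z_i, \beta) = Z_i^\top \beta$. The log-partial likelihood is given by

$$\ell(\beta) = \sum_{i=1}^{n} \delta_i \left[ r(Z_i, \beta) - \log\left\{ \sum_{j=1}^{n} Y_j(X_i) \exp\{r(Z_j, \beta)\} \right\} \right],$$

where $Y_j(t) = I(X_j \ge t)$ is the at-risk indicator.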
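As a concrete illustration of the log-partial likelihood under the linear specification, the following is a minimal sketch in plain Python. It is not the package's implementation: the function name `cox_log_partial_likelihood` and the toy data are hypothetical, and the sketch assumes there are no tied event times.

```python
import math

def cox_log_partial_likelihood(times, events, covariates, beta):
    """Log-partial likelihood of a linear Cox model (assumes no tied event times)."""
    # Linear predictor (covariate vector times beta) for every subject.
    lin_pred = [sum(z * b for z, b in zip(row, beta)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects enter only through the risk sets
        # Risk set at t_i: every subject whose observed time is >= t_i.
        risk_sum = sum(math.exp(lp) for t_j, lp in zip(times, lin_pred) if t_j >= t_i)
        total += lin_pred[i] - math.log(risk_sum)
    return total

# Hypothetical toy data: three subjects, one covariate, one censored subject.
times = [2.0, 5.0, 3.0]
events = [1, 0, 1]
covariates = [[1.0], [0.0], [2.0]]
print(cox_log_partial_likelihood(times, events, covariates, [0.0]))
```

With all coefficients set to zero, each event contributes minus the log of its risk-set size, so the toy value is $-\log 3 - \log 2 = -\log 6$.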
External Summary Information
To account for privacy constraints, we consider scenarios where only external summary information is available, rather than individual-level external data. For example, suppose the estimated coefficients $\tilde\beta$ are available from a published Cox model; a risk score can then be computed as $\tilde r_i = Z_i^\top \tilde\beta$ for the $i$-th subject in the target cohort, where $Z_i$ is that subject's covariate vector. The proposed transfer-learning procedure is flexible and can incorporate various forms of external summary information, including estimated risk scores from machine-learning algorithms and clinically derived risk groupings.
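The published-coefficient case can be sketched in a few lines of plain Python. The helper `external_risk_scores` and the numbers are hypothetical, and the sketch assumes that the published model uses covariates that are measured in the target cohort and stored in the same order as the coefficients.

```python
def external_risk_scores(covariates, beta_ext):
    """Risk score per target-cohort subject: inner product of covariates and external coefficients."""
    return [sum(z * b for z, b in zip(row, beta_ext)) for row in covariates]

# Hypothetical target-cohort covariates (two subjects, two covariates)
# and hypothetical published coefficients.
covariates = [[1.0, 2.0], [0.0, 1.0]]
beta_ext = [0.5, -1.0]
scores = external_risk_scores(covariates, beta_ext)
print(scores)
```

Only the coefficients cross the privacy boundary here; no individual-level external data are needed to compute the scores.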
Partial Likelihood-Based Transfer Learning
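To extract information from external risk scores, we formulate the censored time-to-event data as a dynamic ranking problem. Specifically, suppose the internal cohort comprises $D$ unique failure times $t_1 < \cdots < t_D$. Let $A_k(i)$ specify that individual $i$ fails in $[t_k, t_k + dt_k)$, and let $B_k$ specify all the censoring and failure information up to time $t_k^-$, together with the information that one failure occurs in $[t_k, t_k + dt_k)$. Based on the external risk scores $\tilde r_i$, the conditional density of $A_k(i)$ given $B_k$ is

$$\tilde f(A_k(i) \mid B_k) = \frac{Y_i(t_k)\, \tilde\lambda_0(t_k) \exp(\tilde r_i)\, dt_k}{\sum_{j=1}^{n} Y_j(t_k)\, \tilde\lambda_0(t_k) \exp(\tilde r_j)\, dt_k} = \frac{Y_i(t_k) \exp(\tilde r_i)}{\sum_{j=1}^{n} Y_j(t_k) \exp(\tilde r_j)},$$

where $Y_j(t)$ is the at-risk indicator, $\tilde\lambda_0$ is the baseline hazard of the external model, and the second equality follows from canceling $\tilde\lambda_0(t_k)\, dt_k$ in the numerator and denominator. Following Wang et al. (2023), the partial likelihood-based KL divergence between the conditional densities corresponding to the external risk scores and the internal Cox model, contained in $B_k$, is given by

$$d_{KL}(\tilde f \,\|\, f; B_k) = E_{\tilde f}\left[ \log \frac{\tilde f(A_k(i) \mid B_k)}{f(A_k(i) \mid B_k; \beta)} \right],$$

where the expectation is taken with respect to the external conditional density $\tilde f(A_k(i) \mid B_k)$, and $f(A_k(i) \mid B_k; \beta)$ is the conditional density based on the internal Cox model,

$$f(A_k(i) \mid B_k; \beta) = \frac{Y_i(t_k) \exp\{r(Z_i, \beta)\}}{\sum_{j=1}^{n} Y_j(t_k) \exp\{r(Z_j, \beta)\}}.$$

When $\tilde r_i$ is generated from clinically derived risk groupings, $\tilde f(A_k(i) \mid B_k)$ does not represent a formal conditional density; instead, it can be viewed as a Plackett–Luce ranking metric, and $d_{KL}(\tilde f \,\|\, f; B_k)$ can be interpreted as a generalized KL divergence. The accumulated KL divergence across the sequence of conditional experiments is

$$D_{KL}(\tilde f \,\|\, f) = \sum_{k=1}^{D} d_{KL}(\tilde f \,\|\, f; B_k),$$

which measures the discrepancy between the external risk scores and the internal Cox model. To integrate external information while accounting for potential disparities, we combine the internal log-partial likelihood $\ell(\beta)$ with the accumulated KL divergence by constructing the penalized objective function

$$\ell_\eta(\beta) = \ell(\beta) - \eta\, D_{KL}(\tilde f \,\|\, f),$$

where $\eta \ge 0$ is a tuning parameter that controls the trade-off between the internal model and the external risk scores. Setting $\eta = 0$ recovers the internal-only Cox fit, whereas larger values of $\eta$ place more weight on the external information.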
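The construction above can be traced numerically. The sketch below, in plain Python, computes the conditional failure probabilities implied by a set of risk scores at each failure time, accumulates the KL divergence from the external to the internal probabilities, and subtracts the weighted divergence from the internal log-partial likelihood. All function names and data are hypothetical, the internal model is represented only by its fitted scores, and tied failure times are assumed absent; this is an illustration of the formulas, not the package's code.

```python
import math

def conditional_probs(times, scores, t):
    """Probability that each subject is the one failing at t, given one failure at t."""
    # Only subjects still at risk (observed time >= t) receive positive mass.
    mass = [math.exp(s) if x >= t else 0.0 for x, s in zip(times, scores)]
    total = sum(mass)
    return [m / total for m in mass]

def accumulated_kl(times, events, ext_scores, int_scores):
    """Sum over distinct failure times of KL(external || internal)."""
    failure_times = sorted({x for x, d in zip(times, events) if d == 1})
    kl = 0.0
    for t in failure_times:
        p_ext = conditional_probs(times, ext_scores, t)
        p_int = conditional_probs(times, int_scores, t)
        # Subjects outside the risk set have zero external mass and add nothing.
        kl += sum(pe * math.log(pe / pi) for pe, pi in zip(p_ext, p_int) if pe > 0.0)
    return kl

def log_partial_likelihood(times, events, int_scores):
    """Internal log-partial likelihood, assuming no tied failure times."""
    total = 0.0
    for x_i, d_i, s_i in zip(times, events, int_scores):
        if d_i == 1:
            risk_sum = sum(math.exp(s) for x, s in zip(times, int_scores) if x >= x_i)
            total += s_i - math.log(risk_sum)
    return total

def integrated_objective(times, events, ext_scores, int_scores, eta):
    """Log-partial likelihood minus eta times the accumulated KL divergence."""
    return (log_partial_likelihood(times, events, int_scores)
            - eta * accumulated_kl(times, events, ext_scores, int_scores))

# Hypothetical toy data: four subjects, one censored.
times = [2.0, 5.0, 3.0, 7.0]
events = [1, 0, 1, 1]
ext_scores = [0.8, -0.2, 0.5, -1.0]
int_scores = [0.3, 0.1, 0.9, -0.4]
print(accumulated_kl(times, events, ext_scores, int_scores))
print(integrated_objective(times, events, ext_scores, int_scores, eta=0.5))
```

The divergence is zero when the internal and external scores agree, and a weight of zero on the divergence returns the internal log-partial likelihood unchanged.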
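Equivalent weighted form. Substituting the Cox-model expressions and noting that the unique failure times coincide with the observed internal event times, the integrated objective admits, up to an additive term that does not depend on $\beta$, the equivalent weighted partial-likelihood form

$$\ell_\eta(\beta) = \sum_{i=1}^{n} \left[ (\delta_i + \eta\, \tilde\delta_i)\, r(Z_i, \beta) - (1 + \eta)\, \delta_i \log\left\{ \sum_{j=1}^{n} Y_j(X_i) \exp\{r(Z_j, \beta)\} \right\} \right],$$

where the externally induced pseudo-event weight is defined as

$$\tilde\delta_i = \sum_{l=1}^{n} \delta_l\, \frac{Y_i(X_l) \exp(\tilde r_i)}{\sum_{j=1}^{n} Y_j(X_l) \exp(\tilde r_j)}.$$

The factor $1 + \eta$ on the log term arises because the external conditional densities sum to one over the risk set at each failure time. This representation shows that the external information enters the internal partial likelihood by augmenting each subject’s observed event indicator $\delta_i$ with a fractional pseudo-event weight $\tilde\delta_i$ derived from the external risk scores, with $\eta$ governing the relative contribution of the two sources.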
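To make the pseudo-event weights concrete, the sketch below computes them in plain Python under one reading of the definition: each subject's weight is the sum, over observed event times, of the externally implied probability that this subject is the one who fails among those still at risk. The function name and data are hypothetical, and tied event times are assumed absent.

```python
import math

def pseudo_event_weights(times, events, ext_scores):
    """Fractional pseudo-event weight per subject, derived from external risk scores."""
    n = len(times)
    weights = [0.0] * n
    for t_l, d_l in zip(times, events):
        if d_l == 0:
            continue  # only observed events define a conditional experiment
        # External relative risk of every subject still at risk at t_l.
        risk = [math.exp(s) if x >= t_l else 0.0 for x, s in zip(times, ext_scores)]
        denom = sum(risk)
        for i in range(n):
            weights[i] += risk[i] / denom
    return weights

# Hypothetical toy data with uninformative (all-zero) external scores.
times = [2.0, 5.0, 3.0]
events = [1, 0, 1]
weights = pseudo_event_weights(times, events, [0.0, 0.0, 0.0])
print(weights)
```

Because the probabilities at each event time sum to one, the weights always add up to the number of observed events. The external scores therefore redistribute the same total number of events across subjects rather than creating new ones.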
Regularization for High-Dimensional Data
For high-dimensional applications, where the number of covariates $p$ may be large relative to the sample size $n$, we extend the integrated objective $\ell_\eta(\beta)$ by adding a regularization term. The resulting objective function enables simultaneous variable selection and parameter estimation:

$$\ell_{\eta, \lambda}(\beta) = \ell_\eta(\beta) - \lambda\, P(\beta),$$

where $P(\beta)$ is a penalty function and $\lambda \ge 0$ is a tuning parameter controlling its strength. The package supports the following choices of $P(\beta)$:

- Ridge (Hoerl and Kennard, 1970): $P(\beta) = \tfrac{1}{2} \|\beta\|_2^2$, which shrinks coefficients toward zero and stabilizes estimation under collinearity.
- LASSO (Tibshirani, 1997): $P(\beta) = \|\beta\|_1$, which produces sparse solutions by setting some coefficients exactly to zero.
- Elastic Net (Simon et al., 2011): $P(\beta) = \alpha \|\beta\|_1 + \tfrac{1 - \alpha}{2} \|\beta\|_2^2$, where $\alpha \in [0, 1]$ is a mixing parameter that blends the LASSO and ridge penalties; $\alpha = 1$ reduces to the LASSO and $\alpha = 0$ to ridge.
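The relationship among the three penalties can be checked directly. The sketch below is plain Python with hypothetical function names; it assumes the common convention in which the ridge term carries a factor of one half, and the scaling used inside the package may differ.

```python
def ridge_penalty(beta):
    """Half the squared L2 norm of the coefficient vector."""
    return 0.5 * sum(b * b for b in beta)

def lasso_penalty(beta):
    """L1 norm of the coefficient vector."""
    return sum(abs(b) for b in beta)

def elastic_net_penalty(beta, alpha):
    """Convex combination of the LASSO (weight alpha) and ridge (weight 1 - alpha) penalties."""
    return alpha * lasso_penalty(beta) + (1.0 - alpha) * ridge_penalty(beta)

# Hypothetical coefficient vector.
beta = [1.0, -2.0, 0.0]
print(elastic_net_penalty(beta, 1.0))  # equals the LASSO penalty
print(elastic_net_penalty(beta, 0.0))  # equals the ridge penalty
print(elastic_net_penalty(beta, 0.5))  # halfway between the two
```

For this vector the LASSO penalty is 3.0 and the ridge penalty is 2.5, so the mixing parameter moves the elastic-net value between those two endpoints.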
In survkl, ridge-penalized estimation is provided by `coxkl_ridge`,
while the elastic-net family (including the LASSO as the special case
$\alpha = 1$) is provided by `coxkl_enet`. The companion
cross-validation routines `cv.coxkl`, `cv.coxkl_ridge`, and
`cv.coxkl_enet` perform $K$-fold cross-validation to select the
integration weight $\eta$ and the regularization parameter $\lambda$,
using Harrell’s C-index for discrimination and the V&VH loss for
overall model fit.
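For readers unfamiliar with the discrimination criterion, the following is a minimal sketch of Harrell's C-index in plain Python. It is a textbook version written for illustration, not the routine used by the cross-validation functions: the function name is hypothetical, pairs with tied observed times are skipped, and tied scores count as half concordant.

```python
def harrell_c_index(times, events, scores):
    """Fraction of comparable pairs in which the higher score belongs to the earlier failure."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    # Raises ZeroDivisionError when no pair is comparable (for example, no events).
    return concordant / comparable

# Hypothetical toy data: higher scores should correspond to earlier failures.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [3.0, 2.0, 1.0, 0.0]))
print(harrell_c_index(times, events, [0.0, 1.0, 2.0, 3.0]))
```

In a cross-validation loop, a statistic of this kind would be evaluated on each held-out fold for every candidate tuning value, and the values with the best average performance would be retained.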