
The survkl package implements a transfer-learning procedure that integrates external summary information with newly collected time-to-event data under a Cox proportional hazards model. This vignette summarizes the underlying methodology: the internal Cox model, the external summary information, the partial likelihood-based Kullback–Leibler (KL) transfer-learning objective, and the regularized extension for high-dimensional data.

Cox Proportional Hazards Model for the Target Cohort

Let $D_i$ denote the death time and $C_i$ the censoring time for patient $i$, $i = 1, \ldots, n$, where $n$ is the total sample size of the target (internal) cohort. The observed survival time is $T_i = \min\{D_i, C_i\}$, and the death indicator is $\delta_i = \mathbb{I}(D_i \le C_i)$. Let $Z_i = (Z_{i1}, \ldots, Z_{ip})^\top$ be a $p$-dimensional covariate vector for the $i$-th patient. We assume that, conditional on $Z_i$, $D_i$ is independently censored by $C_i$. Consider the Cox proportional hazards model

$$ \lambda(t \mid Z_i) = \lambda_0(t)\,\exp\{g(Z_i, \beta)\}, $$

where $\lambda_0(t)$ is an unspecified baseline hazard function, $g(Z_i, \beta)$ specifies the log-relative-risk relationship between the covariates $Z_i$ and the hazard function, and $\beta \in \mathbb{R}^p$ is a vector of regression parameters. Under the standard linear specification, $g(Z_i, \beta) = Z_i^\top \beta$. The log-partial likelihood is given by

$$ \ell(\beta) = \sum_{i=1}^{n} \delta_i \left[ g(Z_i, \beta) - \log\left\{ \sum_{l=1}^{n} Y_l(T_i)\,\exp\{g(Z_l, \beta)\} \right\} \right], $$

where $Y_l(T_i) = \mathbb{I}(T_l \ge T_i)$ is the at-risk indicator.
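As a rough illustration of the log-partial likelihood above, the following Python sketch evaluates it under the linear specification. It is not the survkl implementation; the function name `log_partial_likelihood` and the toy data are assumptions made for this example, and the toy data contain no tied event times.

```python
import math

def log_partial_likelihood(times, events, Z, beta):
    """Cox log-partial likelihood with g(Z_i, beta) = Z_i^T beta (illustrative sketch)."""
    # Linear predictor g(Z_i, beta) for every subject.
    lin_pred = [sum(z_j * b_j for z_j, b_j in zip(z, beta)) for z in Z]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects enter only through the risk sets
        # Risk set at T_i: subjects l with Y_l(T_i) = I(T_l >= T_i) = 1.
        risk_sum = sum(math.exp(lin_pred[l])
                       for l, t_l in enumerate(times) if t_l >= t_i)
        total += lin_pred[i] - math.log(risk_sum)
    return total

# Hypothetical toy cohort: four subjects, one covariate, one censored subject.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
Z = [[1.0], [0.0], [1.0], [0.0]]

# At beta = 0 every subject has relative risk 1, so each event contributes
# minus the log of its risk-set size (4, 2 and 1 here).
value_at_zero = log_partial_likelihood(times, events, Z, [0.0])
```

At $\beta = 0$ the value reduces to $-\log 4 - \log 2 - \log 1 = -\log 8$, which gives a quick sanity check on the risk-set bookkeeping.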

External Summary Information

To account for privacy constraints, we consider scenarios where only external summary information is available, rather than individual-level external data. For example, suppose the estimated coefficients $\tilde{\beta}$ are available from a published Cox model; a risk score can then be computed as $\tilde{g}(Z_i) = Z_i^\top \tilde{\beta}$ for the $i$-th subject in the target cohort. The proposed transfer-learning procedure is flexible and can incorporate various forms of external summary information, including estimated risk scores from machine-learning algorithms and clinically derived risk groupings.
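A minimal sketch of the published-coefficient case: given external coefficients, the risk score for each target-cohort subject is simply the inner product with that subject's covariates. The coefficient values and the helper name `external_risk_scores` below are hypothetical and chosen only for illustration.

```python
def external_risk_scores(Z, beta_tilde):
    """Risk scores g~(Z_i) = Z_i^T beta~ for each row of Z (illustrative sketch)."""
    return [sum(z_j * b_j for z_j, b_j in zip(z, beta_tilde)) for z in Z]

# Hypothetical external coefficients, e.g. read off a published Cox model.
beta_tilde = [0.5, -0.2]

# Hypothetical target-cohort covariates (three subjects, two covariates).
Z = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]

scores = external_risk_scores(Z, beta_tilde)
```

Only the covariates of the target cohort and the external coefficients are needed; no individual-level external data enter the calculation.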

Partial Likelihood-Based Transfer Learning

To extract information from external risk scores, we formulate the censored time-to-event data as a dynamic ranking problem. Specifically, suppose the internal cohort comprises $K$ unique failure times $t_1 < \cdots < t_K$. Let $A_k$ specify that individual $k$ fails in $[t_k, t_k + dt_k)$, and let $B_k$ specify all the censoring and failure information up to time $t_k^{-}$, together with the information that one failure occurs in $[t_k, t_k + dt_k)$. Based on the external risk scores, the conditional density of $A_k$ given $B_k$ is

$$ \tilde{f}(A_k \mid B_k) = \frac{\tilde{\lambda}_0(t_k)\,\exp\{\tilde{g}(Z_k)\}\,dt_k} {\sum_{i=1}^{n} Y_i(t_k)\,\tilde{\lambda}_0(t_k)\,\exp\{\tilde{g}(Z_i)\}\,dt_k} = \frac{\exp\{\tilde{g}(Z_k)\}} {\sum_{i=1}^{n} Y_i(t_k)\,\exp\{\tilde{g}(Z_i)\}}, $$

where the second equality follows from canceling $\tilde{\lambda}_0(t_k)\,dt_k$ in the numerator and denominator. Following Wang et al. (2023), the partial likelihood-based KL divergence between the conditional densities of $A_k \mid B_k$ under the external risk scores and under the internal Cox model is given by

$$ d_{\mathrm{KL}}(\tilde{f} \parallel f;\, t_k) = \mathbb{E}_{\tilde{f}} \left[ \log\left\{ \frac{\tilde{f}(A_k \mid B_k)}{f(A_k \mid B_k)} \right\} \right], $$

where the expectation is taken with respect to the external conditional density $\tilde{f}(A_k \mid B_k)$, and $f(A_k \mid B_k)$ is the conditional density based on the internal Cox model,

$$ f(A_k \mid B_k) = \frac{\exp\{g(Z_k, \beta)\}} {\sum_{i=1}^{n} Y_i(t_k)\,\exp\{g(Z_i, \beta)\}}. $$

When $\tilde{g}(Z_k)$ is generated from clinically derived risk groupings, $\tilde{f}(A_k \mid B_k)$ does not represent a formal conditional density; instead, it can be viewed as a Plackett–Luce ranking metric, and $d_{\mathrm{KL}}(\tilde{f} \parallel f;\, t_k)$ can be interpreted as a generalized KL divergence. The accumulated KL divergence across the sequence of conditional experiments $A_1 \mid B_1, \ldots, A_K \mid B_K$ is

$$ D_{\mathrm{KL}}(\tilde{f} \parallel f) = \sum_{k=1}^{K} d_{\mathrm{KL}}(\tilde{f} \parallel f;\, t_k), $$
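The accumulated divergence can be sketched numerically as follows. The sketch reads the expectation at each failure time as a sum over the subjects still at risk, weighted by the external conditional probabilities; that reading, the function names, and the toy data are assumptions of this example rather than the package's internals.

```python
import math

def risk_set_probs(scores, times, t_k):
    """Conditional failure probabilities at t_k: exp(score) normalized over the risk set."""
    # Subjects with T_i < t_k are no longer at risk and receive probability 0.
    weights = [math.exp(s) if t >= t_k else 0.0 for s, t in zip(scores, times)]
    total = sum(weights)
    return [w / total for w in weights]

def accumulated_kl(times, events, ext_scores, int_scores):
    """Sum over unique failure times of KL(external || internal) on each risk set."""
    failure_times = sorted({t for t, d in zip(times, events) if d == 1})
    total = 0.0
    for t_k in failure_times:
        f_ext = risk_set_probs(ext_scores, times, t_k)
        f_int = risk_set_probs(int_scores, times, t_k)
        total += sum(p * math.log(p / q)
                     for p, q in zip(f_ext, f_int) if p > 0.0)
    return total

# Hypothetical toy cohort and scores.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
ext_scores = [0.1, -0.2, 1.0, 0.0]

kl_same = accumulated_kl(times, events, ext_scores, ext_scores)
kl_diff = accumulated_kl(times, events, ext_scores, [0.0, 0.0, 0.0, 0.0])
```

When the internal and external scores agree, the divergence is zero; any disagreement in the implied risk-set rankings makes it strictly positive.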

which measures the discrepancy between the external risk scores and the internal Cox model. To integrate external information while accounting for potential disparities, we combine the internal log-partial likelihood with the accumulated KL divergence by constructing the penalized objective function

$$ \ell_{\eta}(\beta) = \ell(\beta) - \eta\, D_{\mathrm{KL}}(\tilde{f} \parallel f), $$

where $\eta \ge 0$ is a tuning parameter that controls the trade-off between the internal model and the external risk scores. Setting $\eta = 0$ recovers the internal-only Cox fit, whereas larger values of $\eta$ place more weight on the external information.

Equivalent weighted form. Substituting the Cox-model expressions and noting that the unique failure times $t_1 < \cdots < t_K$ coincide with the observed internal event times, the integrated objective admits the equivalent weighted partial-likelihood form

$$ \ell_{\eta}(\beta) \;\propto\; \sum_{i=1}^{n} \left\{ \frac{\delta_i + \eta\, \tilde{\delta}_i}{1 + \eta}\, g(Z_i, \beta) - \delta_i \log\left[ \sum_{l=1}^{n} Y_l(T_i)\,\exp\{g(Z_l, \beta)\} \right] \right\}, $$

where the externally induced pseudo-event weight is defined as

$$ \tilde{\delta}_i = \sum_{k=1}^{K} \frac{Y_i(t_k)\,\exp\{\tilde{g}(Z_i)\}} {\sum_{j=1}^{n} Y_j(t_k)\,\exp\{\tilde{g}(Z_j)\}}. $$

This representation shows that the external information enters the internal partial likelihood by augmenting each subject’s observed event indicator $\delta_i$ with a fractional pseudo-event weight $\tilde{\delta}_i$ derived from the external risk scores, with $\eta$ governing the relative contribution of the two sources.
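The equivalence can be checked numerically. The sketch below computes the pseudo-event weights, evaluates both the direct objective (log-partial likelihood minus $\eta$ times the accumulated KL divergence) and the weighted form, and compares their differences between two values of $\beta$, so that the $\beta$-free additive constant and the $1/(1+\eta)$ scaling drop out. All function names, the toy data, and the choice $\eta = 0.5$ are assumptions for illustration; the data have no tied event times.

```python
import math

def lin_pred(Z, beta):
    return [sum(z_j * b_j for z_j, b_j in zip(z, beta)) for z in Z]

def failure_times(times, events):
    return sorted({t for t, d in zip(times, events) if d == 1})

def risk_probs(scores, times, t_k):
    w = [math.exp(s) if t >= t_k else 0.0 for s, t in zip(scores, times)]
    total = sum(w)
    return [x / total for x in w]

def log_risk_sum(times, g, t_i):
    return math.log(sum(math.exp(g[l]) for l, t_l in enumerate(times) if t_l >= t_i))

def log_pl(times, events, g):
    return sum(g[i] - log_risk_sum(times, g, t_i)
               for i, (t_i, d_i) in enumerate(zip(times, events)) if d_i == 1)

def accumulated_kl(times, events, g_ext, g):
    total = 0.0
    for t_k in failure_times(times, events):
        p, q = risk_probs(g_ext, times, t_k), risk_probs(g, times, t_k)
        total += sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)
    return total

def pseudo_event_weights(times, events, g_ext):
    # delta~_i: external conditional failure probabilities summed over failure times.
    delta_tilde = [0.0] * len(times)
    for t_k in failure_times(times, events):
        for i, p in enumerate(risk_probs(g_ext, times, t_k)):
            delta_tilde[i] += p
    return delta_tilde

def weighted_objective(times, events, g, delta_tilde, eta):
    out = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        out += (d_i + eta * delta_tilde[i]) / (1.0 + eta) * g[i]
        if d_i == 1:
            out -= log_risk_sum(times, g, t_i)
    return out

# Hypothetical toy cohort, external scores, and tuning parameter.
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 0, 1, 1, 0]
Z = [[1.0], [0.0], [1.0], [0.0], [0.5]]
g_ext = [0.4, -0.3, 0.9, 0.0, 0.2]
eta = 0.5
delta_tilde = pseudo_event_weights(times, events, g_ext)

def direct(beta):
    g = lin_pred(Z, beta)
    return log_pl(times, events, g) - eta * accumulated_kl(times, events, g_ext, g)

def weighted(beta):
    return weighted_objective(times, events, lin_pred(Z, beta), delta_tilde, eta)

b1, b2 = [0.3], [-0.7]
lhs = (direct(b1) - direct(b2)) / (1.0 + eta)
rhs = weighted(b1) - weighted(b2)
```

Two properties are worth noting in this sketch: the pseudo-event weights sum to the number of unique failure times, and `lhs` and `rhs` agree up to floating-point error, so both forms share the same maximizer in $\beta$.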

Regularization for High-Dimensional Data

For high-dimensional applications, where the number of covariates $p$ may be large relative to the sample size $n$, we extend the integrated objective by adding a regularization term. The resulting objective function enables simultaneous variable selection and parameter estimation:

$$ \ell_{\eta, \lambda}(\beta) = \ell_{\eta}(\beta) - \lambda\, P(\beta), $$

where $P(\beta)$ is a penalty function and $\lambda \ge 0$ is a tuning parameter controlling its strength. The package supports the following choices of $P(\beta)$:

  • Ridge (Hoerl and Kennard, 1970): $P(\beta) = \tfrac{1}{2}\,\|\beta\|_2^2 = \tfrac{1}{2}\sum_{j=1}^{p} \beta_j^2$, which shrinks coefficients toward zero and stabilizes estimation under collinearity.

  • LASSO (Tibshirani, 1997): $P(\beta) = \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$, which produces sparse solutions by setting some coefficients exactly to zero.

  • Elastic Net (Simon et al., 2011): $P(\beta) = \alpha\,\|\beta\|_1 + \tfrac{1}{2}(1 - \alpha)\,\|\beta\|_2^2 = \sum_{j=1}^{p}\left[ \alpha\,|\beta_j| + \tfrac{1}{2}(1 - \alpha)\,\beta_j^2 \right]$, where $\alpha \in [0, 1]$ is a mixing parameter that blends the LASSO and ridge penalties; $\alpha = 1$ reduces to the LASSO and $\alpha = 0$ to ridge.
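The three penalties listed above are simple enough to write out directly. The Python sketch below is only meant to make the special cases concrete; the function names and the example coefficient vector are hypothetical.

```python
def ridge_penalty(beta):
    """P(beta) = 0.5 * sum_j beta_j^2."""
    return 0.5 * sum(b * b for b in beta)

def lasso_penalty(beta):
    """P(beta) = sum_j |beta_j|."""
    return sum(abs(b) for b in beta)

def elastic_net_penalty(beta, alpha):
    """P(beta) = alpha * ||beta||_1 + 0.5 * (1 - alpha) * ||beta||_2^2."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * lasso_penalty(beta) + (1.0 - alpha) * ridge_penalty(beta)

# Hypothetical coefficient vector.
beta = [1.0, -2.0, 0.0]

p_ridge = ridge_penalty(beta)             # 0.5 * (1 + 4 + 0)
p_lasso = lasso_penalty(beta)             # 1 + 2 + 0
p_enet = elastic_net_penalty(beta, 0.5)   # halfway blend of the two
```

Setting `alpha` to 1 or 0 in `elastic_net_penalty` reproduces the LASSO and ridge values, matching the special cases stated in the list.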

In survkl, ridge-penalized estimation is provided by coxkl_ridge, while the elastic-net family (including the LASSO as the special case $\alpha = 1$) is provided by coxkl_enet. The companion cross-validation routines cv.coxkl, cv.coxkl_ridge, and cv.coxkl_enet perform $K$-fold cross-validation (here $K$ denotes the number of folds, not the number of unique failure times) to select the integration weight $\eta$ and the regularization parameter $\lambda$, using Harrell’s C-index for discrimination and the Verweij and Van Houwelingen (V&VH) loss for overall model fit.
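For readers unfamiliar with the discrimination criterion, the following sketch computes Harrell's C-index from its textbook definition: among comparable pairs (the subject with the shorter observed time had an event), it counts the fraction in which that subject also has the higher risk score, with tied scores counted as one half. How the survkl routines compute or aggregate the index across folds is not described here, so the function name and toy data are assumptions of this example.

```python
def harrell_c_index(times, events, scores):
    """Harrell's C-index for risk scores where higher means earlier expected failure."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]

c_perfect = harrell_c_index(times, events, [3.0, 2.0, 1.0, 0.0])
c_reversed = harrell_c_index(times, events, [0.0, 1.0, 2.0, 3.0])
c_constant = harrell_c_index(times, events, [1.0, 1.0, 1.0, 1.0])
```

Scores that order subjects exactly by failure time give an index of 1, the reversed ordering gives 0, and uninformative constant scores give 0.5.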