
Notations and Model Setup

Suppose the time-to-event data arise from $S$ distinct strata, representing heterogeneous sources or sampling blocks, where stratum $s$ contains $n_s$ subjects and the total sample size is $N = \sum_{s=1}^S n_s$. For subject $i$ in stratum $s$, let $T_i^{(s)}$ and $C_i^{(s)}$ denote the event and censoring times, respectively. Each subject is associated with $p$-dimensional covariates $\mathbf{Z}_i^{(s)} \in \mathbb{R}^p$. We assume that $T_i^{(s)}$ and $C_i^{(s)}$ are independent conditional on $\mathbf{Z}_i^{(s)}$. We define the observed time $X_i^{(s)} = \min\{T_i^{(s)}, C_i^{(s)}\}$ and the event indicator $\delta_i^{(s)} = \mathbb{I}(T_i^{(s)} \le C_i^{(s)})$.
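As a small illustration of the observed-data construction above, the following sketch builds $X_i^{(s)}$ and $\delta_i^{(s)}$ from latent event and censoring times. The function name `observe` and the toy values are assumptions made for illustration only; in real data the latent times are of course not both available.

```python
import numpy as np

def observe(event_time, censor_time):
    """Return (X, delta) with X = min(T, C) and delta = I(T <= C)."""
    event_time = np.asarray(event_time, dtype=float)
    censor_time = np.asarray(censor_time, dtype=float)
    observed = np.minimum(event_time, censor_time)
    delta = (event_time <= censor_time).astype(int)
    return observed, delta

# Hypothetical latent times for four subjects in one stratum.
T = np.array([2.0, 5.0, 1.5, 4.0])
C = np.array([3.0, 4.0, 1.5, 6.0])
X, delta = observe(T, C)
print(X, delta)
```

The third subject has $T = C$, which counts as an event because the indicator uses a non-strict inequality.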

Consider the following stratified Cox proportional hazards model:

$$
\lambda^{(s)}\!\left(t \mid \mathbf{Z}_i^{(s)}\right) = \lambda_0^{(s)}(t)\exp\!\left\{ r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta}) \right\},
$$

where $\lambda_0^{(s)}(t)$ is an unspecified stratum-specific baseline hazard function, treated as an infinite-dimensional nuisance parameter that absorbs between-stratum heterogeneity arising from differences in source populations, clinical practice, or unmeasured confounding, and $\boldsymbol{\beta} \in \mathbb{R}^p$ is a common regression parameter shared across all strata. Here $r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta})$ denotes the internal risk score; under the standard linear specification, $r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta}) = {\mathbf{Z}_i^{(s)}}^\top \boldsymbol{\beta}$.

Assume that, in stratum $s$, the observed cohort has $K^{(s)}$ unique failure times $t_1^{(s)} < t_2^{(s)} < \cdots < t_{K^{(s)}}^{(s)}$. The stratified Cox log-partial likelihood is given by

$$
\ell(\boldsymbol{\beta}) = \sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \sum_{i=1}^{n_s} \delta_i^{(s)}\!\left(t_k^{(s)}\right) \left[ r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta}) - \log \left\{ \sum_{j=1}^{n_s} Y_j^{(s)}\!\left(t_k^{(s)}\right) \exp\!\left\{ r(\mathbf{Z}_j^{(s)}, \boldsymbol{\beta}) \right\} \right\} \right],
$$

where $Y_j^{(s)}(t) = \mathbb{I}\!\left(X_j^{(s)} \ge t\right)$ is the at-risk indicator in stratum $s$, and $\delta_i^{(s)}(t) = \mathbb{I}\!\left(X_i^{(s)} = t,\, \delta_i^{(s)} = 1\right)$ indicates whether subject $i$ in stratum $s$ fails at time $t$.
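The log-partial likelihood above can be evaluated with a direct double loop over strata and failure times. The sketch below assumes the linear risk score $r(\mathbf{Z}, \boldsymbol{\beta}) = \mathbf{Z}^\top \boldsymbol{\beta}$; the function name and the toy data are hypothetical, and the code favours readability over speed.

```python
import numpy as np

def stratified_log_partial_likelihood(beta, Z, X, delta, strata):
    """Evaluate the stratified Cox log-partial likelihood for a linear risk score."""
    Z = np.asarray(Z, dtype=float)
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    strata = np.asarray(strata)
    risk = Z @ np.asarray(beta, dtype=float)  # r(Z_i, beta) = Z_i' beta
    total = 0.0
    for s in np.unique(strata):
        in_s = strata == s
        r_s, x_s, d_s = risk[in_s], X[in_s], delta[in_s]
        # Loop over the unique failure times t_k within this stratum only.
        for t in np.unique(x_s[d_s == 1]):
            at_risk = x_s >= t                  # Y_j(t)
            failing = (x_s == t) & (d_s == 1)   # delta_i(t)
            log_denom = np.log(np.sum(np.exp(r_s[at_risk])))
            total += np.sum(r_s[failing] - log_denom)
    return total

# Hypothetical data: two strata, one covariate.
Z = np.array([[0.5], [-1.0], [0.2], [1.0], [0.0]])
X = np.array([1.0, 2.0, 3.0, 1.0, 2.0])
delta = np.array([1, 0, 1, 1, 1])
strata = np.array([0, 0, 0, 1, 1])

# At beta = 0 every term reduces to -log(size of the risk set),
# so this value equals -log(3) - log(1) - log(2) - log(1).
ll_null = stratified_log_partial_likelihood([0.0], Z, X, delta, strata)
ll_beta = stratified_log_partial_likelihood([0.3], Z, X, delta, strata)
print(ll_null, ll_beta)
```

Risk sets are formed within each stratum, so subjects in different strata are never compared, which is what lets the stratum-specific baseline hazards drop out.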

KL Divergence Formulation

Let $G$ be chosen as the negative entropy so that the resulting Bregman divergence reduces to the Kullback–Leibler divergence. To construct the probabilistic framework, let $\mathcal{R}_k^{(s)}$ denote the at-risk set at $t_k^{(s)}$ in stratum $s$, let $A_k^{(s)}(i)$ denote the event that subject $i \in \mathcal{R}_k^{(s)}$ fails in the interval $[t_k^{(s)},\, t_k^{(s)} + dt_k^{(s)})$, and let $B_k^{(s)}$ collect all failure and censoring information up to time ${t_k^{(s)}}^{-}$, together with the information that exactly one failure occurs in $[t_k^{(s)},\, t_k^{(s)} + dt_k^{(s)})$.

Then $\{A_k^{(s)}(i) \mid B_k^{(s)} : k=1,\ldots,K^{(s)},\, s=1,\ldots,S\}$ defines a sequence of conditional experiments. At each event time $t_k^{(s)}$ in stratum $s$, the internal working model specifies the conditional density as

$$
\mathrm{Multinomial}\!\left(1,\, \mathbf{q}_k^{(s)}\right).
$$

The stratum-specific probability mass assigned to subject $i \in \mathcal{R}_k^{(s)}$ under the internal model is

$$
\mathbf{q}_k^{(s)}(i) := \mathcal{P}\!\left\{A_k^{(s)}(i) \,\middle|\, B_k^{(s)}\right\} = \frac{ \exp\!\left\{ r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta}) \right\} }{ \sum_{j=1}^{n_s} Y_j^{(s)}(t_k^{(s)}) \exp\!\left\{ r(\mathbf{Z}_j^{(s)}, \boldsymbol{\beta}) \right\} },
$$

where the stratum-specific baseline hazard $\lambda_0^{(s)}(t_k^{(s)})$ appears in both the numerator and the denominator of the ratio and therefore cancels.

To extract information from the external model, we replace the internal risk score with the external risk score $\tilde{r}(\cdot)$, obtained by applying the external coefficient estimates $\tilde{\boldsymbol{\beta}}$ to the internal cohort. The corresponding probability mass under the external model is

$$
\mathbf{p}_k^{(s)}(i) := \mathcal{P}_{\mathrm{ext}}\!\left\{A_k^{(s)}(i) \,\middle|\, B_k^{(s)}\right\} = \frac{ \exp\!\left\{\tilde{r}(\mathbf{Z}_i^{(s)})\right\} }{ \sum_{j=1}^{n_s} Y_j^{(s)}(t_k^{(s)}) \exp\!\left\{\tilde{r}(\mathbf{Z}_j^{(s)})\right\} }.
$$

The KL divergence between the external and internal conditional experiments at time $t_k^{(s)}$ is

$$
\mathbf{d}_{KL}\!\left(\mathbf{p}_k^{(s)} \,\|\, \mathbf{q}_k^{(s)}\right) = \sum_{i \in \mathcal{R}_k^{(s)}} \mathbf{p}_k^{(s)}(i) \log\frac{\mathbf{p}_k^{(s)}(i)}{\mathbf{q}_k^{(s)}(i)}.
$$
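To make the single-event-time divergence concrete, the sketch below computes the internal and external multinomial probabilities over one risk set and the resulting KL divergence. The helper names and the toy scores are assumptions for illustration; the scores are passed in directly, so no particular form of $r$ or $\tilde{r}$ is presumed.

```python
import numpy as np

def risk_set_probs(score, at_risk):
    """Softmax of the risk scores restricted to the at-risk subjects."""
    score = np.asarray(score, dtype=float)
    at_risk = np.asarray(at_risk, dtype=bool)
    weights = np.zeros_like(score)
    # Subtracting the maximum leaves the ratio unchanged and avoids overflow.
    weights[at_risk] = np.exp(score[at_risk] - score[at_risk].max())
    return weights / weights.sum()

def kl_at_event_time(r_int, r_ext, at_risk):
    """KL(p || q) with p from the external scores and q from the internal scores."""
    at_risk = np.asarray(at_risk, dtype=bool)
    q = risk_set_probs(r_int, at_risk)[at_risk]
    p = risk_set_probs(r_ext, at_risk)[at_risk]
    return float(np.sum(p * np.log(p / q)))

# Hypothetical scores for three subjects; the third is no longer at risk.
r_int = np.array([0.0, 0.0, 0.0])
r_ext = np.array([np.log(2.0), 0.0, 5.0])
at_risk = np.array([True, True, False])

# Here p = (2/3, 1/3) and q = (1/2, 1/2) on the risk set.
kl = kl_at_event_time(r_int, r_ext, at_risk)
kl_same = kl_at_event_time(r_ext, r_ext, at_risk)
print(kl, kl_same)
```

The divergence is zero when the two risk scores agree on the risk set up to an additive constant, since such a constant cancels in the ratio.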

Accumulating over all strata and failure times yields the total divergence:

$$
\mathcal{D}_{\mathrm{KL}}(\mathbf{P} \parallel \mathbf{Q}) = \sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \mathbf{d}_{KL}\!\left(\mathbf{p}_k^{(s)} \,\|\, \mathbf{q}_k^{(s)}\right),
$$

which, after substituting the Cox-model expressions, simplifies to

$$
\mathcal{D}_{\mathrm{KL}}(\mathbf{P} \parallel \mathbf{Q}) = -\sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \sum_{i=1}^{n_s} \frac{ Y_i^{(s)}(t_k^{(s)}) \exp\!\left\{\tilde{r}(\mathbf{Z}_i^{(s)})\right\} }{ \sum_{j=1}^{n_s} Y_j^{(s)}(t_k^{(s)}) \exp\!\left\{\tilde{r}(\mathbf{Z}_j^{(s)})\right\} } \left[ r(\mathbf{Z}_i^{(s)},\boldsymbol{\beta}) - \log \left\{ \sum_{j=1}^{n_s} Y_j^{(s)}(t_k^{(s)}) \exp\!\left\{r(\mathbf{Z}_j^{(s)},\boldsymbol{\beta})\right\} \right\} \right] + \Psi,
$$

where $\Psi = \sum_{s,k} \Psi_k^{(s)}$ does not involve $\boldsymbol{\beta}$.
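The simplification can be traced by splitting the logarithm at a single event time. The steps below are a sketch; reading $\Psi_k^{(s)}$ as the negative entropy of $\mathbf{p}_k^{(s)}$ is an inference from this decomposition, not a definition stated in the text.

$$
\begin{aligned}
\mathbf{d}_{KL}\!\left(\mathbf{p}_k^{(s)} \,\|\, \mathbf{q}_k^{(s)}\right)
&= \underbrace{\sum_{i \in \mathcal{R}_k^{(s)}} \mathbf{p}_k^{(s)}(i) \log \mathbf{p}_k^{(s)}(i)}_{\Psi_k^{(s)}}
 - \sum_{i \in \mathcal{R}_k^{(s)}} \mathbf{p}_k^{(s)}(i) \log \mathbf{q}_k^{(s)}(i) \\
&= \Psi_k^{(s)} - \sum_{i \in \mathcal{R}_k^{(s)}} \mathbf{p}_k^{(s)}(i) \left[ r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta}) - \log \left\{ \sum_{j=1}^{n_s} Y_j^{(s)}(t_k^{(s)}) \exp\!\left\{ r(\mathbf{Z}_j^{(s)}, \boldsymbol{\beta}) \right\} \right\} \right] \\
&= \Psi_k^{(s)} - \sum_{i \in \mathcal{R}_k^{(s)}} \mathbf{p}_k^{(s)}(i)\, r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta}) + \log \left\{ \sum_{j=1}^{n_s} Y_j^{(s)}(t_k^{(s)}) \exp\!\left\{ r(\mathbf{Z}_j^{(s)}, \boldsymbol{\beta}) \right\} \right\}.
\end{aligned}
$$

The second line substitutes $\log \mathbf{q}_k^{(s)}(i)$ from the internal model, and the third uses $\sum_{i \in \mathcal{R}_k^{(s)}} \mathbf{p}_k^{(s)}(i) = 1$ to pull the log-sum term out of the weighted sum. Because $\mathbf{p}_k^{(s)}$ depends only on the external risk score, $\Psi_k^{(s)}$ is free of $\boldsymbol{\beta}$.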

Integrated Objective Function

Proposition. Under the above construction, the integrated objective function in the stratified Cox model satisfies

$$
Q_{\eta}(\boldsymbol{\beta}) = -\ell(\boldsymbol{\beta}) + \eta \, \mathcal{D}_{\mathrm{KL}}(\mathbf{P} \parallel \mathbf{Q}) \propto -\sum_{s=1}^{S} \sum_{i=1}^{n_s} \left\{ \frac{\delta_i^{(s)} + \eta \tilde{\delta}_i^{(s)}}{1 + \eta} \cdot r(\mathbf{Z}_i^{(s)}, \boldsymbol{\beta}) - \delta_i^{(s)} \log \left[\sum_{j=1}^{n_s} Y_j^{(s)}(X_i^{(s)}) \exp\left(r(\mathbf{Z}_j^{(s)}, \boldsymbol{\beta}) \right) \right] \right\},
$$

where the externally induced pseudo-event weight is defined as

$$
\tilde{\delta}_i^{(s)} = \sum_{k=1}^{K^{(s)}} \frac{ Y_{i}^{(s)}(t_k^{(s)}) \exp\{ \tilde{r}(\mathbf{Z}_{i}^{(s)}) \} }{ \sum_{j=1}^{n_s} Y_{j}^{(s)}(t_k^{(s)}) \exp\{ \tilde{r}(\mathbf{Z}_{j}^{(s)}) \} },
$$

$\ell(\boldsymbol{\beta})$ is the internal stratified Cox log-partial likelihood defined above, and $\eta \ge 0$ is the integration weight.
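The proposition can be checked numerically on a toy data set. The sketch below assumes a linear risk score, no tied failure times within a stratum, and hypothetical data and function names. Under those assumptions the directly computed $-\ell(\boldsymbol{\beta}) + \eta\,(\mathcal{D}_{\mathrm{KL}} - \Psi)$ should equal $(1+\eta)$ times the weighted form on the right-hand side.

```python
import numpy as np

def pseudo_event_weights(r_ext, X, delta, strata):
    """Accumulate each subject's external risk-set probability over failure times."""
    tilde = np.zeros(len(X))
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        x_s, d_s = X[idx], delta[idx]
        w = np.exp(r_ext[idx] - r_ext[idx].max())
        for t in np.unique(x_s[d_s == 1]):
            at_risk = x_s >= t
            tilde[idx[at_risk]] += w[at_risk] / w[at_risk].sum()
    return tilde

def weighted_objective(beta, Z, X, delta, strata, tilde, eta):
    """Right-hand side of the proposition (without the factor 1 + eta)."""
    r = Z @ beta
    total = 0.0
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        x_s, d_s, r_s = X[idx], delta[idx], r[idx]
        mixed = (d_s + eta * tilde[idx]) / (1.0 + eta)
        total += np.sum(mixed * r_s)
        for i in np.flatnonzero(d_s == 1):
            total -= np.log(np.sum(np.exp(r_s[x_s >= x_s[i]])))
    return -total

def direct_objective(beta, r_ext, Z, X, delta, strata, eta):
    """-loglik + eta * (KL minus the beta-free constant), event time by event time."""
    r = Z @ beta
    total = 0.0
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        x_s, d_s, r_s, e_s = X[idx], delta[idx], r[idx], r_ext[idx]
        for t in np.unique(x_s[d_s == 1]):
            at_risk = x_s >= t
            log_q = r_s[at_risk] - np.log(np.sum(np.exp(r_s[at_risk])))
            p = np.exp(e_s[at_risk])
            p = p / p.sum()
            fails = ((x_s == t) & (d_s == 1))[at_risk]
            total -= np.sum(log_q[fails])     # contribution to -loglik
            total -= eta * np.sum(p * log_q)  # contribution to eta * (KL - Psi)
    return total

# Hypothetical data: two strata, one covariate, no tied failure times.
Z = np.array([[0.5], [-1.0], [0.2], [1.0], [0.0]])
X = np.array([1.0, 2.0, 3.0, 1.0, 2.0])
delta = np.array([1, 0, 1, 1, 1])
strata = np.array([0, 0, 0, 1, 1])
beta, beta_ext, eta = np.array([0.3]), np.array([-0.4]), 0.5

r_ext = Z @ beta_ext  # external coefficients applied to the internal cohort
tilde = pseudo_event_weights(r_ext, X, delta, strata)
lhs = direct_objective(beta, r_ext, Z, X, delta, strata, eta)
rhs = (1.0 + eta) * weighted_objective(beta, Z, X, delta, strata, tilde, eta)
print(tilde, lhs, rhs)
```

Within each stratum the pseudo-event weights sum to the number of unique failure times, because the external probabilities sum to one at every event time. The weighted form therefore redistributes one unit of pseudo-event mass per failure across the risk set, which is why it resembles a Cox objective with fractional event indicators.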