
When multiple failures occur at the same recorded time point (for example with daily event recording, grouped clinical visit schedules, or discretized follow-up times), the risk-set contribution at that time is no longer the choice of a single subject, so the conditional-experiments structure and the stratum-specific probability mass functions must be modified accordingly. We consider two standard approaches for handling ties: Cox’s exact method (Cox 1972) and the Breslow approximation (Breslow 1974). Because these two approaches construct the probability mass in fundamentally different ways, we discuss them separately.

Cox’s Exact Method

In stratum $s$, suppose that at time $t_k^{(s)}$, $d_k^{(s)} = \sum_{i=1}^{n_s} \delta_i^{(s)}(t_k^{(s)}) \ge 1$ subjects fail. Let $\mathcal{R}_k^{(s)}$ denote the at-risk set with cardinality $n_k^{(s)}$, and let

$$ \mathcal{D}_k^{(s)} = \bigl\{ i \in \mathcal{R}_k^{(s)} : \delta_i^{(s)}(t_k^{(s)}) = 1 \bigr\} $$

denote the observed failure (tie) set of size $d_k^{(s)}$. Let $\mathcal{R}_k^{(s)}(d_k^{(s)})$ denote the collection of all subsets of size $d_k^{(s)}$ drawn from $\mathcal{R}_k^{(s)}$, with cardinality $c_k^{(s)} = \binom{n_k^{(s)}}{d_k^{(s)}}$.
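As a small illustration of the notation above, the following sketch enumerates the candidate failure subsets for one event time and checks their count against the binomial coefficient. The subject labels and the values of $n_k^{(s)}$ and $d_k^{(s)}$ are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical risk set at one event time (labels are illustrative only).
risk_set = ["a", "b", "c", "d", "e"]   # n_k = 5 subjects at risk
d_k = 2                                # two tied failures at this time

# R_k(d_k): every subset of size d_k drawn from the risk set.
candidate_subsets = list(combinations(risk_set, d_k))
c_k = comb(len(risk_set), d_k)         # cardinality of R_k(d_k)

print(len(candidate_subsets), c_k)

# The count grows quickly: ten ties among one hundred subjects at risk.
large_count = comb(100, 10)
print(large_count)
```

The second count shows why direct enumeration is only practical when the number of ties and the risk set are small.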

Probabilistic Framework

Under Cox’s exact method, for stratum $s$ and event time $t_k^{(s)}$, let $H \in \mathcal{R}_k^{(s)}(d_k^{(s)})$ denote a candidate failure subset of size $d_k^{(s)}$, and define the event $A_k^{(s)}(H)$ to indicate that the subjects in $H$ fail in the interval $[t_k^{(s)},\, t_k^{(s)} + dt_k^{(s)})$. Let $B_k^{(s)}$ denote all censoring and failure information up to ${t_k^{(s)}}^{-}$, together with the information that exactly $d_k^{(s)}$ failures occur in that interval. Then

$$ \bigl\{\, A_k^{(s)}(H) \mid B_k^{(s)} : H \in \mathcal{R}_k^{(s)}(d_k^{(s)}),\ k=1,\ldots,K^{(s)},\ s=1,\ldots,S \,\bigr\} $$

remains a well-defined sequence of conditional experiments. At each event time $t_k^{(s)}$, the internal working model specifies the conditional density as $\mathrm{Multinomial}(1, \mathbf{q}_k^{(s)})$. In contrast to the no-ties case where the support consists of $n_k^{(s)}$ individual subjects, under Cox’s exact method the support is given by the $c_k^{(s)}$ candidate failure subsets of size $d_k^{(s)}$.

Internal and External Probability Mass Functions

The stratum-specific probability mass at time $t_k^{(s)}$ under the internal model is

$$ \mathbf{q}_k^{(s)}(H) := \mathcal{P}\!\left\{ A_k^{(s)}(H) \mid B_k^{(s)} \right\} = \frac{ \exp\!\left\{ r_{k}(H;\boldsymbol{\beta}) \right\} }{ \sum_{H' \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \exp\!\left\{ r_{k}(H';\boldsymbol{\beta}) \right\} }, $$

where

$$ r_{k}(H;\boldsymbol{\beta}) := \sum_{j \in H} r(\mathbf{Z}_j^{(s)},\boldsymbol{\beta}) $$

denotes the sum of internal risk scores over subjects in the candidate failure subset $H$.

Replacing the internal risk scores with the external risk scores $\tilde{r}(\mathbf{Z}_i^{(s)})$, the probability mass under the external model is

$$ \mathbf{p}_k^{(s)}(H) := \mathcal{P}_{\mathrm{ext}}\!\left\{ A_k^{(s)}(H) \mid B_k^{(s)} \right\} = \frac{ \exp\!\left\{ \tilde{r}_{k}(H) \right\} }{ \sum_{H' \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \exp\!\left\{ \tilde{r}_{k}(H') \right\} }, $$

where

$$ \tilde{r}_{k}(H) := \sum_{j \in H} \tilde{r}(\mathbf{Z}_j^{(s)}) $$

denotes the sum of external risk scores over subjects in $H$.
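The two probability mass functions above are softmax distributions over candidate subsets, with each subset scored by the sum of its members' risk scores. A minimal sketch, assuming hypothetical score values and a helper name (`exact_subset_pmf`) introduced here purely for illustration:

```python
from itertools import combinations
from math import exp

def exact_subset_pmf(scores, d):
    """Map each size-d subset of subject indices to its probability mass."""
    subsets = list(combinations(range(len(scores)), d))
    # exp{r_k(H)}: exponentiated sum of risk scores over the subset H.
    weights = [exp(sum(scores[j] for j in H)) for H in subsets]
    total = sum(weights)  # sum over all candidate subsets H'
    return {H: w / total for H, w in zip(subsets, weights)}

internal_scores = [0.2, -0.1, 0.5, 0.0]   # hypothetical r(Z_j, beta)
external_scores = [0.3, 0.0, 0.4, -0.2]   # hypothetical r~(Z_j)

q = exact_subset_pmf(internal_scores, d=2)  # internal model
p = exact_subset_pmf(external_scores, d=2)  # external model

print(max(q, key=q.get), round(sum(q.values()), 12))
```

Both distributions share the same support, so they can be compared subset by subset.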

KL Divergence

The KL divergence between the external and internal models at time $t_k^{(s)}$ in stratum $s$ is

$$ \mathbf{d}_{KL}\!\left(\mathbf{p}_k^{(s)} \,\|\, \mathbf{q}_k^{(s)}\right) = \sum_{H \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \mathbf{p}_k^{(s)}(H) \log\frac{\mathbf{p}_k^{(s)}(H)}{\mathbf{q}_k^{(s)}(H)}. $$

Substituting the expressions above, dropping the term $\sum_{H} \mathbf{p}_k^{(s)}(H) \log \mathbf{p}_k^{(s)}(H)$, which does not depend on $\boldsymbol{\beta}$, and accumulating over all strata and failure times yields, up to that additive constant (written $\propto$),

$$ \mathcal{D}_{\mathrm{KL}}(\mathbf{P} \parallel \mathbf{Q}) \propto -\sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \left( \sum_{H \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \mathbf{p}_k^{(s)}(H)\, r_k(H;\boldsymbol{\beta}) \right) + \sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \log \left\{ \sum_{H' \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \exp\!\left\{ r_k(H';\boldsymbol{\beta}) \right\} \right\}. $$
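The decomposition behind the last display can be checked numerically for a single event time: the KL divergence splits into a part that does not involve $\boldsymbol{\beta}$ and the two $\boldsymbol{\beta}$-dependent terms shown above. This is a sketch with hypothetical score values; the helper names are illustrative.

```python
from itertools import combinations
from math import exp, log

def subset_scores(scores, d):
    """Sum of individual risk scores over each size-d subset."""
    return {H: sum(scores[j] for j in H)
            for H in combinations(range(len(scores)), d)}

def softmax(table):
    total = sum(exp(v) for v in table.values())
    return {H: exp(v) / total for H, v in table.items()}

r_int = subset_scores([0.2, -0.1, 0.5, 0.0], d=2)   # hypothetical r_k(H; beta)
r_ext = subset_scores([0.3, 0.0, 0.4, -0.2], d=2)   # hypothetical r~_k(H)
q, p = softmax(r_int), softmax(r_ext)

# Direct definition of the KL divergence between p and q.
kl = sum(p[H] * log(p[H] / q[H]) for H in p)

# Part that depends only on the external model (constant in beta).
beta_free = sum(p[H] * log(p[H]) for H in p)

# Part that depends on beta: minus expected score plus log normalizer.
beta_part = (-sum(p[H] * r_int[H] for H in p)
             + log(sum(exp(v) for v in r_int.values())))

print(abs(kl - (beta_free + beta_part)) < 1e-12)
```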

Integrated Objective Function

Proposition (Exact Ties). Under the stratified Cox model with exact ties, the integrated objective $Q_{\eta}(\boldsymbol{\beta}) = -\ell(\boldsymbol{\beta}) + \eta\,\mathcal{D}_{\mathrm{KL}}(\mathbf{P}\parallel\mathbf{Q})$ admits the representation

$$ Q_{\eta}(\boldsymbol{\beta}) \propto -\sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \left\{ \frac{ r_k(\mathcal{D}_k^{(s)};\boldsymbol{\beta}) + \eta \sum_{H \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \mathbf{p}_k^{(s)}(H)\,r_k(H;\boldsymbol{\beta}) }{1+\eta} - \log \left[ \sum_{H'\in\mathcal{R}_k^{(s)}(d_k^{(s)})} \exp\!\bigl(r_k(H';\boldsymbol{\beta})\bigr) \right] \right\}, $$

where $r_k(\mathcal{D}_k^{(s)};\boldsymbol{\beta}) = \sum_{j\in\mathcal{D}_k^{(s)}} r(\mathbf{Z}_j^{(s)},\boldsymbol{\beta})$.

Furthermore, under the linear specification $r(\mathbf{Z}_j^{(s)},\boldsymbol{\beta}) = {\mathbf{Z}_j^{(s)}}^\top\boldsymbol{\beta}$, define

$$ \mathbf{w}_{\mathcal{D}_k}^{(s)} = \sum_{j\in\mathcal{D}_k^{(s)}} \mathbf{Z}_j^{(s)}, \qquad \mathbf{w}_H^{(s)} = \sum_{j\in H} \mathbf{Z}_j^{(s)}, \qquad \tilde{\mathbf{w}}_{k}^{(s)} = \sum_{H\in\mathcal{R}_k^{(s)}(d_k^{(s)})} \mathbf{p}_k^{(s)}(H)\,\mathbf{w}_H^{(s)}. $$

Then the objective simplifies to

$$ Q_{\eta}(\boldsymbol{\beta}) \propto -\sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \left\{ \left( \frac{ \mathbf{w}_{\mathcal{D}_k}^{(s)} + \eta\,\tilde{\mathbf{w}}_{k}^{(s)} }{1+\eta} \right)^\top \boldsymbol{\beta} - \log \left[ \sum_{H'\in\mathcal{R}_k^{(s)}(d_k^{(s)})} \exp\!\bigl( {\mathbf{w}_{H'}^{(s)}}^\top\boldsymbol{\beta} \bigr) \right] \right\}. $$
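To make the simplified objective concrete, the sketch below evaluates the contribution of a single event time under the linear specification by brute-force enumeration. It is not the package's implementation: the function name `exact_term`, the covariate values, and the external scores are all assumptions made for illustration.

```python
from itertools import combinations
from math import exp, log

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_term(Z, failed, ext_scores, beta, eta):
    """Contribution of one event time to Q_eta under exact ties."""
    d, dim = len(failed), len(beta)
    subsets = list(combinations(range(len(Z)), d))
    # w_H: covariate sum over each candidate subset H.
    w = {H: [sum(Z[j][c] for j in H) for c in range(dim)] for H in subsets}
    # External subset probabilities p_k(H); these do not involve beta.
    ext = {H: exp(sum(ext_scores[j] for j in H)) for H in subsets}
    ext_total = sum(ext.values())
    # w~_k: external-probability-weighted average of the subset sums.
    w_tilde = [sum(ext[H] / ext_total * w[H][c] for H in subsets)
               for c in range(dim)]
    w_obs = w[tuple(sorted(failed))]          # observed tie set D_k
    blended = [(a + eta * b) / (1 + eta) for a, b in zip(w_obs, w_tilde)]
    log_denom = log(sum(exp(dot(w[H], beta)) for H in subsets))
    return -(dot(blended, beta) - log_denom)

Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # hypothetical covariates
ext_scores = [0.3, 0.0, 0.4, -0.2]                      # hypothetical r~(Z_j)
value = exact_term(Z, failed=[0, 2], ext_scores=ext_scores,
                   beta=[0.4, -0.3], eta=0.5)
print(value)
```

Setting `eta=0` recovers the negative exact partial log-likelihood term, and `w_tilde` could be computed once outside any optimization loop because it does not depend on `beta`.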

Breslow Approximation

Although Cox’s exact method provides a precise treatment of tied failure times, its computational cost grows combinatorially with the number of ties at each event time: the number of candidate failure subsets, $\binom{n_k^{(s)}}{d_k^{(s)}}$, becomes prohibitively large whenever $d_k^{(s)}$ is non-negligible relative to $n_k^{(s)}$. The Breslow approximation (Breslow 1974) addresses this limitation by replacing the exact combinatorial denominator with a computationally tractable surrogate, while retaining the same support $\mathcal{R}_k^{(s)}(d_k^{(s)})$ of all size-$d_k^{(s)}$ subsets of the risk set.

Approximation of the Combinatorial Denominator

Under Cox’s exact method, the denominator of $\mathbf{q}_k^{(s)}(H)$ involves the elementary symmetric polynomial of degree $d_k^{(s)}$ in the individual exponentiated risk scores,

$$ \sum_{H' \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \exp\!\left\{ r_k(H'; \boldsymbol{\beta}) \right\} = \sum_{H' \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \prod_{j \in H'} \exp\!\left\{ r(\mathbf{Z}_j^{(s)}, \boldsymbol{\beta}) \right\}, $$

which enumerates all possible subsets of size $d_k^{(s)}$ drawn without replacement from $\mathcal{R}_k^{(s)}$. The Breslow approximation replaces this without-replacement enumeration by treating the $d_k^{(s)}$ failures as $d_k^{(s)}$ independent draws with replacement from $\mathcal{R}_k^{(s)}$, yielding the approximation

$$ \sum_{H' \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \prod_{j \in H'} \exp\!\left\{ r(\mathbf{Z}_j^{(s)}, \boldsymbol{\beta}) \right\} \;\approx\; \left[ \sum_{l \in \mathcal{R}_k^{(s)}} \exp\!\left\{ r(\mathbf{Z}_l^{(s)}, \boldsymbol{\beta}) \right\} \right]^{d_k^{(s)}}. $$

This holds up to the multiplicative constant $d_k^{(s)}!$, which counts the orderings of each subset, does not depend on $\boldsymbol{\beta}$, and is therefore dropped. The approximation becomes increasingly accurate when $n_k^{(s)} \gg d_k^{(s)}$, since the probability of selecting the same subject twice becomes negligible. Crucially, the support of the distribution remains unchanged: both the exact and Breslow formulations are defined over all $\binom{n_k^{(s)}}{d_k^{(s)}}$ candidate failure subsets $H \in \mathcal{R}_k^{(s)}(d_k^{(s)})$. What changes is solely the denominator used to normalize the probability mass.
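The quality of the with-replacement surrogate can be examined numerically. The sketch below compares $d!$ times the exact elementary symmetric polynomial (the factor $d!$ accounts for the orderings of each subset and is free of $\boldsymbol{\beta}$) with the Breslow power sum, for a small and a larger risk set. The exponentiated scores are hypothetical values generated only for illustration.

```python
from itertools import combinations
from math import exp, factorial, prod

def elementary_symmetric(a, d):
    """Sum over all size-d subsets of the product of their entries."""
    return sum(prod(H) for H in combinations(a, d))

def ratio(n, d):
    """d! * e_d(a) divided by (sum of a)^d for n hypothetical subjects."""
    a = [exp(0.1 * ((j % 5) - 2)) for j in range(n)]  # exp{r(Z_j, beta)}
    return factorial(d) * elementary_symmetric(a, d) / sum(a) ** d

small = ratio(6, 2)    # few subjects at risk: repeats are not negligible
large = ratio(60, 2)   # many subjects at risk: repeats are rare
print(small, large)
```

The ratio stays below one because the power sum also counts draws that repeat a subject, and it moves toward one as the risk set grows relative to the number of ties.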

Internal and External Probability Mass Functions

Substituting this approximation, the Breslow approximation to the internal probability mass at time $t_k^{(s)}$ in stratum $s$ is

$$ \tilde{\mathbf{q}}_k^{(s)}(H) := \frac{ \exp\!\left\{ r_k(H; \boldsymbol{\beta}) \right\} }{ \left[ \displaystyle\sum_{l \in \mathcal{R}_k^{(s)}} \exp\!\left\{ r(\mathbf{Z}_l^{(s)}, \boldsymbol{\beta}) \right\} \right]^{d_k^{(s)}} }, $$

and correspondingly, the Breslow approximation to the external probability mass is

$$ \tilde{\mathbf{p}}_k^{(s)}(H) := \frac{ \exp\!\left\{ \tilde{r}_k(H) \right\} }{ \left[ \displaystyle\sum_{l \in \mathcal{R}_k^{(s)}} \exp\!\left\{ \tilde{r}(\mathbf{Z}_l^{(s)}) \right\} \right]^{d_k^{(s)}} }, $$

where $H \in \mathcal{R}_k^{(s)}(d_k^{(s)})$ and $\tilde{r}_k(H) = \sum_{j \in H} \tilde{r}(\mathbf{Z}_j^{(s)})$ as before. Note that $\tilde{\mathbf{q}}_k^{(s)}$ and $\tilde{\mathbf{p}}_k^{(s)}$ are not proper probability distributions over $\mathcal{R}_k^{(s)}(d_k^{(s)})$ in general, since the Breslow denominator does not equal the true normalizing constant and hence the masses do not sum to one. Nevertheless, they serve as well-defined surrogates within which the KL-divergence framework can be applied in an approximate sense.

KL Divergence and Simplified Weight

The approximate KL divergence follows the same derivation as in the exact-ties setting. Under the Breslow approximation, the combinatorial sum over all subsets $H \in \mathcal{R}_k^{(s)}(d_k^{(s)})$ collapses under the with-replacement structure. Specifically, exchanging the order of summation and noting that the marginal probability of subject $j$ appearing in any selected subset equals $d_k^{(s)}$ times its single-draw softmax probability, one obtains

$$ \sum_{H \in \mathcal{R}_k^{(s)}(d_k^{(s)})} \tilde{\mathbf{p}}_k^{(s)}(H)\,r_k(H;\boldsymbol{\beta}) = d_k^{(s)} \sum_{j \in \mathcal{R}_k^{(s)}} r(\mathbf{Z}_j^{(s)},\boldsymbol{\beta})\cdot \frac{ \exp\!\left\{\tilde{r}(\mathbf{Z}_j^{(s)})\right\} }{ \displaystyle\sum_{l \in \mathcal{R}_k^{(s)}} \exp\!\left\{\tilde{r}(\mathbf{Z}_l^{(s)})\right\} }, $$

with no combinatorial enumeration required.
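The collapse can be checked numerically under one reading of the with-replacement structure, namely that the sum runs over ordered draws of $d_k^{(s)}$ subjects with repetition allowed. That reading is an assumption adopted here for illustration, as are the score values; under it the identity holds exactly.

```python
from itertools import product
from math import exp

r_int = [0.2, -0.1, 0.5, 0.0]   # hypothetical r(Z_j, beta)
r_ext = [0.3, 0.0, 0.4, -0.2]   # hypothetical r~(Z_j)
d = 2                           # number of tied failures

total = sum(exp(v) for v in r_ext)
single_draw = [exp(v) / total for v in r_ext]   # single-draw softmax

# Left side: enumerate every ordered draw of d subjects with replacement.
lhs = 0.0
for draw in product(range(len(r_ext)), repeat=d):
    mass = exp(sum(r_ext[j] for j in draw)) / total ** d   # Breslow mass
    lhs += mass * sum(r_int[j] for j in draw)

# Right side: d times the softmax-weighted sum of individual scores.
rhs = d * sum(r * s for r, s in zip(r_int, single_draw))

print(abs(lhs - rhs) < 1e-12)
```

The right side needs only one pass over the risk set, whereas the left side enumerates $n^d$ draws.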

Integrated Objective Function

Proposition (Breslow Approximation). Under the stratified Cox model with the Breslow approximation for ties, the integrated objective $Q_{\eta}(\boldsymbol{\beta}) = -\ell(\boldsymbol{\beta}) + \eta\,\mathcal{D}_{\mathrm{KL}}(\mathbf{P}\parallel\mathbf{Q})$ under the linear specification $r(\mathbf{Z}_j^{(s)},\boldsymbol{\beta}) = {\mathbf{Z}_j^{(s)}}^\top\boldsymbol{\beta}$ admits the representation

$$ Q_{\eta}(\boldsymbol{\beta}) \propto -\sum_{s=1}^{S} \sum_{k=1}^{K^{(s)}} \left\{ \left( \frac{ \mathbf{w}_{\mathcal{D}_k}^{(s)} + \eta\,\tilde{\mathbf{w}}_k^{(s)} }{1+\eta} \right)^\top \boldsymbol{\beta} \;-\; d_k^{(s)} \log \left[ \sum_{l \in \mathcal{R}_k^{(s)}} \exp\!\left\{ {\mathbf{Z}_l^{(s)}}^\top \boldsymbol{\beta} \right\} \right] \right\}, $$

where $\mathbf{w}_{\mathcal{D}_k}^{(s)} = \sum_{j \in \mathcal{D}_k^{(s)}} \mathbf{Z}_j^{(s)}$ and

$$ \tilde{\mathbf{w}}_k^{(s)} = d_k^{(s)} \sum_{j \in \mathcal{R}_k^{(s)}} \mathbf{Z}_j^{(s)}\cdot \frac{ \exp\!\left\{\tilde{r}(\mathbf{Z}_j^{(s)})\right\} }{ \displaystyle\sum_{l \in \mathcal{R}_k^{(s)}} \exp\!\left\{\tilde{r}(\mathbf{Z}_l^{(s)})\right\} }. $$

As in the exact-ties setting, $\tilde{\mathbf{w}}_k^{(s)}$ depends only on the external risk score $\tilde{r}(\cdot)$ and can be computed once in a preprocessing step. In contrast to the exact method, however, $\tilde{\mathbf{w}}_k^{(s)}$ under the Breslow approximation requires no combinatorial enumeration: it reduces to a softmax-weighted average of covariates over the risk set $\mathcal{R}_k^{(s)}$, scaled by $d_k^{(s)}$, with computational cost $O(n_k^{(s)})$ per event time. The resulting objective is structurally identical to the standard Breslow partial likelihood, with the observed covariate sum $\mathbf{w}_{\mathcal{D}_k}^{(s)}$ replaced by the blended term $(\mathbf{w}_{\mathcal{D}_k}^{(s)} + \eta\,\tilde{\mathbf{w}}_k^{(s)})/(1+\eta)$, so the KL integration term introduces no additional computational burden.
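A minimal sketch of the Breslow contribution for one event time, following the representation in the proposition. It is not the package's implementation: the function name `breslow_term`, the covariates, and the external scores are hypothetical and serve only to show that a single pass over the risk set suffices.

```python
from math import exp, log

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def breslow_term(Z, failed, ext_scores, beta, eta):
    """Contribution of one event time to Q_eta under the Breslow approximation."""
    d, dim = len(failed), len(beta)
    # Single-draw softmax weights from the external risk scores.
    ext = [exp(v) for v in ext_scores]
    ext_total = sum(ext)
    # w~_k: softmax-weighted covariate average over the risk set, scaled by d.
    w_tilde = [d * sum(e / ext_total * z[c] for e, z in zip(ext, Z))
               for c in range(dim)]
    # w_{D_k}: covariate sum over the observed tie set.
    w_obs = [sum(Z[j][c] for j in failed) for c in range(dim)]
    blended = [(a + eta * b) / (1 + eta) for a, b in zip(w_obs, w_tilde)]
    # Breslow denominator: d times the log of the risk-set sum.
    log_denom = d * log(sum(exp(dot(z, beta)) for z in Z))
    return -(dot(blended, beta) - log_denom)

Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # hypothetical covariates
ext_scores = [0.3, 0.0, 0.4, -0.2]                      # hypothetical r~(Z_j)
value = breslow_term(Z, failed=[0, 2], ext_scores=ext_scores,
                     beta=[0.4, -0.3], eta=0.5)
print(value)
```

In this sketch `w_tilde` is recomputed on every call for simplicity; since it does not depend on `beta`, it could be cached before optimization.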

Comparison of the Two Methods

The Breslow method provides a computationally efficient approximation and is generally suitable when the number of tied events is moderate. The exact method yields more accurate inference in the presence of extensive ties, at the cost of increased computational burden due to the enumeration of all subsets $\mathcal{R}_k^{(s)}(d_k^{(s)})$. In practice, the two methods produce nearly identical results when ties are infrequent. Users can select between the two via the ties argument in coxkl_ties() and cv.coxkl_ties().

References

Breslow, Norman. 1974. “Covariance Analysis of Censored Survival Data.” Biometrics 30 (1): 89–99.
Cox, David R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological) 34 (2): 187–202.