Notations and Model Setup
Suppose the time-to-event data arise from $K$ distinct strata, representing heterogeneous sources or sampling blocks, where stratum $k$ ($k = 1, \dots, K$) contains $n_k$ subjects and the total sample size is $N = \sum_{k=1}^{K} n_k$. For subject $i$ in stratum $k$, let $T_{ki}$ and $C_{ki}$ denote the event and censoring times, respectively. Each subject is associated with $p$-dimensional covariates $Z_{ki}$. We assume that $T_{ki}$ and $C_{ki}$ are independent conditional on $Z_{ki}$. We define the observed time $X_{ki} = \min(T_{ki}, C_{ki})$ and the event indicator $\delta_{ki} = I(T_{ki} \le C_{ki})$.
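As a minimal sketch of the observed-data construction described above, the snippet below forms the observed time as the smaller of the event and censoring times, and the event indicator as the flag that the event occurred no later than censoring. The function name `observe`, the array names, and the toy values are illustrative assumptions rather than part of the original setup.

```python
import numpy as np

def observe(event_time, censor_time):
    """Return (observed time, event indicator) for right-censored data."""
    event_time = np.asarray(event_time, dtype=float)
    censor_time = np.asarray(censor_time, dtype=float)
    observed = np.minimum(event_time, censor_time)      # observed time = min(T, C)
    delta = (event_time <= censor_time).astype(int)     # 1 if the event is observed
    return observed, delta

# Hypothetical event and censoring times for three subjects in one stratum.
T = [2.0, 5.0, 3.5]
C = [4.0, 1.0, 3.5]
X, delta = observe(T, C)
```

In practice only the observed time and the indicator are available; the latent event and censoring times appear here only to make the construction explicit.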
Consider the following stratified Cox proportional hazards model:
\[
\lambda_k(t \mid Z_{ki}) = \lambda_{0k}(t) \exp\{ r(Z_{ki}; \beta) \},
\]
where $\lambda_{0k}(t)$ is an unspecified stratum-specific baseline hazard function, treated as an infinite-dimensional nuisance parameter that absorbs between-stratum heterogeneity arising from differences in source populations, clinical practice, or unmeasured confounding, and $\beta$ is a common regression parameter shared across all strata. Here $r(Z_{ki}; \beta)$ denotes the internal risk score; under the standard linear specification, $r(Z_{ki}; \beta) = Z_{ki}^\top \beta$.
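Assume that, in stratum $k$, the observed cohort has $D_k$ unique failure times $t_{k1} < \cdots < t_{kD_k}$. The stratified Cox log-partial likelihood is given by
\[
\ell(\beta) = \sum_{k=1}^{K} \sum_{j=1}^{D_k} \sum_{i=1}^{n_k} \delta_{ki}(t_{kj}) \left[ r(Z_{ki}; \beta) - \log \sum_{l=1}^{n_k} Y_{kl}(t_{kj}) \exp\{ r(Z_{kl}; \beta) \} \right],
\]
where $Y_{kl}(t) = I(X_{kl} \ge t)$ is the at-risk indicator in stratum $k$, and $\delta_{ki}(t_{kj}) = \delta_{ki} I(X_{ki} = t_{kj})$ indicates whether subject $i$ in stratum $k$ fails at time $t_{kj}$.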
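The stratified log-partial likelihood can be sketched numerically as follows, assuming the linear risk score (covariates times coefficients) and, if ties occur, a Breslow-type treatment in which each failure contributes its own term. The function name `stratified_log_partial_likelihood`, its argument names, and the toy data are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

def stratified_log_partial_likelihood(beta, Z, time, delta, stratum):
    """Stratified Cox log-partial likelihood with a linear risk score."""
    Z = np.asarray(Z, dtype=float)
    time = np.asarray(time, dtype=float)
    delta = np.asarray(delta, dtype=int)
    stratum = np.asarray(stratum)
    score = Z @ np.asarray(beta, dtype=float)          # linear risk score
    loglik = 0.0
    for k in np.unique(stratum):                       # risk sets never cross strata
        in_k = stratum == k
        s_k, t_k, d_k = score[in_k], time[in_k], delta[in_k]
        for i in np.flatnonzero(d_k == 1):             # one term per observed failure
            at_risk = t_k >= t_k[i]                    # at-risk indicator at that time
            loglik += s_k[i] - np.log(np.sum(np.exp(s_k[at_risk])))
    return loglik

# Hypothetical data: two strata, one covariate.
Z = np.array([[0.5], [-1.0], [0.3], [1.2], [-0.7]])
time = np.array([1.0, 2.0, 3.0, 1.0, 2.0])
delta = np.array([1, 1, 0, 0, 1])
stratum = np.array([0, 0, 0, 1, 1])
ll_at_zero = stratified_log_partial_likelihood([0.0], Z, time, delta, stratum)
```

With all coefficients set to zero, each failure contributes minus the log of its risk-set size, which gives a quick sanity check: here the risk sets have sizes 3, 2, and 1, so the value equals minus log 6.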
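Let $\phi$ be chosen as the negative entropy so that the resulting Bregman divergence reduces to the Kullback–Leibler divergence. To construct the probabilistic framework, let $R_k(t) = \{ i : Y_{ki}(t) = 1 \}$ denote the at-risk set at $t$ in stratum $k$, let $A_{ki}(t)$ denote the event that subject $i$ fails in the interval $[t, t + dt)$, and let $\mathcal{F}_k(t)$ collect all failure and censoring information up to time $t$, together with the information that exactly one failure occurs in $[t, t + dt)$. Then $\{ \mathcal{F}_k(t_{kj}) : j = 1, \dots, D_k \}$ defines a sequence of conditional experiments. At each event time $t_{kj}$ in stratum $k$, the internal working model specifies the conditional density as
\[
f_\beta\{ \delta_{k1}(t_{kj}), \dots, \delta_{k n_k}(t_{kj}) \mid \mathcal{F}_k(t_{kj}) \} = \prod_{i \in R_k(t_{kj})} \pi_{ki}(t_{kj}; \beta)^{\delta_{ki}(t_{kj})}.
\]
The stratum-specific probability mass assigned to subject $i$ under the internal model is
\[
\pi_{ki}(t_{kj}; \beta) = P_\beta\{ A_{ki}(t_{kj}) \mid \mathcal{F}_k(t_{kj}) \} = \frac{Y_{ki}(t_{kj}) \lambda_{0k}(t_{kj}) \exp\{ r(Z_{ki}; \beta) \}}{\sum_{l=1}^{n_k} Y_{kl}(t_{kj}) \lambda_{0k}(t_{kj}) \exp\{ r(Z_{kl}; \beta) \}} = \frac{Y_{ki}(t_{kj}) \exp\{ r(Z_{ki}; \beta) \}}{\sum_{l=1}^{n_k} Y_{kl}(t_{kj}) \exp\{ r(Z_{kl}; \beta) \}},
\]
where $\lambda_{0k}(t_{kj})$ is the stratum-specific baseline hazard that cancels in the ratio.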
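The cancellation of the baseline hazard can be checked numerically. The sketch below computes the probability mass over the risk set at one event time within a single stratum, once with a baseline hazard value of 1 and once with an arbitrary positive value; the two results coincide. The function name `risk_set_probabilities`, the `baseline_hazard` argument, and the toy numbers are assumptions made for illustration.

```python
import numpy as np

def risk_set_probabilities(score, time, event_time, baseline_hazard=1.0):
    """Conditional failure probabilities over one stratum at a given event time."""
    score = np.asarray(score, dtype=float)
    time = np.asarray(time, dtype=float)
    at_risk = (time >= event_time).astype(float)        # at-risk indicator
    weight = at_risk * baseline_hazard * np.exp(score)  # hazard of each subject
    return weight / weight.sum()                        # baseline cancels here

# Hypothetical risk scores and observed times for four subjects in one stratum.
score = np.array([0.2, -0.5, 1.0, 0.0])
time = np.array([1.0, 2.0, 3.0, 4.0])
p_unit = risk_set_probabilities(score, time, event_time=2.0, baseline_hazard=1.0)
p_other = risk_set_probabilities(score, time, event_time=2.0, baseline_hazard=7.3)
```

Subjects no longer at risk receive zero mass, and the remaining masses sum to one, so each event time defines a proper discrete distribution over the risk set.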
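To extract information from the external model, we replace the internal risk score with the external risk score $\tilde{r}_{ki} = Z_{ki}^\top \tilde{\beta}$, obtained by applying the external coefficient estimates $\tilde{\beta}$ to the internal cohort. The corresponding probability mass under the external model is
\[
\tilde{\pi}_{ki}(t_{kj}) = \frac{Y_{ki}(t_{kj}) \exp(\tilde{r}_{ki})}{\sum_{l=1}^{n_k} Y_{kl}(t_{kj}) \exp(\tilde{r}_{kl})}.
\]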
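The KL divergence between the external and internal conditional experiments at time $t_{kj}$ is
\[
d_{kj}(\beta) = \sum_{i \in R_k(t_{kj})} \tilde{\pi}_{ki}(t_{kj}) \log \frac{\tilde{\pi}_{ki}(t_{kj})}{\pi_{ki}(t_{kj}; \beta)}.
\]
Accumulating over all strata and failure times yields the total divergence:
\[
D_{\mathrm{KL}}(\beta) = \sum_{k=1}^{K} \sum_{j=1}^{D_k} d_{kj}(\beta),
\]
which, after substituting the Cox-model expressions, simplifies to
\[
D_{\mathrm{KL}}(\beta) = -\sum_{k=1}^{K} \sum_{j=1}^{D_k} \sum_{i \in R_k(t_{kj})} \tilde{\pi}_{ki}(t_{kj}) \left[ r(Z_{ki}; \beta) - \log \sum_{l=1}^{n_k} Y_{kl}(t_{kj}) \exp\{ r(Z_{kl}; \beta) \} \right] + C,
\]
where $C = \sum_{k=1}^{K} \sum_{j=1}^{D_k} \sum_{i \in R_k(t_{kj})} \tilde{\pi}_{ki}(t_{kj}) \log \tilde{\pi}_{ki}(t_{kj})$ does not involve $\beta$.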
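The simplification can be verified numerically. The sketch below, assuming linear risk scores for both the internal and the external model, accumulates the KL divergence over strata and failure times in two ways: directly from its definition, and as the sum of a part that depends on the internal coefficients and a constant that does not. All names (`risk_set_masses`, `total_kl`, `beta_ext`) and the toy data are hypothetical.

```python
import numpy as np

def risk_set_masses(score, time, event_time):
    """Probability masses over the risk set at one event time, plus the risk-set mask."""
    at_risk = time >= event_time
    weight = np.where(at_risk, np.exp(score), 0.0)
    return weight / weight.sum(), at_risk

def total_kl(beta, beta_ext, Z, time, delta, stratum):
    """Return (direct KL sum, coefficient-dependent part, constant part)."""
    r_int = Z @ np.asarray(beta, dtype=float)       # internal risk score
    r_ext = Z @ np.asarray(beta_ext, dtype=float)   # external risk score
    direct = dependent = const = 0.0
    for k in np.unique(stratum):
        idx = stratum == k
        t_k, ri, re = time[idx], r_int[idx], r_ext[idx]
        for t in np.unique(t_k[delta[idx] == 1]):   # unique failure times in stratum
            p_ext, m = risk_set_masses(re, t_k, t)
            p_int, _ = risk_set_masses(ri, t_k, t)
            direct += np.sum(p_ext[m] * np.log(p_ext[m] / p_int[m]))
            log_denom = np.log(np.sum(np.exp(ri[m])))
            dependent -= np.sum(p_ext[m] * (ri[m] - log_denom))
            const += np.sum(p_ext[m] * np.log(p_ext[m]))
    return direct, dependent, const

# Hypothetical data: two strata, one covariate.
Z = np.array([[0.5], [-1.0], [0.3], [1.2], [-0.7]])
time = np.array([1.0, 2.0, 3.0, 1.5, 2.5])
delta = np.array([1, 1, 0, 1, 0])
stratum = np.array([0, 0, 0, 1, 1])
direct, dependent, const = total_kl([0.3], [0.8], Z, time, delta, stratum)
same, _, _ = total_kl([0.8], [0.8], Z, time, delta, stratum)
```

Because the constant part is free of the internal coefficients, minimizing the total divergence is equivalent to minimizing only the coefficient-dependent part, which has the form of a negative log-partial likelihood with the failure indicators replaced by the external probability masses.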