`surtvep`

is an R package for fitting penalized Newton’s
method for the time-varying effects model using mAIC, TIC, GIC as
information criteria, in particular we span the parameter using basis
functions. Utilities for carrying out post-fitting visualization,
summarization, and inference are also provided. In this tutorial we
introduce the use of `surtvep`

through an example
dataset.

This tutorial aims to provide a comprehensive guide on survival
estimation for the time-varying coefficient model using the
`surtvep`

package in R. The package addresses the challenges
associated with large-scale time-to-event data and the estimation of
time-varying effects in survival analysis. By implementing a
computationally efficient Kronecker product-based proximal algorithm and
incorporating various penalties to improve estimation,
`surtvep`

offers an efficient and flexible solution for
handling time-varying coefficients in survival analysis.

Throughout the tutorial, we will cover the key features and
capabilities of the `surtvep`

package, providing examples and
use cases to illustrate the package’s functionality. By the end of this
tutorial, you will have a solid understanding of how to use
`surtvep`

for survival estimation with time-varying
coefficients, as well as how to leverage the package’s unique features
to improve your analysis.

```
#Install the package, need to install the devtools packages:
require("devtools")
require("remotes")
remotes::install_github("UM-KevinHe/surtvep", ref = "Lingfeng_test")
```

The purpose of this section is to give users a general sense of the package. We will briefly go over the main functions, basic operations and outputs. After this section, users may have a better idea of what functions are available, which ones to use, or at least where to seek help.

First, we load the ‘surtvep’ package:

The main functions used in the package are Newton’s method ‘coxtv’ and Newton’s method combined with penalization ‘coxtp’, which we will demonstrate in this section. We load a set of data created beforehand for illustration:

```
data("ExampleData")
z <- ExampleData$z
time <- ExampleData$time
event <- ExampleData$event
```

The command loads an input covariate matrix ‘z’, time-to-event outcome ‘time’ and ‘event’ from this saved R data archive. The saved data set is a simulation data set with continuous covariates.

We fit the Newton’s method without penalization use the most basic call to ‘coxtv’.

```
fit.tv <- coxtv(z = z, event = event, time = time)
#> Iter 1: Obj fun = -3.2982771; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -3.2916285; Stopping crit = 2.1424865e-02;
#> Iter 3: Obj fun = -3.2916034; Stopping crit = 8.0884492e-05;
#> Iter 4: Obj fun = -3.2916034; Stopping crit = 1.8232153e-09;
#> Algorithm converged after 4 iterations!
```

‘fit.tv’ is an object of class ‘coxtv’ that contains all the relevant
information of the fitted model for further use. We do not encourage
users to extract the components directly. Instead, various methods are
provided for the object such as `plot`

and `test`

that enable us to execute those tasks more elegantly.

We can get the time-varying coefficients by calling the
`get.tvcoef`

method:

```
beta.tv <- get.tvcoef(fit.tv)
head(beta.tv)
#> X1 X2
#> 0.000178354960296403 0.8190941 -0.07539314
#> 0.000355177544950556 0.8202944 -0.07436044
#> 0.000532422851851532 0.8214929 -0.07332691
#> 0.0011332947259668 0.8255221 -0.06983544
#> 0.00146478440156426 0.8277227 -0.06791732
#> 0.00224849921081871 0.8328625 -0.06340520
```

The first row of `beta.tv`

represents the time-varying
coefficient of x1 and x2 at time 0.00017835.

In order to get the time-varying coefficients on a new time sequence,
we can modify the `times`

argument in
`get.tvcoef`

.

```
time.new <- seq(1,2,0.1)
beta.tv.new <- get.tvcoef(fit.tv, time = time.new)
head(beta.tv.new)
#> X1 X2
#> 1 1.0753701 0.7029324
#> 1.1 1.0529174 0.5242153
#> 1.2 1.0039703 0.3215421
#> 1.3 0.9365847 0.1039432
#> 1.4 0.8590977 -0.1195252
#> 1.5 0.7798461 -0.3398073
```

We can visualize the time-varying coefficients by executing the
`plot`

method:

Each sub figure corresponds to a variable. It shows the time-varying
effect of our predictors. In our `ExampleData`

, the first
predictor has a constant effect of 1, and the second predictor has a
time-varying effect of \(\text{sin}(3\pi *
t/4)\), where \(t\) is the time.
The dotted line indicates the that hazard ratio is 0, which means the
predictor has no effect. Users may also wish to plot the effect of
different covariates in the same plot: this can be done by setting
`allinone = TRUE`

in the plot command.

The confidence interval shown presented on the figure can be
calculated using the `confint`

function.

`ci.df <- confint(fit.tv)`

Note that users can draw the plots by yourself with the necessary
data given. We give some raw example `ggplot`

codes. Please
refer to plotting for more
information.

Next we fit the Newton’s method combined with penalization method. We
specify a range of penalization coefficients first, then call the
`coxtp`

function. Detailed disucussion of how to specify the
range of penalization coefficients and how to choose the appropriate one
will be discussed in section Information Criteria.

```
lambda_all <- c(1)
fit.penalize = coxtp(z = z, event = event, time=time, lambda = lambda_all, method = "ProxN")
#> Iter 1: Obj fun = -3.3017443; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -3.2954100; Stopping crit = 2.0664261e-02;
#> Iter 3: Obj fun = -3.2953323; Stopping crit = 2.5346890e-04;
#> Iter 4: Obj fun = -3.2953321; Stopping crit = 4.6097119e-07;
#> Algorithm converged after 4 iterations!
#> lambda 1 is done.
```

With the tools introduced so far, users are able to fit the time-varying model. There are many more arguments in the package that give users a great deal of flexibility. To learn more, move on to later section.

This section gives the flowchart for the `surtvep`

package. Generally, `coxtv`

utilizes proximal Newton’s method
to estimate the time-varying coefficients. `coxtp`

combines
the Newton’s approach with penalization. `IC`

calculates
different information criteria to select the best tuning parameter in
front of the penalty term. `cv.coxtp`

uses cross-validation
for tuning parameter selection.. `tvef.ph`

,
`tvef.ph.time`

and `tvef.ph.zero`

provides
hypothesis testing for the fitted model. `get.tvef`

retrieves
the time-varying coefficients for the fitted model. `confint`

provides confidence intervals for these coefficients.
`baseline`

offers the baseline estimations. `plot`

visualizes the estimated time-varying coefficients. Other functions and
details are provided in later sections. Ï

While obtaining these estimates under the Cox proportional-hazards model is relatively straightforward, the assumption of constant hazard ratios is frequently violated in populations defined by an initial, acute event, such as myocardial infarction, or in studies with long-term follow-up.

The Cox non-proportional hazards model is a flexible and powerful tool for modeling the time-varying effects of covariates in survival analysis, allowing for the estimation of hazard ratios that change over time. This model can capture complex relationships between predictor variables and the time-to-event outcome, providing a more accurate representation of the underlying processes.

Let \(D_{i}\) denote the time lag from transplantation to death and \(C_{i}\) be the censoring time for patient \(i\), \(i=1,\ldots, n\). Here \(n_j\) is the sample size. The observed time is \(T_{i} = \min\{D_{i},C_{i}\}\), and the death indicator is given by \(\delta_{i} = I(D_{i} \leq C_{i})\). Let \(\boldsymbol{X}_{i}=(X_{i1}, \ldots, X_{iP})^T\) be a \(P\)-dimensional covariate vector. We assume that \(D_{i}\) is independent from \(C_{i}\) given \(\textbf{X}_{i}\). Consider the hazard function \[ \lambda(t|\boldsymbol{X}_{i}) = \lambda_{0}(t)\exp\{\boldsymbol{X}_{i}^\top{\boldsymbol\beta}(t)\}, %\nonumber \] where \(\lambda_{0}(t)\) is the baseline hazard. The time-varying coefficients \(\beta(t)\) represent the effect of predictors on the outcome are varying through different time points.

To estimate the time-varying coefficients \({\boldsymbol\beta}(t)=\{\beta_{1}(t),\ldots, \beta_{P}(t)\}\), we span \(\boldsymbol\beta(\cdot)\) by a set of cubic B-splines defined on a given number of knots: \[\begin{eqnarray} \beta_{p}(t)=\boldsymbol\theta_{p}^\top\boldsymbol{B}(t)=\sum_{k=1}^K \theta_{pk} B_k(t), ~~ p=1, \ldots, P, \nonumber \end{eqnarray}\] where \(\boldsymbol{B} (t)=\{B_1(t), \ldots, B_K(t)\}^T\) forms a basis, \(K\) is the number of basis functions, and \(\boldsymbol\theta_{p}=(\theta_{p1}, \ldots, \theta_{pK})^T\) is a vector of coefficients with \(\theta_{pk}\) being the coefficient for the \(k\)-th basis of the \(p\)-th covariate. Detailed inforamtion about the construction of the B-spline basis can seen in the B-spline construction.

In this section we introduce the Newton’s method for estimating time-varying effects in detail.

With a length-\(PK\) parameter vector \(\boldsymbol\theta=vec(\boldsymbol\Theta)\), the vectorization of the coefficient matrix \(\boldsymbol\Theta=(\boldsymbol\theta_{1}, \ldots, \boldsymbol\theta_{P})^T\) by row, the log-partial likelihood function is \[\begin{equation} \ell(\boldsymbol\theta)=\sum_{i=1}^{n_j} \delta_{i} \left [\boldsymbol{X}_{i}^T \boldsymbol\Theta \boldsymbol{B}(T_{i}) -\log \left\{\sum_{i' \in R_{i}} \exp \{\boldsymbol{X}_{i' }^T \boldsymbol\Theta \boldsymbol{B}(T_{i}) \} \right \} \right ] \end{equation}\], where \(R_{i}=\{i': 1 \leq i' \leq n, ~ T_{i'}\geq T_{i}\}\) is the at-risk set.

`coxtv`

applies Newton’s method to solve the problem.
Specifically, suppose we have current estimates \(\widetilde{\boldsymbol\theta}\), the update
is \[
\widetilde{\boldsymbol\theta} \leftarrow
\widetilde{\boldsymbol\theta} + \nu \boldsymbol{\mu};
\] where \[
\boldsymbol{\mu} = \left(- \nabla^2 \ell(\boldsymbol{\theta})
\right)^{-1} \nabla \ell(\boldsymbol{\theta})
\], and \(\nu\) is a step size
adjusted by backtracking linesearch. \(\nabla
\ell(\boldsymbol{\theta})\) and \(\nabla^2 \ell(\boldsymbol{\theta})\) is the
first and second derivative of the log partial likelihood.

`coxtv`

provides various arguments for users to customize
the fit: we introduce some commonly used arguments here.

`strata`

is a vector of indicators for stratification. Default =`NULL`

, (i.e. no stratification group in the data), an unstratified model is implemented.`nsplines`

number of basis functions in the splines to span the time-varying effects, whose default value is 8. We use the R function`splines::bs`

to generate the B-splines.`ties`

a character string specifying the method for tie handling. If there are no tied events, the methods are equivalent. By default`"Breslow"`

uses the Breslow approximation, which can be faster when many ties are present.`tol`

tolerance used for stopping the algorithm. See details in`stop`

below. The default value is`1e-6`

.`iter.max`

maximum iteration number if the stopping criterion specified by`stop`

is not satisfied. Default value is 20.`degree`

degree of the piecewise polynomial for generating the B-spline basis functions—default is 3 for cubic splines.`degree = 2`

results in the quadratic B-spline basis functions.`method`

a character string specifying whether to use Newton method or proximal Newton method. If`"Newton"`

then Hessian is used, while the default method`"ProxN"`

implements the proximal Newton which can be faster and more stable when there exists ill-conditioned second-order information of the log-partial likelihood.`gammma`

parameter for proximal Newton method`"ProxN"`

. Default value is`1e8`

.`btr`

a character string specifying the backtracking line-search approach.`"dynamic"`

is a typical way to perform backtracking line-search. See details in Convex Optimization by Boyd and Vandenberghe (2009).`"static"`

limits Newton’s increment and can achieve more stable results in some extreme cases, such as ill-conditioned second-order information of the log-partial likelihood, which usually occurs when some predictors are categorical with low frequency for some categories. Users should be careful with`static`

as this may lead to under-fitting.`tau`

a scalar in (0,1) used to control the step size inside the backtracking line-search. The default value is 0.5.`stop`

character string specifying the stopping rule to determine convergence. Use to denote the log-partial likelihood at iteration step m.`"incre"`

means we stop the algorithm when Newton’s increment is less than the`tol`

.`"relch"`

means we stop the algorithm when the is less than the`tol`

.`"ratch"`

means we stop the algorithm when is less than the`tol`

.`"all"`

means we stop the algorithm when all the stopping rules`"incre"`

,`"relch"`

and`"ratch"`

are met. Default value is`ratch`

.`iter.max`

, if achieved, overrides any stop rule for algorithm termination.`parallel`

if`TRUE`

, then the parallel computation is enabled. The number of threads in use is determined by`threads`

.`threads`

an integer indicating the number of threads to be used for parallel computation. Default is`2`

. If`parallel`

is false, then the value of`threads`

has no effect.`fixedstep`

if`TRUE`

, the algorithm will be forced to run`iter.max`

steps regardless of the stopping criterion specified.

In the following sections we brefily describe these useful arguments
when calling `coxtv`

.

Now we start with a relatively harsh simulated data. Here, the covariates V1 and V2 were generated as binary variables with around 90% frequency, which is a relatively harsh setting to be estimated. The related true log-hazard function for each variable is \(\beta_1(t)=1\) and \(\beta_2(t)=exp(-1.5*t)\), where t denotes time.

Let’s check the data first:

```
data("ExampleDataBinary")
table(ExampleDataBinary$x[,1])
#> < table of extent 0 >
table(ExampleDataBinary$x[,2])
#> < table of extent 0 >
z <- ExampleDataBinary$z
time <- ExampleDataBinary$time
event <- ExampleDataBinary$event
```

Both predictors are presented with frequency around 25%.

`method = "ProxN"`

The ‘method’ parameter has two options.
* method="Newton"* and

`method="ProxN"`

```
fit.newton <- coxtv(z = z, event = event, time = time, method = 'Newton')
fit.proxN <- coxtv(z = z, event = event, time = time, method = 'ProxN')
```

Then we plot the fitted methods with all the curves on the same plot. No obvious change can be oberserved here. However, we do recommand using

`strata`

When the data of patients are collected from different strata, we can extend the model to a stratified version. We use \(j=1,\ldots,J\) to denote the \(J\) different centers. Let \(D_{ij}\) denote the time to death and \(C_{ij}\) be the censoring time for patient \(i\) in center \(j\), \(i=1,\ldots, n_j\), and \(j=1, \ldots, J\). Here \(n_j\) is the sample size in center \(j\). The total number of patients is \(N=\sum_{j=1}^Jn_j\), the observed time is \(T_{ij} = \min\{D_{ij},C_{ij}\}\), and the death indicator is given by \(\delta_{ij} = I(D_{ij} \leq C_{ij})\).

Let \(\textbf{X}_{ij}=(X_{ij1}, \ldots, X_{ijP})^T\) be a \(P\)-dimensional covariate vector. We assume that \(D_{ij}\) is independent from \(C_{ij}\) given \(\textbf{X}_{ij}\). Correspondingly, the log-partial likelihood function is \[ \ell_{strata}(\boldsymbol\theta) = \sum_{j=1}^J \sum_{i=1}^{n_j} \delta_{ij} \left [\boldsymbol{X}_{ij}^T \boldsymbol\Theta \boldsymbol{B}(T_{ij}) -\log \left\{\sum_{i' \in R_{ij}} \exp \{\boldsymbol{X}_{i' j}^T \boldsymbol\Theta \boldsymbol{B}(T_{ij}) \} \right \} \right ], \] where \(R_{ij}=\{i': 1 \leq i' \leq n_j, ~ T_{i' j}\geq T_{ij}\}\) is the at-risk set for stratum \(j\).

For the case with different strata, the usage is to include the strata variable. First, we load a set of generated data with different stratums. In our simulation, we vary the baseline for different stratums by adding a small term to the baseline function. The small term is generated by uniform distribution with mean = 0 and standard deviation = 0.5.

```
data("StrataExample")
z <- StrataExample$z
time <- StrataExample$time
event <- StrataExample$event
strata <- StrataExample$strata
```

The `strata`

is a vector of indicators for stratification.
The stratified model can be easily fitted by calling

`btr`

`btr`

is a character string specifying the backtracking
line-search approach. `"dynamic"`

is a typical way to perform
backtracking line-search. See details in Convex Optimization by Boyd and
Vandenberghe (2009). `"static"`

limits Newton’s increment and
can achieve more stable results in some extreme cases, such as
ill-conditioned second-order information of the log-partial likelihood,
which usually occurs when some predictors are categorical with low
frequency for some categories.

Users should be careful with `static`

as this may lead to
under-fitting.

The proximal Newton can help improve the estimation when the origional hessian matrix is very close to a singular one, which may often occur in the setting of time-varying effects, however, over-fitting issue still exists. We further improve the estimation by introducing the penalty. The basic idea of penalization is to control the model’s smoothness by adding a ‘wiggliness’ penalty to the fitting objective. Rather than fitting the non-proportional hazards model by maximizing original log-partial likelihood, it could be fitted by maximizing \[\begin{align} \ell(\boldsymbol{\theta}) - P_\lambda(\boldsymbol{\theta}). \end{align}\]

We have different choices of \(P_\lambda(\boldsymbol{\theta})\). Potential choices are to use P-splines, and discrete penalties. Detailed discussions are provided below.

The P-splines are low rank smoothers using a B-spline basis, usually defined on evenly spaced knots, with a difference penalty applied directly to the parameters \(\theta_{pk}\), to control function wiggliness. When we set standard cubic B-spline basis functions, the penalty function used for \(\beta_p\) will be \[\begin{align*} P_\lambda(\boldsymbol{\theta}) = \lambda\sum_{j=1}^P\sum_{k=1}^{K-1}\{(\boldsymbol{\theta}_{j(k+1)} - \boldsymbol{\theta}_{jk})\}^2. \end{align*}\]

It is straight forward to express the penalty as a quadratic form, \(\boldsymbol{\theta}^T\boldsymbol{S}\boldsymbol{\theta}\), in this basis coefficients:

\[\begin{align*} \sum_{k=1}^{K-1}\{(\boldsymbol{\theta}_{j(k+1)} - \boldsymbol{\theta}_{jk})\}^2 = \boldsymbol{\theta}^T \begin{bmatrix} 1 & -1 & 0 & 0 & . & . & . \\ -1 & 2 & -1 & 0 & 0 & . & . \\ 0 & -1 & 2 & -1 & 0 & 0 & . \\ . & . & . & . & . & . & .\\ . & . & . & . & . & . & .\\ \end{bmatrix} \boldsymbol{\theta} \end{align*}\]Hence the penalized fitting problem is to maximize \[\begin{align} \ell(\boldsymbol{\theta}) - \lambda\boldsymbol{\theta}^T\boldsymbol{S}\boldsymbol{\theta} \end{align}\] with respect to \(\boldsymbol{\theta}\).

The reduced rank spline smoothers with derivative based penalties can be set up almost as easily, while retaining the sparsity of the basis and penalty and the ability to mix-and-match the orders of spline basis functions and penalties.

We denote the B-spline basis as of order \(m_1\), and \(m_1 = 3\) denotes a cubic spline. Associated with the spline will be a derivative based penalty \[\begin{align*} P_{\lambda} = \lambda \int_{0}^{T}\boldsymbol{\beta}^{[m_2]}(t)^2dt \end{align*}\] where \(\boldsymbol{\beta}^{[m_2]}(t)\) denotes the \(m_2^{th}\) derivative of \(\boldsymbol{\beta}\) with respect to \(t\). It is assumed that \(m_2 \leq m_1\), otherwise it makes no sense that the penalty is formulated in terms of a derivative that is not properly defined for the basis functions. Similarly, \(P_{\lambda}\) can be written as \(\boldsymbol{\theta}^T\boldsymbol{S}\boldsymbol{\theta}\) where \(\boldsymbol{S}\) is a banded diagonal matrix of known coefficients. The algebraic expression of \(\boldsymbol{S}\) is complex, as discussed in . However, this has little impact on the computation time.

The usage of Newton’s method with penalization `coxtp`

is
very similar to `coxtv`

introduced above. The main difference
comes from the penalization tuning parameter selection.

We use the smooth spline here for penalization (default). You could
also use `spline="P-spline"`

. For tuning parameter
`lambda`

, you could either enter a single numeric value or a
vector of numbers. If `lambda`

is entered as a vector of
numbers, the tuning parameter can be selected based on different
criteria (AIC, TIC or GIC) using function `IC`

.
Alternatively, users can use `cv.coxtp`

to select the tuning
parameter \(\lambda\) based on cross
validation.

Following is a model fit with the `lambda`

as a vector for
different illustration purposes. We use the relatively harsh setting
`ExampleDataBinary`

to illustrate the usage of
`coxtp`

, `IC`

and `cv.coxtp`

, the
penalization method. This setting contains binary predictors with low
frequency, which may have ill-conditioned second-order information of
the log-partial likelihood. Penalization plays an important role here as
it helps to smooth the time-varying effects.

```
data(ExampleDataBinary)
z <- ExampleDataBinary$z
time <- ExampleDataBinary$time
event <- ExampleDataBinary$event
```

First we fit the penalized model with
`"Smooth-spline"`

.

```
lambda_spline_all = c(0.01,0.1,1,10,100,1000)
# fit.pspline <- coxtp(event = event, z = z, time = time, lambda=lambda_spline_all, spline = "P-spline")
fit.smoothspline <- coxtp(event = event, z = z, time = time, lambda = lambda_spline_all, penalty = "Smooth-spline")
```

Then we use the `IC`

function to select the tuning
parameter.

```
IC <- IC(fit.smoothspline)
print(IC$mAIC)
#> [1] 15209.79 15205.84 15201.74 15199.47 15199.18 15199.35
print(IC$TIC)
#> [1] 15207.09 15203.63 15199.85 15197.89 15198.62 15199.44
print(IC$GIC)
#> [1] 15208.92 15205.59 15201.99 15200.02 15199.50 15199.56
```

We can directly call the `plot`

function to give the
estimation plot.

From the result above, we notice that with different selection criteria, the best tuning parameter \(\lambda\) selected is different for different selection criteria. In this example, we use the TIC criterion. The result for this model is saved in the related criterion model and can be accessed as follows:

```
model.mAIC = IC$model.mAIC
model.TIC = IC$model.TIC
model.GIC = IC$model.GIC
```

Compared to the non-penalized model, we can see that the effect of ``V1” is shrunk to approximately linear in the penalized model.

The Nelson-Aalen estimator (Breslow estimator) of the cumulative function is given by \(\widetilde{\Lambda}(t) = \int_0^t \widetilde{\Lambda}_0(u)\), where \(\widetilde{\Lambda}(t)\) is 0 except at the observed failure times \(t_i\), at which it takes the value \[\begin{align*} d\Lambda_0 = {d_i}\left\{\mathop{\sum}\limits_{\ell \in R(T_i)} \exp \{\boldsymbol{X}_{i' }^T \boldsymbol{\Theta} \boldsymbol{B}(T_{i}) \}\right\}^{-1}. \end{align*}\]

The baseline estimation here refers to the baseline hazard at time t when holding all the covariates equal to zero.

```
data("ExampleData")
z <- ExampleData$z
time <- ExampleData$time
event <- ExampleData$event
time2 <- round(time, digits = 1)
```

```
lambda_spline_all=c(0.1,1,10,100)
fit.smoothspline <- coxtp(event = event, z = z, time = time2, lambda=lambda_spline_all, penalty = "Smooth-spline")
IC <- IC(fit.smoothspline)
fit.mAIC <- IC$model.mAIC
```

`base.est = baseline(fit.mAIC)`

```
head(base.est)
#> $time
#> [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
#> [20] 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0
#>
#> $hazard
#> [1] 0.02576925 0.04632723 0.04796684 0.05258371 0.04708783 0.04544031
#> [7] 0.05096521 0.03807625 0.04421912 0.05162255 0.05153851 0.06367847
#> [13] 0.04167940 0.04678650 0.04840596 0.06907205 0.05280167 0.04285638
#> [19] 0.02951995 0.02536585 0.02495845 0.02703002 0.03259201 0.03841546
#> [25] 0.01788728 0.02645356 0.02097025 0.00000000 0.03385874 0.00000000
#> [31] 0.00000000
#>
#> $cumulHaz
#> [1] 0.02576925 0.07209648 0.12006331 0.17264703 0.21973486 0.26517517
#> [7] 0.31614038 0.35421663 0.39843575 0.45005830 0.50159681 0.56527528
#> [13] 0.60695468 0.65374117 0.70214714 0.77121918 0.82402086 0.86687724
#> [19] 0.89639719 0.92176304 0.94672149 0.97375150 1.00634352 1.04475898
#> [25] 1.06264625 1.08909982 1.11007006 1.11007006 1.14392881 1.14392881
#> [31] 1.14392881
```

As a result, we could either increase the time interval to make ties exists or remove the points that have \(\lambda=0\). The only differences would result in the baseline hazard, there is no influence on the cumulative baseline hazard.

In this section we provide three tests for the model. The first is to test

To test whether the effects are time-varying, we use the constant property of B-splines, that is, if \(\theta_{p1}=\cdots=\theta_{pK},\) the corresponding covariate effect is time-independent. Specify a matrix \(\boldsymbol{C}_p\) such that \(\boldsymbol{C}_p\boldsymbol\theta=\textbf{0}\) corresponds to the contrast that \(\theta_{p1}=\cdots=\theta_{pK}\). A Wald statistic can be constructed by \[\begin{align*} S_p=(\boldsymbol{C}_p\boldsymbol{\widehat\theta})^T \left[\boldsymbol{C}_p \{- \triangledown^2 \ell(\boldsymbol{\widehat\theta})\}^{-1}\boldsymbol{C}_p^T\right]^{-1}(\boldsymbol{C}_p\boldsymbol{\widehat\theta}). \end{align*}\]

Under the null hypothesis that the effect is time-independent, \(S_p\) is asymptotically chi-square distributed with \(K-1\) degrees of freedom.

```
tvef.ph(fit.proxN)
#> chisq df p
#> X1 6.889955 7 0.44042707
#> X2 21.437089 7 0.00317433
```

For the second variable, We reject the null hypothesis and claim it has the time varying effect.

To test whether the effects are significant or not, we use the test statistic \(\theta_{p1}=\cdots=\theta_{pK} = 0,\) the corresponding covariate effect is time-independent. Specify a matrix \(\boldsymbol{C}_p\) such that \(\boldsymbol{C}_p\boldsymbol\theta=\textbf{0}\) corresponds to the contrast that \(\theta_{p1}=\cdots=\theta_{pK} = 0\). A Wald statistic can be constructed by \[\begin{align*} S_p=(\boldsymbol{C}_p\boldsymbol{\widehat\theta})^T \left[\boldsymbol{C}_p \{- \triangledown^2 \ell(\boldsymbol{\widehat\theta})\}^{-1}\boldsymbol{C}_p^T\right]^{-1}(\boldsymbol{C}_p\boldsymbol{\widehat\theta}). \end{align*}\]

`test.zero <- tvef.zero(fit.proxN)`

Under the null hypothesis that the effect is time-independent, \(S_p\) is asymptotically chi-square distributed with \(K\) degrees of freedom.

`test.zero.time <- tvef.zero.time(fit.proxN)`

From the plot above, we could observe that with different knots
selected, the performance is not different so much for different knots,
all the estimates are relatively close to the real function. However, in
the penalized model, the lambda is more important which is another
reason that we need to select the best lambda before actually fitting
the model. The following plot is the performance of different lambda
selected. For the following plot, the knot was set as default, which is
`knot=8`

.

`SUPPORT`

In this section, we illustrate the usage of our package by analyzing
the time-varying effects on the real data `SUPPORT`

(Study to
Understand Prognoses Preferences Outcomes and Risks of Treatment). We
use the processed `SUPPORT`

data set included in the R
package (Bhatnagar et al. 2020).

We study the time-varying effects of metastatic cancer as the illustration example and include

First we load the data and get the survival outcome:

```
data(support)
time <- support$d.time
death <- support$death
```

Age, cancer status and diabetes status are included as covariates and considered as categorical variables. Age has 4 levels, cancer status has 3 levels and diabetes has 2 levels. The following codes processed the covariates for model fitting.

```
#diabetes:
diabetes <- model.matrix(~factor(support$diabetes))[,-1]
#sex: female as the reference group
sex <- model.matrix(~support$sex)[,-1]
#age: continuous variable
age <-support$age
age[support$age<=50] <- "<50"
age[support$age>50 & support$age<=60] <- "50-59"
age[support$age>60 & support$age<70] <- "60-69"
age[support$age>=70] <- "70+"
age <- factor(age, levels = c("60-69", "<50", "50-59", "70+"))
z_age <- model.matrix(~age)[,-1]
# cancer status: metastatic, yes, no
ca <- factor(support$ca)
z_ca_mets <- as.numeric(support$ca == "metastatic")
z_ca_non_mets <- as.numeric(support$ca == "yes")
z <- data.frame(z_age, z_ca_mets, z_ca_non_mets, sex, diabetes)
# colnames(z) <- c("age_50", "age_50_59", "age_70", "metastatic", "diabetes", "male")
colnames(z) <- c("age_50", "age_50_59", "age_70", "metastatic", "non_metastatic", "diabetes", "male")
```

The Kaplan-Meier (KM) plots were generated using the following codes to visualize the survival probability over time for three categorical variables. Analysis of the KM plot for cancer status revealed a crossover point at approximately day 200, where the survival probability of the non-metastatic cancer group surpassed that of the cancer group, suggesting a violation of the non-proportional hazards assumption.

```
library(survminer)
library(survival)
data <- data.frame(time, death, z)
fit1 <- survfit(Surv(time, death) ~ metastatic + non_metastatic, data = data)
fit2 <- survfit(Surv(time, death) ~ diabetes, data = data)
fit3 <- survfit(Surv(time, death) ~ age_50 + age_50_59 + age_70, data = data)
ggsurvplot(fit1, data = data, legend.labs = c("non-cancer", "non-metastatic cancer", "metastatic cancer"))
```

`ggsurvplot(fit2, data = data, legend.labs = c("diabetes", "non-diabetes"))`

`ggsurvplot(fit3, data = data, legend.labs = c("60-69", "70+", "50-59", "<50"))`

Fit the Newtons’ Method using `coxtv`

:

```
fit.coxtv <- coxtv(event = death, z = z, time = time, nsplines = 10)
#> Iter 1: Obj fun = -5.7809097; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -5.7668975; Stopping crit = 1.9495719e-01;
#> Iter 3: Obj fun = -5.7668366; Stopping crit = 8.4613404e-04;
#> Iter 4: Obj fun = -5.7668365; Stopping crit = 1.5189288e-06;
#> Iter 5: Obj fun = -5.7668365; Stopping crit = 1.8799714e-10;
#> Algorithm converged after 5 iterations!
```

We take a look at the estimation plots of cancer status using the
`plot`

function:

`plot(fit.coxtv, parm = c("metastatic", "non_metastatic"), ylim = c(-2,2), ylab = "log hazard ratio")`

Conditioned on the diabetes status and age, we can observe a violation of the proportional hazards assumption. A significant difference in the hazard of death was observed between the metastatic cancer, non-metastatic cancer, and non-cancer groups. Specifically, patients with metastatic cancer exhibited a higher hazard of death during the initial 1500 days compared with non-cancer patients.

However, upon examination of the estimation plot for non-metastatic cancer, a decrease at the beginning followed by an increase can be observed. The fluctuation in the plot is not deemed reliable, primarily due to the instability of the estimation. To address this issue, a penalty term was added to improve the estimation.

```
fit.coxtp <- coxtp(event = death, z = z, time = time, lambda = c(1,10,100,1000))
#> Iter 1: Obj fun = -5.7812444; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -5.7671833; Stopping crit = 1.9641947e-01;
#> Iter 3: Obj fun = -5.7671236; Stopping crit = 8.3225826e-04;
#> Iter 4: Obj fun = -5.7671235; Stopping crit = 1.7825937e-06;
#> Iter 5: Obj fun = -5.7671235; Stopping crit = 1.4381264e-10;
#> Algorithm converged after 5 iterations!
#> lambda 1 is done.
#> Iter 1: Obj fun = -5.7812465; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -5.7671838; Stopping crit = 1.9644253e-01;
#> Iter 3: Obj fun = -5.7671242; Stopping crit = 8.3187556e-04;
#> Iter 4: Obj fun = -5.7671241; Stopping crit = 1.7283676e-06;
#> Iter 5: Obj fun = -5.7671518; Stopping crit = 3.8614177e-04;
#> Iter 6: Obj fun = -5.7671241; Stopping crit = 3.8599213e-04;
#> Iter 7: Obj fun = -5.7671241; Stopping crit = 1.9834684e-13;
#> Algorithm converged after 7 iterations!
#> lambda 10 is done.
#> Iter 1: Obj fun = -5.7812812; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -5.7672135; Stopping crit = 1.9659361e-01;
#> Iter 3: Obj fun = -5.7671540; Stopping crit = 8.3082429e-04;
#> Iter 4: Obj fun = -5.7671539; Stopping crit = 1.4655789e-06;
#> Iter 5: Obj fun = -5.7673043; Stopping crit = 2.1045372e-03;
#> Iter 6: Obj fun = -5.7671539; Stopping crit = 2.1001131e-03;
#> Iter 7: Obj fun = -5.7671539; Stopping crit = 8.0611971e-13;
#> Algorithm converged after 7 iterations!
#> lambda 100 is done.
#> Iter 1: Obj fun = -5.7814771; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -5.7674171; Stopping crit = 1.9704674e-01;
#> Iter 3: Obj fun = -5.7673576; Stopping crit = 8.3362394e-04;
#> Iter 4: Obj fun = -5.7673575; Stopping crit = 1.3752688e-06;
#> Iter 5: Obj fun = -5.7675286; Stopping crit = 2.4012175e-03;
#> Iter 6: Obj fun = -5.7673575; Stopping crit = 2.3954592e-03;
#> Iter 7: Obj fun = -5.7673575; Stopping crit = 1.0074129e-12;
#> Algorithm converged after 7 iterations!
#> lambda 1000 is done.
IC.support <- IC(fit.coxtp)
```

`plot(IC.support$model.mAIC, parm = c("metastatic", "non_metastatic"), ylim = c(-2,2), ylab = "log hazard ratio")`

Then we provide the hypothesis testing for each covariate.

```
#test the proportional hazards assumption
test.ph <- tvef.ph(fit.coxtv)
print(test.ph)
#> chisq df p
#> age_50 46.311169 9 5.271760e-07
#> age_50_59 13.208472 9 1.533978e-01
#> age_70 3.338770 9 9.493378e-01
#> metastatic 389.789436 9 2.064355e-78
#> non_metastatic 4.168983 9 8.999424e-01
#> diabetes 19.018021 9 2.504010e-02
#> male 44.476142 9 1.153035e-06
#test the significance of each covariate
test.zero <- tvef.zero(fit.coxtv)
print(test.zero)
#> chisq df p
#> age_50 92.95934 10 1.384020e-15
#> age_50_59 23.20617 10 1.001062e-02
#> age_70 33.59559 10 2.162004e-04
#> metastatic 997.76510 10 5.666717e-208
#> non_metastatic 134.96747 10 4.517433e-24
#> diabetes 25.96236 10 3.791138e-03
#> male 48.35095 10 5.354766e-07
```

When testing for the proportional hazards assumption, age_50, non_cancer, non_metastatic, diabetes and male has p-value < 0.05. We reject the null hypothesis and claim that these covariates have violated the proportional hazards assumption.

All these covariates are significant based on the Wald test.

Then we can use the following codes to get the baseline estimation and the baseline plot.

`base <- baseline(IC.support$model.TIC)`

`plot(base)`

When dealing with data where ties are presented, we can also give baseline hazard and absolute hazard for each covariate.

We utilized the Breslow’s approximation to estimate the baseline hazard at each event time. We round the unique event time first.

```
time.discrete <- round(support$d.time, digits = -2)
fit.discrete <- coxtv(event = death, z = z, time = time.discrete, nsplines = 10)
#> Iter 1: Obj fun = -5.8724904; Stopping crit = 1.0000000e+00;
#> Iter 2: Obj fun = -5.8627640; Stopping crit = 1.6318989e-01;
#> Iter 3: Obj fun = -5.8627181; Stopping crit = 7.6917032e-04;
#> Iter 4: Obj fun = -5.8627180; Stopping crit = 1.5741850e-06;
#> Iter 5: Obj fun = -5.8627180; Stopping crit = 1.9209980e-10;
#> Algorithm converged after 5 iterations!
base.discrete <- baseline(fit.discrete)
```

The following codes provide the absolute hazards plot.

```
beta.abso <- data.frame(get.tvcoef(fit.discrete))
time_unique<- unique(time.discrete)
haz_j <- base.discrete$hazard
smooth_data <- data.frame(time_unique, haz_j)
loessMod30 <- loess(haz_j ~ time_unique, data=smooth_data, span=0.30) # 30% smoothing
haz_j_smooth30 <- predict(loessMod30)
exp_betax<- matrix(0,length(unique(time.discrete)),dim(z)[2])
beta2 <- beta.abso
beta2$cancer <- 0
exp_betax <- exp(beta2)
haz_0_betax <- haz_j_smooth30 * exp_betax
```

```
library(ggplot2)
plot_age <- ggplot(haz_0_betax, aes(x=time_unique)) +
geom_line(aes(y=metastatic, color = "metastatic-cancer"),size = 0.6) +
geom_line(aes(y=non_metastatic, color = "non-metastatic"),size = 0.6) +
geom_line(aes(y=cancer, color = "non-cancer"),size = 0.6) +
scale_color_manual(name='Age at diagnosis',
breaks = c('non-cancer', 'non-metastatic', 'metastatic-cancer'),
values = c('non-cancer'='dark green', 'non-metastatic'='red', 'metastatic-cancer' = 'black')) +
theme(plot.title = element_text(hjust = 0.2))
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
plot_age
```

Bhatnagar, Sahir, Maxime Turgeon, Jesse Islam, Olli Saarela, and James
Hanley. 2020. “Casebase: Fitting Flexible Smooth-in-Time Hazards
and Risk Functions via Logistic and Multinomial Regression.” https://CRAN.R-project.org/package=casebase.