Fit a cross-validated Cox non-proportional hazards model with a P-spline or smoothing-spline penalty

Fit a Cox non-proportional hazards model via penalized maximum likelihood. The penalization tuning parameter is provided by cross-validation.

cv.coxtp(
  event,
  z,
  time,
  strata = NULL,
  lambda = c(0.1, 1, 10),
  nfolds = 5,
  foldid = NULL,
  knots = NULL,
  penalty = "Smooth-spline",
  nsplines = 8,
  ties = "Breslow",
  tol = 1e-06,
  iter.max = 20L,
  method = "ProxN",
  gamma = 1e+08,
  btr = "dynamic",
  tau = 0.5,
  stop = "ratch",
  parallel = FALSE,
  threads = 2L,
  degree = 3L,
  fixedstep = FALSE
)

Arguments

event

failure event response variable of length nobs, where nobs denotes the number of observations. It should be a vector containing 0 or 1.

z

input covariate matrix, with nobs rows and nvars columns; each row is an observation.

time

observed event times, which should be a vector with non-negative values.

strata

a vector of indicators for stratification. The default is NULL (i.e. no stratification group in the data), in which case an unstratified model is fitted.

lambda

a user-specified sequence of penalization coefficients applied to the spline term specified by penalty. This is the tuning parameter for penalization. The function IC can be used to select the best tuning parameter based on information criteria. Larger values are appropriate when the absolute values of the estimated time-varying effects are too large. When lambda is 0, the model is fitted by Newton's method without penalization.

nfolds

number of folds for cross-validation, the default value is 5. The smallest value allowable is nfolds=3.

foldid

an optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.
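A fold assignment of this form can be built by hand in base R. The sketch below is one common way to do it; nobs and the random seed are illustrative values, not taken from the package.

```r
# Illustrative sizes (nobs is an assumption; nfolds = 5 matches the default)
set.seed(1)
nobs   <- 103
nfolds <- 5

# Repeat the labels 1..nfolds up to length nobs, then shuffle them, so that
# fold sizes differ by at most one observation.
foldid <- sample(rep(seq_len(nfolds), length.out = nobs))

table(foldid)
```

Fixing foldid in this way keeps the folds identical across repeated calls, which makes fits with different settings directly comparable.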

knots

the internal knot locations (breakpoints) that define the B-splines. The number of the internal knots should be nsplines-degree-1. If NULL, the locations of knots are chosen as quantiles of distinct failure time points. This choice leads to more stable results in most cases. Users can specify the internal knot locations by themselves.
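The relationship between nsplines, degree and the number of internal knots can be checked directly with splines::bs. The sketch below uses simulated data and places the knots at equally spaced quantiles of the distinct failure times; the exact quantile scheme and the use of intercept = TRUE are assumptions made for illustration, not a statement of what the package does internally.

```r
library(splines)

set.seed(1)
# Hypothetical data: 200 observed times, roughly 70% of them events
time  <- round(rexp(200, rate = 0.1), 2)
event <- rbinom(200, size = 1, prob = 0.7)

nsplines <- 8
degree   <- 3
n_knots  <- nsplines - degree - 1   # 4 internal knots

# Internal knots at equally spaced quantiles of the distinct failure times
# (assumed scheme for this sketch)
fail_times <- sort(unique(time[event == 1]))
probs <- seq_len(n_knots) / (n_knots + 1)
knots <- as.numeric(quantile(fail_times, probs = probs))

# With intercept = TRUE, bs() returns length(knots) + degree + 1 columns
B <- bs(time, knots = knots, degree = degree, intercept = TRUE,
        Boundary.knots = range(time))
ncol(B)
```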

penalty

a character string specifying the spline term for the penalized Newton method. This term is added to the log-partial likelihood, and the penalized log-partial likelihood serves as the new objective function to control the smoothness of the time-varying coefficients. Default is Smooth-spline. The three options are P-spline, Smooth-spline and NULL. If NULL, the method is the same as coxtv (unpenalized time-varying effects models) and lambda (defined above) is set to 0.

P-spline stands for Penalized B-spline. It combines the B-spline basis with a discrete quadratic penalty on the difference of basis coefficients between adjacent knots. When lambda goes to infinity, the time-varying effects are reduced to be constant.

Smooth-spline refers to the Smoothing-spline, the derivative-based penalties combined with B-splines. See degree for different choices. When degree=3, we use the cubic B-spline penalizing the second-order derivative, which reduces the time-varying effect to a linear term when lambda goes to infinity. When degree=2, we use the quadratic B-spline penalizing first-order derivative, which reduces the time-varying effect to a constant when lambda goes to infinity. See Wood (2017) for details.

If penalty is P-spline or Smooth-spline, lambda defaults to the sequence (0.1, 1, 10). Users can modify lambda; see the description of lambda for details.
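The discrete quadratic penalty behind the P-spline option can be sketched in a few lines of base R. The example assumes a first-order difference penalty on nsplines coefficients with no extra scaling; the helper pspline_penalty is hypothetical and is not part of the package.

```r
nsplines <- 8

# First-order difference matrix: row j computes theta[j + 1] - theta[j]
D <- diff(diag(nsplines), differences = 1)

# Quadratic penalty matrix, so that the penalty is lambda * theta' P theta
P <- crossprod(D)

# Hypothetical helper for illustration only
pspline_penalty <- function(theta, lambda) {
  lambda * drop(t(theta) %*% P %*% theta)
}

pspline_penalty(rep(0.3, nsplines), lambda = 10)                # constant coefficients
pspline_penalty(seq(0, 1, length.out = nsplines), lambda = 10)  # varying coefficients
```

Constant coefficients incur no penalty, while any variation between adjacent coefficients is penalized. This is why a very large lambda pushes the fitted time-varying effect toward a constant.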

nsplines

number of basis functions in the splines to span the time-varying effects. The default value is 8. We use the R function splines::bs to generate the B-splines.

ties

a character string specifying the method for tie handling. If there are no tied events, the methods are equivalent. By default "Breslow" uses the Breslow approximation, which can be faster when many ties are present. If ties = "none", no approximation will be used to handle ties.

tol

tolerance used for stopping the algorithm. See details in stop below. The default value is 1e-6.

iter.max

maximum iteration number if the stopping criterion specified by stop is not satisfied. The default value is 20.

method

a character string specifying whether to use the Newton method or the proximal Newton method. If "Newton", the Hessian is used directly. The default, "ProxN", implements the proximal Newton method, which can be faster and more stable when the second-order information of the log-partial likelihood is ill-conditioned. See details in Wu et al. (2022).

gamma

parameter for proximal Newton method "ProxN". The default value is 1e8.

btr

a character string specifying the backtracking line-search approach. "dynamic" is a typical way to perform backtracking line-search. See details in Convex Optimization by Boyd and Vandenberghe (2004). "static" limits Newton's increment and can achieve more stable results in some extreme cases, such as ill-conditioned second-order information of the log-partial likelihood, which usually occurs when some predictors are categorical with low frequency for some categories. Users should be careful with static, as this may lead to under-fitting.

tau

a positive scalar used to control the step size inside the backtracking line-search. The default value is 0.5.
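The role of tau in a backtracking line search can be illustrated with a generic sketch on a toy concave objective. This is a textbook-style search in the spirit of Boyd and Vandenberghe (2004), not the package's internal code; the function backtrack and the sufficient-increase constant alpha are hypothetical. For the step to shrink, tau must lie strictly between 0 and 1.

```r
# Backtracking line search for maximizing f along an ascent direction d.
# alpha (sufficient-increase constant) is a hypothetical choice for this sketch.
backtrack <- function(f, x, grad, d, tau = 0.5, alpha = 0.01, max_shrinks = 50) {
  step_size <- 1
  slope <- sum(grad * d)              # directional derivative, positive for ascent
  for (i in seq_len(max_shrinks)) {
    # Accept the step once the objective increases sufficiently
    if (f(x + step_size * d) >= f(x) + alpha * step_size * slope) break
    step_size <- tau * step_size      # otherwise shrink the step by the factor tau
  }
  step_size
}

# Toy concave objective with its maximum at (2, 2)
f  <- function(x) -sum((x - 2)^4)
x0 <- c(0, 0)
g  <- -4 * (x0 - 2)^3                 # gradient of f at x0

# The full gradient step overshoots, so the search shrinks it repeatedly
step <- backtrack(f, x0, grad = g, d = g, tau = 0.5)
```

A smaller tau shrinks the step more aggressively at each rejection, which needs fewer objective evaluations but may accept a shorter step than necessary.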

stop

a character string specifying the stopping rule used to determine convergence. "incre" stops the algorithm when Newton's increment is less than tol. See details in Convex Optimization (Chapter 10) by Boyd and Vandenberghe (2004). "relch" stops the algorithm when \((loglik(m)-loglik(m-1))/(loglik(m))\) is less than tol, where \(loglik(m)\) denotes the log-partial likelihood at iteration step m. "ratch" stops the algorithm when \((loglik(m)-loglik(m-1))/(loglik(m)-loglik(0))\) is less than tol. "all" stops the algorithm when all the stopping rules ("incre", "relch", "ratch") are met. The default value is "ratch". If iter.max is reached, the algorithm terminates regardless of the stopping rule.
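The "relch" and "ratch" ratios can be computed from a log-partial likelihood trajectory as sketched below. The helper check_stop and the trajectory values are hypothetical, and taking absolute values of the ratios is an assumption of this sketch ("incre" is omitted because it requires Newton's increment).

```r
# Hypothetical helper: loglik[1] is the value at iteration 0 and
# loglik[length(loglik)] is the value at the current iteration m.
check_stop <- function(loglik, tol = 1e-6) {
  m <- length(loglik)
  change <- loglik[m] - loglik[m - 1]
  c(
    relch = abs(change / loglik[m]) < tol,                 # relative change
    ratch = abs(change / (loglik[m] - loglik[1])) < tol    # ratio to total gain
  )
}

# Made-up trajectories for illustration
early <- c(-500, -420, -401)
late  <- c(-500, -420, -401, -400.2, -400.2000001)

check_stop(early)
check_stop(late)
```

The "ratch" rule scales the latest improvement by the total improvement since iteration 0, so it does not depend on the absolute magnitude of the log-partial likelihood.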

parallel

if TRUE, then the parallel computation is enabled. The number of threads in use is determined by threads.

threads

an integer indicating the number of threads to be used for parallel computation. The default value is 2. If parallel is false, then the value of threads has no effect.

degree

degree of the piecewise polynomial for generating the B-spline basis functions. The default is 3, which gives cubic splines; degree = 2 gives quadratic B-spline basis functions.

If penalty is P-spline or NULL, degree's default value is 3.

If penalty is Smooth-spline, degree's default value is 2.

fixedstep

if TRUE, the algorithm will be forced to run iter.max steps regardless of the stopping criterion specified.

Value

An object of class "cv.coxtp" is returned, which is a list with the components of the cross-validation fit.

model.cv: a "coxtp" object with tuning parameter chosen based on cross-validation.
lambda: the values of lambda used in the fits.
cve: the mean cross-validated error, a vector with the same length as lambda. For the k-th testing fold (k = 1, ..., nfolds), the remaining folds serve as the training folds. Using the model trained on the training folds, the log-partial likelihood is calculated on all the folds, \(loglik0\), and on the training folds, \(loglik1\). The cve is equal to \(-2*(loglik0 - loglik1)\). See details in Verweij and Van Houwelingen (1993). This approach avoids constructing a partial likelihood on the test set, so the risk set is always sufficiently large.
lambda.min: the value of lambda that gives minimum cve.
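The relationship between cve and lambda.min can be sketched as follows. All log-partial likelihood values are made up for illustration, and averaging the per-fold contributions over folds is an assumption of this sketch.

```r
lambda <- c(0.1, 1, 10)

# Hypothetical log-partial likelihoods: one row per fold, one column per lambda.
# These numbers are invented for illustration only.
loglik0 <- rbind(c(-512.4, -510.9, -512.8),   # evaluated on all folds
                 c(-513.0, -511.2, -513.5),
                 c(-511.7, -510.5, -512.9))
loglik1 <- rbind(c(-340.1, -340.6, -341.9),   # evaluated on the training folds
                 c(-341.5, -341.8, -343.0),
                 c(-339.8, -340.3, -341.6))

# Per-fold contribution -2 * (loglik0 - loglik1), averaged over folds (assumed)
cve_fold <- -2 * (loglik0 - loglik1)
cve <- colMeans(cve_fold)

# lambda.min is the candidate with the smallest cross-validated error
lambda.min <- lambda[which.min(cve)]
```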

Details

The function runs coxtp length(lambda) times nfolds times in total: for each value of lambda, the model is fitted once with each of the folds omitted.

References

Boyd, S., and Vandenberghe, L. (2004) Convex optimization. Cambridge University Press.

Gray, R. J. (1992) Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal of the American Statistical Association, 87(420): 942-951.

Gray, R. J. (1994) Spline-based tests in survival analysis. Biometrics, 50(3): 640-652.

Luo, L., He, K., Wu, W., and Taylor, J. M. (2023) Using information criteria to select smoothing parameters when analyzing survival data with time-varying coefficient hazard models. Statistical Methods in Medical Research, in press.

Perperoglou, A., le Cessie, S., and van Houwelingen, H. C. (2006) A fast routine for fitting Cox models with time varying effects of the covariates. Computer Methods and Programs in Biomedicine, 81(2): 154-161.

Verweij, P. J., and Van Houwelingen, H. C. (1993) Cross‐validation in survival analysis. Statistics in Medicine, 12(24): 2305-2314.

Wu, W., Taylor, J. M., Brouwer, A. F., Luo, L., Kang, J., Jiang, H., and He, K. (2022) Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients. Lifetime Data Analysis, 28(2): 194-218.

Wood, S. N. (2017) P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Statistics and Computing, 27(4): 985-989.

Examples

if (FALSE) {
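data(ExampleData)
z <- ExampleData$z
time <- ExampleData$time
event <- ExampleData$event
# candidate tuning parameters to compare by 5-fold cross-validation
lambda <- c(0.1, 1)
fit <- cv.coxtp(event = event, z = z, time = time, lambda = lambda, nfolds = 5)
# value of lambda with the smallest cross-validated error
fit$lambda.min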
}