Fit a penalized discrete survival model

Main function for fitting a penalized discrete survival model

Usage

pp.DiscSurv(
  data,
  Event.char,
  prov.char,
  Z.char,
  Time.char,
  lambda,
  nlambda = 100,
  lambda.min.ratio = 1e-04,
  penalize.x = rep(1, length(Z.char)),
  penalized.multiplier,
  lambda.early.stop = FALSE,
  nvar.max = p,
  stop.dev.ratio = 0.001,
  bound = 10,
  backtrack = FALSE,
  tol = 1e-04,
  max.each.iter = 10000,
  max.total.iter = (max.each.iter * nlambda),
  actSet = TRUE,
  actIter = max.each.iter,
  actVarNum = sum(penalize.x == 1),
  actSetRemove = FALSE,
  returnX = FALSE,
  trace.lambda = FALSE,
  threads = 1,
  MM = FALSE,
  return.transform.data = FALSE,
  ...
)

Arguments

data: a data frame or list object that contains the variables in the model.
Event.char: name of the event indicator in data as a character string. Event indicator should be a binary variable with 1 indicating that the event has occurred and 0 indicating (right) censoring.
prov.char: name of the provider ID variable in data as a character string.
Z.char: names of covariates in data as a vector of character strings.
Time.char: name of the follow-up time variable in data as a character string.
lambda: a user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio.
nlambda: the number of lambda values. Default is 100.
lambda.min.ratio: the smallest value of lambda, expressed as a fraction of lambda.max (the smallest lambda for which all coefficients are zero); the sequence is generated on the log scale. Default is 1e-04.
penalize.x: a vector indicating whether each covariate is penalized. A value of 0 leaves the corresponding covariate unpenalized; any other value penalizes it. Default is a vector of 1's (all covariates are penalized).
penalized.multiplier: a vector of multiplicative factors applied to each covariate's penalty. Default is a vector of 1's.
lambda.early.stop: whether to stop before fitting the entire lambda sequence. Early stopping is based on the ratio of deviances for models under two successive lambda values. Default is FALSE.
nvar.max: maximum number of selected variables. Default is the number of all covariates.
stop.dev.ratio: if lambda.early.stop = TRUE, the ratio of deviance for early stopping. Default is 1e-3.
bound: a positive number to avoid inflation of provider effect. Default is 10.
backtrack: whether to use backtracking line search with the Newton method when updating the provider effects. Default is FALSE.
tol: convergence threshold. For each lambda, the program stops when the maximum change in any covariate coefficient is smaller than tol. Default is 1e-4.
max.each.iter: maximum number of iterations for each lambda. Default is 1e4.
max.total.iter: maximum number of iterations for entire path. Default is max.each.iter * nlambda.
actSet: whether to use the active set method for variable selection. Default is TRUE.
actIter: if actSet = TRUE, the maximum number of iterations for a newly updated active set. Default is max.each.iter (i.e., the current active set is updated until convergence).
actSetRemove: if actSet = TRUE, whether to remove the zero coefficients from the current active set. Default is FALSE.
returnX: whether to return the standardized design matrix. Default is FALSE.
trace.lambda: whether to display progress while fitting the entire path. Default is FALSE.
threads: number of cores used for parallel computing. Default is 1.
MM: whether to use the "Majorize-Minimization" algorithm to optimize the objective function. Default is FALSE.
...: extra arguments to be passed to function.
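
As a rough illustration of how nlambda and lambda.min.ratio could define a log-scale lambda sequence, the base R sketch below builds such a grid. It is an assumption about the general construction, not the package's internal code; the value of lambda.max here is a hypothetical placeholder (the function derives it from the data), and lambda.seq is a name introduced only for this example.

```r
# Hypothetical lambda.max for illustration; pp.DiscSurv computes it from the data.
lambda.max <- 0.1601
nlambda <- 100
lambda.min.ratio <- 1e-04

# Equally spaced on the log scale, from lambda.max down to
# lambda.min.ratio * lambda.max.
lambda.seq <- exp(seq(log(lambda.max),
                      log(lambda.min.ratio * lambda.max),
                      length.out = nlambda))

# Successive values share a constant ratio on such a grid.
ratios <- lambda.seq[-1] / lambda.seq[-nlambda]
range(ratios)
```

On a grid like this, each lambda is a fixed fraction of the previous one, so smaller nlambda values give a coarser path over the same range.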

Value

An object of S3 class ppDiscSurv with the following components:

beta: the fitted matrix of covariate coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.
alpha: the fitted values of the logit-transformed baseline hazard.
gamma: the fitted values of the provider effects. The first provider is the reference group, so its effect is set to zero.
lambda: the sequence of lambda values in the path.
df: the estimated effective number of selected variables at each point along the regularization path.
iter: the number of iterations until convergence at each value of lambda.
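
To make the layout of beta concrete, the sketch below rebuilds the first five columns printed in the Examples section and counts the nonzero coefficients per lambda. Treating that count as comparable to df is an assumption made here for illustration; the package may compute df differently. The object name nonzero is introduced only for this example.

```r
# First five columns of fit$beta, copied from the Examples output:
# one row per covariate, one column per lambda value.
beta <- rbind(
  Z1 = c(0, -0.1735757, -0.3325774, -0.4793679, -0.6155647),
  Z2 = rep(0, 5),
  Z3 = rep(0, 5),
  Z4 = rep(0, 5),
  Z5 = rep(0, 5)
)
colnames(beta) <- c("0.1601", "0.1458", "0.1329", "0.1211", "0.1103")

# Number of nonzero coefficients at each lambda (assumed to be comparable to df).
nonzero <- colSums(beta != 0)
nonzero
```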

Details

The model is fit using Newton's method and coordinate descent.
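
For readers unfamiliar with coordinate descent, the sketch below shows the idea on a simpler problem: a lasso-penalized least-squares fit, where each coefficient is updated in turn by soft-thresholding. This is not the package's algorithm; the lasso penalty, the least-squares loss, and the names soft.threshold and cd.lasso are all assumptions introduced for illustration. The stopping rule mirrors the tol argument described above (maximum coefficient change in one sweep).

```r
# Soft-thresholding operator: shrinks z toward zero by lambda.
soft.threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Cyclic coordinate descent for
#   (1 / (2n)) * sum((y - X %*% beta)^2) + lambda * sum(abs(beta))
cd.lasso <- function(X, y, lambda, tol = 1e-04, max.iter = 10000) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  r <- y                      # residuals; equal to y because beta starts at zero
  for (iter in seq_len(max.iter)) {
    max.change <- 0
    for (j in seq_len(ncol(X))) {
      xj <- X[, j]
      vj <- sum(xj^2) / n
      # Correlation of x_j with the partial residual (x_j's own effect added back)
      zj <- sum(xj * r) / n + beta[j] * vj
      bj <- soft.threshold(zj, lambda) / vj
      r <- r - xj * (bj - beta[j])   # keep residuals in sync with beta
      max.change <- max(max.change, abs(bj - beta[j]))
      beta[j] <- bj
    }
    if (max.change < tol) break      # converged: no coefficient moved by tol or more
  }
  beta
}

soft.threshold(c(-3, 0.5, 2), 1)
#> [1] -2  0  1
```

A larger lambda zeroes out more coefficients in each sweep, which is why a decreasing lambda path starts from an all-zero solution and gradually admits covariates.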

References

K. He, J. Kalbfleisch, Y. Li, et al. (2013) Evaluating hospital readmission rates in dialysis facilities; adjusting for hospital effects. Lifetime Data Analysis, 19: 490-512.

Examples

data(DiscTime)
data <- DiscTime$data
Event.char <- DiscTime$Event.char
prov.char <- DiscTime$prov.char
Z.char <- DiscTime$Z.char
Time.char <- DiscTime$Time.char
fit <- pp.DiscSurv(data, Event.char, prov.char, Z.char, Time.char)
fit$beta[, 1:5]
#>    0.1601     0.1458     0.1329     0.1211     0.1103
#> Z1      0 -0.1735757 -0.3325774 -0.4793679 -0.6155647
#> Z2      0  0.0000000  0.0000000  0.0000000  0.0000000
#> Z3      0  0.0000000  0.0000000  0.0000000  0.0000000
#> Z4      0  0.0000000  0.0000000  0.0000000  0.0000000
#> Z5      0  0.0000000  0.0000000  0.0000000  0.0000000
fit$alpha[, 1:5]
#>                 0.1601    0.1458    0.1329    0.1211    0.1103
#> [Time: 0.53] -1.668065 -1.747753 -1.824209 -1.898663 -1.971909
#> [Time: 1.03] -1.480994 -1.526815 -1.571328 -1.615488 -1.659886
#> [Time: 3.92] -1.622145 -1.641556 -1.661848 -1.683563 -1.706991
#> [Time: 6.74] -1.251585 -1.257340 -1.264124 -1.272503 -1.282795
#> [Time: 12.5] -1.781635 -1.780095 -1.780526 -1.783340 -1.788771
fit$gamma[, 1:5] # effect of the first provider is set to zero
#>       0.1601     0.1458    0.1329     0.1211     0.1103
#> 1  0.0000000  0.0000000  0.000000  0.0000000  0.0000000
#> 2 -4.5665147 -4.3516946 -4.162992 -3.9945779 -3.8425705
#> 3 -0.7478311 -0.7116142 -0.682481 -0.6580980 -0.6367834
#> 4  1.2412090  1.1456446  1.059286  0.9815198  0.9121089
#> 5 -2.3399410 -2.1966372 -2.070536 -1.9577270 -1.8555876