Fit a group penalized generalized regression model

Main function for fitting a group penalized generalized regression model

Usage

grp.lasso(
  data,
  Y.char,
  Z.char,
  prov.char,
  group = 1:length(Z.char),
  group.multiplier,
  standardize = T,
  lambda,
  nlambda = 100,
  lambda.min.ratio = 1e-04,
  lambda.early.stop = FALSE,
  nvar.max = p,
  group.max = length(unique(group)),
  stop.dev.ratio = 0.001,
  bound = 10,
  backtrack = FALSE,
  tol = 1e-04,
  max.each.iter = 10000,
  max.total.iter = (max.each.iter * nlambda),
  actSet = TRUE,
  actIter = max.each.iter,
  actGroupNum = sum(unique(group) != 0),
  actSetRemove = F,
  returnX = FALSE,
  trace.lambda = FALSE,
  threads = 1,
  ...
)

Arguments

data: a data frame or list object that contains the variables in the model.
Y.char: name of the response variable in data as a character string.
Z.char: names of the covariates in data as a vector of character strings.
prov.char: name of the provider ID variable in data as a character string. If prov.char is not specified, all observations are considered to be from the same provider.
group: a vector describing the grouping of the coefficients. If there are coefficients to be included in the model without being penalized, assign them to group 0 (or "0").
group.multiplier: a vector of multiplicative factors by which each covariate's penalty is multiplied. Default is a vector of 1's.
standardize: logical flag for standardizing the covariates before fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize = TRUE.
lambda: a user-supplied lambda sequence. In typical usage the program computes its own lambda sequence based on nlambda and lambda.min.ratio.
nlambda: the number of lambda values. Default is 100.
lambda.min.ratio: the smallest value of lambda, expressed as a fraction of lambda.max (the smallest lambda for which all coefficients are zero). The lambda sequence is spaced on the log scale. Default is 1e-04.
lambda.early.stop: whether the program stops before running the entire sequence of lambda. Early stopping is based on the ratio of deviance for models under two successive lambda values. Default is FALSE.
nvar.max: maximum number of selected variables. Default is the number of all covariates.
group.max: maximum number of selected groups. Default is the number of all groups.
stop.dev.ratio: if lambda.early.stop = TRUE, the deviance ratio used as the early stopping threshold. Default is 1e-3.
bound: a positive number used to bound the provider effects and prevent them from inflating. Default is 10.
backtrack: whether to use backtracking line search with the Newton method when updating the provider effects. Default is FALSE.
tol: convergence threshold. For each lambda, the program stops when the maximum change in the covariate coefficients is smaller than tol. Default is 1e-4.
max.each.iter: maximum number of iterations for each lambda. Default is 1e4.
max.total.iter: maximum number of iterations for entire path. Default is max.each.iter * nlambda.
actSet: whether to use the active set method for variable selection. Default is TRUE.
actIter: if actSet = TRUE, the maximum number of iterations for a newly updated active set. Default is max.each.iter (i.e. the current active set is updated until convergence).
actGroupNum: if actSet = TRUE, the maximum number of variables that can be added to the active set each time it is updated. Default is the number of groups.
actSetRemove: if actSet = TRUE, whether to remove the zero coefficients from the current active set. Default is FALSE.
returnX: whether to return the standardized design matrix. Default is FALSE.
trace.lambda: whether to display the progress of fitting the entire path. Default is FALSE.
threads: number of cores used for parallel computing. Default is 1.
...: extra arguments to be passed to the function.
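
The way nlambda and lambda.min.ratio jointly determine an automatically generated lambda sequence can be sketched in base R. This is only an illustration, assuming a grid that is equally spaced on the log scale between lambda.max and lambda.min.ratio * lambda.max; make_lambda_path is a hypothetical helper, not a function of the package, and the value 0.0939 is simply the first lambda printed in the Examples section.

```r
# Hypothetical helper (not part of the package): build a lambda path that is
# equally spaced on the log scale, from lambda.max down to
# lambda.min.ratio * lambda.max.
make_lambda_path <- function(lambda.max, nlambda = 100, lambda.min.ratio = 1e-04) {
  stopifnot(lambda.max > 0, nlambda >= 2,
            lambda.min.ratio > 0, lambda.min.ratio < 1)
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

lambda_path <- make_lambda_path(lambda.max = 0.0939)
length(lambda_path)                  # number of lambda values (nlambda)
lambda_path[1]                       # largest lambda, equal to lambda.max
lambda_path[100] / lambda_path[1]    # approximately lambda.min.ratio
round(lambda_path[1:5], 4)           # first few values of the decreasing path
```

Under this assumption consecutive values differ by a constant factor of about 0.911 with the default settings, which agrees with the column headers printed in the Examples section.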

Value

An object with S3 class gr_ppLasso.

beta: the fitted matrix of covariate coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.
gamma: the fitted value of provider effects.
group: a vector describing the grouping of the coefficients.
lambda: the sequence of lambda values in the path.
loss: the loss of the fitted model at each value of lambda.
linear.predictors: the linear predictors of the fitted model at each value of lambda.
df: the estimated effective number of selected variables at each point along the regularization path.
iter: the number of iterations until convergence at each value of lambda.
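
The components listed above can be summarized with a few lines of base R. The sketch below uses mock objects with the documented shapes (a coefficient matrix with one column per lambda and a group vector in which 0 marks unpenalized coefficients); all values are made up for illustration and are not output of grp.lasso.

```r
# Mock objects mimicking the documented shapes; values are invented.
group <- c(0, 1, 1, 2, 2)              # group 0 = unpenalized coefficient
beta <- matrix(
  c(0.5, 0,   0,   0,    0,            # column 1: largest lambda
    0.5, 0.1, 0.2, 0,    0,            # column 2
    0.5, 0.3, 0.4, -0.1, 0.2),         # column 3: smallest lambda
  nrow = 5,
  dimnames = list(paste0("Z", 1:5), c("0.3", "0.2", "0.1"))
)

# Number of nonzero coefficients at each lambda.
n_nonzero <- colSums(beta != 0)

# Number of penalized groups with at least one nonzero coefficient.
n_groups <- apply(beta, 2, function(b) {
  length(unique(group[b != 0 & group != 0]))
})

n_nonzero
n_groups
```

With a real fit, the same expressions could be applied to fit$beta and fit$group.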

Details

The model is fit using Newton's method together with coordinate descent.
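
For intuition about the coordinate descent part, the following sketch shows group-wise soft-thresholding, the standard building block of group lasso coordinate descent updates. It is a generic illustration under that assumption, not the package's internal code; group_soft_threshold is a hypothetical name.

```r
# Hypothetical helper (not part of the package): group soft-thresholding.
# Shrinks the Euclidean norm of z by `threshold`, and sets the whole group
# to zero when the norm does not exceed the threshold.
group_soft_threshold <- function(z, threshold) {
  z_norm <- sqrt(sum(z^2))
  if (z_norm <= threshold) {
    return(rep(0, length(z)))
  }
  (1 - threshold / z_norm) * z
}

# A group with norm 5 is shrunk to norm 4 but keeps its direction.
kept <- group_soft_threshold(c(3, 4), threshold = 1)

# A group with norm 0.5 is removed entirely.
dropped <- group_soft_threshold(c(0.3, 0.4), threshold = 1)

kept
dropped
```

Because the operator acts on the whole group at once, the coefficients of a group enter or leave the model together, which is the behavior visible along the lambda path.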

References

K. He, J. Kalbfleisch, Y. Li, et al. (2013) Evaluating hospital readmission rates in dialysis facilities; adjusting for hospital effects. Lifetime Data Analysis, 19: 490-512.

Examples

data(BinaryData)
data <- BinaryData$data
Y.char <- BinaryData$Y.char
prov.char <- BinaryData$prov.char
Z.char <- BinaryData$Z.char
group <- BinaryData$group
fit <- grp.lasso(data, Y.char, Z.char, prov.char, group = group)
# fitted values of covariate coefficients (under the lambda sequence that was automatically generated by the package).
round(fit$beta[1:5, 1:5], 5)
#>    0.0939   0.0856    0.078    0.071   0.0647
#> Z1      0  0.00000  0.00000 -0.00333 -0.05627
#> Z2      0  0.00000  0.00000  0.00388  0.06351
#> Z3      0  0.11268  0.21468  0.30730  0.38978
#> Z4      0 -0.01162 -0.01402 -0.00961 -0.00191
#> Z5      0  0.00000  0.00000  0.00000  0.00000
# estimated provider effects
round(fit$gamma[1:5, 1:5], 5)
#>     0.0939   0.0856    0.078    0.071   0.0647
#> 1 -0.24117 -0.28906 -0.33666 -0.38433 -0.44264
#> 2 -1.96359 -1.88178 -1.80802 -1.74141 -1.68328
#> 3 -1.20895 -1.19072 -1.17424 -1.15896 -1.13944
#> 4 -1.96008 -1.89903 -1.84185 -1.78821 -1.73552
#> 5 -0.55005 -0.56817 -0.58719 -0.60666 -0.62532