Main function for fitting a group penalized generalized regression model

Usage

grp.lasso(
  data,
  Y.char,
  Z.char,
  prov.char,
  group = 1:length(Z.char),
  group.multiplier,
  standardize = T,
  lambda,
  nlambda = 100,
  lambda.min.ratio = 1e-04,
  lambda.early.stop = FALSE,
  nvar.max = p,
  group.max = length(unique(group)),
  stop.dev.ratio = 0.001,
  bound = 10,
  backtrack = FALSE,
  tol = 1e-04,
  max.each.iter = 10000,
  max.total.iter = (max.each.iter * nlambda),
  actSet = TRUE,
  actIter = max.each.iter,
  actGroupNum = sum(unique(group) != 0),
  actSetRemove = F,
  returnX = FALSE,
  trace.lambda = FALSE,
  threads = 1,
  ...
)

Arguments

data

a data frame or list object that contains the variables in the model.

Y.char

name of the response variable in data as a character string.

Z.char

names of covariates in data as a vector of character strings.

prov.char

name of the provider ID variable in data as a character string. If prov.char is not specified, all observations are considered to be from the same provider.

group

a vector describing the grouping of the coefficients. If there are coefficients to be included in the model without being penalized, assign them to group 0 (or "0").

group.multiplier

A vector of values representing multiplicative factors by which each covariate's penalty is to be multiplied. Default is a vector of 1's.

standardize

logical flag for standardizing the covariates prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize = TRUE.

lambda

a user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio.

nlambda

the number of lambda values. Default is 100.

lambda.min.ratio

the smallest value of lambda, expressed as a fraction of lambda.max (the smallest lambda for which all coefficients are zero), on the log scale. Default is 1e-04.

lambda.early.stop

whether the program should stop before running the entire sequence of lambda values. Early stopping is based on the ratio of deviance for models under two successive lambda values. Default is FALSE.

nvar.max

maximum number of selected variables. Default is the number of all covariates.

group.max

maximum number of selected groups. Default is the number of all groups.

stop.dev.ratio

if lambda.early.stop = TRUE, the ratio of deviance for early stopping. Default is 1e-3.

bound

a positive number to avoid inflation of provider effect. Default is 10.

backtrack

whether to use backtracking line search with the Newton method when updating the provider effects. Default is FALSE.

tol

convergence threshold. For each lambda, the program will stop if the maximum change in the covariate coefficients is smaller than tol. Default is 1e-4.

max.each.iter

maximum number of iterations for each lambda. Default is 1e4.

max.total.iter

maximum number of iterations for the entire path. Default is max.each.iter * nlambda.

actSet

whether to use the active set method for variable selection. Default is TRUE.

actIter

if actSet = TRUE, the maximum number of iterations for a newly updated active set. Default is max.each.iter (i.e., the current active set is updated until convergence).

actGroupNum

if actSet = TRUE, the maximum number of variables that can be added to the new active set each time the active set is updated. Default is the number of penalized groups (groups other than 0).

actSetRemove

if actSet = TRUE, whether to remove the zero coefficients from the current active set. Default is FALSE.

returnX

whether to return the standardized design matrix. Default is FALSE.

trace.lambda

whether to display the progress of fitting the entire path. Default is FALSE.

threads

number of cores used for parallel computing. Default is 1.

...

extra arguments to be passed to the function.
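
Several defaults in the usage above are derived from other arguments. The following base R sketch evaluates those expressions for a hypothetical grouping and shows one common way to build a log-spaced lambda sequence from nlambda and lambda.min.ratio. The group vector and the value of lambda.max are made up for illustration, and the sequence construction is an assumption about a typical implementation, not necessarily what grp.lasso does internally.

```r
# Hypothetical grouping for six covariates: the first is unpenalized (group 0),
# the remaining five fall into two penalized groups.
group <- c(0, 1, 1, 2, 2, 2)

# Default expressions copied from the usage signature.
group.max   <- length(unique(group))    # counts group 0 as well
actGroupNum <- sum(unique(group) != 0)  # penalized groups only

nlambda          <- 100
lambda.min.ratio <- 1e-04
max.each.iter    <- 10000
max.total.iter   <- max.each.iter * nlambda

# Assumed construction of the lambda path: nlambda values, equally spaced on
# the log scale, from lambda.max down to lambda.min.ratio * lambda.max.
lambda.max <- 0.1  # illustrative value only
lambda.seq <- exp(seq(log(lambda.max),
                      log(lambda.min.ratio * lambda.max),
                      length.out = nlambda))

print(c(group.max = group.max, actGroupNum = actGroupNum))
print(range(lambda.seq))
```

Under this construction, consecutive lambda values differ by a constant ratio, so a smaller lambda.min.ratio or a smaller nlambda makes the steps between successive models coarser.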

Value

An object with S3 class gr_ppLasso.

beta

the fitted matrix of covariate coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.

gamma

the fitted values of the provider effects.

group

a vector describing the grouping of the coefficients.

lambda

the sequence of lambda values in the path.

loss

the loss of the fitted model at each value of lambda.

linear.predictors

the linear predictors of the fitted model at each value of lambda.

df

the estimated effective number of selected variables at each point along the regularization path.

iter

the number of iterations until convergence at each value of lambda.

Details

The model is fit using the Newton method and coordinate descent.
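
The stopping rule described under tol in Arguments can be sketched as follows. This is a minimal illustration of the criterion only, with made-up coefficient vectors; has_converged is a hypothetical helper, not a function exported by the package.

```r
# Hypothetical helper: TRUE when the largest absolute change in the
# covariate coefficients between two iterations is below tol.
has_converged <- function(beta.old, beta.new, tol = 1e-04) {
  max(abs(beta.new - beta.old)) < tol
}

# Made-up coefficient vector from a previous iteration.
beta.old <- c(0.11268, -0.01162, 0)

small.step <- has_converged(beta.old, beta.old + 5e-05)  # change below tol
large.step <- has_converged(beta.old, beta.old + 1e-03)  # change above tol
print(c(small.step = small.step, large.step = large.step))
```

A looser tol ends the iterations for each lambda sooner at the cost of less precise coefficients, while max.each.iter caps the iterations regardless of whether the criterion is met.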

References

K. He, J. Kalbfleisch, Y. Li, et al. (2013) Evaluating hospital readmission rates in dialysis facilities; adjusting for hospital effects. Lifetime Data Analysis, 19: 490-512.

See also

coef, plot function.

Examples

data(BinaryData)
data <- BinaryData$data
Y.char <- BinaryData$Y.char
prov.char <- BinaryData$prov.char
Z.char <- BinaryData$Z.char
group <- BinaryData$group
fit <- grp.lasso(data, Y.char, Z.char, prov.char, group = group)
# fitted covariate coefficients (under the lambda sequence automatically generated by the package)
round(fit$beta[1:5, 1:5], 5)
#>    0.0939   0.0856    0.078    0.071   0.0647
#> Z1      0  0.00000  0.00000 -0.00333 -0.05627
#> Z2      0  0.00000  0.00000  0.00388  0.06351
#> Z3      0  0.11268  0.21468  0.30730  0.38978
#> Z4      0 -0.01162 -0.01402 -0.00961 -0.00191
#> Z5      0  0.00000  0.00000  0.00000  0.00000
# estimated provider effects
round(fit$gamma[1:5, 1:5], 5)
#>     0.0939   0.0856    0.078    0.071   0.0647
#> 1 -0.24117 -0.28906 -0.33666 -0.38433 -0.44264
#> 2 -1.96359 -1.88178 -1.80802 -1.74141 -1.68328
#> 3 -1.20895 -1.19072 -1.17424 -1.15896 -1.13944
#> 4 -1.96008 -1.89903 -1.84185 -1.78821 -1.73552
#> 5 -0.55005 -0.56817 -0.58719 -0.60666 -0.62532
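
To see which covariates are selected at each lambda, one can count the nonzero entries in each column of beta. The sketch below rebuilds only the 5 x 5 block of coefficients printed above rather than calling grp.lasso, so it runs without the package; with a real fit, fit$beta would take the place of beta. Counting nonzero coefficients is a simple stand-in here and may differ from the df component of the fitted object.

```r
# Coefficients copied from the printed output above (rows Z1..Z5,
# columns labelled by the first five lambda values).
beta <- matrix(
  c(0,  0.00000,  0.00000, -0.00333, -0.05627,
    0,  0.00000,  0.00000,  0.00388,  0.06351,
    0,  0.11268,  0.21468,  0.30730,  0.38978,
    0, -0.01162, -0.01402, -0.00961, -0.00191,
    0,  0.00000,  0.00000,  0.00000,  0.00000),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Z", 1:5),
                  c("0.0939", "0.0856", "0.078", "0.071", "0.0647"))
)

# Number of nonzero coefficients at each lambda.
n.nonzero <- colSums(beta != 0)
print(n.nonzero)

# Names of the covariates with nonzero coefficients at the smallest lambda shown.
selected <- rownames(beta)[beta[, "0.0647"] != 0]
print(selected)
```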