Main function for fitting a group penalized generalized regression model
Usage
grp.lasso(
data,
Y.char,
Z.char,
prov.char,
group = 1:length(Z.char),
group.multiplier,
standardize = T,
lambda,
nlambda = 100,
lambda.min.ratio = 1e-04,
lambda.early.stop = FALSE,
nvar.max = p,
group.max = length(unique(group)),
stop.dev.ratio = 0.001,
bound = 10,
backtrack = FALSE,
tol = 1e-04,
max.each.iter = 10000,
max.total.iter = (max.each.iter * nlambda),
actSet = TRUE,
actIter = max.each.iter,
actGroupNum = sum(unique(group) != 0),
actSetRemove = F,
returnX = FALSE,
trace.lambda = FALSE,
threads = 1,
...
)
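Several defaults in the signature above are computed from other arguments. The following base R sketch evaluates those expressions for a small, hypothetical set of covariates; the covariate names and the custom grouping are assumptions made for illustration and are not part of the package data.

```r
# Hypothetical covariate names (illustration only).
Z.char <- c("age", "sex", "race_black", "race_other", "bmi")

# Default grouping from the signature: every covariate is its own group.
group_default <- 1:length(Z.char)

# A custom grouping (assumed for this sketch): age and sex are left
# unpenalized (group 0), the two race indicators share group 1,
# and bmi forms group 2.
group <- c(0, 0, 1, 1, 2)

# Defaults derived from the grouping, copied from the usage signature.
group.max   <- length(unique(group))    # counts every distinct label, including 0
actGroupNum <- sum(unique(group) != 0)  # counts penalized groups only

# Default iteration budget for the whole path.
nlambda        <- 100
max.each.iter  <- 10000
max.total.iter <- max.each.iter * nlambda
```

With this grouping, group.max counts three distinct labels while actGroupNum counts only the two penalized groups, so the two defaults differ whenever group 0 is used.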
Arguments
- data
a data frame or list object that contains the variables in the model.
- Y.char
name of the response variable in data, as a character string.
- Z.char
names of the covariates in data, as a vector of character strings.
- prov.char
name of the provider ID variable in data, as a character string. If prov.char is not specified, all observations are considered to be from the same provider.
- group
a vector describing the grouping of the coefficients. Coefficients that should be included in the model without being penalized are assigned to group 0 (or "0").
- group.multiplier
A vector of values representing multiplicative factors by which each covariate's penalty is to be multiplied. Default is a vector of 1's.
- standardize
logical flag for standardizing the covariates before fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE.
- lambda
a user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio.
- nlambda
the number of lambda values. Default is 100.
- lambda.min.ratio
the smallest value of lambda, as a fraction of lambda.max (the smallest lambda for which all coefficients are zero), on the log scale. Default is 1e-04.
- lambda.early.stop
whether the program may stop before running the entire lambda sequence. Early stopping is based on the ratio of deviance between the models at two successive lambda values. Default is FALSE.
- nvar.max
maximum number of selected variables. Default is the number of all covariates.
- group.max
maximum number of selected groups. Default is the number of all groups.
- stop.dev.ratio
if lambda.early.stop = TRUE, the ratio of deviance used for early stopping. Default is 1e-3.
- bound
a positive number used to avoid inflation of the provider effects. Default is 10.
- backtrack
whether to use backtracking line search with Newton's method when updating the provider effects. Default is FALSE.
- tol
convergence threshold. For each lambda, the program stops when the maximum change in the covariate coefficients is smaller than tol. Default is 1e-4.
- max.each.iter
maximum number of iterations for each lambda. Default is 1e4.
- max.total.iter
maximum number of iterations for the entire path. Default is max.each.iter * nlambda.
- actSet
whether to use the active set method for variable selection. Default is TRUE.
- actIter
if actSet = TRUE, the maximum number of iterations for a newly updated active set. Default is max.each.iter (i.e., the current active set is updated until convergence).
- actGroupNum
if actSet = TRUE, the maximum number of variables that can be added to the active set each time it is updated. Default is the number of groups.
- actSetRemove
if actSet = TRUE, whether to remove zero coefficients from the current active set. Default is FALSE.
- returnX
whether to return the standardized design matrix. Default is FALSE.
- trace.lambda
whether to display progress while fitting the entire path. Default is FALSE.
- threads
number of cores used for parallel computing. Default is 1.
- ...
extra arguments to be passed to function.
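The relationship between nlambda, lambda.min.ratio, and lambda.max can be illustrated with a log-spaced sequence, a common construction for regularization paths. This is a minimal sketch under stated assumptions, not the package's internal code: the value of lambda.max below is a placeholder (in practice it is computed from the data), and log-linear spacing is assumed from the "on log scale" wording above.

```r
# Placeholder for the largest lambda (assumed value; the package
# derives lambda.max from the data).
lambda.max <- 0.0939

# Defaults from the usage signature.
nlambda          <- 100
lambda.min.ratio <- 1e-04

# Decreasing sequence, equally spaced on the log scale, running from
# lambda.max down to lambda.max * lambda.min.ratio.
lambda <- exp(seq(log(lambda.max),
                  log(lambda.max * lambda.min.ratio),
                  length.out = nlambda))

# Successive values differ by a constant multiplicative factor.
step.ratio <- lambda[-1] / lambda[-nlambda]
```

Under this construction each lambda is a fixed fraction of the previous one, so a smaller lambda.min.ratio or a smaller nlambda makes the steps between successive models coarser.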
Value
An object with S3 class gr_ppLasso.
- beta
the fitted matrix of covariate coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.
- gamma
the fitted value of provider effects.
- group
a vector describing the grouping of the coefficients.
- lambda
the sequence of lambda values in the path.
- loss
the loss of the fitted model at each value of lambda.
- linear.predictors
the linear predictors of the fitted model at each value of lambda.
- df
the estimated effective number of selected variables at each point along the regularization path.
- iter
the number of iterations until convergence at each value of lambda.
References
K. He, J. Kalbfleisch, Y. Li, et al. (2013) Evaluating hospital readmission rates in dialysis facilities; adjusting for hospital effects. Lifetime Data Analysis, 19: 490-512.
Examples
data(BinaryData)
data <- BinaryData$data
Y.char <- BinaryData$Y.char
prov.char <- BinaryData$prov.char
Z.char <- BinaryData$Z.char
group <- BinaryData$group
fit <- grp.lasso(data, Y.char, Z.char, prov.char, group = group)
# fitted values of covariate coefficients (under the lambda sequence that was automatically generated by the package).
round(fit$beta[1:5, 1:5], 5)
#>    0.0939   0.0856    0.078    0.071   0.0647
#> Z1      0  0.00000  0.00000 -0.00333 -0.05627
#> Z2      0  0.00000  0.00000  0.00388  0.06351
#> Z3      0  0.11268  0.21468  0.30730  0.38978
#> Z4      0 -0.01162 -0.01402 -0.00961 -0.00191
#> Z5      0  0.00000  0.00000  0.00000  0.00000
# estimated center effects
round(fit$gamma[1:5, 1:5], 5)
#>     0.0939   0.0856    0.078    0.071   0.0647
#> 1 -0.24117 -0.28906 -0.33666 -0.38433 -0.44264
#> 2 -1.96359 -1.88178 -1.80802 -1.74141 -1.68328
#> 3 -1.20895 -1.19072 -1.17424 -1.15896 -1.13944
#> 4 -1.96008 -1.89903 -1.84185 -1.78821 -1.73552
#> 5 -0.55005 -0.56817 -0.58719 -0.60666 -0.62532
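The coefficient path can be summarized without refitting. The sketch below rebuilds the 5 x 5 corner of fit$beta from the printed output above (so it runs without the package) and reports, for each lambda, how many of these five coefficients are non-zero and at which lambda each covariate first enters. The names beta_head, n_nonzero, and entered_at are introduced here for illustration only.

```r
# Values copied from the printed corner of fit$beta above.
beta_head <- matrix(
  c(0,  0.00000,  0.00000, -0.00333, -0.05627,
    0,  0.00000,  0.00000,  0.00388,  0.06351,
    0,  0.11268,  0.21468,  0.30730,  0.38978,
    0, -0.01162, -0.01402, -0.00961, -0.00191,
    0,  0.00000,  0.00000,  0.00000,  0.00000),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Z", 1:5),
                  c("0.0939", "0.0856", "0.078", "0.071", "0.0647"))
)

# Number of non-zero coefficients (among Z1..Z5) at each lambda.
n_nonzero <- colSums(beta_head != 0)

# First lambda (column label) at which each covariate becomes non-zero;
# NA if it stays at zero over these five lambda values.
entered_at <- apply(beta_head != 0, 1, function(nz) {
  if (any(nz)) colnames(beta_head)[which(nz)[1]] else NA_character_
})
```

The same two lines applied to the full fit$beta give the selection profile over the whole path; here only the printed subset is used, so Z5 appears as never selected within the first five lambda values.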