Main function for fitting a penalized generalized regression
Usage
pp.lasso(
  data,
  Y.char,
  Z.char,
  prov.char,
  standardize = T,
  lambda,
  nlambda = 100,
  lambda.min.ratio = 1e-04,
  penalize.x = rep(1, length(Z.char)),
  penalized.multiplier,
  lambda.early.stop = FALSE,
  nvar.max = p,
  stop.dev.ratio = 0.001,
  bound = 10,
  backtrack = FALSE,
  tol = 1e-04,
  max.each.iter = 10000,
  max.total.iter = (max.each.iter * nlambda),
  actSet = TRUE,
  actIter = max.each.iter,
  actVarNum = sum(penalize.x == 1),
  actSetRemove = F,
  returnX = FALSE,
  trace.lambda = FALSE,
  threads = 1,
  MM = FALSE,
  ...
)
Arguments
- data
a dataframe or list object that contains the variables in the model.
- Y.char
name of the response variable in data, as a character string.
- Z.char
names of the covariates in data, as a vector of character strings.
- prov.char
name of the provider ID variable in data, as a character string. If prov.char is not specified, all observations are considered to be from the same provider.
- standardize
logical flag for standardizing the covariates before fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize = TRUE.
- lambda
a user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio.
- nlambda
the number of lambda values. Default is 100.
- lambda.min.ratio
the smallest value of lambda, as a fraction of lambda.max (the smallest lambda for which all coefficients are zero), with the sequence spaced on the log scale. Default is 1e-04.
- penalize.x
a vector indicating whether each covariate is penalized. An entry of 0 leaves the corresponding covariate unpenalized; any other value penalizes it. Default is a vector of 1's (all covariates are penalized).
- penalized.multiplier
a vector of multiplicative factors applied to each covariate's penalty. Default is a vector of 1's.
- lambda.early.stop
whether the program stops before running the entire sequence of lambda. Early stopping is based on the ratio of deviance between the models under two successive lambda values. Default is FALSE.
- nvar.max
maximum number of selected variables. Default is the number of all covariates.
- stop.dev.ratio
if lambda.early.stop = TRUE, the ratio of deviance used for early stopping. Default is 1e-3.
- bound
a positive number used to avoid inflation of the provider effects. Default is 10.
- backtrack
whether to use backtracking line search with Newton's method when updating the provider effects. Default is FALSE.
- tol
convergence threshold. For each lambda, the program stops when the maximum change in the covariate coefficients is smaller than tol. Default is 1e-4.
- max.each.iter
maximum number of iterations for each lambda. Default is 1e4.
- max.total.iter
maximum number of iterations for the entire path. Default is max.each.iter * nlambda.
- actSet
whether to use the active set method for variable selection. Default is TRUE.
- actIter
if actSet = TRUE, the maximum number of iterations for a newly updated active set. Default is max.each.iter (i.e., the current active set is updated until convergence).
- actVarNum
if actSet = TRUE, the maximum number of variables that can be added to the active set each time it is updated. Default is sum(penalize.x == 1), which equals nvar.max when all covariates are penalized.
- actSetRemove
if actSet = TRUE, whether to remove zero coefficients from the current active set. Default is FALSE.
- returnX
whether to return the standardized design matrix. Default is FALSE.
- trace.lambda
whether to display progress while fitting the entire path. Default is FALSE.
- threads
number of cores used for parallel computing. Default is 1.
- MM
whether to use the "Majorize-Minimization" algorithm to optimize the objective function. Default is FALSE.
- ...
extra arguments to be passed to the function.
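As a rough illustration of how nlambda and lambda.min.ratio could determine an automatically generated sequence, the sketch below builds a log-spaced grid in base R. The helper make_lambda_seq is hypothetical and not part of the package, and the value 0.132 is the (rounded) first column name printed in the Examples, used here only as a stand-in for lambda.max. Whether pp.lasso constructs its grid in exactly this way is an assumption.

```r
# Hypothetical helper (not part of the package): a decreasing, log-spaced
# lambda grid running from lambda_max down to lambda_max * lambda_min_ratio.
make_lambda_seq <- function(lambda_max, nlambda = 100, lambda_min_ratio = 1e-04) {
  exp(seq(log(lambda_max),
          log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

# 0.132 is an illustrative stand-in for lambda.max (assumption).
lambda_seq <- make_lambda_seq(0.132)

length(lambda_seq)              # number of grid points
lambda_seq[100] / lambda_seq[1] # ratio of smallest to largest lambda
```

A sequence built this way could be passed through the lambda argument instead of relying on the automatic default.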
Value
An object with S3 class ppLasso.
- beta
the fitted matrix of covariate coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.
- gamma
the fitted values of the provider effects.
- lambda
the sequence of lambda values in the path.
- loss
the loss of the fitted model at each value of lambda.
- linear.predictors
the linear predictors of the fitted model at each value of lambda.
- df
the estimated effective number of selected variables at each point along the regularization path.
- iter
the number of iterations until convergence at each value of lambda.
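To show how the beta component can be read, the sketch below rebuilds the 5 x 5 corner of fit$beta printed in the Examples as a plain matrix and counts the non-zero coefficients at each lambda. The object beta_corner is a hypothetical stand-in built from those printed values; with a real fit, fit$beta would be used directly. The count computed here need not equal fit$df, whose exact definition is not stated on this page.

```r
# Hypothetical stand-in for the top-left corner of fit$beta:
# rows are covariates, columns are labelled by the lambda values.
lambda_vals <- c(0.132, 0.1202, 0.1096, 0.0998, 0.091)
beta_corner <- matrix(
  0, nrow = 5, ncol = 5,
  dimnames = list(paste0("Z", 1:5), as.character(lambda_vals))
)
beta_corner["Z3", ] <- c(0, 0.11203, 0.21488, 0.30988, 0.39805)

# Number of non-zero coefficients at each lambda (one entry per column).
n_nonzero <- colSums(beta_corner != 0)

# Names of the covariates selected at the smallest lambda shown.
selected <- rownames(beta_corner)[beta_corner[, "0.091"] != 0]
```

In this corner only Z3 enters the model, and it does so from the second lambda onward.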
References
K. He, J. Kalbfleisch, Y. Li, et al. (2013) Evaluating hospital readmission rates in dialysis facilities; adjusting for hospital effects. Lifetime Data Analysis, 19: 490-512.
Examples
data(BinaryData)
data <- BinaryData$data
Y.char <- BinaryData$Y.char
prov.char <- BinaryData$prov.char
Z.char <- BinaryData$Z.char
fit <- pp.lasso(data, Y.char, Z.char, prov.char)
# fitted values of covariate coefficients (under the lambda sequence that was automatically generated by the package).
round(fit$beta[1:5, 1:5], 5)
#>    0.132  0.1202  0.1096  0.0998   0.091
#> Z1     0 0.00000 0.00000 0.00000 0.00000
#> Z2     0 0.00000 0.00000 0.00000 0.00000
#> Z3     0 0.11203 0.21488 0.30988 0.39805
#> Z4     0 0.00000 0.00000 0.00000 0.00000
#> Z5     0 0.00000 0.00000 0.00000 0.00000
# estimated provider effects
round(fit$gamma[1:5, 1:5], 5)
#>      0.132   0.1202   0.1096   0.0998    0.091
#> 1 -0.24120 -0.29399 -0.34311 -0.38893 -0.43182
#> 2 -1.96354 -1.87424 -1.79841 -1.73365 -1.67798
#> 3 -1.20894 -1.18831 -1.17129 -1.15721 -1.14552
#> 4 -1.96004 -1.89221 -1.83322 -1.78156 -1.73600
#> 5 -0.55006 -0.56985 -0.58945 -0.60876 -0.62768