Lasso Regression Model with R Code

    Tibshirani (1996) introduces the LASSO (Least Absolute Shrinkage and Selection Operator) model for the selection and shrinkage of parameters. This model is very useful when analyzing large datasets. In this post, we learn how to set up the Lasso model and estimate it using the glmnet R package.

    Tibshirani (1996) introduces the LASSO (Least Absolute Shrinkage and Selection Operator) model for the selection and shrinkage of parameters. The Ridge model is similar in terms of shrinkage, but it has no selection function, because Ridge pushes the coefficients of unimportant variables close to zero but not exactly to zero.

    These regression models are called regularized or penalized regression models. In particular, the Lasso is powerful enough to handle large datasets in which the number of variables runs to 100, 1,000, 10,000 and beyond. The traditional linear regression model cannot deal with this sort of big data, especially when the number of variables exceeds the number of observations.

    Although the linear regression estimator is unbiased, in terms of the bias-variance trade-off the regularized or penalized regressions such as the Lasso and Ridge accept some bias in exchange for a reduction in variance. This means the minimization problem for the latter has two components: the mean squared error and a penalty on the parameters. The l1 penalty of the Lasso makes both variable selection and shrinkage possible, while the l2 penalty of Ridge makes only shrinkage possible.

    Model: Lasso

    For observation index i = 1, 2, …, N and variable index j = 1, 2, …, p, given standardized predictors xij and a demeaned or centered response variable yi, the Lasso model finds the βj that minimize the following objective function.
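
    Written out in the parameterization used by glmnet (the Ridge objective is shown for comparison), the two problems are:

    \min_{\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j| \qquad \text{(Lasso, } \ell_1 \text{ penalty)}

    \min_{\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \frac{\lambda}{2}\sum_{j=1}^{p}\beta_j^2 \qquad \text{(Ridge, } \ell_2 \text{ penalty)}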

    Here, Y is demeaned for the sake of exposition, but this is not required. However, the X variables should be standardized to mean zero and unit variance, because differences in the scale of the variables tend to distribute the penalty unequally across them.

    In the objective above, the first part is the RSS (residual sum of squares) and the second is the penalty term. The penalty term is scaled by the hyperparameter λ, which is set exogenously by the user through manual search or cross-validation.

    When a variable included in the Lasso reduces the RSS only negligibly (say, by 0.000000001), the shrinkage penalty dominates. As a result, the coefficient of this variable is set exactly to zero (Lasso) or pushed close to zero (Ridge).

    Unlike the Ridge objective, which is differentiable and has a closed-form solution, the Lasso objective is convex but not differentiable at zero. It therefore has no closed-form solution in general, and we use the cyclic coordinate descent algorithm. The only exception is the case in which all X variables are orthonormal, but this case is highly unlikely in practice.
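
    To make the algorithm concrete, below is a minimal coordinate descent sketch of the Lasso objective above. The function names soft_threshold and lasso_cd are made up for this illustration; glmnet's actual implementation adds warm starts, active-set cycling and other refinements, so its results will be close but not identical.

        # soft-thresholding operator: shrinks z toward zero by g, exactly zero if |z| <= g
        soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

        # cyclic coordinate descent for (1/(2N))*RSS + lambda*sum(|beta|)
        lasso_cd <- function(X, y, lambda, n_iter = 100) {
            N <- nrow(X); p <- ncol(X)
            b <- rep(0, p)
            for (it in 1:n_iter) {
                for (j in 1:p) {
                    # partial residual leaving out variable j
                    r_j <- y - X[, -j, drop = FALSE] %*% b[-j]
                    # one-dimensional Lasso problem solved by soft-thresholding
                    rho  <- crossprod(X[, j], r_j) / N
                    b[j] <- soft_threshold(rho, lambda) / (crossprod(X[, j]) / N)
                }
            }
            b
        }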

    R code

    Let’s estimate the parameters of the Lasso and Ridge using the glmnet R package, which provides fast computation and useful helper functions.

    As an example, we generate some artificial time series data: let X consist of 20 randomly drawn series (variables) and construct the Y variable from predetermined coefficients and randomly drawn error terms. Some coefficients are set to zero to make the differences among the standard linear, Lasso, and Ridge regressions clear. The R code is as follows.

    #=========================================================================#
    # Financial Econometrics & Derivatives, ML/DL using R, Python, Tensorflow 
    # by Sang-Heon Lee
    #
    # https://shleeai.blogspot.com
    #-------------------------------------------------------------------------#
    # Lasso, Ridge
    #=========================================================================#
     
    library(glmnet)
     
        graphics.off()  # clear all graphs
        rm(list = ls()) # remove all objects from the workspace
        
        N = 500 # number of observations
        p = 20  # number of variables
        
    #--------------------------------------------
    # X variable
    #--------------------------------------------
        X = matrix(rnorm(N*p), ncol=p)
     
    # before standardization
        colMeans(X)    # mean
        apply(X,2,sd)  # standard deviation
     
    # scale : mean = 0, std=1
        X = scale(X)
     
    # after standardization
        colMeans(X)    # mean
        apply(X,2,sd)  # standard deviation
     
    #--------------------------------------------
    # Y variable
    #--------------------------------------------
        beta = c( 0.15, -0.33,  0.25, -0.25, 0.05,rep(0, p/2-5), 
                 -0.25,  0.12, -0.125, rep(0, p/2-3))
     
        # Y variable, standardized Y
        y = X%*%beta + rnorm(N, sd=0.5)
        y = scale(y)
     
    #--------------------------------------------
    # Model
    #--------------------------------------------
        lambda <- 0.01
        
        # standard linear regression without intercept(-1)
        li.eq <- lm(y ~ X-1) 
        
        # lasso
        la.eq <- glmnet(X, y, lambda=lambda, 
                        family="gaussian", 
                        intercept = F, alpha=1) 
        # Ridge
        ri.eq <- glmnet(X, y, lambda=lambda, 
                        family="gaussian", 
                        intercept = F, alpha=0) 
     
    #--------------------------------------------
    # Results (lambda=0.01)
    #--------------------------------------------
        df.comp <- data.frame(
            beta    = beta,
            Linear  = li.eq$coefficients,
            Lasso   = la.eq$beta[,1],
            Ridge   = ri.eq$beta[,1]
        )
        df.comp
        
    #--------------------------------------------
    # Results (lambda=0.1)
    #--------------------------------------------
        lambda <- 0.1
        
        # lasso
        la.eq <- glmnet(X, y, lambda=lambda,
                        family="gaussian",
                        intercept = F, alpha=1) 
        # Ridge
        ri.eq <- glmnet(X, y, lambda=lambda,
                        family="gaussian",
                        intercept = F, alpha=0) 
        
        df.comp <- data.frame(
            beta    = beta,
            Linear  = li.eq$coefficients,
            Lasso   = la.eq$beta[,1],
            Ridge   = ri.eq$beta[,1]
        )
        df.comp
        
    #------------------------------------------------
    # Shrinkage of coefficients 
    # (range of lambda input or no lambda input)
    #------------------------------------------------
        
        # lasso
        la.eq <- glmnet(X, y, family="gaussian", 
                        intercept = F, alpha=1) 
        # Ridge
        ri.eq <- glmnet(X, y, family="gaussian", 
                        intercept = F, alpha=0) 
        # plot both coefficient paths in one window (2 rows, 1 column)
        x11(); par(mfrow=c(2,1))
        matplot(log(la.eq$lambda), t(la.eq$beta),
                type="l", main="Lasso", lwd=2)
        matplot(log(ri.eq$lambda), t(ri.eq$beta),
                type="l", main="Ridge", lwd=2)
        
    #------------------------------------------------    
    # Run cross-validation & select lambda
    #------------------------------------------------
        mod_cv <- cv.glmnet(x=X, y=y, family='gaussian',
                            intercept = F, alpha=1)
        
        # plot(log(mod_cv$lambda), mod_cv$cvm)
        # cvm : The mean cross-validated error 
        #     - a vector of length length(lambda)
        
        # lambda.min : the λ at which 
        # the minimal MSE is achieved.
        
        # lambda.1se : the largest λ at which 
        # the MSE is within one standard error 
        # of the minimal MSE.
        
        x11(); plot(mod_cv) 
        coef(mod_cv, s = c(mod_cv$lambda.min,
                           mod_cv$lambda.1se))
        print(paste(mod_cv$lambda.min,
                    log(mod_cv$lambda.min)))
        print(paste(mod_cv$lambda.1se,
                    log(mod_cv$lambda.1se)))

    Estimation Results

    The following figure shows the true coefficients (β) with which we generated the data, alongside the estimated coefficients of the three regression models.

    The three models deliver broadly similar estimates despite the randomness in the data-generating process. In particular, the Lasso identifies the insignificant or unimportant variables by setting their coefficients exactly to zero, and the variable selection and shrinkage effects grow stronger as λ increases. The following figures show how the estimated coefficients change as the penalty parameter (log(λ)) changes, i.e. the shrinkage path.

    Model Selection

    The most important step in the Lasso boils down to selecting the optimal λ, which is determined through cross-validation. The cv.glmnet() function in glmnet performs this cross-validation over a suitable range of λ. Using its output, we can draw a graph of log(λ) against the MSE (mean squared error).

    In the figure above, the first candidate is the λ at which the minimal MSE is achieved (lambda.min), but this model is likely to retain many variables. The second is the largest λ at which the MSE is within one standard error of the minimum (lambda.1se). The latter is a somewhat heuristic or empirical approach, but it has the merit of reducing the number of variables and is the typical choice. Visual inspection remains a very important tool for understanding the shrinkage process.

    The following result reports the estimated coefficients under lambda.min and lambda.1se, respectively.

    Forecast

    After estimating the parameters of the Lasso regression, we can use the model for prediction. Forecasting uses only the estimated coefficients, not the penalty term; in this respect the Lasso, Ridge, and linear regression models are the same, because the penalty is used only during estimation.
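
    As a sketch, assuming some new predictor data X_new drawn here purely for illustration, forecasts from the cross-validated fit above can be produced with predict(); the manual calculation shows that only the estimated coefficients enter the forecast.

        # new (illustrative) predictor data with the same p columns
        X_new <- matrix(rnorm(10*p), ncol=p)

        # forecast with the cross-validated Lasso at lambda.1se
        y_hat <- predict(mod_cv, newx = X_new, s = "lambda.1se")

        # the same forecast by hand: X_new times the estimated coefficients
        b_hat <- as.numeric(coef(mod_cv, s = "lambda.1se"))[-1]  # drop the (zero) intercept
        y_hat_manual <- X_new %*% b_hat
        all.equal(as.numeric(y_hat), as.numeric(y_hat_manual))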

    Building on this post, a sign-restricted Lasso model will be discussed next. It is important to be able to constrain the signs of coefficients when economic theory or empirical stylized facts call for a specific sign.
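
    One simple route, noted here only as a pointer rather than as the approach of that future post, is glmnet's lower.limits and upper.limits arguments, which bound each coefficient and can therefore impose sign restrictions directly.

        # Lasso with all coefficients restricted to be non-negative
        la.pos <- glmnet(X, y, lambda=0.01, family="gaussian",
                         intercept = F, alpha=1, lower.limits = 0)

        # Lasso with all coefficients restricted to be non-positive
        la.neg <- glmnet(X, y, lambda=0.01, family="gaussian",
                         intercept = F, alpha=1, upper.limits = 0)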

    Tibshirani, Robert (1996). “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B 58(1), 267–288.

    Originally posted on SH Fintech Modeling.

