LASSO vs Ridge Regression in Machine Learning - When to use what?
Apr 12, 2025
When building a regression model, particularly in the context of high-dimensional datasets, one of the key concerns that arises early is the risk of overfitting. It’s a common challenge: the model performs well on training data but generalizes poorly to unseen examples.
To mitigate this, we apply regularization — a set of techniques designed to penalize model complexity and improve generalization. Among the most widely used are LASSO (L1 regularization) and Ridge (L2 regularization). At a glance, they might seem quite similar: both modify the loss function by adding a penalty term; both aim to reduce overfitting; both are staples in the toolbox of any machine learning practitioner.
Yet, when it comes to practical use, they behave very differently. Choosing the right one can significantly impact the performance, interpretability, and stability of your model.
In this article, we’ll explore what sets Ridge and LASSO apart — how each works under the hood, what makes them suited for different problems, and how you can make an informed choice between them in real-world scenarios.
Note: neither LASSO nor Ridge is limited to linear models. But for simplicity, in this newsletter, we will consider the linear form only.
1. What is LASSO regression?
Let’s begin with LASSO — Least Absolute Shrinkage and Selection Operator.
Imagine that we want to fit (i.e., find the weights of) a linear model of the following form:

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p$$

To fit the model and find the weights, LASSO minimizes the following objective function:

$$\min_{w} \; \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} |w_j|$$
Here, λ is a regularization hyperparameter that controls the strength of the penalty. The key aspect is the L1 norm: the sum of the absolute values of the weights.
This penalty has a unique and very useful effect: it encourages sparsity. During optimization, the cost of large weights increases linearly, and this pushes some weights to become exactly zero. In practical terms, this means that LASSO not only regularizes your model but also performs feature selection.
From a geometric perspective, the constraint introduced by the L1 norm forms a diamond-shaped region in weight space. When the loss function is minimized within this constrained region, the corners of the diamond, which align with the coordinate axes, tend to become optimal solutions. These corners correspond to zero values for one or more coefficients.
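For reference, this geometric picture comes from the equivalent constrained formulation of LASSO, where the budget t plays the same role as λ (a smaller t corresponds to a larger λ):

$$\min_{w} \; \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |w_j| \le t$$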
The result is a sparse model: one that automatically selects a subset of the most relevant features and discards the rest by setting their weights to zero.
Here's a more detailed visualization of how weights become zero. From the figure, we see that as we shrink the diamond (i.e., increase λ), we eventually reach a point where the loss contours touch the constraint region at a corner where a weight is exactly zero.
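To make this concrete, here is a minimal sketch using scikit-learn (the dataset and the regularization strength are purely illustrative): on synthetic data where only a handful of features are informative, LASSO sets many coefficients to exactly zero.

```python
# Minimal sketch: LASSO drives some coefficients to exactly zero.
# Assumes scikit-learn and NumPy; data and alpha are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 100 samples, 20 features, only 5 of them informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=42)

lasso = Lasso(alpha=1.0)  # alpha plays the role of the lambda above
lasso.fit(X, y)

print("Non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
print("Zero coefficients:    ", int(np.sum(lasso.coef_ == 0)))
```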
2. What is Ridge Regression?
Now let's turn to Ridge regression, which uses the L2 norm (the sum of squared weight values) as its penalty. To fit the model, Ridge minimizes:

$$\min_{w} \; \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} w_j^2$$
Unlike LASSO, Ridge penalizes the square of the weights rather than their absolute values. The effect of this difference is subtle but important.
While Ridge also reduces the magnitude of weights — thereby controlling model complexity — it does not drive them to zero. Instead, it shrinks them smoothly, meaning that all features remain in the model, although with potentially reduced influence.
Geometrically, the L2 penalty creates a circular constraint region (or a hypersphere in higher dimensions). This shape lacks the sharp corners present in LASSO's diamond-shaped region. As a result, the constrained optimum typically lies on the smooth boundary of the sphere rather than on one of the coordinate axes, so no weights are eliminated.
This behavior makes Ridge regression especially useful when:
- You believe that many or most features are at least somewhat predictive.
- You are dealing with highly correlated features: Ridge tends to distribute the weights more evenly among them, making the model more stable.
- Your goal is prediction performance rather than model interpretability.
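As a small illustration of the "correlated features" point above, here is a sketch (scikit-learn and NumPy assumed, data synthetic) with two almost identical features: Ridge splits the effect fairly evenly between the two copies instead of betting on one of them.

```python
# Sketch: Ridge spreads weight across two near-duplicate features.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)   # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)  # roughly [1.5, 1.5]: the effect is shared between the copies
```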
3. So, what is the main difference?
As we saw above, the main difference between LASSO and Ridge is that LASSO naturally drives some of the weights to exactly zero, which gives you feature selection for free.
Ridge, on the other hand, makes the weights small but does not drive them to zero.
Now, 3 questions arise:
Q1: Can Ridge give you zero weights?
Answer: Yes, it can! Nothing prevents a Ridge weight from landing exactly at zero; Ridge just does not push weights there.
Q2: Does LASSO create more zero weights than Ridge?
Answer: Yes, and this is exactly the main difference between them!
You can see this in the figure below.
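If you want to reproduce this comparison numerically, here is a small sketch (scikit-learn assumed, the alpha grid is illustrative) that counts exactly-zero weights for both penalties as the regularization strength grows:

```python
# Sketch: count exactly-zero coefficients for LASSO vs Ridge
# over a range of regularization strengths.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=42)

for alpha in [0.1, 1.0, 10.0, 100.0]:
    lasso_zeros = int(np.sum(Lasso(alpha=alpha, max_iter=10_000).fit(X, y).coef_ == 0))
    ridge_zeros = int(np.sum(Ridge(alpha=alpha).fit(X, y).coef_ == 0))
    print(f"alpha={alpha:>6}: LASSO zeros={lasso_zeros:2d}, Ridge zeros={ridge_zeros:2d}")
```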
Q3: When to use what?
Answer: See below!
4. When to use LASSO vs Ridge?
When to Use LASSO
LASSO shines in scenarios where:
- You expect that only a small subset of features truly drives the outcome.
- You want to build a model that is easy to interpret, with fewer predictors.
- You're working with high-dimensional data (e.g., gene expression datasets, NLP feature vectors).
- You want a built-in mechanism for feature elimination during training.
However, LASSO is less stable when your dataset contains highly correlated features. In such cases, it may arbitrarily select one feature and ignore the others — even if they are equally informative. This can lead to high variance across different training samples or cross-validation folds.
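You can see this instability in a tiny experiment (scikit-learn assumed, data synthetic): with two near-duplicate features, LASSO tends to keep one and zero out the other, in contrast to the Ridge example earlier.

```python
# Sketch: LASSO picks one of two near-duplicate features and drops the other.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)   # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # one coefficient close to 3, the other exactly (or nearly) zero
```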
When to Use Ridge
Ridge is generally preferred when:
- You believe that many features contribute to the target variable, even if their contributions are small.
- Your features are collinear or highly correlated: Ridge spreads the weight across them, reducing instability.
- You're prioritizing predictive accuracy over interpretability.
- You want a regularized model, but you don't want to eliminate features.
In practice, Ridge is often more robust out of the box and easier to tune, especially when working with noisy or collinear data.
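In either case, the regularization strength is usually chosen by cross-validation. Here is a minimal sketch with scikit-learn's built-in cross-validated estimators (the alpha grid is illustrative):

```python
# Sketch: tune the regularization strength by cross-validation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=42)

lasso = LassoCV(cv=5).fit(X, y)                                  # builds its own alpha grid
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)   # explicit alpha grid

print("Best LASSO alpha:", lasso.alpha_)
print("Best Ridge alpha:", ridge.alpha_)
```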
5. Summary
Here's the summary table as a takeaway.

| | LASSO (L1) | Ridge (L2) |
| --- | --- | --- |
| Penalty | Sum of absolute weight values | Sum of squared weight values |
| Effect on weights | Drives some weights exactly to zero | Shrinks weights smoothly, rarely to zero |
| Feature selection | Built in | No |
| Correlated features | May arbitrarily keep one and drop the rest | Spreads the weight more evenly, more stable |
| Typical choice when | Only a few features truly matter and interpretability counts | Many features contribute and prediction accuracy is the priority |
To stay up to date with my articles and roadmaps, both on the technical and career side of your ML journey, subscribe to my weekly newsletter below!