Frequently asked questions

Is GARCH still useful in the age of deep learning?

Yes, more than people expect. For one-day-ahead daily-frequency volatility forecasting on equities, GARCH(1,1) and GJR-GARCH are competitive with the best ML approaches and frequently win on out-of-sample log-likelihood. ML approaches dominate at higher frequencies (5-minute and below), in cross-sectional settings (joint modeling of many assets), and when alternative data is available. For the most common volatility-forecasting use case, GARCH is the right starting point.

How often should I refit GARCH parameters?

For daily data, monthly refits are typical. GARCH parameters are stable enough that more frequent refits mostly add noise, while less frequent refits miss slow drift in the volatility dynamics. ARIA Analyst refits monthly on a rolling 5-year window. Be careful about look-ahead: the fit at time t should use only data through t.
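One way to make the look-ahead rule concrete is to write the refit schedule down explicitly. The sketch below is illustrative only: the function name refit_schedule, the 21-day step (an approximation of one trading month), and the 1,260-day window (roughly five years) are assumptions, not a description of ARIA Analyst's implementation.

```python
def refit_schedule(n_obs, window=1260, step=21):
    """Return a list of (train_start, train_end, use_start, use_end) tuples.

    Parameters fitted on observations [train_start, train_end) are applied
    only to forecasts for [use_start, use_end). use_start equals train_end,
    so a fit never sees data from the period it is used on.
    """
    schedule = []
    t = window
    while t < n_obs:
        schedule.append((t - window, t, t, min(t + step, n_obs)))
        t += step
    return schedule


# Hypothetical series length: 1,300 daily observations.
sched = refit_schedule(n_obs=1300)
for train_start, train_end, use_start, use_end in sched:
    print(f"fit on [{train_start}, {train_end}) -> forecast [{use_start}, {use_end})")
```

Each tuple keeps the training window strictly before the forecast window, which is the property to check in any backtest that refits on a schedule.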

What is the difference between GARCH and implied volatility?

GARCH produces a model-based forecast of statistical volatility, the standard deviation of realized returns. Implied volatility (from option prices via Black-Scholes inversion) reflects the market's consensus expectation of volatility, which embeds a risk premium for volatility risk and any deviations from the lognormal model. The two are correlated but not identical. Implied vol is typically 1-2 percentage points higher than realized for equity indices because of the volatility risk premium. Both are useful; they answer different questions.
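To compare the two numbers, a GARCH variance forecast has to be put on the same annualized, percentage scale as an implied-volatility quote. The snippet below is a minimal sketch that assumes a 252-trading-day year. The implied-vol figure is a made-up placeholder rather than market data, and a careful comparison would average the GARCH forecast over the option's remaining life instead of using a single day.

```python
import math

TRADING_DAYS = 252  # assumed day-count convention


def annualized_vol_pct(daily_variance):
    """Convert a daily variance forecast to annualized volatility in percent."""
    return 100.0 * math.sqrt(daily_variance * TRADING_DAYS)


garch_vol = annualized_vol_pct(1e-4)  # daily variance of 1e-4, i.e. 1% daily vol
implied_vol = 17.5                    # hypothetical quote, in percent
spread = implied_vol - garch_vol      # rough proxy for the volatility risk premium
print(f"GARCH {garch_vol:.2f}%  implied {implied_vol:.2f}%  spread {spread:.2f} pts")
```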

GARCH for Volatility Forecasting: A Practical Guide

Volatility clusters. Calm periods are followed by calm periods; turbulent periods are followed by turbulent periods. This single empirical fact, first formalized by Engle (1982) for ARCH and extended by Bollerslev (1986) to GARCH, is the foundation of modern volatility modeling. Despite forty years of subsequent research, the simple GARCH(1,1) specification remains hard to beat for one-day-ahead volatility forecasts on equity returns. This article explains how GARCH works, why it works, and where it stops working.

The clustering observation

Look at any equity return series. The daily returns themselves are roughly uncorrelated (close to white noise on average), but the squared returns, a proxy for variance, are strongly autocorrelated. A large move today predicts a large move tomorrow, regardless of direction. This is the autocorrelation in the second moment that GARCH models capture.
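The pattern is easy to reproduce on simulated data. The following sketch generates returns from a GARCH(1,1) process with illustrative parameters (chosen for the example, not estimated from any real series) and compares the lag-1 autocorrelation of returns with that of squared returns. The helper names are hypothetical.

```python
import math
import random


def simulate_garch(n, omega, alpha, beta, seed=0):
    """Simulate n returns from a Gaussian GARCH(1,1) process."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the long-run variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        var = omega + alpha * r * r + beta * var
    return returns


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return num / denom


returns = simulate_garch(20000, omega=5e-6, alpha=0.10, beta=0.85)
squared = [r * r for r in returns]
acf_r = autocorr(returns, 1)
acf_sq = autocorr(squared, 1)
print(f"lag-1 autocorrelation of returns:         {acf_r:+.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_sq:+.3f}")
```

On a real equity series the same two numbers can be computed from demeaned daily returns; the squared-return autocorrelation is the one to expect well above zero.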

The intuition is that volatility is driven by a slowly changing latent state (risk appetite, macroeconomic uncertainty, regime-specific volatility) that persists for days to weeks before transitioning. The state itself is unobserved, but its effects on squared returns are visible. GARCH provides a parametric model of this latent state.

The GARCH(1,1) recursion

The GARCH(1,1) model specifies that the conditional variance at time t depends on the previous period's squared return and the previous period's variance:

σ²_t = ω + α · r²_{t-1} + β · σ²_{t-1}

The three parameters are ω (the long-run variance level scaled by 1 − α − β), α (the weight on recent returns), and β (the weight on recent variance). For typical equity data, α is around 0.05-0.10 and β is around 0.85-0.92. The sum α + β measures volatility persistence: if α + β = 1, volatility has a unit root (the so-called IGARCH limit); if α + β < 1, volatility is mean-reverting toward the long-run level ω / (1 − α − β).

The "(1,1)" in GARCH(1,1) refers to the lag orders: one lag of squared returns, one lag of variance. Higher-order GARCH(p,q) models exist but rarely fit equity data better than (1,1). The parsimony of GARCH(1,1) is one of its key features: three parameters describe most of the volatility structure in daily returns.
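The recursion translates directly into a short loop. This is a minimal sketch: the function name is hypothetical, the parameter values are illustrative, and initializing at the long-run variance is one common choice among several.

```python
def garch_variance_path(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] is the variance for period t given returns before t, so the
    list has len(returns) + 1 entries and the last one is the forecast
    for the next, not yet observed, period.
    """
    sigma2 = [omega / (1.0 - alpha - beta)]  # assumed start: long-run variance
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Illustrative inputs: three daily returns, long-run variance of 1e-4.
path = garch_variance_path([0.01, -0.03, 0.005], omega=2e-6, alpha=0.08, beta=0.90)
print([f"{v:.3e}" for v in path])
```

The 3% move on the second day lifts the next variance from 1.0e-4 to about 1.64e-4, and the quiet third day lets it start decaying again.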

Parameter estimation

GARCH parameters are estimated by maximum likelihood under a distributional assumption for the standardized residuals (r_t / σ_t). The Gaussian likelihood is the standard choice. Used as a quasi-likelihood, it still gives consistent parameter estimates when returns are fat-tailed (which they are), but the estimates are inefficient and the usual standard errors are unreliable unless a robust correction is applied. The Student's t likelihood is the standard fat-tailed alternative: it adds one extra parameter (degrees of freedom) and usually describes the tails of equity data much better.
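The difference between the two likelihoods shows up in how they score a large standardized residual. The sketch below writes out both log-densities; the t density is rescaled to unit variance so that σ_t keeps its meaning as the conditional standard deviation. Function names are illustrative.

```python
import math


def gaussian_logpdf(z):
    """Log-density of a standard normal at z."""
    return -0.5 * (math.log(2.0 * math.pi) + z * z)


def std_t_logpdf(z, nu):
    """Log-density of a Student's t rescaled to unit variance (needs nu > 2)."""
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(math.pi * (nu - 2.0))
        - 0.5 * (nu + 1.0) * math.log1p(z * z / (nu - 2.0))
    )


# A five-sigma standardized residual under each assumption.
z = 5.0
print(f"Gaussian log-density:      {gaussian_logpdf(z):.2f}")
print(f"t (nu = 5) log-density:    {std_t_logpdf(z, 5):.2f}")
```

The Gaussian likelihood penalizes the five-sigma day far more heavily, which is why a handful of extreme returns can dominate a Gaussian fit.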

Optimization is by quasi-Newton methods (BFGS or L-BFGS), with the parameter constraints (ω > 0, α ≥ 0, β ≥ 0, α + β < 1) enforced through bounds, a reparameterization, or a constrained variant such as L-BFGS-B or SLSQP. Convergence is fast for daily data: most of the cost is computing the likelihood, which is linear in sample size. A 5-year daily series (≈1,250 observations) fits in milliseconds on modern hardware.

A practical concern is that the likelihood surface can be flat near the boundary α + β = 1, leading to numerical issues. The standard remedy is to start optimization from a sensible point (α = 0.1, β = 0.85, ω = sample variance × 0.05, so that the implied long-run variance equals the sample variance) rather than from random starts, and to cap α + β slightly below 1, alongside small numerical floors on α and β, so that the optimizer stays away from the boundary.
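The estimation steps can be sketched without any numerical library. Note the substitutions: to stay dependency-free, this example replaces the quasi-Newton optimizer with a coarse grid search, uses the Gaussian likelihood, and pins ω by variance targeting (ω = sample variance × (1 − α − β)). The data are simulated and all names are hypothetical; a production fit would use a proper optimizer or an established GARCH library.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def neg_loglik(returns, omega, alpha, beta):
    """Gaussian negative log-likelihood of demeaned returns under GARCH(1,1)."""
    var = sum(r * r for r in returns) / len(returns)  # start at sample variance
    total = 0.0
    for r in returns:
        total += 0.5 * (LOG_2PI + math.log(var) + r * r / var)
        var = omega + alpha * r * r + beta * var  # variance for the next period
    return total


def fit_by_grid(returns, max_persistence=0.999):
    """Coarse grid search over (alpha, beta) with variance targeting for omega."""
    sample_var = sum(r * r for r in returns) / len(returns)
    best = None
    for i in range(1, 21):        # alpha = 0.01, 0.02, ..., 0.20
        for j in range(70, 100):  # beta  = 0.70, 0.71, ..., 0.99
            alpha, beta = i / 100.0, j / 100.0
            if alpha + beta >= max_persistence:
                continue          # stay away from the alpha + beta = 1 boundary
            omega = sample_var * (1.0 - alpha - beta)
            nll = neg_loglik(returns, omega, alpha, beta)
            if best is None or nll < best[0]:
                best = (nll, omega, alpha, beta)
    return best


# Simulated stand-in for five years of demeaned daily returns.
rng = random.Random(42)
true_omega, true_alpha, true_beta = 5e-6, 0.10, 0.85
var, returns = true_omega / (1.0 - true_alpha - true_beta), []
for _ in range(1250):
    r = math.sqrt(var) * rng.gauss(0.0, 1.0)
    returns.append(r)
    var = true_omega + true_alpha * r * r + true_beta * var

nll, omega, alpha, beta = fit_by_grid(returns)
print(f"alpha={alpha:.2f} beta={beta:.2f} omega={omega:.2e} nll={nll:.1f}")
```

With only 1,250 observations the fitted α and β can land some distance from the true values, which is itself a useful reminder of how flat the likelihood surface is.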

GJR-GARCH and the leverage effect

Equity returns show a leverage effect: negative returns increase future volatility more than positive returns of the same magnitude. Standard GARCH(1,1) is symmetric: it treats positive and negative returns identically, and therefore misses this asymmetry. GJR-GARCH (Glosten, Jagannathan, Runkle 1993) extends GARCH(1,1) with an extra term that activates only for negative returns:

σ²_t = ω + α · r²_{t-1} + γ · r²_{t-1} · I(r_{t-1} < 0) + β · σ²_{t-1}

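For equity returns, γ is typically around 0.03-0.06, comparable in magnitude to α. Because the extra term is active only for negative returns, which occur about half the time under a symmetric innovation distribution, persistence becomes α + β + γ/2 and the stationarity condition is α + β + γ/2 < 1. The improvement over symmetric GARCH(1,1) is statistically significant and economically meaningful for risk-management applications: a 5% drop matters more than a 5% rally for forecasting tomorrow's volatility.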
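The asymmetry is easiest to see by feeding the GJR recursion a drop and a rally of the same size. This is a minimal sketch with illustrative parameter values and a hypothetical function name.

```python
def gjr_next_variance(r_prev, var_prev, omega, alpha, gamma, beta):
    """One step of the GJR-GARCH(1,1) recursion."""
    indicator = 1.0 if r_prev < 0 else 0.0  # I(r_{t-1} < 0)
    return omega + (alpha + gamma * indicator) * r_prev ** 2 + beta * var_prev


# Illustrative parameters; persistence alpha + beta + gamma / 2 = 0.975.
params = dict(omega=2e-6, alpha=0.05, gamma=0.05, beta=0.90)
var_prev = 1e-4  # yesterday's variance, i.e. 1% daily volatility

after_drop = gjr_next_variance(-0.05, var_prev, **params)
after_rally = gjr_next_variance(+0.05, var_prev, **params)
print(f"variance after a 5% drop:  {after_drop:.3e}")
print(f"variance after a 5% rally: {after_rally:.3e}")
```

With these values the drop produces a next-day variance of about 3.42e-4 against 2.17e-4 for the rally, and the gap comes entirely from the γ term.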

ARIA Analyst uses GJR-GARCH(1,1) with t-distributed innovations as the default volatility model for equity assets. The slight increase in complexity over standard GARCH(1,1) is worth the meaningful improvement in tail-risk forecasts.

Forecasting with GARCH

The one-step-ahead variance forecast from GARCH(1,1) is simply the recursion applied to the most recent observation. Multi-step forecasts are obtained by iterating the recursion, using expected values for future observations. For k-step-ahead, the forecast converges to the unconditional variance ω / (1 − α − β) at the persistence rate (α + β)^k.

For most equity series with α + β ≈ 0.97, the half-life of a volatility shock is about 23 trading days (ln 0.5 / ln 0.97), and after 30-50 trading days only about 20-40% of the initial gap to the unconditional variance remains. This is why GARCH is most useful for short-horizon forecasts (1-10 days) and less useful for longer horizons. For forecasts a few months out, the model essentially reverts to the historical average: there is little information left in the conditional state.
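The decay toward the unconditional variance can be computed in closed form rather than by iterating. The sketch below uses the k-step-ahead expression long_run + (α + β)^(k−1) · (σ²_{t+1} − long_run); the parameter values and function names are illustrative.

```python
import math


def garch_forecast_path(var_next, omega, alpha, beta, horizon):
    """Variance forecasts for steps 1..horizon, given the one-step forecast."""
    persistence = alpha + beta
    long_run = omega / (1.0 - persistence)
    return [
        long_run + persistence ** (k - 1) * (var_next - long_run)
        for k in range(1, horizon + 1)
    ]


def half_life(alpha, beta):
    """Trading days for a variance shock to decay by half."""
    return math.log(0.5) / math.log(alpha + beta)


# Illustrative setup: persistence 0.97, long-run variance 1e-4,
# current one-step forecast four times the long-run level.
path = garch_forecast_path(4e-4, omega=3e-6, alpha=0.07, beta=0.90, horizon=60)
hl = half_life(0.07, 0.90)
print(f"half-life: {hl:.1f} trading days")
for k in (1, 10, 30, 50):
    print(f"step {k:2d}: {path[k - 1]:.3e}")
```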

The out-of-sample forecast accuracy of GARCH(1,1) for one-day-ahead variance is hard to beat. Comparisons against random forests, gradient boosting, and even neural network variants on daily equity data typically show that simple GARCH(1,1) is within a few percent of the best model on RMSE and frequently wins on log-likelihood. The reason is that the volatility-clustering signal is small and the GARCH parameters capture most of it efficiently.

GARCH variants and when to use them

EGARCH: log-variance specification. Eliminates the need for non-negativity constraints on parameters. Useful when the data shows strong leverage effects or when you want unconstrained optimization.
TGARCH: like GJR, but models the conditional standard deviation using absolute returns rather than the conditional variance using squared returns. Different functional form, similar economic intuition.
IGARCH: integrated GARCH with α + β = 1 imposed. The model becomes a random walk in variance. Often a better fit for extreme persistence regimes but discards mean-reversion information.
FIGARCH: fractionally integrated GARCH for very-long-memory volatility. Useful for high-frequency intraday data; usually overkill for daily.
Realized GARCH: incorporates realized variance from intraday data as an explanatory variable. Substantially improves forecasts when intraday data is available.
Markov-switching GARCH: combines regime-switching with GARCH dynamics. Useful for risk-management applications across crisis and non-crisis regimes.

Common pitfalls

Forgetting to demean returns. GARCH models variance, not raw second moments. Subtract a mean (or AR(1) residual) before fitting.
Using normal innovations. Equity returns have fat tails. Use t-distributed innovations or robust standard errors.
Overfitting with high-order GARCH(p,q). GARCH(1,1) is hard to beat. Higher orders rarely justify the parameter count.
Ignoring the leverage effect. For equities, GJR or EGARCH consistently outperforms symmetric GARCH. Use one of them.
Forecasting past the convergence horizon. Multi-step GARCH forecasts converge to the unconditional variance; beyond 30-50 trading days for typical equities, the forecast carries little information beyond the long-run level.

Conclusion

GARCH(1,1) and its asymmetric extension GJR-GARCH remain the workhorses of volatility forecasting at the daily frequency. The model is parsimonious, well-understood, fast to estimate, and competitive with vastly more complex ML approaches on the metric that matters: one-step-ahead forecast accuracy. For Monte Carlo simulations, VaR calculations, and option-implied volatility comparisons, a properly-fitted GJR-GARCH is the right starting point.

ARIA Analyst uses GJR-GARCH(1,1) with t-distributed innovations in its Monte Carlo simulation engine and VaR calculations. Create a free account to see the model in action on any ticker, or read our Monte Carlo guide for the simulation context. See also our VaR explanation for the risk-management application.
