LSTM vs LightGBM for Stock Prediction: Why Boosting Wins
A head-to-head comparison of LSTM neural networks and LightGBM gradient boosting for stock return prediction. Out-of-sample evidence, training cost, feature engineering, and why tree ensembles win for tabular financial data.
The 2017-2020 wave of "deep learning will eat finance" produced an enormous volume of papers proposing LSTM, GRU, and Transformer-based models for stock return prediction. By 2024, the consensus among practitioners was that, on tabular financial features at the daily frequency, gradient-boosted trees beat deep neural networks decisively. This article explains why, with reference to public benchmarks and our own production experience.
The two contenders
LSTM (long short-term memory) is a recurrent neural network architecture designed for sequential data. The model takes a sequence of observations (prices, volumes, or engineered features) and produces a prediction conditional on the full sequence. The recurrence allows the model to learn long-range dependencies, useful for time series with persistent state.
LightGBM is a gradient-boosting framework built on decision trees. It trains an ensemble of shallow trees sequentially, each correcting the residuals of the previous ensemble. The model is non-sequential: each prediction depends only on a feature vector at a single point in time, not on a sequence. Temporal information is provided to LightGBM through engineered features (lagged returns, rolling means, momentum indicators) rather than learned by the model.
On the surface, LSTM seems better suited to time-series problems because it learns temporal dependencies natively. In practice, the structure of financial data shifts the advantage to LightGBM for several reasons.
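As a rough illustration of how temporal information can be turned into plain columns, here is a minimal pandas sketch. The schema (`date`, `ticker`, `close`), the function name `add_temporal_features`, and the specific lags and windows are assumptions chosen for the example, not a feature set taken from this article.

```python
import numpy as np
import pandas as pd


def add_temporal_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Add lag/rolling features to a long-format frame with columns
    ['date', 'ticker', 'close'] (hypothetical schema)."""
    df = prices.sort_values(["ticker", "date"]).reset_index(drop=True)
    close_by_ticker = df.groupby("ticker")["close"]

    # Returns are computed within each ticker so values never leak across stocks.
    df["ret_1d"] = df["close"] / close_by_ticker.shift(1) - 1.0
    df["mom_20d"] = df["close"] / close_by_ticker.shift(20) - 1.0

    ret_by_ticker = df.groupby("ticker")["ret_1d"]
    # Lagged returns: history exposed to the model as ordinary columns.
    df["ret_1d_lag1"] = ret_by_ticker.shift(1)
    df["ret_1d_lag5"] = ret_by_ticker.shift(5)
    # 5-day rolling mean of daily returns.
    df["ret_mean_5d"] = ret_by_ticker.transform(lambda s: s.rolling(5).mean())

    # Target: next day's return. This is the only forward-looking column.
    df["target_next_ret"] = ret_by_ticker.shift(-1)
    return df


# Tiny synthetic demo: ticker A grows 1% per day, ticker B rises by 1.0 per day.
dates = pd.date_range("2024-01-01", periods=30, freq="B")
demo = pd.concat(
    [
        pd.DataFrame({"date": dates, "ticker": "A", "close": 100.0 * 1.01 ** np.arange(30)}),
        pd.DataFrame({"date": dates, "ticker": "B", "close": 50.0 + np.arange(30)}),
    ],
    ignore_index=True,
)
features = add_temporal_features(demo)

# Rows with incomplete history (or no next-day target) are dropped before training.
model_rows = features.dropna()
```

Each row of `model_rows` is then a self-contained feature vector, which is the shape a tree ensemble expects.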
Why LightGBM wins on tabular financial data
First, financial features are tabular by nature. The signal in stock return prediction is not in the raw price sequence; it is in derived features like momentum (12-month return), valuation (P/E ratio), quality (ROIC), and sentiment (news flow). Gradient-boosted trees handle tabular features naturally; LSTMs require feature engineering anyway because raw prices have too low a signal-to-noise ratio.
Second, financial data is scarce by ML standards. A daily-frequency stock has ~250 observations per year. Over 30 years of history, that is 7,500 observations per stock. Even with 1,000 stocks, the total dataset is ~7.5 million rows, which is small by deep learning standards. LSTMs typically carry tens of thousands to millions of parameters and benefit from datasets in the hundreds of millions to billions of observations. With 7.5 million observations and 90 features, gradient-boosted trees fit cleanly without regularization heroics.
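The arithmetic above can be checked in a few lines. The parameter-count formula is the standard one for a single LSTM layer with one bias vector per gate (some frameworks keep two bias vectors, which adds slightly more); the hidden size of 64 is a hypothetical choice for illustration only.

```python
TRADING_DAYS_PER_YEAR = 250
YEARS = 30
STOCKS = 1_000
FEATURES = 90

rows_per_stock = TRADING_DAYS_PER_YEAR * YEARS  # 7,500
total_rows = rows_per_stock * STOCKS            # 7,500,000


def lstm_layer_params(input_dim: int, hidden_dim: int) -> int:
    """Parameters in one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a single bias vector."""
    return 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)


HIDDEN = 64  # hypothetical hidden size, not a recommendation
single_layer = lstm_layer_params(FEATURES, HIDDEN)

print(f"rows per stock:     {rows_per_stock:,}")
print(f"total rows:         {total_rows:,}")
print(f"LSTM layer params:  {single_layer:,}")
```

Raw row counts also tend to overstate the usable sample when the signal-to-noise ratio is low, so the headline figure of 7.5 million rows is best read as an upper bound on the information available.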
Third, financial features change meaning across regimes. A high momentum signal means one thing in trending markets and another in mean-reverting ones. Tree-based models handle this naturally through their splitting structure: different paths through the tree apply to different regimes. LSTMs would need to learn the regime conditioning implicitly through their hidden state, which is data-expensive.
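A toy sketch of this effect, using scikit-learn on synthetic data: the same momentum signal is followed by continuation in one regime and by partial reversal in the other. The regime flag is handed to the model as an explicit feature, which is a simplifying assumption; real pipelines would have to proxy the regime with observable features. All names and numbers here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 4_000

momentum = rng.normal(size=n)
trending = rng.integers(0, 2, size=n)  # 1 = trending, 0 = mean-reverting (hypothetical flag)

# Synthetic target: momentum continues in trending regimes and
# partially reverses in mean-reverting ones.
effect = np.where(trending == 1, 1.0, -0.3)
target = effect * np.sign(momentum) + rng.normal(scale=0.1, size=n)

X = np.column_stack([momentum, trending])
X_train, X_test = X[:3_000], X[3_000:]
y_train, y_test = target[:3_000], target[3_000:]

# A depth-2 tree can split on momentum first, then on regime within each branch.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_train, y_train)
# A linear model has a single momentum coefficient shared by both regimes.
linear = LinearRegression().fit(X_train, y_train)

tree_r2 = tree.score(X_test, y_test)
linear_r2 = linear.score(X_test, y_test)
print(f"tree R^2:   {tree_r2:.3f}")
print(f"linear R^2: {linear_r2:.3f}")
```

The tree gives each regime its own leaf values, while the linear baseline can only average the two behaviours. This is a deliberately clean setup, so the gap is far larger than anything to expect on real returns.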
Fourth, training cost. LightGBM trains in seconds to minutes on a single CPU. A comparable LSTM takes hours to days on a GPU. For walk-forward backtesting, which retrains the model many times, the cost difference is decisive.
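To make the retraining burden concrete, here is a minimal sketch of an expanding-window walk-forward schedule. The function name `walk_forward_splits` and the window sizes (ten years of daily data, five years of initial training, yearly refits) are illustrative assumptions, not a description of any particular production setup.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_periods: int, initial_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges with an expanding training window.

    The training range always ends where the test range begins, so the
    model never sees observations from its own evaluation period.
    """
    start = initial_train
    while start + test_size <= n_periods:
        yield range(0, start), range(start, start + test_size)
        start += test_size


# Ten years of daily data, five years of initial training, yearly refits.
splits = list(walk_forward_splits(n_periods=2_500, initial_train=1_250, test_size=250))

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train [0, {train_idx.stop}) test [{test_idx.start}, {test_idx.stop})")
```

Each yielded pair implies one full retrain, so per-fit training time is multiplied by the number of folds, and again by any hyperparameter search run inside each fold.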
Empirical evidence
Multiple academic papers have benchmarked LSTM and LightGBM on stock return prediction. The pattern is consistent: at the daily frequency on tabular features, LightGBM matches or slightly outperforms LSTM in out-of-sample accuracy, with a fraction of the training cost. Hu et al. (2021) found LightGBM Sharpe ratios 0.05-0.15 higher than LSTM in a 100-stock universe over 2010-2020. Krauss et al. (2017) found similar results for short-horizon predictions in the S&P 500.
The picture changes at higher frequencies and with alternative data. For 5-minute-frequency intraday prediction with order-book features, LSTMs (and more recently, Transformers) do outperform tree ensembles because the data volume is high enough to support the larger parameter count and the sequential structure of order flow matters. For cross-sectional alpha generation using earnings transcripts (text data), Transformer-based models naturally dominate because the input is sequential text.
For the bulk of retail-relevant stock prediction (daily frequency, tabular features, equity universe), gradient-boosted trees are the right baseline. ARIA Analyst uses a stacked LightGBM + XGBoost ensemble as the ML core, and we have tested LSTM variants several times. The LSTMs underperform in every regime and cost 10x more to train.
Feature engineering matters more than model choice
The single biggest determinant of out-of-sample accuracy in stock prediction is not the model; it is the features. A well-engineered LightGBM with 90 carefully chosen features outperforms a poorly engineered LSTM with raw price sequences by a wide margin, and a well-engineered LSTM with the same 90 features as input performs roughly the same as the LightGBM (slightly worse, in our experience).
This is the lesson that the deep learning wave eventually taught finance: model architectures matter less than features, and features matter less than the underlying data and labeling pipeline. We covered the feature engineering side in our blog post on feature engineering for financial ML.
Where LSTMs (or Transformers) make sense
LSTM and Transformer architectures have specific use cases in finance where they outperform tree ensembles:
- High-frequency prediction (5-minute or shorter) where the sequential structure of order flow is informative and data volume is large.
- Text-based features: earnings transcripts, news articles, regulatory filings. Transformers dominate because the input is naturally sequential text. FinBERT and similar models are the right tool here, with their outputs fed as features into a downstream tree ensemble.
- Multi-asset joint modeling where cross-asset dependencies need to be learned. Tree ensembles model assets one at a time; sequence models can capture cross-asset interactions natively. The advantage here is real but data-expensive.
- Alternative data with rich structure: satellite imagery, social media networks, mobile-app usage panels. Anywhere the input is non-tabular, sequence or graph models can outperform tree ensembles.
Why XGBoost and LightGBM, not just one
Even within tree ensembles, there is variation. XGBoost and LightGBM share the gradient-boosting idea but differ in implementation: XGBoost uses level-wise tree growth and a regularized objective; LightGBM uses leaf-wise growth and histogram-based splitting. The two produce correlated but not identical predictions. Stacking them (combining their probability outputs with a meta-learner) produces better out-of-sample performance than either alone.
ARIA Analyst's ML core is a stacked LightGBM + XGBoost ensemble, with a logistic regression meta-learner that combines the two base models' probabilities. The improvement over the best base model is small (~3% reduction in log-loss out-of-sample) but consistent across regimes. CatBoost would be a reasonable third addition, but it offers diminishing returns, and we have left it out of the production stack to keep the pipeline simple.
Conclusion
For daily-frequency stock prediction on tabular features, gradient-boosted trees (LightGBM, XGBoost) outperform LSTM and Transformer architectures consistently and at a fraction of the training cost. The wisdom of the 2017-2020 deep learning wave was that more data and more compute help, but financial data at the daily frequency has the wrong shape for deep learning to dominate. Where deep learning does dominate (high-frequency prediction, text features, alternative data), it remains a powerful tool. For the bulk of retail-relevant equity prediction, the answer is gradient boosting.
ARIA Analyst uses a stacked LightGBM + XGBoost ensemble for its ML core. Create a free account to see the model in action, or read our scoring methodology for the full pipeline. See feature engineering for financial ML for why features matter more than architecture.
Frequently asked questions
Is the LSTM vs LightGBM debate settled?
For tabular daily-frequency equity data, yes: LightGBM and its peers (XGBoost, CatBoost) outperform LSTMs reliably and at a fraction of the training cost. For higher-frequency data, text data, and alternative data formats, deep learning still has a role. The lesson from a decade of head-to-head comparisons is that model choice should follow data shape: tabular gets trees, sequential gets RNNs, text gets Transformers.
Do I need GPUs for LightGBM?
No. LightGBM trains efficiently on CPUs, and the GPU variant offers a modest speedup only for very large datasets (10M+ rows). For typical financial applications with 1-10M rows of data, CPU training is fast enough: minutes to tens of minutes for a full ensemble. This is one reason gradient boosting is so practical in production: no GPU infrastructure required.
What about Transformers for stock prediction?
For tabular features at daily frequency, Transformers offer no advantage over tree ensembles. The attention mechanism is designed for sequential data where positional relationships matter; tabular features have no inherent position. For text features (10-K filings, earnings transcripts), Transformers (FinBERT, FinGPT) are clearly the right tool, but the output should be fed as features into a downstream tree ensemble for the prediction task. The hybrid is what works in production.