Feature Engineering for Financial ML: A 90-Feature Walkthrough
A complete walkthrough of feature engineering for financial machine learning. The 90 features ARIA Analyst computes per ticker, grouped by family, with the rationale for each.
In financial machine learning, model architecture is a second-order concern. The first-order concern is features. A well-engineered feature set with a vanilla gradient-boosted-trees model will outperform a poorly-engineered feature set with the most sophisticated deep learning architecture, every time. This is not a controversial claim among practitioners; it is the lesson everyone learns after their first failed deep-learning-on-prices experiment.
This article documents the 90 features ARIA Analyst computes per ticker for its ML ensemble, grouped into five families: fundamental, technical, momentum, sentiment, and macro. For each family I describe the features, the rationale, and the pitfalls. The goal is to give a worked example of what a serious financial ML feature set looks like, not to give a recipe to copy.
Family 1: Fundamental features (28 features)
Fundamental features describe the underlying business: how profitable it is, how fast it is growing, how leveraged it is, and how richly it is valued. These features change slowly (quarterly at most) but carry meaningful long-horizon signal.
Profitability: ROE, ROIC, net margin, gross margin, EBITDA margin, FCF margin. Each is reported as a trailing 12-month value, a 5-year average, and a trend (acceleration/deceleration over 8 quarters). The trend matters more than the level in many cases: a deteriorating high-margin business is a different bet than a stable medium-margin business.
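A minimal sketch of the level / 5-year average / trend structure, assuming the input is a quarterly series of trailing-12-month values for one ratio (oldest first) and that "trend" is measured as the least-squares slope over the last 8 quarters. The article does not say how ARIA measures the trend; the slope and the function names `ols_slope` and `level_avg_trend` are illustrative.

```python
from statistics import fmean


def ols_slope(ys: list[float]) -> float:
    """Least-squares slope of ys against 0, 1, ..., n-1 (units per quarter)."""
    n = len(ys)
    x_bar, y_bar = (n - 1) / 2, fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def level_avg_trend(ttm_series: list[float]) -> dict[str, float]:
    """ttm_series: quarterly TTM values of one ratio (e.g. net margin), oldest first."""
    if len(ttm_series) < 20:
        raise ValueError("need at least 20 quarters (5 years) of history")
    return {
        "level": ttm_series[-1],                # latest TTM value
        "avg_5y": fmean(ttm_series[-20:]),      # mean of the last 20 quarters
        "trend_8q": ols_slope(ttm_series[-8:]), # negative = deteriorating
    }
```

A high `level` paired with a negative `trend_8q` is exactly the "deteriorating high-margin business" case, and the tree model can split on the pair.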
Growth: revenue CAGR (1Y, 3Y, 5Y), EPS CAGR, FCF CAGR, book value CAGR. Same structure, level plus trend. The gap between revenue and EPS growth is its own feature (margin expansion/compression).
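The growth features reduce to compound annual growth rates. A sketch, with hypothetical helper names; note that CAGR is undefined whenever a start or end value is non-positive, which happens often with EPS and FCF and is one reason missing-value handling matters for this family.

```python
import math


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate; NaN unless both endpoints are positive."""
    if start <= 0 or end <= 0 or years <= 0:
        return math.nan
    return (end / start) ** (1 / years) - 1


def growth_gap(rev_start: float, rev_end: float,
               eps_start: float, eps_end: float, years: float) -> float:
    """EPS CAGR minus revenue CAGR. Positive suggests margin expansion,
    although per-share effects such as buybacks also feed into this gap."""
    return cagr(eps_start, eps_end, years) - cagr(rev_start, rev_end, years)
```

For example, revenue growing from 100 to 121 over two years is a 10% CAGR; if EPS went from 1.00 to 1.44 over the same period (20% CAGR), the gap is +10 percentage points.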
Leverage: debt/equity, net debt/EBITDA, interest coverage, current ratio. These are useful primarily as risk filters: extreme values flag potential financial distress, while moderate values are mostly noise.
Valuation: P/E, forward P/E, P/B, P/S, EV/EBITDA, EV/Sales, FCF yield, earnings yield. Each as a level and as a percentile rank within the stock's own 5-year history (relative valuation is more informative than absolute valuation).
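The "percentile rank within the stock's own 5-year history" can be sketched as the share of past readings at or below today's value. This assumes the history (for instance month-end P/E values) ends strictly before the prediction date; the function name is hypothetical.

```python
import math


def own_history_percentile(history: list[float], current: float) -> float:
    """Share of the stock's own trailing-window values that are <= current.
    For P/E-type ratios 1.0 means as expensive as at any point in the window;
    for yield-type metrics (FCF yield, earnings yield) the direction flips."""
    valid = [v for v in history if not math.isnan(v)]
    if not valid or math.isnan(current):
        return math.nan
    return sum(v <= current for v in valid) / len(valid)
```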
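Quality: Piotroski F-score, Altman Z-score. These are composite features that bundle multiple financial-statement signals into a single number. Both have well-documented predictive power at the tails: the F-score for separating stronger from weaker firms (originally among cheap, high book-to-market stocks), the Z-score for flagging distress and bankruptcy risk.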
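As an example of a composite, here is the original (1968) Altman Z-score for publicly traded manufacturers, with its conventional zone cut-offs. The article does not say which Z-score variant ARIA uses, and the model presumably consumes the continuous score rather than the zones; the zones are shown for readability. The score is not meaningful for banks and other financials. (The Piotroski F-score is built differently, as a sum of nine binary accounting tests.)

```python
def altman_z(working_capital: float, retained_earnings: float, ebit: float,
             market_cap: float, total_liabilities: float, sales: float,
             total_assets: float) -> float:
    """Original Altman Z-score (public manufacturing firms)."""
    x1 = working_capital / total_assets     # liquidity
    x2 = retained_earnings / total_assets   # cumulative profitability
    x3 = ebit / total_assets                # operating return on assets
    x4 = market_cap / total_liabilities     # market-value solvency cushion
    x5 = sales / total_assets               # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z: float) -> str:
    """Conventional cut-offs for the original model."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"
```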
Family 2: Technical features (22 features)
Technical features are derived from price and volume data alone. They are higher-frequency than fundamentals and capture momentum, mean reversion, and volatility regime.
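Momentum: 1-month, 3-month, 6-month, and 12-month returns. These four features together capture the well-documented momentum factor. The 12-month return excluding the most recent month (usually written "12-1" momentum, the construction behind Carhart's momentum factor) is its own feature, and it predicts forward returns more cleanly than the raw 12-month return because skipping the latest month removes the short-term reversal effect.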
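A sketch of the momentum block, assuming a list of month-end total-return (dividend-adjusted) prices, oldest first. The key names are hypothetical.

```python
def momentum_features(month_end_prices: list[float]) -> dict[str, float]:
    """month_end_prices: oldest first; the last element is the latest month-end.
    13 prices are needed to span 12 monthly returns."""
    p = month_end_prices
    if len(p) < 13:
        raise ValueError("need at least 13 month-end prices")

    def ret(k: int) -> float:
        # Simple return over the last k months
        return p[-1] / p[-1 - k] - 1

    return {
        "ret_1m": ret(1),
        "ret_3m": ret(3),
        "ret_6m": ret(6),
        "ret_12m": ret(12),
        # 12-1 momentum: from month-end t-12 to t-1, skipping the latest month
        "mom_12_1": p[-2] / p[-13] - 1,
    }
```

With prices running from 100 up to 120 and then dropping to 108 in the final month, raw 12-month return is 8% while 12-1 momentum is 20%: the last month's pullback, which tends to partially reverse, does not dilute the momentum reading.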
Mean reversion: 14-day RSI, distance from 50-day moving average, distance from 200-day moving average, Bollinger band position. Mean-reversion signals are noisy on their own but combine well with fundamental quality.
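The mean-reversion features can be sketched from a list of daily closes (oldest first). This uses Wilder's smoothing for RSI and the common 20-day, 2-standard-deviation Bollinger bands; the article does not specify ARIA's exact parameters, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def wilder_rsi(closes: list[float], n: int = 14) -> float:
    """Relative Strength Index with Wilder's smoothing, on a 0-100 scale."""
    if len(closes) < n + 1:
        raise ValueError(f"need at least {n + 1} closes")
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with simple averages of the first n changes, then smooth recursively
    avg_gain, avg_loss = fmean(gains[:n]), fmean(losses[:n])
    for gain, loss in zip(gains[n:], losses[n:]):
        avg_gain = (avg_gain * (n - 1) + gain) / n
        avg_loss = (avg_loss * (n - 1) + loss) / n
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat price: treat as neutral
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def ma_distance(closes: list[float], window: int) -> float:
    """Latest close relative to its simple moving average (0.05 = 5% above)."""
    if len(closes) < window:
        raise ValueError("not enough closes")
    return closes[-1] / fmean(closes[-window:]) - 1


def bollinger_position(closes: list[float], window: int = 20, k: float = 2.0) -> float:
    """Position within the bands: 0 = lower band, 1 = upper band (%B)."""
    if len(closes) < window:
        raise ValueError("not enough closes")
    recent = closes[-window:]
    mid, sd = fmean(recent), pstdev(recent)
    if sd == 0:
        return 0.5  # bands collapse onto the average
    lower, upper = mid - k * sd, mid + k * sd
    return (closes[-1] - lower) / (upper - lower)
```

`ma_distance(closes, 50)` and `ma_distance(closes, 200)` give the two moving-average features.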
Volatility regime: 20-day realized vol, 60-day realized vol, ratio of the two (a "vol expansion" indicator), GARCH-fitted persistence. These features capture whether the stock is in a calm or turbulent regime, useful as conditioning features for the ML ensemble.
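A sketch of the realized-volatility features, assuming daily closes (oldest first), log returns, and annualization with 252 trading days. The GARCH persistence feature is omitted here; one common route is the `arch` package, where fitting `arch_model(returns, vol="GARCH", p=1, q=1)` and summing the fitted `alpha[1]` and `beta[1]` parameters gives persistence.

```python
import math
from statistics import stdev

TRADING_DAYS = 252  # annualization convention (assumption)


def realized_vol(closes: list[float], window: int) -> float:
    """Annualized standard deviation of the last `window` daily log returns."""
    if len(closes) < window + 1:
        raise ValueError(f"need at least {window + 1} closes")
    tail = closes[-(window + 1):]
    rets = [math.log(b / a) for a, b in zip(tail, tail[1:])]
    return stdev(rets) * math.sqrt(TRADING_DAYS)


def vol_regime_features(closes: list[float]) -> dict[str, float]:
    v20, v60 = realized_vol(closes, 20), realized_vol(closes, 60)
    return {
        "rvol_20d": v20,
        "rvol_60d": v60,
        # > 1 means recent volatility is running above the longer baseline
        "vol_expansion": v20 / v60 if v60 > 0 else math.nan,
    }
```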
Volume: 20-day average volume, 5-day relative volume, volume-price correlation, on-balance volume trend. Volume features help distinguish technical signals that have institutional conviction from those that are noise.
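Two of the volume features, sketched under stated assumptions: on-balance volume follows its standard definition, and "5-day relative volume" is taken here to mean the 5-day average volume divided by the 20-day average, which is an assumption since the article does not define it. An OBV "trend" could then be, for example, the least-squares slope of the last 20 OBV values.

```python
from statistics import fmean


def on_balance_volume(closes: list[float], volumes: list[float]) -> list[float]:
    """Running total: add the day's volume on up-closes, subtract it on
    down-closes, carry it flat when the close is unchanged."""
    if len(closes) != len(volumes):
        raise ValueError("closes and volumes must be aligned")
    obv = [0.0]
    for prev, cur, vol in zip(closes, closes[1:], volumes[1:]):
        if cur > prev:
            obv.append(obv[-1] + vol)
        elif cur < prev:
            obv.append(obv[-1] - vol)
        else:
            obv.append(obv[-1])
    return obv


def relative_volume(volumes: list[float], short: int = 5, long: int = 20) -> float:
    """Average volume over the last `short` days relative to the last `long` days."""
    return fmean(volumes[-short:]) / fmean(volumes[-long:])
```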
Pattern indicators: ADX (trend strength), MACD histogram, Ichimoku cloud distance. These are popular technical-analysis indicators with mixed predictive power; including a small number as features rather than as standalone signals lets the ML model figure out which combinations work.
Family 3: Momentum and reversal cross-sectional features (15 features)
Cross-sectional features compare a stock to its sector, factor, or universe peers. These are some of the most predictive features in the financial ML literature.
Relative strength: 12-month return vs. sector ETF, 12-month return vs. factor portfolio (size, value, momentum, quality, low-vol), 12-month return vs. universe median. These quantify whether the stock is leading or lagging its peers.
Relative valuation: P/E vs. sector median, EV/EBITDA vs. sector median, FCF yield vs. sector median. A stock that is cheap relative to its peers is a different bet than one that is cheap relative to the broad market.
Quality rank: Piotroski rank within sector, ROIC rank, growth rank. These convert absolute quality numbers into peer-relative ranks.
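The relative-strength, relative-valuation, and rank features share one pattern: compare each stock to its sector peers on the same date. A sketch with hypothetical names; use the difference form for returns and the ratio form for multiples such as P/E. Relative strength against a sector ETF is simply the stock's return minus the ETF's return over the same window.

```python
from collections import defaultdict
from statistics import median


def sector_relative(values: dict[str, float], sectors: dict[str, str],
                    as_ratio: bool = False) -> dict[str, dict[str, float]]:
    """values: ticker -> one feature on ONE date (NaNs removed beforehand).
    Returns, per ticker, the value relative to its sector median and its
    percentile rank within the sector (1.0 = highest in the sector)."""
    by_sector: dict[str, list[float]] = defaultdict(list)
    for ticker, v in values.items():
        by_sector[sectors[ticker]].append(v)

    out = {}
    for ticker, v in values.items():
        peers = by_sector[sectors[ticker]]
        med = median(peers)
        # Ratio form breaks down when the median is zero or negative
        rel = v / med if as_ratio else v - med
        rank = sum(p <= v for p in peers) / len(peers)
        out[ticker] = {"vs_sector_median": rel, "sector_pct_rank": rank}
    return out
```

Because every comparison uses only same-date data, these features carry no look-ahead risk as long as the inputs themselves are point-in-time.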
Beta features: market beta, sector beta, factor exposures (size, value, momentum, quality, low-vol). These capture systematic risk in a way that the raw return features cannot.
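Market and sector beta are single-factor regression slopes, cov(stock, factor) / var(factor), over aligned return windows. A sketch with a hypothetical function name; multi-factor exposures would instead come from a joint regression on all factor returns at once (for example with `numpy.linalg.lstsq`), which gives different numbers from separate single-factor betas when the factors are correlated.

```python
from statistics import fmean


def ols_beta(stock_returns: list[float], factor_returns: list[float]) -> float:
    """Slope of stock returns on one factor's returns (market, sector ETF, ...)."""
    if len(stock_returns) != len(factor_returns) or len(stock_returns) < 2:
        raise ValueError("need two aligned return series of length >= 2")
    ms, mf = fmean(stock_returns), fmean(factor_returns)
    cov = sum((s - ms) * (f - mf) for s, f in zip(stock_returns, factor_returns))
    var = sum((f - mf) ** 2 for f in factor_returns)
    return cov / var
```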
Family 4: Sentiment features (12 features)
Sentiment features capture market positioning and information flow that is not reflected directly in price.
News sentiment: FinBERT-classified tone of news articles over 7-day and 30-day windows, reported as percent positive minus percent negative. The signal is strongest at extremes (where it acts as a contrarian indicator) and weak in the middle.
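Once each article carries a tone label, the feature is a windowed count. This sketch assumes labels have already been produced by a FinBERT-style classifier with the three classes "positive", "negative", and "neutral"; the function name is hypothetical. Articles dated after the prediction date are excluded, which guards against look-ahead.

```python
from datetime import date, timedelta


def net_sentiment(articles: list[tuple[date, str]], as_of: date, days: int) -> float:
    """Share positive minus share negative over the `days` days ending on as_of."""
    start = as_of - timedelta(days=days)
    labels = [label for published, label in articles if start < published <= as_of]
    if not labels:
        return float("nan")  # no coverage: leave for missing-value handling
    pos = sum(label == "positive" for label in labels)
    neg = sum(label == "negative" for label in labels)
    return (pos - neg) / len(labels)
```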
Insider activity: net insider buying over 90 days (executives buying minus selling, dollar-weighted). Strong positive signal at extremes (executives have non-public information about their own company).
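A sketch of dollar-weighted net insider buying. It assumes each trade is keyed by its Form 4 filing date (the date the market learned of it, normally within two business days of the trade) rather than the trade date, and that the input has already been restricted to open-market purchases and sales (Form 4 transaction codes P and S) so that grants and option exercises do not swamp the signal. Dividing by market cap, not shown, would make the number comparable across firms.

```python
from datetime import date, timedelta


def net_insider_dollars(trades: list[tuple[date, str, float, float]],
                        as_of: date, days: int = 90) -> float:
    """trades: (filing_date, side, shares, price), side in {"buy", "sell"}.
    Returns dollars bought minus dollars sold over the window ending as_of."""
    start = as_of - timedelta(days=days)
    net = 0.0
    for filed, side, shares, price in trades:
        if not (start < filed <= as_of):
            continue
        if side == "buy":
            net += shares * price
        elif side == "sell":
            net -= shares * price
        else:
            raise ValueError(f"unknown side: {side!r}")
    return net
```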
Short interest: short interest as percent of float, change over 30 days, days to cover. High short interest with positive momentum is a short-squeeze setup; high short interest with weak momentum is a bearish signal.
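The short-interest features are ratios, plus an optional explicit interaction with momentum that encodes the "squeeze vs. bearish" distinction described above (a tree model can also learn this interaction on its own). Names are hypothetical. In the US, exchange short interest is published twice a month, so "change over 30 days" in practice compares against the report from roughly two releases earlier.

```python
def short_interest_features(shares_short: float, shares_short_prev: float,
                            float_shares: float, avg_daily_volume: float,
                            momentum: float) -> dict[str, float]:
    si_pct = shares_short / float_shares
    return {
        "si_pct_float": si_pct,
        "si_change": shares_short / shares_short_prev - 1,
        # Days of average volume needed for shorts to buy back their position
        "days_to_cover": shares_short / avg_daily_volume,
        # Large positive: crowded short in a rising stock (squeeze-type setup);
        # large negative: crowded short in a falling stock
        "si_x_momentum": si_pct * momentum,
    }
```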
Options flow: put/call ratio over 5 days, implied volatility skew (25-delta put vs. 25-delta call), implied vol vs. realized vol ratio. Options markets aggregate informed-trader positioning.
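The options features are simple once the inputs exist. This sketch assumes daily put and call volumes (latest last), 25-delta implied vols already read off the vol surface, and a 30-day at-the-money implied vol compared against 30-day realized vol; the article does not specify which tenors ARIA uses, and the function name is hypothetical.

```python
def options_features(put_volumes: list[float], call_volumes: list[float],
                     iv_25d_put: float, iv_25d_call: float,
                     atm_iv_30d: float, realized_vol_30d: float) -> dict[str, float]:
    return {
        # Aggregate over 5 days to damp single-day noise
        "put_call_5d": sum(put_volumes[-5:]) / sum(call_volumes[-5:]),
        # Positive skew: downside protection priced richer than upside calls
        "skew_25d": iv_25d_put - iv_25d_call,
        # > 1: options price more volatility than the stock has recently shown
        "iv_rv_ratio": atm_iv_30d / realized_vol_30d,
    }
```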
Institutional ownership: change in institutional ownership over 90 days, 13F filings recency, hedge fund positioning. Smart-money positioning matters more for small and mid-cap stocks where institutions cannot easily disguise their flow.
Family 5: Macro features (13 features)
Macro features describe the environment in which the stock is operating. They are the same across all stocks at any given time but interact with stock-level features in important ways (a cyclical with strong fundamentals is a different bet at the start of a recession than at the start of an expansion).
Yield curve: 10Y minus 2Y Treasury spread, 10Y minus 3M, 30Y minus 10Y. Different segments of the curve matter for different sectors.
Credit conditions: investment-grade credit spread, high-yield credit spread, change in each over 30 days. Credit spreads are leading indicators of financial stress.
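The curve and credit features are differences of published series, conventionally quoted in basis points. A sketch assuming yields and option-adjusted credit spreads are supplied in percent; the function and key names are hypothetical. Because macro features are identical across stocks on a given date, the resulting dictionary would be attached to every ticker's row for that date.

```python
BP_PER_PERCENT = 100.0


def curve_and_credit_features(yields: dict[str, float],
                              oas_now: dict[str, float],
                              oas_30d_ago: dict[str, float]) -> dict[str, float]:
    """yields: tenor -> Treasury yield in percent, e.g. {"3M": 5.0, "2Y": 4.5, ...}.
    oas_now / oas_30d_ago: {"IG": ..., "HY": ...} credit spreads in percent.
    All outputs are in basis points."""
    feats = {
        "t10y2y_bp": (yields["10Y"] - yields["2Y"]) * BP_PER_PERCENT,
        "t10y3m_bp": (yields["10Y"] - yields["3M"]) * BP_PER_PERCENT,
        "t30y10y_bp": (yields["30Y"] - yields["10Y"]) * BP_PER_PERCENT,
    }
    for grade in ("IG", "HY"):
        key = grade.lower()
        feats[f"{key}_spread_bp"] = oas_now[grade] * BP_PER_PERCENT
        feats[f"{key}_spread_chg_30d_bp"] = (oas_now[grade] - oas_30d_ago[grade]) * BP_PER_PERCENT
    return feats
```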
Real rates: 10Y real yield (TIPS-derived), change over 30 days. Real rates drive the discount rate for long-duration equities (growth, tech).
Dollar: DXY level, 1-month change. The dollar matters for multinationals and commodities.
Volatility regime: VIX level, VIX term structure (1-month vs. 6-month), HMM regime probabilities. Volatility regime conditions almost every other feature.
Sector rotation: returns of GICS sector ETFs over 1M and 3M. The rotation matters as a signal of where the market is positioning.
Why this many features?
Ninety features is a lot. The argument for keeping it this large rather than picking a smaller subset is that gradient-boosted trees handle high-dimensional feature spaces well and automatically discount features that do not contribute predictively. The argument against, that more features mean more chances to overfit, is mitigated by using regularized boosting (L2 regularization, early stopping on a validation set, max-depth caps) and walk-forward cross-validation.
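Walk-forward cross-validation can be sketched as expanding-window splits over time-ordered observations, with an optional gap between the end of training and the start of testing so that forward-looking labels (say, 21-day forward returns) near the boundary cannot overlap the test period. For panel data the indices should refer to dates, not rows. The function is hypothetical; scikit-learn's `TimeSeriesSplit` (with its `gap` parameter) implements a similar scheme.

```python
from typing import Iterator


def walk_forward_splits(n_obs: int, n_folds: int, min_train: int,
                        gap: int = 0) -> Iterator[tuple[range, range]]:
    """Yield (train_indices, test_indices) over time-ordered observations.
    Each fold trains on everything before its test block (expanding window)."""
    test_size = (n_obs - min_train) // n_folds
    if test_size <= 0 or min_train - gap <= 0:
        raise ValueError("not enough observations for this configuration")
    for k in range(n_folds):
        test_start = min_train + k * test_size
        yield range(0, test_start - gap), range(test_start, test_start + test_size)
```

Early stopping for each fold should use a validation slice carved from that fold's training window, never the fold's test block.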
Empirically, our feature selection experiments suggest that around 60 of the 90 features carry meaningful signal in any given regime. Which 60 changes across regimes: momentum features dominate in trending markets, valuation features in mean-reverting markets, and sentiment features at extremes. The model is allowed to choose the conditional importance via its splitting structure.
Common feature engineering pitfalls
- Look-ahead bias. The cardinal sin. Use only information that was available at the prediction date. For fundamentals, this means respecting the reporting lag: a 10-Q filed at date t describes a quarter that ended weeks earlier (SEC deadlines allow 40 to 45 days after quarter end), but none of its numbers were available until t. Key every fundamental value to its filing date, not its period-end date.
- Survivorship bias. If your universe excludes stocks that later delisted (bankruptcies, mergers, index deletions), your backtest will overstate returns. Use a survivorship-bias-free database.
- Stationarity assumption. Features that worked in the 1990s may not work today (and vice versa). Re-estimate feature importance regularly and condition on regime.
- Multicollinearity. Many of the features above are correlated (P/E, P/B, and P/S all measure valuation). Gradient boosting tolerates this for prediction, but it makes feature importances noisy, because credit gets split somewhat arbitrarily among correlated features. Be careful interpreting importances when correlations are high.
- Data leakage through preprocessing. If you scale features using full-sample means and standard deviations, you have leaked future information. Use only past data for any preprocessing step.
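The look-ahead and preprocessing rules both come down to "use only what was known at the time". A sketch of two helpers, both hypothetical: a point-in-time lookup that returns the latest value whose filing date is on or before the prediction date (pandas users would typically reach for `pandas.merge_asof` keyed on the filing date), and an expanding-window z-score that standardizes each observation using strictly earlier data. If filings can arrive after the close, shifting availability to the next trading day is the conservative choice. Cross-sectional standardization within a single date does not leak either, and tree models are insensitive to monotonic rescaling, so the expanding form matters most for time-series normalizations.

```python
import math
from datetime import date
from statistics import fmean, stdev


def point_in_time_value(reports: list[tuple[date, date, float]], as_of: date):
    """reports: (period_end, filing_date, value). Returns the value from the most
    recently *filed* report on or before as_of, or None if none was public yet."""
    known = [(filed, value) for _period_end, filed, value in reports if filed <= as_of]
    if not known:
        return None
    return max(known, key=lambda fv: fv[0])[1]


def expanding_zscores(values: list[float], min_history: int = 20) -> list[float]:
    """Z-score each value against the mean/stdev of strictly earlier values only.
    O(n^2) for clarity; a running-moments version would be used at scale."""
    out = []
    for i, x in enumerate(values):
        past = values[:i]  # never includes the current or any later observation
        if len(past) < max(min_history, 2):
            out.append(math.nan)
            continue
        mu, sd = fmean(past), stdev(past)
        out.append((x - mu) / sd if sd > 0 else 0.0)
    return out
```

For example, with a Q4 report filed 2024-02-10 and a Q1 report filed 2024-05-08, a prediction dated 2024-04-15 must still use the Q4 value even though Q1 had already ended.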
Conclusion
Feature engineering is where most of the work, and most of the predictive power, lives in financial ML. The 90 features above are a worked example of a serious feature set; the exact list matters less than the discipline of building it carefully and respecting the look-ahead and survivorship constraints. A vanilla gradient-boosted-trees model on a well-engineered feature set will outperform a sophisticated architecture on a poorly-engineered set every time.
ARIA Analyst exposes a subset of these features to users via the deep-dive analysis page. Create a free account to see the features in action on any stock, or read our scoring methodology for how the features combine into a single score. See LSTM vs LightGBM for why model choice matters less than feature choice.
Frequently asked questions
Should I include alternative data features?
Yes, but only after the core feature set is solid. Alternative data (satellite imagery, app usage panels, web-scraping signals) can add value at the margin, but it is expensive to acquire, noisy, and often duplicates information that is in price or fundamentals. Add alternative data once you have proven the model works on standard features and have a clear hypothesis about what the alternative data adds.
How do I handle missing data in features?
Three approaches. (1) Drop the observation if any feature is missing: wasteful but safe. (2) Impute with the sector or peer median: works for features that are missing more or less at random, such as fundamentals that occasionally lag. (3) Add a missingness indicator as its own feature and impute with a sentinel value: works for features where missingness itself is informative (a stock with no analyst coverage is different from a stock with full coverage). Gradient-boosting libraries such as XGBoost and LightGBM handle missing values natively by learning, at each split, which branch missing values should follow, which captures much of the benefit of the third approach without an explicit indicator; we use that native handling as the default in ARIA Analyst.
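For models without native missing-value handling, approach (3) can be sketched as below. The sentinel -999.0 and the `_missing` suffix are illustrative choices, not ARIA conventions; the sentinel just needs to sit outside the feature's real range.

```python
import math

SENTINEL = -999.0  # hypothetical value outside the feature's real range


def add_missing_indicator(rows: list[dict], feature: str) -> list[dict]:
    """Return copies of rows with `<feature>_missing` (1.0/0.0) added and
    missing values (absent key, None, or NaN) replaced by the sentinel."""
    out = []
    for row in rows:
        v = row.get(feature)
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        new = dict(row)  # do not mutate the caller's data
        new[f"{feature}_missing"] = 1.0 if missing else 0.0
        new[feature] = SENTINEL if missing else v
        out.append(new)
    return out
```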
How often should I update feature definitions?
The feature set should be stable; changes should be rare and well-motivated. Adding new features mid-backtest invalidates the backtest (you have implicitly used future knowledge to choose the features). Reserve feature additions for major model updates and re-validate the entire pipeline when they happen. The 90-feature set we use today is roughly the same as it was 18 months ago, with two or three additions and no removals.
Ready to put this into practice?
ARIA Analyst applies these methods on any stock, crypto, forex, commodity, or fund. Three free analyses per day on the free tier.