Under the Hood

Statistical Engineering

A rigorous, five-layer quantitative architecture — from raw OHLCV data to a deployable trade signal. Every component is independently testable, statistically grounded, and designed to fail gracefully.

Architecture

The Five-Layer Signal Pipeline

A raw equity price bar enters the pipeline as OHLCV data. It exits as either a fully parameterised trade signal (entry price, stop, target, size) or is discarded. Every layer can independently reject the candidate — there is no way to override a rejection downstream.

Each layer is a pure function: deterministic, stateless, and unit-testable in isolation. This design prevents hidden interactions between components and makes individual layer performance attributable.
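The "any layer can reject, nothing downstream can override" rule can be sketched as a chain of pure functions that each return either the candidate or nothing. This is a minimal illustration, not the platform's code: the `Candidate` fields, the `Layer` type, and the two example gates are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container passed between layers (fields are illustrative)."""
    ticker: str
    score: float
    regime_ok: bool


# A layer takes a candidate and returns it (possibly enriched) or None to reject.
Layer = Callable[[Candidate], Optional[Candidate]]


def run_pipeline(candidate: Candidate, layers: Sequence[Layer]) -> Optional[Candidate]:
    """Apply layers in order; the first rejection is final."""
    current: Optional[Candidate] = candidate
    for layer in layers:
        current = layer(current)
        if current is None:
            return None  # later layers never see a rejected candidate
    return current


def score_gate(c: Candidate) -> Optional[Candidate]:
    """Reject candidates whose composite score is below 0.62."""
    return c if c.score >= 0.62 else None


def regime_gate(c: Candidate) -> Optional[Candidate]:
    """Reject candidates when the macro regime gate is closed."""
    return c if c.regime_ok else None
```

Because each layer depends only on its input, a gate such as `score_gate` can be unit-tested without constructing the rest of the pipeline.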

Feature Computation → ML Scoring → Regime Gate → Position Sizing → Execution Model
📊

Layer 1 — Feature Engineering

Five targeted sub-signals computed from daily OHLCV: RSI-14, RSI-2, SMA-20 deviation, Bollinger Band position, and 5-day momentum. All computations strictly causal — only close-of-day t data used.

Causal · No Lookahead · Mean-Reversion
📐

Layer 2 — Composite Signal Score

Five mean-reversion sub-signals are normalised 0–1 and combined with fixed weights into a composite score [0,1]. Only scores ≥ 0.62 advance — threshold calibrated on walk-forward hold-out data.

RSI · Bollinger · SMA · Momentum · Weighted
🌡️

Layer 3 — Macro Regime Gate

Two binary gates: (1) SPY 200-day SMA determines BULL/BEAR regime, (2) SPY 20-day realised vol must exceed 12%. Low-vol environments suppress mean-reversion premium — all entries suspended when the vol gate fails.

SPY 200-SMA · 20d RV > 12% · Long-Only
⚖️

Layer 4 — Risk & Position Sizing

1.5×ATR(14) stop-loss and 3.0×ATR(14) take-profit computed per signal — producing a consistent 2:1 R:R across all volatility regimes. Position size fixed at 3% of equity maximum. Hard cap prevents over-concentration in low-volatility environments.

1.5×ATR Stop · 3.0×ATR Target · 3% Hard Cap
🏦

Layer 5 — Execution Model

0.05% slippage + 0.01% commission applied per side. Gap-down scenarios modelled using historical overnight gap distribution. Worst single-trade event: −15.12% (NOW gap_stop).

0.12% Round-Trip · Gap Risk Modelled
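As a rough check on the cost figures, the sketch below applies 0.05% slippage plus 0.01% commission on each side and fills a gapped stop at the open. The multiplicative way the costs are applied and the `min(stop, open)` fill rule are assumptions for illustration; the function names are hypothetical.

```python
SLIPPAGE = 0.0005    # 0.05% per side
COMMISSION = 0.0001  # 0.01% per side
COST_PER_SIDE = SLIPPAGE + COMMISSION  # 0.06% per side, about 0.12% round trip


def stop_fill_price(stop: float, open_price: float) -> float:
    """Fill a long stop at the stop level, or at the open if price gapped below it."""
    return min(stop, open_price)


def net_return(entry: float, exit_price: float) -> float:
    """Net long return after paying COST_PER_SIDE on both entry and exit."""
    paid = entry * (1 + COST_PER_SIDE)
    received = exit_price * (1 - COST_PER_SIDE)
    return received / paid - 1


# A flat trade loses roughly the round-trip friction.
flat_trade = net_return(100.0, 100.0)
```

Under these assumptions a trade that exits at its entry price loses about 0.12%, and a stop at 95 with an open at 90 is filled at 90 rather than at the stop level.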

Layer 1

Feature Engineering

From each daily OHLCV bar, the engine computes five targeted mean-reversion sub-signals. These are not generic factor features — each is selected for its direct economic meaning in the context of short-term price reversion. No additional feature selection or dimensionality reduction is applied.

All features are computed on a strictly causal window: only data available at close-of-day t is used to generate the signal for t+1. The lookback is capped at 60 trading days for the SMA and Bollinger calculations, and 14 days for ATR and RSI14.

# Mean-reversion signal family
RSI_14 = rsi(close, 14)
RSI_2 = rsi(close, 2)

# Deviation from moving averages
SMA20_pct = (close[t] - sma(close, 20)) / sma(close, 20)
BB_pos = (close[t] - bb_lower) / (bb_upper - bb_lower)

# Short-term momentum (inverted for reversion)
Mom_5d = (close[t] / close[t-5]) - 1
5 sub-signals · 14–20 day causal window · Mean-reversion focused
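The pseudocode above can be made runnable with the standard library alone. The sketch below is one possible reading, under assumptions the page does not state: RSI uses a simple average of gains and losses (Cutler's variant, not Wilder smoothing), and the Bollinger Bands are 20-day bands at ±2 population standard deviations. Only closes up to and including day t are passed in.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def sma(values: Sequence[float], n: int) -> float:
    """Simple moving average of the last n values."""
    return mean(values[-n:])


def rsi(close: Sequence[float], n: int) -> float:
    """Simple-average RSI over the last n price changes (assumed variant)."""
    changes = [b - a for a, b in zip(close[-n - 1:-1], close[-n:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains + losses == 0:
        return 50.0  # flat prices: treat as neutral
    return 100.0 * gains / (gains + losses)


def features(close: Sequence[float], bb_k: float = 2.0) -> Dict[str, float]:
    """Compute the five sub-signals from closes up to and including day t."""
    if len(close) < 21:
        raise ValueError("need at least 21 closes")
    sma20 = sma(close, 20)
    sd20 = pstdev(close[-20:])
    bb_lower, bb_upper = sma20 - bb_k * sd20, sma20 + bb_k * sd20
    width = bb_upper - bb_lower
    return {
        "RSI_14": rsi(close, 14),
        "RSI_2": rsi(close, 2),
        "SMA20_pct": (close[-1] - sma20) / sma20,
        "BB_pos": (close[-1] - bb_lower) / width if width > 0 else 0.5,
        "Mom_5d": close[-1] / close[-6] - 1,
    }


# Illustrative input: a steadily falling synthetic series.
falling = [100.0 - i for i in range(30)]
f = features(falling)
```

On a steadily falling series both RSI values are 0, the close sits below its 20-day SMA, and 5-day momentum is negative, which is the oversold profile the pipeline looks for.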

Demo: CRYN Synthetic Stock — Price + Signals

Illustrative only — synthetic data generated for visualisation. Green markers = buy signals passing all 5 layers. Red dashed = stop-loss. Green dashed = take-profit.

Layer 2

Composite Signal Architecture

The scoring model is a fixed-weight composite of five mean-reversion sub-signals. Each sub-signal is independently normalised to a [0,1] range based on its empirical distribution over the trailing 252-day lookback window, then combined with the weights shown in the table below.

This design is deliberately interpretable: every component has a clear economic rationale, and the combined score is auditable trade-by-trade. The threshold of 0.62 was chosen as the precision-recall elbow on walk-forward out-of-sample data — not tuned on the full backtest period.

score = (0.25 * norm(RSI14_signal)
         + 0.20 * norm(RSI2_signal)
         + 0.25 * norm(SMA20_pct_below)
         + 0.20 * norm(BB_position)
         + 0.10 * norm(Mom5d_neg))

signal = "LONG" if score >= 0.62 else "SKIP"
| Sub-Signal | Weight | Mean-Reversion Rationale |
|---|---|---|
| RSI 14 | 25% | Oversold on 14-day momentum; primary reversion trigger |
| SMA-20 Deviation | 25% | % below 20-day SMA; measures short-term stretch |
| Bollinger Band Position | 20% | Lower band proximity; normalised price compression |
| RSI 2 | 20% | Ultra-short-term exhaustion signal; high precision on 1–3 day reversions |
| 5-Day Momentum | 10% | Confirms directional oversell without contradicting primary signals |

Signal Component Weights (fixed)

Weights are fixed constants, not fitted parameters. Stability confirmed across all five walk-forward windows.
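The fixed-weight composite can be sketched as follows. The page says each sub-signal is normalised against its trailing 252-day empirical distribution but does not give the method; this sketch assumes an empirical-CDF rank, and assumes every input is already oriented so that a higher value means more oversold. The `history` argument (trailing values per sub-signal) is a hypothetical structure.

```python
from bisect import bisect_right
from typing import Dict, Sequence

WEIGHTS = {
    "RSI14_signal": 0.25,
    "RSI2_signal": 0.20,
    "SMA20_pct_below": 0.25,
    "BB_position": 0.20,
    "Mom5d_neg": 0.10,
}
THRESHOLD = 0.62


def norm(value: float, history: Sequence[float]) -> float:
    """Empirical-CDF rank of value within its trailing history, in [0, 1]."""
    ordered = sorted(history)
    return bisect_right(ordered, value) / len(ordered)


def composite_score(today: Dict[str, float],
                    history: Dict[str, Sequence[float]]) -> float:
    """Weighted sum of normalised sub-signals using the fixed weights."""
    return sum(w * norm(today[k], history[k]) for k, w in WEIGHTS.items())


def decide(score: float) -> str:
    """Map a composite score to the LONG / SKIP decision."""
    return "LONG" if score >= THRESHOLD else "SKIP"
```

Because the weights sum to 1 and each normalised input lies in [0, 1], the composite score is also bounded to [0, 1].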

Threshold Calibration

Signal Score Distribution

The chart shows the empirical distribution of composite scores across all candidates evaluated over the 60-month backtest. The threshold of 0.62 was chosen as the precision-recall elbow: in the sensitivity table, Sharpe peaks at 1.09 at this level, and tightening further to 0.68 raises profit factor only from 1.50 to 1.54 while cutting the trade count roughly in half and lowering Sharpe to 0.88.

Lowering the threshold to 0.50 increases trade volume by ~3× but compresses profit factor from 1.50 to approximately 1.18 — insufficient to justify the added execution costs and capital churn. The 0.62 threshold is treated as fixed out-of-sample and is not adjusted between walk-forward windows.

Threshold Sensitivity

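| Threshold | Trades | Win Rate | Profit Factor | Sharpe |
|---|---|---|---|---|
| 0.50 | 9,800 | 51.3% | 1.18 | 0.62 |
| 0.55 | 6,100 | 54.4% | 1.33 | 0.84 |
| 0.58 | 4,700 | 56.8% | 1.42 | 0.97 |
| 0.62 ★ | 3,253 | 59.2% | 1.50 | 1.09 |
| 0.68 | 1,640 | 60.5% | 1.54 | 0.88 |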
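A sensitivity table of this shape could be produced by a sweep like the one below. It is a sketch only: the `(score, pnl)` pair format and the `sweep` helper are hypothetical, the sample trades are synthetic, and Sharpe is omitted because the page does not describe how it is computed.

```python
from typing import Dict, List, Sequence, Tuple


def sweep(trades: Sequence[Tuple[float, float]],
          thresholds: Sequence[float]) -> List[Dict[str, float]]:
    """trades: (score, pnl) pairs; returns one summary row per threshold."""
    rows = []
    for th in thresholds:
        pnls = [p for s, p in trades if s >= th]
        wins = [p for p in pnls if p > 0]
        losses = [-p for p in pnls if p < 0]
        rows.append({
            "threshold": th,
            "trades": len(pnls),
            "win_rate": len(wins) / len(pnls) if pnls else float("nan"),
            "profit_factor": sum(wins) / sum(losses) if losses else float("inf"),
        })
    return rows


# Synthetic (score, pnl%) pairs, for illustration only.
sample = [(0.50, -1.0), (0.60, 2.0), (0.65, 1.0), (0.70, -0.5), (0.80, 1.5)]
rows = sweep(sample, [0.50, 0.62])
```

Raising the threshold can only shrink the set of trades that pass, so the trade count falls monotonically while win rate and profit factor depend on which trades remain.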

Score Distribution — All Candidates Evaluated

Orange dashed line = 0.62 threshold. Signals left of threshold are discarded. Right = live candidates (3,253 over 60 months).

Trade Analytics

P&L Distribution

Across 3,253 closed trades, the system produces a 59.2% win rate with average winner (+5.14%) and average loser (−4.96%) closely matched — edge comes from win frequency, not payoff asymmetry. The position-level holding period averages 4–6 trading days; most exits are signal-reversal (mean reversion completion) or max-hold expiry.

The fat left tail consists primarily of gap-down events that breach the stop before market open — these are included at their actual fill price with no special treatment. The worst single trade was −15.12% (NOW gap_stop). All gap events are included in headline figures.
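The claim that the edge comes from win frequency can be checked with simple arithmetic on the quoted figures. The sketch below assumes equal-sized trades, so that profit factor reduces to a ratio of probability-weighted averages; the function names are illustrative.

```python
WIN_RATE = 0.592
AVG_WIN = 5.14   # percent
AVG_LOSS = 4.96  # percent, as a positive magnitude


def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Average P&L per trade, in percent."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss


def profit_factor(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Gross gains divided by gross losses, assuming equal trade sizes."""
    return (win_rate * avg_win) / ((1 - win_rate) * avg_loss)


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


exp_per_trade = expectancy(WIN_RATE, AVG_WIN, AVG_LOSS)  # about 1.02% per trade
pf = profit_factor(WIN_RATE, AVG_WIN, AVG_LOSS)          # about 1.50
be = breakeven_win_rate(AVG_WIN, AVG_LOSS)               # about 49.1%
```

With winners and losers nearly equal in size, the breakeven win rate is about 49.1%, so the 59.2% observed win rate accounts for essentially all of the 1.50 profit factor.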

+5.14%

Avg Winner

−4.96%

Avg Loser

59.2%

Win Rate

−15.12%

Worst Trade (NOW)

Trade P&L Distribution — 3,253 closed trades

Bin width 0.5%. Green bars = winners (59.2%), red bars = losers (40.8%). Fat left tail from overnight gap events.

Layer 3

Market Regime Detection

Regime detection is a binary gate, not a soft signal. When the volatility gate fails (SPY 20-day annualised realised volatility at or below 12%), entry to all new positions is fully suspended regardless of individual signal quality. The calendar below shows which months were fully active, partially filtered, or fully suspended over the 5-year backtest window.

Regime Components

📈

SPY 200-Day SMA (primary)

SPY closing price vs. its 200-day simple moving average determines the BULL/BEAR regime label. BEAR regime does not suspend signals — mean-reversion still works — but position sizing context is noted.

📊

20-Day Realised Volatility Gate

SPY 20-day annualised realised volatility must exceed 12% for signals to pass. Low-volatility environments suppress mean-reversion premium — entries are suspended when markets are too calm.

🔢

Long-Only Regime Filter

The engine is long-only. All 3,253 signals across the 60-month backtest were long entries. This structural bias benefits from the positive equity risk premium while keeping the model interpretable.
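The two regime components can be sketched with the standard library. Several details are assumptions the page does not specify: volatility is computed from daily log returns, uses the sample standard deviation, and is annualised with √252. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple

TRADING_DAYS = 252  # assumed annualisation factor
VOL_FLOOR = 0.12    # 12% annualised realised volatility


def realised_vol(close: Sequence[float], window: int = 20) -> float:
    """Annualised sample std dev of the last `window` daily log returns."""
    px = close[-(window + 1):]
    rets = [log(b / a) for a, b in zip(px, px[1:])]
    return stdev(rets) * sqrt(TRADING_DAYS)


def regime(spy_close: Sequence[float]) -> Tuple[str, bool]:
    """Return (BULL/BEAR label, entries_allowed) from SPY closes up to day t."""
    if len(spy_close) < 200:
        raise ValueError("need at least 200 closes")
    label = "BULL" if spy_close[-1] > mean(spy_close[-200:]) else "BEAR"
    return label, realised_vol(spy_close) > VOL_FLOOR


# Synthetic series: a calm uptrend and a choppy market alternating +2% / -2%.
calm = [100.0 * 1.0005 ** i for i in range(250)]
choppy = [100.0]
for i in range(249):
    choppy.append(choppy[-1] * (1.02 if i % 2 == 0 else 0.98))
```

In this sketch the label is informational only, matching the statement that a BEAR regime does not suspend signals; only the volatility gate controls whether entries are allowed.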

Monthly Regime Calendar — 2021–2026

Active — full entries allowed
Partial — reduced position count
Filtered — all entries suspended

Layer 4

Position Sizing Mathematics

Position size is fixed at 3% of equity per signal: simple, transparent, and immune to over-sizing in low-volatility environments. ATR sets the stop and target distances only; it is not used to back-calculate the share count. This avoids the common pitfall of a small ATR producing a dangerously large nominal position.

# ATR-based stop & target calculation
ATR_14 = atr(high, low, close, 14)
stop_dist = 1.5 * ATR_14   # 1.5× ATR stop
take_prof = 3.0 * ATR_14   # 3.0× ATR target → 2:1 R:R

# Fixed-fractional sizing
max_size = equity * 0.03   # 3% hard cap
shares = floor(max_size / close)

The 3% hard cap prevents any single signal from dominating the portfolio. With 3% sizing, a maximum of ~33 simultaneous positions would fully deploy capital — in practice peak concurrent open positions reached 10 during high-signal periods. The 1.5×ATR stop and 3.0×ATR target produce a consistent 2:1 R:R across all volatility regimes.
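A runnable version of the sizing logic might look like the sketch below. Two points are assumptions: ATR is taken as the simple average of the last 14 true ranges (not Wilder smoothing), and the entry price is approximated by the latest close. The `Order` tuple and `size_order` helper are hypothetical names.

```python
from math import floor
from typing import NamedTuple, Sequence


class Order(NamedTuple):
    shares: int
    stop: float
    target: float


def atr(high: Sequence[float], low: Sequence[float],
        close: Sequence[float], n: int = 14) -> float:
    """Simple average of the last n true ranges (assumed ATR variant)."""
    if len(close) < n + 1:
        raise ValueError("need at least n + 1 bars")
    trs = []
    for i in range(len(close) - n, len(close)):
        prev = close[i - 1]
        trs.append(max(high[i] - low[i], abs(high[i] - prev), abs(low[i] - prev)))
    return sum(trs) / n


def size_order(equity: float, high: Sequence[float],
               low: Sequence[float], close: Sequence[float]) -> Order:
    """Fixed 3% notional; ATR sets only the stop and target distances."""
    a = atr(high, low, close)
    entry = close[-1]  # assumption: entry approximated by the latest close
    return Order(
        shares=floor(equity * 0.03 / entry),
        stop=entry - 1.5 * a,
        target=entry + 3.0 * a,
    )


# Synthetic bars: close 101, daily range 100 to 102, so ATR = 2.
order = size_order(100_000.0, [102.0] * 15, [100.0] * 15, [101.0] * 15)
```

With an ATR of 2 and an entry of 101, the stop is 98 and the target 107, a 2:1 reward-to-risk ratio, while the share count of 29 depends only on equity and price.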

Rolling Portfolio Exposure — % Capital Deployed

Illustrative 90-day rolling capital exposure. Regime suspension periods visible as red bands where exposure drops to near zero.

Sizing Tradeoff Summary

| Size | CAGR | Max DD | Sharpe |
|---|---|---|---|
| 1% | 7.5% | −6% | 0.95 |
| 2% | 14.2% | −12% | 1.04 |
| 3% ★ | 20.7% | −18% | 1.09 |
| 5% | 33.5% | −30% | 1.02 |
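The 20.7% CAGR at 3% sizing can be tied to the 2.56× terminal equity quoted elsewhere on this page by compounding over the 5-year window. The helper names below are illustrative.

```python
def terminal_multiple(cagr: float, years: float) -> float:
    """Terminal equity multiple implied by a compound annual growth rate."""
    return (1 + cagr) ** years


def cagr_from_multiple(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by a terminal equity multiple."""
    return multiple ** (1 / years) - 1


multiple_3pct = terminal_multiple(0.207, 5)  # about 2.56
implied_cagr = cagr_from_multiple(2.56, 5)   # about 0.207
```

The two figures agree to rounding, so the sizing table and the Monte Carlo median describe the same backtest result.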

Validation

Statistical Integrity Framework

Backtested performance is meaningless without a formal bias-elimination protocol. QuantBulgaria applies six specific integrity checks before any result is reported.

🔬

No Lookahead Bias

All feature computations use only data available at close-of-bar t. Future prices, future volatility, or end-of-period indices are never referenced. Verified by unit test on every feature function.

📅

Expanding Window CV

5-fold time-series cross-validation with an expanding training window. No walk-forward fold ever uses data from a future fold for model selection or threshold calibration.

🎲

Anti-Survivorship

The S&P 500 constituent list is point-in-time for each window — stocks delisted or removed during the period are included up to their removal date. Ex-post selection of winners is structurally impossible.

💸

Full Cost Modelling

0.12% round-trip friction on every trade. The worst-case gap scenario (−15.12% NOW) is modelled as a realistic tail event, not filtered as an outlier. No gross-of-cost figures are reported.

🧪

Monte Carlo Stress

1,000 bootstrap simulations resample the 3,253-trade log with replacement to produce a distribution of terminal equity values. The reported CAGR sits at the 50th percentile; the 5th percentile simulation still achieves +37% total return.

📐

Parameter Sensitivity

All three primary parameters (threshold 0.62, stop 1.5×ATR, target 3.0×ATR) are stress-tested ±20%. Profit factor remains above 1.35 across the full grid — confirming robustness rather than brittle curve-fitting.
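The expanding-window scheme described under Expanding Window CV can be sketched as index ranges. The page does not state the fold boundaries, so this sketch assumes equal-sized blocks with the first block reserved for the initial training window; `expanding_splits` is a hypothetical helper.

```python
from typing import List, Tuple


def expanding_splits(n_obs: int, n_folds: int = 5) -> List[Tuple[range, range]]:
    """Return (train, test) index ranges; each train window starts at 0 and grows."""
    block = n_obs // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough observations for the requested folds")
    splits = []
    for k in range(1, n_folds + 1):
        train = range(0, k * block)
        # The final fold absorbs any remainder so every observation is used.
        end = n_obs if k == n_folds else (k + 1) * block
        test = range(k * block, end)
        splits.append((train, test))
    return splits


# Example: 60 monthly observations split into 5 folds.
splits = expanding_splits(60)
```

Each test range begins exactly where its training range ends, so no fold can see observations from its own or any later test period.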

Layer 5 · Stress Testing

Monte Carlo Simulation

The chart below shows the terminal equity distribution from 1,000 bootstrap simulations of the 3,253-trade log. Each simulation draws trades with replacement from the historical outcomes and applies them sequentially with 3% position sizing.

The distribution confirms that the headline result is not a lucky draw — the 5th percentile simulation still achieves +37% total return over 5 years. The median (50th pct) is +156%, matching the deterministic backtest result.
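The bootstrap procedure can be sketched as below. It is a simplified illustration under stated assumptions: trades are applied one after another with no overlap, each trade moves equity by 3% of its return, and percentiles use a nearest-rank rule. The function names and the tiny synthetic trade log are hypothetical.

```python
import random
from typing import List, Sequence


def bootstrap_terminal_equity(trade_returns: Sequence[float],
                              n_sims: int = 1000,
                              position_frac: float = 0.03,
                              seed: int = 0) -> List[float]:
    """Resample the trade log with replacement; return sorted terminal equities."""
    rng = random.Random(seed)
    n = len(trade_returns)
    results = []
    for _ in range(n_sims):
        equity = 1.0
        for r in rng.choices(trade_returns, k=n):
            equity *= 1 + position_frac * r  # each trade risks 3% of equity
        results.append(equity)
    return sorted(results)


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile on an already sorted list."""
    idx = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


# Synthetic trade returns (fractions), for illustration only.
sims = bootstrap_terminal_equity([0.05, -0.05, 0.05, 0.02, -0.03], n_sims=200, seed=1)
```

Reading `percentile(sims, 5)`, `percentile(sims, 50)` and `percentile(sims, 95)` from the sorted results gives the kind of percentile table reported for the real trade log.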

Simulation Percentiles (5-year terminal equity)

| Percentile | Terminal Equity | Total Return |
|---|---|---|
| 5th (worst-case) | 1.37× | +37% |
| 25th | 1.98× | +98% |
| 50th (median) | 2.56× | +156% |
| 75th | 3.31× | +231% |
| 95th (best-case) | 4.78× | +378% |

Monte Carlo — Terminal Equity Distribution (n=1,000)

Orange line = actual deterministic backtest result (2.56×). Distribution centred on +156% total return.

Ready to See It Live?

The same five-layer engine runs on live market data every trading day. Explore signals, backtests, and risk analytics on the QuantBulgaria platform.
