ML4T Book 3rd Edition

🇹🇭 Thai

Machine Learning for Algorithmic Trading (3rd Edition) by Stefan Jansen — the third edition, due June 2026, with 27 chapters in 6 parts covering everything from data foundations to production deployment.

Adds five entirely new chapters (Ch 16–17 Strategy Simulation & Portfolio Construction, Ch 22–24 RAG/KG/Agents) and an entirely new Part 6 (Production).

6-Part Structure

| Part | Theme | Chapters |
|---|---|---|
| 1 — Foundation | Data & Strategy Setup | Ch 1–6 |
| 2 — Features | Feature Engineering | Ch 7–10 |
| 3 — Models | ML Pipeline & Synthesis | Ch 11–15 |
| 4 — Strategy | Backtest to Execution | Ch 16–20 |
| 5 — Advanced AI | RL, RAG & Agents | Ch 21–24 |
| 6 — Production | Deploy & Monitor | Ch 25–27 |

Part 1 — Foundation (Ch 1–6)

| Ch | Title | Highlights |
|---|---|---|
| 1 | The Process Is Your Edge | 2-layer ML4T workflow; regime detection with GMM (Risk-On vs Risk-Off volatility ratio = 1.3x); defining the evidence boundary |
| 2 | The Financial Data Universe | 8 asset classes; bitemporal PIT; storage: Parquet 3.4x compression, Polars ASOF 3.8x faster |
| 3 | Market Microstructure | LOB reconstruction (NASDAQ ITCH 423M msg/day, 97.6% cancel rate); dollar bars best (JB=84.7 vs 3,838 for time bars) |
| 4 | Fundamental & Alternative Data | bitemporal SEC EDGAR pipeline; 3-stage entity resolution; published factors lose ~58% post-publication |
| 5 | Synthetic Financial Data | TimeGAN TSTR=1.70; Tail-GAN VaR error 102%→11.5%; Diffusion-TS KS=0.06; GReaT/distilgpt2 AUC=0.84 |
| 6 | Strategy Research Framework | 3-layer metrics; 5 leakage types; walk-forward CV; baseline checkpoint; run logging + DSR |

Part 2 — Features (Ch 7–10)

| Ch | Title | Highlights |
|---|---|---|
| 7 | Defining the Learning Task | label engineering; fold-by-fold feature-label evaluation; search accounting; correlation→causality |
| 8 | Financial Feature Engineering | 3 filters (horizon/driver/role); price-derived, cross-instrument, contextual families; SPY-TLT regime conditioning: 17% IC swing |
| 9 | Model-Based Feature Extraction | diagnostics, spectral, volatility, uncertainty, regime, cross-sectional features; walk-forward fitting |
| 10 | Text Feature Engineering | lexical→static embeddings→sequential→Transformers; financial NLP workflow; PIT-safe text features |

Part 3 — Models (Ch 11–15)

| Ch | Title | Highlights |
|---|---|---|
| 11 | The ML Pipeline | Ridge/LASSO/Elastic Net; Ridge 1.5x ICIR vs OLS; conformal prediction (CQR+ACI 88.1%); SHAP diagnostics |
| 12 | Advanced Models for Tabular Data | XGBoost, LightGBM, CatBoost; GBM beats linear in 7-8/9 case studies; TabM competitive; TreeSHAP |
| 13 | Deep Learning for Time Series | N-BEATS, PatchTST, iTransformer, TFT; DL rarely beats GBM baseline; linear model beats Transformer (Zeng 2022) |
| 14 | Latent Factor Models | PCA, IPCA, RP-PCA, CAE, adversarial SDF; factor zoo problem (400+ factors, 65% fail replication); CAE IC +0.073 |
| 15 | Causal Machine Learning | DML; BSTS event impact; PCMCI/NOTEARS causal discovery; predictive vs causal signal |

Part 4 — Strategy (Ch 16–20)

| Ch | Title | Highlights |
|---|---|---|
| 16 | Strategy Simulation | backtest = falsification; 6 failure modes; DSR; IC champion ≠ Sharpe champion; cadence mediates IC→Sharpe |
| 17 | Portfolio Construction | equal weight hard to beat (DeMiguel 2009); Kelly criterion; HRP (no matrix inversion); no universal allocator winner |
| 18 | Transaction Costs | cost taxonomy; square-root impact model; Almgren-Chriss optimal execution; TCA feedback loop; alpha-to-go |
| 19 | Risk Management | VaR/CVaR; drawdown path risk; factor decomposition; stress testing; GARCH/EWMA adaptive controls; kill switches |
| 20 | Strategy Synthesis | 9 case study verdicts; NASDAQ-100 IC=0.008 but Sharpe=4.22; GBM wins 6/9; median holdout decay ~50% |

Part 5 — Advanced AI (Ch 21–24)

| Ch | Title | Highlights |
|---|---|---|
| 21 | Reinforcement Learning | MDP formulation; DQN→PPO→SAC; optimal execution; market making; deep hedging (pfhedge); IRL |
| 22 | RAG for Financial Research | hallucination → RAG solution; structure-aware parsing; hybrid retrieval + BM25; KG-guided: +24% accuracy, -85% tokens |
| 23 | Knowledge Graphs | graph justified for multi-hop queries; LLM extraction pipeline; Graph RAG; institutional crowding features |
| 24 | Autonomous Agents | ReAct/ToT/Reflexion; explicit state + memory schema; tool contracts; multi-agent forecasting; Warden security pattern |

Part 6 — Production (Ch 25–27)

| Ch | Title | Highlights |
|---|---|---|
| 25 | Live Trading Systems | unified backtest↔live framework; IBKR/Alpaca/QuantConnect; 11-state order machine; pipeline verification |
| 26 | MLOps & Governance | technical vs statistical failure distinction; PSI/KS/SHAP drift; shadow mode; circuit breakers; MLflow/DVC/Feast |
| 27 | The Systematic Edge | process = durable edge; quant career archetypes (T-shaped); quantum/DeFi/AI ethics frontiers; learning system design |

Key Numbers from the Book

  • 9 case studies: ETF, US Equities, NASDAQ-100, CME Futures, S&P 500 Options, Crypto Perps, FX, Commodities, Firm Characteristics
  • GBM wins 6/9 case studies downstream (Sharpe); linear wins with Ridge in assets with correlated features
  • Median holdout decay ~50% across strategies
  • Backtest Sharpe: gross 1.76 → net -62.61 (NASDAQ-100 intraday case study) — cost assumptions matter enormously

🇬🇧 English

Machine Learning for Algorithmic Trading (3rd Edition) by Stefan Jansen — the third edition, due June 2026, covering 27 chapters across 6 parts from data foundations to live production deployment.

Adds five entirely new chapters (Strategy Simulation Ch 16, Portfolio Construction Ch 17, RAG Ch 22, Knowledge Graphs Ch 23, Autonomous Agents Ch 24) and an entirely new Part 6 (Production).

6-Part Structure

| Part | Theme | Chapters | Notebooks |
|---|---|---|---|
| 1 — Foundation | Data & Strategy Setup | Ch 1–6 | ~59 |
| 2 — Features | Feature Engineering | Ch 7–10 | ~42 |
| 3 — Models | ML Pipeline & Synthesis | Ch 11–15 | ~88 |
| 4 — Strategy | Backtest to Execution | Ch 16–20 | ~60 |
| 5 — Advanced AI | RL, RAG & Agents | Ch 21–24 | ~36 |
| 6 — Production | Deploy & Monitor | Ch 25–27 | ~31 |

Part 1 — Foundation

Ch 1 — The Process Is Your Edge The ML4T workflow as a 2-layer system: a stable data infrastructure plus an iterative research loop. Evidence boundary separates what can be tested from what must be assumed. Regime detection using GMM on AQR factor data produces a 1.3x volatility ratio between Risk-On and Risk-Off regimes. Causal inference and GenAI are integrated into the workflow as augmentation tools, not replacements for statistical rigor.

Ch 2 — The Financial Data Universe Eight asset classes (equities, ETFs, fixed income, commodities, FX, crypto, options, derivatives). PIT correctness and bitemporal storage as core data engineering constraints. Storage benchmarks: Parquet achieves 3.4x compression vs CSV; DuckDB excels for SQL analytics; Polars ASOF joins run 3.8x faster than pandas.
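
The ASOF join behind that Polars benchmark matches each left row to the most recent right row at or before its timestamp, never a later one, which is exactly what makes it PIT-safe. A minimal pure-Python sketch of those semantics (not the Polars or pandas implementation; the sample data is illustrative):

```python
from bisect import bisect_right

def asof_join(trade_times, quote_times, quotes):
    """For each trade timestamp, attach the latest quote at or before it.

    Backward ASOF semantics: a trade never sees a quote from its future,
    which is the point-in-time guarantee the chapter stresses.
    """
    out = []
    for t in trade_times:
        i = bisect_right(quote_times, t) - 1       # rightmost quote_time <= t
        out.append(quotes[i] if i >= 0 else None)  # None when no quote exists yet
    return out

trades = [3, 7, 12]
quote_times = [1, 5, 10]          # must be sorted, as in library ASOF joins
quotes = [100.0, 100.5, 101.0]
print(asof_join(trades, quote_times, quotes))  # [100.0, 100.5, 101.0]
```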

Ch 3 — Market Microstructure LOB reconstruction from NASDAQ TotalView-ITCH (423M messages/day, 97.6% cancellation rate, 41% within 500ms). Bar sampling comparison: dollar bars achieve JB=84.7 vs 3,838 for time bars on NVDA — dollar bars are the recommended default for ML workflows. Lee-Ready trade classification: 96% accuracy vs 84% for tick test alone.
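
Dollar bars close a bar whenever a fixed amount of traded dollar value accumulates, rather than on a clock. A minimal sketch of that sampling rule (threshold and tick values are illustrative, not from the book's NVDA study):

```python
def dollar_bars(ticks, threshold):
    """Aggregate (price, size) ticks into bars of roughly equal dollar value.

    Sampling by value traded instead of clock time is what pulls the
    bar-return distribution toward normality (the JB comparison above).
    """
    bars, bucket, value = [], [], 0.0
    for price, size in ticks:
        bucket.append(price)
        value += price * size
        if value >= threshold:          # close the bar once enough $ has traded
            bars.append({"open": bucket[0], "high": max(bucket),
                         "low": min(bucket), "close": bucket[-1],
                         "value": value})
            bucket, value = [], 0.0
    return bars

ticks = [(100.0, 50), (101.0, 60), (99.0, 200), (100.5, 10)]
print(dollar_bars(ticks, threshold=10_000))
```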

Ch 4 — Fundamental and Alternative Data Bitemporal pipeline from SEC EDGAR for point-in-time correctness. Three-stage entity resolution: deterministic (LEI/CIK/FIGI) → probabilistic (string similarity) → embedding-based. Published return predictors lose ~58% of performance post-publication (McLean & Pontiff 2016). SEC 10-K NLP pipeline: MD&A (Item 7) + Risk Factors (Item 1A).

Ch 5 — Synthetic Financial Data Classical baselines (bootstrap, GBM, GARCH) as benchmarks. GAN variants: TimeGAN TSTR ratio 1.70; Tail-GAN VaR error 102%→11.5%; Sig-CWGAN TSTR 0.97. Diffusion-TS: KS statistic 0.06, TSTR 1.00, 2.6x volatility ratio between regimes. LLM tabular generation: GReaT/distilgpt2, TSTR AUC-ROC 0.84.

Ch 6 — Strategy Research Framework Three-layer metric framework: model diagnostics / signal diagnostics / strategy outcomes. Five forms of data leakage. Walk-forward CV with temporal buffers. Baseline checkpoint (timing, coverage, trading-intensity sanity). Four-level trial taxonomy for run logging. Deflated Sharpe Ratio (DSR) as search-aware inference.
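
The walk-forward splits with temporal buffers can be sketched as a rolling window with a purge gap between train and test; function name and parameters here are illustrative:

```python
def walk_forward_splits(n_obs, train_size, test_size, gap=0):
    """Rolling walk-forward splits with a temporal buffer (purge gap).

    The gap drops observations between train and test so labels computed
    over a forward horizon cannot leak across the boundary.
    """
    splits, start = [], 0
    while start + train_size + gap + test_size <= n_obs:
        train = list(range(start, start + train_size))
        test_start = start + train_size + gap
        test = list(range(test_start, test_start + test_size))
        splits.append((train, test))
        start += test_size                  # roll forward by one test block
    return splits

for train, test in walk_forward_splits(20, train_size=8, test_size=4, gap=2):
    print(train[0], train[-1], "->", test[0], test[-1])
```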


Part 2 — Features

Ch 7 — Defining the Learning Task Label engineering: fixed-horizon vs event-style constructions, overlap diagnosis, break-even cost checks. Feature-label evaluation fold by fold. Search accounting and multiple-testing adjustments. Mechanism plausibility to distinguish stable signal from confounded proxies.
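
The break-even cost check can be folded directly into a fixed-horizon label: moves smaller than round-trip cost get a neutral class. A sketch under illustrative prices and cost:

```python
def fixed_horizon_labels(prices, horizon, cost=0.0):
    """Three-class fixed-horizon labels with a break-even cost band.

    Moves smaller than round-trip cost are labeled 0 (not tradeable),
    so the model is not asked to predict noise it could never monetize.
    """
    labels = []
    for i in range(len(prices) - horizon):
        fwd_ret = prices[i + horizon] / prices[i] - 1
        if fwd_ret > cost:
            labels.append(1)
        elif fwd_ret < -cost:
            labels.append(-1)
        else:
            labels.append(0)
    return labels

prices = [100, 101, 100.2, 103, 99, 100]
print(fixed_horizon_labels(prices, horizon=2, cost=0.005))  # [0, 1, -1, -1]
```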

Ch 8 — Financial Feature Engineering Three filters: horizon alignment, driver hypothesis (persistence/reversion/risk compensation/predictable-clock), role separation (signal vs state variable). Price-derived families: trend/momentum, reversal, volatility (Parkinson/Garman-Klass/Yang-Zhang, 5-14x efficiency gain), liquidity, microstructure. Cross-instrument: SPY-TLT correlation conditioning momentum IC with 17-percentage-point swing across regimes. Contextual: fundamentals, calendar (sin/cos), macro state. Degrees-of-freedom discipline: one knob at a time.
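
Of the range-based volatility estimators named above, Parkinson is the simplest to write down: it replaces close-to-close returns with the squared log high-low range. A sketch with illustrative sample data:

```python
from math import log, sqrt

def parkinson_vol(highs, lows, periods_per_year=252):
    """Parkinson range-based volatility estimate (annualized).

    Using the intraday high/low range instead of close-to-close returns
    is the source of the efficiency gain cited in the chapter.
    """
    n = len(highs)
    var = sum(log(h / l) ** 2 for h, l in zip(highs, lows)) / (4 * log(2) * n)
    return sqrt(var * periods_per_year)

highs = [101.2, 102.0, 100.8, 101.5]
lows  = [ 99.5, 100.4,  99.1, 100.2]
print(round(parkinson_vol(highs, lows), 4))
```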

Ch 9 — Model-Based Feature Extraction Model-based features extracted from fitted procedures rather than raw price series. Families: diagnostics/stationarity, spectral/signal transforms, volatility (GARCH), uncertainty, regime (HMM), cross-sectional/panel. Key discipline: all fitting must happen within training windows (walk-forward) to preserve PIT correctness.
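
Full GARCH fitting needs an optimizer, but the lighter RiskMetrics-style EWMA recursion shows the same causal, within-window pattern the chapter requires of every model-based feature. A stand-in sketch, not the chapter's exact feature set:

```python
from math import sqrt

def ewma_vol_feature(returns, lam=0.94):
    """EWMA variance as a causal (PIT-safe) volatility feature.

    Each output uses only returns up to and including the current bar,
    mirroring the rule that all fitting stays inside the training window.
    lam=0.94 is the classic RiskMetrics daily decay setting.
    """
    var = returns[0] ** 2                 # seed with the first squared return
    feats = []
    for r in returns:
        var = lam * var + (1 - lam) * r ** 2
        feats.append(sqrt(var))
    return feats

rets = [0.01, -0.02, 0.015, -0.03, 0.005]
print([round(v, 4) for v in ewma_vol_feature(rets)])
```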

Ch 10 — Text Feature Engineering Evolution: lexical/TF-IDF → Word2Vec/GloVe static embeddings → LSTM/GRU sequential → Transformer contextual embeddings. Self-attention resolves polysemy and long-range dependence. Modern workflow: pre-trained checkpoint → domain adaptation → task fine-tuning. PIT-safe timestamps using model cutoffs and aggregation rules.


Part 3 — Models

Ch 11 — The ML Pipeline Ridge (L2), LASSO (L1), Elastic Net as principled regularization for high-dimensional, correlated financial features. Ridge achieves 1.5x ICIR improvement over OLS at optimal regularization on ETF case study. Conformal prediction: CQR+ACI progressively closes conditional coverage gap during high-volatility periods (82.3%→88.1% for 90% target). SHAP four-layer protocol: sign consistency, magnitude plausibility, stability, regime-conditional analysis.
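
Ridge's shrinkage is easiest to see in the one-feature case, where the closed-form coefficient is the OLS numerator over a denominator inflated by lambda. A toy sketch (a real pipeline would use sklearn's Ridge on standardized features):

```python
def ridge_1d(x, y, lam):
    """Closed-form ridge coefficient for a single centered feature:
    beta = sum(x*y) / (sum(x^2) + lam). lam shrinks beta toward zero,
    trading a little bias for much lower variance on noisy features.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [-1.0, -0.5, 0.0, 0.5, 1.0]           # centered feature
y = [-0.9, -0.6, 0.1, 0.4, 1.1]           # noisy target
print(ridge_1d(x, y, lam=0.0))   # OLS solution
print(ridge_1d(x, y, lam=2.5))   # shrunk toward zero
```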

Ch 12 — Advanced Models for Tabular Data XGBoost (regularized objective, second-order approximation), LightGBM (GOSS, leaf-wise growth), CatBoost (ordered target statistics). GBMs beat linear baselines in 7-8/9 primary-label comparisons. TabM (rank-1 adapters) beats GBM on several case studies. Optuna TPE with pruning can halve computation. TreeSHAP interaction decomposition: momentum regime-conditional (collapses above 90th-percentile volatility).

Ch 13 — Deep Learning for Time Series LSTM/GRU limitations: sequential bottleneck, gradient degradation. N-BEATS: basis expansion for trend+seasonality. Critical finding (Zeng 2022): linear models outperform Transformers by 20-50% across LTSF benchmarks — Transformers largely ignore temporal order. PatchTST, iTransformer, TFT as post-critique architectures. Foundation models: TSFMs underperform tree-based models on return prediction but show promise for volatility/VaR. Cross-dataset verdict: DL rarely outperforms strong tabular baselines; crypto perps is the clearest DL win.

Ch 14 — Latent Factor Models Factor zoo problem: 400+ published factors, 65% failed replication (Hou, Xue, Zhang). PCA → IPCA (time-varying characteristic betas) → RP-PCA (pricing-error penalties) → CAE (nonlinear beta mapping) → adversarial SDF (no-arbitrage minimax). Yield curve: 3 PCA factors explain 95-99% variance. Equity latent factors: best IC ~0.073-0.074 but t-stat below Harvey-Liu-Zhu t>3.0 threshold.

Ch 15 — Causal Machine Learning DAGs for causal question formulation. Double Machine Learning (DML) for continuous treatment effect estimation with high-dimensional confounders. Bayesian Structural Time-Series (BSTS) for event impact via counterfactual baselines. Causal discovery: PCMCI, NOTEARS, VAR-LiNGAM. Distinguishing predictive signal from causal effect is a stability predictor.


Part 4 — Strategy

Ch 16 — Strategy Simulation Backtest as falsification, not verification. Six failure modes: lookahead, survivorship, data snooping, unrealistic execution, cost underestimation, regime fragility. Non-ML baseline Sharpe 0.76 fails to beat 60/40. DSR, White’s Reality Check, Rademacher Anti-Serum (RAS). Key cross-dataset finding: IC champion ≠ Sharpe champion in most case studies; rebalancing cadence mediates IC-to-Sharpe translation more than model choice.
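
The DSR used above can be sketched from the Bailey and Lopez de Prado formulas: deflate the observed Sharpe by the maximum expected from N zero-skill trials, then compute a probabilistic Sharpe against that benchmark. The inputs below (per-period Sharpe, trial-Sharpe variance, trial count, sample length) are illustrative:

```python
from math import exp, sqrt
from statistics import NormalDist

def deflated_sharpe(sr, var_trial_sr, n_trials, n_obs, skew=0.0, kurt=3.0):
    """Deflated Sharpe Ratio sketch (Bailey & Lopez de Prado).

    Returns the probability that the observed per-period Sharpe `sr`
    exceeds the best Sharpe expected from `n_trials` zero-skill tries.
    """
    nd = NormalDist()
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    # Expected max Sharpe under the null, given trial-to-trial variance
    sr0 = sqrt(var_trial_sr) * ((1 - gamma) * nd.inv_cdf(1 - 1 / n_trials)
                                + gamma * nd.inv_cdf(1 - 1 / (n_trials * exp(1))))
    # Probabilistic Sharpe of sr against the deflated benchmark sr0
    z = ((sr - sr0) * sqrt(n_obs - 1)
         / sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2))
    return nd.cdf(z)

# daily Sharpe 0.2 over 1,000 days, best of 10 trials
print(round(deflated_sharpe(0.2, 0.01, 10, 1000), 3))
```

The same Sharpe deflates much harder as the trial count grows, which is why run logging and search accounting feed directly into this number.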

Ch 17 — Portfolio Construction Fundamental Law of Active Management: IC=0.03 still useful with sufficient breadth. Equal-weight famously hard to beat (DeMiguel, Garlappi, Uppal 2009). Kelly criterion → fractional Kelly (half/quarter sizing). HRP: agglomerative clustering + recursive bisection, avoids matrix inversion. No universal winner across allocators — depends on trading environment.
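
Fractional Kelly is a one-liner in the continuous approximation, where full Kelly leverage is mean over variance. A sketch with illustrative return and volatility inputs:

```python
def kelly_fraction(mu, sigma, fraction=0.5):
    """Continuous-time Kelly leverage f* = mu / sigma^2, scaled down.

    fraction=0.5 is half Kelly: about 3/4 of full-Kelly growth for half
    the variance, and far more forgiving of estimation error in mu.
    """
    full_kelly = mu / sigma ** 2
    return fraction * full_kelly

# e.g. 6% expected excess return at 15% vol
print(round(kelly_fraction(0.06, 0.15, fraction=1.0), 2))   # full Kelly
print(round(kelly_fraction(0.06, 0.15, fraction=0.5), 2))   # half Kelly
```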

Ch 18 — Transaction Costs Cost taxonomy: explicit (commissions, financing, borrow, taxes) / implicit (spread, slippage, impact) / capacity costs. Range: <1 bps liquid ETFs to >100 bps illiquid options. Square-root impact model has strong empirical support. TWAP, VWAP, adaptive participation, Almgren-Chriss optimal execution. Alpha-to-go: fast-decaying signals may lose most value before positions are fully established.
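
A minimal sketch of half-spread plus square-root impact; the scaling constant `k` is an empirically fitted parameter (often near 1) and all inputs here are illustrative:

```python
from math import sqrt

def expected_cost_bps(spread_bps, daily_vol, order_size, adv, k=1.0):
    """Expected one-way cost in basis points: half-spread + sqrt impact.

    impact ~ k * sigma_daily * sqrt(order_size / ADV): doubling order
    size raises impact only ~41%, but costs grow with participation.
    """
    impact_bps = k * daily_vol * sqrt(order_size / adv) * 1e4
    return spread_bps / 2 + impact_bps

# 2 bps spread, 2% daily vol, trading 1% of average daily volume
print(round(expected_cost_bps(2.0, 0.02, 10_000, 1_000_000), 2))  # 21.0
```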

Ch 19 — Risk Management Seven risk categories: market, factor, leverage, concentration, liquidity/capacity, model, operational. VaR/CVaR + regime-conditional estimates. Drawdown: Ulcer Index integrates depth and duration. Factor decomposition: market beta increases in volatile regimes when it’s most costly. Adaptive controls: GARCH/EWMA targeting, STVU. Graduated kill switches: watch at 5%→terminate at 30% drawdown.
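
Graduated kill switches are just an ordered mapping from drawdown to escalating actions. A sketch using the chapter's watch-at-5% / terminate-at-30% span; the intermediate levels and action names are illustrative:

```python
def kill_switch_action(drawdown, levels=None):
    """Map current drawdown to an escalating action.

    Thresholds span the chapter's watch-at-5% to terminate-at-30% range;
    the intermediate levels here are illustrative, not the book's exact ones.
    """
    if levels is None:
        levels = [(0.30, "terminate"), (0.20, "halve_positions"),
                  (0.10, "halt_new_trades"), (0.05, "watch")]
    for threshold, action in levels:      # checked from most severe down
        if drawdown >= threshold:
            return action
    return "normal"

for dd in (0.03, 0.07, 0.25, 0.35):
    print(dd, "->", kill_switch_action(dd))
```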

Ch 20 — Strategy Synthesis Nine case study verdicts: advance (US firm characteristics, FX), iterate (ETFs, NASDAQ-100), reframe (CME, S&P options, crypto). Key finding: NASDAQ-100 has weakest IC (0.008) but highest Sharpe (4.22). Median holdout Sharpe decay ~50% across studies. GBM is downstream champion in 6/9 studies. Cost-survival tiers: US firm characteristics survives above 100 bps; S&P options is negative at zero friction.


Part 5 — Advanced AI

Ch 21 — Reinforcement Learning RL’s comparative advantage: execution, market making, hedging (not alpha discovery). MDP formulation: state space, continuous action spaces, reward engineering. PPO for execution (modest improvement over TWAP), SAC for market making. Deep Hedging via pfhedge: no-transaction bands emerge from cost-aware policies. Inverse RL for reward inference from order flow. Key risk: simulation-to-reality gap (non-stationarity, impact reflexivity).

Ch 22 — RAG for Financial Research Hallucination is unacceptable in finance → RAG as architectural response. Structure-aware parsing (LlamaParse, Docling, Marker) vs naive fixed-size chunking. Domain-specific embeddings (Voyage AI finance, Fin-E5): FinMTEB benchmark shows consistent gap vs general models. Hybrid retrieval: semantic + BM25 via Reciprocal Rank Fusion. Re-ranking with cross-encoders. KG-guided retrieval: +24% correctness, -85% token consumption vs page-window retrieval (FinReflectKG-MultiHop). Retrieve-extract-compute-narrate for numeric questions.
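
Reciprocal Rank Fusion, the step that merges the semantic and BM25 rankings above, is short enough to sketch in full (the document IDs are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine ranked lists via RRF: score(d) = sum over lists of 1/(k + rank).

    k=60 is the constant from the original RRF paper; a document missing
    from one list simply contributes nothing for that retriever.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]     # dense-embedding ranking
bm25     = ["doc_b", "doc_d", "doc_a"]     # lexical ranking
print(reciprocal_rank_fusion([semantic, bm25]))
```

doc_b wins because it ranks highly in both lists, even though neither retriever put it first and last together; that consensus effect is the point of fusion.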

Ch 23 — Knowledge Graphs Graph justified for: multi-hop dependency queries, structural crowding analysis, temporal relationship evolution. Not justified for: single-entity lookups, narrative synthesis, sparse graphs. Five-stage LLM extraction pipeline with governance-first approach. Three-timestamp model (event/disclosure/extraction time) — disclosure time is the PIT visibility gate. GNNs: fraud detection production-ready; alpha generation experimental. Start with hand-crafted graph features.

Ch 24 — Autonomous Agents ReAct (auditable loops) → Tree of Thoughts (parallel hypothesis exploration) → Reflexion (post-run critique). Explicit three-tier memory: working / session / persistent. Tool contracts as primary quality determinant. Context engineering: expose only phase-appropriate tools and PIT-consistent evidence. Warden security pattern: policy proxy with allowlists. Multi-agent forecasting: Neyman extremization + Platt calibration. Scope: read-only research agents (L1 decision support), not order execution.


Part 6 — Production

Ch 25 — Live Trading Systems Technical divergence between backtest and live is the primary self-inflicted failure mode. Unified framework: same strategy code in ml4t-backtest and ml4t-live. Brokers: IBKR (SmartRouting, no PFOF), Alpaca (commission-free REST API), QuantConnect (LEAN engine). Order lifecycle: 11-state machine with 23 valid transitions. Pipeline verification: feed identical inputs through both systems and compare at each stage. Crypto case study: LightGBM classifier deployed to OKX with prediction-flip exits.
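
The order state machine reduces to a transition table plus a guard that rejects impossible lifecycle jumps. A minimal subset for illustration, not the book's full 11-state, 23-transition machine:

```python
# A small subset of an order lifecycle; state names are illustrative.
VALID_TRANSITIONS = {
    "new":              {"submitted", "rejected"},
    "submitted":        {"acknowledged", "rejected"},
    "acknowledged":     {"partially_filled", "filled", "cancelled"},
    "partially_filled": {"partially_filled", "filled", "cancelled"},
    "filled":           set(),            # terminal
    "cancelled":        set(),            # terminal
    "rejected":         set(),            # terminal
}

def transition(state, new_state):
    """Reject impossible lifecycle jumps instead of silently accepting them."""
    if new_state not in VALID_TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = "new"
for nxt in ("submitted", "acknowledged", "partially_filled", "filled"):
    s = transition(s, nxt)
print(s)  # filled
```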

Ch 26 — MLOps and Governance Technical failure (pipeline divergence) vs statistical failure (model decay) — requires different diagnostics. Three drift types: data drift (PSI/KS), feature drift (SHAP monitoring), concept drift (ADWIN/DDM). Shadow mode evaluation before champion-challenger promotion. Minimum effect size: 0.2-0.3 Sharpe improvement required for promotion. Four-level circuit breakers: trade / strategy / portfolio / system. MLOps stack: Feast (feature store), DVC (data versioning), MLflow (model registry, SR 11-7 compliance).
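
The PSI drift check above can be sketched over pre-binned proportions; the common rule of thumb (below 0.1 stable, above 0.25 investigate) and the sample distributions are illustrative:

```python
from math import log

def psi(expected_props, actual_props, eps=1e-6):
    """Population Stability Index over pre-binned proportions.

    PSI = sum over bins of (a - e) * ln(a / e). Rule of thumb: <0.1
    stable, 0.1-0.25 moderate drift, >0.25 investigate. eps guards
    against empty bins.
    """
    total = 0.0
    for e, a in zip(expected_props, actual_props):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * log(a / e)
    return total

train_dist = [0.25, 0.25, 0.25, 0.25]     # reference (training) bins
live_dist  = [0.10, 0.20, 0.30, 0.40]     # drifted live distribution
print(round(psi(train_dist, train_dist), 4))  # 0.0
print(round(psi(train_dist, live_dist), 4))
```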

Ch 27 — The Systematic Edge Process is the durable edge. Five quant archetypes: researcher, trader, developer, portfolio manager, risk manager. Quantamental roles (systematic + fundamental) as the dominant industry trend. T-shaped expertise. Frontiers: quantum computing (mid-2030s for meaningful advantage), DeFi (live alpha today from on-chain data, AMMs), AI ethics (EU AI Act now a compliance requirement). Burnout as professional risk. Four career failure modes: over-specialization, underestimating soft skills, ignoring regulation, perpetual learning without application.


Cross-Dataset Key Numbers

| Metric | Value | Source |
|---|---|---|
| Gross Sharpe (NASDAQ-100 intraday) | +1.76 | Ch 16 |
| Net Sharpe (NASDAQ-100 intraday) | -62.61 | Ch 16 |
| Median holdout decay | ~50% | Ch 20 |
| GBM wins (downstream Sharpe) | 6/9 case studies | Ch 20 |
| DSR adjustments materially change conclusions | several candidates | Ch 16 |
| US firm char: validation Sharpe | +3.03 | Ch 20 |
| US firm char: holdout Sharpe | +2.52 | Ch 20 |
| FX: only study where holdout > validation | — | Ch 20 |