ML4T Book 2nd Edition

🇹🇭 Thai summary (translated)

Machine Learning for Algorithmic Trading (2nd Edition) by Stefan Jansen
Published July 2020 | Packt Publishing | 858 pages | 23 chapters + appendix

The 3rd Edition is coming (June 2026) as part of ML4T Platform: it expands to 27 chapters and adds GenAI, causal inference, and production MLOps.

ML4T Workflow (Central Framework)

Data → Features → ML Model → Signal → Backtest → Portfolio → Live

Chapter Structure

Part 1 — Data (Ch 1–5)

| Ch | Title | Key Content |
|---|---|---|
| 1 | ML for Trading – From Idea to Execution | ML4T workflow, use cases, strategy lifecycle |
| 2 | Market and Fundamental Data | ITCH feed, tick→bars, pandas-datareader |
| 3 | Alternative Data for Finance | Categories, evaluation criteria, web scraping |
| 4 | Financial Feature Engineering | Alpha factors, TA-Lib, Kalman filter, Alphalens |
| 5 | Portfolio Optimization and Performance Evaluation | Sharpe, HRP, pyfolio |

Part 2 — ML Foundations (Ch 6–8)

| Ch | Title | Key Content |
|---|---|---|
| 6 | The Machine Learning Process | Bias-variance, cross-validation, purging/embargoing |
| 7 | Linear Models | OLS, ridge, lasso, Fama-French, logistic regression |
| 8 | The ML4T Workflow – From Model to Strategy Backtesting | backtrader, Zipline Pipeline API |

Part 3 — Classical ML (Ch 9–13)

| Ch | Title | Key Content |
|---|---|---|
| 9 | Time-Series Models | ARIMA, GARCH, VAR, cointegration, pairs trading |
| 10 | Bayesian ML | PyMC3, Bayesian Sharpe ratio, rolling regression |
| 11 | Random Forests | Decision trees, RF, long-short Japanese stocks, LightGBM |
| 12 | Boosting | GBM, XGBoost, LightGBM, CatBoost, SHAP |
| 13 | Unsupervised Learning | PCA, clustering, HRP portfolio |

Part 4 — NLP (Ch 14–16)

| Ch | Title | Key Content |
|---|---|---|
| 14 | Text Data for Trading – Sentiment Analysis | spaCy, TF-IDF, naive Bayes |
| 15 | Topic Modeling | LDA (sklearn + Gensim), earnings call topics |
| 16 | Word Embeddings | word2vec, GloVe, doc2vec, BERT intro |

Part 5 — Deep Learning (Ch 17–21)

| Ch | Title | Key Content |
|---|---|---|
| 17 | Deep Learning for Trading | Feedforward NN, TF2, PyTorch, long-short strategy |
| 18 | CNNs | LeNet5, transfer learning, 1D conv for time series |
| 19 | RNNs | LSTM, GRU, multivariate time series, SEC filings |
| 20 | Autoencoders | VAE, conditional autoencoder for asset pricing |
| 21 | GANs | TimeGAN, synthetic financial time series |

Part 6 — RL + Conclusions (Ch 22–23)

| Ch | Title | Key Content |
|---|---|---|
| 22 | Deep Reinforcement Learning | DDQN, OpenAI Gym, custom TradingEnvironment |
| 23 | Conclusions and Next Steps | Key lessons, backtest overfitting, platform comparison |

Appendix: 100+ alpha factors in TA-Lib, WorldQuant formulaic alphas

Key Concepts

| Concept | Meaning |
|---|---|
| IC (Information Coefficient) | Spearman rank correlation between predicted and actual returns |
| Lookahead Bias | Unintentionally using future data |
| Deflated Sharpe Ratio | Sharpe ratio adjusted for multiple testing |
| Purging/Embargoing | Cross-validation technique for time series |
| HRP | Hierarchical Risk Parity: portfolio construction with clustering |

🇬🇧 English

Machine Learning for Algorithmic Trading (2nd Edition) by Stefan Jansen
Published July 2020 | Packt Publishing | 858 pages | 23 chapters + appendix | 400+ notebooks

The 3rd Edition is coming (June 2026) as part of ML4T Platform — expands to 27 chapters, adds GenAI, causal inference, and production MLOps.

The ML4T Workflow (Central Framework)

Data → Features → ML Model → Signal → Backtest → Portfolio → Live
         ↑                                   ↓
         └───────── learn from results ───────┘

Every chapter applies this workflow to a different ML approach or data type.
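The stages of the workflow can be traced in a toy example. This is a minimal sketch, not code from the book: the synthetic returns, the 20-day lookback, and the sign-of-momentum rule (standing in for a fitted ML model) are all assumptions chosen for illustration, and the portfolio, live, and feedback stages are omitted.

```python
import numpy as np

# Data: synthetic daily returns (an assumption; real data would come from a vendor)
rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=500)
lookback = 20  # hypothetical feature window

# Features: trailing mean return using only data up to and including day t
momentum = np.full(returns.shape, np.nan)
for t in range(lookback - 1, len(returns)):
    momentum[t] = returns[t - lookback + 1 : t + 1].mean()

# ML Model -> Signal: a placeholder rule (sign of momentum) instead of a trained model
signal = np.sign(np.nan_to_num(momentum))

# Backtest: the position held on day t+1 is the signal computed on day t,
# so no same-day information leaks into the trade
positions = np.zeros_like(signal)
positions[1:] = signal[:-1]
strategy_returns = positions * returns

# Evaluation: annualized Sharpe ratio, assuming 252 trading days and zero risk-free rate
sharpe = np.sqrt(252) * strategy_returns.mean() / strategy_returns.std()
print(f"Annualized Sharpe: {sharpe:.2f}")
```

The one-day lag between `signal` and `positions` is the design choice that matters here: dropping it would introduce lookahead bias into the backtest.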

Complete Chapter Structure

Part 1 — Data (Ch 1–5)

| Ch | Title | Key Content |
|---|---|---|
| 1 | ML for Trading – From Idea to Execution | ML4T workflow overview, use cases, strategy lifecycle |
| 2 | Market and Fundamental Data | Nasdaq ITCH feed, tick→bars (time/volume/dollar), pandas-datareader, XBRL |
| 3 | Alternative Data for Finance | Categories (individuals/business/sensors/satellites), evaluation criteria, web scraping |
| 4 | Financial Feature Engineering | Alpha factors: momentum, value, volatility, quality; TA-Lib; Kalman filter; Alphalens |
| 5 | Portfolio Optimization and Performance Evaluation | Sharpe ratio, mean-variance, Black-Litterman, Kelly criterion, HRP, pyfolio |
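The tick-to-bar aggregation listed for Chapter 2 can be illustrated with dollar bars, which close a bar each time a fixed dollar amount has traded. This is a minimal sketch under stated assumptions: the `price` and `volume` column names, the `dollar_bars` function name, and the bar size are hypothetical, and the tick that crosses the threshold is assigned to the next bar, which is only one of several possible conventions.

```python
import pandas as pd

def dollar_bars(ticks: pd.DataFrame, bar_size: float) -> pd.DataFrame:
    """Aggregate ticks into OHLCV bars of roughly `bar_size` traded dollars."""
    dollar_value = ticks["price"] * ticks["volume"]
    # Cumulative traded value divided by the bar size gives a bar label per tick
    bar_id = (dollar_value.cumsum() // bar_size).astype(int).rename("bar")
    grouped = ticks.groupby(bar_id)
    bars = grouped["price"].agg(open="first", high="max", low="min", close="last")
    bars["volume"] = grouped["volume"].sum()
    return bars

# Made-up ticks for illustration only
ticks = pd.DataFrame({
    "price": [10, 11, 12, 11, 10, 9],
    "volume": [100, 100, 100, 100, 100, 100],
})
bars = dollar_bars(ticks, bar_size=2500)
print(bars)
```

With these six ticks and a bar size of 2,500 dollars, the ticks fall into three bars of two ticks each.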

Part 2 — ML Foundations (Ch 6–8)

| Ch | Title | Key Content |
|---|---|---|
| 6 | The Machine Learning Process | Supervised/unsupervised/RL overview; bias-variance tradeoff; cross-validation; purging/embargoing |
| 7 | Linear Models | OLS, ridge, lasso, CAPM→Fama-French factor models, logistic regression, predict returns |
| 8 | The ML4T Workflow – From Model to Strategy Backtesting | Backtest pitfalls (lookahead/survivorship/outlier); backtrader; Zipline Pipeline API |
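Purging and embargoing, listed for Chapter 6, can be sketched as a simplified splitter over time-ordered samples. The function name `purged_kfold_indices` and its parameters are hypothetical, not the book's code: `purge` drops training samples just before each test fold (whose forward-looking labels would overlap it), and `embargo` drops training samples just after it.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, purge=0, embargo=0):
    """Yield (train_idx, test_idx) for contiguous, time-ordered test folds.

    Training samples within `purge` steps before the test fold and within
    `embargo` steps after it are excluded to limit leakage.
    """
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_splits):
        start, stop = test_idx[0], test_idx[-1] + 1
        keep = (indices < start - purge) | (indices >= stop + embargo)
        yield indices[keep], test_idx

# Usage: 20 samples, 4 folds, 2-sample purge, 1-sample embargo
for train_idx, test_idx in purged_kfold_indices(20, n_splits=4, purge=2, embargo=1):
    print("test:", test_idx, "train:", train_idx)
```

In practice the purge length would be tied to the label horizon (for example, a 5-day forward return suggests purging at least 5 samples); the values above are arbitrary.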

Part 3 — Classical ML (Ch 9–13)

| Ch | Title | Key Content |
|---|---|---|
| 9 | Time-Series Models | ARIMA, SARIMAX, ARCH/GARCH, VAR, cointegration, pairs trading backtest |
| 10 | Bayesian ML | PyMC3, MAP/MCMC/variational inference; Bayesian Sharpe ratio; rolling regression for pairs |
| 11 | Random Forests | Decision trees, bagging, RF; long-short Japanese stocks with LightGBM; Alphalens evaluation |
| 12 | Boosting Your Trading Strategy | AdaBoost, GBM, XGBoost, LightGBM, CatBoost; SHAP values; intraday strategy |
| 13 | Unsupervised Learning for Risk Factors | PCA, ICA, t-SNE, UMAP; k-means, hierarchical, DBSCAN clustering; HRP portfolio |

Part 4 — NLP (Ch 14–16)

| Ch | Title | Key Content |
|---|---|---|
| 14 | Text Data for Trading – Sentiment Analysis | NLP pipeline (spaCy, TextBlob); TF-IDF; naive Bayes on news and Yelp data |
| 15 | Topic Modeling | LSI, pLSA, LDA (sklearn + Gensim); earnings call topic modeling |
| 16 | Word Embeddings | word2vec, GloVe, doc2vec; BERT/transformer intro; SEC filings for return prediction |

Part 5 — Deep Learning (Ch 17–21)

| Ch | Title | Key Content |
|---|---|---|
| 17 | Deep Learning for Trading | Feedforward NN, activation functions, dropout, SGD/Adam; TF2 and PyTorch; long-short strategy |
| 18 | CNNs for Financial Time Series | LeNet5, AlexNet, VGG16 transfer learning; 1D convolutions; CNN-TA clustering |
| 19 | RNNs for Multivariate Time Series | LSTM, GRU, bidirectional RNN; S&P500 regression; multivariate macro; SEC filing sentiment |
| 20 | Autoencoders | Feedforward/conv/denoising autoencoders; VAE; conditional autoencoder for asset pricing |
| 21 | GANs for Synthetic Time-Series Data | DCGAN, conditional GAN; TimeGAN (train on synthetic, test on real) |

Part 6 — RL + Conclusions (Ch 22–23)

| Ch | Title | Key Content |
|---|---|---|
| 22 | Deep Reinforcement Learning | MDP, value iteration, Q-learning, DDQN; OpenAI Gym; custom TradingEnvironment |
| 23 | Conclusions and Next Steps | Key lessons: data quality, bias-variance, backtest overfitting, platform comparison |

Appendix — Alpha Factor Library: 100+ factors in TA-Lib (moving averages, momentum, volume, volatility) + WorldQuant formulaic alphas (Alpha001, Alpha054).

Key Concepts Across the Book

| Concept | Description |
|---|---|
| IC (Information Coefficient) | Spearman rank correlation between predicted and actual returns; the primary signal quality metric |
| Lookahead Bias | Accidentally using future information in features; causes unrealistically good backtests |
| Deflated Sharpe Ratio | Sharpe ratio adjusted for multiple testing; guards against backtest overfitting |
| Purging/Embargoing | Cross-validation technique for time series; prevents leakage between train and test |
| Alpha Factor | A signal expected to predict returns before being arbitraged away |
| HRP | Hierarchical Risk Parity: portfolio construction using clustering instead of matrix inversion |
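The information coefficient defined above is the Pearson correlation of the ranks of predicted and realized returns. A minimal sketch with made-up numbers follows; the helper names are hypothetical, and the simple ranking used here assumes no tied values (a library routine such as `scipy.stats.spearmanr` averages ties instead).

```python
import numpy as np

def rank(values):
    """Ranks 1..n for an array without ties (ties are not averaged here)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def information_coefficient(predicted, actual):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank(predicted), rank(actual))[0, 1])

# Illustrative values only: five assets, one period
predicted = np.array([0.02, -0.01, 0.03, 0.00, -0.02])
actual = np.array([0.005, -0.02, 0.05, 0.015, -0.03])

ic = information_coefficient(predicted, actual)
print(round(ic, 2))  # 0.9
```

Only the ordering of the forecasts matters for the IC, so a model that ranks assets well scores highly even if its return magnitudes are off.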

Main Tools Used

Data & Features: pandas, NumPy, TA-Lib, Quandl, yfinance, Zipline bundles
ML: scikit-learn, statsmodels, PyMC3, LightGBM, XGBoost, CatBoost
Deep Learning: TensorFlow 2, PyTorch, Keras
Backtesting: backtrader, Zipline, Alphalens, pyfolio
NLP: spaCy, Gensim, TextBlob