ML4T Trading Approaches

🇹🇭 ภาษาไทย

สรุปแนวทางการเทรดจาก ML4T Book 2nd Edition (Stefan Jansen, 2020) เพื่อใช้เป็น roadmap ในการพัฒนา systematic trading strategy

Framework กลาง: ML4T Workflow

หนังสือไม่ได้ให้ “สูตรสำเร็จ” แต่สอน กระบวนการ — ทุก strategy ทุก model ถูก apply ผ่าน workflow เดียวกัน:

ไอเดีย → ข้อมูล → Features → Model → Signal → Backtest → Portfolio → Live
              ↑                                     ↓
              └──────────── เรียนรู้จากผลลัพธ์ ────────┘

หัวใจของ workflow: ไม่มี step ไหนที่ข้ามได้ — โดยเฉพาะ Backtest ที่ต้องทำอย่างถูกต้องก่อน deploy จริง

แนวทางการเทรดที่ครอบคลุมใน 23 Chapters

🟢 ระดับเริ่มต้น — ไม่ต้องใช้ ML ก็ได้

1. Alpha Factors (Ch 4 + Appendix)

Signal พื้นฐานที่ใช้ predict returns มี 4 หมวดหลัก:

หมวด	ตัวอย่าง Signal	ทำไมถึง work
Momentum	12-month return, RSI, MACD	trend ยังคงอยู่หลังจากเริ่ม
Value	P/E, P/B, earnings yield	mean reversion ในระยะยาว
Volatility/Size	Beta, market cap, ATR	risk premium ที่ตลาดให้
Quality	ROE, gross margin, debt/equity	บริษัทดีมักทำกำไรได้ต่อเนื่อง

วัดคุณภาพ signal ด้วย IC (Information Coefficient) = Spearman rank correlation ระหว่าง signal vs. actual return

Appendix มี 100+ alpha factors สำเร็จรูปใน TA-Lib พร้อม code

2. Statistical Arbitrage / Pairs Trading (Ch 9)

หา 2 assets ที่ cointegrated → ราคาเดินคู่กันระยะยาว
→ เมื่อ spread ห่างออก → Long ตัวถูก, Short ตัวแพง
→ เมื่อ spread กลับมา → ปิด position

ใช้ Engle-Granger / Johansen test ตรวจสอบ cointegration → backtest ด้วย backtrader

🟡 ระดับกลาง — Classical ML

3. Long-Short Equity ด้วย Tree-based Models (Ch 11–12)

Strategy หลักที่หนังสือสาธิต:

ใช้ LightGBM / XGBoost predict 1-month return ของหุ้นแต่ละตัว
Long หุ้น top quintile (predict สูงสุด)
Short หุ้น bottom quintile (predict ต่ำสุด)
ประเมิน signal ด้วย Alphalens, portfolio ด้วย pyfolio

Feature ที่ใช้: lagged returns, technical indicators, volume patterns

ข้อดี: SHAP values ช่วยอธิบายว่า model ตัดสินใจเพราะอะไร → ไม่ใช่ black box

4. Intraday Strategy (Ch 12)

ใช้ GBM กับ minute-frequency equity data:

Engineer features จาก microstructure (bid-ask spread, order imbalance)
Predict direction ของ bar ถัดไป
ต้องการ low latency + realistic transaction cost model

5. Volatility Forecasting (Ch 9)

ใช้ GARCH model predict volatility:

ประยุกต์ใช้กับ options pricing
ใช้ใน risk management (position sizing, stop loss)
VAR model สำหรับ multi-asset volatility

6. Portfolio Optimization ด้วย HRP (Ch 5, 13)

Hierarchical Risk Parity — portfolio construction ที่ใช้ clustering แทน matrix inversion:

assets → hierarchical clustering → allocate risk evenly ตาม tree structure

ข้อดี: ไม่ต้อง invert covariance matrix → stable กว่า mean-variance optimization มาก

🔴 ระดับสูง — Deep Learning & Alternative Data

7. Sentiment Trading จาก Text (Ch 14–16)

3 ระดับความซับซ้อน:

Bag-of-words + naive Bayes (Ch 14) — เริ่มต้นง่าย, เร็ว
Topic Modeling / LDA (Ch 15) — หา themes ใน earnings calls
Word Embeddings / BERT (Ch 16) — state-of-the-art, predict earnings surprises จาก SEC filings

Data sources: earnings call transcripts, news articles, SEC 10-K/10-Q filings

8. Return Prediction ด้วย Deep Learning (Ch 17–19)

Model	ใช้กับ	สาระสำคัญ
Feedforward NN (Ch 17)	daily stock returns	หลาย hidden layers เรียนรู้ nonlinear patterns
CNN (Ch 18)	time series แปลงเป็น image (CNN-TA)	convolutional filters จับ local patterns
LSTM/RNN (Ch 19)	multivariate time series	จำ long-range dependencies ใน sequence

9. Conditional Risk Factors ด้วย Autoencoder (Ch 20)

สร้าง risk factors ที่ขึ้นกับ stock characteristics → ใช้ใน asset pricing model ที่ dynamic กว่า Fama-French

10. Reinforcement Learning Trading Agent (Ch 22)

สร้าง agent ที่ตัดสินใจ Buy / Sell / Hold โดย optimize long-run cumulative reward:

State (ราคา, indicators) → Agent (DDQN) → Action → Reward → เรียนรู้

ใช้ OpenAI Gym สร้าง custom TradingEnvironment พร้อม position tracking และ P&L

3 บทเรียนที่สำคัญที่สุด

1. Data คือทุกอย่าง Signal ดีมาจากข้อมูลดี — point-in-time data สำคัญมาก เพราะ lookahead bias ทำให้ backtest ดูดีเกินจริงเสมอ ต้องใช้ข้อมูลที่รู้ได้ ณ เวลานั้นจริงๆ เท่านั้น

2. Backtest ≠ ความจริง ถ้า backtest ดูดีเกินไป — มักเป็น overfitting ใช้ Deflated Sharpe Ratio (DSR) และ purging/embargoing ใน cross-validation เพื่อ validate ได้จริง

3. Process > Model ไม่มี model ไหนที่ “ดีที่สุด” ตลอดกาล สิ่งที่สำคัญกว่าคือ workflow ที่ถูกต้อง discipline ใน evaluation และ domain expertise ที่บอก signal จาก noise ได้

Roadmap การเริ่มต้น (แนะนำ)

Step 1: Alpha Factors (Ch 4)
        เข้าใจว่า signal มาจากไหน และวัดด้วย IC
              ↓
Step 2: Backtest ด้วย backtrader (Ch 8)
        เรียนรู้ pitfalls ก่อนใช้เงินจริง
              ↓
Step 3: Long-Short ด้วย LightGBM (Ch 11–12)
        Classical ML ที่ใช้ได้จริง + interpretable ด้วย SHAP
              ↓
Step 4: เพิ่ม NLP signal (Ch 14–16)
        ใช้ alternative data เพิ่ม edge
              ↓
Step 5: Deep Learning / RL (Ch 17–22)
        เมื่อพร้อม — complexity สูง ต้องการ compute และ data มากขึ้น

ความเชื่อมโยงกับ Tools ใน Wiki นี้

Tool	นำไปต่อยอดได้อย่างไร
TradingView MCP	ดู chart + indicator values → ใช้เป็น input ให้ model หรือ validate signal ด้วยตา
ML4T Platform	3rd edition เพิ่ม RAG & Agents, GenAI, causal inference → เพิ่ม edge ด้วย LLM
MiroFish	Swarm simulation → ทดสอบ market scenario ก่อน deploy strategy

🇬🇧 English

A summary of the trading approaches in Machine Learning for Algorithmic Trading (ML4T), 2nd edition (Stefan Jansen, 2020), organized as a practical roadmap for developing systematic trading strategies.

Central Framework: The ML4T Workflow

The book teaches a process, not a formula. Every strategy and model is applied through the same workflow:

Idea → Data → Features → Model → Signal → Backtest → Portfolio → Live
         ↑                                      ↓
         └────────── learn from results ──────────┘

No step can be skipped — especially Backtesting, which must be done correctly before any live deployment.

Trading Approaches Covered in 23 Chapters

🟢 Beginner — No ML Required

1. Alpha Factors (Ch 4 + Appendix)

Core signals used to predict returns, organized into 4 categories:

Category	Example Signals	Why They Work
Momentum	12-month return, RSI, MACD	Trends persist after they begin
Value	P/E, P/B, earnings yield	Mean reversion over the long run
Volatility/Size	Beta, market cap, ATR	Risk premiums the market assigns
Quality	ROE, gross margin, debt/equity	Good companies tend to stay good

Signal quality is measured by IC (Information Coefficient) = Spearman rank correlation between the signal and actual forward returns.
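
As a rough illustration of the IC definition above, the sketch below computes a Spearman rank correlation using only the standard library. The function names and the sample numbers are made up for this example; in practice a library routine such as scipy.stats.spearmanr does the same job.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficient(signal, forward_returns):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(signal), ranks(forward_returns))


# Hypothetical cross-section: one signal value and one forward return per stock.
signal = [0.8, 0.1, 0.5, -0.3, 0.9]
forward_returns = [0.04, -0.01, 0.02, -0.02, 0.03]
ic = information_coefficient(signal, forward_returns)
print(round(ic, 2))  # 0.9
```

An IC near zero means the signal's ranking carries little information about the ranking of subsequent returns.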

The Appendix provides 100+ ready-to-use alpha factors with code, including indicators computed with TA-Lib.

2. Statistical Arbitrage / Pairs Trading (Ch 9)

Find 2 cointegrated assets → prices move together long-term
→ When spread widens → Long the cheap one, Short the expensive one
→ When spread reverts → Close the position

Use Engle-Granger / Johansen tests for cointegration, then backtest with backtrader.
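
The entry and exit logic above can be sketched as a z-score rule on the spread. This is a minimal sketch, not the book's implementation: the hedge ratio is assumed to come from a separate cointegration regression, the thresholds (2.0 and 0.5) are arbitrary illustration values, and the z-score here uses full-sample statistics, which a real backtest would replace with a rolling window to avoid lookahead.

```python
from statistics import mean, stdev


def spread_zscores(prices_a, prices_b, hedge_ratio):
    """Z-score of the spread A - hedge_ratio * B.

    Uses full-sample mean and standard deviation for brevity; a backtest
    should use a trailing window so no future data leaks into the signal.
    """
    spread = [a - hedge_ratio * b for a, b in zip(prices_a, prices_b)]
    mu, sigma = mean(spread), stdev(spread)
    return [(s - mu) / sigma for s in spread]


def pair_positions(zscores, entry=2.0, exit_=0.5):
    """Position in the spread per bar.

    +1 = long A / short B (spread unusually low),
    -1 = short A / long B (spread unusually high),
     0 = flat.
    """
    position, positions = 0, []
    for z in zscores:
        if position == 0:
            if z > entry:
                position = -1
            elif z < -entry:
                position = 1
        elif abs(z) < exit_:
            position = 0  # spread has reverted, close the trade
        positions.append(position)
    return positions


# Hypothetical z-score path: widen, revert, widen the other way, revert.
example = pair_positions([0.0, 2.5, 1.0, 0.3, -2.2, -1.0, 0.1])
print(example)  # [0, -1, -1, 0, 1, 1, 0]
```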

🟡 Intermediate — Classical ML

3. Long-Short Equity with Tree-Based Models (Ch 11–12)

The main strategy demonstrated in the book:

Use LightGBM / XGBoost to predict 1-month returns for each stock
Long the top quintile (highest predicted return)
Short the bottom quintile (lowest predicted return)
Evaluate signal quality with Alphalens; portfolio performance with pyfolio

Features used: lagged returns, technical indicators, volume patterns.

SHAP values show how much each feature contributes to an individual prediction, so the model is less of a black box.
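
The portfolio construction step of this strategy can be sketched independently of the model. The example below assumes predicted returns are already available (from LightGBM or any other model) and uses made-up tickers; equal dollar-neutral weighting is one simple choice for illustration, not necessarily the weighting used in the book.

```python
def long_short_quintiles(predictions):
    """Split tickers into top and bottom quintile by predicted return.

    predictions: dict mapping ticker -> predicted return.
    Assumes at least 5 tickers so the two quintiles do not overlap.
    """
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    n = max(1, len(ranked) // 5)
    return ranked[:n], ranked[-n:]


def equal_weights(longs, shorts):
    """Dollar-neutral weights: +50% spread over longs, -50% over shorts."""
    weights = {ticker: 0.5 / len(longs) for ticker in longs}
    weights.update({ticker: -0.5 / len(shorts) for ticker in shorts})
    return weights


# Hypothetical model output for ten stocks.
predictions = {
    "AAA": 0.03, "BBB": -0.02, "CCC": 0.01, "DDD": 0.05, "EEE": -0.04,
    "FFF": 0.00, "GGG": 0.02, "HHH": -0.01, "III": 0.04, "JJJ": -0.03,
}
longs, shorts = long_short_quintiles(predictions)
weights = equal_weights(longs, shorts)
print(longs, shorts)  # ['DDD', 'III'] ['JJJ', 'EEE']
```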

4. Intraday Strategy (Ch 12)

Apply GBM to minute-frequency equity data. Engineer features from microstructure (bid-ask spread, order imbalance), predict the next bar’s direction. Requires low latency and a realistic transaction cost model.

5. Volatility Forecasting (Ch 9)

Use GARCH models to forecast volatility. The forecasts feed into options pricing and risk management (position sizing, stop-loss levels); vector autoregression (VAR) models cover the multi-asset case.
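
To make the GARCH idea concrete, here is a minimal sketch of the GARCH(1,1) variance recursion, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The parameter values are arbitrary illustration numbers, not fitted estimates; in practice they would be estimated by maximum likelihood with a dedicated library.

```python
def garch_11_variances(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model.

    Assumes `returns` are already demeaned and alpha + beta < 1.
    The path starts at the unconditional variance omega / (1 - alpha - beta);
    the last element is the one-step-ahead variance forecast.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    variance = omega / (1.0 - alpha - beta)
    path = [variance]
    for r in returns:
        # Yesterday's squared shock and yesterday's variance drive today's variance.
        variance = omega + alpha * r * r + beta * variance
        path.append(variance)
    return path


# Illustrative parameters chosen so the long-run variance is 1.0.
path = garch_11_variances([1.0, 2.0], omega=0.1, alpha=0.1, beta=0.8)
forecast_vol = path[-1] ** 0.5  # volatility is the square root of variance
```

A large return (here 2.0) raises the next variance forecast above its long-run level, which is the volatility clustering that GARCH is meant to capture.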

6. Portfolio Optimization with HRP (Ch 5, 13)

Hierarchical Risk Parity uses clustering instead of matrix inversion:

Assets → hierarchical clustering → allocate risk evenly across the tree structure

More numerically stable than mean-variance optimization; no covariance matrix inversion needed.
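
The allocation step of HRP can be sketched as a recursive bisection over an ordered asset list. This sketch assumes the ordering has already been produced by the hierarchical clustering step (omitted here) and that the covariance matrix is a plain nested list; both are simplifications for illustration. Note that only reciprocals of individual variances appear, never a matrix inverse.

```python
def cluster_variance(cov, items):
    """Variance of an inverse-variance-weighted sub-portfolio of `items`."""
    inverse_vars = [1.0 / cov[i][i] for i in items]
    total = sum(inverse_vars)
    w = [x / total for x in inverse_vars]
    return sum(
        w[a] * w[b] * cov[i][j]
        for a, i in enumerate(items)
        for b, j in enumerate(items)
    )


def hrp_bisection(cov, ordered):
    """Split the ordered list in halves and divide weight between the halves
    in inverse proportion to each half's variance, recursively."""
    weights = {i: 1.0 for i in ordered}
    clusters = [list(ordered)]
    while clusters:
        cluster = clusters.pop()
        if len(cluster) < 2:
            continue
        mid = len(cluster) // 2
        left, right = cluster[:mid], cluster[mid:]
        var_left = cluster_variance(cov, left)
        var_right = cluster_variance(cov, right)
        alloc_left = 1.0 - var_left / (var_left + var_right)
        for i in left:
            weights[i] *= alloc_left
        for i in right:
            weights[i] *= 1.0 - alloc_left
        clusters += [left, right]
    return weights


# Hypothetical uncorrelated assets: two low-variance, two high-variance.
cov = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]
hrp_weights = hrp_bisection(cov, ordered=[0, 1, 2, 3])
```

In this toy case the low-variance pair receives 80% of the capital and the high-variance pair 20%, split evenly within each pair.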

🔴 Advanced — Deep Learning & Alternative Data

7. Sentiment Trading from Text (Ch 14–16)

Three levels of complexity:

Bag-of-words + Naive Bayes (Ch 14) — easy to start, fast
Topic Modeling / LDA (Ch 15) — extract themes from earnings calls
Word Embeddings / BERT (Ch 16) — predict earnings surprises from SEC filings

Data sources: earnings call transcripts, news articles, SEC 10-K/10-Q filings.

8. Return Prediction with Deep Learning (Ch 17–19)

Model	Applied to	Key Idea
Feedforward NN (Ch 17)	Daily stock returns	Multiple hidden layers learn nonlinear patterns
CNN (Ch 18)	Time series as images (CNN-TA)	Convolutional filters capture local temporal patterns
LSTM/RNN (Ch 19)	Multivariate time series	Learns long-range dependencies in sequences

9. Conditional Risk Factors with Autoencoders (Ch 20)

Build risk factors whose exposures depend on stock characteristics → an asset pricing model that is more dynamic than the static Fama-French factors.

10. Reinforcement Learning Trading Agent (Ch 22)

Build an agent that decides Buy / Sell / Hold by optimizing long-run cumulative reward:

State (price, indicators) → DDQN Agent → Action → Reward → Learn

Uses OpenAI Gym with a custom TradingEnvironment that tracks positions and P&L.
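
The core of the DDQN update can be shown without any deep learning framework. The sketch below computes the Double DQN learning target for a single transition, with Q-values passed in as plain lists; the function name, the action ordering, and the numbers are assumptions for illustration. The key point is that the online network picks the next action while the target network evaluates it.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target for one transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        return reward  # no future value after a terminal state
    # Action selection uses the online network...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...but its value is read from the target network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for three actions, e.g. [Sell, Hold, Buy].
online_q = [0.2, 0.9, 0.1]
target_q = [0.5, 0.3, 0.8]
y = ddqn_target(reward=1.0, next_q_online=online_q, next_q_target=target_q, gamma=0.9)
```

Here the online network prefers action 1, so the target is 1.0 + 0.9 * 0.3 = 1.27. Plain DQN would take the maximum of the target network's own values (0.8) and produce 1.72, which illustrates the overestimation that the double variant is designed to reduce.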

3 Most Important Lessons

1. Data is everything. Good signals come from good data. Point-in-time correctness is critical: lookahead bias inflates backtest results, often dramatically. Only use information that was actually available at the time of the trade.
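
One common guard against lookahead bias is to lag the signal so that the position held during a bar depends only on information from earlier bars. The sketch below is a toy illustration with made-up numbers; the function name is hypothetical.

```python
def lag_signal(signal, periods=1):
    """Shift a signal forward in time so bar t only sees values up to t - periods."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    lagged = [None] * periods + list(signal)
    return lagged[: len(signal)]


# Hypothetical signal (+1 long, -1 short) and the return realized in each bar.
signal = [1, -1, 1, 1]
bar_returns = [0.01, -0.02, 0.03, -0.01]

lagged = lag_signal(signal)  # [None, 1, -1, 1]

# Strategy return per bar: position decided on the previous bar times this bar's return.
strategy_returns = [
    0.0 if position is None else position * r
    for position, r in zip(lagged, bar_returns)
]

# For comparison: pairing each signal with the return of the same bar.
# If the signal was computed from that bar's data, this number is not achievable.
unlagged_total = sum(s * r for s, r in zip(signal, bar_returns))
lagged_total = sum(strategy_returns)
```

In this toy data the unlagged total is positive and the lagged total is negative, which shows how a timing error of a single bar can flip a backtest's conclusion.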

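2. Backtest ≠ Reality. If a backtest looks too good, it is probably overfit. Use the Deflated Sharpe Ratio (DSR), which adjusts the Sharpe ratio for the number of strategy variants tried, together with purging/embargoing in cross-validation, which removes training samples whose labels overlap the test period, to get honest estimates of out-of-sample performance.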
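
Purging and embargoing can be sketched with plain index arithmetic. The example assumes samples are indexed by bar and that the label of sample i is computed from bars i through i + horizon; the function name and parameters are hypothetical, and real implementations usually work with timestamps rather than integer positions.

```python
def purged_train_indices(n_samples, test_start, test_end, horizon, embargo):
    """Training indices for one fold with test samples in [test_start, test_end).

    Purging drops training samples whose label window overlaps the bars used
    by any test label. Embargoing drops a further `embargo` samples right
    after that span, to limit leakage through serial correlation.
    """
    test_span_end = test_end - 1 + horizon  # last bar used by any test label
    train = []
    for i in range(n_samples):
        overlaps = i <= test_span_end and i + horizon >= test_start
        embargoed = test_span_end < i <= test_span_end + embargo
        if not overlaps and not embargoed:
            train.append(i)
    return train


# 20 samples, test fold on samples 8..11, labels look 2 bars ahead, 1-bar embargo.
train_idx = purged_train_indices(20, test_start=8, test_end=12, horizon=2, embargo=1)
print(train_idx)  # [0, 1, 2, 3, 4, 5, 15, 16, 17, 18, 19]
```

Samples 6 and 7 are purged because their labels reach into the test period, samples 12 and 13 because they share bars with the test labels, and sample 14 is embargoed.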

3. Process beats model. No model is “best” forever. What matters more is a sound workflow, rigorous evaluation discipline, and domain expertise that can tell signal from noise.

Recommended Starting Path

Step 1: Alpha Factors (Ch 4)
        Understand where signals come from; measure with IC
              ↓
Step 2: Backtest with backtrader (Ch 8)
        Learn pitfalls before risking real capital
              ↓
Step 3: Long-Short with LightGBM (Ch 11–12)
        Practical classical ML + interpretable with SHAP
              ↓
Step 4: Add NLP signals (Ch 14–16)
        Use alternative data for additional edge
              ↓
Step 5: Deep Learning / RL (Ch 17–22)
        When ready — higher complexity, more compute and data needed

Connection to Other Tools in This Wiki

Tool	How It Extends ML4T
TradingView MCP	View charts and indicator values → use as model input or visually validate signals
ML4T Platform	3rd edition adds RAG & Agents, GenAI, causal inference → LLM-enhanced edge
MiroFish	Swarm simulation → stress-test market scenarios before deploying a strategy
