Mean Reversion Dominates W13 Out-of-Sample Test
First out-of-sample prediction evaluation. Mean reversion led at 85.7% in an upward-biased week. KG parameter tuning begins next cycle.
W13 predictions were locked on April 2 at 06:53 UTC. W13 actuals were scored the same day. This is the first true out-of-sample evaluation — predictions committed before actuals were known, then validated against real outcomes. This is iteration 1 of the SBPI prediction pipeline.
Mean reversion, the simplest statistical heuristic (scores below tier midpoint go up, above go down), led all methods at 85.7% directional accuracy in a week where 18 of 21 companies moved upward. The KG-augmented method ran with default (untuned) parameters and predicted "stable" across the board — its first tuning pass deploys next cycle.
The prediction pipeline improves through multiple inputs running simultaneously: better parameters, more weeks of data, new research methods, and refinement of the scoring methodology itself. The Optuna optimizer has already found a 12-parameter configuration that lifts KG accuracy from 4.8% to 69.9% on training data. That configuration gets wired in for W14.
W13 Prediction Scorecard
Every prediction from every method, scored against actuals.
| Company | W12 | W13 | Delta | Dir | Persist | Momentum | Mean Rev | KG Aug |
|---|---|---|---|---|---|---|---|---|
| DramaBox | 82.75 | 82.75 | 0.0 | STABLE | ✓ stable | ✗ up | ✗ up | ✓ stable |
| ReelShort | 82.0 | 81.2 | -0.8 | DOWN | ✗ stable | ✓ down | ✗ up | ✗ stable |
| Disney | 76.55 | 77.1 | +0.55 | UP | ✗ stable | ✓ up | ✓ up | ✗ stable |
| iQiYi | 65.7 | 67.3 | +1.6 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| JioHotstar | 62.25 | 65.4 | +3.15 | UP | ✗ stable | ✓ up | ✓ up | ✗ stable |
| Google/100Z | 63.65 | 62.95 | -0.7 | DOWN | ✗ stable | ✗ stable | ✗ up | ✗ stable |
| HolyWater | 61.65 | 61.95 | +0.3 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| Netflix | 60.8 | 60.95 | +0.15 | UP | ✗ stable | ✗ down | ✓ up | ✗ stable |
| GoodShort | 58.8 | 60.2 | +1.4 | UP | ✗ stable | ✓ up | ✓ up | ✗ stable |
| CandyJar | 58.65 | 59.55 | +0.9 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| ShortMax | 56.65 | 57.85 | +1.2 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| Lifetime/A&E | 55.45 | 56.9 | +1.45 | UP | ✗ stable | ✓ up | ✓ up | ✗ stable |
| Amazon | 50.2 | 54.25 | +4.05 | UP | ✗ stable | ✗ down | ✓ up | ✗ stable |
| Viu | 48.15 | 48.95 | +0.8 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| GammaTime | 46.15 | 48.5 | +2.35 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| COL/BeLive | 44.55 | 47.25 | +2.7 | UP | ✗ stable | ✓ up | ✓ up | ✗ stable |
| VERZA TV | 32.3 | 32.7 | +0.4 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| RTP | 26.3 | 27.95 | +1.65 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| Both Worlds | 21.5 | 24.15 | +2.65 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| KLIP | 22.35 | 23.6 | +1.25 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| Mansa | 19.35 | 21.2 | +1.85 | UP | ✗ stable | ✗ stable | ✓ up | ✗ stable |
| TOTALS | | | | | 1/21 (4.8%) | 6/21 (28.6%) | 18/21 (85.7%) | 1/21 (4.8%) |
Reading the Scorecard
Persist = last week's direction continues. Momentum = W11→W12 trend extrapolated. Mean Rev = scores below tier midpoint predicted to rise. KG Aug = knowledge graph triples weighted into prediction. ✓ = correct directional call; ✗ = wrong.
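For concreteness, a minimal sketch of the mean-reversion rule and the stable-band gating described above. The function names are illustrative, the tier midpoint is passed in rather than derived (the exact tier bounds the model uses aren't reproduced here), and the 0.5 band mirrors the KG method's default direction_threshold; this is not the pipeline's actual code.

```python
# Minimal sketch, not the pipeline's implementation.

def mean_reversion(score: float, tier_midpoint: float) -> str:
    """Scores below their tier midpoint are predicted to rise; above, to fall."""
    return "up" if score < tier_midpoint else "down"

def gate_direction(predicted_delta: float, direction_threshold: float = 0.5) -> str:
    """Stable-band gating of the kind the KG method's direction_threshold
    applies: moves smaller than the threshold collapse to 'stable'."""
    if abs(predicted_delta) <= direction_threshold:
        return "stable"
    return "up" if predicted_delta > 0 else "down"

# In W13 every W12 score sat below its tier midpoint, so mean_reversion
# returned "up" for all 21 companies; with the default 0.5 threshold the
# KG method collapsed every call to "stable".
```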
Accuracy by Prediction Method
Two evaluation windows plus cumulative totals.
Mean Reversion Is Inflated by Market Conditions
Mean reversion's 85.7% is inflated by market conditions. W13 had the strongest upward bias of any measured week (18/21 positive). Mean reversion predicted "up" for all 21 because every score was below its tier midpoint, which makes it indistinguishable this week from a naive "always up" baseline. This tells us more about market state than model quality. The true test is a mixed or down week.
Baseline Established
Mean reversion's strong performance establishes the baseline to beat. The optimized KG configuration (69.9% on training data) deploys next cycle. Each week adds data that improves all methods — the pipeline gets stronger with every iteration.
Biggest Calls
The predictions that tell us the most about what works and what doesn't.
Amazon +4.05 — The Fatafat Surprise
Amazon launched its first dedicated micro-drama service (Fatafat via MX Player) on March 23. Free ad-supported, 150+ show slate, celebrity campaign. Went from "Platform Giant (Absent)" to "Platform Giant (Entering)." Content score +6, Narrative +7.
Only mean reversion predicted upward movement. Momentum predicted -3.2 based on the W11→W12 decline. This is what categorical strategy shifts look like — they invalidate trend-following entirely.
ReelShort -0.8 — The Slow Erosion Continues
Only naive momentum correctly called ReelShort's continued decline. Head of Production still unreplaced. Absent from HRTS panel while DramaBox took the stage. COL Group parent pivoting to infrastructure. The talent exodus signal from W12 continues to compound.
Momentum methods shine when trends persist. ReelShort is now the clearest sustained-decline signal in the dataset.
iQiYi +1.6 — Triple Announcement
HK listing, $100M buyback, Nadou Pro AI launch. Stock jumped 10%. Mean reversion called it; momentum and KG both predicted stable.
Event-driven catalysts are invisible to all current prediction methods except the one that assumes "things go up." This class of signal needs dedicated event-impact modeling.
Google/100 Zeros -0.7 — Post-Announcement Decay
Every method missed this one: mean reversion called up, while persistence, momentum, and KG all called stable; none caught the decline. The March 12 announcement press cycle faded. No premieres, no concrete dates. Natural decay from an announcement high.
This is the kind of signal that event-impact analysis should catch but our methods currently miss. Post-announcement decay is a known pattern — it should be modelable.
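Decay of this kind is simple enough to sketch. Assuming an announcement adds a narrative bump that decays exponentially, the residual after w weeks is bump × 0.5^(w / half-life). Both parameters below are hypothetical, purely illustrative; the pipeline does not implement this yet.

```python
# Hypothetical decay model for announcement-driven score bumps; the
# bump size and half-life are guesses, not fitted values.

def announcement_residual(initial_bump: float, weeks_since: float,
                          half_life_weeks: float = 2.0) -> float:
    """Score contribution still present `weeks_since` weeks after the event,
    assuming exponential decay with a fixed half-life."""
    return initial_bump * 0.5 ** (weeks_since / half_life_weeks)

# A hypothetical +2.0 bump from the March 12 announcement would retain
# ~0.7 points three weeks later; the ~1.3-point fade is the decay signal
# every current method missed on Google/100 Zeros.
```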
JioHotstar +3.15 — IPL Delivers
IPL opening weekend hit 515M combined reach (+26% YoY). Tadka platform rollout with 100 microdramas in 7 languages. Momentum and mean reversion both called it correctly.
The strongest consensus correct call in the evaluation. When both trend-following and mean reversion agree, the signal is strong.
Competitive Landscape
Updated tier positions after W13 scoring. 18 of 21 companies moved up.
Tier 1 — Dominant (80+)
Tier 2 — Strong (55–80)
Tier 3 — Emerging (40–55)
Tier 4 — Vulnerable (<40)
Boundary Watch
Amazon Approaching Tier 2
Amazon (54.25) moved from mid-Tier 3 toward the Tier 2 boundary at 55. One more week of Fatafat momentum could push it into the Strong tier. This would be the first platform giant to enter Tier 2 from below since tracking began.
GammaTime Climbing
GammaTime (48.5, +2.35) is approaching the Tier 3 midpoint. The Forensic Files IP deal gives it a content pipeline that most Tier 3 companies lack. If it maintains this trajectory, it reaches the Tier 2 boundary in 3–4 weeks.
Both Worlds — Largest Tier 4 Gain
Both Worlds (+2.65) had the largest gain among Tier 4 companies. The US-Africa co-production model is generating real differentiation. Still far from Tier 3, but the growth rate is notable.
Prediction Pipeline — Iteration 1 Results
The first two evaluation cycles establish where each method starts. The pipeline improves from here.
Starting Point, Not Endpoint
The KG-augmented method ran with default (untuned) parameters for its first two evaluation cycles. It predicted "stable" for all 21 companies because the direction_threshold was set conservatively at 0.5. This is the starting configuration — the first tuning pass deploys next cycle. The knowledge graph features (momentum, anomalies, tier proximity, divergence) all exist in the graph but the weights connecting them to the decision layer had not been optimized.
The Optuna TPE optimizer has already found a 12-parameter configuration that lifts directional accuracy from 4.8% to 69.9% on training data. This configuration activates the KG features that the default config left dormant — enabling anomaly detection, recalibrating the direction threshold, and weighting divergence and tier proximity signals into the prediction.
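The search itself is standard Optuna. A sketch of what the study plausibly looks like, assuming the 12 parameters include a direction threshold, an anomaly-detection switch, and weights for the divergence and tier-proximity signals named above; the parameter ranges and the backtest stub are assumptions, not the actual harness.

```python
# Sketch of the TPE search over the KG prediction parameters. Parameter
# names follow the features named in this report; ranges are assumptions.
import optuna

def backtest_accuracy(params: dict) -> float:
    """Stand-in for the real harness: re-run the KG predictor over the
    training weeks with `params` and return directional accuracy."""
    return 0.0  # replace with the actual backtest

def objective(trial: optuna.Trial) -> float:
    params = {
        "direction_threshold": trial.suggest_float("direction_threshold", 0.05, 1.0),
        "enable_anomaly_detection": trial.suggest_categorical(
            "enable_anomaly_detection", [True, False]),
        "divergence_weight": trial.suggest_float("divergence_weight", 0.0, 1.0),
        "tier_proximity_weight": trial.suggest_float("tier_proximity_weight", 0.0, 1.0),
        # ...the remaining parameters of the 12-parameter space...
    }
    return backtest_accuracy(params)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=200)
print(study.best_params, study.best_value)
```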
Default vs. Optimized Parameters
The optimized config gets deployed for W14. Key parameter changes below.
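The full parameter table is not reproduced here; as a stand-in, here is the shape the change takes. Only the 0.5 default is stated in this report — every optimized value below is a placeholder, not read from best-config.json.

```python
# Illustrative only. The 0.5 default is stated in this report; the
# optimized values are placeholders, not the contents of best-config.json.
DEFAULT_CONFIG = {
    "direction_threshold": 0.5,        # conservative: collapsed calls to "stable"
    "enable_anomaly_detection": False,
    "divergence_weight": 0.0,          # KG signals left dormant
    "tier_proximity_weight": 0.0,
}

OPTIMIZED_CONFIG = {                   # hypothetical values
    "direction_threshold": 0.2,        # recalibrated to permit directional calls
    "enable_anomaly_detection": True,  # anomaly detection activated
    "divergence_weight": 0.6,          # divergence weighted into the prediction
    "tier_proximity_weight": 0.3,      # tier proximity weighted in
}
```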
Multiple Inputs, Simultaneous Iteration
The prediction pipeline is not a single-method experiment. It improves through multiple inputs running in parallel:
Better parameters. The optimized configuration deploys next cycle. As more weeks of evaluation data accumulate, the optimizer retrains on a larger dataset, producing progressively better parameter fits.
More data. Each week adds ~500 triples to the knowledge graph and one more evaluation point for every method. SPARQL queries get richer. Patterns that require 6–8 weeks of longitudinal data become detectable.
New research methods. Event impact analysis (event_impact_analyzer.py) and news signal processing are not yet in the nightly pipeline. Events like Amazon Fatafat and iQiYi's triple announcement are the signals that simple statistical methods miss. Integrating event detection adds a method class that no current approach covers.
Scoring methodology refinement. The SBPI dimension weights (DP 25%, CS 20%, NO 20%, CoS 20%, MI 15%) are fixed estimates. With enough weeks of data, TPE optimization of these weights could improve the underlying scoring itself, which improves everything downstream.
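For reference, the composite those weights imply, assuming a plain weighted sum of the five dimension scores; the scoring pipeline's exact aggregation is not shown in this report.

```python
# Assumes the SBPI composite is a plain weighted sum of the five
# dimension scores; the scoring pipeline's exact formula is not shown here.
SBPI_WEIGHTS = {"DP": 0.25, "CS": 0.20, "NO": 0.20, "CoS": 0.20, "MI": 0.15}

def sbpi_score(dimensions: dict[str, float]) -> float:
    """dimensions maps each SBPI dimension to a 0-100 score."""
    return sum(w * dimensions[d] for d, w in SBPI_WEIGHTS.items())
```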
The goal is actionable intelligence for each brand on the stack ranking. Directional predictions are one signal feeding into that intelligence product. The pipeline gets stronger with every iteration.
Next Steps
Improving predictions and brand intelligence, week over week.
Deploy Optimized Config
The Optuna TPE-tuned parameters get wired into the live prediction engine for the W14 cycle. This is the first real tuning pass. The 12-parameter configuration in best-config.json activates anomaly detection, divergence weighting, and tier proximity — features that the default config left dormant.
Load W13 Data into Oxigraph
Expand the triple store from 1,672 to roughly 2,200 triples. More longitudinal data improves every method — SPARQL queries get richer, patterns become more detectable. Run sbpi_to_rdf.py with W13 state data.
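A minimal pyoxigraph sketch of the load-and-query step. The namespace and predicate names are illustrative; the actual schema is whatever sbpi_to_rdf.py emits.

```python
# Illustrative pyoxigraph usage; the real predicate names come from
# sbpi_to_rdf.py, which is not reproduced here.
from pyoxigraph import Literal, NamedNode, Quad, Store

EX = "http://example.org/sbpi/"
XSD_DECIMAL = NamedNode("http://www.w3.org/2001/XMLSchema#decimal")

store = Store()  # in-memory here; the pipeline opens its persistent store
store.add(Quad(
    NamedNode(EX + "company/Amazon"),
    NamedNode(EX + "w13Score"),
    Literal("54.25", datatype=XSD_DECIMAL),
))

# Queries get richer as weekly triples accumulate.
for solution in store.query(
    f"SELECT ?company ?score WHERE {{ ?company <{EX}w13Score> ?score }} "
    "ORDER BY DESC(?score)"
):
    print(solution["company"], solution["score"])
```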
Generate W14 Predictions
All methods produce predictions for next week. The optimized KG config runs alongside existing methods. Each week builds the evaluation dataset. Lock predictions before W14 actuals drop.
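One way to make that lock verifiable is to write the calls with a UTC timestamp and a content hash before actuals exist. The record layout and the example calls below are placeholders, not the pipeline's actual format.

```python
# Sketch of a verifiable prediction lock; the record layout is an
# assumption, not the pipeline's actual format.
import hashlib
import json
from datetime import datetime, timezone

def lock_predictions(predictions: dict[str, str], path: str) -> str:
    """Write the directional calls with a timestamp and SHA-256 digest."""
    payload = json.dumps(predictions, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    record = {
        "locked_at_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
        "predictions": predictions,
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return digest  # publishing the digest proves the calls predate actuals

# Placeholder calls, not actual W14 predictions:
lock_predictions({"Amazon": "up", "ReelShort": "down"}, "w14-predictions-locked.json")
```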
Add Event Impact Signals
The event_impact_analyzer.py script exists but is not in the nightly pipeline yet. News events (Amazon Fatafat launch, iQiYi triple announcement) are the signals that simple statistical methods miss. Integrating event detection improves the intelligence product across all methods.
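A hedged sketch of what an event-impact signal could look like. The event classes and priors are hypothetical; event_impact_analyzer.py's real interface is not shown in this report.

```python
# Hypothetical event classes and score-impact priors, for illustration only.
EVENT_PRIORS = {
    "product_launch": +3.0,        # e.g. Amazon Fatafat
    "capital_action": +1.5,        # e.g. iQiYi listing / buyback
    "talent_departure": -1.0,      # e.g. ReelShort exits
    "announcement_no_dates": +0.5,
}

def event_signal(events: list[str]) -> float:
    """Sum the priors for one company's detected events this week."""
    return sum(EVENT_PRIORS.get(e, 0.0) for e in events)

# event_signal(["product_launch"]) -> +3.0: a directional "up" nudge
# that no purely statistical method produced for Amazon in W13.
```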
Experiment 3: Dimension Weight Learning
We now have four weeks of data. The current fixed dimension weights (DP 25%, CS 20%, NO 20%, CoS 20%, MI 15%) may not be optimal. TPE optimization of these weights could improve the underlying scoring, which improves everything downstream.
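A sketch of that weight search: sample five raw weights and normalize so they sum to 1. The fit function is a stand-in for evaluating a weight set against the four weeks of observed outcomes.

```python
# Sketch of Experiment 3; the fit function is a hypothetical stand-in.
import optuna

DIMS = ["DP", "CS", "NO", "CoS", "MI"]

def weight_fit(weights: dict[str, float]) -> float:
    """Stand-in: rescore all weeks under `weights` and return a fit metric."""
    return 0.0  # replace with the actual evaluation

def objective(trial: optuna.Trial) -> float:
    raw = {d: trial.suggest_float(d, 0.05, 0.40) for d in DIMS}
    total = sum(raw.values())
    return weight_fit({d: w / total for d, w in raw.items()})  # simplex constraint

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=7))
study.optimize(objective, n_trials=100)
```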
Brand Intelligence Cards
Start generating per-company intelligence briefs from the SPARQL insight digests. The prediction pipeline's real output is not an accuracy percentage — it is actionable intelligence for each brand that tells them where they stand, what is changing, and what to watch.
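A sketch of what a card could carry: the fields mirror what this report already tracks, but the shape is an assumption, not a spec.

```python
# Hypothetical card shape; fields mirror what this report tracks.
from dataclasses import dataclass

@dataclass
class BrandCard:
    company: str
    tier: str
    score: float
    delta: float
    watch: str  # what to watch next week

    def render(self) -> str:
        arrow = "▲" if self.delta > 0 else "▼" if self.delta < 0 else "■"
        return (f"{self.company} | {self.tier} | {self.score:.2f} "
                f"{arrow}{abs(self.delta):.2f}\nWatch: {self.watch}")

print(BrandCard("Amazon", "Tier 3", 54.25, 4.05,
                "Fatafat momentum vs. the Tier 2 boundary at 55").render())
```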