
Context

The current project has several sampled-report strategies and an exploratory script that tested ultra-short candidates. The first pass used a small random sample and produced a VWAP candidate that did not survive longer validation. The robust validation workflow must make that failure visible and provide a direct path for accepting only statistically supported candidates.

The existing strategy segment runners already return the metrics needed for per-window analysis. The change can reuse those runners and add a research-only validation path that computes all required statistics from deterministic, non-overlapping windows.

Goals / Non-Goals

Goals:

  • Validate candidate strategies over a long historical range.
  • Use all available non-overlapping windows for the selected window size.
  • Report confidence intervals and distribution metrics needed to judge statistical support.
  • Identify RSI2 parameter sets whose 95% confidence interval lower bound remains positive.
  • Preserve the conclusion that the earlier VWAP candidate is not statistically supported.

Non-Goals:

  • No live trading or paper trading behavior changes.
  • No new order execution logic.
  • No strategy auto-selection for production trading.
  • No fee, slippage, or funding-rate model in this change.
  • No generic backtest framework rewrite.

Decisions

  1. Use deterministic non-overlapping windows.

Randomly sampled windows are useful for quick exploration, but they leave the result dependent on the sample seed. Non-overlapping windows share no bars, so they give a direct count of distinct evaluation slices for the chosen window size and make the reported sample count auditable.
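The window layout can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `non_overlapping_windows`, the bar-index representation, and the choice to drop a trailing partial window are all assumptions.

```python
from typing import List, Tuple


def non_overlapping_windows(n_bars: int, window_size: int) -> List[Tuple[int, int]]:
    """Return [start, end) bar-index pairs for consecutive, non-overlapping windows.

    Hypothetical helper: a trailing partial window is dropped so that every
    window has the same length (an assumption, not a stated requirement).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    count = n_bars // window_size  # number of full windows that fit
    return [(i * window_size, (i + 1) * window_size) for i in range(count)]


# The same inputs always produce the same windows, so the sample count is auditable.
windows = non_overlapping_windows(n_bars=1000, window_size=300)
print(len(windows), windows)  # 3 [(0, 300), (300, 600), (600, 900)]
```

Because the layout depends only on the data length and the window size, rerunning the validation cannot change which slices are evaluated.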

  2. Keep robust validation in a research script.

The requirement is to validate and incorporate the strategy conclusion, not to change trading execution. Keeping the workflow outside the trading CLI avoids expanding the system boundary while still making the result repeatable.

  3. Rank candidates by 95% confidence interval lower bound before average return.

Average return alone allowed small-sample winners to rank highly. The lower confidence bound better matches the objective: only promote candidates whose positive result survives statistical uncertainty.
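A minimal sketch of this ranking rule follows. It assumes a normal-approximation interval (mean minus 1.96 standard errors); the document does not state which interval method the script uses, so treat that choice, the function names, and the synthetic return values as illustrative assumptions only.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def ci95_lower_bound(returns: Sequence[float]) -> float:
    """Lower bound of a normal-approximation 95% CI for the mean window return."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two windows to estimate a standard error")
    stderr = statistics.stdev(returns) / math.sqrt(n)  # sample std / sqrt(n)
    return statistics.fmean(returns) - 1.96 * stderr


def rank_candidates(results: Dict[str, Sequence[float]]) -> List[Tuple[str, float, float]]:
    """Sort candidates by CI lower bound first, then by average return."""
    rows = [(name, ci95_lower_bound(r), statistics.fmean(r)) for name, r in results.items()]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Synthetic per-window returns, made up purely to show the ordering.
ranked = rank_candidates({
    "spiky": [0.30, -0.10, -0.10, 0.02],   # higher mean, very wide interval
    "steady": [0.01, 0.02, 0.01, 0.02],    # lower mean, tight interval
})
for name, lower, mean in ranked:
    print(f"{name}: lower={lower:.4f} mean={mean:.4f}")
```

In this toy input the "spiky" series has the higher average but a negative lower bound, so it ranks below "steady", which is the behavior the decision is meant to enforce.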

  4. Promote RSI2 as a candidate family, not a live strategy.

The current robust run shows RSI2 candidates with positive confidence interval lower bounds on 3-minute and 5-minute data. That supports further candidate work, but it does not justify live deployment because transaction costs are not included.

Risks / Trade-offs

  • Transaction costs are excluded → The robust output must label results as gross returns and must not claim live profitability.
  • Window returns can still be regime-dependent → The output includes worst return, p10, p90, median, and positive-window rate so the distribution is visible.
  • Non-overlapping windows reduce sample count versus overlapping windows → The chosen sample count is more conservative and easier to reason about.
  • Candidate ranking can overfit the tested historical period → The change promotes only candidates with positive confidence interval lower bounds and leaves cost-aware validation as a separate required step before trading.
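The distribution metrics named in the risks above could be computed along these lines. This is a sketch under assumptions: the function name `summarize_windows` and the output keys are hypothetical, the percentile method (`statistics.quantiles` with inclusive interpolation) is one reasonable choice rather than the script's confirmed method, and a window with exactly zero return is not counted as positive.

```python
import statistics
from typing import Dict, Sequence


def summarize_windows(returns: Sequence[float]) -> Dict[str, float]:
    """Summarize gross per-window returns (no fees, slippage, or funding)."""
    if len(returns) < 2:
        raise ValueError("need at least two windows")
    # Nine cut points at 10%, 20%, ..., 90%; first is p10 and last is p90.
    deciles = statistics.quantiles(returns, n=10, method="inclusive")
    return {
        "windows": float(len(returns)),
        "worst": min(returns),
        "p10": deciles[0],
        "median": statistics.median(returns),
        "p90": deciles[-1],
        # Share of windows with a strictly positive gross return.
        "positive_rate": sum(r > 0 for r in returns) / len(returns),
    }


# Synthetic window returns, for illustration only.
sample = [-0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
for key, value in summarize_windows(sample).items():
    print(f"{key}: {value:.4f}")
```

Reporting these values next to the confidence interval keeps a regime-dependent candidate visible even when its average looks acceptable.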