Red Flags to Avoid When Choosing a Trading Strategy 

Iriscale 8

Algorithmic trading can look deceptively simple: find a strategy with an impressive equity curve, automate it, and let statistics do the work. In practice, the biggest losses often begin before the first order is sent — during strategy selection, backtesting, and due diligence. A strategy can perform beautifully in a backtest and still be structurally unsafe: overfit to history, dependent on unrealistic execution, or packaged as a black box with no risk controls. 

This guide gives self-directed investors, advisors, and institutional teams a practical framework for evaluating strategies — whether you’re reviewing signals in a marketplace, building in a no-code environment, or trying to backtest more rigorously. You’ll learn the most common failure modes, which metrics to demand, and what questions to ask before allocating even a small pilot. 

1. The illusion of historical outperformance: overfitting and the “factor zoo” 

A backtest is a hypothesis generator, not evidence of edge. The core red flag is a strategy that looks too good for too long without strong out-of-sample validation. 

Overfitting happens when a model learns noise rather than signal — and the risk rises sharply when many parameter combinations, indicators, filters, and timeframes are tested before only the best-looking result is published. Bailey, Borwein, López de Prado, and Zhu formalized this problem as the Probability of Backtest Overfitting (PBO), showing that when many strategy variants are tried, the best in-sample performer tends to disappoint out-of-sample — and that standard hold-out methods alone are unreliable safeguards.¹ Bailey and López de Prado developed a complementary tool, the Deflated Sharpe Ratio (DSR), specifically to correct Sharpe ratios for selection bias and the number of trials run during research.² 

Academic finance offers a cautionary parallel. In their widely cited Review of Financial Studies paper, Harvey, Liu, and Zhu examined over 300 published return factors and argued that, given the extent of data mining in the literature, a newly claimed factor should clear a much higher statistical hurdle (a t-ratio above 3.0 rather than the conventional 2.0) before it is taken seriously.³ If peer-reviewed anomalies can be inflated by extensive data mining, a strategy backtest presented without disclosing how many variants were tried deserves the same scrutiny.

Red flags to watch: 

  • An excessively smooth equity curve with few drawdowns across multiple market regimes 
  • Many degrees of freedom: dozens of rules, filters, or tuned parameters 
  • No disclosure of the research process — how many variants were tried and why the final version was chosen 

Safeguards: Require walk-forward or combinatorially cross-validated testing. Ask whether the Sharpe ratio has been adjusted for the number of trials. If the provider won’t share any validation evidence, treat it as a material risk. 
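To make the walk-forward idea concrete, here is a minimal sketch of how rolling train/test windows can be generated. The function name `walk_forward_splits` and its parameters are illustrative assumptions, not part of any particular platform or library; the only point is that each evaluation window sits strictly after the data used to fit it.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.

    Each test window starts right after its training window, so parameters
    are always fitted on data that precedes the data they are scored on.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


if __name__ == "__main__":
    # Hypothetical sizes: 1000 bars, fit on 500, evaluate on the next 100.
    for train, test in walk_forward_splits(1000, 500, 100):
        print(f"fit on [{train.start}, {train.stop}) "
              f"then evaluate on [{test.start}, {test.stop})")
```

A provider who has run this kind of test should be able to show the result for every window, not only the aggregate. Large differences between windows are themselves informative.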

2. Black-box strategies: what you must know before you trust the model 

A second major red flag is opacity: you’re asked to fund a strategy you can’t meaningfully interrogate. “Black box” doesn’t automatically mean bad, but it raises the bar for controls, disclosure, and governance significantly. 

The CFA Institute’s curriculum on backtesting and simulation explicitly identifies survivorship bias, look-ahead bias, and data snooping as the primary problems in backtesting that users should understand.⁴ When a provider refuses to explain their data sources, trading frequency, assumptions, or what conditions would break the strategy, you have no basis for assessing those risks. 

Operational opacity carries its own category of danger. On August 1, 2012, Knight Capital Group experienced a software deployment error — a technician failed to copy updated code to one of eight servers, reactivating a dormant trading function called “Power Peg.” Within 45 minutes, the system had executed millions of unintended trades across roughly 150 stocks. Knight Capital took a pre-tax loss of $440 million, and the firm was eventually acquired.⁵ The failure had nothing to do with strategy quality. It was a breakdown in release management, monitoring, and operational controls — the exact things a black-box provider cannot demonstrate if they won’t share their operational processes. 

Black-box red flags: 

  • “Proprietary” used to deflect basic questions: data sources, holding period, trading frequency, risk limits 
  • No monitoring plan: no drift detection, no kill switch, no explanation of when the strategy should be turned off 
  • No operational narrative: who manages it, how code changes are reviewed, how deployments are tested 

Safeguards: Require a minimum transparency packet — sample trades, slippage assumptions, instrument universe, and a plain-language explanation of what market conditions would break the strategy. Demand that any live deployment includes a kill switch and a documented change log before you allocate capital. 
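As an illustration of what a kill switch can look like in its simplest form, the sketch below halts trading once equity falls a set fraction below its running peak and records why. The class name `DrawdownKillSwitch` and the 10% threshold in the usage example are assumptions chosen for the example; a production control would also cover position limits, order-rate limits, and manual overrides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DrawdownKillSwitch:
    max_drawdown: float                 # e.g. 0.10 halts after a 10% drop from peak
    peak_equity: float = 0.0
    halted: bool = False
    log: List[str] = field(default_factory=list)

    def update(self, equity: float) -> bool:
        """Record the latest equity; return True if trading may continue."""
        if equity <= 0:
            raise ValueError("equity must be positive")
        if self.halted:
            return False  # stays off until a human resets it
        self.peak_equity = max(self.peak_equity, equity)
        drawdown = 1.0 - equity / self.peak_equity
        if drawdown >= self.max_drawdown:
            self.halted = True
            self.log.append(
                f"halted: drawdown {drawdown:.1%} reached limit {self.max_drawdown:.1%}"
            )
        return not self.halted


if __name__ == "__main__":
    switch = DrawdownKillSwitch(max_drawdown=0.10)
    for equity in [100.0, 105.0, 96.0, 94.0, 110.0]:
        print(equity, "continue" if switch.update(equity) else "HALTED")
    print(switch.log)
```

The switch deliberately does not re-arm itself when equity recovers: restarting after a halt is treated as a human decision that belongs in the change log.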

3. Risk metrics that hide the real danger 

A strategy’s headline return is rarely the risk you actually experience. Marketing that emphasizes CAGR or win rate while omitting drawdowns, tail risk, leverage, or dependency on rare events is a structural red flag. 

Bailey and López de Prado’s work on the Deflated Sharpe Ratio explains the mechanism precisely: when a strategy is chosen from many alternatives, the reported Sharpe ratio is systematically inflated by selection bias and non-normally distributed returns.² The DSR is reported as a probability between 0 and 1 that the observed Sharpe reflects a genuine edge rather than selection luck, so a strategy with a headline Sharpe of 2.5 can still fall short of statistical significance once the number of trials is accounted for. Meanwhile, a 90% win rate can mask a strategy whose losses, when they arrive, wipe out months of gains. Only distribution-aware metrics such as expected shortfall or worst-month statistics will reveal that dynamic.
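The adjustment can be sketched in a few lines. The code below follows the formulas in the Deflated Sharpe Ratio paper as commonly stated: an approximate expected maximum Sharpe among skill-less trials, followed by a probabilistic test of the observed Sharpe against that benchmark. Function and parameter names are illustrative assumptions, all Sharpe ratios are per period (not annualized), and `sr_std` (the dispersion of Sharpe estimates across trials) has to come from the research log.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sr_std: float) -> float:
    """Approximate expected best Sharpe among n_trials strategies with no true edge.

    sr_std is the standard deviation of the Sharpe estimates across trials.
    """
    if n_trials < 2:
        raise ValueError("need at least two trials")
    a = STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    b = STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return sr_std * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(
    sr: float, n_obs: int, n_trials: int, sr_std: float,
    skew: float = 0.0, kurt: float = 3.0,
) -> float:
    """Probability that the observed per-period Sharpe beats the selection benchmark.

    skew and kurt describe the return distribution (kurt=3.0 is the normal case).
    """
    sr0 = expected_max_sharpe(n_trials, sr_std)
    variance_term = 1 - skew * sr + (kurt - 1) / 4 * sr ** 2
    if variance_term <= 0:
        raise ValueError("moment inputs imply a non-positive variance term")
    z = (sr - sr0) * math.sqrt(n_obs - 1) / math.sqrt(variance_term)
    return STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Hypothetical inputs: daily Sharpe 0.10 over 1250 days, trial dispersion 0.03.
    for trials in (10, 100, 1000):
        print(trials, round(deflated_sharpe_ratio(0.10, 1250, trials, 0.03), 3))
```

Holding everything else fixed, the result falls as the number of trials rises, which is why the trial count is a required input rather than an optional disclosure.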

Risk metric red flags: 

  • Maximum drawdown minimized or absent from reported results 
  • Sharpe ratio shown without context on how many strategy variants were evaluated to produce it 
  • High win rate emphasized without showing payoff asymmetry 
  • No return distribution view: no histogram, no worst-month figures, no tail quantiles 

Safeguards: Require a standardized risk table covering CAGR, annualized volatility, maximum drawdown, rolling drawdown, exposure/leverage, and at least one stress scenario. Ask for rolling Sharpe across sub-periods — stable performance across market regimes is harder to manufacture than a single full-period number. 
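A minimal version of such a risk table can be computed from a plain list of periodic returns. The sketch below is an assumption-laden illustration: `risk_table`, `periods_per_year`, and `tail` are hypothetical names, returns are simple (not log) returns greater than -100%, and expected shortfall is estimated as the mean of the worst `tail` fraction of periods.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def risk_table(
    returns: Sequence[float], periods_per_year: int = 12, tail: float = 0.05
) -> Dict[str, float]:
    """Summarize return, volatility, drawdown, tail, and payoff statistics."""
    if len(returns) < 2:
        raise ValueError("need at least two return observations")
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, 1 - equity / peak)  # deepest fall from a prior peak
    years = len(returns) / periods_per_year
    worst = sorted(returns)[: max(1, int(len(returns) * tail))]
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r < 0]
    payoff = mean(wins) / abs(mean(losses)) if wins and losses else float("nan")
    return {
        "cagr": equity ** (1 / years) - 1,
        "ann_vol": stdev(returns) * math.sqrt(periods_per_year),
        "max_drawdown": max_dd,
        "worst_period": min(returns),
        "expected_shortfall": mean(worst),
        "win_rate": len(wins) / len(returns),
        "payoff_ratio": payoff,  # average win divided by average loss
    }


if __name__ == "__main__":
    # Hypothetical track record: nine months of +2%, then one month of -20%.
    for name, value in risk_table([0.02] * 9 + [-0.20]).items():
        print(f"{name:>20}: {value: .4f}")
```

In the usage example the win rate is 90%, yet the single losing month more than erases the nine winning ones. That is the payoff asymmetry a win-rate headline conceals.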

4. Execution and liquidity: when the edge exists only on paper 

Even a statistically real edge can be destroyed by execution. A backtest that assumes mid-price fills, instant full-size execution, or clean exits during volatile markets is describing a market that doesn’t exist in practice. 

Execution red flags: 

  • High turnover with no realistic slippage model 
  • Trading in illiquid instruments — micro-caps, thin altcoin pairs, small venues — without volume participation constraints 
  • No order type realism: limit orders assumed to fill without adverse selection, stops assumed to trigger at the exact stop price during gap openings 
  • Latency dependency disguised as a simple strategy — if it requires sub-second execution infrastructure to work, most users cannot replicate the results 

Safeguards: Re-examine the backtest with conservative execution assumptions: add spread plus slippage, delay signal entry by one bar, and cap position size as a percentage of average daily volume. A useful stress test: “If slippage doubles, does this strategy survive?” If the answer is unclear, the edge may be purely theoretical.
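These stress tests are simple to run if you have the strategy's per-bar positions and the asset's per-bar returns. The sketch below is one way to do it under stated assumptions: `net_returns` charges a flat cost per unit of position change (a stand-in for spread plus slippage), `delay_bars` shifts every position later in time, and `cap_order` limits an order to a fraction of average daily volume. All names and default values are hypothetical.

```python
import math
from typing import List, Sequence


def net_returns(
    positions: Sequence[float], asset_returns: Sequence[float],
    cost_per_unit: float = 0.0, delay_bars: int = 0,
) -> List[float]:
    """Per-bar strategy returns after costs, with positions delayed by delay_bars.

    positions[t] is the position the backtest assumed was held during bar t.
    """
    if len(positions) != len(asset_returns):
        raise ValueError("positions and asset_returns must have equal length")
    # Stay flat for the first delay_bars bars, then follow the original path late.
    held = ([0.0] * delay_bars + list(positions))[: len(positions)]
    out, prev = [], 0.0
    for pos, ret in zip(held, asset_returns):
        cost = cost_per_unit * abs(pos - prev)  # pay on every change in position
        out.append(pos * ret - cost)
        prev = pos
    return out


def cap_order(desired: float, avg_daily_volume: float,
              max_participation: float = 0.01) -> float:
    """Shrink an order so it does not exceed a share of average daily volume."""
    limit = max_participation * avg_daily_volume
    return math.copysign(min(abs(desired), limit), desired)


if __name__ == "__main__":
    pos = [1, 1, 0, -1]
    rets = [0.01, -0.02, 0.03, 0.01]
    for cost in (0.001, 0.002):          # base case, then slippage doubled
        for delay in (0, 1):             # as backtested, then one bar late
            total = sum(net_returns(pos, rets, cost, delay))
            print(f"cost={cost} delay={delay} total={total:+.4f}")
```

If the total flips sign between the base case and the doubled-cost or delayed case, the backtest is measuring the execution assumptions more than the signal.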

5. Unrealistic backtest assumptions: fees, survivorship bias, and data quality 

A strategy can fail simply because the backtest was built on assumptions that don’t hold in live markets. The CFA Institute’s official backtesting curriculum identifies survivorship bias and look-ahead bias as two of the primary structural problems that inflate simulated results.⁴ Survivorship bias occurs when a backtest only includes instruments still trading today, ignoring all the companies that delisted, went bankrupt, or were removed from an index — producing an artificially successful historical universe.⁶ Look-ahead bias occurs when data available only after the fact (like a quarterly earnings release) is used to generate a trade signal as if it were available in real time. 

Assumption-based red flags: 

  • No fee schedule, or a single low-commission assumption that ignores exchange fees, spread, funding rates, or borrow costs 
  • Using current index constituents only, without accounting for historical membership — classic survivorship bias 
  • Using information at a timestamp that wasn’t available to a trader at that moment — look-ahead bias 
  • Data sourced from one vendor but executed via another, with different corporate action adjustments or timestamp conventions 

Safeguards: Force realistic assumptions, such as higher fees than expected, slippage buffers on both sides, and funding/borrow costs where relevant. Verify dataset integrity: survivorship-free universes with correct corporate action adjustments are available from reputable data providers and are worth the additional cost. Hold back a validation set and do not look at it until the research is finalized.
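Both biases come down to asking "what was knowable on this date?" The sketch below shows one way to encode that question, assuming your data carries listing and delisting dates and a release date for each fundamental figure. The record types, function names, and the ticker symbols in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Listing:
    symbol: str
    listed: date
    delisted: Optional[date] = None  # None means still trading


@dataclass(frozen=True)
class Fundamental:
    symbol: str
    period_end: date  # fiscal period the figure describes
    released: date    # date the figure became public
    value: float


def universe_as_of(listings: Iterable[Listing], as_of: date) -> List[str]:
    """Symbols that were tradable on as_of, including names that later delisted."""
    return sorted(
        x.symbol for x in listings
        if x.listed <= as_of and (x.delisted is None or as_of < x.delisted)
    )


def latest_known(
    facts: Iterable[Fundamental], symbol: str, as_of: date
) -> Optional[Fundamental]:
    """Most recent figure already released by as_of, filtered on release date."""
    visible = [f for f in facts if f.symbol == symbol and f.released <= as_of]
    return max(visible, key=lambda f: f.period_end, default=None)


if __name__ == "__main__":
    listings = [
        Listing("AAA", date(2010, 1, 4)),
        Listing("BBB", date(2012, 6, 1), delisted=date(2019, 3, 15)),
    ]
    print(universe_as_of(listings, date(2015, 1, 2)))  # BBB was still tradable
    print(universe_as_of(listings, date(2020, 1, 2)))  # BBB has delisted
```

A backtest that builds its universe from today's symbol list would never see `BBB` at all, and one that keys fundamentals on `period_end` instead of `released` would trade on figures before they were public.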

6. Marketing, compliance, and the hypothetical performance problem 

Even if you’re trading your own capital, thinking in compliance terms improves rigor: it forces you to document assumptions, identify misleading claims, and design controls. For those operating as registered advisers, the stakes are higher. 

The SEC’s Investment Adviser Marketing Rule (Rule 206(4)-1), which has had a compliance date of November 4, 2022, prohibits advisers from including hypothetical performance in advertisements unless they have adopted and implemented policies and procedures reasonably designed to ensure that the performance is relevant to the likely financial situation and investment objectives of the intended audience. In 2023, the SEC brought enforcement actions against a fintech adviser and, in a September sweep, nine additional registered investment advisers for advertising hypothetical performance without those required safeguards. The nine firms in the sweep agreed to pay $850,000 in combined penalties.⁷ The SEC has stated explicitly that hypothetical performance advertisements “may present an elevated risk for prospective investors” and that enforcement will continue.

Marketing and compliance red flags: 

  • Hypothetical or backtested returns presented without clear labeling, disclosed assumptions, or risk warnings 
  • No policies for model changes — if a strategy can be modified without notice, you’re funding discretion, not a defined system 
  • Conflicts of interest not disclosed: the seller trades ahead, changes rules after drawdowns, or reports results selectively 
  • API keys or exchange integrations required with broad permissions and no documented controls around access 

Safeguards: Treat every strategy document as if it might be reviewed: document assumptions, disclose limitations, and maintain a change log. If working with external strategy providers, require audit trail evidence and testing documentation before integration. 

Pre-flight checklist: 10 questions before running any strategy 

Use this before deploying any strategy — whether you built it in Mangrove’s no-code builder, sourced it from the Monetization Marketplace, or received it from a third party. If you can’t answer multiple questions confidently, the strategy is not ready to deploy. 

  1. How many strategy variations were tested before arriving at this final version?
  2. Is there an out-of-sample period or walk-forward test that was never used during development?
  3. Are results consistent across materially different market regimes: trending, ranging, high volatility, and crisis periods?
  4. What is the maximum drawdown, and how long did historical recovery take?
  5. Has the Sharpe ratio been adjusted for the number of trials run during research?
  6. What are the exact fee, slippage, spread, and funding cost assumptions, and what happens if they double?
  7. Is the instrument universe survivorship-free and properly adjusted for corporate actions and delistings?
  8. Can you reproduce a trade list and verify the logic independently, without taking anything on trust?
  9. Is there a documented operational plan covering monitoring, alerts, a kill switch, and a change log?
  10. Is any performance presentation clearly labeled as backtested or hypothetical, with assumptions and limitations disclosed?

FAQ 

How can I tell if a backtest is overfit if I don’t have access to the code? 

Ask for evidence that doesn’t require trust: multiple out-of-sample segments or walk-forward windows, sensitivity tests showing performance survives parameter changes, and honest disclosure of how many variants were evaluated. Bailey et al.’s PBO framework shows that “best backtest” selection is a predictable trap when many trials are run — the more combinations tested, the higher the probability that the winner is a statistical artifact rather than a genuine signal.¹ If a provider refuses to share any validation artifacts, even anonymized, treat the refusal itself as a red flag. 

Which performance metrics matter most for safety? 

Start with maximum drawdown, rolling drawdowns by sub-period, and exposure/leverage. Then evaluate risk-adjusted performance with the number of trials in mind — Bailey and López de Prado’s DSR exists precisely because headline Sharpe can be systematically misleading when strategies are selected from large search spaces.² Finally, examine distribution-aware metrics: expected shortfall, worst-month, and tail quantiles. Average return hides crash risk. 

Can a no-code strategy builder produce a robust strategy? 

Yes, if the process is rigorous. No-code tools remove engineering friction, but they can also accelerate overfitting by making it effortless to try thousands of parameter combinations. The tool is not the problem; the process is. Maintain a strict hold-out validation set, document all research decisions, use cross-validation rather than in-sample optimization, and apply PBO thinking before allocating any capital.¹

Mangrove provides trading tools and infrastructure — not financial advice. Digital asset trading involves significant risk, including potential loss of principal. Past strategy performance, whether backtested or live, does not guarantee future results. Users should consult a qualified financial advisor before making investment decisions. 

Ready to build and test strategies with built-in risk controls, transparent metrics, and no black boxes? Request Access to Mangrove → 

Sources 

  1. Bailey, D., Borwein, J., López de Prado, M., & Zhu, Q. (2015). The Probability of Backtest Overfitting. Journal of Computational Finance (Risk Journals). Available at SSRN: https://ssrn.com/abstract=2326253
  2. Bailey, D., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality. Journal of Portfolio Management, 40(5), 94–107. Available at SSRN: https://ssrn.com/abstract=2460551
  3. Harvey, C. R., Liu, Y., & Zhu, H. (2016). … and the Cross-Section of Expected Returns. Review of Financial Studies, 29(1), 5–68. https://www.nber.org/papers/w20592
  4. CFA Institute. (2026). Backtesting and Simulation. Refresher Readings. https://www.cfainstitute.org/insights/professional-learning/refresher-readings/2026/backtesting-and-simulation
  5. Wikipedia. Knight Capital Group. https://en.wikipedia.org/wiki/Knight_Capital_Group. See also the detailed technical postmortem at https://www.henricodolfing.com/2019/06/project-failure-case-study-knight-capital.html
  6. CFA Institute. (2024). The Good, Bad and Ugly of Bias in AI. https://www.cfainstitute.org/insights/articles/good-bad-and-ugly-of-bias-in-ai
  7. U.S. Securities and Exchange Commission. (September 11, 2023). SEC Sweep Into Marketing Rule Violations Results in Charges Against Nine Investment Advisers. https://www.sec.gov/newsroom/press-releases/2023-173-sec-sweep-marketing-rule-violations-results-charges-against-nine-investment-advisers
