Understanding Strategy Performance Metrics for Digital Assets Trading 

Iriscale 7

Why This Matters Now 

Algorithmic trading is no longer reserved for hedge funds with large quant teams. Self-directed investors increasingly build rules-based strategies with no-code tools, subscribe to systems in strategy marketplaces, or use AI-assisted platforms that generate signals and execute trades automatically. The upside is speed and consistency. The risk is that many strategies are selected — or optimized — on the wrong evidence, usually a single “return” number that ignores how that return was achieved. In digital asset markets, this problem is amplified: high volatility inflates raw return numbers, short data histories make back-tests easy to overfit, and 24/7 markets create new measurement traps that traditional metric conventions weren’t built for. 

Performance metrics exist to answer three practical questions: 

  • Did the strategy make money? (Return metrics: Total Return, CAGR) 
  • What did it cost in risk to earn that money? (Risk metrics: volatility, Maximum Drawdown, Value-at-Risk) 
  • Was the return efficient and repeatable? (Risk-adjusted metrics: Sharpe, Sortino, Calmar — and trade-level diagnostics like Expectancy and Profit Factor) 

This guide is for investors who already know basic trading concepts but want to read a back-test report more critically. You’ll learn what each metric measures, how to interpret it in context, and where the common traps live — especially in digital asset markets, where 24/7 trading and fat-tail moves can distort traditional measurements. 

Absolute vs. Risk-Adjusted Returns 

Absolute return answers: How much did the strategy make? Risk-adjusted return answers: How much did it make per unit of risk? 

This distinction matters because capital is finite and psychological tolerance is real. A strategy that earns 20% with a –40% maximum drawdown may be practically untradeable for most investors. Another that earns 12% with a –12% drawdown can often be sized up with confidence — and may generate more dollars in practice. 

Consider two strategies over three years: 

  • Strategy A: CAGR 12%, Max Drawdown –35%, high volatility 
  • Strategy B: CAGR 10%, Max Drawdown –12%, lower volatility 

On a spreadsheet, A “wins” on CAGR. In an allocator’s seat, B often wins because it can be scaled within real risk limits. This is why professional systematic managers often target volatility and drawdown constraints explicitly — not just returns. 

Context also matters when evaluating any back-test. Long-run hedge fund composites have delivered mid-single-digit average annual returns in many multi-year windows, while broad equity indices have historically averaged around 10% nominally over long periods. A back-test showing 40%+ CAGR at low drawdown is extraordinary — and demands extraordinary scrutiny for overfitting, leverage assumptions, or missing costs. 

Return Metrics 

Total Return 

Total return captures the overall gain from price appreciation plus all income received over the measurement period. 

Formula: (Ending Value − Beginning Value + Total Income) ÷ Beginning Value 

Example: A $1,000,000 portfolio grows to $1,350,000 and generates $90,000 in income over three years. Total return = 44%. 

Key pitfall: Total return is not annualized. Comparing 44% over three years to 44% over 18 months is misleading. It also requires consistent assumptions about fee treatment and income reinvestment. 
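Here is a minimal Python sketch of the total-return formula that reproduces the example above. `total_return` is a hypothetical helper name. Income is assumed to be supplied as a single summed figure for the period, and the result is deliberately not annualized.

```python
def total_return(beginning_value: float, ending_value: float, total_income: float = 0.0) -> float:
    """(ending - beginning + income) / beginning over the whole period, not annualized."""
    if beginning_value <= 0:
        raise ValueError("beginning_value must be positive")
    return (ending_value - beginning_value + total_income) / beginning_value


# Example from the text: $1,000,000 grows to $1,350,000 with $90,000 of income over three years.
tr = total_return(1_000_000, 1_350_000, 90_000)
print(f"Total return: {tr:.1%}")  # Total return: 44.0%
```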

CAGR (Compound Annual Growth Rate) 

CAGR is the geometric mean annual growth rate — the smoothed rate at which the portfolio would have needed to grow each year to reach the ending value, assuming all gains are reinvested. 

Formula: (Ending Value ÷ Beginning Value)^(1/n) − 1 

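Example: Using the same portfolio and counting price appreciation only: (1,350,000 ÷ 1,000,000)^(1/3) − 1 ≈ 10.5% CAGR. If the $90,000 of income is included in the ending value ($1,440,000), the result is (1.44)^(1/3) − 1 ≈ 12.9%, which is the annualized equivalent of the 44% total return.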

Key pitfall: CAGR hides the path. A strategy can show an attractive CAGR while enduring drawdowns that would have caused most investors to exit early — erasing the theoretical benefit. Never evaluate CAGR in isolation. 
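The sketch below computes CAGR geometrically. It also shows why the total-return pitfall matters: the same 44% total return annualizes very differently over three years than over 18 months. `cagr` is a hypothetical helper, and `years` is assumed to be measured in calendar years.

```python
def cagr(beginning_value: float, ending_value: float, years: float) -> float:
    """Geometric annual growth rate: (ending / beginning) ** (1 / years) - 1."""
    if beginning_value <= 0 or ending_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (ending_value / beginning_value) ** (1 / years) - 1


# Price-only CAGR for the example portfolio (income excluded).
print(f"{cagr(1_000_000, 1_350_000, 3):.1%}")  # 10.5%

# A 44% total return (growth factor 1.44) spread over two different horizons.
for years in (3, 1.5):
    print(f"{years} years: {cagr(1.0, 1.44, years):.1%}")
# Prints about 12.9% for 3 years and about 27.5% for 1.5 years.
```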

Beta and Jensen’s Alpha 

Beta measures a strategy’s sensitivity to a market benchmark. A beta of 1.0 behaves like the market, 0.3 means muted sensitivity to the benchmark, and a negative beta implies hedging characteristics.

Formula: Cov(Portfolio Returns, Market Returns) ÷ Var(Market Returns) 

Key pitfall: Beta is backward-looking and can shift in regime changes. If a strategy’s returns are mostly explained by high beta to a rising market, it may simply be long risk — not generating genuine edge. 
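This is a minimal sketch of beta as sample covariance divided by sample variance, written in plain Python so the arithmetic stays visible. The return series are invented for illustration. The strategy is constructed as half the market's return plus a constant, so its beta should come out at 0.5. Both series are assumed to be aligned by date and sampled at the same frequency.

```python
def beta(strategy_returns: list[float], market_returns: list[float]) -> float:
    """Beta = Cov(strategy, market) / Var(market), using sample (n - 1) moments."""
    n = len(strategy_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two aligned series with at least two observations")
    mean_s = sum(strategy_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(strategy_returns, market_returns)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    if var_m == 0:
        raise ValueError("market returns have zero variance")
    return cov / var_m


# Hypothetical daily returns: the strategy is built as 0.5 x market + 0.1% per day.
market = [0.02, -0.01, 0.03, -0.02, 0.01]
strategy = [0.5 * m + 0.001 for m in market]
print(f"beta = {beta(strategy, market):.2f}")  # beta = 0.50
```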

Jensen’s Alpha is the excess return over what the Capital Asset Pricing Model (CAPM) would predict, given beta. A positive alpha suggests skill or a mispriced exposure relative to the benchmark, although persistent positive alpha is genuinely rare across most manager universes.

Formula: Alpha = Actual Return − [Risk-Free Rate + Beta × (Market Return − Risk-Free Rate)] 

Example: Risk-free rate 2%, market return 10%, beta 0.8 → expected return = 8.4%. If actual portfolio return is 5.7%, alpha = −2.7%. 

Key pitfall: Alpha is highly sensitive to benchmark selection. Using the wrong benchmark manufactures alpha or conceals risk. Short sample sizes and data-mined back-tests can create “paper alpha” that doesn’t persist. 
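A few lines are enough to check the CAPM arithmetic from the example. `capm_expected_return` and `jensens_alpha` are hypothetical helper names. All inputs are assumed to be annual rates over the same period.

```python
def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """CAPM expected return: Rf + beta * (Rm - Rf)."""
    return risk_free + beta * (market_return - risk_free)


def jensens_alpha(actual_return: float, risk_free: float, beta: float, market_return: float) -> float:
    """Actual return minus the CAPM expected return."""
    return actual_return - capm_expected_return(risk_free, beta, market_return)


# Example from the text: Rf = 2%, Rm = 10%, beta = 0.8, actual return = 5.7%.
print(f"expected = {capm_expected_return(0.02, 0.8, 0.10):.1%}")  # expected = 8.4%
print(f"alpha = {jensens_alpha(0.057, 0.02, 0.8, 0.10):.1%}")  # alpha = -2.7%
```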

Risk Metrics 

Maximum Drawdown (MDD) 

Maximum Drawdown is the greatest cumulative percentage decline from any previous equity peak to a subsequent trough. It is the most direct measure of capital pain. 

Formula: (Trough Value − Peak Value) ÷ Peak Value

Example: Peak $1,000,000 → Trough $820,000 → MDD = −18% 

Interpretation context: Large multi-strategy hedge fund composites have experienced drawdowns in the –20% to –25% range over multi-year periods. Systematic trend-following strategies often target maximum drawdowns no deeper than −15%. Digital asset strategies may tolerate deeper drawdowns, and some have exceeded −40% in volatile periods.

Key pitfall: MDD is sensitive to measurement frequency. Computing drawdown on monthly NAV can miss severe intramonth declines (sometimes called granularity bias). If a strategy trades intraday, drawdown should be measured at intraday granularity. 
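This minimal sketch computes maximum drawdown over an equity curve by tracking the running peak. The equity values are hypothetical. Keeping only every third point stands in for a coarser reporting frequency, and it shows how granularity bias can hide the true trough.

```python
def max_drawdown(equity: list[float]) -> float:
    """Most negative (value - running peak) / running peak, e.g. -0.18 for an 18% drawdown."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # highest equity seen so far
        worst = min(worst, (value - peak) / peak)
    return worst


# Hypothetical equity curve that bottoms at $820,000 after a $1,000,000 peak.
curve = [1_000_000, 950_000, 1_000_000, 900_000, 820_000, 880_000, 1_050_000]
print(f"fine-grained MDD: {max_drawdown(curve):.0%}")  # -18%
# Keeping only every third point (indices 0, 3, 6) skips the $820,000 trough.
print(f"coarse MDD: {max_drawdown(curve[::3]):.0%}")  # -10%
```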

Volatility 

Volatility — the standard deviation of periodic returns — is the foundational measure of total risk. 

Formula: σ_annualized = σ_period × √k, where k = periods per year 

Critical note for digital assets: Equities and FX conventionally use √252 (trading days). Digital assets trade 365 days a year, so annualization uses √365. Mixing these conventions materially changes volatility figures, and therefore any ratio that uses volatility, such as Sharpe. Because √(365/252) ≈ 1.20, the same daily volatility annualizes about 20% higher under √365.

Interpretation context: Under ~10% annualized is considered low (bond-like), 10–20% is moderate (broad equities), 20–30% is high, and above 30% is very high. Bitcoin’s realized volatility has historically been far above traditional markets — around 63% in 2023 and approximately 49% in 2024, versus the S&P 500’s high-teens range. 

Key pitfall: Volatility treats upside and downside equally. A strategy with high upside volatility and controlled downside volatility will look worse on Sharpe than its risk profile warrants. The Sortino ratio addresses this. 
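The sketch below annualizes the same series of daily returns under both conventions. The returns are invented for illustration, `annualized_volatility` is a hypothetical helper, and the sample standard deviation (`statistics.stdev`) is assumed. Whatever the data, the ratio between the two results is always √(365/252) ≈ 1.20.

```python
import math
import statistics


def annualized_volatility(period_returns: list[float], periods_per_year: int) -> float:
    """Sample standard deviation of periodic returns scaled by sqrt(periods per year)."""
    return statistics.stdev(period_returns) * math.sqrt(periods_per_year)


# Hypothetical daily returns (illustrative only).
daily = [0.012, -0.008, 0.021, -0.015, 0.004, -0.011, 0.017, -0.002]
vol_252 = annualized_volatility(daily, 252)  # equity/FX trading-day convention
vol_365 = annualized_volatility(daily, 365)  # 24/7 digital asset convention
print(f"sqrt(252): {vol_252:.1%}, sqrt(365): {vol_365:.1%}, ratio: {vol_365 / vol_252:.2f}")
```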

Value-at-Risk (VaR) 

VaR estimates a loss threshold over a stated time horizon at a stated confidence level. Losses should exceed that threshold only with a probability equal to one minus the confidence level. For example, a 95% daily VaR of $87,000 means there is a 5% chance the portfolio loses more than $87,000 in a single day.

There are two common approaches: 

  • Parametric VaR: Assumes returns follow a normal distribution and derives the loss threshold from the portfolio’s mean and standard deviation.
  • Historical VaR: Sorts past returns and reads off the relevant percentile loss. 
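Here is a minimal sketch of both approaches using Python's standard library, where `statistics.NormalDist` supplies the normal quantile. The return series is a made-up, evenly spaced grid. The historical version uses one simple percentile convention: the loss at index floor((1 − c) · n) of the sorted returns. Other tools interpolate differently, so small numerical differences are expected. On this toy series the two methods land close together, but on real digital asset returns they can diverge sharply in the tails.

```python
import math
import statistics
from statistics import NormalDist


def parametric_var(returns: list[float], confidence: float = 0.95, portfolio_value: float = 1.0) -> float:
    """Normal-distribution VaR: loss at the (1 - confidence) quantile of N(mean, stdev)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    mu = statistics.fmean(returns)
    sigma = statistics.stdev(returns)
    z = NormalDist().inv_cdf(1 - confidence)  # negative, about -1.645 at 95%
    return -(mu + z * sigma) * portfolio_value


def historical_var(returns: list[float], confidence: float = 0.95, portfolio_value: float = 1.0) -> float:
    """Empirical VaR: read the loss off the sorted historical returns."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    ordered = sorted(returns)
    index = math.floor((1 - confidence) * len(ordered))
    return -ordered[index] * portfolio_value


# Hypothetical daily returns from -5.0% to +4.9% in 0.1% steps (100 observations).
returns = [i / 1000 for i in range(-50, 50)]
print(f"parametric 95% VaR: ${parametric_var(returns, 0.95, 1_000_000):,.0f}")
print(f"historical 95% VaR: ${historical_var(returns, 0.95, 1_000_000):,.0f}")
```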

Key pitfall: VaR ignores the severity of losses beyond the confidence threshold. Parametric VaR assumes distributional properties that often fail in fat-tail markets. Digital asset returns regularly violate the normality assumption, particularly at high confidence levels — meaning VaR can materially understate crash risk. Always pair VaR with stress testing and scenario analysis. 

Digital asset-specific note: Risk can spike when traditional market liquidity providers are offline. Weekend and overnight gaps can worsen drawdowns and degrade VaR model fit if models rely on weekday-only data samples. 

Risk-Adjusted Metrics 

Sharpe Ratio 

The Sharpe ratio is the most widely used risk-adjusted metric. It measures excess return per unit of total volatility. 

Formula: (Strategy Return − Risk-Free Rate) ÷ Annualized Standard Deviation 

Example: Annual return 12%, risk-free rate 2%, annualized volatility 20% → Sharpe = 0.50 

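If you reduce volatility to 15% without changing the return, Sharpe rises to 0.67. This is why volatility targeting can be a powerful lever, with one caveat. Scaling every position down uniformly cuts return and volatility in proportion and leaves Sharpe unchanged, so any gain has to come from shedding risk that was not earning a proportional return.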
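This is a minimal sketch of the Sharpe calculation in two forms. The first works directly from annual figures and reproduces the example. The second works from periodic returns and checks that uniform scaling leaves the ratio unchanged. In the periodic version, dividing the annual risk-free rate by the number of periods is a simplifying assumption. The helper names and the daily returns are hypothetical.

```python
import math
import statistics


def sharpe_from_annual(annual_return: float, risk_free: float, annual_vol: float) -> float:
    return (annual_return - risk_free) / annual_vol


def sharpe_from_returns(period_returns: list[float], risk_free_annual: float, periods_per_year: int) -> float:
    """Annualized Sharpe: mean periodic excess return / its stdev, times sqrt(periods per year)."""
    rf_period = risk_free_annual / periods_per_year  # simple de-annualization (assumption)
    excess = [r - rf_period for r in period_returns]
    return statistics.fmean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


print(round(sharpe_from_annual(0.12, 0.02, 0.20), 2))  # 0.5
print(round(sharpe_from_annual(0.12, 0.02, 0.15), 2))  # 0.67

# Halving every position (with a zero risk-free rate) halves both mean and stdev,
# so the ratio does not change: position scaling alone does not improve Sharpe.
daily = [0.012, -0.008, 0.021, -0.015, 0.004, -0.011, 0.017, -0.002]  # hypothetical
half = [0.5 * r for r in daily]
print(math.isclose(sharpe_from_returns(daily, 0.0, 365), sharpe_from_returns(half, 0.0, 365)))
```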

Interpretation context: ~0.5 is modest, ~1.0 is strong, and >1.5 is excellent for liquid strategies — though context matters (fees, leverage, and tail risk all affect interpretation). 

Key pitfall: Sharpe penalizes upside and downside volatility equally, and can be inflated by strategies with crash risk (like short volatility positions) that look stable until they don’t. Always examine Sortino and Calmar alongside Sharpe. 

Sortino Ratio 

The Sortino ratio is similar to Sharpe but penalizes only downside volatility — a useful adjustment when upside variability is not considered problematic. 

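Formula: (Strategy Return − Risk-Free Rate) ÷ Downside Deviation. Downside deviation is the root-mean-square of returns’ shortfalls below a target rate, commonly zero or the risk-free rate.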

Example: Take the same 12% return, 2% risk-free rate, and 20% total volatility, but with a downside deviation of only 10%. Sortino = 1.0 while Sharpe = 0.50. The gap tells you that more of the strategy’s variability comes from upside moves, which is a meaningfully different risk profile.

Key pitfall: If downside deviation is computed on too few observations, the ratio becomes unstable. Strategies with infrequent but severe drawdowns may not show a deteriorating Sortino until it’s too late. 
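Here is a minimal sketch of downside deviation and an annualized Sortino ratio. Conventions vary. The one assumed here measures shortfalls below a per-period target (zero by default) and averages the squared shortfalls over all observations. The function also refuses to return a value when no observations fall below the target, which echoes the instability warning above. The helper names and the daily returns are hypothetical.

```python
import math
import statistics


def downside_deviation(period_returns: list[float], target: float = 0.0) -> float:
    """Root-mean-square of shortfalls below `target`, averaged over all observations."""
    squared = [min(r - target, 0.0) ** 2 for r in period_returns]
    return math.sqrt(sum(squared) / len(period_returns))


def sortino_ratio(period_returns: list[float], risk_free_annual: float, periods_per_year: int,
                  target: float = 0.0) -> float:
    rf_period = risk_free_annual / periods_per_year  # simple de-annualization (assumption)
    dd = downside_deviation(period_returns, target)
    if dd == 0:
        raise ValueError("no returns below target: Sortino is undefined on this sample")
    mean_excess = statistics.fmean(period_returns) - rf_period
    return mean_excess / dd * math.sqrt(periods_per_year)


# Annual-figure check from the text: (12% - 2%) / 10% downside deviation = 1.0.
print(round((0.12 - 0.02) / 0.10, 2))  # 1.0

daily = [0.012, -0.008, 0.021, -0.015, 0.004, -0.011, 0.017, -0.002]  # hypothetical
print(f"Sortino (daily, 365-day year): {sortino_ratio(daily, 0.02, 365):.2f}")
```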

Calmar Ratio 

The Calmar ratio is a drawdown-based efficiency metric commonly used in systematic and trend-following strategy evaluation. It maps directly to investor pain. 

Formula (common convention): CAGR ÷ |Maximum Drawdown| 

Example: CAGR 10%, MDD −20% → Calmar = 0.50. Keep CAGR at 10% but reduce MDD to −10% and Calmar doubles to 1.0 — often a meaningful upgrade in real-world investability. 

Key pitfall: Calmar can be inflated by selecting a short measurement window that misses the worst historical drawdown. Always verify the evaluation period covers multiple market regimes. 
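This self-contained sketch computes Calmar straight from an equity curve. It combines a geometric CAGR with a running-peak maximum drawdown. The curve and the three-year horizon are hypothetical, chosen so the answer matches the 10% / −20% example above.

```python
def calmar_ratio(equity: list[float], years: float) -> float:
    """CAGR / |maximum drawdown| for an equity curve covering `years` years."""
    if len(equity) < 2 or years <= 0:
        raise ValueError("need at least two equity points and a positive horizon")
    cagr = (equity[-1] / equity[0]) ** (1 / years) - 1
    peak, mdd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        mdd = min(mdd, (value - peak) / peak)
    if mdd == 0:
        raise ValueError("no drawdown in sample: Calmar is undefined")
    return cagr / abs(mdd)


# Hypothetical curve: 100 -> 133.1 over 3 years (10% CAGR) with a 120 -> 96 drop (-20%).
curve = [100.0, 120.0, 96.0, 133.1]
print(f"Calmar = {calmar_ratio(curve, 3):.2f}")  # Calmar = 0.50
```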

Trade-Level Efficiency Metrics 

Return and risk metrics describe outcomes at the portfolio level. Trade-level metrics explain how those outcomes were generated — and whether the strategy’s edge is consistent enough to survive real costs and losing streaks. 

Win Rate 

The percentage of trades that are profitable. Win rate alone is not edge. A strategy can win 80% of the time and still be a net loser if the losses dwarf the wins. 

Win Rate = Winning Trades ÷ Total Trades 

Profit Factor 

Gross profit divided by gross loss. A Profit Factor above 1.0 is technically profitable before costs; many practitioners view >1.3 as decent and >1.7 as strong, though this varies by trade frequency and cost structure. 

Profit Factor = Gross Profit ÷ |Gross Loss| (the sum of winning trades divided by the absolute value of the sum of losing trades)

Key pitfall: Profit Factor often looks better in back-tests that don’t account for spreads, fees, and slippage. After adding realistic transaction costs, many marginal strategies dip below 1.0 in live trading. 
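This minimal sketch computes win rate and profit factor from per-trade P&L. It then subtracts a flat per-trade cost to show how quickly a marginal profit factor can fall below 1.0. The trade list and the $25 cost are hypothetical, and real costs vary with spread, fees, and slippage. Zero-P&L trades count as non-wins and add nothing to either gross figure.

```python
import math


def win_rate(pnls: list[float]) -> float:
    if not pnls:
        raise ValueError("no trades")
    return sum(1 for p in pnls if p > 0) / len(pnls)


def profit_factor(pnls: list[float]) -> float:
    """Gross profit divided by the absolute value of gross loss."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss


gross = [120, -80, 45, -60, 200, -90, 30, -40]  # hypothetical per-trade P&L before costs ($)
cost_per_trade = 25  # hypothetical all-in cost per round trip ($)
net = [p - cost_per_trade for p in gross]

print(f"before costs: win rate {win_rate(gross):.0%}, PF {profit_factor(gross):.2f}")  # 50%, 1.46
print(f"after costs:  win rate {win_rate(net):.0%}, PF {profit_factor(net):.2f}")  # 50%, 0.80
```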

Expectancy 

Expectancy is the average profit per trade — the mathematical engine of long-run profitability. 

Expectancy = (Win Rate × Average Win) − (Loss Rate × Average Loss), with Average Loss expressed as a positive amount

Example: Win rate 45%, average win $300, loss rate 55%, average loss $180 → Expectancy = $135 − $99 = $36 per trade 
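The expectancy arithmetic can be sketched two ways. The first uses summary statistics, as in the formula above, with average loss passed as a positive amount. The second takes the plain mean of per-trade P&L. The trade list below is hypothetical, built as 45 wins of $300 and 55 losses of $180, so both routes should agree. The helper names are also hypothetical.

```python
import statistics


def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Average P&L per trade; avg_loss is a positive magnitude."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss


def expectancy_from_trades(pnls: list[float]) -> float:
    """Equivalent route: the mean of per-trade P&L."""
    return statistics.fmean(pnls)


print(round(expectancy(0.45, 300, 180), 2))  # 36.0

trades = [300.0] * 45 + [-180.0] * 55  # hypothetical: 45 wins, 55 losses
print(round(expectancy_from_trades(trades), 2))  # 36.0
```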

A positive expectancy that survives costs and holds across market regimes is the foundation of any durable strategy. 

Risk–Reward Ratio 

Average win divided by average loss per trade. A strategy with a low win rate can still be highly profitable with a high enough reward-to-risk ratio — which is why win rate alone tells you very little. 

R:R = Average Win ÷ Average Loss 

Digital asset note: In fast-moving 24/7 markets, stops are not guarantees. Slippage during liquidation events or periods of thin liquidity can blow through average loss assumptions and collapse expectancy in ways that don’t appear in back-tests built on liquid-hours data. 
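Win rate and R:R are linked by a break-even condition. Expectancy is zero when p × W = (1 − p) × L, so the break-even win rate is 1 ÷ (1 + R:R). The sketch below applies this to the expectancy example's averages ($300 win, $180 loss). It then repeats the calculation for a hypothetical case where slippage pushes the average loss to $240, the kind of deterioration the note above describes. The helper names are hypothetical.

```python
def reward_to_risk(avg_win: float, avg_loss: float) -> float:
    return avg_win / avg_loss


def breakeven_win_rate(rr: float) -> float:
    """Win rate at which expectancy is zero: 1 / (1 + R:R)."""
    return 1 / (1 + rr)


for avg_loss in (180, 240):  # 240 is a hypothetical slippage-inflated average loss
    rr = reward_to_risk(300, avg_loss)
    print(f"avg loss ${avg_loss}: R:R {rr:.2f}, break-even win rate {breakeven_win_rate(rr):.1%}")
# 37.5% and 44.4%: a 45% win rate goes from a comfortable margin to a razor-thin one.
```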

Reading a Back-Test Report: A Layered Approach 

A robust evaluation process reads metrics in sequence, not in isolation. 

Layer 1 — Validate return realism. Start with Total Return and CAGR. If CAGR is dramatically above plausible ranges for the strategy type, investigate: hidden leverage, missing costs, or overfitted rules are the most common culprits. A CAGR of 30–50% with minimal drawdown warrants serious skepticism before any capital is committed. 

Layer 2 — Check risk and pain. Review MDD and volatility together. Ensure drawdown is measured on sufficiently granular data — monthly sampling can mask severe interim declines. For digital assets, use correct annualization (√365) and account for 24/7 risk windows including weekends. 

Layer 3 — Evaluate efficiency. Use Sharpe, Sortino, and Calmar together. A moderate CAGR with excellent Calmar can often be scaled more safely than a higher CAGR with poor drawdown control. If Sharpe is decent but Calmar is poor, look for crash risk: low-frequency, high-severity drawdowns that don’t show up in volatility. 

Layer 4 — Attribute returns. Examine beta and alpha. If performance is largely explained by high beta to a rising market, the “strategy” may simply be leveraged long risk. Separate market exposure from genuine edge before drawing conclusions. 

Layer 5 — Inspect trade quality. Review win rate, profit factor, expectancy, and R:R. If expectancy is only slightly positive, small cost changes can break the strategy entirely. If profit factor is strong but win rate is low, ensure you can operationally and psychologically tolerate extended losing streaks. 

Common Pitfalls 

Annualization errors — Using √252 when a market trades 365 days materially changes volatility and every ratio built on it. Choose your convention and apply it consistently across all strategies you compare. 

Granularity bias in drawdowns — A daily or monthly equity curve can hide deep intraday crashes. If a strategy trades intraday, compute drawdown intraday. 

Model risk in VaR — Parametric VaR assumes normality. Digital asset returns often violate this assumption, especially at high confidence levels. Prefer historical methods, stress tests, and tail-focused models when evaluating digital asset strategies. 

Benchmark mistakes — Alpha depends on the benchmark model. A misaligned benchmark manufactures alpha or conceals risk. Match the benchmark to the strategy’s actual opportunity set. 

Back-test biases — Survivorship bias inflates back-test returns by excluding failed assets or delisted instruments. Overfitting produces high in-sample metrics that collapse out-of-sample. Both are extremely common and easy to miss. 

Ignoring costs — Trade-level metrics often degrade sharply once you add spreads, fees, slippage, and funding costs. A back-test that doesn’t model these realistically is not a reliable predictor of live performance. 

Practical Levers That Don’t Require New “Alpha” 

Many of the most meaningful improvements to a strategy’s metrics come from risk design and execution discipline — not from changing the underlying signal logic. 

  • Volatility targeting — Scale position size to maintain a stable annualized volatility target (e.g., 10–15%). Scaling alone moves return and volatility in proportion and leaves Sharpe unchanged. The efficiency gain, where there is one, comes from cutting exposure during volatility spikes that add more risk than return.
  • Drawdown guardrails — Implement drawdown-based de-risking: reduce exposure after peak-to-trough declines of a defined threshold. This often improves Calmar by cutting MDD without requiring a change in strategy logic. 
  • Signal diversification — Combining uncorrelated strategies (trend-following, mean-reversion, carry) reduces portfolio volatility through imperfect correlation, improving both Sharpe and Calmar. 
  • Execution improvements — Better routing, reduced market impact, and slippage controls improve total return and profit factor — especially for higher-frequency strategies. 
  • Tail-risk modeling — If using VaR, complement it with scenario stress tests and non-parametric methods, particularly when evaluating digital asset strategies with fat-tail return distributions. 
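Here is a minimal sketch of the first two levers as position-scaling rules. Everything in it is an assumption made for illustration: the helper names, the 365-day annualization, the 1.0 leverage cap, and a guardrail that halves exposure once the drawdown from the running peak reaches −10%. A live system would also need look-back windows, re-entry rules, and cost-aware rebalancing.

```python
import math
import statistics


def vol_target_scale(recent_returns: list[float], target_annual_vol: float,
                     periods_per_year: int = 365, max_leverage: float = 1.0) -> float:
    """Exposure multiplier that aims realized volatility at the target, capped at max_leverage."""
    realized = statistics.stdev(recent_returns) * math.sqrt(periods_per_year)
    if realized == 0:
        return max_leverage
    return min(target_annual_vol / realized, max_leverage)


def drawdown_scale(equity: list[float], threshold: float = -0.10, reduced: float = 0.5) -> float:
    """Cut exposure to `reduced` while the current drawdown from the peak is at or below `threshold`."""
    peak = max(equity)
    current_drawdown = (equity[-1] - peak) / peak
    return reduced if current_drawdown <= threshold else 1.0


# Hypothetical inputs: a choppy stretch of daily returns and an equity curve about 13.6% off its peak.
recent = [0.05, -0.04, 0.03, -0.06, 0.04, -0.02, 0.05, -0.03]
equity = [100.0, 110.0, 95.0]
exposure = vol_target_scale(recent, target_annual_vol=0.15) * drawdown_scale(equity)
print(f"target exposure: {exposure:.2f}x")
```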

For 24/7 digital asset strategies: Risk controls must be automated. If the market never closes, manual oversight is not sufficient. Weekend liquidity shifts and overnight shocks can deepen a mild drawdown quickly if guardrails aren’t systematic and always-on. 

8 Questions to Ask Before Going Live 

  1. Are returns computed as total return (including income and funding costs), with fees and slippage modeled realistically?
  2. Is CAGR computed geometrically, and does the back-test span multiple market regimes?
  3. What is the maximum drawdown, and is it measured at appropriate granularity (not just monthly)?
  4. Is volatility annualized with the correct convention (√252 for equities/FX; √365 for digital assets)?
  5. Is the VaR method appropriate for fat-tailed return distributions, and does it match the actual holding period and confidence level?
  6. Do Sharpe, Sortino, and Calmar tell a consistent story, or is one ratio hiding tail risk?
  7. Is performance driven by beta to a rising market, or is there evidence of genuine alpha relative to the right benchmark?
  8. Are trade-level metrics (expectancy, profit factor, R:R) positive after adding realistic costs, and can you tolerate the implied losing streaks?

Mangrove provides trading tools and infrastructure — not financial advice. Digital asset trading involves significant risk, including potential loss of principal. Past strategy performance, whether backtested or live, does not guarantee future results. Users should consult a qualified financial advisor before making investment decisions. 

Ready to evaluate your strategy with transparent metrics and built-in risk controls? Request access to Mangrove → 
