Evaluating a strategy for beginners

I’ve been an algorithmic trader for about 6 years, and I know a bit about how to evaluate a strategy, having evaluated many winning and losing strategies of my own. I’ve noticed that a lot of the strategies here show a huge return over a short time period, so I thought I’d give a short rundown of how I evaluate a change to one of my systems.

  1. Sample size is king
    a. Metrics like Sortino, Sharpe, expectancy, etc. are all fine, but none of them captures the sample size in question. You need to look at the sample size yourself and weigh it against the metric you’re looking at: a Sharpe of 1.5 with 20 samples is worthless, while a Sharpe of 1.5 with 1000 samples is significant.
  2. Time period is important
    a. How long has the strategy been running? 3 months is nothing. You want to see how the strategy acts over many market conditions, and a minor pullback in a massive bull run does not constitute a bear market. In a real bear market many (all?) of the top strategies on C2 are going to get completely donked, and then you will suddenly see a bunch of short strategies at the top of the leaderboard.
  3. Max DD is key
    a. Combined with a big sample size and a long time period, max drawdown (DD) gives you an idea of how a strategy acts in a downturn. Max DD of the S&P 500 from 2000-2019 is about 56%. That is buy-and-hold DD, and if you see that in an automated strategy, RUN AWAY. Most funds want to see max DD under 30%. You can use DD numbers to determine position sizing in a backtest, so if you see a massive max DD, chances are the strategy is overleveraged.
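For anyone who wants to check max DD themselves, here is a minimal sketch of one common way to compute it from an equity curve. The function name `max_drawdown` and the sample numbers are made up for illustration, and it assumes every equity value is positive.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak.

    `equity` is a sequence of account values in time order.
    Assumes all values are positive (an assumption of this sketch).
    """
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that high
    return worst


# Hypothetical equity curve: the deepest drop is from 130 down to 65.
curve = [100, 120, 90, 110, 130, 65, 100]
print(max_drawdown(curve))  # 0.5
```

One rough way to use this for sizing is to scale position size by (target DD / observed DD). That assumes drawdown grows about linearly with leverage, which is only an approximation.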

Evaluating a strategy is not hard if you have all the metrics needed, but trusting “the leaderboard” or whatever has the highest 30-day Sharpe is going to result in you losing all your money when market conditions change.
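To put a rough number on why a short-window Sharpe is weak evidence, here is a sketch that reports a Sharpe ratio together with an approximate standard error. It uses the usual large-sample approximation sqrt((1 + SR^2 / 2) / n) for the per-period ratio, which assumes returns are roughly independent and identically distributed; real returns often are not, so treat the result as a best case. The function name and the sample returns are hypothetical.

```python
import math
import statistics


def sharpe_with_stderr(returns, periods_per_year=252):
    """Return (annualized Sharpe, approximate standard error).

    `returns` are per-period excess returns. The standard error uses
    the IID large-sample approximation sqrt((1 + SR^2 / 2) / n),
    scaled to annual terms (an assumption of this sketch).
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    if stdev == 0:
        raise ValueError("returns have zero variance")
    sr = mean / stdev                        # per-period Sharpe
    se = math.sqrt((1 + 0.5 * sr * sr) / n)  # per-period standard error
    scale = math.sqrt(periods_per_year)
    return sr * scale, se * scale


# Made-up return pattern, repeated to get 20 and then 1000 samples.
pattern = [0.01, -0.005, 0.02, 0.0, 0.015]
for n_repeats in (4, 200):
    sharpe, stderr = sharpe_with_stderr(pattern * n_repeats)
    print(len(pattern) * n_repeats, round(sharpe, 2), round(stderr, 2))
```

The Sharpe comes out about the same for both sample sizes, but the standard error at 20 samples is several times larger than at 1000, which is the point about sample size above.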

Exposure to multiple segments of market conditions is what you want to see, not an overinflated single metric.

I’m not going to link to my strategy because I’m not trying to sell you something.


The Leader Board is not ranked by any single metric, as you suggest. It is ranked by C2 Score, which is a scoring method we’ve been working on for a very long time. (It incorporates many factors.)

I think other sites have the same problem: you end up weighting strategies with good recent results over strategies with good long-term results, since the metrics you’re using likely do not incorporate time range and sample size.