C2 data mining for fun and profit


I have been following this whole discussion. I think you should stop the personal attacks; the conversation is not constructive anymore. From what I see here, Vixtrader at least shows his track record, and he is trading his system with real money. What else do you need? If he blows up his account, that's his money. Some of his Twitter followers came here because the developer has a track record. Maybe you should create your own system if you do not like his trading style, and stay away from his. I just do not want this topic to be closed suddenly because of off-topic posts and personal attacks. I have not subscribed to his program.


The number of days since creation is definitely a key variable. In Explorer we see that Sharpe tends toward 0 as the number of days grows…


Explorer code:

// How to obtain statistics + system data:
var sharpe_and_days_data =
    (from syst in C2SYSTEMS
     join statistics in C2STATS
         on syst.SystemId equals statistics.SystemId
     where statistics.StatName == "jSharpe"
     // try to filter out extremes:
     where statistics.StatValueVal >= 0 && statistics.StatValueVal <= 5
     select new {
         Id = syst.SystemId,
         Name = syst.SystemName,
         Days = Math.Round((DateTime.Now - syst.Started).TotalDays, 0),
         Sharpe = Math.Round(statistics.StatValueVal, 2)
     });

// Prepare data for ScatterChart:
var scatter_data = from item in sharpe_and_days_data
                   select new XYData() {
                       X = item.Sharpe,
                       Y = (decimal)item.Days };

// Show a graph:
CHART = ScatterChart.Create(
    "Sharpe vs Number of Days since live.",


// Show data we have:
TEXT = "Used data";
TABLE = sharpe_and_days_data;
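
One way to put a number on the trend in the scatter chart is to bucket systems by age and compare the median Sharpe per bucket. Below is a minimal Python sketch, assuming the table above has been exported as (days, sharpe) pairs. The function name, the bucket edges, and the sample rows are all made up for illustration; they are not C2 data and not part of Explorer.

```python
from statistics import median

def median_sharpe_by_age(rows, edges=(90, 365, 1095)):
    """Group (days, sharpe) pairs into age buckets; return median Sharpe per bucket.

    edges are hypothetical bucket boundaries in days. A system with
    days < edges[0] lands in bucket 0, one with edges[0] <= days < edges[1]
    in bucket 1, and so on; anything >= the last edge goes in the final bucket.
    """
    buckets = {i: [] for i in range(len(edges) + 1)}
    for days, sharpe in rows:
        idx = sum(days >= e for e in edges)  # number of edges already passed
        buckets[idx].append(sharpe)
    # Skip empty buckets so median() never sees an empty list.
    return {i: median(v) for i, v in buckets.items() if v}

# Made-up rows, only to show the call shape.
sample = [(30, 3.1), (60, 2.4), (200, 1.8), (300, 1.2), (800, 0.9), (1500, 0.4)]
result = median_sharpe_by_age(sample)
```

If the medians fall steadily from the youngest bucket to the oldest, that supports the visual impression; if they do not, the scatter may be dominated by a few outliers.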


Contrary to some of the comments in this old thread, I think it would be very useful to do a statistical analysis of the predictors of both profits and of system failure. How important is longevity? How important are low drawdowns? How important is a win % below 90%?

In my PhD statistical training, one of my professors noted that smart people were often good at guessing which variables are associated with outcomes and the direction of the relationship, but they are very poor at guessing the MAGNITUDE of the effects, and which effects persist after adding control variables.

Also, there are different sorts of outcomes that one could try to model/predict. For example, if one were predicting which strategies might fail over the next six months, my guess is that current high-flyers would be over-represented. But if one were predicting which systems would get the highest average returns over the next six months, my guess is that current high-flyers would be over-represented in that group as well. In part, of course, these two contrary outcomes would be expected because most extremely high-return strategies carry higher risk.

To do a good study, one would have to overcome the very serious survivor bias problem, probably by downloading the GRID and the LEADER BOARD today and then coming back later to test results.
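
The snapshot-then-revisit idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the system ids, the `annual_return` field, and both function names are hypothetical, and how the Grid is actually downloaded is left out.

```python
def follow_up(snapshot, later_visible_ids):
    """Label every system from the earlier snapshot by whether it is still listed.

    snapshot: dict mapping system id -> dict of stats recorded on the snapshot date.
    later_visible_ids: set of ids that still appear when the data is pulled again.
    Systems that have vanished stay in the result, so they are not silently dropped.
    """
    return [
        {"id": sid, **stats, "still_listed": sid in later_visible_ids}
        for sid, stats in snapshot.items()
    ]

def survival_rate(labelled):
    """Share of snapshot systems that are still listed at follow-up."""
    return sum(row["still_listed"] for row in labelled) / len(labelled)

# Made-up snapshot taken "today" and a made-up follow-up pull.
snapshot = {
    101: {"annual_return": 0.85},
    102: {"annual_return": 0.12},
    103: {"annual_return": 2.40},
}
later = {101, 102}

labelled = follow_up(snapshot, later)
rate = survival_rate(labelled)
```

The key design choice is that the denominator comes from the earlier snapshot, not from whatever is visible at follow-up.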

As David Stephens has argued, this community thrives when investors here actually make money–and stick around. Both Collective2 and System Developers probably make more money when there is a larger pool of investors to draw from, some of them looking for additional strategies for diversification.

If there are some predictors that are much better than others for predicting future returns or avoiding collapse, these should be moved to the Leader Board. Personally, I’d like to see “Returns since autotrading began,” and 3 month or 6 month returns with blanks for strategies not old enough to qualify.


I'm not sure C2 members care about the longevity of any one particular strategy; they are always on the lookout for the next hot strategy, just as they would be for the hottest stocks on the market.

As far as safety goes, you probably have to have faith in the trade leader, and hopefully he/she has a lot of his/her own money in the game, not just $10k or $20k. The minimum is probably $100K. Unfortunately, very few of those are available.

Constant communication with the trade leader is also key.


I would love to have $100k to put behind my strategy, but that would be imprudent at this time given my means. If my strategy performs as intended (and as it did in the few months before I joined C2), in a few years I will grow this strategy nest egg to that size and continue to trade it here.


You can start with 10k; all TOS systems are appreciated regardless of capital size.


Thanks @AndreyBlinkov, I’m currently trading 100% TOS with a new Roth IRA created/funded for this purpose (BrokerTransmit) but it won’t show as a TOS badge until my strategy makes 10 trades. I was only remarking that not all trade leaders have $100k to put behind their strategy (I do not), although I appreciate the sentiment behind the idea.


I’m reading through this post as my wife paints the trim in the basement…so I feel a bit guilty, but not too much. :slight_smile: It strikes me as interesting that I don’t think I read anything about understanding the system developer’s investment philosophy. I understand that the original post was more about data mining and creating an algo to predict future system success, which I do think would be interesting. But in addition to how the system has performed in the past, which is all the grid data shows…as a subscriber, I would be interested in understanding from the developer WHY the system performed the way it did, how it is expected to perform under different scenarios in the future, and WHY it is expected to perform that way. What’s the worst case scenario? If the system developer had all his/her money invested in the system, what would he/she be worried about? Without looking at past data or algorithms in a model, in words…WHY does the system work? Maybe not as efficient as a super-grid, but maybe more valuable??? :slight_smile:


It looks like Collective2 shows mainly active and recently active systems (at least through the API and the Grid). These are mostly more or less good systems. Unsupported systems (failed, or good but stopped) are removed or hidden. Data mining on the available data will have strong survivorship bias and might be useless for any prediction.
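
To get a feel for how much hiding failed systems can distort the picture, here is a toy Python simulation. Everything in it is an assumption chosen for illustration: every system has zero true edge, yearly returns are Gaussian noise with a 40% standard deviation, and any system below -30% is hidden. None of these numbers come from C2.

```python
import random
from statistics import mean

def simulate(n_systems=1000, hide_below=-0.30, seed=0):
    """Compare the average return of all systems with that of the visible ones.

    All parameters are hypothetical. Returns (mean of all, mean of visible,
    number of visible systems).
    """
    rng = random.Random(seed)
    # Zero true edge: each system's return is pure noise around 0.
    returns = [rng.gauss(0.0, 0.40) for _ in range(n_systems)]
    # Systems that lost more than the threshold are treated as hidden.
    visible = [r for r in returns if r > hide_below]
    return mean(returns), mean(visible), len(visible)

all_mean, visible_mean, n_visible = simulate()
```

Because only the losers are removed, `visible_mean` can never be lower than `all_mean`, even though no system in this toy world has any skill.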


I don’t know what C2 uses to set the order on the Leader Board, but C2 score and popularity are probably involved.

I just ran a regression on Grid data I downloaded a few weeks ago (when I joined C2). The C2 score can be predicted quite well (r=.804) with just three variables:

  1. Max DD;
  2. log of Annual Return (w trading costs);
  3. log of strategy age in days.

Popularity was not automatically codable, so it was not used.
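
For anyone who wants to try a similar fit, here is a rough Python sketch of an ordinary least squares regression on those three predictors, using only the standard library. It is not the exact procedure I ran: the helper names are hypothetical, the logs assume strictly positive annual return and age, and the sample rows are made up, with the "score" built from known coefficients so the fit can be checked against them.

```python
import math

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan solved."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def design_row(max_dd, annual_return, age_days):
    """Intercept plus Max DD, log annual return, log age (logs need positive inputs)."""
    return [1.0, max_dd, math.log(annual_return), math.log(age_days)]

def multiple_r(X, y, beta):
    """Multiple correlation R between fitted and observed values."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))

# Made-up rows: (max DD %, annual return %, age in days). Not real Grid data.
raw = [(10, 25, 100), (35, 80, 400), (20, 15, 900),
       (50, 120, 60), (5, 40, 1500), (28, 60, 250)]
X = [design_row(*row) for row in raw]

# Synthetic score built from known coefficients, so the fit should recover them.
true_beta = [10.0, -0.5, 8.0, 6.0]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = ols(X, y)
r = multiple_r(X, y, beta)
```

With real Grid data the score would not be an exact linear function of the predictors, so R would come out below 1.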

I was afraid that perhaps C2 maximized % winning trades or ignored the age of strategies, but neither turned out to be true.

So I wouldn’t say that C2 favors new strategies, except to the extent that many new strategies have unrealistically small drawdowns and unrealistically high annual returns. But system age is very important, too. And taking the log of annual return reduces the influence of excessively high-return systems somewhat.

The criteria that C2 uses are a priori sensible to use, though it might be slightly better to use variables that had been empirically proven to predict good future returns or low future system failure.


Excellent point indeed! :+1:

Such as?

Thank you for your input.


Getting back to the point of predicting future profitability, I do agree that an ML algo is going to be most effective. I also think that you have to look at the details of trades, NOT overall statistics. Personally, I find comparing the average “worst drawdown” of any given trade with the average trade P/L to be very enlightening. Looking at the details of trades can also help determine whether a system developer is changing the style of the system or just dealing with an unfavorable market. It also easily and quickly filters out things like martingale systems, and the number and style of trades show whether you have a representative sample of the system.
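
As a rough illustration of that trade-level comparison, here is a small Python sketch. The field names (`pnl`, `worst_drawdown`, `size`) are hypothetical, `worst_drawdown` is assumed to be the largest open loss during the trade expressed as a positive number, and the "size goes up after a loss" check is just one simple heuristic for martingale-like behaviour, not a definitive test. The sample trades are made up.

```python
from statistics import mean

def trade_profile(trades):
    """Summarise per-trade risk versus reward from a chronological list of trades."""
    avg_pnl = mean(t["pnl"] for t in trades)
    avg_dd = mean(t["worst_drawdown"] for t in trades)
    # Pairs of consecutive trades where the earlier one lost money.
    after_loss = [(prev, cur) for prev, cur in zip(trades, trades[1:])
                  if prev["pnl"] < 0]
    # How often did position size increase right after a loss?
    size_ups = sum(cur["size"] > prev["size"] for prev, cur in after_loss)
    return {
        "avg_pnl": avg_pnl,
        "avg_worst_drawdown": avg_dd,
        # Large values mean each trade risks far more than it typically earns.
        "dd_to_pnl": avg_dd / avg_pnl if avg_pnl > 0 else float("inf"),
        "size_up_after_loss": size_ups / len(after_loss) if after_loss else 0.0,
    }

# Made-up trades in chronological order.
trades = [
    {"pnl": 50.0, "worst_drawdown": 400.0, "size": 1},
    {"pnl": -300.0, "worst_drawdown": 600.0, "size": 1},
    {"pnl": 60.0, "worst_drawdown": 900.0, "size": 2},
    {"pnl": -700.0, "worst_drawdown": 1200.0, "size": 2},
    {"pnl": 1000.0, "worst_drawdown": 2000.0, "size": 4},
]
profile = trade_profile(trades)
```

In this made-up sample the average open loss per trade is many times the average profit, and size is raised after every loss, which is the kind of pattern worth a closer look.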

I’ve often considered building something like this, but it’s been a low priority among my research projects. If anyone is interested in collaborating, feel free to message me.