Call for C2 Score Algorithm

I would like to be able to personalize the current score formula by adjusting the weighting factors to suit my own needs. Adding more components (UI, UPI, etc.) as suggested in this thread is also a good idea. It would be a cafeteria score. Each component should be explained in an informative help page listing its derivation and examples of pros and cons.



I am also shocked to learn that the performance statistics table does not include commissions. That needs to be corrected.



It would also be nice to be able to export a list of strategy statistics into an Excel spreadsheet for side-by-side comparison. Users could then do whatever they like with the data. The copy-and-paste currently available is too time-consuming.

I would keep it simple. Personally, I evaluate trading systems by dividing the annualized return by the maximum drawdown. Minimum system age is 1 year.
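Here is a minimal sketch of the annualized-return-over-maximum-drawdown ratio described above. It assumes the system's equity curve is available as a list of account values and that its age in years is known. The function names and the `min_years` cutoff parameter are hypothetical. The annualized return is computed as the compound annual growth rate, which is one common choice.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def annualized_return(equity, years):
    """Compound annual growth rate between the first and last equity values."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def rar_over_maxdd(equity, years, min_years=1.0):
    """Annualized return / max drawdown; None if the system is younger than min_years."""
    if years < min_years:
        return None  # hypothetical minimum-age rule (1 year, per the post)
    dd = max_drawdown(equity)
    if dd == 0:
        return float("inf")  # no drawdown at all in the record
    return annualized_return(equity, years) / dd
```

For example, a one-year curve of 100, 120, 90, 130, 121 has a 25% maximum drawdown and a 21% return, so the ratio is 0.84.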


I think the C2 score should remain completely quantitative and reflect returns and volatility. Qualitative inferences are a slippery slope, since that kind of judgment is supposed to happen socially rather than through a C2 assessment.



With that said, the big thing missing from my perspective is tracking skewness and kurtosis. This would expose the martingale problem, as well as cases where several losses are covered by large occasional profits, or any other unusual shape of return distribution. I'm not knocking any of these, although I myself prefer normally distributed returns with positive skewness or better, but at least a subscriber would get a sense of how much turbulence he's in for. Similarly, one could avoid negative skew if he or she wished.
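To illustrate what tracking skewness and kurtosis could look like, here is a sketch that computes both from a list of periodic returns. It assumes simple (non-log) returns and uses plain population moments without small-sample bias correction. The function name is hypothetical, and the returns must contain at least two distinct values. A martingale-style record (many small gains, one large loss) shows up as negative skew.

```python
def skew_kurtosis(returns):
    """Skewness and excess kurtosis from population moments (no bias correction)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n  # variance
    m3 = sum((r - mean) ** 3 for r in returns) / n  # third central moment
    m4 = sum((r - mean) ** 4 for r in returns) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis


# Martingale-like: nine small wins, one large loss -> negative skew.
print(skew_kurtosis([0.01] * 9 + [-0.10]))
```

Mirroring those returns (nine small losses, one large win) flips the skew to positive. That is the "several losses covered by large occasional profits" pattern.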



Just my 0.02.



CCF & Co.

I'd give this a go from the statistical pattern recognition viewpoint if you'd make your historical data available.



Tony

I see no value in the C2 score; in fact, it is worthless to subscribers. I think it should be eliminated immediately.



Let's rely on the mathematical statistics that speak for themselves.




One issue is that the score does not take into account the user's risk tolerance or trading habits.



If you are willing to trade only manually, it does not matter how good the algorithm is if it makes 100 trades a day: you might not be able to keep up manually.



Also, IMHO, algorithms are not correctly judged on results alone. Results are a good guide only if you have a long history.



For example, my algorithm can make 1 trade a day, 2 trades a day, … or even 50+ trades a day.



I get the best results with a strategy of 1-2 trades every day. At 5 trades a day I still get good results, but they do not scale linearly (still, every additional trade brings in money).



For example, if I run 100 simulations with only 1 trade every day, 40% of those simulations will lose money in a given year and 60% will make money. But overall, on average, it would be a better investment than investing in 10 stocks every day (given that the pool of money is limited; with unlimited money it makes sense to do 50+ trades a day).



Every day, in advance, I could send N sets of the trades I want to make, each with a rank representing the quality of the trade. You could then keep track not only of the first 2 trades I enter but of all the trades I generate. Someone could then choose, based on their risk profile, to do 1 trade every day, or 50 trades if they have the resources.



See http://treepl.com/p-7966.csv.html for an example.



Regards,

Edison

Agree. See my post in the recent C2 Score thread.

Put heavier weight on the Sortino ratio and VaR downside measures.

Give less consideration to time duration and more to skill, trade attribution, favorable and adverse excursions, and entry and exit efficiency.



The CFA Institute has valuable cross-references in the areas of risk management and performance measurement. In fact, look into using the GIPS reporting methods, which help create excellent comparative performance analysis.
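For the VaR downside measure suggested in this post, one simple option is historical Value at Risk: the loss at a low percentile of the observed return distribution. The sketch below assumes a list of periodic returns. It uses Python's `statistics.quantiles` with the `inclusive` method (linear interpolation between observations, Python 3.8+). The function name and the `tail_pct` parameter are illustrative, not an existing C2 feature.

```python
import statistics


def historical_var(returns, tail_pct=5):
    """Historical VaR: the loss at the tail_pct-th percentile, reported as a positive number.

    tail_pct=5 corresponds to 95% confidence. Needs at least two returns.
    """
    # 99 cut points dividing the data into 100 equal-probability groups.
    cuts = statistics.quantiles(returns, n=100, method="inclusive")
    return -cuts[tail_pct - 1]
```

With 20 returns running from -10% to +9% in 1% steps, the 95% historical VaR comes out to 9.05%. A small sample like this is dominated by its few worst observations, which is worth keeping in mind when applying the measure to young systems.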



Only the market can be perfect; no one should score higher than 999.

The score is too liberal if too many systems earn a high one. ERAs and sports averages are rarely above 50%.



No score should go beyond 500.



It should fall faster than it can rise, and it should reflect skill-relative performance.
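One way to make a score "fall faster than it can rise" is asymmetric smoothing. Each update moves the published score toward a freshly computed raw score, closing a small fraction of the gap on the way up and a larger fraction on the way down. This is a sketch under that assumption. The function name, the two rates, and the 999 cap are hypothetical parameters; the cap could just as well be 500, per the suggestion above.

```python
def update_score(current, target, up_rate=0.05, down_rate=0.25, cap=999):
    """Move the score toward a new raw target, dropping faster than it rises.

    up_rate / down_rate: hypothetical fractions of the gap closed per update.
    cap: hypothetical ceiling so that no score reaches "perfect".
    """
    rate = up_rate if target > current else down_rate
    new_score = current + rate * (target - current)
    return min(new_score, cap)
```

A system at 500 whose raw score jumps to 600 moves only to 505 on the next update. If its raw score instead drops to 400, it falls to 475.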



The C2 score should be removed until the matter is remedied, as, IMO, Matt is risking his entire system more than he realizes, all in the respectable effort to protect it.



No score may be the best answer.



Hope all this makes sense.





The problem as I see it is that the C2 score reflects past performance, but subscribers want comfort that it will somehow reflect the system provider's future performance.



I would like to suggest that C2 has gathered enough statistics, and has enough collective brains, to develop a scoring system that provides some level of future prediction.



In other words, instead of attempting to determine which past "performance" parameters are of interest, let's focus on likely future "performance". This wouldn't be an easy task, but one needs to start by choosing factors and determining whether there is any correlation between each factor and future performance. Once some factors are identified, they could be combined to generate a C2 score.



Science fiction? Perhaps, but otherwise I don't see the value in the C2 score.

Steve, all a system can measure is past performance. Developers use a period of time, say four (4) years, to develop a system. This is called "in-sample" development. Once a system is developed to the point a programmer thinks it is profitable, he tests the system on an "out-of-sample" period. If the system continues to be profitable, then and only then should it be traded with real money. Most of the equity curves I have seen on this website shouldn't be traded. One of the metrics I like to see for judging a system is RAR/MAXDD (annual rate of return / maximum drawdown).

Greg - you might not understand what I am saying. There is plenty of historical data to work with on this site.



How about somebody does a simple backtest? Let's call it "in-sample" if that suits you.



Start with the parameter RAR/MAXDD. Gather RAR/MAXDD statistics for each system at historical points in time, for example every month. With each data point gathered, also capture the forward profitability of the system. Process each active system historically, month by month. At the end of this exercise you will know beyond a shadow of a doubt whether RAR/MAXDD had any predictive power for the system's future results.



Then try another parameter, and so forth. In the end, C2 would have some parameters with some predictive power. If they want a separate "out-of-sample" data set, then that's cool too.
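The month-by-month study described above can be sketched as follows. It assumes each sample pairs a factor value (such as RAR/MAXDD) measured for one system at one point in time with that system's forward return over the following period. The sketch uses Spearman rank correlation, one common way to check whether a factor ranks future results well. The function names are hypothetical. A real study would also test statistical significance and hold out an out-of-sample period.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; both inputs must have nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def factor_predictive_power(samples):
    """Spearman correlation between factor values and forward returns.

    samples: list of (factor_value, forward_return), one per system per month.
    +1 means the factor ranked future results perfectly; near 0 means no signal.
    """
    factors = [f for f, _ in samples]
    forward = [r for _, r in samples]
    return pearson(ranks(factors), ranks(forward))
```

Running this once per candidate factor ("then try another parameter") gives a like-for-like comparison of which factors carried any predictive signal.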



The reason why I suggest this is because it is absolutely meaningless to generate a C2 score based on what individuals feel are good factors. Base it on statistics, not gut feel.



Steve

Steve,

Well, now that you've explained it, it does make more sense. It is a good idea, and I do think it would add value in the selection process. Having said that, it still wouldn't guarantee the future success of a trading system. The truth is that all trading systems eventually fail. Trading is always done on out-of-sample data. You can look at old indicators from the past, such as the Donchian system, and realize that eventually the success of a system brings its demise. A lot of the systems I trade now only show stellar gains since 2008. Markets change, and systems have to change too.

Wowsers. Keep It Simple, Smarties!



Ratios galore:

It is my observation that a lot of systems have an excellent record until subscribers come aboard; so a performance ratio of the system with subs vs. without subs is important. (Perhaps C2 can completely recalculate stats for subscribers as a "separate system." And perhaps not a System Score and a Vendor Score, but a System Score and a Subscriber Score (how "well" do the subs do?) Or all three!?!?)



What percentage of the cumulative profit is coming from what percentage of trades? If profits are evenly distributed, the system is credited; if most of the cumulative profit comes from 1 or 2 trades out of many more, it is penalized (that basically means they hit the jackpot once or twice).
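Here is a sketch of the concentration check described above. It assumes a list of per-trade profits and losses and returns the share of net profit contributed by the best `top_fraction` of trades. The function name and `top_fraction` (10% here) are hypothetical. A value near `top_fraction` means profits are spread evenly; a value near or above 1.0 suggests the jackpot pattern.

```python
def top_trade_share(trade_pnls, top_fraction=0.1):
    """Fraction of net profit contributed by the best top_fraction of trades."""
    total = sum(trade_pnls)
    if total <= 0:
        return None  # concentration of profit is undefined for a non-profitable system
    k = max(1, round(top_fraction * len(trade_pnls)))  # at least one trade
    best = sorted(trade_pnls, reverse=True)[:k]
    return sum(best) / total
```

Values above 1.0 are possible: they mean the remaining trades lost money in aggregate.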



What is the relation (ratio) of cumulative profit (positive or negative) to average drawdown? (I saw someone mention something like this in the "gainers" column. I would like to see the best ratios there, not just a bunch of systems recouping from a large drawdown.) I think we can all agree that a 50% gainer with an average drawdown of 8% is a "better system" than a 100% gainer with a 50% drawdown (50/8 = 6.25 > 100/50 = 2).
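Here is a sketch of the cumulative-profit-to-average-drawdown ratio, taking an equity curve as input. "Average drawdown" is taken here as the mean depth of each distinct drawdown episode, where an episode ends when equity makes a new high. Other definitions exist, so treat this as one reasonable choice. The function names are hypothetical.

```python
def drawdown_depths(equity):
    """Max depth (fraction of peak) of each drawdown episode; an episode ends at a new high."""
    depths = []
    peak = equity[0]
    current = 0.0
    for value in equity[1:]:
        if value >= peak:
            if current > 0:
                depths.append(current)  # episode closed by a new high
            peak, current = value, 0.0
        else:
            current = max(current, (peak - value) / peak)
    if current > 0:
        depths.append(current)  # episode still open at the end of the record
    return depths


def profit_to_avg_drawdown(equity):
    """Cumulative return divided by the average drawdown-episode depth."""
    total_return = equity[-1] / equity[0] - 1.0
    depths = drawdown_depths(equity)
    if not depths:
        return float("inf")  # no drawdowns in the record
    return total_return / (sum(depths) / len(depths))
```

An equity curve of 100, 110, 101.2, 150 (a 50% gain with one 8% drawdown) scores about 6.25. A curve of 100, 200, 100, 200 (a 100% gain with one 50% drawdown) scores 2.0.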

Why not consider using the Sortino ratio instead of the Sharpe?



Most investors/traders here likely do not consider upside volatility as a problem. That is what they are looking for. No?



As an aside: I know you are working on the score, but I would love to see the Sortino ratio in the grid if you can squeeze it in there. Thanks for your time and consideration.
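To show why the Sortino ratio may suit this audience better, here is a sketch of both ratios on per-period returns. It assumes a zero risk-free rate and target and does no annualization; the function names are illustrative. The Sortino denominator (downside deviation) counts only returns below the target. As a result, large upside swings do not penalize it the way they penalize the standard deviation in the Sharpe ratio.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return / standard deviation (population, per period)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    std = math.sqrt(sum((x - mean) ** 2 for x in excess) / len(excess))
    return mean / std if std > 0 else float("inf")


def sortino_ratio(returns, target=0.0):
    """Mean excess return / downside deviation (only shortfalls below target count)."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    # Returns above the target contribute zero to the downside deviation.
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    return mean / downside if downside > 0 else float("inf")
```

For the returns 0.05, -0.01, 0.05, -0.01, the Sharpe ratio is about 0.67, while the Sortino ratio is about 2.83, because the big up periods count against Sharpe but not against Sortino.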

Just from my observations, I think the score should be based on performance. There are strategies on here with a 999 or a perfect 1000 score that have lost 20%-33% over the last 6 months. It's hard to explain to a subscriber why they lost a third of their account while the strategy is rated 1000.

I would like to propose that the following could also be used to help calculate the score:



1. Compounded annual return / average of the 25% largest drawdowns (higher is better)



2. Standard Trade Deviation (lower is better)

3. Standard Deviation by Month (lower is better)

4. Standard Deviation by Year (lower is better)

5. Kurtosis (0 is best)

6. Positive Skew (0 is best)

7. Negative Skew (0 is best)

8. Z-Ratio (higher is better)



1. This seems like a great factor to include as it will not severely impact good systems that have experienced larger drawdowns, and it still rewards those that manage to keep them to a minimum.



2-8. This will reward system creators that generate consistent and statistically significant returns, which can help create a smoother equity curve.
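Proposal 1 can be sketched as below. The sketch assumes the compounded annual return and the drawdown depths (one per drawdown episode, as fractions) have already been computed. The function name is hypothetical. Rounding the 25% up to at least one drawdown is an assumption made for short records.

```python
import math


def car_over_top_drawdowns(car, drawdown_depths, top_fraction=0.25):
    """Compounded annual return / mean of the largest top_fraction of drawdowns."""
    if not drawdown_depths:
        return float("inf")  # no drawdowns recorded
    k = max(1, math.ceil(top_fraction * len(drawdown_depths)))
    largest = sorted(drawdown_depths, reverse=True)[:k]
    return car / (sum(largest) / k)
```

Averaging the largest quarter of drawdowns means a single unusually deep drawdown does not dominate the result the way it would with plain maximum drawdown.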

While you could just have a single number, why not make available the numbers that are felt to be most important? This doesn't mean you won't have a C2 Score, but it could show people what the C2 score is based on and, at the same time, give each person a much broader view of what contributes to the score. Personally, I would find tools such as the Sharpe ratio, the Sortino ratio, and others more useful than a single ratio that prioritizes different things than I might.



In fact, why not let people make up their own custom ratio based on what they want? You could supply the standard information and a standard C2 Score, but letting people customize their own C2 Score with different formulas might be even better. Not only does this allow for more personal judgment (so people can't get mad at being stuck with a single standard C2 Score), but it also encourages innovation in coming up with formulas to find consistently successful systems, presumably making for smarter investors.

The number of subscribers or autotraders shouldn't be considered. The score must be determined by the performance of the strategy. Subscribers should be attracted to a strategy because it has a good C2 score, not the other way around, with subscribers pushing the score up.


I'm not sure that there'll be one score that users can rely on, given differing priorities and tolerances.

That said, I'd like to suggest improvements to the grid to help investors find strategies more easily. The current column customization is nice. My request is to add the ability to download these customized results as a CSV file that can then be imported for further combination and analysis.
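Here is a minimal sketch of the requested CSV download. It assumes each grid row is a dictionary mapping column names to values and that the user's selected columns are known. It uses Python's standard `csv.DictWriter`, and columns the user did not select are skipped. The function name and the column names in the example are hypothetical.

```python
import csv
import io


def grid_to_csv(rows, columns):
    """Serialize the user's selected grid columns for each strategy as CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any fields the user did not select.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"name": "Alpha", "cagr": 0.21, "max_dd": 0.25, "notes": "x"}]
print(grid_to_csv(rows, ["name", "cagr", "max_dd"]))
```

The output is plain CSV, which spreadsheet tools such as Excel can open directly for side-by-side comparison.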

Another suggestion is to add rankings of the different performance measures as percentiles (0-99 or 1-100) across all strategies. For measures like drawdown, where less is better, invert the ranking so that 100 (or 99) goes to the strategy with the lowest value. This would let us see more easily where a strategy stands relative to others on C2 for that measure. The separation would also let investors more easily see, combine, and weight the measures according to our own preferences.
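Here is a sketch of the percentile ranking described above, assuming one measure's values across all strategies are available as a list. Each strategy gets the percentage of other strategies it beats, scaled 0 to 100; rescaling to 0-99 or 1-100 is a one-line change. For less-is-better measures such as drawdown, the comparison is flipped so that the lowest value gets 100. Tied values share the same percentile. The function name and parameter are hypothetical.

```python
def percentile_ranks(values, higher_is_better=True):
    """Percent of other strategies each value beats, on a 0-100 scale."""
    n = len(values)
    if n == 1:
        return [100.0]  # a lone strategy trivially ranks at the top
    result = []
    for v in values:
        if higher_is_better:
            beaten = sum(1 for other in values if other < v)
        else:
            # e.g. drawdown: a smaller value beats a larger one
            beaten = sum(1 for other in values if other > v)
        result.append(100.0 * beaten / (n - 1))
    return result
```

Because every measure ends up on the same 0-100 scale, percentiles from different measures can be combined and weighted directly.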

I didn't see the original thread; it only showed up in my list now.

tl;dr;
I don't use the current C2 score, so if you are not interested in the opinion of a non-user, you can stop reading now.

I don't use the current score for 2 reasons:

(1) I don't believe in using a black-box formula to help with my financial decisions;
(2) I don't believe that one size fits all. As a matter of fact, I don't even believe that one size fits a single person. I.e., I make my decisions about systems based on multiple formulas; it is a little more effort than simply sorting systems by a single metric, but I couldn't find a single metric even for one person (myself).

I understand that C2 wants to keep selection (the Grid) simple, but IMHO to make it too simple makes it useless.

I believe that you actually can have the cake and eat it too.

Here is (one) way to do that.

Develop a whole bunch of metrics. If C2 picks up this old thread, we can brainstorm about these. And I mean a big bunch. Hundreds.

Offer a UI where I can select and weight my metrics. Of course, given that we are talking about a LOT of metrics, the UI would be somewhat involved, e.g. grouping the metrics for easier selection and navigation.

This would solve #1 because it wouldn't be a black box any more.

This would solve #2 because I could have >1 weighted function (to let me name and save them would be nice.)

And this could keep the baby rather than throwing it out with the bathwater, as C2 could offer 'pre-packaged' metrics, maybe more than one, appropriately named and fully described (the metrics and weights published), like "Long term, conservative, IRA-appropriate" (weights applied: age: 7.5, trades short: -1000, DD: 10.5, etc.).
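Here is a sketch of named, saved weighting presets. It assumes each strategy's metrics have already been normalized to comparable scales, for example percentiles where higher is better. The metric keys and the preset reuse the illustrative weights from the post. They are hypothetical, not actual C2 fields, and the function names are hypothetical too.

```python
# Hypothetical preset: metric names and weights are illustrative only.
# "trades_short" is assumed to be 1 if the strategy trades short, else 0;
# "drawdown_pct" is assumed to be a percentile where higher = shallower drawdowns.
PRESETS = {
    "Long term, conservative, IRA-appropriate": {
        "age": 7.5,
        "trades_short": -1000.0,
        "drawdown_pct": 10.5,
    },
}


def weighted_score(metrics, weights):
    """Weighted sum of a strategy's normalized metrics; missing metrics count as 0."""
    return sum(weight * metrics.get(name, 0.0) for name, weight in weights.items())


def rank_strategies(strategies, weights):
    """Strategy names sorted from highest to lowest weighted score."""
    return sorted(
        strategies,
        key=lambda name: weighted_score(strategies[name], weights),
        reverse=True,
    )
```

A user-defined weighting would just be another named entry in `PRESETS`, which covers naming and saving more than one weighted function. Publishing the dictionary itself is what keeps the score from being a black box.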

=========

A next phase, a much harder but extremely interesting approach to the same problem (in addition to, not instead of, the above), could be to find systems (metrics, weights) based on samples. I.e., "I like the following systems: A, B, C, because [random examples:] they use low margin, are in the market most of the time, don't have more than 5 positions open, have a win/loss ratio > 2, etc."

In general: now we have "select strategies" and then "The Grid" and "Explorer"; maybe we want 4, 5, or 6 alternatives here, not only two, like "The Filter", "The Sampler", etc.

Thanks,

Joseph
