Just noticed the "new" advanced statistics - a complete breakdown of Sharpe and drawdowns…
Thanks Matt for constantly looking for ways to improve this site.
-David
This looks very interesting to me as well. Is there any recommended reading I can do to help me understand what it is trying to tell me? Thanks, Rick Haines
First, the thanks ought to go to Jules Ellis, who wrote the new statistics module. Jules is in the process of putting together a small guide which will explain some of the stats in more detail. I’ll post it shortly.
A few caveats: This is a very early release, and is undergoing some tweaking. You will notice, for example, that some of the statistics listed in Jules' module do not match the statistics elsewhere in C2. This is because the different software pieces use slightly different raw data. Over the next week or so I will make sure the data is consistent, and you will begin to see some of Jules' stats work their way into the rest of C2.
I didn’t want to call too much attention to the new statistics yet, since we are still ironing out some of the kinks. So a much more public and hearty thanks to Jules will be forthcoming soon.
Matthew
P.S. Jules is one of several C2 Members to contribute software or add-on modules to C2. If you are a programmer looking for an interesting project to keep you busy, please consider contributing to the site. If you need ideas or guidance, contact me.
Would it be possible to put my previously suggested Hold&Hope Indicator into the mix? I find that it is the best way to distinguish systems that make money from trick plays (averaging down, refusing to take a loss) or things like a $19,000 drawdown en route to a $600 profit…
The calculation was (from the system track record):
1) eliminate all rows with "No calc", since they cannot be used as part of the formula
2) sum max DDs in remaining rows
3) sum net profit in same remaining rows
Hold&Hope Indicator = #3 / #2.
Systems that use a lot of tricks to get a nice equity curve are generally somewhat below 0.4. Averaging down drives up the drawdown (and lowers this value). Holding on until a trade turns around, even through severe drawdowns, also lowers the Hold&Hope Indicator. Really bad systems might be around 0.2.
Good, tradable systems are usually > 0.5. Great systems (over 1.0) are quite rare…
This is also rather intuitive. It is a real worry when someone has a 0.2 - meaning that the subscriber is punished with about $1000 in drawdown for $200 in profit.
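For readers who want to try this on a track record themselves, here is a minimal sketch of the three steps above in Python. The row layout (a 'net_profit' and 'max_drawdown' field per row, with "No calc" marking rows that couldn't be computed) is hypothetical; the actual C2 track-record format may differ.

```
# Minimal sketch of the Hold&Hope indicator described above.
# The row layout is hypothetical; C2's actual track-record data may be structured differently.

def hold_and_hope(rows):
    """rows: list of dicts with 'net_profit' and 'max_drawdown' in dollars,
    where 'max_drawdown' may be the string "No calc"."""
    # 1) eliminate all rows with "No calc"
    usable = [r for r in rows if r["max_drawdown"] != "No calc"]
    # 2) sum max drawdowns in the remaining rows
    total_dd = sum(r["max_drawdown"] for r in usable)
    # 3) sum net profit in the same rows
    total_profit = sum(r["net_profit"] for r in usable)
    # indicator = #3 / #2
    return total_profit / total_dd if total_dd else float("nan")

# Example: $200 profit earned through $1,000 of drawdown -> 0.2
print(hold_and_hope([{"net_profit": 200, "max_drawdown": 1000}]))
```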
While I welcome the efforts by Matthew and Jules to increase the functionality of the site, I see one problem. These statistics are necessarily based on an average hypothetical account. What matters to me is what will happen in MY account.
For example, extreme-os, a popular system, has a Sharpe ratio of 5.7 according to the "old" C2 statistics. I am subscribed to this system and have experienced an average of $0.065 slippage per trade, in addition to $0.005 commissions (both per share). As a result, in MY account the Sharpe ratio would have been ~4. One of the reviewers of this system reports >$0.10 slippage per side (>$0.20/trade) when trading 1000-4000 shares, and his Sharpe ratio would have been less than 1!
Thus, all we can really say about the Sharpe ratio for extreme-os is that "it lies somewhere between 0 and 5, depending on individual circumstances". In other words, it is pointless to report these statistics to three-digit precision for systems where slippage and commissions have a serious impact.
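Just to make the mechanics concrete, here is a rough sketch of how per-share slippage and commissions could be folded into a system's trades before computing a Sharpe ratio. The trades, account size, and the treatment of each trade as roughly one daily return are all invented for illustration; this is not how C2 actually computes its statistics.

```
import statistics as stats

# Hypothetical illustration only: per-share costs mirror the figures quoted above
# ($0.065 slippage + $0.005 commission, per share, per trade).
ACCOUNT = 100_000.0
SLIPPAGE_PER_SHARE = 0.065
COMMISSION_PER_SHARE = 0.005

# (shares, gross profit per share) -- invented trades
trades = [(1000, 0.30), (2000, -0.10), (1500, 0.25), (1000, 0.05), (2000, 0.15)]

def sharpe(returns, periods_per_year=252):
    # naive annualized Sharpe: each trade treated as one period, risk-free rate ignored
    mean, sd = stats.mean(returns), stats.stdev(returns)
    return (mean / sd) * periods_per_year ** 0.5 if sd else float("nan")

gross = [sh * pps / ACCOUNT for sh, pps in trades]
net = [sh * (pps - SLIPPAGE_PER_SHARE - COMMISSION_PER_SHARE) / ACCOUNT for sh, pps in trades]

print("gross Sharpe:", round(sharpe(gross), 2))
print("net Sharpe:  ", round(sharpe(net), 2))
```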
For systems that are sensitive to slippage and commissions, I think C2 should not report these statistics, as they are meaningless and misleading. For all other systems, the statistics are potentially a great source of information.
Bravo to Jules and Matthew! Thank you!
I agree with Ross that some form(s) of Net Profit / DD calculation would be at the top of my wish list.
Thanks again.
Ross and Science Trader,
I agree with the comments, but it cannot be done in the current project because the project is based entirely on the array of daily equity values, as I proposed in December on the forum. I'm aware of the limitations. Perhaps it can be done in a next project. But I wanted to begin with this because the data structure is simple. A software project like this is likely to have all kinds of communication and integration problems (as is the case right now), so I think it is wise to start with the easy things and learn from them before attempting more difficult things. This is not a promise that there will be a next project, though. MK and I will first have to finish this one and evaluate it.
Jules,
What I'm suggesting is: just omit the statistics when they're not meaningful, i.e. for systems where slippage and commissions have a severe impact. It's better not to report the information than to report wrong information. I just think reporting these statistics for all systems is too big a leap at once. Why not start with systems where the Cumu $ profit after slippage and commissions is close to the raw Cumu $ profit, and leave other systems for a future project in which you or others find a way to make these statistics robust to slippage and commissions? This wouldn't require any additional work for you, since the Cumu $ profit after slippage and commissions and the raw Cumu $ profit are already displayed for a fair number of systems. All that is needed is a simple rule that determines whether the advanced statistics link is active or not.
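For what it's worth, the kind of gating rule suggested here could be as simple as the sketch below; the 10% tolerance is an invented placeholder, not a recommendation.

```
# Hypothetical rule: only activate the "advanced statistics" link when slippage and
# commissions don't change the picture much. The tolerance value is invented.

def show_advanced_stats(raw_cumu_profit, cumu_profit_after_costs, tolerance=0.10):
    if raw_cumu_profit <= 0:
        return False
    cost_impact = (raw_cumu_profit - cumu_profit_after_costs) / raw_cumu_profit
    return cost_impact <= tolerance

print(show_advanced_stats(10_000, 9_500))  # True: costs eat <10% of the profit
print(show_advanced_stats(10_000, 4_000))  # False: costs dominate, so keep the link inactive
```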
Science Trader,
I’ll let Matthew decide, but I prefer the link being active for all systems. For example, you mentioned extreme-os, but I am really interested in the output for that system. I am subscribed to it too.
- I would like to see what the Compounded Annual Return is; this will probably be less than the Annualized return that is reported.
- I also want to see what the performance of the last 6 months is, because I suspect it is decreasing.
- I also want to see what the reliability of the estimates is, because the future won’t be exactly the same as the past.
Slippage comes on top of this. But even without it, the above points can explain why some people expect too much of the system, and displaying these statistics may lower their expectations.
If the information value of the statistics was 0 in these cases then I would agree that they should not be reported, but I think that the information value is larger than 0 even though it is not 100%. In that case I prefer the user to decide for himself how he uses the information.
The guide that I have written explains how slippage can greatly affect the return. It also points out the limitation of not using the trade data.
Nevertheless, I agree with you in that I can imagine a naive user being overwhelmed by the statistics and believing that this page must be the "truth". Perhaps it would be better to change the name "Advanced statistics" into "Detailed statistics" (less intimidating) and to add some warning text at the top of the page that addresses these two limitations.
Science Trader,
To explain my point further: For extreme-os, C2 now reports a 267.94% Annualized return. My program reports, for the last 6 months, a compounded annual return of 1.267 = 126.7% (please don't take this outcome seriously; we still have to change a few integration details that can have a huge impact; I merely use it to illustrate my point). The guide explains that with 0.1% slippage per day (taken as a percentage of the account value) you need a compounded annual return of 44.1% just to break even, and with 0.2% slippage per day you need 107.7%. My estimate, based on my own experience, is that my slippage is somewhere between these two, so I would roughly estimate a compounded slippage of about 80%. If I simply subtract this from 126% then my annual net profit would be 46% (the actual formula in the guide is not simple subtraction and yields a return between 9% and 57% in this case). This is substantially smaller than the 267.94% that is reported now. Again, please don't take this seriously as a report about extreme-os; the numbers aren't right at this moment, but I expect it will work in this direction.
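For anyone who wants to redo this arithmetic, the sketch below roughly reproduces the figures quoted from the guide, assuming about 365 compounding days per year and the combination rule (1 + gross) / (1 + break-even) - 1; the guide's exact conventions may differ.

```
# Break-even and net-return arithmetic as discussed above.
# Assumes ~365 compounding days per year; the guide's exact convention may differ.
DAYS = 365

def breakeven_annual_return(daily_slippage):
    # compounded annual return needed just to offset a constant daily slippage cost
    return (1 + daily_slippage) ** DAYS - 1

def net_annual_return(gross_annual_return, daily_slippage):
    # combine gross compounded return with compounded slippage (not simple subtraction)
    return (1 + gross_annual_return) / (1 + breakeven_annual_return(daily_slippage)) - 1

print(breakeven_annual_return(0.001))   # ~0.44, i.e. about 44% (cf. the 44.1% quoted)
print(breakeven_annual_return(0.002))   # ~1.07, i.e. about 107% (cf. the 107.7% quoted)
print(net_annual_return(1.267, 0.001))  # ~0.57, the upper end of the 9%-57% range
print(net_annual_return(1.267, 0.002))  # ~0.09, the lower end of the 9%-57% range
```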
Jules,
Explaining the limitations in a guide and adding a disclaimer sounds like a good idea.
Jules, I actually would suggest one additional statistic: some measure of autocorrelation, e.g. Box-Ljung.
That’s a nice idea. Let me think about it and discuss it with MK. Because we are on different continents, it isn’t like I can step into his office and add a subroutine. Thinking about this, it occurs to me that more suggestions may arise, and that it is more convenient to wait a while and then do them all at once in a follow-up version.
In that case, something should also be written about it in the guide. Do you have any suggestions as to why it would be important? My idea would be that it is mainly a check of the assumptions underlying the confidence intervals and hypothesis tests.
It’s indeed important for the independence assumptions underlying the confidence intervals. But there’s more to it. An ideal strategy would have zero serial correlation: In general people prefer to see their account increase each day/week/month by a consistent amount that is independent of the previous period, instead of having runs of several days with profits, followed by runs of several days with losses. Just to give an example, take two systems (A and B) and their daily (or monthly) returns in %:
A. -2, -1, -2, 3, 3, 2, 3, -4, 2, -3 (serial correlation = 0.1)
B. 3, 3, 2, 3, -3, -2, -1, -2, -4, 2 (serial correlation = 0.5)
Both systems have the same returns (and Sharpe ratio) but the ordering is different. Many people would prefer A over B, as B looks great in the beginning, but then goes through a larger and longer drawdown.
The serial correlation is also useful if subscribers want to apply their own money management. In case of high positive serial correlation (assuming it is consistent through time), it pays off to invest more money as profits increase (as it is more likely tomorrow will also be a profitable day), and scale out as profits decrease. In case of negative serial correlation, the opposite would be better. In general, I would be worried if I’d see extreme values (given the number of observations is sufficiently large), as it might signal that the vendor did not address these issues himself in the first place.
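For concreteness, here is one way the lag-1 serial correlation of the two example series could be computed. The exact estimator matters: with the standard sample autocorrelation below, the values come out somewhat different from the 0.1 and 0.5 quoted above, but the ordering (B clearly more positively autocorrelated than A) is the same.

```
def lag1_autocorrelation(x):
    # standard sample autocorrelation at lag 1: lag-1 covariance of deviations
    # from the series mean, divided by the overall variance
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return sum(d[t] * d[t + 1] for t in range(n - 1)) / sum(v * v for v in d)

A = [-2, -1, -2, 3, 3, 2, 3, -4, 2, -3]
B = [3, 3, 2, 3, -3, -2, -1, -2, -4, 2]

print(round(lag1_autocorrelation(A), 2))  # slightly negative with this estimator
print(round(lag1_autocorrelation(B), 2))  # clearly positive: B's returns come in runs
```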
Funny, I have a similar example in the guide. This would basically be covered by the drawdown statistics, which are probably easier to understand than an autocorrelation for many traders. The reason I ask is that the purpose of the statistic is often important in deciding what exactly has to be computed. Even in the case of a simple autoregressive model, for example, there are these choices:
1. based on the account values V(t), the returns R(t) = (V(t) - V(t-1)) / V(t-1), or the log return rates L(t) = ln(1 + R(t)), or the returns in excess of the risk-free return, or the log return rates in excess of the log of the risk-free rate
2. no differencing, one time differencing, two times differencing,…
3. up to lag 1, or lag 2, or lag 3, …
4. autocorrelations or partial autocorrelations
5. only correlations, or also slope and intercept
6. including confidence intervals and hypothesis tests or not
Any choice in this will be done 3 times for a system that is older than 6 months. Obviously, the size of the output will explode if we blindly do it all. So there have to be some theoretical considerations in order to limit the possibilities. I would reason like this:
1. The present statistical tests are all based on excess returns or excess log return rates, so evaluation of the independence assumption is only relevant for these. This is also enough for the effect that you described.
2. For the returns and log return rates there is conceptually no need for differencing, but the sequence may actually still be non-stationary. Nevertheless I would choose no differencing. What do you think?
3. I would stop after lag 5, or perhaps even after lag 1, just to limit the size of the output.
4. I must think more about this.
5. Correlations would be enough, since the model is not actually used for prediction.
6. A nonsignificant autocorrelation does not necessarily invalidate the other inferential statistics, so I think that a significance test is in order.
That implies 5 lags * 2 return measures * 2 statistics (estimate and p-value) * 3 periods = 60 extra statistics if the system is more than 6 months old. That seems too much. To keep it practical, I suggest doing only lag 1.
My suggestion would be (actually, for any of the advanced statistics) to only include it if it’s sufficiently different from another statistic (empirically!).
If we consider 60 different autocorrelation statistics, but the rankings of C2 systems under each of them overlap by ~90%, it should be sufficient to just report one. If it turns out serial correlation is not a real issue for any system, it could be omitted in the first place (while documenting this fact in the guide).
For serial correlation I would just take a one-period lag on raw returns, for each of the three periods, and include a p-value if possible. If a 2-period lag, or log-return serial correlation changes rankings drastically, well, then there’s some useful information in there worth conveying.
I cannot know whether they are empirically different unless I write the program first. I try to avoid that.
I expect that there won't be much difference between an autocorrelation based on returns and one based on log return rates. But I would rather choose the one based on log return rates, because their distribution is probably less skewed if the returns are extreme.
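If it helps the discussion, here is a sketch of what a lag-1 check on log return rates with a p-value might look like, using the Box-Ljung statistic with a single lag. This is only an illustration of the idea, not necessarily what the module will compute; the example returns are invented.

```
import math

def lag1_autocorr(x):
    n, m = len(x), sum(x) / len(x)
    d = [v - m for v in x]
    return sum(d[t] * d[t + 1] for t in range(n - 1)) / sum(v * v for v in d)

def box_ljung_lag1(returns):
    # lag-1 autocorrelation of log return rates L(t) = ln(1 + R(t)), plus a Box-Ljung p-value
    logret = [math.log(1 + r) for r in returns]
    r1 = lag1_autocorr(logret)
    n = len(logret)
    q = n * (n + 2) * r1 ** 2 / (n - 1)   # Box-Ljung statistic with one lag
    # chi-squared(1) tail probability, written via erfc so no extra libraries are needed
    p = math.erfc(math.sqrt(q / 2))
    return r1, p

# invented daily returns, for illustration only
r1, p = box_ljung_lag1([0.01, 0.012, -0.005, 0.008, -0.002, 0.015, -0.01, 0.003])
print("lag-1 autocorrelation = %.3f, Box-Ljung p-value = %.3f" % (r1, p))
```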
Thanks again, Jules, for the time you are putting into writing the program and enhancing our experience; your work on this project is appreciated.
-David
Jules,
Here's another idea. I'm not sure about the best way of incorporating this, but somehow (maybe just a short paragraph in the guide) subscribers should be made aware of data-snooping issues. I.e., if all vendors were flipping coins, and their systems followed random walks, we'd still see some systems with attractive Sharpe ratios. E.g., together with reporting the Sharpe ratio, it would be interesting to also report what % of systems would be expected to exceed that level by chance alone. I think that would help keep people's expectations more realistic.
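One way to get a feel for the size of this effect (purely illustrative; all parameters are invented) is a quick Monte Carlo: simulate many zero-edge systems and count how many reach a given Sharpe ratio by luck alone.

```
import random
import statistics as stats

def annualized_sharpe(daily_returns, periods_per_year=252):
    sd = stats.stdev(daily_returns)
    return stats.mean(daily_returns) / sd * periods_per_year ** 0.5 if sd else 0.0

def fraction_exceeding(threshold, n_systems=10_000, n_days=126, daily_vol=0.01):
    # fraction of purely random "coin-flipping" systems whose sample Sharpe exceeds the threshold
    random.seed(0)
    count = 0
    for _ in range(n_systems):
        returns = [random.gauss(0.0, daily_vol) for _ in range(n_days)]  # zero true edge
        if annualized_sharpe(returns) > threshold:
            count += 1
    return count / n_systems

# e.g. what fraction of zero-edge systems show a Sharpe above 1 over ~6 months of daily data?
print(fraction_exceeding(1.0))
```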
Hi Science Trader,
I think that issue is sufficiently emphasized by Ross on the forum.
Many such cases would disappear if you consider the lower bound of the confidence interval of the Sharpe ratio. What won't disappear is that a system with random positions in a bull market will get many "significant" positive statistics, even if there is no real market advantage. I have included this example in the guide, with the warning that statistics are not a substitute for common sense.
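For reference, one common large-sample approximation for such a confidence interval treats the per-period Sharpe ratio as approximately normal with standard error sqrt((1 + SR^2/2) / n) under i.i.d. returns (the approximation in Lo, 2002). The sketch below only illustrates that idea; the module's actual formula may differ.

```
def sharpe_confidence_interval(per_period_sharpe, n_periods, z=1.96, periods_per_year=252):
    # approximate 95% CI for an annualized Sharpe ratio, assuming i.i.d. returns
    se = ((1 + 0.5 * per_period_sharpe ** 2) / n_periods) ** 0.5
    scale = periods_per_year ** 0.5
    centre = per_period_sharpe * scale
    return centre - z * se * scale, centre + z * se * scale

# hypothetical example: daily Sharpe of 0.2 estimated from 126 trading days
low, high = sharpe_confidence_interval(0.2, 126)
print("95% CI for annualized Sharpe: %.2f to %.2f" % (low, high))
```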
I have spent a lot of time writing the guide. Evidently, I cannot address every possible misconception, nor can I give an investment course, so I would rather suggest that you discuss these issues in your monthly review. I also have another suggestion: design a short list of statistics you would look at first when evaluating a system. I have one myself (because MK asked for it), and I can send it to you if you want.