> I have spent a lot of time writing the guide.
Thank you! Very much appreciated.
> I also have another suggestion: design a short list of statistics you would look at first when evaluating a system.
1) Profit factor (divide gross profit by gross loss).
2) Profit / max intra-day draw down.
3) Single Unit Gross Profit / Single Unit Max DD.
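The first two statistics on this list can be computed from a list of closed-trade results. Below is a minimal sketch that assumes hypothetical per-trade dollar P&L values. Because the program only sees daily equity, the draw down here is measured on the closed-trade equity curve as a stand-in for the intraday draw down requested above. The function names are illustrative, not part of any C2 tool.

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (taken as a positive number)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # no losing trades at all
    return gross_profit / gross_loss


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, in currency units."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


# Hypothetical closed-trade P&L, in dollars, starting from a 10,000 account.
pnls = [500, -200, 300, -400, 600]
equity = [10_000]
for p in pnls:
    equity.append(equity[-1] + p)

print(profit_factor(pnls))                  # 1400 / 600
print(sum(pnls) / max_drawdown(equity))     # net profit / max draw down
```

With a true intraday draw down the denominator can only be larger, so this version tends to flatter the system.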
Thanks!
PS. Perhaps it would be a good idea in the future to add a chapter to the guide with contributions from others, like you. I would definitely apply some kind of editorial policy though, and refuse philosophical epistles or contributions that imho create a misconception. I don’t expect that coming from you, but you know what I mean.
Sam,
Thanks. Your list is a list of statistics that should be added in a next project, isn’t it? The current project uses only the daily equity values, so intra-day draw downs and trade characteristics (like number of units) are not inputs to the program. For the next project (if there is one) it might indeed be convenient to have a list of requests.
> Your list is a list of statistics that should be added in a next project, isn’t it?
Whatever and whenever. Beggars can’t be choosers. I would have liked to have these in the last project, or in C2 to begin with, but this is a short list in response to your post asking for a short list.
Thanks again.
Well, I’ll publish my stuff in my newsletter first. But if there’s anything in there that’s worth including in the guide, I’m happy to make parts of it available for a reprint. I might discuss data snooping in a future issue, but will probably do some other things first.
Jules, will you please send me your list? Thanks, Rick Haines
Well, it isn’t a secret, so I might as well post it here. But please be aware that this is just my personal opinion, and it may change. If I believed that everybody agreed with it, then I wouldn’t have programmed all the other things. So don’t hesitate to come up with another short list…
First, I look at the results based on the full history of daily values, because these are usually the most reliable. From these, I consider, in order of importance to me:
1. Ratio statistics of excess log return rates > Statistics related to Sharpe ratio > Sharpe ratio (Hedges UMVUE).
See the guide for why a Sharpe ratio is important. The same reasoning applies to many other measures, though. One reason why I still prefer the Sharpe ratio is that its confidence interval is also computed, and personally (because of my background) I’m more used to thinking in terms of this measure. I consider the excess log return rates, rather than the excess returns, because these are more relevant when the system is compounding. I look at the UMVUE because it is unbiased.
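To make item 1 concrete, here is a minimal sketch of a per-period Sharpe ratio of excess log return rates with Hedges' small-sample correction, under the assumption that returns are roughly normal. It is not the guide's actual code: the risk-free handling, the absence of annualization, and the names `hedges_j` and `sharpe_excess_log` are all assumptions made for illustration.

```python
import math
import statistics


def hedges_j(m):
    """Hedges' correction factor J(m) for m degrees of freedom (m > 1)."""
    # J(m) = Gamma(m/2) / (sqrt(m/2) * Gamma((m-1)/2)); lgamma avoids overflow.
    return math.exp(math.lgamma(m / 2) - math.lgamma((m - 1) / 2)) / math.sqrt(m / 2)


def sharpe_excess_log(equity, rf_per_period=0.0):
    """Return (bias-corrected, raw) per-period Sharpe ratio of excess log return rates."""
    log_rates = [math.log(b / a) for a, b in zip(equity, equity[1:])]
    excess = [r - math.log(1 + rf_per_period) for r in log_rates]
    raw = statistics.mean(excess) / statistics.stdev(excess)
    # The sample stdev has n - 1 degrees of freedom; J shrinks the raw ratio toward 0.
    return hedges_j(len(excess) - 1) * raw, raw


# Hypothetical daily equity values.
equity = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5, 104.0]
corrected, raw = sharpe_excess_log(equity)
print(corrected, raw)
```

For long histories J is close to 1 (roughly 1 - 3/(4m - 1)), so the correction matters mainly for young systems with few daily values.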
2. Lower bound and upper bound of the confidence interval of the Sharpe in (1).
If the lower bound is negative, this means that there is not (yet) enough evidence to conclude that the system is profitable. A large difference between the upper bound and the lower bound means that the estimate is unreliable.
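The reading of the interval in item 2 can be illustrated with a common large-sample approximation for i.i.d. normal returns, where the standard error of a per-period Sharpe ratio is about sqrt((1 + SR²/2) / n). This is a sketch under that assumption only; the guide may compute its interval differently, and `sharpe_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist


def sharpe_ci(sr, n, level=0.95):
    """Approximate two-sided CI for a per-period Sharpe ratio (i.i.d. normal returns)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt((1 + sr ** 2 / 2) / n)
    return sr - z * se, sr + z * se


# Same hypothetical daily Sharpe ratio, about 1 year vs about 10 years of data.
low, high = sharpe_ci(0.08, 250)
print(low, high)
print(sharpe_ci(0.08, 2500))
```

With about a year of data the lower bound is still negative (not enough evidence yet); with ten times as many observations the interval narrows and the lower bound moves above zero.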
3. Ratio statistics of excess log return rates > Statistics related to Sortino ratio > Sortino ratio.
This is important if the distribution of returns is skewed; then the Sharpe ratio uses the wrong risk measure in its denominator. I’m not very experienced in evaluating the Sortino ratio, so I cannot tell you what values are ‘good’.
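A minimal sketch of a Sortino ratio follows. Conventions vary (the target return, and whether the downside deviation divides by all observations or only the losing ones); this sketch assumes a target of 0 and division by all n observations, which may not match the guide's definition.

```python
import math


def sortino(excess_returns, target=0.0):
    """Mean excess return over the target, divided by the downside deviation."""
    n = len(excess_returns)
    mean_excess = sum(excess_returns) / n
    # Downside deviation: root mean square of shortfalls below the target,
    # averaged over all n observations (one convention among several).
    downside_sq = sum(min(r - target, 0.0) ** 2 for r in excess_returns) / n
    if downside_sq == 0:
        return math.inf  # no observations below the target
    return (mean_excess - target) / math.sqrt(downside_sq)


# Hypothetical per-period excess returns.
returns = [0.02, -0.01, 0.03, -0.02]
print(sortino(returns))
```

Only returns below the target enter the denominator, so large upside moves from a right-skewed distribution do not count as risk, unlike in the Sharpe ratio.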
4. Combined statistics > Calmar ratio
Comparable to the Sharpe ratio in (1), but it uses the max draw down in its denominator. This is important because losses tend to cluster together to form a big draw down, and this isn’t captured in the Sharpe ratio or the Sortino ratio. I read somewhere that it is known to traders that no system can maintain a Calmar ratio larger than 2, which means that for 100% annual profit you should expect a draw down of 50%… So far I have seen many systems with a much higher value and I’m not sure how to interpret that: too good to be true?
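The rule of thumb above (100% annual profit with a 50% draw down gives a Calmar ratio of 2) can be checked with a short sketch. It assumes the Calmar ratio is the annualized return divided by the maximum percentage draw down over the whole history; some definitions restrict the window to the last 36 months. The function names are illustrative.

```python
def max_drawdown_fraction(equity):
    """Largest peak-to-trough decline, as a fraction of the peak value."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, 1 - value / peak)
    return worst


def calmar(annual_return, equity):
    """Annualized return divided by the max draw down (both as fractions)."""
    return annual_return / max_drawdown_fraction(equity)


# Hypothetical one-year equity curve: +100% for the year, 50% draw down on the way.
equity = [100.0, 150.0, 75.0, 200.0]
print(calmar(1.0, equity))
```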
5. Combined statistics > Compounded Annual Return (geometric extrapolation).
For compounding systems this is more relevant than the other annualized return. Of course, the return should be considered in combination with the risk, which is covered in the following points.
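The difference between geometric and linear extrapolation shows up even on a short history. This sketch assumes 252 trading days per year and assumes that "the other annualized return" is a linear extrapolation; both are assumptions, not statements about how C2 computes them.

```python
def cagr(start_value, end_value, n_periods, periods_per_year=252):
    """Compounded annual return, extrapolated geometrically from n_periods."""
    return (end_value / start_value) ** (periods_per_year / n_periods) - 1


def simple_annualized(start_value, end_value, n_periods, periods_per_year=252):
    """Linear extrapolation of the total return, for comparison."""
    return (end_value / start_value - 1) * periods_per_year / n_periods


# Hypothetical: +10% over 126 trading days (about half a year).
print(cagr(100.0, 110.0, 126))               # about 0.21
print(simple_annualized(100.0, 110.0, 126))  # about 0.20
```

The geometric version assumes profits are reinvested for the rest of the year, which is why it fits compounding systems better.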
6. Draw down statistics > Quartiles of draw downs > Mean of quarter 4.
This is a simple risk measure. It is comparable to the max draw down, in that it says how consecutive losses can accumulate. It is somewhat more reliable than the max draw down because more observations are used.
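One way to compute item 6 is sketched below. It assumes that each "draw down" is one peak-to-trough episode between successive equity highs, and that "quarter 4" means the largest 25% of those depths (rounded down, but at least one). The guide may split the episodes or the quartiles differently.

```python
def drawdown_episodes(equity):
    """Depth (as a fraction) of each peak-to-trough episode between new highs."""
    depths = []
    peak = equity[0]
    current = 0.0
    for value in equity[1:]:
        if value >= peak:
            if current > 0:
                depths.append(current)  # episode ended at a new high
            peak, current = value, 0.0
        else:
            current = max(current, 1 - value / peak)
    if current > 0:
        depths.append(current)  # unfinished draw down at the end
    return depths


def mean_of_top_quarter(values):
    """Average of the largest 25% of the values (at least one)."""
    ordered = sorted(values)
    k = max(1, len(ordered) // 4)
    return sum(ordered[-k:]) / k


# Hypothetical daily equity values with four separate draw downs.
equity = [100, 90, 110, 99, 120, 96, 130, 117, 140]
depths = drawdown_episodes(equity)
print(depths, mean_of_top_quarter(depths))
```

Averaging the deepest quarter uses more than one episode once the history is long enough, which is why it is less noisy than the single max draw down.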
7. Order statistics > Quartiles of return rates > Mean of quarter 1.
This is also a risk measure. It gives an indication of how much your account value can change within one day. E.g. a value of 0.9 means that on the 25% worst days the account value decreased by 10% on average.
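Item 7 uses return rates as ratios V[t] / V[t-1], so 0.9 is a 10% loss. Here is a minimal sketch, assuming "quarter 1" means the lowest 25% of the daily rates (rounded down, but at least one observation); the function names are hypothetical.

```python
def return_rates(equity):
    """Period-to-period return rates as ratios V[t] / V[t-1] (0.9 = a 10% loss)."""
    return [b / a for a, b in zip(equity, equity[1:])]


def mean_of_bottom_quarter(rates):
    """Average of the lowest 25% of the rates (at least one observation)."""
    ordered = sorted(rates)
    k = max(1, len(ordered) // 4)
    return sum(ordered[:k]) / k


# Hypothetical daily return rates.
rates = [0.85, 0.95, 1.02, 1.03, 1.01, 0.98, 1.05, 1.04]
print(mean_of_bottom_quarter(rates))  # about 0.9, i.e. a 10% average loss
```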
8. Order statistics > Risk estimates for a one-period unit investment (based on Extreme Value Theory) > Expected Shortfall (both methods).
This is like 7, but for the 5% worst days instead of the 25% worst days. Furthermore the direction is reversed, e.g. a value of 0.9 means a loss of 90% (this reversal is somewhat inconvenient, I admit). Because there may be only a few relevant observations in the tail, this estimate is improved (hopefully) with extreme value theory. I cannot tell whether the moments method is better than the regression method; that’s why I programmed both, and I tend to pick the worse one. If both estimates are unavailable then I look at the estimate based on a lognormal distribution.
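The extreme value theory estimators (moments and regression methods) are not reproduced here. As a rough sketch of the quantity being estimated, the code below gives a plain empirical expected shortfall and the lognormal-based fallback mentioned above. Both express the result as a loss fraction of a unit investment, as in the guide. It assumes log return rates are normal for the second function and uses the standard lognormal partial-expectation formula; the names are illustrative.

```python
import math
import statistics
from statistics import NormalDist


def empirical_es(rates, alpha=0.05):
    """Average loss (1 - rate) over the worst alpha share of periods."""
    losses = sorted((1 - r for r in rates), reverse=True)
    k = max(1, int(alpha * len(losses)))  # use at least one observation
    return sum(losses[:k]) / k


def lognormal_es(rates, alpha=0.05):
    """Expected shortfall of a unit investment if log return rates are normal."""
    logs = [math.log(r) for r in rates]
    mu, sigma = statistics.mean(logs), statistics.stdev(logs)
    z = NormalDist().inv_cdf(alpha)
    # E[R | R <= alpha-quantile] = exp(mu + sigma^2 / 2) * Phi(z - sigma) / alpha
    tail_mean_rate = math.exp(mu + sigma ** 2 / 2) * NormalDist().cdf(z - sigma) / alpha
    return 1 - tail_mean_rate


calm = [1.01, 0.99] * 10      # hypothetical low-volatility daily rates
volatile = [1.03, 0.97] * 10  # hypothetical high-volatility daily rates
print(empirical_es([1.0] * 19 + [0.9]))
print(lognormal_es(calm), lognormal_es(volatile))
```

With only 20 observations the empirical estimate rests on a single day, which is exactly the thin-tail problem that the extreme value methods try to address.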
9. alpha based on log return rates, and lower bound of alpha.
This says whether the system has a market advantage. However, the S&P is always used as the benchmark, so this is relevant only for systems that are related to the S&P (e.g., not forex).
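Alpha is usually the intercept of a regression of the system's excess returns on the benchmark's excess returns. Here is a minimal ordinary least squares sketch on excess log return rates. The lower bound uses a two-sided normal-approximation interval, which is an assumption; the guide may use a t distribution or a different level. All names and data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def alpha_beta(system, benchmark, level=0.95):
    """OLS of system excess log returns on benchmark excess log returns.

    Returns (alpha, beta, lower bound of a two-sided normal-approximation CI for alpha).
    """
    n = len(system)
    mx, my = statistics.mean(benchmark), statistics.mean(system)
    sxx = sum((x - mx) ** 2 for x in benchmark)
    beta = sum((x - mx) * (y - my) for x, y in zip(benchmark, system)) / sxx
    alpha = my - beta * mx
    residuals = [y - alpha - beta * x for x, y in zip(benchmark, system)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_alpha = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return alpha, beta, alpha - z * se_alpha


# Hypothetical excess log returns of a benchmark and of a system tracking it.
bench = [0.01, -0.02, 0.015, 0.0, -0.005]
noise = [0.002, -0.001, 0.0, -0.002, 0.001]
system = [0.001 + 0.5 * x + e for x, e in zip(bench, noise)]
print(alpha_beta(system, bench))
```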
If the system shows a very different performance recently, then I also look at the results for the last 6 months.
Hope it helps.
I just read the guide. I think it’s written very clearly and should serve its purpose well. Congratulations, Jules!
Jules Ellis
I’m digging into your guide as well as others. So far I haven’t found any misconceptions compared with my picture of the world.
And it’s educational lol
I’d like to say thank you for the new stats and your guide.
Hmm… sorry
> I read somewhere that it is known to traders that no system can maintain a Calmar ratio larger than 2, which means that for 100% annual profit you should expect a draw down of 50%… So far I have seen many systems with a much higher value and I’m not sure how to interpret that: too good to be true?
The Calmar ratio is best used to evaluate real trades, not simulations. That’s true for any statistic, but the Calmar ratio is more sensitive to the real world. Just my 2c. Sorry again.
Eu
Thanks, Science Trader! As I said, I spent a lot of time on it, but I can still imagine that people find it too difficult or too long or… So I’m glad that at least one person thinks that it is written clearly!
Jules,
While others are busy debating the hope & hold ratio I tried to get my brain around something I observed in the advanced statistics. In some cases the p-value for a statistic is less than 0.05, while the CI includes zero (e.g. RT Forex North - daily values, Sharpe ratio, alpha). How should this be interpreted?
I do not agree with your conclusion that "So however you put it, regardless of whether your emphasis is on high returns or low risk, it is always more rational to choose the portfolio with the highest Sharpe ratio and adjust the leverage to your personal wishes of return and risk, than to choose a portfolio with a lower Sharpe ratio."
In other words, your conclusion suggests (in the example in your guide) that investment A is better than investment B, which is false. Investment B is definitely better than investment A. Of course investment 1.6A (which I would call investment D rather than investment 1.6A) is better than investment B. Your conclusion is false because your assumption, that simply increasing the leverage from 1.0 to 1.6 for investment A would yield investment 1.6A, is invalid. It is flawed because of two laws of economics: the law of diminishing returns and the law of diminishing productivity.
ps: sorry, after reading your conclusion above, I stopped reading the rest of the guide, as it no longer makes sense to me.
Science Trader,
The p-value is for a one-sided test, while the confidence interval is two-sided. For the confidence interval this means that it has 2.5% at the lower side and 2.5% at the upper side. So if you test whether the confidence interval includes 0, this is comparable to a one-sided test with the level of significance at 2.5% instead of 5%. So that test will be more conservative, i.e. more inclined to retain the null hypothesis, than comparing the p-value with 5%. So I would interpret this as a case where one can be reasonably sure, but not very sure, that the population mean is positive.
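The one-sided / two-sided mismatch can be reproduced with a normal approximation. In this sketch a hypothetical estimate lies 1.8 standard errors above zero. The one-sided p-value is below 5%, yet the two-sided 95% interval still contains zero, because that interval only excludes zero when the one-sided p-value is below 2.5%. The numbers and function names are illustrative, not taken from RT Forex North.

```python
from statistics import NormalDist


def one_sided_p(estimate, se):
    """P-value for H0: parameter <= 0 against H1: parameter > 0 (normal approximation)."""
    return 1 - NormalDist().cdf(estimate / se)


def two_sided_ci(estimate, se, level=0.95):
    """Two-sided interval with (1 - level) / 2 in each tail."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


estimate, se = 0.18, 0.10  # hypothetical estimate and standard error
p = one_sided_p(estimate, se)
low, high = two_sided_ci(estimate, se)
print(p, (low, high))  # p between 2.5% and 5%, interval straddles zero
```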
Even if you compare the p-value with 2.5% I am not sure that it can never contradict the confidence interval. The p-value pertains to a hypothesis about the mean, while the CI is a confidence interval for the Sharpe ratio. If it were a confidence interval for the mean, then it could never contradict the conclusion from the corresponding p-value, but this isn’t the case. So they are slightly different statistical methods. Obviously the population mean must have the same sign as the population Sharpe ratio. But I have seen enough cases where different estimation methods lead to conclusions that contradict each other, so I won’t be surprised if it can happen here too. However, the example that you gave has a p-value larger than 2.5%, so it can be explained by the one-sided / two-sided difference.
> While others are busy debating the hope & hold ratio
No. I think that Jules’ effort in combining the stats and writing a guide explaining them is more, more, more important and significant than any home-made ideas. I think it’s a major improvement to C2 that can be compared only with RF.
So… please continue
Eu
That part of the guide is a presentation of the reasoning of Sharpe (1994). If you disagree with it, I suggest that you discuss it with him, or with the committee that gave him the Nobel prize. I’m sure that they will be impressed by your supreme insights into economics.
It is Harry Markowitz (also a Nobel laureate) who discovered that when you have 2 portfolios with the same average arithmetic rate of return, the one with the lower StDev (or volatility) will have a higher compounded rate of return. This does not mean that investment A in your flawed example is better than investment B. It means that investment 1.6A (which should rightly be called investment D) is better than investment B.
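The claim that lower volatility gives a higher compounded rate at the same arithmetic mean can be checked with a two-period toy example. Both hypothetical return series below have an arithmetic mean of 0%, but the more volatile one compounds to a lower per-period growth rate, roughly in line with the rule of thumb that compounded growth ≈ arithmetic mean - variance / 2.

```python
import math


def compounded_growth(returns):
    """Per-period compounded (geometric) rate of return."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1


low_vol = [0.10, -0.10]   # arithmetic mean 0%
high_vol = [0.20, -0.20]  # arithmetic mean 0%, twice the volatility
print(compounded_growth(low_vol), compounded_growth(high_vol))
```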
Why are you calling on the Nobel Prize committee to support your flawed conclusions? Run out of arguments? Who are you going to call next? Reminds me of this guy at C2 who was calling everybody he knows to come to support his arguments…
thanks, that clarifies the issue.
"Reminds me of this guy at C2 who was calling everybody he knows to come to support his arguments…"
You mean that Pal guy who always refers to websites and literature to support what he says, even when many times they contradict him?
You say that 1.6A is better than B. I say that too. What’s the difference? You say that 1.6A is not the same as having A with leverage. Well, read Sharpe (1994), particularly the section below equation (25). He clearly disagrees with you. And I don’t see that his argument is flawed.
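The disputed step can be stated precisely as a sketch. Under the frictionless assumption of unlimited borrowing and lending at the risk-free rate with no extra costs, leveraging A by 1.6 multiplies every simple excess return by 1.6. That scales the mean and the standard deviation equally, so the Sharpe ratio is unchanged. The returns are hypothetical. The sketch only shows what follows from that assumption and does not settle whether real trading frictions break it.

```python
import statistics


def sharpe(excess_returns):
    """Per-period Sharpe ratio of simple excess returns."""
    return statistics.mean(excess_returns) / statistics.stdev(excess_returns)


def leveraged_excess(excess_returns, leverage):
    """Excess returns at a given leverage, assuming frictionless borrowing at the risk-free rate."""
    return [leverage * r for r in excess_returns]


a = [0.02, -0.01, 0.015, 0.005, -0.004]  # hypothetical excess returns of investment A
a16 = leveraged_excess(a, 1.6)            # "1.6A" under the frictionless assumption
print(sharpe(a), sharpe(a16))
```

With log returns instead of simple returns the scaling is only approximate, so this exact invariance holds for simple excess returns.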
There is an assumption that perhaps should be made explicit in the guide, namely that the investor can have every level of leverage that he wants. But that’s not what you are saying.
There are also two typos in that section (1.4A should be 1.6A and 1.6A should be 0.6A). I corrected that yesterday in my original document. But that’s not your point.
I referred to the committee because of your ps. Since you’re so disappointed that you stopped reading, I don’t see how we can have a sensible discussion.
>You say that 1.6A is not the same as having A with leverage. Well, read Sharpe (1994), particularly the section below equation (25). He clearly disagrees with you. And I don’t see that his argument is flawed.
This assumption that 1.6A (again, it should be called D, not 1.6A, because that is very misleading) is the same as having A with leverage is a myth and goes against the two laws of economics. A law cannot be broken; that is why it is called a law in the first place. Theory comes inductively from experience, not the other way around. “Theory” without practice is useless and “Practice” without theory is impossible. These laws manifest themselves in the marketplace only in real trading. Sharpe is an academic theoretician. He may not be aware of these. Prof. Sharpe accepted this myth (irrationalism) and then interpreted his ratio accordingly, and others followed this irrationalism and expanded on it…