Sharpe Ratio Calculation is WAYYYY off!

The Sharpe ratio calculations appear to be incorrect now. For example

the “Compounded Money” system is showing an SR of 710. That is

not seven point one zero but seven hundred and ten!



This happened a few days ago, there must have been some change

in the SR calculation code.



Formerly, when I would do screens, I would sort by Sharpe Ratio.

That sort is much less useful now.



While I can’t vouch for the actual Sharpe numbers being displayed, they at least seem to have corrected some serious deficiencies in comparing one system to the next. In the thread “Sharpe Ratio understanding”, I posted some systems that had mediocre equity graphs with high Sharpes, and some that had smooth graphs with real bad Sharpes. I’ve reproduced those lists below. What is notable is that every one of the “Smooth graph - Bad Sharpe” systems now has a much higher Sharpe that seems - to the eye - to better represent the system. The roller coaster equity systems, OTOH, have Sharpes that, for the most part, look like they haven’t changed much. Also, if you do a search for Sharpe < .5, the results ALL have truly bad equity graphs, unlike before. So I don’t know if the numbers are correct, but I can plainly see that they represent system performance much better than before, and comparisons between systems seem much more meaningful now.



Hans.



As of a few weeks ago, these were some of the highest Sharpes, all with roller coaster graphs. Their Sharpes, mostly, haven’t changed much. The numbers are the Sharpes from back then:



Midas Med-Term (Prudent) - 1.937

YeKai Swing trader - .916

Bender’s S&P Emini crossover - 2.126

Entropia - 1.053

ER2 Cash Cow - 1.596

Goofiz Foliage’s Future ATM - 1.524

Kondor Futures Trading System - 1.036

Under Development - 1.001



These systems have relatively smooth equity curves, but previously had bad Sharpes. They now appear among the best systems, Sharpe-wise, with most/all above 1.5.



ARS - (-.091)

Beck TA Commodity Trader - .176

Blog.fallondpicks.com - (-.091)

CTS CLC System - (-.235)

CTS Crash Control - (-.296)

CTS SnapBack System - .283 (much smoother than you’d expect from the S.R.)

ETF investor - (-.839) - Wow! Look.

Granville’s Nemesis - .09

MB trading - .36 REAL steady!

Momentum Buys - .171

Mutual Fund Trader - .076 A bit volatile, but not like the high Sharpes posted earlier

I’ve seen Sharpe Ratios top over 10,000, but they are not systems I’d want to trade. I would take a look at the RF, returns, DD’s, Ave. trade, profit factor (W:L ratio) or expectancy.



Sharpe is an okay secondary performance measurement but not something I’d want to optimize/sort on by itself to evaluate a system. When one optimizes on Sharpe it has a tendency to arbitrarily push the number of trades down, which is fine until it starts to affect other things like CAR / Max DD, which I’m really much more interested in anyway. So for me it’s one of those numbers where, all other things being equal, I’ll take the results with a higher Sharpe Ratio.



The lower Sharpe Ratio (SR) for those systems whose equity graphs you call roller coaster (which actually isn’t the case for my method, Midas Med-Term (Prudent), which has an SR of 2.274) is due to the outlying winners and their effect on the SR calculation. As more data is added to the equity graph, the roller coaster effect diminishes as more data is compressed into a smaller space. The SR is an accurate reflection of this effect, or rather the absence of it. This quote, taken from trendfollowing.com, discusses the negative side of using the SR to measure risk versus return. Mulvaney also notes that conventional measures of risk-adjusted returns (i.e. SR) miss the boat:



“Implicitly using the standard deviation assumes that the returns are normally distributed. But in fact our returns stream is very positively skewed, and highly asymmetrical. Our standard deviation is extremely high but this is because of the positive outliers. The standard deviation involves squaring the deviations from the mean and the outliers are what really push it up. So a very strong case can be made that CTAs’ performance is severely penalized by the Sharpe ratio.”
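
A small numeric illustration of the point in the quote, as a sketch only. The return figures are made up, the risk-free rate is taken as zero, and `sharpe` is a hypothetical helper, not the calculation C2 uses.

```python
from statistics import mean, pstdev

def sharpe(returns, risk_free=0.0):
    """Mean excess return divided by the standard deviation of returns."""
    return (mean(returns) - risk_free) / pstdev(returns)

# Hypothetical monthly returns in percent (made-up numbers).
steady = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
# The same stream, except that the last month is a large winner.
with_outlier = [1.0, 2.0, 1.0, 2.0, 1.0, 20.0]

print(round(mean(steady), 2), round(sharpe(steady), 2))
print(round(mean(with_outlier), 2), round(sharpe(with_outlier), 2))
```

The second stream has the higher average return, yet its ratio comes out lower, because the single large winner inflates the standard deviation in the denominator.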

ps: Actually, with the new SR calculations, the SR values Hans Hansen listed in his message are no longer correct for some of the systems, so I don’t know what the new SR values are for some of these systems (actual SR values may be much higher, as evidenced so far).



Particularly for

Midas Med-Term (Prudent)(O) n/a instead of 2.274

Entropia 4.332 instead of 1.053

Kondor Futures Trading System 1.817 instead of 1.036

Under Development - n/a instead of 1.001



For all other systems mentioned by Hans, I’m not interested because their returns are definitely not attractive even though some of them may appear in the best systems list. They may be best systems for MK or others, but not for me.

I’m not sure, but I think you are missing my point. Altho I can’t vouch for the accuracy of the actual Sharpe numbers, I think that the ranking of the systems is far more accurate, i.e. higher Sharpe systems tend to have better equity curves than they did in the past. My previous posts from several weeks ago were meant to point out inequities in the Sharpe rankings. Even if the numbers are not being correctly computed, the relative comparisons between systems seem FAR better now. In particular, the systems that I listed as “Smooth curves, bad Sharpe” are now shown with Sharpe ratios that much better represent what intuition (from the equity graph) would indicate. So the RELATIVE ranking is much improved, IMHO, even if the numbers themselves are suspect. But, in the end, aren’t we really interested in the relative rank? That is, system vs. system?



Hans.

The change is an improvement, but it may have overcorrected the other way. I am assuming the Sharpe ratio plays a large role in the “out of 10” C2 rating.



Before the adjustment my C2 rating was 2.91 out of 10, but jumped to 9.81 out of 10 following the readjustment (http://www.collective2.com/images/c2_16054184.jpg)



Now, I can accept my system is far from the best, but it has been consistently profitable so I doubt it ranked in the bottom 30% of all systems as was originally the case. By the same token, I don’t think it is in the top 2% either… so some further tweaking is needed.



JMO,

Declan

SR is a good relative measure for comparing similar systems, not dissimilar systems or any two systems chosen arbitrarily regardless of their dissimilarity.



That is my point, which you seem to be ignoring. But then again, I don’t blame you; the current setup at C2 makes it extremely difficult to identify similar systems, or good systems in particular, so many seem to be comparing systems arbitrarily using SR in spite of its limitations and drawbacks (which are well documented).

There are two separate projects being undertaken. (1) A Review of the way Sharpe is calculated, and (2) A Review of the way the C2 Score is calculated. Project #2 is still not finished, so please expect further changes. I’ll hold off comment on that aspect of the changes, for now.



About Sharpe calculations: I spent some time changing the way Sharpe was calculated. Here is context: To calculate the Sharpe, you first divide the system’s returns into “periods” (of whatever length you choose). Then you determine the mean return and standard deviation of periodic returns. Then you compare this with the risk-free rate (a T-Bill rate is good).
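
The three steps just described can be sketched as follows. This is only an illustration: the equity figures and the per-period risk-free rate are made up, the helper names are hypothetical, and the choice of the sample standard deviation is an assumption, since the post does not say which form is used.

```python
from statistics import mean, stdev

def periodic_returns(equity):
    """Simple return of each period from a list of period-end equity values."""
    return [end / start - 1.0 for start, end in zip(equity, equity[1:])]

def sharpe_per_period(returns, risk_free_per_period=0.0):
    """(mean return - risk-free rate) / sample standard deviation, per period."""
    return (mean(returns) - risk_free_per_period) / stdev(returns)

# Hypothetical month-end equity values (made-up numbers).
equity = [100_000, 102_000, 101_000, 104_000, 106_000]
rets = periodic_returns(equity)
print(sharpe_per_period(rets, risk_free_per_period=0.004))
```

The result is a per-period figure; annualizing it is a separate step.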



When Collective2 was young, there weren’t a lot of systems with long track records. Thus, I used different period lengths for different systems, depending on the system age. A very young system might use daily returns, while an older system might use quarterly returns.



Now, I have thrown that out. Instead, I use only monthly returns, for all systems. I am not religious about this – I know Pal Anand suggests that the period used is immaterial since returns are (theoretically) normally distributed – but I thought that comparing identical periods might yield more consistent results across systems. If you want to argue that I should go back to weekly (or daily?!) returns for young systems, feel free to make a case.



But the upshot of switching to monthly returns is that I imposed a minimum track-record length of 90 days (3 periods) before Sharpe is calculated. This at least gets rid of the problem of very young systems, with track records of no statistical significance, receiving high Sharpe ratios.
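
A minimal sketch of such a cutoff, assuming monthly returns are already available as a list. `MIN_PERIODS` and `monthly_sharpe_or_none` are hypothetical names, and returning `None` for "not calculated" is an assumption about presentation, not a description of the site's code.

```python
from statistics import mean, stdev

MIN_PERIODS = 3  # three monthly periods, roughly the 90 days mentioned above

def monthly_sharpe_or_none(monthly_returns, risk_free_monthly=0.0):
    """Return a per-month Sharpe, or None when the track record is too short."""
    if len(monthly_returns) < MIN_PERIODS:
        return None
    spread = stdev(monthly_returns)
    if spread == 0:
        return None  # the ratio is undefined when every month is identical
    return (mean(monthly_returns) - risk_free_monthly) / spread

print(monthly_sharpe_or_none([0.05, 0.06]))        # too young: None
print(monthly_sharpe_or_none([0.01, 0.02, 0.03]))  # three periods: a number
```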



Finally, I should point out one other change in the calculation. I am not certain what I have done is valid, so I request feedback or comments about it from math-savvy users. As futures traders know, a futures account can buy a T-Bill and use it (discounted 90%, say) as the collateral required for trading margin. In other words, you automatically receive the risk-free rate in your account, without executing trades.



For this reason, a futures account that makes zero trades doesn’t really underperform the risk-free rate. It automatically achieves the risk-free rate. So I made the following modification to the Sharpe formula: I look at the overall “percent” of the account allocated to futures trading. Then I discount the risk-free rate used in the Sharpe formula by this amount. Thus, for a system that trades solely futures, the risk-free rate used in the formula would be zero percent (i.e. we want to analyze how much additional risk you take to achieve incremental results over the risk free rate. If you make no trades, you still get the risk-free rate.) Similarly, if a system trades 50% futures, the risk free rate is half of the T-Bill rate.
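
The adjustment described above amounts to one line of arithmetic. The sketch below only restates it; `effective_risk_free` is a hypothetical name, the 5% T-Bill rate is an illustrative assumption, and whether the adjustment is statistically valid is exactly the open question raised in this post.

```python
def effective_risk_free(tbill_rate, futures_fraction):
    """T-Bill rate discounted by the share of the account allocated to futures."""
    if not 0.0 <= futures_fraction <= 1.0:
        raise ValueError("futures_fraction must be between 0 and 1")
    return tbill_rate * (1.0 - futures_fraction)

print(effective_risk_free(0.05, 1.0))  # futures only: 0.0
print(effective_risk_free(0.05, 0.5))  # half futures: 0.025
print(effective_risk_free(0.05, 0.0))  # no futures: 0.05
```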



Now that I write this, I’m wondering whether this is valid. Math people: please let me know. If necessary, I’m happy to explain further.

Included is an ELA written by Bob Fulks, with the function outline below. It calculates the Sharpe Ratio as a function to be used in an indicator… to be used in conjunction with a system. I find it to be very useful and I want to thank Bob for it.



{Function: SharpeRatio}



{



Function : SharpeRatio



Last Edit : 11/24/98



Provided By : Bob Fulks



Description : This function calculates and returns the Sharpe Ratio

of a series of account values. It samples the series of values

on a yearly, quarterly, monthly, weekly, or daily basis as

determined by an input. It also calculates average return and

standard deviation. It prints the results in a form suitable for

importing into an Excel spreadsheet for plotting.



Inputs:

Mode - Sampling period (0=yearly, 1=quarterly, 2=monthly,

3=weekly, 4=daily)

NetValue - The series of values to be sampled. It should be

equal to the beginning equity plus accumulated net profits.

Periods - The number of yearly, quarterly, etc., periods to

include in the calculation. If this value is zero, the

function will use all periods up to a maximum of 1500.

PrntMode:

zero - Print one line summary only on last bar

> zero - Print values as stored in array plus summary

< zero - Do not print anything

Futures:

TRUE - For futures trading (Sharpe = Ave / SDev)

FALSE - For Stocks (Sharpe = (Ave - 5) / SDev)



Method: The function samples the value of the trading account at

periodic intervals, calculates returns in each period, then

calculates the average and standard deviation of returns and

annualizes them. It then calculates the Sharpe Ratio as noted

above.



Assumptions: The usage for stocks assumes a constant value of 5%

for the risk-free return (T-Bill interest rate). This is a good

assumption for recent times but may be incorrect for the distant

past. The Sharpe Ratio is independent of the sampling interval

if the returns are normally distributed. Returns are typically

not strictly normally distributed so the sampling interval will

affect the results somewhat. There should be more than about 25

samples to get reasonable accuracy so use daily samples for 1

to 6 months of trades, weekly samples for 6 months to 24 months

of trades, etc.





© 1998 Robert G. Fulks, All rights reserved.



}



Input: Mode(NumericSimple),

{0=yearly, 1=quarterly, 2=monthly, 3=weekly, 4=daily}

NetValue(NumericSimple),

{Net value of account = Beginning Equity + NetProfit}

Periods(NumericSimple),

{Number of periods to use in calculation, zero = all}

PrntMode(NumericSimple),

{0 = print summary, 1 = include detail, -1 = don’t print}

Futures(TrueFalse);

{TRUE for Futures, FALSE for Stocks}



Vars: Index(0), {Index used to index Return array}

SIndex(0), {Index used to sum Return array}

LNetVal(0), {NetValue at end of previous period}

LClose(0), {Close at end of previous period}

YClose(0), {Close at end of previous bar}

Size(0), {Size of data to be stored in array}

ILast(0), {Number of entries in array}

Ave(0), {Average return}

ASum(0), {Used to calc Average}

SSum(0), {Used to calc Standard Deviation}

SDev(0), {Standard Deviation}

SDMult(0), {Multiplier to annualize Standard Deviation}

Mo(0), {Month for bar}

MP(0), {MarketPosition}

MPX(0), {MarketPosition flag becomes 1 on first trade}

YMo(0), {Month for previous bar}

Yr(-99), {Year for bar}

YYr(0), {Year for previous bar}

YDate(0), {Date for previous bar}

AvMult(0), {Multiplier to annualize Average}

NetVal(0), {NetValue series}

YNetVal(0), {Netval for previous bar}

Active(FALSE), {False for first calc then true thereafter}

Record(FALSE), {Flag to trigger calculation at end of period}

Summary(FALSE), {Flag set if summary printed}

StDate(0), {Start date}

Sharpe(0); {Sharpe Ratio}



Array: Return[1500](0); {Table of returns as a percent}



Size = iff(Periods > 0, Periods, 1500);

Size = MinList(Size, 1500);

NetVal = Netvalue;

Mo = Month(Date);

Yr = Year(Date);



{This determines marketposition in either systems or indicators}

if MarketPosition <> 0 then

MP = MarketPosition

else

MP = I_MarketPosition;



MPX = iff(MP <> 0, 1, MPX);



Condition1 = Mo = 1 or Mo = 4 or Mo = 7 or Mo = 10;



begin



{Initialize for yearly}



if Mode = 0 and Yr <> YYr then begin

SDMult = 1;

AvMult = 1;

Record = TRUE;

end;



{Initialize for quarterly}



if Mode = 1 and Mo <> YMo and Condition1 then begin

SDMult = 2;

AvMult = 4;

Record = TRUE;

end;



{Initialize for monthly}



if Mode = 2 and Mo <> YMo then begin

SDMult = SquareRoot(12);

AvMult = 12;

Record = TRUE;

end;



{Initialize for weekly}



if Mode = 3 and DayOfWeek(Date) < DayOfWeek(YDate) then begin

SDMult = SquareRoot(52);

AvMult = 52;

Record = TRUE;

end;



{Initialize for daily}



if Mode = 4 and Date <> YDate then begin

SDMult = SquareRoot(253);

AvMult = 253;

Record = TRUE;

end;

end;



{Action if new year, quarter, month, week, or day}



if Record = TRUE then begin

if Active = TRUE then begin

{Each time except first time}

begin

ILast = ILast + 1;

if LNetVal <> 0 then Value1 = YNetVal / LNetVal;

if Value1 > 0 then Return[Index] = 100 * Log(Value1);

if PrntMode > 0 then Print(Index:5:0, Date:7:0, YClose:6:2,

LClose:6:2, YNetVal:7:0, LNetVal:7:0, Return[Index]:4:2);

Index = Mod(Index + 1, Size);

end;

end else

{First time only after initial position}

if MPX > 0 then begin

Active = TRUE;

StDate = Date;

if PrntMode > 0 then Print(Index:5:0, Date:7:0, YClose:6:2,

LClose:6:2, YNetVal:7:0, LNetVal:7:0, Return[Index]:4:2);

end;



LClose = YClose;

LNetVal = YNetVal;

Record = FALSE;

end;



{Calculate and print summary}



if Active = TRUE and Summary = FALSE and

(LastBarOnChart or ILast >= Size) then begin



{Calculate average return in period}

Summary = TRUE;

ASum = 0;

ILast = MinList(Size, ILast);

for SIndex = 0 to ILast - 1 begin

ASum = ASum + Return[SIndex];

end;

Ave = ASum / ILast;



{Calculate annualized standard deviation}

SSum = 0;

for SIndex = 0 to ILast - 1 begin

SSum = SSum + Square(Return[SIndex] - Ave);

end;

SDev = SDMult * SquareRoot(SSum / ILast);



{Annualize average}

Ave = AvMult * Ave;



{Convert back to ratios from logarithms}

SDev = 100 * (ExpValue(SDev / 100) - 1);

Ave = 100 * (ExpValue(Ave / 100) - 1);



{Calculate Sharpe Ratio}

if SDev <> 0 then begin

if Futures then

Sharpe = Ave / SDev

else

Sharpe = (Ave - 5) / SDev;

end;



if PrntMode >= 0 then

Print( ",", StDate:6:0, ",", ILast:6:0, ",", SDev:6:1, "%,", Ave:6:1,

"%,",

Sharpe:3:2, ", ", GetSymbolName, ",");



end;



{Print(Date:6:0, NetVal, Sharpe:4:2, MP:2:0, Active);}



YMo = Mo;

YYr = Yr;

YDate = Date;

YClose = Close;

YNetVal = NetVal;



SharpeRatio = Sharpe;
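
For readers who do not use EasyLanguage, the summary arithmetic of the function above can be sketched in Python. This is a rough port of the calculation only, not of the bar-by-bar sampling or printing. `fulks_style_sharpe` and `periods_per_year` are hypothetical names; `periods_per_year` stands in for the `Mode` input (1, 4, 12, 52 or 253).

```python
import math

def fulks_style_sharpe(net_values, periods_per_year, futures=True):
    """Log returns in percent, annualized, converted back, then divided."""
    # Percent log return of each period, as in 100 * Log(YNetVal / LNetVal).
    rets = [100.0 * math.log(b / a) for a, b in zip(net_values, net_values[1:])]
    n = len(rets)
    ave = sum(rets) / n
    # Population standard deviation (divides by n, like SSum / ILast).
    sdev = math.sqrt(sum((r - ave) ** 2 for r in rets) / n)
    # Annualize: the mean scales with the period count, the deviation with its root.
    ave *= periods_per_year
    sdev *= math.sqrt(periods_per_year)
    # Convert back to ratios from logarithms.
    ave = 100.0 * (math.exp(ave / 100.0) - 1.0)
    sdev = 100.0 * (math.exp(sdev / 100.0) - 1.0)
    if sdev == 0:
        return 0.0  # the EasyLanguage version leaves Sharpe at its initial 0
    # Futures: no risk-free deduction. Stocks: a constant 5% is deducted.
    return ave / sdev if futures else (ave - 5.0) / sdev

# Hypothetical month-end account values (made-up numbers).
print(fulks_style_sharpe([100_000, 101_000, 103_000, 102_500], 12))
```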



The Sharpe Ratio is related to the annualized rate of return and the

annualized standard deviation of returns. That said, the calculations

can be based upon sampling the equity curve at any fixed interval -

days, weeks, months, calendar quarters, years, etc. (Sampling monthly

is not strictly fixed intervals because the number of trading days in

a month varies a bit.)



If one used daily samples (neglecting compounding) they would multiply

the average daily return by 253 and multiply the standard deviation of

returns by the square root of 253 to get the annualized returns.

If one uses weekly samples (neglecting compounding) they would

multiply the average weekly return by 52 and multiply the standard

deviation of weekly returns by the square root of 52 to get the

annualized values for each quantity. If one used monthly return samples (neglecting compounding), the corresponding numbers would be 12 and the square root of 12.
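
The multipliers above can be collected in a short sketch. The weekly figures in the example are made up, and `annualize` is a hypothetical helper name.

```python
import math

# Periods per year as given above: 253 trading days, 52 weeks, 12 months.
PERIODS_PER_YEAR = {"daily": 253, "weekly": 52, "monthly": 12}

def annualize(mean_return, std_return, frequency):
    """Scale a per-period mean and standard deviation to annual values,
    neglecting compounding."""
    n = PERIODS_PER_YEAR[frequency]
    return mean_return * n, std_return * math.sqrt(n)

# Hypothetical weekly figures: 0.4% average return, 2% standard deviation.
ann_mean, ann_std = annualize(0.004, 0.02, "weekly")
print(ann_mean, ann_std, ann_mean / ann_std)
```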



The annualized values should be about the same no matter how

frequently you take the samples. In theory, they will be independent

of how frequently you take samples if the returns have a “normal”

(Gaussian) distribution. In practice, returns tend to not quite be

“normal” but are a little narrower in the middle and have fatter than

normal “tails”.



ps: If the returns follow a “normal distribution” it should not matter whether you sample the equity curve daily, weekly, monthly, quarterly etc. The process of annualizing the numbers should give the same result.
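
The arithmetic behind this claim can be checked with a sketch. The daily figures are made up and a month is assumed to have exactly 21 trading days. The step that does the work here is the assumption that daily returns are independent with a constant mean and variance, so that means and variances both add up across the days of a month.

```python
import math

def annualized_sharpe(mean_return, std_return, periods_per_year):
    """Annualized mean over annualized standard deviation (risk-free rate zero)."""
    return (mean_return * periods_per_year) / (std_return * math.sqrt(periods_per_year))

# Hypothetical daily figures: 0.1% mean, 1% standard deviation.
daily_mean, daily_std = 0.001, 0.01
days_per_month = 21  # assumption: every month has 21 trading days

# Under independence, means add and variances add across the days of a month.
monthly_mean = daily_mean * days_per_month
monthly_std = daily_std * math.sqrt(days_per_month)

from_daily = annualized_sharpe(daily_mean, daily_std, 253)
from_monthly = annualized_sharpe(monthly_mean, monthly_std, 12)
print(from_daily, from_monthly)
```

The two figures differ only because 12 x 21 is 252 rather than 253. If daily returns are serially correlated, or their volatility changes over time, the monthly standard deviation is no longer the daily one times the square root of 21 and the two figures can drift apart.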



One can easily show that the periodic returns from any investment are

not constant but have a distribution of values. This distribution

closely follows the “bell-shaped curve” which is called the “normal

distribution” in statistics. Maybe statisticians like Jules or others can verify this…

MK: Whatever you’ve changed seems to have made system-to-system comparisons much better - ie similar equity curves now have similar Sharpes.



I would think more frequent sampling (shorter periods) would be better, as few systems here are old enough for quarterly samples. If a system were 2 years old, that would only be 8 samples - hardly what I would consider statistically valid. That said, I think that consistent sample periods for all systems is better, despite the “theoretical” assertion that returns are normally distributed and thus period is irrelevant. I suspect that over short time periods this is not true in real life - point in fact the HUGE differences in some systems’ Sharpes as calculated with different time periods. That blows the ‘time period irrelevance’ argument out of the water, IMHO. Yes, I know some of you will rebut with mathematical/theoretical arguments, but look at what has actually happened in real life to many of these systems. Different time period -> BIG difference in Sharpe; sometimes even going from a negative Sharpe to a very respectable positive number.

You are ignoring the fact that only MK knows for sure why the SR values were different before. The SR values calculated on a monthly sample period, as they are now, are also not statistically valid; e.g., for a system that has been in existence for only a year, that is only 12 samples, hardly statistically valid. For a system that has been in existence for only 3 months, that is only 3 samples, totally invalid.



There could be several reasons for the SR values not being correct before. It could be because there was an inherent flaw in the formula used to calculate SR itself, or the number of samples may not have been adequate for the time period under study. We don’t even know what time periods were used, whether they were the same or different, how many samples were used, what formula was used, etc. So, without knowing the real reason for the discrepancies in the SR values, to take the fact that the SR values were different before and say that this blows the ‘time period irrelevance’ argument out of the water is totally erroneous.



I presume you are not a statistician. Statisticians like Bob Fulks have demonstrated to me beyond a doubt that if the returns follow a “normal distribution” it should not matter whether you sample the equity curve daily, weekly, monthly, quarterly etc. The process of annualizing the numbers should give the same result.



He has demonstrated to me that the periodic returns from any investment are not constant but have a distribution of values. This distribution closely follows the “bell-shaped curve” which is called the “normal distribution” in statistics.

Pal



Your whole statement depends on assuming that returns follow a normal distribution because someone named Bob Fulks said so. Returns for many systems do not and that is per design. You cannot prove a theory based on a false assumption.



There are several counter examples showing that Sharpe Ratios calculated over different periods are much different.



Calculating Sharpe over monthly periods is a good balance between long and short term trading systems. Long term trend following systems typically have lots of small losses followed by big gainers. If the Sharpe Ratio is calculated over weekly periods, they will get very low Sharpe Ratios since their weekly returns would not be great, but looking over monthly, or even quarterly periods, they would be much better and fairer.



I agree with Hans that whatever method is now used for the calculation does make the rankings look better and fairer.



Regards

- Fanus

If there are adequate samples for the distribution, it does follow a normal distribution. You saying it doesn’t does not make it a false assumption.



You are ignoring the fact, that as more data is accumulated, the period of sampling would change from daily to weekly to monthly and the SR values would look even better.



But, because I develop long-term trend following systems, I’m all for it, since as you say, that way SR values would look much better, but that doesn’t change the fact that it is statistically invalid if it is based on inadequate samples.



I consider it as an improvement that the same period length is used for all systems now. This makes different systems easier to compare.



I agree with Hans and Fanus that the assumption of a normal distribution will often be violated and that one can therefore not assume that the length of the time period is irrelevant.



I would vote for a much shorter time period, namely days. One reason for my preference is that this time period is also used for the equity curves. So this will lead to a better correspondence between the curves and the SR (apart from the limitations of the SR that were already discussed in another thread).



Another reason is that I check my balance every day and my daily mood correlates highly with the values that I see there :-). These daily fluctuations in my (un)happiness should be reflected in the SR. What’s more important: I think that many subscribers here consider their balance almost daily or even more frequently. This is a reason to let the daily fluctuations enter in the SR. They will be wiped out when the period is set to a month.



Another reason is that the number of observations will be larger when the periods are days. Three observations is a very small basis for a standard deviation. However, this is not entirely obvious, because the observations themselves will be more reliable when a longer period is chosen. I expect that the net effect is that the SR is more reliably estimated (i.e. closer to the population value) when days are used than when months are used, simply because more information is used.
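
One way to probe this expectation is a small simulation, sketched below under strong assumptions: daily returns are drawn as independent normal values with made-up parameters, a month is 21 trading days, and every system has exactly three months of history. The spread of the estimates is measured by the interquartile range; `iqr` and `annualized_sharpe` are hypothetical helpers.

```python
import math
import random
from statistics import mean, stdev, quantiles

def annualized_sharpe(returns, periods_per_year):
    """Sample mean over sample standard deviation, annualized."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)

def iqr(values):
    """Interquartile range: distance between the first and third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

random.seed(1)  # fixed seed so the sketch is repeatable
daily_est, monthly_est = [], []
for _ in range(4000):
    # Three hypothetical months of 21 trading days each.
    days = [random.gauss(0.001, 0.01) for _ in range(63)]
    months = [sum(days[i:i + 21]) for i in range(0, 63, 21)]
    daily_est.append(annualized_sharpe(days, 253))    # 63 observations
    monthly_est.append(annualized_sharpe(months, 12))  # 3 observations

print(iqr(daily_est), iqr(monthly_est))
```

A wider spread for the three-observation estimate would be consistent with the expectation stated above, but only for returns that behave like the simulated ones.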



Note that the fact that the equity curve is based on days could also be used to argue that the SR should not be based on days: The SR based on days would not really add something to the equity curves, which are also based on days. The advantage would only be practical, in that the SR can be used to sort the systems.



Using months would perhaps be more conventional and more useful for new users who want to compare C2 systems with other investment strategies (e.g. a mutual fund). That longer periods are often used elsewhere is probably the result of a focus on the traditional private investor, who buys a stock and then doesn’t look at it for several months. But I think that it is more important here that the SR is useful for people who have already decided to use a C2 system, and then want to compare the C2 systems among each other. I expect that a daily period is more useful for them.



Jules

Pal



If you say returns follow a normal distribution, please prove it. One needs only one counter example to disprove your statement, and it would be very easy to come up with 30 returns which do not follow a normal distribution.



I am not ignoring any facts. You are making incorrect assumptions and presenting them as facts to prove your statement.



- Fanus

I don’t have to prove anything. You are the one who has to prove it with a counter example. I have yet to see one. It is true that the returns follow a normal distribution with adequate samples. Whether that adequate number is 30 or not is debatable. The proof is in the Central Limit Theorem of Statistics.

Jules



I think Sharpe Ratio based on daily returns will favor daytrade systems and put non daytrade systems at a disadvantage.



For example, take an extreme case where one system makes 0.1% each day to end up with about 3% for the month and another system trades once a month and makes 10% in a few days. The Sharpe Ratio calculated over daily periods will most likely be better for the daytrade system, even though the other system has better returns.
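
A rough numeric version of this extreme case, as a sketch only. All numbers are made up, a single 21-day month is used, the risk-free rate is taken as zero, and a little day-to-day variation is added to the daytrade system because a perfectly constant return would make the standard deviation zero.

```python
import math
from statistics import mean, pstdev

def daily_sharpe_annualized(daily_returns):
    """Mean over standard deviation of daily returns, scaled by sqrt(253)."""
    return mean(daily_returns) / pstdev(daily_returns) * math.sqrt(253)

# One hypothetical 21-day month, returns in percent (made-up numbers).
day_trader = [0.05, 0.15] * 10 + [0.10]         # about 0.1% a day
monthly_trader = [0.0] * 18 + [3.0, 3.0, 4.0]   # 10% made in three days

for name, rets in (("day trader", day_trader), ("monthly trader", monthly_trader)):
    print(name, round(sum(rets), 2), round(daily_sharpe_annualized(rets), 1))
```

With these inputs the system with the smaller total return gets the larger daily-based ratio, which is the effect described above.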



If looking at a system with a year of history, would you really rather trade the system making 3% a month than the one making 10% a month with the same drawdown, just because the Sharpe Ratio is better for the 3% system?



I think calculating over monthly periods is a good balance for both daytrade and longer term systems, and it more closely follows industry standards.



Regards

- Fanus

You present your assumptions as facts and then say you don’t need to prove them? Sorry Pal, facts don’t get proved based on a lack of counter examples. This is not how it works. If you see 1000 white swans, this doesn’t prove black ones don’t exist just because you didn’t see one.



But anyway here is your counter example:

30 returns:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30.



This is not a normal distribution.
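
One way to put a number on this is to compute the skewness and excess kurtosis of the 30 values; both are zero for a normal curve. The sketch below uses the plain population moment formulas, and the helper name is hypothetical.

```python
from statistics import mean

def skewness_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis; both are 0 for a normal curve."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

skew, kurt = skewness_and_excess_kurtosis(list(range(1, 31)))
print(round(skew, 3), round(kurt, 3))
```

The sequence is symmetric, so the skewness is zero, but the excess kurtosis comes out near -1.2, the signature of a flat, evenly spread set of values rather than a bell shape.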



- Fanus

Matthew,

Rereading your post I want to add the following to my previous post.



1. As I said, I would vote for days, but I can understand it if the computational load of this would be too high, and then I would suggest weeks.



2. The effect that young systems get a very high SR can also be avoided by simply not reporting the SR for systems that are younger than a certain limit (say, 90 days).



3. I don’t know what T bills are and I don’t know what special things futures accounts can do that other accounts cannot do. But I would recommend using the same definition of ‘risk free’ for all systems. This would improve the comparability of the SRs. As I understand it now, you apply a different definition for futures systems than for stock systems. That would mean that if a futures system and a stocks system have identical equity curves, then the futures system will get a different SR than the stocks system. Please correct me if I interpret you wrongly. If I’m right, I don’t see the advantage of this.



4. With respect to the T bills: It is my understanding that futures trading is often daytrading. So it seems to me that your assumption requires the daytrading of T bills too. How realistic is this? You wrote that the T bills can be used as collateral. Do you mean that if the account uses 25% on average for futures, that the other 75% is used for T bills? It seems to me that if I’m autotrading such a system that (1) I have to follow each closing of a position immediately by buying T bills, and (2) even more difficult, before the opening of a new position I have to sell those T bills in order to avoid a margin call. So I should be faster than Tradebullet. If I understand this situation correctly, then I think that this is realistic only if the system itself orders the trading of T bills. No one prevents the system from doing that (?) so if the system doesn’t use this easy way to increase its performance, too bad for the system, but C2 shouldn’t correct that.



Jules