Beau Blasts ETF Timer

For those of you who avoid the Chatter area, today was a real free-for-all. But what caught my attention was Backtest Beau blasting ETF Timer. Here’s what he said:



14:46 ET:

The only thing ETF Timer has done is buy and hold with QLD. Whatever trades he does in between his entry and exit, are irrelevant. It’s still one trade.

Beau Wolinsky,



Well I have great respect for ETF Timer so I thought I’d look at his record. Has he really bought and held QLD for two years?



No, he has had 15 periods where he was either long or short the Nasdaq: 6 were long positions (QLD), and 9 were short positions (QID).



Moreover his record on calling the turns in the market is stunning. Here’s how it breaks out. I assume he enters at the open and exits at the open of the day in question, and the percentage gain or loss is based on the Nasdaq cash index:



S 1/31/08 Exit 3/14/08 +2.0 percent

L 3/14/08 Exit 3/20/08 -2.2

S 3/25/08 Exit 4/11/08 +0.1

L 4/11/08 Exit 6/24/08 +2.1

S 6/25/08 Exit 8/11/08 -1.3

S 8/14/08 Exit 9/29/08 +9.7

L 10/2/08 Exit 10/10/08 -22.0

S 10/13/08 Exit 10/24/08 +13.9

S 10/30/08 Exit 11/13/08 +11.5

L 11/19/08 Exit 12/22/08 +5.6

S 1/2/09 Exit 1/22/09 +6.8

S 1/28/09 Exit 2/24/09 +8.6

L 3/3/09 Exit 7/8/09 +31.0

S 7/14/09 Exit 7/16/09 -3.6

L (presumably) 7/17/09 open trade +10.0



Long Trades: 4 winners 2 losers 24.5 percent

Short Trades: 7 winners 2 losers 47.7 percent
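For anyone who wants to check the totals above, here is a minimal Python sketch that re-adds the listed trades. The figures are copied from the list; the variable and function names are only for illustration. It assumes the totals are plain sums of the per-trade percentages (not compounded), which appears to be how the numbers above were reached.

```python
# Trades copied from the list above: (side, entry, exit, percent gain/loss).
# "L" = long (QLD), "S" = short (QID). The last row is the still-open trade.
trades = [
    ("S", "1/31/08", "3/14/08", 2.0),
    ("L", "3/14/08", "3/20/08", -2.2),
    ("S", "3/25/08", "4/11/08", 0.1),
    ("L", "4/11/08", "6/24/08", 2.1),
    ("S", "6/25/08", "8/11/08", -1.3),
    ("S", "8/14/08", "9/29/08", 9.7),
    ("L", "10/2/08", "10/10/08", -22.0),
    ("S", "10/13/08", "10/24/08", 13.9),
    ("S", "10/30/08", "11/13/08", 11.5),
    ("L", "11/19/08", "12/22/08", 5.6),
    ("S", "1/2/09", "1/22/09", 6.8),
    ("S", "1/28/09", "2/24/09", 8.6),
    ("L", "3/3/09", "7/8/09", 31.0),
    ("S", "7/14/09", "7/16/09", -3.6),
    ("L", "7/17/09", "open", 10.0),
]


def summarize(side):
    """Return (winners, losers, summed percent) for one side of the book."""
    results = [pct for s, _, _, pct in trades if s == side]
    winners = sum(1 for pct in results if pct > 0)
    losers = sum(1 for pct in results if pct < 0)
    # Simple (non-compounded) sum, rounded to one decimal like the post.
    return winners, losers, round(sum(results), 1)


long_summary = summarize("L")
short_summary = summarize("S")
print("Long: ", long_summary)
print("Short:", short_summary)
```

Run as written, this should reproduce the 4/2 and 7/2 winner/loser counts and the two percentage totals quoted above.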



Looks like Backtest Beau misspoke when he said all ETF Timer has done is buy and hold. In fact, it’s made about twice as much on the short side as on the long side.



Now I know Beau will come up with a backtest to show his latest and greatest is much better than this, but ladies and gentlemen, my hat’s off to ETF Timer for navigating a very tough 2 years so well IN REAL TIME.

Great post and great analysis of one of the most amazing timing systems on C2. This is what one typically can expect when Keith posts here.



Thanks Keith for your effort and contribution.


Beau, the self proclaimed “greatest strategy developer of all time” with the “world’s greatest strategy” has shown his true colors with his ETF Timer “analysis.” He is a bitter, arrogant, jealous little boy who works back office and failed the CFA exam.



Of course, you are right, he’ll come up with a backtest and 1,000 word diatribe to show how he is right, everyone is wrong, millions in funding is coming his way and that Beau is the best at everything financial. A comedy show ensues anytime Beau posts here, at Elite Trader or on his blog.



Sorry for the digression. My congrats to ETF Timer for such a successful history, regardless of how he does it.



Jack

Keith, you come out with beautiful names for people: Backtest Beau is a great name for him, better than Almighty Index or Controller Walt because Backtest Beau is rhythmic sounding unlike the others I named…



Unfortunately the number of trades for ETF Timer is not statistically significant for me to analyze with confidence, but I still analyzed it by ignoring a lot of criteria which I deem essential for a rational system, and I came out with the following:



Rank Name Value Price Diff (-ve is undervalued) Sharpe Ratio Annualized Return CAGR Max. DD



8 ETF Timer 7 97.00 90.00 2.235 111.3 79.4 26.90 16 13 7973.31 7344.67 677 0.8125 0.694542947 12 464.97 6.54



It seems that ETF Timer is a low return/low risk system. Even though it has weathered the storm for the past 2 years, so did a lot of systems and traders. By no means it is a top rated system, even though it is the most popular…In fact I rate it in the bottom pile of C2s unrational (unproductive) systems (CAGR 79.4 < 100)…



You may draw your own conclusions.

Folks, consider the source (Paper trading piker Palsun) of the most idiotic comment I have EVER heard on C2 about ETF Timer (79% annual return, real time, for 2 years!)…



“It seems that ETF Timer is a low return/low risk system. Even though it has weathered the storm for the past 2 years, so did a lot of systems and traders. By no means it is a top rated system, even though it is the most popular…In fact I rate it in the bottom pile of C2s unrational (unproductive) systems (CAGR 79.4 < 100)…”



If this ridiculous statement doesn’t move you to put Palsun on “ignore,” nothing will. This guy Palsun is DANGEROUS!

OK, here are my conclusions.



1. About statistical significance. Beau brings this up, as do you. There is probably no system on C2 that has “a statistically significant number of trades”. The reason for this is twofold. First, statistical significance means that the distribution of the sample is sufficient to characterize the underlying infinite distribution. In the case of trading, that means that the number of trades observed is sufficient to describe all the trades that will ever be taken with that methodology. An analogy is “how many IQ tests do I have to give to subjects before I know the statistics of IQ?” The answer to “how many” of anything depends on the variance of the underlying distribution. If we’re talking about the accuracy of rulers, then a small number of measurements would be necessary to get the distribution. If we’re talking about IQ, many more measurements would be necessary. The “how many” question is answered by statistical tests on the sample distribution. Most statisticians won’t even conduct a test until they have 30 samples. That’s where the number 30 comes from in the literature: the mistaken belief that 30 trades is a “statistically significant” number of trades. No, it’s just a sufficient number to start doing T-tests, Z-tests, etc. on the distribution. Unfortunately, trading is a “high variance” endeavour. The range from the largest loser to the largest winner can span many thousands of dollars. It would take thousands of trades to characterize the distribution of a trading strategy. C2 has a number of systems that have thousands of trades. Are they “characterized”? That brings us to the second part of “statistically significant”.



“Statistically significant” only applies to distributions that are stationary. Stationary distributions do not change over time, or change so slowly that measurements can be meaningfully compared across large time differences. Height and IQ are common examples of stationary distributions. Unfortunately, stock market data, commodity data, and forex data are not stationary. How a market trades varies over time. A great example is a system that was very popular at C2: VN Forex Club. This system relied on the Japanese carry trade, and when that went away, the system crashed. All systems here rely on a “trading idea”. When the market cooperates and trades the way “the idea” believes it should, results will be good; when the market doesn’t cooperate, results will fall off. A truly robust system will be designed around a trading idea that is valid most of the time. Good examples are “trading with the trend in futures”, or “buying weakness and selling strength” in the stock market.



Bottom line: I don’t think any C2 system has enough trades to believe that its behaviour is fully characterized. But that doesn’t mean there are no good systems at C2.



2. This idea that “rational systems” fall into “categories” is something I don’t understand. I believe you’re forcing the criteria to fit your system. Every “low return/low risk” system can be a “high return/high risk” system by applying sufficient leverage. The fact of the matter is that a truly good system designer will pick an operating point based on risk tolerance. I’ve marketed to the public for over 16 years and I know that the average trader will abandon a strategy when he has a 20 percent drawdown. They all “say” they can withstand 30, 40, or even 50 percent drawdown to achieve 50, 70, 100 percent return, but the calls start coming when it hits 10 percent and they bail at 20. Why would anyone ever “design” a 50 percent drawdown system that they offer to others? It’s crazy. Scale back the leverage until you’re under 20. If the return is worth it, traders will come.



3. Which brings me to my last conclusion. If you want to compare “rational” systems, just use the Calmar ratio. Average annual return divided by max drawdown allows you to compare “apples to apples” one system versus another. Leverage will let you adjust the risk to your risk tolerance.
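To make the Calmar comparison concrete, here is a small Python sketch. The system numbers are hypothetical, not taken from any C2 system, and the helper names are made up for illustration. It also leans on a simplifying assumption: that both return and drawdown scale linearly with leverage, which is only a first approximation.

```python
def calmar_ratio(annual_return_pct, max_drawdown_pct):
    """Average annual return divided by max drawdown (both in percent)."""
    if max_drawdown_pct <= 0:
        raise ValueError("max drawdown must be a positive percentage")
    return annual_return_pct / max_drawdown_pct


def apply_leverage(annual_return_pct, max_drawdown_pct, leverage):
    """Assumption: return and drawdown both scale linearly with leverage."""
    return annual_return_pct * leverage, max_drawdown_pct * leverage


# Hypothetical system: 40 percent a year with a 25 percent max drawdown.
base = (40.0, 25.0)
# Scale back the leverage so the drawdown sits at 20 percent.
scaled = apply_leverage(*base, leverage=0.8)

print("base:  ", base, "Calmar =", calmar_ratio(*base))
print("scaled:", scaled, "Calmar =", calmar_ratio(*scaled))
```

Under that linear-scaling assumption the ratio does not move when leverage changes, which is why it can serve as an “apples to apples” number while leverage sets the operating point.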

An excellent post!

Thank you for your contribution(s), Keith.



Gilbert

beautiful names for people:



Let’s try: Pathetic Palsun, Pitiful Palsun



In fact, there are so many things that work with Palsun:



at a loss, at sea, baffled, bewildered, bollixed, clueless, come apart, come unzipped, discombobulated*, dopey, doubtful, floored, foggy, fouled up, hung up, in a fog, lost, loused up, messed up, mind-blown, mixed up, mucked up, mystified, nonplussed, perplexed, rattled, screwed up, shook, shook up, spaced out, stuck, stumped, thrown, unglued, without a clue

I disagree with your first conclusion. We don’t have to worry about normality of the distribution of trades, provided we have at least 30 trades. The central limit theorem of statistics says that as long as we have at least 30 “observations” (i.e., trades in our case), the distribution of the averages will be normal even if the trades themselves are not normally distributed. So to say that “there is probably no system on C2 that has ‘a statistically significant number of trades’” is grossly wrong. To prove your statement, you would have to disprove the Central Limit Theorem. If you did, you would get the Nobel Prize. You cannot prove with an example, you can only disprove. Proof requires a law or a theorem. To assume otherwise would be dangerous…



One question that can be addressed by the use of statistics is whether a trading system is inherently profitable. We can approach this problem using confidence intervals for the average trade. If we have a sample of, say, 100 trades from a trading system, we can compute the average trade, T. Of course, we expect T to be greater than zero, indicating that the system has been profitable on average. However, if we took a different sample of 100 trades, we would, in general, find a different average trade, T. If the variation among the trades is large enough, it’s possible that some of these averages could be less than zero, indicating that the system was not profitable on average for those trades.



By computing the confidence intervals for the average, T, we can determine whether it’s likely that the average will be greater than zero. The confidence intervals specify upper and lower bounds for the average. The true average lies within those bounds with some specified probability or confidence level, such as 95%. The equation for the confidence intervals is as follows:



CI = t * SD/sqrt(N)



where t is the Student’s t statistic, SD is the standard deviation of the trades, N is the number of trades, and sqrt represents “square root.” The average trade is likely to lie between T - CI and T + CI. For the system to be profitable at our specified confidence level, we need T > CI.



The value of t depends on the specified confidence level and the number of trades, N. The exact value can be found in a statistics table for the t distribution or calculated in software, such as from the TINV function in Excel. However, provided we have a reasonably large number of trades, the exact value is not necessary. If N = 60, the t value for 95% confidence is t = 2.00. For larger values of N, t will get slightly smaller, dropping to 1.96 for very large N. To be conservative, then, we can take t = 2.00 as long as we have at least 60 trades. If our actual value of N is larger than 60, we will have slightly larger intervals than if we used the exact value of t.



Under this assumption, then, we have



CI = 2 * SD/sqrt(N); N >= 60, 95% confidence.



Because the square root of N is in the denominator, all other things being equal, the more trades we have, the smaller our confidence intervals will be.



This is a way to quantify what most of us already know from intuition and/or experience: if you want to know whether a trading system is profitable, the more history the better. In fact, we can re-write the CI equation to tell us how large N needs to be in order to demonstrate profitability:



N > 4 * (SD/T)^2



where the ^2 indicates “square.” This assumes we have a good estimate for the standard deviation and average trade.



As an example of this equation, let’s take numbers from a sample system. Let’s say we had an average trade, T = 248, and a standard deviation, SD = 990. Plugging these into the equation for N, we get N > 63. In other words, with these average trade and standard deviation numbers, there’s a 95% chance the average trade will be profitable (i.e., greater than zero) provided we have more than 63 trades in our sample.
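The arithmetic above can be checked with a few lines of Python. This is only a sketch of the two formulas as written in this post (CI = t * SD/sqrt(N) and N > 4 * (SD/T)^2), using the conservative t = 2.00; the function names are made up for illustration.

```python
import math


def ci_half_width(sd, n, t=2.0):
    """CI = t * SD / sqrt(N); t = 2.0 is the conservative 95% value for N >= 60."""
    return t * sd / math.sqrt(n)


def min_trades(sd, avg_trade, t=2.0):
    """Threshold that N must exceed: t^2 * (SD / T)^2, i.e. 4 * (SD / T)^2 for t = 2."""
    return (t * sd / avg_trade) ** 2


avg_trade, sd = 248.0, 990.0
threshold = min_trades(sd, avg_trade)
print(f"N must exceed {threshold:.1f}")

# Check the trade counts on either side of the threshold.
for n in (63, 64):
    ci = ci_half_width(sd, n)
    print(f"N = {n}: CI = {ci:.1f}, T > CI is {avg_trade > ci}")
```

With these inputs the threshold works out to a little under 64, so the condition T > CI is first met at 64 trades rather than at 63.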



Regarding your 2nd conclusion:



I disagree. I’m not forcing anything. It is simple really. Those are the characteristics of a rational system as evaluated by an objective standard and ETF Timer falls short of the standard. We’re all familiar with the concept that the greater the risk, the greater the reward. Higher profits are the compensation for taking on a greater risk. No one wants to assume more risk without being compensated for it. However, if you don’t understand the true risk inherent in a system, then how do you know if you’re being fairly compensated for it? More to the point, when comparing trading systems or when comparing parameter sets for a given system, we generally want to choose the one that produces the greatest reward for a given level of risk. In order to do this, we need to understand the risk-reward characteristics of a system. Position sizing is a way to relate risk to reward. Just as there is no limit to man’s need of knowledge and therefore of thought, so there is no limit to man’s need of wealth and therefore of creative work. You cannot have high returns with low risk. There can be no such thing as a man who transcends the need of progress, whether intellectual or material. There is no human life that is “safe enough,” “long enough,” “knowledgeable enough,” “affluent enough,” or “enjoyable enough” - not if man’s life is the standard of value. So it is completely rational (not crazy) to aspire to high returns.



Regarding your 3rd conclusion:



I’m not sure about this. I sorted by Calmar ratio and I obtained completely different rankings (actually Fortuna has the highest Calmar Ratio of any rational system). I use the Expectancy Score for the rankings. “Expectancy score is a better, more objective measure than the Sharpe Ratio for evaluating the relative performance of different trading strategies.” http://unicorn.us.com/trading/expectancy.html



I feel that all of your conclusions are wrong. I therefore do not change my conclusion about ETF Timer as an irrational system in the bottom pile of C2s unproductive systems…

Palsun,

You should credit your sources…you fraud: http://www.breakoutfutures.com/Newsletters/Newsletter0403.htm






PS: There is no monopoly on ideas that are not copyrighted; non-copyrighted material found in the public domain is not permanent property. Also, copyrighted material in the public domain which the author intended to be freely distributed as shareware/freeware is not permanent property. In both cases, in sharing those ideas, no rights are violated.

Whatever you say. You are a Walter Mitty.


How very, very, very little Palsun knows. Looks like he copied it word for word, and that is a SCREAMING example of plagiarism. Perhaps he does not recognize this at the bottom of the pilfered page:



Copyright © 2001-2009 Breakout Futures

I think index missed one name: Plagiarism Palsun



Palsun, in case you don’t know



Plagiarism is the use or close imitation of the language and thoughts of another author. It is considered dishonest, fraudulent, and a breach of ethics.



Palsun, notice I didn’t mention copyright infringement. Presenting someone’s ideas as your own shows the type of person you are. There is a big difference between sharing ideas and pretending an idea is yours.

I didn’t say the distribution of the trades wouldn’t be normal, but in fact they aren’t. Almost every trading strategy produces a distribution that is lognormal. What I said was 30 trades is not a big enough sample to characterize the underlying distribution. In my seminars I explain it this way. Suppose you have a barrel full of socks and someone pulls out thirty pairs. If the 30 pairs are 15 black and 15 blue, you can be pretty sure the underlying population (the whole barrel) will have a distribution that’s about 50-50. Now 16-bit color monitors have 65,536 true colors. If I stuffed a barrel (OK, a huge barrel) with a number of socks of each color, and had you pull out 30 samples, would you have the slightest clue to the distribution of each color? No way, you’d need hundreds of thousands of samples to figure it out.
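The sock barrel is easy to simulate. The Python sketch below is only an illustration under assumptions of my own choosing: every color is equally likely, draws are made with replacement (standing in for a very large barrel), and the function name and seed are arbitrary.

```python
import random
from collections import Counter


def draw_socks(num_colors, num_draws, seed=0):
    """Draw socks uniformly at random (with replacement) and count each color seen."""
    rng = random.Random(seed)
    return Counter(rng.randrange(num_colors) for _ in range(num_draws))


# Barrel 1: two colors. Barrel 2: one color per 16-bit color value.
two_colors = draw_socks(num_colors=2, num_draws=30)
many_colors = draw_socks(num_colors=65536, num_draws=30)

print("2 colors, 30 draws:", dict(two_colors))
print("65,536 colors, 30 draws: saw", len(many_colors), "distinct colors,",
      f"or {len(many_colors) / 65536:.4%} of the palette")
```

Thirty draws can never show more than 30 of the 65,536 colors, so the sample says almost nothing about how the rest of the barrel is distributed, while the two-color barrel is already roughly pinned down.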



By the way, the central limit theorem doesn’t say what you said it does (I mean what you copied verbatim from Bryant’s website). There’s no mention of 30 anything. As I thought I explained, the 30 number is a misinterpretation that Bryant and a number of other trading writers have made. Statisticians won’t even attempt to apply normal distribution tests to a sample until they have at least a minimal amount. The minimal amount you see in the literature varies but it’s around 30. That doesn’t mean 30 can characterize the underlying distribution (as we saw from the sock example), but we can start doing tests.



As for your disagreement with my second conclusion, you postulated what you considered an “objective standard”, your criteria. I just observed that your criteria let your system in and left the ones everybody subscribes to out. Maybe everybody else has a different “objective criteria” than you do. I know I do and it involves not trading a system that has had a 50 percent drawdown authored by someone who has 20 C2 failures in the past and who states, this is the same as my other ones. Am I being irrational?



As for your conclusion on my 3rd conclusion (gets complicated doesn’t it), “Expectancy score is a better, more objective measure than the Sharpe Ratio for evaluating the relative performance of different trading strategies”, did I miss Unicorn.com getting a Nobel prize for their work in risk management?

Corby,

Good job exposing the plagiarist.

You’re welcome.

Sorry, once again I disagree with all of your conclusions…



I would rather believe a theorem or a law than take your word for it…



and yes you are being irrational…I did not have 20 failures…I had some methods that were abandoned as I was testing my system and other methods that crashed as I was testing my system…You know the difference between a method and a system, right? The very same system with a different method is what I am using today and it is a success and it has less than 50% DD and above average return and that, you can’t take away mister…Anyway, my success or failure is irrelevant to our present discussion (of ETF Timer)



You don’t have to have a Nobel prize in risk management to objectively evaluate systems…



I’m sorry, Truth hurts… but there it is…

Palsun -



Do you believe you plagiarized, or is that a conclusion you also disagree with?



The answer will speak volumes about you.