System length and statistics issue

Some food for thought. Two systems: one is two years old but has only 20 trades. The other is two months old but has 200 trades.



Which one is meaningful?



I think the number of trades also counts when it comes to statistical significance.

A statistical significance test can answer your question.



It is generally accepted by statisticians that a system with at least 30 trades can have statistically significant results.



More on this here:



http://www.adaptrade.com/Articles/article-sig.htm
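As a rough illustration of what such a test can look like, here is a minimal sketch of a one-sample t statistic on the profit per trade. This is one common choice, not necessarily the method used in the linked article, and the function name and the trade figures are made up for the example.

```python
import math
import statistics

def t_statistic(trade_pnls):
    """One-sample t statistic for the hypothesis 'average profit per trade is zero'."""
    n = len(trade_pnls)
    if n < 2:
        raise ValueError("need at least two trades")
    mean = statistics.mean(trade_pnls)
    stdev = statistics.stdev(trade_pnls)  # sample standard deviation (divides by n - 1)
    if stdev == 0:
        raise ValueError("all trades are identical; t statistic is undefined")
    # The standard error of the mean shrinks with the square root of the trade count.
    return mean / (stdev / math.sqrt(n))

# Hypothetical profit/loss per trade, in dollars (not real results).
example_trades = [120, -80, 45, 200, -150, 60, -30, 95, 10, -55]
print(round(t_statistic(example_trades), 2))
```

The larger the t statistic for a given number of trades, the less likely it is that the average profit is just luck; the value is compared against a t table with n - 1 degrees of freedom.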



rgds, Pal

Midas Long-Term Value

Midas Short-Term Value

You are correct about the statistical acceptance of 30 trades, but statisticians also regard this as the minimum sample size, and results from a sample that small still contain a certain error. A larger sample size decreases that error.
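To put a rough number on that error, here is a small sketch using the usual normal approximation for a proportion. The function name and the 55% win rate are assumptions for illustration only, and the approximation itself is crude at small sample sizes.

```python
import math

def win_rate_margin(win_rate, n_trades, z=1.96):
    """Approximate 95% margin of error for a win rate measured over n_trades."""
    if n_trades <= 0:
        raise ValueError("n_trades must be positive")
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    # Standard error of a proportion: sqrt(p * (1 - p) / n).
    return z * math.sqrt(win_rate * (1.0 - win_rate) / n_trades)

# Hypothetical 55% win rate, measured over different numbers of trades.
for n in (20, 30, 200):
    print(n, round(win_rate_margin(0.55, n), 3))
```

The margin shrinks with the square root of the number of trades, so going from 30 to 200 trades narrows the uncertainty considerably but does not remove it.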



When applying statistics to the markets, the statistical results apply only to the trading environment from which the trades came. The problem is that the trading environment changes constantly, i.e. trend, range, bull, bear. As an example, a trading system might only work in bull markets, so a statistical sample of 30 trades taken in a bull market can give an indication of how the trading system will work in a bull market trend but says nothing about how the system will work in a bear market or a ranging market.



Therefore, to perform a statistically correct analysis of a trading system, the 30 trades must come from all the various trading environments (bull, bear, trending, and ranging), 7 or 8 trades from each environment. (To make it a little more statistically accurate, I might suggest a minimum of 30 trades from each environment.)
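A quick way to check this on a trade list might look like the sketch below. It assumes each trade has already been labelled with the environment it was taken in; how to do that labelling is a separate question, and the labels and function name here are hypothetical.

```python
from collections import Counter

# Environment labels taken from the post above; the labelling method is assumed.
REGIMES = ("bull", "bear", "trending", "ranging")

def undersampled_regimes(trade_regimes, min_trades=30):
    """Return {regime: trade count} for every regime with fewer than min_trades trades."""
    counts = Counter(trade_regimes)
    return {
        regime: counts.get(regime, 0)
        for regime in REGIMES
        if counts.get(regime, 0) < min_trades
    }

# Hypothetical sample: plenty of bull-market trades, very few of anything else.
sample = ["bull"] * 30 + ["bear"] * 8 + ["ranging"] * 5
print(undersampled_regimes(sample))
```

An empty result would mean every environment has at least the minimum number of trades; anything listed is an environment the sample says little about.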

I agree with you, Alan: the number of trades alone is not enough. Time is worth something. VIVALDI is tested over 20 years and 1,108 trades, showing 58.8% profitable in 17 markets. The spreadsheets with such results are downloadable from the website at www.cclsys.ca.



But even that is not enough, since this is historical back-testing. If it were an actual track record, then that would be that. But given that these are computerized back-test results, i.e. hypothetical, the issue of curve-fitting comes in. There are different schools on this. One school states that if you have only one parameter and it works over such a testing period (including all 17 or whatever different markets) then it must be good. This is especially true if the parameter is the same over all tested markets.



Another school believes in making each system fit each market even though that leaves it more open to curve-fitting degradation. However, is the degradation likely to be greater (i.e. worse) than with the aforementioned method, which knows that some of the instruments will do far more poorly than others but at least has the more comfortable ideology (‘all are the same so therefore it is not curve fit’)? If you look at the results from any such ‘one-size-fits-all’ approach, many of the markets really don’t perform well at all even though they do show profits. If you like, this is an ideology-driven approach which is willing to lose quite a bit of money in certain markets in order to maintain a clean ‘non-optimized’ label.



A third school will maintain that if you have more than one parameter, and especially if those parameters/rules are non-correlated with each other, then the likelihood of curve-fitting diminishes with each additional rule because it gets harder and harder for them all to agree. Although still not free of curve-fitting danger, the likelihood of serious degradation diminishes considerably. Usually what will happen if there has been too much curve fitting is that the rules will not line up to generate clear signals, so the system will tend to avoid entries, and therefore losses as well as profits. In other words, it will just not perform much at all.



But there are rules and rules. There are rules for entering, exiting to take profits, stop losses, break-evens. These rules may or may not be highly correlated with the entry rules. It becomes very complex trying to determine the degree of curve-fitting.



Which is what roll-forward testing addresses. You optimize up until Dec 31st 1990, for example, and then record the results from Jan 1st 1991 to Dec 31st 1991. Then you optimize again and so on. Or you optimize until 2001 and show what happened after 2001. The before and after results will tell a very definite story. I am researching applying an automated roll-forward approach to my system to use for both further testing and presentation. I happen to believe that it is best to continuously revise optimizations - if done judiciously - especially since in my case I am using seasonals, and I want to incorporate each year’s data into the overall database with which to compile the seasonal component of the system. Just common sense really.
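The roll-forward schedule described above can be sketched as follows. This is only an outline of the date bookkeeping, using an expanding optimization window and one-year test periods; the function name and parameters are made up for the example, and the actual optimization and testing steps are left out.

```python
def roll_forward_windows(first_year, last_year, min_train_years=1):
    """Yield (train_start, train_end, test_year) tuples for roll-forward testing.

    Each step optimizes on all data from first_year through the end of
    train_end, then records out-of-sample results for the following year.
    """
    if min_train_years < 1:
        raise ValueError("min_train_years must be at least 1")
    for test_year in range(first_year + min_train_years, last_year + 1):
        yield (first_year, test_year - 1, test_year)

# Hypothetical data running from 1988 through 1993, first optimization on 3 years.
for train_start, train_end, test_year in roll_forward_windows(1988, 1993, 3):
    print(f"optimize {train_start}-{train_end}, record results for {test_year}")
```

Only the results from the recorded years count as out-of-sample; stitching those years together gives the "after" side of the before-and-after comparison.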



By contrast, if the different rules are highly correlated (e.g. moving averages of different lengths, or indicators with the same bar length and similar information), then even though there are many ‘rules’, the dangers of over-optimization are very real because the rules are not independent.
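One rough way to see whether two rules carry similar information is to correlate their signal series over the same bars. The sketch below computes a plain Pearson correlation; the function name and the +1/-1 signal values are invented for illustration, and a high correlation is only a warning sign, not a measure of curve-fitting.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length series of rule signals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical signals (+1 long, -1 short) from two moving-average rules.
rule_a = [1, 1, 1, -1, -1, 1, 1, -1]
rule_b = [1, 1, -1, -1, -1, 1, 1, -1]
print(round(pearson(rule_a, rule_b), 2))
```

A value close to 1 suggests the second rule adds little that the first does not already say.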



But again: if the mix of markets is overly correlated, that story might depend more on overall market conditions - as you pointed out - than on the methodology itself. Bonds have been in a bull market for a long time, and if that changes significantly, many bond systems just won’t work. Many index systems that did well in the 90’s and early 2000’s, when the market was strongly trending (up or down), have had a rough go of it recently. And there is good reason to believe that stocks will remain in a trading-range type pattern (speaking at the intermediate level) for many years to come. What basis for a huge new bull market will there be?



(This never ends!!)



Ultimately, the only definitive test is real trading with real money. The next best is the service C2 is offering with real-time simulation. But since C2 - for perfectly valid reasons - does not show hypothetical, only real-time tracked, results, it is hard to find many systems here (yet!) with long track records.



Futures Truth offers full-bore tests using intraday data to double-check stop fill values, but many EOD systems enter on the open (like VIVALDI) and do not use intraday stops, in which case a good Tradestation or other testing platform (using well put-together continuous contracts for futures systems) will yield excellent results.



Good luck in your evaluations!