Realism Factor

Will the board administrator please delete the following post from the IMO-EMINI board. It really belongs here under suggestions, feedback. Thanks.



Matthew:

You can’t take the Realism Factor with a “grain of salt” when it is being used to evaluate trading systems and influence prospective subscribers, yet does not reflect the real results of trading systems, as evidenced by the posts I have read on these boards. Also, I have yet to understand how I can make only three trades on my system and still be assigned a realism factor. I suggest you discontinue the realism factor until it can be made to reflect real trading. To suggest that the realism factor be taken with “a grain of salt” means that it does not work and therefore should not be used to rate professional trading systems. Instead of the realism factor, poll subscribers using an online questionnaire to determine the level of slippage, fill problems, etc., that they are experiencing, and assign each system a numerical rating based on the answers. For rapid daytrading systems, you could poll a subscriber 5 days after they start using the signals. Also poll subscribers who cancel, and those on the trial period.

Questions could be like:

1) On a scale of 1-10, 10 being slippage is a problem, 1 being slippage is not a problem, what is your experience with slippage?

2) On a scale of 1-10, 10 being good, 1 being poor, how would you rate the speed of your limit order fills?

3) On a scale of 1-10, 10 being good, 1 being poor, how would you rate the speed of your market order fills?

4) On a scale of 1-10, 10 being good, 1 being poor, how would you rate getting most of your limit orders filled?

5) On a scale of 1-10, 10 being good, 1 being poor, how would you rate getting a fill for your market order?

6) On a scale of 1-10, 10 being good, 1 being poor, how would you rate this system for consistent profitability?

7) Do you use auto trading or manual?

Next to the poll result rating (whatever the rating is called) should be the % of subscribers using auto trading for that system, since that would have indications for some of the fill issues rated in the poll.

These questions and others asked of subscribers every three months to rate trading systems and monitor overall C2 quality, in my view, would be welcomed by subscribers since they have the most to gain/lose.

Systems with fewer than two subscribers would get an n/a (not applicable) rating. The experience of two subscribers trading the same system is sufficient, in qualitative terms, to generate a system rating.

Trading systems that have subscribers but are still awaiting a rating could show P for pending.
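As a rough illustration of how the proposed poll could be turned into a per-system rating, here is a minimal Python sketch. The function name, the response layout, the plain averaging of the 1-10 answers, and the rule that fewer than two completed polls means "pending" are all assumptions made for illustration; the proposal itself only specifies the n/a rule, the P marker, and the auto-trading percentage shown next to the rating.

```python
from statistics import mean

def poll_rating(subscriber_count, responses):
    """Return (rating, pct_auto) for one trading system.

    responses: one dict per polled subscriber, e.g.
        {"scores": [7, 8, 6], "auto_trade": True}   (hypothetical layout)
    Scores are assumed to be 1-10 answers already oriented so that
    higher is better (the slippage question would need to be inverted first).
    """
    if subscriber_count < 2:
        return "n/a", None   # fewer than two subscribers: not applicable
    if len(responses) < 2:
        return "P", None     # subscribers exist, but the rating is still pending
    # Average each subscriber's answers, then average across subscribers.
    rating = round(mean(mean(r["scores"]) for r in responses), 1)
    # Share of respondents who auto trade, shown next to the rating.
    pct_auto = round(100 * sum(r["auto_trade"] for r in responses) / len(responses))
    return rating, pct_auto
```

Averaging per subscriber first keeps one subscriber who answers more questions from counting more than another; a weighted scheme would be an equally plausible choice.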

This suggestion is open for feedback and your comments, Matthew.

I’m sorry you don’t like the RF number you have been assigned. However, your disagreement is not sufficient reason for abandoning the project. In fact, the RF is exceedingly useful, and the math and logic behind it are strong. I am working on some new ways for people to see and understand the RF, and you will soon begin to see it applied (and explained) in ways that will make more sense. These new features will be released shortly.



MK

I second this motion. This will give subscribers so much more confidence in trusting the results, which will benefit everyone since more subscribers = more money for everyone here.



But 5 days is way too soon to evaluate a system, even a daytrading one. I would say at the end of every month, ask the subscriber to enter or update the survey.

Even though the above polling suggestion seems good, it would be even better if it were used not to replace the Realism Factor (RF) completely, but as one of the criteria that make up the RF. Keep in mind, as MK mentioned before, that the RF is a work in progress and is constantly being improved as more experience is gained. I must admit, though, that the RF at its present stage doesn’t seem to be taken into account when compiling the best systems list (anybody can guess at a possible explanation for that, however short-term that view might be).

The poll will do the job. If I ask you what percentage of your buy orders experience slippage, and what percentage of your limit orders get partial or no fills in your real account, I believe you could give me a number that is close to 100% accurate.

The poll will not do the job completely and accurately. The reason being slippages/fills all depend on your trade (position) size, broker and the trading platform (auto-trading/manual/broker-assisted) used which are all variables and therefore the utility of which varies from trader to trader, broker to broker, computer to computer, network to network, to be entirely honest and reliable.



In my opinion, the poll results (which should have a minimum of 30 polls from 30 different traders to have statistical significance) can only be an addition to compute the RF (minimum 30 trades and minimum 10 trades per period used in calculations), nevertheless an important addition, honesty and reliability issues notwithstanding.



A complementary suggestion to get an estimate of the confidence level to trade a system would be to conduct a statistical significance test, as I have done, for example for my system Midas Long-Term:



=============================================================================================

SIGNIFICANCE TEST

=============================================================================================



Number of Trades: 14

Number of Degrees of Freedom: 10

Average trade at 95.00% confidence: $871.21 +/- 499.64

Worst-case average trade at 95.00% confidence: $371.58

Probability that average trade is greater than zero: 99.49%



Trades pass statistical significance test at specified confidence level.



System: Midas Med-Term



=============================================================================================

SIGNIFICANCE TEST

=============================================================================================



Number of Trades: 123

Number of Degrees of Freedom: 119

Average trade at 95.00% confidence: $1681.28 +/- 823.64

Worst-case average trade at 95.00% confidence: $857.64

Probability that average trade is greater than zero: 99.95%



Trades pass statistical significance test at specified confidence level.



System: Midas Short-Term



=============================================================================================

SIGNIFICANCE TEST

=============================================================================================



Number of Trades: 153

Number of Degrees of Freedom: 149

Average trade at 95.00% confidence: $1332.47 +/- 630.80

Worst-case average trade at 95.00% confidence: $701.67

Probability that average trade is greater than zero: > 99.95%



Trades pass statistical significance test at specified confidence level.



System: ZShort-Term



=============================================================================================

SIGNIFICANCE TEST

=============================================================================================



Number of Trades: 551

Number of Degrees of Freedom: 547

Average trade at 75.00% confidence: $255.68 +/- 215.72

Worst-case average trade at 75.00% confidence: $39.96

Probability that average trade is greater than zero: 78.08%



Trades pass statistical significance test at specified confidence level.
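For readers unfamiliar with these reports, the figures are consistent with a Student's t confidence interval on the average trade. The reports do not state the formula, so the following is only a sketch under that assumption; the symbols are introduced here and do not appear in the reports. With N trades, sample mean \bar{x}, sample standard deviation s, and a critical value t^{*} for the chosen confidence level and degrees of freedom \nu:

```latex
\text{average trade} = \bar{x} \pm t^{*}_{\nu}\,\frac{s}{\sqrt{N}},
\qquad
\text{worst-case average trade} = \bar{x} - t^{*}_{\nu}\,\frac{s}{\sqrt{N}}
```

For example, in the first report 871.21 - 499.64 = 371.57, which matches the reported worst case of $371.58 up to rounding. Note that each report shows degrees of freedom equal to N - 4 rather than the textbook N - 1, so the tool evidently subtracts something extra (possibly a count of system parameters); that adjustment is not explained in the reports.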

What matters most is the actual experience of a subscriber using a trading system.

You said, “The poll will not do the job completely and accurately. The reason being slippages/fills all depend on your trade (position) size, broker and the trading platform (auto-trading/manual/broker-assisted) used which are all variables and therefore the utility of which varies from trader to trader, broker to broker, computer to computer, network to network, to be entirely honest and reliable.”

Regarding position size and slippage, no fills and partial fills, the trading system should give guidelines as to the position size and the type of order to use. Regarding auto-trading/manual/broker-assisted trading, the trading system should say what is the best way to enter trades for that system. In light of your comments, the following question should be included in the questionnaire: On a scale of 1-10, 1 means following all trading system guidelines, 10 means following no trading system guidelines, how closely do you follow the guidelines given for this trading system? The answers to that question should be factored into the numerical rating.

The other question should be: On a scale of 1-10, 1 means not matching C2 results, 10 means matching C2 results, how well does each of your real trades match C2 results? I enter all my trades immediately as I receive the C2 signals, and follow the system guidelines, but my real trades match C2 trades less than expected because of (1) too much slippage… (2) too many no fills… (3) too many partial fills…

This other question, I believe, is also central to the issue of real-time trades compared to C2 results: On a scale of 1-10, 1 means not at all, 10 means all of the time, how often do you experience (a) slippage… (b) partial fills on limit orders… (c) no fills on limit orders…

Also, you said “In my opinion, the poll results (which should have a minimum of 30 polls from 30 different traders to have statistical significance) can only be an addition to compute the RF (minimum 30 trades and minimum 10 trades per period used in calculations), nevertheless an important addition, honesty and reliability issues notwithstanding.” My view on that is that 30 trades by at least two traders trading the same system are sufficient to give a reliable reading on the issues in question, using the type of poll questions in the examples above. Make the poll mandatory for all system subscribers. Also, make it mandatory for each trading system to give guidelines for trading the system. This is relevant to the suggested polling of system subscribers. Example guidelines for trading each system could be as follows:

1) Maximum position size recommended:

Stocks with average volume 500k - 1 million shares: Maximum position size 5k shares, and 10k shares if getting good fills at 10k shares. Above 1 million shares average volume: Maximum position size 10k - 20k shares, 20k shares if getting good fills at 20k shares.

2) Market orders (or limit orders).

3) Auto trade (or ITM).

4) Do not recommend broker assisted order entry.

5) Enter positions immediately on receiving ITM signals.

Trading systems should also be required to state:

1) Averaging down used/not used.

2) Stop loss used/not used.

3) Stop loss size.

Using the poll, and with guidelines stated for trading each system, the poll feedback will yield info that is credible enough to assign a numerical rating to each trading system, so the realism factor can be discontinued. Allow system subscribers to tell what their experience is when trading C2 systems. It’s a win for C2, for system subscribers, and for trading systems that are tradeable.

This is such a dumb idea, to want to replace the Realism Factor with subscriber feedback.



If I am subscribed to a good system, I am surely not going to give any feedback. The more people who know about it, the better the chance that the system will stop working. So, I will keep quiet.



When you have one, or two unhappy customers for whatever reason, they can provide bad feedback to bring the realism factor down.



Jealous competitors can subscribe to a system with the sole purpose of providing bad feedback to bring the realism factor down.



People will use the Realism Factor to pick systems. So new systems will be at a disadvantage, and the established systems on C2 that already have subscribers will have an advantage. People will continue to select those systems since they have a realism factor. People will use RF to pick systems, and systems will need subscribers to get an RF. Deadlock.



There is already a place where subscribers can leave feedback. Replacing RF with this would leave it wide open to abuse. RF should be based on cold, hard facts.



If a system trades 100K shares of a stock which trades only 10K per day, RF should go down since the trade is not based in realism. A subscriber might scale it down a lot, trade 10 shares a day, and make money, but then his return will be a fraction of what the system shows on C2. So the system on C2 might show a 200% annualized return while the real account has 5%, and the subscriber might be happy with 5% and provide good feedback, which will increase the RF, when in fact real-time trading does not resemble C2 trading at all.



If a subscriber does not 100% follow the orders from C2, including trade sizes, then feedback cannot be accurate. If the system on C2 trades 1000 shares but a subscriber trades only 100, then the subscriber is in no position to provide feedback on fill and slippage issues, or any other kind of feedback, since his trading doesn’t duplicate the C2 trading and it is an apples-to-oranges comparison.



To say the system should provide suggestions on position size is completely silly. Position size depends on the risk factor of the subscriber. Some are OK with putting on huge sizes and having huge drawdowns, some are not. The system vendor cannot possibly cater to each subscriber’s risk tolerance. What if the vendor does, and someone is more risk averse than the vendor and doesn’t like the position size recommendations? Should he give a bad rating then, just because he doesn’t like or agree with them? And providing position size recommendations can be seen as giving financial advice, which leaves the vendor open to legal action if the subscriber loses a lot of money.



This suggestion is so impractical that anyone who thinks for 5 minutes about how something like this could replace RF will come to the same conclusion.



Chris

Finally, an intelligent response.

You are missing the point. The only way to find out how much slippage, partial fill and no-fill subscribers are experiencing is to ask them. If a system has a high level of those problems and the subscribers are not exceeding the recommended maximum position size, then that system has a problem, and that should show in the rating of that system. That info is only available from system subscribers, and they should be polled to get it. The poll results would display only as a number, like the realism index number; you would not see the poll results for each individual subscriber.

Actually, you are missing the point. I gave several examples above of why this is completely impractical. A method like this is far too easy to abuse and depends entirely on assuming everyone will provide completely unbiased feedback. You cannot base a realism factor on unreliable, non-statistical feedback. How can you verify that what subscribers say is true? You can’t. And without verifiable feedback, your whole concept falls apart. Period.



Things like the Realism Factor should be based on hard, verifiable facts, not on someone’s emotions a week after he experiences losses in a system that 100% match the losses on C2.



Chris

“A complementary suggestion to get an estimate of the confidence level to trade a system would be to conduct a statistical significance test, as I have done, for example for my system Midas Long-Term:”



Perhaps my memory is failing me but several weeks ago I witnessed the complete destruction of a trading system that up until that time had demonstrated a beautiful equity curve over several months. It was advertised as the most robust trading system in the world. Then within a one week time period the equity curve went from about $150K to negative $1 MILLION DOLLARS. What was the name of that trading system? Midas something or other. The next day I could no longer view that system on Collective2. Perhaps a computer blip?



My comments on the Realism Factor (RF) are as follows:



(1) System developers are doing simulated trading at much higher leverage than they do in real life. It is a competitive market and they want to attract subscribers. Highly leveraged systems may ultimately bankrupt subscribers if they emulate the trades. The RF needs to account for the leverage being used and worst case historical drop (plus comfort factor) in the security(ies) being traded. No assumptions should be made that stops will get the subscriber out of a position.



(2) There needs to be a distinction between systems designed for professional day-traders and systems designed for the average investor. What is real for a few individuals with $100K+ trading capital and able to babysit a computer all day long is not real for the millions of average investors with day time jobs and $10K of excess money to invest in the markets. (Question: why is C2 not attracting subscribers? Answer: C2 is not catering to the average investor).



(3) Slippage is not the same as realism and should not be combined with realism. There is a difference between a system that cannot be traded (because subscribers can’t get fills, etc.) and one that may not be traded profitably. Slippage should be handled separately from realism since there are variables that affect it, including brokerage, auto-trade/manual, etc. C2 should estimate a general slippage figure and give the rationale for the figures presented, e.g. liquidity of the market, volatility, velocity/direction, market-if-touched orders, etc. Then the potential subscriber can judge for him(her)self whether the system can be profitably traded.



Why would they not record their true experiences? System subscribers will tell if they are having a bad time with a system, and they will tell if they are having a good time with a system. It has nothing to do with emotions, it’s all about saying what your experience is.

If I say to you answer the following questions: 1) What % of your limit orders get filled… 2) what % of your limit orders get partial fills… 3) What % of your limit orders experience slippage… 4) Do you trade position size above/below what the system trades…

Forget about your earlier comments for a minute and just answer the four questions without naming the system.

You are correct. During weekends, account equity used to swing wildly when there were bogus quotes for forex, usually the average of 0.00 and the original price, so that if one was long a pair with bogus quotes, the account would drop substantially. The quotes have been pretty reliable of late; maybe there was a change of forex quote vendor for those pairs? Thanks to MK for fixing it. But I still see that for some systems, like Lexcap Forex Mark I, the bogus quotes are making the account swing wildly. In this case it is up $1M; probably it is short some currency pair, and the current price, which is now the average of 0.00 and the original price, is way down, so the account is up. So the quote problems may not have been completely fixed yet. This happens only occasionally on weekends, and when the forex market opens on Sunday night, it is back to normal.

The subscribers cannot record their true experiences accurately and completely, because memory is short at best and we cannot expect everybody to be rational; even though the rules of the game are simple, it takes a lifetime to master them and to be consistent, disciplined and objective, except perhaps for the great ones like Warren Buffett. So whatever poll one conducts cannot be completely honest and reliable (which is what the RF is trying to measure).



You may then apply the RF to the expectancy score to get a refined number which would be a measure of the productiveness of the system, breaking a tie between similar systems using the Sharpe Ratio.



But I still maintain that subscribers’ experiences count for something towards building the RF, though unfortunately this may not be practical enough to implement. I would not be surprised if MK finds some way to do just that; after all, genius is the capacity to take pains. In any case, we always have the subscriber reviews, so if a subscriber had a bad experience with a system, they can vent their vengeance in their reviews. Actually, a system vendor would be grateful if a subscriber came up with suggestions to improve the system for the future. Believe me, the advice, though free, is paid for dearly by somebody, somewhere, as everything in this universe is paid for by somebody; if the guilty does not pay, then the innocent has to pay for it.

Chris hit the nail right on the head. In addition, we must not forget that because we are trading hypothetical, simulated accounts regardless of the size of the account, the ultimate judge for the RF MUST be the market itself, without artificial and biased (human) influence, because in the end the market decides whether your trade size can be executed, not humans. Liquidity must also be based on the available size on the bid/ask at the time the trade is transmitted to the market, not just the average traded volume.



RF must also be very transparent, not proprietary, to ensure fairness across all types of trading instruments, meaning every trade must be judged on its own merit, based on the ability of the market to absorb the size being traded at that moment. For example, if I trade 1000 CSCO options and the available size is 25000 on the bid/35000 on the ask, that trade must receive an RF of 100 because the market can easily absorb such size at that particular moment. The same goes for QQQQ options. On the other hand, if I buy 500 options on a stock with 75 contracts on the bid/100 contracts on the ask, the RF score would be 20 (100 realistically available/500 proposed).
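The per-trade scoring described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the two examples, not how C2 actually computes the RF; the function name and the choice to cap the score at 100 are assumptions.

```python
def liquidity_rf(proposed_size, available_size):
    """Score one trade from 0 to 100 by how much of it the quoted size can absorb.

    proposed_size:  contracts/shares the system wants to trade
    available_size: size quoted on the relevant side of the market
                    (the ask when buying, the bid when selling)
    """
    if proposed_size <= 0:
        raise ValueError("proposed_size must be positive")
    # The part of the order that could realistically be filled right now.
    fillable = min(proposed_size, max(available_size, 0))
    return 100 * fillable / proposed_size


# The two examples from the post:
print(liquidity_rf(1000, 35000))  # buying 1000 options against 35000 on the ask
print(liquidity_rf(500, 100))     # buying 500 options against 100 on the ask
```

The same `fillable` quantity is what a subscriber would want to see alongside the signal, since it says how much of the trade the market could take at that moment.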



To make this even better for the subscriber, Matthew could include an additional line attached to the broadcast trade showing how much can realistically be filled, based on the actual available size on the bid/ask at that particular time. The subscriber who receives the broadcast trade through ITM or even email can then decide instantly, BEFORE initiating the trade with real money, whether the trade can realistically be filled, based on the amount of real money they are comfortable trading with.

My answer to this is, how will you guarantee that answers to your questions are 100% truthful? If you cannot 100% guarantee that the answers are correct, then RF calculated based on that has absolutely no merit.



Answers may not be truthful because someone, a jealous competitor for example, could plainly lie, or even an honest subscriber might make mistakes in answering the questions. On the other side, family members and friends could provide good feedback to increase the RF value, or a system vendor could even pay a few people to provide good feedback. Good feedback will result in higher RF values, which will result in more subscribers, which makes it a good “investment” to pay people to provide good feedback. A system vendor can even subscribe to his or her own system several times and give it good feedback.



And besides that, what about the new systems without subscribers? As I said before, based on your suggestion systems will need subscribers to get an RF factor and potential subscribers will look at systems with RF factors. This is like when you have freshly graduated and are looking for your first job. Everyone looks for experience, but no one gives you a chance to get experience, until you find that one company willing to take a chance on you. New systems will have to hope for someone to take a chance.



If you have done any real trading yourself, you will also realize your four questions above are useless. Some brokers are better at filling limit orders than others, which makes your suggestion even more unscientific, as now we get into the issue of comparing brokers.



And to say we should forget about my earlier comments so that you can make a point is ridiculous. My previous comments invalidate your whole suggestion. You cannot conveniently forget certain points which invalidate your point just so that your point seems to be valid. This is like getting caught speeding and telling the cop, let’s forget that there was a speed limit, which means I wasn’t speeding.



Your whole concept is based on feedback from unreliable sources. Unless you can 100% guarantee that answers to any questions will be 100% truthful and unbiased, an RF value based on them is useless, and I haven’t seen any suggestions from you on how this will be guaranteed.



When someone trades 100K of a stock with 10K volume for the day, one can say with 100% certainty that this fill is unrealistic. You have no way of knowing how truthful and accurate feedback from a subscriber is.



You also said: “System subscribers will tell if they are having a bad time with a system, and they will tell if they are having a good time with a system.” Why would they? Because they are asked? How many times have you said no thanks when someone called you and asked if you would participate in a survey? And it is well documented that in trading people will paint a rosier picture than their accounts show. By admitting to losses, they admit to making a bad choice and being wrong about something. No one likes to be wrong.



Chris

Tarek,



I have been watching/bookmarked your system because your trading style/speed/markets traded match mine closely, and it has done fairly well considering that the typical hedge fund portfolio made only 9%/year in average returns, 2% more than the stock market.



Even though your system experienced its max open equity drawdowns early on, it recovered remarkably well to make new highs in a very short time, unlike the mighty DJIA, which has yet to break even from its 2000 peak drawdown and has a history of a max closed equity drawdown of nearly 89%, which took almost 25 years to recover from.



Though you seem to use a sophisticated algorithm to optimize the number and strike prices of the options traded (according to your system’s description), I have managed to use a simple yet effective method based on the instrument’s long-term support and resistance.



I have been trading stock/ETF options lately without any problems, just by looking at the 20-day average volume of the stock/ETF/futures and sizing my trade at 1 option for every 100 shares of stock/ETF my system recommends (and 1 option for every futures contract), within a limit of 1% (which may increase with account size) of the 20-day average volume of shares/contracts traded.
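One way to read this sizing rule, for the stock/ETF case, is sketched below in Python. The function and parameter names are made up for illustration, and it assumes the 1% cap is applied to the share count before converting to options at 1 option per 100 shares; the post does not spell out that order.

```python
def option_contracts(recommended_shares, daily_volumes, cap_fraction=0.01):
    """Convert a recommended share position into a capped number of options.

    recommended_shares: shares the underlying system recommends
    daily_volumes:      recent daily share volumes, oldest first
    cap_fraction:       maximum share of the 20-day average volume to trade
                        (1% in the post; it may grow with account size)
    """
    recent = daily_volumes[-20:]            # last 20 trading days
    if not recent:
        raise ValueError("need at least one day of volume data")
    avg_volume = sum(recent) / len(recent)  # 20-day average volume
    # Never size beyond the chosen fraction of average volume.
    capped_shares = min(recommended_shares, cap_fraction * avg_volume)
    # 1 option for every 100 shares, rounded down to whole contracts.
    return int(capped_shares // 100)
```

With an average volume of 200,000 shares, a 5,000-share recommendation would be capped at 2,000 shares, i.e. 20 options, while a 1,000-share recommendation would pass through unchanged as 10 options.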



But, if C2 can know the exact bid/ask numbers when the trade is placed, then perhaps instead of penalizing the RF, it could very well limit the trade size to available bid/ask and leave the RF alone?

Pal,



Thanks for the kind comments. Yes, I’m sure C2 can be programmed to take a snapshot of the available bid/ask as soon as the vendor issues a signal. The signal may arrive a bit later for the subscriber (a 15-second delay), but when my average holding time per trade is over 12 days, the subscriber is barely affected. It won’t be advantageous, though, for the day trader who relies heavily on speed.



Limiting the trade size to the available liquidity is a great idea and will always give you a high RF score. However, since I’m chasing performance over perfect risk metrics and a high RF score, it should be available only as an option the vendor can choose in the system edit. BTW, your futures system is doing well. Keep it up. Cheers.



Tarek

You said “The subscribers cannot record their true experiences accurately and completely, because memory is short at best and we cannot expect everybody to be rational;”. It is more correct to assume that subscribers, like most people in this society, will answer a questionnaire truthfully, especially when they have nothing to lose by doing so, and especially when they are likely to gain by doing so. It is also safe to assume that most investors/traders have enough memory, to simply recount what typically happens when they enter a buy/sell order. The value of polling is well recognised, and polling is extensively used in every democratic, free market society, to make known what is not known, plus or minus 3% error factor in large samples. And I am not even suggesting taking a sample, but rather polling everyone in the universe of subscribers, which would have less of an error factor than taking a sample. Subscriber feedback is a more reliable way to go than using guesstimates as to what might occur, or what could occur when placing trades.

And it is not rational to assume widespread dishonesty, irrationality, and memory deficiency among system subscribers.