Battle of the Back-Testers
Exploring the Art and Science of Back-testing with Jason Strimpel and Marsten Parker
Allow me to share a few thoughts that came up as we brought together two exceptional minds in the trading technology space to talk about their back-testing applications. In the blue corner, representing Python: Jason Strimpel, an experienced quantitative risk manager, trader and technology leader. In the red corner, representing his own application (RealTest): Marsten Parker, a legendary systematic trader and bona fide Market Wizard. The discussion dives deep into the nuances of back-testing, highlighting the importance of understanding the underlying technical mechanisms and the diverse approaches to creating robust trading strategies.
Meet the Experts:
Jason Strimpel has a rich background spanning approximately 20 years in trading, quantitative risk management, and technology. Currently working in Gen AI at Amazon Web Services, Jason has held various pivotal roles, including work spinning up the data and analytics group at Rio Tinto, and building a risk and trading system for them. He’s a passionate educator, sharing his knowledge through PyQuant News and teaching courses on Python for quantitative finance. For a discount on those, visit the tools page on our website!
In the show, Jason talks about the challenges and rewards of building a back-testing engine using Python, discussing the importance of being familiar with the assumptions and limitations of your tools. He highlights how Zipline Reloaded, an open-source back-testing framework, can be a powerful resource despite some of its limitations, such as handling multi-asset back-tests and data bundling issues.
Marsten Parker is a self-taught programmer and trader with over 20 years of trading experience. Initially trained as a classical violinist, Marsten’s entrepreneurial journey took him from the concert halls to the trading floors, where he developed a systematic approach to trading that has earned him recognition as a Market Wizard in Jack Schwager’s ‘Unknown Market Wizards’ book. He’s known for his meticulous back-testing and flexible trading strategies, which have enabled him to adapt and thrive in various market conditions.
Marsten shares his journey from being dissatisfied with existing back-testing tools to developing his own, called RealTest. He discusses how RealTest, built in C, allows for both vector-based and event-based back-testing, offering a lot of speed without sacrificing precision. Marsten’s focus has always been on creating a tool that is intuitive and fast, enabling traders to iterate quickly and refine their strategies effectively.
A Quick Introduction to Back-Testing
In the world of systematic trading, back-testing plays a critical role in validating the effectiveness of a trading strategy before it's deployed in the real market. Back-testing involves applying your trading algorithm to historical market data to simulate how it would have performed in the past. This practice is invaluable because it allows traders to evaluate the strategy's potential profitability, risk management, and overall robustness. However, the success of back-testing hinges on both the quality of the software and data used and the rigor of the back-testing process itself.
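To make the idea concrete, here is a minimal sketch of a back-test loop in plain Python. It is not how RealTest or Zipline Reloaded work internally; the moving-average rule, the function names (`sma`, `backtest_sma_cross`) and the synthetic random-walk prices are all assumptions chosen for illustration. The one detail worth noticing is that the position decided on one bar is only applied to the next bar's return, so the simulation never uses information it could not have had at the time.

```python
import random


def sma(values, window):
    """Simple moving average; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def backtest_sma_cross(prices, fast=10, slow=30):
    """Long when the fast SMA is above the slow SMA, otherwise flat.

    Returns the equity curve as a list starting at 1.0.
    Costs are ignored in this toy version.
    """
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    equity = [1.0]
    position = 0  # position held going into the current bar
    for t in range(1, len(prices)):
        bar_return = prices[t] / prices[t - 1] - 1.0
        equity.append(equity[-1] * (1.0 + position * bar_return))
        # Decide the position for the NEXT bar using data up to bar t only.
        if fast_ma[t] is not None and slow_ma[t] is not None:
            position = 1 if fast_ma[t] > slow_ma[t] else 0
    return equity


# Synthetic random-walk prices, for illustration only (not market data).
random.seed(42)
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] * (1.0 + random.gauss(0.0003, 0.01)))

equity = backtest_sma_cross(prices)
print(f"Final equity multiple: {equity[-1]:.3f}")
```

A real engine adds a great deal on top of this (order handling, costs, corporate actions, position sizing), which is exactly why the quality of the tool matters.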
Key Technical Aspects of a Back-Testing Application
Even the beginner systematic trader needs the best back-testing application they can get their hands on. There’s little point beginning the journey of building strategies based on actual historic data if the results you get are not 100% realistic. You could waste months or years believing that you were finding edges, when you weren’t. Good data is equally critical. For longer-term and more basic strategies, perhaps slightly poorer quality data would get you most of the way there, but why even take that risk given the low cost of high-quality data that is much more fit for purpose? Don’t begin building your house on sand.
Finding edges in an efficient market isn’t easy. You can’t bring a blunt scalpel to brain surgery; there’s not enough room for error. Fortunately, good tools exist, and they range from free to cheap to expensive, so there are plenty of high-quality options for you these days.
Having the right machinery isn’t going to make you a master craftsman, but if you are trying to learn to hammer in nails with a toothbrush you’re going to get frustrated as well as learn bad habits. Get a good value-for-money hammer from the outset.
Choose your Weapon
So, there are a number of tools on the market: RealTest; StrategyQuant; AmiBroker; TradeStation; MultiCharts; NinjaTrader; Zorro Project and MetaTrader 5 (better than MetaTrader 4) come to mind. A few others saved in my browser, for what it’s worth: Trading Blox; RightEdge, which recently passed away (the domain seems to have been purchased by a porn site - don’t look!); Investor RT; Numerai; Wealth Lab; DeltaRay and ORATS for options; QuantInsti; Price Action Lab; Build Alpha and more.
In the Python arena there is QuantConnect (C# or Python can be used) and QuantRocket (both of these include access to loads of data). But the gracious donation to the community by the (now also passed) Quantopian is the open-source Python back-tester Zipline. Jason talks about the Zipline Reloaded fork as the best back-testing engine in the Python marketplace. Not bad for free! To back-test in Python proficiently you probably also need vectorbt and a handful of libraries such as pandas, NumPy, SciPy, QuantLib and so on.
DIY or Off the Shelf?
My two cents is that if you don’t already code, you may want to start with an easy-to-use but extremely high-quality application like RealTest. You can confirm that systematic trading is your thing before investing further time and effort in learning to code. This might also avoid a common pitfall: getting so engrossed in the joy of creating code that you lose sight of the goal of making money. Without having a dig at developers, I’ll subtly suggest that this isn’t an uncommon problem. :) If you already code, or know that Python is essential for your career progression in finance, or want to learn to code anyway, then diving deep into Python is a really smart choice.
Having access to both possibilities also has real advantages. Imagine rapidly prototyping ideas or researching a particular edge by running thousands of simulations in minutes, while having the skills on hand to develop custom solutions and bespoke functionality whenever you needed it. Python might also come in handy for automating the execution of your trades with your broker. My Order Management System which connects to the Interactive Brokers API is built in Python.
Ultimately, if you can code yourself, you get infinite flexibility. Your ability to check under the hood and see exactly what’s going on is going to serve you well. If you’ve got a unique idea, you’re not reliant on an off-the-shelf application handling your specific requirement. You’re also in a better position to develop and retain custom intellectual property if you’re building everything yourself. You can decide what’s important to you and focus on that. My feeling is that if you take this path, you have to be prepared for a million and one little ‘tricks and traps’ that the newer trader might not consider and that need to be accounted for. These could take you a while to discover. Chances are that with an established piece of software, with a large user group and regular updates, almost every one of these nuances is going to have been dealt with.
Then Comes the Research
Once you have the tools, there is still a learning curve ahead before you get the keys to the kingdom. When will you first learn about historic changes in the way exchanges delivered data? When will you figure out that your strategy, which is dependent on low commissions, would have been untradable a decade ago when commissions were much higher? When will bitter experience teach you that you need to be prepared for the risk you haven’t even seen yet, and what that might look like? Something as simple as slippage can kill many short-term strategies. How about the concept of ‘warehousing risk’ that negative skew strategies like mean-reversion might bring, storing it up until the floodgates blow it open? Can all decisions be based on the data, or should logic override? When doing our research, is there enough data for a machine to learn on, or do we need to logically design a framework before refining it on real-world conditions? Then of course, even if you get all that right, there’s the big one: how do you construct a process and a discipline to avoid over-fitting?
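The point about commissions and slippage is easy to see with a little arithmetic. The sketch below uses made-up numbers (a 10 cent move on a 50 dollar stock, 1 cent of slippage per share, and two hypothetical commission levels) and a helper name, `net_trade_return`, that is purely illustrative. It assumes costs are paid per share on both entry and exit.

```python
def net_trade_return(entry, exit_, shares, commission_per_share, slippage_per_share):
    """Return on capital for one long round trip after costs.

    Commission and slippage are both charged twice: once on entry, once on exit.
    """
    gross_pnl = (exit_ - entry) * shares
    costs = 2 * shares * (commission_per_share + slippage_per_share)
    return (gross_pnl - costs) / (entry * shares)


# Hypothetical short-term trade: buy 1,000 shares at 50.00, sell at 50.10.
for label, commission in [("low commission", 0.005), ("high commission", 0.05)]:
    r = net_trade_return(50.00, 50.10, 1000, commission, slippage_per_share=0.01)
    print(f"{label}: net return per trade = {r:.4%}")
```

With the low commission the trade keeps most of its small edge; with the high commission the same trade loses money. A strategy that makes many small trades is far more exposed to this than one that makes a few large ones.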
The Art of Applying the Science
The commercially savvy trader has a secret weapon: he or she knows that the objective, in the end, is to make money. Once all the tools, data and skills are in place they can get to work on the strategy construction with a particular mental construct; one that will prioritise chances of success in the future (chances of building something robust out of sample) and that might mean sacrificing the ego on the altar of the holy grail system. You’ll have to think outside the box, prioritise survival, stand on the shoulders of giants and apply the time-tested principles of good strategy development. By this I mean things like diversification (of markets and of models), good risk management, small bet sizes, and so on. You’ll also need to train your eyes to spot the model likely to work best in the future, not the one that worked best in the past. That might take some lateral thinking, such as a decision to take a ‘design-led’ or ‘logic-driven’ approach rather than a machine-learning approach while you are starting out.
Key to all of this, in my humble experience, is to develop a process for research and development that forces you not to over-fit. It is like building a quantitative trading strategy: you design rules to ensure your method of research is sound, and then enforce them with discipline. Marsten mentioned his iterative approach, where he likes to be in a ‘flow state’ (it pays to have a fast back-tester), and Jason highlighted statistical validation and testing of known factors (such as the crack spread).
So We Back Test
I think a lot of newer traders view back-testing a little bit like a slot machine: if they press the button enough times, at some point they may get lucky and find the holy grail. The danger in this approach is that the ideal model for the past is inevitably not going to be the ideal model for the future. Just as a small example, does your data include the Great Depression? Black Monday in 1987? Even if it included all human history, would it rule out something different happening in the future? It’s not likely. So back-testing is really just a research function, and viewed like that you can gather evidence (as Jason mentioned) and, once you piece together enough evidence to be confident in your findings, you can progress to live trading. Even then, logic-based decisions such as how much to risk on a new strategy will override any quantitative work you’ve done, and this is why I think the art is as important as the science.
In a future post I’d love to deep-dive the topic of curve-fitting, and how to avoid it, but to wrap this up I’ll remind the newer traders of some of the analysis that can be done in good software to evidence you have found a robust edge in the market.
Out of Sample Testing: Reserve data you have not trained the model on. Do not touch it. Never cheat. You get to run your model on this data once. When you do it more than once, you’ve just made the data in-sample and missed the whole point.
Cross-Market Validation: Perhaps you build one strategy for gold and a different strategy for silver, because you believe they ‘behave differently’, and that might be true in some cases. On the other hand, if a system works on more than one product or market, it is generating evidence that it is universally robust.
Stress-Testing: Whether by constructing your own data or via other methods, you can include ‘what if’ scenario testing to check how the model works when passing through an event it has never seen before. In my days as a Risk Analyst, I loved constructing hypothetical scenarios to confirm whether an event could pose a ‘risk of ruin’ to the bank/exchange/fund.
Map to Market: The idea here is to answer the questions: ‘Does my model make money when it is supposed to?’ as well as ‘Does my model lose money when it is supposed to?’. Hint: there are times when it should lose money! This is a logic-based approach and confirms profits weren’t being generated by noise.
Parameter Sensitivity Analysis: If your model works with a 200-period moving average and dies with a 210-period moving average, you’re doomed. But see how far you can take this analysis. You can drive research by testing a myriad of permutations to confirm a hypothesis, rather than ‘spit out a model’. This doesn’t just go for indicators: what about stress testing commissions or slippage?
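A parameter sweep of this kind might look something like the sketch below. The toy momentum rule, the function names (`strategy_return`, `sweep`), the cost levels and the synthetic prices are all assumptions for illustration; the point is only the shape of the exercise. You run the same rule across a neighbourhood of parameter values and across several cost assumptions, then look at how much the result moves.

```python
import random


def strategy_return(prices, lookback, cost_per_trade=0.0):
    """Total return of a toy rule: hold the next bar if price is above
    its level `lookback` bars ago, otherwise stay flat."""
    equity, position = 1.0, 0
    for t in range(1, len(prices)):
        equity *= 1.0 + position * (prices[t] / prices[t - 1] - 1.0)
        if t >= lookback:
            new_position = 1 if prices[t] > prices[t - lookback] else 0
            if new_position != position:
                equity *= 1.0 - cost_per_trade  # pay a cost on every position change
            position = new_position
    return equity - 1.0


def sweep(prices, lookbacks, cost_per_trade=0.0):
    """Run the same rule for every lookback and collect the results."""
    return {lb: strategy_return(prices, lb, cost_per_trade) for lb in lookbacks}


# Synthetic random-walk prices, for illustration only (not market data).
random.seed(7)
prices = [100.0]
for _ in range(1000):
    prices.append(prices[-1] * (1.0 + random.gauss(0.0002, 0.01)))

lookbacks = range(150, 260, 10)
for cost in (0.0, 0.001, 0.002):
    results = sweep(prices, lookbacks, cost)
    spread = max(results.values()) - min(results.values())
    print(f"cost per trade {cost:.3f}: spread across lookbacks = {spread:.2%}")
    for lb, r in results.items():
        print(f"  lookback {lb:3d}: total return {r:+.2%}")
```

A broad plateau of similar results across neighbouring values is reassuring; a single sharp peak surrounded by poor results is a warning sign.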
Monte Carlo Simulations: Valid for some models more than others, this involves running the strategy through a large number of simulated market scenarios, with different sequences of returns, to assess how it performs under various market conditions.
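One simple variant of this idea, sketched below, is to reshuffle the order of a strategy's historical trade returns many times and look at the range of drawdowns that results. This is only one flavour of Monte Carlo analysis, and it assumes trades are independent of each other, which is not true of every strategy. The trade returns and the function names (`max_drawdown`, `monte_carlo_drawdowns`) are hypothetical.

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve,
    expressed as a positive fraction (0.25 means a 25% drawdown)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def monte_carlo_drawdowns(trade_returns, n_paths=1000, seed=0):
    """Shuffle the trade order n_paths times; return the sorted drawdowns."""
    rng = random.Random(seed)
    drawdowns = []
    for _ in range(n_paths):
        path = trade_returns[:]  # copy so the original order is preserved
        rng.shuffle(path)
        drawdowns.append(max_drawdown(path))
    return sorted(drawdowns)


# Hypothetical per-trade returns, for illustration only.
trade_returns = [0.02, -0.01, 0.015, -0.03, 0.04, -0.02, 0.01, -0.015, 0.03, -0.01] * 5

dds = monte_carlo_drawdowns(trade_returns)
print(f"Drawdown in historical order: {max_drawdown(trade_returns):.2%}")
print(f"Median simulated drawdown:    {dds[len(dds) // 2]:.2%}")
print(f"95th percentile drawdown:     {dds[int(0.95 * len(dds))]:.2%}")
```

Every shuffled path ends at the same final equity, because the same returns are compounded in a different order. What changes is the path taken to get there, and therefore how deep the worst drawdown might have been.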
Walk-Forward Optimisation: This method involves optimising the strategy on a rolling basis, where a portion of the data is used to optimise parameters, and the next portion is used to test the strategy. The process is repeated across the entire dataset.
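The rolling mechanics can be sketched in a few lines. In the example below, the window sizes, the candidate lookbacks, the toy `momentum_score` metric and the synthetic returns are all assumptions for illustration; in practice the scoring function would be a full back-test run.

```python
import random


def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_start, train_end, test_end) index triples.
    Each test window starts exactly where its training window ends."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train_end = start + train_size
        yield start, train_end, train_end + test_size
        start += test_size  # roll forward by one test window


def momentum_score(returns, lookback):
    """Toy metric: sum of returns on bars that follow a positive
    trailing `lookback`-bar sum. The first `lookback` bars of the
    segment are used as warm-up and are not traded."""
    total = 0.0
    for t in range(lookback, len(returns)):
        if sum(returns[t - lookback:t]) > 0:
            total += returns[t]
    return total


def walk_forward(data, train_size, test_size, candidates, score):
    """Pick the best candidate on each training window, then record
    how that choice scores on the following unseen test window."""
    results = []
    for start, train_end, test_end in walk_forward_windows(len(data), train_size, test_size):
        train, test = data[start:train_end], data[train_end:test_end]
        best = max(candidates, key=lambda p: score(train, p))
        results.append((best, score(test, best)))
    return results


# Synthetic returns, for illustration only (not market data).
random.seed(1)
returns = [random.gauss(0.0005, 0.01) for _ in range(600)]

results = walk_forward(returns, train_size=200, test_size=100,
                       candidates=[5, 10, 20, 40], score=momentum_score)
for i, (param, oos) in enumerate(results, start=1):
    print(f"window {i}: chosen lookback = {param:2d}, out-of-sample score = {oos:+.4f}")
```

Only the out-of-sample scores count as evidence. If the chosen parameter jumps around wildly from window to window, that in itself tells you something about how stable the edge is.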
Statistical Validation: Optimising on a single metric, such as your compound annual growth rate divided by your worst drawdown, may miss crucial evidence. How many trades does your strategy generate? What are the trade-level statistics? Is your sample size large enough to draw statistically relevant conclusions?
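As a rough sketch of what looking beyond a single metric might involve, the code below computes the growth-to-drawdown ratio alongside some trade-level statistics, including a one-sample t-statistic of the mean trade return against zero. The numbers are hypothetical, the function names are illustrative, and the t-statistic assumes trades are roughly independent, which should not be taken for granted.

```python
import math
import statistics


def cagr(equity, years):
    """Compound annual growth rate of an equity curve spanning `years`."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Worst peak-to-trough fall of the equity curve, as a positive fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst


def trade_stats(trade_returns):
    """Trade count, win rate, mean return, and t-statistic of the mean vs zero."""
    n = len(trade_returns)
    mean = statistics.mean(trade_returns)
    stdev = statistics.stdev(trade_returns)  # sample standard deviation
    return {
        "trades": n,
        "win_rate": sum(r > 0 for r in trade_returns) / n,
        "mean_return": mean,
        "t_stat": mean / (stdev / math.sqrt(n)),
    }


# Hypothetical year-end equity values and trade returns, for illustration only.
equity = [100.0, 110.0, 99.0, 121.0]
trades = [0.02, -0.01, 0.03, -0.01]

ratio = cagr(equity, years=3) / max_drawdown(equity)
print(f"CAGR / max drawdown: {ratio:.2f}")
for name, value in trade_stats(trades).items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

With only four trades the t-statistic is far too low to conclude anything, however good the headline ratio looks. That is the sample-size question in practice.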
Design & Logic Based: I already mentioned asking the question: ‘does my strategy generate returns as expected’. This type of analysis would also fit into a category of ‘common sense tests’. You have to expect very different things from a mean reversion system than from a trend following system. A model should have an objective and that will determine the outputs you are expecting (a hedge strategy for example might lose money frequently, but save your bacon when the excrement hits the propelling device). If you already know a strategy ‘works’ because you traded it in a discretionary manner, or it’s public information, then you may seek to replicate and enhance said strategy, rather than build one with a machine-learning approach. This can give you confidence that you are not wasting your time. External evidence is part of the evidence.
Simplify: Complex models with many parameters are more likely to fit the noise than the signal. You are also likely to have added in just the right parameter to avoid the worst drawdowns historically and make you feel better, but will it work in future? Ask yourself what it would take for your assumption to break down. OK, so how could you prepare for that? Simpler models are also easier to diagnose if concerns come up.
Ultimately, in my mind, you want to create multiple models that work together in a diversified portfolio. For more on that, listen out for the next show with Laurens Bensdorp, who trades 55 of them!
Good luck and stay curious!