I've been busy working on the open-source QSForex system over the past week. I've made some useful improvements and I thought I'd share them with you in this forex trading diary update.
In particular, I've made the following changes, which will be discussed at length in this entry:
- Modification to the Position object to fix an error with how position openings and closings are handled
- Added historical data capability via tick data files through DukasCopy downloads
- Built the first version of an event-driven backtester based on this daily tick data
For those of you who are unfamiliar with QSForex and are coming to this forex diary series for the first time, I strongly suggest having a read of the following diary entries to get up to speed with the software:
- Forex Trading Diary #1 - Automated Forex Trading with the OANDA API
- Forex Trading Diary #2 - Adding a Portfolio to the OANDA Automated Trading System
- Forex Trading Diary #3 - Open Sourcing the Forex Trading System
As well as the Github page for QSForex:
Position Handling Error Fix
The first change I want to discuss is how the Position object handles buy/sell orders.
Initially I designed the Position object to be quite lean, delegating the majority of the work of calculating position prices to the Portfolio. However, this led to needless complexity in the Portfolio class, which I eventually realised would confuse new users of the software.
This would likely become especially problematic as I'm sure you would wish to eventually develop your own custom portfolio handling capability without having to worry about "boilerplate" position handling.
In addition I realised I was actually making a mistake because I had mixed the buying and selling of orders with having a long or short position. This meant that upon the close of a position the calculation of P&L was incorrect.
I've now modified the Position object to accept bid and ask prices, rather than "add" and "remove" prices, which were originally determined upstream of the Position object via the Portfolio. This means that the Position now tracks whether it is long or short upon being opened and uses the correct bid or ask price as the purchase or closing value.
I've also had to modify the unit tests to reflect the new interface. Although these modifications take some time to complete, they provide greater confidence in the results. This is especially true when we consider more sophisticated strategies.
You can see the new position.py file in its entirety below:
from decimal import Decimal, getcontext, ROUND_HALF_DOWN


class Position(object):
    def __init__(
        self, position_type, market,
        units, exposure, bid, ask
    ):
        self.position_type = position_type  # Long or short
        self.market = market
        self.units = units
        self.exposure = Decimal(str(exposure))

        # A long is opened at the ask and valued at the bid;
        # a short is opened at the bid and valued at the ask
        if self.position_type == "long":
            self.avg_price = Decimal(str(ask))
            self.cur_price = Decimal(str(bid))
        else:
            self.avg_price = Decimal(str(bid))
            self.cur_price = Decimal(str(ask))

        self.profit_base = self.calculate_profit_base(self.exposure)
        self.profit_perc = self.calculate_profit_perc(self.exposure)

    def calculate_pips(self):
        # NOTE: as written this sets an attribute on the getcontext
        # function and leaves the Decimal precision unchanged;
        # getcontext().prec = 6 would be needed to alter it
        getcontext.prec = 6
        mult = Decimal("1")
        if self.position_type == "long":
            mult = Decimal("1")
        elif self.position_type == "short":
            mult = Decimal("-1")
        return (mult * (self.cur_price - self.avg_price)).quantize(
            Decimal("0.00001"), ROUND_HALF_DOWN
        )

    def calculate_profit_base(self, exposure):
        pips = self.calculate_pips()
        return (pips * exposure / self.cur_price).quantize(
            Decimal("0.00001"), ROUND_HALF_DOWN
        )

    def calculate_profit_perc(self, exposure):
        return (self.profit_base / exposure * Decimal("100.00")).quantize(
            Decimal("0.00001"), ROUND_HALF_DOWN
        )

    def update_position_price(self, bid, ask, exposure):
        # Longs are valued at the bid, shorts at the ask
        if self.position_type == "long":
            self.cur_price = Decimal(str(bid))
        else:
            self.cur_price = Decimal(str(ask))
        self.profit_base = self.calculate_profit_base(exposure)
        self.profit_perc = self.calculate_profit_perc(exposure)
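To make the bid/ask convention concrete, here is a small standalone Python 3 sketch. It is not part of QSForex: the helper names entry_and_mark and price_move are hypothetical, and the quote is made up. It only mirrors the rule that a long is opened at the ask and valued at the bid, while a short is opened at the bid and valued at the ask.

```python
from decimal import Decimal, ROUND_HALF_DOWN

FIVE_DP = Decimal("0.00001")


def entry_and_mark(position_type, bid, ask):
    """Return (entry price, current mark price) for a newly opened position."""
    bid, ask = Decimal(str(bid)), Decimal(str(ask))
    if position_type == "long":
        return ask, bid  # Bought at the ask, could be sold back at the bid
    return bid, ask  # Sold at the bid, would be bought back at the ask


def price_move(position_type, entry, mark):
    """Signed price move in the position's favour, to five decimal places."""
    sign = Decimal("1") if position_type == "long" else Decimal("-1")
    return (sign * (mark - entry)).quantize(FIVE_DP, ROUND_HALF_DOWN)


# Hypothetical GBP/USD quote, chosen only for illustration
bid, ask = 1.50328, 1.50349

long_entry, long_mark = entry_and_mark("long", bid, ask)
short_entry, short_mark = entry_and_mark("short", bid, ask)

# Both directions start out down by the spread
long_open = price_move("long", long_entry, long_mark)
short_open = price_move("short", short_entry, short_mark)

# If the bid later rises to 1.50400 the long shows a gain
long_later = price_move("long", long_entry, Decimal("1.50400"))

print(long_open, short_open, long_later)
```

Either direction begins with a small loss equal to the spread, which is exactly the cost that is miscounted when buy/sell is confused with long/short.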
As always you can find the latest version of the full code at the Github page.
Historical Tick Data Capability
The next major task in creating a useful full trading system is to have a high-frequency backtesting capability.
An essential prerequisite involves creating a data-store for currency pair tick data. Such data can become quite large. For instance, I downloaded a day's worth of tick data for a single currency pair from DukasCopy in CSV format and it came to 3.3 MB.
One can easily see that high-frequency backtesting of 20+ currency pairs, over multiple years, with significant parameter variations, can rapidly lead to gigabytes of trading data that must be ingested.
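As a rough back-of-envelope check, the figures above can be multiplied out. The number of trading days per year and the number of years are my own assumptions, and I am assuming every pair produces about the same volume as the single file I downloaded.

```python
# Back-of-envelope estimate of raw tick data volume
mb_per_pair_per_day = 3.3      # Size of the one-day, one-pair CSV file
num_pairs = 20                 # "20+ currency pairs"
trading_days_per_year = 260    # Assumption: roughly five trading days a week
num_years = 5                  # Assumption: "multiple years" taken as five

total_mb = mb_per_pair_per_day * num_pairs * trading_days_per_year * num_years
total_gb = total_mb / 1000.0

print("Approximate raw CSV size: %.1f GB" % total_gb)
```

That is on the order of tens of gigabytes for a single pass over the data, before any parameter variations are considered.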
Such data eventually needs special handling, including the creation of an efficient fully-automated securities master database. We will discuss such a system in the future, but for now, daily CSV files will suffice for our needs.
In order to put the backtesting data and the live streaming data on the same footing, I have created an abstracted price handling class called PriceHandler. PriceHandler is an example of an abstract base class that requires any subclasses to override "pure virtual" methods.
The only mandated method is stream_to_queue, which is called via the pricing thread when the system is activated (either live trading or backtest). stream_to_queue takes price information, from a location that depends upon the particular class implementation, and then uses the .put() method of the queue to add TickEvent objects.
In this way all PriceHandler subclasses can interface with the rest of the trading system without the remaining components knowing (or caring!) how the pricing information is generated.
This gives us substantial flexibility in coupling flat-files, file-stores such as HDF5, relational databases such as PostgreSQL or even external resources such as websites, to the backtesting or live trading engine.
Here is the snippet for the PriceHandler class:
from abc import ABCMeta, abstractmethod


class PriceHandler(object):
    """
    PriceHandler is an abstract base class providing an interface for
    all subsequent (inherited) data handlers (both live and historic).

    The goal of a (derived) PriceHandler object is to output a set of
    bid/ask/timestamp "ticks" for each currency pair and place them into
    an event queue.

    This will replicate how a live strategy would function as current
    tick data would be streamed via a brokerage. Thus a historic and live
    system will be treated identically by the rest of the QSForex
    backtesting suite.
    """

    __metaclass__ = ABCMeta

    @abstractmethod
    def stream_to_queue(self):
        """
        Streams a sequence of tick data events (timestamp, bid, ask)
        tuples to the events queue.
        """
        raise NotImplementedError("Should implement stream_to_queue()")
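To show how a subclass plugs into this interface, here is a minimal standalone sketch written for Python 3 (hence ABC rather than __metaclass__). InMemoryPriceHandler and the tuple-based tick format are hypothetical stand-ins of my own; they are not part of QSForex, which uses TickEvent objects.

```python
import queue
from abc import ABC, abstractmethod


class PriceHandler(ABC):
    """Python 3 spelling of the abstract base class."""

    @abstractmethod
    def stream_to_queue(self):
        raise NotImplementedError("Should implement stream_to_queue()")


class InMemoryPriceHandler(PriceHandler):
    """Hypothetical handler that streams ticks from a list."""

    def __init__(self, ticks, events_queue):
        self.ticks = ticks
        self.events_queue = events_queue

    def stream_to_queue(self):
        for tick in self.ticks:
            self.events_queue.put(tick)


events = queue.Queue()
ticks = [
    ("GBPUSD", "t0", "1.50328", "1.50349"),  # (pair, time, bid, ask), made up
    ("GBPUSD", "t1", "1.50330", "1.50351"),
]
InMemoryPriceHandler(ticks, events).stream_to_queue()

received = []
while not events.empty():
    received.append(events.get())

# The abstract base class itself cannot be instantiated
try:
    PriceHandler()
    abstract_instantiable = True
except TypeError:
    abstract_instantiable = False

print(received, abstract_instantiable)
```

The consumer of the queue never needs to know that the ticks came from a list rather than a file or a brokerage stream.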
I've created a subclass called HistoricCSVPriceHandler, which possesses two methods.
The first is called _open_convert_csv_files and uses Pandas to open the CSV file into a DataFrame and form the Bid and Ask columns.
The second method, stream_to_queue, iterates through this DataFrame and at each iteration adds a TickEvent object to the events queue.
In addition, the current bid and ask prices are stored on the handler itself, to be queried later via the Portfolio.
Here is the listing of the HistoricCSVPriceHandler:
import os

import pandas as pd
from decimal import Decimal, ROUND_HALF_DOWN
# TickEvent must also be imported from the QSForex event module


class HistoricCSVPriceHandler(PriceHandler):
    """
    HistoricCSVPriceHandler is designed to read CSV files of
    tick data for each requested currency pair and stream those
    to the provided events queue.
    """

    def __init__(self, pairs, events_queue, csv_dir):
        """
        Initialises the historic data handler by requesting
        the location of the CSV files and a list of symbols.

        It will be assumed that all files are of the form
        'pair.csv', where "pair" is the currency pair. For
        GBP/USD the filename is GBPUSD.csv.

        pairs - The list of currency pairs to obtain.
        events_queue - The events queue to send the ticks to.
        csv_dir - Absolute directory path to the CSV files.
        """
        self.pairs = pairs
        self.events_queue = events_queue
        self.csv_dir = csv_dir
        self.cur_bid = None
        self.cur_ask = None

    def _open_convert_csv_files(self):
        """
        Opens the CSV files from the data directory, converting
        them into pandas DataFrames within a pairs dictionary.
        """
        pair_path = os.path.join(self.csv_dir, '%s.csv' % self.pairs[0])
        # header=0: the first row is a header, replaced by "names"
        self.pair = pd.io.parsers.read_csv(
            pair_path, header=0, index_col=0, parse_dates=True,
            names=("Time", "Ask", "Bid", "AskVolume", "BidVolume")
        )

    def stream_to_queue(self):
        self._open_convert_csv_files()
        for index, row in self.pair.iterrows():
            self.cur_bid = Decimal(str(row["Bid"])).quantize(
                Decimal("0.00001"), ROUND_HALF_DOWN
            )
            self.cur_ask = Decimal(str(row["Ask"])).quantize(
                Decimal("0.00001"), ROUND_HALF_DOWN
            )
            tev = TickEvent(self.pairs[0], index, row["Bid"], row["Ask"])
            self.events_queue.put(tev)
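If you want to experiment with the parsing and rounding step without installing Pandas, the following is a minimal sketch using only the Python 3 standard library. It swaps Pandas for the csv module, so it is an alternative illustration rather than the QSForex implementation. The sample rows and the timestamp format are made up; only the column order is taken from the listing above.

```python
import csv
import io
from decimal import Decimal, ROUND_HALF_DOWN

# Made-up rows in the column order used above; the timestamp
# format is an assumption and is kept as a plain string here
SAMPLE_CSV = """Time,Ask,Bid,AskVolume,BidVolume
2015-02-02 00:00:00.123,1.503491,1.503284,1.5,2.25
2015-02-02 00:00:01.456,1.503512,1.503305,1.0,1.75
"""

FIVE_DP = Decimal("0.00001")


def iter_ticks(handle):
    """Yield (time, bid, ask) with prices rounded to five decimal places."""
    for row in csv.DictReader(handle):
        bid = Decimal(row["Bid"]).quantize(FIVE_DP, ROUND_HALF_DOWN)
        ask = Decimal(row["Ask"]).quantize(FIVE_DP, ROUND_HALF_DOWN)
        yield row["Time"], bid, ask


ticks = list(iter_ticks(io.StringIO(SAMPLE_CSV)))
for tick in ticks:
    print(tick)
```

Note that the rounding mode is the second argument to quantize, not to the Decimal constructor. The second bid (1.503305) sits exactly halfway between two five-decimal values, so ROUND_HALF_DOWN takes it to 1.50330.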
Now that we have a basic historic data capability, we are in a position to create a fully event-driven backtester.
Event-Driven Backtesting Capability
As I keep reiterating on QuantStart, I am extremely keen on using backtesting environments that are as close as possible to the live deployment.
This is due to the fact that sophisticated transaction cost handling, especially at high frequency, is often the real determinant as to whether a strategy will be profitable or not.
Such high-frequency transaction cost handling can only really be simulated with the use of a multi-threaded event-driven execution engine.
While such a system is significantly more complicated than a basic vectorised P&L "research" backtester, it will capture a lot more of the true behaviour and allow us to make far better decisions when choosing strategies.
In addition, it means that we can iterate more rapidly as time goes on, because we won't have to continually make the transition from "research level" strategy to "implementation grade" strategy as they are the same thing.
The only two components that change are the price streaming class and the execution class. Everything else will be identical between the backtesting and live trading systems.
In fact, this means that the new backtest.py code is almost identical to the trading.py code that handles live or practice trading with OANDA.
All we're really changing is the import of the HistoricCSVPriceHandler and the SimulatedExecution classes instead of StreamingPriceHandler and the OANDAExecutionHandler. Everything else remains the same.
Here is the listing for backtest.py:
import copy
import Queue
import sys
import threading
import time
from decimal import Decimal, getcontext

from qsforex.execution.execution import SimulatedExecution
from qsforex.portfolio.portfolio import Portfolio
from qsforex import settings
from qsforex.strategy.strategy import TestStrategy
from qsforex.data.price import HistoricCSVPriceHandler


def trade(events, strategy, portfolio, execution, heartbeat):
    """
    Carries out an infinite while loop that polls the
    events queue and directs each event to either the
    strategy component or the execution handler. The
    loop will then pause for "heartbeat" seconds and
    continue.
    """
    while True:
        try:
            event = events.get(False)
        except Queue.Empty:
            pass
        else:
            if event is not None:
                # Handler method names assumed from the QSForex
                # repository; check them against your version
                if event.type == 'TICK':
                    strategy.calculate_signals(event)
                elif event.type == 'SIGNAL':
                    portfolio.execute_signal(event)
                elif event.type == 'ORDER':
                    execution.execute_order(event)
        time.sleep(heartbeat)


if __name__ == "__main__":
    # Set the Decimal precision to 2 significant digits (the decimal
    # context is per-thread, so this applies to the main thread only)
    getcontext().prec = 2

    heartbeat = 0.0  # Seconds to pause between polls of the queue
    events = Queue.Queue()
    equity = settings.EQUITY

    # Load the historic CSV tick data files
    pairs = ["GBPUSD"]
    csv_dir = settings.CSV_DATA_DIR
    if csv_dir is None:
        print "No historic data directory provided - backtest terminating."
        sys.exit()

    # Create the historic tick data streaming class
    prices = HistoricCSVPriceHandler(pairs, events, csv_dir)

    # Create the strategy/signal generator, passing the
    # instrument and the events queue
    strategy = TestStrategy(pairs[0], events)

    # Create the portfolio object to track trades
    portfolio = Portfolio(prices, events, equity=equity)

    # Create the simulated execution handler
    execution = SimulatedExecution()

    # Create two separate threads: One for the trading loop
    # and another for the market price streaming class
    trade_thread = threading.Thread(
        target=trade, args=(
            events, strategy, portfolio, execution, heartbeat
        )
    )
    price_thread = threading.Thread(target=prices.stream_to_queue, args=[])

    # Start both threads
    trade_thread.start()
    price_thread.start()
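Since backtest.py cannot be run without the rest of QSForex and a tick data file, here is a stripped-down Python 3 sketch of the same two-thread pattern that runs on its own. Everything in it is a stand-in: the Event tuple, the STOP sentinel (added so that the sketch terminates, unlike the infinite loop above) and the trivial "signal on every tick" logic are my own assumptions and not QSForex code.

```python
import queue
import threading
from collections import namedtuple

# Stand-in event type; QSForex has its own event classes
Event = namedtuple("Event", ["type", "payload"])
STOP = None  # Hypothetical sentinel so that the sketch terminates


def stream_prices(events, n_ticks):
    """Producer: plays the role of the price handler thread."""
    for i in range(n_ticks):
        events.put(Event("TICK", i))
    events.put(STOP)


def trade(events, handled):
    """Consumer: routes each event by type, as the trade loop does."""
    producer_done = False
    # Keep going until the producer has finished and the queue is drained
    while not (producer_done and events.empty()):
        event = events.get()  # Blocks until an event is available
        if event is STOP:
            producer_done = True
        elif event.type == "TICK":
            # A real strategy would decide here; we signal on every tick
            events.put(Event("SIGNAL", event.payload))
        elif event.type == "SIGNAL":
            events.put(Event("ORDER", event.payload))
        elif event.type == "ORDER":
            handled.append(event.payload)


events = queue.Queue()
handled = []
trade_thread = threading.Thread(target=trade, args=(events, handled))
price_thread = threading.Thread(target=stream_prices, args=(events, 5))

trade_thread.start()
price_thread.start()
price_thread.join()
trade_thread.join()

print(handled)
```

In this toy version every tick is acted upon, so the final list is the same on each run even though the interleaving of the two threads is not.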
One of the drawbacks of using a multi-threaded execution system for backtesting is that it is not deterministic.
This means that over multiple runs of the same data we will see changes, albeit small ones, across the results.
This is because we cannot guarantee that the threads will execute instructions in the same order over multiple runs of the same simulation.
For instance, when placing items onto the queue, we might get nine TickEvent objects placed onto the queue in run #1, but may get eleven in run #2.
Since the Strategy object is polling the queue for TickEvent objects, it will see different bid/ask prices across the two runs and thus will open a position at different bid/ask prices. This will lead to (small) differences in the returns.
Is this a major problem? I don't really think so. Not only is this how the live system will function anyway, but it also lets us know how sensitive our strategy is to the speed of data receipt.
For instance, if we calculate the variance of the returns across all of our simulated runs with the same data then it will give us an idea of how susceptible the strategy is to data latency.
Ideally we want a strategy that has a small variance across each of our runs. However, if it has a large variance, it means we should be very concerned about deployment.
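As a sketch of how this check might look, the snippet below compares the variance of returns for two sets of repeated runs. The return figures are entirely made up for illustration, and I have assumed the population variance (statistics.pvariance) is the measure of interest; the sample variance would serve equally well.

```python
import statistics

# Hypothetical total returns (in percent) from repeated runs of the
# same backtest over identical tick data
stable_runs = [1.20, 1.21, 1.19, 1.20]
fragile_runs = [1.20, 0.40, 2.10, -0.30]

stable_var = statistics.pvariance(stable_runs)
fragile_var = statistics.pvariance(fragile_runs)

print("stable: %.5f  fragile: %.5f" % (stable_var, fragile_var))
```

A strategy behaving like the second list would be far more sensitive to the timing of data receipt than one behaving like the first.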
We could even eliminate the problem of non-determinism entirely by simply using a single thread in our backtesting code (as with the QuantStart equities event-driven backtester). However, this has the drawback of reducing realism relative to the live system. Such are the dilemmas of high-frequency trading simulation!
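For comparison, a single-threaded loop might look something like the following Python 3 sketch. It is not the QuantStart equities backtester, just a minimal illustration with a stand-in Event tuple and made-up prices: each tick is processed to completion before the next one is read, so repeated runs give identical results.

```python
import queue
from collections import namedtuple

Event = namedtuple("Event", ["type", "payload"])  # Stand-in event type


def run_backtest(ticks):
    """Process every tick to completion before reading the next one."""
    events = queue.Queue()
    fills = []
    for tick in ticks:
        events.put(Event("TICK", tick))
        # Drain the queue fully before the next tick arrives
        while not events.empty():
            event = events.get(False)
            if event.type == "TICK":
                events.put(Event("SIGNAL", event.payload))
            elif event.type == "SIGNAL":
                events.put(Event("ORDER", event.payload))
            elif event.type == "ORDER":
                fills.append(event.payload)
    return fills


ticks = ["1.50328", "1.50330", "1.50335"]  # Made-up bid prices
first_run = run_backtest(ticks)
second_run = run_backtest(ticks)

print(first_run == second_run)
```

The price paid is that the strategy always sees every tick at the instant it arrives, which a live system cannot guarantee.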
Next Steps
Another issue that I keep bringing up is that the system is only capable of handling a base currency of GBP and a single currency pair, GBP/USD.
Now that the Position handling has been substantially modified, it will be a lot easier to extend it to handle multiple currency pairs. This is the next step.
At that point we will be able to try multi-currency-pair strategies and eventually introduce Matplotlib to graph the results.
Don't forget to check out the current version of QSForex at the Github page.