As data-driven strategies continue to take over the world of professional investing, The Wall Street Journal’s most recent look at this seismic shift highlighted the increasing pressure on traditional investors. For example, at firms like Citigroup, traders are being introduced to coding languages like Python through multi-day courses – and they’re happily spending days away from the floor to do it.
This article was originally published by TABB Forum.
Simply put, data is the future of finance, and it's hard to overstate just how much investment strategies are changing because of better access to data and the tools to derive value from it. However, data-driven strategies are only as good as the data behind them, and without properly prepared data, any analysis is likely to be flawed.
In other words, you’re wasting your time with Python – or even with analysis tools that require no coding knowledge – if you’re working with bad data.
In an earlier TABB Forum article, I boiled down years of experience cleansing and normalizing hundreds of data sets and offered five common missteps to look out for when preparing data to get value from it. Here are five more.
- Foreign Exchange Conversion. This may seem obvious, but remembering to convert any holdings to your portfolio currency is something that gets overlooked surprisingly often – especially if foreign assets represent a small percentage of a portfolio. Calculating percent returns on unconverted holdings will obscure the true returns and can significantly skew results in either direction. FX conversion should be at the top of the checklist for making sure data is usable.
- Split Adjustment. A lot of historical data sets do not account for stock splits. So when a security is trading at $44 per share and the next day it's at $22.50 – because of a split – it can look like you've lost half your value. And if you're looking at a lot of historical data over time that doesn't account for splits, price performance will be all over the place. Whenever possible, you want to use data sets that already have split adjustments built in by default. If you select the right data sets from the start, you'll never have to think about this.
- Using Price Return Vs. Total Return. In many cases, investors will choose a stock because it pays a dividend, and if you're not using a total return index, you'll be understating returns. For example, a stock that moves from $95 to $100 over the course of a year would show a roughly 5 percent return. But if that stock also paid a $4 dividend that you failed to include, you'd be understating the return by nearly half. If you're not using total return data, you're missing the full picture and automatically giving preference to companies that don't pay dividends.
- Comparing the Incompatible. Different countries and sectors follow different standards and accounting practices, so you can't just take raw reported values at face value. Many premium data vendors have ways to standardize information into a more "apples-to-apples" data set – Thomson Reuters Worldscope, for example, is one of the best options. But be careful: not all data sets are built like this, and even some data sets from other premium vendors don't account for these inconsistencies. It's still good practice to normalize your factor scores within a peer group – rank them within their region or industry to account for major differences between sectors.
- Having a Fallback Data Source. A lot of the best practices for dealing with financial data are geared toward finding and using the ideal data source, but it's not always that easy. For example, a total return index is always preferable because it captures the full return, but some data sources have gaps in coverage: data for a certain country, sector or set of historical years might not be available. In these cases, it's important to have a fallback where you use the total return data that is available and supplement it with price returns. This fallback ensures a more robust set of analytics, and even though it's not perfect, it's still better than leaving major gaps in an analysis.
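To see how skipping the FX conversion distorts results, here is a minimal Python sketch of a two-holding portfolio, one leg in USD and one in EUR. All position values and EURUSD rates are illustrative, not real market data:

```python
# Illustrative portfolio return with and without FX conversion.
# Position sizes and EURUSD rates below are assumptions for the example.

def portfolio_return(holdings):
    """holdings: list of (start_value, end_value) pairs in ONE base currency."""
    start = sum(v0 for v0, _ in holdings)
    end = sum(v1 for _, v1 in holdings)
    return (end - start) / start

eur_start, eur_end = 10_000, 10_500   # EUR-denominated position values
fx_start, fx_end = 1.10, 1.05         # assumed EURUSD rates at start/end

usd_leg = (50_000, 52_000)                                  # already in USD
eur_leg_converted = (eur_start * fx_start, eur_end * fx_end)  # correct
eur_leg_naive = (eur_start, eur_end)                          # conversion forgotten

print(f"{portfolio_return([usd_leg, eur_leg_converted]):.2%}")  # ~3.32%
print(f"{portfolio_return([usd_leg, eur_leg_naive]):.2%}")      # ~4.17%
```

Here the falling euro means the naive figure overstates the true USD return; with a rising euro the skew would run the other way, which is why the bullet says results can be pushed in either direction.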
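The gap between price return and total return in the $95-to-$100 example works out as follows, using a simple non-reinvested total return (dividend cash added to the price gain):

```python
# Price return vs. simple total return, using the numbers from the example above.

def price_return(p0, p1):
    """Return from price movement only."""
    return (p1 - p0) / p0

def total_return(p0, p1, dividends=0.0):
    """Simple (non-reinvested) total return: cash distributions added to the gain."""
    return (p1 - p0 + dividends) / p0

print(f"{price_return(95, 100):.2%}")      # 5.26%
print(f"{total_return(95, 100, 4):.2%}")   # 9.47%
```

Ignoring the $4 dividend here hides roughly 4 percentage points of return, which is why screening on price return alone tilts a model away from dividend payers.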
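Normalizing factor scores within a peer group, as suggested above, can be as simple as a percentile rank computed per sector. The tickers, sectors and P/E values below are made up for illustration:

```python
# Percentile-rank a factor score within each peer group (illustrative data).
from collections import defaultdict

def rank_within_group(records, group_key, score_key):
    """Attach a 'pct_rank' in (0, 1] to each record, ranked ascending
    by score_key within its group_key peer group. Mutates and returns records."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    for members in groups.values():
        members.sort(key=lambda r: r[score_key])
        n = len(members)
        for i, r in enumerate(members):
            r["pct_rank"] = (i + 1) / n
    return records

stocks = [
    {"ticker": "A", "sector": "Tech",   "pe": 30.0},
    {"ticker": "B", "sector": "Tech",   "pe": 45.0},
    {"ticker": "C", "sector": "Energy", "pe": 8.0},
    {"ticker": "D", "sector": "Energy", "pe": 12.0},
]
rank_within_group(stocks, "sector", "pe")
```

On raw P/E, both tech names look expensive next to the energy names; within their own sectors, "A" and "C" each rank as the cheaper half of their peer group, which is the apples-to-apples comparison the bullet is after.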
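The fallback between total return and price return data can also be made explicit in code, so every number carries a tag saying which source it came from. The function name and data here are an illustrative sketch, not a reference implementation:

```python
# Prefer total-return data; fall back to price return where coverage has gaps.
# Tickers and return values are made up for the example.

def best_available_return(security, total_returns, price_returns):
    """Return (value, source) so that any fallback use is auditable later."""
    if total_returns.get(security) is not None:
        return total_returns[security], "total_return"
    if security in price_returns:
        return price_returns[security], "price_return_fallback"
    raise KeyError(f"no return data for {security}")

tr = {"AAA": 0.095, "BBB": None}   # gap in total-return coverage for BBB
pr = {"AAA": 0.052, "BBB": 0.031}

print(best_available_return("AAA", tr, pr))  # (0.095, 'total_return')
print(best_available_return("BBB", tr, pr))  # (0.031, 'price_return_fallback')
```

Tagging the source lets you report how much of an analysis leaned on the fallback, which keeps the "not perfect, but better than gaps" trade-off visible.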
The industry is in the middle of a difficult transition where traditional investors are trying to learn how to code, and coders are becoming licensed investors. But it won't always be this way: as new tools make it easier for anyone to work with data – without knowledge of scripting languages like Python or R – a new normal will emerge for investors at all levels of technical skill.
While tools will get (and are getting) easier, a good foundation and an understanding of the data behind them will make you that much more effective.