On Data, Unicorns and Granola Bars
CircleUp · December 5, 2019 · 11 min read

A quant’s journey in the private markets

After a career spent applying systematic, data-driven techniques to investing in the public markets, I am now on a path to do the same in the private markets. The time has come for quantitative investing to begin, in earnest, the disruption of the private markets.

The state of affairs is analogous to the public equity markets circa 1981: large quantities of highly relevant data are available to those who have the wherewithal to collect and organize it; the computational power required to do the work is readily available; and many areas of the private markets are structurally inefficient, fragmented and have operated largely unchanged for decades. Our goal at CircleUp is to bring many of the methodologies that have proven so useful in the public markets for stocks, currencies and interest rates to the private credit and private equity markets (1).

However, just as it was very difficult to transform readily available income statement and balance sheet information into machine-readable form for public companies in 1981, it is today a very difficult data science problem to aggregate, map and organize the vast quantities of unstructured data that are required to operate in the private markets. Having done this, we believe we are in a position to add significant value to our clients’ portfolios.

An intentionally over-simplified history of quantitative investing

In the early 1980s it became apparent to a relatively small subset of investors that the combination of data about companies and (relatively) cheap access to computational power would confer a significant advantage to those who were able to harness it.

Within the equity markets there were two strains of quantitative investors, differentiated by the types of data, the time horizon and trading strategies on which they focused. The ‘statistical’ quants collected data pertaining to daily trading activity in the market (e.g., prices, volumes and bid/ask spreads). The ‘fundamental’ quants collected data from the quarterly and annual financial statements of companies (e.g., revenues, earnings and debt). In both cases the goal was to have a complete set of data for each and every company in the market so that data driven comparisons between companies could be made across the broadest opportunity set possible. Today that is a maximum of about 5000 reasonably tradable stocks globally.

Quants needed this data at each point in time, typically daily or monthly, going as far back in history as possible so that they could run experiments (a.k.a. backtests) to identify patterns in the data that were indicative of profitable investments. If the future were to look anything like the past, then exploiting these patterns would form the basis of an investment strategy.

The statistical quants determined that pairs of companies that shared the same underlying economic drivers (e.g., Coke/Pepsi or Ford/GM) should have stock prices that closely tracked each other day-to-day and week-to-week. When too large a gap opened up between such a pair of stocks, a profitable trade was to be long (bet on) the underperforming stock and short (bet against) the outperforming stock. This so-called ‘pairs trading’ strategy, run across every conceivable pair, formed the basis of what became known as statistical arbitrage.
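To make the mechanics concrete, here is a minimal sketch of a pairs-trading signal. It is an illustration under stated assumptions, not a description of how any particular firm traded: the function name `pairs_signal`, the 20-day lookback and the 2-standard-deviation entry threshold are all hypothetical choices, and the "gap" is measured as the z-score of the log price ratio.

```python
import math
from statistics import mean, stdev

def pairs_signal(prices_a, prices_b, lookback=20, entry_z=2.0):
    """Return a trade signal for the latest day of two aligned price series.

    Assumption: the 'gap' between the pair is the log price ratio, compared
    with its own mean and standard deviation over the previous `lookback` days.
    """
    spread = [math.log(a / b) for a, b in zip(prices_a, prices_b)]
    if len(spread) < lookback + 1:
        raise ValueError("need at least lookback + 1 observations")

    history = spread[-lookback - 1:-1]   # the window before the latest day
    latest = spread[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return "flat"                    # no variation, so no meaningful z-score

    z = (latest - mu) / sigma
    if z > entry_z:                      # A has outperformed B by an unusual amount
        return "long_b_short_a"
    if z < -entry_z:                     # A has underperformed B by an unusual amount
        return "long_a_short_b"
    return "flat"
```

A real strategy would also need exit rules, transaction costs and a test that the pair actually moves together over time; this sketch only shows the "bet on the laggard, bet against the leader" step.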

Likewise, the fundamental quants realized that systematically buying companies that were cheaper than their peers on price-to-earnings ratios and, at the same time, at least as profitable as those peers was a reliable way to outperform the broad market. This was the genesis of what eventually became known as factor investing, the ‘factors’ here being value (price-to-earnings) and profitability.
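A toy version of that screen might look like the sketch below. The assumptions are mine: the `Company` record and `screen` function are hypothetical, the "peers" are simply every company in the list, "cheaper" means a price-to-earnings ratio below the peer median, and return on equity stands in for profitability.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Company:
    name: str
    price: float   # share price
    eps: float     # earnings per share
    roe: float     # return on equity, used here as the profitability measure

def screen(companies):
    """Names of companies cheaper than the peer median P/E and at least as
    profitable as the peer median ROE."""
    # P/E is not meaningful for zero or negative earnings, so drop those first.
    eligible = [c for c in companies if c.eps > 0]
    pe = {c.name: c.price / c.eps for c in eligible}
    median_pe = median(pe.values())
    median_roe = median(c.roe for c in eligible)
    return [c.name for c in eligible
            if pe[c.name] < median_pe and c.roe >= median_roe]
```

Production factor models typically rank within industries and blend several measures of each factor, but the core idea is this kind of cross-sectional comparison repeated at every rebalance date.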

Nearly forty years later, the top ranks of global asset managers are filled with many of the firms who pioneered the use of these techniques. Further, many of the same principles have spread to markets for interest rates, currencies, commodities and corporate bonds. To say that data driven, systematic investing has revolutionized the public markets is an understatement. Today quants make up about ⅓ of stock-trading volume.

What makes a quantitative approach feasible?

While it is tempting to think that an abundance of data and computing power is sufficient to implement a quantitative investment strategy, these are only necessary conditions. In order to make a quantitative equity strategy viable, the following need to be satisfied (2):

  1. A large number of companies to select from.
  2. Large quantities of cross sectionally comparable data about the companies.
  3. The data needs to contain information that is correlated with future outcomes.
  4. The data cannot be readily available and usable for the average market participant.

Let’s assume that all four of these criteria are satisfied. How then does a quant armed with their data and algorithms generate returns that are superior to the returns of the other market participants? First, quants do not win because their models tell them to put all of their money into Apple just prior to the introduction of the iPod. Rather, quants generate sustained outperformance because the information they have provides a small statistical edge. Each stock that the models choose is only slightly more likely than pure chance to beat the market. To efficiently take advantage of this edge, quants typically hold a portfolio made up of hundreds, or even thousands, of stocks.
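The arithmetic behind "small edge, many stocks" can be sketched in a few lines. The example below assumes, unrealistically, that every pick is an independent coin flip with the same hit rate; the function name `prob_majority_wins` and the 52% hit rate are illustrative, not figures from any real strategy. It computes the exact binomial probability that more than half of the picks beat the market.

```python
from math import exp, lgamma, log

def prob_majority_wins(n_stocks, hit_rate):
    """Probability that more than half of n independent picks beat the market.

    Assumes 0 < hit_rate < 1 and independent picks. The binomial terms are
    computed in log space so that large portfolios do not overflow.
    """
    def log_pmf(k):
        log_choose = lgamma(n_stocks + 1) - lgamma(k + 1) - lgamma(n_stocks - k + 1)
        return log_choose + k * log(hit_rate) + (n_stocks - k) * log(1 - hit_rate)

    first_majority = n_stocks // 2 + 1
    return sum(exp(log_pmf(k)) for k in range(first_majority, n_stocks + 1))

# With one stock the answer is just the hit rate itself; spreading the same
# edge across more independent picks makes a winning majority more likely.
for n in (1, 101, 1001):
    print(n, round(prob_majority_wins(n, 0.52), 3))
```

Real stock picks are correlated, so the effective number of independent bets is smaller than the number of holdings, but the direction of the effect is the point: a small edge becomes useful only when it is applied many times.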

The private markets, in contrast, rarely offer data containing meaningful information with which to compare the future prospects of companies across a large and investable universe. It is by definition very hard to obtain data about private companies:

  1. There are no regulations that require a private firm to disclose significant data about its business;
  2. There are very few industries in which private companies are incentivized by market forces to make data about themselves widely available;
  3. When data is available about private companies it rarely covers a large number of contemporaneously, much less historically, comparable firms.

As promising as it sounds to take data driven investing into the private markets, there are a number of meaningful obstacles that need to be overcome.

Quants don’t hunt unicorns…

Consider tech VC: a very popular and increasingly crowded corner of the private markets. To apply a data-driven approach, an investor would need robust data, which simply isn’t readily available. Take cyber-security, for example. If a start-up is based on an innovative malware detection technique, it is not likely to broadcast the fine points of its algorithm to the world, and the terms on which it sells its wares are often kept secret so as not to influence negotiations with future clients. This does not make for a data-rich environment. It is exceedingly difficult, using only publicly accessible data, to compare one cyber-security firm to the next, much less evaluate cyber-security firms versus other categories of software.

More problematic is that the object of the game in tech VC is most often to identify the next winner-take-all ‘unicorn’. If a quant were to take on this challenge she would need to ask herself the following question: would all of the data available in 2006 about MySpace, Friendster, Facebook and Orkut have been sufficient for a statistical model to predict the emergence and eventual dominance of Facebook in social networks? The answer is, definitely, “of course not”. This is not how quant investing works. Statistical models, almost by definition, cannot forecast outliers, while tech VC returns depend on the outliers (6% of deals make up 60% of returns). Said differently, quant models do not excel at predicting something that has never happened before. And yet, much of the money in tech VC was made by investors successfully identifying the emergence of ride sharing, social media and internet search (3).

So, where can quantitative techniques be expected to add significant value in private markets? We need to identify market segments that satisfy all of the criteria mentioned above and, in addition, do not require our models to forecast the creation of something entirely new.

…But, quants do eat granola bars

There is at least one sector that satisfies all of the criteria for quantitative investing to have a chance in the private markets: consumer, and specifically consumer packaged goods (CPG). This encompasses everything from beauty and personal care products to pet food and, yes, granola bars. The consumer industry in North America is a massive market and, unlike technology or industrials, the companies in this sector want consumers to know everything about their brand and their products, creating a data rich environment. They actively promote product features, pricing, ingredients, and where they are sold. Retailers want consumers to know which goods they sell and at what price. End consumers engage with these brands online and continuously generate mountains of text data. Furthermore, a multitude of third party data collectors are focused on this sector. In summary, there is an abundance of relevant information available about every product imaginable.

Another important feature of CPG is that there are hundreds, if not thousands, of competing brands in most product categories — think cosmetics or, again, granola bars. The basic business model for producing, distributing and selling these types of low-cost, high repeat purchase items is neither complicated nor does it vary greatly from one brand to the next or even one category to the next. The upshot is that the available data are very comparable: cross sectionally, we can assess the merits of sunscreen A versus sunscreen B on the shelf at CVS today. We can also make comparisons across time: how does a new sports/energy drink stack up versus successful brands like Vitamin Water before it was eventually sold to Coca-Cola for $4.1B?

To revisit the criteria I led with:

  1. A large number of companies to select from: There are 500,000 consumer goods companies and retailers in North America.
  2. Large quantities of cross sectionally comparable data about the companies: Distribution, brand and category data are available for hundreds of thousands of companies, both today and as time series.
  3. The data needs to contain information that is correlated with future outcomes: We can empirically show it is possible to forecast relevant metrics such as revenue growth using the available data.
  4. The data cannot be readily available and usable for the average market participant: The data comes from hundreds of sources, both public and private, and stitching it together is a very difficult problem known as entity resolution.
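For readers unfamiliar with entity resolution, the sketch below shows the flavor of the problem on a toy scale: deciding that differently spelled records refer to the same company. It is an illustration only and not CircleUp's method; the brand names, the suffix list, the 0.9 similarity threshold and the greedy clustering are all assumptions made for the example, and it uses only Python's standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes that carry no identifying information.
SUFFIXES = {"inc", "llc", "co", "corp", "company", "ltd"}

def normalize(name):
    """Lowercase, strip punctuation and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def same_entity(a, b, threshold=0.9):
    """Treat two names as one company if their normalized forms are similar enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def resolve(records, threshold=0.9):
    """Greedy clustering: attach each record to the first cluster whose
    first member matches it, otherwise start a new cluster."""
    clusters = []
    for record in records:
        for cluster in clusters:
            if same_entity(record, cluster[0], threshold):
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters

# Made-up records, as they might appear in three different sources.
names = ["Acme Granola, Inc.", "ACME Granola", "Acme Granola LLC", "Summit Sunscreen Co."]
print(resolve(names))
```

At real scale the hard parts are the ones this sketch skips: matching on addresses, products and websites rather than names alone, avoiding an all-pairs comparison across hundreds of thousands of records, and handling brands that share a name or change it.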

So, why am I here?

CircleUp has collected data on 1.4 million companies, primarily in the US and Canada, over the last seven years. We are currently tracking 500,000+ live and relevant companies. This data is collected from hundreds of sources spanning dozens of online venues, specialized consumer goods databases and truly private data from the tens of thousands of companies that CircleUp has worked with since it was founded in 2012. Some of this can be obtained by others and some is unique to CircleUp. As noted above, it is a very difficult data science problem to aggregate, map and organize the vast quantities of consumer industry data coming from these hundreds of sources. Having solved this problem, like the original quants in the 1980s, CircleUp today is one of the very few firms that have massive amounts of relevant and comparable data about a very large and investable universe of private companies.

In the public markets of today the ‘simple stuff’ from the 1980s is so widely available that it does not confer any meaningful informational edge (4). Everyone talks about “alt data”, but this is just a fancy way of saying they value novel data sets. The reality is that everyone in the public markets mostly has the same data, and therefore a lot of the work ends up being focused on trying to design smarter algorithms to extract signal from the data. In the limit, if everyone has exactly the same data, there is nothing else to do but build better algos than the competition. This is really, really hard. In 1981, having the data was what mattered: if you had information and the means to clean and process it, then you didn’t need to be a genius to figure out what to do with it.

The private markets are the last corner of the capital markets that remains untouched by the decades-old revolution in technology and data. Even so, it is by no means obvious that all of private equity and credit will succumb to the quant techniques that proved so useful in the public markets. However, large parts of the private markets, including the consumer goods industry, satisfy all of the criteria necessary for a sufficiently motivated quantitative investor to generate outsized returns.

This is the closest thing I have encountered in my career to the state of affairs that existed in the public markets of several decades ago. Just as the few firms that embraced data driven investing in the public markets generated significant returns for their clients, I believe that CircleUp is at the beginning of an analogous journey today and it is exciting to be a part of it!


(1) CircleUp is an investment firm that harnesses the power of data to provide capital and resources to emerging consumer brands.

(2) Criteria 1, 2 and 3, taken together, capture the essence of what is known as “the fundamental law of active management” from Grinold & Kahn.

(3) It is worth noting that, in the public markets, traditional factor models have generally had a very difficult time dealing with the era of “winner-take-all” tech stocks.

(4) It is a debatable point as to whether basic factor models still have an edge in 2019. What is nearly certain is that if they do provide some outperformance, it is a fraction of what it once was and the fees asset managers can charge for delivering the modest outperformance on offer have shrunk commensurately.