As a data scientist, I’m constantly iterating on the way I approach my work: the field of machine learning evolves so quickly that promising new tools and methods surface on a near-daily basis. And as a company, CircleUp firmly believes that data-powered technology is the future of private equity investing, a field that has up until now depended on the work of human analysts.
This means our data science team has the exciting but challenging task of pioneering a completely new approach to an industry, while using an ever-evolving set of tools. We often use the old Silicon Valley trope “building the plane while flying it” to describe what we do.
My goal with this article is to share what my colleagues and I have learned along the way. I also want to open up a conversation with others in data science, finance, and tech on the most efficient and accurate ways that new technologies can be applied to private equity investing.
Our mission at CircleUp is to help entrepreneurs thrive by giving them the capital and resources they need. To do this we rely heavily on our Helio platform, a consumer business graph that collects intelligence on early-stage companies in CPG. Helio helps us to scale and debias the tasks of discovering new brands and prioritizing our investment resources.
At its core, Helio is a collection of algorithms and models that describe various attributes of the 1.2M brands that we track. The data scientists behind Helio spend most of our time developing, scrutinizing, and improving these models. Ultimately, CircleUp’s success as a company depends on how effective we are at amassing an accurate and in-depth database and optimizing the algorithms and models that sit on top of it.
Over the past five years Helio has grown tremendously, from the early days of our Classifier into a full-blown consumer business graph. We credit much of this growth to the ever-increasing power of machine learning and technology, which has enabled us to move quickly through model development and keep our eyes on the prize: helping entrepreneurs thrive.
The Power of the ‘Black Box’
Some of the most useful machine learning tools in our arsenal are what we refer to — with equal parts fervor and skepticism — as the “black box” models. Black box models help us to find complex patterns and relationships that would be impossible to discover using fundamental analysis or simple statistics. For example, we can feed a black box model a variety of raw data points including social media activity, product categories, and various distribution metrics, and produce revenue estimates that are often within $10–100k of the true revenue. This ability is transformative for a startup like CircleUp, since knowing a company’s revenue can help us to determine whether it is a fit for our platform and investors.
Why It’s Hard to Sell a Black Box in Fintech
The popularity of black box models — indeed, our own usage of them — suggests that their pros must outweigh their cons. And while historically we’ve agreed with that assessment, we are beginning to reassess and think more carefully about the conditions that qualify a given usage of a black box.
There are tradeoffs, after all, to using black box models. The same complex patterns that drive accuracy are often impossible to understand on an intuitive level. In other words, one cannot see inside of the box. Deep neural networks often contain dozens of intermediate “layers,” with progressively more obscure transformations on the raw data. Values in the middle of the network are meaningless on first inspection and either impossible or impractical to explain. And yet, they work extremely well.
Our ability to help entrepreneurs thrive relies not only on our ability to make strong predictions in a traditionally unpredictable market, but also on our ability to develop trusting relationships with investors. Investors are often wary of a completely black box approach, and with good reason: Markets are famously difficult to predict, and not being able to manually track and fully understand the methods behind your investing models can be quite disquieting to those who are used to fully traceable analytic practices. Black box models are just as difficult to rationalize as the markets themselves — which is a valid reason to be excited about their potential and yet somewhat distrustful of them.
But perhaps there can be a compromise. Alongside the progress we’ve seen in machine learning and tech, research into the consumer packaged goods (CPG) sector has also progressed in building theories for why certain business strategies and performance measures are correlated with success. CircleUp believes that CPG investing is much more systematic than other popular sectors in private equity investment, such as IT and software: CPG businesses tend to follow more predictable growth trajectories than their tech industry counterparts, with valuations and exits linked in a relatively linear way to revenue and distribution.
This implies that it should be possible to construct simpler models, based on interpretable theories and causal mechanisms that obtain levels of accuracy comparable to a black box. If so, these models should provide a more compelling rationale to investors, one that resonates with their own mental models for business growth and therefore improves the likelihood of a deal.
Brain vs. Brawn
We refer to these two competing approaches as “brainy” and “brawny” to capture what we believe are their respective essences. Brainier models are simpler, with theoretical structures and features that map on to investor concepts and business principles. The brain reference is a nod to the additional thinking required to specify these models, something that is not easily automated (yet) by a machine.
In contrast, “brawn” refers to the black box modeling approach. The brawn philosophy is to reduce the modeling exercise to a search problem and then rely on a machine (often multiple parallel machines) to navigate the complex space of potential patterns to fit a given outcome variable. With a bit of practice, these models can also be quite “brainless” from the user’s perspective. Similar to flying a commercial aircraft, it can be very difficult learning how to operate the machinery and to troubleshoot the occasional bug; but once those skills are mastered, you can typically turn on “auto pilot” and let the algorithms do the heavy lifting.
When to Use Each Approach
A question we’ve frequently struggled with at CircleUp while building Helio is whether to apply a brainy or a brawny approach to a given problem. Part of the reason for this post is that we are extremely curious to know whether other companies face the same conflict and how they address it. The other part is to share what we currently do, hopefully start a conversation, and perhaps help other data scientists along the way.
We find that the choice of brain vs. brawn is clear in some cases and murkier in others, and that the decision can be mapped roughly to a coarse taxonomy that we use to organize our models:
- Human judgments at scale
First, we have models that enable us to apply human judgment at scale. These are essentially automation problems where there are no theories available to motivate a structured model.
For example, our industry classification model determines the categories associated with a brand (e.g., popcorn, makeup, detergent) based on available keywords. Investors and CircleUp associates have only a minimal interest in how these models work, and are mostly concerned that they perform as well as possible. We have found that the best ways to optimize these models are to collect a wide range of features, draw from a diverse range of sources, and refresh the training sample regularly to stay on top of drifting signals.
Brawny, black box architectures such as support vector machines, gradient-boosted decision trees, and “deep learning” models work well for this use case.
- Measure something fuzzy
Second, we have models that help us to measure something fuzzy but intuitively important.
A prime example is our “brand” model, which rank orders brands within a given category from best to worst. The actual number that determines the ultimate rank order is based on a variety of data points that industry experts and CircleUp team members believe are prima facie evidence of brand quality. We argue that this type of model is better served by brain than brawn.
A useful analogy can be drawn to personality models from social psychology. When a personality researcher attempts to measure a fuzzy trait, such as “openness”, they create a survey instrument to quantify a person’s preferences or behaviors that they believe are prima facie evidence of the trait. Then they apply statistical methods such as factor analysis to the survey responses to build evidence that the questions do reflect an underlying construct. The resulting factor scores are often validated against external measures that one believes would be impacted by the construct.
Similarly, with our brand model, we collect numerous data points that are intuitively related to brand (e.g., sentiment, social media activity) and find a weighting schema that yields a strong correlation to our validation measures, such as revenue growth or share of wallet. Linear regression works perfectly well for this task; it is literally an algorithm to find optimal weighting schemes. More advanced techniques such as partial least squares regression and elastic net regularization can help to simplify these models and reduce overfitting.
Hence, when we measure something fuzzy we opt for a brainier, more traceable approach. Brawny approaches are inappropriate for this case because our goal is to understand exactly how certain variables jointly relate to an outcome. It’s also acceptable if the model fit is relatively weak, since it’s reasonable to expect that our fuzzy concept of brand will account for only a modest portion of revenue growth or share of wallet.
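To make the weighting idea concrete, here is a minimal sketch of least squares as a search for feature weights, written with the Python standard library only. The feature names (sentiment, social activity), the toy numbers, and the function names are all made up for illustration and are not Helio's actual inputs or code. The optional penalty is a plain L2 (ridge) term, used here as a simpler stand-in for the elastic net mentioned above.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_weights(features, target, ridge=0.0):
    """Return [intercept, w1, w2, ...] minimizing squared error.

    ridge > 0 adds an L2 penalty on the feature weights (not the intercept).
    """
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    for i in range(1, k):
        xtx[i][i] += ridge
    return solve_linear_system(xtx, xty)


def brand_score(weights, feature_row):
    """Weighted sum of a brand's features, using the fitted weights."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))


# Toy data: (sentiment, social_activity) per brand. Values are invented.
features = [(0.2, 1.0), (0.5, 0.3), (0.9, 0.8), (0.1, 0.1), (0.7, 0.4)]
# Validation measure generated from known weights so the fit can be checked.
growth = [0.1 + 0.5 * s + 0.2 * a for s, a in features]

weights = fit_weights(features, growth)
shrunk = fit_weights(features, growth, ridge=1.0)
ranking = sorted(range(len(features)),
                 key=lambda i: brand_score(weights, features[i]),
                 reverse=True)
```

Because every weight is visible, the resulting rank order can be traced back to the inputs that produced it, which is the property the brainier approach is meant to preserve.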
- Mere prediction
Third, we have models that predict something that is measurable but inaccessible, including predicting the future. These models are probably the ones most strongly associated with machine learning and data science in an industry setting. Predicting the future has obvious benefits for planning, budgeting, and investing. For example, CircleUp would like to know the trailing-12-month revenue and year-over-year revenue growth for all of the 1.2M companies tracked by Helio. From our own proprietary data and through partnerships we have actual values for these quantities for a small subset of companies and would like to train a model to predict the remainder.
We find that the decision between brain and brawn is most difficult for this type of model. Clearly, we want to be as accurate as possible, but we also need reassurances that the model is performing in a sensible way that will generalize out of sample and out of time.
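One way to check both kinds of generalization is to hold out data in two different ways before fitting anything. The sketch below is an assumption about how such splits could be set up, not a description of Helio's pipeline; the record layout, the function names, and the synthetic revenue figures are hypothetical.

```python
import zlib


def out_of_time_split(records, cutoff_year):
    """Train on years before the cutoff, test on the cutoff year and later."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


def out_of_sample_split(records, holdout_percent=20):
    """Hold out whole companies so no brand appears on both sides.

    A stable hash of the company name decides the side, so the split
    is reproducible across runs.
    """
    def held_out(company):
        return zlib.crc32(company.encode("utf-8")) % 100 < holdout_percent

    train = [r for r in records if not held_out(r["company"])]
    test = [r for r in records if held_out(r["company"])]
    return train, test


# Synthetic records: 50 made-up brands observed in three years.
records = [
    {"company": f"brand_{i}", "year": year, "revenue": 100.0 * (i + 1)}
    for i in range(50)
    for year in (2015, 2016, 2017)
]

time_train, time_test = out_of_time_split(records, cutoff_year=2017)
sample_train, sample_test = out_of_sample_split(records)
```

A model that scores well on the company holdout but poorly on the later years is a sign that the signal it learned is drifting over time.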
Brawny models often give us the best performance, but brainier models are most compelling to investors. We are building both types of models currently, and weigh a number of considerations when choosing which to use in production.
How We Make These Choices
It should be clear by now that the decision to take either a “brainy” or “brawny” approach is highly specialized to each specific use case, and informed by a number of different factors. In order to keep some consistency around the decision-making process, we make sure to evaluate a few key variables every time we choose to take a brainy or brawny approach to a given problem:
The first consideration is whether the success of the model relies on the user developing a gut feeling of trust.
Consider the use of black box models in self-driving cars. Imagine a car that is highly accurate (very rarely has a collision) but that works in a way that is nearly impossible to explain. Now imagine a car that is also very safe, although slightly less accurate on the training data, but uses a very simple set of rules that can be understood by a lay person and that map to rules used by expert human drivers. Both cars have been trained on a random set of possible driving scenarios and should perform well on possible future scenarios, but there is no guarantee. Which do you trust to maximize your safety, especially in new conditions that the cars have not been trained on? Do you prefer the higher accuracy car, or the slightly lower accuracy car that uses a sensible, understandable set of rules?
We don’t believe there is a correct answer to this question in the general case, but we can tell you that many institutional investors that deploy millions to billions of dollars in capital annually tend to prefer the latter. Investors are often willing to sacrifice small to moderate amounts of training and out-of-sample performance for greater interpretability. For this reason, the Helio models used by our funds team to inform investment decisions tend to be simpler, and the predictions are often presented alongside their interpretations, including the raw inputs, beta coefficients, and example calculations for a small but illustrative set of companies.
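As an illustration of what such an example calculation might look like, the sketch below breaks a linear prediction into one additive contribution per input. The coefficient values, input names, and the function name are hypothetical and chosen only to show the arithmetic.

```python
def explain_prediction(intercept, coefficients, inputs):
    """Split a linear prediction into one additive contribution per input."""
    contributions = {
        name: coefficients[name] * inputs[name] for name in coefficients
    }
    prediction = intercept + sum(contributions.values())
    return prediction, contributions


# Hypothetical beta coefficients and raw inputs for a single company.
intercept = 0.02
coefficients = {"distribution_growth": 0.60, "sentiment": 0.25}
inputs = {"distribution_growth": 0.30, "sentiment": 0.40}

prediction, contributions = explain_prediction(intercept, coefficients, inputs)

# Print the calculation the way it might be shown next to a prediction.
for name, value in contributions.items():
    print(f"{name}: {coefficients[name]:+.2f} x {inputs[name]:.2f} = {value:+.3f}")
print(f"intercept: {intercept:+.3f}")
print(f"predicted growth: {prediction:.3f}")
```

Each line of the printout ties one raw input to its share of the final number, which is the kind of trace a reader cannot get from a black box prediction alone.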
It is also important to consider whether past attempts to use black box models for a given problem have succeeded or failed.
For example, the past 10–20 years have seen a flood of brawny models applied to stock trading, and the results have been mixed. Although “quant” trading firms are as popular today as ever, more experienced firms will argue that few genuine signals exist in the public markets, and those that do exist are already known.
CircleUp listens carefully to these investors and has developed a viewpoint that is mindful of, but not wholly deterred by, their skepticism. While we agree that data mining has the potential to lead down unproductive paths, especially in the public markets, we also believe there are critical disanalogies between public and private markets that work in our favor. The key difference in private markets is the lack of centralized data on company performance measures and outcomes, especially in the “long tail” (for smaller brands, smaller retailers, and in less populated locations). Our strategy centers on precisely this problem, and we have focused immense resources on collecting and building partnerships to obtain these data in an abundance previously unseen in the CPG space. More so than our models perhaps, our exclusive access to this data will hopefully guarantee that we are first to discover the signals that do exist.
And while history tells us that data mining can be risky, we also believe that black box models can be a great catalyst for discovering new, counterintuitive trends in fresh data sets with deeper and broader coverage than previously available. We often use brawny models for data mining and to generate new ideas and hypotheses, which we can then later build into more structured models with ready interpretations.
Lastly, the choice between brain and brawn depends on whether you expect your models to represent truly causal relationships.
For example, do you need to know whether brand strength has a causal impact on revenue growth, or merely that companies with strong brands also tend to grow faster? Causation is most important for investment firms that are involved in the decision making and strategy of their portfolio companies. For example, if an investor knows that brand strength causally impacts growth, then it may be wise to acquire a new company with strong distribution but a weak brand, since a capital injection could go toward improving brand, which would in turn increase profitability.
Black box models rarely provide any information relevant to teasing apart causality. While you can often determine which features have the greatest impact on model performance, these impact metrics gloss over important details, such as whether the feature has a positive or negative relationship with the outcome variable and whether there are non-linear or interaction effects. Uncovering those properties would require brainier follow-up analyses.
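The gap between "how much a feature matters" and "in which direction" can be shown with a small sketch. It uses permutation importance as one common impact metric (an assumption here, since the text does not name a specific metric) and a hand-written function as a stand-in for an opaque model. The feature names and formula are invented for illustration.

```python
import random


def black_box(row):
    """Stand-in for an opaque model; pretend this formula cannot be read."""
    price_index, shelf_count = row
    return -3.0 * price_index + 0.5 * shelf_count ** 2


def mse(model, rows, targets):
    """Mean squared error of the model on the given rows."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, column, seed=0):
    """Increase in error after shuffling one column. Carries no sign information."""
    rng = random.Random(seed)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, targets) - mse(model, rows, targets)


def average_effect(model, rows, column, step=1.0):
    """Follow-up analysis: mean change in prediction when a feature is raised."""
    total = 0.0
    for r in rows:
        bumped = r[:column] + (r[column] + step,) + r[column + 1:]
        total += model(bumped) - model(r)
    return total / len(rows)


# Synthetic inputs; targets come from the model itself to keep the demo exact.
rng = random.Random(42)
rows = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(200)]
targets = [black_box(r) for r in rows]

importance = [permutation_importance(black_box, rows, targets, c) for c in (0, 1)]
effect = [average_effect(black_box, rows, c) for c in (0, 1)]
```

Both importance scores come out positive even though the first feature pushes predictions down and the second pushes them up, and only the follow-up `average_effect` check reveals the direction. Neither number says anything about causation; both describe the model, not the world.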
CircleUp is attracted to models with causal interpretations, because over time we would like to be more prescriptive with our data, not only by helping investors make stronger predictions but by helping entrepreneurs to make smart decisions that will improve their brands and products.
Just the Start
Hopefully, this has provided a general outline of what we at CircleUp Data Science have learned over the past several years in applying machine learning to CPG investing, and some of the best practices we’ve developed. But this is just the beginning. We are in the earliest, most pioneering stages of what we believe will be a true sea change in private equity investment, and there is still so much more to learn.
We’d love to hear feedback on our approach from data scientists, investors, entrepreneurs, and others. You can comment here, shoot us an email at firstname.lastname@example.org, and find me on Twitter at @ericgtaylor.
ABOUT THE AUTHOR:
For the past 10 years, Eric has worked as a quant of sorts on a variety of topics: AI and the psychology of human learning, unmanned aircraft systems, cloud computing, and financial tech. At CircleUp, he builds machine learning models to discover and evaluate early-stage high-growth investment opportunities in the consumer packaged goods industry. Eric holds a B.A. in Psychology and Mathematics from the University of Texas at Austin, and a Ph.D. in Cognitive Psychology from the University of Illinois at Urbana-Champaign.