Introduction to Helio
Helio is our machine learning platform that identifies, classifies, and evaluates early-stage consumer and retail companies to shine a light on breakout brands. Helio beats heuristics with data and offers actionable insights to industry participants.
Historically, there has been no centralized source of performance data on companies in the private markets. This challenge is amplified in the “long tail” of early-stage brands, which are often digitally native or distributed through regional retailers. We built Helio to address this challenge and spot innovation early:
We gather data from hundreds of sources that fall into 3 buckets:
Public data: Gathered from publicly available sources, online and offline
Partnership data: Gathered through our relationships with various partners
Practitioner data: Gathered directly from entrepreneurs; this is the training data for our models
We normalize the data we collect in order to assemble an overall picture of the landscape. This results in 1.4M companies tracked across North America - (nearly) every consumer and retail business. Not all of these companies will be relevant for us, but we think it’s important to cast a wide net so we don’t miss anything.
Helio classifies the 500K+ relevant companies into over 100 categories with a high degree of granularity (e.g. kombucha distinct from tea)
We build evaluative models that analyze various aspects of each company, relative to its category
Helio is focused on descriptive and diagnostic analytics. It’s able to analyze what happened in the consumer sector and typically figure out why it happened. It is also able to generate predictions related to future revenue and distribution. While our models aren’t perfect (and by nature never will be), they provide a significant information advantage.
As we infuse Helio with more data, we will continue to improve the predictive capabilities as well as prescriptive insights in order to help companies operate more successfully.
In five years, we believe Helio will be the rails on which the consumer industry runs.
How Helio Finds Companies
How can we cover every consumer and retail company that exists in North America? We do so in multiple stages:
Identify the universe of companies:
We use our proprietary technology to continuously identify companies in the consumer and retail space. We define this sector broadly so as not to miss anything. Today we track 1.4M companies.
Remove irrelevant companies:
We apply our relevance filters that use natural language processing techniques and classification models to validate whether the company in question currently sells branded consumer products in the US and Canada.
While we continue to add emerging companies, we believe we’re close to 100% coverage.
Associate each datapoint with the appropriate company:
Our entity resolution system assigns each data point - at the product, brand, and company level - to a company using a combination of search algorithms and machine learning models.
Let’s zero in on the problem of entity resolution. There is no universal directory or common identification code that all brands and retailers adhere to. Many companies and products have similar names, and data sources are inconsistent with one another. This makes for a technical challenge. Industry participants, including data companies and retailers, acknowledge it’s a beast of a problem.
Take the product “Duke’s Original Smoked Shorty Sausage, 5 Oz”. The challenge is to correctly identify which company this product should be associated with. Sounds easy?
Looking at our database, we find several separate companies that a consumer would refer to as Duke’s: “Duke Foods”, “Duke’s”, and “Duke’s Pet Foods”.
For Helio, conflating these three companies would generate noise in our algorithms. To humans, this challenge appears trivial on a small scale, but how about across more than a million companies, tens of millions of SKUs (Stock Keeping Units), and trillions of data points? At that scale, computation and sophisticated modeling are needed.
The outcome is a knowledge graph of the entire consumer space that maps reviews, labels, social posts, pricing, location, and more to individual products and brands.
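To make the Duke’s example concrete, here is a minimal sketch of one way a product title could be matched to a candidate company. It is not Helio’s actual system, which combines search algorithms and machine learning models; it is a simple token-overlap heuristic. The keyword sets, the scoring rule, and the function names are all assumptions made up for illustration.

```python
import re

# Candidate companies from the example above. The keyword sets are
# invented for illustration and are not real product data.
COMPANIES = [
    {"name": "Duke Foods", "keywords": {"dips", "spreads", "salad"}},
    {"name": "Duke's", "keywords": {"sausage", "smoked", "jerky", "meat"}},
    {"name": "Duke's Pet Foods", "keywords": {"dog", "cat", "pet", "treats"}},
]


def tokens(text):
    """Lowercase the text, drop possessive 's, and return the set of words."""
    text = text.lower().replace("’", "'")
    text = re.sub(r"'s\b", "", text)
    return set(re.findall(r"[a-z]+", text))


def score(product_title, company):
    """Hypothetical score: name-token coverage plus keyword hits."""
    title_tokens = tokens(product_title)
    name_tokens = tokens(company["name"])
    # Share of the company-name tokens that appear in the product title.
    name_overlap = len(title_tokens & name_tokens) / len(name_tokens)
    # Number of category keywords that appear in the product title.
    keyword_hits = len(title_tokens & company["keywords"])
    return name_overlap + keyword_hits


def resolve(product_title, companies):
    """Return the name of the best-scoring candidate company."""
    return max(companies, key=lambda c: score(product_title, c))["name"]


best = resolve("Duke’s Original Smoked Shorty Sausage, 5 Oz", COMPANIES)
```

All three candidates share the token "duke", so the name alone cannot separate them; in this sketch the product words ("smoked", "sausage") break the tie. A production system would need far richer signals than a hand-written keyword list.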
How Helio Helps Entrepreneurs and Investors
Let’s look at several examples to better understand how Helio could serve both entrepreneurs and investors.
Finding companies aligned with an emerging trend
An investor believes sustainable seafood is the next hot trend and would like to find all companies that are of a relevant size for an early-stage private equity investment.
First, we need to classify companies into categories based on the products they sell:
Today we have a multi-level industry classification system with over 100 categories that span the consumer and retail space. That system is evolving and will eventually include the ability for flexible, ad-hoc classifications to accommodate the ever-changing landscape without sacrificing accuracy of classifications.
For this example, we collected the following text about a company and used NLP techniques to define its category (or multiple categories), in this case, food:
Our system classified the company as a food business. The classification is probabilistic, and therefore not perfect, but we are able to quantify the error rate and improve it over time to minimize misclassification.
This isn’t easy - one source may categorize salmon jerky as “fish” while another source labels it as “snack.” This challenge is exacerbated by the number and variation of sources Helio ingests to extract category and product data.
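As an illustration of what a probabilistic text classification looks like, here is a minimal sketch using a tiny naive Bayes classifier. This is an assumed stand-in, not the model Helio uses; the training sentences, the three labels, and the function names are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Toy labeled descriptions; purely illustrative, not Helio data.
TRAINING = [
    ("wild caught salmon jerky high protein snack", "food"),
    ("organic granola bars with nuts and honey", "food"),
    ("sparkling kombucha tea with ginger", "beverage"),
    ("cold brew coffee in a ready to drink can", "beverage"),
    ("aluminum free natural deodorant stick", "personal care"),
    ("sulfate free shampoo for curly hair", "personal care"),
]


def train(examples):
    """Count words per label, documents per label, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return a probability for each label (multinomial naive Bayes)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        log_p = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            log_p += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = log_p
    # Convert log scores to probabilities that sum to 1.
    max_log = max(log_scores.values())
    exp_scores = {l: math.exp(s - max_log) for l, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {l: v / norm for l, v in exp_scores.items()}


probs = classify("smoked salmon jerky snack", train(TRAINING))
top_label = max(probs, key=probs.get)
```

Because the output is a probability per category rather than a single hard label, a description such as salmon jerky can carry weight in more than one category, and the error rate can be measured against labeled examples.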
Back to our use case, we are able to extract all of the relevant companies from our database using our industry classification system.
Next, assign relevant / emerging distinguishing attributes to the brands and/or their products:
Orthogonal to the category classification is attribute extraction, which classifies companies or products by their characteristics e.g. organic, natural, luxury, gluten-free, sustainable, women-led etc. Attribute extraction helps us discover emerging trends when attributes emerge across categories. How important is fat-free vs. sugar-free? How important is it for deodorant to not have aluminum? Is the shot format in beverage taking off relative to similar products in full 8oz or 12oz size? We use TF-IDF (term frequency, inverse document frequency) to identify the language brands use to distinguish themselves. Words that are used often by a subset of brands have high TF-IDF values (e.g., “organic”), whereas words that are either ubiquitous across brands (e.g., “tasty”) or rare within any given brand (e.g., “balthazar”) have low TF-IDF.
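The TF-IDF idea can be sketched in a few lines. This is a textbook formulation (term frequency multiplied by the log of inverse document frequency), treating each brand’s marketing copy as one document; the brand names and copy below are invented, and Helio’s actual weighting scheme may differ.

```python
import math
from collections import Counter

# Invented marketing copy for three hypothetical brands.
BRAND_COPY = {
    "brand_a": "organic tasty granola organic oats",
    "brand_b": "tasty crunchy granola",
    "brand_c": "tasty classic cola",
}


def tf_idf(docs):
    """Return {brand: {word: tf-idf score}} for a dict of brand -> text."""
    n_docs = len(docs)
    tokenized = {brand: text.lower().split() for brand, text in docs.items()}

    # Document frequency: in how many brands' copy does each word appear?
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    scores = {}
    for brand, words in tokenized.items():
        counts = Counter(words)
        scores[brand] = {
            # tf = share of the brand's words; idf = log(N / document frequency)
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


scores = tf_idf(BRAND_COPY)
```

In this toy data "tasty" appears in every brand’s copy, so its inverse document frequency is log(1) = 0 and it scores zero everywhere, while "organic" is used repeatedly by only one brand and scores highest for that brand.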
Now we take the group of relevant companies and identify the applicable attributes and screen out the companies that don’t have “sustainable” as an attribute.
Since attributes and categories are independent, we can also apply the screening in reverse.
Finally, estimate the revenue of each company on the list to determine if it’s appropriate for our use case:
Our revenue estimate model does exactly what the name suggests: it assigns a trailing twelve-month revenue estimate to a company without ever interacting with the company. Eventually it will assign projected revenue as well. We have assembled a library of historical revenue data alongside other data from the same point in each brand's growth - distribution, social media presence, product information, etc. This becomes the training data for our revenue model.
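To show the general shape of learning revenue from observable signals, here is a deliberately simplified sketch: a one-feature least-squares fit of log revenue against log door count. The single feature, the log-log form, the numbers, and the function names are all assumptions for illustration; the real model draws on many more inputs and is not described in this detail here.

```python
import math


def fit_log_linear(doors, revenues):
    """Ordinary least squares for log(revenue) = intercept + slope * log(doors)."""
    xs = [math.log(d) for d in doors]
    ys = [math.log(r) for r in revenues]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_revenue(doors, slope, intercept):
    """Predict trailing twelve-month revenue for a given door count."""
    return math.exp(intercept + slope * math.log(doors))


# Invented training pairs: (door count, known trailing twelve-month revenue).
train_doors = [10, 100, 1000]
train_revenue = [50_000, 500_000, 5_000_000]

slope, intercept = fit_log_linear(train_doors, train_revenue)
estimate = estimate_revenue(200, slope, intercept)
```

The point of the sketch is only the workflow: known revenue figures paired with observable signals act as training data, and the fitted relationship is then applied to companies whose revenue is unknown.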
We take the companies within the investor’s desired revenue range and provide them with the final list.
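The three screening steps above (category, attribute, revenue range) can be sketched as a simple filter. The Company record, its field names, and the sample brands are hypothetical; they only illustrate how independent category and attribute labels combine with a revenue estimate.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    categories: set       # assigned by the category classifier
    attributes: set       # assigned by attribute extraction
    est_revenue: float    # estimated trailing twelve-month revenue, USD


def screen(companies, category, attribute, min_revenue, max_revenue):
    """Keep companies in the category, with the attribute, within the revenue range."""
    return [
        c
        for c in companies
        if category in c.categories
        and attribute in c.attributes
        and min_revenue <= c.est_revenue <= max_revenue
    ]


# Invented sample data.
SAMPLE = [
    Company("Brand A", {"food", "seafood"}, {"sustainable"}, 4_000_000),
    Company("Brand B", {"seafood"}, {"organic"}, 6_000_000),
    Company("Brand C", {"seafood"}, {"sustainable"}, 80_000_000),
    Company("Brand D", {"beverage"}, {"sustainable"}, 5_000_000),
]

shortlist = screen(SAMPLE, "seafood", "sustainable", 1_000_000, 10_000_000)
shortlist_names = [c.name for c in shortlist]
```

Because the category and attribute checks are independent conditions, the order in which they are applied does not change the result, which is why the screening can also be run in reverse.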
The investor now has a comprehensive list of companies without the traditional legwork and time-intensive research.
Diligence a company’s current performance
An investor would like to evaluate a company relative to its competitive set.
To understand a company’s competitive position, we can look at several key dimensions, including brand strength, product differentiation and distribution breadth.
“Brand” is an abstract concept that is difficult to quantify. Our brand model tries to do just that – score a company across several dimensions that measure how much a brand resonates with consumers. Helio is quantifying what has so far been unquantifiable. Our brand score has been shown to correlate with future revenue growth.
In today's digital world, consumers engage directly with brands across a number of mediums, all of which provide a signal for brand equity. Social media, reviews, and awards are just some of the obvious sources we can use to measure these signals.
We track each brand's social activity and evaluate its network over time. For example, the amount of engagement with the brand over time:
The key premise of our product analysis is understanding whether a product is differentiated from the competition, and the extent to which that differentiation matters to consumers in that category. The hypothesis is that uniqueness is necessary but not sufficient for success. Some investors call this a “reason for being.” We have explored and continue to explore several key components including:
Packaging: How similar is the packaging to other products? Does it stand out? To answer these questions, we isolate the product package from photos at the SKU level and use computer vision. Doing this analysis at scale is still in development.
For example, this is a color analysis of the package of one granola bar product from RxBar vs. the typical packaging in the granola bar category:
Ingredients & Nutrition (if relevant): Does the product contain ingredients that are staples in its category? Do the ingredients/nutrition have any attributes that add value? For example, we see that having more fiber in a yogurt correlates positively with a product’s ratings and that yogurts with higher protein content on average cost more, while sugar correlates negatively with both ratings and price.
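The kind of relationship described for yogurt (fiber up, ratings up; sugar up, ratings down) can be checked with a Pearson correlation. The sketch below uses invented numbers for five hypothetical yogurts purely to show the calculation; it does not reproduce Helio’s data or findings.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented per-serving values for five hypothetical yogurt products.
FIBER_G = [0, 1, 2, 3, 4]
SUGAR_G = [18, 15, 12, 9, 6]
RATING = [3.8, 4.0, 4.1, 4.3, 4.5]

fiber_vs_rating = pearson(FIBER_G, RATING)   # positive in this toy data
sugar_vs_rating = pearson(SUGAR_G, RATING)   # negative in this toy data
```

A correlation like this only says the two quantities move together within a category; it does not by itself show that adding fiber would raise a product’s rating.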
Distribution is critical for the success and growth of a company in consumer goods. Helio quantifies the breadth and growth of distribution across many channels - wholesale and direct, offline and online (in process) - in today’s increasingly omnichannel world. Most organizations only have distribution data on the biggest retailers and brands. This is fine if you’re trying to figure out how many bottles of Coke were sold at Target last week, but means that you miss a lot of what’s happening in the “long tail” of smaller retailers.
This is the distribution of number of stores by retailer in the US:
With Helio, we’re able to see this long tail of retailers. This allows us to track brands even when they start very small, then follow them as they grow.
In addition to looking at historical distribution, we can also project future distribution. Our Future Distribution Model predicts the number of doors each SKU will be sold at in a year’s time. This allows the investor to understand not just the current state of a brand’s SKU level distribution, but also its expected future distribution.
Setting & executing a retail strategy
An entrepreneur can use data to decide which retailers to prioritize and which will be the most impactful for long-term growth.
An entrepreneur can use CircleUp’s data to understand the competitive landscape, inform a retail strategy and arm herself with data for conversations with retailers. Some of the same data and models that help investors can also be leveraged by entrepreneurs to grow their businesses:
Identify key competitors:
Use Helio’s revenue estimate model, sliced by category and key attribute, to understand the largest players in the category and which competitors are of a similar size, etc.
Analyze competitors’ strengths and weaknesses:
Use the brand model and product insights to see where her company stands compared to competitors. Digging deeper into Helio’s data, she can understand the source of her company’s strengths (e.g. the product ingredients or features, the visual identity of the brand, the breadth and engagement of the brand’s fan base, etc.) and use this information in conversations with retailers to her advantage.
Track competitors’ momentum:
Understand how competitors are growing - growing fast, losing momentum, etc.
Understand present and future distribution landscape:
Most importantly, using Helio’s distribution model, she can see how her company compares to competitors in terms of the strength of wholesale distribution - both offline and online (in process). In the future, she will also be able to see Helio’s assessment of the retailers and geographies that matter (and will matter) the most in her category. This is an example of forecasted future door growth in the shampoo category:
Equipped with this information, the entrepreneur can devise a robust distribution strategy.
Predicting future performance
A lender would like to extend credit to a company by determining its current creditworthiness and future ability to repay the loan.
A lender can use Helio data to assist with loan underwriting. The models discussed above - revenue estimate and revenue growth - can help determine the approximate size of a prospective borrower’s existing working capital gap (the gap between paying suppliers for product and getting paid by the retailer).
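As a rough worked example of sizing a working capital gap from an estimated revenue figure, the sketch below uses the standard cash conversion cycle components: receivables plus inventory minus payables. The formula choice, parameter names, and all input values are assumptions for illustration; a lender would use actual figures from the company.

```python
def working_capital_gap(annual_revenue, gross_margin, dio, dso, dpo):
    """Approximate cash tied up between paying suppliers and being paid.

    annual_revenue: estimated trailing twelve-month revenue (USD)
    gross_margin:   gross margin as a fraction, e.g. 0.4
    dio:            days inventory outstanding
    dso:            days sales outstanding
    dpo:            days payable outstanding
    """
    daily_revenue = annual_revenue / 365
    daily_cogs = annual_revenue * (1 - gross_margin) / 365
    receivables = dso * daily_revenue   # owed by retailers
    inventory = dio * daily_cogs        # product paid for but not yet sold
    payables = dpo * daily_cogs         # owed to suppliers
    return receivables + inventory - payables


# Hypothetical borrower: $3.65M revenue, 40% gross margin.
gap = working_capital_gap(3_650_000, 0.4, dio=60, dso=45, dpo=30)
```

With these invented inputs the gap comes to roughly $630,000: $450,000 in receivables plus $360,000 in inventory, less $180,000 in payables.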
Helio has (and will continue to add) additional capabilities that are especially pertinent in this situation:
Future revenue growth: Helio’s ultimate goal is to predict a company’s propensity for future revenue growth, using signals such as expanding distribution, a catalyst for growing sales, as well as a variety of other indicators. This growth metric will help to determine the appropriate ratio of advances on accounts receivable (AR) vs. inventory.
Current and future door growth: In this example, the lender can use Helio’s future door growth model to increase confidence in the company’s ability to pay back the loan. The lender can then use current door growth to monitor distribution stability and flag if points of distribution decrease.
Bankruptcy risk assessment: This future model will encompass much of our data including labeled historical bankruptcy data and will be able to assess the probability of bankruptcy. More to come!
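The door-growth monitoring described above (flagging when points of distribution decrease) can be sketched as a simple period-over-period check. The function name, the tolerance parameter, and the door counts are hypothetical; this is one possible rule, not Helio’s monitoring logic.

```python
def flag_door_declines(door_counts, tolerance=0.0):
    """Return the indices of periods where doors fell versus the prior period.

    A period is flagged when the fractional drop exceeds `tolerance`
    (e.g. 0.05 ignores declines of 5% or less).
    """
    flagged = []
    for i in range(1, len(door_counts)):
        previous, current = door_counts[i - 1], door_counts[i]
        if previous > 0 and (previous - current) / previous > tolerance:
            flagged.append(i)
    return flagged


# Invented monthly door counts for a hypothetical borrower.
monthly_doors = [100, 120, 118, 150, 130]

any_decline = flag_door_declines(monthly_doors)             # every drop
material_decline = flag_door_declines(monthly_doors, 0.05)  # drops over 5%
```

A small tolerance keeps routine fluctuations (such as the dip from 120 to 118 doors) from triggering alerts, while a larger drop (150 to 130) is still surfaced for review.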
Naturally, the lender will also assess other information such as gross margin, days sales outstanding (DSO), and cash burn, which is received directly from the company.
Compiling all of this data together, the lender can build a comprehensive picture of a company’s distribution and revenue growth potential in order to decide whether or not the company is a good candidate for a loan.
Note: The data and metrics shown here are illustrative and subject to change.