How Helio has overcome the challenges of ingesting, validating, and publishing data at scale

Written by
CircleUp

Last updated
3rd July, 2021

The Challenges

Ingredient and nutrition data comes in a multitude of formats or, more often, with no structure at all.

The Infrastructure

Airflow gives us the ability to schedule and monitor our data pipelines, which are often a series of interdependent Spark jobs run on EC2 clusters.
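
To make "interdependent jobs" concrete, here is a minimal sketch of the dependency ordering that a scheduler such as Airflow resolves before running anything. It uses only the Python standard library (`graphlib`, Python 3.9+), not Airflow's own API. The job names `ingest`, `validate`, and `publish`, as well as the `run_order` and `run` helpers, are hypothetical and chosen only for illustration; they are not Helio's actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a job, each value is the set of
# jobs that must finish before it can start.
PIPELINE = {
    "ingest": set(),
    "validate": {"ingest"},
    "publish": {"validate"},
}


def run_order(pipeline):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, runner):
    """Call runner(job) for each job, upstream jobs first."""
    completed = []
    for job in run_order(pipeline):
        # Assumption: in a real deployment this call would submit a Spark job.
        runner(job)
        completed.append(job)
    return completed


if __name__ == "__main__":
    run(PIPELINE, lambda job: print(f"running {job}"))
```

A scheduler adds retries, monitoring, and time-based triggers on top of this ordering, which is the part a sketch like this leaves out.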

The Data Lake