Senior Data Engineer, Ingestion - CU100

San Francisco, CA

Data Acquisition & Ingestion Team Overview

The Data Acquisition & Ingestion team is responsible for collecting, ingesting, and normalizing the data that powers Helio. We are building an automated ingestion system that can scale to reliably onboard, and extract value from, hundreds of disparate data sources. We also champion high-quality data practices across all of Helio’s data pipelines.

Some of the technologies we leverage are Python 3, PySpark, AWS, Docker, Kubernetes, Postgres, Airflow, Jenkins and Git.

We are looking for a Senior Data Engineer to help us design and develop data pipelines that ingest, validate, extract, and normalize data across new and existing sources. The ideal candidate is self-driven and comfortable making progress on a longer-term roadmap while maintaining context and stability across a dynamic set of existing data sources. Although this is an individual contributor role, we are looking for someone who is excited to mentor junior engineers.

Responsibilities

Provide senior-level contributions to the design, implementation, and maintenance of complex data pipelines

Build reliable services for gathering & ingesting data from a wide variety of sources

Build performant and reliable data pipelines to validate, extract and normalize data from a wide variety of sources

Develop strategies, tools, and workflows for data integration and ingestion

Collaborate with cross-functional teams and stakeholders to understand data needs

Write quality, maintainable code with extensive test coverage in a fast-paced, agile software engineering environment

Mentor junior teammates and lead by example in demonstrating software engineering best practices

Requirements

B.S. or M.S. in Computer Science or an equivalent degree

5-7+ years of proven experience as a data engineer

Excellent software engineering skills and strong fundamentals in algorithms, data structures, predictive modeling and big data concepts

Strong programming fundamentals and proficiency in an object-oriented language such as Python or Scala

Excellent communication skills to collaborate with stakeholders in engineering, data science, and product

Nice to Have

Experience with our stack (Python, PySpark, Airflow, AWS ecosystem) is preferred but not required

Experience building large-scale and complex data processing pipelines

A track record of manipulating, processing, and extracting value from large, disconnected datasets

Strong analytical skills for working with unstructured datasets

Useful Traits for this Role

Strong communication at both the technical and business level, especially with external contractors

Detail-oriented, with business sense and the ability to manage ambiguity; able to synthesize detailed schema specifications from a newly identified source

Ability to understand, maintain, and document a large variety of data sources; comfortable with some reactive context-switching

Proactive and driven; will identify gaps in our data model and work to close them

