Senior Data Engineer

San Francisco, CA

Data Acquisition & Ingestion Team Overview

The Data Acquisition & Ingestion team is responsible for collecting, ingesting, and normalizing the data that powers Helio. We are building an automated ingestion system that scales to reliably onboard, and extract value from, hundreds of disparate data sources. We also champion high-quality data practices across all of Helio’s data pipelines.

Some of the technologies we leverage are Python 3, PySpark, AWS, Docker, Kubernetes, Postgres, Airflow, Jenkins and Git.

We are looking for a Senior Data Engineer to help us design and develop data pipelines to ingest, validate, extract, and normalize data across new and existing sources. The ideal candidate will be self-driven and comfortable balancing progress toward a longer-term roadmap with maintaining context and stability across a dynamic set of existing data sources. Although this is an individual contributor role, we are looking for someone who is excited to mentor junior engineers.

CircleUp was recently honored as one of Fast Company's Top 10 Most Innovative Companies in Data Science and has been named to the CB Insights FinTech 250 and the Forbes FinTech 50, and a Top 5 Most Disruptive Company in Finance by CNBC. Founded in 2012, CircleUp is headquartered in San Francisco and backed by Union Square Ventures, GV, Canaan Partners, QED Advisors, and others. Learn more at www.circleup.com

Responsibilities

  • Provide senior-level contribution to the design, implementation and maintenance of complex data pipelines
  • Build reliable services for gathering & ingesting data from a wide variety of sources
  • Build performant and reliable data pipelines to validate, extract and normalize data from a wide variety of sources
  • Develop strategy, tools, and workflow for integrations and ingestion of data
  • Collaborate with cross-functional teams and stakeholders to understand data needs
  • Write quality, maintainable code with extensive test coverage in a fast-paced, agile software engineering environment
  • Mentor junior teammates and lead by example in demonstrating software engineering best practices

Requirements

  • B.S. or M.S. in Computer Science, or an equivalent degree
  • 5+ years of proven working experience as a data engineer
  • Excellent software engineering skills and strong fundamentals in algorithms, data structures, predictive modeling and big data concepts
  • Strong programming fundamentals and proficiency in an object-oriented language such as Python or Scala
  • Excellent communication skills to collaborate with stakeholders in engineering, data science, and product
  • Experience with our stack (Python, PySpark, Airflow, AWS ecosystem) is preferred but not required
  • Experience building large-scale and complex data processing pipelines
  • A successful history of manipulating, processing and extracting value from large disconnected datasets
  • Strong analytical skills related to working with unstructured datasets

Useful Traits for this Role

  • Strong communication, both technical and business-level, especially with external contractors
  • Detail-oriented, with business sense and the ability to manage ambiguity; able to synthesize detailed schema specifications from a newly identified source
  • Ability to understand, maintain, document, and be knowledgeable about a large variety of data sources; able to deal with a certain level of reactive context-switching
  • Proactive and driven; identifies gaps in our data model and works to close them