Senior Data Engineer

San Francisco, CA

Data Acquisition & Ingestion Team Overview

The Data Acquisition & Ingestion team is responsible for collecting, ingesting, and normalizing the data that powers Helio. We are building an automated ingestion system that scales to reliably onboard, and extract value from, hundreds of disparate data sources. We also champion high-quality data practices across all of Helio’s data pipelines.

Some of the technologies we leverage are Python 3, PySpark, AWS, Docker, Kubernetes, Postgres, Airflow, Jenkins and Git.

We are looking for a Senior Data Engineer to help us design and develop data pipelines to ingest, validate, extract, and normalize data across new and existing sources. The ideal candidate will be self-driven and comfortable balancing progress towards a longer-term roadmap with maintaining context and stability across a dynamic set of existing data sources. While this is an individual contributor role, we are looking for someone who is excited and willing to mentor junior engineers.

CircleUp has been named one of the Top 5 Most Disruptive Companies in Finance by CNBC, one of the 50 Best FinTech Innovators by KPMG, one of the Top 3 Most Innovative Companies within Data Science by Fast Company, and one of America's Most Promising Companies by Forbes. We are backed by top-tier investors including Google Ventures, Union Square Ventures (backers of Etsy/Kickstarter), and the former CEOs/Presidents of Goldman Sachs, Morgan Stanley, Thomson Reuters, the Stanford Endowment, and Capital One.

Responsibilities

  • Provide senior-level contribution to the design, implementation and maintenance of complex data pipelines
  • Build reliable services for gathering & ingesting data from a wide variety of sources
  • Build performant and reliable data pipelines to validate, extract and normalize data from a wide variety of sources
  • Develop strategy, tools, and workflow for integrations and ingestion of data
  • Collaborate with cross-functional teams and stakeholders to understand data needs
  • Write quality, maintainable code with extensive test coverage in a fast-paced, agile software engineering environment
  • Mentor junior teammates and lead by example in demonstrating software engineering best practices

Requirements

  • Hold a B.S. or M.S. in Computer Science, or equivalent degree
  • 5-7+ years of proven working experience as a data engineer
  • Excellent software engineering skills and strong fundamentals in algorithms, data structures, predictive modeling and big data concepts
  • Strong programming fundamentals and proficiency in an object-oriented language such as Python or Scala
  • Excellent communication skills to collaborate with stakeholders in engineering, data science, and product
  • Experience with our stack (Python, PySpark, Airflow, AWS ecosystem) is preferred but not required
  • Experience building large-scale and complex data processing pipelines
  • A successful history of manipulating, processing and extracting value from large disconnected datasets
  • Strong analytical skills related to working with unstructured datasets

Useful Traits for this Role

  • Communication, both technical and business-level, especially with external contractors
  • Detail-oriented, with business sense and the ability to manage ambiguity; able to synthesize detailed schema specifications from a newly identified source
  • Ability to understand, maintain, document, and be knowledgeable about a large variety of data sources; able to deal with a certain level of reactive context-switching
  • Proactive and driven; will identify gaps in our data model and will proactively work to improve it