Software Engineer, Data Pipelines

San Francisco, CA

CircleUp harnesses the power of machine learning and predictive analytics to discover some of the fastest-growing companies in the consumer & retail sector. Our mission is to help entrepreneurs thrive by giving them the resources and capital they need. We are building a predictive data system called "Helio" to bring the data-driven revolution that has occurred in the public markets to the private markets, starting with consumer & retail.

We are working on challenging problems in information retrieval, entity resolution, and machine learning. We are developing an in-depth knowledge graph of all private companies by mining vast amounts of data to successfully rewrite the rules on how private companies are evaluated.

CircleUp was recently honored as one of Fast Company's Top 10 Most Innovative Companies in Data Science. It has also been named to the CB Insights FinTech 250 and the Forbes FinTech 50, and recognized by CNBC as a Top 5 Most Disruptive Company in Finance. Founded in 2012, CircleUp is headquartered in San Francisco and backed by Union Square Ventures, GV, Canaan Partners, QED Advisors, and others. Learn more at www.circleup.com

Responsibilities:

  • Build and maintain performant and reliable data pipelines in service of machine learning and predictive analytics.
  • Build and maintain systems to normalize and aggregate data across a wide variety of sources.
  • Build and maintain automated processing pipelines and big data infrastructure to operate and configure production clusters and data orchestration services.

Requirements:

  • B.S., M.S., or Ph.D. in Computer Science, or an equivalent degree and work experience
  • 6+ years of data-focused software engineering experience, professional or open source
  • Strong communication skills to facilitate working closely with data scientists, product managers, and the business team
  • Excellent software engineering skills and strong fundamentals in algorithms, data structures, system design, and big data concepts
  • Strong experience with distributed systems
  • Significant experience with our stack (Python 3, pandas, scikit-learn, Spark, Airflow, Docker, AWS ecosystem) or a comparable stack

Example projects:

  • Embed with the data science team to help bring ML models into production. This includes understanding the model training and feature engineering process, identifying performance bottlenecks and suggesting alternatives, applying software engineering best practices, and helping define broader validation and verification strategies.
  • Build an ML model evaluation system to help us catch training and scoring anomalies.
  • Build data pipelines for new sources. Work through normalization and interpolation challenges to enable smooth integration with the other data assets that power Helio.
  • Design new data access patterns, or improve existing ones, so that other engineers, data scientists, and business users can more easily derive value from our data repository.