Software Engineer, Data

San Francisco, CA

CircleUp harnesses the power of machine learning and predictive analytics to discover some of the fastest-growing companies in the consumer & retail sector. Our mission is to help entrepreneurs thrive by giving them the resources and capital they need. We are building a predictive data system called "Helio" to bring the data-driven revolution that has occurred in the public markets to the private markets, starting with consumer & retail.

We are working on challenging problems in information retrieval, entity resolution, and machine learning. We are developing an in-depth knowledge graph of all private companies by mining vast amounts of data to successfully rewrite the rules on how private companies are evaluated.

CircleUp has been named one of the Top 5 Most Disruptive Companies in Finance by CNBC, one of the 50 Best FinTech Innovators by KPMG, one of the Top 3 Most Innovative Companies within Data Science by Fast Company, and one of America's Most Promising Companies by Forbes. We are backed by top-tier investors including Google Ventures, Union Square Ventures (backers of Etsy/Kickstarter), and the former CEOs/Presidents of Goldman Sachs, Morgan Stanley, Thomson Reuters, the Stanford Endowment, and Capital One.


  • Build and maintain performant and reliable data pipelines in service of machine learning and predictive analytics.
  • Build and maintain systems to normalize and aggregate data across a wide variety of sources.
  • Build and maintain automated processing pipelines and big data infrastructure to operate and configure production clusters and data orchestration services.


  • Have a B.S., M.S. or Ph.D. in Computer Science or equivalent degree and work experience
  • 6+ years of data-focused software engineering experience, professional or open source
  • Strong communication skills to facilitate working closely with data scientists, product managers, and the business team
  • Excellent software engineering skills and strong fundamentals in algorithms, data structures, system design, and big data concepts
  • Strong experience with distributed systems
  • Significant experience with our stack (Python 3, pandas, scikit-learn, Spark, Airflow, Docker, AWS ecosystem) or a comparable stack

Example projects we face:

  • Embed with the data science team to help bring ML models into production. This includes understanding the model training and feature engineering process, identifying performance bottlenecks and suggesting alternatives, applying software engineering best practices, and helping define broader validation and verification strategies.
  • Build an ML model evaluation system to help us catch training and scoring anomalies.
  • Build data pipelines for new sources. Work through normalization and interpolation challenges to enable smooth integration with the other data assets that power Helio.
  • Design new data access patterns, or improve existing ones, so that other engineers, data scientists, and business users can more easily derive value from our data repository.
