Software Engineer (Machine Learning)
Engineering – Libraries
hybrid: San Francisco, CA
Salary range $170,112 - $237,000
added Tue Jul 04, 2023
About the role:
We're looking for passionate, motivated people who are excited to build infrastructure and tooling for next generation machine learning applications. We're hiring exceptional Software Engineers for the distributed training team at Anyscale, which is responsible for building and maintaining open source machine learning libraries widely adopted across the industry.
We are particularly looking for Senior, Staff, and above candidates who can help shape and execute on a vision for the future of machine learning training infrastructure. We are open to both Individual Contributors and people who are primarily technical but have prior experience managing a small team.
About the Distributed Training team:
The team's mission is to make it really easy to do distributed machine learning on Ray and Anyscale. Specifically, our team maintains and develops features for a broad set of libraries, including Ray Train (distributed deep learning), Ray Tune (distributed hyperparameter tuning), and XGBoost-on-Ray.
Our team is the most user-facing engineering team on the open source side, collaborating with ML engineering teams at organizations like Shopify, Uber, and ByteDance.

As part of this role, you will:

  • Build performant, scalable, fault-tolerant distributed machine learning libraries that power the next generation of machine learning platforms around the world
  • Work on difficult architectural problems and turn the solutions into reality
  • Work with a team of leading experts in the areas of distributed systems and machine learning
  • Work with engineering managers and product managers to lead and grow an extremely talented team of software engineers
  • Work closely with the open source community (ML researchers, ML engineers, data scientists) to scope and build new abstractions for scalable machine learning
  • Work closely with end users and iterate on the product with them
  • Help us build and shape a world-class company

We'd love to hear from you if you have:

  • 5+ years of building, scaling and maintaining software systems in production environments
  • Solid fundamentals in algorithms, data structures, and system design
  • Experience with machine learning frameworks and libraries (PyTorch, TensorFlow)
  • Experience designing fault-tolerant distributed systems
  • Strong architectural skills

Bonus points!

  • Experience working with a cloud technology stack (AWS, GCP, Kubernetes)
  • Experience building machine learning training pipelines or inference services in a production setting
  • Experience with managing and maintaining open source machine learning libraries
  • Experience managing small teams in pursuit of an ambitious technical goal
  • Experience using Ray

Compensation

  • At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. The target salary for this role is $170,112 to $237,000. As the market data changes over time, the target salary for this role may be adjusted.
This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including the following:
  • Stock Options
  • Healthcare plans, with premiums covered by Anyscale at 99%
  • 401k Retirement Plan
  • Wellness stipend
  • Education stipend
  • Paid Parental Leave
  • Flexible Time Off
  • Commute reimbursement
  • 100% of in-office meals covered
Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.
At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

We're a San Francisco-based company, proud to be backed by $250+ million from top-tier investors like Andreessen Horowitz, NEA, and Addition.