Senior ML Infra Engineer
Engineering (SW)
on site: New York City
added Fri Aug 25, 2023

About The Role

As an ML Infrastructure Engineer at Normal Computing, you will play a crucial role in shaping our full-stack AI platform by designing, building, and maintaining scalable machine learning infrastructure for the development, training, and deployment of frontier machine learning techniques and algorithms.

Depending on interest and skills, responsibilities could include:

  • Collaborating closely with ML research scientists and engineers to optimize and productionize pipelines and workflows, ensuring efficiency, best practices, and effective resource utilization.

  • Implementing tools, libraries and frameworks to speed up and enable new research.

  • Collaborating closely with Thermodynamic Hardware scientists and engineers to integrate a novel simulation stack and compilation engine.

  • Planning and performing rapid prototyping of machine learning techniques applied to real-world scientific and enterprise problems.

  • Improving model architectures and training, simulation, and compilation procedures.

  • Reporting and presenting software developments, experimental results, and analysis clearly and efficiently.

  • Contributing to existing documentation or educational content (e.g. blog posts or talks) and adapting content based on product/program updates and user feedback.

  • Staying up to date with the latest industry trends and technologies, driving continuous improvement and innovation within our ML platform.

  • Mentoring and guiding junior colleagues, nurturing a collaborative, growth-oriented environment that promotes knowledge sharing and professional development.

Qualifications:

  • Bachelor's degree or higher in Computer Science, Engineering, or a related field.

  • 3+ years of experience in infrastructure engineering, with a focus on machine learning, distributed systems, and cloud computing.

  • Experience with at least one programming language (preference for those commonly used in ML or scientific computing such as Python or C++).

  • Experience using TensorFlow, PyTorch, JAX, NumPy, Pandas, or similar ML/scientific libraries.

  • Leadership and collaboration qualities, and enthusiasm for real-world, responsible impact.

  • Excellent problem-solving skills and a proven ability to troubleshoot and optimize complex systems.

  • Strong written and verbal communication skills, with the ability to explain complex concepts to both technical and non-technical stakeholders.

Preferred qualifications:

  • Knowledge of containerization technologies (Docker, Kubernetes) and a cloud platform such as GCP, AWS, or Azure.

  • Familiarity with ML infrastructure tools and technologies, such as Ray, MLflow, Kubeflow, Flyte, or similar platforms.

  • Strong understanding of CI/CD pipelines, infrastructure-as-code (Terraform, CloudFormation), and configuration management tools (Ansible, Puppet, Chef).

  • Experience with big data technologies such as Hadoop, Spark, or Flink.

  • Familiarity with data storage and processing systems (SQL/NoSQL, Kafka).

  • A passion for staying up-to-date with the latest advancements in AI, ML, and infrastructure technologies.

In addition, the following would be a significant advantage:

  • Applied experience with machine learning, preferably modern deep learning architectures (e.g. Transformers, CNNs, vision-language models, deep reinforcement learning).

  • Experience with machine learning training objectives beyond accuracy (e.g. Bayesian learning, meta-learning, value-at-risk, robustness, distribution shift, class imbalance, fairness).

  • Experience with large-scale Bayesian modeling and inference.

  • Comfort with probabilistic programming languages (e.g. TensorFlow Probability).

  • Experience in cross-functional collaboration with research and product teams.

Additional Information:

Normal Computing values diverse perspectives and experience. We encourage you to apply to this role if you feel you would be a good fit, even if you do not meet all of the requirements listed.

This role will receive a competitive salary + benefits + equity.

NY Est. Base Annual Range: $150,000-$190,000

A variety of factors are considered when determining someone’s compensation, including a candidate’s professional background, experience, and location. The final offer amount may vary from the amount listed above.

Normal Computing is an equal opportunity employer. We are committed to building a diverse and inclusive workforce and do not discriminate based on race, religion, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, age, sexual orientation, veteran or military status, or any other legally protected characteristics. Normal Computing is committed to providing reasonable accommodations for candidates with disabilities who need assistance during the hiring process. To request a reasonable accommodation, please email accomodations@normalcomputing.ai

Normal is a New York-based deep-tech startup founded by former engineers from Google Brain, Alphabet X, and Palantir. Our investors include Celesta Capital, First Spark Ventures, Micron and former Google CEO Eric Schmidt. We engage with enterprise companies across various industries, including services, manufacturing, and the public sector, to deliver cutting-edge AI solutions.

We are on a mission to make AI universally scalable and useful.

Our products serve as critical full-stack infrastructure for our enterprise users in deploying AI into high-stakes applications. We are addressing the challenges of reliability, adaptivity, and auditability, which have traditionally been central barriers to adoption.

We believe that AI's potential to create transformative value has yet to be fully realized. Thus far, AI has been held back by technological limitations such as unpredictable factual errors in generative AI (known as hallucinations) and the lack of auditability of black-box models. These limitations restrict the application of AI primarily to consumer-grade, low-stakes generative AI workflows and basic pattern recognition systems.

We envision that true transformative value can be unlocked in enterprise-grade, high-stakes AI workflows, where AI can reason reliably and autonomously, and understand its own limits. In these contexts, AI has the capacity to drive meaningful outcomes with real, complex impact.

Our approach involves redesigning AI systems from the ground up, in contrast to surface-level approaches. Our AI application development platform is powered by novel full-stack probabilistic machine learning infrastructure driven by thermodynamic physics. With Normal's probabilistic AI, we offer unprecedented control over the reliability, adaptivity, and auditability of AI models, specifically tailored for critical and customer-specific enterprise workflows.

As we forge ahead with our mission, we are seeking passionate individuals eager to collaborate with our uniquely diverse and interdisciplinary team, and motivated by a workplace where the hardest problems remain to be solved.