Job Description
Company Overview
At Scale, we're at the forefront of AI innovation, powering the world's most advanced Large Language Models (LLMs), generative models, and computer vision models. Trusted by industry leaders like OpenAI, Meta, Microsoft, and government agencies, we aim to accelerate the transition from traditional software to AI.
Our Mission
Fast-track the adoption of AI across industries by transforming how organizations build and deploy AI applications.
The Role: Machine Learning Platform Engineer
As part of the RLXF team, you'll develop and optimize our internal distributed framework for large language model training and inference, supporting research and development across Scale.
Key Responsibilities
- Build, profile, and optimize training and inference frameworks
- Collaborate with ML teams to accelerate research and model development
- Research and integrate cutting-edge ML technologies
- Support next-generation LLM training, inference, and data curation
Ideal Candidate Skills
- Passion for system optimization and large-scale distributed ML systems
- Experience with multi-node LLM training and inference
- Strong software engineering skills with CUDA, PyTorch, transformers, Flash Attention, and related tools
- Strong communication skills in a cross-functional environment
- Expertise in post-training methods & newer use cases for LLMs (instruction tuning, RLHF, reasoning, agents, multimodal)
Compensation & Benefits
- Salary Range: $200,800 to $251,000 USD (dependent on location & experience)
- Equity-based compensation (subject to Board approval)
- Benefits include:
  - Health, dental, and vision insurance
  - Retirement plans
  - Learning & development stipend
  - Generous PTO
  - Possible commuter stipend
Additional Information
- Location: San Francisco, New York, Seattle
- 90-day policy for role reconsideration
- Commitment to diversity and inclusivity in hiring
- Reasonable accommodations available for applicants with disabilities
About Us
Our goal is to accelerate AI development and deployment, helping organizations leverage AI efficiently and responsibly.