What To Expect
Join Tesla’s Dojo Performance Team to design and optimize cutting-edge system-level simulation frameworks for AI accelerators. You will simulate the performance of thousands of Dojo compute nodes operating in parallel to drive state-of-the-art machine learning (ML) workloads. This role centers on modeling large-scale AI training systems to evaluate the performance of new kernels and mapping strategies. By analyzing trade-offs between memory, compute, and communication across system resources, you will help push the boundaries of AI performance and efficiency.
What You'll Do
- Develop system-level simulation frameworks to model the performance of massively parallel AI accelerators, including compute distribution, memory hierarchy, interconnects, and dataflow.
- Simulate and analyze how large-scale ML workloads, from Full Self-Driving (FSD) to large language models (LLMs), are mapped and executed across thousands of Dojo compute nodes.
- Collaborate with ML architects, kernel developers, and system engineers to ensure simulations reflect real-world AI training requirements.
- Design and implement tests to evaluate trade-offs in system resources, including memory bandwidth, capacity, latency, and compute, to optimize performance for large-scale AI workloads.
- Build and maintain software tools and frameworks to support simulation development, testing, and integration.
- Conduct performance analysis to identify bottlenecks and propose system-level optimizations.
- Stay current with advancements in ML model architectures, parallel computing, and system-level simulation techniques.
- Participate in code reviews, debugging, and testing to ensure robust and scalable simulation frameworks.
What You'll Bring
- Degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience demonstrating exceptional skill.
- Strong proficiency in C++ for developing high-performance simulation frameworks.
- Solid understanding of ML/deep learning model architectures, including how models are partitioned and mapped across multiple devices.
- Good understanding of compute architecture, memory hierarchies, and dataflows.
- Experience in system-level simulation, parallel computing, or ML workload optimization.
- Knowledge of kernel development processes and how ML workloads are deployed on hardware accelerators.
- Familiarity with analytical simulation techniques for modeling high-level system behavior.
- Excellent problem-solving skills, with the ability to analyze complex systems and propose innovative solutions.
- Strong communication and collaboration skills to work effectively with cross-functional teams, including ML researchers, architects, and engineers.
- Ability to work onsite in our Palo Alto, CA office.
Compensation and Benefits
Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits from day 1:
- Aetna PPO and HSA plans, including 2 medical plan options with a $0 payroll deduction
- Family-building, fertility, adoption, and surrogacy benefits
- Dental (including orthodontic coverage) and vision plans, both with $0 paycheck contribution options
- Company Paid HSA Contribution when enrolled in the High Deductible Aetna medical plan with HSA
- Healthcare and Dependent Care Flexible Spending Accounts (FSA)
- 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
- Company paid Basic Life, AD&D, short-term and long-term disability insurance
- Employee Assistance Program
- Sick and Vacation time (Flex time for salary positions), and Paid Holidays
- Back-up childcare and parenting support resources
- Voluntary benefits: critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
- Weight Loss and Tobacco Cessation Programs
- Tesla Babies program
- Commuter benefits
- Employee discounts and perks program
Expected Compensation
- $132,000 - $330,000 per year + cash and stock awards + benefits
- Pay may vary depending on location, skills, experience, and other factors.
Summary
This role involves developing advanced simulation frameworks to improve AI hardware performance at Tesla, requiring expertise in C++, ML architectures, system simulation, and parallel computing.