What To Expect
Join Tesla’s Dojo Performance Team to design and optimize cutting-edge system-level simulation frameworks for AI accelerators. You will simulate the performance of thousands of Dojo compute nodes operating in parallel to drive state-of-the-art machine learning (ML) workloads. This role centers on modeling large-scale AI training systems to evaluate the performance of new kernels and mapping strategies. By analyzing trade-offs between memory, compute, and communication across system resources, you will help push the boundaries of AI performance and efficiency.
What You'll Do
- Develop system-level simulation frameworks to model the performance of massively parallel AI accelerators, including compute distribution, memory hierarchy, interconnects, and dataflow.
- Simulate and analyze how large-scale ML workloads, from Full Self-Driving (FSD) to large language models (LLMs), are mapped and executed across thousands of Dojo compute nodes.
- Collaborate with ML architects, kernel developers, and system engineers to ensure simulations reflect real-world AI training requirements.
- Design and implement tests to evaluate trade-offs in system resources, including memory bandwidth, capacity, latency, and compute, to optimize performance for large-scale AI workloads.
- Build and maintain software tools and frameworks to support simulation development, testing, and integration.
- Conduct performance analysis to identify bottlenecks and propose system-level optimizations.
- Stay current with advancements in ML model architectures, parallel computing, and system-level simulation techniques.
- Participate in code reviews, debugging, and testing to ensure robust and scalable simulation frameworks.
What You'll Bring
- Degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience demonstrating exceptional skill.
- Strong proficiency in C++ for developing high-performance simulation frameworks.
- Solid understanding of ML/deep learning model architectures, including how models are partitioned and mapped across multiple devices.
- Good understanding of compute architecture, memory hierarchies, and dataflows.
- Experience in system-level simulation, parallel computing, or ML workload optimization.
- Knowledge of kernel development processes and how ML workloads are deployed on hardware accelerators.
- Familiarity with analytical simulation techniques for modeling high-level system behavior.
- Excellent problem-solving skills, with the ability to analyze complex systems and propose innovative solutions.
- Strong communication and collaboration skills to work effectively with cross-functional teams, including ML researchers, architects, and engineers.
- Ability to work onsite in our Palo Alto, CA office.
Compensation and Benefits
Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits from day 1:
- Aetna PPO and HSA plans, including 2 medical plan options with a $0 payroll deduction
- Family-building, fertility, adoption, and surrogacy benefits
- Dental (including orthodontic coverage) and vision plans, both with $0 paycheck contribution options
- Company Paid HSA Contribution when enrolled in the High Deductible Aetna medical plan with HSA
- Healthcare and Dependent Care Flexible Spending Accounts (FSA)
- 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
- Company paid Basic Life, AD&D, short-term and long-term disability insurance
- Employee Assistance Program
- Sick and Vacation time (Flex time for salary positions), and Paid Holidays
- Back-up childcare and parenting support resources
- Voluntary benefits: critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
- Weight Loss and Tobacco Cessation Programs
- Tesla Babies program
- Commuter benefits
- Employee discounts and perks program
Expected Compensation
- $132,000 - $330,000 per year + cash and stock awards + benefits
- Pay may vary depending on location, skills, experience, and other factors.
Summary
This role involves developing advanced simulation frameworks to improve AI hardware performance at Tesla, requiring expertise in C++, ML architectures, system simulation, and parallel computing.