Job Description
Introduction
Nexa AI is an on-device AI research and deployment company. We specialize in:
- Tiny, multimodal models (e.g., Octopus v2, OmniVLM, OmniAudio)
- Local on-device inference framework (e.g., nexa-sdk)
- Model optimization techniques (e.g., NexaQuant)
Our work has been recognized by industry leaders like Google, Hugging Face, AMD, and more. We partner with enterprises and SMBs to bring local intelligence to every device.
Responsibilities
- Build on-device ML infrastructure at scale
- Assist in developing and optimizing Large Language Models (LLMs) for on-device deployment
- Support on-device AI research efforts
- Contribute to the development of our SDKs across multiple platforms, including Windows, macOS, Android, iOS, and Linux
Candidate Requirements
You May Be a Good Fit If You:
- Hold a minimum of a BS or MS in Computer Science
- Are familiar with PyTorch
- Have an excellent understanding of computer science fundamentals, including data structures, algorithms, and coding
- Possess knowledge of operating system internals, compilers, and low-power/mobile optimization
- Have experience with low-level programming in C and frameworks such as CUDA and OpenCL
- Are proficient in multithreading and performance optimization
Logistics
- Part-Time: Remote, 20+ hours/week
- Full-Time: Based in Cupertino, California
How To Apply
Send your resume to career@nexa4ai.com