Job Description: Research Lead for Speech, Audio, and Conversational AI

We are seeking a highly skilled and experienced Research Lead specializing in Speech, Audio, and Conversational AI to join our innovative team.

Role Overview

In this role, you will:

Spearhead research and development of cutting-edge technologies in speech processing, text-to-speech (TTS), audio analysis, and real-time conversational AI.
Push the boundaries of automatic speech recognition (ASR), speaker identification, diarization, speech synthesis, voice cloning, dubbing, and audio generation.

Key Responsibilities

Develop advanced Audio Language Models and Speech Language Models by building on state-of-the-art audio/speech models and Large Language Models.
Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models.
Design and implement low-latency, end-to-end models with multilingual speech/audio as input and output.
Conduct experiments to evaluate and improve model performance, focusing on accuracy, naturalness, efficiency, and real-time capabilities across multiple languages.
Stay at the forefront of advancements and incorporate new techniques into foundation models.
Collaborate with cross-functional teams for integration into products.
Publish research findings at top-tier conferences/journals (e.g., INTERSPEECH, ICASSP, ICLR, ICML, NeurIPS).
Mentor junior researchers and engineers.
Promote best practices including rigorous testing, documentation, and ethical considerations.

Qualifications

Ph.D. in Computer Science, Electrical Engineering, or a related field, with a focus on speech processing, audio analysis, and machine learning.
Experience training speech/audio models such as W2V-BERT, SONAR, AST, HiFi-GAN, VQ-GAN, AudioLDM, and SeamlessM4T.
Proficiency with Audio Language Models such as AudioPaLM, Moshi, and SeamlessM4T.
Proven ability to develop neural network architectures such as Transformers, Mixture of Experts, Diffusion Models, and State Space Models (e.g., Mamba, Samba).
Experience in optimizing models for low-latency, real-time applications.
Strong background in multilingual speech recognition, voice cloning, dubbing, and synthesis.
Skills in deep learning frameworks such as TensorFlow and PyTorch.
Experience in deploying large-scale speech/audio models.
High-performance computing skills in Python, C/C++, CUDA, and kernel-level programming.
Familiarity with audio signal processing techniques.

Job Highlights

Ph.D. in relevant fields.
Experience with speech/audio models for representation and generation.
Expertise in Audio Language Models.
Proven success with neural network architectures.
Experience in real-time application optimization.
Multilingual speech recognition and voice synthesis skills.
Competency in deep learning frameworks and deploying large models.
Proficiency in high-performance computing and signal processing techniques.

Join Our Team

Be a leading innovator in Speech and Audio AI technology with us!

Job Highlights

Ph.D. in Computer Science, Electrical Engineering, or a related field focusing on speech processing and machine learning.
Extensive experience training and deploying speech/audio models such as W2V-BERT, SONAR, AST, HiFi-GAN, VQ-GAN, and AudioLDM.
Expertise in Audio Language Models such as AudioPaLM, Moshi, and SeamlessM4T.
Proven track record with advanced neural network architectures, including Transformers, Diffusion Models, and State Space Models.
Specialized in low-latency, real-time applications and multilingual speech recognition.
Proficiency in deep learning frameworks like TensorFlow and PyTorch.
Skilled in high-performance computing involving Python, C/C++, and CUDA.
Strong background in audio signal processing techniques.

Join us to lead innovative projects in Speech and Audio AI!
