Job Description: Research Lead for Speech, Audio, and Conversational AI
We are seeking a highly skilled and experienced Research Lead specializing in Speech, Audio, and Conversational AI to join our innovative team.
Role Overview
In this role, you will:
- Spearhead research and development of cutting-edge technologies in speech processing, text-to-speech (TTS), audio analysis, and real-time conversational AI.
- Push the boundaries of automatic speech recognition (ASR), speaker identification, diarization, speech synthesis, voice cloning, dubbing, and audio generation.
Key Responsibilities
- Develop advanced Audio Language Models and Speech Language Models that build on state-of-the-art audio/speech models and Large Language Models.
- Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models.
- Design and implement low-latency, end-to-end models with multilingual speech/audio as input and output.
- Conduct experiments to evaluate and improve model performance, focusing on accuracy, naturalness, efficiency, and real-time capabilities across multiple languages.
- Stay at the forefront of advancements in speech and audio AI and incorporate new techniques into foundation models.
- Collaborate with cross-functional teams to integrate research into products.
- Publish research findings at top-tier conferences/journals (e.g., INTERSPEECH, ICASSP, ICLR, ICML, NeurIPS).
- Mentor junior researchers and engineers.
- Promote best practices including rigorous testing, documentation, and ethical considerations.
Qualifications
- Ph.D. in Computer Science, Electrical Engineering, or a related field, with a focus on speech processing, audio analysis, and machine learning.
- Experience with training speech/audio models such as W2V-BERT, SONAR, AST, HiFi-GAN, VQ-GAN, AudioLDM, and SeamlessM4T.
- Proficiency with Audio Language Models such as AudioPaLM, Moshi, and SeamlessM4T.
- Proven ability in developing neural network architectures such as Transformers, Mixture of Experts, Diffusion Models, and State Space Models (e.g., Mamba, Samba).
- Experience in optimizing models for low-latency, real-time applications.
- Strong background in multilingual speech recognition, voice cloning, dubbing, and synthesis.
- Skills in deep learning frameworks such as TensorFlow and PyTorch.
- Experience in deploying large-scale speech/audio models.
- High-performance computing skills in Python, C/C++, CUDA, and kernel-level programming.
- Familiarity with audio signal processing techniques.
Job Highlights
- Ph.D. in relevant fields.
- Experience with speech/audio models for representation and generation.
- Expertise in Audio Language Models.
- Proven success with neural network architectures.
- Experience in real-time application optimization.
- Multilingual speech recognition and voice synthesis skills.
- Competency in deep learning frameworks and deploying large models.
- Proficiency in high-performance computing and signal processing techniques.
Join Our Team
Be a leading innovator in Speech and Audio AI technology with us!