Requirements
* If you thrive in fast-moving startup environments, enjoy experimenting with new ideas, and love seeing your work come to life in production, you’ll feel right at home
* A PhD (or near completion) in a relevant field, or equivalent research experience
* Hands-on experience with Large Multimodal Models and a strong foundation in generative (language) models, for example in tasks such as VQA, audio/video understanding, captioning, behavioral analysis, translation, or speech-to-speech systems
* Experience in fine-tuning/adapting VLMs for control, conditioning, or downstream tasks
* Solid background in deep learning and foundation models
* Strong PyTorch skills and comfort building deep learning pipelines
* (Desirable) Knowledge of large-scale model training and optimization
* (Desirable) Experience with duplex conversational models
* (Desirable) Broader understanding of generative AI across modalities
* (Desirable) Exposure to software development best practices
* (Desirable) A flexible, experimental mindset, i.e. comfort working across research and engineering
* (Bonus) Publications at EMNLP, COLING, NeurIPS, ICLR, CVPR, or ICCV
What the job involves
* We’re looking for an AI Researcher to join our core AI team and push the boundaries of Foundation Multimodal Conversational Models
* Conduct research on Large Multimodal Models in the context of Conversational Avatars (e.g. Neural Avatars, Talking-Heads)
* Develop methods to model both verbal and non-verbal aspects of conversation, adapting and controlling avatar behavior in real time with low latency
* Experiment with fine-tuning, adaptation, and conditioning techniques to make audio-visual multimodal models more expressive, controllable, and task-specific
* Partner with the Applied ML team to take research from prototype to production
* Stay up to date with cutting-edge advancements and help define what comes next