- SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
- Char2Wav: End-to-End Speech Synthesis
- ObamaNet: Photo-realistic Lip-sync from Text
- MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
- Chunked Autoregressive GAN for Conditional Waveform Synthesis
- Wav2CLIP: Learning Robust Audio Representations From CLIP
Responsibilities
- Own the research process end to end, from problem definition through running and analyzing concrete research experiments.
- Collaborate and communicate clearly and efficiently with the rest of the team about the status, results, and challenges of your current tasks.
- Contribute to designing the company's research roadmap.
- Train and mentor other members of the team.
- Own the research function of specific product features.
Challenges
- Using deep learning (including but not limited to NLP, speech processing, computer vision, etc.) to solve problems for media creation and editing.
- Creating realistic voice doubles using only a few minutes of audio.
- Creating tools to synthesize photo-realistic videos that match our Overdub (personalized speech synthesis) feature.
- Designing and developing new algorithms for media synthesis, anomaly detection, speech recognition, speech enhancement, filler word detection, audio and video tagging etc.
- Coming up with new research directions to improve our product.
Requirements
- Proven experience in designing and implementing deep learning algorithms.
- PhD or Master’s degree with a specialization in deep learning, or equivalent experience.
- Track record of developing new ideas in machine learning, as demonstrated by one or more first author publications or projects.
- Good programming skills and experience with deep learning frameworks.
- Ability to generate more ideas than you can implement.
- You implement ideas quickly and efficiently: once the experimental setup is established, you can implement and evaluate many ideas per day, organizing your time to stay productive.
- You wish you had more GPUs to run all the experiments you have in mind!
- You know PyTorch/TensorFlow inside out.
- We do not require domain-specific knowledge in computer vision or speech processing.
- Lead author of an accepted publication at one of the top conferences: ICLR, ICML, NeurIPS, ICASSP, ICCV, CVPR, Interspeech, etc.
- Played a key role in shipping a production feature that relies on deep learning as a core component.
Benefits include a generous healthcare package, catered lunches, and flexible vacation time. We currently have offices in San Francisco and Montreal, and are open to folks working remotely between PT and ET time zones. Whether you love WFH or can’t wait to get back to being in person, we're interested in offering an environment that works for you.
Descript is an equal opportunity workplace—we are dedicated to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. We believe in actively building a team rich in diverse backgrounds, experiences, and opinions to better allow our employees, products, and community to thrive.
Descript is building a simple, intuitive, fully powered editing tool for video and audio, built for the age of AI. We are a team of 125, with a proven CEO and the backing of some of the world's greatest investors (OpenAI, Andreessen Horowitz, Redpoint Ventures, Spark Capital).
Descript is that rare company with both product-market fit and the raw materials for growth (a passionate user community, a great product, a large market), yet still early enough that each new employee has a measurable influence on the direction of the company.