Description
This is us
Kaltura’s (NYSE:KLTR) mission is to power any video experience for any organization – live, on-demand, or real-time. We not only want to make using video simpler; we also want to better people’s lives through video. Founded in 2006, Kaltura is now a global leader in the video market, with millions of people using our products daily to teach, learn, watch, connect, and collaborate. Among our customers, you’ll find more than 1,000 well-known global organizations.
More than 15 years after founding the company, we continue to foster a diverse and collaborative work environment where everyone gets a say. Our team is currently 700+ people, and we’re still growing. We have offices in New York, London, Singapore, and Tel Aviv, but our technology is all in the cloud.
Kaltura has a fast-paced environment where initiative is always encouraged. Together with our hybrid work model and flexible state of mind, you get the right conditions for creative juices to flow freely. Thanks to our long line of products, our cultivation of a rich, collaborative culture, and our care for each Kalturian, you’ll never run out of room to grow and evolve.
If you don't meet 100% of the requirements below, that's okay; nobody's perfect! We believe in hiring people, not just a list of skills. We encourage you to apply if you think this is a role that would make you excited about coming to work every day.
Requirements
The role
We are looking for a brilliant AI Research Engineer to build the brain and body of our Real-Time Avatar & Conversational Stack. This is a hands-on, deep-tech role where you will design, train, and optimize the next generation of Multimodal AI Models. You will join an elite R&D unit, working at the bleeding edge of Generative Video, Speech Synthesis, and Large Language Models. Your mission is to solve one of the hardest problems in AI: creating a unified, ultra-low-latency agent that can see, hear, and speak with human-level fidelity. You won't just implement papers; you will architect the systems that define the state-of-the-art for enterprise video.
The day-to-day
- Collaborate with Technical Leadership: Partner directly with the Head of AI to architect the long-term research roadmap. You will work shoulder-to-shoulder with other AI Research Engineers, brainstorming novel architectures and conducting peer reviews to push the collective intelligence of the team.
- Master Multimodal Architectures: Research and train large-scale models that fuse Video Generation (pixels), Audio (speech/prosody), and Text (semantics) into a cohesive experience.
- Next-Gen Video Synthesis: Develop and optimize advanced architectures—specifically Diffusion Transformers (DiT) and modern GANs—for photorealistic avatar synthesis, focusing on lip-sync accuracy and temporal consistency.
- Conquer Real-Time Constraints: Tackle the challenge of "in-the-wild" inference. You will optimize heavy foundation models to run within strict millisecond latency budgets, ensuring fluid, uninterrupted conversation.
- Advance the Speech Stack: Enhance our proprietary Streaming ASR and Neural TTS architectures to handle interruptions, emotional intonation, and multi-speaker dynamics seamlessly.
Ideally, we’re looking for:
- 5+ years of experience in Deep Learning research and engineering, with a strong track record of bringing research concepts to production.
- Advanced Academic Background: M.Sc. or Ph.D. in Computer Science, AI, or a related field, with a focus on Generative Models or Computer Vision.
- Generative Media Expertise: Deep understanding of modern architectures (Transformers, Diffusion, GANs) applied to video synthesis, neural rendering, or audio generation.
- Strong Engineering Skills: Proficiency in Python and deep learning frameworks (PyTorch is preferred), with the ability to write clean, modular, and scalable code.
- Inference Optimization: Experience optimizing models for low-latency real-time inference (e.g., Quantization, TensorRT, ONNX).
These would also be nice:
- Top-Tier Publications: A record of published papers in major AI conferences (CVPR, NeurIPS, ICCV, etc.).
- Low-Level Optimization: Experience with CUDA or C++ for maximizing GPU performance.
- Streaming Knowledge: Familiarity with real-time media protocols like WebRTC.
The perks:
- Hybrid, flexible work environment
- Extended private health insurance (including mental health coverage)
- Personal and professional development programs
- Occasional cross-company long weekends