The Globalization Data Science and Engineering team is at the forefront of removing language barriers and providing a stellar experience to all our members, regardless of their language preferences. We are responsible for the translation and cultural adaptation of all aspects of member interaction, including beautiful localized user interfaces, subtitles, and dubbing of award-winning Netflix originals.
We are looking for an experienced Machine Learning Engineer with deep expertise in training and inference efficiency for Large Language Models (LLMs), Multimodal LLMs, and other media ML models. In this rare opportunity, you will design and build systems and infrastructure that make LLM training and inference faster, more scalable, and more reliable across a diverse global catalog and workload. You will partner with a talented cross-functional team of scientists, engineers, product managers, and domain experts to deliver business impact through efficient, production-ready ML solutions.
Responsibilities
Design and build scalable training and inference systems for LLMs, Multimodal LLMs, and other media ML models.
Optimize end-to-end training: data pipelines (streaming, sharding, bucketing), distributed training (parallelism strategies), and mixed precision.
Optimize inference and serving: KV cache, batching, quantization, and long-context handling.
Scale model training and inference into robust, performant systems integrated into Netflix workflows.
Act as a technical thought leader for training and inference efficiency, driving initiatives that significantly improve scalability, latency, and reliability.
Mentor and uplevel other engineers and scientists in large-scale ML systems and performance engineering.
About you
Extensive experience in ML engineering for large, production-grade systems using LLMs, Multimodal LLMs, and other media ML models.
Deep hands-on expertise in training optimization: high-throughput data loading (streaming, sharding, bucketing); distributed training (parallelism strategies); GPU/accelerator optimization.
Strong experience in inference optimization: KV cache design and optimization; batching and scheduling for high-throughput, low-latency serving; quantization and/or model compression.
Proficient with PyTorch and solid software engineering fundamentals (testing, observability, performance profiling).
Proven track record of leading ML initiatives and partnering with stakeholders to define and execute impactful roadmaps.
Exceptional communication and collaboration skills; comfortable with ambiguity and high ownership.
Netflix culture resonates with you.
Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top of market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. The range for this role is $466,000.00 - $750,000.00. This compensation range will vary based on location.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.
Netflix is a unique culture and environment. Learn more here.
This job is open for no less than 7 days and will be removed when the position is filled.