About the Opportunity

A rare chance to join a stealth, well-funded AI hardware start-up building a custom AI SoC and full inference serving stack from scratch. You will work directly alongside world-class hardware and software engineers, with genuine end-to-end ownership over how large-scale foundation models run on next-generation silicon.

What You'll Do

Serve as a core contributor on a small, senior team building state-of-the-art inference serving and cluster scheduling capabilities for a custom AI SoC
Architect high-performance multi-node inference stacks, designing and tuning for throughput and latency from the ground up
Implement advanced optimisation strategies across TP/PP/EP hybrids, continuous batching, and KV cache management at the intersection of compute, networking, and storage
Drive performance improvements directly inside leading inference frameworks including vLLM, SGLang, and PyTorch
Develop advanced cluster scheduling algorithms that push the frontier of efficiency for large-scale open-source models
Engage directly with the open-source community, upstreaming optimisations and influencing the roadmap of widely adopted AI infrastructure projects
Apply best practices in performance benchmarking, testing, and debugging to maintain a production-grade stack that runs on novel silicon

What We're Looking For

Strong Python, C++, and PyTorch engineering fundamentals, with a track record of shipping high-quality software in a fast-moving environment
1+ years as an active developer on LLM inference serving frameworks such as vLLM or SGLang
Deep understanding of LLM inference internals including KV cache, batching strategies, and attention mechanisms
Experience running and optimising large-scale workloads across heterogeneous clusters
Proficiency in performance analysis; GPU kernel development in CUDA, Triton, or ROCm is a plus
Familiarity with networking, storage management, or distributed scheduling technologies such as Orca or LMCache is a significant plus

Education

Master's or PhD in Computer Science, Computer Engineering, or Electrical Engineering preferred; equivalent industry experience is also welcome.

Senior ML Engineer (Inference Serving)
