About the Opportunity
A rare chance to join a stealth, well-funded AI hardware start-up building a custom AI SoC and full inference serving stack from scratch. You will work directly alongside world-class hardware and software engineers, with genuine end-to-end ownership over how large-scale foundation models run on next-generation silicon.
What You'll Do
- Serve as a core contributor on a small, senior team building state-of-the-art inference serving and cluster scheduling capabilities for a custom AI SoC
- Architect high-performance multi-node inference stacks, designing and tuning for throughput and latency from the ground up
- Implement advanced optimisation strategies across hybrid tensor, pipeline, and expert parallelism (TP/PP/EP), continuous batching, and KV cache management at the intersection of compute, networking, and storage
- Drive performance improvements directly inside leading inference frameworks including vLLM, SGLang, and PyTorch
- Develop advanced cluster scheduling algorithms that push the frontier of efficiency for large-scale open-source models
- Engage directly with the open-source community, upstreaming optimisations and influencing the roadmap of widely adopted AI infrastructure projects
- Apply best practices in performance benchmarking, testing, and debugging to maintain a production-grade stack that runs on novel silicon
What We're Looking For
- Strong Python, C++, and PyTorch engineering fundamentals, with a track record of shipping high-quality software in a fast-moving environment
- 1+ years as an active developer on LLM inference serving frameworks such as vLLM or SGLang
- Deep understanding of LLM inference internals including KV cache, batching strategies, and attention mechanisms
- Experience running and optimising large-scale workloads across heterogeneous clusters
- Proficiency in performance analysis; GPU kernel development with CUDA, Triton, or ROCm is a plus
- Familiarity with networking, storage management, or distributed scheduling technologies such as Orca or LMCache is a significant plus
Education
A Master's or PhD in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent industry experience, is preferred.
