Senior Cloud Infrastructure Engineer
Remote · Posted 1mo ago
ai · saas · infrastructure · typescript · go · rust · kubernetes · terraform
Senior Cloud Infrastructure Engineer at Restate

Restate (restate.dev) is a lightweight runtime that turns AI agents, workflows, and backend services into durable processes, so teams can focus on their logic rather than on failure mechanics.

The role:
We're looking for a Senior- to Staff-level cloud infrastructure engineer to work across all product pillars: OSS, on-prem deployments, multi-tenant SaaS, and BYOC (bring your own cloud). This means deep work in our Rust-based infrastructure layer, integrating with cloud provider APIs, building infrastructure-as-code tooling, and ensuring reliability and security at scale. You'll have significant ownership over major parts of our cloud infrastructure.

The opportunity

Front-row seat to the biggest infra shift in decades
Durable runtimes like Restate are becoming the next foundational infrastructure component, and increasingly a critical piece of AI applications. As systems become more agentic, long-running, integration-heavy, and failure-prone, durable execution turns reliability from a bespoke engineering tax into a default property. In this role, you’re not watching that shift from the sidelines; you help build the platform that enables it.

State-of-the-art tech, built from first principles
Restate re-imagines durable execution as a lightweight, self-contained stack with no database required. It ships as a single Rust binary with an optimized custom storage layer, low-latency orchestration, and an analytics engine for observability.

Enterprise traction
Restate is already used by Fortune 500 companies, including Tier 1 banks running critical financial workflows, and by cutting-edge AI and infra startups pushing the boundary of what “production-grade agents” means. You’ll work on problems where reliability, correctness, and operational simplicity are existential.
Work with world-class engineers
You’ll partner directly with engineers who’ve built and operated foundational systems at scale, including creators of Apache Flink and leaders from Meta’s messaging infrastructure. You’ll have the chance to work with incredibly talented individuals who care deeply about their craft.

What you’ll do
This is a cloud infrastructure engineering role spanning Restate’s product offering: OSS, on-prem deployments, multi-tenant SaaS, and BYOC. The scope of the role includes, but is not limited to:
Build and operate Restate Cloud: extend our managed multi-tenant offering, working across the infrastructure, control plane, networking, storage, and observability of Restate workloads.
Evolve our BYOC product and work with customers on running on-prem installations: design and build the infrastructure that runs inside customer cloud accounts.
Reliability and observability across the fleet: SLOs, metrics, traces, logs, alerting, and runbooks.
Build automation so we can scale our product offering across deployment methods.
On-call: participate in the cloud on-call rotation. A US-based hire materially improves our timezone coverage.

What we’re looking for

Senior to Staff profile
We’re targeting Senior-to-Staff: you’ve operated production SaaS or platform infrastructure before, you’ve seen real failure modes, and you have (strong) opinions about how to run multi-tenant systems. You appreciate what it takes to operate in a compliance-sensitive environment.

Must-Haves:
Strong cloud infrastructure background with a deep understanding of major cloud provider architectures.
Experience with infrastructure-as-code and cloud orchestration, particularly Kubernetes-based stateful workloads, including balancing continuous delivery with safety while maintaining large-scale production systems.
Software engineering skills in a systems language (Rust, Go, C++), plus the willingness and ability to learn Rust on the job.
You should be comfortable taking end-to-end ownership, from design through production operations, and thrive in early-stage startup ambiguity.

Nice-to-Haves:
Prior experience with Restate or with durable execution specifically.
Deep experience navigating enterprise procurement and compliance.
Kubernetes operator development, and experience with IaC systems such as Cluster API, Crossplane, or Terraform.

Not a fit:
You want to work primarily on the runtime core rather than on cloud, BYOC, and customer-facing infra.
You’ve mostly architected and reviewed, and aren’t excited to be hands-on.
You are averse to multi-cloud, Kubernetes, or operating infrastructure as a shared responsibility with customers.

Our stack:
We use Restate extensively: the Restate Cloud control plane is built on Restate and TypeScript. The stack also includes Rust infrastructure services and Kubernetes operators.

Location and travel
US-based, fully remote. East Coast is a plus, as it would materially improve our on-call coverage given the team’s existing geography.
Travel: minimal; occasional team offsites and little required customer travel.