AI Cluster Runtime: Reproducible Configs for GPU-Accelerated Kubernetes Clusters

GPU Kubernetes is hard. Aligning kernels, drivers, container runtimes, operators, and Kubernetes versions is a version-compatibility minefield. A single misconfigured component can take down an entire GPU fleet, and root cause analysis can take days. Typically, known-good configurations live as tribal knowledge in “runbooks” and internal pipelines, not as portable, reproducible artifacts. ...

2026-03-12 · 3 min · Mark Chmarny

Leaving Cruise; why I'm still excited about AI platforms

Today marks a bittersweet moment as I say goodbye to Cruise. When I joined the company seven months ago, my mission was to scale the AV services worldwide and to modernize the AV and Cloud service developer platforms. Despite the unexpected challenges following the October incident, my journey at Cruise has been incredibly enriching, teaching me the true essence of resilience, adaptability, and commitment to excellence. ...

2024-02-16 · 1 min · Mark Chmarny