I hate writing Kubernetes manifests. But for the last three years, if you wanted to serve a custom GenAI model, you had to build a cluster. AWS Lambda was useless for this. You can’t fit a modern PyTorch model in the zip limit, the cold starts are 10 seconds, and there is no GPU access….