Cost-Aware Model Routing in Production: Why Every Request Shouldn’t Hit Your Best Model
Your system isn’t expensive because your models are expensive. It’s expensive because every request defaults to the most capable model you have. That’s not a cost problem. That’s a routing problem. And most systems don’t have a routing layer at all.

Part 1 established why inference cost emerges from behavior, not provisioning. Part 2 explained…
