FABRIC ARCHITECTURE
Fabric topology, execution locality, and east-west congestion — the network layer that determines where AI workloads can run.

>_ Architecture Maturity Position
Current Stage
Operational — Maturity Stage 02 of 07
Primary Architectural Concern
East-west fabric constraints that determine where execution, data, models, and context can physically move at scale.
Primary Architectural Tension
Execution locality (performance and efficiency) vs. execution mobility (flexibility and resource utilization). Optimizing for one degrades the other; most AI cluster failures at scale trace back to a design that never made this tension explicit.
Primary Failure Mode
Fabric-Blind Architecture — cluster designs that model GPU capacity, storage capacity, and scheduling behavior independently while treating east-west networking as shared utility infrastructure. Locality collapse, congestion amplification, and stranded GPU capacity follow at scale.
Stage Outcome
Ability to evaluate fabric topology against AI workload demands, identify the Execution Locality Boundary (#116), and specify network requirements before cluster design or procurement decisions are made.
Next Stage
A3 — AI Storage & Data Pipeline Architecture → /ai-architecture-learning-path/ai-storage-data-pipeline-architecture/
AI fabric architecture is the constraint layer that precedes every placement, scheduling, and locality decision in an AI cluster. Before the scheduler runs, before workloads are admitted, before inference routes are evaluated — the fabric has already determined what is physically possible. The east-west bandwidth envelope, the oversubscription ratio, the congestion control model, and the topology design are all decided at infrastructure time, not runtime. Those decisions propagate silently into every workload outcome that follows.
Teams that treat east-west networking as plumbing don’t encounter the gap at design time — they encounter it when training jobs stall at scale, inference latency spikes under load, or congestion collapses throughput in ways that present as GPU failures or scheduler inefficiency. By then the architectural decision has already been made, often at procurement, often without any analysis of the Execution Locality Boundary that governs where execution can physically occur. This stage exists to move that analysis to where it belongs — before the cluster is designed.
>_ Why This Stage Exists
Named Failure State — Fabric-Blind Architecture
Fabric-Blind Architecture is the condition where network constraints are treated as implementation details rather than architectural constraints. Cluster designs model GPU capacity, storage capacity, and scheduling behavior as independent variables while east-west networking is provisioned as a shared utility service — sized for average load, not for the amplified east-west demand that AI workloads generate at scale.
Once the Execution Locality Boundary is crossed, the fabric becomes the dominant workload-placement authority regardless of scheduler intent. The scheduler can route correctly — the fabric still determines whether execution is feasible. This ties Framework #103 (Infrastructure Authority Migration) and Framework #116 (Execution Locality Boundary) together as the core architectural identity of this stage: the network stops being passive infrastructure and starts making placement decisions whether or not the architecture acknowledges it.
Framework #116 — Execution Locality Boundary
The Execution Locality Boundary is the point at which moving data, models, or context costs more than moving execution, causing network architecture to become the dominant workload-placement constraint. It is not a threshold that can be calculated from a spec sheet — it emerges from the intersection of workload access patterns, model size, context window requirements, and east-west bandwidth capacity. Identifying it before cluster procurement is the primary architectural outcome of this stage.
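As a first-order illustration of the definition above, the sketch below compares the time to move a payload across the fabric with the time to relocate execution. It is a minimal sketch under stated assumptions, not a way to compute the boundary from a spec sheet: `estimate_movement`, `MovementEstimate`, and every input value are hypothetical, and real boundaries also depend on access patterns and contention that this model ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovementEstimate:
    """Hypothetical container for a first-order movement comparison."""
    data_move_s: float   # seconds to move data, models, or context
    exec_move_s: float   # seconds to relocate execution instead

    @property
    def past_boundary(self) -> bool:
        # Past the boundary when moving data costs more than moving execution.
        return self.data_move_s > self.exec_move_s


def estimate_movement(data_bytes: float,
                      effective_gbps: float,
                      exec_relocation_s: float) -> MovementEstimate:
    """Compare data-movement time against an assumed execution-relocation time.

    effective_gbps is the east-west bandwidth the transfer can actually use
    (an assumed input, in gigabits per second), not the port line rate.
    """
    if effective_gbps <= 0:
        raise ValueError("effective_gbps must be positive")
    data_move_s = data_bytes * 8 / (effective_gbps * 1e9)
    return MovementEstimate(data_move_s, exec_relocation_s)


# Assumed example: 140 GB of model state, 5 s to relocate execution.
slow_fabric = estimate_movement(140e9, effective_gbps=100.0, exec_relocation_s=5.0)
fast_fabric = estimate_movement(140e9, effective_gbps=400.0, exec_relocation_s=5.0)
print(slow_fabric.past_boundary, fast_fabric.past_boundary)
```

With these assumed inputs, the same payload sits on opposite sides of the boundary depending only on the effective bandwidth, which is why the boundary has to be modeled against the fabric rather than against the workload alone.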
Three Failure Patterns This Stage Prevents
- 01 Fabric saturation misdiagnosed as GPU underperformance or scheduler inefficiency; the root cause remains invisible until workload density increases
- 02 InfiniBand vs. RoCEv2 selection made on throughput spec alone, without topology requirements or congestion control analysis: a procurement decision that cannot be corrected at runtime
- 03 Cluster procurement finalized before the Execution Locality Boundary is identified, so placement constraints are discovered post-deployment, when architectural correction is no longer economically viable
>_ What This Stage Is Not
01 — Not a Networking Fundamentals Primer
This stage assumes working familiarity with switching, routing, and basic topology. The concern is architectural constraint modeling at the AI workload layer — not introductory networking concepts.
02 — Not a Vendor Selection Guide
InfiniBand vs. RoCEv2 is an architectural tradeoff analysis, not a product comparison. This stage is not about switch SKU selection — it is about understanding which fabric model fits which workload topology and why the decision cannot be deferred.
03 — Not a Kubernetes Networking Tutorial
CNI plugins, service mesh, and overlay networking belong to the cluster orchestration layer. Those are A4 concerns. A2 operates at the physical and logical fabric layer that underlies every workload container regardless of how the scheduler addresses it.
04 — Not a Substitute for A4
Fabric constraints define what is physically possible. The scheduler and placement authority layer in A4 decides what actually happens within those constraints. A2 answers: what constrains execution movement? A4 answers: given those constraints, who decides where execution occurs? Different layer. Different architectural concern.
>_ Where to Enter This Stage
The default entry point for this stage is completion of A1 — Accelerated Compute Architecture — or equivalent working vocabulary: VRAM constraints, interconnect topology, GPU scheduling primitives, and the distinction between compute-bound and memory-bound workloads. A1 establishes how accelerated compute behaves. A2 establishes what constrains its movement.
Architects who already hold fabric-layer vocabulary — east-west amplification, oversubscription physics, RoCEv2 congestion control mechanisms — can enter directly at Cluster 02. The Cluster 01 articles are still recommended as calibration for how those concepts interact specifically with AI workload demand patterns, but they are not required prerequisites for experienced network architects.
A4 — AI Runtime & Cluster Orchestration — should not be entered without completing this stage. Placement and scheduling decisions made without a fabric constraint model produce cluster designs that are architecturally unsound before the first workload runs. The scheduler assumes a fabric. A2 is where that assumption gets examined.
>_ Where This Stage Sits
AI Infrastructure Architecture Path — Maturity Progression
| Stage | Architectural Question | Maturity Level |
|---|---|---|
| A1 | How does accelerated compute behave? | Foundation |
| A2 ← YOU ARE HERE | What constrains execution movement? | Operational |
| A3 | What constrains data movement? | Operational |
| A4 | Who decides where execution occurs? | Strategic |
| A5 | How is execution operated? | Strategic |
| A6 | Who governs execution authority? | Strategic |
| A7 | How does execution survive failure? | Resilient |

>_ Stage Reading Sequence
Each cluster below is organized by architectural problem. Every cluster answers: what becomes architecturally unstable if this discipline is misunderstood?
Cluster 01 — Fabric Physics
How AI fabric behaves at scale
Deterministic Networking: The Missing Layer in AI-Ready Infrastructure
Non-deterministic networking is an architectural liability before any other fabric decision is made. This article establishes why fabric behavior must be specified and validated — not assumed — and what the operational consequences are when latency and delivery guarantees are left to chance at AI scale.
GPU Fabric Physics 2026: Why 800G Isn’t Enough for 100k-GPU Training
East-west traffic amplification, oversubscription physics, and the scale failure modes that emerge when fabric is sized for throughput rather than topology. This article makes the Execution Locality Boundary concrete — demonstrating at what point movement costs exceed compute costs and how that threshold shifts with cluster scale.
Cluster 02 — Congestion & Locality
Where and why execution movement breaks down
InfiniBand Is Losing the Fabric War. Here’s What That Changes for Your Architecture.
The architectural tradeoff between guaranteed delivery and commodity scale — congestion control requirements, topology fit, and why the InfiniBand vs. RoCEv2 decision is not a vendor preference question. The article surfaces the protocol-level constraints that make this a pre-procurement architectural decision, not a post-deployment optimization.
[Figure: AI Fabric Architecture Failure Patterns]
Cluster 03 — Network Authority
The network layer as architectural decision-maker
The Network Is Becoming the AI Control Plane
Framework #103 — Infrastructure Authority Migration. As AI systems scale, execution feasibility becomes increasingly governed by network constraints rather than scheduler intent. The fabric layer evolves from transport mechanism to control-plane authority — making placement decisions whether or not the architecture acknowledges it. This article is the bridge between A2’s fabric constraint model and A4’s placement authority layer.
AI Placement Decisions Are Architecture — Not Optimization
How fabric constraints propagate into placement economics — the cost and latency consequences of decisions made without a locality model. This article closes A2 by making the downstream propagation explicit: the constraints established in this stage are the starting assumptions for A3 (data locality) and the boundary conditions for A4 (placement authority).
>_ Stage Graduates Can Now
You can now operate at the fabric layer with architectural intent. A1 established how accelerated compute behaves; A2 establishes what constrains its movement. What the next stages add is the decision authority layer: A3 asks what constrains data movement across that same fabric, and A4 asks, given these constraints, who controls where execution occurs and under what enforcement model.
- → Evaluate InfiniBand vs. RoCEv2 selection against workload topology and congestion control requirements, not throughput specifications
- → Identify the Execution Locality Boundary (#116) in a planned or existing cluster design before procurement decisions are made
- → Diagnose Fabric-Blind Architecture failure modes: fabric saturation events that present as GPU underperformance or scheduler inefficiency
- → Specify east-west bandwidth, oversubscription ratios, and congestion control requirements as first-class cluster design inputs
- → Recognize when fabric constraints have become the dominant execution authority within a platform, and identify where scheduler decisions are no longer the primary determinant of workload placement
- → Downstream bridge: fabric constraint vocabulary established here propagates directly into A3 storage locality decisions and A4 placement authority design; both stages assume this constraint model as their starting point
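One way to treat east-west bandwidth, oversubscription, and congestion control as first-class design inputs is to write them down as a typed requirement and check a candidate design against it. The sketch below is illustrative only: `FabricRequirements`, `FabricDesign`, `CongestionControl`, and `violations` are hypothetical names, and the three fields are a deliberately minimal subset of what a real specification would carry.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CongestionControl(Enum):
    PFC_ECN = "pfc_ecn"      # lossless Ethernet model used with RoCEv2
    IB_NATIVE = "ib_native"  # InfiniBand's native flow control


@dataclass(frozen=True)
class FabricRequirements:
    """Hypothetical requirement record produced before procurement."""
    east_west_gbps_per_node: float
    max_oversubscription: float
    congestion_control: CongestionControl


@dataclass(frozen=True)
class FabricDesign:
    """Hypothetical summary of a candidate fabric design."""
    nic_gbps: float
    oversubscription: float
    congestion_control: CongestionControl


def violations(design: FabricDesign, req: FabricRequirements) -> List[str]:
    """Return the names of the requirements the design fails to meet."""
    found: List[str] = []
    if design.nic_gbps < req.east_west_gbps_per_node:
        found.append("east_west_bandwidth")
    if design.oversubscription > req.max_oversubscription:
        found.append("oversubscription")
    if design.congestion_control is not req.congestion_control:
        found.append("congestion_control")
    return found


req = FabricRequirements(400.0, 2.0, CongestionControl.PFC_ECN)
candidate = FabricDesign(nic_gbps=400.0, oversubscription=3.0,
                         congestion_control=CongestionControl.PFC_ECN)
print(violations(candidate, req))
```

The point of the structure is that a design review can fail on a named fabric requirement before procurement, instead of the gap surfacing later as a workload symptom.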
>_ Where Do You Go From Here?
YOUR FABRIC DESIGN MAY BE CONSTRAINING EXECUTION
BEFORE THE FIRST MODEL DEPLOYS.
Most AI cluster failures that get blamed on GPU shortage or scheduler inefficiency are fabric saturation events that were never modeled at design time. An Infrastructure Architecture Review surfaces the constraints before they become incidents.
Infrastructure Architecture Review
A structured architecture review across your AI infrastructure stack — fabric design, locality modeling, congestion exposure, and control plane dependencies.
- > Execution locality assessment
- > East-west congestion modeling
- > Fabric architecture validation
- > Control-plane dependency mapping
Architecture Playbooks. Field-Tested Blueprints.
AI infrastructure failure patterns, fabric design blueprints, and operational architecture guides from production environments.
- > Fabric congestion analysis patterns
- > Execution locality modeling
- > AI cluster design blueprints
- > Control plane architecture guides
>_ Frequently Asked Questions
Q: What is the Execution Locality Boundary and why does it determine AI workload placement?
A: The Execution Locality Boundary (Framework #116) is the point at which moving data, models, or context costs more than moving execution — causing network architecture to become the dominant workload-placement constraint. Below this boundary, schedulers can route workloads freely across the cluster. Above it, the fabric’s east-west bandwidth and topology constraints become the primary determinant of where execution can actually occur. The significance for architects is that this boundary exists in every AI cluster but is only discovered at procurement time if it is explicitly modeled. Post-deployment discovery means the architectural correction is no longer economically viable.
Q: When does fabric architecture become the binding constraint on AI cluster performance?
A: Fabric architecture becomes the binding constraint when east-west traffic demand (driven by gradient synchronization, activation exchange, model parallelism, or context distribution) exceeds the bandwidth actually available between compute nodes once oversubscription is accounted for. At that point, the fabric is making placement decisions that override scheduler intent. The threshold is not fixed: it depends on model size, parallelism strategy, batch size, and topology. The Cluster 01 articles establish the physics of where this threshold sits for different cluster configurations.
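The relationship between demand and oversubscribed bandwidth can be sketched for a single leaf switch. This is a simplified model under stated assumptions: the function names and port counts are hypothetical, and the worst case assumes every node on the leaf sends across the uplinks at the same time.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Ratio of node-facing capacity to spine-facing capacity on one leaf."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("uplink capacity must be positive")
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


def worst_case_cross_leaf_gbps(downlink_gbps: float, ratio: float) -> float:
    """Per-node bandwidth off the leaf when all nodes transmit at once.

    A ratio below 1.0 means the uplinks are not the limit, so the node's
    own link speed applies.
    """
    return downlink_gbps / max(ratio, 1.0)


def fabric_is_binding(demand_gbps_per_node: float,
                      downlink_gbps: float, ratio: float) -> bool:
    """True when per-node east-west demand exceeds the worst-case share."""
    return demand_gbps_per_node > worst_case_cross_leaf_gbps(downlink_gbps, ratio)


# Assumed leaf: 32 x 400G node ports, 8 x 800G uplinks.
ratio = oversubscription_ratio(32, 400.0, 8, 800.0)
print(ratio, worst_case_cross_leaf_gbps(400.0, ratio))
print(fabric_is_binding(300.0, 400.0, ratio))
```

For the assumed leaf the ratio is 2:1, so a workload that needs 300 Gbps per node across leaves is fabric-bound even though each node has a 400G port.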
Q: InfiniBand vs. RoCEv2 — how should an architect frame that decision beyond throughput specifications?
A: The decision should be framed around three variables: topology requirements (fat-tree vs. dragonfly vs. rail-optimized), congestion control architecture (RDMA losslessness via PFC/ECN vs. InfiniBand’s native flow control), and operational complexity tolerance. Throughput specifications are a starting point, not the decision. RoCEv2 at scale without proper PFC/ECN configuration degrades under load in ways that are difficult to attribute and harder to correct. InfiniBand provides native lossless delivery but constrains topology choices and introduces vendor concentration risk. The architectural question is not which delivers more bandwidth — it is which fabric model the workload topology can actually use.
Q: What is east-west traffic amplification and why does it behave differently in AI clusters than general compute?
A: In general compute environments, east-west traffic is driven by microservice communication: relatively low bandwidth, high message rate, short duration. In AI clusters, east-west traffic is driven by distributed training operations that require synchronized parameter updates across every GPU involved in a training run. A single all-reduce operation in a 512-GPU training job generates aggregate traffic proportional to model size multiplied by the number of participating GPUs, not just to the number of communicating pairs. That amplification factor means AI clusters require fabric bandwidth and topology designed for sustained, high-bandwidth collective communication among all participants, not the bursty low-bandwidth patterns that general compute fabrics handle well.
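The scaling in this answer can be made concrete with ring all-reduce, one common all-reduce algorithm, chosen here as an assumption because the text does not name one. In a ring over N participants, each participant sends 2(N-1)/N times the payload, so the aggregate across the fabric is 2(N-1) times the payload. The function names and the example payload are hypothetical.

```python
def ring_allreduce_bytes_per_gpu(payload_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU sends in one ring all-reduce of the given payload."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes


def ring_allreduce_bytes_total(payload_bytes: float, n_gpus: int) -> float:
    """Aggregate bytes sent by all GPUs in one ring all-reduce."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return 2 * (n_gpus - 1) * payload_bytes


# Assumed example: 14 GB of gradients (7B parameters at 2 bytes each), 512 GPUs.
payload = 14e9
print(ring_allreduce_bytes_per_gpu(payload, 512) / 1e9)   # GB sent per GPU
print(ring_allreduce_bytes_total(payload, 512) / 1e12)    # TB across the fabric
```

For these assumed inputs each GPU sends about 27.9 GB per all-reduce, while the fabric as a whole carries about 14.3 TB. The per-GPU figure stays close to twice the payload as the job grows; the aggregate grows with the number of participants.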
Q: What is Fabric-Blind Architecture and what failure modes does it produce at scale?
A: Fabric-Blind Architecture is the condition where a cluster design models GPU capacity, storage capacity, and scheduling behavior as independent variables while treating east-west networking as a shared utility service. The failure modes that follow are: locality collapse (execution placed without respect to data proximity, causing movement costs to dominate runtime), congestion amplification (bandwidth contention that presents as GPU idleness or scheduler inefficiency), and stranded GPU capacity (accelerators that cannot be efficiently utilized because the fabric cannot support the communication patterns the workloads require). The failure state is architecturally preventable — it emerges from treating fabric as an implementation detail rather than a first-class architectural constraint.
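A rough way to see how bandwidth contention presents as GPU idleness is to model one training step as compute time plus whatever communication time is not hidden behind compute. The sketch below is a simplification with hypothetical names and inputs: it assumes a fixed fraction of communication overlaps with compute and ignores everything else that can stall a step.

```python
def gpu_busy_fraction(compute_s: float, comm_bytes: float,
                      effective_gbps: float,
                      hidden_fraction: float = 0.0) -> float:
    """Fraction of a step the GPU spends computing rather than waiting.

    hidden_fraction is the assumed share of communication time that
    overlaps with compute (0.0 = none hidden, 1.0 = fully hidden).
    """
    if not 0.0 <= hidden_fraction <= 1.0:
        raise ValueError("hidden_fraction must be between 0 and 1")
    if effective_gbps <= 0 or compute_s <= 0:
        raise ValueError("effective_gbps and compute_s must be positive")
    comm_s = comm_bytes * 8 / (effective_gbps * 1e9)
    exposed_s = comm_s * (1.0 - hidden_fraction)
    return compute_s / (compute_s + exposed_s)


# Assumed step: 0.5 s of compute, 10 GB exchanged, half of it overlapped.
uncongested = gpu_busy_fraction(0.5, 10e9, effective_gbps=400.0, hidden_fraction=0.5)
congested = gpu_busy_fraction(0.5, 10e9, effective_gbps=100.0, hidden_fraction=0.5)
print(round(uncongested, 3), round(congested, 3))
```

In this model nothing about the GPU changes between the two calls; only the effective fabric bandwidth does. That is the pattern that tends to be read as GPU underperformance when fabric metrics are not part of the diagnosis.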
Q: How do the fabric constraints established in this stage affect placement and scheduling decisions in A4?
A: A4 — AI Runtime & Cluster Orchestration — inherits the fabric constraint model from A2 as its boundary condition. The scheduler in A4 operates within the feasibility space defined by the fabric topology, east-west bandwidth capacity, and the Execution Locality Boundary. A scheduler that is unaware of these constraints will produce placement decisions that are logically correct but physically suboptimal — routing workloads to nodes that the fabric cannot efficiently connect at the required bandwidth. A4’s placement authority layer is only architecturally sound if it is built on the constraint model that A2 establishes.
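The difference between a logically valid placement and a physically feasible one can be sketched as a filter the scheduler applies before admitting a job. This is a minimal sketch under stated assumptions: a two-tier leaf/spine fabric, full NIC bandwidth within a leaf, and NIC bandwidth divided by the oversubscription ratio across leaves. `placement_is_feasible` and the node and leaf names are hypothetical.

```python
from typing import Mapping, Sequence


def placement_is_feasible(placement: Sequence[str],
                          node_to_leaf: Mapping[str, str],
                          nic_gbps: float,
                          oversubscription: float,
                          required_gbps: float) -> bool:
    """Check whether a set of nodes can exchange traffic at the required rate.

    Nodes under one leaf are assumed to get full NIC bandwidth; a placement
    that spans leaves is assumed to get the worst-case oversubscribed share.
    """
    leaves = {node_to_leaf[node] for node in placement}
    if len(leaves) <= 1:
        available = nic_gbps
    else:
        available = nic_gbps / max(oversubscription, 1.0)
    return available >= required_gbps


topology = {"n1": "leaf-a", "n2": "leaf-a", "n3": "leaf-b"}
# Both placements have enough free nodes; only one fits the fabric.
print(placement_is_feasible(["n1", "n2"], topology, 400.0, 2.0, 300.0))
print(placement_is_feasible(["n1", "n3"], topology, 400.0, 2.0, 300.0))
```

A scheduler without the `node_to_leaf` map and the oversubscription ratio treats both placements as equivalent, which is the sense in which the fabric constraint model is an input to A4 rather than an optimization applied afterward.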
>_ Related Systems
- Foundation stage: GPU and accelerator mechanics that A2 fabric constraints are designed around. Required context for the Execution Locality Boundary.
- Next stage: data locality and pipeline constraints that inherit the fabric model established here. A2 and A3 together define the full movement constraint envelope.
- Strategic stage: placement authority and scheduling decisions operate within the constraint boundary A2 defines. The scheduler’s feasibility space is determined here.
- Framework #103 (Infrastructure Authority Migration): the doctrinal anchor for Cluster 03, covering how the fabric layer acquires control plane authority as AI workload complexity scales.
- Model east-west saturation thresholds and validate fabric architecture against AI workload demand profiles; apply the constraint model from this stage interactively.
- Cross-domain: east-west fabric design principles as applied to hypervisor-based environments. The constraint model overlaps significantly at the physical layer.
- NVIDIA’s architecture documentation for Quantum-2: reference for the topology and congestion control characteristics covered in the InfiniBand analysis in this stage.
- IETF specification for RoCEv2 congestion management: the PFC/ECN requirements that determine whether RoCEv2 delivers lossless behavior under load.