The CPU Strikes Back: Architecting Inference for SLMs on Cisco UCS M7
Target Scope & Technical Boundaries

Primary Objective: To validate the architectural viability of running Small Language Models (SLMs) like Llama 3 (8B) and Mistral (7B) on standard Cisco UCS M7 Compute Nodes (Intel Xeon 5th Gen) without discrete GPUs.

In Scope:
Instruction Set Architecture: Utilizing Intel AMX (Advanced Matrix Extensions) and AVX-512 for inference acceleration…
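
Before relying on AMX or AVX-512, it helps to confirm that the host actually exposes those instruction sets. The following is a minimal sketch, assuming a Linux host where the kernel reports CPU features in /proc/cpuinfo. The helper names (parse_cpu_flags, isa_support) and the ISA_FLAGS table are illustrative and not part of any Cisco or Intel tool. The flag strings are the ones the Linux kernel uses for these features.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the ISA features in scope.
# The human-readable labels on the left are illustrative only.
ISA_FLAGS = {
    "AVX-512 Foundation": "avx512f",
    "AVX-512 VNNI": "avx512_vnni",
    "AMX tile": "amx_tile",
    "AMX INT8": "amx_int8",
    "AMX BF16": "amx_bf16",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 or non-Linux input)


def isa_support(flags):
    """Map each feature label to True/False based on the given flag set."""
    return {label: flag in flags for label, flag in ISA_FLAGS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux host
    if cpuinfo.exists():
        report = isa_support(parse_cpu_flags(cpuinfo.read_text()))
        for label, present in report.items():
            print(f"{label:20s} {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux.")
```

A flag being present only shows that the CPU and kernel expose the feature. Whether a given inference runtime actually dispatches to AMX or AVX-512 kernels depends on how that runtime was built and configured, and has to be verified separately.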


