
GitOps for Bare Metal: Applying SDLC to Physical Hardware

A server rack where the physical hardware is displaying Git commit logs, symbolizing hardware as code.
Figure 1: If it isn’t in Git, it doesn’t exist. Even the hardware.

The “Spreadsheet of Doom”

You know the one. That “Master Inventory.xlsx” file everyone dumps in the Engineering Drive. MAC Address, IPMI IP, Rack Unit, Status—it’s all there.

And it is always, 100% of the time, wrong.

You go to provision a “spare” node, only to find it has a dead drive, or the wrong BIOS version, or someone quietly repurposed it for a dev cluster three months ago and never updated the sheet.

This is Physical Drift.

In the cloud, we treat infrastructure as code. We destroy and recreate VMs with a pull request. But on-prem, we still treat bare metal servers like pets. We name them, nurse them, and manually log into the BMC to mount ISOs.

That era is over.

Why This Matters Now

Bare metal is back—and not in a hobbyist way.

AI clusters, GPU farms, edge compute, and data sovereignty workloads are pushing infrastructure back into physical data centers. Hardware lifecycles are shrinking. Firmware vulnerabilities are increasing. And human-operated metal does not scale.

If your hardware lifecycle still depends on:

  • Tribal knowledge,
  • Spreadsheets,
  • And a crash cart…

…you cannot scale it. To close that gap, we need a bridge between the API world (Git) and the physical world (IPMI/Redfish).

What GitOps for Bare Metal Is Not

Let’s be clear about what this architecture is—and isn’t.

It is not:

  • A prettier UI for IPMI.
  • A CMDB with YAML syntax.
  • A magic fix for bad hardware.

The Mechanics: Bridging the Air Gap

“GitOps for bare metal” sounds like a buzzword until you realize it is just a reconciliation loop. In Kubernetes, an operator watches a Git repo and makes the cluster match the manifest. On bare metal, the “Cluster” is the data center, and the “Manifest” is the desired hardware state.

The “Machine” as a CRD

We stop thinking of a server as “that Dell R760 in Rack 12.” We think of it as a declarative object in YAML (in Tinkerbell’s case, a Hardware resource).

YAML

# Simplified for illustration: the upstream Tinkerbell Hardware CRD nests these
# fields differently, and BMC credentials belong in a Kubernetes Secret
# (referenced by a Rufio Machine object), never as plaintext in Git.
apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: worker-node-04
spec:
  metadata:
    facility: "us-east-dc1"
  network:
    interfaces:
      - dhcp:
          mac: "00:25:90:fd:8b:22"
          ip: "10.10.30.24"
  bmc:
    ip: "10.10.10.24"
    username: "admin"
    password: "ipmi-password-secret"   # in practice, reference a Secret here, not a literal value

When you push this YAML to Git:

  1. The Workflow Engine (like Tinkerbell or Canonical MAAS) sees the new manifest once your GitOps controller (Flux, Argo CD) applies it.
  2. The Provisioner talks to the BMC via Redfish/IPMI.
  3. The Action powers the server on, sets the boot order to PXE, and netboots an in-memory provisioning OS that streams the target image to disk.

There is no spreadsheet. The “Inventory” is the Git Repo. If a server dies, you delete the YAML, and the provisioner wipes the hardware.
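
Under the hood, steps 2 and 3 are themselves just another declarative object. Here is a minimal sketch of what that can look like with Rufio, Tinkerbell’s BMC controller; the field names follow the bmc.tinkerbell.org CRDs and may differ by version, and the machine name and namespace are placeholders.

YAML

apiVersion: bmc.tinkerbell.org/v1alpha1
kind: Job
metadata:
  name: worker-node-04-netboot
spec:
  machineRef:
    name: worker-node-04-bmc      # Rufio Machine holding the BMC address and a Secret reference
    namespace: tink-system
  tasks:
    - powerAction: "off"          # power down before changing boot order
    - oneTimeBootDeviceAction:
        device:
          - pxe                   # next boot only: network boot
        efiBoot: true
    - powerAction: "on"

Even a power cycle becomes a resource you can review in a pull request instead of a command someone types into a BMC console at 2 a.m.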

Diagram showing the flow from Git Commit to BMC/IPMI command execution.
Figure 2: The Gap between code and metal is bridged by the BMC Controller.

The Solution: A Declarative Data Center

So, what is this actually?

It is:

  • A reconciliation system between desired state and physical reality.
  • A way to apply SDLC principles to hardware lifecycle.
  • A foundation for operating physical infrastructure like cloud.

To build this, you need a stack that speaks both Git and IPMI.

1. The Source of Truth (Git)

Your entire fleet state lives here.

  • hardware/: Defines MACs, BMC IPs, and physical location.
  • profiles/: Defines what the server is (e.g., “GPU Worker”, “Storage Node”).
  • templates/: Defines the boot workflow (wipe disks -> flash BIOS -> install OS).
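
A minimal layout of such a repo could look like this; the directory names match the list above, and the individual file names are just hypothetical examples.

hardware/
  worker-node-04.yaml      # MAC, BMC IP, rack location
  storage-node-01.yaml
profiles/
  gpu-worker.yaml          # role: GPU worker
  storage-node.yaml        # role: storage node
templates/
  base-install.yaml        # wipe disks -> flash BIOS -> install OS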

2. The Workflow Engine (Tinkerbell / Smee)

This is the “Kubernetes for Metal.” It runs a DHCP server, a TFTP server, and a workflow controller. When a server boots, it doesn’t just get an IP. It gets a Job.

  • Job 1: Erase partition table.
  • Job 2: Flash Firmware v2.4.
  • Job 3: Stream CoreOS image to /dev/nvme0n1.
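
In Tinkerbell, that chain of jobs is declared as a Template. The sketch below is illustrative only: the task and action structure follows the tinkerbell.org Template CRD, but the action image names and the image URL are placeholders, not official Tinkerbell actions.

YAML

apiVersion: tinkerbell.org/v1alpha1
kind: Template
metadata:
  name: base-install
spec:
  data: |
    version: "0.1"
    name: base_install
    global_timeout: 1800
    tasks:
      - name: "provision"
        worker: "{{.device_1}}"
        volumes:
          - /dev:/dev
        actions:
          - name: "erase-partition-table"
            image: registry.example.local/actions/wipe-disk:latest   # placeholder image
            timeout: 120
          - name: "flash-firmware"
            image: registry.example.local/actions/flash-bios:v2.4    # placeholder image
            timeout: 600
          - name: "stream-os-image"
            image: registry.example.local/actions/image2disk:latest  # placeholder image
            timeout: 900
            environment:
              DEST_DISK: /dev/nvme0n1
              IMG_URL: "https://images.example.local/coreos.img.gz"  # placeholder URL
              COMPRESSED: "true"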

3. The Reconciliation Loop

This is where the magic happens. If you change the OS_VERSION in Git from v1.2 to v1.3:

  • The engine detects the drift.
  • It marks the node as “Out of Compliance.”
  • Depending on your policy, it then reboots and re-images the node automatically.
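
In Tinkerbell terms, that re-image is itself just another object: a Workflow that binds the updated template to the hardware. A rough sketch, assuming the Hardware and Template objects from earlier:

YAML

apiVersion: tinkerbell.org/v1alpha1
kind: Workflow
metadata:
  name: worker-node-04-reimage
spec:
  templateRef: base-install        # the template carrying the new OS version
  hardwareRef: worker-node-04      # the Hardware object defined above
  hardwareMap:
    device_1: "00:25:90:fd:8b:22"  # maps the template's worker to this machine's MAC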

A GitHub PR triggering a real-time bare metal OS reinstall.
Figure 3: Merging a Pull Request is the new “Insert Boot Media.”

What Breaks Without This

If you ignore declarative hardware, you are building on sand. Without this:

  • Zombie servers run outdated firmware for years.
  • Shadow clusters never get patched.
  • Disaster recovery depends on one engineer’s memory.
  • Security controls stop at the hypervisor.
  • Compliance audits turn into scavenger hunts.

You don’t have an infrastructure problem. You have a state management problem.

The Architect’s Verdict

We need to stop fearing the hardware. The reason on-prem operations fail is that we rely on human memory to manage physical state.

  1. Treat MAC Addresses like UUIDs. They are just unique identifiers for a compute resource.
  2. Kill the Snowflake Servers. If a server requires a “special BIOS tweak” that isn’t in the code, it is technical debt. Wipe it.
  3. The BMC is the API. Stop walking to the rack with a crash cart. If you can’t fix it via IPMI, it’s a hardware RMA, not a sysadmin problem.

If you are building your Modern Infrastructure Pillar, this is the foundation. You cannot run a cloud on top of manual hardware.

If you’re serious about Infrastructure as Code, GitOps can’t stop at the hypervisor. The metal is next. If you are ready to automate it, start the Modern Infra & IaC Learning Path.


About The Architect

R.M.

Senior Solutions Architect with 25+ years of experience in HCI, cloud strategy, and data resilience. As the lead behind Rack2Cloud, I focus on lab-verified guidance for complex enterprise transitions. View Credentials →

Editorial Integrity & Security Protocol

This technical deep-dive adheres to the Rack2Cloud Deterministic Integrity Standard. All benchmarks and security audits are derived from zero-trust validation protocols within our isolated lab environments. No vendor influence.

Last Validated: Feb 2026   |   Status: Production Verified
Affiliate Disclosure

This architectural deep-dive contains affiliate links to hardware and software tools validated in our lab. If you make a purchase through these links, we may earn a commission at no additional cost to you. This support allows us to maintain our independent testing environment and continue producing ad-free strategic research. See our Full Policy.
