Overview

Proxmox hypervisor running KVM virtual machines with GPU passthrough. The backbone of everything — from OpenClaw agents to local AI inference. An AMD Radeon AI PRO R9700 with 32GB VRAM handles local inference via ROCm. Not a cloud bill. Owned hardware, owned data, full control.

The current system is the second generation. The first started in 2024 as a repurposed Dell T430 workstation tower running a modified Tesla P40 (24GB VRAM) alongside an RTX 3060 12GB — enterprise inference capacity at used-hardware prices. That setup proved the concept: local GPU inference was fast enough and the architecture was worth building on. The current build is the evolution of that work.

Hypervisor — Physical Host

Processor

CPU: Intel Xeon E5-2690 v4
Cores: 14 cores / 28 threads @ 2.60GHz
Platform: X99
Memory

Installed: 128GB DDR4 ECC
Type: ECC Registered
GPU

GPU: AMD Radeon AI PRO R9700
VRAM: 32GB
Passthrough: VFIO to ageis-node VM
Storage

SSD: 2x 1TB TeamGroup + Samsung 870 EVO + 960 NVMe
HDD: 22TB WD White (bulk storage)
VM Storage: Local SSD pool

ageis-node — AI VM

A KVM virtual machine on the hypervisor dedicated to AI inference and agent workloads. The R9700 is passed through via PCIe for near-native GPU performance.

ageis-node

vCPUs: 4 (allocated from Xeon)
RAM: 16GB (expandable)
GPU: AMD R9700 32GB passthrough
Driver: ROCm (Linux)
Role: AI inference + OpenClaw + ACT-R
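
The passthrough itself is standard Proxmox VFIO: enable the IOMMU on the host, then hand the GPU's PCI device to the VM. A minimal sketch of the host-side steps, using a placeholder PCI address (0000:03:00.0) and VMID (100) rather than the actual values:

# /etc/default/grub: enable the IOMMU on an Intel host
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

$ update-grub && reboot
$ lspci -nn | grep -i vga                        # find the GPU's PCI address
$ qm set 100 --machine q35 --hostpci0 0000:03:00.0,pcie=1

With pcie=1 the guest sees a native PCIe device, which is why the VM uses the q35 machine type.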

Embedder / Inference Node

A dedicated secondary node handles embedding generation and text-to-speech workloads, keeping those tasks off the main inference GPU.

Embedder Node

OS: Debian 13 (Trixie)
GPU: GTX 1650 Super 4GB
Role: Embeddings + TTS worker
Status: Operational

Network

Both nodes sit on the local LAN with Tailscale VPN providing secure remote access. Services are accessible from anywhere through the mesh network without exposing ports to the internet.
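
Joining a node to the tailnet is a single command per machine; the hostname below is illustrative:

$ tailscale up --ssh --hostname=ageis-node
$ ssh ageis-node        # reachable from any tailnet device via MagicDNS

The --ssh flag lets Tailscale handle SSH auth over the mesh, so nothing listens on the public internet.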

Network Topology

LAN: Private local network
VPN: Tailscale mesh
Remote Access: Tailscale + SSH
DNS: Local resolver

$ pvesh get /nodes/hypervisor/status --output-format yaml
cpu_count: 28
memory:
  total: 128G
  allocated_vms: 16G (ageis-node) + others

$ pvesh get /nodes/ageis-node/status --output-format yaml
cpu: 0.12
memory:
  used: 11.2G
  total: 16G
uptime: 17280000
kversion: 6.8.12-5-pve

$ rocm-smi --showmeminfo vram
GPU[0] : VRAM Total:  32768 MB
GPU[0] : VRAM Used:   variable (model dependent)

$ tailscale status
hypervisor     connected   linux   active; direct
ageis-node     connected   linux   active; direct
embedder-node  connected   linux   active; direct

Why Local-First Matters

  • No API costs — run experiments without watching a billing dashboard
  • Offline capability — internet goes down, work continues
  • Data sovereignty — nothing leaves the network unless explicitly sent
  • No rate limits — run as many queries as the hardware can handle
  • Learn by running — understanding infrastructure means operating it daily

Technologies

Proxmox · KVM · ROCm · Tailscale · Linux

Status

Production. Runs 24/7. The foundation everything else is built on.