Overview
Proxmox hypervisor running KVM virtual machines with GPU passthrough. The backbone of everything — from OpenClaw agents to local AI inference. An AMD Radeon AI PRO R9700 with 32GB VRAM handles local inference via ROCm. Not a cloud bill. Owned hardware, owned data, full control.
The current system is the second generation. The first started in 2024 as a repurposed Dell T430 workstation tower running a modified Tesla P40 (24GB VRAM) alongside an RTX 3060 12GB — enterprise inference capacity at used-hardware prices. That setup proved the concept: local GPU inference was fast enough and the architecture was worth building on. The current build is the evolution of that work.
Hypervisor — Physical Host
Processor
| CPU | Intel Xeon E5-2690v4 |
| Cores | 14 cores / 28 threads @ 2.60GHz |
| Platform | X99 |
Memory
| Installed | 128GB DDR4 ECC |
| Type | ECC Registered |
GPU
| GPU | AMD Radeon AI PRO R9700 |
| VRAM | 32GB |
| Passthrough | VFIO to ageis-node VM |
Storage
| SSD | 2x 1TB TeamGroup + Samsung 870 EVO + 960 NVMe |
| HDD | 22TB WD White (bulk storage) |
| VM Storage | Local SSD pool |
ageis-node — AI VM
A KVM virtual machine on the hypervisor dedicated to AI inference and agent workloads. The R9700 is passed through via PCIe for near-native GPU performance.
ageis-node
| vCPUs | 4 (allocated from Xeon) |
| RAM | 16GB (expandable) |
| GPU | AMD R9700 32GB passthrough |
| Driver | ROCm (Linux) |
| Role | AI inference + OpenClaw + ACT-R |
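The passthrough follows the standard Proxmox VFIO recipe: enable the IOMMU on the host, bind the card to vfio-pci, and hand it to the VM as a PCIe device. A sketch of the relevant config (the VM ID and PCI address below are placeholders, not the actual values):

```
# /etc/default/grub (host) — enable the IOMMU on the Intel platform
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# /etc/modules (host) — load the VFIO modules at boot
vfio
vfio_iommu_type1
vfio_pci

# /etc/pve/qemu-server/100.conf (VM ID and PCI address are placeholders)
machine: q35
bios: ovmf
hostpci0: 0000:03:00.0,pcie=1
```

After `update-grub` and a reboot, `lspci -nnk` on the host should show the card's kernel driver in use as `vfio-pci`, at which point the guest sees it as a native device.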
Embedder / Inference Node
A dedicated secondary node handles embedding generation and text-to-speech workloads, keeping those tasks off the main inference GPU.
Embedder Node
| OS | Debian 13 (Trixie) |
| GPU | GTX 1650 Super 4GB |
| Role | Embeddings + TTS worker |
| Status | Operational |
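The two-node split is just task routing: embeddings and TTS go to the small GPU, everything else stays on the main AI VM. A toy dispatcher makes the policy explicit (node names match the writeup; the routing table itself is illustrative, not the actual scheduler):

```python
# Route tasks by type so embedding/TTS work never competes with the R9700.
# Unknown task types default to the main AI VM.
ROUTES = {
    "embedding": "embedder-node",
    "tts": "embedder-node",
    "inference": "ageis-node",
    "agent": "ageis-node",
}

def route(task_type: str) -> str:
    """Return the node that should handle a task; default to the main AI VM."""
    return ROUTES.get(task_type, "ageis-node")

print(route("embedding"))  # embedder-node
print(route("inference"))  # ageis-node
```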
Network
Both nodes sit on the local LAN with Tailscale VPN providing secure remote access. Services are accessible from anywhere through the mesh network without exposing ports to the internet.
Network Topology
| LAN | Private local network |
| VPN | Tailscale mesh |
| Remote Access | Tailscale + SSH |
| DNS | Local resolver |
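Because Tailscale exposes each node by hostname on the mesh, a health check is just a TCP probe against the tailnet name. A minimal sketch (the demo probes a local listener so it is self-contained; on the tailnet you would probe e.g. `is_reachable("ageis-node", 22)` — hostname per the table above, port hypothetical):

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demo: bind an ephemeral local port and probe it.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
print(is_reachable("127.0.0.1", port))  # True
srv.close()
```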
$ pvesh get /nodes/hypervisor/status --output-format yaml
cpu_count: 28
memory:
  total: 128G
  allocated_vms: 16G (ageis-node) + others
$ pvesh get /nodes/ageis-node/status --output-format yaml
cpu: 0.12
memory:
  used: 11.2G
  total: 16G
uptime: 17280000
kversion: 6.8.12-5-pve
$ rocm-smi --showmeminfo vram
GPU[0] : VRAM Total: 32768 MB
GPU[0] : VRAM Used: variable (model dependent)
$ tailscale status
hypervisor connected linux active; direct
ageis-node connected linux active; direct
embedder-node connected linux active; direct
Why Local-First Matters
- No API costs — run experiments without watching a billing dashboard
- Offline capability — internet goes down, work continues
- Data sovereignty — nothing leaves the network unless explicitly sent
- No rate limits — run as many queries as the hardware can handle
- Learn by running — understanding infrastructure means operating it daily
Technologies
Proxmox KVM ROCm Tailscale Linux
Status
Production. Runs 24/7. The foundation everything else is built on.