Hardware Setup

$ neofetch --hypervisor
  ╔═══════════════════════════════════════╗
  ║  HYPERVISOR (Physical Host)           ║
  ╠═══════════════════════════════════════╣
  ║  OS:       Proxmox VE (Debian-based)  ║
  ║  CPU:      Xeon E5-2690v4 (14C/28T)   ║
  ║  RAM:      128GB DDR4 ECC             ║
  ║  GPU:      AMD R9700 AI PRO 32GB      ║
  ║  Storage:  SSD pool + 22TB HDD        ║
  ║  Network:  LAN + Tailscale VPN        ║
  ║  Stack:    KVM / LXC / VFIO           ║
  ║  Uptime:   continuous                 ║
  ╚═══════════════════════════════════════╝

$ neofetch --ageis-node
  ╔═══════════════════════════════════════╗
  ║  AGEIS-NODE (AI VM)                   ║
  ╠═══════════════════════════════════════╣
  ║  OS:       Debian 13 (Trixie)         ║
  ║  CPU:      4 vCPUs (from Xeon pool)   ║
  ║  RAM:      16GB allocated             ║
  ║  GPU:      AMD R9700 32GB passthrough ║
  ║  Stack:    ROCm 6.x / Ollama          ║
  ║  Role:     AI inference + agents      ║
  ╚═══════════════════════════════════════╝

$ neofetch --embedder-node
  ╔═══════════════════════════════════════╗
  ║  EMBEDDER NODE                        ║
  ╠═══════════════════════════════════════╣
  ║  OS:       Debian 13 (Trixie)         ║
  ║  GPU:      GTX 1650 Super 4GB         ║
  ║  Role:     Embeddings + TTS worker    ║
  ║  Status:   operational                ║
  ╚═══════════════════════════════════════╝

Hypervisor — Physical Host

The physical machine runs Proxmox VE as the hypervisor, hosting KVM virtual machines and LXC containers for service isolation. Every VM and workload in the lab runs on this one box.
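
To see what the box is actually hosting, Proxmox's standard CLI does the job (output omitted here; VM names and IDs vary):

$ qm list     # KVM virtual machines (ageis-node lives here)
$ pct list    # LXC containers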

Processor

  CPU:       Intel Xeon E5-2690v4
  Cores:     14 cores / 28 threads @ 2.60GHz
  Platform:  X99
  Role:      Hypervisor + VM host

Memory

  Installed:  128GB DDR4 ECC
  Type:       ECC Registered
  Allocated:  VMs + Proxmox overhead

GPU

  GPU:          AMD Radeon AI PRO R9700
  VRAM:         32GB
  Passthrough:  VFIO to ageis-node VM
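
Host-side passthrough setup, as a sketch: IOMMU enabled on the kernel command line and the GPU bound to vfio-pci at boot. The PCI IDs below are placeholders; the real ones come from lspci -nn.

$ grep CMDLINE /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:xxxx,1002:yyyy   # GPU + its audio function (placeholder IDs)
$ lspci -nnk -d 1002: | grep "in use"      # should report vfio-pci on the host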

Storage

  SSD:         2x 1TB TeamGroup + Samsung 870 EVO + Samsung 960 NVMe
  HDD:         22TB WD White (bulk storage)
  Boot:        Proxmox host OS
  VM Storage:  Local SSD pool

ageis-node — AI VM

A KVM virtual machine on the hypervisor dedicated to AI inference, the OpenClaw pipeline, and the ACT-R memory system. The R9700 is handed to the VM over VFIO PCIe passthrough for near-native GPU performance.

VM Resources

  vCPUs:    4 (allocated from Xeon)
  RAM:      16GB (expandable)
  GPU:      AMD R9700 32GB passthrough
  Driver:   ROCm (Linux)
  Storage:  Portion of SSD pool
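
The relevant slice of the VM config, as a sketch (VMID 101 and the PCI address are placeholders):

$ grep -E 'machine|cpu|cores|memory|hostpci' /etc/pve/qemu-server/101.conf
machine: q35
cpu: host
cores: 4
memory: 16384
hostpci0: 0000:03:00,pcie=1,x-vga=1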

Role

  Inference:  Local LLM via Ollama
  Agents:     OpenClaw pipeline
  Memory:     ACT-R cognitive system
  Database:   PostgreSQL
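
A minimal smoke test against the inference stack, assuming Ollama's default port and an illustrative model name:

$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "llama3.1", "prompt": "Say hello.", "stream": false}'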

Embedder / Inference Node

A dedicated secondary node handles embedding generation and text-to-speech workloads, keeping those tasks off the main inference GPU.

Embedder Node

  OS:      Debian 13 (Trixie)
  GPU:     GTX 1650 Super 4GB
  Role:    Embeddings + TTS worker
  Status:  Operational
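
Assuming the embedder exposes an Ollama-style endpoint (hostname and model name are illustrative, not confirmed), a request looks like:

$ curl -s http://embedder:11434/api/embeddings \
    -d '{"model": "nomic-embed-text", "prompt": "homelab status report"}'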

Network

Both nodes sit on the local LAN with Tailscale VPN providing secure remote access. Services are accessible from anywhere through the mesh network without exposing ports to the internet.

Network Topology

  LAN:            Private local network
  VPN:            Tailscale mesh
  Remote Access:  Tailscale + SSH
  DNS:            Local resolver
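
Day-to-day access, assuming MagicDNS hostnames on the tailnet:

$ tailscale status              # peers and connection paths
$ tailscale ip -4 ageis-node    # tailnet address of a node
$ ssh user@ageis-node           # SSH over the mesh, no exposed ports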

ROCm GPU Stack

The AMD Radeon AI PRO R9700 runs ROCm on Linux for local GPU inference. ROCm provides the compute framework — similar to CUDA but for AMD hardware. This is what makes local AI inference possible without depending on NVIDIA.

The GPU is passed through from Proxmox to the ageis-node VM using KVM GPU passthrough. This gives the VM direct hardware access with near-native performance while keeping it isolated from the host.
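
Inside the VM, a quick check confirms the card arrived and the amdgpu driver claimed it (exact device naming varies by kernel):

$ lspci -nnk | grep -iA3 'vga\|display'   # look for "Kernel driver in use: amdgpu"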

$ rocm-smi --showmeminfo vram
GPU[0] : VRAM Total: 32GB
GPU[0] : VRAM Used:  variable (model dependent)

$ rocminfo | grep "Name"
Name: gfx1201
Marketing Name: AMD Radeon AI PRO R9700
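
For frameworks on top of ROCm: PyTorch's ROCm builds reuse the torch.cuda API surface, so the usual availability check works unchanged (assuming a ROCm build of PyTorch is installed in the VM):

$ python3 -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"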

Running Services

  • OpenClaw — AI agent platform (primary workload)
  • Ollama — Local LLM inference server
  • Embedding service — Vector embeddings on the embedder node
  • TTS service — Text-to-speech on the embedder node
  • PostgreSQL — ACT-R memory store + application data
  • Tailscale — Mesh VPN across all nodes
  • Monitoring — Resource tracking and alerting
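
A quick reachability sweep ties these together, assuming tailnet hostnames and default ports:

$ curl -s http://ageis-node:11434/api/tags >/dev/null && echo "ollama: up"
$ pg_isready -h ageis-node      # PostgreSQL health check
$ tailscale ping embedder       # verify the mesh path to the embedder node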

Why Local-First

Running inference locally means no API rate limits, no per-token costs for routine tasks, no data leaving the network, and no dependency on external services being available. The cloud is a tool, not a requirement — heavy reasoning goes to Claude API, everything else stays local. Read the full argument in Why Local AI Matters, or see how this hardware powers the OpenClaw agent platform. For the complete infrastructure deep-dive, see the Homelab Architecture write-up.