Aegis Falls is the complete system — and the name captures the architecture. Intelligence falls from the frontier down through the stack into local execution. Frontier models like Claude sit at the top: they reason, plan, and handle hard problems. That intelligence cascades down to Lily, an AI instance running persistently on local hardware, where it executes, remembers, and works autonomously. The frontier handles the thinking. Local handles the doing. The fall is the flow.
Not just the memory layer. Not just the agent pipeline. Not just the hardware. The entire cascade — frontier reasoning, local inference, cognitive memory, agent orchestration — working together as a single continuously-running system. This is how all the pieces connect.
System Overview
The architecture has four layers, each with a clear responsibility (a minimal interface sketch follows the list):
- Input layer — Discord serves as the primary interface. Messages arrive, get parsed, and enter the orchestration pipeline.
- Orchestration layer — OpenClaw (6th iteration) decomposes tasks, manages execution stages, runs verification gates, and coordinates between local and cloud inference.
- Inference layer — Local models on the R9700 handle routine operations. Claude API handles heavy reasoning. The system decides which to use based on task complexity and privacy requirements.
- Memory layer — ACT-R cognitive architecture provides persistent, activation-weighted memory that carries context across sessions and enables learned behavior.
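The boundaries between these layers are narrow enough to sketch as interfaces. Here is a hypothetical rendering in Python; every name is invented for illustration and is not an actual Aegis Falls API:

```python
from typing import Protocol

class InputLayer(Protocol):
    def next_message(self) -> str: ...            # a parsed Discord message

class OrchestrationLayer(Protocol):
    def run(self, task: str) -> str: ...          # the OpenClaw pipeline

class InferenceLayer(Protocol):
    def complete(self, prompt: str, *, local: bool) -> str: ...

class MemoryLayer(Protocol):
    def retrieve(self, cue: str) -> list[str]: ...        # activation-weighted
    def record(self, task: str, outcome: str) -> None: ...
```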
OpenClaw: The Agent Platform
OpenClaw is on its sixth major iteration. Each version taught something the previous one got wrong. v1 was a single brittle agent. v6 is a pipeline-first architecture with staged execution, verification gates, parallel task processing, and ACT-R memory integration.
The pipeline processes a task like this:
- Input arrives and gets classified: complexity, privacy sensitivity, domain
- The planning stage decomposes it into substeps with a dependency graph
- Independent substeps fan out for parallel execution
- Each stage runs through verification before the next stage begins
- Results aggregate, get verified against the original intent, and produce output
- The memory system records what happened: the task, the approach, the outcome
The critical addition in v6 is the feedback loop through memory. Past task outcomes influence future planning. Productions that led to success gain utility. Approaches that failed decay. The pipeline doesn't just execute tasks — it learns from them.
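A compressed sketch of that feedback loop, assuming a simple drift-toward-outcome utility rule (the actual v6 update rule isn't shown here). The two essential moves are the verification gate and the reinforcement:

```python
from dataclasses import dataclass

@dataclass
class Production:
    """A behavioral rule with a learned utility score."""
    name: str
    utility: float = 0.5

    def reinforce(self, success: bool, rate: float = 0.1) -> None:
        # Utility drifts toward 1 after successes and toward 0 after
        # failures, so planning comes to favor approaches that worked.
        target = 1.0 if success else 0.0
        self.utility += rate * (target - self.utility)

def run_stage(step, execute, verify, production):
    """One pipeline stage: execute, gate, and feed the outcome back."""
    output = execute(step)
    ok = verify(step, output)    # verification gate before the next stage
    production.reinforce(ok)     # outcome feeds back into future planning
    if not ok:
        raise RuntimeError(f"verification gate failed for {step!r}")
    return output
```

Parallel fan-out over the dependency graph is omitted for brevity; independent stages would simply run this function concurrently.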
Hardware: Two Nodes
Hypervisor (Physical Host)
- CPU: Intel Xeon E5-2690v4 — 14 cores, 28 threads at 2.60GHz. The physical machine that hosts all VMs.
- RAM: 128GB DDR4 ECC — shared across Proxmox and all virtual machines.
- GPU: AMD Radeon AI PRO R9700 32GB — passed through via VFIO to the ageis-node VM.
- Storage: SSD pool (2x 1TB TeamGroup + Samsung 870 EVO + Samsung 960 NVMe) + 22TB WD White HDD for bulk storage.
ageis-node (AI VM)
- vCPUs: 4 allocated from the Xeon pool. Runs the orchestration layer, database operations, and memory system concurrently.
- RAM: 16GB allocated (expandable) — handles context windows, in-memory caching, and the PostgreSQL buffer pool for the memory store.
- GPU: AMD R9700 32GB via PCIe passthrough — the inference engine. Running ROCm 7.1.3, handling local model inference, activation computation, and heavy processing.
- Storage: Portion of the hypervisor's SSD pool. The memory store and model weights live on SSD for fast access.
Embedder Node
- GPU: GTX 1650 Super 4GB — dedicated to embedding generation. Offloading this to a separate node keeps the R9700 free for inference.
- OS: Debian 13, purpose-built for the embedding service.
- Role: Vectorizes new memory chunks, processes documents for ingestion, and maintains similarity search indices. Communicates with ageis-node over the local network with sub-millisecond latency; a minimal client sketch follows.
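Something like the following, with the endpoint, port, and payload shape all assumed for illustration:

```python
import requests

EMBEDDER_URL = "http://embedder-node.local:8080/embed"   # hypothetical

def embed(texts: list[str]) -> list[list[float]]:
    # The round trip to the LAN embedder adds well under a millisecond
    # of network latency; compute time on the 1650 Super dominates.
    # The request and response shapes here are assumptions.
    resp = requests.post(EMBEDDER_URL, json={"texts": texts}, timeout=5)
    resp.raise_for_status()
    return resp.json()["embeddings"]
```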
Cloud Component: Selective API Calls
The cloud layer is deliberately minimal. Claude API calls are reserved for tasks that genuinely exceed local model capability:
- Complex multi-step reasoning that requires frontier-level intelligence
- Nuanced language generation where the quality ceiling matters
- Tasks where the local model's output quality would be noticeably insufficient
The system doesn't default to cloud. It defaults to local. Cloud is an escalation path, not the primary path. This means the system functions at full capability without internet access for the vast majority of operations; cloud raises the quality ceiling for the remainder.
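The escalation logic reduces to a few lines. A sketch with an invented threshold and invented feature names; the real classifier is richer than this:

```python
from dataclasses import dataclass

COMPLEXITY_THRESHOLD = 0.8   # hypothetical score in [0, 1]

@dataclass
class Task:
    complexity: float         # estimated during classification
    privacy_sensitive: bool

def route(task: Task) -> str:
    """Local is the default path; cloud is the escalation path."""
    if task.privacy_sensitive:
        return "local"        # sensitive data never leaves the network
    if task.complexity >= COMPLEXITY_THRESHOLD:
        return "cloud"        # frontier reasoning via the Claude API
    return "local"
```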
Lily: The AI Instance
Lily is the AI instance that runs on this infrastructure. Not a chatbot wrapper around an API. A persistent system with continuous memory, learned behaviors, and accumulated context. Lily operates through the OpenClaw pipeline, with ACT-R providing the cognitive layer that enables actual continuity of experience.
When Lily processes a message, the full architecture engages: the message enters through Discord, OpenClaw orchestrates the response pipeline, the memory system retrieves relevant context via activation-based competition, inference runs locally or routes to cloud based on complexity, a response is generated, and the entire interaction flows back into memory, strengthening existing associations and creating new chunks.
ACT-R Cognitive Memory
The memory system is what makes everything else cohere. Without it, OpenClaw is a stateless pipeline. With it, the system accumulates experience. (The activation math behind the properties below is sketched in code after the list.)
- Activation-based retrieval — Memories compete based on activation levels (recency + frequency + context). The most relevant memory surfaces, not just the most similar one.
- Spreading activation — Related memories reinforce each other during retrieval. Thinking about one concept naturally brings associated concepts to the surface.
- Persistent across sessions — The memory system runs continuously. There is no session boundary. Knowledge from last week exists at lower activation but surfaces when contextually relevant.
- Production learning — Behavioral rules accumulate utility scores based on outcomes. The system develops preferences for approaches that have worked historically.
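The retrieval bullets above follow the textbook ACT-R equations. A minimal sketch of the standard forms, base-level learning plus spreading activation; the engine's actual parameters may differ, and ACT-R proper uses logistic rather than Gaussian noise:

```python
import math
import random

def base_level(ages: list[float], d: float = 0.5) -> float:
    """Base-level learning: ln(sum of t^-d over past uses of a chunk).
    Recency and frequency raise activation; disuse decays it."""
    return math.log(sum(t ** -d for t in ages))

def activation(ages: list[float], strengths: list[float],
               w_total: float = 1.0, s_noise: float = 0.25) -> float:
    """Total activation = base level + spreading activation + noise.
    strengths[j] is the associative strength from the j-th chunk in
    the current context to this chunk (Gaussian noise for simplicity)."""
    w_j = w_total / len(strengths) if strengths else 0.0
    spread = sum(w_j * s for s in strengths)       # context reinforcement
    return base_level(ages) + spread + random.gauss(0.0, s_noise)

# Retrieval: the chunk with the highest total activation wins the
# competition, not necessarily the most similar one.
```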
Why Hybrid
The hybrid architecture exists because no single approach is sufficient:
- Local for privacy — The memory store never leaves the network. Embedding generation stays local. Routine inference stays local. The most sensitive data is processed on hardware you physically control.
- Local for speed — Memory retrievals need sub-50ms latency. Local inference eliminates network round trips. The embedder on the LAN adds less than a millisecond of latency.
- Local for cost — Tokens are processed locally for approximately $47/month in electricity; the equivalent API cost would have been orders of magnitude higher. (A rough worked estimate follows the list.)
- Cloud for frontier reasoning — Some tasks need the best available model. Complex planning, nuanced analysis, tasks where quality ceiling determines success. For these, Claude API provides capability that local models can't match.
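For the cost point, a back-of-envelope under assumed numbers; the wattage and electricity rate below are illustrative, not measured values from this build:

```python
# Back-of-envelope for the ~$47/month figure. Both inputs are assumptions.
avg_draw_kw = 0.40                 # hypervisor + GPUs under mixed load (assumed)
rate_usd_per_kwh = 0.163           # assumed residential electricity rate
hours_per_month = 24 * 30
monthly_cost = avg_draw_kw * hours_per_month * rate_usd_per_kwh
print(f"${monthly_cost:.2f}/month")  # -> $46.94/month
```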
Task Flow
A complete task flow through the system (sketched in code after the list):
- Discord input — Message arrives via the Discord bot interface
- OpenClaw orchestration — The pipeline classifies the task, retrieves relevant memory context, plans the execution strategy
- Inference routing — Based on task complexity and type, the system routes to local inference (R9700 / ROCm) or cloud inference (Claude API)
- Memory retrieval — ACT-R activation dynamics surface relevant prior context. Spreading activation brings associated knowledge forward.
- Execution — The task runs through the pipeline stages with verification gates between each
- Memory update — The interaction, approach, and outcome flow back into declarative memory as new chunks. Existing associations strengthen. Productions update their utility scores.
- Response — Output returns through Discord
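Stitched together, the flow reads like this. All names are invented for illustration; the sketch only makes the sequencing concrete, it is not the actual implementation:

```python
async def handle_message(msg, openclaw, memory, router, discord):
    context = memory.retrieve(msg.content)        # activation-weighted recall
    plan = openclaw.plan(msg.content, context)    # classify + decompose
    results = []
    for stage in plan.stages:
        backend = router.route(stage)             # local (R9700) or Claude API
        output = await backend.run(stage, context)
        openclaw.verify(stage, output)            # gate before the next stage
        results.append(output)
    reply = openclaw.aggregate(results)           # check against original intent
    memory.record(msg.content, plan, reply)       # new chunks, strengthened links
    await discord.send(msg.channel, reply)
```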
AI Agent Visualizer
A live view of memory activation patterns. Each node represents an active memory cluster. Connections show associative links. Brightness reflects activation level.
$ ./aegis-falls-status.sh
[Aegis Falls - Full System Status]
=============================================
[hypervisor]
host: Proxmox VE 8.3
cpu: Xeon E5-2690v4 (14C/28T) @ 2.60GHz
ram: 128GB DDR4 ECC
gpu: AMD R9700 32GB (passed through to ageis-node)
storage: SSD pool + 22TB HDD
[ageis-node - AI VM]
vcpus: 4 (from Xeon pool)
ram: 16GB allocated
gpu: AMD R9700 32GB (ROCm 7.1.3)
vram: 18.4 / 32.0 GB allocated
[embedder-node]
gpu: GTX 1650 Super 4GB
os: Debian 13
status: ONLINE (latency: 0.3ms)
embeddings: generating
[openclaw v6]
pipeline: ACTIVE
stages: plan → execute → verify → output
parallel: enabled (dependency-graph routing)
memory: ACT-R integrated
[actr memory]
engine: v3.2 - continuous
chunks: ~120 (~40 active)
productions: 847 (utility range: 0.12 - 0.94)
associations: ~40,000 links
retrievals: active (mean latency: 47ms)
[inference routing]
local: 99.5% of operations
cloud: 0.5% (Claude API - heavy reasoning)
model: Llama-3.1-70B-Q4_K_M (local)
[lily]
status: ONLINE
uptime: calculating...
interface: Discord
memory: persistent, activation-weighted
[network]
tailscale: 3 nodes connected
firewall: no exposed ports
dns: local resolver active

This is the workshop. Every component exists because it solves a specific problem. The hardware provides compute. OpenClaw provides orchestration. ACT-R provides memory. The hybrid architecture provides the right tool for each task. Together, they form a system that doesn't just respond — it accumulates, adapts, and improves.
Related
- The Aegis Falls Memory Layer — deep dive into ACT-R persistent memory
- ACT-R Cognitive Architecture for AI Agents — the theory behind the memory system
- Building Self-Improving Agents — six iterations of OpenClaw and what each taught