SoAZCloud
AI Infrastructure Consultant

I build AI agent systems
that run in production.

Not proofs of concept. Not demos. Multi-agent platforms with persistent memory, security hardening, cost controls, and self-healing infrastructure — serving real customers at scale.

Agent Insights Feed
  • $400K+ annual savings from a $2K/mo infrastructure investment
  • 89% reduction in token costs through model routing and caching
  • Multiple autonomous agents running 24/7 in production
  • 1,600+ cross-agent memories accumulated — every session gets smarter
  • Nightly self-healing updates — zero manual intervention, zero downtime
  • Compliance checks verified daily — automated, not manual
  • Support tickets triaged and analyzed before engineers see them
  • Replacing headcount with agents — not augmenting, replacing
  • 40 Linux capabilities dropped per container — security by default
  • New customers onboarded automatically — scrape, transform, provision
$400K+
Annual Savings
$2K
Monthly Infrastructure
89%
Token Cost Reduction
16x
Return on Investment

Enterprise AI agent infrastructure.
Designed, built, and operated.

Most companies that want AI agents end up with a chatbot bolted onto their product. That solves maybe 10% of the problem. The other 90% — orchestration, security, memory, cost control, multi-tenancy, automated operations — is where the real engineering lives.

I build multi-agent platforms where each agent owns a function — some customer-facing, serving thousands of end users directly, and others internal, automating operations for the teams behind the product. Each agent has its own identity, tools, security boundaries, and persistent memory. They run autonomously, learn from every interaction, and operate within hardened infrastructure that updates and heals itself nightly.

This isn't a framework or a template. It's production infrastructure that serves real customers around the clock.

AI Infrastructure — server stack with agent dashboards and monitoring panels
Case Study

Multi-Agent Platform for Enterprise SaaS

A production AI agent orchestration platform running multiple autonomous agents through a unified gateway — on pace to save $400K+ annually in salaries and benefits from a $2K/month infrastructure investment. External agents serve customers directly. Internal agents replace headcount across engineering, security, sales, and support.

Conversational Chat Agent
External — End Customer Interaction

One agent instance serves all external customers — not per-customer deployments. Context derived from HTTP request origin, with three-layer data isolation: origin-based scoping, API-enforced boundaries, and workspace file rules. Token optimization reduced session cost by 89% through intelligent caching and model routing. Deployed behind ALB + WAF with rate limiting and prompt injection defense.
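As a rough sketch of that first isolation layer, origin-based scoping might look like this (the hostnames, lookup table, and field names are invented for illustration, not the production logic):

```python
# Hypothetical origin-to-tenant map; real tenant resolution is more involved.
TENANTS = {
    "acme.example.com": {"tenant_id": "acme", "workspace": "/data/acme"},
    "globex.example.com": {"tenant_id": "globex", "workspace": "/data/globex"},
}

def tenant_context(origin_host):
    """Layer one of the isolation stack: scope every downstream call to
    the tenant derived from the HTTP request origin, or refuse outright."""
    tenant = TENANTS.get(origin_host)
    if tenant is None:
        raise PermissionError(f"unknown origin: {origin_host}")
    return tenant
```

The API-enforced boundaries and workspace file rules then act as a second and third check, so a bug in any single layer cannot leak one customer's data to another.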

Technical Reference Agent
External — Public Knowledge Base

Pre-built SQLite database: 76 product models, 4,052 manual sections, 3,241 verified facts with FTS5 full-text search. Live API integration for regulatory lookups. Zero-fabrication policy — never guesses specifications, always cites source material. Public HTTPS endpoint behind ALB + WAF.
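The lookup side of that zero-fabrication policy can be sketched with Python's built-in sqlite3 and an FTS5 virtual table (the schema and sample rows here are invented for illustration):

```python
import sqlite3

# Illustrative schema; the real table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE manual_sections USING fts5(model, section, body)"
)
conn.executemany(
    "INSERT INTO manual_sections VALUES (?, ?, ?)",
    [
        ("X-100", "Installation", "Mount the unit on a load-bearing wall."),
        ("X-100", "Electrical", "Requires a dedicated 20A circuit."),
        ("Z-200", "Electrical", "Ships with a standard 15A plug."),
    ],
)

def lookup(query):
    """Full-text search; the agent cites these rows instead of guessing."""
    return conn.execute(
        "SELECT model, section, body FROM manual_sections "
        "WHERE manual_sections MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
```

If `lookup` comes back empty, the agent says so rather than inventing a specification.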

Support Operations
Internal — Engineering & Support Teams

Automated ticket triage on cron schedules (business hours, off-hours, weekends). Syncs tickets, gathers code context from GitHub repos, writes technical analysis, and posts structured notes for engineers. 1,600+ memories accumulated from months of continuous operation.

Security Operations
Internal — Security & Compliance Team

CrowdStrike threat monitoring, SOC 2 compliance automation, security awareness training management. Runs daily policy lifecycle checks and post-update health verification. Keeps the security team informed without manual dashboard monitoring.

Data Acquisition
Internal — Data & Engineering Teams

Hybrid architecture: cloud instance for file operations, edge compute node with headed Chrome for browser automation. Bypasses bot detection through CDP-connected headed browsers. Async job system lets internal teams kick off long-running scrapes and check results later.

Provisioning
Internal — Onboarding & Ops Teams

Automated customer setup for internal ops teams: scrape existing web presence, transform data, provision through APIs. Full dev environment with production seed data (15 real accounts, 44K items). Handles complex parent/child account hierarchies.

Sales Enablement
Internal — Sales Team

Internal Q&A and feature training for the sales team. Answers product questions, surfaces competitive positioning, and keeps reps up to date on new capabilities — all through Slack, without leaving their workflow.

Account Management
Internal — Account & Onboarding Teams

Supports account managers and onboarding specialists with customer context, workflow automation, and internal knowledge base access. Two dedicated agents serving overlapping but distinct team functions through the same gateway.

All agents share a unified gateway, persistent semantic memory, and self-healing infrastructure — whether they're serving external customers or replacing internal headcount.

Read the Full Case Study
Shared Intelligence

Agents that remember.
Across every session.

Most AI agents are stateless — every conversation starts from zero. I built a persistent semantic memory system that all agents share. When one agent solves a problem, every other agent benefits.

Built on PostgreSQL + pgvector with local embeddings (not API calls — a 300MB model running on the same instance). Memory types include architectural decisions with rationale, discovered patterns, and verified reference material. 1,600+ memories accumulated and growing.
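A minimal in-memory sketch of the scoped-search pattern (the production store is a PostgreSQL table with a pgvector column; every name and the toy 2-dimensional embeddings here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Toy stand-in for the pgvector-backed memory table.
    Row shape: (owning agent, scope, text, embedding)."""
    def __init__(self):
        self.rows = []

    def add(self, agent, scope, text, embedding):
        self.rows.append((agent, scope, text, embedding))

    def search(self, agent, query_embedding, k=3):
        # Scoped access: an agent sees its own rows plus anything shared.
        visible = [r for r in self.rows if r[0] == agent or r[1] == "shared"]
        visible.sort(key=lambda r: cosine(r[3], query_embedding), reverse=True)
        return [r[2] for r in visible[:k]]
```

The scoping rule is the important part: a memory written as shared by one agent is immediately searchable by every other agent, while private memories stay private.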

Semantic memory network — interconnected nodes sharing knowledge across agents
Agent encounters a problem

Support agent receives a ticket about an API integration failure.

Searches shared memory first

Finds that the provisioning agent resolved a similar issue two weeks ago — including root cause and fix.

Resolves with context

Applies the known solution, adds new details discovered during this interaction.

Memory grows

New insight stored. Every future agent session — across all agents — can access it immediately.

PostgreSQL
+ pgvector
Local Embeddings
768-dim, zero API cost
Scoped Access
Agent + shared memories
MCP Server
IDE integration
Open Source

Tools I Build and Release Publicly

Clawd Coordinator

Orchestrate remote Claude Code sessions across machines via WebSocket. Dispatch prompts to agents running on cloud VMs, stream results back in real time, and fan out work in parallel — all from your terminal or via MCP.

Built for developers who need to coordinate AI coding agents across distributed infrastructure. MIT licensed, published on npm as @soazcloud/clawd-coordinator.

WebSocket Orchestration
Fan-out tasks across machines
SQLite Persistence
Task recovery, audit trails, zero native deps
RBAC & Multi-Tenant
Orgs, roles, per-agent tokens
Cross-Platform CLI
Windows, macOS, Linux, 25+ commands

What I Build

End-to-end AI agent infrastructure — from initial architecture through production operations.

Multi-Agent Orchestration

Unified gateway routing across Slack, HTTP, WebSocket, and cron triggers. Each agent isolated with its own tools, permissions, and communication channels.

Multi-Tenant Architecture

Single agent instances serving thousands of end customers with proper data isolation. Origin-based context injection, API-enforced boundaries, workspace-level rules.

Cost Engineering

Token optimization, prompt caching with warm-keeping strategies, model selection per agent based on task complexity. Turning $13.8K/month projections into $4.8K.
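A minimal sketch of per-task model routing, with made-up tier names and prices standing in for real models:

```python
# Hypothetical tiers; real model names and per-token prices are assumptions.
MODEL_TIERS = {
    "small": {"model": "small-fast-model", "usd_per_mtok": 0.25},
    "large": {"model": "large-reasoning-model", "usd_per_mtok": 3.00},
}

def route(task):
    """Send cheap, well-specified work to the small model; escalate only
    when the task needs tools or multi-step reasoning."""
    needs_large = bool(task.get("tools")) or task.get("steps", 1) > 2
    return MODEL_TIERS["large" if needs_large else "small"]
```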

Security Hardening

Docker sandboxed execution with 40 Linux capabilities dropped per container. Environment variable filtering, elevated tool restrictions, exec approval workflows.
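At the container boundary, that hardening reduces to how the `docker run` invocation is built. This sketch assembles the argv; the flag set, names, and allow-list mechanics are assumptions, not the exact production config:

```python
def sandbox_args(image, session_id, env_allowlist, env):
    """Build a hardened `docker run` argv for a session-scoped container."""
    args = [
        "docker", "run", "--rm",
        "--name", f"agent-session-{session_id}",
        "--cap-drop=ALL",                        # shed the capability set
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--network", "none",                     # no network unless granted
        "--read-only",                           # immutable root filesystem
    ]
    for key in env_allowlist:                    # filtered env injection:
        if key in env:                           # only allow-listed secrets
            args += ["-e", f"{key}={env[key]}"]  # ever reach the container
    args.append(image)
    return args
```

Anything not on the allow-list, including every other secret in the host environment, never reaches the agent's process.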

Self-Healing Operations

Automated nightly updates: version check, session drain, update, rebuild databases, smoke-test all agents with auto-retry and auto-fix, rollback on failure.
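That pipeline reduces to a small control loop. This sketch elides session drain and database rebuilds, and the callback names are invented:

```python
def nightly_update(current, latest, apply, smoke_test, rollback, retries=2):
    """Skeleton of the nightly loop: skip if current, apply the update,
    smoke-test with retries, and roll back if the fleet stays unhealthy."""
    if latest == current:
        return "up-to-date"
    apply(latest)
    for _ in range(retries + 1):
        if smoke_test():         # every agent must answer a canary prompt
            return "updated"
    rollback(current)            # fleet never came back healthy
    return "rolled-back"
```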

Semantic Memory Systems

Persistent cross-agent memory with vector search, scoped access control, and local embeddings. Agents accumulate institutional knowledge over time.

Sandboxed Code Execution

Session-scoped Docker containers for agent code execution. Credentials injected via environment variables, never exposed in agent-readable workspaces.

Hybrid Cloud + Edge

Multi-node architectures connecting cloud instances with edge compute nodes via encrypted mesh networking. Browser automation and heavy compute on dedicated hardware.

Behavioral Contracts

Agent behavior defined in versioned markdown workspace files — personality, tools, rules — not buried in application code. Hot-reloadable without restarts.

How I Think About AI Infrastructure

The technical decisions that separate production systems from demos.

One agent per function, not one LLM call per function

Each agent has persistent identity, memory, tools, and security boundaries. They learn over time. A "support agent" isn't a prompt template — it's a system that has triaged thousands of tickets and remembers every one.

Multi-tenant from a single agent

One external-facing agent serves all customers with proper data isolation — not per-customer deployments. Internal agents serve multiple teams through the same gateway. These architecture decisions directly affect cost, operations, and update velocity.

Local embeddings, not API calls

A 300MB model running locally handles all vector embeddings. Zero per-query API costs, lower latency, no external dependency for core memory operations.

Workspace files as behavioral contracts

Agent behavior lives in versioned markdown files, not buried in application code. Change an agent's personality, tools, or rules by editing a document. Hot-reload without restarts.
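One way to sketch this, assuming an invented `# key: value` header format for the workspace file, is a loader that re-parses the file whenever its mtime changes:

```python
from pathlib import Path

def load_contract(path):
    """Parse a minimal workspace file: '# key: value' headers followed by
    free-form behavior rules. The file format here is an assumption."""
    meta, rules = {}, []
    for line in Path(path).read_text().splitlines():
        if line.startswith("# ") and ":" in line:
            key, _, value = line[2:].partition(":")
            meta[key.strip()] = value.strip()
        elif line.strip():
            rules.append(line.strip())
    return meta, rules

class HotReloader:
    """Re-read the contract whenever the file's mtime changes,
    so edits take effect without restarting the agent process."""
    def __init__(self, path):
        self.path = Path(path)
        self.mtime = None
        self.contract = None

    def get(self):
        mtime = self.path.stat().st_mtime
        if mtime != self.mtime:
            self.mtime, self.contract = mtime, load_contract(self.path)
        return self.contract
```

Because the contract is a plain versioned file, a behavior change is a reviewed diff, not a code deploy.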

Docker sandboxing as security boundary

Agents execute code in session-scoped containers with 40 Linux capabilities dropped. Credentials injected via filtered environment variables — never in agent-readable workspace files.

Optimize for cost at scale, not just functionality

Token optimization, prompt caching, model selection per task complexity. Building something that works is step one. Building something that works at scale without a runaway API bill is the real job.

AI agents are easy to build.
Hard to operate.

Every engineering team can get a chatbot running in a weekend. The hard problems start on Monday: How do you serve thousands of external customers from a single instance without data leaking between them? How do you give internal teams autonomous agents without blowing up your API budget? How do you update nine agents every night without breaking any of them? How do you give agents the ability to execute code without giving them the keys to your infrastructure?

These are infrastructure problems, not AI problems. They require the same rigor as any other production system — monitoring, security, cost controls, automated operations — plus a deep understanding of how LLMs actually behave in production. That's what I specialize in.

Simple chatbot vs full production AI agent infrastructure
Infrastructure, not just AI
  • Single ARM64 instance running all agents with 2GB swap — cost-optimized, not over-provisioned
  • Prompt caching with 55-minute heartbeats to keep 1-hour TTL cache warm
  • WAF + ALB + rate limiting (300 req/5min) on public-facing endpoints
  • Prompt injection defense, output filtering, PII handling rules
  • Control UI dashboard with per-agent session tracking and cache diagnostics
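The cache heartbeat above can be sketched as a simple due-check, assuming a 1-hour provider TTL; the scheduler fires a minimal keep-warm request whenever it returns true:

```python
import time

CACHE_TTL = 60 * 60      # provider cache TTL: 1 hour (assumed)
HEARTBEAT_AT = 55 * 60   # ping at 55 minutes, just before expiry

def heartbeat_due(last_request_ts, now=None):
    """True when a minimal keep-warm request should fire: past the
    heartbeat mark but before the cached prompt prefix expires."""
    age = (now if now is not None else time.time()) - last_request_ts
    return HEARTBEAT_AT <= age < CACHE_TTL
```

Keeping the cached prefix warm costs a trivial request per hour and preserves the discounted cache-read pricing for real traffic.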
Limited Availability

Free AI Infrastructure Audit

A focused 30-minute review of your current AI stack — no pitch, no fluff. Walk away with a clear picture of where you stand and what to fix first.


What we cover

  • Production Readiness

    Is your agent architecture built to handle real load, failures, and edge cases — or is it a demo in disguise?

  • Token Cost Analysis

    Where your LLM spend is going, what's being wasted, and what caching or routing changes would move the needle.

  • Security Boundaries

    Tenant isolation, prompt injection exposure, output filtering gaps, and where your trust model breaks down.

  • Scaling Bottlenecks

    The architectural constraints that will choke throughput before you hit the scale you're planning for.

Book Your Free Audit

30 minutes — no obligation — spots are limited

Reading List

What I'm Reading

AI infrastructure, agent systems, and tools worth your time.

Let's Build Something Real

If you're past the proof-of-concept stage and need AI agents that actually run in production — with the security, cost controls, and operational maturity to back it up — I'd like to hear about it.
