The architecture of an AI employee

A system that thinks for you
and never forgets a thing.

Alex isn't an app. It's a small piece of software on your machine, a heartbeat in the cloud, and a memory that grows every week. Scroll through how it actually works.

~400MB local · Cloud-managed · Tenant isolated · Auto-updates
01 — The split

Two halves. One brain.

The whole system breaks into two pieces. Stuff on your computer. Stuff on ours. Once you see it this way, the rest of it just clicks.

Your machine · 400MB

Local Install

~/ · encrypted on disk

  • Claude Code: The AI runner. The thing that actually thinks when you talk to Alex.
  • Memory files: Plain English. Who you are, your team, your offers, your clients. Open in TextEdit.
  • Skills folder: ~25 playbooks for the work, including ad creation, briefs, status reports, and carousel builds.
  • .env keys: Your API keys for Anthropic, Slack, CRM, ads. Encrypted, never shared.
Our cloud · always on

The Heartbeat

Railway · multi-tenant · isolated

  • Scheduled jobs: Morning brief at 7am. Slack scans every 5 min. Calendar conflict watch. All of it fires without your laptop.
  • Background workers: Ad performance, review monitoring, lead enrichment, and edit pipeline status, running 24/7.
  • Tenant isolation: Every customer is its own sealed environment. No admin can see across accounts. Built in by design.
  • Skill repo sync: One private GitHub repo. Push update → every customer pulls in seconds. No reinstall.
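
The scheduled-jobs cadence above can be pictured as a small table plus a "what is due this minute" check. This is a minimal sketch only: the job names, the `Job` type, and the `due_jobs` function are hypothetical, and the actual scheduler running on Railway is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Job:
    name: str                             # hypothetical job name
    every_minutes: Optional[int] = None   # interval jobs, e.g. every 5 min
    daily_at: Optional[tuple] = None      # (hour, minute) for once-a-day jobs


# Cadences come from the text above; the names are illustrative.
JOBS = [
    Job("morning_brief", daily_at=(7, 0)),
    Job("slack_scan", every_minutes=5),
]


def due_jobs(now: datetime) -> list:
    """Return the names of jobs that should fire at this minute."""
    due = []
    for job in JOBS:
        if job.daily_at is not None and (now.hour, now.minute) == job.daily_at:
            due.append(job.name)
        elif job.every_minutes is not None and now.minute % job.every_minutes == 0:
            due.append(job.name)
    return due
```

Because the check depends only on the clock, it can run on a server with no laptop involved, which is the point of keeping these jobs in the cloud half.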
02 — The brain

Claude Code is the thinker.

The brain isn't a fixed program. It's a reasoning engine that reads your memory, picks the right playbook, runs the work, and reports back. It's the same Claude Code that Anthropic ships to developers.

Reads. Decides. Runs. Reports.

Every conversation, the brain reads your memory first, then picks the right skill from your folder, then runs the playbook end-to-end. It doesn't memorize a script — it reasons through what you asked.

Token context window
15–40 skills
Loaded per tier
~400 MB
Total local footprint
memory
Persistent across sessions
03 — The memory

Plain text files. Forever context.

No database. No admin panel. Memory is a folder of markdown files at the top of your home directory. Alex reads them on every startup. You can open any of them in any text editor.

CLAUDE.md

The prime directive. Loaded into every conversation. Who you are, your role, your business, hard rules. Permanent context.

clients.md

Active client registry. Names, offers, pipelines, status. Edits propagate to every workflow Alex runs for that client.

people.md

Team roster. Roles, communication style, escalation paths. Drives every routing and assignment decision.

decisions.md

Architectural and operational decisions. Append-only journal. Future sessions know why something was done.

feedback_*.md

Corrections and validated approaches. Every "do it this way" or "yes, exactly that" lives here. Compounds over time.

project_*.md

Live project state — active builds, in-flight migrations, deadlines. Updated as work progresses.
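
To make the "folder of markdown files" idea concrete, here is a minimal sketch of a loader that reads the files listed above. The file names and glob patterns come from this section; the `load_memory` function and the folder you pass to it are assumptions for illustration, not Alex's actual startup code.

```python
from pathlib import Path

# File names and patterns are from the list above.
FIXED_FILES = ["CLAUDE.md", "clients.md", "people.md", "decisions.md"]
PATTERNS = ["feedback_*.md", "project_*.md"]


def load_memory(memory_dir: Path) -> dict:
    """Read every memory file that exists and return {filename: text}."""
    memory = {}
    for name in FIXED_FILES:
        path = memory_dir / name
        if path.is_file():
            memory[name] = path.read_text(encoding="utf-8")
    for pattern in PATTERNS:
        # Sorted so the load order is stable from run to run.
        for path in sorted(memory_dir.glob(pattern)):
            memory[path.name] = path.read_text(encoding="utf-8")
    return memory
```

Nothing here needs a database: anything you can open in a text editor, the loader can read.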

~/ · alex
$ remember Caleb's agency runs F50 onboarding every Tuesday at 10am
> Saved to project_caleb_f50.md and people.md (Caleb · cadence)
→ context will load on every future session, automatically
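
Under the hood, a command like the one above could be as simple as appending a dated line to a markdown file. The sketch below assumes a hypothetical `remember` helper and an assumed entry format (`- date: note`); the real on-disk format is not specified here.

```python
from datetime import date
from pathlib import Path


def remember(memory_dir: Path, filename: str, note: str, today: date) -> Path:
    """Append one dated bullet to a memory file, creating the file if needed."""
    path = memory_dir / filename
    # Append mode keeps earlier entries intact, so the file only grows.
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {today.isoformat()}: {note}\n")
    return path
```

Appending instead of rewriting is what lets the context carry forward: every future session reads the same file, with the new line at the end.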
04 — The skills

Twenty-five playbooks. One trigger phrase each.

A skill is a self-contained playbook for a specific job. When you ask Alex to do something, it figures out which skill applies and runs the whole thing end-to-end. New ones drop in and just work.

Morning Brief: Daily · 7am
Ad Creative Factory: On demand
Carousel Generator: On demand
Client Status Report: Weekly
Slack Triage: 5-min poll
Edit Review: Pipeline
Brand Book Builder: Onboarding
Lead Audit Deck: Cold outreach
Email Outreach: Sequence
Meeting Notes: Post-call
Calendar Watch: Conflict scan
Ad Performance: Hourly
Content Scout: Research
Auto Editor: Reels · Shorts
GHL Expert: CRM ops
SEO Audit: Site-wide
Lead Site Builder: One-shot
Slide Deck: Decks · pitches
Visual Explainer: Walkthroughs
Brand Compliance: Pre-delivery
Task Delegation: Slack + Oudio
Client Health: Churn signals
Research Assistant: Deep dive
Voice Clone: ElevenLabs
+ More every week: Auto-pull
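
As a rough picture of "one trigger phrase each", the sketch below matches a request against a few phrases. The skill names are from the list above, but the trigger phrases and the `pick_skill` function are hypothetical, and plain substring matching is a stand-in: Alex is described as reasoning about which skill applies, not keyword matching.

```python
# Hypothetical trigger phrases; the skill names are from the list above.
TRIGGERS = {
    "Morning Brief": ["morning brief", "daily brief"],
    "Carousel Generator": ["carousel"],
    "SEO Audit": ["seo audit", "audit the site"],
}


def pick_skill(request: str):
    """Return the first skill whose trigger phrase appears in the request, else None."""
    text = request.lower()
    for skill, phrases in TRIGGERS.items():
        if any(phrase in text for phrase in phrases):
            return skill
    return None
```

Adding a skill in this picture is one new dictionary entry, which is the sense in which new ones "drop in and just work".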
05 — How it acts

From prompt to finished work — five steps.

Every action runs through the same path. Read context, plan, route to the right skill, execute, report. Every correction along the way teaches Alex something it remembers next time.

01 · INTAKE

You ask

Slack, Telegram, terminal, voice. The request lands and the brain wakes up.

02 · CONTEXT

Memory loads

CLAUDE.md, clients, people, decisions, feedback — all read in parallel before a single action.

03 · ROUTE

Skill picked

The right playbook is matched. Sub-agents spawn for parallel work when the job is big enough.

04 · EXECUTE

Work runs

Tools fire. APIs call. Files write. Approvals route to humans where required. Nothing happens silently.

05 · LEARN

Memory updates

Every correction, every "yes that worked" — appended to memory. The next run starts smarter.
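
The five steps above can be sketched as a single function. Everything named here (`Session`, `handle`, the `feedback_general.md` file, the keyword-based routing) is an assumption made for illustration; it shows the shape of the path, not the real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    memory: dict = field(default_factory=dict)  # filename -> list of notes
    log: list = field(default_factory=list)     # visible record of every step


def handle(request: str, session: Session, skills: dict, correction: str = "") -> str:
    """Run one request through intake, context, route, execute, learn."""
    # 01 · Intake
    session.log.append(f"intake: {request}")
    # 02 · Context: flatten every stored note into one list
    context = [note for notes in session.memory.values() for note in notes]
    session.log.append(f"context: {len(context)} notes loaded")
    # 03 · Route: keyword match as a stand-in for the model's reasoning
    name = next((n for n in skills if n in request.lower()), None)
    if name is None:
        session.log.append("route: no skill matched")
        return "no matching skill"
    session.log.append(f"route: {name}")
    # 04 · Execute: the skill is just a callable in this sketch
    result = skills[name](request, context)
    session.log.append(f"execute: {result}")
    # 05 · Learn: a correction is appended to memory for the next run
    if correction:
        session.memory.setdefault("feedback_general.md", []).append(correction)
        session.log.append("learn: correction saved")
    return result
```

Each step writes to the log before moving on, which mirrors the "nothing happens silently" rule, and a correction saved on one run shows up in the context of the next.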

06 — Cost engineering

Every token is measured.

Alex was built for a real agency burning real money on AI. Token efficiency isn't a feature, it's the whole architecture. Five layers of optimization run on every call.

Average compute spend per seat

$14–$32 / mo

Compared to standalone Claude API usage at the same volume, the optimization stack saves around 84%.
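
The arithmetic behind that figure is easy to check. This sketch takes the $200 unoptimized baseline quoted in this section as its input; the function names are illustrative.

```python
def optimized_cost(unoptimized: float, savings_rate: float) -> float:
    """Monthly cost left after applying a fractional saving."""
    return unoptimized * (1 - savings_rate)


def savings_rate(unoptimized: float, optimized: float) -> float:
    """Fraction saved when spend drops from unoptimized to optimized."""
    return 1 - optimized / unoptimized


# An 84% saving on a $200 baseline leaves $32, the top of the quoted range.
print(round(optimized_cost(200, 0.84), 2))
```

So the 84% number lines up with the upper end of the $14–$32 range measured against a $200 baseline; installs nearer the bottom of the range would be saving more than that.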

84% SAVED
$14–$32 optimized vs. $200+ unoptimized
  • Prompt caching: Static system prompts and memory cached at the API layer. Repeated reads are billed at a fraction of the normal input rate.
  • Model routing: Haiku for triage. Sonnet for everyday work. Opus reserved for heavy reasoning.
  • Context compression: Long sessions auto-summarize. Old turns roll up so the context stays lean.
  • Sub-agent parallelism: Big jobs fork into focused sub-agents, each with its own clean context. Less waste.
  • Spend telemetry: Live alerts at $5 / $10 / $30 thresholds. You see the cost rising before it's a number you regret.
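
Two of the layers above, model routing and spend telemetry, are simple enough to sketch. The thresholds and the Haiku / Sonnet / Opus split come from the list; the task labels, the function names, and the default-to-Sonnet fallback are assumptions.

```python
# Thresholds are the ones quoted above, in dollars.
ALERT_THRESHOLDS = (5.0, 10.0, 30.0)

# Hypothetical task labels mapped to the model tiers named above.
MODEL_FOR_TASK = {
    "triage": "haiku",
    "everyday": "sonnet",
    "heavy_reasoning": "opus",
}


def route_model(task_kind: str) -> str:
    """Pick a model tier for a task kind, defaulting to the everyday tier."""
    return MODEL_FOR_TASK.get(task_kind, "sonnet")


def crossed_thresholds(previous_spend: float, new_spend: float) -> list:
    """Return the alert thresholds passed when spend moves from previous to new."""
    return [t for t in ALERT_THRESHOLDS if previous_spend < t <= new_spend]
```

Comparing the previous and new totals means each threshold fires once, at the moment it is crossed, instead of on every call after it.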
07 — The framework

Gets smarter every week. Without you doing anything.

One private GitHub repo holds every skill. When a new model drops or a new playbook ships, it lands in the repo and your install pulls it in two seconds. No reinstall. No data loss. No upcharge.

01

Build

New skill or model upgrade lands in the private repo. Tagged for the right tier.

02

Push

Run alex update once on your machine and the new capability is live.

03

Run

The next time you talk to Alex, it's there. Same memory. Same context. New power.
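
"Tagged for the right tier" suggests that an update only delivers the skills an install is entitled to. Here is a minimal sketch of that filter, assuming the three tier names from the pricing section stack on top of each other; the skill identifiers, the catalog shape, and `skills_for` are hypothetical.

```python
# Tier names are from the pricing section; higher tiers include lower ones.
TIER_ORDER = ["starter", "operator", "studio"]

# Hypothetical catalog: skill id -> lowest tier that includes it.
CATALOG = {
    "morning-brief": "starter",
    "carousel-generator": "operator",
    "voice-clone": "studio",
}


def skills_for(install_tier: str, catalog: dict) -> list:
    """Return skills tagged at or below the install's tier, sorted by name."""
    rank = TIER_ORDER.index(install_tier)
    return sorted(
        name for name, tier in catalog.items()
        if TIER_ORDER.index(tier) <= rank
    )
```

The filter touches only the skill list. Memory files are not part of it, which fits the "same memory, same context" promise after an update.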

08 — The investment

One install. A team's worth of output.

Three tiers based on how deep you want to go. Each is a one-time install with ongoing access to updates, coaching, and the skill repo. Compute runs on your own API key — engineered to stay efficient.

Starter
$5,000
15 skills · core ops
  • Slack bot trained on your business
  • Daily morning brief
  • Client status reports
  • Edit review pipeline
  • Bi-weekly coaching
Operator · Most installs
$10,000
25 skills · full agency stack
  • Everything in Starter
  • Ad creative factory + delivery
  • Brand book generation
  • Carousel + visual builder
  • Lead enrichment + outreach
  • Priority coaching
Studio
$20,000
40 skills · custom builds
  • Everything in Operator
  • Custom skills built for your business
  • Full GHL + automation buildout
  • Voice clone + auto-edit pipeline
  • White-glove onboarding
+ Compute (your API key)
~$24 / mo
Average seat usage on your own Anthropic API key

Compute is billed directly by Anthropic on your own key — we don't mark it up. The five-layer optimization stack keeps the average install at $14–$32 / mo. Heavy users still rarely cross $60.

Compare that to a part-time VA at $1,200–$2,400 / mo for limited hours and a 30-day ramp.

Your install pays for itself the first time Alex writes a brief at 7am, drafts an ad before lunch, or catches a Slack message you would have missed.

— and that's the whole thing

Small on your machine. Massive in what it does for you.

A 400MB install. A cloud heartbeat. A memory that grows. Twenty-five playbooks today, more every week. One number on the bottom line.