The architecture of an AI employee

A system that thinks for you
and never forgets a thing.

Alex isn't an app. It's a small piece of software on your machine, a heartbeat in the cloud, and a memory that grows every week. Scroll through how it actually works.

~400MB local · Cloud-managed · Tenant isolated · Auto-updates


01 — The split

Two halves. One brain.

The whole system breaks into two pieces. Stuff on your computer. Stuff on ours. Once you see it this way, the rest of it just clicks.

Your machine · 400MB

Local Install

~/ · encrypted on disk

Claude Code: The AI runner. The thing that actually thinks when you talk to Alex.
Memory files: Plain English. Who you are, your team, your offers, your clients. Open in TextEdit.
Skills folder: ~25 playbooks for the work, including ad creation, briefs, status reports, and carousel builds.
.env keys: Your API keys for Anthropic, Slack, CRM, ads. Encrypted, never shared.

⇄

Our cloud · always on

The Heartbeat

Railway · multi-tenant · isolated

Scheduled jobs: Morning brief at 7am. Slack scans every 5 min. Calendar conflict watch. All of it fires even when your laptop is closed.
Background workers: Ad performance, review monitoring, lead enrichment, edit pipeline status. Running 24/7.
Tenant isolation: Every customer is its own sealed environment. No admin can see across accounts. Built in by design.
Skill repo sync: One private GitHub repo. We push an update and every customer pulls it in seconds. No reinstall.
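Here is one way the heartbeat's schedule could be pictured. It is a minimal sketch, not the actual Railway setup: the `Job` class, the `next_run` helper, and the anchor date are all made up for illustration, and time zones are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Job:
    """Hypothetical description of one recurring cloud job."""
    name: str
    every: timedelta      # how often the job repeats
    first_run: datetime   # anchor time the schedule is counted from


def next_run(job: Job, now: datetime) -> datetime:
    """Return the first scheduled time strictly after `now`."""
    if now < job.first_run:
        return job.first_run
    periods_done = (now - job.first_run) // job.every  # whole periods elapsed
    return job.first_run + (periods_done + 1) * job.every


# Cadences come from the list above; the anchor date is arbitrary.
ANCHOR = datetime(2025, 1, 6, 7, 0)
JOBS = [
    Job("morning_brief", timedelta(days=1), ANCHOR),
    Job("slack_scan", timedelta(minutes=5), ANCHOR),
]

now = datetime(2025, 1, 6, 7, 2)
for job in JOBS:
    print(job.name, "->", next_run(job, now))
```

Because the schedule lives in the cloud, nothing in it depends on your laptop being awake.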

02 — The brain

Claude Code is the thinker.

The brain isn't a fixed program. It's a reasoning engine that reads your memory, picks the right playbook, runs the work, and reports back. It runs on Claude Code, the agentic tool built by Anthropic.

Reads. Decides. Runs. Reports.

Every conversation, the brain reads your memory first, then picks the right skill from your folder, then runs the playbook end-to-end. It doesn't memorize a script — it reasons through what you asked.

0k+

Token context window

15 / 25 / 40 skills

Loaded per tier

~400 MB

Total local footprint

∞ memory

Persistent across sessions

03 — The memory

Plain text files. Forever context.

No database. No admin panel. Memory is a folder of markdown files at the top of your home directory. Alex reads them on every startup. You can open any of them in any text editor.

CLAUDE.md

The prime directive. Loaded into every conversation. Who you are, your role, your business, hard rules. Permanent context.

clients.md

Active client registry. Names, offers, pipelines, status. Edits propagate to every workflow Alex runs for that client.

people.md

Team roster. Roles, communication style, escalation paths. Drives every routing and assignment decision.

decisions.md

Architectural and operational decisions. Append-only journal. Future sessions know why something was done.

feedback_*.md

Corrections and validated approaches. Every "do it this way" or "yes, exactly that" lives here. Compounds over time.

project_*.md

Live project state — active builds, in-flight migrations, deadlines. Updated as work progresses.
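Loading that folder could look something like this. It is a rough sketch under stated assumptions: `load_memory` and `build_context` are hypothetical helpers, the load order is a guess, and the demo writes throwaway files to a temp directory instead of touching a real home directory.

```python
import tempfile
from pathlib import Path

# File names and patterns are the ones listed above; the order is an assumption.
MEMORY_PATTERNS = [
    "CLAUDE.md",
    "clients.md",
    "people.md",
    "decisions.md",
    "feedback_*.md",
    "project_*.md",
]


def load_memory(root: Path) -> dict[str, str]:
    """Read every memory file that exists under `root`, keyed by file name."""
    memory: dict[str, str] = {}
    for pattern in MEMORY_PATTERNS:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                memory[path.name] = path.read_text(encoding="utf-8")
    return memory


def build_context(memory: dict[str, str]) -> str:
    """Join the files into one block of text, each under its own heading."""
    return "\n\n".join(f"## {name}\n{text.strip()}" for name, text in memory.items())


# Demo with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text("Role: agency owner\n", encoding="utf-8")
    (root / "feedback_ads.md").write_text("Keep headlines short.\n", encoding="utf-8")
    (root / "notes.txt").write_text("not a memory file\n", encoding="utf-8")
    memory = load_memory(root)
    context = build_context(memory)

print(list(memory))
```

Files that don't match a known name or pattern are skipped, so stray notes in the same folder never leak into context.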

~/ · alex

$ remember Caleb's agency runs F50 onboarding every Tuesday at 10am

> Saved to project_caleb_f50.md and people.md (Caleb · cadence)

→ context will load on every future session, automatically
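The save step in that exchange can be pictured as a plain append to a text file. A minimal sketch: the `remember` function and the dated bullet format are assumptions, and the second note is invented for the demo.

```python
import tempfile
from datetime import date
from pathlib import Path


def remember(root: Path, filename: str, note: str, today: date) -> Path:
    """Append one dated line to a memory file, creating the file if needed."""
    path = root / filename
    with path.open("a", encoding="utf-8") as fh:  # "a" never overwrites old entries
        fh.write(f"- {today.isoformat()}: {note.strip()}\n")
    return path


# Demo in a temp directory so no real memory files are touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    remember(root, "people.md",
             "Caleb's agency runs F50 onboarding every Tuesday at 10am",
             date(2025, 1, 7))
    remember(root, "people.md",
             "Caleb prefers Slack over email",  # made-up second note
             date(2025, 1, 8))
    saved = (root / "people.md").read_text(encoding="utf-8").splitlines()

print(saved)
```

Appending rather than rewriting keeps earlier entries intact, which is what lets the file compound over time.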

04 — The skills

Twenty-five playbooks. One trigger phrase each.

A skill is a self-contained playbook for a specific job. When you ask Alex to do something, it figures out which skill applies and runs the whole thing end-to-end. New ones drop in and just work.

Morning Brief: Daily · 7am
Ad Creative Factory: On demand
Carousel Generator: On demand
Client Status Report: Weekly
Slack Triage: 5-min poll
Edit Review: Pipeline
Brand Book Builder: Onboarding
Lead Audit Deck: Cold outreach
Email Outreach: Sequence
Meeting Notes: Post-call
Calendar Watch: Conflict scan
Ad Performance: Hourly
Content Scout: Research
Auto Editor: Reels · Shorts
GHL Expert: CRM ops
SEO Audit: Site-wide
Lead Site Builder: One-shot
Slide Deck: Decks · pitches
Visual Explainer: Walkthroughs
Brand Compliance: Pre-delivery
Task Delegation: Slack + Oudio
Client Health: Churn signals
Research Assistant: Deep dive
Voice Clone: ElevenLabs
+ More every week: Auto-pull
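To make the trigger-phrase idea concrete, here is a deliberately simple stand-in. Alex picks skills by reasoning about the request; this sketch swaps that for plain substring matching, and the `Skill` class, the `route` function, and every trigger phrase are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: tuple[str, ...]  # made-up phrases, lowercase


SKILLS = [
    Skill("Morning Brief", ("morning brief",)),
    Skill("Carousel Generator", ("carousel",)),
    Skill("SEO Audit", ("seo audit", "audit my site")),
]


def route(request: str, skills: list[Skill] = SKILLS) -> Optional[Skill]:
    """Return the skill whose longest trigger phrase appears in the request."""
    text = request.lower()
    best: Optional[Skill] = None
    best_len = 0
    for skill in skills:
        for trigger in skill.triggers:
            if trigger in text and len(trigger) > best_len:
                best, best_len = skill, len(trigger)
    return best


match = route("Build a carousel for the spring offer")
print(match.name if match else "no skill matched")
```

Preferring the longest matching phrase is one simple way to break ties when two skills share a word.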

05 — How it acts

From prompt to finished work — five steps.

Every action runs through the same path. Read context, plan, route to the right skill, execute, report. Every correction along the way teaches Alex something it remembers next time.

01 · INTAKE

You ask

Slack, Telegram, terminal, voice. The request lands and the brain wakes up.

02 · CONTEXT

Memory loads

CLAUDE.md, clients, people, decisions, feedback — all read in parallel before a single action.

03 · ROUTE

Skill picked

The right playbook is matched. Sub-agents spawn for parallel work when the job is big enough.

04 · EXECUTE

Work runs

Tools fire. APIs call. Files write. Approvals route to humans where required. Nothing happens silently.

05 · LEARN

Memory updates

Every correction, every "yes that worked" — appended to memory. The next run starts smarter.
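The five steps above can be sketched as a single function. This is an illustration only: `Run`, `handle`, and the three callables passed in are hypothetical names, and the lambdas stand in for the real memory loader, skill router, and executor.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Run:
    """Hypothetical record of one request moving through the pipeline."""
    request: str
    context: str = ""
    skill: str = ""
    result: str = ""
    log: list[str] = field(default_factory=list)


def handle(
    request: str,
    load_context: Callable[[], str],
    pick_skill: Callable[[str, str], str],
    execute: Callable[[str, str, str], str],
    memory: list[str],
) -> Run:
    run = Run(request)
    run.log.append("intake")                                # 01: the request lands

    run.context = load_context()
    run.log.append("context")                               # 02: memory loads

    run.skill = pick_skill(request, run.context)
    run.log.append("route")                                 # 03: a skill is picked

    run.result = execute(run.skill, request, run.context)
    run.log.append("execute")                               # 04: the work runs

    memory.append(f"{run.skill}: {request}")
    run.log.append("learn")                                 # 05: memory updates
    return run


memory: list[str] = []
run = handle(
    "Draft the weekly status report",
    load_context=lambda: "CLAUDE.md + clients.md",
    pick_skill=lambda request, context: "Client Status Report",
    execute=lambda skill, request, context: f"{skill} finished",
    memory=memory,
)
print(run.log, run.result)
```

Keeping a step log on every run is one way to make sure nothing happens silently.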

06 — Cost engineering

Every token is measured.

Alex was built for a real agency burning real money on AI. Token efficiency isn't a feature, it's the whole architecture. Five layers of optimization run on every call.

Average compute spend per seat

$14–$32 / mo

Compared to standalone Claude API usage at the same volume, the optimization stack saves around 84%.

84% SAVED

vs unoptimized · $200+
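A quick check on how the 84% figure lines up with the dollar amounts. This assumes the $200 baseline is the comparison point; the function names are made up for the example.

```python
def optimized_cost(unoptimized: float, savings_rate: float) -> float:
    """Monthly cost left over after a given savings rate."""
    return unoptimized * (1 - savings_rate)


def savings_rate(unoptimized: float, optimized: float) -> float:
    """Fraction of the unoptimized bill that was avoided."""
    return 1 - optimized / unoptimized


print(round(optimized_cost(200, 0.84), 2))  # 32.0, the top of the quoted range
print(round(savings_rate(200, 14), 2))      # 0.93, the bottom of the range
```

So 84% off a $200 baseline lands at roughly $32 / mo, the upper end of the quoted range, and a $14 seat would be saving closer to 93%.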

Prompt caching: Static system prompts and memory are cached at the API layer. Repeated reads are billed at a fraction of the normal input price.
Model routing: Haiku for triage. Sonnet for everyday work. Opus reserved for heavy reasoning.
Context compression: Long sessions auto-summarize. Old turns roll up so the context stays lean.
Sub-agent parallelism: Big jobs fork into focused sub-agents, each with its own clean context. Less waste.
Spend telemetry: Live alerts at $5 / $10 / $30 thresholds. You see the cost rising before it's a number you regret.
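Two of those layers are easy to picture in code. A minimal sketch, assuming a simple task label decides the model and that alerts fire once per threshold crossing; the task labels, the function names, and the lowercase tier strings are illustrative, not real API model IDs.

```python
# Model tiers come from the list above; the task labels are assumptions.
MODEL_BY_TASK = {
    "triage": "haiku",
    "everyday": "sonnet",
    "heavy_reasoning": "opus",
}


def pick_model(task_kind: str) -> str:
    """Route a task to a model tier, defaulting to the everyday tier."""
    return MODEL_BY_TASK.get(task_kind, "sonnet")


# Dollar thresholds come from the spend telemetry line above.
ALERT_THRESHOLDS = (5.0, 10.0, 30.0)


def crossed_thresholds(previous_spend: float, new_spend: float,
                       thresholds: tuple[float, ...] = ALERT_THRESHOLDS) -> list[float]:
    """Return the thresholds passed when spend moves from previous to new."""
    return [t for t in thresholds if previous_spend < t <= new_spend]


print(pick_model("triage"))            # haiku
print(crossed_thresholds(4.20, 10.50)) # [5.0, 10.0]
print(crossed_thresholds(10.50, 12.0)) # []
```

Comparing against the previous total means each threshold alerts once, not on every call after it has been passed.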

07 — The framework

Gets smarter every week. Without you doing anything.

One private GitHub repo holds every skill. When a new model drops or a new playbook ships, it lands in the repo and your install pulls it in two seconds. No reinstall. No data loss. No upcharge.

Build

New skill or model upgrade lands in the private repo. Tagged for the right tier.

Push

One alex update on your machine and the new capability is live.

Run

The next time you talk to Alex, it's there. Same memory. Same context. New power.
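Tier tagging could work along these lines. This is a sketch under assumptions: the manifest format, the skill slugs, and `skills_for` are hypothetical, and the tier assignments follow the plan lists on this page, where each tier includes everything in the one below it.

```python
# Higher tiers include everything from the tiers below them.
TIER_RANK = {"starter": 0, "operator": 1, "studio": 2}

# Hypothetical manifest: skill slug -> lowest tier that receives it.
MANIFEST = {
    "morning-brief": "starter",
    "client-status-report": "starter",
    "ad-creative-factory": "operator",
    "voice-clone": "studio",
}


def skills_for(tier: str, manifest: dict[str, str] = MANIFEST) -> list[str]:
    """List the skills an install on `tier` should pull, in sorted order."""
    rank = TIER_RANK[tier]
    return sorted(name for name, min_tier in manifest.items()
                  if TIER_RANK[min_tier] <= rank)


for tier in TIER_RANK:
    print(tier, skills_for(tier))
```

A new skill only needs one tag in the manifest to reach every install at that tier and above on the next update.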

08 — The investment

One install. A team's worth of output.

Three tiers based on how deep you want to go. Each is a one-time install with ongoing access to updates, coaching, and the skill repo. Compute runs on your own API key — engineered to stay efficient.

Starter

$5,000

15 skills · core ops

Slack bot trained on your business
Daily morning brief
Client status reports
Edit review pipeline
Bi-weekly coaching

Most installs

Operator

$10,000

25 skills · full agency stack

Everything in Starter
Ad creative factory + delivery
Brand book generation
Carousel + visual builder
Lead enrichment + outreach
Priority coaching

Studio

$20,000

40 skills · custom builds

Everything in Operator
Custom skills built for your business
Full GHL + automation buildout
Voice clone + auto-edit pipeline
White-glove onboarding

+ Compute (your API key)

~$24 / mo

Average seat usage on your own Anthropic API key

Compute is billed directly by Anthropic on your own key — we don't mark it up. The five-layer optimization stack keeps the average install at $14–$32 / mo. Heavy users still rarely cross $60.

Compare that to a part-time VA at $1,200–$2,400 / mo for limited hours and a 30-day ramp.

Your install pays for itself the first time Alex writes a brief at 7am, drafts an ad before lunch, or catches a Slack message you would have missed.
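For a rough sense of the math against a VA, here is a back-of-envelope break-even. It is a sketch that assumes Alex fully replaces the VA spend, which is a simplification; the prices are the ones on this page and `breakeven_months` is a made-up helper.

```python
import math


def breakeven_months(install_price: float, compute_per_month: float,
                     va_per_month: float) -> int:
    """Whole months until the install price is covered by avoided VA cost."""
    monthly_saving = va_per_month - compute_per_month
    if monthly_saving <= 0:
        raise ValueError("compute costs at least as much as the VA")
    return math.ceil(install_price / monthly_saving)


COMPUTE = 24  # average monthly compute from the figure above
for tier, price in [("Starter", 5_000), ("Operator", 10_000), ("Studio", 20_000)]:
    slow = breakeven_months(price, COMPUTE, 1_200)  # vs the cheaper VA
    fast = breakeven_months(price, COMPUTE, 2_400)  # vs the pricier VA
    print(f"{tier}: {fast} to {slow} months")
```

Under those assumptions, Starter breaks even in 3 to 5 months, Operator in 5 to 9, and Studio in 9 to 18.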


— and that's the whole thing

Small on your machine. Massive in what it does for you.

A 400MB install. A cloud heartbeat. A memory that grows. Twenty-five playbooks today, more every week. One number on the bottom line.

A system that thinks for you and never forgets a thing.