Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.

Enterprise Agentic AI Pattern: Proximity Over Specialization

posted on November 22, 2025 | tags: [ AppliedAI, AI Agents, Multi-Tenant, Architecture, Enterprise AI ]
A production-ready pattern for 2025 and beyond – Mark Roxberry
A deep dive into the architecture pattern that dominated AI agent deployments in 2025: one immutable agent binary, externalized context, and proximity-based execution.

Proximity Over Specialization

I stopped writing blog posts about AI agents for almost a year because every time I started one, the architecture I wanted to describe felt too simple to be interesting. Turns out “too simple” is exactly what enterprises now pay millions to achieve.

Run the exact same immutable agent binary in every tenant and every region.
Everything that makes it behave differently — context, policies, RAG indexes, escalation rules — lives outside the code and is injected at runtime.

That’s it. No per-tenant forks. No “EU agent” vs “US agent” code bases. One golden image, zero trust exceptions, full data residency compliance.

I'll call this Proximity over Specialization.


Why This Pattern Won in 2025

  1. Security & audit teams finally sleep at night
    One binary = one SBOM = one place to patch prompt-injection fixes or update the JSON validator.
    CISOs now accept multi-tenant AI agents only when the answer to “show me every line of code that runs in Germany” is a single Git commit SHA.

  2. Data residency becomes trivial
    Deploy the same container to Frankfurt, Virginia, Singapore, São Paulo.
    The agent wakes up, pulls tenant-specific context from the local object store or vector DB, and immediately satisfies the data-residency requirements of GDPR, CCPA, LGPD, and the rest.

  3. Upgrades measured in minutes, not quarters
    Blue/green rollout of a single Docker image across 40 regions finishes before your coffee gets cold.
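The startup sequence implied above can be sketched in a few lines. This is a hypothetical illustration, not my production code: the store layout, file name, and function signature are all assumptions. The point is that the binary resolves behavior from a region-local artifact, never from a tenant branch in code.

```python
import json
from pathlib import Path

def load_tenant_context(store: Path, region: str, tenant: str) -> dict:
    """Resolve the versioned policy/context bundle for one tenant in one region.

    The agent binary never branches on tenant identity; its behavior is
    whatever bundle the deployment placed in the region-local store.
    (The <store>/<region>/<tenant>/policy_bundle.json layout is an
    assumed example schema.)
    """
    bundle_path = store / region / tenant / "policy_bundle.json"
    with open(bundle_path) as f:
        return json.load(f)
```

Deploy the same image to Frankfurt and Virginia; only the contents of the store differ, so "show me every line of code that runs in Germany" stays a single commit SHA.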


The Four Pillars We Actually Ship With

  1. Single immutable agent binary
    No if tenant == "acme-eu" nonsense. Ever.

  2. Externalized, cacheable context
    Policies, product catalogs, SLAs, escalation matrices — all versioned JSON/YAML artifacts stored next to the data they govern.

  3. Model router, not model roulette
    95% of turns are answered by a local cache or a cheap 8B–70B model.
    Only true “no grounding” cases escalate to the frontier model (logged, metered, reviewed).

  4. Recursive sub-agent compaction for oversized context
    When a tenant’s policy + RAG payload exceeds the model window, the same agent binary spins up a temporary “Context Compactor” sub-agent.
    It distills 500k tokens → 24k tokens, then hands the dense summary back to the primary loop.

Primary Agent
   ↓ (context > window)
Spawns → Context Compactor Sub-Agent (cheap long-context model)
   ← returns compacted summary + fingerprint
Primary Agent continues with grounded, compliant context

Same binary. Different system prompt. Zero new code.
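The compaction decision above fits in one function. A minimal sketch, with assumptions labeled: the window and target sizes are illustrative, and call_compactor stands in for re-invoking the same agent binary with the compactor system prompt at temperature=0, which is what makes the result deterministic and cacheable by fingerprint.

```python
import hashlib

PRIMARY_WINDOW = 128_000    # assumed primary-model context window, in tokens
COMPACTION_TARGET = 24_000  # target size, per the 500k -> 24k example above

def prepare_context(tokens: list[str], call_compactor):
    """Spawn the Context Compactor only when the payload overflows the window.

    Returns (tokens_for_primary, fingerprint). The fingerprint ties the
    compacted blob back to exactly this raw input, so identical inputs
    hit the cache instead of re-running compaction.
    """
    if len(tokens) <= PRIMARY_WINDOW:
        return tokens, None  # fits as-is; no sub-agent needed
    # Hash the raw payload before compaction for caching and audit.
    fingerprint = hashlib.sha256(" ".join(tokens).encode()).hexdigest()
    compacted = call_compactor(tokens, COMPACTION_TARGET)
    return compacted, fingerprint
```

The primary loop then proceeds with the compacted tokens exactly as if the tenant had shipped a small context in the first place.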


The Refinements That Turned This From a Prototype Into a $XXM Workload

  • Compaction is two-stage and idempotent (temperature=0) so results are cacheable
  • Every artifact (raw context, compacted blob, policy bundle) is versioned and signed
  • Compactor redacts PII automatically; if redaction removes >30% it aborts and escalates
  • Cold-start latency dropped from 9s → 800ms by pre-warming compacted blobs on policy deploy
  • Added a speculative local model (Phi-3.5 or Grok-3-mini) before frontier escalation — ~60% token savings
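The speculative-routing step is simple enough to show. A sketch under stated assumptions: the confidence floor is a made-up tuning knob, and local_model / frontier_model are hypothetical stand-ins (think a Phi-3.5-class model and a frontier API) that return an answer plus a confidence score.

```python
import logging

logger = logging.getLogger("model-router")
CONFIDENCE_FLOOR = 0.7  # assumed threshold; tune per workload

def route(turn: str, cache: dict, local_model, frontier_model) -> str:
    """Cache -> cheap speculative local model -> frontier, in that order."""
    if turn in cache:  # repeat traffic never touches a model at all
        return cache[turn]
    answer, confidence = local_model(turn)
    if confidence >= CONFIDENCE_FLOOR:  # speculative answer accepted
        cache[turn] = answer
        return answer
    # True "no grounding" case: escalate, but log and meter every one.
    logger.warning("frontier escalation: %r (local conf=%.2f)", turn, confidence)
    answer, _ = frontier_model(turn)
    cache[turn] = answer
    return answer
```

With most turns absorbed by the cache or the local model, frontier spend becomes the exception you review rather than the default you pay for.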

Results After 18 Months in Production

  • 100% of regulatory audits passed — auditors asked only for the container image digest
  • 11-minute global patch window for prompt-injection vectors
  • 3.8s p95 latency for a 400k-token grounded context
  • <8% frontier model spend, down from 68% when “just calling GPT-4o”

Closing

This isn’t sexy. There are no 1.9T hero models, no fine-tuned snowflakes, no framework-of-the-week. If you’re one of the three companies on Earth that actually needs a 70B fine-tune per tenant for drug discovery IPC classification, congratulations — you get to keep your snowflake repo. Everyone else gets to ship.

It’s just engineering.

And in 2025, disciplined engineering is the only thing that separates AI pilots that quietly die from AI infrastructure that prints money and survives audits.

One binary. External context. Proximity over specialization.


Credits

Image

  • Image generated with DALL·E (OpenAI); edited by Mark Roxberry
This post and/or images used in it may have been created or enhanced using generative AI tools for clarity and organization. However, all ideas, technical work, solutions, integrations, and other aspects described here are entirely my own.