Anavec · AnaRack — The AI-Native Runtime Platform

CHAPTER 01

A heterogeneous rack chassis, not bespoke infrastructure.

AnaRack standardizes disparate hardware components into a single foundational architecture. Multi-zones, one chassis, one runtime — and any compliant drawer slides in.

Multi-zones. One foundational architecture.

Today, enterprise AI infrastructure is bespoke — every workload assembles a different pile of servers, switches, and storage. AnaRack makes that the wrong question. The chassis is the answer; the modules are the question.

Multi-zones standardize the surface that AnaROS schedules onto. Every zone accepts any third-party drawer that conforms to its SDI contract. The drawers can change. The chassis doesn't.

[ZONE_A]

High-Density CPU Tier

x86 / ARM compute sleds carrying the control plane, services, and data-prep stages of the pipeline.

1U / 2U profiles

[ZONE_B]

Accelerated GPU Compute

PCIe shelves carrying heterogeneous GPU payload — drawers swap on their own clock, vendor-neutral.

multi-vendor

[ZONE_C]

Networking & Switching Layer

Leaf/spine fabric · out-of-band management · scale-up + scale-out + scale-cross under one plane.

SONiC · multi-vendor

[ZONE_D]

Memory & Storage Tier

NVMe-oF shelves, memory pools, and object tiers — drawer-class storage and a new memory tier for memory-bound acceleration, surfaced through SDI.

hot · warm · cold

AnaRack heterogeneous rack chassis · modular drawers and orange/blue pipeline flows

Brownfield · or · AnaRack

AnaROS runs on the rack you choose. AnaRack is the rack when you want more.

BROWNFIELD · YOUR EXISTING RACK

AnaROS attaches — unchanged hardware.

Your GPU cluster, your CUDA stack, your K8s — no forklift. AnaROS drops in as the rack OS on top of what you already own.

Visibility from workflow to silicon
Workload governance for every pipeline that runs on the rack
Infrastructure guardrails for who and what is allowed to act

ANARACK · THE OPEN-COMPUTE UPGRADE

AnaROS + AnaRack — decoupled CPU and GPU.

Open, composable, heterogeneous. Segregate the CPU host from the GPU shelf. Extend the life of what you own and refresh each on its own clock.

Open networking · open shelf · off-the-shelf memory and disk
Segregated CPU host from GPU shelf for supply-chain flexibility
Ratio profiles become a dynamic per-workload rack policy
Stackable across racks as one logical chassis

Same operating contract. Same Pipeline Governor. Same two surfaces. The rack decides how far you push open-compute.

CHAPTER 02

Heterogeneous capability drawers.

Hardware components — GPU servers, CPU nodes, switches — are not independently installed. They onboard and register into the platform as drawers, through SDI, vendor-neutral.

Components don't just get installed. They get onboarded.

The moment a subsystem slides into a chassis, AnaRack's SDI layer registers it, fingerprints its capabilities, runs a class probe, and onboards it onto AnaRack. From that point on, the subsystem is no longer a piece of hardware; it is a logical capability the platform can schedule, below the application layer.

3rd-party hardware integrates through the SDI contract for its class. Onboarding is minutes, not weeks. Removal is a hot-swap, not a maintenance window.

>INITIALIZING CAPABILITY ONBOARDING

>DETECTING: 8× H100 TENSOR CORE GPU | 10-Slot PCIe expansion shelf

>CLASS: GSL · accelerator | pcie-shelf

>FINGERPRINT: 0x9F2A:H100:80GB:NVLINK·OK | 0x16B8:L40S:PCIe·OK

>STATUS: READY FOR REGISTRATION

ONBOARDING · LIVE

Capability drawer onboarding · INITIALIZING CAPABILITY ONBOARDING · 8× H100 Tensor Core GPU · READY FOR REGISTRATION

01 · DETECT

SDI auto-discovers

Drawer slots in

AnaRack's SDI layer auto-discovers the drawer the moment it lands in a slot. Fingerprints model, count, firmware, NVLink topology, NIC ports.

capability_class=GSL · fingerprint=0x9F2A

02 · REGISTER

Governor admits it

Capability registered

AnaROS Governor validates the drawer against its class registry, runs the probe, then registers the capability as available — bound to this rack, this slot.

probed_ok · lifecycle=deployed_full

03 · ABSTRACT

Raw silicon → logical pool

Physical complexity hidden

AnaRack translates the drawer's physical surface into logical capabilities the platform can schedule: TFLOPS, aggregated vRAM, network bandwidth, IOPS.

logical · TFLOPS · vRAM · bandwidth

04 · GOVERN

Enveloped by perimeter

Inside the boundary

The drawer is enveloped by the AnaRack software perimeter. Unified abstraction, zero-trust boundary, global pool visibility. Pipelines can now claim it.

governance_signal=admitted · ledger=claim_live

SDI · SOFTWARE-DEFINED INTEGRATION

Heterogeneous hardware. Integrated platform.

Every third-party drawer that lands in an AnaRack slot walks the same four-step journey through the SDI contract for its class. Vendor-neutral on the hardware side; uniform contract on the platform side. The chassis stays the same — the drawers do not have to.

Detect · Register · Abstract · Govern
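
The four-step journey can be sketched as a small state progression. This is a minimal illustration, not the SDI implementation: the `Drawer` fields, the `CLASS_REGISTRY` set, and the function names are hypothetical, and only the class label `GSL`, the fingerprint `0x9F2A`, and the 8× 80 GB inventory are taken from the onboarding log above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DETECTED = "detect"
    REGISTERED = "register"
    ABSTRACTED = "abstract"
    GOVERNED = "govern"


# Hypothetical class registry; "GSL" is the class shown in the onboarding log.
CLASS_REGISTRY = {"GSL"}


@dataclass
class Drawer:
    fingerprint: str
    capability_class: str
    gpu_count: int
    vram_gb_per_gpu: int
    stage: Stage = Stage.DETECTED
    logical: dict = field(default_factory=dict)


def register(drawer: Drawer) -> None:
    # Admit only drawers whose class is known to the registry.
    if drawer.capability_class not in CLASS_REGISTRY:
        raise ValueError(f"unknown class: {drawer.capability_class}")
    drawer.stage = Stage.REGISTERED


def abstract(drawer: Drawer) -> None:
    # Translate physical inventory into a schedulable logical quantity.
    drawer.logical = {"vram_gb": drawer.gpu_count * drawer.vram_gb_per_gpu}
    drawer.stage = Stage.ABSTRACTED


def govern(drawer: Drawer) -> None:
    # Mark the drawer as inside the governed perimeter and claimable.
    drawer.stage = Stage.GOVERNED


def onboard(drawer: Drawer) -> Drawer:
    # Detection is assumed to have happened when the Drawer object is created.
    for step in (register, abstract, govern):
        step(drawer)
    return drawer


shelf = onboard(Drawer("0x9F2A", "GSL", gpu_count=8, vram_gb_per_gpu=80))
print(shelf.stage.name, shelf.logical)
```

Running the steps in a fixed order keeps an unregistered drawer from ever reaching the governed state, which is the property the four-step description implies.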

CHAPTER 03

The intersection of compute and flow.

Horizontal drawers carry potential energy. Vertical pipelines carry kinetic energy. AnaRack hosts the runtime where the two meet — pipelines agnostic to hardware, routed onto the right drawer on demand.

Horizontal capacity. Vertical pipelines.

If the horizontal drawers represent the system's potential energy, the vertical pipelines represent its kinetic energy. These are the live AI workflows coursing through the chassis — model weights streaming in, inference requests streaming out, telemetry rising back to the control plane.

Pipelines abstract the underlying hardware. AnaRack routes each workflow onto the most efficient governed drawer, on demand, and reshapes the topology as the workload mix shifts. The same chassis serves a thousand different pipelines.

Vertical axes of execution · data pipeline flow and execution pipeline flow, 400 Gbps, latency < 1ms

↓ THE DATA & CONTROL PLANE 400 Gbps · vertical flow

Orchestration flows down.

Model weights, RAG context, telemetry, state synchronization — anything orchestrative moves through the data & control plane. AnaROS Governor lives here; AnaRack ferries its decisions to the drawers.

Model weight loading
RAG context ingestion
Telemetry collection
State synchronization
Policy distribution

↑ THE EXECUTION PLANE 400 Gbps · < 1 ms · vertical flow

Execution runs across.

Live inference requests, dynamic compute scaling, response generation, parallelized batches — anything productive moves through the execution plane. Sub-millisecond round-trip across drawers, governed end-to-end.

Live inference requests
Dynamic compute scaling
Response generation
Parallelized batches
Egress streaming

01 · REQUEST INBOUND

Pipeline enters the platform

A pipeline carrying an inference workload — agentic call, RAG query, fine-tune step — arrives at the AnaRack runtime. It carries intent, not a hardware target.

02 · OPTIMAL ROUTING

Control plane directs flow

AnaRack's control plane reads the registered drawer pool, applies the pipeline's SLO and tenant policy, and directs the flow to the most efficient governed drawer available right now.

03 · EXECUTION

Drawer provisions exact compute

The selected drawer provisions exactly the compute the pipeline needs — no more, no less — and the workflow proceeds. Telemetry returns up the data & control plane.
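
The routing step above can be illustrated with a small selection function. This is a sketch under stated assumptions: the `DrawerState` and `Request` types are hypothetical, and "most efficient" is modeled here as the tightest capacity fit followed by the lowest estimated latency, which is only one plausible reading of the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrawerState:
    name: str
    governed: bool               # admitted into the governed perimeter
    free_gpus: int
    est_latency_ms: float
    tenants_allowed: frozenset   # stand-in for tenant policy


@dataclass(frozen=True)
class Request:
    tenant: str
    gpus_needed: int
    slo_latency_ms: float        # intent, not a hardware target


def route(req: Request, pool: list) -> Optional[DrawerState]:
    # Keep only governed drawers that satisfy policy, capacity, and SLO.
    candidates = [
        d for d in pool
        if d.governed
        and req.tenant in d.tenants_allowed
        and d.free_gpus >= req.gpus_needed
        and d.est_latency_ms <= req.slo_latency_ms
    ]
    if not candidates:
        return None
    # Assumed efficiency metric: least stranded capacity, then lowest latency.
    return min(candidates, key=lambda d: (d.free_gpus - req.gpus_needed, d.est_latency_ms))


pool = [
    DrawerState("shelf-1", True, 8, 0.8, frozenset({"acme"})),
    DrawerState("shelf-2", True, 2, 0.9, frozenset({"acme"})),
    DrawerState("shelf-3", False, 2, 0.5, frozenset({"acme"})),
]
choice = route(Request("acme", gpus_needed=2, slo_latency_ms=1.0), pool)
print(choice.name if choice else "no governed drawer available")
```

The request names a tenant, a size, and an SLO but never a drawer, which mirrors the "intent, not a hardware target" framing.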

CHAPTER 04

The blueprint advantage.

Four dimensions where the AnaRack blueprint changes the equation versus siloed infrastructure — velocity, governance, scaling, and resource utilization.

DIMENSION · TODAY (Siloed Infrastructure) · ANRP BLUEPRINT (AI-Native Runtime Platform)

Deployment Velocity · How long to land new capacity
Today: Weeks · manual provisioning · rack, cable, configure, validate
ANRP blueprint: Minutes · drawer onboarding · SDI auto-discover + register

Governance · Where the perimeter sits
Today: Highly fragmented · per-node security · stitched policy · per-domain tools
ANRP blueprint: Unified perimeter · one bounding box · one control plane · ledger-backed

Scaling · How capacity grows under load
Today: High friction · hardware-bound · forklift upgrades · re-architect
ANRP blueprint: Elastic & dynamic · capability registration · slot more drawers · stack more racks

Resource Utilization · What fraction is actually working
Today: Low · stranded compute · workload-locked silos · idle GPU dollars
ANRP blueprint: High · logical pools · scheduled per-pipeline · multi-tenant

CHAPTER 05

Scalable. Many racks, one logical chassis.

Slot in additional hardware and AnaRack takes care of the rest — modular drawers extend into modular racks, all behaving as one logical chassis. Stackable across N racks per pod with a single contract and one stack of operation.

Many racks. One logical chassis.

Slot in additional hardware and AnaRack takes care of the rest. The SDI layer discovers the new drawer or rack the moment it lands, abstracts its raw silicon into logical TFLOPS, vRAM, and bandwidth, and utilizes it inside the same governed runtime as the original chassis. Cross-rack spines between racks. No re-architecture. No application rewrites. The pipelines never see new hardware — they just get more of it.

Every property of the single-chassis runtime extends cleanly across racks. Pipeline X-Ray, stage telemetry across the fabric, and the resource ledger correlate one stack of operation with full visibility, traceability, and governance — at any scale. Expandable, stackable to N racks per pod. The chassis grows; the contract doesn't.

N racks per pod · cross-rack spine · 1.6 Tbps · scalable capacity 3× shown

Limitless modular scalability · Rack 01, Rack 02, Rack 03 connected by cross-rack spine at 1.6 Tbps · scalable capacity 3×
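
One way to picture "many racks, one logical chassis" is as a sum over per-drawer capabilities. The sketch below is illustrative only: `logical_pool` is a hypothetical helper, and the per-drawer figures are placeholder values rather than quoted specifications.

```python
from collections import Counter


def logical_pool(racks: list) -> dict:
    # Sum every drawer's logical capabilities across all racks into one pool.
    pool = Counter()
    for rack in racks:
        for drawer in rack:
            pool.update(drawer)
    return dict(pool)


# Placeholder drawer inventory for one rack (illustrative numbers).
rack = [{"gpus": 8, "vram_gb": 640}, {"gpus": 8, "vram_gb": 640}]

one = logical_pool([rack])
three = logical_pool([rack, rack, rack])
print(one, three)
```

Because pipelines would claim from the pooled totals rather than from a named rack, adding a rack changes the numbers but not the interface.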

Lifecycles · Refresh · Extend

Decouple CPU and GPU lifecycles. Refresh independently. Extend the life of what you own.

Standard rack envelope, off-the-shelf modules, programmable fabric shelf, heterogeneous host sleds. Each capability moves on its own clock — host count and shelf payload change without rebuilding the other.

01 · CPU CLOCK

CPU refresh no longer forces GPU refresh

Host sleds move on their own cadence. Capability shelves stay put.

02 · MEMORY WALL

GPU memory wall no longer forces a bigger whole server

Grow accelerator memory in the shelf — leave the host layer alone.

03 · HOST COUNT

Host count grows or shrinks without rebuilding the shelf payload class

Right-size the host layer to the workload. Same GPU shelf underneath.

CPU : GPU RATIO PROFILES

1 : 2 host : GPU · CPU-heavy path
1 : 4 host : GPU · balanced mix
1 : 8 host : GPU · GPU-dense path

Set per rack, per workload. MINIMIZE CAPEX. MAXIMIZE utilization.
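
As a rough worked example of the ratio profiles, the snippet below computes how many host sleds a given GPU count would need under each profile. The profile keys and the `hosts_needed` helper are hypothetical names; only the 1:2, 1:4, and 1:8 ratios come from the text.

```python
import math

# GPUs served per host under each profile (ratios from the list above).
RATIO_PROFILES = {"cpu-heavy": 2, "balanced": 4, "gpu-dense": 8}


def hosts_needed(gpu_count: int, profile: str) -> int:
    # Round up so that no GPU is left without a host.
    return math.ceil(gpu_count / RATIO_PROFILES[profile])


for name in RATIO_PROFILES:
    print(name, hosts_needed(16, name))
```

For a 16-GPU payload the host count drops as the profile gets denser, while the shelf payload itself stays the same.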

CHAPTER 06

Workflow continuum across physical, virtual, on-prem, and hybrid. One logical system.

Every property of the single-chassis runtime extends cleanly across racks, on-prem, hybrid, and multi-cloud virtual racks. AnaRack is, first, an abstraction — heterogeneous compute, fabric, memory, storage, governed perimeter. The hardware in Chapters 1–7 is one substrate. Public cloud GPUaaS is another: EC2 instances + GPU instances + VPC + EBS/S3 + IAM policies compose a virtual rack with the same five elements.

AnaRack is the abstraction. Hardware is one substrate; cloud is another.

Every cloud-composed AI environment lines up against the same five elements an on-prem AnaRack chassis exposes: EC2 sleds for control · GPU instances for compute · EBS+S3 for storage · VPC for fabric · IAM/SG for the policy plane. The SDI contract that AnaRack defines holds across both. The stage telemetry across the fabric and AAIF verdict engine attach the same way. Cloud GPUaaS isn't a separate product category — it's a virtual rack you assemble per-workload.

PHYSICAL RACK · ANRP CHASSIS

Heterogeneous hardware drawers.

CPU sleds · 1U/2U
GPU shelf · 4U, heterogeneous
NVMe · hot · warm · cold
SONiC switch · 51.2 Tbps
Governed perimeter · OCP-open

VIRTUAL RACK · CLOUD-COMPOSED

Same five elements, cloud-rendered.

EC2 instances · m6/m7/c7
GPU instances · P5 · G6 · A100
EBS · S3 · object + block
VPC fabric · ENI · TGW · DX
IAM · SG · NACL · policy plane

SEE OPERATOR PAIN The operational story — split-brain governance across physical and virtual racks — is told in Use Cases · Case 09 · Physical + Virtual Rack.

Pipeline continuum · physical and virtual

The workflow continues — from your rack into the cloud.

AnaRack makes physical and virtual racks one substrate for one pipeline. The workflow that runs on your hardware today runs on cloud capacity tomorrow — same SDI contract, same governance, same audit trail.

AnaRack hybrid pipeline continuum — an AI workflow flows from physical on-prem racks into the virtual rack composed on hyperscaler clouds (AWS, Azure, GCP) and external LLM endpoints. The pipeline remains one unit of work, governed end-to-end by AnaROS.

FIGURE · ONE PIPELINE · PHYSICAL TO VIRTUAL · ONE UNIT OF WORK

The pipeline does not stop at the rack edge — it continues into the cloud, observed and governed as one.

CHAPTER 07

Governed enterprise serving. Multi-tenant by contract.

Modular physical capabilities dynamically fueling governed, logical AI pipelines: multi-tenant by contract, cryptographically isolated, single-pane-of-glass observable. Whether the substrate is a physical AnaRack chassis or a virtual rack composed on cloud GPUaaS, the governance contract is the same.

Multi-tenant by contract, not by configuration.

Every enterprise pipeline carries three things the chassis has to honor: a tenant identity, an SLO budget, and a chargeback line. AnaRack makes all three first-class properties of the runtime — workload prioritization happens at the drawer level, not in a side-car queue. Noisy neighbors get rate-gated before they ever touch a paying tenant's SLOs — no downtime cascades from one workload to another — and every SLA is observable, attributable, and chargeback-friendly out of the box.

Multi-tenant pipelines execute behind strict cryptographic separation, with tenant identity carried into every fabric hop — no shared-fate at the GPU, no implicit memory crossing. And every drawer, every pipeline, every hop reports into the same telemetry plane: pipeline X-Ray, stage telemetry across the fabric, and the resource ledger all stitched into one pane of glass for the entire modular runtime.
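
Rate-gating a noisy neighbor can be illustrated with a per-tenant token bucket. This is a generic technique swapped in for illustration, not a description of how AnaRack gates tenants; the `TokenBucket` class and its parameters are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate_per_s: float      # sustained requests per second allowed for the tenant
    capacity: float        # burst allowance
    tokens: float = 0.0
    last_ts: float = 0.0

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at the burst capacity.
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last_ts = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per tenant; timestamps are passed in so the example is deterministic.
buckets = {"tenant-a": TokenBucket(rate_per_s=1.0, capacity=2.0, tokens=2.0)}
decisions = [buckets["tenant-a"].allow(t) for t in (0.0, 0.0, 0.0, 1.0)]
print(decisions)
```

A burst beyond the tenant's allowance is refused at admission, so the excess load never reaches a drawer shared with other tenants.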

Governed Enterprise Serving · Guaranteed SLAs — workload prioritization prevents noisy neighbor disruptions · Total Isolation — multi-tenant pipelines execute with strict cryptographic separation · Unified Visibility — a single pane of glass for all telemetry across the modular runtime · shield enveloping rack with secure pipeline flow

CHAPTER 08

Server architecture for independent scale.

CPU and GPU don't have to scale together. AnaRack's server architecture lets each scale on its own clock, sized to the workload — no stranded silicon, no fixed ratios. The reference chassis below is the first AnaRack-compliant manifestation, available today for design partners.

ARCHITECTURE · CONCURRENCY · SCALE INDEPENDENTLY

Two concurrent fabrics — PCIe compute + Ethernet fabric.

Both paths light simultaneously: GPU compute on PCIe (orange) while the next batch streams on Ethernet (cyan).

Latency is relative — the noise-floor principle.

A data path's latency only matters next to the cost beside it. If prep, compute, storage or disk dwarfs the path, the PCIe edge is noise.

log scale · each step ≈ 10×
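
The noise-floor principle reduces to a simple ratio: the share of end-to-end time spent on the data path. The sketch below uses made-up stage timings purely to show the arithmetic; `path_share` is a hypothetical helper and none of the numbers are measurements.

```python
def path_share(path_us: float, stage_us: dict) -> float:
    # Fraction of total latency attributable to the data path itself.
    total = path_us + sum(stage_us.values())
    return path_us / total


# Illustrative timings in microseconds (assumed, not measured).
stages = {"prep": 400.0, "compute": 5000.0, "storage": 2000.0}
share = path_share(2.0, stages)
print(f"data path share of total latency: {share:.4%}")
```

When the other stages are orders of magnitude larger, even doubling the path latency barely moves the total, which is the sense in which the path is "noise".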

1 · Wins on benchmark

Significant, defensible performance advantage.

Storage-heavy inference (RAG, ML inspection, batch embedding)
High-egress generation (image / video / doc)
Pre-warm / multi-stage pipelined inference
Scale-out across shelves & racks

2 · Wins on architecture

Ties or slightly trails raw — wins the system.

CPU-prep-heavy single-GPU ingress: programmable re-ratio and heterogeneous hosts
Latency-critical small-message loops: disaggregation is pure upside
Mixed-model heterogeneous serving: mixed GPU and per-stage placement
Long-lived production fleets: lifecycle TCO

FRONT FACEPLATE · 4U · GEN5

Reference chassis · programmable accelerator shelf

The first AnaRack-compliant manifestation of independent-scale server architecture. Programmable host:slot partitioning lets one chassis serve many workload ratios — not a fixed-ratio box. BMC integration native to AnaROS; hex-cell filter slides out from the front; SDI onboarding on first power. Available for design partner pilots.

FORM

4U · 19" EIA-310

SLOTS

8× dual-width x16 FHFL + 2× single-width x16 FHFL or 16× SW x8

BANDWIDTH

256 GB/s · PCIe Gen5

POWER

675 W / slot · per accelerator

BMC

IPMI + AnaROS · drawer-native

ONBOARD

SDI · zero-touch

PARTITIONING Programmable PCIe Gen5 switch fabric — host:slot fan-out is configurable, not fixed. Supports 1, 2, or 4 host uplinks (64 / 128 / 256 GB/s aggregate) and per-host ratios set per backplane at boot or runtime via SDI. Partition topology details on request.
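
The uplink figures above imply 64 GB/s per host uplink. The sketch below works through that arithmetic and shows one possible slot fan-out. The even split across hosts is an assumption made for illustration, since the actual partition topology is not described here, and the function names are hypothetical.

```python
GB_PER_S_PER_UPLINK = 64       # implied by 64 / 128 / 256 GB/s for 1 / 2 / 4 uplinks
SUPPORTED_UPLINKS = (1, 2, 4)
DUAL_WIDTH_SLOTS = 8           # from the SLOTS line of the reference chassis


def aggregate_gb_per_s(uplinks: int) -> int:
    if uplinks not in SUPPORTED_UPLINKS:
        raise ValueError(f"unsupported uplink count: {uplinks}")
    return uplinks * GB_PER_S_PER_UPLINK


def even_partition(hosts: int, slots: int = DUAL_WIDTH_SLOTS) -> dict:
    # Assumption: slots are divided evenly; real partitions may be asymmetric.
    if hosts not in SUPPORTED_UPLINKS:
        raise ValueError(f"unsupported host count: {hosts}")
    per_host = slots // hosts
    return {
        f"host{h}": list(range(h * per_host, (h + 1) * per_host))
        for h in range(hosts)
    }


for n in SUPPORTED_UPLINKS:
    print(n, aggregate_gb_per_s(n), even_partition(n))
```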

NEXT Own-engineered chassis on roadmap — deeper telemetry, faceplate · filter · drawer ID optimized around partner duty cycles. Same programmable promise. Same standard interfaces.

CHAPTER 09

A new memory tier — memory-bound stages stay compute-bound.

Modern AI pipelines spend more time moving memory than multiplying it. AnaRack inserts a memory tier between HBM and storage — close enough to feed accelerators at near-memory speed, deep enough to hold the working sets that won't fit in HBM. The cliff between accelerator memory and far storage flattens.

Where memory-bound pipelines stall — and what changes.

When the working set outgrows HBM, the next stop today is far storage — orders of magnitude slower. GPUs sit idle waiting for the data they're about to need. P99 collapses, and a faster GPU doesn't fix it.

AnaRack's new memory tier sits between HBM and storage. AnaROS pre-stages into it ahead of the GPU's request — MoE expert weights, KV pages, embedding tables, retrieval state — so the stages that historically dominate end-to-end latency stop being the bottleneck.

No new memory API. No application rewrites. Your CUDA stack, your K8s, your models — unchanged.
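
The pre-staging idea can be pictured as a three-level lookup in which items expected to be needed soon are copied from far storage into the middle tier ahead of the request. This is a toy model under stated assumptions: the `TieredStore` class, its method names, and the item keys are hypothetical, and capacity limits and eviction are deliberately left out.

```python
class TieredStore:
    def __init__(self, hbm: set, far_storage: set):
        self.hbm = set(hbm)
        self.memory_tier = set()          # the tier between HBM and storage
        self.far_storage = set(far_storage)

    def prestage(self, keys) -> None:
        # Copy soon-to-be-needed items from far storage into the memory tier.
        for key in keys:
            if key in self.far_storage and key not in self.hbm:
                self.memory_tier.add(key)

    def locate(self, key: str) -> str:
        # Return the nearest tier that currently holds the item.
        if key in self.hbm:
            return "hbm"
        if key in self.memory_tier:
            return "memory_tier"
        if key in self.far_storage:
            return "far_storage"
        raise KeyError(key)


store = TieredStore(hbm={"layer0"}, far_storage={"expert7", "kv_page_3"})
before = store.locate("expert7")
store.prestage(["expert7"])
after = store.locate("expert7")
print(before, "->", after)
```

The caller's lookup is the same before and after pre-staging; only the tier that answers changes, which matches the claim that no new memory API is required.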

01

An order of magnitude closer to compute

Than the far-storage tier that catches HBM spillover today.

02

Lift on the stages that matter

Expert swap, KV cache miss, retrieval — the memory-bound stages that drive service recovery time and SLO misses, not the matmul that was already fast.

03

Bigger models. Deeper caches. Larger indexes.

Working sets that previously forced spillover stay close to compute.

DESIGN PARTNER · DEEPER ARCHITECTURE WALK-THROUGH ON REQUEST

CHAPTER 10

Anavec Ethernet switches. One stack. Every tier.

Same chassis story, different layer. A family of switches — spine, aggregation, ToR, and edge — built on a curated mix of silicon, current and next generation. SONiC as the network OS for the data plane; AnaROS-native control plane for SDI onboarding, stage telemetry, and AAIF verdicts. One vendor of record from silicon to SLO, across every tier.

2U · 51.2 T · SHOWN

64-port OSFP800 fabric switch · SONiC + AnaROS-native.

The top of the family — a 2U Ethernet switch at 51.2 Tbps non-blocking, available today, anchors the rack spine for scale-up within the rack and scale-out across the pod. SONiC as the open, hardened network OS for the data plane; AnaROS-native control plane for SDI onboarding, switch-port-to-pipeline correlation (POFC), and AAIF verdicts. The rest of the portfolio — aggregation, ToR, and edge — ships with SONiC + AnaROS stack, and tracks the next silicon generation on the same compatibility envelope. One vendor of record from silicon to SLO, across every tier. Available for design partner pilots.

FORM

2U · 19" EIA-310

PORTS

64× OSFP800 or 256× 200GE breakout

BANDWIDTH

51.2 Tbps · non-blocking

SILICON

TH5 · SP4 · TH6 / SP6 on roadmap

NETWORK OS

SONiC + AnaROS · data + control plane

INTEGRATION

SDI + POFC · AnaROS-native

FAMILY

SPINE · 51.2 T · 102.4 T
AGGREGATION · 32× 800G · 25.6 T
ToR · 48× 100/400G
EDGE · 48× 25/100G
All Anavec-branded. Curated silicon across Standard networking. Same SONiC + AnaROS-native stack across every tier.
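
The port counts and bandwidth figures above are consistent with simple multiplication, which the following sketch checks. `aggregate_tbps` is a hypothetical helper, and port speeds are taken at their nominal rates.

```python
def aggregate_tbps(ports: int, gbps_per_port: int) -> float:
    # Nominal aggregate bandwidth in Tbps for a given port configuration.
    return ports * gbps_per_port / 1000


native = aggregate_tbps(64, 800)        # 64× OSFP800
breakout = aggregate_tbps(256, 200)     # 256× 200GE breakout
aggregation = aggregate_tbps(32, 800)   # 32× 800G aggregation tier

print(native, breakout, aggregation)
```

Both the native and breakout configurations of the spine switch land on the same aggregate, so breakout trades port speed for port count without changing total capacity.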

NEXT Own-curated SONiC distribution + deeper AAIF integration: switch-port-level verdicts, evidence, audit. Same standard interfaces, same single point of accountability, applied uniformly as future generations land.

STARTER KIT · DAY-1 SKU

The 9U entry-level rack.

The PCIe shelf (Ch 08) and the 32×400G Ethernet switch (Ch 10) ship together as the minimum Anavec rack: four chassis, AnaROS pre-loaded, the smallest path into the full platform.

BILL OF MATERIALS Mgmt switch · Ethernet switch (32×400G) · CPU server · PCIe expansion shelf, wired with 1G OOB, 2×200G + 100G Ethernet, and PCIe Gen5 CDFP. AnaROS pre-loaded; pilot SKU available today.

The heterogeneous Rack-as-a-System for AI workloads.

Loosely-coupled enterprise AI.

The future of enterprise AI infrastructure.