AnaROS · The Rack Operating System

One runtime. For the rack — and the pipeline beyond it.

AnaROS turns a rack full of heterogeneous CPUs, GPUs, fabric, and storage into a single governed system — and extends the same operating contract across racks, clouds, and providers as the pipeline crosses them. Four lenses, one storyline — Visibility, Traceability, Governance, and Workload Placement — built on multi-tier telemetry that goes from L4 application workflow down to L1 fabric, and follows the pipeline wherever it lands.

01 · VISIBILITY

The vertical stack

See L4 → L3 → L2 → L1 at once. Workflow, pipeline, fabric, racks — one picture.

02 · TRACEABILITY

The horizontal pipeline

Walk the pipeline X-Ray. Intra-node, inter-node, fabric. Victim vs. root cause.

03 · GOVERNANCE

Live resource map

Every GPU, every model, every pipeline accounted for — ledger vs. reality.

04 · PLACEMENT

Workloads on the right silicon

Stages routed to the cheapest GPU class that holds their SLO. Discover to Govern.

CHAPTER 01

Visibility — the vertical stack.

Why four layers?

A single AI call traverses your application code → runtime → GPU → fabric → storage in milliseconds. When p99 collapses, the symptom you see at the top — a slow agent, a stuck retrieval, a timeout — usually has its root cause four layers down: a NIC buffer overflow, a NUMA miss, a noisy neighbor in another tenant. The dashboards you have today don't connect: K8s knows the pods, CUDA knows the kernels, the network team owns the spine — none of them see across boundaries.

AnaROS gives the rack one vertical view.

Four cooperating layers — L4 workflow, L3 pipeline X-ray, L2 logical fabric, L1 physical rack — visible from one operator surface. Trace any workflow top-to-bottom in a single pane, and the symptom and the root cause finally show up on the same screen. Scroll to drill into each layer.

L4 Application Workflow

The workflow your business actually runs.

Agentic flows, MoE applications, inspection pipelines — expressed as a controlled workflow graph with steps, branches, loops, and SLOs. This is the surface CIOs and product owners reason about.

Unit of work: the pipeline, not the GPU box
Inputs: intent, data sources, SLOs, tenant scope
What you see: 6 nodes · branch + merge · per-node host
Why it matters: outcomes are owned by the workflow

L3 Pipeline Infrastructure

One pipeline. Heterogeneous rack. Mixed GPU classes. Governed.

AnaROS decomposes each workflow into four stages (Ingest, Preparation, Execution, and Persistence) spanning network, compute, memory, and storage, and routes each stage to the cheapest GPU class that holds its SLO. Heavy CNN/ViT inference doesn't mean every stage needs an H200.

Stages: Ingest · Prep · Execution · Persistence
Routing: per-stage GPU class (T4 · L4 · L40S · H200)
Telemetry: p95/p99, drift, queue depth, chargeback
Why it matters: 60-80% of pipeline compute runs outside the H200 stage

L2 Logical Fabric & Resource Map

GPU-to-GPU logic flows, dynamically resourced.

The Anavec logical fabric resolves source/destination, bandwidth, and zero-copy paths between GPUs, the CPU sled, the staging tier, and storage. The resource map tells you what's allocated, what's draining, and what's free — at all times.

Heterogeneous ratios: 1:2 · 1:4 · 1:8 host-to-GPU
Memory staging fabric: warm pre-staging, no GPU wait
Resource map: CPU sleds, GPU shelves, memory tier
Why it matters: data is ready before the GPU asks

L1 Physical AI Data Center

A heterogeneous fabric — not a vendor SKU.

Standard rack envelope. Off-the-shelf modules. Programmable PCIe fabric. CPU and GPU refresh on independent clocks. The POFC tree resolves every pipeline edge to a physical switch port.

CPU sleds: x86 / ARM, 1U / 2U, refresh independently
GPU shelf: 10-slot class, programmable fabric, multi-vendor
Fabric: 2 spines · 2 leaves · 11 edges [Example Illustration only]
Storage: object · NVMe · OLAP — kept off the GPU node

L4 Application Workflow · inspection-classify

STEP	NODE	TYPE	HOST
1	Image Ingest	When triggered · 0ms	{anavec3:NIC}
2	Tile / Decode	HTTP call · 12ms	{anavec3:CPU}
3	Route Decision	Stage router	{anavec3:n8n}
4	CNN Classify	HTTP call	{anavec2:L40S}
5	ViT Verify	HTTP call	{anavec3:L4}
6	Merge Verdict	JSON format	{anavec3:CPU}
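
As a rough illustration of the L4 idea that the unit of work is a workflow graph rather than a GPU box, the six nodes above can be modeled as a small directed graph. This is only a sketch: the node names and hosts are taken from the panel, but the edge wiring is an assumption (the panel says "branch + merge" without listing edges), and `Node`, `PREDECESSORS`, `execution_order`, and `hosts_used` are hypothetical names, not AnaROS APIs.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Node:
    step: int
    name: str
    host: str  # "<server>:<resource>" as shown on the L4 panel


# Node names and hosts come from the inspection-classify panel.
NODES = [
    Node(1, "Image Ingest", "anavec3:NIC"),
    Node(2, "Tile / Decode", "anavec3:CPU"),
    Node(3, "Route Decision", "anavec3:n8n"),
    Node(4, "CNN Classify", "anavec2:L40S"),
    Node(5, "ViT Verify", "anavec3:L4"),
    Node(6, "Merge Verdict", "anavec3:CPU"),
]

# ASSUMPTION: the panel only says "branch + merge"; this wiring
# (step 3 fans out to 4 and 5, both merge into 6) is illustrative.
PREDECESSORS = {1: set(), 2: {1}, 3: {2}, 4: {3}, 5: {3}, 6: {4, 5}}


def execution_order() -> list:
    """Return one valid step order that respects every edge."""
    return list(TopologicalSorter(PREDECESSORS).static_order())


def hosts_used() -> set:
    """Servers the workflow touches, ignoring the per-node resource."""
    return {n.host.split(":")[0] for n in NODES}


if __name__ == "__main__":
    print(execution_order())
    print(sorted(hosts_used()))
```

Under the assumed wiring, steps 4 and 5 may run in either order (or in parallel), which is why the sketch only promises a valid order rather than a unique one.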
L3 Pipeline X-Ray · b10-resnet-classify
PIPELINE EXPLORER · Inter-Stage · live
GET /api/v1/collector-intelligence
HOST · anavec1-0  ↔ intra-node  HOST · anavec1-0  ↗ inter-node  HOST · anavec2-0  ↗ inter-node  HOST · anavec1-0
INGRESS · CPU-NET · cleared · susp 23% · EBPF active · HTTP p99 8761ms · CTRL
IPC · 0.3 ms · kernel pipe
PREPARATION · CPU-HEAVY · cleared · susp 82% · EBPF active · HTTP p99 8761ms · CTRL
FABRIC · 8694 ms · leaf1-0 · Eth80-88
EXECUTION · GPU · victim · susp 49% · EBPF active · GPU SM 1% · CTRL
FABRIC · 2.1 ms · leaf2-0 · Eth88-80
PERSISTENCE · STORAGE-OPT · root cause · susp 98% · EBPF active · KERNEL Flows — · CTRL
L2 Logical Fabric · network · bandwidth
cpu-sled-01 · x86 · 64c
cpu-sled-02 · arm · 96c
amsf-tier · 1.2 TB warm
nvme-pool · 48 TB
Anavec Logical Fabric
cpu→fabric · 14.2 GB/s
fabric→gpu · 38.6 GB/s
amsf prefetch · 22.4 GB/s
gpu-h200-a · 94% busy
gpu-l40s-b · 55% busy
gpu-l4-c · 78% busy
gpu-t4-d · 41% busy
L1 Fabric Topology · spine · leaf · physical racks + virtual racks across clouds


CHAPTER 02

Traceability — the horizontal pipeline.

Where the vertical view tells you which layer, the horizontal X-Ray tells you where in the journey. Scroll to walk a real production pipeline — and find the root cause behind the obvious victim.

Pipeline X-Ray v2 · b10-resnet-classify · Severity: OK · SLO RED · production-team · anaros-b10 · conf 94%

X-Ray · RateGate · Governor · Updated 5s ago

PIPELINE EXPLORER Inter-Stage · AAIF / collector-intelligence · live

GET /api/v1/collector-intelligence

HOST · anavec1-0  ↔ intra-node  HOST · anavec1-0  ↗ inter-node  HOST · anavec2-0  ↗ inter-node  HOST · anavec1-0

INGRESS · CPU-NET · cleared · susp 23% · CNTR1 · EBPF active · HTTP p99 8761ms · CTRL
IPC · 0.3 ms · kernel pipe
PREPARATION · CPU-HEAVY · cleared · susp 82% · CNTR1 · EBPF active · HTTP p99 8761ms · CTRL
FABRIC · 8694 ms · leaf1-0 · Eth80→88
EXECUTION · GPU · victim · susp 49% · CNTR1 · EBPF active · GPU SM 1% · CTRL
FABRIC · 2.1 ms · leaf2-0 · Eth88→80
PERSISTENCE · STORAGE-OPT · root cause · susp 98% · CNTR1 · EBPF active · KERNEL Flows — · CTRL

E2E JITTER WATERFALL

Total 19.0ms · measured 4% · unknown 96% (telemetry-gap)

ingress	0.0ms	100%
preparation	6.0ms	51%
staging	5.0ms	38%
execution	0.0ms	100%
post_processing	5.0ms	67%
persistence	6.0ms	59%

01 Visibility · the whole pipeline

The pipeline crosses many boundaries.

This workflow runs on anavec1 (GPU) for ingress and preparation, hops across the fabric to anavec2 (GPU) for execution, then hops back to anavec1 for storage. Pipeline X-Ray sees all of it: intra-node IPC, inter-node fabric, leaf switches, every stage.

Boundary kinds intra-node IPC × 1, inter-node fabric × 2
Hosts anavec1-0, anavec2-0
Switches leaf1-0, leaf2-0
Severity OK / SLO RED — symptom and cause diverge

02 Intra-node · inter-process

Where the pipeline hops inside a node.

Ingress and Preparation both live on anavec1 — but they're separate processes in separate containers, talking through a kernel pipe. Most observability stacks lose visibility here. The X-Ray traces it: pipe-buffer occupancy, context switches, container CPU contention.

Kind intra-node IPC · kernel pipe buf
Latency 0.3 ms · throughput line-rate
Context switches 4.2k/s · sched p99 12 µs
Container CPU contention yellow on preparation

03 Inter-node · fabric hop

Where the pipeline leaves the host.

Preparation on anavec1 hands off to Execution on anavec2 across a real fabric edge. Without POFC, this hop goes dark. With POFC, the X-Ray pins it to a specific switch port pair on leaf1-0 and surfaces drift, loss, and ECMP divergence right there.

Logical edge preparation → execution
Physical path anavec1:enp1s0f0np0 → leaf1-0:Eth80 → Eth88 → anavec2
Fabric class ethernet-scale-out · VLAN 10
Observed 0.00 Gbps · loss 0 ppm · p99 8694 ms (upstream)

04 POFC · pipeline over fabric

One view that ties pipelines to fabric paths.

Every pipeline edge reconciled with its physical path through leaf/spine. ECMP divergence becomes legible. Ask "where does this workflow live on this fabric?" — and get a topology answer, not a Grafana stack.

Topology 2 spines · 2 leaves · 5 hosts · 11 edges
Flows 12 active pipeline edges on this fabric
Highlighted embedding-amsf-pipeline · anavec3 → anavec2
ECMP divergence tracked per (src, dst, pipeline)

05 Foundation · multi-tier telemetry

None of this works without depth.

Seven telemetry domains (Node, Kernel, Process, Net L2/L3, TCP, HTTP/L7, gRPC/H2) crossed with six collector layers (SDI/SSH, Prometheus, cAdvisor, eBPF/Beyla, SDK Py/C++, OTel/OTLP). Each phase adds signals and unlocks new RateGate capabilities.

Telemetry layers 6 active on this pipeline
End-to-end traces enabled (OTel/OTLP)
RateGate unlocked node · switch-port · container · TCP · HTTP · gRPC · GPU-kernel
Where it lives AnaROS Pipeline · Multi-tier Telemetry tier

06 Governance · the switch is a first-class object

Leaf1-0, in its own words.

Open the switch and you see POFC signals: resolved endpoints, reconstructions, drifts, violations, latest tick. Verdicts and pending mitigations live next to the switch they apply to, not in a separate ticketing system.

Resolved endpoints 2 / 57 — including the affected pipeline
POFC reconstructions 2 · hops on switch 2 · drifts 0
Latest tick #24282 · 4s ago · ring depth 100
Governance posture 0 pending · 0 pending verdicts

07 End-to-end · victim → root cause

Execution is the symptom. Persistence is the cause.

In this example, GPU SM at 1% looks like a GPU problem, but the X-Ray correlates intra-node, inter-node, and fabric signals across all stages and lands on Persistence (suspicion 98%). Storage IOPS contention at Persistence pushes backpressure upstream through Execution, all the way to Ingress.

Victim execution · GPU SM 1% · suspicion 49%
Root cause persistence · suspicion 98% · 59% of tail
Blast radius 1 pipeline, 2 hosts, 2 leaf switches
Auto-action Governor rate-gate · tenant notified
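
To make the victim-versus-root-cause reasoning concrete, here is a deliberately simplified sketch. It uses the stage order and suspicion scores shown for b10-resnet-classify, assumes the highest-suspicion stage is the root cause, and then lists the upstream stages that backpressure from it could reach. The real X-Ray correlation logic is not described on this page; `Stage`, `root_cause`, and `backpressure_path` are hypothetical names for illustration only.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    suspicion: float  # 0..1, as reported on the X-Ray panel


# Stage order and suspicion scores are the ones shown for b10-resnet-classify.
PIPELINE = [
    Stage("ingress", 0.23),
    Stage("preparation", 0.82),
    Stage("execution", 0.49),
    Stage("persistence", 0.98),
]


def root_cause(stages):
    """ASSUMPTION: treat the highest-suspicion stage as the root cause."""
    return max(stages, key=lambda s: s.suspicion)


def backpressure_path(stages, cause):
    """Stages upstream of the cause, nearest first, that a stall can reach."""
    idx = stages.index(cause)
    return [s.name for s in reversed(stages[:idx])]


if __name__ == "__main__":
    cause = root_cause(PIPELINE)
    print(cause.name)
    print(backpressure_path(PIPELINE, cause))
```

Note that a pure max-suspicion rule does not explain why Preparation (82%) is marked cleared while Execution (49%) is the victim, so the production verdicts evidently draw on more signals than this sketch does.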

stage detail · live · poll 5s

Ingress · CPU-NET

PERFORMANCE
p99 latency	8761.0 ms
p95 latency	—
throughput	0.2 rps
queue depth	—

METRICS
cntr
ebpf	active
http	p99 8761ms
kernel	Flows 115 MB/s
net	Rx 88 MB/s
node	CPU 0%
process	Container CPU —
throughput	122 req/s

Stage is cleared. Suspicion is low: the latency here is backpressure from downstream stages surfacing at ingress, not a problem with this stage.

CTRL · open in collector

Fabric POFC View

rack-keyed · pipeline-over-fabric synthesis

2. FLOWS ON THIS FABRIC

PIPELINE	SRC → DST	HOPS	CLASS
embedding-amsf-pipeline	anavec3 → anavec2	leaf2-0	scale-out
embedding-baseline	anavec3 → anavec2	leaf2-0	scale-out
langgraph-semantic-moe	anavec3 → anavec1	leaf2-0 → spine1-0 → leaf1-0	scale-up
langgraph-semantic-moe	anavec1 → anavec2	leaf1-0 → spine1-0 → leaf2-0	scale-up
b10-resnet-classify	anavec1 → anavec2	leaf1-0 → spine2-0 → leaf2-0	scale-up
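
One way to read the flows table is to ask which host pairs are served by more than one physical path. The sketch below groups the rows above by (src, dst) and reports pairs whose pipelines take different hop sequences. It is an illustration under assumptions: the page says divergence is tracked per (src, dst, pipeline) but does not define the check, and `FLOWS` and `divergent_pairs` are hypothetical names.

```python
from collections import defaultdict

# Rows copied from the "Flows on this fabric" table: (pipeline, src, dst, hops).
FLOWS = [
    ("embedding-amsf-pipeline", "anavec3", "anavec2", ("leaf2-0",)),
    ("embedding-baseline", "anavec3", "anavec2", ("leaf2-0",)),
    ("langgraph-semantic-moe", "anavec3", "anavec1",
     ("leaf2-0", "spine1-0", "leaf1-0")),
    ("langgraph-semantic-moe", "anavec1", "anavec2",
     ("leaf1-0", "spine1-0", "leaf2-0")),
    ("b10-resnet-classify", "anavec1", "anavec2",
     ("leaf1-0", "spine2-0", "leaf2-0")),
]


def divergent_pairs(flows):
    """Map each (src, dst) that uses more than one hop path to its paths."""
    paths = defaultdict(dict)
    for pipeline, src, dst, hops in flows:
        paths[(src, dst)].setdefault(hops, []).append(pipeline)
    return {pair: dict(by_path) for pair, by_path in paths.items()
            if len(by_path) > 1}


if __name__ == "__main__":
    for pair, by_path in divergent_pairs(FLOWS).items():
        for hops, pipelines in by_path.items():
            print(pair, " -> ".join(hops), pipelines)
```

On the rows above, only anavec1 → anavec2 shows two paths: langgraph-semantic-moe crosses spine1-0 while b10-resnet-classify crosses spine2-0.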

Multi-Tier Collector Intelligence

progressive telemetry depth · root-cause readiness

TELEMETRY LAYERS SDI/SSH Prometheus cAdvisor eBPF/Beyla SDK Py/C++ OTel/OTLP

NODE · CPU GPU MEM
KERNEL · syscall eBPF
PROCESS · container
NET L2/L3 · Rx-Tx-Drop
TCP / PORT · conn RTT
HTTP / L7 · endpoint lat
gRPC / H2 · stream

System Health: CPU 49% · GPU SM 1% · ctx switches · sched lat · Container CPU · OOM events
Memory: total/avail · GPU VRAM · page faults · TLB miss · container RSS
Storage: disk IOPS · write lat p99 · block I/O lat · blkio stall · container blkio
Net Fabric: NIC IRQ rate · net Rx/Tx · port Rx/Tx · drop+ECN
TCP / Conn: TCP retrans · socket backlog · RTT p50/p99
HTTP / L7: eBPF intercept · reqs/endpoint · p99 latency · gRPC status
GPU / Accel: GPU SM util · VRAM used · CUDA events · PCIe BW · tokens/s · batch size
Queue / BP: kernel pipe buf · net queue · TCP send buf · app queue depth · gRPC flow window

Switch detail · leaf1-0

POFC observation + governance · last tick 4s ago

Identity
SID	leaf1-0
Mgmt IP	10.110.3.25:8080
Ports	67
State	Active

POFC Signals
Resolved endpoints	2 / 57
Reconstructions	2
Hops on switch	2
Drifts	0
Latest tick	#24282 · 4s ago

Governance Posture
PENDING · HISTORICAL · PENDING VERDICTS

RESOLVED ENDPOINTS · 2 of 2

PIPELINE	EDGE	PORTS	BOUND
langgraph-semantic-moe	preparation → execution	leaf1-0:Eth80 → leaf2-0:Eth80	✓ bound
langgraph-semantic-moe	post_proc → persistence	leaf2-0:Eth88 → leaf1-0:Eth80	✓ bound

CHAPTER 03

Governance — live resource map.

Every GPU, every loaded model, every active pipeline — visible and attributable. Ledger contract versus reality, in real time.

B14 Resnet Classify

HOT 8 WARM 4 COLD 14

LIVE

RESOURCE MAP · TWEAKS

Cluster topology: Racks 1 · GPU servers / rack 3 · GPUs / server 1 · Spine switches 0 · Leaf switches 1
Load: active pipelines 49 · Admission queue 8
Overlay: Ledger contract (ghost) · Contention badges · Node utilization fill · Flow particles · Flow speed 1.2×

LEAF · switch-sonic-as4625 · SONiC · 54×1G · MGMT

anavec1 · 192.168.9.155 · DGX
CPU HEAD NODE: CPU0 (20c) · NVMe0 · NVMe1 · NIC0 · NIC1 · BMC
GPU TRAY · 1×GB10
GPU #0 · GB10 · 119 GB · 36%
vram 42.85 / 119 GB · 36%
2 models loaded · GPU 18% · SM 12%
MEMORY · 119 GB unified · 42.85 / 119 GB · 36%
LOADED MODELS · 2: WARM Qwen/Qwen3-8B · COLD TinyLlama/Chat-v1.0
PIPELINES · 12 active
healthy · ssh+nvidia-smi · 5s ago

anavec2 · 192.168.9.156 · DGX
CPU HEAD NODE: CPU0 (20c) · NVMe0 · NVMe1 · NIC0 · NIC1 · BMC
GPU TRAY · 1×GB10
GPU #0 · GB10 · 119 GB · 55%
vram 65.75 / 119 GB · 55%
3 models loaded · GPU 28% · SM 22%
MEMORY · 119 GB unified · 65.7 / 119 GB · 55%
LOADED MODELS · 3: WARM Qwen/Qwen3-32B · COLD Qwen/Qwen2.5-1.5B-Instruct · COLD TinyLlama/Chat-v1.0
PIPELINES · 18 active
healthy · ssh+nvidia-smi · 5s ago

anavec3 · 192.168.9.157 · DGX · selected
CPU HEAD NODE: CPU0 (20c) · NVMe0 · NVMe1 · NIC0 · NIC1 · BMC
GPU TRAY · 1×GB10 · executing B14
GPU #0 · GB10 · 119 GB · 30%
vram 36.18 / 119 GB · 30%
1 model loaded · GPU 2% · SM 1%
MEMORY · 119 GB unified · 36.2 / 119 GB · 30%
LOADED MODELS · 1: HOT microsoft/resnet-50 (B14 active)
PIPELINES · 19 active
healthy · ssh+nvidia-smi · 5s ago

GPU · #0 · GB10
server-anavec3 · gpu-anavec3-0

GPU UTIL	30%
VRAM UTIL	30%
MEM UTIL

FABRIC BANDWIDTH
Scale-Up	ethernet · intra-rack	0 / 1 Gbps
Scale-Out	ethernet · cross-rack	0 / 1 Gbps

MODELS · 1 LOADED
HOT microsoft/resnet-50 · B14 · 182 MB · loaded 14h ago

PIPELINES · 19
B14 Resnet Classify ▸ exec
+ Adaptive Rag App, B10-B13, B15-B16, …

B14 · LEDGER vs REALITY · SYNTHESIZED

METRIC	LEDGER	ACTUAL
GPU-hrs	9.5	0.078
Bandwidth	0.31 Gbps	0.248 Gbps

contract gap · synthesized until ledger_claims × xray
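
The ledger-versus-reality comparison boils down to simple arithmetic on each contracted metric. The following sketch computes the fraction of the ledger amount actually consumed, using the B14 figures from the panel. The function and dictionary names are hypothetical, and how AnaROS itself derives or weights the contract gap is not stated here.

```python
def utilization(ledger: float, actual: float) -> float:
    """Fraction of the contracted amount that was actually consumed."""
    if ledger <= 0:
        raise ValueError("ledger value must be positive")
    return actual / ledger


# Values from the B14 ledger-vs-reality panel: metric -> (ledger, actual).
CONTRACT = {
    "GPU-hrs": (9.5, 0.078),
    "Bandwidth (Gbps)": (0.31, 0.248),
}

if __name__ == "__main__":
    for metric, (ledger, actual) in CONTRACT.items():
        used = utilization(ledger, actual)
        gap = ledger - actual
        print(f"{metric}: {used:.1%} of ledger used, gap {gap:.3f}")
```

With these figures, B14 consumed roughly 0.8% of its contracted GPU-hours but about 80% of its contracted bandwidth, which is the kind of mismatch the ghost overlay is meant to expose.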

PATH · 3 HOPS
LEAF · switch-sonic-as4625 · ingress
SERVER · anavec3 · compute
GPU #0 · GB10 · execute

CHAPTER 04

Placement — workloads on the right silicon.

Heavy CNN/ViT inference, MoE expert layers, and agentic pipeline stages don't all need an H200. AnaROS routes each stage to the cheapest GPU class that holds its SLO — heterogeneous by design.

One inspection pipeline, four stages, four different GPU classes — and one governance plane on top.
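
The routing rule stated above, cheapest GPU class that holds the SLO, can be sketched in a few lines. Everything numeric here is a placeholder: the relative costs and the per-class p99 latencies are hypothetical, and `GpuClass`, `CLASSES`, and `place` are illustrative names rather than AnaROS interfaces. Only the four class names come from this page.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GpuClass:
    name: str
    cost_per_hour: float  # HYPOTHETICAL relative cost, not a real price


# The class names come from the page; the costs are made-up placeholders.
CLASSES = [
    GpuClass("T4", 1.0),
    GpuClass("L4", 2.0),
    GpuClass("L40S", 5.0),
    GpuClass("H200", 20.0),
]


def place(stage_p99_ms: Dict[str, float], slo_ms: float) -> Optional[str]:
    """Pick the cheapest class whose profiled p99 meets the stage SLO.

    stage_p99_ms maps GPU class name -> profiled p99 latency for one stage.
    Returns None when no profiled class holds the SLO.
    """
    candidates = [
        c for c in CLASSES
        if c.name in stage_p99_ms and stage_p99_ms[c.name] <= slo_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.cost_per_hour).name


if __name__ == "__main__":
    # HYPOTHETICAL latency profile for one heavy inference stage.
    profile = {"T4": 400.0, "L4": 180.0, "L40S": 70.0, "H200": 30.0}
    print(place(profile, slo_ms=50.0))   # only H200 is fast enough
    print(place(profile, slo_ms=200.0))  # L4 is the cheapest that fits
```

The same stage lands on different silicon as its SLO loosens, which is the point of routing per stage instead of per pipeline.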

INSPECTION PIPELINE · UNIT OF WORK

amsf:inspection-b6-amsf · production-team

STAGE	NAME	GPU CLASS	WORKLOAD	HOST
1	Image Ingest	L4 · PCIe	NIC + storage bound	anavec1-0
2	Tile / Pre-process	L40S · PCIe	normalize, augment, batch	anavec1-0
3	Defect Inference	H200 · SXM	heavy CNN / ViT	anavec2-0
4	Classify / Dispose	T4 · PCIe	small classifier, final decision	anavec3-0

AnaROS™ — Pipeline Governance Layer — SLOs · Isolation · Routing · Chargeback PIPELINE-AWARE

WORKFLOW PLACEMENT JOURNEY

workflow: amsf:inspection-b6-amsf

01 · DISCOVER: 5 containers declared · source workflow_intent.yaml
02 · RECONCILE: declared shape bound · strategy per_container
03 · PLACE: stages routed to L4 · L40S · H200 · T4 · advisory plan ready
04 · DEPLOY: deploy state pending · Terraform walkthrough on apply
05 · GOVERN: SLOs, isolation, and chargeback are wired in after deploy

RESOURCE MAP · LIVE SNAPSHOT

advisory only — no commit until Apply

HOST	MEMORY	GPU	FREE	CAPACITY / USED	MODELS
anavec1	unified	gpu-anavec1-0	111.1 GB free	cap 128 · used 16.9 (87% free)	8 models loaded
anavec2	unified	gpu-anavec2-0	106.9 GB free	cap 128 · used 21.1 (84% free)	2 models loaded
anavec3	unified	gpu-anavec3-0	123.3 GB free	cap 128 · used 4.7 (96% free)	2 models loaded
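
The snapshot figures above are internally consistent, and an advisory planner could use them to shortlist GPUs with enough free memory for a stage. The sketch below recomputes free capacity from the cap and used values shown, then ranks candidates. The ranking rule (most free memory first) is an assumption for illustration, and `GpuSnapshot` and `fits` are hypothetical names, not part of the AnaROS console.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSnapshot:
    gpu: str
    cap_gb: float
    used_gb: float

    @property
    def free_gb(self) -> float:
        return round(self.cap_gb - self.used_gb, 1)

    @property
    def free_pct(self) -> int:
        return round(100 * (self.cap_gb - self.used_gb) / self.cap_gb)


# Capacity and usage figures from the live snapshot.
SNAPSHOT = [
    GpuSnapshot("gpu-anavec1-0", 128, 16.9),
    GpuSnapshot("gpu-anavec2-0", 128, 21.1),
    GpuSnapshot("gpu-anavec3-0", 128, 4.7),
]


def fits(snapshot, needed_gb):
    """ASSUMPTION: shortlist GPUs with enough free memory, most free first."""
    ok = [g for g in snapshot if g.free_gb >= needed_gb]
    return [g.gpu for g in sorted(ok, key=lambda g: g.free_gb, reverse=True)]


if __name__ == "__main__":
    for g in SNAPSHOT:
        print(g.gpu, g.free_gb, f"{g.free_pct}% free")
    print(fits(SNAPSHOT, needed_gb=110))
```

Because the plan is advisory, a shortlist like this would only become a commitment once the operator applies it.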

⚡ Apply this plan · Terraform Deploy walkthrough

One rack operating system. One runtime. One storyline.

Visibility → Traceability → Governance → Placement, all on the same governed pipeline. The same AnaROS console runs on your homelab today.

Next step · Instrument · Profile · Pilot

Request a briefing · hello@anavec.ai