We talked about the theory. Now here's the tool. govcai is an open source CLI wrapper that makes govc output LLM-ready — with up to 4,760x smaller responses, built-in safety gates, and zero changes to your vCenter workflow.
TL;DR — govcai wraps VMware's govc CLI and transforms its verbose JSON into compact, LLM-optimized markdown. Drop it in, point it at your vCenter, and your AI agents get up to 4,760x smaller responses, 98.6% extraction accuracy (vs 42.2% with raw govc), and built-in safety gates that prevent accidental
vm destroy. Same govc under the hood, same env vars — just smarter output. Open source, Apache 2.0. github.com/vchaindz/govcai
If you're running LLM agents against VMware infrastructure today, you're burning tokens and getting unreliable results. Raw govc output wasn't designed for AI consumption, and it shows. govcai fixes that with a single wrapper — no vCenter changes, no new APIs.
| Benefit | What It Means |
|---|---|
| Up to 4,760x smaller responses | A 1.5 MB host info dump becomes 315 bytes. Your agent's context window stays open for reasoning, not data. |
| 98.6% LLM accuracy (vs 42.2%) | Flat markdown tables with 0–1 extraction hops instead of 3–5 hops through nested JSON. The LLM finds the answer instead of hallucinating one. |
| 35–55% cost reduction per tool call | Fewer tokens in, fewer tokens out. Real-world Claude Code benchmarks show consistent savings across single-command tasks. |
| 2 turns instead of 3 | govcai returns pre-formatted markdown — no extra round-trip for JSON parsing and table rendering. |
| Built-in risk gates | Every command classified as low/medium/high risk. Mutating and destructive operations require explicit --approve. No accidental deletions. |
| Self-describing commands | --discover, --schema, --help-compact — agents query only the schemas they need, on demand, instead of consuming entire man pages. |
| Zero migration effort | Same GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD. If govc works, govcai works. |
In our previous post, we laid out five principles for making CLI tools work efficiently with LLM agents: structured output by default, response shaping, noun-verb command structure, deterministic error contracts, and built-in discovery. The response was clear — infrastructure teams want this, but they want working code, not just a blueprint.
Today we're releasing govcai, an open source project that implements all five principles as a drop-in wrapper around VMware's govc. It's written in Go, requires zero changes to your vCenter environment, and works everywhere govc does.
govcai sits between your LLM agent and govc. It runs the same govc commands against your vCenter, but intercepts the response and transforms it from verbose JSON into compact, structured markdown that an LLM can actually consume without burning through its context window.
There's no new API to learn. If you know govc, you know govcai. Same environment variables (GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD), same vCenter connection, same authentication. The difference is what comes back.
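The wrapper pattern itself is simple to sketch. Below is an illustrative Go sketch of the core idea — take a verbose JSON blob, keep only the handful of fields an agent needs, and emit a markdown row. The field names and the `compactVM` helper are assumptions for illustration, not govcai's actual code or govc's exact schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vmSummary holds the few fields an agent typically needs. The JSON
// keys here are illustrative, not govc's real output layout.
type vmSummary struct {
	Name       string `json:"name"`
	PowerState string `json:"powerState"`
	IP         string `json:"ip"`
}

// compactVM turns one verbose VM JSON object into a single markdown
// table row, silently discarding every field the struct doesn't name.
func compactVM(raw []byte) (string, error) {
	var v vmSummary
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	return fmt.Sprintf("| %s | %s | %s |", v.Name, v.PowerState, v.IP), nil
}

func main() {
	// Extra fields (e.g. "hardware") are dropped during unmarshaling.
	raw := []byte(`{"name":"web-01","powerState":"poweredOn","ip":"10.0.0.5","hardware":{"numCPU":4}}`)
	row, err := compactVM(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("| Name | Power | IP |")
	fmt.Println("|---|---|---|")
	fmt.Println(row)
}
```

The compression comes from the struct acting as an allowlist: everything not explicitly named is dropped before it ever reaches the model's context.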
Here's what the optimization looks like on a real vCenter environment with 41 VMs, 4 datastores, and 2 clusters.
| Command | govc Output | govcai Output | Reduction |
|---|---|---|---|
| host info --view config | 1.5 MB | ~315 B | 4,760x |
| alarm list (50 alarms) | 818 KB | 371 B | 2,204x |
| vm ip <vm> | 70 KB | 124 B | 567x |
| datacenter info | 183 KB | 389 B | 470x |
| vm list (41 VMs) | 1.7 MB | 5.4 KB | 316x |
| datastore usage | 32 KB | 270 B | 121x |
| permissions list | 86 KB | 1.3 KB | 69x |
The headline number — 316x reduction on VM listing — comes from extracting the 11 fields an agent actually needs from the 200+ fields govc returns per VM. But the real story is what this enables: an LLM agent can now hold an entire 41-VM inventory in ~1,400 tokens instead of ~427,000.
We ran end-to-end benchmarks with Claude Code (Sonnet) against a production vCenter. Each task was executed 3 times to measure consistency.
| Task | govc Tokens | govcai Tokens | govc Cost | govcai Cost | Savings |
|---|---|---|---|---|---|
| List VMs | 66,571 | 43,410 | $0.068 | $0.031 | 35% tokens, 54% cost |
| Host info | 66,399 | 42,024 | $0.051 | $0.030 | 37% tokens, 41% cost |
| Datastore usage | 70,194 | 42,086 | $0.067 | $0.030 | 40% tokens, 55% cost |
| Cluster bundle | 239,396 | 152,100 | $0.122 | $0.081 | 36% tokens, 34% cost |
Single-command tasks consistently complete in 2 tool-call turns with govcai versus 3 with govc. The third turn disappears because govcai returns pre-formatted markdown — Claude doesn't need an extra round-trip to parse JSON and render a table.
Compact, structured output doesn't just save tokens — it makes the LLM more accurate.
| Task | govc Accuracy | govcai Accuracy |
|---|---|---|
| "Free space on datastore ssd?" | 39.2% | 98.5% |
| "IP of VM web-prod-01?" | 31.0% | 99.5% |
| "Which VMs have autostart?" | 56.5% | 97.8% |
| Average | 42.2% | 98.6% |
With govc, the LLM has to navigate 3–5 hops through nested JSON objects to find the answer. With govcai, the answer is 0–1 hops from the surface. Less noise, fewer extraction errors.
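To make "extraction hops" concrete, here's a small Go sketch. The nested layout is invented to resemble govc-style output, not its exact schema; the point is the difference in how far the answer is buried.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// cell extracts 1-based column n from a markdown table row —
// a single hop compared with walking nested JSON.
func cell(row string, n int) string {
	return strings.TrimSpace(strings.Split(row, "|")[n])
}

func main() {
	// govc-style: the IP sits several levels deep (illustrative layout).
	nested := []byte(`{"virtualMachines":[{"guest":{"net":[{"ipAddress":["10.0.0.5"]}]}}]}`)
	var doc struct {
		VirtualMachines []struct {
			Guest struct {
				Net []struct {
					IPAddress []string `json:"ipAddress"`
				} `json:"net"`
			} `json:"guest"`
		} `json:"virtualMachines"`
	}
	if err := json.Unmarshal(nested, &doc); err != nil {
		panic(err)
	}
	// Four extraction hops before the value surfaces.
	deep := doc.VirtualMachines[0].Guest.Net[0].IPAddress[0]

	// govcai-style: the same answer is one hop away in a flat row.
	flat := cell("| web-prod-01 | poweredOn | 10.0.0.5 |", 3)

	fmt.Println(deep, flat)
}
```

Every hop is a chance for the model to pick the wrong branch; a flat row leaves it almost nowhere to go wrong.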
Here's how govcai maps to the design principles from the original blog post:
govcai outputs markdown tables by default — not because markdown is magic, but because it's the format LLMs handle best. Tables compress naturally, column headers provide context, and the whole thing fits in a fraction of the tokens.
```
# govc: 1.7 MB of nested JSON
govc vm.info -json

# govcai: 5.4 KB markdown table with the fields that matter
govcai vm list
```
Need the raw JSON? --format raw passes through govc's output untouched. Need JSON for scripting? --format json gives you govcai's filtered result as JSON.
This is where the biggest token savings come from. Instead of returning everything, govcai offers pre-built views that match common agent intents:
```
govcai vm info web-01 --view perf      # CPU%, memory%, uptime
govcai vm info web-01 --view config    # CPU count, memory, disk, network
govcai vm info web-01 --view status    # Power, tools, IP, guest OS
govcai host info esxi-01 --view perf   # CPU/memory utilization
govcai host info esxi-01 --view config # Hardware specs
```
For maximum control, --fields lets you specify exactly which columns you want, and --token-budget caps the total output size.
Every command follows govcai <noun> <verb> — predictable enough that an LLM can infer commands it hasn't seen:
```
govcai
├── vm (list, info, status, power-on, power-off, create, destroy, ...)
├── host (list, info, status, performance, maintenance-enter, ...)
├── datastore (list, info, usage, ls, rm, ...)
├── cluster (summary, capacity, rule-list, host-list, ...)
├── snapshot (tree, create, remove, revert, removeall)
├── metric (list, info, sample, interval-info, ...)
├── tags (list, info, create, attach, detach, ...)
├── disk (list, create, attach, detach, ...)
├── role (list, usage, create, update, remove)
├── permissions (list, set, remove)
├── alarm (list, info)
├── license (list, info, assigned-list, add, ...)
├── library (list, info, item-list, deploy, ...)
└── ... (19 categories, 164 commands total)
```
Every error is JSON with a machine-readable code, a target, and a retry hint. No stack traces, no ambiguous English paragraphs:
```json
{"error": true, "code": "APPROVAL_REQUIRED", "message": "This operation requires --approve", "target": "vm.destroy", "retry": false}
```
The error codes (AUTH_FAILED, VM_NOT_FOUND, TIMEOUT, APPROVAL_REQUIRED, etc.) map directly to recovery actions. An agent can build retry logic without parsing natural language.
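With a deterministic contract, an agent's recovery logic can be a plain table lookup. A sketch in Go — the error struct mirrors the JSON shape shown above, but the recovery policy is an example of ours, not part of govcai:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cliError mirrors govcai's structured error contract.
type cliError struct {
	Error   bool   `json:"error"`
	Code    string `json:"code"`
	Message string `json:"message"`
	Target  string `json:"target"`
	Retry   bool   `json:"retry"`
}

// recoveryAction maps an error code to a next step. The mapping is an
// illustrative policy an orchestrator might choose.
func recoveryAction(e cliError) string {
	switch e.Code {
	case "TIMEOUT":
		return "retry with backoff"
	case "AUTH_FAILED":
		return "refresh credentials, then retry"
	case "VM_NOT_FOUND":
		return "re-list VMs and correct the name"
	case "APPROVAL_REQUIRED":
		return "escalate to a human for --approve"
	default:
		return "surface the error to the operator"
	}
}

func main() {
	raw := []byte(`{"error": true, "code": "APPROVAL_REQUIRED", "message": "This operation requires --approve", "target": "vm.destroy", "retry": false}`)
	var e cliError
	if err := json.Unmarshal(raw, &e); err != nil {
		panic(err)
	}
	fmt.Println(recoveryAction(e))
}
```

No regexes over prose, no guessing whether "failed" means retryable — the code field carries the whole decision.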
LLM agents shouldn't need to read man pages. govcai provides three levels of self-documentation:
```
govcai --help-compact   # ~576 tokens for all 164 commands
govcai vm info --schema # JSON schema: args, views, risk level
govcai --discover       # Complete JSON schema dump
```
--help-compact gives roughly 16 tokens per command — compared to ~200 tokens for a typical man page entry. An agent can scan the entire command surface in a single tool call.
When you let an LLM agent run infrastructure commands, the risk profile changes fundamentally. A human reads vm destroy and thinks twice. An LLM might not.
govcai classifies every command by risk level:
| Risk Level | Description | Behavior | Examples |
|---|---|---|---|
| Low | Read-only operations | Runs immediately | vm list, host info, datastore usage |
| Medium | Mutating operations | Requires --approve | vm power-on, snapshot create, maintenance-enter |
| High | Destructive operations | Requires --approve | vm destroy, snapshot removeall, pool destroy |
Without --approve, any mutating or destructive command returns a structured error:
```json
{"error": true, "code": "APPROVAL_REQUIRED", "message": "This operation requires --approve", "target": "vm.destroy", "retry": false}
```
This gives the LLM agent (or its orchestrator) a clear decision point: escalate to a human, or proceed with explicit approval. No accidental deletions.
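One way an orchestrator might consume this gate — run the command, and only re-run with --approve after an explicit confirmation. The runner and confirmation callback below are assumptions for illustration, not a govcai API:

```go
package main

import "fmt"

// runFn abstracts a govcai invocation; it returns the structured
// error code, or "OK" on success.
type runFn func(args []string) (code string)

// runWithGate invokes the command once, and when the approval gate
// trips, asks a human (confirm) before re-running with --approve.
func runWithGate(run runFn, args []string, confirm func(string) bool) string {
	if code := run(args); code != "APPROVAL_REQUIRED" {
		return code
	}
	if !confirm(fmt.Sprintf("approve %v?", args)) {
		return "DENIED_BY_HUMAN"
	}
	return run(append(args, "--approve"))
}

func main() {
	// Fake runner: mutating commands fail unless --approve is present.
	fake := func(args []string) string {
		for _, a := range args {
			if a == "--approve" {
				return "OK"
			}
		}
		return "APPROVAL_REQUIRED"
	}
	alwaysYes := func(prompt string) bool { return true }
	fmt.Println(runWithGate(fake, []string{"vm", "destroy", "web-01"}, alwaysYes))
}
```

The key property: the destructive path cannot be reached without passing through the confirm step, whatever the LLM decides upstream.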
Real infrastructure tasks rarely involve a single command. govcai supports two patterns for multi-step operations:
Bundles aggregate related read-only commands into a single call:
```
govcai bundle cluster-status  # system about + cluster summary + host list
govcai bundle vm-health       # vm list + system about
govcai bundle full-inventory  # complete infrastructure scan
```
Workflows are YAML-defined pipelines that can include variables, conditionals, and dry-run previews:
```yaml
name: vm-health-check
description: Comprehensive VM health assessment
risk_level: low
steps:
  - id: list-vms
    task: vm.list
    view: summary
  - id: system-info
    task: system.about
```
```
govcai workflow workflows/vm-health-check.yaml
govcai workflow workflows/vm-health-check.yaml --dry-run  # preview without executing
```
govcai covers 164 of govc's ~412 commands across 19 categories. Thirteen categories have 100% coverage.
The gap is intentional. Uncovered commands fall into four buckets: niche admin operations (SSO, VCSA appliance management), interactive commands (VM console, VNC), high-privilege operations that shouldn't be casually accessible to AI agents, and work in progress.
Every command govcai does expose has a purpose-built handler that converts verbose JSON into compact markdown — adding commands without proper handling would just pass through raw JSON and defeat the purpose.
```
# Build from source
git clone https://github.com/vchaindz/govcai.git
cd govcai
go build -o govcai ./cmd/govcai/

# Ensure govc is installed
brew install govc  # or download from github.com/vmware/govmomi/releases

# Configure vCenter connection (same as govc)
export GOVC_URL=https://vcenter.example.com
export GOVC_USERNAME=administrator@vsphere.local
export GOVC_PASSWORD=secret
export GOVC_INSECURE=true
# GOVC_DATACENTER is auto-detected and cached for 24h

# Try it
govcai system about
govcai vm list
govcai host list --view config
govcai datastore usage
govcai --help-compact
```
govcai auto-detects your datacenter when GOVC_DATACENTER isn't set. If there's exactly one, it's used automatically. If there are multiple, govcai tells you which ones are available and asks you to specify.
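The selection logic described above is easy to model. A sketch of the documented behavior — the function name and error wording are ours, not govcai's source:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// pickDatacenter mirrors the documented behavior: an explicit
// GOVC_DATACENTER wins; a sole datacenter is auto-selected; multiple
// candidates produce an error listing the options.
func pickDatacenter(explicit string, available []string) (string, error) {
	if explicit != "" {
		return explicit, nil
	}
	switch len(available) {
	case 0:
		return "", fmt.Errorf("no datacenters found")
	case 1:
		return available[0], nil
	default:
		return "", fmt.Errorf("multiple datacenters found (%s); set GOVC_DATACENTER",
			strings.Join(available, ", "))
	}
}

func main() {
	dc, err := pickDatacenter(os.Getenv("GOVC_DATACENTER"), []string{"dc-east"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(dc)
}
```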
govcai is open source under the Apache 2.0 license. The immediate roadmap includes expanding coverage for content libraries (library.create, library.sync), adding OVF import/export support, and building more workflow templates for common operational patterns.
But the bigger goal is demonstrating a pattern. The principles behind govcai — response shaping, risk gates, deterministic errors, built-in discovery — aren't VMware-specific. They apply to any CLI tool you want to make LLM-ready. We started with govc because it's the CLI tool we know best, but the architecture is designed to be a reference for wrapping kubectl, terraform, esxcli, or any other infrastructure CLI.
If you're building AI-driven infrastructure automation, we'd love your feedback. Try it, file issues, contribute new task handlers.
The project is at github.com/vchaindz/govcai.