Opvizor Blog

Optimizing VMware CLI Tools Like govc for LLM Agents

Written by Dennis | Mar 16, 2026 9:53:28 AM

Your CLI already has the data AI agents need. The problem is how it delivers it. Here's how to bridge the gap without rewriting everything as an MCP server.

In our previous post, we made the case that MCP beats CLI for complex infrastructure AI workflows. But here's the uncomfortable truth: most infrastructure teams already have battle-tested CLI tools — govc, esxcli, kubectl, terraform — and rewriting them as MCP servers isn't realistic. The smarter question is: how do you make existing CLI tools work efficiently with LLM agents?

That's exactly the problem Codenotary's lnxops approach tackles in the AgentX platform, and it offers a blueprint that applies far beyond Linux operations. The core idea is deceptively simple: don't change what your CLI does, change how it talks.

The Token Tax on Raw CLI Output

Let's start with a concrete example. When an AI agent runs govc vm.info -json /DC0/vm/web-server-01, it gets back something like this:

{
  "virtualMachines": [{
    "name": "web-server-01",
    "config": {
      "changeVersion": "2026-03-10T14:22:01.234567Z",
      "modified": "2026-03-10T14:22:01.234567Z",
      "hardware": {
        "numCPU": 4,
        "numCoresPerSocket": 2,
        "autoCoresPerSocket": false,
        "memoryMB": 8192,
        "virtualICH7MPresent": false,
        "virtualSMCPresent": false,
        "firmware": "bios",
        "device": [ ... 2000+ lines of device specs ... ],
        "cpuAllocation": { ... },
        "memoryAllocation": { ... }
      },
      "guestFullName": "Ubuntu Linux (64-bit)",
      "version": "vmx-19",
      "uuid": "422b3d90-a1b2-c3d4-e5f6-123456789abc",
      "instanceUuid": "502b3d90-a1b2-c3d4-e5f6-123456789abc",
      "template": false
    },
    "runtime": {
      "connectionState": "connected",
      "powerState": "poweredOn",
      "host": { ... },
      "bootTime": "2026-03-01T08:00:00Z",
      "maxCpuUsage": 9200,
      "maxMemoryUsage": 8192
    },
    "guest": {
      "toolsStatus": "toolsOk",
      "toolsRunningStatus": "guestToolsRunning",
      "guestId": "ubuntu64Guest",
      "guestFullName": "Ubuntu Linux (64-bit)",
      "hostName": "web-server-01",
      "ipAddress": "10.0.1.42",
      "net": [ ... network details ... ],
      "disk": [ ... disk details ... ]
    },
    "storage": { ... },
    "layoutEx": { ... hundreds of lines ... }
  }]
}

That response easily hits 8,000–15,000 tokens. The agent asked a simple question — "what's the status of web-server-01?" — and got back a firehose. Device arrays, layout metadata, allocation policies, UUIDs the agent will never reference. Every token of that response counts against the context window and the API bill.

Now multiply this by the 5–8 tool calls a typical investigation requires. By the third call, the agent is spending more tokens on data it already consumed than on actually reasoning about the problem.

The Five Principles of LLM-Friendly CLI Design

The lnxops approach and the broader "CLI for agents" movement converge on five design principles that transform how CLI tools serve AI consumers:

1. Structured Output by Default, Not by Flag

Most CLI tools treat machine-readable output as an afterthought — a --json flag bolted onto human-formatted defaults. For agent consumption, this needs to flip:

Before (govc default):

Name:           web-server-01
  Path:         /DC0/vm/web-server-01
  UUID:         422b3d90-a1b2-c3d4-e5f6-123456789abc
  Guest name:   Ubuntu Linux (64-bit)
  Memory:       8192MB
  CPU:          4 vCPU(s)
  Power state:  poweredOn
  Boot time:    2026-03-01 08:00:00 +0000 UTC
  IP address:   10.0.1.42
  Host:         esxi-host-03.lab.local

After (agent-optimized):

{"name":"web-server-01","cpu":4,"memoryMB":8192,"powerState":"poweredOn","ip":"10.0.1.42","host":"esxi-host-03","guestOS":"ubuntu64","bootTime":"2026-03-01T08:00:00Z"}

The first is 320 tokens. The second is 68 tokens. Same information, 79% fewer tokens.

The practical minimum: detect when stdout isn't a TTY (i.e., the consumer is a program, not a human) and automatically switch to structured output. An OUTPUT_FORMAT=json environment variable or --output json flag serves as the explicit override.
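The TTY check is a one-liner in most languages. Here's a minimal sketch in Python, assuming a hypothetical `choose_format` helper; the decision logic is kept pure (TTY flag and environment passed in) so it's easy to test, with `sys.stdout.isatty()` supplying the real value at runtime:

```python
import sys

def choose_format(stdout_is_tty: bool, env: dict) -> str:
    """Pick an output format: an explicit override wins, otherwise
    default to JSON whenever the consumer is a program, not a human."""
    explicit = env.get("OUTPUT_FORMAT")
    if explicit in ("json", "table"):
        return explicit
    return "table" if stdout_is_tty else "json"

# At the CLI entry point, the real call would look like:
#   fmt = choose_format(sys.stdout.isatty(), dict(os.environ))
```

Piped into another program (or an agent harness), stdout is not a TTY, so the tool emits JSON without anyone passing a flag.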

2. Response Shaping — Return Only What's Asked For

This is the single biggest win. Raw API responses dump everything; an optimized CLI returns only the fields relevant to the query. The lnxops approach implements this through field selectors:

# Raw govc: returns everything (~12,000 tokens)
govc vm.info -json /DC0/vm/web-server-01

# Optimized: returns only requested fields (~80 tokens)
lnxops vm info web-server-01 --fields name,cpu,memory,powerState,ip

# Even better: pre-built "views" for common agent queries
lnxops vm status web-server-01        # status view: power, uptime, tools
lnxops vm performance web-server-01   # perf view: cpu%, mem%, disk latency
lnxops vm config web-server-01        # config view: cpu, mem, disk, network

This mirrors exactly what AgentO does on the MCP side — each tool returns a curated response, not a raw API dump. The lnxops approach brings the same discipline to CLI interfaces by defining "views" that map to common agent intents.
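A view is just a named projection over the raw payload. Here's one way such views could be implemented, sketched in Python; the view names follow the commands above, but the field names and the `project` helper are illustrative assumptions, not lnxops internals:

```python
# Hypothetical view profiles: each maps a common agent intent to the
# handful of fields it actually needs from the raw vm.info payload.
VIEWS = {
    "status":      ["name", "powerState", "bootTime", "toolsStatus"],
    "performance": ["name", "cpuUsagePct", "memUsagePct", "diskLatencyMs"],
    "config":      ["name", "cpu", "memoryMB", "disks", "networks"],
}

def project(payload: dict, view: str) -> dict:
    """Return only the fields the requested view defines,
    silently dropping anything the payload doesn't have."""
    return {k: payload[k] for k in VIEWS[view] if k in payload}

vm = {"name": "web-server-01", "powerState": "poweredOn",
      "cpu": 4, "memoryMB": 8192, "toolsStatus": "toolsOk"}
print(project(vm, "status"))
# → {'name': 'web-server-01', 'powerState': 'poweredOn', 'toolsStatus': 'toolsOk'}
```

Everything not named in the view never reaches the context window, which is where the bulk of the token savings comes from.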

3. Noun-Verb Command Structure

The traditional Unix command style (govc vm.info, govc host.esxcli) works for humans who know the tool. For an LLM agent discovering available commands, a noun-verb hierarchy turns exploration into a deterministic tree search:

lnxops
├── vm
│   ├── list          # List all VMs (summary view)
│   ├── info          # Detailed VM information
│   ├── status        # Power state, uptime, health
│   ├── performance   # CPU, memory, disk, network metrics
│   └── changes       # Recent configuration changes
├── host
│   ├── list
│   ├── info
│   ├── status
│   └── performance
├── log
│   ├── search        # Cross-source log search
│   ├── tail          # Recent entries
│   └── stats         # Aggregated statistics
└── cluster
    ├── summary
    ├── capacity
    └── alerts

An agent that knows lnxops vm can discover all VM-related actions. An agent that knows lnxops vm performance can infer that lnxops host performance probably exists. This predictability eliminates exploration overhead — the token cost of the agent trying random commands to figure out what's available.

4. Deterministic Error Contracts

Human-oriented error messages are a token sink. "Error: Unable to connect to vCenter server. Please check your network connection and ensure the server is running" is 28 tokens of text the agent can't reliably act on.

Agent-optimized errors use structured exit codes and machine-readable output:

{"error": true, "code": "VCENTER_UNREACHABLE", "message": "Connection refused", "host": "vcenter.lab.local", "port": 443, "retry": true}

Now the agent can programmatically decide: error code is VCENTER_UNREACHABLE with retry: true, so wait and retry. No parsing of English sentences, no ambiguity.
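That decision can be a few lines of deterministic code on the agent side. A sketch, assuming a hypothetical `next_action` helper and an invented set of retryable error codes:

```python
import json

# Hypothetical retry policy driven purely by the structured error
# contract -- no parsing of English sentences required.
RETRYABLE = {"VCENTER_UNREACHABLE", "TIMEOUT", "RATE_LIMITED"}

def next_action(raw_output: str) -> str:
    out = json.loads(raw_output)
    if not out.get("error"):
        return "proceed"
    if out.get("retry") and out.get("code") in RETRYABLE:
        return "retry"
    return "escalate"

err = '{"error": true, "code": "VCENTER_UNREACHABLE", "message": "Connection refused", "retry": true}'
print(next_action(err))  # → retry
```

Because the contract is stable, this policy never drifts when someone rewords an error message for human readers.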

5. Built-in Discovery and Self-Documentation

An agent shouldn't need to read a man page (burning thousands of tokens) to discover what a command does. The lnxops approach embeds machine-readable discovery:

# Returns JSON schema of all available commands
lnxops --discover

# Returns the exact input schema for a specific command
lnxops vm performance --schema

# Returns compact help (~16 tokens per command vs ~200 for man pages)
lnxops --help-compact

This is the same pattern that mcp2cli uses to achieve its 96% token reduction: instead of injecting all tool schemas into the context on every turn, let the agent query for only the schemas it needs, on demand.
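The on-demand half of this pattern is simple to picture: a compact index by default, full schemas only when asked. A sketch in Python; the command names follow this post, but the schema bodies and the `discover` function are invented for illustration:

```python
# Hypothetical schema registry: the agent carries only the command
# names in context and fetches one full schema when it needs it.
SCHEMAS = {
    "vm performance": {"args": ["vm_name"],
                       "flags": {"--fields": "comma-separated field list"}},
    "vm status":      {"args": ["vm_name"], "flags": {}},
}

def discover(command=None):
    if command is None:
        return sorted(SCHEMAS)   # compact listing: names only
    return SCHEMAS[command]      # full schema, only when requested

print(discover())                # → ['vm performance', 'vm status']
print(discover("vm status"))
```

The default path costs a handful of tokens per command; the expensive payload is paid only for the one schema the agent actually uses.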

The Optimization Pipeline: From govc to Agent-Ready

Here's what a practical optimization pipeline looks like, taking raw govc output and making it LLM-ready:

Stage 1: JSON Output — Use govc's existing -json flag. This gets you structured data but with massive payload bloat.

Stage 2: Field Filtering — Pipe through jq to select only the fields the agent needs. This can be automated by defining "view" profiles.

Stage 3: Response Budgeting — Set a token budget per response. If the filtered output exceeds the budget, automatically summarize or paginate. A list of 500 VMs shouldn't dump all 500 into the context; return the first 20 with a count and a continuation token.
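A minimal sketch of that pagination contract, assuming a hypothetical `paginate` helper (the field names `total`, `items`, and `next` are illustrative):

```python
def paginate(items, page_size=20, cursor=0):
    """Return a bounded slice plus enough metadata for the agent
    to ask for more -- instead of dumping the whole inventory."""
    page = items[cursor:cursor + page_size]
    more = cursor + page_size < len(items)
    return {"total": len(items),
            "items": page,
            "next": cursor + page_size if more else None}

vms = [f"vm-{i:03d}" for i in range(500)]
first = paginate(vms)
print(first["total"], len(first["items"]), first["next"])  # → 500 20 20
```

The agent sees the total count immediately, so it can decide whether paging through the rest is worth the tokens at all.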

Stage 4: Caching Layer — For data that changes slowly (host inventory, cluster topology), cache results and serve from cache. This is what AgentO does with its three-tier cache (Ristretto → Redis → SQLite), but even a simple file-based cache with smart TTLs provides massive gains.
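Even the simple end of that spectrum is only a few lines. A minimal file-based cache with per-entry TTLs, sketched in Python; the `FileCache` class and its layout are assumptions for illustration, not AgentO's implementation:

```python
import json, time, tempfile
from pathlib import Path

class FileCache:
    """Minimal file-based cache with per-entry TTLs -- good enough
    for slow-moving data like host inventory or cluster topology."""
    def __init__(self, directory):
        self.dir = Path(directory)

    def set(self, key, value, ttl=300):
        entry = {"expires": time.time() + ttl, "value": value}
        (self.dir / f"{key}.json").write_text(json.dumps(entry))

    def get(self, key):
        path = self.dir / f"{key}.json"
        if not path.exists():
            return None
        entry = json.loads(path.read_text())
        if time.time() > entry["expires"]:
            return None          # stale: caller re-fetches from the API
        return entry["value"]

cache = FileCache(tempfile.mkdtemp())
cache.set("host-inventory", ["esxi-01", "esxi-02"], ttl=600)
print(cache.get("host-inventory"))  # → ['esxi-01', 'esxi-02']
```

Every cache hit is a vCenter round-trip the agent never waits for and a payload it never re-pays for in tokens.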

Stage 5: Semantic Routing — Map natural-language intents to specific command + field combinations. When the agent says "check if the VM is healthy," route to vm status with health-relevant fields, not vm info with everything.
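The routing table itself can start as something very plain. A sketch using keyword heuristics; the keywords, field lists, and `route` function are all illustrative assumptions (a production router might use the LLM itself or embeddings for intent matching):

```python
# Hypothetical intent router: map a natural-language request to a
# (command, fields) pair so the agent never falls back to the
# kitchen-sink `vm info` dump.
ROUTES = [
    (("healthy", "health", "alive"),     ("vm status",      ["powerState", "toolsStatus", "uptime"])),
    (("slow", "latency", "performance"), ("vm performance", ["cpuPct", "memPct", "diskLatencyMs"])),
    (("configured", "config", "spec"),   ("vm config",      ["cpu", "memoryMB", "disks"])),
]

def route(intent: str):
    words = intent.lower()
    for keywords, target in ROUTES:
        if any(k in words for k in keywords):
            return target
    return ("vm info", None)   # last resort: the full dump

print(route("check if the VM is healthy")[0])  # → vm status
```

Even this crude matcher keeps the common questions on the cheap, curated paths and reserves the expensive full-info call for genuinely open-ended queries.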

Token Budget: The Numbers

Here's what the optimization stages actually save on a real-world "investigate slow VM" workflow:

| Step | Raw CLI (govc) | Stage 2 (filtered) | Stage 5 (routed) |
|---|---|---|---|
| List VMs | ~5,200 tokens | ~800 tokens | ~400 tokens |
| VM details | ~12,000 tokens | ~120 tokens | ~80 tokens |
| Performance metrics | ~8,500 tokens | ~200 tokens | ~150 tokens |
| Host info | ~6,800 tokens | ~160 tokens | ~100 tokens |
| Recent logs | ~15,000 tokens | ~1,200 tokens | ~600 tokens |
| **Total** | **~47,500 tokens** | **~2,480 tokens** | **~1,330 tokens** |

That's a 97% reduction from raw CLI to fully optimized — remarkably close to what a purpose-built MCP server achieves. The lnxops approach demonstrates that you can get 90%+ of the MCP benefit while keeping your existing CLI toolchain.

When This Beats Full MCP Migration

The lnxops approach isn't a replacement for MCP — it's a bridge. Here's when it makes the most sense:

You already have battle-tested CLI tools. govc, kubectl, terraform, and esxcli have years of hardening. Wrapping them with an optimization layer preserves that reliability.

You need incremental adoption. You can optimize one command at a time. Start with the five commands your agents call most, optimize those, measure the token savings, then expand.

Your team knows the CLI ecosystem. MCP servers require new development patterns. An optimized CLI wrapper can be built by the same team that maintains the existing tools, using familiar patterns.

You're multi-tool. If your agents use 15 different CLI tools, building 15 MCP servers is a project. Building a consistent optimization wrapper that applies the same principles across all 15 is more tractable.

The Convergence with MCP

The interesting thing is that the lnxops approach and MCP are converging. A well-optimized CLI tool with structured output, field selectors, deterministic errors, and built-in discovery is essentially an MCP server with a different transport layer. The mcp2cli project makes this explicit — it can wrap any CLI tool and expose it as an MCP server at runtime, with zero code generation.

AgentO takes the next logical step: rather than wrapping CLI tools, it builds directly on the vSphere API with purpose-built MCP tools. But the design principles are identical — structured output, curated responses, deterministic behavior, and built-in discovery. The difference is architectural maturity, not philosophy.

For teams starting their AI infrastructure journey, the lnxops approach provides a practical on-ramp: optimize what you have, measure the gains, and evolve toward native MCP as the value becomes clear.