Opvizor Blog

The True Cost of Infrastructure Monitoring

Written by Slava | Jan 30, 2024 1:50:34 PM

Introduction

Today, I'd like to delve into one of the topics we discuss most often with our customers and partners: the real cost of monitoring. It is no secret that planning an IT budget involves more than hardware. It also covers the associated software stack, whether for VMware (ESXi hosts, vCenter, vSphere, vSAN), Nutanix, NetApp, Dell PowerMax, or many others.

Monitoring Your Ecosystem

If you have invested in one or more of the above, you have likely pondered how to monitor your ecosystem and had to decide on the most sensible path: a packaged solution or the do-it-yourself route.

Packaged Solutions for VMware

Let's begin with the first option. If you are a VMware (or now Broadcom) shop, you have probably purchased vROps or its successor, the Aria product portfolio, or are considering acquiring them to keep a homogeneous VMware stack. Based on feedback from the companies we work with, it may be worth dedicating a full-time person to vROps. Otherwise, you may spend a lot of time extracting useful information from it, and even if you manage to do so, your setup could break in the next upgrade cycle. Some customers with thousands of ESXi hosts across dozens of vCenters tell us that, after several years, the only feature they find useful is the alert when a host or vCenter disconnects.

However, vROps is less useful for performance monitoring, and the dashboard these customers used to monitor available resources is being deprecated. VMware suggests replacing it with several different dashboards that don't provide the same information. Other customers disagree with vROps sizing recommendations and don't follow them, even though they use vROps. I am not claiming that vendor X's tools for monitoring vendor X's hardware exist more to promote upgrades and sales than to help users find savings, but it wouldn't be out of the question.

Another frequently raised point is that the problem with vROps isn't only cost (just wait for your next license renewal in 2024), but also how burdensome even simple tasks are. It ingests so much data that a small team can struggle to work out what matters to them and what doesn't. It may be a fantastic product, but it is expensive enough that buying additional hardware is sometimes cheaper than justifying the licenses needed for right-sizing.

Challenges with VMware and Observability

Unfortunately, this leads to another sensitive topic: teams managing the VMware ecosystem are often small, understaffed, or both. Monitoring is good, but observability is what you truly want. Having all that data means nothing if you can't act on it, whether for capacity planning or for finding a bottleneck. Are you running a Kubernetes cluster? If you move your workloads from VMs to containers on a cluster of the same size, your infrastructure and cloud bills stay roughly the same. But instead of tens of VMs, you're now running hundreds or thousands of containers, each generating about as much telemetry as a VM did. Your observability costs are probably higher than the cost of the infrastructure supporting your apps.
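To make the Kubernetes point concrete, here is a minimal back-of-the-envelope sketch. It assumes, as a simplification, that every monitored target (VM or container) exposes the same number of time series and is scraped at a fixed interval; the function name and all figures (40 VMs, 2,000 containers, 500 series per target, a 30-second interval) are hypothetical placeholders, not measurements.

```python
# Hypothetical illustration: telemetry volume scales with the number of
# monitored targets, not with the size of the underlying cluster.

SECONDS_PER_DAY = 86_400


def daily_samples(targets: int, series_per_target: int, scrape_interval_s: int) -> int:
    """Return the number of metric samples ingested per day.

    Simplifying assumptions: every target exposes the same number of
    time series and is scraped at one fixed interval.
    """
    scrapes_per_day = SECONDS_PER_DAY // scrape_interval_s
    return targets * series_per_target * scrapes_per_day


# Placeholder inputs: the same cluster, sliced into VMs or into containers.
SERIES_PER_TARGET = 500   # hypothetical series emitted per VM or container
SCRAPE_INTERVAL_S = 30    # hypothetical scrape interval in seconds

vm_samples = daily_samples(40, SERIES_PER_TARGET, SCRAPE_INTERVAL_S)
container_samples = daily_samples(2_000, SERIES_PER_TARGET, SCRAPE_INTERVAL_S)

print(f"VMs:        {vm_samples:,} samples/day")
print(f"Containers: {container_samples:,} samples/day")
print(f"Growth:     {container_samples / vm_samples:.0f}x")
```

With these placeholder inputs the container setup ingests 50 times as many samples per day on the same hardware, which is why storage and licensing for telemetry can outgrow the cluster itself.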

Alternatives to VMware

VMware isn't the only company that can help you, right? There are other tools for monitoring your ecosystem. You may have considered New Relic, Datadog, SolarWinds, or Dynatrace. They seem to offer a seamless experience and beautiful dashboards. But most of them either don't have infrastructure monitoring at their core (believe me, I worked for New Relic), have raised prices every year (shoutout to Datadog customers), or are known to cost a fortune (Splunk). Another issue with these vendors is lock-in: migrating away will cost you time (downtime) and money. You may also have evaluated Checkmk, PRTG, Zabbix, or Nagios, but beyond the complex installation and the need to set up thousands of rules, they won't give you the insights you need for a reasonable MTTD/MTTR (mean time to detect and mean time to resolve).

The DIY Monitoring Approach

But wait… Myriad tools can do it better, and some of them are even free! Perhaps you have thought about building your own monitoring stack or have tried setting up the usual suspects: Grafana, ELK, Prometheus, Loki, Telegraf, and the rest. Don't get me wrong, I have seen it work, but it took tremendous effort to set up and maintain, which is a job in itself. During my three years at Logz.io, one of the leading companies in the open-source logging space, I saw six-figure bills for the supporting infrastructure and comparable costs for the engineering hours needed to maintain it. And the gap between monitoring and observability persists: sure, you can have all the Grafana dashboards you want, but the data isn't magically going to get there.

True Cost of DIY Monitoring

To estimate the true cost of a DIY stack, keep the following in mind:

  • Cost of the infrastructure it all runs on: compute to execute it and storage for terabytes of logs and metrics
  • Cost of the engineering hours to set it up
  • Cost of the engineering hours to maintain it (because things break)
  • Cost of developing and maintaining your dashboards and integrations for storage, firewalls, and databases
  • Cost of the time needed to learn the dashboards and understand what each of them means
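One way to put these items side by side is a simple annual cost model. The sketch below is a hypothetical illustration, not an Opvizor calculator: the class name, field names, the choice to amortize setup over a few years, and every number in the example are placeholder assumptions to be replaced with your own figures.

```python
from dataclasses import dataclass


@dataclass
class DiyStackCosts:
    """Hypothetical annual cost inputs for a self-built monitoring stack.

    Every field is an assumption; substitute your own figures.
    """
    infrastructure_per_year: float     # compute + storage for logs and metrics
    setup_hours: float                 # one-off engineering effort to build it
    setup_amortization_years: float    # spread the setup effort over this many years
    maintenance_hours_per_year: float  # fixing what breaks, upgrades
    dashboard_hours_per_year: float    # building/maintaining dashboards and integrations
    learning_hours_per_person: float   # time to learn what each dashboard means
    people_onboarded_per_year: int     # engineers who need to learn the stack each year
    hourly_rate: float                 # fully loaded engineering cost per hour

    def annual_engineering_hours(self) -> float:
        """Total engineering hours attributed to one year of operation."""
        if self.setup_amortization_years <= 0:
            raise ValueError("setup_amortization_years must be positive")
        setup = self.setup_hours / self.setup_amortization_years
        learning = self.learning_hours_per_person * self.people_onboarded_per_year
        return (setup + self.maintenance_hours_per_year
                + self.dashboard_hours_per_year + learning)

    def annual_total(self) -> float:
        """Infrastructure spend plus engineering hours priced at hourly_rate."""
        return self.infrastructure_per_year + self.annual_engineering_hours() * self.hourly_rate


# Placeholder figures for illustration only, not customer data.
example = DiyStackCosts(
    infrastructure_per_year=60_000,
    setup_hours=400,
    setup_amortization_years=3,
    maintenance_hours_per_year=500,
    dashboard_hours_per_year=300,
    learning_hours_per_person=20,
    people_onboarded_per_year=5,
    hourly_rate=100,
)

print(f"Engineering hours/year: {example.annual_engineering_hours():,.0f}")
print(f"Total cost/year:        {example.annual_total():,.0f}")
```

Even with these made-up inputs, the engineering hours outweigh the infrastructure line, which matches the pattern described above: the people cost of a DIY stack is easy to overlook when only the hardware bill is visible.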

Technical Knowledge and Challenges

Another issue is the technical knowledge within your company (or the lack of it). If the person who set it up leaves the company, what will you do?

Give Opvizor a Try

These problems are why we developed Opvizor: to address the immediate need for visibility into your VMware and cloud environments and let you focus on your business rather than on maintenance. I am not saying that we are a silver bullet, or that our VMware monitoring, on-prem and in the cloud, will solve all your issues. What I AM saying is that it takes five minutes to explore our demo environment and see what you can expect, or 30 minutes to see your own actionable data: install our appliance during the free trial, then chat with our AI assistant and ask it questions about your dashboards and how to interpret them.

If you disagree and want to prove me wrong, or have questions about Opvizor, our entire team would be thrilled to share our best practices and the data and cost savings we have delivered to customers like Motorola, DZ BANK, and many others.