Observability costs are exploding as companies strive to provide maximum customer satisfaction with high performance and 24/7 availability.
Global annual spending on observability is well over $2.4 billion in 2024 and is expected to reach $4.1 billion by 2028. At the individual company level, this translates to observability costs of 10% to 30% of total infrastructure spend.
These costs will undoubtedly rise as the digital environment expands and becomes increasingly complex. It is therefore essential for cost-conscious companies to evaluate how best to reduce these costs while maintaining overall excellence in observability.
Learn why observability software is in such high demand, how to implement a DIY cost optimization approach, and criteria for selecting off-the-shelf options to keep observability costs as low as possible.
Why is observability so expensive?
The most obvious reason for rising observability costs is that businesses must accommodate today's consumers, who expect lightning-fast, on-demand, 24/7 access to everything digital. Monitoring the health of your systems is essential for modern businesses. Beyond that, a variety of technical and organizational factors drive up observability costs.
Let's take a look at some of them:
Microservices
Microservices generate more observability data than comparable monolithic applications. This is especially true of trace data, which shows how requests flow through your application and across every interface they touch. The more microservices there are, the more data there is, and the more complex their interdependencies become.
Ephemeral servers
Servers used to run for years. But in a cloud-centric world, ephemeral servers have become much more common thanks to on-demand provisioning, the nature of microservices and containerization, and the increasing use of spot instances. This, too, adds infrastructure complexity and increases data volume.
SRE and chaos engineering
Site reliability engineers (SREs) typically test applications using chaos engineering, intentionally introducing failures to verify resiliency. For example, an SRE might destroy a server just to see how the system responds. The resulting failures are not typically seen in normal day-to-day operation, so observability data grows again to cover these test modes and scenarios.
Indexing and hot storage
As a result of the factors above, observability solutions must ingest and process vast amounts of data so that businesses can pinpoint where problems are and ensure the health of their applications and websites is not compromised. However, this typically requires indexing the data to speed up search and query operations and keeping it in hot storage for frequent, fast retrieval. This directly drives up the cost of observability, especially since hot storage is very expensive.
It's not the amount of data that matters, it's how you manage it
Some observability vendors recommend limiting data ingestion to reduce costs, but this strategy can lead to missed detection of operational issues and the loss of valuable data needed for root cause analysis. Dropped data can compromise observability and increase the risk of non-compliance with various regulatory requirements.
Before we get into how you can better manage your data and related costs, let's look at some surprising statistics about data consumption across more than 1,000 companies.
Do-it-yourself observability
Taking a DIY approach may be most effective for many companies with experienced DevOps and SRE teams. Here's what you need to know when building DIY observability.
Start with the right framework for DIY cost-effective observability
Data management is complex, and it's easy to get lost in the details. To reduce observability costs and keep them low, you need to start with the right approach.
Reducing observability costs doesn't have to be a large or complex consulting project. The main steps to follow are:
Decide how your data will be used
Here are three categories you can use to organize it:
- Data you search on a daily basis
- Data used for dashboards and alerts, but not searched frequently
- Data retained for compliance purposes only
Many open source tools can give you some insight into what is being searched for most. For example, Prometheus query logs can tell you which queries are running the most, and therefore which time series metrics are most important.
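As a rough sketch, the short Python script below tallies the most frequently run queries from a Prometheus query log file (enabled via the `query_log_file` setting in prometheus.yml). The `params.query` field reflects the query log's JSON-lines format; verify the exact fields against your Prometheus version.

```python
# Minimal sketch: count the most frequently run PromQL queries from a
# Prometheus query log (JSON lines, one entry per executed query).
# Assumes each entry has a params.query field; adjust for your version.
import json
from collections import Counter

def top_queries(path, n=10):
    counts = Counter()
    with open(path) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or malformed lines
            query = entry.get("params", {}).get("query")
            if query:
                counts[query] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    for query, count in top_queries("query.log"):
        print(f"{count:6d}  {query}")
```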
You can also expand on the categories above, as organizations undoubtedly have different data usage scenarios. However, it is important to start with this basic classification, as you will need it later.
Abandon the "index everything" pattern
A common pattern in observability solutions is to index all ingested data into a tool like OpenSearch and then move it over time to a cheaper storage option like S3. But not all ingested data is used for fast searches; 30% of it is never used at all. Indexing is very expensive and should be limited to frequently searched data.
This pattern is common because it is easy to set up the flow. However, by defining use cases, teams can create more intelligent data routing patterns that classify data before deciding what to do with it.
Route data to appropriate storage
Once you have data use cases and statistics in place, it becomes easier to categorize your data. This classification allows teams to understand what data needs to be queried immediately, what data is never queried, and everything in between. Based on the category, you can decide whether to archive your data, store it on hot storage solid-state disks (SSDs), or store it on intermediate options such as magnetic Amazon Elastic Block Store (EBS) volumes.
In this flow, only the most important and frequently searched data is indexed and stored on expensive SSD (hot storage). On the other hand, compliance data that does not add operational value can be sent directly to less expensive archival storage. Data required for intermittent use can be stored on magnetic EBS volumes.
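To make this concrete, here is a simplified Python sketch of such a routing step. The tier names and classification rules are purely illustrative; in practice, the rules would come from the usage statistics gathered earlier, and the writers would target OpenSearch, EBS volumes, and S3-class archival storage.

```python
# Illustrative sketch: route each record to a storage tier based on the
# three-way classification described above. Rules and tier names are
# hypothetical; derive real rules from your query statistics.
from enum import Enum

class Tier(Enum):
    HOT = "indexed-ssd"       # searched daily: index + hot storage
    WARM = "magnetic-ebs"     # dashboards/alerts: unindexed block storage
    ARCHIVE = "object-store"  # compliance only: cheap archival storage

def classify(record: dict) -> Tier:
    if record.get("service") in {"checkout", "payments"}:  # searched daily
        return Tier.HOT
    if record.get("level") in {"warning", "error"}:        # alerting only
        return Tier.WARM
    return Tier.ARCHIVE                                    # compliance only

def route(record: dict) -> None:
    tier = classify(record)
    # In a real pipeline, HOT would write to an index such as OpenSearch,
    # WARM to EBS-backed storage, and ARCHIVE to Parquet files on S3.
    print(f"{tier.value:<13} <- {record}")

route({"service": "checkout", "level": "info", "message": "order placed"})
route({"service": "billing", "level": "error", "message": "retry failed"})
route({"service": "audit", "level": "info", "message": "login recorded"})
```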
Do not re-index
Reindexing occurs when the data is already in archive storage, but needs to be accessed again. For example, regulatory data may be archived on a regular basis, but is only needed once a year to generate a report. Even if the data is eventually removed from hot storage, this act of reindexing is very costly. Additionally, adding this large amount of data back to the index slows down operational queries.
As an alternative to this costly and inefficient re-indexing, store archived data in an easily accessible open source format such as Parquet or CSV. This lets you query the archive directly without building an index. It reduces observability charges, but more importantly, it separates historical data from operational data and keeps operational queries fast.
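For instance, Parquet archives can be queried in place with an embeddable SQL engine. The sketch below uses DuckDB as one possible engine; the tool choice and the `ts` column are assumptions for illustration, since only the open format matters here.

```python
# Query archived Parquet files directly, with no index in between.
# DuckDB is one possible engine (an assumption, not a recommendation
# from this article); install with `pip install duckdb`. Reading
# straight from S3 additionally requires DuckDB's httpfs extension.
import duckdb

# Count archived events per day, e.g. for an annual compliance report.
# Assumes the archived schema includes a `ts` timestamp column.
rows = duckdb.sql("""
    SELECT date_trunc('day', ts) AS day, count(*) AS events
    FROM read_parquet('archive/*.parquet')
    GROUP BY day
    ORDER BY day
""").fetchall()

for day, events in rows:
    print(day, events)
```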
Minimize data generation whenever possible
Stop generating unnecessary logs, traces, and metrics. The classification described here will help you understand which data is useful and which data is not.
Data needed for regulatory compliance and peace of mind should be stored directly in low-cost archival storage. In most cases, this data will not be used, but you can query it directly from the archive, as described in the previous section.
Convert logs and spans to metrics
There is no rule that says data must be retained in the format in which it was ingested. Logs are particularly expensive to store due to their size, and not every field in a log is useful. If your logs have only a few useful fields, consider converting them to time series metrics and removing the original logs from storage. Metrics are smaller and cheaper to store, and they can still be indexed, giving DevOps teams the same insight. Cost drops because there is significantly less data to index.
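As a minimal sketch of this conversion, suppose your access logs are JSON with a timestamp and an HTTP status code (an assumed schema for illustration). Aggregating them into a per-minute counter keeps the operational signal at a fraction of the storage:

```python
# Convert raw JSON access logs into per-minute counters by status code,
# so the original log lines can be dropped from hot storage.
# The "ts" and "status" field names are assumptions; adapt to your schema.
import json
from collections import Counter

def logs_to_metrics(lines):
    counts = Counter()
    for line in lines:
        rec = json.loads(line)
        minute = rec["ts"][:16]  # truncate to "YYYY-MM-DDTHH:MM"
        counts[(minute, rec["status"])] += 1
    return counts

logs = [
    '{"ts": "2024-05-01T12:34:56Z", "status": 200, "path": "/api/cart"}',
    '{"ts": "2024-05-01T12:34:57Z", "status": 500, "path": "/api/cart"}',
    '{"ts": "2024-05-01T12:34:58Z", "status": 200, "path": "/health"}',
]
for (minute, status), n in sorted(logs_to_metrics(logs).items()):
    print(f'http_requests_total{{status="{status}"}} {n}  @ {minute}')
```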
One exception to the low storage cost of metrics is high cardinality. High-cardinality metrics have labels with many distinct values, such as a client IP address label on a service with millions of users. Each distinct value creates another time series to store and another dimension to query across, which slows down queries, increases costs, and raises the risk of outages. In general, several separate low-cardinality metrics work better than a single metric whose high-cardinality, high-dimensional labels explode into a huge number of series.
To avoid high cardinality, teams can aggregate metrics to have fewer labels, remove unnecessary labels, or generate smaller metrics with lower cardinality. These actions help reduce costs and are important to maintaining high performance standards.
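As an illustration, the sketch below re-aggregates a per-client-IP counter into a coarser /16 network bucket, collapsing what could be millions of series into a handful. The label names and the bucketing rule are hypothetical:

```python
# Cardinality reduction sketch: replace a high-cardinality client_ip
# label with a coarse network bucket before storing the metric.
from collections import Counter

def reduce_cardinality(samples):
    """Re-aggregate (labels, value) samples so the series count stays
    bounded: keep the service label, bucket IPs into /16 networks."""
    reduced = Counter()
    for labels, value in samples:
        coarse = {
            "service": labels["service"],
            # keep only the first two octets instead of the full IP
            "client_net": ".".join(labels["client_ip"].split(".")[:2]) + ".0.0/16",
        }
        reduced[tuple(sorted(coarse.items()))] += value
    return reduced

samples = [
    ({"service": "api", "client_ip": "203.0.113.7"}, 3),
    ({"service": "api", "client_ip": "203.0.113.99"}, 1),
    ({"service": "api", "client_ip": "198.51.100.2"}, 5),
]
# Three per-IP series collapse into two per-network series.
for labels, value in reduce_cardinality(samples).items():
    print(dict(labels), value)
```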
Off-the-shelf observability
The operational overhead of managing your own observability solution may be too high. This overhead can distract your team and burden you with the painstaking maintenance of your observability stack and its underlying infrastructure. If you are considering a managed observability solution, the following sections provide general guidance.
What to look for in an observability vendor
When looking at SaaS observability options, cost optimization depends on the provider, its architecture, and how it generates insights within its own systems.
Here are some tips to help you choose a cost-effective solution.
Ask the right questions about costs
Consumers using SaaS observability providers need a way to optimize the cost of each system. Whether you're already integrated with a provider or choosing one for the first time, ask specifically about cost optimization. Ask, “What tools do you offer your customers to optimize costs?”
A vendor's answer reveals whether it has invested in building such tools or simply shifts the burden of cost optimization onto the customer. With third-party solutions, customers have less control over the data flow, so a common cost-reduction suggestion is simply to ingest less data. As previously discussed, that is a poor choice: insight into the health of the software system may be lost, and implementing the reductions requires significant engineering time (which itself adds cost). If real cost optimization tooling is not built into the product, be wary; the provider likely has little incentive for you to optimize costs.
Understand vendor pricing models
This may take more effort, but it's important to read the fine print. If a vendor bundles services in a way that forces you to purchase features you don't need, uses per-host pricing that doesn't differentiate between host sizes, or charges inconsistent, non-standardized prices for different features, think twice.
Look for a vendor that offers straightforward pricing so you can easily estimate costs and avoid expensive overage fees.