Monitoring is how you build applications and sleep soundly at night. You don't want your users unable to sign up because you ran out of funds in your SMS provider wallet and didn't notice for three days (that actually happened).
The problem, however, is that monitoring services can be expensive. Datadog, for example, charges around $15 per host per month for infrastructure monitoring, with additional fees as you ingest more data.
To keep your startup afloat, you need creative, ideally zero-cost, ways to monitor your systems. So, in this article, we’ll consider three cost-effective monitoring strategies (basic, advanced, and hybrid) that could cut your monitoring costs by 80%.
But first, what should you even be monitoring?
You can't monitor everything. For instance, it wouldn’t make sense to constantly monitor an API that's rarely used, such as the "delete account" API. Well, maybe your app is that bad, but for regular applications, monitoring every metric would likely lead to notification fatigue, wasting your team’s time on false alarms.
As a startup, you only need to keep an eye on the most critical aspects of your application to provide a great user experience. To do that, I say, go RED!
R: Rate of requests
E: Error rate
D: Duration
Rate is the number of requests your application receives per second. It provides insight into your application's load and enables you to respond to issues before they occur.
Monitoring the rate of requests helps you understand when your users are most active, allowing you to cater for peak periods, prevent downtime due to overload, and ensure user satisfaction. It also provides insight into potential security threats such as DDoS attacks, allowing you to address them before they result in total service disruption.
By monitoring request rate, you can be proactive when addressing anomalies, saving costs and optimizing user experience.
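As a rough illustration of what "requests per second" means in code, here is a minimal sketch that counts requests in a sliding time window. It assumes you can call `record()` once per incoming request, for example from your web framework's middleware, and that requests arrive in time order. The `RequestRateTracker` class and its methods are hypothetical names, not part of any particular tool.

```python
from __future__ import annotations

import time
from collections import deque


class RequestRateTracker:
    """Counts requests in a sliding time window and reports requests per second."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        # Timestamps of recent requests, oldest first (assumes in-order recording).
        self._timestamps: deque[float] = deque()

    def record(self, timestamp: float | None = None) -> None:
        """Record one incoming request; defaults to the current monotonic time."""
        self._timestamps.append(time.monotonic() if timestamp is None else timestamp)

    def rate(self, now: float | None = None) -> float:
        """Return the average requests per second over the last `window_seconds`."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Discard requests that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_seconds
```

A longer window smooths out momentary spikes, while a shorter one reacts faster to a sudden surge (such as a DDoS attack) at the cost of noisier readings.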
Error rate is the percentage of requests that result in an error: the ratio of failed requests to the total number of requests, multiplied by 100.
(Number of failed requests / Total number of requests) × 100
Monitoring error rates can provide insight into your application’s performance and user experience. For instance, an increase in error rate might indicate that your application is struggling to handle increased load, meaning your users might be seeing inconsistent behaviour from your app. In today’s “low user patience” market, poor application performance could lead to users abandoning your product.
By tracking error rate, you can identify and address issues swiftly, preventing major service disruptions.
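The formula above translates directly into a small function. This sketch assumes you have the HTTP status codes of recent requests and treats any status of 500 or higher as a failure. Whether 4xx responses (often client mistakes) should count is a judgment call, so the threshold is a parameter. The `error_rate` name is hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def error_rate(status_codes: Iterable[int], failure_min: int = 500) -> float:
    """Return (failed requests / total requests) x 100.

    Assumption: a request "fails" when its HTTP status is >= failure_min
    (5xx by default).
    """
    codes = list(status_codes)
    if not codes:
        return 0.0  # No traffic means nothing has failed.
    failed = sum(1 for code in codes if code >= failure_min)
    # Multiply before dividing to keep whole-number percentages exact.
    return failed * 100 / len(codes)
```

For example, 3 server errors out of 100 requests gives an error rate of 3.0%. Counting two 404s as well (`failure_min=400`) raises it to 5.0%.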
Duration is the amount of time it takes to complete a request to your application.
Although some requests might require more processing time, users typically expect load times between 2 and 3 seconds on average. Anything beyond that often leads to user frustration and site abandonment. Therefore, you should aim to keep your request duration under three seconds whenever possible.
By monitoring request duration, you can actively assess user experience and address performance issues quickly before they result in significant financial loss.
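Here is a minimal sketch of measuring request duration and checking it against the three-second target. The `timed_request` context manager and the `percentile` and `over_budget` helpers are hypothetical names. Comparing the 95th percentile rather than the average against the budget is a design choice on my part: an average can look healthy while one request in twenty is painfully slow.

```python
from __future__ import annotations

import math
import time
from collections.abc import Iterator
from contextlib import contextmanager

# The article's target: keep request duration under three seconds.
DURATION_BUDGET_SECONDS = 3.0

# Hypothetical in-memory store; a real setup would ship these to a metrics backend.
durations: list[float] = []


@contextmanager
def timed_request() -> Iterator[None]:
    """Append the wall-clock time (in seconds) spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations.append(time.perf_counter() - start)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of `values`, with 0 < pct <= 100."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def over_budget(values: list[float], pct: float = 95.0) -> bool:
    """True if the pct-th percentile duration exceeds the three-second budget."""
    return bool(values) and percentile(values, pct) > DURATION_BUDGET_SECONDS
```

In practice you would wrap each request handler in `with timed_request():` and periodically call `over_budget(durations)` to decide whether to raise an alert.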
Monitoring RED metrics would be pointless, though, if you don't know when issues occur, which makes alerts and notifications essential. So, let’s examine cost-effective tools and strategies that will enable you to track the RED metrics, set up alerts, and receive notifications so you can respond to issues swiftly.
All applications are unique, so they may require different levels or depths of monitoring.
If your primary need is to monitor service uptime, then basic monitoring is sufficient. The limitation, however, is that it isn’t 100% RED, as you can only monitor request duration and “inverted error rate”.
Here’s what I mean. Because the basic monitoring strategy uses watchers that make requests to your services, the only requests you can track are the ones the monitor itself sends. Consequently, the error rate becomes the ratio of failed requests sent by the monitor to total requests sent by the monitor. Hence the term “inverted error rate”, since you’re monitoring your own requests. Two open-source tools that can help you monitor your application at minimal cost are Uptime Kuma and Monika.
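To make the “inverted error rate” concrete, here is a minimal sketch of what a watcher does under the hood. It is not Uptime Kuma’s or Monika’s code. The `Watcher` class, the `http_probe` function, and the rule that any 2xx or 3xx response counts as “up” are assumptions for illustration. Note that both the request count and the error rate come only from the probes the watcher sends itself.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from collections.abc import Callable
from dataclasses import dataclass, field


def http_probe(url: str, timeout: float = 10.0) -> int:
    """Send one GET request and return its HTTP status code (0 if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still have a code
        return err.code
    except OSError:  # DNS failure, refused connection, timeout, ...
        return 0


@dataclass
class Watcher:
    url: str
    probe: Callable[[str], int] = http_probe
    sent: int = 0
    failed: int = 0
    durations: list[float] = field(default_factory=list)

    def check(self) -> bool:
        """Probe the service once, recording the duration and the outcome."""
        start = time.perf_counter()
        status = self.probe(self.url)
        self.durations.append(time.perf_counter() - start)
        self.sent += 1
        ok = 200 <= status < 400
        if not ok:
            self.failed += 1
        return ok

    def inverted_error_rate(self) -> float:
        """Failed probes / probes sent x 100: only the watcher's own requests."""
        return self.failed * 100 / self.sent if self.sent else 0.0
```

Tools like Uptime Kuma add the parts this sketch leaves out: running checks on a schedule, dashboards, and sending notifications when a check fails.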
You can use solutions like Uptime Kuma for basic monitoring. It’s open source, easy to set up, and supports a wide range of monitoring protocols, including HTTP, TCP, and DNS. It also has a monitor dashboard, which makes it user-friendly for non-technical team members. Setting up alerts and notification channels is also straightforward through the Uptime Kuma dashboard.
But what if you need a cost-effective strategy that monitors actual error rates as well as request rates?
Enter advanced monitoring.
For a more comprehensive monitoring solution, Netdata is the way to go. It offers auto-detection, dashboards, and preconfigured alerts out of the box, allowing you to start monitoring your infrastructure right away.
Unlike Prometheus, which has a much steeper learning curve, Netdata was built with user-friendliness in mind. It lets you gather insights quickly without investing significant time or resources in setup and customization. With its kickstart script, all you need is one command: installing dependencies, setting up the repository, and installing the Netdata Agent are all handled for you.
It’s important to note, though, that while Netdata is designed to be lightweight, it can still be resource intensive, so you’d incur some cost for the server running it.
But what if you don’t want to handle the infrastructure costs of using Netdata? Well, hybrid monitoring might be for you.
Sometimes, it's just easier to use an off-the-shelf tool like Datadog for monitoring. You don’t need to set up a server for your dashboards; you simply install the Datadog Agent on your hosts, and it sends data to your Datadog account. Easy peasy!
The issue with enterprise tools is that bills can rack up fast, especially with solutions like Datadog that charge based on how much data you ingest. Rather than incur such costs, you can take advantage of their flat infrastructure monitoring rate of about $15 per host per month and couple it with a basic monitoring solution like Uptime Kuma. This hybrid approach lets you delegate some tasks to Uptime Kuma while doing infrastructure monitoring on Datadog. You'd have to be careful, though, not to stretch Datadog beyond what you actually need it for. One way to save cost is to minimize the logs you send to Datadog, using their filtering capabilities to ingest only critical logs.
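A complementary approach is to drop low-value logs on your side before they ever leave your application. This minimal sketch uses Python’s standard `logging` module. `CriticalOnlyFilter` and the `ship` flag are hypothetical conventions. `ListHandler` is only a stand-in for whatever handler actually forwards logs to Datadog in your setup, and no Datadog API is shown or implied.

```python
import logging


class CriticalOnlyFilter(logging.Filter):
    """Pass records at or above `min_level`, plus any record explicitly
    marked with extra={"ship": True} (a hypothetical convention)."""

    def __init__(self, min_level: int = logging.ERROR) -> None:
        super().__init__()
        self.min_level = min_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno >= self.min_level or bool(getattr(record, "ship", False))


class ListHandler(logging.Handler):
    """Stand-in for the handler that forwards logs to Datadog; just collects them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)  # keep everything locally; filter only what ships
shipper = ListHandler()
shipper.addFilter(CriticalOnlyFilter())
logger.addHandler(shipper)
```

With this in place, routine `logger.info(...)` calls stay out of the shipping handler, while errors and explicitly flagged messages (for example, a low SMS wallet balance) still get through.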
You can delegate request-duration monitoring to Uptime Kuma. If you want synthetic monitoring, which Datadog also offers at extra cost, you can pair Datadog with another tool like Monika, an open-source tool that's easy to set up. Now you get the best of both worlds!
You can start with basic monitoring and gradually try out other strategies until you find one that fits your team. It all depends on the resources available to you. Building a startup is a continuous process of learning and exploration to find what works for you, and monitoring is no different.
So keep exploring and keep building.