
Building an AI App? One Missing Test Cost This Dev $5000


David Brusser built a free AI tool to help junior analysts with Excel formulas, only to rack up a $5000 bill in just a few days. He managed to turn things around and grow it into a million-dollar product, but not everyone is lucky enough to have built a free tool people are willing to pay for.

For instance, I recently built a headline analyzer to help TestingPod authors create engaging headlines for their articles. I don’t think people would be willing to pay for that. I mean, yeah, I put a lot of work into engineering the prompt so that it works in one click, but I don’t think it’s something that would have a large pool of paying customers. To me, it’s a nice thing to have.

You might also be building a free AI tool, maybe for lead generation or to increase the value of another product. I don’t know about you, but to me, a $5000 bill for a product that won’t make you money isn’t a great business idea.

Well, maybe you’re rich. If that’s you, you can stop reading this. If not, keep reading as we explore testing considerations for building a cost-effective AI application. We’ll cover cost estimation, testing rate limiting, and fail-safes with monitoring and alerts.

Let’s start with the first wall of defense: cost estimation.

Cost Estimation

LLMs aren’t free. Even if you’re using a self-hosted open-source model for your app, you’ll still incur some sort of cost.

For the headline analyzer app, I used OpenAI’s GPT-4o model, which is billed per token, so it’s important that we estimate the potential cost we’ll incur on our free tool. Token-based billing simply means we’re charged for the number of tokens used, where a token is roughly a word or a piece of a word.

A token can be either an input or an output token. Input tokens are what our users send to our app, which in our case is the headline, while output tokens are the analysis the model returns. So the longer the headline our users send, the more cost we incur, and the same goes for the detail in the analysis we return. The good thing, though, is that a headline analyzer isn’t open-ended like the ChatGPT app. It does one thing: the headline goes in, and the analysis comes out. Also, a headline typically has a word limit; any article headline longer than 20 words might as well be an article of its own. So the cost estimate of our app is pretty straightforward.

To estimate the cost, we’ll use the maximum word count of a headline, which we’re setting to 20 words, along with the input and output prices of the AI model. The prompt was about 1,000 words, and the output was about 150 words.

Assuming a token-to-word ratio of about 1.33 and GPT-4o’s pricing of $2.50 per million input tokens and $10 per million output tokens, analyzing one headline would cost about $0.00339 for input (1,356.6 tokens) and $0.00199 for output (199.5 tokens), for a total of roughly $0.00539 per analysis.
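If you want to reproduce the estimate, here’s a minimal sketch. The 1.33 token-to-word ratio is an approximation, and the per-million-token prices reflect GPT-4o’s published pricing at the time of writing; double-check OpenAI’s pricing page before relying on these numbers.

```python
# Back-of-the-envelope cost estimate for one headline analysis.
# Assumptions: ~1.33 tokens per word, GPT-4o priced at $2.50 per 1M
# input tokens and $10.00 per 1M output tokens (verify current pricing).

TOKENS_PER_WORD = 1.33
INPUT_PRICE_PER_TOKEN = 2.50 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 10.00 / 1_000_000


def estimate_cost(prompt_words: int, headline_words: int, output_words: int) -> float:
    """Return the estimated dollar cost of a single analysis."""
    input_tokens = (prompt_words + headline_words) * TOKENS_PER_WORD
    output_tokens = output_words * TOKENS_PER_WORD
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)


if __name__ == "__main__":
    # 1,000-word prompt + 20-word headline in, ~150-word analysis out.
    cost = estimate_cost(prompt_words=1000, headline_words=20, output_words=150)
    print(f"Estimated cost per analysis: ${cost:.5f}")  # ~$0.00539
```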

David’s approach to cost estimation would be different as his app requires a more diverse or less predictable output from the AI model. This shows how every AI application is different, and testing them might require a different approach.

There’s a part of testing them, though, that’s the same for all of them, and that’s rate limiting.

Testing for Rate Limiting

Building a free tool makes it vulnerable to abuse. It’s even worse for a product like a headline analyzer, which doesn’t require a login. The solution? Rate limiting.

By implementing a rate limiter, you stop users from using your product beyond a threshold that you set. The headline analyzer uses a rate limiter that caches the user’s IP address and applies a sliding-window algorithm to decide whether a user can send another request to the app.
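The exact implementation depends on your stack, but here’s a minimal in-memory sketch of an IP-based sliding-window limiter. The window size and request limit below are placeholder values, and a real deployment would typically back this with Redis or a similar shared cache instead of a process-local dictionary.

```python
import time
from collections import defaultdict, deque

# Placeholder limits: 5 requests per rolling 60-second window per IP.
WINDOW_SECONDS = 60
MAX_REQUESTS = 5

# Maps an IP address to the timestamps of its recent requests.
_requests: dict[str, deque] = defaultdict(deque)


def is_allowed(ip: str) -> bool:
    """Return True if this IP may make another request right now."""
    now = time.monotonic()
    window = _requests[ip]
    # Drop timestamps that have slid out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False
    window.append(now)
    return True
```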

The rate limiter needs to be tested to ensure that it works as intended. To test it, we could write a script that simulates user behaviour (a sketch of such a script follows the checklist below) or take the easier option of using Postman, sending requests until we hit the specified limit. If you want to include it in your automation process, take the first option. In my opinion, though, it’s not strictly necessary, as the rate limiter isn’t code that changes frequently.

When testing your rate limit functionality, it’s good practice to take note of the following to ensure its effectiveness.

  1. Test that your API returns appropriate responses such as an HTTP 429 status code (Too Many Requests).
  2. Verify in your tests that the responses include a "Retry-After" header.
  3. Test edge cases: send requests just below and just above the rate limit.
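Here’s a rough sketch of what such a test script could look like. The endpoint URL, request payload, and limit are placeholders; substitute your own values and expected status codes.

```python
import requests

# Hypothetical endpoint and limit -- replace with your own values.
URL = "https://example.com/api/analyze"
LIMIT = 5  # requests allowed per window


def test_rate_limit():
    # Requests up to the limit should succeed.
    for i in range(LIMIT):
        resp = requests.post(URL, json={"headline": "Test headline"})
        assert resp.status_code == 200, f"request {i + 1} was unexpectedly blocked"

    # The next request should be rejected with 429 and a Retry-After header.
    resp = requests.post(URL, json={"headline": "One request too many"})
    assert resp.status_code == 429
    assert "Retry-After" in resp.headers


if __name__ == "__main__":
    test_rate_limit()
    print("Rate limiter behaves as expected")
```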

Now we’re sure our app won’t be abused by malicious users. If David had tested his app against abuse, he probably wouldn’t have burned through $5000. The truth, though, is that even with such precautions, things can still go wrong.

Which takes us to our next wall of defense: fail-safes and monitoring.

Fail-Safe & Alerts

Your fail-safe is your “if all hell breaks loose, I won’t go broke” measure.

Let’s assume some users manage to get around the rate limiter we implemented. The fail-safe then rejects any further requests to the app outright and prevents our API wallet from being drained.

For our headline analyzer, we’ll set a maximum cost or token threshold that we’re willing to take on. The simplest way to set this is to add a monthly spending limit on the OpenAI dashboard. We can also listen for quota and rate-limit errors when calling the API, and trigger an alert to Slack or any other messaging platform to notify us about the app’s status.
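A rough sketch of that error handling with the OpenAI Python SDK might look like the following. The Slack webhook URL and the one-line system prompt are placeholders (the real prompt is around 1,000 words), and the SDK raises RateLimitError for both rate-limit and exhausted-quota responses.

```python
import requests
from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder webhook


def analyze_headline(headline: str) -> str:
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a headline analyzer."},
                {"role": "user", "content": headline},
            ],
        )
        return response.choices[0].message.content
    except RateLimitError as err:
        # Raised when we hit a rate limit or run out of quota: alert the team
        # and return a graceful message instead of burning more tokens.
        requests.post(SLACK_WEBHOOK_URL, json={
            "text": f"Headline analyzer hit an OpenAI limit: {err}"
        })
        return "The analyzer is temporarily unavailable. Please try again later."
```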

With that, our app is about as close to foolproof as we can get.

Test Your Apps, Even The Free Ones

It might be tempting to skip tests completely when building a free tool. But David’s experience shows us that while a broken feature on a free tool might not result in angry customers, it can still drain your funds.

So, whether you’re building a free or paid tool, always test your product!

 

MagicPod is a no-code AI-driven test automation platform for testing mobile and web applications designed to speed up release cycles. Unlike traditional "record & playback" tools, MagicPod uses an AI self-healing mechanism. This means your test scripts are automatically updated when the application's UI changes, significantly reducing maintenance overhead and helping teams focus on development.



Written by Jahdunsin Osho

Founder and Tech Lead at Edubaloo, Jahdunsin is passionate about providing affordable, quality education for students across Africa. Prior to this, he worked at several startups, building scalable backend systems, consumer blockchain applications, and core blockchain infrastructure. Impact-driven, Jahdunsin leverages his non-technical skills in SEO, copywriting, and paid advertising to ensure that the products he builds reach their target audience.