Application Performance Monitoring

As Imfiny starts to work on a prototype of its product, there was the opportunity to go through a few exercises around application design, infrastructure and other points. As with any prototype, we don't aim to set up all the tools ourselves. Just as we advise the teams we work with : we chose our battles.

As performance monitoring and error handling are so key to keeping a product going, we decided to go with AppSignal. Some of us have been using it on and off for years, depending on the projects, but it had been a while since we really looked at it. It has now become a great and complete APM, no longer limited to error handling. It really looks great : simple but not too simple, with plenty of insights into how the application is doing.

NewRelic

When we get to help teams, we sometimes find out that there is no performance monitoring tool in place and that the team barely knows how things are going in production. That's OK : when the product is young, the team is usually focused on getting users and keeping the thing afloat. That's understandable.

In such cases, so far, we have advised teams to go with a low cost account on NewRelic. It's a great tool : it quickly brings a lot of insights into how things are going, with the main metrics (error rate, response time and throughput) easily accessible. The issue we have with it is the price and the enormous amount of features : clearly, most teams pay too much for what they use.

So, while we think NewRelic is a great tool, we also think most teams don't need all those nifty features.

Keep an eye on the ball

The main goal, after all, is to get a team familiar with the concepts behind performance monitoring, and help them develop their own culture and rituals around those.

That's why we like AppSignal. It's a lot simpler to understand what's happening in the UI, where to go and what to do.

In our "Building up : Product and performance metrics" post in December 2020 we covered the gist of the reasons behind performance monitoring. We usually advise our clients to go through different phases, gradually moving from "it's still alive, probably" to "it's alive and well", as introducing a truckload of concepts in one go doesn't work well.

We first show how to monitor the application and what each of the key metrics means :

  • error rate : the number of errors happening over a given time window
  • response time : the time it takes for the application to respond to a request, usually measured server side only
  • throughput : the number of requests over the same time window, which gives context and scale to the previous two
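To make those three definitions concrete, here is a small sketch of computing them from an in-memory request log. The data and the 60 second window are hypothetical and not tied to any particular tool :

```ruby
# Each entry: [status_code, duration_in_seconds], collected over a 60s window.
requests = [
  [200, 0.120], [200, 0.340], [500, 0.050],
  [200, 0.210], [404, 0.030], [200, 0.800]
]

window_seconds = 60.0

error_count   = requests.count { |status, _| status >= 500 }
error_rate    = error_count / window_seconds              # errors per second
throughput    = requests.size / window_seconds            # requests per second
response_time = requests.sum { |_, duration| duration } / requests.size

puts format("error rate:    %.3f err/s", error_rate)
puts format("throughput:    %.3f req/s", throughput)
puts format("response time: %.3f s (mean)", response_time)
```

An APM does exactly this aggregation for you, continuously and per endpoint, but seeing the arithmetic once helps the three numbers click together.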

With those concepts introduced, we move on to giving them a scale related to web applications : what's usually considered acceptable for the industry, for the specific domain of the application and for the team. Those three scales might not be the same. It all depends on how the application is doing and how the team can affect those numbers at the time.

Taking "response time" as an example : the team might see requests taking 5 seconds to complete. 5s is a very long time, yet, while you might want to get down to 0.5s, it's probably easier to halve the response time first ; it's also less daunting. So in such cases we generally advise : "OK, let's try to get this time halved", and we get to work, introducing where to look and what to look for. With a custom dashboard in place, the impact of every deployed change is directly visible. Once that first objective is reached, another round can happen.

Once such habits are introduced and the team is watching those numbers, it can be time to introduce concepts such as Service Level Objectives. The previous steps basically introduced Service Level Indicators : by now the team understands what they mean, how to act on them and where they need to be for the product to be considered "healthy".
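To show what the jump from indicators to objectives looks like in numbers, here is a small illustration (the SLO target and traffic figure are made up for the example) of what a 99.9% availability objective translates to as an error budget :

```ruby
# A 99.9% availability SLO over a 30-day window: how many failed
# requests can we "afford" before the error budget is spent?
slo_target     = 0.999
window_days    = 30
total_requests = 1_000_000   # hypothetical monthly traffic

error_budget = total_requests * (1 - slo_target)  # allowed failed requests
puts "Allowed failed requests this month: #{error_budget.round}"

# The same budget expressed as downtime, assuming evenly spread traffic:
downtime_minutes = window_days * 24 * 60 * (1 - slo_target)
puts format("Equivalent full-outage budget: %.1f minutes", downtime_minutes)
```

Framing reliability as a budget like this is what makes SLOs actionable : the team can decide how to "spend" those 43 minutes, on risky deploys or on nothing at all.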

The Google SRE book has a great chapter on Service Level Objectives ; it's a good read for anyone involved at the engineering level in a product.

Error handling

Most performance monitoring tools out there are able to monitor errors in an application. That's how we discovered AppSignal originally, and it still excels at that. NewRelic can also work well here, but our reference for this is Sentry. Its licensing model allows the code to be used, seen, studied and even extended, as long as you are not planning on selling it as your own SaaS.

It can be self hosted and does wonders for drilling into errors. For us at Imfiny it's still the best out there. If you have the time and hands to set it up for your team, it'll be a good investment.
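For a Rails application, pointing the app at a Sentry instance (hosted or self hosted) is typically a small initializer using the sentry-ruby and sentry-rails gems. A minimal sketch, with the DSN and sample rate as placeholders to adapt :

```ruby
# config/initializers/sentry.rb — requires the sentry-ruby and sentry-rails gems.
Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]   # points at your (self hosted) Sentry instance
  config.environment = Rails.env

  # Optionally capture a fraction of transactions for performance tracing too:
  config.traces_sample_rate = 0.1
end
```

From there, unhandled exceptions are reported automatically ; no per-controller wiring is needed.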

Self hosting

This allows us to talk a little bit about self hosting performance monitoring tools. It's clear that AppSignal, NewRelic and others do a great job of giving a team the performance insights it needs. Yet the price can be high, and at some point the team might want to invest time and money into setting up and managing their solution of choice. After some time using commercial APM tools, it might also become clear to the team what they actually need from one.

In such a situation, at the moment, one great solution is to turn towards Prometheus with a Grafana setup. Prometheus has become the de facto standard here and makes for a great solution.
As for Ruby on Rails applications, there are ways to instrument them to get all the right metrics out. Projects like Yabeda are promising.
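As a sketch of what that instrumentation can look like, here is a hypothetical Yabeda initializer (the group and metric names are our own invention, assuming the yabeda, yabeda-rails and yabeda-prometheus gems) :

```ruby
# config/initializers/yabeda.rb — illustrative metric declarations only.
# Gemfile: gem "yabeda", gem "yabeda-rails", gem "yabeda-prometheus"
Yabeda.configure do
  group :checkout do
    counter :orders_total, comment: "Orders placed since boot"
    histogram :payment_duration,
              unit: :seconds,
              buckets: [0.1, 0.5, 1, 2.5, 5],
              comment: "Time spent talking to the payment gateway"
  end
end

# Later, in application code:
#   Yabeda.checkout.orders_total.increment({ payment_method: "card" })
```

With yabeda-prometheus in place the collected metrics are exposed on a /metrics endpoint that Prometheus scrapes, and Grafana dashboards are built on top of that.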

In summary : when the time comes, teams can set up Prometheus, Grafana and Sentry to get a complete, self hosted performance and error monitoring stack.

Going self hosted has advantages and drawbacks ; like everything, it's no silver bullet. But if the team feels the need for it has come, that usually means the team is big enough to overcome the drawbacks one way or another.

Wrapping up

So, from now on AppSignal will have our preference when it comes to introducing teams to performance monitoring tools. It clearly has all most teams need, from the early steps to a long way into the life of a product. It also doubles as a great error tracking tool while not being super expensive. And there are very few features most teams won't get to use, so less money is wasted.

Great work AppSignal, thanks for the help.

Need help ?

We specialise in helping small and medium teams transform the way they build, manage and maintain their Internet based services.

With more than 10 years of experience running Ruby based web and network applications, and 6 years running products serving from 2,000 to 100,000 users daily, we bring skills and insights to your teams.

Whether you have a small team looking for insights to quickly get into the right gear to support massive usage growth, or a medium sized one trying to tackle growing pains between software engineers and infrastructure : we can help.

We are based in France, EU and especially happy to respond to customers from Denmark, Estonia, Finland, France, Italy, Netherlands, Norway, Spain, and Sweden.

We can provide training, general consulting on infrastructure and design, and software engineering remotely or in house depending on location and length of contract.

Contact us to talk about what we can do : sales@imfiny.com.