Kumina | Blog

What you need to know about monitoring containers

In our previous blog posts we talked about the benefits of containers, and tried to answer the question “What’s the right choice for my company: containers or virtual machines?” by describing several business examples. When adopting a new technology, you will always face some challenges. One of the important questions in this regard is how to gain insight into the behaviour of your applications and container environment. To answer this question, this week we’ll talk about monitoring containers.

The dynamic nature of the cloud

One aspect of cloud computing is that systems, and the jobs that run on them, can easily be started, stopped and migrated. This is in contrast with traditional computing, where one uses a fixed set of systems, often running applications in a fixed layout. At Kumina, we’ve come to realise that in order to use such a dynamic system effectively, you also need a monitoring system that accommodates this kind of use. In our experience, such a system should meet at least the following criteria:

These aspects have been the main reasons for us as a company to migrate from our Nagios- and Munin-based monitoring stack to Prometheus. Combined with Grafana, we are able to provide functional dashboards that offer good insight into our production setup.

Whitebox and blackbox monitoring

When you’re using a cluster manager like Kubernetes, applications that consist of separate components are more likely to be spread out across multiple systems. In such a distributed model, application performance also depends on the network, in terms of reliability, latency and throughput.

With most traditional monitoring systems, there is a strong emphasis on blackbox monitoring: testing the system by considering only its externally observable behaviour. A good example of such a test is an HTTP probe, which checks whether the response from a given URL matches an expected output. In the case of distributed applications, such probes are sufficient for alerting, but don’t provide enough insight into the root causes of complex problems.
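To make the idea of an HTTP probe concrete, here is a minimal sketch in Python that uses only the standard library. The names ProbeResult, evaluate and probe are made up for this illustration, and the rule "status 200 and the body contains an expected text" is an assumption about what a probe might check, not a description of any particular monitoring tool.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    ok: bool
    detail: str


def evaluate(status: int, body: str, expected: str) -> ProbeResult:
    """Judge a response purely by what an outside observer can see."""
    if status != 200:
        return ProbeResult(False, f"unexpected status {status}")
    if expected not in body:
        return ProbeResult(False, "expected text not found in body")
    return ProbeResult(True, "ok")


def probe(url: str, expected: str, timeout: float = 5.0) -> ProbeResult:
    """Fetch url and evaluate the response; network failures fail the probe."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate(response.status, body, expected)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return ProbeResult(False, f"unexpected status {exc.code}")
    except OSError as exc:
        # Covers URLError, timeouts and refused connections.
        return ProbeResult(False, f"request failed: {exc}")
```

Calling probe("http://localhost:8080/healthz", "OK") (a hypothetical address) would report whether the service looks healthy from the outside. When it fails, the result says little about why, which is exactly the limitation described above.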

This is why it’s important to also focus on whitebox monitoring: extending your application so that it reports statistics relevant to its own operation. The Prometheus project offers libraries that you can link into your application, allowing you to easily expose such statistics as counters, gauges and histograms. These metrics are then exported via HTTP, so that Prometheus can scrape them.
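The sketch below illustrates roughly what such a library does for you, using only the Python standard library. It is not the Prometheus client library itself: the classes Counter and Gauge, the render function and the metric names are all made up for this example, and histograms are left out to keep it short. A real application would normally use the official client library for its language instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Metric:
    kind = "untyped"

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def value(self) -> float:
        with self._lock:
            return self._value


class Counter(Metric):
    """A value that only ever goes up, such as requests handled."""
    kind = "counter"

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters cannot decrease")
        with self._lock:
            self._value += amount


class Gauge(Metric):
    """A value that can go up and down, such as jobs currently queued."""
    kind = "gauge"

    def set(self, value: float) -> None:
        with self._lock:
            self._value = float(value)


def render(metrics) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for metric in metrics:
        lines.append(f"# HELP {metric.name} {metric.help_text}")
        lines.append(f"# TYPE {metric.name} {metric.kind}")
        lines.append(f"{metric.name} {metric.value()}")
    return "".join(line + "\n" for line in lines)


# Hypothetical application metrics.
REQUESTS = Counter("myapp_requests_total", "Requests handled.")
QUEUE_LENGTH = Gauge("myapp_queue_length", "Jobs waiting in the queue.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render([REQUESTS, QUEUE_LENGTH]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

The application calls REQUESTS.inc() and QUEUE_LENGTH.set(...) wherever the corresponding events happen, and runs serve() in a background thread. Prometheus is then configured to scrape the /metrics path on that port.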

Which metrics should processes expose?

By far the most interesting thing to measure is a task’s communication with its surroundings, both in terms of the requests it receives and the ones it generates. This applies to RPCs sent over the network, but also to the task’s interaction with the local system, such as disk I/O. For each of these channels, it makes sense to at least measure these five aspects:

Having at least these metrics available will make it a lot easier to identify the root cause of problems in systems that are largely distributed. Be sure to read the chapter “Monitoring distributed systems” of Google’s SRE book, as it offers good insight into their experience of monitoring their production setup.
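As an illustration of instrumenting a single channel, the following Python sketch wraps each request in a context manager. Which quantities to record is an assumption made for this example: it tracks a request count, an error count and a latency histogram. The class name ChannelStats, the metric names and the bucket boundaries are all hypothetical.

```python
import time
from bisect import bisect_left
from contextlib import contextmanager


class ChannelStats:
    """Request count, error count and latency histogram for one channel."""

    def __init__(self, name: str, buckets=(0.01, 0.1, 1.0)) -> None:
        self.name = name
        self.buckets = tuple(sorted(buckets))
        self.requests = 0
        self.errors = 0
        self.latency_sum = 0.0
        # One slot per bucket, plus one for observations above every bound.
        self.bucket_counts = [0] * (len(self.buckets) + 1)

    def observe(self, seconds: float, failed: bool) -> None:
        self.requests += 1
        self.errors += int(failed)
        self.latency_sum += seconds
        # bisect_left finds the first bucket whose upper bound is >= seconds.
        self.bucket_counts[bisect_left(self.buckets, seconds)] += 1

    @contextmanager
    def track(self):
        """Time the enclosed block and record whether it raised."""
        start = time.monotonic()
        failed = False
        try:
            yield
        except Exception:
            failed = True
            raise
        finally:
            self.observe(time.monotonic() - start, failed)

    def render(self) -> str:
        """Render in the Prometheus text format with cumulative buckets."""
        prefix = f"{self.name}_latency_seconds"
        lines = [
            f"{self.name}_requests_total {self.requests}",
            f"{self.name}_errors_total {self.errors}",
        ]
        cumulative = 0
        for bound, count in zip(self.buckets, self.bucket_counts):
            cumulative += count
            lines.append(f'{prefix}_bucket{{le="{bound}"}} {cumulative}')
        lines.append(f'{prefix}_bucket{{le="+Inf"}} {self.requests}')
        lines.append(f"{prefix}_sum {self.latency_sum}")
        lines.append(f"{prefix}_count {self.requests}")
        return "".join(line + "\n" for line in lines)
```

A task would create one ChannelStats per channel, for example one for database queries and one for disk writes, and wrap every call in "with stats.track():". Comparing the numbers for the requests a task receives with those for the requests it sends helps to narrow down where a slowdown originates.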

Metrics shouldn’t be an afterthought

We’ve observed that most open-source applications don’t export useful metrics by default. Even when they do, the metrics are typically only added at a later stage of the application’s development lifecycle, after users have submitted feature requests for them.

As we at Kumina believe that metrics are not only of interest to administrators, but also to the software’s developers, we advise that metrics shouldn’t be seen as an afterthought. They should be added during the early stages of the software development process, just like unit tests. Is there a Production Readiness Review (PRR) process in place at your organisation for determining whether software can be taken into production? If so, consider making the availability of useful metrics a requirement for completion.

While setting up Prometheus at Kumina, we’ve had to develop several utilities for converting metrics provided by existing pieces of software to Prometheus’ format (so-called metrics exporters). Since we at Kumina want to give back to the open-source community, we’ve published most of these on our company’s GitHub page.
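To give an impression of what a metrics exporter does, the sketch below converts the status output of an imaginary piece of software into Prometheus’ text format. It does not show any of our published exporters: the "key: value" input format, the function names and the "myapp" prefix are all assumptions made for this example.

```python
import re


def to_metric_name(prefix: str, key: str) -> str:
    """Build a name from characters Prometheus allows in metric names."""
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", key.strip().lower())
    return f"{prefix}_{cleaned}"


def convert(status_text: str, prefix: str) -> str:
    """Turn 'key: value' lines into the Prometheus text format."""
    lines = []
    for raw in status_text.splitlines():
        key, separator, value = raw.partition(":")
        if not separator:
            continue  # not a key/value line
        try:
            number = float(value)
        except ValueError:
            continue  # skip non-numeric fields such as version strings
        name = to_metric_name(prefix, key)
        # The input does not say whether a value is a counter or a gauge,
        # so this sketch marks everything as untyped.
        lines.append(f"# TYPE {name} untyped")
        lines.append(f"{name} {number}")
    return "".join(line + "\n" for line in lines)
```

For example, convert("Active connections: 12\nversion: 1.4.2", "myapp") yields a metric named myapp_active_connections and skips the version line. A real exporter would also fetch the status output periodically or on every scrape, serve the result over HTTP, and assign proper metric types.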

Kumina creates and manages Docker- and Kubernetes-based container platforms, complemented by a wide range of professional services and unlimited support. Don’t hesitate to contact us when you are considering the move to a container-based platform. We would love to help you get started by offering you an hour of free consultancy.
