Posts Tagged ‘monitoring’

A Prometheus exporter for Dovecot

Monday, January 16th, 2017

To start this year’s series of contribution to the open source community, we’re proud to announce the release of yet another tool that we use to monitor our production setup, namely a Prometheus metrics exporter for the Dovecot POP/IMAP mail server.

One of the key features of Prometheus is that it is very well suited for white box monitoring, i.e. having graphs and alerts based on internal state of the program, as opposed to testing just the externally visible behaviour of the system. For Dovecot we’re very interested in using white box monitoring to be able to graph traffic and resource usage per customer, domain and user.

It turns out that we’re in luck, as Dovecot 2.1 and later ship with a statistics module that provides access to this kind information. When enabled, Dovecot binds to an additional UNIX or TCP socket on which metrics are exported. The Dovecot exporter that we’ve published on GitHub is basically a light-weight proxy that converts the metrics from Dovecot’s format into Prometheus metrics, exporting them over HTTP.

Below is a screenshot of our Dovecot exporter in action. The graph shows the rate of IMAP commands sent to our mail server, broken down by IMAP username.

If your email setup is also based on Dovecot and use Prometheus for monitoring, we’d like to invite you to give this exporter a try as well. Feel free to file issues or send pull requests on GitHub.

Birdwatcher: Accessing Calico/BIRD metrics through Prometheus

Friday, October 28th, 2016

At Kumina we maintain a Kubernetes setup running on Amazon EC2. For the low-level networking between containers, we make use of Calico. Calico configures all of our EC2 systems to form a mesh network. The systems in this mesh network all run an instance of the BIRD Internet Routing Daemon.

One of the problems we ran into with Calico is that it’s sometimes hard to get a holistic view of the state of the system. Calico ships with a utility called calicoctl that can be used to print the state of a single node in the mesh, but using this utility can easily become laborious as the number of EC2 instances increases.

Given that we already make strong use of Prometheus for our monitoring, we’ve solved this by writing a tool called Birdwatcher that exports the metrics generated by BIRD in Prometheus’ format. This allows us to put alerts in place for when an excessive number of changes to routes occur, or when routes simply fail to work for a prolonged period of time.

Today we’re happy to announce that Birdwatcher is now available on our company’s GitHub page. If you’re a user of both Calico and Prometheus, be sure to give it a try. Enjoy!

 

screen-shot-birdwatcher

Automated monitoring? Easy!

Tuesday, February 11th, 2014

One of the things we take very serious here at Kumina is our monitoring. We’ve always done so, but even we must admit that during the starting years, we sometimes forgot to include all possible checks for a new service or host. And it sucks when you forget to setup the monitoring for a specific item, because you generally only find out about it when it’s actually down already…

We like to check as much as possible (if not everything). For example, we check if a service is up and running, we check if a vhost is returning the expected response, if an SSL certificate is still valid or if it will expire within 30 days, we check if OpenVPN certificates are close to expiration and if all loaded Apache modules actually come from a Debian package. And we check often, generally every 30 seconds, but we would prefer to do it even more often. However, these are not things you want to configure manually over and over again.

Automate everything

We’re using Icinga in two datacenters in failover mode, the second node takes over if the primary is unreachable. We currently monitor 319 hosts (including some failover virtual hosts) and a grand total of nearly 10000 checks. Although this fluctuates daily, since most changes on a server also adds or removes checks. It is all done automatically. This prevents us from forgetting to setup monitoring for a specific item or host and also allows us to quickly deploy new checks on the entire infrastructure. Consider the Fokirtor check we created last year, it’s very easy for us to simply deploy it on all those machines.

Using the tools at hand

We’re currently pretty heavy Puppet users, so we leverage the infrastructure we already have in place for that.

Since a puppet agent runs on our monitoring hosts every few hours, it’ll deploy new configuration a few times per day. It’s not exactly continuous delivery, but close enough for our needs for now. Equally important, it removes checks we no longer need. For instance, if we’ve create a redirect that was changed into a full-fledged site, the check is automatically changed to no longer expect a 301 response but a 200 with a correct string (that we provided, of course, it’s not that automated).

We started out using the power of puppet’s exported resources but over time as our config grew, it started to take way too long for Puppet to deploy new configuration on the monitoring hosts. We now deploy the configuration for both Icinga instances using a script that reads the stored config from the Puppet database.

Other uses

As you might imagine, we also do this for trending with Munin. We automatically deploy the Munin plugins on the clients when we deploy a new service and we automatically deploy the host configuration on the Munin server. As well as required firewall rules on the client side.

LOADays 2013 and Kumina

Monday, April 8th, 2013

Yesterday evening we returned from Loadays 2013 in Antwerp after two days of talks, meeting interesting people and Belgian beers.┬áBeing there allowed us to see how our peers do their administrating, what new technologies are on the rise and some general presentations on related subjects (like this awesome vim overview). In this blogpost I’ll go through some things we learned and want to implement in the (near) future.

(more…)

Checking for rogue Apache modules

Wednesday, April 3rd, 2013

We’ve read a lot recently about attacks in which an attacker loads a modified module into Apache to insert iframes in outgoing data. Pretty scary, especially since nobody really seems to know how the hacks are performed. Recently, Sucuri wrote a blog article about how to check for rogue Apache modules on Debian. We’ve decided to implement this into an Icinga/Nagios check.

You can find the source for the plugin here. We also publish all our plugins via the ‘nagios-plugins-kumina’ package, provided by our apt repository.

Hope this helps!

Update: I packaged and pushed the wrong version of the script… Silly me. Fixed now!