Posts Tagged ‘hetzner’

Proxying Neighbor Discovery messages: ndproxy

Friday, June 10th, 2011

On our systems at Hetzner we only have a single /64 IPv6 range, which we use to assign addresses to virtual systems, running in Xen and KVM. We also wish to perform layer 3 and 4 firewalling and traffic accounting on the host system, which means we don’t directly bridge the virtual machines to the external interface, but bridge them to a dummy interface on the host system. This implies that Neighbor Discovery messages that are generated on the internal bridge interface are not propagated to the outside network interface. We currently solve this by manually adding proxy rules, using the ip -6 neigh add proxy ... dev ... command.

The disadvantage of this approach is that you cannot add proxy rules for entire ranges of addresses. Proxying entire ranges would not be a good approach anyway, because it could pollute upstream routers with spurious entries. Still, the per-address limitation is a problem for us, because we want to be able to simply assign new addresses to virtual machines without manual reconfiguration on the host system. We have therefore written a small script called ndproxy, which scans the output of ip -6 neigh show dev ... and replicates proxy entries on the outer interface.
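
The core idea can be illustrated with a short Python sketch. This is not the published ndproxy code: the function name proxy_commands, the set of neighbor states treated as usable, and the interface name in the usage note are assumptions made for illustration. The sketch takes the text printed by ip -6 neigh show dev ... for the inner bridge and builds one ip -6 neigh add proxy command per non-link-local address it finds.

```python
import ipaddress

# Assumption: neighbor states that suggest the address is actually in use.
USABLE_STATES = {"REACHABLE", "STALE", "DELAY", "PROBE", "PERMANENT"}


def proxy_commands(neigh_output, outer_dev):
    """Turn `ip -6 neigh show dev <inner>` output into proxy commands."""
    commands = []
    for line in neigh_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] not in USABLE_STATES:
            continue  # skip blank lines and FAILED/INCOMPLETE entries
        try:
            addr = ipaddress.IPv6Address(fields[0])
        except ValueError:
            continue  # first field is not an IPv6 address
        if addr.is_link_local:
            continue  # fe80::/10 is never routed, so it needs no proxy entry
        commands.append(
            ["ip", "-6", "neigh", "add", "proxy", str(addr), "dev", outer_dev]
        )
    return commands
```

Each returned list could be handed to subprocess.run, for example with outer_dev set to eth0. Entries that already exist on the outer interface may be rejected as duplicates, so a real script would have to skip or tolerate those.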

The code is published in the ndproxy repository on GitHub. Enjoy!

Failover without Hetzner addresses: the HostHenker

Friday, March 18th, 2011

Even though we have already presented quite a few scripts over the last couple of weeks to perform automatic failover at Hetzner using Heartbeat, we also want to provide failover for setups where we don’t use any failover addresses. For these types of setups, we have written a shell script, called HostHenker.

HostHenker is a small shell script, which uses nc(1) to determine whether a daemon is listening on a given host. Based on whether a daemon is running, it adds an entry to /etc/hosts for this host or a fallback host. Invocation of this utility is simple:

hosthenker mysql.example.com 3306 10.0.0.7 10.0.0.8

In this setup, the script tries to connect to 10.0.0.7:3306 and writes an entry for mysql.example.com to /etc/hosts, pointing at 10.0.0.7 if the connection succeeds and at the fallback 10.0.0.8 otherwise. The actual command invoked by the script is nc -zw 15. The -z flag enables zero-I/O mode, which closes the connection as soon as it has been established, while -w 15 sets a 15-second timeout, which is often good enough.
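
The probe-and-choose step can be sketched in Python as follows. HostHenker itself is a shell script built on nc, so this is only an illustration of the same logic: the function names port_open and choose_address are hypothetical, and the sketch assumes the fallback address is used without being probed.

```python
import socket


def port_open(host, port, timeout=15.0):
    """Rough equivalent of `nc -zw 15 host port`: connect, then close at once."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def choose_address(primary, fallback, port, probe=port_open):
    """Return the primary address if its daemon answers, else the fallback."""
    return primary if probe(primary, port) else fallback
```

With the invocation shown above, choose_address("10.0.0.7", "10.0.0.8", 3306) would yield the address to write next to mysql.example.com. The probe parameter exists only so the selection can be exercised without a network.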

Right before installing the new /etc/hosts file, HostHenker checks whether the new file is different from the original one. If it is, HostHenker writes a message to stderr. This is useful when the script runs as a cron job, since cron normally mails any output to the job’s owner, so you are notified whenever the entry changes.
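
The compare-before-install step might look roughly like this in Python. Again, this is a sketch and not the HostHenker source: the function names render_hosts and install_if_changed and the wording of the stderr message are made up for illustration, and the sketch drops any existing line that mentions the host name before appending the new entry.

```python
import sys


def render_hosts(current, name, address):
    """Return hosts-file text with exactly one entry for `name`."""
    kept = []
    for line in current.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore trailing comments
        if name in fields[1:]:
            continue  # drop any existing line that names this host
        kept.append(line)
    kept.append(f"{address}\t{name}")
    return "\n".join(kept) + "\n"


def install_if_changed(path, new_text):
    """Write `new_text` to `path` only if it differs; report changes on stderr."""
    with open(path) as handle:
        old_text = handle.read()
    if new_text == old_text:
        return False  # nothing to do, stay silent so cron sends no mail
    with open(path, "w") as handle:
        handle.write(new_text)
    print(f"{path} changed", file=sys.stderr)
    return True
```

A production version would more likely write to a temporary file and rename it into place, so that readers of the file never see a half-written version.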

Hetzner Failover IP OCF script part III: When HTTP attacks

Wednesday, March 16th, 2011

Our OCF script for failovers at Hetzner worked flawlessly for the past month. Last week, however, a problem arose that we did not anticipate. The webservice returns an HTTP status code (as is expected from a webserver), and we had not accounted for any HTTP error codes.

An HTTP response in the 4XX or 5XX range would kill the Python interpreter with a traceback from urllib2 and an exit code of 1. The OCF script translated that exit code into $OCF_NOT_RUNNING, which caused a failover to occur. This wouldn’t be a problem in a normal operating environment.

Unfortunately, we noticed that the Hetzner failover webservice isn’t totally stable. When it returns errors, it does so for both hosts in the failover setup, which then both try to fail over and cause havoc. Fortunately, OCF has an error code that means a soft failure ($OCF_ERR_GENERIC). We can use this code to tell Heartbeat that a temporary failure has occurred and that it should not fail over.

The parse-hetzner-json.py script now has a try-except construction for the HTTP requests and has 3 exit codes:

  • 0: Everything OK, I have the failover-IP
  • 1: Unknown Error, can’t get status of the failover-IP
  • 2: Everything OK, I do not have the failover-IP
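
The mapping from request outcome to exit code can be sketched as follows. This is not the actual parse-hetzner-json.py: it uses Python 3 (where urllib.error replaces the urllib2 exceptions), the fetch callable is a hypothetical stand-in for the HTTP request, and the JSON field names failover and active_server_ip are assumptions about the reply format.

```python
import json
import urllib.error

EXIT_HAVE_IP, EXIT_UNKNOWN, EXIT_NOT_MINE = 0, 1, 2


def status_exit_code(fetch, my_ip):
    """Map the failover status to one of the three exit codes.

    `fetch` is a hypothetical callable that returns the raw JSON reply
    and may raise urllib.error.URLError (HTTPError is a subclass).
    """
    try:
        reply = json.loads(fetch())
        active = reply["failover"]["active_server_ip"]  # assumed field names
    except (urllib.error.URLError, ValueError, KeyError, TypeError):
        return EXIT_UNKNOWN  # HTTP 4xx/5xx, network error or unusable JSON
    return EXIT_HAVE_IP if active == my_ip else EXIT_NOT_MINE
```

A script built on this would end with sys.exit(status_exit_code(...)). Because an uncaught exception also makes the interpreter exit with code 1, unexpected crashes land in the same "unknown" bucket as handled errors.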

The exit codes are then processed by the hetzner-failover-ip OCF script as follows:

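# Ask the helper script whether this node holds the failover IP
${OCF_RESKEY_script} -g -i ${OCF_RESKEY_ip}
case $? in
    0)  # this node holds the failover IP
        return $OCF_SUCCESS ;;
    2)  # this node does not hold the failover IP
        return $OCF_NOT_RUNNING ;;
    *)  # unknown state: soft-fail instead of failing over
        sleep 30 # Do not DOS Hetzner
        return $OCF_ERR_GENERIC ;;
esac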

The sleep 30 is required because sending too many requests to the Hetzner failover webservice (which happens when $OCF_ERR_GENERIC is returned) will get you banned for a couple of minutes with an HTTP 403 status.

Another advantage of the new exit codes (and the way they are processed) is that when the Python interpreter itself fails (exit code 1), $OCF_ERR_GENERIC is returned and no failover happens.

All of the above amounts to this: when the webservice is unreachable, the JSON is unparsable, or something else happens that isn’t meant to happen, Heartbeat will soft-fail and not fail over.

Hetzner Failover IP OCF script part deux: local DNS resolving

Wednesday, February 23rd, 2011

Two weeks ago we published a script that allows one to update the failover address provided by Hetzner using an OCF script. This makes it possible to provide redundant services between two systems within the Hetzner network. Even though this script by itself seems to function properly, it does have one shortcoming.

Consider a setup where both systems provide a set of services that use the same data store (e.g. a MySQL database). Even though these database services are replicated, queries must always be processed by the master node. Naively, one could solve this by letting all these services use the failover address provided by Hetzner. This will not work, however: even though traffic from the outside is always routed to exactly one of the two systems, both systems have the address defined locally, so a connection to it never leaves the local system. The only way to make connections between the two systems is to use the per-system (non-failover) addresses.

Hetzner Failover IP OCF script

Friday, February 11th, 2011

At Hetzner you can get very cheap servers. If your application stack can handle failovers and the like, it’s a cheap way to build a fairly large setup. One thing that’s a bit different from most other colocators I know is their network setup. They actually route all traffic via managed switches to your machine, so every machine is in its own network. That can be a problem if you want to do cool stuff like moving an IP address on the fly.

Luckily, they provide “Failover IP” addresses, which you can allocate to one server and switch to another, but only via a web interface. The web interface also has an API, which makes things a bit easier. For one of our customers, we wrote an OCF script that performs the failover, so we can use Heartbeat and Pacemaker there.