Hetzner Failover IP OCF script

At Hetzner you can get very cheap servers. If your application stack can handle failovers and the like, it’s a cheap place to build a fairly large setup. One thing that’s a bit different from most other colocation providers I know is their network setup. They route all traffic via managed switches to your machine, so every machine sits in its own network. That can be a problem if you want to do cool stuff like moving an IP address on the fly.

Luckily, they provide “Failover IP” addresses, which you can allocate to one server and later switch to another. Out of the box this is done through a web interface, but that interface also has an API, which makes things a bit easier. For one of our customers, we wrote an OCF script that can perform the failover, so we can use Heartbeat and Pacemaker over there.

Because Pacemaker expects all resource parameters to be the same on both machines, host-specific values have to come from somewhere else, so we use several data sources. We’ve set it up as follows:

An OCF script that calls a Python script for assigning the failover IP
The aforementioned Python script, which reads some variables from a local file (defaults to /etc/hetzner.cfg) and which actually talks to the API to switch the IP address or check if the IP address is currently assigned to this host
A local config file which is read by the Python script and contains the Hetzner API credentials and the local machine IP address.
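
The “check if the IP address is currently assigned to this host” step from the list above could look roughly like the sketch below. It is not the actual Kumina script. It assumes the API answer has already been decoded from JSON and that the currently routed host is reported in a field called active_server_ip inside a failover object; both names are assumptions here and should be checked against the Hetzner Robot documentation. The helper names are hypothetical. The two return codes are the standard OCF values for “running” and “not running”.

```python
# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def is_assigned_here(api_response, local_ip):
    """Return True if the failover IP currently routes to local_ip.

    ASSUMPTION: the decoded JSON has the shape
    {"failover": {"ip": ..., "active_server_ip": ...}}.
    """
    failover = api_response.get("failover", {})
    return failover.get("active_server_ip") == local_ip


def monitor_exit_code(api_response, local_ip):
    """Map the assignment check onto an OCF monitor result."""
    if is_assigned_here(api_response, local_ip):
        return OCF_SUCCESS
    return OCF_NOT_RUNNING


# Hand-written sample response, not real API output.
sample = {"failover": {"ip": "1.1.1.1", "active_server_ip": "1.2.3.4"}}
print(monitor_exit_code(sample, "1.2.3.4"))  # this host holds the IP
print(monitor_exit_code(sample, "2.3.4.5"))  # another host holds the IP
```

Keeping the decision in a pure function like this makes it easy to test without touching the API, which matters when every request is slow.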

The local IP address in the configuration file is needed because we run all important stuff in VMs, and the API expects the IP address of the physical host (the iron) to which you want the failover IP to point. From inside a VM you usually cannot find out that address, which is why we simply set it in the configuration file. The Python script is fairly simple. You can run it with -h to see the possible commands you can give it. The config file probably requires some explanation:

[dummy]
user = #12345+RaNdM
pass = sEcReT
local_ip = 1.2.3.4

The user and pass can be generated from the Hetzner Robot interface. When you have selected the server to which the failover IP is assigned, select the Admin option and request new credentials. These are specific to that machine and all resources assigned to that machine. This is a safety measure. The local IP is the primary IP address of the local machine. So if you want to be able to switch the failover IP address to the machine with the local IP address of 2.3.4.5, that machine will have local_ip = 2.3.4.5 in its /etc/hetzner.cfg file. Are you still following this? Good!
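
As an illustration of how such a file can be read, here is a minimal sketch using Python’s standard configparser module. It is not the original script, which may parse the file differently. The function name load_hetzner_config is made up for this example, and the sample text is the dummy config shown above. Interpolation is switched off so that a % in a password is taken literally.

```python
import configparser

# The dummy config from this post, inlined so the example is self-contained.
SAMPLE_CFG = """\
[dummy]
user = #12345+RaNdM
pass = sEcReT
local_ip = 1.2.3.4
"""


def load_hetzner_config(text, section="dummy"):
    """Parse the config text and return credentials and the local IP.

    Hypothetical helper; the section name follows the sample file.
    """
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    values = parser[section]
    return {
        "user": values["user"],
        "password": values["pass"],
        "local_ip": values["local_ip"],
    }


settings = load_hetzner_config(SAMPLE_CFG)
print(settings["local_ip"])
```

To read the real file instead of the inlined sample, parser.read("/etc/hetzner.cfg") can replace the read_string call. Note that the leading # in the user name is kept, because configparser only treats # as a comment marker at the start of a line.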

Now, using the OCF script is simple. Add it as /usr/lib/ocf/resource.d/kumina/hetzner-failover-ip and set up your CRM configuration as follows:

primitive IP_mysql ocf:kumina:hetzner-failover-ip \
	op start interval="0" timeout="300s" \
	op monitor interval="60s" timeout="300s" \
	params ip="1.1.1.1" script="/usr/local/sbin/parse-hetzner-json.py"

The 1.1.1.1 should be replaced with your failover IP, of course. The Python script needs to be installed at the path given in the script parameter. If you want to use another configuration file, you can change that parameter to /usr/local/sbin/parse-hetzner-json.py -c /etc/myconfig.hetz or something that suits your fancy. The long timeout is needed because the Hetzner API is a slow beast. (On a related note, I think it’s possible to change the OCF script to use this as a default, but I couldn’t find it quickly.)
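
The monitor interval in the primitive above also has to fit within the request budget of the API. The comments below quote a global limit of 100 requests per 3600 seconds. The following back-of-the-envelope sketch estimates the shortest safe interval. It assumes each monitor run costs exactly one API request, which is an assumption about the script and not something measured, and the function name and the reserved_requests parameter are invented for this example.

```python
def min_monitor_interval(max_requests, window_seconds, monitors, reserved_requests=0):
    """Smallest whole-second monitor interval that stays within the budget.

    monitors: number of monitor operations polling the API in parallel.
    reserved_requests: requests kept free for actual failovers (assumption).
    """
    usable = max_requests - reserved_requests
    if usable <= 0:
        raise ValueError("no requests left for monitoring")
    total = window_seconds * monitors
    # Integer ceiling division, so the result never undershoots the budget.
    return (total + usable - 1) // usable


print(min_monitor_interval(100, 3600, monitors=2))
print(min_monitor_interval(100, 3600, monitors=2, reserved_requests=10))
print(min_monitor_interval(100, 3600, monitors=5))
```

With two monitors and ten requests held back for failovers, this lands on the figure of about 80 seconds mentioned in the comments. More failover addresses push the interval up quickly.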

Do let us know if you have questions or if this helped you!

The files:

Update: Add monitor statement to CRM configuration, to work with scenarios where failover addresses are modified manually.

Update 2: Kumina no longer uses the code above at this moment, therefore the code is no longer maintained by us.

Tags: failover, heartbeat, hetzner, ocf, pacemaker

This entry was posted on Friday, February 11th, 2011 at 15:47.

21 Responses to “Hetzner Failover IP OCF script”

araknos@gmail.com says:

December 1, 2016 at 21:20

Your work is very nice. To avoid checking the Hetzner failover IP with such a delay, I preferred an haproxy heartbeat over a local meshed VPN built with tincd. Since the failover IP is bound to the machine’s real IP, bridged to a VM with a tincd local VPN IP… I monitor failover “locally” in a more efficient way than you can with those delays, while still using your script.

I still have to switch variables once a failover transition occurs (and a new failoverip-realip association arises).

Doei! 😉
Pizza says:

February 15, 2013 at 11:18

Many thanks, it’s very helpful and saved me time.
Artyom says:

September 12, 2012 at 14:30

Thanks a lot!
pithagora says:

September 3, 2012 at 11:56

Hi Tim,
Is my Heartbeat config correct?

logfacility daemon
keepalive 2
deadtime 15
warntime 5
initdead 120
udpport 694
ucast eth0 IP_OF_THE_SECOND_SERVER
auto_failback on
node ha1
node ha2
use_logd yes
crm respawn
pithagora says:

August 13, 2012 at 17:44

I ran the OCF script from a shell and got:

/usr/lib/ocf/resource.d/company/hetzner-failover-ip start
/usr/lib/ocf/resource.d/company/hetzner-failover-ip: 165: -s: not found
hetzner-failover-ip[16916]: DEBUG: default start : 0

The only thing I changed in the script downloaded from your site is:

OCF_ROOT=/usr/lib/ocf

Do I have to make any other changes to it?
Tim Stoop says:

August 6, 2012 at 16:30

I have no idea what’s going wrong, then. I’d suggest trying to debug from within the OCF script, add a log file or something to catch the actual error message.
pithagora says:

August 6, 2012 at 16:27

I ran it manually like this:

/usr/local/sbin/parse-hetzner-json.py --ip=my_failover_IP -s -c /etc/hetzner.cfg

and it moves the IP to the local_ip specified in /etc/hetzner.cfg.

It should probably work like this too:
/usr/local/sbin/parse-hetzner-json.py --ip=1.2.3.4 -s
Tim Stoop says:

August 6, 2012 at 16:06

This is the interesting line:

WARN: unpack_rsc_op: Processing failed op IP_mysql_monitor_0 on ha2: unknown exec error (-2)

Try running the script by hand and see what error you’re getting?
pithagora says:

August 6, 2012 at 16:02

Hello Tim,
I’m getting the following output in /var/log/daemon.log on the second node of the cluster when I
run /etc/init.d/networking stop on the first:
http://pastebin.com/0Z1Gj7j0
Can you please help me understand what is wrong?
Thanks in advance.
Stefan says:

July 31, 2012 at 17:28

Thanks for these scripts. Why do you say failover is slow? It should not depend on the monitor interval of the failover IP, should it?
According to the Hetzner docs, monitoring is limited to 100 requests per hour. For a 2-node setup (clone set) that should allow a monitor interval of about 80 seconds.
Or am I missing something here?
- Tim Stoop says:
  
  August 1, 2012 at 11:40
  
  80 if no failovers occur, because those count towards your maximum. And 80 only if you use a single failover IP address. We currently use five addresses, and the limit is global. It adds up really quickly. Also, 80 seconds is fairly long; in other setups we generally have a monitor interval of 10 seconds for plain failover IP addresses.
Niki says:

April 10, 2012 at 08:15

Thanks a lot!
Tim Stoop says:

April 9, 2012 at 12:02

It’s an inconvenience, indeed. We’ve run into that problem as well. The solution is to increase the monitoring interval to something like 600 seconds. This makes failover rather slow, however. The guys at Hetzner don’t seem to understand the concept of a “failover” mechanism, I’m afraid.

We actually left them because of all these kinds of problems. If you’re interested in another hoster, you might want to check out our other project, https://www.twenty-five.nl. At least you get to talk to people there who know what they’re talking about 😉
Niki says:

April 9, 2012 at 08:11

Thanks for answer.

I deleted the resource and added it again, and now it works.
But now I have another problem.
It seems Hetzner limits the number of connections per hour to its API.
In /var/log/messages:

WARN: unpack_rsc_op: Processing failed op ClusterIP01_monitor_60000 on frontend02-nginx: unknown error (1)

and if I run the failover IP request manually:
curl -u login:password https://robot-ws.your-server.de/failover/1.1.1.1

I get

{"error":{"status":403,"code":"RATE_LIMIT_EXCEEDED","max_requests":100,"interval":3600,"message":"rate

but if I want to route the failover IP to another server, everything works well.

Is this critical for Pacemaker/Heartbeat?
Niki says:

April 6, 2012 at 11:50

I have configured Pacemaker as described here, but the resource does not work. It hangs in “Stopped” status:
IP_mysql (ocf::kumina:hetzner-failover-ip): Stopped

All scripts are executable.

My crm config:

node $id="d8e93a90-765c-4ba8-9ee3-d111adfacf9c" host02
node $id="e5c7ff03-b032-4052-964f-248c04aa031b" host01
primitive IP_mysql ocf:kumina:hetzner-failover-ip \
op start interval="0" timeout="300s" \
op monitor interval="60s" timeout="300s" \
params ip="1.1.1.1" script="/usr/local/sbin/parse-hetzner-json.py"
property $id="cib-bootstrap-options" \
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
cluster-infrastructure="Heartbeat" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
symmetric-cluster="false"

How can I investigate what’s wrong?
- Tim Stoop says:
  
  April 6, 2012 at 14:13
  
  Did you check the output from pacemaker? Does it even try to run the script? Does that script run when you try it manually?
Aliendog says:

February 27, 2012 at 18:30

Hi guys

Great tutorial, but I am having problems setting up Heartbeat. Could you please share what your ha.cf looks like?

Thanks,

A
RaSca says:

May 17, 2011 at 16:09

So the solution must be DNAT. I was supposing that, but this is an additional confirmation.

Thanks a lot and keep up the good work!
Pieter Lexis says:

May 17, 2011 at 13:44

Hi RaSca,

You can’t at Hetzner. You can only fail over to another physical machine there. We use iptables on that host to forward the packets to the VM.
RaSca says:

May 17, 2011 at 12:43

Great job guys!
Do you have any suggestions on how to associate this failover IP with an internal virtual machine? I mean, what if I want to set up a virtual gateway with a WAN and a LAN where the WAN is the failover IP?

Thanks a lot for this precious job!

RaSca
