Kumina into the cloud; creating Amazon EC2 images

At Kumina we have already gained lots of experience when it comes to deploying and administering Debian installations on virtualisation platforms such as KVM and Xen. In all our setups, we also perform administration of the Dom0, the operating system running the virtualisation software. Lately we have also been looking at cloud computing solutions, such as Amazon EC2. One of the advantages of cloud computing is that it’s easy to provide scalability: one can simply spawn new system instances on demand. Unfortunately, the lack of administrative access to Dom0 can make it harder to debug and recover instances.

In order to make use of Amazon EC2 to its full potential, it is important that we can quickly spawn Debian installations that are automatically configured using Puppet. We accomplish this by creating our own Kumina-branded Amazon Machine Image (AMI). Compared to the stock Amazon Linux and Ubuntu images, it uses a different approach. Instead of creating an image of a pre-installed Debian system, we have created a relatively small system (about 12 MB), which uses Debootstrap to store an up-to-date installation on the provided storage space. When finished, it stores a set of pre-generated SSL certificates for Puppet in the right place and reboots into this new Debian installation. From within this system, we run Puppet to install additional pieces of software and configure the system correctly.
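The flow described above can be sketched roughly as follows. This is only an illustration, not the actual Kumina scripts: the function name plan_bootstrap, the suite, the mirror URL, the certificate source directory and the Puppet SSL path (var/lib/puppet/ssl) are all assumptions made for the example. It only builds the list of commands, so it can be inspected without touching a disk.

```python
import shlex


def plan_bootstrap(suite, target, mirror, cert_source,
                   cert_dest="var/lib/puppet/ssl"):
    """Return the commands, in order, for a debootstrap-based install.

    All arguments are hypothetical example values; cert_dest is an
    assumed location for Puppet's SSL directory inside the new system.
    """
    puppet_ssl = f"{target.rstrip('/')}/{cert_dest}"
    return [
        # Install a fresh base system onto the mounted target.
        ["debootstrap", suite, target, mirror],
        # Put the pre-generated Puppet certificates in place.
        ["mkdir", "-p", puppet_ssl],
        ["cp", "-a", f"{cert_source}/.", puppet_ssl],
        # Boot into the newly installed system.
        ["reboot"],
    ]


if __name__ == "__main__":
    plan = plan_bootstrap("squeeze", "/mnt/target",
                          "http://deb.debian.org/debian", "/tmp/certs")
    for command in plan:
        print(shlex.join(command))
```

Keeping the plan as data rather than executing it directly makes it easy to log or review what an instance is about to do before it reboots.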

Even though we did our very best to make this AMI a piece of art, there are some loose ends. When booting, the disk is already divided into two separate partitions (one for the AMI, one for all the remaining space), which cannot be repartitioned later on. This is why the installation image is exactly 128 MB and recycled as the /boot partition of the resulting system. This does however have the advantage that it’s now possible to use LVM to partition the remaining disk space, since we’re not booting off LVM.

Also, we noticed that both the Xen kernel and the Busybox package supplied by Debian are highly unsuitable for use in this setup. The kernel depends on a large number of modules to actually boot properly; even the Xen blockfront and netfront drivers and the ext file system driver are built as modules. This is why we use the Ubuntu kernel during bootstrap. Unfortunately, the latest release in Maverick has issues rebooting on an SMP system, which is why we have to resort to a pre-release version (2.6.35-28.50). The Busybox package lacks a large number of tools, which is why we simply build a custom version on the host system. Fortunately, nothing from the install image ends up in the eventually installed system.

Amazon allows one to pass up to 16 KB of instance-specific data, which can be downloaded from a predetermined address after startup. We use this 16 KB to store an installation script, which includes the pre-generated certificates for Puppet, disk partitioning settings and a hostname. Right now this data takes a mere 3 KB, meaning there is enough space left for future use.
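As a minimal sketch of how this can look from the instance's side: EC2 serves user data at http://169.254.169.254/latest/user-data. The helper names below are made up for illustration, and the plain GET request is an assumption; instances configured to require session tokens for the metadata service need an extra token header, which is not shown here.

```python
import urllib.request

# Link-local address at which EC2 exposes the instance's user data.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"
# The 16 KB limit mentioned above.
USER_DATA_LIMIT = 16 * 1024


def check_user_data_size(payload: bytes, limit: int = USER_DATA_LIMIT) -> int:
    """Return the number of bytes still free, or raise if over the limit."""
    if len(payload) > limit:
        raise ValueError(
            f"user data is {len(payload)} bytes, limit is {limit} bytes"
        )
    return limit - len(payload)


def fetch_user_data(url: str = USER_DATA_URL, timeout: float = 5.0) -> bytes:
    """Download the user data; only works from inside an EC2 instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Running check_user_data_size on the installation script before launching an instance catches an oversized payload early, instead of having the launch request rejected.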

Even though the scripts to create new instances are close to finished, there is still a lot of work left to improve integration. For example, instances are allocated IP addresses at random, so we still need to write scripts to create DNS entries for these systems on demand. Expect to see more blog posts on EC2 in the near future!


*Image source: https://unsplash.com/photos/xekxE_VR0Ec


5 Responses to “Kumina into the cloud; creating Amazon EC2 images”

  1. Eric Kow says:

    Makes sense. Meanwhile, we’ll be working on a GSoC project to improve the interoperability situation, making it easier for Git users to contribute to Darcs projects and vice versa. Still, some users may prefer to standardise anyway. Good luck!

  2. Eric Kow says:

    Hi Ed,

    Sorry to hear you’re leaving Darcs.
    Hope the transition goes well (darcs-fastconvert may be useful).
    Let us know at darcs-users@darcs.net if there is any way we can help.

    Best regards,


    • Tim Stoop says:

      Hi Eric,

      There’s nothing wrong with darcs, really, it’s just that the whole puppet community seems to have defaulted to git and it’s easier for us to follow them. Makes sharing code way easier. And it’s easier for us to use one version control system instead of mixing them up.

  3. Ed Schouten says:

    Hi Arthur,

    There are indeed some hardcoded AMI numbers in the scripts. The problem is that these numbers change every time you rerun the buildami-script, so I’ve simply been changing the spawn-script manually.

    Right now we are in the process of switching from Darcs to Git for some of the codebases we develop internally. Making these repositories available to the public is also one of the things on our todo-list, though we have to make sure we don’t publish confidential (e.g. customer specific) data by accident.

    Of course, we’ll write an announcement on this blog by the time we’ve finished our migration. Thanks for showing interest in the scripts!



  4. Arthur Clune says:

    This looks really good. Any chance of setting up a GitHub repo for it? Also, how about some sample config files?

    There also seem to be some hardcoded AMI numbers in there and other things that I think might need tweaking to make it more generic.


