Adventures in highish availability

C1 | Mon 21 Jan | 11:35 a.m.–11:55 a.m.

Presented by

  • Peter Chubb

    Peter has been a long-term contributor to open source (his first patch was to international iSpell in around 1990 to enable Australian spelling rules), to Unix (he worked on the Unix kernel for SGI, Fujitsu, and for AT&T Bell Labs while at Softway Pty Ltd), and to Linux systems software (kernel and low-level software like u-boot and qemu). He has spoken at many LCA events. Peter has never used a Windows operating system except when forced to, and then for only a short time.


I manage a small farm of servers and network for continuous integration and development, supporting around 50 users. We recently retired about a dozen servers, and have instead used containers and virtual machines on a pair of really big servers. Given some excess capacity in the new machines, I decided to try to set up replication and failover, so I can bring one machine down for maintenance, and people won't notice (much). Although there are off-the-shelf tools (like Pacemaker), they didn't seem applicable, so we rolled our own. In hindsight this may have been a mistake. In this talk, I'll talk about all the things that went wrong.