Keeping Daemons Alive With Upstart

February 09, 2010

As web programmers we’re generally pretty comfortable with daemon processes like web and database servers. These programs (eg: Apache and MySQL) have robust and proven methods of keeping themselves alive. Even if they occasionally die they seem to come back to life on their own, and no real knowledge of this mechanism is necessary for their use.

But maybe you’re starting to use other daemons like Memcached, Redis, or Delayed Job which don’t come with your operating system in convenient pre-configured packages, or you need a newer version than your package manager provides. What then? Will your daemons be as reliable as the popular server software we’re accustomed to? What are the risks?

What Can Go Wrong?

In a live web application? Lots of things. Let’s say you’re on vacation—these things usually happen when you’re on vacation—and you get an email one morning from your hosting provider stating that your machine was “unresponsive” and needed to be rebooted. You then discover that while Apache or Nginx started up automatically, your Mongrels didn’t, and your site has been down for the past 10 hours.

There are a lot of Bad Things that can happen to daemons, but I think about them in terms of three basic types:

  1. A daemon doesn’t start up after a machine re-boot (often by your hosting provider due to maintenance or malfunction).
  2. A daemon crashes due to a software bug or a hardware problem (eg: bad memory).
  3. A daemon is running but unresponsive: hogging resources and doing nothing.

A Simple Safety Mechanism

Protecting yourself from all three types of problems requires monitoring software, like Monit. However, in my experience, the first two problems (non-running daemons) are by far the most common, and I’d like to propose a solution to them which is still new to me, but which is very appealing due to its simplicity.

If you’re using a recent version of Ubuntu or Fedora Linux your system uses Upstart instead of the traditional SysVInit for controlling processes on system boot. Upstart, which is far more powerful and interesting than SysVInit, can (1) start daemons when your machine boots up, and (2) re-start them when they crash. It requires no additional software, and is extraordinarily easy to configure. For example, to have Upstart manage my Delayed Job daemon, I add this config file to my Rails application at config/delayed_job:

# Start when system enters runlevel 2 (multi-user mode).
start on runlevel 2

# Start delayed_job via the daemon control script.
exec /usr/bin/env RAILS_ENV=production /path/to/app/script/delayed_job start

# Restart the process if it dies with a signal
# or exit code not given by the 'normal exit' stanza.

# Give up if restart occurs 10 times in 90 seconds.
respawn limit 10 90

I then create a symbolic link to it in the Upstart config directory:

cd /etc/event.d
sudo ln -s /path/to/app/config/delayed_job

That’s it! Delayed Job will now restart if it crashes or the machine is rebooted. To start the daemon for the first time I run:

sudo initctl start delayed_job

Again, this is something I’ve just started using. I found out about Upstart recently and it seems like a good solution to a problem which doesn’t always warrant a process monitor (ie, for non-critical services on small sites). I’m sure there’s a better way to configure Upstart. In fact, I suspect there’s a way to configure Upstart (in the current or a future version) to handle unresponsive daemons. Upstart is a powerful event-driven tool which may eventually replace cron on Ubuntu. If you can make any improvements to the code I’ve posted here, please leave a comment below.

comments powered by Disqus