Writing a Good Cron Job

#sysadmin

You’d think writing a good cron script would be simple, right? It’s just a snippet of a shell script inside a formatted file. But I often see people new to UNIX get tripped up when using cron. I won’t go into all of cron’s issues in this post, but I’ll try to give you some advice that should improve your chances of success.

Traps

First, let’s cover some common traps people fall into. Why do most cron jobs fail for new users?

The Environment

Cron doesn’t run your commands using your login shell and profile. By default, it will use Bourne shell (/bin/sh), with a very small PATH environment variable. Often, PATH is set to /usr/bin:/bin.

You can control which shell cron uses to run your job by setting the SHELL variable within your crontab. You can also explicitly set the PATH within your crontab. I advise doing both.
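
One way to see the difference for yourself is to run a command with a stripped-down environment before handing it to cron. The sketch below is only an approximation: run_like_cron is a hypothetical helper, and the variable values are assumptions modeled on the defaults described above, so check what your own cron actually provides.

```shell
#!/bin/bash
# run_like_cron: hypothetical helper that runs a command string with a
# minimal environment, similar to (but not identical to) what cron provides.
# The values below are assumptions; your cron's defaults may differ.
run_like_cron() {
    env -i \
        HOME="$HOME" \
        LOGNAME="${LOGNAME:-$(id -un)}" \
        SHELL=/bin/sh \
        PATH=/usr/bin:/bin \
        /bin/sh -c "$1"
}

# Anything your login profile normally sets up is gone inside the job.
run_like_cron 'echo "PATH seen by the job: $PATH"'
```

If a command works in your terminal but fails under run_like_cron, it probably depends on something from your login environment, such as a longer PATH.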

Error Handling

Cron’s primary reporting mechanisms are sending email and writing to the system’s log file. Notice I said, “reporting mechanisms,” and not, “error reporting mechanisms.” There’s a reason for that.

Cron doesn’t understand success or failure like other parts of the system. Other parts of a UNIX system use a command’s return code to determine whether it completed successfully. All cron understands is whether the command generated any output or not. If the command generated output to stdout or stderr, then it will generate an email and/or a log message.

If you want to receive emails from cron, the questions you need to ask include: Can this machine send email? Is the local SMTP daemon configured to send email to the local user’s mailbox, or to your actual IMAP inbox?

If you prefer to leverage your existing monitoring infrastructure instead of using cron’s built-in email facility, then are you monitoring the log file for errors?

You can control where cron sends email using the MAILTO variable within the crontab. Where your cron implementation supports it, you can also control the sender address using the MAILFROM variable.
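
Because cron reacts to output rather than to return codes, one workaround is to wrap the command so that it stays silent on success and prints everything on failure. Here is a minimal sketch of that idea. The name quiet_unless_failed is hypothetical and this is not a cron feature, just a shell function you would keep in your own script.

```shell
#!/bin/bash
# quiet_unless_failed: hypothetical helper, not part of cron.
# Runs a command, captures stdout and stderr, and prints them only if the
# command exits with a non-zero status. Cron then has output to report
# only when something went wrong.
quiet_unless_failed() {
    local output status
    output=$("$@" 2>&1)
    status=$?
    if (( status != 0 )); then
        printf 'Command failed with exit code %d: %s\n' "$status" "$*"
        printf '%s\n' "$output"
    fi
    return "$status"
}

# Example: prints nothing, because the command succeeds.
quiet_unless_failed ls /
```

The trade-off is that you lose the output of successful runs, so redirect it to a log file inside the helper if you need it later.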

The Bare Minimum

At the very least, be explicit about how your cron commands are executed. I recommend setting the following variables within your crontab: SHELL, PATH, and MAILTO.

SHELL=/bin/bash
PATH=/usr/bin:/bin:/usr/local/bin
MAILTO=admins@example.com

# Say 'Hello cron' every day at 1:30 AM.
30 1 * * * echo 'Hello cron'

Prefer a Separate Script

Rather than writing your cron task inline within the crontab, prefer to write a separate script. By encapsulating your job in a separate script you can set your own variables and working directory so you control the execution environment. Of all the tips I’ll give you, this one gives you the greatest chance of success. You also get to pick the rules you want to play by. By default, cron uses the Bourne shell. When you use a separate script, you tell cron who’s boss via the shebang line. Prefer to use Perl or Python? Go ahead.
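
As a rough sketch of what that control looks like, a cron script might begin with a preamble like the one below. The strict-mode options and the WORK_DIR variable are my assumptions for illustration, not requirements of cron, and /tmp is only a placeholder for your project directory.

```shell
#!/bin/bash
# Hypothetical skeleton for a cron script; names and paths are placeholders.

# Stop on errors, on unset variables, and on failures inside pipelines.
set -euo pipefail

# Set the search path explicitly instead of relying on cron's default.
export PATH=/usr/bin:/bin:/usr/local/bin

# Assumption: replace /tmp with the directory your job should run in.
WORK_DIR="${WORK_DIR:-/tmp}"
cd "$WORK_DIR"

# The job's real work goes below this line.
```

Nothing here prints on success, which keeps cron from sending email for routine runs.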

Give your cron scripts meaningful names like purge-old-webserver-logs.sh or backup-database.sh. You’ll thank yourself later.

Add a comment in the crontab describing the job: why it exists, and when you expect it to run. I don’t know about you, but I always have to look up the syntax of crontab(5). A useful comment saves me a trip to the man page.

Keep your scripts in a version control system like Git, Mercurial, or Subversion. If this cron job is managing something for your software project, then keep your cron scripts in the same repository. I’m thinking of Rails or Django apps here, but don’t limit yourself to that scenario. Think about whether it makes sense in your project. I don’t usually bother keeping the crontabs themselves in source control. I use configuration management to manage most of my cron jobs, and my configuration management code is in Git.

Security

Don’t shove everything into root’s crontab. If you make a mistake, the results could be catastrophic because root is all-powerful.

Use the principle of least privilege when it comes to file ownership and permissions. For example, if you’re pruning backups of your web site’s data, run the job as the user who owns the data. If your cron job needs to run a privileged command, consider writing targeted sudoers(5) rules to accommodate it. Cron is executing this script on your behalf, so it’s up to you to secure it.
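
If a script should never run with root’s privileges, it can check for that itself. The guard below is a small sketch of the idea. The refuse_root function is hypothetical, and the optional argument exists only so the check can be exercised without actually being root.

```shell
#!/bin/bash
# refuse_root: hypothetical guard for scripts that should run as an
# unprivileged user. Pass a numeric UID to test the check; with no
# argument it inspects the current user.
refuse_root() {
    local uid=${1:-$(id -u)}
    if [[ "$uid" -eq 0 ]]; then
        echo "Refusing to run as root; use the account that owns the data." >&2
        return 1
    fi
}

# Typical use near the top of a cron script:
#   refuse_root || exit 1
```

If the job ever ends up in root’s crontab by mistake, the message on stderr is output, so cron will report it.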

Testing

Schedule the cron job to run a few minutes into the future. Check to see if it worked. Check the log file for errors. If you want errors to send you email, check your inbox to see if a failed job resulted in an email.
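
To save some clock arithmetic, you can generate the temporary crontab line. This sketch assumes GNU date, since it relies on the -d option and the %-M and %-H formats, which other date implementations may not support. The test_entry name is hypothetical.

```shell
#!/bin/bash
# test_entry: hypothetical helper that prints a crontab line scheduled a
# given number of minutes from now. Assumes GNU date (-d, %-M, %-H).
test_entry() {
    local minutes=$1 command=$2
    printf '%s * * * %s\n' "$(date -d "+${minutes} minutes" +'%-M %-H')" "$command"
}

# Print an entry that fires three minutes from now.
test_entry 3 /home/username/webapp/scripts/purge-old-logs.sh
```

Paste the printed line into your crontab, and remember to replace it with the real schedule afterwards. Left in place, it would run every day at that time.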

If this job has destructive side effects like deleting files, please run it in a test environment. One technique for testing your cron script is to use an environment variable like DEBUG to switch between effectful and no-side-effects execution. You could use command-line arguments instead, but I often find that overkill for a cron script. Still, choose the technique that works for you. What’s important is that you make it easy to test.

DEBUG=1 ./purge-old-logs.sh
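
If a script has several destructive commands, one way to apply the DEBUG switch consistently is to route them all through a small helper. This is a sketch of that pattern rather than a standard tool. The run function is a hypothetical name.

```shell
#!/bin/bash
# run: hypothetical helper. With DEBUG set to a non-empty value it prints
# the command instead of executing it; otherwise it runs the command.
run() {
    if [[ -n "${DEBUG:-}" ]]; then
        printf 'DRY RUN:'
        printf ' %q' "$@"    # %q quotes each argument so it can be re-read by the shell
        printf '\n'
    else
        "$@"
    fi
}

# Inside a cron script, prefix only the commands that have side effects:
#   run rm -f "$some_file"
```

Read-only commands such as find or ls can stay unwrapped, so a dry run still shows realistic results.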

Example

Here’s an example using the techniques I’ve described. This script purges old log files to prevent the disk from filling up.

Here’s the crontab(5) entry.

SHELL=/bin/bash
PATH=/usr/bin:/bin:/usr/local/bin
MAILTO=admins@example.com

# Purge old log files at 1:30 AM every Saturday.
30 1 * * 6  /home/username/webapp/scripts/purge-old-logs.sh

And here’s the purge-old-logs.sh script.

#!/bin/bash

LOG_DIR=/home/username/webapp/logs

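# With DEBUG set, only list the files that would be deleted.
# -type f limits the match to regular files, so rm never sees a directory.
if [[ -n ${DEBUG:-} ]]; then
    find "$LOG_DIR" -type f -mtime +10 -print
else
    find "$LOG_DIR" -type f -mtime +10 -exec rm -f {} +
fi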

A Cautionary Tale

Even experienced users of cron can get tripped up. I created a cron job to back up a directory every week, but it wasn’t working. Can you spot the error?

# Back up 'mydir' at 1:00 AM every Saturday.
0 1 * * 6  tar -zcf $HOME/backups/backup-$(date +'%Y-%m-%d').tar.gz mydir

I specified the command to run inline, instead of creating a separate script. I thought the command was pretty straightforward, and I was being lazy.

After spending a while testing and debugging this issue, I went to the man page and saw this:

The “sixth” field (the rest of the line) specifies the command to be run. The entire command portion of the line, up to a newline or a “%” character, will be executed by /bin/sh or by the shell specified in the SHELL variable of the cronfile. A “%” character in the command, unless escaped with a backslash (\), will be changed into newline characters, and all data after the first % will be sent to the command as standard input.

The issue was the % characters within the $(date +'%Y-%m-%d') command. I edited the crontab, escaped the percent characters, and sure enough it worked. In this case, I thought using single quotes around the % characters protected me from shell expansion, which I didn’t want. I didn’t know that % was a special character when cron evaluates its crontab file input.

That’s what I get for not following my own advice.
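
For comparison, here is roughly what following that advice could look like. Inside a script, % has no special meaning, so the date format needs no backslashes. The script name, function names, and paths are hypothetical, and the source and destination directories are passed as arguments rather than hard-coded.

```shell
#!/bin/bash
# backup-dir.sh: hypothetical script name.
# Usage: backup-dir.sh SOURCE_DIR DEST_DIR

archive_name() {
    # In a script, % is only a date format character; cron never sees it.
    printf 'backup-%s.tar.gz\n' "$(date +'%Y-%m-%d')"
}

backup_dir() {
    local source_dir=$1 dest_dir=$2
    mkdir -p "$dest_dir"
    # -C changes into the parent directory so the archive holds relative paths.
    tar -zcf "$dest_dir/$(archive_name)" \
        -C "$(dirname "$source_dir")" "$(basename "$source_dir")"
}

# Run the backup only when both directories are given.
if [[ $# -eq 2 ]]; then
    backup_dir "$1" "$2"
fi

# Matching crontab entry (hypothetical paths), with no % to escape:
# 0 1 * * 6  /home/username/scripts/backup-dir.sh /home/username/mydir /home/username/backups
```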

Further Reading

  • You can read the man page for the crontab file format using the command, man 5 crontab.
  • DigitalOcean has a good cron tutorial for people new to cron.
  • This article on Crontab Best Practices contains good advice.
  • Cronic is a useful program to use within your crontabs. It wraps your command invocation to work around some of cron’s warts. For example, Cronic will check your command’s return code for success or failure, and only send email on failures, which is a welcome improvement. The Cronic home page also describes common ways used to handle the output of a cron command, and the issues surrounding it.
  • Dead Man’s Snitch is a service you can use to monitor if your cron jobs are running correctly. Add a ping to Dead Man’s Snitch to your command, and the service will let you know if a cron job fails to check in on time.

Thanks to Chris DiGiovanni for reviewing this essay and providing feedback.