Nginx configuration tweaks for client-side speed optimization

As we move more sites to an Nginx/Passenger server stack, we need to translate the server-side settings we use to improve browser-side performance from Apache configuration to Nginx. Here’s how they carry over:

Gzip compression

In Nginx, compression is handled by the gzip module, which is compiled in by default (it is left out only if the build was configured with --without-http_gzip_module). If it’s present, you can activate it in site configuration with a block like this:

 gzip on;
 gzip_min_length  1100;
 gzip_buffers     4       8k;
 gzip_proxied     any;
 gzip_types       text/plain text/css application/x-javascript text/xml application/xml application/rss+xml text/javascript;

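Note that text/html is missing from that list on purpose: Nginx always compresses text/html responses when gzip is on, and listing it again in gzip_types only produces a duplicate MIME type warning.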
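One way to confirm that compression is actually being applied is to request an asset with an Accept-Encoding header and look for Content-Encoding: gzip in the response. This is a minimal sketch, assuming curl is installed; the function name is_gzipped and the host in the usage comment are placeholders, not part of our setup.

```shell
#!/bin/sh
# is_gzipped: read HTTP response headers on stdin and succeed (exit 0)
# if they include a "Content-Encoding: gzip" header.
# The function name is made up for this example.
is_gzipped() {
    # Strip the CR that ends each HTTP header line, then match the
    # header name case-insensitively.
    tr -d '\r' | grep -iq '^content-encoding:[[:space:]]*gzip'
}

# Usage (www.example.com is a placeholder host):
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' \
#       http://www.example.com/style.css | is_gzipped \
#       && echo "compressed" || echo "not compressed"
```

Keep in mind that with gzip_min_length set to 1100, a response smaller than that is sent uncompressed, so test against a reasonably large file.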

Long expiration dates

We want to have our assets cached in the user’s browser cache as long as possible, with a few exceptions (e.g. advertising assets). To that end, we set expiration headers:

 location ~* \.(js|css|jpg|jpeg|gif|png)$ {
     if (-f $request_filename) {
         expires max;
         break;
     }
 }

This uses the filename extension to determine which files get “far future” expiration dates. If you’re used to naming files without extensions, this might not work so well for you.
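To check the result from the outside, you can pull the max-age value out of the response headers. With expires max, Nginx sets Expires to a date in 2037 and Cache-Control to a max-age of ten years (315360000 seconds). A minimal sketch, assuming curl is available; the function name cache_max_age and the URL are hypothetical.

```shell
#!/bin/sh
# cache_max_age: read HTTP response headers on stdin and print the
# max-age value (in seconds) from the Cache-Control header, if any.
# The function name is made up for this example.
cache_max_age() {
    tr -d '\r' |
        grep -i '^cache-control:' |
        sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p'
}

# Usage (placeholder URL); -I asks curl for the headers only:
#   curl -s -I http://www.example.com/images/logo.png | cache_max_age
```

An empty result for a file that should match the location block usually means the request was handled by a different location, or that the file doesn’t exist on disk and fell through the -f test.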

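The Nginx version we run doesn’t send ETags, so we can stop worrying about that. (Releases from 1.3.3 onward send them for static files by default; the etag directive turns them off.)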

Surviving Django traffic spikes with Nginx and memcached

Normally I wouldn’t categorize a multi-national organization with a seven-figure annual budget as a “small” client, but we didn’t build the site for the World Marathon Majors, a sort of mass-marathon trade group made up of the Boston, London, Berlin, Chicago and New York City marathons. Instead, we inherited their Django-based site from a previous partner, and our job is to keep it running, fix what breaks, and make modest upgrades.

The site’s traffic is flat for most of the year, but race day for any of the five events produces a spike in the traffic graph resembling a shark fin in a calm sea. Last month, with both Boston and London on the horizon, we took steps to ensure the site and its new hosting could handle the load.

Because the site does not ask users to register, we recognized that we could cache complete HTML pages as they were rendered by Django and expire them after an arbitrary time span. If we kept rendered pages in the cache for ten minutes, for example, no page would be rendered more frequently than once every ten minutes, and yet news updates would never be delayed by more than ten minutes waiting for a cache refresh.

Because we were already using Nginx as the front-end of our server stack, we followed this stellar write-up by Alex Holt describing a simple system where Django writes to the cache, and Nginx reads from it. By using memcached for our cache rather than writing to disk or a database, we realized a little extra benefit: whenever Nginx has a cache hit, the page is returned significantly faster than if there’s a cache miss and Django has to render the page. This isn’t surprising when you think about the whole stack (of course a simple read will always be faster than a dynamic page build with multiple database queries), but it gave us a nice Zen server moment. By doing less work, the server was getting more done.

And the whole stack survived London with flying colors. Now we just have to figure out how to unwind the site’s designed-in assumption that UK is a legitimate language code for English. (Hint: People who speak “UK” use the Cyrillic alphabet.)

Common Running on hold

We’ve shut down the running application at commonrunning.com, and redirected all traffic for that domain to this post.

The Common Running idea suffered from some serious drawbacks. One of those was a cold start problem, where the functionality of the site for early adopters was limited until we had reached a critical mass of user input (a level the site never reached). Another was the problem of basic data; as long as we were soliciting reviews, we were obligated to maintain a comprehensive, high-quality database of current and recent running shoe models, and we were unable to meet that obligation while also performing paying client work.

We still hold out some hope for the CR technology model, perhaps with a plan in place to address the problems outlined above. For the time being, however, we won’t pretend the site is useful and will be concentrating our energy on other projects.

(Check the collection of Common Running posts for some bits of the two-year history of the site.)

26.2 miles of data

We work on an occasional consulting basis with the B.A.A., organizers of the Boston Marathon. The 114th running of that venerable race happened on Monday, April 19th. Quite a bit has changed in the operation of a major marathon since 1897, and Network World produced this report on how the B.A.A.’s other technology partners handle the information stream created by ~25,000 runners (23,176 starters and 22,678 finishers according to the B.A.A.) who, despite participating in one of the world’s oldest and simplest sports, have since the mid-1990s been accompanied by RFID chips.

If you’re curious about transponder-based timing technology, I wrote an article for New England Runner in late 2008 which, while already obsolete in terms of the current state of the art, pretty much captured things at the time. The video above was brought to our attention by Josh Merlis of Albany Running Exchange, whose event production team does some pretty nifty stuff with transponder data, PHP, and occasional peripherals like on-course digital cameras.

We’re hiring!

We’re hiring! We are currently in need of a front-end website developer.

We’re a small custom development house. About half our current work is building new custom content management systems, and the other half is ongoing feature development for existing clients. We’re conveniently located right downtown in Amherst, overlooking the Common.

XHTML/CSS and recent web development experience are necessary. A design background, Ajax, JavaScript and/or advanced CSS-fu are definite pluses. We’re looking for someone excited about the modern web, and the exact details of your résumé are less important than whether you are “smart and get things done.”

Primary responsibilities will include:

  • Dynamic front-end web development using XHTML and CSS.
  • Updates, new features and maintenance for existing web applications, and new development to client specifications.
  • Thoroughly testing your own code, as well as code written by others.

Necessary skills and attributes:

  • Ability to write consistent, standards-compliant XHTML and CSS.
  • Ability (and eagerness) to learn new skills on the job.
  • Attention to detail and production schedules.

Compensation commensurate with experience, plus benefits. Please submit a résumé and cover letter to jobs@commonmediainc.com. This is an on-site position only; no telecommuters, please.

ETA: We have all the applications we need at the moment, thank you. We’ll update this page if that changes.

Rejecting mail for valid local users

Several months ago we mentioned that certain Linux distributions will, if they are running an SMTP server and accepting mail for local users, accumulate spam for “role” users who will never read their mail, e.g. mail, uucp, news, etc.

I finished by suggesting,

“The best shortcut here is to bounce any email destined for the role users. This will vary depending on your MTA, so I won’t detail it here.”

It’s true that it varies by MTA, but it turns out it’s really hard to find this information. MTAs aren’t set up to reject mail for specific addresses; they want a list of valid addresses, and they’ll reject everything else.

It turns out that there’s a faster and cleaner method which is MTA-independent: lock the mailboxes for those users.

This could be as simple as putting an empty file in /var/spool/mail/uucp (for example) which is owned by root with 600 permissions, but that’s going to generate a bunch of error messages when your local delivery agent tries to write to a file it doesn’t have permissions for. A more elegant solution is to symlink those paths to /dev/null:

ln -s /dev/null /var/spool/mail/uucp

Now make sure the symlink has the correct ownership…

chown -h uucp:mail /var/spool/mail/uucp

Now all that spam will be silently delivered to the bit-bucket.
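Since there are usually several role users, the two commands above can be wrapped in a loop. This is a rough sketch rather than a tested tool: the function name discard_role_mail is made up, it assumes mbox-style spool files (one file per user), and it deletes any mail already sitting in those mailboxes, because a plain ln -s refuses to overwrite an existing file.

```shell
#!/bin/sh
# discard_role_mail SPOOL_DIR USER...
# Point each listed user's mailbox in SPOOL_DIR at /dev/null.
# The function name is illustrative, not a standard tool.
discard_role_mail() {
    spool=$1
    shift
    for user in "$@"; do
        box="$spool/$user"
        # Remove any existing mailbox file first; "ln -s" will not
        # replace a file that is already there.
        rm -f "$box"
        ln -s /dev/null "$box"
        # Same ownership step as above. It needs root and an existing
        # account, so report a failure instead of stopping the loop.
        chown -h "$user:mail" "$box" 2>/dev/null ||
            echo "could not chown $box to $user:mail" >&2
    done
}

# Usage, as root:
#   discard_role_mail /var/spool/mail uucp news mail
```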

New Client Site: TABBForum

If you’re interested in capital markets, check out our newest client site, Tabb Forum.

Streamlining Drupal updates

We don’t run many Drupal sites, but there are enough of them. I wish the upgrade path for Drupal were as easy as WordPress’s svn-based upgrades, but I’ve borrowed some ideas from Rails and Capistrano to make my process a little quicker than it might be otherwise.

Each Drupal upgrade provides a zip file (unless you’re working with CVS and frankly I’d rather not). Each of our Drupal sites has its own user, and the site root lives in the home directory of that user (e.g. ~/public_html or ~/www). I started by unzipping each version of Drupal independently in the home directories (e.g. ~/drupal-6.14, ~/drupal-6.15). Then I would make public_html a symbolic link to that directory. This meant I could “flip the switch” between versions with one command:

$ rm public_html && ln -s drupal-6.15 public_html

Still, I needed to copy a bunch of site-dependent files (e.g. the drupal/sites/* files, among others) between the old versions and the new, and that was getting tedious. So finally I created a shared directory, ~/shared/, with all the site-dependent files. This served to take those files out of the “deploy path”. Now I can use symlinks to install them in each new version in turn:

$ cd ~/drupal-6.15/sites/all/ && ln -s ~/shared/sites/all/* ./
$ cd ~/drupal-6.15/sites/default && ln -s ~/shared/sites/default/* ./
$ cd ~/drupal-6.15/themes && ln -s ~/shared/themes/* ./

Undoubtedly someone has already scripted this stuff, but I was pretty proud of it so maybe it will be useful to someone else. (There’s always Deploying Drupal with Capistrano, but I think that’s solving a slightly different problem.)
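For what it’s worth, the steps above could be rolled into one function along these lines. It is a sketch under assumptions, not the script we actually run: the name switch_drupal is made up, the directory layout is the one described in this post, and it relies on the -n option to ln, which GNU and BSD versions provide but POSIX does not require.

```shell
#!/bin/sh
# switch_drupal HOME_DIR VERSION
# Link the shared, site-specific files into HOME_DIR/drupal-VERSION,
# then repoint HOME_DIR/public_html at that release.
# The function name is made up; the layout follows the post.
switch_drupal() {
    home=$1
    release="$home/drupal-$2"
    shared="$home/shared"

    [ -d "$release" ] || { echo "no such release: $release" >&2; return 1; }

    for sub in sites/all sites/default themes; do
        for item in "$shared/$sub"/*; do
            # Skip the unexpanded pattern left when a shared directory is empty.
            [ -e "$item" ] || continue
            # -f replaces a link left by an earlier run; -n keeps ln from
            # following an existing link into the directory it points to.
            ln -sfn "$item" "$release/$sub/$(basename "$item")"
        done
    done

    # Repoint public_html with one command instead of rm followed by ln.
    ln -sfn "drupal-$2" "$home/public_html"
}

# Usage:
#   switch_drupal "$HOME" 6.15
```

Rolling back is the same call with the previous version number, since the old release directory is left untouched.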

Python package trouble? Check your python

This is exactly the sort of low-level stuff you’d think everyone should know, but I searched an error message today and didn’t get a useful answer. I found one, so here it is for the next searcher.

If you’re trying to build a Python package (in my case, the ReportLab toolkit), and your build fails with a string of error messages starting with Python.h: No such file or directory, the problem is that the package includes some code written in C. The build is trying to compile that code, and the C compiler is looking for the Python C headers. For most Linux users (I ran into this on an Ubuntu system), the C headers aren’t part of the core Python package. You need the python-dev package. Try this:

sudo apt-get install python-dev

Then try your build again; I bet it will work.
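If you’d like to confirm the diagnosis before or after installing, you can ask the interpreter where it expects its headers and check for Python.h there. A small sketch, assuming a python3 interpreter on the PATH (on current systems the matching package is python3-dev); the function name has_python_header is made up.

```shell
#!/bin/sh
# has_python_header INCLUDE_DIR
# Succeed if INCLUDE_DIR contains Python.h, the header the C compiler
# could not find. The function name is made up for this example.
has_python_header() {
    [ -f "$1/Python.h" ]
}

# Usage: ask the interpreter for its include directory, then check it.
#   inc=$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["include"])')
#   has_python_header "$inc" && echo "headers present" || echo "headers missing"
```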

YouthBuild Providence launches!

YouthBuild Home Page

In conjunction with the fabulous design team at PopKitchen, we’re delighted to announce the debut of YouthBuild Providence’s brand new website!

It’s always fun to be on board from kick-off to launch, and this was no exception.