Archive for the ‘Ruby on Rails’ Category

Giving back, in a small way

Thursday, April 24th, 2008

OK, a very small way.

Like many small companies doing development on the Web, Common Media depends heavily on free and open-source software. Part of the point of open source is that a programmer using the program (or library, or plug-in) may get into the code and wrangle it around until it works best for them. The obligation that comes with that freedom is to “give back” any such changes if they may be useful to the wider community. With big projects, that can mean active participation in a coding community; for smaller packages, it may just mean sending code back to the maintainer for consideration.

We mentioned a few weeks ago how we tweaked a plug-in for Common Kitchen. Today, that code became our first checked-in contribution to an open-source project, Netphase’s acts_as_amazon_product. Hopefully it won’t be the last.

(I also took the opportunity to use a topical test case. Check the commit to see which magazine we test magazine searching with.)

Rails asset hosts, SCM, and bundling

Wednesday, April 9th, 2008

This afternoon we pushed another big revision to the La Cucina Italiana website, which now shares a very small fraction of the thousands of recipes the magazine has in its archives. More recipes will come online in the months ahead, but most of the puzzles we had to solve are behind us now.

One in particular was handling the photos that accompany some recipes. Artful photography is a hallmark of the La Cucina Italiana brand, and the editorial team needed to be able to upload their images directly to the site. These photos wouldn’t be stored in the site database, and because they wouldn’t be part of the site’s Subversion repository either, they had to live outside the normal site root so that deploying site updates wouldn’t blow them away.

Enter Rails’ asset hosts. Rails allows for assets (e.g. images, CSS files, or Javascript includes) to be served by a host other than that of the main site. Because the asset host definition happens at the host name level, the asset host can actually be on the same box as the main site, or it can be elsewhere in the world; you manage that at a different level of abstraction.

A “free” benefit of asset hosts is that by defining multiple asset hosts (a0 through an), you can fool a user’s browser into downloading your site through a dozen or more different connections, rather than just the two it limits itself to with any single server. Rails spreads the asset links across the asset hosts, and the browser opens two connections to each host, not caring that they all happen to be on the same box. This gets us extra YSlow points (of course, we lose them again by requiring another DNS lookup for each asset host).
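As a rough illustration of how that spreading can work, here is a standalone Ruby sketch. It is not Rails’ own code: the host names (a0 through a3 under example.com) are hypothetical, and it uses Zlib.crc32 as a stable hash so that a given file always maps to the same host, which keeps browser caches useful, while different files spread across the hosts. In Rails 2.0 itself, the equivalent is a single asset_host setting containing %d, which Rails fills in with a number from 0 to 3.

```ruby
require 'zlib'

# Hypothetical host pattern; %d is replaced with a host number.
ASSET_HOST_PATTERN = "http://a%d.example.com"
ASSET_HOST_COUNT   = 4 # a0 through a3

# Choose a host for an asset path. CRC32 is stable across processes,
# so the same path always lands on the same host.
def asset_host_for(source)
  format(ASSET_HOST_PATTERN, Zlib.crc32(source) % ASSET_HOST_COUNT)
end

# Full URL as it would appear in the page's markup.
def asset_url(source)
  asset_host_for(source) + source
end

%w[/images/logo.png /stylesheets/main.css /javascripts/application.js].each do |src|
  puts asset_url(src)
end
```

Hashing the path, rather than rotating through hosts on each request, matters: if the same image came from a1 on one page and a2 on the next, the browser would download it twice.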

Our hangup, though, was that some assets, specifically CSS and Javascripts, did need to stay inside the Subversion repository and the site’s file tree.

Here’s the solution we came up with:

  • Defined an Apache virtual host answering to subdomains a0 through a3. (In other words, all four subdomains are the same real server, but the browser doesn’t know that; we could make them different physical servers someday.)
  • Set the site root for that server to /var/www/assets/.
  • Made symbolic links from /var/www/assets/javascripts to /var/www/production/current/public/javascripts, and from /var/www/assets/stylesheets to /var/www/production/current/public/stylesheets. (N.B. if you’re using Capistrano, “current” in your site directory is itself a symbolic link to a directory with your last site deploy.)
  • Set the upload code for recipe photos to store the uploaded images in subdirectories of /var/www/assets/images.
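The directory setup in those steps can also be sketched in Ruby. This is a minimal illustration, not our actual provisioning script: build_asset_root is a hypothetical name, and the demo uses a temporary directory in place of /var/www/assets and /var/www/production/current/public. Because the links point through Capistrano’s current symlink, they only need to be created once.

```ruby
require 'fileutils'
require 'tmpdir'

# Lay out an asset-host document root: uploaded images in a real directory
# (untouched by deploys), JS and CSS as symlinks into the deployed app.
def build_asset_root(asset_root, app_public)
  FileUtils.mkdir_p(File.join(asset_root, "images"))
  %w[javascripts stylesheets].each do |dir|
    link = File.join(asset_root, dir)
    File.unlink(link) if File.symlink?(link) # safe to re-run
    File.symlink(File.join(app_public, dir), link)
  end
end

# Demo against a throwaway directory tree.
Dir.mktmpdir do |root|
  app_public = File.join(root, "production", "current", "public")
  %w[javascripts stylesheets].each { |d| FileUtils.mkdir_p(File.join(app_public, d)) }
  assets = File.join(root, "assets")
  build_asset_root(assets, app_public)
  puts File.readlink(File.join(assets, "javascripts"))
end
```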

This way, we get the benefits of asset hosts both for assets under revision control and for assets outside it: the JavaScript and CSS still deploy with the site, while the uploaded photos live under the asset root, where a deploy won’t touch them.

The free bonus here was that by using a symbolic link to the javascripts directory, bundle_fu doesn’t have to know or understand our asset host setup; it just stores its bundled files in that same subdirectory as always, and it just works.

Y Be Slow?

Thursday, March 27th, 2008

When Yahoo! released its YSlow application last year as a plugin for the Firebug Firefox extension (because really, what web developer doesn’t have Firefox installed, even if it isn’t their primary browser?), nearly everyone installed the tool and started going down the list of rules Yahoo! laid down for improving “front-end performance” on websites. Several people wrote up suggestions for using the output to improve Rails apps, including a good summary for Nginx, but we’re using Apache.

“Front-end performance” means attacking speed as a problem between the browser and the server, and not as a problem which exists solely behind the server. We can optimize an application as much as we want, but if the browser makes thirty-five round-trips to the server fetching CSS, image files, and JavaScripts, application optimization isn’t helping much. In addition to grading apps on these fourteen points, YSlow gives a load time (in milliseconds!), and as your grade improves, you can also see the load time improving by perceptible intervals.

This afternoon I ran YSlow on the current development version of the La Cucina Italiana site. The initial grade was 52, an F. When I was done, it was 88, a B, and if I circumvent a dubious aspect of YSlow, it becomes a 98, an A. I made only four changes: three Apache configuration tweaks and one Rails change. If you’re a Rails developer with Apache 2.2 in your stack, here is the low-hanging fruit for a better YSlow score. You can make these changes in your site configuration or within an .htaccess file.

ETags: The Yahoo! explanation of why ETags matter is a little confusing. The summary is this: you want to turn ETags off. They’re ineffective with multiple servers (i.e. an asset-server setup), and even if everything is on one host, there are other, equally useful means of avoiding downloads of cached files. So spare yourself the ETag header bytes on every request and turn them off. It’s a one-line fix:

FileETag none
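To see why per-file ETags break down across servers, here is a small Ruby sketch. It builds a tag from a file’s inode, size, and modification time, the components Apache 2.2 uses by default; the exact string format below is illustrative rather than Apache’s, and inode_style_etag is a hypothetical name. Two files in one temporary directory stand in for copies of the same asset on two servers: identical bytes and timestamps, but different inodes, so different ETags, and a browser revalidating against the “other” server downloads the file again.

```ruby
require 'tmpdir'

# Illustrative ETag from inode, size and mtime (hex), in the spirit of
# Apache's default "FileETag INode MTime Size".
def inode_style_etag(path)
  st = File.stat(path)
  format('"%x-%x-%x"', st.ino, st.size, st.mtime.to_i)
end

Dir.mktmpdir do |dir|
  copies = %w[server_a.css server_b.css].map { |name| File.join(dir, name) }
  stamp = Time.at(1_206_576_000)
  copies.each do |path|
    File.write(path, "body { margin: 0; }\n")
    File.utime(stamp, stamp, path) # same content, same mtime
  end
  copies.each { |path| puts inode_style_etag(path) } # inodes differ
end
```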

GZip components: The time saved by sending down compressed files is greater than the time spent compressing them, but only for text-based file types. (Images and PDFs are already compressed, so re-compression won’t help and might hurt.) If you’re using Coda Hale’s configuration for Apache and Mongrel (and many of us are), the code for compressing text components before download is already in your Apache configuration. However, Coda’s config misses one MIME type our JavaScript files were being served as, application/x-javascript, so YSlow kept dinging us for uncompressed JavaScript files. With that added, the configuration for compression looks like this:

AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml application/xhtml+xml text/javascript text/css application/x-javascript
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

(Note that there should be only four lines there: one AddOutputFilterByType and then three BrowserMatch lines.)
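A quick way to see the text-versus-binary difference is Ruby’s built-in Zlib, which implements the same deflate algorithm mod_deflate uses. In this sketch a repeated CSS rule stands in for a stylesheet (real files compress less dramatically), and random bytes stand in for an already-compressed JPEG or PDF.

```ruby
require 'zlib'

# Size of the deflated output as a fraction of the input size.
def deflate_ratio(data)
  Zlib::Deflate.deflate(data).bytesize.to_f / data.bytesize
end

css   = "a.recipe-link { color: red; text-decoration: none; }\n" * 200
image = Random.new(1).bytes(20_000) # incompressible, like JPEG data

puts format("CSS:   %.3f of original size", deflate_ratio(css))
puts format("image: %.3f of original size", deflate_ratio(image))
```

The image ratio comes out at or slightly above 1.0: the server spends CPU time and gains nothing, which is why the filter is limited to text types.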

Far-future cache expiration: YSlow looks for expiration dates far in the future. The reasoning here is to maximize the odds that your users arrive at your site with a “primed” cache, i.e. one where most of your components are already loaded. By putting a way-out expiration on components, users keep those bits in their cache longer. The flip side is that a component’s URL needs to change in order to prompt the browser to load a changed file. Rails does this by automatically appending a timestamp, as a query string, to the URLs of many components. (Check your page source and see what is actually getting called when you load a CSS file.) Therefore, we can set our default expires header way out in the future.

This is a two-line addition to the Apache configuration:

ExpiresActive On
ExpiresDefault "access plus 10 years"
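Here is a minimal Ruby sketch of the two halves of that bargain. timestamped_asset_path imitates what Rails does (append the file’s modification time as a query string, so a changed file gets a new URL), and far_future_expires shows roughly what “access plus 10 years” produces as an HTTP date. Both names are hypothetical, and the year is approximated as 365 days.

```ruby
require 'fileutils'
require 'time'
require 'tmpdir'

# Append the file's mtime as a query string, Rails-style; leave the path
# alone if the file doesn't exist.
def timestamped_asset_path(public_dir, source)
  file = File.join(public_dir, source)
  File.exist?(file) ? "#{source}?#{File.mtime(file).to_i}" : source
end

# "access plus 10 years", approximating a year as 365 days.
def far_future_expires(now = Time.now)
  (now + 10 * 365 * 24 * 60 * 60).httpdate
end

Dir.mktmpdir do |pub|
  FileUtils.mkdir_p(File.join(pub, "stylesheets"))
  css = File.join(pub, "stylesheets", "main.css")
  File.write(css, "p { margin: 0; }\n")
  puts timestamped_asset_path(pub, "/stylesheets/main.css")
  File.utime(Time.now + 60, Time.now + 60, css) # simulate an edit
  puts timestamped_asset_path(pub, "/stylesheets/main.css") # new URL
end
puts far_future_expires
```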

Reducing connection count: This is the tough one. Default Rails apps load a zillion (OK, seven) JavaScript files and at least one but possibly many more CSS files, and that’s before we start with page images. Most browsers will only open two or three connections at a time to a given server, so all the files wait in line to get loaded. You can maximize the number of concurrent connections by using an asset server; this fools the browser into opening more connections by giving it more hosts to connect to. (This is like opening more Starbucks stores in the same town.) That doesn’t really save you, though, and it boosts the number of DNS lookups the browser needs to run. To really cut the number of JS and CSS files, you need to bundle assets. It’s possible to bundle images, but that’s not a problem for LCI, which has only four or five images on the home page. Bundling JS and CSS is where the action is.
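A back-of-the-envelope model makes the trade-off concrete. This Ruby sketch assumes every file takes one equally long “round” per connection and ignores file sizes, keep-alive, and DNS cost; the function fetch_rounds and the eight-file count (seven JavaScript files plus one stylesheet) are illustrative.

```ruby
# Rounds needed to fetch `files` when the browser opens
# `connections_per_host` connections to each of `hosts` hosts.
def fetch_rounds(files, hosts: 1, connections_per_host: 2)
  (files.to_f / (hosts * connections_per_host)).ceil
end

puts fetch_rounds(8)           # one server:   4 rounds
puts fetch_rounds(8, hosts: 4) # four hosts:   1 round, but 4 DNS lookups
puts fetch_rounds(2)           # bundled JS + CSS on one server: 1 round
```

Asset hosts and bundling both get down to one round in this model, but bundling does it without the extra DNS lookups.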

There are multiple ways to do this, including a nifty program called AssetPackager and the caching code built into Rails 2.0. We went with what we saw as the simplest route, which was bundle_fu, but we have an eye on AssetPackager for future consideration. bundle_fu is a Rails plug-in which takes all the CSS and JS files for a template and concatenates them into one CSS and one JS file, sometimes even minifying the JS. It’s a quick installation, and not only does it convince YSlow to give your site a good grade on the number of connections, it also gives good grades for “Put JS at the bottom” and “Minify JS” because, hey, there’s only one file, right? This one move improved our YSlow score more than any other step.
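The core of what a bundler does is simple concatenation. The Ruby sketch below is not bundle_fu’s code; bundle_files is a hypothetical helper that joins files in order, marks each part with a comment (the /* */ syntax is valid in both CSS and JavaScript), and does no minification. Rails 2.0’s built-in option, javascript_include_tag :all, :cache => true, works along similar lines.

```ruby
require 'tmpdir'

# Concatenate files, in order, into one bundle; order matters for JS
# (e.g. prototype.js must precede code that uses it).
def bundle_files(paths, output_path)
  File.open(output_path, "w") do |out|
    paths.each do |path|
      out.puts "/* --- #{File.basename(path)} --- */"
      out.puts File.read(path)
    end
  end
  output_path
end

Dir.mktmpdir do |dir|
  sources = { "prototype.js" => "var Prototype = {};", "application.js" => "var App = {};" }
  paths = sources.map do |name, body|
    File.join(dir, name).tap { |path| File.write(path, body) }
  end
  puts File.read(bundle_files(paths, File.join(dir, "bundle.js")))
end
```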

The score at this point: 88 points, a high B. We’re still suffering for not having minified our JS (a little, and we’ve decided not to bother, since it’s being gzipped anyway) and for not using a CDN.

Cheating: The CDN point is one of the most hotly contested on the YSlow report card, because most sites as small as ours see very little return on the investment of moving to a Content Delivery Network such as Akamai. It’s possible, though, to get YSlow to turn a blind eye to your CDN-less-ness. Just add your own site’s domain to the list of CDN servers in YSlow’s preferences. Doing that jumped our score up to 98, pretty close to perfect.

Unfixable: One thing which may keep your site from ever having a good YSlow report is using a lot of outside components. Common Kitchen gets dinged because its page loads include calls to two different ad networks, not to mention multiple images from Amazon on some pages and the Google Analytics code. This makes for a lot of un-bundle-able scripts, multiple DNS requests, and a lot of components where we don’t control their ETags (or lack thereof), expiration header, or compression. Of course, we can’t control their server uptime, either, so it may be that the YSlow scores are the least of our problems!

Building LAPACK and Ruby’s linalg on Mac OS X

Monday, March 24th, 2008

Update, 7 September 2008: Before you actually do anything with this post, make sure to read the update and comments at the bottom.

Installing Ruby’s linalg linear algebra library on a Mac OS X system is problematic because linalg is built around LAPACK, the Linear Algebra PACKage, and OS X (at least the version I’m working with, 10.4.11) ships with a bastard version of LAPACK which is missing some important symbols for linalg. The way around this is to install a full version of LAPACK.

You’d think this would be easy, but LAPACK is written in FORTRAN, and the version of gcc included with Xcode doesn’t include a FORTRAN compiler by default. So before we can install LAPACK, we need a FORTRAN compiler.

There are at least two gcc-based FORTRAN compilers out there, and they both offer pre-compiled binaries for Mac OS X (Intel or PowerPC). You can build from source if that’s how you want to do it, but I want this over with, so I’m grabbing the Intel binaries and getting on with my life. LAPACK seems to have trouble dealing with the g95 compiler, so we went with gfortran. gfortran has a nice Mac-like .dmg installer, so you can just download that, click through the installation, and gfortran is ready in /usr/local/bin/ (which is hopefully in your $PATH). You can make a link so that the g77 command (the old gcc compiler for the FORTRAN77 standard) points to gfortran with this command:

sudo ln -s /usr/local/bin/gfortran /usr/local/bin/g77

Now you’ll want to start in on LAPACK. Download the tarball from http://www.netlib.org/lapack/lapack.tgz and store it in /usr/local/src/. Unpack with

tar xzvf lapack.tgz

You’ll have created a directory named e.g. lapack-3.1.1. cd into this directory. What’s missing from LAPACK is the standard ./configure step; we’ll have to edit the make.inc file ourselves before running make to build the package.

Fortunately, Robert Hatcher builds LAPACK as part of his CERNLIB build, which means that the shell commands for creating a working LAPACK make.inc are available as part of that script. Here’s the relevant excerpt:

# customize makefile
sed -e 's/_LINUX/_DARWIN/' make.inc.example > make.inc
echo "" >> make.inc
echo ".SUFFIXES : .f .o" >> make.inc
echo "" >> make.inc

# go ahead and build - "all" will perform tests
make blaslib lapacklib tmglib > make.log 2>&1
if [ $? != 0 ] ; then
echo "*** Error in make blaslib lapacklib tmglib ***"
grep -i err make.log
fi
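If you’d rather drive that step from Ruby (linalg’s own installer is a Ruby script), the make.inc customization translates directly. This sketch assumes, as the excerpt does, that make.inc.example sits at the top of the unpacked LAPACK directory; write_darwin_make_inc is a hypothetical name, and the build itself still happens with make.

```ruby
require 'tmpdir'

# Ruby version of the sed/echo lines: copy make.inc.example to make.inc,
# replacing the first _LINUX on each line with _DARWIN (like sed without /g),
# then append a blank line, the .SUFFIXES rule, and another blank line.
def write_darwin_make_inc(lapack_dir)
  example = File.join(lapack_dir, "make.inc.example")
  text = File.readlines(example).map { |line| line.sub("_LINUX", "_DARWIN") }.join
  text << "\n.SUFFIXES : .f .o\n\n"
  target = File.join(lapack_dir, "make.inc")
  File.write(target, text)
  target
end

Dir.mktmpdir do |dir| # stand-in for /usr/local/src/lapack-3.1.1
  File.write(File.join(dir, "make.inc.example"), "PLAT = _LINUX\n")
  puts File.read(write_darwin_make_inc(dir))
end
```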

Now: this step will take a while. If you copied this all into a file and ran it as a shell script (the sane thing to do, I think), it will take a good while to run, on the order of ten or fifteen minutes; if you are keying in the commands line by line, it will sit silently for a long time after the make blaslib lapacklib tmglib line. Don’t panic; this means it’s working. (If you’re paranoid and like seeing stuff stream across your screen to prove you’re compiling something, you may want to background the process and then use tail -f make.log to get the full output.)

Once it’s done, it’s time to put these files where they belong:

sudo cp blas_DARWIN.a /usr/local/lib/libblas.a
sudo cp lapack_DARWIN.a /usr/local/lib/liblapack3.a

(Note that it may be the case that there are faster BLAS libraries out there; if you’re squeezing every cycle out of your app, it may be worth looking into that, but it’s beyond the scope of this post.)

Unfortunately, we’re still not done. linalg still needs several libraries from the f2c package, which is quite hard to dig up. The best route I’ve found is to grab the package available through Fink. The trick is that Fink installs libraries in /sw/lib/ and we need them elsewhere (/usr/local/lib/ should work). Use a link to solve that:

sudo ln -s /sw/lib/libf2c.a /usr/local/lib/

Now it’s (finally) time to install linalg. Unfortunately, there’s no gem available for this that I’m aware of. (This may be because the package has been essentially “done” for four or five years, so it’s older than the widespread use of gem.) Download the tarball from the project page to /usr/local/src/ and un-tar it; you’ll get a folder named linalg-0.3.2. cd into that folder, and you should be able to use

sudo ruby install.rb

…but you can’t, actually. This builds most of the files you need, but the installation eventually stalls because two .so files are missing: ext/linalg/linalg.so and ext/lapack/lapack.so. These are “Shared Object” files, akin to Windows DLLs, but the Makefiles in these directories define the DLLIB macro with the .bundle extension (the Mac OS X convention for loadable Ruby extensions), so linalg.bundle and lapack.bundle are what get built.

So, we brute-force it by breaking the process down into “make” and “install”, and in between we create those .so files.

ruby install.rb make
cp ext/linalg/linalg.bundle ext/linalg/linalg.so
cp ext/lapack/lapack.bundle ext/lapack/lapack.so
sudo ruby install.rb install
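The two cp lines can also be scripted, which helps if you rebuild more than once. This is a hedged sketch: copy_bundles_to_so is a hypothetical helper, it assumes you pass it the linalg-0.3.2 directory after running ruby install.rb make, and it stops with an error if a .bundle file is missing.

```ruby
require 'fileutils'
require 'tmpdir'

# Extension directory => base name of the file the Makefile builds.
EXTENSIONS = { "ext/linalg" => "linalg", "ext/lapack" => "lapack" }

# Copy NAME.bundle to NAME.so in each extension directory so that
# install.rb finds the file it expects; returns the .so paths.
def copy_bundles_to_so(linalg_root)
  EXTENSIONS.map do |dir, name|
    bundle = File.join(linalg_root, dir, "#{name}.bundle")
    raise "missing #{bundle}; run `ruby install.rb make` first" unless File.exist?(bundle)
    so = File.join(linalg_root, dir, "#{name}.so")
    FileUtils.cp(bundle, so)
    so
  end
end

Dir.mktmpdir do |root| # stand-in for /usr/local/src/linalg-0.3.2
  EXTENSIONS.each do |dir, name|
    FileUtils.mkdir_p(File.join(root, dir))
    File.write(File.join(root, dir, "#{name}.bundle"), "fake binary")
  end
  puts copy_bundles_to_so(root)
end
```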

If you don’t trust this hack, put in ruby install.rb test before the install task to verify that everything works. I’m not sure why the package tries to install an .so file its own makefiles don’t build; if someone can figure that out and patch it, I’m sure the maintainers would love to know.

If you find any obvious errors in this, or see some steps we can skip, feel free to comment and we’ll make edits. Hopefully this will come in handy for someone.

Update, 7 September 2008: Be sure to read through the comments to where James Lawrence, linalg’s maintainer, points out the new (as of yesterday) 1.0.0 release which resolves most of these problems. If you’re struggling with linalg and aren’t using 1.0.0, try that new version.

Restarting Mongrel clusters with Capistrano 2

Saturday, December 8th, 2007

There’s (still) a glitch between the mongrel_cluster gem and Capistrano 2 (we’re using mongrel_cluster 1.0.5 and Capistrano 2.1.0, for reference) where the application restart at the end of a cap deploy fails with an error like this:

Couldn't find any pid file in '/var/www/[application]/current/tmp/pids' matching 'dispatch.[0-9]*.pid'

I’m not sure what’s causing this, but the solution comes at the end of this post, under the heading “Restart Mongrel.” Due to issues with Cap 2 and sudo, though, the provided script fails for us. We’re running the updates as root (bad idea, but it gets around the cap sudo issues) so I updated the task like this:

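# Restart task: override Capistrano 2's default deploy:restart so that it
# restarts the whole cluster with mongrel_rails, pointed at the cluster's
# YAML config (assumes :application is set earlier in deploy.rb).

set :mongrel_config, "/etc/mongrel_cluster/#{application}.yml"

namespace :deploy do

	task :restart do
		run "mongrel_rails cluster::restart -C #{mongrel_config}"
	end

end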

I also commented out the “:mongrel_conf” variable from our previous configuration, which it appears that Capistrano was ignoring anyway.

More elegant Mongrel restarts

Thursday, November 15th, 2007

Having explained our hack for bringing back Mongrel after a server crash, I discovered that our hosting company has a different approach. Their method has the advantage of not requiring scripts with hard-coded paths; on the other hand, it does require you to patch Mongrel itself, which makes things interesting come upgrade time. (Then again, a future Mongrel release may incorporate this patch, or avoid the problem some other way.)

Their method patches Mongrel to test whether the processes enumerated in the PID file(s) are actually still running. If the process is dead (that is, the PID file belongs to a Mongrel instance which died in a server crash), the file is declared “stale” and cleared, allowing Mongrel to start up properly; if the process exists, the PID file is not stale, and Mongrel aborts startup as originally designed.
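The heart of that approach is a liveness check. Here is a hedged Ruby sketch of the idea, not the hosting company’s actual patch: Process.kill with signal 0 asks the operating system whether a process exists without sending it anything. stale_pid_file? and clear_stale_pid_file are hypothetical names. One caveat: after a reboot, an unrelated process may have been assigned the old PID, in which case this check wrongly treats the file as live.

```ruby
require 'tmpdir'

# True if the PID file names a process that no longer exists.
def stale_pid_file?(path)
  return false unless File.exist?(path)
  pid = File.read(path).strip.to_i
  return true if pid <= 0 # empty or garbage file
  Process.kill(0, pid)    # signal 0: existence check only
  false
rescue Errno::ESRCH
  true                    # no such process: the file is stale
rescue Errno::EPERM
  false                   # process exists but belongs to another user
end

# Delete the PID file only if it is stale; returns true if it was deleted.
def clear_stale_pid_file(path)
  return false unless stale_pid_file?(path)
  File.delete(path)
  true
end

Dir.mktmpdir do |dir|
  pidfile = File.join(dir, "mongrel.8000.pid")
  File.write(pidfile, Process.pid.to_s) # our own PID, so definitely alive
  puts clear_stale_pid_file(pidfile)    # prints false; the file is kept
end
```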

The method is detailed here (scroll down to the heading “Stale PID files preventing Mongrel to start up,” ungrammatical though it may be).

Bringing Mongrel back from a server crash

Wednesday, November 14th, 2007

When a server crashes, Mongrel (or a Mongrel cluster) obviously doesn’t get a chance to shut down cleanly. This means it leaves behind the files it uses to store its process IDs. When the server restarts, the Mongrel startup script attempts to start the daemon(s), but on finding these PID files are already present, assumes (incorrectly) that Mongrel is already running, and cancels startup, saying, “PID file log/mongrel.pid already exists. Mongrel could be running already. Check your log/mongrel.log for errors.”

This is technically correct behavior (after all, what if Mongrel really is already running?), but it makes it nearly impossible to bring Mongrel back automatically after a server crash; one would have to manually delete the PID file(s) and then start the daemon(s).

If you don’t have systems administrators tending your websites 24/7, you need a better solution. We considered hacking the init script (found at /etc/rc.d/init.d/mongrel_cluster on our Fedora Core server) but found that the necessary logic made the script too complicated. Instead, we created a new startup script, filed at /etc/rc.d/init.d/mongrel_cleanup to solve the problem.

The mongrel_cleanup script is set to run at the same run-levels as mongrel_cluster. On shutdown or restart, it does nothing, but on start, it checks for the presence of the PID files and deletes them if they’re found. It therefore has to run before mongrel_cluster, which is why the priority number is 84 for startup and 16 for shutdown: mongrel_cluster is 85 and 15.

To use this script, save it in /etc/rc.d/init.d/mongrel_cleanup (or whatever the appropriate script directory is) and then put it in the startup queue with these commands:

# chkconfig --add mongrel_cleanup
# chkconfig --level 345 mongrel_cleanup on

Also, edit this script. I’ve hardwired the paths and names of our Mongrel cluster PID files; you will want to change your paths, or let me know if you come up with a more elegant method!

#!/bin/bash
#
# Parker Morse for Common Media, Inc., 9 November, 2007
#
# mongrel_cleanup      Startup script to recover from crashes.
#
# chkconfig: - 84 16
# description: A hack to clear PID files left behind by Mongrel clusters
#              after an unscheduled server crash. Checks for the presence
#              of these files and deletes them if found.
#

RETVAL=0
# Edit these two lines to match your application's PID file location
# and the ports your Mongrel cluster listens on.
PIDFILE_DIR=/path/to/app/current/log
PORTS="8000 8001 8002 8003"

case "$1" in
    start)
      for port in $PORTS; do
          PIDFILE="$PIDFILE_DIR/mongrel.$port.pid"
          # -f rather than -s, so an empty leftover PID file is removed too
          if [ -f "$PIDFILE" ]; then
              /bin/rm -f "$PIDFILE" || RETVAL=1
          fi
      done
  ;;
    stop|restart)
      # Nothing to clean up on shutdown or restart.
      exit 0
  ;;
    *)
      echo "Usage: mongrel_cleanup {start|stop|restart}"
      exit 1
  ;;
esac

exit $RETVAL
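One way to avoid hard-wiring the port numbers is to glob for the PID files instead. This Ruby sketch is hypothetical (clear_mongrel_pid_files is not part of Mongrel or mongrel_cluster) and assumes the cluster’s PID files follow the mongrel.PORT.pid naming used in the script. Like the shell script, it deletes unconditionally, so it belongs at boot time only, before mongrel_cluster starts.

```ruby
require 'tmpdir'

# Delete every mongrel.<port>.pid in pid_dir; returns the deleted paths.
# Meant to run only at boot, before mongrel_cluster starts.
def clear_mongrel_pid_files(pid_dir)
  Dir.glob(File.join(pid_dir, "mongrel.*.pid")).each do |path|
    File.delete(path)
  end
end

Dir.mktmpdir do |dir| # stand-in for /path/to/app/current/log
  %w[8000 8001 8002 8003].each { |port| File.write(File.join(dir, "mongrel.#{port}.pid"), "") }
  puts clear_mongrel_pid_files(dir).size # prints 4
end
```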

undefined method `conditions_by_like’ for Classname::Class

Friday, August 10th, 2007

If you get that perplexing error, just run:

rake db:migrate

The problem is that the database is out of sync with the models, and your schema.rb needs to be updated.

Followup 8/14/2007: Having these methods auto-generated is inconsistent at best. If you’d rather end the cat-and-mouse game and move on to larger problems, the code is available here (written by Alan Marcero).

Restarting Rails

Monday, June 18th, 2007

When I described our Capistrano implementation issues the other day, I was still missing one piece of the puzzle. Our deployment script would run cleanly, but there would be some errors at the end when Capistrano tried to “reap” the running server process and restart it. Because Dreamhost runs the server itself, the process ID of the server process isn’t found in the standard location, and Capistrano was unable to kill it and restart it.

Fortunately, nearly every problem on the ‘net has been found and solved before. I went back to the “Nuby on Rails” script which gave us the “Let Capistrano create current” tip, and found some code in their deploy.rb which handles the problem. With that covered, we’re really back to one-command deployments.

As an aside, something I’ve noticed as I climb the Rails learning curve is a startling number of weblogs professing to be “newbies learning Rails.” Many of them are giving good advice (anybody who can learn from their experience and pass it on is giving usable advice) but few of them are at the “newbie” level any longer, and I could argue that some of them never were. All of us are or were “new” to Rails and Ruby, but like us, I expect most of the authors of these “newbie” sites came to Rails with previous experience in LAMP, .Net, or some other framework. A true “newbie” blog, written by someone with no prior web programming experience (or perhaps no programming experience) would probably be painful to read, though.

I mostly mention this because I’m a little worried that experienced web developers feel a need to be self-deprecating about their skills with Rails. It’s fine if the culture of the framework is for everyone to adopt this “only-an-egg” approach, leaving their minds open for continual learning, but if everyone is disclaiming advanced knowledge in order to avoid responsibility for their suggestions, that’s a little creepy.

Deploying to Dreamhost with Capistrano

Saturday, June 16th, 2007

I fantasize about the day when we’ll be running on a co-located server where I’ll have full control of configuration, software installation, etc., but for the time being, we have shared hosting. (Actually, I fantasize about the day when we’ll be running a rack of servers behind a load-balancer. I’m not sure if the co-located box of our own is “starting small” or a case of even my fantasies having fantasies.)

Also, new as it is, the Rails community has already embraced a number of conventional-wisdom processes handed down as best practices; one of them is using Capistrano to automate deployment from the trunk of our Subversion repository to the production server. Or, as one of the tutorials I read began,

Yeah, we know. If you’re just getting started with Rails, and you’ve been reading all these great articles by all these experienced Rails developers, it’s pretty much inevitable. One day, you’re going to see this:

You ARE using Capistrano, aren’t you?

And you will say, well, no I’m not. I must be a complete idiot.

But it’s not really that simple. Maybe you’re not using Capistrano because getting it set up initially was a real pain.

When you first “Capistranize” a Rails app, you get a hefty deploy.rb file which is almost entirely useless to you and your application. The basic tutorial suggests a simpler file which is useful but still has problems on your actual deployment server. You go through the generated file, fixing configuration lines to match your installation (you think), run cap deploy, and… long error messages.

Here are three distinct steps we took to solve some of the error messages and get Capistrano working.

  • Remove the :db role: Capistrano attempts to make SSH connections to every host listed with a role in deploy.rb. If you’re with a shared host, odds are you don’t have shell access to your database machine. You’ll be prompted for a password, then get an error which looks like this:

    ** [update_code] exception while rolling back: Net::SSH::AuthenticationFailed, commonkitchen
    authentication failed for `commonkitchen'

    Commenting out the :db role is sufficient to stop Capistrano from trying to connect to that host, and lets you move on to the next problem. Update: not having that role keeps migrations from being run. Instead, set the :db role to be the same hostname as the other roles.

  • Use a “standard” repository layout: I say “standard” because you’re free to lay out your repository however you wish. However, we initially had all our app at the top level of our repository, and Capistrano was trying to find it in the trunk/ directory. Best to use the recommended layout described in the turtle book… or at least keep your application inside a trunk directory, even if you never use tags and branches. The errors are thick, but somewhere will be one that looks like this:

    /usr/local/lib/ruby/gems/1.8/gems/capistrano-1.4.1/lib/capistrano/scm/subversion.rb:24:in `latest_revision': Could not determine latest revision (RuntimeError)

    Ultimately, it’s easier to rearrange your repository than teach Capistrano to drop the trunk path.

  • Let Capistrano create the web root: When we set up the virtual host on our server, the root directory (~/application/current/public) was automatically created by that setup. Capistrano would then create the rollback directories, but because current already existed as a real directory, it couldn’t replace it with a symlink to the latest release. We got errors like this:

    command "~/application/current/script/process/reaper" failed on ourserver.example.com

    Once the current directory and its children were deleted, Capistrano was able to create its own current with a symlink to the most recent version, and everything was fine. (We figured this out from this tip.)
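A quick pre-flight check would have saved us some head-scratching. This Ruby sketch is hypothetical, not part of Capistrano: current_path_ok? passes if current is either absent (Capistrano can create the link) or already a symlink (Capistrano can repoint it), and fails if something else, such as a directory created by the host’s setup, is sitting there. The deploy_to directory stands in for ~/application.

```ruby
require 'fileutils'
require 'tmpdir'

# OK if "current" is missing or is a symlink; not OK if it is a real
# file or directory that Capistrano's symlink would collide with.
def current_path_ok?(deploy_to)
  current = File.join(deploy_to, "current")
  File.symlink?(current) || !File.exist?(current)
end

Dir.mktmpdir do |deploy_to| # stand-in for ~/application
  FileUtils.mkdir_p(File.join(deploy_to, "current", "public")) # what the host set up
  puts current_path_ok?(deploy_to) # prints false
end
```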

Now that Capistrano is working, it’s great to be able to deploy a revision with a single command. But getting there is less than simple, unfortunately.

Update: There’s a fourth tip I missed.

Update, 22 July 2007: Capistrano 2.0 is out now, so it’s worth noting that all the notes above were based on version 1.4.1. As of today, we haven’t upgraded yet (nor are we hosted on Dreamhost any longer) so use this information at your own risk.