Testing Rails Memory Usage With Valgrind
TL;DR: Ruby 1.9.2 leaks memory with Rails apps; switch to 1.9.3.
We’ve been running out of memory on our production servers ever since we upgraded our app to Rails 3.1 and Ruby 1.9.2 (p290). Our Unicorn processes gradually use up all the memory on our m1.large servers (7.5GB) over the course of roughly 24 hours.
Today one of our developers came across some topics (1, 2) on Stack Overflow, as well as a post on HN, suggesting it might be a bug in Ruby 1.9.2. I wanted to test that claim empirically, so I needed a tool that could measure memory consumption.
After a little googling, I was reminded of Valgrind, a suite of tools for dynamically analyzing programs. It includes an excellent memory error detector, Memcheck, which also reports memory leaks.
I had some trouble getting it to run and honestly got a little frustrated, but Evan Weaver’s blog post about testing Ruby with Valgrind at least gave me hope it was possible. I eventually trawled through enough of the (excellent) Valgrind documentation to find that running valgrind with --trace-children=yes gave me the full results from testing my Rails app.
Setup
I used Valgrind 3.7.0, which I built from source. Ubuntu ships version 3.6.1, which should work the same; you can get it by simply running sudo apt-get install valgrind. Otherwise it’s easy to install from source.
I’m using RVM to manage my rubies. As of 1.9, Ruby supports Valgrind internally via the --with-valgrind configure option. It’s apparently on by default, but if you’re paranoid you can pass the configure flag explicitly with RVM by doing:
RVM’s -C option passes subsequent options to the configure script.
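With RVM that looks something like rvm install 1.9.3 -C --with-valgrind (the version there is only an example). To check afterwards which flags an installed Ruby was configured with, Ruby records them in RbConfig::CONFIG["configure_args"]. The sketch below assumes that key is populated for your build, as it normally is for source builds, and simply looks for the flag; the helper name configured_with_valgrind? is hypothetical. Because Valgrind support is reportedly enabled by default when the Valgrind headers are found, a false result only means the flag wasn't passed explicitly.

```ruby
require "rbconfig"

# Hypothetical helper: true if "--with-valgrind" appears in the configure
# arguments this Ruby was built with. A false result does NOT prove Valgrind
# support is missing; it only means the flag wasn't passed explicitly.
def configured_with_valgrind?(configure_args = RbConfig::CONFIG["configure_args"])
  configure_args.to_s.include?("--with-valgrind")
end

puts "configure_args: #{RbConfig::CONFIG['configure_args']}"
puts "--with-valgrind passed explicitly: #{configured_with_valgrind?}"
```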
To test Ruby more realistically, I wrote a script that exercises my application moderately. It’s a simple script that just loads some data into memory:
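A hypothetical stand-in for that kind of script is sketched below in plain Ruby, so it runs without the app: instead of loading records through the app’s models (which is what the real script would do), it builds a large in-memory data set so the interpreter performs plenty of allocation and GC work for Valgrind to observe. The method name exercise_memory and the record shape are assumptions. Inside a Rails 3.1 app, a script like this would typically be run with rails runner.

```ruby
# Hypothetical stand-in for an app-specific test script: builds a large
# in-memory data set so Ruby does lots of allocation and GC work.
def exercise_memory(record_count = 100_000)
  records = Array.new(record_count) do |i|
    { id: i, name: "record-#{i}", tags: %w[red green blue].map { |t| "#{t}-#{i}" } }
  end
  GC.start # trigger a full GC run while all the records are still live
  # Touch every record and return a simple checksum (total tag count).
  records.inject(0) { |sum, r| sum + r[:tags].size }
end

puts "tags built: #{exercise_memory}" if __FILE__ == $PROGRAM_NAME
```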
To actually execute valgrind, I ran:
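A plausible shape for that invocation, written as a small Ruby helper that builds the command line, is sketched below. Only --trace-children=yes is confirmed above; --tool=memcheck, --leak-check=full and --log-file are standard Valgrind options, but whether they were used here is an assumption, as are the helper name valgrind_command and the script name memory_test.rb. Without --trace-children=yes, Valgrind stops tracing when a process execs another program, so when Ruby is launched through a wrapper (a binstub or bundle exec, for example) the real work goes unanalyzed. The %p in the log file name expands to the process ID, which keeps each traced process’s report in its own file.

```ruby
# Hypothetical helper: builds a Valgrind (Memcheck) command line for a Ruby script.
def valgrind_command(script, ruby = "ruby", log_pattern = "valgrind-%p.log")
  [
    "valgrind",
    "--tool=memcheck",           # Memcheck is Valgrind's default tool; made explicit here
    "--trace-children=yes",      # keep tracing across exec (e.g. wrapper scripts that exec ruby)
    "--leak-check=full",         # report individual leaks, not just summary counts
    "--log-file=#{log_pattern}", # %p expands to the PID: one log per traced process
    ruby,
    script
  ]
end

cmd = valgrind_command("memory_test.rb")
puts cmd.join(" ")
# To actually run it (requires valgrind on the PATH):
# system(*cmd)
```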
Testing
Valgrind outputs a HEAP SUMMARY and a LEAK SUMMARY at the end of program execution. These are what I focused on to compare memory behavior between 1.9.2 and 1.9.3.
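When --trace-children=yes produces several per-process logs, it helps to pull the summary numbers out mechanically. The sketch below assumes the Valgrind 3.x layout, where each summary line reads "label: N bytes in M blocks" (for example "definitely lost: 1,024 bytes in 2 blocks"), and sums every matching line, so concatenated logs from several processes add up. The method name leak_summary and the valgrind-*.log file names are hypothetical.

```ruby
# Matches HEAP/LEAK SUMMARY lines such as
# "==123==    definitely lost: 1,024 bytes in 2 blocks".
# Per-leak records ("... blocks are definitely lost in loss record ...") don't match.
SUMMARY_LINE = /(in use at exit|definitely lost|indirectly lost|possibly lost|still reachable|suppressed):\s+([\d,]+) bytes in ([\d,]+) blocks/

# Returns e.g. { "definitely lost" => { bytes: 1040, blocks: 3 }, ... },
# summing over every matching line in the given log text.
def leak_summary(log_text)
  totals = {}
  log_text.each_line do |line|
    m = SUMMARY_LINE.match(line)
    next unless m
    entry = (totals[m[1]] ||= { bytes: 0, blocks: 0 })
    entry[:bytes]  += m[2].delete(",").to_i
    entry[:blocks] += m[3].delete(",").to_i
  end
  totals
end

if __FILE__ == $PROGRAM_NAME
  # Hypothetical log names, matching a --log-file=valgrind-%p.log pattern.
  text = Dir["valgrind-*.log"].map { |f| File.read(f) }.join
  leak_summary(text).each do |kind, t|
    puts "#{kind}: #{t[:bytes]} bytes in #{t[:blocks]} blocks"
  end
end
```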
Here’s the raw output:
1.9.2-p290 Results:
1.9.3-p0 Results:
Analysis
As you can see, 1.9.2 leaked around 51MB, whereas 1.9.3 leaked only 6.5KB. Counted in blocks, 1.9.2 leaked roughly 100,000 times as much.
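For comparison, the byte totals alone give a smaller but still dramatic ratio. Assuming 51M and 6.5k mean megabytes and kilobytes, with either decimal or binary prefixes:

```latex
\frac{51 \times 10^{6}\ \text{bytes}}{6.5 \times 10^{3}\ \text{bytes}} \approx 7.8 \times 10^{3}
\qquad
\frac{51 \times 2^{20}\ \text{bytes}}{6.5 \times 2^{10}\ \text{bytes}} = \frac{51 \times 1024}{6.5} \approx 8.0 \times 10^{3}
```

So by bytes, 1.9.2 leaked roughly 8,000 times as much. The 100,000x figure is a block count; a block ratio larger than the byte ratio would suggest that the 1.9.2 leak consists of many relatively small allocations.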
The only thing that changed between these runs was the Ruby version. There definitely appears to be a memory leak when running (at least) our Rails 3.1 app on Ruby 1.9.2. We’ll be upgrading to 1.9.3 as soon as possible.
Additionally, our app eats about 6.6GB of memory performing this simple test.
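A cheap cross-check that doesn’t need Valgrind is to have a script report its own resident set size just before it exits. The sketch below assumes a Linux host, where /proc/self/status contains a line like "VmRSS:  123456 kB". RSS measures what the process is actually holding, which is not necessarily the same quantity as Valgrind’s heap figures, so the numbers need not match. The method name rss_kb is hypothetical.

```ruby
# Extracts the resident set size (in kB) from the text of /proc/<pid>/status.
# Linux-specific; returns nil if there is no VmRSS line.
def rss_kb(status_text)
  line = status_text.each_line.find { |l| l.start_with?("VmRSS:") }
  line && line[/\d+/].to_i
end

status_path = "/proc/self/status"
if File.readable?(status_path)
  kb = rss_kb(File.read(status_path))
  puts "RSS: #{kb} kB (~#{(kb / 1024.0).round(1)} MB)" if kb
end
```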
I also ran a runner script with no Ruby operations (it just loads the environment and quits), and the app still consumed 4.3GB. We definitely need to look into what’s loading all that data. For reference, here’s the abbreviated output of rake stats: