Wednesday, November 25, 2009

Scaling up is out, scaling out is in

One of the more interesting, if less visible, trends of the past half-decade is that clock speeds on modern CPUs have stagnated. I'm writing this post on my MacBook, which turns one year old next week. It's equipped with a 2GHz processor and 2GB of RAM. It's the first computer I've bought since 2002, when I built a ~1.2GHz Athlon system with 1GB of RAM. Instead of being a factor of 10 faster after 8 years, it's faster by a factor of less than two, and I don't think we're going to see much more improvement in clock speed in the future. Check out the graph below:

Since around 2002 clock speeds have held steady at about 2GHz. The primary constraint has been thermal. As processors moved into the multi-GHz range they started to dissipate up to 100W of heat, which becomes impractical to cool (ever had your legs burned by your laptop?). "Scaling up" clock speeds had hit a wall, so hardware engineers had to focus on other ways of making things faster. They did some increasingly clever things like superscalar execution (dispatching multiple instructions per clock cycle), new specialized instructions (SSE, etc.), hyperthreading (a single processor appearing as two processors to the OS), and then the logical conclusion, multi-core (multiple CPU cores in a single package). Performance now comes from "scaling out" to multiple cores and, if you're running a service, multiple machines.

The consequence of this shift from faster clock cycles to more processors is that, after decades of sitting on their asses and waiting for the next doubling of clock speeds to make up for their lazy coding, software engineers have to actually write code differently to get it to run fast. This could mean traditional optimization: rewriting existing code to run faster without fundamentally changing the approach to the problem. But increasingly it means following the way hardware is evolving and splitting the problem into independent pieces that can be executed simultaneously on multiple cores.
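As a toy illustration of "independent pieces" (this is not Kikini code, and the function names `partial_sum` and `parallel_sum` are made up for the example), here is a minimal shell sketch that sums the squares of 1..N by handing each chunk of the range to its own background process and combining the partial results at the end:

```shell
# partial_sum LO HI: prints the sum of i*i for LO <= i <= HI.
partial_sum() {
    lo=$1; hi=$2; acc=0
    i=$lo
    while [ "$i" -le "$hi" ]; do
        acc=$((acc + i * i))
        i=$((i + 1))
    done
    echo "$acc"
}

# parallel_sum N WORKERS: splits 1..N into WORKERS chunks, computes each
# chunk in a separate background process, then adds up the partial sums.
parallel_sum() {
    n=$1; workers=$2
    tmpdir=$(mktemp -d)
    chunk=$(( (n + workers - 1) / workers ))   # chunk size, rounded up
    w=0
    while [ "$w" -lt "$workers" ]; do
        lo=$(( w * chunk + 1 ))
        hi=$(( (w + 1) * chunk ))
        if [ "$hi" -gt "$n" ]; then hi=$n; fi
        # The chunks share no state, so the OS is free to schedule
        # each background process on a different core.
        partial_sum "$lo" "$hi" > "$tmpdir/part.$w" &
        w=$((w + 1))
    done
    wait                                       # block until every chunk is done
    total=0
    for f in "$tmpdir"/part.*; do
        total=$(( total + $(cat "$f") ))
    done
    rm -rf "$tmpdir"
    echo "$total"
}

parallel_sum 100 4   # prints 338350
```

The only coordination point is the final `wait` and sum. Real workloads rarely split this cleanly, which is exactly why the code has to be designed for it up front.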

To some degree the service we're building at Kikini can naturally take advantage of multiple cores, since we're serving many simultaneous requests. However, due to the transactional nature of databases, there is a limit to how much performance you can get by simply adding more cores. Write operations require locks, which can force other transactions to wait or fail, so even if you had infinite cores you'd still be constrained by how you design your database.

All this points to three main ways to achieve high performance:
  1. Optimize individual queries
  2. Design queries and the database schema to minimize locking to take advantage of multiple cores
  3. Partition data in clever ways to spread the load across multiple servers
Fundamental to this is to always be measuring, which is why it's important to have an automated system like I described earlier this month so that engineers can stay focused on the important stuff.
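To make the third point a little more concrete, one common way to partition data is to hash each record's key and use the result to pick a server. The sketch below is only an illustration under my own assumptions (the `shard_for` helper and the `user:42` key format are hypothetical, and this is not necessarily how we partition at Kikini). It uses the standard `cksum` utility as the hash:

```shell
# shard_for KEY NUM_SHARDS: prints a shard index in 0..NUM_SHARDS-1.
# The same key always maps to the same shard, so reads and writes for
# one key land on one server without any central lookup.
shard_for() {
    # cksum prints "<crc> <byte count>" for stdin; keep only the CRC.
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $(( crc % $2 ))
}

shard_for "user:42" 4
```

The trade-off with plain modulo hashing is that changing the number of shards remaps most keys, so it fits best when the shard count is fixed or changes rarely.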

Sunday, November 22, 2009

Working Around JSVC's Logging Limitations

JSVC is a popular option for people using Tomcat as their web container. The main advantage of JSVC is that it lets the process start as root, open a port below 1024 (which most Linux systems allow only for the root user), and then downgrade to an unprivileged user. It also acts as a watchdog to restart the JVM if it crashes. However, one big problem with JSVC is that it can only write the output of the JVM it's hosting to two files on the filesystem, corresponding to stdout and stderr. This is problematic since it doesn't allow for log rotation or any other form of redirection.

At Kikini, we created a logging solution that appends log statements to SimpleDB so that logs from all our machines end up in a central location, unbounded by normal filesystem limits, and easy to query and monitor, allowing us to react quickly to diagnose problems. The simplest way to use our logger is to redirect the output from the target process to the stdin of our logging process. However, JSVC makes this rather difficult since it is hard-coded to only write to files on the filesystem.

Fortunately we have a trick up our sleeve in the form of UNIX named pipes, which we can use as a target for JSVC to write to and a source for the logger to read from:
mkfifo pipe.out
mkfifo pipe.err
/usr/bin/ STDOUT < pipe.out
/usr/bin/ STDERR < pipe.err
/usr/bin/jsvc -outfile pipe.out -errfile pipe.err ...
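Now JSVC will start up and write into the pipes we created, and everything it writes will be read by the mylogger processes. Note that the two logger commands need to be running in the background (for example, started with a trailing &) before JSVC is launched, because opening a named pipe for reading blocks until a writer opens the other end.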
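If you want to try the named-pipe trick without JSVC or our logger installed, here is a self-contained sketch. Everything in it is a stand-in I made up for the demo: `sed` plays the role of the logger, the `fake_service` function plays the role of JSVC (a program that only knows how to write to two file paths), and `combined.log` stands in for the central log store.

```shell
workdir=$(mktemp -d)
mkfifo "$workdir/pipe.out" "$workdir/pipe.err"

# Stand-in loggers: each reads one pipe on stdin, tags every line, and
# appends it to a shared log. They run in the background because opening
# a FIFO for reading blocks until some writer opens it.
sed 's/^/STDOUT: /' < "$workdir/pipe.out" >> "$workdir/combined.log" &
sed 's/^/STDERR: /' < "$workdir/pipe.err" >> "$workdir/combined.log" &

# Stand-in for JSVC: it is handed two paths and simply writes to them,
# unaware that they are pipes rather than regular files.
fake_service() {
    echo "server started" > "$1"
    echo "something went wrong" > "$2"
}
fake_service "$workdir/pipe.out" "$workdir/pipe.err"

# The loggers see end-of-file once the writer closes each pipe.
wait
cat "$workdir/combined.log"
```

Both tagged lines end up in `combined.log`; their relative order is not guaranteed, since the two readers run independently. Remove `$workdir` when you're done.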

Friday, November 13, 2009

Using Maven Chronos Without an External JMeter Install

Performance is one of the things we're really focused on at Kikini. But we want to stay focused on actually improving performance, rather than spending a lot of cycles making manual measurements and interpreting logs. JMeter is probably the best open-source tool out there for measuring the performance of a web application. I designed a JMeter test plan to simulate users visiting our site. Unfortunately, while JMeter is great at making measurements, it stops short of data analysis and reporting.

Ideally we would like to get perf reports out of every build, which means doing the reporting as part of our Maven build, with results available as easily readable charts on our build server. The top hit you're likely to get from searching for "maven jmeter" is the awful JMeterMavenPlugin. I say awful because it wasn't easy to integrate, and if you look at the source code it's obvious that the project was done in spare time. There are a number of comments in the source like "this mess is necessary because..." which make me think the whole thing is poorly designed, and if you search around you will indeed find a number of problems people have encountered trying to use it. Finally, the output from the plugin is just the plain JMeter log, and not the reports I'd like.

All the way down in the middle of the second page of the Google results I found this gem: chronos-maven-plugin. Not only does this look like a well-designed and well-executed project, it produces wonderful HTML reports, perfect for plugging into our build server! This is a snippet of what the Chronos output looks like:

The only downside is that the Chronos plugin requires an external install of JMeter, which kind of defeats the whole purpose of Maven. Fortunately, inspired by an Atlassian post, I worked out a way to use the Chronos plugin without making JMeter a manual install by using the maven-dependency-plugin. First I deployed the JMeter ZIP file as an artifact on our Artifactory repository:

<?xml version="1.0" encoding="UTF-8"?>
  <description>Artifactory auto generated POM</description>

In my POM, I set jmeter.home to the location that we'll be unpacking JMeter into:


Next I use the dependency plugin in the pre-integration-test step to unpack JMeter into the target folder:


Finally I configure Chronos to run:


Bingo. Now anyone running our build can get the JMeter performance reports with nothing more complex than running "mvn verify chronos:report".