## Paper notes: DB2 with BLU Acceleration

### DB2 with BLU Acceleration: So Much More than Just a Column Store

Raman et al., VLDB 2013

#### The big idea

IBM’s venerable DB2 was built on a traditional row-based engine. By moving to a columnar execution engine, and crucially by then taking full advantage of the optimisations that columnar formats allow, the ‘BLU Acceleration’ project sped up read-mostly BI workloads by 10 to 50 times.

## ByteArrayOutputStream is really, really slow sometimes in JDK6

TLDR: Yesterday I mentioned on Twitter that I’d found a bad performance problem when writing to a large ByteArrayOutputStream in Java. After some digging, it appears that there’s a bad bug in JDK6 that doesn’t affect correctness, but does cause performance to nosedive when a ByteArrayOutputStream gets large. This post explains why.

## On Raft, briefly

Raft is a new-ish consensus implementation whose great benefit, to my mind, is its applicability to real systems. We briefly discussed it internally at Cloudera, and I thought I’d share what I contributed, below. There’s an underlying theme here regarding the role of distributed systems research in practitioners’ daily work, and how the act of building a distributed system has not yet been sufficiently well commoditised to render a familiarity with the original research unnecessary.

## Some miscellanea

CAP FAQ: I wrote an FAQ on the CAP theorem. The aim is to definitively settle some of the common misconceptions around CAP so as to help prevent its invocation in useless places. If someone says they got around CAP, refer them to the FAQ. It should be a pretty simple introduction to the theorem as well. I think that CAP itself is a pretty uninteresting result, but it does at least shine a light on tradeoffs implicit in distributed systems.

## Columnar Storage

You’re going to hear a lot about columnar storage formats in the next few months, as a variety of distributed execution engines are beginning to consider them for their IO efficiency, and the optimisations that they open up for query execution. In this post, I’ll explain why we care so much about IO efficiency and show how columnar storage - which is a simple idea - can drastically improve performance for certain workloads.

Caveat: This is a personal, general research summary post, and as usual doesn’t necessarily reflect our thinking at Cloudera about columnar storage.

Disks are still the major bottleneck in query execution over large datasets. Even a machine with twelve disks running in parallel (for an aggregate bandwidth of north of 1GB/s) can’t keep all the cores busy; running a query against memory-cached data can get tens of GB/s of throughput. IO bandwidth matters. Therefore, the best thing an engineer can do to improve the performance of disk-based query engines (like RDBMSs and Impala) is usually to improve the performance of reading bytes from disk. This can mean decreasing the latency (for small queries where the time to find the data to read might dominate), but most often it means improving the effective throughput of reads from disk.

The traditional way to improve disk bandwidth has been to wait, and allow disks to get faster. However, disks are not getting faster very quickly (having settled at roughly 100 MB/s, with ~12 disks per server), and SSDs can’t yet achieve the storage density to be directly competitive with HDDs on a per-server basis.
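To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The disk count and per-disk rate come from the figures above; the 1 TB dataset size and the 20 GB/s in-memory rate are illustrative assumptions (the text only says "tens of GB/s"), not measurements.

```python
# Back-of-the-envelope scan times. Only the disk figures come from the post;
# the dataset size and memory bandwidth are assumed for illustration.
DISKS_PER_SERVER = 12        # from the text
MB_PER_SEC_PER_DISK = 100    # from the text (roughly)
MEMORY_MB_PER_SEC = 20_000   # assumption: "tens of GB/s" taken as 20 GB/s


def aggregate_bandwidth_mb_s(disks=DISKS_PER_SERVER, per_disk=MB_PER_SEC_PER_DISK):
    """Aggregate sequential read bandwidth with all disks streaming in parallel."""
    return disks * per_disk


def scan_seconds(dataset_gb, bandwidth_mb_s):
    """Time to read a dataset end to end at the given bandwidth (1 GB = 1000 MB)."""
    return dataset_gb * 1000 / bandwidth_mb_s


if __name__ == "__main__":
    dataset_gb = 1000  # hypothetical 1 TB table
    disk_bw = aggregate_bandwidth_mb_s()
    print(f"aggregate disk bandwidth: {disk_bw} MB/s")
    print(f"full scan from disk:   {scan_seconds(dataset_gb, disk_bw):.0f} s")
    print(f"full scan from memory: {scan_seconds(dataset_gb, MEMORY_MB_PER_SEC):.0f} s")
```

Under these assumptions a full scan from disk takes more than an order of magnitude longer than the same scan from memory, which is why the rest of this post concentrates on reading fewer bytes.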

The other way to improve disk performance is to maximise the ratio of ‘useful’ bytes read to total bytes read. The idea is not to read more data than is absolutely necessary to serve a query, so the useful bandwidth realised is increased without actually improving the performance of the IO subsystem. Enter columnar storage, a principle for file format design that aims to do exactly that for query engines that deal with record-based data.
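As a rough illustration of that ratio, the sketch below compares the bytes a full-table scan reads under a row layout and under a column layout when the query needs only one column. The schema, column widths, and function names are hypothetical, and the model ignores compression, seeks, and file metadata.

```python
# Toy model of 'useful' bytes versus total bytes read for one query.
# The schema and widths are made up for illustration: 200 bytes per row.
COLUMN_WIDTHS = {"user_id": 8, "country": 2, "revenue": 8, "comment": 182}


def bytes_read_row_store(num_rows, widths):
    """A row layout stores whole records together, so a scan reads every column."""
    return num_rows * sum(widths.values())


def bytes_read_column_store(num_rows, widths, needed):
    """A column layout stores each column contiguously, so a scan reads only what it needs."""
    return num_rows * sum(widths[c] for c in needed)


def useful_ratio(num_rows, widths, needed, columnar):
    """Fraction of the bytes read that the query actually uses."""
    useful = bytes_read_column_store(num_rows, widths, needed)
    total = useful if columnar else bytes_read_row_store(num_rows, widths)
    return useful / total


if __name__ == "__main__":
    rows = 1_000_000
    needed = ["revenue"]  # e.g. a query that only sums revenue
    raw_mb_s = 1200       # twelve disks at roughly 100 MB/s each
    for columnar in (False, True):
        ratio = useful_ratio(rows, COLUMN_WIDTHS, needed, columnar)
        layout = "column" if columnar else "row"
        print(f"{layout:>6} store: useful ratio {ratio:.2f}, "
              f"useful bandwidth ~{raw_mb_s * ratio:.0f} MB/s")
```

In this toy model the disks deliver the same raw bandwidth in both cases; only the share of it spent on bytes the query cares about changes.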

## Cloudera Impala

If you have a strong background in either databases or distributed systems, and fancy working on such an exciting technology, send me a note!

It’s great to finally be able to say something about what I’ve been working on at Cloudera for nearly a year. At StrataConf / Hadoop World in New York a couple of weeks ago we announced Cloudera Impala. Impala is a distributed query execution engine that understands a subset of SQL, and critically runs over HDFS and HBase as storage managers. It’s very similar in functionality to Apache Hive, but it is much, much, much (anecdotally up to 100x) faster.

## On some subtleties of Paxos

There’s one particular aspect of the Paxos protocol that gives readers of this blog - and for some time, me! - some difficulty. This short post tries to clear up some confusion on a part of the protocol that is poorly explained in pretty much every major description.