EuroSys 2012 blog notes

EuroSys 2012, one of the premier European systems conferences, took place last week. Over at the Cambridge Systems Research Group’s blog, various people from the group have written notes on the papers presented. The summaries are very well written, and worth checking out for an overview of the research presented.

FLP and CAP aren't the same thing

An interesting question came up on Quora this past week. Roughly speaking, the question asked how, if at all, the FLP theorem and the CAP theorem were related. I’d thought idly about exactly the same question myself before. Both theorems concern the impossibility of solving fairly similar fundamental distributed systems problems in what appear to be fairly similar distributed systems settings. The CAP theorem gets all the airtime, but FLP to me is a more beautiful result. Wouldn’t it be fascinating if both theorems turned out to be equivalent; that is, effectively restatements of each other?

[Read More]

Should I take a systems reading course?

A smart student asked me a couple of days ago whether I thought taking a 2xx-level reading course in operating systems was a good idea. The student, understandably, was unsure whether talking about these systems was as valuable as actually building them, and also whether, since his primary interest is in ‘distributed’ systems, he stood to benefit from a deep understanding of things like virtual memory.

[Read More]

How consistent is eventual consistency?

This page, from the ‘PBS’ team at Berkeley’s AMPLab, is quite interesting. It allows you to tweak the parameters of a Dynamo-style system and then, by running a series of Monte Carlo simulations, estimates how likely a read that follows a write is to be stale. Since the Dynamo paper appeared and really popularised eventual consistency, the debate has focused on a fairly binary treatment of its merits. Either you can’t afford to be wrong, ever, or it’s ok to have your reads be stale for a potentially unbounded amount of time. [Read More]
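To make the idea concrete, here is a minimal sketch of that kind of Monte Carlo estimate. It is not the PBS team’s code: the function name, the parameters and the choice of exponentially distributed one-way message delays are all assumptions made for illustration. A write goes to all N replicas and commits once W acknowledgements are back; a read starts t milliseconds after the commit and uses the first R responses; the read counts as stale if none of those R replicas had received the write when the read request reached it.

```python
import random


def stale_read_probability(n, r, w, t_ms, mean_delay_ms=5.0, trials=20000, seed=0):
    """Estimate P(stale read) for a read issued t_ms after a write commits.

    Hypothetical model: every one-way message delay is drawn independently
    from an exponential distribution with mean mean_delay_ms.
    """
    rng = random.Random(seed)

    def delay():
        return rng.expovariate(1.0 / mean_delay_ms)

    stale = 0
    for _ in range(trials):
        # Time at which each replica receives the write.
        write_arrival = [delay() for _ in range(n)]
        # Time at which each replica's ack reaches the coordinator.
        ack_arrival = sorted(wa + delay() for wa in write_arrival)
        # The write commits once the W-th ack has arrived.
        commit = ack_arrival[w - 1]
        read_start = commit + t_ms

        probes = []
        for i in range(n):
            request_delay = delay()
            response_delay = delay()
            # A replica answers with the new value only if the write
            # reached it before the read request did.
            fresh = write_arrival[i] <= read_start + request_delay
            probes.append((request_delay + response_delay, fresh))

        # The coordinator uses the first R responses to come back.
        probes.sort()
        if not any(fresh for _, fresh in probes[:r]):
            stale += 1
    return stale / trials


if __name__ == "__main__":
    for t in (0.0, 5.0, 20.0):
        p = stale_read_probability(n=3, r=1, w=1, t_ms=t)
        print(f"N=3 R=1 W=1 t={t:>4} ms: P(stale) ~ {p:.3f}")
```

Under this model, any configuration with R + W > N never returns a stale read, because at least one of the first R responders must be among the W replicas that acknowledged the write. For weaker settings such as R = W = 1, the estimated staleness falls as t grows, which is the middle ground between ‘never wrong’ and ‘stale for an unbounded time’.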

STM: Not (much more than) a research toy?

It’s a sign of how down-trodden the Software Transactional Memory (STM) effort must have become that the article (sorry, ACM subscription required) published in a recent CACM might have been just as correctly called “STM: Not as bad as the worst possible case”. The authors present a series of experiments that demonstrate that highly concurrent STM code beats sequential, single-threaded code. You’d hope that this had long ago become a given, but all this demonstrates is that, hey, STM allows some parallelism. [Read More]

The Theorem That Will Not Go Away

The CAP theorem gets another airing. I think the article makes a point worth making again, and makes it fairly well - that CAP is really about P => ~(C & A). A couple of things I want to call out though, after a rollicking discussion on Hacker News. “For a distributed (i.e., multi-node) system to not require partition-tolerance it would have to run on a network which is guaranteed to never drop messages (or even deliver them late) and whose nodes are guaranteed to never die. [Read More]

CAP confusion: Problems with Partition Tolerance

Over on the Cloudera blog I’ve written an article that should be of interest to readers of this blog. I’m no great fan of the ubiquity of the CAP theorem - it’s a solid impossibility result which appeals to the theorist in me, but it doesn’t capture every fundamental tension in a distributed system. For example: we make our systems distributed across more than one machine usually for reasons of performance and to eliminate a single point of failure. [Read More]

Apache ZooKeeper is looking for Google Summer of Code applicants

Students! Over at Apache ZooKeeper we’re looking for great students with a strong interest in distributed systems to work with us over the summer as part of Google’s Summer of Code, 2010. Summer of Code is a great program - providing stipends to students and, more importantly, connecting them with mentors in open source projects. ZooKeeper has a number of interesting projects to get started on. ZooKeeper is a distributed coordination platform on which you can build the distributed equivalents of many traditional concurrency primitives like locks, queues and barriers. [Read More]

GFS Retrospective in ACM Queue

This is a really great article. Sean Quinlan talks very openly and critically about the design of the Google File System given ten years of use (ten years!). What’s interesting is that the general sentiment seems to be that the concessions that GFS made for performance and simplicity (single master, loose consistency model) have turned out to be net bad decisions, although they probably weren’t at the time. There are scaling issues with GFS - the well-known many-small-files problem that also plagues HDFS, and a similar huge-files problem. [Read More]

SOSP 2009 Program Available

The accepted papers for SOSP 2009 are here. As ever, some excellent looking papers. If you search for the titles you can often turn up drafts or even the submitted versions. The best looking sessions to me are ‘scalability’ and ‘clusters’, but there’s at least one great looking title in every session. I’ll start posting some reviews once I find some bandwidth (and have finished the computation theory series - next one on its way). [Read More]