Apache ZooKeeper is an invaluable tool for
distributed coordination. However, when I began working
for Netflix, I found a lot of interest in ZooKeeper but not a lot of
experience. A few trials had been done but none of them made it out of the
testing sandbox. For that reason, Curator was initially conceived as a way to
make ZooKeeper easier to use for non-experts. The original versions of what
would become Curator consisted of a small library meant for internal use, but
it quickly became clear that it would be useful to others outside Netflix.
Over a period of time, ZooKeeper has become very
popular at Netflix. Via Curator, some of the uses are:
- InterProcessMutex for ensuring unique values in various sequence ID generators
- Cassandra Backups
- LeaderSelector for various housekeeping tasks in Chukwa collector
- InterprocessSemaphore to manage third party services that allow only a limited number of concurrent users
- Managing various caches
- Much more all the time
Zookeeper challenges
Here’s a ZooKeeper pop quiz – what’s wrong with
this code?
Answer: if the call to client.create() happens
before the ZooKeeper client successfully connects to the server, the create()
method will throw an exception.
New users of ZooKeeper are surprised to learn that
a significant amount of connection management must be done manually. For
example, when the ZooKeeper client connects to the ensemble it must negotiate a
new session and so on. This takes some time. If you use a ZooKeeper client API
before the connection process is complete, ZooKeeper will throw an exception.
These types of exceptions are referred to as “recoverable” errors.
Alternate options
Prior to Curator, options for working with Curator
were limited to the raw ZooKeeper class provided in the distribution and the
open source ZKClient. With either of these options, most of the work of writing
recipes would still have to be done manually. The ZooKeeper distribution
includes some pre-built recipes but there is no generalized framework for
creating new recipes and the provided implementations are missing some known
best practices.
Today, it’s a much brighter picture for using
ZooKeeper. If you are using a JVM-based language, Curator is the best way to
connect to ZooKeeper. For Python, there’s Kazoo, etc.
Curator architecture
Curator is a platform for writing ZooKeeper Recipes.
It’s built as a stack of three modules that make writing recipes much simpler –
no need to worry about connection management, edge case, etc.
- At the bottom of the stack is Curator Client.
This module is a wrapper around the native ZooKeeper class that adds connection
management and a framework for retries.
- The next layer is Curator Framework. It takes
advantage of Curator Client’s connection management and retry mechanism. It
adds a nice fluent API for all ZooKeeper interactions and performs them inside
of a retry loop. The beauty here is that when using ZooKeeper via the Curator
Framework you no longer need to be concerned with managing the ZooKeeper
connection or handling “recoverable” errors.
- On top of the Curator Framework, the Curator
Recipes module provides implementations for all of the recipes listed on the
ZooKeeper recipes page plus many additional recipes. Some of the most popular
recipes are: LeaderSelector (a leader election recipe), InterProcessMutex (a
distributed lock recipe) and PathChildrenCache (a local, updating cache of a
ZooKeeper ZNode’s children).
Curator Recipes also serve as examples of how to
use the Curator Framework for writing new recipes. However, writing ZooKeeper
recipes, even with Curator, is still a challenge. I’m fond of saying that
“friends don’t let friends write ZooKeeper recipes”. Always see if there’s a
Curator recipe that meets your needs or, if necessary, build on an existing
recipe.
Hungry for more?
Curator usage continues to grow and is now in
the Apache Incubator.
We invite you to participate and contribute via curator.incubator.apache.org.
About the author:
Jordan Zimmerman is a Senior Software Engineer on the Platform Team at Netflix.
He has contributed to open source software throughout his career. At Netflix he’s opened Curator, Exhibitor and Governator and maintains the Netflix Open Source hub at http://netflix.github.com. In his spare time he plays drums (bebop and big band), listens to opera (mostly Puccini), drinks fine whisky (Laphroaig and Buffalo Creek) and watches soccer (ManU) and Epic Meal Time. |
Hi Jordan, we had the pleasure of hearing Adrian mentioning Curator within his talk @C*Summit a fortnight ago in SFO and great to see that i's gathering momentum within the Incubator.
ReplyDeleteCan you please tell me/us more about Curator and how it manages/schedules your Cassandra backup. I was knocked for six when I saw the infrastructure you guys work with @Netflix so I am interested to know about this. Thanks for the post. Great work.
I don't have details on the Cassandra backup use of ZooKeeper. I'll ask around and try to get back to you.
ReplyDelete