Prologue
Check out "Event-driven Data or: How I Learned to Stop Living in the Present and Travel Through Time" by Zach Lark for a story about the data model that drove most of our research and performance testing.
Before I get into the setup, I want to clear the air. This isn't some writeup about how great Apache Cassandra
is as a database or how NoSQL is the new world order. This is a look
back at the long process by which we proved to ourselves and our
colleagues that Cassandra was a great fit for our project. What I hope
you, the reader, can take away from this is a sense of how we approached
a challenge of this scale and nebulousness, and how we avoided feeling
overwhelmed by breaking the project up into fun little chunks. Our
actual metrics will be largely irrelevant here.
To quickly introduce Cassandra, I'll say it's an open source
distributed database, like a clustered, load-balanced web service that
happens to store and retrieve data. Each instance of the database
running on a server is a node, and when networked together they form a ring or cluster. The distributed nature of the system can offer a high level of resilience, but brings with it a set of interesting challenges.
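To make that a little more concrete, here's a minimal sketch of what talking to a cluster looks like from Java with the DataStax driver. The contact points are placeholders, and the driver discovers the rest of the ring from whichever nodes it can reach.
// Minimal connection sketch using the DataStax Java driver (3.x API).
// Contact points are placeholders, not anything from our actual setup.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class ClusterHello {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")   // any node can serve as a contact point
                .addContactPoint("10.0.0.2")
                .build();
        Session session = cluster.connect();
        ResultSet rs = session.execute("SELECT release_version FROM system.local");
        System.out.println("Connected, Cassandra version: "
                + rs.one().getString("release_version"));
        cluster.close();
    }
}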
Choosing Our Own Adventure
Another developer and I were given an opportunity to explore new database technologies for a project that was years in the making, but not yet built on any platform. We'd been building applications on top of the Oracle Relational Database Management System (RDBMS) for as long as either of us had been at the company, so this was a "Big Deal." We were, in essence and seemingly out of the blue, given carte blanche to find the database that was right for the project. Imagine, for a moment, being told that cost is not a factor, and that you should base your decision on pure, objective, technical analysis. Exciting, right?
It should be noted, though, that a task granting this much freedom assigns equal responsibility. We were not simply choosing a database. Rather, we were equipping our managers with the information they would need to make an impactful decision for the company, along with procurement folk, executives, a DBA team with decades of combined Oracle experience, and dozens of other people used to certain kinds of software running on certain kinds of hardware. Here's our little success story.
Enabling the Decision
The first thing we read online about any database was either
a raving review from a diehard evangelist, a scathing diatribe from a
fan of something else, or blanket statements about the product's
superiority from the company's own marketing team. A dozen self-proclaimed
"fastest and most reliable" databases made for a frustrating, unnavigable landscape,
so our first goal was to split up, cut through the BS, and attempt to
understand the technologies. As a mad genius once put it, "Instead of
seeing what they want you to see, you gotta open your brains to the
possibilities."
We needed to apply each database's recommended best
practices to our particular data model, query patterns, volume, and peak
load. We catalogued driver availability, support agreements, and
hardware and network topology requirements. High availability,
performance, security, and ease of administration were top priorities.
To bring all that and more together, we kept a journal for the duration
of the project detailing all of our research and findings on our
company's wiki, so nobody could claim we didn't "do our due diligence."
After what felt like weeks of pure research, and several
cuts, we compiled a very short list of contenders, including Cassandra,
and subjected those to a thorough and well-catalogued series of tests
that we basically kept scaling up until something broke.
Creating a Consistent Environment
For the most part, databases come with some assembly required. As we devised our test strategy for Cassandra, we reviewed the documentation as thoroughly as time would allow, looking for the configuration that would best serve our purposes. Turns out it was "default," but whatever. We set up our best production-style Linux virtual machine on a cloud platform, installed the database and any other necessary software, created a template based on that machine, and then assaulted our objective with an army of clones. Virtual machine clones, of course.
Every time we had a test to run, we'd have a fresh, perfectly configured database sitting on consistent hardware. I can't stress enough how beneficial the cloud environment was for our evaluation. Misconfigure something? Bring up a new clone. A bug in your test garbles your data? Clone. Discover a new setting that would serve as a new baseline? Create a new template, then clone that.
The cloning process was super helpful when we wanted to
experiment with new settings. Cassandra provides a few options for
defining a data replication strategy, and in general they rely on
essentially identical configuration between the various nodes in the
cluster. Once a strategy is chosen, it can be set up as a template and
cloned to any number of machines. Additional clones can be used to
expand the cluster with minimal work. Opportunities for human error are
greatly reduced. This is more important for some humans, but it's good
to keep in mind regardless.
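For a taste of what those options look like, here's a hedged example of the keyspace-level statements involved. The keyspace names, data center names, and replication factors are made up, and the NetworkTopologyStrategy variant also leans on the snitch configured in every node's cassandra.yaml, which is exactly the kind of setting the template-and-clone approach kept consistent.
// Illustrative only: keyspace names, data center names, and replication
// factors are placeholders, not our real configuration.
static void createKeyspaces(com.datastax.driver.core.Session session) {
    // SimpleStrategy places replicas around the ring with no awareness of
    // data centers or racks.
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS demo_simple WITH replication = "
        + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

    // NetworkTopologyStrategy sets a replica count per data center and
    // depends on the snitch declared in each node's cassandra.yaml.
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS demo_topology WITH replication = "
        + "{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2}");
}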
Creating Representative Data Sets
We had a handful of baseline snapshots based on varying data
volumes from which we cloned all of our environments, and it was a
pretty cool little side project to figure out what those baselines
should be. Our team created projections for the new feature's customer
base, and thus also the data volume, out to seven years. We chose a
handful of milestones from those projections, generally where something
would grow by an order of magnitude, maybe the number of users, the
number of records of a particular type, or the average rate of changes
to those records. Variance in user profiles gave us some interesting
data sets to play with. For example, some objects in our system might be
composed of a dozen elements, where others might be thousands, but we
need great performance for both. This variance created excellent
learning opportunities around best practices for query patterns and
denormalization.
private void createFourYearData() {
    IdentityEvent identityEvent;
    for (int i = 0; i < numberOfIdentities; i++) {
        // One identity per iteration, plus its initial identity and address events.
        identityEvent = createNewIdentityEvent();
        identityEventDao.save(identityEvent);
        eventDao.save(createNewEvent(identityEvent.getNetworkId(), ID_EVENT));
        eventDao.save(createNewEvent(identityEvent.getNetworkId(), ADDR_EVENT));
        // partitionSize controls how many address-change events pile up
        // under each identity as the baseline ages.
        for (int j = 0; j < partitionSize; j++) {
            addressEventDao.save(createNewAddressEvent(identityEvent.getNetworkId()));
        }
    }
}
The above code snippet highlights the simple scaling
properties of our baselines. Over time, we expect more "identities" in
the system, and also more "address change" events associated with each
identity. Our target databases reflected growth in these dimensions in
vastly different ways.
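As a hedged illustration of that second dimension in Cassandra terms, a table keyed by identity, like the sketch below, grows its partitions as address-change events accumulate. The schema is a guess at the shape, not our actual data model.
// A guessed-at shape for an address events table, keyed by identity.
// Every additional address-change event for an identity lands in the same
// partition, so the partitionSize loop in the generator above directly
// controls partition width. Column names and types are illustrative.
static void createAddressEventsTable(com.datastax.driver.core.Session session) {
    session.execute(
        "CREATE TABLE IF NOT EXISTS address_events ("
        + " network_id uuid,"
        + " event_time timeuuid,"
        + " payload text,"
        + " PRIMARY KEY (network_id, event_time)"
        + ") WITH CLUSTERING ORDER BY (event_time DESC)");
}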
Creating Tools, Enabling Repeatability
We had an array of tests laid out for each database, and
were prepared to gather more metrics than even seemed reasonable. We had
read through some very cool articles in the Jepsen series on Aphyr.com,
and were inspired to build up our own toolkit, albeit somewhat mashed
in with our in-house code repositories. Not only would it serve our
immediate needs, but it would help others at the company with a similar
mission. Not to mention, if our results ever seemed unbelievable, we
could reproduce them in a jiffy and say, "I told you so."
Our first tools were sample data hoses, built to blast data into those fresh
database snapshots. We'd use these to generate fairly representative
data sets through our prototype microservice, which had the handy
benefit of making it a breeze to swap out minimal code to support
different databases. These services were quickly deployed to the same
cloud environments as the databases and included in the snapshots so a
single web service request would build up any data configuration we
could imagine. We were starting to feel powerful.
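To give a flavor of that "swap out minimal code" seam, here's a rough sketch of a DAO boundary like the ones the generator code above calls into. The interface and the Cassandra-flavored implementation are illustrative assumptions, not our production classes.
// Illustrative DAO seam: only the implementation changes when we point the
// prototype at a different database. The Event type here is a hypothetical
// stand-in for our real domain objects.
class Event {
    private final java.util.UUID networkId;
    private final String type;
    private final String payload;

    Event(java.util.UUID networkId, String type, String payload) {
        this.networkId = networkId;
        this.type = type;
        this.payload = payload;
    }

    java.util.UUID getNetworkId() { return networkId; }
    String getType() { return type; }
    String getPayload() { return payload; }
}

interface EventDao {
    void save(Event event);
}

class CassandraEventDao implements EventDao {
    private final com.datastax.driver.core.Session session;

    CassandraEventDao(com.datastax.driver.core.Session session) {
        this.session = session;
    }

    @Override
    public void save(Event event) {
        // Table and column names are placeholders.
        session.execute(
            "INSERT INTO events (network_id, event_type, payload) VALUES (?, ?, ?)",
            event.getNetworkId(), event.getType(), event.getPayload());
    }
}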
A couple more members of the team poured lots of time and energy into building up JMeter
test suites that were basically designed to knock over either the
microservice or the database. These were repeatable, scalable tests
based on high concurrent user counts and other worst-case scenarios. Our
virtual environments were expanded again to include machines dedicated
to running these tests, and new batches of snapshots were taken.
By this point we basically had at least the corner pieces of
the puzzle, and maybe some edges, but we were burying ourselves in
data: piles of test results, heaps of potential, but nothing substantive
yet. We also had several sources to cross-reference, like JMeter results
and the metrics tracked by DataStax OpsCenter,
a web-based monitoring interface for Cassandra. Our next move was
obvious: whip up a tool to parse JMeter results, fetch and interpret
OpsCenter data via its REST API, compute averages, percentiles and such,
and spit everything out in a CSV. Then, naturally, build some snazzy
charts with the company colors. A few simple plots distilled weeks of
setup, testing, and analysis into an at-a-glance view that succinctly
but accurately described the performance limitations for our particular
environment and data model.
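For the curious, the results cruncher boiled down to something not much fancier than the sketch below. It assumes JMeter's default CSV output with an "elapsed" column, skips the OpsCenter side entirely, and the file name and percentile choice are arbitrary.
// A rough sketch of the results cruncher, assuming the default CSV JTL:
// a header row, then one line per sample, with response time in the
// "elapsed" column. File name is a placeholder.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JtlSummary {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("results.jtl"));
        int elapsedIdx = Arrays.asList(lines.get(0).split(",")).indexOf("elapsed");

        List<Long> elapsed = lines.stream()
                .skip(1) // skip the header row
                .map(line -> Long.parseLong(line.split(",")[elapsedIdx]))
                .sorted()
                .collect(Collectors.toList());

        double avg = elapsed.stream().mapToLong(Long::longValue).average().orElse(0);
        long p95 = elapsed.get((int) Math.ceil(elapsed.size() * 0.95) - 1);

        // One CSV line per run kept the charting step trivial.
        System.out.println("samples,avg_ms,p95_ms");
        System.out.println(elapsed.size() + "," + avg + "," + p95);
    }
}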
Run Those Tests
Our actual tests were generally long-running simulations of
extreme load or system instability, run on top of each of our baselines.
Where the baselines scaled based on data volume and partition size, the
tests generally scaled in terms of concurrent threads. We accounted for
variable usage patterns by running different versions of each test for
write-heavy, read-heavy, and balanced query workloads. Testing
durability and availability kept things rather exciting, despite this
phase consisting of lots of waiting and crunching numbers. During extra
executions of some of the tests, we would kill one of the Cassandra
nodes, wait a while, and bring it back online. We were most interested
in whether or how much the overall cluster performance was impacted
during and after the outage, whether any writes were lost, and how long
it would take the cluster to repair. Lastly, while our primary focus was
on scaling the client and data aspects of our system, another variable
we felt deserved some attention was cluster size. We ran a few more of
our tests with extra Cassandra nodes, and measured the performance
against the baseline clusters. This provided fairly straightforward
cost-versus-performance guidelines, a handy reference for future
discussions, especially if customer growth exceeds expectations.
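As one example, the lost-write check around an outage was conceptually as simple as the sketch below: write marker rows while a node is down, then count them after recovery. The table and helper names are placeholders; the real test ran through our microservice and JMeter rather than a bare loop.
// Rough shape of a lost-write check around a node outage, assuming a
// marker table like: CREATE TABLE outage_markers (run_id uuid, seq int,
// PRIMARY KEY (run_id, seq)). Names are placeholders.
import com.datastax.driver.core.Session;
import java.util.UUID;

public class OutageCheck {

    static void writeMarkers(Session session, UUID runId, int count) {
        for (int i = 0; i < count; i++) {
            session.execute(
                "INSERT INTO outage_markers (run_id, seq) VALUES (?, ?)", runId, i);
        }
    }

    static long countMarkers(Session session, UUID runId) {
        return session.execute(
                "SELECT COUNT(*) FROM outage_markers WHERE run_id = ?", runId)
                .one().getLong(0);
    }

    // Call writeMarkers while a node is down, let the node rejoin the ring,
    // then compare countMarkers(...) against the number written.
}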
Distilling Data Yields Good Proof
In hindsight, an important goal of each of our tests was
that it end in some sort of failure. In most cases, it had something to
do with average and max response times reaching ridiculous levels. These
were represented by steeply inclined walls on the far right of our
graphs, and they gave us confidence in two things. The first was that
the database could handle orders of magnitude more traffic than we
planned to throw at it any time soon. The second was that we'd know what
kinds of growth would impact performance and what to do about it.

Knowing where each database would fail against our use cases
and with our data model was key, though. Imagine if we had simply
tested our expected user loads and data volumes, even at our biggest
seven year baseline, and called it a draw when several databases handled
them marvellously. If a database chosen that way later failed to
maintain low response times with, let's say, twice as many concurrent
users or thirty percent more data, we'd probably have found out in
production eventually. Definitely not cool.
One of our bonus takeaways was an amusing anecdote about the
benefits of consistent, repeatable tests. For at least a couple of
weeks, most of our tests had been plagued by an odd performance issue.
We identified some of the conditions associated with the spikes
in our otherwise smooth graphs, but after significant debugging efforts,
we could not track down the root cause. We'd had a suspicion early on
that it might be something weird with the cloud platform, but didn't
have anything to base it on outside of intuition. We were also basically
told that what we were seeing shouldn't be possible, which, by the way,
is my absolute favorite software description. Our wild goose chase
tracking down everything that wasn't the problem gave us enough
information to have an effective conversation with the vendor's
technical people, and offload our findings. They found and patched a bug
in their software within a day or two, which was impressive. I'm not
trying to knock this vendor or anything, I'm just saying, repeatable
tests in a consistent environment can help out everyone involved in ways
that you can't predict.
Happy Hunting
I hope our experiences researching database solutions can
help give perspective to anyone feeling daunted by a similar task. Just
remember that finding or building tools is the best counter to time
constraints, and when a test's usefulness is in doubt, turn it up to
eleven. If you break something, you'll learn something. Thanks for
reading!