Event-driven Data or: How I Learned to Stop Living in the Present and Travel Through Time



I am a father, husband, gamer, and musician who also happens to be a development manager. I live in the frozen north of Michigan's Upper Peninsula, which means that when I'm not digging out from snow, I'm tending to the bear hibernating in my outhouse.



Over the past decade, I've viewed the storage of data in a single frame of mind: data are mutable. A piece of information was always inserted, updated N number of times over its lifetime, and eventually deleted and erased from history. The data was always true and always current. Any attempt to store or retrieve history on the data was a nice-to-have feature, with varying implementations to very specific pieces of the information for which the database was responsible, therefore limiting its scope and complexity. But what if we took this viewpoint and turned it on its head? What if, instead of historical information being an afterthought, we made it a first-class citizen of the database and designed a solution around it? Instead of storing ever-changing data, we stored immutable events which the application could dynamically interpret and present back to the user in whatever way was needed?

Getting Started

My journey down the path of immutability started on the recommendation of our architect, Emmett. We had just started design on a new project and were contemplating moving away from the typical relational database we'd used for many years to a less expensive alternative. His thought was that with the change in technology we may be able to implement a more unique approach to storing the data. We all met up one afternoon in conference room D and Emmett immediately kicked things off.
"We have always treated data as completely mutable. Let's take an address as a very simple example." At this point he walked to the board and picked up a marker.

The Mutable Address

"In 2010 someone, let's call him Marty, adds his mailing address into the system," he said, drawing a small table containing the standard pieces of address information that we had come to be very familiar with over the years. He continued, "Over five years Marty changes his address ten times due to being a musician that can't settle down. At any point in time during the five years I could have easily queried the database and found what Marty's current mailing address was. Let's say that after five years someone needed to know where exactly Marty was living on November, 5th 2012."

Emmett drew a giant question mark on the board and went on, "Or how about when that update occurred?" He drew another question mark. "Or how about the time that he stole my Delorean and disappeared for two months only to show up on the other side of town wearing the same clothes with my car hanging halfway out of the old theater that's now a church?" A third question mark made its way onto the board, followed by a few exclamation points as well. Things had gotten weird.
Emmett took a deep breath and continued, "The simple answer is that without a significant change to the application and/or database these answers would be left unsolved."
After a moment of silence he looked around and exclaimed, "Great Scott! Someone took all of the erasers again," and proceeded to wipe the board clean using his hands.

Immutable Address Events

"Now let's take the same scenario but imagine that instead of a single record we stored the same information as a series of immutable events." Emmett proceeded to draw the same record up on the board again. "When the mailing address was initially added into the system in 2010, this would have been tracked as an event, say 'new'."
He added a column to the existing address fields labeled "EVENT" and within it wrote "NEW" and continued, "Each change over the next five years could have been inserted as a new "moved" event, each time inserting a new copy of the mailing address along with information about who issued the change, from where it originated, etc." Under the initial address he added another series of blocks to represent multiple inserts of address information, marking each record's event column as "MOVED".
Once he was finished writing the new addresses he proceeded with his explanation, "Once an event is entered into the database it is never deleted or altered in any way, making the data we have for Marty immutable. With the data tracked as events such as this, we can easily answer any of the questions I asked before." And with that he scratched an itch on his face, marking it with the black dry-erase marker that had transferred to his hands when he wiped off the board earlier.

"I'm still not seeing how this is better than us just having the application keep track of changes in a history log," remarked Buford, one of the junior members of the development team.
"Well Buford," Emmett replied, "by changing our data in this way we've gained a significant advantage over how we typically manage information: the ability to travel through time."
At this point in the conversation and by the look in his eyes we were sure that Emmett should probably have called in sick that day, but he ran his fingers through his prematurely silver hair and continued with his explanation.
"By replaying the events that we recorded we can easily see what Marty's address information was at any point in time, giving users of the application the ability to do much more with the data than was ever possible with just a single mutable address rec-."
Buford cut across Emmett, "Hold on a second. I'm still not seeing how this is any better than what we already have."

Fixing Bugs Without Fixing Data

Emmett narrowed his eyes at Buford and continued, "Well before I was interrupted I was about to get to the best part. Let's say that the logic that we write to replay a particular event is released to production with a bug, and if you're working on the code, Buford, the likelihood of that happening is greater than not. Say that you introduced a bug when the application replayed the 'MOVED' event that accidentally transposes what we have in the line one and line two fields of the address prior to returning it to the caller. Now customers are seeing this data about addresses that doesn't make sense to them. In the past this would have required not only a fix to the application but also someone to create and run a datafix for all of the records affected by this bug to straighten out the information in the database as well."
As Emmett continued to explain the example he started to pace around the room. "Now let's take that same situation and apply it to this event-driven data model. Customers are calling in because of the bug and what do we do now? We still need to fix the bug. This is obvious. But because we've corrected a bug in the way that the event was processed and rendered to the user, fixing the bug is the only thing that development needs to do. Because the data that we've stored is simply a list of events, the logic behind replaying them can be changed at any point to present our users with new or corrected ways of seeing their data. Effectively we can change an action in the past to alter the present!"
With this, Clara, the attendee from customer support, gave a verbal whoop which took everyone by surprise.

Analyzing Data Through Time

George from marketing (who had crashed the meeting uninvited) took this opportunity to try to bring Emmett's jubilation crashing down. He straightened his necktie and said, "What marketing is interested in is analytics about who is using our system, how they're using it, and how everything correlates to one another. While all of this event stuff sounds wonderful, I'm not sure how it benefits my team in the long run."
Emmett took a moment to gather his thoughts and responded, "How do you gather such information now?"
"Well we have direct access to the production database of course. Anyone on my team can write a query in a few minutes and see who is purchasing the most widgets and who may need more marketing material sent to them about our widgets," George replied.
"While direct access to a production database is an entirely separate issue," Emmett began, "this event-driven data model would still allow for that type of analysis and so much more. No longer do you have to remain fixed on the present. You can now traverse various periods of time, gathering data such as at what time customers purchased widget A over cog B, correlate that with their location, and target your marketing to very specific groups of existing or potential customers to make your group much more effective. With this data we can begin incorporating more 'big data' technologies to help us with this analysis and hopefully get things to a point where you no longer need to worry about access to the database anymore."
This made Verne from our security team give a slow clap and with that the hour was up and the meeting was finished.

Wrapping Up

I hope you enjoyed this highly fictional tale about a highly factual data model. It really is a departure from the standard "one record to rule them all" mentality, but as my colleague Emmett said when he closed the meeting answering Buford's question on what will happen to mutability, "Mutability? Where we're going we don't need mutability."

Post a Comment

[disqus][blogger][facebook]

Afrogalaxy

Contact Form

Name

Email *

Message *

Powered by Blogger.
Javascript DisablePlease Enable Javascript To See All Widget