I gave a talk this morning on the JavaScript Event Loop
at Penguicon 2013. Even though I had used JavaScript for several years,
I didn’t completely comprehend how the Event Loop works until a few
months ago. When the opportunity came to present at Penguicon, I figured
this was as good a topic as any. You can download the presentation
below (or view it in your browser), and I’ll throw all the individual
slides and the gist of what I said about them on this page.
Download as a Keynote, PowerPoint, or HTML presentation.
Introduction
Slide 2/14: Credibility
I’ve been a web developer for a while, starting at some smaller mom and pop shops (not listed), then a couple of Fortune 50 companies, before finally ending up at smaller and smaller (and quicker and more advanced) companies. For most of that time I was doing procedural PHP and MySQL programming, before eventually moving to mostly JavaScript (both frontend and backend).
I’m currently working with Packt to get a book on
Backbone.js published (which is a frontend JavaScript framework for
building Single Page Applications). Be sure to keep an eye out for it
and purchase several copies, even if you don’t intend on reading them.
Slide 3/14: MultiThreaded
Let me first begin the presentation by talking about something mostly unrelated to JavaScript: MultiThreaded programming. If an application is built to be MultiThreaded, it can make use of several of your CPU cores simultaneously. This means it can do number crunching in different places at the same time, and we refer to this as parallelism (a form of Concurrency). An application built in this manner can be a single process within the Operating System. The Operating System itself usually gets to choose which cores an application will run on (even which core a single threaded application will run on). One way to fake MultiThreaded-ness in SingleThreaded languages is to simply run several different processes and have them communicate with each other.
For the longest time, CPUs were getting faster and faster, but clock speeds eventually stopped climbing, and we sorta hit a wall with how fast a single core can get. So, to make hardware faster, we now throw more CPU cores at the computer. In order to truly scale and use the hardware to its fullest, one needs to build applications which make use of all CPU cores.
MultiThreading isn’t all butterflies and puppy tails though. There can be some big issues with this type of code, particularly Deadlocks and Race Conditions. Here is one example of a Race Condition: an application is running on two separate threads, both threads read a variable from memory at the same time, and both attempt to update the value by adding 2 to it. If the existing value is 10, and thread A adds 2, it does so by writing 12 to the memory location. If thread B also wants to add 2, it still thinks the value is 10, and also writes 12. The programmer would expect the result to be 14 but ends up with 12, and there are no errors. This type of bug can be very hard to track down, and the worst part is that it happens unpredictably.
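JavaScript can't actually run two threads over the same variable, so the following is only a simulation of the interleaving described above. Each pretend "thread" is split into an explicit read step and a write step, and the steps are run by hand in the unlucky order. The names memory and makeThread are made up for this sketch.

```javascript
// Simulated shared memory location holding the value 10.
var memory = { value: 10 };

// Each simulated "thread" adds 2 in two separate steps: read, then write.
function makeThread() {
  var local; // the thread's private copy of the value
  return {
    read: function () { local = memory.value; },
    write: function () { memory.value = local + 2; }
  };
}

var threadA = makeThread();
var threadB = makeThread();

// The unlucky interleaving: both threads read before either one writes.
threadA.read();  // A sees 10
threadB.read();  // B also sees 10
threadA.write(); // A writes 12
threadB.write(); // B writes 12 again, so A's update is lost

console.log(memory.value); // 12, not the 14 the programmer expected
```

In a real MultiThreaded program nobody chooses this ordering; the scheduler does, which is why the bug only shows up some of the time.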
Slide 4/14: SingleThreaded
Now that you know what MultiThreaded means, let’s talk about
how JavaScript is not MultiThreaded. A JavaScript engine exists in a
single OS process, and consumes a single thread. This means that when
your application is running, CPU execution is never performed in
parallel. By running the JavaScript engine in this manner, it is
impossible for users to get the Deadlocks and Race Conditions which
plague MultiThreaded applications.
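As a rough illustration of why that kind of lost update can't happen here: each callback below runs to completion before the next one starts, so the read and the write inside one callback are never interrupted. The counter, log, and addTwo names are assumptions for the sketch.

```javascript
var counter = 10;
var log = [];

function addTwo(name) {
  var seen = counter;  // read
  counter = seen + 2;  // write; no other JavaScript can run between these two lines
  log.push(name + ' saw ' + seen);
}

// Queue the same update twice. The callbacks run one after another, never at the same time.
setTimeout(function () { addTwo('A'); }, 0);
setTimeout(function () { addTwo('B'); }, 0);

setTimeout(function () {
  console.log(log.join(', ')); // A saw 10, B saw 12
  console.log(counter);        // 14
}, 0);
```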
Developers often refer to their callbacks running in an unexpected
order as a Race Condition, however it is not the same thing that happens
to MultiThreaded applications, and can usually be solved and tracked
down easily enough (e.g., use another callback).
Slide 5/14: Implementation
There are three important features of a JavaScript engine
that deserve mention. These are the Stack, the Heap, and the Queue. Now,
different browsers have different JavaScript engines (e.g. Chrome has
V8, Firefox has SpiderMonkey, and IE has something written in BASIC called
Chakra (just kidding!)) and each browser will implement these features
differently, but this explanation should work for all of them.
Heap: The simplest part of this is the Heap. This is
a bunch of memory where your objects live (e.g. variables and functions
and all those things you instantiate). In the presentation I refer to
this as Chaotic, only because the order doesn’t really matter and
there’s no guarantee with how they will live. In this heap, different
browsers will perform different optimizations, e.g., if an object is
duplicated many times, it may only exist in memory once, until a change
needs to happen, at which point the object is copied.
Stack: This is where the currently running
functions get added. If function A() runs function B(), well you’re two
levels deep in the stack. Each time one of these functions is added to
the stack, it is called a frame. These frames contain pointers to the
functions in the heap, as well as the objects available to the function
depending on its current scope, and of course the arguments to the
function itself. Different JavaScript engines likely have different
maximum stack sizes, and unless you have a runaway recursive function,
you’ve probably never hit this limit. Once a function call is complete,
it gets removed from the stack. Once the stack is empty, we’re ready for
the next item in the Queue.
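If you're curious where that limit sits, a small sketch like the one below will find it by deliberately running away. The function and variable names are made up for illustration, and the exact error type is engine-specific: V8 (Node.js, Chrome) throws a RangeError, while other engines may throw something else.

```javascript
var depth = 0;

// A runaway recursive function: every call adds another frame to the stack.
function runaway() {
  depth++;
  runaway();
}

var caught = null;
try {
  runaway();
} catch (err) {
  caught = err; // the engine throws once the stack limit is reached
}

console.log('frames added before the limit was hit: ' + depth);
console.log('error name: ' + caught.name);
```

The number printed will vary between engines, and even between runs, since it depends on how large each frame is.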
Queue: This is where function calls which are queued up for the future go. If you perform a
setTimeout(function() { console.log('hi'); }, 10);
then that anonymous function goes into the next available queue slot once
the 10ms timer expires. No items in the queue will be run until the current
stack is complete. So, if you have some work that might be slow that you
want to run after you get your data, try a setTimeout() with a delay of 0ms.
Callbacks which rely on I/O to complete, or on a long running timer, end up
in that same queue: the host environment (the browser or Node.js) keeps
track of the pending timer or I/O request and adds the callback to the queue
once it finishes.
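Here's a minimal sketch of that 0ms trick. The handleData and slowWork functions are hypothetical stand-ins; the point is only the order in which things run.

```javascript
var order = [];

function slowWork() {
  order.push('slow work');
}

function handleData(data) {
  order.push('got ' + data);
  // Defer the slow part: it goes to the queue and runs once the stack is empty.
  setTimeout(slowWork, 0);
  order.push('responded');
}

handleData('data');
order.push('rest of the script');

setTimeout(function () {
  console.log(order.join(' -> '));
  // got data -> responded -> rest of the script -> slow work
}, 0);
```

Even with a delay of 0ms, slowWork() waits until everything in the current stack has finished.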
It’s worth mentioning Garbage Collection here as well. In
JavaScript it’s easy to create tons of objects all willy nilly like.
These get added to the Heap. But, once there is no scope remaining that
needs those objects, it’s safe to throw them away. JavaScript can keep
an eye on the current stack and the items in the Queue, and see what
objects in the Heap are being pointed to. If an object no longer has
pointers to it, it is safe to assume that object can be thrown away. If
you aren’t careful with how you manage your code, it’s easy to not have
those pointers disappear, and we call this wasted memory a Memory Leak.
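One common way to end up with pointers that never disappear is a long-lived list of callbacks. The sketch below is a made-up example (listeners, subscribe, and handleRequest are all hypothetical names): each handler's closure keeps a big string reachable for as long as the handler stays in the list.

```javascript
// A long-lived array: anything pushed here stays reachable, so it can't be collected.
var listeners = [];

function subscribe(handler) {
  listeners.push(handler);
  // Returning an unsubscribe function gives callers a way to drop the reference.
  return function unsubscribe() {
    var index = listeners.indexOf(handler);
    if (index !== -1) listeners.splice(index, 1);
  };
}

function handleRequest(id) {
  var bigBuffer = new Array(1000).join('x'); // stands in for a large object
  // The closure keeps bigBuffer alive for as long as the handler is in listeners.
  return subscribe(function () { return id + ':' + bigBuffer.length; });
}

// Leaky: the unsubscribe function is ignored, so 100 handlers (and buffers) pile up.
for (var i = 0; i < 100; i++) handleRequest(i);
var leakedCount = listeners.length;
console.log(leakedCount); // 100

// Not leaky: each handler is removed as soon as it is no longer needed.
listeners.length = 0;
for (var j = 0; j < 100; j++) handleRequest(j)();
console.log(listeners.length); // 0
```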
Slide 6/14: Implementation Example
This code-run is an example of the previous slide. So, the very first thing that happens is that function a() and b() are “hoisted” to the top of the script, and are added to the heap. We then run the first message log “Adding code to the queue” in the current stack. After that we run a setTimeout, and the anonymous function in there is added to the Queue. Then we do another log, and run the a() function with an argument of 42. We are now one level deep in the stack, and that frame knows about the a() function, the b() function, and its argument of 42. Within a() we run b(), and we are now two levels deep in our stack. We print more messages, leave b(), leave a(), and print a final message. At that point, our stack is empty and we’ve run all of our code, and are now ready for the next item in the queue.
Once we’re in the next queue item, we run the anonymous function (which exists in the Heap somewhere), and display our message.
At first glance, one might assume the message “Running next code from queue” could have been run earlier, perhaps after the first message. If this were a MultiThreaded application, that message could have been run at any point in time, randomly placed between any of the outputted messages. But, since this is JavaScript, it is guaranteed to run after the current stack has completed.
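The slide's code isn't reproduced on this page, so here is a reconstruction from the description above rather than the original. Only the two quoted messages come from the talk; the other message wording, the log() helper, and the 0ms delay are assumptions.

```javascript
var output = [];
function log(message) { output.push(message); console.log(message); }

log('Adding code to the queue');
setTimeout(function () {
  log('Running next code from queue'); // runs only after the stack below is done
}, 0);

log('Running a()');            // assumed wording
a(42);
log('Current stack complete'); // assumed wording

// Function declarations are hoisted, so a() and b() can be called above this point.
function a(x) {
  log('In a() with ' + x); // one frame deep
  b();
}

function b() {
  log('In b()');           // two frames deep
}
```

Running this prints the five synchronous messages first, and “Running next code from queue” always comes last.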
Slide 7/14: Sleeping
I come from a background in writing PHP/MySQL applications.
When a PHP script runs, it performs a bunch of work, and then probably
runs a MySQL query. Once that call is made to the external server, the
application falls asleep. It literally halts everything it is doing and
waits for a response from the database server. Once the result comes
back, it does some further processing, and then it might perform another
I/O function, such as calling an RSS feed. And, as you might guess, it
falls asleep again.
Now, what if the call to the RSS feed doesn’t require any of
the data we gain from the database call? Then the order of the two
calls might not have mattered. But, more importantly, the two calls
could have been run simultaneously! The application is as slow as the
two calls combined, instead of being as slow as the slowest of the two.
Node.js does something pretty cool, where every I/O request it makes
is a non blocking call. This means that the call can end the current
stack, and the callback can be called later on in a separate Queue. If
we’re performing a bunch of I/O operations, they can be run in parallel.
The application will still sleep, but it won’t be blocking.
The web browser is the same. Most of the time it is doing nothing, perhaps waiting for a user to click on something, or waiting for an AJAX request to finish up.
Slide 8/14: Sequential vs Parallel I/O
This is a great graphic I adapted from the CodeSchool
Real-Time Web with Node.js course. It shows how the I/O operations for
sequential I/O compare to parallel I/O. The sequential graph represents
calls made in a more traditional language such as PHP, whereas the
parallel graph represents calls made in an EventLoop driven language
with non blocking I/O, or even MultiThreaded applications. Notice that
in the parallel case the application is only as slow as the slowest I/O
operation, instead of as slow as all I/O operations combined.
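To put rough numbers on that, here's a sketch that fakes two I/O calls with timers (a 60ms "database" call and a 40ms "feed" call) and times both approaches. The fakeIO helper, the durations, and the other names are all made up for illustration; real I/O would go through something like a database driver or an HTTP client.

```javascript
// Simulated non-blocking I/O: calls back with `result` after `ms` milliseconds.
function fakeIO(result, ms, callback) {
  setTimeout(function () { callback(result); }, ms);
}

var timings = {};

// Sequential: the feed request is not started until the database call returns.
function sequential(done) {
  var start = Date.now();
  fakeIO('rows', 60, function () {
    fakeIO('feed', 40, function () {
      timings.sequential = Date.now() - start; // roughly 60 + 40
      done();
    });
  });
}

// Parallel: both requests start immediately; we finish when the last one returns.
function parallel(done) {
  var start = Date.now();
  var pending = 2;
  function finished() {
    if (--pending === 0) {
      timings.parallel = Date.now() - start; // roughly max(60, 40)
      done();
    }
  }
  fakeIO('rows', 60, finished);
  fakeIO('feed', 40, finished);
}

sequential(function () {
  parallel(function () {
    console.log(timings);
  });
});
```

The exact numbers will wobble by a few milliseconds, but the parallel run should land near the slower of the two calls rather than their sum.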
Slide 9/14: Other Language Event Loops
JavaScript isn’t the only language that can have an Event Loop. They can be implemented in the more traditional procedural languages as well. However, by having it built into the language, it’ll surely be quicker and have a nicer syntax.
Also, when it is implemented in another language, you lose
out on the special features if your I/O is blocking, so you’ll have to
be careful with which libraries you choose.
Some examples of Event Loops in other languages include Ruby’s EventMachine, Python’s Twisted and Tornado, and PHP’s ReactPHP.
Slide 10/14: Other Language Event Loop Example
Here’s an apples to oranges comparison of the Event Loop working in Node.js to perform a simple TCP echo example, and the (I’m assuming) same application working in Ruby’s EventMachine. I took the Node example from the homepage of nodejs.org, and the EventMachine example from their GitHub readme. They’ve been altered slightly to use the same text and hopefully perform the same function (I honestly don’t know Ruby though).
Notice that the JavaScript version is the more terse of the two.
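The slide's code isn't included on this page, so the following is a minimal sketch of what the Node.js side of a TCP echo server can look like, not the exact code from the talk. The startEchoServer wrapper and its parameters are assumptions added so the server can be started on any port; the net module calls are standard Node.js.

```javascript
var net = require('net');

// Start a TCP echo server. Passing 0 as the port asks the OS for any free port.
function startEchoServer(port, onReady) {
  var server = net.createServer(function (socket) {
    socket.write('Echo server\r\n'); // greeting sent to each new connection
    socket.pipe(socket);             // send every byte received straight back
  });
  server.listen(port, '127.0.0.1', onReady);
  return server;
}
```

Calling startEchoServer(1337) and then connecting with a tool such as telnet to 127.0.0.1 on port 1337 should echo back whatever you type. Notice there's no explicit loop anywhere; the callbacks just sit and wait for connections.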
Slide 11/14: Event Loops are Awesome
There you have it folks, Event Loops are awesome. They don’t have the race conditions or deadlock issues that MultiThreaded applications have. Most web applications waste time waiting on I/O, and this is a good way around it. There is no special syntax for it to work in JavaScript; it is built in. It’s pretty easy to build stateful web applications (whereas if this were PHP you’d need a database to store shared data, in JS you could just use a local variable).
Slide 12/14: Event Loops aren’t Awesome
There you have it folks, Event Loops aren’t awesome. If you
perform a bunch of CPU intensive work, it will block your process and
only use one core. Unless, of course, you use Node.js and offload work
to another process. Or, if you’re in a browser, read the next slide.
Memory leaks are also possible, as you’re running an application for a
long time instead of temporarily. Unless, of course, you program cleanly
and are able to avoid those issues.
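Here's a small sketch of the blocking problem. A timer asks to be called in 10ms, but a busy loop (standing in for CPU intensive work) hogs the stack for about 200ms, so the timer has to wait. The variable names and durations are arbitrary.

```javascript
var start = Date.now();
var timerDelay = null;

// Ask for a callback in 10ms.
setTimeout(function () {
  timerDelay = Date.now() - start;
  console.log('timer fired after ' + timerDelay + 'ms'); // at least 200, not 10
}, 10);

// CPU-bound work: spin for about 200ms without returning to the event loop.
while (Date.now() - start < 200) {
  // number crunching would go here
}
```

While that loop is spinning, nothing else in the process gets a turn: no timers, no I/O callbacks, and in a browser, no click handlers.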
Slide 13/14: Web Workers
Well, now that I spent this whole time telling you how JavaScript is a SingleThreaded application and you can’t make use of multiple cores, I’ll apologize for being a liar. The core of JavaScript is single threaded, and it’s been that way for many years. However, there’s this cool new thing that came out in the last few years called Web Workers. It will allow your browser (doesn’t exist in Node) to offload work to a separate thread. This feature is available in every modern web browser, so feel free to offload your work today.
How it works is you create a script, and throw some specifically formatted code in there. The main script loads it with
var worker = new Worker('task.js');
where task.js is an existing JavaScript file. You also attach a bunch
of event handlers to the created worker object, and interact with the
worker that way. The script will run in its own instance of the
JavaScript engine, and cannot share memory with the main thread (which
has the nice side effect of preventing those race conditions).
When you want to pass information to and from the worker,
you use something called message passing. This allows you to pass simple
JSON objects around, but not complex objects that contain functions or
anything referencing the DOM. A great use-case for Web Workers would be
calculating a SHA1 hash or performing some map/reduce computations.
Basically, anything that involves a ton of number crunching and isn’t
all DOM operations.
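Here's a rough sketch of the whole round trip. To keep it in one file, the worker script is built from a string and loaded through a Blob URL instead of a separate task.js; that substitution, the sumOfSquares function, and the message shape are all assumptions for illustration. Outside a browser it just falls back to running the function inline.

```javascript
// The number crunching itself: a plain function with no DOM access.
function sumOfSquares(n) {
  var total = 0;
  for (var i = 1; i <= n; i++) total += i * i;
  return total;
}

// Source for the worker script. With a separate file, this would live in task.js.
var workerSource =
  'self.onmessage = function (event) {' +
  '  self.postMessage((' + sumOfSquares.toString() + ')(event.data.n));' +
  '};';

// Only try to start a worker where the browser APIs exist.
if (typeof window !== 'undefined' && typeof Worker === 'function') {
  var url = URL.createObjectURL(new Blob([workerSource], { type: 'text/javascript' }));
  var worker = new Worker(url);
  // Results come back as messages; the worker never touches our variables directly.
  worker.onmessage = function (event) {
    console.log('worker says: ' + event.data);
    worker.terminate();
  };
  worker.postMessage({ n: 100000 }); // plain data only, no functions or DOM nodes
} else {
  console.log('no Web Workers here; running inline: ' + sumOfSquares(1000));
}
```

The main thread stays free to handle clicks and timers while the worker does the math, and the only things crossing the boundary are the two messages.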
Slide 14/14: Conclusion
There you have it, the JavaScript Event Loop. It is great
for I/O bound applications, and horrible for CPU bound applications.
Many people think the engine is MultiThreaded, or at least that it can
do things in parallel. Turns out it can do I/O in parallel, but not CPU
computations (unless using a separate process with Node.js or a Web
Worker in the browser).