Project Loom: Java with a stronger Fiber

Luca Venturi
8 min read · Aug 1, 2019

Popularity comes at a price. Java is, and has been, a very popular language, attracting both praise and criticism. While it would be fair to expect a decline after so many years and such a big legacy, Java is actually in pretty good shape and has a very strong technical roadmap. Basically, a new Java is coming, and a few years from now things might be very different in JVM-land. The OpenJDK has some technically impressive projects that we will hopefully be able to use soonish, and that have the potential to affect not only Java but other languages as well.

Apart from Loom, the focus of this article, you should keep an eye on Valhalla, which might double the performance of Java in some cases, and Graal, which does so many things that I don't even know where to start! And of course the language is becoming less verbose, thanks to Amber.

These projects might even change the perception of other languages, so they are potentially very high impact.

For example: Loom + Graal give you continuations (coroutines) and ahead-of-time compilation, making Go less appealing than it is now.

Loom + Amber give you fibers (enabling potentially simpler actor systems) and shorter syntax, making Scala less attractive than it is now. Valhalla + Graal might reduce the performance gap with C++. And Graal might push Python to run on the JVM itself, or at least PySpark might benefit from it.

But let's focus on Loom. As there is not much practical information about it at this time, we will go ahead, build this experimental JVM, and run some benchmarks. Let the numbers speak!

Project Loom

Java used to have green threads, at least on Solaris, but modern versions of Java use native threads. Native threads are nice, but relatively heavy, and you might need to tune the OS if you want tens of thousands of them.

Project Loom introduces continuations (coroutines) and fibers (a type of green thread), allowing you to choose between threads and fibers. With Loom, even a laptop can easily run millions of fibers, opening the door to new, or not so new, paradigms.

A small digression: Erlang

You might have heard about Erlang. It's a very interesting language, much older than Java, with some shortcomings but also some impressive features. Erlang has native support for green threads; in fact, the VM counts operations and switches between green threads every now and then.

In Erlang, it is common for a program to have many long-lived, not very busy, threads. It is in fact expected to serve every user with a dedicated thread. Many of these threads might execute network operations (after all, Erlang was developed by Ericsson for the telecom industry), and these network operations are synchronous. Yes, synchronous. We might serve a million users with one machine with a lot of RAM, using simple, synchronous network operations.

Synchronous vs Asynchronous

For years we have been told that scalable servers require asynchronous operations, but that’s not completely true.

Sure, if you need to scale using a thread pool (or even one single thread), you basically have no alternatives: you have to use asynchronous operations. And asynchronous operations can scale very well.

When I joined Opera Software in 2008, I was a bit surprised to hear that Presto, the core of the browser, was single-threaded. Yep, one single thread. But that was enough. Tens of tabs rendering HTML and processing JavaScript, network downloads, file operations, cookies, cache, you name it. And only one thread, lots of asynchronous operations, and callbacks everywhere. And it worked pretty well.

But asynchronous code is hard. It can be very hard. Asynchronous calls break the flow of the operations, and what could be just 20 lines of simple code might need to be split across multiple files and run across threads, and it can take developers hours to figure out what is actually happening.

Wouldn’t it be nice to get the simplicity of synchronous operations with the performance of asynchronous calls?
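To make the contrast concrete, here is a toy Java sketch: fetchSync and fetchAsync are made-up stand-ins for real I/O. The synchronous version reads top to bottom, while the asynchronous one pushes the logic into nested callbacks.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class FlowDemo {
    // Callback style: each step hands its result to the next callback,
    // so even a trivial two-step sequence becomes nested lambdas.
    static void fetchAsync(String url, Consumer<String> callback) {
        CompletableFuture.runAsync(() -> callback.accept("body of " + url));
    }

    // Synchronous style: the same logic reads top to bottom.
    static String fetchSync(String url) {
        return "body of " + url;
    }

    public static void main(String[] args) throws InterruptedException {
        // Asynchronous: the flow jumps into callbacks.
        fetchAsync("a", bodyA ->
                fetchAsync("b", bodyB ->
                        System.out.println(bodyA + " + " + bodyB)));

        // Synchronous: linear and trivially readable.
        System.out.println(fetchSync("a") + " + " + fetchSync("b"));

        Thread.sleep(200); // give the async chain time to finish
    }
}
```

With two steps the callback version is merely ugly; with twenty, spread over error handling and multiple files, it becomes the hours-of-debugging scenario described above.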

Fibers to the rescue

Loom introduces fibers. That's great, but it is not enough. To do useful things you need a network stack that is fiber friendly. When I tried Loom a few months ago, this was not the case. Creating around 40–50 fibers was enough to start seeing network errors. The project was too immature.

In June, JEP 353 (https://openjdk.java.net/jeps/353), which rewrote the Java Socket API to be fiber friendly, was accepted into the JDK 13 mainline.

While not everything works, Loom can now be used with network operations.

It’s time to have an Actor System that can leverage the fibers of Loom.

Ok, maybe it is a bit early, as Project Loom is still experimental and JDK 13 is due in September, but I could not resist, and I created and open-sourced a small actor system able to take advantage of Loom: Fibry. We will use it to benchmark Loom and see if fibers are really better than threads.

Actors and Fibry

Actors are used in multi-threaded environments to achieve concurrency in a relatively simple way. In particular, actors are single-threaded, so by definition you do not have concurrency issues, as long as they operate only on their own state; you can alter the state of an actor by sending messages to it.

Erlang enforces this safety by having only immutable values (no for-loops, and you can't even swap two variables in the traditional way…); Java does not. But actors can still be very useful.
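To make the single-threaded-mailbox idea concrete, here is a toy actor in plain Java (my own sketch, not Fibry's internals): one worker thread drains a queue, so the behavior function never runs concurrently with itself and needs no locks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

public class MiniActor<T, R> {
    // A message bundled with the future that will carry its answer.
    private static final class Envelope<T, R> {
        final T message;
        final CompletableFuture<R> reply;
        Envelope(T message, CompletableFuture<R> reply) {
            this.message = message;
            this.reply = reply;
        }
    }

    private final BlockingQueue<Envelope<T, R>> mailbox = new LinkedBlockingQueue<>();

    public MiniActor(Function<T, R> behavior) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Envelope<T, R> env = mailbox.take(); // one message at a time
                    env.reply.complete(behavior.apply(env.message));
                }
            } catch (InterruptedException e) {
                // actor stopped
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Sending a message is the only way to interact with the actor.
    public CompletableFuture<R> sendMessageReturn(T message) {
        CompletableFuture<R> reply = new CompletableFuture<>();
        mailbox.add(new Envelope<>(message, reply));
        return reply;
    }

    public static void main(String[] args) {
        MiniActor<Integer, Integer> square = new MiniActor<>(n -> n * n);
        System.out.println(square.sendMessageReturn(5).join()); // prints 25
    }
}
```

Fibry's real API follows the same shape, as we will see in the benchmarks below, but adds lifecycle management and the choice between threads and fibers.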

An excellent use case for actors is when you have a long-running task that is particularly light, typically because it relies on network operations and just waits for the clients to do something. For example, an IoT network might have all its devices permanently connected to a control server, sending messages only every now and then. A chat is another example of a program that can benefit from actors. And a server supporting WebSockets might be another candidate.

Fibry is my actor system, designed to be small, flexible, and simple to use, and of course to take advantage of Loom. Fibry works with any version of Java starting from Java 8 and has no dependencies, but it requires Loom to use fibers. If you would like more information, please check GitHub: https://github.com/lucav76/Fibry/.

Building Loom

Building Loom is a bit time consuming, but easy. You can get some information here:

https://wiki.openjdk.java.net/display/loom/Main#Main-DownloadandBuildfromSource

After installing Mercurial (OpenJDK is still on Mercurial), you need to run these commands:

hg clone http://hg.openjdk.java.net/loom/loom
cd loom
hg update -r fibers
sh configure
make images

You might need to install some packages during the process, but ‘sh configure’ should tell you which commands to run.

That’s it!

You could now create “Hello World” with fibers:

var fiber = FiberScope.background().schedule(() -> System.out.println("Hello World"));

You can get more information here:

https://wiki.openjdk.java.net/display/loom/Structured+Concurrency

We are not going to use fibers directly, but we will use Fibry, as we are primarily concerned with how actors can benefit from them.

Comparing Fibers and Threads

Let's count how much time we need to create (and keep alive) 3K threads. You can try a higher number if your OS is properly tuned. I am using the standard configuration of a c5.2xlarge VM with the Loom JDK without parameters. It can create 3K threads, but not 4K.

When you run this test with many threads, be prepared: it can be a bit hard on your PC, and you might need to reboot.

for (int i = 0; i < 3000; i++)
    Stereotypes.threads().sink(null);

This code creates 3K “sink threads” that simply discard the messages they receive. On my VM, it takes 210 ms to execute.
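If you want to reproduce the spirit of this measurement on a stock JDK, without Fibry, a rough sketch looks like this: each “sink” thread just blocks forever, mimicking an idle actor (timings will obviously vary by machine).

```java
import java.util.concurrent.CountDownLatch;

public class ThreadCreationBench {
    static long timeCreationMs(int count) {
        CountDownLatch never = new CountDownLatch(1); // nobody counts this down
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    never.await(); // park forever, like a sink actor
                } catch (InterruptedException ignored) { }
            });
            t.setDaemon(true); // let the JVM exit despite live threads
            t.start();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("3000 threads created in " + timeCreationMs(3_000) + " ms");
    }
}
```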

Let's try to create 1M fibers, using the fibers() method instead of threads():

for (int i = 0; i < 1_000_000; i++)
    Stereotypes.fibers().sink(null);

On my VM I can actually create 3M fibers. 3 million!

With Loom you can create roughly 1,000 times more fibers than threads! You can surely tune the VM and the OS to increase the number of threads, but it is my understanding that there is a limit of around 32K.

Fibers are also much faster to create. 3K threads require 210 ms, but in the same amount of time it is possible to create 200K fibers, meaning that fiber creation is up to 70 times faster than thread creation!

Measuring context switching

In general, a computer needs to switch from one thread to another, and this takes a small but significant amount of time. We will now try to see if fibers are faster on this particular problem. To try to measure context switching, we will create two actors and exchange messages synchronously, with code similar to this (you need to call ActorSystem.setDefaultStrategy() to select threads or fibers):

var act = ActorSystem.anonymous().newActorWithReturn((Integer n) -> n * n);
Stereotypes.def().runOnceSilent(() -> {
    for (int i = 0; i < 250_000; i++)
        act.sendMessageReturn(i).get();
}).closeOnExit(act).waitForExit();

Here we have the actor act able to return the square of a number, and another actor asking it to do so 250K times, waiting for the result.

On my VM, threads need around 4,700 ms to complete this task, while fibers need around 1,500 ms, so fibers can exchange 3 times as many synchronous messages as threads.
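The same ping-pong pattern can be approximated on a stock JDK with two SynchronousQueues (my own sketch, not Fibry code): every message forces the caller to block until the worker answers, so each round trip costs at least two context switches.

```java
import java.util.concurrent.SynchronousQueue;

public class PingPong {
    static long pingPongMs(int rounds) {
        SynchronousQueue<Integer> requests = new SynchronousQueue<>();
        SynchronousQueue<Integer> replies = new SynchronousQueue<>();

        // The worker answers each number with its square.
        Thread squarer = new Thread(() -> {
            try {
                while (true) {
                    int n = requests.take();
                    replies.put(n * n);
                }
            } catch (InterruptedException e) {
                // worker stopped
            }
        });
        squarer.setDaemon(true);
        squarer.start();

        long start = System.nanoTime();
        try {
            for (int i = 0; i < rounds; i++) {
                requests.put(i);
                replies.take(); // wait synchronously for the answer
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("250K round trips in " + pingPongMs(250_000) + " ms");
    }
}
```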

Network operations

Let’s now check if network operations are fine.

The following is a simple HTTP “Hello World” that starts the embedded Java HTTP server:

Stereotypes.def().embeddedHttpServer(12345, exchange -> "Hello World!");

Every time a new client connects, a new actor is created to process the request. In this case, threads and fibers perform very similarly, at a disappointing 2,200 or so requests per second. Here the bottleneck is probably the embedded HTTP server, which is not meant for server loads.
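For reference, the JDK ships an embedded HTTP server in com.sun.net.httpserver, which is presumably what the one-liner above wraps (an assumption on my part); used directly, it looks roughly like this:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class HelloHttp {
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = "Hello World!".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body); // send the fixed response and close the stream
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(12345);
        System.out.println("Listening on http://localhost:12345/");
    }
}
```

This class is convenient but explicitly documented as a lightweight implementation, which is consistent with the modest numbers measured above.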

So let's try to write a super simple HTTP server that always answers with the same string:

Stereotypes.def().tcpAcceptorSilent(12345, conn -> {
    try (var is = conn.getInputStream();
         var os = conn.getOutputStream()) {
        while (is.read() != '\n' || is.read() != '\r' ||
               is.read() != '\n') { /* Skip to the end of the headers */ }
        os.write(("HTTP/1.1 200 OK\r\n" +
            "Content-Length: 6\r\n\r\nHello!").getBytes());
    }
}, null).waitForExit();

I am testing with Apache Bench, using 100 concurrent connections:

ab -k -n 50000 -c 100 http://localhost:12345/

The thread version can serve almost 11K requests per second, while fibers score above 24K. So in this test fibers are twice as fast as threads.

Are fibers always faster?

Not exactly. For some reason, threads seem to be slightly faster at sending asynchronous messages, at around 8.5M per second, while fibers peak at around 7.5M per second. In addition, threads seem to suffer less from congestion when the number of threads grows, at least in this particular benchmark.

This might be solvable by switching to a messaging system different from the one Fibry uses. In addition, let's not forget that Loom is not yet ready for production, so there is still room to improve this behavior.
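For a rough feel of what fire-and-forget messaging costs with plain threads, here is a crude throughput sketch (my own code, not Fibry's messaging system): the sender pushes messages into a consumer's mailbox without ever waiting for a reply.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncThroughput {
    static long messagesPerSecond(int messages) {
        BlockingQueue<Integer> mailbox = new ArrayBlockingQueue<>(65_536);

        // The consumer drains the mailbox until all messages have arrived.
        Thread consumer = new Thread(() -> {
            try {
                for (int received = 0; received < messages; received++)
                    mailbox.take();
            } catch (InterruptedException ignored) { }
        });
        consumer.start();

        long start = System.nanoTime();
        try {
            for (int i = 0; i < messages; i++)
                mailbox.put(i); // asynchronous: no reply expected
            consumer.join(); // wait until everything has been consumed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        return (long) (messages / seconds);
    }

    public static void main(String[] args) {
        System.out.println(messagesPerSecond(1_000_000) + " messages/second");
    }
}
```

The mailbox capacity and the single producer/consumer pair are arbitrary choices; real actor systems tune both, which is exactly where the thread-versus-fiber differences measured above can come from.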

If you want to run some benchmarks yourself, you can find the full code and some more tests here: https://github.com/lucav76/FibryBench/

Conclusions

Loom seems to be in good shape. Fibers behave really well from a performance point of view and have the potential to greatly increase the capacity of a server while at the same time simplifying the code. Fibers might not be a solution for every problem, but actor systems can surely benefit greatly from them.

I am looking forward to seeing Loom merged into the mainline of the OpenJDK. What about you?


Luca Venturi

More than 20 years of software development experience in several industries, including Formula 1, internet browsers, smart homes, and AdTech.