Benchmarking JVMs Vol II

A while ago I compared OpenJ9 to HotSpot using the jZenith Redis example app. Since then I did some more optimizations and got it up to 7k requests per second.

The other day I was talking about the new Oracle license model for the JDK, and the name Azul came up, which reminded me that there is another JVM out there: Zing. So in addition to my last post, here are the numbers for Zing. Startup times are comparable to HotSpot, but the request numbers are not as nice:

Summary:
Total: 10.0109 secs
Slowest: 0.0469 secs
Fastest: 0.0007 secs
Average: 0.0101 secs
Requests/sec: 4910.9484


Response time histogram:
0.001 [1] |
0.005 [8930] |■■■■■■■■■■■■■■■■■■■■
0.010 [17706] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.015 [14121] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [5621] |■■■■■■■■■■■■■
0.024 [1791] |■■■■
0.028 [658] |■
0.033 [208] |
0.038 [86] |
0.042 [33] |
0.047 [8] |


Latency distribution:
10% in 0.0038 secs
25% in 0.0064 secs
50% in 0.0095 secs
75% in 0.0129 secs
90% in 0.0168 secs
95% in 0.0198 secs
99% in 0.0271 secs

This is basically the same as my unoptimized HotSpot benchmark. Even though Zing does not complain about being unable to use the native transport, I suspect that Netty is simply better optimized for HotSpot.

Optimizing Vert.x request throughput

While benchmarking HotSpot against OpenJ9 I realised that 5k requests per second are nice enough, but that there might still be some room for optimization.

Vert.x has two levers for improving request performance: the native transport and the number of Verticles (i.e. the concurrency) you allow for requests. So I started playing around with both.

Native Transport

Vert.x has the ability to use different transport implementations, which basically replace the default NIO-based Netty event loop with an epoll-based one.

In order to enable that you just have to add

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.1.19.Final</version>
    <classifier>linux-x86_64</classifier>
</dependency>

to your pom.xml. Make sure that the version matches the Netty version Vert.x is currently using.

In order to prefer the native transport you have to set a Vert.x Option:

new VertxOptions().setPreferNativeTransport(true)

It also gives you some more options to play with the underlying TCP stack:

final HttpServerOptions options = new HttpServerOptions()
        .setTcpFastOpen(true)
        .setTcpNoDelay(true)
        .setTcpQuickAck(true);
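
Putting both pieces together, here is a minimal, self-contained sketch (my own example, not the actual jZenith wiring) of a server that prefers the native transport. Checking isNativeTransportEnabled() is worthwhile, because Vert.x silently falls back to the NIO transport if the native dependency is missing or the classifier does not match your platform:

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.http.HttpServerOptions;

public class NativeTransportServer {

    public static void main(String[] args) {
        // Prefer the epoll transport if netty-transport-native-epoll is on the classpath
        final Vertx vertx = Vertx.vertx(new VertxOptions().setPreferNativeTransport(true));

        // Vert.x falls back to NIO without failing, so verify that we actually got the native transport
        System.out.println("Native transport enabled: " + vertx.isNativeTransportEnabled());

        final HttpServerOptions options = new HttpServerOptions()
                .setTcpFastOpen(true)   // TCP_FASTOPEN, only honoured by the native transport
                .setTcpNoDelay(true)    // disable Nagle's algorithm
                .setTcpQuickAck(true);  // TCP_QUICKACK, also a native-transport option

        vertx.createHttpServer(options)
                .requestHandler(request -> request.response().end("pong"))
                .listen(8080);
    }
}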

So let’s see how that performs:

Summary:
Total: 10.0067 secs
Slowest: 0.0259 secs
Fastest: 0.0027 secs
Average: 0.0091 secs
Requests/sec: 5494.8290


Response time histogram:
0.003 [1] |
0.005 [43] |
0.007 [401] |
0.010 [49262] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.012 [3198] |■■■
0.014 [938] |■
0.017 [1033] |■
0.019 [74] |
0.021 [26] |
0.024 [7] |
0.026 [2] |


Latency distribution:
10% in 0.0081 secs
25% in 0.0085 secs
50% in 0.0089 secs
75% in 0.0093 secs
90% in 0.0097 secs
95% in 0.0107 secs
99% in 0.0151 secs

On average that gives you a few hundred extra requests per second. It is faster, but not by much.

Verticle deployments

Vert.x has multiple event loops, by default as many as there are visible cores. So if you start your HttpServer in a Verticle, you can scale it on the same machine via the number of instances you deploy. Aligning that with the number of hardware threads your system provides is generally a good choice.

Vert.x actually reuses the port binding, so it allows you to deploy multiple Verticles that bind to the same port. As long as they all do the same thing, that is generally not a problem.

// A Supplier is needed here because every instance has to be its own Verticle object
vertx.deployVerticle(() -> new AbstractVerticle() {
            @Override
            public void start(Future<Void> startFuture) {
                // All instances bind to the same host and port; Vert.x distributes incoming connections between them
                vertx.createHttpServer(options)
                        .requestHandler(handler)
                        .listen(restConfiguration.getPort(), restConfiguration.getHost(), ar -> {
                            if (ar.succeeded()) {
                                startFuture.complete(null);
                            } else {
                                startFuture.fail(ar.cause());
                            }
                        });
            }
        // Deploy one instance per visible core
        }, new DeploymentOptions().setInstances(Runtime.getRuntime().availableProcessors()),
        completableHandler.handler());

I’m using availableProcessors(), which since Java 10 even takes cgroup CPU limits into account, meaning that inside a CPU-limited Docker container you only get as many instances as there are CPUs you can actually utilize.
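
As a quick sanity check (my own addition, not part of the original setup), you can print what the JVM actually sees; inside a container started with, say, --cpus=2, a Java 10+ JVM should report 2 here instead of the host's core count:

public class CpuCheck {

    public static void main(String[] args) {
        // On Java 10+ this respects cgroup CPU limits; older JVMs report the host's core count
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
    }
}

With one Verticle instance per available processor, the benchmark looks like this: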

Summary:
Total: 10.0053 secs
Slowest: 0.0313 secs
Fastest: 0.0006 secs
Average: 0.0070 secs
Requests/sec: 7096.2075


Response time histogram:
0.001 [1] |
0.004 [7472] |■■■■■■■■■■
0.007 [30415] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.010 [21795] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.013 [7789] |■■■■■■■■■■
0.016 [2382] |■■■
0.019 [701] |■
0.022 [281] |
0.025 [94] |
0.028 [49] |
0.031 [21] |


Latency distribution:
10% in 0.0036 secs
25% in 0.0049 secs
50% in 0.0065 secs
75% in 0.0086 secs
90% in 0.0110 secs
95% in 0.0129 secs
99% in 0.0176 secs

That adds roughly another 1.6k requests per second over the native-transport run and fully loads my poor laptop, pushing throughput on HotSpot up to about 7k.

A quick look at OpenJ9 shows that it is consistently about 1k requests per second slower, so I guess most of the high-performance code paths are simply better optimized for HotSpot.

Benchmarking JVMs

While playing around with jZenith I realized that I had somehow ignored OpenJ9 until now. So I wanted to see if it actually makes a difference for a small test case.

As I have written basically the same app over and over again in jZenith to play with different integrations, I wanted to see how the example app for the Redis plugin performs.

I use hey for REST benchmarks nowadays, because it draws these nice histograms.

Just a couple of quick numbers for future reference (and for when it finally runs on GraalVM). All numbers are from my laptop (Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, 12 GB of RAM).

openjdk version "11" 2018-09-25
OpenJDK Runtime Environment (build 11+24-Ubuntu-118.04)
OpenJDK 64-Bit Server VM (build 11+24-Ubuntu-118.04, mixed mode, sharing)

Startup time is fairly consistent at around 1.7 seconds, with JVM startup adding 1 second (!) of overhead. After warming up with 60 seconds of requests, I ran:

hey -z 10s http://localhost:8080/user/e01afce1-cf1d-49ab-a78d-53e5ca1032ad

Summary:
Total: 10.0086 secs
Slowest: 0.0401 secs
Fastest: 0.0038 secs
Average: 0.0098 secs
Requests/sec: 5103.0285


Response time histogram:
0.004 [1] |
0.007 [254] |
0.011 [43529] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.015 [5071] |■■■■■
0.018 [1446] |■
0.022 [518] |
0.026 [143] |
0.029 [79] |
0.033 [31] |
0.036 [1] |
0.040 [1] |


Latency distribution:
10% in 0.0084 secs
25% in 0.0088 secs
50% in 0.0091 secs
75% in 0.0096 secs
90% in 0.0127 secs
95% in 0.0142 secs
99% in 0.0193 secs

So 5k requests per second is quite good I’d say.

Next OpenJ9:

openjdk version "11" 2018-09-25
OpenJDK Runtime Environment AdoptOpenJDK (build 11+28-201810022340)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.10.0-rc2, JRE 11 Linux amd64-64-Bit 20181002_42 (JIT enabled, AOT enabled)
OpenJ9 - e44c4716
OMR - 32df9563
JCL - e80f5bd084 based on jdk-11+28)

Startup is slightly faster (measured internally), but the VM overhead is higher, at around 1.5 seconds.

Summary:
Total: 10.0055 secs
Slowest: 0.0378 secs
Fastest: 0.0030 secs
Average: 0.0116 secs
Requests/sec: 4313.1229


Response time histogram:
0.003 [1] |
0.007 [8] |
0.010 [961] |■
0.013 [40454] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.017 [738] |■
0.020 [52] |
0.024 [817] |■
0.027 [111] |
0.031 [2] |
0.034 [5] |
0.038 [6] |


Latency distribution:
10% in 0.0105 secs
25% in 0.0109 secs
50% in 0.0112 secs
75% in 0.0117 secs
90% in 0.0124 secs
95% in 0.0132 secs
99% in 0.0226 secs

So in this scenario HotSpot is definitely faster, but then measuring against a nightly build of OpenJ9 is not really fair. We will see how it performs against Graal.

jZenith – an opinionated approach to building modern Java Microservices

So, after teasing about a Spring Boot for Vert.x I started coding a little bit. The result now runs under the name of jZenith.

It is a simple prototype with an example application that does simple CRUD on a simple entity. But it already has a lot of things that I think are needed, plus some bugs and some nice technologies.

The overall app setup currently looks like this:

JZenith.application(args)
       .withPlugins(
         RestPlugin.withResources(UserResource.class)
                   .withMapping(NoSuchUserException.class, 404),
         PostgresqlPlugin.create()
       )
       .withModules(new ServiceLayerModule(), new PersistenceLayerModule(), new MapperModule())
       .withConfiguration("postgresql.database", "test")
       .withConfiguration("postgresql.username", "test")
       .withConfiguration("postgresql.password", "test")
       .run();

I exchanged a lot of the technologies I have used over the last years. Basically it is Guice based, which I still prefer to any other DI framework. On the REST side it uses RESTEasy, which, as someone thankfully pointed out to me, has native support for RxJava-based resources, making for nice resource methods like this:

public Single<UserResponse> getUser(@NonNull @PathParam("id") final UUID id) {
    return userService.getById(id)
                      .map(userMapper::mapToUserResponse);
}
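
For context, here is roughly how such a resource fits together as a whole class. This is my sketch of the surrounding JAX-RS plumbing, assuming the UserService, UserMapper, and UserResponse types from the example app and RESTEasy's RxJava 2 support; the actual UserResource in jZenith may differ in its annotations and injection details:

import java.util.UUID;

import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import io.reactivex.Single;

@Path("/user")
public class UserResource {

    private final UserService userService;
    private final UserMapper userMapper;

    @Inject
    public UserResource(final UserService userService, final UserMapper userMapper) {
        this.userService = userService;
        this.userMapper = userMapper;
    }

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Single<UserResponse> getUser(@PathParam("id") final UUID id) {
        // RESTEasy's RxJava 2 integration subscribes to the Single and writes the response asynchronously
        return userService.getById(id)
                          .map(userMapper::mapToUserResponse);
    }
}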

I’m basically using it right now to play around with other technologies and to have a little fun with low-level stuff. It is a great learning experience. Most of it is just glue code between the different frameworks; the only part with real code of its own is the configuration system.

The next steps will be to get some of the kinks out of the libraries I am using and to put a better abstraction on SQL-based databases, then maybe support one of the other databases like Cassandra. And I need to write more tests.

And it already has a website, thanks to GitHub Pages (jzenith.org), and a logo. Setting up a project has never been easier, except for the fact that it is really hard to find free .org domains nowadays.
