Vert.x Boot first step

So, I happen to find myself with a lot of free time at my disposal. After spending the last three years as head of tech at a startup, I finally have the time to think about the technologies we used, their shortcomings and pitfalls, and what I would do differently on the next project.

Turns out that I really fell in love with Vert.x as a technology, along with Java 10 and RxJava2. As you can still see on GitHub, we built an awful lot of components around them. Vert.x is blazingly fast and easy to understand, RxJava makes for nice reactive streams, and modern Java is just fun to work with.

We also had a couple of Spring Boot services, mostly for simplicity and because of the hype four years ago. In general, Spring Boot brings a lot of features, most of which you either don’t need or don’t even know about. It is also quite slow to start up, whether in a container, on a VM or in the cloud.

Now, while figuring out what to do next, and what to do in general, I started asking myself what I would build knowing what I know now.

We mostly used a stack based on Vert.x, RxJava (sadly 1.x) and Vert.x Jersey. We glued it all together with a couple of abstractions over Verticle deployment and got our own nice stack this way.

Things I would change (and already have in newer services): use RxJava2 and replace HK2 with Guice. HK2 is nice, but in my opinion slightly inferior to Guice. For some reason HK2 is also not that well maintained; as of last week no released version supported Java 10.

And now I am wondering if I should take this whole stack to a new level and build something like Vert.x Boot, a system that easily lets you bootstrap Vert.x microservices for a variety of use cases with the following features:

Easy to use bootstrapping process like Spring Boot has
Guice as DI and general application configuration approach (I strongly believe in using code for as many things as possible)
Vertx-Jersey as a general REST/JAX-RS abstraction
Metrics and Prometheus integration out of the box using Micrometer
Defined readiness and liveness probe patterns (health checks)
Very good RxJava2 integration
Minimal dependency footprint that is easy to upgrade in order to follow Java’s amazingly fast upgrade cycle
Some well defined testing and containerization patterns (we all build images now, no?) that make it easy to build “golden” images, maybe using Testcontainers
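To make the health-check item in the list above a bit more concrete, here is a minimal sketch of how liveness and readiness could be kept apart. It uses plain Java only; the `HealthChecks` class and its method names are made up for illustration and are not part of Vert.x or any existing library. In the stack described here the two results would presumably be exposed on two HTTP routes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical registry for illustration; not an existing Vert.x API.
final class HealthChecks {
    private final Map<String, BooleanSupplier> liveness = new LinkedHashMap<>();
    private final Map<String, BooleanSupplier> readiness = new LinkedHashMap<>();

    // Liveness: is the process itself healthy? A failure means "restart me".
    HealthChecks liveness(String name, BooleanSupplier check) {
        liveness.put(name, check);
        return this;
    }

    // Readiness: can I serve traffic right now? A failure means "stop routing to me".
    HealthChecks readiness(String name, BooleanSupplier check) {
        readiness.put(name, check);
        return this;
    }

    boolean isLive() {
        return allPass(liveness);
    }

    // A service that is not live is never reported as ready.
    boolean isReady() {
        return isLive() && allPass(readiness);
    }

    private static boolean allPass(Map<String, BooleanSupplier> checks) {
        for (BooleanSupplier check : checks.values()) {
            try {
                if (!check.getAsBoolean()) {
                    return false;
                }
            } catch (RuntimeException e) {
                // A check that throws counts as failed.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] databaseUp = {false};
        HealthChecks checks = new HealthChecks()
                .liveness("event-loop", () -> true)
                .readiness("database", () -> databaseUp[0]);
        System.out.println(checks.isLive() + " " + checks.isReady());
        databaseUp[0] = true;
        System.out.println(checks.isLive() + " " + checks.isReady());
    }
}
```

The point of separating the two is that a missing database connection should take the service out of rotation without getting the container restarted.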

On top of that there should be some modules to integrate with commonly used data layer services and provide the infrastructure that you usually use around them, like connection pooling and schema migrations:

SQL, using jOOQ, HikariCP and Flyway for migrations
Elasticsearch, using vertx-elasticsearch-service and elasticsearch-migration
Cassandra, using vertx-cassandra and cassandra-migration
Kafka
Redis

Bonus

I really like GraphQL, so something that would abstract around GraphQL Java to build easy to use GraphQL endpoints would also be nice
I’m not so sure about my feelings about Jigsaw and the way the Java eco-system is going, but having a good way to work with modules would also be a bonus.

I used to say that Java is not a great fit for microservices because of memory usage and startup times, but with this stack it might actually make sense. I haven’t completely figured out yet how low you can go, but with a very low old-gen baseline and requests that only ever live in the young generation you can actually run those things with a very small memory footprint.

Maybe that is something to fill my time with once the summer ends.

Berlin Buzzwords 2015 – Day 2

It’s All Fun And Games Until…: A Tale of Repetitive Stress Injury (Eric Evans)

Basically, watch out for yourself. If it hurts, you are doing something wrong. It’s good to be reminded about that from time to time. So, watch out for yourself:

Stress injuries is no joke. We only have one health. Thanks, Eric. Now I have all the symptoms 🙂 #bbuzz pic.twitter.com/S9VJMb0JbH

— Alexey Hanin (@alexeyhanin) June 2, 2015

A complete Tweet index on Apache Lucene (Michael Busch)

Michael Busch has given this talk in one version or another for a couple of years now. Unfortunately it has become shallower, with fewer technical details about how they optimized Lucene for Twitter. The numbers are great: two billion queries and about 500 million tweets per day. One thing he didn’t mention in his earlier talks: they found out that Earlybird scales to Twitter-level request volumes during the earthquake in Japan, when they had to shut down the caching layer in front of it in an emergency (it was Ruby on Rails and did not scale that well). Nowadays they keep all tweets in a pretty vanilla Lucene with some additions (going to be open sourced soon) and use a Mesos cluster in case they have to reindex all the data.

And BTW: Tweet IDs encode the timestamp the tweet was sent at.
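A small sketch of what that means in practice. The talk did not go into the bit layout, so the constants below come from the public description of Twitter’s Snowflake ID scheme (milliseconds since a custom epoch in the bits above the lowest 22); the class and method names are made up for this example, and IDs from before Snowflake was introduced in late 2010 do not follow this layout.

```java
import java.time.Instant;

// Illustrative helper; the names are hypothetical.
final class TweetIdTime {
    // Custom epoch of Snowflake IDs, in milliseconds since the Unix epoch.
    static final long TWITTER_EPOCH_MILLIS = 1288834974657L;
    // The lowest 22 bits hold the machine id (10 bits) and a sequence number (12 bits).
    static final int TIMESTAMP_SHIFT = 22;

    static long timestampMillis(long tweetId) {
        return (tweetId >>> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MILLIS;
    }

    static Instant createdAt(long tweetId) {
        return Instant.ofEpochMilli(timestampMillis(tweetId));
    }

    public static void main(String[] args) {
        // Synthetic ID: 1000 ms after the custom epoch, with arbitrary low bits.
        long id = (1000L << TIMESTAMP_SHIFT) | 42L;
        System.out.println(createdAt(id));
    }
}
```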

Automating Cassandra Repairs (Radovan Zvoncek)

First-time speaker Radovan did a pretty good job. Apparently there are several ways to get to the “consistency” of the “eventual consistency” in Cassandra: Read Repair, Hinted Handoff and full-blown anti-entropy repairs. The latter can apparently lead to a lot of problems if done improperly, so Spotify built something to manage that: Reaper. The problems are usually due to disk IO limits, network saturation or just plain full disks. Spotify Reaper orchestrates anti-entropy repairs to make them reliable.

I’m still somewhat confused that one apparently has to spend a lot of time repairing Cassandra clusters. I always thought that was what Cassandra was doing by itself.

Diving into Elasticsearch Discovery (Shikhar Bhushan)

For all the people who, like me, forgot how ES clustering works, this was a good reminder. Plus I learned that discovery is pluggable, so you can write your own plugin to provide the clustering part for ES. He apparently did and wrote Eskka, an Akka-based clustering approach. Writing your own apparently isn’t that much fun because the APIs change all the time. Just in case you forgot, Zen is the default discovery mechanism in ES.

Change Data Capture: The Magic Wand We Forgot (Martin Kleppmann)

Highest buzzword per second speed so far on #bbuzz by @martinkl Change Data Capture. A simple idea found a courageous man to get it shaped.

— Alexey Hanin (@alexeyhanin) June 2, 2015

We all know the problem Martin was describing: the same data in different forms, like in your database, in your cache, in your search engine. He went back to the “Change Data Capture” principle, which basically says “save once, distribute everywhere”. To realize that he wrote a PostgreSQL plugin, “Bottled Water”, which gets the changes from Postgres and posts them to a Kafka topic. Yay for the best project name this year in the category: will never find that on Google.

His implementation and idea are solid; the problem is that there is one Kafka topic per table, so you actually lose the transaction boundaries when reading from Kafka. Otherwise it is transaction-safe: messages are only sent when the transaction in Postgres commits. He uses Avro on the wire and transforms the Postgres DDL schema into an Avro schema.

If you want to get your transaction back you would need a stream processor (Storm/Spark) downstream to reassemble your transactions. Might be a good idea if you already have a Postgres DB or rely on some special properties of a centralized Datastore, otherwise it is OK if your microservices write directly to Kafka.
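A rough sketch of what such downstream reassembly could look like, in plain Java instead of Storm or Spark. It rests on a loud assumption: every change event carries a transaction id and the number of events in its transaction. Both fields (`txId`, `eventsInTx`) are hypothetical, and I don’t know whether the actual messages contain anything like them. A real stream processor would also need timeouts and persistent state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative only; event fields are assumptions, not the real message format.
final class TransactionReassembler {

    static final class ChangeEvent {
        final long txId;       // assumed: id of the originating database transaction
        final int eventsInTx;  // assumed: total number of events in that transaction
        final String table;
        final String payload;

        ChangeEvent(long txId, int eventsInTx, String table, String payload) {
            this.txId = txId;
            this.eventsInTx = eventsInTx;
            this.table = table;
            this.payload = payload;
        }
    }

    // Events of transactions that are not complete yet, keyed by transaction id.
    private final Map<Long, List<ChangeEvent>> pending = new HashMap<>();

    // Buffers the event; returns the whole transaction once all of its events have arrived.
    Optional<List<ChangeEvent>> accept(ChangeEvent event) {
        List<ChangeEvent> events = pending.computeIfAbsent(event.txId, id -> new ArrayList<>());
        events.add(event);
        if (events.size() < event.eventsInTx) {
            return Optional.empty();
        }
        pending.remove(event.txId);
        return Optional.of(events);
    }

    public static void main(String[] args) {
        TransactionReassembler reassembler = new TransactionReassembler();
        // Two events of transaction 7 arrive from two different per-table topics.
        System.out.println(reassembler.accept(new ChangeEvent(7, 2, "orders", "insert")).isPresent());
        System.out.println(reassembler.accept(new ChangeEvent(7, 2, "order_items", "insert")).isPresent());
    }
}
```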

Has someone actually coined the word “nanoservices” yet for designs where each piece basically does just one thing? Like taking the request and writing it to a queue (Kafka), with all further processing done by consumers down the queue that each do just one thing as well.

Designing Concurrent Distributed Sequence Numbers for Elasticsearch (Boaz Leskes)

Elasticsearch is rewriting the way they do distributed indexing based on the Raft Consensus Algorithm. Sounds great, they are mitigating a lot of problems they do have right now.

Apache Lucene 5 – New Features and Improvements for Apache Solr and Elasticsearch (Uwe Schindler)

Apparently, Lucene 4 broke a lot of indexes due to its built-in backward compatibility with Lucene <=3. With two big companies actually relying on Lucene, that kind of amazes me.

Lucene 5 gets rid of all this legacy stuff and drops support for older indexes. Plus it adds a lot of data safety features when it comes to on-disk indices like checksums and sequence numbers. So, Solr and Elasticsearch should finally be production ready …. ;).

The JDK seems to keep breaking Lucene (remember that the initial JDK 7 release broke Lucene?); apparently one should not use the G1 GC with Lucene (ES? Solr?).

And Lucene 5 uses a lot of the “new” JDK 7 APIs for IO to finally get the index safely to disk.

Real-Time Monitoring of Distributed Systems (Tobias Kuhn)

Less distributed, more real-time monitoring. Apparently they built their own system for detecting anomalies in their logs, punnily named Anna Molly, which has now been open sourced.

They made it pretty clear that thresholds are not enough if you have a highly dynamic system that can change along multiple dimensions at any time. Seasonality of your data makes it even harder to define useful thresholds. There are a couple of algorithms which can be used for anomaly detection, namely Tukey’s outlier detection and seasonal trend decomposition. And t-digest comes to the rescue of course.
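For reference, Tukey’s outlier detection boils down to flagging everything outside the fences Q1 − k·IQR and Q3 + k·IQR, usually with k = 1.5. The following is a minimal sketch of that idea, not the code shown in the talk. The quartile definition (linear interpolation between closest ranks) is one of several common choices, and the class name and sample data are made up. For seasonal data one would presumably run this on the remainder after a seasonal trend decomposition, which is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of Tukey's fences; names and data are made up.
final class TukeyFences {

    // Quantile by linear interpolation between closest ranks; expects a sorted, non-empty array.
    static double quantile(double[] sorted, double q) {
        double position = q * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    // Returns all values outside [Q1 - k * IQR, Q3 + k * IQR], in input order.
    static List<Double> outliers(double[] values, double k) {
        List<Double> result = new ArrayList<>();
        if (values.length == 0) {
            return result;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double q1 = quantile(sorted, 0.25);
        double q3 = quantile(sorted, 0.75);
        double iqr = q3 - q1;
        double lowerFence = q1 - k * iqr;
        double upperFence = q3 + k * iqr;
        for (double value : values) {
            if (value < lowerFence || value > upperFence) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] latenciesMs = {12, 11, 13, 12, 14, 11, 95, 13, 12};
        System.out.println(outliers(latenciesMs, 1.5)); // prints [95.0]
    }
}
```

Unlike a fixed threshold, the fences move with the data, which is why this works at all for a system whose normal level keeps shifting.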

For monitoring they actually use a cascade of statsd and carbon.

To sum up bbuzz 2015:

The winning buzzword at #bbuzz 2015 seems to be anomaly detection. Find all the anomalies in all your data!

— Marcus Thiesen (@mthiesen) June 2, 2015

Berlin Buzzwords 2015 – Day 1

Analytics in the age of the Internet of Things (Ludwine Probst)

Basically a talk about analyzing a demo dataset of sports activity with Spark. Not much new stuff in there, but beautifully hand-illustrated slides.

Real time analytics with Apache Cassandra and Apache Spark (Christopher Batey)

Good speaker, awesome talk. Some takeaways:

One should read the Dynamo paper
You can (mis)use the datacenter awareness of Cassandra for isolating workloads if you run Spark on top of it.
500ms is the lowest useful microbatch length

"less than 500ms latency: use Storm. Otherwise Spark (Streaming)" @chbatey #bbuzz

— Marcus Thiesen (@mthiesen) June 1, 2015

Application performance management with open source tools (Tudor Golubenco, Monica Sarbu)

Packetbeat, which apparently just joined Elastic, is a TCP-layer application monitoring solution. What they basically do is understand your protocol (HTTP, Redis, Postgres) and give you metrics directly from the traffic (how long did my HTTP request take?). Sounds really interesting, as it doesn’t need any integration. It will be integrated into the ELK stack and get some more data providers. I really like the idea.

Practical t-digest Applications (Ted Dunning)

t-digest is an algorithm to get real-time quantiles/percentiles out of your data. That comes in handy if you want to have the data always at your fingertips and/or want to identify outliers. It is blazingly fast and needs constant memory, so you actually want to have it wherever you have numbers. Of course there is a Java library and an integration into Elasticsearch. Awesome speaker as well.

The Do’s and Don’ts of Elasticsearch Scalability and Performance (Patrick Peschlow)

Basically a long reminder to RTFM. Know what data you need, know the pitfalls, disable features you don’t need and make sure that your cluster setup fits your requirements.

Detecting Events on the Web with Java, Kafka and ZooKeeper (James Stanier)

Good speaker, I think they build quite interesting stuff at Brandwatch. It was not clear to me until the end why they built all that stuff themselves, but I think they co-evolved with Storm/Spark and just made their existing software cluster-aware rather than rewriting it.

@mthiesen Thanks for the write-up! Indeed – we co-evolved with some of these technologies (in fact, we have had our crawlers since '06!)

— James Stanier (@jstanier) June 1, 2015

Reminded me that there is Apache Curator, a set of high-level abstractions for ZooKeeper services (https://curator.apache.org/).

Analyzing and Searching Streams of Social Media at Scale using Spark, Kafka and Elasticsearch (Markus Lorch)

IBM is using Spark. Basically they got a pretty standard setup to get a lot of data from Twitter and enrich/augment that with some of their proprietary tech (mood detection etc.). Nothing special here.

Predictive Insights for IT Operations (Omer Trajman)

Actually a pretty good speaker, but I didn’t really get the whole point. He basically explained that you should use the same big data techniques for analyzing the data that comes out of your operations measurements (and btw, he has a company specializing in that). But it wasn’t a sales talk. So basically, yes, analyze all the data.

JSunconf 2015 Day 1

famo.us

Web Components

Hardware Hacking

code.talks 2014 – Day 2

Machine Learning mit künstlichen Neuronalen Netzen und Clojure
Stefan Richter

Recruiting @fdc 😉

Datengetriebene Analyse und Verbesserung von Code
Andreas Dewes

The guys take an interesting approach, unfortunately only for Python at the moment.

Code und Gesellschaft – macht was draus!
Nico Lumma

Guerrilla software design: doing it wrong and getting it right
Marco Cecconi

Predictive Analytics zum Schutze der Liebe
Mario Selk

At Parship they have the interesting problem of scammers. It was a very entertaining talk.

Handgranaten für die Developer
Nils Lauk

code.talks 2014 – Day 1

Elasticsearch Lessons Learned
Patrick Peschlow

Marvel is a management and monitoring product for Elasticsearch. Out of it comes the Sense UI as an Elasticsearch client.

Percolators are stored searches.

Always use aliases for indexes.

Know the plugins.

https://blog.codecentric.de

WebAPI – expand what the Web can do today
Carsten Sandtner

XTags (?): Mozilla’s implementation on top of Web Components, like Polymer

Contacts API
Settings API
Vibration API
Alarm API

Modern Web Application (In-)Security
Fabian Beterke, Felix Schmidt

Nothing new in the West. Nor in PHP.

Introduction to CoreOS
Timo Derstappen

GitHub is a public SSH key registry.
etcd, flannel, fleet, locksmith for update management

Hamburg Geekettes – Lightning Talks
Diana Knodel, Uygar Gomez, Lisa Junger, Tina Egolf, Inga Halpin, Eshani Sarma, Tina Umlandt

http://www.vivoie.com – Platform for part-time entrepreneurs

Schluss mit Copy & Paste! Design Pattern automatisieren mit Xtend
Sebastian Zarnekow, Sven Efftinge

Active Annotations – write code that writes code. Nice. I really felt like giving Xtend another chance. I just don’t like Scala.

Distributed Ad hoc Real-Time Stream Processing
Christian Kreutzfeldt

OTTO had a problem they could not get a handle on with Kafka, Storm and Samza, so they built something themselves. It is apparently called ASAP and is supposed to become open source soon (asap).

Regaining Freedom

The Neue Liberale ask: “What absolutely needs to be tackled in Germany?” The question is already phrased the wrong way, but I will answer it anyway.

Western societies, not Germany alone, face technical and structural challenges that can no longer be solved by politics limited to and focused on the nation state. The loss of our freedoms is caused by the intelligence agencies of various nations and by multinational corporations. Social inequality and the loss of the middle class are an international phenomenon. Our environment is being damaged by international companies. Therefore all politics must always be international politics.

Prevent surveillance, promote transparency

We are in the second half of the chessboard. In five years the number of transistors in a computer chip will have overtaken the number of neurons in a human brain. So humans will soon build machines that are more complex than the brain that designed them.
Technical progress (Internet, smartphones) and its speed keep increasing. Yet we are already overwhelmed by its effects on society.

There are 40.4 million smartphones in this country. Each of them, suitably manipulated, is able to monitor its immediate surroundings in picture and sound. What smartphones are this year, drones will be in five years, delivering our parcels and being used in place of fixed surveillance cameras. Driverless cars are already on the roads of California.

Companies and states collect vast amounts of data that fundamentally call our freedom into question. Where every thought and every movement is recorded forever, freedom can no longer be lived.

For this reality, in which every phone, every car, every street lamp and every watch is a potential surveillance machine, we need a concept and an answer. This answer can only consist of strong rules for state and private entities that put individual freedom and each person’s control over their own data first.

These rules must be anchored so firmly that they cannot be circumvented through attacks by large state entities that try to operate in secret, or by large economic entities that become a power factor through their systemic importance.

These rules are the only chance our freedom has in the 21st century. We must regain control over our networks and our communication.

Through the greatest possible transparency for all significant entities, such uncontrolled centers of power must be prevented from developing in the first place.

Closing the gap

The social market economy has been bled dry. In Germany alone the top 0.1% of the population own 22.5% of total wealth. A society cannot exist like this in the long run. Some have already realized that they will soon be chased through the streets by the mob.

We have to get these inequalities, which were caused not least by technological change, back under control before it is too late.

You can only truly develop freely if you have the necessary education and the necessary means. Education is the only chance we have to get all of humanity’s big problems under control. How badly we are failing at it is shown by every single person who, after passing through our education system, decides to join the war in Iraq and Syria.

We must not wait for technical progress to hand us full employment; those times are over. The self-driving car is the next change that will, for a start, leave taxi drivers, delivery services, bus drivers and truck drivers without a livelihood. Therefore we must secure everyone’s livelihood now, and unconditionally. Livelihood here also means a free existence with the possibility of full participation in society, without state over-regulation or artificial scarcity.

Anyone who is busy only with preserving their economic status quo, or who is consumed by existential fears, cannot live freely.

Controlling the economy

These days economic activity far too often happens alongside society and not as part of it. That the economy is nevertheless part of society was last shown by the bailouts of 2008. The economy must be brought back into the middle of society. It must not be an opaque, money-driven, international power factor that can no longer be controlled. Laws written by lobbyists are an insult to our democracy.

Economic activity must only happen sustainably. We have already done lasting damage to this planet’s climate. Nevertheless there are still companies that practice fracking in Germany.

Free development is only possible when the economy operates within a socially accepted framework that sustainably treats the interests of every individual as the highest good. Respecting the natural resources and limits of this planet goes without saying.

The current feeling of powerlessness towards international corporations must be overcome.

JSunconf HH – Day 1

Jelle Akkerman – Your first steps with Clojurescript and Om
ClojureScript is Clojure for the client, compiled to JavaScript.
react.js is a data-binding framework. Om brings react.js to ClojureScript.

Slides

Robin Böhm – Enterprise AngularJS
Unfortunately not much of interest.

Elma Burke & Robin Böhm – Hoodie & AngularJS – The perfect couple for simple and fast Prototyping!

Hoodie is a web app stack with local storage integration (offline first) that also brings the backend stuff along (CouchDB). You can marry that with Angular, and then it is awesome.

Damien Klinnert – Angular performance tuning for large apps
– Prefer ng-if to ng-show (subtree removal vs. CSS hidden)
– use bindonce (on github)
– precalculate properties
– Scalyr Angular
– Angular Fastscroll
– use track by in ngRepeat

Tools
– Batarang / angular-instruments

Slide Wiki

Developer Conference Hamburg 2013 – Day 2

You should hear something about security once a year. The room was full, and the speaker had made extra dark slides so that we would not be blinded after the party. Still learned something. The BeEF project is apparently an XSS toolkit you can use to see what is possible. VersionEye keeps an eye on libraries and their versions and lets you know when there is something new.

The speaker was a bit too monotonous even for my condition. It was also more of a beginner’s talk, so nothing new on that front.

Good overview of how to do e-commerce right. Technically nothing particularly special.

Johannes talked a bit about nerds and managers, and about agile methods really being the solution to everything.

TypeScript is a Microsoft extension for JS that allows types. Looks good, I would use it, especially since what I have seen of CoffeeScript so far was not that convincing.

Good beginner’s talk by someone from Cloudera. Unfortunately I already knew that part.

Then Christian talked about CDNs.

Verdict: great conference, will be back next year.

Developer Conference Hamburg 2013 – Day 1 – Truly Great Cinema

Today the third Developer Conference started in Hamburg. For reasons not to be specified I missed the last two, so this year I could finally attend. The venue was the Cinemaxx at Dammtor; I do prefer events within walking distance of my apartment.
The subtitle “Klassentreffen” (class reunion) was very fitting: you could greet all the usual suspects of the Hamburg IT scene first thing in the morning.
The venue itself was great; listening to talks in cinema seats is definitely very comfortable.

The talks

The G+J Digital people shared what they do and how they work. Interesting in itself, but if you rebuild stern.de every three years that is probably indeed a special case. You can afford to compromise on quality then.

Nice big system: 560,000,000 page impressions in 30 days, 25 servers, Elasticsearch, 384 GB RAM on the DB servers, 32 GB on the app servers, .NET, Redis.
Fun live staging via meta.stackoverflow.com, where the development version runs. Five deployments per day.

Nice caching, they even cache their UI objects to take load off the CLR’s GC.
Nice, interesting talk.

Angular.js (Hannes Finck)
Great code-based talk, I urgently need to do something with Angular. Above all I need to try out slid.es and Plunker.

Then Steff talked about Go 🙂

Neo4j
Graph databases are great when you have strongly connected data that can be modeled with attributes on nodes and edges. The annoying relation tables you need in relational DBs go away. Hence no joins either, and that is apparently pretty fast. It can do transactions as well. Implemented in Java and speaks REST, but you can also embed it.

Then Steff talked about Clojure.

elasticsearch

Crate is an open source extension for Elasticsearch that allows SQL queries. Basically a big data DB based on Lucene. The guys have had it in production for three years. Coming soon.

Verdict: the first day was very good, saw lots of code, heard lots of ideas and met many, many people. Now off to the social event for a bit.

blog.thiesen.org

a geek life