Berlin Buzzwords 2015 – Day 1

Analytics in the age of the Internet of Things (Ludwine Probst)

Basically a talk about analyzing a demo dataset of sports activity with Spark. Not much new stuff in there, but beautifully hand-illustrated slides.

Real time analytics with Apache Cassandra and Apache Spark (Christopher Batey)

Good speaker, awesome talk. Some takeaways:

  • One should read the Dynamo paper
  • You can (mis)use the datacenter awareness of Cassandra for isolating workloads if you run Spark on top of it.
  • 500ms is the lowest useful microbatch length

Application performance management with open source tools (Tudor Golubenco, Monica Sarbu)

Packetbeat, which apparently just joined Elastic, is an application monitoring solution that works at the TCP layer. So what they basically do is understand your protocol (HTTP, Redis, Postgres) and give you metrics directly from the traffic (how long did my HTTP request take?). Sounds really interesting, as it doesn’t need any integration into the application itself. It will be integrated into the ELK stack and get some more data providers. I really like the idea.
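To illustrate the idea of getting latency metrics purely from observed traffic, here is a toy sketch in Python. It is not how Packetbeat is implemented and uses none of its APIs; the event tuples, the connection ids and the function name `response_times` are all made up for this example. It assumes that something has already decoded the traffic into "request" and "response" events per connection.

```python
from collections import deque


def response_times(events):
    """Pair each request with the next response on the same connection.

    `events` is a hypothetical, time-ordered list of
    (timestamp_ms, connection_id, kind) tuples, where kind is
    "request" or "response". Returns (connection_id, duration_ms) pairs.
    """
    pending = {}    # connection_id -> timestamps of unanswered requests
    durations = []
    for timestamp_ms, connection_id, kind in events:
        if kind == "request":
            pending.setdefault(connection_id, deque()).append(timestamp_ms)
        elif kind == "response" and pending.get(connection_id):
            # Oldest unanswered request on this connection gets the response.
            started = pending[connection_id].popleft()
            durations.append((connection_id, timestamp_ms - started))
    return durations


# Made-up sample traffic on two connections.
events = [
    (0, "conn-a", "request"),
    (10, "conn-b", "request"),
    (25, "conn-a", "response"),
    (60, "conn-b", "response"),
]
print(response_times(events))  # [('conn-a', 25), ('conn-b', 50)]
```

The application never has to be touched: everything is derived from timestamps of what goes over the wire, which is the appeal of the approach.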

Practical t-digest Applications (Ted Dunning)

t-digest is an algorithm to get realtime quantiles/percentiles out of your data. That comes in handy if you want to have those numbers always at your fingertips and/or want to identify outliers. It is blazingly fast and needs constant memory, so you actually want to have it wherever you have numbers. Of course there is a Java library and an integration into Elasticsearch. Awesome speaker as well.
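As a rough illustration of why this works with bounded memory, here is a heavily simplified sketch in Python. It is not the real t-digest implementation and not code from the talk: the class name `TinyDigest`, its parameters and the nearest-centroid lookup (without interpolation) are assumptions made for this example. The core idea it borrows is that values are summarized as weighted centroids, and a scale function keeps the centroids near the tails small, so extreme percentiles stay accurate.

```python
import math
import random


class TinyDigest:
    """Simplified merging digest in the spirit of t-digest (illustration only)."""

    def __init__(self, compression=100, buffer_size=500):
        self.compression = compression
        self.buffer_size = buffer_size
        self.centroids = []  # sorted list of (mean, weight)
        self.buffer = []     # new values not merged yet
        self.total = 0       # total weight held in self.centroids

    def _k(self, q):
        # Scale function: steep near q=0 and q=1, so tail centroids stay small.
        return self.compression / (2 * math.pi) * math.asin(2 * q - 1)

    def add(self, x):
        self.buffer.append((float(x), 1))
        if len(self.buffer) >= self.buffer_size:
            self._compress()

    def _compress(self):
        items = sorted(self.centroids + self.buffer)
        self.buffer = []
        if not items:
            return
        total = sum(w for _, w in items)
        merged = []
        mean, weight = items[0]
        seen = 0  # weight already emitted to `merged`
        k_left = self._k(0.0)
        for m, w in items[1:]:
            q_right = (seen + weight + w) / total
            if self._k(q_right) - k_left <= 1.0:
                # Still within one unit of the scale function: fold it in.
                mean = (mean * weight + m * w) / (weight + w)
                weight += w
            else:
                merged.append((mean, weight))
                seen += weight
                k_left = self._k(seen / total)
                mean, weight = m, w
        merged.append((mean, weight))
        self.centroids = merged
        self.total = total

    def quantile(self, q):
        self._compress()
        if not self.centroids:
            raise ValueError("no data")
        target = q * self.total
        cumulative = 0
        for mean, weight in self.centroids:
            if cumulative + weight >= target:
                return mean  # no interpolation, to keep the sketch short
            cumulative += weight
        return self.centroids[-1][0]


random.seed(42)
digest = TinyDigest()
for _ in range(20000):
    digest.add(random.random())
print(len(digest.centroids), digest.quantile(0.5), digest.quantile(0.99))
```

However many values are added, the number of centroids stays bounded by the compression setting, which is where the constant memory comes from. For real use, the actual library is the better choice, as it interpolates between centroids and handles edge cases this sketch ignores.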

The Do’s and Don’ts of Elasticsearch Scalability and Performance (Patrick Peschlow)

Basically a long reminder to RTFM. Know what data you need, know the pitfalls, disable features you don’t need and make sure that your cluster setup fits your requirements.

Detecting Events on the Web with Java, Kafka and ZooKeeper (James Stanier)

Good speaker, I think they build quite interesting stuff at Brandwatch. It was not clear to me till the end why they built all that stuff themselves, but I think they co-evolved with Storm/Spark and just made their existing software cluster-aware rather than rewriting it.

Reminded me that there is Apache Curator, a set of high-level abstractions for ZooKeeper services (https://curator.apache.org/).

Analyzing and Searching Streams of Social Media at Scale using Spark, Kafka and Elasticsearch (Markus Lorch)

IBM is using Spark. Basically they have a pretty standard setup to get a lot of data from Twitter and enrich/augment that with some of their proprietary tech (mood detection etc.). Nothing special here.

Predictive Insights for IT Operations (Omer Trajman)

Actually a pretty good speaker, but I didn’t really get the whole point. He basically explained that you should use the same big data techniques for analyzing the data that comes out of your operations measurements (and btw, he has a company specializing in that). But it wasn’t a sales talk. So basically, yes, analyze all the data.

 
