I’ve created this interactive visualization of New Year’s Eve 2014 as it unfolded on Twitter, using a bunch of open source tools running on Windows Azure.
Recently I’ve been spending some time working with Big Data and Hadoop distributions, and I was looking for a “useful” side project to play around with the technology. What bigger event is there on Twitter than the annual #happynewyear tweets as they fly around the world at the dawn of 2014?
I connected to Twitter’s streaming API using a simple node.js client. The open source node package, appropriately named Twit, by Tolga Tezel does all the heavy lifting for me in a few lines of code. I aggregated over 6 million tweets in 24 hours, averaging 60 tweets per second. According to Twitter’s documentation, the streaming API gives you access to about 1% of the Twitter firehose at any one time. Judging by the geographic spread of the tweets, I suspect the sample is sympathetic to where in the world you connect from; I was running out of the Windows Azure data center in Dublin.
Processing the data
Now on to the data crunching. I uploaded all the tweets in multiple 20MB text files to Windows Azure Blob Storage and spun up an 8 node HDInsight Hadoop cluster to process the data. Storing the tweets directly in blob storage gave me the flexibility to spin up the cluster for only a couple of minutes. I aggregated all the tweets that had a place associated with them and extracted the latitude and longitude coordinates.
Visualizing the results
I used Chrome’s open source WebGL Globe platform to showcase the results in an interactive 360-degree visualization of the data. You’ll need to be running a WebGL-enabled browser when you connect to the website.
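WebGL Globe takes its data as a flat array of [lat, lon, magnitude] triples. A small sketch of preparing that array (the `toGlobeSeries` helper and the point shape are my own illustration; the globe wiring in the comments follows the dataarts/webgl-globe examples and runs in the browser, not in node):

```javascript
// Sketch: flatten aggregated points into the [lat, lon, magnitude, ...] array
// that WebGL Globe's addData() expects. Names here are illustrative.
function toGlobeSeries(points) {
  var flat = [];
  points.forEach(function (p) {
    flat.push(p.lat, p.lon, p.count); // magnitude scales the spike height
  });
  return flat;
}

// Browser-side wiring (assumes globe.js from the webgl-globe repo is loaded):
//   var globe = new DAT.Globe(document.getElementById('container'));
//   globe.addData(toGlobeSeries(points), { format: 'magnitude' });
//   globe.createPoints();
//   globe.animate();
```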
Open Source tools – power to the people
This experiment cost me absolutely zip to conduct! All the code and technologies I used were open source: node.js, Hadoop and WebGL Globe. The cloud compute time also came free of charge thanks to my MSDN subscription.
The source code is available here. May the source be with you, and #happynewyear!
I’ve been working on a pretty cool side project that I presented at Tech Ed Australia 2012 – “The Mass Mobile Experiment!”
It’s a generic collaboration framework that enables lots of people (say, at a conference) to enjoy a shared experience in real time using their mobile phone, tablet or other internet device. At my Tech Ed session I had over 100 people playing a single game of pong on the big screen, using their mobile phones to control the paddles in real time! The platform is built using node.js and websockets (socket.io), and it supports a plug-in architecture enabling other games / experiences to plug in pretty easily. So far I’ve got a multi-player quiz game, pong, a political worm and an interactive presentation.
Conceptual Architecture – MME (Mass Mobile Experiment)
- Client (mobile phone) sends data to the server over a long running websocket
- Server (node.js) aggregates the data and sends it to the playfield over websockets
- Playfield (browser on a big screen) runs the game loop and processes the aggregated data from the server
- Control Panel allows you to change games and throttle the client and server
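The interesting part of that pipeline is the server’s aggregation step. Here is a runnable sketch of it with the socket layer faked out, so the data flow is visible without a socket.io server; the names (`aggregate`, `FakePlayfield`, the `'frame'` event) are my own illustration, not from the real code base:

```javascript
// Sketch of the server-side aggregation step, with the socket layer faked.
// In the real platform the inputs arrive over socket.io and the frame is
// emitted to the playfield browser; all names here are illustrative.

// Average the paddle positions received from all phones since the last tick.
function aggregate(positions) {
  if (positions.length === 0) return 0;
  var sum = positions.reduce(function (a, b) { return a + b; }, 0);
  return sum / positions.length;
}

// Stand-in for the playfield's socket; a real server would emit() over the wire.
function FakePlayfield() {
  this.frames = [];
}
FakePlayfield.prototype.emit = function (event, data) {
  this.frames.push(data);
};

// One tick of the server loop: drain the buffered client inputs into a frame.
function tick(inputs, playfield) {
  playfield.emit('frame', { paddle: aggregate(inputs) });
  inputs.length = 0; // reset the buffer for the next tick
}

var playfield = new FakePlayfield();
var inputs = [0.2, 0.4, 0.6]; // paddle positions pushed by three phones
tick(inputs, playfield);
// playfield.frames[0].paddle is now ~0.4
```

Aggregating on a fixed tick rather than forwarding every message is what keeps the playfield’s game loop smooth no matter how many phones are connected.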
Benchmarking on Windows Azure
In order to load test the platform I built yet another game! This time I got 200 people in the office to “play the game”: it involved leaving a web page open for 20 minutes while I stepped up the number of websocket connections on each browser and started to send data to the server.
- The client connects to the server over websockets and sends typical game data on a timer
- The server collects interesting metrics such as requests per second and CPU usage, and sends these to the playfield
- The playfield (load test app) listens for data over another websocket and plots it in real time
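A sketch of the load test’s client side, with the socket layer faked out so the ramp-up and send-rate logic is runnable on its own; the names (`FakeSocket`, `rampUp`, `tick`) are illustrative, and a real run would use a socket.io client instead:

```javascript
// Sketch of the load-test client logic with a faked socket layer.
// All names are illustrative, not from the real load test code.

function FakeSocket(id) {
  this.id = id;
  this.sent = 0;
}
FakeSocket.prototype.emit = function (event, data) {
  this.sent++; // a real socket.io client would transmit here
};

// Step up to `count` open connections, as each browser did during the test.
function rampUp(count) {
  var sockets = [];
  for (var i = 0; i < count; i++) sockets.push(new FakeSocket(i));
  return sockets;
}

// One 250ms timer tick: every connection sends one message, i.e. 4 messages
// per second per client, the send rate used in the benchmark.
function tick(sockets) {
  sockets.forEach(function (s) {
    s.emit('move', { x: Math.random(), y: Math.random() });
  });
}

// At full load: 2000 connections * 4 messages/second = 8000 messages/second,
// in the same ballpark as the ~8500 requests per second measured below.
```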
- Node.js server running on a medium size worker role on Windows Azure: 3.5 GB RAM, allocated bandwidth 200 Mbps
- 2000 concurrent websockets (multiplexed over 200 different laptops in the office)
- Requests per second: 8500
- Memory usage on Azure: 76%
- Message send frequency from each client: 4 messages per second
Check out this screenshot from the Azure management portal – I managed to push the CPU to 89% at 11:40 when the system ramped up to 2000 concurrent users!
- Node.js running on an Azure worker role scales really nicely. In my case a medium sized VM scaled to 2000 concurrent websockets processing 8500 requests per second. Not a single message got lost between the browser and the server, even when the server was under stress!
Why you should distrust this post!
- The measurements were taken using node.js v0.8.9 and socket.io v0.9; these technologies are evolving rapidly.
- For the mass mobile experiment the node server is pretty simplistic, it aggregates data and sends it to the playfield. This may not represent what your application is doing.
All of the results, along with all of the source code, are open sourced here on GitHub.
May the source be with you!