Coding Hell

Programming and stuff.

ReChat for Twitch

A little while ago, I released ReChat for Twitch, a browser extension that adds a nifty feature to Twitch, the video game streaming platform that Amazon acquired last year for $970 million.

Twitch not only lets users watch live video streams but also records them so that users can watch them later. The one thing you miss out on when you watch a recorded stream is the chat. This is where ReChat comes into play: it replays the recorded chat messages alongside the video, as if you were watching a live stream.

Indexing chat messages

Since Twitch either does not record chat messages or at least does not make them available through its API, I was forced to build my own indexing system.

Luckily, Twitch’s custom-built chat server not only comes with an HTTP front-end, but is also reachable via IRC. IRC, in contrast to Twitch’s proprietary HTTP chat interface, is a well-documented and quite simple standard (RFC 1459) and was therefore the natural choice for connecting to Twitch chat.

I chose to write the indexing daemon in JavaScript and run it with Node.js. Its event-driven architecture and non-blocking I/O seem like a perfect fit for an application that waits on numerous connections for incoming chat messages.
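To give an idea of what such a daemon has to do with each incoming line, here is a minimal sketch of parsing the RFC 1459 message format in plain JavaScript. It is not ReChat's actual code: the function names `parsePrivmsg` and `pongFor` are made up for illustration, and the sketch ignores everything except channel messages and keep-alive pings.

```javascript
// Parse one raw IRC line (RFC 1459 message format) into its parts.
// Returns null for anything that is not a channel PRIVMSG.
// Hypothetical helper, not taken from the ReChat source.
function parsePrivmsg(line) {
  // Strip the trailing CR/LF that terminates every IRC message.
  const trimmed = line.replace(/[\r\n]+$/, "");
  // Expected shape: ":nick!user@host PRIVMSG #channel :message text"
  const match = /^:([^!\s]+)!\S+ PRIVMSG (#\S+) :(.*)$/.exec(trimmed);
  if (match === null) return null;
  return { sender: match[1], room: match[2], message: match[3] };
}

// IRC servers send PING regularly and expect a PONG with the same payload;
// returns the reply to send, or null if the line is not a PING.
function pongFor(line) {
  const trimmed = line.replace(/[\r\n]+$/, "");
  return trimmed.startsWith("PING ") ? "PONG " + trimmed.slice(5) : null;
}
```

In a Node.js daemon, functions like these would typically be called from the `data` event handler of each socket, once the incoming buffer has been split into lines.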

For the storage backend, I chose to go with Elasticsearch. Not only does Elasticsearch do a good job of indexing hundreds of documents per second, it is also easily scalable and blazingly fast at finding documents. Every chat message is represented by a JSON document consisting of the actual message, the sender, the chat room, and a timestamp.
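As a rough sketch of what such a document could look like, and how a batch of them might be serialized for the Elasticsearch bulk API, consider the following. The field names (`message`, `sender`, `room`, `timestamp`), the ISO 8601 timestamp format, and the helper names are assumptions for illustration; the real index may be laid out differently.

```javascript
// Build the JSON document for one chat message.
// Field names are assumed; the post only names the four pieces of data.
function toDocument(sender, room, message, receivedAt) {
  return {
    message: message,
    sender: sender,
    room: room,
    timestamp: receivedAt.toISOString(),
  };
}

// Serialize documents as a newline-delimited bulk request body:
// one action line followed by one source line per document,
// terminated by a final newline.
function toBulkBody(indexName, documents) {
  const lines = [];
  for (const doc of documents) {
    lines.push(JSON.stringify({ index: { _index: indexName } }));
    lines.push(JSON.stringify(doc));
  }
  return lines.join("\n") + "\n";
}
```

Batching messages this way keeps the number of HTTP requests low when many chat rooms are busy at once. Note that older Elasticsearch versions also expect a `_type` in the action line or the request URL.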

Browser extension

Since all modern browsers (probably even Internet Explorer 12) feature a JavaScript-powered extension interface, writing cross-browser extensions has never been easier. The main differences between the browsers are how you have to provide metadata (e.g. manifest.json for Chrome, package.json for Firefox, …) and the API for accessing bundled resources.

All of the browser extension’s source code is available on GitHub.

Search API

The ReChat search API consists of a basic Sinatra web application that talks to Elasticsearch, a threaded Puma instance that acts as the application server, and nginx as a reverse proxy.

Based on the unique Twitch video ID, the web application fetches the video's timing information via the Twitch API, searches the Elasticsearch index for the matching chat messages, and serves them in chunks with pagination.
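The lookup boils down to a time-range query restricted to one chat room. The sketch below builds such a search body; it is written in JavaScript purely for illustration (the actual API is the Sinatra application described above), and the field names, the `from`/`size` pagination scheme, and the function name `buildChatQuery` are all assumptions.

```javascript
// Build an Elasticsearch search body for one page of a video's chat messages.
// recordedAt: Date at which the recording started (as reported for the video)
// lengthSeconds: length of the recording in seconds
// page: zero-based page number; pageSize: messages per page
function buildChatQuery(room, recordedAt, lengthSeconds, page, pageSize) {
  const end = new Date(recordedAt.getTime() + lengthSeconds * 1000);
  return {
    from: page * pageSize,
    size: pageSize,
    // Oldest first, so the client can replay messages in order.
    sort: [{ timestamp: { order: "asc" } }],
    query: {
      bool: {
        filter: [
          { term: { room: room } },
          {
            range: {
              timestamp: { gte: recordedAt.toISOString(), lt: end.toISOString() },
            },
          },
        ],
      },
    },
  };
}
```

For example, `buildChatQuery("#somechannel", start, 3600, 2, 100)` would describe the third page of 100 messages from a one-hour recording. The `bool`/`filter` form assumes a reasonably recent version of the query DSL.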

Aggregations

Elasticsearch has some nifty aggregation features for analyzing the available data. Maybe some sort of statistics page would be a nice addition? It could, for example, feature a visual representation of the most frequent words (with some common English words filtered out).
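One way to get at the most frequent words is a terms aggregation over the message text. The following is only a sketch of the request body: the field name `message`, the tiny stop word list, and the function name are assumptions, and whether a terms aggregation works on that field depends on how it is mapped and analyzed in the index.

```javascript
// A deliberately short, illustrative list of common English words to skip.
const STOP_WORDS = ["the", "a", "and", "to", "of", "is", "it", "in"];

// Build a search body that returns the most frequent words in one chat room.
function buildTopWordsQuery(room, maxWords) {
  return {
    size: 0, // only the aggregation result is needed, not the documents
    query: { term: { room: room } },
    aggs: {
      top_words: {
        // "exclude" takes a list of exact terms to leave out of the buckets.
        terms: { field: "message", size: maxWords, exclude: STOP_WORDS },
      },
    },
  };
}
```

The response would contain one bucket per word with its document count, which is the kind of data a word cloud or bar chart needs.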
