Spark, Hadoop, Hive and Programming: Design Facebook Chat Function

Monday, March 6, 2017

Design Facebook Chat Function

How would you design the Facebook chat function?

First and foremost, as I mentioned in previous posts, system design interviews can be extremely diverse. It’s mostly up to the interviewer to decide which direction the discussion takes. As a result, different interviewers can have completely different discussions even with the same question, and you should never treat this article as a standard answer.

Basic infrastructure

It’s better to start with a high-level solution and talk about the overall infrastructure. If you have no prior experience with messaging apps, you might find it hard to come up with even a basic solution. But that’s totally fine. Let’s start with a very naive solution and optimize it later.

Basically, one of the most common ways to build a messaging app is to have a chat server that acts as the core of the whole system. When a message is sent, it doesn’t go to the receiver directly. Instead, it goes to the chat server and is stored there first. Then, based on the receiver’s status, the server either delivers the message to the receiver immediately or sends a push notification.

A more detailed flow works like this:

User A wants to send the message “Hello Gainlo” to user B. A first sends the message to the chat server.
The chat server receives the message and sends an acknowledgement back to A, meaning the message has been received. Depending on the product, the front end may display a single check mark in A’s UI.
Case 1: If B is online and connected to the chat server, that’s great. The chat server just sends the message to B.
Case 2: If B is not online, the chat server sends a push notification to B.
B receives the message and sends an acknowledgement back to the chat server.
The chat server notifies A that B has received the message, and A’s UI is updated with a double check mark.
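
The flow above can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not how Facebook actually implements it: the names ChatServer, Message, Status, send, ack and outbox are all made up for this example, and a real server would persist messages and talk to real connections and a push service.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    STORED = "stored"        # server has the message (single check mark)
    DELIVERED = "delivered"  # receiver acknowledged it (double check mark)


@dataclass
class Message:
    msg_id: int
    sender: str
    receiver: str
    text: str
    status: Status = Status.STORED


@dataclass
class ChatServer:
    online: set = field(default_factory=set)      # users with a live connection
    messages: dict = field(default_factory=dict)  # msg_id -> Message
    outbox: list = field(default_factory=list)    # (channel, user, msg_id) log

    def send(self, sender, receiver, text):
        """Store the message first, then deliver it directly or via push."""
        msg = Message(len(self.messages) + 1, sender, receiver, text)
        self.messages[msg.msg_id] = msg
        channel = "direct" if receiver in self.online else "push"
        self.outbox.append((channel, receiver, msg.msg_id))
        # Returning the id stands in for the acknowledgement sent back to A.
        return msg.msg_id

    def ack(self, msg_id):
        """Receiver confirms receipt; the server then notifies the sender."""
        msg = self.messages[msg_id]
        msg.status = Status.DELIVERED
        self.outbox.append(("receipt", msg.sender, msg_id))
```

The outbox list is just a log of what the server would have sent and to whom, which makes the two cases (direct delivery versus push notification) easy to inspect.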

Real-time

The whole system can become costly and inefficient once it’s scaled to a certain level. So is there any way to optimize the system to support a huge number of concurrent requests?

There are many approaches. One obvious cost is that when delivering a message to the receiver, the chat server might need to spawn an OS process or thread, initialize an HTTP (or other protocol) request and close the connection at the end. And this happens for every single message. Even if we do it the other way around, with the receiver repeatedly asking the server whether there are any new messages, it’s still costly.

One solution is HTTP long polling over a persistent connection. In a nutshell, the receiver makes an HTTP GET request that the chat server holds open and doesn’t answer until it has data to return. When a request times out or is interrupted, the receiver simply issues a new one. Compared with opening a connection per message or polling repeatedly, this approach offers a lot of advantages in terms of response time, throughput and cost.
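
To make the idea concrete, here is a rough sketch of the server side of such a held-open request, using Python’s asyncio instead of a real HTTP stack. The Mailbox class and its deliver and long_poll methods are hypothetical names for this illustration; in a real system long_poll would be the handler behind the GET request.

```python
import asyncio


class Mailbox:
    """Per-user queue that a held-open request waits on (hypothetical helper)."""

    def __init__(self):
        self._queue = asyncio.Queue()

    def deliver(self, text):
        """Called by the chat server when a message for this user arrives."""
        self._queue.put_nowait(text)

    async def long_poll(self, timeout):
        """Wait until a message arrives or the timeout expires."""
        try:
            first = await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            return []  # empty response; the client just issues a new request
        batch = [first]
        while not self._queue.empty():  # also return anything already queued
            batch.append(self._queue.get_nowait())
        return batch


async def demo():
    box = Mailbox()
    empty = await box.long_poll(timeout=0.05)  # nothing arrives: times out
    poll = asyncio.create_task(box.long_poll(timeout=1.0))
    await asyncio.sleep(0.01)                  # the request is now parked
    box.deliver("Hello Gainlo")                # wakes the parked request
    return empty, await poll
```

Running asyncio.run(demo()) shows both outcomes: the first poll returns nothing after its timeout, and the second returns as soon as the message is delivered, without the client having to ask repeatedly.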

Online notification

Another cool feature of Facebook chat is showing which friends are online. Although the feature seems simple at first glance, it improves the user experience tremendously and it’s definitely worth discussing. If you were asked to design this feature, how would you do it?

Obviously, the most straightforward approach is that once a user comes online, a notification is sent to all of that user’s friends. But how would you evaluate the cost of this?

At peak time, we need roughly O(average number of friends * peak users) requests, which can be a lot when there are millions of users. This cost can be even higher than the cost of the messages themselves. One idea to improve this is to cut unnecessary requests. For instance, we can issue a notification only when the user reloads a page or sends a message. In other words, we can limit the scope to only “very active users”. Or we can hold the notification back until a user has been online for 5 minutes. This handles the case where a user shows up online and immediately goes offline.
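
Here is a small sketch of both the cost estimate and the “wait 5 minutes” rule. The function and class names (fanout_requests, PresenceTracker) are invented for this example, and the user and friend counts used with it are illustrative figures, not real Facebook numbers.

```python
def fanout_requests(peak_users, avg_friends):
    """Notifications needed if every peak user notifies every friend."""
    return peak_users * avg_friends


class PresenceTracker:
    """Announce a user as online only after they have stayed connected."""

    def __init__(self, min_online_seconds=300):  # 300 s = the 5-minute rule
        self.min_online_seconds = min_online_seconds
        self._online_since = {}   # user -> time they connected
        self._announced = set()   # users whose friends were already notified

    def connect(self, user, now):
        self._online_since.setdefault(user, now)

    def disconnect(self, user):
        self._online_since.pop(user, None)
        self._announced.discard(user)

    def due_announcements(self, now):
        """Users who have been online long enough and not yet announced."""
        due = [
            user
            for user, since in self._online_since.items()
            if user not in self._announced
            and now - since >= self.min_online_seconds
        ]
        self._announced.update(due)
        return due
```

With the made-up figures of 1,000,000 peak users and 200 friends each, fanout_requests gives 200,000,000 notifications. With the tracker, a user who connects and leaves again before the threshold never generates any notification at all.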
