Designing Real-Time Collaborative Systems: A Deep Dive into Mouse Pointer Tracking

Master System Design Interviews with a Step-by-Step Approach to Building Scalable, Low-Latency Solutions

Oct 06, 2024

A decade ago, system design interview questions were broad, like "Design Facebook" or "Design WhatsApp," which were too complex to cover in an hour. Candidates began memorizing answers, but struggled when interviewers asked deeper questions.

Over time, the focus shifted to smaller sub-systems, like newsfeeds or notification systems, which fit within the interview timeframe. These problems often reflect real challenges companies have faced, allowing them to evaluate a candidate's approach.

Companies sometimes share these issues in blog posts, so it's crucial for developers to stay updated by reading them.

In today’s article, we'll explore a real-time mouse pointer system, inspired by a problem solved by Canva. We'll treat it as a system design question, create a solution, and discuss the trade-offs. If you're preparing for system design interviews, this article will guide you on how to approach them. Let's begin!

Note: This article is inspired by Canva’s article on how they designed real-time mouse pointer movements. Credit and kudos to Canva’s engineering team for their efforts.

Problem Statement

Design a collaborative editing system that shows the real-time mouse pointer movements of the editors or the viewers.

**Two different dashboards showing the users’ mouse pointers**

Clarifying questions

Before proposing a design, clarify assumptions and ask relevant questions. Here are some questions to ask your interviewer:

What should the collaborative editor display to the users?
What should happen on the screen when a user moves their mouse pointer?
How will we differentiate mouse pointers: by colour or by username?
Should we design the editing features or focus only on mouse pointer movements?
What happens when a new user joins or an existing user disconnects?

These questions help define the functional requirements of your system, influencing your API design and the workflows.

Additionally, discuss system constraints, such as:

What is the expected scale and how many concurrent users can the system handle?
Should the system track and store all mouse movement history?
What is the required availability? Should it aim for zero downtime with a 99.99% SLA?
How much lag in milliseconds is acceptable for the mouse movement?
How accurately should the system display the mouse pointer movements?

Take notes on the interviewer’s answers and keep these constraints in mind while designing the system.

Let’s now define the functional and non-functional requirements for the system. (PS - The following requirements may differ based on the interviewer.)

Functional requirements (FR)

Users should see the collaborative editor screen and view other users working on the same document. FR-1
All users should see each other's mouse movements on the screen in real-time. FR-2
Each user's mouse pointer must display their username. FR-3
The feature should focus solely on showing the mouse movements, not the collaborative editing. FR-4
When a new user joins, their mouse pointer should appear on all the screens. If a user disconnects, their pointer should be hidden from all the screens. FR-5

Non-Functional requirements (NFR)

The system must handle mouse movements from 100K online users with a high throughput. NFR-1
The system should not keep a history of users’ mouse locations; it must only display the current position and the movement. NFR-2
The system must be highly available with zero downtime and 99.99% availability, functioning without errors during upgrades or deployments. NFR-3
The system’s p99 latency for mouse pointer movements should be under 50 ms for a seamless user experience. NFR-4
Mouse movements do not need 100% accuracy; a tolerance of 5-10% is acceptable. NFR-5
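Before designing, a quick back-of-envelope estimate shows what NFR-1 implies for message volume. The sketch below is illustrative only: it assumes every online user is active on some document, reuses the 3 updates per second per client assumed in the Data models section, and uses a hypothetical average of 5 users per document (`avg_users_per_doc` is an assumption to confirm with the interviewer).

```python
def estimate_rates(online_users: int, updates_per_sec: int, avg_users_per_doc: int) -> dict:
    """Back-of-envelope message rates for the pointer-tracking feature.

    Assumes every online user is moving their pointer on some document and that
    documents have `avg_users_per_doc` participants on average (hypothetical).
    """
    # Client -> server: each user sends `updates_per_sec` pointer updates.
    inbound = online_users * updates_per_sec
    # Server -> clients: each update is shown to the other participants of the document.
    outbound = inbound * (avg_users_per_doc - 1)
    # Time between two consecutive updates from one client, in milliseconds.
    interval_ms = 1000 / updates_per_sec
    return {"inbound_per_sec": inbound, "outbound_per_sec": outbound,
            "update_interval_ms": round(interval_ms)}


print(estimate_rates(online_users=100_000, updates_per_sec=3, avg_users_per_doc=5))
# {'inbound_per_sec': 300000, 'outbound_per_sec': 1200000, 'update_interval_ms': 333}
```

Under these assumptions the fan-out side (server to clients) carries several times the inbound volume, so distributing updates between servers efficiently matters as much as accepting them.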

Now that we have a clear idea of the system’s requirements, our next step is to come up with a design, define the data models and identify the different workflows.

Detailed Design & Choices

We will now outline the different actors, their interactions, the data flow, and methods for storing and propagating the data.

Actors

Clients - Desktop web browsers connect to the backend servers and periodically send the mouse pointer’s location on the screen. We will assume that client-side libraries capture this information and send it to the servers.
Servers - The servers are responsible for the following:
1. Managing the client connections.
2. Processing the mouse pointer updates from the clients.
3. Relaying each mouse pointer update from one client to the other clients working on the same document.

Data models

The data exchanged between the client and the server falls into the following categories:

New connection - The client initiates a connection and sends the document it is working on along with the mouse pointer’s current location. (FR-1)
Mouse pointer movements - Clients send these whenever the user’s mouse moves. (FR-2, FR-3, FR-4)
User disconnection - When a user disconnects, the server has to hide their mouse pointer from the other users. (FR-5)

Of these, mouse pointer movements have by far the highest volume. We can let the client send the mouse’s updated location three times per second without noticeably compromising the pointer’s accuracy on the UI (NFR-5; clarify this assumption with the interviewer).
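Sending three updates per second means the client has to throttle raw mouse events, which browsers typically fire far more often. A minimal sketch of the idea follows, written in Python for consistency with the other examples (a browser client would implement the same logic in JavaScript). `PointerThrottle` is a hypothetical name; it forwards at most one position per interval and always sends the latest one, so intermediate positions are dropped, which NFR-5 tolerates.

```python
class PointerThrottle:
    """Forwards at most one pointer position per interval, always the most recent one."""

    def __init__(self, updates_per_sec: float = 3.0):
        self.interval = 1.0 / updates_per_sec
        self.last_sent_at = None  # time of the last forwarded update
        self.pending = None       # latest position not yet sent

    def on_mouse_move(self, x: int, y: int, now: float):
        """Record a raw mouse event; return the position to send now, or None."""
        self.pending = (x, y)
        return self.flush(now)

    def flush(self, now: float):
        """Also call this from a periodic timer so the final resting position is sent."""
        if self.pending is None:
            return None
        if self.last_sent_at is not None and now - self.last_sent_at < self.interval:
            return None
        position, self.pending = self.pending, None
        self.last_sent_at = now
        return position


throttle = PointerThrottle(updates_per_sec=3)
# Simulate 100 raw mouse events spread over roughly one second.
sent = [throttle.on_mouse_move(x, 0, now=x / 100) for x in range(100)]
print([pos for pos in sent if pos is not None])  # [(0, 0), (34, 0), (68, 0)]
# A later timer tick flushes the last position the user stopped at.
print(throttle.flush(now=1.1))                    # (99, 0)
```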

Additionally, the system needs strong reliability guarantees for new connections and disconnections, which must never be missed (FR-5). If these updates are missed, users see an inaccurate state, for example, an offline user still shown as online. By contrast, it’s fine if the system misses a few mouse pointer movements, since the accuracy wouldn’t be compromised much (NFR-5).

Let’s look at the sample data for the three categories.

New connection

{
  "eventType": "NewConnection",
  "userId": "user123",
  "username": "JohnDoe",
  "documentId": "doc987",
  "mousePointer": {
    "x": 250,
    "y": 300
  },
  "timestamp": "2024-10-06T12:34:56Z"
}

Mouse pointer movements

{
  "eventType": "MouseMovement",
  "userId": "user123",
  "documentId": "doc987",
  "mousePointer": {
    "x": 320,
    "y": 450
  },
  "timestamp": "2024-10-06T12:35:10Z"
}

User disconnection

{
  "eventType": "UserDisconnection",
  "userId": "user123",
  "documentId": "doc987",
  "timestamp": "2024-10-06T12:36:00Z"
}
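The three payloads map naturally onto one typed event. A minimal Python sketch, assuming a single `PointerEvent` class with optional fields (the class names are hypothetical; the field names come from the samples above):

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MousePointer:
    x: int
    y: int


@dataclass
class PointerEvent:
    eventType: str   # "NewConnection" | "MouseMovement" | "UserDisconnection"
    userId: str
    documentId: str
    timestamp: str   # ISO-8601 string, as in the samples above
    username: Optional[str] = None               # only sent with NewConnection
    mousePointer: Optional[MousePointer] = None  # absent for UserDisconnection

    def to_json(self) -> str:
        # Drop unset optional fields so each event carries only its own keys.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})

    @staticmethod
    def from_json(raw: str) -> "PointerEvent":
        data = json.loads(raw)
        pointer = data.pop("mousePointer", None)
        return PointerEvent(**data, mousePointer=MousePointer(**pointer) if pointer else None)


move = PointerEvent("MouseMovement", "user123", "doc987", "2024-10-06T12:35:10Z",
                    mousePointer=MousePointer(x=320, y=450))
wire = move.to_json()
print(wire)
assert PointerEvent.from_json(wire) == move  # round-trips without loss
```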

In some cases, the user may not leave explicitly, for example, when the browser or device crashes. In such cases, the server has to detect the disconnected client on its own and communicate the new state to the other users.
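One common way to detect such silent disconnections is a heartbeat timeout: every message from a client refreshes a last-seen timestamp, and a periodic sweep treats clients that have been silent for too long as disconnected. Because an idle user sends no pointer updates, clients would also need to send a periodic ping (the WebSocket protocol defines ping/pong control frames for this). A minimal sketch, assuming a hypothetical `HeartbeatTracker` class and a hypothetical 10-second timeout:

```python
class HeartbeatTracker:
    """Marks a client as disconnected if nothing arrives from it within `timeout` seconds."""

    def __init__(self, timeout: float = 10.0):
        self.timeout = timeout
        self.last_seen: dict[tuple[str, str], float] = {}  # (documentId, userId) -> last time seen

    def touch(self, document_id: str, user_id: str, now: float) -> None:
        # Any message (pointer update or ping) counts as proof that the client is alive.
        self.last_seen[(document_id, user_id)] = now

    def sweep(self, now: float) -> list[tuple[str, str]]:
        """Remove and return the clients whose heartbeat has expired."""
        expired = [key for key, seen in self.last_seen.items() if now - seen > self.timeout]
        for key in expired:
            del self.last_seen[key]
        return expired


tracker = HeartbeatTracker(timeout=10.0)
tracker.touch("doc987", "user123", now=0.0)
tracker.touch("doc987", "user456", now=8.0)
for doc, user in tracker.sweep(now=12.0):
    # Here the server would emit a UserDisconnection event for this user.
    print(doc, user)  # doc987 user123
```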

Workflows

Client-Server interaction

The clients would need to maintain a persistent connection with the backend servers. The server would need to relay one client’s mouse update to all other clients. Hence, we would need bi-directional communication between the clients and the servers.

WebSockets fit this use case: they provide a persistent, bi-directional channel between the client and the server (FR-1 to FR-5), with low per-message overhead and latency once the connection is established (NFR-4).

Moreover, the clients connect through a LoadBalancer that distributes the connections among the backend servers. A least-connections (least-loaded) strategy is appropriate here: because WebSocket connections are long-lived, routing each new connection to the server with the fewest open connections keeps the load evenly distributed.
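The least-connections strategy is simple enough to sketch in a few lines. This is an illustration of the selection rule, not the configuration of any particular load balancer; the `LeastConnectionsBalancer` class and the server names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Routes each new WebSocket connection to the server with the fewest open connections."""

    def __init__(self, servers: list[str]):
        self.active = {server: 0 for server in servers}  # server -> open connection count

    def assign(self) -> str:
        # min() returns the first server with the lowest count, so ties go to the earliest one.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Called when a client on `server` disconnects.
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["server-a", "server-b", "server-c"])
print([lb.assign() for _ in range(4)])  # ['server-a', 'server-b', 'server-c', 'server-a']
lb.release("server-b")                  # a client on server-b disconnects
print(lb.assign())                      # server-b
```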

Note that in the above setup, clients working on the same document may be connected to different servers. This prevents any single server from becoming a single point of failure or a hotspot for a busy document, which improves availability (NFR-3).

Additionally, during upgrades and server restarts, the affected clients reconnect and the LoadBalancer rebalances their connections across the available servers, ensuring high availability (NFR-3).

User connection & disconnection

The server needs to handle a user connecting or disconnecting and inform all the other users working on the same document about the change. Since users of the same document may be connected to different servers, every server must know which server each user is connected to. This way, updates are sent to the right servers, which then pass them on to their users.

To solve this problem, every server must have a view of the session state. The session state consists of:

DocumentId - identifier of the document.
List of users working on the document, and their corresponding servers.

When the session state changes, the server where the change happened must notify all the other servers, which then update their view of the session state accordingly.
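Each server's view of the session state can be modelled as a map from documentId to the users on that document and the server each one is connected to. A minimal sketch, assuming a hypothetical `SessionState` class and a hypothetical `serverId` field on NewConnection events (the sample payloads above don't include the server, so this field is an assumption):

```python
from collections import defaultdict


class SessionState:
    """One server's view of the session state: documentId -> {userId: serverId}."""

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, str]] = defaultdict(dict)

    def apply(self, event: dict) -> None:
        """Apply a session event received from another server (or generated locally)."""
        doc, user = event["documentId"], event["userId"]
        if event["eventType"] == "NewConnection":
            self.docs[doc][user] = event["serverId"]  # hypothetical field
        elif event["eventType"] == "UserDisconnection":
            self.docs[doc].pop(user, None)
            if not self.docs[doc]:
                del self.docs[doc]  # nobody left on this document

    def servers_for(self, doc: str) -> set[str]:
        """Servers that currently host at least one client of `doc`."""
        return set(self.docs.get(doc, {}).values())


state = SessionState()
state.apply({"eventType": "NewConnection", "userId": "user123", "documentId": "doc987", "serverId": "server-a"})
state.apply({"eventType": "NewConnection", "userId": "user456", "documentId": "doc987", "serverId": "server-b"})
print(sorted(state.servers_for("doc987")))  # ['server-a', 'server-b']
```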

Redis Streams is a good fit for exchanging session state, since it offers reliable, ordered delivery with low latency and high throughput. Because stream entries persist, server restarts or deployments won't cause missed updates: a server can resume reading the stream sequentially from where it left off.

Each server will read from its own dedicated Redis Stream and write to all other servers' Streams.
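The per-server stream pattern can be sketched with an in-memory stand-in for Redis Streams. This is not the Redis API: with Redis, writing would be `XADD session:<serverId> * field value ...` and reading would be `XREAD` (or `XREADGROUP` plus `XACK` with consumer groups) starting after the last processed entry ID. The `session:<serverId>` key naming, the `InMemoryStreams` class, and the integer IDs are hypothetical simplifications. The point illustrated is that entries persist, so a restarting server resumes from its last processed ID without missing events.

```python
import itertools


class InMemoryStreams:
    """Stand-in for Redis Streams: append-only logs with increasing entry IDs."""

    def __init__(self) -> None:
        self.logs: dict[str, list[tuple[int, dict]]] = {}
        self._ids = itertools.count(1)

    def xadd(self, stream: str, fields: dict) -> int:
        entry_id = next(self._ids)
        self.logs.setdefault(stream, []).append((entry_id, fields))
        return entry_id

    def xread(self, stream: str, after_id: int) -> list[tuple[int, dict]]:
        # Entries strictly newer than `after_id`, oldest first.
        return [entry for entry in self.logs.get(stream, []) if entry[0] > after_id]


def broadcast_session_event(streams: InMemoryStreams, servers: list[str], sender: str, event: dict) -> None:
    # The sender writes the event to every *other* server's dedicated stream.
    for server in servers:
        if server != sender:
            streams.xadd(f"session:{server}", event)


streams = InMemoryStreams()
servers = ["server-a", "server-b", "server-c"]
last_id_b = 0  # last entry ID that server-b has processed

broadcast_session_event(streams, servers, "server-a",
                        {"eventType": "NewConnection", "userId": "user123", "documentId": "doc987"})
# Suppose server-b is restarting while server-c reports a disconnection.
broadcast_session_event(streams, servers, "server-c",
                        {"eventType": "UserDisconnection", "userId": "user456", "documentId": "doc987"})
# Back online, server-b reads everything after its last processed ID.
for entry_id, event in streams.xread("session:server-b", last_id_b):
    print(entry_id, event["eventType"])  # 1 NewConnection, then 4 UserDisconnection
    last_id_b = entry_id
```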

The diagram below illustrates the process in detail:

**Client-Server connection/disconnection via Redis Streams**

Mouse pointer updates

When processing a mouse pointer update, the server has to look up the clients working on the document and their corresponding servers. It then has to send the message to all of those other servers.

Redis Pub/Sub would be the best solution for this use case, since the mouse pointer updates have the following characteristics:

Low-latency delivery (NFR-4) - Redis Pub/Sub pushes each published message to the currently subscribed servers immediately, without persisting it first.
Transient messages (NFR-2, NFR-5) - Redis Pub/Sub doesn’t retain messages; a subscriber that isn’t connected when a message is published never receives it. That’s acceptable here, since it’s fine to lose a couple of updates during server restarts.
Throughput (NFR-1) - Pub/Sub handles a high volume of real-time updates, which makes it well suited to collaborative editing use cases.

The documentId can be used as the Pub/Sub channel name. All the server instances handling a document would subscribe to this channel.

Any server receiving an update would publish it to the document’s channel. All the other subscribed server instances would immediately receive the latest mouse pointer update and forward it to their clients.

Further, a server instance can unsubscribe from a document’s channel when it no longer has any client working on that document, and subscribe again as soon as a new client starts working on it.
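The subscribe-on-first-client, unsubscribe-after-last-client behaviour can be sketched as follows. `InMemoryPubSub` is a stand-in for Redis (the real commands are `SUBSCRIBE`, `UNSUBSCRIBE`, and `PUBLISH`, which returns the number of clients that received the message); `DocumentSubscriptions` and the server IDs are hypothetical.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Stand-in for Redis Pub/Sub: delivers only to current subscribers and stores nothing."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(set)  # channel -> subscribed server IDs
        self.inboxes = defaultdict(list)     # server ID -> messages it received

    def subscribe(self, channel: str, server: str) -> None:
        self.subscribers[channel].add(server)

    def unsubscribe(self, channel: str, server: str) -> None:
        self.subscribers[channel].discard(server)

    def publish(self, channel: str, message: bytes) -> int:
        receivers = self.subscribers.get(channel, set())
        for server in receivers:
            self.inboxes[server].append((channel, message))
        return len(receivers)  # like PUBLISH: the receiver count; the message is not kept


class DocumentSubscriptions:
    """Per-server bookkeeping: one channel per documentId, held only while local clients exist."""

    def __init__(self, pubsub: InMemoryPubSub, server_id: str) -> None:
        self.pubsub, self.server_id = pubsub, server_id
        self.local_clients = defaultdict(set)  # documentId -> userIds connected to this server

    def client_joined(self, doc: str, user: str) -> None:
        if not self.local_clients[doc]:
            self.pubsub.subscribe(doc, self.server_id)  # first local client of this document
        self.local_clients[doc].add(user)

    def client_left(self, doc: str, user: str) -> None:
        self.local_clients[doc].discard(user)
        if not self.local_clients[doc]:
            self.pubsub.unsubscribe(doc, self.server_id)  # last local client has left
            del self.local_clients[doc]


bus = InMemoryPubSub()
server_a = DocumentSubscriptions(bus, "server-a")
server_b = DocumentSubscriptions(bus, "server-b")
server_a.client_joined("doc987", "user123")
server_b.client_joined("doc987", "user456")
print(bus.publish("doc987", b"user123 moved to (320, 450)"))  # 2
server_b.client_left("doc987", "user456")
print(bus.publish("doc987", b"user123 moved to (321, 452)"))  # 1
```

Note that the publishing server also receives its own message when it is subscribed to the same channel, so it should skip echoing an update back to the client it came from.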

The following diagram illustrates how the mouse pointer update originates from one client and gets propagated to other clients working on the same document.

Trade-offs

We will now justify the choices that we have made in the above design. In system design, there is no right or wrong answer, and we choose the best option that meets our requirements.

Client-Server communication - WebSockets

HTTP alternatives such as polling would add latency (an update waits until the next poll) and overhead (repeated requests and headers, even when nothing has changed). This would result in inefficient resource usage and slow updates (violating NFR-4).

Hence, WebSockets would be suitable for our use case of collaborative editing with frequent updates that need minimum latency.

Trade-off decision: Given the need for low-latency, real-time updates, WebSockets are the optimal choice for client-server communication in this use case.

Session data - Redis Streams

Redis Streams provide strong reliability guarantees even if a server goes offline. The server can process missed messages even after restarts.

However, they require more overhead compared to Pub/Sub, making them less ideal for high-frequency, non-critical updates like mouse movements.

Trade-off decision: For session state data, where reliability and message delivery guarantees are critical, Redis Streams are the best solution despite the additional overhead. Since the frequency of connections/disconnections is much lower (about 1/1000th of mouse pointer updates), the overhead is manageable.

Mouse pointer updates - Redis Pub/Sub

Redis Pub/Sub allows for real-time broadcasting of messages, making it ideal for high-frequency, transient data like mouse pointer movements.

Pub/Sub doesn’t guarantee delivery: if a subscriber is down or disconnected when a message is published, it misses that update permanently.

Trade-off decision: Given the high frequency of mouse pointer updates and the need for speed over durability, Redis Pub/Sub is the best choice for this type of data exchange.

Serialization format

The mouse pointer and session update data can be exchanged between client and server as JSON. JSON is human-readable and easy to debug. However, with a large volume of messages, the CPU cost of serializing and parsing JSON would spike.

Binary serialization is typically faster to encode and decode, uses less CPU, and produces smaller payloads. Hence, a compact binary serialization would be appropriate for this use case.
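To make the difference concrete, the sketch below packs a MouseMovement event into a fixed 21-byte layout using Python's standard `struct` module. The layout, the numeric event code, and the mapping of string IDs such as "user123" to integers are all hypothetical; a production system would more likely use an established format such as Protocol Buffers or MessagePack, which also handle schema evolution and variable-length fields.

```python
import json
import struct

# Hypothetical fixed layout for a MouseMovement event, little-endian with no padding:
# event type (u8) | user id (u32) | document id (u32) | x (u16) | y (u16) | timestamp in epoch ms (u64)
MOUSE_MOVE = struct.Struct("<BIIHHQ")
EVENT_MOUSE_MOVEMENT = 1  # hypothetical numeric code standing in for "MouseMovement"


def encode_move(user_id: int, doc_id: int, x: int, y: int, ts_ms: int) -> bytes:
    return MOUSE_MOVE.pack(EVENT_MOUSE_MOVEMENT, user_id, doc_id, x, y, ts_ms)


def decode_move(buf: bytes) -> dict:
    event_type, user_id, doc_id, x, y, ts_ms = MOUSE_MOVE.unpack(buf)
    if event_type != EVENT_MOUSE_MOVEMENT:
        raise ValueError(f"unexpected event type {event_type}")
    return {"userId": user_id, "documentId": doc_id, "x": x, "y": y, "timestamp_ms": ts_ms}


# The JSON sample from the Data models section, for a size comparison.
as_json = json.dumps({
    "eventType": "MouseMovement", "userId": "user123", "documentId": "doc987",
    "mousePointer": {"x": 320, "y": 450}, "timestamp": "2024-10-06T12:35:10Z",
}).encode()
# The same event, assuming "user123" -> 123 and "doc987" -> 987; 1728218110000 ms is 2024-10-06T12:35:10Z.
as_binary = encode_move(123, 987, 320, 450, 1728218110000)
print(len(as_binary), "bytes vs", len(as_json), "bytes as JSON")
```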

Trade-off decision: Given the high frequency of mouse pointer updates, we must prioritize performance over the convenience of debugging, making binary serialization the better choice.

Redis Commands for Mouse pointer updates

Flushing Redis commands: The server can push every mouse pointer update to Redis Pub/Sub as a separate command. However, this is resource intensive, and the system wouldn’t scale to handle a large throughput of messages.

Batching Redis commands: Batching the Redis Pub/Sub commands and sending them together at fixed intervals results in better resource utilization.

Trade-off decision: Batching introduces a slight delay (because commands are sent together at intervals), but this is outweighed by the reduction in CPU usage. Since mouse movements don’t require millisecond-level precision, this is an acceptable trade-off, as long as the batching interval fits within the latency budget (NFR-4).
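A minimal sketch of the batching idea, assuming a hypothetical `PublishBatcher` class and a hypothetical 20 ms flush interval chosen to leave headroom within the 50 ms p99 budget of NFR-4. The `send_batch` callback is where the real Redis calls would go; with redis-py, for example, it could queue each `publish` on `r.pipeline(transaction=False)` and send them in one round trip with `execute()`. The simulation below just collects the batches in a list.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, bytes]  # (channel, payload)


class PublishBatcher:
    """Buffers pointer updates and hands them to `send_batch` together.

    A flush happens when an update arrives at least `interval` seconds after the
    previous flush; a periodic timer should also call flush() so updates are not
    left waiting when traffic is quiet.
    """

    def __init__(self, send_batch: Callable[[List[Message]], None], interval: float = 0.02):
        self.send_batch = send_batch
        self.interval = interval
        self.buffer: List[Message] = []
        self.last_flush = 0.0

    def add(self, channel: str, payload: bytes, now: float) -> None:
        self.buffer.append((channel, payload))
        if now - self.last_flush >= self.interval:
            self.flush(now)

    def flush(self, now: float) -> None:
        if self.buffer:
            # One call (for example, one pipelined round trip) instead of one per update.
            self.send_batch(self.buffer)
            self.buffer = []
        self.last_flush = now


batches: List[List[Message]] = []
batcher = PublishBatcher(batches.append, interval=0.02)
for t in (0.001, 0.010, 0.021, 0.030):
    batcher.add("doc987", b"pointer update", now=t)
print(len(batches), len(batches[0]), len(batcher.buffer))  # 1 3 1
```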

Conclusion

Now, we have a collaborative editing system that shows the real-time mouse pointer movements of the editors or viewers. The design meets all the functional and non-functional requirements that we had listed in the beginning.

In a nutshell, the design addresses the following aspects:

Client-Server communication - Uses WebSockets for bi-directional, low-latency communication. The LoadBalancer ensures high availability and even traffic distribution.
Session data - The system relies on Redis Streams to manage and propagate the session data. It ensures that all the servers have a consistent view of users viewing a given document.
Mouse movements - Redis Pub/Sub provides low-latency, real-time & high-frequency mouse pointer updates.

In this article, we learnt a methodical approach to tackling an ambiguous system design problem. Remember it and apply the same technique in your next system design interview.

We came up with the above design assuming that the client sends 3 updates every second. Do you think the design would still scale if we wanted to send 60 updates per second?

Are there any bottlenecks in the design when handling 60 updates/sec? If yes, how can we overcome them? Leave your thoughts in the comments below.

For more information on Redis Streams, you can refer to my article here.

How Canva designed real-time mouse pointer updates

Engineering At Scale