Redis Persistence Deep Dive - Trade-offs Between Performance and Durability
Understanding AOF and RDB strategies, their trade-offs and applications
What would happen to your data if your Redis instance crashed suddenly? Would all of it be lost, partially recoverable, or safe?
Redis is widely known as an in-memory cache, but can it also serve as a durable storage system like a traditional database?
Like many developers, I initially assumed Redis was just a cache. But as I explored its persistence mechanisms, I realized it could offer more.
Performance and durability are two key considerations in designing Redis’s data persistence. If the system prioritizes performance, it must compromise on durability, and vice versa.
In this article, we will dive deep into the different data persistence strategies of Redis. We will look at the pros/cons of each and their practical applications.
By the end of the article, you will gain sufficient technical depth to decide the right persistence strategy for your use case. You will also enhance your skills to succeed in your next system design interview.
With that, let’s first understand the need for data persistence.
Why does Redis need data persistence?
Redis is an in-memory data store and keeps all its data in RAM. RAM is volatile, so without persistence, all the data is lost if the server crashes or restarts.
Assume that you are using Redis as a cache for a backend database. What would happen if it crashed midway while handling customer traffic?
The following diagram illustrates the state of the application before and after Redis crash.
Here’s what would happen after a Redis crash:
High database load - Every request that the cache would have served falls back to the database, increasing its load. This is often called the thundering herd (or cache stampede) problem.
Slow queries - Since the database fetches data from disk, query latency can rise sharply, often by orders of magnitude compared with an in-memory lookup.
Reduced availability - Some slow queries might time out, leading to failed application requests and reduced system availability (e.g., from 99.99% to 95%).
After the Redis server restarts, there would be a warm-up period during which the cache is repopulated, and eventually the system would function as expected.
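The effect of an empty cache on the database can be sketched with a toy cache-aside read path. This is a minimal illustration, not how any particular application is wired: the names SlowDatabase, CacheAside, and crash are hypothetical, and a plain dictionary stands in for Redis.

```python
class SlowDatabase:
    """Stand-in for the backend database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class CacheAside:
    """Try the cache first and fall back to the database on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # plays the role of Redis: in-memory and volatile

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)   # cache miss: the database takes the hit
        self.cache[key] = value    # repopulate the cache for later reads
        return value

    def crash(self):
        """Simulate a Redis crash without persistence: cached data is gone."""
        self.cache.clear()


db = SlowDatabase({"user:1": "alice", "user:2": "bob"})
app = CacheAside(db)

for _ in range(3):
    app.get("user:1")
    app.get("user:2")
reads_before_crash = db.reads  # only the first pass reaches the database

app.crash()
for _ in range(3):
    app.get("user:1")
    app.get("user:2")
reads_after_crash = db.reads - reads_before_crash  # the cache is rebuilt from the database
```

With two keys the extra load is trivial, but the same pattern with millions of hot keys sends every first read after the crash to the database at once.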
As we just saw, there is a possibility of complete data loss in case a Redis instance crashes. Additionally, this impacts the application's performance and results in a poor user or client experience.
So, is there a way to tackle this problem by permanently storing the data somewhere? Yes, and Redis provides several strategies to solve this problem.
Let’s now take a look at the first strategy called Append-Only File (AOF) for persistence.
Append-Only File (AOF)
Redis clients interact with Redis by running different commands. For example: SET key value, INCR key, LPUSH key value, etc.
Redis executes commands on a single main thread, one after another. Each command modifies the in-memory data structures, so the same commands applied in the same order always lead to the same state.
The Append-Only File (AOF) persistence strategy exploits this and stores every write command in an append-only file. On server restart, the AOF is read and all commands are executed sequentially to recover the previous state.
The following diagram illustrates how Redis logs commands in the AOF file.
The diagram below shows how Redis reconstructs the previous state by reading the AOF file.
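The log-and-replay idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Redis's implementation or file format: TinyStore is a hypothetical class, it supports only SET and INCR, it writes one space-separated command per line, and it does not handle values that contain spaces.

```python
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs each write command after applying it."""

    def __init__(self, aof_path):
        self.aof_path = aof_path
        self.data = {}

    def apply(self, command):
        """Apply one command (a list of strings) to the in-memory state."""
        name, key = command[0].upper(), command[1]
        if name == "SET":
            self.data[key] = command[2]
        elif name == "INCR":
            self.data[key] = str(int(self.data.get(key, "0")) + 1)
        else:
            raise ValueError(f"unsupported command: {name}")

    def execute(self, *command):
        """Run a command, then append it to the log (execute first, log second)."""
        self.apply(list(command))
        with open(self.aof_path, "a", encoding="utf-8") as aof:
            aof.write(" ".join(command) + "\n")

    def recover(self):
        """Rebuild the state by replaying every logged command in order."""
        self.data = {}
        if not os.path.exists(self.aof_path):
            return
        with open(self.aof_path, encoding="utf-8") as aof:
            for line in aof:
                self.apply(line.split())


aof_path = os.path.join(tempfile.mkdtemp(), "toy.aof")
store = TinyStore(aof_path)
store.execute("SET", "greeting", "hello")
store.execute("INCR", "visits")
store.execute("INCR", "visits")

restarted = TinyStore(aof_path)  # a fresh process with empty memory
restarted.recover()              # replaying the log restores the same state
```

Note that in this sketch a command that fails in apply never reaches the log, because logging happens only after execution succeeds.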
Why are commands added to the AOF after execution, not before? 🤔 🤔
Leave your thoughts in the comments
We just saw that the AOF technique lets Redis rebuild its state after a crash. But astute readers will have noticed that logging every write command to disk increases I/O, which can eventually slow down Redis’s main thread.
Although the AOF technique can prevent complete data loss, it comes with an additional cost that impacts performance. Let’s now look at the pros/cons of this approach.
Pros
Data durability - It can reconstruct the previous state with little or no data loss, depending on how often the log is flushed to disk.
Cons
Performance - Since every write command is written to disk, there is additional disk I/O. If each command has to wait for the disk, its latency rises from microseconds to the cost of a disk sync, which can be milliseconds or more on slow storage. This undermines the use of Redis as a high-performance, in-memory cache.
Slow starts - The AOF might contain millions or even billions of commands. Because they are replayed sequentially on a single thread, recovery time grows with the size of the log and can range from well under a second for small files to minutes for very large ones.
Logging every command to the disk impacts the performance. As a result, Redis provides multiple configuration options to tune performance and durability.
Configuration options
Redis allows tuning AOF behaviour with three configurations, each offering a trade-off between performance and durability.
Redis appends commands to the AOF with a regular write, which may leave them in the operating system's buffers, and uses the fsync system call to force them onto disk. The appendfsync setting controls how often fsync is called:
appendfsync always - Redis calls fsync every time new commands are appended to the AOF. This is the slowest option but gives the strongest durability.
appendfsync everysec - Redis calls fsync about once per second. In the worst case, Redis might lose about one second of writes.
appendfsync no - Redis never calls fsync and leaves flushing to the operating system. The interval depends on kernel tuning; on Linux it is typically around 30 seconds.
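The difference between the three policies comes down to when fsync is called. Here is a rough sketch, assuming a hypothetical AofWriter class; it is a simplification in that it checks the one-second timer inline on each append, which is not how Redis schedules its own syncs.

```python
import os
import tempfile
import time


class AofWriter:
    """Toy append-only writer with a configurable fsync policy."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0

    def append(self, command: bytes):
        self.file.write(command + b"\n")
        self.file.flush()  # hand the bytes to the OS; they may still be in its buffers
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec":
            if time.monotonic() - self.last_fsync >= 1.0:
                self._fsync()
        # policy "no": never call fsync; the OS decides when to write to disk

    def _fsync(self):
        os.fsync(self.file.fileno())  # block until the data is on disk
        self.last_fsync = time.monotonic()
        self.fsync_calls += 1

    def close(self):
        self.file.close()


tmp_dir = tempfile.mkdtemp()
counts = {}
for policy in ("always", "everysec", "no"):
    writer = AofWriter(os.path.join(tmp_dir, policy + ".aof"), policy)
    for _ in range(100):
        writer.append(b"INCR visits")
    writer.close()
    counts[policy] = writer.fsync_calls
```

After the loop, counts shows one fsync per command for always, none for no, and at most a handful for everysec, which is where the performance difference between the options comes from.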
The matrix below captures the three different options along with the impact on the system’s quality attributes.
While AOF minimizes data loss, it has a performance penalty. Thus, Redis provides another option known as RDB (Redis Database).
Let’s understand how RDB tries to overcome the downsides of AOF.
Redis Database (RDB)
Redis takes a snapshot of all the in-memory data and stores it in a binary file (.rdb suffix). In case of a crash, it reads and recovers the data from the rdb file.
It provides the commands SAVE and BGSAVE (background save) to take a snapshot. Here’s the difference between the two commands:
SAVE - It runs synchronously in Redis’s main thread. As a result, Redis cannot serve other clients until the snapshot is complete.
BGSAVE - It forks a child process. The child writes the data to the RDB file in the background while the parent process keeps serving clients.
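The fork-based approach can be sketched on a Unix-like system with os.fork. This is only an illustration under stated assumptions: bgsave is a hypothetical helper, JSON stands in for the binary RDB format, and the temp-file-then-rename step is a choice made for this sketch.

```python
import json
import os
import tempfile


def bgsave(data, path):
    """Fork a child that writes a snapshot of `data`; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child process: it sees memory as it was at fork time,
        # even if the parent modifies its own copy afterwards.
        status = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w", encoding="utf-8") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)  # publish the finished snapshot atomically
            status = 0
        finally:
            os._exit(status)  # leave immediately without running parent cleanup
    return pid


data = {"visits": 1}
snapshot_path = os.path.join(tempfile.mkdtemp(), "dump.json")

child = bgsave(data, snapshot_path)
data["visits"] = 2                 # the parent keeps serving writes
_, status = os.waitpid(child, 0)   # wait for the snapshot to finish

with open(snapshot_path, encoding="utf-8") as f:
    snapshot = json.load(f)        # holds the state from the moment of the fork
```

The parent is blocked only for the duration of the fork call itself; the disk write happens entirely in the child.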
Redis crash recovery process using RDB is illustrated in the following diagram.
What would happen if the memory is modified during snapshotting? 🤔 🤔
Leave your thoughts in the comments
Now that you understand the basic working, let’s look at some pros/cons of this approach.
Pros
Performance - The snapshotting process can be executed in a separate child process (using BGSAVE). As a result, the main thread stays free to serve clients and the impact on Redis's overall performance is small.
Restart time - The RDB file data can be read quickly and restored. This reduces the restart time and overhead in data recovery.
Cons
Data loss - Writes made after the last snapshot are lost in a crash. As a result, clients may see stale data or cache misses after recovery.
Large datasets - Since fork() is used, creating the child process can be time-consuming for large datasets. Redis stops serving clients until the child process has been created, which clients may notice as a latency spike.
So far, we have seen that while AOF prioritizes data durability at the cost of performance, RDB sacrifices some durability in favor of better performance.
Given the trade-offs of AOF and RDB, is there a way to achieve a middle ground? Fortunately, Redis offers a hybrid approach: AOF + RDB.
We can combine RDB with AOF and strike the right balance between the two approaches. Let’s now understand how we can get the best of both worlds.
AOF + RDB
Redis provides a configuration option, aof-use-rdb-preamble. When it is enabled along with AOF, each AOF rewrite stores the current dataset as an RDB-format snapshot, and the commands executed afterwards are appended after that snapshot.
The diagram below illustrates the working of AOF + RDB.
Through snapshotting, it eliminates the need to execute all the commands from AOF and only uses the commands executed after snapshotting. This reduces the startup time and also AOF size.
Further, it minimizes the data loss by logging the executed commands in AOF. Unlike RDB, it doesn’t lose the commands executed after snapshotting.
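The snapshot-plus-tail recovery can be sketched as follows. This is a toy model, not Redis's on-disk format: the SNAPSHOT line prefix, the JSON encoding, and the helper names apply, append, rewrite, and recover are all assumptions made for illustration.

```python
import json
import os
import tempfile


def apply(data, command):
    """Apply one command (a list of strings) to the in-memory state."""
    name, key = command[0].upper(), command[1]
    if name == "SET":
        data[key] = command[2]
    elif name == "INCR":
        data[key] = str(int(data.get(key, "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {name}")


def append(path, *command):
    """Append one executed command to the tail of the log."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(" ".join(command) + "\n")


def rewrite(path, data):
    """Compact the log: replace its contents with a snapshot of the current state."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write("SNAPSHOT " + json.dumps(data) + "\n")
    os.replace(tmp_path, path)


def recover(path):
    """Load the snapshot, then replay only the commands logged after it."""
    data, replayed = {}, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("SNAPSHOT "):
                data = json.loads(line[len("SNAPSHOT "):])
            else:
                apply(data, line.split())
                replayed += 1
    return data, replayed


path = os.path.join(tempfile.mkdtemp(), "hybrid.aof")
state = {}
for _ in range(100):
    apply(state, ["INCR", "visits"])
    append(path, "INCR", "visits")

rewrite(path, state)  # 100 logged commands collapse into one snapshot line
apply(state, ["SET", "greeting", "hello"])
append(path, "SET", "greeting", "hello")

recovered, replayed = recover(path)  # snapshot load plus one replayed command
```

Recovery here replays a single command instead of 101, and the write made after the snapshot is still recovered because it sits in the tail of the log.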
The following are some pros/cons of this approach :-
Pros
Data durability - Data loss depends on the frequency of executing the fsync. Increasing the frequency would lead to less data loss but impact performance.
Startup time - Since it reads the snapshot and then executes only the commands logged after it, startup takes less time than with the AOF-only approach.
Performance - Performance depends on the fsync frequency. Reducing the frequency improves performance at the cost of durability.
Cons
Complexity - Loading the rdb snapshot and then applying the AOF commands adds complexity to the process.
Now that you have a good understanding of Redis persistence, let’s compare the different approaches and look at practical applications of each.
AOF vs RDB vs AOF + RDB
The following table summarizes the strengths and weaknesses of each of the three approaches.
Let’s now look at the practical applications of each of the approaches.
AOF
AOF is best suited for applications that require strong durability guarantees. Following are some common use cases :-
Financial transactions - Banking systems, e-commerce payment services, etc.
Chat applications - WhatsApp, Discord, Slack, etc.
Stock market applications - Order books for stock market. Complete state can be reconstructed without data loss.
RDB
RDB enables quick data recovery and minimal performance impact. Also, use cases that don’t require strong guarantees can opt for RDB. Here are some common practical examples :-
Web application caching - CDN caching, API rate limiting, etc
Gaming leaderboards - Online leaderboards showing stats
AOF + RDB
This approach strikes the right balance between performance and durability. Additionally, it speeds up startup compared with using AOF alone. It can be used for the below use cases :-
Real-time analytics and fraud detection
E-commerce product inventory
Microservices communication
Conclusion
Redis stores its data in memory and risks losing it after a crash or restart. It provides several approaches to prevent data loss and recover after a crash.
The Append-Only File (AOF) is a log file that contains the list of write commands executed by Redis, in order. When Redis restarts, it reads the file and executes all the commands to recover the full state.
AOF minimizes data loss but impacts performance. To overcome this challenge, a data persistence technique called RDB (Redis Database) is used.
Redis takes a snapshot of its memory and stores it in a .rdb binary file. After a crash, Redis recovers the old state by reading the binary file. However, data durability is compromised since commands executed after the snapshot aren’t stored.
To overcome the performance and durability extremes of the two approaches, AOF and RDB can be combined. This helps balance the performance and durability trade-offs.
Now that you have understood the data persistence concepts, which technique would you use to convert Redis into a permanent database? Leave your thoughts in the comments below.
Also, I believe that when Redis's main process forks a child process to create the RDB file, it relies on the copy-on-write (CoW) technique: the child shares the parent's memory pages instead of receiving a new copy of the existing data, which minimises the memory footprint and reduces snapshot creation latency.