ACID properties: Simplified and Illustrated
A Beginner's Guide to Relational Databases' ACID Properties
In today’s ever changing tech landscape, new frameworks, libraries and languages emerge every year and replace the existing one. Yet, despite this rapid evolution, companies still continue to use three decade old relational databases such as PostgreSQL and MySQL.
Have you wondered how bank transactions are seamless, how double booking is prevented by flight and movie booking websites, and or why your social media data remains intact despite crashes? The answer lies in the ACID properties of databases. These properties—Atomicity, Consistency, Isolation, and Durability—act as safeguards against errors, crashes, and other failures.
In this article, we will delve into each property, using illustrations to make concepts easy to grasp. By the end, you will have strengthened your database fundamentals and be equipped with the skills needed for your next tech interview.
Illustration
We will understand each of the ACID properties using an example to understand it clearly. Let’s assume that we are storing bank balances for different customers. We will create the tables shown in the diagram below :-
Constraints
We will impose the following constraints on the table :-
Primary key constraint -
AccountId
will be the primary key for theAccounts
table.Referential integrity constraint -
AccountId
will be a foreign key in theBalances
table.Positive balance constraint - The
balance
column of theBalances
table must be strictly greater than or equal to 0.
Atomicity
Relational databases use transaction as a unit of work. Each transaction can consist of several read and write operations.
Every transaction is atomic or indivisible in nature. This Atomicity property states that the transaction will either succeed or fail. It won’t ever be left in a partially completed state.
This property protects the database from any inconsistencies and maintains data integrity during failures. It avoids any unexpected outcome due to partial failures.
Absence of Atomicity would leave the database in an inconsistent state, result in data loss or unreliable transactions.
Let’s understand Atomicity with an example.
Example
Transfer $300 from John’s account (id=2) to Jane’s account (id=1). We will have execute the following two SQL queries on the balances
table :-
Operation 1:
UPDATE Balances SET Balance = Balance - 300 WHERE AccountId = 2;
Operation 2:
UPDATE Balances SET Balance = Balance + 300 WHERE AccountId = 1;
Without Atomicity
Without Atomicity, the above two operations would be executed independently. Let’s walkthrough the following two error scenarios :-
Operation 1 fails & Operation 2 succeeds - In case the first operation fails, the database won’t deduct amount from John’s account. But it would add $300 to Jane’s account since the second operation has succeeded.
Operation 1 succeeds & Operation 2 fails - In case the first operation succeeds, $300 would be deducted from John’s account. However, if the second operation fails, then Jane wouldn’t receive $300.
In both the cases, the database would be in an inconsistent state.
With Atomicity
The above diagram illustrates how Atomicity works in databases. The two operations are executed as a part of a transaction.
Atomicity ensures that the transaction would either succeed or fail. If the transaction succeeds, it would execute all the operations. In case of failures, none of the operations would be executed.
With Atomicity, both the operations would be part of a single transaction. Let’s consider the failure scenarios mentioned in the previous section :-
Operation 1 fails - The database would rollback the transaction in case Operation 1 fails. As a result, it won’t execute both the statements. Hence, the state of the database wouldn’t change.
Operation 1 succeeds & Operation 2 fails - If the first operation succeeds, the database would make local modifications to the value but won’t commit the change. Once the operation 2 fails, the transaction would rollback the changes. The rollback would take the database to its previous state.
Both the operations succeed - In this case, $300 would be deducted from John’s account and credited to Jane’s account.
In case of transaction failure, none of the records in the Database would change. While if the transaction succeeds, it would take the database to a valid state.
Consistency
Consistency guarantees that a database transaction always takes a database from one valid state to another valid state. It enforces that the database follows certain pre-defined rules and constraints during transaction execution. The rules could be constrains on column values (non-zero values) or unique column values.
Without Consistency, the database would store inaccurate values and applications relying on the database would be unreliable.
Example
Let’s perform the below two operations :-
Transaction 1 -
UPDATE Balances SET Balance = Balance - 700 WHERE AccountId = 2
Transaction 2 -
INSERT INTO accounts (accountId, name, description)
VALUES (1, 'Jim Smith', 'Personal Savings Account')
Without Consistency
Without Consistency, following would be the result of the above two transactions :-
Transaction 1 - It would update the balance of accountId=2, and the new balance would be -100 (600 is the original balance).
Transaction 2 - This would create another account with accountId=1. Thus, there would be multiple accounts with the same accountId.
As a result, the database would violate the business rules and go into an inconsistent state.
With Consistency
We will see the below results if database guarantees consistency :-
Transaction 1 - It would throw an error violating balance check constraint. It would not update the value of the balance column.
Transaction 2 - It would throw an error indicating duplicate key that violates unique constraint. The database won’t insert the new record in the table.
Consistency ensures that the database adheres to business rules and guarantees data integrity.
Isolation
Relational databases have to handle multiple concurrent transactions. These transactions can contain operations which could modify common records.
It’s important to ensure that two concurrent transactions don’t step on each others toes while modifying the same record. Also, the database must guarantee data integrity after the execution of multiple transactions.
The database achieves this through transaction isolation. Isolation ensures that the final effect of running two concurrent transactions is equivalent to running them one after another. As a result, it avoids data races and ensures correct final state.
Example
Let’s say our user John (userId=1) purchases some goods worth $300 and at the same time, his friend transfers $100 to his account. We will have the following two transactions updating the balance :-
Transaction 1 (T1) :-
UPDATE Balances SET Balance = Balance - 300 WHERE AccountId = 2;
Transaction 2 (T2):-
UPDATE Balances SET Balance = Balance + 100 WHERE AccountId = 2;
Given initial balance of $600, his final balance must be $400 (600-300+100).
Without Isolation
The database would execute the transactions in the following manner :-
T1 & T2 would read the value of Balance as $600.
T1 would execute and compute the value of Balance = (600 - 300) = 300.
T2 would execute and compute the value of Balance = (600 + 100) = 700.
If T1 commits first followed by T2, then T2 would overwrite the value of Balance to 700.
While if T2 commits first followed by T1, then T1 would win and update the final value of Balance to 300.
In either of the cases, the final value doesn’t match the expected value of $400. As a result, the data integrity is violated due to race condition between concurrent transactions.
With Isolation
Let’s see how databases internally isolate transactions and guarantee correctness :-
T1 & T2 would both read $600 as Balance value and execute in parallel.
T1 would compute the final value of Balance as 300 but it would acquire a lock on the database record.
T2 would find that T1 has taken a lock and it would wait on the lock.
Once T1 commits, it would update the balance to $300. It would subsequently release the lock.
Finally, T2 would acquire the lock and find the new value of Balance i.e $300. T2 would execute and update the Balance to $400 (300 + 100).
Using transaction isolation, the final Balance value matches the expected value of $400.
Isolation is expensive and impacts the performance. Relational databases define different levels of isolation. Developers have to make a trade-off between performance and consistency while choosing the isolation levels.
Durability
Durability ensures that once data is committed, it will not be lost. If a transaction fails, any uncommitted data will not be written to the disk.
This property guarantees safety of the data in the following scenarios :-
Database process crashes.
Database software upgrades.
Machine reboots.
Relational databases use a Write-Ahead Log (WAL) to store a list of all committed transactions. Once a transaction is committed, it is recorded in the WAL. Even if the database crashes before the data is committed, the WAL is used during a restart to restore the database state.
Example
Let’s say we execute the below SQL query through a transaction-
BEGIN;
UPDATE Balances SET Balance = Balance + 300 WHERE AccountId = 2;
COMMIT;
Assume that the initial balance is 600. Once the transaction is successful, the updated balance would be 900.
Let’s assume that the database crashes or the machine reboots just after committing the transaction.
Without Durability
Without durability, the balance for the account won’t be updated after the database restart (after reboot/crash). Thus, any updates to the database would be lost.
With Durability
After the database restart, updates done to the records wouldn’t get lost. The database would read the records from the WAL and reconstruct the records just before the crash. Hence, it would show the last committed value of balance i.e 900.
Conclusion
ACID properties safeguard a database from several failures. These properties ensure the data integrity, consistency and validity after any data modification operation.
Let’s summarise and go through each of the properties :-
Atomicity - Transaction is the smallest unit of work. It consists of several database read/write operations. Atomicity guarantees that the transaction would either succeed or fail. It follows the all or nothing principle and ensures that the database never ends up in a state where a transaction partially succeeds.
Consistency - Consistency ensures that the database adheres to the constraints imposed by the user. The constraints include primary key constraints, referential integrity constraints, & non-zero column value constraints, etc. The database transitions from one valid state to another on transaction execution.
Isolation - Isolation allows a database to run multiple transactions concurrently without resulting in any data corruption. Isolation is expensive and performance takes a hit. As a result, databases offer various isolation levels to make a trade-off between performance and data accuracy.
Durability - Once data is written to the database, it is never lost. Databases use Write-Ahead Log (WAL) as the source of truth for all the data. It protects the databases from any crash, or machine reboots and safeguards the data.
Till now, we have only scratched the surface of databases. In the subsequent posts, we will dive deep into advanced concepts like Database isolation levels, working of WAL, and database crash recovery.
Thanks for reading the article!
In this article, we explored an example of a bank transaction to explain ACID properties. Can you think of other examples that utilise ACID properties? Share your thoughts in the comments below.
Before you go: