System Design Fundamentals: What Are Non-Functional Requirements?
A brief overview of non-functional requirements with examples.
Introduction
Over the past decade, I have had the privilege of working on various systems within the software industry. My experience encompasses High Frequency Trading (HFT) systems, Card Transaction & Order Management platforms, Microsoft's Cloud Infrastructure, and Amazon's E-commerce website. Each of these systems has presented its own unique set of requirements, and I have thoroughly enjoyed engaging in the design process, reviewing my own designs, as well as assessing and critiquing the designs of others.
This journey has been truly remarkable, with each project providing invaluable learning opportunities. As a junior developer, I initially found system design to be a daunting task. I would approach design problems by focusing primarily on functional aspects, ensuring that my design solved the intended problem. However, during the review process, I would encounter questions such as, "How would the system handle server crashes? What measures are in place to address disk failures? How will the system cope with sudden surges in traffic during sales events? Is user data stored securely within the system?"
Such questions are commonly posed by experienced senior engineers, highlighting the importance of considering these critical edge cases. Newcomers to the industry often overlook these aspects, subscribing to the belief that if their code runs smoothly on their own machine and passes test cases, it will surely function flawlessly in production environments.
System requirements can be classified into two main categories: functional and non-functional requirements. Functional requirements outline what the system must accomplish, while non-functional requirements focus on how the system should achieve its objectives. Non-functional requirements address the system's behavior in various scenarios. Throughout the design process, it is imperative for engineers to ensure that the proposed design satisfies both the functional and non-functional requirements.
This article aims to delve into the realm of non-functional requirements, providing an understanding of their significance and impact. Additionally, we will examine real-world examples of diverse software systems, illustrating the role that non-functional requirements play throughout the software development life cycle.
Latency
Latency refers to the total time it takes for a client's request to be sent and a final response to be received from the server. It is a critical aspect of any system, and systems with minimal latency are considered highly performant. To illustrate the importance of latency, let's consider an example.
Imagine you are using an e-commerce website to shop for your favourite t-shirts. While browsing through the items, let's say the website takes more than 5 minutes to load. This would undoubtedly frustrate users, leading them to quickly navigate to a competitor's website. In today's fast-paced world, time has become increasingly precious, and companies need to deliver seamless experiences to customers without keeping them waiting.
This is precisely why we must place significant emphasis on minimizing latency. There are multiple techniques available to optimize latency, with one common solution being caching. Caching helps reduce response time by storing frequently accessed data or results. Often, the introduction of new features can result in increased latency for webpages. To avoid such scenarios, companies establish guardrails to ensure that the overall latency of the webpage remains unaffected.
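As a rough illustration of the caching idea above, here is a minimal Python sketch of an in-memory cache with a time-to-live. The `TTLCache` class, the `fetch_product_from_db` function, and the 60-second expiry are hypothetical names and values chosen for this example; production systems would more likely rely on a dedicated cache service.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (expiry_timestamp, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the slow lookup
        value = loader(key)  # cache miss or expired entry: do the slow lookup
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


db_calls = []  # records every trip to the (hypothetical) database


def fetch_product_from_db(product_id):
    # Stand-in for a slow database or downstream service call.
    db_calls.append(product_id)
    return {"id": product_id, "name": f"T-shirt #{product_id}"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load(42, fetch_product_from_db)   # goes to the database
second = cache.get_or_load(42, fetch_product_from_db)  # served from the cache
```

The second call never reaches the slow lookup, which is where the latency saving comes from. The trade-off is that cached data can be up to `ttl_seconds` stale.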
In conclusion, latency plays a crucial role in determining the performance and user experience of a system. It is imperative for companies to prioritize minimizing latency through various optimization techniques, such as caching, to deliver fast and responsive applications that meet the expectations of users in today's time-sensitive environment.
Throughput
Throughput is the rate at which a system or process can effectively handle and process a specific amount of work within a given time period. It serves as a measurement of the system's capacity to manage tasks, transactions, or data transfers.
In many applications, second-factor authentication is used to verify user identities. To combat fraud, credit card and debit card issuers often authenticate users by asking them to enter a One Time Password (OTP). Internally, the backend of these systems generates an OTP, stores it, and sends a notification to the user's device. The backend infrastructure is complex, involving numerous interactions among microservices.
For instance, during a holiday sale event where numerous people are making purchases using their cards, the card transaction processing systems experience a significant surge in traffic. The services generate and send OTPs asynchronously, using messaging services such as Amazon SNS or SQS. Consumers retrieve the OTP messages and transmit them through the network provider. However, if messages arrive in the queue faster than they are consumed, the consumers must process them promptly to prevent a buildup that would delay OTP delivery.
High system throughput allows for the processing of a large volume of messages within a specified timeframe. In the given example, the system's throughput can be enhanced by adding more consumers. Throughput is also crucial for batch processing systems and for systems that perform real-time analytics on large volumes of data.
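The effect of adding consumers can be estimated with simple arithmetic. The Python sketch below assumes each consumer processes messages at a fixed rate and that throughput scales linearly with the number of consumers; both are simplifications, and the function names and figures are made up for illustration.

```python
import math


def consumers_needed(arrival_rate, per_consumer_rate):
    """Smallest number of consumers whose combined rate keeps up with arrivals."""
    return math.ceil(arrival_rate / per_consumer_rate)


def backlog_drain_seconds(backlog, arrival_rate, consumers, per_consumer_rate):
    """Seconds needed to clear the backlog, or None if it never shrinks."""
    surplus = consumers * per_consumer_rate - arrival_rate
    if surplus <= 0:
        return None  # consumers only keep pace (or fall behind)
    return backlog / surplus


# Hypothetical sale-day numbers: 5,000 OTP messages/s arriving,
# each consumer handling 200 messages/s, 60,000 messages already queued.
minimum = consumers_needed(arrival_rate=5000, per_consumer_rate=200)
drain_time = backlog_drain_seconds(
    backlog=60000, arrival_rate=5000, consumers=30, per_consumer_rate=200
)
```

With these assumed numbers, the minimum consumer count only stops the queue from growing; extra consumers beyond that are what actually drain the existing backlog.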
Accuracy
Banking applications need to show data to users with a high degree of accuracy. Suppose a bank's customer wants to view their balance, and the current balance is $100,000. If the user is shown $50,000 instead, how would they react?
Let's contrast this with a video streaming website like YouTube, which shows the number of times a video has been watched. If a video has received a million views on YouTube but shows 900k views, is that fine? Yes, from a user's perspective, nothing changes. An accuracy of +/- 10% should be acceptable in such cases. However, showing 10 instead of 1 million would not be acceptable.
Engineers need to define the required degree of accuracy while designing any system. Some systems, like banking applications, need to show data such as a user's balance with 100% accuracy, while applications like YouTube can accept an accuracy of +/- 10% when showing the number of times a video has been watched.
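One way to make such an accuracy requirement explicit is to express it as a tolerance check. The Python sketch below uses a hypothetical `within_tolerance` helper and the figures from the examples above; integer arithmetic is used so the comparison is exact.

```python
def within_tolerance(actual, shown, tolerance_percent):
    """Return True if `shown` deviates from `actual` by at most tolerance_percent."""
    # Multiply instead of divide so the check stays in exact integer arithmetic.
    return abs(shown - actual) * 100 <= tolerance_percent * actual


# Video view count: 900k shown for 1 million actual views, 10% tolerance.
views_ok = within_tolerance(actual=1_000_000, shown=900_000, tolerance_percent=10)

# Video view count wildly off: 10 shown for 1 million actual views.
views_bad = within_tolerance(actual=1_000_000, shown=10, tolerance_percent=10)

# Bank balance: zero tolerance, so $50,000 shown for $100,000 fails.
balance_ok = within_tolerance(actual=100_000, shown=50_000, tolerance_percent=0)
```

Here `views_ok` is true while `views_bad` and `balance_ok` are false, matching the expectations described for the two kinds of systems.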
Durability
Durability refers to a system's ability to retain stored data over the long term without losing it. In today's data-driven world, every user activity is meticulously recorded, encompassing everything from the places users visit to the conversations they have with friends.
On a daily basis, vast amounts of data are generated, including text, images, videos, and GIFs, amounting to petabytes of information. Companies leverage this user data to understand behavioural patterns and make informed decisions aimed at enhancing user engagement. To ensure persistence, user data is typically stored on persistent storage devices such as Solid State Drives (SSDs).
Failures are an unavoidable aspect of distributed systems. Both SSDs and hard disks are susceptible to failure, resulting in the potential loss of all data within a matter of seconds. Additionally, natural disasters can impact data centres, leading to the complete loss of user data.
To address this, companies employ data replication techniques. User data is replicated across multiple machines, ensuring redundancy. In the event of a single machine failure, the data can be retrieved from the remaining machines. The diagram below illustrates this scenario: upon uploading a photo, it is replicated on multiple machines. In case of any failure during photo retrieval, an alternative machine is used to fetch the desired data.
Some systems don't need strong durability. For example, an OTP is transient and expires within a few minutes, so it is deleted once it expires. However, apps like Instagram need to ensure that no reel or photo is lost due to a system-level failure.
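The photo replication scenario described above can be sketched in a few lines of Python. The `Replica` and `ReplicatedStore` classes are hypothetical, and writing synchronously to every replica is a simplification; real storage systems typically use more sophisticated schemes.

```python
class Replica:
    """Stand-in for one storage machine that can be marked unhealthy."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self._blobs = {}

    def put(self, key, data):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self._blobs[key] = data

    def get(self, key):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return self._blobs[key]


class ReplicatedStore:
    """Writes every object to all replicas and reads from the first healthy one."""

    def __init__(self, replicas):
        self.replicas = replicas

    def upload(self, key, data):
        for replica in self.replicas:
            replica.put(key, data)

    def download(self, key):
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ConnectionError:
                continue  # this machine failed, try the next copy
        raise RuntimeError("all replicas are unavailable")


machines = [Replica("machine-a"), Replica("machine-b"), Replica("machine-c")]
store = ReplicatedStore(machines)
store.upload("photo-1", b"jpeg-bytes")

machines[0].healthy = False           # simulate a single machine failure
photo = store.download("photo-1")     # still served from another replica
```

The data survives as long as at least one replica remains healthy, at the cost of storing every object several times.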
Security
We often hear of data breaches in which users' sensitive information gets leaked. Without security, anyone on the internet could read your personal messages, use your credit card, or inject malware into your system. Designing a secure system is one of the important pillars of system design.
Internet products involve communication with the users through interfaces such as web browsers, mobile apps, desktop apps, tablets, etc. The rule of thumb is any data in transit must be encrypted and no third party should be able to read the data. Almost all of the websites today use TLS (Transport Layer Security) for communication.
A user's credit or debit card information is considered sensitive. Card issuers use advanced hardware devices or managed services like AWS KMS (Key Management Service) to securely encrypt sensitive information. Moreover, for inter-service communication, companies employ authentication and authorization to prevent unauthorized callers from accessing or attacking the system.
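As one possible illustration of authenticating inter-service calls, the Python sketch below signs a request body with a shared secret using HMAC-SHA256 from the standard library. The secret, the message, and the function names are assumptions made for this example; it covers authentication only, not authorization, and a real deployment would load the secret from a key management service rather than hard-coding it.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 signature the caller attaches to its request."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, signature: str) -> bool:
    """Check the signature on the receiving service in constant time."""
    expected = sign(secret, message)
    return hmac.compare_digest(expected, signature)


# Hypothetical shared secret; hard-coded here only to keep the sketch runnable.
shared_secret = b"example-shared-secret"
request_body = b'{"action": "send_otp", "user_id": 123}'

signature = sign(shared_secret, request_body)
accepted = verify(shared_secret, request_body, signature)
tampered = verify(shared_secret, request_body + b"x", signature)
```

A request whose body was modified in transit, or that was signed with a different secret, fails verification and can be rejected by the receiving service.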
Availability
I use my credit card to make payments for Uber rides and typically settle the amount prior to booking a new trip. Within Uber, my card details are securely stored, and I only need to provide the CVV and my bank's OTP (One-Time Password) to finalize the transaction for the previous trip.
Recently, I traveled to a new city with an inadequate rail and bus network, making Uber my primary means of transportation. As I needed to pay for my last trip, I entered my CVV and was redirected to the Bank's OTP page. Unfortunately, I did not receive the OTP initially. I attempted to resend the OTP, but it failed to work.
Feeling a sense of urgency, I waited anxiously for approximately five minutes when I finally received an IVR (Interactive Voice Response) call. It was an automated call from my bank, providing me with the OTP. With the OTP in hand, I successfully booked a new ride. This experience highlighted the criticality of maintaining system availability and the potential impact on users during instances of downtime.
In today's world, any form of service downtime is simply unacceptable. It not only results in a subpar user experience but can also lead to frustration. Services must strive for high availability, although achieving a 100% guarantee is practically impossible. Nevertheless, it is essential to design systems in a manner that minimizes the adverse impact on users.
There are multiple ways to improve a system's availability. Examples include redundancy, retry mechanisms for failed requests, disaster recovery, and load balancing.
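Of the techniques listed above, a retry mechanism is the easiest to show in code. The Python sketch below retries a failing operation with exponential backoff; the function name, the default of three attempts, and the choice of `ConnectionError` as the retryable failure are assumptions made for illustration.

```python
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


attempts = []


def flaky_send_otp():
    # Hypothetical operation that fails twice before succeeding.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("OTP gateway unavailable")
    return "otp-sent"


delays = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky_send_otp, base_delay=1, sleep=delays.append)
```

Backing off between attempts gives a struggling dependency time to recover instead of adding to its load. Retries are only safe for operations that can be repeated without side effects.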
Scalability
Scalability is the ability of a system to handle growing load without degrading performance. Applications such as Google, Instagram, TikTok, Gmail, and Amazon are scalable because they are elastic enough to meet the increasing demands of new users.
Scalability often comes at a cost, and companies need to invest in better hardware or increase the capacity of existing systems. Most cloud providers offer scalable services out of the box, such as Amazon DynamoDB, Azure Cosmos DB, and Azure Kubernetes Service.
Scalability plays an important role in the growth of any Internet product company. A business's ability to scale is closely tied to the scalability of its software.
I have discussed scalability at length in my previous article.
Conclusion
Non-functional requirements are crucial in system design, as they impact performance, user experience, and overall system success. Engineers need to address both the functional and the non-functional requirements to provide a seamless user experience and meet user expectations.
To develop robust and successful software, we need to emphasise performance, scalability, availability, security, and the other qualities discussed above. Before finalising the design, it is important to ensure that it meets both the functional and non-functional requirements. By following this practice, we can build systems that deliver value and stand the test of time.
Thanks for reading the article! Before you go:
You can share your thoughts on the article in the comments below.