Database Proxies: Connection Management at Scale
Understanding database proxies for sharding, replication, and traffic spikes
In the last article, we learned how connection pools play a pivotal role in managing database connections.
But, do they fully safeguard a database from unexpected traffic spikes? Would adding 10 service instances result in a 10× increase in database connections? 🤔
Database proxies prevent this by residing between the service and the database and acting as connection multiplexers.
Moreover, proxies also help in solving other challenges such request routing, query caching, failover, and so on ensuring scalability and reliability.
In this article, we will understand how proxies solve the scaling and availability challenges in distributed systems. The article will also discuss trade-offs and include a decision matrix to guide the proxy decision.
With that, let’s first understand the various challenges in building distributed applications with a relational database like PostgreSQL or MySQL.
Distributed System Challenges
Assume that we have a service managing social media tweets that is capable of handling 4K TPS. Let’s say that it fetches the data from a PostgreSQL instance by creating 100 connections.
What would happen if the TPS rises from 4K to 40K? How would we manage the additional load?
Scaling
We would need to deploy 10 service instances to handle the load of 40K TPS. Since each service instance requires 100 connections, there would be 1000 (100 x 10) connections to our PostgreSQL instance.
The diagram below illustrates how horizontal scaling can increase the number of database connections.
The database would struggle to handle the connection surge and fail new connections. Thus, the database would become a bottleneck and limit the system’s scalability.
Duplicate requests
At 40K TPS, there would be multiple requests for the same resource. For example: a request to fetch the same tweet.
With several instances, the same SQL query would be executed multiple times returning the same data. This would result in an additional database load slowing the overall execution and leading to a poor user experience.
We can solve this challenge through a technique known as sharding that distributes the data over multiple database or by having read replicas. But it brings in a different set of challenges that we will see in the next section.
Sharding and Read replicas
With several database instances, we now need to pass the configuration (host, port, etc) of each database instance to the application. We need to write additional code so that the application is aware of the different shards and handle the read/write segregation explicitly.
This introduces the following challenges:
Tight coupling - The application is tightly coupled with the sharding logic and the available shards.
Scaling challenges - The application needs to be updated with the addition of each new replica by passing the updated configuration.
Inflexibility - If application code is rewritten in a different stack, it would require duplication of the existing sharding logic.
The below diagram shows how an architecture involving database replicas and read/write segregation increases the complexity.
While these problems can be solved, it would require building a library for handling the sharding logic. We would need to develop this for every programming language and database.
Now that you understand the core problem, let’s understand how Database proxies solve the problem.
What is a database proxy?
Database proxies are software applications that sit between a service and a database. They create and manage a pool of database connections, and applications connect to the proxy instead of directly to the database.
They intercept the service connections and multiplex them onto a fixed pool of database connections.
The below sequence diagram shows how the process works.
Let’s now understand how Proxies solve the scaling challenges described in the previous section.
Scaling
Proxies create a fixed number of connections to the database. While the service-proxy connections increase with horizontal scaling, the number of database connections remains constant.
This design protects the database from additional connections reducing the chances of failing new connections. Similarly, the load on the database remains stable due to fixed number of connections.
How many connections do you think a single Proxy can support? 🤔 (Share your answers in the comments below)
The following diagram shows how Proxy handles additional connections from application by keeping fixed number of database connections.
Functions as a Service (FaaS) platforms like Lambdas widely adopt proxies. Since every Lambda is short-lived, every invocation leads to setup and teardown of a database connection.
Additionally, the problem amplifies when the number of concurrent executions exceeds maximum database connections.
With proxies, Lambdas no longer connect to the database directly and instead connect to the proxies. As a result, the database remains shielded from any spike in the Lambda layer.
Read Caching
Many proxies can be configured to cache the database query results. The proxy then returns the results from the cache instead of executing the database query.
Caching improves the response time and also reduces the overall load on the database. Proxies such as ProxySQL and Pgpool-II support native in-memory caching with configurable cache size and the TTL (time-to-live).
Sharding and Read replicas
Proxies can be configured with rules to segregate the read/write requests. Similarly, we can leverage the same technique to shard the data across several database servers.
The following diagram shows how proxies simplify configuration and abstract database shards/replicas.
Unlike traditional ways of hardcoding sharding in the application, proxies help through:
Reduced coupling - The sharding logic is abstracted and configurable eliminating the tight coupling.
Ease of scaling - New shards/replicas can be easily introduced by making configuration changes using admin interface.
Flexibility - Rewriting the application code wouldn’t require additional efforts to integrate the sharding logic.
While database proxies offer several benefits, do you think it makes sense to use it always?. Let’s now dive into the trade-offs that would help us decide whether to use a proxy or not.
Trade-offs
Deployment complexity
Since proxies are separate applications, they need to be deployed resulting in an additional overhead. This overhead compounds when you have to deploy a proxy on a different machine than the database server.
Maintenance overhead
The development team now needs to maintain, patch and update the proxy. Similarly, they need to manage the OS upgrades and fix vulnerabilities in the server hosting the proxy.
Latency
Since the queries go via the proxy, it could increases the latency by 10-15 ms. This could impact the performance in case the application requires strict latencies.
Proxies are a must when your system has complex use cases such as sharding, replication, read/write segregation and caching. In such cases, the benefits outweigh the costs. However, for a single database system, they are can often become an overkill.
Let’s conclude with a decision matrix to help you decide whether to use a proxy.
Conclusion
While database proxies are versatile and solve several distributed system challenges, they are not always required. Before using one, you should think about your workload, performance needs, and the complexity that you are willing to manage.
Here’s a decision matrix to help you decide whether you should use a proxy in your architecture.
Now that you understand the working of proxies, here’s a question that’s worth reflecting on - Can proxies become a single point of failure and bring down your application? 🤔 If yes, how do we prevent such scenarios?
Leave your thoughts in the comments below. We will address this question in the next article.
Before you go:
❤️ the story and follow the newsletter for more such articles
Your support helps keep this newsletter free and fuels future content. Consider a small donation to show your appreciation here - Paypal Donate
Educative has officially launched their Year End sale— and it’s their strongest offer of the entire year.
Whether you’re preparing for interviews, mastering new technologies, or strengthening your fundamentals, Educative provides high-quality, in-browser learning with no setup required.
Use my link to get an extra 10% off









