Let’s talk about Scalability
Scalability is a frequently used buzzword, often invoked to suggest that something is not properly designed.
In technical discussions we often hear "but that doesn't scale", wielded as the magic spell that ends any argument.
What it usually indicates is that a system is designed in such a way that it's difficult to increase its performance beyond a certain point.
Such a system is called "not scalable" because pushing it further is either practically impossible or an extremely expensive affair.
But what do we really mean by scalability? Let's find out in this blog.
What is Scalability?
A web service is said to be scalable if adding more computing resources results in performance gains proportional to the resources added.
In simple words, scalability is the measure of a system's ability to increase or decrease its performance and cost in response to changes in the application's processing demands.
Which means that if you can increase your web application's capacity as its demand grows, without wasting a lot of time, money or resources, and just as easily scale it back down when demand drops, then your web application is said to be scalable.
There are generally two ways to achieve scalability: scaling up vs scaling out (also known as vertical vs horizontal scaling).
Vertical Scaling
Vertical scaling means increasing or decreasing the capacity of our existing compute resources (for instance, memory or central processing units) to match our application's demands.
This is generally very easy to achieve and doesn't need a lot of changes in our underlying application design.
But there's a practical limit to the maximum size of a machine we can use.
Horizontal Scaling
Horizontal scaling means adding more machines and making them work together to handle increasingly large workloads.
Adding more machines is generally more complex because most applications are not initially designed with scalability in mind; we need to rethink and redesign the application architecture so that multiple machines can work together seamlessly.
But once we design our applications to support horizontal scaling, there's practically no limit to their ability to scale.
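The core idea behind horizontal scaling, many machines sharing one workload, can be sketched with a toy round-robin dispatcher. This is just an illustration (the server names are hypothetical); in a real deployment this job is done by a load balancer such as nginx or HAProxy.

```python
from itertools import cycle

# Hypothetical pool of identical app servers behind a load balancer.
servers = ["app-1", "app-2", "app-3"]

def make_round_robin(pool):
    """Return a function that picks the next server in round-robin order."""
    it = cycle(pool)
    def pick():
        return next(it)
    return pick

pick = make_round_robin(servers)

# Six incoming requests get spread evenly across the three servers.
assignments = [pick() for _ in range(6)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Handling more load then becomes a matter of adding another entry to the pool, rather than buying a bigger machine.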
What scalability is not
Scalability is generally used in close company with a few other keywords like performance, latency and throughput. Sometimes people even use them interchangeably, but it's crucial to understand the differences between them.
Performance
Performance, in a very broad sense, is a measure of how well your system satisfies its business requirements.
Let's try to understand these concepts with a practical example. Suppose we are tasked with designing a web application similar to smartutr. Smartutr is a web application that allows educators to easily create educational content and allows students to access those resources on demand. For simplicity, we'll focus only on the pre-recorded video lecture streaming part for now.
For such a video lecture streaming service, the business requirement for the platform is twofold:
- Allow educators to upload video lectures
- Allow students to stream those lectures on their devices.
So, if our application allows educators to upload lectures and serves video lectures to students in a fast and reliable manner, it's said to have good performance.
In the context of web services, there are many metrics that can be used to measure the performance of a system. Two of the most widely used are latency and throughput. Let's take a brief look at both.
Latency and Throughput
Latency is the time taken to perform some action or to produce some result.
Throughput is the number of such actions or results performed or produced per unit of time.
Latency, in very simple terms, is the time it takes for a system to produce some result. It is usually measured in units of time, e.g. hours, minutes, seconds or even nanoseconds.
Throughput, on the other hand, is the total number of such actions executed or results produced by the system in one unit of time. It is measured in units of whatever is being produced (HTTP responses, or bits/bytes of data) per unit of time.
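The two definitions above can be made concrete with a small sketch. The `fake_request` function below is a hypothetical stand-in for a real HTTP call; we time one call to get latency, and count calls per second to get throughput.

```python
import time

def fake_request():
    """Stand-in for a real HTTP call; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return "response"

# Latency: time taken to perform ONE action.
start = time.perf_counter()
fake_request()
latency = time.perf_counter() - start

# Throughput: number of such actions completed PER UNIT of time.
n = 20
start = time.perf_counter()
for _ in range(n):
    fake_request()
elapsed = time.perf_counter() - start
throughput = n / elapsed

print(f"latency ≈ {latency:.3f} s, throughput ≈ {throughput:.1f} req/s")
</```>

Note that the two are related but distinct: a system can have low latency per request yet low throughput overall (one worker), or high throughput with mediocre per-request latency (many parallel workers).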
Let's get back to our example.
When you type https://www.smartutr.com into your web browser, the browser takes on the responsibility of connecting to the backend servers and fetching the video lectures while you wait for the website to load :P
So, the overall time it takes for the browser to contact the smartutr servers and fetch back a response (including the time the smartutr servers take to process the request and generate a response) is collectively called latency.
In short, for any web service, latency is the time it takes for data to travel between the user and the server, plus the time the server takes to process the request.
And the amount of data you receive from the server per unit of time is called throughput.
So, if the smartutr servers send you a video lecture of around 0.6 GB (roughly 600 MB) in 10 min (i.e. 600 sec), the throughput is 1 MB/s, or about 8 Mbit/s. At that rate, transferring 1 GB of data would take approximately 1000 sec.
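The arithmetic above can be checked in a couple of lines (remembering that 1 byte = 8 bits, so megabytes per second and megabits per second differ by a factor of 8):

```python
size_mb = 600   # lecture size: 0.6 GB ≈ 600 MB
seconds = 600   # download time: 10 minutes = 600 seconds

throughput_mb_s = size_mb / seconds      # 1.0 MB/s
throughput_mbit_s = throughput_mb_s * 8  # 8.0 Mbit/s

# At this rate, transferring a 1 GB (~1000 MB) file would take:
time_for_1gb = 1000 / throughput_mb_s    # 1000 seconds

print(throughput_mb_s, throughput_mbit_s, time_for_1gb)
```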
Generally, you should aim for maximal throughput with acceptable latency.
Performance vs Scalability
So now that we have looked at what performance is and how it can be measured, let's look at how it differs from scalability.
If your web service has low latency and high throughput, we can say it has good performance. But a good question to ask here is: for how many users? When few users are using your web service, high performance is easy to achieve.
But as the number of users grows, and with it the amount of data we store and process, your performance metrics might drop significantly.
This might happen for multiple reasons. Maybe your SQL queries are not optimised, so as the database grows, data processing time (and therefore service latency) increases. Maybe when a large number of tutors start uploading videos, your server runs out of storage space and crashes. Or maybe some students access the web service from a continent halfway across the planet, so their requests travel longer distances and their wait times increase.
None of these issues are performance problems per se, because the service performed very well for a small number of tutors and students who lived close to our data centers.
But if our application can't deliver similar performance when we have many more users, spread all across the world, then we have a scalability problem.
In very simple words, the best way to look at performance vs scalability is as follows:
- If we have a performance problem, our system is slow for a single user.
- If we have a scalability problem, our system is fast for a single user but slow under heavy load.
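This "fast for one user, slow under heavy load" behaviour falls straight out of basic queueing theory. As a rough illustration (a textbook single-server M/M/1 approximation, not smartutr's actual behaviour), mean response time is the service time divided by (1 − utilisation): latency stays nearly flat at low load, then blows up as the server approaches saturation.

```python
def response_time(service_time_s, arrival_rate):
    """M/M/1 approximation of mean response time for one server.

    service_time_s: time to serve one request in isolation (seconds)
    arrival_rate:   requests per second offered to the server
    """
    utilisation = arrival_rate * service_time_s
    if utilisation >= 1:
        raise ValueError("server saturated; the queue grows without bound")
    return service_time_s / (1 - utilisation)

service = 0.05  # hypothetical: 50 ms per request, so capacity is 20 req/s
for rate in (1, 10, 15, 19):
    print(f"{rate:>2} req/s -> {response_time(service, rate) * 1000:.0f} ms")
```

A single user sees roughly the 50 ms service time; at 19 req/s the same server takes about a second per request, twenty times slower, with no code change at all.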
Why is it so difficult to scale web services?
A natural question to ask after reading about scalability is: why is achieving high scalability so hard?
Because scalability cannot be an afterthought. It requires applications and platforms to be designed with scaling in mind, so that adding resources actually improves performance, and so that introducing redundancy does not adversely affect the system's performance.
Many algorithms that perform reasonably well under low load and on small datasets can explode in cost when request rates increase, the dataset grows, or the number of nodes in the distributed system increases.
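As a toy illustration of that cost explosion (my example, not from any particular system), consider duplicate detection. A pairwise scan feels instant on a hundred items, but its cost grows quadratically with the dataset, while a set-based version grows only linearly:

```python
def has_duplicates_quadratic(items):
    """Compare every pair: fine for tiny inputs, O(n^2) as data grows."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items):
    """Track seen items in a set: O(n), keeps scaling as the dataset grows."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

data = list(range(10_000)) + [42]  # one duplicate hidden at the end
assert has_duplicates_quadratic(data) == has_duplicates_linear(data) == True
```

Both functions give the same answer; the difference only becomes visible (and then painful) as the data grows, which is exactly why such problems hide until the system is under real load.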
In upcoming blogs, we will discuss in depth how to design distributed systems with high scalability. We'll look at some of the most common scalability issues, why they happen, how to avoid them, and their possible solutions.
And remember, scalability isn't a trivial problem to solve; it takes thorough knowledge and experience to deal with highly distributed and scalable systems.
So keep learning and have fun..... until next time.