Large Scale System Design

Vinoth Gunasekaran
5 min read · Oct 22, 2021



Hello, I am Vinoth Gunasekaran. I have 15+ years of experience designing and developing software for various business needs. This article is a deep dive into several system designs and an overview of how to approach large-scale system design.

System design is the process of connecting all the components to meet the desired business expectations in the most efficient way possible. It starts with understanding the user requirements and the system's scaling requirements. System design can be divided into two parts:

  1. Design of individual components (services, databases)
  2. Interconnectivity of components

Here, I am attempting to break down service design (database design will be Part 2 of this series). I am taking an approach of analyzing a few typical use cases along with the pros and cons of the service design in each.

List of use cases:

  1. Distributed configuration data management system (read service + DB)
  2. Data as a service (write service + DB)
  3. Machine learning integrated systems (Kafka + service)
  4. Real-time microservice (service only)
  5. Service-to-service architecture

1. Distributed Configuration management system:

This is a typical service-plus-DB architecture. Users store configurations that can be changed dynamically; subscribing application services pull them at runtime as needed.

System requirement:

Read requests: 1 million per day

Write requests: 100K per day

Highly available

Response time: < 10 ms

Given the read-heavy requirement, the design has one microservice that caters to read requests and another service for UI writes. Running multiple instances of the read microservice solves the read-scale problem. Since the system needs to be highly available, we host the service in two different data centers to avoid reliability issues, and the data between the two data centers is synchronized bi-directionally. Of course, the services sit behind a global load balancer.
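As a rough sketch of the read path, the snippet below models a read service that caches configuration values in memory with a TTL and falls back to the backing store on a miss. The `ConfigReadService` name, the plain-dict store, and the TTL value are all illustrative assumptions, not the article's actual implementation.

```python
import time

class ConfigReadService:
    """Read-path sketch: serves configs from an in-memory cache,
    refreshing from the backing store only after the TTL expires."""

    def __init__(self, store, ttl_seconds=30):
        self.store = store          # backing DB client (here: a plain dict)
        self.ttl = ttl_seconds
        self.cache = {}             # key -> (value, fetched_at)

    def get(self, key):
        entry = self.cache.get(key)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]         # cache hit: no DB round trip
        value = self.store[key]     # cache miss: read-through to the DB
        self.cache[key] = (value, time.monotonic())
        return value

store = {"feature.flag": "on"}
svc = ConfigReadService(store, ttl_seconds=60)
print(svc.get("feature.flag"))
```

With most of the 1 million daily reads absorbed by the cache, the database only sees refresh traffic, which is what makes the sub-10 ms response target plausible.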

2. Data as a service

This is a service-plus-NoSQL architecture. The system receives data feeds from multiple sources and stores them. When a read request reaches the service, it serves the data.

System requirement:

Read requests: 1 million per day

Write requests: 10 million per day

Storage: 10 TB per month

Consistency: eventual

In this case the focus is on designing a system that supports a huge write volume. We put Kafka in front of all the source systems to ingest the data; Kafka throttles messages and keeps the system from being overwhelmed. From Kafka, a microservice writes the data to Cassandra. We chose Cassandra for its ability to sync seamlessly with other nodes, even when those nodes run in a different region. For reads, we exposed a microservice to consumers.
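The write path can be sketched as a buffer that stands in for Kafka: writes accumulate and are flushed to the store in batches, which smooths out bursts. `BufferedWriter` and its batch size are hypothetical names for illustration; a real deployment would use a Kafka producer/consumer and a Cassandra driver instead of a dict.

```python
class BufferedWriter:
    """Write-path sketch: messages are buffered (standing in for Kafka)
    and flushed to the store in batches, smoothing write bursts."""

    def __init__(self, store, batch_size=3):
        self.store = store
        self.batch_size = batch_size
        self.buffer = []

    def ingest(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        for key, value in self.buffer:
            self.store[key] = value   # batched write to the NoSQL store
        self.buffer.clear()

db = {}
writer = BufferedWriter(db, batch_size=2)
writer.ingest("event:1", {"source": "feed-a"})
writer.ingest("event:2", {"source": "feed-b"})   # batch full, flushed to db
```

Decoupling ingestion from persistence this way is what lets the system absorb 10 million writes a day while only promising eventual consistency to readers.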

3. Machine learning Integrated Systems

The system receives a feed from the internet and sends it to users, who work on the customer feed. The data in the pipeline is also sent to a machine learning model for fraud detection.

System requirement:

Messages per day: 100K

No failures are allowed

Audit required

Feed volume can spike

To handle the risk of a data influx, we introduced Kafka to ingest and buffer the data. The design includes a microservice that reads data from Kafka and checks it against an ML model. The output of each stage is stored back in Kafka to make sure data is never lost and can be reprocessed if needed. The downside of this design is the multiple Kafka hops, which add failure points and latency.
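A single hop of this pipeline might look like the sketch below, where `fraud_score` is a stand-in for the real ML model and the deques stand in for Kafka topics. Flagged messages are parked rather than dropped, which is what preserves the audit trail and the no-failures guarantee.

```python
from collections import deque

def fraud_score(message):
    # stand-in for the ML model call; flags unusually large amounts
    return 1.0 if message["amount"] > 10_000 else 0.0

def process_stage(inbox, outbox, dead_letter):
    """One pipeline hop: consume each message, score it, and route it.
    Nothing is discarded, so every message remains auditable."""
    while inbox:
        msg = inbox.popleft()
        if fraud_score(msg) > 0.5:
            dead_letter.append(msg)   # parked for review/reprocessing
        else:
            outbox.append(msg)        # forwarded to the next topic

inbox = deque([{"id": 1, "amount": 50}, {"id": 2, "amount": 99_999}])
outbox, dead_letter = deque(), deque()
process_stage(inbox, outbox, dead_letter)
```

Each extra hop like this one adds durability at the cost of another point of failure and more end-to-end delay, which is exactly the trade-off noted above.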

4. Real-Time Microservice

The job of this service is to fetch the latest data from a cache and provide it to consumers.


System requirement:

Read requests: 10 million per day

Write requests: none

Authentication required

The design includes a microservice that reads from a cache. The microservice runs in multiple data centers for high availability and enforces authentication on every request. Multiple instances of the microservice are deployed close to consumer data centers, behind a GSLB for load balancing.
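The auth-then-read flow can be sketched as follows; the request/response dicts, the token set, and the dict-based cache are simplified stand-ins for a real HTTP framework and distributed cache.

```python
def handle_read(request, cache, valid_tokens):
    """Auth-then-read sketch: reject unauthenticated callers before
    touching the cache, then serve the value straight from memory."""
    token = request.get("token")
    if token not in valid_tokens:
        return {"status": 401, "body": "unauthorized"}
    value = cache.get(request["key"])
    if value is None:
        return {"status": 404, "body": "not found"}
    return {"status": 200, "body": value}

cache = {"price:AAPL": 182.5}
tokens = {"secret-token"}
resp = handle_read({"token": "secret-token", "key": "price:AAPL"}, cache, tokens)
```

Checking the token before the cache lookup keeps unauthenticated traffic from consuming read capacity, which matters at 10 million requests a day.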

5. Service to Service

Service A has to call service B synchronously for data, with a minimal timeout limit.


System requirement:

Read requests: 10 million per day

Authentication required

The design is to have a dedicated microservice for each heavy-read API, deployed in multiple data centers to make sure reads succeed and do not time out.
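One way to enforce a hard deadline on a synchronous downstream call is sketched below using Python's standard `concurrent.futures`; `service_b` and the fallback value are hypothetical, and a production caller would likely add retries against the other data center.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as CallTimeout

def call_with_timeout(fn, timeout_s, fallback):
    """Run a synchronous downstream call with a hard deadline;
    on timeout, return a fallback instead of hanging the caller."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except CallTimeout:
            return fallback            # downstream too slow: degrade gracefully
    finally:
        pool.shutdown(wait=False)      # don't block on a stuck call

def service_b():
    # hypothetical synchronous call from service A to service B
    return {"data": 42}

result = call_with_timeout(service_b, timeout_s=0.5, fallback={"data": None})
```

Bounding the wait this way keeps a slow service B from exhausting service A's threads, which is the usual failure mode of tight synchronous coupling.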

The general best practice in designing services is to build each microservice for a specific purpose and place it behind a load balancer or API gateway. Deploy the service in two or more data centers for availability and reliability. Depending on the operation, choose the appropriate HTTP method: GET, POST, PUT, PATCH, or DELETE.

Name your APIs meaningfully and choose the right data exchange format (JSON works well in most modern use cases).

Secure your services with HTTPS and enable authentication for both client-to-service and service-to-service calls.

In the next part of this series, we will look at database selection and table design.


