Sunday, May 2, 2021

System Design Interview

The system design interview is a prominent interview phase that focuses on architecture, design, and attention to detail.

Key Points to Remember
  1. Always start with the high-level architecture, then scaling, database design, API design, security, monitoring & logging, and finally benefits & trade-offs.
  2. Don't get into details prematurely.
  3. Avoid fitting the requirements to a preconceived architecture.
  4. Keep it simple and avoid hacks or needless complexity when solving.
  5. Critical reasoning and argument are key to a successful software design. Have justifications for the points you make; don't make points without thinking them through. Half-hearted attempts at solving problems (dropping buzzwords) are frowned upon heavily.
  6. Be aware of the current solutions within the industry and best tech practices.
  7. Know when to pick an existing solution versus building something custom. Provide pros and cons for each option.

System Requirements
  • What is the scope of the design? UI, backend, database?
  • What will be the bottlenecks of the system, e.g. different regions, geolocation?
  • How much consistency is required: strong consistency or eventual consistency?
  • Are there different types of items/products/objects?
  • Are there different rates/prices for each product or service? What is the memory estimate, i.e. the size of the data?
  • Are there any delete, cancel, remove, or terminate use cases?
  • How many users? How many transactions or requests per second?
  • How much data is each user uploading and requesting?
  • Depending on the users and the total load, is a distributed big-data system required?
  • Use decimal types for monetary values; with floating point, rounding errors appear down the line.
  • Should the system be split into multiple microservices?
  • Does the system need more reads or writes? Is a read replica needed? (Replicas require a load balancer.)
  • Separate the payment option from the payment calculation.
  • What is the volume of data, and is low latency required?
  • What are the reliability and security requirements?

Non-Functional Requirements:
  1. Availability
  2. Latency - Cache
  3. Scale - How many requests/users in a day
  4. Monitoring & Logging
  5. Load Balancing

API Design

HTTP verb to CRUD mapping, with typical status codes for the entire collection (e.g. /customers) and for a specific item (e.g. /customers/{id}):

POST (Create)
  Entire collection: 201 (Created), 'Location' header with link to /customers/{id} containing the new ID.
  Specific item: 404 (Not Found), or 409 (Conflict) if the resource already exists.
GET (Read)
  Entire collection: 200 (OK), list of customers. Use pagination, sorting and filtering to navigate big lists.
  Specific item: 200 (OK), single customer; 404 (Not Found) if the ID is not found or invalid.
PUT (Update/Replace)
  Entire collection: 405 (Method Not Allowed), unless you want to update/replace every resource in the entire collection.
  Specific item: 200 (OK) or 204 (No Content); 404 (Not Found) if the ID is not found or invalid.
PATCH (Update/Modify)
  Entire collection: 405 (Method Not Allowed), unless you want to modify the collection itself.
  Specific item: 200 (OK) or 204 (No Content); 404 (Not Found) if the ID is not found or invalid.
DELETE (Delete)
  Entire collection: 405 (Method Not Allowed), unless you want to delete the whole collection, which is rarely desirable.
  Specific item: 200 (OK); 404 (Not Found) if the ID is not found or invalid.


APIs can be public endpoints or internal endpoints.
Before determining the APIs, define the system entities (user, product, transaction objects, etc.) and their corresponding operations (add, fetch, upload, assign, etc.).

API - Domain/Resource/{Params}
  • An API should have no side effects; it should only perform the single add/update operation its name indicates.
  • API operations should be atomic.
  • When the response for an API is large, use pagination or fragmentation (breaking the response into multiple TCP pieces); a pagination sketch follows this list.
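
Below is a minimal sketch of a paginated collection endpoint, assuming a Flask app and a toy in-memory list (the route and parameter names are illustrative):

    # Minimal paginated GET /customers endpoint (illustrative names, toy data).
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    CUSTOMERS = [{"id": i, "name": f"customer-{i}"} for i in range(1, 1001)]

    @app.get("/customers")
    def list_customers():
        # The client controls the window via ?page= and ?per_page=; clamp both.
        page = max(int(request.args.get("page", 1)), 1)
        per_page = min(int(request.args.get("per_page", 50)), 100)
        start = (page - 1) * per_page
        return jsonify({
            "page": page,
            "per_page": per_page,
            "total": len(CUSTOMERS),
            "items": CUSTOMERS[start:start + per_page],  # 200 (OK) with one page
        })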

API Gateway

API Gateway enables creating and accessing an HTTP or WebSocket API.
A link between the API Gateway and an ELB that is NOT exposed to the internet can be created using a VPC link (which only works with a Network Load Balancer).

To integrate an API Gateway REST API with a public Application Load Balancer, use API Gateway HTTP integration. For private Application Load Balancers, use an API Gateway VPC link to first connect to a private Network Load Balancer, then have the Network Load Balancer forward API Gateway requests to the private Application Load Balancer.


Relational versus NoSQL Database

Relational databases store data in rows and tables. They allow join operations using SQL across different database tables. NoSQL databases are grouped into four categories: key-value stores, graph stores, column stores, and document stores. Join operations are generally not supported in non-relational databases.

Non-relational databases are preferred when:
  • The application requires super-low latency.
  • The data is unstructured, or there is no relational data.
  • We only need to serialize and deserialize data (JSON, XML, YAML, etc.).
  • We need to store a massive amount of data.
  • NoSQL databases are generally easier to scale.
In NoSQL databases, the partition key is used to compute a hash, which selects the node that stores the data (see the sketch below). The partition key should be random, evenly distributed, and mostly unique. The second indexing key is the sort key, which defines the order in which data is stored within the cluster nodes. For example, DynamoDB has a sort key, Elasticsearch has index sorting, and Cassandra has a clustering column as its sort key. Data redundancy generally cannot be avoided in NoSQL databases.
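
A toy sketch of mapping a partition key to a storage node via hashing (node names are hypothetical; real stores typically use consistent hashing so that adding a node moves only a fraction of the keys):

    # Stable hash of the partition key spreads data evenly across nodes.
    import hashlib

    NODES = ["node-a", "node-b", "node-c", "node-d"]

    def node_for(partition_key: str) -> str:
        digest = hashlib.md5(partition_key.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    print(node_for("user#42"))  # the same key always routes to the same node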

MongoDB versus Cassandra versus DynamoDB

When low latency is required for read queries, data must be aggregated during writes so that no extra computation is needed during reads.

Use ids for identification

Database Design

Identify the key objects, their identifying keys, and their relationship mapping.
Data can be stored as individual events or as aggregate data for a collection.

The most crucial aspect of designing a relational database is identifying the relationships among tables.
  • One to Many: A manager manages zero or more employees, while an employee is managed by one (and only one) manager; a customer may place many orders, while an order is placed by one particular customer. These are examples of one-to-many relationships. A one-to-many relationship is represented by creating a table for each of the two entities and storing the primary key of the "one" end (the parent table) in multiple rows of the "many" end (the child table). For example, the primary key of the Manager table (managerId) is stored in each row of the Employee table and is known as a foreign key. A foreign key of a child table is a primary key of a parent table, used to reference the parent table. For every value in the child table, there is one and only one row in the parent table.
  • Many to Many: For example, a customer's order may contain one or more products, and a product can appear in many orders. Apart from the two entity tables, a third table, known as a junction table, is needed to support a many-to-many relationship. The primary key of the junction table is the combination of the primary keys of the two entities. For example, the primary key of the OrderDetails junction table is the pair of columns orderID (primary key of the Order table) and productID (primary key of the Product table).
  • One to One: A one-to-one relationship is used to split data into two tables, especially for optional supplementary information that is often empty. For every row in the parent table, there is at most one row (possibly zero) in the child table.

The database design can be normalized by creating a new table for optional data using a one-to-one relationship and by splitting a large table into two smaller tables.

Entity Integrity Rule: The primary key cannot contain NULL; otherwise it cannot uniquely identify the row. For a composite key made up of several columns, none of the columns can contain NULL. Most RDBMS check and enforce this rule.

Referential Integrity Rule: Each foreign key value must be matched to a primary key value in the table referenced (or parent table).
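
A minimal sketch of these relationships and integrity rules, using SQLite through Python's standard library (all table and column names are illustrative):

    # One-to-many and many-to-many relationships with primary/foreign keys.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- One-to-many: each order row references exactly one parent customer.
        CREATE TABLE "Order" (
            orderId    INTEGER PRIMARY KEY,
            customerId INTEGER NOT NULL REFERENCES Customer(customerId)
        );
        CREATE TABLE Product (productId INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Many-to-many: junction table whose primary key combines both parents,
        -- so no column of the composite key can be NULL (entity integrity).
        CREATE TABLE OrderDetails (
            orderId   INTEGER NOT NULL REFERENCES "Order"(orderId),
            productId INTEGER NOT NULL REFERENCES Product(productId),
            quantity  INTEGER NOT NULL,
            PRIMARY KEY (orderId, productId)
        );
    """)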


Sharding

Database sharding - for handling too many requests - e.g. based on region, select one database.
Sharding - read or write sharding, vertical or horizontal sharding

Sharding provides high consistency and availability. Joins across shards are extremely expensive. A fixed number of shards means less flexibility; this can be alleviated using hierarchical sharding, where a shard is broken up into multiple mini-shards and requests are directed to them by each shard's manager. An index can be created on each shard.

Horizontal partitioning partitions the data using a key that is an attribute of the data and allocates each partition to a different server, as in the sketch below. Vertical partitioning uses columns to partition the data.
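
A toy sketch of horizontal partitioning, routing each row to a shard derived from its key (shard names are hypothetical):

    # Route a row to a shard using an attribute of the data (the user ID).
    SHARDS = {0: "db-shard-0", 1: "db-shard-1", 2: "db-shard-2"}

    def shard_for(user_id: int) -> str:
        return SHARDS[user_id % len(SHARDS)]

    # All rows for one user land on the same shard, so related data stays
    # co-located and cross-shard joins are avoided.
    print(shard_for(12345))  # -> "db-shard-0"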


Performance and Scalability

System Reliability versus Availability

Horizontal versus Vertical Scaling

Horizontal Scaling: Adding more machines.
  • Needs load balancing
  • More resilient
  • Requires network calls (RPC)
  • Data inconsistency can occur
  • Scales effectively without hard limits
Vertical Scaling: Adding more power (CPU, RAM, etc.) to a single server.
  • Single point of failure; it has no failover or redundancy
  • Inter-process communication (faster than network calls)
  • Data is consistent
  • Hardware limits the scaling of a single machine; it is impossible to add unlimited CPU and memory to a single server.

Load Balancer

A load balancer evenly distributes incoming traffic among the web servers defined in a load-balanced set. The load balancer has a public IP which is mapped to the domain name in DNS. For stateful services, every request from the same client must be routed to the same server, which is accomplished using sticky sessions in the load balancer. A load balancer can manage traffic at different layers, as below.
  • Layer 3 - IP address based
  • Layer 4 - transport level (TCP/UDP)
  • Layer 7 - application level

A load balancer can route traffic using the techniques below; a small sketch follows the list.
  • Round robin
  • Removing traffic from machines that are offline
  • Hashing on the client's IP address
  • Assigning traffic to the machine with the least load
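
A toy sketch of two of these picks, round robin and least load (in-memory only; the server list and counters are illustrative):

    # Two common load-balancing strategies over a static server list.
    import itertools

    SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    _ring = itertools.cycle(SERVERS)

    def pick_round_robin() -> str:
        return next(_ring)                          # rotate through servers in order

    open_conns = {s: 0 for s in SERVERS}            # open connections per server

    def pick_least_loaded() -> str:
        return min(open_conns, key=open_conns.get)  # fewest open connections wins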

Cascading failure is a problem where a single failed instance in a cluster causes the load balancer to route that instance's traffic equally across the remaining healthy instances, which in turn exceeds the capacity of those instances, crashing them as well and cascading the capacity failures to more nodes until the entire system goes down. To solve the problem, assign a compute capacity, i.e. requests/second, to each node and give each node a queue that accepts requests only up to that capacity, as in the sketch below.
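
A minimal sketch of such a capacity-bounded node, assuming an illustrative capacity; excess requests are rejected rather than allowed to overload the node:

    # Accept work only up to a fixed capacity; shed the rest instead of crashing.
    import queue

    CAPACITY = 100                        # max requests this node will hold
    inbox = queue.Queue(maxsize=CAPACITY)

    def accept(request) -> bool:
        try:
            inbox.put_nowait(request)     # accepted: still under capacity
            return True
        except queue.Full:
            return False                  # rejected: caller retries elsewhere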

Message Queue

A message queue is a durable component, stored in memory, that supports asynchronous communication. It serves as a buffer and distributes asynchronous requests. Producers create messages and publish them to the message queue. Consumers/subscribers connect to the queue and consume messages from it. A message queue is preferred for decoupling, to build scalable and reliable applications.

Never use a database as a message queue: polling the database for new records is expensive (long intervals).
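
A minimal producer/consumer sketch using the standard-library queue as a stand-in for a real broker such as Kafka or RabbitMQ (names are illustrative):

    # Producer and consumer decoupled by a queue: the producer never waits
    # for the consumer, only for free space in the buffer.
    import queue
    import threading

    q = queue.Queue()

    def producer():
        for i in range(5):
            q.put(f"job-{i}")            # publish and move on

    def consumer():
        while True:
            job = q.get()                # blocks until a message arrives
            print("processing", job)
            q.task_done()

    threading.Thread(target=consumer, daemon=True).start()
    producer()
    q.join()                             # wait until every published job is done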


Caching

Caching allows content to be served faster. It happens at different levels in a web application: the CDN (edge caching), database caching used natively by every database, browsers caching static assets based on cache-expiry headers, and server caching (API caching), which is custom caching of data in a server application. Caching suits heavy concurrent reads combined with a lower write frequency.

The TTL (Time To Live) of a cache entry is used when the data is updated frequently and the cache must expire at regular intervals. An entry is automatically deleted once its TTL has passed. Do not make the cache expiration too short, as this causes the system to reload data from the database too frequently; at the same time, it is advisable not to make the expiration too long, as the data can become stale. Multiple cache servers across different data centers are recommended to avoid a single point of failure.

Below are some of the different caching strategies. The read-through strategy is the most commonly used.

Lazy loading (cache aside): The lazy-loading strategy keeps the cache updated through the application asynchronously. When the application requests data, it first checks the cache. If the data exists in the cache and has not expired, it is returned to the application; otherwise the application requests the data from the data store and, after receiving it, writes it into the cache. A sketch follows.
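
A minimal cache-aside sketch with a TTL (the cache dict and loader callback are illustrative):

    # Cache-aside read path: check the cache first, fall back to the data store,
    # then populate the cache for subsequent reads.
    import time

    cache = {}                                 # key -> (value, expires_at)
    TTL_SECONDS = 60

    def get(key, load_from_db):
        entry = cache.get(key)
        if entry and entry[1] > time.time():   # hit and still fresh
            return entry[0]
        value = load_from_db(key)              # miss (or expired): go to the DB
        cache[key] = (value, time.time() + TTL_SECONDS)
        return value

    print(get("user:42", lambda k: {"id": k, "name": "Alice"}))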

Read-through cache: Data is always read through the cache. First, check whether the data exists in the cache. If it does, read from the cache and send the response. If it doesn't, the cache updates itself from the data source and then sends the response back to the client.

Write-through cache: The write-through strategy adds or updates data in the cache whenever data is written to the database. Data is first written to the cache and then to the main database.

Write-behind cache: The cache sends the response before writing to the data store, then writes to the data store asynchronously depending on the load. It is a special type of write-through cache, where the cache doesn't wait for the data store to acknowledge that the data is stored.

Refresh-ahead cache: A refresh-ahead cache refreshes the data before it expires. This happens asynchronously, so the end user doesn't see any issues.

Pre-Computation

Content Delivery Network (CDN)

A CDN is a network of geographically dispersed servers used to deliver static content. When an image (or video) is requested by the user via a URL, the domain of the URL is served by the CDN. If the CDN does not have the image in its cache, it requests the image from the origin server, which returns the image with a TTL header describing how long the image should be cached. The CDN caches the image until the TTL expires and returns the image to the user. In case of a temporary CDN outage, clients should be able to detect the problem and request resources from the origin. File objects can be invalidated in the CDN before expiration. E.g. Akamai.


High-level Architecture

Add images and media to object storage like S3, and use a CDN for caching.

HTTP Polling - Repeatedly ask the server if new info is available. Many messages, but low latency.
Long Polling - Send a request to the server; the server holds on to the request until it has a message.
               Requires maintaining an open connection, and if many messages are coming, many requests must be sent. Good for notifications. A client sketch follows.
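
A minimal long-polling client loop (the endpoint URL and timeout are hypothetical):

    # The server holds each request open until a message exists or ~30s pass;
    # the client simply reconnects after every response or timeout.
    import requests

    def handle(msg):
        print("got", msg)

    def poll_forever():
        while True:
            try:
                resp = requests.get("https://example.com/messages", timeout=35)
                if resp.status_code == 200:
                    handle(resp.json())
            except requests.Timeout:
                pass                      # no message this round; reconnect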

WebSockets - maintain a duplex (two-way) open connection that allows sending and receiving messages.
Messaging server - use the pub-sub pattern to publish/read messages.
User Endpoints (e.g. Get Videos) - CDN - Akamai

Metrics, Logging and Monitoring for Auditing

Different components of the system should be decoupled so that they can be scaled independently.

In order to avoid a single point of failure in distributed systems, have multiple nodes, a master-slave architecture for databases, multiple load balancers routed to by DNS, and instances in multiple regions.

Database replication is implemented using a master/slave relationship between the original (master) and the copies (slaves). A master generally only supports write operations, while a slave gets copies of the data from the master and only supports read operations. If the master database goes offline, a slave database is promoted to be the new master. If only one slave database is available and it goes offline, read operations are directed to the master database temporarily, and a new database server replaces the old one. Whenever strong consistency is required, read locks are used on the replicas.

To scale the web tier horizontally, the web tier should be made stateless. A good practice is to store session data in persistent storage such as a relational database or a NoSQL store. Each web server in the cluster can then access the state data from the shared data store. A stateless system is simpler, more robust, and scalable.

It is recommended to pre-scale (or auto-scale) in order to handle the traffic of an upcoming big event. Rate limiting the traffic is also helpful. Autoscaling means adding or removing web servers automatically based on the traffic load.

GeoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of the user, geo-routing requests to the data center closest to the user.

Different regions could use different local databases or caches, where traffic might be routed to another data center in case of failover. Data should be asynchronously replicated across multiple data centers.

Circuit Breaker
Single points of failure
Reverse Proxy

Scalable = Partitioning
Reliable = Replication (Master / Slave) and checkpointing
Fast = In-memory

Trade-offs


Use Cases

Uber - Uses the Google S2 library for location tracking and for finding cabs within a 1-mile radius.

Dropbox - (Indexer, Chunker, Watcher) on client, Metadata cached, Sync service, versioning

Netflix - Videos are stored by resolution and format. Transcoding is used to validate videos for errors, convert them to different formats and resolutions, and optimize them for different devices. In transcoding, a single 50 GB video file is broken into chunks, the chunks are placed on a queue, workers process the chunks, and the results are merged into S3. A transcoder performs the transcoding using concurrent async workers. Videos are placed in different regions to avoid latency due to long network paths.

Chat (WhatsApp): WebSockets are used for asynchronous messaging between client and server and vice versa; the connection is kept open.

XMPP Protocol: Pushes messages to the clients, i.e. peer-to-peer communication.

Instagram: Use a distributed file system (S3 bucket) to store images and videos. Whenever storing images/videos, remember to also store the corresponding metadata.

Task Scheduling: Scheduling tasks to execute immediately is a straightforward queue approach, but executing a task at a specific time requires constant polling at periodic intervals, as in the sketch below. It is also important to clarify whether the jobs are data-heavy or processing-heavy, and whether they are lightweight or heavyweight. Tasks are submitted to a distributed message queue (Kafka) in order to handle huge request volumes (10 million jobs/day) and provide fault tolerance. A centralized architecture involves Zookeeper and worker nodes, which employ MapReduce on top of HDFS to perform the tasks.
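
A toy sketch of run-at-a-specific-time scheduling: a min-heap ordered by execution time, polled periodically (task names and times are illustrative):

    # Jobs are popped only once their scheduled time has arrived.
    import heapq
    import time

    jobs = []  # heap of (run_at_epoch_seconds, task_name)

    def schedule(task_name: str, run_at: float):
        heapq.heappush(jobs, (run_at, task_name))

    def poll_loop():
        while jobs:
            run_at, task = jobs[0]
            if run_at <= time.time():
                heapq.heappop(jobs)
                print("executing", task)
            else:
                # Periodic polling: sleep until the next job or at most 1 second.
                time.sleep(max(0.0, min(run_at - time.time(), 1.0)))

    schedule("send-report", time.time() + 2)
    poll_loop()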

A distributed architecture, in which a group of workers grabs tasks from a task queue to process, provides a fully parallel, non-blocking solution to the task-scheduling problem. If the tasks are too small or inexpensive, such a system adds more cost for parallelization and communication between the nodes. A message-queue system like Kafka or RabbitMQ (or a table in a relational database) can also be used to persist the queue.

Hadoop MapReduce is designed to solve this kind of task-scheduling problem. In MapReduce, the master node maintains a task queue (the JobTracker) for all the workers in its cluster and has a standby node that consistently replicates the JobTracker's state. When a worker finishes its work, it retrieves a new unit of work (a Map operation) from the master. The master does not assign more than one task per worker and holds onto further work until some worker becomes idle and can immediately accept it.