System Design

tags: Software Engineering, Coding Interview Cheatsheet

Basics

SQL, noSQL
Concurrency
- Threads, deadlock, starvation
- read/write locks
Networking
- Routers/Switches
- TCP vs UDP
File Systems
- OS, file system, database
- levels of caching in modern OS

Terminologies

Replication

Replication refers to frequently copying the data across multiple machines. Post replication, multiple copies of the data exists across machines. This might help in case one or more of the machines die due to some failure.

Consistency

Assuming you have a storage system which has more than one machine, consistency implies that the data is same across the cluster, so you can read or write to/from any node and get the same data. Eventual consistency : Exactly what the name suggests. In a cluster, if multiple machines store the same data, an eventual consistent model implies that all machines will have the same data eventually. Its possible that at a given instance, those machines have different versions of the same data ( temporarily inconsistent ) but they will eventually reach a state where they have the same data.

Availability

In the context of a database cluster, Availability refers to the ability to always respond to queries ( read or write ) irrespective of nodes going down.

Partition Tolerance

In the context of a database cluster, cluster continues to function even if there is a “partition” (communications break) between two nodes (both nodes are up, but can’t communicate).

Vertical scaling and Horizontal scaling

In simple terms, to scale horizontally is adding more servers. To scale vertically is to increase the resources of the server ( RAM, CPU, storage, etc. ). Example: Lets say you own a restaurant which is now exceeding its seating capacity. One way of accomodating more people ( scaling ) would be to add more and more chairs (scaling vertically). However since the space is limited, you won’t be able to add more chairs once the space is full. Another way of scaling would be to open new branches of the restaurant ( horizontal scaling ). Source : http://stackoverflow.com/questions/5401992/what-does-scale-horizontally-and-scale-vertically-mean

Sharding

With most huge systems, data does not fit on a single machine. In such cases, sharding refers to splitting the very large database into smaller, faster and more manageable parts called data shards.

CAP Theorem

It is impossible to simultaneously guarantee the following:

Consistency
Availability
Partition Tolerance

Approaching System Design Questions

Feature Clarification (2 minutes)
Estimations
1. Does the data fit in one machine/database?
2. Can the cache fit on one machine/database?
Design Goals
1. Can data loss be tolerated?
Skeleton of design
1. Discuss high-level components; go into deep dive only on request

Caching

Types

Write-through - write to both cache and db at the same time, before confirming write completion
- Write latency is higher
- but re-reading writes and reads is fast
Write-around - write to db, missing cache
- cache must fetch reads from db on first try
- higher read latency
Write-back - I/O completion sent when data written to cache
- cache writes to db
- might lose data
- allieviated with replicates

Implementing LRU caching

Hashtable + Doubly-linked list Key -> Pointer to node in doubly linked list Each time it is accessed, move node to head of doubly-linked list Evicting keys: If adding new item, and it is full, remove tail of list,

SAS/SSD

used for I/O over SATA (7.5krpm)

Implementing TinyURL

Feature Clarification

Shorten a URL
Expand a slug into a URL
Allow users to pick a custom URL

Data Estimation

Assume tinyURL load, 100M new writes per month
Then, in 5 years, 6B writes.
To handle 6B slugs, assuming we’re using [A-z][a-z][0-9] 62^k > 6*10^9
slugs need just 6 characters, 6 bytes.
1. Slugs will take up 36GB.
Assume 500 bytes for a URL, URLs will take up 3TB.
1. It is reasonable to store all of this on a single machine.
2. But large amounts of reads and writes going to one machine can cause deadlock
3. Master-slave replication

Design Goals

Latency	Consistency	Availability
Yes	Yes	C > A

Design API

shortenURL(url)
expandURL(hash)

Computing the Hash

convert_to_base_62(md5(url + salt))[:6]

Stateless application servers

load balancers ensure application is available when a server dies, and client knows which server to talk to

Implementing Search

Implementing a distributed key-store

Data can’t fit onto one machine
So now the choice is between consistency and availability
Perform some estimations

Jethro's Braindump