System Design
Basics
- SQL, noSQL
- Concurrency
- Threads, deadlock, starvation
- read/write locks
- Networking
- Routers/Switches
- TCP vs UDP
- File Systems
- OS, file system, database
- levels of caching in modern OS
Terminologies
Replication
Replication refers to frequently copying the data across multiple machines. Post replication, multiple copies of the data exists across machines. This might help in case one or more of the machines die due to some failure.
Consistency
Assuming you have a storage system which has more than one machine, consistency implies that the data is same across the cluster, so you can read or write to/from any node and get the same data. Eventual consistency : Exactly what the name suggests. In a cluster, if multiple machines store the same data, an eventual consistent model implies that all machines will have the same data eventually. Its possible that at a given instance, those machines have different versions of the same data ( temporarily inconsistent ) but they will eventually reach a state where they have the same data.
Availability
In the context of a database cluster, Availability refers to the ability to always respond to queries ( read or write ) irrespective of nodes going down.
Partition Tolerance
In the context of a database cluster, cluster continues to function even if there is a “partition” (communications break) between two nodes (both nodes are up, but can’t communicate).
Vertical scaling and Horizontal scaling
In simple terms, to scale horizontally is adding more servers. To scale vertically is to increase the resources of the server ( RAM, CPU, storage, etc. ). Example: Lets say you own a restaurant which is now exceeding its seating capacity. One way of accomodating more people ( scaling ) would be to add more and more chairs (scaling vertically). However since the space is limited, you won’t be able to add more chairs once the space is full. Another way of scaling would be to open new branches of the restaurant ( horizontal scaling ). Source : http://stackoverflow.com/questions/5401992/what-does-scale-horizontally-and-scale-vertically-mean
Sharding
With most huge systems, data does not fit on a single machine. In such cases, sharding refers to splitting the very large database into smaller, faster and more manageable parts called data shards.
CAP Theorem
It is impossible to simultaneously guarantee the following:
- Consistency
- Availability
- Partition Tolerance
Approaching System Design Questions
- Feature Clarification (2 minutes)
- Estimations
- Does the data fit in one machine/database?
- Can the cache fit on one machine/database?
- Design Goals
- Can data loss be tolerated?
- Skeleton of design
- Discuss high-level components; go into deep dive only on request
Caching
Types
- Write-through - write to both cache and db at the same time,
before confirming write completion
- Write latency is higher
- but re-reading writes and reads is fast
- Write-around - write to db, missing cache
- cache must fetch reads from db on first try
- higher read latency
- Write-back - I/O completion sent when data written to cache
- cache writes to db
- might lose data
- allieviated with replicates
Implementing LRU caching
Hashtable + Doubly-linked list Key -> Pointer to node in doubly linked list Each time it is accessed, move node to head of doubly-linked list Evicting keys: If adding new item, and it is full, remove tail of list,
SAS/SSD
- used for I/O over SATA (7.5krpm)
Implementing TinyURL
Feature Clarification
- Shorten a URL
- Expand a slug into a URL
- Allow users to pick a custom URL
Data Estimation
- Assume tinyURL load, 100M new writes per month
- Then, in 5 years, 6B writes.
- To handle 6B slugs, assuming we’re using [A-z][a-z][0-9] 62^k > 6*10^9
- slugs need just 6 characters, 6 bytes.
- Slugs will take up 36GB.
- Assume 500 bytes for a URL, URLs will take up 3TB.
- It is reasonable to store all of this on a single machine.
- But large amounts of reads and writes going to one machine can cause deadlock
- Master-slave replication
Design Goals
Latency | Consistency | Availability |
---|---|---|
Yes | Yes | C > A |
Design API
- shortenURL(url)
- expandURL(hash)
Computing the Hash
convert_to_base_62(md5(url + salt))[:6]
Stateless application servers
load balancers ensure application is available when a server dies, and client knows which server to talk to
Implementing Search
Implementing a distributed key-store
- Data can’t fit onto one machine
- So now the choice is between consistency and availability
- Perform some estimations