# Data Structures and Algorithms

## Java

### Access Modifier

C: Class; P: Package; SC: Subclass; W: World

| Modifier  | C | P | SC | W |
|-----------|---|---|----|---|
| public    | Y | Y | Y  | Y |
| protected | Y | Y | Y  | N |
| none      | Y | Y | N  | N |
| private   | Y | N | N  | N |

### Comparable

Implement Comparable<T>:

int compareTo(T o)
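A minimal sketch of implementing the interface (the `Point` class and its ordering are made up for illustration):

```java
// Hypothetical example: order points by x, breaking ties by y.
class Point implements Comparable<Point> {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public int compareTo(Point o) {
        if (x != o.x) return Integer.compare(x, o.x);
        return Integer.compare(y, o.y);
    }
}
```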


### hashCode

• If two objects are equal, hashCode must return the same result
• hashCode must return the same result when invoked on the same object more than once
int hashCode() {}
boolean equals(Object o) {}
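A sketch of an equals/hashCode pair that satisfies the contract above (the `Pair` class is invented for illustration):

```java
// Hypothetical example: equal fields => equal objects => equal hash codes.
final class Pair {
    final int a, b;
    Pair(int a, int b) { this.a = a; this.b = b; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Pair)) return false;
        Pair p = (Pair) o;
        return a == p.a && b == p.b;
    }

    @Override
    public int hashCode() {
        return 31 * a + b;   // deterministic: same object always hashes the same
    }
}
```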


### Common Issues

Array out of bounds; divide by zero; infinite loop; exception thrown but not caught; access member variable in static method

## Algorithmic Analysis

Amortized Cost
An algorithm has amortized cost $$T(n)$$ if $$\forall k \in \mathbb{Z}^+$$, the total cost of $$k$$ operations is $$\leq kT(n)$$

### Master’s Theorem

$$T(n) = aT(\frac{n}{b}) + f(n)$$ where a ≥ 1, b > 1

• $$f(n) \in O(n^c)\text{, } c < \log_ba \ \text{ then } T(n) \in \Theta(n^{\log_ba})$$
• $$f(n) \in \Theta(n^c\log^kn)\text{, } c = \log_ba \ \text{ then } T(n) \in \Theta(n^c\log^{k+1}n)$$
• $$f(n) \in \Omega(n^c), c > \log_ba \text{ and } \exists k < 1 \ \text{ st } af(\frac{n}{b})\le kf(n) \text{ then } T(n) \in \Theta(f(n))$$

#### Common Ones

• $$T(n) = T(n/2) + \Theta(1) = O(\log n)$$ (Binary Search)
• $$T(n) = 2T(n/2) + \Theta(1) = O(n)$$ (Binary Tree Traversal)
• $$T(n) = 2T(n/2) + \Theta(\log n) = O(n)$$ (Optimal Sorted Matrix Search)
• $$T(n) = 2T(n/2) + O(n) = O(n \log n)$$ (Merge Sort)

## Abstract Data Types

Bag
insert(i); draw()
Stack (LIFO)
empty(); peek(); pop(); push(i)
Queue (FIFO)
enqueue(i); dequeue()
Deque
double-ended queue; insert and remove at both ends

## Searching

• Binary Search $$O(\log n)$$
int binarySearch(int[] arr, int key) {
    int start = 0, end = arr.length - 1;
    int found = -1;
    while (start <= end) {
        int mid = start + (end - start) / 2;
        if (arr[mid] < key) {
            start = mid + 1;
        } else if (arr[mid] > key) {
            end = mid - 1;
        } else {
            found = mid;
            // if we want first instance
            end = mid - 1;
            // if we want last instance
            // start = mid + 1;
        }
    }
    return found;
}

One-sided Binary Search
Suppose only one side is bounded, e.g. $$[1, \infty)$$. Probe the sequence $$1, 2, 4, 8, 16, \dots, 2^k, \dots$$; once $$2^k$$ overshoots the target, binary search on $$[2^{k-1}, 2^k]$$
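The doubling-then-search idea can be sketched in Java (a sketch for a sorted array; names are assumptions):

```java
// One-sided (galloping) binary search: double the upper bound until it
// overshoots the key, then binary search inside [hi/2, hi].
class OneSided {
    static int search(int[] arr, int key) {
        int hi = 1;
        while (hi < arr.length && arr[hi] < key) hi *= 2;
        int lo = hi / 2;
        hi = Math.min(hi, arr.length - 1);
        while (lo <= hi) {                 // ordinary binary search
            int mid = lo + (hi - lo) / 2;
            if (arr[mid] < key) lo = mid + 1;
            else if (arr[mid] > key) hi = mid - 1;
            else return mid;
        }
        return -1;
    }
}
```

The doubling phase costs $$O(\log p)$$ probes where $$p$$ is the key's position, so the total cost depends on where the key lies, not on the array length.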
Peak Finding
A[j] in array A is a peak if (i) A[j] > A[j-1] and (ii) A[j] > A[j+1]. If there is only one item in the array, this holds vacuously
1D Peak Finding $$O(\log n)$$
D&C
if a[n/2] < a[n/2-1] look at 1..n/2-1
else if a[n/2] < a[n/2+1] look at n/2+1..n
else return a[n/2]
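The 1D divide-and-conquer above can be sketched in Java (recursive form, index names assumed):

```java
// Returns the index of some peak in a[lo..hi]: recurse toward the larger
// neighbour of the midpoint; if neither neighbour is larger, mid is a peak.
class Peak {
    static int find1D(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (mid > lo && a[mid] < a[mid - 1])      // a peak exists on the left
            return find1D(a, lo, mid - 1);
        if (mid < hi && a[mid] < a[mid + 1])      // a peak exists on the right
            return find1D(a, mid + 1, hi);
        return mid;                               // a[mid] is a peak
    }
}
```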

2D Peak Finding $$O(m + n)$$
D&C
find max in border + cross O(m+n)
if max is a peak, return it
else recurse into the quadrant containing the larger neighbour


## Sorting

Bubble Sort
Stable, In-place, W&A $$O(n^2)$$, B $$O(n)$$, S $$O(1)$$; Invariant : At iteration i , the sub-array A[1 .. i] is sorted and any element in A[i + 1 .. A . size] is greater or equal to any element in A[1 .. i]
Selection Sort
In-place, Unstable; find minimum element and swap. W,A,B $$O(n^2)$$, S $$O(1)$$; Invariant: a[0…i-1] is sorted and all entries in a[i..n-1] are larger than or equal to the entries in a[0..i-1]
Insertion Sort
In-place, Stable; W $$O(n^2)$$, B $$O(n)$$, S $$O(1)$$; Invariant: the subarray a[0..i] consists of the original first i+1 elements in sorted order
Merge Sort
Stable, not in-place; W/B $$O(n\log n)$$, S $$O(n)$$
Quick Sort
In-place, Unstable; W $$O(n^2)$$, A/B $$O(n\log n)$$ S $$O(\log n)$$
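As a concrete example, insertion sort's invariant and stability can be seen in a short Java sketch:

```java
// Insertion sort: after iteration i, a[0..i] holds the original first
// i+1 elements in sorted order. Equal keys never move past each other,
// so the sort is stable.
class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) {   // shift larger elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}
```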

## Geometric Algorithms

### Jarvis March $$O(hn)$$

1. Find somewhere to start, e.g. y-min coordinate
2. Add point with maximum angle from horizon $$O(n)$$
3. Keep adding points with maximum angle from previous

### Divide-and-Conquer Convex Hull $$O(n\log n)$$

1. Divide into two equal size sets (along vertical line)
2. Recursively find convex hulls (base case 3 points)
3. Merge convex hulls
1. Find upper tangent lines
1. while $$(u,v,w)$$ clockwise, decrement $$v$$
2. while $$(v,w,z)$$ clockwise, increment $$w$$
2. Find lower tangent lines
1. while $$(w,v,u)$$ clockwise, increment $$v$$
2. while $$(z,v,u)$$ clockwise, decrement $$w$$

### Quick Hull $$O(n \log n)$$

1. Choose pivot, construct two subproblems, delete interior points
2. recurse on subproblems

## Trees

### Binary Trees (height h)

$$h(v) = \max\left(h(v.left), h(v.right)\right) + 1$$

• BST: left ST < key < right ST
• traversal $$O(n)$$ IN:LSR, PRE:SLR, POST:LRS
• insert, search, findMax, findMin: $$O(h)$$
• successor $$O(h)$$:
• if hasRightChild, smallest node in right sub-tree
• else, first parent node that is left child (parent of node is successor)
• delete $$O(h)$$: switch numChild
• 0: remove v
• 1: remove v, connect child(v) to parent(v)
• 2: swap with successor(v), remove(v)
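The successor rules above can be sketched on a BST whose nodes carry parent pointers (the `Node` class here is an assumption, not a library type):

```java
// successor(v): smallest node in v's right subtree if it exists;
// otherwise the first ancestor of which v lies in the left subtree.
class Node {
    int key;
    Node left, right, parent;

    static Node successor(Node v) {
        if (v.right != null) {
            Node w = v.right;
            while (w.left != null) w = w.left;
            return w;
        }
        Node p = v.parent;
        while (p != null && v == p.right) {  // climb while we are a right child
            v = p;
            p = p.parent;
        }
        return p;                            // null if v was the maximum
    }
}
```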

### AVL Trees (height $$h = O(\log n)$$)

• Property: Every node is height-balanced
• $$\lvert v.left.height - v.right.height \rvert \le 1$$
• insert $$O(\log n)$$:
• insert key in BST
• walk up, perform max 2 rotations if out-of-balance
• delete(v): ($$\log n$$ rotations)
• If v has 2 children, swap with successor
• delete v, and reconnect children
• for every ancestor of deleted node
• rotate if out-of-balance
• Splay Trees: Rotate nodes that are accessed to root. consider using where operations are non-random.

### Augmented Trees

#### Rank Tree (Order Statistics)

• store weight of tree in each node:
• $$w(v) = w(v.left) + w(v.right) + 1$$
• select(k) $$O(\log n)$$: finds node with rank $$k$$
rank = v.left.weight + 1
if (k == rank)
    return v
else if (k < rank)
    return v.left.select(k)
else return v.right.select(k - rank)

• rank(v) $$O(\log n)$$: computes rank of node v
rank = v.left.weight + 1
while (v != null) do
    if v is a left child, do nothing
    if v is a right child,
        rank += v.parent.left.weight + 1
    v = v.parent


#### Interval Trees

• Each node is an interval $$(m, n), m \le n$$
• Sort by $$m$$, augment node with maximum $$n$$ of children in each node
• search(x) $$O(\log n)$$:
if x in c
return c
else if c has no left child
search in right subtree
else if x > max endpoint in c.left
search in right subtree
else search in left subtree

• findAll(x) $$O(k \log n)$$ for k overlapping intervals
search(x)
store it somewhere else
remove interval
repeat until no intervals found


#### Orthogonal Range Searching

• 1D

1. use a binary search tree
2. store all points in the leaves of the tree, internal nodes store only copies
3. each internal node v stores the max of any leaf in the left subtree
4. Query Time: $$O(k + \log n)$$
5. Building Tree: $$O(n \log n)$$
• k-dim Tree

1. each node in the x-tree has a set of points in its subtree
2. store the y-tree at each x-node containing all points
3. Query Time: $$O(k + \log^d n)$$
4. Building Tree: $$O(n \log^{d-1}n)$$
5. Space: $$O(n \log^{d-1}n)$$

#### Custom Augmentations

• Average height of people taller: augment each node with the count of nodes in its sub-tree and the sum of heights in its sub-tree. To answer a query, first look up the name in a hash table to find its node v; then sum the heights of v's right sub-tree, plus, for every ancestor w where v lies in w's left sub-tree, w itself and w's right sub-tree.

## Hash Tables

• n: #items, m: #buckets
• Simple Uniform Hashing: Keys are equally likely to map to every bucket, and are mapped independently
• $$load(ht) = \frac{n}{m}$$
• $$E_\text{search} = 1 + \frac{n}{m}$$
• Assume $$m=\Omega(n)$$, $$E_\text{search} = O(1)$$

### Hash Functions

#### Division

• $$h(k) = k \text{ mod } m$$, choose m prime

#### Multiplication

• fix table size: $$m=2^r$$, for some $$r$$
• fix word size: $$w$$, size of key in bits
• fix odd constant $$A$$, $$2^{w-1} < A < 2^w$$
• $$h(k) = ((Ak) \bmod 2^w) \gg (w - r)$$

#### Rolling Hash

• When key changes by single character
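A sketch of a polynomial rolling hash over a sliding window (Rabin-Karp style; the constants `BASE` and `MOD` are arbitrary choices for illustration):

```java
// Rolling hash: updating the hash of s[i..i+w-1] to the hash of
// s[i+1..i+w] costs O(1) instead of rehashing the whole window.
class RollingHash {
    static final long BASE = 31, MOD = 1_000_000_007L;

    static long hash(String s) {
        long h = 0;
        for (char c : s.toCharArray()) h = (h * BASE + c) % MOD;
        return h;
    }

    // basePow = BASE^(w-1) mod MOD for window length w
    static long roll(long h, char out, char in, long basePow) {
        h = (h - out * basePow % MOD + MOD) % MOD;  // drop leftmost char
        return (h * BASE + in) % MOD;               // append new char
    }
}
```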

### Chaining

• bucket stores linked list, containing (object, value)
• Worst insert $$O(1 + cost(h))$$
• Expected search = $$1 + \frac{n}{m} = O(1)$$
• Worst search $$O(n)$$

### Open Addressing

• One item per slot, probe a sequence of buckets until an empty one is found
• $$h(key, i) : U \mapsto \{1..m\}$$, $$i$$ is no. of collisions
• search: keep probing until empty bucket, or exhausted entire table
• delete: set key to tombstone value, so probe sequence still works
• insert: on deleted cell, overwrite, else find next available slot
• good hash function:
1. $$h(key, i)$$ enumerates all possible buckets
2. Simple Uniform Hashing
• Linear: $$h(k,i) = (h(k) + i) \bmod m$$, suffers from clustering
• Double: $$h(k,i) = (f(k) + i \cdot g(k)) \bmod m$$
• Insert, Search: $$\frac{1}{1-\alpha}$$ where $$\alpha = \frac{n}{m} \le 1$$
• good: saves space, rare mem alloc, better cache perf

### Cuckoo Hashing

• Resolving hash collisions with worst-case constant lookup time
• Lookup: inspection of just two locations in the hash table
• Insertion: Insert into first table if empty; else kick out other key to second location.
• If infinite loop, hash function is rebuilt in place

### Table resizing

• Scan old table $$O(m_1)$$, create new table $$O(m_2)$$, insert each element $$O(1)$$, total $$O(m_1 + m_2 + n)$$
• Amortized $$O(1)$$ per operation: if $$n = m$$, set $$m = 2m$$; if $$n < \frac{m}{4}$$, set $$m = \frac{m}{2}$$

### Fingerprint Hash Table (FHT)

• Vector of 0/1 bits
• no false negatives, but has false positives. $$P_{\text{no FP}} = \left(\frac{1}{e}\right)^{n/m}$$

### Bloom Filter

• use $$k$$ hash functions. More space per item, but requires $$k$$ collisions for a false positive.
• $$P_{\text{coll}} = \left(1- e^{-kn/m}\right)^k$$
• Two hash functions, $$h(k)$$ and $$t(k)$$, two tables $$T_1$$ and $$T_2$$
• insert: $$T_1[h(k)] = 1$$, $$T_2[t(k)] = 1$$
• search: if $$T_1[h(k)]$$ and $$T_2[t(k)]$$ both 1 return true
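A sketch of a Bloom filter with k probe positions derived from two base hashes (the double-hashing trick); sizes and hash choices here are illustrative assumptions:

```java
import java.util.BitSet;

// Bloom filter: k bits set per key; membership query may give false
// positives but never false negatives.
class Bloom {
    final BitSet bits;
    final int m, k;

    Bloom(int m, int k) { this.bits = new BitSet(m); this.m = m; this.k = k; }

    private int slot(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) | 1;   // force odd second hash
        return Math.floorMod(h1 + i * h2, m);
    }

    void insert(String key) {
        for (int i = 0; i < k; i++) bits.set(slot(key, i));
    }

    boolean mayContain(String key) {
        for (int i = 0; i < k; i++)
            if (!bits.get(slot(key, i))) return false;  // definitely absent
        return true;                                    // possibly present
    }
}
```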

## Graphs

| Type | Space      | check (v,w) | any neighbour | all neighbours |
|------|------------|-------------|---------------|----------------|
| List | $$O(V+E)$$ | slow        | fast          | fast           |
| Mat  | $$O(V^2)$$ | fast        | slow          | slow           |
• BFS/DFS do not explore all paths

#### BFS $$O(V+E)$$

bfs(root)
    Q.enqueue(root)
    mark root visited
    while Q is not empty:
        current = Q.dequeue()
        visit(current)
        for each node n adj to current
            if n not visited
                mark n visited
                n.parent = current
                Q.enqueue(n)


#### DFS $$O(V+E)$$

• Same as BFS, but use stack instead of queue

#### Topological Sort (DAG)

• Post-order DFS
• Kahn’s Algorithm (first append all nodes with no incoming edges to result set, remove edges connected to these nodes and repeat, also O(V+E))
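Kahn's algorithm can be sketched in Java on an adjacency list (the graph representation here is an assumption):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Kahn's algorithm: repeatedly remove nodes with indegree 0; decrementing
// a neighbour's indegree stands in for deleting the outgoing edge.
class Kahn {
    static List<Integer> toposort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indeg = new int[n];
        for (List<Integer> nbrs : adj)
            for (int v : nbrs) indeg[v]++;

        Queue<Integer> q = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (indeg[v] == 0) q.add(v);

        List<Integer> order = new ArrayList<>();
        while (!q.isEmpty()) {
            int u = q.remove();
            order.add(u);
            for (int v : adj.get(u))
                if (--indeg[v] == 0) q.add(v);
        }
        return order;   // fewer than n entries => the graph had a cycle
    }
}
```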

### SSSP

#### Bellman-Ford $$O(EV)$$

• $$O(V^3)$$ if using Adj Matrix
do V-1 times
    for (Edge e : graph)
        relax(e)

• can terminate early if no improvement
• can detect negative cycle: relax all edges $$V-1$$ times, then once more; if any distance still improves, there is a negative cycle
• if all weights are the same, use BFS

#### Dijkstra $$O(E\log V)$$


• Doesn’t work with negative edge weights
• can terminate once end is found
dist[i] = INF for all i
dist[start] = 0
add start to PQ
while PQ not empty
    w = pq.dequeue()
    for each edge (w, v)
        if dist[w] + weight(w, v) < dist[v]
            update dist[v]
            update pq entry for v  // O(log n)


#### DAG

• Toposort, relax in order
• SSSP on DAG: run topo sort, and relax edges in that order in $$O(V+E)$$
• Single Source Longest Path problem is easy on DAG: multiply edge weights by -1 and run SSSP

### Heap

• implements priority queue, is a complete binary tree
• priority of parent > priority of child
• insert: create new leaf, bubbleUp
• decreaseKey: update priority, bubbleDown
• delete: swap with leaf, delete, and then bubble
• store in array:
• $$left(x) = 2x + 1$$
• $$right(x) = 2x + 2$$
• $$parent(x) = \lfloor(x-1)/2\rfloor$$
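The array layout and bubble operations above can be sketched as a small max-heap (a fixed-capacity sketch; the array size is an arbitrary assumption):

```java
// Max-heap in an array: left(x) = 2x+1, right(x) = 2x+2, parent(x) = (x-1)/2.
class MaxHeap {
    int[] a = new int[64];   // illustrative fixed capacity
    int size = 0;

    void insert(int x) {                 // new leaf, then bubble up
        a[size] = x;
        int i = size++;
        while (i > 0 && a[(i - 1) / 2] < a[i]) {
            int t = a[i]; a[i] = a[(i - 1) / 2]; a[(i - 1) / 2] = t;
            i = (i - 1) / 2;
        }
    }

    int extractMax() {                   // swap root with last leaf, bubble down
        int max = a[0];
        a[0] = a[--size];
        int i = 0;
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, big = i;
            if (l < size && a[l] > a[big]) big = l;
            if (r < size && a[r] > a[big]) big = r;
            if (big == i) return max;
            int t = a[i]; a[i] = a[big]; a[big] = t;
            i = big;
        }
    }
}
```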

#### Heap Sort

1. Heapify (insert n items) $$O(n \log n)$$
2. Extract from heap n times $$O(n \log n)$$
3. Improvement: recursively join 2 heaps and bubble root down (base case single node) $$O(n)$$
4. $$O(n \log n)$$ worst case, deterministic, in-place

#### UFDS (weighted)

• union(p,q) $$O(\log n)$$
• find parent of p and q
• make root of smaller tree root of larger tree
• find(k) $$O(\log n)$$

• search up the tree, return the root
• (PC): update all traversed nodes parent to root
• WU with PC, union and find $$O(\alpha(m,n))$$
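Weighted union with path compression can be sketched in Java (union by size; names assumed):

```java
// UFDS: union by size plus path compression. Each find flattens the
// path it traverses, keeping future operations near-constant.
class UFDS {
    int[] parent, size;

    UFDS(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    int find(int k) {
        if (parent[k] != k)
            parent[k] = find(parent[k]);   // path compression
        return parent[k];
    }

    void union(int p, int q) {
        int a = find(p), b = find(q);
        if (a == b) return;
        if (size[a] < size[b]) { int t = a; a = b; b = t; }
        parent[b] = a;                     // smaller tree under larger root
        size[a] += size[b];
    }
}
```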

### MST

• acyclic subset of edges that connects all nodes, and has minimum weight

#### Properties

1. Cutting edge in MST results in 2 MSTs
2. Cycle Property: $$\forall$$ cycle, max weight edge is not in MST
3. Cut Property: $$\forall$$ partitions, min weight edge across cut is in MST

#### Prim’s $$O(E \log V)$$

• Uses cut property
T = {start}
enqueue start's edges in PQ
while PQ not empty
    e = PQ.dequeue()
    if (vertex v linked with e not in T)
        T = T U {v, e}
        enqueue v's edges in PQ
    else
        ignore edge
MST = T


#### Kruskal’s $$O(E\log V)$$

• Uses UFDS
• It is possible that some edge in the first $$V-1$$ edges will form a cycle with pre-existing MST solution
Sort E edges by increasing weight
T = {}
for (i = 0; i < edgeList.length; i++)
    e = edgeList[i]
    if adding e does not form a cycle
        T = T U {e}
    else ignore e
MST = T


#### Boruvka’s $$O(E\log V)$$

T = { one-vertex trees }
while T has more than one component:
    for each component C of T:
        begin with an empty set of edges S
        for each vertex v in C:
            find the cheapest edge from v
            to a vertex outside of C, add it to S
        add the cheapest edge in S to T
    combine trees connected by the new edges
MST = T


#### Variants

1. Same weight: BFS/DFS $$O(E)$$
2. Edges have weight $$1..k$$:
• Kruskal’s
• Bucket sort Edges $$O(E)$$
• Union/check $$O(\alpha (V))$$
• Total cost: $$O(\alpha(V)E)$$
• Prim’s
• Use array of size k as PQ, each slot holds linked list of nodes
• insert/remove nodes $$O(V)$$
• decreaseKey $$O(E)$$
3. Directed MST
• $$\forall$$ node except root, add minimum incoming edge $$O(E)$$
4. MaxST
• negate all weights, run MST algo

#### MST Problems

• How do I add an edge (A,B) of weight k into graph G and find MST quickly?

• Use cycle property; max edge in any cycle is not in MST
• only add (A,B) if k is not the max weight edge
• O(V + E) time to find max edge along A → B with DFS
• Given an undirected graph with $$K$$ power plants, find the minimum cost to connect all other sites.

• run Prim’s, use super source
• weight of new edges are zero
• this is a single MST
• How do I make Kruskal run faster when sorting?

• Store edges in separate linked lists
• To process edges in increasing weight, process all edges in one linked list then the next
• Time: $$O(E)$$ or $$O(E\alpha(m, n))$$
• Space: $$O(E)$$, need to store all $$E$$ edges
• Minimum Bottleneck Spanning Tree (MBST)

• General idea: If I use some edge e that is not in the MST to replace some edge e’ in the MST, then my max. edge is max (max edge on original MST, e).
• Intuitively, my MST would then fulfill the condition of MBST.
• Note: Every MST is an MBST, but not every MBST is an MST
• Find maximum distance between 2 vertices in MST

• Bruteforce: perform DFS starting from every single location since there is only one path from any node to another
• DFS: $$O(V+E)$$, doing it $$V$$ times, $$O(V(V+E)) =O(V^2)$$ since $$E = V - 1$$
• Space: $$O(V)$$, need to store all the edges in MST

### Floyd-Warshall (APSP)

• Shortest paths have optimal substructure
• Shortest paths have overlapping subproblems
• Idea: gradually allow usage of intermediate vertices
• Invariant: At step k, shortest path via nodes 0 to k are correct
// precondition: A[i][j] contains weight
// of edge (i,j), or inf if no edge
int[][] APSP(int[][] A) {
    // len = # vertices
    // clone A into S
    for (int k = 0; k < len; k++)
        for (int i = 0; i < len; i++)
            for (int j = 0; j < len; j++)
                S[i][j] = Math.min(S[i][j],
                                   S[i][k] + S[k][j]);
    return S;
}


### Network Flow

k-edge connected
Source and target are k-edge connected if there are k edge-disjoint paths (paths that share no edges) from source to target.
Max flow
Equals the capacity of the minimum st-cut (count edges leaving the source side; ignore edges entering it).
Min cut
Let $$S$$ be the nodes reachable from the source in the residual graph and $$T$$ all other nodes; the edges from S to T form a minimum cut.
Augmenting Path
A path in the residual graph from s to t that has no 0-weight edges.

#### Ford-Fulkerson

1. Initialize flow to 0
2. While there exists an augmenting path:
• find an augmenting path
• compute bottleneck (min edge)
• increase flow on the path by bottleneck capacity

Time Complexity:

• DFS: $$O(|F|E)$$
• BFS(Edmonds-Karp, shortest augmenting path): $$O(VE^2)$$
• Dinitz: $$O(V^2E)$$

### Graph Algorithms on Trees

#### Check if connected graph is tree

Run DFS, stopping after traversing $$V-1$$ edges. Return true if all nodes were visited and no unused edge remains, false otherwise. $$O(V)$$

#### Min Vertex Cover

• Idea: transform tree into DAG, run DP
• only two possibilities for each vertex: taken or not
int MVC(int v, int flag) {
    if (memo[v][flag] != -1)
        return memo[v][flag];
    int ans = 0;
    if (leaf[v])            // if v is a leaf
        ans = flag;
    else if (flag == 0) {   // v not taken: every child must be taken
        for (each child c of v)
            ans += MVC(c, 1);
    } else {                // v taken: children are free
        ans = 1;
        for (each child c of v)
            ans += min(MVC(c, 0), MVC(c, 1));
    }
    return memo[v][flag] = ans;
}


#### SSSP

• On a weighted tree, any graph traversal algorithm (eg. DFS, BFS) can obtain the shortest path to any vertex in $$O(V)$$
• Weight of shortest path between two vertices is the sum of the weights of edges on the unique path

#### APSP

• Run SSSP on V vertices in total $$O(V^2)$$, compared to $$O(V^3)$$ FW algorithm

#### Diameter

• Originally, run FW in $$O(V^3)$$ and do an $$O(V^2)$$ all-pairs check, totalling $$O(V^3)$$.
• Now, only need 2 $$O(V)$$ traversals: DFS/BFS from any vertex $$s$$ to find the furthest vertex $$x$$. Then do a DFS/BFS one more time from vertex $$x$$ to find furthest vertex $$y$$. Length of unique path along x to y is the diameter of the tree.
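The double-BFS trick can be sketched in Java on an unweighted tree given as an adjacency list (the representation is an assumption):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Diameter of a tree: BFS from any vertex to find the furthest vertex x,
// then BFS from x; the furthest distance from x is the diameter.
class Diameter {
    // returns {furthest vertex from s, its distance}
    static int[] bfsFar(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        java.util.Arrays.fill(dist, -1);
        dist[s] = 0;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        int far = s;
        while (!q.isEmpty()) {
            int u = q.remove();
            if (dist[u] > dist[far]) far = u;
            for (int v : adj.get(u))
                if (dist[v] == -1) { dist[v] = dist[u] + 1; q.add(v); }
        }
        return new int[]{far, dist[far]};
    }

    static int diameter(List<List<Integer>> adj) {
        int x = bfsFar(adj, 0)[0];    // furthest vertex from an arbitrary root
        return bfsFar(adj, x)[1];     // furthest distance from x
    }
}
```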

### Graph Modelling Techniques

1. minimum shortest path from many source to one destination: run SSSP treating destination as source.
2. multiple sources to multiple destinations: consider super source and super sink, with edge weight 0, and run Dijkstra (if no negative edge weights), BF otherwise.
3. Attempt to convert graph into a DAG and use DP techniques. Example: attaching a variable to a vertex that is monotonically decreasing
4. Shortest path between X and Y that passes through node $$A$$: Compute two shortest paths; X to A, A to Y, and join the paths.

## Parallel Algorithms

### Parallel Fibonacci

parallelFib(n) {
if(n < 2) then
return n;
x = spawn parallelFib(n - 1);
y = spawn parallelFib(n - 2);
sync;
return x + y;
}

• Critical Path: $$T_\infty$$, Parallelism $$= T_1/T_\infty$$
• $$T_\infty(n) = \max(T_\infty(n - 1), T_\infty(n - 2)) + O(1) = O(n)$$
• $$T_p \ge T_1/p$$
• $$T_p \ge T_\infty$$, cannot beat the critical path with more processors
• Goal: $$T_p = (T_1/p) + T_\infty$$, $$T_1/p$$ is the parallel part, $$T_\infty$$ is the sequential part

Matrix addition, before parallelizing:

• Work analysis: $$T_1(n) = O(n^2)$$
• Critical path analysis: $$T_\infty(n) = O(n^2)$$

After:

pMatAdd(A,B,C,i,j,n)
    if (n == 1)
        C[i,j] = A[i,j] + B[i,j];
    else
        spawn pMatAdd(A,B,C,i,       j,       n/2);
        spawn pMatAdd(A,B,C,i + n/2, j,       n/2);
        spawn pMatAdd(A,B,C,i,       j + n/2, n/2);
        spawn pMatAdd(A,B,C,i + n/2, j + n/2, n/2);
        sync;

• Work Analysis: $$T_1(n) = 4T_1(n/2) + O(1) = O(n^2)$$
• Critical Path Analysis: $$T_\infty(n) = T_\infty(n/2) + O(1) = O(\log n)$$

### Parallelized Merge Sort $$O(\log^3n)$$

pMerge(A[1..k], B[1..m], C[1..n])
    if (m > k) then pMerge(B, A, C);
    else if (n == 1) then C[1] = A[1];
    else if (k == 1) and (m == 1) then
        if (A[1] <= B[1]) then
            C[1] = A[1]; C[2] = B[1];
        else
            C[1] = B[1]; C[2] = A[1];
    else
        // binary search for j where
        // B[j] <= A[k/2] <= B[j+1]
        spawn pMerge(A[1..k/2],
                     B[1..j],
                     C[1..k/2+j])
        spawn pMerge(A[k/2+1..k],
                     B[j+1..m],
                     C[k/2+j+1..n])
        sync;

pMergeSort(A, n)
    if (n == 1) then return;
    else
        X = spawn pMergeSort(A[1..n/2], n/2)
        Y = spawn pMergeSort(A[n/2+1..n], n/2)
        sync;
        pMerge(X, Y, A);

