# Data Structures and Algorithms

## Java

### Access Modifier

C: Class; P: Package; SC: Subclass; W: World

| Modifier  | C | P | SC | W |
|-----------|---|---|----|---|
| public    | Y | Y | Y  | Y |
| protected | Y | Y | Y  | N |
| none      | Y | Y | N  | N |
| private   | Y | N | N  | N |

### Comparable

Implement `Comparable<T>`:

```java
int compareTo(T o) // negative if this < o, zero if equal, positive if this > o
```
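A minimal sketch of the contract, using a hypothetical `Version` class of my own (not from the original notes):

```java
// Orders Version objects by major, then minor component.
final class Version implements Comparable<Version> {
    final int major, minor;
    Version(int major, int minor) { this.major = major; this.minor = minor; }
    @Override public int compareTo(Version o) {
        if (major != o.major) return Integer.compare(major, o.major);
        return Integer.compare(minor, o.minor); // negative, zero, or positive
    }
}
```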


### hashCode

• If two objects are equal (`equals` returns true), hashCode must return the same result for both
• hashCode must return the same result when invoked on the same object more than once

```java
int hashCode()
boolean equals(Object o)
```
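A sketch of a consistent `equals`/`hashCode` pair, using a hypothetical `Point` class of my own: equal objects hash alike, and repeated calls return the same value.

```java
import java.util.Objects;

final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }
    // Derived from the same fields as equals, so equal objects hash alike
    @Override public int hashCode() { return Objects.hash(x, y); }
}
```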


### Common Issues

Array out of bounds; divide by zero; infinite loop; exception thrown but not caught; access member variable in static method

## Algorithmic Analysis

Amortized Cost
An algorithm has amortized cost $$T(n)$$ if $$\forall k \in \mathbb{Z}^+$$, the total cost of any sequence of $$k$$ operations is $$\leq kT(n)$$

### Master’s Theorem

$$T(n) = aT(\frac{n}{b}) + f(n)$$ where a ≥ 1, b > 1

• $$f(n) \in O(n^c)$$ with $$c < \log_ba$$, then $$T(n) \in \Theta(n^{\log_ba})$$

• $$f(n) \in \Theta(n^c\log^kn)$$ with $$c = \log_ba$$, then $$T(n) \in \Theta(n^c\log^{k+1}n)$$

• $$f(n) \in \Omega(n^c)$$ with $$c > \log_ba$$, and $$\exists k < 1 \ \text{ st } af(\frac{n}{b})\le kf(n)$$ for large $$n$$, then $$T(n) \in \Theta(f(n))$$

• Common Ones

-   $$T(n) = T(n/2) + \Theta(1) = O(\log n)$$ (Binary Search)
-   $$T(n) = 2T(n/2) + \Theta(1) = O(n)$$ (Binary Tree Traversal)
-   $$T(n) = 2T(n/2) + \Theta(\log n) = O(n)$$ (Optimal Sorted Matrix Search)
-   $$T(n) = 2T(n/2) + O(n) = O(n \log n)$$ (Merge Sort)
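As a worked check, here is Merge Sort's recurrence pushed through case 2 of the theorem (my arithmetic, not part of the original notes):

$$T(n) = 2T(n/2) + \Theta(n), \quad a = 2,\ b = 2,\ f(n) = \Theta(n^1)$$

$$\log_b a = \log_2 2 = 1 = c \implies \text{case 2 with } k = 0$$

$$T(n) \in \Theta(n^1 \log^{0+1} n) = \Theta(n \log n)$$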


## Abstract Data Types

Bag
insert(i); draw()
Stack (LIFO)
empty(); peek(); pop(); push(i)
Queue (FIFO)
enqueue(i); dequeue()
Deque
double-ended queue; insert/remove at both ends

## Searching

• Binary Search $$O(\log n)$$

```java
int binarySearch(int[] arr, int key) {
    int start = 0, end = arr.length - 1;
    int found = -1;
    while (start <= end) {
        int mid = start + (end - start) / 2;
        if (arr[mid] < key) {
            start = mid + 1;
        } else if (arr[mid] > key) {
            end = mid - 1;
        } else {
            found = mid;
            // if we want first instance
            end = mid - 1;
            // if we want last instance
            // start = mid + 1;
        }
    }
    return found;
}
```

One-sided Binary Search
Suppose only one side is bounded, e.g. [1, ∞). Probe the sequence $$[1, 2, 4, 8, 16, \ldots, 2^k, \ldots]$$. Once the answer is bracketed at $$2^k$$, binary search on $$[2^{k-1}, 2^k]$$
Peak Finding
A[j] in array A is a peak if (i) A[j] > A[j-1] and (ii) A[j] > A[j+1]. If there is only one item in the array, this is vacuously true
1D Peak Finding $$O(\log n)$$
D&C

```text
if a[n/2] < a[n/2-1] look at 1..n/2-1
else if a[n/2] < a[n/2+1] look at n/2+1..n
else return a[n/2]
```
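The 1D divide-and-conquer above can be sketched iteratively in Java (class and method names are mine; returns the index of a peak):

```java
class PeakFinding {
    // Halve the search range toward the larger neighbour; a peak must
    // exist on that side, so the loop converges to one in O(log n).
    static int findPeak(int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < a[mid + 1]) lo = mid + 1; // a peak lies to the right
            else hi = mid;                         // peak at mid or to the left
        }
        return lo;
    }
}
```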

2D Peak Finding $$O(m + n)$$
D&C

```text
find max in border + cross O(m+n)
if max is peak return
else go into quadrant with higher number
```


## Sorting

Bubble Sort
Stable, In-place; W&A $$O(n^2)$$, B $$O(n)$$, S $$O(1)$$; Invariant: at iteration i, the sub-array A[1..i] is sorted and every element in A[i+1..A.size] is greater than or equal to every element in A[1..i]
Selection Sort
In-place, Unstable; find the minimum element and swap. W/A/B $$O(n^2)$$, S $$O(1)$$; Invariant: a[0..i-1] is sorted and all entries in a[i..n-1] are larger than or equal to the entries in a[0..i-1]
Insertion Sort
In-place, Stable; W/A $$O(n^2)$$, B $$O(n)$$, S $$O(1)$$; Invariant: the subarray a[0..i] consists of the original first i+1 elements in sorted order
Merge Sort
Stable, not In-place; W/A/B $$O(n\log n)$$, S $$O(n)$$
Quick Sort
In-place, Unstable; W $$O(n^2)$$, A/B $$O(n\log n)$$, S $$O(\log n)$$
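The insertion-sort invariant above can be seen in a minimal sketch (class and method names are mine):

```java
class Sorts {
    // After pass i, a[0..i] holds the original first i+1 elements in
    // sorted order; equal keys never jump over each other, so it is stable.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) { // shift larger elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}
```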

## Geometric Algorithms

### Jarvis March $$O(hn)$$

1. Find somewhere to start, e.g. y-min coordinate
2. Add point with maximum angle from horizon $$O(n)$$
3. Keep adding points with maximum angle from previous

### Divide-and-Conquer Convex Hull $$O(n\log n)$$

1. Divide into two equal size sets (along vertical line)
2. Recursively find convex hulls (base case 3 points)
3. Merge convex hulls
1. Find upper tangent lines
1. while $$(u,v,w)$$ clockwise, decrement $$v$$
2. while $$(v,w,z)$$ clockwise, increment $$w$$
2. Find lower tangent lines
1. while $$(w,v,u)$$ clockwise, increment $$v$$
2. while $$(z,v,u)$$ clockwise, decrement $$w$$

### Quick Hull $$O(n \log n)$$

1. Choose pivot, construct two subproblems, delete interior points
2. recurse on subproblems

## Trees

### Binary Trees (height h)

$$h(v) = \max\left(h(v.left), h(v.right)\right) + 1$$

• BST: left ST < key < right ST
• traversal $$O(n)$$ IN:LSR, PRE:SLR, POST:LRS
• insert, search, findMax, findMin: $$O(h)$$
• successor $$O(h)$$:
• if hasRightChild, smallest node in right sub-tree
• else, walk up to the first ancestor reached from a left child; that ancestor is the successor
• delete $$O(h)$$: switch numChild
• 0: remove v
• 1: remove v, connect child(v) to parent(v)
• 2: swap with successor(v), remove(v)

### AVL Trees (height $$h = O(\log n)$$)

• Property: Every node is height-balanced
• $$\lvert v.left.height - v.right.height \rvert \le 1$$
• insert $$O(\log n)$$:
• insert key in BST
• walk up, perform max 2 rotations if out-of-balance
• delete(v): ($$\log n$$ rotations)
• If v has 2 children, swap with successor
• delete v, and reconnect children
• for every ancestor of deleted node
• rotate if out-of-balance
• Splay Trees: rotate nodes that are accessed to the root; consider using when the access pattern is non-random.

### Augmented Trees

• Rank Tree (Order Statistics)
-   store weight of tree in each node:
-   $$w(v) = w(v.left) + w(v.right) + 1$$
-   select(k) $$O(\log n)$$: finds node with rank $$k$$

```text
rank = v.left.weight + 1
if (k == rank)
    return v
else if (k < rank)
    return v.left.select(k)
else
    return v.right.select(k - rank)
```

-   rank(v) $$O(\log n)$$: computes rank of node v

```text
rank = v.left.weight + 1
while (v != null) do
    if v is left child, do nothing
    if v is right child,
        rank += v.parent.left.weight + 1
    v = v.parent
```


• Interval Trees
-   Each node is an interval $$(m, n), m \le n$$
-   Sort by $$m$$, augment each node with the maximum $$n$$ in its subtree
-   search(x) $$O(\log n)$$:

```text
if x in c
    return c
else if c has no left child
    search in right subtree
else if x > max endpoint in c.left
    search in right subtree
else
    search in left subtree
```

-   findAll(x) $$O(k \log n)$$ for k overlapping intervals

```text
search(x)
store it somewhere else
remove interval
repeat until no intervals found
```


• Orthogonal Range Searching
-   1D

1.  use a binary search tree
2.  store all points in the leaves of the tree; internal nodes store only copies
3.  each internal node v stores the max of any leaf in its left subtree
4.  Query Time: $$O(k + \log n)$$
5.  Building Tree: $$O(n \log n)$$

-   k-dim Tree

1.  each node in the x-tree has a set of points in its subtree
2.  store a y-tree at each x-node containing all those points
3.  Query Time: $$O(k + \log^d n)$$
4.  Building Tree: $$O(n \log^{d-1}n)$$
5.  Space: $$O(n \log^{d-1}n)$$

• Custom Augmentations
-   **Average height of people taller**: augment each node with the count of nodes in its sub-tree, along with the sum of the heights of all the people in that sub-tree. To return the desired average, first search for the name in the hash table; say it is at node v; then sum the heights of: the right child of v, and, for every node w on the path from v to the root where v is in w's left subtree, w's right subtree and w itself.


## Hash Tables

• n: #items, m: #buckets
• Simple Uniform Hashing: Keys are equally likely to map to every bucket, and are mapped independently
• $$load(ht) = \frac{n}{m}$$
• $$E_\text{search} = 1 + \frac{n}{m}$$
• Assume $$m=\Omega(n)$$, $$E_\text{search} = O(1)$$

### Hash Functions

• Division
-   $$h(k) = k \bmod m$$, choose m prime

• Multiplication
-   fix table size: $$m=2^r$$, for some $$r$$
-   fix word size: $$w$$, size of key in bits
-   fix odd constant $$A$$, $$A > 2^{w-1}$$
-   $$h(k) = ((Ak) \bmod 2^w) >> (w - r)$$

• Rolling Hash
-   used when the key changes by a single character
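The multiplication method above, as a minimal Java sketch with $$r = 10$$ (so $$m = 2^{10} = 1024$$) and $$w = 32$$; the odd constant is my choice (Knuth's 2654435769 is a popular one):

```java
class MultHash {
    static final int R = 10, W = 32;
    static final int A = 0x9E3779B9; // odd, top bit set (> 2^(w-1) as unsigned)

    static int hash(int k) {
        // (A * k) mod 2^w happens automatically via 32-bit overflow;
        // the logical shift keeps the top r bits, a bucket in [0, 2^r)
        return (A * k) >>> (W - R);
    }
}
```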


### Chaining

• bucket stores linked list, containing (object, value)
• Worst insert $$O(1 + cost(h))$$
• Expected search = $$1 + \frac{n}{m} = O(1)$$
• Worst search $$O(n)$$

### Open Addressing

• One item per slot; probe a sequence of buckets until the key is found
• $$h(key, i) : U \mapsto \{1..m\}$$, where $$i$$ is the number of collisions so far
• search: keep probing until an empty bucket, or the entire table is exhausted
• delete: set key to a tombstone value, so probe sequences still work
• insert: overwrite a deleted cell, else find the next available slot
• good hash function:
1. $$h(key, i)$$ enumerates all possible buckets
2. Simple Uniform Hashing
• Linear: $$h(k,i) = (h(k) + i) \bmod m$$, suffers from clustering
• Double: $$h(k,i) = (f(k) + i \cdot g(k)) \bmod m$$
• Expected insert/search: $$\frac{1}{1-\alpha}$$ probes, where $$\alpha = \frac{n}{m} < 1$$
• good: saves space, rare memory allocation, better cache performance

### Cuckoo Hashing

• Resolving hash collisions with worst-case constant lookup time
• Lookup: inspection of just two locations in the hash table
• Insertion: Insert into first table if empty; else kick out other key to second location.
• If insertion enters an infinite loop, rebuild the table with new hash functions

### Table resizing

• Scan old table $$O(m_1)$$, create new table $$O(m_2)$$, insert each element $$O(1)$$, total $$O(m_1 + m_2 + n)$$
• Resizing amortizes to $$O(1)$$ per operation: if $$n == m$$, set $$m = 2m$$; if $$n < \frac{m}{4}$$, set $$m = \frac{m}{2}$$

### Fingerprint Hash Table (FHT)

• Vector of 0/1 bits; insert sets $$T[h(k)] = 1$$
• no false negatives, but has false positives. $$P_{\text{no FP}} = \left(1 - \frac{1}{m}\right)^n \approx \left(\frac{1}{e}\right)^{n/m}$$

### Bloom Filter

• use $$k$$ hash functions. More space per item, but requires $$k$$ collisions for a false positive.
• $$P_{\text{FP}} = \left(1- e^{-kn/m}\right)^k$$
• Example with $$k = 2$$: two hash functions, $$h(k)$$ and $$t(k)$$, two tables $$T_1$$ and $$T_2$$
• insert: $$T_1[h(k)] = 1$$, $$T_2[t(k)] = 1$$
• search: if $$T_1[h(k)]$$ and $$T_2[t(k)]$$ are both 1, return true
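A minimal Bloom-filter sketch over a single bit vector; the class name and the double-hashing scheme ($$h_1 + i \cdot h_2$$ to simulate k independent hash functions) are my assumptions:

```java
import java.util.BitSet;

class Bloom {
    private final BitSet bits;
    private final int m, k;
    Bloom(int m, int k) { this.bits = new BitSet(m); this.m = m; this.k = k; }

    // i-th simulated hash function for key, in [0, m)
    private int idx(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // force odd so the k probes differ
        return Math.floorMod(h1 + i * h2, m);
    }
    void insert(Object key) {
        for (int i = 0; i < k; i++) bits.set(idx(key, i));
    }
    boolean mightContain(Object key) { // true may be a false positive
        for (int i = 0; i < k; i++) if (!bits.get(idx(key, i))) return false;
        return true;
    }
}
```

Note `mightContain` can return true for a key never inserted (false positive), but never false for an inserted one (no false negatives).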

## Graphs

| Type | Space      | check (v,w) | any neighbour | all neighbours |
|------|------------|-------------|---------------|----------------|
| List | $$O(V+E)$$ | slow        | fast          | fast           |
| Mat  | $$O(V^2)$$ | fast        | slow          | slow           |
• BFS/DFS do not explore all paths

• BFS $$O(V+E)$$

```text
bfs(root)
    mark root visited
    Q.enqueue(root)

    while Q is not empty:
        current = Q.dequeue()
        visit(current)
        for each node n adj to current
            if n not visited
                mark n visited
                n.parent = current
                Q.enqueue(n)
```


• DFS $$O(V+E)$$
-   Same as BFS, but use stack instead of queue

• Topological Sort (DAG)
-   Reverse post-order DFS
-   Kahn's Algorithm (repeatedly append all nodes with no incoming edges to the result set, remove the edges leaving those nodes, and repeat; also O(V+E))


### SSSP

• Bellman-Ford $$O(EV)$$
-   $$O(V^3)$$ if using Adj Matrix

```text
do V - 1 times
    for (Edge e : graph)
        relax(e)
```

-   can terminate early if a full pass makes no improvement
-   can detect negative cycles: relax $$V-1$$ times, then perform one more pass; if anything still changes, there is a negative cycle
-   if all weights are the same, use BFS

• Dijkstra $$O(E\log V)$$
[Interesting article on how Dijkstra's algorithm is everywhere](https://blog.evjang.com/2018/08/dijkstras.html)

-   Doesn't work with negative edge weights
-   can terminate once the end is found

```text
dist[i] = INF for all i
dist[start] = 0
while PQ not empty
    w = pq.dequeue()
    for each edge (w, v)
        if dist[w] + weight(w, v) < dist[v]  // improvement
            update pq entry for v   O(log n)
            update dist[v]
```


• DAG
-   Toposort, relax in order
-   SSSP on DAG: run topo sort, and relax edges in that order in $$O(V+E)$$
-   Single Source Longest Path is easy on a DAG: multiply edge weights by -1 and run SSSP


### Heap

• implements priority queue, is a complete binary tree

• priority of parent ≥ priority of child (max-heap)

• insert: create new leaf, bubbleUp

• decreaseKey: update priority, bubbleDown

• delete: swap with last leaf, delete leaf, then bubble

• store in array:

• $$left(x) = 2x + 1$$
• $$right(x) = 2x + 2$$
• $$parent(x) = \lfloor(x-1)/2\rfloor$$
• Heap Sort

1.  Heapify (insert n items) $$O(n \log n)$$
2.  Extract from heap n times $$O(n \log n)$$
3.  **Improvement**: recursively join 2 heaps and bubble root down (base case single node), heapify in $$O(n)$$
4.  $$O(n \log n)$$ worst case, deterministic, in-place
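The array layout and bubble operations above, as a small fixed-capacity max-heap sketch (class name and capacity are my choices):

```java
class MaxHeap {
    private final int[] a = new int[64]; // fixed capacity for the sketch
    private int n = 0;

    void insert(int key) { // new leaf, then bubbleUp
        a[n] = key;
        int x = n++;
        while (x > 0 && a[(x - 1) / 2] < a[x]) {
            swap(x, (x - 1) / 2);
            x = (x - 1) / 2;
        }
    }
    int extractMax() { // swap root with last leaf, shrink, bubbleDown
        int max = a[0];
        a[0] = a[--n];
        int x = 0;
        while (true) {
            int l = 2 * x + 1, r = 2 * x + 2, big = x;
            if (l < n && a[l] > a[big]) big = l;
            if (r < n && a[r] > a[big]) big = r;
            if (big == x) break;
            swap(x, big);
            x = big;
        }
        return max;
    }
    private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

Inserting n items and extracting n times is exactly the $$O(n \log n)$$ heap sort above.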

• UFDS (weighted)
-   union(p,q) $$O(\log n)$$
-   find roots of p and q
-   make root of smaller tree a child of root of larger tree
-   find(k) $$O(\log n)$$
-   search up the tree, return the root
-   (PC): update all traversed nodes' parent to root
-   WU with PC, union and find $$O(\alpha(m,n))$$
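The weighted-union and path-compression rules above, as a minimal Java sketch (class and field names are mine; compression here is the path-halving variant):

```java
class UFDS {
    private final int[] parent, size;

    UFDS(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }
    int find(int p) {
        while (parent[p] != p) {
            parent[p] = parent[parent[p]]; // path compression (halving)
            p = parent[p];
        }
        return p;
    }
    void union(int p, int q) {
        int rp = find(p), rq = find(q);
        if (rp == rq) return;
        if (size[rp] < size[rq]) { int t = rp; rp = rq; rq = t; }
        parent[rq] = rp; // smaller tree hangs under larger
        size[rp] += size[rq];
    }
}
```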


### MST

• acyclic subset of edges that connects all nodes, and has minimum weight

• Properties

1.  Cutting an edge in an MST results in 2 MSTs of the two components
2.  **Cycle Property**: $$\forall$$ cycles, the max weight edge is not in the MST
3.  **Cut Property**: $$\forall$$ partitions, the min weight edge across the cut is in the MST

• Prim's $$O(E \log V)$$
-   Uses cut property

```text
T = {start}
enqueue start's edges in PQ
while PQ not empty
    e = PQ.dequeue()
    if (vertex v linked with e not in T)
        T = T U {v, e}
        enqueue v's edges in PQ
    else
        ignore edge
MST = T
```


• Kruskal's $$O(E\log V)$$
-   Uses UFDS
-   It is possible that some edge among the first $$V-1$$ edges forms a cycle with the pre-existing partial solution

```text
Sort E edges by increasing weight
T = {}
for (i = 0; i < edgeList.length; i++)
    e = edgeList[i]
    if adding e does not form a cycle
        T = T U {e}
    else ignore e
MST = T
```


• Boruvka's $$O(E\log V)$$

```text
T = { one-vertex trees }
while T has more than one component:
    for each component C of T:
        begin with an empty set of edges S
        for each vertex v in C:
            find the cheapest edge from v
            to a vertex outside of C,
            and add it to S
        add the cheapest edge in S to T
    combine trees connected by edges
MST = T
```


• Variants
1.  Same weight: BFS/DFS $$O(E)$$
2.  Edges have weight $$1..k$$:
    -   Kruskal's
        -   Bucket sort edges $$O(E)$$
        -   Union/check $$O(\alpha (V))$$
        -   Total cost: $$O(\alpha(V)E)$$
    -   Prim's
        -   Use array of size k as PQ, each slot holds a linked list of nodes
        -   insert/remove nodes $$O(V)$$
        -   decreaseKey $$O(E)$$
3.  Directed MST
    -   $$\forall$$ nodes except root, add the minimum incoming edge $$O(E)$$
4.  MaxST
    -   negate all weights, run MST algo

• MST Problems
-   How do I add an edge (A,B) of weight k into graph G and find the MST quickly?
    -   Use cycle property; the max edge in any cycle is not in the MST
    -   only add (A,B) if it is not the max weight edge on the cycle it creates
    -   $$O(V + E)$$ time to find the max edge along A → B with DFS
-   Given an undirected graph with $$K$$ power plants, find the minimum cost to connect all other sites.
    -   run Prim's with a super source
    -   weight of the new edges is zero
    -   this is a single MST
-   How do I make Kruskal run faster when sorting?
    -   Store edges in separate linked lists (one per weight class)
    -   To process edges in increasing weight, process the lists in order
    -   Time: $$O(E)$$ or $$O(E\alpha(m, n))$$
    -   Space: $$O(E)$$, need to store all $$E$$ edges
-   Minimum Bottleneck Spanning Tree (MBST)
    -   General idea: if I use some edge e that is not in the MST to replace some edge e' in the MST, then my max edge is max(max edge on original MST, e).
    -   Intuitively, the MST then fulfills the condition of an MBST.
    -   Note: Every MST is an MBST, but not every MBST is an MST
-   Find maximum distance between 2 vertices in MST
    -   Bruteforce: perform DFS starting from every single vertex, since there is only one path from any node to another
    -   DFS: $$O(V+E)$$, doing it $$V$$ times, $$O(V(V+E)) = O(V^2)$$ since $$E = V - 1$$
    -   Space: $$O(V)$$, need to store all the edges in the MST


### Floyd-Warshall (APSP)

• Shortest paths have optimal substructure
• Shortest paths have overlapping subproblems
• Idea: gradually allow usage of intermediate vertices
• Invariant: at step k, shortest paths via nodes 0 to k are correct

```java
// precondition: A[i][j] contains weight
// of edge (i,j) or inf if no edge
int[][] APSP(int[][] A) {
    // len = # vertices
    // clone A into S
    for (int k = 0; k < len; k++)
        for (int i = 0; i < len; i++)
            for (int j = 0; j < len; j++)
                S[i][j] = Math.min(S[i][j],
                                   S[i][k] + S[k][j]);
    return S;
}
```


### Network Flow

k-edge connected
Source and target are k-edge connected if there are k edge-disjoint paths (paths that don't share edges) from source to target.
Max flow
Equals the capacity of the minimum st-cut (count capacity outgoing from the source side, ignore incoming)
Min cut
Let $$S$$ be the nodes reachable from the source in the residual graph, $$T$$ = all other nodes; S → T is a minimum cut
Augmenting Path
path in the residual graph from s to t that has no 0-capacity edges
• Ford-Fulkerson
1.  Start with 0 flow
2.  While there exists an augmenting path:
    -   find an augmenting path
    -   compute the bottleneck (min residual capacity)
    -   increase flow on the path by the bottleneck capacity

Time Complexity:

-   DFS: $$O(|F|E)$$
-   BFS (Edmonds-Karp, shortest augmenting path): $$O(VE^2)$$
-   Dinitz: $$O(V^2E)$$


### Graph Algorithms on Trees

• Check if connected graph is tree
Run DFS; stop after traversing $$V-1$$ edges; return true if all nodes are connected and no unused edge remains, false otherwise. $$O(V)$$

• Min Vertex Cover
-   Idea: transform tree into DAG, run DP
-   only two possibilities for each vertex: taken or not

```java
int MVC(int v, int flag) { // flag: 1 if v is taken, 0 otherwise
    if (memo[v][flag] != -1)
        return memo[v][flag];
    int ans;
    if (leaf[v]) // a taken leaf costs 1, an untaken leaf costs 0
        ans = flag;
    else if (flag == 0) { // v not taken: every child must be taken
        ans = 0;
        for (int child : children(v))
            ans += MVC(child, 1);
    } else { // v taken: each child may be taken or not
        ans = 1;
        for (int child : children(v))
            ans += Math.min(MVC(child, 1), MVC(child, 0));
    }
    return memo[v][flag] = ans;
}
// answer = min(MVC(root, 0), MVC(root, 1))
```


• SSSP
-   On a weighted tree, any graph traversal algorithm (e.g. DFS, BFS) can obtain the shortest path to any vertex in $$O(V)$$
-   Weight of the shortest path between two vertices is the sum of the weights of the edges on the unique path

• APSP
-   Run SSSP from each of the V vertices, in total $$O(V^2)$$, compared to the $$O(V^3)$$ FW algorithm

• Diameter
-   Naively, run FW in $$O(V^3)$$ and do an $$O(V^2)$$ all-pairs check, totalling $$O(V^3)$$.
-   On a tree, only 2 $$O(V)$$ traversals are needed: DFS/BFS from any vertex $$s$$ to find the furthest vertex $$x$$, then DFS/BFS one more time from vertex $$x$$ to find the furthest vertex $$y$$. The length of the unique path from x to y is the diameter of the tree.
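The two-pass traversal above can be sketched with BFS on an unweighted adjacency list (class and method names are mine):

```java
import java.util.*;

class TreeDiameter {
    // Returns {farthest vertex from s, its distance} via BFS.
    static int[] bfsFarthest(List<List<Integer>> adj, int s) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        Deque<Integer> q = new ArrayDeque<>();
        dist[s] = 0;
        q.add(s);
        int far = s;
        while (!q.isEmpty()) {
            int u = q.poll();
            if (dist[u] > dist[far]) far = u;
            for (int v : adj.get(u))
                if (dist[v] == -1) { dist[v] = dist[u] + 1; q.add(v); }
        }
        return new int[]{far, dist[far]};
    }
    static int diameter(List<List<Integer>> adj) {
        int x = bfsFarthest(adj, 0)[0]; // first pass: find endpoint x
        return bfsFarthest(adj, x)[1];  // second pass: farthest from x
    }
}
```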


### Graph Modelling Techniques

1. minimum shortest path from many sources to one destination: run SSSP treating the destination as the source.
2. multiple sources to multiple destinations: add a super source and super sink with edge weight 0, and run Dijkstra (if no negative edge weights), BF otherwise.
3. Attempt to convert the graph into a DAG and use DP techniques. Example: attaching a variable to a vertex that is monotonically decreasing
4. Shortest path between X and Y that passes through node $$A$$: compute two shortest paths, X to A and A to Y, and join them.

## Parallel Algorithms

### Parallel Fibonacci

```text
parallelFib(n) {
    if (n < 2) then
        return n;
    x = spawn parallelFib(n - 1);
    y = spawn parallelFib(n - 2);
    sync;
    return x + y;
}
```

• Critical Path: $$T_\infty$$, Parallelism = $$T_1/T_\infty$$
• $$T_\infty(n) = \max(T_\infty(n - 1), T_\infty(n - 2)) + O(1) = O(n)$$
• $$T_p \ge T_1/p$$
• $$T_p \ge T_\infty$$, cannot beat the critical path no matter how many processors
• Goal: $$T_p = (T_1/p) + T_\infty$$, $$T_1/p$$ is the parallel part, $$T_\infty$$ is the sequential part

### Parallel Matrix Addition

Before (sequential): work $$T_1(n) = O(n^2)$$, critical path $$T_\infty(n) = O(n^2)$$. After (divide into quadrants and spawn):

```text
pMatAdd(A, B, C, i, j, n)
    if (n == 1)
        C[i,j] = A[i,j] + B[i,j];
    else
        spawn pMatAdd(A, B, C, i,       j,       n/2);
        spawn pMatAdd(A, B, C, i,       j + n/2, n/2);
        spawn pMatAdd(A, B, C, i + n/2, j,       n/2);
        spawn pMatAdd(A, B, C, i + n/2, j + n/2, n/2);
        sync;
```

• Work Analysis: $$T_1(n) = 4T_1(n/2) + O(1) = O(n^2)$$
• Critical Path Analysis: $$T_\infty(n) = T_\infty(n/2) + O(1) = O(\log n)$$

### Parallelized Merge Sort $$O(\log^3n)$$

```text
pMerge(A[1..k], B[1..m], C[1..n])
    if (m > k) then pMerge(B, A, C);
    else if (n == 1) then C[1] = A[1];
    else if (k == 1) and (m == 1) then
        if (A[1] <= B[1]) then
            C[1] = A[1]; C[2] = B[1];
        else
            C[1] = B[1]; C[2] = A[1];
    else
        // binary search for j where
        // B[j] <= A[k/2] <= B[j+1]
        spawn pMerge(A[1..k/2],
                     B[1..j],
                     C[1..k/2+j])
        spawn pMerge(A[k/2+1..k],
                     B[j+1..m],
                     C[k/2+j+1..n])
        sync;

pMergeSort(A, n)
    if (n == 1) then return;
    else
        X = spawn pMergeSort(A[1..n/2], n/2)
        Y = spawn pMergeSort(A[n/2+1..n], n/2)
        sync;
        pMerge(X, Y, A);
```

