Data Structures and Algorithms
Java
Access Modifier
C: Class; P: Package; SC: Subclass; W: World
Modifier | C | P | SC | W |
---|---|---|---|---|
public | Y | Y | Y | Y |
protected | Y | Y | Y | N |
none | Y | Y | N | N |
private | Y | N | N | N |
Comparable
Implement Comparable<T>
:
int compareTo(T o)
hashCode
- If two objects are
equal
,hashCode
must return the same result - hashCode must return the same result hen invoked on the same object more than once
int hashCode() {}
boolean equals(Object o) {}
Common Issues
Array out of bounds; divide by zero; infinite loop; exception thrown but not caught; access member variable in static method
Algorithmic Analysis
- Amortized Cost
- Algo has amortized cost \(T(n)\) if ∀ k ∈ \mathbb{Z}, cost of k operations is \(\leq kT(n)\)
Master’s Theorem
\(T(n) = aT(\frac{n}{b}) + f(n)\) where a ≥ 1, b > 1
- \(f(n) \in O(n^c)\text{, } c < \log_ba \ \text{ then } T(n) \in \Theta(n^{\log_ba})\)
- \(f(n) \in O(n^c\log^kn)\text{, } c = \log_ba \ \text{ then } T(n) \in \Theta(n^c\log^{k+1}n)\)
- \(f(n) \in O(n^c), c > \log_ba \text{ and } \exists k \ \text{ st } af(\frac{n}{b})\le kf(n) \text{ then } T(n) \in \Theta(n^{\log_ba})\)
Common Ones
- \(T(n) = T(n/2) + \Theta(1) = O(\log n)\) (Binary Search)
- \(T(n) = 2T(n/2) + \Theta(1) = O(n)\) (Binary Tree Traversal)
- \(T(n) = 2T(n/2) + \Theta(\log n) = O(n)\) (Optimal Sorted Matrix Search)
- \(T(n) = 2T(n/2) + O(n) = O(n \log n)\) (Merge Sort)
Abstract Data Types
- Bag
- Insert(i); Draw()
- Stack (LIFO)
- empty(); peek(); pop(); push(i)
- Queue (FIFO)
- add(); offer(i); peek(); poll(); remove()
- Dequeue
- double-ended queue
Searching
- Binary Search \(O(\log n)\)
int binarySearch(int[] arr, int key) {
int start = 0, end = arr.length - 1;
int found = -1;
while(start <= end) {
int mid = start + (end - start)/2;
if(arr[mid] < key) {
start = mid + 1;
} else if(arr[mid] > key) {
end = mid - 1;
} else {
found = mid;
// if we want first instance
end = mid - 1;
// if we want last instance
// start = mid + 1;
}
}
return found;
}
- One sided Binary Search
- Suppose one side is bounded, eg [1, ∞). Use the sequence [1,2,4,8,16…, \(2^k\)…] If it works for \(2^k\), then search on [\(2^{k-1}, 2^k\)]
- Peak Finding
- A[j] in array A is peak if (i) A[j] > A[j-1] (ii) A[j] > A[j+1]. If only one item in array, vacously true
- 1D Peak Finding \(O(\log n)\)
- D&C
if a[n/2] < a[n/2-1] look at 1..n/2-1
else if a[n/2] < a[n/2+1] look at n/2+1..n
else return a[n/2]
- 2D Peak Finding \(O(m + n)\)
- D&C
find max in border + cross O(m+n)
if max is peak return
else go into quadrant with higher number
Sorting
- Bubble Sort
- Stable, In-place, W&A \(O(n^2)\), B \(O(n)\), S \(O(1)\); Invariant : At iteration i , the sub-array A[1 .. i] is sorted and any element in A[i + 1 .. A . size] is greater or equal to any element in A[1 .. i]
- Selection Sort
- In-Place, Unstable; find minimum element and swap. W,A,B \(O(n^2)\), S \(O(n/1)\); Invariant: a[0…i-1] is sorted all entries in a[i..n-1] are larger than or equal to the entries in a[0..i-1]
- Insertion Sort
- In-place, Stable; W \(O(n^2)\), B \(O(n \log n)\), S \(O(n)\); Invariant: The subarray a[i] consists of the original elements in sorted order.
- Merge Sort
- Stable, In Place; W/B \(O(n\log n)\), S \(O(n)\)
- Quick Sort
- In-place, Unstable; W \(O(n^2)\), A/B \(O(n\log n)\) S \(O(\log n)\)
Geometric Algorithms
Jarvis March \(O(hn)\)
- Find somewhere to start, e.g. y-min coordinate
- Add point with maximum angle from horizon \(O(n)\)
- Keep adding points with maximum angle from previous
Line Intersection Algorithm \(O(n\log n)\)
- Divide into two equal size sets (along vertical line)
- Recursively find convex hulls (base case 3 points)
- Merge convex hulls
- Find upper tangent lines
- while \((u,v,w)\) clockwise, decrement \(v\)
- while \((v,w,z)\) clockwise, increment \(w\)
- Find lower tangent lines
- while \((w,v,u)\) clockwise, increment \(v\)
- while \((z,v,u)\) clockwise, decrement \(w\)
- Find upper tangent lines
Quick Hull \(O(n \log n)\)
- Choose pivot, construct two subproblems, delete interior points
- recurse on subproblems
Trees
Binary Trees (height h
)
\(h(v) = max\left(h(v.left), h(v.right)\right) + 1\)
- BST: left ST < key < right ST
- traversal \(O(n)\) IN:LSR, PRE:SLR, POST:LRS
- insert, search, findMax, findMin: \(O(h)\)
- successor \(O(h)\):
- if hasRightChild, smallest node in right sub-tree
- else, first parent node that is left child (parent of node is successor)
- delete \(O(h)\): switch numChild
- 0: remove v
- 1: remove v, connect child(v) to parent(v)
- 2: swap with successor(v), remove(v)
AVL Trees (height \(h = \log n\))
- Property: Every node is height-balanced
- \(\lvert v.left.height - v.right.height \rvert \le 1\)
- insert \(O(\log n)\):
- insert key in BST
- walk up, perform max 2 rotations if out-of-balance
- delete(v): (\(\log n\) rotations)
- If v has 2 children, swap with successor
- delete v, and reconnect children
- for every ancestor of deleted node
- rotate if out-of-balance
- Splay Trees: Rotate nodes that are accessed to root. consider using where operations are non-random.
Augmented Trees
Rank Tree (Order Statistics)
- store weight of tree in each node:
- \(w(v) = w(v.left) + w(v.right) + 1\)
- select(k) \(O(\log n)\): finds node with rank \(k\)
rank = left.weight + 1;
if (k == rank)
return v
else if (k < rank)
return left.select(k)
else return right.select(k-rank)
- rank(v) \(O(\log n)\): computes rank of node v
rank = v.left.weight + 1
while (v != null) do
if v is left child do nothing
if v is right child,
rank += v.parent.left.weight + 1
v = v.parent
Interval Trees
- Each node is an interval \((m, n), m \le n\)
- Sort by \(m\), augment node with maximum \(n\) of children in each node
- search(x) \(O(\log n)\):
if x in c
return c
else if c has no left child
search in right subtree
else if x > max endpoint in c.left
search in right subtree
else search in left subtree
- findAll(x) \(O(k \log n)\) for k overlapping intervals
search(x)
store it somewhere else
remove interval
repeat until no intervals found
Orthogonal Range Searching
-
1D
- use a binary tree search tree
- store all points in the leaves of the tree, internal nodes store only copies
- each internal node v stores the max of any leaf in the left subtree
- Query Time: \(O(k + \log n)\)
- Building Tree: \(O(n \log n)\)
-
k-dim Tree
- each node in the x-tree has a set of points in its subtree
- store the y-tree at each x-node containing all points
- Query Time: \(O(k + \log^d n)\)
- Building Tree: \(O(n \log^{d-1}n)\)
- Space: \(O(n \log^{d-1}n)\)
Custom Augmentations
- Average height of people taller: augment nodes to include the count of the number of nodes in that sub-tree, along with the sum of the heights of all the people in that sub-tree. To return the desired average, first search for the name in the hash table; assume it is at node v; then find the sum of the heights of: the right-child of v, and if w is on the path from v to the root and v is in w’s left-subtree, then w’s right-subtree and w.
Hash Tables
- n: #items, m: #buckets
- Simple Uniform Hashing: Keys are equally likely to map to every
bucket, and are mapped independently
- \(load(ht) = \frac{n}{m}\)
- \(E_\text{search} = 1 + \frac{n}{m}\)
- Assume \(m=\Omega(n)\), \(E_\text{search} = O(1)\)
Hash Functions
Division
- \(h(k) = k \text{ mod } m\), choose m prime
Multiplication
- fix table size: \(m=2^r\), for some \(r\)
- fix word size: \(w\), size of key in bits
- fix odd constant \(A\), \(A > 2^{w-1}\)
- \(h(k) = (Ak) \text{ mod } 2^w » (w - r)\)
Rolling Hash
- When key changes by single character
Chaining
- bucket stores linked list, containing (object, value)
- Worst insert \(O(1 + cost(h))\)
- Expected search = \(1 + \frac{n}{m} = O(1)\)
- Worst search \(O(n)\)
Open Addressing
- One item per slot, probe sequence of buckets until find only one
- \(h(key, i) : U \mapsto {1..m}\), \(i\) is no. of collisions
- search: keep probing until empty bucket, or exhausted entire table
- delete: set key to tombstone value, so probe sequence still works
- insert: on deleted cell, overwrite, else find next available slot
- good hash function:
- \(h(key, i)\) enumerates all possible buckets
- Simple Uniform Hashing
- Linear: \(h(k,i) = h(k) + i\), Clustering
- Double: \(h(k,i) = f(k) + i \cdot g(k) \text{ mod } m\)
- Insert, Search: \(\frac{1}{1-\alpha}\) where \(\alpha = \frac{n}{m} \le 1\)
- good: saves space, rare mem alloc, better cache perf
- bad: sensitive to hash, load
Cuckoo Hashing
- Resolving hash collisions with worst-case constant lookup time
- Lookup: inspection of just two locations in the hash table
- Insertion: Insert into first table if empty; else kick out other key to second location.
- If infinite loop, hash function is rebuilt in place
Table resizing
- Scan old table \(O(m_1)\), create new table \(O(m_2)\), insert each element \(O(1)\), total \(O(m_1 + m_2 + n)\)
- \(O(n)\) amor: if \(n == m\), \(m = 2m\), if \(n < \frac{m}{4}\), \(m = \frac{m}{2}\)
Fingerprint Hash Table (FHT)
- Vector of 0/1 bits
- no false negatives, but has false positives. \(P_{\text{no FP}} = \left(\frac{1}{e}\right)^{n/m}\)
Bloom Filter
- use \(n\) hash functions. More space per item, but require \(n\) collisions for false positive.
- \(P_{\text{coll}} = \left(1- e^{-kn/m}\right)^k\)
- Two hash functions, \(h(k)\) and \(t(k)\), two tables \(T_1\) and \(T_2\)
- insert: \(T_1[h(k)] = 1\), \(T_2[h(k)] = 1\)
- search: if \(T_1[h(k)]\) and \(T_2[h(k)]\) both 1 return true
Graphs
Type | Space | v,w | any | all |
---|---|---|---|---|
List | \(O(V+E)\) | slow | fast | fast |
Mat | \(O(V^2)\) | fast | slow | slow |
Simple search
- BFS/DFS do not explore all paths
BFS \(O(V+E)\)
bfs(root)
Q.enqueue(root)
while Q is not empty:
current = Q.dequeue()
visit(current)
for each node n adj to current
if n not visited
n.parent = current
Q.enqueue(n)
DFS \(O(V+E)\)
- Same as BFS, but use stack instead of queue
Topological Sort (DAG)
- Post-order DFS
- Kahn’s Algorithm (first append all nodes with no incoming edges to result set, remove edges connected to these nodes and repeat, also O(V+E))
SSSP
Bellman-Ford \(O(EV)\)
- \(O(V^3)\) if using Adj Matrix
do V number of times
for (Edge e : graph)
relax(e)
- can terminate early if no improvement
- can detect negative cycle: perform V times, then perform once more, if have changes it has negative cycle
- if all weights are the same, use BFS
Dijkstra \(O(E\log V)\)
Interesting article on how Dijkstra’s algorithm is everywhere
- Doesn’t work with negative edge weights
- can terminate once end is found
add start to PQ
dist[i] = INF for all i
dist[start] = 0
while PQ not empty
w = pq.dequeue()
for each edge e connected to w
if edge is improvement
update pq[w] O(logn)
update dist[w]
DAG
- Toposort, relax in order
- SSSP on DAG: run topo sort, and relax edges in that order in \(O(V+E)\)
- Single Source Longest Path problem is easy on DAG: multiply edge weights by -1 and run SSSP
Heap
- implements priority queue, is a complete binary tree
- priority of parent > priority of child
- insert: create new leaf,
bubbleUp
- decreaseKey: update priority,
bubbleDown
- delete: swap with leaf, delete, and then
bubble
- store in array:
- \(left(x) = 2x + 1\)
- \(right(x) = 2x + 2\)
- \(parent(x) = \lfloor(x-1)/2\rfloor\)
Heap Sort
-
Heapify (insert n items) O(n log n)
-
Extract from heap n times (O(n log n))
-
Improvement: recursively join 2 heaps and bubble root down (base case single node) O(n)
-
O(n log n) worst case, deterministic, in-place
UFDS (weighted)
-
union(p,q) \(O(\log n)\)
- find parent of p and q
- make root of smaller tree root of larger tree
-
find(k) \(O(\log n)\)
- search up the tree, return the root
- (PC): update all traversed nodes parent to root
-
WU with PC, union and find \(O(\alpha(m,n))\)
MST
- acyclic subset of edges that connects all nodes, and has minimum weight
Properties
- Cutting edge in MST results in 2 MSTs
- Cycle Poperty: \(\forall\) cycle, max weight edge is not in MST
- Cut Property: \(\forall\) partitions, min weight edge across cut is in MST
Prim’s \(O(E \log V)\)
- Uses cycle property
T = {start}
enqueue start's edges in PQ
while PQ not empty
e = PQ.dequeue()
if (vertex v linked with e not in T)
T = T U {v, e}
else
ignore edge
MST = T
Kruskal’s \(O(E\log V)\)
- Uses UFDS
- It is possible that some edge in the first \(V-1\) edges will form a cycle with pre-existing MST solution
Sort E edges by increasing weight
T = {}
for (i = 0; i < edgeList.length; i++)
if adding e = edgelist[i] does
not form a cycle
add e to T
else ignore e
MST = T
Boruvka’s \(O(E\log V)\)
T = { one-vertex trees }
While T has more than one component:
For each component C of T:
Begin with an empty set of edges S
For each vertex v in C:
Find the cheapest edge from v
to a vertex outside of C, and
add it to S
Add the cheapest edge in S to T
Combine trees connected by edges
MST = T
Variants
- Same weight: BFS/DFS \(O(E)\)
- Edges have weight \(1..k\):
- Kruskal’s
- Bucket sort Edges \(O(E)\)
- Union/check \(O(\alpha (V))\)
- Total cost: \(O(\alpha(V)E)\)
- Prim’s
- Use array of size k as PQ, each slot holds linked list of nodes
- insert/remove nodes \(O(V)\)
- decreaseKey \(O(E)\)
- Kruskal’s
- Directed MST
- \(\forall\) node except root, add minimum incoming edge \(O(E)\)
- MaxST
- negate all weights, run MST algo
MST Problems
-
How do I add an edge (A,B) of weight k into graph G and find MST quickly?
- Use cycle property; max edge in any cycle is not in MST
- only add (A,B) if k is not the max weight edge
- O(V + E) time to find max edge along A → B with DFS
-
Given an undirected graph with \(K\) power plants, find the minimum cost to connect all other sites.
- run Prim’s, use super source
- weight of new edges are zero
- this is a single MST
-
How do I make Kruskal run faster when sorting?
- Store edges in separate linked lists
- To process edges in increasing weight, process all edges in one linked list then the next
- Time: \(O(E)\) or \(O(E\alpha(m, n))\)
- Space: \(O(E)\), need to store all \(E\) edges
-
Minimum Bottleneck Spanning Tree (MBST)
- General idea: If I use some edge e that is not in the MST to replace some edge e’ in the MST, then my max. edge is max (max edge on original MST, e).
- Intuitively, my MST would then fulfill the condition of MBST.
- Note: Every MST is an MBST, but not every MBST is an MST
-
Find maximum distance between 2 vertices in MST
- Bruteforce: perform DFS starting from every single location since there is only one path from any node to another
- DFS: \(O(V+E)\), doing it \(V\) times, \(O(V(V+E)) =O(V^2)\) since \(E = V - 1\)
- Space: \(O(V)\), need to store all the edges in MST
Floyd-Warshall (APSP)
- Shortest paths have optimal substructure
- Shortest paths have overlapping subproblems
- Idea: gradually allow usage of intermediate vertices
- Invariant: At step k, shortest path via nodes 0 to k are correct
// precondition: A[i][j] contains weight
// of edge (i,j) or inf if no edge
int[][] APSP(A) {
// len = # vertices
// clone A into S
for(int k = 0; k < len; k++)
for(int i = 0; i < len; i++)
for(int j = 0; j < len; j++)
S[i][j] =
Math.min(S[i][j],
S[i][k] + S[k][j]);
return S;
}
Network Flow
- k-edge connected
- Source and target are k-edge connected if there are k edge disjoint paths(don’t share edges) from source to target.
- Max flow
- st-cut property with minimum capacity(outgoing from s, ignore incoming to s)
- Min cut
- Let \(S\) be the nodes reachable from the source in the residual graph. \(T\) = all other nodes, S → T is minimum cut
- Augmenting Path
- path in residual graph from s to t that has no 0 weight edges
Ford-Fulkerson
- Start with 0 flow
- While there exists augmenting path:
- find an augmenting path
- compute bottleneck (min edge)
- increase flow on the path by bottleneck capacity
Time Complexity:
- DFS: \(O(|F|E)\)
- BFS(Edmonds-Karp, shortest augmenting path): \(O(VE^2)\)
- Dinitz: \(O(V^2E)\)
Graph Algorithms on Trees
Check if connected graph is tree
Run DFS, stop when after traversing \(V-1\) edges, return true if all nodes connected and no other used edge. False otherwise. \(O(V)\)
Min Vertex Cover
- Idea: transform tree into DAG, run DP
- only two possiblities for each vertex; taken or not
int MVC(int v, int flag) {
int ans = 0;
if (memo[v][flag] != -1)
return memo[v][flag];
else if (leaf[v]) //if v is leaf
ans = flag;
else if (flag == 0) {
ans = 0;
for(child : adjList[v]) {
ans+= MVC(child, 1);
}
}
else if (flag == 1) {
for (child : adjList[v]) {
ans += min(MVC(child,1),
MVC(child,0));
}
}
}
SSSP
- On a weighted tree, any graph traversal algorithm (eg. DFS, BFS) can obtain the shortest path to any vertice in \(O(V)\)
- Weight of shortest path between two vertices is the sum of the weights of edges on the unique path
ASSP
- Run SSSP on V vertices in total \(O(V^2)\), compared to \(O(V^3)\) FW algorithm
Diameter
- Originally, run FW in \(O(V^3)\) and do an \(O(V^2)\) all-pairs check, to total \(O(V^2)\).
- Now, only need 2 \(O(V)\) traversals: DFS/BFS from any vertex \(s\) to find the furthest vertex \(x\). Then do a DFS/BFS one more time from vertex \(x\) to find furthest vertex \(y\). Length of unique path along x to y is the diameter of the tree.
Graph Modelling Techniques
- minimum shortest path from many source to one destination: run SSSP treating destination as source.
- multiple sources to multiple destinations: consider super source and super sink, with edge weight 0, and run Dijkstra (if no negative edge weights), BF otherwise.
- Attempt to convert graph into a DAG and use DP techniques. Example: attaching a variable to a vertex that is monotonically decreasing
- Shortest path between X and Y that passes through node \(A\): Compute two shortest paths; X to A, A to Y, and join the paths.
Parallel Algorithms
Parallel Fibonacci
parallelFib(n) {
if(n < 2) then
return n;
x = spawn parallelFib(n - 1);
y = spawn parallelFib(n - 2);
sync;
return x + y;
}
- Critical Path: \(T_\infty\), Parallelism = \(T_1/T_\infty\)
- \(T_\infty(n) = max(T_\infty(n - 1), T_\infty(n - 2)) + O(1) = O(n)\)
- \(T_p> T_1/p\)
- \(T_p > T_\infty\), cannot run slower on more processors
- Goal: \(T_p = (T_1/p) + T_\infty\), \(T_1/p\) is the parallel part, \(T_\infty\) is the sequential part
Matrix Addition
Before: • Work analysis: \(T_1(n) = O(n^2)\) • critical path analysis: \(T_\infty(n) = O(n^2)\) After:
pMatAdd(A,B,C,i,j,n)
if(n == 1)
C[i,j] = A[i,j] + B[i,j];
else:
spawn pMatAdd(A,B,C,i,j,n/2);
spawn pMatAdd(A,B,C,i,j + n/2,n/2);
spawn pMatAdd(A,B,C,i + n/2,j,n/2);
spawn pMatAdd(A,B,C,i + n/2,j + n/2,n/2);
sync;
- Work Analysis: \(T_1(n) = 4T_1(n/2) + O(1) = O(n^2)\)
- Critical Path Analysis: \(T_\infty(n) = T_\infty(n/2) + O(1) = O(\log n)\)
Parallelized Merge Sort \(O(\log^3n)\)
pMerge(A[1..k], B[1..m], C[1..n])
if (m > k) then pMerge(B, A, C);
else if (n==1) then C[1] = A[1];
else if (k==1) and (m==1) then
if (A[1] <= B[1]) then
C[1] = A[1]; C[2] = B[1];
else
C[1] = B[1]; C[2] = A[1];
else
// binary search for j where
// B[j] <= A[k/2] <= B[j+1]
spawn pMerge(A[1..k/2],
B[1..j],
C[1..k/2+j])
spawn pMerge(A[k/2+1..l],
B[j+1..m],
C[k/2+j+1..n])
synch;
pMergeSort(A, n)
if (n=1) then return;
else
X = spawn pMergeSort(A[1..n/2], n/2)
Y = spawn pMergeSort(A[n/2+1, n], n/2)
synch;
A = spawn pMerge(X, Y);