Algo

Recursion: A natural and powerful tool when used with sufficient base-case/corner-case validation, so the recursion terminates instead of overflowing the stack.

Greedy: For such problems, we first make a safe move (one that is consistent with some optimal solution), reduce to a smaller subproblem, and repeat until the problem is solved. Sometimes, sorting the input in a particular order speeds up a greedy algorithm.

Divide-Conquer-Combine: Use when the smaller subproblems are not evaluated multiple times, e.g. binary search. Leads to recursive solutions.
1. Divide: Break into non-overlapping subproblems of the same type.
2. Conquer: Solve each subproblem recursively (they are of the same type).
3. Combine: Combine the results from step 2.
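
As an illustration of the three steps, here is a binary search sketch in C (the array and the name bsearch_dc are illustrative; a sorted array is assumed):

/* Divide: pick the middle element; Conquer: recurse into one half;
   Combine: nothing to merge, the index found in the half is the answer. */
int bsearch_dc(const int *a, int lo, int hi, int key)
{
    if (lo > hi) return -1;                 /* empty subproblem: not found */
    int mid = lo + (hi - lo) / 2;
    if (a[mid] == key) return mid;
    if (a[mid] < key)  return bsearch_dc(a, mid + 1, hi, key);
    return bsearch_dc(a, lo, mid - 1, key);
}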

Dynamic Programming (Memoization): When smaller sub-problems are repeatedly evaluated, e.g. Fibonacci.


ARRAYS
+ Contiguous area of memory having equal sized elements
+ Guaranteed constant time access to any element [ i.e. base + elem_size*(i-first_index)  ]
+ Add/delete at end O(1).
- Expensive to add/delete at start/middle O(n). 

Dynamic Array
Partial solution to the static array: allocate it dynamically, int *arr=new int[size]; //Question: how big should size be?
Full solution: "All problems in computer science can be solved by another level of indirection." Store a pointer to a dynamically allocated array and replace it with a newly allocated, larger array when required.
GetElem(i), SetElem(i, val) Size()   O(1)
PushBack(val), Remove(i) O(n)

Usage: C++ vector, Java ArrayList, Python list (Python has no static array)

It has:
arr: dynamically allocated array
capacity: size of arr
size: number of elements currently in array
When size==capacity, allocate new array, copy elements, update arr, free old array.

Get(i): if(i<0 or i>=size) ERR out of range
           return arr[i]

Set(i, val): if(i<0 or i>=size) ERR out of range
          arr[i]=val;

PushBack(val):
  if size==capacity:
    allocate new_arr[2*capacity]
    for i from 0 to size-1: new_arr[i]=arr[i]
    free arr
    arr=new_arr; capacity=2*capacity
  arr[size]=val; size=size+1

Remove(i):
  if(i<0 or i>=size) ERR out of range
  for j from i to size-2:
    arr[j]=arr[j+1]
  size=size-1

Size(): return size;

If instead we grew the capacity by a fixed amount (say 10 extra slots) rather than by a factor, n insertions would cost O(n^2) in total, i.e. O(n^2)/n = O(n) amortized per insertion; doubling keeps the amortized cost O(1), as shown in the amortized analysis later.
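
A compact C sketch of the doubling PushBack described above (the names dynarr and da_push are illustrative):

#include <stdlib.h>

typedef struct { int *arr; int size; int capacity; } dynarr;

/* Amortized O(1): doubling keeps the total copy cost linear in n. */
void da_push(dynarr *d, int val)
{
    if (d->size == d->capacity) {
        int new_cap = d->capacity ? 2 * d->capacity : 1;
        int *new_arr = malloc(new_cap * sizeof *new_arr);
        for (int i = 0; i < d->size; i++) new_arr[i] = d->arr[i];
        free(d->arr);
        d->arr = new_arr;
        d->capacity = new_cap;
    }
    d->arr[d->size++] = val;
}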


LINKED LIST
+ Constant time to add/delete from front
+ With tail and doubly linked list, constant time to add/delete at back
- O(n) to find random member

Singly Linked List : head-> 1->2->3->10->100->NULL.
Each node has a key(1,2,3,10,100) and next pointer(arrow).

APIs: push_[back/front], pop_[back/front], top_[back/front], add_before/after(Node, key), is_present(key) etc.

All back operations, is_present, add_before/after are O(n).
Note: If we maintain tail pointer, just like head pointer, some of the back operations can be reduced to O(1).
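
A minimal C sketch of a node and the O(1) push_front (struct and function names are illustrative):

#include <stdlib.h>

typedef struct node { int key; struct node *next; } node;

/* O(1): the new node simply becomes the new head. */
node *push_front(node *head, int key)
{
    node *n = malloc(sizeof *n);
    n->key = key;
    n->next = head;
    return n;            /* caller stores the returned pointer as the new head */
}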


Doubly Linked List: 
It also has prev pointer, along with key and next pointer.
The back operations become efficient, but adds code complexity.
e.g.
push_back(key)
  n=new node;
  n.key=key; n.next=NULL;
  if(tail==NULL):
    head=tail=n;
    n.prev=NULL;
  else:
    tail.next=n;
    n.prev=tail;
    tail=n;

STACK
Push, Pop, Top, IsEmpty operations, LIFO (Last In First Out) order followed.

Usage: Find if input string is balanced, i.e. has matching (), [], {}.
Stack s
for char in str:
  if char in [ '(', '[', '{' ]:
    s.push(char);
  else if char in [ ')', ']', '}' ]:
    if s.IsEmpty(): return 0;
    top=s.pop();
    if (top=='(' and char!=')') OR (top=='[' and char!=']') OR (top=='{' and char!='}'):
      return 0;
return s.IsEmpty();
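
A runnable C version of the same check, using a fixed-size array as the stack (the 1024 limit and the name is_balanced are assumptions of this sketch):

int is_balanced(const char *s)       /* returns 1 if brackets match, 0 otherwise */
{
    char stack[1024];                /* assumes fewer than 1024 unmatched openers */
    int top = -1;
    for (; *s; s++) {
        char c = *s;
        if (c == '(' || c == '[' || c == '{') {
            stack[++top] = c;
        } else if (c == ')' || c == ']' || c == '}') {
            if (top < 0) return 0;
            char o = stack[top--];
            if ((o == '(' && c != ')') ||
                (o == '[' && c != ']') ||
                (o == '{' && c != '}'))
                return 0;
        }
    }
    return top == -1;
}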

  • Stack using Array: Simple O(1) operations as it is LIFO.
  • Stack using List: can be extended without being constrained by array size.
  • Stack using dynamic array: when the array is full, it is reallocated at twice the size and the older members are copied over.


QUEUES
All EnQ, DeQ, IsEmpty O(1) operations, FIFO (First In First Out) order followed.

  • Queue with linked list: EnQ at the tail, DeQ at the head; both are O(1) once head and tail pointers are kept.
  • Queue with array can be a nightmare: DeQ requires removal from the start. It is eased if read and write indices are maintained; the write index is the next location to write, and likewise the read index is the next location to read.
  • The above array implementation can use a circular array, but keep one element of buffer between write and read for proper operation. Thus the queue is considered full when (write+1) mod capacity == read.
  • The queue is empty when write == read.
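
A circular-array queue sketch along these lines (QCAP and the function names are illustrative; one slot is kept unused to tell full from empty; read and write start at 0):

#define QCAP 8                      /* capacity; one slot is kept unused */

typedef struct { int buf[QCAP]; int read; int write; } queue;

int q_empty(const queue *q) { return q->read == q->write; }
int q_full (const queue *q) { return (q->write + 1) % QCAP == q->read; }

int enq(queue *q, int v)            /* returns 0 on failure (full) */
{
    if (q_full(q)) return 0;
    q->buf[q->write] = v;
    q->write = (q->write + 1) % QCAP;
    return 1;
}

int deq(queue *q, int *v)           /* returns 0 on failure (empty) */
{
    if (q_empty(q)) return 0;
    *v = q->buf[q->read];
    q->read = (q->read + 1) % QCAP;
    return 1;
}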
TREES
A tree is either empty, or a node with a key and a list of child subtrees.
Keeping parent information (a pointer to each node's parent) helps many tree algorithms.

Many representations of trees are possible. For example, the syntax tree for the expression
3cos(7x-5):

        *
      /   \
     3    cos
           |
           -
         /   \
        *     5
      /   \
     7     x

A generic tree with root A:

          A
        / | \
       B  C  D
      / \
     E   F

Ancestor: parent, or parent of parent, etc. E.g. A and B are ancestors of E.
Descendant: child, or child of child, etc. E.g. B, C, D, E, F are descendants of A.
Sibling: nodes sharing the same parent. E.g. B, C, D are siblings.
Leaf: node with no children. E.g. E, F, C, D.
Height: number of nodes on the longest path from the node down to its farthest leaf. E.g. wrt A, the height is 3 (A -> B -> E).
Forest: Collection of trees.
   E.g. the tree rooted at B:
            B
          /   \
         E     F
   Or, the two single-node trees C and D.



Binary Tree:  A node has key, left, right and parent(optional) information.
Height(tree):
  if tree==NULL: return 0
  return 1 + Max(Height(tree->left), Height(tree->right))

Size(tree):
  if tree==NULL: return 0
  return 1 + Size(tree->left) + Size(tree->right)

Tree Traversal: DFS (In/Pre/PostOrder) and BFS (LevelOrder)
DepthFirst (InOrder): 
  if tree==NULL: return
  InOrder(tree->left)
  Print(tree->key)
  InOrder(tree->right)

DepthFirst (Pre-Order)
  if tree==NULL: return
  Print(tree->key)
  PreOrder(tree->left)
  PreOrder(tree->right)

DepthFirst(PostOrder)
  if tree==NULL: return
  PostOrder(tree->left)
  PostOrder(tree->right)
  Print(tree->key)

BreadthFirst(LevelOrder)
  if tree==NULL: return
  Q q
  q.EnQ(tree)
  while !q.Empty():
    node=q.DeQ()
    Print(node)
    if node->left!=NULL: q.EnQ(node->left)
    if node->right!=NULL: q.EnQ(node->right)

Sample Tree:

              Lion
            /      \
        Bear        Snake
       /    \      /     \
   Amber    Cat  Panda   Turkey
                        /      \
                      Dog      Whale

InOrder(DFS): Amber -> Bear -> Cat -> Lion -> Panda -> Snake -> Dog -> Turkey -> Whale
PreOrder(DFS): Lion -> Bear -> Amber -> Cat -> Snake -> Panda -> Turkey -> Dog -> Whale
PostOrder(DFS): Amber -> Cat -> Bear -> Panda -> Dog -> Whale -> Turkey -> Snake -> Lion
LevelOrder(BFS): Lion -> Bear -> Snake -> Amber -> Cat -> Panda -> Turkey -> Dog -> Whale


Priority Queue
Generalization of Q where each element has a priority and elements come out in order by priority.
Usage: C++ priority_queue, Java PriorityQueue, Python heapq
A plain array-based queue with its usual PushBack()/PopFront() operations is the special case where the priority is simply arrival order.

It has two basic operations, Insert() and ExtractMax(). Other operations can be Remove(i), GetMax() (return the max without changing the queue), ChangePriority(i, p).

E.g. schedulers, Dijkstra's shortest path, Prim's minimum spanning tree of a graph, Huffman coding to construct an optimal prefix-free encoding, HeapSort.

Approach:
Keep a sorted array to optimize ExtractMax(), but Insert() becomes linear: O(log n) to find the position via binary search, then O(n) to shift elements right.
PQ using unsorted array: O(1) Insert, O(n) ExtractMax
PQ using sorted array: O(n) Insert, O(1) ExtractMax
PQ using binary heap: O(log n) Insert and ExtractMax. GetMax (peeking at the top) is O(1).

A binary heap is a binary tree where, for each edge of the tree, the value of the parent is at least the value of the child.
The max-heap property: the value of each node is <= the value of its parent, so the maximum value is at the root.
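
A minimal array-based max-heap sketch showing O(log n) Insert (sift up) and ExtractMax (sift down); the fixed capacity and names are illustrative:

typedef struct { int a[1024]; int n; } heap;   /* fixed capacity, 0-based indexing */

void heap_insert(heap *h, int v)               /* O(log n): sift the new leaf up */
{
    int i = h->n++;
    h->a[i] = v;
    while (i > 0 && h->a[(i - 1) / 2] < h->a[i]) {
        int p = (i - 1) / 2;
        int t = h->a[p]; h->a[p] = h->a[i]; h->a[i] = t;
        i = p;
    }
}

int heap_extract_max(heap *h)                  /* assumes the heap is non-empty */
{
    int max = h->a[0];
    h->a[0] = h->a[--h->n];                    /* move last element to the root */
    int i = 0;
    for (;;) {                                 /* sift down until heap property holds */
        int l = 2 * i + 1, r = 2 * i + 2, big = i;
        if (l < h->n && h->a[l] > h->a[big]) big = l;
        if (r < h->n && h->a[r] > h->a[big]) big = r;
        if (big == i) break;
        int t = h->a[i]; h->a[i] = h->a[big]; h->a[big] = t;
        i = big;
    }
    return max;
}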


Disjoint Sets
For problems such as in a maze, is A reachable from B.
MakeSet(x): create singleton set {x}
Find(x): returns ID of set containing x. Find(x)=Find(y) if x and y are in same set, else not equal.
Union(x, y): Merge two sets containing x and y.

ID: It can be chosen as smallest element of a set. 
e.g. sets {10,5,3,20}  {6}  {1,2,8}

element:    1   2   3   5   6   8   10   20
smallest:   1   1   3   3   6   1    3    3

smallest[i]: stores smallest element in the set i belongs to.

MakeSet(x): smallest[x]=x
Find(x): return smallest[x]

Union(x, y):      O(n)-bottleneck
  x_id=Find(x)
  y_id=Find(y)
  if x_id==y_id: return
  m=min(x_id, y_id)
  for k from 1 to n:
    if smallest[k] in {x_id, y_id}: smallest[k]=m

PreProcess(maze):
  for each cell c in maze: MakeSet(c)
  for each cell c in maze: 
    for each neighbour n of c: Union(c, n)

IsReachable(A,B): return Find(A)==Find(B)


Solution: Use a linked list to improve Union(). Represent each set as a linked list and use the list's tail element as the ID of the set.
List1: 9->3->2->4->7
List2: 6->1->8
Union: 9->3->2->4->7   -> 6->1->8
+ Union is O(1) [provided we can get the tail of list1 and the head of list2 in constant time]
+ ID is well defined
- Find() is O(n).

Solution: Representing disjoint sets as rooted trees gives the best representation:

  • Represent each set as rooted tree
  • ID is root of tree
  • Use an array parent[1..n] where parent[x] is the parent of x, or x itself if x is a root.

MakeSet(x): parent[x]=x   //O(1)
Find(x):    //O(tree height)
  while x!=parent[x]: x=parent[x]
  return x;

How to merge trees: hang the shorter tree under the root of the taller one to keep the height in check. This is called the union by rank heuristic.
How to quickly find the height of a tree: keep an array rank[1..n] where rank[i] is the height of the subtree rooted at i.

MakeSet(x): parent[x]=x   rank[x]=0
Find(x): 
  while x!=parent[x]: x=parent[x]
  return x

Union(x, y):
  x_id=Find(x)
  y_id=Find(y)
  if(x_id == y_id): return
  if rank[x_id] > rank[y_id]: parent[y_id]=x_id;
  else: 
    parent[x_id]=y_id
    if rank[x_id] == rank[y_id]: rank[y_id]=rank[y_id]+1
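
The same operations written as plain C arrays (MAXN and the names rnk, union_sets are illustrative; no path compression, matching the pseudocode above):

#define MAXN 1000
int parent[MAXN], rnk[MAXN];

void make_set(int x) { parent[x] = x; rnk[x] = 0; }

int find(int x)                        /* O(tree height): walk up to the root */
{
    while (x != parent[x]) x = parent[x];
    return x;
}

void union_sets(int x, int y)          /* union by rank: hang shorter under taller */
{
    int xi = find(x), yi = find(y);
    if (xi == yi) return;
    if (rnk[xi] > rnk[yi]) parent[yi] = xi;
    else {
        parent[xi] = yi;
        if (rnk[xi] == rnk[yi]) rnk[yi]++;
    }
}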


Hashing

  • dict in python,
  • HashMap in Java, 
  • Compiler keyword lookup (keywords like for, while, if in C),
  • File systems (mapping names to mounts/files), are implemented using hash tables.
  • Banking password verification: the password is not sent over the network; instead a hash value is generated on the device and sent to the server, which compares it against the stored hash value.
  • Storage optimization in services like Dropbox (deduplication) uses hashing.

Problem:
To protect against a denial-of-service attack, where the same IP sends many requests to hog your server, we want to analyze the logs; a hash table can help.
Say to answer queries of last 1 hour:
- Keep count how many times IP address appeared in logs, 
- counter updated every second. (increment for new lines, decrement for old lines older than 1 hr)
- DS to map IP to counters
- An IP can be treated as a 32-bit number; use an array A of size 2^32 indexed by IP, where each entry is a counter of how many times that IP appeared.
 e.g. 69.171.230.68=69*2^24 + 171*2^16 + 230*2^8+68=1168893508

Issue: With direct addressing, AccessLastHour() and UpdateAccessList() are O(1), but we need a 2^32-entry array to keep the counters. That is a huge memory requirement, and it becomes far worse for IPv6 (2^128 addresses). Answer: hash functions.

A hash table is an implementation of a set or a map using hashing. Chaining is the most common way to implement a hash table.

Chaining: store a mapping from one type of object to another, e.g. file name to file on disk, EmpID to employee name; each cell of the table holds a list of the entries that hash to it.

Map from Set S to V has: hasKey(O), Get(O), Set(O, v), where O is element of S, v is element of V.

Going back to the IP problem:
Create an array of size m=8 (say). Each cell stores a list of (object O, value v) pairs.
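
A minimal chaining sketch for the IP-to-counter map (the bucket count M=8 follows the text above; hash_ip and the other names are illustrative placeholders):

#include <stdlib.h>
#include <stdint.h>

#define M 8                                        /* number of buckets */

typedef struct entry { uint32_t ip; int count; struct entry *next; } entry;
entry *bucket[M];                                  /* globals start out as NULL */

unsigned hash_ip(uint32_t ip) { return ip % M; }   /* placeholder hash for the sketch */

void increment(uint32_t ip)
{
    entry *e = bucket[hash_ip(ip)];
    for (; e; e = e->next)
        if (e->ip == ip) { e->count++; return; }
    e = malloc(sizeof *e);                         /* not seen before: prepend to the chain */
    e->ip = ip; e->count = 1;
    e->next = bucket[hash_ip(ip)];
    bucket[hash_ip(ip)] = e;
}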

SET
A set is a DS which has 3 operations: Add(O), Remove(O), Find(O).
Two ways:
1. Map from S to V = {true, false}; //Depending on if obj is in set
2. (efficient): Store just Object O instead of storing pair.
Set: unordered_set in C++, HashSet in Java, set in Python
Map: unordered_map in C++, HashMap in Java, dict in Python

Hash Functions
- Should be quick to compute.
- Should give different values for different objects as far as possible (few collisions).
- Must be deterministic: the same object always hashes to the same value, so it cannot be recomputed randomly on each call.

Example: Phone Book, store name against phone numbers.
1. Direct addressing: convert the number to an integer and keep a large array indexed by it to store the names. Lots of space is wasted, since memory grows exponentially with the number of digits.
1TB to store phone book of 12 digit phone number.
2. Chaining: Select hash function of cardinality m, create name array of size m, Name[h(int(phone))].

Start with a smaller hash table and use the dynamic-array idea: double its size when the load limit is hit.

Hashing INT

1. Choose a prime number p bigger than 10^n  [n = number of digits in the integer]
2. Choose the hash table size, m = 1000 (say)
Say h(x) = (ax+b) mod p = (34x+2) mod 1000019.
For x = 1482567, h(x) = (34*1482567+2) mod 1000019 = 406330,
and the index in the hash table is h(x) mod m = 330.
Here, a should be a random number in [1, p-1] and b a different random number in [0, p-1].
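
As code, a minimal sketch of this family of hash functions (the name hash_int is illustrative; 64-bit arithmetic is assumed wide enough for a*x+b with these values):

#include <stdint.h>

/* ((a*x + b) mod p) mod m, with p prime; a in [1,p-1], b in [0,p-1] chosen once at random */
unsigned hash_int(uint64_t x, uint64_t a, uint64_t b, uint64_t p, unsigned m)
{
    return (unsigned)(((a * x + b) % p) % m);
}

/* hash_int(1482567, 34, 2, 1000019, 1000) reproduces the example above */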

Hashing STRING
- The reverse of the phone book problem: given a name, find the phone number.
- We need to design a hash function for names.
Convert each character str[i] into integer using ASCII or UniCode. 
Choose big prime, p

PolyHash(S, p, x) 
  hash =0
  for i from len(S) − 1 to 0: 
    hash = (hash * x + S[i]) % p 
  return hash
E.g. for len(S)=3 the loop computes:
  i=2: hash = S[2] mod p
  i=1: hash = (S[2]*x + S[1]) mod p
  i=0: hash = (S[2]*x^2 + S[1]*x + S[0]) mod p
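
The same PolyHash as a small C function (long long is assumed wide enough for hash*x + S[i]; the name poly_hash is illustrative):

#include <string.h>

long long poly_hash(const char *s, long long p, long long x)
{
  long long hash = 0;
  for (int i = (int)strlen(s) - 1; i >= 0; i--)
    hash = (hash * x + s[i]) % p;     /* Horner's rule: s[0] ends up with the lowest power of x */
  return hash;
}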

Calculate Power: 
#include <stdio.h>
#define DEBUG(...)   /* change to: #define DEBUG printf   to enable the prints shown below */

float mypow(float x, int n) //n can be -ve int
{
   if(n==0)   { DEBUG("\tRetA 1\n"); return 1; }
   float val=mypow(x, n/2);
   if(n%2==0)
     { DEBUG("\tRetB %f\n", val*val); return val*val; }
   else
     if(n>0)   { DEBUG("\tRetC %f\n", x*val*val); return x*val*val; }
     else      { DEBUG("\tRetD %f\n", (val*val)/x);  return (val*val)/x; } //Required for negative power
}

To understand how recursion works here, enable prints and see below:
        RetA 1
        RetA 1
        RetA 1
        RetC 2.000000
        RetD 0.500000
        RetC 2.100000
        RetB 4.000000
        RetB 0.250000
        RetB 4.409999
        RetC 32.000000
        RetB 0.062500
2.1^2=4.409999
        RetB 1024.000000
2^-4=0.062500
2^10=1024.000000
Fibonacci
#include <stdio.h>
int fib(int n)  //Recursion without memoization
{
  if(n==0)  { printf(" A0");return 0;}
  else if(n==1) { printf(" B 1"); return 1;}
  else { int x=fib(n-1)+fib(n-2); printf(" C %d", x); return x;}
}
If you look at the traces below, calculating the next Fibonacci number repeats whole subcomputations (the trace of Fib(5) reappears inside the trace of Fib(6)). This recomputation can be avoided using Dynamic Programming (memoization); see the sketch after the traces.
Fib(5): 5,   B 1 A0 C 1 B 1 C 2 B 1 A0 C 1 C 3 B 1 A0 C 1 B 1 C 2 C 5
Fib(6): 8      B 1 A0 C 1 B 1 C 2 B 1 A0 C 1 C 3 B 1 A0 C 1 B 1 C 2 C 5 B 1 A0 C 1 B 1 C 2 B 1 A0 C 1 C 3 C 8
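
A memoized version as a sketch (the memo array and MAXN are illustrative; works for n < MAXN):

#define MAXN 64
long long memo[MAXN];                 /* 0 means "not computed yet"; n<=1 is handled before lookup */

long long fib_memo(int n)
{
  if (n <= 1) return n;
  if (memo[n]) return memo[n];        /* reuse the cached value instead of recomputing */
  return memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
}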

void fib_iterative(int n)  //Iterative version to save stack used between function calls
{
   int first=0, second=1, next, c;
   printf("\nInitial %d terms on fib are: ", n);
   for(c=0; c<n; c++)
   {
     if(c<=1)  next=c;
     else {  next=first+second; first=second; second=next; }
     printf("%d\t", next);
   }
}
Output: 
Initial 10 terms on fib are: 0 1       1       2       3       5       8       13      21      34            

Algorithm complexity:
O(n) represents upper bound. Θ(n) means tight bound. Ω(n) represents lower bound.
Naive recursive Fib: O(2^n) upper bound; Θ(F_n) more precisely.

More Theory
Amortized Analysis: Used to find the average worst-case time for operations that are mostly O(1) but occasionally cost up to O(n), such as a dynamic array insert when the capacity is doubled.
Amortized cost = Cost(n operations) / n

Aggregate Method:
Let c_i = cost of the i-th insert:
  c_i = 1 + x, where
    x = i-1  if i-1 is a power of 2 (the i-th insert triggers a resize and copies i-1 elements)
    x = 0    otherwise

Amortized cost = (sum of c_i for i = 1..n) / n = O(n)/n = O(1) per insertion; the worst-case cost of a single insertion is still O(n).

Banker's Method: (no code change, just analysis/mental accounting)
Charge extra for each cheap operation, save extra charge as token in DS.
Use tokens to pay for expensive operations.

Physicist's Method: define a potential Φ on the data structure; the amortized cost of an operation is its actual cost plus the change in Φ. Analogy: a ball taken up a hill stores potential energy, which is converted to kinetic energy when it rolls down.

Divide and Conquer Example
Multiply Long Integers
http://www.cs.umd.edu/~meesh/351/mount/lectures/lect10-long-int-multiply.pdf
