Algo

Recursion: A natural and powerful tool when used with sufficient base-case/corner-case validation, so the recursion terminates instead of overflowing the stack.

Greedy: For such problems, we first make a safe move (one that is consistent with some optimal solution), reduce to a smaller subproblem, and repeat until the problem is solved. Sometimes, sorting the input in a particular order speeds up a greedy algorithm.

Divide-Conquer-Combine: Use when the smaller subproblems are not evaluated multiple times, e.g. binary search. Leads to recursive solutions.
1. Divide: Break into non-overlapping subproblems of the same type.
2. Conquer: Solve each subproblem recursively (they are of the same type).
3. Combine: Combine the results from step 2.
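
As an illustration of the three steps, here is a binary search sketch in C (the array and the name bsearch_dc are illustrative; a sorted array is assumed):

/* Divide: pick the middle element; Conquer: recurse into one half;
   Combine: nothing to merge, the index found in the half is the answer. */
int bsearch_dc(const int *a, int lo, int hi, int key)
{
    if (lo > hi) return -1;                 /* empty subproblem: not found */
    int mid = lo + (hi - lo) / 2;
    if (a[mid] == key) return mid;
    if (a[mid] < key)  return bsearch_dc(a, mid + 1, hi, key);
    return bsearch_dc(a, lo, mid - 1, key);
}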

Dynamic Programming (Memoization): When smaller sub-problems are repeatedly evaluated, e.g. Fibonacci.


ARRAYS
+ Contiguous area of memory having equal sized elements
+ Guaranteed constant time access to any element [ i.e. base + elem_size*(i-first_index)  ]
+ Add/delete at end O(1).
- Expensive to add/delete at start/middle O(n). 

Dynamic Array
Partial solution to the static array: allocate it dynamically, int *arr=new int[size]; //Question: how big should size be?
Full solution: "All problems in computer science can be solved by another level of indirection." Store a pointer to a dynamically allocated array and replace it with a newly allocated, larger array when required.
GetElem(i), SetElem(i, val) Size()   O(1)
PushBack(val), Remove(i) O(n)

Usage: C++ vector, Java ArrayList, Python list (Python has no static array)

It has:
arr: dynamically allocated array
capacity: size of arr
size: number of elements currently in array
When size==capacity, allocate new array, copy elements, update arr, free old array.

Get(i): if(i<0 or i>=size) ERR out of range
           return arr[i]

Set(i, val): if(i<0 or i>=size) ERR out of range
          arr[i]=val;

PushBack(val):
  if size==capacity:
    allocate new_arr[2*capacity]
    for i from 0 to size-1: new_arr[i]=arr[i]
    free arr
    arr=new_arr; capacity=2*capacity
  arr[size]=val; size=size+1

Remove(i):
  if(i<0 or i>=size) ERR out of range
  for j from i to size-2:
    arr[j]=arr[j+1]
  size=size-1

Size(): return size;

If instead we grew the capacity by a fixed amount (say 10 extra slots) rather than by a factor, n insertions would cost O(n^2) in total, i.e. O(n^2)/n = O(n) amortized per insertion; doubling keeps the amortized cost O(1), as shown in the amortized analysis later.
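
A compact C sketch of the doubling PushBack described above (the names dynarr and da_push are illustrative):

#include <stdlib.h>

typedef struct { int *arr; int size; int capacity; } dynarr;

/* Amortized O(1): doubling keeps the total copy cost linear in n. */
void da_push(dynarr *d, int val)
{
    if (d->size == d->capacity) {
        int new_cap = d->capacity ? 2 * d->capacity : 1;
        int *new_arr = malloc(new_cap * sizeof *new_arr);
        for (int i = 0; i < d->size; i++) new_arr[i] = d->arr[i];
        free(d->arr);
        d->arr = new_arr;
        d->capacity = new_cap;
    }
    d->arr[d->size++] = val;
}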


LINKED LIST
+ Constant time to add/delete from front
+ With tail and doubly linked list, constant time to add/delete at back
- O(n) to find random member

Singly Linked List : head-> 1->2->3->10->100->NULL.
Each node has a key(1,2,3,10,100) and next pointer(arrow).

APIs: push_[back/front], pop_[back/front], top_[back/front], add_before/after(Node, key), is_present(key) etc.

All back operations, is_present, add_before/after are O(n).
Note: If we maintain tail pointer, just like head pointer, some of the back operations can be reduced to O(1).
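
A minimal C sketch of a node and the O(1) push_front (struct and function names are illustrative):

#include <stdlib.h>

typedef struct node { int key; struct node *next; } node;

/* O(1): the new node simply becomes the new head. */
node *push_front(node *head, int key)
{
    node *n = malloc(sizeof *n);
    n->key = key;
    n->next = head;
    return n;            /* caller stores the returned pointer as the new head */
}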


Doubly Linked List: 
It also has prev pointer, along with key and next pointer.
The back operations become efficient, but adds code complexity.
e.g.
push_back(key)
  n=new node;
  n.key=key; n.next=NULL;
  if(tail==NULL):
    head=tail=n;
    n.prev=NULL;
  else:
    tail.next=n;
    n.prev=tail;
    tail=n;

STACK
Push, Pop, Top, IsEmpty operations, LIFO (Last In First Out) order followed.

Usage: Find if input string is balanced, i.e. has matching (), [], {}.
Stack s
for char in str:
  if char in [ '(', '[', '{' ]:
    s.push(char);
  else if char in [ ')', ']', '}' ]:
    if s.IsEmpty(): return 0;
    top=s.pop();
    if (top=='(' and char!=')') OR (top=='[' and char!=']') OR (top=='{' and char!='}'):
      return 0;
return s.IsEmpty();
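
A runnable C version of the same check, using a fixed-size array as the stack (the 1024 limit and the name is_balanced are assumptions of this sketch):

int is_balanced(const char *s)       /* returns 1 if brackets match, 0 otherwise */
{
    char stack[1024];                /* assumes fewer than 1024 unmatched openers */
    int top = -1;
    for (; *s; s++) {
        char c = *s;
        if (c == '(' || c == '[' || c == '{') {
            stack[++top] = c;
        } else if (c == ')' || c == ']' || c == '}') {
            if (top < 0) return 0;
            char o = stack[top--];
            if ((o == '(' && c != ')') ||
                (o == '[' && c != ']') ||
                (o == '{' && c != '}'))
                return 0;
        }
    }
    return top == -1;
}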

  • Stack using Array: Simple O(1) operations as it is LIFO.
  • Stack using List: can be extended without being constrained by array size.
  • Stack using dynamic array: when the array is full, it is reallocated at twice the size and the older members are copied over.


QUEUES
All EnQ, DeQ, IsEmpty O(1) operations, FIFO (First In First Out) order followed.

  • Queue with linked list: EnQ at the tail, DeQ at the head; both are O(1) once head and tail pointers are kept.
  • Queue with array can be a nightmare: DeQ requires removal from the start. It is eased if read and write indices are maintained; the write index is the next location to write, and likewise the read index is the next location to read.
  • The above array implementation can use a circular array, but keep one element of buffer between write and read for proper operation. Thus the queue is considered full when (write+1) mod capacity == read.
  • The queue is empty when write == read.
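
A circular-array queue sketch along these lines (QCAP and the function names are illustrative; one slot is kept unused to tell full from empty; read and write start at 0):

#define QCAP 8                      /* capacity; one slot is kept unused */

typedef struct { int buf[QCAP]; int read; int write; } queue;

int q_empty(const queue *q) { return q->read == q->write; }
int q_full (const queue *q) { return (q->write + 1) % QCAP == q->read; }

int enq(queue *q, int v)            /* returns 0 on failure (full) */
{
    if (q_full(q)) return 0;
    q->buf[q->write] = v;
    q->write = (q->write + 1) % QCAP;
    return 1;
}

int deq(queue *q, int *v)           /* returns 0 on failure (empty) */
{
    if (q_empty(q)) return 0;
    *v = q->buf[q->read];
    q->read = (q->read + 1) % QCAP;
    return 1;
}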
TREES
A tree is either empty, or a node with a key and a list of child subtrees.
Keeping parent information (a pointer to each node's parent) helps many tree algorithms.

Many representations of trees are possible. For example, the syntax tree for the expression
3cos(7x-5):

        *
      /   \
     3    cos
           |
           -
         /   \
        *     5
      /   \
     7     x

A generic tree with root A:

          A
        / | \
       B  C  D
      / \
     E   F

Ancestor: parent, or parent of parent, etc. E.g. A and B are ancestors of E.
Descendant: child, or child of child, etc. E.g. B, C, D, E, F are descendants of A.
Sibling: nodes sharing the same parent. E.g. B, C, D are siblings.
Leaf: node with no children. E.g. E, F, C, D.
Height: number of nodes on the longest path from the node down to its farthest leaf. E.g. wrt A, the height is 3 (A -> B -> E).
Forest: Collection of trees.
   E.g. the tree rooted at B:
            B
          /   \
         E     F
   Or, the two single-node trees C and D.



Binary Tree:  A node has key, left, right and parent(optional) information.
Height(tree):
  if tree==NULL: return 0
  return 1 + Max(Height(tree->left), Height(tree->right))

Size(tree):
  if tree==NULL: return 0
  return 1 + Size(tree->left) + Size(tree->right)

Tree Traversal: DFS (In/Pre/PostOrder) and BFS (LevelOrder)
DepthFirst (InOrder): 
  if tree==NULL: return
  InOrder(tree->left)
  Print(tree->key)
  InOrder(tree->right)

DepthFirst (Pre-Order)
  if tree==NULL: return
  Print(tree->key)
  PreOrder(tree->left)
  PreOrder(tree->right)

DepthFirst(PostOrder)
  if tree==NULL: return
  PostOrder(tree->left)
  PostOrder(tree->right)
  Print(tree->key)

BreadthFirst(LevelOrder)
  if tree==NULL: return
  Q q
  q.EnQ(tree)
  while !q.Empty():
    node=q.DeQ()
    Print(node)
    if node->left!=NULL: q.EnQ(node->left)
    if node->right!=NULL: q.EnQ(node->right)

Sample Tree:

              Lion
            /      \
        Bear        Snake
       /    \      /     \
   Amber    Cat  Panda   Turkey
                        /      \
                      Dog      Whale

InOrder(DFS): Amber -> Bear -> Cat -> Lion -> Panda -> Snake -> Dog -> Turkey -> Whale
PreOrder(DFS): Lion -> Bear -> Amber -> Cat -> Snake -> Panda -> Turkey -> Dog -> Whale
PostOrder(DFS): Amber -> Cat -> Bear -> Panda -> Dog -> Whale -> Turkey -> Snake -> Lion
LevelOrder(BFS): Lion -> Bear -> Snake -> Amber -> Cat -> Panda -> Turkey -> Dog -> Whale


Priority Queue
Generalization of Q where each element has a priority and elements come out in order by priority.
Usage: C++ priority_queue, Java PriorityQueue, Python heapq
A plain array-based queue with its usual PushBack()/PopFront() operations is the special case where the priority is simply arrival order.

It has two basic operations, Insert() and ExtractMax(). Other operations can be Remove(i), GetMax() (return the max without changing the queue), ChangePriority(i, p).

E.g. schedulers, Dijkstra's shortest path, Prim's minimum spanning tree of a graph, Huffman coding to construct an optimal prefix-free encoding, HeapSort.

Approach:
Keep a sorted array to optimize ExtractMax(), but Insert() becomes linear: O(log n) to find the position via binary search, then O(n) to shift elements right.
PQ using unsorted array: O(1) Insert, O(n) ExtractMax
PQ using sorted array: O(n) Insert, O(1) ExtractMax
PQ using binary heap: O(log n) Insert and ExtractMax. GetMax (peeking at the top) is O(1).

A binary heap is a binary tree where, for each edge of the tree, the value of the parent is at least the value of the child.
The max-heap property: the value of each node is <= the value of its parent, so the maximum value is at the root.
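
A minimal array-based max-heap sketch showing O(log n) Insert (sift up) and ExtractMax (sift down); the fixed capacity and names are illustrative:

typedef struct { int a[1024]; int n; } heap;   /* fixed capacity, 0-based indexing */

void heap_insert(heap *h, int v)               /* O(log n): sift the new leaf up */
{
    int i = h->n++;
    h->a[i] = v;
    while (i > 0 && h->a[(i - 1) / 2] < h->a[i]) {
        int p = (i - 1) / 2;
        int t = h->a[p]; h->a[p] = h->a[i]; h->a[i] = t;
        i = p;
    }
}

int heap_extract_max(heap *h)                  /* assumes the heap is non-empty */
{
    int max = h->a[0];
    h->a[0] = h->a[--h->n];                    /* move last element to the root */
    int i = 0;
    for (;;) {                                 /* sift down until heap property holds */
        int l = 2 * i + 1, r = 2 * i + 2, big = i;
        if (l < h->n && h->a[l] > h->a[big]) big = l;
        if (r < h->n && h->a[r] > h->a[big]) big = r;
        if (big == i) break;
        int t = h->a[i]; h->a[i] = h->a[big]; h->a[big] = t;
        i = big;
    }
    return max;
}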


Disjoint Sets
For problems such as in a maze, is A reachable from B.
MakeSet(x): create singleton set {x}
Find(x): returns ID of set containing x. Find(x)=Find(y) if x and y are in same set, else not equal.
Union(x, y): Merge two sets containing x and y.

ID: It can be chosen as smallest element of a set. 
e.g. sets {10,5,3,20}  {6}  {1,2,8}

element:    1   2   3   5   6   8   10   20
smallest:   1   1   3   3   6   1    3    3

smallest[i]: stores smallest element in the set i belongs to.

MakeSet(x): smallest[x]=x
Find(x): return smallest[x]

Union(x, y):      O(n)-bottleneck
  x_id=Find(x)
  y_id=Find(y)
  if x_id==y_id: return
  m=min(x_id, y_id)
  for k from 1 to n:
    if smallest[k] in {x_id, y_id}: smallest[k]=m

PreProcess(maze):
  for each cell c in maze: MakeSet(c)
  for each cell c in maze: 
    for each neighbour n of c: Union(c, n)

IsReachable(A,B): return Find(A)==Find(B)


Solution: Use a linked list to improve Union(). Represent each set as a linked list and use the list's tail element as the ID of the set.
List1: 9->3->2->4->7
List2: 6->1->8
Union: 9->3->2->4->7   -> 6->1->8
+ Union is O(1) [provided we can get the tail of list1 and the head of list2 in constant time]
+ ID is well defined
- Find() is O(n).

Solution: Representing disjoint sets as rooted trees gives the best representation:

  • Represent each set as rooted tree
  • ID is root of tree
  • Use an array parent[1..n] where parent[x] is the parent of x, or x itself if x is a root.

MakeSet(x): parent[x]=x   //O(1)
Find(x):    //O(tree height)
  while x!=parent[x]: x=parent[x]
  return x;

How to merge trees: hang the shorter tree under the root of the taller one to keep the height in check. This is called the union by rank heuristic.
How to quickly find the height of a tree: keep an array rank[1..n] where rank[i] is the height of the subtree rooted at i.

MakeSet(x): parent[x]=x   rank[x]=0
Find(x): 
  while x!=parent[x]: x=parent[x]
  return x

Union(x, y):
  x_id=Find(x)
  y_id=Find(y)
  if(x_id == y_id): return
  if rank[x_id] > rank[y_id]: parent[y_id]=x_id;
  else: 
    parent[x_id]=y_id
    if rank[x_id] == rank[y_id]: rank[y_id]=rank[y_id]+1
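
The same operations written as plain C arrays (MAXN and the names rnk, union_sets are illustrative; no path compression, matching the pseudocode above):

#define MAXN 1000
int parent[MAXN], rnk[MAXN];

void make_set(int x) { parent[x] = x; rnk[x] = 0; }

int find(int x)                        /* O(tree height): walk up to the root */
{
    while (x != parent[x]) x = parent[x];
    return x;
}

void union_sets(int x, int y)          /* union by rank: hang shorter under taller */
{
    int xi = find(x), yi = find(y);
    if (xi == yi) return;
    if (rnk[xi] > rnk[yi]) parent[yi] = xi;
    else {
        parent[xi] = yi;
        if (rnk[xi] == rnk[yi]) rnk[yi]++;
    }
}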


Hashing

  • dict in python,
  • HashMap in Java, 
  • Compiler keyword lookup (keywords like for, while, if in C),
  • File systems (mapping names to mounts/files), are implemented using hash tables.
  • Banking password verification: the password is not sent over the network; instead a hash value is generated on the device and sent to the server, which compares it against the stored hash value.
  • Storage optimization in services like Dropbox (deduplication) uses hashing.

Problem:
To protect against a denial-of-service attack, where the same IP sends many requests to hog your server, we want to analyze the logs; a hash table can help.
Say to answer queries of last 1 hour:
- Keep count how many times IP address appeared in logs, 
- counter updated every second. (increment for new lines, decrement for old lines older than 1 hr)
- DS to map IP to counters
- An IP can be treated as a 32-bit number; use an array A of size 2^32 indexed by IP, where each entry is a counter of how many times that IP appeared.
 e.g. 69.171.230.68=69*2^24 + 171*2^16 + 230*2^8+68=1168893508

Issue: With direct addressing, AccessLastHour() and UpdateAccessList() are O(1), but we need a 2^32-entry array to keep the counters. That is a huge memory requirement, and it becomes far worse for IPv6 (2^128 addresses). Answer: hash functions.

A hash table is an implementation of a set or a map using hashing. Chaining is the most common way to implement a hash table.

Chaining: store a mapping from one type of object to another, e.g. file name to file on disk, EmpID to employee name; each cell of the table holds a list of the entries that hash to it.

Map from Set S to V has: hasKey(O), Get(O), Set(O, v), where O is element of S, v is element of V.

Going back to the IP problem:
Create an array of size m=8 (say). Each cell stores a list of (object O, value v) pairs.
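
A minimal chaining sketch for the IP-to-counter map (the bucket count M=8 follows the text above; hash_ip and the other names are illustrative placeholders):

#include <stdlib.h>
#include <stdint.h>

#define M 8                                        /* number of buckets */

typedef struct entry { uint32_t ip; int count; struct entry *next; } entry;
entry *bucket[M];                                  /* globals start out as NULL */

unsigned hash_ip(uint32_t ip) { return ip % M; }   /* placeholder hash for the sketch */

void increment(uint32_t ip)
{
    entry *e = bucket[hash_ip(ip)];
    for (; e; e = e->next)
        if (e->ip == ip) { e->count++; return; }
    e = malloc(sizeof *e);                         /* not seen before: prepend to the chain */
    e->ip = ip; e->count = 1;
    e->next = bucket[hash_ip(ip)];
    bucket[hash_ip(ip)] = e;
}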

SET
A set is a DS which has 3 operations: Add(O), Remove(O), Find(O).
Two ways:
1. Map from S to V = {true, false}; //Depending on if obj is in set
2. (efficient): Store just Object O instead of storing pair.
Set: unordered_set in C++, HashSet in Java, set in Python
Map: unordered_map in C++, HashMap in Java, dict in Python

Hash Functions
- Should be quick to compute.
- Should give different values for different objects as far as possible (few collisions).
- Must be deterministic: the same object always hashes to the same value, so it cannot be recomputed randomly on each call.

Example: Phone Book, store name against phone numbers.
1. Direct addressing: convert the number to an integer and keep a large array indexed by it to store the names. Lots of space is wasted, since memory grows exponentially with the number of digits.
1TB to store phone book of 12 digit phone number.
2. Chaining: Select hash function of cardinality m, create name array of size m, Name[h(int(phone))].

Start with a smaller hash table and use the dynamic-array idea: double its size when the load limit is hit.

Hashing INT

1. Choose a prime number p bigger than 10^n  [n = number of digits in the integer]
2. Choose the hash table size, m = 1000 (say)
Say h(x) = (ax+b) mod p = (34x+2) mod 1000019.
For x = 1482567, h(x) = (34*1482567+2) mod 1000019 = 406330,
and the index in the hash table is h(x) mod m = 330.
Here, a should be a random number in [1, p-1] and b a different random number in [0, p-1].
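
As code, a minimal sketch of this family of hash functions (the name hash_int is illustrative; 64-bit arithmetic is assumed wide enough for a*x+b with these values):

#include <stdint.h>

/* ((a*x + b) mod p) mod m, with p prime; a in [1,p-1], b in [0,p-1] chosen once at random */
unsigned hash_int(uint64_t x, uint64_t a, uint64_t b, uint64_t p, unsigned m)
{
    return (unsigned)(((a * x + b) % p) % m);
}

/* hash_int(1482567, 34, 2, 1000019, 1000) reproduces the example above */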

Hashing STRING
- The reverse of the phone book problem: given a name, find the phone number.
- We need to design a hash function for names.
Convert each character str[i] into integer using ASCII or UniCode. 
Choose big prime, p

PolyHash(S, p, x) 
  hash =0
  for i from len(S) − 1 to 0: 
    hash = (hash * x + S[i]) % p 
  return hash
E.g. for len(S)=3 the loop computes:
  i=2: hash = S[2] mod p
  i=1: hash = (S[2]*x + S[1]) mod p
  i=0: hash = (S[2]*x^2 + S[1]*x + S[0]) mod p
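
The same PolyHash as a small C function (long long is assumed wide enough for hash*x + S[i]; the name poly_hash is illustrative):

#include <string.h>

long long poly_hash(const char *s, long long p, long long x)
{
  long long hash = 0;
  for (int i = (int)strlen(s) - 1; i >= 0; i--)
    hash = (hash * x + s[i]) % p;     /* Horner's rule: s[0] ends up with the lowest power of x */
  return hash;
}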

Calculate Power: 
#include <stdio.h>
#define DEBUG(...)   /* change to: #define DEBUG printf   to enable the prints shown below */

float mypow(float x, int n) //n can be -ve int
{
   if(n==0)   { DEBUG("\tRetA 1\n"); return 1; }
   float val=mypow(x, n/2);
   if(n%2==0)
     { DEBUG("\tRetB %f\n", val*val); return val*val; }
   else
     if(n>0)   { DEBUG("\tRetC %f\n", x*val*val); return x*val*val; }
     else      { DEBUG("\tRetD %f\n", (val*val)/x);  return (val*val)/x; } //Required for negative power
}

To understand how recursion works here, enable prints and see below:
        RetA 1
        RetA 1
        RetA 1
        RetC 2.000000
        RetD 0.500000
        RetC 2.100000
        RetB 4.000000
        RetB 0.250000
        RetB 4.409999
        RetC 32.000000
        RetB 0.062500
2.1^2=4.409999
        RetB 1024.000000
2^-4=0.062500
2^10=1024.000000
Fibonacci
#include <stdio.h>
int fib(int n)  //Recursion without memoization
{
  if(n==0)  { printf(" A0");return 0;}
  else if(n==1) { printf(" B 1"); return 1;}
  else { int x=fib(n-1)+fib(n-2); printf(" C %d", x); return x;}
}
If you look at the traces below, calculating the next Fibonacci number repeats whole subcomputations (the trace of Fib(5) reappears inside the trace of Fib(6)). This recomputation can be avoided using Dynamic Programming (memoization); see the sketch after the traces.
Fib(5): 5,   B 1 A0 C 1 B 1 C 2 B 1 A0 C 1 C 3 B 1 A0 C 1 B 1 C 2 C 5
Fib(6): 8      B 1 A0 C 1 B 1 C 2 B 1 A0 C 1 C 3 B 1 A0 C 1 B 1 C 2 C 5 B 1 A0 C 1 B 1 C 2 B 1 A0 C 1 C 3 C 8
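
A memoized version as a sketch (the memo array and MAXN are illustrative; works for n < MAXN):

#define MAXN 64
long long memo[MAXN];                 /* 0 means "not computed yet"; n<=1 is handled before lookup */

long long fib_memo(int n)
{
  if (n <= 1) return n;
  if (memo[n]) return memo[n];        /* reuse the cached value instead of recomputing */
  return memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
}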

void fib_iterative(int n)  //Iterative version to save stack used between function calls
{
   int first=0, second=1, next, c;
   printf("\nInitial %d terms on fib are: ", n);
   for(c=0; c<n; c++)
   {
     if(c<=1)  next=c;
     else {  next=first+second; first=second; second=next; }
     printf("%d\t", next);
   }
}
Output: 
Initial 10 terms on fib are: 0 1       1       2       3       5       8       13      21      34            

Algorithm complexity:
O(n) represents upper bound. Θ(n) means tight bound. Ω(n) represents lower bound.
Naive recursive Fib: O(2^n) upper bound; Θ(F_n) more precisely.

More Theory
Amortized Analysis: Used to find the average worst-case time for operations that are mostly O(1) but occasionally cost up to O(n), such as a dynamic array insert when the capacity is doubled.
Amortized cost = Cost(n operations) / n

Aggregate Method:
Let c_i = cost of the i-th insert:
  c_i = 1 + x, where
    x = i-1  if i-1 is a power of 2 (the i-th insert triggers a resize and copies i-1 elements)
    x = 0    otherwise

Amortized cost = (sum of c_i for i = 1..n) / n = O(n)/n = O(1) per insertion; the worst-case cost of a single insertion is still O(n).

Banker's Method: (no code change, just analysis/mental accounting)
Charge extra for each cheap operation, save extra charge as token in DS.
Use tokens to pay for expensive operations.

Physicist's Method: define a potential Φ on the data structure; the amortized cost of an operation is its actual cost plus the change in Φ. Analogy: a ball taken up a hill stores potential energy, which is converted to kinetic energy when it rolls down.

Divide and Conquer Example
Multiply Long Integers
http://www.cs.umd.edu/~meesh/351/mount/lectures/lect10-long-int-multiply.pdf
