Sunday, May 2, 2021

System Design Interview

System Design interview is a prominent interview phase, which focuses on Architecture, Design and attention to details.

Key Points to Remember
  1. Always start with High level architecture, then Scaling, Database Design, API Design, Security, Monitoring & Logging and finally Benefits & Tradeoffs.
  2. Don't get into details prematurely
  3. Avoid fitting requirements to a set architecture in mind
  4. Keep it simple and avoid too many hacks/complications when solving.
  5. Critical reasoning and argument are key to a successful software design. Have justifications for the points you make. Don’t make points without thinking them through. Half-hearted attempts at solving problems (using buzz words) are frowned upon heavily.
  6. Be aware of the current solutions within the industry and best tech practices. 
  7. Know when to pick a solution vs. building something custom. Provide pros and cons for the solution.

System Requirements
  • What scope of the Design ? UI, Backend, Database ?
  • What will be bottlenecks of the system - different regions, geolocation.
  • How much consistency is required for the system ? Strong consistency or eventual consistency. 
  • Are there different types of items/products/objects ? 
  • Are there different rates / prices for each product/service ? Memory Estimate - Size of Data ?
  • Are there any delete, cancel, remove, terminate use cases ?
  • How many users ? How many transactions or requests per second ?
  • How much data is each user uploading and requesting ?
  • Depending on the users, total load, is distributed system with big data required ?
  • Using decimal - down the line there are floating point errors
  • Should split services into multiple micro services ?
  • Does the system needs more reads or writes ? Read Replica ? (When replica then require load balancer)
  • Separate payment option from payment calculation.
  • Volume of Data and Low Latency ?
  • Reliable and Security

Non-Functional Requirements:
  1. Availability
  2. Latency - Cache
  3. Scale - How many requests/users in a day
  4. Monitoring & Logging
  5. Load Balancing

API Design

HTTP Verb CRUD Entire Collection (e.g. /customers) Specific Item (e.g. /customers/{id})
POST Create 201 (Created), 'Location' header with link to /customers/{id} containing new ID. 404 (Not Found), 409 (Conflict) if resource already exists..
GET Read 200 (OK), list of customers. Use pagination, sorting and filtering to navigate big lists. 200 (OK), single customer. 404 (Not Found), if ID not found or invalid.
PUT Update/Replace 405 (Method Not Allowed), unless you want to update/replace every resource in the entire collection. 200 (OK) or 204 (No Content). 404 (Not Found), if ID not found or invalid.
PATCH Update/Modify 405 (Method Not Allowed), unless you want to modify the collection itself. 200 (OK) or 204 (No Content). 404 (Not Found), if ID not found or invalid.
DELETE Delete 405 (Method Not Allowed), unless you want to delete the whole collection—not often desirable. 200 (OK). 404 (Not Found), if ID not found or invalid.


API can be public endpoints or internal endpoints.
Before determining the APIs define the system entities (user, product, transaction objects etc), and their corresponding operations (add, fetch, upload, assign etc).

API - Domain/Resource/{Params}
  • API should have no side effects, it should only perform a single add/update operation as the name of the API indicates.
  • API Operation should be atomic
  • When the response for the API is large use pagination or fragmentation (breaking response into multiple tcp pieces).

API Gateway

API Gateway enables to create and access an HTTP or WebSocket API.
A link between the API Gateway and the ELB which is NOT exposed to the internet can be created using VPC Link (only works with Network Load Balancer).

To integrate your API Gateway REST API with a public Application Load Balancer, use API Gateway HTTP integration. For private Application Load Balancers, use API Gateway VPC link to first connect to a private Network Load Balancer. Then, use the Network Load Balancer to forward API Gateway requests to the private Application Load Balancer.


Relational versus NoSQL Database

Relational databases store data in rows and tables. They enable to perform join operations using SQL across different database tables. NoSQL databases are grouped into four categories: key-value stores, graph stores, column stores, and document stores. Join operations are generally not supported in non-relational databases.

Non-relational databases are preferred when:
  • The application requires super-low latency.
  • The data is unstructured, or we don't have any relational data.
  • We only need to serialize and deserialize data (JSON, XML, YAML, etc.). 
  • Need to store a massive amount of data.
  • NoSQL databases are generally easy to scale
In NoSQL databases, the partition key is used to compute a hash, which then selects the corresponding node to store the data. Partition key should be random, evenly distributed and be mostly unique. The second key for indexing is the sort key which defines the order in which the data is to be stored in the cluster nodes. For example DynamoDB has Sort key, ElasticSearch has Index Sorting and Cassandra has Clustering column as sort key. Data redundancy cannot be avoided generally in NoSQL databases.

MongoDB Versa Cassandra Versus DynomoDB

When low latency is required for read queries, then data must be aggregated during writes and no extra computation be done during reads.

Use ids for identification

​Database Design

Identify key objects, their identifying keys and their relationship mapping.
Data can be stored as individual events or as an aggregate data for a collection.

The most crucial aspect in designing a relational database is to identify the relationships among tables.
  • One to Many: The relationships were manager manages zero or more employees while an employee is managed by one (and only one) manager, or a customer may place many orders while an order is placed by one particular customer are examples of one-to-many relationships. A One-to-many relationship is represented by creating two tables for two entities and have the primary key of the "one"-end or the parent table be stored in multiple rows of the "many"-end or the child table. For example the primary key of the table Manager (managerId) is stored for each row in the employees table, and is known as foreign key. A foreign key of a child table is a primary key of a parent table, used to reference the parent table. For every value in the child table, there is one and only one row in the parent table.
  • Many to Many: The relationship for example were a customer's order may contain one or more products; and a product can appear in many orders. Apart from the two entity table, to support many-to-many relationship a third table needs to be created known as a junction table. The primary key of the junction table is combination of primary keys of both the two entities.
  • For example the primary key for OrderDetails table will the two columns orderID (primary key of Order table) and productID (primary key of Product table).
  • One to One: A one-to-one relationship is used to split the data into two tables, especially optional supplementary information which are many times empty. Hence for every row in the parent table, there is at most one row (possibly zero) in the child table.

The database design can be normalized by creating a new table for optional data using one-to-one relationship and splitting a large table into two smaller tables.

Entity Integrity Rule: The primary key cannot contain NULL. Otherwise, it cannot uniquely identify the row. For composite key made up of several columns, none of the column can contain NULL. Most of the RDBMS check and enforce this rule.

Referential Integrity Rule: Each foreign key value must be matched to a primary key value in the table referenced (or parent table).


Sharding

Database Sharing - for handling too many requests - based on region select one database.
Sharding - Read or Write Sharding, Vertical or Horizontal Sharding

Sharding provides high consistency and availability. Join across shards is extremely expensive. Fixed number of shards, hence less flexibility, can be alleviated using hierarchical sharding were a shard is broken up into multiple mini-shards and requests are directed to them using each shard's manager. Index can be created on each shards.

Horizontal partitioning is partitioning of data using a key which is an attribute of the data and allocate each partition to different servers. Vertical partitioning used columns to partition data.


Performance and ​Scalability

System Reliability versus Availability

Horizontal Versus Vertical Scaling

Horizontal Scaling: Adding multiple machines.
  • Needs Load Balancing
  • More Resilient
  • Network calls (RPC) 
  • Data inconsistency can occur
  • Scales effectively without limitations.
Vertical Scaling: Adding more power (CPU, RAM, etc.) to servers.
  • Single point of failure, It does not have failover and redundancy
  • Inter-process communication (faster than network)
  • Data is consist
  • Hardware limitations for scaling a single machine. It is impossible to add unlimited CPU and memory to a single server.

Load Balancer

A load balancer evenly distributes incoming traffic among web servers that are defined in a load-balanced set. Load balancer has public IP which is mapped to the domain name in DNS. For stateful services, every request from the same client must be routed to the same server, which is accomplished using sticky sessions in the load balancer. Load Balancer can manage traffic on different layers as below.
  • Layer 3 - IP address based
  • Layer 4 - DNS based
  • Layer 7 - Application Level

Load Balancing can route traffic based on below techniques.
  • Round Robin,
  • Remove traffic from machines which are offline
  • Hashing on IP address
  • Assign traffic to machines with least load

Cascading failure is a problem were a single failed instance in cluster enables load balancer to equally route its traffic to rest of the healthy instances, which in turn breaks capacity of other instances with added extra traffic, thus crashing other instances as well and cascading the capacity failures to more nodes until entire system goes down. In order to solve the problem, assign a compute capacity i,e. request/second to each node and have a queue for each node which accepts requests based on node capacity. 

Message Queue

A message queue is a durable component, stored in memory, that supports asynchronous communication. It serves as a buffer and distributes asynchronous requests. The producers create messages and publish them into the message queue. The consumers/subscribers connect to the queue and consume the messages from the queue. Message queue is preferred for decoupling to build scalable and reliable application.

Never use database as message queues, as polling of database for new records is expensive (long intervals).


Caching

Caching allows to serve content faster. It happens at different levels in a web application, from CDN (edge caching), Database caching natively used by every database, Browsers caching the static assets based on the cache expiry headers and Server caching (API caching) which is custom caching of data in a server application. Caching is used for heavy concurrent reads but less high write frequency.

The TTL (Time To Live) in cache is used when the data is updated frequently and the cache is required to be expired in regular intervals. The cache automatically gets deleted once the TTL interval has passed. Do not make the cache expiration date too short as this will cause the system to reload data from the database too frequently. Meanwhile, it is advisable not to make the expiration date too long as the data can become stale. Multiple cache servers across different data centers are recommended to avoid single point of failure.

Below are some of the different caching strategies. Read through cache strategy is most popularly used.

Lazy loading (Cache aside): Lazy loading strategy keeps the cache updated through the application asynchronously. When the application requests data, it first makes the request to the cache. If the data exists in the cache and is not expired, then the data is returned to the application, or else the application requests the data from data store. After receiving the data from data store, the application writes the data into cache.

Read through cache: The data will be read through the cache every time. First, check whether data exists in the cache. If it does, read from the cache and send the response. If it doesn’t, the cache will be updated from the datasource. Cache will then send the response back to the client.

Write-through cache: The write-through strategy adds or updates the data in the cache whenever data is written to the database. Data is first written to the cache and then to the main database.

Write behind cache: Cache will send the response before writing to the datastore. It writes to the datastore asynchronously depending on the load. It's a special type of write through cache, were the cache doesn’t wait for the datastore to acknowledge whether data is stored.

Refresh ahead cache: Refresh ahead cache is used to refresh the data before it expires. It happens asynchronously so the end user won’t see any issues.

Pre-Computation

Content Delivery Network (CDN)

CDN is a network of geographically dispersed servers used to deliver static content. When an image (or video) is request by the user using an URL, the domain of the URL is provided by CDN. If the CDN does not have the image in cache, it requests the image from origin server which return the images with TTL header describing how long the image should be cached. The CDN caches the image until TTL expires and returns the image to the user. In case of temporary CDN outage, clients should be able to detect the problem and request resources from the origin. File objects can be invalidated in CDN before expiration. E.g. Akamai


​High-level Architecture

Add images media to Object Storage like S3, use CDN for caching.

Http Polling - Repeatedly ask server if new info is available. Lot of messages and low latency
Long Polling - Send request to server, server holds on to the request until server has a message.
               Require to maintain an open connection, if lot of messages are coming so need to send many requests. Good for notifications.

WebSockets - maintain a duplex (two-way) open connection which allows to send / receive messages.
Messaging server - use pub-sub pattern to publish/read messages.
User Endpoints (e.g. Get Videos) - CDN - Akamai

Metrics, Logging and Monitoring for Auditing

Different components of the system should be decoupled so that they can be scaled independently.

In order to avoid a single point of failure in distributed systems, have multiple nodes, master-slave architecture for databases, multiple load balancers routed to by DNS, and instances in multiple regions.

Database replication is implemented using master/slave relationship between the original (master) and the copies (slaves). A master generally only supports write operations while slave gets copies of the data from master and only supports read operations. If the master database goes offline, a slave database will be promoted to be the new master. If only one slave database is available and it goes offline, read operations will be directed to the master database temporarily. A new database server will replace the old one. Whenever strong consistency is required read locks are used on replicas.

To scale the web tier horizontally, the web tier should be made stateless. A good practice is to store session data in the persistent storage such as relational database or NoSQL. Each web server in the cluster can access state data from shared data store. A stateless system is simpler, more robust, and scalable.

It is recommended to Pre Scale (or Auto scaling) in order to handle an upcoming big event traffic. Also rate limiting of the traffic is helpful. Autoscaling means adding or removing web servers automatically based on the traffic load.

GeoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of a user. The requests are geo-routed with the closest data center to the user.

Different regions could use different local databases or caches, were traffic might be routed to a data center in case of failover. Data should be asynchronously replicated across multiple data centers.

Circuit Breaker
Single point of failures
Reverse Proxy

Scalable = Partitioning
Reliable = Replication (Master / Slave) and checkpointing
Fast = In-memory

​Trade-offs


Use Cases

Uber - Use Google S2 API for location tracking and finding cabs in 1 mile radius.

Dropbox - (Indexer, Chunker, Watcher) on client, Metadata cached, Sync service, versioning

Netflix - Videos are stored based on resolution and formats. Transcoding is used to validate videos for errors, convert video to different formats and resolutions and optimized them for different devices. In transcoding, a single 50 GB video file -> break into chunks -> into queue -> workers process chunks -> merge into S3. A Transcoder performs the transcoding processing using concurrent async workers. Videos are placed in different regions to avoid latency due long network paths.

Chat (WhatsApp): Web Sockets are used for asynchronous messaging between client and server and vice versa. It keeps the connection open.

XMPP Protocol: Push messages to the clients using i.e. Peer to Peer communication.

Instagram: Use Distributed File System (S3 Bucket) to stores images and videos. Anytime storing images/videos remember to also store corresponding meta-data.

Task Scheduling: Scheduling tasks to execute immediately is straight forward queue approach, but to execute the task at a specific time, would require constant polling at periodic time intervals. Also it is important to clarify if the jobs are data heavy or processing heavy and are either lightweight or heavyweight. The tasks are submitted to a distributed message queue (Kafka) in order to handle huge requests (10 million jobs/day) and provide fault tolerant. A centralized architecture which involves Zookeeper and worker nodes, which employ MapReduce on top of HDFS to perform tasks.

A distributed architecture in which a group of workers grab tasks from a task queue to process, provides a fully parallel, non-blocking solution for task scheduling problem. If the tasks are too small/inexpensive then such a system would add more cost to parallelize and communication between the nodes. The message queue system like Kafka or RabbitQueue (or table in RDB) can also be used to persist the queue.

Hadoop MapReduce is designed to solve such task scheduling problem. In MapReduce the Master node maintains a task queue (a.k.a. JobTracker) for all the workers in its cluster, and has a standby node to consistently replicate JobTracker's state. When a worker finishes its work, it retrieves a new unit of work (Map operation) from the Master. The Master does not assign more than one task per worker and hold onto more work until some Worker become idle who can immediately accept the work. 


Saturday, April 3, 2021

Coding Interview - Key Algorithms and Techniques

Coding Interview is ubiquitous now across most of the companies and primarily used to filter out developers for the interview process. The argument that coding interview is an ideal test to determine the quality of the developer is very much debatable. Candidates fail the coding interview not because they are necessarily bad programmers/developers but because they are not trained and prepared for competitive programming. Competitive programming techniques are rarely used in real life production code, but are vital to crack the coding interview problems. Below are some of the key points, important algorithms and vital techniques to do better in competitive programming.


Key Points to Remember
  • Always check for edge cases, e.g. number is zero or negative (also one in some cases), array or string is empty or null etc.
  • Always check for missing counter++ or counter-- in while loops before running code.
  • Always check the index boundary first in any (if/while) condition, to avoid index out of bounds exception with other conditions in the if/while statement.
  • Anytime incrementing the array index, check if it is above the array length.
  • When looping a matrix, we first loop to matrix.length, then matrix[i].length. Avoid while loops for matrix. When using while loops over a matrix, don't forget to reset the outer counter to 0. 
  • Every iterative or recursive solution should begin with the check for the exit case.
  • Avoid preemptive checks before the recursive method call, as this adds duplicate logic, complicates the code and makes it harder to debug.
  • To find the smallest amount of integers from the given array which total to the target sum, avoid the misconception of selecting the largest numbers first, it doesn't work always.
  • When combining two keys into a single key string for a hash map, always have a separator between the complex two keys.
  • While working with linked lists and updating the Head, Tail or Current node, always remember that "tail.next = newNode" is not enough, also need to point the tail to the new node "tail = newNode".
  • Always prefer HashSet instead of HashMap if logic requires to select elements matching a given criteria, ignoring the count.
  • Anytime when its required to store the index of the elements then always use HashMap instead of ArrayList.
  • Always strive for better performance even at the cost of extra memory, as memory is cheap today.

Recursion to Iteration

Any recursive algorithm can be rewritten as iterative and vice versa. Iterative algorithm is generally preferred over recursive algorithm as recursion consumes limited stack memory as compared to larger heap space. A Recursion can be converted into Iteration by having a while loop and a data structure which can vary from a simple string or an array, but preferably a stack. If recursion has two or more recursive calls immediately following one another, then to preserve the order of the calls, we have to add them in the reverse order to the stack. The stack contains the variable and also holds the return value. If the stop condition is reached, check whether the stack is empty. If the stack is not empty, pop() and jump to the appropriate variable (label). If the stack is empty, return the result.

class Node {
    int data;
    Node left, right;
}
 
public void preorderRecursive(Node root) {

    if (root == null) {
       return;
    }
    
    System.out.print(root.data + " ");
    preorder(root.left);
    preorder(root.right);
}


public void preorderIterative(Node root) {

    if (root == null) {
        return;
    }
    
    Stack<Node> stack = new Stack();
    stack.push(root);
    
    while (!stack.empty()) {
        Node current = stack.pop();
    
        System.out.print(current.data + " ");
    
        if (current.right != null) {
           stack.push(current.right);
        }
    
        if (current.left != null) {
           stack.push(current.left);
        }
    }
}

Time Complexity

While calculating the time complexity (Big-O) it is very important to not ignore the copying of array, string and other manipulations within the loops.

Below are few examples of recursive calls and their corresponding time complexity.

Time Complexity : O(n)
recursive1(int n) {
   if(n <= 1)
      return;

   recursive1(n - 1);
}

Time Complexity : O(2^n)
recursive2(int n) {
   if(n <= 1)
      return;

   recursive2(n - 1);
   recursive2(n - 1);
}

To determine the time complexity for recursion were the size of the list is divided into smaller list by some factor, then
n / (factor pow k) = 1, which gives k = log factor (n). 

Here n is size of the list, and k is the levels of recursion. 
If factor is 2, then n / (2 pow k) = 1, which gives k = log2(n)


Space Complexity

Avoid including the size of the result returned in deterring the space complexity.
Space complexity of recursive function, include any of the stack space taken up by function calls.


ASCII & Unicode

ASCII defines 128 characters, which map to the numbers 0–127. Unicode defines (less than) 2 power(21) characters, which, similarly, map to numbers 0 – 2 power(21). Unicode is a superset of ASCII, and the numbers 0–127 have the same meaning in ASCII as they have in Unicode. For example, the number 65 means "Latin capital 'A'". ASCII uses an 8-bit encoding while Unicode uses a variable bit encoding i.e. 32, 16, and 8-bit encodings. Because Unicode characters don't generally fit into one 8-bit byte, there are numerous ways of storing Unicode characters in byte sequences, such as UTF-32 and UTF-8.


String

Palindrome is a sequence of characters which reads same backward as forward. If the string has odd length like "abcba" then it has a unique character at the center, otherwise for even length like "abba" all character count is even. Hence all characters should occur an even number of times, except one. If more than one characters appear an odd number of times, then the string cannot be a palindrome.

To check if the String is a palindrome: Take two pointers i pointing to the start of the string and j pointing to the end of the string. Keep incrementing i and decrementing j while i < j and at every step check whether the characters at these pointers are same or not. If not then the string is not a palindrome else it is.

Two Pointer Approach is used for the string problems such as to reverse a string/array. In such case we have one pointer at start, while another pointer at the end, each moving in opposite direction and swapping its corresponding values.

Problems which require to find a substring or subarray with certain attributes (especially repeating attributes) and x number of changes, then it's a sliding window problem. Sliding window also has two pointers, initially both are at start, then the second pointer moves forward, while first is still at start. Once the second pointer moves to position when certain attributes are not satisfied, then depending on the problem the first pointer moves by one or other location.


Arrays

Never use switch/if-else statements for long list of mappings for example character to count, number to word etc. HashMap can be used in such cases. But always prefer to use arrays such that index is the code (character or number), while element is the value.

Always try to sort the array of the options, when we get to the point that the option is greater than the required value then we can break the loop, skip rest of the higher options, since the array was sorted.
int[] result = new int[26];
for(char c: someString.toCharArray()) {
    result[c-'a']++
}

Given a fixed length array which can have 2 possible values, either zero or one (vacant or empty), then total possible unique combinations is 2^ArrayLength. If 10^9 operations are performed on the array, the total combinations of values produced are within 2^ArrayLength with cycles. Hence the Nth operation value can be found by N % 2^ArrayLength, and finding the corresponding combination value.


Quick Sort Implementation: Always select the last element in the list as pivot. Swap the elements less than pivot, with the elements from the start of the list. Finally move the pivot element from the end of the list to the position after the last swapped element which was smaller than pivot. Now the list is partitioned, with all elements before pivot are smaller than the pivot, while elements after the pivot are larger than pivot. Now call sort method recursively on the two lists excluding the pivot element, to sort them. Best case performance is O(n log n), while worst case performance is O(n^2).

private int partition(int[] array, int low, int high) {
     
    int pivot = array[high];     
    int i = low;
 
    for(int j = low; j <= high - 1; j++) {
         
        if (array[j] < pivot) {
        
            swap(array, i, j);
            i++;
        }
    }

    swap(array, i, high);
    return i;
}
 
private void quickSort(int[] array, int low, int high) {
    if (low < high) {
        int pi = partition(array, low, high); 
        quickSort(array, low, pi - 1);
        quickSort(array, pi + 1, high);
    }
}

Number Division

If given an integer and asked to convert it to words or reverse a number or add/remove digits etc, then NEVER convert it to char array.

int reversed = 0, pop;

while(x != 0) {
  pop = x % 10;
  x /= 10;
  reversed = (reversed * 10) + pop
}

Below is the key logic to convert number to words, refer the entire code.
int i = 0;
String words = "";
    
while (num > 0) {
   if (num % 1000 != 0) {
    	words = numberToWordHelper(num % 1000) + THOUSANDS[i] + " " + words;
   }
   num /= 1000;
   i++;
}

private String numberToWordHelper(int num) {
    if (num == 0)
        return "";
    else if (num < 20)
        return LESS_THAN_20[num] + " ";
    else if (num < 100)
        return TENS[num / 10] + " " + numberToWordHelper(num % 10);
    else
        return LESS_THAN_20[num / 100] + " Hundred " + numberToWordHelper(num % 100);
}


Binary Search

Binary Search Recrusive: O(log n)
public int binarySearchRecursive(int[] sortedArray, int key, int low, int high) {
    int middle = (low + high) / 2;
        
    if (high < low) {
        return -1;
    }

    if (key == sortedArray[middle]) {
        return middle;
    } else if (key < sortedArray[middle]) {
        return binarySearchRecursive(sortedArray, key, low, middle - 1);
    } else {
        return binarySearchRecursive(sortedArray, key, middle + 1, high);
    }
}

Binary Search Iterative
public int runBinaryIterative(int[] sortedArray, int key, int low, int high) {
    int index = Integer.MAX_VALUE;
    
    while (low <= high) {
        int mid = (low + high) / 2;
        if (sortedArray[mid] < key) {
            low = mid + 1;
        } else if (sortedArray[mid] > key) {
            high = mid - 1;
        } else if (sortedArray[mid] == key) {
            index = mid;
            break;
        }
    }
    return index;
}

Anytime in the problem it is mentioned to "Find the smallest number for ...", then start with number = 2, and use (number << 1) or (number * 2) to increment rather than number++, so that number is incremented as 2,4,8,16,32,64,128 etc. Then find the midpoint between the previous number (failing the condition) and the current number, and use binary search logic to determine the exact smallest number with some condition.


Linked List

Array element shifts are very expensive especially for sort and similar operations. Hence Linked List is preferred in such case. Rather than shifting and swapping elements, two different linked lists are created to maintain elements less than and elements greater than or equal to.

The Two pointer approach works well for LinkedList problems.
  • Have one slow pointer moving by (i+1), while another fast pointer move by (i+2).
  • Have one slow pointer point to the first element while the another fast pointer move over the rest of the elements.

Calculating the mid point of the list is frequently used for quick sort or similar approaches.
right - left = 2(midpoint - left)
midpoint = left + (right - left)/2


Stack and Queue
  • Implementing stack requires a variable to keep track the top of stack. If the top of the stack is null then the stack is empty.
  • Implementing queue requires 2 variables to track the front and the end of the queue. When front (or last) is null then queue is empty, and elements are added to the end/last node of the queue.

Monotonic Stack

A monotonic stack is a stack of elements that are incremented or monotonically decremented on the stack and can only be manipulated at the top of the stack. Monotonic stack maintenance is O(n) time complexity, all elements will only enter the stack once. Before the element is added to the stack, the element that destroys the monotonicity at the top of the stack is deleted. It mainly used to solve the problems to find the Next Greater Element or range queries in an array.

Whenever we see a number x greater than stack.peek() we pop all elements less than x. For all the popped elements, their next greater element is x.

public int[] dailyTemperatures(int[] T) {
    int[] ans = new int[T.length];
    Stack<Integer> stack = new Stack<>();

    for (int i = 0; i < T.length; i++) {
        while (!stack.isEmpty() && T[i] > T[stack.peek()]) {
            ans[stack.peek()] = i - stack.pop();
        }
        stack.push(i);
    }
    return ans;
}


public int[] nextGreaterElement(int[] findNums, int[] nums) {
    Map<Integer, Integer> map = new HashMap<>(); // map from x to next greater element of x
    Stack<Integer> stack = new Stack<>();
    for (int num : nums) {
        while (!stack.isEmpty() && stack.peek() < num)
            map.put(stack.pop(), num);
        stack.push(num);
    }
    for (int i = 0; i < findNums.length; i++)
        findNums[i] = map.getOrDefault(findNums[i], -1);
    return findNums;
}

Trie

Trie is a tree data structure often used for storing characters representing collection of strings. If two strings have a common prefix then they will have the same ancestor in this tree. Trie is used for word validations, find all the words, used for prefix based search.

Initially TrieNode has empty root node with empty map, which is the current node. In order to store a string into Trie, we scan each its character, check if it is present in the Map of the current node. If character is present in the map, then we jump to the node using the reference value for the character key in the map, making it the current node. If character is not present then add the character to the map of the current node, with the value being reference to the newly created instance of TrieNode, which now becomes the current node for the next character. For the last character of the string the isEndOfWord flag is true. Finally, all the leaf TrieNode have empty maps. Time complexity is O(mn), were m is average length of string and n is total strings to be added.

In prefix based search to find if the string starting with prefix exists, we check the map for prefix character, jump to next node referenced, until all prefix characters are present in corresponding node's map. The full word check also works the same, only we check if the last character's TrieNode has isEndOfWord true.
private final TrieNode root;

public Trie() {
    root = new TrieNode();
}

public void insert(String word) {

    TrieNode current = root;
    
    for (int i = 0; i < word.length(); i++) {
    
        char ch = word.charAt(i);
        TrieNode node = current.children.get(ch);
        
        if (node == null) {
            node = new TrieNode();
            current.children.put(ch, node);
        }
        current = node;
    }
    current.endOfWord = true;
}

public boolean search(String word) {

    TrieNode current = root;
    
    for (int i = 0; i < word.length(); i++) {
    
        char ch = word.charAt(i);
        TrieNode node = current.children.get(ch);
        
        if (node == null) {
            return false;
        }
        current = node;
    }
    return current.endOfWord;
}

Disjoint Set

A disjoint set is a group of sets where no item can be in more than one set. It is also called a union–find data structure as it supports union and find operation on subsets. 

Find operation returns the group the element belongs to. It determines in which subset a particular element is in and returns the representative of that particular set. An item from this set typically acts as a "representative" of the set.

Union operation merges two different subsets into a single subset, and the representative of one set becomes representative of another.

The disjoint–set also supports one other important operation called MakeSet, which creates a set containing only a given element in it.

In order to create Union Find, we first construct a bijection or a mapping between the objects and the integers in the range [0, n). We randomly assign a mapping between the objects and the integers which is stored in a HashMap. Then construct an array which has values as indexes, were each index is associated with an object using the previous HashMap mapping. Initially each object referenced by the index in the array is a root node. As union operation e.g. Union(A,J) are executed, the objects A and J would have parent-child relationship (Parent A -> Child J), such that the index which mapped to object J is now updated to index of object A in the array. Ideally either object passed in union operation can be the parent. As this process continues the size of each group increases. If a union operation has to be performed on two objects from different group, then first the root node of the smaller group is determined and then its corresponding index is updated with the index of the root node for the larger group.

In order to find which component a particular element belongs to find the root of that component by following the parent nodes until a self loop is reached. The time complexity increases to find the root node as each node points to its parent which ultimately points to the root node along the chain. To enhance the performance we use path compression were each child object in the group now points to the root node of the group.

1. The first way, called union by rank, is to always attach the smaller tree to the root of the larger tree. Since it is the depth of the tree that affects the running time, the tree with a smaller depth gets added under the root of the deeper tree, which only increases the depth of the depths were equal. Single element trees are defined to have a rank of zero, and whenever two trees of the same rank r are united, the result has the rank of r+1. The worst-case running-time improves to O(log(n)) for the Union or Find operation.

2. The second improvement, called path compression, is a way of flattening the tree’s structure whenever Find is used on it. The idea is that each node visited heading to a root node may as well be attached directly to the root node; they all share the same representative. To effect this, as Find recursively traverses up the tree, it changes each node’s parent reference to point to the root that is found. The resulting tree is much flatter, speeding up future operations not only on these elements but on those referencing them, directly or indirectly.

import java.util.HashMap;
import java.util.Map;
 
class DisjointSet {

    private Map<Integer, Integer> parent = new HashMap();

    // stores the depth of trees
    private Map<Integer, Integer> rank = new HashMap();
 
    public void makeSet(int[] universe) {

        // create n disjoint sets (one for each item)
        for (int i: universe) {
            parent.put(i, i);
            rank.put(i, 0);
        }
    }
 
    // Find the root of the set in which element k belongs
    public int Find(int k) {

        // if k is not the root
        if (parent.get(k) != k) {
            // path compression
            parent.put(k, Find(parent.get(k)));
        }
 
        return parent.get(k);
    }
 
    // Perform Union of two subsets
    public void Union(int a, int b) {

        // find the root of the sets in which elements x and y belongs
        int x = Find(a);
        int y = Find(b);
 
        // if x and y are present in the same set
        if (x == y) {
            return;
        }
 
        // Always attach a smaller depth tree under the root of the deeper tree.
        if (rank.get(x) > rank.get(y)) {
            parent.put(y, x);
        }
        else if (rank.get(x) < rank.get(y)) {
            parent.put(x, y);
        }
        else {
            parent.put(x, y);
            rank.put(y, rank.get(y) + 1);
        }
    }
}

Trees

A binary search tree is a binary tree in which every node fits a specific ordering property: all left descendents <= n < all the right descendents.

Graphs

Adjacency List is the most common way to represent a graph. Every vertex (or node) stores a list of adjacent vertices. In an undirected graph, an edge like (a, b) would be stored twice: once in a's adjacent vertices and once in b's adjacent vertices.
class Graph {
   public Node[] node s;    // HashMap<Node, LinkedList<Node>> adjacencyMap;
}

class Node {
   public String name;
   public Node[] children;   // Not required when adjacencyMap is used in Graph
   public boolean visited;    // For BFS
}

Adjacency matrix is an NxN boolean matrix (where N is the number of nodes), where a true value at matrix[i][j] indicates an edge from node i to node j. (Can also use integer with Os and 1s). In an undirected graph, an adjacency matrix will be symmetric.


Topological Sorting

Topological sorting of vertices of a Directed Acyclic Graph is an ordering of the vertices v1,v2,...,vn in such a way, that if there is an edge directed from vertex vi towards vertex vj i.e. (vi -> vj), then vi comes before vj. Hence topological sort algorithm takes a directed graph and returns an array of the nodes where each node appears before all the nodes it points to. A cyclic graph do not have any valid topological orderings. Topological sort is generally used for problems involving ordering of dependent items.


We implement topological sort using a boolean array of visited nodes, and a stack to maintain all nodes in topological sorted order. To find topological sort, we start with any node. If the current node is not visited, then we add the node as visited and explore children of the node, adding them first to the visited array. If node has no children i.e. it is completely explored then the node is added to the stack and we go back to the parent node. We continue visiting all the children of current node until all of them are visited. Finally return the topological sort by popping the nodes from the stack.
public class Graph {

    private int vertices;
    private List<List<Integer> > adjacencyList;

    Graph(int v) {
        vertices = v;
        adjacencyList = new ArrayList<ArrayList<Integer> >(v);
        for (int i = 0; i < v; ++i) {
            adjacencyList.add(new ArrayList<Integer>());
        }
    }

    public void addEdge(int v, int w) { adjacencyList.get(v).add(w); }

    private void topologicalSort(int v, boolean[] visited, Stack<Integer> stack) {

        visited[v] = true;

        for (Integer i : adjacencyList.get(v)) {
            if (!visited[i]) {
                topologicalSort(i, visited, stack);
            }
        }

        stack.push(v);
    }

    public void topologicalSort() {
    
        Stack<Integer> stack = new Stack<>();
        boolean visited[] = new boolean[vertices];

        for (int i = 0; i < vertices; i++) {
            if (!visited[i]) {
                topologicalSort(i, visited, stack);
            }
        }

        // Print contents of stack
        while (stack.empty() == false) {
            System.out.print(stack.pop() + " ");
        }
    }
}


Depth First Search (DFS)

In Depth-first search (DFS), we start at the root (or another arbitrarily selected node) and explore each branch completely before moving on to the next branch. That is, we go deep first before we go wide. It is simpler and mostly used to visit every node in the graph.
public void dfs(int start) {
    boolean[] isVisited = new boolean[adjVertices.size()];
    dfsRecursive(start, isVisited);
}

private void dfsRecursive(int current, boolean[] isVisited) {
    isVisited[current] = true;
    visit(current);
    for (int dest : adjVertices.get(current)) {
        if (!isVisited[dest])
            dfsRecursive(dest, isVisited);
    }
}

Whenever asked for permutations or combinations then typically it can be solved by DFS with recursion. Simulate taking and not taking the current element in the list.

findCombinations(int[] array, index, target, List<Int> current, List<List<Int>> combinations) {

  if(target == 0) {
    combinations.add(current);
    return;
  }

  if(target < 0) {
    return;
  }

  for(int i = index; i < array.length; i++) {
    if(i == index || array[i] == array[i-1]) {
      current.add(array[i])
      findCombinations(array, i+1, target - array[i], current, combinations);
      current.remove(array.size() - 1)
    }
  }
}

Breadth First Search (BFS)

In Breadth-first search (BFS), we start at the root (or another arbitrarily selected node) and explore/visit each neighbor before going on to any of their children. That is, we go wide before we go deep. It is used o find the shortest path. It is usually implemented using an iterative solution with a queue.
public void breadthFirstSearch(Node node) {

    if (node == null)
        return;

    LinkedList<Node> queue = new LinkedList<>();
    queue.add(node);                                 // Add first Node

    while (!queue.isEmpty()) {
        Node currentFirst = queue.removeFirst();

        // If node already visited/added to queue then skip
        if (currentFirst.isVisited())
            continue;

        // Mark the node as visited
        currentFirst.visit();
        System.out.print(currentFirst.name + " ");

        LinkedList<Node> allNeighbors = adjacencyMap.get(currentFirst);

        if (allNeighbors == null)
            continue;

        for (Node neighbor : allNeighbors) {
            if (!neighbor.isVisited()) {
                queue.add(neighbor);
            }
        }
    }
}

Dynamic Programming

To solve any problem using Dynamic Programming the problem should have an optimal substructure (allows recursion) and overlapping subproblems. Any combination problem can be solved by Dynamic Programming.

Dynamic programming problems can be solved by either using Recursion (Top Down) and Memoization, or Tabulation (Bottom Up). In Top-Down approach, the problem is implemented using recursion and memoization achieved by adding hash map to store the duplicate computation. In the Bottom-up approach, we start the transition from our base state i.e DP[0] and follow our state transition relation to reach our destination state DP[n].

Below are important steps in solving a dynamic programming problem.
  • Identify the changing parameters and determine the number of subproblems.
  • Visualize and represent the given problem with a tree like structure on a drawing board with root as the input, children as sub-problems, edges as each combination steps and the leaves the end result for selected branch combination. This gives clear insight and understanding of the problem.
  • From the above dynamic programming tree, if the height of the tree is n and each node has m children then the total time complexity is O (m^n), and the space complexity is the height of the tree i.e. n. If the tree were to be a binary tree were m is 2, then time complexity becomes O(2^n).
  • Clearly express the recurrence relation.

Tabulation Approach
  • Create a table with the length equal to the length of the input, plus one for the base case. Hence the length of array is target input value + 1, such that array[target-input] is the last value. The input which changes throughout the problem is used to determine size of the array for tabulation. Generally based on the type of the answer to be returned, that corresponding type is used as values for the array. If there are two target inputs then its two dimensional array. 
  • Initialize the array to default values, for e.g. numbers are initialized to 0 while boolean to True/False.
  • Decide the trivially smallest input and add the solution for this smallest subproblem. Typically base case is added for index 0 (and 1) when either the number is 0 or char is empty etc.
  • Every index of the array corresponds to the subproblem input to the problem function.
  • Iterate through the table and fill in further positions based on the values in the current position.
  • Determine the logic for filling the positions can be based on the recursive version of function.

For example, the coin change problem requires us to determine the fewest number of coins needed to make up the required amount. First we determine the smallest number of coins to make up 0 amount, then 1 amount, then N-1 amount finally N amount. The logic can be represented by below pseudo code.
for 0 to N amount                    // --- i
   for each coin in coins[]          // --- j
      if(coin value is less then amount) then we select the coin
         df[i] = Math.min(dp[i], 1 + dp[i - coins[j]])




Thursday, April 1, 2021

Coding Interview - Java for Competitive Programming

Competitive programming and Coding interviews requires candidates to have strong command on any one programming language to translate the problem solution to code. Ideally it would be good to learn multiple programming languages, but because the "Language Agnostic" interview policy adopted by many tech companies, it is generally fruitless to spend time and effort to master multiple languages instead of focusing on one language & algorithms.

Java language is predominantly used by many for coding interviews due to its simple syntax and widely available help online. Below are some of the commonly used code, functions, hints, tricks which come handy during coding interview.

It is very important to note that different Java Online IDE's used for the interviews, which have different level of support for various standard library methods, hence be prepared to use alternative API's in order to carry out a given operation. Further in some interviews the interviewee is not even provided a java compiler, but just a simple text editor (Codepad). The interviewee is expected to run and debug the code in mind to determine correctness of the code which can be very challenging.

Basics

The default value of char primitive type variable is '\u0000'.

Swapping two variables without temp variable
a = a + b
b = a - b
a = a - b

Alternative approach to swap using XOR operation
x = x ^ y;
y = x ^ y;
x = x ^ y;

The break statement can be used to exit from a single loop. To break from multiple loops, use return statement or use break with a label for the outer loop as below.
outerloop:
  for (int i=0; i < 5; i++) {
    for (int j=0; j < 5; j++) {
        if (i * j > 6) {
            System.out.println("Breaking");
            break outerloop;
       }
        System.out.println(i + " " + j);
    }
  }

Pass By Value

Arguments in Java are always passed by value regardless of the original variable type. Each time a method is invoked, a copy for each argument is created in the stack memory and the copy version is passed to the method. A copy of variable is created in the stack for Primitive type. For Object type, a new reference or pointer is created inside the stack memory, which points to the actual object data. Hence any change to the reference inside the method is actually changing the reference of the copies and not the original references. When defining an arraylist or any collection in java, a reference is created inside the stack that points to multiple objects inside the heap memory. When the list is updated in the called method, then the original list reference can see the modification, since both references are pointing to the same object in memory.


String or Character operations

String Tokenization is used to process each token string one at a time.
import java.util.StringTokenizer;

StringTokenizer tokenizer = new StringTokenizer(str, ",");

while (tokenizer.hasMoreElements()) {
   tokens.add(tokenizer.nextToken());
}

The split() method of String is used to tokenize string and store result in array.
String[] info = someString.split("\\s+");            // split by space
String[] fragments = info[1].split("\\.");           // split by dot

Each alphabet character has a numeric value, for example numeric value of alphabet 'a' is 10.
Character.getNumericValue('a')

The substring() method returns a new string that is a substring of given string based on the passed indexes.
String substring(int beginIndex)
String substring(int beginIndex, int endIndex)

To reverse a String use the StringBuilder's reverse method as below.
new StringBuilder("String to reverse").reverse().toString();

Given a string containing a prefix word, return remaining substring.
if(string.startsWith(word)) {
   String remainingString = string.substring(word.length());
}

Math Library

The Math library is Java has very useful methods for performing any mathematical operations.
  • Math.abs(num): Returns the absolute value.
  • Math.ceil(num): Returns the smallest integer that is greater than or equal to the argument
  • Math.floor(num): Returns the largest integer that is less than or equal to the argument.
  • Math.max(a, b) and Math.min(c, d): Returns the largest and smallest of the two arguments respectively.
  • Math.pow(a, b): Returns the value of the first argument raised to the power of the second argument. 
  • Math.round(a): Returns the closed int or long (as per the argument)
  • Math.random(): Returns a double value with a positive sign, greater than or equal to 0.0 and less than 1.0

Arrays

Print Array elements for debugging purposes
System.out.println(Arrays.toString(array));

Initialize or set all elements of array with one value
int[] intarry = new int[n+1];
Arrays.fill(intarry, -1);
Arrays.fill(intarry, 0);

Arrays can be copied in entirely or partially. The copyOf() method copies entire array, while the copyOfRange() method copies the subset of the specified array. The arrays can be of either primitive or object type.
Arrays.copyOf(original[], int newLength) and 
Arrays.copyOfRange(original[], int from, int to)

Array elements can be sorted using sort()
char[] charArray = string.toCharArray();
java.util.Arrays.sort(charArray);

The Arrays.binarySearch()(SET 1 | SET2) method is used to apply binary search on a sorted array.

Convert int array to List of Integer using IntStream.boxed() which box each element of the stream to an Integer.
List<Integer> list = Arrays.stream(arr)
                           .boxed()
                           .collect(Collectors.toList());
We can also use IntStream.of() to get IntStream from integer array.
List<Integer> list = IntStream.of(arr)    // returns IntStream
                            .boxed()
                            .collect(Collectors.toList());
Converting primitive integer array to an Integer array.
Integer[] boxedArray = Arrays.stream(arr).boxed().toArray(Integer[]::new);


Comparable and Comparator

Comparable object is capable of comparing itself with another object. The class (of element to be compared) itself must implements the java.lang.Comparable interface and override the compareTo method to compare its instances. Comparable compares "this" reference with the object specified in the compareTo(T o) method.

Comparator is an external class from the element type that is to be compared. Multiple separate classes implementing Comparator are created to compare by different members. The Collections.sort() method takes Comparator argument which invokes the compare() to sort objects. Comparator compares objects of two different class passed to the compare(T o1,T o2) method. Comparator also provides built in method such as naturalOrder() and reverseOrder() for natural ordering, reversed() to reverse ordering of the comparator.

Both compareTo() method of Comparable and compare() method of Comparator return integer with:

negative = first argument (or this object) is less than the (second) argument.
positive = first argument (or this object) is greater than the (second) argument.
zero = both objects are equal

If sorting of objects needs to be based on natural order then use Comparable whereas if sorting needs to be done on attributes of different objects, then use Comparator.


Collections and Lists

Sort collection elements using sort method
Collections.sort();

Sort collection elements using sort method
Collections.copy(List<T> dest, List<T> src)  copies specified collection.

The Collections.rotate() to rotate a collection or an array by a specified distance.

The Collections.frequency() method can be used to get the frequency of a specified element in a collection or an array.

The Collections.binarySearch() used to apply binary search on a collection based on comparators.

The Set collection does not have index, hence to get the first element from a Set we use Iterator.
HashSet<String> hashSet = new HashSet<String>();
Iterator<String> iter = hashSet.iterator();
if (iter.hasNext()) {
  String value = iter.next();
}

LinkedHashSet is subclass of HashSet that guarantees insertion-order.
SortedSet is a set in which the elements are stored in some sorted order.
NavigableSet extends the SortedSet interface with navigation (searching) methods.

Queue<E> extends the Collection interface to maintain a collection whose elements need to be processed in some way, i.e., insertion at one end and  removal at the other, usually as FIFO. E.g. PriorityQueue<E>, LinkedList<E>. 

The offer(E e) method of Queue Interface inserts the specified element into this queue if it is possible to do so immediately without violating capacity restrictions. This method is preferable to add() method since this method does not throws an exception when the capacity of the container is full since it returns false.

The peek() method returns the object at the top of the current queue, without removing it, while the poll() method returns the object at the top and removes it. If the queue is empty then both peek() and poll() methods returns null. The remove() method is returns the top object from the queue and throws NoSuchElementException if the queue is empty.

The LinkedList implementation uses a doubly-linked list and, Insertions and deletions are very efficient. If frequent insertions and deletions occur inside a list, a LinkedList can be more efficient. LinkedList class also implements two other interfaces that allow it to be used for stacks and different kinds of queues.

PriorityQueue is implementation for a queue with priority ordering.

Deque<E> extends the Queue interface to maintain a queue whose elements can be processed at both ends. E.g. ArrayDeque<E>, LinkedList<E>. Deque inherits the offer method from Queue interface which is similar to offerLast. It allows to insert from the front and end, as well as remove from the front and the end from the constant time.
Deque<Integer> deque = new ArrayDeque<>();
deque.offerFirst(val);
deque.offerLast(val);

Maps

HashMap is implemented as a hash table, and there is no ordering on keys or values. HashMap is same as Hashtable, except that it is unsynchronized and permits nulls.

TreeMap is a map, sorted by the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used. It is an efficient way of sorting and storing the key-value pairs. The storing order maintained by the TreeMap must be consistent with equals just like any other sorted map, irrespective of the explicit comparators.

LinkedHashMap is a subclass of HashMap with an additional feature of preserving the insertion-order of the elements.


Print Map key-value pairs using java8
map.forEach((key, value) -> System.out.println(key + ":" + value));
Simple way to increment or update values in a Map.
map.put(sum, map.getOrDefault(sum, 0) + 1);
The putIfAbsent is handy method to only put the key and value if it doesn't exists.
map.putIfAbsent(first_key, value);
The ceilingEntry() method call returns an entry with the least key greater than or equal to key, or null if there is no such key.
Integer value = map.ceilingEntry(dollar - sum).getValue();

Threads

The Thread.start() method sets the thread in Ready-to-run state and makes it eligible for running. It waits for its turn to get CPU time. Once picked by thread scheduler it goes in Running state.

The sleep() causes the currently running thread to temporarily pause its execution and transit to the Sleeping state. The Sleep method does not relinquish any lock that the thread might have. After the sleep time, the thread transitions to the Ready-to-run state where it takes its turn to run again.

The yield() method may cause the current thread in the Running state to transit to the Ready-to-run state, relinquishing the CPU and giving other threads in the Ready-to-run state a chance to run. The thread is then at the mercy of the thread scheduler as to when it will run again, which is possible if there are no threads in the Ready-to-run state. The yield() method is an advisory method, and comes with no guarantees that the JVM will carry out the call's bidding. A call to the yield() method does not affect any locks that the thread might hold.

The wait() and notify() methods can only be executed on an object whose lock the thread holds (synchronized object). When a thread invokes the wait() method on the object whose lock it holds, it leaves the Running state and transits into Waiting-for notification state. The thread relinquishes the ownership of the object lock on which the wait() method was invoked, which can then be acquired by other threads. A thread in the Waiting-for-notification state is awakened by:
  1. Another thread invokes the notify() method on the object of the waiting thread, and the waiting thread is selected as the thread to be awakened.
  2. The waiting thread times out.
  3. Another thread interrupts the waiting thread
The notify() method on an object relinquishes the lock from the current thread, and wakes up another thread waiting for the same object's lock. After being notified, the waiting thread first transits to the Blocked-for-lock-acquisition state to acquire the lock on the object, and then to the Ready-to-run state.

A thread can invoke the join() method on another thread in order to wait for the other thread to complete its execution before continuing. Thread calling join waits until other thread completes its execution, or waiting thread is timeout or interrupted.


Bit Manipulation

Check if the number is even or odd without using the % operator, with bitwise & operation.
System.out.println((a & 1) == 0 ?  "EVEN" : "ODD" );
Fast Multiplication or Division by 2.
n = n << 1;   // Multiply n with 2
n = n >> 1;   // Divide n by 2

0110 + 0110 is equivalent to 0110 * 2, which is equivalent to shifting 0110 left by 1.

0100 equals 4, and multiplying by 4 is just left shifting by 2. So we shift 0011 left by 2 to get 1100.

If we XOR a bit with its own negated value, we will always get 1. Therefore, the solution to a^(~a) will be a sequence of 1s.

~0 is a sequence of 1s, so ~0 << 2 is 1s followed by two Os. ANDing that with another value will clear the last two bits of the value.
x ^ 0s = x
x ^ 1s = ~x 
x ^ x = 0

x & 0s = 0
x & 1s = x 
x & x = x 

x | 0s = x
x | 1s = 1s 
x | x = x

A positive number is represented as itself while a negative number is represented as the two's complement of its absolute value (with a 1 in its sign bit to indicate that a negative value).

To find two's complement, we invert the bits in the positive representation and then add 1. For example, 3 is represented as 011 in binary. Flip the bits to get 100, add 1 to get 101, then prepend the sign bit (1) to get 1101.

In a logical right shift, we shift the bits and put a 0 in the most significant bit. It is indicated with a >>> operator.

In an arithmetic right shift, we shift values to the right but fill in the new bits with the value of the sign bit, which has the effect of(roughly) dividing by two. It is indicated by a > > operator.

The right shift operator ">>" keeps the sign bit intact i.e. first or most significant bit never lost, doesn't matter how many times you shift. On the other hand unsigned right shift operator ">>>" doesn't preserve sign of original number and fills the new place with zero, hence its also known as right shift with zero fill.

A sequence of all ls in a(signed) integer represents -1.

Binary bits is can be used instead of array of flags and enables to easily check if any flag is true. In order to check that no bits in the integer are 1, we just compare the integer to 0. There is actually a very elegant way to check that an integer has exactly one bit set to 1. If we subtract 1 from it and then AND it with the new number, we should get 0.
var1 = 010010
var2 = 000010

010010 & 111101 = 000000

The below method shifts 1 over by i bits, creating a value that looks like 00010000. By performing an AND with num, we clear all bits other than the bit at bit i. Finally, we compare that to 0. If that new value is not zero, then bit i must have a 1. Otherwise, bit i is a 0.

boolean getBit(int num, int i) {
   return ((num & (1 << i)) != 0);
}
SetBit shifts 1 over by i bits, creating a value like 00010000. By performing an OR with num, only the value at bit i will change. All other bits of the mask are zero and will not affect num.

int setBit(int num, int i) {
   return num | (1 << i);
}

An efficient trick to determine if a number is a power of 2:
boolean isPowerOfTwo(int x) {
     // First x in the below expression is for the case when x is 0
      return x!=0 && ((x&(x-1)) == 0);    
}

Regular Expressions
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("(.*)(\\d+)(.*)", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("order is for QT3000!");
if (matcher.find( )) {
   System.out.println("Found value: " + matcher.group(0) );
   System.out.println("Found value: " + matcher.group(1) );
   System.out.println("Found value: " + matcher.group(2) );
}

I/O Operations

Get input from system input.
Scanner scanner = new Scanner(System.in);
String myString = scanner.next();
int myInt = scanner.nextInt();
scanner.close();

Thursday, June 11, 2020

Dynamic Programming

Problems having optimal substructure and overlapping subproblems can be solved by dynamic programming, in which subproblem solutions are Memoized rather than computed again and again. The problems are usually solved in bottom-up manner, were smaller sub-problems are solved first, then the larger sub-problems are solved from them.

Memoization Matrix usually has the column with increasing number from 0 to final result, while the row has the elements which needs to selected for the optimal solution. The first column (and sometimes row) has the 0th column which is a hypothetical condition providing initial values with the matrix.

Levenshtein Distance

Determine the minimum edit distance for converting one string to another string

The Levenshtein distance is a string metric for measuring the difference between two sequences. It is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Here we first create a matrix with Source String as the columns while Destination String as rows. If the row and column character are same then no operation is needed and we directly copy the previous value of the diagonal cell as it is. If the row and column character is different then we find the minimum previous value from the top, side and diagonal, add 1 to it to get the final value as below.
Character Operations: Replace, Delete, Insert.





Below is the code for Levenshtein Distance.
static void editDP(String s1, String s2) 
    { 
        int l1 = s1.length(); 
        int l2 = s2.length(); 
        int[][] DP = new int[l1 + 1][l2 + 1]; 
  
        // initilize by the maximum edits possible 
        for (int i = 0; i <= l1; i++) 
            DP[i][0] = i; 
        for (int j = 0; j <= l2; j++) 
            DP[0][j] = j; 
  
        // Compute the DP matrix 
        for (int i = 1; i <= l1; i++) { 
            for (int j = 1; j <= l2; j++) { 
  
                // if the characters are same 
                // no changes required 
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) 
                    DP[i][j] = DP[i - 1][j - 1]; 
                else { 
  
                    // minimu of three operations possible 
                    DP[i][j] = min(DP[i - 1][j - 1], 
                                   DP[i - 1][j], DP[i][j - 1]) 
                               + 1; 
                } 
            } 
        }  
    } 



Coin Change Problem

When given coins of different denominations and a total amount, find out the number of combinations that make up that amount. Here the assumption is that there are unlimited supply of given coin denominations.
In the below table shows the P(n) for each number, which is the Number of ways representing an Integer as addition of positive Integers.

n Ways P(n)
1 (1) 1
2 (2), (1+1) 2
3 (3), (2+1), (1+1+1) 3
4 (4), (3+1), (2+2), (2+1+1), (1+1+1+1) 5


Lets see below example

Amount = 12
Coins = {1, 2, 5}

ForEach Coin in the given coin denominations i.e. {1, 2, 5}

If Amount >= Coin then
   Combinations[Amount] += Combinations[Amount - Coin]


First initialize the Combinations array with 0 value for all elements.

Combinations when Coin is 1
Set Combinations[0] = 1 when coin is 0, which means using coin 0 to achieve amount 0 there is only 1 combination.

Then we use the above logic to determine the next value of the combination when coin is 1 and amount is 1.
Combinations[1] = Combinations[1-1] + Combinations[1] = Combinations[0] + Combinations[1] = 1 + 0 = 1

Similarly, 
Combinations[2] = Combinations[2-1] + Combinations[2] = Combinations[1] + Combinations[2] = 1 + 0 = 1


0 1 2 3 4 5 6 7 8 9 10 11 12
1 1 1 1 1 1 1 1 1 1 1 1 1


Combinations when Coin is 2
When Amount is 0 or 1 and Coin is 2, the condition (Amount >= Coin) becomes false, so we skip for Amounts 0 and 1.

When Amount is 2 and Coin is 2, we have,
Combinations[2] = Combinations[2-2] + Combinations[2] = Combinations[0] + Combinations[2] = 1 + 1 = 2

The original value of Combinations[2] was 1 from above.
Similarly when Amount is 3 and Coin is 2, we have,
Combinations[3] = Combinations[3-2] + Combinations[3] = Combinations[1] + Combinations[3] = 1 + 1 = 2

Similarly when Amount is 4 and Coin is 2, we have,
Combinations[4] = Combinations[4-2] + Combinations[4] = Combinations[2] + Combinations[4] = 2 + 1 = 3


0 1 2 3 4 5 6 7 8 9 10 11 12
1 1 2 2 3 3 4 4 5 5 6 6 7


Combinations when Coin is 5
When Amount is 0 to 4 and Coin is 5, the condition (Amount >= Coin) becomes false, so we skip Amounts 0 and 4.

When Amount is 5 and Coin is 5, we have,
Combinations[5] = Combinations[5-5] + Combinations[5] = Combinations[0] + Combinations[5] = 1 + 3 = 4

Similarly when Amount is 6 and Coin is 5, we have,
Combinations[6] = Combinations[6-5] + Combinations[6] = Combinations[1] + Combinations[6] = 1 + 4 = 5


0 1 2 3 4 5 6 7 8 9 10 11 12
1 1 2 2 3 4 5 6 7 8 10 11 13


Another example were Amount = 4 and Coins = {1, 2, 3}


Below is the code for Coin Change Problem

int change(int amount, int[] coins) {

  int[] combinations = new int[amount + 1];
  combinations[0] = 1;

  for(int coin : coins) {
    for(int i = 1; i < combinations.length; i++) {
       if(i >= coin) {
         combinations[i] += combinations[i - coin];
       }
    }
  }
}



Longest Common Subsequence (LCS) Problem

Longest Common Subsequence Problem is to find the common subsequences but are not required to occupy consecutive positions with the original sequences.
For example, consider the following sequences, A and B

A = bqdrcvefgh
B = abcvdefgh

Here the length of LCS is 7,
LCS are bdefgh and bcvefgh

We first construct a matrix with columns as characters of string A and rows as characters of string B.
  • If one of the string is empty or both the strings are empty then the length of the longest common subsequence is 0, hence we put zero for both the empty character row and empty character column.
  • When the row character matches with the column character for the cell, then take the previous diagonal value in the matrix and add 1 since there is a match, to get the final value of the cell.
  • When the row character does not match with the column character for the cell, then take the maximum between the above cell and the left cell, to get the final value of the cell.



int longestCommonSubsequence(String A, String B) {

  int m = A.length(), n = B.length();

  int[][] table = new int[m+1][n+1];

  for(int i = 0; i <= m; i++) {
     table[i][0] = 0;
  }

  for(int j = 0; j <= n; j++) {
     table[0][j] = 0;
  }

  for(int i = 1; i <= m; i++) {

     for(int j = 1; j <= n; j++) {

        if(A.charAt(i - 1) == B.charAt(j - 1))
           table[i][j] = table[i - 1][j - 1] + 1;
        else 
           table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
     }
  }

  return table[m][n];
}


Subset Sum Problem

Given a set of positive integers and an integer s, find if there is any non-empty subset whose sum is s.

For example set = { 1, 2, 7, 5 }
sum = 8 has subset { 1, 2, 5 }
sum = 4 has no subset

First the given set is sorted in ascending order.

Logic:
Since this is a dynamic programming problem we create a matrix with columns as 0 incrementing by 1 till the sum s. The rows are each element in the example set in a sorted order.

Truth Value = (Truth Value by Excluding New Value) OR (Truth Value by Including New Value)

Truth value by excluding the new value in the current cell comes from the value of upper cell within the matrix.
Truth value by including the new value in the current cell comes from getting the cell value of previous/upper row and the column equal to (sum - new value of the row) within the matrix.
Finally, we return true if we get subset by including or excluding current item else we return false.

To find the subset which gives the sum s, we go to the last cell and backtrack to check were the True value is coming from. If value in the cell above it is False, then it means current cell has become True after including the current element. So include the current element and check for the sum = sum – current element.


public static boolean subSetDP(int[] A, int sum) {
        boolean[][] solution = new boolean[A.length + 1][sum + 1];
        // if sum is not zero and subset is 0, we can't make it 
        for(int i=1; i<=sum; i++){
            solution[0][i]=false;
        }
        // if sum is 0 the we can make the empty subset to make sum 0
        for(int i=0; i<=A.length; i++){
            solution[i][0]=true;
        }
        
        // do for ith item
        for(int i=1; i<=A.length; i++){
            // consider all sum from 1 to sum
            for(int j=1; j<=sum; j++){
            
                // don't include ith element if j-array[i-1] is negative
                if(A[i-1] > j) {
                   //first copy the data from the top
                   solution[i][j] = solution[i-1][j];
                }
                else {
                    // find subset with sum j by excluding or including the ith item
                    solution[i][j] = solution[i-1][j] || solution[i-1][j - A[i-1]];
                }
            }
        }
        return solution[A.length][sum];
    }


Partition Problem

Given a set of positive integers, find if it can be divided into two subsets with equal sum.

Example Set S = {3,1,1,2,2,1}
We can partition S into two partitions each having sum 5.
S1 = {1,1,1,2}
S2 = {2,3}

Another solution could be,
S1 = {3,1,1}
S2 = {2,2,1}

// Return true if there exists a subset of array[0..n) with given sum
private bool subsetSum(int arr[], int n, int sum)
{
	// T[i][j] stores true if subset with sum j can be attained with
	// using items up to first i items
	bool T[n + 1][sum + 1];

	// if 0 items in the list and sum is non-zero
	for (int j = 1; j <= sum; j++)
		T[0][j] = false;

	// if sum is zero
	for (int i = 0; i <= n; i++)
		T[i][0] = true;

	// do for ith item
	for (int i = 1; i <= n; i++)
	{
		// consider all sum from 1 to sum
		for (int j = 1; j <= sum; j++)
		{
			// don't include ith element if j-arr[i-1] is negative
			if (arr[i - 1] > j)
				T[i][j] = T[i - 1][j];
			else
			// find subset with sum j by excluding or including the ith item
				T[i][j] = T[i - 1][j] || T[i - 1][j - arr[i - 1]];
		}
	}

	// return maximum value
	return T[n][sum];
}

// Partition problem - Return true if given array arr[0..n-1] can
// be divided into two subsets with equal sum
public bool partition(int arr[], int n)
{
	int sum = 0;
	for (int i = 0; i < n; i++)
	sum += arr[i];

	// return true if sum is even and array can can be divided into
	// two subsets with equal sum
	return !(sum & 1) && subsetSum(arr, n, sum/2);
}



0/1 Knapsack Problem

We are given a set of items, each with a weight and a value (or profit) and need to determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. The 0/1 indicates that the item can either be selected or not selected, i.e. items are indivisible.

Sum of the Weights <= Total Weights
Sum of the value should be maximum

Consider the below example were we have items {a, b, c, d, e} with weights and values as below. We need to in the subset with maximum sum of value were sum of their weights should be equal to or less then 9. The quantity of all the items is 1. We create the matrix as below, with the columns as increasing number of weights aka Sum of weights from 0 to the total weight. The rows of the matrix are the sorted using item values.


weight value
a 1 1
b 3 4
c 4 5
d 5 7
e 2 4
item value weight 0 1 2 3 4 5 6 7 8 9
a 1 1 0 1 1 1 1 1 1 1 1 1
e 4 2 0 1 4 5 5 5 5 5 5 5
b 4 3 0 1 4 5 5 8 9 9 9 9
c 5 4 0 1 4 5 5 8 9 10 10 13
d 7 5 0 1 4 5 5 8 9 11 12 13

We cannot achieve the zero weight, using the weights of any items so entire column is assigned the value 0.  For column with weight 1, since item a which has weight 1 can contribute in forming the sum of the weights from 1 to 9, we add 1 to entire row. Now we follow below logic to determine the current value of the cell,

1)   If item weight <= sum of weights,
      value = maximum of(value by excluding the new item, value by including the new item)

To calculate the value by including the new item, we add
  • value of the new item (not the weight)
  • value from above row and the column (sum of weights in column - current item's weight).
2)   If item weight > sum of the weights, copy the value from the above cell.

Use backtracking to determine from were the final value of 13 came to determine the weights or the items included in the result collection.

private int knapSack(int sumWeight, int weights[], int values[]) { 

        int n = values.length;
        
        // K[i][j] stores the maximum value that can be attained with
        // weight less than or equal to j using items up to first i items
        int K[][] = new int[n + 1][sumWeight + 1]; 

	    for (int j = 0; j <= sumWeight; j++) {
		   K[0][j] = 0;
        }

        for (int i = 1; i<= n; i++) { 
        
            // consider all weights from 0 to maximum capacity sumWeight
            for (int j = 0; j<= sumWeight; j++) { 
                    
               	// don't include ith element if j-weights[i-1] is negative
			    if (weights[i-1] > j)
				    K[i][j] = K[i-1][j];
			    else
			    // find max value by excluding or including the ith item
				    K[i][j] = Math.max(K[i-1][j], K[i-1][j-weights[i-1]] + values[i-1]);
            } 
        } 
  
        return K[n][sumWeight]; 
    } 


Rod Cutting Problem

Given a rod of length n and list of pieces of rod of length i where 1 <= i <= n, find the optimal way to cut the rod into smaller rods in order to maximize the profit.

Rod of length = 5
lengths of pieces = {1, 2, 3, 4}
profits of pieces = {2, 5, 7, 8}

If we cut the rod in the given lengths of piece, then profit of each piece is given in corresponding profits set. The rod has to be cut in such a way that we get the maximum profit.


We create the memoization matrix, with column as total length of from 0 to length of the rod, and rows as length of each piece given in the set. Each cell has the profit value for the corresponding piece with the length.

The piece with length 0 is hypothetical condition, hence, entire row's values is 0.
Also if we want to make the total length of 0 for column with sum of length as 0, none of the piece from length 1 to 4 satisfy the condition, hence all entries there are 0.
If we have a piece of length 1 and we want total length to be 1, we need only 1 piece with length 1 to achieve that, hence the profit value of piece 1 is added to the corresponding cell. Similarly, to achieve the total length of 2, we need two pieces of length 1, hence we add profit value of piece 1 twice to corresponding cell. Similar goes for columns 3,4 and 5 with row of piece length 1.

When the row piece length is greater than column total length we copy the cell value from upper row. If the row piece length is less than or equal to column total length, then we use the below logic. Profit by including new piece is,

(Profit of current piece) + 
(Profit of cell with column as (Current Total length - Current piece length) in current row).

Profit = max (Profit by excluding new piece, Profit by including new piece)

profit length 0 1 2 3 4 5
0 0 0 0 0 0 0 0
2 1 0 2 4 6 8 10
5 2 0 2 5 7 10 12
9 3 0 2 5 9 11 14
6 4 0 2 5 9 11 14

// n is the maximum length of the rod
public int rodCut(int[] price, int n)
	{
		// T[i] stores maximum profit achieved from rod of length i
		int[] T = new int[n + 1];
        T[0] = 0;

		// consider rod of length i
		for (int i = 1; i <= n; i++)
		{
			// divide the rod of length i into two rods of length j
			// and i-j each and take maximum
			for (int j = 0; j < i; j++) {
				T[i] = Math.max(T[i], price[j] + T[i - (j + 1)]);
			}
		}

		// T[n] stores maximum profit achieved from rod of length n
		return T[n];
	}



Word Break Problem

Given a string and a dictionary of words, determine if string can be segmented or split into a space-separated sequence of one or more dictionary words.

E.g. dictionary[] = { "this", "th", "is", "famous", "break", "b",
                    "r", "e", "a", "k", "br", "bre", "brea", "ak", "problem"}
string = "Wordbreakproblem"