Sunday, February 26, 2017

Producer-Consumer Problem Implementation in Java

Problem - Two sets of threads, a producer set and a consumer set, are writing data to and reading data from a buffer. Ensure that the producer set doesn't produce to a full buffer and the consumer set doesn't consume from an empty buffer.

We need a data structure that is shared between the two sets of threads. The producers put data into this data structure while consumers consume from it. We could use a queue, a linked list or any other data structure that supports the implementation of a circular buffer, but here we are more concerned with the concurrency aspect of the problem. To keep the code simple, let's use an array and make it circular, so that once we reach the max size we wrap back to the start position.
Read and write access to this array goes through produce and consume methods. We need to ensure that at any moment, at most one thread can produce to or consume from the buffer. Apart from guarding the buffer, we also need to ensure that the critical sections of our code are executed by at most one thread at a time. We will use one synchronization primitive (mutex, lock, semaphore) to guard both the buffer and the critical sections.

We could use a mutex (a Semaphore with one permit in Java) to guard the resource and the critical section. But we will use a Java Lock instead, as it supports condition variables that threads can use to wait on a condition.
Below is the declaration of various variables we have discussed until now.
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class ProducerConsumer<T> {
    private static final int MAX_SIZE = 10;
    // buffer to hold data from producers and deliver to consumers
    // (instance fields, not static: a static field cannot use the type parameter T)
    @SuppressWarnings("unchecked")
    private final T[] buffer = (T[]) new Object[MAX_SIZE];
    // Lock to guard the buffer and the critical sections.
    // Lock is an interface; ReentrantLock is its standard implementation.
    private final Lock lock = new ReentrantLock();
    // Condition variables associated with the lock. Threads can wait on these CVs
    // and the lock is relinquished while a thread is waiting.
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    // positions of the read and write indexes in the buffer
    private int readIdx = 0, writeIdx = 0;
    // count of objects in the buffer, to tell a full circular buffer from an empty one
    private int count = 0;
}

Now for the actual produce and consume methods: we always acquire the lock before performing any operation on the buffer, and we use the condition variables (via their await() and signal() methods - a Condition does not use Object's wait()/notify()) to make sure that
1. No thread consumes from an empty buffer.
2. No thread produces to a full buffer.

public void produce(T data) throws InterruptedException {
    lock.lock();
    try {
        while (count == MAX_SIZE) {
            notFull.await(); // releases the lock while waiting
        }
        // Now the buffer is not full and it's this thread's turn to produce.
        buffer[writeIdx++] = data; // store into the buffer
        count++;
        writeIdx %= MAX_SIZE; // it's a circular buffer, so wrap the index
        notEmpty.signal(); // the buffer is definitely not empty now; wake a waiting consumer
    } finally {
        lock.unlock(); // unlock in finally so an exception can't leave the lock held
    }
}
public T consume() throws InterruptedException {
    lock.lock();
    try {
        while (count == 0) {
            notEmpty.await();
        }
        // Now the buffer is not empty and this thread can consume.
        T data = buffer[readIdx++];
        count--;
        readIdx %= MAX_SIZE;
        notFull.signal(); // the buffer is definitely not full now; wake a waiting producer
        return data;
    } finally {
        lock.unlock();
    }
}
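For comparison, the JDK ships this exact bounded-buffer contract as ArrayBlockingQueue (which internally uses a ReentrantLock with notFull/notEmpty conditions, much like the class above). Below is a minimal sketch of mine exercising it; the class name and the counts are my own choices:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingQueueDemo {
    // Produce the ints 0..99 and consume them concurrently through a
    // bounded buffer of size 10; returns the sum the consumer saw.
    static int run() throws InterruptedException {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(10);
        int[] sum = {0};
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) buffer.put(i); // blocks while full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) sum[0] += buffer.take(); // blocks while empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        producer.join(); // join() gives the main thread a consistent view of sum
        consumer.join();
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // sum of 0..99 = 4950
    }
}
```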


Implementation of a Critical Section in Java.

The critical section problem from Operating System Concepts goes like this -
Consider a process with n threads, each of which executes a piece of code that involves changing shared variables, updating a table, writing to a file, etc. This piece of code needs to be guarded so that multiple threads don't enter it at the same time. This block of code is the critical section. The critical section problem is to design a protocol so that the threads can cooperate.

In order to implement a critical section, we can use one of the OS synchronization primitives such as a mutex or a semaphore. Java provides a Semaphore implementation that is useful for implementing a critical section. A semaphore with one permit is a mutex, so the declaration below gives us a mutex in Java.

Semaphore mutex = new Semaphore(1);

Now that we have a mutex, we can use it to enforce the requirements of a critical section. Essentially, a thread that needs access to the critical section waits on the semaphore, and whenever another thread relinquishes the semaphore, it gets access to the critical section.

import java.util.concurrent.Semaphore;

public class CriticalSection {
    private static final Semaphore mutex = new Semaphore(1);

    // No synchronized on these methods: blocking inside a synchronized method
    // would hold the object's monitor and deadlock the releasing thread.
    public void waitCS() throws InterruptedException {
        mutex.acquire(); // the semaphore's wait operation in Java
    }

    public void notifyCS() {
        mutex.release(); // the semaphore's signal/notify operation in Java
    }
}
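For a concrete picture of the semaphore guarding shared state, here is a small sketch of mine (the Counter class and its methods are hypothetical, not from this post): the acquire()/release() pair brackets the critical section, here an otherwise-unsynchronized increment:

```java
import java.util.concurrent.Semaphore;

public class Counter {
    private final Semaphore mutex = new Semaphore(1); // one permit = a mutex
    private int count = 0;

    void increment() throws InterruptedException {
        mutex.acquire();      // take the single permit: enter the critical section
        try {
            count++;          // shared-state update; at most one thread is here
        } finally {
            mutex.release();  // return the permit: exit the critical section
        }
    }

    // Run nThreads threads, each incrementing perThread times; returns the total.
    static int run(int nThreads, int perThread) throws InterruptedException {
        Counter c = new Counter();
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                try {
                    for (int j = 0; j < perThread; j++) c.increment();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 1000)); // 4000: no lost updates
    }
}
```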

Well, there's no fun in implementing this CriticalSection class. We haven't added any new functionality on top of what Semaphore already provides; we could just as well use a Semaphore directly instead of wrapping it inside another class and naming it CriticalSection.
So let's add some value to this implementation.
The critical section we build next differs from a plain mutex in that it avoids unnecessary kernel transitions, making it a faster, lighter-weight mutex. In our context, kernel transitions can happen inside the acquire() and release() calls. These transitions are expensive and are worth avoiding.

We will use a counter to keep track of the number of threads holding or waiting for the critical section. When no other thread is contending for the critical section, we can safely skip the semaphore calls (acquire() and release()) and hence avoid the kernel transitions.

import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class CriticalSection {
    // Zero permits: a contending thread blocks until notifyCS hands one over.
    private static final Semaphore mutex = new Semaphore(0);

    // Number of threads holding or waiting for the critical section.
    // 0 - free, 1 - held and uncontested, > 1 - held and contested.
    private static final AtomicInteger counter = new AtomicInteger(0);

    public void waitCS() throws InterruptedException {
        if (counter.incrementAndGet() > 1) {
            // Contested: someone is already inside, block on the semaphore.
            mutex.acquire();
        }
        // Uncontested: we entered with a single atomic increment and no system call.
    }

    public void notifyCS() {
        if (counter.decrementAndGet() > 0) {
            // Someone is waiting: hand the critical section over to one waiter.
            mutex.release();
        }
        // Nobody waiting: we exited with a single atomic decrement and no system call.
    }
}

This combination of an atomic fast path with a semaphore slow path is sometimes called a benaphore.




References - Silberschatz, Abraham, Greg Gagne, and Peter B. Galvin. Operating System Concepts, Seventh Edition. N.p.: John Wiley & Sons, 2004. Print.

Sunday, February 19, 2017

Permutations of a String

One of the standard algorithms with many applications is the permutation problem. Generating the permutations of a string means generating all possible arrangements of that string. To do so, we keep part of the string fixed while generating all possible arrangements of the remaining part.
  Given a string ABCD, in order to generate all the permutations:
1. Fix the position of A and permute the rest of the string, BCD. For BCD, we fix the position of B and generate the permutations of CD (1a). Next we swap B and C in BCD, fix the position of C, and generate the permutations of BD (1b). Finally we swap B and D in BCD, fix the position of D, and generate the permutations of BC (1c). This completes the permutations of BCD.
2. Now we swap B with A in ABCD, fix the position of B and generate the permutations of ACD (2a, 2b, 2c).

Similarly we swap C and D with A and generate the permutations of BAD (3a, 3b, 3c) and BCA (4a, 4b, 4c).

At every step, we perform the same operations, swap and permute the remaining substring, on a smaller data set. So we can implement a recursive solution.


Here's the recursive solution in java.

void permute(int read, int write, char[] input, char[] output) {
    if (write == output.length) {
        System.out.println(new String(output)); // one complete permutation
        return;
    }
    for (int i = read; i < input.length; i++) {
        swap(input, i, read);           // move the ith character to the current position
        output[write] = input[read];
        permute(read + 1, write + 1, input, output);
        swap(input, i, read);           // restore the original order (backtrack)
    }
}

void swap(char[] a, int i, int j) {
    char t = a[i];
    a[i] = a[j];
    a[j] = t;
}

In every recursive call, we move the ith element of the input array to the current position and fix that position. We copy this element to the output array, process the remaining string (the recursive call), and then swap the ith element back to its original position. When the output array is full, we print it and backtrack.
For example, for the string ABCD with read = 0 and i = 0, we swap A with A, write A to the output array and make a recursive call to process the remaining string BCD - permute(1, 1, [A,B,C,D], [A]).
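Here is a self-contained, runnable version of the same idea (the class name is mine, and it collects results into a list instead of printing so they can be inspected; since read and write always advance together, a single index suffices in this sketch):

```java
import java.util.ArrayList;
import java.util.List;

public class Permutations {
    static List<String> permutations(String s) {
        List<String> out = new ArrayList<>();
        permute(0, s.toCharArray(), new char[s.length()], out);
        return out;
    }

    // Fix input[read] in the current position, recurse on the rest, then undo the swap.
    private static void permute(int read, char[] input, char[] output, List<String> out) {
        if (read == input.length) {
            out.add(new String(output)); // one complete arrangement
            return;
        }
        for (int i = read; i < input.length; i++) {
            swap(input, i, read);
            output[read] = input[read];
            permute(read + 1, input, output, out);
            swap(input, i, read); // backtrack: restore order for the next iteration
        }
    }

    private static void swap(char[] a, int i, int j) {
        char t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        System.out.println(Permutations.permutations("ABC")); // 3! = 6 arrangements
    }
}
```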



The recursion tree below helps in understanding the recursive code.




Sunday, February 12, 2017

Knuth-Morris-Pratt table construction

The KMP algorithm for pattern matching relies on a partial match table. Constructing this table is the bulk of the work involved in the KMP algorithm and can be daunting to understand. Below, I have tried to document my understanding of the partial match table construction.

One of the easier ways to construct a match table for a pattern is to visualize the pattern as a finite state machine (FSM). A finite state machine is a mathematical model of computation. The machine has a predefined set of states, is in exactly one of them at any point in time, and moves between states based on the input. If and when the machine reaches the end state, we have found the pattern we are looking for. The FSM visualization should be intuitive from the discussion below; no prior knowledge is required.

Let's construct an FSM for the pattern ABABAC. An FSM has a start state (0 in this case) and an end state (6 in this case). Each matching character in the pattern takes the machine from one state to the next.



When the FSM is in state 0, an input of A takes it to state 1. In state 1, an input of B takes the FSM to state 2, and so on.
Now, at each state, we have to account for the possibility of a non-matching input character. A non-matching input character still leaves the machine in some state.



How did we account for the non-matching input characters at each state? The idea behind using an FSM for pattern matching is that we don't always need to back up to the start of the pattern when some characters of the pattern have already matched the input. Instead, we back up only to the point where a match is still possible and resume the comparison there. When we are in state 0 and the next character is a B or C, we stay in the same state, as highlighted in red above. When we are in state 1 and the next input character is an A, the stream of characters processed is AA, but we were expecting AB to move to state 2. Since there's a mismatch, we ignore the first character of the string AA and process the remaining characters, hoping to match the subsequent input. We are left with an A, and this input to our FSM again leaves us in state 1, as highlighted in green above. If the input were AC, again there's a mismatch; we ignore the first character and are left with C. If C is provided as an input in state 0, the FSM remains in state 0. So at every state, if the next input character causes a mismatch, we remove the first character of the input processed so far, feed the rest of the string to the FSM starting from state 0, and go to whatever state it ends in.

Let's apply this procedure to a couple more states. Refer to the red lines above. At state 2, we have already processed the string AB. An 'A' would take the FSM to state 3. A 'B' would be a mismatch, so we ignore the first character of the input processed so far: our input stream is effectively ABB - A = BB. Processing BB through the FSM leaves it in state 0, hence the path (broken red line) from state 2 to state 0 labeled B. The same holds for an input of C at state 2 (refer to the green lines above): the stream of characters would be ABC; removing the first character and processing BC through the FSM leads the machine to state 0, hence the path (broken green line) from state 2 to state 0.

At state 5 (figure above), we have already processed the characters ABABA. A 'C' would take the FSM to the end state. An 'A' would mean an input stream of ABABAA. Since there's a mismatch, remove the first character and we are left with BABAA. When this is processed, the FSM ends up in state 1, as highlighted in red below. So state 5 on an A leads back to state 1, as highlighted by the blue broken arrow below.

Similarly for state 5, an input of B would mean a stream of ABABAB. Since the input is not a 'C', we need to figure out where to go next. Discard the first character of the stream and we have BABAB. Processing this string through our FSM, we end up in state 4, as highlighted in red below. So state 5 on a B leads back to state 4, as highlighted by the blue broken arrow below.
Below is the final FSM.

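This construction can be expressed compactly in code. The sketch below (class and method names are mine) builds the full transition table for a pattern over the 256 ASCII characters. It uses the standard trick of maintaining a restart state x: x is where the FSM would be after processing the input seen so far minus its first character, which is exactly the "drop the first character and re-run" procedure described above:

```java
public class KmpDfa {
    // Build the KMP DFA (the partial match table generalized to every
    // character) for the pattern, over an alphabet of R = 256 characters.
    // dfa[c][j] = next state when the machine is in state j and reads c.
    static int[][] buildDfa(String pat) {
        int R = 256, m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;             // state 0 advances only on pat[0]
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];         // mismatch: behave like restart state x
            dfa[pat.charAt(j)][j] = j + 1;     // match: advance to the next state
            x = dfa[pat.charAt(j)][x];         // x tracks the input minus its first char
        }
        return dfa;
    }

    public static void main(String[] args) {
        int[][] dfa = buildDfa("ABABAC");
        System.out.println(dfa['A'][5]); // 1: state 5 on A backs up to state 1
        System.out.println(dfa['B'][5]); // 4: state 5 on B backs up to state 4
        System.out.println(dfa['C'][5]); // 6: the match is complete
    }
}
```

The transitions printed for state 5 match the arrows worked out by hand above.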

Saturday, February 11, 2017

Polynomial hash function - Horner's method for calculating a hash code

There are various ways of computing a hash code for strings. A naive way is to add the ASCII values of all the characters of a string; doing so would lead to many collisions. Another way is to multiply the ASCII values of all the characters.
String in Java uses a polynomial hashing technique to compute its hash code. The primary advantage of a polynomial hash function is that the computed hash code retains information about all the characters of the string, including their positions, and thus leads to fewer collisions.
The polynomial hash function used in Java's String implementation is, for a string s of size n,

h(x) = s[0]*R^(n-1) + s[1]*R^(n-2) + ... + s[n-2]*R + s[n-1]

For a String "abcd" and R = 31,
h(x) = 97*31^3 + 98*31^2 + 99*31 + 100          ---------------------------- 1

Here R can be any prime number. The reason for using 31 is twofold.
Firstly, when we mod the resulting hash code with another number, such as the array size (a very common operation in a hash table), 31 is unlikely to share common factors with it.
Secondly, 31 is a Mersenne prime, a prime that is one less than a power of two. Multiplying by such a prime is very efficient on any processor as it involves only two operations, a shift and a subtract. For example, 31 * n = (n << 5) - n (note the parentheses: << has lower precedence than - in Java).

To evaluate a polynomial, Horner's method could be employed.

Horner's polynomial evaluation:

int hash = 0;
String str = "abcd";

for (char ch : str.toCharArray()) {
    hash = hash * 31 + ch;
}

This results in the same value as the formula #1 above does.
iteration#1
hash = 0*31 + 97

iteration#2
hash = 31(97) + 98

iteration#3
hash = 31(97*31 + 98) + 99

iteration#4
hash = 31(97*31^2 + 98*31 + 99) + 100

         => 97*31^3 + 98*31^2 + 99*31 + 100   (same as formula#1 above)
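Wrapped up as a runnable sketch (the class and method names are mine), the loop produces exactly the value String.hashCode() returns, since String uses the same polynomial with R = 31:

```java
public class HornerHash {
    // Horner's method: one multiply-accumulate per character, no explicit powers.
    static int hash(String s) {
        int h = 0;
        for (char ch : s.toCharArray()) {
            h = h * 31 + ch;
        }
        return h;
    }

    public static void main(String[] args) {
        // 97*31^3 + 98*31^2 + 99*31 + 100, computed without any pow() calls
        System.out.println(hash("abcd"));
        System.out.println(hash("abcd") == "abcd".hashCode()); // true
    }
}
```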

Since Java's int is a 32-bit signed integer, for larger strings the formula above overflows and can produce negative hash codes. So if the hash code is used to put values into a hash table backed by an array, the sign bit needs to be discarded. This is achieved by (hash & 0x7FFFFFFF).
0x7FFFFFFF (2^31 - 1) is Integer.MAX_VALUE in Java.
ANDing with MAX_VALUE retains all the bits except the highest-order bit (the 32nd bit), which is the sign bit, so a negative integer becomes a non-negative one. This is both more efficient and more correct than Math.abs(): Math.abs() returns a negative value for Integer.MIN_VALUE.
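A sketch of this masking idiom (class and method names are mine):

```java
public class HashIndex {
    // Map an arbitrary (possibly negative) hash code to a bucket index in [0, buckets).
    static int indexFor(int hash, int buckets) {
        return (hash & 0x7FFFFFFF) % buckets; // clear the sign bit, then mod
    }

    public static void main(String[] args) {
        System.out.println(indexFor("some key".hashCode(), 16)); // always in [0, 16)
        // Why not Math.abs? It stays negative for Integer.MIN_VALUE:
        System.out.println(Math.abs(Integer.MIN_VALUE)); // -2147483648
    }
}
```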





Friday, February 10, 2017

Why are prime numbers used for computing modular hash codes?

One of the objectives of a hash function is to uniformly distribute keys among the available buckets and avoid collisions to the possible extent. While every hash function starts out with this objective, very few hash functions are able to achieve near uniform distribution. Collisions cannot be avoided but definitely can be minimized.
Modular hashing is a popular hashing technique for generating hash codes of positive integers. The hash code of a positive integer is calculated as h(x) = x % m, where m is the number of buckets/slots available.

For m = 10 and a universe of the first hundred positive integers (the keys), h(x) uniformly distributes the keys to the available 10 buckets.



The data is uniformly distributed across 10 buckets with 10 keys each.
Real world data is seldom uniformly distributed; it is usually biased or skewed. In such cases, this hash function doesn't distribute the data uniformly. A good hash function distributes any type of data uniformly.

Now assume that the data is skewed and consists primarily of multiples of 5. The modular hash function h(x) does a poor job of distributing this data.


Buckets 0 and 5 hold the bulk of the data while the other buckets are largely empty. Why? The keys and m share the common factor 5, so every key hashes to a bucket that is a multiple of 5: the multiples of 5 that are also multiples of 10 hash to bucket#0, and the rest hash to bucket#5. (The multiples of m themselves hash to bucket#0 for any value of m; that much is unavoidable.)
(It would be an interesting exercise to repeat this for m = 12 and keys that are multiples of 3.)
In order to make the distribution more uniform, we need to reduce the number of common factors between m and the keys. How do we do so? By choosing an m that is prime.

Now let's use the same hash function with a prime value of m: h(x) = x % m with m = 7. I chose 7 as it's the nearest prime less than 10; 11 is also a good choice, but it would require more buckets.


For m = 7, the data is almost uniformly distributed.

Now for the same h(x) with m = 7 and the universe of the first 100 positive integers, assume that the data is skewed towards multiples of 3.


Again h(x) does a good job of distributing the multiples of three.

In conclusion, the distribution of data depends on h(x), on m and on the data itself. Since the input data can be skewed or non-uniformly distributed, we need to work on our hash function and on m in order to achieve a uniform distribution among the available buckets. In the above examples, I played around with the value of m to achieve a near uniform distribution. While 7 is not necessarily the best prime for distributing the keys uniformly, it suffices for this particular use case.
Finally, why prime numbers? Every key that shares a common factor with m hashes to a bucket that is a multiple of that factor, skewing our distribution of keys. Picking an m that is prime minimizes the number of keys that can share a common factor with it.
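The experiment above can be reproduced in a few lines (the class and method names are mine). With keys that are multiples of 5, m = 10 piles everything into buckets 0 and 5, while the prime m = 7 spreads the same keys across all seven buckets:

```java
import java.util.Arrays;

public class ModHash {
    // Count how many keys land in each bucket under h(x) = x % m.
    static int[] distribute(int[] keys, int m) {
        int[] buckets = new int[m];
        for (int k : keys) buckets[k % m]++;
        return buckets;
    }

    public static void main(String[] args) {
        // Skewed data: the multiples of 5 from 5 to 100.
        int[] keys = new int[20];
        for (int i = 0; i < 20; i++) keys[i] = 5 * (i + 1);

        // m = 10 shares the factor 5 with every key: only buckets 0 and 5 fill.
        System.out.println(Arrays.toString(distribute(keys, 10)));
        // m = 7 is prime: the same keys spread across all 7 buckets.
        System.out.println(Arrays.toString(distribute(keys, 7)));
    }
}
```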








