Saturday, February 02, 2013

The Optimal Binary Search Tree Construction Problem


Again excerpts and knowledge gained from Tim's lecture on Dynamic Programming, this time I will attempt to elaborate on the problem of optimal binary search tree. A binary search tree is a tree with the property that all keys less than the key of a current selected now would always be available in the left subtree of the node with the additional property that each node can have a maximum of two children. So, what might an optimal binary search tree be? It can be thought of as a BST(binary search tree) with minimal search time. Search time can be thought of as the number of nodes to be traversed to reach the key which is being searched. The question then arises as to how can we know what nodes are going to be searched? This can be understood as gathering statistics or having data about the searches to be made. In some conditions, this constraint can be valid. For example: suppose I have a binary tree which has keys: good(key=1), bad(key=2) and ugly(key=3). (For some unthinkable reason, each word is associated with a key) People search for the word bad generally, say 80% and good and ugly follow with a meek 10% of the time. Let us see how a binary tree can be arranged with these three elements
Tree 1: (10%) people searching for good find it in first hit, (80%) people searching for bad find it in 2 steps and the rest find their ugly in 3 steps. So, an average search time which can be defined as probability of the node * number of nodes to be traversed to reach it. For tree 1: 0.1*1+0.8*2+0.1*3 = 2.0

Tree 2: Good guys still get their word in first hit, but now the uglies go to two. so, 0.1*1+0.8*3+0.1*2=2.7

 Tree 3: 0.8*1+0.1*2+0.1*2=1.2

 Tree 4: 0.1*1+0.8*2+0.1*3=2.0

 Tree 5: 0.1*1+0.1*2+0.8*3=2.7

Yay!! we got an optimal BST. Average search time for tree 3 appears to be minimal. Simple observation might lead to a wrong inference that a greedy algorithm in which highest probability node can be put at the root and simply following greedily might lead us to a solution.

Let's try to prove it otherwise. What if the good guy searches fell to 40, baddies down to 30 and the uglies up to reach the rest, i.e. 30. A simple greedy approach might tempt us to root the tree with good which will give us tree1 or tree2 which have the number: 1.9 (0.4*1+0.3*2+0.3*3) . But hey! Tree 3 has got just (0.3*1+0.4*2+0.3*2) = 1.7

 Greedy Boy don't get to help us here.!! So that is the problem we are dealing with. I will extend this post with an elaborate one as the solution to this problem despite the ample assistance provided by Tim and big bro google took ages to wrap my head around. So, I will attempt a clearer path towards a solution rather than a brain dump.


Please find the detailed solution here

No comments: