Notes 8-Binary Search Tree in Algorithms

  • Binary search tree
    • lookup
    • insert
    • performance
  • Orderliness-related operations
    • Maximum key and minimum key
    • Up and down rectification
    • Choice, ranking
    • Range lookup
  • Delete operation
    • Delete maximum and minimum keys
    • General Delete Operation

Binary search tree

Unordered linked lists and ordered arrays, as previously known, are at least linear in performance and cannot be used in situations with large amounts of data. Next, the binary lookup tree is one of the most important algorithms in computer science, which combines the flexibility of inserting linked lists with the efficiency of searching ordered arrays. A Binary Search Tree is a binary tree in which each node contains a Comparable key and its associated values, and the keys of each node are larger than those of any node in its left subtree and smaller than those of any node in its right subtree.

lookup

When searching in a binary search tree, if the tree is empty, the search misses; if the key being searched is equal to the key of the root node, the search hits; otherwise, the search continues recursively in the subtree; if the key being searched is less than the root node, the left subtree is chosen, or the right subtree is chosen. The code implementation of the lookup algorithm is as follows:

public class BST<Key extends Comparable<Key>, Value> {
    private Node root;

    private class Node {
        private Key key;
        private Value val;
        private Node left, right;
        public int size;

        public Node(Key key, Value val, int size) {
            this.key = key;
            this.val = val;
            this.size = size;
        }
    }

    public Value get(Key key) {
        return get(root, key);
    }

    private Value get(Node x, Key key) {
        if (key == null)
            throw new IllegalArgumentException("calls get() with a null key");
        if (x == null)
            return null;
        int cmp = key.compareTo(x.key);
        if (cmp > 0) {
            return get(x.right, key);
        } else if (cmp < 0) {
            return get(x.left, key);
        } else {
            return x.val;
        }
    }
}

Node class is used to represent the nodes of binary lookup tree. Each node contains keys, values, left and right links, and a node counter which will be used for orderly operations such as maximum and minimum.

insert

When inserting key-value pairs, the search is performed first, and if the key already exists in the symbol table, the corresponding value is updated; if the search is not hit, a new node containing the key-value pairs to be inserted is returned.

public void put(Key key, Value val) {
    root = put(root, key, val);
}

private Node put(Node x, Key key, Value val) {
    if (key == null)
        throw new IllegalArgumentException("calls put() with a null key");
    if (x == null)
        return new Node(key, val, 1);
    int cmp = key.compareTo(x.key);
    if (cmp < 0)
        x.left = put(x.left, key, val);
    else if (cmp > 0)
        x.right = put(x.right, key, val);
    else {
        x.val = val;
    }
    x.size = size(x.left) + size(x.right) + 1;
    return x;
}

The code of the insertion operation is similar to that of the lookup operation, but the insertion operation updates the node value or adds new nodes, and updates the node counter. x.left = put(x.left, key, val); code like this succinctly adds nodes using recursive features. In recursive invocation, it is equivalent to searching downward along a branch of the tree according to the logic of binary search. If it is found, it terminates recursion and updates the value of the node. If it is not found at the bottom of the tree, then key==null is established, and the recursion will terminate. At the same time, the newly initialized node has been hung in X. left or x.right. In the process of recursive roll-out, it is equivalent to climbing up the tree. For each layer, * x.size = size(x.left) + size(x.right) + 1; * is executed, so that after adding nodes, the sizes of all nodes on the relevant path are updated.

performance

Insertion of new nodes and missed searches all need to search from the root node of the whole tree to the bottom of the tree, so the performance of the binary search tree depends on the shape of the tree, because the shape of the tree determines the depth of the tree. In the best case, a tree with N nodes is completely balanced, all empty links are at the bottom, and the distance from the root node is LgN; in the worst case, the shape of the tree becomes a list, and the depth of the tree is N, which can be caused by inserting elements one by one into the binary search tree. In general, the shape of the obtained tree is closer to the best case, and the performance of the binary search tree is at logarithmic level. In the original English version of "Ji Ji", there are 14350 words in more than 7 characters. There are 5737 different words in these words. These words are used as keys to test the performance of different symbol tables. The results are as follows: In the figure, abscissa indicates the number of insertions, ordinate indicates the number of comparisons when inserting, grey points indicate the actual number of comparisons when inserting, red points indicate the average number of comparisons (total number of comparisons/number of insertions). We have learned the previous implementation based on the chaotic list and ordered array, the average number of times are 2246 and 484, respectively. We can see that the binary search tree is no matter in the word ratio. In terms of the number of comparisons or the average number of comparisons, progress has been made by leaps and bounds of magnitude.

Orderliness-related operations

Binary lookup trees not only have good performance, but also support orderly related operations because they can keep keys orderly.

Maximum key and minimum key

The value of the left subtree of a node is less than that of the right subtree, so the minimum may be in the left subtree. If the left subtree is empty, the current node is the minimum. Based on this algorithm, the maximum and minimum codes are realized as follows:

public Key min() {
    if (isEmpty())
        throw new NoSuchElementException("calls min() with empty symbol table");
    return min(root).key;
}

private Node min(Node x) {
    if (x.left == null)
        return x;
    else
        return min(x.left);
}

public Key max() {
    if (isEmpty())
        throw new NoSuchElementException("calls max() with empty symbol table");
    return max(root).key;
}

private Node max(Node x) {
    if (x.right == null)
        return x;
    else
        return max(x.right);
}

Up and down rectification

For downward integration, if the given key is less than the key of the root node, then the maximum value of keys less than or equal to the given key is in the left subtree of the root node. If the given key is larger than the root node, then only when there are nodes smaller than or equal to the given key in the right subtree of the root node, the value of downward integration will appear in the right subtree. Otherwise, the root node is the value to be found. The method and method of upward integration This is similar:

public Key floor(Key key) {
   Node n = floor(root, key);
   if (n == null) {
       return null;
   } else {
       return n.key;
   }
}

private Node floor(Node x, Key key) {
   if (x == null) {
       return null;
   }

   int cmp = key.compareTo(x.key);
   if (cmp == 0)
       return x;
   if (cmp < 0)
       return floor(x.left, key);

   Node n = floor(x.right, key);
   if (n == null) {
       return x;
   } else {
       return n;
   }
}

public Key ceiling(Key key) {
   if (n == null) {
       return null;
   } else {
       return n.key;
   }
}

private Node ceiling(Node x, Key key) {
   if (x == null) {
       return null;
   }

   int cmp = key.compareTo(x.key);
   if (cmp == 0)
       return x;
   if (cmp > 0)
       return ceiling(x.right, key);

   Node n = ceiling(x.left, key);
   if (n == null) {
       return x;
   } else {
       return n;
   }
}

Choice, ranking

The ranking starts at 0, and the selection method select(k) returns the ranked key, with K less than its key in the tree. If the node T in the left subtree is greater than k, it will continue to search in the left subtree. If t equals k, then the root node is the key to find. If t is less than k, the key ranked k-t-1 in the right subtree will be searched. The resulting code is as follows:

public Key select(int k) {
    return select(root, k).key;
}

private Node select(Node x, int k) {
    if (x == null) {
        return null;
    }

    int t = size(x.left);
    if (t > k) {
        return select(x.left, k);
    } else if (t < k) {
        return select(x.right, k - t - 1);
    } else {
        return x;
    }
}

The rank() method is the inverse of the selection method, which returns the order of a given key. If the given key is equal to the root node, then the key ranking is the total number of nodes in the left subtree of the root node t; if the given key is less than the root node, the recursive calculation continues in the left subtree; if the given key is larger than the root node, it returns t+1 plus its ranking in the right subtree.

public int rank(Key key) {
    return rank(key, root);
}

private int rank(Key key, Node x) {
    if (x == null) {
        return 0;
    }
    int cmp = key.compareTo(x.key);
    if (cmp > 0) {
        return size(x.left) + rank(key, x.right) + 1;
    } else if (cmp < 0) {
        return rank(key, x.left);
    } else {
        return size(x.left);
    }
}

Range lookup

Range lookup requires that all keys in a given range be returned. The basic method of traversing a binary tree, intermediate traversal, is used here. First traverse all keys in the left subtree, then traverse the root node, and finally all keys in the right subtree. This process is carried out recursively, and all nodes can be traversed in the order of small to large.

public void keys(Node x, Queue<Key> queue, Key lo, Key hi) {
    if (x == null)
        return;
    int cmplo = lo.compareTo(x.key);
    int cmphi = hi.compareTo(x.key);
    if (cmplo < 0)
        keys(x.left, queue, lo, hi);
    if (cmplo <= 0 && cmphi >= 0)
        queue.enqueue(x.key);
    if (cmphi > 0)
        keys(x.right, queue, lo, hi);
}

Delete operation

Delete maximum and minimum keys

When deleting the smallest key, you need to go deep into the left subtree of the root node until you meet an empty link, and then point the link to the right subtree of the node. The node to be deleted will be cleaned up by the garbage collector because it is not referenced by any object. The process of deleting the largest key is similar.

public void deleteMin() {
    root = deleteMin(root);
}

private Node deleteMin(Node x) {
    if (x.left == null)
        return x.right;
    x.left = deleteMin(x.left);
    x.size = size(x.left) + size(x.right) + 1;
    return x;
}                           

public void deleteMax() {
    root = deleteMax(root);
}

private Node deleteMax(Node x) {
    if (x.right == null)
        return x.left;
    x.right = deleteMax(x.right);
    x.size = size(x.left) + size(x.right) + 1;
    return x;
}

The deleteMin(Node x) method returns to the node x unless it encounters empty links. Only the last recursion points to the last node x.right. When the recursion exits, the node counter on the path will be updated.

General Delete Operation

delete() is the most difficult method to implement in binary search tree. When deleting the largest and smallest keys, only one of the two sub-nodes of the deleted node is not empty, but the general node will have two sub-nodes. After deleting this node, we need to deal with its two sub-nodes reasonably. In 1962, T.Hibbard proposed the first way to solve this problem. After deleting node x, he filled its position with its successor nodes. Because X has a right sub-node, its successor node is the smallest node in its right sub-tree. Such substitution still ensures the orderliness of the tree, because there are no other keys between x.key and its successor nodes. Four steps are required to complete this operation:

  • Save the link to the node to be deleted as t;
  • Point x to its successor node min(t.right);
  • The right link of X (originally pointing to a binary search tree with all nodes larger than x.key) points to deleteMin(t.right), which is a sub-binary search tree with all nodes larger than x.key after deletion.
  • Set the left link of x (originally empty) to t.left (all keys below it are smaller than the deleted node and its successors).
public void delete(Key key) {
    root = delete(root, key);
}

private Node delete(Node x, Key key) {
    if (x == null)
        return null;

    int cmp = key.compareTo(x.key);
    if (cmp < 0)
        x.left = delete(x.left, key);
    else if (cmp > 0)
        x.right = delete(x.right, key);
    else {
        if (x.right == null)
            return x.left;
        if (x.left == null)
            return x.right;
        Node t = x;
        x = min(t.right);
        x.right = deleteMin(t.right);
        x.left = t.left;
    }
    x.size = size(x.left) + size(x.right) + 1;
    return x;
}

Tags: Programming less

Posted on Tue, 08 Oct 2019 14:33:02 -0700 by leoric1928