Consideration on the Capacity Expansion Mechanism of JDk1.8 Source HashMap

Preface

JDK 1.8 has made great changes to HashMap, in which the expansion mechanism of HashMap has been optimized.

Before JDK 1.8, in the case of multi-threading, using HashMap for put operation would cause a dead loop. This is because multiple put operations will lead to HashMap's expansion mechanism, which uses interpolation method to move elements, which will cause a closed loop of the list and form a dead cycle. In JDK 1.8, HashMap uses high and low bits to translate elements, which ensures efficiency while avoiding the problem of dead-cycle caused by capacity expansion under multi-threading.

text

In JDK 1.8, HashMap is expanded by resize method. At the same time, resize method also undertakes the initialization of HashMap.

If the array storing Node nodes is empty, the resize method creates a Node array with a length of 16. At the same time, the critical point of the array is set to accommodate 12 Node nodes (16*0.75), 16 Node nodes are the length of the array, and 0.75 is the load factor. If the current array of Node nodes is not empty, resize method doubles the size of the array, creates an array twice the size of the original length, and translates the node of the original array to a new array.

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // Step 1: The old array is not empty
        if (oldCap > 0) {
            // Step 1.1: If the length of the old array is greater than or equal to the maximum capacity, reset the critical value
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Step 1.2: If the old array capacity is larger than the default capacity and the right shift bit is smaller than the maximum capacity, double the capacity
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; 
        }
        // Step 2. If the old array is empty and the critical value is greater than 0, set the new array capacity to the critical value.
        else if (oldThr > 0)
            newCap = oldThr;
        // Step 3. If the old array is empty, the critical value is less than or equal to 0, and the capacity and critical value are set as default values.
        else {
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // Step 4: If the critical value of the new array is 0, set the critical value
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        // Step 5: Create a new array
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // Step 6: If the old array is not empty, traverse the old array and move the node to the new array
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // Step 6.1: If oldTab[index] has only one Node node, recalculate the index and inject the node into a new array
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // Step 6.2: If oldTab[index] is a tree
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Step 6.3: If oldTab[index] is a linked list, move the linked list to the new array according to the high and low bits
                    else {
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // Step 6.3.1: If it's low, move the list in order to the list with loHead as its head and loTail as its tail.
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // Step 6.3.2: If it's high, move the list in order to the list with hiHead as its head and loTail as its tail.
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // Step 6.3.4: Assign loHead to a new array
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Step 6.3.5: Assign hihead to a new array
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Explain step 6.3. If the old array is not empty, we need to migrate the old array to the new array. Data migration requires traversing the old array and moving each index of the old array to the new array. If the Node linked list is found in the current location when traversing the array, the Hash value of Node node and the capacity of the new array need to be calculated at this time. If the calculated value is 0, assign the Node list of the current position of the old array to the same position of the new array, that is, newTab[j] = loHead. If the calculated value is not zero, the Node list of the current position is assigned to the current position of the new array plus the position of the old array capacity, that is, newTab[j + oldCap] = hiHead.

If you want to understand this process, you need to first understand how HashMap calculates array subscripts in JDK 1.8.

int index = node.hash&(table.length-1);

& The operation ensures that the calculated value is less than or equal to any of the values, so the calculated index is less than or equal to table.length-1.

However, after doubling the capacity of the array, the length of the array has changed. At the same time, we must strictly follow the algorithm of index, otherwise we can not get the corresponding Node node at get. Therefore, after translating the data from the old array to the new one, it is necessary to recalculate the array subscripts according to the length of the new array. If we still calculate the array subscripts through the-operation, we need to traverse all the nodes in the array, and calculate the Hash value of the nodes, and subtract the Hash value from the length of the array by One-Operation to get the array subscripts. The implementers of HashMap obviously found a more convenient algorithm. This algorithm can be divided into two cases.

Note: Note the change of the red number represented by the binary Hash of Node node.

Scenario 1: Hash value and old array length

node.hash & oldTable.length != 0;

Let's assume that the Hash value of Node node has a binary of 10000101, the length of the old array is 16, and the binary is 10000. The index calculated at this time is 5.

Hash     : 1000010101
length-1 : 0000001111
           -----
index    : 0000000101

When we expand the array, the length of the array becomes 32, and the Hash value of Node node is still 10000101. At this point, the index calculated is 21.

Hash     : 1000010101
length-1 : 0000011111
           -----
index    : 0000010101

When the node is translated, the new array subscript for storing the node is the old array subscript plus the length of the old array, that is, the new index = oldIndex + oldLength.

Case 2: Hash value and old array length & operation 0

node.hash & oldTable.length == 0;

If we assume that the binary of the Hash value of Node node is 1000000101, the old array length is 16, and the binary is 10000. The index calculated at this time is 5.

Hash     : 1000010101
length-1 : 0000001111
           -----
index    : 0000000101

When we expand the array, the length of the array becomes 32, and the Hash value of Node is still 1011000101. The index calculated at this time is 5.

Hash     : 1000000101
length-1 : 0000011111
           -----
index    : 0000000101

When the node is translated, the new array subscript that stores the node is calculated to be the same as the old array subscript, that is, newIndex = oldIndex.

 

Tags: JDK less

Posted on Wed, 11 Sep 2019 00:26:17 -0700 by Nacota