Java Concurrency Guide 13: HashMap and ConcurrentHashMap in Java 7/8, Fully Explained


HashMap and ConcurrentHashMap in Java 7/8, Fully Explained

From https://www.javadoop.com/post/hashmap#toc7

Part of the content is adapted from

http://www.jasongj.com/java/concurrenthashmap

A "hydrology" post today may not be understood by many readers, but I would like to introduce it as an indispensable part of concurrent serial articles. I didn't think it would take much time, but in the end I spent a lot of time to finish this article.

There are plenty of articles about HashMap and ConcurrentHashMap online, but many of them lack substance, so I wanted to write one myself that explains the details clearly, especially ConcurrentHashMap in Java 8, which most articles fail to make clear. Ultimately, I hope to lower the cost of learning, so you don't have to hunt through unreliable articles everywhere and still come away with only a vague picture.

Reading suggestions: the four sections can basically be read independently. Beginners are advised to read in the order Java 7 HashMap -> Java 7 ConcurrentHashMap -> Java 8 HashMap -> Java 8 ConcurrentHashMap, which appropriately lowers the barrier to reading.

Prerequisites: this article analyzes source code, so readers should at least be familiar with how these classes' interfaces are used. For the concurrency parts, readers should at least know about CAS, ReentrantLock, and UNSAFE operations; these basics will not be introduced here. Java 8 uses red-black trees, but this article will not expand on them; interested readers should look up the relevant material on their own.

Java7 HashMap

HashMap is the simplest: we are all very familiar with it, and it does not support concurrent operation, so its source code is very simple.

First, let's introduce the structure of HashMap with the following diagram.

This is just a sketch, because it doesn't take into account the expansion of the array, let's talk about it later.

At a high level, HashMap contains an array, and each element of the array is the head of a singly linked list.

In the figure above, each green entity is an instance of the nested class Entry, which holds four fields: key, value, hash value, and next for the singly linked list.

capacity: the current array capacity, always kept at a power of 2 (2^n). The array can grow; after a resize it is twice its previous size.

loadFactor: the load factor, 0.75 by default.

threshold: the resize threshold, equal to capacity * loadFactor.
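
To make these three concrete, here is a tiny sketch under the default settings (capacity 16, loadFactor 0.75). Note the exact resize trigger differs slightly between Java 7 and Java 8, as discussed later.

import java.util.HashMap;

public class ThresholdDemo {
    public static void main(String[] args) {
        // Default capacity 16, so threshold = 16 * 0.75 = 12
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < 13; i++) {
            map.put(i, i);
        }
        // 13 entries; in Java 8 the 13th put pushed size past the
        // threshold, so the array has already doubled from 16 to 32.
        System.out.println(map.size()); // 13
    }
}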

put process analysis

The put path is fairly simple; follow along with the code.

public V put(K key, V value) {
    // When inserting the first element, you need to initialize the array size first
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // If the key is null, it ends up in table[0]; follow putForNullKey to see how.
    if (key == null)
        return putForNullKey(value);
    // 1. Find the hash value of key
    int hash = hash(key);
    // 2. Find the corresponding array subscript
    int i = indexFor(hash, table.length);
    // 3. Go through the list at the corresponding subscript to see if there is a duplicate key already in existence.
    //    If so, override it directly, and the put method ends by returning the old value.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // 4. There is no duplicate key. Add this entry to the list, and the details will follow.
    addEntry(hash, key, value, i);
    return null;
}

Array initialization

When the first element is inserted into the HashMap, the initial array size is determined and the resize threshold is calculated.

private void inflateTable(int toSize) {
    // Ensure that the size of the array must be the n-th power of 2.
    // For example, initialize it like this: new HashMap(20), then process it to an initial array size of 32
    int capacity = roundUpToPowerOf2(toSize);
    // Computation of capacity expansion threshold: capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Here the array is actually initialized
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity); //ignore
}

roundUpToPowerOf2 is how the array size is kept at a power of 2. Java 7 and Java 8's HashMap and ConcurrentHashMap all have this requirement, though their implementation code differs slightly, as you will see later.

Calculate the specific array location

This step is simple, and we could guess the idea ourselves: take the key's hash value modulo the array length.

static int indexFor(int hash, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return hash & (length-1);
}

This method is very simple: it takes the low n bits of the hash value. For example, when the array length is 32, the lower 5 bits of the key's hash value are used as its subscript in the array.
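
A minimal illustration (not JDK code, just the same arithmetic): masking with length - 1 keeps exactly the low bits of the hash, and for non-negative hashes it matches hash % length.

public class IndexForDemo {
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 32;             // 2^5, so the mask is 0b11111
        int hash = 0b1011_0110_1101; // an arbitrary hash value (2925)
        System.out.println(indexFor(hash, length)); // 13, the low 5 bits (0b01101)
        System.out.println(hash % length);          // also 13 for non-negative hashes
    }
}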

Add nodes to the list

Once the array subscript is found, the map first checks for an existing equal key; if there is no duplicate, the new entry is placed at the head of the linked list.

void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the current HashMap size has reached the threshold, and the new value has elements in the array location to insert, then expand
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Expansion. I'll introduce it later.
        resize(2 * table.length);
        // After expansion, the hash value is recalculated
        hash = (null != key) ? hash(key) : 0;
        // Recalculating New Subscripts after Expansion
        bucketIndex = indexFor(hash, table.length);
    }
    // See createEntry below
    createEntry(hash, key, value, bucketIndex);
}
// It's very simple. It's actually putting the new value in the header of the linked list, and then size ++.
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

The main logic of this method: decide whether a resize is needed, resize first if so, then insert the new entry at the head of the linked list at the appropriate position of the (possibly resized) array.

Array expansion

As we saw above, when inserting a new value, if the current size has reached the threshold and there is already an element at the target array position, a resize is triggered; after the resize, the array is twice its original size.

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // New Array
    Entry[] newTable = new Entry[newCapacity];
    // Migrate values from the original array into a new, larger array
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

 

The discussion so far has not clarified how the 1.7 and 1.8 HashMap resizes differ, so let's add that here:

HashMap Expansion Mechanisms: Differences between JDK 1.7 and 1.8

Resizing means recomputing the capacity. As elements are continuously added to a HashMap and the internal array can no longer hold them all, the map has to grow the array. Of course, a Java array cannot grow in place, so a new, larger array replaces the existing smaller one, just as you would swap a small bucket for a big one to hold more water.

Let's analyze the resize source. Since the JDK 1.8 version is complicated by the red-black tree integration, we use the JDK 1.7 code, which is easier to follow; in essence there is little difference. We'll discuss the specific differences afterwards.

void resize(int newCapacity) {                    // the new capacity is passed in
    Entry[] oldTable = table;                     // reference to the Entry array before resizing
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {        // if the old capacity already reached the maximum (2^30)
        threshold = Integer.MAX_VALUE;            // raise the threshold to Integer.MAX_VALUE (2^31-1) so we never resize again
        return;
    }

    Entry[] newTable = new Entry[newCapacity];    // allocate a new Entry array
    transfer(newTable);                           // !! migrate the data into the new Entry array
    table = newTable;                             // point HashMap's table field at the new array
    threshold = (int)(newCapacity * loadFactor);  // recompute the threshold
}

This replaces the existing smaller array with a larger one; the transfer() method copies the elements of the original Entry array into the new Entry array.

void transfer(Entry[] newTable) {
    Entry[] src = table;                           // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {         // traverse the old Entry array
        Entry<K,V> e = src[j];                     // take each element of the old array
        if (e != null) {
            src[j] = null;                         // drop the old array's reference (after the loop, the old array references nothing)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // !! recompute each element's position in the new array
                e.next = newTable[i];              // mark [1]: head insertion
                newTable[i] = e;                   // place the element into the new array
                e = next;                          // move on to the next element of the chain
            } while (e != null);
        }
    }
}

Assigning newTable[i] to e.next is head insertion into a singly linked list: new elements are always placed at the head of the list at that position. As a result, elements that were inserted earlier at an index end up at the tail of the chain (when hash collisions occur), which differs from JDK 1.8, as detailed below. Also note that elements on the same chain in the old array may land at different positions in the new array once their index is recomputed.

Here is an example to illustrate the resize process. Suppose our hash function is simply key mod table length. The bucket array starts at size = 2; keys 3, 7 and 5 are put in the order 5, 7, 3, and after mod 2 they all collide in table[1]. Assume loadFactor = 1, i.e. the map resizes once the actual number of key-value pairs exceeds the table size. The next steps show the bucket array being resized to 4 and every node being rehashed.

Next, let's look at what JDK 1.8 optimizes. Observe that since we always grow by a power of two (the length doubles), an element's position either stays the same or moves by exactly the old capacity. The figure below explains what this means: n is the table length; figure (a) shows key1 and key2 resolving their index before the resize, and figure (b) after; hash1 is the result of the hash and high-bit operations for key1.

After re-examining the elements' hashes: because n has doubled, the mask n-1 covers one more bit (shown in red), so the new index changes as follows:

Therefore, when resizing a HashMap, JDK 1.8 does not need to recompute the index the way the JDK 1.7 implementation does; it only needs to look at whether the newly included bit of the hash value is 1 or 0. If 0, the index stays the same; if 1, the index becomes "original index + oldCap". The resize diagram for growing from 16 to 32 looks like this:

This design is really ingenious: it saves the time of recomputing hashes, and since the newly included bit can be considered randomly 0 or 1, the resize evenly spreads previously colliding nodes across the new buckets. This is JDK 1.8's new optimization. One more thing to note: in JDK 1.7's rehash, when the old list migrates and elements land at the same index in the new table, the list order is inverted; as the figure above shows, JDK 1.8 does not invert it. Interested readers can study the JDK 1.8 resize source, which is very well written:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity already exceeds the maximum, stop growing and live with the collisions
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise, grow to twice the original capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move each bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // linked list: split it while preserving order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // put the "original index" list into bucket j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // put the "original index + oldCap" list into bucket j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
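
As a quick sanity check of the bit trick, here is a standalone sketch (not JDK code) showing that (hash & oldCap) alone decides whether a node stays put or moves to index + oldCap:

public class SplitRuleDemo {
    public static void main(String[] args) {
        int oldCap = 16;            // old table length, a power of two
        int newCap = oldCap << 1;   // 32
        int[] hashes = {5, 21, 37}; // all satisfy hash & 15 == 5: one bucket before the resize

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int newIndex = h & (newCap - 1);
            // The newly included bit is exactly (h & oldCap): 0 keeps the index, 1 adds oldCap.
            int predicted = (h & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            System.out.printf("hash=%d old=%d new=%d predicted=%d%n",
                              h, oldIndex, newIndex, predicted);
        }
    }
}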

 

A Classic HashMap Thread-Safety Problem: the Infinite Loop under Concurrent Modification

Beyond that, let's look at a classic problem: the infinite traversal loop that concurrent multi-threaded put operations can cause.

 

In multi-threaded scenarios, we should avoid the thread-unsafe HashMap and use the thread-safe ConcurrentHashMap instead. But why exactly is HashMap thread-unsafe? The following example shows how using HashMap concurrently can produce an infinite loop (for ease of understanding, still in a JDK 1.7 environment):


public class HashMapInfiniteLoop {

    private static HashMap<Integer,String> map = new HashMap<Integer,String>(2, 0.75f);

    public static void main(String[] args) {
        map.put(5, "C");

        new Thread("Thread1") {
            public void run() {
                map.put(7, "B");
                System.out.println(map);
            };
        }.start();
        new Thread("Thread2") {
            public void run() {
                map.put(3, "A");
                System.out.println(map);
            };
        }.start();
    }
}

Here, map is initialized with an array of length 2 and loadFactor = 0.75, so threshold = (int)(2 * 0.75) = 1; that is, putting the second key forces the map to resize.

With breakpoints, suspend thread 1 and thread 2 at the first line of the transfer method. Note that at this point both threads have already successfully added their data. Then step thread 1 to the line "Entry next = e.next;" of transfer, release thread 2's breakpoint, and let thread 2 complete its resize. The result is as follows.

Note that thread 1's e points to key(3) while its next points to key(7), and after thread 2's rehash, key(7) now points into thread 2's rehashed list.

As soon as thread 1 is scheduled again, it first executes newTable[i] = e, then e = next, which makes e point to key(7); on the next loop iteration, next = e.next makes next point to key(3).

Then e.next = newTable[i] makes key(3).next point to key(7). But note that key(7).next already points to key(3), so a circular linked list appears.

So when some thread then calls map.get(11), tragedy strikes: an infinite loop.

 

get process analysis

 

The get process is very simple compared to the put process.

  1. Compute the hash value from the key.
  2. Find the corresponding array subscript: hash & (length - 1).
  3. Traverse the linked list at that array position until the equal key is found (== or equals).
public V get(Object key) {
    // As I said before, if the key is null, it will be placed in table[0], so just go through the list at table[0].
    if (key == null)
        return getForNullKey();
    // 
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

getEntry(key):

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    // Determine the array subscript, then traverse the list from its head until found
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Java7 ConcurrentHashMap

ConcurrentHashMap follows the same basic ideas as HashMap, but since it supports concurrent operations, it is more complex.

The whole ConcurrentHashMap is made up of Segments. Segment means "part" or "section", which is why ConcurrentHashMap is described in many places as using segmented locking. Note that in many places I use "slot" to refer to a segment.

Simply put, ConcurrentHashMap is an array of Segments, and a Segment locks itself by extending ReentrantLock, so every operation that needs a lock locks a single segment. As long as each Segment is thread-safe on its own, global thread safety follows.

concurrencyLevel: parallelism level, concurrency count, number of Segments — however you translate it, just understand it. The default is 16, meaning ConcurrentHashMap has 16 Segments, so in theory up to 16 threads can write concurrently, as long as their operations land on different Segments. This value can be set to something else at construction time, but once initialized it cannot change.
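
A quick sketch (not from the original article): concurrencyLevel is fixed at construction time through the three-argument constructor; here the map gets 32 segments instead of the default 16 (in Java 8 the argument survives only as a sizing hint).

import java.util.concurrent.ConcurrentHashMap;

public class LevelDemo {
    public static void main(String[] args) {
        // initialCapacity = 64, loadFactor = 0.75f, concurrencyLevel = 32
        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<String, Integer>(64, 0.75f, 32);
        map.put("a", 1);
        System.out.println(map.get("a")); // 1
    }
}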

Within each Segment, the structure is very similar to the HashMap described earlier, except it has to be thread-safe, so it is trickier to handle.

Initialization

initialCapacity: the initial capacity. This value refers to the initial capacity of the whole ConcurrentHashMap, which in practice is divided evenly among the Segments.

loadFactor: the load factor. As noted above, the Segment array cannot grow, so this load factor applies internally to the array of each Segment.

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;
    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    // Compute ssize, the parallelism level (number of segments), kept at a power of 2
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    // No need to over-think this: by default, concurrencyLevel is 16 and sshift is 4,
    // which gives segmentShift = 28 and segmentMask = 15; both are used later.
    this.segmentShift = 32 - sshift;
    this.segmentMask = ssize - 1;

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;

    // The initial Capacity is to set the initial size of the entire map.
    // Here, the size of each position in the Segment array is calculated based on the initial Capacity.
    // If initial Capacity is 64, then each Segment or "slot" can be divided into four.
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    // The default MIN_SEGMENT_TABLE_CAPACITY is 2, and this value is also exquisite, because in this case, for specific slots,
    // Inserting an element will not expand, but inserting a second element will expand.
    int cap = MIN_SEGMENT_TABLE_CAPACITY; 
    while (cap < c)
        cap <<= 1;

    // Create a Segment array,
    // And create the first element segment[0] of the array
    Segment<K,V> s0 =
        new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                         (HashEntry<K,V>[])new HashEntry[cap]);
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
    // Write segment[0] to the array
    UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
    this.segments = ss;
}

When initialization completes, we have a Segment array.

When we instantiate with the no-arg constructor new ConcurrentHashMap(), the initialization just performed leaves us with:

  • The Segment array has length 16 and cannot grow
  • Each Segment[i] defaults to size 2 with load factor 0.75, giving an initial threshold of 1.5. That is, inserting the first element will not trigger a resize, while inserting the second triggers the first one
  • segment[0] is initialized here; every other slot is still null. As for why segment[0] is initialized eagerly, the code below explains it
  • segmentShift is 32 - 4 = 28 and segmentMask is 16 - 1 = 15. Simply think of them as a shift amount and a mask; both values are used right away (see the sketch below)
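
To make the shift and mask concrete, here is a small sketch (not JDK code) of how a hash picks a segment under the defaults:

public class SegmentIndexDemo {
    public static void main(String[] args) {
        int segmentShift = 28;  // 32 - sshift, with sshift = 4
        int segmentMask = 15;   // ssize - 1, with ssize = 16
        int hash = 0xABCD1234;  // an arbitrary 32-bit hash

        // The top 4 bits of the hash select one of the 16 segments.
        int j = (hash >>> segmentShift) & segmentMask;
        System.out.println(j);  // 10, i.e. 0xA, the hash's top 4 bits
    }
}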

put process analysis

Let's first look at the main flow of put; some key details of the operations are described afterwards.

public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    // 1. Calculate the hash value of key
    int hash = hash(key);
    // 2. Find the location j in the Segment array based on the hash value
    //    hash is 32 bits, unsigned right shift segmentShift(28) bits, the remaining low 4 bits.
    //    Then do an operation with segmentMask(15), that is, j is the last four bits of the hash value, which is the array subscript of the slot.
    int j = (hash >>> segmentShift) & segmentMask;
    // As I just said, segment[0] is initialized at initialization, but the other location is null.
    // ensureSegment(j) initializes segment[j]
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
        s = ensureSegment(j);
    // 3. Insert new values into slots
    return s.put(key, hash, value, false);
}

The first layer is very simple: use the hash value to quickly locate the corresponding Segment, then perform the put inside that Segment.

Inside, a Segment consists of an array plus linked lists.

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // Before writing to the segment, you need to obtain the exclusive lock of the segment
    //    Look at the main process first, and then introduce this part in detail.
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        // This is an array inside a segment.
        HashEntry<K,V>[] tab = table;
        // Then use hash value to find the array subscripts that should be placed
        int index = (tab.length - 1) & hash;
        // first is the header of the linked list at that location of the array
        HashEntry<K,V> first = entryAt(tab, index);

        // The following for loops are long, but understandable. Consider that there are no elements in the location and there is already a list.
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        // Covering old values
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                // Continue to follow the list
                e = e.next;
            }
            else {
                // Whether node is null depends on how the lock was acquired; it doesn't matter here.
                // If node is not null, just set it as the new list head; if null, create it and set it as the head.
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry<K,V>(hash, key, value, first);

                int c = count + 1;
                // If the threshold of the segment is exceeded, the segment needs to be expanded
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node); // Expansion will also be analyzed in detail.
                else
                    // Without reaching the threshold, place the node in the index position of the array tab.
                    // In fact, it is to set the new node as the header of the original list.
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        // Unlock
        unlock();
    }
    return oldValue;
}

The overall flow is fairly simple: under the protection of the exclusive lock, the operations inside the segment are not complicated. As for the concurrency questions, we will come back to them later.

With the put operation covered, let's look at some of its key steps.

Initialization slot: ensureSegment

ConcurrentHashMap initializes the first slot, segment[0], up front; each other slot is initialized when its first value is inserted.

Concurrency matters here, because multiple threads may come in at the same time to initialize the same slot segment[k]; only one may succeed.

private Segment<K,V> ensureSegment(int k) {
    final Segment<K,V>[] ss = this.segments;
    long u = (k << SSHIFT) + SBASE; // raw offset
    Segment<K,V> seg;
    if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
        // Here you see why segment[0] was initialized before.
        // Initialize segment[k] using the array length and load factor at the current segment[0]
        // Why use "current" because segment[0] may have been expanded long ago
        Segment<K,V> proto = ss[0];
        int cap = proto.table.length;
        float lf = proto.loadFactor;
        int threshold = (int)(cap * lf);

        // Initialize the array inside segment[k]
        HashEntry<K,V>[] tab = (HashEntry<K,V>[])new HashEntry[cap];
        if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
            == null) { // Check again whether the slot has been initialized by other threads.

            Segment<K,V> s = new Segment<K,V>(lf, threshold, tab);
            // Use the while loop, use CAS internally, exit after the current thread has successfully set its value or other threads have successfully set their value.
            while ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
                   == null) {
                if (UNSAFE.compareAndSwapObject(ss, u, null, seg = s))
                    break;
            }
        }
    }
    return seg;
}

Overall, ensureSegment(int k) is fairly simple: it uses CAS to control concurrent initialization.

You might wonder why a while loop is needed: if CAS fails, doesn't that mean another thread succeeded, so why check again? (Thanks to a reader in the comments for clearing this up.) If the current thread's CAS fails, the while loop is there to read the winning thread's value back into seg and return it.
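
The same initialize-a-slot-once pattern can be sketched with AtomicReferenceArray in place of UNSAFE (an illustration of the idea, not the JDK code):

import java.util.concurrent.atomic.AtomicReferenceArray;

public class EnsureSlotDemo {
    private final AtomicReferenceArray<Object> slots =
            new AtomicReferenceArray<Object>(16);

    Object ensureSlot(int k) {
        Object seg = slots.get(k);
        if (seg == null) {
            Object s = new Object(); // candidate value, built before the CAS
            // Retry until someone (this thread or another) has published a value.
            while ((seg = slots.get(k)) == null) {
                if (slots.compareAndSet(k, null, s)) {
                    seg = s;         // our CAS won
                    break;
                }
                // CAS failed: the loop re-reads the winner's value into seg
            }
        }
        return seg;
    }
}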

Get the write lock: scanAndLockForPut

As we saw earlier, a put into a segment first calls node = tryLock() ? null : scanAndLockForPut(key, hash, value); that is, it first does a quick tryLock() to grab the segment's exclusive lock, and if that fails, enters scanAndLockForPut to acquire the lock.

Now let's analyze how this method controls locking.

private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
    HashEntry<K,V> first = entryForHash(this, hash);
    HashEntry<K,V> e = first;
    HashEntry<K,V> node = null;
    int retries = -1; // negative while locating node

    // Loop acquisition lock
    while (!tryLock()) {
        HashEntry<K,V> f; // to recheck first below
        if (retries < 0) {
            if (e == null) {
                if (node == null) // speculatively create node
                    // Enter here to show that the list at that location of the array is empty, without any elements.
                    // Of course, another reason to get in here is the failure of tryLock(), so there is concurrency in the slot, not necessarily the location.
                    node = new HashEntry<K,V>(hash, key, value, null);
                retries = 0;
            }
            else if (key.equals(e.key))
                retries = 0;
            else
                // Go down the chain
                e = e.next;
        }
        // If the number of retries exceeds MAX_SCAN_RETRIES (1 on a single core, 64 on multi-core), stop spinning,
        //    enter the blocking queue and wait; lock() blocks until the lock is acquired
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();
            break;
        }
        else if ((retries & 1) == 0 &&
                 // At this time, there is a big problem, that is, there are new elements into the list, become a new header.
                 //     So the strategy here is to go through the scanAndLockForPut method again.
                 (f = entryForHash(this, hash)) != first) {
            e = first = f; // re-traverse if entry changed
            retries = -1;
        }
    }
    return node;
}

This method has two exits: either tryLock() succeeds and the loop ends, or the retry count exceeds MAX_SCAN_RETRIES and the thread enters lock(), blocking until it acquires the exclusive lock.

The method looks complex but really does one thing: acquire the segment's exclusive lock, instantiating the node along the way if necessary.
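
The spin-then-block pattern it follows can be sketched with a plain ReentrantLock (an illustration, not the JDK code; the real method does useful work while spinning, pre-scanning the list and possibly pre-allocating the node):

import java.util.concurrent.locks.ReentrantLock;

public class SpinThenBlockDemo {
    private static final int MAX_SCAN_RETRIES =
            Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;

    private final ReentrantLock lock = new ReentrantLock();

    void lockForWrite() {
        int retries = 0;
        while (!lock.tryLock()) {      // spin: cheap attempts first
            if (++retries > MAX_SCAN_RETRIES) {
                lock.lock();           // stop spinning, block until acquired
                break;
            }
        }
        // ... write under the lock, then unlock() in a finally block
    }
}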

Expansion: rehash

To repeat: the Segment array itself cannot grow. What grows is the HashEntry<K,V>[] array inside a single segment; after a resize its capacity is double the original.

First, recall where resizing is triggered: during put, if inserting the value would push the segment's element count past the threshold, the segment resizes first. The reader can look back at the put method.

This method does not need to consider concurrency, because the caller holds the segment's exclusive lock at this point.

// The node on the method parameter is the data that needs to be added to the new array after this expansion.
private void rehash(HashEntry<K,V> node) {
    HashEntry<K,V>[] oldTable = table;
    int oldCapacity = oldTable.length;
    // 2 times
    int newCapacity = oldCapacity << 1;
    threshold = (int)(newCapacity * loadFactor);
    // Create a new array
    HashEntry<K,V>[] newTable =
        (HashEntry<K,V>[]) new HashEntry[newCapacity];
    // If the new mask is expanded from 16 to 32, then the sizeMask is 31, corresponding to the binary `000... 00011111'.
    int sizeMask = newCapacity - 1;

    // Traversing through the original array, the old routine, splitting the list at the position I of the original array into two positions I and i+oldCap of the new array
    for (int i = 0; i < oldCapacity ; i++) {
        // e is the first element of the list
        HashEntry<K,V> e = oldTable[i];
        if (e != null) {
            HashEntry<K,V> next = e.next;
            // Computing should be placed in a new array.
            // Assuming that the length of the original array is 16 and e is at oldTable[3], idx may only be 3 or 3 + 16 = 19
            int idx = e.hash & sizeMask;
            if (next == null)   // There is only one element in this location, which is easier to do.
                newTable[idx] = e;
            else { // Reuse consecutive sequence at same slot
                // e is the list head
                HashEntry<K,V> lastRun = e;
                // idx is the new location of the head node e of the current linked list
                int lastIdx = idx;

                // The for loop below finds a lastRun node, after which all the elements are to be put together.
                for (HashEntry<K,V> last = next;
                     last != null;
                     last = last.next) {
                    int k = last.hash & sizeMask;
                    if (k != lastIdx) {
                        lastIdx = k;
                        lastRun = last;
                    }
                }
                // Put the list of lastRun and all subsequent nodes in lastIdx
                newTable[lastIdx] = lastRun;
                // The following operation deals with the nodes before lastRun.
                //    These nodes may be assigned to another list or to the list above.
                for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                    V v = p.value;
                    int h = p.hash;
                    int k = h & sizeMask;
                    HashEntry<K,V> n = newTable[k];
                    newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
                }
            }
        }
    }
    // Place the new node at the head of one of the two lists just created in the new array
    int nodeIndex = node.hash & sizeMask; // add the new node
    node.setNext(newTable[nodeIndex]);
    newTable[nodeIndex] = node;
    table = newTable;
}

The resize here is a little more involved than the earlier HashMap's, and the code is harder to follow. There are two adjacent for loops; what is the first one for?

A closer look shows the code would still work without the first for loop. But when there are many nodes after lastRun, the first loop pays off: only the nodes before lastRun need to be cloned, while the rest simply follow lastRun with no extra work.

I find Doug Lea's idea here interesting, too. The worst case is that every lastRun turns out to be the last element of the list, or very near the end, making the first traversal somewhat wasted. But Doug Lea noted that, statistically, with the default threshold only about one sixth of the nodes need cloning.
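
To see what lastRun buys, here is a standalone sketch (not JDK code) of the scan on a hand-built chain: the tail run that all maps to one index is reused as a group, and only the nodes before it would need cloning.

public class LastRunDemo {
    static class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    public static void main(String[] args) {
        int sizeMask = 31;           // new capacity 32
        // Chain a -> b -> c -> d; new index = hash & sizeMask
        Node d = new Node(21, null); // index 21
        Node c = new Node(53, d);    // index 21
        Node b = new Node(5, c);     // index 5
        Node a = new Node(37, b);    // index 5

        Node lastRun = a;
        int lastIdx = a.hash & sizeMask;
        for (Node p = a.next; p != null; p = p.next) {
            int k = p.hash & sizeMask;
            if (k != lastIdx) {      // remember the last place the index changes
                lastIdx = k;
                lastRun = p;
            }
        }
        // c and d (both index 21) move as one group; only a and b get cloned
        System.out.println(lastRun == c); // true
        System.out.println(lastIdx);      // 21
    }
}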

get process analysis

Compared with put, get is really simple.

  1. Compute the hash value and locate the exact position in the Segment array — the "slot" we referred to earlier
  2. The slot holds an array too; use the hash to find the exact position in that array
  3. That position holds a linked list; just walk the list
public V get(Object key) {
    Segment<K,V> s; // manually integrate access methods to reduce overhead
    HashEntry<K,V>[] tab;
    // 1. hash value
    int h = hash(key);
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    // 2. Find the corresponding segment according to hash
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        // 3. Find the linked list of the corresponding positions of the internal array of segment s and traverse it
        for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}

 

size operation

put, remove and get only need to care about one Segment, while size must traverse all Segments to compute the size of the whole Map. A simple solution is to lock every Segment, compute, then unlock; but then the Map could be neither written nor read during a size operation, which hurts its parallelism.

To better support concurrency, ConcurrentHashMap first tries, up to three times, to compute the size without locking. Each Segment tracks its modification count in modCount just as HashMap does, incrementing it on every change. If two consecutive passes observe the same total of all Segments' modCounts, no update happened between the passes, so the total size computed then can be returned as the final result. If the Map keeps being updated across the passes, all Segments are locked and the size is recomputed. The code is as follows.

 

public int size() {
    final Segment<K,V>[] segments = this.segments;
    int size;
    boolean overflow; // true if size overflows 32 bits
    long sum;         // sum of modCounts
    long last = 0L;   // previous sum
    int retries = -1; // first iteration isn't retry
    try {
        for (;;) {
            // After RETRIES_BEFORE_LOCK unlocked passes, lock every segment
            if (retries++ == RETRIES_BEFORE_LOCK) {
                for (int j = 0; j < segments.length; ++j)
                    ensureSegment(j).lock(); // force creation
            }
            sum = 0L;
            size = 0;
            overflow = false;
            for (int j = 0; j < segments.length; ++j) {
                Segment<K,V> seg = segmentAt(segments, j);
                if (seg != null) {
                    sum += seg.modCount;
                    int c = seg.count;
                    if (c < 0 || (size += c) < 0)
                        overflow = true;
                }
            }
            // Two consecutive passes saw the same modCount sum: the size is stable
            if (sum == last)
                break;
            last = sum;
        }
    } finally {
        if (retries > RETRIES_BEFORE_LOCK) {
            for (int j = 0; j < segments.length; ++j)
                segmentAt(segments, j).unlock();
        }
    }
    return overflow ? Integer.MAX_VALUE : size;
}

 

Differences

JDK 1.7

Compared with HashMap, ConcurrentHashMap has the following differences:

  • ConcurrentHashMap is thread-safe, while HashMap is not
  • HashMap allows null keys and values, while ConcurrentHashMap does not (see the sketch below)
  • HashMap does not allow modification through the map while traversing with an Iterator, while ConcurrentHashMap allows this behavior, and the update is visible to subsequent traversal
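
A quick sketch (not from the original article) of the null-key difference:

import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> hm = new HashMap<String, String>();
        hm.put(null, "ok");               // fine: stored in table[0]
        System.out.println(hm.get(null)); // ok

        ConcurrentHashMap<String, String> chm =
                new ConcurrentHashMap<String, String>();
        try {
            chm.put(null, "boom");
        } catch (NullPointerException e) {
            System.out.println("ConcurrentHashMap rejects null keys and values");
        }
    }
}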

 

Analysis of Concurrent Problems

 

Now that we have covered put and get, note that get takes no lock at all, so naturally we need to think about concurrency.

Adding and removing nodes both take the segment's exclusive lock, so there is naturally no problem between them. The question to consider is a get happening while a put or remove is in progress on the same segment.

  1. Thread safety of put operations:

    1. Initializing a slot, as described earlier, uses CAS to initialize the array inside a Segment.
    2. Adding a node means inserting at the head of the list, so a get that is already traversing the middle of the list is unaffected. The other concern is that a get issued after a put must see the newly inserted head node; this relies on the UNSAFE.putOrderedObject used in the setEntryAt method.
    3. Resizing. A resize creates a new array, migrates the data, and finally assigns newTable to the field table. A concurrent get is still fine: if the get happens first, it queries the old table; if the put happens first, the put's visibility is guaranteed because table is declared volatile.
  2. Thread safety of remove operations:

    We did not analyze the remove source, so interested readers will need to consult the source themselves for the facts.

    A get needs to traverse the list, while remove "breaks" the list.

    If the get has already passed the node that remove destroys, there is no problem.

    If remove breaks a node first, consider two cases. 1. If that node is the head node, the head's next must be set as the new element at that array position; table is volatile, but volatile cannot guarantee visibility of writes to the array's elements, so the source uses UNSAFE to write the array — see the setEntryAt method. 2. If the node to delete is not the head node, remove links the deleted node's successor to its predecessor; the concurrency guarantee here is that the next field is volatile.

Java8 HashMap

Java 8 made some changes to HashMap. The biggest difference is the use of red-black trees, so the structure consists of an array, linked lists, and red-black trees.

From the Java 7 HashMap section we know that a lookup can quickly locate the exact array index from the hash value, but then has to walk the linked list one node at a time; the time complexity depends on the length of the list, O(n).

To reduce this overhead, in Java 8, once a list reaches 8 elements it is converted into a red-black tree, bringing lookups at such positions down to O(logN).

Here is a simple diagram.

Note that the figure above is a schematic, mainly describing the structure; a map would never actually reach this state, because it would have resized long before holding that much data.

Next, let's walk through the code. Personally I find Java 8's source less readable, but it is more streamlined.

In Java 7, each data node was an Entry; in Java 8 it is called Node, with essentially the same fields: key, value, hash, and next. However, Node is only used for linked lists; TreeNode is used in red-black trees.

We can tell whether a given array position holds a linked list or a red-black tree by whether its first node is a Node or a TreeNode.

put process analysis

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

// If the third parameter, onlyIfAbsent, is true, then the put operation is performed only if the key does not exist.
// The fourth parameter, evict, we don't care here.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // On the first put, the resize() below triggers; like Java 7, the first put initializes the array length.
    // The first resize differs a little from later ones: here the array goes from null to the default 16 or the custom initial capacity
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Find the specific array subscripts. If there is no value at this location, initialize Node directly and place it in this location.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);

    else {// Array where there is data
        Node<K,V> e; K k;
        // First, check whether the first node at this position is "equal" to the key we want to insert; if so, take a reference to that node in e.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the node is the node representing the red-black tree, the interpolation method of the red-black tree is called. This paper does not expand on the red-black tree.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Here, it shows that the array is a linked list at that location.
            for (int binCount = 0; ; ++binCount) {
                // Insert at the end of the list (Java 7 is at the top of the list)
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // TREEIFY_THRESHOLD is 8, so if the newly inserted value is the eighth in the list
                    // This triggers the following treeifyBin, which converts the linked list to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If an "equal" key(=== or equals) is found in the list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // At this point, break, then e is the node in the list [equal to the key of the new value to be inserted]
                    break;
                p = e;
            }
        }
        // e!=null indicates that the key with an old value is "equal" to the key to be inserted.
        // For the put operation we analyzed, the following if is actually a "value override" and then returns the old value.
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the size of HashMap has exceeded the threshold due to the new insertion of this value, it needs to be expanded.
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

A slight difference from Java 7: Java 7 resizes first and then inserts the new value, while Java 8 inserts first and then resizes. But this is not important.

Array expansion

The resize() method both initializes the array and grows it; each resize doubles the capacity and migrates the data.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) { // Corresponding Array Expansion
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the size of the array
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // Double the threshold
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // After initialization using new HashMap (int initial capacity), the first put time
        newCap = oldThr;
    else {// After initialization using new HashMap(), the first put time
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;

    // Initialize a new array with a new array size
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab; // If it's an initialization array, it's over here. Return to newTab.

    if (oldTab != null) {
        // Start traversing the original array for data migration.
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // If there is only a single element in the array position, it's easy. Simply migrate the element.
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If it's a red-black tree, split it; we won't expand on the tree case in this article.
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { 
                    // This is the case with linked lists.
                    // You need to split this list into two lists, put it in a new array, and keep the original order.
                    // loHead and loTail correspond to one linked list, hiHead and hiTail correspond to another linked list. The code is relatively simple.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        // The first list
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        // The new location of the second list is j + oldCap, which is easy to understand.
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

get process analysis

Compared with put, get is really simple.

  1. Compute the key's hash value and derive the array subscript from it: hash & (length - 1)
  2. Check whether the element at that array position is exactly the one we are looking for; if not, go to step 3
  3. Check whether the element is a TreeNode; if so, fetch the data using the red-black tree method, otherwise go to step 4
  4. Traverse the linked list until an equal key is found (== or equals)
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Determine if the first node is needed
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // Judging whether it is a red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);

            // List traversal
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

Java8 ConcurrentHashMap

The ConcurrentHashMap of Java 7 is, frankly, already quite complex, and Java 8 changed ConcurrentHashMap dramatically. I suggest readers keep in mind how Java 8's HashMap changed relative to Java 7's; for ConcurrentHashMap, Java 8 likewise introduces the red-black tree.

To be honest, the Java 8 ConcurrentHashMap source is not simple. The hardest part is the resize: the data-migration operation is not easy to follow.

Let's first describe its structure with a schematic diagram:

Structurally it is basically the same as Java 8's HashMap, but since it must be thread-safe, the source is somewhat more complex.

Initialization

// Nothing in this constructor
public ConcurrentHashMap() {
}
public ConcurrentHashMap(int initialCapacity) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException();
    int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
               MAXIMUM_CAPACITY :
               tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
    this.sizeCtl = cap;
}

This constructor is somewhat interesting: from the given initial capacity it computes sizeCtl as 1.5 * initialCapacity + 1, rounded up to the nearest power of 2. If initialCapacity is 10, sizeCtl works out to 16; if initialCapacity is 11, sizeCtl is 32.
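
The rounding can be checked with a small sketch (a simplified version of the JDK's tableSizeFor, valid for positive inputs):

public class SizeCtlDemo {
    // Simplified tableSizeFor: round up to the next power of two
    static int tableSizeFor(int c) {
        int n = c - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return n + 1;
    }

    public static void main(String[] args) {
        for (int initialCapacity : new int[] {10, 11}) {
            // same formula as the constructor: 1.5 * initialCapacity + 1, rounded up
            int cap = tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1);
            System.out.println(initialCapacity + " -> " + cap); // 10 -> 16, 11 -> 32
        }
    }
}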

The field sizeCtl is used in many scenarios, but as long as you follow the thread of this article, you won't be confused by it.

If you like digging, you can also look at the three-argument constructor, which I won't cover here. Most of the time we instantiate with the no-arg constructor, and that is the assumption the following source analysis makes.

put process analysis

Look carefully at the line-by-line code:

public V put(K key, V value) {
    return putVal(key, value, false);
}
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    // Get the hash value
    int hash = spread(key.hashCode());
    // Length used to record the corresponding linked list
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        // If the array is empty, initialize the array
        if (tab == null || (n = tab.length) == 0)
            // Initialize arrays, as detailed later
            tab = initTable();

        // Find the array subscript corresponding to the hash value and get the first node f
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // If this array position is empty,
            //    put the new value there with a CAS operation; the put is then basically done and can jump to the end.
            //          If the CAS fails, there was a concurrent operation: just go into the next loop iteration.
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        // The hash can equal MOVED; we'll see when later, but as the name suggests, it must be because an expansion is in progress.
        else if ((fh = f.hash) == MOVED)
            // Help with data migration, which is easy to understand after reading the introduction of the data migration section.
            tab = helpTransfer(tab, f);

        else { // i.e., f is the head node at this position, and it is not null

            V oldVal = null;
            // Acquire the monitor lock on the head node at this array position
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) { // The hash value of the head node is greater than 0, indicating that it is a linked list.
                        // Used to accumulate the length of the linked list
                        binCount = 1;
                        // Traverse the linked list
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            // If an "equal" key is found, determine whether value coverage is required, and then break
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            // If we reach the end of the list, append the new value at the tail
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) { // red-black tree
                        Node<K,V> p;
                        binCount = 2;
                        // Call the red-black tree's insertion method to insert the new node
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                       value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            // binCount != 0 means the linked-list logic above was executed
            if (binCount != 0) {
                // To determine whether to convert a linked list to a red-black tree, the critical value is 8, just like HashMap.
                if (binCount >= TREEIFY_THRESHOLD)
                    // This method is slightly different from HashMap in that it is not necessarily a red-black tree conversion.
                    // If the length of the current array is less than 64, the array expansion is chosen instead of converting to a red-black tree.
                    //    We won't look at its source here; it is covered in the expansion section later.
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    // Increment the element count by 1; this may also trigger expansion
    addCount(1L, binCount);
    return null;
}

The main flow of put is now covered, but at least a few pieces are still missing: the first is initialization, the second is expansion, and the third is helping with data migration. We introduce these below.
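
One quick usage note grounded in the code above: the onlyIfAbsent parameter of putVal is exactly what distinguishes put from putIfAbsent (put calls putVal(key, value, false), putIfAbsent calls it with true):

import java.util.concurrent.ConcurrentHashMap;

public class PutDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);          // putVal(..., false): overwrites existing values
        map.put("a", 2);
        map.putIfAbsent("a", 3);  // putVal(..., true): keeps the existing value
        System.out.println(map.get("a")); // prints 2
    }
}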

Initializing the array: initTable

This is relatively simple, basically initializing an array of the appropriate size, and then setting sizeCtl.

The concurrency problem in the initialization method is controlled by a CAS operation on sizeCtl.

private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
    while ((tab = table) == null || tab.length == 0) {
        // The "right" to initialize was grabbed by another thread
        if ((sc = sizeCtl) < 0)
            Thread.yield(); // lost initialization race; just spin
        // CAS sizeCtl to -1, which means we grabbed the lock
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
            try {
                if ((tab = table) == null || tab.length == 0) {
                    // DEFAULT_CAPACITY defaults to an initial capacity of 16
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    // Initialize the array: length 16, or the capacity computed at construction time
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    // Assign this array to table, which is volatile
                    table = tab = nt;
                    // If n is 16, sc = 12
                    // Actually, it's 0.75 * n.
                    sc = n - (n >>> 2);
                }
            } finally {
                // Set sizeCtl to sc. Let's call it 12.
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}

Linked list to red-black tree: treeifyBin

As we said earlier in the put source analysis, treeifyBin does not necessarily convert to a red-black tree; it may instead just expand the array. Let's look at the source.

private final void treeifyBin(Node<K,V>[] tab, int index) {
    Node<K,V> b; int n, sc;
    if (tab != null) {
        // MIN_TREEIFY_CAPACITY is 64
        // So, if the length of the array is less than 64, that is, 32 or 16 or less, it will expand the array.
        if ((n = tab.length) < MIN_TREEIFY_CAPACITY)
            // Later we will analyze this method in detail.
            tryPresize(n << 1);
        // b is the head node
        else if ((b = tabAt(tab, index)) != null && b.hash >= 0) {
            // Acquire the lock on the head node
            synchronized (b) {

                if (tabAt(tab, index) == b) {
                    // Here's how to traverse the list and build a red-black tree
                    TreeNode<K,V> hd = null, tl = null;
                    for (Node<K,V> e = b; e != null; e = e.next) {
                        TreeNode<K,V> p =
                            new TreeNode<K,V>(e.hash, e.key, e.val,
                                              null, null);
                        if ((p.prev = tl) == null)
                            hd = p;
                        else
                            tl.next = p;
                        tl = p;
                    }
                    // Set the red-black tree to the corresponding position of the array
                    setTabAt(tab, index, new TreeBin<K,V>(hd));
                }
            }
        }
    }
}

Expansion: tryPresize

If anything in the Java 8 ConcurrentHashMap source code is genuinely not simple, it is the expansion and data-migration operations.

This method can only be fully understood together with the transfer method that follows; readers should be aware of this in advance.

The expansion here is a doubling: the capacity of the array after expansion is twice the original.

// First note: the method parameter size has already been doubled by the caller.
private final void tryPresize(int size) {
    // c: 1.5 * size + 1, rounded up to the nearest power of 2.
    int c = (size >= (MAXIMUM_CAPACITY >>> 1)) ? MAXIMUM_CAPACITY :
        tableSizeFor(size + (size >>> 1) + 1);
    int sc;
    while ((sc = sizeCtl) >= 0) {
        Node<K,V>[] tab = table; int n;

        // This if branch is essentially the same code as the array-initialization path we covered earlier; we can leave it aside here.
        if (tab == null || (n = tab.length) == 0) {
            n = (sc > c) ? sc : c;
            if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
                try {
                    if (table == tab) {
                        @SuppressWarnings("unchecked")
                        Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                        table = nt;
                        sc = n - (n >>> 2); // 0.75 * n
                    }
                } finally {
                    sizeCtl = sc;
                }
            }
        }
        else if (c <= sc || n >= MAXIMUM_CAPACITY)
            break;
        else if (tab == table) {
            // rs is a "resize stamp" derived from n; its exact encoding doesn't really matter here.
            int rs = resizeStamp(n);

            if (sc < 0) {
                Node<K,V>[] nt;
                if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                    sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                    transferIndex <= 0)
                    break;
                // 2. Add sizeCtl to 1 with CAS, and then execute transfer method
                //    At this point nextTab is not null
                if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
                    transfer(tab, nt);
            }
            // 1. Set sizeCtl to (rs << RESIZE_STAMP_SHIFT) + 2.
            //     I haven't worked out the true meaning of this value, but you can verify that the result is a fairly large negative number.
            //  Call the transfer method, where the nextTab parameter is null
            else if (U.compareAndSwapInt(this, SIZECTL, sc,
                                         (rs << RESIZE_STAMP_SHIFT) + 2))
                transfer(tab, null);
        }
    }
}

The core of this method is the manipulation of the sizeCtl value. It is first set to a (large) negative number, then transfer(tab, null) is executed; on a subsequent loop iteration sizeCtl is incremented by 1 and transfer(tab, nt) is executed, and this increment-and-transfer may repeat several times.

So the possible pattern is one transfer(tab, null) call plus several transfer(tab, nt) calls. How the loop terminates only becomes clear after reading the transfer source.
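
To back up the "large negative number" claim, here is a minimal standalone sketch; the two constants mirror the private ones in the JDK 8 source (where RESIZE_STAMP_BITS is 16), and the formula is the same as ConcurrentHashMap's resizeStamp:

public class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    public static void main(String[] args) {
        int rs = resizeStamp(16);
        // The shift overflows into the sign bit, producing a large negative int
        System.out.println((rs << RESIZE_STAMP_SHIFT) + 2); // -2145714174 for n = 16
    }
}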

Data migration: transfer

The following method is a bit long, migrating the elements of the original tab array to the new nextTab array.

Although the tryPresize method above does not itself involve multithreading, this transfer method can be invoked from other places. Typically, as we saw when discussing the put method: look back at put, find the place where the helpTransfer method is called, and note that helpTransfer calls transfer.

This method supports multithreaded execution. The callers guarantee that when the first thread to initiate the data migration calls this method, the nextTab parameter is null; on subsequent calls, nextTab is not null.

Before reading the source, you need to understand the concurrency mechanism here. The original array has length n, so there are n migration tasks. The simplest approach is to let each thread handle one small batch at a time and, on finishing, check whether there are unfinished tasks it can help with. Doug Lea uses a stride, simply understood as a step size: each thread claims responsibility for a contiguous portion each time, e.g. 16 small tasks per batch. This requires a global scheduler to arrange which thread performs which tasks, and that is the role of the attribute transferIndex.

The first thread to initiate the data migration points transferIndex at the end of the original array, and the stride tasks counting backward from there belong to that thread; transferIndex is then moved stride positions toward the front, the next stride tasks belong to the "second" thread, and so on. Of course, the "second" thread here is not necessarily a different thread; the same thread may come back and claim another batch. In effect, the big migration task is divided into task packages. For example, with n = 64 and stride = 16, the claimed batches are positions [48, 63], then [32, 47], [16, 31], and finally [0, 15], as the sketch below illustrates.
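
The claiming scheme itself can be illustrated with a minimal standalone sketch (not JDK code; I use an AtomicInteger where the JDK CASes its transferIndex field directly via Unsafe):

import java.util.concurrent.atomic.AtomicInteger;

public class ClaimDemo {
    static final AtomicInteger transferIndex = new AtomicInteger(64); // n = 64
    static final int stride = 16;

    // Claim the next batch; returns {bound, i} or null when nothing is left
    static int[] claim() {
        for (;;) {
            int nextIndex = transferIndex.get();
            if (nextIndex <= 0)
                return null;
            int nextBound = (nextIndex > stride) ? nextIndex - stride : 0;
            if (transferIndex.compareAndSet(nextIndex, nextBound))
                return new int[] { nextBound, nextIndex - 1 };
        }
    }

    public static void main(String[] args) {
        int[] r;
        while ((r = claim()) != null)
            System.out.println("migrate [" + r[0] + ", " + r[1] + "]");
        // prints [48, 63], [32, 47], [16, 31], [0, 15]
    }
}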

private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
    int n = tab.length, stride;

    // stride equals n directly in single-core mode and (n >>> 3) / NCPU in multi-core mode, with a minimum of 16.
    // stride can be understood as "step size". There are n locations that need to be migrated.
    //   Divide the n tasks into multiple task packages, each with stride tasks
    if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
        stride = MIN_TRANSFER_STRIDE; // subdivide range

    // If nextTab is null, initialize it once first
    //    As we said earlier, the periphery guarantees that when the first thread that initiates the migration calls this method, the parameter nextTab is null.
    //       nextTab will not be null when the thread participating in the migration calls this method
    if (nextTab == null) {
        try {
            // Capacity doubling
            Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
            nextTab = nt;
        } catch (Throwable ex) {      // try to cope with OOME
            sizeCtl = Integer.MAX_VALUE;
            return;
        }
        // nextTable is an attribute of ConcurrentHashMap
        nextTable = nextTab;
        // transferIndex is also a ConcurrentHashMap attribute; it controls the migration position
        transferIndex = n;
    }

    int nextn = nextTab.length;

    // ForwardingNode means roughly "a node being migrated".
    // This constructor creates a Node whose key, value, and next are all null; the key point is that its hash is MOVED.
    // As we will see later, once the node at position i in the original array has been migrated,
    //    position i is set to this ForwardingNode to tell other threads that the position has been processed.
    //    So it is really a marker.
    ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);


    // advance being true means the current position is done and we can prepare for the next one.
    boolean advance = true;
    boolean finishing = false; // to ensure sweep before committing nextTab

    /*
     * In the for loop below, the hardest part to understand comes first.
     * To understand it, first read what comes after it, then come back.
     */

    // i is the position index, bound is the boundary; note that we go from back to front
    for (int i = 0, bound = 0;;) {
        Node<K,V> f; int fh;

        // The following while is genuinely hard to understand.
        // advance being true means the current position is done and we can move to the next one.
        //   The net effect, simply put: i ends up at transferIndex - 1, and bound at transferIndex - stride.
        while (advance) {
            int nextIndex, nextBound;
            if (--i >= bound || finishing)
                advance = false;

            // Assign the transferIndex value to nextIndex
            // Once transferIndex is less than or equal to 0, it means that all the positions of the original array are processed by corresponding threads.
            else if ((nextIndex = transferIndex) <= 0) {
                i = -1;
                advance = false;
            }
            else if (U.compareAndSwapInt
                     (this, TRANSFERINDEX, nextIndex,
                      nextBound = (nextIndex > stride ?
                                   nextIndex - stride : 0))) {
                // Look at the CAS condition above: nextBound is the boundary of this migration batch. Note again: back to front.
                bound = nextBound;
                i = nextIndex - 1;
                advance = false;
            }
        }
        if (i < 0 || i >= n || i + n >= nextn) {
            int sc;
            if (finishing) {
                // All migration operations have been completed
                nextTable = null;
                // Assign the new nextTab to the table attribute to complete the migration
                table = nextTab;
                // Recalculating sizeCtl: n is the length of the original array, so sizeCtl will get a value of 0.75 times the length of the new array.
                sizeCtl = (n << 1) - (n >>> 1);
                return;
            }

            // As we said before, sizeCtl is set to (rs << RESIZE_STAMP_SHIFT) + 2 before the migration starts.
            // Then, each thread participating in the migration adds sizeCtl to 1.
            // Here we use CAS operation to subtract 1 from sizeCtl, which means that we have finished our own task.
            if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                // Our task is finished; exit the method
                if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                    return;

                // Reaching here means (sc - 2) == resizeStamp(n) << RESIZE_STAMP_SHIFT,
                // i.e. all migration tasks are done, so we enter the if(finishing){} branch above.
                finishing = advance = true;
                i = n; // recheck before commit
            }
        }
        // If position i is empty (no node there), place the newly initialized ForwardingNode in it.
        else if ((f = tabAt(tab, i)) == null)
            advance = casTabAt(tab, i, null, fwd);
        // This position holds a ForwardingNode, which means it has already been migrated.
        else if ((fh = f.hash) == MOVED)
            advance = true; // already processed
        else {
            // Lock the head node at this array position and start migrating this position.
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    Node<K,V> ln, hn;
                    // The hash of the header node is greater than 0, indicating that it is the Node node of the linked list.
                    if (fh >= 0) {
                        // The following section is similar to the migration in Java 7's ConcurrentHashMap.
                        // We need to split the list in two:
                        //   find lastRun in the original list; lastRun and all nodes after it migrate together,
                        //   while the nodes before lastRun need to be cloned and divided between the two lists.
                        int runBit = fh & n;
                        Node<K,V> lastRun = f;
                        for (Node<K,V> p = f.next; p != null; p = p.next) {
                            int b = p.hash & n;
                            if (b != runBit) {
                                runBit = b;
                                lastRun = p;
                            }
                        }
                        if (runBit == 0) {
                            ln = lastRun;
                            hn = null;
                        }
                        else {
                            hn = lastRun;
                            ln = null;
                        }
                        for (Node<K,V> p = f; p != lastRun; p = p.next) {
                            int ph = p.hash; K pk = p.key; V pv = p.val;
                            if ((ph & n) == 0)
                                ln = new Node<K,V>(ph, pk, pv, ln);
                            else
                                hn = new Node<K,V>(ph, pk, pv, hn);
                        }
                        // One of the lists goes to position i of the new array
                        setTabAt(nextTab, i, ln);
                        // The other list goes to position i + n of the new array
                        setTabAt(nextTab, i + n, hn);
                        // Setting the location of the original array to fwd means that the location has been processed.
                        //    Once other threads see the hash value of the location as MOVED, they will not migrate.
                        setTabAt(tab, i, fwd);
                        // advance is set to true to indicate that the location has been migrated
                        advance = true;
                    }
                    else if (f instanceof TreeBin) {
                        // Migration of a red-black tree
                        TreeBin<K,V> t = (TreeBin<K,V>)f;
                        TreeNode<K,V> lo = null, loTail = null;
                        TreeNode<K,V> hi = null, hiTail = null;
                        int lc = 0, hc = 0;
                        for (Node<K,V> e = t.first; e != null; e = e.next) {
                            int h = e.hash;
                            TreeNode<K,V> p = new TreeNode<K,V>
                                (h, e.key, e.val, null, null);
                            if ((h & n) == 0) {
                                if ((p.prev = loTail) == null)
                                    lo = p;
                                else
                                    loTail.next = p;
                                loTail = p;
                                ++lc;
                            }
                            else {
                                if ((p.prev = hiTail) == null)
                                    hi = p;
                                else
                                    hiTail.next = p;
                                hiTail = p;
                                ++hc;
                            }
                        }
                        // If either half has no more than 6 nodes (UNTREEIFY_THRESHOLD) after the split, convert the red-black tree back to a linked list
                        ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                            (hc != 0) ? new TreeBin<K,V>(lo) : t;
                        hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                            (lc != 0) ? new TreeBin<K,V>(hi) : t;

                        // Place ln at position i of the new array
                        setTabAt(nextTab, i, ln);
                        // Place hn at position i + n of the new array
                        setTabAt(nextTab, i + n, hn);
                        // Setting the location of the original array to fwd means that the location has been processed.
                        //    Once other threads see the hash value of the location as MOVED, they will not migrate.
                        setTabAt(tab, i, fwd);
                        // advance is set to true to indicate that the location has been migrated
                        advance = true;
                    }
                }
            }
        }
    }
}

In the final analysis, the transfer method does not perform the whole migration by itself: each call only migrates the batch of stride positions just below transferIndex. The rest has to be driven by the caller (tryPresize or helpTransfer calling it again).

At this point, it may be clearer to go back and look at the tryPresize method.

get process analysis

The get method has always been the simplest, and this is no exception:

  1. Calculate hash value
  2. Find the corresponding array position from the hash value: (n - 1) & h
  3. Search according to the kind of node at that position:
    • If the position is null, return null directly.
    • If the node at that position is exactly the one we need, return its value.
    • If the hash of the node at that position is less than 0, it means the table is expanding, or this bin is a red-black tree; either way, the node's find method handles the lookup.
    • If none of the above applies, it is a linked list: traverse it and compare.
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        // Check whether the head node is the node we need
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        // If the hash of the header node is less than 0, it indicates that it is expanding or that the location is a red-black tree.
        else if (eh < 0)
            // Refer to ForwardingNode.find(int h, Object k) and TreeBin.find(int h, Object k)
            return (p = e.find(h, key)) != null ? p.val : null;

        // Traverse the linked list
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

Simply put, most of this method is very simple. Only when an expansion happens to be in progress does ForwardingNode.find(int h, Object k) become slightly more involved, but once you understand the data-migration process above it is not difficult either, so I won't expand on it at length here.
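
For the curious, the method is short; the JDK 8 implementation is essentially the following (lightly commented here; check the real source if precision matters). It redirects the lookup into nextTable, looping rather than recursing when it meets yet another ForwardingNode:

Node<K,V> find(int h, Object k) {
    // Loop to avoid arbitrarily deep recursion on forwarding nodes
    outer: for (Node<K,V>[] tab = nextTable;;) {
        Node<K,V> e; int n;
        if (k == null || tab == null || (n = tab.length) == 0 ||
            (e = tabAt(tab, (n - 1) & h)) == null)
            return null;
        for (;;) {
            int eh; K ek;
            if ((eh = e.hash) == h &&
                ((ek = e.key) == k || (ek != null && k.equals(ek))))
                return e;
            if (eh < 0) {
                if (e instanceof ForwardingNode) {
                    // Another resize has started; follow its nextTable
                    tab = ((ForwardingNode<K,V>)e).nextTable;
                    continue outer;
                }
                else
                    // e.g. a TreeBin; delegate to its own find
                    return e.find(h, k);
            }
            if ((e = e.next) == null)
                return null;
        }
    }
}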

 

size operation

Both the put and remove methods maintain the size of the Map through the addCount method, and the size() method reads the count that addCount maintains via sumCount().
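
The counting scheme is LongAdder-like: addCount adds to a baseCount field, falling back to per-thread CounterCells under CAS contention, and sumCount adds everything up. A minimal standalone sketch of the summation (the names and the AtomicLong stand-in are mine; the real CounterCell is a padded holder of one volatile long):

import java.util.concurrent.atomic.AtomicLong;

public class SumCountDemo {
    static long baseCount = 5;
    static AtomicLong[] counterCells = { new AtomicLong(3), null, new AtomicLong(2) };

    // Mirrors the shape of ConcurrentHashMap.sumCount(): base plus all non-null cells
    static long sumCount() {
        long sum = baseCount;
        if (counterCells != null)
            for (AtomicLong cell : counterCells)
                if (cell != null)
                    sum += cell.get();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumCount()); // 10
    }
}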

 

Synchronization

For the put operation, if the array element corresponding to the key is null, a CAS operation sets it to the new value. If the array element corresponding to the key (the head of a list or the root element of a tree) is not null, the synchronized keyword is used to lock that element, and the operation proceeds under the lock. If the put causes the length of the current list to exceed a threshold, the list is converted into a tree to improve addressing efficiency.

For read operations, since the table array is declared volatile, there is no need to worry about the visibility of the array reference. Meanwhile, each element is a Node instance (in Java 7, each element was a HashEntry). Its key and hash are final and cannot be changed, so their visibility is not a concern; its val and its reference to the next element are volatile, so visibility is guaranteed.

 


static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;
    volatile Node<K,V> next;
}

The visibility of the array element corresponding to a key is guaranteed by Unsafe's getObjectVolatile method:

 


static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
    return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
}

Summary

In fact, it is not that difficult. Although this was not a line-by-line source analysis like the earlier AQS and thread pool articles, every place where a beginner might get confused has been covered thoroughly; a reader with a little foundation should find it easy to understand the HashMap and ConcurrentHashMap source.

Reading source code is not a goal in itself. Understanding Doug Lea's design ideas in depth is what I find interesting: a master is a master, and the code really is well written.

I've found that many people think my posts are mainly source analysis. To be honest, I don't have that much enthusiasm for source analysis as such; it is mainly a vehicle for making points. Later articles may still have a large source-analysis component; judge them as you see fit.

Immodestly, I think the quality of this article is quite high and the amount of information fairly large. If you think some parts are badly written, or you still don't understand them after reading, please point them out.

(End of the text)

 
