LinkedList implementation principle (JDK1.8)

LinkedList implementation principle (JDK1.8)

The bottom layer of LinkedList adopts bidirectional linked list. If you are familiar with the structure of linked list, it is quite easy to understand the implementation principle of LinkedList.

The linked list uses a group of scattered memory blocks in series through "pointer". Each element (node) points to its next element through the pointer, and the next point of the last node is null. The bidirectional linked list is the last element that points to every element except the head node at the same time. Linked list is different from array, because its address is not continuous, and the size of element occupation is uncertain, so it is impossible to get the value directly according to the address, and it needs traversal access when getting the element. Compared with one-way linked list, two-way linked list takes up extra memory to some extent, but supports two-way traversal, and speeds up element acquisition.

public class LinkedList<E> extends AbstractSequentialList<E>
    implements List<E>, Deque<E>, Cloneable, java.io.Serializable

LinkedList inherits from AbstractSequentialList abstract class, AbstractSequentialList inherits from AbstractList, and finally implements the List interface, so LinkedList supports some regular operations of List collection, but LinkedList also implements the Queue interface, so LinkedList also supports some uses of queues, such as peek, pop, etc.

1. Member variable

    transient int size = 0;

    /**
     * Pointer to first node.
     * Invariant: (first == null && last == null) ||
     *            (first.prev == null && first.item != null)
     */
    transient Node<E> first;

    /**
     * Pointer to last node.
     * Invariant: (first == null && last == null) ||
     *            (last.next == null && last.item != null)
     */
    transient Node<E> last;

LinkedList has three main member variables: size, first of the head Node and last of the tail Node, which also conforms to the characteristics of two-way linked list. According to the head or tail Node, the entire linked list can be traversed. As for the Node type, it is the internal class Node.

    private static class Node<E> {
        // Node data
        E item;
        // Next node
        Node<E> next;
        // Previous node
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

The Node class is simply composed of three attributes, which are the three essential elements of a bidirectional linked list Node, the current Node data item, the next Node and the prev of the previous Node.

2. Construction method

    public LinkedList() {
    }

    public LinkedList(Collection<? extends E> c) {
        this();
        // Add all elements
        addAll(c);
    }

LinkedList mainly consists of two construction methods, one is parameterless construction, and the other is accepting the construction of collection. This method initializes LinkedList object and adds all elements of collection c.

As for the addAll method, we'll see later.

LinkedList is based on linked list, so it does not need to apply for memory size in advance, and there is no initialization specified capacity size, so LinkedList has no initialization specified size construction method.

3. Add elements

3.1. Tail addition

There are many ways to add LinkedList, including the simplest and most commonly used tail adding.

    public boolean add(E e) {
        linkLast(e);
        return true;
    }

The main logic is linklast (e e e).

    void linkLast(E e) {
        final Node<E> l = last;
        // Construct a new node, the previous node points to the original last
        final Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null)
            first = newNode;
        else
            l.next = newNode;
        size++;
        // Self increment of operands
        modCount++;
    }

Tail adding is to construct a new node, newNode, and point prev of the previous node to the original last, and change the node to a new last node at the same time. l == null determines whether it is the first time to add. If l is null, then newNode is also the head node. There is only one element of newNode in the current collection. If it is not null, because of the bidirectional linked list, the next node of all l points to newNode.

Finally, the auto increment of size and operation count modCount, and the addition of elements at the end are completed.

In addition to add, there is an addlast (E) method for the tail operation. Except for the return value, there is no difference between the two methods. They are all linkLast methods called.

    public void addLast(E e) {
        linkLast(e);
    }
3.2. Add in the middle
    public void add(int index, E element) {
        // Scope check
        checkPositionIndex(index);
        if (index == size)
            linkLast(element);
        else
            linkBefore(element, node(index));
    }

For intermediate addition, you need to check the range first, that is, ensure the insertion position index is between [0, size], otherwise throw the array out of bounds exception IndexOutOfBoundsException, er Array out of bounds

    private void checkPositionIndex(int index) {
        if (!isPositionIndex(index))
            throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    }

    private boolean isPositionIndex(int index) {
        return index >= 0 && index <= size;
    }

If index == size, it's actually tail insertion, so linkLast is called. This just said tail insertion.

If index < size, there are two steps to insert in the middle:

  1. node(int index) method gets the element succ at index position
  2. Linkbefore (e e e, node < E > succ) connects the element to be inserted after succ

node method is a frequently called method, and many operations of LinkedList depend on the method to find the corresponding elements. When acquiring elements according to index index, because the bidirectional linked list supports traversal before and after, the position judgment is made. Index < (size > > 1), compared with the middle position, the former traverses before, or the latter traverses after.

    Node<E> node(int index) {
        // assert isElementIndex(index);
        if (index < (size >> 1)) { //Preorder traversal
            Node<E> x = first;
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

The traversal logic is very simple. Loop to the previous Node (the next Node is the next one) of index, and get the next (prev) to return the Node object succ corresponding to the index position.

    void linkBefore(E e, Node<E> succ) {
        // assert succ != null;
        final Node<E> pred = succ.prev;
        final Node<E> newNode = new Node<>(pred, e, succ);
        succ.prev = newNode;
        if (pred == null)
            first = newNode;
        else
            pred.next = newNode;
        size++;
        modCount++;
    }

linkBefore is almost the same as linkLast, except that one is added to the last node and the other is added to the succ node.

For the middle insert, if the index is 0, it is actually the head insert. At this time, it is better not to call the node method to find the element, so LinkedList also provides an addFirst(E e) method.

    public void addFirst(E e) {
        linkFirst(e);
    }

    private void linkFirst(E e) {
        final Node<E> f = first;
        final Node<E> newNode = new Node<>(null, e, f);
        first = newNode;
        if (f == null)
            last = newNode;
        else
            f.prev = newNode;
        size++;
        modCount++;
    }
3.3. Batch insertion

LinkedList provides tail batch insertion and middle batch insertion, but the internal implementation is actually called addall (int index, collection <? Extends E > C).

    public boolean addAll(Collection<? extends E> c) {
        return addAll(size, c);
    }

    public boolean addAll(int index, Collection<? extends E> c) {
        // Range validation
        checkPositionIndex(index);
        Object[] a = c.toArray();
        int numNew = a.length;
        if (numNew == 0)
            return false;
        // succ is the index location element, and pred is the previous element of index
        Node<E> pred, succ;
        if (index == size) { // Tail insertion
            succ = null;
            pred = last;
        } else {
            succ = node(index);
            pred = succ.prev;
        }
		// Loop insertion
        for (Object o : a) {
            @SuppressWarnings("unchecked") E e = (E) o;
            Node<E> newNode = new Node<>(pred, e, null);
            if (pred == null)
                first = newNode;
            else
                pred.next = newNode;
            pred = newNode;
        }
        // Cohesion processing
        if (succ == null) {
            last = pred;
        } else {
            pred.next = succ;
            succ.prev = pred;
        
        size += numNew;
        modCount++;
        return true;
    }

At first glance, the addall (int index, collection <? Extensions E > C) method seems to be a little complex, but after understanding its principle, it becomes much clearer. The insertion of the chain table is like the water connection pipe. First disconnect the water pipe from a certain position, and then connect the part to be connected with the interface. In this method, the key is two Node objects pred and succ. Succ is the index location element, and pred is the previous element (change) of index.

In special case index == size, i.e. tail insertion, so succ is null, and pred is tail node last.

Then it's loop assignment, in which node nodes are constructed, similar to linkLast.

The last one is the join processing. If the tail is inserted, the pred is the tail node (there is pred = newNode processing in cyclic assignment), so you only need to specify last = pred. The middle insert indicates that pred.next = succ and succ.prev = pred bind the index location to the new previous element.

4. Find elements

LinkedList provides the get first and get last methods in addition to the general get because its properties contain the first and last nodes.

    public E getFirst() {
        final Node<E> f = first;
        if (f == null)
            throw new NoSuchElementException();
        return f.item;
    }

    public E getLast() {
        final Node<E> l = last;
        if (l == null)
            throw new NoSuchElementException();
        return l.item;
    }

For getFirst and getLast, because they are member variables, the search process is omitted, and the node item can be returned directly.

    public E get(int index) {
        // Range validation
        checkElementIndex(index);
        // Node method get node
        return node(index).item;
    }

The acquisition of the pointer is mainly to call the node method to find the node corresponding to the index. The node method has been mentioned before and will not be repeated.

5. Modify element

To modify an element in the LinkedList collection, you need to find the element first, and then change its Node data item.

    public E set(int index, E element) {
        // Scope check
        checkElementIndex(index);
        // Get the Node object of the corresponding position of index
        Node<E> x = node(index);
        E oldVal = x.item;
        x.item = element;
        return oldVal;
    }

6. Delete element

LinkedList provides many ways to delete elements, but the internal implementation logic is basically the same, that is, find the corresponding Node node, and then replace the point to the Node.

6.1. Remove according to index

Let's first look at the remove(int index) method based on the index.

    public E remove(int index) {
        // Scope check
        checkElementIndex(index);
        // Disconnect node pointer
        return unlink(node(index));
    }

The scope check during deletion will not be mentioned, and the node method will not be mentioned any more. The main logic of deletion is the unlink (node < E > x) method.

    E unlink(Node<E> x) {
        // assert x != null;
        final E element = x.item;
        // Next node
        final Node<E> next = x.next;
        // Previous node
        final Node<E> prev = x.prev;
		// If prev exists, the next node of prev will point to next. If not, the current removed node is actually the head node, and next is the new first node
        if (prev == null) {
            first = next;
        } else {
            prev.next = next;
            x.prev = null;
        }
		// If the next node exists, the next previous node will be pointed to prev. If it does not exist, it means that the current removed node is a non node
        if (next == null) {
            last = prev;
        } else {
            next.prev = prev;
            x.next = null;
        }
		// Trigger GC work
        x.item = null;
        size--;
        // Operation counter auto increment
        modCount++;
        return element;
    }

The entire unlink method is a standard two-way linked list deletion operation. Three nodes prev, x, next, and delete x actually point prev to next and next to prev, but there are some special judgments.

After looking at the removal by index, let's look at the other two special cases, removeFirst and removeLast.

    public E removeFirst() {
        final Node<E> f = first;
        if (f == null)
            throw new NoSuchElementException();
        return unlinkFirst(f);
    }

    private E unlinkFirst(Node<E> f) {
        // assert f == first && f != null;
        final E element = f.item;
        final Node<E> next = f.next;
        f.item = null;
        f.next = null; // help GC
        first = next;
        if (next == null)
            last = null;
        else
            next.prev = null;
        size--;
        // Operation counter auto increment
        modCount++;
        return element;
    }

Unlink first is a simplified version of unlink method, because only the header node is processed, and the next node, next, will be the new first when it exists.

In the same way, removeLast is similar. Here you can experience it by yourself.

    public E removeLast() {
        final Node<E> l = last;
        if (l == null)
            throw new NoSuchElementException();
        return unlinkLast(l);
    }

    private E unlinkLast(Node<E> l) {
        // assert l == last && l != null;
        final E element = l.item;
        final Node<E> prev = l.prev;
        l.item = null;
        l.prev = null; // help GC
        last = prev;
        if (prev == null)
            first = null;
        else
            prev.next = null;
        size--;
        modCount++;
        return element;
    }

LinkedList also has a parameter free remove, which is defined by the Deque interface. The implementation is called removeFirst.

    public E remove() {
        return removeFirst();
    }
6.2. Remove according to element

There is not much difference between element removal and index removal, but the way to find the corresponding node has changed.

    public boolean remove(Object o) {
        // Determine whether the element is null because LinkedList supports adding null
        if (o == null) {
            for (Node<E> x = first; x != null; x = x.next) {
                if (x.item == null) {
                    unlink(x);
                    return true;
                }
            }
        } else {
            for (Node<E> x = first; x != null; x = x.next) {
                if (o.equals(x.item)) {
                    unlink(x);
                    return true;
                }
            }
        }
        return false;
    }

You can see that whether the element is null or not, you first locate the node and then call the unlink method.

Because of the two-way traversal feature, LinkedList provides the methods of pre order deletion and post order deletion, namely, removeFirstOccurrence and removealastoccurrence.

    public boolean removeFirstOccurrence(Object o) {
        return remove(o);
    }

    public boolean removeLastOccurrence(Object o) {
        if (o == null) {
            // Post order traversal
            for (Node<E> x = last; x != null; x = x.prev) {
                if (x.item == null) {
                    unlink(x);
                    return true;
                }
            }
        } else {
            // Post order traversal
            for (Node<E> x = last; x != null; x = x.prev) {
                if (o.equals(x.item)) {
                    unlink(x);
                    return true;
                }
            }
        }
        return false;
    }

These two methods are not mysterious. removeLastOccurrence just traverses the collection in reverse order.

6.3. Batch deletion

In the LinkedList class, there is no removeAll method because instead of overriding it, it uses the

    public boolean removeAll(Collection<?> c) {
        Objects.requireNonNull(c);
        boolean modified = false;
        // Using Iterators 
        Iterator<?> it = iterator();
        while (it.hasNext()) {
            if (c.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
    }

The implementation principle of removeAll is actually iterative deletion. iterator() is only an abstract method in the AbstractCollection class. The AbstractList class has its implementation, but the AbstractSequentialList class overrides this method.

    public Iterator<E> iterator() {
        return listIterator();
    }

The iterator method will call listIterator(), which is implemented in the AbstractList class. He calls the listIterator(int index) method, but LinkedList rewrites the method, so he finally returns to LinkedList.

    public ListIterator<E> listIterator(int index) {
        checkPositionIndex(index);
        return new ListItr(index);
    }

Here the ListItr object is the inner class of LinkedList.

    private class ListItr implements ListIterator<E> {
        private Node<E> lastReturned;
        private Node<E> next;
        private int nextIndex;
        // Expectation counter
        private int expectedModCount = modCount;

When ListItr initializes, it will assign the operation counter modCount to expectedModCount, and every subsequent remove method will check whether expectedModCount and modCount are equal, otherwise an exception will be thrown.

The remove method of ListItr automatically increases the expectedModCount after each call, which has reached the synchronization with the modCount + + in unlink, so that the modCount == expectedModCount has always been established, which is why we need to use the remove method of its iterator when we cycle to delete the LinkedList element.

        public void remove() {
            // Verify modCount
            checkForComodification();
            if (lastReturned == null)
                throw new IllegalStateException();

            Node<E> lastNext = lastReturned.next;
            // unlink delete the node logic, which includes modCount + +;
            unlink(lastReturned);
            if (next == lastReturned)
                next = lastNext;
            else
                nextIndex--;
            lastReturned = null;
            // expectedModCount auto increase
            expectedModCount++;
        }

        final void checkForComodification() {
            // expectedModCount and modCount must be equal
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
        }

Tags: Programming Java

Posted on Tue, 03 Dec 2019 21:36:10 -0800 by bluedogatdingdong