[Heap heap] Introduction to binary Heap, source code implementation and minimum Heap to solve the problem of TOPK

Binary heap

Thank you, little brother Love data structure It's a classic.
Thank you, little brother Love data structure It's a classic.
Thank you, little brother Love data structure It's a classic.

Heap

Emergence of heap

A data structure is designed to store integers. Three interfaces are required

  • Additive elements
  • Get maximum
  • Delete Max

Compare the time complexity with the learned data structure:

In order to meet this demand, a new data structure - heap

  • Get the maximum value: O(logn)O(logn)O(logn)
  • Delete Max: O(logn)O(logn)O(logn)
  • Add element: O(logn)O(logn)O(logn)

Brief introduction of heap

Heap is a tree data structure. Common heap implementations include

  • Binary Heap
  • Multi fork heap (D-heap, D-ary Heap)
  • Index Heap
  • Binomial Heap
  • Fibonacci Heap
  • Leftist Heap
  • Skew Heap

Important properties of heap: the value of any node is always ≥ \ geq ≥ (⩽ \ leqslant ⩽) the value of the child node

  • If the value of any node is always ≥ \ geq ≥ the value of the child node, it is called: maximum heap, large root heap and large top heap
  • If the value of any node is always the value of \ leqslant ⩽ child node, it is called: minimum heap, small root heap, small top heap

Elements in the heap must be comparable (like a binary search tree)

Binary Heap

The logical structure of binary heap is a complete binary tree, so it is also called complete binary heap

In view of some characteristics of complete binary tree, the bottom layer (physical structure) of binary heap is usually realized by array


The law of index i (n is the number of elements)

  • If i = 0, it is the root node
  • If I > 0, the index of its parent node is floor ((I - 1) / 2)
  • If 2i+1 ⩽ n − 12i+1 \ leqslant n - 12i+1 ⩽ n − 1, the index of its left child node is 2i+12i + 12i+1
  • If 2I + 1 > n − 12I + 1 > n - 12I + 1 > n − 1, it has no left child node
  • If 2i+2 ⩽ n − 12i+2 \ leqslant n - 12i+2 ⩽ n − 1, the index of its right child node is 2i+12i + 12i+1
  • If 2I + 2 > n − 12I + 2 > n - 12I + 2 > n − 1, it has no right child node

Get maximum

It mainly checks whether the array cannot be empty, and throws an exception if it is empty.

public E get() {
	emptyCheck();
	return elements[0];
}
private void emptyCheck() {
	if (size == 0) {
		throw new IndexOutOfBoundsException("Heap is empty");
	}
}

Maximum heap - add

The train of thought is:

  • Put the added node at the end of the array

  • Loop up (Sift Up), that is:

    • If node > parent node
      Swap location with parent
    • If node < = parent node, or node has no parent node
      Exit loop
  • Time complexity: O(logn)O(logn)O(logn)

/**
 * Maximum heap - add
 */
public void add(E element) {
	// Check that the passed in element is not empty
	elementNotNullCheck(element);
	// Expand the capacity to ensure that the capacity is greater than the current number of elements
	ensureCapacity(size + 1);
	// Put the element to be added at the end of the array
	elements[size++] = element;
	// Up filtration operation
	siftUp(size - 1);
}
/**
 * Let the elements in index position filter up
 */
private void siftUp(int index) {
	E e = elements[index]; // Nodes to add
	while (index > 0) { // Until the root node is compared
		int pindex = (index - 1) >> 1; // Parent node index
		E p = elements[pindex]; // Add parent of node
		if (compare(e, p) <= 0) return;
		
		// Exchange the contents of index and pindex locations
		E tmp = elements[index];
		elements[index] = elements[pindex];
		elements[pindex] = tmp;
		
		// Reassign index
		index = pindex;
	}
}

Maximum heap - add optimization

The difference from the above is:

  • Not every time the value of the node > the value of the parent node is swapped directly
  • Instead, the newly added nodes are backed up and the final location is determined

/**
 * Maximum heap - add
 */
public void add(E element) {
	// Check that the passed in element is not empty
	elementNotNullCheck(element);
	// Expand the capacity to ensure that the capacity is greater than the current number of elements
	ensureCapacity(size + 1);
	// Put the element to be added at the end of the array
	elements[size++] = element;
	// Up filtration operation
	siftUp(size - 1);
}
/**
 * Let the elements in index position filter up
 */
private void siftUp(int index) {

	E element = elements[index];
	while (index > 0) {
		// Parent node index = (child node index-1) / 2
		int parentIndex = (index - 1) >> 1;
		E parent = elements[parentIndex];
		if (compare(element, parent) <= 0) break;
		
		// Store parent element in index location
		elements[index] = parent;
		
		// Reassign index
		index = parentIndex;
	}
	elements[index] = element;
}

Maximum heap - delete

The train of thought is:

  • Cover the root node with the last node
  • Delete last node
  • The loop performs a sift down operation, that is:
    • If node < the largest child node
      Swap location with the largest child node
    • If node > = the largest child node, or node has no child nodes
      Exit loop
  • Time complexity: O(logn)O(logn)O(logn)
  • Swap location operations can be optimized like adding


/**
 * Max heap - delete top element
 */
public E remove() {
	// Detection array cannot be empty
	emptyCheck(); 
	// Get the index of the last element of the array
	int lastIndex = --size;
	// Get the node to delete the element
	E root = elements[0];
	elements[0] = elements[lastIndex];
	elements[lastIndex] = null;
	
	siftDown(0); // Filtration operation
	return root;
}
/**
 * Let the elements in index position filter down
 */
private void siftDown(int index) {
	E element = elements[index];
	int half = size >> 1; // Number of non leaf nodes
	// Index of the first leaf = = number of non leaf nodes
	// Index < index of the first leaf node
	// It must be ensured that the index position is a non leaf node
	while (index < half) { 
		// There are two types of nodes in index
		// 1. Only the left child node
		// 2. There are left and right child nodes at the same time
		
		// By default, the left child node is compared with it
		int childIndex = (index << 1) + 1;
		E child = elements[childIndex];
		
		// Right child node
		int rightIndex = childIndex + 1;
		
		// Select the one with the largest left and right child nodes
		if (rightIndex < size && compare(elements[rightIndex], child) > 0) {
			child = elements[childIndex = rightIndex];
		}
		
		if (compare(element, child) >= 0) break;

		// Store the child nodes in the index location
		elements[index] = child;
		// Reset index
		index = childIndex;
	}
	elements[index] = element;
}

replace

public E replace(E element) {
	elementNotNullCheck(element);
	
	E root = null;
	if (size == 0) {
		elements[0] = element;
		size++;
	} else {
		root = elements[0];
		elements[0] = element;
		siftDown(0);
	}
	return root;
}

Maximum heap - heapfy

  • Top down filter
  • Bottom up filter

Top down filter

Bottom up filter

Efficiency contrast

  • Top down time complexity of up filtering: O(nlogn)O(nlogn)O(nlogn)
  • Bottom up filtering time complexity: O(nlogk)O(nlogk)O(nlogk)

Binary heap source code

Heap.java, the basic interface of heap

public interface Heap<E> {
	int size();	// Number of elements
	boolean isEmpty();	// Is it empty?
	void clear();	// empty
	void add(E element);	 // Additive elements
	E get();	// Get the top element
	E remove(); // Delete the top of heap element
	E replace(E element); // Insert a new element while deleting the top of heap element
}

Abstract class AbstractHeap.java

@SuppressWarnings("unchecked")
public abstract class AbstractHeap<E> implements Heap<E> {
	protected int size;
	protected Comparator<E> comparator;
	
	public AbstractHeap(Comparator<E> comparator) {
		this.comparator = comparator;
	}
	
	public AbstractHeap() {
		this(null);
	}
	
	@Override
	public int size() {
		return size;
	}

	@Override
	public boolean isEmpty() {
		return size == 0;
	}
	
	protected int compare(E e1, E e2) {
		return comparator != null ? comparator.compare(e1, e2) 
				: ((Comparable<E>)e1).compareTo(e2);
	}
}

Binary heap.java

/**
 * Binary reactor (maximum reactor)
 */
@SuppressWarnings("unchecked")
public class BinaryHeap<E> extends AbstractHeap<E> {
	private E[] elements;
	private static final int DEFAULT_CAPACITY = 10;
	
	public BinaryHeap(E[] elements, Comparator<E> comparator)  {
		super(comparator);
		
		if (elements == null || elements.length == 0) {
			this.elements = (E[]) new Object[DEFAULT_CAPACITY];
		} else {
			size = elements.length;
			int capacity = Math.max(elements.length, DEFAULT_CAPACITY);
			this.elements = (E[]) new Object[capacity];
			for (int i = 0; i < elements.length; i++) {
				this.elements[i] = elements[i];
			}
			heapify();
		}
	}
	
	public BinaryHeap(E[] elements)  {
		this(elements, null);
	}
	
	public BinaryHeap(Comparator<E> comparator) {
		this(null, comparator);
	}
	
	public BinaryHeap() {
		this(null, null);
	}

	@Override
	public void clear() {
		for (int i = 0; i < size; i++) {
			elements[i] = null;
		}
		size = 0;
	}

	@Override
	public void add(E element) {
		elementNotNullCheck(element);
		ensureCapacity(size + 1);
		elements[size++] = element;
		siftUp(size - 1);
	}

	@Override
	public E get() {
		emptyCheck();
		return elements[0];
	}

	/**
	 * Delete the top of heap element
	 */
	public E remove() {
		emptyCheck();
		
		int lastIndex = --size;
		E root = elements[0];
		elements[0] = elements[lastIndex];
		elements[lastIndex] = null;
		
		siftDown(0);
		return root;
	}

	@Override
	public E replace(E element) {
		elementNotNullCheck(element);
		
		E root = null;
		if (size == 0) {
			elements[0] = element;
			size++;
		} else {
			root = elements[0];
			elements[0] = element;
			siftDown(0);
		}
		return root;
	}
	
	/**
	 * Batch building
	 */
	private void heapify() {
		// Top down filter
//		for (int i = 1; i < size; i++) {
//			siftUp(i);
//		}
		
		// Bottom up filter
		for (int i = (size >> 1) - 1; i >= 0; i--) {
			siftDown(i);
		}
	}
	
	/**
	 * Let the elements in index position filter down
	 * @param index
	 */
	private void siftDown(int index) {
		E element = elements[index];
		int half = size >> 1; // Number of non leaf nodes
		// Index of the first leaf = = number of non leaf nodes
		// Index < index of the first leaf node
		// It must be ensured that the index position is a non leaf node
		while (index < half) { 
			// There are two types of nodes in index
			// 1. Only the left child node
			// 2. There are left and right child nodes at the same time
			
			// By default, the left child node is compared with it
			int childIndex = (index << 1) + 1;
			E child = elements[childIndex];
			
			// Right child node
			int rightIndex = childIndex + 1;
			
			// Select the one with the largest left and right child nodes
			if (rightIndex < size && compare(elements[rightIndex], child) > 0) {
				child = elements[childIndex = rightIndex];
			}
			
			if (compare(element, child) >= 0) break;

			// Store the child nodes in the index location
			elements[index] = child;
			// Reset index
			index = childIndex;
		}
		elements[index] = element;
	}
	
	/**
	 * Let the elements in index position filter up
	 */
	private void siftUp(int index) {
//		E e = elements[index];
//		while (index > 0) {
//			int pindex = (index - 1) >> 1;
//			E p = elements[pindex];
//			if (compare(e, p) <= 0) return;
//			
//			//Exchange the contents of index and pindex locations
//			E tmp = elements[index];
//			elements[index] = elements[pindex];
//			elements[pindex] = tmp;
//			
//			//Reassign index
//			index = pindex;
//		}
		E element = elements[index];
		while (index > 0) {
			// Parent node index = (child node index-1) / 2
			int parentIndex = (index - 1) >> 1;
			E parent = elements[parentIndex];
			if (compare(element, parent) <= 0) break;
			
			// Store parent element in index location
			elements[index] = parent;
			
			// Reassign index
			index = parentIndex;
		}
		elements[index] = element;
	}
	
	private void ensureCapacity(int capacity) {
		int oldCapacity = elements.length;
		if (oldCapacity >= capacity) return;
		
		// New capacity is 1.5 times the old capacity
		int newCapacity = oldCapacity + (oldCapacity >> 1);
		E[] newElements = (E[]) new Object[newCapacity];
		for (int i = 0; i < size; i++) {
			newElements[i] = elements[i];
		}
		elements = newElements;
	}
	
	private void emptyCheck() {
		if (size == 0) {
			throw new IndexOutOfBoundsException("Heap is empty");
		}
	}
	
	private void elementNotNullCheck(E element) {
		if (element == null) {
			throw new IllegalArgumentException("element must not be null");
		}
	}
}

Build a minimum heap

After writing the maximum heap, the implementation of the minimum heap does not need to modify the source code, but only needs to pass in the comparator which is the opposite way to the maximum heap when creating the heap.

public static void main(String[] args) {
	Integer[] data = {88, 44, 53, 41, 16, 6, 70, 18, 85, 98, 81, 23, 36, 43, 37};
	BinaryHeap<Integer> heap = new BinaryHeap<>(data, new Comparator<Integer>() {
		public int compare(Integer o1, Integer o2) {
			return o2 - o1; // Opposite to the maximum heap
		}
	});
}

TOP K problem

What is the TopK problem

  • From the N integers, find the largest number of first k (k < n)
  • For example: find the largest 100 integers out of 1 million integers

One of the solutions of TopK problem: it can be solved by data structure "heap"

The time complexity of O(nlogn)O(nlogn)O(nlogn) O (nlogn) is required if the sorting algorithm is used for full sorting

If binary heap is used, the time complexity of O(nlogk)O(nlogk)O(nlogk) can be used

  • Create a new small top heap
  • Scan n integers
    • First, put the first k traversals into the heap
    • Starting from the number k+1, if it is greater than the heap top element, use the replace operation
      (delete the top element and add the number k+1 to the heap)
  • After scanning, all that remains in the heap is the largest number of first k
public static void main(String[] args) {
	// Create a new small top heap
	BinaryHeap<Integer> heap = new BinaryHeap<>(new Comparator<Integer>() {
		public int compare(Integer o1, Integer o2) {
			return o2 - o1;
		}
	});
	
	// Find the largest number of first k
	int k = 3;
	Integer[] data = {51, 30, 39, 92, 74, 25, 16, 93, 
			91, 19, 54, 47, 73, 62, 76, 63, 35, 18, 
			90, 6, 65, 49, 3, 26, 61, 21, 48};
	for (int i = 0; i < data.length; i++) {
		if (heap.size() < k) { // First k added to small top heap
			heap.add(data[i]); // logk
		} else if (data[i] > heap.get()) { // If it is the k + 1 number, and it is greater than the top element of the heap
			heap.replace(data[i]); // logk
		}
	}
	// O(nlogk)
}

What if we find the smallest number of the first k?

  • Use big top pile
  • Use the replace operation if it is smaller than the heap top element
140 original articles published, 30 praised, 10000 visitors+
Private letter follow

Tags: Java

Posted on Sat, 14 Mar 2020 06:21:44 -0700 by drifter