In-depth analysis of the working principle of sort in V8

background

Consideration caused by an algorithm problem.
I met this question before when I was brushing the question in leetcode (the source of the question) Finding the Median of Two Ordered Arrays)

Given two ordered arrays of size m and n, nums1 and nums2.
Please find the median of these two ordered arrays and ask the time complexity of the algorithm to be O (log (m + n).
You can assume that nums1 and nums2 are not empty at the same time.

Example 1
nums1 = [1, 3]
nums2 = [2]

The median is 2.0.

Example 2
nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5

When I did this, I didn't examine the question carefully. I combined the two numbers directly and then ranked them in order to get the median. When I submitted it, I also took AC directly. Later, when I looked at the analysis, I found someone Tucao js directly used sort to play a shameless role. Only then did I find that the time complexity of the original topic was O(log(m + n)).

So the question arises, the fastest sorting is O(nlogn), so why not overtime?

Let's look at the answers I won't examine first.

var findMedianSortedArrays = function(nums1, nums2) {
    let num = nums1.concat(nums2);
    num = num.sort((a, b) => a - b);
    let mid = Math.floor(num.length / 2);
    if (num.length % 2 === 0) {
        return (num[mid-1] + num[mid])/2
    } else {
        return num[mid]
    }
};

I'm using V8 optimized sort instead of ordinary fast queue, so does it prove that V8 sort is faster than fast queue?

Comparison of sort and quick sort in V8

Here I write a script to compare the run time of the two algorithms.

var quickSort = function (arr) {
    if (arr.length <= 1) { return arr; }
    var pivotIndex = Math.floor(arr.length / 2);
    var pivot = arr.splice(pivotIndex, 1)[0];
    var left = [];
    var right = [];
    for (var i = 0; i < arr.length; i++) {
        if (arr[i] < pivot) {
            left.push(arr[i]);
        } else {
            right.push(arr[i]);
        }
    }
    return quickSort(left).concat([pivot], quickSort(right));
};

let arr = [], brr = [], idx = 0, length = Math.floor(Math.random() * 10000000);
while (idx < length) {
    arr[idx] = brr[idx] = Math.floor(Math.random() * length);
    idx++;
}
console.log('length===', length)

console.time('quicksort')
quickSort(arr)
console.timeEnd('quicksort')

console.time('V8_sort')
brr.sort((a, b) => {
    return a - b
})
console.timeEnd('V8_sort')

We can see that no matter how long the random array is, the sort provided by V8 is obviously faster than that provided by fast queue. (Maybe someone has vomited my fast queue has a problem. Quick queue writing is taken from Mr. Ruan Yifeng's blog. Maybe it's very important to say that Mr. Ruan Yifeng's fast queue is non-in-place fast queue. Okay, please go out and turn left, don't send it.)

Principle of Quick Row

Here we first give you a refresher on the principle of fast-track, familiar students can go directly to the next title.

principle

1. Choose an element as the "benchmark"
(2) Elements smaller than the "benchmark" are moved to the left of the "benchmark"; elements larger than the "benchmark" are moved to the right of the "benchmark".
(3) Repeat the first and second steps for the left and right subsets of the benchmark until there is only one element left in all subsets.

Example

The following example is taken from Mr. Ruan Yifeng's blog Javascript Implementation of Quicksort
For example, now there is a data set {85, 24, 63, 45, 17, 31, 96, 50}, how to sort it?

The first step is to select the intermediate element 45 as the "benchmark". (The benchmark value can be chosen arbitrarily, but the intermediate value is easier to understand.)

The second step is to compare each element with the "benchmark" in order to form two subsets, one is "less than 45" and the other is "greater than or equal to 45".

The third step is to repeat the first and second steps of the two subsets until there is only one element left in all subsets.

sort principle in V8

The sort in V8 is not a single sorting method, but a specific method based on the length of the array. When the length of the array is less than or equal to 22, insert sorting is used, and fast sorting is selected when the length of the array is greater than 22. The source code reads as follows:

  // In-place QuickSort algorithm.
  // For short (length <= 22) arrays, insertion sort is used for efficiency.

There's nothing to say about insertion sort, which is skipped in this article.
So let's focus on how to achieve quick sort ing in V8.

Selection of benchmarks

Look at the source code first

  if (to - from <= 10) {
    InsertionSort(a, from, to);
    return;
  }
  if (to - from > 1000) {
    third_index = GetThirdIndex(a, from, to);
  } else {
    third_index = from + ((to - from) >> 1);
  }

When the length of an array is less than or equal to 10, the remaining arrays are sorted directly by insertion.
(2) When the array length is greater than 10 or less than 1000, third_index = from + ((to - from) > 1;
(3) When the length of the array is greater than 1000, it is obtained by the following functions

var GetThirdIndex = function(a, from, to) {
    var t_array = new InternalArray();
    // Use both 'from' and 'to' to determine the pivot candidates.
    var increment = 200 + ((to - from) & 15);
    var j = 0;
    from += 1;
    to -= 1;
    for (var i = from; i < to; i += increment) {
      t_array[j] = [i, a[i]];
      j++;
    }
    t_array.sort(function(a, b) {
      return comparefn(a[1], b[1]);
    });
    var third_index = t_array[t_array.length >> 1][0];
    return third_index;
 }

Here we add from + ((to - from) > 1) and 200 + ((to - from) & 15) & >

The bitwise operator "&" is a binary operator. Its function is to participate in the operation of the two corresponding binary phase sum. Only when the corresponding two binary bits are 1, the result bits are 1. The two numbers involved in the operation appear as complements.

Rules:

1&1=1
1&0=0
0&1=0
0&0=0

For example:

3: 0000 0011 
5: 0000 0101
 The results are as follows:
1: 0000 0001
 So 3 & 5 = 1

(2) > According to the binary system, the digits are shifted right to designated digits, the symbolic bits are zero-plus, the symbolic bits are negative-plus, and the low bits are removed directly.
For example:

let a = 60;
(60: 0011 1100)
a >> 2 And then it's 15.
(15: 0000 1111)

Source code parsing

Source code is too long, we do not go through it line by line, directly paste on the more critical code, interested students can see the source code on github V8 sort source code It is suggested to start with line 710

if (!IS_CALLABLE(comparefn)) {
    comparefn = function (x, y) {
      if (x === y) return 0;
      if (%_IsSmi(x) && %_IsSmi(y)) {
        return %SmiLexicographicCompare(x, y);
      }
      x = TO_STRING(x);
      y = TO_STRING(y);
      if (x == y) return 0;
      else return x < y ? -1 : 1;
    };
  }
  var InsertionSort = f

Students who have used sort should know that the function receives a function comparefn as a parameter. If it is not passed, it defaults to sort elements in ascending order by string, such as:

var InsertionSort = function InsertionSort(a, from, to) {
    for (var i = from + 1; i < to; i++) {
      var element = a[i];
      for (var j = i - 1; j >= from; j--) {
        var tmp = a[j];
        var order = comparefn(tmp, element);
        if (order > 0) {
          a[j + 1] = tmp;
        } else {
          break;
        }
      }
      a[j + 1] = element;
    }
  };

  var GetThirdIndex = function(a, from, to) {
    var t_array = new InternalArray();
    // Use both 'from' and 'to' to determine the pivot candidates.
    var increment = 200 + ((to - from) & 15);
    var j = 0;
    from += 1;
    to -= 1;
    for (var i = from; i < to; i += increment) {
      t_array[j] = [i, a[i]];
      j++;
    }
    t_array.sort(function(a, b) {
      return comparefn(a[1], b[1]);
    });
    var third_index = t_array[t_array.length >> 1][0];
    return third_index;
  }

  var QuickSort = function QuickSort(a, from, to) {
    var third_index = 0;
    while (true) {
      // Insertion sort is faster for short arrays.
      if (to - from <= 10) {
        InsertionSort(a, from, to);
        return;
      }
      if (to - from > 1000) {
        third_index = GetThirdIndex(a, from, to);
      } else {
        third_index = from + ((to - from) >> 1);
      }
      // Find a pivot as the median of first, last and middle element.
      var v0 = a[from];
      var v1 = a[to - 1];
      var v2 = a[third_index];
      var c01 = comparefn(v0, v1);
      if (c01 > 0) {
        // v1 < v0, so swap them.
        var tmp = v0;
        v0 = v1;
        v1 = tmp;
      } // v0 <= v1.
      var c02 = comparefn(v0, v2);
      if (c02 >= 0) {
        // v2 <= v0 <= v1.
        var tmp = v0;
        v0 = v2;
        v2 = v1;
        v1 = tmp;
      } else {
        // v0 <= v1 && v0 < v2
        var c12 = comparefn(v1, v2);
        if (c12 > 0) {
          // v0 <= v2 < v1
          var tmp = v1;
          v1 = v2;
          v2 = tmp;
        }
      }
      // v0 <= v1 <= v2
      a[from] = v0;
      a[to - 1] = v2;
      var pivot = v1;
      var low_end = from + 1;   // Upper bound of elements lower than pivot.
      var high_start = to - 1;  // Lower bound of elements greater than pivot.
      a[third_index] = a[low_end];
      a[low_end] = pivot;

      // From low_end to i are elements equal to pivot.
      // From i to high_start are elements that haven't been compared yet.
      partition: for (var i = low_end + 1; i < high_start; i++) {
        var element = a[i];
        var order = comparefn(element, pivot);
        if (order < 0) {
          a[i] = a[low_end];
          a[low_end] = element;
          low_end++;
        } else if (order > 0) {
          do {
            high_start--;
            if (high_start == i) break partition;
            var top_elem = a[high_start];
            order = comparefn(top_elem, pivot);
          } while (order > 0);
          a[i] = a[high_start];
          a[high_start] = element;
          if (order < 0) {
            element = a[i];
            a[i] = a[low_end];
            a[low_end] = element;
            low_end++;
          }
        }
      }
      if (to - high_start < low_end - from) {
        QuickSort(a, high_start, to);
        to = low_end;
      } else {
        QuickSort(a, from, low_end);
        from = high_start;
      }
    }
  };
  1. Obtain benchmarks using the methods described above
  2. Sort benchmarks, first and last elements
  3. The second element is traversed to the right and the penultimate element is traversed to the left, respectively. The number larger than the reference on the left side and the number larger than the reference on the right side are obtained. Then the position is exchanged, and then the position of the reference is exchanged with the smaller number after the exchange.
  4. Continue traversing and swapping until the left cursor meets the right cursor.
  5. At this time, the left side of the benchmark is smaller than the benchmark, the right side of the benchmark is larger than the benchmark, split into two arrays, and then recursively iterate through all the steps above until the length of the recursive array is less than or equal to 10, the insertion sort is used directly.

For example:
There is an array let arr= [1, 3, 9, 7, 0, 5, 2, 10, 6, 8, 4];

  1. First, the QuickSort function is executed. from is 0, to is 11, the array length is 11, and the benchmark is 0 + ((11 - 0) > 1) equal to 5.
  2. So a[from] is a[0], a[to] is a[10] and a[5], comparing the size of the new array is [1, 3, 9, 7, 0, 4, 2, 10, 6, 8, 5], where a[from]= a[0]= 1, a[from]= a[10]= 5, a[5]= 4;
  3. The benchmark is interchanged with a[to+1], resulting in [1, 4, 9, 7, 0, 3, 2, 10, 6, 8, 5]
  4. Then it enters the partition cycle, where a[low_end] = 9; a[high_start] = 8; starts looking for a value 9 larger than the benchmark from low_end to the right, and a value 2 smaller than the benchmark from high_start to the left, which is exchanged for [1, 4, 2, 7, 0, 3, 9, 10, 6, 8, 5]
  5. Then the benchmark value is exchanged with the smaller value just now to get [1, 2, 4, 7, 0, 3, 9, 10, 6, 8, 5], and repeat step 4.
  6. Then traverse 7, 7 and 3 to get [1, 2, 4, 3, 0, 7, 9, 10, 6, 8, 5]
  7. Next, the benchmark is swapped with smaller values [1, 2, 3, 4, 0, 7, 9, 10, 6, 8, 5]
  8. Finally [1, 2, 3, 0, 4, 7, 9, 10, 6, 8, 5]
  9. It can be seen that the left side of the benchmark is an array smaller than the benchmark, and the right side is an array larger than the benchmark. QuicSort is used to recurse the left side and the right side arrays, respectively, and the results are obtained.

summary

Sort in V8 is not a simple sort method, but a combination of insertion sort and quick sort functions, and optimized for fast sort.

If there are any mistakes, please correct them and change them as soon as possible.

Tags: Javascript less github

Posted on Fri, 30 Aug 2019 00:15:44 -0700 by dantheman