From a CFS Scheduling Case: the Root Cause of Linux System Stutter

The Linux system can feel laggy. Let me state my conclusion up front:

  • The root cause of the stutter is that the scheduler in the Linux kernel pays no attention to business scenarios!

The Linux kernel only sees the machine, not the application. It tends to increase throughput from a bottom-up CPU perspective rather than improve latency from a top-down business perspective.

In human terms, Linux is a good programmer, but not a good manager.

There is a reason for this: Linux was started by a programmer and is built by a crowd of programmers tossing things around, with few suit-wearing managers participating.

Programmers talk about performance, time complexity, cache utilization, CPU, and memory all day, whereas managers shout about customers, customers, customers and experience, experience, experience all day!

The night before, I got back to my residence from work very late, and Deputy Manager Liu asked me a question. He said he was debugging a message queue component involving multiple cooperating threads, such as producers and consumers, and it was very complex. After deploying it online, he found a strange problem:

  • The threads of this component seem to monopolize the whole system's resources. The message queue component itself is very efficient, but the system as a whole stutters badly!

I asked him whether he had deployed configurations like cgroup or cpuset, and he said no.

I asked him how many threads there were in the message queue component. He said no more than 20.

I asked again...He said...

...

I told Deputy Manager Liu that I would like to log on to the machine and debug it myself. He did not agree, but said he could give me as many details as possible, and I could offer remote assistance.

...

I don't know message queuing, nor any middleware; debugging such a system is beyond my power, sorry. I can only try to explain and solve this problem from the perspective of the operating system. Obviously, this is an operating system problem, and clearly a scheduling problem.

Since it involves producers and consumers, if I can reproduce it locally, I can solve it.

So I started Googling the Linux scheduler subsystem for producer/consumer issues: producer, consumer, schedule, cfs, linux kernel...

The next day I kept thinking about it all day. I hadn't reproduced the environment yet and could only think it over in my spare time.

After dinner, I found the following patch:
https://git.bricked.de/Bricked/flo/commit/4793241be408b3926ee00c704d7da3b3faf3a05f

Impact: improve/change/fix wakeup-buddy scheduling

Currently we only have a forward looking buddy, that is, we prefer to
schedule to the task we last woke up, under the presumption that its
going to consume the data we just produced, and therefore will have
cache hot benefits.

This allows co-waking producer/consumer task pairs to run ahead of the
pack for a little while, keeping their cache warm. Without this, we
would interleave all pairs, utterly trashing the cache.

This patch introduces a backward looking buddy, that is, suppose that
in the above scenario, the consumer preempts the producer before it
can go to sleep, we will therefore miss the wakeup from consumer to
producer (its already running, after all), breaking the cycle and
reverting to the cache-trashing interleaved schedule pattern.

The backward buddy will try to schedule back to the task that woke us
up in case the forward buddy is not available, under the assumption
that the last task will be the one with the most cache hot task around
barring current.

This will basically allow a task to continue after it got preempted.

In order to avoid starvation, we allow either buddy to get wakeup_gran
ahead of the pack.

It seems the LAST_BUDDY feature of the CFS scheduler is involved. It concerns the next and last pointers of the run queue, so I decided to design the simplest possible experiment to try to reproduce the problem and explore the LAST_BUDDY feature.

My experiment is very simple:

  • The producer wakes up the consumer in a loop.

To make the wake-up operation as direct as possible, I want to call wake_up_process directly instead of using complex mechanisms such as signals. That requires kernel support, so I first wrote a character device to expose this operation:

// wakedev.c
#include <linux/sched.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/uaccess.h>

#define CMD_WAKE	122

dev_t dev = 0;
static struct class *dev_class;
static struct cdev wake_cdev;

static long _ioctl(struct file *file, unsigned int cmd, unsigned long arg);

static struct file_operations fops = {
	.owner          = THIS_MODULE,
	.unlocked_ioctl = _ioctl,
};

static long _ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	u32 pid = 0;
	struct task_struct *task = NULL;

	switch(cmd) {
		// ioctl wake-up command
		case CMD_WAKE:
			// copy_from_user returns the number of bytes it failed to copy
			if (copy_from_user(&pid, (u32 *) arg, sizeof(pid)))
				return -EFAULT;
			task = pid_task(find_vpid(pid), PIDTYPE_PID);
			if (task) {
				wake_up_process(task);
			}
			break;
	}
	return 0;
}

static int __init crosswake_init(void)
{
	if ((alloc_chrdev_region(&dev, 0, 1, "test_dev")) < 0) {
		printk("alloc failed\n");
		return -1;
	}
	printk("major=%d minor=%d \n", MAJOR(dev), MINOR(dev));

	cdev_init(&wake_cdev, &fops);

	if ((cdev_add(&wake_cdev, dev, 1)) < 0) {
		printk("add failed\n");
		goto err_class;
	}

	// class_create/device_create return ERR_PTR values, not NULL, on failure
	if (IS_ERR(dev_class = class_create(THIS_MODULE, "etx_class"))) {
		printk("class failed\n");
		goto err_class;
	}

	if (IS_ERR(device_create(dev_class, NULL, dev, NULL, "etx_device"))) {
		printk(KERN_INFO "create failed\n");
		goto err_device;
	}

	return 0;

err_device:
	class_destroy(dev_class);
err_class:
	unregister_chrdev_region(dev,1);
	return -1;
}

void __exit crosswake_exit(void)
{
	device_destroy(dev_class,dev);
	class_destroy(dev_class);
	cdev_del(&wake_cdev);
	unregister_chrdev_region(dev, 1);
}

module_init(crosswake_init);
module_exit(crosswake_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("shabi");

Compile, load, and create character devices:

[root@localhost test]# insmod ./wakedev.ko
[root@localhost test]# dmesg |grep major.*minor
[   68.385310] major = 248 minor = 0
[root@localhost test]# mknod /dev/test c 248 0

OK, and then the code for the producer program:

// producer.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd, j;
	int pid = -1;

	pid = atoi(argv[1]);
	fd = open("/dev/test", O_RDWR);
	perror("open");

	while (1) {
		j = 0xfffff;
		ioctl(fd, 122, &pid); // wake up the consumer process
		while (j--) {} // simulate the production work
	}
}

Next comes the consumer program:

// consumer.c
#include <stdio.h>
#include <unistd.h> // for sleep()

int main()
{
	while (1) {
		sleep(1);
	}
}

Compile and run:

# The names are long so that they stand out in the logs!
[root@localhost test]# gcc consumer.c -O0 -o consumerAconsumerACconsumerAconsumer
[root@localhost test]# gcc producer.c -O0 -o producerAproducerAproducerAproducer
# Start the consumer
[root@localhost test]# ./consumerAconsumerACconsumerAconsumer &
# Start the producer, passing the consumer's PID
[root@localhost test]# ./producerAproducerAproducerAproducer 26274

That is roughly the whole experiment. We ran it several times and found no stutter at all.

It's disappointing, but I think it's inevitable: if the problem were really so easy to reproduce, the community would have fixed it long ago, so there is no obvious problem to be found here. Maybe the system was just being used the wrong way.

Anyway, look at the data first and analyze the details from it.

First, while the experiment is running, capture process switches with perf:

[root@localhost test]# perf record -e sched:sched_switch -e sched:sched_wakeup -a -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.303 MB perf.data (1303 samples) ]
[root@localhost test]# perf script -i perf.data > final.data

Here is a snippet from one of about 10 captures:

      loop_sleep  6642 [000] 23085.198476: sched:sched_switch: loop_sleep:6642 [120] R ==> xfsaild/dm-0:400 [120]
    xfsaild/dm-0   400 [000] 23085.198482: sched:sched_switch: xfsaild/dm-0:400 [120] S ==> loop_sleep:6642 [120]
      loop_sleep  6642 [000] 23085.217752: sched:sched_switch: loop_sleep:6642 [120] R ==> producerAproduc:26285 [130]
## From here on
 producerAproduc 26285 [000] 23085.220257: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.220259: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.220273: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.269921: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.269923: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.269927: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.292748: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.292749: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.292752: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.320205: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.320208: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.320212: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.340971: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.340973: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.340977: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.369630: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.369632: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.369637: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.400818: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.400821: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.400825: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.426043: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.426045: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.426048: sched:sched_switch: consumerAconsum:26274 [120] S ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.447646: sched:sched_wakeup: xfsaild/dm-0:400 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.447649: sched:sched_switch: producerAproduc:26285 [130] R ==> xfsaild/dm-0:400 [120]
## Only now, after a long stretch, does the producer give way, and still in the R (runnable) state!
## End here!!!  
    xfsaild/dm-0   400 [000] 23085.447654: sched:sched_switch: xfsaild/dm-0:400 [120] S ==> loop_sleep:6642 [120]
      loop_sleep  6642 [000] 23085.468047: sched:sched_switch: loop_sleep:6642 [120] R ==> producerAproduc:26285 [130]
 producerAproduc 26285 [000] 23085.469862: sched:sched_wakeup: consumerAconsum:26274 [120] success=1 CPU:000
 producerAproduc 26285 [000] 23085.469863: sched:sched_switch: producerAproduc:26285 [130] R ==> consumerAconsum:26274 [120]
 consumerAconsum 26274 [000] 23085.469867: sched:sched_switch: consumerAconsum:26274 [120] S ==> loop_sleep:6642 [120]
      loop_sleep  6642 [000] 23085.488800: sched:sched_switch: loop_sleep:6642 [120] R ==> producerAproduc:26285 [130]

It seems there really is a phenomenon of the producer and consumer jointly dominating the CPU, but it only lasts a few rounds.

What exactly glued these two processes so tightly together? This question is interesting.

The CFS scheduler has not kept its promise of "eliminating tricks and staying simple". More and more heuristic features have been added, repeating the old ways of the O(1) scheduler!

We can see these trick-style features in /sys/kernel/debug/sched_features:

[root@localhost test]# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK LB_BIAS NONTASK_POWER TTWU_QUEUE NO_FORCE_SD_OVERLAP RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NUMA_FAVOUR_HIGHER NO_NUMA_RESIST_LOWER

Take a close look at the documentation for each of these. Here we only care about LAST_BUDDY.

Reviewing the code, we find the following snippet in the wakeup path:

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
	...
    if (wakeup_preempt_entity(se, pse) == 1) {
        /*
         * Bias pick_next to pick the sched entity that is
         * triggering this preemption.
         */
        if (!next_buddy_marked)
            set_next_buddy(pse);
        goto preempt;
    }	
    ...
preempt:
    resched_task(curr);
    /*
     * Only set the backward buddy when the current task is still
     * on the rq. This can happen when a wakeup gets interleaved
     * with schedule on the ->pre_schedule() or idle_balance()
     * point, either of which can drop the rq lock.
     *
     * Also, during early boot the idle thread is in the fair class,
     * for obvious reasons its a bad idea to schedule back to it.
     */
    if (unlikely(!se->on_rq || curr == rq->idle))
        return;
	// Here's the key! The last buddy is set here
    if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se))
        set_last_buddy(se);
}

And the following snippet of the pick_next operation:

/*
 * Pick the next process, keeping these things in mind, in this order:
 * 1) keep things fair between processes/task groups
 * 2) pick the "next" process, since someone really wants that to run
 * 3) pick the "last" process, for cache locality
 * 4) do not run the "skip" process, if something else is available
 */
static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	...
	/*
     * Prefer last buddy, try to return the CPU to a preempted task.
     */
    // next takes precedence over last!
    if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
        se = cfs_rq->last;

    /*
     * Someone really wants this to run. If it's not unfair, run it.
     */
    if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
        se = cfs_rq->next;

    clear_buddies(cfs_rq, se);

    return se;
}

There's something suspicious here.

If the producer is P and the consumer is C, then:

  • P wakes up C; P becomes last.
  • C runs.
  • C finishes its work and blocks.
  • The CPU schedules: pick next.
  • last preempts the leftmost task and wins.
    last's vruntime is larger than leftmost's, but by less than one wakeup granularity, so last is preferred!

The last point above needs explanation, and it comes down to the kernel function wakeup_preempt_entity: wakeup_preempt_entity(cfs_rq->last, left) returns 1 only when left's vruntime is smaller than last's by more than one wakeup granularity. Any return value less than 1 means the last buddy wins the pick.

So far there is in fact already a solution to the problem, roughly:

  • Disable the LAST_BUDDY feature.

This requires only one command:

[root@localhost test]# echo NO_LAST_BUDDY >/sys/kernel/debug/sched_features

Working around it for the deputy manager did not satisfy my curiosity. I still wanted to see what actually happened.

To dig out how things happened, perf alone is not enough; systemtap is needed to help.

The following script captures the details of process wakeup and pick-next during a scheduling switch:

#!/usr/bin/stap -g

global g_cfs_rq;

probe begin {
	g_cfs_rq = 0;
}

function container_of_entity:long(se:long)
{
	offset = &@cast(0, "struct task_struct")->se;
	return se - offset;
}

function container_to_entity:long(task:long)
{
	offset = &@cast(0, "struct task_struct")->se;
	return task +  offset;
}

function entity_to_rbnode:long(rb:long)
{
	offset = &@cast(0, "struct sched_entity")->run_node;
	return rb - offset;
}

function print_task(s:string, se:long, verbose:long, min_vruntime:long)
{
	my_q = @cast(se, "struct sched_entity")->my_q;
	if(my_q == 0) {
		t_se = container_of_entity(se);
		printf("%8s %p  %s %d \n", s, t_se, task_execname(t_se), task_pid(t_se));
	}
}

probe kernel.function("pick_next_task_fair")
{
	printf("--------------- begin pick_next_task_fair --------------->\n");
	g_cfs_rq = &$rq->cfs;
}

probe kernel.function("pick_next_entity")
{
	if (g_cfs_rq == 0)
		next;

	printf("------- begin pick_next_entity ------->\n");
	cfsq = g_cfs_rq;
	vrun_first = 0;
	vrun_last = 0;
	last = @cast(cfsq, "struct cfs_rq")->last;
	if (last) {
		my_q = @cast(last, "struct sched_entity")->my_q;
		if(my_q != 0) {
			cfsq = @cast(last, "struct sched_entity")->my_q;
			last = @cast(cfsq, "struct cfs_rq")->last;
		}
		t_last = container_of_entity(last);
		vrun_last = @cast(last, "struct sched_entity")->vruntime;
		printf("LAST:[%s] vrun:%d\t", task_execname(t_last), vrun_last);
	}
	firstrb = @cast(cfsq, "struct cfs_rq")->rb_leftmost;
	if (firstrb) {
		firstse = entity_to_rbnode(firstrb);
		my_q = @cast(firstse, "struct sched_entity")->my_q;
		if(my_q != 0) {
			firstrb = @cast(my_q, "struct cfs_rq")->rb_leftmost;
			firstse = entity_to_rbnode(firstrb);
		}
		t_first = container_of_entity(firstse);
		vrun_first = @cast(firstse, "struct sched_entity")->vruntime;
		printf("FIRST:[%s] vrun:%d\t", task_execname(t_first), vrun_first);
	}
	if (last && firstrb) {
		printf("delta: %d\n", vrun_last - vrun_first);
	} else {
		printf("delta: N/A\n");
	}
	printf("<------- end pick_next_entity -------\n");
	printf("###################\n");
}

probe kernel.function("pick_next_task_fair").return
{
	if($return != 0) {
		se = &$return->se;
		t_se = container_of_entity(se);
		t_curr = task_current();
		printf("Return task: %s[%d]  From current: %s[%d]\n", task_execname(t_se), task_pid(t_se), task_execname(t_curr), task_pid(t_curr));
	}

	printf("<--------------- end pick_next_task_fair ---------------\n");
	printf("###########################################################\n");
	g_cfs_rq = 0;
}

probe kernel.function("set_last_buddy")
{
	se_se = $se;
	print_task("=== set_last_buddy", se_se, 0, 0);
}

probe kernel.function("__clear_buddies_last")
{
	se_se = $se;
	print_task("=== __clear_buddies_last", se_se, 0, 0);
}

probe kernel.function("check_preempt_wakeup")
{
	printf("--------------- begin check_preempt_wakeup --------------->\n");
	_cfs_rq = &$rq->cfs;
	min_vruntime = @cast(_cfs_rq, "struct cfs_rq")->min_vruntime;
	ok = @cast(_cfs_rq, "struct cfs_rq")->nr_running - sched_nr_latency;
	t_curr = task_current();
	t_se = $p;
	se_curr = container_to_entity(t_curr);
	se_se = container_to_entity(t_se);
	vrun_curr = @cast(se_curr, "struct sched_entity")->vruntime;
	vrun_se = @cast(se_se, "struct sched_entity")->vruntime;

	printf("curr wake:[%s]   woken:[%s]\t", task_execname(t_curr), task_execname(t_se));
	printf("UUUUU curr:%d  se:%d min:%d\t", vrun_curr, vrun_se, min_vruntime);
	printf("VVVVV delta:%d   %d\n", vrun_curr - vrun_se, ok);
}

probe kernel.function("check_preempt_wakeup").return
{
	printf("<--------------- end check_preempt_wakeup ---------------\n");
	printf("###########################################################\n");
}

Redo the experiment, run the script several times, and finally a case was collected. Look at the output below:

.....
--------------- begin check_preempt_wakeup --------------->
curr wake:[producerAproduc]   woken:[consumerAconsum]   UUUUU curr:17886790442766  se:17886787442766 min:20338095223270 VVVVV delta:3000000   1
=== set_last_buddy 0xffff8800367b4a40  producerAproduc 26285
<--------------- end check_preempt_wakeup ---------------
###########################################################
--------------- begin pick_next_task_fair --------------->
------- begin pick_next_entity ------->
LAST:[producerAproduc] vrun:17886790442766  FIRST:[consumerAconsum] vrun:17886787442766 delta: 3000000
<------- end pick_next_entity -------
###################
Return task: consumerAconsum[26274]  From current: producerAproduc[26285]
<--------------- end pick_next_task_fair ---------------
###########################################################
--------------- begin pick_next_task_fair --------------->
------- begin pick_next_entity ------->
#[Note this case!]
#[loop_sleep should have been scheduled, but it is preempted by producerAproduc, which was set as last!!!]
LAST:[producerAproduc] vrun:17886790442766  FIRST:[loop_sleep] vrun:17886790410519  delta: 32247
<------- end pick_next_entity -------
###################
=== __clear_buddies_last 0xffff8800367b4a40  producerAproduc 26285
#[loop_sleep is discarded and producerAproduc is chosen]
Return task: producerAproduc[26285]  From current: consumerAconsum[26274]
<--------------- end pick_next_task_fair ---------------
###########################################################
--------------- begin pick_next_task_fair --------------->
------- begin pick_next_entity ------->
FIRST:[loop_sleep] vrun:17886790410519  delta: N/A
<------- end pick_next_entity -------
###################
Return task: loop_sleep[4227]  From current: producerAproduc[26285]
<--------------- end pick_next_task_fair ---------------
.......

This captures the scene: the loop_sleep process that should have run was preempted by a process previously anchored as last for the sake of CPU cache affinity. This maximizes cache utilization from the CPU's perspective, but it is not reasonable from the business perspective.

Because the effect is tightly coupled to code paths and timestamps, it is hard to construct a scenario that sustains LAST_BUDDY preemption. But finding a case like the one above essentially explains the problem: once many such process couplings overlap and resonate globally, the system-wide stutter scene reappears.

Overall, the LAST_BUDDY feature introduces a strong competitor to the leftmost node of the CFS red-black tree, and the leftmost node may well lose the competition and not get to run. Doesn't this undermine the elegant, simple red-black tree structure that CFS prides itself on for fast process selection?

The red-black tree has been pecked full of holes by all kinds of heuristic features, like a tree worked over by a woodpecker!

Originally, the O(1) scheduler was driven by ever-growing boosting, compensating, and penalizing heuristic tricks; after transitions such as RSDL (the Rotating Staircase Deadline Scheduler), it finally gave way to the more straightforward CFS. Now CFS is getting more complex and bloated; in the end the CFS scheduler itself has become a pile of trade-offs.

Next, let me comment a little.

Perhaps loop_sleep is a process that urgently needs to run, one that could pull the managers out of the fire. An application can ignore the CPU's cache utilization, but it cannot ignore its business logic and the consequences of delayed execution!

Of course Linux, as a general-purpose operating system, never claimed to be a real-time operating system. But even granting all that, a stuttering system is a very bad experience for users! Windows consumes more power and switches threads frequently, yet it feels silky smooth to use. That's the difference.

Look again at the pile of features in /sys/kernel/debug/sched_features. Tuning a good set of parameters is beyond most people, managers included. How can so many features fit every common scenario at once? I don't think even the great experts can manage it, let alone reason about the constraints between all these features.

A set of dynamic priority-boosting scheduling algorithms could solve all of these problems; the scheduling algorithms of Windows systems do exactly this, and they are very good for the user experience.

As for the LAST_BUDDY feature, it was even buggy until the pick-next ordering was corrected in kernel 2.6.39-rc1. Before the fix, the code looked like this:

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	struct sched_entity *se = __pick_next_entity(cfs_rq);
	struct sched_entity *left = se;

	// check next first
	if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
		se = cfs_rq->next;

	/*
	 * Prefer last buddy, try to return the CPU to a preempted task.
	 */
	// check last! Therefore last may override the already-selected next, causing a serious ping-pong effect!
	if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
		se = cfs_rq->last;

	clear_buddies(cfs_rq, se);

	return se;
}

I will laugh and sing.

Zhejiang Wenzhou leather shoes are wet, so you won't get fat when it rains.


Posted on Thu, 23 Apr 2020 18:08:37 -0700 by Lyleyboy