diff mbox series

rcu: Use system_unbound_wq to avoid disturbing isolated CPUs

Message ID 20240723181025.187413-1-longman@redhat.com (mailing list archive)
State Accepted
Commit 04225c21b25698dd73b76a684289ad951cb27e15
Headers show
Series rcu: Use system_unbound_wq to avoid disturbing isolated CPUs | expand

Commit Message

Waiman Long July 23, 2024, 6:10 p.m. UTC
It was discovered that isolated CPUs could sometimes be disturbed by
kworkers processing kfree_rcu() works causing higher than expected
latency. It is because the RCU core uses "system_wq" which doesn't have
the WQ_UNBOUND flag to handle all its work items. Fix this violation of
latency limits by using "system_unbound_wq" in the RCU core instead.
This will ensure that those work items will not be run on CPUs marked
as isolated.

Beside the WQ_UNBOUND flag, the other major difference between system_wq
and system_unbound_wq is their max_active count. The system_unbound_wq
has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.

Reported-by: Vratislav Bendel <vbendel@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-50220
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/rcu/tasks.h | 4 ++--
 kernel/rcu/tree.c  | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

Comments

Uladzislau Rezki July 24, 2024, 10:30 a.m. UTC | #1
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
> 
> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> 
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Closes: https://issues.redhat.com/browse/RHEL-50220
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/rcu/tasks.h | 4 ++--
>  kernel/rcu/tree.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e641cc681901..494aa9513d0b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>  	if (delayed_work_pending(&krcp->monitor_work)) {
>  		delay_left = krcp->monitor_work.timer.expires - jiffies;
>  		if (delay < delay_left)
> -			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  		return;
>  	}
> -	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  }
>  
>  static void
> @@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  			// be that the work is in the pending state when
>  			// channels have been detached following by each
>  			// other.
> -			queue_rcu_work(system_wq, &krwp->rcu_work);
> +			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
>  		}
>  	}
>  
> @@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  			!atomic_xchg(&krcp->work_in_progress, 1)) {
>  		if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -			queue_delayed_work(system_wq,
> +			queue_delayed_work(system_unbound_wq,
>  				&krcp->page_cache_work,
>  					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>  		} else {
> -- 
> 2.43.5
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Thanks!

--
Uladzislau Rezki
Breno Leitao July 24, 2024, 1:30 p.m. UTC | #2
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
> 
> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> 
> Reported-by: Vratislav Bendel <vbendel@redhat.com>

I've seen this problem a while ago and reported to the list:

	https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/

I've just applied this test, and run my workload for 2 hours without
hitting this issue. Thanks for solving it.

Tested-by: Breno Leitao <leitao@debian.org>
Uladzislau Rezki July 24, 2024, 2:23 p.m. UTC | #3
On Wed, Jul 24, 2024 at 06:30:29AM -0700, Breno Leitao wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> > It was discovered that isolated CPUs could sometimes be disturbed by
> > kworkers processing kfree_rcu() works causing higher than expected
> > latency. It is because the RCU core uses "system_wq" which doesn't have
> > the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> > latency limits by using "system_unbound_wq" in the RCU core instead.
> > This will ensure that those work items will not be run on CPUs marked
> > as isolated.
> > 
> > Beside the WQ_UNBOUND flag, the other major difference between system_wq
> > and system_unbound_wq is their max_active count. The system_unbound_wq
> > has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> > is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> > 
> > Reported-by: Vratislav Bendel <vbendel@redhat.com>
> 
> I've seen this problem a while ago and reported to the list:
> 
> 	https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/
> 
> I've just applied this test, and run my workload for 2 hours without
> hitting this issue. Thanks for solving it.
> 
> Tested-by: Breno Leitao <leitao@debian.org>
>
Thank you for testing this! I saw your recent email about that :)

--
Uladzislau Rezki
Neeraj Upadhyay July 25, 2024, 3:35 p.m. UTC | #4
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
> 

Alternative approach here could be, in case we want to keep per CPU worker
pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
is fixing?


- Neeraj

> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> 
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Closes: https://issues.redhat.com/browse/RHEL-50220
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/rcu/tasks.h | 4 ++--
>  kernel/rcu/tree.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e641cc681901..494aa9513d0b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>  	if (delayed_work_pending(&krcp->monitor_work)) {
>  		delay_left = krcp->monitor_work.timer.expires - jiffies;
>  		if (delay < delay_left)
> -			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  		return;
>  	}
> -	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  }
>  
>  static void
> @@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  			// be that the work is in the pending state when
>  			// channels have been detached following by each
>  			// other.
> -			queue_rcu_work(system_wq, &krwp->rcu_work);
> +			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
>  		}
>  	}
>  
> @@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  			!atomic_xchg(&krcp->work_in_progress, 1)) {
>  		if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -			queue_delayed_work(system_wq,
> +			queue_delayed_work(system_unbound_wq,
>  				&krcp->page_cache_work,
>  					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>  		} else {
> -- 
> 2.43.5
>
Waiman Long July 25, 2024, 5:02 p.m. UTC | #5
On 7/25/24 11:35, Neeraj Upadhyay wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>> It was discovered that isolated CPUs could sometimes be disturbed by
>> kworkers processing kfree_rcu() works causing higher than expected
>> latency. It is because the RCU core uses "system_wq" which doesn't have
>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>> latency limits by using "system_unbound_wq" in the RCU core instead.
>> This will ensure that those work items will not be run on CPUs marked
>> as isolated.
>>
> Alternative approach here could be, in case we want to keep per CPU worker
> pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
> WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
> is fixing?

What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or 
what will we lose by using system_unbound_wq? All the calls that are 
modified to use system_unbound_wq are using WORK_CPU_UNBOUND as their 
cpu. IOW, they doesn't care which CPUs are used to run the work items. 
The only downside I can see is the possible loss of some cache locality.

In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. An 
WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a 
WQ_CPU_INTENSIVE workqueue.

Cheers,
Longman
Neeraj Upadhyay July 25, 2024, 7:33 p.m. UTC | #6
On Thu, Jul 25, 2024 at 01:02:01PM -0400, Waiman Long wrote:
> On 7/25/24 11:35, Neeraj Upadhyay wrote:
> > On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> > > It was discovered that isolated CPUs could sometimes be disturbed by
> > > kworkers processing kfree_rcu() works causing higher than expected
> > > latency. It is because the RCU core uses "system_wq" which doesn't have
> > > the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> > > latency limits by using "system_unbound_wq" in the RCU core instead.
> > > This will ensure that those work items will not be run on CPUs marked
> > > as isolated.
> > > 
> > Alternative approach here could be, in case we want to keep per CPU worker
> > pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
> > WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
> > is fixing?
> 
> What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or what
> will we lose by using system_unbound_wq? All the calls that are modified to
> use system_unbound_wq are using WORK_CPU_UNBOUND as their cpu. IOW, they
> doesn't care which CPUs are used to run the work items. The only downside I
> can see is the possible loss of some cache locality.
> 

For the nohz_full case, where unbounded pool workers run only on housekeeping CPU
(cpu0), if multiple other CPUs are queuing work, the execution of those
works could get delayed. However, this should not generally happen as
other CPUs would be mostly running in user mode.


> In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. An
> WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a
> WQ_CPU_INTENSIVE workqueue.

Got it, thanks!

I have picked the patch for further review and testing [1]


[1] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next


- Neeraj

> 
> Cheers,
> Longman
> 
>
Waiman Long July 25, 2024, 7:52 p.m. UTC | #7
On 7/25/24 15:33, Neeraj Upadhyay wrote:
> On Thu, Jul 25, 2024 at 01:02:01PM -0400, Waiman Long wrote:
>> On 7/25/24 11:35, Neeraj Upadhyay wrote:
>>> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>>>> It was discovered that isolated CPUs could sometimes be disturbed by
>>>> kworkers processing kfree_rcu() works causing higher than expected
>>>> latency. It is because the RCU core uses "system_wq" which doesn't have
>>>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>>>> latency limits by using "system_unbound_wq" in the RCU core instead.
>>>> This will ensure that those work items will not be run on CPUs marked
>>>> as isolated.
>>>>
>>> Alternative approach here could be, in case we want to keep per CPU worker
>>> pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
>>> WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
>>> is fixing?
>> What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or what
>> will we lose by using system_unbound_wq? All the calls that are modified to
>> use system_unbound_wq are using WORK_CPU_UNBOUND as their cpu. IOW, they
>> doesn't care which CPUs are used to run the work items. The only downside I
>> can see is the possible loss of some cache locality.
>>
> For the nohz_full case, where unbounded pool workers run only on housekeeping CPU
> (cpu0), if multiple other CPUs are queuing work, the execution of those
> works could get delayed. However, this should not generally happen as
> other CPUs would be mostly running in user mode.
Well, it there is only one housekeeping CPU, a lot of background kernel 
tasks will be slowed down. Users should be careful about the proper 
balance between the number of housekeeping and nohz-full CPUs.
>
>
>> In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. An
>> WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a
>> WQ_CPU_INTENSIVE workqueue.
> Got it, thanks!
>
> I have picked the patch for further review and testing [1]
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next

Thanks, let me know if you see any problem.

Cheers,
Longman
Waiman Long July 29, 2024, 3:06 a.m. UTC | #8
On 7/24/24 09:30, Breno Leitao wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>> It was discovered that isolated CPUs could sometimes be disturbed by
>> kworkers processing kfree_rcu() works causing higher than expected
>> latency. It is because the RCU core uses "system_wq" which doesn't have
>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>> latency limits by using "system_unbound_wq" in the RCU core instead.
>> This will ensure that those work items will not be run on CPUs marked
>> as isolated.
>>
>> Beside the WQ_UNBOUND flag, the other major difference between system_wq
>> and system_unbound_wq is their max_active count. The system_unbound_wq
>> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
>> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
>>
>> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> I've seen this problem a while ago and reported to the list:
>
> 	https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/
>
> I've just applied this test, and run my workload for 2 hours without
> hitting this issue. Thanks for solving it.
>
> Tested-by: Breno Leitao <leitao@debian.org>

Thank for testing this patch. So it is just us that saw this problem.

Cheers,
Longman
diff mbox series

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e641cc681901..494aa9513d0b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3539,10 +3539,10 @@  schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 	if (delayed_work_pending(&krcp->monitor_work)) {
 		delay_left = krcp->monitor_work.timer.expires - jiffies;
 		if (delay < delay_left)
-			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
+			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
 		return;
 	}
-	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
+	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
 }
 
 static void
@@ -3634,7 +3634,7 @@  static void kfree_rcu_monitor(struct work_struct *work)
 			// be that the work is in the pending state when
 			// channels have been detached following by each
 			// other.
-			queue_rcu_work(system_wq, &krwp->rcu_work);
+			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
 		}
 	}
 
@@ -3704,7 +3704,7 @@  run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
 			!atomic_xchg(&krcp->work_in_progress, 1)) {
 		if (atomic_read(&krcp->backoff_page_cache_fill)) {
-			queue_delayed_work(system_wq,
+			queue_delayed_work(system_unbound_wq,
 				&krcp->page_cache_work,
 					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
 		} else {