Message ID | 20221013184028.129486-4-leobras@redhat.com |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Series | CPU isolation improvements |

Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
Hi Leonardo,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on herbert-cryptodev-2.6/master]
[also build test WARNING on herbert-crypto-2.6/master helgaas-pci/next tip/sched/core tj-wq/for-next linus/master v6.0 next-20221013]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Leonardo-Bras/CPU-isolation-improvements/20221014-024707
base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: x86_64-rhel-8.3-kunit
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/7df1006c37ca5353fda63cd01ebf5bf3a2ef6562
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Leonardo-Bras/CPU-isolation-improvements/20221014-024707
git checkout 7df1006c37ca5353fda63cd01ebf5bf3a2ef6562
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/pci/
If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
drivers/pci/pci-driver.c: In function 'pci_call_probe':
>> drivers/pci/pci-driver.c:382:1: warning: label 'out' defined but not used [-Wunused-label]
382 | out:
| ^~~
vim +/out +382 drivers/pci/pci-driver.c
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 349
d42c69972b853fd Andi Kleen 2005-07-06 350 static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
d42c69972b853fd Andi Kleen 2005-07-06 351 const struct pci_device_id *id)
d42c69972b853fd Andi Kleen 2005-07-06 352 {
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 353 int error, node, cpu;
873392ca514f87e Rusty Russell 2008-12-31 354 struct drv_dev_and_id ddi = { drv, dev, id };
873392ca514f87e Rusty Russell 2008-12-31 355
12c3156f10c5d8c Alexander Duyck 2013-11-18 356 /*
12c3156f10c5d8c Alexander Duyck 2013-11-18 357 * Execute driver initialization on node where the device is
12c3156f10c5d8c Alexander Duyck 2013-11-18 358 * attached. This way the driver likely allocates its local memory
12c3156f10c5d8c Alexander Duyck 2013-11-18 359 * on the right node.
12c3156f10c5d8c Alexander Duyck 2013-11-18 360 */
873392ca514f87e Rusty Russell 2008-12-31 361 node = dev_to_node(&dev->dev);
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 362 dev->is_probed = 1;
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 363
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 364 cpu_hotplug_disable();
12c3156f10c5d8c Alexander Duyck 2013-11-18 365
12c3156f10c5d8c Alexander Duyck 2013-11-18 366 /*
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 367 * Prevent nesting work_on_cpu() for the case where a Virtual Function
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 368 * device is probed from work_on_cpu() of the Physical device.
12c3156f10c5d8c Alexander Duyck 2013-11-18 369 */
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 370 if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
9d42ea0d6984044 Frederic Weisbecker 2022-02-07 371 pci_physfn_is_probed(dev)) {
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 372 cpu = nr_cpu_ids;
9d42ea0d6984044 Frederic Weisbecker 2022-02-07 373 } else {
69a18b18699b596 Alex Belits 2020-06-25 374 cpu = cpumask_any_and(cpumask_of_node(node),
7df1006c37ca535 Leonardo Bras 2022-10-13 375 housekeeping_cpumask(HK_TYPE_WQ));
9d42ea0d6984044 Frederic Weisbecker 2022-02-07 376 }
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 377
873392ca514f87e Rusty Russell 2008-12-31 378 if (cpu < nr_cpu_ids)
873392ca514f87e Rusty Russell 2008-12-31 379 error = work_on_cpu(cpu, local_pci_probe, &ddi);
873392ca514f87e Rusty Russell 2008-12-31 380 else
873392ca514f87e Rusty Russell 2008-12-31 381 error = local_pci_probe(&ddi);
9d42ea0d6984044 Frederic Weisbecker 2022-02-07 @382 out:
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 383 dev->is_probed = 0;
0b2c2a71e6f07fb Thomas Gleixner 2017-05-24 384 cpu_hotplug_enable();
d42c69972b853fd Andi Kleen 2005-07-06 385 return error;
d42c69972b853fd Andi Kleen 2005-07-06 386 }
d42c69972b853fd Andi Kleen 2005-07-06 387
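The warning is triggered because this patch removes the only 'goto out' in pci_call_probe() (the -ENOMEM path for the temporary cpumask) while leaving the 'out:' label in place. Below is a minimal sketch of the function tail with the likely fix, assuming the label is simply dropped along with the goto; this is an illustration, not a posted follow-up:

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);

	/* the 'out:' label can go away: the only 'goto out' was in the
	 * -ENOMEM path removed by this patch */
	dev->is_probed = 0;
	cpu_hotplug_enable();
	return error;
}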
+ Frederic; who actually does most of this code On Thu, Oct 13, 2022 at 03:40:28PM -0300, Leonardo Bras wrote: > Housekeeping code keeps multiple cpumasks in order to keep track of which > cpus can perform given housekeeping category. > > Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu > WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that > the Domain isolation also ends up isolating work queues. > > Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ > makes it simpler to check if a cpu can run a task into an work queue, since > code just need to go through a single HK_TYPE_* cpumask. > > Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and > remove a lot of cpumask_and calls. > > Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we > are sure that 'flags == 0' here. > > Signed-off-by: Leonardo Bras <leobras@redhat.com> I've long maintained that having all these separate masks is daft; Frederic do we really need that? > --- > drivers/pci/pci-driver.c | 13 +------------ > kernel/sched/isolation.c | 4 ++-- > kernel/workqueue.c | 1 - > net/core/net-sysfs.c | 1 - > 4 files changed, 3 insertions(+), 16 deletions(-) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 107d77f3c8467..550bef2504b8d 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -371,19 +371,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, > pci_physfn_is_probed(dev)) { > cpu = nr_cpu_ids; > } else { > - cpumask_var_t wq_domain_mask; > - > - if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) { > - error = -ENOMEM; > - goto out; > - } > - cpumask_and(wq_domain_mask, > - housekeeping_cpumask(HK_TYPE_WQ), > - housekeeping_cpumask(HK_TYPE_DOMAIN)); > - > cpu = cpumask_any_and(cpumask_of_node(node), > - wq_domain_mask); > - free_cpumask_var(wq_domain_mask); > + housekeeping_cpumask(HK_TYPE_WQ)); > } > > if (cpu < nr_cpu_ids) > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c > index 373d42c707bc5..ced4b78564810 100644 > --- a/kernel/sched/isolation.c > +++ b/kernel/sched/isolation.c > @@ -204,7 +204,7 @@ static int __init housekeeping_isolcpus_setup(char *str) > > if (!strncmp(str, "domain,", 7)) { > str += 7; > - flags |= HK_FLAG_DOMAIN; > + flags |= HK_FLAG_DOMAIN | HK_FLAG_WQ; > continue; > } > > @@ -234,7 +234,7 @@ static int __init housekeeping_isolcpus_setup(char *str) > > /* Default behaviour for isolcpus without flags */ > if (!flags) > - flags |= HK_FLAG_DOMAIN; > + flags = HK_FLAG_DOMAIN | HK_FLAG_WQ; > > return housekeeping_setup(str, flags); > } > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 7cd5f5e7e0a1b..b557daa571f17 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -6004,7 +6004,6 @@ void __init workqueue_init_early(void) > > BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL)); > cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ)); > - cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN)); > > pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC); > > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c > index 8409d41405dfe..7b6fb62a118ab 100644 > --- a/net/core/net-sysfs.c > +++ b/net/core/net-sysfs.c > @@ -852,7 +852,6 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue, > } > > if (!cpumask_empty(mask)) { > - cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); > cpumask_and(mask, mask, 
housekeeping_cpumask(HK_TYPE_WQ)); > if (cpumask_empty(mask)) { > free_cpumask_var(mask); > -- > 2.38.0 >
On Fri, Oct 14, 2022 at 10:36:19AM +0200, Peter Zijlstra wrote: > > + Frederic; who actually does most of this code > > On Thu, Oct 13, 2022 at 03:40:28PM -0300, Leonardo Bras wrote: > > Housekeeping code keeps multiple cpumasks in order to keep track of which > > cpus can perform given housekeeping category. > > > > Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu > > WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that > > the Domain isolation also ends up isolating work queues. > > > > Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ > > makes it simpler to check if a cpu can run a task into an work queue, since > > code just need to go through a single HK_TYPE_* cpumask. > > > > Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and > > remove a lot of cpumask_and calls. > > > > Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we > > are sure that 'flags == 0' here. > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > I've long maintained that having all these separate masks is daft; > Frederic do we really need that? Indeed. In my queue for the cpuset interface to nohz_full, I have the following patch (but note DOMAIN and WQ have to stay seperate flags because workqueue affinity can be modified seperately from isolcpus) --- From: Frederic Weisbecker <frederic@kernel.org> Date: Tue, 26 Jul 2022 17:03:30 +0200 Subject: [PATCH] sched/isolation: Gather nohz_full related isolation features into common flag Signed-off-by: Frederic Weisbecker <frederic@kernel.org> --- arch/x86/kvm/x86.c | 2 +- drivers/pci/pci-driver.c | 2 +- include/linux/sched/isolation.h | 7 +------ kernel/cpu.c | 4 ++-- kernel/kthread.c | 4 ++-- kernel/rcu/tasks.h | 2 +- kernel/rcu/tree_plugin.h | 6 +++--- kernel/sched/core.c | 10 +++++----- kernel/sched/fair.c | 6 +++--- kernel/sched/isolation.c | 25 +++++++------------------ kernel/watchdog.c | 2 +- kernel/workqueue.c | 2 +- net/core/net-sysfs.c | 2 +- 13 files changed, 29 insertions(+), 45 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1910e1e78b15..d0b73fcf4a1c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9009,7 +9009,7 @@ int kvm_arch_init(void *opaque) } if (pi_inject_timer == -1) - pi_inject_timer = housekeeping_enabled(HK_TYPE_TIMER); + pi_inject_timer = housekeeping_enabled(HK_TYPE_NOHZ_FULL); #ifdef CONFIG_X86_64 pvclock_gtod_register_notifier(&pvclock_gtod_notifier); diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 49238ddd39ee..af3494a39921 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -378,7 +378,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, goto out; } cpumask_and(wq_domain_mask, - housekeeping_cpumask(HK_TYPE_WQ), + housekeeping_cpumask(HK_TYPE_NOHZ_FULL), housekeeping_cpumask(HK_TYPE_DOMAIN)); cpu = cpumask_any_and(cpumask_of_node(node), diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h index 8c15abd67aed..7ca34e04abe7 100644 --- a/include/linux/sched/isolation.h +++ b/include/linux/sched/isolation.h @@ -6,15 +6,10 @@ #include <linux/tick.h> enum hk_type { - HK_TYPE_TIMER, - HK_TYPE_RCU, - HK_TYPE_MISC, + HK_TYPE_NOHZ_FULL, HK_TYPE_SCHED, - HK_TYPE_TICK, HK_TYPE_DOMAIN, - HK_TYPE_WQ, HK_TYPE_MANAGED_IRQ, - HK_TYPE_KTHREAD, HK_TYPE_MAX }; diff --git a/kernel/cpu.c b/kernel/cpu.c index bbad5e375d3b..573f14d75a2e 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1500,8 +1500,8 @@ int 
freeze_secondary_cpus(int primary) cpu_maps_update_begin(); if (primary == -1) { primary = cpumask_first(cpu_online_mask); - if (!housekeeping_cpu(primary, HK_TYPE_TIMER)) - primary = housekeeping_any_cpu(HK_TYPE_TIMER); + if (!housekeeping_cpu(primary, HK_TYPE_NOHZ_FULL)) + primary = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL); } else { if (!cpu_online(primary)) primary = cpumask_first(cpu_online_mask); diff --git a/kernel/kthread.c b/kernel/kthread.c index 544fd4097406..0719035feba0 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -355,7 +355,7 @@ static int kthread(void *_create) * back to default in case they have been changed. */ sched_setscheduler_nocheck(current, SCHED_NORMAL, ¶m); - set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD)); + set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); /* OK, tell user we're spawned, wait for stop or wakeup */ __set_current_state(TASK_UNINTERRUPTIBLE); @@ -721,7 +721,7 @@ int kthreadd(void *unused) /* Setup a clean context for our children to inherit. */ set_task_comm(tsk, "kthreadd"); ignore_signals(tsk); - set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_KTHREAD)); + set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); set_mems_allowed(node_states[N_MEMORY]); current->flags |= PF_NOFREEZE; diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index f5bf6fb430da..b99f79625b26 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -537,7 +537,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) struct rcu_tasks *rtp = arg; /* Run on housekeeping CPUs by default. Sysadm can move if desired. */ - housekeeping_affine(current, HK_TYPE_RCU); + housekeeping_affine(current, HK_TYPE_NOHZ_FULL); WRITE_ONCE(rtp->kthread_ptr, current); // Let GPs start! 
/* diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index b2219577fbe2..4935b06c3caf 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1237,9 +1237,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) if ((mask & leaf_node_cpu_bit(rnp, cpu)) && cpu != outgoingcpu) cpumask_set_cpu(cpu, cm); - cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU)); + cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); if (cpumask_empty(cm)) - cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU)); + cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); set_cpus_allowed_ptr(t, cm); mutex_unlock(&rnp->boost_kthread_mutex); free_cpumask_var(cm); @@ -1294,5 +1294,5 @@ static void rcu_bind_gp_kthread(void) { if (!tick_nohz_full_enabled()) return; - housekeeping_affine(current, HK_TYPE_RCU); + housekeeping_affine(current, HK_TYPE_NOHZ_FULL); } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f53c0096860b..5ff205f39197 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1079,13 +1079,13 @@ int get_nohz_timer_target(void) struct sched_domain *sd; const struct cpumask *hk_mask; - if (housekeeping_cpu(cpu, HK_TYPE_TIMER)) { + if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) { if (!idle_cpu(cpu)) return cpu; default_cpu = cpu; } - hk_mask = housekeeping_cpumask(HK_TYPE_TIMER); + hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL); rcu_read_lock(); for_each_domain(cpu, sd) { @@ -1101,7 +1101,7 @@ int get_nohz_timer_target(void) } if (default_cpu == -1) - default_cpu = housekeeping_any_cpu(HK_TYPE_TIMER); + default_cpu = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL); cpu = default_cpu; unlock: rcu_read_unlock(); @@ -5562,7 +5562,7 @@ static void sched_tick_start(int cpu) int os; struct tick_work *twork; - if (housekeeping_cpu(cpu, HK_TYPE_TICK)) + if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) return; WARN_ON_ONCE(!tick_work_cpu); @@ -5583,7 +5583,7 @@ static void sched_tick_stop(int cpu) struct tick_work *twork; int os; - if (housekeeping_cpu(cpu, HK_TYPE_TICK)) + if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) return; WARN_ON_ONCE(!tick_work_cpu); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 77b2048a9326..ac3b33e00451 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10375,7 +10375,7 @@ static inline int on_null_domain(struct rq *rq) * - When one of the busy CPUs notice that there may be an idle rebalancing * needed, they will kick the idle load balancer, which then does idle * load balancing for all the idle CPUs. - * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED not set + * - HK_TYPE_NOHZ_FULL CPUs are used for this task, because HK_TYPE_SCHED not set * anywhere yet. */ @@ -10384,7 +10384,7 @@ static inline int find_new_ilb(void) int ilb; const struct cpumask *hk_mask; - hk_mask = housekeeping_cpumask(HK_TYPE_MISC); + hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL); for_each_cpu_and(ilb, nohz.idle_cpus_mask, hk_mask) { @@ -10400,7 +10400,7 @@ static inline int find_new_ilb(void) /* * Kick a CPU to do the nohz balancing, if it is time for it. We pick any - * idle CPU in the HK_TYPE_MISC housekeeping set (if there is one). + * idle CPU in the HK_TYPE_NOHZ_FULL housekeeping set (if there is one). 
*/ static void kick_ilb(unsigned int flags) { diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c index 4087718ee5b4..443f1ce83e32 100644 --- a/kernel/sched/isolation.c +++ b/kernel/sched/isolation.c @@ -4,20 +4,15 @@ * any CPU: unbound workqueues, timers, kthreads and any offloadable work. * * Copyright (C) 2017 Red Hat, Inc., Frederic Weisbecker - * Copyright (C) 2017-2018 SUSE, Frederic Weisbecker + * Copyright (C) 2017-2022 SUSE, Frederic Weisbecker * */ enum hk_flags { - HK_FLAG_TIMER = BIT(HK_TYPE_TIMER), - HK_FLAG_RCU = BIT(HK_TYPE_RCU), - HK_FLAG_MISC = BIT(HK_TYPE_MISC), + HK_FLAG_NOHZ_FULL = BIT(HK_TYPE_NOHZ_FULL), HK_FLAG_SCHED = BIT(HK_TYPE_SCHED), - HK_FLAG_TICK = BIT(HK_TYPE_TICK), HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN), - HK_FLAG_WQ = BIT(HK_TYPE_WQ), HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ), - HK_FLAG_KTHREAD = BIT(HK_TYPE_KTHREAD), }; DEFINE_STATIC_KEY_FALSE(housekeeping_overridden); @@ -88,7 +83,7 @@ void __init housekeeping_init(void) static_branch_enable(&housekeeping_overridden); - if (housekeeping.flags & HK_FLAG_TICK) + if (housekeeping.flags & HK_FLAG_NOHZ_FULL) sched_tick_offload_init(); for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) { @@ -111,7 +106,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags) cpumask_var_t non_housekeeping_mask, housekeeping_staging; int err = 0; - if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) { + if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL)) { if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) { pr_warn("Housekeeping: nohz unsupported." " Build with CONFIG_NO_HZ_FULL\n"); @@ -163,7 +158,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags) housekeeping_setup_type(type, housekeeping_staging); } - if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) + if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL)) tick_nohz_full_setup(non_housekeeping_mask); housekeeping.flags |= flags; @@ -179,12 +174,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags) static int __init housekeeping_nohz_full_setup(char *str) { - unsigned long flags; - - flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU | - HK_FLAG_MISC | HK_FLAG_KTHREAD; - - return housekeeping_setup(str, flags); + return housekeeping_setup(str, HK_FLAG_NOHZ_FULL); } __setup("nohz_full=", housekeeping_nohz_full_setup); @@ -198,8 +188,7 @@ static int __init housekeeping_isolcpus_setup(char *str) while (isalpha(*str)) { if (!strncmp(str, "nohz,", 5)) { str += 5; - flags |= HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | - HK_FLAG_RCU | HK_FLAG_MISC | HK_FLAG_KTHREAD; + flags |= HK_FLAG_NOHZ_FULL; continue; } diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 20a7a55e62b6..3e9636f4bac6 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -852,7 +852,7 @@ void __init lockup_detector_init(void) pr_info("Disabling watchdog on nohz_full cores by default\n"); cpumask_copy(&watchdog_cpumask, - housekeeping_cpumask(HK_TYPE_TIMER)); + housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); if (!watchdog_nmi_probe()) nmi_watchdog_available = true; diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 1ea50f6be843..3eb283d76d81 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -5993,7 +5993,7 @@ void __init workqueue_init_early(void) BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long)); BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL)); - cpumask_copy(wq_unbound_cpumask, 
housekeeping_cpumask(HK_TYPE_WQ)); + cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN)); pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index e319e242dddf..6dddf359b754 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -852,7 +852,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue, if (!cpumask_empty(mask)) { cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); - cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ)); + cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); if (cpumask_empty(mask)) { free_cpumask_var(mask); return -EINVAL;
On Fri, 2022-10-14 at 15:24 +0200, Frederic Weisbecker wrote: > On Fri, Oct 14, 2022 at 10:36:19AM +0200, Peter Zijlstra wrote: > > > > + Frederic; who actually does most of this code > > > > On Thu, Oct 13, 2022 at 03:40:28PM -0300, Leonardo Bras wrote: > > > Housekeeping code keeps multiple cpumasks in order to keep track of which > > > cpus can perform given housekeeping category. > > > > > > Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu > > > WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that > > > the Domain isolation also ends up isolating work queues. > > > > > > Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ > > > makes it simpler to check if a cpu can run a task into an work queue, since > > > code just need to go through a single HK_TYPE_* cpumask. > > > > > > Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and > > > remove a lot of cpumask_and calls. > > > > > > Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we > > > are sure that 'flags == 0' here. > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > > > I've long maintained that having all these separate masks is daft; > > Frederic do we really need that? > > Indeed. In my queue for the cpuset interface to nohz_full, I have the following > patch (but note DOMAIN and WQ have to stay seperate flags because workqueue > affinity can be modified seperately from isolcpus) > > --- > From: Frederic Weisbecker <frederic@kernel.org> > Date: Tue, 26 Jul 2022 17:03:30 +0200 > Subject: [PATCH] sched/isolation: Gather nohz_full related isolation features > into common flag > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > --- > arch/x86/kvm/x86.c | 2 +- > drivers/pci/pci-driver.c | 2 +- > include/linux/sched/isolation.h | 7 +------ > kernel/cpu.c | 4 ++-- > kernel/kthread.c | 4 ++-- > kernel/rcu/tasks.h | 2 +- > kernel/rcu/tree_plugin.h | 6 +++--- > kernel/sched/core.c | 10 +++++----- > kernel/sched/fair.c | 6 +++--- > kernel/sched/isolation.c | 25 +++++++------------------ > kernel/watchdog.c | 2 +- > kernel/workqueue.c | 2 +- > net/core/net-sysfs.c | 2 +- > 13 files changed, 29 insertions(+), 45 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 1910e1e78b15..d0b73fcf4a1c 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -9009,7 +9009,7 @@ int kvm_arch_init(void *opaque) > } > > if (pi_inject_timer == -1) > - pi_inject_timer = housekeeping_enabled(HK_TYPE_TIMER); > + pi_inject_timer = housekeeping_enabled(HK_TYPE_NOHZ_FULL); > #ifdef CONFIG_X86_64 > pvclock_gtod_register_notifier(&pvclock_gtod_notifier); > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 49238ddd39ee..af3494a39921 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -378,7 +378,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, > goto out; > } > cpumask_and(wq_domain_mask, > - housekeeping_cpumask(HK_TYPE_WQ), > + housekeeping_cpumask(HK_TYPE_NOHZ_FULL), > housekeeping_cpumask(HK_TYPE_DOMAIN)); > > cpu = cpumask_any_and(cpumask_of_node(node), > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h > index 8c15abd67aed..7ca34e04abe7 100644 > --- a/include/linux/sched/isolation.h > +++ b/include/linux/sched/isolation.h > @@ -6,15 +6,10 @@ > #include <linux/tick.h> > > enum hk_type { > - HK_TYPE_TIMER, > - HK_TYPE_RCU, > - HK_TYPE_MISC, > + HK_TYPE_NOHZ_FULL, > HK_TYPE_SCHED, > - 
HK_TYPE_TICK, > HK_TYPE_DOMAIN, > - HK_TYPE_WQ, > HK_TYPE_MANAGED_IRQ, > - HK_TYPE_KTHREAD, > HK_TYPE_MAX > }; > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index bbad5e375d3b..573f14d75a2e 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -1500,8 +1500,8 @@ int freeze_secondary_cpus(int primary) > cpu_maps_update_begin(); > if (primary == -1) { > primary = cpumask_first(cpu_online_mask); > - if (!housekeeping_cpu(primary, HK_TYPE_TIMER)) > - primary = housekeeping_any_cpu(HK_TYPE_TIMER); > + if (!housekeeping_cpu(primary, HK_TYPE_NOHZ_FULL)) > + primary = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL); > } else { > if (!cpu_online(primary)) > primary = cpumask_first(cpu_online_mask); > diff --git a/kernel/kthread.c b/kernel/kthread.c > index 544fd4097406..0719035feba0 100644 > --- a/kernel/kthread.c > +++ b/kernel/kthread.c > @@ -355,7 +355,7 @@ static int kthread(void *_create) > * back to default in case they have been changed. > */ > sched_setscheduler_nocheck(current, SCHED_NORMAL, ¶m); > - set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD)); > + set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); > > /* OK, tell user we're spawned, wait for stop or wakeup */ > __set_current_state(TASK_UNINTERRUPTIBLE); > @@ -721,7 +721,7 @@ int kthreadd(void *unused) > /* Setup a clean context for our children to inherit. */ > set_task_comm(tsk, "kthreadd"); > ignore_signals(tsk); > - set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_KTHREAD)); > + set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); > set_mems_allowed(node_states[N_MEMORY]); > > current->flags |= PF_NOFREEZE; > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > index f5bf6fb430da..b99f79625b26 100644 > --- a/kernel/rcu/tasks.h > +++ b/kernel/rcu/tasks.h > @@ -537,7 +537,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) > struct rcu_tasks *rtp = arg; > > /* Run on housekeeping CPUs by default. Sysadm can move if desired. */ > - housekeeping_affine(current, HK_TYPE_RCU); > + housekeeping_affine(current, HK_TYPE_NOHZ_FULL); > WRITE_ONCE(rtp->kthread_ptr, current); // Let GPs start! 
> > /* > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > index b2219577fbe2..4935b06c3caf 100644 > --- a/kernel/rcu/tree_plugin.h > +++ b/kernel/rcu/tree_plugin.h > @@ -1237,9 +1237,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) > if ((mask & leaf_node_cpu_bit(rnp, cpu)) && > cpu != outgoingcpu) > cpumask_set_cpu(cpu, cm); > - cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU)); > + cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); > if (cpumask_empty(cm)) > - cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU)); > + cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); > set_cpus_allowed_ptr(t, cm); > mutex_unlock(&rnp->boost_kthread_mutex); > free_cpumask_var(cm); > @@ -1294,5 +1294,5 @@ static void rcu_bind_gp_kthread(void) > { > if (!tick_nohz_full_enabled()) > return; > - housekeeping_affine(current, HK_TYPE_RCU); > + housekeeping_affine(current, HK_TYPE_NOHZ_FULL); > } > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index f53c0096860b..5ff205f39197 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1079,13 +1079,13 @@ int get_nohz_timer_target(void) > struct sched_domain *sd; > const struct cpumask *hk_mask; > > - if (housekeeping_cpu(cpu, HK_TYPE_TIMER)) { > + if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) { > if (!idle_cpu(cpu)) > return cpu; > default_cpu = cpu; > } > > - hk_mask = housekeeping_cpumask(HK_TYPE_TIMER); > + hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL); > > rcu_read_lock(); > for_each_domain(cpu, sd) { > @@ -1101,7 +1101,7 @@ int get_nohz_timer_target(void) > } > > if (default_cpu == -1) > - default_cpu = housekeeping_any_cpu(HK_TYPE_TIMER); > + default_cpu = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL); > cpu = default_cpu; > unlock: > rcu_read_unlock(); > @@ -5562,7 +5562,7 @@ static void sched_tick_start(int cpu) > int os; > struct tick_work *twork; > > - if (housekeeping_cpu(cpu, HK_TYPE_TICK)) > + if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) > return; > > WARN_ON_ONCE(!tick_work_cpu); > @@ -5583,7 +5583,7 @@ static void sched_tick_stop(int cpu) > struct tick_work *twork; > int os; > > - if (housekeeping_cpu(cpu, HK_TYPE_TICK)) > + if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) > return; > > WARN_ON_ONCE(!tick_work_cpu); > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 77b2048a9326..ac3b33e00451 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -10375,7 +10375,7 @@ static inline int on_null_domain(struct rq *rq) > * - When one of the busy CPUs notice that there may be an idle rebalancing > * needed, they will kick the idle load balancer, which then does idle > * load balancing for all the idle CPUs. > - * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED not set > + * - HK_TYPE_NOHZ_FULL CPUs are used for this task, because HK_TYPE_SCHED not set > * anywhere yet. > */ > > @@ -10384,7 +10384,7 @@ static inline int find_new_ilb(void) > int ilb; > const struct cpumask *hk_mask; > > - hk_mask = housekeeping_cpumask(HK_TYPE_MISC); > + hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL); > > for_each_cpu_and(ilb, nohz.idle_cpus_mask, hk_mask) { > > @@ -10400,7 +10400,7 @@ static inline int find_new_ilb(void) > > /* > * Kick a CPU to do the nohz balancing, if it is time for it. We pick any > - * idle CPU in the HK_TYPE_MISC housekeeping set (if there is one). > + * idle CPU in the HK_TYPE_NOHZ_FULL housekeeping set (if there is one). 
> */ > static void kick_ilb(unsigned int flags) > { > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c > index 4087718ee5b4..443f1ce83e32 100644 > --- a/kernel/sched/isolation.c > +++ b/kernel/sched/isolation.c > @@ -4,20 +4,15 @@ > * any CPU: unbound workqueues, timers, kthreads and any offloadable work. > * > * Copyright (C) 2017 Red Hat, Inc., Frederic Weisbecker > - * Copyright (C) 2017-2018 SUSE, Frederic Weisbecker > + * Copyright (C) 2017-2022 SUSE, Frederic Weisbecker > * > */ > > enum hk_flags { > - HK_FLAG_TIMER = BIT(HK_TYPE_TIMER), > - HK_FLAG_RCU = BIT(HK_TYPE_RCU), > - HK_FLAG_MISC = BIT(HK_TYPE_MISC), > + HK_FLAG_NOHZ_FULL = BIT(HK_TYPE_NOHZ_FULL), > HK_FLAG_SCHED = BIT(HK_TYPE_SCHED), > - HK_FLAG_TICK = BIT(HK_TYPE_TICK), > HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN), > - HK_FLAG_WQ = BIT(HK_TYPE_WQ), > HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ), > - HK_FLAG_KTHREAD = BIT(HK_TYPE_KTHREAD), > }; > > DEFINE_STATIC_KEY_FALSE(housekeeping_overridden); > @@ -88,7 +83,7 @@ void __init housekeeping_init(void) > > static_branch_enable(&housekeeping_overridden); > > - if (housekeeping.flags & HK_FLAG_TICK) > + if (housekeeping.flags & HK_FLAG_NOHZ_FULL) > sched_tick_offload_init(); > > for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) { > @@ -111,7 +106,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags) > cpumask_var_t non_housekeeping_mask, housekeeping_staging; > int err = 0; > > - if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) { > + if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL)) { > if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) { > pr_warn("Housekeeping: nohz unsupported." > " Build with CONFIG_NO_HZ_FULL\n"); > @@ -163,7 +158,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags) > housekeeping_setup_type(type, housekeeping_staging); > } > > - if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) > + if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL)) > tick_nohz_full_setup(non_housekeeping_mask); > > housekeeping.flags |= flags; > @@ -179,12 +174,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags) > > static int __init housekeeping_nohz_full_setup(char *str) > { > - unsigned long flags; > - > - flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU | > - HK_FLAG_MISC | HK_FLAG_KTHREAD; > - > - return housekeeping_setup(str, flags); > + return housekeeping_setup(str, HK_FLAG_NOHZ_FULL); > } > __setup("nohz_full=", housekeeping_nohz_full_setup); > > @@ -198,8 +188,7 @@ static int __init housekeeping_isolcpus_setup(char *str) > while (isalpha(*str)) { > if (!strncmp(str, "nohz,", 5)) { > str += 5; > - flags |= HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | > - HK_FLAG_RCU | HK_FLAG_MISC | HK_FLAG_KTHREAD; > + flags |= HK_FLAG_NOHZ_FULL; > continue; > } > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 20a7a55e62b6..3e9636f4bac6 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -852,7 +852,7 @@ void __init lockup_detector_init(void) > pr_info("Disabling watchdog on nohz_full cores by default\n"); > > cpumask_copy(&watchdog_cpumask, > - housekeeping_cpumask(HK_TYPE_TIMER)); > + housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); > > if (!watchdog_nmi_probe()) > nmi_watchdog_available = true; > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 1ea50f6be843..3eb283d76d81 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -5993,7 +5993,7 @@ void __init 
workqueue_init_early(void) > BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long)); > > BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL)); > - cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ)); > + cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); > cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN)); > > pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC); > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c > index e319e242dddf..6dddf359b754 100644 > --- a/net/core/net-sysfs.c > +++ b/net/core/net-sysfs.c > @@ -852,7 +852,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue, > > if (!cpumask_empty(mask)) { > cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); > - cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ)); > + cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL)); > if (cpumask_empty(mask)) { > free_cpumask_var(mask); > return -EINVAL; Hello Frederic, So, IIUC you are removing all flags composing nohz_full= parameter in favor of a unified NOHZ_FULL flag. I am very new to the code, and I am probably missing the whole picture, but I actually think it's a good approach to keep them split for a couple reasons: 1 - They are easier to understand in code (IMHO): "This cpu should not do this, because it's not able to do WQ housekeeping" looks better than "because it's not in DOMAIN or NOHZ_FULL housekeeping" 2 - They are simpler for using: Suppose we have this function that should run at a WQ, but we want to keep them out of the isolated cpus. If we have the unified flags, we need to combine both DOMAIN and NOHZ_FULL bitmasks, and then combine it again with something like cpu_online_mask. It usually means allocating a new cpumask_t, and also freeing it afterwards. If we have a single WQ flag, we can avoid the allocation altogether by using for_each_cpu_and(), making the code much simpler. 3 - It makes easier to compose new isolation modes: In case the future requires a new isolation mode that also uses the types of isolation we currently have implemented, it would be much easier to just compose it with the current HK flags, instead of having to go through all usages and do a cpumask_and() there. Also, new isolation modes would make (2) worse. What I am able see as a pro in "unified flag" side, is that getting all the multiple flags doing their jobs should be a lot of trouble. While I do understand you are much more experienced than me on that, and that your decision is more likely to be better, I am unable to see the arguments that helped you reach it. Could you please point them, so I can better understand the decision? Best regards, Leo
On Fri, Oct 14, 2022 at 01:27:25PM -0300, Leonardo Brás wrote:
> Hello Frederic,
>
> So, IIUC you are removing all flags composing nohz_full= parameter in favor of a
> unified NOHZ_FULL flag.
>
> I am very new to the code, and I am probably missing the whole picture, but I
> actually think it's a good approach to keep them split for a couple reasons:
> 1 - They are easier to understand in code (IMHO):
> "This cpu should not do this, because it's not able to do WQ housekeeping" looks
> better than "because it's not in DOMAIN or NOHZ_FULL housekeeping"

A comment above each site may solve that.

>
> 2 - They are simpler for using:
> Suppose we have this function that should run at a WQ, but we want to keep them
> out of the isolated cpus. If we have the unified flags, we need to combine both
> DOMAIN and NOHZ_FULL bitmasks, and then combine it again with something like
> cpu_online_mask. It usually means allocating a new cpumask_t, and also freeing
> it afterwards.
> If we have a single WQ flag, we can avoid the allocation altogether by using
> for_each_cpu_and(), making the code much simpler.

I guess having a specific function for workqueues would arrange for it.

>
> 3 - It makes easier to compose new isolation modes:
> In case the future requires a new isolation mode that also uses the types of
> isolation we currently have implemented, it would be much easier to just compose
> it with the current HK flags, instead of having to go through all usages and do
> a cpumask_and() there. Also, new isolation modes would make (2) worse.

Actually having a new feature merged in HK_NOHZ_FULL would make it easier to
handle as it avoids spreading cpumasks. I'm not sure I understand what you
mean.

Thanks.
On Tue, 2022-11-29 at 13:10 +0100, Frederic Weisbecker wrote:
> On Fri, Oct 14, 2022 at 01:27:25PM -0300, Leonardo Brás wrote:
> > Hello Frederic,
> >
> > So, IIUC you are removing all flags composing nohz_full= parameter in favor of a
> > unified NOHZ_FULL flag.
> >
> > I am very new to the code, and I am probably missing the whole picture, but I
> > actually think it's a good approach to keep them split for a couple reasons:
> > 1 - They are easier to understand in code (IMHO):
> > "This cpu should not do this, because it's not able to do WQ housekeeping" looks
> > better than "because it's not in DOMAIN or NOHZ_FULL housekeeping"
>
> A comment above each site may solve that.

Sure, but not having to leave comments would be better.
Or am I missing something?

> >
> > 2 - They are simpler for using:
> > Suppose we have this function that should run at a WQ, but we want to keep them
> > out of the isolated cpus. If we have the unified flags, we need to combine both
> > DOMAIN and NOHZ_FULL bitmasks, and then combine it again with something like
> > cpu_online_mask. It usually means allocating a new cpumask_t, and also freeing
> > it afterwards.
> > If we have a single WQ flag, we can avoid the allocation altogether by using
> > for_each_cpu_and(), making the code much simpler.
>
> I guess having a specific function for workqueues would arrange for it.

You mean keeping a WQ housekeeping bitmap?
This could be a solution, but it would affect only the WQ example.

> >
> > 3 - It makes easier to compose new isolation modes:
> > In case the future requires a new isolation mode that also uses the types of
> > isolation we currently have implemented, it would be much easier to just compose
> > it with the current HK flags, instead of having to go through all usages and do
> > a cpumask_and() there. Also, new isolation modes would make (2) worse.
>
> Actually having a new feature merged in HK_NOHZ_FULL would make it easier to
> handle as it avoids spreading cpumasks. I'm not sure I understand what you
> mean.

IIUC, your queued patch merges the housekeeping types HK_TYPE_TIMER, HK_TYPE_RCU,
HK_TYPE_MISC, HK_TYPE_TICK, HK_TYPE_WQ and HK_TYPE_KTHREAD in a single
HK_TYPE_NOHZ_FULL.

Suppose in future we need a new isolation feature in cmdline, say
isol_new=<cpulist>, and it works exactly like nohz_full=<cpulist>, but also
needs to isolate cpulist against something else, say doing X.

How would this get implemented? IIUC, following the same pattern:
- A new type HK_TYPE_ISOL_NEW would be created together with a cpumask,
- The new cpumask would be used to keep cpulist from doing X
- All places that use HK_TYPE_NOHZ_FULL bitmap for isolation would need to also
  bitmask_and() the new cpumask. (sometimes needing a local cpumask_t)

Ok, there may be shortcuts for this, like keeping an intermediary bitmap, but
that can become tricky.

Other more complex example:
New isolation feature isol_new2=<cpulist> behaves like nohz_full=<cpulist>,
keeps cpulist from doing X, but allows unbound RCU work. Now it's even harder
to have shortcuts from previous implementation.

What I am trying to defend here is that keeping the HK_type with the idea of
"things to get cpulist isolated from" works better for future implementations
than a single flag with a lot of responsibilities:
- A new type HK_TYPE_X would be created together with a cpumask,
- The new cpumask would be used to keep cpulist from doing X
- isol_new=<cpulist> is composed with the flags for what cpulist is getting
  isolated.
- (No need to touch already implemented isolations.)

In fact, I propose that it works better for current implementations also:
The current patch (3/4) takes the WQ isolation responsibility from
HK_TYPE_DOMAIN and focus it in HK_TYPE_WQ, adding it to isolcpus=<cpulist>
flags. This avoids some cpumask_and()s, and a cpumask_t kzalloc, and makes the
code less complex to implement when we need to put isolation in further parts
of the code. (patch 4/4)

I am not sure if I am missing some important point here.
Please let me know if it's the case.

>
> Thanks.
>

Thank you for replying!
Leo
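A hypothetical sketch of point (3): composing a new isolation mode out of the existing per-feature housekeeping flags, following the housekeeping_nohz_full_setup()/housekeeping_isolcpus_setup() pattern quoted earlier in the thread. The parameter name 'isol_new' and the flag HK_FLAG_X are invented for illustration only.

/*
 * Hypothetical example for point (3): a new "isol_new=" boot parameter
 * composed from the existing per-feature housekeeping flags, plus an
 * invented HK_FLAG_X standing in for the extra "doing X" isolation.
 */
static int __init housekeeping_isol_new_setup(char *str)
{
	unsigned long flags;

	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
		HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_X;

	return housekeeping_setup(str, flags);
}
__setup("isol_new=", housekeeping_isol_new_setup);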
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 107d77f3c8467..550bef2504b8d 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -371,19 +371,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	    pci_physfn_is_probed(dev)) {
 		cpu = nr_cpu_ids;
 	} else {
-		cpumask_var_t wq_domain_mask;
-
-		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
-			error = -ENOMEM;
-			goto out;
-		}
-		cpumask_and(wq_domain_mask,
-			    housekeeping_cpumask(HK_TYPE_WQ),
-			    housekeeping_cpumask(HK_TYPE_DOMAIN));
-
 		cpu = cpumask_any_and(cpumask_of_node(node),
-				      wq_domain_mask);
-		free_cpumask_var(wq_domain_mask);
+				      housekeeping_cpumask(HK_TYPE_WQ));
 	}
 
 	if (cpu < nr_cpu_ids)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc5..ced4b78564810 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -204,7 +204,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
 
 		if (!strncmp(str, "domain,", 7)) {
 			str += 7;
-			flags |= HK_FLAG_DOMAIN;
+			flags |= HK_FLAG_DOMAIN | HK_FLAG_WQ;
 			continue;
 		}
 
@@ -234,7 +234,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
 
 	/* Default behaviour for isolcpus without flags */
 	if (!flags)
-		flags |= HK_FLAG_DOMAIN;
+		flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
 
 	return housekeeping_setup(str, flags);
 }
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7cd5f5e7e0a1b..b557daa571f17 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6004,7 +6004,6 @@ void __init workqueue_init_early(void)
 
 	BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
 	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
-	cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
 
 	pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 8409d41405dfe..7b6fb62a118ab 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -852,7 +852,6 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	}
 
 	if (!cpumask_empty(mask)) {
-		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
 		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ));
 		if (cpumask_empty(mask)) {
 			free_cpumask_var(mask);
Housekeeping code keeps multiple cpumasks in order to keep track of which
cpus can perform given housekeeping category.

Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu
WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that
the Domain isolation also ends up isolating work queues.

Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ
makes it simpler to check if a cpu can run a task into an work queue, since
code just need to go through a single HK_TYPE_* cpumask.

Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and
remove a lot of cpumask_and calls.

Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we
are sure that 'flags == 0' here.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 drivers/pci/pci-driver.c | 13 +------------
 kernel/sched/isolation.c |  4 ++--
 kernel/workqueue.c       |  1 -
 net/core/net-sysfs.c     |  1 -
 4 files changed, 3 insertions(+), 16 deletions(-)
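For reference, the net effect on callers, shown as a before/after sketch distilled from the net-sysfs and pci-driver hunks above (no new code beyond what the diff already contains):

/* Before this patch: a CPU is eligible for unbound workqueue work only if
 * it is in both the WQ and the DOMAIN housekeeping masks. */
cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ));

/* After this patch: HK_TYPE_WQ alone carries that meaning, because
 * isolcpus=domain (and plain isolcpus=) now sets HK_FLAG_WQ as well. */
cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ));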