Message ID | 20220923132527.1001870-1-vschneid@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | sched, net: NUMA-aware CPU spreading interface | expand |
On Fri, Sep 23, 2022 at 02:25:20PM +0100, Valentin Schneider wrote: > Hi folks, Hi, I received only 1st patch of the series. Can you give me a link for the full series so that I'll see how the new API is used? Thanks, Yury > Tariq pointed out in [1] that drivers allocating IRQ vectors would benefit > from having smarter NUMA-awareness (cpumask_local_spread() doesn't quite cut > it). > > The proposed interface involved an array of CPUs and a temporary cpumask, and > being my difficult self what I'm proposing here is an interface that doesn't > require any temporary storage other than some stack variables (at the cost of > one wild macro). > > Please note that this is based on top of Yury's bitmap-for-next [2] to leverage > his fancy new FIND_NEXT_BIT() macro. > > [1]: https://lore.kernel.org/all/20220728191203.4055-1-tariqt@nvidia.com/ > [2]: https://github.com/norov/linux.git/ -b bitmap-for-next > > A note on treewide use of for_each_cpu_andnot() > =============================================== > > I've used the below coccinelle script to find places that could be patched (I > couldn't figure out the valid syntax to patch from coccinelle itself): > > ,----- > @tmpandnot@ > expression tmpmask; > iterator for_each_cpu; > position p; > statement S; > @@ > cpumask_andnot(tmpmask, ...); > > ... > > ( > for_each_cpu@p(..., tmpmask, ...) > S > | > for_each_cpu@p(..., tmpmask, ...) > { > ... > } > ) > > @script:python depends on tmpandnot@ > p << tmpandnot.p; > @@ > coccilib.report.print_report(p[0], "andnot loop here") > '----- > > Which yields (against c40e8341e3b3): > > .//arch/powerpc/kernel/smp.c:1587:1-13: andnot loop here > .//arch/powerpc/kernel/smp.c:1530:1-13: andnot loop here > .//arch/powerpc/kernel/smp.c:1440:1-13: andnot loop here > .//arch/powerpc/platforms/powernv/subcore.c:306:2-14: andnot loop here > .//arch/x86/kernel/apic/x2apic_cluster.c:62:1-13: andnot loop here > .//drivers/acpi/acpi_pad.c:110:1-13: andnot loop here > .//drivers/cpufreq/armada-8k-cpufreq.c:148:1-13: andnot loop here > .//drivers/cpufreq/powernv-cpufreq.c:931:1-13: andnot loop here > .//drivers/net/ethernet/sfc/efx_channels.c:73:1-13: andnot loop here > .//drivers/net/ethernet/sfc/siena/efx_channels.c:73:1-13: andnot loop here > .//kernel/sched/core.c:345:1-13: andnot loop here > .//kernel/sched/core.c:366:1-13: andnot loop here > .//net/core/dev.c:3058:1-13: andnot loop here > > A lot of those are actually of the shape > > for_each_cpu(cpu, mask) { > ... > cpumask_andnot(mask, ...); > } > > I think *some* of the powerpc ones would be a match for for_each_cpu_andnot(), > but I decided to just stick to the one obvious one in __sched_core_flip(). > > Revisions > ========= > > v3 -> v4 > ++++++++ > > o Rebased on top of Yury's bitmap-for-next > o Added Tariq's mlx5e patch > o Made sched_numa_hop_mask() return cpu_online_mask for the NUMA_NO_NODE && > hops=0 case > > v2 -> v3 > ++++++++ > > o Added for_each_cpu_and() and for_each_cpu_andnot() tests (Yury) > o New patches to fix issues raised by running the above > > o New patch to use for_each_cpu_andnot() in sched/core.c (Yury) > > v1 -> v2 > ++++++++ > > o Split _find_next_bit() @invert into @invert1 and @invert2 (Yury) > o Rebase onto v6.0-rc1 > > Cheers, > Valentin > > Tariq Toukan (1): > net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity > hints > > Valentin Schneider (6): > lib/find_bit: Introduce find_next_andnot_bit() > cpumask: Introduce for_each_cpu_andnot() > lib/test_cpumask: Add for_each_cpu_and(not) tests > sched/core: Merge cpumask_andnot()+for_each_cpu() into > for_each_cpu_andnot() > sched/topology: Introduce sched_numa_hop_mask() > sched/topology: Introduce for_each_numa_hop_cpu() > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 13 +++++- > include/linux/cpumask.h | 39 ++++++++++++++++ > include/linux/find.h | 33 +++++++++++++ > include/linux/topology.h | 49 ++++++++++++++++++++ > kernel/sched/core.c | 5 +- > kernel/sched/topology.c | 31 +++++++++++++ > lib/cpumask_kunit.c | 19 ++++++++ > lib/find_bit.c | 9 ++++ > 8 files changed, 192 insertions(+), 6 deletions(-) > > -- > 2.31.1
On 23/09/22 08:44, Yury Norov wrote: > On Fri, Sep 23, 2022 at 02:25:20PM +0100, Valentin Schneider wrote: >> Hi folks, > > Hi, > > I received only 1st patch of the series. Can you give me a link for > the full series so that I'll see how the new API is used? > Sigh, I got this when sending these out: 4.3.0 Temporary System Problem. Try again later (10) I'm going to re-send the missing ones and *hopefully* have them thread properly. Sorry about that.
On 9/23/2022 4:25 PM, Valentin Schneider wrote: > Hi folks, > > Tariq pointed out in [1] that drivers allocating IRQ vectors would benefit > from having smarter NUMA-awareness (cpumask_local_spread() doesn't quite cut > it). > > The proposed interface involved an array of CPUs and a temporary cpumask, and > being my difficult self what I'm proposing here is an interface that doesn't > require any temporary storage other than some stack variables (at the cost of > one wild macro). > > Please note that this is based on top of Yury's bitmap-for-next [2] to leverage > his fancy new FIND_NEXT_BIT() macro. > > [1]: https://lore.kernel.org/all/20220728191203.4055-1-tariqt@nvidia.com/ > [2]: https://github.com/norov/linux.git/ -b bitmap-for-next > > A note on treewide use of for_each_cpu_andnot() > =============================================== > > I've used the below coccinelle script to find places that could be patched (I > couldn't figure out the valid syntax to patch from coccinelle itself): > > ,----- > @tmpandnot@ > expression tmpmask; > iterator for_each_cpu; > position p; > statement S; > @@ > cpumask_andnot(tmpmask, ...); > > ... > > ( > for_each_cpu@p(..., tmpmask, ...) > S > | > for_each_cpu@p(..., tmpmask, ...) > { > ... > } > ) > > @script:python depends on tmpandnot@ > p << tmpandnot.p; > @@ > coccilib.report.print_report(p[0], "andnot loop here") > '----- > > Which yields (against c40e8341e3b3): > > .//arch/powerpc/kernel/smp.c:1587:1-13: andnot loop here > .//arch/powerpc/kernel/smp.c:1530:1-13: andnot loop here > .//arch/powerpc/kernel/smp.c:1440:1-13: andnot loop here > .//arch/powerpc/platforms/powernv/subcore.c:306:2-14: andnot loop here > .//arch/x86/kernel/apic/x2apic_cluster.c:62:1-13: andnot loop here > .//drivers/acpi/acpi_pad.c:110:1-13: andnot loop here > .//drivers/cpufreq/armada-8k-cpufreq.c:148:1-13: andnot loop here > .//drivers/cpufreq/powernv-cpufreq.c:931:1-13: andnot loop here > .//drivers/net/ethernet/sfc/efx_channels.c:73:1-13: andnot loop here > .//drivers/net/ethernet/sfc/siena/efx_channels.c:73:1-13: andnot loop here > .//kernel/sched/core.c:345:1-13: andnot loop here > .//kernel/sched/core.c:366:1-13: andnot loop here > .//net/core/dev.c:3058:1-13: andnot loop here > > A lot of those are actually of the shape > > for_each_cpu(cpu, mask) { > ... > cpumask_andnot(mask, ...); > } > > I think *some* of the powerpc ones would be a match for for_each_cpu_andnot(), > but I decided to just stick to the one obvious one in __sched_core_flip(). > > Revisions > ========= > > v3 -> v4 > ++++++++ > > o Rebased on top of Yury's bitmap-for-next > o Added Tariq's mlx5e patch > o Made sched_numa_hop_mask() return cpu_online_mask for the NUMA_NO_NODE && > hops=0 case > > v2 -> v3 > ++++++++ > > o Added for_each_cpu_and() and for_each_cpu_andnot() tests (Yury) > o New patches to fix issues raised by running the above > > o New patch to use for_each_cpu_andnot() in sched/core.c (Yury) > > v1 -> v2 > ++++++++ > > o Split _find_next_bit() @invert into @invert1 and @invert2 (Yury) > o Rebase onto v6.0-rc1 > > Cheers, > Valentin > > Tariq Toukan (1): > net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity > hints > > Valentin Schneider (6): > lib/find_bit: Introduce find_next_andnot_bit() > cpumask: Introduce for_each_cpu_andnot() > lib/test_cpumask: Add for_each_cpu_and(not) tests > sched/core: Merge cpumask_andnot()+for_each_cpu() into > for_each_cpu_andnot() > sched/topology: Introduce sched_numa_hop_mask() > sched/topology: Introduce for_each_numa_hop_cpu() > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 13 +++++- > include/linux/cpumask.h | 39 ++++++++++++++++ > include/linux/find.h | 33 +++++++++++++ > include/linux/topology.h | 49 ++++++++++++++++++++ > kernel/sched/core.c | 5 +- > kernel/sched/topology.c | 31 +++++++++++++ > lib/cpumask_kunit.c | 19 ++++++++ > lib/find_bit.c | 9 ++++ > 8 files changed, 192 insertions(+), 6 deletions(-) > > -- > 2.31.1 > Valentin, thank you for investing your time here. Acked-by: Tariq Toukan <tariqt@nvidia.com> Tested on my mlx5 environment. It works as expected, including the case of node == NUMA_NO_NODE. Regards, Tariq
On 9/23/2022 4:25 PM, Valentin Schneider wrote: > Hi folks, > > Tariq pointed out in [1] that drivers allocating IRQ vectors would benefit > from having smarter NUMA-awareness (cpumask_local_spread() doesn't quite cut > it). > > The proposed interface involved an array of CPUs and a temporary cpumask, and > being my difficult self what I'm proposing here is an interface that doesn't > require any temporary storage other than some stack variables (at the cost of > one wild macro). > > Please note that this is based on top of Yury's bitmap-for-next [2] to leverage > his fancy new FIND_NEXT_BIT() macro. > > [1]: https://lore.kernel.org/all/20220728191203.4055-1-tariqt@nvidia.com/ > [2]: https://github.com/norov/linux.git/ -b bitmap-for-next > > A note on treewide use of for_each_cpu_andnot() > =============================================== > > I've used the below coccinelle script to find places that could be patched (I > couldn't figure out the valid syntax to patch from coccinelle itself): > > ,----- > @tmpandnot@ > expression tmpmask; > iterator for_each_cpu; > position p; > statement S; > @@ > cpumask_andnot(tmpmask, ...); > > ... > > ( > for_each_cpu@p(..., tmpmask, ...) > S > | > for_each_cpu@p(..., tmpmask, ...) > { > ... > } > ) > > @script:python depends on tmpandnot@ > p << tmpandnot.p; > @@ > coccilib.report.print_report(p[0], "andnot loop here") > '----- > > Which yields (against c40e8341e3b3): > > .//arch/powerpc/kernel/smp.c:1587:1-13: andnot loop here > .//arch/powerpc/kernel/smp.c:1530:1-13: andnot loop here > .//arch/powerpc/kernel/smp.c:1440:1-13: andnot loop here > .//arch/powerpc/platforms/powernv/subcore.c:306:2-14: andnot loop here > .//arch/x86/kernel/apic/x2apic_cluster.c:62:1-13: andnot loop here > .//drivers/acpi/acpi_pad.c:110:1-13: andnot loop here > .//drivers/cpufreq/armada-8k-cpufreq.c:148:1-13: andnot loop here > .//drivers/cpufreq/powernv-cpufreq.c:931:1-13: andnot loop here > .//drivers/net/ethernet/sfc/efx_channels.c:73:1-13: andnot loop here > .//drivers/net/ethernet/sfc/siena/efx_channels.c:73:1-13: andnot loop here > .//kernel/sched/core.c:345:1-13: andnot loop here > .//kernel/sched/core.c:366:1-13: andnot loop here > .//net/core/dev.c:3058:1-13: andnot loop here > > A lot of those are actually of the shape > > for_each_cpu(cpu, mask) { > ... > cpumask_andnot(mask, ...); > } > > I think *some* of the powerpc ones would be a match for for_each_cpu_andnot(), > but I decided to just stick to the one obvious one in __sched_core_flip(). > > Revisions > ========= > > v3 -> v4 > ++++++++ > > o Rebased on top of Yury's bitmap-for-next > o Added Tariq's mlx5e patch > o Made sched_numa_hop_mask() return cpu_online_mask for the NUMA_NO_NODE && > hops=0 case > > v2 -> v3 > ++++++++ > > o Added for_each_cpu_and() and for_each_cpu_andnot() tests (Yury) > o New patches to fix issues raised by running the above > > o New patch to use for_each_cpu_andnot() in sched/core.c (Yury) > > v1 -> v2 > ++++++++ > > o Split _find_next_bit() @invert into @invert1 and @invert2 (Yury) > o Rebase onto v6.0-rc1 > > Cheers, > Valentin > Hi, What's the status of this? Do we have agreement on the changes needed for the next respin? Regards, Tariq
On 18/10/22 09:36, Tariq Toukan wrote: > > Hi, > > What's the status of this? > Do we have agreement on the changes needed for the next respin? > Yep, the bitmap patches are in 6.1-rc1, I need to respin the topology ones to address Yury's comments. It's in my todolist, I'll get to it soonish. > Regards, > Tariq