mbox series

[RFC,0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions

Message ID 20240117163511.88173-1-longman@redhat.com (mailing list archive)
Headers show
Series cgroup/cpuset: Support RCU_NOCB on isolated partitions | expand

Message

Waiman Long Jan. 17, 2024, 4:35 p.m. UTC
This patch series is based on the RFC patch from Frederic [1]. Instead
of offering RCU_NOCB as a separate option, it is now lumped into a
root-only cpuset.cpus.isolation_full flag that will enable all the
additional CPU isolation capabilities available for isolated partitions
if set. RCU_NOCB is just the first one to this party. Additional dynamic
CPU isolation capabilities will be added in the future.

The first 2 patches are adopted from Federic with minor twists to fix
merge conflicts and compilation issue. The rests are for implementing
the new cpuset.cpus.isolation_full interface which is essentially a flag
to globally enable or disable full CPU isolation on isolated partitions.
On read, it also shows the CPU isolation capabilities that are currently
enabled. RCU_NOCB requires that the rcu_nocbs option be present in
the kernel boot command line. Without that, the rcu_nocb functionality
cannot be enabled even if the isolation_full flag is set. So we allow
users to check the isolation_full file to verify that if the desired
CPU isolation capability is enabled or not.

Only sanity checking has been done so far. More testing, especially on
the RCU side, will be needed.

[1] https://lore.kernel.org/lkml/20220525221055.1152307-1-frederic@kernel.org/

Frederic Weisbecker (2):
  rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload
  rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected
    cpuset caller

Waiman Long (6):
  rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state
  cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs
  cgroup/cpuset: Add cpuset.cpus.isolation_full
  cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs
  cgroup/cpuset: Document the new cpuset.cpus.isolation_full control
    file
  cgroup/cpuset: Update test_cpuset_prs.sh to handle
    cpuset.cpus.isolation_full

 Documentation/admin-guide/cgroup-v2.rst       |  24 ++
 include/linux/rcupdate.h                      |  15 +-
 kernel/cgroup/cpuset.c                        | 237 ++++++++++++++----
 kernel/rcu/rcutorture.c                       |   6 +-
 kernel/rcu/tree_nocb.h                        | 118 ++++++---
 .../selftests/cgroup/test_cpuset_prs.sh       |  23 +-
 6 files changed, 337 insertions(+), 86 deletions(-)

Comments

Tejun Heo Jan. 17, 2024, 5:07 p.m. UTC | #1
Hello,

On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> The first 2 patches are adopted from Federic with minor twists to fix
> merge conflicts and compilation issue. The rests are for implementing
> the new cpuset.cpus.isolation_full interface which is essentially a flag
> to globally enable or disable full CPU isolation on isolated partitions.

I think the interface is a bit premature. The cpuset partition feature is
already pretty restrictive and makes it really clear that it's to isolate
the CPUs. I think it'd be better to just enable all the isolation features
by default. If there are valid use cases which can't be served without
disabling some isolation features, we can worry about adding the interface
at that point.

Thanks.
Waiman Long Jan. 17, 2024, 5:15 p.m. UTC | #2
On 1/17/24 12:07, Tejun Heo wrote:
> Hello,
>
> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
>> The first 2 patches are adopted from Federic with minor twists to fix
>> merge conflicts and compilation issue. The rests are for implementing
>> the new cpuset.cpus.isolation_full interface which is essentially a flag
>> to globally enable or disable full CPU isolation on isolated partitions.
> I think the interface is a bit premature. The cpuset partition feature is
> already pretty restrictive and makes it really clear that it's to isolate
> the CPUs. I think it'd be better to just enable all the isolation features
> by default. If there are valid use cases which can't be served without
> disabling some isolation features, we can worry about adding the interface
> at that point.

My current thought is to make isolated partitions act like 
isolcpus=domain, additional CPU isolation capabilities are optional and 
can be turned on using isolation_full. However, I am fine with making 
all these turned on by default if it is the consensus.

Cheers,
Longman
Paul E. McKenney Jan. 19, 2024, 10:24 a.m. UTC | #3
On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> This patch series is based on the RFC patch from Frederic [1]. Instead
> of offering RCU_NOCB as a separate option, it is now lumped into a
> root-only cpuset.cpus.isolation_full flag that will enable all the
> additional CPU isolation capabilities available for isolated partitions
> if set. RCU_NOCB is just the first one to this party. Additional dynamic
> CPU isolation capabilities will be added in the future.
> 
> The first 2 patches are adopted from Federic with minor twists to fix
> merge conflicts and compilation issue. The rests are for implementing
> the new cpuset.cpus.isolation_full interface which is essentially a flag
> to globally enable or disable full CPU isolation on isolated partitions.
> On read, it also shows the CPU isolation capabilities that are currently
> enabled. RCU_NOCB requires that the rcu_nocbs option be present in
> the kernel boot command line. Without that, the rcu_nocb functionality
> cannot be enabled even if the isolation_full flag is set. So we allow
> users to check the isolation_full file to verify that if the desired
> CPU isolation capability is enabled or not.
> 
> Only sanity checking has been done so far. More testing, especially on
> the RCU side, will be needed.

There has been some discussion of simplifying the (de-)offloading code
to handle only offline CPUs.  Along with some discussion of eliminating
the (de-)offloading capability altogehter.

We clearly should converge on the capability to be provided before
exposing this to userspace.  ;-)

							Thanx, Paul

> [1] https://lore.kernel.org/lkml/20220525221055.1152307-1-frederic@kernel.org/
> 
> Frederic Weisbecker (2):
>   rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload
>   rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected
>     cpuset caller
> 
> Waiman Long (6):
>   rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state
>   cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs
>   cgroup/cpuset: Add cpuset.cpus.isolation_full
>   cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs
>   cgroup/cpuset: Document the new cpuset.cpus.isolation_full control
>     file
>   cgroup/cpuset: Update test_cpuset_prs.sh to handle
>     cpuset.cpus.isolation_full
> 
>  Documentation/admin-guide/cgroup-v2.rst       |  24 ++
>  include/linux/rcupdate.h                      |  15 +-
>  kernel/cgroup/cpuset.c                        | 237 ++++++++++++++----
>  kernel/rcu/rcutorture.c                       |   6 +-
>  kernel/rcu/tree_nocb.h                        | 118 ++++++---
>  .../selftests/cgroup/test_cpuset_prs.sh       |  23 +-
>  6 files changed, 337 insertions(+), 86 deletions(-)
> 
> -- 
> 2.39.3
>
Michal Koutný Jan. 22, 2024, 3:07 p.m. UTC | #4
Hello Waiman.

On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long <longman@redhat.com> wrote:
> This patch series is based on the RFC patch from Frederic [1]. Instead
> of offering RCU_NOCB as a separate option, it is now lumped into a
> root-only cpuset.cpus.isolation_full flag that will enable all the
> additional CPU isolation capabilities available for isolated partitions
> if set. RCU_NOCB is just the first one to this party. Additional dynamic
> CPU isolation capabilities will be added in the future.

IIUC this is similar to what I suggested back in the day and you didn't
consider it [1]. Do I read this right that you've changed your mind?

(It's fine if you did, I'm only asking to follow the heading of cpuset
controller.)

Thanks,
Michal

[1] https://lore.kernel.org/r/58c87587-417b-1498-185f-1db6bb612c82@redhat.com/
Waiman Long Jan. 23, 2024, 5:50 a.m. UTC | #5
On 1/22/24 10:07, Michal Koutný wrote:
> Hello Waiman.
>
> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long <longman@redhat.com> wrote:
>> This patch series is based on the RFC patch from Frederic [1]. Instead
>> of offering RCU_NOCB as a separate option, it is now lumped into a
>> root-only cpuset.cpus.isolation_full flag that will enable all the
>> additional CPU isolation capabilities available for isolated partitions
>> if set. RCU_NOCB is just the first one to this party. Additional dynamic
>> CPU isolation capabilities will be added in the future.
> IIUC this is similar to what I suggested back in the day and you didn't
> consider it [1]. Do I read this right that you've changed your mind?

I didn't said that we were not going to do this at the time. It's just 
that more evaluation will need to be done before we are going to do 
this. I was also looking to see if there were use cases where such 
capabilities were needed. Now I am aware that such use cases do exist 
and we should start looking into it.

>
> (It's fine if you did, I'm only asking to follow the heading of cpuset
> controller.)

OK, the title of the cover-letter may be too specific. I will make it 
more general in the next version.

Cheers,
Longman
Frederic Weisbecker Feb. 6, 2024, 12:56 p.m. UTC | #6
Le Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long a écrit :
> 
> On 1/17/24 12:07, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > The first 2 patches are adopted from Federic with minor twists to fix
> > > merge conflicts and compilation issue. The rests are for implementing
> > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > to globally enable or disable full CPU isolation on isolated partitions.
> > I think the interface is a bit premature. The cpuset partition feature is
> > already pretty restrictive and makes it really clear that it's to isolate
> > the CPUs. I think it'd be better to just enable all the isolation features
> > by default. If there are valid use cases which can't be served without
> > disabling some isolation features, we can worry about adding the interface
> > at that point.
> 
> My current thought is to make isolated partitions act like isolcpus=domain,
> additional CPU isolation capabilities are optional and can be turned on
> using isolation_full. However, I am fine with making all these turned on by
> default if it is the consensus.

Right it was the consensus last time I tried. Along with the fact that mutating
this isolation_full set has to be done on offline CPUs to simplify the whole
picture.

So lemme try to summarize what needs to be done:

1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
  now. And if it ever proves needed, provide a way later for more finegrained
  tuning.

2) This file must only apply to offline CPUs because it avoids migrations and
  stuff.

3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
   changes.

4) HK_TYPE_TIMER:
   * Wrt. timers in general, not much needs to be done, the CPUs are
     offline. But:
   * arch/x86/kvm/x86.c does something weird
   * drivers/char/random.c might need some care
   * watchdog needs to be (de-)activated
   
5) HK_TYPE_DOMAIN:
   * This one I fear is not mutable, this is isolcpus...

6) HK_TYPE_MANAGED_IRQ:
   * I prefer not to think about it :-)

7) HK_TYPE_TICK:
   * Maybe some tiny ticks internals to revisit, I'll check that.
   * There is a remote tick to take into consideration, but again the
     CPUs are offline so it shouldn't be too complicated.

8) HK_TYPE_WQ:
   * Fortunately we already have all the mutable interface in place.
     But we must make it live nicely with the sysfs workqueue affinity
     files.

9) HK_FLAG_SCHED:
   * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
   Should be removed?

10) HK_TYPE_RCU:
    * That's point 3) and also some kthreads to affine, which leads us
     to the following in HK_TYPE_KTHREAD:

11) HK_FLAG_KTHREAD:
    * I'm guessing it's fine as long as isolation_full is also an
      isolated partition. Then unbound kthreads shouldn't run there.

12) HK_TYPE_MISC:
    * Should be fine as ILB isn't running on offline CPUs.

Thanks.
Marcelo Tosatti Feb. 6, 2024, 7:15 p.m. UTC | #7
On Tue, Feb 06, 2024 at 01:56:23PM +0100, Frederic Weisbecker wrote:
> Le Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long a écrit :
> > 
> > On 1/17/24 12:07, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > > The first 2 patches are adopted from Federic with minor twists to fix
> > > > merge conflicts and compilation issue. The rests are for implementing
> > > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > > to globally enable or disable full CPU isolation on isolated partitions.
> > > I think the interface is a bit premature. The cpuset partition feature is
> > > already pretty restrictive and makes it really clear that it's to isolate
> > > the CPUs. I think it'd be better to just enable all the isolation features
> > > by default. If there are valid use cases which can't be served without
> > > disabling some isolation features, we can worry about adding the interface
> > > at that point.
> > 
> > My current thought is to make isolated partitions act like isolcpus=domain,
> > additional CPU isolation capabilities are optional and can be turned on
> > using isolation_full. However, I am fine with making all these turned on by
> > default if it is the consensus.
> 
> Right it was the consensus last time I tried. Along with the fact that mutating
> this isolation_full set has to be done on offline CPUs to simplify the whole
> picture.
> 
> So lemme try to summarize what needs to be done:
> 
> 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
>   now. And if it ever proves needed, provide a way later for more finegrained
>   tuning.
> 
> 2) This file must only apply to offline CPUs because it avoids migrations and
>   stuff.
> 
> 3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
>    changes.
> 
> 4) HK_TYPE_TIMER:
>    * Wrt. timers in general, not much needs to be done, the CPUs are
>      offline. But:
>    * arch/x86/kvm/x86.c does something weird
>    * drivers/char/random.c might need some care
>    * watchdog needs to be (de-)activated
>    
> 5) HK_TYPE_DOMAIN:
>    * This one I fear is not mutable, this is isolcpus...

Except for HK_TYPE_DOMAIN, i have never seen anyone use any of this
flags.

> 
> 6) HK_TYPE_MANAGED_IRQ:
>    * I prefer not to think about it :-)
> 
> 7) HK_TYPE_TICK:
>    * Maybe some tiny ticks internals to revisit, I'll check that.
>    * There is a remote tick to take into consideration, but again the
>      CPUs are offline so it shouldn't be too complicated.
> 
> 8) HK_TYPE_WQ:
>    * Fortunately we already have all the mutable interface in place.
>      But we must make it live nicely with the sysfs workqueue affinity
>      files.
> 
> 9) HK_FLAG_SCHED:
>    * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
>    Should be removed?
> 
> 10) HK_TYPE_RCU:
>     * That's point 3) and also some kthreads to affine, which leads us
>      to the following in HK_TYPE_KTHREAD:
> 
> 11) HK_FLAG_KTHREAD:
>     * I'm guessing it's fine as long as isolation_full is also an
>       isolated partition. Then unbound kthreads shouldn't run there.
> 
> 12) HK_TYPE_MISC:
>     * Should be fine as ILB isn't running on offline CPUs.
> 
> Thanks.
> 
>
Frederic Weisbecker Feb. 7, 2024, 2:47 p.m. UTC | #8
Le Tue, Feb 06, 2024 at 04:15:18PM -0300, Marcelo Tosatti a écrit :
> On Tue, Feb 06, 2024 at 01:56:23PM +0100, Frederic Weisbecker wrote:
> > Le Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long a écrit :
> > > 
> > > On 1/17/24 12:07, Tejun Heo wrote:
> > > > Hello,
> > > > 
> > > > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > > > The first 2 patches are adopted from Federic with minor twists to fix
> > > > > merge conflicts and compilation issue. The rests are for implementing
> > > > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > > > to globally enable or disable full CPU isolation on isolated partitions.
> > > > I think the interface is a bit premature. The cpuset partition feature is
> > > > already pretty restrictive and makes it really clear that it's to isolate
> > > > the CPUs. I think it'd be better to just enable all the isolation features
> > > > by default. If there are valid use cases which can't be served without
> > > > disabling some isolation features, we can worry about adding the interface
> > > > at that point.
> > > 
> > > My current thought is to make isolated partitions act like isolcpus=domain,
> > > additional CPU isolation capabilities are optional and can be turned on
> > > using isolation_full. However, I am fine with making all these turned on by
> > > default if it is the consensus.
> > 
> > Right it was the consensus last time I tried. Along with the fact that mutating
> > this isolation_full set has to be done on offline CPUs to simplify the whole
> > picture.
> > 
> > So lemme try to summarize what needs to be done:
> > 
> > 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
> >   now. And if it ever proves needed, provide a way later for more finegrained
> >   tuning.
> > 
> > 2) This file must only apply to offline CPUs because it avoids migrations and
> >   stuff.
> > 
> > 3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
> >    changes.
> > 
> > 4) HK_TYPE_TIMER:
> >    * Wrt. timers in general, not much needs to be done, the CPUs are
> >      offline. But:
> >    * arch/x86/kvm/x86.c does something weird
> >    * drivers/char/random.c might need some care
> >    * watchdog needs to be (de-)activated
> >    
> > 5) HK_TYPE_DOMAIN:
> >    * This one I fear is not mutable, this is isolcpus...
> 
> Except for HK_TYPE_DOMAIN, i have never seen anyone use any of this
> flags.

HK_TYPE_DOMAIN is used by isolcpus=domain,....
HK_TYPE_MANAGED_IRQ is used by isolcpus=managed_irq,...

All the others (except HK_TYPE_SCHED) are used by nohz_full=

Thanks.

> 
> > 
> > 6) HK_TYPE_MANAGED_IRQ:
> >    * I prefer not to think about it :-)
> > 
> > 7) HK_TYPE_TICK:
> >    * Maybe some tiny ticks internals to revisit, I'll check that.
> >    * There is a remote tick to take into consideration, but again the
> >      CPUs are offline so it shouldn't be too complicated.
> > 
> > 8) HK_TYPE_WQ:
> >    * Fortunately we already have all the mutable interface in place.
> >      But we must make it live nicely with the sysfs workqueue affinity
> >      files.
> > 
> > 9) HK_FLAG_SCHED:
> >    * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
> >    Should be removed?
> > 
> > 10) HK_TYPE_RCU:
> >     * That's point 3) and also some kthreads to affine, which leads us
> >      to the following in HK_TYPE_KTHREAD:
> > 
> > 11) HK_FLAG_KTHREAD:
> >     * I'm guessing it's fine as long as isolation_full is also an
> >       isolated partition. Then unbound kthreads shouldn't run there.
> > 
> > 12) HK_TYPE_MISC:
> >     * Should be fine as ILB isn't running on offline CPUs.
> > 
> > Thanks.
> > 
> > 
>
Marcelo Tosatti Feb. 7, 2024, 2:59 p.m. UTC | #9
On Wed, Feb 07, 2024 at 03:47:46PM +0100, Frederic Weisbecker wrote:
> Le Tue, Feb 06, 2024 at 04:15:18PM -0300, Marcelo Tosatti a écrit :
> > On Tue, Feb 06, 2024 at 01:56:23PM +0100, Frederic Weisbecker wrote:
> > > Le Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long a écrit :
> > > > 
> > > > On 1/17/24 12:07, Tejun Heo wrote:
> > > > > Hello,
> > > > > 
> > > > > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > > > > The first 2 patches are adopted from Federic with minor twists to fix
> > > > > > merge conflicts and compilation issue. The rests are for implementing
> > > > > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > > > > to globally enable or disable full CPU isolation on isolated partitions.
> > > > > I think the interface is a bit premature. The cpuset partition feature is
> > > > > already pretty restrictive and makes it really clear that it's to isolate
> > > > > the CPUs. I think it'd be better to just enable all the isolation features
> > > > > by default. If there are valid use cases which can't be served without
> > > > > disabling some isolation features, we can worry about adding the interface
> > > > > at that point.
> > > > 
> > > > My current thought is to make isolated partitions act like isolcpus=domain,
> > > > additional CPU isolation capabilities are optional and can be turned on
> > > > using isolation_full. However, I am fine with making all these turned on by
> > > > default if it is the consensus.
> > > 
> > > Right it was the consensus last time I tried. Along with the fact that mutating
> > > this isolation_full set has to be done on offline CPUs to simplify the whole
> > > picture.
> > > 
> > > So lemme try to summarize what needs to be done:
> > > 
> > > 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
> > >   now. And if it ever proves needed, provide a way later for more finegrained
> > >   tuning.
> > > 
> > > 2) This file must only apply to offline CPUs because it avoids migrations and
> > >   stuff.
> > > 
> > > 3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
> > >    changes.
> > > 
> > > 4) HK_TYPE_TIMER:
> > >    * Wrt. timers in general, not much needs to be done, the CPUs are
> > >      offline. But:
> > >    * arch/x86/kvm/x86.c does something weird
> > >    * drivers/char/random.c might need some care
> > >    * watchdog needs to be (de-)activated
> > >    
> > > 5) HK_TYPE_DOMAIN:
> > >    * This one I fear is not mutable, this is isolcpus...
> > 
> > Except for HK_TYPE_DOMAIN, i have never seen anyone use any of this
> > flags.
> 
> HK_TYPE_DOMAIN is used by isolcpus=domain,....

> HK_TYPE_MANAGED_IRQ is used by isolcpus=managed_irq,...
> 
> All the others (except HK_TYPE_SCHED) are used by nohz_full=

I mean i've never seen any use of the individual flags being set.

You either want full isolation (nohz_full and all the flags together,
except for HK_TYPE_DOMAIN which is sometimes enabled/disabled), or not.

So why not group them all together ?

Do you know of any separate uses of these flags (except for
HK_TYPE_DOMAIN).

> Thanks.
> 
> > 
> > > 
> > > 6) HK_TYPE_MANAGED_IRQ:
> > >    * I prefer not to think about it :-)
> > > 
> > > 7) HK_TYPE_TICK:
> > >    * Maybe some tiny ticks internals to revisit, I'll check that.
> > >    * There is a remote tick to take into consideration, but again the
> > >      CPUs are offline so it shouldn't be too complicated.
> > > 
> > > 8) HK_TYPE_WQ:
> > >    * Fortunately we already have all the mutable interface in place.
> > >      But we must make it live nicely with the sysfs workqueue affinity
> > >      files.
> > > 
> > > 9) HK_FLAG_SCHED:
> > >    * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
> > >    Should be removed?
> > > 
> > > 10) HK_TYPE_RCU:
> > >     * That's point 3) and also some kthreads to affine, which leads us
> > >      to the following in HK_TYPE_KTHREAD:
> > > 
> > > 11) HK_FLAG_KTHREAD:
> > >     * I'm guessing it's fine as long as isolation_full is also an
> > >       isolated partition. Then unbound kthreads shouldn't run there.
> > > 
> > > 12) HK_TYPE_MISC:
> > >     * Should be fine as ILB isn't running on offline CPUs.
> > > 
> > > Thanks.
> > > 
> > > 
> > 
> 
>
Waiman Long Feb. 10, 2024, 4:19 a.m. UTC | #10
On 2/6/24 07:56, Frederic Weisbecker wrote:
> Le Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long a écrit :
>> On 1/17/24 12:07, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
>>>> The first 2 patches are adopted from Federic with minor twists to fix
>>>> merge conflicts and compilation issue. The rests are for implementing
>>>> the new cpuset.cpus.isolation_full interface which is essentially a flag
>>>> to globally enable or disable full CPU isolation on isolated partitions.
>>> I think the interface is a bit premature. The cpuset partition feature is
>>> already pretty restrictive and makes it really clear that it's to isolate
>>> the CPUs. I think it'd be better to just enable all the isolation features
>>> by default. If there are valid use cases which can't be served without
>>> disabling some isolation features, we can worry about adding the interface
>>> at that point.
>> My current thought is to make isolated partitions act like isolcpus=domain,
>> additional CPU isolation capabilities are optional and can be turned on
>> using isolation_full. However, I am fine with making all these turned on by
>> default if it is the consensus.
> Right it was the consensus last time I tried. Along with the fact that mutating
> this isolation_full set has to be done on offline CPUs to simplify the whole
> picture.
>
> So lemme try to summarize what needs to be done:
>
> 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
>    now. And if it ever proves needed, provide a way later for more finegrained
>    tuning.
That is more or less the current plan. As detailed below, HK_TYPE_DOMAIN 
& HK_TYPE_WQ isolation are included in the isolated partitions by 
default. I am also thinking about including other relatively cheap 
isolation flags by default. The expensive ones will have to be enabled 
via isolation_full.
>
> 2) This file must only apply to offline CPUs because it avoids migrations and
>    stuff.
Well, the process of first moving the CPUs offline first is rather 
expensive. I won't mind doing some partial offlining based on the 
existing set of teardown and bringup callbacks, but I would try to avoid 
fully offlining the CPUs first.
>
> 3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
>     changes.
>
> 4) HK_TYPE_TIMER:
>     * Wrt. timers in general, not much needs to be done, the CPUs are
>       offline. But:
>     * arch/x86/kvm/x86.c does something weird
>     * drivers/char/random.c might need some care
>     * watchdog needs to be (de-)activated
>     
> 5) HK_TYPE_DOMAIN:
>     * This one I fear is not mutable, this is isolcpus...

HK_TYPE_DOMAIN is already available via the current cpuset isolated 
partition functionality. What I am currently doing is to extend that to 
other HK_TYPE* flags.


>
> 6) HK_TYPE_MANAGED_IRQ:
>     * I prefer not to think about it :-)
>
> 7) HK_TYPE_TICK:
>     * Maybe some tiny ticks internals to revisit, I'll check that.
>     * There is a remote tick to take into consideration, but again the
>       CPUs are offline so it shouldn't be too complicated.
>
> 8) HK_TYPE_WQ:
>     * Fortunately we already have all the mutable interface in place.
>       But we must make it live nicely with the sysfs workqueue affinity
>       files.

HK_TYPE_WQ is basically done and it is going to work properly with the 
workqueue affinity sysfs files. From the workqueue of view, HK_TYPE_WQ 
is currently treated the same as HK_TYPE_DOMAIN.

>
> 9) HK_FLAG_SCHED:
>     * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
>     Should be removed?
I don't think HK_FLAG_SCHED is being used at all. So I believe we should 
remove it to avoid confusion.
>
> 10) HK_TYPE_RCU:
>      * That's point 3) and also some kthreads to affine, which leads us
>       to the following in HK_TYPE_KTHREAD:
>
> 11) HK_FLAG_KTHREAD:
>      * I'm guessing it's fine as long as isolation_full is also an
>        isolated partition. Then unbound kthreads shouldn't run there.

Yes, isolation_full applies only to isolated partitions. It extends the 
amount of CPU isolation by enabling all the other CPU available 
isolation flags.

Cheers,
Longman
Waiman Long Feb. 11, 2024, 1:46 a.m. UTC | #11
On 1/19/24 05:24, Paul E. McKenney wrote:
> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
>> This patch series is based on the RFC patch from Frederic [1]. Instead
>> of offering RCU_NOCB as a separate option, it is now lumped into a
>> root-only cpuset.cpus.isolation_full flag that will enable all the
>> additional CPU isolation capabilities available for isolated partitions
>> if set. RCU_NOCB is just the first one to this party. Additional dynamic
>> CPU isolation capabilities will be added in the future.
>>
>> The first 2 patches are adopted from Federic with minor twists to fix
>> merge conflicts and compilation issue. The rests are for implementing
>> the new cpuset.cpus.isolation_full interface which is essentially a flag
>> to globally enable or disable full CPU isolation on isolated partitions.
>> On read, it also shows the CPU isolation capabilities that are currently
>> enabled. RCU_NOCB requires that the rcu_nocbs option be present in
>> the kernel boot command line. Without that, the rcu_nocb functionality
>> cannot be enabled even if the isolation_full flag is set. So we allow
>> users to check the isolation_full file to verify that if the desired
>> CPU isolation capability is enabled or not.
>>
>> Only sanity checking has been done so far. More testing, especially on
>> the RCU side, will be needed.
> There has been some discussion of simplifying the (de-)offloading code
> to handle only offline CPUs.  Along with some discussion of eliminating
> the (de-)offloading capability altogehter.
>
> We clearly should converge on the capability to be provided before
> exposing this to userspace.  ;-)

Would you mind giving me a pointer to the discussion of simplifying the 
de-offloading code to  handle only offline CPUs?

Thanks,
Longman