[6.1,0/2] io_uring/io-wq: respect cgroup cpusets

Message ID 20240911162316.516725-1-felix.moessbauer@siemens.com (mailing list archive)

Message

Felix Moessbauer Sept. 11, 2024, 4:23 p.m. UTC
Hi,

as discussed in [1], this is a manual backport of the remaining two
patches to let the io worker threads respect the affinities defined by
the cgroup of the process.

In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
("io_uring: One wqe per wq") this is changed to only have a single worker.
As this patch is pretty invasive, Jens and I agreed not to backport it.

Instead we now limit the worker's cpuset to the CPUs that are in the
intersection between what the cgroup allows and what the NUMA node has.
This leaves the question of what to do in case the intersection is empty:
To be backwards compatible, we allow this case, but restrict the cpumask
of the poller to the cpuset defined by the cgroup. We further believe
this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
anyways.

[1] https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk

Best regards,
Felix Moessbauer
Siemens AG

Felix Moessbauer (2):
  io_uring/io-wq: do not allow pinning outside of cpuset
  io_uring/io-wq: inherit cpuset of cgroup in io worker

 io_uring/io-wq.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)
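
The mask selection described in the cover letter can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code from the patches: `cpu_mask_t` and `worker_mask()` are hypothetical names, a 64-bit integer stands in for a kernel cpumask, and the cgroup's cpuset is assumed to be non-empty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, so at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

/*
 * Pick the CPUs a per-node worker may run on.
 *
 * cgroup_allowed: CPUs permitted by the cgroup's cpuset (assumed non-empty).
 * node_cpus:      CPUs that belong to the NUMA node.
 */
cpu_mask_t worker_mask(cpu_mask_t cgroup_allowed, cpu_mask_t node_cpus)
{
    assert(cgroup_allowed != 0);

    cpu_mask_t both = cgroup_allowed & node_cpus;

    /* Empty intersection: give up NUMA locality, keep the cpuset limit. */
    return both ? both : cgroup_allowed;
}
```

With this model, a cgroup limited to CPUs 0-3 and a node owning CPUs 2-5 yields CPUs 2-3, while a cgroup limited to CPUs 0-1 and a node owning CPUs 4-7 falls back to CPUs 0-1.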

Comments

Jens Axboe Sept. 11, 2024, 4:28 p.m. UTC | #1
On 9/11/24 10:23 AM, Felix Moessbauer wrote:
> Hi,
> 
> as discussed in [1], this is a manual backport of the remaining two
> patches to let the io worker threads respect the affinities defined by
> the cgroup of the process.
> 
> In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> ("io_uring: One wqe per wq") this is changed to only have a single worker.
> As this patch is pretty invasive, Jens and I agreed not to backport it.
> 
> Instead we now limit the worker's cpuset to the CPUs that are in the
> intersection between what the cgroup allows and what the NUMA node has.
> This leaves the question of what to do in case the intersection is empty:
> To be backwards compatible, we allow this case, but restrict the cpumask
> of the poller to the cpuset defined by the cgroup. We further believe
> this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
> anyways.
> 
> [1] https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk

The upstream patches are staged for 6.12 and marked for a backport, so
they should go upstream next week. Once they are upstream, I'll make
sure to check in on these on the stable front.
Greg KH Sept. 30, 2024, 7:15 p.m. UTC | #2
On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
> Hi,
> 
> as discussed in [1], this is a manual backport of the remaining two
> patches to let the io worker threads respect the affinities defined by
> the cgroup of the process.
> 
> In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> ("io_uring: One wqe per wq") this is changed to only have a single worker.
> As this patch is pretty invasive, Jens and I agreed not to backport it.
> 
> Instead we now limit the worker's cpuset to the CPUs that are in the
> intersection between what the cgroup allows and what the NUMA node has.
> This leaves the question of what to do in case the intersection is empty:
> To be backwards compatible, we allow this case, but restrict the cpumask
> of the poller to the cpuset defined by the cgroup. We further believe
> this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
> anyways.
> 
> [1] https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk

Why was neither of these actually tagged for inclusion in a stable tree?
Why just 6.1.y?  Please submit them for all relevant kernel versions.

thanks,

greg k-h
Felix Moessbauer Oct. 1, 2024, 7:32 a.m. UTC | #3
On Mon, 2024-09-30 at 21:15 +0200, Greg KH wrote:
> On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
> > Hi,
> > 
> > as discussed in [1], this is a manual backport of the remaining two
> > > patches to let the io worker threads respect the affinities defined
> > by
> > the cgroup of the process.
> > 
> > In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> > ("io_uring: One wqe per wq") this is changed to only have a single
> > worker.
> > > As this patch is pretty invasive, Jens and I agreed not to
> > > backport it.
> > 
> > > Instead we now limit the worker's cpuset to the CPUs that are in the
> > > intersection between what the cgroup allows and what the NUMA node
> > > has.
> > > This leaves the question of what to do in case the intersection is
> > > empty:
> > > To be backwards compatible, we allow this case, but restrict the
> > cpumask
> > of the poller to the cpuset defined by the cgroup. We further
> > believe
> > this is a reasonable decision, as da64d6db3bd3 drops the NUMA
> > awareness
> > anyways.
> > 
> > [1]
> > https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk
> 
> Why was neither of these actually tagged for inclusion in a stable
> tree?

This is a manual backport of these patches for 6.1, as the subsystem
changed significantly between 6.1 and 6.2, making an automated backport
impossible. This has been agreed on with Jens in
https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk/

> Why just 6.1.y?  Please submit them for all relevant kernel versions.

The original patch was tagged stable and got accepted in 6.6, 6.10 and
6.11.

Felix

> 
> thanks,
> 
> greg k-h
Greg KH Oct. 1, 2024, 7:50 a.m. UTC | #4
On Tue, Oct 01, 2024 at 07:32:42AM +0000, MOESSBAUER, Felix wrote:
> On Mon, 2024-09-30 at 21:15 +0200, Greg KH wrote:
> > On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
> > > Hi,
> > > 
> > > as discussed in [1], this is a manual backport of the remaining two
> > > > patches to let the io worker threads respect the affinities defined
> > > by
> > > the cgroup of the process.
> > > 
> > > In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> > > ("io_uring: One wqe per wq") this is changed to only have a single
> > > worker.
> > > > As this patch is pretty invasive, Jens and I agreed not to
> > > > backport it.
> > > 
> > > > Instead we now limit the worker's cpuset to the CPUs that are in the
> > > > intersection between what the cgroup allows and what the NUMA node
> > > > has.
> > > > This leaves the question of what to do in case the intersection is
> > > > empty:
> > > > To be backwards compatible, we allow this case, but restrict the
> > > cpumask
> > > of the poller to the cpuset defined by the cgroup. We further
> > > believe
> > > this is a reasonable decision, as da64d6db3bd3 drops the NUMA
> > > awareness
> > > anyways.
> > > 
> > > [1]
> > > https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk
> > 
> > Why was neither of these actually tagged for inclusion in a stable
> > tree?
> 
> This is a manual backport of these patches for 6.1, as the subsystem
> changed significantly between 6.1 and 6.2, making an automated backport
> impossible. This has been agreed on with Jens in
> https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk/
> 
> > > Why just 6.1.y?  Please submit them for all relevant kernel versions.
> 
> The original patch was tagged stable and got accepted in 6.6, 6.10 and
> 6.11.

No they were not at all.  Please properly tag them in the future as per
the documentation if you wish to have things applied to the stable
trees:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

thanks,

greg k-h
Jens Axboe Oct. 1, 2024, 1:35 p.m. UTC | #5
On 10/1/24 1:50 AM, gregkh@linuxfoundation.org wrote:
> On Tue, Oct 01, 2024 at 07:32:42AM +0000, MOESSBAUER, Felix wrote:
>> On Mon, 2024-09-30 at 21:15 +0200, Greg KH wrote:
>>> On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
>>>> Hi,
>>>>
>>>> as discussed in [1], this is a manual backport of the remaining two
>>>> patches to let the io worker threads respect the affinities defined
>>>> by
>>>> the cgroup of the process.
>>>>
>>>> In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
>>>> ("io_uring: One wqe per wq") this is changed to only have a single
>>>> worker.
>>>> As this patch is pretty invasive, Jens and I agreed not to
>>>> backport it.
>>>>
>>>> Instead we now limit the worker's cpuset to the CPUs that are in the
>>>> intersection between what the cgroup allows and what the NUMA node
>>>> has.
>>>> This leaves the question of what to do in case the intersection is
>>>> empty:
>>>> To be backwards compatible, we allow this case, but restrict the
>>>> cpumask
>>>> of the poller to the cpuset defined by the cgroup. We further
>>>> believe
>>>> this is a reasonable decision, as da64d6db3bd3 drops the NUMA
>>>> awareness
>>>> anyways.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk
>>>
>>> Why was neither of these actually tagged for inclusion in a stable
>>> tree?
>>
>> This is a manual backport of these patches for 6.1, as the subsystem
>> changed significantly between 6.1 and 6.2, making an automated backport
>> impossible. This has been agreed on with Jens in
>> https://lore.kernel.org/lkml/ec01745a-b102-4f6e-abc9-abd636d36319@kernel.dk/
>>
>>> Why just 6.1.y?  Please submit them for all relevant kernel versions.
>>
>> The original patch was tagged stable and got accepted in 6.6, 6.10 and
>> 6.11.
> 
> No they were not at all.  Please properly tag them in the future as per
> the documentation if you wish to have things applied to the stable
> trees:
>     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

That's my bad, missed that one of them did not get marked for stable,
the sqpoll one did.