[v1] fuse: add numa affinity for uring queues

Message ID	20250331205709.1148069-1-joannelkoong@gmail.com (mailing list archive)
State	New
Headers	show Received: from mail-yw1-f182.google.com (mail-yw1-f182.google.com [209.85.128.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 25A30214A7B for <linux-fsdevel@vger.kernel.org>; Mon, 31 Mar 2025 20:57:15 +0000 (UTC) From: Joanne Koong <joannelkoong@gmail.com> To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org Cc: bernd.schubert@fastmail.fm, kernel-team@meta.com Subject: [PATCH v1] fuse: add numa affinity for uring queues Date: Mon, 31 Mar 2025 13:57:09 -0700 Message-ID: <20250331205709.1148069-1-joannelkoong@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[v1] fuse: add numa affinity for uring queues \| expand [v1] fuse: add numa affinity for uring queues

Joanne Koong March 31, 2025, 8:57 p.m. UTC

There is a 1:1 mapping between cpus and queues. Allocate the queue on
the numa node associated with the cpu to help reduce memory access
latencies.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/dev_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Joanne Koong March 31, 2025, 11:42 p.m. UTC | #1

On Mon, Mar 31, 2025 at 1:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There is a 1:1 mapping between cpus and queues. Allocate the queue on
> the numa node associated with the cpu to help reduce memory access
> latencies.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/dev_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index accdce2977c5..0762d6229ac6 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -256,7 +256,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>         struct fuse_ring_queue *queue;
>         struct list_head *pq;
>
> -       queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
> +       queue = kzalloc_node(sizeof(*queue), GFP_KERNEL_ACCOUNT, cpu_to_node(qid));
>         if (!queue)
>                 return NULL;
>         pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);

On the same note I guess we should also allocate pq on the
corresponding numa node too.
> --
> 2.47.1
>

Bernd Schubert April 1, 2025, 8:11 a.m. UTC | #2

On 4/1/25 01:42, Joanne Koong wrote:
> On Mon, Mar 31, 2025 at 1:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> There is a 1:1 mapping between cpus and queues. Allocate the queue on
>> the numa node associated with the cpu to help reduce memory access
>> latencies.
>>
>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>> ---
>>  fs/fuse/dev_uring.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> index accdce2977c5..0762d6229ac6 100644
>> --- a/fs/fuse/dev_uring.c
>> +++ b/fs/fuse/dev_uring.c
>> @@ -256,7 +256,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>>         struct fuse_ring_queue *queue;
>>         struct list_head *pq;
>>
>> -       queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
>> +       queue = kzalloc_node(sizeof(*queue), GFP_KERNEL_ACCOUNT, cpu_to_node(qid));
>>         if (!queue)
>>                 return NULL;
>>         pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
> 
> On the same note I guess we should also allocate pq on the
> corresponding numa node too.

So this is supposed to be called from a thread that already runs on this
numa node and then kmalloc will allocate anyway on the right node,
afaik. Do you have a use case where this is called from another node? If
you do, all allocations in this file should be changed.


Thanks,
Bernd

Joanne Koong April 1, 2025, 5:50 p.m. UTC | #3

On Tue, Apr 1, 2025 at 1:11 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 4/1/25 01:42, Joanne Koong wrote:
> > On Mon, Mar 31, 2025 at 1:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >>
> >> There is a 1:1 mapping between cpus and queues. Allocate the queue on
> >> the numa node associated with the cpu to help reduce memory access
> >> latencies.
> >>
> >> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >> ---
> >>  fs/fuse/dev_uring.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> >> index accdce2977c5..0762d6229ac6 100644
> >> --- a/fs/fuse/dev_uring.c
> >> +++ b/fs/fuse/dev_uring.c
> >> @@ -256,7 +256,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> >>         struct fuse_ring_queue *queue;
> >>         struct list_head *pq;
> >>
> >> -       queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
> >> +       queue = kzalloc_node(sizeof(*queue), GFP_KERNEL_ACCOUNT, cpu_to_node(qid));
> >>         if (!queue)
> >>                 return NULL;
> >>         pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
> >
> > On the same note I guess we should also allocate pq on the
> > corresponding numa node too.
>
> So this is supposed to be called from a thread that already runs on this
> numa node and then kmalloc will allocate anyway on the right node,
> afaik. Do you have a use case where this is called from another node? If
> you do, all allocations in this file should be changed.
>

I don't have a use case I'm using but imo it seems hardier to ensure
this at the kernel level for queue, pq, and ent allocations instead of
assuming userspace will always submit the registration from the thread
on the numa node corresponding to the qid it's registering. I don't
feel strongly about this though so if you think the responsibility
should be left to userspace, then that's fine with me.

Thanks,
Joanne

>
> Thanks,
> Bernd

Joanne Koong April 1, 2025, 5:53 p.m. UTC | #4

On Tue, Apr 1, 2025 at 10:50 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Tue, Apr 1, 2025 at 1:11 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > On 4/1/25 01:42, Joanne Koong wrote:
> > > On Mon, Mar 31, 2025 at 1:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>
> > >> There is a 1:1 mapping between cpus and queues. Allocate the queue on
> > >> the numa node associated with the cpu to help reduce memory access
> > >> latencies.
> > >>
> > >> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > >> ---
> > >>  fs/fuse/dev_uring.c | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> > >> index accdce2977c5..0762d6229ac6 100644
> > >> --- a/fs/fuse/dev_uring.c
> > >> +++ b/fs/fuse/dev_uring.c
> > >> @@ -256,7 +256,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> > >>         struct fuse_ring_queue *queue;
> > >>         struct list_head *pq;
> > >>
> > >> -       queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
> > >> +       queue = kzalloc_node(sizeof(*queue), GFP_KERNEL_ACCOUNT, cpu_to_node(qid));
> > >>         if (!queue)
> > >>                 return NULL;
> > >>         pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
> > >
> > > On the same note I guess we should also allocate pq on the
> > > corresponding numa node too.
> >
> > So this is supposed to be called from a thread that already runs on this
> > numa node and then kmalloc will allocate anyway on the right node,
> > afaik. Do you have a use case where this is called from another node? If
> > you do, all allocations in this file should be changed.
> >
>
> I don't have a use case I'm using but imo it seems hardier to ensure
> this at the kernel level for queue, pq, and ent allocations instead of
> assuming userspace will always submit the registration from the thread
> on the numa node corresponding to the qid it's registering. I don't
> feel strongly about this though so if you think the responsibility
> should be left to userspace, then that's fine with me.

Thinking about this some more, if the responsibility for numa affinity
should be left to userspace, then that should hold true too for
configurable queues, no?

Thanks,
Joanne

>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Bernd

Bernd Schubert April 1, 2025, 5:56 p.m. UTC | #5

On 4/1/25 19:50, Joanne Koong wrote:
> On Tue, Apr 1, 2025 at 1:11 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> On 4/1/25 01:42, Joanne Koong wrote:
>>> On Mon, Mar 31, 2025 at 1:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>>>>
>>>> There is a 1:1 mapping between cpus and queues. Allocate the queue on
>>>> the numa node associated with the cpu to help reduce memory access
>>>> latencies.
>>>>
>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>>> ---
>>>>  fs/fuse/dev_uring.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>> index accdce2977c5..0762d6229ac6 100644
>>>> --- a/fs/fuse/dev_uring.c
>>>> +++ b/fs/fuse/dev_uring.c
>>>> @@ -256,7 +256,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>>>>         struct fuse_ring_queue *queue;
>>>>         struct list_head *pq;
>>>>
>>>> -       queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
>>>> +       queue = kzalloc_node(sizeof(*queue), GFP_KERNEL_ACCOUNT, cpu_to_node(qid));
>>>>         if (!queue)
>>>>                 return NULL;
>>>>         pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
>>>
>>> On the same note I guess we should also allocate pq on the
>>> corresponding numa node too.
>>
>> So this is supposed to be called from a thread that already runs on this
>> numa node and then kmalloc will allocate anyway on the right node,
>> afaik. Do you have a use case where this is called from another node? If
>> you do, all allocations in this file should be changed.
>>
> 
> I don't have a use case I'm using but imo it seems hardier to ensure
> this at the kernel level for queue, pq, and ent allocations instead of
> assuming userspace will always submit the registration from the thread
> on the numa node corresponding to the qid it's registering. I don't
> feel strongly about this though so if you think the responsibility
> should be left to userspace, then that's fine with me.
> 

What I basically mean is that these threads are going to send the
initial SQE, shouldn't it? Sounds a bit weird to me if the setup and
sending the registration SQE are from different threads than the later
handling. 


Thanks,
Bernd

[v1] fuse: add numa affinity for uring queues

Commit Message

Comments

Patch