diff mbox series

fuse: clear FR_PENDING without holding fiq lock for uring requests

Message ID 20250203185040.2365113-1-joannelkoong@gmail.com (mailing list archive)
State New
Headers show
Series fuse: clear FR_PENDING without holding fiq lock for uring requests | expand

Commit Message

Joanne Koong Feb. 3, 2025, 6:50 p.m. UTC
req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
bit is cleared from the request flags when assigning a request to a
uring entry, the fiq->lock does not need to be held.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
---
 fs/fuse/dev_uring.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Bernd Schubert Feb. 4, 2025, 11:03 a.m. UTC | #1
Hi Joanne,

On 2/3/25 19:50, Joanne Koong wrote:
> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
> bit is cleared from the request flags when assigning a request to a
> uring entry, the fiq->lock does not need to be held.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
> ---
>  fs/fuse/dev_uring.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index ab8c26042aa8..42389d3e7235 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
>  			ent->state);
>  	}
>  
> -	spin_lock(&fiq->lock);
>  	clear_bit(FR_PENDING, &req->flags);
> -	spin_unlock(&fiq->lock);
>  	ent->fuse_req = req;
>  	ent->state = FRRS_FUSE_REQ;
>  	list_move(&ent->list, &queue->ent_w_req_queue);

I think that would have an issue in request_wait_answer(). Let's say


task-A, request_wait_answer(),
		spin_lock(&fiq->lock);
		/* Request is not yet in userspace, bail out */
		if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
			list_del(&req->list);  // --> removes from the list

task-B, 
fuse_uring_add_req_to_ring_ent()
	clear_bit(FR_PENDING, &req->flags);
	ent->fuse_req = req;
	ent->state = FRRS_FUSE_REQ;
	list_move_tail(&ent->list, &queue->ent_w_req_queue);
	fuse_uring_add_to_pq(ent, req);  // ==> Add to list



What I mean is, task-A passes the if, but is then slower than task-B. I.e.
task-B runs fuse_uring_add_to_pq() before task-B does the list_del.


Now the ring entry gets handled by fuse-server, comes back to fuse-client
and does not find the request anymore, because both tasks raced.
The entire ring entry will be lost - it not be used anymore for
the live time of the connection.

And the other issue might be total list corruption, because two tasks might
access the list at the same time.


Thanks,
Bernd
Christian Brauner Feb. 4, 2025, 11:12 a.m. UTC | #2
On Mon, 03 Feb 2025 10:50:40 -0800, Joanne Koong wrote:
> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
> bit is cleared from the request flags when assigning a request to a
> uring entry, the fiq->lock does not need to be held.
> 
> 

Applied to the vfs-6.15.fuse branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.fuse branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.fuse

[1/1] fuse: clear FR_PENDING without holding fiq lock for uring requests
      https://git.kernel.org/vfs/vfs/c/c612e9f8d20b
Bernd Schubert Feb. 4, 2025, 11:31 a.m. UTC | #3
Hi Christian,

On 2/4/25 12:12, Christian Brauner wrote:
> On Mon, 03 Feb 2025 10:50:40 -0800, Joanne Koong wrote:
>> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
>> bit is cleared from the request flags when assigning a request to a
>> uring entry, the fiq->lock does not need to be held.
>>
>>
> 
> Applied to the vfs-6.15.fuse branch of the vfs/vfs.git tree.
> Patches in the vfs-6.15.fuse branch should appear in linux-next soon.
> 
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
> 
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
> 
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs-6.15.fuse
> 
> [1/1] fuse: clear FR_PENDING without holding fiq lock for uring requests
>       https://git.kernel.org/vfs/vfs/c/c612e9f8d20b

do you see my reply?

https://lore.kernel.org/all/ff73c955-2267-4c77-8dca-0e4181d8e8b4@fastmail.fm/



Thanks,
Bernd
Christian Brauner Feb. 4, 2025, 12:31 p.m. UTC | #4
On Tue, Feb 04, 2025 at 12:31:21PM +0100, Bernd Schubert wrote:
> Hi Christian,
> 
> On 2/4/25 12:12, Christian Brauner wrote:
> > On Mon, 03 Feb 2025 10:50:40 -0800, Joanne Koong wrote:
> >> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
> >> bit is cleared from the request flags when assigning a request to a
> >> uring entry, the fiq->lock does not need to be held.
> >>
> >>
> > 
> > Applied to the vfs-6.15.fuse branch of the vfs/vfs.git tree.
> > Patches in the vfs-6.15.fuse branch should appear in linux-next soon.
> > 
> > Please report any outstanding bugs that were missed during review in a
> > new review to the original patch series allowing us to drop it.
> > 
> > It's encouraged to provide Acked-bys and Reviewed-bys even though the
> > patch has now been applied. If possible patch trailers will be updated.
> > 
> > Note that commit hashes shown below are subject to change due to rebase,
> > trailer updates or similar. If in doubt, please check the listed branch.
> > 
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> > branch: vfs-6.15.fuse
> > 
> > [1/1] fuse: clear FR_PENDING without holding fiq lock for uring requests
> >       https://git.kernel.org/vfs/vfs/c/c612e9f8d20b
> 
> do you see my reply?
> 
> https://lore.kernel.org/all/ff73c955-2267-4c77-8dca-0e4181d8e8b4@fastmail.fm/

Hm, that's odd. It didn't appear for me... Sorry about that.
Joanne Koong Feb. 4, 2025, 7:26 p.m. UTC | #5
Hi Bernd,

On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Joanne,
>
> On 2/3/25 19:50, Joanne Koong wrote:
> > req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
> > bit is cleared from the request flags when assigning a request to a
> > uring entry, the fiq->lock does not need to be held.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
> > ---
> >  fs/fuse/dev_uring.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> > index ab8c26042aa8..42389d3e7235 100644
> > --- a/fs/fuse/dev_uring.c
> > +++ b/fs/fuse/dev_uring.c
> > @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
> >                       ent->state);
> >       }
> >
> > -     spin_lock(&fiq->lock);
> >       clear_bit(FR_PENDING, &req->flags);
> > -     spin_unlock(&fiq->lock);
> >       ent->fuse_req = req;
> >       ent->state = FRRS_FUSE_REQ;
> >       list_move(&ent->list, &queue->ent_w_req_queue);
>
> I think that would have an issue in request_wait_answer(). Let's say
>
>
> task-A, request_wait_answer(),
>                 spin_lock(&fiq->lock);
>                 /* Request is not yet in userspace, bail out */
>                 if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
>                         list_del(&req->list);  // --> removes from the list
>
> task-B,
> fuse_uring_add_req_to_ring_ent()
>         clear_bit(FR_PENDING, &req->flags);
>         ent->fuse_req = req;
>         ent->state = FRRS_FUSE_REQ;
>         list_move_tail(&ent->list, &queue->ent_w_req_queue);
>         fuse_uring_add_to_pq(ent, req);  // ==> Add to list
>
>
>
> What I mean is, task-A passes the if, but is then slower than task-B. I.e.
> task-B runs fuse_uring_add_to_pq() before task-B does the list_del.
>

Is this race condition possible given that fiq->ops->send_req() is
called (and completed) before request_wait_answer() is called? The
path I see is this:

__fuse_simple_request()
    __fuse_request_send()
        fuse_send_one()
            fiq->ops->send_req()
                  fuse_uring_queue_fuse_req()
                      fuse_uring_add_req_to_ring_ent()
                           clear FR_PENDING bit
                           fuse_uring_add_to_pq()
        request_wait_answer()

It doesn't seem like task A can call request_wait_answer() while task
B is running fuse_uring_queue_fuse_req() on the same request while the
request still has the FR_PENDING bit set.

This case of task A running request_wait_answer() while task B is
executing fuse_uring_add_req_to_ring_ent() can happen through
fuse_uring_commit_fetch() ->  fuse_uring_add_req_to_ring_ent(), but at
that point the FR_PENDING flag will have already been cleared on the
request, so this would bypass the "if (test_bit(FR_PENDING,...))"
check in request_wait_answer().

Is there something I'm missing? I think if this race condition is
possible, then we also have a bigger problem where the request can be
freed out in this request_wait_answer() ->  if (test_bit(FR_PENDING,
&req->flags))...  case while fuse_uring_add_req_to_ring_ent() ->
fuse_uring_add_to_pq() dereferences it still.


Thanks,
Joanne

>
> Now the ring entry gets handled by fuse-server, comes back to fuse-client
> and does not find the request anymore, because both tasks raced.
> The entire ring entry will be lost - it not be used anymore for
> the live time of the connection.
>
> And the other issue might be total list corruption, because two tasks might
> access the list at the same time.
>
>
> Thanks,
> Bernd
>
Joanne Koong Feb. 4, 2025, 7:47 p.m. UTC | #6
On Tue, Feb 4, 2025 at 11:26 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Hi Bernd,
>
> On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > Hi Joanne,
> >
> > On 2/3/25 19:50, Joanne Koong wrote:
> > > req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
> > > bit is cleared from the request flags when assigning a request to a
> > > uring entry, the fiq->lock does not need to be held.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
> > > ---
> > >  fs/fuse/dev_uring.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > >
> > > diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> > > index ab8c26042aa8..42389d3e7235 100644
> > > --- a/fs/fuse/dev_uring.c
> > > +++ b/fs/fuse/dev_uring.c
> > > @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
> > >                       ent->state);
> > >       }
> > >
> > > -     spin_lock(&fiq->lock);
> > >       clear_bit(FR_PENDING, &req->flags);
> > > -     spin_unlock(&fiq->lock);
> > >       ent->fuse_req = req;
> > >       ent->state = FRRS_FUSE_REQ;
> > >       list_move(&ent->list, &queue->ent_w_req_queue);
> >
> > I think that would have an issue in request_wait_answer(). Let's say
> >
> >
> > task-A, request_wait_answer(),
> >                 spin_lock(&fiq->lock);
> >                 /* Request is not yet in userspace, bail out */
> >                 if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
> >                         list_del(&req->list);  // --> removes from the list
> >
> > task-B,
> > fuse_uring_add_req_to_ring_ent()
> >         clear_bit(FR_PENDING, &req->flags);
> >         ent->fuse_req = req;
> >         ent->state = FRRS_FUSE_REQ;
> >         list_move_tail(&ent->list, &queue->ent_w_req_queue);
> >         fuse_uring_add_to_pq(ent, req);  // ==> Add to list
> >
> >
> >
> > What I mean is, task-A passes the if, but is then slower than task-B. I.e.
> > task-B runs fuse_uring_add_to_pq() before task-B does the list_del.
> >
>
> Is this race condition possible given that fiq->ops->send_req() is
> called (and completed) before request_wait_answer() is called? The
> path I see is this:
>
> __fuse_simple_request()
>     __fuse_request_send()
>         fuse_send_one()
>             fiq->ops->send_req()
>                   fuse_uring_queue_fuse_req()
>                       fuse_uring_add_req_to_ring_ent()
>                            clear FR_PENDING bit
>                            fuse_uring_add_to_pq()
>         request_wait_answer()
>
> It doesn't seem like task A can call request_wait_answer() while task
> B is running fuse_uring_queue_fuse_req() on the same request while the
> request still has the FR_PENDING bit set.
>
> This case of task A running request_wait_answer() while task B is
> executing fuse_uring_add_req_to_ring_ent() can happen through
> fuse_uring_commit_fetch() ->  fuse_uring_add_req_to_ring_ent(), but at
> that point the FR_PENDING flag will have already been cleared on the
> request, so this would bypass the "if (test_bit(FR_PENDING,...))"
> check in request_wait_answer().
>
> Is there something I'm missing? I think if this race condition is
> possible, then we also have a bigger problem where the request can be
> freed out in this request_wait_answer() ->  if (test_bit(FR_PENDING,
> &req->flags))...  case while fuse_uring_add_req_to_ring_ent() ->
> fuse_uring_add_to_pq() dereferences it still.

Oh okay so in the case where there is no ring entry available when we
send the request, then the pending bit does not get set on the request
and then a fuse_uring_commit_fetch() ->
fuse_uring_add_req_to_ring_ent() can run in parallel with the
request_wait_answer() ->  if (test_bit(FR_PENDING, &req->flags))
logic.

But it looks like this race condition can happen right now in the
existing code where the last refcount on the request will get dropped
and the request freed while we're dereferencing it in
fuse_uring_add_to_pq(). I'm looking at fuse_uring_ent_assign_req()
right now - maybe we need to grab the fiq lock before we dequeue from
req_queue and then clear FR_PENDING wihile the lock is held? Gonna
think about this some more.

Thanks,
Joanne

>
>
> Thanks,
> Joanne
>
> >
> > Now the ring entry gets handled by fuse-server, comes back to fuse-client
> > and does not find the request anymore, because both tasks raced.
> > The entire ring entry will be lost - it not be used anymore for
> > the live time of the connection.
> >
> > And the other issue might be total list corruption, because two tasks might
> > access the list at the same time.
> >
> >
> > Thanks,
> > Bernd
> >
Bernd Schubert Feb. 4, 2025, 8 p.m. UTC | #7
On 2/4/25 20:26, Joanne Koong wrote:
> Hi Bernd,
> 
> On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> Hi Joanne,
>>
>> On 2/3/25 19:50, Joanne Koong wrote:
>>> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
>>> bit is cleared from the request flags when assigning a request to a
>>> uring entry, the fiq->lock does not need to be held.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
>>> ---
>>>  fs/fuse/dev_uring.c | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>> index ab8c26042aa8..42389d3e7235 100644
>>> --- a/fs/fuse/dev_uring.c
>>> +++ b/fs/fuse/dev_uring.c
>>> @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
>>>                       ent->state);
>>>       }
>>>
>>> -     spin_lock(&fiq->lock);
>>>       clear_bit(FR_PENDING, &req->flags);
>>> -     spin_unlock(&fiq->lock);
>>>       ent->fuse_req = req;
>>>       ent->state = FRRS_FUSE_REQ;
>>>       list_move(&ent->list, &queue->ent_w_req_queue);
>>
>> I think that would have an issue in request_wait_answer(). Let's say
>>
>>
>> task-A, request_wait_answer(),
>>                 spin_lock(&fiq->lock);
>>                 /* Request is not yet in userspace, bail out */
>>                 if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
>>                         list_del(&req->list);  // --> removes from the list
>>
>> task-B,
>> fuse_uring_add_req_to_ring_ent()
>>         clear_bit(FR_PENDING, &req->flags);
>>         ent->fuse_req = req;
>>         ent->state = FRRS_FUSE_REQ;
>>         list_move_tail(&ent->list, &queue->ent_w_req_queue);
>>         fuse_uring_add_to_pq(ent, req);  // ==> Add to list
>>
>>
>>
>> What I mean is, task-A passes the if, but is then slower than task-B. I.e.
>> task-B runs fuse_uring_add_to_pq() before task-B does the list_del.
>>
> 
> Is this race condition possible given that fiq->ops->send_req() is
> called (and completed) before request_wait_answer() is called? The
> path I see is this:
> 
> __fuse_simple_request()
>     __fuse_request_send()
>         fuse_send_one()
>             fiq->ops->send_req()
>                   fuse_uring_queue_fuse_req()
>                       fuse_uring_add_req_to_ring_ent()
>                            clear FR_PENDING bit
>                            fuse_uring_add_to_pq()
>         request_wait_answer()
> 
> It doesn't seem like task A can call request_wait_answer() while task
> B is running fuse_uring_queue_fuse_req() on the same request while the
> request still has the FR_PENDING bit set.
> 
> This case of task A running request_wait_answer() while task B is
> executing fuse_uring_add_req_to_ring_ent() can happen through
> fuse_uring_commit_fetch() ->  fuse_uring_add_req_to_ring_ent(), but at
> that point the FR_PENDING flag will have already been cleared on the
> request, so this would bypass the "if (test_bit(FR_PENDING,...))"
> check in request_wait_answer().

I mean this case. I don't think FR_PENDING is cleared - why should it?
And where? The request is pending state, waiting to get into 'FR_SENT'?

> 
> Is there something I'm missing? I think if this race condition is
> possible, then we also have a bigger problem where the request can be
> freed out in this request_wait_answer() ->  if (test_bit(FR_PENDING,
> &req->flags))...  case while fuse_uring_add_req_to_ring_ent() ->
> fuse_uring_add_to_pq() dereferences it still.

I don't think so, if we take the lock.


Thanks,
Bernd
Joanne Koong Feb. 4, 2025, 8:29 p.m. UTC | #8
On Tue, Feb 4, 2025 at 12:00 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 2/4/25 20:26, Joanne Koong wrote:
> > Hi Bernd,
> >
> > On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> >>
> >> Hi Joanne,
> >>
> >> On 2/3/25 19:50, Joanne Koong wrote:
> >>> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
> >>> bit is cleared from the request flags when assigning a request to a
> >>> uring entry, the fiq->lock does not need to be held.
> >>>
> >>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>> Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
> >>> ---
> >>>  fs/fuse/dev_uring.c | 2 --
> >>>  1 file changed, 2 deletions(-)
> >>>
> >>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> >>> index ab8c26042aa8..42389d3e7235 100644
> >>> --- a/fs/fuse/dev_uring.c
> >>> +++ b/fs/fuse/dev_uring.c
> >>> @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
> >>>                       ent->state);
> >>>       }
> >>>
> >>> -     spin_lock(&fiq->lock);
> >>>       clear_bit(FR_PENDING, &req->flags);
> >>> -     spin_unlock(&fiq->lock);
> >>>       ent->fuse_req = req;
> >>>       ent->state = FRRS_FUSE_REQ;
> >>>       list_move(&ent->list, &queue->ent_w_req_queue);
> >>
> >> I think that would have an issue in request_wait_answer(). Let's say
> >>
> >>
> >> task-A, request_wait_answer(),
> >>                 spin_lock(&fiq->lock);
> >>                 /* Request is not yet in userspace, bail out */
> >>                 if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
> >>                         list_del(&req->list);  // --> removes from the list
> >>
> >> task-B,
> >> fuse_uring_add_req_to_ring_ent()
> >>         clear_bit(FR_PENDING, &req->flags);
> >>         ent->fuse_req = req;
> >>         ent->state = FRRS_FUSE_REQ;
> >>         list_move_tail(&ent->list, &queue->ent_w_req_queue);
> >>         fuse_uring_add_to_pq(ent, req);  // ==> Add to list
> >>
> >>
> >>
> >> What I mean is, task-A passes the if, but is then slower than task-B. I.e.
> >> task-B runs fuse_uring_add_to_pq() before task-B does the list_del.
> >>
> >
> > Is this race condition possible given that fiq->ops->send_req() is
> > called (and completed) before request_wait_answer() is called? The
> > path I see is this:
> >
> > __fuse_simple_request()
> >     __fuse_request_send()
> >         fuse_send_one()
> >             fiq->ops->send_req()
> >                   fuse_uring_queue_fuse_req()
> >                       fuse_uring_add_req_to_ring_ent()
> >                            clear FR_PENDING bit
> >                            fuse_uring_add_to_pq()
> >         request_wait_answer()
> >
> > It doesn't seem like task A can call request_wait_answer() while task
> > B is running fuse_uring_queue_fuse_req() on the same request while the
> > request still has the FR_PENDING bit set.
> >
> > This case of task A running request_wait_answer() while task B is
> > executing fuse_uring_add_req_to_ring_ent() can happen through
> > fuse_uring_commit_fetch() ->  fuse_uring_add_req_to_ring_ent(), but at
> > that point the FR_PENDING flag will have already been cleared on the
> > request, so this would bypass the "if (test_bit(FR_PENDING,...))"
> > check in request_wait_answer().
>
> I mean this case. I don't think FR_PENDING is cleared - why should it?
> And where? The request is pending state, waiting to get into 'FR_SENT'?
>
> >
> > Is there something I'm missing? I think if this race condition is
> > possible, then we also have a bigger problem where the request can be
> > freed out in this request_wait_answer() ->  if (test_bit(FR_PENDING,
> > &req->flags))...  case while fuse_uring_add_req_to_ring_ent() ->
> > fuse_uring_add_to_pq() dereferences it still.
>
> I don't think so, if we take the lock.
>

the path I'm looking at is this:

task A -
__fuse_simple_request()
    fuse_get_req() -> request is allocated (req refcount is 1)
    __fuse_request_send()
        __fuse_get_request() -> req refcount is 2
        fuse_send_one() -> req gets sent to uring
        request_wait_answer()
               ...
               hits the interrupt case, goes into "if
test_bit(FR_PENDING, ...)" case which calls __fuse_put_request(), req
refcount is now 1
    fuse_put_request() -> req refcount is dropped to 0, request is freed

while in task B -
fuse_uring_commit_fetch()
    fuse_uring_next_fuse_req()
        fuse_uring_ent_assign_req()
            gets req off fuse_req_queue
            fuse_uring_add_req_to_ring_ent()
                 clear FR_PENDING
                 fuse_uring_add_to_pq()
                     dereferences req

if task A hits the interrupt case in request_wait_answer() and then
calls fuse_put_request() before task B clears the pending flag (and
after it's gotten the request from the fuse_req_queue in
fuse_uring_ent_assign_req()), then I think we hit this case, no?


Thanks,
Joanne
>
> Thanks,
> Bernd
>
Bernd Schubert Feb. 4, 2025, 8:38 p.m. UTC | #9
On 2/4/25 21:29, Joanne Koong wrote:
> On Tue, Feb 4, 2025 at 12:00 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 2/4/25 20:26, Joanne Koong wrote:
>>> Hi Bernd,
>>>
>>> On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert
>>> <bernd.schubert@fastmail.fm> wrote:
>>>>
>>>> Hi Joanne,
>>>>
>>>> On 2/3/25 19:50, Joanne Koong wrote:
>>>>> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
>>>>> bit is cleared from the request flags when assigning a request to a
>>>>> uring entry, the fiq->lock does not need to be held.
>>>>>
>>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>>>> Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
>>>>> ---
>>>>>  fs/fuse/dev_uring.c | 2 --
>>>>>  1 file changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>>> index ab8c26042aa8..42389d3e7235 100644
>>>>> --- a/fs/fuse/dev_uring.c
>>>>> +++ b/fs/fuse/dev_uring.c
>>>>> @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
>>>>>                       ent->state);
>>>>>       }
>>>>>
>>>>> -     spin_lock(&fiq->lock);
>>>>>       clear_bit(FR_PENDING, &req->flags);
>>>>> -     spin_unlock(&fiq->lock);
>>>>>       ent->fuse_req = req;
>>>>>       ent->state = FRRS_FUSE_REQ;
>>>>>       list_move(&ent->list, &queue->ent_w_req_queue);
>>>>
>>>> I think that would have an issue in request_wait_answer(). Let's say
>>>>
>>>>
>>>> task-A, request_wait_answer(),
>>>>                 spin_lock(&fiq->lock);
>>>>                 /* Request is not yet in userspace, bail out */
>>>>                 if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
>>>>                         list_del(&req->list);  // --> removes from the list
>>>>
>>>> task-B,
>>>> fuse_uring_add_req_to_ring_ent()
>>>>         clear_bit(FR_PENDING, &req->flags);
>>>>         ent->fuse_req = req;
>>>>         ent->state = FRRS_FUSE_REQ;
>>>>         list_move_tail(&ent->list, &queue->ent_w_req_queue);
>>>>         fuse_uring_add_to_pq(ent, req);  // ==> Add to list
>>>>
>>>>
>>>>
>>>> What I mean is, task-A passes the if, but is then slower than task-B. I.e.
>>>> task-B runs fuse_uring_add_to_pq() before task-B does the list_del.
>>>>
>>>
>>> Is this race condition possible given that fiq->ops->send_req() is
>>> called (and completed) before request_wait_answer() is called? The
>>> path I see is this:
>>>
>>> __fuse_simple_request()
>>>     __fuse_request_send()
>>>         fuse_send_one()
>>>             fiq->ops->send_req()
>>>                   fuse_uring_queue_fuse_req()
>>>                       fuse_uring_add_req_to_ring_ent()
>>>                            clear FR_PENDING bit
>>>                            fuse_uring_add_to_pq()
>>>         request_wait_answer()
>>>
>>> It doesn't seem like task A can call request_wait_answer() while task
>>> B is running fuse_uring_queue_fuse_req() on the same request while the
>>> request still has the FR_PENDING bit set.
>>>
>>> This case of task A running request_wait_answer() while task B is
>>> executing fuse_uring_add_req_to_ring_ent() can happen through
>>> fuse_uring_commit_fetch() ->  fuse_uring_add_req_to_ring_ent(), but at
>>> that point the FR_PENDING flag will have already been cleared on the
>>> request, so this would bypass the "if (test_bit(FR_PENDING,...))"
>>> check in request_wait_answer().
>>
>> I mean this case. I don't think FR_PENDING is cleared - why should it?
>> And where? The request is pending state, waiting to get into 'FR_SENT'?
>>
>>>
>>> Is there something I'm missing? I think if this race condition is
>>> possible, then we also have a bigger problem where the request can be
>>> freed out in this request_wait_answer() ->  if (test_bit(FR_PENDING,
>>> &req->flags))...  case while fuse_uring_add_req_to_ring_ent() ->
>>> fuse_uring_add_to_pq() dereferences it still.
>>
>> I don't think so, if we take the lock.
>>
> 
> the path I'm looking at is this:
> 
> task A -
> __fuse_simple_request()
>     fuse_get_req() -> request is allocated (req refcount is 1)
>     __fuse_request_send()
>         __fuse_get_request() -> req refcount is 2
>         fuse_send_one() -> req gets sent to uring
>         request_wait_answer()
>                ...
>                hits the interrupt case, goes into "if
> test_bit(FR_PENDING, ...)" case which calls __fuse_put_request(), req
> refcount is now 1
>     fuse_put_request() -> req refcount is dropped to 0, request is freed
> 
> while in task B -
> fuse_uring_commit_fetch()
>     fuse_uring_next_fuse_req()
>         fuse_uring_ent_assign_req()
>             gets req off fuse_req_queue
>             fuse_uring_add_req_to_ring_ent()
>                  clear FR_PENDING
>                  fuse_uring_add_to_pq()
>                      dereferences req
> 
> if task A hits the interrupt case in request_wait_answer() and then
> calls fuse_put_request() before task B clears the pending flag (and
> after it's gotten the request from the fuse_req_queue in
> fuse_uring_ent_assign_req()), then I think we hit this case, no?
> 

Oh no, yes, you are right. It is a bit ugly to use fiq lock for list
handling. I think I'm going to add uring handler for that to
request_wait_answer. In general, basically request_wait_answer
is currently operating on the wrong list - it assumes fiq, but that
is not where the request it waiting on.


Thanks for pointing this out!
Bernd Schubert Feb. 4, 2025, 9:31 p.m. UTC | #10
On 2/4/25 21:38, Bernd Schubert wrote:
> 
> 
> On 2/4/25 21:29, Joanne Koong wrote:
>> On Tue, Feb 4, 2025 at 12:00 PM Bernd Schubert
>> <bernd.schubert@fastmail.fm> wrote:
>>>
>>>
>>>
>>> On 2/4/25 20:26, Joanne Koong wrote:
>>>> Hi Bernd,
>>>>
>>>> On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert
>>>> <bernd.schubert@fastmail.fm> wrote:
>>>>>
>>>>> Hi Joanne,
>>>>>
>>>>> On 2/3/25 19:50, Joanne Koong wrote:
>>>>>> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
>>>>>> bit is cleared from the request flags when assigning a request to a
>>>>>> uring entry, the fiq->lock does not need to be held.
>>>>>>
>>>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>>>>> Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
>>>>>> ---
>>>>>>  fs/fuse/dev_uring.c | 2 --
>>>>>>  1 file changed, 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>>>> index ab8c26042aa8..42389d3e7235 100644
>>>>>> --- a/fs/fuse/dev_uring.c
>>>>>> +++ b/fs/fuse/dev_uring.c
>>>>>> @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
>>>>>>                       ent->state);
>>>>>>       }
>>>>>>
>>>>>> -     spin_lock(&fiq->lock);
>>>>>>       clear_bit(FR_PENDING, &req->flags);
>>>>>> -     spin_unlock(&fiq->lock);
>>>>>>       ent->fuse_req = req;
>>>>>>       ent->state = FRRS_FUSE_REQ;
>>>>>>       list_move(&ent->list, &queue->ent_w_req_queue);
>>>>>
>>>>> I think that would have an issue in request_wait_answer(). Let's say
>>>>>
>>>>>
>>>>> task-A, request_wait_answer(),
>>>>>                 spin_lock(&fiq->lock);
>>>>>                 /* Request is not yet in userspace, bail out */
>>>>>                 if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
>>>>>                         list_del(&req->list);  // --> removes from the list
>>>>>
>>>>> task-B,
>>>>> fuse_uring_add_req_to_ring_ent()
>>>>>         clear_bit(FR_PENDING, &req->flags);
>>>>>         ent->fuse_req = req;
>>>>>         ent->state = FRRS_FUSE_REQ;
>>>>>         list_move_tail(&ent->list, &queue->ent_w_req_queue);
>>>>>         fuse_uring_add_to_pq(ent, req);  // ==> Add to list
>>>>>
>>>>>
>>>>>
>>>>> What I mean is, task-A passes the if, but is then slower than task-B. I.e.
>>>>> task-B runs fuse_uring_add_to_pq() before task-B does the list_del.
>>>>>
>>>>
>>>> Is this race condition possible given that fiq->ops->send_req() is
>>>> called (and completed) before request_wait_answer() is called? The
>>>> path I see is this:
>>>>
>>>> __fuse_simple_request()
>>>>     __fuse_request_send()
>>>>         fuse_send_one()
>>>>             fiq->ops->send_req()
>>>>                   fuse_uring_queue_fuse_req()
>>>>                       fuse_uring_add_req_to_ring_ent()
>>>>                            clear FR_PENDING bit
>>>>                            fuse_uring_add_to_pq()
>>>>         request_wait_answer()
>>>>
>>>> It doesn't seem like task A can call request_wait_answer() while task
>>>> B is running fuse_uring_queue_fuse_req() on the same request while the
>>>> request still has the FR_PENDING bit set.
>>>>
>>>> This case of task A running request_wait_answer() while task B is
>>>> executing fuse_uring_add_req_to_ring_ent() can happen through
>>>> fuse_uring_commit_fetch() ->  fuse_uring_add_req_to_ring_ent(), but at
>>>> that point the FR_PENDING flag will have already been cleared on the
>>>> request, so this would bypass the "if (test_bit(FR_PENDING,...))"
>>>> check in request_wait_answer().
>>>
>>> I mean this case. I don't think FR_PENDING is cleared - why should it?
>>> And where? The request is pending state, waiting to get into 'FR_SENT'?
>>>
>>>>
>>>> Is there something I'm missing? I think if this race condition is
>>>> possible, then we also have a bigger problem where the request can be
>>>> freed out in this request_wait_answer() ->  if (test_bit(FR_PENDING,
>>>> &req->flags))...  case while fuse_uring_add_req_to_ring_ent() ->
>>>> fuse_uring_add_to_pq() dereferences it still.
>>>
>>> I don't think so, if we take the lock.
>>>
>>
>> the path I'm looking at is this:
>>
>> task A -
>> __fuse_simple_request()
>>     fuse_get_req() -> request is allocated (req refcount is 1)
>>     __fuse_request_send()
>>         __fuse_get_request() -> req refcount is 2
>>         fuse_send_one() -> req gets sent to uring
>>         request_wait_answer()
>>                ...
>>                hits the interrupt case, goes into "if
>> test_bit(FR_PENDING, ...)" case which calls __fuse_put_request(), req
>> refcount is now 1
>>     fuse_put_request() -> req refcount is dropped to 0, request is freed
>>
>> while in task B -
>> fuse_uring_commit_fetch()
>>     fuse_uring_next_fuse_req()
>>         fuse_uring_ent_assign_req()
>>             gets req off fuse_req_queue
>>             fuse_uring_add_req_to_ring_ent()
>>                  clear FR_PENDING
>>                  fuse_uring_add_to_pq()
>>                      dereferences req
>>
>> if task A hits the interrupt case in request_wait_answer() and then
>> calls fuse_put_request() before task B clears the pending flag (and
>> after it's gotten the request from the fuse_req_queue in
>> fuse_uring_ent_assign_req()), then I think we hit this case, no?
>>
> 
> Oh no, yes, you are right. It is a bit ugly to use fiq lock for list
> handling. I think I'm going to add uring handler for that to
> request_wait_answer. In general, basically request_wait_answer
> is currently operating on the wrong list - it assumes fiq, but that
> is not where the request it waiting on.

Please see the attached patch, I need to think about a way to test this
and will send out properly tomorrow. So far it is only basically
compilation tested.


Thanks,
Bernd
fuse: {io-uring} Fix a possible req cancellation race

From: Bernd Schubert <bschubert@ddn.com>

task-A (application) might be in request_wait_answer and
try to remove the request when it has FR_PENDING set.

task-B (a fuse-server io-uring task) might handle this
request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH, when
fetching the next request and accessed the req from
the pending list in fuse_uring_ent_assign_req().
That code path was not protected by fiq->lock and so
might race with task-A.

For scaling reasons we better don't use fiq->lock, but
add a handler to remove canceled requests from the queue.

Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support")
Reported-by: Joanne Koong <joannelkoong@gmail.com>
Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@mail.gmail.com/
Signed-off-by: Bernd Schubert <bschubert@ddn.com>

--
Compilation tested only
---
 fs/fuse/dev.c         |   25 ++++++++++++++++---------
 fs/fuse/dev_uring.c   |   25 +++++++++++++++++++++----
 fs/fuse/dev_uring_i.h |    6 ++++++
 fs/fuse/fuse_dev_i.h  |    2 ++
 fs/fuse/fuse_i.h      |    2 ++
 5 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 80a11ef4b69a..0494ea47893a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -157,7 +157,7 @@ static void __fuse_get_request(struct fuse_req *req)
 }
 
 /* Must be called with > 1 refcount */
-static void __fuse_put_request(struct fuse_req *req)
+void __fuse_put_request(struct fuse_req *req)
 {
 	refcount_dec(&req->count);
 }
@@ -529,16 +529,23 @@ static void request_wait_answer(struct fuse_req *req)
 		if (!err)
 			return;
 
-		spin_lock(&fiq->lock);
-		/* Request is not yet in userspace, bail out */
-		if (test_bit(FR_PENDING, &req->flags)) {
-			list_del(&req->list);
+		if (test_bit(FR_URING, &req->flags)) {
+			bool removed = fuse_uring_remove_pending_req(req);
+
+			if (removed)
+				return;
+		} else {
+			spin_lock(&fiq->lock);
+			/* Request is not yet in userspace, bail out */
+			if (test_bit(FR_PENDING, &req->flags)) {
+				list_del(&req->list);
+				spin_unlock(&fiq->lock);
+				__fuse_put_request(req);
+				req->out.h.error = -EINTR;
+				return;
+			}
 			spin_unlock(&fiq->lock);
-			__fuse_put_request(req);
-			req->out.h.error = -EINTR;
-			return;
 		}
-		spin_unlock(&fiq->lock);
 	}
 
 	/*
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 1e2bceb4ff1e..f9abdcf5f7e6 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -771,8 +771,6 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
 					   struct fuse_req *req)
 {
 	struct fuse_ring_queue *queue = ent->queue;
-	struct fuse_conn *fc = req->fm->fc;
-	struct fuse_iqueue *fiq = &fc->iq;
 
 	lockdep_assert_held(&queue->lock);
 
@@ -782,9 +780,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
 			ent->state);
 	}
 
-	spin_lock(&fiq->lock);
 	clear_bit(FR_PENDING, &req->flags);
-	spin_unlock(&fiq->lock);
 	ent->fuse_req = req;
 	ent->state = FRRS_FUSE_REQ;
 	list_move_tail(&ent->list, &queue->ent_w_req_queue);
@@ -1285,6 +1281,8 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
 	if (unlikely(queue->stopped))
 		goto err_unlock;
 
+	set_bit(FR_URING, &req->flags);
+	req->ring_queue = queue;
 	ent = list_first_entry_or_null(&queue->ent_avail_queue,
 				       struct fuse_ring_ent, list);
 	if (ent)
@@ -1323,6 +1321,8 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
 		return false;
 	}
 
+	set_bit(FR_URING, &req->flags);
+	req->ring_queue = queue;
 	list_add_tail(&req->list, &queue->fuse_req_bg_queue);
 
 	ent = list_first_entry_or_null(&queue->ent_avail_queue,
@@ -1353,6 +1353,23 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
 	return true;
 }
 
+bool fuse_uring_remove_pending_req(struct fuse_req *req)
+{
+	struct fuse_ring_queue *queue = req->ring_queue;
+
+	spin_lock(&queue->lock);
+	if (test_bit(FR_PENDING, &req->flags)) {
+		list_del(&req->list);
+		spin_unlock(&queue->lock);
+		__fuse_put_request(req);
+		req->out.h.error = -EINTR;
+		return true;
+	}
+	spin_unlock(&queue->lock);
+
+	return false;
+}
+
 static const struct fuse_iqueue_ops fuse_io_uring_ops = {
 	/* should be send over io-uring as enhancement */
 	.send_forget = fuse_dev_queue_forget,
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index a37991d17d34..86071758628f 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -143,6 +143,7 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
 bool fuse_uring_queue_bq_req(struct fuse_req *req);
 bool fuse_uring_request_expired(struct fuse_conn *fc);
+bool fuse_uring_remove_pending_req(struct fuse_req *req);
 
 static inline void fuse_uring_abort(struct fuse_conn *fc)
 {
@@ -206,6 +207,11 @@ static inline bool fuse_uring_request_expired(struct fuse_conn *fc)
 	return false;
 }
 
+static inline bool fuse_uring_remove_pending_req(struct fuse_req *req)
+{
+	return false;
+}
+
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 19c29c6000a7..36b9092061ea 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -49,6 +49,8 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 unsigned int fuse_req_hash(u64 unique);
 struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
 
+void __fuse_put_request(struct fuse_req *req);
+
 void fuse_dev_end_requests(struct list_head *head);
 
 void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index dcc1c327a057..29a7a6e57577 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -408,6 +408,7 @@ enum fuse_req_flag {
 	FR_FINISHED,
 	FR_PRIVATE,
 	FR_ASYNC,
+	FR_URING,
 };
 
 /**
@@ -457,6 +458,7 @@ struct fuse_req {
 
 #ifdef CONFIG_FUSE_IO_URING
 	void *ring_entry;
+	void *ring_queue;
 #endif
 	/** When (in jiffies) the request was created */
 	unsigned long create_time;
Jeff Layton Feb. 5, 2025, 3:53 p.m. UTC | #11
On Tue, 2025-02-04 at 22:31 +0100, Bernd Schubert wrote:
> fuse: {io-uring} Fix a possible req cancellation race
> 
> From: Bernd Schubert <bschubert@ddn.com>
> 
> task-A (application) might be in request_wait_answer and
> try to remove the request when it has FR_PENDING set.
> 
> task-B (a fuse-server io-uring task) might handle this
> request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH, when
> fetching the next request and accessed the req from
> the pending list in fuse_uring_ent_assign_req().
> That code path was not protected by fiq->lock and so
> might race with task-A.
> 
> For scaling reasons we better don't use fiq->lock, but
> add a handler to remove canceled requests from the queue.
> 
> Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support")
> Reported-by: Joanne Koong <joannelkoong@gmail.com>
> Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@mail.gmail.com/
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> 
> --
> Compilation tested only
> ---
>  fs/fuse/dev.c         |   25 ++++++++++++++++---------
>  fs/fuse/dev_uring.c   |   25 +++++++++++++++++++++----
>  fs/fuse/dev_uring_i.h |    6 ++++++
>  fs/fuse/fuse_dev_i.h  |    2 ++
>  fs/fuse/fuse_i.h      |    2 ++
>  5 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 80a11ef4b69a..0494ea47893a 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -157,7 +157,7 @@ static void __fuse_get_request(struct fuse_req *req)
>  }
>  
>  /* Must be called with > 1 refcount */
> -static void __fuse_put_request(struct fuse_req *req)
> +void __fuse_put_request(struct fuse_req *req)
>  {
>  	refcount_dec(&req->count);
>  }
> @@ -529,16 +529,23 @@ static void request_wait_answer(struct fuse_req *req)
>  		if (!err)
>  			return;
>  
> -		spin_lock(&fiq->lock);
> -		/* Request is not yet in userspace, bail out */
> -		if (test_bit(FR_PENDING, &req->flags)) {
> -			list_del(&req->list);
> +		if (test_bit(FR_URING, &req->flags)) {
> +			bool removed = fuse_uring_remove_pending_req(req);
> +
> +			if (removed)
> +				return;
> +		} else {
> +			spin_lock(&fiq->lock);
> +			/* Request is not yet in userspace, bail out */
> +			if (test_bit(FR_PENDING, &req->flags)) {
> +				list_del(&req->list);
> +				spin_unlock(&fiq->lock);
> +				__fuse_put_request(req);
> +				req->out.h.error = -EINTR;
> +				return;
> +			}

One thing that bothers me with the existing code and this patch is that
the semantics around FR_PENDING are unclear.

I know it's supposed to mean that the req is waiting for userland to
read it, but in the above case for instance, we're removing it from the
list and dropping its refcount while leaving the bit set. Shouldn't we
clear it there and in fuse_uring_remove_pending_req()?

 
>  			spin_unlock(&fiq->lock);
> -			__fuse_put_request(req);
> -			req->out.h.error = -EINTR;
> -			return;
>  		}
> -		spin_unlock(&fiq->lock);
>  	}
>  
>  	/*
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 1e2bceb4ff1e..f9abdcf5f7e6 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -771,8 +771,6 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
>  					   struct fuse_req *req)
>  {
>  	struct fuse_ring_queue *queue = ent->queue;
> -	struct fuse_conn *fc = req->fm->fc;
> -	struct fuse_iqueue *fiq = &fc->iq;
>  
>  	lockdep_assert_held(&queue->lock);
>  
> @@ -782,9 +780,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
>  			ent->state);
>  	}
>  
> -	spin_lock(&fiq->lock);
>  	clear_bit(FR_PENDING, &req->flags);
> -	spin_unlock(&fiq->lock);
>  	ent->fuse_req = req;
>  	ent->state = FRRS_FUSE_REQ;
>  	list_move_tail(&ent->list, &queue->ent_w_req_queue);
> @@ -1285,6 +1281,8 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
>  	if (unlikely(queue->stopped))
>  		goto err_unlock;
>  
> +	set_bit(FR_URING, &req->flags);
> +	req->ring_queue = queue;
>  	ent = list_first_entry_or_null(&queue->ent_avail_queue,
>  				       struct fuse_ring_ent, list);
>  	if (ent)
> @@ -1323,6 +1321,8 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
>  		return false;
>  	}
>  
> +	set_bit(FR_URING, &req->flags);
> +	req->ring_queue = queue;
>  	list_add_tail(&req->list, &queue->fuse_req_bg_queue);
>  
>  	ent = list_first_entry_or_null(&queue->ent_avail_queue,
> @@ -1353,6 +1353,23 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
>  	return true;
>  }
>  
> +bool fuse_uring_remove_pending_req(struct fuse_req *req)
> +{
> +	struct fuse_ring_queue *queue = req->ring_queue;
> +
> +	spin_lock(&queue->lock);
> +	if (test_bit(FR_PENDING, &req->flags)) {
> +		list_del(&req->list);
> +		spin_unlock(&queue->lock);
> +		__fuse_put_request(req);
> +		req->out.h.error = -EINTR;
> +		return true;
> +	}
> +	spin_unlock(&queue->lock);
> +
> +	return false;
> +}
> +
>  static const struct fuse_iqueue_ops fuse_io_uring_ops = {
>  	/* should be send over io-uring as enhancement */
>  	.send_forget = fuse_dev_queue_forget,
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index a37991d17d34..86071758628f 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -143,6 +143,7 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
>  void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
>  bool fuse_uring_queue_bq_req(struct fuse_req *req);
>  bool fuse_uring_request_expired(struct fuse_conn *fc);
> +bool fuse_uring_remove_pending_req(struct fuse_req *req);
>  
>  static inline void fuse_uring_abort(struct fuse_conn *fc)
>  {
> @@ -206,6 +207,11 @@ static inline bool fuse_uring_request_expired(struct fuse_conn *fc)
>  	return false;
>  }
>  
> +static inline bool fuse_uring_remove_pending_req(struct fuse_req *req)
> +{
> +	return false;
> +}
> +
>  #endif /* CONFIG_FUSE_IO_URING */
>  
>  #endif /* _FS_FUSE_DEV_URING_I_H */
> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
> index 19c29c6000a7..36b9092061ea 100644
> --- a/fs/fuse/fuse_dev_i.h
> +++ b/fs/fuse/fuse_dev_i.h
> @@ -49,6 +49,8 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
>  unsigned int fuse_req_hash(u64 unique);
>  struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
>  
> +void __fuse_put_request(struct fuse_req *req);
> +
>  void fuse_dev_end_requests(struct list_head *head);
>  
>  void fuse_copy_init(struct fuse_copy_state *cs, int write,
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index dcc1c327a057..29a7a6e57577 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -408,6 +408,7 @@ enum fuse_req_flag {
>  	FR_FINISHED,
>  	FR_PRIVATE,
>  	FR_ASYNC,
> +	FR_URING,
>  };
>  
>  /**
> @@ -457,6 +458,7 @@ struct fuse_req {
>  
>  #ifdef CONFIG_FUSE_IO_URING
>  	void *ring_entry;
> +	void *ring_queue;
>  #endif
>  	/** When (in jiffies) the request was created */
>  	unsigned long create_time;
Bernd Schubert Feb. 5, 2025, 3:57 p.m. UTC | #12
On 2/5/25 16:53, Jeff Layton wrote:
> On Tue, 2025-02-04 at 22:31 +0100, Bernd Schubert wrote:
>> fuse: {io-uring} Fix a possible req cancellation race
>>
>> From: Bernd Schubert <bschubert@ddn.com>
>>
>> task-A (application) might be in request_wait_answer and
>> try to remove the request when it has FR_PENDING set.
>>
>> task-B (a fuse-server io-uring task) might handle this
>> request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH, when
>> fetching the next request and accessed the req from
>> the pending list in fuse_uring_ent_assign_req().
>> That code path was not protected by fiq->lock and so
>> might race with task-A.
>>
>> For scaling reasons we better don't use fiq->lock, but
>> add a handler to remove canceled requests from the queue.
>>
>> Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support")
>> Reported-by: Joanne Koong <joannelkoong@gmail.com>
>> Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@mail.gmail.com/
>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>>
>> --
>> Compilation tested only
>> ---
>>  fs/fuse/dev.c         |   25 ++++++++++++++++---------
>>  fs/fuse/dev_uring.c   |   25 +++++++++++++++++++++----
>>  fs/fuse/dev_uring_i.h |    6 ++++++
>>  fs/fuse/fuse_dev_i.h  |    2 ++
>>  fs/fuse/fuse_i.h      |    2 ++
>>  5 files changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 80a11ef4b69a..0494ea47893a 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -157,7 +157,7 @@ static void __fuse_get_request(struct fuse_req *req)
>>  }
>>  
>>  /* Must be called with > 1 refcount */
>> -static void __fuse_put_request(struct fuse_req *req)
>> +void __fuse_put_request(struct fuse_req *req)
>>  {
>>  	refcount_dec(&req->count);
>>  }
>> @@ -529,16 +529,23 @@ static void request_wait_answer(struct fuse_req *req)
>>  		if (!err)
>>  			return;
>>  
>> -		spin_lock(&fiq->lock);
>> -		/* Request is not yet in userspace, bail out */
>> -		if (test_bit(FR_PENDING, &req->flags)) {
>> -			list_del(&req->list);
>> +		if (test_bit(FR_URING, &req->flags)) {
>> +			bool removed = fuse_uring_remove_pending_req(req);
>> +
>> +			if (removed)
>> +				return;
>> +		} else {
>> +			spin_lock(&fiq->lock);
>> +			/* Request is not yet in userspace, bail out */
>> +			if (test_bit(FR_PENDING, &req->flags)) {
>> +				list_del(&req->list);
>> +				spin_unlock(&fiq->lock);
>> +				__fuse_put_request(req);
>> +				req->out.h.error = -EINTR;
>> +				return;
>> +			}
> 
> One thing that bothers me with the existing code and this patch is that
> the semantics around FR_PENDING are unclear.
> 
> I know it's supposed to mean that the req is waiting for userland to
> read it, but in the above case for instance, we're removing it from the
> list and dropping its refcount while leaving the bit set. Shouldn't we
> clear it there and in fuse_uring_remove_pending_req()?
> 

Yeah, I think the assumption from the code is that the request is now
getting destructed, I will add in to clear FR_PENDING. 


Thanks,
Bernd
diff mbox series

Patch

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index ab8c26042aa8..42389d3e7235 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -764,9 +764,7 @@  static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
 			ent->state);
 	}
 
-	spin_lock(&fiq->lock);
 	clear_bit(FR_PENDING, &req->flags);
-	spin_unlock(&fiq->lock);
 	ent->fuse_req = req;
 	ent->state = FRRS_FUSE_REQ;
 	list_move(&ent->list, &queue->ent_w_req_queue);