
[PATCHv2] thunderbolt: do not double dequeue a request

Message ID 20250327114222.100293-1-senozhatsky@chromium.org (mailing list archive)
State Superseded
Series [PATCHv2] thunderbolt: do not double dequeue a request

Commit Message

Sergey Senozhatsky March 27, 2025, 11:41 a.m. UTC
Some of our devices crash in tb_cfg_request_dequeue():

 general protection fault, probably for non-canonical address 0xdead000000000122

 CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65
 RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0
 Call Trace:
 <TASK>
 ? tb_cfg_request_dequeue+0x2d/0xa0
 tb_cfg_request_work+0x33/0x80
 worker_thread+0x386/0x8f0
 kthread+0xed/0x110
 ret_from_fork+0x38/0x50
 ret_from_fork_asm+0x1b/0x30

The circumstances are unclear; however, the theory is that
tb_cfg_request_work() can be scheduled twice for a request:
the first time via frame.callback from ring_work() and the
second time from tb_cfg_request().  Both times kworkers will
execute tb_cfg_request_dequeue(), which results in a double
list_del() from the ctl->request_queue (the list poison
dereference hints at it: 0xdead000000000122).
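
(For reference, a simplified sketch of the list helpers involved,
paraphrased from include/linux/list.h and include/linux/poison.h for
x86_64; an illustration, not the verbatim kernel source:)

	/* x86_64 sets CONFIG_ILLEGAL_POINTER_VALUE to 0xdead000000000000 */
	#define LIST_POISON1  ((void *) 0xdead000000000100)
	#define LIST_POISON2  ((void *) 0xdead000000000122)

	static inline void __list_del(struct list_head *prev,
				      struct list_head *next)
	{
		next->prev = prev;
		WRITE_ONCE(prev->next, next);
	}

	static inline void list_del(struct list_head *entry)
	{
		__list_del(entry->prev, entry->next);
		entry->next = LIST_POISON1;	/* poison the unlinked entry */
		entry->prev = LIST_POISON2;
	}

	/*
	 * A second list_del() on the same entry runs
	 * __list_del(LIST_POISON2, LIST_POISON1) and dereferences the
	 * poison pointers; LIST_POISON2 is exactly the non-canonical
	 * 0xdead000000000122 address reported in the oops above.
	 */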

Another possibility is tb_cfg_request_sync():

tb_cfg_request_sync()
 tb_cfg_request()
  schedule_work(&req->work) -> tb_cfg_request_dequeue()
 tb_cfg_request_cancel()
  schedule_work(&req->work) -> tb_cfg_request_dequeue()

To address the issue, do not dequeue requests that don't
have the TB_CFG_REQUEST_ACTIVE bit set.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: stable@vger.kernel.org
---

v2: updated commit message, kept list_del()

 drivers/thunderbolt/ctl.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Mika Westerberg March 27, 2025, 1:37 p.m. UTC | #1
Hi,

On Thu, Mar 27, 2025 at 08:41:21PM +0900, Sergey Senozhatsky wrote:
> Some of our devices crash in tb_cfg_request_dequeue():
> 
>  general protection fault, probably for non-canonical address 0xdead000000000122
> 
>  CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65
>  RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0
>  Call Trace:
>  <TASK>
>  ? tb_cfg_request_dequeue+0x2d/0xa0
>  tb_cfg_request_work+0x33/0x80
>  worker_thread+0x386/0x8f0
>  kthread+0xed/0x110
>  ret_from_fork+0x38/0x50
>  ret_from_fork_asm+0x1b/0x30
> 
> The circumstances are unclear; however, the theory is that
> tb_cfg_request_work() can be scheduled twice for a request:
> the first time via frame.callback from ring_work() and the
> second time from tb_cfg_request().  Both times kworkers will
> execute tb_cfg_request_dequeue(), which results in a double
> list_del() from the ctl->request_queue (the list poison
> dereference hints at it: 0xdead000000000122).
> 
> Another possibility is tb_cfg_request_sync():
> 
> tb_cfg_request_sync()
>  tb_cfg_request()
>   schedule_work(&req->work) -> tb_cfg_request_dequeue()
>  tb_cfg_request_cancel()
>   schedule_work(&req->work) -> tb_cfg_request_dequeue()

Not sure about this one, because &req->work will only be scheduled once;
the second schedule_work() should not queue it again (as far as I can tell).

> To address the issue, do not dequeue requests that don't
> have the TB_CFG_REQUEST_ACTIVE bit set.

Just to be sure. After this change you have not seen the issue anymore
with your testing?

> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Cc: stable@vger.kernel.org
> ---
> 
> v2: updated commit message, kept list_del()
> 
>  drivers/thunderbolt/ctl.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
> index cd15e84c47f4..1db2e951b53f 100644
> --- a/drivers/thunderbolt/ctl.c
> +++ b/drivers/thunderbolt/ctl.c
> @@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req)
>  	struct tb_ctl *ctl = req->ctl;
>  
>  	mutex_lock(&ctl->request_queue_lock);
> +	if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) {
> +		mutex_unlock(&ctl->request_queue_lock);
> +		return;
> +	}
> +
>  	list_del(&req->list);
>  	clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags);
>  	if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags))
> -- 
> 2.49.0.395.g12beb8f557-goog
Sergey Senozhatsky March 27, 2025, 2:02 p.m. UTC | #2
Hi,

On (25/03/27 15:37), Mika Westerberg wrote:
> > Another possibility is tb_cfg_request_sync():
> > 
> > tb_cfg_request_sync()
> >  tb_cfg_request()
> >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> >  tb_cfg_request_cancel()
> >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> 
> Not sure about this one, because &req->work will only be scheduled once;
> the second schedule_work() should not queue it again (as far as I can tell).

If the second schedule_work() happens after a timeout (that's what
!wait_for_completion_timeout() indicates), then the work queued by
the first schedule_work() can already have executed by that time,
and then we can schedule the work again (but the request is already
dequeued).  Am I missing something?
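
(A minimal, hypothetical module demonstrating the schedule_work()
semantics in question -- the demo_* names are made up for illustration.
schedule_work() returns false while the work item is still pending, but
once the work function has started executing it can be queued again:)

	#include <linux/module.h>
	#include <linux/workqueue.h>
	#include <linux/delay.h>

	static void demo_fn(struct work_struct *work)
	{
		pr_info("demo work executed\n");
	}

	static DECLARE_WORK(demo_work, demo_fn);

	static int __init demo_init(void)
	{
		/* queues the work: returns true */
		pr_info("first:  %d\n", schedule_work(&demo_work));
		/* still pending: returns false, not queued twice */
		pr_info("second: %d\n", schedule_work(&demo_work));
		/* let the work run... */
		msleep(100);
		/* ...after execution it can be queued (and run) again */
		pr_info("third:  %d\n", schedule_work(&demo_work));
		return 0;
	}

	static void __exit demo_exit(void)
	{
		flush_work(&demo_work);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");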

> > To address the issue, do not dequeue requests that don't
> > have the TB_CFG_REQUEST_ACTIVE bit set.
> 
> Just to be sure. After this change you have not seen the issue anymore
> with your testing?

Haven't tried it yet.

We just found it today; it usually takes several weeks before
we can roll out a fix to our fleet, and we prefer patches from
upstream/subsystem git, which is why we reached out upstream.

The 0xdead000000000122 dereference is a LIST_POISON on x86_64, which
is set explicitly in list_del(), so I'd say I'm fairly confident
that we have a double list_del() in tb_cfg_request_dequeue().
Mika Westerberg March 27, 2025, 2:20 p.m. UTC | #3
Hi,

On Thu, Mar 27, 2025 at 11:02:04PM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> On (25/03/27 15:37), Mika Westerberg wrote:
> > > Another possibility is tb_cfg_request_sync():
> > > 
> > > tb_cfg_request_sync()
> > >  tb_cfg_request()
> > >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> > >  tb_cfg_request_cancel()
> > >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> > 
> > Not sure about this one, because &req->work will only be scheduled once;
> > the second schedule_work() should not queue it again (as far as I can tell).
> 
> If the second schedule_work() happens after a timeout (that's what
> !wait_for_completion_timeout() indicates), then the work queued by
> the first schedule_work() can already have executed by that time,
> and then we can schedule the work again (but the request is already
> dequeued).  Am I missing something?

schedule_work() does not schedule the work again if it is already
scheduled.

> > > To address the issue, do not dequeue requests that don't
> > > have the TB_CFG_REQUEST_ACTIVE bit set.
> > 
> > Just to be sure. After this change you have not seen the issue anymore
> > with your testing?
> 
> Haven't tried it yet.
> 
> We just found it today; it usually takes several weeks before
> we can roll out a fix to our fleet, and we prefer patches from
> upstream/subsystem git, which is why we reached out upstream.

Makes sense.

> The 0xdead000000000122 dereference is a LIST_POISON on x86_64, which
> is set explicitly in list_del(), so I'd say I'm fairly confident
> that we have a double list_del() in tb_cfg_request_dequeue().

Yes, I agree but since I have not seen any similar reports (sans what I saw
ages ago), I would like to be sure the issue you see is actually fixed with
the patch (and that there are no unexpected side-effects). ;-)
Sergey Senozhatsky March 27, 2025, 2:37 p.m. UTC | #4
On (25/03/27 16:20), Mika Westerberg wrote:
> > On (25/03/27 15:37), Mika Westerberg wrote:
> > > > Another possibility is tb_cfg_request_sync():
> > > > 
> > > > tb_cfg_request_sync()
> > > >  tb_cfg_request()
> > > >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> > > >  tb_cfg_request_cancel()
> > > >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> > > 
> > > Not sure about this one, because &req->work will only be scheduled once;
> > > the second schedule_work() should not queue it again (as far as I can tell).
> > 
> > If the second schedule_work() happens after a timeout (that's what
> > !wait_for_completion_timeout() indicates), then the work queued by
> > the first schedule_work() can already have executed by that time,
> > and then we can schedule the work again (but the request is already
> > dequeued).  Am I missing something?
> 
> schedule_work() does not schedule the work again if it is already
> scheduled.

Yes, if it's scheduled.  If it's already executed then we can schedule
again.

	tb_cfg_request_sync() {
	 tb_cfg_request()
	   schedule_work()
	                        executes tb_cfg_request_dequeue
	 wait_for_completion_timeout()
	   schedule_work()
	                        executes tb_cfg_request_dequeue again
	}

I guess there can be enough delay (for whatever reason, not only
wait_for_completion_timeout(), but maybe also preemption) between
the two schedule_work() calls?

> > The 0xdead000000000122 dereference is a LIST_POISON on x86_64, which
> > is set explicitly in list_del(), so I'd say I'm fairly confident
> > that we have a double list_del() in tb_cfg_request_dequeue().
> 
> Yes, I agree but since I have not seen any similar reports (sans what I saw
> ages ago), I would like to be sure the issue you see is actually fixed with
> the patch (and that there are no unexpected side-effects). ;-)

Let me see what I can do (we don't normally apply patches that
were not in the corresponding subsystem tree).

In the meantime, do you have a subsystem/driver tree that is exposed
to linux-next?  If so, would be cool if you can pick up the patch so
that it can get some extra testing via linux-next.
Mika Westerberg March 27, 2025, 2:55 p.m. UTC | #5
Hi,

On Thu, Mar 27, 2025 at 11:37:35PM +0900, Sergey Senozhatsky wrote:
> On (25/03/27 16:20), Mika Westerberg wrote:
> > > On (25/03/27 15:37), Mika Westerberg wrote:
> > > > > Another possibility is tb_cfg_request_sync():
> > > > > 
> > > > > tb_cfg_request_sync()
> > > > >  tb_cfg_request()
> > > > >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> > > > >  tb_cfg_request_cancel()
> > > > >   schedule_work(&req->work) -> tb_cfg_request_dequeue()
> > > > 
> > > > Not sure about this one, because &req->work will only be scheduled once;
> > > > the second schedule_work() should not queue it again (as far as I can tell).
> > > 
> > > If the second schedule_work() happens after a timeout (that's what
> > > !wait_for_completion_timeout() indicates), then the work queued by
> > > the first schedule_work() can already have executed by that time,
> > > and then we can schedule the work again (but the request is already
> > > dequeued).  Am I missing something?
> > 
> > schedule_work() does not schedule the work again if it is already
> > scheduled.
> 
> Yes, if it's scheduled.  If it's already executed then we can schedule
> again.
> 
> 	tb_cfg_request_sync() {
> 	 tb_cfg_request()
> 	   schedule_work()

At this point it runs tb_cfg_request_work(), which then calls the callback
(tb_cfg_request_complete()) before it dequeues, so "done" is completed.

> 	                        executes tb_cfg_request_dequeue

> 	 wait_for_completion_timeout()

so this will return > 0 as "done" completed..

> 	   schedule_work()
> 	                        executes tb_cfg_request_dequeue again

..and we don't call this one.

> 	}
> 
> I guess there can be enough delay (for whatever reason, not only
> wait_for_completion_timeout(), but maybe also preemption) between
> the two schedule_work() calls?
> 
> > > The 0xdead000000000122 dereference is a LIST_POISON on x86_64, which
> > > is set explicitly in list_del(), so I'd say I'm fairly confident
> > > that we have a double list_del() in tb_cfg_request_dequeue().
> > 
> > Yes, I agree but since I have not seen any similar reports (sans what I saw
> > ages ago), I would like to be sure the issue you see is actually fixed with
> > the patch (and that there are no unexpected side-effects). ;-)
> 
> Let me see what I can do (we don't normally apply patches that
> were not in the corresponding subsystem tree).
> 
> In the meantime, do you have a subsystem/driver tree that is exposed
> to linux-next?  If so, would be cool if you can pick up the patch so
> that it can get some extra testing via linux-next.

Yes I do, see [1], but it does not work like that. First you should make
sure your patch works by testing it yourself, and then we can pick it up
for others to test.

[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git/
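
(For readers following the thread, a rough sketch of the flow Mika
describes, paraphrased from drivers/thunderbolt/ctl.c; a simplification
for illustration, not the exact upstream code:)

	struct tb_cfg_result tb_cfg_request_sync(struct tb_ctl *ctl,
						 struct tb_cfg_request *req,
						 int timeout_msec)
	{
		unsigned long timeout = msecs_to_jiffies(timeout_msec);
		struct tb_cfg_result res;
		DECLARE_COMPLETION_ONSTACK(done);
		int ret;

		ret = tb_cfg_request(ctl, req, tb_cfg_request_complete, &done);
		if (ret) {
			res.err = ret;
			return res;
		}

		/*
		 * tb_cfg_request_work() invokes the callback, which
		 * completes "done", *before* it dequeues the request, so
		 * if the work has already run this wait succeeds and the
		 * cancel path (the second schedule_work()) is never taken.
		 */
		if (!wait_for_completion_timeout(&done, timeout))
			tb_cfg_request_cancel(req, -ETIMEDOUT);

		flush_work(&req->work);

		return req->result;
	}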
Sergey Senozhatsky March 27, 2025, 3:02 p.m. UTC | #6
On (25/03/27 16:55), Mika Westerberg wrote:
[..]
> > Yes, if it's scheduled.  If it's already executed then we can schedule
> > again.
> > 
> > 	tb_cfg_request_sync() {
> > 	 tb_cfg_request()
> > 	   schedule_work()
> 
> This point it runs tb_cfg_request_work() which then calls the callback
> (tb_cfg_request_complete()) before it dequeues so "done" is completed.
> 
> > 	                        executes tb_cfg_request_dequeue
> 
> > 	 wait_for_completion_timeout()
> 
> so this will return > 0 as "done" completed..
> 
> > 	   schedule_work()
> > 	                        executes tb_cfg_request_dequeue again
> 
> ..and we don't call this one.

Ah, okay, I see.  Thanks for the explanations.  I'll drop
that one from the commit message then (let me re-spin v3,
just for the history).

[..]
> > Let me see what I can do (we don't normally apply patches that
> > were not in the corresponding subsystem tree).
> > 
> > In the meantime, do you have a subsystem/driver tree that is exposed
> > to linux-next?  If so, would be cool if you can pick up the patch so
> > that it can get some extra testing via linux-next.
> 
> Yes I do, see [1], but it does not work like that. First you should make
> sure your patch works by testing it yourself, and then we can pick it up
> for others to test.

Sure, if I had the H/W, testing would have been done by now.  OK, let me
try to work this out.

Patch

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index cd15e84c47f4..1db2e951b53f 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req)
 	struct tb_ctl *ctl = req->ctl;
 
 	mutex_lock(&ctl->request_queue_lock);
+	if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) {
+		mutex_unlock(&ctl->request_queue_lock);
+		return;
+	}
+
 	list_del(&req->list);
 	clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags);
 	if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags))