[2/2] dma-buf/dma-fence: Add quick tests before dma_fence_remove_callback

Message ID 20200715104905.11006-2-chris@chris-wilson.co.uk
State New, archived
Series [1/2] dma-buf/dma-fence: Trim dma_fence_add_callback()

Commit Message

Chris Wilson July 15, 2020, 10:49 a.m. UTC
When waiting with a callback on the stack, we must remove the callback
upon wait completion. Since this will be notified by the fence signal
callback, the removal often contends with the fence->lock being held by
the signaler. We can look at the list entry to see if the callback was
already signaled before we take the contended lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/dma-buf/dma-fence.c | 3 +++
 1 file changed, 3 insertions(+)
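
For context, with the patch applied dma_fence_remove_callback() reads
roughly as below; this is a sketch reconstructed from the diff plus the
pre-existing body of the function in drivers/dma-buf/dma-fence.c, so the
unchanged tail is paraphrased:

bool
dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
{
        unsigned long flags;
        bool ret;

        /* New fast path: skip the contended lock entirely if the
         * callback has already been unlinked by the signaler. */
        if (list_empty(&cb->node))
                return false;

        spin_lock_irqsave(fence->lock, flags);

        /* Re-check under the lock; a signaler may have raced us. */
        ret = !list_empty(&cb->node);
        if (ret)
                list_del_init(&cb->node);

        spin_unlock_irqrestore(fence->lock, flags);

        return ret;
}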

Comments

Daniel Vetter July 15, 2020, 12:10 p.m. UTC | #1
On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> When waiting with a callback on the stack, we must remove the callback
> upon wait completion. Since this will be notified by the fence signal
> callback, the removal often contends with the fence->lock being held by
> the signaler. We can look at the list entry to see if the callback was
> already signaled before we take the contended lock.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/dma-buf/dma-fence.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 8d5bdfce638e..b910d7bc0854 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -420,6 +420,9 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
>  	unsigned long flags;
>  	bool ret;
>  
> +	if (list_empty(&cb->node))

I was about to say "but the races" but then noticed that Paul fixed
list_empty to use READ_ONCE like 5 years ago :-)

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +		return false;
> +
>  	spin_lock_irqsave(fence->lock, flags);
>  
>  	ret = !list_empty(&cb->node);
> -- 
> 2.20.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
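
For reference, the list_empty() Daniel is referring to reads roughly as
follows (paraphrased from include/linux/list.h; Paul McKenney's change
added the READ_ONCE() so that lockless callers get a single, untorn
load of the head pointer):

static inline int list_empty(const struct list_head *head)
{
        /* READ_ONCE() stops the compiler tearing or refetching the load */
        return READ_ONCE(head->next) == head;
}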
Chris Wilson July 15, 2020, 12:21 p.m. UTC | #2
Quoting Daniel Vetter (2020-07-15 13:10:22)
> On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> > When waiting with a callback on the stack, we must remove the callback
> > upon wait completion. Since this will be notified by the fence signal
> > callback, the removal often contends with the fence->lock being held by
> > the signaler. We can look at the list entry to see if the callback was
> > already signaled before we take the contended lock.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/dma-buf/dma-fence.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 8d5bdfce638e..b910d7bc0854 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -420,6 +420,9 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
> >       unsigned long flags;
> >       bool ret;
> >  
> > +     if (list_empty(&cb->node))
> 
> I was about to say "but the races" but then noticed that Paul fixed
> list_empty to use READ_ONCE like 5 years ago :-)

I'm always going "when exactly do we need list_empty_careful()"?

We can rule out a concurrent dma_fence_add_callback() for the same
dma_fence_cb, as that is a lost cause. So we only have to worry about
the concurrent list_del_init() from dma_fence_signal_locked(). So it's
the timing of
	list_del_init(): WRITE_ONCE(list->next, list)
vs
	READ_ONCE(list->next) == list
and we don't need to care about the trailing instructions in
list_del_init()...

Wait, that trailing instruction is actually important here if the
dma_fence_cb is on the stack, or otherwise about to be freed.

Ok, this does need to be list_empty_careful!
-Chris
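
The race being described is easier to see next to the list helpers
themselves, roughly (paraphrased from include/linux/list.h of that era):

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        WRITE_ONCE(list->next, list);   /* list_empty() turns true here... */
        list->prev = list;              /* ...while this trailing store may still be pending */
}

static inline void list_del_init(struct list_head *entry)
{
        __list_del_entry(entry);        /* unlink from the fence's cb_list */
        INIT_LIST_HEAD(entry);          /* then re-initialize the node itself */
}

A bare list_empty(&cb->node) can observe next == entry and let the
waiter free the on-stack dma_fence_cb while the signaler's store to
prev is still in flight; list_empty_careful() checks both next and
prev, so it only reports empty once both stores are visible.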
Chris Wilson July 15, 2020, 12:40 p.m. UTC | #3
Quoting Chris Wilson (2020-07-15 13:21:43)
> Quoting Daniel Vetter (2020-07-15 13:10:22)
> > On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> > > When waiting with a callback on the stack, we must remove the callback
> > > upon wait completion. Since this will be notified by the fence signal
> > > callback, the removal often contends with the fence->lock being held by
> > > the signaler. We can look at the list entry to see if the callback was
> > > already signaled before we take the contended lock.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/dma-buf/dma-fence.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > index 8d5bdfce638e..b910d7bc0854 100644
> > > --- a/drivers/dma-buf/dma-fence.c
> > > +++ b/drivers/dma-buf/dma-fence.c
> > > @@ -420,6 +420,9 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
> > >       unsigned long flags;
> > >       bool ret;
> > >  
> > > +     if (list_empty(&cb->node))
> > 
> > I was about to say "but the races" but then noticed that Paul fixed
> > list_empty to use READ_ONCE like 5 years ago :-)
> 
> I'm always going "when exactly do we need list_empty_careful()"?
> 
> We can rule out a concurrent dma_fence_add_callback() for the same
> dma_fence_cb, as that is a lost cause. So we only have to worry about
> the concurrent list_del_init() from dma_fence_signal_locked(). So it's
> the timing of
>         list_del_init(): WRITE_ONCE(list->next, list)
> vs
>         READ_ONCE(list->next) == list
> and we don't need to care about the trailing instructions in
> list_del_init()...
> 
> Wait, that trailing instruction is actually important here if the
> dma_fence_cb is on the stack, or otherwise about to be freed.
> 
> Ok, this does need to be list_empty_careful!

There's a further problem in that we call INIT_LIST_HEAD on the
dma_fence_cb before the signal callback. So even if list_empty_careful()
confirms the dma_fence_cb to be completely decoupled, the containing
struct may still be in use.
-Chris
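
The ordering Chris points at lives in the signaling path, which
(roughly, paraphrased from dma_fence_signal_locked() in
drivers/dma-buf/dma-fence.c around this time) re-initializes each node
before invoking its callback:

        /* inside dma_fence_signal_locked(), with fence->lock held */
        list_replace(&fence->cb_list, &cb_list);

        list_for_each_entry_safe(cur, tmp, &cb_list, node) {
                INIT_LIST_HEAD(&cur->node);     /* node reads as "empty" from here... */
                cur->func(fence, cur);          /* ...but the containing struct is still in use */
        }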
Daniel Vetter July 15, 2020, 2:03 p.m. UTC | #4
On Wed, Jul 15, 2020 at 2:40 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Chris Wilson (2020-07-15 13:21:43)
> > Quoting Daniel Vetter (2020-07-15 13:10:22)
> > > On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> > > > When waiting with a callback on the stack, we must remove the callback
> > > > upon wait completion. Since this will be notified by the fence signal
> > > > callback, the removal often contends with the fence->lock being held by
> > > > the signaler. We can look at the list entry to see if the callback was
> > > > already signaled before we take the contended lock.
> > > >
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > ---
> > > >  drivers/dma-buf/dma-fence.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > index 8d5bdfce638e..b910d7bc0854 100644
> > > > --- a/drivers/dma-buf/dma-fence.c
> > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > @@ -420,6 +420,9 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
> > > >       unsigned long flags;
> > > >       bool ret;
> > > >
> > > > +     if (list_empty(&cb->node))
> > >
> > > I was about to say "but the races" but then noticed that Paul fixed
> > > list_empty to use READ_ONCE like 5 years ago :-)
> >
> > I'm always going "when exactly do we need list_empty_careful()"?
> >
> > We can rule out a concurrent dma_fence_add_callback() for the same
> > dma_fence_cb, as that is a lost cause. So we only have to worry about
> > the concurrent list_del_init() from dma_fence_signal_locked(). So it's
> > the timing of
> >         list_del_init(): WRITE_ONCE(list->next, list)
> > vs
> >         READ_ONCE(list->next) == list
> > and we don't need to care about the trailing instructions in
> > list_del_init()...
> >
> > Wait, that trailing instruction is actually important here if the
> > dma_fence_cb is on the stack, or otherwise about to be freed.
> >
> > Ok, this does need to be list_empty_careful!

Hm, tbh I'm not really clear what list_empty_careful buys on top.
It seems to lack READ_ONCE, so either some really big trickery with
dependencies is going on, or I don't see how this works without
locks.

I've now stared at list_empty_careful and a bunch of users quite a
bit, and I have no idea when you'd want to use it. Lockless you want
a READ_ONCE I think and a simple check, so list_empty. And just accept
that any time you race you'll go into the locked slowpath for "list
isn't empty". Also only works if the list_empty case is the "nothing
to do, job already done" case, since the other one just isn't
guaranteed to be accurate.

list_empty_careful just wraps a bunch more magic around that, which
will make this both worse and more racy (if the compiler feels
creative) because there's no READ_ONCE or anything like that. I don't
get what that thing is for ...

> There's a further problem in that we call INIT_LIST_HEAD on the
> dma_fence_cb before the signal callback. So even if list_empty_careful()
> confirms the dma_fence_cb to be completely decoupled, the containing
> struct may still be in use.

The kerneldoc of dma_fence_remove_callback() already has a very stern
warning that this will blow up if you don't hold a full reference or
otherwise control the lifetime of this stuff. So I don't think we have
to worry about any of that. Or do you mean something else?
-Daniel
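
For reference, list_empty_careful() at the time read roughly as follows
(paraphrased from include/linux/list.h; note the plain loads Daniel is
objecting to):

static inline int list_empty_careful(const struct list_head *head)
{
        struct list_head *next = head->next;    /* no READ_ONCE() here */

        /* only report empty once both stores made by list_del_init(),
         * to next and to prev, are visible to this CPU */
        return (next == head) && (next == head->prev);
}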
Chris Wilson July 15, 2020, 2:37 p.m. UTC | #5
Quoting Daniel Vetter (2020-07-15 15:03:34)
> On Wed, Jul 15, 2020 at 2:40 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > There's a further problem in that we call INIT_LIST_HEAD on the
> > dma_fence_cb before the signal callback. So even if list_empty_careful()
> > confirms the dma_fence_cb to be completely decoupled, the containing
> > struct may still be in use.
> 
> The kerneldoc of dma_fence_remove_callback() already has a very stern
> warning that this will blow up if you don't hold a full reference or
> otherwise control the lifetime of this stuff. So I don't think we have
> to worry about any of that. Or do you mean something else?

It's the struct dma_fence_cb itself that may be freed/reused. Consider
dma_fence_default_wait(). That uses struct default_wait_cb on the stack,
so in order to ensure that the callback is completed the list_empty
check has to remain under the spinlock, or else
dma_fence_default_wait_cb() can still be dereferencing wait->task as the
function returns.
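
For reference, the callback and its on-stack container are roughly
(paraphrased from drivers/dma-buf/dma-fence.c):

struct default_wait_cb {
        struct dma_fence_cb base;
        struct task_struct *task;
};

static void
dma_fence_default_wait_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
        struct default_wait_cb *wait =
                container_of(cb, struct default_wait_cb, base);

        wake_up_state(wait->task, TASK_NORMAL); /* touches the waiter's stack */
}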

So currently it is:

signed long
dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
{
        struct default_wait_cb cb;
        unsigned long flags;
        signed long ret = timeout ? timeout : 1;

        spin_lock_irqsave(fence->lock, flags);

        if (intr && signal_pending(current)) {
                ret = -ERESTARTSYS;
                goto out;
        }

        if (!__dma_fence_enable_signaling(fence))
                goto out;

        if (!timeout) {
                ret = 0;
                goto out;
        }

        cb.base.func = dma_fence_default_wait_cb;
        cb.task = current;
        list_add(&cb.base.node, &fence->cb_list);

        while (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
                if (intr)
                        __set_current_state(TASK_INTERRUPTIBLE);
                else
                        __set_current_state(TASK_UNINTERRUPTIBLE);
                spin_unlock_irqrestore(fence->lock, flags);

                ret = schedule_timeout(ret);

                spin_lock_irqsave(fence->lock, flags);
                if (ret > 0 && intr && signal_pending(current))
                        ret = -ERESTARTSYS;
        }

        if (!list_empty(&cb.base.node))
                list_del(&cb.base.node);
        __set_current_state(TASK_RUNNING);

out:
        spin_unlock_irqrestore(fence->lock, flags);
        return ret;
}

but it could be written as:

signed long
dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
{
        struct default_wait_cb cb;
        int state = intr ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;

        cb.task = current;
        if (dma_fence_add_callback(fence, &cb.base, dma_fence_default_wait_cb))
                return timeout ? timeout : 1;

        for (;;) {
                set_current_state(state);

                if (dma_fence_is_signaled(fence)) {
                        timeout = timeout ? timeout : 1;
                        break;
                }

                if (signal_pending_state(state, current)) {
                        timeout = -ERESTARTSYS;
                        break;
                }

                if (!timeout)
                        break;

                timeout = schedule_timeout(timeout);
        }
        __set_current_state(TASK_RUNNING);

        dma_fence_remove_callback(fence, &cb.base);

        return timeout;
}
-Chris
Daniel Vetter July 15, 2020, 2:46 p.m. UTC | #6
On Wed, Jul 15, 2020 at 4:37 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Daniel Vetter (2020-07-15 15:03:34)
> > On Wed, Jul 15, 2020 at 2:40 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > There's a further problem in that we call INIT_LIST_HEAD on the
> > > dma_fence_cb before the signal callback. So even if list_empty_careful()
> > > confirms the dma_fence_cb to be completely decoupled, the containing
> > > struct may still be in use.
> >
> > The kerneldoc of dma_fence_remove_callback() already has a very stern
> > warning that this will blow up if you don't hold a full reference or
> > otherwise control the lifetime of this stuff. So I don't think we have
> > to worry about any of that. Or do you mean something else?
>
> It's the struct dma_fence_cb itself that may be freed/reused. Consider
> dma_fence_default_wait(). That uses struct default_wait_cb on the stack,
> so in order to ensure that the callback is completed the list_empty
> check has to remain under the spinlock, or else
> dma_fence_default_wait_cb() can still be dereferencing wait->task as the
> function returns.

The current implementation of remove_callback doesn't work if you
don't own the callback structure, or control its lifetime through some
other means.

So if we have callers removing other callback structures, that just
doesn't work, you can only remove your own.

From a quick spot check across a few callers we don't seem to have a
problem here, all current callers for this function are in various
wait functions (driver specific, or multi fence waits, stuff like
that).
-Daniel

> So currently it is:
>
> signed long
> dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
> {
>         struct default_wait_cb cb;
>         unsigned long flags;
>         signed long ret = timeout ? timeout : 1;
>
>         spin_lock_irqsave(fence->lock, flags);
>
>         if (intr && signal_pending(current)) {
>                 ret = -ERESTARTSYS;
>                 goto out;
>         }
>
>         if (!__dma_fence_enable_signaling(fence))
>                 goto out;
>
>         if (!timeout) {
>                 ret = 0;
>                 goto out;
>         }
>
>         cb.base.func = dma_fence_default_wait_cb;
>         cb.task = current;
>         list_add(&cb.base.node, &fence->cb_list);
>
>         while (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
>                 if (intr)
>                         __set_current_state(TASK_INTERRUPTIBLE);
>                 else
>                         __set_current_state(TASK_UNINTERRUPTIBLE);
>                 spin_unlock_irqrestore(fence->lock, flags);
>
>                 ret = schedule_timeout(ret);
>
>                 spin_lock_irqsave(fence->lock, flags);
>                 if (ret > 0 && intr && signal_pending(current))
>                         ret = -ERESTARTSYS;
>         }
>
>         if (!list_empty(&cb.base.node))
>                 list_del(&cb.base.node);
>         __set_current_state(TASK_RUNNING);
>
> out:
>         spin_unlock_irqrestore(fence->lock, flags);
>         return ret;
> }
>
> but it could be written as:
>
> signed long
> dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
> {
>         struct default_wait_cb cb;
>         int state = intr ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>
>         cb.task = current;
>         if (dma_fence_add_callback(fence, &cb.base, dma_fence_default_wait_cb))
>                 return timeout ? timeout : 1;
>
>         for (;;) {
>                 set_current_state(state);
>
>                 if (dma_fence_is_signaled(fence)) {
>                         timeout = timeout ? timeout : 1;
>                         break;
>                 }
>
>                 if (signal_pending_state(state, current)) {
>                         timeout = -ERESTARTSYS;
>                         break;
>                 }
>
>                 if (!timeout)
>                         break;
>
>                 timeout = schedule_timeout(timeout);
>         }
>         __set_current_state(TASK_RUNNING);
>
>         dma_fence_remove_callback(fence, &cb.base);
>
>         return timeout;
> }
> -Chris

Patch

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 8d5bdfce638e..b910d7bc0854 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -420,6 +420,9 @@  dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
 	unsigned long flags;
 	bool ret;
 
+	if (list_empty(&cb->node))
+		return false;
+
 	spin_lock_irqsave(fence->lock, flags);
 
 	ret = !list_empty(&cb->node);
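
Per the follow-up discussion above, the fast path wants the "careful"
variant; a sketch of where the thread converged, not the committed
code:

        if (list_empty_careful(&cb->node))
                return false;

Even this is only safe when the caller controls the lifetime of the
dma_fence_cb, since the signaler re-initializes the node before
invoking the callback; dma_fence_remove_callback()'s kerneldoc already
requires that of callers.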