Patchwork dma-buf/sw_sync: hold a fence reference when check if it signaled

Submitter Gustavo F. Padovan
Date July 27, 2017, 7:03 p.m.
Message ID <20170727190353.3353-1-gustavo@padovan.org>
Permalink /patch/9867465/
State New

Comments

Gustavo F. Padovan - July 27, 2017, 7:03 p.m.
From: Gustavo Padovan <gustavo.padovan@collabora.com>

If userspace already dropped its own reference by closing the sw_sync
fence fd we might end up in a deadlock where
dma_fence_is_signaled_locked() will trigger the release of the fence and
thus try to hold the lock to remove the fence from the list.

We need to grab a reference to the fence before calling into this chain if
we want to avoid this issue.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
---
 drivers/dma-buf/sw_sync.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
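
The deadlock described in the commit message can be modelled in userspace. The following is only a sketch under stated assumptions: every `toy_*` name is hypothetical, the spinlock is reduced to a flag that records a second acquisition instead of spinning, and the release path is assumed to take the lock only while the fence is still linked. It is not the sw_sync implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct toy_timeline {
	bool locked;         /* models obj->lock, a non-recursive spinlock */
	bool would_deadlock; /* set if the lock is requested while already held */
};

struct toy_fence {
	struct toy_timeline *tl;
	int refcount;
	bool linked;         /* models the fence still being on obj->pt_list */
	bool released;
};

/* Returns false instead of spinning forever when the lock is already held. */
static bool toy_lock(struct toy_timeline *tl)
{
	if (tl->locked) {
		tl->would_deadlock = true;
		return false;
	}
	tl->locked = true;
	return true;
}

static void toy_unlock(struct toy_timeline *tl)
{
	tl->locked = false;
}

/* Assumed release path: take the lock only if the fence is still linked. */
static void toy_release(struct toy_fence *f)
{
	if (f->linked && toy_lock(f->tl)) {
		f->linked = false;
		toy_unlock(f->tl);
	}
	f->released = true;
}

static void toy_put(struct toy_fence *f)
{
	if (--f->refcount == 0)
		toy_release(f);
}

/* Returns true if the loop body completed without re-taking the held lock. */
static bool toy_timeline_signal(struct toy_fence *f, bool hold_ref)
{
	struct toy_timeline *tl = f->tl;

	toy_lock(tl);
	if (hold_ref)
		f->refcount++;       /* models dma_fence_get() */

	toy_put(f);                  /* models a signal callback dropping a reference */

	if (!tl->would_deadlock) {
		f->linked = false;   /* models list_del_init() and rb_erase() */
		if (hold_ref)
			toy_put(f);  /* models the trailing dma_fence_put() */
	}
	toy_unlock(tl);
	return !tl->would_deadlock;
}
```

Calling `toy_timeline_signal()` on a fence with a single remaining reference reports the re-entrant lock attempt when `hold_ref` is false, and completes cleanly when `hold_ref` is true, because the final put happens only after the fence has been unlinked.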
Chris Wilson - July 27, 2017, 7:30 p.m.
Quoting Gustavo Padovan (2017-07-27 20:03:53)
> From: Gustavo Padovan <gustavo.padovan@collabora.com>
> 
> If userspace already dropped its own reference by closing the sw_sync
> fence fd we might end up in a deadlock where
> dma_fence_is_signaled_locked() will trigger the release of the fence and
> thus try to hold the lock to remove the fence from the list.

So the issue here is that the call to dma_fence_is_signaled_locked() is
triggering the unreference?
 
> We need to grab a reference to the fence before calling into this chain if
> we want to avoid this issue.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
> ---
>  drivers/dma-buf/sw_sync.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index af1bc84..8291434 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -144,11 +144,16 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>         obj->value += inc;
>  
>         list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> -               if (!dma_fence_is_signaled_locked(&pt->base))
> +               dma_fence_get(&pt->base);

This would need to be dma_fence_get_rcu() to avoid grabbing the fence
when its refcount has hit 0.

> +               if (!dma_fence_is_signaled_locked(&pt->base)) {
> +                       dma_fence_put(&pt->base);
>                         break;
> +               }
>  
>                 list_del_init(&pt->link);
>                 rb_erase(&pt->node, &obj->pt_tree);

But if I understand correctly, we just need to unlink first, then
signal.

list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
	if (!timeline_fence_signaled(&pt->base))
		break;

	list_del_init(&pt->link);
	rb_erase(&pt->node, &obj->pt_tree);

	dma_fence_signal_locked(&pt->base);
}

The challenge is in writing the comment to explain the open-coding.
-Chris
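
The remark about dma_fence_get_rcu() rests on "get unless zero" semantics: a reference is taken only if the count has not already dropped to 0. A minimal userspace sketch of that idea with C11 atomics follows; `toy_get_unless_zero()` is a hypothetical name used for illustration and is not a kernel function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of a "get unless zero" reference grab.
 * Returns true and increments the count only if it was non-zero,
 * so an object whose last reference is already gone is never revived.
 */
static bool toy_get_unless_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		/* On failure, old is updated to the current value and we retry. */
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

A plain increment would turn a count of 0 into 1 and hand out a pointer to an object that is already being torn down; the compare-and-swap loop refuses in that case.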
Gustavo F. Padovan - July 28, 2017, 1:57 a.m.
2017-07-27 Chris Wilson <chris@chris-wilson.co.uk>:

> Quoting Gustavo Padovan (2017-07-27 20:03:53)
> > From: Gustavo Padovan <gustavo.padovan@collabora.com>
> > 
> > If userspace already dropped its own reference by closing the sw_sync
> > fence fd we might end up in a deadlock where
> > dma_fence_is_signaled_locked() will trigger the release of the fence and
> > thus try to hold the lock to remove the fence from the list.
> 
> So the issue here is that the call to dma_fence_is_signaled_locked() is
> triggering the unreference?

Exactly. I'll say that explicitly in the commit message.

>  
> > We need to grab a reference to the fence before calling into this chain if
> > we want to avoid this issue.
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
> > ---
> >  drivers/dma-buf/sw_sync.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > index af1bc84..8291434 100644
> > --- a/drivers/dma-buf/sw_sync.c
> > +++ b/drivers/dma-buf/sw_sync.c
> > @@ -144,11 +144,16 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> >         obj->value += inc;
> >  
> >         list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> > -               if (!dma_fence_is_signaled_locked(&pt->base))
> > +               dma_fence_get(&pt->base);
> 
> This would need to be dma_fence_get_rcu() to avoid grabbing the fence
> when its refcount has hit 0.
> 
> > +               if (!dma_fence_is_signaled_locked(&pt->base)) {
> > +                       dma_fence_put(&pt->base);
> >                         break;
> > +               }
> >  
> >                 list_del_init(&pt->link);
> >                 rb_erase(&pt->node, &obj->pt_tree);
> 
> But if I understand correctly, we just need to unlink first, then
> signal.
> 
> list_for_each_entry_safe() {
> 	if (!timeline_fence_signaled(&pt->base))
> 		break;
> 
> 	list_del_init(&pt->link);
> 	rb_erase(&pt->node, &obj->pt_tree);
> 
> 	dma_fence_signal_locked(&pt->base);
> }
> 
> The challenge is in writing the comment to explain the open-coding.

That is cleaner and doesn't need the get/put dance. I'll come up with a
comment to explain it.

Gustavo
Daniel Vetter - July 28, 2017, 7:30 a.m.
On Thu, Jul 27, 2017 at 04:03:53PM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan <gustavo.padovan@collabora.com>
> 
> If userspace already dropped its own reference by closing the sw_sync
> fence fd we might end up in a deadlock where
> dma_fence_is_signaled_locked() will trigger the release of the fence and
> thus try to hold the lock to remove the fence from the list.
> 
> We need to grab a reference to the fence before calling into this chain if
> we want to avoid this issue.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>

Do we have a testcase for this?
-Daniel

> ---
>  drivers/dma-buf/sw_sync.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index af1bc84..8291434 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -144,11 +144,16 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>  	obj->value += inc;
>  
>  	list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> -		if (!dma_fence_is_signaled_locked(&pt->base))
> +		dma_fence_get(&pt->base);
> +		if (!dma_fence_is_signaled_locked(&pt->base)) {
> +			dma_fence_put(&pt->base);
>  			break;
> +		}
>  
>  		list_del_init(&pt->link);
>  		rb_erase(&pt->node, &obj->pt_tree);
> +
> +		dma_fence_put(&pt->base);
>  	}
>  
>  	spin_unlock_irq(&obj->lock);
Chris Wilson - July 28, 2017, 8:28 a.m.
Quoting Gustavo Padovan (2017-07-28 02:57:25)
> 2017-07-27 Chris Wilson <chris@chris-wilson.co.uk>:
> 
> > Quoting Gustavo Padovan (2017-07-27 20:03:53)
> > > From: Gustavo Padovan <gustavo.padovan@collabora.com>
> > > 
> > > If userspace already dropped its own reference by closing the sw_sync
> > > fence fd we might end up in a deadlock where
> > > dma_fence_is_signaled_locked() will trigger the release of the fence and
> > > thus try to hold the lock to remove the fence from the list.
> > 
> > So the issue here is that the call to dma_fence_is_signaled_locked() is
> > triggering the unreference?
> 
> Exactly. I'll say that explicitly in the commit message.

:) It was more of a rhetorical question making sure that I understood
correctly.

> > But if I understand correctly, we just need to unlink first, then
> > signal.
> > 
> > list_for_each_entry_safe() {
> >       if (!timeline_fence_signaled(&pt->base))
> >               break;
> > 
> >       list_del_init(&pt->link);
> >       rb_erase(&pt->node, &obj->pt_tree);
> > 
> >       dma_fence_signal_locked(&pt->base);
> > }
> > 
> > The challenge is in writing the comment to explain the open-coding.
> 
> That is cleaner and doesn't need the get/put dance. I'll come up with a
> comment to explain it.

...

/*
 * A signal callback may release the last reference to this fence,
 * causing it to be freed. That operation has to be last to avoid
 * a use after free inside this loop, and must be after we remove
 * the fence from the timeline in order to prevent deadlocking on
 * timeline->lock inside timeline_fence_release().
 */
dma_fence_signal_locked(&pt->base);
-Chris
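
The ordering argument in the comment above (unlink first, signal last) can be checked with a small userspace model. This is a sketch only: all `demo_*` names are hypothetical, the lock is a plain flag, and the release path is assumed to need the lock only while the fence is still linked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the timeline and its fence; not kernel types. */
struct demo_timeline {
	bool locked;         /* models timeline->lock */
	bool would_deadlock; /* set if release needs the lock while it is held */
};

struct demo_fence {
	struct demo_timeline *tl;
	int refcount;
	bool linked;         /* still on the timeline's list */
	bool released;
};

/* Assumed release path: needs the lock only to unlink a still-linked fence. */
static void demo_release(struct demo_fence *f)
{
	if (f->linked) {
		if (f->tl->locked)
			f->tl->would_deadlock = true;
		else
			f->linked = false; /* lock, unlink, unlock collapsed */
	}
	f->released = true;
}

/* Models signalling with a callback that drops one reference. */
static void demo_signal_locked(struct demo_fence *f)
{
	if (--f->refcount == 0)
		demo_release(f);
}

/* Returns true if signalling finished without needing the held lock again. */
static bool demo_timeline_signal(struct demo_fence *f, bool unlink_first)
{
	struct demo_timeline *tl = f->tl;

	tl->locked = true;
	if (unlink_first) {
		f->linked = false;       /* remove from the timeline first */
		demo_signal_locked(f);   /* last step: f may be gone afterwards */
	} else {
		demo_signal_locked(f);   /* release runs while still linked */
		if (!tl->would_deadlock)
			f->linked = false;
	}
	tl->locked = false;
	return !tl->would_deadlock;
}
```

With a single remaining reference, `demo_timeline_signal(&f, true)` completes because the release path finds the fence already unlinked, while `demo_timeline_signal(&f, false)` reports that the release path would have needed the lock that the caller still holds.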

Patch

diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index af1bc84..8291434 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -144,11 +144,16 @@  static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
 	obj->value += inc;
 
 	list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
-		if (!dma_fence_is_signaled_locked(&pt->base))
+		dma_fence_get(&pt->base);
+		if (!dma_fence_is_signaled_locked(&pt->base)) {
+			dma_fence_put(&pt->base);
 			break;
+		}
 
 		list_del_init(&pt->link);
 		rb_erase(&pt->node, &obj->pt_tree);
+
+		dma_fence_put(&pt->base);
 	}
 
 	spin_unlock_irq(&obj->lock);