diff mbox

dma-buf/fence-array: fix deadlock in fence-array

Message ID 1476729612-22602-1-git-send-email-robdclark@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Rob Clark Oct. 17, 2016, 6:40 p.m. UTC
Currently with fence-array, we have a potential deadlock situation.  If we
fence_add_callback() on an array-fence, the array-fence's lock is acquired
first, and in it's ->enable_signaling() callback, it will install cb's on
it's array-member fences, so the array-member's lock is acquired second.

But in the signal path, the array-member's lock is acquired first, and the
array-fence's lock acquired second.

To solve that, always enabling signaling up-front (in the fence_array
constructor) without the fence_array's lock held.

lockdep splat:

 ======================================================
[ INFO: possible circular locking dependency detected ]
4.7.0-rc7+ #489 Not tainted
-------------------------------------------------------
surfaceflinger/2034 is trying to acquire lock:
 (&(&array->lock)->rlock){......}, at: [<ffff00000858cddc>] fence_signal+0x5c/0xf8

but task is already holding lock:
 (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] sw_sync_ioctl+0x228/0x3b0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&obj->child_list_lock)->rlock){......}:
       [<ffff000008108924>] __lock_acquire+0x173c/0x18d8
       [<ffff000008108e0c>] lock_acquire+0x4c/0x68
       [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70
       [<ffff00000858d05c>] fence_add_callback+0x3c/0x100
       [<ffff00000858f100>] fence_array_enable_signaling+0x80/0x170
       [<ffff00000858d0d8>] fence_add_callback+0xb8/0x100
       [<ffff00000858f504>] sync_file_poll+0xd4/0xf0
       [<ffff0000081fd3a0>] do_sys_poll+0x220/0x438
       [<ffff0000081fd8d0>] SyS_ppoll+0x1b0/0x1d8
       [<ffff000008084f30>] el0_svc_naked+0x24/0x28

-> #0 (&(&array->lock)->rlock){......}:
       [<ffff000008104768>] print_circular_bug+0x80/0x2e0
       [<ffff0000081089ac>] __lock_acquire+0x17c4/0x18d8
       [<ffff000008108e0c>] lock_acquire+0x4c/0x68
       [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70
       [<ffff00000858cddc>] fence_signal+0x5c/0xf8
       [<ffff00000858f268>] fence_array_cb_func+0x78/0x88
       [<ffff00000858cb28>] fence_signal_locked+0x80/0xe0
       [<ffff0000085903c8>] sw_sync_ioctl+0x2f8/0x3b0
       [<ffff0000081faf6c>] do_vfs_ioctl+0xa4/0x790
       [<ffff0000081fb6e4>] SyS_ioctl+0x8c/0xa0
       [<ffff000008084f30>] el0_svc_naked+0x24/0x28

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&obj->child_list_lock)->rlock);
                               lock(&(&array->lock)->rlock);
                               lock(&(&obj->child_list_lock)->rlock);
  lock(&(&array->lock)->rlock);

 *** DEADLOCK ***

1 lock held by surfaceflinger/2034:
 #0:  (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] sw_sync_ioctl+0x228/0x3b0

Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
---
 drivers/dma-buf/fence-array.c |  8 ++++++++
 drivers/dma-buf/fence.c       | 21 +++++++++++++++++++++
 include/linux/fence.h         |  1 +
 3 files changed, 30 insertions(+)

Comments

Gustavo Padovan Oct. 17, 2016, 6:52 p.m. UTC | #1
2016-10-17 Rob Clark <robdclark@gmail.com>:

> Currently with fence-array, we have a potential deadlock situation.  If we
> fence_add_callback() on an array-fence, the array-fence's lock is acquired
> first, and in it's ->enable_signaling() callback, it will install cb's on
> it's array-member fences, so the array-member's lock is acquired second.
> 
> But in the signal path, the array-member's lock is acquired first, and the
> array-fence's lock acquired second.
> 
> To solve that, always enabling signaling up-front (in the fence_array
> constructor) without the fence_array's lock held.

Do we always want to enable signaling for arrays? One of the things we
removed from the Sync Framework was the need to enable signalling at
creation time. 

Just merging fencing doesn't mean you want signaling, that is supposed
to happen only when poll() is called on the sync file.

Gustavo
Rob Clark Oct. 17, 2016, 6:59 p.m. UTC | #2
On Mon, Oct 17, 2016 at 2:52 PM, Gustavo Padovan <gustavo@padovan.org> wrote:
> 2016-10-17 Rob Clark <robdclark@gmail.com>:
>
>> Currently with fence-array, we have a potential deadlock situation.  If we
>> fence_add_callback() on an array-fence, the array-fence's lock is acquired
>> first, and in it's ->enable_signaling() callback, it will install cb's on
>> it's array-member fences, so the array-member's lock is acquired second.
>>
>> But in the signal path, the array-member's lock is acquired first, and the
>> array-fence's lock acquired second.
>>
>> To solve that, always enabling signaling up-front (in the fence_array
>> constructor) without the fence_array's lock held.
>
> Do we always want to enable signaling for arrays? One of the things we
> removed from the Sync Framework was the need to enable signalling at
> creation time.
>
> Just merging fencing doesn't mean you want signaling, that is supposed
> to happen only when poll() is called on the sync file.

It was something Maarten suggested, as an alternative to introducing a
wq into the mix or worse hacks..
https://lists.freedesktop.org/archives/dri-devel/2016-October/120868.html

I think I agree with him that it is an optimization that is unlikely
to be useful in the case of fence-arrays.  If you need to wait on
multiple fences from different timelines, you probably aren't doing
that in hw.

BR,
-R

> Gustavo
>
Gustavo Padovan Oct. 17, 2016, 7:26 p.m. UTC | #3
2016-10-17 Rob Clark <robdclark@gmail.com>:

> On Mon, Oct 17, 2016 at 2:52 PM, Gustavo Padovan <gustavo@padovan.org> wrote:
> > 2016-10-17 Rob Clark <robdclark@gmail.com>:
> >
> >> Currently with fence-array, we have a potential deadlock situation.  If we
> >> fence_add_callback() on an array-fence, the array-fence's lock is acquired
> >> first, and in it's ->enable_signaling() callback, it will install cb's on
> >> it's array-member fences, so the array-member's lock is acquired second.
> >>
> >> But in the signal path, the array-member's lock is acquired first, and the
> >> array-fence's lock acquired second.
> >>
> >> To solve that, always enabling signaling up-front (in the fence_array
> >> constructor) without the fence_array's lock held.
> >
> > Do we always want to enable signaling for arrays? One of the things we
> > removed from the Sync Framework was the need to enable signalling at
> > creation time.
> >
> > Just merging fencing doesn't mean you want signaling, that is supposed
> > to happen only when poll() is called on the sync file.
> 
> It was something Maarten suggested, as an alternative to introducing a
> wq into the mix or worse hacks..
> https://lists.freedesktop.org/archives/dri-devel/2016-October/120868.html
> 
> I think I agree with him that it is an optimization that is unlikely
> to be useful in the case of fence-arrays.  If you need to wait on
> multiple fences from different timelines, you probably aren't doing
> that in hw.

Fair enough, that makes sense to me. 

If no one else has concerns with this, I'd add my r-b to it:

Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

Gustavo
Chris Wilson Oct. 17, 2016, 7:39 p.m. UTC | #4
On Mon, Oct 17, 2016 at 02:59:52PM -0400, Rob Clark wrote:
> On Mon, Oct 17, 2016 at 2:52 PM, Gustavo Padovan <gustavo@padovan.org> wrote:
> > 2016-10-17 Rob Clark <robdclark@gmail.com>:
> >
> >> Currently with fence-array, we have a potential deadlock situation.  If we
> >> fence_add_callback() on an array-fence, the array-fence's lock is acquired
> >> first, and in it's ->enable_signaling() callback, it will install cb's on
> >> it's array-member fences, so the array-member's lock is acquired second.
> >>
> >> But in the signal path, the array-member's lock is acquired first, and the
> >> array-fence's lock acquired second.
> >>
> >> To solve that, always enabling signaling up-front (in the fence_array
> >> constructor) without the fence_array's lock held.
> >
> > Do we always want to enable signaling for arrays? One of the things we
> > removed from the Sync Framework was the need to enable signalling at
> > creation time.
> >
> > Just merging fencing doesn't mean you want signaling, that is supposed
> > to happen only when poll() is called on the sync file.
> 
> It was something Maarten suggested, as an alternative to introducing a
> wq into the mix or worse hacks..
> https://lists.freedesktop.org/archives/dri-devel/2016-October/120868.html
> 
> I think I agree with him that it is an optimization that is unlikely
> to be useful in the case of fence-arrays.  If you need to wait on
> multiple fences from different timelines, you probably aren't doing
> that in hw.

For 2 i915 fences, I definitely do not want signaling enabled at
creation time.
-Chris
Gustavo Padovan Oct. 17, 2016, 7:44 p.m. UTC | #5
2016-10-17 Chris Wilson <chris@chris-wilson.co.uk>:

> On Mon, Oct 17, 2016 at 02:59:52PM -0400, Rob Clark wrote:
> > On Mon, Oct 17, 2016 at 2:52 PM, Gustavo Padovan <gustavo@padovan.org> wrote:
> > > 2016-10-17 Rob Clark <robdclark@gmail.com>:
> > >
> > >> Currently with fence-array, we have a potential deadlock situation.  If we
> > >> fence_add_callback() on an array-fence, the array-fence's lock is acquired
> > >> first, and in it's ->enable_signaling() callback, it will install cb's on
> > >> it's array-member fences, so the array-member's lock is acquired second.
> > >>
> > >> But in the signal path, the array-member's lock is acquired first, and the
> > >> array-fence's lock acquired second.
> > >>
> > >> To solve that, always enabling signaling up-front (in the fence_array
> > >> constructor) without the fence_array's lock held.
> > >
> > > Do we always want to enable signaling for arrays? One of the things we
> > > removed from the Sync Framework was the need to enable signalling at
> > > creation time.
> > >
> > > Just merging fencing doesn't mean you want signaling, that is supposed
> > > to happen only when poll() is called on the sync file.
> > 
> > It was something Maarten suggested, as an alternative to introducing a
> > wq into the mix or worse hacks..
> > https://lists.freedesktop.org/archives/dri-devel/2016-October/120868.html
> > 
> > I think I agree with him that it is an optimization that is unlikely
> > to be useful in the case of fence-arrays.  If you need to wait on
> > multiple fences from different timelines, you probably aren't doing
> > that in hw.
> 
> For 2 i915 fences, I definitely do not want signaling enabled at
> creation time.

Should we add arg flags for fence_array_create()? We already have
signal_on_any flag there. We can convert that arg to a bitfield.

Gustavo
Daniel Vetter Oct. 18, 2016, 7:41 a.m. UTC | #6
On Mon, Oct 17, 2016 at 05:44:48PM -0200, Gustavo Padovan wrote:
> 2016-10-17 Chris Wilson <chris@chris-wilson.co.uk>:
> 
> > On Mon, Oct 17, 2016 at 02:59:52PM -0400, Rob Clark wrote:
> > > On Mon, Oct 17, 2016 at 2:52 PM, Gustavo Padovan <gustavo@padovan.org> wrote:
> > > > 2016-10-17 Rob Clark <robdclark@gmail.com>:
> > > >
> > > >> Currently with fence-array, we have a potential deadlock situation.  If we
> > > >> fence_add_callback() on an array-fence, the array-fence's lock is acquired
> > > >> first, and in it's ->enable_signaling() callback, it will install cb's on
> > > >> it's array-member fences, so the array-member's lock is acquired second.
> > > >>
> > > >> But in the signal path, the array-member's lock is acquired first, and the
> > > >> array-fence's lock acquired second.
> > > >>
> > > >> To solve that, always enabling signaling up-front (in the fence_array
> > > >> constructor) without the fence_array's lock held.
> > > >
> > > > Do we always want to enable signaling for arrays? One of the things we
> > > > removed from the Sync Framework was the need to enable signalling at
> > > > creation time.
> > > >
> > > > Just merging fencing doesn't mean you want signaling, that is supposed
> > > > to happen only when poll() is called on the sync file.
> > > 
> > > It was something Maarten suggested, as an alternative to introducing a
> > > wq into the mix or worse hacks..
> > > https://lists.freedesktop.org/archives/dri-devel/2016-October/120868.html
> > > 
> > > I think I agree with him that it is an optimization that is unlikely
> > > to be useful in the case of fence-arrays.  If you need to wait on
> > > multiple fences from different timelines, you probably aren't doing
> > > that in hw.
> > 
> > For 2 i915 fences, I definitely do not want signaling enabled at
> > creation time.
> 
> Should we add arg flags for fence_array_create()? We already have
> signal_on_any flag there. We can convert that arg to a bitfield.

The thing creating the array might not be aware of the fences contained
therein, e.g. SYNC_FILE_MERGE. Imo we really need to keep the lazy
signalling enabling properties of fences.
-Daniel
Rob Clark Oct. 18, 2016, 11:49 a.m. UTC | #7
On Tue, Oct 18, 2016 at 3:41 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Oct 17, 2016 at 05:44:48PM -0200, Gustavo Padovan wrote:
>> 2016-10-17 Chris Wilson <chris@chris-wilson.co.uk>:
>>
>> > On Mon, Oct 17, 2016 at 02:59:52PM -0400, Rob Clark wrote:
>> > > On Mon, Oct 17, 2016 at 2:52 PM, Gustavo Padovan <gustavo@padovan.org> wrote:
>> > > > 2016-10-17 Rob Clark <robdclark@gmail.com>:
>> > > >
>> > > >> Currently with fence-array, we have a potential deadlock situation.  If we
>> > > >> fence_add_callback() on an array-fence, the array-fence's lock is acquired
>> > > >> first, and in it's ->enable_signaling() callback, it will install cb's on
>> > > >> it's array-member fences, so the array-member's lock is acquired second.
>> > > >>
>> > > >> But in the signal path, the array-member's lock is acquired first, and the
>> > > >> array-fence's lock acquired second.
>> > > >>
>> > > >> To solve that, always enabling signaling up-front (in the fence_array
>> > > >> constructor) without the fence_array's lock held.
>> > > >
>> > > > Do we always want to enable signaling for arrays? One of the things we
>> > > > removed from the Sync Framework was the need to enable signalling at
>> > > > creation time.
>> > > >
>> > > > Just merging fencing doesn't mean you want signaling, that is supposed
>> > > > to happen only when poll() is called on the sync file.
>> > >
>> > > It was something Maarten suggested, as an alternative to introducing a
>> > > wq into the mix or worse hacks..
>> > > https://lists.freedesktop.org/archives/dri-devel/2016-October/120868.html
>> > >
>> > > I think I agree with him that it is an optimization that is unlikely
>> > > to be useful in the case of fence-arrays.  If you need to wait on
>> > > multiple fences from different timelines, you probably aren't doing
>> > > that in hw.
>> >
>> > For 2 i915 fences, I definitely do not want signaling enabled at
>> > creation time.
>>
>> Should we add arg flags for fence_array_create()? We already have
>> signal_on_any flag there. We can convert that arg to a bitfield.
>
> The thing creating the array might not be aware of the fences contained
> therein, e.g. SYNC_FILE_MERGE. Imo we really need to keep the lazy
> signalling enabling properties of fences.

Well, in all my cases, if you end up with a fence-array, it means you
are waiting on cpu, so enabling signalling now vs later doesn't
matter.

But Chris mentioned on IRC that he does actually have a case where he
can wait in hw on fences from multiple contexts (ie. meaning MERGE
would create a fence-array).. so in that case it does still make sense
to defer enable-signalling.  So I guess back to the wq approach..

BR,
-R

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
diff mbox

Patch

diff --git a/drivers/dma-buf/fence-array.c b/drivers/dma-buf/fence-array.c
index f1989fc..b5821e9 100644
--- a/drivers/dma-buf/fence-array.c
+++ b/drivers/dma-buf/fence-array.c
@@ -140,6 +140,14 @@  struct fence_array *fence_array_create(int num_fences, struct fence **fences,
 	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
 	array->fences = fences;
 
+	/* enable signaling without our lock being held while callbacks
+	 * are installed on the array-member fences.  Otherwise there is
+	 * a potential deadlock between enabling signaling (our lock
+	 * acquired first) and the array-members getting signalled (their
+	 * lock aquired first).
+	 */
+	fence_enable_sw_signaling_nolock(&array->base);
+
 	return array;
 }
 EXPORT_SYMBOL(fence_array_create);
diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
index cc05ddd..ff6b410 100644
--- a/drivers/dma-buf/fence.c
+++ b/drivers/dma-buf/fence.c
@@ -219,6 +219,27 @@  void fence_enable_sw_signaling(struct fence *fence)
 EXPORT_SYMBOL(fence_enable_sw_signaling);
 
 /**
+ * fence_enable_sw_signaling_nolock - enable signaling on fence without taking lock
+ * @fence:	[in]	the fence to enable
+ *
+ * WARNING this is only safe to use when you know you have the only reference
+ * to the fence there is no possibility for race conditions as a result of
+ * calling ops->enable_signaling() without lock.  Ie. it is probably only safe
+ * to use from the fence's construction function.
+ */
+void fence_enable_sw_signaling_nolock(struct fence *fence)
+{
+	if (!test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags) &&
+	    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+		trace_fence_enable_signal(fence);
+
+		if (!fence->ops->enable_signaling(fence))
+			fence_signal(fence);
+	}
+}
+EXPORT_SYMBOL(fence_enable_sw_signaling_nolock);
+
+/**
  * fence_add_callback - add a callback to be called when the fence
  * is signaled
  * @fence:	[in]	the fence to wait on
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 0d76305..076ac29 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -226,6 +226,7 @@  int fence_add_callback(struct fence *fence, struct fence_cb *cb,
 		       fence_func_t func);
 bool fence_remove_callback(struct fence *fence, struct fence_cb *cb);
 void fence_enable_sw_signaling(struct fence *fence);
+void fence_enable_sw_signaling_nolock(struct fence *fence);
 
 /**
  * fence_is_signaled_locked - Return an indication if the fence is signaled yet.