[RFC,26/29] dma-buf/fence: remove pointless fence_timeline_signal at destroy phase

Message ID 1452869739-3304-27-git-send-email-gustavo@padovan.org (mailing list archive)
State New, archived

Commit Message

Gustavo Padovan Jan. 15, 2016, 2:55 p.m. UTC
From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

All changes to the timeline value come from the user via
fence_timeline_signal() calls. When fence_timeline_destroy() is called,
timeline->value does not change, so calling fence_timeline_signal() with
no increment is pointless.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
---
 drivers/dma-buf/fence.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
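
The reasoning in the commit message can be illustrated with a small
userspace model. This is only a sketch: the model_* names are hypothetical
and are not the kernel's struct fence_timeline or fence_timeline_signal(),
and the plain seqno comparison ignores counter wraparound. It assumes that
fences are only completed from the signal path and that no fence is created
behind the current timeline value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not the kernel data structures. */
struct model_fence {
	unsigned int seqno;	/* timeline value at which the fence signals */
	int signaled;
};

struct model_timeline {
	unsigned int value;
	struct model_fence *fences;
	size_t nfences;
};

/*
 * Advance the timeline by inc and signal every fence whose seqno has
 * been reached. Returns the number of fences newly signaled by this call.
 */
static size_t model_timeline_signal(struct model_timeline *tl, unsigned int inc)
{
	size_t i, newly = 0;

	tl->value += inc;
	for (i = 0; i < tl->nfences; i++) {
		struct model_fence *f = &tl->fences[i];

		if (!f->signaled && f->seqno <= tl->value) {
			f->signaled = 1;
			newly++;
		}
	}
	return newly;
}
```

In this model, a call with inc == 0 made after an earlier signal call finds
the same value as before, so every fence it could reach has already been
signaled and any fence with a larger seqno stays pending.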

Comments

John Harrison Jan. 15, 2016, 5:48 p.m. UTC | #1
On 15/01/2016 14:55, Gustavo Padovan wrote:
> From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
>
> All changes to the timeline value come from the user via
> fence_timeline_signal() calls. When fence_timeline_destroy() is called,
> timeline->value does not change, so calling fence_timeline_signal() with
> no increment is pointless.
>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> ---
>   drivers/dma-buf/fence.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> index 7a5fc9b..26f5f0f 100644
> --- a/drivers/dma-buf/fence.c
> +++ b/drivers/dma-buf/fence.c
> @@ -136,7 +136,7 @@ EXPORT_SYMBOL(fence_timeline_put);
>    * fence_timeline_destroy - destroy a fence_timeline
>    * @timeline	[in]	the fence_timeline to destroy
>    *
> - * This function destroys a timeline. It signals any active fence first.
> + * This function destroys a timeline.

The implementation for this was certainly broken but I would say it 
should be fixed to match the comment rather than just abandoned 
completely. That is, what happens if a timeline owner destroys their 
timeline while there are outstanding fences which other drivers are 
waiting on? That is presumably a bug in the code that called destroy 
prematurely, but bugs happen.

The old implementation simply leaked the fences. Doing a debugfs dump 
would show the timeline with all its outstanding fences still floating 
around forever after. Worse, anything waiting on them would never be 
signalled and is therefore potentially deadlocked.

Note that I haven't had chance to look through the entire patch series 
yet so maybe this has been fixed up elsewhere. If not, then I think it 
definitely needs looking into.


>    */
>   void fence_timeline_destroy(struct fence_timeline *timeline)
>   {
> @@ -147,10 +147,6 @@ void fence_timeline_destroy(struct fence_timeline *timeline)
>   	 */
>   	smp_wmb();
>   
> -	/*
> -	 * signal any children that their parent is going away.
> -	 */
> -	fence_timeline_signal(timeline, 0);
>   	fence_timeline_put(timeline);
>   }
>   EXPORT_SYMBOL(fence_timeline_destroy);
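
The concern raised in this review can be sketched with a small userspace
model. It is an illustration under assumptions, not the kernel code and not
the approach taken in patches 27 and 28: the model_* names and the
MODEL_ERR_TIMELINE_GONE value are hypothetical. The first destroy variant
mirrors the removed zero-increment signal and leaves unreached fences
pending; the second force-completes every outstanding fence with an error
so that a waiter would be released.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ERR_TIMELINE_GONE (-1)	/* hypothetical error marker */

/* Hypothetical userspace model, not the kernel data structures. */
struct model_fence {
	unsigned int seqno;	/* timeline value at which the fence signals */
	int signaled;
	int error;		/* 0, or an error set at forced completion */
};

struct model_timeline {
	unsigned int value;
	int destroyed;
	struct model_fence *fences;
	size_t nfences;
};

/* Count fences that nobody has signaled yet. */
static size_t model_count_pending(const struct model_timeline *tl)
{
	size_t i, pending = 0;

	for (i = 0; i < tl->nfences; i++)
		if (!tl->fences[i].signaled)
			pending++;
	return pending;
}

/* Variant A: destroy after a signal with zero increment (old behaviour). */
static size_t model_destroy_signal_zero(struct model_timeline *tl)
{
	size_t i;

	for (i = 0; i < tl->nfences; i++)
		if (!tl->fences[i].signaled && tl->fences[i].seqno <= tl->value)
			tl->fences[i].signaled = 1;
	tl->destroyed = 1;
	return model_count_pending(tl);	/* fences left behind, never signaled */
}

/* Variant B: destroy force-completes every outstanding fence with an error. */
static size_t model_destroy_force(struct model_timeline *tl)
{
	size_t i;

	for (i = 0; i < tl->nfences; i++) {
		if (!tl->fences[i].signaled) {
			tl->fences[i].error = MODEL_ERR_TIMELINE_GONE;
			tl->fences[i].signaled = 1;
		}
	}
	tl->destroyed = 1;
	return model_count_pending(tl);
}
```

With one fence still ahead of the timeline value, variant A returns 1 (the
fence is left pending, which corresponds to the leak described above) and
variant B returns 0.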
Gustavo Padovan Jan. 15, 2016, 6:02 p.m. UTC | #2
2016-01-15 John Harrison <John.C.Harrison@Intel.com>:

> On 15/01/2016 14:55, Gustavo Padovan wrote:
> >From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> >
> >All changes to the timeline value come from the user via
> >fence_timeline_signal() calls. When fence_timeline_destroy() is called,
> >timeline->value does not change, so calling fence_timeline_signal() with
> >no increment is pointless.
> >
> >Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> >---
> >  drivers/dma-buf/fence.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> >diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> >index 7a5fc9b..26f5f0f 100644
> >--- a/drivers/dma-buf/fence.c
> >+++ b/drivers/dma-buf/fence.c
> >@@ -136,7 +136,7 @@ EXPORT_SYMBOL(fence_timeline_put);
> >   * fence_timeline_destroy - destroy a fence_timeline
> >   * @timeline	[in]	the fence_timeline to destroy
> >   *
> >- * This function destroys a timeline. It signals any active fence first.
> >+ * This function destroys a timeline.
> 
> The implementation for this was certainly broken but I would say it should
> be fixed to match the comment rather than just abandoned completely. That
> is, what happens if a timeline owner destroys their timeline while there are
> outstanding fences which other drivers are waiting on? That is presumably a
> bug in the code that called destroy prematurely, but bugs happen.
> 
> The old implementation simply leaked the fences. Doing a debugfs dump would
> show the timeline with all its outstanding fences still floating around
> forever after. Worse, anything waiting on them would never be signalled and
> is therefore potentially deadlocked.
> 
> Note that I haven't had chance to look through the entire patch series yet
> so maybe this has been fixed up elsewhere. If not, then I think it
> definitely needs looking into.
> 

Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
calling fence_timeline_destroy() it wants to stop everything so I
worked on a solution that stops any waiter and allows the timeline to be
destroyed.

No one is using fence_timeline_destroy() in mainline now, so it is
definitely a behaviour we can discuss.

	Gustavo
Greg Hackmann Jan. 15, 2016, 11:42 p.m. UTC | #3
On 01/15/2016 10:02 AM, Gustavo Padovan wrote:
> Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
> calling fence_timeline_destroy() it wants to stop everything so I
> worked on a solution that stops any waiter and allows the timeline to be
> destroyed.
>
> No one is using fence_timeline_destroy() in mainline now, so it is
> definitely a behaviour we can discuss.
>
> 	Gustavo
>

+Tom Cherry and Dmitry Torokhov recently discovered that this was broken 
by the refactoring of Android sync on top of dma-buf fences.

Tom and Dmitry, did you send the proposed fix upstream?
Tom Cherry Feb. 9, 2016, 10:55 p.m. UTC | #4
On Fri, Jan 15, 2016 at 3:42 PM, Greg Hackmann <ghackmann@google.com> wrote:
> On 01/15/2016 10:02 AM, Gustavo Padovan wrote:
>>
>> Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
>> calling fence_timeline_destroy() it wants to stop everything so I
>> worked on a solution that stops any waiter and allows the timeline to be
>> destroyed.
>>
>> No one is using fence_timeline_destroy() in mainline now, so it is
>> definitely a behaviour we can discuss.
>>
>>         Gustavo
>>
>
> +Tom Cherry and Dmitry Torokhov recently discovered that this was broken by
> the refactoring of Android sync on top of dma-buf fences.
>
> Tom and Dmitry, did you send the proposed fix upstream?

There was a similar issue that I had originally thought to be related
to fence_timeline_destroy() but was actually related to
sync_fence_free().  Dmitry sent the patch upstream at
https://lkml.org/lkml/2015/12/14/953, but it does not look like it has
received any feedback.

We saw real panics without this patch.  I didn't see this patch or any
similar changes in the destaging commits, and I would recommend it be
looked at while destaging this driver.

Tom
Gustavo Padovan Feb. 25, 2016, 3:26 p.m. UTC | #5
2016-02-09 Tom Cherry <tomcherry@google.com>:

> On Fri, Jan 15, 2016 at 3:42 PM, Greg Hackmann <ghackmann@google.com> wrote:
> > On 01/15/2016 10:02 AM, Gustavo Padovan wrote:
> >>
> >> Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
> >> calling fence_timeline_destroy() it wants to stop everything so I
> >> worked on a solution that stops any waiter and allows the timeline to be
> >> destroyed.
> >>
> >> No one is using fence_timeline_destroy() in mainline now, so it is
> >> definitely a behaviour we can discuss.
> >>
> >>         Gustavo
> >>
> >
> > +Tom Cherry and Dmitry Torokhov recently discovered that this was broken by
> > the refactoring of Android sync on top of dma-buf fences.
> >
> > Tom and Dmitry, did you send the proposed fix upstream?
> 
> There was a similar issue that I had originally thought to be related
> to fence_timeline_destroy() but was actually related to
> sync_fence_free().  Dmitry sent the patch upstream at
> https://lkml.org/lkml/2015/12/14/953, but it does not look like it has
> received any feedback.
> 
> We saw real panics without this patch.  I didn't see this patch or any
> similar changes in the destaging commits, and I would recommend it be
> looked at while destaging this driver.

This patch is already upstream, Greg pushed it in December. :)

	Gustavo

Patch

diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
index 7a5fc9b..26f5f0f 100644
--- a/drivers/dma-buf/fence.c
+++ b/drivers/dma-buf/fence.c
@@ -136,7 +136,7 @@  EXPORT_SYMBOL(fence_timeline_put);
  * fence_timeline_destroy - destroy a fence_timeline
  * @timeline	[in]	the fence_timeline to destroy
  *
- * This function destroys a timeline. It signals any active fence first.
+ * This function destroys a timeline.
  */
 void fence_timeline_destroy(struct fence_timeline *timeline)
 {
@@ -147,10 +147,6 @@  void fence_timeline_destroy(struct fence_timeline *timeline)
 	 */
 	smp_wmb();
 
-	/*
-	 * signal any children that their parent is going away.
-	 */
-	fence_timeline_signal(timeline, 0);
 	fence_timeline_put(timeline);
 }
 EXPORT_SYMBOL(fence_timeline_destroy);