Message ID | 1452869739-3304-27-git-send-email-gustavo@padovan.org (mailing list archive) |
---|---|
State | New, archived |
On 15/01/2016 14:55, Gustavo Padovan wrote:
> From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
>
> All changes to timeline value come through the user via
> fence_timeline_signal() calls. When fence_timeline_destroy() is called no
> changes on timeline->value happens hence call fence_timeline_signal() with
> no increment is pointless.
>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> ---
>  drivers/dma-buf/fence.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> index 7a5fc9b..26f5f0f 100644
> --- a/drivers/dma-buf/fence.c
> +++ b/drivers/dma-buf/fence.c
> @@ -136,7 +136,7 @@ EXPORT_SYMBOL(fence_timeline_put);
>   * fence_timeline_destroy - destroy a fence_timeline
>   * @timeline [in] the fence_timeline to destroy
>   *
> - * This function destroys a timeline. It signals any active fence first.
> + * This function destroys a timeline.

The implementation for this was certainly broken but I would say it should
be fixed to match the comment rather than just abandoned completely. That
is, what happens if a timeline owner destroys their timeline while there are
outstanding fences which other drivers are waiting on? That is presumably a
bug in the code that called destroy prematurely, but bugs happen.

The old implementation simply leaked the fences. Doing a debugfs dump would
show the timeline with all its outstanding fences still floating around
forever after. Worse, anything waiting on them would never be signalled and
is therefore potentially deadlocked.

Note that I haven't had chance to look through the entire patch series yet
so maybe this has been fixed up elsewhere. If not, then I think it
definitely needs looking into.

>   */
>  void fence_timeline_destroy(struct fence_timeline *timeline)
>  {
> @@ -147,10 +147,6 @@ void fence_timeline_destroy(struct fence_timeline *timeline)
>  	 */
>  	smp_wmb();
>
> -	/*
> -	 * signal any children that their parent is going away.
> -	 */
> -	fence_timeline_signal(timeline, 0);
>  	fence_timeline_put(timeline);
>  }
>  EXPORT_SYMBOL(fence_timeline_destroy);
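
The two positions in this exchange can be illustrated with a small userspace model. This is only a sketch, not the kernel code: it assumes a fence is signalled once the timeline value reaches the fence's sequence number, and every name in it (`model_timeline`, `model_fence`, `model_timeline_signal()` and so on) is hypothetical. Under that assumption a signal with a zero increment releases nothing new, which is the commit message's argument, and any fence still pending at that point simply stays pending, which is the leak described in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FENCES 8

/* Hypothetical userspace model; these are not the kernel's structures. */
struct model_fence {
	unsigned int seqno;	/* timeline value at which the fence signals */
	bool signaled;
};

struct model_timeline {
	unsigned int value;	/* current position of the timeline */
	struct model_fence *fences[MODEL_MAX_FENCES];
	size_t nr_fences;
};

static void model_timeline_init(struct model_timeline *tl)
{
	assert(tl != NULL);
	tl->value = 0;
	tl->nr_fences = 0;
}

/* Attach a fence; it counts as signalled at once if seqno was already reached. */
static bool model_timeline_add(struct model_timeline *tl,
			       struct model_fence *f, unsigned int seqno)
{
	assert(tl != NULL && f != NULL);
	if (tl->nr_fences == MODEL_MAX_FENCES)
		return false;
	f->seqno = seqno;
	f->signaled = seqno <= tl->value;
	tl->fences[tl->nr_fences++] = f;
	return true;
}

/*
 * Advance the timeline by inc and signal every fence whose seqno has been
 * reached. Returns the number of fences newly signalled by this call.
 * Simplification: plain <= comparison, no handling of counter wrap-around.
 */
static size_t model_timeline_signal(struct model_timeline *tl, unsigned int inc)
{
	size_t newly = 0;
	size_t i;

	assert(tl != NULL);
	tl->value += inc;
	for (i = 0; i < tl->nr_fences; i++) {
		struct model_fence *f = tl->fences[i];

		if (!f->signaled && f->seqno <= tl->value) {
			f->signaled = true;
			newly++;
		}
	}
	return newly;
}
```

With two fences at sequence numbers 1 and 2 on a fresh timeline, `model_timeline_signal(&tl, 0)` returns 0 and both fences stay pending; only a non-zero increment releases them.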
2016-01-15 John Harrison <John.C.Harrison@Intel.com>:

> On 15/01/2016 14:55, Gustavo Padovan wrote:
> > From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> >
> > All changes to timeline value come through the user via
> > fence_timeline_signal() calls. When fence_timeline_destroy() is called no
> > changes on timeline->value happens hence call fence_timeline_signal() with
> > no increment is pointless.
> >
> > Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> > ---
> >  drivers/dma-buf/fence.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> > index 7a5fc9b..26f5f0f 100644
> > --- a/drivers/dma-buf/fence.c
> > +++ b/drivers/dma-buf/fence.c
> > @@ -136,7 +136,7 @@ EXPORT_SYMBOL(fence_timeline_put);
> >   * fence_timeline_destroy - destroy a fence_timeline
> >   * @timeline [in] the fence_timeline to destroy
> >   *
> > - * This function destroys a timeline. It signals any active fence first.
> > + * This function destroys a timeline.
>
> The implementation for this was certainly broken but I would say it should
> be fixed to match the comment rather than just abandoned completely. That
> is, what happens if a timeline owner destroys their timeline while there are
> outstanding fences which other drivers are waiting on? That is presumably a
> bug in the code that called destroy prematurely, but bugs happen.
>
> The old implementation simply leaked the fences. Doing a debugfs dump would
> show the timeline with all its outstanding fences still floating around
> forever after. Worse, anything waiting on them would never be signalled and
> is therefore potentially deadlocked.
>
> Note that I haven't had chance to look through the entire patch series yet
> so maybe this has been fixed up elsewhere. If not, then I think it
> definitely needs looking into.
>

Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
calling fence_timeline_destroy() it wants to stop everything, so I worked on
a solution that stops any waiter and allows the timeline to be destroyed.

No one is using fence_timeline_destroy() in mainline now, so it is
definitely a behaviour we can discuss.

Gustavo
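
Patches 27 and 28 are not part of this message, so the following is only a guess at the behaviour described above ("stops any waiter and allows the timeline to be destroyed"), written as a userspace sketch rather than as their implementation. All names (`sketch_timeline`, `sketch_fence`, `sketch_timeline_destroy()`) are hypothetical, and the choice of `-ENOENT` as the status handed to released waiters is an assumption made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_FENCES 8

/* Hypothetical userspace model; not the kernel's fence structures. */
struct sketch_fence {
	unsigned int seqno;	/* timeline value at which the fence signals */
	bool signaled;
	int status;		/* 0 = completed normally, <0 = released with error */
};

struct sketch_timeline {
	unsigned int value;
	bool destroyed;
	struct sketch_fence *fences[SKETCH_MAX_FENCES];
	size_t nr_fences;
};

static void sketch_timeline_init(struct sketch_timeline *tl)
{
	assert(tl != NULL);
	tl->value = 0;
	tl->destroyed = false;
	tl->nr_fences = 0;
}

/* Attach a fence; refused once the timeline is destroyed or full. */
static bool sketch_timeline_add(struct sketch_timeline *tl,
				struct sketch_fence *f, unsigned int seqno)
{
	assert(tl != NULL && f != NULL);
	if (tl->destroyed || tl->nr_fences == SKETCH_MAX_FENCES)
		return false;
	f->seqno = seqno;
	f->signaled = seqno <= tl->value;
	f->status = 0;
	tl->fences[tl->nr_fences++] = f;
	return true;
}

/* Advance the timeline and signal fences that have been reached. */
static size_t sketch_timeline_signal(struct sketch_timeline *tl,
				     unsigned int inc)
{
	size_t newly = 0;
	size_t i;

	assert(tl != NULL);
	tl->value += inc;
	for (i = 0; i < tl->nr_fences; i++) {
		struct sketch_fence *f = tl->fences[i];

		if (!f->signaled && f->seqno <= tl->value) {
			f->signaled = true;
			newly++;
		}
	}
	return newly;
}

/*
 * Destroy the timeline: every fence that is still pending is signalled with
 * an error status so that nothing is left waiting forever. Returns the
 * number of fences released this way.
 */
static size_t sketch_timeline_destroy(struct sketch_timeline *tl)
{
	size_t released = 0;
	size_t i;

	assert(tl != NULL);
	tl->destroyed = true;
	for (i = 0; i < tl->nr_fences; i++) {
		struct sketch_fence *f = tl->fences[i];

		if (!f->signaled) {
			f->status = -ENOENT;	/* assumed error code */
			f->signaled = true;
			released++;
		}
	}
	tl->nr_fences = 0;
	return released;
}
```

In this sketch a waiter can tell a normal completion (`status == 0`) from a release caused by a premature destroy (`status < 0`), which addresses the deadlock concern without pretending the pending work actually finished.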
On 01/15/2016 10:02 AM, Gustavo Padovan wrote:
> Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
> calling fence_timeline_destroy() it wants to stop everything, so I
> worked on a solution that stops any waiter and allows the timeline to be
> destroyed.
>
> No one is using fence_timeline_destroy() in mainline now, so it is
> definitely a behaviour we can discuss.
>
> Gustavo
>

+Tom Cherry and Dmitry Torokhov recently discovered that this was broken by
the refactoring of Android sync on top of dma-buf fences.

Tom and Dmitry, did you send the proposed fix upstream?
On Fri, Jan 15, 2016 at 3:42 PM, Greg Hackmann <ghackmann@google.com> wrote:
> On 01/15/2016 10:02 AM, Gustavo Padovan wrote:
>>
>> Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
>> calling fence_timeline_destroy() it wants to stop everything, so I
>> worked on a solution that stops any waiter and allows the timeline to be
>> destroyed.
>>
>> No one is using fence_timeline_destroy() in mainline now, so it is
>> definitely a behaviour we can discuss.
>>
>> Gustavo
>>
>
> +Tom Cherry and Dmitry Torokhov recently discovered that this was broken by
> the refactoring of Android sync on top of dma-buf fences.
>
> Tom and Dmitry, did you send the proposed fix upstream?

There was a similar issue that I had originally thought to be related
to fence_timeline_destroy() but was actually related to
sync_fence_free(). Dmitry sent the patch upstream at
https://lkml.org/lkml/2015/12/14/953, but it does not look like it has
received any feedback.

We saw real panics without this patch. I didn't see this patch or any
similar changes in the destaging commits, and I would recommend it be
looked at while destaging this driver.

Tom
2016-02-09 Tom Cherry <tomcherry@google.com>:

> On Fri, Jan 15, 2016 at 3:42 PM, Greg Hackmann <ghackmann@google.com> wrote:
> > On 01/15/2016 10:02 AM, Gustavo Padovan wrote:
> >>
> >> Patches 27 and 28 are an attempt to fix that. I assumed that if some code is
> >> calling fence_timeline_destroy() it wants to stop everything, so I
> >> worked on a solution that stops any waiter and allows the timeline to be
> >> destroyed.
> >>
> >> No one is using fence_timeline_destroy() in mainline now, so it is
> >> definitely a behaviour we can discuss.
> >>
> >> Gustavo
> >>
> >
> > +Tom Cherry and Dmitry Torokhov recently discovered that this was broken by
> > the refactoring of Android sync on top of dma-buf fences.
> >
> > Tom and Dmitry, did you send the proposed fix upstream?
>
> There was a similar issue that I had originally thought to be related
> to fence_timeline_destroy() but was actually related to
> sync_fence_free(). Dmitry sent the patch upstream at
> https://lkml.org/lkml/2015/12/14/953, but it does not look like it has
> received any feedback.
>
> We saw real panics without this patch. I didn't see this patch or any
> similar changes in the destaging commits, and I would recommend it be
> looked at while destaging this driver.

This patch is already upstream; Greg pushed it in December. :)

Gustavo
diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
index 7a5fc9b..26f5f0f 100644
--- a/drivers/dma-buf/fence.c
+++ b/drivers/dma-buf/fence.c
@@ -136,7 +136,7 @@ EXPORT_SYMBOL(fence_timeline_put);
  * fence_timeline_destroy - destroy a fence_timeline
  * @timeline [in] the fence_timeline to destroy
  *
- * This function destroys a timeline. It signals any active fence first.
+ * This function destroys a timeline.
  */
 void fence_timeline_destroy(struct fence_timeline *timeline)
 {
@@ -147,10 +147,6 @@ void fence_timeline_destroy(struct fence_timeline *timeline)
 	 */
 	smp_wmb();
 
-	/*
-	 * signal any children that their parent is going away.
-	 */
-	fence_timeline_signal(timeline, 0);
 	fence_timeline_put(timeline);
 }
 EXPORT_SYMBOL(fence_timeline_destroy);