drm/i915: Remove incorrect warning in context cleanup

Message ID 1448025816-25584-1-git-send-email-tvrtko.ursulin@linux.intel.com
State New

Commit Message

Tvrtko Ursulin Nov. 20, 2015, 1:23 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Mon Oct 5 13:26:36 2015 +0100

    drm/i915: Clean up associated VMAs on context destruction

Added a warning based on an incorrect assumption that all VMAs
in a VM will be on the inactive list at the point the last
reference to a context and VM is dropped.

This is not true because i915_gem_object_retire__read will not
put a VMA on the inactive list until all activities on the object
in question (in all VMs) have been retired.

As a consequence, whether or not a context/VM will be destroyed
with its VMAs still on the active list can depend on completely
unrelated activities using the same object from a different
context or engine.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
Testcase: igt/gem_request_retire/retire-vma-not-inactive
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 2 --
 1 file changed, 2 deletions(-)
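The retirement behaviour described in the commit message can be illustrated with a deliberately simplified userspace model. This is a hypothetical sketch, not the i915 code: `model_obj`, `model_vma`, `model_move_to_active()` and `model_retire_read()` are invented names, and the model assumes (as the commit message states) that an object's VMAs only leave the active list once the object has been retired on every engine. An object shared between two contexts is kept busy by context B, so context A's VMA is still on the active list when A is destroyed, which is the case the removed WARN_ON flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; NOT the real i915 structures. */
enum model_list { MODEL_INACTIVE, MODEL_ACTIVE };

struct model_vma {
	enum model_list list;   /* which list of its VM this VMA sits on */
};

struct model_obj {
	unsigned int active;    /* one bit per engine with outstanding reads */
	struct model_vma *vmas[2];
	size_t nvmas;
};

/* Work submitted through one VMA marks the object busy on that engine. */
static void model_move_to_active(struct model_obj *obj, struct model_vma *vma,
				 unsigned int engine)
{
	assert(engine < 8 * sizeof(obj->active));
	obj->active |= 1u << engine;
	vma->list = MODEL_ACTIVE;
}

/*
 * Assumed behaviour (from the commit message): retiring one engine only
 * clears its bit; the VMAs move to the inactive list once no engine is busy.
 */
static void model_retire_read(struct model_obj *obj, unsigned int engine)
{
	obj->active &= ~(1u << engine);
	if (obj->active)
		return; /* still busy elsewhere: every VMA stays active */
	for (size_t i = 0; i < obj->nvmas; i++)
		obj->vmas[i]->list = MODEL_INACTIVE;
}

/* Returns true if the removed WARN_ON would fire when context A goes away. */
static bool model_shared_object_scenario(void)
{
	struct model_vma vma_a = { MODEL_INACTIVE }; /* object in context A's VM */
	struct model_vma vma_b = { MODEL_INACTIVE }; /* same object in B's VM */
	struct model_obj obj = { 0, { &vma_a, &vma_b }, 2 };

	model_move_to_active(&obj, &vma_a, 0); /* context A, engine 0 */
	model_move_to_active(&obj, &vma_b, 1); /* context B, engine 1 */
	model_retire_read(&obj, 0);            /* A's work has completed */

	/* Context A is destroyed now; its VMA is still on the active list. */
	return vma_a.list == MODEL_ACTIVE;
}
```

In this model, retiring engine 1 as well moves both VMAs to the inactive list, so whether the warning would fire depends only on the timing of unrelated work on the shared object.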

Comments

Daniel Vetter Nov. 24, 2015, 10:58 a.m. UTC | #1
On Fri, Nov 20, 2015 at 01:23:36PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Date:   Mon Oct 5 13:26:36 2015 +0100
> 
>     drm/i915: Clean up associated VMAs on context destruction
> 
> Added a warning based on an incorrect assumption that all VMAs
> in a VM will be on the inactive list at the point the last
> reference to a context and VM is dropped.
> 
> This is not true because i915_gem_object_retire__read will not
> put a VMA on the inactive list until all activities on the object
> in question (in all VMs) have been retired.
> 
> As a consequence, whether or not a context/VM will be destroyed
> with its VMAs still on the active list can depend on completely
> unrelated activities using the same object from a different
> context or engine.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
> Testcase: igt/gem_request_retire/retire-vma-not-inactive
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 204dc7c0b2d6..59dba318213e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -141,8 +141,6 @@ static void i915_gem_context_clean(struct intel_context *ctx)
>  	if (!ppgtt)
>  		return;
>  
> -	WARN_ON(!list_empty(&ppgtt->base.active_list));
> -
>  	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
>  				 mm_list) {
>  		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
> -- 
> 1.9.1
>
Chris Wilson Nov. 24, 2015, 12:53 p.m. UTC | #2
On Tue, Nov 24, 2015 at 11:58:22AM +0100, Daniel Vetter wrote:
> Queued for -next, thanks for the patch.

The WARN_ON is accurate though. The original patch fails to fix even the
limited aspect of the bug it claimed to.
-Chris
Tvrtko Ursulin Nov. 24, 2015, 1:17 p.m. UTC | #3
On 24/11/15 12:53, Chris Wilson wrote:
> On Tue, Nov 24, 2015 at 11:58:22AM +0100, Daniel Vetter wrote:
>> Queued for -next, thanks for the patch.
>
> The WARN_ON is accurate though. The original patch fails to fix even the
> limited aspect of the bug it claimed to.

That is not true. It only makes it a bit more limited, and not even
through its own fault. Even so, it makes things a bit better, not worse.

And it does not impede your VMA rewrite at all, which I did offer to
help review as you send it out in manageable chunks.

If it is not realistically possible to split it out and do it in
increments, then it would be more constructive to discuss how to do it
than to keep it in limbo for 15 months, as you say, and use it as a
reason to shoot down everything else.

Regards,

Tvrtko
Chris Wilson Nov. 24, 2015, 1:27 p.m. UTC | #4
On Tue, Nov 24, 2015 at 01:17:57PM +0000, Tvrtko Ursulin wrote:
> 
> On 24/11/15 12:53, Chris Wilson wrote:
> >On Tue, Nov 24, 2015 at 11:58:22AM +0100, Daniel Vetter wrote:
> >>Queued for -next, thanks for the patch.
> >
> >The WARN_ON is accurate though. The original patch fails to fix even the
> >limited aspect of the bug it claimed to.
> 
> That is not true. It only makes it a bit more limited, and not by
> its fault even. Even with that it makes things a bit better, not
> worse.

It makes the code worse for very limited improvement, for which we did
not have a publicly reported bug, i.e. the impact is very small.
 
> And does not impede your VMA rewrite at all. For which I did offer
> help to review as you send out in manageable chunks.

I have been.
-Chris
Daniel Stone Nov. 24, 2015, 1:29 p.m. UTC | #5
Hi,

On 24 November 2015 at 13:27, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Nov 24, 2015 at 01:17:57PM +0000, Tvrtko Ursulin wrote:
>> On 24/11/15 12:53, Chris Wilson wrote:
>> >The WARN_ON is accurate though. The original patch fails to fix even the
>> >limited aspect of the bug it claimed to.
>>
>> That is not true. It only makes it a bit more limited, and not by
>> its fault even. Even with that it makes things a bit better, not
>> worse.
>
> It makes the code worse for very limited improvement, for which we did
> not have a publicly reported bug, i.e. the impact is very small.

I can get the person who reported it to me to raise a Bugzilla
complaining about the WARN_ON if you like ...

Cheers,
Daniel
Tvrtko Ursulin Nov. 24, 2015, 1:56 p.m. UTC | #6
On 24/11/15 13:27, Chris Wilson wrote:
> On Tue, Nov 24, 2015 at 01:17:57PM +0000, Tvrtko Ursulin wrote:
>>
>> On 24/11/15 12:53, Chris Wilson wrote:
>>> On Tue, Nov 24, 2015 at 11:58:22AM +0100, Daniel Vetter wrote:
>>>> Queued for -next, thanks for the patch.
>>>
>>> The WARN_ON is accurate though. The original patch fails to fix even the
>>> limited aspect of the bug it claimed to.
>>
>> That is not true. It only makes it a bit more limited, and not by
>> its fault even. Even with that it makes things a bit better, not
>> worse.
>
> It makes the code worse for very limited improvement, for which we did
> not have a publicly reported bug, i.e. the impact is very small.

Well, the impact was huge for Android userspace, but you are probably
right that no BZ was created for it.

It was somewhat related to
https://bugs.freedesktop.org/show_bug.cgi?id=87477 on small memory
configurations, if I remember correctly, although that was never
properly captured there, nor was a new entry forked off.

On the other hand, we did add an IGT for it,
gem_ppgtt/flink-and-close-vma-leak, so I don't think your argument is fair.

Especially if the rewrite of it all is imminent: the worse code, even
if you think it is so much worse (which I disagree with), would only be
there temporarily.

And the memory leak was real even with fbcon and Xorg which I am sure 
you know.

>> And does not impede your VMA rewrite at all. For which I did offer
>> help to review as you send out in manageable chunks.
>
> I have been.

And I have reviewed some, no? Feel free to ping me if I missed some.

Regards,

Tvrtko
Daniel Vetter Nov. 24, 2015, 1:59 p.m. UTC | #7
On Tue, Nov 24, 2015 at 01:29:07PM +0000, Daniel Stone wrote:
> Hi,
> 
> On 24 November 2015 at 13:27, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Tue, Nov 24, 2015 at 01:17:57PM +0000, Tvrtko Ursulin wrote:
> >> On 24/11/15 12:53, Chris Wilson wrote:
> >> >The WARN_ON is accurate though. The original patch fails to fix even the
> >> >limited aspect of the bug it claimed to.
> >>
> >> That is not true. It only makes it a bit more limited, and not by
> >> its fault even. Even with that it makes things a bit better, not
> >> worse.
> >
> > It makes the code worse for very limited improvement, for which we did
> > not have a publicly reported bug, i.e. the impact is very small.
> 
> I can get the person who reported it to me to raise a Bugzilla
> complaining about the WARN_ON if you like ...

This is about the original bug, whose bugfix resulted in the
WARN_ON now being removed here. The underlying problem (I think, it's a
maze) is that our vma active tracking is a bit ... underwhelming.
-Daniel
Daniel Stone Nov. 24, 2015, 2:18 p.m. UTC | #8
Hey,

On 24 November 2015 at 13:59, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Nov 24, 2015 at 01:29:07PM +0000, Daniel Stone wrote:
>> On 24 November 2015 at 13:27, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > On Tue, Nov 24, 2015 at 01:17:57PM +0000, Tvrtko Ursulin wrote:
>> >> On 24/11/15 12:53, Chris Wilson wrote:
>> >> >The WARN_ON is accurate though. The original patch fails to fix even the
>> >> >limited aspect of the bug it claimed to.
>> >>
>> >> That is not true. It only makes it a bit more limited, and not by
>> >> its fault even. Even with that it makes things a bit better, not
>> >> worse.
>> >
>> > It makes the code worse for very limited improvement, for which we did
>> > not have a publicly reported bug, i.e. the impact is very small.
>>
>> I can get the person who reported it to me to raise a Bugzilla
>> complaining about the WARN_ON if you like ...
>
> This is about the original bug, whose bugfix resulted in the
> WARN_ON now being removed here. The underlying problem (I think, it's a
> maze) is that our vma active tracking is a bit ... underwhelming.

Sure, which is fair enough, but OTOH is there an actual plan for
redoing the VMA tracking?

Cheers,
Daniel

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 204dc7c0b2d6..59dba318213e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -141,8 +141,6 @@  static void i915_gem_context_clean(struct intel_context *ctx)
 	if (!ppgtt)
 		return;
 
-	WARN_ON(!list_empty(&ppgtt->base.active_list));
-
 	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
 				 mm_list) {
 		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
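After this patch, `i915_gem_context_clean()` only walks the VM's inactive list and unbinds each VMA on it; VMAs still on the active list are not visited. The userspace sketch below reproduces that shape with a minimal stand-in for `<linux/list.h>`. `model_vma`, `model_context_clean()` and the `bound` flag (standing in for `__i915_vma_unbind_no_wait()`) are hypothetical. The loop caches the next pointer before unlinking the current entry, which is presumably why the kernel code uses `list_for_each_entry_safe()`: unbinding takes the VMA off the list being walked.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's intrusive list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

/* Hypothetical VMA: linked on its VM's active or inactive list. */
struct model_vma {
	struct list_head mm_list;
	int bound;
};

#define vma_of(p) \
	((struct model_vma *)((char *)(p) - offsetof(struct model_vma, mm_list)))

/*
 * Post-patch shape of the cleanup: unbind only VMAs on the inactive list.
 * "Safe" iteration: next is saved before the current entry is unlinked.
 * Returns the number of VMAs unbound.
 */
static int model_context_clean(struct list_head *inactive)
{
	int n = 0;

	for (struct list_head *pos = inactive->next, *next = pos->next;
	     pos != inactive; pos = next, next = pos->next) {
		struct model_vma *vma = vma_of(pos);

		vma->bound = 0;          /* stand-in for the unbind */
		list_del(&vma->mm_list); /* entry leaves the list being walked */
		n++;
	}
	return n;
}
```

Calling `model_context_clean()` on a VM with two inactive VMAs and one active VMA unbinds the two inactive ones and leaves the active one bound and on its list.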