diff mbox series

[1/2] drm/i915: Check activity on i915_vma after confirming pin_count==0

Message ID 20200123224459.38128-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915: Check activity on i915_vma after confirming pin_count==0 | expand

Commit Message

Chris Wilson Jan. 23, 2020, 10:44 p.m. UTC
Only assert that the i915_vma is now idle if and only if no other pins
are present. If another user has the i915_vma pinned, they may submit
more work to the i915_vma skipping the vm->mutex used to serialise the
unbind. We need to wait again, if we want to continue and unbind this
vma.

However, if we own the i915_vma (we hold the vm->mutex for the unbind
and the pin_count is 0), we can assert that the vma remains idle as we
unbind.

Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/530
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Tvrtko Ursulin Jan. 24, 2020, 9:12 a.m. UTC | #1
On 23/01/2020 22:44, Chris Wilson wrote:
> Only assert that the i915_vma is now idle if and only if no other pins
> are present. If another user has the i915_vma pinned, they may submit
> more work to the i915_vma skipping the vm->mutex used to serialise the
> unbind. We need to wait again, if we want to continue and unbind this
> vma.
> 
> However, if we own the i915_vma (we hold the vm->mutex for the unbind
> and the pin_count is 0), we can assert that the vma remains idle as we
> unbind.
> 
> Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/530
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_vma.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 306b951831fe..4999882fbceb 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -1202,16 +1202,26 @@ int __i915_vma_unbind(struct i915_vma *vma)
>   	if (ret)
>   		return ret;
>   
> -	GEM_BUG_ON(i915_vma_is_active(vma));
>   	if (i915_vma_is_pinned(vma)) {
>   		vma_print_allocator(vma, "is pinned");
>   		return -EAGAIN;
>   	}
>   
> -	GEM_BUG_ON(i915_vma_is_active(vma));
> +	/*
> +	 * After confirming that no one else is pinning this vma, wait for
> +	 * any laggards who may have crept in during the wait (through
> +	 * a residual pin skipping the vm->mutex) to complete.
> +	 */
> +	ret = i915_vma_sync(vma);
> +	if (ret)
> +		return ret;
> +
>   	if (!drm_mm_node_allocated(&vma->node))
>   		return 0;
>   
> +	GEM_BUG_ON(i915_vma_is_pinned(vma));
> +	GEM_BUG_ON(i915_vma_is_active(vma));
> +
>   	if (i915_vma_is_map_and_fenceable(vma)) {
>   		/*
>   		 * Check that we have flushed all writes through the GGTT
> 

GEM_BUG_ON in a sandwich of two syncs, which is similar to the while 
loop from an earlier version. Are you sure there are no races with this 
one? If pinned check is the main gate then wouldn't one sync after the 
pinned check be enough? Especially since in 2/2 you add another sync 
before taking the mutex.

Regards,

Tvrtko
Chris Wilson Jan. 24, 2020, 9:30 a.m. UTC | #2
Quoting Tvrtko Ursulin (2020-01-24 09:12:36)
> 
> On 23/01/2020 22:44, Chris Wilson wrote:
> > Only assert that the i915_vma is now idle if and only if no other pins
> > are present. If another user has the i915_vma pinned, they may submit
> > more work to the i915_vma skipping the vm->mutex used to serialise the
> > unbind. We need to wait again, if we want to continue and unbind this
> > vma.
> > 
> > However, if we own the i915_vma (we hold the vm->mutex for the unbind
> > and the pin_count is 0), we can assert that the vma remains idle as we
> > unbind.
> > 
> > Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
> > Closes: https://gitlab.freedesktop.org/drm/intel/issues/530
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_vma.c | 14 ++++++++++++--
> >   1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> > index 306b951831fe..4999882fbceb 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -1202,16 +1202,26 @@ int __i915_vma_unbind(struct i915_vma *vma)
> >       if (ret)
> >               return ret;
> >   
> > -     GEM_BUG_ON(i915_vma_is_active(vma));
> >       if (i915_vma_is_pinned(vma)) {
> >               vma_print_allocator(vma, "is pinned");
> >               return -EAGAIN;
> >       }
> >   
> > -     GEM_BUG_ON(i915_vma_is_active(vma));
> > +     /*
> > +      * After confirming that no one else is pinning this vma, wait for
> > +      * any laggards who may have crept in during the wait (through
> > +      * a residual pin skipping the vm->mutex) to complete.
> > +      */
> > +     ret = i915_vma_sync(vma);
> > +     if (ret)
> > +             return ret;
> > +
> >       if (!drm_mm_node_allocated(&vma->node))
> >               return 0;
> >   
> > +     GEM_BUG_ON(i915_vma_is_pinned(vma));
> > +     GEM_BUG_ON(i915_vma_is_active(vma));
> > +
> >       if (i915_vma_is_map_and_fenceable(vma)) {
> >               /*
> >                * Check that we have flushed all writes through the GGTT
> > 
> 
> GEM_BUG_ON in a sandwich of two syncs, which is similar to the while 
> loop from an earlier version. Are you sure there are no races with this 
> one?

I'm confident that there are no races here with execbuf, but not so
confident that I'd remove the asserts!

> If pinned check is the main gate then wouldn't one sync after the 
> pinned check be enough?

Yes, considered it -- but the sync before the unpin is required to flush
other users so that the pin_count is stable. And then once stable, we
need to chase away the latecomers.

> Especially since in 2/2 you add another sync 
> before taking the mutex.

Yeah, but we don't always take that path, and we need to hold the mutex
to stabilise. So before the mutex, it's just an optimistic wait with no
guarantees for forward progress.
-Chris
Tvrtko Ursulin Jan. 24, 2020, 10:33 a.m. UTC | #3
On 24/01/2020 09:30, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-01-24 09:12:36)
>>
>> On 23/01/2020 22:44, Chris Wilson wrote:
>>> Only assert that the i915_vma is now idle if and only if no other pins
>>> are present. If another user has the i915_vma pinned, they may submit
>>> more work to the i915_vma skipping the vm->mutex used to serialise the
>>> unbind. We need to wait again, if we want to continue and unbind this
>>> vma.
>>>
>>> However, if we own the i915_vma (we hold the vm->mutex for the unbind
>>> and the pin_count is 0), we can assert that the vma remains idle as we
>>> unbind.
>>>
>>> Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
>>> Closes: https://gitlab.freedesktop.org/drm/intel/issues/530
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_vma.c | 14 ++++++++++++--
>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>> index 306b951831fe..4999882fbceb 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -1202,16 +1202,26 @@ int __i915_vma_unbind(struct i915_vma *vma)
>>>        if (ret)
>>>                return ret;
>>>    
>>> -     GEM_BUG_ON(i915_vma_is_active(vma));
>>>        if (i915_vma_is_pinned(vma)) {
>>>                vma_print_allocator(vma, "is pinned");
>>>                return -EAGAIN;
>>>        }
>>>    
>>> -     GEM_BUG_ON(i915_vma_is_active(vma));
>>> +     /*
>>> +      * After confirming that no one else is pinning this vma, wait for
>>> +      * any laggards who may have crept in during the wait (through
>>> +      * a residual pin skipping the vm->mutex) to complete.
>>> +      */
>>> +     ret = i915_vma_sync(vma);
>>> +     if (ret)
>>> +             return ret;
>>> +
>>>        if (!drm_mm_node_allocated(&vma->node))
>>>                return 0;
>>>    
>>> +     GEM_BUG_ON(i915_vma_is_pinned(vma));
>>> +     GEM_BUG_ON(i915_vma_is_active(vma));
>>> +
>>>        if (i915_vma_is_map_and_fenceable(vma)) {
>>>                /*
>>>                 * Check that we have flushed all writes through the GGTT
>>>
>>
>> GEM_BUG_ON in a sandwich of two syncs, which is similar to the while
>> loop from an earlier version. Are you sure there are no races with this
>> one?
> 
> I'm confident that there are no races here with execbuf, but not so
> confident that I'd remove the asserts!
> 
>> If pinned check is the main gate then wouldn't one sync after the
>> pinned check be enough?
> 
> Yes, considered it -- but the sync before the unpin is required to flush
> other users so that the pin_count is stable. And then once stable, we
> need to chase away the latecomers.
> 
>> Especially since in 2/2 you add another sync
>> before taking the mutex.
> 
> Yeah, but we don't always take that path, and we need to hold the mutex
> to stabilise. So before the mutex, it's just an optimistic wait with no
> guarantees for forward progress.

Okay, it's not dangerous or anything so worth a try even without 
bothering (on my side) to think of all possible paths.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Chris Wilson Jan. 24, 2020, 12:14 p.m. UTC | #4
Quoting Tvrtko Ursulin (2020-01-24 10:33:14)
> 
> On 24/01/2020 09:30, Chris Wilson wrote:
> Okay, it's not dangerous or anything so worth a try even without 
> bothering (on my side) to think of all possible paths.

We/I will need to look at why that i915_vma_sync() before the pin_count
is required as it seems to be possible to upset the shrinker when it
tries to unbind itself.

I hope CI is good enough to say yes/no.
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 306b951831fe..4999882fbceb 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1202,16 +1202,26 @@  int __i915_vma_unbind(struct i915_vma *vma)
 	if (ret)
 		return ret;
 
-	GEM_BUG_ON(i915_vma_is_active(vma));
 	if (i915_vma_is_pinned(vma)) {
 		vma_print_allocator(vma, "is pinned");
 		return -EAGAIN;
 	}
 
-	GEM_BUG_ON(i915_vma_is_active(vma));
+	/*
+	 * After confirming that no one else is pinning this vma, wait for
+	 * any laggards who may have crept in during the wait (through
+	 * a residual pin skipping the vm->mutex) to complete.
+	 */
+	ret = i915_vma_sync(vma);
+	if (ret)
+		return ret;
+
 	if (!drm_mm_node_allocated(&vma->node))
 		return 0;
 
+	GEM_BUG_ON(i915_vma_is_pinned(vma));
+	GEM_BUG_ON(i915_vma_is_active(vma));
+
 	if (i915_vma_is_map_and_fenceable(vma)) {
 		/*
 		 * Check that we have flushed all writes through the GGTT