diff mbox series

[2/2] drm/i915: fix i915_gem_object_wait_moving_fence

Message ID 20220407164532.1242578-2-matthew.auld@intel.com (mailing list archive)
State New, archived
Series [1/2] drm/i915: fix broken build

Commit Message

Matthew Auld April 7, 2022, 4:45 p.m. UTC
All of CI is just failing with the following, which prevents loading of
the module:

    i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed

Best guess is that this comes from the pin_map() for the scratch page,
which does an i915_gem_object_wait_moving_fence() somewhere. It looks
like this now calls into dma_resv_wait_timeout() which can return the
remaining timeout, leading to the caller thinking this is an error.

Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Lucas De Marchi April 8, 2022, 1:44 a.m. UTC | #1
On Thu, Apr 07, 2022 at 05:45:32PM +0100, Matthew Auld wrote:
>All of CI is just failing with the following, which prevents loading of
>the module:
>
>    i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>
>Best guess is that this comes from the pin_map() for the scratch page,
>which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>like this now calls into dma_resv_wait_timeout() which can return the
>remaining timeout, leading to the caller thinking this is an error.
>
>Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Christian König <christian.koenig@amd.com>
>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

This indeed brings CI back to life.


Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>


thanks
Lucas De Marchi
Lucas De Marchi April 8, 2022, 5 a.m. UTC | #2
On Thu, Apr 07, 2022 at 05:45:32PM +0100, Matthew Auld wrote:
>All of CI is just failing with the following, which prevents loading of
>the module:
>
>    i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>
>Best guess is that this comes from the pin_map() for the scratch page,
>which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>like this now calls into dma_resv_wait_timeout() which can return the
>remaining timeout, leading to the caller thinking this is an error.
>
>Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Christian König <christian.koenig@amd.com>
>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>---
> drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>index 2998d895a6b3..1c88d4121658 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>@@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
> int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> 				      bool intr)
> {
>+	long ret;
>+
> 	assert_object_held(obj);
>-	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>-				     intr, MAX_SCHEDULE_TIMEOUT);
>+
>+	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>+				    intr, MAX_SCHEDULE_TIMEOUT);
>+
>+	return ret < 0 ? ret : 0;

Shouldn't == 0 also be an error, since it would be a timeout?

Lucas De Marchi
Matthew Auld April 8, 2022, 8:13 a.m. UTC | #3
On 08/04/2022 06:00, Lucas De Marchi wrote:
> On Thu, Apr 07, 2022 at 05:45:32PM +0100, Matthew Auld wrote:
>> All of CI is just failing with the following, which prevents loading of
>> the module:
>>
>>    i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>
>> Best guess is that this comes from the pin_map() for the scratch page,
>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>> like this now calls into dma_resv_wait_timeout() which can return the
>> remaining timeout, leading to the caller thinking this is an error.
>>
>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 2998d895a6b3..1c88d4121658 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct 
>> drm_i915_gem_object *obj,
>> int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>                       bool intr)
>> {
>> +    long ret;
>> +
>>     assert_object_held(obj);
>> -    return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>> -                     intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +    ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>> +                    intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +    return ret < 0 ? ret : 0;
> 
> shouldn't == 0 also be an error since it would be a timeout?

Hmm, I guess so...

> 
> Lucas De Marchi
Tvrtko Ursulin April 8, 2022, 9:08 a.m. UTC | #4
On 07/04/2022 17:45, Matthew Auld wrote:
> All of CI is just failing with the following, which prevents loading of
> the module:
> 
>      i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
> 
> Best guess is that this comes from the pin_map() for the scratch page,
> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
> like this now calls into dma_resv_wait_timeout() which can return the
> remaining timeout, leading to the caller thinking this is an error.
> 
> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")

Did this one go in via drm-misc-next, bypassing i915 CI? If so, I think
it's the second large disruption to i915 CI flows recently, so the lesson
here is to try not to bypass i915 CI when merging i915 patches.

In this particular example, unless there were merge conflicts causing 
the series not to apply against drm-tip, it should have been doable to 
copy intel-gfx on all patches and so get the CI results. (Even if just 
with --subject-prefix=CI && --suppress-cc=all before merging.)

The second question is which branch to merge through, on which I think 
i915 maintainers would have liked to be consulted.

Regards,

Tvrtko

> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 2998d895a6b3..1c88d4121658 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>   				      bool intr)
>   {
> +	long ret;
> +
>   	assert_object_held(obj);
> -	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
> -				     intr, MAX_SCHEDULE_TIMEOUT);
> +
> +	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
> +				    intr, MAX_SCHEDULE_TIMEOUT);
> +
> +	return ret < 0 ? ret : 0;
>   }
>   
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
Christian König April 8, 2022, 9:12 a.m. UTC | #5
Am 08.04.22 um 11:08 schrieb Tvrtko Ursulin:
>
> On 07/04/2022 17:45, Matthew Auld wrote:
>> All of CI is just failing with the following, which prevents loading of
>> the module:
>>
>>      i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>
>> Best guess is that this comes from the pin_map() for the scratch page,
>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>> like this now calls into dma_resv_wait_timeout() which can return the
>> remaining timeout, leading to the caller thinking this is an error.
>>
>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>
> Has this one went in bypassing i915 CI and merged via drm-misc-next? 
> If so I think it's the 2nd large disruption to i915 CI flows recently 
> so the lesson here is try not to bypass i915 CI when merging i915 
> patches.
>
> In this particular example, unless there were merge conflicts causing 
> the series not to apply against drm-tip, it should have been doable to 
> copy intel-gfx on all patches and so get the CI results. (Even if just 
> with --subject-prefix=CI && --suppress-cc=all before merging.)

That was exactly the problem. I didn't get any usable CI results for
this set because it always caused merge conflicts between i915 and
drm-misc-next in drm-tip.

Regards,
Christian.

>
> The second question is which branch to merge through, on which I think 
> i915 maintainers would have liked to be consulted.
>
> Regards,
>
> Tvrtko
>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 2998d895a6b3..1c88d4121658 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct 
>> drm_i915_gem_object *obj,
>>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>                         bool intr)
>>   {
>> +    long ret;
>> +
>>       assert_object_held(obj);
>> -    return dma_resv_wait_timeout(obj->base.resv,
>> DMA_RESV_USAGE_KERNEL,
>> -                     intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +    ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>> +                    intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +    return ret < 0 ? ret : 0;
>>   }
>>     #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
Tvrtko Ursulin April 8, 2022, 9:23 a.m. UTC | #6
On 08/04/2022 10:12, Christian König wrote:
> Am 08.04.22 um 11:08 schrieb Tvrtko Ursulin:
>>
>> On 07/04/2022 17:45, Matthew Auld wrote:
>>> All of CI is just failing with the following, which prevents loading of
>>> the module:
>>>
>>>      i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>>
>>> Best guess is that this comes from the pin_map() for the scratch page,
>>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>>> like this now calls into dma_resv_wait_timeout() which can return the
>>> remaining timeout, leading to the caller thinking this is an error.
>>>
>>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>>
>> Has this one went in bypassing i915 CI and merged via drm-misc-next? 
>> If so I think it's the 2nd large disruption to i915 CI flows recently 
>> so the lesson here is try not to bypass i915 CI when merging i915 
>> patches.
>>
>> In this particular example, unless there were merge conflicts causing 
>> the series not to apply against drm-tip, it should have been doable to 
>> copy intel-gfx on all patches and so get the CI results. (Even if just 
>> with --subject-prefix=CI && --suppress-cc=all before merging.)
> 
> Exactly that was the problem. I didn't got any usable CI results for 
> this set because it always caused merge conflicts between i915 and 
> drm-misc-next in drm-tip.

Then a staged approach should be used. First merge the core stuff, and
when we backmerge to drm-intel(-gt)-next, send the i915 parts out.

The knock-on effect of such a large CI fire on so many people on our
side is very significant.

Regards,

Tvrtko

> 
> Regards,
> Christian.
> 
>>
>> The second question is which branch to merge through, on which I think 
>> i915 maintainers would have liked to be consulted.
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 2998d895a6b3..1c88d4121658 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct 
>>> drm_i915_gem_object *obj,
>>>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>>                         bool intr)
>>>   {
>>> +    long ret;
>>> +
>>>       assert_object_held(obj);
>>> -    return dma_resv_wait_timeout(obj->base.resv,
>>> DMA_RESV_USAGE_KERNEL,
>>> -                     intr, MAX_SCHEDULE_TIMEOUT);
>>> +
>>> +    ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>>> +                    intr, MAX_SCHEDULE_TIMEOUT);
>>> +
>>> +    return ret < 0 ? ret : 0;
>>>   }
>>>     #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>
Christian König April 8, 2022, 9:29 a.m. UTC | #7
Am 08.04.22 um 11:23 schrieb Tvrtko Ursulin:
>
> On 08/04/2022 10:12, Christian König wrote:
>> Am 08.04.22 um 11:08 schrieb Tvrtko Ursulin:
>>>
>>> On 07/04/2022 17:45, Matthew Auld wrote:
>>>> All of CI is just failing with the following, which prevents 
>>>> loading of
>>>> the module:
>>>>
>>>>      i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>>>
>>>> Best guess is that this comes from the pin_map() for the scratch page,
>>>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>>>> like this now calls into dma_resv_wait_timeout() which can return the
>>>> remaining timeout, leading to the caller thinking this is an error.
>>>>
>>>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>>>
>>> Has this one went in bypassing i915 CI and merged via drm-misc-next? 
>>> If so I think it's the 2nd large disruption to i915 CI flows 
>>> recently so the lesson here is try not to bypass i915 CI when 
>>> merging i915 patches.
>>>
>>> In this particular example, unless there were merge conflicts 
>>> causing the series not to apply against drm-tip, it should have been 
>>> doable to copy intel-gfx on all patches and so get the CI results. 
>>> (Even if just with --subject-prefix=CI && --suppress-cc=all before 
>>> merging.)
>>
>> Exactly that was the problem. I didn't got any usable CI results for 
>> this set because it always caused merge conflicts between i915 and 
>> drm-misc-next in drm-tip.
>
> Then a staged approach should be used. First merge the core stuff and 
> when we backmerge to drm-intel(-gt)-next send the i915 parts out.
>
> Because knock on effect of such large of a CI fire too many many 
> people on our side is very significant.

Sorry about that. I thought we had everything covered in drm-tip, but
it looks like it still broke.

BTW: Why is the CI system failing?

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> The second question is which branch to merge through, on which I 
>>> think i915 maintainers would have liked to be consulted.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> index 2998d895a6b3..1c88d4121658 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct 
>>>> drm_i915_gem_object *obj,
>>>>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object 
>>>> *obj,
>>>>                         bool intr)
>>>>   {
>>>> +    long ret;
>>>> +
>>>>       assert_object_held(obj);
>>>> -    return dma_resv_wait_timeout(obj->base.resv,
>>>> DMA_RESV_USAGE_KERNEL,
>>>> -                     intr, MAX_SCHEDULE_TIMEOUT);
>>>> +
>>>> +    ret = dma_resv_wait_timeout(obj->base.resv,
>>>> DMA_RESV_USAGE_KERNEL,
>>>> +                    intr, MAX_SCHEDULE_TIMEOUT);
>>>> +
>>>> +    return ret < 0 ? ret : 0;
>>>>   }
>>>>     #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 2998d895a6b3..1c88d4121658 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -772,9 +772,14 @@  int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
 int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				      bool intr)
 {
+	long ret;
+
 	assert_object_held(obj);
-	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
-				     intr, MAX_SCHEDULE_TIMEOUT);
+
+	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
+				    intr, MAX_SCHEDULE_TIMEOUT);
+
+	return ret < 0 ? ret : 0;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
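
For readers following the return-value discussion in this thread, here is a
minimal userspace sketch of the two mappings under debate. It assumes the
convention described in the commit message and in comment #2: the wait
returns a negative errno on failure, 0 on timeout, and a positive remaining
timeout on success. The helper names wait_result_to_errno() and
wait_result_to_errno_strict() are hypothetical and are not part of i915 or
dma-resv. The strict variant only illustrates the "== 0 is a timeout"
suggestion and is not necessarily what was merged.

```c
#include <errno.h>

/*
 * Hypothetical helper mirroring the patch as posted: any non-negative
 * wait result (timeout or remaining time) is reported as success, and
 * negative values are passed through as errors.
 */
int wait_result_to_errno(long ret)
{
	return ret < 0 ? (int)ret : 0;
}

/*
 * Hypothetical stricter variant following the review suggestion: a
 * result of 0 is treated as a timeout and reported as -ETIME, while a
 * positive remaining timeout still maps to 0 (success).
 */
int wait_result_to_errno_strict(long ret)
{
	if (ret == 0)
		return -ETIME;

	return ret < 0 ? (int)ret : 0;
}
```

The two variants differ only for a result of exactly 0. Since the patch
waits with MAX_SCHEDULE_TIMEOUT, that case should be very unlikely in
practice, so the stricter mapping would mostly be a defensive choice.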