Message ID | 20220407164532.1242578-2-matthew.auld@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] drm/i915: fix broken build |
On Thu, Apr 07, 2022 at 05:45:32PM +0100, Matthew Auld wrote:
>All of CI is just failing with the following, which prevents loading of
>the module:
>
>    i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>
>Best guess is that this comes from the pin_map() for the scratch page,
>which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>like this now calls into dma_resv_wait_timeout() which can return the
>remaining timeout, leading to the caller thinking this is an error.
>
>Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Christian König <christian.koenig@amd.com>
>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

This indeed brings CI back to life.

Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>

thanks
Lucas De Marchi

On Thu, Apr 07, 2022 at 05:45:32PM +0100, Matthew Auld wrote:
>All of CI is just failing with the following, which prevents loading of
>the module:
>
>    i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>
>Best guess is that this comes from the pin_map() for the scratch page,
>which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>like this now calls into dma_resv_wait_timeout() which can return the
>remaining timeout, leading to the caller thinking this is an error.
>
>Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Christian König <christian.koenig@amd.com>
>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>---
> drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>index 2998d895a6b3..1c88d4121658 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>@@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
> int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> 				      bool intr)
> {
>+	long ret;
>+
> 	assert_object_held(obj);
>-	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>-				     intr, MAX_SCHEDULE_TIMEOUT);
>+
>+	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>+				    intr, MAX_SCHEDULE_TIMEOUT);
>+
>+	return ret < 0 ? ret : 0;

shouldn't == 0 also be an error since it would be a timeout?

Lucas De Marchi

On 08/04/2022 06:00, Lucas De Marchi wrote:
> On Thu, Apr 07, 2022 at 05:45:32PM +0100, Matthew Auld wrote:
>> All of CI is just failing with the following, which prevents loading of
>> the module:
>>
>>     i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>
>> Best guess is that this comes from the pin_map() for the scratch page,
>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>> like this now calls into dma_resv_wait_timeout() which can return the
>> remaining timeout, leading to the caller thinking this is an error.
>>
>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 2998d895a6b3..1c88d4121658 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>>  int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>  				      bool intr)
>>  {
>> +	long ret;
>> +
>>  	assert_object_held(obj);
>> -	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>> -				     intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>> +				    intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +	return ret < 0 ? ret : 0;
>
> shouldn't == 0 also be an error since it would be a timeout?

Hmm, I guess so...

>
> Lucas De Marchi

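The return convention raised in this exchange can be sketched as below. dma_resv_wait_timeout() returns a negative errno on failure, 0 if the wait timed out, and a positive remaining timeout on success. The helper name wait_result_to_errno() is hypothetical, and reporting a timeout as -ETIME is an assumption made for illustration, not something settled in this thread. The sketch is plain userspace C, not kernel code.

```c
#include <errno.h>

/*
 * Hypothetical helper, not part of i915: collapse a
 * dma_resv_wait_timeout()-style result into 0 or a negative errno.
 */
static int wait_result_to_errno(long ret)
{
	if (ret < 0)
		return (int)ret;	/* wait failed, e.g. it was interrupted */
	if (ret == 0)
		return -ETIME;		/* assumption: report a timeout as -ETIME */
	return 0;			/* positive: remaining timeout, i.e. success */
}
```

With MAX_SCHEDULE_TIMEOUT the wait is effectively unbounded, so the zero branch should rarely if ever be taken; handling it keeps the wrapper correct if a finite timeout is ever passed.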
On 07/04/2022 17:45, Matthew Auld wrote:
> All of CI is just failing with the following, which prevents loading of
> the module:
>
>     i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>
> Best guess is that this comes from the pin_map() for the scratch page,
> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
> like this now calls into dma_resv_wait_timeout() which can return the
> remaining timeout, leading to the caller thinking this is an error.
>
> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")

Did this one go in bypassing i915 CI, merged via drm-misc-next? If so,
I think it's the 2nd large disruption to i915 CI flows recently, so the
lesson here is to try not to bypass i915 CI when merging i915 patches.

In this particular example, unless there were merge conflicts causing
the series not to apply against drm-tip, it should have been doable to
copy intel-gfx on all patches and so get the CI results. (Even if just
with --subject-prefix=CI && --suppress-cc=all before merging.)

The second question is which branch to merge through, on which I think
i915 maintainers would have liked to be consulted.

Regards,

Tvrtko

> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 2998d895a6b3..1c88d4121658 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>  int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>  				      bool intr)
>  {
> +	long ret;
> +
>  	assert_object_held(obj);
> -	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
> -				     intr, MAX_SCHEDULE_TIMEOUT);
> +
> +	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
> +				    intr, MAX_SCHEDULE_TIMEOUT);
> +
> +	return ret < 0 ? ret : 0;
>  }
>
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)

On 08.04.22 at 11:08, Tvrtko Ursulin wrote:
>
> On 07/04/2022 17:45, Matthew Auld wrote:
>> All of CI is just failing with the following, which prevents loading of
>> the module:
>>
>>     i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>
>> Best guess is that this comes from the pin_map() for the scratch page,
>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>> like this now calls into dma_resv_wait_timeout() which can return the
>> remaining timeout, leading to the caller thinking this is an error.
>>
>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>
> Did this one go in bypassing i915 CI, merged via drm-misc-next? If so,
> I think it's the 2nd large disruption to i915 CI flows recently, so the
> lesson here is to try not to bypass i915 CI when merging i915 patches.
>
> In this particular example, unless there were merge conflicts causing
> the series not to apply against drm-tip, it should have been doable to
> copy intel-gfx on all patches and so get the CI results. (Even if just
> with --subject-prefix=CI && --suppress-cc=all before merging.)

Exactly that was the problem. I didn't get any usable CI results for
this set because it always caused merge conflicts between i915 and
drm-misc-next in drm-tip.

Regards,
Christian.

>
> The second question is which branch to merge through, on which I think
> i915 maintainers would have liked to be consulted.
>
> Regards,
>
> Tvrtko
>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 2998d895a6b3..1c88d4121658 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>>  int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>  				      bool intr)
>>  {
>> +	long ret;
>> +
>>  	assert_object_held(obj);
>> -	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>> -				     intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>> +				    intr, MAX_SCHEDULE_TIMEOUT);
>> +
>> +	return ret < 0 ? ret : 0;
>>  }
>>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)

On 08/04/2022 10:12, Christian König wrote:
> On 08.04.22 at 11:08, Tvrtko Ursulin wrote:
>>
>> On 07/04/2022 17:45, Matthew Auld wrote:
>>> All of CI is just failing with the following, which prevents loading of
>>> the module:
>>>
>>>     i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>>
>>> Best guess is that this comes from the pin_map() for the scratch page,
>>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>>> like this now calls into dma_resv_wait_timeout() which can return the
>>> remaining timeout, leading to the caller thinking this is an error.
>>>
>>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>>
>> Did this one go in bypassing i915 CI, merged via drm-misc-next? If so,
>> I think it's the 2nd large disruption to i915 CI flows recently, so the
>> lesson here is to try not to bypass i915 CI when merging i915 patches.
>>
>> In this particular example, unless there were merge conflicts causing
>> the series not to apply against drm-tip, it should have been doable to
>> copy intel-gfx on all patches and so get the CI results. (Even if just
>> with --subject-prefix=CI && --suppress-cc=all before merging.)
>
> Exactly that was the problem. I didn't get any usable CI results for
> this set because it always caused merge conflicts between i915 and
> drm-misc-next in drm-tip.

Then a staged approach should be used. First merge the core stuff, and
when we backmerge to drm-intel(-gt)-next, send the i915 parts out.

Because the knock-on effect of such a large CI fire on the many people
on our side is very significant.

Regards,

Tvrtko

>
> Regards,
> Christian.
>
>>
>> The second question is which branch to merge through, on which I think
>> i915 maintainers would have liked to be consulted.
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> ---
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 2998d895a6b3..1c88d4121658 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>>>  int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>>  				      bool intr)
>>>  {
>>> +	long ret;
>>> +
>>>  	assert_object_held(obj);
>>> -	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>>> -				     intr, MAX_SCHEDULE_TIMEOUT);
>>> +
>>> +	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>>> +				    intr, MAX_SCHEDULE_TIMEOUT);
>>> +
>>> +	return ret < 0 ? ret : 0;
>>>  }
>>>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>

On 08.04.22 at 11:23, Tvrtko Ursulin wrote:
>
> On 08/04/2022 10:12, Christian König wrote:
>> On 08.04.22 at 11:08, Tvrtko Ursulin wrote:
>>>
>>> On 07/04/2022 17:45, Matthew Auld wrote:
>>>> All of CI is just failing with the following, which prevents loading of
>>>> the module:
>>>>
>>>>     i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
>>>>
>>>> Best guess is that this comes from the pin_map() for the scratch page,
>>>> which does an i915_gem_object_wait_moving_fence() somewhere. It looks
>>>> like this now calls into dma_resv_wait_timeout() which can return the
>>>> remaining timeout, leading to the caller thinking this is an error.
>>>>
>>>> Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
>>>
>>> Did this one go in bypassing i915 CI, merged via drm-misc-next? If so,
>>> I think it's the 2nd large disruption to i915 CI flows recently, so the
>>> lesson here is to try not to bypass i915 CI when merging i915 patches.
>>>
>>> In this particular example, unless there were merge conflicts causing
>>> the series not to apply against drm-tip, it should have been doable to
>>> copy intel-gfx on all patches and so get the CI results. (Even if just
>>> with --subject-prefix=CI && --suppress-cc=all before merging.)
>>
>> Exactly that was the problem. I didn't get any usable CI results for
>> this set because it always caused merge conflicts between i915 and
>> drm-misc-next in drm-tip.
>
> Then a staged approach should be used. First merge the core stuff, and
> when we backmerge to drm-intel(-gt)-next, send the i915 parts out.
>
> Because the knock-on effect of such a large CI fire on the many people
> on our side is very significant.

Sorry for that. I thought we had everything covered in drm-tip, but it
looks like it still broke.

BTW: Why is the CI system failing?

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> The second question is which branch to merge through, on which I
>>> think i915 maintainers would have liked to be consulted.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> ---
>>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> index 2998d895a6b3..1c88d4121658 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> @@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>>>>  int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>>>  				      bool intr)
>>>>  {
>>>> +	long ret;
>>>> +
>>>>  	assert_object_held(obj);
>>>> -	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>>>> -				     intr, MAX_SCHEDULE_TIMEOUT);
>>>> +
>>>> +	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
>>>> +				    intr, MAX_SCHEDULE_TIMEOUT);
>>>> +
>>>> +	return ret < 0 ? ret : 0;
>>>>  }
>>>>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 2998d895a6b3..1c88d4121658 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -772,9 +772,14 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
 int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				      bool intr)
 {
+	long ret;
+
 	assert_object_held(obj);
-	return dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
-				     intr, MAX_SCHEDULE_TIMEOUT);
+
+	ret = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_KERNEL,
+				    intr, MAX_SCHEDULE_TIMEOUT);
+
+	return ret < 0 ? ret : 0;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)

All of CI is just failing with the following, which prevents loading of
the module:

    i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed

Best guess is that this comes from the pin_map() for the scratch page,
which does an i915_gem_object_wait_moving_fence() somewhere. It looks
like this now calls into dma_resv_wait_timeout() which can return the
remaining timeout, leading to the caller thinking this is an error.

Fixes: 1d7f5e6c5240 ("drm/i915: drop bo->moving dependency")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
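
To illustrate the failure mode described in the commit message, here is a minimal userspace sketch. The function names and the sample value 100 are hypothetical stand-ins; the only behaviour taken from the patch is that the old code passed the wait result straight through, while the new code returns only negative values and maps everything else to 0.

```c
/* Old shape: the wait result is returned unchanged (narrowed to int). */
static int wait_passthrough(long wait_ret)
{
	return (int)wait_ret;
}

/* New shape from the patch: only negative values are passed on as errors. */
static int wait_fixed(long wait_ret)
{
	return wait_ret < 0 ? (int)wait_ret : 0;
}

/* Hypothetical caller following the usual "non-zero means error" idiom. */
static int caller_sees_error(int err)
{
	return err != 0;
}
```

A positive remaining timeout such as 100 makes caller_sees_error(wait_passthrough(100)) true, which matches the spurious "Scratch setup failed" symptom, while caller_sees_error(wait_fixed(100)) is false.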