Message ID | 20221111005651.4160369-1-daniele.ceraolospurio@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/huc: fix leak of debug object in huc load fence on driver unload |
Hi Daniele,

On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
>
> To fix the issue, make sure to always run the cleanup code.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> ---
>
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.

I *did* reproduce, and with this patch, I no longer reproduce. So:

Tested-by: Brian Norris <briannorris@chromium.org>

I see this differs very slightly from the draft version (which didn't
work for me):

https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53cf95@intel.com/

so presumably that diff is the fix.

Thanks a bunch!

Brian

>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
On 11/16/2022 5:29 PM, Brian Norris wrote:
> Hi Daniele,
>
> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>> The fence is always initialized in huc_init_early, but the cleanup in
>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>> the debug object when HuC is disabled/not supported, which can in turn
>> trigger a warning if we try to register a new debug offset at the same
>> address on driver reload.
>>
>> To fix the issue, make sure to always run the cleanup code.
>>
>> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Reported-by: Brian Norris <briannorris@chromium.org>
>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Brian Norris <briannorris@chromium.org>
>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>
>> Note: I didn't manage to repro the reported warning, but I did confirm
>> that we weren't correctly calling i915_sw_fence_fini and that this patch
>> fixes that.
> I *did* reproduce, and with this patch, I no longer reproduce. So:
>
> Tested-by: Brian Norris <briannorris@chromium.org>
>
> I see this differs very slightly from the draft version (which didn't
> work for me):
>
> https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53cf95@intel.com/
>
> so presumably that diff is the fix.

The extra diff makes the driver call the cleanup function even if HuC is
disabled, while the draft version just fixed the cleanup function without
making sure it was being called.

>
> Thanks a bunch!

Thanks for testing!

Daniele

>
> Brian
>
>>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>>  2 files changed, 8 insertions(+), 5 deletions(-)
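
Editorial illustration of the point made in the reply above (fixing a cleanup function versus making sure it is actually called). This is a simplified sketch, not the driver's code: `toy_ops`, `toy_fini()`, `toy_run_fini()` and the two tables are hypothetical names, and it assumes only that an ops table with no `fini` hook means the cleanup step is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table, loosely modelled on a per-mode set of hooks. */
struct toy_ops {
	void (*fini)(int *cleanups);
};

/* Hypothetical cleanup hook: it only records that it ran. */
static void toy_fini(int *cleanups)
{
	(*cleanups)++;
}

/* Draft-style "off" table: no fini hook, so even a corrected cleanup never runs. */
const struct toy_ops toy_ops_off_draft = { .fini = NULL };

/* Patch-style "off" table: the hook is wired up for the disabled case too. */
const struct toy_ops toy_ops_off_fixed = { .fini = toy_fini };

/* Call the hook only if the table provides one; return how many times it ran. */
int toy_run_fini(const struct toy_ops *ops)
{
	int cleanups = 0;

	assert(ops != NULL);
	if (ops->fini)
		ops->fini(&cleanups);
	return cleanups;
}
```

With these toy tables, `toy_run_fini(&toy_ops_off_draft)` never reaches the cleanup, while `toy_run_fini(&toy_ops_off_fixed)` does, which mirrors the role of the one-line `.fini` addition in the patch.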
On 11/10/2022 16:56, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
>
> To fix the issue, make sure to always run the cleanup code.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.
>
>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index fbc8bae14f76..83735a1528fe 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>
>  void intel_huc_fini(struct intel_huc *huc)
>  {
> -	if (!intel_uc_fw_is_loadable(&huc->fw))
> -		return;
> -
> +	/*
> +	 * the fence is initialized in init_early, so we need to clean it up
> +	 * even if HuC loading is off.
> +	 */
>  	delayed_huc_load_complete(huc);
> -
>  	i915_sw_fence_fini(&huc->delayed_load.fence);
> -	intel_uc_fw_fini(&huc->fw);
> +
> +	if (intel_uc_fw_is_loadable(&huc->fw))
> +		intel_uc_fw_fini(&huc->fw);
>  }
>
>  void intel_huc_suspend(struct intel_huc *huc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index dbd048b77e19..41f08b55790e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>
>  static const struct intel_uc_ops uc_ops_off = {
>  	.init_hw = __uc_check_hw,
> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>  };
>
>  static const struct intel_uc_ops uc_ops_on = {
On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
>
> To fix the issue, make sure to always run the cleanup code.

This is oopsing in CI now. Somehow the patchwork run did not
hit that oops.

>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> ---
>
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.
>
>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index fbc8bae14f76..83735a1528fe 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>
>  void intel_huc_fini(struct intel_huc *huc)
>  {
> -	if (!intel_uc_fw_is_loadable(&huc->fw))
> -		return;
> -
> +	/*
> +	 * the fence is initialized in init_early, so we need to clean it up
> +	 * even if HuC loading is off.
> +	 */
>  	delayed_huc_load_complete(huc);
> -
>  	i915_sw_fence_fini(&huc->delayed_load.fence);
> -	intel_uc_fw_fini(&huc->fw);
> +
> +	if (intel_uc_fw_is_loadable(&huc->fw))
> +		intel_uc_fw_fini(&huc->fw);
>  }
>
>  void intel_huc_suspend(struct intel_huc *huc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index dbd048b77e19..41f08b55790e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>
>  static const struct intel_uc_ops uc_ops_off = {
>  	.init_hw = __uc_check_hw,
> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>  };
>
>  static const struct intel_uc_ops uc_ops_on = {
> --
> 2.37.3
On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>> The fence is always initialized in huc_init_early, but the cleanup in
>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>> the debug object when HuC is disabled/not supported, which can in turn
>> trigger a warning if we try to register a new debug offset at the same
>> address on driver reload.
>>
>> To fix the issue, make sure to always run the cleanup code.
> This oopsing in ci now. Somehow the patchwork run did not
> hit that oops.

Can you point me to the oops log? I opened a few recent runs at random
but I wasn't able to find it.

Note that I did spot a potential issue that hits platforms that don't
have VCS engines (introduced by an MTL change to support HuC only on
the media GT), and I already have a fix for that on the ML:

https://patchwork.freedesktop.org/series/111288/

But without looking at the oops log or knowing which platform it was
on, I don't know if it's the same issue or not.

Daniele

>
>> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Reported-by: Brian Norris <briannorris@chromium.org>
>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Brian Norris <briannorris@chromium.org>
>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>
>> Note: I didn't manage to repro the reported warning, but I did confirm
>> that we weren't correctly calling i915_sw_fence_fini and that this patch
>> fixes that.
>>
>>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>>  2 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> index fbc8bae14f76..83735a1528fe 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>>
>>  void intel_huc_fini(struct intel_huc *huc)
>>  {
>> -	if (!intel_uc_fw_is_loadable(&huc->fw))
>> -		return;
>> -
>> +	/*
>> +	 * the fence is initialized in init_early, so we need to clean it up
>> +	 * even if HuC loading is off.
>> +	 */
>>  	delayed_huc_load_complete(huc);
>> -
>>  	i915_sw_fence_fini(&huc->delayed_load.fence);
>> -	intel_uc_fw_fini(&huc->fw);
>> +
>> +	if (intel_uc_fw_is_loadable(&huc->fw))
>> +		intel_uc_fw_fini(&huc->fw);
>>  }
>>
>>  void intel_huc_suspend(struct intel_huc *huc)
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index dbd048b77e19..41f08b55790e 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>>
>>  static const struct intel_uc_ops uc_ops_off = {
>>  	.init_hw = __uc_check_hw,
>> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>>  };
>>
>>  static const struct intel_uc_ops uc_ops_on = {
>> --
>> 2.37.3
On Mon, Nov 28, 2022 at 01:10:58AM -0800, Ceraolo Spurio, Daniele wrote:
>
>
> On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
> > On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> >> The fence is always initialized in huc_init_early, but the cleanup in
> >> huc_fini is only being run if HuC is enabled. This causes a leaking of
> >> the debug object when HuC is disabled/not supported, which can in turn
> >> trigger a warning if we try to register a new debug offset at the same
> >> address on driver reload.
> >>
> >> To fix the issue, make sure to always run the cleanup code.
> > This oopsing in ci now. Somehow the patchwork run did not
> > hit that oops.
>
> Can you point me to the oops log? I opened a few recent runs at random
> but I wasn't able to find it.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12425/fi-blb-e6850/igt@core_hotunplug@unbind-rebind.html
On 11/28/2022 5:08 AM, Ville Syrjälä wrote:
> On Mon, Nov 28, 2022 at 01:10:58AM -0800, Ceraolo Spurio, Daniele wrote:
>>
>> On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
>>> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>>>> The fence is always initialized in huc_init_early, but the cleanup in
>>>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>>>> the debug object when HuC is disabled/not supported, which can in turn
>>>> trigger a warning if we try to register a new debug offset at the same
>>>> address on driver reload.
>>>>
>>>> To fix the issue, make sure to always run the cleanup code.
>>> This oopsing in ci now. Somehow the patchwork run did not
>>> hit that oops.
>> Can you point me to the oops log? I opened a few recent runs at random
>> but I wasn't able to find it.
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12425/fi-blb-e6850/igt@core_hotunplug@unbind-rebind.html

Thanks, it's indeed the same issue (and I've just confirmed that the
pre-merge results for the fix do mention that this test is moving from
incomplete to pass). From just a visual inspection I thought the problem
would only affect MTL, which does have HuC but only on one of the 2 GTs,
but it looks like this also impacts platforms without HuC at all (as
long as they also have no VCS engines). I'll try to get the fix reviewed
and merged ASAP.

Thanks,
Daniele
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index fbc8bae14f76..83735a1528fe 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
 
 void intel_huc_fini(struct intel_huc *huc)
 {
-	if (!intel_uc_fw_is_loadable(&huc->fw))
-		return;
-
+	/*
+	 * the fence is initialized in init_early, so we need to clean it up
+	 * even if HuC loading is off.
+	 */
 	delayed_huc_load_complete(huc);
-
 	i915_sw_fence_fini(&huc->delayed_load.fence);
-	intel_uc_fw_fini(&huc->fw);
+
+	if (intel_uc_fw_is_loadable(&huc->fw))
+		intel_uc_fw_fini(&huc->fw);
 }
 
 void intel_huc_suspend(struct intel_huc *huc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index dbd048b77e19..41f08b55790e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
 
 static const struct intel_uc_ops uc_ops_off = {
 	.init_hw = __uc_check_hw,
+	.fini = __uc_fini, /* to clean-up the init_early initialization */
 };
 
 static const struct intel_uc_ops uc_ops_on = {
The fence is always initialized in huc_init_early, but the cleanup in
huc_fini is only run if HuC is enabled. This leaks the debug object when
HuC is disabled or not supported, which can in turn trigger a warning if
we try to register a new debug object at the same address on driver
reload.

To fix the issue, make sure to always run the cleanup code.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reported-by: Brian Norris <briannorris@chromium.org>
Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---

Note: I didn't manage to repro the reported warning, but I did confirm
that we weren't correctly calling i915_sw_fence_fini and that this patch
fixes that.

 drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
 2 files changed, 8 insertions(+), 5 deletions(-)
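
The failure mode described in the commit message can be illustrated with a small stand-alone sketch. It is not i915 code: `toy_fence`, `toy_debug_init()`, `toy_debug_free()` and the one-slot tracker are hypothetical stand-ins for the fence and for the kernel's debug-object tracking, assumed here only to show why an init without a matching fini leaves a stale entry that a later init at the same address trips over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fence embedded in the HuC structure. */
struct toy_fence {
	int pending;
};

/* Hypothetical one-slot tracker standing in for debug-object tracking. */
static const void *tracked_addr;
static int warnings;

/* Register an object; count a warning if its address is still tracked. */
static void toy_debug_init(const void *addr)
{
	if (tracked_addr == addr)
		warnings++;
	tracked_addr = addr;
}

/* Drop the tracking entry for an object that is going away. */
static void toy_debug_free(const void *addr)
{
	if (tracked_addr == addr)
		tracked_addr = NULL;
}

/* Always runs, like the early init step in the commit message. */
static void toy_init_early(struct toy_fence *f)
{
	assert(f != NULL);
	f->pending = 0;
	toy_debug_init(f);
}

/* Old behaviour: the whole cleanup is skipped when the feature is off. */
static void toy_fini_old(struct toy_fence *f, bool enabled)
{
	if (!enabled)
		return;
	toy_debug_free(f);
}

/* Fixed behaviour: the tracking entry is always released. */
static void toy_fini_fixed(struct toy_fence *f, bool enabled)
{
	(void)enabled; /* only feature-specific cleanup would depend on this */
	toy_debug_free(f);
}

/* Simulate load, unload and reload with the feature off; return warnings. */
int toy_reload_warnings(bool use_fixed_fini)
{
	static struct toy_fence fence; /* same address on every "load" */

	tracked_addr = NULL;
	warnings = 0;

	toy_init_early(&fence);
	if (use_fixed_fini)
		toy_fini_fixed(&fence, false);
	else
		toy_fini_old(&fence, false);
	toy_init_early(&fence); /* driver reload */

	toy_debug_free(&fence); /* leave the tracker clean for the next call */
	return warnings;
}
```

In this toy model the old cleanup path produces one warning on reload and the fixed path produces none, which matches the shape of the patch: the fence cleanup is made unconditional and only the firmware-specific cleanup stays behind the loadable check.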