drm/i915/huc: fix leak of debug object in huc load fence on driver unload

Message ID 20221111005651.4160369-1-daniele.ceraolospurio@intel.com (mailing list archive)
State New, archived
Series drm/i915/huc: fix leak of debug object in huc load fence on driver unload

Commit Message

Daniele Ceraolo Spurio Nov. 11, 2022, 12:56 a.m. UTC
The fence is always initialized in huc_init_early, but the cleanup in
huc_fini only runs if HuC is enabled. This leaks the debug object when
HuC is disabled or not supported, which can in turn trigger a warning
if we try to register a new debug object at the same address on driver
reload.

To fix the issue, make sure to always run the cleanup code.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reported-by: Brian Norris <briannorris@chromium.org>
Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---

Note: I didn't manage to repro the reported warning, but I did confirm
that we weren't correctly calling i915_sw_fence_fini and that this patch
fixes that.

 drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
 2 files changed, 8 insertions(+), 5 deletions(-)
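
The mismatch described in the commit message can be sketched in plain
userspace C. This is only an illustration under stated assumptions: the
toy_* types, the one-array tracker and reload_is_clean() are
hypothetical stand-ins, not the i915 or debugobjects code. It shows an
object that is always registered at init but only unregistered when the
firmware is loadable, and what happens when the same address is
registered again on "reload".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a debug-object tracker: a small address table. */
#define MAX_TRACKED 8
static const void *tracked[MAX_TRACKED];

/* Register addr; returns false (the "warning" case) if it is still tracked. */
static bool track_init(const void *addr)
{
	int free_slot = -1;

	assert(addr != NULL);
	for (int i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == addr)
			return false;
		if (tracked[i] == NULL && free_slot < 0)
			free_slot = i;
	}
	assert(free_slot >= 0);
	tracked[free_slot] = addr;
	return true;
}

/* Unregister addr if it is tracked. */
static void track_fini(const void *addr)
{
	for (int i = 0; i < MAX_TRACKED; i++)
		if (tracked[i] == addr)
			tracked[i] = NULL;
}

struct toy_fence { int pending; };
struct toy_huc { struct toy_fence fence; bool fw_loadable; };

/* The fence is registered unconditionally, as in the init_early path. */
static bool toy_huc_init_early(struct toy_huc *huc, bool loadable)
{
	huc->fw_loadable = loadable;
	huc->fence.pending = 0;
	return track_init(&huc->fence);
}

/* Pre-patch shape: the early return also skips the fence cleanup. */
static void toy_huc_fini_old(struct toy_huc *huc)
{
	if (!huc->fw_loadable)
		return;
	track_fini(&huc->fence);
}

/* Post-patch shape: fence cleanup always runs. */
static void toy_huc_fini_new(struct toy_huc *huc)
{
	track_fini(&huc->fence);
	/* firmware-only teardown would be guarded by fw_loadable here */
}

/* Load, unload, reload at the same address; true if the reload was clean. */
static bool reload_is_clean(void (*fini)(struct toy_huc *))
{
	static struct toy_huc huc; /* same address across "reloads" */
	bool ok;

	toy_huc_init_early(&huc, false); /* HuC disabled / not supported */
	fini(&huc);
	ok = toy_huc_init_early(&huc, false);
	track_fini(&huc.fence); /* reset the tracker for the next caller */
	return ok;
}
```

In this sketch reload_is_clean(toy_huc_fini_old) returns false because
the fence address is still tracked when it is registered again, while
reload_is_clean(toy_huc_fini_new) returns true.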

Comments

Brian Norris Nov. 17, 2022, 1:29 a.m. UTC | #1
Hi Daniele,

On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
> 
> To fix the issue, make sure to always run the cleanup code.
> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> ---
> 
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.

I *did* reproduce, and with this patch, I no longer reproduce. So:

Tested-by: Brian Norris <briannorris@chromium.org>

I see this differs very slightly from the draft version (which didn't
work for me):

https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53cf95@intel.com/

so presumably that diff is the fix.

Thanks a bunch!

Brian

>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
Daniele Ceraolo Spurio Nov. 17, 2022, 3:57 p.m. UTC | #2
On 11/16/2022 5:29 PM, Brian Norris wrote:
> Hi Daniele,
>
> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>> The fence is always initialized in huc_init_early, but the cleanup in
>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>> the debug object when HuC is disabled/not supported, which can in turn
>> trigger a warning if we try to register a new debug offset at the same
>> address on driver reload.
>>
>> To fix the issue, make sure to always run the cleanup code.
>>
>> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Reported-by: Brian Norris <briannorris@chromium.org>
>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Brian Norris <briannorris@chromium.org>
>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>
>> Note: I didn't manage to repro the reported warning, but I did confirm
>> that we weren't correctly calling i915_sw_fence_fini and that this patch
>> fixes that.
> I *did* reproduce, and with this patch, I no longer reproduce. So:
>
> Tested-by: Brian Norris <briannorris@chromium.org>
>
> I see this differs very slightly from the draft version (which didn't
> work for me):
>
> https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53cf95@intel.com/
>
> so presumably that diff is the fix.

The extra diff makes the driver call the cleanup function even if HuC is 
disabled, while the draft version just fixed the cleanup function 
without making sure it was being called.
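
To illustrate the difference, here is a minimal userspace sketch of an
ops table with an optional fini callback. It assumes the dispatch helper
silently skips a NULL callback; the toy_* names and
leaks_after_unload() are hypothetical and only loosely mirror the
uc_ops_off table in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_uc;

/* Ops table with an optional teardown hook. */
struct toy_uc_ops {
	void (*fini)(struct toy_uc *uc);
};

struct toy_uc {
	bool fence_initialized; /* set up early, regardless of the ops in use */
	const struct toy_uc_ops *ops;
};

/* The cleanup function itself: correct, but only useful if it gets called. */
static void toy_uc_fini(struct toy_uc *uc)
{
	uc->fence_initialized = false;
}

/* Dispatch helper: a NULL callback is treated as "nothing to do". */
static void toy_uc_call_fini(struct toy_uc *uc)
{
	assert(uc->ops != NULL);
	if (uc->ops->fini)
		uc->ops->fini(uc);
}

/* Draft shape: the "off" table has no fini, so the cleanup is never reached. */
static const struct toy_uc_ops toy_ops_off_draft = { .fini = NULL };

/* Fixed shape: the "off" table also wires up fini. */
static const struct toy_uc_ops toy_ops_off_fixed = { .fini = toy_uc_fini };

/* Returns true if the early-initialized state survives the unload path. */
static bool leaks_after_unload(const struct toy_uc_ops *ops)
{
	struct toy_uc uc = { .fence_initialized = true, .ops = ops };

	toy_uc_call_fini(&uc);
	return uc.fence_initialized;
}
```

With the draft table leaks_after_unload() returns true, since fixing
toy_uc_fini() alone changes nothing when no table entry points at it;
with the fixed table it returns false.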

>
> Thanks a bunch!

Thanks for testing!

Daniele

>
> Brian
>
>>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>>   drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>>   2 files changed, 8 insertions(+), 5 deletions(-)
John Harrison Nov. 22, 2022, 11:07 p.m. UTC | #3
On 11/10/2022 16:56, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
>
> To fix the issue, make sure to always run the cleanup code.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.
>
>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>   2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index fbc8bae14f76..83735a1528fe 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>   
>   void intel_huc_fini(struct intel_huc *huc)
>   {
> -	if (!intel_uc_fw_is_loadable(&huc->fw))
> -		return;
> -
> +	/*
> +	 * the fence is initialized in init_early, so we need to clean it up
> +	 * even if HuC loading is off.
> +	 */
>   	delayed_huc_load_complete(huc);
> -
>   	i915_sw_fence_fini(&huc->delayed_load.fence);
> -	intel_uc_fw_fini(&huc->fw);
> +
> +	if (intel_uc_fw_is_loadable(&huc->fw))
> +		intel_uc_fw_fini(&huc->fw);
>   }
>   
>   void intel_huc_suspend(struct intel_huc *huc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index dbd048b77e19..41f08b55790e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>   
>   static const struct intel_uc_ops uc_ops_off = {
>   	.init_hw = __uc_check_hw,
> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>   };
>   
>   static const struct intel_uc_ops uc_ops_on = {
Ville Syrjälä Nov. 25, 2022, 1:54 p.m. UTC | #4
On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
> 
> To fix the issue, make sure to always run the cleanup code.

This is oopsing in CI now. Somehow the patchwork run did not
hit that oops.

> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> ---
> 
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.
> 
>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index fbc8bae14f76..83735a1528fe 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>  
>  void intel_huc_fini(struct intel_huc *huc)
>  {
> -	if (!intel_uc_fw_is_loadable(&huc->fw))
> -		return;
> -
> +	/*
> +	 * the fence is initialized in init_early, so we need to clean it up
> +	 * even if HuC loading is off.
> +	 */
>  	delayed_huc_load_complete(huc);
> -
>  	i915_sw_fence_fini(&huc->delayed_load.fence);
> -	intel_uc_fw_fini(&huc->fw);
> +
> +	if (intel_uc_fw_is_loadable(&huc->fw))
> +		intel_uc_fw_fini(&huc->fw);
>  }
>  
>  void intel_huc_suspend(struct intel_huc *huc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index dbd048b77e19..41f08b55790e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>  
>  static const struct intel_uc_ops uc_ops_off = {
>  	.init_hw = __uc_check_hw,
> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>  };
>  
>  static const struct intel_uc_ops uc_ops_on = {
> -- 
> 2.37.3
Daniele Ceraolo Spurio Nov. 28, 2022, 9:10 a.m. UTC | #5
On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>> The fence is always initialized in huc_init_early, but the cleanup in
>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>> the debug object when HuC is disabled/not supported, which can in turn
>> trigger a warning if we try to register a new debug offset at the same
>> address on driver reload.
>>
>> To fix the issue, make sure to always run the cleanup code.
> This oopsing in ci now. Somehow the patchwork run did not
> hit that oops.

Can you point me to the oops log? I opened a few recent runs at random 
but I wasn't able to find it.
Note that I did spot a potential issue that hits platforms that don't 
have VCS engines (introduced due to a MTL change to support HuC only on 
the media GT) and I already have a fix for that on the ML:

https://patchwork.freedesktop.org/series/111288/

But without looking at the oops logs or knowing which platform it was 
on, I don't know if it's the same issue or not.

Daniele

>
>> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Reported-by: Brian Norris <briannorris@chromium.org>
>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Brian Norris <briannorris@chromium.org>
>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>
>> Note: I didn't manage to repro the reported warning, but I did confirm
>> that we weren't correctly calling i915_sw_fence_fini and that this patch
>> fixes that.
>>
>>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++++++-----
>>   drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>>   2 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> index fbc8bae14f76..83735a1528fe 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>>   
>>   void intel_huc_fini(struct intel_huc *huc)
>>   {
>> -	if (!intel_uc_fw_is_loadable(&huc->fw))
>> -		return;
>> -
>> +	/*
>> +	 * the fence is initialized in init_early, so we need to clean it up
>> +	 * even if HuC loading is off.
>> +	 */
>>   	delayed_huc_load_complete(huc);
>> -
>>   	i915_sw_fence_fini(&huc->delayed_load.fence);
>> -	intel_uc_fw_fini(&huc->fw);
>> +
>> +	if (intel_uc_fw_is_loadable(&huc->fw))
>> +		intel_uc_fw_fini(&huc->fw);
>>   }
>>   
>>   void intel_huc_suspend(struct intel_huc *huc)
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index dbd048b77e19..41f08b55790e 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>>   
>>   static const struct intel_uc_ops uc_ops_off = {
>>   	.init_hw = __uc_check_hw,
>> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>>   };
>>   
>>   static const struct intel_uc_ops uc_ops_on = {
>> -- 
>> 2.37.3
Ville Syrjälä Nov. 28, 2022, 1:08 p.m. UTC | #6
On Mon, Nov 28, 2022 at 01:10:58AM -0800, Ceraolo Spurio, Daniele wrote:
> 
> 
> On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
> > On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> >> The fence is always initialized in huc_init_early, but the cleanup in
> >> huc_fini is only being run if HuC is enabled. This causes a leaking of
> >> the debug object when HuC is disabled/not supported, which can in turn
> >> trigger a warning if we try to register a new debug offset at the same
> >> address on driver reload.
> >>
> >> To fix the issue, make sure to always run the cleanup code.
> > This oopsing in ci now. Somehow the patchwork run did not
> > hit that oops.
> 
> Can you point me to the oops log? I opened a few recent runs at random 
> but I wasn't able to find it.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12425/fi-blb-e6850/igt@core_hotunplug@unbind-rebind.html
Daniele Ceraolo Spurio Nov. 28, 2022, 4:32 p.m. UTC | #7
On 11/28/2022 5:08 AM, Ville Syrjälä wrote:
> On Mon, Nov 28, 2022 at 01:10:58AM -0800, Ceraolo Spurio, Daniele wrote:
>>
>> On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
>>> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>>>> The fence is always initialized in huc_init_early, but the cleanup in
>>>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>>>> the debug object when HuC is disabled/not supported, which can in turn
>>>> trigger a warning if we try to register a new debug offset at the same
>>>> address on driver reload.
>>>>
>>>> To fix the issue, make sure to always run the cleanup code.
>>> This oopsing in ci now. Somehow the patchwork run did not
>>> hit that oops.
>> Can you point me to the oops log? I opened a few recent runs at random
>> but I wasn't able to find it.
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12425/fi-blb-e6850/igt@core_hotunplug@unbind-rebind.html

Thanks, it's indeed the same issue (and I've just confirmed that the 
pre-merge results for the fix do mention that this test is moving from 
incomplete to pass). From just a visual inspection I thought the problem 
would only affect MTL, which does have HuC but only on one of the 2 GTs, 
but it looks like this also impacts platforms without HuC at all (as 
long as they also have no VCS engines). I'll try to get the fix reviewed 
and merged ASAP.

Thanks,
Daniele
Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index fbc8bae14f76..83735a1528fe 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -300,13 +300,15 @@  int intel_huc_init(struct intel_huc *huc)
 
 void intel_huc_fini(struct intel_huc *huc)
 {
-	if (!intel_uc_fw_is_loadable(&huc->fw))
-		return;
-
+	/*
+	 * the fence is initialized in init_early, so we need to clean it up
+	 * even if HuC loading is off.
+	 */
 	delayed_huc_load_complete(huc);
-
 	i915_sw_fence_fini(&huc->delayed_load.fence);
-	intel_uc_fw_fini(&huc->fw);
+
+	if (intel_uc_fw_is_loadable(&huc->fw))
+		intel_uc_fw_fini(&huc->fw);
 }
 
 void intel_huc_suspend(struct intel_huc *huc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index dbd048b77e19..41f08b55790e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -718,6 +718,7 @@  int intel_uc_runtime_resume(struct intel_uc *uc)
 
 static const struct intel_uc_ops uc_ops_off = {
 	.init_hw = __uc_check_hw,
+	.fini = __uc_fini, /* to clean-up the init_early initialization */
 };
 
 static const struct intel_uc_ops uc_ops_on = {