diff mbox series

[v2] drm/i915: do not clean GT table on error path

Message ID 20231115-dont_clean_gt_on_error_path-v2-1-54250125470a@intel.com (mailing list archive)
State New, archived
Headers show
Series [v2] drm/i915: do not clean GT table on error path | expand

Commit Message

Andrzej Hajda Nov. 15, 2023, 10:54 a.m. UTC
The only task of intel_gt_release_all is to zero gt table. Calling
it on error path prevents intel_gt_driver_late_release_all (called from
i915_driver_late_release) to cleanup GTs, causing leakage.
After i915_driver_late_release GT array is not used anymore so
it does not need cleaning at all.

Sample leak report:

BUG i915_request (...): Objects remaining in i915_request on __kmem_cache_shutdown()
...
Object 0xffff888113420040 @offset=64
Allocated in __i915_request_create+0x75/0x610 [i915] age=18339 cpu=1 pid=1454
 kmem_cache_alloc+0x25b/0x270
 __i915_request_create+0x75/0x610 [i915]
 i915_request_create+0x109/0x290 [i915]
 __engines_record_defaults+0xca/0x440 [i915]
 intel_gt_init+0x275/0x430 [i915]
 i915_gem_init+0x135/0x2c0 [i915]
 i915_driver_probe+0x8d1/0xdc0 [i915]

v2: removed whole intel_gt_release_all

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8489
Fixes: bec68cc9ea42d8 ("drm/i915: Prepare for multiple GTs")
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
- Link to v1: https://lore.kernel.org/r/20231114-dont_clean_gt_on_error_path-v1-1-37f2fa827fd2@intel.com
---
 drivers/gpu/drm/i915/gt/intel_gt.c | 11 -----------
 drivers/gpu/drm/i915/i915_driver.c |  4 +---
 2 files changed, 1 insertion(+), 14 deletions(-)


---
base-commit: 1489bab52c281a869295414031a56506a375b036
change-id: 20231114-dont_clean_gt_on_error_path-91cd9c3caa0a

Best regards,

Comments

Tvrtko Ursulin Nov. 15, 2023, 11:12 a.m. UTC | #1
On 15/11/2023 10:54, Andrzej Hajda wrote:
> The only task of intel_gt_release_all is to zero gt table. Calling
> it on error path prevents intel_gt_driver_late_release_all (called from
> i915_driver_late_release) to cleanup GTs, causing leakage.
> After i915_driver_late_release GT array is not used anymore so
> it does not need cleaning at all.
> 
> Sample leak report:
> 
> BUG i915_request (...): Objects remaining in i915_request on __kmem_cache_shutdown()
> ...
> Object 0xffff888113420040 @offset=64
> Allocated in __i915_request_create+0x75/0x610 [i915] age=18339 cpu=1 pid=1454
>   kmem_cache_alloc+0x25b/0x270
>   __i915_request_create+0x75/0x610 [i915]
>   i915_request_create+0x109/0x290 [i915]
>   __engines_record_defaults+0xca/0x440 [i915]
>   intel_gt_init+0x275/0x430 [i915]
>   i915_gem_init+0x135/0x2c0 [i915]
>   i915_driver_probe+0x8d1/0xdc0 [i915]
> 
> v2: removed whole intel_gt_release_all
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8489
> Fixes: bec68cc9ea42d8 ("drm/i915: Prepare for multiple GTs")
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> ---
> - Link to v1: https://lore.kernel.org/r/20231114-dont_clean_gt_on_error_path-v1-1-37f2fa827fd2@intel.com
> ---
>   drivers/gpu/drm/i915/gt/intel_gt.c | 11 -----------
>   drivers/gpu/drm/i915/i915_driver.c |  4 +---
>   2 files changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index ed32bf5b15464e..ba1186fc524f84 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -982,8 +982,6 @@ int intel_gt_probe_all(struct drm_i915_private *i915)
>   
>   err:
>   	i915_probe_error(i915, "Failed to initialize %s! (%d)\n", gtdef->name, ret);
> -	intel_gt_release_all(i915);
> -
>   	return ret;
>   }
>   
> @@ -1002,15 +1000,6 @@ int intel_gt_tiles_init(struct drm_i915_private *i915)
>   	return 0;
>   }
>   
> -void intel_gt_release_all(struct drm_i915_private *i915)
> -{
> -	struct intel_gt *gt;
> -	unsigned int id;
> -
> -	for_each_gt(gt, i915, id)
> -		i915->gt[id] = NULL;
> -}
> -
>   void intel_gt_info_print(const struct intel_gt_info *info,
>   			 struct drm_printer *p)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 01fd25b622d16c..2a1faf4039659c 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -776,7 +776,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   
>   	ret = i915_driver_mmio_probe(i915);
>   	if (ret < 0)
> -		goto out_tiles_cleanup;
> +		goto out_runtime_pm_put;
>   
>   	ret = i915_driver_hw_probe(i915);
>   	if (ret < 0)
> @@ -836,8 +836,6 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   	i915_ggtt_driver_late_release(i915);
>   out_cleanup_mmio:
>   	i915_driver_mmio_release(i915);
> -out_tiles_cleanup:
> -	intel_gt_release_all(i915);
>   out_runtime_pm_put:
>   	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>   	i915_driver_late_release(i915);
> 
> ---
> base-commit: 1489bab52c281a869295414031a56506a375b036
> change-id: 20231114-dont_clean_gt_on_error_path-91cd9c3caa0a

Looks okay.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Although on the overall driver init/exit in general again started 
feeling a bit messy. Seems like it would need an overhaul every few 
years to keep tidy and logical but meh.

Regards,

Tvrtko
Nirmoy Das Nov. 15, 2023, 4:28 p.m. UTC | #2
On 11/15/2023 11:54 AM, Andrzej Hajda wrote:
> The only task of intel_gt_release_all is to zero gt table. Calling
> it on error path prevents intel_gt_driver_late_release_all (called from
> i915_driver_late_release) to cleanup GTs, causing leakage.
> After i915_driver_late_release GT array is not used anymore so
> it does not need cleaning at all.
>
> Sample leak report:
>
> BUG i915_request (...): Objects remaining in i915_request on __kmem_cache_shutdown()
> ...
> Object 0xffff888113420040 @offset=64
> Allocated in __i915_request_create+0x75/0x610 [i915] age=18339 cpu=1 pid=1454
>   kmem_cache_alloc+0x25b/0x270
>   __i915_request_create+0x75/0x610 [i915]
>   i915_request_create+0x109/0x290 [i915]
>   __engines_record_defaults+0xca/0x440 [i915]
>   intel_gt_init+0x275/0x430 [i915]
>   i915_gem_init+0x135/0x2c0 [i915]
>   i915_driver_probe+0x8d1/0xdc0 [i915]
>
> v2: removed whole intel_gt_release_all
>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8489
> Fixes: bec68cc9ea42d8 ("drm/i915: Prepare for multiple GTs")
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>


> ---
> - Link to v1: https://lore.kernel.org/r/20231114-dont_clean_gt_on_error_path-v1-1-37f2fa827fd2@intel.com
> ---
>   drivers/gpu/drm/i915/gt/intel_gt.c | 11 -----------
>   drivers/gpu/drm/i915/i915_driver.c |  4 +---
>   2 files changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index ed32bf5b15464e..ba1186fc524f84 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -982,8 +982,6 @@ int intel_gt_probe_all(struct drm_i915_private *i915)
>   
>   err:
>   	i915_probe_error(i915, "Failed to initialize %s! (%d)\n", gtdef->name, ret);
> -	intel_gt_release_all(i915);
> -
>   	return ret;
>   }
>   
> @@ -1002,15 +1000,6 @@ int intel_gt_tiles_init(struct drm_i915_private *i915)
>   	return 0;
>   }
>   
> -void intel_gt_release_all(struct drm_i915_private *i915)
> -{
> -	struct intel_gt *gt;
> -	unsigned int id;
> -
> -	for_each_gt(gt, i915, id)
> -		i915->gt[id] = NULL;
> -}
> -
>   void intel_gt_info_print(const struct intel_gt_info *info,
>   			 struct drm_printer *p)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 01fd25b622d16c..2a1faf4039659c 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -776,7 +776,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   
>   	ret = i915_driver_mmio_probe(i915);
>   	if (ret < 0)
> -		goto out_tiles_cleanup;
> +		goto out_runtime_pm_put;
>   
>   	ret = i915_driver_hw_probe(i915);
>   	if (ret < 0)
> @@ -836,8 +836,6 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   	i915_ggtt_driver_late_release(i915);
>   out_cleanup_mmio:
>   	i915_driver_mmio_release(i915);
> -out_tiles_cleanup:
> -	intel_gt_release_all(i915);
>   out_runtime_pm_put:
>   	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>   	i915_driver_late_release(i915);
>
> ---
> base-commit: 1489bab52c281a869295414031a56506a375b036
> change-id: 20231114-dont_clean_gt_on_error_path-91cd9c3caa0a
>
> Best regards,
Andi Shyti Nov. 16, 2023, 11:44 a.m. UTC | #3
Hi Andrzej,

On Wed, Nov 15, 2023 at 11:54:03AM +0100, Andrzej Hajda wrote:
> The only task of intel_gt_release_all is to zero gt table. Calling
> it on error path prevents intel_gt_driver_late_release_all (called from
> i915_driver_late_release) to cleanup GTs, causing leakage.
> After i915_driver_late_release GT array is not used anymore so
> it does not need cleaning at all.
> 
> Sample leak report:
> 
> BUG i915_request (...): Objects remaining in i915_request on __kmem_cache_shutdown()
> ...
> Object 0xffff888113420040 @offset=64
> Allocated in __i915_request_create+0x75/0x610 [i915] age=18339 cpu=1 pid=1454
>  kmem_cache_alloc+0x25b/0x270
>  __i915_request_create+0x75/0x610 [i915]
>  i915_request_create+0x109/0x290 [i915]
>  __engines_record_defaults+0xca/0x440 [i915]
>  intel_gt_init+0x275/0x430 [i915]
>  i915_gem_init+0x135/0x2c0 [i915]
>  i915_driver_probe+0x8d1/0xdc0 [i915]
> 
> v2: removed whole intel_gt_release_all
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8489
> Fixes: bec68cc9ea42d8 ("drm/i915: Prepare for multiple GTs")
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi
Andrzej Hajda Nov. 16, 2023, 12:34 p.m. UTC | #4
On 16.11.2023 12:44, Andi Shyti wrote:
> Hi Andrzej,
>
> On Wed, Nov 15, 2023 at 11:54:03AM +0100, Andrzej Hajda wrote:
>> The only task of intel_gt_release_all is to zero gt table. Calling
>> it on error path prevents intel_gt_driver_late_release_all (called from
>> i915_driver_late_release) to cleanup GTs, causing leakage.
>> After i915_driver_late_release GT array is not used anymore so
>> it does not need cleaning at all.
>>
>> Sample leak report:
>>
>> BUG i915_request (...): Objects remaining in i915_request on __kmem_cache_shutdown()
>> ...
>> Object 0xffff888113420040 @offset=64
>> Allocated in __i915_request_create+0x75/0x610 [i915] age=18339 cpu=1 pid=1454
>>   kmem_cache_alloc+0x25b/0x270
>>   __i915_request_create+0x75/0x610 [i915]
>>   i915_request_create+0x109/0x290 [i915]
>>   __engines_record_defaults+0xca/0x440 [i915]
>>   intel_gt_init+0x275/0x430 [i915]
>>   i915_gem_init+0x135/0x2c0 [i915]
>>   i915_driver_probe+0x8d1/0xdc0 [i915]
>>
>> v2: removed whole intel_gt_release_all
>>
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8489
>> Fixes: bec68cc9ea42d8 ("drm/i915: Prepare for multiple GTs")
>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thx all, pushed.

> Andi
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index ed32bf5b15464e..ba1186fc524f84 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -982,8 +982,6 @@  int intel_gt_probe_all(struct drm_i915_private *i915)
 
 err:
 	i915_probe_error(i915, "Failed to initialize %s! (%d)\n", gtdef->name, ret);
-	intel_gt_release_all(i915);
-
 	return ret;
 }
 
@@ -1002,15 +1000,6 @@  int intel_gt_tiles_init(struct drm_i915_private *i915)
 	return 0;
 }
 
-void intel_gt_release_all(struct drm_i915_private *i915)
-{
-	struct intel_gt *gt;
-	unsigned int id;
-
-	for_each_gt(gt, i915, id)
-		i915->gt[id] = NULL;
-}
-
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 01fd25b622d16c..2a1faf4039659c 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -776,7 +776,7 @@  int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	ret = i915_driver_mmio_probe(i915);
 	if (ret < 0)
-		goto out_tiles_cleanup;
+		goto out_runtime_pm_put;
 
 	ret = i915_driver_hw_probe(i915);
 	if (ret < 0)
@@ -836,8 +836,6 @@  int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	i915_ggtt_driver_late_release(i915);
 out_cleanup_mmio:
 	i915_driver_mmio_release(i915);
-out_tiles_cleanup:
-	intel_gt_release_all(i915);
 out_runtime_pm_put:
 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
 	i915_driver_late_release(i915);