
drm/xe/display: Fix memleak in display initialization

Message ID 20240126153453.997855-1-xiaoming.wang@intel.com (mailing list archive)
State New, archived

Commit Message

wangxiaoming321 Jan. 26, 2024, 3:34 p.m. UTC
intel_power_domains_init() is called twice in xe_device_probe():
xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
-> intel_power_domains_init(i915)

One of the calls needs to be removed so that power_domains->power_wells
is not allocated twice.

unreferenced object 0xffff88811150ee00 (size 512):
  comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
  hex dump (first 32 bytes):
    10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff  ................
    ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
    [<ffffffff812c98b2>] __kmalloc+0x52/0x150
    [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
    [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
    [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
    [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
    [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
    [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
    [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
    [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
    [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
    [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
    [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
    [<ffffffff8192e159>] bus_add_driver+0x119/0x220
    [<ffffffff81930d00>] driver_register+0x60/0x120
    [<ffffffffa05e50a0>] 0xffffffffa05e50a0

Signed-off-by: wangxiaoming321 <xiaoming.wang@intel.com>
---
 drivers/gpu/drm/xe/xe_display.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Lucas De Marchi Jan. 31, 2024, 2:54 p.m. UTC | #1
+Jani

On Fri, Jan 26, 2024 at 11:34:53PM +0800, wangxiaoming321 wrote:
>intel_power_domains_init has been called twice in xe_device_probe:
>xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
>xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
>-> intel_power_domains_init(i915)

OK. I think intel_power_domains_init() used to be called from the driver
initialization code rather than from inside display. Now it's part of
the display probe, and the xe side was never updated.

>
>It needs remove one to avoid power_domains->power_wells double malloc.
>
>unreferenced object 0xffff88811150ee00 (size 512):
>  comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
>  hex dump (first 32 bytes):
>    10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff  ................
>    ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
>  backtrace:
>    [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
>    [<ffffffff812c98b2>] __kmalloc+0x52/0x150
>    [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
>    [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
>    [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
>    [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
>    [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
>    [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
>    [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
>    [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
>    [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
>    [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
>    [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
>    [<ffffffff8192e159>] bus_add_driver+0x119/0x220
>    [<ffffffff81930d00>] driver_register+0x60/0x120
>    [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>

This will need a Fixes trailer.  This seems to be a suitable one:

Fixes: 44e694958b95 ("drm/xe/display: Implement display support")

>Signed-off-by: wangxiaoming321 <xiaoming.wang@intel.com>
>---
> drivers/gpu/drm/xe/xe_display.c | 6 ------
> 1 file changed, 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
>index 74391d9b11ae..e4db069f0db3 100644
>--- a/drivers/gpu/drm/xe/xe_display.c
>+++ b/drivers/gpu/drm/xe/xe_display.c
>@@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
>
> int xe_display_init_nommio(struct xe_device *xe)
> {
>-	int err;
>-
> 	if (!xe->info.enable_display)
> 		return 0;
>
>@@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
> 	/* This must be called before any calls to HAS_PCH_* */
> 	intel_detect_pch(xe);
>
>-	err = intel_power_domains_init(xe);
>-	if (err)
>-		return err;

xe_display_init_nommio() has xe_display_fini_nommio() as its destructor
counterpart. Unfortunately the display side looks wrong, as it does:

init:
	intel_display_driver_probe_noirq() -> intel_power_domains_init()

destroy:
	i915_driver_late_release() -> intel_power_domains_cleanup()

I think leaving intel_power_domains_cleanup() as is for now, so it's
still called by xe, works. But this needs to go through CI, which
apparently this series didn't. I re-triggered it.

+Jani in case he thinks this can be changed in another way or already
has the complete solution.

Lucas De Marchi
Jani Nikula Jan. 31, 2024, 3:07 p.m. UTC | #2
On Wed, 31 Jan 2024, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
> +Jani
>
> On Fri, Jan 26, 2024 at 11:34:53PM +0800, wangxiaoming321 wrote:
>>intel_power_domains_init has been called twice in xe_device_probe:
>>xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
>>xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
>>-> intel_power_domains_init(i915)
>
> ok, once upon a time intel_power_domains_init() was called by the driver
> initialization code and not initialized inside the display. I think.
> Now it's part of the display probe and we never updated the xe side.
>
>>
>>It needs remove one to avoid power_domains->power_wells double malloc.
>>
>>unreferenced object 0xffff88811150ee00 (size 512):
>>  comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
>>  hex dump (first 32 bytes):
>>    10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff  ................
>>    ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
>>  backtrace:
>>    [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
>>    [<ffffffff812c98b2>] __kmalloc+0x52/0x150
>>    [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
>>    [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
>>    [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
>>    [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
>>    [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
>>    [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
>>    [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
>>    [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
>>    [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
>>    [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
>>    [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
>>    [<ffffffff8192e159>] bus_add_driver+0x119/0x220
>>    [<ffffffff81930d00>] driver_register+0x60/0x120
>>    [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>>
>
> This will need a Fixes trailer.  This seems to be a suitable one:
>
> Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
>
>>Signed-off-by: wangxiaoming321 <xiaoming.wang@intel.com>
>>---
>> drivers/gpu/drm/xe/xe_display.c | 6 ------
>> 1 file changed, 6 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
>>index 74391d9b11ae..e4db069f0db3 100644
>>--- a/drivers/gpu/drm/xe/xe_display.c
>>+++ b/drivers/gpu/drm/xe/xe_display.c
>>@@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
>>
>> int xe_display_init_nommio(struct xe_device *xe)
>> {
>>-	int err;
>>-
>> 	if (!xe->info.enable_display)
>> 		return 0;
>>
>>@@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
>> 	/* This must be called before any calls to HAS_PCH_* */
>> 	intel_detect_pch(xe);
>>
>>-	err = intel_power_domains_init(xe);
>>-	if (err)
>>-		return err;
>
> xe_display_init_nommio() has xe_display_fini_nommio() as its destructor
> counter part. Unfortunately display side looks wrong as it does:
>
> init:
> 	intel_display_driver_probe_noirq() -> intel_power_domains_init()
>
> destroy:
> 	i915_driver_late_release() -> intel_power_domains_cleanup()
>
> I think leaving intel_power_domains_cleanup() as is for now so it's
> called by xe works, but this needs to go through CI, which apparently
> this series didn't go. I re-triggered it.
>
> +Jani if he thinks this can be changed in another way or already have
> the complete solution.

I don't. But it is and will be a recurring problem. i915 and xe core
drivers should handle display init and cleanup the same way. But
currently i915 goes on to call e.g. intel_power_domains_cleanup()
directly from top level driver code. There are other examples.

And we seem to have recently added *more*. See e.g. bd738d859e71
("drm/i915: Prevent modesets during driver init/shutdown").


BR,
Jani.
Maarten Lankhorst Feb. 1, 2024, 2:19 p.m. UTC | #3
On 2024-01-31 16:07, Jani Nikula wrote:
> On Wed, 31 Jan 2024, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
>> +Jani
>>
>> On Fri, Jan 26, 2024 at 11:34:53PM +0800, wangxiaoming321 wrote:
>>> intel_power_domains_init has been called twice in xe_device_probe:
>>> xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
>>> xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
>>> -> intel_power_domains_init(i915)
>>
>> ok, once upon a time intel_power_domains_init() was called by the driver
>> initialization code and not initialized inside the display. I think.
>> Now it's part of the display probe and we never updated the xe side.
>>
>>>
>>> It needs remove one to avoid power_domains->power_wells double malloc.
>>>
>>> unreferenced object 0xffff88811150ee00 (size 512):
>>>   comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
>>>   hex dump (first 32 bytes):
>>>     10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff  ................
>>>     ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
>>>   backtrace:
>>>     [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
>>>     [<ffffffff812c98b2>] __kmalloc+0x52/0x150
>>>     [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
>>>     [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
>>>     [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
>>>     [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
>>>     [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
>>>     [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
>>>     [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
>>>     [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
>>>     [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
>>>     [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
>>>     [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
>>>     [<ffffffff8192e159>] bus_add_driver+0x119/0x220
>>>     [<ffffffff81930d00>] driver_register+0x60/0x120
>>>     [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>>>
>>
>> This will need a Fixes trailer.  This seems to be a suitable one:
>>
>> Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
>>
>>> Signed-off-by: wangxiaoming321 <xiaoming.wang@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_display.c | 6 ------
>>> 1 file changed, 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
>>> index 74391d9b11ae..e4db069f0db3 100644
>>> --- a/drivers/gpu/drm/xe/xe_display.c
>>> +++ b/drivers/gpu/drm/xe/xe_display.c
>>> @@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
>>>
>>> int xe_display_init_nommio(struct xe_device *xe)
>>> {
>>> -	int err;
>>> -
>>> 	if (!xe->info.enable_display)
>>> 		return 0;
>>>
>>> @@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
>>> 	/* This must be called before any calls to HAS_PCH_* */
>>> 	intel_detect_pch(xe);
>>>
>>> -	err = intel_power_domains_init(xe);
>>> -	if (err)
>>> -		return err;
>>
>> xe_display_init_nommio() has xe_display_fini_nommio() as its destructor
>> counter part. Unfortunately display side looks wrong as it does:
>>
>> init:
>> 	intel_display_driver_probe_noirq() -> intel_power_domains_init()
>>
>> destroy:
>> 	i915_driver_late_release() -> intel_power_domains_cleanup()
>>
>> I think leaving intel_power_domains_cleanup() as is for now so it's
>> called by xe works, but this needs to go through CI, which apparently
>> this series didn't go. I re-triggered it.
>>
>> +Jani if he thinks this can be changed in another way or already have
>> the complete solution.
> 
> I don't. But it is and will be a recurring problem. i915 and xe core
> drivers should handle display init and cleanup the same way. But
> currently i915 goes on to call e.g. intel_power_domains_cleanup()
> directly from top level driver code. There are other examples.
> 
> And we seem to have recently added *more*. See e.g. bd738d859e71
> ("drm/i915: Prevent modesets during driver init/shutdown").

That commit seems terrible. Shouldn't we instead only enable code that
can cause modesets once it's safe to do so?

Cheers,
~Maarten

Patch

diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
index 74391d9b11ae..e4db069f0db3 100644
--- a/drivers/gpu/drm/xe/xe_display.c
+++ b/drivers/gpu/drm/xe/xe_display.c
@@ -134,8 +134,6 @@  static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
 
 int xe_display_init_nommio(struct xe_device *xe)
 {
-	int err;
-
 	if (!xe->info.enable_display)
 		return 0;
 
@@ -145,10 +143,6 @@  int xe_display_init_nommio(struct xe_device *xe)
 	/* This must be called before any calls to HAS_PCH_* */
 	intel_detect_pch(xe);
 
-	err = intel_power_domains_init(xe);
-	if (err)
-		return err;
-
 	return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
 }