[v2] drm: Ensure Proper Unload/Reload Order of MEI Modules for i915/Xe Driver

Message ID 20240909040317.17108-1-krishnaiah.bommu@intel.com (mailing list archive)
State New, archived

Commit Message

Bommu, Krishnaiah Sept. 9, 2024, 4:03 a.m. UTC
This update addresses the unload/reload sequence of MEI modules in relation to
the i915/Xe graphics driver. On platforms where the MEI hardware is integrated
with the graphics device (e.g., DG2/BMG), the i915/Xe driver depends on the MEI
modules. Conversely, on newer platforms like MTL and LNL, where the MEI hardware
is separate, this dependency does not exist.

The changes introduced ensure that MEI modules are unloaded and reloaded in the
correct order based on platform-specific dependencies. This is achieved by adding
a MODULE_SOFTDEP directive to the i915 and Xe module code.

These changes enhance the robustness of MEI module handling across different hardware
platforms, ensuring that the i915/Xe driver can be cleanly unloaded and reloaded
without issues.

v2: updated commit message

Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/i915/i915_module.c | 2 ++
 drivers/gpu/drm/xe/xe_module.c     | 2 ++
 2 files changed, 4 insertions(+)

Comments

Rodrigo Vivi Sept. 10, 2024, 3:03 p.m. UTC | #1
On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
> This update addresses the unload/reload sequence of MEI modules in relation to
> the i915/Xe graphics driver. On platforms where the MEI hardware is integrated
> with the graphics device (e.g., DG2/BMG), the i915/xe driver is depend on the MEI
> modules. Conversely, on newer platforms like MTL and LNL, where the MEI hardware
> is separate, this dependency does not exist.
> 
> The changes introduced ensure that MEI modules are unloaded and reloaded in the
> correct order based on platform-specific dependencies. This is achieved by adding
> a MODULE_SOFTDEP directive to the i915 and Xe module code.
> 
> These changes enhance the robustness of MEI module handling across different hardware
> platforms, ensuring that the i915/Xe driver can be cleanly unloaded and reloaded
> without issues.
> 
> v2: updated commit message
> 
> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_module.c | 2 ++
>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_module.c b/drivers/gpu/drm/i915/i915_module.c
> index 65acd7bf75d0..2ad079ad35db 100644
> --- a/drivers/gpu/drm/i915/i915_module.c
> +++ b/drivers/gpu/drm/i915/i915_module.c
> @@ -75,6 +75,8 @@ static const struct {
>  };
>  static int init_progress;
>  
> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
> +
>  static int __init i915_init(void)
>  {
>  	int err, i;
> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> index bfc3deebdaa2..5633ea1841b7 100644
> --- a/drivers/gpu/drm/xe/xe_module.c
> +++ b/drivers/gpu/drm/xe/xe_module.c
> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>  	init_funcs[i].exit();
>  }
>  
> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");

I'm honestly not very comfortable with this.

1. This is not true for every device supported by these modules.
2. This is not true for every (and the most basic) functionality of these drivers.

Shouldn't this be done on the mei side?

Couldn't we identify the need for them at probe and, if needed, return
-EPROBE_DEFER to attempt a retry after the mei drivers have been probed?

Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>

> +
>  static int __init xe_init(void)
>  {
>  	int err, i;
> -- 
> 2.25.1
>
Lucas De Marchi Sept. 10, 2024, 3:43 p.m. UTC | #2
On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
>On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
>> This update addresses the unload/reload sequence of MEI modules in relation to
>> the i915/Xe graphics driver. On platforms where the MEI hardware is integrated
>> with the graphics device (e.g., DG2/BMG), the i915/xe driver is depend on the MEI
>> modules. Conversely, on newer platforms like MTL and LNL, where the MEI hardware
>> is separate, this dependency does not exist.
>>
>> The changes introduced ensure that MEI modules are unloaded and reloaded in the
>> correct order based on platform-specific dependencies. This is achieved by adding
>> a MODULE_SOFTDEP directive to the i915 and Xe module code.


Can you explain what causes the modules to be loaded today? Also, is
this meant to fix anything related to *loading* order, or just unload?

>>
>> These changes enhance the robustness of MEI module handling across different hardware
>> platforms, ensuring that the i915/Xe driver can be cleanly unloaded and reloaded
>> without issues.
>>
>> v2: updated commit message
>>
>> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
>> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_module.c | 2 ++
>>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
>>  2 files changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_module.c b/drivers/gpu/drm/i915/i915_module.c
>> index 65acd7bf75d0..2ad079ad35db 100644
>> --- a/drivers/gpu/drm/i915/i915_module.c
>> +++ b/drivers/gpu/drm/i915/i915_module.c
>> @@ -75,6 +75,8 @@ static const struct {
>>  };
>>  static int init_progress;
>>
>> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> +
>>  static int __init i915_init(void)
>>  {
>>  	int err, i;
>> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
>> index bfc3deebdaa2..5633ea1841b7 100644
>> --- a/drivers/gpu/drm/xe/xe_module.c
>> +++ b/drivers/gpu/drm/xe/xe_module.c
>> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>>  	init_funcs[i].exit();
>>  }
>>
>> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>
>I'm honestly not very comfortable with this.
>
>1. This is not true for every device supported by these modules.
>2. This is not true for every (and the most basic) functionality of these drivers.
>
>Shouldn't this be done in the the mei side?

I don't think it's possible to do this from the mei side. Would mei depend on
both xe and i915 (and thus cause both to be loaded regardless of the
platform)? For a runtime dependency like this that depends on the
platform, I think the best way would be a weakdep plus either a request_module()
or something else that causes the module to load (is that what comp_* is
doing today?)

>
>Couldn't at probe we identify the need of them and if needed we return -EPROBE to
>attempt a retry after the mei drivers were probed?

I'm not sure this is fixing anything for probe. I think we already wait on
the other component to be ready without blocking the rest of the driver
functionality.

A weakdep wouldn't cause the module to be loaded where it's not needed,
but we need some clarification on whether this is trying to fix anything
load-related or just unload.

Lucas De Marchi

>
>Cc: Alexander Usyskin <alexander.usyskin@intel.com>
>Cc: Tomas Winkler <tomas.winkler@intel.com>
>Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>Cc: Jani Nikula <jani.nikula@intel.com>
>Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>Cc: Tvrtko Ursulin <tursulin@ursulin.net>
>
>> +
>>  static int __init xe_init(void)
>>  {
>>  	int err, i;
>> --
>> 2.25.1
>>
Bommu, Krishnaiah Sept. 11, 2024, 6 a.m. UTC | #3
> -----Original Message-----
> From: De Marchi, Lucas <lucas.demarchi@intel.com>
> Sent: Tuesday, September 10, 2024 9:13 PM
> To: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Cc: Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; intel-
> xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Kamil Konieczny
> <kamil.konieczny@linux.intel.com>; Ceraolo Spurio, Daniele
> <daniele.ceraolospurio@intel.com>; Upadhyay, Tejas
> <tejas.upadhyay@intel.com>; Tvrtko Ursulin <tursulin@ursulin.net>; Joonas
> Lahtinen <joonas.lahtinen@linux.intel.com>; Nikula, Jani
> <jani.nikula@intel.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>; Teres Alexis, Alan Previn
> <alan.previn.teres.alexis@intel.com>; Winkler, Tomas
> <tomas.winkler@intel.com>; Usyskin, Alexander
> <alexander.usyskin@intel.com>
> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
> Modules for i915/Xe Driver
> 
> On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
> >On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
> >> This update addresses the unload/reload sequence of MEI modules in
> >> relation to the i915/Xe graphics driver. On platforms where the MEI
> >> hardware is integrated with the graphics device (e.g., DG2/BMG), the
> >> i915/xe driver is depend on the MEI modules. Conversely, on newer
> >> platforms like MTL and LNL, where the MEI hardware is separate, this
> dependency does not exist.
> >>
> >> The changes introduced ensure that MEI modules are unloaded and
> >> reloaded in the correct order based on platform-specific
> >> dependencies. This is achieved by adding a MODULE_SOFTDEP directive to
> the i915 and Xe module code.
> 
> 
> can you explain what causes the modules to be loaded today? Also, is this to fix
> anything related to *loading* order or just unload?
> 
> >>
> >> These changes enhance the robustness of MEI module handling across
> >> different hardware platforms, ensuring that the i915/Xe driver can be
> >> cleanly unloaded and reloaded without issues.
> >>
> >> v2: updated commit message
> >>
> >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> >> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> >> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/i915_module.c | 2 ++
> >>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
> >>  2 files changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_module.c
> >> b/drivers/gpu/drm/i915/i915_module.c
> >> index 65acd7bf75d0..2ad079ad35db 100644
> >> --- a/drivers/gpu/drm/i915/i915_module.c
> >> +++ b/drivers/gpu/drm/i915/i915_module.c
> >> @@ -75,6 +75,8 @@ static const struct {  };  static int
> >> init_progress;
> >>
> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
> >> +
> >>  static int __init i915_init(void)
> >>  {
> >>  	int err, i;
> >> diff --git a/drivers/gpu/drm/xe/xe_module.c
> >> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
> >> 100644
> >> --- a/drivers/gpu/drm/xe/xe_module.c
> >> +++ b/drivers/gpu/drm/xe/xe_module.c
> >> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
> >>  	init_funcs[i].exit();
> >>  }
> >>
> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
> >
> >I'm honestly not very comfortable with this.
> >
> >1. This is not true for every device supported by these modules.
> >2. This is not true for every (and the most basic) functionality of these drivers.
> >
> >Shouldn't this be done in the the mei side?
> 
> I don't think it's possible to do from the mei side. Would mei depend on both xe
> and i915 (and thus cause both to be loaded regardless of the platform?). For a
> runtime dependency like this that depends on the platform, I think the best way
> would be a weakdep + either a request_module() or something else that causes
> the module to load (is that what comp_* is doing today?)
> 
> >
> >Couldn't at probe we identify the need of them and if needed we return
> >-EPROBE to attempt a retry after the mei drivers were probed?
> 
> I'm not sure this is fixing anything for probe. I think we already wait on the other
> component to be ready without blocking the rest of the driver functionality.
> 
> A weakdep wouldn't cause the module to be loaded where it's not needed, but
> need some clarification if this is trying to fix anything load-related or just unload.

This change fixes unload.
When xe is loaded, the mei_gsc modules get loaded as well, but they are not unloaded when xe is unloaded:

root@DUT6127BMGFRD:/home/gta# lsmod | grep xe    # just after system reboot
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_hdcp               28672  0
mei_pxp                16384  0
mei_me                 49152  2
mei                   167936  5 mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# modprobe xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_gsc_proxy          16384  0
mei_gsc                12288  1
mei_hdcp               28672  0
mei_pxp                16384  0
mei_me                 49152  3 mei_gsc
mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# init 3
root@DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
root@DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
root@DUT6127BMGFRD:/home/gta# modprobe -r xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_gsc_proxy          16384  0
mei_gsc                12288  0
mei_hdcp               28672  0
mei_pxp                16384  0
mei_me                 49152  3 mei_gsc
mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta#
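
The modules left behind in the session above can also be found by comparing
two lsmod snapshots, one taken before loading xe and one taken after unloading
it. A minimal sketch, assuming both snapshots were saved to files; the function
name leftover_modules and the file names are hypothetical and not part of any
existing tool:

```sh
# Print the names of modules that appear in the second lsmod snapshot
# but not in the first one.
#   $1: snapshot taken before loading the driver   (hypothetical file)
#   $2: snapshot taken after unloading the driver  (hypothetical file)
leftover_modules() {
    awk '
        # First file: remember every module name (column 1).
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        # Second file: print names that were not present before.
        !($1 in seen)       { print $1 }
    ' "$1" "$2"
}

# Example use (assumed workflow):
#   lsmod | grep mei > before.txt
#   modprobe xe; ...; modprobe -r xe
#   lsmod | grep mei > after.txt
#   leftover_modules before.txt after.txt
```

With the two `lsmod | grep mei` listings from this session as input, the only
names that differ are mei_gsc_proxy and mei_gsc.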

Regards,
Krishna.

> 
> Lucas De Marchi
> 
> >
> >Cc: Alexander Usyskin <alexander.usyskin@intel.com>
> >Cc: Tomas Winkler <tomas.winkler@intel.com>
> >Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> >Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> >Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >Cc: Jani Nikula <jani.nikula@intel.com>
> >Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >Cc: Tvrtko Ursulin <tursulin@ursulin.net>
> >
> >> +
> >>  static int __init xe_init(void)
> >>  {
> >>  	int err, i;
> >> --
> >> 2.25.1
> >>
Lucas De Marchi Sept. 11, 2024, 4:18 p.m. UTC | #4
+ linux-modules
+ Luis

On Wed, Sep 11, 2024 at 01:00:47AM GMT, Bommu, Krishnaiah wrote:
>
>
>> -----Original Message-----
>> From: De Marchi, Lucas <lucas.demarchi@intel.com>
>> Sent: Tuesday, September 10, 2024 9:13 PM
>> To: Vivi, Rodrigo <rodrigo.vivi@intel.com>
>> Cc: Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; intel-
>> xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Kamil Konieczny
>> <kamil.konieczny@linux.intel.com>; Ceraolo Spurio, Daniele
>> <daniele.ceraolospurio@intel.com>; Upadhyay, Tejas
>> <tejas.upadhyay@intel.com>; Tvrtko Ursulin <tursulin@ursulin.net>; Joonas
>> Lahtinen <joonas.lahtinen@linux.intel.com>; Nikula, Jani
>> <jani.nikula@intel.com>; Thomas Hellström
>> <thomas.hellstrom@linux.intel.com>; Teres Alexis, Alan Previn
>> <alan.previn.teres.alexis@intel.com>; Winkler, Tomas
>> <tomas.winkler@intel.com>; Usyskin, Alexander
>> <alexander.usyskin@intel.com>
>> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
>> Modules for i915/Xe Driver
>>
>> On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
>> >On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
>> >> This update addresses the unload/reload sequence of MEI modules in
>> >> relation to the i915/Xe graphics driver. On platforms where the MEI
>> >> hardware is integrated with the graphics device (e.g., DG2/BMG), the
>> >> i915/xe driver is depend on the MEI modules. Conversely, on newer
>> >> platforms like MTL and LNL, where the MEI hardware is separate, this
>> dependency does not exist.
>> >>
>> >> The changes introduced ensure that MEI modules are unloaded and
>> >> reloaded in the correct order based on platform-specific
>> >> dependencies. This is achieved by adding a MODULE_SOFTDEP directive to
>> the i915 and Xe module code.
>>
>>
>> can you explain what causes the modules to be loaded today? Also, is this to fix
>> anything related to *loading* order or just unload?
>>
>> >>
>> >> These changes enhance the robustness of MEI module handling across
>> >> different hardware platforms, ensuring that the i915/Xe driver can be
>> >> cleanly unloaded and reloaded without issues.
>> >>
>> >> v2: updated commit message
>> >>
>> >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
>> >> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
>> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> >> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> >> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> >> ---
>> >>  drivers/gpu/drm/i915/i915_module.c | 2 ++
>> >>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
>> >>  2 files changed, 4 insertions(+)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/i915_module.c
>> >> b/drivers/gpu/drm/i915/i915_module.c
>> >> index 65acd7bf75d0..2ad079ad35db 100644
>> >> --- a/drivers/gpu/drm/i915/i915_module.c
>> >> +++ b/drivers/gpu/drm/i915/i915_module.c
>> >> @@ -75,6 +75,8 @@ static const struct {  };  static int
>> >> init_progress;
>> >>
>> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >> +
>> >>  static int __init i915_init(void)
>> >>  {
>> >>  	int err, i;
>> >> diff --git a/drivers/gpu/drm/xe/xe_module.c
>> >> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
>> >> 100644
>> >> --- a/drivers/gpu/drm/xe/xe_module.c
>> >> +++ b/drivers/gpu/drm/xe/xe_module.c
>> >> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>> >>  	init_funcs[i].exit();
>> >>  }
>> >>
>> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >
>> >I'm honestly not very comfortable with this.
>> >
>> >1. This is not true for every device supported by these modules.
>> >2. This is not true for every (and the most basic) functionality of these drivers.
>> >
>> >Shouldn't this be done in the the mei side?
>>
>> I don't think it's possible to do from the mei side. Would mei depend on both xe
>> and i915 (and thus cause both to be loaded regardless of the platform?). For a
>> runtime dependency like this that depends on the platform, I think the best way
>> would be a weakdep + either a request_module() or something else that causes
>> the module to load (is that what comp_* is doing today?)
>>
>> >
>> >Couldn't at probe we identify the need of them and if needed we return
>> >-EPROBE to attempt a retry after the mei drivers were probed?
>>
>> I'm not sure this is fixing anything for probe. I think we already wait on the other
>> component to be ready without blocking the rest of the driver functionality.
>>
>> A weakdep wouldn't cause the module to be loaded where it's not needed, but
>> need some clarification if this is trying to fix anything load-related or just unload.
>
>This change is fixing unload.
>During xe load I am seeing mei_gsc modules was loaded, but not unloaded during the unload xe

So, first thing: if things are correct in the kernel, we shouldn't need to
**unload** the module after unbinding the device. Why are we unloading xe
and the other modules for tests?

>
>root@DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after system reboot
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  2
>mei                   167936  5 mei_hdcp,mei_pxp,mei_me
>root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta# modprobe xe
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_gsc_proxy          16384  0
>mei_gsc                12288  1

			       ^ which means there's one user, which
			         should be xe

>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  3 mei_gsc
>mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta# init 3
>root@DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
>root@DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
>root@DUT6127BMGFRD:/home/gta# modprobe -r xe
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
>root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_gsc_proxy          16384  0
>mei_gsc                12288  0

			       ^ great, so the refcount went to 0,
			         confirming it was xe. It should go to 0
			         even before you unload the module,
			         when you unbind.

A couple of points:

1) Why do we care about unloading mei_gsc? Just loading xe
    again (or not even unloading it, just unbind/rebind)
    should still work if the xe <-> mei_gsc integration is done
    correctly.

2) If for some reason we do want to remove the module, then we will
    need some work in kernel/module/ to start tracking runtime module
    dependencies, i.e. when one module does a try_module_get(foo->owner), it
    would be added to a list and output in sysfs together with the holders list.
    This way you would be able to track the runtime deps and remove them
    if their refcount went to 0 after removing xe.

(2) is doable, but previous attempts were not successful [1]. Is there
something else needed to make the simpler solution (1) work?

thanks
Lucas De Marchi

[1] https://lore.kernel.org/linux-modules/cover.1652113087.git.mchehab@kernel.org/
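
The refcount and the holders list mentioned in (2) can both be read from sysfs
today. A runtime reference, like the one on mei_gsc above, shows up as a
nonzero count with no named holder. A rough sketch for inspecting this,
assuming the kernel exposes /sys/module/<name>/refcnt (only present when module
unloading is enabled); the function name module_usage and the optional root
argument are made up for illustration:

```sh
# Print the reference count and the symbol-level holders of a module.
#   $1: module name
#   $2: sysfs module root (optional, defaults to /sys/module; the override
#       is a hypothetical convenience for trying this on a copied tree)
module_usage() {
    mod="$1"
    root="${2:-/sys/module}"

    if [ ! -d "$root/$mod" ]; then
        echo "$mod: not loaded"
        return 1
    fi

    # refcnt is missing on kernels built without module unloading.
    refcnt=$(cat "$root/$mod/refcnt" 2>/dev/null || echo "?")

    # holders/ contains one entry per module that depends on this one
    # through symbols; runtime references are not listed there.
    holders=$(ls "$root/$mod/holders" 2>/dev/null | tr '\n' ' ')

    echo "$mod refcnt=$refcnt holders=${holders% }"
}

# Example use: module_usage mei_gsc
```

A module reported with a nonzero refcnt and an empty holders list is held by a
runtime reference, which is the case that modprobe -r cannot resolve from
dependency information alone.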

>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  3 mei_gsc
>mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>root@DUT6127BMGFRD:/home/gta#
>
>Regards,
>Krishna.
>
>>
>> Lucas De Marchi
>>
>> >
>> >Cc: Alexander Usyskin <alexander.usyskin@intel.com>
>> >Cc: Tomas Winkler <tomas.winkler@intel.com>
>> >Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> >Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> >Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> >Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> >Cc: Jani Nikula <jani.nikula@intel.com>
>> >Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> >Cc: Tvrtko Ursulin <tursulin@ursulin.net>
>> >
>> >> +
>> >>  static int __init xe_init(void)
>> >>  {
>> >>  	int err, i;
>> >> --
>> >> 2.25.1
>> >>
Rodrigo Vivi Sept. 11, 2024, 8:41 p.m. UTC | #5
On Wed, Sep 11, 2024 at 06:00:47AM +0000, Bommu, Krishnaiah wrote:
> 
> 
> > -----Original Message-----
> > From: De Marchi, Lucas <lucas.demarchi@intel.com>
> > Sent: Tuesday, September 10, 2024 9:13 PM
> > To: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> > Cc: Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; intel-
> > xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Kamil Konieczny
> > <kamil.konieczny@linux.intel.com>; Ceraolo Spurio, Daniele
> > <daniele.ceraolospurio@intel.com>; Upadhyay, Tejas
> > <tejas.upadhyay@intel.com>; Tvrtko Ursulin <tursulin@ursulin.net>; Joonas
> > Lahtinen <joonas.lahtinen@linux.intel.com>; Nikula, Jani
> > <jani.nikula@intel.com>; Thomas Hellström
> > <thomas.hellstrom@linux.intel.com>; Teres Alexis, Alan Previn
> > <alan.previn.teres.alexis@intel.com>; Winkler, Tomas
> > <tomas.winkler@intel.com>; Usyskin, Alexander
> > <alexander.usyskin@intel.com>
> > Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
> > Modules for i915/Xe Driver
> > 
> > On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
> > >On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
> > >> This update addresses the unload/reload sequence of MEI modules in
> > >> relation to the i915/Xe graphics driver. On platforms where the MEI
> > >> hardware is integrated with the graphics device (e.g., DG2/BMG), the
> > >> i915/xe driver is depend on the MEI modules. Conversely, on newer
> > >> platforms like MTL and LNL, where the MEI hardware is separate, this
> > dependency does not exist.
> > >>
> > >> The changes introduced ensure that MEI modules are unloaded and
> > >> reloaded in the correct order based on platform-specific
> > >> dependencies. This is achieved by adding a MODULE_SOFTDEP directive to
> > the i915 and Xe module code.
> > 
> > 
> > can you explain what causes the modules to be loaded today? Also, is this to fix
> > anything related to *loading* order or just unload?
> > 
> > >>
> > >> These changes enhance the robustness of MEI module handling across
> > >> different hardware platforms, ensuring that the i915/Xe driver can be
> > >> cleanly unloaded and reloaded without issues.
> > >>
> > >> v2: updated commit message
> > >>
> > >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> > >> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
> > >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > >> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > >> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> > >> ---
> > >>  drivers/gpu/drm/i915/i915_module.c | 2 ++
> > >>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
> > >>  2 files changed, 4 insertions(+)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_module.c
> > >> b/drivers/gpu/drm/i915/i915_module.c
> > >> index 65acd7bf75d0..2ad079ad35db 100644
> > >> --- a/drivers/gpu/drm/i915/i915_module.c
> > >> +++ b/drivers/gpu/drm/i915/i915_module.c
> > >> @@ -75,6 +75,8 @@ static const struct {  };  static int
> > >> init_progress;
> > >>
> > >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
> > >> +
> > >>  static int __init i915_init(void)
> > >>  {
> > >>  	int err, i;
> > >> diff --git a/drivers/gpu/drm/xe/xe_module.c
> > >> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
> > >> 100644
> > >> --- a/drivers/gpu/drm/xe/xe_module.c
> > >> +++ b/drivers/gpu/drm/xe/xe_module.c
> > >> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
> > >>  	init_funcs[i].exit();
> > >>  }
> > >>
> > >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
> > >
> > >I'm honestly not very comfortable with this.
> > >
> > >1. This is not true for every device supported by these modules.
> > >2. This is not true for every (and the most basic) functionality of these drivers.
> > >
> > >Shouldn't this be done in the the mei side?
> > 
> > I don't think it's possible to do from the mei side. Would mei depend on both xe
> > and i915 (and thus cause both to be loaded regardless of the platform?). For a
> > runtime dependency like this that depends on the platform, I think the best way
> > would be a weakdep + either a request_module() or something else that causes
> > the module to load (is that what comp_* is doing today?)
> > 
> > >
> > >Couldn't at probe we identify the need of them and if needed we return
> > >-EPROBE to attempt a retry after the mei drivers were probed?
> > 
> > I'm not sure this is fixing anything for probe. I think we already wait on the other
> > component to be ready without blocking the rest of the driver functionality.
> > 
> > A weakdep wouldn't cause the module to be loaded where it's not needed, but
> > need some clarification if this is trying to fix anything load-related or just unload.
> 
> This change is fixing unload.
> During xe load I am seeing mei_gsc modules was loaded, but not unloaded during the unload xe

Is it a problem?

> 
> root@DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after system reboot 
> root@DUT6127BMGFRD:/home/gta#
> root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
> mei_hdcp               28672  0
> mei_pxp                16384  0
> mei_me                 49152  2
> mei                   167936  5 mei_hdcp,mei_pxp,mei_me
> root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
> root@DUT6127BMGFRD:/home/gta#
> root@DUT6127BMGFRD:/home/gta# modprobe xe
> root@DUT6127BMGFRD:/home/gta#
> root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
> mei_gsc_proxy          16384  0
> mei_gsc                12288  1
> mei_hdcp               28672  0
> mei_pxp                16384  0
> mei_me                 49152  3 mei_gsc
> mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
> root@DUT6127BMGFRD:/home/gta#
> root@DUT6127BMGFRD:/home/gta#
> root@DUT6127BMGFRD:/home/gta#
> root@DUT6127BMGFRD:/home/gta# init 3
> root@DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
> root@DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
> root@DUT6127BMGFRD:/home/gta# modprobe -r xe
> root@DUT6127BMGFRD:/home/gta#
> root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
> root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
> mei_gsc_proxy          16384  0
> mei_gsc                12288  0
> mei_hdcp               28672  0
> mei_pxp                16384  0
> mei_me                 49152  3 mei_gsc
> mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
> root@DUT6127BMGFRD:/home/gta#
> 
> Regards,
> Krishna.
> 
> > 
> > Lucas De Marchi
> > 
> > >
> > >Cc: Alexander Usyskin <alexander.usyskin@intel.com>
> > >Cc: Tomas Winkler <tomas.winkler@intel.com>
> > >Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> > >Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > >Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > >Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > >Cc: Jani Nikula <jani.nikula@intel.com>
> > >Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > >Cc: Tvrtko Ursulin <tursulin@ursulin.net>
> > >
> > >> +
> > >>  static int __init xe_init(void)
> > >>  {
> > >>  	int err, i;
> > >> --
> > >> 2.25.1
> > >>
Bommu, Krishnaiah Sept. 12, 2024, 11:58 a.m. UTC | #6
> -----Original Message-----
> From: De Marchi, Lucas <lucas.demarchi@intel.com>
> Sent: Wednesday, September 11, 2024 9:49 PM
> To: Bommu, Krishnaiah <krishnaiah.bommu@intel.com>
> Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>; intel-xe@lists.freedesktop.org; intel-
> gfx@lists.freedesktop.org; Kamil Konieczny <kamil.konieczny@linux.intel.com>;
> Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Upadhyay, Tejas
> <tejas.upadhyay@intel.com>; Tvrtko Ursulin <tursulin@ursulin.net>; Joonas
> Lahtinen <joonas.lahtinen@linux.intel.com>; Nikula, Jani
> <jani.nikula@intel.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>; Teres Alexis, Alan Previn
> <alan.previn.teres.alexis@intel.com>; Winkler, Tomas
> <tomas.winkler@intel.com>; Usyskin, Alexander
> <alexander.usyskin@intel.com>; linux-modules@vger.kernel.org; Luis
> Chamberlain <mcgrof@kernel.org>
> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
> Modules for i915/Xe Driver
> 
> + linux-modules
> + Luis
> 
> On Wed, Sep 11, 2024 at 01:00:47AM GMT, Bommu, Krishnaiah wrote:
> >
> >
> >> -----Original Message-----
> >> From: De Marchi, Lucas <lucas.demarchi@intel.com>
> >> Sent: Tuesday, September 10, 2024 9:13 PM
> >> To: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> >> Cc: Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; intel-
> >> xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Kamil
> >> Konieczny <kamil.konieczny@linux.intel.com>; Ceraolo Spurio, Daniele
> >> <daniele.ceraolospurio@intel.com>; Upadhyay, Tejas
> >> <tejas.upadhyay@intel.com>; Tvrtko Ursulin <tursulin@ursulin.net>;
> >> Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Nikula, Jani
> >> <jani.nikula@intel.com>; Thomas Hellström
> >> <thomas.hellstrom@linux.intel.com>; Teres Alexis, Alan Previn
> >> <alan.previn.teres.alexis@intel.com>; Winkler, Tomas
> >> <tomas.winkler@intel.com>; Usyskin, Alexander
> >> <alexander.usyskin@intel.com>
> >> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
> >> Modules for i915/Xe Driver
> >>
> >> On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
> >> >On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
> >> >> This update addresses the unload/reload sequence of MEI modules in
> >> >> relation to the i915/Xe graphics driver. On platforms where the
> >> >> MEI hardware is integrated with the graphics device (e.g.,
> >> >> DG2/BMG), the i915/xe driver is depend on the MEI modules.
> >> >> Conversely, on newer platforms like MTL and LNL, where the MEI
> >> >> hardware is separate, this
> >> dependency does not exist.
> >> >>
> >> >> The changes introduced ensure that MEI modules are unloaded and
> >> >> reloaded in the correct order based on platform-specific
> >> >> dependencies. This is achieved by adding a MODULE_SOFTDEP
> >> >> directive to
> >> the i915 and Xe module code.
> >>
> >>
> >> can you explain what causes the modules to be loaded today? Also, is
> >> this to fix anything related to *loading* order or just unload?
> >>
> >> >>
> >> >> These changes enhance the robustness of MEI module handling across
> >> >> different hardware platforms, ensuring that the i915/Xe driver can
> >> >> be cleanly unloaded and reloaded without issues.
> >> >>
> >> >> v2: updated commit message
> >> >>
> >> >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> >> >> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
> >> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> >> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> >> >> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> >> >> ---
> >> >>  drivers/gpu/drm/i915/i915_module.c | 2 ++
> >> >>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
> >> >>  2 files changed, 4 insertions(+)
> >> >>
> >> >> diff --git a/drivers/gpu/drm/i915/i915_module.c
> >> >> b/drivers/gpu/drm/i915/i915_module.c
> >> >> index 65acd7bf75d0..2ad079ad35db 100644
> >> >> --- a/drivers/gpu/drm/i915/i915_module.c
> >> >> +++ b/drivers/gpu/drm/i915/i915_module.c
> >> >> @@ -75,6 +75,8 @@ static const struct {  };  static int
> >> >> init_progress;
> >> >>
> >> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
> >> >> +
> >> >>  static int __init i915_init(void)  {
> >> >>  	int err, i;
> >> >> diff --git a/drivers/gpu/drm/xe/xe_module.c
> >> >> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
> >> >> 100644
> >> >> --- a/drivers/gpu/drm/xe/xe_module.c
> >> >> +++ b/drivers/gpu/drm/xe/xe_module.c
> >> >> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
> >> >>  	init_funcs[i].exit();
> >> >>  }
> >> >>
> >> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
> >> >
> >> >I'm honestly not very comfortable with this.
> >> >
> >> >1. This is not true for every device supported by these modules.
> >> >2. This is not true for every (and the most basic) functionality of these
> drivers.
> >> >
> >> >Shouldn't this be done in the the mei side?
> >>
> >> I don't think it's possible to do from the mei side. Would mei depend
> >> on both xe and i915 (and thus cause both to be loaded regardless of
> >> the platform?). For a runtime dependency like this that depends on
> >> the platform, I think the best way would be a weakdep + either a
> >> request_module() or something else that causes the module to load (is
> >> that what comp_* is doing today?)
> >>
> >> >
> >> >Couldn't at probe we identify the need of them and if needed we
> >> >return -EPROBE to attempt a retry after the mei drivers were probed?
> >>
> >> I'm not sure this is fixing anything for probe. I think we already
> >> wait on the other component to be ready without blocking the rest of the
> driver functionality.
> >>
> >> A weakdep wouldn't cause the module to be loaded where it's not
> >> needed, but need some clarification if this is trying to fix anything load-
> related or just unload.
> >
> >This change is fixing unload.
> >During xe load I am seeing mei_gsc modules was loaded, but not unloaded
> >during the unload xe
> 
> so, first thing: if things are correct in the kernel, we shouldn't need to
> **unload** the module after unbinding the device. Why are we unloading xe
> and the other modules for tests?

While running gta@xe_module_load@reload-no-display I see a failure, and this change is meant to address it. I previously tried to fix it in IGT, but following the IGT review suggestion I am now trying to fix the issue in the kernel.
IGT patch: https://patchwork.freedesktop.org/series/137343/

> >root@DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after system reboot
> >root@DUT6127BMGFRD:/home/gta#
> >root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
> >mei_hdcp               28672  0
> >mei_pxp                16384  0
> >mei_me                 49152  2
> >mei                   167936  5 mei_hdcp,mei_pxp,mei_me
> >root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
> >root@DUT6127BMGFRD:/home/gta#
> >root@DUT6127BMGFRD:/home/gta# modprobe xe
> >root@DUT6127BMGFRD:/home/gta#
> >root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
> >mei_gsc_proxy          16384  0
> >mei_gsc                12288  1
> 
> 			       ^ which means there's one user, which
> 			         should be xe
> 
> >mei_hdcp               28672  0
> >mei_pxp                16384  0
> >mei_me                 49152  3 mei_gsc
> >mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
> >root@DUT6127BMGFRD:/home/gta#
> >root@DUT6127BMGFRD:/home/gta#
> >root@DUT6127BMGFRD:/home/gta#
> >root@DUT6127BMGFRD:/home/gta# init 3
> >root@DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
> >root@DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
> >root@DUT6127BMGFRD:/home/gta# modprobe -r xe
> >root@DUT6127BMGFRD:/home/gta#
> >root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
> >root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
> >mei_gsc_proxy          16384  0
> >mei_gsc                12288  0
> 
> 			       ^ great, so the refcount went to 0,
> 			         confirming it was xe. It should go to 0
> 				 even before you unload the module,
> 				 when unbind.
> 
> A couple of points:
> 
> 1) why do we care about unloading mei_gsc. Just loading xe
>     again (or even not even unloading it, just unbind/rebind),
>     should still work if the xe <-> mei_gsc integration is done
>     correctly.
> 
> 2) If for some reason we do want to remove the module, then we will
>     need some work in kernel/module/  to start tracking runtime module
>     dependencies, i.e. when one module does a module_get(foo->owner), it
>     would add to a list and output on sysfs together with the holders list.
>     This way you would be able to track the runtime deps and remove them
>     if their refcount went to 0 after removing xe.
> 
> (2) is doable, but previous attempts were not successful [1]. Is  there something
> else to make the simpler solution (1) to work?
> 

For the reasoning behind this change, please see the review comments on this patch: https://patchwork.freedesktop.org/series/137343/

Regards,
Krishna.

> thanks
> Lucas De Marchi
> 
> [1] https://lore.kernel.org/linux-
> modules/cover.1652113087.git.mchehab@kernel.org/
> 
Lucas De Marchi Sept. 12, 2024, 8:42 p.m. UTC | #7
On Thu, Sep 12, 2024 at 11:58:37AM GMT, Bommu, Krishnaiah wrote:
>While running gta@xe_module_load@reload-no-display I see a failure, and this change is meant to address it. I previously tried to fix it in IGT, but following the IGT review suggestion I am now trying to fix the issue in the kernel.
>IGT patch: https://patchwork.freedesktop.org/series/137343/


It seems to be a mistake in igt to try to remove the mei_gsc module.
With a dgfx it's even worse: what happens if another card is using the
module? What happens if I have an RPL + BMG, with i915 driving the former
while xe drives the latter?

You shouldn't need to remove it.  This works for me with BMG (unbinding
all drivers for simplicity since we are removing the module... but if we
don't remove the module, then we can test with the only device we care
about):


# modprobe xe
# unbind
Unbinding /sys/bus/pci/devices/0000:00:02.0 (8086:a782)... ok
Unbinding /sys/bus/pci/devices/0000:03:00.0 (8086:e20b)... ok
# lsmod | grep -e xe -e mei_gsc
xe                   3584000  0
drm_gpuvm              45056  1 xe
video                  77824  1 xe
i2c_algo_bit           12288  1 xe
drm_ttm_helper         16384  1 xe
gpu_sched              61440  1 xe
drm_suballoc_helper    16384  1 xe
drm_display_helper    270336  1 xe
drm_kunit_helpers      16384  1 xe
drm_buddy              20480  1 xe
ttm                   114688  2 drm_ttm_helper,xe
mei_gsc_proxy          16384  0
mei_gsc                12288  0
drm_exec               16384  2 drm_gpuvm,xe
kunit                  73728  2 xe,drm_kunit_helpers
drm_kms_helper        241664  4 drm_display_helper,drm_ttm_helper,xe,drm_kunit_helpers
mei_me                 65536  3 mei_gsc
mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
drm                   737280  11 gpu_sched,drm_kms_helper,drm_exec,drm_gpuvm,drm_suballoc_helper,drm_display_helper,drm_buddy,drm_ttm_helper,xe,drm_kunit_helpers,ttm
# modprobe -r xe
# modprobe xe probe_display=0
# unbind
Unbinding /sys/bus/pci/devices/0000:00:02.0 (8086:a782)... ok
Unbinding /sys/bus/pci/devices/0000:03:00.0 (8086:e20b)... ok
# modprobe -r xe
# modprobe xe

I didn't check if mei_gsc continues to work after reload, but I guess so
as its refcount is incremented:

mei_gsc                12288  1
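
The third column that lsmod prints is the module's use count, which is what the check above relies on. As a minimal sketch (the helper name mod_refcount is illustrative and not part of this thread), that count can be pulled out of lsmod-format text like this:

```shell
# Print the "Used by" count (third lsmod column) for one module.
# Reads lsmod-format text on stdin; returns non-zero if the module is absent.
mod_refcount() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }'
}
```

Typical use would be "lsmod | mod_refcount mei_gsc" right after unbinding and again after reloading xe, to compare the two counts.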


The unbind function is this:

function unbind {
         # PCI class codes: 0300 = VGA controller, 0380 = other display controller
         vga="0300"
         display="0380"
         pci_vendor="8086"

         # lspci -n prints "<slot> <class>: <vendor>:<device> ..."
         while read -r pci_slot class devid xxx; do
                 sysdev=/sys/bus/pci/devices/0000:$pci_slot

                 echo -n "Unbinding $sysdev ($devid)... "
                 if [ ! -e "$sysdev/driver" ]; then
                         echo "(skip: not bound)"
                         continue
                 fi

                 # allow runtime PM, then detach the device from its driver
                 echo -n auto > "$sysdev/power/control"
                 echo -n "0000:$pci_slot" > "$sysdev/driver/unbind"
                 echo "ok"
         done < <(lspci -d "${pci_vendor}::${display}" -n; lspci -d "${pci_vendor}::${vga}" -n)
}
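
Unbinding is only half of the unbind/rebind cycle mentioned earlier in the thread. Below is a minimal sketch of the other half, assuming the standard sysfs bind file that every PCI driver exposes. The function name rebind and the SYSFS_ROOT override are hypothetical; the override is there only so the logic can be dry-run against a scratch directory instead of the real /sys.

```shell
# Hypothetical counterpart to unbind: attach one PCI device to a named driver.
# SYSFS_ROOT is an illustrative override for dry runs; it defaults to /sys.
rebind() {
    rebind_driver="$1"
    rebind_addr="$2"
    rebind_file="${SYSFS_ROOT:-/sys}/bus/pci/drivers/$rebind_driver/bind"

    # The bind file only exists while the driver module is loaded.
    if [ ! -e "$rebind_file" ]; then
        echo "rebind: $rebind_file not found (is $rebind_driver loaded?)" >&2
        return 1
    fi

    # Writing the full PCI address asks the driver core to probe that device.
    printf '%s' "$rebind_addr" > "$rebind_file"
}
```

On the BMG system shown above this would be called as "rebind xe 0000:03:00.0". Whether mei_gsc comes back in a working state after such a rebind is exactly the open question in this thread, so the sketch only covers the mechanics.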


So... for igt: I *think* simply removing the array with modules to
unload first would fix it.

Lucas De Marchi

Lucas De Marchi Sept. 13, 2024, 10:21 p.m. UTC | #8
On Thu, Sep 12, 2024 at 03:42:52PM GMT, Lucas De Marchi wrote:
>unbind function is this:
>
>function unbind {
>        vga="0300"
>        display="0380"
>        pci_vendor="8086"
>
>        while read -r pci_slot class devid xxx; do
>                sysdev=/sys/bus/pci/devices/0000:$pci_slot
>
>                echo -n "Unbinding $sysdev ($devid)... "
>                if [ ! -e "$sysdev/driver" ]; then
>                        echo "(skip: not bound)"
>                        continue
>                fi
>
>                echo -n auto > ${sysdev}/power/control
>                echo -n "0000:$pci_slot" > $sysdev/driver/unbind
>                echo "ok"
>        done <<<$(lspci -d ${pci_vendor}::${display} -n; lspci -d ${pci_vendor}::${vga} -n )
>}
>
>
>So... for igt: I *think* simply removing the array with modules to
>unload first would fix it.

I decided to be more useful than just giving the sketch above and typed
something similar to what I'm writing for kmod (soon we will have
`kmod [bind|unbind]` commands):

https://patchwork.freedesktop.org/series/138676/

xe_module_load@reload-no-display works for me with BMG with that patch.
Let's see if it passes the rest of the CI tests.

Lucas De Marchi

Patch

diff --git a/drivers/gpu/drm/i915/i915_module.c b/drivers/gpu/drm/i915/i915_module.c
index 65acd7bf75d0..2ad079ad35db 100644
--- a/drivers/gpu/drm/i915/i915_module.c
+++ b/drivers/gpu/drm/i915/i915_module.c
@@ -75,6 +75,8 @@  static const struct {
 };
 static int init_progress;
 
+MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
+
 static int __init i915_init(void)
 {
 	int err, i;
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index bfc3deebdaa2..5633ea1841b7 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -127,6 +127,8 @@  static void xe_call_exit_func(unsigned int i)
 	init_funcs[i].exit();
 }
 
+MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
+
 static int __init xe_init(void)
 {
 	int err, i;