diff mbox series

usb: roles: try to get/put all relevant modules

Message ID 20240112080108.1147450-1-xu.yang_2@nxp.com (mailing list archive)
State New
Headers show
Series usb: roles: try to get/put all relevant modules | expand

Commit Message

Xu Yang Jan. 12, 2024, 8:01 a.m. UTC
Generally, usb role switch device will be registered by usb controller
driver. Then this usb controller device will be the parent of the usb
role switch device. And also the usb controller device will not be a
standalone device, it may be registered by other glue drivers. Currently,
the glue driver can't aware the usage of usb role switch device. So it
will remove usb controller device when the glue driver is deregistered.
In this case, below kernel dump will be shown if the user of usb role
swich (such as tcpm) tries to put it.

[  248.891875] Hardware name: NXP i.MX95 19X19 board (DT)
[  248.896998] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  248.903948] pc : usb_role_switch_put+0x28/0x4c
[  248.908385] lr : tcpm_unregister_port+0xb8/0xf8 [tcpm]
[  248.913533] sp : ffff8000836fbbc0
[  248.916835] x29: ffff8000836fbbc0 x28: ffff0000899fd880 x27: 0000000000000000
[  248.923959] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[  248.931083] x23: ffff000081417970 x22: ffff00008aab10e8 x21: ffff00008aab0080
[  248.938207] x20: ffff00008aab3110 x19: ffff00008a138c00 x18: ffffffffffffffff
[  248.945331] x17: 0000000000000000 x16: 00e0030000000000 x15: 0000000000000000
[  248.952455] x14: 0000000000000001 x13: 0000000000000040 x12: 0000000000000000
[  248.959579] x11: 0000000000000000 x10: ffffffffffffffff x9 : ffff8000836fbaf0
[  248.966703] x8 : ffff8000836fbaf0 x7 : ffff000084237728 x6 : 0000000000000400
[  248.973827] x5 : 0000000041001000 x4 : fffffc000228ee60 x3 : 00000000820000f4
[  248.980951] x2 : ffff00008a3b9b28 x1 : 00000000820000f5 x0 : 0000000000000000
[  248.988076] Call trace:
[  248.990512]  usb_role_switch_put+0x28/0x4c
[  248.994602]  tcpm_unregister_port+0xb8/0xf8 [tcpm]
[  248.999385]  tcpci_remove+0x5c/0xbc [tcpci]
[  249.003571]  i2c_device_remove+0x2c/0x9c
[  249.007489]  device_remove+0x4c/0x80
[  249.011059]  device_release_driver_internal+0x1c8/0x224
[  249.016268]  driver_detach+0x50/0x98
[  249.019830]  bus_remove_driver+0x6c/0xbc
[  249.023739]  driver_unregister+0x30/0x60
[  249.027647]  i2c_del_driver+0x54/0x68
[  249.031296]  tcpci_i2c_driver_exit+0x18/0x990 [tcpci]
[  249.036340]  __arm64_sys_delete_module+0x180/0x260
[  249.041124]  invoke_syscall+0x48/0x114
[  249.044868]  el0_svc_common.constprop.0+0xc8/0xe8
[  249.049557]  do_el0_svc+0x20/0x2c
[  249.052858]  el0_svc+0x40/0xf4
[  249.055909]  el0t_64_sync_handler+0x13c/0x158
[  249.060251]  el0t_64_sync+0x190/0x194
[  249.063904] Code: b140041f 540000e8 f9402000 f9403400 (f9400800)
[  249.069985] ---[ end trace 0000000000000000 ]---

To fix this issue, this will try to get/put all relevant modules when the
user tries to get/put usb role switch device.

Fixes: 5c54fcac9a9d ("usb: roles: Take care of driver module reference counting")
cc: <stable@vger.kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
---
 drivers/usb/roles/class.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Comments

Greg KH Jan. 12, 2024, 8:24 a.m. UTC | #1
On Fri, Jan 12, 2024 at 04:01:08PM +0800, Xu Yang wrote:
> +void usb_role_switch_get_modules(struct device *dev)
> +{
> +	while (dev) {
> +		if (dev->driver)
> +			WARN_ON(!try_module_get(dev->driver->owner));

You just crashed all systems that have panic-on-warn enabled, which is
by far (i.e. in the billions) the huge majority of Linux systems in the
world.

If this is something that can fail, then properly handle the issue,
don't just give up and say "let's fill the kernel log with a mess and
reboot the box!".

Also, are you SURE that module owners are what you want to track here?
That's not usually the real solution for things like this, remember
module reference counts deal with code, but the driver model deals with
data...

thanks,

greg k-h
Greg KH Jan. 12, 2024, 8:26 a.m. UTC | #2
On Fri, Jan 12, 2024 at 09:24:11AM +0100, Greg KH wrote:
> On Fri, Jan 12, 2024 at 04:01:08PM +0800, Xu Yang wrote:
> > +void usb_role_switch_get_modules(struct device *dev)
> > +{
> > +	while (dev) {
> > +		if (dev->driver)
> > +			WARN_ON(!try_module_get(dev->driver->owner));
> 
> You just crashed all systems that have panic-on-warn enabled, which is
> by far (i.e. in the billions) the huge majority of Linux systems in the
> world.
> 
> If this is something that can fail, then properly handle the issue,
> don't just give up and say "let's fill the kernel log with a mess and
> reboot the box!".

Ah, I see now you are just moving the code, but please, let's fix this
properly, don't persist in the wrong code here.

thanks,

greg k-h
Xu Yang Jan. 12, 2024, 9:28 a.m. UTC | #3
Hi Greg,

> 
> On Fri, Jan 12, 2024 at 09:24:11AM +0100, Greg KH wrote:
> > On Fri, Jan 12, 2024 at 04:01:08PM +0800, Xu Yang wrote:
> > > +void usb_role_switch_get_modules(struct device *dev)
> > > +{
> > > +   while (dev) {
> > > +           if (dev->driver)
> > > +                   WARN_ON(!try_module_get(dev->driver->owner));
> >
> > You just crashed all systems that have panic-on-warn enabled, which is
> > by far (i.e. in the billions) the huge majority of Linux systems in the
> > world.
> >
> > If this is something that can fail, then properly handle the issue,
> > don't just give up and say "let's fill the kernel log with a mess and
> > reboot the box!".
> 
> Ah, I see now you are just moving the code, but please, let's fix this
> properly, don't persist in the wrong code here.

This is a true module dependency issue and it only occurs when
load/unload modules. The dependency of usb controller glue driver,
usb controller driver and the user driver (such as tcpm) of usb role
switch is not correctly established.

This patch is used to adjust dependency of them, without it, two issues
may happen:
1. "NULL pointer dereference" kernel dump will be shown
2.  The reference count of usb controller module may never decrease to 0

Thanks, 
Xu Yang

> 
> thanks,
> 
> greg k-h
Greg KH Jan. 12, 2024, 1:52 p.m. UTC | #4
On Fri, Jan 12, 2024 at 09:28:04AM +0000, Xu Yang wrote:
> Hi Greg,
> 
> > 
> > On Fri, Jan 12, 2024 at 09:24:11AM +0100, Greg KH wrote:
> > > On Fri, Jan 12, 2024 at 04:01:08PM +0800, Xu Yang wrote:
> > > > +void usb_role_switch_get_modules(struct device *dev)
> > > > +{
> > > > +   while (dev) {
> > > > +           if (dev->driver)
> > > > +                   WARN_ON(!try_module_get(dev->driver->owner));
> > >
> > > You just crashed all systems that have panic-on-warn enabled, which is
> > > by far (i.e. in the billions) the huge majority of Linux systems in the
> > > world.
> > >
> > > If this is something that can fail, then properly handle the issue,
> > > don't just give up and say "let's fill the kernel log with a mess and
> > > reboot the box!".
> > 
> > Ah, I see now you are just moving the code, but please, let's fix this
> > properly, don't persist in the wrong code here.
> 
> This is a true module dependency issue and it only occurs when
> load/unload modules. The dependency of usb controller glue driver,
> usb controller driver and the user driver (such as tcpm) of usb role
> switch is not correctly established.

Then the driver model is not being used properly, as no module
references should be needed here.

> This patch is used to adjust dependency of them, without it, two issues
> may happen:
> 1. "NULL pointer dereference" kernel dump will be shown

For when?

> 2.  The reference count of usb controller module may never decrease to 0

The reference shouldn't have been increased as you want to be able to
unload the driver if a device is still present in the system.

So I think there's a larger issue here, module references shouldn't be
incremented just if a driver binds to a device, right?  Only if other
code is using the module, which is different.

thanks,

greg k-h
Greg KH Jan. 14, 2024, 12:25 p.m. UTC | #5
On Fri, Jan 12, 2024 at 04:01:08PM +0800, Xu Yang wrote:
> Generally, usb role switch device will be registered by usb controller
> driver. Then this usb controller device will be the parent of the usb
> role switch device. And also the usb controller device will not be a
> standalone device, it may be registered by other glue drivers. Currently,
> the glue driver can't aware the usage of usb role switch device. So it
> will remove usb controller device when the glue driver is deregistered.
> In this case, below kernel dump will be shown if the user of usb role
> swich (such as tcpm) tries to put it.

Note, manual removal of modules is just that, manual.  No module is
every auto-unloaded.

So how are you seeing this error in a real system?

> 
> [  248.891875] Hardware name: NXP i.MX95 19X19 board (DT)
> [  248.896998] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  248.903948] pc : usb_role_switch_put+0x28/0x4c
> [  248.908385] lr : tcpm_unregister_port+0xb8/0xf8 [tcpm]
> [  248.913533] sp : ffff8000836fbbc0
> [  248.916835] x29: ffff8000836fbbc0 x28: ffff0000899fd880 x27: 0000000000000000
> [  248.923959] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> [  248.931083] x23: ffff000081417970 x22: ffff00008aab10e8 x21: ffff00008aab0080
> [  248.938207] x20: ffff00008aab3110 x19: ffff00008a138c00 x18: ffffffffffffffff
> [  248.945331] x17: 0000000000000000 x16: 00e0030000000000 x15: 0000000000000000
> [  248.952455] x14: 0000000000000001 x13: 0000000000000040 x12: 0000000000000000
> [  248.959579] x11: 0000000000000000 x10: ffffffffffffffff x9 : ffff8000836fbaf0
> [  248.966703] x8 : ffff8000836fbaf0 x7 : ffff000084237728 x6 : 0000000000000400
> [  248.973827] x5 : 0000000041001000 x4 : fffffc000228ee60 x3 : 00000000820000f4
> [  248.980951] x2 : ffff00008a3b9b28 x1 : 00000000820000f5 x0 : 0000000000000000
> [  248.988076] Call trace:
> [  248.990512]  usb_role_switch_put+0x28/0x4c
> [  248.994602]  tcpm_unregister_port+0xb8/0xf8 [tcpm]
> [  248.999385]  tcpci_remove+0x5c/0xbc [tcpci]
> [  249.003571]  i2c_device_remove+0x2c/0x9c
> [  249.007489]  device_remove+0x4c/0x80
> [  249.011059]  device_release_driver_internal+0x1c8/0x224
> [  249.016268]  driver_detach+0x50/0x98
> [  249.019830]  bus_remove_driver+0x6c/0xbc
> [  249.023739]  driver_unregister+0x30/0x60
> [  249.027647]  i2c_del_driver+0x54/0x68
> [  249.031296]  tcpci_i2c_driver_exit+0x18/0x990 [tcpci]
> [  249.036340]  __arm64_sys_delete_module+0x180/0x260
> [  249.041124]  invoke_syscall+0x48/0x114
> [  249.044868]  el0_svc_common.constprop.0+0xc8/0xe8
> [  249.049557]  do_el0_svc+0x20/0x2c
> [  249.052858]  el0_svc+0x40/0xf4
> [  249.055909]  el0t_64_sync_handler+0x13c/0x158
> [  249.060251]  el0t_64_sync+0x190/0x194
> [  249.063904] Code: b140041f 540000e8 f9402000 f9403400 (f9400800)
> [  249.069985] ---[ end trace 0000000000000000 ]---
> 
> To fix this issue, this will try to get/put all relevant modules when the
> user tries to get/put usb role switch device.

With this change, is the module now unable to be loaded if the hardware
is present?

And again, why isn't this properly represented in the driver model so
that if the module is unloaded, the relevant code paths are not properly
cleaned up and the device is removed _before_ the module is unloaded?

thanks,

greg k-h
kernel test robot Jan. 14, 2024, 11:06 p.m. UTC | #6
Hi Xu,

kernel test robot noticed the following build warnings:

[auto build test WARNING on usb/usb-testing]
[also build test WARNING on usb/usb-next usb/usb-linus hid/for-next westeri-thunderbolt/next linus/master v6.7 next-20240112]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yang/usb-roles-try-to-get-put-all-relevant-modules/20240112-155735
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
patch link:    https://lore.kernel.org/r/20240112080108.1147450-1-xu.yang_2%40nxp.com
patch subject: [PATCH] usb: roles: try to get/put all relevant modules
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20240115/202401150608.0Ok2pmGw-lkp@intel.com/config)
compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 9bde5becb44ea071f5e1fa1f5d4071dc8788b18c)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240115/202401150608.0Ok2pmGw-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401150608.0Ok2pmGw-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/usb/roles/class.c:37:6: warning: no previous prototype for function 'usb_role_switch_get_modules' [-Wmissing-prototypes]
      37 | void usb_role_switch_get_modules(struct device *dev)
         |      ^
   drivers/usb/roles/class.c:37:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
      37 | void usb_role_switch_get_modules(struct device *dev)
         | ^
         | static 
>> drivers/usb/roles/class.c:46:6: warning: no previous prototype for function 'usb_role_switch_put_modules' [-Wmissing-prototypes]
      46 | void usb_role_switch_put_modules(struct device *dev)
         |      ^
   drivers/usb/roles/class.c:46:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
      46 | void usb_role_switch_put_modules(struct device *dev)
         | ^
         | static 
   2 warnings generated.


vim +/usb_role_switch_get_modules +37 drivers/usb/roles/class.c

    36	
  > 37	void usb_role_switch_get_modules(struct device *dev)
    38	{
    39		while (dev) {
    40			if (dev->driver)
    41				WARN_ON(!try_module_get(dev->driver->owner));
    42			dev = dev->parent;
    43		}
    44	}
    45	
  > 46	void usb_role_switch_put_modules(struct device *dev)
    47	{
    48		while (dev) {
    49			if (dev->driver)
    50				module_put(dev->driver->owner);
    51			dev = dev->parent;
    52		}
    53	}
    54
kernel test robot Jan. 15, 2024, 2:12 a.m. UTC | #7
Hi Xu,

kernel test robot noticed the following build warnings:

[auto build test WARNING on usb/usb-testing]
[also build test WARNING on usb/usb-next usb/usb-linus hid/for-next westeri-thunderbolt/next linus/master v6.7 next-20240112]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yang/usb-roles-try-to-get-put-all-relevant-modules/20240112-155735
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
patch link:    https://lore.kernel.org/r/20240112080108.1147450-1-xu.yang_2%40nxp.com
patch subject: [PATCH] usb: roles: try to get/put all relevant modules
config: i386-randconfig-061-20240115 (https://download.01.org/0day-ci/archive/20240115/202401151012.Dzvmty7a-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240115/202401151012.Dzvmty7a-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401151012.Dzvmty7a-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/usb/roles/class.c:37:6: sparse: sparse: symbol 'usb_role_switch_get_modules' was not declared. Should it be static?
>> drivers/usb/roles/class.c:46:6: sparse: sparse: symbol 'usb_role_switch_put_modules' was not declared. Should it be static?

vim +/usb_role_switch_get_modules +37 drivers/usb/roles/class.c

    36	
  > 37	void usb_role_switch_get_modules(struct device *dev)
    38	{
    39		while (dev) {
    40			if (dev->driver)
    41				WARN_ON(!try_module_get(dev->driver->owner));
    42			dev = dev->parent;
    43		}
    44	}
    45	
  > 46	void usb_role_switch_put_modules(struct device *dev)
    47	{
    48		while (dev) {
    49			if (dev->driver)
    50				module_put(dev->driver->owner);
    51			dev = dev->parent;
    52		}
    53	}
    54
Xu Yang Jan. 15, 2024, 3:02 a.m. UTC | #8
Hi Greg,

> 
> On Fri, Jan 12, 2024 at 09:28:04AM +0000, Xu Yang wrote:
> > Hi Greg,
> >
> > >
> > > On Fri, Jan 12, 2024 at 09:24:11AM +0100, Greg KH wrote:
> > > > On Fri, Jan 12, 2024 at 04:01:08PM +0800, Xu Yang wrote:
> > > > > +void usb_role_switch_get_modules(struct device *dev)
> > > > > +{
> > > > > +   while (dev) {
> > > > > +           if (dev->driver)
> > > > > +                   WARN_ON(!try_module_get(dev->driver->owner));
> > > >
> > > > You just crashed all systems that have panic-on-warn enabled, which is
> > > > by far (i.e. in the billions) the huge majority of Linux systems in the
> > > > world.
> > > >
> > > > If this is something that can fail, then properly handle the issue,
> > > > don't just give up and say "let's fill the kernel log with a mess and
> > > > reboot the box!".
> > >
> > > Ah, I see now you are just moving the code, but please, let's fix this
> > > properly, don't persist in the wrong code here.
> >
> > This is a true module dependency issue and it only occurs when
> > load/unload modules. The dependency of usb controller glue driver,
> > usb controller driver and the user driver (such as tcpm) of usb role
> > switch is not correctly established.
> 
> Then the driver model is not being used properly, as no module
> references should be needed here.

Taking below diagram as example:

     ci_hdrc.0        register   usb    get     tcpm_port
  (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
         ^  ^                    switch           |   ^
         |  |                                     |   |
       +1|  |           +1                        |   |+1
         |  +-------------------------------------+   |
         |                                            |
     4c200000.usb                                   1-0050
(driver: ci_hdrc_imx)                            (driver: tcpci)

1. Driver ci_hdrc_imx and tcpci are built as module at least.
2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
   and try to get ci_hdrc module's reference. ci_hdrc will register
   usb-role-switch device.
3. When module tcpci is loaded, it will register tcpm port device and try
   to get tcpm module's reference. The tcpm module will get usb-role-switch
   which is registered by ci_hdrc. In current design, tcpm will also try to
   get ci_hdrc module's reference after get usb-role-switch.

> 
> > This patch is used to adjust dependency of them, without it, two issues
> > may happen:
> > 1. "NULL pointer dereference" kernel dump will be shown
> 
> For when?

4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
   unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
   device usb-role-switch is also unregistered.
5. Then, if I try to unload module tcpci, "NULL pointer dereference"
   will be shown due to below code:
   
   module_put(sw->dev.parent->driver->owner);
   
   parent->driver is NULL at this time.

> 
> > 2.  The reference count of usb controller module may never decrease to 0
> 
> The reference shouldn't have been increased as you want to be able to
> unload the driver if a device is still present in the system.
> 
> So I think there's a larger issue here, module references shouldn't be
> incremented just if a driver binds to a device, right?  Only if other
> code is using the module, which is different.

6. If ci_hdrc is also built as module, the module reference will keep at
   value 1 at least due to it failed to subtract 1 at step 5.

This issue is observed on chipidea and dwc3, the usb glue driver shouldn't
be unloaded befer tcpci module.

Thanks,
Xu Yang

> 
> thanks,
> 
> greg k-h
Greg KH Jan. 15, 2024, 6:54 a.m. UTC | #9
On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> Hi Greg,
> 
> > 
> > On Fri, Jan 12, 2024 at 09:28:04AM +0000, Xu Yang wrote:
> > > Hi Greg,
> > >
> > > >
> > > > On Fri, Jan 12, 2024 at 09:24:11AM +0100, Greg KH wrote:
> > > > > On Fri, Jan 12, 2024 at 04:01:08PM +0800, Xu Yang wrote:
> > > > > > +void usb_role_switch_get_modules(struct device *dev)
> > > > > > +{
> > > > > > +   while (dev) {
> > > > > > +           if (dev->driver)
> > > > > > +                   WARN_ON(!try_module_get(dev->driver->owner));
> > > > >
> > > > > You just crashed all systems that have panic-on-warn enabled, which is
> > > > > by far (i.e. in the billions) the huge majority of Linux systems in the
> > > > > world.
> > > > >
> > > > > If this is something that can fail, then properly handle the issue,
> > > > > don't just give up and say "let's fill the kernel log with a mess and
> > > > > reboot the box!".
> > > >
> > > > Ah, I see now you are just moving the code, but please, let's fix this
> > > > properly, don't persist in the wrong code here.
> > >
> > > This is a true module dependency issue and it only occurs when
> > > load/unload modules. The dependency of usb controller glue driver,
> > > usb controller driver and the user driver (such as tcpm) of usb role
> > > switch is not correctly established.
> > 
> > Then the driver model is not being used properly, as no module
> > references should be needed here.
> 
> Taking below diagram as example:
> 
>      ci_hdrc.0        register   usb    get     tcpm_port
>   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
>          ^  ^                    switch           |   ^
>          |  |                                     |   |
>        +1|  |           +1                        |   |+1
>          |  +-------------------------------------+   |
>          |                                            |
>      4c200000.usb                                   1-0050
> (driver: ci_hdrc_imx)                            (driver: tcpci)
> 
> 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
>    and try to get ci_hdrc module's reference. ci_hdrc will register
>    usb-role-switch device.
> 3. When module tcpci is loaded, it will register tcpm port device and try
>    to get tcpm module's reference. The tcpm module will get usb-role-switch
>    which is registered by ci_hdrc. In current design, tcpm will also try to
>    get ci_hdrc module's reference after get usb-role-switch.
> 
> > 
> > > This patch is used to adjust dependency of them, without it, two issues
> > > may happen:
> > > 1. "NULL pointer dereference" kernel dump will be shown
> > 
> > For when?
> 
> 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
>    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
>    device usb-role-switch is also unregistered.
> 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
>    will be shown due to below code:
>    
>    module_put(sw->dev.parent->driver->owner);
>    
>    parent->driver is NULL at this time.

So that means that someone is not properly cleaning up the devices when
the module is removed, can you fix that root problem here instead?

thanks,

greg k-h
Alan Stern Jan. 15, 2024, 4:31 p.m. UTC | #10
Those of us unfamiliar with this code need you to explain a lot more 
about what's going on.

On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> Taking below diagram as example:
> 
>      ci_hdrc.0        register   usb    get     tcpm_port
>   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
>          ^  ^                    switch           |   ^
>          |  |                                     |   |
>        +1|  |           +1                        |   |+1
>          |  +-------------------------------------+   |
>          |                                            |
>      4c200000.usb                                   1-0050
> (driver: ci_hdrc_imx)                            (driver: tcpci)
> 
> 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
>    and try to get ci_hdrc module's reference.

This is very confusing.  Normally, a device is registered by the parent 
module and its driver belongs in the child module.  When the child 
module is loaded it automatically gets a reference to the parent module, 
because it calls functions that are defined in the parent.  I don't know 
of any cases where a parent module takes a reference to one of its 
children -- this would make it impossible to unload the child module!

In your diagram I can't tell whether ci_hdrc is the parent module and 
ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is 
the child, since it the one which gets a reference to the other.  But 
now we have the ci_hdrc.0 device being registered by the child module 
and its driver belonging to the parent module, which is backward!

Very difficult to understand.  Please explain more fully.

>  ci_hdrc will register
>    usb-role-switch device.
> 3. When module tcpci is loaded, it will register tcpm port device and try
>    to get tcpm module's reference. The tcpm module will get usb-role-switch
>    which is registered by ci_hdrc.

What do you mean by "will get"?  Do you mean that tcpm will become the 
driver for the usb_role_switch device?  Or do you mean that it simply 
calls get_device(&usb_role_switch)?

If the latter is the case, how does the tcpm driver learn the address of 
usb_role_switch in the first place?

>  In current design, tcpm will also try to
>    get ci_hdrc module's reference after get usb-role-switch.

This might be a bug.  There should not be any need for the tcpm driver 
to take a reference to the ci_hdrc module.  But there should be a way 
for the ci_hdrc driver to notify tcpm when the usb_role_switch device is 
about to be unregistered.  If tcpm is usb_role_switch's driver then this 
notification happens automatically, by means of the .remove() callback.

> 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
>    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
>    device usb-role-switch is also unregistered.

At this point, tcpm should learn that it has to drop all its references 
to usb_role_swich.  Since the module which registered usb_role_switch 
isn't tcpm's ancestor, tcpm must not keep _any_ references to the device 
after it is unregistered.

Well, strictly speaking that's not true.  By misusing the driver model, 
tcpm could keep a reference to the ci_hdrc module until it was finished 
using usb_role_switch.  Is that what you are trying to do?

> 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
>    will be shown due to below code:
>    
>    module_put(sw->dev.parent->driver->owner);
>    
>    parent->driver is NULL at this time.

What is dev at this point?  And what is dev.parent?  And what did 
dev.parent->driver used to be before it was set to NULL?

Alan Stern
Xu Yang Jan. 16, 2024, 5:44 a.m. UTC | #11
Hi Alan,

> 
> Those of us unfamiliar with this code need you to explain a lot more
> about what's going on.
> 
> On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > Taking below diagram as example:
> >
> >      ci_hdrc.0        register   usb    get     tcpm_port
> >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> >          ^  ^                    switch           |   ^
> >          |  |                                     |   |
> >        +1|  |           +1                        |   |+1
> >          |  +-------------------------------------+   |
> >          |                                            |
> >      4c200000.usb                                   1-0050
> > (driver: ci_hdrc_imx)                            (driver: tcpci)
> >
> > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> >    and try to get ci_hdrc module's reference.
> 
> This is very confusing.  Normally, a device is registered by the parent
> module and its driver belongs in the child module.  When the child
> module is loaded it automatically gets a reference to the parent module,
> because it calls functions that are defined in the parent.  I don't know
> of any cases where a parent module takes a reference to one of its
> children -- this would make it impossible to unload the child module!
> 
> In your diagram I can't tell whether ci_hdrc is the parent module and
> ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> the child, since it the one which gets a reference to the other.  But
> now we have the ci_hdrc.0 device being registered by the child module
> and its driver belonging to the parent module, which is backward!
> 
> Very difficult to understand.  Please explain more fully.

I checked again and let me correct the words.

2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
   At the same time, the reference of module ci_hdrc is added by 1
   automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
   ci_hdrc will register usb-role-switch device.

Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
is a child of 4c200000.usb.

> 
> >  ci_hdrc will register
> >    usb-role-switch device.
> > 3. When module tcpci is loaded, it will register tcpm port device and try
> >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> >    which is registered by ci_hdrc.
> 
> What do you mean by "will get"?  Do you mean that tcpm will become the
> driver for the usb_role_switch device?  Or do you mean that it simply
> calls get_device(&usb_role_switch)?
> 
> If the latter is the case, how does the tcpm driver learn the address of
> usb_role_switch in the first place?

Via
port->role_sw = usb_role_switch_get(port->dev) 
or
port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).

The usb controller will register usb-role-swtich device to the global list
of usb_role class. The fwnode of usb-role-swtich device is also set to usb
controller's fwnode. Initially, a fwnode graph between usb controller of
node and tcpm connector node had already been established. These two
functions will find usb-role-swtich device based on this fwnode graph
and fwnode matching. After usb-role-switce device is found, these two
functions will call: try_module_get(sw->dev.parent->driver->owner).

Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.

> 
> >  In current design, tcpm will also try to
> >    get ci_hdrc module's reference after get usb-role-switch.
> 
> This might be a bug.  There should not be any need for the tcpm driver
> to take a reference to the ci_hdrc module.  But there should be a way
> for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> about to be unregistered.  If tcpm is usb_role_switch's driver then this
> notification happens automatically, by means of the .remove() callback.

I'm not the designer of usb_role class driver. Not sure if this is needed to get
module reference of its parent device's driver. Maybe need @heikki's input.

@heikki.krogerus, can you give some explanations?

> 
> > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> >    device usb-role-switch is also unregistered.
> 
> At this point, tcpm should learn that it has to drop all its references
> to usb_role_swich.  Since the module which registered usb_role_switch
> isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> after it is unregistered.

Yes, I also think so.

> 
> Well, strictly speaking that's not true.  By misusing the driver model,
> tcpm could keep a reference to the ci_hdrc module until it was finished
> using usb_role_switch.  Is that what you are trying to do?

No, I'm trying to get module reference of ci_hdrc_imx too. Then, 
ci_hdrc_imx can't be unloaded before tcpci module unloaded.

> 
> > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> >    will be shown due to below code:
> >
> >    module_put(sw->dev.parent->driver->owner);
> >
> >    parent->driver is NULL at this time.
> 
> What is dev at this point?  And what is dev.parent?  And what did
> dev.parent->driver used to be before it was set to NULL?

Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
sw->dev.parent->driver was ci_hdrc.

Thanks,
Xu Yang

> 
> Alan Stern
Alan Stern Jan. 16, 2024, 4:03 p.m. UTC | #12
On Tue, Jan 16, 2024 at 05:44:47AM +0000, Xu Yang wrote:
> Hi Alan,
> 
> > 
> > Those of us unfamiliar with this code need you to explain a lot more
> > about what's going on.
> > 
> > On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > > Taking below diagram as example:
> > >
> > >      ci_hdrc.0        register   usb    get     tcpm_port
> > >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> > >          ^  ^                    switch           |   ^
> > >          |  |                                     |   |
> > >        +1|  |           +1                        |   |+1
> > >          |  +-------------------------------------+   |
> > >          |                                            |
> > >      4c200000.usb                                   1-0050
> > > (driver: ci_hdrc_imx)                            (driver: tcpci)
> > >
> > > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> > >    and try to get ci_hdrc module's reference.
> > 
> > This is very confusing.  Normally, a device is registered by the parent
> > module and its driver belongs in the child module.  When the child
> > module is loaded it automatically gets a reference to the parent module,
> > because it calls functions that are defined in the parent.  I don't know
> > of any cases where a parent module takes a reference to one of its
> > children -- this would make it impossible to unload the child module!
> > 
> > In your diagram I can't tell whether ci_hdrc is the parent module and
> > ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> > the child, since it the one which gets a reference to the other.  But
> > now we have the ci_hdrc.0 device being registered by the child module
> > and its driver belonging to the parent module, which is backward!
> > 
> > Very difficult to understand.  Please explain more fully.
> 
> I checked again and let me correct the words.
> 
> 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
>    At the same time, the reference of module ci_hdrc is added by 1
>    automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
>    ci_hdrc will register usb-role-switch device.
> 
> Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
> is a child of 4c200000.usb.

And ci_hdrc_imx is a child module of ci_hdrc.  Got it.

> > >  ci_hdrc will register
> > >    usb-role-switch device.
> > > 3. When module tcpci is loaded, it will register tcpm port device and try
> > >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> > >    which is registered by ci_hdrc.
> > 
> > What do you mean by "will get"?  Do you mean that tcpm will become the
> > driver for the usb_role_switch device?  Or do you mean that it simply
> > calls get_device(&usb_role_switch)?
> > 
> > If the latter is the case, how does the tcpm driver learn the address of
> > usb_role_switch in the first place?
> 
> Via
> port->role_sw = usb_role_switch_get(port->dev) 
> or
> port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).
> 
> The usb controller will register usb-role-swtich device to the global list
> of usb_role class. The fwnode of usb-role-swtich device is also set to usb
> controller's fwnode. Initially, a fwnode graph between usb controller of
> node and tcpm connector node had already been established. These two
> functions will find usb-role-swtich device based on this fwnode graph
> and fwnode matching.

If usb_role_switch_get() gives away references to the usb_role_switch 
device, it should have a way to take those references back.  But I guess 
it doesn't.

>  After usb-role-switce device is found, these two
> functions will call: try_module_get(sw->dev.parent->driver->owner).

You mean usb_role_switch_get() and fwnode_usb_role_switch_get() do this?

> Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.
> 
> > 
> > >  In current design, tcpm will also try to
> > >    get ci_hdrc module's reference after get usb-role-switch.
> > 
> > This might be a bug.  There should not be any need for the tcpm driver
> > to take a reference to the ci_hdrc module.  But there should be a way
> > for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> > about to be unregistered.  If tcpm is usb_role_switch's driver then this
> > notification happens automatically, by means of the .remove() callback.
> 
> I'm not the designer of usb_role class driver. Not sure if this is needed to get
> module reference of its parent device's driver. Maybe need @heikki's input.
> 
> @heikki.krogerus, can you give some explanations?

Yes, please, some additional explanation would help.

> > > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> > >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> > >    device usb-role-switch is also unregistered.
> > 
> > At this point, tcpm should learn that it has to drop all its references
> > to usb_role_swich.  Since the module which registered usb_role_switch
> > isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> > after it is unregistered.
> 
> Yes, I also think so.
> 
> > 
> > Well, strictly speaking that's not true.  By misusing the driver model,
> > tcpm could keep a reference to the ci_hdrc module until it was finished
> > using usb_role_switch.  Is that what you are trying to do?
> 
> No, I'm trying to get module reference of ci_hdrc_imx too. Then, 
> ci_hdrc_imx can't be unloaded before tcpci module unloaded.

You shouldn't do this.  Users should be able to unload ci_hdrc_imx 
whenever they want, even if tcpci is still loaded.

> > > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> > >    will be shown due to below code:
> > >
> > >    module_put(sw->dev.parent->driver->owner);

I forgot to ask: What function makes this call?  Is it part of the 
usb_role class driver?

> > >    parent->driver is NULL at this time.
> > 
> > What is dev at this point?  And what is dev.parent?  And what did
> > dev.parent->driver used to be before it was set to NULL?
> 
> Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
> sw->dev.parent->driver was ci_hdrc.

Which is now gone, right.  I understand.

Let's see what Heikki has to say.

However, assuming he wants to continue misusing the driver model in this 
way, what you should do is add a new field to sw, where you will store 
sw->dev.parent->driver.owner at the time of the try_module_get() call 
(but only if the call succeeds!).  Then when the module_put() call runs, 
have it use the value stored in this new field instead of dereferencing 
sw->dev.parent->driver.owner.

Alan Stern
Xu Yang Jan. 17, 2024, 5:57 a.m. UTC | #13
Hi Alan,

> 
> On Tue, Jan 16, 2024 at 05:44:47AM +0000, Xu Yang wrote:
> > Hi Alan,
> >
> > >
> > > Those of us unfamiliar with this code need you to explain a lot more
> > > about what's going on.
> > >
> > > On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > > > Taking below diagram as example:
> > > >
> > > >      ci_hdrc.0        register   usb    get     tcpm_port
> > > >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> > > >          ^  ^                    switch           |   ^
> > > >          |  |                                     |   |
> > > >        +1|  |           +1                        |   |+1
> > > >          |  +-------------------------------------+   |
> > > >          |                                            |
> > > >      4c200000.usb                                   1-0050
> > > > (driver: ci_hdrc_imx)                            (driver: tcpci)
> > > >
> > > > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> > > >    and try to get ci_hdrc module's reference.
> > >
> > > This is very confusing.  Normally, a device is registered by the parent
> > > module and its driver belongs in the child module.  When the child
> > > module is loaded it automatically gets a reference to the parent module,
> > > because it calls functions that are defined in the parent.  I don't know
> > > of any cases where a parent module takes a reference to one of its
> > > children -- this would make it impossible to unload the child module!
> > >
> > > In your diagram I can't tell whether ci_hdrc is the parent module and
> > > ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> > > the child, since it the one which gets a reference to the other.  But
> > > now we have the ci_hdrc.0 device being registered by the child module
> > > and its driver belonging to the parent module, which is backward!
> > >
> > > Very difficult to understand.  Please explain more fully.
> >
> > I checked again and let me correct the words.
> >
> > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
> >    At the same time, the reference of module ci_hdrc is added by 1
> >    automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
> >    ci_hdrc will register usb-role-switch device.
> >
> > Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
> > is a child of 4c200000.usb.
> 
> And ci_hdrc_imx is a child module of ci_hdrc.  Got it.
> 
> > > >  ci_hdrc will register
> > > >    usb-role-switch device.
> > > > 3. When module tcpci is loaded, it will register tcpm port device and try
> > > >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> > > >    which is registered by ci_hdrc.
> > >
> > > What do you mean by "will get"?  Do you mean that tcpm will become the
> > > driver for the usb_role_switch device?  Or do you mean that it simply
> > > calls get_device(&usb_role_switch)?
> > >
> > > If the latter is the case, how does the tcpm driver learn the address of
> > > usb_role_switch in the first place?
> >
> > Via
> > port->role_sw = usb_role_switch_get(port->dev)
> > or
> > port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).
> >
> > The usb controller will register usb-role-swtich device to the global list
> > of usb_role class. The fwnode of usb-role-swtich device is also set to usb
> > controller's fwnode. Initially, a fwnode graph between usb controller of
> > node and tcpm connector node had already been established. These two
> > functions will find usb-role-swtich device based on this fwnode graph
> > and fwnode matching.
> 
> If usb_role_switch_get() gives away references to the usb_role_switch
> device, it should have a way to take those references back.  But I guess
> it doesn't.
> 
> >  After usb-role-switce device is found, these two
> > functions will call: try_module_get(sw->dev.parent->driver->owner).
> 
> You mean usb_role_switch_get() and fwnode_usb_role_switch_get() do this?

Yes.

> 
> > Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.
> >
> > >
> > > >  In current design, tcpm will also try to
> > > >    get ci_hdrc module's reference after get usb-role-switch.
> > >
> > > This might be a bug.  There should not be any need for the tcpm driver
> > > to take a reference to the ci_hdrc module.  But there should be a way
> > > for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> > > about to be unregistered.  If tcpm is usb_role_switch's driver then this
> > > notification happens automatically, by means of the .remove() callback.
> >
> > I'm not the designer of usb_role class driver. Not sure if this is needed to get
> > module reference of its parent device's driver. Maybe need @heikki's input.
> >
> > @heikki.krogerus, can you give some explanations?
> 
> Yes, please, some additional explanation would help.
> 
> > > > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> > > >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> > > >    device usb-role-switch is also unregistered.
> > >
> > > At this point, tcpm should learn that it has to drop all its references
> > > to usb_role_swich.  Since the module which registered usb_role_switch
> > > isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> > > after it is unregistered.
> >
> > Yes, I also think so.
> >
> > >
> > > Well, strictly speaking that's not true.  By misusing the driver model,
> > > tcpm could keep a reference to the ci_hdrc module until it was finished
> > > using usb_role_switch.  Is that what you are trying to do?
> >
> > No, I'm trying to get module reference of ci_hdrc_imx too. Then,
> > ci_hdrc_imx can't be unloaded before tcpci module unloaded.
> 
> You shouldn't do this.  Users should be able to unload ci_hdrc_imx
> whenever they want, even if tcpci is still loaded.

Okay. Understand.

> 
> > > > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> > > >    will be shown due to below code:
> > > >
> > > >    module_put(sw->dev.parent->driver->owner);
> 
> I forgot to ask: What function makes this call?  Is it part of the
> usb_role class driver?

usb_role_switch_put() do this.
Yes, it's a function of usb_role class driver.

> 
> > > >    parent->driver is NULL at this time.
> > >
> > > What is dev at this point?  And what is dev.parent?  And what did
> > > dev.parent->driver used to be before it was set to NULL?
> >
> > Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
> > sw->dev.parent->driver was ci_hdrc.
> 
> Which is now gone, right.  I understand.
> 
> Let's see what Heikki has to say.
> 
> However, assuming he wants to continue misusing the driver model in this
> way, what you should do is add a new field to sw, where you will store
> sw->dev.parent->driver.owner at the time of the try_module_get() call
> (but only if the call succeeds!).  Then when the module_put() call runs,
> have it use the value stored in this new field instead of dereferencing
> sw->dev.parent->driver.owner.

It sounds like a better solution. 
Thanks for the suggestion!

Best Regards,
Xu Yang
Heikki Krogerus Jan. 18, 2024, 9:22 a.m. UTC | #14
Hi,

On Wed, Jan 17, 2024 at 05:57:02AM +0000, Xu Yang wrote:
> Hi Alan,
> 
> > 
> > On Tue, Jan 16, 2024 at 05:44:47AM +0000, Xu Yang wrote:
> > > Hi Alan,
> > >
> > > >
> > > > Those of us unfamiliar with this code need you to explain a lot more
> > > > about what's going on.
> > > >
> > > > On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > > > > Taking below diagram as example:
> > > > >
> > > > >      ci_hdrc.0        register   usb    get     tcpm_port
> > > > >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> > > > >          ^  ^                    switch           |   ^
> > > > >          |  |                                     |   |
> > > > >        +1|  |           +1                        |   |+1
> > > > >          |  +-------------------------------------+   |
> > > > >          |                                            |
> > > > >      4c200000.usb                                   1-0050
> > > > > (driver: ci_hdrc_imx)                            (driver: tcpci)
> > > > >
> > > > > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> > > > >    and try to get ci_hdrc module's reference.
> > > >
> > > > This is very confusing.  Normally, a device is registered by the parent
> > > > module and its driver belongs in the child module.  When the child
> > > > module is loaded it automatically gets a reference to the parent module,
> > > > because it calls functions that are defined in the parent.  I don't know
> > > > of any cases where a parent module takes a reference to one of its
> > > > children -- this would make it impossible to unload the child module!
> > > >
> > > > In your diagram I can't tell whether ci_hdrc is the parent module and
> > > > ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> > > > the child, since it the one which gets a reference to the other.  But
> > > > now we have the ci_hdrc.0 device being registered by the child module
> > > > and its driver belonging to the parent module, which is backward!
> > > >
> > > > Very difficult to understand.  Please explain more fully.
> > >
> > > I checked again and let me correct the words.
> > >
> > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
> > >    At the same time, the reference of module ci_hdrc is added by 1
> > >    automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
> > >    ci_hdrc will register usb-role-switch device.
> > >
> > > Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
> > > is a child of 4c200000.usb.
> > 
> > And ci_hdrc_imx is a child module of ci_hdrc.  Got it.
> > 
> > > > >  ci_hdrc will register
> > > > >    usb-role-switch device.
> > > > > 3. When module tcpci is loaded, it will register tcpm port device and try
> > > > >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> > > > >    which is registered by ci_hdrc.
> > > >
> > > > What do you mean by "will get"?  Do you mean that tcpm will become the
> > > > driver for the usb_role_switch device?  Or do you mean that it simply
> > > > calls get_device(&usb_role_switch)?
> > > >
> > > > If the latter is the case, how does the tcpm driver learn the address of
> > > > usb_role_switch in the first place?
> > >
> > > Via
> > > port->role_sw = usb_role_switch_get(port->dev)
> > > or
> > > port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).
> > >
> > > The usb controller will register usb-role-swtich device to the global list
> > > of usb_role class. The fwnode of usb-role-swtich device is also set to usb
> > > controller's fwnode. Initially, a fwnode graph between usb controller of
> > > node and tcpm connector node had already been established. These two
> > > functions will find usb-role-swtich device based on this fwnode graph
> > > and fwnode matching.
> > 
> > If usb_role_switch_get() gives away references to the usb_role_switch
> > device, it should have a way to take those references back.  But I guess
> > it doesn't.
> > 
> > >  After usb-role-switce device is found, these two
> > > functions will call: try_module_get(sw->dev.parent->driver->owner).
> > 
> > You mean usb_role_switch_get() and fwnode_usb_role_switch_get() do this?
> 
> Yes.
> 
> > 
> > > Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.
> > >
> > > >
> > > > >  In current design, tcpm will also try to
> > > > >    get ci_hdrc module's reference after get usb-role-switch.
> > > >
> > > > This might be a bug.  There should not be any need for the tcpm driver
> > > > to take a reference to the ci_hdrc module.  But there should be a way
> > > > for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> > > > about to be unregistered.  If tcpm is usb_role_switch's driver then this
> > > > notification happens automatically, by means of the .remove() callback.
> > >
> > > I'm not the designer of usb_role class driver. Not sure if this is needed to get
> > > module reference of its parent device's driver. Maybe need @heikki's input.
> > >
> > > @heikki.krogerus, can you give some explanations?
> > 
> > Yes, please, some additional explanation would help.
> > 
> > > > > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> > > > >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> > > > >    device usb-role-switch is also unregistered.
> > > >
> > > > At this point, tcpm should learn that it has to drop all its references
> > > > to usb_role_swich.  Since the module which registered usb_role_switch
> > > > isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> > > > after it is unregistered.
> > >
> > > Yes, I also think so.
> > >
> > > >
> > > > Well, strictly speaking that's not true.  By misusing the driver model,
> > > > tcpm could keep a reference to the ci_hdrc module until it was finished
> > > > using usb_role_switch.  Is that what you are trying to do?
> > >
> > > No, I'm trying to get module reference of ci_hdrc_imx too. Then,
> > > ci_hdrc_imx can't be unloaded before tcpci module unloaded.
> > 
> > You shouldn't do this.  Users should be able to unload ci_hdrc_imx
> > whenever they want, even if tcpci is still loaded.
> 
> Okay. Understand.
> 
> > 
> > > > > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> > > > >    will be shown due to below code:
> > > > >
> > > > >    module_put(sw->dev.parent->driver->owner);
> > 
> > I forgot to ask: What function makes this call?  Is it part of the
> > usb_role class driver?
> 
> usb_role_switch_put() do this.
> Yes, it's a function of usb_role class driver.
> 
> > 
> > > > >    parent->driver is NULL at this time.
> > > >
> > > > What is dev at this point?  And what is dev.parent?  And what did
> > > > dev.parent->driver used to be before it was set to NULL?
> > >
> > > Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
> > > sw->dev.parent->driver was ci_hdrc.
> > 
> > Which is now gone, right.  I understand.
> > 
> > Let's see what Heikki has to say.
> > 
> > However, assuming he wants to continue misusing the driver model in this
> > way, what you should do is add a new field to sw, where you will store
> > sw->dev.parent->driver.owner at the time of the try_module_get() call
> > (but only if the call succeeds!).  Then when the module_put() call runs,
> > have it use the value stored in this new field instead of dereferencing
> > sw->dev.parent->driver.owner.
> 
> It sounds like a better solution. 
> Thanks for the suggestion!

If there is a better way of doing this, then let's use it. I'm happy
with what ever works.

The only requirement here is that we have some way of preventing the
role switch provider from being unloaded while it's being used.

thanks,
Greg KH Jan. 18, 2024, 9:53 a.m. UTC | #15
On Thu, Jan 18, 2024 at 11:22:06AM +0200, Heikki Krogerus wrote:
> Hi,
> 
> On Wed, Jan 17, 2024 at 05:57:02AM +0000, Xu Yang wrote:
> > Hi Alan,
> > 
> > > 
> > > On Tue, Jan 16, 2024 at 05:44:47AM +0000, Xu Yang wrote:
> > > > Hi Alan,
> > > >
> > > > >
> > > > > Those of us unfamiliar with this code need you to explain a lot more
> > > > > about what's going on.
> > > > >
> > > > > On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > > > > > Taking below diagram as example:
> > > > > >
> > > > > >      ci_hdrc.0        register   usb    get     tcpm_port
> > > > > >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> > > > > >          ^  ^                    switch           |   ^
> > > > > >          |  |                                     |   |
> > > > > >        +1|  |           +1                        |   |+1
> > > > > >          |  +-------------------------------------+   |
> > > > > >          |                                            |
> > > > > >      4c200000.usb                                   1-0050
> > > > > > (driver: ci_hdrc_imx)                            (driver: tcpci)
> > > > > >
> > > > > > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > > > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> > > > > >    and try to get ci_hdrc module's reference.
> > > > >
> > > > > This is very confusing.  Normally, a device is registered by the parent
> > > > > module and its driver belongs in the child module.  When the child
> > > > > module is loaded it automatically gets a reference to the parent module,
> > > > > because it calls functions that are defined in the parent.  I don't know
> > > > > of any cases where a parent module takes a reference to one of its
> > > > > children -- this would make it impossible to unload the child module!
> > > > >
> > > > > In your diagram I can't tell whether ci_hdrc is the parent module and
> > > > > ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> > > > > the child, since it the one which gets a reference to the other.  But
> > > > > now we have the ci_hdrc.0 device being registered by the child module
> > > > > and its driver belonging to the parent module, which is backward!
> > > > >
> > > > > Very difficult to understand.  Please explain more fully.
> > > >
> > > > I checked again and let me correct the words.
> > > >
> > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
> > > >    At the same time, the reference of module ci_hdrc is added by 1
> > > >    automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
> > > >    ci_hdrc will register usb-role-switch device.
> > > >
> > > > Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
> > > > is a child of 4c200000.usb.
> > > 
> > > And ci_hdrc_imx is a child module of ci_hdrc.  Got it.
> > > 
> > > > > >  ci_hdrc will register
> > > > > >    usb-role-switch device.
> > > > > > 3. When module tcpci is loaded, it will register tcpm port device and try
> > > > > >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> > > > > >    which is registered by ci_hdrc.
> > > > >
> > > > > What do you mean by "will get"?  Do you mean that tcpm will become the
> > > > > driver for the usb_role_switch device?  Or do you mean that it simply
> > > > > calls get_device(&usb_role_switch)?
> > > > >
> > > > > If the latter is the case, how does the tcpm driver learn the address of
> > > > > usb_role_switch in the first place?
> > > >
> > > > Via
> > > > port->role_sw = usb_role_switch_get(port->dev)
> > > > or
> > > > port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).
> > > >
> > > > The usb controller will register usb-role-swtich device to the global list
> > > > of usb_role class. The fwnode of usb-role-swtich device is also set to usb
> > > > controller's fwnode. Initially, a fwnode graph between usb controller of
> > > > node and tcpm connector node had already been established. These two
> > > > functions will find usb-role-swtich device based on this fwnode graph
> > > > and fwnode matching.
> > > 
> > > If usb_role_switch_get() gives away references to the usb_role_switch
> > > device, it should have a way to take those references back.  But I guess
> > > it doesn't.
> > > 
> > > >  After usb-role-switce device is found, these two
> > > > functions will call: try_module_get(sw->dev.parent->driver->owner).
> > > 
> > > You mean usb_role_switch_get() and fwnode_usb_role_switch_get() do this?
> > 
> > Yes.
> > 
> > > 
> > > > Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.
> > > >
> > > > >
> > > > > >  In current design, tcpm will also try to
> > > > > >    get ci_hdrc module's reference after get usb-role-switch.
> > > > >
> > > > > This might be a bug.  There should not be any need for the tcpm driver
> > > > > to take a reference to the ci_hdrc module.  But there should be a way
> > > > > for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> > > > > about to be unregistered.  If tcpm is usb_role_switch's driver then this
> > > > > notification happens automatically, by means of the .remove() callback.
> > > >
> > > > I'm not the designer of usb_role class driver. Not sure if this is needed to get
> > > > module reference of its parent device's driver. Maybe need @heikki's input.
> > > >
> > > > @heikki.krogerus, can you give some explanations?
> > > 
> > > Yes, please, some additional explanation would help.
> > > 
> > > > > > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> > > > > >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> > > > > >    device usb-role-switch is also unregistered.
> > > > >
> > > > > At this point, tcpm should learn that it has to drop all its references
> > > > > to usb_role_swich.  Since the module which registered usb_role_switch
> > > > > isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> > > > > after it is unregistered.
> > > >
> > > > Yes, I also think so.
> > > >
> > > > >
> > > > > Well, strictly speaking that's not true.  By misusing the driver model,
> > > > > tcpm could keep a reference to the ci_hdrc module until it was finished
> > > > > using usb_role_switch.  Is that what you are trying to do?
> > > >
> > > > No, I'm trying to get module reference of ci_hdrc_imx too. Then,
> > > > ci_hdrc_imx can't be unloaded before tcpci module unloaded.
> > > 
> > > You shouldn't do this.  Users should be able to unload ci_hdrc_imx
> > > whenever they want, even if tcpci is still loaded.
> > 
> > Okay. Understand.
> > 
> > > 
> > > > > > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> > > > > >    will be shown due to below code:
> > > > > >
> > > > > >    module_put(sw->dev.parent->driver->owner);
> > > 
> > > I forgot to ask: What function makes this call?  Is it part of the
> > > usb_role class driver?
> > 
> > usb_role_switch_put() do this.
> > Yes, it's a function of usb_role class driver.
> > 
> > > 
> > > > > >    parent->driver is NULL at this time.
> > > > >
> > > > > What is dev at this point?  And what is dev.parent?  And what did
> > > > > dev.parent->driver used to be before it was set to NULL?
> > > >
> > > > Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
> > > > sw->dev.parent->driver was ci_hdrc.
> > > 
> > > Which is now gone, right.  I understand.
> > > 
> > > Let's see what Heikki has to say.
> > > 
> > > However, assuming he wants to continue misusing the driver model in this
> > > way, what you should do is add a new field to sw, where you will store
> > > sw->dev.parent->driver.owner at the time of the try_module_get() call
> > > (but only if the call succeeds!).  Then when the module_put() call runs,
> > > have it use the value stored in this new field instead of dereferencing
> > > sw->dev.parent->driver.owner.
> > 
> > It sounds like a better solution. 
> > Thanks for the suggestion!
> 
> If there is a better way of doing this, then let's use it. I'm happy
> with what ever works.
> 
> The only requirement here is that we have some way of preventing the
> role switch provider from being unloaded while it's being used.

Why?  What defines "being used"?

Any module should be able to be removed any time, unless there is a
"code" requirement of it being present.  The driver model structures
should make this possible if used properly.  Trying to much around in
various parent devices and the drivers bound to parents should NEVER be
done as happens here, sorry I missed that in the review process.

thanks,

greg k-h
Heikki Krogerus Jan. 18, 2024, 12:38 p.m. UTC | #16
Hi Greg,

On Thu, Jan 18, 2024 at 10:53:34AM +0100, Greg KH wrote:
> On Thu, Jan 18, 2024 at 11:22:06AM +0200, Heikki Krogerus wrote:
> > Hi,
> > 
> > On Wed, Jan 17, 2024 at 05:57:02AM +0000, Xu Yang wrote:
> > > Hi Alan,
> > > 
> > > > 
> > > > On Tue, Jan 16, 2024 at 05:44:47AM +0000, Xu Yang wrote:
> > > > > Hi Alan,
> > > > >
> > > > > >
> > > > > > Those of us unfamiliar with this code need you to explain a lot more
> > > > > > about what's going on.
> > > > > >
> > > > > > On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > > > > > > Taking below diagram as example:
> > > > > > >
> > > > > > >      ci_hdrc.0        register   usb    get     tcpm_port
> > > > > > >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> > > > > > >          ^  ^                    switch           |   ^
> > > > > > >          |  |                                     |   |
> > > > > > >        +1|  |           +1                        |   |+1
> > > > > > >          |  +-------------------------------------+   |
> > > > > > >          |                                            |
> > > > > > >      4c200000.usb                                   1-0050
> > > > > > > (driver: ci_hdrc_imx)                            (driver: tcpci)
> > > > > > >
> > > > > > > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > > > > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> > > > > > >    and try to get ci_hdrc module's reference.
> > > > > >
> > > > > > This is very confusing.  Normally, a device is registered by the parent
> > > > > > module and its driver belongs in the child module.  When the child
> > > > > > module is loaded it automatically gets a reference to the parent module,
> > > > > > because it calls functions that are defined in the parent.  I don't know
> > > > > > of any cases where a parent module takes a reference to one of its
> > > > > > children -- this would make it impossible to unload the child module!
> > > > > >
> > > > > > In your diagram I can't tell whether ci_hdrc is the parent module and
> > > > > > ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> > > > > > the child, since it the one which gets a reference to the other.  But
> > > > > > now we have the ci_hdrc.0 device being registered by the child module
> > > > > > and its driver belonging to the parent module, which is backward!
> > > > > >
> > > > > > Very difficult to understand.  Please explain more fully.
> > > > >
> > > > > I checked again and let me correct the words.
> > > > >
> > > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
> > > > >    At the same time, the reference of module ci_hdrc is added by 1
> > > > >    automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
> > > > >    ci_hdrc will register usb-role-switch device.
> > > > >
> > > > > Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
> > > > > is a child of 4c200000.usb.
> > > > 
> > > > And ci_hdrc_imx is a child module of ci_hdrc.  Got it.
> > > > 
> > > > > > >  ci_hdrc will register
> > > > > > >    usb-role-switch device.
> > > > > > > 3. When module tcpci is loaded, it will register tcpm port device and try
> > > > > > >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> > > > > > >    which is registered by ci_hdrc.
> > > > > >
> > > > > > What do you mean by "will get"?  Do you mean that tcpm will become the
> > > > > > driver for the usb_role_switch device?  Or do you mean that it simply
> > > > > > calls get_device(&usb_role_switch)?
> > > > > >
> > > > > > If the latter is the case, how does the tcpm driver learn the address of
> > > > > > usb_role_switch in the first place?
> > > > >
> > > > > Via
> > > > > port->role_sw = usb_role_switch_get(port->dev)
> > > > > or
> > > > > port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).
> > > > >
> > > > > The usb controller will register usb-role-swtich device to the global list
> > > > > of usb_role class. The fwnode of usb-role-swtich device is also set to usb
> > > > > controller's fwnode. Initially, a fwnode graph between usb controller of
> > > > > node and tcpm connector node had already been established. These two
> > > > > functions will find usb-role-swtich device based on this fwnode graph
> > > > > and fwnode matching.
> > > > 
> > > > If usb_role_switch_get() gives away references to the usb_role_switch
> > > > device, it should have a way to take those references back.  But I guess
> > > > it doesn't.
> > > > 
> > > > >  After usb-role-switce device is found, these two
> > > > > functions will call: try_module_get(sw->dev.parent->driver->owner).
> > > > 
> > > > You mean usb_role_switch_get() and fwnode_usb_role_switch_get() do this?
> > > 
> > > Yes.
> > > 
> > > > 
> > > > > Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.
> > > > >
> > > > > >
> > > > > > >  In current design, tcpm will also try to
> > > > > > >    get ci_hdrc module's reference after get usb-role-switch.
> > > > > >
> > > > > > This might be a bug.  There should not be any need for the tcpm driver
> > > > > > to take a reference to the ci_hdrc module.  But there should be a way
> > > > > > for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> > > > > > about to be unregistered.  If tcpm is usb_role_switch's driver then this
> > > > > > notification happens automatically, by means of the .remove() callback.
> > > > >
> > > > > I'm not the designer of usb_role class driver. Not sure if this is needed to get
> > > > > module reference of its parent device's driver. Maybe need @heikki's input.
> > > > >
> > > > > @heikki.krogerus, can you give some explanations?
> > > > 
> > > > Yes, please, some additional explanation would help.
> > > > 
> > > > > > > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> > > > > > >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> > > > > > >    device usb-role-switch is also unregistered.
> > > > > >
> > > > > > At this point, tcpm should learn that it has to drop all its references
> > > > > > to usb_role_swich.  Since the module which registered usb_role_switch
> > > > > > isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> > > > > > after it is unregistered.
> > > > >
> > > > > Yes, I also think so.
> > > > >
> > > > > >
> > > > > > Well, strictly speaking that's not true.  By misusing the driver model,
> > > > > > tcpm could keep a reference to the ci_hdrc module until it was finished
> > > > > > using usb_role_switch.  Is that what you are trying to do?
> > > > >
> > > > > No, I'm trying to get module reference of ci_hdrc_imx too. Then,
> > > > > ci_hdrc_imx can't be unloaded before tcpci module unloaded.
> > > > 
> > > > You shouldn't do this.  Users should be able to unload ci_hdrc_imx
> > > > whenever they want, even if tcpci is still loaded.
> > > 
> > > Okay. Understand.
> > > 
> > > > 
> > > > > > > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> > > > > > >    will be shown due to below code:
> > > > > > >
> > > > > > >    module_put(sw->dev.parent->driver->owner);
> > > > 
> > > > I forgot to ask: What function makes this call?  Is it part of the
> > > > usb_role class driver?
> > > 
> > > usb_role_switch_put() do this.
> > > Yes, it's a function of usb_role class driver.
> > > 
> > > > 
> > > > > > >    parent->driver is NULL at this time.
> > > > > >
> > > > > > What is dev at this point?  And what is dev.parent?  And what did
> > > > > > dev.parent->driver used to be before it was set to NULL?
> > > > >
> > > > > Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
> > > > > sw->dev.parent->driver was ci_hdrc.
> > > > 
> > > > Which is now gone, right.  I understand.
> > > > 
> > > > Let's see what Heikki has to say.
> > > > 
> > > > However, assuming he wants to continue misusing the driver model in this
> > > > way, what you should do is add a new field to sw, where you will store
> > > > sw->dev.parent->driver.owner at the time of the try_module_get() call
> > > > (but only if the call succeeds!).  Then when the module_put() call runs,
> > > > have it use the value stored in this new field instead of dereferencing
> > > > sw->dev.parent->driver.owner.
> > > 
> > > It sounds like a better solution. 
> > > Thanks for the suggestion!
> > 
> > If there is a better way of doing this, then let's use it. I'm happy
> > with what ever works.
> > 
> > The only requirement here is that we have some way of preventing the
> > role switch provider from being unloaded while it's being used.
> 
> Why?  What defines "being used"?

It is "being used" when something (user) acquires reference to the
role switch that it provides. The "user" is in most cases USB Type-C
port driver - what ever knows the role the port needs to be in.

USB role switch is meant to act as a "resource" like any other. So
when you acquire for example a GPIO, the gpiodev driver is pinned down
by calling module_get() just like this driver is doing to the switch
provider.

> Any module should be able to be removed any time, unless there is a
> "code" requirement of it being present.  The driver model structures
> should make this possible if used properly.  Trying to much around in
> various parent devices and the drivers bound to parents should NEVER be
> done as happens here, sorry I missed that in the review process.

If this is not something that this kind of device class should be
doing, then let's remove the whole thing. But if that is the case,
then shouldn't we remove that from all the other bus and device class
drivers as well?

thanks,
Greg KH Jan. 18, 2024, 2:54 p.m. UTC | #17
On Thu, Jan 18, 2024 at 02:38:58PM +0200, Heikki Krogerus wrote:
> Hi Greg,
> 
> On Thu, Jan 18, 2024 at 10:53:34AM +0100, Greg KH wrote:
> > On Thu, Jan 18, 2024 at 11:22:06AM +0200, Heikki Krogerus wrote:
> > > Hi,
> > > 
> > > On Wed, Jan 17, 2024 at 05:57:02AM +0000, Xu Yang wrote:
> > > > Hi Alan,
> > > > 
> > > > > 
> > > > > On Tue, Jan 16, 2024 at 05:44:47AM +0000, Xu Yang wrote:
> > > > > > Hi Alan,
> > > > > >
> > > > > > >
> > > > > > > Those of us unfamiliar with this code need you to explain a lot more
> > > > > > > about what's going on.
> > > > > > >
> > > > > > > On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > > > > > > > Taking below diagram as example:
> > > > > > > >
> > > > > > > >      ci_hdrc.0        register   usb    get     tcpm_port
> > > > > > > >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> > > > > > > >          ^  ^                    switch           |   ^
> > > > > > > >          |  |                                     |   |
> > > > > > > >        +1|  |           +1                        |   |+1
> > > > > > > >          |  +-------------------------------------+   |
> > > > > > > >          |                                            |
> > > > > > > >      4c200000.usb                                   1-0050
> > > > > > > > (driver: ci_hdrc_imx)                            (driver: tcpci)
> > > > > > > >
> > > > > > > > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > > > > > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> > > > > > > >    and try to get ci_hdrc module's reference.
> > > > > > >
> > > > > > > This is very confusing.  Normally, a device is registered by the parent
> > > > > > > module and its driver belongs in the child module.  When the child
> > > > > > > module is loaded it automatically gets a reference to the parent module,
> > > > > > > because it calls functions that are defined in the parent.  I don't know
> > > > > > > of any cases where a parent module takes a reference to one of its
> > > > > > > children -- this would make it impossible to unload the child module!
> > > > > > >
> > > > > > > In your diagram I can't tell whether ci_hdrc is the parent module and
> > > > > > > ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> > > > > > > the child, since it the one which gets a reference to the other.  But
> > > > > > > now we have the ci_hdrc.0 device being registered by the child module
> > > > > > > and its driver belonging to the parent module, which is backward!
> > > > > > >
> > > > > > > Very difficult to understand.  Please explain more fully.
> > > > > >
> > > > > > I checked again and let me correct the words.
> > > > > >
> > > > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
> > > > > >    At the same time, the reference of module ci_hdrc is added by 1
> > > > > >    automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
> > > > > >    ci_hdrc will register usb-role-switch device.
> > > > > >
> > > > > > Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
> > > > > > is a child of 4c200000.usb.
> > > > > 
> > > > > And ci_hdrc_imx is a child module of ci_hdrc.  Got it.
> > > > > 
> > > > > > > >  ci_hdrc will register
> > > > > > > >    usb-role-switch device.
> > > > > > > > 3. When module tcpci is loaded, it will register tcpm port device and try
> > > > > > > >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> > > > > > > >    which is registered by ci_hdrc.
> > > > > > >
> > > > > > > What do you mean by "will get"?  Do you mean that tcpm will become the
> > > > > > > driver for the usb_role_switch device?  Or do you mean that it simply
> > > > > > > calls get_device(&usb_role_switch)?
> > > > > > >
> > > > > > > If the latter is the case, how does the tcpm driver learn the address of
> > > > > > > usb_role_switch in the first place?
> > > > > >
> > > > > > Via
> > > > > > port->role_sw = usb_role_switch_get(port->dev)
> > > > > > or
> > > > > > port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).
> > > > > >
> > > > > > The usb controller will register usb-role-swtich device to the global list
> > > > > > of usb_role class. The fwnode of usb-role-swtich device is also set to usb
> > > > > > controller's fwnode. Initially, a fwnode graph between usb controller of
> > > > > > node and tcpm connector node had already been established. These two
> > > > > > functions will find usb-role-swtich device based on this fwnode graph
> > > > > > and fwnode matching.
> > > > > 
> > > > > If usb_role_switch_get() gives away references to the usb_role_switch
> > > > > device, it should have a way to take those references back.  But I guess
> > > > > it doesn't.
> > > > > 
> > > > > >  After usb-role-switce device is found, these two
> > > > > > functions will call: try_module_get(sw->dev.parent->driver->owner).
> > > > > 
> > > > > You mean usb_role_switch_get() and fwnode_usb_role_switch_get() do this?
> > > > 
> > > > Yes.
> > > > 
> > > > > 
> > > > > > Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.
> > > > > >
> > > > > > >
> > > > > > > >  In current design, tcpm will also try to
> > > > > > > >    get ci_hdrc module's reference after get usb-role-switch.
> > > > > > >
> > > > > > > This might be a bug.  There should not be any need for the tcpm driver
> > > > > > > to take a reference to the ci_hdrc module.  But there should be a way
> > > > > > > for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> > > > > > > about to be unregistered.  If tcpm is usb_role_switch's driver then this
> > > > > > > notification happens automatically, by means of the .remove() callback.
> > > > > >
> > > > > > I'm not the designer of usb_role class driver. Not sure if this is needed to get
> > > > > > module reference of its parent device's driver. Maybe need @heikki's input.
> > > > > >
> > > > > > @heikki.krogerus, can you give some explanations?
> > > > > 
> > > > > Yes, please, some additional explanation would help.
> > > > > 
> > > > > > > > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> > > > > > > >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> > > > > > > >    device usb-role-switch is also unregistered.
> > > > > > >
> > > > > > > At this point, tcpm should learn that it has to drop all its references
> > > > > > > to usb_role_swich.  Since the module which registered usb_role_switch
> > > > > > > isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> > > > > > > after it is unregistered.
> > > > > >
> > > > > > Yes, I also think so.
> > > > > >
> > > > > > >
> > > > > > > Well, strictly speaking that's not true.  By misusing the driver model,
> > > > > > > tcpm could keep a reference to the ci_hdrc module until it was finished
> > > > > > > using usb_role_switch.  Is that what you are trying to do?
> > > > > >
> > > > > > No, I'm trying to get module reference of ci_hdrc_imx too. Then,
> > > > > > ci_hdrc_imx can't be unloaded before tcpci module unloaded.
> > > > > 
> > > > > You shouldn't do this.  Users should be able to unload ci_hdrc_imx
> > > > > whenever they want, even if tcpci is still loaded.
> > > > 
> > > > Okay. Understand.
> > > > 
> > > > > 
> > > > > > > > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> > > > > > > >    will be shown due to below code:
> > > > > > > >
> > > > > > > >    module_put(sw->dev.parent->driver->owner);
> > > > > 
> > > > > I forgot to ask: What function makes this call?  Is it part of the
> > > > > usb_role class driver?
> > > > 
> > > > usb_role_switch_put() do this.
> > > > Yes, it's a function of usb_role class driver.
> > > > 
> > > > > 
> > > > > > > >    parent->driver is NULL at this time.
> > > > > > >
> > > > > > > What is dev at this point?  And what is dev.parent?  And what did
> > > > > > > dev.parent->driver used to be before it was set to NULL?
> > > > > >
> > > > > > Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
> > > > > > sw->dev.parent->driver was ci_hdrc.
> > > > > 
> > > > > Which is now gone, right.  I understand.
> > > > > 
> > > > > Let's see what Heikki has to say.
> > > > > 
> > > > > However, assuming he wants to continue misusing the driver model in this
> > > > > way, what you should do is add a new field to sw, where you will store
> > > > > sw->dev.parent->driver.owner at the time of the try_module_get() call
> > > > > (but only if the call succeeds!).  Then when the module_put() call runs,
> > > > > have it use the value stored in this new field instead of dereferencing
> > > > > sw->dev.parent->driver.owner.
> > > > 
> > > > It sounds like a better solution. 
> > > > Thanks for the suggestion!
> > > 
> > > If there is a better way of doing this, then let's use it. I'm happy
> > > with what ever works.
> > > 
> > > The only requirement here is that we have some way of preventing the
> > > role switch provider from being unloaded while it's being used.
> > 
> > Why?  What defines "being used"?
> 
> It is "being used" when something (user) acquires reference to the
> role switch that it provides. The "user" is in most cases USB Type-C
> port driver - what ever knows the role the port needs to be in.
> 
> USB role switch is meant to act as a "resource" like any other. So
> when you acquire for example a GPIO, the gpiodev driver is pinned down
> by calling module_get() just like this driver is doing to the switch
> provider.

So again, if the hardware is present in the system, the module reference
count will always be increased and can never be removed no matter what a
user does?  That feels wrong if so.

> > Any module should be able to be removed any time, unless there is a
> > "code" requirement of it being present.  The driver model structures
> > should make this possible if used properly.  Trying to much around in
> > various parent devices and the drivers bound to parents should NEVER be
> > done as happens here, sorry I missed that in the review process.
> 
> If this is not something that this kind of device class should be
> doing, then let's remove the whole thing. But if that is the case,
> then shouldn't we remove that from all the other bus and device class
> drivers as well?

I started to remove it many years ago, but then something prevented that
as there was actually some valid uses, but I can't remember at the
moment.  Try taking it out and see what happens!  :)

Sorry, I don't have the time in the near future to work on this...

greg k-h
Alan Stern Jan. 18, 2024, 3:28 p.m. UTC | #18
On Thu, Jan 18, 2024 at 03:54:16PM +0100, Greg KH wrote:
> On Thu, Jan 18, 2024 at 02:38:58PM +0200, Heikki Krogerus wrote:
> > It is "being used" when something (user) acquires reference to the
> > role switch that it provides. The "user" is in most cases USB Type-C
> > port driver - what ever knows the role the port needs to be in.
> > 
> > USB role switch is meant to act as a "resource" like any other. So
> > when you acquire for example a GPIO, the gpiodev driver is pinned down
> > by calling module_get() just like this driver is doing to the switch
> > provider.
> 
> So again, if the hardware is present in the system, the module reference
> count will always be increased and can never be removed no matter what a
> user does?  That feels wrong if so.

It's not quite that bad.  Even if the USB role-switch hardware is 
present in the system, the module controlling it can be removed if the 
role-switch resource isn't being used (which will cause the module count 
will go back down to 0).  This may require the user first to remove the 
module that's using the resource.

> > > Any module should be able to be removed any time, unless there is a
> > > "code" requirement of it being present.  The driver model structures
> > > should make this possible if used properly.  Trying to much around in
> > > various parent devices and the drivers bound to parents should NEVER be
> > > done as happens here, sorry I missed that in the review process.
> > 
> > If this is not something that this kind of device class should be
> > doing, then let's remove the whole thing. But if that is the case,
> > then shouldn't we remove that from all the other bus and device class
> > drivers as well?
> 
> I started to remove it many years ago, but then something prevented that
> as there was actually some valid uses, but I can't remember at the
> moment.  Try taking it out and see what happens!  :)

The problem here is that a device is being used by module B that isn't a 
descendant of the device's driver module A.  If B didn't take a 
reference to A then A could be unloaded while B was still using the 
device.  When B finally drops its reference to the device, the 
driver-model release call would get a NULL-pointer violation, since the 
code for the release method was part of A and has been unloaded from 
memory.

In the current code, B takes a module reference to A and drops that 
reference after dropping its reference to the device.  The bug that Xu 
Yang wants to fix (the original reason for this discussion) has to do 
with the way B drops its module reference to A.  It gets the module 
handle of A by looking up the owner of the device's driver -- and that 
approach doesn't work if the device has been unbound from its driver, 
for the obvious reason.  The proper solution to this problem is for B to 
cache A's handle at the time when it first acquires the module 
reference.

Alan Stern
Xu Yang Jan. 18, 2024, 3:29 p.m. UTC | #19
Hi Heikki,

> 
> Hi,
> 
> On Wed, Jan 17, 2024 at 05:57:02AM +0000, Xu Yang wrote:
> > Hi Alan,
> >
> > >
> > > On Tue, Jan 16, 2024 at 05:44:47AM +0000, Xu Yang wrote:
> > > > Hi Alan,
> > > >
> > > > >
> > > > > Those of us unfamiliar with this code need you to explain a lot more
> > > > > about what's going on.
> > > > >
> > > > > On Mon, Jan 15, 2024 at 03:02:06AM +0000, Xu Yang wrote:
> > > > > > Taking below diagram as example:
> > > > > >
> > > > > >      ci_hdrc.0        register   usb    get     tcpm_port
> > > > > >   (driver: ci_hdrc)  --------->  role  <----  (driver: tcpm)
> > > > > >          ^  ^                    switch           |   ^
> > > > > >          |  |                                     |   |
> > > > > >        +1|  |           +1                        |   |+1
> > > > > >          |  +-------------------------------------+   |
> > > > > >          |                                            |
> > > > > >      4c200000.usb                                   1-0050
> > > > > > (driver: ci_hdrc_imx)                            (driver: tcpci)
> > > > > >
> > > > > > 1. Driver ci_hdrc_imx and tcpci are built as module at least.
> > > > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device
> > > > > >    and try to get ci_hdrc module's reference.
> > > > >
> > > > > This is very confusing.  Normally, a device is registered by the parent
> > > > > module and its driver belongs in the child module.  When the child
> > > > > module is loaded it automatically gets a reference to the parent module,
> > > > > because it calls functions that are defined in the parent.  I don't know
> > > > > of any cases where a parent module takes a reference to one of its
> > > > > children -- this would make it impossible to unload the child module!
> > > > >
> > > > > In your diagram I can't tell whether ci_hdrc is the parent module and
> > > > > ci_hdrc_imx is the child, or vice versa.  I'll guess that ci_hdrc_imx is
> > > > > the child, since it the one which gets a reference to the other.  But
> > > > > now we have the ci_hdrc.0 device being registered by the child module
> > > > > and its driver belonging to the parent module, which is backward!
> > > > >
> > > > > Very difficult to understand.  Please explain more fully.
> > > >
> > > > I checked again and let me correct the words.
> > > >
> > > > 2. When module ci_hdrc_imx is loaded, it will register ci_hdrc.0 device.
> > > >    At the same time, the reference of module ci_hdrc is added by 1
> > > >    automatically due to ci_hdrc_imx calls some functions in module ci_hdrc.
> > > >    ci_hdrc will register usb-role-switch device.
> > > >
> > > > Therefore, module ci_hdrc_imx depends on module ci_hdrc. Device ci_hdrc.0
> > > > is a child of 4c200000.usb.
> > >
> > > And ci_hdrc_imx is a child module of ci_hdrc.  Got it.
> > >
> > > > > >  ci_hdrc will register
> > > > > >    usb-role-switch device.
> > > > > > 3. When module tcpci is loaded, it will register tcpm port device and try
> > > > > >    to get tcpm module's reference. The tcpm module will get usb-role-switch
> > > > > >    which is registered by ci_hdrc.
> > > > >
> > > > > What do you mean by "will get"?  Do you mean that tcpm will become the
> > > > > driver for the usb_role_switch device?  Or do you mean that it simply
> > > > > calls get_device(&usb_role_switch)?
> > > > >
> > > > > If the latter is the case, how does the tcpm driver learn the address of
> > > > > usb_role_switch in the first place?
> > > >
> > > > Via
> > > > port->role_sw = usb_role_switch_get(port->dev)
> > > > or
> > > > port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode).
> > > >
> > > > The usb controller will register usb-role-swtich device to the global list
> > > > of usb_role class. The fwnode of usb-role-swtich device is also set to usb
> > > > controller's fwnode. Initially, a fwnode graph between usb controller of
> > > > node and tcpm connector node had already been established. These two
> > > > functions will find usb-role-swtich device based on this fwnode graph
> > > > and fwnode matching.
> > >
> > > If usb_role_switch_get() gives away references to the usb_role_switch
> > > device, it should have a way to take those references back.  But I guess
> > > it doesn't.
> > >
> > > >  After usb-role-switce device is found, these two
> > > > functions will call: try_module_get(sw->dev.parent->driver->owner).
> > >
> > > You mean usb_role_switch_get() and fwnode_usb_role_switch_get() do this?
> >
> > Yes.
> >
> > >
> > > > Here sw->dev.parent is device ci_hdrc.0. sw->dev.parent->driver is ci_hdrc.
> > > >
> > > > >
> > > > > >  In current design, tcpm will also try to
> > > > > >    get ci_hdrc module's reference after get usb-role-switch.
> > > > >
> > > > > This might be a bug.  There should not be any need for the tcpm driver
> > > > > to take a reference to the ci_hdrc module.  But there should be a way
> > > > > for the ci_hdrc driver to notify tcpm when the usb_role_switch device is
> > > > > about to be unregistered.  If tcpm is usb_role_switch's driver then this
> > > > > notification happens automatically, by means of the .remove() callback.
> > > >
> > > > I'm not the designer of usb_role class driver. Not sure if this is needed to get
> > > > module reference of its parent device's driver. Maybe need @heikki's input.
> > > >
> > > > @heikki.krogerus, can you give some explanations?
> > >
> > > Yes, please, some additional explanation would help.
> > >
> > > > > > 4. Due to no modules depend on ci_hdrc_imx, ci_hdrc_imx can be manually
> > > > > >    unloaded. Then device ci_hdrc.0 will be removed by ci_hdrc_imx and
> > > > > >    device usb-role-switch is also unregistered.
> > > > >
> > > > > At this point, tcpm should learn that it has to drop all its references
> > > > > to usb_role_swich.  Since the module which registered usb_role_switch
> > > > > isn't tcpm's ancestor, tcpm must not keep _any_ references to the device
> > > > > after it is unregistered.
> > > >
> > > > Yes, I also think so.
> > > >
> > > > >
> > > > > Well, strictly speaking that's not true.  By misusing the driver model,
> > > > > tcpm could keep a reference to the ci_hdrc module until it was finished
> > > > > using usb_role_switch.  Is that what you are trying to do?
> > > >
> > > > No, I'm trying to get module reference of ci_hdrc_imx too. Then,
> > > > ci_hdrc_imx can't be unloaded before tcpci module unloaded.
> > >
> > > You shouldn't do this.  Users should be able to unload ci_hdrc_imx
> > > whenever they want, even if tcpci is still loaded.
> >
> > Okay. Understand.
> >
> > >
> > > > > > 5. Then, if I try to unload module tcpci, "NULL pointer dereference"
> > > > > >    will be shown due to below code:
> > > > > >
> > > > > >    module_put(sw->dev.parent->driver->owner);
> > >
> > > I forgot to ask: What function makes this call?  Is it part of the
> > > usb_role class driver?
> >
> > usb_role_switch_put() do this.
> > Yes, it's a function of usb_role class driver.
> >
> > >
> > > > > >    parent->driver is NULL at this time.
> > > > >
> > > > > What is dev at this point?  And what is dev.parent?  And what did
> > > > > dev.parent->driver used to be before it was set to NULL?
> > > >
> > > > Here sw->dev is usb-role-switch device. sw->dev.parent is ci_hdrc.0 device.
> > > > sw->dev.parent->driver was ci_hdrc.
> > >
> > > Which is now gone, right.  I understand.
> > >
> > > Let's see what Heikki has to say.
> > >
> > > However, assuming he wants to continue misusing the driver model in this
> > > way, what you should do is add a new field to sw, where you will store
> > > sw->dev.parent->driver.owner at the time of the try_module_get() call
> > > (but only if the call succeeds!).  Then when the module_put() call runs,
> > > have it use the value stored in this new field instead of dereferencing
> > > sw->dev.parent->driver.owner.
> >
> > It sounds like a better solution.
> > Thanks for the suggestion!
> 
> If there is a better way of doing this, then let's use it. I'm happy
> with what ever works.
> 
> The only requirement here is that we have some way of preventing the
> role switch provider from being unloaded while it's being used.
> 

Preventing the role switch provider from being unloaded seems infeasible
because the usb controller device (in most cases is the provider) can be
unbinded too. If that happens, the usb role switch device will be unregisterd
too and the provider driver will still exist.

Thanks,
Xu Yang
Xu Yang Jan. 18, 2024, 3:52 p.m. UTC | #20
Hi Alan,

> 
> On Thu, Jan 18, 2024 at 03:54:16PM +0100, Greg KH wrote:
> > On Thu, Jan 18, 2024 at 02:38:58PM +0200, Heikki Krogerus wrote:
> > > It is "being used" when something (user) acquires reference to the
> > > role switch that it provides. The "user" is in most cases USB Type-C
> > > port driver - what ever knows the role the port needs to be in.
> > >
> > > USB role switch is meant to act as a "resource" like any other. So
> > > when you acquire for example a GPIO, the gpiodev driver is pinned down
> > > by calling module_get() just like this driver is doing to the switch
> > > provider.
> >
> > So again, if the hardware is present in the system, the module reference
> > count will always be increased and can never be removed no matter what a
> > user does?  That feels wrong if so.
> 
> It's not quite that bad.  Even if the USB role-switch hardware is
> present in the system, the module controlling it can be removed if the
> role-switch resource isn't being used (which will cause the module count
> will go back down to 0).  This may require the user first to remove the
> module that's using the resource.
> 
> > > > Any module should be able to be removed any time, unless there is a
> > > > "code" requirement of it being present.  The driver model structures
> > > > should make this possible if used properly.  Trying to much around in
> > > > various parent devices and the drivers bound to parents should NEVER be
> > > > done as happens here, sorry I missed that in the review process.
> > >
> > > If this is not something that this kind of device class should be
> > > doing, then let's remove the whole thing. But if that is the case,
> > > then shouldn't we remove that from all the other bus and device class
> > > drivers as well?
> >
> > I started to remove it many years ago, but then something prevented that
> > as there was actually some valid uses, but I can't remember at the
> > moment.  Try taking it out and see what happens!  :)
> 
> The problem here is that a device is being used by module B that isn't a
> descendant of the device's driver module A.  If B didn't take a
> reference to A then A could be unloaded while B was still using the
> device.  When B finally drops its reference to the device, the
> driver-model release call would get a NULL-pointer violation, since the
> code for the release method was part of A and has been unloaded from
> memory.
> 
> In the current code, B takes a module reference to A and drops that
> reference after dropping its reference to the device.  The bug that Xu
> Yang wants to fix (the original reason for this discussion) has to do
> with the way B drops its module reference to A.  It gets the module
> handle of A by looking up the owner of the device's driver -- and that
> approach doesn't work if the device has been unbound from its driver,
> for the obvious reason.  The proper solution to this problem is for B to
> cache A's handle at the time when it first acquires the module
> reference.
> 

I've tried your suggestion and it appears to be working fine. Now I'm not sure if
the module get/put parts should be removed or to fix the NULL pointer issue. I'm
working on this issue, so I have time to fix it. I think if first way is taken, the status
of usb_role_switch device should be updated when it's registered/unregisterd. Or
other issues will occur since the user doesn't know the change of usb_role_switch
device.

Thanks,
Xu Yang
Alan Stern Jan. 18, 2024, 7:21 p.m. UTC | #21
On Thu, Jan 18, 2024 at 03:52:52PM +0000, Xu Yang wrote:
> I've tried your suggestion and it appears to be working fine. Now I'm not sure if
> the module get/put parts should be removed or to fix the NULL pointer issue. I'm
> working on this issue, so I have time to fix it. I think if first way is taken, the status
> of usb_role_switch device should be updated when it's registered/unregisterd. Or
> other issues will occur since the user doesn't know the change of usb_role_switch
> device.

These really are questions for Heikki, not me.  Can you at least show us 
the patch you've been testing?

I agree that the status of the current USB role should be exposed to 
userspace somehow.

Alan Stern
Xu Yang Jan. 19, 2024, 8:18 a.m. UTC | #22
Hi,

> 
> On Thu, Jan 18, 2024 at 03:52:52PM +0000, Xu Yang wrote:
> > I've tried your suggestion and it appears to be working fine. Now I'm not sure if
> > the module get/put parts should be removed or to fix the NULL pointer issue. I'm
> > working on this issue, so I have time to fix it. I think if first way is taken, the status
> > of usb_role_switch device should be updated when it's registered/unregisterd. Or
> > other issues will occur since the user doesn't know the change of usb_role_switch
> > device.
> 
> These really are questions for Heikki, not me.  Can you at least show us
> the patch you've been testing?

I have a simple test based on below changes:

diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
index ae41578bd014..d55a5d8d4fc4 100644
--- a/drivers/usb/roles/class.c
+++ b/drivers/usb/roles/class.c
@@ -20,6 +20,7 @@ static const struct class role_class = {
 
 struct usb_role_switch {
 	struct device dev;
+	struct module *module;
 	struct mutex lock; /* device lock*/
 	enum usb_role role;
 
@@ -135,7 +136,7 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
 						  usb_role_switch_match);
 
 	if (!IS_ERR_OR_NULL(sw))
-		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+		WARN_ON(!try_module_get(sw->module));
 
 	return sw;
 }
@@ -157,7 +158,7 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
 		sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
 						  NULL, usb_role_switch_match);
 	if (!IS_ERR_OR_NULL(sw))
-		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+		WARN_ON(!try_module_get(sw->module));
 
 	return sw;
 }
@@ -172,7 +173,7 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
 void usb_role_switch_put(struct usb_role_switch *sw)
 {
 	if (!IS_ERR_OR_NULL(sw)) {
-		module_put(sw->dev.parent->driver->owner);
+		module_put(sw->module);
 		put_device(&sw->dev);
 	}
 }
@@ -189,15 +190,18 @@ struct usb_role_switch *
 usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
 {
 	struct device *dev;
+	struct usb_role_switch *sw = NULL;
 
 	if (!fwnode)
 		return NULL;
 
 	dev = class_find_device_by_fwnode(&role_class, fwnode);
-	if (dev)
-		WARN_ON(!try_module_get(dev->parent->driver->owner));
+	if(dev) {
+		sw = to_role_switch(dev);
+		WARN_ON(!try_module_get(sw->module));
+	}
 
-	return dev ? to_role_switch(dev) : NULL;
+	return sw;
 }
 EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
 
@@ -339,6 +343,7 @@ usb_role_switch_register(struct device *parent,
 	sw->get = desc->get;
 
 	sw->dev.parent = parent;
+	sw->module = parent->driver->owner;
 	sw->dev.fwnode = desc->fwnode;
 	sw->dev.class = &role_class;
 	sw->dev.type = &usb_role_dev_type;

Thanks,
Xu Yang
Xu Yang Jan. 19, 2024, 11:13 a.m. UTC | #23
Hi Greg,

> 
> > > Why?  What defines "being used"?
> >
> > It is "being used" when something (user) acquires reference to the
> > role switch that it provides. The "user" is in most cases USB Type-C
> > port driver - what ever knows the role the port needs to be in.
> >
> > USB role switch is meant to act as a "resource" like any other. So
> > when you acquire for example a GPIO, the gpiodev driver is pinned down
> > by calling module_get() just like this driver is doing to the switch
> > provider.
> 
> So again, if the hardware is present in the system, the module reference
> count will always be increased and can never be removed no matter what a
> user does?  That feels wrong if so.
> 
> > > Any module should be able to be removed any time, unless there is a
> > > "code" requirement of it being present.  The driver model structures
> > > should make this possible if used properly.  Trying to much around in
> > > various parent devices and the drivers bound to parents should NEVER be
> > > done as happens here, sorry I missed that in the review process.
> >
> > If this is not something that this kind of device class should be
> > doing, then let's remove the whole thing. But if that is the case,
> > then shouldn't we remove that from all the other bus and device class
> > drivers as well?
> 
> I started to remove it many years ago, but then something prevented that
> as there was actually some valid uses, but I can't remember at the
> moment.  Try taking it out and see what happens!  :)

I've a simple test based on below changes too. This will remove
module_get/put() parts and add a flag to indicate the current status
of usb-role-switch device. It works too.

diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
index ae41578bd014..313f378d1a74 100644
--- a/drivers/usb/roles/class.c
+++ b/drivers/usb/roles/class.c
@@ -22,6 +22,7 @@ struct usb_role_switch {
 	struct device dev;
 	struct mutex lock; /* device lock*/
 	enum usb_role role;
+	bool registered;
 
 	/* From descriptor */
 	struct device *usb2_port;
@@ -48,6 +49,9 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role)
 	if (IS_ERR_OR_NULL(sw))
 		return 0;
 
+	if (!sw->registered)
+		return 0;
+
 	mutex_lock(&sw->lock);
 
 	ret = sw->set(sw, role);
@@ -76,6 +80,9 @@ enum usb_role usb_role_switch_get_role(struct usb_role_switch *sw)
 	if (IS_ERR_OR_NULL(sw))
 		return USB_ROLE_NONE;
 
+	if (!sw->registered)
+		return USB_ROLE_NONE;
+
 	mutex_lock(&sw->lock);
 
 	if (sw->get)
@@ -134,9 +141,6 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
 		sw = device_connection_find_match(dev, "usb-role-switch", NULL,
 						  usb_role_switch_match);
 
-	if (!IS_ERR_OR_NULL(sw))
-		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
-
 	return sw;
 }
 EXPORT_SYMBOL_GPL(usb_role_switch_get);
@@ -156,9 +160,6 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
 	if (!sw)
 		sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
 						  NULL, usb_role_switch_match);
-	if (!IS_ERR_OR_NULL(sw))
-		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
-
 	return sw;
 }
 EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
@@ -172,7 +173,6 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
 void usb_role_switch_put(struct usb_role_switch *sw)
 {
 	if (!IS_ERR_OR_NULL(sw)) {
-		module_put(sw->dev.parent->driver->owner);
 		put_device(&sw->dev);
 	}
 }
@@ -194,9 +194,6 @@ usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
 		return NULL;
 
 	dev = class_find_device_by_fwnode(&role_class, fwnode);
-	if (dev)
-		WARN_ON(!try_module_get(dev->parent->driver->owner));
-
 	return dev ? to_role_switch(dev) : NULL;
 }
 EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
@@ -352,6 +349,8 @@ usb_role_switch_register(struct device *parent,
 		return ERR_PTR(ret);
 	}
 
+	sw->registered = true;
+
 	/* TODO: Symlinks for the host port and the device controller. */
 
 	return sw;
@@ -366,8 +365,10 @@ EXPORT_SYMBOL_GPL(usb_role_switch_register);
  */
 void usb_role_switch_unregister(struct usb_role_switch *sw)
 {
-	if (!IS_ERR_OR_NULL(sw))
+	if (!IS_ERR_OR_NULL(sw)) {
+		sw->registered = false;
 		device_unregister(&sw->dev);
+	}
 }
 EXPORT_SYMBOL_GPL(usb_role_switch_unregister);

Thanks,
Xu Yang
Heikki Krogerus Jan. 19, 2024, 1:21 p.m. UTC | #24
On Fri, Jan 19, 2024 at 08:18:51AM +0000, Xu Yang wrote:
> Hi,
> 
> > 
> > On Thu, Jan 18, 2024 at 03:52:52PM +0000, Xu Yang wrote:
> > > I've tried your suggestion and it appears to be working fine. Now I'm not sure if
> > > the module get/put parts should be removed or to fix the NULL pointer issue. I'm
> > > working on this issue, so I have time to fix it. I think if first way is taken, the status
> > > of usb_role_switch device should be updated when it's registered/unregisterd. Or
> > > other issues will occur since the user doesn't know the change of usb_role_switch
> > > device.
> > 
> > These really are questions for Heikki, not me.  Can you at least show us
> > the patch you've been testing?
> 
> I have a simple test based on below changes:
> 
> diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
> index ae41578bd014..d55a5d8d4fc4 100644
> --- a/drivers/usb/roles/class.c
> +++ b/drivers/usb/roles/class.c
> @@ -20,6 +20,7 @@ static const struct class role_class = {
>  
>  struct usb_role_switch {
>  	struct device dev;
> +	struct module *module;
>  	struct mutex lock; /* device lock*/
>  	enum usb_role role;
>  
> @@ -135,7 +136,7 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
>  						  usb_role_switch_match);
>  
>  	if (!IS_ERR_OR_NULL(sw))
> -		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> +		WARN_ON(!try_module_get(sw->module));
>  
>  	return sw;
>  }
> @@ -157,7 +158,7 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
>  		sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
>  						  NULL, usb_role_switch_match);
>  	if (!IS_ERR_OR_NULL(sw))
> -		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> +		WARN_ON(!try_module_get(sw->module));
>  
>  	return sw;
>  }
> @@ -172,7 +173,7 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
>  void usb_role_switch_put(struct usb_role_switch *sw)
>  {
>  	if (!IS_ERR_OR_NULL(sw)) {
> -		module_put(sw->dev.parent->driver->owner);
> +		module_put(sw->module);
>  		put_device(&sw->dev);
>  	}
>  }
> @@ -189,15 +190,18 @@ struct usb_role_switch *
>  usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
>  {
>  	struct device *dev;
> +	struct usb_role_switch *sw = NULL;
>  
>  	if (!fwnode)
>  		return NULL;
>  
>  	dev = class_find_device_by_fwnode(&role_class, fwnode);
> -	if (dev)
> -		WARN_ON(!try_module_get(dev->parent->driver->owner));
> +	if(dev) {
> +		sw = to_role_switch(dev);
> +		WARN_ON(!try_module_get(sw->module));
> +	}
>  
> -	return dev ? to_role_switch(dev) : NULL;
> +	return sw;
>  }
>  EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
>  
> @@ -339,6 +343,7 @@ usb_role_switch_register(struct device *parent,
>  	sw->get = desc->get;
>  
>  	sw->dev.parent = parent;
> +	sw->module = parent->driver->owner;
>  	sw->dev.fwnode = desc->fwnode;
>  	sw->dev.class = &role_class;
>  	sw->dev.type = &usb_role_dev_type;

This looks good to me.
Alan Stern Jan. 19, 2024, 2:53 p.m. UTC | #25
On Fri, Jan 19, 2024 at 08:18:51AM +0000, Xu Yang wrote:
> Hi,
> 
> > 
> > On Thu, Jan 18, 2024 at 03:52:52PM +0000, Xu Yang wrote:
> > > I've tried your suggestion and it appears to be working fine. Now I'm not sure if
> > > the module get/put parts should be removed or to fix the NULL pointer issue. I'm
> > > working on this issue, so I have time to fix it. I think if first way is taken, the status
> > > of usb_role_switch device should be updated when it's registered/unregisterd. Or
> > > other issues will occur since the user doesn't know the change of usb_role_switch
> > > device.
> > 
> > These really are questions for Heikki, not me.  Can you at least show us
> > the patch you've been testing?
> 
> I have a simple test based on below changes:
> 
> diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
> index ae41578bd014..d55a5d8d4fc4 100644
> --- a/drivers/usb/roles/class.c
> +++ b/drivers/usb/roles/class.c
> @@ -20,6 +20,7 @@ static const struct class role_class = {
>  
>  struct usb_role_switch {
>  	struct device dev;
> +	struct module *module;
>  	struct mutex lock; /* device lock*/
>  	enum usb_role role;
>  
> @@ -135,7 +136,7 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
>  						  usb_role_switch_match);
>  
>  	if (!IS_ERR_OR_NULL(sw))
> -		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> +		WARN_ON(!try_module_get(sw->module));
>  
>  	return sw;
>  }

I'm surprised that these functions don't handle errors in 
try_module_get().  Is there some reason for this?

> @@ -157,7 +158,7 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
>  		sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
>  						  NULL, usb_role_switch_match);
>  	if (!IS_ERR_OR_NULL(sw))
> -		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> +		WARN_ON(!try_module_get(sw->module));
>  
>  	return sw;
>  }
> @@ -172,7 +173,7 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
>  void usb_role_switch_put(struct usb_role_switch *sw)
>  {
>  	if (!IS_ERR_OR_NULL(sw)) {
> -		module_put(sw->dev.parent->driver->owner);
> +		module_put(sw->module);
>  		put_device(&sw->dev);
>  	}
>  }

If try_module_get() failed then this would be an unbalanced 
module_put().

However, these problems were already present in the driver and this 
patch doesn't change them.  So the patch looks okay to me.

Alan Stern
Alan Stern Jan. 19, 2024, 3 p.m. UTC | #26
On Fri, Jan 19, 2024 at 11:13:23AM +0000, Xu Yang wrote:
> I've a simple test based on below changes too. This will remove
> module_get/put() parts and add a flag to indicate the current status
> of usb-role-switch device. It works too.
> 
> diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
> index ae41578bd014..313f378d1a74 100644
> --- a/drivers/usb/roles/class.c
> +++ b/drivers/usb/roles/class.c
> @@ -22,6 +22,7 @@ struct usb_role_switch {
>  	struct device dev;
>  	struct mutex lock; /* device lock*/
>  	enum usb_role role;
> +	bool registered;
>  
>  	/* From descriptor */
>  	struct device *usb2_port;
> @@ -48,6 +49,9 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role)
>  	if (IS_ERR_OR_NULL(sw))
>  		return 0;
>  
> +	if (!sw->registered)
> +		return 0;
> +
>  	mutex_lock(&sw->lock);
>  
>  	ret = sw->set(sw, role);
> @@ -76,6 +80,9 @@ enum usb_role usb_role_switch_get_role(struct usb_role_switch *sw)
>  	if (IS_ERR_OR_NULL(sw))
>  		return USB_ROLE_NONE;
>  
> +	if (!sw->registered)
> +		return USB_ROLE_NONE;
> +
>  	mutex_lock(&sw->lock);
>  
>  	if (sw->get)
> @@ -134,9 +141,6 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
>  		sw = device_connection_find_match(dev, "usb-role-switch", NULL,
>  						  usb_role_switch_match);
>  
> -	if (!IS_ERR_OR_NULL(sw))
> -		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> -
>  	return sw;
>  }
>  EXPORT_SYMBOL_GPL(usb_role_switch_get);
> @@ -156,9 +160,6 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
>  	if (!sw)
>  		sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
>  						  NULL, usb_role_switch_match);
> -	if (!IS_ERR_OR_NULL(sw))
> -		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> -
>  	return sw;
>  }
>  EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
> @@ -172,7 +173,6 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
>  void usb_role_switch_put(struct usb_role_switch *sw)
>  {
>  	if (!IS_ERR_OR_NULL(sw)) {
> -		module_put(sw->dev.parent->driver->owner);
>  		put_device(&sw->dev);
>  	}
>  }
> @@ -194,9 +194,6 @@ usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
>  		return NULL;
>  
>  	dev = class_find_device_by_fwnode(&role_class, fwnode);
> -	if (dev)
> -		WARN_ON(!try_module_get(dev->parent->driver->owner));
> -
>  	return dev ? to_role_switch(dev) : NULL;
>  }
>  EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
> @@ -352,6 +349,8 @@ usb_role_switch_register(struct device *parent,
>  		return ERR_PTR(ret);
>  	}
>  
> +	sw->registered = true;
> +
>  	/* TODO: Symlinks for the host port and the device controller. */
>  
>  	return sw;
> @@ -366,8 +365,10 @@ EXPORT_SYMBOL_GPL(usb_role_switch_register);
>   */
>  void usb_role_switch_unregister(struct usb_role_switch *sw)
>  {
> -	if (!IS_ERR_OR_NULL(sw))
> +	if (!IS_ERR_OR_NULL(sw)) {
> +		sw->registered = false;
>  		device_unregister(&sw->dev);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(usb_role_switch_unregister);

What happens if the provider module is unloaded but then 
usb_role_switch_put() is called after usb_role_switch_unregister()?  
Won't there be a NULL pointer dereference inside the put_device() call?

Alan Stern
Xu Yang Jan. 19, 2024, 3:23 p.m. UTC | #27
Hi Alan,

> 
> On Fri, Jan 19, 2024 at 11:13:23AM +0000, Xu Yang wrote:
> > I've a simple test based on below changes too. This will remove
> > module_get/put() parts and add a flag to indicate the current status
> > of usb-role-switch device. It works too.
> >
> > diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
> > index ae41578bd014..313f378d1a74 100644
> > --- a/drivers/usb/roles/class.c
> > +++ b/drivers/usb/roles/class.c
> > @@ -22,6 +22,7 @@ struct usb_role_switch {
> >       struct device dev;
> >       struct mutex lock; /* device lock*/
> >       enum usb_role role;
> > +     bool registered;
> >
> >       /* From descriptor */
> >       struct device *usb2_port;
> > @@ -48,6 +49,9 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role)
> >       if (IS_ERR_OR_NULL(sw))
> >               return 0;
> >
> > +     if (!sw->registered)
> > +             return 0;
> > +
> >       mutex_lock(&sw->lock);
> >
> >       ret = sw->set(sw, role);
> > @@ -76,6 +80,9 @@ enum usb_role usb_role_switch_get_role(struct usb_role_switch *sw)
> >       if (IS_ERR_OR_NULL(sw))
> >               return USB_ROLE_NONE;
> >
> > +     if (!sw->registered)
> > +             return USB_ROLE_NONE;
> > +
> >       mutex_lock(&sw->lock);
> >
> >       if (sw->get)
> > @@ -134,9 +141,6 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
> >               sw = device_connection_find_match(dev, "usb-role-switch", NULL,
> >                                                 usb_role_switch_match);
> >
> > -     if (!IS_ERR_OR_NULL(sw))
> > -             WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> > -
> >       return sw;
> >  }
> >  EXPORT_SYMBOL_GPL(usb_role_switch_get);
> > @@ -156,9 +160,6 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
> >       if (!sw)
> >               sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
> >                                                 NULL, usb_role_switch_match);
> > -     if (!IS_ERR_OR_NULL(sw))
> > -             WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
> > -
> >       return sw;
> >  }
> >  EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
> > @@ -172,7 +173,6 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
> >  void usb_role_switch_put(struct usb_role_switch *sw)
> >  {
> >       if (!IS_ERR_OR_NULL(sw)) {
> > -             module_put(sw->dev.parent->driver->owner);
> >               put_device(&sw->dev);
> >       }
> >  }
> > @@ -194,9 +194,6 @@ usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
> >               return NULL;
> >
> >       dev = class_find_device_by_fwnode(&role_class, fwnode);
> > -     if (dev)
> > -             WARN_ON(!try_module_get(dev->parent->driver->owner));
> > -
> >       return dev ? to_role_switch(dev) : NULL;
> >  }
> >  EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
> > @@ -352,6 +349,8 @@ usb_role_switch_register(struct device *parent,
> >               return ERR_PTR(ret);
> >       }
> >
> > +     sw->registered = true;
> > +
> >       /* TODO: Symlinks for the host port and the device controller. */
> >
> >       return sw;
> > @@ -366,8 +365,10 @@ EXPORT_SYMBOL_GPL(usb_role_switch_register);
> >   */
> >  void usb_role_switch_unregister(struct usb_role_switch *sw)
> >  {
> > -     if (!IS_ERR_OR_NULL(sw))
> > +     if (!IS_ERR_OR_NULL(sw)) {
> > +             sw->registered = false;
> >               device_unregister(&sw->dev);
> > +     }
> >  }
> >  EXPORT_SYMBOL_GPL(usb_role_switch_unregister);
> 
> What happens if the provider module is unloaded but then
> usb_role_switch_put() is called after usb_role_switch_unregister()?
> Won't there be a NULL pointer dereference inside the put_device() call?

The get_device() will be called after the user successfully get usb_role_switch
device. So the resource of sw will continue to exist until usb_role_switch_put()
is called.

Thanks,
Xu Yang
Alan Stern Jan. 19, 2024, 4:22 p.m. UTC | #28
On Fri, Jan 19, 2024 at 03:23:50PM +0000, Xu Yang wrote:
> > What happens if the provider module is unloaded but then
> > usb_role_switch_put() is called after usb_role_switch_unregister()?
> > Won't there be a NULL pointer dereference inside the put_device() call?
> 
> The get_device() will be called after the user successfully get usb_role_switch
> device. So the resource of sw will continue to exist until usb_role_switch_put()
> is called.

But look: Your patch essentially prevents usb_role_switch_set_role() 
from running after the role-switch device has been unregistered.  But 
what if someone had already called usb_role_switch_set_role() before the 
device was unregistered?  Won't that eventually lead to problems if the 
provider's module is then unloaded from memory?

To put it another way, all those try_module_get() and module_put() calls 
were originally added to prevent a specific problem from occurring.  
Once you remove them, won't that problem be able to occur again?

Alan Stern
Xu Yang Jan. 22, 2024, 2:32 a.m. UTC | #29
Hi Alan,

> 
> On Fri, Jan 19, 2024 at 03:23:50PM +0000, Xu Yang wrote:
> > > What happens if the provider module is unloaded but then
> > > usb_role_switch_put() is called after usb_role_switch_unregister()?
> > > Won't there be a NULL pointer dereference inside the put_device() call?
> >
> > The get_device() will be called after the user successfully get usb_role_switch
> > device. So the resource of sw will continue to exist until usb_role_switch_put()
> > is called.
> 
> But look: Your patch essentially prevents usb_role_switch_set_role()
> from running after the role-switch device has been unregistered.  But
> what if someone had already called usb_role_switch_set_role() before the
> device was unregistered?  Won't that eventually lead to problems if the
> provider's module is then unloaded from memory?

Yes, exactly it may happen.

> 
> To put it another way, all those try_module_get() and module_put() calls
> were originally added to prevent a specific problem from occurring.
> Once you remove them, won't that problem be able to occur again?

Yes, it will. Thanks for reminder about this.

Thanks,
Xu Yang
diff mbox series

Patch

diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
index ae41578bd014..41060e354174 100644
--- a/drivers/usb/roles/class.c
+++ b/drivers/usb/roles/class.c
@@ -34,6 +34,24 @@  struct usb_role_switch {
 
 #define to_role_switch(d)	container_of(d, struct usb_role_switch, dev)
 
+void usb_role_switch_get_modules(struct device *dev)
+{
+	while (dev) {
+		if (dev->driver)
+			WARN_ON(!try_module_get(dev->driver->owner));
+		dev = dev->parent;
+	}
+}
+
+void usb_role_switch_put_modules(struct device *dev)
+{
+	while (dev) {
+		if (dev->driver)
+			module_put(dev->driver->owner);
+		dev = dev->parent;
+	}
+}
+
 /**
  * usb_role_switch_set_role - Set USB role for a switch
  * @sw: USB role switch
@@ -135,7 +153,7 @@  struct usb_role_switch *usb_role_switch_get(struct device *dev)
 						  usb_role_switch_match);
 
 	if (!IS_ERR_OR_NULL(sw))
-		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+		usb_role_switch_get_modules(sw->dev.parent);
 
 	return sw;
 }
@@ -157,7 +175,7 @@  struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
 		sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
 						  NULL, usb_role_switch_match);
 	if (!IS_ERR_OR_NULL(sw))
-		WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+		usb_role_switch_get_modules(sw->dev.parent);
 
 	return sw;
 }
@@ -172,7 +190,7 @@  EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
 void usb_role_switch_put(struct usb_role_switch *sw)
 {
 	if (!IS_ERR_OR_NULL(sw)) {
-		module_put(sw->dev.parent->driver->owner);
+		usb_role_switch_put_modules(sw->dev.parent);
 		put_device(&sw->dev);
 	}
 }
@@ -195,7 +213,7 @@  usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
 
 	dev = class_find_device_by_fwnode(&role_class, fwnode);
 	if (dev)
-		WARN_ON(!try_module_get(dev->parent->driver->owner));
+		usb_role_switch_get_modules(dev->parent);
 
 	return dev ? to_role_switch(dev) : NULL;
 }