diff mbox series

[BUG,RFC] media: Wrong module gets acquired

Message ID d2bc538126492151ad325fa653924ca158a39b07.1534177949.git.petrcvekcz@gmail.com (mailing list archive)
State New, archived
Headers show
Series [BUG,RFC] media: Wrong module gets acquired | expand

Commit Message

Petr Cvek Aug. 13, 2018, 4:33 p.m. UTC
From: Petr Cvek <petrcvekcz@gmail.com>

When transferring a media sensor driver from the soc_camera I've found
the controller module can get removed (which will cause a stack dump
because the sensor driver depends on resources from the controller driver).

When I've tried to remove the driver module of the sensor it said the
resource was busy (without a reference name) though is should be
possible to remove the sensor driver because it is at the end of
the dependency list and not to remove the controller driver.

I've dig into the called functions and I've found this in
drivers/media/v4l2-core/v4l2-device.c:

	/*
	 * The reason to acquire the module here is to avoid unloading
	 * a module of sub-device which is registered to a media
	 * device. To make it possible to unload modules for media
	 * devices that also register sub-devices, do not
	 * try_module_get() such sub-device owners.
	 */
	sd->owner_v4l2_dev = v4l2_dev->dev && v4l2_dev->dev->driver &&
		sd->owner == v4l2_dev->dev->driver->owner;

	if (!sd->owner_v4l2_dev && !try_module_get(sd->owner))
		return -ENODEV;

It basicaly checks if subdevice (=sensor) is a same module as the media
device (=controller) and if they are different it acquires the module.

The acquired module is the one in sd->owner, which is the same module from
which the function is called (-> sensor aquires itself). Is this
functionality valid (should the subdevice really be unloadable)? When
I've patched the module to aquire the controller instead the module, the
removal worked as expected (sensor free to go, controller not).

If this is really a bug (= there isn't a sensor which cannot be unloaded from
a controller?) then I send a new patch with reworded commentary.

Signed-off-by: Petr Cvek <petrcvekcz@gmail.com>
---
 drivers/media/v4l2-core/v4l2-device.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Sakari Ailus Aug. 14, 2018, 8:35 a.m. UTC | #1
Hi Pert,

On Mon, Aug 13, 2018 at 06:33:12PM +0200, petrcvekcz@gmail.com wrote:
> From: Petr Cvek <petrcvekcz@gmail.com>
> 
> When transferring a media sensor driver from the soc_camera I've found
> the controller module can get removed (which will cause a stack dump
> because the sensor driver depends on resources from the controller driver).

There may be a kernel oops if a resource used by another driver goes away.
But the right fix isn't to prevent unloading that module. Instead, one way
to address the problem would be to have persistent clock objects that would
not be dependent on the driver that provides them.

> 
> When I've tried to remove the driver module of the sensor it said the
> resource was busy (without a reference name) though is should be
> possible to remove the sensor driver because it is at the end of
> the dependency list and not to remove the controller driver.

That might be one day possible but it is not today.

You'll still need to acquire the sensor module as well as it registers a
media entity as well as a sub-device.

> 
> I've dig into the called functions and I've found this in
> drivers/media/v4l2-core/v4l2-device.c:
> 
> 	/*
> 	 * The reason to acquire the module here is to avoid unloading
> 	 * a module of sub-device which is registered to a media
> 	 * device. To make it possible to unload modules for media
> 	 * devices that also register sub-devices, do not
> 	 * try_module_get() such sub-device owners.
> 	 */
> 	sd->owner_v4l2_dev = v4l2_dev->dev && v4l2_dev->dev->driver &&
> 		sd->owner == v4l2_dev->dev->driver->owner;
> 
> 	if (!sd->owner_v4l2_dev && !try_module_get(sd->owner))
> 		return -ENODEV;
> 
> It basicaly checks if subdevice (=sensor) is a same module as the media
> device (=controller) and if they are different it acquires the module.
> 
> The acquired module is the one in sd->owner, which is the same module from
> which the function is called (-> sensor aquires itself). Is this
> functionality valid (should the subdevice really be unloadable)? When
> I've patched the module to aquire the controller instead the module, the
> removal worked as expected (sensor free to go, controller not).
> 
> If this is really a bug (= there isn't a sensor which cannot be unloaded from
> a controller?) then I send a new patch with reworded commentary.
> 
> Signed-off-by: Petr Cvek <petrcvekcz@gmail.com>
> ---
>  drivers/media/v4l2-core/v4l2-device.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/v4l2-device.c b/drivers/media/v4l2-core/v4l2-device.c
> index 3940e55c72f1..1dec61cd560c 100644
> --- a/drivers/media/v4l2-core/v4l2-device.c
> +++ b/drivers/media/v4l2-core/v4l2-device.c
> @@ -173,7 +173,8 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
>  	sd->owner_v4l2_dev = v4l2_dev->dev && v4l2_dev->dev->driver &&
>  		sd->owner == v4l2_dev->dev->driver->owner;
>  
> -	if (!sd->owner_v4l2_dev && !try_module_get(sd->owner))
> +	if (!sd->owner_v4l2_dev &&
> +		!try_module_get(v4l2_dev->dev->driver->owner))
>  		return -ENODEV;
>  
>  	sd->v4l2_dev = v4l2_dev;
> @@ -209,7 +210,7 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
>  #endif
>  error_module:
>  	if (!sd->owner_v4l2_dev)
> -		module_put(sd->owner);
> +		module_put(v4l2_dev->dev->driver->owner);
>  	sd->v4l2_dev = NULL;
>  	return err;
>  }
> @@ -318,6 +319,6 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
>  #endif
>  	video_unregister_device(sd->devnode);
>  	if (!sd->owner_v4l2_dev)
> -		module_put(sd->owner);
> +		module_put(v4l2_dev->dev->driver->owner);
>  }
>  EXPORT_SYMBOL_GPL(v4l2_device_unregister_subdev);
> -- 
> 2.18.0
>
Petr Cvek Aug. 14, 2018, 10:44 a.m. UTC | #2
Dne 14.8.2018 v 10:35 Sakari Ailus napsal(a):
> Hi Pert,
> 

Hello,

thanks for answering

> On Mon, Aug 13, 2018 at 06:33:12PM +0200, petrcvekcz@gmail.com wrote:
>> From: Petr Cvek <petrcvekcz@gmail.com>
>>
>> When transferring a media sensor driver from the soc_camera I've found
>> the controller module can get removed (which will cause a stack dump
>> because the sensor driver depends on resources from the controller driver).
> 
> There may be a kernel oops if a resource used by another driver goes away.
> But the right fix isn't to prevent unloading that module. Instead, one way
> to address the problem would be to have persistent clock objects that would
> not be dependent on the driver that provides them.
> 

Yeah it is a hack, but it allows me to rmmod the sensor driver.
Otherwise it has a refcount "deadlock" as it holds the refcount on
itself. The _only_ way to unload the driver is to reboot.

ad clocks: Well the sensor driver and the controller are using
v4l2_clk_* so I was poking around that. Anyway the independent clock
object would be a problem, the clock control is inside the capture
driver register set (divider, enable) (I think) and it gets gated out
completely when the capture driver is unloaded. And when the clock is
stopped the sensor isn't responding to i2c commands. The clock control
is really dependent on the driver unless there is some callback to the
sensor which interrupts it from non-responding i2c transfers.

>>
>> When I've tried to remove the driver module of the sensor it said the
>> resource was busy (without a reference name) though is should be
>> possible to remove the sensor driver because it is at the end of
>> the dependency list and not to remove the controller driver.
> 
> That might be one day possible but it is not today.
> 
> You'll still need to acquire the sensor module as well as it registers a
> media entity as well as a sub-device.
> 

The ported sensor driver doesn't contain any media entity code.

best regards,
Petr
Mauro Carvalho Chehab Aug. 14, 2018, 1:03 p.m. UTC | #3
Em Tue, 14 Aug 2018 11:35:01 +0300
Sakari Ailus <sakari.ailus@linux.intel.com> escreveu:

> Hi Pert,
> 
> On Mon, Aug 13, 2018 at 06:33:12PM +0200, petrcvekcz@gmail.com wrote:
> > From: Petr Cvek <petrcvekcz@gmail.com>
> > 
> > When transferring a media sensor driver from the soc_camera I've found
> > the controller module can get removed (which will cause a stack dump
> > because the sensor driver depends on resources from the controller driver).  
> 
> There may be a kernel oops if a resource used by another driver goes away.
> But the right fix isn't to prevent unloading that module. Instead, one way
> to address the problem would be to have persistent clock objects that would
> not be dependent on the driver that provides them.
> 
> > 
> > When I've tried to remove the driver module of the sensor it said the
> > resource was busy (without a reference name) though is should be
> > possible to remove the sensor driver because it is at the end of
> > the dependency list and not to remove the controller driver.  
> 
> That might be one day possible but it is not today.
> 
> You'll still need to acquire the sensor module as well as it registers a
> media entity as well as a sub-device.

Let put my 2 cents here.

Usually, the same problem of removing modules happen if you just
ask the driver's core to unbind a module (with can be done via
sysfs).

Removing/unbinding a driver that uses media controller should work,
if the unbinding code at the driver (e. g. i2c_driver::remove field)
would delete the media controller entities, and the caller driver
doesn't cache it.

The USB drivers whose exports their topology via the media controller
(like em28xx) supports removing an I2C driver without crashing.

Btw, em28xx is an interesting example, as it has a main driver, several
sub-drivers and several I2C drivers.

There, basically, em28xx loads their sub-drivers, with causes the
sub-drivers to increment their device refcount. The same applies
to the I2C drivers. If one wants to remove em28xx from the memory,
it need to remove first the I2C drivers, then the em28xx-foo
sub-drivers, and finally em28xx itself. 

This use to work pretty well (except when some unbind regression gets
introduced).

I use this a lot when testing usb drivers, as it prevents me to boot
the Kernel every time.

-

It could be harder when it has cross-dependencies with other stuff
that aren't properly mapped (clock, gpio, etc). On such cases, the
best way is to use i2c_driver::suppress_bind_attrs, instead of
hacking the v4l2 core.

I didn't look into details (nor did any tests on this RFC patch),
but whatever is there at the core, it should be allowing unbinding
a module, if they don't use suppress_bind_attrs=true.

> 
> > 
> > I've dig into the called functions and I've found this in
> > drivers/media/v4l2-core/v4l2-device.c:
> > 
> > 	/*
> > 	 * The reason to acquire the module here is to avoid unloading
> > 	 * a module of sub-device which is registered to a media
> > 	 * device. To make it possible to unload modules for media
> > 	 * devices that also register sub-devices, do not
> > 	 * try_module_get() such sub-device owners.
> > 	 */
> > 	sd->owner_v4l2_dev = v4l2_dev->dev && v4l2_dev->dev->driver &&
> > 		sd->owner == v4l2_dev->dev->driver->owner;
> > 
> > 	if (!sd->owner_v4l2_dev && !try_module_get(sd->owner))
> > 		return -ENODEV;
> > 
> > It basicaly checks if subdevice (=sensor) is a same module as the media
> > device (=controller) and if they are different it acquires the module.
> > 
> > The acquired module is the one in sd->owner, which is the same module from
> > which the function is called (-> sensor aquires itself). Is this
> > functionality valid (should the subdevice really be unloadable)? When
> > I've patched the module to aquire the controller instead the module, the
> > removal worked as expected (sensor free to go, controller not).
> > 
> > If this is really a bug (= there isn't a sensor which cannot be unloaded from
> > a controller?) then I send a new patch with reworded commentary.
> > 
> > Signed-off-by: Petr Cvek <petrcvekcz@gmail.com>
> > ---
> >  drivers/media/v4l2-core/v4l2-device.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/media/v4l2-core/v4l2-device.c b/drivers/media/v4l2-core/v4l2-device.c
> > index 3940e55c72f1..1dec61cd560c 100644
> > --- a/drivers/media/v4l2-core/v4l2-device.c
> > +++ b/drivers/media/v4l2-core/v4l2-device.c
> > @@ -173,7 +173,8 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
> >  	sd->owner_v4l2_dev = v4l2_dev->dev && v4l2_dev->dev->driver &&
> >  		sd->owner == v4l2_dev->dev->driver->owner;
> >  
> > -	if (!sd->owner_v4l2_dev && !try_module_get(sd->owner))
> > +	if (!sd->owner_v4l2_dev &&
> > +		!try_module_get(v4l2_dev->dev->driver->owner))
> >  		return -ENODEV;
> >  
> >  	sd->v4l2_dev = v4l2_dev;
> > @@ -209,7 +210,7 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
> >  #endif
> >  error_module:
> >  	if (!sd->owner_v4l2_dev)
> > -		module_put(sd->owner);
> > +		module_put(v4l2_dev->dev->driver->owner);
> >  	sd->v4l2_dev = NULL;
> >  	return err;
> >  }
> > @@ -318,6 +319,6 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
> >  #endif
> >  	video_unregister_device(sd->devnode);
> >  	if (!sd->owner_v4l2_dev)
> > -		module_put(sd->owner);
> > +		module_put(v4l2_dev->dev->driver->owner);
> >  }
> >  EXPORT_SYMBOL_GPL(v4l2_device_unregister_subdev);
> > -- 
> > 2.18.0
> >   
> 



Thanks,
Mauro
Petr Cvek Aug. 15, 2018, 1:42 p.m. UTC | #4
Dne 14.8.2018 v 15:03 Mauro Carvalho Chehab napsal(a):
> Em Tue, 14 Aug 2018 11:35:01 +0300
> Sakari Ailus <sakari.ailus@linux.intel.com> escreveu:
> 
>> Hi Pert,
>>
>> On Mon, Aug 13, 2018 at 06:33:12PM +0200, petrcvekcz@gmail.com wrote:
>>> From: Petr Cvek <petrcvekcz@gmail.com>
>>>
>>> When transferring a media sensor driver from the soc_camera I've found
>>> the controller module can get removed (which will cause a stack dump
>>> because the sensor driver depends on resources from the controller driver).  
>>
>> There may be a kernel oops if a resource used by another driver goes away.
>> But the right fix isn't to prevent unloading that module. Instead, one way
>> to address the problem would be to have persistent clock objects that would
>> not be dependent on the driver that provides them.
>>
>>>
>>> When I've tried to remove the driver module of the sensor it said the
>>> resource was busy (without a reference name) though is should be
>>> possible to remove the sensor driver because it is at the end of
>>> the dependency list and not to remove the controller driver.  
>>
>> That might be one day possible but it is not today.
>>
>> You'll still need to acquire the sensor module as well as it registers a
>> media entity as well as a sub-device.
> 
> Let put my 2 cents here.
> 
> Usually, the same problem of removing modules happen if you just
> ask the driver's core to unbind a module (with can be done via
> sysfs).
> 
> Removing/unbinding a driver that uses media controller should work,
> if the unbinding code at the driver (e. g. i2c_driver::remove field)
> would delete the media controller entities, and the caller driver
> doesn't cache it.
> 

Yeah I assume it would be same as removing my sensor module from
v4l2_async and unregistering v4l2 clocks.

Petr
diff mbox series

Patch

diff --git a/drivers/media/v4l2-core/v4l2-device.c b/drivers/media/v4l2-core/v4l2-device.c
index 3940e55c72f1..1dec61cd560c 100644
--- a/drivers/media/v4l2-core/v4l2-device.c
+++ b/drivers/media/v4l2-core/v4l2-device.c
@@ -173,7 +173,8 @@  int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
 	sd->owner_v4l2_dev = v4l2_dev->dev && v4l2_dev->dev->driver &&
 		sd->owner == v4l2_dev->dev->driver->owner;
 
-	if (!sd->owner_v4l2_dev && !try_module_get(sd->owner))
+	if (!sd->owner_v4l2_dev &&
+		!try_module_get(v4l2_dev->dev->driver->owner))
 		return -ENODEV;
 
 	sd->v4l2_dev = v4l2_dev;
@@ -209,7 +210,7 @@  int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
 #endif
 error_module:
 	if (!sd->owner_v4l2_dev)
-		module_put(sd->owner);
+		module_put(v4l2_dev->dev->driver->owner);
 	sd->v4l2_dev = NULL;
 	return err;
 }
@@ -318,6 +319,6 @@  void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
 #endif
 	video_unregister_device(sd->devnode);
 	if (!sd->owner_v4l2_dev)
-		module_put(sd->owner);
+		module_put(v4l2_dev->dev->driver->owner);
 }
 EXPORT_SYMBOL_GPL(v4l2_device_unregister_subdev);