[v4,05/14] coresight: get/put module in coresight_build/release_path

Message ID 20180606155501.704583e1412996a1a2c6fa61@arm.com (mailing list archive)
State New, archived

Commit Message

Kim Phillips June 6, 2018, 8:55 p.m. UTC
On Wed, 6 Jun 2018 10:46:36 +0100
Suzuki K Poulose <suzuki.poulose@arm.com> wrote:

> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
> > On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
> >> Increment the refcnt for driver modules in current use by calling
> >> module_get in coresight_build_path and module_put in release_path.
> >>
> >> This prevents driver modules from being unloaded when they are in use,
> >> either in sysfs or perf mode.
> > 
> > Why does it matter?  Shouldn't you be allowed to remove any module at
> > any point in time, much like a networking driver?
> > 
> > 
> >>
> >> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> >> Cc: Leo Yan <leo.yan@linaro.org>
> >> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> Cc: Randy Dunlap <rdunlap@infradead.org>
> >> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >> Cc: Russell King <linux@armlinux.org.uk>
> >> Signed-off-by: Kim Phillips <kim.phillips@arm.com>
> >> ---
> >>   drivers/hwtracing/coresight/coresight.c | 9 +++++++++
> >>   1 file changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
> >> index 338f1719641c..1c941351f1d1 100644
> >> --- a/drivers/hwtracing/coresight/coresight.c
> >> +++ b/drivers/hwtracing/coresight/coresight.c
> >> @@ -465,6 +465,12 @@ static int _coresight_build_path(struct coresight_device *csdev,
> >>   
> >>   	node->csdev = csdev;
> >>   	list_add(&node->link, path);
> >> +
> >> +	if (!try_module_get(csdev->dev.parent->driver->owner)) {
> > 
> > What is to keep parent->driver from going away right here?  What keeps
> > parent around?  This feels very fragile to me, I don't see any locking
> > anywhere around this code path to try to keep things in place.
> 
> You're right. We do have coresight_mutex, which is held across the build 
> path and the csdev is removed when a device is unregistered. However, I
> see that we don't hold the mutex while removing the connections from
> coresight_unregister(). Holding the mutex should protect us from the
> csdev being removed, while we build the path.

OK, I'll add this for the next version:


Thanks,

Kim

Comments

Greg Kroah-Hartman June 7, 2018, 8:34 a.m. UTC | #1
On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
> On Wed, 6 Jun 2018 10:46:36 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> 
> > On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
> > > On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
> > >> Increment the refcnt for driver modules in current use by calling
> > >> module_get in coresight_build_path and module_put in release_path.
> > >>
> > >> This prevents driver modules from being unloaded when they are in use,
> > >> either in sysfs or perf mode.
> > > 
> > > Why does it matter?  Shouldn't you be allowed to remove any module at
> > > any point in time, much like a networking driver?
> > > 
> > > 
> > >>
> > >> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> > >> Cc: Leo Yan <leo.yan@linaro.org>
> > >> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > >> Cc: Randy Dunlap <rdunlap@infradead.org>
> > >> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
> > >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > >> Cc: Russell King <linux@armlinux.org.uk>
> > >> Signed-off-by: Kim Phillips <kim.phillips@arm.com>
> > >> ---
> > >>   drivers/hwtracing/coresight/coresight.c | 9 +++++++++
> > >>   1 file changed, 9 insertions(+)
> > >>
> > >> diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
> > >> index 338f1719641c..1c941351f1d1 100644
> > >> --- a/drivers/hwtracing/coresight/coresight.c
> > >> +++ b/drivers/hwtracing/coresight/coresight.c
> > >> @@ -465,6 +465,12 @@ static int _coresight_build_path(struct coresight_device *csdev,
> > >>   
> > >>   	node->csdev = csdev;
> > >>   	list_add(&node->link, path);
> > >> +
> > >> +	if (!try_module_get(csdev->dev.parent->driver->owner)) {
> > > 
> > > What is to keep parent->driver from going away right here?  What keeps
> > > parent around?  This feels very fragile to me, I don't see any locking
> > > anywhere around this code path to try to keep things in place.
> > 
> > You're right. We do have coresight_mutex, which is held across the build 
> > path and the csdev is removed when a device is unregistered. However, I
> > see that we don't hold the mutex while removing the connections from
> > coresight_unregister(). Holding the mutex should protect us from the
> > csdev being removed, while we build the path.
> 
> OK, I'll add this for the next version:
> 
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index f96258de1e9b..da702507a55c 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -1040,8 +1040,12 @@ EXPORT_SYMBOL_GPL(coresight_register);
>  
>  void coresight_unregister(struct coresight_device *csdev)
>  {
> +       mutex_lock(&coresight_mutex);
> +

Locks are to protect data, not code, be careful here please.

That's the big issue with the module reference counting, it "protects"
code, not data.  If at all possible, never grab a module reference
count, as you should always be able to unload a module, unless you have
a file handle open, and if you have that, the kernel core will properly
protect you.

thanks,

greg k-h
Suzuki K Poulose June 7, 2018, 9:04 a.m. UTC | #2
Hi Greg,

On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
> On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
>> On Wed, 6 Jun 2018 10:46:36 +0100
>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>
>>> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
>>>> On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
>>>>> Increment the refcnt for driver modules in current use by calling
>>>>> module_get in coresight_build_path and module_put in release_path.
>>>>>
>>>>> This prevents driver modules from being unloaded when they are in use,
>>>>> either in sysfs or perf mode.
>>>>
>>>> Why does it matter?  Shouldn't you be allowed to remove any module at
>>>> any point in time, much like a networking driver?

The user doesn't have an explicit refcount on the individual components
in a trace session. So, when a trace session is in progress, it is as
good as having a "file" open on each component that is part of the
active trace session. So, we don't want the driver to be removed when
the component is being used in the trace collection. This will be
released as soon as the session is ended. It is just like a PMU driver
where the module refcount is held to ensure the module stays until the
session is over. In this case, we have multiple components, each with
its own driver invisible to the PMU driver. Hence the coresight driver
must hold the reference.
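
To illustrate the per-session get/put described above, here is a minimal
userspace sketch. It is not the coresight code: struct module_model,
model_try_get(), path_build() and path_release() are hypothetical names that
only model the behaviour of try_module_get()/module_put() along a path,
including the unwind when one component cannot be pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver module with a use count. */
struct module_model {
	int refcnt;
	bool unloading;	/* set once removal has started */
};

/* Models try_module_get(): fails once the module is going away. */
static bool model_try_get(struct module_model *m)
{
	if (m->unloading)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): drops one reference taken by model_try_get(). */
static void model_put(struct module_model *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models the unload check: unloading is refused while references exist. */
static bool model_can_unload(const struct module_model *m)
{
	return m->refcnt == 0;
}

/*
 * Take one reference per component on the path. On failure, drop the
 * references already taken so that nothing stays pinned.
 * Returns 0 on success, -1 on failure.
 */
static int path_build(struct module_model **path, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!model_try_get(path[i])) {
			while (i-- > 0)
				model_put(path[i]);
			return -1;
		}
	}
	return 0;
}

/* Drop the per-session references when the session ends. */
static void path_release(struct module_model **path, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		model_put(path[i]);
}
```

In this model every component on the path is pinned only between
path_build() and path_release(), so unloading remains possible between
sessions.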

>>>>
>>>>
>>>>>
>>>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>>>> Cc: Leo Yan <leo.yan@linaro.org>
>>>>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>>>>> Cc: Randy Dunlap <rdunlap@infradead.org>
>>>>> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
>>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>> Cc: Russell King <linux@armlinux.org.uk>
>>>>> Signed-off-by: Kim Phillips <kim.phillips@arm.com>
>>>>> ---
>>>>>    drivers/hwtracing/coresight/coresight.c | 9 +++++++++
>>>>>    1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
>>>>> index 338f1719641c..1c941351f1d1 100644
>>>>> --- a/drivers/hwtracing/coresight/coresight.c
>>>>> +++ b/drivers/hwtracing/coresight/coresight.c
>>>>> @@ -465,6 +465,12 @@ static int _coresight_build_path(struct coresight_device *csdev,
>>>>>    
>>>>>    	node->csdev = csdev;
>>>>>    	list_add(&node->link, path);
>>>>> +
>>>>> +	if (!try_module_get(csdev->dev.parent->driver->owner)) {
>>>>
>>>> What is to keep parent->driver from going away right here?  What keeps
>>>> parent around?  This feels very fragile to me, I don't see any locking
>>>> anywhere around this code path to try to keep things in place.
>>>
>>> You're right. We do have coresight_mutex, which is held across the build
>>> path and the csdev is removed when a device is unregistered. However, I
>>> see that we don't hold the mutex while removing the connections from
>>> coresight_unregister(). Holding the mutex should protect us from the
>>> csdev being removed, while we build the path.
>>
>> OK, I'll add this for the next version:
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
>> index f96258de1e9b..da702507a55c 100644
>> --- a/drivers/hwtracing/coresight/coresight-core.c
>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>> @@ -1040,8 +1040,12 @@ EXPORT_SYMBOL_GPL(coresight_register);
>>   
>>   void coresight_unregister(struct coresight_device *csdev)
>>   {
>> +       mutex_lock(&coresight_mutex);
>> +
> 
> Locks are to protect data, not code, be careful here please.

The mutex here is to protect updates to the device links. We
keep a list of connections from each device to form a trace path.
When we unregister a device, we must remove the references to the
device from all the other connected components to ensure they don't
end up accessing a device which is gone.
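
As a rough userspace illustration of that point (the lock guards the
connection data, and unregistering a device clears every pointer to it), one
could write something like the sketch below. The names struct component,
topo_lock, connect_components() and unregister_component() are made up for
this example, and a pthread mutex stands in for coresight_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CONNS 4

/* Hypothetical component: a name plus its outgoing connections. */
struct component {
	const char *name;
	struct component *conns[MAX_CONNS];	/* NULL marks a free slot */
};

/* One lock guarding the connection arrays of all components (data, not code). */
static pthread_mutex_t topo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a connection from @from to @to; returns 0, or -1 if no slot is free. */
static int connect_components(struct component *from, struct component *to)
{
	int ret = -1;

	assert(from && to);
	pthread_mutex_lock(&topo_lock);
	for (size_t i = 0; i < MAX_CONNS; i++) {
		if (!from->conns[i]) {
			from->conns[i] = to;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&topo_lock);
	return ret;
}

/* Clear every pointer to @gone held by the components in @all. */
static void unregister_component(struct component **all, size_t n,
				 struct component *gone)
{
	pthread_mutex_lock(&topo_lock);
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < MAX_CONNS; j++)
			if (all[i]->conns[j] == gone)
				all[i]->conns[j] = NULL;
	pthread_mutex_unlock(&topo_lock);
}

/* Count how many connections in @all still point at @target. */
static size_t count_refs_to(struct component **all, size_t n,
			    const struct component *target)
{
	size_t count = 0;

	pthread_mutex_lock(&topo_lock);
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < MAX_CONNS; j++)
			if (all[i]->conns[j] == target)
				count++;
	pthread_mutex_unlock(&topo_lock);
	return count;
}
```

Because a path builder would take the same lock while walking the
connections, it can never observe a half-removed device in this model.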

> 
> That's the big issue with the module reference counting, it "protects"
> code, not data.  If at all possible, never grab a module reference
> count, as you should always be able to unload a module, unless you have
> a file handle open, and if you have that, the kernel core will properly
> protect you.

So in a nutshell, we have user invisible components which cannot be
refcounted explicitly by the file handles, and thus the driver must
do it.
Now, one option we could explore is taking the refcount on the
devices themselves, rather than on the drivers, for trace sessions. Each
device could then hold a refcount on its driver (which I assume
is already held), to be dropped when the device is no longer
used, and thus get rid of the module reference everywhere else.

Thoughts? Suggestions?

Suzuki


> 
> thanks,
> 
> greg k-h
>
Greg Kroah-Hartman June 7, 2018, 9:13 a.m. UTC | #3
On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
> Hi Greg,
> 
> On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
> > On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
> > > On Wed, 6 Jun 2018 10:46:36 +0100
> > > Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> > > 
> > > > On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
> > > > > On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
> > > > > > Increment the refcnt for driver modules in current use by calling
> > > > > > module_get in coresight_build_path and module_put in release_path.
> > > > > > 
> > > > > > This prevents driver modules from being unloaded when they are in use,
> > > > > > either in sysfs or perf mode.
> > > > > 
> > > > > Why does it matter?  Shouldn't you be allowed to remove any module at
> > > > > any point in time, much like a networking driver?
> 
> The user doesn't have an explicit refcount on the individual components
> in a trace session. So, when a trace session is in progress, it is as
> good as having a "file" open on each component that is part of the
> active trace session. So, we don't want the driver to be removed when
> the component is being used in the trace collection.

Why not?  What's wrong with that happening and then the trace collection
starts failing with -ENODEV or something?

Remember, removing a kernel module is something that only happens very
rarely, and is an explicit choice by someone with root permissions.  If
you want to remove that module, it should be able to go, as you know
what you are doing at that point in time.

Don't try to "protect the user from themselves" here, they want to shoot
their foot, make it hurt if they are aiming it there :)

> This will be
> released as soon as the session is ended. It is just like a PMU driver
> where the module refcount is held to ensure the module stays until the
> session is over. In this case, we have multiple components, each with
> its own driver invisible to the PMU driver. Hence the coresight driver
> must hold the reference.

Again, please think this through and don't add extra complexity to the
normal path, and get it right if you do it (the existing patch is not
right as I pointed out.)  Personally, I feel the code should just be
able to be unloaded whenever they want, user beware...

thanks,

greg k-h
Suzuki K Poulose June 7, 2018, 9:32 a.m. UTC | #4
On 06/07/2018 10:13 AM, Greg Kroah-Hartman wrote:
> On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
>> Hi Greg,
>>
>> On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
>>> On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
>>>> On Wed, 6 Jun 2018 10:46:36 +0100
>>>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>>>
>>>>> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
>>>>>> On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
>>>>>>> Increment the refcnt for driver modules in current use by calling
>>>>>>> module_get in coresight_build_path and module_put in release_path.
>>>>>>>
>>>>>>> This prevents driver modules from being unloaded when they are in use,
>>>>>>> either in sysfs or perf mode.
>>>>>>
>>>>>> Why does it matter?  Shouldn't you be allowed to remove any module at
>>>>>> any point in time, much like a networking driver?
>>
>> The user doesn't have an explicit refcount on the individual components
>> in a trace session. So, when a trace session is in progress, it is as
>> good as having a "file" open on each component that is part of the
>> active trace session. So, we don't want the driver to be removed when
>> the component is being used in the trace collection.
> 
> Why not?  What's wrong with that happening and then the trace collection
> starts failing with -ENODEV or something?

Maybe I am missing something here. Can we allow the driver to be
removed when one of its devices is "turned ON" and we need the same
driver to "turn it OFF" when the session ends? To make a better
comparison:

Can we unload the usb_mass_storage module when a USB disk (which uses the
module's driver) is mounted and in use? I believe the module
will eventually get unloaded when we unmount the disk, if someone
requested an unload.

We have a similar situation here. The only difference is that the driver is
referenced only while one of its devices is in a trace session.

> 
> Remember, removing a kernel module is something that only happens very
> rarely, and is an explicit choice by someone with root permissions.  If
> you want to remove that module, it should be able to go, as you know
> what you are doing at that point in time.

Right, but when a device is "in use" can we do that? I thought the user
would get a "module is in use" or "busy" error.


> 
> Don't try to "protect the user from themselves" here, they want to shoot
> their foot, make it hurt if they are aiming it there :)
> 

The module_get/put calls added here are only triggered when we start a trace
session: we build a path for the current session from the
configured "source" to the configured "sink", and the path is destroyed
at the end of the trace session. That is, the path is not permanent;
it is constructed per session. So it is perfectly possible to remove a
device between trace sessions.

>> This will be
>> released as soon as the session is ended. It is just like a PMU driver
>> where the module refcount is held to ensure the module stays until the
>> session is over. In this case, we have multiple components, each with
>> its own driver invisible to the PMU driver. Hence the coresight driver
>> must hold the reference.
> 
> Again, please think this through and don't add extra complexity to the
> normal path, and get it right if you do it (the existing patch is not
> right as I pointed out.)  Personally, I feel the code should just be
> able to be unloaded whenever they want, user beware...

Sure, will explore more to refine the code. Thanks for the trigger.

Cheers
Suzuki
Suzuki K Poulose June 7, 2018, 9:34 a.m. UTC | #5
On 06/07/2018 10:32 AM, Suzuki K Poulose wrote:
> On 06/07/2018 10:13 AM, Greg Kroah-Hartman wrote:
>> On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
>>> Hi Greg,
>>>
>>> On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
>>>> On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
>>>>> On Wed, 6 Jun 2018 10:46:36 +0100
>>>>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>>>>
>>>>>> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
>>>>>>> On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
>>>>>>>> Increment the refcnt for driver modules in current use by calling
>>>>>>>> module_get in coresight_build_path and module_put in release_path.
>>>>>>>>
>>>>>>>> This prevents driver modules from being unloaded when they are 
>>>>>>>> in use,
>>>>>>>> either in sysfs or perf mode.
>>>>>>>
>>>>>>> Why does it matter?  Shouldn't you be allowed to remove any 
>>>>>>> module at
>>>>>>> any point in time, much like a networking driver?
>>>
>>> The user doesn't have an explicit refcount on the individual components
>>> in a trace session. So, when a trace session is in progress, it is as
>>> good as having a "file" open on each component that is part of the
>>> active trace session. So, we don't want the driver to be removed when
>>> the component is being used in the trace collection.
>>
>> Why not?  What's wrong with that happening and then the trace collection
>> starts failing with -ENODEV or something?

Forgot to add: this will indeed hit -ENODEV if the device driver was
removed, as we then fail to build the trace path before the session.

> 
> May be I am missing something here. Can we allow the driver to be 
> removed when one of its device is "turned ON" and we need the same
> driver to "turn it OFF" when the session ends ? To make a better
> comparison :
> 
> Can we unload a usb_mass_storage module when a USB disk(which uses the 
> module driver) is mounted and is being used ? I believe, the module
> will eventually get unloaded when we unmount the disk, if someone did
> a unload.
> 
> We have a similar situation here. The only difference is the driver is
> referenced only when one of its device is in a trace session.
> 
>>
>> Remember, removing a kernel module is something that only happens very
>> rarely, and is an explicit choice by someone with root permissions.  If
>> you want to remove that module, it should be able to go, as you know
>> what you are doing at that point in time.
> 
> Right, but when a device is "in use" can we do that ? I thought the user
> will get a module is in use or busy, error.
> 
> 
>>
>> Don't try to "protect the user from themselves" here, they want to shoot
>> their foot, make it hurt if they are aiming it there :)
>>
> 
> The module_get/put added here are only triggered when we start a trace 
> session, where we build a path for the current session from the 
> configured "source" to the configured "sink" and the path is destroyed
> at the end of the trace session. i.e, the path is not a permanent thing.
> It is constructed per session. So it is perfectly possible to remove a
> device in between trace sessions.
> 
>>> This will be
>>> released as soon as the session is ended. It is just like a PMU driver
>>> where the module refcount is held to ensure the module stays until the
>>> session is over. In this case, we have multiple components, each with
>>> its own driver invisible to the PMU driver. Hence the coresight driver
>>> must hold the reference.
>>
>> Again, please think this through and don't add extra complexity to the
>> normal path, and get it right if you do it (the existing patch is not
>> right as I pointed out.)  Personally, I feel the code should just be
>> able to be unloaded whenever they want, user beware...
> 
> Sure, will explore more to refine the code. Thanks for the trigger.
> 
> Cheers
> Suzuki

Suzuki
Suzuki K Poulose June 7, 2018, 9:43 a.m. UTC | #6
On 06/06/2018 09:55 PM, Kim Phillips wrote:
> On Wed, 6 Jun 2018 10:46:36 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:

> 
>> And while we are at this, I also realised that we hold references to the
>> parent devices for each connection (via bus_find_device() from
>> of_coresight_get_endpoint_device()), while parsing the platform data,
>> which is never released.
> 
> Would this fix that?:

Not completely. We store the dev_name() as a reference, which itself can
be freed when the device is gone. I have a fix for this in the next
version of my DT cleanup series [0], where I clean up most of the
platform parsing code.
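
The hazard can be shown with a small userspace model. This is only an
illustration and not the fix from the series referenced above: struct
dev_model, dev_create(), dev_destroy() and copy_dev_name() are hypothetical
names, and taking a private copy of the name is just one common way to avoid
keeping a pointer into memory owned by a device that may go away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device whose name buffer is freed together with the device. */
struct dev_model {
	char *name;
};

/* Allocate a device owning its own copy of @name; returns NULL on failure. */
static struct dev_model *dev_create(const char *name)
{
	size_t len = strlen(name) + 1;
	struct dev_model *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->name = malloc(len);
	if (!d->name) {
		free(d);
		return NULL;
	}
	memcpy(d->name, name, len);
	return d;
}

/* Free the device; any pointer into d->name dangles after this call. */
static void dev_destroy(struct dev_model *d)
{
	assert(d != NULL);
	free(d->name);
	free(d);
}

/* Return a private copy of the device name; the caller must free() it. */
static char *copy_dev_name(const struct dev_model *d)
{
	size_t len = strlen(d->name) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, d->name, len);
	return copy;
}
```

With a private copy, the stored name stays valid after the device reference
is dropped and the device is freed; storing d->name directly would not.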


[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-June/582904.html

Cheers
Suzuki

> 
> diff --git a/drivers/hwtracing/coresight/of_coresight.c b/drivers/hwtracing/coresight/of_coresight.c
> index a33a92ebe74b..a43ab078c85e 100644
> --- a/drivers/hwtracing/coresight/of_coresight.c
> +++ b/drivers/hwtracing/coresight/of_coresight.c
> @@ -181,6 +181,8 @@ of_get_coresight_platform_data(struct device *dev,
>                          pdata->child_names[i] = dev_name(rdev);
>                          pdata->child_ports[i] = rendpoint.id;
>   
> +                       put_device(rdev);
> +
>                          i++;
>                  } while (ep);
>          }
Greg Kroah-Hartman June 7, 2018, 9:53 a.m. UTC | #7
On Thu, Jun 07, 2018 at 10:32:21AM +0100, Suzuki K Poulose wrote:
> On 06/07/2018 10:13 AM, Greg Kroah-Hartman wrote:
> > On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
> > > Hi Greg,
> > > 
> > > On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
> > > > On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
> > > > > On Wed, 6 Jun 2018 10:46:36 +0100
> > > > > Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> > > > > 
> > > > > > On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
> > > > > > > On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
> > > > > > > > Increment the refcnt for driver modules in current use by calling
> > > > > > > > module_get in coresight_build_path and module_put in release_path.
> > > > > > > > 
> > > > > > > > This prevents driver modules from being unloaded when they are in use,
> > > > > > > > either in sysfs or perf mode.
> > > > > > > 
> > > > > > > Why does it matter?  Shouldn't you be allowed to remove any module at
> > > > > > > any point in time, much like a networking driver?
> > > 
> > > The user doesn't have an explicit refcount on the individual components
> > > in a trace session. So, when a trace session is in progress, it is as
> > > good as having a "file" open on each component that is part of the
> > > active trace session. So, we don't want the driver to be removed when
> > > the component is being used in the trace collection.
> > 
> > Why not?  What's wrong with that happening and then the trace collection
> > starts failing with -ENODEV or something?
> 
> May be I am missing something here. Can we allow the driver to be removed
> when one of its device is "turned ON" and we need the same
> driver to "turn it OFF" when the session ends ? To make a better
> comparison :
> 
> Can we unload a usb_mass_storage module when a USB disk(which uses the
> module driver) is mounted and is being used ? I believe, the module
> will eventually get unloaded when we unmount the disk, if someone did
> a unload.

No, mount causes the module count to be incremented.  Mount and
"open/close" are the old-school way of doing module reference counting.

Look at how network drivers work today, you can unload any network
driver even if there is a valid network connection "up and running"
attached to it.  It just gets torn down when that request happens.
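
A minimal userspace sketch of that teardown-on-removal idea, under the
assumption that the device structure itself stays allocated (in the kernel a
device reference would keep it around) while the driver goes away. The names
struct trace_dev, struct session, dev_remove() and session_read() are
hypothetical and only model the behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device that may be removed while a session still points at it. */
struct trace_dev {
	bool present;	/* false once the driver has been removed */
	bool enabled;	/* hardware state owned by the driver */
};

struct session {
	struct trace_dev *dev;
	bool active;
};

/* Start a session on @dev; fails with -ENODEV if the driver is gone. */
static int session_start(struct session *s, struct trace_dev *dev)
{
	if (!dev->present)
		return -ENODEV;
	dev->enabled = true;
	s->dev = dev;
	s->active = true;
	return 0;
}

/* Driver removal: quiesce the hardware here instead of refusing to go. */
static void dev_remove(struct trace_dev *dev)
{
	dev->enabled = false;
	dev->present = false;
}

/* Collect trace data; fails cleanly once the device has gone away. */
static int session_read(struct session *s)
{
	assert(s->dev != NULL);
	if (!s->active || !s->dev->present)
		return -ENODEV;
	return 0;
}

/* End the session; only touch the hardware if the driver is still there. */
static void session_stop(struct session *s)
{
	if (s->active && s->dev->present)
		s->dev->enabled = false;
	s->active = false;
}
```

Here the removal path, not the session, is responsible for turning the
device off, and later session operations simply report -ENODEV.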

> We have a similar situation here. The only difference is the driver is
> referenced only when one of its device is in a trace session.

I understand, I'm saying that you have to be very careful when messing
around with module reference counts to get it correct and perhaps you
should just change your design to not care about module reference counts
at all, like networking did 15+ years ago.

Let's learn from the good examples in our past (like networking), and
not like the older bad examples (like mount/files).

> > Remember, removing a kernel module is something that only happens very
> > rarely, and is an explicit choice by someone with root permissions.  If
> > you want to remove that module, it should be able to go, as you know
> > what you are doing at that point in time.
> 
> Right, but when a device is "in use" can we do that ? I thought the user
> will get a module is in use or busy, error.

Try it on networking today :)

> > Don't try to "protect the user from themselves" here, they want to shoot
> > their foot, make it hurt if they are aiming it there :)
> > 
> 
> The module_get/put added here are only triggered when we start a trace
> session, where we build a path for the current session from the configured
> "source" to the configured "sink" and the path is destroyed
> at the end of the trace session. i.e, the path is not a permanent thing.
> It is constructed per session. So it is perfectly possible to remove a
> device in between trace sessions.

That's fine, but again, just be careful to get this correct.  The patch
I reviewed did not seem to do that.

thanks,

greg k-h
Suzuki K Poulose June 7, 2018, 10:07 a.m. UTC | #8
On 06/07/2018 10:53 AM, Greg Kroah-Hartman wrote:
> On Thu, Jun 07, 2018 at 10:32:21AM +0100, Suzuki K Poulose wrote:
>> On 06/07/2018 10:13 AM, Greg Kroah-Hartman wrote:
>>> On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
>>>> Hi Greg,
>>>>
>>>> On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
>>>>> On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
>>>>>> On Wed, 6 Jun 2018 10:46:36 +0100
>>>>>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>>>>>
>>>>>>> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
>>>>>>>> On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
>>>>>>>>> Increment the refcnt for driver modules in current use by calling
>>>>>>>>> module_get in coresight_build_path and module_put in release_path.
>>>>>>>>>
>>>>>>>>> This prevents driver modules from being unloaded when they are in use,
>>>>>>>>> either in sysfs or perf mode.
>>>>>>>>
>>>>>>>> Why does it matter?  Shouldn't you be allowed to remove any module at
>>>>>>>> any point in time, much like a networking driver?
>>>>
>>>> The user doesn't have an explicit refcount on the individual components
>>>> in a trace session. So, when a trace session is in progress, it is as
>>>> good as having a "file" open on each component that is part of the
>>>> active trace session. So, we don't want the driver to be removed when
>>>> the component is being used in the trace collection.
>>>
>>> Why not?  What's wrong with that happening and then the trace collection
>>> starts failing with -ENODEV or something?
>>
>> May be I am missing something here. Can we allow the driver to be removed
>> when one of its device is "turned ON" and we need the same
>> driver to "turn it OFF" when the session ends ? To make a better
>> comparison :
>>
>> Can we unload a usb_mass_storage module when a USB disk(which uses the
>> module driver) is mounted and is being used ? I believe, the module
>> will eventually get unloaded when we unmount the disk, if someone did
>> a unload.
> 
> No, mount causes the module count to be incremented.  Mount and
> "open/close" are the old-school way of doing module reference counting.
> 
> Look at how network drivers work today, you can unload any network
> driver even if there is a valid network connection "up and running"
> attached to it.  It just gets torn down when that request happens.

OK, that makes more sense now. Thanks for the hints. However, it doesn't
look that easy from the coresight point of view, because the devices are
interconnected and a single device could be part of multiple trace
sessions.

E.g., a funnel could be part of two independent trace sessions with
different sets of sources/sinks. Tearing down the trace sessions is
going to be a difficult task unless we make drastic changes to the PMU
framework itself. But we will see what we can do to make it modern
:-)
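
To make the shared-funnel case concrete, here is a small userspace sketch,
assuming each shared link keeps track of the sessions routed through it. The
names struct shared_link, struct session_model, link_attach() and
link_remove() are hypothetical; nothing like this exists in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_USERS 4

/* Hypothetical trace session. */
struct session_model {
	bool broken;	/* set when a component on its path was removed */
};

/* Hypothetical shared link (e.g. a funnel) used by several sessions. */
struct shared_link {
	struct session_model *users[MAX_USERS];
	size_t nr_users;
};

/* Record that @s routes trace through @link; returns 0, or -1 when full. */
static int link_attach(struct shared_link *link, struct session_model *s)
{
	assert(link && s);
	if (link->nr_users == MAX_USERS)
		return -1;
	link->users[link->nr_users++] = s;
	return 0;
}

/*
 * Forced removal: every session using the link has to be torn down.
 * Returns the number of sessions that were affected.
 */
static size_t link_remove(struct shared_link *link)
{
	size_t torn_down = link->nr_users;

	for (size_t i = 0; i < link->nr_users; i++)
		link->users[i]->broken = true;
	link->nr_users = 0;
	return torn_down;
}
```

Removing one funnel in this model breaks both sessions at once, which is the
bookkeeping a network-style teardown would have to handle.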

> 
>> We have a similar situation here. The only difference is the driver is
>> referenced only when one of its device is in a trace session.
> 
> I understand, I'm saying that you have to be very careful when messing
> around with module reference counts to get it correct and perhaps you
> should just change your design to not care about module reference counts
> at all, like networking did 15+ years ago.
> 
> Let's learn from the good examples in our past (like networking), and
> not like the older bad examples (like mount/files).
> 
>>> Remember, removing a kernel module is something that only happens very
>>> rarely, and is an explicit choice by someone with root permissions.  If
>>> you want to remove that module, it should be able to go, as you know
>>> what you are doing at that point in time.
>>
>> Right, but when a device is "in use" can we do that ? I thought the user
>> will get a module is in use or busy, error.
> 
> Try it on networking today :)
> 
>>> Don't try to "protect the user from themselves" here, they want to shoot
>>> their foot, make it hurt if they are aiming it there :)
>>>
>>
>> The module_get/put added here are only triggered when we start a trace
>> session, where we build a path for the current session from the configured
>> "source" to the configured "sink" and the path is destroyed
>> at the end of the trace session. i.e, the path is not a permanent thing.
>> It is constructed per session. So it is perfectly possible to remove a
>> device in between trace sessions.
> 
> That's fine, but again, just be careful to get this correct.  The patch
> I reviewed did not seem to do that.

Thanks for the useful suggestions, we will explore this more.

Cheers
Suzuki
Kim Phillips June 7, 2018, 5:13 p.m. UTC | #9
On Thu, 7 Jun 2018 11:07:15 +0100
Suzuki K Poulose <suzuki.poulose@arm.com> wrote:

> On 06/07/2018 10:53 AM, Greg Kroah-Hartman wrote:
> > On Thu, Jun 07, 2018 at 10:32:21AM +0100, Suzuki K Poulose wrote:
> >> On 06/07/2018 10:13 AM, Greg Kroah-Hartman wrote:
> >>> On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
> >>>> Hi Greg,
> >>>>
> >>>> On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
> >>>>> On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
> >>>>>> On Wed, 6 Jun 2018 10:46:36 +0100
> >>>>>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> >>>>>>
> >>>>>>> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
> >>>>>>>> On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
> >>>>>>>>> Increment the refcnt for driver modules in current use by calling
> >>>>>>>>> module_get in coresight_build_path and module_put in release_path.
> >>>>>>>>>
> >>>>>>>>> This prevents driver modules from being unloaded when they are in use,
> >>>>>>>>> either in sysfs or perf mode.
> >>>>>>>>
> >>>>>>>> Why does it matter?  Shouldn't you be allowed to remove any module at
> >>>>>>>> any point in time, much like a networking driver?
> >>>>
> >>>> The user doesn't have an explicit refcount on the individual components
> >>>> in a trace session. So, when a trace session is in progress, it is as
> >>>> good as having a "file" open on each component that is part of the
> >>>> active trace session. So, we don't want the driver to be removed when
> >>>> the component is being used in the trace collection.
> >>>
> >>> Why not?  What's wrong with that happening and then the trace collection
> >>> starts failing with -ENODEV or something?
> >>
> >> May be I am missing something here. Can we allow the driver to be removed
> >> when one of its device is "turned ON" and we need the same
> >> driver to "turn it OFF" when the session ends ? To make a better
> >> comparison :
> >>
> >> Can we unload a usb_mass_storage module when a USB disk(which uses the
> >> module driver) is mounted and is being used ? I believe, the module
> >> will eventually get unloaded when we unmount the disk, if someone did
> >> a unload.
> > 
> > No, mount causes the module count to be incremented.  Mount and
> > "open/close" are the old-school way of doing module reference counting.
> > 
> > Look at how network drivers work today, you can unload any network
> > driver even if there is a valid network connection "up and running"
> > attached to it.  It just gets torn down when that request happens.
> 
> Ok, that makes more sense now. Thanks for the hints. However, it doesn't
> look that easy from the coresight point due to the way the devices are
> used in an interconnected manner which could be part of multiple trace
> sessions.
> 
> e.g, a funnel could be part of two independent trace sessions with
> different sets of sources/sinks. Tearing down the trace sessions is
> going to be a difficult task unless we make drastic changes to the PMU
> framework itself. But will see, what best we can do to make it modern
> :-)
> > 
> >> We have a similar situation here. The only difference is the driver is
> >> referenced only when one of its device is in a trace session.
> > 
> > I understand, I'm saying that you have to be very careful when messing
> > around with module reference counts to get it correct and perhaps you
> > should just change your design to not care about module reference counts
> > at all, like networking did 15+ years ago.
> > 
> > Let's learn from the good examples in our past (like networking), and
> > not like the older bad examples (like mount/files).
> > 
> >>> Remember, removing a kernel module is something that only happens very
> >>> rarely, and is an explicit choice by someone with root permissions.  If
> >>> you want to remove that module, it should be able to go, as you know
> >>> what you are doing at that point in time.
> >>
> >> Right, but when a device is "in use" can we do that ? I thought the user
> >> will get a module is in use or busy, error.
> > 
> > Try it on networking today :)
> > 
> >>> Don't try to "protect the user from themselves" here, they want to shoot
> >>> their foot, make it hurt if they are aiming it there :)
> >>>
> >>
> >> The module_get/put added here are only triggered when we start a trace
> >> session, where we build a path for the current session from the configured
> >> "source" to the configured "sink" and the path is destroyed
> >> at the end of the trace session. i.e, the path is not a permanent thing.
> >> It is constructed per session. So it is perfectly possible to remove a
> >> device in between trace sessions.
> > 
> > That's fine, but again, just be careful to get this correct.  The patch
> > I reviewed did not seem to do that.
> 
> Thanks for the useful suggestions, we will explore this more.

I'm going to assume the series is still valid after this discussion,
since technically just this patch can get dropped, and the user is able
to shoot themselves in the foot.  This series is for development
purposes, after all.

Let me know if I'm missing something.

Thanks,

Kim
Suzuki K Poulose June 7, 2018, 9:10 p.m. UTC | #10
On 06/07/2018 06:13 PM, Kim Phillips wrote:
> On Thu, 7 Jun 2018 11:07:15 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> 
>> On 06/07/2018 10:53 AM, Greg Kroah-Hartman wrote:
>>> On Thu, Jun 07, 2018 at 10:32:21AM +0100, Suzuki K Poulose wrote:
>>>> On 06/07/2018 10:13 AM, Greg Kroah-Hartman wrote:
>>>>> On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
>>>>>> Hi Greg,
>>>>>>
>>>>>> On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
>>>>>>> On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
>>>>>>>> On Wed, 6 Jun 2018 10:46:36 +0100
>>>>>>>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>>>>>>>
>>>>>>>>> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
>>>>>>>>>> On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
>>>>>>>>>>> Increment the refcnt for driver modules in current use by calling
>>>>>>>>>>> module_get in coresight_build_path and module_put in release_path.
>>>>>>>>>>>
>>>>>>>>>>> This prevents driver modules from being unloaded when they are in use,
>>>>>>>>>>> either in sysfs or perf mode.
>>>>>>>>>>
>>>>>>>>>> Why does it matter?  Shouldn't you be allowed to remove any module at
>>>>>>>>>> any point in time, much like a networking driver?
>>>>>>
>>>>>> The user doesn't have an explicit refcount on the individual components
>>>>>> in a trace session. So, when a trace session is in progress, it is as
>>>>>> good as having a "file" open on each component that is part of the
>>>>>> active trace session. So, we don't want the driver to be removed when
>>>>>> the component is being used in the trace collection.
>>>>>
>>>>> Why not?  What's wrong with that happening and then the trace collection
>>>>> starts failing with -ENODEV or something?
>>>>
>>>> May be I am missing something here. Can we allow the driver to be removed
>>>> when one of its device is "turned ON" and we need the same
>>>> driver to "turn it OFF" when the session ends ? To make a better
>>>> comparison :
>>>>
>>>> Can we unload a usb_mass_storage module when a USB disk(which uses the
>>>> module driver) is mounted and is being used ? I believe, the module
>>>> will eventually get unloaded when we unmount the disk, if someone did
>>>> a unload.
>>>
>>> No, mount causes the module count to be incremented.  Mount and
>>> "open/close" are the old-school way of doing module reference counting.
>>>
>>> Look at how network drivers work today, you can unload any network
>>> driver even if there is a valid network connection "up and running"
>>> attached to it.  It just gets torn down when that request happens.
>>
>> Ok, that makes more sense now. Thanks for the hints. However, it doesn't
>> look that easy from the coresight point due to the way the devices are
>> used in an interconnected manner which could be part of multiple trace
>> sessions.
>>
>> e.g, a funnel could be part of two independent trace sessions with
>> different sets of sources/sinks. Tearing down the trace sessions is
>> going to be a difficult task unless we make drastic changes to the PMU
>> framework itself. But will see, what best we can do to make it modern
>> :-)
>>>
>>>> We have a similar situation here. The only difference is the driver is
>>>> referenced only when one of its device is in a trace session.
>>>
>>> I understand, I'm saying that you have to be very careful when messing
>>> around with module reference counts to get it correct and perhaps you
>>> should just change your design to not care about module reference counts
>>> at all, like networking did 15+ years ago.
>>>
>>> Let's learn from the good examples in our past (like networking), and
>>> not like the older bad examples (like mount/files).
>>>
>>>>> Remember, removing a kernel module is something that only happens very
>>>>> rarely, and is an explicit choice by someone with root permissions.  If
>>>>> you want to remove that module, it should be able to go, as you know
>>>>> what you are doing at that point in time.
>>>>
>>>> Right, but when a device is "in use" can we do that ? I thought the user
>>>> will get a module is in use or busy, error.
>>>
>>> Try it on networking today :)
>>>
>>>>> Don't try to "protect the user from themselves" here, they want to shoot
>>>>> their foot, make it hurt if they are aiming it there :)
>>>>>
>>>>
>>>> The module_get/put added here are only triggered when we start a trace
>>>> session, where we build a path for the current session from the configured
>>>> "source" to the configured "sink" and the path is destroyed
>>>> at the end of the trace session. i.e, the path is not a permanent thing.
>>>> It is constructed per session. So it is perfectly possible to remove a
>>>> device in between trace sessions.
>>>
>>> That's fine, but again, just be careful to get this correct.  The patch
>>> I reviewed did not seem to do that.
>>
>> Thanks for the useful suggestions, we will explore this more.

Kim,

> 
> I'm going to assume the series is still valid after this discussion,
> since technically just this patch can get dropped, and the user is able
> to shoot themselves in the foot.

That doesn't mean the kernel can panic() if the user decides to unload
the module while a trace session is in progress. It only means that, in
the worst case, the trace session could be stopped midway, with nothing
more harmful happening to the system.

>  This series is for development  purposes, after all.

Do you mean that this series is for internal development purposes and
not for upstream? Making the drivers modular is always helpful, especially
for something related to tracing, since it allows the module to be unloaded
after use. So it would be good to have this series in, but in a manner
that is usable and doesn't harm overall system usage.

I think the summary of the discussion is that we need more robust code
to handle the situation, which also allows unloading the modules without
any trouble.

Cheers

Suzuki

> 
> Let me know if I'm missing something.
> 
> Thanks,
> 
> Kim
>
Mathieu Poirier June 7, 2018, 9:40 p.m. UTC | #11
On 7 June 2018 at 15:10, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> On 06/07/2018 06:13 PM, Kim Phillips wrote:
>>
>> On Thu, 7 Jun 2018 11:07:15 +0100
>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>
>>> On 06/07/2018 10:53 AM, Greg Kroah-Hartman wrote:
>>>>
>>>> On Thu, Jun 07, 2018 at 10:32:21AM +0100, Suzuki K Poulose wrote:
>>>>>
>>>>> On 06/07/2018 10:13 AM, Greg Kroah-Hartman wrote:
>>>>>>
>>>>>> On Thu, Jun 07, 2018 at 10:04:33AM +0100, Suzuki K Poulose wrote:
>>>>>>>
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> On 06/07/2018 09:34 AM, Greg Kroah-Hartman wrote:
>>>>>>>>
>>>>>>>> On Wed, Jun 06, 2018 at 03:55:01PM -0500, Kim Phillips wrote:
>>>>>>>>>
>>>>>>>>> On Wed, 6 Jun 2018 10:46:36 +0100
>>>>>>>>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 06/06/2018 09:24 AM, Greg Kroah-Hartman wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 05, 2018 at 04:07:01PM -0500, Kim Phillips wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Increment the refcnt for driver modules in current use by
>>>>>>>>>>>> calling
>>>>>>>>>>>> module_get in coresight_build_path and module_put in
>>>>>>>>>>>> release_path.
>>>>>>>>>>>>
>>>>>>>>>>>> This prevents driver modules from being unloaded when they are
>>>>>>>>>>>> in use,
>>>>>>>>>>>> either in sysfs or perf mode.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Why does it matter?  Shouldn't you be allowed to remove any
>>>>>>>>>>> module at
>>>>>>>>>>> any point in time, much like a networking driver?
>>>>>>>
>>>>>>>
>>>>>>> The user doesn't have an explicit refcount on the individual
>>>>>>> components
>>>>>>> in a trace session. So, when a trace session is in progress, it is as
>>>>>>> good as having a "file" open on each component that is part of the
>>>>>>> active trace session. So, we don't want the driver to be removed when
>>>>>>> the component is being used in the trace collection.
>>>>>>
>>>>>>
>>>>>> Why not?  What's wrong with that happening and then the trace
>>>>>> collection
>>>>>> starts failing with -ENODEV or something?
>>>>>
>>>>>
>>>>> May be I am missing something here. Can we allow the driver to be
>>>>> removed
>>>>> when one of its device is "turned ON" and we need the same
>>>>> driver to "turn it OFF" when the session ends ? To make a better
>>>>> comparison :
>>>>>
>>>>> Can we unload a usb_mass_storage module when a USB disk(which uses the
>>>>> module driver) is mounted and is being used ? I believe, the module
>>>>> will eventually get unloaded when we unmount the disk, if someone did
>>>>> a unload.
>>>>
>>>>
>>>> No, mount causes the module count to be incremented.  Mount and
>>>> "open/close" are the old-school way of doing module reference counting.
>>>>
>>>> Look at how network drivers work today, you can unload any network
>>>> driver even if there is a valid network connection "up and running"
>>>> attached to it.  It just gets torn down when that request happens.
>>>
>>>
>>> Ok, that makes more sense now. Thanks for the hints. However, it doesn't
>>> look that easy from the coresight point due to the way the devices are
>>> used in an interconnected manner which could be part of multiple trace
>>> sessions.
>>>
>>> e.g, a funnel could be part of two independent trace sessions with
>>> different sets of sources/sinks. Tearing down the trace sessions is
>>> going to be a difficult task unless we make drastic changes to the PMU
>>> framework itself. But will see, what best we can do to make it modern
>>> :-)
>>>>
>>>>
>>>>> We have a similar situation here. The only difference is the driver is
>>>>> referenced only when one of its device is in a trace session.
>>>>
>>>>
>>>> I understand, I'm saying that you have to be very careful when messing
>>>> around with module reference counts to get it correct and perhaps you
>>>> should just change your design to not care about module reference counts
>>>> at all, like networking did 15+ years ago.
>>>>
>>>> Let's learn from the good examples in our past (like networking), and
>>>> not like the older bad examples (like mount/files).
>>>>
>>>>>> Remember, removing a kernel module is something that only happens very
>>>>>> rarely, and is an explicit choice by someone with root permissions.
>>>>>> If
>>>>>> you want to remove that module, it should be able to go, as you know
>>>>>> what you are doing at that point in time.
>>>>>
>>>>>
>>>>> Right, but when a device is "in use" can we do that ? I thought the
>>>>> user
>>>>> will get a module is in use or busy, error.
>>>>
>>>>
>>>> Try it on networking today :)
>>>>
>>>>>> Don't try to "protect the user from themselves" here, they want to
>>>>>> shoot
>>>>>> their foot, make it hurt if they are aiming it there :)
>>>>>>
>>>>>
>>>>> The module_get/put added here are only triggered when we start a trace
>>>>> session, where we build a path for the current session from the
>>>>> configured
>>>>> "source" to the configured "sink" and the path is destroyed
>>>>> at the end of the trace session. i.e, the path is not a permanent
>>>>> thing.
>>>>> It is constructed per session. So it is perfectly possible to remove a
>>>>> device in between trace sessions.
>>>>
>>>>
>>>> That's fine, but again, just be careful to get this correct.  The patch
>>>> I reviewed did not seem to do that.
>>>
>>>
>>> Thanks for the useful suggestions, we will explore this more.
>
>
> Kim,
>
>>
>> I'm going to assume the series is still valid after this discussion,
>> since technically just this patch can get dropped, and the user is able
>> to shoot themselves in the foot.
>
>
> That doesn't mean the kernel can panic() if the user decided to unload the
> module while the trace session is in progress. It only means that
> the trace session could be stopped in between in the worst case. But
> nothing more harmful to the system.
>
>>  This series is for development  purposes, after all.
>
>
> Do you mean that this series is for internal development purposes and not
> upstream ? Making the drivers modular are always helpful, especially for
> something related to tracing, that allows the module to be unloaded after
> use. So, it would be good to have this series in, but in a manner which is
> usable and doesn't cause harm to the overall system usage.

Correct, we can't have a patchset that generates a kernel panic.

>
> I think the summary of the discussion is that we need more robust code
> to handle the situation, which also allows unloading the modules without
> any trouble.

The tricky part is the "unloading without any trouble".  The first
thing to do is to make sure that, if the driver is being used, the
_remove() functions go through the same process as they would under
normal conditions.  That will allow the module to be reinserted with a
fairly good level of assurance that things will work properly.

Looking at things a little closer, all the interconnection dependencies
in the core are done using a csdev, and a lot of the current code is
already checking for a NULL condition (more checks may be needed with
the introduction of this set).  The real problem is with the "path"
used to keep track of the devices taking part in active sessions.
Those can be accessed when a process is swapped in and out, mandating
something fast and efficient.  One thing we could do is have a path
keep track of a reference to each csdev rather than a copy of its
address.  That way the _remove() functions could simply set those to
NULL, making it easy to deal with.
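
To make the idea concrete, here is a rough userspace sketch, not kernel
code.  Every name in it (csdev_model, csdev_ref, model_remove,
model_enable_path) is made up for illustration, and it deliberately
ignores the synchronisation a real implementation would need between
removal and a concurrent path walk:

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for struct coresight_device. */
struct csdev_model {
	int enabled;
};

/* Hypothetical slot owned by the core; it outlives the driver's csdev. */
struct csdev_ref {
	struct csdev_model *csdev;	/* NULL once the driver has been removed */
};

/* A path node points at the slot instead of copying the csdev address. */
struct path_node {
	struct csdev_ref *ref;
	struct path_node *next;
};

/* What a _remove() function would do in this model: clear the slot. */
void model_remove(struct csdev_ref *ref)
{
	ref->csdev = NULL;
}

/* Enable each component on the path; stop with -ENODEV if one is gone. */
int model_enable_path(struct path_node *path)
{
	struct path_node *nd;

	for (nd = path; nd; nd = nd->next) {
		struct csdev_model *csdev = nd->ref->csdev;

		if (!csdev)
			return -ENODEV;
		csdev->enabled = 1;
	}
	return 0;
}
```

With a two-node path (say a funnel and a sink), model_enable_path()
returns 0 while both slots are populated and -ENODEV once
model_remove() has cleared either of them.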

>
> Cheers
>
> Suzuki
>
>>
>> Let me know if I'm missing something.
>>
>> Thanks,
>>
>> Kim
>>
>
Kim Phillips June 7, 2018, 9:47 p.m. UTC | #12
On Thu, 7 Jun 2018 22:10:07 +0100
Suzuki K Poulose <suzuki.poulose@arm.com> wrote:

> On 06/07/2018 06:13 PM, Kim Phillips wrote:
> > I'm going to assume the series is still valid after this discussion,
> > since technically just this patch can get dropped, and the user is able
> > to shoot themselves in the foot.
> 
> That doesn't mean the kernel can panic() if the user decided to unload 
> the module while the trace session is in progress. It only means that
> the trace session could be stopped in between in the worst case. But
> nothing more harmful to the system.

FWIW, I didn't see the kernel panic in my basic tests, just some bad
accesses: the new remove functions take care of cleaning up most items,
and most drivers still depend on the link and sink (funnel,
replicator) drivers, so they can't be upset too badly.

> >  This series is for development  purposes, after all.
> 
> Do you mean that this series is for internal development purposes and 
> not upstream ? Making the drivers modular are always helpful, especially 

no, I'm posting them for upstream review because I'd like them upstream.

> for something related to tracing, that allows the module to be unloaded 
> after use. So, it would be good to have this series in, but in a manner 
> which is usable and doesn't cause harm to the overall system usage.
> 
> I think the summary of the discussion is that we need more robust code
> to handle the situation, which also allows unloading the modules without
> any trouble.

Trouble's relative.  My point was since the series is going to be used
mainly by developers testing their code, they already prepare for, and
expect badness to occur anyway.  Greg's point isn't lost here, and in
my interpretation, his review of this patch was that it was in the
wrong direction of safety: it made things unnecessarily too safe, up
front, and that items relative to the perf core should strive to adhere
to the higher standards set in place by the networking subsystem.  So,
this patch doesn't get his ack.

I compiled a new v5 series that omits this patch, and overwrote the v4
series here:

git://linux-arm.org/linux-kp.git, coresight-modules branch

but I'll hold off submitting a v5 for now.

I don't know how the perf core handles AUXTRACE drivers hanging up on
it.  I see intel-pt record support can't be built as a module.  I'm
guessing more testing for actual panics when using perf or sysfs is
what's being sought here?

Kim
Mathieu Poirier June 7, 2018, 9:59 p.m. UTC | #13
On 7 June 2018 at 15:47, Kim Phillips <kim.phillips@arm.com> wrote:
> On Thu, 7 Jun 2018 22:10:07 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
>> On 06/07/2018 06:13 PM, Kim Phillips wrote:
>> > I'm going to assume the series is still valid after this discussion,
>> > since technically just this patch can get dropped, and the user is able
>> > to shoot themselves in the foot.
>>
>> That doesn't mean the kernel can panic() if the user decided to unload
>> the module while the trace session is in progress. It only means that
>> the trace session could be stopped in between in the worst case. But
>> nothing more harmful to the system.
>
> FWIW, I didn't see the kernel panic in my basic tests; just some bad
> accesses: the new remove functions take care of cleaning up most items,
> and most drivers still depend on the links and sinks (funnel,
> replicator) drivers, so they can't be upset too bad.
>
>> >  This series is for development  purposes, after all.
>>
>> Do you mean that this series is for internal development purposes and
>> not upstream ? Making the drivers modular are always helpful, especially
>
> no, I'm posting them for upstream review because I'd like them upstream.
>
>> for something related to tracing, that allows the module to be unloaded
>> after use. So, it would be good to have this series in, but in a manner
>> which is usable and doesn't cause harm to the overall system usage.
>>
>> I think the summary of the discussion is that we need more robust code
>> to handle the situation, which also allows unloading the modules without
>> any trouble.
>
> Trouble's relative.  My point was since the series is going to be used
> mainly by developers testing their code, they already prepare for, and
> expect badness to occur anyway.  Greg's point isn't lost here, and in
> my interpretation, his review of this patch was that it was in the
> wrong direction of safety: it made things unnecessarily too safe, up
> front, and that items relative to the perf core should strive to adhere
> to the higher standards set in place by the networking subsystem.  So,
> this patch doesn't get his ack.

Greg's point was that it's OK to let users harm themselves (which I
totally support), but if you're going to prevent it, make sure to do
it right.

>
> I compiled a new v5 series that omits this patch, and overwrote the v4
> series here:
>
> git://linux-arm.org/linux-kp.git, coresight-modules branch
>
> but I'll hold of submitting a v5 for now.
>
> I don't know how the perf core handles AUXTRACE drivers hanging up on
> it.  I see intel-pt record support can't be built as a module.  I'm
> guessing more testing for actual panics when using perf or sysfs is
> what's being sought here?

There are two ways to approach the problem:

1) Kill active trace sessions (either sysfs or perf) if a driver that
is being used is removed.
2) Deal with the removal in the coresight core, making sure we don't
access operations provided by removed drivers.

The end result in both cases will be the same: failure to properly
terminate the trace session because of user action.

I'm personally in favour of the second option, simply because it keeps
problem resolution within the CoreSight subsystem.
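
As a rough illustration of option 2, here is a userspace sketch (not
kernel code) in which the core never calls through an ops pointer that
a removed driver used to provide.  All names (model_ops, model_csdev,
model_core_*, demo_*) are hypothetical, and locking between unregister
and the callers is left out:

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical per-driver operations, as provided by a module. */
struct model_ops {
	int (*enable)(void *drvdata);
	void (*disable)(void *drvdata);
};

/* Hypothetical core-side view of a device. */
struct model_csdev {
	const struct model_ops *ops;	/* NULL after the driver is removed */
	void *drvdata;
};

/* Core wrapper: refuse to enable a device whose driver is gone. */
int model_core_enable(struct model_csdev *csdev)
{
	if (!csdev->ops || !csdev->ops->enable)
		return -ENODEV;
	return csdev->ops->enable(csdev->drvdata);
}

/* Core wrapper: silently skip disable if the driver is gone. */
void model_core_disable(struct model_csdev *csdev)
{
	if (csdev->ops && csdev->ops->disable)
		csdev->ops->disable(csdev->drvdata);
}

/* Called on driver removal: drop everything the driver provided. */
void model_core_unregister(struct model_csdev *csdev)
{
	csdev->ops = NULL;
	csdev->drvdata = NULL;
}

/* Toy driver callbacks that flip an int standing in for the hardware. */
int demo_enable(void *drvdata)
{
	*(int *)drvdata = 1;
	return 0;
}

void demo_disable(void *drvdata)
{
	*(int *)drvdata = 0;
}
```

Note that after model_core_unregister() the disable wrapper does
nothing, so the toy "hardware" stays in whatever state it was in.  That
is the "failure to properly terminate the trace session" mentioned
above, but without touching code that has gone away.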

>
> Kim
Suzuki K Poulose June 8, 2018, 9:22 a.m. UTC | #14
On 06/07/2018 10:47 PM, Kim Phillips wrote:
> On Thu, 7 Jun 2018 22:10:07 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> 
>> On 06/07/2018 06:13 PM, Kim Phillips wrote:
>>> I'm going to assume the series is still valid after this discussion,
>>> since technically just this patch can get dropped, and the user is able
>>> to shoot themselves in the foot.
>>
>> That doesn't mean the kernel can panic() if the user decided to unload
>> the module while the trace session is in progress. It only means that
>> the trace session could be stopped in between in the worst case. But
>> nothing more harmful to the system.

Kim,

> 
> FWIW, I didn't see the kernel panic in my basic tests; just some bad
> accesses: the new remove functions take care of cleaning up most items,
> and most drivers still depend on the links and sinks (funnel,
> replicator) drivers, so they can't be upset too bad.

Bad accesses are still bad. A bad access could trigger an Oops, for
example, or could even corrupt other parts of the kernel if we try
to access memory that has been freed (and reallocated to somebody else).
So, the point is that there are issues with the series which we know
for sure from code analysis. They may take different forms when they
show up at runtime.

> 
>>>   This series is for development  purposes, after all.
>>
>> Do you mean that this series is for internal development purposes and
>> not upstream ? Making the drivers modular are always helpful, especially
> 
> no, I'm posting them for upstream review because I'd like them upstream.
> 
>> for something related to tracing, that allows the module to be unloaded
>> after use. So, it would be good to have this series in, but in a manner
>> which is usable and doesn't cause harm to the overall system usage.
>>
>> I think the summary of the discussion is that we need more robust code
>> to handle the situation, which also allows unloading the modules without
>> any trouble.
> 
> Trouble's relative.  My point was since the series is going to be used
> mainly by developers testing their code, they already prepare for, and
> expect badness to occur anyway.  Greg's point isn't lost here, and in


Making something modular is not really just for the use of developers.
There are and will be other users for a device driver as a module, and
it is a fundamental feature people expect (especially in the enterprise
world, where one kernel builds most of the drivers as modules to let
the users pick the individual drivers they need).
So, in a kernel driver you can't really be sure that the user is
actually aware of a "developer only" mode and knows that unloading
can crash the kernel.

> my interpretation, his review of this patch was that it was in the
> wrong direction of safety: it made things unnecessarily too safe, up
> front, and that items relative to the perf core should strive to adhere

One area of improvement towards the "modern" behavior is failing
the activation of the trace schedule in coresight_enable_path() when a
component in the path has been removed. Right now, we create a path,
then call enable_path() and disable_path() around the trace schedules,
and the path is destroyed only at pmu->free_aux(). With the current
patch, we hold the reference to the device/driver for the whole
lifetime of the tracing, even when the tracing may be disabled in
between.

I think if we get to that point, we will be as close as we can get
to the expected behavior. Having said that, it is indeed tricky
to get there. Maybe we could play a little with the refcounting on
csdev and check whether the refcount is held only by the paths this
component is part of (needs more thought).
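
One possible reading of that refcount check, as a userspace sketch only
(not kernel code).  The structure and helpers below are hypothetical,
and the sketch assumes each path holds exactly one reference and each
active trace schedule holds one more:

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one component. */
struct model_csdev {
	int refcnt;	/* all references, including one per path */
	int nr_paths;	/* paths this component is currently part of */
};

/* Building a path takes a reference and records the membership. */
void model_path_add(struct model_csdev *c)
{
	c->refcnt++;
	c->nr_paths++;
}

/* Releasing a path drops both again. */
void model_path_del(struct model_csdev *c)
{
	c->refcnt--;
	c->nr_paths--;
}

/* An active trace schedule holds an extra reference while it runs. */
void model_trace_start(struct model_csdev *c)
{
	c->refcnt++;
}

void model_trace_stop(struct model_csdev *c)
{
	c->refcnt--;
}

/* True when the only remaining references come from path membership. */
bool model_only_paths_hold(const struct model_csdev *c)
{
	return c->refcnt == c->nr_paths;
}
```

In this model a funnel that sits in two paths reports
model_only_paths_hold() as true between trace schedules and false while
one of them is running.  What removal should then do in each case is
exactly the part that needs more thought.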


Suzuki

Patch

diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index f96258de1e9b..da702507a55c 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -1040,8 +1040,12 @@  EXPORT_SYMBOL_GPL(coresight_register);
 
 void coresight_unregister(struct coresight_device *csdev)
 {
+       mutex_lock(&coresight_mutex);
+
        /* Remove references of that device in the topology */
        coresight_remove_conns(csdev);
        device_unregister(&csdev->dev);
+
+       mutex_unlock(&coresight_mutex);
 }
 EXPORT_SYMBOL_GPL(coresight_unregister);

> And while we are at this, I also realised that we hold references to the
> parent devices for each connection (via bus_find_device() from 
> of_coresight_get_endpoint_device()), while parsing the platform data, 
> which is never released.

Would this fix that?:

diff --git a/drivers/hwtracing/coresight/of_coresight.c b/drivers/hwtracing/coresight/of_coresight.c
index a33a92ebe74b..a43ab078c85e 100644
--- a/drivers/hwtracing/coresight/of_coresight.c
+++ b/drivers/hwtracing/coresight/of_coresight.c
@@ -181,6 +181,8 @@  of_get_coresight_platform_data(struct device *dev,
                        pdata->child_names[i] = dev_name(rdev);
                        pdata->child_ports[i] = rendpoint.id;
 
+                       put_device(rdev);
+
                        i++;
                } while (ep);
        }