diff mbox series

coresight: Add mode check when enable/disable csdev source

Message ID 20230921132904.60996-1-hejunhao3@huawei.com (mailing list archive)
State New, archived
Headers show
Series coresight: Add mode check when enable/disable csdev source | expand

Commit Message

Junhao He Sept. 21, 2023, 1:29 p.m. UTC
The commmit [1] replace "source_ops(csdev)::enable()" with
coresight_enable_source(). After this patch, when enable ETM through
perf mode, the csdev::enable will be set to true. Then if we to disable
the ETM by sysfs mode at the time, the resource will be released
incorrectly, kernel will report the following call trace.

perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink
echo 1 > /sys/bus/coresight/devices/etm0/enable_source
echo 0 > /sys/bus/coresight/devices/etm0/enable_source
Unable to handle kernel NULL pointer dereference at virtual
address 00000000000001d0
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
...
Call trace:
 etm4_disable+0x54/0x150 [coresight_etm4x]
 coresight_disable_source+0x6c/0x98 [coresight]
 coresight_disable+0x74/0x1c0 [coresight]
 enable_source_store+0x88/0xa0 [coresight]
 dev_attr_store+0x20/0x40
 sysfs_kf_write+0x4c/0x68
 kernfs_fop_write_iter+0x120/0x1b8
 vfs_write+0x2dc/0x3b0
 ksys_write+0x70/0x108
 __arm64_sys_write+0x24/0x38
 invoke_syscall+0x50/0x128
 el0_svc_common.constprop.0+0x104/0x130
 do_el0_svc+0x40/0xb8
 el0_svc+0x2c/0xb8
 el0t_64_sync_handler+0xc0/0xc8
 el0t_64_sync+0x1a4/0x1a8
Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5)
---[ end trace 0000000000000000 ]---

The csdev::enable specify the device is currently part of an active
path, when enabling source via perf mode, csdev::enable also should
be set to true. We can add mode check in etm4_disable() to fix that,
if it's done, the sysfs cannot report busy when sysfs to enable ETM
that has been enabled by perf.

Struct coresight_device add member of mode to check device busy in
coresight_(enable/disable)_source function to fixes handle kernel
NULL pointer, and sysfs report busy if perf session is already using
the ETM.
By the way, inperhaps another cleanup patch may be upload to remove
the etmv4_drvdata::mode tmc_drvdata::mode or others.

Test:
perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
cat /sys/bus/coresight/devices/etm2/enable_source
1
echo 1 > /sys/bus/coresight/devices/etm2/enable_source
-bash: echo: write error: Device or resource busy
echo 0 > /sys/bus/coresight/devices/etm2/enable_source
coresight etm2: Someone is already using the tracer
cat /sys/bus/coresight/devices/etm2/enable_source
1

[1] "coresight: Enable and disable helper devices adjacent to the path"
Fixes: 6148652807ba ("coresight: Enable and disable helper devices adjacent to the path")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
---
 drivers/hwtracing/coresight/coresight-core.c   | 18 ++++++++++++++++--
 .../hwtracing/coresight/coresight-etm-perf.c   |  6 +++++-
 drivers/hwtracing/coresight/coresight-priv.h   |  3 ++-
 include/linux/coresight.h                      | 14 ++++++++------
 4 files changed, 31 insertions(+), 10 deletions(-)

Comments

James Clark Sept. 21, 2023, 3:36 p.m. UTC | #1
On 21/09/2023 14:29, Junhao He wrote:
> The commmit [1] replace "source_ops(csdev)::enable()" with
> coresight_enable_source(). After this patch, when enable ETM through
> perf mode, the csdev::enable will be set to true. Then if we to disable
> the ETM by sysfs mode at the time, the resource will be released
> incorrectly, kernel will report the following call trace.
> 
> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
> echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink
> echo 1 > /sys/bus/coresight/devices/etm0/enable_source
> echo 0 > /sys/bus/coresight/devices/etm0/enable_source
> Unable to handle kernel NULL pointer dereference at virtual
> address 00000000000001d0
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> ...
> Call trace:
>  etm4_disable+0x54/0x150 [coresight_etm4x]
>  coresight_disable_source+0x6c/0x98 [coresight]
>  coresight_disable+0x74/0x1c0 [coresight]
>  enable_source_store+0x88/0xa0 [coresight]
>  dev_attr_store+0x20/0x40
>  sysfs_kf_write+0x4c/0x68
>  kernfs_fop_write_iter+0x120/0x1b8
>  vfs_write+0x2dc/0x3b0
>  ksys_write+0x70/0x108
>  __arm64_sys_write+0x24/0x38
>  invoke_syscall+0x50/0x128
>  el0_svc_common.constprop.0+0x104/0x130
>  do_el0_svc+0x40/0xb8
>  el0_svc+0x2c/0xb8
>  el0t_64_sync_handler+0xc0/0xc8
>  el0t_64_sync+0x1a4/0x1a8
> Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5)
> ---[ end trace 0000000000000000 ]---
> 
> The csdev::enable specify the device is currently part of an active
> path, when enabling source via perf mode, csdev::enable also should
> be set to true. We can add mode check in etm4_disable() to fix that,
> if it's done, the sysfs cannot report busy when sysfs to enable ETM
> that has been enabled by perf.
> 
> Struct coresight_device add member of mode to check device busy in
> coresight_(enable/disable)_source function to fixes handle kernel
> NULL pointer, and sysfs report busy if perf session is already using
> the ETM.
> By the way, inperhaps another cleanup patch may be upload to remove
> the etmv4_drvdata::mode tmc_drvdata::mode or others.
> 

Hi Junhao,

I definitely think we should do the cleanup at the same time as this
change. Otherwise we risk leaving in the other modes which are
duplicates of this and could easily cause confusion.

Although I saw there are a couple of modes that are not enum cs_mode,
but a different kind of mode. And it's not always obvious because enum
cs_mode is sometimes stored as an int...

> Test:
> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
> cat /sys/bus/coresight/devices/etm2/enable_source
> 1
> echo 1 > /sys/bus/coresight/devices/etm2/enable_source
> -bash: echo: write error: Device or resource busy
> echo 0 > /sys/bus/coresight/devices/etm2/enable_source
> coresight etm2: Someone is already using the tracer
> cat /sys/bus/coresight/devices/etm2/enable_source
> 1
> 

I tested the fix and it works, although I have a few comments below on
whether we should do something else instead.

> [1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
adjacent to the path")

What was the original behavior before that change? Maybe it didn't have
exactly the same error but was it still possible to enable and disable a
sink after starting a Perf session? Seems like on the disable the same
thing would have happened or something else bad?

> Signed-off-by: Junhao He <hejunhao3@huawei.com>
> ---
>  drivers/hwtracing/coresight/coresight-core.c   | 18 ++++++++++++++++--
>  .../hwtracing/coresight/coresight-etm-perf.c   |  6 +++++-
>  drivers/hwtracing/coresight/coresight-priv.h   |  3 ++-
>  include/linux/coresight.h                      | 14 ++++++++------
>  4 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 206f17826399..7b485fc2ed19 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -388,6 +388,7 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode,
>  				return ret;
>  		}
>  		csdev->enable = true;
> +		csdev->mode = mode;
>  	}
>  
>  	atomic_inc(&csdev->refcnt);
> @@ -446,18 +447,26 @@ static void coresight_disable_helpers(struct coresight_device *csdev)
>   *  the device if there are no users left.
>   *
>   *  @csdev: The coresight device to disable
> + *  @mode: How the coresight device is being used, perf mode or sysfs mode.
>   *  @data: Opaque data to pass on to the disable function of the source device.
>   *         For example in perf mode this is a pointer to the struct perf_event.
>   *
>   *  Returns true if the device has been disabled.
>   */
> -bool coresight_disable_source(struct coresight_device *csdev, void *data)
> +bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode,
> +			      void *data)
>  {
> +	if (csdev->mode && csdev->mode != mode) {

Can you do "csdev->mode != DISABLED" instead of just the implicit check
for non zero. Although probably only "csdev->mode != mode" is enough
unless you expect coresight_disable_source() to be called with mode =
DISABLED.

> +		dev_err(&csdev->dev, "Someone is already using the tracer\n");
> +		return false;
> +	}
> +
>  	if (atomic_dec_return(&csdev->refcnt) == 0) {
>  		if (source_ops(csdev)->disable)
>  			source_ops(csdev)->disable(csdev, data);
>  		coresight_disable_helpers(csdev);
>  		csdev->enable = false;
> +		csdev->mode = CS_MODE_DISABLED;
>  	}
>  	return !csdev->enable;
>  }
> @@ -1117,6 +1126,11 @@ int coresight_enable(struct coresight_device *csdev)
>  	if (ret)
>  		goto out;
>  
> +	if (csdev->mode == CS_MODE_PERF) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +

I'm struggling to understand why a reference count wouldn't work, rather
than blocking on whether there is a Perf session active or not. Because
now you can do enable via sysfs first, then open the Perf session, but
not the other way around.

Is there a reason for the behavior to not be symmetrical?

And refcounts are already used in the code for the individual devices.

Thanks
James
Suzuki K Poulose Sept. 22, 2023, 9:20 a.m. UTC | #2
On 21/09/2023 16:36, James Clark wrote:
> 
> 
> On 21/09/2023 14:29, Junhao He wrote:
>> The commmit [1] replace "source_ops(csdev)::enable()" with
>> coresight_enable_source(). After this patch, when enable ETM through
>> perf mode, the csdev::enable will be set to true. Then if we to disable
>> the ETM by sysfs mode at the time, the resource will be released
>> incorrectly, kernel will report the following call trace.
>>
>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>> echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink
>> echo 1 > /sys/bus/coresight/devices/etm0/enable_source
>> echo 0 > /sys/bus/coresight/devices/etm0/enable_source
>> Unable to handle kernel NULL pointer dereference at virtual
>> address 00000000000001d0
>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> ...
>> Call trace:
>>   etm4_disable+0x54/0x150 [coresight_etm4x]
>>   coresight_disable_source+0x6c/0x98 [coresight]
>>   coresight_disable+0x74/0x1c0 [coresight]
>>   enable_source_store+0x88/0xa0 [coresight]
>>   dev_attr_store+0x20/0x40
>>   sysfs_kf_write+0x4c/0x68
>>   kernfs_fop_write_iter+0x120/0x1b8
>>   vfs_write+0x2dc/0x3b0
>>   ksys_write+0x70/0x108
>>   __arm64_sys_write+0x24/0x38
>>   invoke_syscall+0x50/0x128
>>   el0_svc_common.constprop.0+0x104/0x130
>>   do_el0_svc+0x40/0xb8
>>   el0_svc+0x2c/0xb8
>>   el0t_64_sync_handler+0xc0/0xc8
>>   el0t_64_sync+0x1a4/0x1a8
>> Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5)
>> ---[ end trace 0000000000000000 ]---
>>
>> The csdev::enable specify the device is currently part of an active
>> path, when enabling source via perf mode, csdev::enable also should
>> be set to true. We can add mode check in etm4_disable() to fix that,
>> if it's done, the sysfs cannot report busy when sysfs to enable ETM
>> that has been enabled by perf.
>>
>> Struct coresight_device add member of mode to check device busy in
>> coresight_(enable/disable)_source function to fixes handle kernel
>> NULL pointer, and sysfs report busy if perf session is already using
>> the ETM.
>> By the way, inperhaps another cleanup patch may be upload to remove
>> the etmv4_drvdata::mode tmc_drvdata::mode or others.
>>
> 
> Hi Junhao,
> 
> I definitely think we should do the cleanup at the same time as this
> change. Otherwise we risk leaving in the other modes which are
> duplicates of this and could easily cause confusion.
> 
> Although I saw there are a couple of modes that are not enum cs_mode,
> but a different kind of mode. And it's not always obvious because enum
> cs_mode is sometimes stored as an int...
> 
>> Test:
>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>> cat /sys/bus/coresight/devices/etm2/enable_source
>> 1
>> echo 1 > /sys/bus/coresight/devices/etm2/enable_source
>> -bash: echo: write error: Device or resource busy
>> echo 0 > /sys/bus/coresight/devices/etm2/enable_source
>> coresight etm2: Someone is already using the tracer
>> cat /sys/bus/coresight/devices/etm2/enable_source
>> 1
>>
> 
> I tested the fix and it works, although I have a few comments below on
> whether we should do something else instead.
> 
>> [1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
> adjacent to the path")
> 
> What was the original behavior before that change? Maybe it didn't have
> exactly the same error but was it still possible to enable and disable a
> sink after starting a Perf session? Seems like on the disable the same
> thing would have happened or something else bad?

Before that, the perf was using the source_ops->enable/disable directly.
And thus the csdev->enable was not set, forcing the sysfs mode to try
enabling it via the source_ops->enable, which would reject the request.

But with the switch to using enable_source, the sysfs mode request
bails out early assuming that the "csdev->enable == true" set by the
perf mode, to indicate the sysfs mode and goes ahead.

Ideally, we should rely on the backend driver to do the check ?

Does the following patch fix problems for you ?




diff --git a/drivers/hwtracing/coresight/coresight-core.c 
b/drivers/hwtracing/coresight/coresight-core.c
index 9fabe00a40d6..10e3609d8290 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -381,15 +381,12 @@ int coresight_enable_source(struct 
coresight_device *csdev, enum cs_mode mode,
  {
  	int ret;

-	if (!csdev->enable) {
-		if (source_ops(csdev)->enable) {
-			ret = source_ops(csdev)->enable(csdev, data, mode);
-			if (ret)
-				return ret;
-		}
+	if (source_ops(csdev)->enable) {
+		ret = source_ops(csdev)->enable(csdev, data, mode);
+		if (ret)
+			return ret;
  		csdev->enable = true;
  	}
-
  	atomic_inc(&csdev->refcnt);

  	return 0;
@@ -453,7 +450,7 @@ static void coresight_disable_helpers(struct 
coresight_device *csdev)
   */
  bool coresight_disable_source(struct coresight_device *csdev, void *data)
  {
-	if (atomic_dec_return(&csdev->refcnt) == 0) {
+	if (csdev->enable && atomic_dec_return(&csdev->refcnt) == 0) {
  		if (source_ops(csdev)->disable)
  			source_ops(csdev)->disable(csdev, data);
  		coresight_disable_helpers(csdev);
@@ -1202,7 +1199,7 @@ void coresight_disable(struct coresight_device *csdev)
  	if (ret)
  		goto out;

-	if (!csdev->enable || !coresight_disable_source(csdev, NULL))
+	if (!coresight_disable_source(csdev, NULL))
  		goto out;

  	switch (csdev->subtype.source_subtype) {




Suzuki
Junhao He Sept. 26, 2023, 11:43 a.m. UTC | #3
Hi, Suzuki


On 2023/9/22 17:20, Suzuki K Poulose wrote:
> On 21/09/2023 16:36, James Clark wrote:
>>
>>
>> On 21/09/2023 14:29, Junhao He wrote:
>>> The commmit [1] replace "source_ops(csdev)::enable()" with
>>> coresight_enable_source(). After this patch, when enable ETM through
>>> perf mode, the csdev::enable will be set to true. Then if we to disable
>>> the ETM by sysfs mode at the time, the resource will be released
>>> incorrectly, kernel will report the following call trace.
>>>
>>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>>> echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink
>>> echo 1 > /sys/bus/coresight/devices/etm0/enable_source
>>> echo 0 > /sys/bus/coresight/devices/etm0/enable_source
>>> Unable to handle kernel NULL pointer dereference at virtual
>>> address 00000000000001d0
>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>> ...
>>> Call trace:
>>>   etm4_disable+0x54/0x150 [coresight_etm4x]
>>>   coresight_disable_source+0x6c/0x98 [coresight]
>>>   coresight_disable+0x74/0x1c0 [coresight]
>>>   enable_source_store+0x88/0xa0 [coresight]
>>>   dev_attr_store+0x20/0x40
>>>   sysfs_kf_write+0x4c/0x68
>>>   kernfs_fop_write_iter+0x120/0x1b8
>>>   vfs_write+0x2dc/0x3b0
>>>   ksys_write+0x70/0x108
>>>   __arm64_sys_write+0x24/0x38
>>>   invoke_syscall+0x50/0x128
>>>   el0_svc_common.constprop.0+0x104/0x130
>>>   do_el0_svc+0x40/0xb8
>>>   el0_svc+0x2c/0xb8
>>>   el0t_64_sync_handler+0xc0/0xc8
>>>   el0t_64_sync+0x1a4/0x1a8
>>> Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5)
>>> ---[ end trace 0000000000000000 ]---
>>>
>>> The csdev::enable specify the device is currently part of an active
>>> path, when enabling source via perf mode, csdev::enable also should
>>> be set to true. We can add mode check in etm4_disable() to fix that,
>>> if it's done, the sysfs cannot report busy when sysfs to enable ETM
>>> that has been enabled by perf.
>>>
>>> Struct coresight_device add member of mode to check device busy in
>>> coresight_(enable/disable)_source function to fixes handle kernel
>>> NULL pointer, and sysfs report busy if perf session is already using
>>> the ETM.
>>> By the way, inperhaps another cleanup patch may be upload to remove
>>> the etmv4_drvdata::mode tmc_drvdata::mode or others.
>>>
>>
>> Hi Junhao,
>>
>> I definitely think we should do the cleanup at the same time as this
>> change. Otherwise we risk leaving in the other modes which are
>> duplicates of this and could easily cause confusion.
>>
>> Although I saw there are a couple of modes that are not enum cs_mode,
>> but a different kind of mode. And it's not always obvious because enum
>> cs_mode is sometimes stored as an int...
>>
>>> Test:
>>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>>> cat /sys/bus/coresight/devices/etm2/enable_source
>>> 1
>>> echo 1 > /sys/bus/coresight/devices/etm2/enable_source
>>> -bash: echo: write error: Device or resource busy
>>> echo 0 > /sys/bus/coresight/devices/etm2/enable_source
>>> coresight etm2: Someone is already using the tracer
>>> cat /sys/bus/coresight/devices/etm2/enable_source
>>> 1
>>>
>>
>> I tested the fix and it works, although I have a few comments below on
>> whether we should do something else instead.
>>
>>> [1] "coresight: Enable and disable helper devices adjacent to the 
>>> path"> Fixes: 6148652807ba ("coresight: Enable and disable helper 
>>> devices
>> adjacent to the path")
>>
>> What was the original behavior before that change? Maybe it didn't have
>> exactly the same error but was it still possible to enable and disable a
>> sink after starting a Perf session? Seems like on the disable the same
>> thing would have happened or something else bad?
>
> Before that, the perf was using the source_ops->enable/disable directly.
> And thus the csdev->enable was not set, forcing the sysfs mode to try
> enabling it via the source_ops->enable, which would reject the request.
>
> But with the switch to using enable_source, the sysfs mode request
> bails out early assuming that the "csdev->enable == true" set by the
> perf mode, to indicate the sysfs mode and goes ahead.
>
> Ideally, we should rely on the backend driver to do the check ?

As i said above. We can add mode check in etm4_disable() to fix that.
But I didn't do this because I wanted sysfs to report a busy log. If the
logs are unimportant, we do not need to add csdev->mode and can
check the mode in etm4_disable().

>
> Does the following patch fix problems for you ?
>

I tested on my platform and the problem still exists.
If the ETM has been used by the perf session, coresight_disable_source() 
still
responds to the disable csdev by sysfs.

   # perf record -e /cs_etm/@ultra_smb0/ -C 22 -- sleep 30 &
   [2] 11157
   # echo 1 > /sys/bus/coresight/devices/etm22/enable_source
   # echo 0 > /sys/bus/coresight/devices/etm22/enable_source
   [ 5308.708990] Unable to handle kernel NULL pointer dereference at 
virtual address 00000000000001d0
   [ 5308.834767] CPU: 2 PID: 11058 Comm: bash Kdump: loaded Tainted: 
G           O       6.5.0-rc4+ #1
   [ 5308.843894] Hardware name: Huawei TaiShan 200 (Model 
2280)/BC82AMDD, BIOS 2280-V2 CS V5.B221.01 12/09/2021
   [ 5308.854030] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS 
BTYPE=--)
   [ 5308.861227] pc : etm4_disable+0x54/0x150 [coresight_etm4x]
   [ 5308.866956] lr : coresight_disable_source+0x5c/0xb8 [coresight]

Best regards,
Junhao.

>
>
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c 
> b/drivers/hwtracing/coresight/coresight-core.c
> index 9fabe00a40d6..10e3609d8290 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -381,15 +381,12 @@ int coresight_enable_source(struct 
> coresight_device *csdev, enum cs_mode mode,
>  {
>      int ret;
>
> -    if (!csdev->enable) {
> -        if (source_ops(csdev)->enable) {
> -            ret = source_ops(csdev)->enable(csdev, data, mode);
> -            if (ret)
> -                return ret;
> -        }
> +    if (source_ops(csdev)->enable) {
> +        ret = source_ops(csdev)->enable(csdev, data, mode);
> +        if (ret)
> +            return ret;
>          csdev->enable = true;
>      }
> -
>      atomic_inc(&csdev->refcnt);
>
>      return 0;
> @@ -453,7 +450,7 @@ static void coresight_disable_helpers(struct 
> coresight_device *csdev)
>   */
>  bool coresight_disable_source(struct coresight_device *csdev, void 
> *data)
>  {
> -    if (atomic_dec_return(&csdev->refcnt) == 0) {
> +    if (csdev->enable && atomic_dec_return(&csdev->refcnt) == 0) {
>          if (source_ops(csdev)->disable)
>              source_ops(csdev)->disable(csdev, data);
>          coresight_disable_helpers(csdev);
> @@ -1202,7 +1199,7 @@ void coresight_disable(struct coresight_device 
> *csdev)
>      if (ret)
>          goto out;
>
> -    if (!csdev->enable || !coresight_disable_source(csdev, NULL))
> +    if (!coresight_disable_source(csdev, NULL))
>          goto out;
>
>      switch (csdev->subtype.source_subtype) {
>
>
>
>
> Suzuki
>
>
> _______________________________________________
> CoreSight mailing list -- coresight@lists.linaro.org
> To unsubscribe send an email to coresight-leave@lists.linaro.org
>
> .
>
Suzuki K Poulose Oct. 2, 2023, 10:32 a.m. UTC | #4
Hello

On 26/09/2023 12:43, hejunhao wrote:
> Hi, Suzuki
> 
> 
> On 2023/9/22 17:20, Suzuki K Poulose wrote:
>> On 21/09/2023 16:36, James Clark wrote:
>>>
>>>
>>> On 21/09/2023 14:29, Junhao He wrote:
>>>> The commmit [1] replace "source_ops(csdev)::enable()" with
>>>> coresight_enable_source(). After this patch, when enable ETM through
>>>> perf mode, the csdev::enable will be set to true. Then if we to disable
>>>> the ETM by sysfs mode at the time, the resource will be released
>>>> incorrectly, kernel will report the following call trace.
>>>>
>>>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>>>> echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink
>>>> echo 1 > /sys/bus/coresight/devices/etm0/enable_source
>>>> echo 0 > /sys/bus/coresight/devices/etm0/enable_source
>>>> Unable to handle kernel NULL pointer dereference at virtual
>>>> address 00000000000001d0
>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>> ...
>>>> Call trace:
>>>>   etm4_disable+0x54/0x150 [coresight_etm4x]
>>>>   coresight_disable_source+0x6c/0x98 [coresight]
>>>>   coresight_disable+0x74/0x1c0 [coresight]
>>>>   enable_source_store+0x88/0xa0 [coresight]
>>>>   dev_attr_store+0x20/0x40
>>>>   sysfs_kf_write+0x4c/0x68
>>>>   kernfs_fop_write_iter+0x120/0x1b8
>>>>   vfs_write+0x2dc/0x3b0
>>>>   ksys_write+0x70/0x108
>>>>   __arm64_sys_write+0x24/0x38
>>>>   invoke_syscall+0x50/0x128
>>>>   el0_svc_common.constprop.0+0x104/0x130
>>>>   do_el0_svc+0x40/0xb8
>>>>   el0_svc+0x2c/0xb8
>>>>   el0t_64_sync_handler+0xc0/0xc8
>>>>   el0t_64_sync+0x1a4/0x1a8
>>>> Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5)
>>>> ---[ end trace 0000000000000000 ]---
>>>>
>>>> The csdev::enable specify the device is currently part of an active
>>>> path, when enabling source via perf mode, csdev::enable also should
>>>> be set to true. We can add mode check in etm4_disable() to fix that,
>>>> if it's done, the sysfs cannot report busy when sysfs to enable ETM
>>>> that has been enabled by perf.
>>>>
>>>> Struct coresight_device add member of mode to check device busy in
>>>> coresight_(enable/disable)_source function to fixes handle kernel
>>>> NULL pointer, and sysfs report busy if perf session is already using
>>>> the ETM.
>>>> By the way, inperhaps another cleanup patch may be upload to remove
>>>> the etmv4_drvdata::mode tmc_drvdata::mode or others.
>>>>
>>>
>>> Hi Junhao,
>>>
>>> I definitely think we should do the cleanup at the same time as this
>>> change. Otherwise we risk leaving in the other modes which are
>>> duplicates of this and could easily cause confusion.
>>>
>>> Although I saw there are a couple of modes that are not enum cs_mode,
>>> but a different kind of mode. And it's not always obvious because enum
>>> cs_mode is sometimes stored as an int...
>>>
>>>> Test:
>>>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>>>> cat /sys/bus/coresight/devices/etm2/enable_source
>>>> 1
>>>> echo 1 > /sys/bus/coresight/devices/etm2/enable_source
>>>> -bash: echo: write error: Device or resource busy
>>>> echo 0 > /sys/bus/coresight/devices/etm2/enable_source
>>>> coresight etm2: Someone is already using the tracer
>>>> cat /sys/bus/coresight/devices/etm2/enable_source
>>>> 1
>>>>
>>>
>>> I tested the fix and it works, although I have a few comments below on
>>> whether we should do something else instead.
>>>
>>>> [1] "coresight: Enable and disable helper devices adjacent to the 
>>>> path"> Fixes: 6148652807ba ("coresight: Enable and disable helper 
>>>> devices
>>> adjacent to the path")
>>>
>>> What was the original behavior before that change? Maybe it didn't have
>>> exactly the same error but was it still possible to enable and disable a
>>> sink after starting a Perf session? Seems like on the disable the same
>>> thing would have happened or something else bad?
>>
>> Before that, the perf was using the source_ops->enable/disable directly.
>> And thus the csdev->enable was not set, forcing the sysfs mode to try
>> enabling it via the source_ops->enable, which would reject the request.
>>
>> But with the switch to using enable_source, the sysfs mode request
>> bails out early assuming that the "csdev->enable == true" set by the
>> perf mode, to indicate the sysfs mode and goes ahead.
>>
>> Ideally, we should rely on the backend driver to do the check ?
> 
> As i said above. We can add mode check in etm4_disable() to fix that.
> But I didn't do this because I wanted sysfs to report a busy log. If the
> logs are unimportant, we do not need to add csdev->mode and can
> check the mode in etm4_disable().
> 
>>
>> Does the following patch fix problems for you ?
>>
> 
> I tested on my platform and the problem still exists.
> If the ETM has been used by the perf session, coresight_disable_source() 
> still
> responds to the disable csdev by sysfs.
> 
>    # perf record -e /cs_etm/@ultra_smb0/ -C 22 -- sleep 30 &
>    [2] 11157
>    # echo 1 > /sys/bus/coresight/devices/etm22/enable_source
>    # echo 0 > /sys/bus/coresight/devices/etm22/enable_source
>    [ 5308.708990] Unable to handle kernel NULL pointer dereference at 
> virtual address 00000000000001d0
>    [ 5308.834767] CPU: 2 PID: 11058 Comm: bash Kdump: loaded Tainted: 
> G           O       6.5.0-rc4+ #1
>    [ 5308.843894] Hardware name: Huawei TaiShan 200 (Model 
> 2280)/BC82AMDD, BIOS 2280-V2 CS V5.B221.01 12/09/2021
>    [ 5308.854030] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS 
> BTYPE=--)
>    [ 5308.861227] pc : etm4_disable+0x54/0x150 [coresight_etm4x]
>    [ 5308.866956] lr : coresight_disable_source+0x5c/0xb8 [coresight]
> 
> Best regards,
> Junhao.
> 
>>
>>
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-core.c 
>> b/drivers/hwtracing/coresight/coresight-core.c
>> index 9fabe00a40d6..10e3609d8290 100644
>> --- a/drivers/hwtracing/coresight/coresight-core.c
>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>> @@ -381,15 +381,12 @@ int coresight_enable_source(struct 
>> coresight_device *csdev, enum cs_mode mode,
>>  {
>>      int ret;
>>
>> -    if (!csdev->enable) {
>> -        if (source_ops(csdev)->enable) {
>> -            ret = source_ops(csdev)->enable(csdev, data, mode);
>> -            if (ret)
>> -                return ret;
>> -        }
>> +    if (source_ops(csdev)->enable) {
>> +        ret = source_ops(csdev)->enable(csdev, data, mode);
>> +        if (ret)
>> +            return ret;
>>          csdev->enable = true;
>>      }
>> -
>>      atomic_inc(&csdev->refcnt);
>>
>>      return 0;
>> @@ -453,7 +450,7 @@ static void coresight_disable_helpers(struct 
>> coresight_device *csdev)
>>   */
>>  bool coresight_disable_source(struct coresight_device *csdev, void 
>> *data)
>>  {
>> -    if (atomic_dec_return(&csdev->refcnt) == 0) {
>> +    if (csdev->enable && atomic_dec_return(&csdev->refcnt) == 0) {
>>          if (source_ops(csdev)->disable)
>>              source_ops(csdev)->disable(csdev, data);
>>          coresight_disable_helpers(csdev);
>> @@ -1202,7 +1199,7 @@ void coresight_disable(struct coresight_device 
>> *csdev)
>>      if (ret)
>>          goto out;
>>
>> -    if (!csdev->enable || !coresight_disable_source(csdev, NULL))
>> +    if (!coresight_disable_source(csdev, NULL))
>>          goto out;
>>
>>      switch (csdev->subtype.source_subtype) {
>>
>>

I am concerned about the "mode" changes without any synchronization.
i.e. earlier, coresight_enable_source() was called under coresight_mutex
from sysfs mode. But the perf code doesn't use any of that. So, it
is still possible to race and overwrite the "mode".

I think we need to take a closer look at the locking to make
sure this is clear.

Suzuki
diff mbox series

Patch

diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index 206f17826399..7b485fc2ed19 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -388,6 +388,7 @@  int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode,
 				return ret;
 		}
 		csdev->enable = true;
+		csdev->mode = mode;
 	}
 
 	atomic_inc(&csdev->refcnt);
@@ -446,18 +447,26 @@  static void coresight_disable_helpers(struct coresight_device *csdev)
  *  the device if there are no users left.
  *
  *  @csdev: The coresight device to disable
+ *  @mode: How the coresight device is being used, perf mode or sysfs mode.
  *  @data: Opaque data to pass on to the disable function of the source device.
  *         For example in perf mode this is a pointer to the struct perf_event.
  *
  *  Returns true if the device has been disabled.
  */
-bool coresight_disable_source(struct coresight_device *csdev, void *data)
+bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode,
+			      void *data)
 {
+	if (csdev->mode && csdev->mode != mode) {
+		dev_err(&csdev->dev, "Someone is already using the tracer\n");
+		return false;
+	}
+
 	if (atomic_dec_return(&csdev->refcnt) == 0) {
 		if (source_ops(csdev)->disable)
 			source_ops(csdev)->disable(csdev, data);
 		coresight_disable_helpers(csdev);
 		csdev->enable = false;
+		csdev->mode = CS_MODE_DISABLED;
 	}
 	return !csdev->enable;
 }
@@ -1117,6 +1126,11 @@  int coresight_enable(struct coresight_device *csdev)
 	if (ret)
 		goto out;
 
+	if (csdev->mode == CS_MODE_PERF) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	if (csdev->enable) {
 		/*
 		 * There could be multiple applications driving the software
@@ -1202,7 +1216,7 @@  void coresight_disable(struct coresight_device *csdev)
 	if (ret)
 		goto out;
 
-	if (!csdev->enable || !coresight_disable_source(csdev, NULL))
+	if (!csdev->enable || !coresight_disable_source(csdev, CS_MODE_SYSFS, NULL))
 		goto out;
 
 	switch (csdev->subtype.source_subtype) {
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 5ca6278baff4..015aea9f2b22 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -459,6 +459,10 @@  static void etm_event_start(struct perf_event *event, int flags)
 	if (WARN_ON(ctxt->event_data))
 		goto fail;
 
+	/* Sysfs is already using the tracer */
+	if (csdev->mode == CS_MODE_SYSFS)
+		goto fail;
+
 	/*
 	 * Deal with the ring buffer API and get a handle on the
 	 * session's information.
@@ -587,7 +591,7 @@  static void etm_event_stop(struct perf_event *event, int mode)
 		return;
 
 	/* stop tracer */
-	coresight_disable_source(csdev, event);
+	coresight_disable_source(csdev, CS_MODE_PERF, event);
 
 	/* tell the core */
 	event->hw.state = PERF_HES_STOPPED;
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index 767076e07970..1f8b512ba5ac 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -233,6 +233,7 @@  void coresight_set_percpu_sink(int cpu, struct coresight_device *csdev);
 struct coresight_device *coresight_get_percpu_sink(int cpu);
 int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode,
 			    void *data);
-bool coresight_disable_source(struct coresight_device *csdev, void *data);
+bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode,
+			      void *data);
 
 #endif
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index a269fffaf991..acbbedb9abf3 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -216,6 +216,12 @@  struct coresight_sysfs_link {
 	const char *target_name;
 };
 
+enum cs_mode {
+	CS_MODE_DISABLED,
+	CS_MODE_SYSFS,
+	CS_MODE_PERF,
+};
+
 /**
  * struct coresight_device - representation of a device as used by the framework
  * @pdata:	Platform data with device connections associated to this device.
@@ -228,6 +234,7 @@  struct coresight_sysfs_link {
  * @refcnt:	keep track of what is in use.
  * @orphan:	true if the component has connections that haven't been linked.
  * @enable:	'true' if component is currently part of an active path.
+ * @mode:	This tracer's mode, i.e sysFS, Perf or disabled.
  * @activated:	'true' only if a _sink_ has been activated.  A sink can be
  *		activated but not yet enabled.  Enabling for a _sink_
  *		happens when a source has been selected and a path is enabled
@@ -252,6 +259,7 @@  struct coresight_device {
 	atomic_t refcnt;
 	bool orphan;
 	bool enable;	/* true only if configured as part of a path */
+	enum cs_mode mode;
 	/* sink specific fields */
 	bool activated;	/* true only if a sink is part of a path */
 	struct dev_ext_attribute *ea;
@@ -290,12 +298,6 @@  static struct coresight_dev_list (var) = {				\
 
 #define to_coresight_device(d) container_of(d, struct coresight_device, dev)
 
-enum cs_mode {
-	CS_MODE_DISABLED,
-	CS_MODE_SYSFS,
-	CS_MODE_PERF,
-};
-
 #define source_ops(csdev)	csdev->ops->source_ops
 #define sink_ops(csdev)		csdev->ops->sink_ops
 #define link_ops(csdev)		csdev->ops->link_ops