diff mbox series

coresight: dynamic-replicator: Fix handling of multiple connections

Message ID 20200426143725.18116-1-saiprakash.ranjan@codeaurora.org (mailing list archive)
State New, archived
Headers show
Series coresight: dynamic-replicator: Fix handling of multiple connections | expand

Commit Message

Sai Prakash Ranjan April 26, 2020, 2:37 p.m. UTC
Since commit 30af4fb619e5 ("coresight: dynamic-replicator:
Handle multiple connections"), we do not make sure that
the other port is disabled when the dynamic replicator is
enabled. This is seen to cause the CPU hardlockup atleast
on SC7180 SoC with the following topology when enabling ETM
with ETR as the sink via sysfs. Since there is no trace id
logic in coresight yet to make use of multiple sinks in
parallel for different trace sessions, disable the other
port when one port is turned on.

       etm0_out
	  |
   apss_funnel_in0
          |
  apss_merge_funnel_in
          |
      funnel1_in4
	  |
   merge_funnel_in1
	  |
    swao_funnel_in
          |
        etf_in
	  |
  swao_replicator_in
          |
   replicator_in
	  |
        etr_in

  Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S  B             5.4.25 #100
  Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT)
  Call trace:
   dump_backtrace+0x0/0x188
   show_stack+0x20/0x2c
   dump_stack+0xdc/0x144
   panic+0x168/0x370
   arch_seccomp_spec_mitigate+0x0/0x14
   watchdog_timer_fn+0x68/0x290
   __hrtimer_run_queues+0x264/0x498
   hrtimer_interrupt+0xf0/0x22c
   arch_timer_handler_phys+0x40/0x50
   handle_percpu_devid_irq+0x8c/0x158
   __handle_domain_irq+0x84/0xc4
   gic_handle_irq+0x100/0x1c4
   el1_irq+0xbc/0x180
   arch_cpu_idle+0x3c/0x5c
   default_idle_call+0x1c/0x38
   do_idle+0x100/0x280
   cpu_startup_entry+0x24/0x28
   secondary_start_kernel+0x15c/0x170
  SMP: stopping secondary CPUs

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
---
Changes since RFC:
 * Reworded commit text and included the topology on SC7180.
---
 .../hwtracing/coresight/coresight-replicator.c    | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

Suzuki K Poulose April 27, 2020, 9:20 a.m. UTC | #1
On 04/26/2020 03:37 PM, Sai Prakash Ranjan wrote:
> Since commit 30af4fb619e5 ("coresight: dynamic-replicator:
> Handle multiple connections"), we do not make sure that
> the other port is disabled when the dynamic replicator is
> enabled. This is seen to cause the CPU hardlockup atleast
> on SC7180 SoC with the following topology when enabling ETM
> with ETR as the sink via sysfs. Since there is no trace id
> logic in coresight yet to make use of multiple sinks in
> parallel for different trace sessions, disable the other
> port when one port is turned on.
> 
>         etm0_out
> 	  |
>     apss_funnel_in0
>            |
>    apss_merge_funnel_in
>            |
>        funnel1_in4
> 	  |
>     merge_funnel_in1
> 	  |
>      swao_funnel_in
>            |
>          etf_in
> 	  |
>    swao_replicator_in
>            |
>     replicator_in
> 	  |
>          etr_in
> 
>    Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
>    CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S  B             5.4.25 #100
>    Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT)
>    Call trace:
>     dump_backtrace+0x0/0x188
>     show_stack+0x20/0x2c
>     dump_stack+0xdc/0x144
>     panic+0x168/0x370
>     arch_seccomp_spec_mitigate+0x0/0x14
>     watchdog_timer_fn+0x68/0x290
>     __hrtimer_run_queues+0x264/0x498
>     hrtimer_interrupt+0xf0/0x22c
>     arch_timer_handler_phys+0x40/0x50
>     handle_percpu_devid_irq+0x8c/0x158
>     __handle_domain_irq+0x84/0xc4
>     gic_handle_irq+0x100/0x1c4
>     el1_irq+0xbc/0x180
>     arch_cpu_idle+0x3c/0x5c
>     default_idle_call+0x1c/0x38
>     do_idle+0x100/0x280
>     cpu_startup_entry+0x24/0x28
>     secondary_start_kernel+0x15c/0x170
>    SMP: stopping secondary CPUs
> 
> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> Tested-by: Stephen Boyd <swboyd@chromium.org>
> ---
> Changes since RFC:
>   * Reworded commit text and included the topology on SC7180.


> ---
>   .../hwtracing/coresight/coresight-replicator.c    | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
> index e7dc1c31d20d..f4eaa38f8f43 100644
> --- a/drivers/hwtracing/coresight/coresight-replicator.c
> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> @@ -66,14 +66,16 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
>   				     int inport, int outport)
>   {
>   	int rc = 0;
> -	u32 reg;
> +	u32 reg0, reg1;
>   
>   	switch (outport) {
>   	case 0:
> -		reg = REPLICATOR_IDFILTER0;
> +		reg0 = REPLICATOR_IDFILTER0;
> +		reg1 = REPLICATOR_IDFILTER1;
>   		break;
>   	case 1:
> -		reg = REPLICATOR_IDFILTER1;
> +		reg0 = REPLICATOR_IDFILTER1;
> +		reg1 = REPLICATOR_IDFILTER0;
>   		break;
>   	default:
>   		WARN_ON(1);
> @@ -87,8 +89,11 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
>   		rc = coresight_claim_device_unlocked(drvdata->base);
>   
>   	/* Ensure that the outport is enabled. */
> -	if (!rc)
> -		writel_relaxed(0x00, drvdata->base + reg);
> +	if (!rc) {
> +		writel_relaxed(0x00, drvdata->base + reg0);
> +		writel_relaxed(0xff, drvdata->base + reg1);
> +	}
> +

This is not sufficient. You must prevent another session trying to
enable the other port of the replicator as this could silently fail
the "on-going" session. Not ideal. Fail the attempt to enable a port
if the other port is active. You could track this in software and
fail early.

Suzuki
Mike Leach April 27, 2020, 9:45 a.m. UTC | #2
HI,

On Mon, 27 Apr 2020 at 10:15, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
> On 04/26/2020 03:37 PM, Sai Prakash Ranjan wrote:
> > Since commit 30af4fb619e5 ("coresight: dynamic-replicator:
> > Handle multiple connections"), we do not make sure that
> > the other port is disabled when the dynamic replicator is
> > enabled. This is seen to cause the CPU hardlockup atleast
> > on SC7180 SoC with the following topology when enabling ETM
> > with ETR as the sink via sysfs. Since there is no trace id
> > logic in coresight yet to make use of multiple sinks in
> > parallel for different trace sessions, disable the other
> > port when one port is turned on.
> >
> >         etm0_out
> >         |
> >     apss_funnel_in0
> >            |
> >    apss_merge_funnel_in
> >            |
> >        funnel1_in4
> >         |
> >     merge_funnel_in1
> >         |
> >      swao_funnel_in
> >            |
> >          etf_in
> >         |
> >    swao_replicator_in
> >            |
> >     replicator_in
> >         |
> >          etr_in
> >
> >    Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> >    CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S  B             5.4.25 #100
> >    Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT)
> >    Call trace:
> >     dump_backtrace+0x0/0x188
> >     show_stack+0x20/0x2c
> >     dump_stack+0xdc/0x144
> >     panic+0x168/0x370
> >     arch_seccomp_spec_mitigate+0x0/0x14
> >     watchdog_timer_fn+0x68/0x290
> >     __hrtimer_run_queues+0x264/0x498
> >     hrtimer_interrupt+0xf0/0x22c
> >     arch_timer_handler_phys+0x40/0x50
> >     handle_percpu_devid_irq+0x8c/0x158
> >     __handle_domain_irq+0x84/0xc4
> >     gic_handle_irq+0x100/0x1c4
> >     el1_irq+0xbc/0x180
> >     arch_cpu_idle+0x3c/0x5c
> >     default_idle_call+0x1c/0x38
> >     do_idle+0x100/0x280
> >     cpu_startup_entry+0x24/0x28
> >     secondary_start_kernel+0x15c/0x170
> >    SMP: stopping secondary CPUs
> >
> > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> > Tested-by: Stephen Boyd <swboyd@chromium.org>
> > ---
> > Changes since RFC:
> >   * Reworded commit text and included the topology on SC7180.
>
>
> > ---
> >   .../hwtracing/coresight/coresight-replicator.c    | 15 ++++++++++-----
> >   1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
> > index e7dc1c31d20d..f4eaa38f8f43 100644
> > --- a/drivers/hwtracing/coresight/coresight-replicator.c
> > +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> > @@ -66,14 +66,16 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
> >                                    int inport, int outport)
> >   {
> >       int rc = 0;
> > -     u32 reg;
> > +     u32 reg0, reg1;
> >
> >       switch (outport) {
> >       case 0:
> > -             reg = REPLICATOR_IDFILTER0;
> > +             reg0 = REPLICATOR_IDFILTER0;
> > +             reg1 = REPLICATOR_IDFILTER1;
> >               break;
> >       case 1:
> > -             reg = REPLICATOR_IDFILTER1;
> > +             reg0 = REPLICATOR_IDFILTER1;
> > +             reg1 = REPLICATOR_IDFILTER0;
> >               break;
> >       default:
> >               WARN_ON(1);
> > @@ -87,8 +89,11 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
> >               rc = coresight_claim_device_unlocked(drvdata->base);
> >
> >       /* Ensure that the outport is enabled. */
> > -     if (!rc)
> > -             writel_relaxed(0x00, drvdata->base + reg);
> > +     if (!rc) {
> > +             writel_relaxed(0x00, drvdata->base + reg0);
> > +             writel_relaxed(0xff, drvdata->base + reg1);
> > +     }
> > +
>
> This is not sufficient. You must prevent another session trying to
> enable the other port of the replicator as this could silently fail
> the "on-going" session. Not ideal. Fail the attempt to enable a port
> if the other port is active. You could track this in software and
> fail early.
>
> Suzuki

While I have no issue in principle with not enabling a path to a sink
that is not in use - indeed in some cases attaching to unused sinks
can cause back-pressure that slows throughput (cf TPIU) - I am
concerned that this modification is masking an underlying issue with
the platform in question.

Should we decide to enable the diversion of different IDs to different
sinks or allow different sessions go to different sinks, then this has
potential to fail on the SC7180 SoC - and it will be difficult in
future to associate a problem with this discussion.

Regards

Mike
Suzuki K Poulose April 27, 2020, 1:53 p.m. UTC | #3
On 04/27/2020 10:45 AM, Mike Leach wrote:
> HI,
> 
> On Mon, 27 Apr 2020 at 10:15, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>
>> On 04/26/2020 03:37 PM, Sai Prakash Ranjan wrote:
>>> Since commit 30af4fb619e5 ("coresight: dynamic-replicator:
>>> Handle multiple connections"), we do not make sure that
>>> the other port is disabled when the dynamic replicator is
>>> enabled. This is seen to cause the CPU hardlockup atleast
>>> on SC7180 SoC with the following topology when enabling ETM
>>> with ETR as the sink via sysfs. Since there is no trace id
>>> logic in coresight yet to make use of multiple sinks in
>>> parallel for different trace sessions, disable the other
>>> port when one port is turned on.
>>>
>>>          etm0_out
>>>          |
>>>      apss_funnel_in0
>>>             |
>>>     apss_merge_funnel_in
>>>             |
>>>         funnel1_in4
>>>          |
>>>      merge_funnel_in1
>>>          |
>>>       swao_funnel_in
>>>             |
>>>           etf_in
>>>          |
>>>     swao_replicator_in
>>>             |
>>>      replicator_in
>>>          |
>>>           etr_in
>>>
>>>     Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
>>>     CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S  B             5.4.25 #100
>>>     Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT)
>>>     Call trace:
>>>      dump_backtrace+0x0/0x188
>>>      show_stack+0x20/0x2c
>>>      dump_stack+0xdc/0x144
>>>      panic+0x168/0x370
>>>      arch_seccomp_spec_mitigate+0x0/0x14
>>>      watchdog_timer_fn+0x68/0x290
>>>      __hrtimer_run_queues+0x264/0x498
>>>      hrtimer_interrupt+0xf0/0x22c
>>>      arch_timer_handler_phys+0x40/0x50
>>>      handle_percpu_devid_irq+0x8c/0x158
>>>      __handle_domain_irq+0x84/0xc4
>>>      gic_handle_irq+0x100/0x1c4
>>>      el1_irq+0xbc/0x180
>>>      arch_cpu_idle+0x3c/0x5c
>>>      default_idle_call+0x1c/0x38
>>>      do_idle+0x100/0x280
>>>      cpu_startup_entry+0x24/0x28
>>>      secondary_start_kernel+0x15c/0x170
>>>     SMP: stopping secondary CPUs
>>>
>>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>>> Tested-by: Stephen Boyd <swboyd@chromium.org>


>>
>> This is not sufficient. You must prevent another session trying to
>> enable the other port of the replicator as this could silently fail
>> the "on-going" session. Not ideal. Fail the attempt to enable a port
>> if the other port is active. You could track this in software and
>> fail early.
>>
>> Suzuki
> 
> While I have no issue in principle with not enabling a path to a sink
> that is not in use - indeed in some cases attaching to unused sinks
> can cause back-pressure that slows throughput (cf TPIU) - I am
> concerned that this modification is masking an underlying issue with
> the platform in question.
> 
> Should we decide to enable the diversion of different IDs to different
> sinks or allow different sessions go to different sinks, then this has
> potential to fail on the SC7180 SoC - and it will be difficult in
> future to associate a problem with this discussion.

Mike,

I think thats a good point.
Sai, please could we narrow down this to the real problem and may be
work around it for the "device" ? Do we know which sink is causing the
back pressure ? We could then push the "work around" to the replicator
it is connected to.

Suzuki
Sai Prakash Ranjan April 28, 2020, 12:23 p.m. UTC | #4
On 2020-04-27 19:23, Suzuki K Poulose wrote:
> On 04/27/2020 10:45 AM, Mike Leach wrote:
[...]
>>> 
>>> This is not sufficient. You must prevent another session trying to
>>> enable the other port of the replicator as this could silently fail
>>> the "on-going" session. Not ideal. Fail the attempt to enable a port
>>> if the other port is active. You could track this in software and
>>> fail early.
>>> 
>>> Suzuki
>> 
>> While I have no issue in principle with not enabling a path to a sink
>> that is not in use - indeed in some cases attaching to unused sinks
>> can cause back-pressure that slows throughput (cf TPIU) - I am
>> concerned that this modification is masking an underlying issue with
>> the platform in question.
>> 
>> Should we decide to enable the diversion of different IDs to different
>> sinks or allow different sessions go to different sinks, then this has
>> potential to fail on the SC7180 SoC - and it will be difficult in
>> future to associate a problem with this discussion.
> 
> Mike,
> 
> I think thats a good point.
> Sai, please could we narrow down this to the real problem and may be
> work around it for the "device" ? Do we know which sink is causing the
> back pressure ? We could then push the "work around" to the replicator
> it is connected to.
> 
> Suzuki

Hi Suzuki, Mike,

To add some more to the information provided earlier, 
swao_replicator(6b06000) and etf are
in AOSS (Always-On-SubSystem) group. Also TPIU(connected to 
qdss_replicator) and EUD(connected
to swao_replicator) sinks are unused.

Please ignore the id filter values provided earlier.
Here are ID filter values after boot and before enabling replicator. As 
per
these idfilter values, we should not try to enable replicator if its 
already
enabled (in this case for swao_replicator) right?

localhost ~ # cat 
/sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter0
0x0
localhost ~ # cat 
/sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter1
0x0

localhost ~ # cat 
/sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter0
0xff
localhost ~ # cat 
/sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter1
0xff

I think the unused sink EUD(also in AOSS group) probably is causing the 
backpressure here.

Thanks,
Sai
Sai Prakash Ranjan April 29, 2020, 11:47 a.m. UTC | #5
On 2020-04-28 17:53, Sai Prakash Ranjan wrote:
> On 2020-04-27 19:23, Suzuki K Poulose wrote:
>> On 04/27/2020 10:45 AM, Mike Leach wrote:
> [...]
>>>> 
>>>> This is not sufficient. You must prevent another session trying to
>>>> enable the other port of the replicator as this could silently fail
>>>> the "on-going" session. Not ideal. Fail the attempt to enable a port
>>>> if the other port is active. You could track this in software and
>>>> fail early.
>>>> 
>>>> Suzuki
>>> 
>>> While I have no issue in principle with not enabling a path to a sink
>>> that is not in use - indeed in some cases attaching to unused sinks
>>> can cause back-pressure that slows throughput (cf TPIU) - I am
>>> concerned that this modification is masking an underlying issue with
>>> the platform in question.
>>> 
>>> Should we decide to enable the diversion of different IDs to 
>>> different
>>> sinks or allow different sessions go to different sinks, then this 
>>> has
>>> potential to fail on the SC7180 SoC - and it will be difficult in
>>> future to associate a problem with this discussion.
>> 
>> Mike,
>> 
>> I think thats a good point.
>> Sai, please could we narrow down this to the real problem and may be
>> work around it for the "device" ? Do we know which sink is causing the
>> back pressure ? We could then push the "work around" to the replicator
>> it is connected to.
>> 
>> Suzuki
> 
> Hi Suzuki, Mike,
> 
> To add some more to the information provided earlier,
> swao_replicator(6b06000) and etf are
> in AOSS (Always-On-SubSystem) group. Also TPIU(connected to
> qdss_replicator) and EUD(connected
> to swao_replicator) sinks are unused.
> 
> Please ignore the id filter values provided earlier.
> Here are ID filter values after boot and before enabling replicator. As 
> per
> these idfilter values, we should not try to enable replicator if its 
> already
> enabled (in this case for swao_replicator) right?
> 
> localhost ~ # cat
> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter0
> 0x0
> localhost ~ # cat
> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter1
> 0x0
> 
> localhost ~ # cat
> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter0
> 0xff
> localhost ~ # cat
> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter1
> 0xff
> 

Looking more into replicator1(swao_replicator) values as 0x0 even after 
replicator_reset()
in replicator probe, I added dynamic_replicator_reset in 
dynamic_replicator_enable()
and am not seeing any hardlockup. Also I added some prints to check the 
idfilter
values before and after reset and found that its not set to 0xff even 
after replicator_reset()
in replicator probe, I don't see any other path setting it to 0x0.

After probe:

[    8.477669] func replicator_probe before reset replicator replicator1 
REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
[    8.489470] func replicator_probe after reset replicator replicator1 
REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
[    8.502738] func replicator_probe before reset replicator replicator0 
REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
[    8.515214] func replicator_probe after reset replicator replicator0 
REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
localhost ~ #
localhost ~ #
localhost ~ # echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
localhost ~ #
localhost ~ # echo 1 > /sys/bus/coresight/devices/etm0/enable_source
[   58.490485] func dynamic_replicator_enable before reset replicator 
replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
[   58.503246] func dynamic_replicator_enable after reset replicator 
replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
[   58.520902] func dynamic_replicator_enable before reset replicator 
replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
[   58.533500] func dynamic_replicator_enable after reset replicator 
replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
localhost ~ #

Can we have a replicator_reset in dynamic_replicator_enable?

diff --git a/drivers/hwtracing/coresight/coresight-replicator.c 
b/drivers/hwtracing/coresight/coresight-replicator.c
index e7dc1c31d20d..794f8e4c049f 100644
--- a/drivers/hwtracing/coresight/coresight-replicator.c
+++ b/drivers/hwtracing/coresight/coresight-replicator.c
@@ -68,6 +68,8 @@ static int dynamic_replicator_enable(struct 
replicator_drvdata *drvdata,
         int rc = 0;
         u32 reg;

+       dynamic_replicator_reset(drvdata);
+
         switch (outport) {
         case 0:
                 reg = REPLICATOR_IDFILTER0;

Thanks,
Sai
Suzuki K Poulose April 29, 2020, 1:49 p.m. UTC | #6
On 04/29/2020 12:47 PM, Sai Prakash Ranjan wrote:
> On 2020-04-28 17:53, Sai Prakash Ranjan wrote:
>> On 2020-04-27 19:23, Suzuki K Poulose wrote:
>>> On 04/27/2020 10:45 AM, Mike Leach wrote:
>> [...]
>>>>>
>>>>> This is not sufficient. You must prevent another session trying to
>>>>> enable the other port of the replicator as this could silently fail
>>>>> the "on-going" session. Not ideal. Fail the attempt to enable a port
>>>>> if the other port is active. You could track this in software and
>>>>> fail early.
>>>>>
>>>>> Suzuki
>>>>
>>>> While I have no issue in principle with not enabling a path to a sink
>>>> that is not in use - indeed in some cases attaching to unused sinks
>>>> can cause back-pressure that slows throughput (cf TPIU) - I am
>>>> concerned that this modification is masking an underlying issue with
>>>> the platform in question.
>>>>
>>>> Should we decide to enable the diversion of different IDs to different
>>>> sinks or allow different sessions go to different sinks, then this has
>>>> potential to fail on the SC7180 SoC - and it will be difficult in
>>>> future to associate a problem with this discussion.
>>>
>>> Mike,
>>>
>>> I think thats a good point.
>>> Sai, please could we narrow down this to the real problem and may be
>>> work around it for the "device" ? Do we know which sink is causing the
>>> back pressure ? We could then push the "work around" to the replicator
>>> it is connected to.
>>>
>>> Suzuki
>>
>> Hi Suzuki, Mike,
>>
>> To add some more to the information provided earlier,
>> swao_replicator(6b06000) and etf are
>> in AOSS (Always-On-SubSystem) group. Also TPIU(connected to
>> qdss_replicator) and EUD(connected
>> to swao_replicator) sinks are unused.
>>
>> Please ignore the id filter values provided earlier.
>> Here are ID filter values after boot and before enabling replicator. 
>> As per
>> these idfilter values, we should not try to enable replicator if its 
>> already
>> enabled (in this case for swao_replicator) right?
>>
>> localhost ~ # cat
>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter0
>> 0x0
>> localhost ~ # cat
>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter1
>> 0x0
>>
>> localhost ~ # cat
>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter0
>> 0xff
>> localhost ~ # cat
>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter1
>> 0xff
>>
> 
> Looking more into replicator1(swao_replicator) values as 0x0 even after 
> replicator_reset()
> in replicator probe, I added dynamic_replicator_reset in 
> dynamic_replicator_enable()
> and am not seeing any hardlockup. Also I added some prints to check the 
> idfilter
> values before and after reset and found that its not set to 0xff even 
> after replicator_reset()
> in replicator probe, I don't see any other path setting it to 0x0.
> 
> After probe:
> 
> [    8.477669] func replicator_probe before reset replicator replicator1 
> REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> [    8.489470] func replicator_probe after reset replicator replicator1 
> REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff

AFAICS, after the reset both of them are set to 0xff.

> [    8.502738] func replicator_probe before reset replicator replicator0 
> REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> [    8.515214] func replicator_probe after reset replicator replicator0 
> REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff



> localhost ~ #
> localhost ~ #
> localhost ~ # echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
> localhost ~ #
> localhost ~ # echo 1 > /sys/bus/coresight/devices/etm0/enable_source
> [   58.490485] func dynamic_replicator_enable before reset replicator 
> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> [   58.503246] func dynamic_replicator_enable after reset replicator 
> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> [   58.520902] func dynamic_replicator_enable before reset replicator 
> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0

You need to find what is resetting the IDFILTERs to 0 for replicator1.

> [   58.533500] func dynamic_replicator_enable after reset replicator 
> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> localhost ~ #
> 
> Can we have a replicator_reset in dynamic_replicator_enable?
> 
> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c 
> b/drivers/hwtracing/coresight/coresight-replicator.c
> index e7dc1c31d20d..794f8e4c049f 100644
> --- a/drivers/hwtracing/coresight/coresight-replicator.c
> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> @@ -68,6 +68,8 @@ static int dynamic_replicator_enable(struct 
> replicator_drvdata *drvdata,
>          int rc = 0;
>          u32 reg;
> 
> +       dynamic_replicator_reset(drvdata);
> +

Again you are trying to mask an issue with this. Is the firmware
using the replicator for anything ? If so, this needs to be claimed
to prevent us from using it.

Suzuki
Sai Prakash Ranjan April 29, 2020, 1:59 p.m. UTC | #7
On 2020-04-29 19:19, Suzuki K Poulose wrote:
> On 04/29/2020 12:47 PM, Sai Prakash Ranjan wrote:
>> On 2020-04-28 17:53, Sai Prakash Ranjan wrote:
>>> On 2020-04-27 19:23, Suzuki K Poulose wrote:
>>>> On 04/27/2020 10:45 AM, Mike Leach wrote:
>>> [...]
>>>>>> 
>>>>>> This is not sufficient. You must prevent another session trying to
>>>>>> enable the other port of the replicator as this could silently 
>>>>>> fail
>>>>>> the "on-going" session. Not ideal. Fail the attempt to enable a 
>>>>>> port
>>>>>> if the other port is active. You could track this in software and
>>>>>> fail early.
>>>>>> 
>>>>>> Suzuki
>>>>> 
>>>>> While I have no issue in principle with not enabling a path to a 
>>>>> sink
>>>>> that is not in use - indeed in some cases attaching to unused sinks
>>>>> can cause back-pressure that slows throughput (cf TPIU) - I am
>>>>> concerned that this modification is masking an underlying issue 
>>>>> with
>>>>> the platform in question.
>>>>> 
>>>>> Should we decide to enable the diversion of different IDs to 
>>>>> different
>>>>> sinks or allow different sessions go to different sinks, then this 
>>>>> has
>>>>> potential to fail on the SC7180 SoC - and it will be difficult in
>>>>> future to associate a problem with this discussion.
>>>> 
>>>> Mike,
>>>> 
>>>> I think thats a good point.
>>>> Sai, please could we narrow down this to the real problem and may be
>>>> work around it for the "device" ? Do we know which sink is causing 
>>>> the
>>>> back pressure ? We could then push the "work around" to the 
>>>> replicator
>>>> it is connected to.
>>>> 
>>>> Suzuki
>>> 
>>> Hi Suzuki, Mike,
>>> 
>>> To add some more to the information provided earlier,
>>> swao_replicator(6b06000) and etf are
>>> in AOSS (Always-On-SubSystem) group. Also TPIU(connected to
>>> qdss_replicator) and EUD(connected
>>> to swao_replicator) sinks are unused.
>>> 
>>> Please ignore the id filter values provided earlier.
>>> Here are ID filter values after boot and before enabling replicator. 
>>> As per
>>> these idfilter values, we should not try to enable replicator if its 
>>> already
>>> enabled (in this case for swao_replicator) right?
>>> 
>>> localhost ~ # cat
>>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter0
>>> 0x0
>>> localhost ~ # cat
>>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter1
>>> 0x0
>>> 
>>> localhost ~ # cat
>>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter0
>>> 0xff
>>> localhost ~ # cat
>>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter1
>>> 0xff
>>> 
>> 
>> Looking more into replicator1(swao_replicator) values as 0x0 even 
>> after replicator_reset()
>> in replicator probe, I added dynamic_replicator_reset in 
>> dynamic_replicator_enable()
>> and am not seeing any hardlockup. Also I added some prints to check 
>> the idfilter
>> values before and after reset and found that its not set to 0xff even 
>> after replicator_reset()
>> in replicator probe, I don't see any other path setting it to 0x0.
>> 
>> After probe:
>> 
>> [    8.477669] func replicator_probe before reset replicator 
>> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
>> [    8.489470] func replicator_probe after reset replicator 
>> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> 
> AFAICS, after the reset both of them are set to 0xff.

Yes I see this too as we call replicator_reset() in probe. What I wanted 
to highlight was the below part where it is set to 0x0 before enabling 
dynamic replicator.

> 
>> [    8.502738] func replicator_probe before reset replicator 
>> replicator0 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
>> [    8.515214] func replicator_probe after reset replicator 
>> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> 
> 
> 
>> localhost ~ #
>> localhost ~ #
>> localhost ~ # echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
>> localhost ~ #
>> localhost ~ # echo 1 > /sys/bus/coresight/devices/etm0/enable_source
>> [   58.490485] func dynamic_replicator_enable before reset replicator 
>> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> [   58.503246] func dynamic_replicator_enable after reset replicator 
>> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> [   58.520902] func dynamic_replicator_enable before reset replicator 
>> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> 
> You need to find what is resetting the IDFILTERs to 0 for replicator1.
> 

That is right.

>> [   58.533500] func dynamic_replicator_enable after reset replicator 
>> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> localhost ~ #
>> 
>> Can we have a replicator_reset in dynamic_replicator_enable?
>> 
>> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c 
>> b/drivers/hwtracing/coresight/coresight-replicator.c
>> index e7dc1c31d20d..794f8e4c049f 100644
>> --- a/drivers/hwtracing/coresight/coresight-replicator.c
>> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
>> @@ -68,6 +68,8 @@ static int dynamic_replicator_enable(struct 
>> replicator_drvdata *drvdata,
>>          int rc = 0;
>>          u32 reg;
>> 
>> +       dynamic_replicator_reset(drvdata);
>> +
> 
> Again you are trying to mask an issue with this. Is the firmware
> using the replicator for anything ? If so, this needs to be claimed
> to prevent us from using it.
> 

I was trying to narrow down further as you suggested. There are other 
ETMs like AOP ETM which use this replicator, will need to check with the 
firmware team for details.

Thanks,
Sai
Mike Leach April 29, 2020, 2:27 p.m. UTC | #8
Hi,

On Wed, 29 Apr 2020 at 14:59, Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> On 2020-04-29 19:19, Suzuki K Poulose wrote:
> > On 04/29/2020 12:47 PM, Sai Prakash Ranjan wrote:
> >> On 2020-04-28 17:53, Sai Prakash Ranjan wrote:
> >>> On 2020-04-27 19:23, Suzuki K Poulose wrote:
> >>>> On 04/27/2020 10:45 AM, Mike Leach wrote:
> >>> [...]
> >>>>>>
> >>>>>> This is not sufficient. You must prevent another session trying to
> >>>>>> enable the other port of the replicator as this could silently
> >>>>>> fail
> >>>>>> the "on-going" session. Not ideal. Fail the attempt to enable a
> >>>>>> port
> >>>>>> if the other port is active. You could track this in software and
> >>>>>> fail early.
> >>>>>>
> >>>>>> Suzuki
> >>>>>
> >>>>> While I have no issue in principle with not enabling a path to a
> >>>>> sink
> >>>>> that is not in use - indeed in some cases attaching to unused sinks
> >>>>> can cause back-pressure that slows throughput (cf TPIU) - I am
> >>>>> concerned that this modification is masking an underlying issue
> >>>>> with
> >>>>> the platform in question.
> >>>>>
> >>>>> Should we decide to enable the diversion of different IDs to
> >>>>> different
> >>>>> sinks or allow different sessions go to different sinks, then this
> >>>>> has
> >>>>> potential to fail on the SC7180 SoC - and it will be difficult in
> >>>>> future to associate a problem with this discussion.
> >>>>
> >>>> Mike,
> >>>>
> >>>> I think thats a good point.
> >>>> Sai, please could we narrow down this to the real problem and may be
> >>>> work around it for the "device" ? Do we know which sink is causing
> >>>> the
> >>>> back pressure ? We could then push the "work around" to the
> >>>> replicator
> >>>> it is connected to.
> >>>>
> >>>> Suzuki
> >>>
> >>> Hi Suzuki, Mike,
> >>>
> >>> To add some more to the information provided earlier,
> >>> swao_replicator(6b06000) and etf are
> >>> in AOSS (Always-On-SubSystem) group. Also TPIU(connected to
> >>> qdss_replicator) and EUD(connected
> >>> to swao_replicator) sinks are unused.
> >>>
> >>> Please ignore the id filter values provided earlier.
> >>> Here are ID filter values after boot and before enabling replicator.
> >>> As per
> >>> these idfilter values, we should not try to enable replicator if its
> >>> already
> >>> enabled (in this case for swao_replicator) right?
> >>>
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter0
> >>> 0x0
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter1
> >>> 0x0
> >>>
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter0
> >>> 0xff
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter1
> >>> 0xff
> >>>
> >>
> >> Looking more into replicator1(swao_replicator) values as 0x0 even
> >> after replicator_reset()
> >> in replicator probe, I added dynamic_replicator_reset in
> >> dynamic_replicator_enable()
> >> and am not seeing any hardlockup. Also I added some prints to check
> >> the idfilter
> >> values before and after reset and found that its not set to 0xff even
> >> after replicator_reset()
> >> in replicator probe, I don't see any other path setting it to 0x0.
> >>
> >> After probe:
> >>
> >> [    8.477669] func replicator_probe before reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >> [    8.489470] func replicator_probe after reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >
> > AFAICS, after the reset both of them are set to 0xff.
>
> Yes I see this too as we call replicator_reset() in probe. What I wanted
> to highlight was the below part where it is set to 0x0 before enabling
> dynamic replicator.
>
> >
> >> [    8.502738] func replicator_probe before reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >> [    8.515214] func replicator_probe after reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >
> >
> >
> >> localhost ~ #
> >> localhost ~ #
> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
> >> localhost ~ #
> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/etm0/enable_source
> >> [   58.490485] func dynamic_replicator_enable before reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> [   58.503246] func dynamic_replicator_enable after reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> [   58.520902] func dynamic_replicator_enable before reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >
> > You need to find what is resetting the IDFILTERs to 0 for replicator1.
> >
>
> That is right.
>

By default all replicators have the IDFILTER registers set to 0 out of
hardware reset. This ensures that programmable replicators behave in
the same way as non-programmable replicators out of reset.

The  dynamic_replicator_reset() is of course a driver state reset -
which filters out all trace on the output ports. The trace is then
enabled when we set the trace path from source to sink.

It seems to me that you have 2 problems that need solving here:

1) Why does the reset_replicator() called from probe() _not_ work
correctly on replicator 1? It seems to work later if you introduce a
reset after more of the system has powered and booted. This is
startiing to look a little like a PM / clocking issue.

This failure is causing the state when we are trying to set an output
port that both branches of this replicator are enabled for output.
In effect for this replicator, setting the output port has no effect
as it is already enabled.

2) Why does having both ports of this repilicator enabled cause a hard
lockup? This is a separate hardware  / system issue.

The worst that should happen if both branches of a replicator are
enabled is that you get undesirable back pressure. (e.g. there is a
system we have seen - I think it is Juno - where there is a static
replicator feeding the TPIU and ETR - we need to disable the TPIU to
prevent undesired back pressure).

Regards

Mike


> >> [   58.533500] func dynamic_replicator_enable after reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> localhost ~ #
> >>
> >> Can we have a replicator_reset in dynamic_replicator_enable?
> >>
> >> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c
> >> b/drivers/hwtracing/coresight/coresight-replicator.c
> >> index e7dc1c31d20d..794f8e4c049f 100644
> >> --- a/drivers/hwtracing/coresight/coresight-replicator.c
> >> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> >> @@ -68,6 +68,8 @@ static int dynamic_replicator_enable(struct
> >> replicator_drvdata *drvdata,
> >>          int rc = 0;
> >>          u32 reg;
> >>
> >> +       dynamic_replicator_reset(drvdata);
> >> +
> >
> > Again you are trying to mask an issue with this. Is the firmware
> > using the replicator for anything ? If so, this needs to be claimed
> > to prevent us from using it.
> >
>
> I was trying to narrow down further as you suggested. There are other
> ETMs like AOP ETM which use this replicator, will need to check with the
> firmware team for details.
>
> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation
Sai Prakash Ranjan April 29, 2020, 2:48 p.m. UTC | #9
Hi Mike,

On 2020-04-29 19:57, Mike Leach wrote:
> Hi,
> 

[...]

>> >> Looking more into replicator1(swao_replicator) values as 0x0 even
>> >> after replicator_reset()
>> >> in replicator probe, I added dynamic_replicator_reset in
>> >> dynamic_replicator_enable()
>> >> and am not seeing any hardlockup. Also I added some prints to check
>> >> the idfilter
>> >> values before and after reset and found that its not set to 0xff even
>> >> after replicator_reset()
>> >> in replicator probe, I don't see any other path setting it to 0x0.
>> >>
>> >> After probe:
>> >>
>> >> [    8.477669] func replicator_probe before reset replicator
>> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
>> >> [    8.489470] func replicator_probe after reset replicator
>> >> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> >
>> > AFAICS, after the reset both of them are set to 0xff.
>> 
>> Yes I see this too as we call replicator_reset() in probe. What I 
>> wanted
>> to highlight was the below part where it is set to 0x0 before enabling
>> dynamic replicator.
>> 
>> >
>> >> [    8.502738] func replicator_probe before reset replicator
>> >> replicator0 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
>> >> [    8.515214] func replicator_probe after reset replicator
>> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> >
>> >
>> >
>> >> localhost ~ #
>> >> localhost ~ #
>> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
>> >> localhost ~ #
>> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/etm0/enable_source
>> >> [   58.490485] func dynamic_replicator_enable before reset replicator
>> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> >> [   58.503246] func dynamic_replicator_enable after reset replicator
>> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> >> [   58.520902] func dynamic_replicator_enable before reset replicator
>> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
>> >
>> > You need to find what is resetting the IDFILTERs to 0 for replicator1.
>> >
>> 
>> That is right.
>> 
> 
> By default all replicators have the IDFILTER registers set to 0 out of
> hardware reset. This ensures that programmable replicators behave in
> the same way as non-programmable replicators out of reset.
> 
> The  dynamic_replicator_reset() is of course a driver state reset -
> which filters out all trace on the output ports. The trace is then
> enabled when we set the trace path from source to sink.
> 

Thanks for these explanations.

> It seems to me that you have 2 problems that need solving here:
> 
> 1) Why does the reset_replicator() called from probe() _not_ work
> correctly on replicator 1? It seems to work later if you introduce a
> reset after more of the system has powered and booted. This is
> startiing to look a little like a PM / clocking issue.

reset_replicator() does work in probe correctly for both replicators, 
below logs is collected before and after reset in probe. It is later 
that it's set back to 0x0 and hence the suggestion to look at firmware 
using this replicator1.

[    8.477669] func replicator_probe before reset replicator replicator1 
REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
[    8.489470] func replicator_probe after reset replicator replicator1 
REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff

> 
> This failure is causing the state when we are trying to set an output
> port that both branches of this replicator are enabled for output.
> In effect for this replicator, setting the output port has no effect
> as it is already enabled.
> 
> 2) Why does having both ports of this repilicator enabled cause a hard
> lockup? This is a separate hardware  / system issue.
> 
> The worst that should happen if both branches of a replicator are
> enabled is that you get undesirable back pressure. (e.g. there is a
> system we have seen - I think it is Juno - where there is a static
> replicator feeding the TPIU and ETR - we need to disable the TPIU to
> prevent undesired back pressure).
> 

Ok so hardlockup is not expected because of this backpressure.

Thanks,
Sai
Mike Leach April 29, 2020, 4:58 p.m. UTC | #10
Hi,

On Wed, 29 Apr 2020 at 15:48, Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Mike,
>
> On 2020-04-29 19:57, Mike Leach wrote:
> > Hi,
> >
>
> [...]
>
> >> >> Looking more into replicator1(swao_replicator) values as 0x0 even
> >> >> after replicator_reset()
> >> >> in replicator probe, I added dynamic_replicator_reset in
> >> >> dynamic_replicator_enable()
> >> >> and am not seeing any hardlockup. Also I added some prints to check
> >> >> the idfilter
> >> >> values before and after reset and found that its not set to 0xff even
> >> >> after replicator_reset()
> >> >> in replicator probe, I don't see any other path setting it to 0x0.
> >> >>
> >> >> After probe:
> >> >>
> >> >> [    8.477669] func replicator_probe before reset replicator
> >> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >> >> [    8.489470] func replicator_probe after reset replicator
> >> >> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> >
> >> > AFAICS, after the reset both of them are set to 0xff.
> >>
> >> Yes I see this too as we call replicator_reset() in probe. What I
> >> wanted
> >> to highlight was the below part where it is set to 0x0 before enabling
> >> dynamic replicator.
> >>
> >> >
> >> >> [    8.502738] func replicator_probe before reset replicator
> >> >> replicator0 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >> >> [    8.515214] func replicator_probe after reset replicator
> >> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> >
> >> >
> >> >
> >> >> localhost ~ #
> >> >> localhost ~ #
> >> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
> >> >> localhost ~ #
> >> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/etm0/enable_source
> >> >> [   58.490485] func dynamic_replicator_enable before reset replicator
> >> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> >> [   58.503246] func dynamic_replicator_enable after reset replicator
> >> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> >> [   58.520902] func dynamic_replicator_enable before reset replicator
> >> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >> >
> >> > You need to find what is resetting the IDFILTERs to 0 for replicator1.
> >> >
> >>
> >> That is right.
> >>
> >
> > By default all replicators have the IDFILTER registers set to 0 out of
> > hardware reset. This ensures that programmable replicators behave in
> > the same way as non-programmable replicators out of reset.
> >
> > The  dynamic_replicator_reset() is of course a driver state reset -
> > which filters out all trace on the output ports. The trace is then
> > enabled when we set the trace path from source to sink.
> >
>
> Thanks for these explanations.
>
> > It seems to me that you have 2 problems that need solving here:
> >
> > 1) Why does the reset_replicator() called from probe() _not_ work
> > correctly on replicator 1? It seems to work later if you introduce a
> > reset after more of the system has powered and booted. This is
> > startiing to look a little like a PM / clocking issue.
>
> reset_replicator() does work in probe correctly for both replicators,
> below logs is collected before and after reset in probe. It is later
> that it's set back to 0x0 and hence the suggestion to look at firmware
> using this replicator1.
>
OK - sorry I read your statement saying that replicator1 was 0 after
the reset in probe(), rather than look at the logs.

From the logs it is working at the time probe() occurs, but by the
time we come to enable the replicator later, something has reset these
registers / hardware outside the control of the replicator driver.


> [    8.477669] func replicator_probe before reset replicator replicator1
> REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> [    8.489470] func replicator_probe after reset replicator replicator1
> REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>
> >
> > This failure is causing the state when we are trying to set an output
> > port that both branches of this replicator are enabled for output.
> > In effect for this replicator, setting the output port has no effect
> > as it is already enabled.
> >
> > 2) Why does having both ports of this repilicator enabled cause a hard
> > lockup? This is a separate hardware  / system issue.
> >
> > The worst that should happen if both branches of a replicator are
> > enabled is that you get undesirable back pressure. (e.g. there is a
> > system we have seen - I think it is Juno - where there is a static
> > replicator feeding the TPIU and ETR - we need to disable the TPIU to
> > prevent undesired back pressure).
> >
>
> Ok so hardlockup is not expected because of this backpressure.
>

Hardlockup is not expected, but this is not related to any possible
backpressure.

Ordinarily having both legs of a replicator enabled should not cause
system failure.

Mike

> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation
Sai Prakash Ranjan April 29, 2020, 5:11 p.m. UTC | #11
Hi Mike,

On 2020-04-29 22:28, Mike Leach wrote:
> Hi,
> 

[...]

>> >> > You need to find what is resetting the IDFILTERs to 0 for replicator1.
>> >> >
>> >>
>> >> That is right.
>> >>
>> >
>> > By default all replicators have the IDFILTER registers set to 0 out of
>> > hardware reset. This ensures that programmable replicators behave in
>> > the same way as non-programmable replicators out of reset.
>> >
>> > The  dynamic_replicator_reset() is of course a driver state reset -
>> > which filters out all trace on the output ports. The trace is then
>> > enabled when we set the trace path from source to sink.
>> >
>> 
>> Thanks for these explanations.
>> 
>> > It seems to me that you have 2 problems that need solving here:
>> >
>> > 1) Why does the reset_replicator() called from probe() _not_ work
>> > correctly on replicator 1? It seems to work later if you introduce a
>> > reset after more of the system has powered and booted. This is
>> > startiing to look a little like a PM / clocking issue.
>> 
>> reset_replicator() does work in probe correctly for both replicators,
>> below logs is collected before and after reset in probe. It is later
>> that it's set back to 0x0 and hence the suggestion to look at firmware
>> using this replicator1.
>> 
> OK - sorry I read your statement saying that replicator1 was 0 after
> the reset in probe(), rather than look at the logs.
> 
> From the logs it is working at the time probe() occurs, but by the
> time we come to enable the replicator later, something has reset these
> registers / hardware outside the control of the replicator driver.
> 

Yes, I will try to get some more information from the firmware side if 
there is anything messing up.

> 
>> [    8.477669] func replicator_probe before reset replicator 
>> replicator1
>> REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
>> [    8.489470] func replicator_probe after reset replicator 
>> replicator1
>> REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
>> 
>> >
>> > This failure is causing the state when we are trying to set an output
>> > port that both branches of this replicator are enabled for output.
>> > In effect for this replicator, setting the output port has no effect
>> > as it is already enabled.
>> >
>> > 2) Why does having both ports of this repilicator enabled cause a hard
>> > lockup? This is a separate hardware  / system issue.
>> >
>> > The worst that should happen if both branches of a replicator are
>> > enabled is that you get undesirable back pressure. (e.g. there is a
>> > system we have seen - I think it is Juno - where there is a static
>> > replicator feeding the TPIU and ETR - we need to disable the TPIU to
>> > prevent undesired back pressure).
>> >
>> 
>> Ok so hardlockup is not expected because of this backpressure.
>> 
> 
> Hardlockup is not expected, but this is not related to any possible
> backpressure.
> 
> Ordinarily having both legs of a replicator enabled should not cause
> system failure.
> 

Ok got it, thanks.

Thanks,
Sai
Sai Prakash Ranjan May 6, 2020, 7:35 a.m. UTC | #12
Hi Suzuki, Mike,

On 2020-04-29 22:41, Sai Prakash Ranjan wrote:
> Hi Mike,
> 
> On 2020-04-29 22:28, Mike Leach wrote:
>> Hi,
>> 
> 
> [...]
> 
>>> >> > You need to find what is resetting the IDFILTERs to 0 for replicator1.
>>> >> >
>>> >>
>>> >> That is right.
>>> >>
>>> >
>>> > By default all replicators have the IDFILTER registers set to 0 out of
>>> > hardware reset. This ensures that programmable replicators behave in
>>> > the same way as non-programmable replicators out of reset.
>>> >
>>> > The  dynamic_replicator_reset() is of course a driver state reset -
>>> > which filters out all trace on the output ports. The trace is then
>>> > enabled when we set the trace path from source to sink.
>>> >
>>> 
>>> Thanks for these explanations.
>>> 
>>> > It seems to me that you have 2 problems that need solving here:
>>> >
>>> > 1) Why does the reset_replicator() called from probe() _not_ work
>>> > correctly on replicator 1? It seems to work later if you introduce a
>>> > reset after more of the system has powered and booted. This is
>>> > startiing to look a little like a PM / clocking issue.
>>> 
>>> reset_replicator() does work in probe correctly for both replicators,
>>> below logs is collected before and after reset in probe. It is later
>>> that it's set back to 0x0 and hence the suggestion to look at 
>>> firmware
>>> using this replicator1.
>>> 
>> OK - sorry I read your statement saying that replicator1 was 0 after
>> the reset in probe(), rather than look at the logs.
>> 
>> From the logs it is working at the time probe() occurs, but by the
>> time we come to enable the replicator later, something has reset these
>> registers / hardware outside the control of the replicator driver.
>> 
> 
> Yes, I will try to get some more information from the firmware side if
> there is anything messing up.
> 

This turned out to be a clock/pm issue. To confirm, I just marked clk as 
critical
so that it won't be gated and I saw the replicator1(swao_replicator) 
registers
intact after probe. Also alternatively, I tried to comment out disabling 
pclk
to check if there is something wrong in amba pm and this keeps the 
registers
intact as well.

@@ -288,7 +295,7 @@ static int amba_probe(struct device *dev)
                 pm_runtime_set_suspended(dev);
                 pm_runtime_put_noidle(dev);

-               amba_put_disable_pclk(pcdev);
+               //amba_put_disable_pclk(pcdev);
                 dev_pm_domain_detach(dev, true);
         } while (0);

Thanks,
Sai
Sai Prakash Ranjan May 8, 2020, 8:53 a.m. UTC | #13
Hi Suzuki, Mike,

On 2020-05-06 13:05, Sai Prakash Ranjan wrote:
[...]

>>>> 
>>> OK - sorry I read your statement saying that replicator1 was 0 after
>>> the reset in probe(), rather than look at the logs.
>>> 
>>> From the logs it is working at the time probe() occurs, but by the
>>> time we come to enable the replicator later, something has reset 
>>> these
>>> registers / hardware outside the control of the replicator driver.
>>> 
>> 
>> Yes, I will try to get some more information from the firmware side if
>> there is anything messing up.
>> 
> 
> This turned out to be a clock/pm issue. To confirm, I just marked clk
> as critical
> so that it won't be gated and I saw the replicator1(swao_replicator) 
> registers
> intact after probe. Also alternatively, I tried to comment out 
> disabling pclk
> to check if there is something wrong in amba pm and this keeps the 
> registers
> intact as well.
> 
> @@ -288,7 +295,7 @@ static int amba_probe(struct device *dev)
>                 pm_runtime_set_suspended(dev);
>                 pm_runtime_put_noidle(dev);
> 
> -               amba_put_disable_pclk(pcdev);
> +               //amba_put_disable_pclk(pcdev);
>                 dev_pm_domain_detach(dev, true);
>         } while (0);
> 

I checked with the debug team and there is a limitation with
the replicator(swao_replicator) in the AOSS group where it
loses the idfilter register context when the clock is disabled.
This is not just in SC7180 SoC but also reported on some latest
upcoming QCOM SoCs as well and will need to be taken care in
order to enable coresight on these chipsets.

Here's what's happening -  After the replicator is initialized,
the clock is disabled in amba_pm_runtime_suspend() as a part of
pm runtime workqueue with the assumption that there will be no
loss of context after the replicator is initialized. But it doesn't
hold good with the replicators with these unfortunate limitation
and the idfilter register context is lost.

[    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator ret=0
[    5.914516] Workqueue: pm pm_runtime_work
[    5.918648] Call trace:
[    5.921185]  dump_backtrace+0x0/0x1d0
[    5.924958]  show_stack+0x2c/0x38
[    5.928382]  dump_stack+0xc0/0x104
[    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
[    5.936469]  __rpm_callback+0xe0/0x140
[    5.940332]  rpm_callback+0x38/0x98
[    5.943926]  rpm_suspend+0xec/0x618
[    5.947522]  rpm_idle+0x5c/0x3f8
[    5.950851]  pm_runtime_work+0xa8/0xc0
[    5.954718]  process_one_work+0x1f8/0x4c0
[    5.958848]  worker_thread+0x50/0x468
[    5.962623]  kthread+0x12c/0x158
[    5.965957]  ret_from_fork+0x10/0x1c

This is a platform/SoC specific replicator issue, so we can either
introduce some DT property for replicators to identify which replicator
has this limitation, check in replicator_enable() and reset the 
registers
or have something like below diff to check the idfilter registers in
replicator_enable() and then reset with clear comment specifying it’s 
the
hardware limitation on some QCOM SoCs. Please let me know your thoughts 
on
this?

diff --git a/drivers/hwtracing/coresight/coresight-replicator.c 
b/drivers/hwtracing/coresight/coresight-replicator.c
index e7dc1c31d20d..a9c039c944eb 100644
--- a/drivers/hwtracing/coresight/coresight-replicator.c
+++ b/drivers/hwtracing/coresight/coresight-replicator.c
@@ -68,6 +68,17 @@ static int dynamic_replicator_enable(struct 
replicator_drvdata *drvdata,
         int rc = 0;
         u32 reg;

+       /*
+        * On some QCOM SoCs with replicators in Always-On domain, 
disabling
+        * clock will result in replicator losing its context. Currently
+        * as a part of pm_runtime workqueue, amba_pm_runtime_suspend 
disables
+        * clock assuming the context is not lost which is not true for 
cases
+        * with hardware limitations as the above.
+        */
+       if ((readl_relaxed(drvdata->base + REPLICATOR_IDFILTER0) == 0) 
&&
+           (readl_relaxed(drvdata->base + REPLICATOR_IDFILTER1) == 0))
+               dynamic_replicator_reset(drvdata);
+
         switch (outport) {
         case 0:
                 reg = REPLICATOR_IDFILTER0;



Thanks,
Sai
Mike Leach May 11, 2020, 11:14 a.m. UTC | #14
HI,

On Fri, 8 May 2020 at 09:53, Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Suzuki, Mike,
>
> On 2020-05-06 13:05, Sai Prakash Ranjan wrote:
> [...]
>
> >>>>
> >>> OK - sorry I read your statement saying that replicator1 was 0 after
> >>> the reset in probe(), rather than look at the logs.
> >>>
> >>> From the logs it is working at the time probe() occurs, but by the
> >>> time we come to enable the replicator later, something has reset
> >>> these
> >>> registers / hardware outside the control of the replicator driver.
> >>>
> >>
> >> Yes, I will try to get some more information from the firmware side if
> >> there is anything messing up.
> >>
> >
> > This turned out to be a clock/pm issue. To confirm, I just marked clk
> > as critical
> > so that it won't be gated and I saw the replicator1(swao_replicator)
> > registers
> > intact after probe. Also alternatively, I tried to comment out
> > disabling pclk
> > to check if there is something wrong in amba pm and this keeps the
> > registers
> > intact as well.
> >
> > @@ -288,7 +295,7 @@ static int amba_probe(struct device *dev)
> >                 pm_runtime_set_suspended(dev);
> >                 pm_runtime_put_noidle(dev);
> >
> > -               amba_put_disable_pclk(pcdev);
> > +               //amba_put_disable_pclk(pcdev);
> >                 dev_pm_domain_detach(dev, true);
> >         } while (0);
> >
>
> I checked with the debug team and there is a limitation with
> the replicator(swao_replicator) in the AOSS group where it
> loses the idfilter register context when the clock is disabled.
> This is not just in SC7180 SoC but also reported on some latest
> upcoming QCOM SoCs as well and will need to be taken care in
> order to enable coresight on these chipsets.
>
> Here's what's happening -  After the replicator is initialized,
> the clock is disabled in amba_pm_runtime_suspend() as a part of
> pm runtime workqueue with the assumption that there will be no
> loss of context after the replicator is initialized. But it doesn't
> hold good with the replicators with these unfortunate limitation
> and the idfilter register context is lost.
>
> [    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator ret=0
> [    5.914516] Workqueue: pm pm_runtime_work
> [    5.918648] Call trace:
> [    5.921185]  dump_backtrace+0x0/0x1d0
> [    5.924958]  show_stack+0x2c/0x38
> [    5.928382]  dump_stack+0xc0/0x104
> [    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
> [    5.936469]  __rpm_callback+0xe0/0x140
> [    5.940332]  rpm_callback+0x38/0x98
> [    5.943926]  rpm_suspend+0xec/0x618
> [    5.947522]  rpm_idle+0x5c/0x3f8
> [    5.950851]  pm_runtime_work+0xa8/0xc0
> [    5.954718]  process_one_work+0x1f8/0x4c0
> [    5.958848]  worker_thread+0x50/0x468
> [    5.962623]  kthread+0x12c/0x158
> [    5.965957]  ret_from_fork+0x10/0x1c
>
> This is a platform/SoC specific replicator issue, so we can either
> introduce some DT property for replicators to identify which replicator
> has this limitation, check in replicator_enable() and reset the
> registers
> or have something like below diff to check the idfilter registers in
> replicator_enable() and then reset with clear comment specifying it’s
> the
> hardware limitation on some QCOM SoCs. Please let me know your thoughts
> on
> this?
>

1) does this replicator part have a unique ID that differs from the
standard ARM designed replicators?
If so perhaps link the modification into this. (even if the part no in
PIDR0/1 is the same the UCI should be different for a different
implementation)

2) We have used DT properties in the past - (e.g. scatter gather in
TMC) where hardware compatibility issues have impacted on the
operation of a coresight component. This is further complicated by the
fact that an ACPI property would be needed as well.

3) The sysfs access to FILTERID0/1 on this replicator is going to show
different values than on a standard replicator (0x00 instead of 0xFF).
Does this need to be addressed?

4 ) An alternative approach could be to model the driver on the ETM /
CTI drivers where the register values are held in the driver structure
and only applied on enable / disable.

Thoughts?

Regards

Mike



> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c
> b/drivers/hwtracing/coresight/coresight-replicator.c
> index e7dc1c31d20d..a9c039c944eb 100644
> --- a/drivers/hwtracing/coresight/coresight-replicator.c
> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> @@ -68,6 +68,17 @@ static int dynamic_replicator_enable(struct
> replicator_drvdata *drvdata,
>          int rc = 0;
>          u32 reg;
>
> +       /*
> +        * On some QCOM SoCs with replicators in Always-On domain,
> disabling
> +        * clock will result in replicator losing its context. Currently
> +        * as a part of pm_runtime workqueue, amba_pm_runtime_suspend
> disables
> +        * clock assuming the context is not lost which is not true for
> cases
> +        * with hardware limitations as the above.
> +        */
> +       if ((readl_relaxed(drvdata->base + REPLICATOR_IDFILTER0) == 0)
> &&
> +           (readl_relaxed(drvdata->base + REPLICATOR_IDFILTER1) == 0))
> +               dynamic_replicator_reset(drvdata);
> +
>          switch (outport) {
>          case 0:
>                  reg = REPLICATOR_IDFILTER0;
>
>
>
> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation
Sai Prakash Ranjan May 11, 2020, 2:16 p.m. UTC | #15
Hi Mike,

On 2020-05-11 16:44, Mike Leach wrote:
[...]

>> 
>> I checked with the debug team and there is a limitation with
>> the replicator(swao_replicator) in the AOSS group where it
>> loses the idfilter register context when the clock is disabled.
>> This is not just in SC7180 SoC but also reported on some latest
>> upcoming QCOM SoCs as well and will need to be taken care in
>> order to enable coresight on these chipsets.
>> 
>> Here's what's happening -  After the replicator is initialized,
>> the clock is disabled in amba_pm_runtime_suspend() as a part of
>> pm runtime workqueue with the assumption that there will be no
>> loss of context after the replicator is initialized. But it doesn't
>> hold good with the replicators with these unfortunate limitation
>> and the idfilter register context is lost.
>> 
>> [    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator 
>> ret=0
>> [    5.914516] Workqueue: pm pm_runtime_work
>> [    5.918648] Call trace:
>> [    5.921185]  dump_backtrace+0x0/0x1d0
>> [    5.924958]  show_stack+0x2c/0x38
>> [    5.928382]  dump_stack+0xc0/0x104
>> [    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
>> [    5.936469]  __rpm_callback+0xe0/0x140
>> [    5.940332]  rpm_callback+0x38/0x98
>> [    5.943926]  rpm_suspend+0xec/0x618
>> [    5.947522]  rpm_idle+0x5c/0x3f8
>> [    5.950851]  pm_runtime_work+0xa8/0xc0
>> [    5.954718]  process_one_work+0x1f8/0x4c0
>> [    5.958848]  worker_thread+0x50/0x468
>> [    5.962623]  kthread+0x12c/0x158
>> [    5.965957]  ret_from_fork+0x10/0x1c
>> 
>> This is a platform/SoC specific replicator issue, so we can either
>> introduce some DT property for replicators to identify which 
>> replicator
>> has this limitation, check in replicator_enable() and reset the
>> registers
>> or have something like below diff to check the idfilter registers in
>> replicator_enable() and then reset with clear comment specifying it’s
>> the
>> hardware limitation on some QCOM SoCs. Please let me know your 
>> thoughts
>> on
>> this?
>> 

Sorry for hurrying up and sending the patch - 
https://lore.kernel.org/patchwork/patch/1239923/.
I will send v2 based on further feedbacks here or there.

> 
> 1) does this replicator part have a unique ID that differs from the
> standard ARM designed replicators?
> If so perhaps link the modification into this. (even if the part no in
> PIDR0/1 is the same the UCI should be different for a different
> implementation)
> 

pid=0x2bb909 for both replicators. So part number is same.
UCI will be different for different implementation(QCOM maybe different 
from ARM),
but will it be different for different replicators under the same 
impl(i.e., on QCOM).

> 2) We have used DT properties in the past - (e.g. scatter gather in
> TMC) where hardware compatibility issues have impacted on the
> operation of a coresight component. This is further complicated by the
> fact that an ACPI property would be needed as well.
> 

Yes, this was also one option which I had mentioned. But as you said we 
need
to have an ACPI property as well and these systems with limitations are 
DT based.

> 3) The sysfs access to FILTERID0/1 on this replicator is going to show
> different values than on a standard replicator (0x00 instead of 0xFF).
> Does this need to be addressed?
> 

I don't think we need to change this because its actually showing the 
right
values for the replicator.

> 4 ) An alternative approach could be to model the driver on the ETM /
> CTI drivers where the register values are held in the driver structure
> and only applied on enable / disable.
> 

This looks good to me since we really don't need to reset replicator in 
probe,
we need to reset it only in replicator_enable() and that ensures clocks 
are enabled
without having to assume things(from amba) about context being lost or 
not when
clocks are disabled since that is implementation defined anyways.

But, why can't we just move replicator_reset() from probe to 
replicator_enable()?
Anything wrong with it? Seems like right thing to do because we will be 
having
clocks enabled when we touch the replicator registers and only in the 
enable
path.

Thanks,
Sai
Suzuki K Poulose May 11, 2020, 2:30 p.m. UTC | #16
On 05/11/2020 03:16 PM, Sai Prakash Ranjan wrote:
> Hi Mike,
> 
> On 2020-05-11 16:44, Mike Leach wrote:
> [...]
> 
>>>
>>> I checked with the debug team and there is a limitation with
>>> the replicator(swao_replicator) in the AOSS group where it
>>> loses the idfilter register context when the clock is disabled.
>>> This is not just in SC7180 SoC but also reported on some latest
>>> upcoming QCOM SoCs as well and will need to be taken care in
>>> order to enable coresight on these chipsets.
>>>
>>> Here's what's happening -  After the replicator is initialized,
>>> the clock is disabled in amba_pm_runtime_suspend() as a part of
>>> pm runtime workqueue with the assumption that there will be no
>>> loss of context after the replicator is initialized. But it doesn't
>>> hold good with the replicators with these unfortunate limitation
>>> and the idfilter register context is lost.
>>>
>>> [    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator ret=0
>>> [    5.914516] Workqueue: pm pm_runtime_work
>>> [    5.918648] Call trace:
>>> [    5.921185]  dump_backtrace+0x0/0x1d0
>>> [    5.924958]  show_stack+0x2c/0x38
>>> [    5.928382]  dump_stack+0xc0/0x104
>>> [    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
>>> [    5.936469]  __rpm_callback+0xe0/0x140
>>> [    5.940332]  rpm_callback+0x38/0x98
>>> [    5.943926]  rpm_suspend+0xec/0x618
>>> [    5.947522]  rpm_idle+0x5c/0x3f8
>>> [    5.950851]  pm_runtime_work+0xa8/0xc0
>>> [    5.954718]  process_one_work+0x1f8/0x4c0
>>> [    5.958848]  worker_thread+0x50/0x468
>>> [    5.962623]  kthread+0x12c/0x158
>>> [    5.965957]  ret_from_fork+0x10/0x1c
>>>
>>> This is a platform/SoC specific replicator issue, so we can either
>>> introduce some DT property for replicators to identify which replicator
>>> has this limitation, check in replicator_enable() and reset the
>>> registers
>>> or have something like below diff to check the idfilter registers in
>>> replicator_enable() and then reset with clear comment specifying it’s
>>> the
>>> hardware limitation on some QCOM SoCs. Please let me know your thoughts
>>> on
>>> this?
>>>
> 
> Sorry for hurrying up and sending the patch - 
> https://lore.kernel.org/patchwork/patch/1239923/.
> I will send v2 based on further feedbacks here or there.
> 
>>
>> 1) does this replicator part have a unique ID that differs from the
>> standard ARM designed replicators?
>> If so perhaps link the modification into this. (even if the part no in
>> PIDR0/1 is the same the UCI should be different for a different
>> implementation)
>>
> 
> pid=0x2bb909 for both replicators. So part number is same.
> UCI will be different for different implementation(QCOM maybe different 
> from ARM),
> but will it be different for different replicators under the same 
> impl(i.e., on QCOM).

May be use PIDR4.DES_2 to match the Implementor and apply the work
around for all QCOM replicators ?

To me that sounds the best option.

Suzuki
Sai Prakash Ranjan May 11, 2020, 2:34 p.m. UTC | #17
On 2020-05-11 19:46, Sai Prakash Ranjan wrote:
> Hi Mike,
> 
> On 2020-05-11 16:44, Mike Leach wrote:
> [...]
> 
>>> 
>>> I checked with the debug team and there is a limitation with
>>> the replicator(swao_replicator) in the AOSS group where it
>>> loses the idfilter register context when the clock is disabled.
>>> This is not just in SC7180 SoC but also reported on some latest
>>> upcoming QCOM SoCs as well and will need to be taken care in
>>> order to enable coresight on these chipsets.
>>> 
>>> Here's what's happening -  After the replicator is initialized,
>>> the clock is disabled in amba_pm_runtime_suspend() as a part of
>>> pm runtime workqueue with the assumption that there will be no
>>> loss of context after the replicator is initialized. But it doesn't
>>> hold good with the replicators with these unfortunate limitation
>>> and the idfilter register context is lost.
>>> 
>>> [    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator 
>>> ret=0
>>> [    5.914516] Workqueue: pm pm_runtime_work
>>> [    5.918648] Call trace:
>>> [    5.921185]  dump_backtrace+0x0/0x1d0
>>> [    5.924958]  show_stack+0x2c/0x38
>>> [    5.928382]  dump_stack+0xc0/0x104
>>> [    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
>>> [    5.936469]  __rpm_callback+0xe0/0x140
>>> [    5.940332]  rpm_callback+0x38/0x98
>>> [    5.943926]  rpm_suspend+0xec/0x618
>>> [    5.947522]  rpm_idle+0x5c/0x3f8
>>> [    5.950851]  pm_runtime_work+0xa8/0xc0
>>> [    5.954718]  process_one_work+0x1f8/0x4c0
>>> [    5.958848]  worker_thread+0x50/0x468
>>> [    5.962623]  kthread+0x12c/0x158
>>> [    5.965957]  ret_from_fork+0x10/0x1c
>>> 
>>> This is a platform/SoC specific replicator issue, so we can either
>>> introduce some DT property for replicators to identify which 
>>> replicator
>>> has this limitation, check in replicator_enable() and reset the
>>> registers
>>> or have something like below diff to check the idfilter registers in
>>> replicator_enable() and then reset with clear comment specifying it’s
>>> the
>>> hardware limitation on some QCOM SoCs. Please let me know your 
>>> thoughts
>>> on
>>> this?
>>> 
> 
> Sorry for hurrying up and sending the patch -
> https://lore.kernel.org/patchwork/patch/1239923/.
> I will send v2 based on further feedbacks here or there.
> 
>> 
>> 1) does this replicator part have a unique ID that differs from the
>> standard ARM designed replicators?
>> If so perhaps link the modification into this. (even if the part no in
>> PIDR0/1 is the same the UCI should be different for a different
>> implementation)
>> 
> 
> pid=0x2bb909 for both replicators. So part number is same.
> UCI will be different for different implementation(QCOM maybe
> different from ARM),
> but will it be different for different replicators under the same
> impl(i.e., on QCOM).
> 

Here is the cid=0xb105900d for both replicators.

Thanks,
Sai
Sai Prakash Ranjan May 11, 2020, 2:41 p.m. UTC | #18
Hi Suzuki,

On 2020-05-11 20:00, Suzuki K Poulose wrote:
> On 05/11/2020 03:16 PM, Sai Prakash Ranjan wrote:
>> Hi Mike,
>> 
>> On 2020-05-11 16:44, Mike Leach wrote:
>> [...]
>> 
>>>> 
>>>> I checked with the debug team and there is a limitation with
>>>> the replicator(swao_replicator) in the AOSS group where it
>>>> loses the idfilter register context when the clock is disabled.
>>>> This is not just in SC7180 SoC but also reported on some latest
>>>> upcoming QCOM SoCs as well and will need to be taken care in
>>>> order to enable coresight on these chipsets.
>>>> 
>>>> Here's what's happening -  After the replicator is initialized,
>>>> the clock is disabled in amba_pm_runtime_suspend() as a part of
>>>> pm runtime workqueue with the assumption that there will be no
>>>> loss of context after the replicator is initialized. But it doesn't
>>>> hold good with the replicators with these unfortunate limitation
>>>> and the idfilter register context is lost.
>>>> 
>>>> [    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator 
>>>> ret=0
>>>> [    5.914516] Workqueue: pm pm_runtime_work
>>>> [    5.918648] Call trace:
>>>> [    5.921185]  dump_backtrace+0x0/0x1d0
>>>> [    5.924958]  show_stack+0x2c/0x38
>>>> [    5.928382]  dump_stack+0xc0/0x104
>>>> [    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
>>>> [    5.936469]  __rpm_callback+0xe0/0x140
>>>> [    5.940332]  rpm_callback+0x38/0x98
>>>> [    5.943926]  rpm_suspend+0xec/0x618
>>>> [    5.947522]  rpm_idle+0x5c/0x3f8
>>>> [    5.950851]  pm_runtime_work+0xa8/0xc0
>>>> [    5.954718]  process_one_work+0x1f8/0x4c0
>>>> [    5.958848]  worker_thread+0x50/0x468
>>>> [    5.962623]  kthread+0x12c/0x158
>>>> [    5.965957]  ret_from_fork+0x10/0x1c
>>>> 
>>>> This is a platform/SoC specific replicator issue, so we can either
>>>> introduce some DT property for replicators to identify which 
>>>> replicator
>>>> has this limitation, check in replicator_enable() and reset the
>>>> registers
>>>> or have something like below diff to check the idfilter registers in
>>>> replicator_enable() and then reset with clear comment specifying 
>>>> it’s
>>>> the
>>>> hardware limitation on some QCOM SoCs. Please let me know your 
>>>> thoughts
>>>> on
>>>> this?
>>>> 
>> 
>> Sorry for hurrying up and sending the patch - 
>> https://lore.kernel.org/patchwork/patch/1239923/.
>> I will send v2 based on further feedbacks here or there.
>> 
>>> 
>>> 1) does this replicator part have a unique ID that differs from the
>>> standard ARM designed replicators?
>>> If so perhaps link the modification into this. (even if the part no 
>>> in
>>> PIDR0/1 is the same the UCI should be different for a different
>>> implementation)
>>> 
>> 
>> pid=0x2bb909 for both replicators. So part number is same.
>> UCI will be different for different implementation(QCOM maybe 
>> different from ARM),
>> but will it be different for different replicators under the same 
>> impl(i.e., on QCOM).
> 
> May be use PIDR4.DES_2 to match the Implementor and apply the work
> around for all QCOM replicators ?
> 
> To me that sounds the best option.
> 

Ok we can do this as well, but just for my understanding, why do we need 
to reset replicators
in replicator_probe() and not in replicator_enable()? Are we accessing 
anything before
we enable replicators?

Thanks,
Sai
Mike Leach May 12, 2020, 11:49 a.m. UTC | #19
HI,

On Mon, 11 May 2020 at 15:41, Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Suzuki,
>
> On 2020-05-11 20:00, Suzuki K Poulose wrote:
> > On 05/11/2020 03:16 PM, Sai Prakash Ranjan wrote:
> >> Hi Mike,
> >>
> >> On 2020-05-11 16:44, Mike Leach wrote:
> >> [...]
> >>
I have reviewed the replicator driver, and compared to all the other CS drivers.
This driver appears to be the only one that sets hardware values in
probe() and expects them to remain in place on enable, and uses that
state for programming decisions later, despite telling the PM
infrastructure that it is clear to suspend the device.

Now we have a system where the replicator hardware is behaving
differently under the driver, but is it behaving unreasonably?
> >>>>
> >>>> I checked with the debug team and there is a limitation with
> >>>> the replicator(swao_replicator) in the AOSS group where it
> >>>> loses the idfilter register context when the clock is disabled.
> >>>> This is not just in SC7180 SoC but also reported on some latest
> >>>> upcoming QCOM SoCs as well and will need to be taken care in
> >>>> order to enable coresight on these chipsets.
> >>>>
> >>>> Here's what's happening -  After the replicator is initialized,
> >>>> the clock is disabled in amba_pm_runtime_suspend() as a part of
> >>>> pm runtime workqueue with the assumption that there will be no
> >>>> loss of context after the replicator is initialized. But it doesn't
> >>>> hold good with the replicators with these unfortunate limitation
> >>>> and the idfilter register context is lost.
> >>>>
> >>>> [    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator
> >>>> ret=0
> >>>> [    5.914516] Workqueue: pm pm_runtime_work
> >>>> [    5.918648] Call trace:
> >>>> [    5.921185]  dump_backtrace+0x0/0x1d0
> >>>> [    5.924958]  show_stack+0x2c/0x38
> >>>> [    5.928382]  dump_stack+0xc0/0x104
> >>>> [    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
> >>>> [    5.936469]  __rpm_callback+0xe0/0x140
> >>>> [    5.940332]  rpm_callback+0x38/0x98
> >>>> [    5.943926]  rpm_suspend+0xec/0x618
> >>>> [    5.947522]  rpm_idle+0x5c/0x3f8
> >>>> [    5.950851]  pm_runtime_work+0xa8/0xc0
> >>>> [    5.954718]  process_one_work+0x1f8/0x4c0
> >>>> [    5.958848]  worker_thread+0x50/0x468
> >>>> [    5.962623]  kthread+0x12c/0x158
> >>>> [    5.965957]  ret_from_fork+0x10/0x1c
> >>>>
> >>>> This is a platform/SoC specific replicator issue, so we can either
> >>>> introduce some DT property for replicators to identify which
> >>>> replicator
> >>>> has this limitation, check in replicator_enable() and reset the
> >>>> registers
> >>>> or have something like below diff to check the idfilter registers in
> >>>> replicator_enable() and then reset with clear comment specifying
> >>>> it’s
> >>>> the
> >>>> hardware limitation on some QCOM SoCs. Please let me know your
> >>>> thoughts
> >>>> on
> >>>> this?
> >>>>
> >>
> >> Sorry for hurrying up and sending the patch -
> >> https://lore.kernel.org/patchwork/patch/1239923/.
> >> I will send v2 based on further feedbacks here or there.
> >>
> >>>
> >>> 1) does this replicator part have a unique ID that differs from the
> >>> standard ARM designed replicators?
> >>> If so perhaps link the modification into this. (even if the part no
> >>> in
> >>> PIDR0/1 is the same the UCI should be different for a different
> >>> implementation)
> >>>
I have reviewed the replicator driver, and compared to all the other CS drivers.
This driver appears to be the only one that sets hardware values in
probe() and expects them to remain in place on enable, and uses that
state for programming decisions later, despite telling the PM
infrastructure that it is clear to suspend the device.

Now we have a system where the replicator hardware is behaving
differently under the driver, but is it behaving unreasonably?
> >>
> >> pid=0x2bb909 for both replicators. So part number is same.
> >> UCI will be different for different implementation(QCOM maybe
> >> different from ARM),
> >> but will it be different for different replicators under the same
> >> impl(i.e., on QCOM).
> >
> > May be use PIDR4.DES_2 to match the Implementor and apply the work
> > around for all QCOM replicators ?
> >
> > To me that sounds the best option.
> >
>

I agree, if it can be established that the register values that make
up UCI (pid0-4, devarch, devtype, PID:CLASS==0x9), can correctly
identify the parts then a flag can be set in the probe() function and
acted on during the enable() function.

> Ok we can do this as well, but just for my understanding, why do we need
> to reset replicators
> in replicator_probe() and not in replicator_enable()? Are we accessing
> anything before
> we enable replicators?
>

This was a design decision made by the original driver writer. A
normal AMBA device should not lose context due to clock removal (see
drivers/amba/bus.c), so resetting in probe means this operation is
done only once, rather than add overhead in the enable() function,and
later decisions can be made according to the state of the registers
set.

As you have pointed out, for this replicator implementation  the
context is unfortunately not retained when clocks are removed - so an
alternative method is required.

perhaps something like:-

probe()
...
if (match_id_non_persistent_state_regs(ID))
    drvdata->check_filter_val_on_enable;
....

and a re-write of enable:-

enable()
...
CS_UNLOCK()
id0val = read(IDFILTER0);
id1val = read(IDFILTER1);

/* some replicator designs lose context when AMBA clocks are removed -
check for this */
if (drvdata->check_filter_val_on_enable && (id0val == id1val == 0x0))
   id0val = id1val = 0xff;

if(id0xal == id1val == 0xff)
   rc =  claim_device()

if (!rc)
   switch (outport)
      case 0: id0val  = 0x0; break
      case 1: id1va; = 0x0; break;
     default: rc = -EINVAL;

if (!rc)
   write(id0val);
   write(id1val);
CS_LOCK()
return rc;
....

Given that the access to the enable() function is predicated on a
reference count per active port, there is also a case for dropping the
check_filter_val_on_enable flag completely - once one port is active,
then the device will remain enabled until both ports are inactive.
This still allows for future development of selective filtering per
port.

One other point here - there is a case as I mentioned above for moving
to a stored value model for the driver - as this is the only coresight
driver that appears to set state in the probe() function rather than
write all on enable.
This however would necessitate a more comprehensive re-write.

Regards

Mike


> Thanks,
> Sai
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Mathieu Poirier May 12, 2020, 5:45 p.m. UTC | #20
On Tue, 12 May 2020 at 05:49, Mike Leach <mike.leach@linaro.org> wrote:
>
> HI,
>
> On Mon, 11 May 2020 at 15:41, Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
> >
> > Hi Suzuki,
> >
> > On 2020-05-11 20:00, Suzuki K Poulose wrote:
> > > On 05/11/2020 03:16 PM, Sai Prakash Ranjan wrote:
> > >> Hi Mike,
> > >>
> > >> On 2020-05-11 16:44, Mike Leach wrote:
> > >> [...]
> > >>
> I have reviewed the replicator driver, and compared to all the other CS drivers.
> This driver appears to be the only one that sets hardware values in
> probe() and expects them to remain in place on enable, and uses that
> state for programming decisions later, despite telling the PM
> infrastructure that it is clear to suspend the device.
>
> Now we have a system where the replicator hardware is behaving
> differently under the driver, but is it behaving unreasonably?
> > >>>>
> > >>>> I checked with the debug team and there is a limitation with
> > >>>> the replicator(swao_replicator) in the AOSS group where it
> > >>>> loses the idfilter register context when the clock is disabled.
> > >>>> This is not just in SC7180 SoC but also reported on some latest
> > >>>> upcoming QCOM SoCs as well and will need to be taken care in
> > >>>> order to enable coresight on these chipsets.
> > >>>>
> > >>>> Here's what's happening -  After the replicator is initialized,
> > >>>> the clock is disabled in amba_pm_runtime_suspend() as a part of
> > >>>> pm runtime workqueue with the assumption that there will be no
> > >>>> loss of context after the replicator is initialized. But it doesn't
> > >>>> hold good with the replicators with these unfortunate limitation
> > >>>> and the idfilter register context is lost.
> > >>>>
> > >>>> [    5.889406] amba_pm_runtime_suspend devname=6b06000.replicator
> > >>>> ret=0
> > >>>> [    5.914516] Workqueue: pm pm_runtime_work
> > >>>> [    5.918648] Call trace:
> > >>>> [    5.921185]  dump_backtrace+0x0/0x1d0
> > >>>> [    5.924958]  show_stack+0x2c/0x38
> > >>>> [    5.928382]  dump_stack+0xc0/0x104
> > >>>> [    5.931896]  amba_pm_runtime_suspend+0xd8/0xe0
> > >>>> [    5.936469]  __rpm_callback+0xe0/0x140
> > >>>> [    5.940332]  rpm_callback+0x38/0x98
> > >>>> [    5.943926]  rpm_suspend+0xec/0x618
> > >>>> [    5.947522]  rpm_idle+0x5c/0x3f8
> > >>>> [    5.950851]  pm_runtime_work+0xa8/0xc0
> > >>>> [    5.954718]  process_one_work+0x1f8/0x4c0
> > >>>> [    5.958848]  worker_thread+0x50/0x468
> > >>>> [    5.962623]  kthread+0x12c/0x158
> > >>>> [    5.965957]  ret_from_fork+0x10/0x1c
> > >>>>
> > >>>> This is a platform/SoC specific replicator issue, so we can either
> > >>>> introduce some DT property for replicators to identify which
> > >>>> replicator
> > >>>> has this limitation, check in replicator_enable() and reset the
> > >>>> registers
> > >>>> or have something like below diff to check the idfilter registers in
> > >>>> replicator_enable() and then reset with clear comment specifying
> > >>>> it’s
> > >>>> the
> > >>>> hardware limitation on some QCOM SoCs. Please let me know your
> > >>>> thoughts
> > >>>> on
> > >>>> this?
> > >>>>
> > >>
> > >> Sorry for hurrying up and sending the patch -
> > >> https://lore.kernel.org/patchwork/patch/1239923/.
> > >> I will send v2 based on further feedbacks here or there.
> > >>
> > >>>
> > >>> 1) does this replicator part have a unique ID that differs from the
> > >>> standard ARM designed replicators?
> > >>> If so perhaps link the modification into this. (even if the part no
> > >>> in
> > >>> PIDR0/1 is the same the UCI should be different for a different
> > >>> implementation)
> > >>>
> I have reviewed the replicator driver, and compared to all the other CS drivers.
> This driver appears to be the only one that sets hardware values in
> probe() and expects them to remain in place on enable, and uses that
> state for programming decisions later, despite telling the PM
> infrastructure that it is clear to suspend the device.
>
> Now we have a system where the replicator hardware is behaving
> differently under the driver, but is it behaving unreasonably?
> > >>
> > >> pid=0x2bb909 for both replicators. So part number is same.
> > >> UCI will be different for different implementation(QCOM maybe
> > >> different from ARM),
> > >> but will it be different for different replicators under the same
> > >> impl(i.e., on QCOM).
> > >
> > > May be use PIDR4.DES_2 to match the Implementor and apply the work
> > > around for all QCOM replicators ?
> > >
> > > To me that sounds the best option.
> > >
> >
>
> I agree, if it can be established that the register values that make
> up UCI (pid0-4, devarch, devtype, PID:CLASS==0x9), can correctly
> identify the parts then a flag can be set in the probe() function and
> acted on during the enable() function.
>
> > Ok we can do this as well, but just for my understanding, why do we need
> > to reset replicators
> > in replicator_probe() and not in replicator_enable()? Are we accessing
> > anything before
> > we enable replicators?
> >
>
> This was a design decision made by the original driver writer. A
> normal AMBA device should not lose context due to clock removal (see
> drivers/amba/bus.c), so resetting in probe means this operation is
> done only once, rather than add overhead in the enable() function,and
> later decisions can be made according to the state of the registers
> set.
>
> As you have pointed out, for this replicator implementation  the
> context is unfortunately not retained when clocks are removed - so an
> alternative method is required.
>
> perhaps something like:-
>
> probe()
> ...
> if (match_id_non_persistent_state_regs(ID))
>     drvdata->check_filter_val_on_enable;
> ....
>
> and a re-write of enable:-
>
> enable()
> ...
> CS_UNLOCK()
> id0val = read(IDFILTER0);
> id1val = read(IDFILTER1);
>
> /* some replicator designs lose context when AMBA clocks are removed -
> check for this */
> if (drvdata->check_filter_val_on_enable && (id0val == id1val == 0x0))
>    id0val = id1val = 0xff;
>
> if(id0xal == id1val == 0xff)
>    rc =  claim_device()
>
> if (!rc)
>    switch (outport)
>       case 0: id0val  = 0x0; break
>       case 1: id1va; = 0x0; break;
>      default: rc = -EINVAL;
>
> if (!rc)
>    write(id0val);
>    write(id1val);
> CS_LOCK()
> return rc;
> ....
>
> Given that the access to the enable() function is predicated on a
> reference count per active port, there is also a case for dropping the
> check_filter_val_on_enable flag completely - once one port is active,
> then the device will remain enabled until both ports are inactive.
> This still allows for future development of selective filtering per
> port.
>
> One other point here - there is a case as I mentioned above for moving
> to a stored value model for the driver - as this is the only coresight
> driver that appears to set state in the probe() function rather than
> write all on enable.

I favour that option.  Looking at the funnel driver we may have the
same issue with the port configuration, but we can have a look at that
if/when it becomes an issue.

> This however would necessitate a more comprehensive re-write.

Maybe a little more effort but overall a better approach.

Thanks,
Mathieu

>
> Regards
>
> Mike
>
>
> > Thanks,
> > Sai
> > --
> > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> > member
> > of Code Aurora Forum, hosted by The Linux Foundation
> --
> Mike Leach
> Principal Engineer, ARM Ltd.
> Manchester Design Centre. UK
Sai Prakash Ranjan May 12, 2020, 5:46 p.m. UTC | #21
Hi Mike,

On 2020-05-12 17:19, Mike Leach wrote:
[...]

>> >>
>> >> Sorry for hurrying up and sending the patch -
>> >> https://lore.kernel.org/patchwork/patch/1239923/.
>> >> I will send v2 based on further feedbacks here or there.
>> >>
>> >>>
>> >>> 1) does this replicator part have a unique ID that differs from the
>> >>> standard ARM designed replicators?
>> >>> If so perhaps link the modification into this. (even if the part no
>> >>> in
>> >>> PIDR0/1 is the same the UCI should be different for a different
>> >>> implementation)
>> >>>
> I have reviewed the replicator driver, and compared to all the other CS 
> drivers.
> This driver appears to be the only one that sets hardware values in
> probe() and expects them to remain in place on enable, and uses that
> state for programming decisions later, despite telling the PM
> infrastructure that it is clear to suspend the device.
> 
> Now we have a system where the replicator hardware is behaving
> differently under the driver, but is it behaving unreasonably?

Thanks for taking your time to review this. For new replicator behaving
unreasonably, I think the assumption that the context is not lost on
disabling clock is flawed since its implementation defined. Is such
assumption documented in any TRM?

>> >>
>> >> pid=0x2bb909 for both replicators. So part number is same.
>> >> UCI will be different for different implementation(QCOM maybe
>> >> different from ARM),
>> >> but will it be different for different replicators under the same
>> >> impl(i.e., on QCOM).
>> >
>> > May be use PIDR4.DES_2 to match the Implementor and apply the work
>> > around for all QCOM replicators ?
>> >
>> > To me that sounds the best option.
>> >
>> 
> 
> I agree, if it can be established that the register values that make
> up UCI (pid0-4, devarch, devtype, PID:CLASS==0x9), can correctly
> identify the parts then a flag can be set in the probe() function and
> acted on during the enable() function.
> 

So here I have a doubt as to why we need to use UCI because PID = 
0x2bb909
and CID = 0xb105900d are same for both replicators, so UCI won't 
identify the
different replicators(in same implementation i.e., on QCOM) here.
Am I missing something?

Thats why I think Suzuki suggested to use PIDR4_DES2 and check for QCOM 
impl
and add a workaround for all replicators, something like below: (will 
need cleaning)

#define PIDR4_DES2	0xFD0

if (FIELD_GET(GENMASK(3, 0), readl_relaxed(drvdata->base + PIDR4_DES2)) 
== 0x4)
	id0val = id1val = 0xff;

... and the rest as you suggested.

> 
> This was a design decision made by the original driver writer. A
> normal AMBA device should not lose context due to clock removal (see
> drivers/amba/bus.c), so resetting in probe means this operation is
> done only once, rather than add overhead in the enable() function,and
> later decisions can be made according to the state of the registers
> set.
> 
> As you have pointed out, for this replicator implementation  the
> context is unfortunately not retained when clocks are removed - so an
> alternative method is required.
> 
> perhaps something like:-
> 
> probe()
> ...
> if (match_id_non_persistent_state_regs(ID))
>     drvdata->check_filter_val_on_enable;
> ....
> 
> and a re-write of enable:-
> 
> enable()
> ...
> CS_UNLOCK()
> id0val = read(IDFILTER0);
> id1val = read(IDFILTER1);
> 
> /* some replicator designs lose context when AMBA clocks are removed -
> check for this */
> if (drvdata->check_filter_val_on_enable && (id0val == id1val == 0x0))
>    id0val = id1val = 0xff;
> 
> if(id0xal == id1val == 0xff)
>    rc =  claim_device()
> 
> if (!rc)
>    switch (outport)
>       case 0: id0val  = 0x0; break
>       case 1: id1va; = 0x0; break;
>      default: rc = -EINVAL;
> 
> if (!rc)
>    write(id0val);
>    write(id1val);
> CS_LOCK()
> return rc;
> ....
> 

Thanks for this detailed idea for workaround. I will add this once we
know whether we need to use UCI or PIDR4_DES2.

> Given that the access to the enable() function is predicated on a
> reference count per active port, there is also a case for dropping the
> check_filter_val_on_enable flag completely - once one port is active,
> then the device will remain enabled until both ports are inactive.
> This still allows for future development of selective filtering per
> port.
> 
> One other point here - there is a case as I mentioned above for moving
> to a stored value model for the driver - as this is the only coresight
> driver that appears to set state in the probe() function rather than
> write all on enable.
> This however would necessitate a more comprehensive re-write.
> 

I would defer this to experts as you or suzuki will have more idea
regarding this than me.

Thanks,
Sai
Mike Leach May 12, 2020, 9:52 p.m. UTC | #22
HI Sai,

On Tue, 12 May 2020 at 18:46, Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Mike,
>
> On 2020-05-12 17:19, Mike Leach wrote:
> [...]
>
> >> >>
> >> >> Sorry for hurrying up and sending the patch -
> >> >> https://lore.kernel.org/patchwork/patch/1239923/.
> >> >> I will send v2 based on further feedbacks here or there.
> >> >>
> >> >>>
> >> >>> 1) does this replicator part have a unique ID that differs from the
> >> >>> standard ARM designed replicators?
> >> >>> If so perhaps link the modification into this. (even if the part no
> >> >>> in
> >> >>> PIDR0/1 is the same the UCI should be different for a different
> >> >>> implementation)
> >> >>>
> > I have reviewed the replicator driver, and compared to all the other CS
> > drivers.
> > This driver appears to be the only one that sets hardware values in
> > probe() and expects them to remain in place on enable, and uses that
> > state for programming decisions later, despite telling the PM
> > infrastructure that it is clear to suspend the device.
> >
> > Now we have a system where the replicator hardware is behaving
> > differently under the driver, but is it behaving unreasonably?
>
> Thanks for taking your time to review this. For new replicator behaving
> unreasonably, I think the assumption that the context is not lost on
> disabling clock is flawed since its implementation defined. Is such
> assumption documented in any TRM?
>

Looking at the AMBA driver there is a comment there that AMBA does not
lose state when clocks are removed. This is consistent with the AMBA
protocol spec which states that AMBA slaves can only be accessed /
read / write on various strobe signals,  or state reset on PRESET
signal, all timed by the rising edge of the bus clock. state changes
are not permitted on clock events alone. Given this static nature of
AMBA slaves then removing the clock should not have any effect.

The AMBA driver only /drivers/amba/bus.c  gives permission to
remove/restore the clocks from the devices (pm_suspend pm_resume
callbacks) - this reduces the power consumption of these devices if
the clock is not running, but state must be retained.

> >> >>
> >> >> pid=0x2bb909 for both replicators. So part number is same.
> >> >> UCI will be different for different implementation(QCOM maybe
> >> >> different from ARM),
> >> >> but will it be different for different replicators under the same
> >> >> impl(i.e., on QCOM).
> >> >
> >> > May be use PIDR4.DES_2 to match the Implementor and apply the work
> >> > around for all QCOM replicators ?
> >> >
> >> > To me that sounds the best option.
> >> >
> >>
> >
> > I agree, if it can be established that the register values that make
> > up UCI (pid0-4, devarch, devtype, PID:CLASS==0x9), can correctly
> > identify the parts then a flag can be set in the probe() function and
> > acted on during the enable() function.
> >
>
> So here I have a doubt as to why we need to use UCI because PID =
> 0x2bb909
> and CID = 0xb105900d are same for both replicators, so UCI won't
> identify the
> different replicators(in same implementation i.e., on QCOM) here.
> Am I missing something?
>
> Thats why I think Suzuki suggested to use PIDR4_DES2 and check for QCOM
> impl
> and add a workaround for all replicators, something like below: (will
> need cleaning)
>
> #define PIDR4_DES2      0xFD0
>
> if (FIELD_GET(GENMASK(3, 0), readl_relaxed(drvdata->base + PIDR4_DES2))
> == 0x4)
>         id0val = id1val = 0xff;
>

Please look at the CoreSight components specification 3.0 (ARM IHI
0029E) Section B2.1.2 which describes the Unique Component Identifier
(UCI).
As mentioned above this consists of a combination of bits from
multiple registers, including PIDR4.

> ... and the rest as you suggested.
>
> >
> > This was a design decision made by the original driver writer. A
> > normal AMBA device should not lose context due to clock removal (see
> > drivers/amba/bus.c), so resetting in probe means this operation is
> > done only once, rather than add overhead in the enable() function,and
> > later decisions can be made according to the state of the registers
> > set.
> >
> > As you have pointed out, for this replicator implementation  the
> > context is unfortunately not retained when clocks are removed - so an
> > alternative method is required.
> >
> > perhaps something like:-
> >
> > probe()
> > ...
> > if (match_id_non_persistent_state_regs(ID))
> >     drvdata->check_filter_val_on_enable;
> > ....
> >
> > and a re-write of enable:-
> >
> > enable()
> > ...
> > CS_UNLOCK()
> > id0val = read(IDFILTER0);
> > id1val = read(IDFILTER1);
> >
> > /* some replicator designs lose context when AMBA clocks are removed -
> > check for this */
> > if (drvdata->check_filter_val_on_enable && (id0val == id1val == 0x0))
> >    id0val = id1val = 0xff;
> >
> > if(id0xal == id1val == 0xff)
> >    rc =  claim_device()
> >
> > if (!rc)
> >    switch (outport)
> >       case 0: id0val  = 0x0; break
> >       case 1: id1va; = 0x0; break;
> >      default: rc = -EINVAL;
> >
> > if (!rc)
> >    write(id0val);
> >    write(id1val);
> > CS_LOCK()
> > return rc;
> > ....
> >
>
> Thanks for this detailed idea for workaround. I will add this once we
> know whether we need to use UCI or PIDR4_DES2.
>
> > Given that the access to the enable() function is predicated on a
> > reference count per active port, there is also a case for dropping the
> > check_filter_val_on_enable flag completely - once one port is active,
> > then the device will remain enabled until both ports are inactive.
> > This still allows for future development of selective filtering per
> > port.
> >
> > One other point here - there is a case as I mentioned above for moving
> > to a stored value model for the driver - as this is the only coresight
> > driver that appears to set state in the probe() function rather than
> > write all on enable.
> > This however would necessitate a more comprehensive re-write.
> >
>
> I would defer this to experts as you or suzuki will have more idea
> regarding this than me.
>
> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation

Thanks

Mike
Stephen Boyd May 13, 2020, 1:49 a.m. UTC | #23
Quoting Mike Leach (2020-05-12 14:52:33)
> HI Sai,
> 
> On Tue, 12 May 2020 at 18:46, Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
> >
> > Hi Mike,
> >
> > On 2020-05-12 17:19, Mike Leach wrote:
> > [...]
> >
> > >> >>
> > >> >> Sorry for hurrying up and sending the patch -
> > >> >> https://lore.kernel.org/patchwork/patch/1239923/.
> > >> >> I will send v2 based on further feedbacks here or there.
> > >> >>
> > >> >>>
> > >> >>> 1) does this replicator part have a unique ID that differs from the
> > >> >>> standard ARM designed replicators?
> > >> >>> If so perhaps link the modification into this. (even if the part no
> > >> >>> in
> > >> >>> PIDR0/1 is the same the UCI should be different for a different
> > >> >>> implementation)
> > >> >>>
> > > I have reviewed the replicator driver, and compared to all the other CS
> > > drivers.
> > > This driver appears to be the only one that sets hardware values in
> > > probe() and expects them to remain in place on enable, and uses that
> > > state for programming decisions later, despite telling the PM
> > > infrastructure that it is clear to suspend the device.
> > >
> > > Now we have a system where the replicator hardware is behaving
> > > differently under the driver, but is it behaving unreasonably?
> >
> > Thanks for taking your time to review this. For new replicator behaving
> > unreasonably, I think the assumption that the context is not lost on
> > disabling clock is flawed since its implementation defined. Is such
> > assumption documented in any TRM?
> >
> 
> Looking at the AMBA driver there is a comment there that AMBA does not
> lose state when clocks are removed. This is consistent with the AMBA
> protocol spec which states that AMBA slaves can only be accessed /
> read / write on various strobe signals,  or state reset on PRESET
> signal, all timed by the rising edge of the bus clock. state changes
> are not permitted on clock events alone. Given this static nature of
> AMBA slaves then removing the clock should not have any effect.

I believe the "clock" that is being used here is actually a software
message to the power manager hardware that the debug subsystem isn't
being used anymore. When nothing is requesting that it be enabled the
power manager turns off the power to the debug subsystem and then the
register context is lost. It shouldn't be a clock in the clk subsystem.
It should be a power domain and be attached to the amba devices in the
usual ways. Then the normal runtime PM semantics would follow. If amba
devices require a clk then we'll have to provide a dummy one that
doesn't do anything on this platform.

> 
> The AMBA driver only /drivers/amba/bus.c  gives permission to
> remove/restore the clocks from the devices (pm_suspend pm_resume
> callbacks) - this reduces the power consumption of these devices if
> the clock is not running, but state must be retained.
> 

Ideally the drivers can have enough knowledge about this situation to do
the proper save/restore steps so that if the coresight hardware isn't
being used we don't keep it powered forever and so that across system
wide suspend/resume we can properly power it off.
Sai Prakash Ranjan May 13, 2020, 3:33 p.m. UTC | #24
Hi Mike,

On 2020-05-13 03:22, Mike Leach wrote:

[...]

> 
> Looking at the AMBA driver there is a comment there that AMBA does not
> lose state when clocks are removed. This is consistent with the AMBA
> protocol spec which states that AMBA slaves can only be accessed /
> read / write on various strobe signals,  or state reset on PRESET
> signal, all timed by the rising edge of the bus clock. state changes
> are not permitted on clock events alone. Given this static nature of
> AMBA slaves then removing the clock should not have any effect.
> 
> The AMBA driver only /drivers/amba/bus.c  gives permission to
> remove/restore the clocks from the devices (pm_suspend pm_resume
> callbacks) - this reduces the power consumption of these devices if
> the clock is not running, but state must be retained.
> 

Thanks for the clarification.

>> >> >>
>> >> >> pid=0x2bb909 for both replicators. So part number is same.
>> >> >> UCI will be different for different implementation(QCOM maybe
>> >> >> different from ARM),
>> >> >> but will it be different for different replicators under the same
>> >> >> impl(i.e., on QCOM).
>> >> >
>> >> > May be use PIDR4.DES_2 to match the Implementor and apply the work
>> >> > around for all QCOM replicators ?
>> >> >
>> >> > To me that sounds the best option.
>> >> >
>> >>
>> >
>> > I agree, if it can be established that the register values that make
>> > up UCI (pid0-4, devarch, devtype, PID:CLASS==0x9), can correctly
>> > identify the parts then a flag can be set in the probe() function and
>> > acted on during the enable() function.
>> >
>> 
>> So here I have a doubt as to why we need to use UCI because PID =
>> 0x2bb909
>> and CID = 0xb105900d are same for both replicators, so UCI won't
>> identify the
>> different replicators(in same implementation i.e., on QCOM) here.
>> Am I missing something?
>> 
>> Thats why I think Suzuki suggested to use PIDR4_DES2 and check for 
>> QCOM
>> impl
>> and add a workaround for all replicators, something like below: (will
>> need cleaning)
>> 
>> #define PIDR4_DES2      0xFD0
>> 
>> if (FIELD_GET(GENMASK(3, 0), readl_relaxed(drvdata->base + 
>> PIDR4_DES2))
>> == 0x4)
>>         id0val = id1val = 0xff;
>> 
> 
> Please look at the CoreSight components specification 3.0 (ARM IHI
> 0029E) Section B2.1.2 which describes the Unique Component Identifier
> (UCI).
> As mentioned above this consists of a combination of bits from
> multiple registers, including PIDR4.
> 

Ok got it now, thanks for clearing the doubt. I will go ahead with
this method to identify QCOM impl and post a patch.

Thanks,
Sai
Sai Prakash Ranjan May 13, 2020, 3:45 p.m. UTC | #25
On 2020-05-13 07:19, Stephen Boyd wrote:
> Quoting Mike Leach (2020-05-12 14:52:33)
>> HI Sai,
>> 
>> On Tue, 12 May 2020 at 18:46, Sai Prakash Ranjan
>> <saiprakash.ranjan@codeaurora.org> wrote:
>> >
>> > Hi Mike,
>> >
>> > On 2020-05-12 17:19, Mike Leach wrote:
>> > [...]
>> >
>> > >> >>
>> > >> >> Sorry for hurrying up and sending the patch -
>> > >> >> https://lore.kernel.org/patchwork/patch/1239923/.
>> > >> >> I will send v2 based on further feedbacks here or there.
>> > >> >>
>> > >> >>>
>> > >> >>> 1) does this replicator part have a unique ID that differs from the
>> > >> >>> standard ARM designed replicators?
>> > >> >>> If so perhaps link the modification into this. (even if the part no
>> > >> >>> in
>> > >> >>> PIDR0/1 is the same the UCI should be different for a different
>> > >> >>> implementation)
>> > >> >>>
>> > > I have reviewed the replicator driver, and compared to all the other CS
>> > > drivers.
>> > > This driver appears to be the only one that sets hardware values in
>> > > probe() and expects them to remain in place on enable, and uses that
>> > > state for programming decisions later, despite telling the PM
>> > > infrastructure that it is clear to suspend the device.
>> > >
>> > > Now we have a system where the replicator hardware is behaving
>> > > differently under the driver, but is it behaving unreasonably?
>> >
>> > Thanks for taking your time to review this. For new replicator behaving
>> > unreasonably, I think the assumption that the context is not lost on
>> > disabling clock is flawed since its implementation defined. Is such
>> > assumption documented in any TRM?
>> >
>> 
>> Looking at the AMBA driver there is a comment there that AMBA does not
>> lose state when clocks are removed. This is consistent with the AMBA
>> protocol spec which states that AMBA slaves can only be accessed /
>> read / write on various strobe signals,  or state reset on PRESET
>> signal, all timed by the rising edge of the bus clock. state changes
>> are not permitted on clock events alone. Given this static nature of
>> AMBA slaves then removing the clock should not have any effect.
> 
> I believe the "clock" that is being used here is actually a software
> message to the power manager hardware that the debug subsystem isn't
> being used anymore. When nothing is requesting that it be enabled the
> power manager turns off the power to the debug subsystem and then the
> register context is lost. It shouldn't be a clock in the clk subsystem.
> It should be a power domain and be attached to the amba devices in the
> usual ways. Then the normal runtime PM semantics would follow. If amba
> devices require a clk then we'll have to provide a dummy one that
> doesn't do anything on this platform.
> 

Note that there are 2 dynamic replicators and the behaviour is different 
only on
one of the replicators(swao_replicator) which is in AOSS domain. I don't 
see
how runtime PM would help us differentiate between them and handle PM 
differently
for different replicators.

Thanks,
Sai
Sai Prakash Ranjan May 16, 2020, 10:04 a.m. UTC | #26
Hi Mike, Suzuki

[...]

>> 
>> Please look at the CoreSight components specification 3.0 (ARM IHI
>> 0029E) Section B2.1.2 which describes the Unique Component Identifier
>> (UCI).
>> As mentioned above this consists of a combination of bits from
>> multiple registers, including PIDR4.
>> 
> 
> Ok got it now, thanks for clearing the doubt. I will go ahead with
> this method to identify QCOM impl and post a patch.
> 

Looking some more into this, since we have this limitation only on
specific replicator on very few QCOM SoCs, rather than having a blanket
workaround for all QCOM, we were thinking it would be better to have
this workaround based on a firmware property something like
"qcom,replicator-loses-context" for those replicators with this
limitation and then set the drvdata->check_idfilter_val based on
this property.

Thanks,
Sai
Sai Prakash Ranjan May 19, 2020, 9:04 a.m. UTC | #27
Hi Mike, Suzuki,

On 2020-05-16 15:34, Sai Prakash Ranjan wrote:
> Hi Mike, Suzuki
> 
> [...]
> 
>>> 
>>> Please look at the CoreSight components specification 3.0 (ARM IHI
>>> 0029E) Section B2.1.2 which describes the Unique Component Identifier
>>> (UCI).
>>> As mentioned above this consists of a combination of bits from
>>> multiple registers, including PIDR4.
>>> 
>> 
>> Ok got it now, thanks for clearing the doubt. I will go ahead with
>> this method to identify QCOM impl and post a patch.
>> 
> 
> Looking some more into this, since we have this limitation only on
> specific replicator on very few QCOM SoCs, rather than having a blanket
> workaround for all QCOM, we were thinking it would be better to have
> this workaround based on a firmware property something like
> "qcom,replicator-loses-context" for those replicators with this
> limitation and then set the drvdata->check_idfilter_val based on
> this property.
> 

Sorry for going back and forth on this one, but I think having a 
firmware
property will clearly help us identify the issue on specific SoCs rather 
than
wholesale workaround for all QCOM SoCs. For now, I will post a patch 
based on
the property "qcom,replicator-loses-context", please feel free to yell 
at me
if this is completely wrong and we can discuss it further in that patch.

Thanks,
Sai
diff mbox series

Patch

diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
index e7dc1c31d20d..f4eaa38f8f43 100644
--- a/drivers/hwtracing/coresight/coresight-replicator.c
+++ b/drivers/hwtracing/coresight/coresight-replicator.c
@@ -66,14 +66,16 @@  static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
 				     int inport, int outport)
 {
 	int rc = 0;
-	u32 reg;
+	u32 reg0, reg1;
 
 	switch (outport) {
 	case 0:
-		reg = REPLICATOR_IDFILTER0;
+		reg0 = REPLICATOR_IDFILTER0;
+		reg1 = REPLICATOR_IDFILTER1;
 		break;
 	case 1:
-		reg = REPLICATOR_IDFILTER1;
+		reg0 = REPLICATOR_IDFILTER1;
+		reg1 = REPLICATOR_IDFILTER0;
 		break;
 	default:
 		WARN_ON(1);
@@ -87,8 +89,11 @@  static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
 		rc = coresight_claim_device_unlocked(drvdata->base);
 
 	/* Ensure that the outport is enabled. */
-	if (!rc)
-		writel_relaxed(0x00, drvdata->base + reg);
+	if (!rc) {
+		writel_relaxed(0x00, drvdata->base + reg0);
+		writel_relaxed(0xff, drvdata->base + reg1);
+	}
+
 	CS_LOCK(drvdata->base);
 
 	return rc;