Message ID | 20200405102819.28460-1-saiprakash.ranjan@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] coresight: dynamic-replicator: Fix handling of multiple connections | expand |
Hi, The programmable replicator hardware by design enables trace through both ports on reset. (see 1, section 4.4, 9.11) The replicator driver overrides this functionality to disable output, until the Coresight infrastructure chooses a path from source to sink. Now given that the hardware design is such that we must be able to allow trace to be sent to both ports, a generic patch to prevent this does not seem appropriate here. I think this needs further investigation - to determine why this appears to be failing in this particular instance. Regards Mike [1] CoreSight System-on-Chip SoC-600 TRM, r3p2, 100806_0302_00_en On Sun, 5 Apr 2020 at 11:28, Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote: > > Since commit 30af4fb619e5 ("coresight: dynamic-replicator: > Handle multiple connections"), we do not make sure that > the other port is disabled when the dynamic replicator is > enabled. This is seen to cause the CPU hardlockup atleast > on SC7180 SoC when enabling ETM with ETR as the sink via > sysfs. > > Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 > CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S B 5.4.25 #100 > Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT) > Call trace: > dump_backtrace+0x0/0x188 > show_stack+0x20/0x2c > dump_stack+0xdc/0x144 > panic+0x168/0x370 > arch_seccomp_spec_mitigate+0x0/0x14 > watchdog_timer_fn+0x68/0x290 > __hrtimer_run_queues+0x264/0x498 > hrtimer_interrupt+0xf0/0x22c > arch_timer_handler_phys+0x40/0x50 > handle_percpu_devid_irq+0x8c/0x158 > __handle_domain_irq+0x84/0xc4 > gic_handle_irq+0x100/0x1c4 > el1_irq+0xbc/0x180 > arch_cpu_idle+0x3c/0x5c > default_idle_call+0x1c/0x38 > do_idle+0x100/0x280 > cpu_startup_entry+0x24/0x28 > secondary_start_kernel+0x15c/0x170 > SMP: stopping secondary CPUs > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> > --- > .../hwtracing/coresight/coresight-replicator.c | 15 ++++++++++----- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c > index e7dc1c31d20d..f4eaa38f8f43 100644 > --- a/drivers/hwtracing/coresight/coresight-replicator.c > +++ b/drivers/hwtracing/coresight/coresight-replicator.c > @@ -66,14 +66,16 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata, > int inport, int outport) > { > int rc = 0; > - u32 reg; > + u32 reg0, reg1; > > switch (outport) { > case 0: > - reg = REPLICATOR_IDFILTER0; > + reg0 = REPLICATOR_IDFILTER0; > + reg1 = REPLICATOR_IDFILTER1; > break; > case 1: > - reg = REPLICATOR_IDFILTER1; > + reg0 = REPLICATOR_IDFILTER1; > + reg1 = REPLICATOR_IDFILTER0; > break; > default: > WARN_ON(1); > @@ -87,8 +89,11 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata, > rc = coresight_claim_device_unlocked(drvdata->base); > > /* Ensure that the outport is enabled. */ > - if (!rc) > - writel_relaxed(0x00, drvdata->base + reg); > + if (!rc) { > + writel_relaxed(0x00, drvdata->base + reg0); > + writel_relaxed(0xff, drvdata->base + reg1); > + } > + > CS_LOCK(drvdata->base); > > return rc; > -- > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > of Code Aurora Forum, hosted by The Linux Foundation
Hi Mike, Thanks for taking a look. On 2020-04-06 16:25, Mike Leach wrote: > Hi, > > The programmable replicator hardware by design enables trace through > both ports on reset. (see 1, section 4.4, 9.11) The replicator driver > overrides this functionality to disable output, until the Coresight > infrastructure chooses a path from source to sink. > Now given that the hardware design is such that we must be able to > allow trace to be sent to both ports, a generic patch to prevent this > does not seem appropriate here. > > I think this needs further investigation - to determine why this > appears to be failing in this particular instance. > Yes, this probably needs further investigation, but CPU hardlock stack trace doesnt help much. I could always trigger this hard lockup without this patch on SC7180 SoC and this is only seen when ETR is used as the sink. The only difference I could see between non working case (on SC7180 [1]) and the working case (on SDM845 [2]) is the path from source to sink. SC7180 source to sink path(Not working): ---------------------------------------- etm0_out | apss_funnel_in0 | apss_merge_funnel_in | funnel1_in4 | merge_funnel_in1 | swao_funnel_in | etf_in | swao_replicator_in | replicator_in | etr_in SDM845 source to sink path(Working): ------------------------------------ etm0_out | apss_funnel_in0 | apss_merge_funnel_in | funnel2_in5 | merge_funnel_in2 | etf_in | replicator_in | etr_in [1] - https://lore.kernel.org/patchwork/patch/1212946/ [2] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/qcom/sdm845.dtsi?h=v5.6#n1910 Thanks, Sai
On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: > Hi Mike, > > Thanks for taking a look. > > On 2020-04-06 16:25, Mike Leach wrote: >> Hi, >> >> The programmable replicator hardware by design enables trace through >> both ports on reset. (see 1, section 4.4, 9.11) The replicator driver >> overrides this functionality to disable output, until the Coresight >> infrastructure chooses a path from source to sink. >> Now given that the hardware design is such that we must be able to >> allow trace to be sent to both ports, a generic patch to prevent this >> does not seem appropriate here. >> >> I think this needs further investigation - to determine why this >> appears to be failing in this particular instance. >> > > Yes, this probably needs further investigation, but CPU hardlock stack > trace doesnt help much. I could always trigger this hard lockup without > this patch on SC7180 SoC and this is only seen when ETR is used as the > sink. > > The only difference I could see between non working case (on SC7180 [1]) > and > the working case (on SDM845 [2]) is the path from source to sink. > > SC7180 source to sink path(Not working): > ---------------------------------------- > > etm0_out > | > apss_funnel_in0 > | > apss_merge_funnel_in > | > funnel1_in4 > | > merge_funnel_in1 > | > swao_funnel_in > | > etf_in > | > swao_replicator_in > | > replicator_in > | > etr_in There seems to be two replicators back to back here. What is connected to the other output of both of them ? Are there any TPIUs ? What happens if you choose a sink on the other end of "swao_replicator" (ETB ?) After boot, what do the idfilter registers read for both the replicators ? I believe we need to properly assign the TRACE_IDs for tracing sessions, (rather than static ids) in a way such that we could filter them and use the multiple sinks in parallel for separate trace sessions and this is not simple (involves kernel driver changes and the perf tool to be able to decode the trace id changes too). So for the moment, we need to : 1) Disallow turning the replicator ON, when it is already turned ON 2) Do what your patch does. i.e, disable the other end while one end is turned on. Thoughts ? Kind regards Suzuki
Hi Suzuki, Thanks for looking into this issue. On 2020-04-07 15:54, Suzuki K Poulose wrote: > On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: > > There seems to be two replicators back to back here. What is connected > to the other output of both of them ? Are there any TPIUs ? What > happens > if you choose a sink on the other end of "swao_replicator" (ETB ?) > The other outport of swao replicator is connected to EUD which is a QCOM specific HW which can be used as a sink like USB. And the other outport of other replicator(replicator_out) is connected to TPIU. > After boot, what do the idfilter registers read for both the > replicators ? > Added some prints in replicator_probe. replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 idfilter1=0x0 replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff idfilter1=0xff replicator probe ret=0 devname=6046000.replicator idfilter0=0xff idfilter1=0xff > > I believe we need to properly assign the TRACE_IDs for tracing > sessions, > (rather than static ids) in a way such that we could filter them and > use > the multiple sinks in parallel for separate trace sessions and this is > not simple (involves kernel driver changes and the perf tool to be able > to decode the trace id changes too). > > > So for the moment, we need to : > > 1) Disallow turning the replicator ON, when it is already turned ON > 2) Do what your patch does. i.e, disable the other end while one end > is turned on. > > Thoughts ? > Sounds good to me, Mike would have some comments. Thanks, Sai
On 04/07/2020 12:29 PM, Sai Prakash Ranjan wrote: > Hi Suzuki, > > Thanks for looking into this issue. > > On 2020-04-07 15:54, Suzuki K Poulose wrote: >> On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: >> >> There seems to be two replicators back to back here. What is connected >> to the other output of both of them ? Are there any TPIUs ? What happens >> if you choose a sink on the other end of "swao_replicator" (ETB ?) >> > > The other outport of swao replicator is connected to EUD which is a > QCOM specific HW which can be used as a sink like USB. > And the other outport of other replicator(replicator_out) is connected to > TPIU. > >> After boot, what do the idfilter registers read for both the >> replicators ? >> > > Added some prints in replicator_probe. > > replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 > idfilter1=0x0 > replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff > idfilter1=0xff > replicator probe ret=0 devname=6046000.replicator idfilter0=0xff > idfilter1=0xff Curious to see how the idfilterX is set to 0: if that is never used. Or if the user doesn't reset it back to 0xff. Does your test ever touch EUD (enable the port for EUD at swao-replicator) ? What are the values before you run your test ? Suzuki
Hi Suzuki, On 2020-04-07 18:38, Suzuki K Poulose wrote: > On 04/07/2020 12:29 PM, Sai Prakash Ranjan wrote: >> Hi Suzuki, >> >> Thanks for looking into this issue. >> >> On 2020-04-07 15:54, Suzuki K Poulose wrote: >>> On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: >>> >>> There seems to be two replicators back to back here. What is >>> connected >>> to the other output of both of them ? Are there any TPIUs ? What >>> happens >>> if you choose a sink on the other end of "swao_replicator" (ETB ?) >>> >> >> The other outport of swao replicator is connected to EUD which is a >> QCOM specific HW which can be used as a sink like USB. >> And the other outport of other replicator(replicator_out) is connected >> to >> TPIU. >> >>> After boot, what do the idfilter registers read for both the >>> replicators ? >>> >> >> Added some prints in replicator_probe. >> >> replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 >> idfilter1=0x0 >> replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff >> idfilter1=0xff >> replicator probe ret=0 devname=6046000.replicator idfilter0=0xff >> idfilter1=0xff > > Curious to see how the idfilterX is set to 0: > if that is never used. > Or > if the user doesn't reset it back to 0xff. > For both replicators, the default value seems to be 0x0. replicator probe in res ret=0 devname=6046000.replicator idfilter0=0x0 idfilter1=0x0 replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 idfilter1=0x0 replicator probe in res ret=0 devname=6b06000.replicator idfilter0=0x0 idfilter1=0x0 replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff idfilter1=0xff replicator probe in res ret=0 devname=6046000.replicator idfilter0=0x0 idfilter1=0x0 replicator probe ret=0 devname=6046000.replicator idfilter0=0xff idfilter1=0xff > Does your test ever touch EUD (enable the port for EUD at > swao-replicator) ? What are the values before you run your test ? > > No, we do not use EUD, downstream it is used as dummy sink. And I just try to select the ETR as the sink and enable ETM0 as the trace source. echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink echo 1 > /sys/bus/coresight/devices/etm0/enable_source Also I see the KASAN warning but that seems like some other issue. [ 526.110401] ================================================================== [ 526.117988] BUG: KASAN: slab-out-of-bounds in funnel_enable+0x54/0x1b0 [ 526.124706] Read of size 4 at addr ffffff8135f9549c by task bash/1114 [ 526.131324] [ 526.132886] CPU: 3 PID: 1114 Comm: bash Tainted: G S 5.4.25 #232 [ 526.140397] Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT) [ 526.147113] Call trace: [ 526.149653] dump_backtrace+0x0/0x188 [ 526.153431] show_stack+0x20/0x2c [ 526.156852] dump_stack+0xdc/0x144 [ 526.160370] print_address_description+0x3c/0x494 [ 526.165211] __kasan_report+0x144/0x168 [ 526.169170] kasan_report+0x10/0x18 [ 526.172769] check_memory_region+0x1a4/0x1b4 [ 526.177164] __kasan_check_read+0x18/0x24 [ 526.181292] funnel_enable+0x54/0x1b0 [ 526.185072] coresight_enable_path+0x104/0x198 [ 526.189649] coresight_enable+0x118/0x26c [ 526.193778] enable_source_store+0x64/0xa8 [ 526.198007] dev_attr_store+0x40/0x58 [ 526.201788] sysfs_kf_write+0x4c/0x64 [ 526.205567] kernfs_fop_write+0x16c/0x210 [ 526.209700] __vfs_write+0x54/0x1a8 [ 526.213297] vfs_write+0xe4/0x1a4 [ 526.216714] ksys_write+0x84/0xec [ 526.220131] __arm64_sys_write+0x20/0x2c [ 526.224179] el0_svc_common+0xa8/0x160 [ 526.228040] el0_svc_compat_handler+0x2c/0x38 [ 526.232533] el0_svc_compat+0x8/0x10 [ 526.236225] [ 526.237782] Allocated by task 280: [ 526.241298] __kasan_kmalloc+0xf0/0x1ac [ 526.245249] kasan_kmalloc+0xc/0x14 [ 526.248849] __kmalloc+0x28c/0x3b4 [ 526.252361] coresight_register+0x88/0x250 [ 526.256587] funnel_probe+0x15c/0x228 [ 526.260365] dynamic_funnel_probe+0x20/0x2c [ 526.264679] amba_probe+0xbc/0x158 [ 526.268193] really_probe+0x144/0x408 [ 526.271970] driver_probe_device+0x70/0x140 [ 526.276282] __device_attach_driver+0x9c/0x110 [ 526.280861] bus_for_each_drv+0x90/0xd8 [ 526.284822] __device_attach+0xb4/0x164 [ 526.288772] device_initial_probe+0x20/0x2c [ 526.293081] bus_probe_device+0x34/0x94 [ 526.297030] deferred_probe_work_func+0xa4/0x100 [ 526.301794] process_one_work+0x33c/0x640 [ 526.305922] worker_thread+0x2a0/0x470 [ 526.309786] kthread+0x128/0x138 [ 526.313119] ret_from_fork+0x10/0x18 [ 526.316810] [ 526.318364] Freed by task 0: [ 526.321344] (stack is not available) [ 526.325024] [ 526.326580] The buggy address belongs to the object at ffffff8135f95480 [ 526.326580] which belongs to the cache kmalloc-128 of size 128 [ 526.339439] The buggy address is located 28 bytes inside of [ 526.339439] 128-byte region [ffffff8135f95480, ffffff8135f95500) [ 526.351399] The buggy address belongs to the page: [ 526.356342] page:ffffffff04b7e500 refcount:1 mapcount:0 mapping:ffffff814b00c380 index:0x0 compound_mapcount: 0 [ 526.366711] flags: 0x4000000000010200(slab|head) [ 526.371475] raw: 4000000000010200 ffffffff05034008 ffffffff0501eb08 ffffff814b00c380 [ 526.379435] raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000 [ 526.387393] page dumped because: kasan: bad access detected [ 526.393128] [ 526.394681] Memory state around the buggy address: [ 526.399619] ffffff8135f95380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 526.407046] ffffff8135f95400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 526.414473] >ffffff8135f95480: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 526.421900] ^ [ 526.426029] ffffff8135f95500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 526.433456] ffffff8135f95580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 526.440883] ================================================================== Thanks, Sai
On 04/07/2020 02:56 PM, Sai Prakash Ranjan wrote: > Hi Suzuki, > > On 2020-04-07 18:38, Suzuki K Poulose wrote: >> On 04/07/2020 12:29 PM, Sai Prakash Ranjan wrote: >>> Hi Suzuki, >>> >>> Thanks for looking into this issue. >>> >>> On 2020-04-07 15:54, Suzuki K Poulose wrote: >>>> On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: >>>> >>>> There seems to be two replicators back to back here. What is connected >>>> to the other output of both of them ? Are there any TPIUs ? What >>>> happens >>>> if you choose a sink on the other end of "swao_replicator" (ETB ?) >>>> >>> >>> The other outport of swao replicator is connected to EUD which is a >>> QCOM specific HW which can be used as a sink like USB. >>> And the other outport of other replicator(replicator_out) is >>> connected to >>> TPIU. >>> >>>> After boot, what do the idfilter registers read for both the >>>> replicators ? >>>> >>> >>> Added some prints in replicator_probe. >>> >>> replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 >>> idfilter1=0x0 >>> replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff >>> idfilter1=0xff >>> replicator probe ret=0 devname=6046000.replicator idfilter0=0xff >>> idfilter1=0xff >> >> Curious to see how the idfilterX is set to 0: >> if that is never used. >> Or >> if the user doesn't reset it back to 0xff. >> > > For both replicators, the default value seems to be 0x0. > > replicator probe in res ret=0 devname=6046000.replicator idfilter0=0x0 > idfilter1=0x0 > replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 > idfilter1=0x0 > replicator probe in res ret=0 devname=6b06000.replicator idfilter0=0x0 > idfilter1=0x0 > replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff > idfilter1=0xff > replicator probe in res ret=0 devname=6046000.replicator idfilter0=0x0 > idfilter1=0x0 > replicator probe ret=0 devname=6046000.replicator idfilter0=0xff > idfilter1=0xff I am not sure how you have added the debugs, but it looks like the drivers set 0xff for both the port filters on a successful probe. > >> Does your test ever touch EUD (enable the port for EUD at >> swao-replicator) ? What are the values before you run your test ? >> >> > > No, we do not use EUD, downstream it is used as dummy sink. > And I just try to select the ETR as the sink and enable ETM0 as the > trace source. > > echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink > echo 1 > /sys/bus/coresight/devices/etm0/enable_source > > Also I see the KASAN warning but that seems like some other issue. > Does your funnel have sparse input described ? I think we have an issue with the "refcnt" tracking for funnels (especially). When we have a sparse input ports described (ie. if only input ports 0, 3, 5 are described to protect the secure side connections), we could end up accessing beyond the memory allocated for csdev->refcnts. i.e, csdev->pdata->nr_inport = 3, and we could access csdev->refcnts[5], while sizeof(csdev->refcnts) = sizeof(atomic_t) * 3. I will send a patch. > [ 526.110401] > ================================================================== > [ 526.117988] BUG: KASAN: slab-out-of-bounds in funnel_enable+0x54/0x1b0 > [ 526.124706] Read of size 4 at addr ffffff8135f9549c by task bash/1114 > [ 526.131324] > [ 526.132886] CPU: 3 PID: 1114 Comm: bash Tainted: G S 5.4.25 #232 > [ 526.140397] Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT) > [ 526.147113] Call trace: > [ 526.149653] dump_backtrace+0x0/0x188 > [ 526.153431] show_stack+0x20/0x2c > [ 526.156852] dump_stack+0xdc/0x144 > [ 526.160370] print_address_description+0x3c/0x494 > [ 526.165211] __kasan_report+0x144/0x168 > [ 526.169170] kasan_report+0x10/0x18 > [ 526.172769] check_memory_region+0x1a4/0x1b4 > [ 526.177164] __kasan_check_read+0x18/0x24 > [ 526.181292] funnel_enable+0x54/0x1b0 > [ 526.185072] coresight_enable_path+0x104/0x198 > [ 526.189649] coresight_enable+0x118/0x26c > [ 526.193778] enable_source_store+0x64/0xa8 > [ 526.198007] dev_attr_store+0x40/0x58 > [ 526.201788] sysfs_kf_write+0x4c/0x64 > [ 526.205567] kernfs_fop_write+0x16c/0x210 > [ 526.209700] __vfs_write+0x54/0x1a8 > [ 526.213297] vfs_write+0xe4/0x1a4 > [ 526.216714] ksys_write+0x84/0xec > [ 526.220131] __arm64_sys_write+0x20/0x2c > [ 526.224179] el0_svc_common+0xa8/0x160 > [ 526.228040] el0_svc_compat_handler+0x2c/0x38 > [ 526.232533] el0_svc_compat+0x8/0x10 > [ 526.236225] > [ 526.237782] Allocated by task 280: > [ 526.241298] __kasan_kmalloc+0xf0/0x1ac > [ 526.245249] kasan_kmalloc+0xc/0x14 > [ 526.248849] __kmalloc+0x28c/0x3b4 > [ 526.252361] coresight_register+0x88/0x250 > [ 526.256587] funnel_probe+0x15c/0x228 > [ 526.260365] dynamic_funnel_probe+0x20/0x2c > [ 526.264679] amba_probe+0xbc/0x158 > [ 526.268193] really_probe+0x144/0x408 > [ 526.271970] driver_probe_device+0x70/0x140 > [ 526.276282] __device_attach_driver+0x9c/0x110 > [ 526.280861] bus_for_each_drv+0x90/0xd8 > [ 526.284822] __device_attach+0xb4/0x164 > [ 526.288772] device_initial_probe+0x20/0x2c > [ 526.293081] bus_probe_device+0x34/0x94 > [ 526.297030] deferred_probe_work_func+0xa4/0x100 > [ 526.301794] process_one_work+0x33c/0x640 > [ 526.305922] worker_thread+0x2a0/0x470 > [ 526.309786] kthread+0x128/0x138 > [ 526.313119] ret_from_fork+0x10/0x18 > [ 526.316810] > [ 526.318364] Freed by task 0: > [ 526.321344] (stack is not available) > [ 526.325024] > [ 526.326580] The buggy address belongs to the object at ffffff8135f95480 > [ 526.326580] which belongs to the cache kmalloc-128 of size 128 > [ 526.339439] The buggy address is located 28 bytes inside of > [ 526.339439] 128-byte region [ffffff8135f95480, ffffff8135f95500) > [ 526.351399] The buggy address belongs to the page: > [ 526.356342] page:ffffffff04b7e500 refcount:1 mapcount:0 > mapping:ffffff814b00c380 index:0x0 compound_mapcount: 0 > [ 526.366711] flags: 0x4000000000010200(slab|head) > [ 526.371475] raw: 4000000000010200 ffffffff05034008 ffffffff0501eb08 > ffffff814b00c380 > [ 526.379435] raw: 0000000000000000 0000000000190019 00000001ffffffff > 0000000000000000 > [ 526.387393] page dumped because: kasan: bad access detected > [ 526.393128] > [ 526.394681] Memory state around the buggy address: > [ 526.399619] ffffff8135f95380: fc fc fc fc fc fc fc fc fc fc fc fc fc > fc fc fc > [ 526.407046] ffffff8135f95400: fc fc fc fc fc fc fc fc fc fc fc fc fc > fc fc fc > [ 526.414473] >ffffff8135f95480: 04 fc fc fc fc fc fc fc fc fc fc fc fc > fc fc fc > [ 526.421900] ^ > [ 526.426029] ffffff8135f95500: fc fc fc fc fc fc fc fc fc fc fc fc fc > fc fc fc > [ 526.433456] ffffff8135f95580: fc fc fc fc fc fc fc fc fc fc fc fc fc > fc fc fc > [ 526.440883] > ================================================================== > > Thanks, > Sai > Suzuki
Hi Suzuki, On 2020-04-07 20:23, Suzuki K Poulose wrote: > On 04/07/2020 02:56 PM, Sai Prakash Ranjan wrote: >> Hi Suzuki, >> >> On 2020-04-07 18:38, Suzuki K Poulose wrote: >>> On 04/07/2020 12:29 PM, Sai Prakash Ranjan wrote: >>>> Hi Suzuki, >>>> >>>> Thanks for looking into this issue. >>>> >>>> On 2020-04-07 15:54, Suzuki K Poulose wrote: >>>>> On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: >>>>> >>>>> There seems to be two replicators back to back here. What is >>>>> connected >>>>> to the other output of both of them ? Are there any TPIUs ? What >>>>> happens >>>>> if you choose a sink on the other end of "swao_replicator" (ETB ?) >>>>> >>>> >>>> The other outport of swao replicator is connected to EUD which is a >>>> QCOM specific HW which can be used as a sink like USB. >>>> And the other outport of other replicator(replicator_out) is >>>> connected to >>>> TPIU. >>>> >>>>> After boot, what do the idfilter registers read for both the >>>>> replicators ? >>>>> >>>> >>>> Added some prints in replicator_probe. >>>> >>>> replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 >>>> idfilter1=0x0 >>>> replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff >>>> idfilter1=0xff >>>> replicator probe ret=0 devname=6046000.replicator idfilter0=0xff >>>> idfilter1=0xff >>> >>> Curious to see how the idfilterX is set to 0: >>> if that is never used. >>> Or >>> if the user doesn't reset it back to 0xff. >>> >> >> For both replicators, the default value seems to be 0x0. >> >> replicator probe in res ret=0 devname=6046000.replicator >> idfilter0=0x0 idfilter1=0x0 >> replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 >> idfilter1=0x0 >> replicator probe in res ret=0 devname=6b06000.replicator >> idfilter0=0x0 idfilter1=0x0 >> replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff >> idfilter1=0xff >> replicator probe in res ret=0 devname=6046000.replicator >> idfilter0=0x0 idfilter1=0x0 >> replicator probe ret=0 devname=6046000.replicator idfilter0=0xff >> idfilter1=0xff > > I am not sure how you have added the debugs, but it looks like the > drivers set 0xff for both the port filters on a successful probe. > Yes, thats done by replicator_reset in probe right? Below is the diff: @@ -242,6 +244,9 @@ static int replicator_probe(struct device *dev, struct resource *res) } drvdata->base = base; desc.groups = replicator_groups; + pr_info("replicator probe in res ret=%d devname=%s idfilter0=%#lx idfilter1=%#lx\n", + ret, dev_name(dev), (readl_relaxed(base + REPLICATOR_IDFILTER0)), + (readl_relaxed(base + REPLICATOR_IDFILTER1))); } dev_set_drvdata(dev, drvdata); @@ -272,6 +277,12 @@ static int replicator_probe(struct device *dev, struct resource *res) out_disable_clk: if (ret && !IS_ERR_OR_NULL(drvdata->atclk)) clk_disable_unprepare(drvdata->atclk); + + if (res) + pr_info("replicator probe ret=%d devname=%s idfilter0=%#lx idfilter1=%#lx\n", + ret, dev_name(dev), (readl_relaxed(base + REPLICATOR_IDFILTER0)), + (readl_relaxed(base + REPLICATOR_IDFILTER1))); + return ret; } >> >>> Does your test ever touch EUD (enable the port for EUD at >>> swao-replicator) ? What are the values before you run your test ? >>> >>> >> >> No, we do not use EUD, downstream it is used as dummy sink. >> And I just try to select the ETR as the sink and enable ETM0 as the >> trace source. >> >> echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink >> echo 1 > /sys/bus/coresight/devices/etm0/enable_source >> >> Also I see the KASAN warning but that seems like some other issue. >> > > Does your funnel have sparse input described ? I think we have an > issue with the "refcnt" tracking for funnels (especially). When we > have a sparse input ports described (ie. if only input ports 0, 3, > 5 are described to protect the secure side connections), we could > end up accessing beyond the memory allocated for csdev->refcnts. > i.e, csdev->pdata->nr_inport = 3, and we could access > csdev->refcnts[5], > while sizeof(csdev->refcnts) = sizeof(atomic_t) * 3. > > I will send a patch. > Thanks, I can test it out. Thanks, Sai
On Tue, Apr 07, 2020 at 08:48:54PM +0530, Sai Prakash Ranjan wrote: > Hi Suzuki, > > On 2020-04-07 20:23, Suzuki K Poulose wrote: > > On 04/07/2020 02:56 PM, Sai Prakash Ranjan wrote: > > > Hi Suzuki, > > > > > > On 2020-04-07 18:38, Suzuki K Poulose wrote: > > > > On 04/07/2020 12:29 PM, Sai Prakash Ranjan wrote: > > > > > Hi Suzuki, > > > > > > > > > > Thanks for looking into this issue. > > > > > > > > > > On 2020-04-07 15:54, Suzuki K Poulose wrote: > > > > > > On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: > > > > > > > > > > > > There seems to be two replicators back to back here. > > > > > > What is connected > > > > > > to the other output of both of them ? Are there any > > > > > > TPIUs ? What happens > > > > > > if you choose a sink on the other end of "swao_replicator" (ETB ?) > > > > > > > > > > > > > > > > The other outport of swao replicator is connected to EUD which is a > > > > > QCOM specific HW which can be used as a sink like USB. > > > > > And the other outport of other replicator(replicator_out) is > > > > > connected to > > > > > TPIU. > > > > > > > > > > > After boot, what do the idfilter registers read for both > > > > > > the replicators ? > > > > > > > > > > > > > > > > Added some prints in replicator_probe. > > > > > > > > > > replicator probe ret=-517 devname=6046000.replicator > > > > > idfilter0=0x0 idfilter1=0x0 > > > > > replicator probe ret=0 devname=6b06000.replicator > > > > > idfilter0=0xff idfilter1=0xff > > > > > replicator probe ret=0 devname=6046000.replicator > > > > > idfilter0=0xff idfilter1=0xff > > > > > > > > Curious to see how the idfilterX is set to 0: > > > > if that is never used. > > > > Or > > > > if the user doesn't reset it back to 0xff. > > > > > > > > > > For both replicators, the default value seems to be 0x0. > > > > > > replicator probe in res ret=0 devname=6046000.replicator > > > idfilter0=0x0 idfilter1=0x0 > > > replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 > > > idfilter1=0x0 > > > replicator probe in res ret=0 devname=6b06000.replicator > > > idfilter0=0x0 idfilter1=0x0 > > > replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff > > > idfilter1=0xff > > > replicator probe in res ret=0 devname=6046000.replicator > > > idfilter0=0x0 idfilter1=0x0 > > > replicator probe ret=0 devname=6046000.replicator idfilter0=0xff > > > idfilter1=0xff > > > > I am not sure how you have added the debugs, but it looks like the > > drivers set 0xff for both the port filters on a successful probe. > > > > Yes, thats done by replicator_reset in probe right? Below is the diff: > > @@ -242,6 +244,9 @@ static int replicator_probe(struct device *dev, struct > resource *res) > } > drvdata->base = base; > desc.groups = replicator_groups; > + pr_info("replicator probe in res ret=%d devname=%s > idfilter0=%#lx idfilter1=%#lx\n", > + ret, dev_name(dev), (readl_relaxed(base + > REPLICATOR_IDFILTER0)), > + (readl_relaxed(base + REPLICATOR_IDFILTER1))); > } > > dev_set_drvdata(dev, drvdata); > @@ -272,6 +277,12 @@ static int replicator_probe(struct device *dev, struct > resource *res) > out_disable_clk: > if (ret && !IS_ERR_OR_NULL(drvdata->atclk)) > clk_disable_unprepare(drvdata->atclk); > + > + if (res) > + pr_info("replicator probe ret=%d devname=%s idfilter0=%#lx > idfilter1=%#lx\n", > + ret, dev_name(dev), (readl_relaxed(base + > REPLICATOR_IDFILTER0)), > + (readl_relaxed(base + REPLICATOR_IDFILTER1))); > + > return ret; > } > > > > > > > > Does your test ever touch EUD (enable the port for EUD at > > > > swao-replicator) ? What are the values before you run your test ? > > > > > > > > > > > > > > No, we do not use EUD, downstream it is used as dummy sink. > > > And I just try to select the ETR as the sink and enable ETM0 as the > > > trace source. > > > > > > echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink > > > echo 1 > /sys/bus/coresight/devices/etm0/enable_source > > > > > > Also I see the KASAN warning but that seems like some other issue. > > > > > > > Does your funnel have sparse input described ? I think we have an > > issue with the "refcnt" tracking for funnels (especially). When we > > have a sparse input ports described (ie. if only input ports 0, 3, > > 5 are described to protect the secure side connections), we could > > end up accessing beyond the memory allocated for csdev->refcnts. > > i.e, csdev->pdata->nr_inport = 3, and we could access csdev->refcnts[5], > > while sizeof(csdev->refcnts) = sizeof(atomic_t) * 3. > > > > I will send a patch. > > > > Thanks, I can test it out. Please find the untested patch below. ---8>--- [untested] coresight: Fix support for sparse port numbers On some systems the firmware may not describe all the ports connected to a component (e.g, for security reasons). This could be especially problematic for "funnels" where we could end up in modifying memory beyond the allocated space for refcounts. e.g, for a funnel with input ports listed 0, 3, 5, nr_inport = 3. However the we could access refcnts[5] while checking for references. Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> --- .../hwtracing/coresight/coresight-platform.c | 74 ++++++++++++------- drivers/hwtracing/coresight/coresight.c | 8 +- 2 files changed, 56 insertions(+), 26 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c index 3c5bee429105..1c610d6e944b 100644 --- a/drivers/hwtracing/coresight/coresight-platform.c +++ b/drivers/hwtracing/coresight/coresight-platform.c @@ -67,6 +67,7 @@ static void of_coresight_get_ports_legacy(const struct device_node *node, int *nr_inport, int *nr_outport) { struct device_node *ep = NULL; + struct of_endpoint endpoint; int in = 0, out = 0; do { @@ -74,10 +75,16 @@ static void of_coresight_get_ports_legacy(const struct device_node *node, if (!ep) break; - if (of_coresight_legacy_ep_is_input(ep)) - in++; - else - out++; + if (of_graph_parse_endpoint(ep, &endpoint)) + continue; + + if (of_coresight_legacy_ep_is_input(ep)) { + in = (endpoint.port + 1 > in) ? + endpoint.port + 1 : in; + } else { + out = (endpoint.port + 1) > out ? + endpoint.port + 1 : out; + } } while (ep); @@ -117,9 +124,16 @@ of_coresight_count_ports(struct device_node *port_parent) { int i = 0; struct device_node *ep = NULL; + struct of_endpoint endpoint; + + while ((ep = of_graph_get_next_endpoint(port_parent, ep))) { + /* Defer error handling to parsing */ + if (of_graph_parse_endpoint(ep, &endpoint)) + continue; + if (endpoint.port + 1 > i) + i = endpoint.port + 1; + } - while ((ep = of_graph_get_next_endpoint(port_parent, ep))) - i++; return i; } @@ -171,14 +185,12 @@ static int of_coresight_get_cpu(struct device *dev) * Parses the local port, remote device name and the remote port. * * Returns : - * 1 - If the parsing is successful and a connection record - * was created for an output connection. * 0 - If the parsing completed without any fatal errors. * -Errno - Fatal error, abort the scanning. */ static int of_coresight_parse_endpoint(struct device *dev, struct device_node *ep, - struct coresight_connection *conn) + struct coresight_platform_data *pdata) { int ret = 0; struct of_endpoint endpoint, rendpoint; @@ -186,6 +198,7 @@ static int of_coresight_parse_endpoint(struct device *dev, struct device_node *rep = NULL; struct device *rdev = NULL; struct fwnode_handle *rdev_fwnode; + struct coresight_connection *conn; do { /* Parse the local port details */ @@ -212,6 +225,12 @@ static int of_coresight_parse_endpoint(struct device *dev, break; } + conn = &pdata->conns[endpoint.port]; + if (conn->child_fwnode) { + dev_warn(dev, "Duplicate output port %d\n", endpoint.port); + ret = -EINVAL; + break; + } conn->outport = endpoint.port; /* * Hold the refcount to the target device. This could be @@ -224,7 +243,6 @@ static int of_coresight_parse_endpoint(struct device *dev, conn->child_fwnode = fwnode_handle_get(rdev_fwnode); conn->child_port = rendpoint.port; /* Connection record updated */ - ret = 1; } while (0); of_node_put(rparent); @@ -238,7 +256,6 @@ static int of_get_coresight_platform_data(struct device *dev, struct coresight_platform_data *pdata) { int ret = 0; - struct coresight_connection *conn; struct device_node *ep = NULL; const struct device_node *parent = NULL; bool legacy_binding = false; @@ -267,8 +284,6 @@ static int of_get_coresight_platform_data(struct device *dev, dev_warn_once(dev, "Uses obsolete Coresight DT bindings\n"); } - conn = pdata->conns; - /* Iterate through each output port to discover topology */ while ((ep = of_graph_get_next_endpoint(parent, ep))) { /* @@ -280,15 +295,9 @@ static int of_get_coresight_platform_data(struct device *dev, if (legacy_binding && of_coresight_legacy_ep_is_input(ep)) continue; - ret = of_coresight_parse_endpoint(dev, ep, conn); - switch (ret) { - case 1: - conn++; /* Fall through */ - case 0: - break; - default: + ret = of_coresight_parse_endpoint(dev, ep, pdata); + if (ret) return ret; - } } return 0; @@ -627,6 +636,11 @@ static int acpi_coresight_parse_link(struct acpi_device *adev, * coresight_remove_match(). */ conn->child_fwnode = fwnode_handle_get(&r_adev->fwnode); + } else if (dir == ACPI_CORESIGHT_LINK_SLAVE) { + conn->child_port = fields[0].integer.value; + } else { + /* Invalid direction */ + return -EINVAL; } return dir; @@ -672,10 +686,14 @@ static int acpi_coresight_parse_graph(struct acpi_device *adev, return dir; if (dir == ACPI_CORESIGHT_LINK_MASTER) { - pdata->nr_outport++; + if (ptr->outport > pdata->nr_outport) + pdata->nr_outport = ptr->outport; ptr++; } else { - pdata->nr_inport++; + WARN_ON(pdata->nr_inport == ptr->child_port); + /* Do not move the ptr for input connections */ + if (ptr->child_port > pdata->nr_inport) + pdata->nr_inport = ptr->child_port; } } @@ -684,8 +702,13 @@ static int acpi_coresight_parse_graph(struct acpi_device *adev, return rc; /* Copy the connection information to the final location */ - for (i = 0; i < pdata->nr_outport; i++) - pdata->conns[i] = conns[i]; + for (i = 0; conns + i < ptr; i++) { + int port = conns[i].outport; + + /* Duplicate output port */ + WARN_ON(pdata->conns[port].child_fwnode); + pdata->conns[port] = conns[i]; + } devm_kfree(&adev->dev, conns); return 0; @@ -787,6 +810,7 @@ coresight_get_platform_data(struct device *dev) goto error; pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL); + pdata->nr_outport = pdata->nr_inport = -1; if (!pdata) { ret = -ENOMEM; goto error; diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index ef20f74c85fa..f07bc0a7ab88 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -990,6 +990,9 @@ static int coresight_orphan_match(struct device *dev, void *data) for (i = 0; i < i_csdev->pdata->nr_outport; i++) { conn = &i_csdev->pdata->conns[i]; + /* Skip the port if FW doesn't describe it */ + if (!conn->child_fwnode) + continue; /* We have found at least one orphan connection */ if (conn->child_dev == NULL) { /* Does it match this newly added device? */ @@ -1029,6 +1032,9 @@ static void coresight_fixup_device_conns(struct coresight_device *csdev) struct coresight_connection *conn = &csdev->pdata->conns[i]; struct device *dev = NULL; + if (!conn->child_fwnode) + continue; + dev = bus_find_device_by_fwnode(&coresight_bustype, conn->child_fwnode); if (dev) { conn->child_dev = to_coresight_device(dev); @@ -1061,7 +1067,7 @@ static int coresight_remove_match(struct device *dev, void *data) for (i = 0; i < iterator->pdata->nr_outport; i++) { conn = &iterator->pdata->conns[i]; - if (conn->child_dev == NULL) + if (conn->child_dev == NULL || conn->child_fwnode == NULL) continue; if (csdev->dev.fwnode == conn->child_fwnode) {
Quoting Suzuki K Poulose (2020-04-08 15:43:47) > On Tue, Apr 07, 2020 at 08:48:54PM +0530, Sai Prakash Ranjan wrote: > > > > Thanks, I can test it out. > > Please find the untested patch below. > > ---8>--- > > [untested] coresight: Fix support for sparse port numbers > > On some systems the firmware may not describe all the ports > connected to a component (e.g, for security reasons). This > could be especially problematic for "funnels" where we could > end up in modifying memory beyond the allocated space for > refcounts. > > e.g, for a funnel with input ports listed 0, 3, 5, nr_inport = 3. > However the we could access refcnts[5] while checking for > references. > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- Things don't blow up for me anymore with KASAN, so feel free to add Tested-by: Stephen Boyd <swboyd@chromium.org>
Hi Suzuki, On 2020-04-09 04:13, Suzuki K Poulose wrote: > On Tue, Apr 07, 2020 at 08:48:54PM +0530, Sai Prakash Ranjan wrote: > > Please find the untested patch below. > > ---8>--- > > [untested] coresight: Fix support for sparse port numbers > > On some systems the firmware may not describe all the ports > connected to a component (e.g, for security reasons). This > could be especially problematic for "funnels" where we could > end up in modifying memory beyond the allocated space for > refcounts. > > e.g, for a funnel with input ports listed 0, 3, 5, nr_inport = 3. > However the we could access refcnts[5] while checking for > references. > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > .../hwtracing/coresight/coresight-platform.c | 74 ++++++++++++------- > drivers/hwtracing/coresight/coresight.c | 8 +- > 2 files changed, 56 insertions(+), 26 deletions(-) > > diff --git a/drivers/hwtracing/coresight/coresight-platform.c > b/drivers/hwtracing/coresight/coresight-platform.c > index 3c5bee429105..1c610d6e944b 100644 > --- a/drivers/hwtracing/coresight/coresight-platform.c > +++ b/drivers/hwtracing/coresight/coresight-platform.c > @@ -67,6 +67,7 @@ static void of_coresight_get_ports_legacy(const > struct device_node *node, > int *nr_inport, int *nr_outport) > { > struct device_node *ep = NULL; > + struct of_endpoint endpoint; > int in = 0, out = 0; > > do { > @@ -74,10 +75,16 @@ static void of_coresight_get_ports_legacy(const > struct device_node *node, > if (!ep) > break; > > - if (of_coresight_legacy_ep_is_input(ep)) > - in++; > - else > - out++; > + if (of_graph_parse_endpoint(ep, &endpoint)) > + continue; > + > + if (of_coresight_legacy_ep_is_input(ep)) { > + in = (endpoint.port + 1 > in) ? > + endpoint.port + 1 : in; > + } else { > + out = (endpoint.port + 1) > out ? > + endpoint.port + 1 : out; > + } > > } while (ep); > > @@ -117,9 +124,16 @@ of_coresight_count_ports(struct device_node > *port_parent) > { > int i = 0; > struct device_node *ep = NULL; > + struct of_endpoint endpoint; > + > + while ((ep = of_graph_get_next_endpoint(port_parent, ep))) { > + /* Defer error handling to parsing */ > + if (of_graph_parse_endpoint(ep, &endpoint)) > + continue; > + if (endpoint.port + 1 > i) > + i = endpoint.port + 1; > + } > > - while ((ep = of_graph_get_next_endpoint(port_parent, ep))) > - i++; > return i; > } > > @@ -171,14 +185,12 @@ static int of_coresight_get_cpu(struct device > *dev) > * Parses the local port, remote device name and the remote port. > * > * Returns : > - * 1 - If the parsing is successful and a connection record > - * was created for an output connection. > * 0 - If the parsing completed without any fatal errors. > * -Errno - Fatal error, abort the scanning. > */ > static int of_coresight_parse_endpoint(struct device *dev, > struct device_node *ep, > - struct coresight_connection *conn) > + struct coresight_platform_data *pdata) > { > int ret = 0; > struct of_endpoint endpoint, rendpoint; > @@ -186,6 +198,7 @@ static int of_coresight_parse_endpoint(struct > device *dev, > struct device_node *rep = NULL; > struct device *rdev = NULL; > struct fwnode_handle *rdev_fwnode; > + struct coresight_connection *conn; > > do { > /* Parse the local port details */ > @@ -212,6 +225,12 @@ static int of_coresight_parse_endpoint(struct > device *dev, > break; > } > > + conn = &pdata->conns[endpoint.port]; > + if (conn->child_fwnode) { > + dev_warn(dev, "Duplicate output port %d\n", endpoint.port); > + ret = -EINVAL; > + break; > + } > conn->outport = endpoint.port; > /* > * Hold the refcount to the target device. This could be > @@ -224,7 +243,6 @@ static int of_coresight_parse_endpoint(struct > device *dev, > conn->child_fwnode = fwnode_handle_get(rdev_fwnode); > conn->child_port = rendpoint.port; > /* Connection record updated */ > - ret = 1; > } while (0); > > of_node_put(rparent); > @@ -238,7 +256,6 @@ static int of_get_coresight_platform_data(struct > device *dev, > struct coresight_platform_data *pdata) > { > int ret = 0; > - struct coresight_connection *conn; > struct device_node *ep = NULL; > const struct device_node *parent = NULL; > bool legacy_binding = false; > @@ -267,8 +284,6 @@ static int of_get_coresight_platform_data(struct > device *dev, > dev_warn_once(dev, "Uses obsolete Coresight DT bindings\n"); > } > > - conn = pdata->conns; > - > /* Iterate through each output port to discover topology */ > while ((ep = of_graph_get_next_endpoint(parent, ep))) { > /* > @@ -280,15 +295,9 @@ static int of_get_coresight_platform_data(struct > device *dev, > if (legacy_binding && of_coresight_legacy_ep_is_input(ep)) > continue; > > - ret = of_coresight_parse_endpoint(dev, ep, conn); > - switch (ret) { > - case 1: > - conn++; /* Fall through */ > - case 0: > - break; > - default: > + ret = of_coresight_parse_endpoint(dev, ep, pdata); > + if (ret) > return ret; > - } > } > > return 0; > @@ -627,6 +636,11 @@ static int acpi_coresight_parse_link(struct > acpi_device *adev, > * coresight_remove_match(). > */ > conn->child_fwnode = fwnode_handle_get(&r_adev->fwnode); > + } else if (dir == ACPI_CORESIGHT_LINK_SLAVE) { > + conn->child_port = fields[0].integer.value; > + } else { > + /* Invalid direction */ > + return -EINVAL; > } > > return dir; > @@ -672,10 +686,14 @@ static int acpi_coresight_parse_graph(struct > acpi_device *adev, > return dir; > > if (dir == ACPI_CORESIGHT_LINK_MASTER) { > - pdata->nr_outport++; > + if (ptr->outport > pdata->nr_outport) > + pdata->nr_outport = ptr->outport; > ptr++; > } else { > - pdata->nr_inport++; > + WARN_ON(pdata->nr_inport == ptr->child_port); > + /* Do not move the ptr for input connections */ > + if (ptr->child_port > pdata->nr_inport) > + pdata->nr_inport = ptr->child_port; > } > } > > @@ -684,8 +702,13 @@ static int acpi_coresight_parse_graph(struct > acpi_device *adev, > return rc; > > /* Copy the connection information to the final location */ > - for (i = 0; i < pdata->nr_outport; i++) > - pdata->conns[i] = conns[i]; > + for (i = 0; conns + i < ptr; i++) { > + int port = conns[i].outport; > + > + /* Duplicate output port */ > + WARN_ON(pdata->conns[port].child_fwnode); > + pdata->conns[port] = conns[i]; > + } > > devm_kfree(&adev->dev, conns); > return 0; > @@ -787,6 +810,7 @@ coresight_get_platform_data(struct device *dev) > goto error; > > pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL); > + pdata->nr_outport = pdata->nr_inport = -1; > if (!pdata) { > ret = -ENOMEM; > goto error; > diff --git a/drivers/hwtracing/coresight/coresight.c > b/drivers/hwtracing/coresight/coresight.c > index ef20f74c85fa..f07bc0a7ab88 100644 > --- a/drivers/hwtracing/coresight/coresight.c > +++ b/drivers/hwtracing/coresight/coresight.c > @@ -990,6 +990,9 @@ static int coresight_orphan_match(struct device > *dev, void *data) > for (i = 0; i < i_csdev->pdata->nr_outport; i++) { > conn = &i_csdev->pdata->conns[i]; > > + /* Skip the port if FW doesn't describe it */ > + if (!conn->child_fwnode) > + continue; > /* We have found at least one orphan connection */ > if (conn->child_dev == NULL) { > /* Does it match this newly added device? */ > @@ -1029,6 +1032,9 @@ static void coresight_fixup_device_conns(struct > coresight_device *csdev) > struct coresight_connection *conn = &csdev->pdata->conns[i]; > struct device *dev = NULL; > > + if (!conn->child_fwnode) > + continue; > + > dev = bus_find_device_by_fwnode(&coresight_bustype, > conn->child_fwnode); > if (dev) { > conn->child_dev = to_coresight_device(dev); > @@ -1061,7 +1067,7 @@ static int coresight_remove_match(struct device > *dev, void *data) > for (i = 0; i < iterator->pdata->nr_outport; i++) { > conn = &iterator->pdata->conns[i]; > > - if (conn->child_dev == NULL) > + if (conn->child_dev == NULL || conn->child_fwnode == NULL) > continue; > > if (csdev->dev.fwnode == conn->child_fwnode) { Thanks Suzuki, I don't see the KASAN warning anymore with this patch. But somehow tmc_etr probe fails with error -12(ENOMEM). Thanks, Sai
Hi Sai, Thanks for the quick testing ! Please see below for the tmc_etr probe failure. On 04/09/2020 08:51 AM, Sai Prakash Ranjan wrote: > Hi Suzuki, > > On 2020-04-09 04:13, Suzuki K Poulose wrote: >> On Tue, Apr 07, 2020 at 08:48:54PM +0530, Sai Prakash Ranjan wrote: >> >> Please find the untested patch below. >> >> ---8>--- >> >> [untested] coresight: Fix support for sparse port numbers >> >> On some systems the firmware may not describe all the ports >> connected to a component (e.g, for security reasons). This >> could be especially problematic for "funnels" where we could >> end up in modifying memory beyond the allocated space for >> refcounts. >> >> e.g, for a funnel with input ports listed 0, 3, 5, nr_inport = 3. >> However the we could access refcnts[5] while checking for >> references. >> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> --- >> .../hwtracing/coresight/coresight-platform.c | 74 ++++++++++++------- >> drivers/hwtracing/coresight/coresight.c | 8 +- >> 2 files changed, 56 insertions(+), 26 deletions(-) >> >> diff --git a/drivers/hwtracing/coresight/coresight-platform.c >> b/drivers/hwtracing/coresight/coresight-platform.c >> index 3c5bee429105..1c610d6e944b 100644 >> --- a/drivers/hwtracing/coresight/coresight-platform.c >> +++ b/drivers/hwtracing/coresight/coresight-platform.c >> @@ -67,6 +67,7 @@ static void of_coresight_get_ports_legacy(const [...] >> @@ -684,8 +702,13 @@ static int acpi_coresight_parse_graph(struct >> acpi_device *adev, >> return rc; >> >> /* Copy the connection information to the final location */ >> - for (i = 0; i < pdata->nr_outport; i++) >> - pdata->conns[i] = conns[i]; >> + for (i = 0; conns + i < ptr; i++) { >> + int port = conns[i].outport; >> + >> + /* Duplicate output port */ >> + WARN_ON(pdata->conns[port].child_fwnode); >> + pdata->conns[port] = conns[i]; >> + } >> >> devm_kfree(&adev->dev, conns); >> return 0; >> @@ -787,6 +810,7 @@ coresight_get_platform_data(struct device *dev) >> goto error; >> >> pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL); >> + pdata->nr_outport = pdata->nr_inport = -1; Please could you remove this hunk and test it ? I forgot to update the commit before I sent this over. >> /* Does it match this newly added device? */ >> @@ -1029,6 +1032,9 @@ static void coresight_fixup_device_conns(struct >> coresight_device *csdev) >> struct coresight_connection *conn = &csdev->pdata->conns[i]; >> struct device *dev = NULL; >> >> + if (!conn->child_fwnode) >> + continue; >> + >> dev = bus_find_device_by_fwnode(&coresight_bustype, >> conn->child_fwnode); >> if (dev) { >> conn->child_dev = to_coresight_device(dev); >> @@ -1061,7 +1067,7 @@ static int coresight_remove_match(struct device >> *dev, void *data) >> for (i = 0; i < iterator->pdata->nr_outport; i++) { >> conn = &iterator->pdata->conns[i]; >> >> - if (conn->child_dev == NULL) >> + if (conn->child_dev == NULL || conn->child_fwnode == NULL) >> continue; >> >> if (csdev->dev.fwnode == conn->child_fwnode) { > > > Thanks Suzuki, I don't see the KASAN warning anymore with this patch. > But somehow tmc_etr probe fails with error -12(ENOMEM). See the above suggestion. Cheers Suzuki
Hi Suzuki, On 2020-04-09 14:47, Suzuki K Poulose wrote: > Hi Sai, > > > Thanks for the quick testing ! Please see below for the > tmc_etr probe failure. > > On 04/09/2020 08:51 AM, Sai Prakash Ranjan wrote: >> Hi Suzuki, >> >> On 2020-04-09 04:13, Suzuki K Poulose wrote: >>> On Tue, Apr 07, 2020 at 08:48:54PM +0530, Sai Prakash Ranjan wrote: >>> >>> Please find the untested patch below. >>> >>> ---8>--- >>> >>> [untested] coresight: Fix support for sparse port numbers >>> >>> On some systems the firmware may not describe all the ports >>> connected to a component (e.g, for security reasons). This >>> could be especially problematic for "funnels" where we could >>> end up in modifying memory beyond the allocated space for >>> refcounts. >>> >>> e.g, for a funnel with input ports listed 0, 3, 5, nr_inport = 3. >>> However the we could access refcnts[5] while checking for >>> references. >>> >>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >>> --- >>> .../hwtracing/coresight/coresight-platform.c | 74 >>> ++++++++++++------- >>> drivers/hwtracing/coresight/coresight.c | 8 +- >>> 2 files changed, 56 insertions(+), 26 deletions(-) >>> >>> diff --git a/drivers/hwtracing/coresight/coresight-platform.c >>> b/drivers/hwtracing/coresight/coresight-platform.c >>> index 3c5bee429105..1c610d6e944b 100644 >>> --- a/drivers/hwtracing/coresight/coresight-platform.c >>> +++ b/drivers/hwtracing/coresight/coresight-platform.c >>> @@ -67,6 +67,7 @@ static void of_coresight_get_ports_legacy(const > > [...] > >>> @@ -684,8 +702,13 @@ static int acpi_coresight_parse_graph(struct >>> acpi_device *adev, >>> return rc; >>> >>> /* Copy the connection information to the final location */ >>> - for (i = 0; i < pdata->nr_outport; i++) >>> - pdata->conns[i] = conns[i]; >>> + for (i = 0; conns + i < ptr; i++) { >>> + int port = conns[i].outport; >>> + >>> + /* Duplicate output port */ >>> + WARN_ON(pdata->conns[port].child_fwnode); >>> + pdata->conns[port] = conns[i]; >>> + } >>> >>> devm_kfree(&adev->dev, conns); >>> return 0; >>> @@ -787,6 +810,7 @@ coresight_get_platform_data(struct device *dev) >>> goto error; >>> >>> pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL); >>> + pdata->nr_outport = pdata->nr_inport = -1; > > > Please could you remove this hunk and test it ? I forgot to update the > commit before I sent this over. > I don't see the ETR probe failure and the KASAN warning anymore with this change. Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Thanks, Sai
Hi Suzuki, On 2020-04-07 20:23, Suzuki K Poulose wrote: > On 04/07/2020 02:56 PM, Sai Prakash Ranjan wrote: >> Hi Suzuki, >> >> On 2020-04-07 18:38, Suzuki K Poulose wrote: >>> On 04/07/2020 12:29 PM, Sai Prakash Ranjan wrote: >>>> Hi Suzuki, >>>> >>>> Thanks for looking into this issue. >>>> >>>> On 2020-04-07 15:54, Suzuki K Poulose wrote: >>>>> On 04/07/2020 10:46 AM, Sai Prakash Ranjan wrote: >>>>> >>>>> There seems to be two replicators back to back here. What is >>>>> connected >>>>> to the other output of both of them ? Are there any TPIUs ? What >>>>> happens >>>>> if you choose a sink on the other end of "swao_replicator" (ETB ?) >>>>> >>>> >>>> The other outport of swao replicator is connected to EUD which is a >>>> QCOM specific HW which can be used as a sink like USB. >>>> And the other outport of other replicator(replicator_out) is >>>> connected to >>>> TPIU. >>>> >>>>> After boot, what do the idfilter registers read for both the >>>>> replicators ? >>>>> >>>> >>>> Added some prints in replicator_probe. >>>> >>>> replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 >>>> idfilter1=0x0 >>>> replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff >>>> idfilter1=0xff >>>> replicator probe ret=0 devname=6046000.replicator idfilter0=0xff >>>> idfilter1=0xff >>> >>> Curious to see how the idfilterX is set to 0: >>> if that is never used. >>> Or >>> if the user doesn't reset it back to 0xff. >>> >> >> For both replicators, the default value seems to be 0x0. >> >> replicator probe in res ret=0 devname=6046000.replicator >> idfilter0=0x0 idfilter1=0x0 >> replicator probe ret=-517 devname=6046000.replicator idfilter0=0x0 >> idfilter1=0x0 >> replicator probe in res ret=0 devname=6b06000.replicator >> idfilter0=0x0 idfilter1=0x0 >> replicator probe ret=0 devname=6b06000.replicator idfilter0=0xff >> idfilter1=0xff >> replicator probe in res ret=0 devname=6046000.replicator >> idfilter0=0x0 idfilter1=0x0 >> replicator probe ret=0 devname=6046000.replicator idfilter0=0xff >> idfilter1=0xff > > I am not sure how you have added the debugs, but it looks like the > drivers set 0xff for both the port filters on a successful probe. > About the earlier mentioned points on: 1) Disallow turning the replicator ON, when it is already turned ON 2) Do what your patch does. i.e, disable the other end while one end is turned on. Do we need 1) and should we go ahead with this? Thanks, Sai
diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c index e7dc1c31d20d..f4eaa38f8f43 100644 --- a/drivers/hwtracing/coresight/coresight-replicator.c +++ b/drivers/hwtracing/coresight/coresight-replicator.c @@ -66,14 +66,16 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata, int inport, int outport) { int rc = 0; - u32 reg; + u32 reg0, reg1; switch (outport) { case 0: - reg = REPLICATOR_IDFILTER0; + reg0 = REPLICATOR_IDFILTER0; + reg1 = REPLICATOR_IDFILTER1; break; case 1: - reg = REPLICATOR_IDFILTER1; + reg0 = REPLICATOR_IDFILTER1; + reg1 = REPLICATOR_IDFILTER0; break; default: WARN_ON(1); @@ -87,8 +89,11 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata, rc = coresight_claim_device_unlocked(drvdata->base); /* Ensure that the outport is enabled. */ - if (!rc) - writel_relaxed(0x00, drvdata->base + reg); + if (!rc) { + writel_relaxed(0x00, drvdata->base + reg0); + writel_relaxed(0xff, drvdata->base + reg1); + } + CS_LOCK(drvdata->base); return rc;
Since commit 30af4fb619e5 ("coresight: dynamic-replicator: Handle multiple connections"), we do not make sure that the other port is disabled when the dynamic replicator is enabled. This is seen to cause the CPU hardlockup atleast on SC7180 SoC when enabling ETM with ETR as the sink via sysfs. Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S B 5.4.25 #100 Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT) Call trace: dump_backtrace+0x0/0x188 show_stack+0x20/0x2c dump_stack+0xdc/0x144 panic+0x168/0x370 arch_seccomp_spec_mitigate+0x0/0x14 watchdog_timer_fn+0x68/0x290 __hrtimer_run_queues+0x264/0x498 hrtimer_interrupt+0xf0/0x22c arch_timer_handler_phys+0x40/0x50 handle_percpu_devid_irq+0x8c/0x158 __handle_domain_irq+0x84/0xc4 gic_handle_irq+0x100/0x1c4 el1_irq+0xbc/0x180 arch_cpu_idle+0x3c/0x5c default_idle_call+0x1c/0x38 do_idle+0x100/0x280 cpu_startup_entry+0x24/0x28 secondary_start_kernel+0x15c/0x170 SMP: stopping secondary CPUs Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> --- .../hwtracing/coresight/coresight-replicator.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)