mbox series

[0/4] perf/arm-cmn: Add CMN-650 and CMN-700

Message ID cover.1650320598.git.robin.murphy@arm.com (mailing list archive)
Headers show
Series perf/arm-cmn: Add CMN-650 and CMN-700 | expand

Message

Robin Murphy April 18, 2022, 10:57 p.m. UTC
Hi Will,

As promised last time, here are the patches adding yet more inscrutable
support for the other new members of the CMN family. I'm not sure if any
of the folks who want this are the kind to give upstream feedback, but
we can always live in hope... A colleague has kindly managed to briefly
sanity-check the CMN-700 support under RTL emulation, but otherwise I've
only been able to test the refactorings in terms of not breaking my
little CMN-600 platform. Still, I'm pretty confident that that's caught
all the stupids, and the rest is mostly just yet more events, so I
don't think anything's particularly high-risk here.

Cheers,
Robin.


Robin Murphy (4):
  dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
  perf/arm-cmn: Add CMN-650 support
  perf/arm-cmn: Refactor occupancy filter selector
  perf/arm-cmn: Add CMN-700 support

 .../devicetree/bindings/perf/arm,cmn.yaml     |   2 +
 drivers/perf/arm-cmn.c                        | 606 ++++++++++++++----
 2 files changed, 485 insertions(+), 123 deletions(-)

Comments

Ilkka Koskinen April 21, 2022, 7:54 a.m. UTC | #1
On Mon, 18 Apr 2022, Robin Murphy wrote:

> Hi Will,
>
> As promised last time, here are the patches adding yet more inscrutable
> support for the other new members of the CMN family. I'm not sure if any
> of the folks who want this are the kind to give upstream feedback, but
> we can always live in hope... A colleague has kindly managed to briefly
> sanity-check the CMN-700 support under RTL emulation, but otherwise I've
> only been able to test the refactorings in terms of not breaking my
> little CMN-600 platform. Still, I'm pretty confident that that's caught
> all the stupids, and the rest is mostly just yet more events, so I
> don't think anything's particularly high-risk here.
>
> Cheers,
> Robin.

I tested the patch set on CMN-650 (and CMN-600). Apart from extra events 
on r2p0, everything seemed to work fine.

Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

>
>
> Robin Murphy (4):
>  dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
>  perf/arm-cmn: Add CMN-650 support
>  perf/arm-cmn: Refactor occupancy filter selector
>  perf/arm-cmn: Add CMN-700 support
>
> .../devicetree/bindings/perf/arm,cmn.yaml     |   2 +
> drivers/perf/arm-cmn.c                        | 606 ++++++++++++++----
> 2 files changed, 485 insertions(+), 123 deletions(-)
>
> -- 
> 2.35.3.dirty
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Will Deacon May 6, 2022, 2:43 p.m. UTC | #2
On Mon, 18 Apr 2022 23:57:37 +0100, Robin Murphy wrote:
> As promised last time, here are the patches adding yet more inscrutable
> support for the other new members of the CMN family. I'm not sure if any
> of the folks who want this are the kind to give upstream feedback, but
> we can always live in hope... A colleague has kindly managed to briefly
> sanity-check the CMN-700 support under RTL emulation, but otherwise I've
> only been able to test the refactorings in terms of not breaking my
> little CMN-600 platform. Still, I'm pretty confident that that's caught
> all the stupids, and the rest is mostly just yet more events, so I
> don't think anything's particularly high-risk here.
> 
> [...]

Applied to will (for-next/perf), thanks!

[1/4] dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
      https://git.kernel.org/will/c/2b60a22b70fa
[2/4] perf/arm-cmn: Add CMN-650 support
      https://git.kernel.org/will/c/8e504d93acb6
[3/4] perf/arm-cmn: Refactor occupancy filter selector
      https://git.kernel.org/will/c/65adf71398f5
[4/4] perf/arm-cmn: Add CMN-700 support
      https://git.kernel.org/will/c/23760a014417

Cheers,
Qian Cai May 10, 2022, 5:15 p.m. UTC | #3
On Mon, Apr 18, 2022 at 11:57:37PM +0100, Robin Murphy wrote:
> Hi Will,
> 
> As promised last time, here are the patches adding yet more inscrutable
> support for the other new members of the CMN family. I'm not sure if any
> of the folks who want this are the kind to give upstream feedback, but
> we can always live in hope... A colleague has kindly managed to briefly
> sanity-check the CMN-700 support under RTL emulation, but otherwise I've
> only been able to test the refactorings in terms of not breaking my
> little CMN-600 platform. Still, I'm pretty confident that that's caught
> all the stupids, and the rest is mostly just yet more events, so I
> don't think anything's particularly high-risk here.
> 
> Cheers,
> Robin.
> 
> 
> Robin Murphy (4):
>   dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
>   perf/arm-cmn: Add CMN-650 support
>   perf/arm-cmn: Refactor occupancy filter selector
>   perf/arm-cmn: Add CMN-700 support
> 
>  .../devicetree/bindings/perf/arm,cmn.yaml     |   2 +
>  drivers/perf/arm-cmn.c                        | 606 ++++++++++++++----
>  2 files changed, 485 insertions(+), 123 deletions(-)

Reverting this series fixed a null-ptr-deref while running some syscall
fuzzers by an unprivileged user on a Neoverse-N1 server with two CMN-600
(one for each socket).

 KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 [dfff800000000007] address between user and kernel address ranges
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 CPU: 7 PID: 115283 Comm: trinity-c57 Not tainted 5.18.0-rc6-next-20220510-dirty #93
 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : arm_cmn_event_init [arm_cmn]
 lr : perf_try_init_event
 sp : ffff800058eb7a00
 x29: ffff800058eb7a10 x28: ffff080232bed880 x27: ffff400ccef2e120
 x26: ffff400ccef2e280 x25: 000000000000001f x24: 0000000000000003
 x23: 0000000000000001 x22: dfff800000000000 x21: 0000000000000219
 x20: ffffb0f539491e88 x19: 0000000000000000 x18: 0000000000000000
 x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaae0821540
 x14: 0000000000000028 x13: 0000000000000000 x12: ffff70000b1d6f03
 x11: 1ffff0000b1d6f02 x10: ffff70000b1d6f02 x9 : ffffb0f572260468
 x8 : ffff800058eb7817 x7 : ffff400ccef2e200 x6 : 1fffe80199de5c40
 x5 : ffffb0f53948fa00 x4 : 0000000000000000 x3 : ffff080232bed8e0
 x2 : ffffb0f539491efc x1 : 0000000000000007 x0 : 0000000000000038
 Call trace:
  arm_cmn_event_init [arm_cmn]
  perf_try_init_event
  perf_init_event
  perf_event_alloc
  __do_sys_perf_event_open
  __arm64_sys_perf_event_open
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync
 Code: 35002a21 f9400293 9100e260 d343fc01 (38f66821)
 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Kernel Offset: 0x30f569c50000 from 0xffff800008000000
 PHYS_OFFSET: 0x80000000
 CPU features: 0x000,0042e015,19801c82
 Memory Limit: none


0x3e6c is in arm_cmn_event_init (drivers/perf/arm-cmn.c:1528).
1523    }
1524
1525
1526    static int arm_cmn_event_init(struct perf_event *event)
1527    {
1528            struct arm_cmn *cmn = to_cmn(event->pmu);
1529            struct arm_cmn_hw_event *hw = to_cmn_hw(event);
1530            struct arm_cmn_node *dn;
1531            enum cmn_node_type type;
1532            bool bynodeid;

# "map_1" has the same content.
$ sudo cat /sys/kernel/debug/arm-cmn/map
     X    0        1        2        3        4        5        6        7
Y P D+--------+--------+--------+--------+--------+--------+--------+--------+
5    | XP #40 | XP #41 | XP #42 | XP #43 | XP #44 | XP #45 | XP #46 | XP #47 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  CXRH  |  HN-I  |  HN-I  |  HN-I  |  HN-D  |  HN-I  |  HN-I  |  CXRH  |
    0|   #2   |   #3   |   #4   |   #5   |   #0   |   #6   |   #7   |   #3   |
    1|        |        |        |        |        |        |        |        |
  1  |  RN-D  |  RN-D  |  RN-D  |  RN-I  |  RN-I  |  RN-D  |  RN-D  |  RN-D  |
    0|   #2   |   #3   |   #4   |   #10  |   #11  |   #5   |   #6   |   #7   |
    1|        |        |        |        |        |        |        |        |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
4    | XP #32 | XP #33 | XP #34 | XP #35 | XP #36 | XP #37 | XP #38 | XP #39 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #8   |        |        |   #9   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
    0|   #24  |        |   #26  |        |        |   #28  |        |   #30  |
    1|   #25  |        |   #27  |        |        |   #29  |        |   #31  |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
3    | XP #24 | XP #25 | XP #26 | XP #27 | XP #28 | XP #29 | XP #30 | XP #31 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #6   |        |        |   #7   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
    0|   #16  |        |   #18  |        |        |   #20  |        |   #22  |
    1|   #17  |        |   #19  |        |        |   #21  |        |   #23  |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
2    | XP #16 | XP #17 | XP #18 | XP #19 | XP #20 | XP #21 | XP #22 | XP #23 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #4   |        |        |   #5   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
    0|   #8   |        |   #10  |        |        |   #12  |        |   #14  |
    1|   #9   |        |   #11  |        |        |   #13  |        |   #15  |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
1    | XP #8  | XP #9  | XP #10 | XP #11 | XP #12 | XP #13 | XP #14 | XP #15 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #2   |        |        |   #3   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
    0|   #0   |        |   #2   |        |        |   #4   |        |   #6   |
    1|   #1   |        |   #3   |        |        |   #5   |        |   #7   |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
0    | XP #0  | XP #1  | XP #2  | XP #3  | XP #4  | XP #5  | XP #6  | XP #7  |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  CXRH  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  CXRH  |
    0|   #0   |        |   #0   |        |        |   #1   |        |   #1   |
    1|        |        |        |        |        |        |        |        |
  1  |  RN-D  | RN-F_B |  HN-I  | RN-F_B | RN-F_B |  HN-I  | RN-F_B |  RN-D  |
    0|   #0   |        |   #1   |        |        |   #2   |        |   #1   |
    1|        |        |        |        |        |        |        |        |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
Robin Murphy May 10, 2022, 5:50 p.m. UTC | #4
On 2022-05-10 18:15, Qian Cai wrote:
> On Mon, Apr 18, 2022 at 11:57:37PM +0100, Robin Murphy wrote:
>> Hi Will,
>>
>> As promised last time, here are the patches adding yet more inscrutable
>> support for the other new members of the CMN family. I'm not sure if any
>> of the folks who want this are the kind to give upstream feedback, but
>> we can always live in hope... A colleague has kindly managed to briefly
>> sanity-check the CMN-700 support under RTL emulation, but otherwise I've
>> only been able to test the refactorings in terms of not breaking my
>> little CMN-600 platform. Still, I'm pretty confident that that's caught
>> all the stupids, and the rest is mostly just yet more events, so I
>> don't think anything's particularly high-risk here.
>>
>> Cheers,
>> Robin.
>>
>>
>> Robin Murphy (4):
>>    dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
>>    perf/arm-cmn: Add CMN-650 support
>>    perf/arm-cmn: Refactor occupancy filter selector
>>    perf/arm-cmn: Add CMN-700 support
>>
>>   .../devicetree/bindings/perf/arm,cmn.yaml     |   2 +
>>   drivers/perf/arm-cmn.c                        | 606 ++++++++++++++----
>>   2 files changed, 485 insertions(+), 123 deletions(-)
> 
> Reverting this series fixed a null-ptr-deref while running some syscall
> fuzzers by an unprivileged user on a Neoverse-N1 server with two CMN-600
> (one for each socket).
> 
>   KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
>   Mem abort info:
>     ESR = 0x0000000096000004
>     EC = 0x25: DABT (current EL), IL = 32 bits
>     SET = 0, FnV = 0
>     EA = 0, S1PTW = 0
>     FSC = 0x04: level 0 translation fault
>   Data abort info:
>     ISV = 0, ISS = 0x00000004
>     CM = 0, WnR = 0
>   [dfff800000000007] address between user and kernel address ranges
>   Internal error: Oops: 96000004 [#1] PREEMPT SMP
>   CPU: 7 PID: 115283 Comm: trinity-c57 Not tainted 5.18.0-rc6-next-20220510-dirty #93
>   pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : arm_cmn_event_init [arm_cmn]
>   lr : perf_try_init_event
>   sp : ffff800058eb7a00
>   x29: ffff800058eb7a10 x28: ffff080232bed880 x27: ffff400ccef2e120
>   x26: ffff400ccef2e280 x25: 000000000000001f x24: 0000000000000003
>   x23: 0000000000000001 x22: dfff800000000000 x21: 0000000000000219
>   x20: ffffb0f539491e88 x19: 0000000000000000 x18: 0000000000000000
>   x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaae0821540
>   x14: 0000000000000028 x13: 0000000000000000 x12: ffff70000b1d6f03
>   x11: 1ffff0000b1d6f02 x10: ffff70000b1d6f02 x9 : ffffb0f572260468
>   x8 : ffff800058eb7817 x7 : ffff400ccef2e200 x6 : 1fffe80199de5c40
>   x5 : ffffb0f53948fa00 x4 : 0000000000000000 x3 : ffff080232bed8e0
>   x2 : ffffb0f539491efc x1 : 0000000000000007 x0 : 0000000000000038
>   Call trace:
>    arm_cmn_event_init [arm_cmn]
>    perf_try_init_event
>    perf_init_event
>    perf_event_alloc
>    __do_sys_perf_event_open
>    __arm64_sys_perf_event_open
>    invoke_syscall
>    el0_svc_common.constprop.0
>    do_el0_svc
>    el0_svc
>    el0t_64_sync_handler
>    el0t_64_sync
>   Code: 35002a21 f9400293 9100e260 d343fc01 (38f66821)
>   ---[ end trace 0000000000000000 ]---
>   Kernel panic - not syncing: Oops: Fatal exception
>   SMP: stopping secondary CPUs
>   Kernel Offset: 0x30f569c50000 from 0xffff800008000000
>   PHYS_OFFSET: 0x80000000
>   CPU features: 0x000,0042e015,19801c82
>   Memory Limit: none
> 
> 
> 0x3e6c is in arm_cmn_event_init (drivers/perf/arm-cmn.c:1528).
> 1523    }
> 1524
> 1525
> 1526    static int arm_cmn_event_init(struct perf_event *event)
> 1527    {
> 1528            struct arm_cmn *cmn = to_cmn(event->pmu);
> 1529            struct arm_cmn_hw_event *hw = to_cmn_hw(event);
> 1530            struct arm_cmn_node *dn;
> 1531            enum cmn_node_type type;
> 1532            bool bynodeid;

Hmm, can you narrow it down a bit more to a particular patch and/or a 
specific reproducer? I don't see how it could really be crashing 
dereferencing event->pmu, especially since none of the code this early 
in event_init has even changed recently :/

AFAICS the most plausible thing at offset 0x38 from anywhere would be 
event->pmu->type, but again, if event->pmu is NULL here that's surely a 
fault in perf core rather than the driver?

> # "map_1" has the same content.
> $ sudo cat /sys/kernel/debug/arm-cmn/map
>       X    0        1        2        3        4        5        6        7
> Y P D+--------+--------+--------+--------+--------+--------+--------+--------+
> 5    | XP #40 | XP #41 | XP #42 | XP #43 | XP #44 | XP #45 | XP #46 | XP #47 |
>       | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
>       |........|........|........|........|........|........|........|........|
>    0  |  CXRH  |  HN-I  |  HN-I  |  HN-I  |  HN-D  |  HN-I  |  HN-I  |  CXRH  |
>      0|   #2   |   #3   |   #4   |   #5   |   #0   |   #6   |   #7   |   #3   |
>      1|        |        |        |        |        |        |        |        |
>    1  |  RN-D  |  RN-D  |  RN-D  |  RN-I  |  RN-I  |  RN-D  |  RN-D  |  RN-D  |
>      0|   #2   |   #3   |   #4   |   #10  |   #11  |   #5   |   #6   |   #7   |
>      1|        |        |        |        |        |        |        |        |
> -----+--------+--------+--------+--------+--------+--------+--------+--------+
> 4    | XP #32 | XP #33 | XP #34 | XP #35 | XP #36 | XP #37 | XP #38 | XP #39 |
>       | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
>       |........|........|........|........|........|........|........|........|
>    0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
>      0|        |        |   #8   |        |        |   #9   |        |        |
>      1|        |        |        |        |        |        |        |        |
>    1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
>      0|   #24  |        |   #26  |        |        |   #28  |        |   #30  |
>      1|   #25  |        |   #27  |        |        |   #29  |        |   #31  |

Something's definitely screwy here though... these ID numbers should 
have a device type above them. Was this dumped with patch #1 applied, 
and if not do they show as "????" when it is?

Thanks,
Robin.

> -----+--------+--------+--------+--------+--------+--------+--------+--------+
> 3    | XP #24 | XP #25 | XP #26 | XP #27 | XP #28 | XP #29 | XP #30 | XP #31 |
>       | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
>       |........|........|........|........|........|........|........|........|
>    0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
>      0|        |        |   #6   |        |        |   #7   |        |        |
>      1|        |        |        |        |        |        |        |        |
>    1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
>      0|   #16  |        |   #18  |        |        |   #20  |        |   #22  |
>      1|   #17  |        |   #19  |        |        |   #21  |        |   #23  |
> -----+--------+--------+--------+--------+--------+--------+--------+--------+
> 2    | XP #16 | XP #17 | XP #18 | XP #19 | XP #20 | XP #21 | XP #22 | XP #23 |
>       | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
>       |........|........|........|........|........|........|........|........|
>    0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
>      0|        |        |   #4   |        |        |   #5   |        |        |
>      1|        |        |        |        |        |        |        |        |
>    1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
>      0|   #8   |        |   #10  |        |        |   #12  |        |   #14  |
>      1|   #9   |        |   #11  |        |        |   #13  |        |   #15  |
> -----+--------+--------+--------+--------+--------+--------+--------+--------+
> 1    | XP #8  | XP #9  | XP #10 | XP #11 | XP #12 | XP #13 | XP #14 | XP #15 |
>       | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
>       |........|........|........|........|........|........|........|........|
>    0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
>      0|        |        |   #2   |        |        |   #3   |        |        |
>      1|        |        |        |        |        |        |        |        |
>    1  |        | RN-F_B |        | RN-F_B | RN-F_B |        | RN-F_B |        |
>      0|   #0   |        |   #2   |        |        |   #4   |        |   #6   |
>      1|   #1   |        |   #3   |        |        |   #5   |        |   #7   |
> -----+--------+--------+--------+--------+--------+--------+--------+--------+
> 0    | XP #0  | XP #1  | XP #2  | XP #3  | XP #4  | XP #5  | XP #6  | XP #7  |
>       | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
>       |........|........|........|........|........|........|........|........|
>    0  |  CXRH  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  CXRH  |
>      0|   #0   |        |   #0   |        |        |   #1   |        |   #1   |
>      1|        |        |        |        |        |        |        |        |
>    1  |  RN-D  | RN-F_B |  HN-I  | RN-F_B | RN-F_B |  HN-I  | RN-F_B |  RN-D  |
>      0|   #0   |        |   #1   |        |        |   #2   |        |   #1   |
>      1|        |        |        |        |        |        |        |        |
> -----+--------+--------+--------+--------+--------+--------+--------+--------+
Qian Cai May 10, 2022, 7:38 p.m. UTC | #5
On Tue, May 10, 2022 at 06:50:56PM +0100, Robin Murphy wrote:
> Hmm, can you narrow it down a bit more to a particular patch and/or a
> specific reproducer? I don't see how it could really be crashing
> dereferencing event->pmu, especially since none of the code this early in
> event_init has even changed recently :/

Right, there is an old bug that sometimes the debuginfo on kernel
modules is not accurate on arm64, so gdb and faddr2line gave the wrong
line. Anyway, using the printk() method, we were faulted on this line.

+       /* This is sufficiently annoying to recalculate, so cache it */
+       hw->filter_sel = arm_cmn_filter_sel(cmn->model, type, eventid);

Also, it can be reproduced within 5-min by running,

$ trinity -C 128 -cperf_event_open

> Something's definitely screwy here though... these ID numbers should have a
> device type above them. Was this dumped with patch #1 applied, and if not do
> they show as "????" when it is?

Yes, with the series applied, we have,


     X    0        1        2        3        4        5        6        7
Y P D+--------+--------+--------+--------+--------+--------+--------+--------+
5    | XP #40 | XP #41 | XP #42 | XP #43 | XP #44 | XP #45 | XP #46 | XP #47 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  CXRH  |  HN-I  |  HN-I  |  HN-I  |  HN-D  |  HN-I  |  HN-I  |  CXRH  |
    0|   #2   |   #3   |   #4   |   #5   |   #0   |   #6   |   #7   |   #3   |
    1|        |        |        |        |        |        |        |        |
  1  |  RN-D  |  RN-D  |  RN-D  |  RN-I  |  RN-I  |  RN-D  |  RN-D  |  RN-D  |
    0|   #2   |   #3   |   #4   |   #10  |   #11  |   #5   |   #6   |   #7   |
    1|        |        |        |        |        |        |        |        |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
4    | XP #32 | XP #33 | XP #34 | XP #35 | XP #36 | XP #37 | XP #38 | XP #39 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #8   |        |        |   #9   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |  ????  | RN-F_B |  ????  | RN-F_B | RN-F_B |  ????  | RN-F_B |  ????  |
    0|   #24  |        |   #26  |        |        |   #28  |        |   #30  |
    1|   #25  |        |   #27  |        |        |   #29  |        |   #31  |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
3    | XP #24 | XP #25 | XP #26 | XP #27 | XP #28 | XP #29 | XP #30 | XP #31 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #6   |        |        |   #7   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |  ????  | RN-F_B |  ????  | RN-F_B | RN-F_B |  ????  | RN-F_B |  ????  |
    0|   #16  |        |   #18  |        |        |   #20  |        |   #22  |
    1|   #17  |        |   #19  |        |        |   #21  |        |   #23  |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
2    | XP #16 | XP #17 | XP #18 | XP #19 | XP #20 | XP #21 | XP #22 | XP #23 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #4   |        |        |   #5   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |  ????  | RN-F_B |  ????  | RN-F_B | RN-F_B |  ????  | RN-F_B |  ????  |
    0|   #8   |        |   #10  |        |        |   #12  |        |   #14  |
    1|   #9   |        |   #11  |        |        |   #13  |        |   #15  |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
1    | XP #8  | XP #9  | XP #10 | XP #11 | XP #12 | XP #13 | XP #14 | XP #15 |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  SN-F  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  SN-F  |
    0|        |        |   #2   |        |        |   #3   |        |        |
    1|        |        |        |        |        |        |        |        |
  1  |  ????  | RN-F_B |  ????  | RN-F_B | RN-F_B |  ????  | RN-F_B |  ????  |
    0|   #0   |        |   #2   |        |        |   #4   |        |   #6   |
    1|   #1   |        |   #3   |        |        |   #5   |        |   #7   |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
0    | XP #0  | XP #1  | XP #2  | XP #3  | XP #4  | XP #5  | XP #6  | XP #7  |
     | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  | DTC 0  |
     |........|........|........|........|........|........|........|........|
  0  |  CXRH  | RN-F_B |  RN-I  | RN-F_B | RN-F_B |  RN-I  | RN-F_B |  CXRH  |
    0|   #0   |        |   #0   |        |        |   #1   |        |   #1   |
    1|        |        |        |        |        |        |        |        |
  1  |  RN-D  | RN-F_B |  HN-I  | RN-F_B | RN-F_B |  HN-I  | RN-F_B |  RN-D  |
    0|   #0   |        |   #1   |        |        |   #2   |        |   #1   |
    1|        |        |        |        |        |        |        |        |
-----+--------+--------+--------+--------+--------+--------+--------+--------+
Qian Cai May 10, 2022, 8:09 p.m. UTC | #6
On Tue, May 10, 2022 at 06:50:56PM +0100, Robin Murphy wrote:
> Hmm, can you narrow it down a bit more to a particular patch and/or a
> specific reproducer? I don't see how it could really be crashing
> dereferencing event->pmu, especially since none of the code this early in
> event_init has even changed recently :/

Specifically, One of "e" == NULL in arm_cmn_filter_sel().

e = container_of(arm_cmn_event_attrs[i], typeof(*e), attr.attr);

Then, the next line will dereference it.
Robin Murphy May 10, 2022, 9:05 p.m. UTC | #7
On 2022-05-10 20:38, Qian Cai wrote:
> On Tue, May 10, 2022 at 06:50:56PM +0100, Robin Murphy wrote:
>> Hmm, can you narrow it down a bit more to a particular patch and/or a
>> specific reproducer? I don't see how it could really be crashing
>> dereferencing event->pmu, especially since none of the code this early in
>> event_init has even changed recently :/
> 
> Right, there is an old bug that sometimes the debuginfo on kernel
> modules is not accurate on arm64, so gdb and faddr2line gave the wrong
> line. Anyway, using the printk() method, we were faulted on this line.
> 
> +       /* This is sufficiently annoying to recalculate, so cache it */
> +       hw->filter_sel = arm_cmn_filter_sel(cmn->model, type, eventid);

Aha, that's a much more believable culprit - 0x38 also stood out as an 
offset in the attribute structures.

> Also, it can be reproduced within 5-min by running,
> 
> $ trinity -C 128 -cperf_event_open

I can already guess what I've done, and just reproduced the crash 
directly on the first try :)

I'll work on the fix tomorrow, thanks for the report and info!

>> Something's definitely screwy here though... these ID numbers should have a
>> device type above them. Was this dumped with patch #1 applied, and if not do
>> they show as "????" when it is?
> 
> Yes, with the series applied, we have,

Ah, turns out I can no longer get away with being lazy and not masking 
the device_type field properly when reading connect_info, as some of the 
adjacent bits are no longer reserved since the debugfs code was first 
written - I've not played with a CAL config of any of the newer stuff, 
or I might have noticed sooner.

But at least that tweak to differentiate unknown from unconnected has 
now paid for itself :D

Cheers,
Robin.