mbox series

[v4,0/2] Fix RISC-V's arch-topology reporting

Message ID 20220715175155.3567243-1-mail@conchuod.ie (mailing list archive)
Headers show
Series Fix RISC-V's arch-topology reporting | expand

Message

Conor Dooley July 15, 2022, 5:51 p.m. UTC
From: Conor Dooley <conor.dooley@microchip.com>

Hey all,
It's my first time messing around with arch/ code at all, let alone
more than one arch, so forgive me if I have screwed up how to do a
migration like this.

The goal here is the fix the incorrectly reported arch topology on
RISC-V which seems to have been broken since it was added.
cpu, package and thread IDs are all currently reported as -1, so tools
like lstopo think systems have multiple threads on the same core when
this is not true:
https://github.com/open-mpi/hwloc/issues/536

arm64's topology code basically applies to RISC-V too, so it has been
made generic along with the removal of MPIDR related code, which
appears to be redudant code since '3102bc0e6ac7 ("arm64: topology: Stop
using MPIDR for topology information")' replaced the code that actually
interacted with MPIDR with default values.

I only built tested for arm{,64} , so hopefully it is not broken when
used. Testing on both arm64 & !SMP RISC-V would really be appreciated!

For V2, I dropped the idea of doing a RISC-V specific implementation
followed by a move to the generic code & just went for the more straight
forward method of moving to the shared version first. I also dropped the
RFC.

V3 moves store_cpu_topology()'s definition down inside the arch check
alongside the init function so that boot on 32bit arm is not broken.

V4 has moved the RISC-V boot hart's call to store_cpu_topology() later
into the boot process it is now right before SMP is brought up (or not
in the case of !SMP). This prevents calling detect_cache_attributes()
while we cannot allocate memory.

V4 is also rebased on next-20220715 to get Sudeep's most recent
arch_topology patchset.

Thanks,
Conor

Conor Dooley (2):
  arm64: topology: move store_cpu_topology() to shared code
  riscv: topology: fix default topology reporting

 arch/arm64/kernel/topology.c | 40 ------------------------------------
 arch/riscv/Kconfig           |  2 +-
 arch/riscv/kernel/smpboot.c  |  3 ++-
 drivers/base/arch_topology.c | 19 +++++++++++++++++
 4 files changed, 22 insertions(+), 42 deletions(-)


base-commit: 6014cfa5bf32cf8c5c58b3cfd5ee0e1542c8a825

Comments

Conor Dooley July 16, 2022, 1:35 p.m. UTC | #1
On 15/07/2022 18:51, Conor Dooley wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> From: Conor Dooley <conor.dooley@microchip.com>
> 
> Hey all,
> It's my first time messing around with arch/ code at all, let alone
> more than one arch, so forgive me if I have screwed up how to do a
> migration like this.
> 
> The goal here is the fix the incorrectly reported arch topology on
> RISC-V which seems to have been broken since it was added.
> cpu, package and thread IDs are all currently reported as -1, so tools
> like lstopo think systems have multiple threads on the same core when
> this is not true:
> https://github.com/open-mpi/hwloc/issues/536
> 
> arm64's topology code basically applies to RISC-V too, so it has been
> made generic along with the removal of MPIDR related code, which
> appears to be redudant code since '3102bc0e6ac7 ("arm64: topology: Stop
> using MPIDR for topology information")' replaced the code that actually
> interacted with MPIDR with default values.
> 
> I only built tested for arm{,64} , so hopefully it is not broken when
> used. Testing on both arm64 & !SMP RISC-V would really be appreciated!

FWIW, I pushed it to a branch for the sake of LKP testing and all
archs + randconfigs built fine.
Thanks,
Conor.

> 
> For V2, I dropped the idea of doing a RISC-V specific implementation
> followed by a move to the generic code & just went for the more straight
> forward method of moving to the shared version first. I also dropped the
> RFC.
> 
> V3 moves store_cpu_topology()'s definition down inside the arch check
> alongside the init function so that boot on 32bit arm is not broken.
> 
> V4 has moved the RISC-V boot hart's call to store_cpu_topology() later
> into the boot process it is now right before SMP is brought up (or not
> in the case of !SMP). This prevents calling detect_cache_attributes()
> while we cannot allocate memory.
> 
> V4 is also rebased on next-20220715 to get Sudeep's most recent
> arch_topology patchset.
> 
> Thanks,
> Conor
> 
> Conor Dooley (2):
>   arm64: topology: move store_cpu_topology() to shared code
>   riscv: topology: fix default topology reporting
> 
>  arch/arm64/kernel/topology.c | 40 ------------------------------------
>  arch/riscv/Kconfig           |  2 +-
>  arch/riscv/kernel/smpboot.c  |  3 ++-
>  drivers/base/arch_topology.c | 19 +++++++++++++++++
>  4 files changed, 22 insertions(+), 42 deletions(-)
> 
> 
> base-commit: 6014cfa5bf32cf8c5c58b3cfd5ee0e1542c8a825
> --
> 2.37.1
>
Conor Dooley July 23, 2022, 11:22 a.m. UTC | #2
On 15/07/2022 18:51, Conor Dooley wrote:
> From: Conor Dooley <conor.dooley@microchip.com>
> 
> Hey all,
> It's my first time messing around with arch/ code at all, let alone
> more than one arch, so forgive me if I have screwed up how to do a
> migration like this.
> 
> The goal here is the fix the incorrectly reported arch topology on
> RISC-V which seems to have been broken since it was added.
> cpu, package and thread IDs are all currently reported as -1, so tools
> like lstopo think systems have multiple threads on the same core when
> this is not true:
> https://github.com/open-mpi/hwloc/issues/536

Hey,

Not got any feedback on the smpboot changes from the RISC-V side.
I tested it on polarfire, the d1 (with both SMP & !SMP set iirc)
& on the u540. It all looked good to me.

I'd like to have this fixed for v5.20, but there isn't too much
time left before the mw. Not too sure about the cross-tree changes,
does it need an immutable branch or could it go through driver-core?
Catalin suggested removing the CC stable from patch 1/2 & adding it
as a dependency for the 2/2 patch - but obviously that's up to the
committer to sort out.

I guess since it is a fix, it could also go into rc1<

Thanks,
Conor.

> 
> arm64's topology code basically applies to RISC-V too, so it has been
> made generic along with the removal of MPIDR related code, which
> appears to be redudant code since '3102bc0e6ac7 ("arm64: topology: Stop
> using MPIDR for topology information")' replaced the code that actually
> interacted with MPIDR with default values.
> 
> I only built tested for arm{,64} , so hopefully it is not broken when
> used. Testing on both arm64 & !SMP RISC-V would really be appreciated!
> 
> For V2, I dropped the idea of doing a RISC-V specific implementation
> followed by a move to the generic code & just went for the more straight
> forward method of moving to the shared version first. I also dropped the
> RFC.
> 
> V3 moves store_cpu_topology()'s definition down inside the arch check
> alongside the init function so that boot on 32bit arm is not broken.
> 
> V4 has moved the RISC-V boot hart's call to store_cpu_topology() later
> into the boot process it is now right before SMP is brought up (or not
> in the case of !SMP). This prevents calling detect_cache_attributes()
> while we cannot allocate memory.
> 
> V4 is also rebased on next-20220715 to get Sudeep's most recent
> arch_topology patchset.
> 
> Thanks,
> Conor
> 
> Conor Dooley (2):
>   arm64: topology: move store_cpu_topology() to shared code
>   riscv: topology: fix default topology reporting
> 
>  arch/arm64/kernel/topology.c | 40 ------------------------------------
>  arch/riscv/Kconfig           |  2 +-
>  arch/riscv/kernel/smpboot.c  |  3 ++-
>  drivers/base/arch_topology.c | 19 +++++++++++++++++
>  4 files changed, 22 insertions(+), 42 deletions(-)
> 
> 
> base-commit: 6014cfa5bf32cf8c5c58b3cfd5ee0e1542c8a825
Will Deacon July 25, 2022, 9:13 a.m. UTC | #3
On Sat, Jul 23, 2022 at 11:22:01AM +0000, Conor.Dooley@microchip.com wrote:
> On 15/07/2022 18:51, Conor Dooley wrote:
> > From: Conor Dooley <conor.dooley@microchip.com>
> > 
> > Hey all,
> > It's my first time messing around with arch/ code at all, let alone
> > more than one arch, so forgive me if I have screwed up how to do a
> > migration like this.
> > 
> > The goal here is the fix the incorrectly reported arch topology on
> > RISC-V which seems to have been broken since it was added.
> > cpu, package and thread IDs are all currently reported as -1, so tools
> > like lstopo think systems have multiple threads on the same core when
> > this is not true:
> > https://github.com/open-mpi/hwloc/issues/536
> 
> Hey,
> 
> Not got any feedback on the smpboot changes from the RISC-V side.
> I tested it on polarfire, the d1 (with both SMP & !SMP set iirc)
> & on the u540. It all looked good to me.
> 
> I'd like to have this fixed for v5.20, but there isn't too much
> time left before the mw. Not too sure about the cross-tree changes,
> does it need an immutable branch or could it go through driver-core?
> Catalin suggested removing the CC stable from patch 1/2 & adding it
> as a dependency for the 2/2 patch - but obviously that's up to the
> committer to sort out.

I'm finalising the arm64 queue today, so I don't really want to pull in
additional changes beyond critical fixes at this point, I'm afraid. I was
half-expecting a pull request from the riscv side last week but I didn't
see anything.

FWIW, if there's still no movement by -rc1, then I'm happy to queue all
of this on its own branch in the arm64 tree for 5.21.

Let me know.

Will
Conor Dooley July 25, 2022, 9:20 a.m. UTC | #4
On 25/07/2022 10:13, Will Deacon wrote:
> On Sat, Jul 23, 2022 at 11:22:01AM +0000, Conor.Dooley@microchip.com wrote:
>> On 15/07/2022 18:51, Conor Dooley wrote:
>>
>> Hey,
>>
>> Not got any feedback on the smpboot changes from the RISC-V side.
>> I tested it on polarfire, the d1 (with both SMP & !SMP set iirc)
>> & on the u540. It all looked good to me.
>>
>> I'd like to have this fixed for v5.20, but there isn't too much
>> time left before the mw. Not too sure about the cross-tree changes,
>> does it need an immutable branch or could it go through driver-core?
>> Catalin suggested removing the CC stable from patch 1/2 & adding it
>> as a dependency for the 2/2 patch - but obviously that's up to the
>> committer to sort out.
> 
> I'm finalising the arm64 queue today, so I don't really want to pull in
> additional changes beyond critical fixes at this point, I'm afraid. I was
> half-expecting a pull request from the riscv side last week but I didn't
> see anything.

Yeah, that's fair. It's late in the game for cross-tree messing.
I know Palmer has been p busy recently.

> FWIW, if there's still no movement by -rc1, then I'm happy to queue all
> of this on its own branch in the arm64 tree for 5.21.

Hopefully someone on the riscv side will have confirmed what I am doing
is sane by then.

> 
> Let me know.

I will, thanks!
Atish Patra July 26, 2022, 8:12 a.m. UTC | #5
On Mon, Jul 25, 2022 at 2:20 AM <Conor.Dooley@microchip.com> wrote:
>
> On 25/07/2022 10:13, Will Deacon wrote:
> > On Sat, Jul 23, 2022 at 11:22:01AM +0000, Conor.Dooley@microchip.com wrote:
> >> On 15/07/2022 18:51, Conor Dooley wrote:
> >>
> >> Hey,
> >>
> >> Not got any feedback on the smpboot changes from the RISC-V side.
> >> I tested it on polarfire, the d1 (with both SMP & !SMP set iirc)
> >> & on the u540. It all looked good to me.
> >>
> >> I'd like to have this fixed for v5.20, but there isn't too much
> >> time left before the mw. Not too sure about the cross-tree changes,
> >> does it need an immutable branch or could it go through driver-core?
> >> Catalin suggested removing the CC stable from patch 1/2 & adding it
> >> as a dependency for the 2/2 patch - but obviously that's up to the
> >> committer to sort out.
> >
> > I'm finalising the arm64 queue today, so I don't really want to pull in
> > additional changes beyond critical fixes at this point, I'm afraid. I was
> > half-expecting a pull request from the riscv side last week but I didn't
> > see anything.
>
> Yeah, that's fair. It's late in the game for cross-tree messing.
> I know Palmer has been p busy recently.
>
> > FWIW, if there's still no movement by -rc1, then I'm happy to queue all
> > of this on its own branch in the arm64 tree for 5.21.
>
> Hopefully someone on the riscv side will have confirmed what I am doing
> is sane by then.
>
> >
> > Let me know.
>
> I will, thanks!

Sorry for the delayed response here. I was planning to test the series
last week itself
but got dragged into something else and a qemu bug for NUMA.

Thanks for the fixes. I have tested this on Qemu(removing the topology
node) for the following configurations.
SMP, !SMP, NUMA (2 sockets)

FWIW,
Tested-by: Atish Patra <atishp@rivosinc.com>
Conor Dooley July 26, 2022, 9:14 a.m. UTC | #6
On 26/07/2022 09:12, Atish Patra wrote:
> 
> Sorry for the delayed response here. I was planning to test the series
> last week itself
> but got dragged into something else and a qemu bug for NUMA.

No worries, thanks for testing/reviewing it.

> 
> Thanks for the fixes. I have tested this on Qemu(removing the topology
> node) for the following configurations.
> SMP, !SMP, NUMA (2 sockets)

A NUMA test, great! I do need to get qemu set up properly..

> 
> FWIW,
> Tested-by: Atish Patra <atishp@rivosinc.com>

Thanks,
Conor.