mbox series

[v6,00/21] arch_topology: Updates to add socket support and fix cluster ids

Message ID 20220704101605.1318280-1-sudeep.holla@arm.com (mailing list archive)
Headers show
Series arch_topology: Updates to add socket support and fix cluster ids | expand

Message

Sudeep Holla July 4, 2022, 10:15 a.m. UTC
Hi Greg,

Let me know if you prefer to pull the patches directly or prefer pull
request. It has been in -next for a while now.

Hi All,

This version updates cacheinfo to populate and use the information from
there for all the cache topology.

This series intends to fix some discrepancies we have in the CPU topology
parsing from the device tree /cpu-map node. Also this diverges from the
behaviour on a ACPI enabled platform. The expectation is that both DT
and ACPI enabled systems must present consistent view of the CPU topology.

Currently we assign generated cluster count as the physical package identifier
for each CPU which is wrong. The device tree bindings for CPU topology supports
sockets to infer the socket or physical package identifier for a given CPU.
Also we don't check if all the cores/threads belong to the same cluster before
updating their sibling masks which is fine as we don't set the cluster id yet.

These changes also assigns the cluster identifier as parsed from the device tree
cluster nodes within /cpu-map without support for nesting of the clusters.
Finally, it also add support for socket nodes in /cpu-map. With this the
parsing of exact same information from ACPI PPTT and /cpu-map DT node
aligns well.

The only exception is that the last level cache id information can be
inferred from the same ACPI PPTT while we need to parse CPU cache nodes
in the device tree.


v5[5]->v6:
	- Handled out of memory case in early detected correctly after
	  Conor reported boot failures on some RISC-V platform. Also
	  added a log to show up failure of early cacheinfo detection.
	- Added "Remove the unused find_acpi_cpu_cache_topology()" which
	  was missed earlier and posted separately
	- Added all the additional tags recieved

v4[4]->v5[5]:
	- Added all the tags recieved so far. Rafael has acked only change
	  in ACPI and Catalin has acked only change in arm64.
	- Addressed all the typos pointed by Ionela and dropped the patch
	  removing the checks for invalid package id as discussed and update
	  depth in nested cluster warning check.

v3[3]->v4[4]:
	- Updated ACPI PPTT fw_token to use table offset instead of virtual
	  address as it could get changed for everytime it is mapped before
	  the global acpi_permanent_mmap is set
	- Added warning for the topology with nested clusters
	- Added update to cpu_clustergroup_mask so that introduction of
	  correct cluster_id doesn't break existing platforms by limiting
	  the span of clustergroup_mask(by Ionela)

v2[2]->v3[3]:
        - Dropped support to get the device node for the CPU's LLC
        - Updated cacheinfo to support calling of detect_cache_attributes
          early in smp_prepare_cpus stage
        - Added support to check if LLC is valid and shared in the cacheinfo
        - Used the same in arch_topology

v1[1]->v2[2]:
        - Updated ID validity check include all non-negative value
        - Added support to get the device node for the CPU's last level cache
        - Added support to build llc_sibling on DT platforms

[1] https://lore.kernel.org/lkml/20220513095559.1034633-1-sudeep.holla@arm.com
[2] https://lore.kernel.org/lkml/20220518093325.2070336-1-sudeep.holla@arm.com
[3] https://lore.kernel.org/lkml/20220525081416.3306043-1-sudeep.holla@arm.com
[4] https://lore.kernel.org/lkml/20220621192034.3332546-1-sudeep.holla@arm.com
[5] https://lore.kernel.org/lkml/20220627165047.336669-1-sudeep.holla@arm.com

Ionela Voinescu (1):
  arch_topology: Limit span of cpu_clustergroup_mask()

Sudeep Holla (20):
  ACPI: PPTT: Use table offset as fw_token instead of virtual address
  cacheinfo: Use of_cpu_device_node_get instead cpu_dev->of_node
  cacheinfo: Add helper to access any cache index for a given CPU
  cacheinfo: Move cache_leaves_are_shared out of CONFIG_OF
  cacheinfo: Add support to check if last level cache(LLC) is valid or shared
  cacheinfo: Allow early detection and population of cache attributes
  cacheinfo: Use cache identifiers to check if the caches are shared if available
  cacheinfo: Align checks in cache_shared_cpu_map_{setup,remove} for readability
  arch_topology: Add support to parse and detect cache attributes
  arch_topology: Use the last level cache information from the cacheinfo
  arm64: topology: Remove redundant setting of llc_id in CPU topology
  arch_topology: Drop LLC identifier stash from the CPU topology
  arch_topology: Set thread sibling cpumask only within the cluster
  arch_topology: Check for non-negative value rather than -1 for IDs validity
  arch_topology: Avoid parsing through all the CPUs once a outlier CPU is found
  arch_topology: Don't set cluster identifier as physical package identifier
  arch_topology: Set cluster identifier in each core/thread from /cpu-map
  arch_topology: Add support for parsing sockets in /cpu-map
  arch_topology: Warn that topology for nested clusters is not supported
  ACPI: Remove the unused find_acpi_cpu_cache_topology()

 arch/arm64/kernel/topology.c  |  14 ----
 drivers/acpi/pptt.c           |  40 +---------
 drivers/base/arch_topology.c  | 102 ++++++++++++++++++------
 drivers/base/cacheinfo.c      | 143 ++++++++++++++++++++++------------
 include/linux/acpi.h          |   5 --
 include/linux/arch_topology.h |   1 -
 include/linux/cacheinfo.h     |   3 +
 7 files changed, 175 insertions(+), 133 deletions(-)

--
2.37.0

Comments

Conor Dooley July 4, 2022, 3:10 p.m. UTC | #1
On 04/07/2022 11:15, Sudeep Holla wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi Greg,
> 
> Let me know if you prefer to pull the patches directly or prefer pull
> request. It has been in -next for a while now.
> 
> Hi All,
> 
> This version updates cacheinfo to populate and use the information from
> there for all the cache topology.
> 
> This series intends to fix some discrepancies we have in the CPU topology
> parsing from the device tree /cpu-map node. Also this diverges from the
> behaviour on a ACPI enabled platform. The expectation is that both DT
> and ACPI enabled systems must present consistent view of the CPU topology.
> 
> Currently we assign generated cluster count as the physical package identifier
> for each CPU which is wrong. The device tree bindings for CPU topology supports
> sockets to infer the socket or physical package identifier for a given CPU.
> Also we don't check if all the cores/threads belong to the same cluster before
> updating their sibling masks which is fine as we don't set the cluster id yet.
> 
> These changes also assigns the cluster identifier as parsed from the device tree
> cluster nodes within /cpu-map without support for nesting of the clusters.
> Finally, it also add support for socket nodes in /cpu-map. With this the
> parsing of exact same information from ACPI PPTT and /cpu-map DT node
> aligns well.
> 
> The only exception is that the last level cache id information can be
> inferred from the same ACPI PPTT while we need to parse CPU cache nodes
> in the device tree.

For DT + RISC-V on PolarFire SoC and SiFive fu540
Tested-by: Conor Dooley <conor.dooley@microchip.com>

Anecdotally, v5 was tested on the !SMP D1 which worked fine when
CONFIG_SMP was enabled.

Thanks,
Conor.

> 
> 
> v5[5]->v6:
>         - Handled out of memory case in early detected correctly after
>           Conor reported boot failures on some RISC-V platform. Also
>           added a log to show up failure of early cacheinfo detection.
>         - Added "Remove the unused find_acpi_cpu_cache_topology()" which
>           was missed earlier and posted separately
>         - Added all the additional tags recieved
> 
> v4[4]->v5[5]:
>         - Added all the tags recieved so far. Rafael has acked only change
>           in ACPI and Catalin has acked only change in arm64.
>         - Addressed all the typos pointed by Ionela and dropped the patch
>           removing the checks for invalid package id as discussed and update
>           depth in nested cluster warning check.
> 
> v3[3]->v4[4]:
>         - Updated ACPI PPTT fw_token to use table offset instead of virtual
>           address as it could get changed for everytime it is mapped before
>           the global acpi_permanent_mmap is set
>         - Added warning for the topology with nested clusters
>         - Added update to cpu_clustergroup_mask so that introduction of
>           correct cluster_id doesn't break existing platforms by limiting
>           the span of clustergroup_mask(by Ionela)
> 
> v2[2]->v3[3]:
>         - Dropped support to get the device node for the CPU's LLC
>         - Updated cacheinfo to support calling of detect_cache_attributes
>           early in smp_prepare_cpus stage
>         - Added support to check if LLC is valid and shared in the cacheinfo
>         - Used the same in arch_topology
> 
> v1[1]->v2[2]:
>         - Updated ID validity check include all non-negative value
>         - Added support to get the device node for the CPU's last level cache
>         - Added support to build llc_sibling on DT platforms
> 
> [1] https://lore.kernel.org/lkml/20220513095559.1034633-1-sudeep.holla@arm.com
> [2] https://lore.kernel.org/lkml/20220518093325.2070336-1-sudeep.holla@arm.com
> [3] https://lore.kernel.org/lkml/20220525081416.3306043-1-sudeep.holla@arm.com
> [4] https://lore.kernel.org/lkml/20220621192034.3332546-1-sudeep.holla@arm.com
> [5] https://lore.kernel.org/lkml/20220627165047.336669-1-sudeep.holla@arm.com
> 
> Ionela Voinescu (1):
>   arch_topology: Limit span of cpu_clustergroup_mask()
> 
> Sudeep Holla (20):
>   ACPI: PPTT: Use table offset as fw_token instead of virtual address
>   cacheinfo: Use of_cpu_device_node_get instead cpu_dev->of_node
>   cacheinfo: Add helper to access any cache index for a given CPU
>   cacheinfo: Move cache_leaves_are_shared out of CONFIG_OF
>   cacheinfo: Add support to check if last level cache(LLC) is valid or shared
>   cacheinfo: Allow early detection and population of cache attributes
>   cacheinfo: Use cache identifiers to check if the caches are shared if available
>   cacheinfo: Align checks in cache_shared_cpu_map_{setup,remove} for readability
>   arch_topology: Add support to parse and detect cache attributes
>   arch_topology: Use the last level cache information from the cacheinfo
>   arm64: topology: Remove redundant setting of llc_id in CPU topology
>   arch_topology: Drop LLC identifier stash from the CPU topology
>   arch_topology: Set thread sibling cpumask only within the cluster
>   arch_topology: Check for non-negative value rather than -1 for IDs validity
>   arch_topology: Avoid parsing through all the CPUs once a outlier CPU is found
>   arch_topology: Don't set cluster identifier as physical package identifier
>   arch_topology: Set cluster identifier in each core/thread from /cpu-map
>   arch_topology: Add support for parsing sockets in /cpu-map
>   arch_topology: Warn that topology for nested clusters is not supported
>   ACPI: Remove the unused find_acpi_cpu_cache_topology()
> 
>  arch/arm64/kernel/topology.c  |  14 ----
>  drivers/acpi/pptt.c           |  40 +---------
>  drivers/base/arch_topology.c  | 102 ++++++++++++++++++------
>  drivers/base/cacheinfo.c      | 143 ++++++++++++++++++++++------------
>  include/linux/acpi.h          |   5 --
>  include/linux/arch_topology.h |   1 -
>  include/linux/cacheinfo.h     |   3 +
>  7 files changed, 175 insertions(+), 133 deletions(-)
> 
> --
> 2.37.0
>
Sudeep Holla July 4, 2022, 3:20 p.m. UTC | #2
On Mon, Jul 04, 2022 at 03:10:30PM +0000, Conor.Dooley@microchip.com wrote:
> On 04/07/2022 11:15, Sudeep Holla wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> > Hi Greg,
> > 
> > Let me know if you prefer to pull the patches directly or prefer pull
> > request. It has been in -next for a while now.
> > 
> > Hi All,
> > 
> > This version updates cacheinfo to populate and use the information from
> > there for all the cache topology.
> > 
> > This series intends to fix some discrepancies we have in the CPU topology
> > parsing from the device tree /cpu-map node. Also this diverges from the
> > behaviour on a ACPI enabled platform. The expectation is that both DT
> > and ACPI enabled systems must present consistent view of the CPU topology.
> > 
> > Currently we assign generated cluster count as the physical package identifier
> > for each CPU which is wrong. The device tree bindings for CPU topology supports
> > sockets to infer the socket or physical package identifier for a given CPU.
> > Also we don't check if all the cores/threads belong to the same cluster before
> > updating their sibling masks which is fine as we don't set the cluster id yet.
> > 
> > These changes also assigns the cluster identifier as parsed from the device tree
> > cluster nodes within /cpu-map without support for nesting of the clusters.
> > Finally, it also add support for socket nodes in /cpu-map. With this the
> > parsing of exact same information from ACPI PPTT and /cpu-map DT node
> > aligns well.
> > 
> > The only exception is that the last level cache id information can be
> > inferred from the same ACPI PPTT while we need to parse CPU cache nodes
> > in the device tree.
> 
> For DT + RISC-V on PolarFire SoC and SiFive fu540
> Tested-by: Conor Dooley <conor.dooley@microchip.com>
> 
> Anecdotally, v5 was tested on the !SMP D1 which worked fine when
> CONFIG_SMP was enabled.
> 

Thanks a lot for testing on RISC-V, much appreciated! Thanks for your
patience and help with v5 so that we could figure out the silly issue
finally.
Conor Dooley July 5, 2022, 7:06 p.m. UTC | #3
[Adding back the CC list from the original thread]

On 05/07/2022 13:27, Brice Goglin wrote:
> [You don't often get email from brice.goglin@inria.fr. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hello Conor
> 
> I am the main developer of hwloc [1] which is used by many people to
> detect the topology of servers. We're started to see some users of hwloc
> on RISC-V and we got some reports about the topology exposed by
> Linux/sysfs being wrong on some platforms.
> 
> For instance https://github.com/open-mpi/hwloc/issues/536 says HiFive
> Unmatched with SiFive Freedom U740 running Linux 5.15 exposes a single
> core with 4 threads instead of 4 cores, while StarFive VisionFive v1
> with JH7100 running 5.18.5 correctly exposes 2 cores.

And with Sudeep's patches applied I get (next-20220704):
# hwloc-calc -N core all
1
# hwloc-calc -N pu all
4
On a PolarFire SoC (so the same as a SiFive U540).
So unfortunately, these patches are not the fix you seek!

Wracked my brains for a bit, but could not see any differences
between the U740 and the JH7100. Culprit seems to be the lack
of a cpu-map node (which is only present in the downstream dt).

I've sent patches for the upstream devicetrees:
https://lore.kernel.org/linux-riscv/20220705190435.1790466-1-mail@conchuod.ie/

> 
> Can you tell me what's the overall status of the CPU/NUMA topology
> exposed in sysfs on RISC-V?

Heh, you've got the wrong man. I don't know.

> Does it depend a lot on the platform because
> device-tree and/or ACPI aren't always properly filled by vendors? Does
> it depend a lot on the Linux kernel version? Should I expect significant
> improvements for both in the next months?
> 
I don't know that either. This is why it's a good idea to preserve
the CC list!


> Thanks
> 
> Brice
> 
> [1] https://www.open-mpi.org/projects/hwloc/
> 
> 
> 
> Le 04/07/2022 à 17:10, Conor.Dooley@microchip.com a écrit :
>> On 04/07/2022 11:15, Sudeep Holla wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>> Hi Greg,
>>>
>>> Let me know if you prefer to pull the patches directly or prefer pull
>>> request. It has been in -next for a while now.
>>>
>>> Hi All,
>>>
>>> This version updates cacheinfo to populate and use the information from
>>> there for all the cache topology.
>>>
>>> This series intends to fix some discrepancies we have in the CPU topology
>>> parsing from the device tree /cpu-map node. Also this diverges from the
>>> behaviour on a ACPI enabled platform. The expectation is that both DT
>>> and ACPI enabled systems must present consistent view of the CPU topology.
>>>
>>> Currently we assign generated cluster count as the physical package identifier
>>> for each CPU which is wrong. The device tree bindings for CPU topology supports
>>> sockets to infer the socket or physical package identifier for a given CPU.
>>> Also we don't check if all the cores/threads belong to the same cluster before
>>> updating their sibling masks which is fine as we don't set the cluster id yet.
>>>
>>> These changes also assigns the cluster identifier as parsed from the device tree
>>> cluster nodes within /cpu-map without support for nesting of the clusters.
>>> Finally, it also add support for socket nodes in /cpu-map. With this the
>>> parsing of exact same information from ACPI PPTT and /cpu-map DT node
>>> aligns well.
>>>
>>> The only exception is that the last level cache id information can be
>>> inferred from the same ACPI PPTT while we need to parse CPU cache nodes
>>> in the device tree.
>> For DT + RISC-V on PolarFire SoC and SiFive fu540
>> Tested-by: Conor Dooley <conor.dooley@microchip.com>
>>
>> Anecdotally, v5 was tested on the !SMP D1 which worked fine when
>> CONFIG_SMP was enabled.
>>
>> Thanks,
>> Conor.
>>
>>>
>>> v5[5]->v6:
>>>          - Handled out of memory case in early detected correctly after
>>>            Conor reported boot failures on some RISC-V platform. Also
>>>            added a log to show up failure of early cacheinfo detection.
>>>          - Added "Remove the unused find_acpi_cpu_cache_topology()" which
>>>            was missed earlier and posted separately
>>>          - Added all the additional tags recieved
>>>
>>> v4[4]->v5[5]:
>>>          - Added all the tags recieved so far. Rafael has acked only change
>>>            in ACPI and Catalin has acked only change in arm64.
>>>          - Addressed all the typos pointed by Ionela and dropped the patch
>>>            removing the checks for invalid package id as discussed and update
>>>            depth in nested cluster warning check.
>>>
>>> v3[3]->v4[4]:
>>>          - Updated ACPI PPTT fw_token to use table offset instead of virtual
>>>            address as it could get changed for everytime it is mapped before
>>>            the global acpi_permanent_mmap is set
>>>          - Added warning for the topology with nested clusters
>>>          - Added update to cpu_clustergroup_mask so that introduction of
>>>            correct cluster_id doesn't break existing platforms by limiting
>>>            the span of clustergroup_mask(by Ionela)
>>>
>>> v2[2]->v3[3]:
>>>          - Dropped support to get the device node for the CPU's LLC
>>>          - Updated cacheinfo to support calling of detect_cache_attributes
>>>            early in smp_prepare_cpus stage
>>>          - Added support to check if LLC is valid and shared in the cacheinfo
>>>          - Used the same in arch_topology
>>>
>>> v1[1]->v2[2]:
>>>          - Updated ID validity check include all non-negative value
>>>          - Added support to get the device node for the CPU's last level cache
>>>          - Added support to build llc_sibling on DT platforms
>>>
>>> [1] https://lore.kernel.org/lkml/20220513095559.1034633-1-sudeep.holla@arm.com
>>> [2] https://lore.kernel.org/lkml/20220518093325.2070336-1-sudeep.holla@arm.com
>>> [3] https://lore.kernel.org/lkml/20220525081416.3306043-1-sudeep.holla@arm.com
>>> [4] https://lore.kernel.org/lkml/20220621192034.3332546-1-sudeep.holla@arm.com
>>> [5] https://lore.kernel.org/lkml/20220627165047.336669-1-sudeep.holla@arm.com
>>>
>>> Ionela Voinescu (1):
>>>    arch_topology: Limit span of cpu_clustergroup_mask()
>>>
>>> Sudeep Holla (20):
>>>    ACPI: PPTT: Use table offset as fw_token instead of virtual address
>>>    cacheinfo: Use of_cpu_device_node_get instead cpu_dev->of_node
>>>    cacheinfo: Add helper to access any cache index for a given CPU
>>>    cacheinfo: Move cache_leaves_are_shared out of CONFIG_OF
>>>    cacheinfo: Add support to check if last level cache(LLC) is valid or shared
>>>    cacheinfo: Allow early detection and population of cache attributes
>>>    cacheinfo: Use cache identifiers to check if the caches are shared if available
>>>    cacheinfo: Align checks in cache_shared_cpu_map_{setup,remove} for readability
>>>    arch_topology: Add support to parse and detect cache attributes
>>>    arch_topology: Use the last level cache information from the cacheinfo
>>>    arm64: topology: Remove redundant setting of llc_id in CPU topology
>>>    arch_topology: Drop LLC identifier stash from the CPU topology
>>>    arch_topology: Set thread sibling cpumask only within the cluster
>>>    arch_topology: Check for non-negative value rather than -1 for IDs validity
>>>    arch_topology: Avoid parsing through all the CPUs once a outlier CPU is found
>>>    arch_topology: Don't set cluster identifier as physical package identifier
>>>    arch_topology: Set cluster identifier in each core/thread from /cpu-map
>>>    arch_topology: Add support for parsing sockets in /cpu-map
>>>    arch_topology: Warn that topology for nested clusters is not supported
>>>    ACPI: Remove the unused find_acpi_cpu_cache_topology()
>>>
>>>   arch/arm64/kernel/topology.c  |  14 ----
>>>   drivers/acpi/pptt.c           |  40 +---------
>>>   drivers/base/arch_topology.c  | 102 ++++++++++++++++++------
>>>   drivers/base/cacheinfo.c      | 143 ++++++++++++++++++++++------------
>>>   include/linux/acpi.h          |   5 --
>>>   include/linux/arch_topology.h |   1 -
>>>   include/linux/cacheinfo.h     |   3 +
>>>   7 files changed, 175 insertions(+), 133 deletions(-)
>>>
>>> -- 
>>> 2.37.0
>>>
Sudeep Holla July 5, 2022, 8:07 p.m. UTC | #4
On Tue, Jul 05, 2022 at 07:06:17PM +0000, Conor.Dooley@microchip.com wrote:
> [Adding back the CC list from the original thread]
> 
> On 05/07/2022 13:27, Brice Goglin wrote:
> > [You don't often get email from brice.goglin@inria.fr. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> > 
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> > Hello Conor
> > 
> > I am the main developer of hwloc [1] which is used by many people to
> > detect the topology of servers. We're started to see some users of hwloc
> > on RISC-V and we got some reports about the topology exposed by
> > Linux/sysfs being wrong on some platforms.
> > 
> > For instance https://github.com/open-mpi/hwloc/issues/536 says HiFive
> > Unmatched with SiFive Freedom U740 running Linux 5.15 exposes a single
> > core with 4 threads instead of 4 cores, while StarFive VisionFive v1
> > with JH7100 running 5.18.5 correctly exposes 2 cores.
> 
> And with Sudeep's patches applied I get (next-20220704):
> # hwloc-calc -N core all
> 1
> # hwloc-calc -N pu all
> 4
> On a PolarFire SoC (so the same as a SiFive U540).
> So unfortunately, these patches are not the fix you seek!
>

Not sure what you mean by that ?

> Wracked my brains for a bit, but could not see any differences
> between the U740 and the JH7100. Culprit seems to be the lack
> of a cpu-map node (which is only present in the downstream dt).
>

Indeed, the topology depends on /cpu-map node. However on ARM64 we do
have fallback settings in absence of /cpu-map node so that it is handled
correctly. I wasn't sure what was or can be done on RISC-V as /cpu-map
is optional.

> I've sent patches for the upstream devicetrees:
> https://lore.kernel.org/linux-riscv/20220705190435.1790466-1-mail@conchuod.ie/
>

I will take a look.

> > Does it depend a lot on the platform because
> > device-tree and/or ACPI aren't always properly filled by vendors?

Absolutely.

> > Does it depend a lot on the Linux kernel version?

Ideally not much, but hey we had some issues on Arm64 too which this series
is addressing.

> > Should I expect significant improvements for both in the next months?

Not much in topology or nothing planned. I have no idea on NUMA


Hi Conor,

I would have preferred you to add me to the original thread and referred
this thread from there. I don't want to derail the discussion in this
thread as nothing much can be done here.

--
Regards,
Sudeep
Conor Dooley July 5, 2022, 8:14 p.m. UTC | #5
On 05/07/2022 21:07, Sudeep Holla wrote:
> On Tue, Jul 05, 2022 at 07:06:17PM +0000, Conor.Dooley@microchip.com wrote:
>> [Adding back the CC list from the original thread]
>>
>> On 05/07/2022 13:27, Brice Goglin wrote:
>>> [You don't often get email from brice.goglin@inria.fr. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>>>
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>> Hello Conor
>>>
>>> I am the main developer of hwloc [1] which is used by many people to
>>> detect the topology of servers. We're started to see some users of hwloc
>>> on RISC-V and we got some reports about the topology exposed by
>>> Linux/sysfs being wrong on some platforms.
>>>
>>> For instance https://github.com/open-mpi/hwloc/issues/536 says HiFive
>>> Unmatched with SiFive Freedom U740 running Linux 5.15 exposes a single
>>> core with 4 threads instead of 4 cores, while StarFive VisionFive v1
>>> with JH7100 running 5.18.5 correctly exposes 2 cores.
>>
>> And with Sudeep's patches applied I get (next-20220704):
>> # hwloc-calc -N core all
>> 1
>> # hwloc-calc -N pu all
>> 4
>> On a PolarFire SoC (so the same as a SiFive U540).
>> So unfortunately, these patches are not the fix you seek!
>>
> 
> Not sure what you mean by that ?

Nothing meaningful really, just saying that this patchset
was unrelated to the problem he reported his response to
it.

> 
>> Wracked my brains for a bit, but could not see any differences
>> between the U740 and the JH7100. Culprit seems to be the lack
>> of a cpu-map node (which is only present in the downstream dt).
>>
> 
> Indeed, the topology depends on /cpu-map node. However on ARM64 we do
> have fallback settings in absence of /cpu-map node so that it is handled
> correctly. I wasn't sure what was or can be done on RISC-V as /cpu-map
> is optional.
> 
>> I've sent patches for the upstream devicetrees:
>> https://lore.kernel.org/linux-riscv/20220705190435.1790466-1-mail@conchuod.ie/
>>
> 
> I will take a look.
> 
>>> Does it depend a lot on the platform because
>>> device-tree and/or ACPI aren't always properly filled by vendors?
> 
> Absolutely.
> 
>>> Does it depend a lot on the Linux kernel version?
> 
> Ideally not much, but hey we had some issues on Arm64 too which this series
> is addressing.
> 
>>> Should I expect significant improvements for both in the next months?
> 
> Not much in topology or nothing planned. I have no idea on NUMA
> 
> 
> Hi Conor,
> 
> I would have preferred you to add me to the original thread and referred
> this thread from there. I don't want to derail the discussion in this
> thread as nothing much can be done here.

This is the original thread! It was just one off-list email that was a
to me only response to this arch_topologu thread that you can see here

But yeah - should have CCed you on the cpu-map stuff too.
Sudeep Holla July 5, 2022, 8:22 p.m. UTC | #6
On Tue, Jul 05, 2022 at 08:14:38PM +0000, Conor.Dooley@microchip.com wrote:
> 
> On 05/07/2022 21:07, Sudeep Holla wrote:

[...]

> > 
> > I would have preferred you to add me to the original thread and referred
> > this thread from there. I don't want to derail the discussion in this
> > thread as nothing much can be done here.
> 
> This is the original thread! It was just one off-list email that was a
> to me only response to this arch_topologu thread that you can see here
>

Ah OK, private mail. I missed to see that and assumed it was another
thread, sorry for that.

> But yeah - should have CCed you on the cpu-map stuff too.

No worries I spotted that and responded.