Message ID | 20171201222330.18863-7-jeremy.linton@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: > The PPTT can be used to determine the groupings of CPU's at > given levels in the system. Lets add a few routines to the PPTT > parsing code to return a unique id for each unique level in the > processor hierarchy. This can then be matched to build > thread/core/cluster/die/package/etc mappings for each processing > element in the system. > > Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Why can't this be folded into patch [2/9]? Thanks, Rafael
Hi, First, thanks for taking a look at this. On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: > On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: >> The PPTT can be used to determine the groupings of CPU's at >> given levels in the system. Lets add a few routines to the PPTT >> parsing code to return a unique id for each unique level in the >> processor hierarchy. This can then be matched to build >> thread/core/cluster/die/package/etc mappings for each processing >> element in the system. >> >> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> > > Why can't this be folded into patch [2/9]? It can, and I will be happy squash it. It was requested that the topology portion of the parser be split out back in v3. https://www.spinics.net/lists/linux-acpi/msg78487.html
On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: > Hi, > > First, thanks for taking a look at this. > > On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: > >On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: > >>The PPTT can be used to determine the groupings of CPU's at > >>given levels in the system. Lets add a few routines to the PPTT > >>parsing code to return a unique id for each unique level in the > >>processor hierarchy. This can then be matched to build > >>thread/core/cluster/die/package/etc mappings for each processing > >>element in the system. > >> > >>Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> > > > >Why can't this be folded into patch [2/9]? > > It can, and I will be happy squash it. > > It was requested that the topology portion of the parser be split > out back in v3. > > https://www.spinics.net/lists/linux-acpi/msg78487.html I asked to split cache/topology since I am not familiar with cache code and Sudeep - who looks after the cache code - won't be able to review this series in time for v4.16. Lorenzo
On Wed, Dec 13, 2017 at 6:38 PM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: >> Hi, >> >> First, thanks for taking a look at this. >> >> On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: >> >On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: >> >>The PPTT can be used to determine the groupings of CPU's at >> >>given levels in the system. Lets add a few routines to the PPTT >> >>parsing code to return a unique id for each unique level in the >> >>processor hierarchy. This can then be matched to build >> >>thread/core/cluster/die/package/etc mappings for each processing >> >>element in the system. >> >> >> >>Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> >> > >> >Why can't this be folded into patch [2/9]? >> >> It can, and I will be happy squash it. >> >> It was requested that the topology portion of the parser be split >> out back in v3. >> >> https://www.spinics.net/lists/linux-acpi/msg78487.html > > I asked to split cache/topology since I am not familiar with cache > code and Sudeep - who looks after the cache code - won't be able > to review this series in time for v4.16. OK, so why do we need it in 4.16?
Hi, On 12/13/2017 04:28 PM, Rafael J. Wysocki wrote: > On Wed, Dec 13, 2017 at 6:38 PM, Lorenzo Pieralisi > <lorenzo.pieralisi@arm.com> wrote: >> On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: >>> Hi, >>> >>> First, thanks for taking a look at this. >>> >>> On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: >>>> On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: >>>>> The PPTT can be used to determine the groupings of CPU's at >>>>> given levels in the system. Lets add a few routines to the PPTT >>>>> parsing code to return a unique id for each unique level in the >>>>> processor hierarchy. This can then be matched to build >>>>> thread/core/cluster/die/package/etc mappings for each processing >>>>> element in the system. >>>>> >>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> >>>> >>>> Why can't this be folded into patch [2/9]? >>> >>> It can, and I will be happy squash it. >>> >>> It was requested that the topology portion of the parser be split >>> out back in v3. >>> >>> https://www.spinics.net/lists/linux-acpi/msg78487.html >> >> I asked to split cache/topology since I am not familiar with cache >> code and Sudeep - who looks after the cache code - won't be able >> to review this series in time for v4.16. > > OK, so why do we need it in 4.16? I think its more case of as soon as possible. That is because there are machines where the topology is completely incorrect due to assumptions the kernel makes based on registers that aren't defined for that purpose (say describing which cores are in a physical socket, or LLC's attached to interconnects or memory controllers). This incorrect topology information is reported to things like the kernel scheduler, which then makes poor scheduling decisions resulting in sub-optimal system performance. This patchset (and ACPI 6.2) clears up a lot of those problems. Thanks,
On Thu, Dec 14, 2017 at 12:06 AM, Jeremy Linton <jeremy.linton@arm.com> wrote: > Hi, > > > On 12/13/2017 04:28 PM, Rafael J. Wysocki wrote: >> >> On Wed, Dec 13, 2017 at 6:38 PM, Lorenzo Pieralisi >> <lorenzo.pieralisi@arm.com> wrote: >>> >>> On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: >>>> >>>> Hi, >>>> >>>> First, thanks for taking a look at this. >>>> >>>> On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: >>>>> >>>>> On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: >>>>>> >>>>>> The PPTT can be used to determine the groupings of CPU's at >>>>>> given levels in the system. Lets add a few routines to the PPTT >>>>>> parsing code to return a unique id for each unique level in the >>>>>> processor hierarchy. This can then be matched to build >>>>>> thread/core/cluster/die/package/etc mappings for each processing >>>>>> element in the system. >>>>>> >>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> >>>>> >>>>> >>>>> Why can't this be folded into patch [2/9]? >>>> >>>> >>>> It can, and I will be happy squash it. >>>> >>>> It was requested that the topology portion of the parser be split >>>> out back in v3. >>>> >>>> https://www.spinics.net/lists/linux-acpi/msg78487.html >>> >>> >>> I asked to split cache/topology since I am not familiar with cache >>> code and Sudeep - who looks after the cache code - won't be able >>> to review this series in time for v4.16. >> >> >> OK, so why do we need it in 4.16? > > > I think its more case of as soon as possible. That is because there are > machines where the topology is completely incorrect due to assumptions the > kernel makes based on registers that aren't defined for that purpose (say > describing which cores are in a physical socket, or LLC's attached to > interconnects or memory controllers). > > This incorrect topology information is reported to things like the kernel > scheduler, which then makes poor scheduling decisions resulting in > sub-optimal system performance. > > This patchset (and ACPI 6.2) clears up a lot of those problems. As long as the ACPI tables are as expected that is, I suppose? Anyway, fair enough, but I don't want to rush it in. Thanks, Rafael
Hi Jeremy, Sorry, I don't have your previous patch emails to reply on right patch context. So commenting on top of this patch. AFAIU, the PPTT v5 patches still rely on CLIDR_EL1 register to know the type of Caches enabled/available on the platform. With PPTT, it should not rely on architecture registers. There can be platforms which can report cache availability in PPTT but not in architecture registers. The following code snippet shows usage of CLIDR_EL1 In arch/arm64/kernel/cacheinfo.c static inline enum cache_type get_cache_type(int level) { u64 clidr; if (level > MAX_CACHE_LEVEL) return CACHE_TYPE_NOCACHE; clidr = read_sysreg(clidr_el1); return CLIDR_CTYPE(clidr, level); } static int __populate_cache_leaves(unsigned int cpu) { unsigned int level, idx; enum cache_type type; struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); struct cacheinfo *this_leaf = this_cpu_ci->info_list; for (idx = 0, level = 1; level <= this_cpu_ci->num_levels && idx < this_cpu_ci->num_leaves; idx++, level++) { type = get_cache_type(level); if (type == CACHE_TYPE_SEPARATE) { ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level); ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level); } else { ci_leaf_init(this_leaf++, type, level); } } return 0; } In populate_cache_leaves() the cache type is read from CLIDR_EL1 register. If CLIDR_EL1 reports CACHE_TYPE_NOCACHE for a particular level then sysfs entry /sys/devices/system/cpu/cpu0/index<n>/type is not created and hence userspace tools like lstopo will not report this cache level. Regards Vijay > -----Original Message----- > From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org] > On Behalf Of Rafael J. Wysocki > Sent: Thursday, December 14, 2017 4:40 AM > To: Jeremy Linton <jeremy.linton@arm.com> > Cc: Mark Rutland <mark.rutland@arm.com>; Jonathan.Zhang@cavium.com; > Jayachandran.Nair@cavium.com; Lorenzo Pieralisi > <lorenzo.pieralisi@arm.com>; Catalin Marinas <catalin.marinas@arm.com>; > Rafael J. Wysocki <rafael@kernel.org>; jhugo@codeaurora.org; Will Deacon > <will.deacon@arm.com>; Linux PM <linux-pm@vger.kernel.org>; Rafael J. > Wysocki <rjw@rjwysocki.net>; Greg Kroah-Hartman > <gregkh@linuxfoundation.org>; Linux Kernel Mailing List <linux- > kernel@vger.kernel.org>; ACPI Devel Maling List <linux-acpi@vger.kernel.org>; > Viresh Kumar <viresh.kumar@linaro.org>; Hanjun Guo > <hanjun.guo@linaro.org>; Al Stone <ahs3@redhat.com>; Sudeep Holla > <sudeep.holla@arm.com>; austinwc@codeaurora.org; > wangxiongfeng2@huawei.com; linux-arm-kernel@lists.infradead.org; Len > Brown <lenb@kernel.org> > Subject: Re: [PATCH v5 6/9] ACPI/PPTT: Add topology parsing code > > On Thu, Dec 14, 2017 at 12:06 AM, Jeremy Linton <jeremy.linton@arm.com> > wrote: > > Hi, > > > > > > On 12/13/2017 04:28 PM, Rafael J. Wysocki wrote: > >> > >> On Wed, Dec 13, 2017 at 6:38 PM, Lorenzo Pieralisi > >> <lorenzo.pieralisi@arm.com> wrote: > >>> > >>> On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: > >>>> > >>>> Hi, > >>>> > >>>> First, thanks for taking a look at this. > >>>> > >>>> On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: > >>>>> > >>>>> On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: > >>>>>> > >>>>>> The PPTT can be used to determine the groupings of CPU's at given > >>>>>> levels in the system. Lets add a few routines to the PPTT parsing > >>>>>> code to return a unique id for each unique level in the processor > >>>>>> hierarchy. This can then be matched to build > >>>>>> thread/core/cluster/die/package/etc mappings for each processing > >>>>>> element in the system. > >>>>>> > >>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> > >>>>> > >>>>> > >>>>> Why can't this be folded into patch [2/9]? > >>>> > >>>> > >>>> It can, and I will be happy squash it. > >>>> > >>>> It was requested that the topology portion of the parser be split > >>>> out back in v3. > >>>> > >>>> https://www.spinics.net/lists/linux-acpi/msg78487.html > >>> > >>> > >>> I asked to split cache/topology since I am not familiar with cache > >>> code and Sudeep - who looks after the cache code - won't be able to > >>> review this series in time for v4.16. > >> > >> > >> OK, so why do we need it in 4.16? > > > > > > I think its more case of as soon as possible. That is because there > > are machines where the topology is completely incorrect due to > > assumptions the kernel makes based on registers that aren't defined > > for that purpose (say describing which cores are in a physical socket, > > or LLC's attached to interconnects or memory controllers). > > > > This incorrect topology information is reported to things like the > > kernel scheduler, which then makes poor scheduling decisions resulting > > in sub-optimal system performance. > > > > This patchset (and ACPI 6.2) clears up a lot of those problems. > > As long as the ACPI tables are as expected that is, I suppose? > > Anyway, fair enough, but I don't want to rush it in. > > Thanks, > Rafael > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Hi, On 01/03/2018 02:49 AM, vkilari@codeaurora.org wrote: > Hi Jeremy, > > Sorry, I don't have your previous patch emails to reply on right patch > context. > So commenting on top of this patch. > > AFAIU, the PPTT v5 patches still rely on CLIDR_EL1 register to know the type > of > Caches enabled/available on the platform. With PPTT, it should not rely on > architecture > registers. There can be platforms which can report cache availability in > PPTT but not in > architecture registers. > > The following code snippet shows usage of CLIDR_EL1 > > In arch/arm64/kernel/cacheinfo.c > > static inline enum cache_type get_cache_type(int level) > { > u64 clidr; > > if (level > MAX_CACHE_LEVEL) > return CACHE_TYPE_NOCACHE; > clidr = read_sysreg(clidr_el1); > return CLIDR_CTYPE(clidr, level); > } > > static int __populate_cache_leaves(unsigned int cpu) > { > unsigned int level, idx; > enum cache_type type; > struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); > struct cacheinfo *this_leaf = this_cpu_ci->info_list; > > for (idx = 0, level = 1; level <= this_cpu_ci->num_levels && > idx < this_cpu_ci->num_leaves; idx++, level++) { > type = get_cache_type(level); > if (type == CACHE_TYPE_SEPARATE) { > ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level); > ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level); > } else { > ci_leaf_init(this_leaf++, type, level); > } > } > return 0; > } > > In populate_cache_leaves() the cache type is read from CLIDR_EL1 register. > If CLIDR_EL1 reports CACHE_TYPE_NOCACHE for a particular level then sysfs > entry > /sys/devices/system/cpu/cpu0/index<n>/type is not created and hence > userspace tools > like lstopo will not report this cache level. This sounds suspiciously like one of things tweaked between v4->v5. If you look at update_cache_properties() in patch 2/9, you will see that we only update/find NOCACHE nodes and convert them to UNIFIED when all the attributes in the node are supplied. This means that if the node has an incomplete set of attributes we won't update it. Can you verify that you have all those attributes set for nodes which aren't being described by the hardware? Thanks, > > Regards > Vijay > >> -----Original Message----- >> From: linux-arm-kernel > [mailto:linux-arm-kernel-bounces@lists.infradead.org] >> On Behalf Of Rafael J. Wysocki >> Sent: Thursday, December 14, 2017 4:40 AM >> To: Jeremy Linton <jeremy.linton@arm.com> >> Cc: Mark Rutland <mark.rutland@arm.com>; Jonathan.Zhang@cavium.com; >> Jayachandran.Nair@cavium.com; Lorenzo Pieralisi >> <lorenzo.pieralisi@arm.com>; Catalin Marinas <catalin.marinas@arm.com>; >> Rafael J. Wysocki <rafael@kernel.org>; jhugo@codeaurora.org; Will Deacon >> <will.deacon@arm.com>; Linux PM <linux-pm@vger.kernel.org>; Rafael J. >> Wysocki <rjw@rjwysocki.net>; Greg Kroah-Hartman >> <gregkh@linuxfoundation.org>; Linux Kernel Mailing List <linux- >> kernel@vger.kernel.org>; ACPI Devel Maling List > <linux-acpi@vger.kernel.org>; >> Viresh Kumar <viresh.kumar@linaro.org>; Hanjun Guo >> <hanjun.guo@linaro.org>; Al Stone <ahs3@redhat.com>; Sudeep Holla >> <sudeep.holla@arm.com>; austinwc@codeaurora.org; >> wangxiongfeng2@huawei.com; linux-arm-kernel@lists.infradead.org; Len >> Brown <lenb@kernel.org> >> Subject: Re: [PATCH v5 6/9] ACPI/PPTT: Add topology parsing code >> >> On Thu, Dec 14, 2017 at 12:06 AM, Jeremy Linton <jeremy.linton@arm.com> >> wrote: >>> Hi, >>> >>> >>> On 12/13/2017 04:28 PM, Rafael J. Wysocki wrote: >>>> >>>> On Wed, Dec 13, 2017 at 6:38 PM, Lorenzo Pieralisi >>>> <lorenzo.pieralisi@arm.com> wrote: >>>>> >>>>> On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> First, thanks for taking a look at this. >>>>>> >>>>>> On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: >>>>>>> >>>>>>> On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: >>>>>>>> >>>>>>>> The PPTT can be used to determine the groupings of CPU's at given >>>>>>>> levels in the system. Lets add a few routines to the PPTT parsing >>>>>>>> code to return a unique id for each unique level in the processor >>>>>>>> hierarchy. This can then be matched to build >>>>>>>> thread/core/cluster/die/package/etc mappings for each processing >>>>>>>> element in the system. >>>>>>>> >>>>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> >>>>>>> >>>>>>> >>>>>>> Why can't this be folded into patch [2/9]? >>>>>> >>>>>> >>>>>> It can, and I will be happy squash it. >>>>>> >>>>>> It was requested that the topology portion of the parser be split >>>>>> out back in v3. >>>>>> >>>>>> https://www.spinics.net/lists/linux-acpi/msg78487.html >>>>> >>>>> >>>>> I asked to split cache/topology since I am not familiar with cache >>>>> code and Sudeep - who looks after the cache code - won't be able to >>>>> review this series in time for v4.16. >>>> >>>> >>>> OK, so why do we need it in 4.16? >>> >>> >>> I think its more case of as soon as possible. That is because there >>> are machines where the topology is completely incorrect due to >>> assumptions the kernel makes based on registers that aren't defined >>> for that purpose (say describing which cores are in a physical socket, >>> or LLC's attached to interconnects or memory controllers). >>> >>> This incorrect topology information is reported to things like the >>> kernel scheduler, which then makes poor scheduling decisions resulting >>> in sub-optimal system performance. >>> >>> This patchset (and ACPI 6.2) clears up a lot of those problems. >> >> As long as the ACPI tables are as expected that is, I suppose? >> >> Anyway, fair enough, but I don't want to rush it in. >> >> Thanks, >> Rafael >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
Hi Jeremy > -----Original Message----- > From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org] > On Behalf Of Jeremy Linton > Sent: Wednesday, January 3, 2018 10:28 PM > To: vkilari@codeaurora.org > Cc: 'Mark Rutland' <mark.rutland@arm.com>; Jonathan.Zhang@cavium.com; > Jayachandran.Nair@cavium.com; 'Lorenzo Pieralisi' > <lorenzo.pieralisi@arm.com>; austinwc@codeaurora.org; 'Linux PM' <linux- > pm@vger.kernel.org>; jhugo@codeaurora.org; 'Catalin Marinas' > <catalin.marinas@arm.com>; 'Sudeep Holla' <sudeep.holla@arm.com>; 'Will > Deacon' <will.deacon@arm.com>; 'Linux Kernel Mailing List' <linux- > kernel@vger.kernel.org>; wangxiongfeng2@huawei.com; 'ACPI Devel Maling > List' <linux-acpi@vger.kernel.org>; 'Viresh Kumar' <viresh.kumar@linaro.org>; > 'Rafael J. Wysocki' <rjw@rjwysocki.net>; 'Hanjun Guo' > <hanjun.guo@linaro.org>; 'Greg Kroah-Hartman' > <gregkh@linuxfoundation.org>; 'Rafael J. Wysocki' <rafael@kernel.org>; 'Al > Stone' <ahs3@redhat.com>; linux-arm-kernel@lists.infradead.org; 'Len Brown' > <lenb@kernel.org> > Subject: Re: [PATCH v5 6/9] ACPI/PPTT: Add topology parsing code > > Hi, > > On 01/03/2018 02:49 AM, vkilari@codeaurora.org wrote: > > Hi Jeremy, > > > > Sorry, I don't have your previous patch emails to reply on right > > patch context. > > So commenting on top of this patch. > > > > AFAIU, the PPTT v5 patches still rely on CLIDR_EL1 register to know > > the type of Caches enabled/available on the platform. With PPTT, it > > should not rely on architecture registers. There can be platforms > > which can report cache availability in PPTT but not in architecture > > registers. > > > > The following code snippet shows usage of CLIDR_EL1 > > > > In arch/arm64/kernel/cacheinfo.c > > > > static inline enum cache_type get_cache_type(int level) { > > u64 clidr; > > > > if (level > MAX_CACHE_LEVEL) > > return CACHE_TYPE_NOCACHE; > > clidr = read_sysreg(clidr_el1); > > return CLIDR_CTYPE(clidr, level); } > > > > static int __populate_cache_leaves(unsigned int cpu) { > > unsigned int level, idx; > > enum cache_type type; > > struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); > > struct cacheinfo *this_leaf = this_cpu_ci->info_list; > > > > for (idx = 0, level = 1; level <= this_cpu_ci->num_levels && > > idx < this_cpu_ci->num_leaves; idx++, level++) { > > type = get_cache_type(level); > > if (type == CACHE_TYPE_SEPARATE) { > > ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level); > > ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level); > > } else { > > ci_leaf_init(this_leaf++, type, level); > > } > > } > > return 0; > > } > > > > In populate_cache_leaves() the cache type is read from CLIDR_EL1 register. > > If CLIDR_EL1 reports CACHE_TYPE_NOCACHE for a particular level then > > sysfs entry /sys/devices/system/cpu/cpu0/index<n>/type is not created > > and hence userspace tools like lstopo will not report this cache > > level. > > > This sounds suspiciously like one of things tweaked between v4->v5. If you look > at update_cache_properties() in patch 2/9, you will see that we only > update/find NOCACHE nodes and convert them to UNIFIED when all the > attributes in the node are supplied. > > This means that if the node has an incomplete set of attributes we won't update > it. Can you verify that you have all those attributes set for nodes which aren't > being described by the hardware? Thanks for pointing out. Why do we need to check for set of attributes and decide it as UNIFIED cache.? We can get cache type from attributes bits[3:2] if cache type valid flag is set irrespective of other attributes. If cache type valid flag is not set then we can assume it as NOCACHE type as neither architecture register nor in PPTT has valid cache type. > > Thanks, > > > > > > Regards > > Vijay > > > >> -----Original Message----- > >> From: linux-arm-kernel > > [mailto:linux-arm-kernel-bounces@lists.infradead.org] > >> On Behalf Of Rafael J. Wysocki > >> Sent: Thursday, December 14, 2017 4:40 AM > >> To: Jeremy Linton <jeremy.linton@arm.com> > >> Cc: Mark Rutland <mark.rutland@arm.com>; Jonathan.Zhang@cavium.com; > >> Jayachandran.Nair@cavium.com; Lorenzo Pieralisi > >> <lorenzo.pieralisi@arm.com>; Catalin Marinas > >> <catalin.marinas@arm.com>; Rafael J. Wysocki <rafael@kernel.org>; > >> jhugo@codeaurora.org; Will Deacon <will.deacon@arm.com>; Linux PM > <linux-pm@vger.kernel.org>; Rafael J. > >> Wysocki <rjw@rjwysocki.net>; Greg Kroah-Hartman > >> <gregkh@linuxfoundation.org>; Linux Kernel Mailing List <linux- > >> kernel@vger.kernel.org>; ACPI Devel Maling List > > <linux-acpi@vger.kernel.org>; > >> Viresh Kumar <viresh.kumar@linaro.org>; Hanjun Guo > >> <hanjun.guo@linaro.org>; Al Stone <ahs3@redhat.com>; Sudeep Holla > >> <sudeep.holla@arm.com>; austinwc@codeaurora.org; > >> wangxiongfeng2@huawei.com; linux-arm-kernel@lists.infradead.org; Len > >> Brown <lenb@kernel.org> > >> Subject: Re: [PATCH v5 6/9] ACPI/PPTT: Add topology parsing code > >> > >> On Thu, Dec 14, 2017 at 12:06 AM, Jeremy Linton > >> <jeremy.linton@arm.com> > >> wrote: > >>> Hi, > >>> > >>> > >>> On 12/13/2017 04:28 PM, Rafael J. Wysocki wrote: > >>>> > >>>> On Wed, Dec 13, 2017 at 6:38 PM, Lorenzo Pieralisi > >>>> <lorenzo.pieralisi@arm.com> wrote: > >>>>> > >>>>> On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> First, thanks for taking a look at this. > >>>>>> > >>>>>> On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: > >>>>>>> > >>>>>>> On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: > >>>>>>>> > >>>>>>>> The PPTT can be used to determine the groupings of CPU's at > >>>>>>>> given levels in the system. Lets add a few routines to the PPTT > >>>>>>>> parsing code to return a unique id for each unique level in the > >>>>>>>> processor hierarchy. This can then be matched to build > >>>>>>>> thread/core/cluster/die/package/etc mappings for each > >>>>>>>> processing element in the system. > >>>>>>>> > >>>>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> > >>>>>>> > >>>>>>> > >>>>>>> Why can't this be folded into patch [2/9]? > >>>>>> > >>>>>> > >>>>>> It can, and I will be happy squash it. > >>>>>> > >>>>>> It was requested that the topology portion of the parser be split > >>>>>> out back in v3. > >>>>>> > >>>>>> https://www.spinics.net/lists/linux-acpi/msg78487.html > >>>>> > >>>>> > >>>>> I asked to split cache/topology since I am not familiar with cache > >>>>> code and Sudeep - who looks after the cache code - won't be able > >>>>> to review this series in time for v4.16. > >>>> > >>>> > >>>> OK, so why do we need it in 4.16? > >>> > >>> > >>> I think its more case of as soon as possible. That is because there > >>> are machines where the topology is completely incorrect due to > >>> assumptions the kernel makes based on registers that aren't defined > >>> for that purpose (say describing which cores are in a physical > >>> socket, or LLC's attached to interconnects or memory controllers). > >>> > >>> This incorrect topology information is reported to things like the > >>> kernel scheduler, which then makes poor scheduling decisions > >>> resulting in sub-optimal system performance. > >>> > >>> This patchset (and ACPI 6.2) clears up a lot of those problems. > >> > >> As long as the ACPI tables are as expected that is, I suppose? > >> > >> Anyway, fair enough, but I don't want to rush it in. > >> > >> Thanks, > >> Rafael > >> > >> _______________________________________________ > >> linux-arm-kernel mailing list > >> linux-arm-kernel@lists.infradead.org > >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Hi, On 01/04/2018 12:48 AM, vkilari@codeaurora.org wrote: > Hi Jeremy > >> -----Original Message----- >> From: linux-arm-kernel > [mailto:linux-arm-kernel-bounces@lists.infradead.org] >> On Behalf Of Jeremy Linton >> Sent: Wednesday, January 3, 2018 10:28 PM >> To: vkilari@codeaurora.org >> Cc: 'Mark Rutland' <mark.rutland@arm.com>; Jonathan.Zhang@cavium.com; >> Jayachandran.Nair@cavium.com; 'Lorenzo Pieralisi' >> <lorenzo.pieralisi@arm.com>; austinwc@codeaurora.org; 'Linux PM' <linux- >> pm@vger.kernel.org>; jhugo@codeaurora.org; 'Catalin Marinas' >> <catalin.marinas@arm.com>; 'Sudeep Holla' <sudeep.holla@arm.com>; 'Will >> Deacon' <will.deacon@arm.com>; 'Linux Kernel Mailing List' <linux- >> kernel@vger.kernel.org>; wangxiongfeng2@huawei.com; 'ACPI Devel Maling >> List' <linux-acpi@vger.kernel.org>; 'Viresh Kumar' > <viresh.kumar@linaro.org>; >> 'Rafael J. Wysocki' <rjw@rjwysocki.net>; 'Hanjun Guo' >> <hanjun.guo@linaro.org>; 'Greg Kroah-Hartman' >> <gregkh@linuxfoundation.org>; 'Rafael J. Wysocki' <rafael@kernel.org>; 'Al >> Stone' <ahs3@redhat.com>; linux-arm-kernel@lists.infradead.org; 'Len > Brown' >> <lenb@kernel.org> >> Subject: Re: [PATCH v5 6/9] ACPI/PPTT: Add topology parsing code >> >> Hi, >> >> On 01/03/2018 02:49 AM, vkilari@codeaurora.org wrote: >>> Hi Jeremy, >>> >>> Sorry, I don't have your previous patch emails to reply on right >>> patch context. >>> So commenting on top of this patch. >>> >>> AFAIU, the PPTT v5 patches still rely on CLIDR_EL1 register to know >>> the type of Caches enabled/available on the platform. With PPTT, it >>> should not rely on architecture registers. There can be platforms >>> which can report cache availability in PPTT but not in architecture >>> registers. >>> >>> The following code snippet shows usage of CLIDR_EL1 >>> >>> In arch/arm64/kernel/cacheinfo.c >>> >>> static inline enum cache_type get_cache_type(int level) { >>> u64 clidr; >>> >>> if (level > MAX_CACHE_LEVEL) >>> return CACHE_TYPE_NOCACHE; >>> clidr = read_sysreg(clidr_el1); >>> return CLIDR_CTYPE(clidr, level); } >>> >>> static int __populate_cache_leaves(unsigned int cpu) { >>> unsigned int level, idx; >>> enum cache_type type; >>> struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); >>> struct cacheinfo *this_leaf = this_cpu_ci->info_list; >>> >>> for (idx = 0, level = 1; level <= this_cpu_ci->num_levels && >>> idx < this_cpu_ci->num_leaves; idx++, level++) { >>> type = get_cache_type(level); >>> if (type == CACHE_TYPE_SEPARATE) { >>> ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, > level); >>> ci_leaf_init(this_leaf++, CACHE_TYPE_INST, > level); >>> } else { >>> ci_leaf_init(this_leaf++, type, level); >>> } >>> } >>> return 0; >>> } >>> >>> In populate_cache_leaves() the cache type is read from CLIDR_EL1 > register. >>> If CLIDR_EL1 reports CACHE_TYPE_NOCACHE for a particular level then >>> sysfs entry /sys/devices/system/cpu/cpu0/index<n>/type is not created >>> and hence userspace tools like lstopo will not report this cache >>> level. >> >> >> This sounds suspiciously like one of things tweaked between v4->v5. If you > look >> at update_cache_properties() in patch 2/9, you will see that we only >> update/find NOCACHE nodes and convert them to UNIFIED when all the >> attributes in the node are supplied. >> >> This means that if the node has an incomplete set of attributes we won't > update >> it. Can you verify that you have all those attributes set for nodes which > aren't >> being described by the hardware? > > Thanks for pointing out. > Why do we need to check for set of attributes and decide it as UNIFIED > cache.? > We can get cache type from attributes bits[3:2] if cache type valid flag is > set > irrespective of other attributes. If cache type valid flag is not set then > we can assume > it as NOCACHE type as neither architecture register nor in PPTT has valid > cache type. To answer the first question, in a strict sense we don't need to check any of the attributes in order to override the cache type. That said, initially I was going to trigger the override only when important attributes were set to assure that we weren't exporting meaningless nodes into sysfs. Then while picking which attributes I considered important, I came to the conclusion that it was simply better to assure that they were all set for nodes entirely generated by the PPTT. AKA, I don't want to see L3 cache nodes with their size or associativity unset, its better in that case that they remain hidden. Per, the cache type valid bit. The code is written with the assumption that it is overriding probed values (despite that not being true at the moment for arm64) in the spirit of the standard. This informs/restricts how the code works because we aren't simply generating the entire cacheinfo directly from PPTT walks. Instead we are merging the PPTT information with anything previously probed, meaning we need a way to match existing cacheinfo structures with PPTT nodes. So, the logic finding/matching an existing probed cache node requires that the cache type is valid because the cache level, and type is used as the match key. If the PPTT cache node doesn't have the cache type valid set, then the match logic won't find the node, and the PPTT code won't make any updates. That may also be what your seeing.. Basically what is happening is that cacheinfo NOCACHE nodes that happen to match valid PPTT UNIFIED nodes, can have their cache types overridden, but only if we determine the remainder of the PPTT node has sufficient information that we aren't exporting cacheinfo structures without useful information. Currently, the only time this can happen is for nodes which are entirely PPTT generated, so I think its fair the PPTT contain enough information to make those nodes useful. Thanks, > >> >> Thanks, >> >> >>> >>> Regards >>> Vijay >>> >>>> -----Original Message----- >>>> From: linux-arm-kernel >>> [mailto:linux-arm-kernel-bounces@lists.infradead.org] >>>> On Behalf Of Rafael J. Wysocki >>>> Sent: Thursday, December 14, 2017 4:40 AM >>>> To: Jeremy Linton <jeremy.linton@arm.com> >>>> Cc: Mark Rutland <mark.rutland@arm.com>; Jonathan.Zhang@cavium.com; >>>> Jayachandran.Nair@cavium.com; Lorenzo Pieralisi >>>> <lorenzo.pieralisi@arm.com>; Catalin Marinas >>>> <catalin.marinas@arm.com>; Rafael J. Wysocki <rafael@kernel.org>; >>>> jhugo@codeaurora.org; Will Deacon <will.deacon@arm.com>; Linux PM >> <linux-pm@vger.kernel.org>; Rafael J. >>>> Wysocki <rjw@rjwysocki.net>; Greg Kroah-Hartman >>>> <gregkh@linuxfoundation.org>; Linux Kernel Mailing List <linux- >>>> kernel@vger.kernel.org>; ACPI Devel Maling List >>> <linux-acpi@vger.kernel.org>; >>>> Viresh Kumar <viresh.kumar@linaro.org>; Hanjun Guo >>>> <hanjun.guo@linaro.org>; Al Stone <ahs3@redhat.com>; Sudeep Holla >>>> <sudeep.holla@arm.com>; austinwc@codeaurora.org; >>>> wangxiongfeng2@huawei.com; linux-arm-kernel@lists.infradead.org; Len >>>> Brown <lenb@kernel.org> >>>> Subject: Re: [PATCH v5 6/9] ACPI/PPTT: Add topology parsing code >>>> >>>> On Thu, Dec 14, 2017 at 12:06 AM, Jeremy Linton >>>> <jeremy.linton@arm.com> >>>> wrote: >>>>> Hi, >>>>> >>>>> >>>>> On 12/13/2017 04:28 PM, Rafael J. Wysocki wrote: >>>>>> >>>>>> On Wed, Dec 13, 2017 at 6:38 PM, Lorenzo Pieralisi >>>>>> <lorenzo.pieralisi@arm.com> wrote: >>>>>>> >>>>>>> On Tue, Dec 12, 2017 at 10:13:08AM -0600, Jeremy Linton wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> First, thanks for taking a look at this. >>>>>>>> >>>>>>>> On 12/11/2017 07:12 PM, Rafael J. Wysocki wrote: >>>>>>>>> >>>>>>>>> On Friday, December 1, 2017 11:23:27 PM CET Jeremy Linton wrote: >>>>>>>>>> >>>>>>>>>> The PPTT can be used to determine the groupings of CPU's at >>>>>>>>>> given levels in the system. Lets add a few routines to the PPTT >>>>>>>>>> parsing code to return a unique id for each unique level in the >>>>>>>>>> processor hierarchy. This can then be matched to build >>>>>>>>>> thread/core/cluster/die/package/etc mappings for each >>>>>>>>>> processing element in the system. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> >>>>>>>>> >>>>>>>>> >>>>>>>>> Why can't this be folded into patch [2/9]? >>>>>>>> >>>>>>>> >>>>>>>> It can, and I will be happy squash it. >>>>>>>> >>>>>>>> It was requested that the topology portion of the parser be split >>>>>>>> out back in v3. >>>>>>>> >>>>>>>> https://www.spinics.net/lists/linux-acpi/msg78487.html >>>>>>> >>>>>>> >>>>>>> I asked to split cache/topology since I am not familiar with cache >>>>>>> code and Sudeep - who looks after the cache code - won't be able >>>>>>> to review this series in time for v4.16. >>>>>> >>>>>> >>>>>> OK, so why do we need it in 4.16? >>>>> >>>>> >>>>> I think its more case of as soon as possible. That is because there >>>>> are machines where the topology is completely incorrect due to >>>>> assumptions the kernel makes based on registers that aren't defined >>>>> for that purpose (say describing which cores are in a physical >>>>> socket, or LLC's attached to interconnects or memory controllers). >>>>> >>>>> This incorrect topology information is reported to things like the >>>>> kernel scheduler, which then makes poor scheduling decisions >>>>> resulting in sub-optimal system performance. >>>>> >>>>> This patchset (and ACPI 6.2) clears up a lot of those problems. >>>> >>>> As long as the ACPI tables are as expected that is, I suppose? >>>> >>>> Anyway, fair enough, but I don't want to rush it in. >>>> >>>> Thanks, >>>> Rafael >>>> >>>> _______________________________________________ >>>> linux-arm-kernel mailing list >>>> linux-arm-kernel@lists.infradead.org >>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >>> >> >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index a35e457cefb7..b9b7b8b8ad21 100644 --- a/drivers/acpi/pptt.c +++ b/drivers/acpi/pptt.c @@ -412,6 +412,79 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table, } } +/* Passing level values greater than this will result in search termination */ +#define PPTT_ABORT_PACKAGE 0xFF + +/* + * Given an acpi_pptt_processor node, walk up until we identify the + * package that the node is associated with, or we run out of levels + * to request or the search is terminated with a flag match + * The level parameter also serves to limit possible loops within the tree. + */ +static struct acpi_pptt_processor *acpi_find_processor_package_id( + struct acpi_table_header *table_hdr, + struct acpi_pptt_processor *cpu, + int level, int flag) +{ + struct acpi_pptt_processor *prev_node; + + while (cpu && level) { + if (cpu->flags & flag) + break; + pr_debug("level %d\n", level); + prev_node = fetch_pptt_node(table_hdr, cpu->parent); + if (prev_node == NULL) + break; + cpu = prev_node; + level--; + } + return cpu; +} + +/* + * Get a unique value given a cpu, and a topology level, that can be + * matched to determine which cpus share common topological features + * at that level. + */ +static int topology_get_acpi_cpu_tag(struct acpi_table_header *table, + unsigned int cpu, int level, int flag) +{ + struct acpi_pptt_processor *cpu_node; + u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu); + + cpu_node = acpi_find_processor_node(table, acpi_cpu_id); + if (cpu_node) { + cpu_node = acpi_find_processor_package_id(table, cpu_node, + level, flag); + /* Only the first level has a guaranteed id */ + if (level == 0) + return cpu_node->acpi_processor_id; + return (int)((u8 *)cpu_node - (u8 *)table); + } + pr_err_once("PPTT table found, but unable to locate core for %d\n", + cpu); + return -ENOENT; +} + +static int find_acpi_cpu_topology_tag(unsigned int cpu, int level, int flag) +{ + struct acpi_table_header *table; + acpi_status status; + int retval; + + status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); + if (ACPI_FAILURE(status)) { + pr_err_once("No PPTT table found, cpu topology may be inaccurate\n"); + return -ENOENT; + } + retval = topology_get_acpi_cpu_tag(table, cpu, level, flag); + pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n", + cpu, level, retval); + acpi_put_table(table); + + return retval; +} + /** * acpi_find_last_cache_level() - Determines the number of cache levels for a PE * @cpu: Kernel logical cpu number @@ -475,3 +548,45 @@ int cache_setup_acpi(unsigned int cpu) return status; } + +/** + * find_acpi_cpu_topology() - Determine a unique topology value for a given cpu + * @cpu: Kernel logical cpu number + * @level: The topological level for which we would like a unique ID + * + * Determine a topology unique ID for each thread/core/cluster/mc_grouping + * /socket/etc. This ID can then be used to group peers, which will have + * matching ids. + * + * The search terminates when either the requested level is found or + * we reach a root node. Levels beyond the termination point will return the + * same unique ID. The unique id for level 0 is the acpi processor id. All + * other levels beyond this use a generated value to uniquely identify + * a topological feature. + * + * Return: -ENOENT if the PPTT doesn't exist, or the cpu cannot be found. + * Otherwise returns a value which represents a unique topological feature. + */ +int find_acpi_cpu_topology(unsigned int cpu, int level) +{ + return find_acpi_cpu_topology_tag(cpu, level, 0); +} + +/** + * find_acpi_cpu_topology_package() - Determine a unique cpu package value + * @cpu: Kernel logical cpu number + * + * Determine a topology unique package ID for the given cpu. + * This ID can then be used to group peers, which will have matching ids. + * + * The search terminates when either a level is found with the PHYSICAL_PACKAGE + * flag set or we reach a root node. + * + * Return: -ENOENT if the PPTT doesn't exist, or the cpu cannot be found. + * Otherwise returns a value which represents the package for this cpu. + */ +int find_acpi_cpu_topology_package(unsigned int cpu) +{ + return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE, + ACPI_PPTT_PHYSICAL_PACKAGE); +}
The PPTT can be used to determine the groupings of CPU's at given levels in the system. Lets add a few routines to the PPTT parsing code to return a unique id for each unique level in the processor hierarchy. This can then be matched to build thread/core/cluster/die/package/etc mappings for each processing element in the system. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> --- drivers/acpi/pptt.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 115 insertions(+)