Message ID | 20240730044917.4680-3-Dhananjay.Ugwekar@amd.com (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | RAPL driver fixes for AMD CPUs | expand |
On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> wrote: > > After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"), > on AMD processors that support extended CPUID leaf 0x80000026, the > topology_logical_die_id() macros, no longer returns package id, instead it > returns the CCD (Core Complex Die) id. This leads to the energy-pkg > event scope to be modified to CCD instead of package. > > For more historical context, please refer to commit 32fb480e0a2c > ("powercap/intel_rapl: Support multi-die/package"), which initially changed > the RAPL scope from package to die for all systems, as Intel systems > with Die enumeration have RAPL scope as die, and those without die > enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked > correctly with topology_logical_die_id() until recently, but this changed > after the "0x80000026 leaf" commit mentioned above. > > Future multi-die Intel systems will have package scope RAPL counters, > but they will be using TPMI RAPL interface, which is not affected by > this change. > > Replacing topology_logical_die_id() with topology_physical_package_id() > conditionally only for AMD and Hygon fixes the energy-pkg event. > > On an AMD 2 socket 8 CCD Zen4 server: > > Before: > > linux$ ls /sys/class/powercap/ > intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d > intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0 > intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e > intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0 > intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f > intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0 > intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0 > intel-rapl:3 intel-rapl:7:0 intel-rapl:c > intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0 > > After: > > linux$ ls /sys/class/powercap/ > intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0 > > Only one sysfs entry per-event per-package is created after this change. > > Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") > Reported-by: Michael Larabel <michael@michaellarabel.com> > Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> > Reviewed-by: Zhang Rui <rui.zhang@intel.com> > --- > Changes in v2: > * Updated scope description comment, commit log > * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope() > * Check topology_logical_(die/package)_id return value This patch does not depend on the first one in the series if I'm not mistaken, in which case I can pick it up separately if you want me to do that, so please let me know. Thanks! > --- > drivers/powercap/intel_rapl_common.c | 34 ++++++++++++++++++++++++---- > 1 file changed, 30 insertions(+), 4 deletions(-) > > diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c > index 3cffa6c79538..4bc56acb99d6 100644 > --- a/drivers/powercap/intel_rapl_common.c > +++ b/drivers/powercap/intel_rapl_common.c > @@ -2128,6 +2128,21 @@ void rapl_remove_package(struct rapl_package *rp) > } > EXPORT_SYMBOL_GPL(rapl_remove_package); > > +/* > + * RAPL Package energy counter scope: > + * 1. AMD/HYGON platforms use per-PKG package energy counter > + * 2. For Intel platforms > + * 2.1 CLX-AP platform has per-DIE package energy counter > + * 2.2 Other platforms that uses MSR RAPL are single die systems so the > + * package energy counter can be considered as per-PKG/per-DIE, > + * here it is considered as per-DIE. > + * 2.3 New platforms that use TPMI RAPL doesn't care about the > + * scope because they are not MSR/CPU based. > + */ > +#define rapl_msrs_are_pkg_scope() \ > + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \ > + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) > + > /* caller to ensure CPU hotplug lock is held */ > struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv, > bool id_is_cpu) > @@ -2135,8 +2150,14 @@ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_ > struct rapl_package *rp; > int uid; > > - if (id_is_cpu) > - uid = topology_logical_die_id(id); > + if (id_is_cpu) { > + uid = rapl_msrs_are_pkg_scope() ? > + topology_physical_package_id(id) : topology_logical_die_id(id); > + if (uid < 0) { > + pr_err("topology_logical_(package/die)_id() returned a negative value"); > + return ERR_PTR(-EINVAL); > + } > + } > else > uid = id; > > @@ -2168,9 +2189,14 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr > return ERR_PTR(-ENOMEM); > > if (id_is_cpu) { > - rp->id = topology_logical_die_id(id); > + rp->id = rapl_msrs_are_pkg_scope() ? > + topology_physical_package_id(id) : topology_logical_die_id(id); > + if ((int)(rp->id) < 0) { > + pr_err("topology_logical_(package/die)_id() returned a negative value"); > + return ERR_PTR(-EINVAL); > + } > rp->lead_cpu = id; > - if (topology_max_dies_per_package() > 1) > + if (!rapl_msrs_are_pkg_scope() && topology_max_dies_per_package() > 1) > snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d", > topology_physical_package_id(id), topology_die_id(id)); > else > -- > 2.43.0 > >
Hello Rafael, On 8/2/2024 6:05 PM, Rafael J. Wysocki wrote: > On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar > <Dhananjay.Ugwekar@amd.com> wrote: >> >> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"), >> on AMD processors that support extended CPUID leaf 0x80000026, the >> topology_logical_die_id() macros, no longer returns package id, instead it >> returns the CCD (Core Complex Die) id. This leads to the energy-pkg >> event scope to be modified to CCD instead of package. >> >> For more historical context, please refer to commit 32fb480e0a2c >> ("powercap/intel_rapl: Support multi-die/package"), which initially changed >> the RAPL scope from package to die for all systems, as Intel systems >> with Die enumeration have RAPL scope as die, and those without die >> enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked >> correctly with topology_logical_die_id() until recently, but this changed >> after the "0x80000026 leaf" commit mentioned above. >> >> Future multi-die Intel systems will have package scope RAPL counters, >> but they will be using TPMI RAPL interface, which is not affected by >> this change. >> >> Replacing topology_logical_die_id() with topology_physical_package_id() >> conditionally only for AMD and Hygon fixes the energy-pkg event. >> >> On an AMD 2 socket 8 CCD Zen4 server: >> >> Before: >> >> linux$ ls /sys/class/powercap/ >> intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d >> intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0 >> intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e >> intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0 >> intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f >> intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0 >> intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0 >> intel-rapl:3 intel-rapl:7:0 intel-rapl:c >> intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0 >> >> After: >> >> linux$ ls /sys/class/powercap/ >> intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0 >> >> Only one sysfs entry per-event per-package is created after this change. >> >> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") >> Reported-by: Michael Larabel <michael@michaellarabel.com> >> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> >> Reviewed-by: Zhang Rui <rui.zhang@intel.com> >> --- >> Changes in v2: >> * Updated scope description comment, commit log >> * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope() >> * Check topology_logical_(die/package)_id return value > > This patch does not depend on the first one in the series if I'm not > mistaken, in which case I can pick it up separately if you want me to > do that, so please let me know. Sorry for the late reply, was out sick, Yes, please pick this patch separately, it is independent from the first one. Thanks, Dhananjay > > Thanks! > >> --- >> drivers/powercap/intel_rapl_common.c | 34 ++++++++++++++++++++++++---- >> 1 file changed, 30 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c >> index 3cffa6c79538..4bc56acb99d6 100644 >> --- a/drivers/powercap/intel_rapl_common.c >> +++ b/drivers/powercap/intel_rapl_common.c >> @@ -2128,6 +2128,21 @@ void rapl_remove_package(struct rapl_package *rp) >> } >> EXPORT_SYMBOL_GPL(rapl_remove_package); >> >> +/* >> + * RAPL Package energy counter scope: >> + * 1. AMD/HYGON platforms use per-PKG package energy counter >> + * 2. For Intel platforms >> + * 2.1 CLX-AP platform has per-DIE package energy counter >> + * 2.2 Other platforms that uses MSR RAPL are single die systems so the >> + * package energy counter can be considered as per-PKG/per-DIE, >> + * here it is considered as per-DIE. >> + * 2.3 New platforms that use TPMI RAPL doesn't care about the >> + * scope because they are not MSR/CPU based. >> + */ >> +#define rapl_msrs_are_pkg_scope() \ >> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \ >> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) >> + >> /* caller to ensure CPU hotplug lock is held */ >> struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv, >> bool id_is_cpu) >> @@ -2135,8 +2150,14 @@ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_ >> struct rapl_package *rp; >> int uid; >> >> - if (id_is_cpu) >> - uid = topology_logical_die_id(id); >> + if (id_is_cpu) { >> + uid = rapl_msrs_are_pkg_scope() ? >> + topology_physical_package_id(id) : topology_logical_die_id(id); >> + if (uid < 0) { >> + pr_err("topology_logical_(package/die)_id() returned a negative value"); >> + return ERR_PTR(-EINVAL); >> + } >> + } >> else >> uid = id; >> >> @@ -2168,9 +2189,14 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr >> return ERR_PTR(-ENOMEM); >> >> if (id_is_cpu) { >> - rp->id = topology_logical_die_id(id); >> + rp->id = rapl_msrs_are_pkg_scope() ? >> + topology_physical_package_id(id) : topology_logical_die_id(id); >> + if ((int)(rp->id) < 0) { >> + pr_err("topology_logical_(package/die)_id() returned a negative value"); >> + return ERR_PTR(-EINVAL); >> + } >> rp->lead_cpu = id; >> - if (topology_max_dies_per_package() > 1) >> + if (!rapl_msrs_are_pkg_scope() && topology_max_dies_per_package() > 1) >> snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d", >> topology_physical_package_id(id), topology_die_id(id)); >> else >> -- >> 2.43.0 >> >>
On Thu, Aug 8, 2024 at 1:18 PM Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> wrote: > > Hello Rafael, > > On 8/2/2024 6:05 PM, Rafael J. Wysocki wrote: > > On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar > > <Dhananjay.Ugwekar@amd.com> wrote: > >> > >> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"), > >> on AMD processors that support extended CPUID leaf 0x80000026, the > >> topology_logical_die_id() macros, no longer returns package id, instead it > >> returns the CCD (Core Complex Die) id. This leads to the energy-pkg > >> event scope to be modified to CCD instead of package. > >> > >> For more historical context, please refer to commit 32fb480e0a2c > >> ("powercap/intel_rapl: Support multi-die/package"), which initially changed > >> the RAPL scope from package to die for all systems, as Intel systems > >> with Die enumeration have RAPL scope as die, and those without die > >> enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked > >> correctly with topology_logical_die_id() until recently, but this changed > >> after the "0x80000026 leaf" commit mentioned above. > >> > >> Future multi-die Intel systems will have package scope RAPL counters, > >> but they will be using TPMI RAPL interface, which is not affected by > >> this change. > >> > >> Replacing topology_logical_die_id() with topology_physical_package_id() > >> conditionally only for AMD and Hygon fixes the energy-pkg event. > >> > >> On an AMD 2 socket 8 CCD Zen4 server: > >> > >> Before: > >> > >> linux$ ls /sys/class/powercap/ > >> intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d > >> intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0 > >> intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e > >> intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0 > >> intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f > >> intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0 > >> intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0 > >> intel-rapl:3 intel-rapl:7:0 intel-rapl:c > >> intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0 > >> > >> After: > >> > >> linux$ ls /sys/class/powercap/ > >> intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0 > >> > >> Only one sysfs entry per-event per-package is created after this change. > >> > >> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") > >> Reported-by: Michael Larabel <michael@michaellarabel.com> > >> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> > >> Reviewed-by: Zhang Rui <rui.zhang@intel.com> > >> --- > >> Changes in v2: > >> * Updated scope description comment, commit log > >> * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope() > >> * Check topology_logical_(die/package)_id return value > > > > This patch does not depend on the first one in the series if I'm not > > mistaken, in which case I can pick it up separately if you want me to > > do that, so please let me know. > > Sorry for the late reply, was out sick, > > Yes, please pick this patch separately, it is independent from the first one. OK, applied as 6.12 material. Thanks!
Hello Rafael, On Mon, Aug 19, 2024 at 03:34:09PM +0200, Rafael J. Wysocki wrote: > On Thu, Aug 8, 2024 at 1:18 PM Dhananjay Ugwekar > <Dhananjay.Ugwekar@amd.com> wrote: > > > > Hello Rafael, > > > > On 8/2/2024 6:05 PM, Rafael J. Wysocki wrote: > > > On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar > > > <Dhananjay.Ugwekar@amd.com> wrote: > > >> > > >> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"), > > >> on AMD processors that support extended CPUID leaf 0x80000026, the > > >> topology_logical_die_id() macros, no longer returns package id, instead it > > >> returns the CCD (Core Complex Die) id. This leads to the energy-pkg > > >> event scope to be modified to CCD instead of package. > > >> > > >> For more historical context, please refer to commit 32fb480e0a2c > > >> ("powercap/intel_rapl: Support multi-die/package"), which initially changed > > >> the RAPL scope from package to die for all systems, as Intel systems > > >> with Die enumeration have RAPL scope as die, and those without die > > >> enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked > > >> correctly with topology_logical_die_id() until recently, but this changed > > >> after the "0x80000026 leaf" commit mentioned above. > > >> > > >> Future multi-die Intel systems will have package scope RAPL counters, > > >> but they will be using TPMI RAPL interface, which is not affected by > > >> this change. > > >> > > >> Replacing topology_logical_die_id() with topology_physical_package_id() > > >> conditionally only for AMD and Hygon fixes the energy-pkg event. > > >> > > >> On an AMD 2 socket 8 CCD Zen4 server: > > >> > > >> Before: > > >> > > >> linux$ ls /sys/class/powercap/ > > >> intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d > > >> intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0 > > >> intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e > > >> intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0 > > >> intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f > > >> intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0 > > >> intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0 > > >> intel-rapl:3 intel-rapl:7:0 intel-rapl:c > > >> intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0 > > >> > > >> After: > > >> > > >> linux$ ls /sys/class/powercap/ > > >> intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0 > > >> > > >> Only one sysfs entry per-event per-package is created after this change. > > >> > > >> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") > > >> Reported-by: Michael Larabel <michael@michaellarabel.com> > > >> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> > > >> Reviewed-by: Zhang Rui <rui.zhang@intel.com> > > >> --- > > >> Changes in v2: > > >> * Updated scope description comment, commit log > > >> * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope() > > >> * Check topology_logical_(die/package)_id return value > > > > > > This patch does not depend on the first one in the series if I'm not > > > mistaken, in which case I can pick it up separately if you want me to > > > do that, so please let me know. > > > > Sorry for the late reply, was out sick, > > > > Yes, please pick this patch separately, it is independent from the first one. > > OK, applied as 6.12 material. Can this go into 6.11 fixes? It fixes the commit 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") which got merged in 6.10. -- Thanks and Regards gautham.
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 3cffa6c79538..4bc56acb99d6 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -2128,6 +2128,21 @@ void rapl_remove_package(struct rapl_package *rp) } EXPORT_SYMBOL_GPL(rapl_remove_package); +/* + * RAPL Package energy counter scope: + * 1. AMD/HYGON platforms use per-PKG package energy counter + * 2. For Intel platforms + * 2.1 CLX-AP platform has per-DIE package energy counter + * 2.2 Other platforms that uses MSR RAPL are single die systems so the + * package energy counter can be considered as per-PKG/per-DIE, + * here it is considered as per-DIE. + * 2.3 New platforms that use TPMI RAPL doesn't care about the + * scope because they are not MSR/CPU based. + */ +#define rapl_msrs_are_pkg_scope() \ + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \ + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) + /* caller to ensure CPU hotplug lock is held */ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv, bool id_is_cpu) @@ -2135,8 +2150,14 @@ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_ struct rapl_package *rp; int uid; - if (id_is_cpu) - uid = topology_logical_die_id(id); + if (id_is_cpu) { + uid = rapl_msrs_are_pkg_scope() ? + topology_physical_package_id(id) : topology_logical_die_id(id); + if (uid < 0) { + pr_err("topology_logical_(package/die)_id() returned a negative value"); + return ERR_PTR(-EINVAL); + } + } else uid = id; @@ -2168,9 +2189,14 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr return ERR_PTR(-ENOMEM); if (id_is_cpu) { - rp->id = topology_logical_die_id(id); + rp->id = rapl_msrs_are_pkg_scope() ? + topology_physical_package_id(id) : topology_logical_die_id(id); + if ((int)(rp->id) < 0) { + pr_err("topology_logical_(package/die)_id() returned a negative value"); + return ERR_PTR(-EINVAL); + } rp->lead_cpu = id; - if (topology_max_dies_per_package() > 1) + if (!rapl_msrs_are_pkg_scope() && topology_max_dies_per_package() > 1) snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d", topology_physical_package_id(id), topology_die_id(id)); else