Message ID | 1503059853-5118-1-git-send-email-rajneesh.bhardwaj@intel.com (mailing list archive) |
---|---|
State | Rejected, archived |
Delegated to: | Andy Shevchenko |
Headers | show |
On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> wrote: > This patch introduces a new debugfs entry to read current Package C-state > residency values and, one new kernel API to read the Package C-10 residency > counter. > > Package C-state residency MSRs provide useful debug information about system > idle states. In idle states system must enter deeper Package C-states. > > For example, on Intel Skylake/Kabylake based systems one should expect more > Package C-8 residency in "Idle Display On" scenarios compared to higher > Package C-states. With a self refresh display panel we must expect even more > deeper Package C-9/C-10 residencies indicating more power savings. Package > C-states depend not only on Core C-States but also on various IP blocks > power gating status and LTR values. > > For Intel Core SoCs Package C-10 entry is a must for deeper sleep states > such as S0ix. "Suspend-to-idle" should ideally take this path: > PC0 -> PC10 -> S0ix. For S0ix debug, its logical to check for Package-C10 > residency if for some reason system fails to enter S0ix. > > Please refer to this link for MSR details: > https://software.intel.com/sites/default/files/managed/22/0d/335592-sdm-vol-4.pdf > > Usage: > cat /sys/kernel/debug/pmc_core/package_cstate_residency > Package C2 : 0xec2e21735f > Package C3 : 0xc30113ba4 > Package C6 : 0x9ef4be15c5 > Package C7 : 0x1e011904 > Package C8 : 0x3c5653cfe5a > Package C9 : 0x0 > Package C10 : 0x16fff4289 > Why this patch is needed? See, we have turbostat and cpupower user space tools which do this without any additional code to be written in kernel. What prevents your user space application do the same? Moreover, we have events for cstate, I assume perf or something alike can monitor those counters as well. Sorry, NAK. > Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> > --- > * Applied on top of "Make the driver PCH family agnostic" sent by Srinivas > * Tested on 4.13-rc4 as well as on the "testing" branch of the > platform/drivers/x86. > > arch/x86/include/asm/pmc_core.h | 1 + > drivers/platform/x86/intel_pmc_core.c | 74 +++++++++++++++++++++++++++++++++++ > drivers/platform/x86/intel_pmc_core.h | 1 + > 3 files changed, 76 insertions(+) > > diff --git a/arch/x86/include/asm/pmc_core.h b/arch/x86/include/asm/pmc_core.h > index d4855f11136d..99f48e63fafc 100644 > --- a/arch/x86/include/asm/pmc_core.h > +++ b/arch/x86/include/asm/pmc_core.h > @@ -23,5 +23,6 @@ > > /* API to read SLP_S0_RESIDENCY counter */ > int intel_pmc_slp_s0_counter_read(u32 *data); > +int intel_pkgc10_counter_read(u64 *data); > > #endif /* _ASM_PMC_CORE_H */ > diff --git a/drivers/platform/x86/intel_pmc_core.c b/drivers/platform/x86/intel_pmc_core.c > index 17e08b42b0a9..5eea99f24b10 100644 > --- a/drivers/platform/x86/intel_pmc_core.c > +++ b/drivers/platform/x86/intel_pmc_core.c > @@ -29,11 +29,24 @@ > #include <asm/cpu_device_id.h> > #include <asm/intel-family.h> > #include <asm/pmc_core.h> > +#include <asm/msr.h> > > #include "intel_pmc_core.h" > > static struct pmc_dev pmc; > > +/* PKGC MSRs are common across Intel Core cpus */ > +static const struct pmc_bit_map msr_map[] = { > + {"Package C2", MSR_PKG_C2_RESIDENCY}, > + {"Package C3", MSR_PKG_C3_RESIDENCY}, > + {"Package C6", MSR_PKG_C6_RESIDENCY}, > + {"Package C7", MSR_PKG_C7_RESIDENCY}, > + {"Package C8", MSR_PKG_C8_RESIDENCY}, > + {"Package C9", MSR_PKG_C9_RESIDENCY}, > + {"Package C10", MSR_PKG_C10_RESIDENCY}, > + {}, > +}; > + > static const struct pmc_bit_map spt_pll_map[] = { > {"MIPI PLL", SPT_PMC_BIT_MPHY_CMN_LANE0}, > {"GEN2 USB2PCIE2 PLL", SPT_PMC_BIT_MPHY_CMN_LANE1}, > @@ -110,6 +123,7 @@ static const struct pmc_reg_map spt_reg_map = { > .pfear_sts = spt_pfear_map, > .mphy_sts = spt_mphy_map, > .pll_sts = spt_pll_map, > + .msr_sts = msr_map, > .slp_s0_offset = SPT_PMC_SLP_S0_RES_COUNTER_OFFSET, > .ltr_ignore_offset = SPT_PMC_LTR_IGNORE_OFFSET, > .regmap_length = SPT_PMC_MMIO_REG_LEN, > @@ -147,6 +161,26 @@ static inline u32 pmc_core_adjust_slp_s0_step(u32 value) > } > > /** > + * intel_pkgc10_counter_read() - Read Package C10 residency. > + * @data: Out param that contains current PC10 count by reading MSR 0x632 > + * > + * MSR_PKG_C10_RESIDENCY counter counts at the same frequency as the TSC. > + * BIT 0:59 represent the value since the last reset that this package is in > + * processor specific C10 states. > + * BIT 63:60 are reserved. > + * > + * Return: an error code or 0 on success. > + */ > +int intel_pkgc10_counter_read(u64 *data) > +{ > + if (!data) > + return -EINVAL; > + > + return rdmsrl_safe(MSR_PKG_C10_RESIDENCY, data); > +} > +EXPORT_SYMBOL_GPL(intel_pkgc10_counter_read); > + > +/** > * intel_pmc_slp_s0_counter_read() - Read SLP_S0 residency. > * @data: Out param that contains current SLP_S0 count. > * > @@ -430,6 +464,39 @@ static const struct file_operations pmc_core_ltr_ignore_ops = { > .release = single_release, > }; > > +static int pmc_core_pkgc_show(struct seq_file *s, void *unused) > +{ > + struct pmc_dev *pmcdev = s->private; > + const struct pmc_bit_map *map = pmcdev->map->msr_sts; > + u64 pcstate_count; > + int index, err; > + > + for (index = 0; map[index].name ; index++) { > + err = rdmsrl_safe(map[index].bit_mask, &pcstate_count); > + if (err) { > + pr_debug("Failed to read %s residency MSR", > + map[index].name); > + return err; > + } > + seq_printf(s, "%s\t : 0x%llx\n", map[index].name, > + pcstate_count); > + } > + > + return 0; > +} > + > +static int pmc_core_pkgc_open(struct inode *inode, struct file *file) > +{ > + return single_open(file, pmc_core_pkgc_show, inode->i_private); > +} > + > +static const struct file_operations pmc_core_pkgc_ops = { > + .open = pmc_core_pkgc_open, > + .read = seq_read, > + .llseek = seq_lseek, > + .release = single_release, > +}; > + > static void pmc_core_dbgfs_unregister(struct pmc_dev *pmcdev) > { > debugfs_remove_recursive(pmcdev->dbgfs_dir); > @@ -474,6 +541,13 @@ static int pmc_core_dbgfs_register(struct pmc_dev *pmcdev) > if (!file) > goto err; > > + file = debugfs_create_file("package_cstate_residency", > + S_IFREG | S_IRUGO, dir, pmcdev, > + &pmc_core_pkgc_ops); > + > + if (!file) > + goto err; > + > return 0; > err: > pmc_core_dbgfs_unregister(pmcdev); > diff --git a/drivers/platform/x86/intel_pmc_core.h b/drivers/platform/x86/intel_pmc_core.h > index 3d225a9cc09f..4423b84c53c0 100644 > --- a/drivers/platform/x86/intel_pmc_core.h > +++ b/drivers/platform/x86/intel_pmc_core.h > @@ -150,6 +150,7 @@ struct pmc_reg_map { > const struct pmc_bit_map *pfear_sts; > const struct pmc_bit_map *mphy_sts; > const struct pmc_bit_map *pll_sts; > + const struct pmc_bit_map *msr_sts; > const u32 slp_s0_offset; > const u32 ltr_ignore_offset; > const u32 base_address; > -- > 2.7.4 >
On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote: > On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj > <rajneesh.bhardwaj@intel.com> wrote: > > This patch introduces a new debugfs entry to read current Package C-state > > residency values and, one new kernel API to read the Package C-10 residency > > counter. > > > > Package C-state residency MSRs provide useful debug information about system > > idle states. In idle states system must enter deeper Package C-states. > > > > For example, on Intel Skylake/Kabylake based systems one should expect more > > Package C-8 residency in "Idle Display On" scenarios compared to higher > > Package C-states. With a self refresh display panel we must expect even more > > deeper Package C-9/C-10 residencies indicating more power savings. Package > > C-states depend not only on Core C-States but also on various IP blocks > > power gating status and LTR values. > > > > For Intel Core SoCs Package C-10 entry is a must for deeper sleep states > > such as S0ix. "Suspend-to-idle" should ideally take this path: > > PC0 -> PC10 -> S0ix. For S0ix debug, its logical to check for Package-C10 > > residency if for some reason system fails to enter S0ix. > > > > Please refer to this link for MSR details: > > https://software.intel.com/sites/default/files/managed/22/0d/335592-sdm-vol-4.pdf > > > > Usage: > > cat /sys/kernel/debug/pmc_core/package_cstate_residency > > Package C2 : 0xec2e21735f > > Package C3 : 0xc30113ba4 > > Package C6 : 0x9ef4be15c5 > > Package C7 : 0x1e011904 > > Package C8 : 0x3c5653cfe5a > > Package C9 : 0x0 > > Package C10 : 0x16fff4289 > > > > > Why this patch is needed? Andy, I'll try to give some background for this. This is needed to enhance the S0ix failure debug capabilities from within the kernel. On ChromeOS we have S0ix failsafe kernel framework that is used to validate S0ix and report the blockers in case of a failure. https://patchwork.kernel.org/patch/9148999/ So far only intel_pmc_slp_s0_counter_read is called by this framework to check whether the previous attempt to enter S0ix was success or not. Having another PC10 counter related exported function enhances the S0ix debug since PC10 state is a prerequisite to enter S0ix. > See, we have turbostat and cpupower user space tools which do this > without any additional code to be written in kernel. What prevents > your user space application do the same? > > Moreover, we have events for cstate, I assume perf or something alike > can monitor those counters as well. You're right, perhaps the debugfs is redundant when we have those user space tools but such tools are not available readily for all platforms/distros. Interfaces like /dev/cpu/*/msr that turbostat uses are not available on all the platforms. PMC driver is a debug driver so i thought its better to show Package C-state related info for low power debug here. > > Sorry, NAK. This patch has two parts i.e. exported PC10 API and the debugfs. Based on the above explanation, if the patch is not good as is, please let me know if i should drop the debugfs part and respin a v2 with just the exported API or drop this totally. Thanks for the feedback and thanks for taking time to review! > > Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> > > --- > > * Applied on top of "Make the driver PCH family agnostic" sent by Srinivas > > * Tested on 4.13-rc4 as well as on the "testing" branch of the > > platform/drivers/x86. > > > > -- > With Best Regards, > Andy Shevchenko
+PeterZ (since I mentioned his name) On Fri, Aug 18, 2017 at 5:58 PM, Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> wrote: > On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote: >> On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj >> <rajneesh.bhardwaj@intel.com> wrote: >> > This patch introduces a new debugfs entry to read current Package C-state >> > residency values and, one new kernel API to read the Package C-10 residency >> > counter. >> > >> > Package C-state residency MSRs provide useful debug information about system >> > idle states. In idle states system must enter deeper Package C-states. >> Why this patch is needed? > > Andy, I'll try to give some background for this. > > This is needed to enhance the S0ix failure debug capabilities from within > the kernel. On ChromeOS we have S0ix failsafe kernel framework that is used > to validate S0ix and report the blockers in case of a failure. > https://patchwork.kernel.org/patch/9148999/ (It's not part of upstream) > So far only intel_pmc_slp_s0_counter_read is called by this framework to > check whether the previous attempt to enter S0ix was success or not. I harder see even a single user of that API in current kernel. It should be unexported and removed I think. > Having > another PC10 counter related exported function enhances the S0ix debug since > PC10 state is a prerequisite to enter S0ix. > >> See, we have turbostat and cpupower user space tools which do this >> without any additional code to be written in kernel. What prevents >> your user space application do the same? >> >> Moreover, we have events for cstate, I assume perf or something alike >> can monitor those counters as well. > > You're right, perhaps the debugfs is redundant when we have those user space > tools but such tools are not available readily for all platforms/distros. > Interfaces like /dev/cpu/*/msr that turbostat uses are not available on all > the platforms. > PMC driver is a debug driver so i thought its better to show Package C-state > related info for low power debug here. > >> >> Sorry, NAK. > > This patch has two parts i.e. exported PC10 API and the debugfs. Based on > the above explanation, if the patch is not good as is, please let me know if > i should drop the debugfs part and respin a v2 with just the exported API or > drop this totally. > > Thanks for the feedback and thanks for taking time to review! Reading above makes me think that entire design of this is misguided. Since the most of values are counters they better to be accessed in a way how perf does. In case you need *in-kernel* facility, do some APIs (if it's not done yet) for events drivers first. cstate event driver is already in upstream. Sorry, NAK for entire patch until it would be blessed by people like Peter Z.
On Fri, Aug 18, 2017 at 08:17:32PM +0300, Andy Shevchenko wrote: > +PeterZ (since I mentioned his name) > > On Fri, Aug 18, 2017 at 5:58 PM, Rajneesh Bhardwaj > <rajneesh.bhardwaj@intel.com> wrote: > > On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote: > >> On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj > >> <rajneesh.bhardwaj@intel.com> wrote: > >> > This patch introduces a new debugfs entry to read current Package C-state > >> > residency values and, one new kernel API to read the Package C-10 residency > >> > counter. > >> > > >> > Package C-state residency MSRs provide useful debug information about system > >> > idle states. In idle states system must enter deeper Package C-states. > > >> Why this patch is needed? > > > > Andy, I'll try to give some background for this. > > > > This is needed to enhance the S0ix failure debug capabilities from within > > the kernel. On ChromeOS we have S0ix failsafe kernel framework that is used > > to validate S0ix and report the blockers in case of a failure. > > https://patchwork.kernel.org/patch/9148999/ > > (It's not part of upstream) Sorry i sent an older link. There are fresh attempts to get this into mainline kernel and looks like there is a traction for it. https://patchwork.kernel.org/patch/9831229/ Package C-state (PC10) validation is discussed there. > > > So far only intel_pmc_slp_s0_counter_read is called by this framework to > > check whether the previous attempt to enter S0ix was success or not. > > I harder see even a single user of that API in current kernel. It > should be unexported and removed I think. > > > Having > > another PC10 counter related exported function enhances the S0ix debug since > > PC10 state is a prerequisite to enter S0ix. > > > >> See, we have turbostat and cpupower user space tools which do this > >> without any additional code to be written in kernel. What prevents > >> your user space application do the same? > >> > >> Moreover, we have events for cstate, I assume perf or something alike > >> can monitor those counters as well. > > > > You're right, perhaps the debugfs is redundant when we have those user space > > tools but such tools are not available readily for all platforms/distros. > > Interfaces like /dev/cpu/*/msr that turbostat uses are not available on all > > the platforms. > > PMC driver is a debug driver so i thought its better to show Package C-state > > related info for low power debug here. > > > >> > >> Sorry, NAK. > > > > This patch has two parts i.e. exported PC10 API and the debugfs. Based on > > the above explanation, if the patch is not good as is, please let me know if > > i should drop the debugfs part and respin a v2 with just the exported API or > > drop this totally. > > > > Thanks for the feedback and thanks for taking time to review! > > Reading above makes me think that entire design of this is misguided. > Since the most of values are counters they better to be accessed in a > way how perf does. > > In case you need *in-kernel* facility, do some APIs (if it's not done > yet) for events drivers first. > cstate event driver is already in upstream. > > Sorry, NAK for entire patch until it would be blessed by people like Peter Z. > > -- > With Best Regards, > Andy Shevchenko
On Fri, Aug 18, 2017 at 08:17:32PM +0300, Andy Shevchenko wrote: > +PeterZ (since I mentioned his name) > Since we're talking C States, +Rafael and +Len > On Fri, Aug 18, 2017 at 5:58 PM, Rajneesh Bhardwaj > <rajneesh.bhardwaj@intel.com> wrote: > > On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote: > >> On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj > >> <rajneesh.bhardwaj@intel.com> wrote: > >> > This patch introduces a new debugfs entry to read current Package C-state > >> > residency values and, one new kernel API to read the Package C-10 residency > >> > counter. > >> > > >> > Package C-state residency MSRs provide useful debug information about system > >> > idle states. In idle states system must enter deeper Package C-states. > > >> Why this patch is needed? > > > > Andy, I'll try to give some background for this. > > > > This is needed to enhance the S0ix failure debug capabilities from within > > the kernel. On ChromeOS we have S0ix failsafe kernel framework that is used > > to validate S0ix and report the blockers in case of a failure. > > https://patchwork.kernel.org/patch/9148999/ > > (It's not part of upstream) > > > So far only intel_pmc_slp_s0_counter_read is called by this framework to > > check whether the previous attempt to enter S0ix was success or not. > > I harder see even a single user of that API in current kernel. It > should be unexported and removed I think. > > > Having > > another PC10 counter related exported function enhances the S0ix debug since > > PC10 state is a prerequisite to enter S0ix. > > > >> See, we have turbostat and cpupower user space tools which do this > >> without any additional code to be written in kernel. What prevents > >> your user space application do the same? > >> > >> Moreover, we have events for cstate, I assume perf or something alike > >> can monitor those counters as well. I tend to agree. > > > > You're right, perhaps the debugfs is redundant when we have those user space > > tools but such tools are not available readily for all platforms/distros. > > Interfaces like /dev/cpu/*/msr that turbostat uses are not available on all > > the platforms. Can these be exposed in this way for *this* platform though? > > PMC driver is a debug driver so i thought its better to show Package C-state > > related info for low power debug here. > > > >> > >> Sorry, NAK. > > > > This patch has two parts i.e. exported PC10 API and the debugfs. Based on > > the above explanation, if the patch is not good as is, please let me know if > > i should drop the debugfs part and respin a v2 with just the exported API or > > drop this totally. > > > > Thanks for the feedback and thanks for taking time to review! > > Reading above makes me think that entire design of this is misguided. > Since the most of values are counters they better to be accessed in a > way how perf does. > > In case you need *in-kernel* facility, do some APIs (if it's not done > yet) for events drivers first. > cstate event driver is already in upstream. > > Sorry, NAK for entire patch until it would be blessed by people like Peter Z. > > -- > With Best Regards, > Andy Shevchenko >
Hello, On Fri, Aug 18, 2017 at 10:47 AM, Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> wrote: > On Fri, Aug 18, 2017 at 08:17:32PM +0300, Andy Shevchenko wrote: >> +PeterZ (since I mentioned his name) >> >> On Fri, Aug 18, 2017 at 5:58 PM, Rajneesh Bhardwaj >> <rajneesh.bhardwaj@intel.com> wrote: >> > On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote: >> >> On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj >> >> <rajneesh.bhardwaj@intel.com> wrote: >> >> > This patch introduces a new debugfs entry to read current Package C-state >> >> > residency values and, one new kernel API to read the Package C-10 residency >> >> > counter. >> >> > >> >> > Package C-state residency MSRs provide useful debug information about system >> >> > idle states. In idle states system must enter deeper Package C-states. >> >> >> Why this patch is needed? >> > >> > Andy, I'll try to give some background for this. >> > >> > This is needed to enhance the S0ix failure debug capabilities from within >> > the kernel. On ChromeOS we have S0ix failsafe kernel framework that is used >> > to validate S0ix and report the blockers in case of a failure. >> > https://patchwork.kernel.org/patch/9148999/ >> >> (It's not part of upstream) > > Sorry i sent an older link. There are fresh attempts to get this into > mainline kernel and looks like there is a traction for it. > https://patchwork.kernel.org/patch/9831229/ > > Package C-state (PC10) validation is discussed there. Yes, Derek has been trying to get it up streamed, and is currently taking care of the comments. One of the comments Rafael Wysocki had (https://lkml.org/lkml/2017/7/10/741), was that getting to PC10 takes care of large amount of power savings, and PC10 is a logical milestone to track / validate as it validates the north complex power state. To do that we need an API to get the PC10 counter. I do agree that an exposed API needs to have a user code that uses the API. In this case it seems to be a chicken and egg problem i.e. the S0ix failsafe framework (https://patchwork.kernel.org/patch/9831229/) needs this API, and the API needs a user (failsafe framework) for it to be accepted? Best Regards, Rajat > >> >> > So far only intel_pmc_slp_s0_counter_read is called by this framework to >> > check whether the previous attempt to enter S0ix was success or not. >> >> I harder see even a single user of that API in current kernel. It >> should be unexported and removed I think. >> >> > Having >> > another PC10 counter related exported function enhances the S0ix debug since >> > PC10 state is a prerequisite to enter S0ix. >> > >> >> See, we have turbostat and cpupower user space tools which do this >> >> without any additional code to be written in kernel. What prevents >> >> your user space application do the same? >> >> >> >> Moreover, we have events for cstate, I assume perf or something alike >> >> can monitor those counters as well. >> > >> > You're right, perhaps the debugfs is redundant when we have those user space >> > tools but such tools are not available readily for all platforms/distros. >> > Interfaces like /dev/cpu/*/msr that turbostat uses are not available on all >> > the platforms. >> > PMC driver is a debug driver so i thought its better to show Package C-state >> > related info for low power debug here. >> > >> >> >> >> Sorry, NAK. >> > >> > This patch has two parts i.e. exported PC10 API and the debugfs. Based on >> > the above explanation, if the patch is not good as is, please let me know if >> > i should drop the debugfs part and respin a v2 with just the exported API or >> > drop this totally. >> > >> > Thanks for the feedback and thanks for taking time to review! >> >> Reading above makes me think that entire design of this is misguided. >> Since the most of values are counters they better to be accessed in a >> way how perf does. >> >> In case you need *in-kernel* facility, do some APIs (if it's not done >> yet) for events drivers first. >> cstate event driver is already in upstream. >> >> Sorry, NAK for entire patch until it would be blessed by people like Peter Z. >> >> -- >> With Best Regards, >> Andy Shevchenko > > -- > Best Regards, > Rajneesh
On Sat, Aug 19, 2017 at 2:23 AM, Rajat Jain <rajatja@google.com> wrote: > On Fri, Aug 18, 2017 at 10:47 AM, Rajneesh Bhardwaj > <rajneesh.bhardwaj@intel.com> wrote: >> On Fri, Aug 18, 2017 at 08:17:32PM +0300, Andy Shevchenko wrote: >>> On Fri, Aug 18, 2017 at 5:58 PM, Rajneesh Bhardwaj >>> <rajneesh.bhardwaj@intel.com> wrote: >>> > On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote: >>> >> On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj >>> >> <rajneesh.bhardwaj@intel.com> wrote: >>> > This is needed to enhance the S0ix failure debug capabilities from within >>> > the kernel. On ChromeOS we have S0ix failsafe kernel framework that is used >>> > to validate S0ix and report the blockers in case of a failure. >>> > https://patchwork.kernel.org/patch/9148999/ >>> >>> (It's not part of upstream) >> >> Sorry i sent an older link. There are fresh attempts to get this into >> mainline kernel and looks like there is a traction for it. >> https://patchwork.kernel.org/patch/9831229/ >> >> Package C-state (PC10) validation is discussed there. > > Yes, Derek has been trying to get it up streamed, and is currently > taking care of the comments. One of the comments Rafael Wysocki had > (https://lkml.org/lkml/2017/7/10/741), was that getting to PC10 takes > care of large amount of power savings, and PC10 is a logical milestone > to track / validate as it validates the north complex power state. To > do that we need an API to get the PC10 counter. So, how many ways we have to get that counter? From HW prospective; from Linux kernel prospective; from user space prospective. > I do agree that an exposed API needs to have a user code that uses the > API. In this case it seems to be a chicken and egg problem i.e. the > S0ix failsafe framework (https://patchwork.kernel.org/patch/9831229/) > needs this API, and the API needs a user (failsafe framework) for it > to be accepted? So, Derek's patch as I can see didn't made upstream and the whole activity seems staled. I'm going to mark this as Rejected. Whenever it would be new approach feel free to send a new version.
diff --git a/arch/x86/include/asm/pmc_core.h b/arch/x86/include/asm/pmc_core.h index d4855f11136d..99f48e63fafc 100644 --- a/arch/x86/include/asm/pmc_core.h +++ b/arch/x86/include/asm/pmc_core.h @@ -23,5 +23,6 @@ /* API to read SLP_S0_RESIDENCY counter */ int intel_pmc_slp_s0_counter_read(u32 *data); +int intel_pkgc10_counter_read(u64 *data); #endif /* _ASM_PMC_CORE_H */ diff --git a/drivers/platform/x86/intel_pmc_core.c b/drivers/platform/x86/intel_pmc_core.c index 17e08b42b0a9..5eea99f24b10 100644 --- a/drivers/platform/x86/intel_pmc_core.c +++ b/drivers/platform/x86/intel_pmc_core.c @@ -29,11 +29,24 @@ #include <asm/cpu_device_id.h> #include <asm/intel-family.h> #include <asm/pmc_core.h> +#include <asm/msr.h> #include "intel_pmc_core.h" static struct pmc_dev pmc; +/* PKGC MSRs are common across Intel Core cpus */ +static const struct pmc_bit_map msr_map[] = { + {"Package C2", MSR_PKG_C2_RESIDENCY}, + {"Package C3", MSR_PKG_C3_RESIDENCY}, + {"Package C6", MSR_PKG_C6_RESIDENCY}, + {"Package C7", MSR_PKG_C7_RESIDENCY}, + {"Package C8", MSR_PKG_C8_RESIDENCY}, + {"Package C9", MSR_PKG_C9_RESIDENCY}, + {"Package C10", MSR_PKG_C10_RESIDENCY}, + {}, +}; + static const struct pmc_bit_map spt_pll_map[] = { {"MIPI PLL", SPT_PMC_BIT_MPHY_CMN_LANE0}, {"GEN2 USB2PCIE2 PLL", SPT_PMC_BIT_MPHY_CMN_LANE1}, @@ -110,6 +123,7 @@ static const struct pmc_reg_map spt_reg_map = { .pfear_sts = spt_pfear_map, .mphy_sts = spt_mphy_map, .pll_sts = spt_pll_map, + .msr_sts = msr_map, .slp_s0_offset = SPT_PMC_SLP_S0_RES_COUNTER_OFFSET, .ltr_ignore_offset = SPT_PMC_LTR_IGNORE_OFFSET, .regmap_length = SPT_PMC_MMIO_REG_LEN, @@ -147,6 +161,26 @@ static inline u32 pmc_core_adjust_slp_s0_step(u32 value) } /** + * intel_pkgc10_counter_read() - Read Package C10 residency. + * @data: Out param that contains current PC10 count by reading MSR 0x632 + * + * MSR_PKG_C10_RESIDENCY counter counts at the same frequency as the TSC. + * BIT 0:59 represent the value since the last reset that this package is in + * processor specific C10 states. + * BIT 63:60 are reserved. + * + * Return: an error code or 0 on success. + */ +int intel_pkgc10_counter_read(u64 *data) +{ + if (!data) + return -EINVAL; + + return rdmsrl_safe(MSR_PKG_C10_RESIDENCY, data); +} +EXPORT_SYMBOL_GPL(intel_pkgc10_counter_read); + +/** * intel_pmc_slp_s0_counter_read() - Read SLP_S0 residency. * @data: Out param that contains current SLP_S0 count. * @@ -430,6 +464,39 @@ static const struct file_operations pmc_core_ltr_ignore_ops = { .release = single_release, }; +static int pmc_core_pkgc_show(struct seq_file *s, void *unused) +{ + struct pmc_dev *pmcdev = s->private; + const struct pmc_bit_map *map = pmcdev->map->msr_sts; + u64 pcstate_count; + int index, err; + + for (index = 0; map[index].name ; index++) { + err = rdmsrl_safe(map[index].bit_mask, &pcstate_count); + if (err) { + pr_debug("Failed to read %s residency MSR", + map[index].name); + return err; + } + seq_printf(s, "%s\t : 0x%llx\n", map[index].name, + pcstate_count); + } + + return 0; +} + +static int pmc_core_pkgc_open(struct inode *inode, struct file *file) +{ + return single_open(file, pmc_core_pkgc_show, inode->i_private); +} + +static const struct file_operations pmc_core_pkgc_ops = { + .open = pmc_core_pkgc_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + static void pmc_core_dbgfs_unregister(struct pmc_dev *pmcdev) { debugfs_remove_recursive(pmcdev->dbgfs_dir); @@ -474,6 +541,13 @@ static int pmc_core_dbgfs_register(struct pmc_dev *pmcdev) if (!file) goto err; + file = debugfs_create_file("package_cstate_residency", + S_IFREG | S_IRUGO, dir, pmcdev, + &pmc_core_pkgc_ops); + + if (!file) + goto err; + return 0; err: pmc_core_dbgfs_unregister(pmcdev); diff --git a/drivers/platform/x86/intel_pmc_core.h b/drivers/platform/x86/intel_pmc_core.h index 3d225a9cc09f..4423b84c53c0 100644 --- a/drivers/platform/x86/intel_pmc_core.h +++ b/drivers/platform/x86/intel_pmc_core.h @@ -150,6 +150,7 @@ struct pmc_reg_map { const struct pmc_bit_map *pfear_sts; const struct pmc_bit_map *mphy_sts; const struct pmc_bit_map *pll_sts; + const struct pmc_bit_map *msr_sts; const u32 slp_s0_offset; const u32 ltr_ignore_offset; const u32 base_address;
This patch introduces a new debugfs entry to read current Package C-state residency values and, one new kernel API to read the Package C-10 residency counter. Package C-state residency MSRs provide useful debug information about system idle states. In idle states system must enter deeper Package C-states. For example, on Intel Skylake/Kabylake based systems one should expect more Package C-8 residency in "Idle Display On" scenarios compared to higher Package C-states. With a self refresh display panel we must expect even more deeper Package C-9/C-10 residencies indicating more power savings. Package C-states depend not only on Core C-States but also on various IP blocks power gating status and LTR values. For Intel Core SoCs Package C-10 entry is a must for deeper sleep states such as S0ix. "Suspend-to-idle" should ideally take this path: PC0 -> PC10 -> S0ix. For S0ix debug, its logical to check for Package-C10 residency if for some reason system fails to enter S0ix. Please refer to this link for MSR details: https://software.intel.com/sites/default/files/managed/22/0d/335592-sdm-vol-4.pdf Usage: cat /sys/kernel/debug/pmc_core/package_cstate_residency Package C2 : 0xec2e21735f Package C3 : 0xc30113ba4 Package C6 : 0x9ef4be15c5 Package C7 : 0x1e011904 Package C8 : 0x3c5653cfe5a Package C9 : 0x0 Package C10 : 0x16fff4289 Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> --- * Applied on top of "Make the driver PCH family agnostic" sent by Srinivas * Tested on 4.13-rc4 as well as on the "testing" branch of the platform/drivers/x86. arch/x86/include/asm/pmc_core.h | 1 + drivers/platform/x86/intel_pmc_core.c | 74 +++++++++++++++++++++++++++++++++++ drivers/platform/x86/intel_pmc_core.h | 1 + 3 files changed, 76 insertions(+)