Message ID | 20211021035159.1117456-4-kai.heng.feng@canonical.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Let user enable ASPM and LTR via sysfs | expand |
On Thu, Oct 21, 2021 at 11:51:59AM +0800, Kai-Heng Feng wrote: > Sometimes BIOS may not be able to program ASPM and LTR settings, for > instance, the NVMe devices behind Intel VMD bridges. For this case, both > ASPM and LTR have to be enabled to have significant power saving. > > Since we still want POLICY_DEFAULT honor the default BIOS ASPM settings, > introduce LTR sysfs knobs so users can set max snoop and max nosnoop > latency manually or via udev rules. How is the user supposed to figure out what "max snoop" and "max nosnoop" values to program? If we add this, I'm afraid we'll have people programming random things that seem to work but are not actually reliable. > [1] https://github.com/systemd/systemd/pull/17899/ > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209789 > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > --- > v2: > - New patch. > > drivers/pci/pcie/aspm.c | 59 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 59 insertions(+) > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > index 1560859ab056..f7dc62936445 100644 > --- a/drivers/pci/pcie/aspm.c > +++ b/drivers/pci/pcie/aspm.c > @@ -1299,6 +1299,59 @@ static ssize_t clkpm_store(struct device *dev, > return len; > } > > +static ssize_t ltr_attr_show_common(struct device *dev, > + struct device_attribute *attr, char *buf, u8 state) > +{ > + struct pci_dev *pdev = to_pci_dev(dev); > + int ltr; > + u16 val; > + > + ltr = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); > + if (!ltr) > + return -EINVAL; > + > + pci_read_config_word(pdev, ltr + state, &val); > + > + return sysfs_emit(buf, "0x%0x\n", val); > +} > + > +static ssize_t ltr_attr_store_common(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t len, u8 state) > +{ > + struct pci_dev *pdev = to_pci_dev(dev); > + int ltr; > + u16 val; > + > + ltr = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); > + if (!ltr) > + return -EINVAL; > + > + if (kstrtou16(buf, 16, &val) < 0) > + return -EINVAL; > + > + /* LatencyScale is not permitted to be 110 or 111 */ > + if ((val >> 10) > 5) > + return -EINVAL; > + > + pci_write_config_word(pdev, ltr + state, val); > + > + return len; > +} > + > +#define LTR_ATTR(_f, _s) \ > +static ssize_t _f##_show(struct device *dev, \ > + struct device_attribute *attr, char *buf) \ > +{ return ltr_attr_show_common(dev, attr, buf, PCI_LTR_##_s); } \ > + \ > +static ssize_t _f##_store(struct device *dev, \ > + struct device_attribute *attr, \ > + const char *buf, size_t len) \ > +{ return ltr_attr_store_common(dev, attr, buf, len, PCI_LTR_##_s); } > + > +LTR_ATTR(ltr_max_snoop_lat, MAX_SNOOP_LAT); > +LTR_ATTR(ltr_max_nosnoop_lat, MAX_NOSNOOP_LAT); > + > static DEVICE_ATTR_RW(clkpm); > static DEVICE_ATTR_RW(l0s_aspm); > static DEVICE_ATTR_RW(l1_aspm); > @@ -1306,6 +1359,8 @@ static DEVICE_ATTR_RW(l1_1_aspm); > static DEVICE_ATTR_RW(l1_2_aspm); > static DEVICE_ATTR_RW(l1_1_pcipm); > static DEVICE_ATTR_RW(l1_2_pcipm); > +static DEVICE_ATTR_RW(ltr_max_snoop_lat); > +static DEVICE_ATTR_RW(ltr_max_nosnoop_lat); > > static struct attribute *aspm_ctrl_attrs[] = { > &dev_attr_clkpm.attr, > @@ -1315,6 +1370,8 @@ static struct attribute *aspm_ctrl_attrs[] = { > &dev_attr_l1_2_aspm.attr, > &dev_attr_l1_1_pcipm.attr, > &dev_attr_l1_2_pcipm.attr, > + &dev_attr_ltr_max_snoop_lat.attr, > + &dev_attr_ltr_max_nosnoop_lat.attr, > NULL > }; > > @@ -1338,6 +1395,8 @@ static umode_t aspm_ctrl_attrs_are_visible(struct kobject *kobj, > > if (n == 0) > return link->clkpm_capable ? a->mode : 0; > + else if (n == 7 || n == 8) > + return pdev->ltr_path ? a->mode : 0; > > return link->aspm_capable & aspm_state_map[n - 1] ? a->mode : 0; > } > -- > 2.32.0 >
On Thu, Oct 21, 2021 at 11:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Oct 21, 2021 at 11:51:59AM +0800, Kai-Heng Feng wrote: > > Sometimes BIOS may not be able to program ASPM and LTR settings, for > > instance, the NVMe devices behind Intel VMD bridges. For this case, both > > ASPM and LTR have to be enabled to have significant power saving. > > > > Since we still want POLICY_DEFAULT honor the default BIOS ASPM settings, > > introduce LTR sysfs knobs so users can set max snoop and max nosnoop > > latency manually or via udev rules. > > How is the user supposed to figure out what "max snoop" and "max > nosnoop" values to program? Actually, the only way I know is to get the value from other OS. > > If we add this, I'm afraid we'll have people programming random things > that seem to work but are not actually reliable. IMO users need to take full responsibility for own doings. Also, it's already doable by using setpci... If we don't want to add LTR sysfs, what other options do we have to enable VMD ASPM and LTR by default since BIOS doesn't do it for us? 1) Enable it in the PCI or VMD driver, however this approach violates POLICY_DEFAULT. 2) Use `setpci` directly in udev rules to enable VMD ASPM and LTR. I think 2) is bad, and since 1) isn't so good either, the approach in this patch may be the best compromise. Kai-Heng > > > [1] https://github.com/systemd/systemd/pull/17899/ > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209789 > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > --- > > v2: > > - New patch. > > > > drivers/pci/pcie/aspm.c | 59 +++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 59 insertions(+) > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > index 1560859ab056..f7dc62936445 100644 > > --- a/drivers/pci/pcie/aspm.c > > +++ b/drivers/pci/pcie/aspm.c > > @@ -1299,6 +1299,59 @@ static ssize_t clkpm_store(struct device *dev, > > return len; > > } > > > > +static ssize_t ltr_attr_show_common(struct device *dev, > > + struct device_attribute *attr, char *buf, u8 state) > > +{ > > + struct pci_dev *pdev = to_pci_dev(dev); > > + int ltr; > > + u16 val; > > + > > + ltr = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); > > + if (!ltr) > > + return -EINVAL; > > + > > + pci_read_config_word(pdev, ltr + state, &val); > > + > > + return sysfs_emit(buf, "0x%0x\n", val); > > +} > > + > > +static ssize_t ltr_attr_store_common(struct device *dev, > > + struct device_attribute *attr, > > + const char *buf, size_t len, u8 state) > > +{ > > + struct pci_dev *pdev = to_pci_dev(dev); > > + int ltr; > > + u16 val; > > + > > + ltr = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); > > + if (!ltr) > > + return -EINVAL; > > + > > + if (kstrtou16(buf, 16, &val) < 0) > > + return -EINVAL; > > + > > + /* LatencyScale is not permitted to be 110 or 111 */ > > + if ((val >> 10) > 5) > > + return -EINVAL; > > + > > + pci_write_config_word(pdev, ltr + state, val); > > + > > + return len; > > +} > > + > > +#define LTR_ATTR(_f, _s) \ > > +static ssize_t _f##_show(struct device *dev, \ > > + struct device_attribute *attr, char *buf) \ > > +{ return ltr_attr_show_common(dev, attr, buf, PCI_LTR_##_s); } \ > > + \ > > +static ssize_t _f##_store(struct device *dev, \ > > + struct device_attribute *attr, \ > > + const char *buf, size_t len) \ > > +{ return ltr_attr_store_common(dev, attr, buf, len, PCI_LTR_##_s); } > > + > > +LTR_ATTR(ltr_max_snoop_lat, MAX_SNOOP_LAT); > > +LTR_ATTR(ltr_max_nosnoop_lat, MAX_NOSNOOP_LAT); > > + > > static DEVICE_ATTR_RW(clkpm); > > static DEVICE_ATTR_RW(l0s_aspm); > > static DEVICE_ATTR_RW(l1_aspm); > > @@ -1306,6 +1359,8 @@ static DEVICE_ATTR_RW(l1_1_aspm); > > static DEVICE_ATTR_RW(l1_2_aspm); > > static DEVICE_ATTR_RW(l1_1_pcipm); > > static DEVICE_ATTR_RW(l1_2_pcipm); > > +static DEVICE_ATTR_RW(ltr_max_snoop_lat); > > +static DEVICE_ATTR_RW(ltr_max_nosnoop_lat); > > > > static struct attribute *aspm_ctrl_attrs[] = { > > &dev_attr_clkpm.attr, > > @@ -1315,6 +1370,8 @@ static struct attribute *aspm_ctrl_attrs[] = { > > &dev_attr_l1_2_aspm.attr, > > &dev_attr_l1_1_pcipm.attr, > > &dev_attr_l1_2_pcipm.attr, > > + &dev_attr_ltr_max_snoop_lat.attr, > > + &dev_attr_ltr_max_nosnoop_lat.attr, > > NULL > > }; > > > > @@ -1338,6 +1395,8 @@ static umode_t aspm_ctrl_attrs_are_visible(struct kobject *kobj, > > > > if (n == 0) > > return link->clkpm_capable ? a->mode : 0; > > + else if (n == 7 || n == 8) > > + return pdev->ltr_path ? a->mode : 0; > > > > return link->aspm_capable & aspm_state_map[n - 1] ? a->mode : 0; > > } > > -- > > 2.32.0 > >
On Mon, Oct 25, 2021 at 06:33:50PM +0800, Kai-Heng Feng wrote: > On Thu, Oct 21, 2021 at 11:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Oct 21, 2021 at 11:51:59AM +0800, Kai-Heng Feng wrote: > > > Sometimes BIOS may not be able to program ASPM and LTR settings, for > > > instance, the NVMe devices behind Intel VMD bridges. For this case, both > > > ASPM and LTR have to be enabled to have significant power saving. > > > > > > Since we still want POLICY_DEFAULT honor the default BIOS ASPM settings, > > > introduce LTR sysfs knobs so users can set max snoop and max nosnoop > > > latency manually or via udev rules. > > > > How is the user supposed to figure out what "max snoop" and "max > > nosnoop" values to program? > > Actually, the only way I know is to get the value from other OS. I don't see how this can be a workable solution. IIUC this is specifically related to ASPM L1.2. L1.2 depends on LTR (the max snoop/nosnoop values) and the TPOWER_ON and Common_Mode_Restore_Time values. PCIe r5.0, sec 5.5.4, says: Prior to setting either or both of the enable bits for L1.2, the values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM L1.2 Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value and Scale fields) must be programmed. The TPOWER_ON and Common_Mode_Restore_Time fields must be programmed to the appropriate values based on the components and AC coupling capacitors used in the connection linking the two components. The determination of these values is design implementation specific. I do not think collecting values from some other OS is a reasonable way to set these. The values would apparently depend on the electrical design of the particular machine. > > If we add this, I'm afraid we'll have people programming random things > > that seem to work but are not actually reliable. > > IMO users need to take full responsibility for own doings. > Also, it's already doable by using setpci... I don't think it currently does, but setpci should taint the kernel. If users want to write setpci scripts to fiddle with stuff, that's great, but that moves it outside the supportable realm. If we provide a sysfs interface to do this, then it becomes more of *our* problem to make it work correctly, and I don't think that's practical in this case. > If we don't want to add LTR sysfs, what other options do we have to > enable VMD ASPM and LTR by default since BIOS doesn't do it for us? > 1) Enable it in the PCI or VMD driver, however this approach violates > POLICY_DEFAULT. > 2) Use `setpci` directly in udev rules to enable VMD ASPM and LTR. > > I think 2) is bad, and since 1) isn't so good either, the approach in > this patch may be the best compromise. I do not know how to safely enable L1.2. It's likely that I'm just missing something, but I don't see enough information in PCI config space and the PCI Firmware interface (_DSM for Latency Tolerance Reporting) to enable L1.2. It's possible that a new firmware interface is required. I don't think it's wise to enable L1.2 unless we have good confidence that we know how to do it correctly. It's hard enough to debug ASPM issues as it is. Bjorn
On Tue, Oct 26, 2021 at 1:31 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Oct 25, 2021 at 06:33:50PM +0800, Kai-Heng Feng wrote: > > On Thu, Oct 21, 2021 at 11:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Thu, Oct 21, 2021 at 11:51:59AM +0800, Kai-Heng Feng wrote: > > > > Sometimes BIOS may not be able to program ASPM and LTR settings, for > > > > instance, the NVMe devices behind Intel VMD bridges. For this case, both > > > > ASPM and LTR have to be enabled to have significant power saving. > > > > > > > > Since we still want POLICY_DEFAULT honor the default BIOS ASPM settings, > > > > introduce LTR sysfs knobs so users can set max snoop and max nosnoop > > > > latency manually or via udev rules. > > > > > > How is the user supposed to figure out what "max snoop" and "max > > > nosnoop" values to program? > > > > Actually, the only way I know is to get the value from other OS. > > I don't see how this can be a workable solution. IIUC this is > specifically related to ASPM L1.2. L1.2 depends on LTR (the max > snoop/nosnoop values) and the TPOWER_ON and Common_Mode_Restore_Time > values. PCIe r5.0, sec 5.5.4, says: > > Prior to setting either or both of the enable bits for L1.2, the > values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM > L1.2 Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value and > Scale fields) must be programmed. > > The TPOWER_ON and Common_Mode_Restore_Time fields must be programmed > to the appropriate values based on the components and AC coupling > capacitors used in the connection linking the two components. The > determination of these values is design implementation specific. > > I do not think collecting values from some other OS is a reasonable > way to set these. The values would apparently depend on the > electrical design of the particular machine. What if we fallback to the original approach and use the VMD driver to enable ASPM and LTR values? At least I think Intel should be able to provide correct values for their SoC. > > > > If we add this, I'm afraid we'll have people programming random things > > > that seem to work but are not actually reliable. > > > > IMO users need to take full responsibility for own doings. > > Also, it's already doable by using setpci... > > I don't think it currently does, but setpci should taint the kernel. > > If users want to write setpci scripts to fiddle with stuff, that's > great, but that moves it outside the supportable realm. If we provide > a sysfs interface to do this, then it becomes more of *our* problem to > make it work correctly, and I don't think that's practical in this > case. OK. > > > If we don't want to add LTR sysfs, what other options do we have to > > enable VMD ASPM and LTR by default since BIOS doesn't do it for us? > > 1) Enable it in the PCI or VMD driver, however this approach violates > > POLICY_DEFAULT. > > 2) Use `setpci` directly in udev rules to enable VMD ASPM and LTR. > > > > I think 2) is bad, and since 1) isn't so good either, the approach in > > this patch may be the best compromise. > > I do not know how to safely enable L1.2. It's likely that I'm just > missing something, but I don't see enough information in PCI config > space and the PCI Firmware interface (_DSM for Latency Tolerance > Reporting) to enable L1.2. It's possible that a new firmware > interface is required. I was told by Intel that they didn't use _DSM to get LTR values, at least not the VMD case. > > I don't think it's wise to enable L1.2 unless we have good confidence > that we know how to do it correctly. It's hard enough to debug ASPM > issues as it is. So what other options do we have if we want to enable VMD ASPM while keeping CONFIG_PCIEASPM_DEFAULT=y? Right now we enabled the VMD ASPM/LTR bits in downstream kernel, but other distro users may want to have upstream support for this. Kai-Heng > > Bjorn
On Tue, Oct 26, 2021 at 10:28:38AM +0800, Kai-Heng Feng wrote: > What if we fallback to the original approach and use the VMD driver > to enable ASPM and LTR values? At least I think Intel should be > able to provide correct values for their SoC. Can you post the patches for that? I'm not sure exactly what the original approach was. Are these the same as the downstream support you mention below? > So what other options do we have if we want to enable VMD ASPM while > keeping CONFIG_PCIEASPM_DEFAULT=y? Right now we enabled the VMD > ASPM/LTR bits in downstream kernel, but other distro users may want > to have upstream support for this.
On Wed, Oct 27, 2021 at 12:53 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, Oct 26, 2021 at 10:28:38AM +0800, Kai-Heng Feng wrote: > > > What if we fallback to the original approach and use the VMD driver > > to enable ASPM and LTR values? At least I think Intel should be > > able to provide correct values for their SoC. > > Can you post the patches for that? I'm not sure exactly what the > original approach was. Are these the same as the downstream support > you mention below? This is the previous attempt to enable VMD ASPM: https://lore.kernel.org/linux-pci/20200930082455.25613-1-kai.heng.feng@canonical.com/ It didn't enable LTR though. Right now the downstream kernel use PCI quirk to enable VMD ASPM and LTR: https://kernel.ubuntu.com/git/ubuntu/unstable.git/commit/?id=069ab00c2613d27cb7cdeb2a4c751de89dab81b4 https://kernel.ubuntu.com/git/ubuntu/unstable.git/commit/?id=1e4dec5fe846f8dd8954b173434670f8ae30b5ff The patches don't touch VMD driver to minimize merge conflict on VMD driver. I really hope we can put these changes to upstream VMD driver. Kai-Heng > > > So what other options do we have if we want to enable VMD ASPM while > > keeping CONFIG_PCIEASPM_DEFAULT=y? Right now we enabled the VMD > > ASPM/LTR bits in downstream kernel, but other distro users may want > > to have upstream support for this.
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index 1560859ab056..f7dc62936445 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -1299,6 +1299,59 @@ static ssize_t clkpm_store(struct device *dev, return len; } +static ssize_t ltr_attr_show_common(struct device *dev, + struct device_attribute *attr, char *buf, u8 state) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ltr; + u16 val; + + ltr = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); + if (!ltr) + return -EINVAL; + + pci_read_config_word(pdev, ltr + state, &val); + + return sysfs_emit(buf, "0x%0x\n", val); +} + +static ssize_t ltr_attr_store_common(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len, u8 state) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ltr; + u16 val; + + ltr = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); + if (!ltr) + return -EINVAL; + + if (kstrtou16(buf, 16, &val) < 0) + return -EINVAL; + + /* LatencyScale is not permitted to be 110 or 111 */ + if ((val >> 10) > 5) + return -EINVAL; + + pci_write_config_word(pdev, ltr + state, val); + + return len; +} + +#define LTR_ATTR(_f, _s) \ +static ssize_t _f##_show(struct device *dev, \ + struct device_attribute *attr, char *buf) \ +{ return ltr_attr_show_common(dev, attr, buf, PCI_LTR_##_s); } \ + \ +static ssize_t _f##_store(struct device *dev, \ + struct device_attribute *attr, \ + const char *buf, size_t len) \ +{ return ltr_attr_store_common(dev, attr, buf, len, PCI_LTR_##_s); } + +LTR_ATTR(ltr_max_snoop_lat, MAX_SNOOP_LAT); +LTR_ATTR(ltr_max_nosnoop_lat, MAX_NOSNOOP_LAT); + static DEVICE_ATTR_RW(clkpm); static DEVICE_ATTR_RW(l0s_aspm); static DEVICE_ATTR_RW(l1_aspm); @@ -1306,6 +1359,8 @@ static DEVICE_ATTR_RW(l1_1_aspm); static DEVICE_ATTR_RW(l1_2_aspm); static DEVICE_ATTR_RW(l1_1_pcipm); static DEVICE_ATTR_RW(l1_2_pcipm); +static DEVICE_ATTR_RW(ltr_max_snoop_lat); +static DEVICE_ATTR_RW(ltr_max_nosnoop_lat); static struct attribute *aspm_ctrl_attrs[] = { &dev_attr_clkpm.attr, @@ -1315,6 +1370,8 @@ static struct attribute *aspm_ctrl_attrs[] = { &dev_attr_l1_2_aspm.attr, &dev_attr_l1_1_pcipm.attr, &dev_attr_l1_2_pcipm.attr, + &dev_attr_ltr_max_snoop_lat.attr, + &dev_attr_ltr_max_nosnoop_lat.attr, NULL }; @@ -1338,6 +1395,8 @@ static umode_t aspm_ctrl_attrs_are_visible(struct kobject *kobj, if (n == 0) return link->clkpm_capable ? a->mode : 0; + else if (n == 7 || n == 8) + return pdev->ltr_path ? a->mode : 0; return link->aspm_capable & aspm_state_map[n - 1] ? a->mode : 0; }
Sometimes BIOS may not be able to program ASPM and LTR settings, for instance, the NVMe devices behind Intel VMD bridges. For this case, both ASPM and LTR have to be enabled to have significant power saving. Since we still want POLICY_DEFAULT honor the default BIOS ASPM settings, introduce LTR sysfs knobs so users can set max snoop and max nosnoop latency manually or via udev rules. [1] https://github.com/systemd/systemd/pull/17899/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209789 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> --- v2: - New patch. drivers/pci/pcie/aspm.c | 59 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+)