Message ID | 1491627351-1111-5-git-send-email-okaya@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, Apr 7, 2017 at 9:55 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > Now that we added a hook to be called from device_add, save the > default values from the HW registers early in the boot for further > reuse during hot device add/remove operations. > > If the link is down during boot, assume that we want to enable L0s > and L1 following hotplug insertion as well as L1SS if supported. IIUC, so far POLICY_DEFAULT meant that we'd just use & follow what BIOS has done, and play it safe (never try to be more opportunistic). With this change however, we'd be slightly overstepping and giving ourselves benefit of doubt if the BIOS could not enable ASPM states because the link was not up. This may be good, but I think we should call it out, and add some more elaborate comment on the POLICY_DEFAULT description (what to, and what not to expect in different situations). It is important because existing systems today, that used to boot without cards and later hotplugged them, didn't have ASPM states enabled. They will now suddenly start seeing all ASPM states enabled including L1 substates for the first time (if supported). My system is not hotplug capable (I have the EP soldered on board, so couldn't do much testing, except for sanity. Please feel free to use my Reviewed-by. > > Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=194895 > Signed-off-by: Sinan Kaya <okaya@codeaurora.org> > --- > drivers/pci/pcie/aspm.c | 34 ++++++++++++++++++++++++++++++---- > 1 file changed, 30 insertions(+), 4 deletions(-) > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > index e33f84b..c7da087 100644 > --- a/drivers/pci/pcie/aspm.c > +++ b/drivers/pci/pcie/aspm.c > @@ -505,8 +505,10 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist) > */ > if (dwreg.support & upreg.support & PCIE_LINK_STATE_L0S) > link->aspm_support |= ASPM_STATE_L0S; > - if (dwreg.enabled & PCIE_LINK_STATE_L0S) > + if (dwreg.enabled & PCIE_LINK_STATE_L0S) { > link->aspm_enabled |= ASPM_STATE_L0S_UP; > + link->aspm_default |= ASPM_STATE_L0S_UP; > + } > if (upreg.enabled & PCIE_LINK_STATE_L0S) > link->aspm_enabled |= ASPM_STATE_L0S_DW; > link->latency_up.l0s = calc_l0s_latency(upreg.latency_encoding_l0s); > @@ -542,9 +544,6 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist) > if (link->aspm_support & ASPM_STATE_L1SS) > aspm_calc_l1ss_info(link, &upreg, &dwreg); > > - /* Save default state */ > - link->aspm_default = link->aspm_enabled; > - > /* Setup initial capable state. Will be updated later */ > link->aspm_capable = link->aspm_support; > /* > @@ -835,11 +834,38 @@ static int pci_aspm_init_downstream(struct pci_dev *pdev) > static int pci_aspm_init_upstream(struct pci_dev *pdev) > { > struct pcie_link_state *link; > + struct aspm_register_info upreg; > + u16 lnk_status; > + bool ret; > > link = alloc_pcie_link_state(pdev); > if (!link) > return -ENOMEM; > > + pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); > + ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); > + > + if (ret) { > + pcie_get_aspm_reg(pdev, &upreg); > + if (upreg.enabled & PCIE_LINK_STATE_L0S) > + link->aspm_default |= ASPM_STATE_L0S_DW; > + if (upreg.enabled & PCIE_LINK_STATE_L1) > + link->aspm_default |= ASPM_STATE_L1; > + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_ASPM_L1_1) > + link->aspm_default |= ASPM_STATE_L1_1; > + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_ASPM_L1_2) > + link->aspm_default |= ASPM_STATE_L1_2; > + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_PCIPM_L1_1) > + link->aspm_default |= ASPM_STATE_L1_1_PCIPM; > + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_PCIPM_L1_2) > + link->aspm_default |= ASPM_STATE_L1_2_PCIPM; > + } else { > + if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_L1SS)) > + link->aspm_default = ASPM_STATE_L0S | ASPM_STATE_L1; > + else > + link->aspm_default = ASPM_STATE_ALL; > + } Optional: May be consider moving this code (more aptly) to pcie_aspm_cap_init() by adding a check for link-up before we start reading downstream registers there? I guess you'll need to move the call to pcie_aspm_cap_init() a little further up in pcie_aspm_init_link_state(). > + > return 0; > } > > -- > 1.9.1 >
Bjorn, On 4/12/2017 3:19 PM, Rajat Jain wrote: > On Fri, Apr 7, 2017 at 9:55 PM, Sinan Kaya <okaya@codeaurora.org> wrote: >> Now that we added a hook to be called from device_add, save the >> default values from the HW registers early in the boot for further >> reuse during hot device add/remove operations. >> >> If the link is down during boot, assume that we want to enable L0s >> and L1 following hotplug insertion as well as L1SS if supported. > > IIUC, so far POLICY_DEFAULT meant that we'd just use & follow what > BIOS has done, and play it safe (never try to be more opportunistic). > With this change however, we'd be slightly overstepping and giving > ourselves benefit of doubt if the BIOS could not enable ASPM states > because the link was not up. This may be good, but I think we should > call it out, and add some more elaborate comment on the POLICY_DEFAULT > description (what to, and what not to expect in different situations). > > It is important because existing systems today, that used to boot > without cards and later hotplugged them, didn't have ASPM states > enabled. They will now suddenly start seeing all ASPM states enabled > including L1 substates for the first time (if supported). > Rajat has a good point here. Would you like me to update the ASPM document with this new behavior for hotplug? Do you have another behavior preference when it comes this? Sinan
[+cc Myron, lkml] On Fri, Apr 14, 2017 at 03:12:35PM -0400, Sinan Kaya wrote: > Bjorn, > > On 4/12/2017 3:19 PM, Rajat Jain wrote: > > On Fri, Apr 7, 2017 at 9:55 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > >> Now that we added a hook to be called from device_add, save the > >> default values from the HW registers early in the boot for further > >> reuse during hot device add/remove operations. > >> > >> If the link is down during boot, assume that we want to enable L0s > >> and L1 following hotplug insertion as well as L1SS if supported. > > > > IIUC, so far POLICY_DEFAULT meant that we'd just use & follow what > > BIOS has done, and play it safe (never try to be more opportunistic). > > With this change however, we'd be slightly overstepping and giving > > ourselves benefit of doubt if the BIOS could not enable ASPM states > > because the link was not up. This may be good, but I think we should > > call it out, and add some more elaborate comment on the POLICY_DEFAULT > > description (what to, and what not to expect in different situations). > > > > It is important because existing systems today, that used to boot > > without cards and later hotplugged them, didn't have ASPM states > > enabled. They will now suddenly start seeing all ASPM states enabled > > including L1 substates for the first time (if supported). > > > > Rajat has a good point here. Would you like me to update the ASPM document > with this new behavior for hotplug? > > Do you have another behavior preference when it comes this? That *is* a very good point. I think the change in behavior could be surprising. I wonder if we should revise our understanding of what POLICY_DEFAULT means. If we decided it means "the kernel never changes any ASPM config", it would be clear that we keep the BIOS configuration for everything present at boot, and we don't enable ASPM for any hot-added devices. I think the motivation for this series is to apply the BIOS's power management policy to hot-added devices. There's no direct way to know the BIOS's policy, so we're trying to infer it from the boot-time link configurations. Should we even *try* to apply the BIOS's policy? I don't know. If a platform really wanted to maintain control over ASPM and apply its policy consistently, I think it could do that by setting ACPI_FADT_NO_ASPM and using acpiphp instead of pciehp. Then the OS would keep its mitts off ASPM, and the BIOS would have a chance to configure ASPM for hot-added devices before giving them to the OS. Here are the possibilities I see for POLICY_DEFAULT: 1) Never touch ASPM config (what we have today) Boot-present devices: ASPM config retained from BIOS Hot-added devices: ASPM disabled (poweron state) 2) Linux maintains BIOS policy (conservative) Boot-present devices: ASPM config retained from BIOS Hot-added devices (slot occupied at boot): use boot-time ASPM config Hot-added devices (slot empty at boot): ASPM disabled 3) Linux maintains BIOS policy (aggressive, your current patch) Boot-present devices: ASPM config retained from BIOS Hot-added devices (slot occupied at boot): use boot-time ASPM config Hot-added devices (slot empty at boot): ASPM enabled I'm becoming less convinced that options 2 or 3 make sense. For one thing, they're both hard to describe concisely because there are too many special cases, and that's always a red flag for me. Even for a given BIOS power management policy, the ASPM configuration may depend on the particular device; for example, a balanced policy might enable ASPM for USB devices but not for NICs. So I'm not sure it really makes sense to remember what BIOS did for the card that was in the slot at boot-time and apply that to a possibly different card hot-added later. I think there's an argument to be made that if we care about ASPM configuration, we should be using a non-POLICY_DEFAULT setting. And I think there's value in having POLICY_DEFAULT be the absolute lowest- risk setting, which I think means option 1. What do you think? Bjorn
On 4/14/2017 5:44 PM, Bjorn Helgaas wrote: > I think there's an argument to be made that if we care about ASPM > configuration, we should be using a non-POLICY_DEFAULT setting. And I > think there's value in having POLICY_DEFAULT be the absolute lowest- > risk setting, which I think means option 1. > > What do you think? I see your point. The counter argument is that most of the users do not know what an ASPM kernel command line is unless they understand PCI language. I have been using the powersave policy option until now. I recently realized that nobody except me is using this option. Therefore, we are wasting power by default following a hotplug insertion. This is the case where I'm trying to avoid. With the introduction of NVMe u.2 drives, hotplug is becoming more and more mainstream. I decided to take the matters into my hand with this series for this very reason. Like you said, BIOS is out of the picture with pciehp. There is nobody to configure ASPM following a hotplug insertion. I can also claim that If user wants performance, they should boot with the performance policy or pcie_aspm=off parameters. I saw this recommendation in multiple DPDK tuning documents. Like you said, what do we do by default is the question. Should we opt for safe like we are doing, or try to save some power. Maybe, we are missing a HPP option from the PCI spec.
On Fri, Apr 14, 2017 at 5:17 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > On 4/14/2017 5:44 PM, Bjorn Helgaas wrote: >> I think there's an argument to be made that if we care about ASPM >> configuration, we should be using a non-POLICY_DEFAULT setting. And I >> think there's value in having POLICY_DEFAULT be the absolute lowest- >> risk setting, which I think means option 1. >> >> What do you think? > > I see your point. The counter argument is that most of the users do not > know what an ASPM kernel command line is unless they understand PCI > language. I don't think the answer is using the "pcie_aspm.policy=" boot argument. I certainly don't want users to have to deal with that. I wish we didn't even have that parameter. I think we need runtime knobs instead (and I guess we already have /sys/module/pcie_aspm/parameters/policy and /sys/.../link_state), and distro userspace should use them. I'm envisioning something in "System Settings / Power" or similar. Basically I think the policy doesn't *have* to be dictated by a kernel boot-time parameter, so it should not be. > I have been using the powersave policy option until now. I recently realized > that nobody except me is using this option. Therefore, we are wasting > power by default following a hotplug insertion. > > This is the case where I'm trying to avoid. With the introduction of NVMe > u.2 drives, hotplug is becoming more and more mainstream. I decided to > take the matters into my hand with this series for this very reason. > > Like you said, BIOS is out of the picture with pciehp. There is nobody > to configure ASPM following a hotplug insertion. > > I can also claim that If user wants performance, they should boot with > the performance policy or pcie_aspm=off parameters. > > I saw this recommendation in multiple DPDK tuning documents. > > Like you said, what do we do by default is the question. Should we opt > for safe like we are doing, or try to save some power. I think safety is paramount. Every user should be able to boot safely without any kernel parameters. We don't want users to have a problem booting and then have to search for a workaround like booting with "pcie_aspm=off". Most users will never do that. Here's a long-term strawman proposal, see what you think: - Deprecate CONFIG_PCIEASPM_DEFAULT, CONFIG_PCIEASPM_POWERSAVE, etc. - Default aspm_policy is POLICY_DEFAULT always. - POLICY_DEFAULT means Linux doesn't touch anything: if BIOS enabled ASPM, we leave it that way; we leave ASPM disabled on hot-added devices. - Deprecate kernel boot parameters (possibly keep pcie_aspm=off for debugging use). - Use /sys/module/pcie_aspm/parameters/policy for run-time system-wide control, including for future hot-added devices. - Remove CONFIG_PCIEASPM_DEBUG and enable that code always, so we have fine-grained run-time control. > Maybe, we are missing a HPP option from the PCI spec. That's an interesting idea. _HPX does have provision for manipulating Link Control bits (see ACPI r5.0, sec 6.2.8.3), but I don't think very many systems implement it. And there's currently no connection between program_hpp_type2() and aspm.c, so I'm a little worried that we might have issues if a system did implement an _HPX that sets any of the ASPM bits. Bjorn
On 4/17/2017 12:38 PM, Bjorn Helgaas wrote: >> Like you said, what do we do by default is the question. Should we opt >> for safe like we are doing, or try to save some power. > I think safety is paramount. Every user should be able to boot safely > without any kernel parameters. We don't want users to have a problem > booting and then have to search for a workaround like booting with > "pcie_aspm=off". Most users will never do that. > OK, no problem with leaving the behavior as it is. My initial approach was #2. We knew this way that user had full control over the ASPM policy by changing the BIOS option. Then, Mayurkumar complained that ASPM is not enabled following a hotplug insertion to an empty slot. That's when I switched to #3 as it sounded like a good thing to have for us. > Here's a long-term strawman proposal, see what you think: > > - Deprecate CONFIG_PCIEASPM_DEFAULT, CONFIG_PCIEASPM_POWERSAVE, etc. > - Default aspm_policy is POLICY_DEFAULT always. > - POLICY_DEFAULT means Linux doesn't touch anything: if BIOS enabled > ASPM, we leave it that way; we leave ASPM disabled on hot-added > devices. I can easily see people complaining the other way around. There could be some boot FW that doesn't know what ASPM is and this particular system could rely on the compile time option for enabling power options. Maybe, a command line option will be required for them to keep the existing behavior. > - Deprecate kernel boot parameters (possibly keep pcie_aspm=off for > debugging use). > - Use /sys/module/pcie_aspm/parameters/policy for run-time > system-wide control, including for future hot-added devices. > - Remove CONFIG_PCIEASPM_DEBUG and enable that code always, so we > have fine-grained run-time control. > Runtime control sounds like a better plan. We need hooks into the system power management policy. >> Maybe, we are missing a HPP option from the PCI spec. > That's an interesting idea. _HPX does have provision for manipulating > Link Control bits (see ACPI r5.0, sec 6.2.8.3), but I don't think very > many systems implement it. And there's currently no connection > between program_hpp_type2() and aspm.c, so I'm a little worried that > we might have issues if a system did implement an _HPX that sets any > of the ASPM bits. I looked at the spec some more. These are there to restore the register settings following hotplug insertion. I agree it won't play nice with ASPM as the control bits need to be enabled in coordination with the upstream device. I think the right approach is to let the userspace feed the required policy to the kernel like you suggested. Then, it needs to be per port via link_state to have the most flexibility.
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index e33f84b..c7da087 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -505,8 +505,10 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist) */ if (dwreg.support & upreg.support & PCIE_LINK_STATE_L0S) link->aspm_support |= ASPM_STATE_L0S; - if (dwreg.enabled & PCIE_LINK_STATE_L0S) + if (dwreg.enabled & PCIE_LINK_STATE_L0S) { link->aspm_enabled |= ASPM_STATE_L0S_UP; + link->aspm_default |= ASPM_STATE_L0S_UP; + } if (upreg.enabled & PCIE_LINK_STATE_L0S) link->aspm_enabled |= ASPM_STATE_L0S_DW; link->latency_up.l0s = calc_l0s_latency(upreg.latency_encoding_l0s); @@ -542,9 +544,6 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist) if (link->aspm_support & ASPM_STATE_L1SS) aspm_calc_l1ss_info(link, &upreg, &dwreg); - /* Save default state */ - link->aspm_default = link->aspm_enabled; - /* Setup initial capable state. Will be updated later */ link->aspm_capable = link->aspm_support; /* @@ -835,11 +834,38 @@ static int pci_aspm_init_downstream(struct pci_dev *pdev) static int pci_aspm_init_upstream(struct pci_dev *pdev) { struct pcie_link_state *link; + struct aspm_register_info upreg; + u16 lnk_status; + bool ret; link = alloc_pcie_link_state(pdev); if (!link) return -ENOMEM; + pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); + ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); + + if (ret) { + pcie_get_aspm_reg(pdev, &upreg); + if (upreg.enabled & PCIE_LINK_STATE_L0S) + link->aspm_default |= ASPM_STATE_L0S_DW; + if (upreg.enabled & PCIE_LINK_STATE_L1) + link->aspm_default |= ASPM_STATE_L1; + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_ASPM_L1_1) + link->aspm_default |= ASPM_STATE_L1_1; + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_ASPM_L1_2) + link->aspm_default |= ASPM_STATE_L1_2; + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_PCIPM_L1_1) + link->aspm_default |= ASPM_STATE_L1_1_PCIPM; + if (upreg.l1ss_ctl1 & PCI_L1SS_CTL1_PCIPM_L1_2) + link->aspm_default |= ASPM_STATE_L1_2_PCIPM; + } else { + if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_L1SS)) + link->aspm_default = ASPM_STATE_L0S | ASPM_STATE_L1; + else + link->aspm_default = ASPM_STATE_ALL; + } + return 0; }
Now that we added a hook to be called from device_add, save the default values from the HW registers early in the boot for further reuse during hot device add/remove operations. If the link is down during boot, assume that we want to enable L0s and L1 following hotplug insertion as well as L1SS if supported. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=194895 Signed-off-by: Sinan Kaya <okaya@codeaurora.org> --- drivers/pci/pcie/aspm.c | 34 ++++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-)