Message ID | 20230517105235.29176-10-ilpo.jarvinen@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Series | PCI: Improve PCIe Capability RMW concurrency control |
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:

> Don't assume that only the driver would be accessing LNKCTL. ASPM
> policy changes can trigger write to LNKCTL outside of driver's control.
>
> Use RMW capability accessors which does proper locking to avoid losing
> concurrent updates to the register value. On restore, clear the ASPMC
> field properly.
>
> Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Cc: stable@vger.kernel.org

Acked-by: Kalle Valo <kvalo@kernel.org>
On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> Don't assume that only the driver would be accessing LNKCTL. ASPM
> policy changes can trigger write to LNKCTL outside of driver's control.
>
> Use RMW capability accessors which does proper locking to avoid losing
> concurrent updates to the register value. On restore, clear the ASPMC
> field properly.
>
> Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/net/wireless/ath/ath10k/pci.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
> index a7f44f6335fb..9275a672f90c 100644
> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
>          ath10k_pci_irq_enable(ar);
>          ath10k_pci_rx_post(ar);
>
> -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> -                                   ar_pci->link_ctl);
> +        pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> +                                           PCI_EXP_LNKCTL_ASPMC,
> +                                           ar_pci->link_ctl & PCI_EXP_LNKCTL_ASPMC);
>
>          return 0;
>  }
> @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
>
>          pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
>                                    &ar_pci->link_ctl);
> -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> -                                   ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
> +        pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> +                                   PCI_EXP_LNKCTL_ASPMC);

These ath drivers all have the form:

  1) read LNKCTL
  2) save LNKCTL value in ->link_ctl
  3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
     to disable ASPM
  4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM

These patches close the hole between 1) and 3) where other LNKCTL
updates could interfere, which is definitely a good thing.

But the hole between 1) and 4) is much bigger and still there.  Any
update by the PCI core in that interval would be lost.

Straw-man proposal:

  - Change pci_disable_link_state() so it ignores aspm_disabled and
    always disables ASPM even if platform firmware hasn't granted
    ownership.  Maybe this should warn and taint the kernel.

  - Change drivers to use pci_disable_link_state() instead of writing
    LNKCTL directly.

Bjorn
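The window between steps 1) and 3) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_dev`, `toy_set_ccc()`, the two `toy_disable_aspm_*()` helpers and the `interloper_fn` callback are hypothetical stand-ins rather than kernel APIs, and the locking done by the real RMW accessors is not modelled; the callback simply plays the role of another LNKCTL writer that gets in after the driver's read. The bit values follow PCI_EXP_LNKCTL_ASPMC (0x0003) and PCI_EXP_LNKCTL_CCC (0x0040).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LNKCTL_ASPMC 0x0003 /* ASPM Control field (bits 1:0) */
#define TOY_LNKCTL_CCC   0x0040 /* Common Clock Configuration, used as "some other bit" */

/* Hypothetical stand-in for a device's Link Control register. */
struct toy_dev {
	uint16_t lnkctl;
};

/* Models another LNKCTL writer, e.g. a change made outside the driver. */
typedef void (*interloper_fn)(struct toy_dev *dev);

void toy_set_ccc(struct toy_dev *dev)
{
	dev->lnkctl |= TOY_LNKCTL_CCC;
}

/* Old pattern: separate read and full-word write; a writer can slip in between. */
uint16_t toy_disable_aspm_open_coded(struct toy_dev *dev, interloper_fn other)
{
	uint16_t saved = dev->lnkctl;                       /* 1) read, 2) save */

	if (other != NULL)
		other(dev);                                 /* concurrent update lands here */
	dev->lnkctl = (uint16_t)(saved & ~TOY_LNKCTL_ASPMC); /* 3) writes back a stale word */
	return saved;
}

/* Patched pattern: step 3) clears only ASPMC, starting from the live value. */
uint16_t toy_disable_aspm_rmw(struct toy_dev *dev, interloper_fn other)
{
	uint16_t saved = dev->lnkctl;                       /* 1) read, 2) save */

	if (other != NULL)
		other(dev);                                 /* update lands before step 3) */
	dev->lnkctl = (uint16_t)(dev->lnkctl & ~TOY_LNKCTL_ASPMC);
	return saved;
}

/* Usage: the same interleaving loses CCC in the old pattern but not the new one. */
void toy_rmw_demo(void)
{
	struct toy_dev a = { 0x0003 };
	struct toy_dev b = { 0x0003 };

	toy_disable_aspm_open_coded(&a, toy_set_ccc);
	assert(a.lnkctl == 0x0000); /* CCC update was overwritten */

	toy_disable_aspm_rmw(&b, toy_set_ccc);
	assert(b.lnkctl == 0x0040); /* CCC survives, ASPMC cleared */
}
```

In the kernel the second pattern additionally relies on the accessor holding a lock across its read and write, which this single-threaded model does not attempt to reproduce.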
On Wed, 24 May 2023, Bjorn Helgaas wrote:
> On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> > Don't assume that only the driver would be accessing LNKCTL. ASPM
> > policy changes can trigger write to LNKCTL outside of driver's control.
> >
> > Use RMW capability accessors which does proper locking to avoid losing
> > concurrent updates to the register value. On restore, clear the ASPMC
> > field properly.
> >
> > Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/net/wireless/ath/ath10k/pci.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
> > index a7f44f6335fb..9275a672f90c 100644
> > --- a/drivers/net/wireless/ath/ath10k/pci.c
> > +++ b/drivers/net/wireless/ath/ath10k/pci.c
> > @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
> >          ath10k_pci_irq_enable(ar);
> >          ath10k_pci_rx_post(ar);
> >
> > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > -                                   ar_pci->link_ctl);
> > +        pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > +                                           PCI_EXP_LNKCTL_ASPMC,
> > +                                           ar_pci->link_ctl & PCI_EXP_LNKCTL_ASPMC);
> >
> >          return 0;
> >  }
> > @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
> >
> >          pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> >                                    &ar_pci->link_ctl);
> > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > -                                   ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
> > +        pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > +                                   PCI_EXP_LNKCTL_ASPMC);
>
> These ath drivers all have the form:
>
>   1) read LNKCTL
>   2) save LNKCTL value in ->link_ctl
>   3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
>      to disable ASPM
>   4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM
>
> These patches close the hole between 1) and 3) where other LNKCTL
> updates could interfere, which is definitely a good thing.
>
> But the hole between 1) and 4) is much bigger and still there.  Any
> update by the PCI core in that interval would be lost.

Any update to the PCI_EXP_LNKCTL_ASPMC field in that interval is lost, yes,
but the updates to _the other fields_ in LNKCTL are not lost.

I know this might result in drivers/pci/pcie/aspm.c disagreeing about what
the state of ASPM is (as shown under sysfs) compared with the LNKCTL
value, but the cause can no longer be a racing RMW. Essentially, 4) is
seen as an override of what the core did if it changed ASPMC in between.
Technically, something is still "lost" like you say, but for a different
reason than the one this series is trying to fix.

> Straw-man proposal:
>
>   - Change pci_disable_link_state() so it ignores aspm_disabled and
>     always disables ASPM even if platform firmware hasn't granted
>     ownership.  Maybe this should warn and taint the kernel.
>
>   - Change drivers to use pci_disable_link_state() instead of writing
>     LNKCTL directly.

I fully agree that's the direction we should be moving, yes. However, I'm
a bit hesitant to take that leap in one step. These drivers currently not
only disable ASPM but also re-enable it (assuming we guessed the intent
right).

If I directly implement that proposal, ASPM is not going to be re-enabled
when the PCI core does not allow it. Could it cause some power-related
regression?

My plan is to make another patch series after these to realize exactly
what you're proposing. It would make it easier to isolate the problems
that are related to the lack of ASPM.

I hope this two-step approach is an acceptable way forward? I can of
course add those patches on top of these if that would be preferable.
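The difference between the old and the patched step 4) can be sketched on plain register images. The helpers below (`toy_clear_and_set()`, `toy_restore_whole_word()`, `toy_restore_aspmc_only()`) are hypothetical and ignore locking entirely; they only show which bits of a concurrently modified LNKCTL survive the restore. The constants follow PCI_EXP_LNKCTL_ASPMC (0x0003), PCI_EXP_LNKCTL_ASPM_L1 (0x0002) and PCI_EXP_LNKCTL_CCC (0x0040).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LNKCTL_ASPMC   0x0003 /* ASPM Control field (bits 1:0) */
#define TOY_LNKCTL_ASPM_L1 0x0002 /* L1 entry enabled */
#define TOY_LNKCTL_CCC     0x0040 /* Common Clock Configuration */

/* Toy clear-and-set on a 16-bit register image (no locking modelled). */
uint16_t toy_clear_and_set(uint16_t reg, uint16_t clear, uint16_t set)
{
	return (uint16_t)((reg & ~clear) | set);
}

/* Old step 4): write the whole saved word back. */
uint16_t toy_restore_whole_word(uint16_t live, uint16_t saved)
{
	(void)live; /* everything that changed since step 1) is discarded */
	return saved;
}

/* Patched step 4): restore only the ASPMC field from the saved word. */
uint16_t toy_restore_aspmc_only(uint16_t live, uint16_t saved)
{
	return toy_clear_and_set(live, TOY_LNKCTL_ASPMC,
				 (uint16_t)(saved & TOY_LNKCTL_ASPMC));
}

/*
 * Usage: saved word had L0s+L1 enabled; in the interval someone else set
 * CCC and switched ASPMC to L1 only.
 */
void toy_restore_demo(void)
{
	uint16_t saved = 0x0003;
	uint16_t live = TOY_LNKCTL_CCC | TOY_LNKCTL_ASPM_L1; /* 0x0042 */

	assert(toy_restore_whole_word(live, saved) == 0x0003); /* CCC lost */
	assert(toy_restore_aspmc_only(live, saved) == 0x0043); /* CCC kept, ASPMC overridden */
}
```

The second assertion is the behaviour described above: the driver's saved ASPMC value overrides whatever was written to that field in between, while the other fields keep their live values.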
On Thu, 25 May 2023, Ilpo Järvinen wrote:
> On Wed, 24 May 2023, Bjorn Helgaas wrote:
>
> > On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> > > Don't assume that only the driver would be accessing LNKCTL. ASPM
> > > policy changes can trigger write to LNKCTL outside of driver's control.
> > >
> > > Use RMW capability accessors which does proper locking to avoid losing
> > > concurrent updates to the register value. On restore, clear the ASPMC
> > > field properly.
> > >
> > > Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> > > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/net/wireless/ath/ath10k/pci.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
> > > index a7f44f6335fb..9275a672f90c 100644
> > > --- a/drivers/net/wireless/ath/ath10k/pci.c
> > > +++ b/drivers/net/wireless/ath/ath10k/pci.c
> > > @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
> > >          ath10k_pci_irq_enable(ar);
> > >          ath10k_pci_rx_post(ar);
> > >
> > > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > -                                   ar_pci->link_ctl);
> > > +        pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > +                                           PCI_EXP_LNKCTL_ASPMC,
> > > +                                           ar_pci->link_ctl & PCI_EXP_LNKCTL_ASPMC);
> > >
> > >          return 0;
> > >  }
> > > @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
> > >
> > >          pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > >                                    &ar_pci->link_ctl);
> > > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > -                                   ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
> > > +        pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > +                                   PCI_EXP_LNKCTL_ASPMC);
> >
> > These ath drivers all have the form:
> >
> >   1) read LNKCTL
> >   2) save LNKCTL value in ->link_ctl
> >   3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
> >      to disable ASPM
> >   4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM
> >
> > These patches close the hole between 1) and 3) where other LNKCTL
> > updates could interfere, which is definitely a good thing.
> >
> > But the hole between 1) and 4) is much bigger and still there.  Any
> > update by the PCI core in that interval would be lost.
>
> Any update to the PCI_EXP_LNKCTL_ASPMC field in that interval is lost, yes,
> but the updates to _the other fields_ in LNKCTL are not lost.
>
> I know this might result in drivers/pci/pcie/aspm.c disagreeing about what
> the state of ASPM is (as shown under sysfs) compared with the LNKCTL
> value, but the cause can no longer be a racing RMW. Essentially, 4) is
> seen as an override of what the core did if it changed ASPMC in between.
> Technically, something is still "lost" like you say, but for a different
> reason than the one this series is trying to fix.
>
> > Straw-man proposal:
> >
> >   - Change pci_disable_link_state() so it ignores aspm_disabled and
> >     always disables ASPM even if platform firmware hasn't granted
> >     ownership.  Maybe this should warn and taint the kernel.
> >
> >   - Change drivers to use pci_disable_link_state() instead of writing
> >     LNKCTL directly.

Now that I took a deeper look into what pci_disable_link_state() and
pci_enable_link_state() do, I realized they're not really a disable/enable
pair like I had assumed from their names. Disable adds to ->aspm_disable
and flags are never removed from that because enable does not touch
aspm_disable at all but has its own flag variable. This asymmetry looks
intentional.

So if ath drivers would do pci_disable_link_state() to realize 1)-3),
there is no way to undo it in 4). It looks as if ath drivers would
actually want to use pci_enable_link_state() with different state
parameters to realize what they want to do in 1)-4).

Any suggestion which way I should go with these ath drivers here, use
pci_enable_link_state()?

(There are other drivers where pci_disable_link_state() is very much a
valid thing to do.)
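The asymmetry described in this message can be modelled in a few lines. This is a toy sketch, not the code in drivers/pci/pcie/aspm.c: `toy_link`, the `aspm_enable` field name, the `TOY_ASPM_*` values and the `toy_effective_state()` formula are all assumptions made for illustration. The only properties taken from the description are that disable accumulates bits in `aspm_disable` and that enable writes a separate variable without clearing them.

```c
#include <assert.h>

#define TOY_ASPM_L0S 0x1u /* illustrative state bits, not kernel constants */
#define TOY_ASPM_L1  0x2u

/* Hypothetical per-link bookkeeping, loosely following the description. */
struct toy_link {
	unsigned int aspm_disable; /* states a driver asked to keep off */
	unsigned int aspm_enable;  /* separate variable for the enable path (name assumed) */
};

/* Disable only ever adds bits to aspm_disable. */
void toy_disable_link_state(struct toy_link *link, unsigned int state)
{
	link->aspm_disable |= state;
}

/* Enable records its own flags and never touches aspm_disable. */
void toy_enable_link_state(struct toy_link *link, unsigned int state)
{
	link->aspm_enable = state;
}

/* Assumed rule: a state is usable only if requested and not disabled. */
unsigned int toy_effective_state(const struct toy_link *link)
{
	return link->aspm_enable & ~link->aspm_disable;
}

/* Usage: once L1 is disabled, a later enable cannot bring it back. */
void toy_asymmetry_demo(void)
{
	struct toy_link link = { 0u, TOY_ASPM_L0S | TOY_ASPM_L1 };

	toy_disable_link_state(&link, TOY_ASPM_L1);               /* steps 1)-3) */
	assert(toy_effective_state(&link) == TOY_ASPM_L0S);

	toy_enable_link_state(&link, TOY_ASPM_L0S | TOY_ASPM_L1); /* attempted step 4) */
	assert(toy_effective_state(&link) == TOY_ASPM_L0S);       /* L1 stays off */
}
```

Under these assumptions there is no call sequence that undoes the disable, which is the problem for the 1)-4) pattern in the ath drivers.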
On Thu, May 25, 2023 at 01:11:51PM +0300, Ilpo Järvinen wrote:
> On Wed, 24 May 2023, Bjorn Helgaas wrote:
> > On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> > > Don't assume that only the driver would be accessing LNKCTL. ASPM
> > > policy changes can trigger write to LNKCTL outside of driver's control.
> > >
> > > Use RMW capability accessors which does proper locking to avoid losing
> > > concurrent updates to the register value. On restore, clear the ASPMC
> > > field properly.
> > >
> > > Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> > > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/net/wireless/ath/ath10k/pci.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
> > > index a7f44f6335fb..9275a672f90c 100644
> > > --- a/drivers/net/wireless/ath/ath10k/pci.c
> > > +++ b/drivers/net/wireless/ath/ath10k/pci.c
> > > @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
> > >          ath10k_pci_irq_enable(ar);
> > >          ath10k_pci_rx_post(ar);
> > >
> > > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > -                                   ar_pci->link_ctl);
> > > +        pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > +                                           PCI_EXP_LNKCTL_ASPMC,
> > > +                                           ar_pci->link_ctl & PCI_EXP_LNKCTL_ASPMC);
> > >
> > >          return 0;
> > >  }
> > > @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
> > >
> > >          pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > >                                    &ar_pci->link_ctl);
> > > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > -                                   ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
> > > +        pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > +                                   PCI_EXP_LNKCTL_ASPMC);
> >
> > These ath drivers all have the form:
> >
> >   1) read LNKCTL
> >   2) save LNKCTL value in ->link_ctl
> >   3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
> >      to disable ASPM
> >   4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM
> >
> > These patches close the hole between 1) and 3) where other LNKCTL
> > updates could interfere, which is definitely a good thing.
> >
> > But the hole between 1) and 4) is much bigger and still there.  Any
> > update by the PCI core in that interval would be lost.
>
> Any update to the PCI_EXP_LNKCTL_ASPMC field in that interval is lost, yes,
> but the updates to _the other fields_ in LNKCTL are not lost.

Ah, yes, you're right, I missed the masking to PCI_EXP_LNKCTL_ASPMC in
the pcie_capability_clear_word().

> > Straw-man proposal:
> >
> >   - Change pci_disable_link_state() so it ignores aspm_disabled and
> >     always disables ASPM even if platform firmware hasn't granted
> >     ownership.  Maybe this should warn and taint the kernel.
> >
> >   - Change drivers to use pci_disable_link_state() instead of writing
> >     LNKCTL directly.
>
> I fully agree that's the direction we should be moving, yes. However, I'm
> a bit hesitant to take that leap in one step. These drivers currently not
> only disable ASPM but also re-enable it (assuming we guessed the intent
> right).
>
> If I directly implement that proposal, ASPM is not going to be re-enabled
> when the PCI core does not allow it. Could it cause some power-related
> regression?

IIUC the potential problem only happens with:

  - A platform that enables ASPM but doesn't grant PCIe Capability
    ownership to the OS, and

  - A device where we force-disable ASPM, presumably to avoid some
    hardware defect.

I'm not sure this case is worth worrying about.  A platform that
enables ASPM without allowing the OS to disable it is taking a risk
because it can't know about these device defects or even about user
preferences.  A device that has an ASPM-related defect may use more
power than necessary.  I think that's to be expected.

> My plan is to make another patch series after these to realize exactly
> what you're proposing. It would make it easier to isolate the problems
> that are related to the lack of ASPM.
>
> I hope this two-step approach is an acceptable way forward? I can of
> course add those patches on top of these if that would be preferable.

I think two steps is OK.  It's a little more work for the driver
maintainers to review them, but this step is pretty trivial and
already reviewed (except for the GPUs, which are probably the most
important :)).

Bjorn
On Fri, May 26, 2023 at 02:48:44PM +0300, Ilpo Järvinen wrote:
> On Thu, 25 May 2023, Ilpo Järvinen wrote:
> > On Wed, 24 May 2023, Bjorn Helgaas wrote:
> > > On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> > > > Don't assume that only the driver would be accessing LNKCTL. ASPM
> > > > policy changes can trigger write to LNKCTL outside of driver's control.
> > > >
> > > > Use RMW capability accessors which does proper locking to avoid losing
> > > > concurrent updates to the register value. On restore, clear the ASPMC
> > > > field properly.
> > > >
> > > > Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> > > > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > > > Cc: stable@vger.kernel.org
> > > > ---
> > > >  drivers/net/wireless/ath/ath10k/pci.c | 9 +++++----
> > > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
> > > > index a7f44f6335fb..9275a672f90c 100644
> > > > --- a/drivers/net/wireless/ath/ath10k/pci.c
> > > > +++ b/drivers/net/wireless/ath/ath10k/pci.c
> > > > @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
> > > >          ath10k_pci_irq_enable(ar);
> > > >          ath10k_pci_rx_post(ar);
> > > >
> > > > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > -                                   ar_pci->link_ctl);
> > > > +        pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > +                                           PCI_EXP_LNKCTL_ASPMC,
> > > > +                                           ar_pci->link_ctl & PCI_EXP_LNKCTL_ASPMC);
> > > >
> > > >          return 0;
> > > >  }
> > > > @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
> > > >
> > > >          pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > >                                    &ar_pci->link_ctl);
> > > > -        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > -                                   ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
> > > > +        pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > +                                   PCI_EXP_LNKCTL_ASPMC);
> > >
> > > These ath drivers all have the form:
> > >
> > >   1) read LNKCTL
> > >   2) save LNKCTL value in ->link_ctl
> > >   3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
> > >      to disable ASPM
> > >   4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM
> > >
> > > These patches close the hole between 1) and 3) where other LNKCTL
> > > updates could interfere, which is definitely a good thing.
> > >
> > > But the hole between 1) and 4) is much bigger and still there.  Any
> > > update by the PCI core in that interval would be lost.
> >
> > Any update to the PCI_EXP_LNKCTL_ASPMC field in that interval is lost, yes,
> > but the updates to _the other fields_ in LNKCTL are not lost.
> >
> > I know this might result in drivers/pci/pcie/aspm.c disagreeing about what
> > the state of ASPM is (as shown under sysfs) compared with the LNKCTL
> > value, but the cause can no longer be a racing RMW. Essentially, 4) is
> > seen as an override of what the core did if it changed ASPMC in between.
> > Technically, something is still "lost" like you say, but for a different
> > reason than the one this series is trying to fix.
> >
> > > Straw-man proposal:
> > >
> > >   - Change pci_disable_link_state() so it ignores aspm_disabled and
> > >     always disables ASPM even if platform firmware hasn't granted
> > >     ownership.  Maybe this should warn and taint the kernel.
> > >
> > >   - Change drivers to use pci_disable_link_state() instead of writing
> > >     LNKCTL directly.
>
> Now that I took a deeper look into what pci_disable_link_state() and
> pci_enable_link_state() do, I realized they're not really a disable/enable
> pair like I had assumed from their names. Disable adds to ->aspm_disable
> and flags are never removed from that because enable does not touch
> aspm_disable at all but has its own flag variable. This asymmetry looks
> intentional.

Yes, that's an annoying feature.  There's only one caller of
pci_enable_link_state(), so it may be possible to make this more
symmetric.

> So if ath drivers would do pci_disable_link_state() to realize 1)-3),
> there is no way to undo it in 4). It looks as if ath drivers would
> actually want to use pci_enable_link_state() with different state
> parameters to realize what they want to do in 1)-4).

Yeah, that does sound like a problem.  I don't have any great ideas.

Bjorn
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index a7f44f6335fb..9275a672f90c 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
         ath10k_pci_irq_enable(ar);
         ath10k_pci_rx_post(ar);
 
-        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
-                                   ar_pci->link_ctl);
+        pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
+                                           PCI_EXP_LNKCTL_ASPMC,
+                                           ar_pci->link_ctl & PCI_EXP_LNKCTL_ASPMC);
 
         return 0;
 }
@@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
 
         pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
                                   &ar_pci->link_ctl);
-        pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
-                                   ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
+        pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
+                                   PCI_EXP_LNKCTL_ASPMC);
 
         /*
          * Bring the target up cleanly.
Don't assume that only the driver would be accessing LNKCTL. ASPM
policy changes can trigger write to LNKCTL outside of driver's control.

Use RMW capability accessors which does proper locking to avoid losing
concurrent updates to the register value. On restore, clear the ASPMC
field properly.

Fixes: 76d870ed09ab ("ath10k: enable ASPM")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/net/wireless/ath/ath10k/pci.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)