mbox series

[v4,0/5] arm64: dts: qcom: sc8280xp: PCIe fixes and GICv3 ITS enable

Message ID 20240306095651.4551-1-johan+linaro@kernel.org (mailing list archive)
Headers show
Series arm64: dts: qcom: sc8280xp: PCIe fixes and GICv3 ITS enable | expand

Message

Johan Hovold March 6, 2024, 9:56 a.m. UTC
This series addresses a few problems with the sc8280xp PCIe
implementation.

The DWC PCIe controller can either use its internal MSI controller or an
external one such as the GICv3 ITS. Enabling the latter allows for
assigning affinity to individual interrupts, but results in a large
amount of Correctable Errors being logged on both the Lenovo ThinkPad
X13s and the sc8280xp-crd reference design.

It turns out that these errors are always generated, but for some yet to
be determined reason, the AER interrupts are never received when using
the internal MSI controller, which makes the link errors harder to
notice.

On the X13s, there is a large number of errors generated when bringing
up the link on boot. This is related to the fact that UEFI firmware has
already enabled the Wi-Fi PCIe link at Gen2 speed and restarting the
link at Gen3 generates a massive amount of errors until the Wi-Fi
firmware is restarted. This has now also been shown to cause the Wi-Fi
to sometimes not start at all on boot for some users.

A recent commit enabling ASPM on certain Qualcomm platforms introduced
further errors when using the Wi-Fi on the X13s as well as when
accessing the NVMe on the CRD. The exact reason for this has not yet
been identified, but disabling ASPM L0s makes the errors go away. This
could suggest that either the current ASPM implementation is incomplete
or that L0s is not supported with these devices.

Note that the X13s and CRD use the same Wi-Fi controller, but the errors
are only generated on the X13s. The NVMe controller on my X13s does not
support L0s so there are no issues there, unlike on the CRD which uses a
different controller. The modem on the CRD does not generate any errors,
but both the NVMe and modem keeps bouncing in and out of L0s/L1 also
when not used, which could indicate that there are bigger problems with
the ASPM implementation. I don't have a modem on my X13s so I have not
been able to test whether L0s causes any trouble there.

Note that disabling ASPM L0s for the X13s Wi-Fi does not seem to have a
significant impact on the power consumption (and there are indications
that this applies generally for L0s on these platforms).

***

Qualcomm has now confirmed that this is an issue on sc8280xp and its
derivate platforms. Specifically, the PHY configuration used on these
platforms is not correctly tuned for L0s and there is currently no
updated configuration available.

As we are now at 6.8-rc7, I've rebased this series on the Qualcomm PCIe
binding rework in linux-next so that the whole series can be merged for
6.9 (the patch to disable l0s and the devicetree fix are both marked for
stable backport anyway).

The DT bindings and PCI patch are expected to go through the PCI tree,
while Bjorn A takes the devicetree updates through the Qualcomm tree.

Johan


Changes in v4
 - drop the 'aspm-no-l0s' DT property and disable L0s for all sc8280xp
   derivative platforms based on the compatible string for now

Changes in v3
 - drop the two wifi link speed patches which have been picked up for
   6.8
 - rebase on binding rework in linux-next and add the properties also to
   the new qcom,pcie-common.yaml
   - https://lore.kernel.org/linux-pci/20240126-dt-bindings-pci-qcom-split-v3-0-f23cda4d74c0@linaro.org/
 - fix an 'L0s' typo in one commit message

Changes in v2
 - drop RFC from ASPM patches and add stable tags
 - reorder patches and move ITS patch last
 - fix s/GB/MB/ typo in Gen2 speed commit messages
 - fix an incorrect Fixes tag
 - amend commit message X13 wifi link speed patch after user
   confirmation that this fixes the wifi startup issue
 - disable L0s also for modem and wifi on CRD
 - disable L0s also for nvme and modem on X13s


Johan Hovold (5):
  dt-bindings: PCI: qcom: Allow 'required-opps'
  dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
  PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p
  arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
  arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe

 .../bindings/pci/qcom,pcie-common.yaml        |  4 ++-
 .../devicetree/bindings/pci/qcom,pcie.yaml    |  4 ++-
 arch/arm64/boot/dts/qcom/sc8280xp.dtsi        | 17 +++++++++-
 drivers/pci/controller/dwc/pcie-qcom.c        | 31 +++++++++++++++++--
 4 files changed, 51 insertions(+), 5 deletions(-)

Comments

Johan Hovold March 8, 2024, 3:47 p.m. UTC | #1
On Wed, Mar 06, 2024 at 10:56:46AM +0100, Johan Hovold wrote:
> This series addresses a few problems with the sc8280xp PCIe
> implementation.

> Qualcomm has now confirmed that this is an issue on sc8280xp and its
> derivate platforms. Specifically, the PHY configuration used on these
> platforms is not correctly tuned for L0s and there is currently no
> updated configuration available.
> 
> As we are now at 6.8-rc7, I've rebased this series on the Qualcomm PCIe
> binding rework in linux-next so that the whole series can be merged for
> 6.9 (the patch to disable l0s and the devicetree fix are both marked for
> stable backport anyway).
> 
> The DT bindings and PCI patch are expected to go through the PCI tree,
> while Bjorn A takes the devicetree updates through the Qualcomm tree.

> Johan Hovold (5):
>   dt-bindings: PCI: qcom: Allow 'required-opps'
>   dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
>   PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p

Bjorn H, could you pick up these three for 6.9-rc1?

>   arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
>   arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe

That way there's a small chance that Bjorn A can be able to squeeze in
the patch to enable ITS in 6.9 too (e.g. if there's an rc8).

Johan
Lorenzo Pieralisi March 8, 2024, 4:09 p.m. UTC | #2
On Wed, 06 Mar 2024 10:56:46 +0100, Johan Hovold wrote:
> This series addresses a few problems with the sc8280xp PCIe
> implementation.
> 
> The DWC PCIe controller can either use its internal MSI controller or an
> external one such as the GICv3 ITS. Enabling the latter allows for
> assigning affinity to individual interrupts, but results in a large
> amount of Correctable Errors being logged on both the Lenovo ThinkPad
> X13s and the sc8280xp-crd reference design.
> 
> [...]

Applied to controller/qcom, thanks!

[1/5] dt-bindings: PCI: qcom: Allow 'required-opps'
      https://git.kernel.org/pci/pci/c/c8073025c0e4
[2/5] dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
      https://git.kernel.org/pci/pci/c/545e88cb41a6
[3/5] PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p
      https://git.kernel.org/pci/pci/c/d1997c987814

Thanks,
Lorenzo
Bjorn Andersson March 19, 2024, 2:48 a.m. UTC | #3
On Wed, 06 Mar 2024 10:56:46 +0100, Johan Hovold wrote:
> This series addresses a few problems with the sc8280xp PCIe
> implementation.
> 
> The DWC PCIe controller can either use its internal MSI controller or an
> external one such as the GICv3 ITS. Enabling the latter allows for
> assigning affinity to individual interrupts, but results in a large
> amount of Correctable Errors being logged on both the Lenovo ThinkPad
> X13s and the sc8280xp-crd reference design.
> 
> [...]

Applied, thanks!

[4/5] arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
      commit: 2b621971554a94094cf489314dc1c2b65401965c
[5/5] arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe
      commit: 81051f14a66c3913f1d219bd97e47002f1dc91de

Best regards,
Johan Hovold March 19, 2024, 7:25 a.m. UTC | #4
On Mon, Mar 18, 2024 at 09:48:30PM -0500, Bjorn Andersson wrote:
> 
> On Wed, 06 Mar 2024 10:56:46 +0100, Johan Hovold wrote:
> > This series addresses a few problems with the sc8280xp PCIe
> > implementation.
> > 
> > The DWC PCIe controller can either use its internal MSI controller or an
> > external one such as the GICv3 ITS. Enabling the latter allows for
> > assigning affinity to individual interrupts, but results in a large
> > amount of Correctable Errors being logged on both the Lenovo ThinkPad
> > X13s and the sc8280xp-crd reference design.
> > 
> > [...]
> 
> Applied, thanks!
> 
> [4/5] arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
>       commit: 2b621971554a94094cf489314dc1c2b65401965c

I noticed that you applied both of these for 6.10, but this one is a fix
that should go into 6.9.

> [5/5] arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe
>       commit: 81051f14a66c3913f1d219bd97e47002f1dc91de

Johan
Krzysztof Kozlowski March 20, 2024, 8:09 a.m. UTC | #5
On 19/03/2024 08:25, Johan Hovold wrote:
> On Mon, Mar 18, 2024 at 09:48:30PM -0500, Bjorn Andersson wrote:
>>
>> On Wed, 06 Mar 2024 10:56:46 +0100, Johan Hovold wrote:
>>> This series addresses a few problems with the sc8280xp PCIe
>>> implementation.
>>>
>>> The DWC PCIe controller can either use its internal MSI controller or an
>>> external one such as the GICv3 ITS. Enabling the latter allows for
>>> assigning affinity to individual interrupts, but results in a large
>>> amount of Correctable Errors being logged on both the Lenovo ThinkPad
>>> X13s and the sc8280xp-crd reference design.
>>>
>>> [...]
>>
>> Applied, thanks!
>>
>> [4/5] arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
>>       commit: 2b621971554a94094cf489314dc1c2b65401965c
> 
> I noticed that you applied both of these for 6.10, but this one is a fix
> that should go into 6.9.

Well, mixing fixes for different cycles in one patchset was always
discouraged. In case of some subsystems you would receive clear
response, that you must split fixes out of the patchset.

Fixes being first in the patchset would be probably accepted by the rest
of subsystems, but putting it in the middle of the patchset is wrong.

Best regards,
Krzysztof
Johan Hovold March 20, 2024, 8:18 a.m. UTC | #6
On Wed, Mar 20, 2024 at 09:09:02AM +0100, Krzysztof Kozlowski wrote:
> On 19/03/2024 08:25, Johan Hovold wrote:
> > On Mon, Mar 18, 2024 at 09:48:30PM -0500, Bjorn Andersson wrote:
> >> On Wed, 06 Mar 2024 10:56:46 +0100, Johan Hovold wrote:
> >>> This series addresses a few problems with the sc8280xp PCIe
> >>> implementation.

> >> Applied, thanks!
> >>
> >> [4/5] arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
> >>       commit: 2b621971554a94094cf489314dc1c2b65401965c
> > 
> > I noticed that you applied both of these for 6.10, but this one is a fix
> > that should go into 6.9.
> 
> Well, mixing fixes for different cycles in one patchset was always
> discouraged. In case of some subsystems you would receive clear
> response, that you must split fixes out of the patchset.
> 
> Fixes being first in the patchset would be probably accepted by the rest
> of subsystems, but putting it in the middle of the patchset is wrong.

Perhaps you should not comment before reading up on the history of this
series.

This was all intended for 6.9, but merging was stalled for a number of
reasons so here we are. The patches were also going in through different
trees, so patch 4/5 is the first Qualcomm SoC patch.

Johan
Krzysztof Kozlowski March 20, 2024, 8:24 a.m. UTC | #7
On 20/03/2024 09:18, Johan Hovold wrote:
> On Wed, Mar 20, 2024 at 09:09:02AM +0100, Krzysztof Kozlowski wrote:
>> On 19/03/2024 08:25, Johan Hovold wrote:
>>> On Mon, Mar 18, 2024 at 09:48:30PM -0500, Bjorn Andersson wrote:
>>>> On Wed, 06 Mar 2024 10:56:46 +0100, Johan Hovold wrote:
>>>>> This series addresses a few problems with the sc8280xp PCIe
>>>>> implementation.
> 
>>>> Applied, thanks!
>>>>
>>>> [4/5] arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
>>>>       commit: 2b621971554a94094cf489314dc1c2b65401965c
>>>
>>> I noticed that you applied both of these for 6.10, but this one is a fix
>>> that should go into 6.9.
>>
>> Well, mixing fixes for different cycles in one patchset was always
>> discouraged. In case of some subsystems you would receive clear
>> response, that you must split fixes out of the patchset.
>>
>> Fixes being first in the patchset would be probably accepted by the rest
>> of subsystems, but putting it in the middle of the patchset is wrong.
> 
> Perhaps you should not comment before reading up on the history of this
> series.
> 
> This was all intended for 6.9, but merging was stalled for a number of
> reasons so here we are. The patches were also going in through different
> trees, so patch 4/5 is the first Qualcomm SoC patch.

Again, well, you sent it at few days before merge window, so how do you
imagine this being applied for v6.9 and still fulfilling "few linux-next
cycles before merge window" requirement? Especially that arm-soc cut off
is much earlier :/. I talk about patch 5, of course, because that is not
a fix (at least not marked as one). Don't expect in general a arms-co
patch to be applied four days before merge window, thus the actual fix -
patch #4 - should be split.

Best regards,
Krzysztof
Johan Hovold March 20, 2024, 8:40 a.m. UTC | #8
On Wed, Mar 20, 2024 at 09:24:54AM +0100, Krzysztof Kozlowski wrote:
> On 20/03/2024 09:18, Johan Hovold wrote:

> > Perhaps you should not comment before reading up on the history of this
> > series.
> > 
> > This was all intended for 6.9, but merging was stalled for a number of
> > reasons so here we are. The patches were also going in through different
> > trees, so patch 4/5 is the first Qualcomm SoC patch.
> 
> Again, well, you sent it at few days before merge window, so how do you
> imagine this being applied for v6.9 and still fulfilling "few linux-next
> cycles before merge window" requirement? Especially that arm-soc cut off
> is much earlier :/. I talk about patch 5, of course, because that is not
> a fix (at least not marked as one). Don't expect in general a arms-co
> patch to be applied four days before merge window, thus the actual fix -
> patch #4 - should be split.

At the time there was still hope that there may be an rc8, and the patch
in question had been used by a large number of X13s users for several
weeks, which is a lot more testing than the average Qualcomm patch
receives, whether it's in linux-next or not.

And patch 5 depends on the earlier patches in the series so it belongs
in the series, which was also initially posted long before the merge
window.

I'm sure Bjorn A can handle this in general, even if he failed to notice
the CC stable tag or had other reasons for applying the fix for 6.10
instead of 6.9.

Johan
Krzysztof Kozlowski March 20, 2024, 9:16 a.m. UTC | #9
On 20/03/2024 09:40, Johan Hovold wrote:
> On Wed, Mar 20, 2024 at 09:24:54AM +0100, Krzysztof Kozlowski wrote:
>> On 20/03/2024 09:18, Johan Hovold wrote:
> 
>>> Perhaps you should not comment before reading up on the history of this
>>> series.
>>>
>>> This was all intended for 6.9, but merging was stalled for a number of
>>> reasons so here we are. The patches were also going in through different
>>> trees, so patch 4/5 is the first Qualcomm SoC patch.
>>
>> Again, well, you sent it at few days before merge window, so how do you
>> imagine this being applied for v6.9 and still fulfilling "few linux-next
>> cycles before merge window" requirement? Especially that arm-soc cut off
>> is much earlier :/. I talk about patch 5, of course, because that is not
>> a fix (at least not marked as one). Don't expect in general a arms-co
>> patch to be applied four days before merge window, thus the actual fix -
>> patch #4 - should be split.
> 
> At the time there was still hope that there may be an rc8, and the patch
> in question had been used by a large number of X13s users for several
> weeks, which is a lot more testing than the average Qualcomm patch
> receives, whether it's in linux-next or not.

OK, it does solve some parts of our discussion but does not solve my
earlier comment: Fixes should be separate series. Certain folks have
quite strict requirement on this. Try sending a fix with non-fix
(depending on fix somehow like here) to Mark Brown. He has even template
for such case...

> 
> And patch 5 depends on the earlier patches in the series so it belongs
> in the series, which was also initially posted long before the merge
> window.

The dependency is might not be good enough reason to combine fixes and
non-fixes into one series. Dependency should be explained (in 5th
patch), but it's maintainer's judgement and job to handle this.

And in all this, fact that Bjorn missed certain aspects and applied this
differently than you wanted, is another argument that this should be split.

Best regards,
Krzysztof
Johan Hovold March 20, 2024, 9:52 a.m. UTC | #10
On Wed, Mar 20, 2024 at 10:16:16AM +0100, Krzysztof Kozlowski wrote:
> On 20/03/2024 09:40, Johan Hovold wrote:

> > At the time there was still hope that there may be an rc8, and the patch
> > in question had been used by a large number of X13s users for several
> > weeks, which is a lot more testing than the average Qualcomm patch
> > receives, whether it's in linux-next or not.
> 
> OK, it does solve some parts of our discussion but does not solve my
> earlier comment: Fixes should be separate series. Certain folks have
> quite strict requirement on this. Try sending a fix with non-fix
> (depending on fix somehow like here) to Mark Brown. He has even template
> for such case...

That's not a general rule at all.

> > And patch 5 depends on the earlier patches in the series so it belongs
> > in the series, which was also initially posted long before the merge
> > window.
> 
> The dependency is might not be good enough reason to combine fixes and
> non-fixes into one series. Dependency should be explained (in 5th
> patch), but it's maintainer's judgement and job to handle this.

FFS, I'm posting a series adding a new feature, which depends on first
addressing a few related bugs. I post everything together so that it can
be evaluated and tested together. That's perfectly fine, and not that
different from how we post driver and dts changes in one series to
facilitate review. Some maintainers don't like that either, then we deal
with that.

Johan