diff mbox series

PCI: Add Max Payload Size quirk for ASMedia ASM1062 SATA controller

Message ID 20210317115924.31885-1-kabel@kernel.org (mailing list archive)
State Changes Requested
Delegated to: Bjorn Helgaas
Headers show
Series PCI: Add Max Payload Size quirk for ASMedia ASM1062 SATA controller | expand

Commit Message

Marek Behún March 17, 2021, 11:59 a.m. UTC
The ASMedia ASM1062 SATA controller causes an External Abort on
controllers which support Max Payload Size >= 512. It happens with
Aardvark PCIe controller (tested on Turris MOX) and also with DesignWare
controller (armada8k, tested on CN9130-CRB):

  ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
  ata1.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
  ata1.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
  ERROR:   Unhandled External Abort received on 0x80000000 at EL3!
  ERROR:    exception reason=1 syndrome=0x92000210
  PANIC at PC : 0x00000000040273bc

Limiting Max Payload Size to 256 bytes solves this problem.

On Turris MOX this problem first appeared when the pci-aardvark
controller started using the pci-emul-bridge API, in commit 8a3ebd8de328
("PCI: aardvark: Implement emulated root PCI bridge config space").

On armada8k this was always a problem because it has HW root bridge.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reported-by: Rötti <espressobinboardarmbiantempmailaddress@posteo.de>
Cc: Pali Rohár <pali@kernel.org>
Cc: stable@vger.kernel.org
---
 drivers/pci/quirks.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Pali Rohár March 17, 2021, 6:46 p.m. UTC | #1
On Wednesday 17 March 2021 12:59:24 Marek Behún wrote:
> The ASMedia ASM1062 SATA controller causes an External Abort on
> controllers which support Max Payload Size >= 512. It happens with
> Aardvark PCIe controller (tested on Turris MOX) and also with DesignWare
> controller (armada8k, tested on CN9130-CRB):
> 
>   ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>   ata1.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
>   ata1.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
>   ERROR:   Unhandled External Abort received on 0x80000000 at EL3!
>   ERROR:    exception reason=1 syndrome=0x92000210
>   PANIC at PC : 0x00000000040273bc
> 
> Limiting Max Payload Size to 256 bytes solves this problem.
> 
> On Turris MOX this problem first appeared when the pci-aardvark
> controller started using the pci-emul-bridge API, in commit 8a3ebd8de328
> ("PCI: aardvark: Implement emulated root PCI bridge config space").
> 
> On armada8k this was always a problem because it has HW root bridge.
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>
> Reported-by: Rötti <espressobinboardarmbiantempmailaddress@posteo.de>

Link to Rötti's report:
https://lore.kernel.org/linux-ide/cbbb2496501fed013ccbeba524e8d573@posteo.de/T/#u

> Cc: Pali Rohár <pali@kernel.org>

Reviewed-by: Pali Rohár <pali@kernel.org>

> Cc: stable@vger.kernel.org
> ---
>  drivers/pci/quirks.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 653660e3ba9e..a561136efb08 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3251,6 +3251,11 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
>  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1, fixup_mpss_256);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
>  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000B, fixup_mpss_256);
> +/*
> + * For some reason DECLARE_PCI_FIXUP_HEADER does not work with pci-aardvark
> + * controller. We have to use DECLARE_PCI_FIXUP_EARLY.
> + */
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x0612, fixup_mpss_256);
>  
>  /*
>   * Intel 5000 and 5100 Memory controllers have an erratum with read completion
> -- 
> 2.26.2
>
Bjorn Helgaas March 17, 2021, 10:45 p.m. UTC | #2
[+cc Zachary, author of 8a3ebd8de328]

Thanks for the report and the patch, Marek!

On Wed, Mar 17, 2021 at 12:59:24PM +0100, Marek Behún wrote:
> The ASMedia ASM1062 SATA controller causes an External Abort on
> controllers which support Max Payload Size >= 512. It happens with
> Aardvark PCIe controller (tested on Turris MOX) and also with DesignWare
> controller (armada8k, tested on CN9130-CRB):
> 
>   ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>   ata1.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
>   ata1.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
>   ERROR:   Unhandled External Abort received on 0x80000000 at EL3!
>   ERROR:    exception reason=1 syndrome=0x92000210
>   PANIC at PC : 0x00000000040273bc
> 
> Limiting Max Payload Size to 256 bytes solves this problem.
> 
> On Turris MOX this problem first appeared when the pci-aardvark
> controller started using the pci-emul-bridge API, in commit 8a3ebd8de328
> ("PCI: aardvark: Implement emulated root PCI bridge config space").

Any ideas about why 8a3ebd8de328 made a difference?  Could there be a
defect in 8a3ebd8de328?  Or maybe before 8a3ebd8de328 we didn't
actually read or update MPS settings in the hardware?

> On armada8k this was always a problem because it has HW root bridge.

Can you please open a report at bugzilla.kernel.org and attach the
complete "sudo lspci -vv" output for both systems?  I think it's OK to
collect these with the patch applied; we should still be able to see
the information we use to compute the MPS values.  But please include
the CONFIG_PCIE_BUS_* settings and any "pcie_bus_*" kernel command
line arguments.

This quirk suggests that there's a hardware defect in the ASMedia
ASM1062.  But if that's really the case, we should see reports on lots
of platforms, and I'm only aware of these two.  You do mention that it
was always a problem on armada8k; if you know of other reports of
that, please include URLs in the bugzilla.

If the problem only occurs on these platforms, my first guess would be
a hardware defect in these platforms or a software defect in the PCIe
controller driver.  It wouldn't surprise me if we have a generic PCI
core defect in how we set MPS, either.

> Signed-off-by: Marek Behún <kabel@kernel.org>
> Reported-by: Rötti <espressobinboardarmbiantempmailaddress@posteo.de>
> Cc: Pali Rohár <pali@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  drivers/pci/quirks.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 653660e3ba9e..a561136efb08 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3251,6 +3251,11 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
>  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1, fixup_mpss_256);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
>  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000B, fixup_mpss_256);
> +/*
> + * For some reason DECLARE_PCI_FIXUP_HEADER does not work with pci-aardvark
> + * controller. We have to use DECLARE_PCI_FIXUP_EARLY.
> + */

This is curious, too.  If we need the quirk, I'd like to run this down
to figure out why we need an EARLY instead of HEADER quirk.
Differences like that suggest a bug elsewhere, or at least an
unnecessary difference between PCIe controller drivers.

> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x0612, fixup_mpss_256);
>  
>  /*
>   * Intel 5000 and 5100 Memory controllers have an erratum with read completion
> -- 
> 2.26.2
>
Pali Rohár March 17, 2021, 10:55 p.m. UTC | #3
On Wednesday 17 March 2021 17:45:49 Bjorn Helgaas wrote:
> [+cc Zachary, author of 8a3ebd8de328]
> 
> Thanks for the report and the patch, Marek!
> 
> On Wed, Mar 17, 2021 at 12:59:24PM +0100, Marek Behún wrote:
> > The ASMedia ASM1062 SATA controller causes an External Abort on
> > controllers which support Max Payload Size >= 512. It happens with
> > Aardvark PCIe controller (tested on Turris MOX) and also with DesignWare
> > controller (armada8k, tested on CN9130-CRB):
> > 
> >   ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> >   ata1.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
> >   ata1.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> >   ERROR:   Unhandled External Abort received on 0x80000000 at EL3!
> >   ERROR:    exception reason=1 syndrome=0x92000210
> >   PANIC at PC : 0x00000000040273bc
> > 
> > Limiting Max Payload Size to 256 bytes solves this problem.
> > 
> > On Turris MOX this problem first appeared when the pci-aardvark
> > controller started using the pci-emul-bridge API, in commit 8a3ebd8de328
> > ("PCI: aardvark: Implement emulated root PCI bridge config space").
> 
> Any ideas about why 8a3ebd8de328 made a difference?  Could there be a
> defect in 8a3ebd8de328?  Or maybe before 8a3ebd8de328 we didn't
> actually read or update MPS settings in the hardware?

Hi Bjorn! It is because A3720 HW does not have real PCIe Root Bridge and
therefore kernel was not able to read nor write MPS settings.

Commit 8a3ebd8de328 added support for emulated PCIe Root Bridge on A3720
and aardvark driver started emulating it via internal aardvark registers
which supports reading and writing also MPS settings. So kernel was
finally able to set 512 byte payload.

And so after this commit kernel was able to set MPS and due to that
ASMedia bug system started crashing.

> > On armada8k this was always a problem because it has HW root bridge.
> 
> Can you please open a report at bugzilla.kernel.org and attach the
> complete "sudo lspci -vv" output for both systems?  I think it's OK to
> collect these with the patch applied; we should still be able to see
> the information we use to compute the MPS values.  But please include
> the CONFIG_PCIE_BUS_* settings and any "pcie_bus_*" kernel command
> line arguments.
> 
> This quirk suggests that there's a hardware defect in the ASMedia
> ASM1062.  But if that's really the case, we should see reports on lots
> of platforms, and I'm only aware of these two.

Do you have platform which support MPS of 512 bytes? Because I have not
seen any x86 / Intel PCIe controller with such support on ordinary
laptop and desktop.

These two (A3720 and CN9130) are the only which has support for it.

Has somebody else PCIe controller which Root Bridge supports MPS of 512
bytes?

Maybe they are in servers, but then such "cheap" SATA controllers are
not used in servers. So this is probably reason why nobody else reported
such issue.

> You do mention that it
> was always a problem on armada8k; if you know of other reports of
> that, please include URLs in the bugzilla.
> 
> If the problem only occurs on these platforms, my first guess would be
> a hardware defect in these platforms or a software defect in the PCIe
> controller driver.  It wouldn't surprise me if we have a generic PCI
> core defect in how we set MPS, either.
> 
> > Signed-off-by: Marek Behún <kabel@kernel.org>
> > Reported-by: Rötti <espressobinboardarmbiantempmailaddress@posteo.de>
> > Cc: Pali Rohár <pali@kernel.org>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/pci/quirks.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 653660e3ba9e..a561136efb08 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3251,6 +3251,11 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
> >  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1, fixup_mpss_256);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
> >  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000B, fixup_mpss_256);
> > +/*
> > + * For some reason DECLARE_PCI_FIXUP_HEADER does not work with pci-aardvark
> > + * controller. We have to use DECLARE_PCI_FIXUP_EARLY.
> > + */
> 
> This is curious, too.  If we need the quirk, I'd like to run this down
> to figure out why we need an EARLY instead of HEADER quirk.
> Differences like that suggest a bug elsewhere, or at least an
> unnecessary difference between PCIe controller drivers.
> 
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x0612, fixup_mpss_256);
> >  
> >  /*
> >   * Intel 5000 and 5100 Memory controllers have an erratum with read completion
> > -- 
> > 2.26.2
> >
Bjorn Helgaas March 17, 2021, 11:03 p.m. UTC | #4
On Wed, Mar 17, 2021 at 11:55:44PM +0100, Pali Rohár wrote:
> On Wednesday 17 March 2021 17:45:49 Bjorn Helgaas wrote:
> > [+cc Zachary, author of 8a3ebd8de328]
> > 
> > Thanks for the report and the patch, Marek!
> > 
> > On Wed, Mar 17, 2021 at 12:59:24PM +0100, Marek Behún wrote:
> > > The ASMedia ASM1062 SATA controller causes an External Abort on
> > > controllers which support Max Payload Size >= 512. It happens with
> > > Aardvark PCIe controller (tested on Turris MOX) and also with DesignWare
> > > controller (armada8k, tested on CN9130-CRB):
> > > 
> > >   ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > >   ata1.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
> > >   ata1.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> > >   ERROR:   Unhandled External Abort received on 0x80000000 at EL3!
> > >   ERROR:    exception reason=1 syndrome=0x92000210
> > >   PANIC at PC : 0x00000000040273bc
> > > 
> > > Limiting Max Payload Size to 256 bytes solves this problem.
> > > 
> > > On Turris MOX this problem first appeared when the pci-aardvark
> > > controller started using the pci-emul-bridge API, in commit 8a3ebd8de328
> > > ("PCI: aardvark: Implement emulated root PCI bridge config space").
> > 
> > Any ideas about why 8a3ebd8de328 made a difference?  Could there be a
> > defect in 8a3ebd8de328?  Or maybe before 8a3ebd8de328 we didn't
> > actually read or update MPS settings in the hardware?
> 
> Hi Bjorn! It is because A3720 HW does not have real PCIe Root Bridge and
> therefore kernel was not able to read nor write MPS settings.
> 
> Commit 8a3ebd8de328 added support for emulated PCIe Root Bridge on A3720
> and aardvark driver started emulating it via internal aardvark registers
> which supports reading and writing also MPS settings. So kernel was
> finally able to set 512 byte payload.

OK, so in other words, before 8a3ebd8de328 we didn't actually read or
update MPS settings in the hardware.

> And so after this commit kernel was able to set MPS and due to that
> ASMedia bug system started crashing.
> 
> > > On armada8k this was always a problem because it has HW root bridge.
> > 
> > Can you please open a report at bugzilla.kernel.org and attach the
> > complete "sudo lspci -vv" output for both systems?  I think it's OK to
> > collect these with the patch applied; we should still be able to see
> > the information we use to compute the MPS values.  But please include
> > the CONFIG_PCIE_BUS_* settings and any "pcie_bus_*" kernel command
> > line arguments.
> > 
> > This quirk suggests that there's a hardware defect in the ASMedia
> > ASM1062.  But if that's really the case, we should see reports on lots
> > of platforms, and I'm only aware of these two.
> 
> Do you have platform which support MPS of 512 bytes? Because I have not
> seen any x86 / Intel PCIe controller with such support on ordinary
> laptop and desktop.
> 
> These two (A3720 and CN9130) are the only which has support for it.
> 
> Has somebody else PCIe controller which Root Bridge supports MPS of 512
> bytes?
> 
> Maybe they are in servers, but then such "cheap" SATA controllers are
> not used in servers. So this is probably reason why nobody else reported
> such issue.

I have no idea.  My laptop only supports 512 (except for an ASMedia
USB controller).  If the device advertises it, I would expect the
vendor to test it.  Obviously it still could be a device defect.  They
should publish an erratum if that's the case so people know to avoid
it.  So I would try to get ASMedia to say "no, that's tested and
should work" or "oh, sorry, here's an erratum and we'll fix it in the
next round."

> > You do mention that it
> > was always a problem on armada8k; if you know of other reports of
> > that, please include URLs in the bugzilla.
> > 
> > If the problem only occurs on these platforms, my first guess would be
> > a hardware defect in these platforms or a software defect in the PCIe
> > controller driver.  It wouldn't surprise me if we have a generic PCI
> > core defect in how we set MPS, either.
> > 
> > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > Reported-by: Rötti <espressobinboardarmbiantempmailaddress@posteo.de>
> > > Cc: Pali Rohár <pali@kernel.org>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/pci/quirks.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 653660e3ba9e..a561136efb08 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -3251,6 +3251,11 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
> > >  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1, fixup_mpss_256);
> > >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
> > >  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000B, fixup_mpss_256);
> > > +/*
> > > + * For some reason DECLARE_PCI_FIXUP_HEADER does not work with pci-aardvark
> > > + * controller. We have to use DECLARE_PCI_FIXUP_EARLY.
> > > + */
> > 
> > This is curious, too.  If we need the quirk, I'd like to run this down
> > to figure out why we need an EARLY instead of HEADER quirk.
> > Differences like that suggest a bug elsewhere, or at least an
> > unnecessary difference between PCIe controller drivers.
> > 
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x0612, fixup_mpss_256);
> > >  
> > >  /*
> > >   * Intel 5000 and 5100 Memory controllers have an erratum with read completion
> > > -- 
> > > 2.26.2
> > >
Marek Behún March 17, 2021, 11:09 p.m. UTC | #5
On Wed, 17 Mar 2021 17:45:49 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Zachary, author of 8a3ebd8de328]
> 
> Thanks for the report and the patch, Marek!
> 
> On Wed, Mar 17, 2021 at 12:59:24PM +0100, Marek Behún wrote:
> > The ASMedia ASM1062 SATA controller causes an External Abort on
> > controllers which support Max Payload Size >= 512. It happens with
> > Aardvark PCIe controller (tested on Turris MOX) and also with DesignWare
> > controller (armada8k, tested on CN9130-CRB):
> > 
> >   ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> >   ata1.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
> >   ata1.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> >   ERROR:   Unhandled External Abort received on 0x80000000 at EL3!
> >   ERROR:    exception reason=1 syndrome=0x92000210
> >   PANIC at PC : 0x00000000040273bc
> > 
> > Limiting Max Payload Size to 256 bytes solves this problem.
> > 
> > On Turris MOX this problem first appeared when the pci-aardvark
> > controller started using the pci-emul-bridge API, in commit 8a3ebd8de328
> > ("PCI: aardvark: Implement emulated root PCI bridge config space").  
> 
> Any ideas about why 8a3ebd8de328 made a difference?  Could there be a
> defect in 8a3ebd8de328?  Or maybe before 8a3ebd8de328 we didn't
> actually read or update MPS settings in the hardware?

Before 8a3ebd8de328 the PCIe subsystem could not detect that the MPS is
512 bytes, because the emulated bridge did not exists. So it did not try
to change MPS for the SATA controller, or changed it to minimum of 128
bytes. Once pci-emul-bridge was added, the PCIe subsystem had access to
the information that the aardvark controller supports 512 bytes, so it
tried to leverage it, since the SATA controller says it can handle 512
bytes MPS. But apparently it cannot.

> > On armada8k this was always a problem because it has HW root bridge.  
> 
> Can you please open a report at bugzilla.kernel.org and attach the
> complete "sudo lspci -vv" output for both systems?  I think it's OK to
> collect these with the patch applied; we should still be able to see
> the information we use to compute the MPS values.  But please include
> the CONFIG_PCIE_BUS_* settings and any "pcie_bus_*" kernel command
> line arguments.
> 
> This quirk suggests that there's a hardware defect in the ASMedia
> ASM1062.  But if that's really the case, we should see reports on lots
> of platforms, and I'm only aware of these two.  You do mention that it
> was always a problem on armada8k; if you know of other reports of
> that, please include URLs in the bugzilla.

I think that most people use this adapter on x86 platforms. Maybe x86
PCIe controllers do not support 512 bytes MPS (at least my notebook
nor work computer don't). So this issue would be only detected by
someone who used this controller on a controller which supports 512+
bytes MPS. armada8k is mainly used for network devices, so PCIe
there is probably mostly used for wifi cards, and for SATA armada8k has
native controller, and also nVME. The most probable answer is that
noone used this controller on armada8k nor aardvark...

> If the problem only occurs on these platforms, my first guess would be
> a hardware defect in these platforms or a software defect in the PCIe
> controller driver.  It wouldn't surprise me if we have a generic PCI
> core defect in how we set MPS, either.

If someone has this card and a x86 or other PCIe controller capable of
512 byte MPS, maybe they could check.

But aramda8k PCIe controller (which is a DesignWare IP) is completely
different from aardvark (although they both are on Armada SOCs). I
think it is more probable that the problem is in the ASMedia SATA
controller, this controller has also other issues (firmware sometimes
crashing, for example).

> > Signed-off-by: Marek Behún <kabel@kernel.org>
> > Reported-by: Rötti <espressobinboardarmbiantempmailaddress@posteo.de>
> > Cc: Pali Rohár <pali@kernel.org>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/pci/quirks.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 653660e3ba9e..a561136efb08 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3251,6 +3251,11 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
> >  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1, fixup_mpss_256);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
> >  			 PCI_DEVICE_ID_SOLARFLARE_SFC4000B, fixup_mpss_256);
> > +/*
> > + * For some reason DECLARE_PCI_FIXUP_HEADER does not work with pci-aardvark
> > + * controller. We have to use DECLARE_PCI_FIXUP_EARLY.
> > + */  
> 
> This is curious, too.  If we need the quirk, I'd like to run this down
> to figure out why we need an EARLY instead of HEADER quirk.
> Differences like that suggest a bug elsewhere, or at least an
> unnecessary difference between PCIe controller drivers.

Now that I look at this, the comment is wrong. What I meant to say is
that when I used HEADER quirk, the PCIe subsystem did change the MPS to
512 bytes, so clearly the HAEDER quirk only comes after changing of MPS.

I think that the other quirks using the fixup_mpss_256 callback should
also use EARLY instead of HEADER. Maybe something changed in PCI API
sometime and noone noticed that these quirks stopped working?

Marek

> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x0612, fixup_mpss_256);
> >  
> >  /*
> >   * Intel 5000 and 5100 Memory controllers have an erratum with read completion
> > -- 
> > 2.26.2
> >
Pali Rohár March 19, 2021, 7:02 p.m. UTC | #6
On Wednesday 17 March 2021 18:03:55 Bjorn Helgaas wrote:
> On Wed, Mar 17, 2021 at 11:55:44PM +0100, Pali Rohár wrote:
> > On Wednesday 17 March 2021 17:45:49 Bjorn Helgaas wrote:
> > > This quirk suggests that there's a hardware defect in the ASMedia
> > > ASM1062.  But if that's really the case, we should see reports on lots
> > > of platforms, and I'm only aware of these two.
> > 
> > Do you have platform which support MPS of 512 bytes? Because I have not
> > seen any x86 / Intel PCIe controller with such support on ordinary
> > laptop and desktop.
> > 
> > These two (A3720 and CN9130) are the only which has support for it.
> > 
> > Has somebody else PCIe controller which Root Bridge supports MPS of 512
> > bytes?
> > 
> > Maybe they are in servers, but then such "cheap" SATA controllers are
> > not used in servers. So this is probably reason why nobody else reported
> > such issue.
> 
> I have no idea.  My laptop only supports 512 (except for an ASMedia
> USB controller).  If the device advertises it, I would expect the
> vendor to test it.  Obviously it still could be a device defect.  They
> should publish an erratum if that's the case so people know to avoid
> it.  So I would try to get ASMedia to say "no, that's tested and
> should work" or "oh, sorry, here's an erratum and we'll fix it in the
> next round."

I doubt that ASMedia publish something...

But has somebody contact to ASMedia? I can try it.

Basically these ASMedia SATA controller chips are present on more
"noname" mPCIe-form cards and I guess ASMedia is not going to support
them.

Note that we have also tested Marvell PCIe-based SATA controllers which
support MPS of 512 bytes too and there were no problem with them on
A3720 nor CN9130.
Rötti March 21, 2021, 3:09 p.m. UTC | #7
I organized a T60 Thinkpad, pulled out the Wificard (MiniPCIE) and 
plugged in the Marvell SATA-Controller card. Good news is that you're 
right, the DevCap MaxPayload is 128 bytes, so I couldn't reproduce that 
error on that thinkpad. I tried two different Marvell controller cards. 
Wierd thing is, that both cards did not sho up in the lspci -nn -vv 
command. So I'm not sure if these got recognized.

With these patches supplied (@thank you very much Marek & Björn) is 
there a build server I can download a nightly version of armbian I can 
test for you?
Is there any way I can support?

Thank you very much in advance!

Am 19.03.2021 20:02 schrieb Pali Rohár:
> On Wednesday 17 March 2021 18:03:55 Bjorn Helgaas wrote:
>> On Wed, Mar 17, 2021 at 11:55:44PM +0100, Pali Rohár wrote:
>> > On Wednesday 17 March 2021 17:45:49 Bjorn Helgaas wrote:
>> > > This quirk suggests that there's a hardware defect in the ASMedia
>> > > ASM1062.  But if that's really the case, we should see reports on lots
>> > > of platforms, and I'm only aware of these two.
>> >
>> > Do you have platform which support MPS of 512 bytes? Because I have not
>> > seen any x86 / Intel PCIe controller with such support on ordinary
>> > laptop and desktop.
>> >
>> > These two (A3720 and CN9130) are the only which has support for it.
>> >
>> > Has somebody else PCIe controller which Root Bridge supports MPS of 512
>> > bytes?
>> >
>> > Maybe they are in servers, but then such "cheap" SATA controllers are
>> > not used in servers. So this is probably reason why nobody else reported
>> > such issue.
>> 
>> I have no idea.  My laptop only supports 512 (except for an ASMedia
>> USB controller).  If the device advertises it, I would expect the
>> vendor to test it.  Obviously it still could be a device defect.  They
>> should publish an erratum if that's the case so people know to avoid
>> it.  So I would try to get ASMedia to say "no, that's tested and
>> should work" or "oh, sorry, here's an erratum and we'll fix it in the
>> next round."
> 
> I doubt that ASMedia publish something...
> 
> But has somebody contact to ASMedia? I can try it.
> 
> Basically these ASMedia SATA controller chips are present on more
> "noname" mPCIe-form cards and I guess ASMedia is not going to support
> them.
> 
> Note that we have also tested Marvell PCIe-based SATA controllers which
> support MPS of 512 bytes too and there were no problem with them on
> A3720 nor CN9130.
Marek Behún April 16, 2021, 1:54 p.m. UTC | #8
On Wed, 17 Mar 2021 17:45:49 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> Can you please open a report at bugzilla.kernel.org and attach the
> complete "sudo lspci -vv" output for both systems?  I think it's OK to
> collect these with the patch applied; we should still be able to see
> the information we use to compute the MPS values.  But please include
> the CONFIG_PCIE_BUS_* settings and any "pcie_bus_*" kernel command
> line arguments.

Bjorn, I have submitted a report on bugzilla 

https://bugzilla.kernel.org/show_bug.cgi?id=212695

is this enough?

Marek
Pali Rohár April 25, 2021, 3:29 p.m. UTC | #9
On Friday 16 April 2021 15:54:00 Marek Behún wrote:
> On Wed, 17 Mar 2021 17:45:49 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > Can you please open a report at bugzilla.kernel.org and attach the
> > complete "sudo lspci -vv" output for both systems?  I think it's OK to
> > collect these with the patch applied; we should still be able to see
> > the information we use to compute the MPS values.  But please include
> > the CONFIG_PCIE_BUS_* settings and any "pcie_bus_*" kernel command
> > line arguments.
> 
> Bjorn, I have submitted a report on bugzilla 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=212695
> 
> is this enough?
> 
> Marek

Bjorn, it is enough? Or is something else required to include this patch?
Marek Behún May 11, 2021, 4:16 p.m. UTC | #10
Ping? :)

Marek
Pali Rohár May 28, 2021, 12:12 a.m. UTC | #11
On Tuesday 11 May 2021 18:16:12 Marek Behún wrote:
> Ping? :)
> 
> Marek

Bjorn: Gentle reminder :)
Pali Rohár May 30, 2021, 10:21 a.m. UTC | #12
On Sunday 21 March 2021 16:09:33 Rötti wrote:
> With these patches supplied (@thank you very much Marek & Björn) is there a
> build server I can download a nightly version of armbian I can test for you?
> Is there any way I can support?

Hello Rötti! For Armbian support you need to contact Armbian people.
Only they can include this patch into their build system and prepare
a patched Armbian kernel for you. Try to fill bug report on their
github issue tracker.
Krzysztof Wilczyński June 1, 2021, 11:36 a.m. UTC | #13
Hi Marek,

[...]
> +/*
> + * For some reason DECLARE_PCI_FIXUP_HEADER does not work with pci-aardvark
> + * controller. We have to use DECLARE_PCI_FIXUP_EARLY.
> + */
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x0612, fixup_mpss_256);
[...]

Thank you for fixing this and also for filling the bug report on
Bugzilla!  Appreciated.

Reviewed-by: Krzysztof Wilczyński <kw@linux.com>

	Krzysztof
Bjorn Helgaas June 1, 2021, 5:09 p.m. UTC | #14
[+cc Krzysztof]

On Fri, May 28, 2021 at 02:12:08AM +0200, Pali Rohár wrote:
> On Tuesday 11 May 2021 18:16:12 Marek Behún wrote:
> > Ping? :)
> > 
> > Marek
> 
> Bjorn: Gentle reminder :)

The current patch [1] doesn't look mergeable as-is.

  - "ASM1062 SATA controller causes an External Abort on controllers
    which support Max Payload Size >= 512" doesn't sound like a
    correct description.

    That description suggests the problem is with the *PCI
    controller*, not with the ASM1062.  I think that's incorrect; I
    think the problem is with the ASM1062.

    I would expect something like "ASM1062 advertises Max_Payload_Size
    Supported of 512, but in fact it cannot handle TLPs with payload
    size of 512."

  - Also, "External Abort" is an arch-specific description.  Ideally
    we would know the PCIe error.  Probably ASM1062 reports a
    Malformed TLP error when it receives a data payload of 512 bytes,
    and Aardvark, DesignWare, etc convert this to an arm64 External
    Abort.

  - "this problem first appeared ... after 8a3ebd8de328 ..." seems
    only marginally relevant.  It seems to have exposed the latent
    ASM1062 defect.  We *could* say something about how the defect was
    masked by the fact that many PCI bridges only support MPS <= 256,
    and 8a3ebd8de328 would then be relevant because it effectively
    limits MPS.

  - A bugzilla link is in the email thread; thanks for that.
    Somebody needs to include it in a revised commit log.

  - The "For some reason DECLARE_PCI_FIXUP_HEADER does not work"
    comment in the code needs explanation.  If we need to change all
    the quirks to EARLY quirks, we should do that.  If this one needs
    to be different from the others, we need to explain *why*, not
    just "for some reason."

I know some of these were addressed in the email thread, and I could
synthesize some of those responses into a commit log, but it would be
better if you collected things up into a v2 posting.

[1] https://lore.kernel.org/r/20210317115924.31885-1-kabel@kernel.org)
Bjorn Helgaas June 1, 2021, 9:15 p.m. UTC | #15
On Tue, Jun 01, 2021 at 12:09:09PM -0500, Bjorn Helgaas wrote:
> On Fri, May 28, 2021 at 02:12:08AM +0200, Pali Rohár wrote:
> > On Tuesday 11 May 2021 18:16:12 Marek Behún wrote:
> > > Ping? :)
> > > 
> > > Marek
> > 
> > Bjorn: Gentle reminder :)
> 
> The current patch [1] doesn't look mergeable as-is.
> 
>   - "ASM1062 SATA controller causes an External Abort on controllers
>     which support Max Payload Size >= 512" doesn't sound like a
>     correct description.
> 
>     That description suggests the problem is with the *PCI
>     controller*, not with the ASM1062.  I think that's incorrect; I
>     think the problem is with the ASM1062.
> 
>     I would expect something like "ASM1062 advertises Max_Payload_Size
>     Supported of 512, but in fact it cannot handle TLPs with payload
>     size of 512."

MPS is somewhat complicated, and I don't think we have good
documentation in the Linux source.  FWIW, this paper has a really good
explanation of the MPS (Max Payload Size) and MRRS (Max Read Request
Size) concepts:

https://www.xilinx.com/support/documentation/white_papers/wp350.pdf
Rötti Aug. 26, 2021, 11 a.m. UTC | #16
I directed the Armbian guys (Werner) to your patch and they included it 
into their master branch.
So I got to compile an Armbian kernel in order to test the patch in my 
setup with the EspressoBin Board and the AS-Media SATA-controller chips.

I've tested it and it works flawlessly :-)

Thanks!

Am 21.03.2021 16:09 schrieb Rötti:
> I organized a T60 Thinkpad, pulled out the Wificard (MiniPCIE) and
> plugged in the Marvell SATA-Controller card. Good news is that you're
> right, the DevCap MaxPayload is 128 bytes, so I couldn't reproduce
> that error on that thinkpad. I tried two different Marvell controller
> cards. Wierd thing is, that both cards did not sho up in the lspci -nn
> -vv command. So I'm not sure if these got recognized.
> 
> With these patches supplied (@thank you very much Marek & Björn) is
> there a build server I can download a nightly version of armbian I can
> test for you?
> Is there any way I can support?
> 
> Thank you very much in advance!
> 
> Am 19.03.2021 20:02 schrieb Pali Rohár:
>> On Wednesday 17 March 2021 18:03:55 Bjorn Helgaas wrote:
>>> On Wed, Mar 17, 2021 at 11:55:44PM +0100, Pali Rohár wrote:
>>> > On Wednesday 17 March 2021 17:45:49 Bjorn Helgaas wrote:
>>> > > This quirk suggests that there's a hardware defect in the ASMedia
>>> > > ASM1062.  But if that's really the case, we should see reports on lots
>>> > > of platforms, and I'm only aware of these two.
>>> >
>>> > Do you have platform which support MPS of 512 bytes? Because I have not
>>> > seen any x86 / Intel PCIe controller with such support on ordinary
>>> > laptop and desktop.
>>> >
>>> > These two (A3720 and CN9130) are the only which has support for it.
>>> >
>>> > Has somebody else PCIe controller which Root Bridge supports MPS of 512
>>> > bytes?
>>> >
>>> > Maybe they are in servers, but then such "cheap" SATA controllers are
>>> > not used in servers. So this is probably reason why nobody else reported
>>> > such issue.
>>> 
>>> I have no idea.  My laptop only supports 512 (except for an ASMedia
>>> USB controller).  If the device advertises it, I would expect the
>>> vendor to test it.  Obviously it still could be a device defect.  
>>> They
>>> should publish an erratum if that's the case so people know to avoid
>>> it.  So I would try to get ASMedia to say "no, that's tested and
>>> should work" or "oh, sorry, here's an erratum and we'll fix it in the
>>> next round."
>> 
>> I doubt that ASMedia publish something...
>> 
>> But has somebody contact to ASMedia? I can try it.
>> 
>> Basically these ASMedia SATA controller chips are present on more
>> "noname" mPCIe-form cards and I guess ASMedia is not going to support
>> them.
>> 
>> Note that we have also tested Marvell PCIe-based SATA controllers 
>> which
>> support MPS of 512 bytes too and there were no problem with them on
>> A3720 nor CN9130.
diff mbox series

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..a561136efb08 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3251,6 +3251,11 @@  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
 			 PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1, fixup_mpss_256);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SOLARFLARE,
 			 PCI_DEVICE_ID_SOLARFLARE_SFC4000B, fixup_mpss_256);
+/*
+ * For some reason DECLARE_PCI_FIXUP_HEADER does not work with pci-aardvark
+ * controller. We have to use DECLARE_PCI_FIXUP_EARLY.
+ */
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x0612, fixup_mpss_256);
 
 /*
  * Intel 5000 and 5100 Memory controllers have an erratum with read completion