[0/2] Adjust AMD GPU ATS quirks

Message ID 20200114205523.1054271-1-alexander.deucher@amd.com (mailing list archive)

Message

Alex Deucher Jan. 14, 2020, 8:55 p.m. UTC
We've root caused the issue and clarified the quirk.
This also adds a new quirk for a new GPU.

Alex Deucher (2):
  pci: Clarify ATS quirk
  pci: add ATS quirk for navi14 board (v2)

 drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

Comments

Bjorn Helgaas Jan. 14, 2020, 11:41 p.m. UTC | #1
On Tue, Jan 14, 2020 at 03:55:21PM -0500, Alex Deucher wrote:
> We've root caused the issue and clarified the quirk.
> This also adds a new quirk for a new GPU.
> 
> Alex Deucher (2):
>   pci: Clarify ATS quirk
>   pci: add ATS quirk for navi14 board (v2)
> 
>  drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)

I propose the following, which I intend to be functionally identical.
It just doesn't repeat the pci_info() and pdev->ats_cap = 0.

commit 998c4f7975b0 ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Jan 14 17:09:28 2020 -0600

    PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken
    
    To account for parts of the chip that are "harvested" (disabled) due to
    silicon flaws, caches on some AMD GPUs must be initialized before ATS is
    enabled.
    
    ATS is normally enabled by the IOMMU driver before the GPU driver loads, so
    this cache initialization would have to be done in a quirk, but that's too
    complex to be practical.
    
    For Navi14 (device ID 0x7340), this initialization is done by the VBIOS,
    but apparently some boards went to production with an older VBIOS that
    doesn't do it.  Disable ATS for those boards.
    
    https://lore.kernel.org/r/20200114205523.1054271-3-alexander.deucher@amd.com
    Bug: https://gitlab.freedesktop.org/drm/amd/issues/1015
    See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
    See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 4937a088d7d8..fbeb9f73ef28 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5074,18 +5074,25 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
 
 #ifdef CONFIG_PCI_ATS
 /*
- * Some devices have a broken ATS implementation causing IOMMU stalls.
- * Don't use ATS for those devices.
+ * Some devices require additional driver setup to enable ATS.  Don't use
+ * ATS for those devices as ATS will be enabled before the driver has had a
+ * chance to load and configure the device.
  */
-static void quirk_no_ats(struct pci_dev *pdev)
+static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
 {
-	pci_info(pdev, "disabling ATS (broken on this device)\n");
+	if (pdev->device == 0x7340 && pdev->revision != 0xc5)
+		return;
+
+	pci_info(pdev, "disabling ATS\n");
 	pdev->ats_cap = 0;
 }
 
 /* AMD Stoney platform GPU */
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_amd_harvest_no_ats);
+/* AMD Iceland dGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
+/* AMD Navi14 dGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
 
 /* Freescale PCIe doesn't support MSI in RC mode */
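
For anyone who wants to check the revision test outside the kernel, the following is a minimal user-space sketch of the decision made by quirk_amd_harvest_no_ats() above. It is not kernel code: struct fake_pci_dev and both function names are hypothetical stand-ins invented for illustration, and it assumes the function is only ever called for the three device IDs registered via DECLARE_PCI_FIXUP_FINAL() in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the few struct pci_dev fields the quirk
 * touches.  The real structure lives in include/linux/pci.h.
 */
struct fake_pci_dev {
	uint16_t device;	/* PCI device ID */
	uint8_t revision;	/* PCI revision ID */
	uint16_t ats_cap;	/* nonzero: ATS capability offset; 0: ATS not used */
};

/*
 * Mirrors the early-return test in the patch: a Navi14 (0x7340) board is
 * exempt unless its revision is 0xc5.  Every other device that reaches
 * the quirk (Stoney 0x98e4, Iceland 0x6900) is affected at any revision.
 */
static bool harvest_quirk_disables_ats(const struct fake_pci_dev *dev)
{
	assert(dev != NULL);

	if (dev->device == 0x7340 && dev->revision != 0xc5)
		return false;

	return true;
}

/* Same effect as the patched quirk, minus the pci_info() log message. */
static void fake_quirk_amd_harvest_no_ats(struct fake_pci_dev *dev)
{
	if (harvest_quirk_disables_ats(dev))
		dev->ats_cap = 0;
}
```

Calling fake_quirk_amd_harvest_no_ats() on a 0x7340 device with revision 0xc5 clears ats_cap, while any other 0x7340 revision leaves it untouched. Clearing the capability offset is what keeps ATS from being enabled later, per the commit message above.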
Alex Deucher Jan. 15, 2020, 2:08 p.m. UTC | #2
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Tuesday, January 14, 2020 6:42 PM
> To: Alex Deucher <alexdeucher@gmail.com>
> Cc: amd-gfx@lists.freedesktop.org; linux-pci@vger.kernel.org; Deucher,
> Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH 0/2] Adjust AMD GPU ATS quirks
> 
> On Tue, Jan 14, 2020 at 03:55:21PM -0500, Alex Deucher wrote:
> > We've root caused the issue and clarified the quirk.
> > This also adds a new quirk for a new GPU.
> >
> > Alex Deucher (2):
> >   pci: Clarify ATS quirk
> >   pci: add ATS quirk for navi14 board (v2)
> >
> >  drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
> >  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> I propose the following, which I intend to be functionally identical.
> It just doesn't repeat the pci_info() and pdev->ats_cap = 0.
> 

Works for me.  Thanks!

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> commit 998c4f7975b0 ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Tue Jan 14 17:09:28 2020 -0600
> 
>     PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken
> 
>     To account for parts of the chip that are "harvested" (disabled) due to
>     silicon flaws, caches on some AMD GPUs must be initialized before ATS is
>     enabled.
> 
>     ATS is normally enabled by the IOMMU driver before the GPU driver loads, so
>     this cache initialization would have to be done in a quirk, but that's too
>     complex to be practical.
> 
>     For Navi14 (device ID 0x7340), this initialization is done by the VBIOS,
>     but apparently some boards went to production with an older VBIOS that
>     doesn't do it.  Disable ATS for those boards.
> 
>     https://lore.kernel.org/r/20200114205523.1054271-3-alexander.deucher@amd.com
>     Bug: https://gitlab.freedesktop.org/drm/amd/issues/1015
>     See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
>     See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4937a088d7d8..fbeb9f73ef28 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5074,18 +5074,25 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
> 
>  #ifdef CONFIG_PCI_ATS
>  /*
> - * Some devices have a broken ATS implementation causing IOMMU stalls.
> - * Don't use ATS for those devices.
> + * Some devices require additional driver setup to enable ATS.  Don't use
> + * ATS for those devices as ATS will be enabled before the driver has had a
> + * chance to load and configure the device.
>   */
> -static void quirk_no_ats(struct pci_dev *pdev)
> +static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
>  {
> -	pci_info(pdev, "disabling ATS (broken on this device)\n");
> +	if (pdev->device == 0x7340 && pdev->revision != 0xc5)
> +		return;
> +
> +	pci_info(pdev, "disabling ATS\n");
>  	pdev->ats_cap = 0;
>  }
> 
>  /* AMD Stoney platform GPU */
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_amd_harvest_no_ats);
> +/* AMD Iceland dGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
> +/* AMD Navi14 dGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> 
>  /* Freescale PCIe doesn't support MSI in RC mode */
Bjorn Helgaas Jan. 15, 2020, 5:14 p.m. UTC | #3
On Tue, Jan 14, 2020 at 05:41:44PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 14, 2020 at 03:55:21PM -0500, Alex Deucher wrote:
> > We've root caused the issue and clarified the quirk.
> > This also adds a new quirk for a new GPU.
> > 
> > Alex Deucher (2):
> >   pci: Clarify ATS quirk
> >   pci: add ATS quirk for navi14 board (v2)
> > 
> >  drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
> >  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> I propose the following, which I intend to be functionally identical.
> It just doesn't repeat the pci_info() and pdev->ats_cap = 0.

Applied to pci/misc for v5.6, thanks!

> commit 998c4f7975b0 ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Tue Jan 14 17:09:28 2020 -0600
> 
>     PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken
>     
>     To account for parts of the chip that are "harvested" (disabled) due to
>     silicon flaws, caches on some AMD GPUs must be initialized before ATS is
>     enabled.
>     
>     ATS is normally enabled by the IOMMU driver before the GPU driver loads, so
>     this cache initialization would have to be done in a quirk, but that's too
>     complex to be practical.
>     
>     For Navi14 (device ID 0x7340), this initialization is done by the VBIOS,
>     but apparently some boards went to production with an older VBIOS that
>     doesn't do it.  Disable ATS for those boards.
>     
>     https://lore.kernel.org/r/20200114205523.1054271-3-alexander.deucher@amd.com
>     Bug: https://gitlab.freedesktop.org/drm/amd/issues/1015
>     See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
>     See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4937a088d7d8..fbeb9f73ef28 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5074,18 +5074,25 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
>  
>  #ifdef CONFIG_PCI_ATS
>  /*
> - * Some devices have a broken ATS implementation causing IOMMU stalls.
> - * Don't use ATS for those devices.
> + * Some devices require additional driver setup to enable ATS.  Don't use
> + * ATS for those devices as ATS will be enabled before the driver has had a
> + * chance to load and configure the device.
>   */
> -static void quirk_no_ats(struct pci_dev *pdev)
> +static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
>  {
> -	pci_info(pdev, "disabling ATS (broken on this device)\n");
> +	if (pdev->device == 0x7340 && pdev->revision != 0xc5)
> +		return;
> +
> +	pci_info(pdev, "disabling ATS\n");
>  	pdev->ats_cap = 0;
>  }
>  
>  /* AMD Stoney platform GPU */
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_amd_harvest_no_ats);
> +/* AMD Iceland dGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
> +/* AMD Navi14 dGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
>  
>  /* Freescale PCIe doesn't support MSI in RC mode */
Alex Deucher Jan. 15, 2020, 5:26 p.m. UTC | #4
On Wed, Jan 15, 2020 at 12:14 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Jan 14, 2020 at 05:41:44PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 14, 2020 at 03:55:21PM -0500, Alex Deucher wrote:
> > > We've root caused the issue and clarified the quirk.
> > > This also adds a new quirk for a new GPU.
> > >
> > > Alex Deucher (2):
> > >   pci: Clarify ATS quirk
> > >   pci: add ATS quirk for navi14 board (v2)
> > >
> > >  drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
> > >  1 file changed, 25 insertions(+), 7 deletions(-)
> >
> > I propose the following, which I intend to be functionally identical.
> > It just doesn't repeat the pci_info() and pdev->ats_cap = 0.
>
> Applied to pci/misc for v5.6, thanks!

Can we add this to stable as well?

Alex

>
> > commit 998c4f7975b0 ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date:   Tue Jan 14 17:09:28 2020 -0600
> >
> >     PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken
> >
> >     To account for parts of the chip that are "harvested" (disabled) due to
> >     silicon flaws, caches on some AMD GPUs must be initialized before ATS is
> >     enabled.
> >
> >     ATS is normally enabled by the IOMMU driver before the GPU driver loads, so
> >     this cache initialization would have to be done in a quirk, but that's too
> >     complex to be practical.
> >
> >     For Navi14 (device ID 0x7340), this initialization is done by the VBIOS,
> >     but apparently some boards went to production with an older VBIOS that
> >     doesn't do it.  Disable ATS for those boards.
> >
> >     https://lore.kernel.org/r/20200114205523.1054271-3-alexander.deucher@amd.com
> >     Bug: https://gitlab.freedesktop.org/drm/amd/issues/1015
> >     See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
> >     See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
> >     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4937a088d7d8..fbeb9f73ef28 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5074,18 +5074,25 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags);
> >
> >  #ifdef CONFIG_PCI_ATS
> >  /*
> > - * Some devices have a broken ATS implementation causing IOMMU stalls.
> > - * Don't use ATS for those devices.
> > + * Some devices require additional driver setup to enable ATS.  Don't use
> > + * ATS for those devices as ATS will be enabled before the driver has had a
> > + * chance to load and configure the device.
> >   */
> > -static void quirk_no_ats(struct pci_dev *pdev)
> > +static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
> >  {
> > > -	pci_info(pdev, "disabling ATS (broken on this device)\n");
> > > +	if (pdev->device == 0x7340 && pdev->revision != 0xc5)
> > > +		return;
> > > +
> > > +	pci_info(pdev, "disabling ATS\n");
> > >  	pdev->ats_cap = 0;
> >  }
> >
> >  /* AMD Stoney platform GPU */
> > -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> > -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_amd_harvest_no_ats);
> > +/* AMD Iceland dGPU */
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
> > +/* AMD Navi14 dGPU */
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> >  #endif /* CONFIG_PCI_ATS */
> >
> >  /* Freescale PCIe doesn't support MSI in RC mode */
Bjorn Helgaas Jan. 15, 2020, 8:17 p.m. UTC | #5
On Wed, Jan 15, 2020 at 12:26:32PM -0500, Alex Deucher wrote:
> On Wed, Jan 15, 2020 at 12:14 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Jan 14, 2020 at 05:41:44PM -0600, Bjorn Helgaas wrote:
> > > On Tue, Jan 14, 2020 at 03:55:21PM -0500, Alex Deucher wrote:
> > > > We've root caused the issue and clarified the quirk.
> > > > This also adds a new quirk for a new GPU.
> > > >
> > > > Alex Deucher (2):
> > > >   pci: Clarify ATS quirk
> > > >   pci: add ATS quirk for navi14 board (v2)
> > > >
> > > >  drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
> > > >  1 file changed, 25 insertions(+), 7 deletions(-)
> > >
> > > I propose the following, which I intend to be functionally identical.
> > > It just doesn't repeat the pci_info() and pdev->ats_cap = 0.
> >
> > Applied to pci/misc for v5.6, thanks!
> 
> Can we add this to stable as well?

Done!  Do you want it in v5.5?  It's pretty localized so looks pretty
low-risk.
Alex Deucher Jan. 15, 2020, 8:20 p.m. UTC | #6
On Wed, Jan 15, 2020 at 3:17 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jan 15, 2020 at 12:26:32PM -0500, Alex Deucher wrote:
> > On Wed, Jan 15, 2020 at 12:14 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, Jan 14, 2020 at 05:41:44PM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Jan 14, 2020 at 03:55:21PM -0500, Alex Deucher wrote:
> > > > > We've root caused the issue and clarified the quirk.
> > > > > This also adds a new quirk for a new GPU.
> > > > >
> > > > > Alex Deucher (2):
> > > > >   pci: Clarify ATS quirk
> > > > >   pci: add ATS quirk for navi14 board (v2)
> > > > >
> > > > >  drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
> > > > >  1 file changed, 25 insertions(+), 7 deletions(-)
> > > >
> > > > I propose the following, which I intend to be functionally identical.
> > > > It just doesn't repeat the pci_info() and pdev->ats_cap = 0.
> > >
> > > Applied to pci/misc for v5.6, thanks!
> >
> > Can we add this to stable as well?
>
> Done!  Do you want it in v5.5?  It's pretty localized so looks pretty
> low-risk.

Sure.  Thanks!

Alex
Bjorn Helgaas Jan. 15, 2020, 10:52 p.m. UTC | #7
On Wed, Jan 15, 2020 at 03:20:18PM -0500, Alex Deucher wrote:
> On Wed, Jan 15, 2020 at 3:17 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Jan 15, 2020 at 12:26:32PM -0500, Alex Deucher wrote:
> > > On Wed, Jan 15, 2020 at 12:14 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Tue, Jan 14, 2020 at 05:41:44PM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Jan 14, 2020 at 03:55:21PM -0500, Alex Deucher wrote:
> > > > > > We've root caused the issue and clarified the quirk.
> > > > > > This also adds a new quirk for a new GPU.
> > > > > >
> > > > > > Alex Deucher (2):
> > > > > >   pci: Clarify ATS quirk
> > > > > >   pci: add ATS quirk for navi14 board (v2)
> > > > > >
> > > > > >  drivers/pci/quirks.c | 32 +++++++++++++++++++++++++-------
> > > > > >  1 file changed, 25 insertions(+), 7 deletions(-)
> > > > >
> > > > > I propose the following, which I intend to be functionally identical.
> > > > > It just doesn't repeat the pci_info() and pdev->ats_cap = 0.
> > > >
> > > > Applied to pci/misc for v5.6, thanks!
> > >
> > > Can we add this to stable as well?
> >
> > Done!  Do you want it in v5.5?  It's pretty localized so looks pretty
> > low-risk.
> 
> Sure.  Thanks!

Moved to for-linus for v5.5.