diff mbox series

[1/1] x86/vtd: Mask DMAR faults for IGD devices

Message ID 20200417133626.72302-2-brendank310@gmail.com (mailing list archive)
State New, archived
Headers show
Series [1/1] x86/vtd: Mask DMAR faults for IGD devices | expand

Commit Message

Brendan Kerrigan April 17, 2020, 1:36 p.m. UTC
From: Brendan Kerrigan <kerriganb@ainfosec.com>

The Intel graphics device records DMAR faults regardless
of the Fault Process Disable bit being set. When this fault
occurs, enable the Interrupt Mask (IM) bit in the Fault
Event Control (FECTL) register to prevent the continued
recording of the fault.

Signed-off-by: Brendan Kerrigan <kerriganb@ainfosec.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Tian, Kevin April 20, 2020, 3:28 a.m. UTC | #1
> From: Brendan Kerrigan <brendank310@gmail.com>
> Sent: Friday, April 17, 2020 9:36 PM
> 
> From: Brendan Kerrigan <kerriganb@ainfosec.com>
> 
> The Intel graphics device records DMAR faults regardless
> of the Fault Process Disable bit being set. When this fault

Since this is an errata for specific generations, let's describe
this way instead of stating it as a generic problem.

> occurs, enable the Interrupt Mask (IM) bit in the Fault
> Event Control (FECTL) register to prevent the continued
> recording of the fault.
> 
> Signed-off-by: Brendan Kerrigan <kerriganb@ainfosec.com>
> ---
>  xen/drivers/passthrough/vtd/iommu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 07d40b37fe..288399d816 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -41,6 +41,8 @@
>  #include "vtd.h"
>  #include "../ats.h"
> 
> +#define IS_IGD(seg, id) (0 == seg && 0 == PCI_BUS(id) && 2 == PCI_SLOT(id)
> && 0 == PCI_FUNC(id))
> +
>  struct mapped_rmrr {
>      struct list_head list;
>      u64 base, end;
> @@ -872,6 +874,13 @@ static int iommu_page_fault_do_one(struct
> vtd_iommu *iommu, int type,
>      printk(XENLOG_G_WARNING VTDPREFIX "%s: reason %02x - %s\n",
>             kind, fault_reason, reason);
> 
> +    if ( DMA_REMAP == fault_type && type && IS_IGD(seg, source_id) ) {
> +        u32 fectl = dmar_readl(iommu->reg, DMAR_FECTL_REG);
> +        fectl |= DMA_FECTL_IM;
> +        dmar_writel(iommu->reg, DMAR_FECTL_REG, fectl);
> +        printk(XENLOG_G_WARNING VTDPREFIX "Disabling DMAR faults for
> IGD\n");
> +    }
> +

Several questions. First, I just note that FPD is not touched by any code
today. then how is this errata being caught? Second, we already have
pci_check_disable_device in place which stops DMAs from the problematic
device if it triggers excessive fault reports. why doesn't it work for your
case? Last, dma_msi_end just forces clearing the mask bit at end of handling
the fault interrupt, making above fix meaningless. Those questions just
make me wonder how you actually test this patch and whether it's necessary...

Thanks
Kevin

>      if ( iommu_verbose && fault_type == DMA_REMAP )
>          print_vtd_entries(iommu, PCI_BUS(source_id), PCI_DEVFN2(source_id),
>                            addr >> PAGE_SHIFT);
> --
> 2.17.1
Brendan Kerrigan April 20, 2020, 1:20 p.m. UTC | #2
While it's described as errata for gen8/9, the previous reporting was from
2015 which predates those generations. I tested it on a gen 7 box (which
was causing me the grief of consuming my Xen console buffer). It could be
the case that the FPD code isn't implemented (which wouldn't matter for
gen8/9 because it is broken), and the original problem of faulty firmware
reporting bad ranges is the ultimate culprit. As far as the last two
questions, I was testing on an older version of Xen (4.9.x) and ported it
to master. Happy to hear a better approach to solving the original problem.

-Brendan

On Sun, Apr 19, 2020 at 11:28 PM Tian, Kevin <kevin.tian@intel.com> wrote:

> > From: Brendan Kerrigan <brendank310@gmail.com>
> > Sent: Friday, April 17, 2020 9:36 PM
> >
> > From: Brendan Kerrigan <kerriganb@ainfosec.com>
> >
> > The Intel graphics device records DMAR faults regardless
> > of the Fault Process Disable bit being set. When this fault
>
> Since this is an errata for specific generations, let's describe
> this way instead of stating it as a generic problem.
>
> > occurs, enable the Interrupt Mask (IM) bit in the Fault
> > Event Control (FECTL) register to prevent the continued
> > recording of the fault.
> >
> > Signed-off-by: Brendan Kerrigan <kerriganb@ainfosec.com>
> > ---
> >  xen/drivers/passthrough/vtd/iommu.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/vtd/iommu.c
> > b/xen/drivers/passthrough/vtd/iommu.c
> > index 07d40b37fe..288399d816 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -41,6 +41,8 @@
> >  #include "vtd.h"
> >  #include "../ats.h"
> >
> > +#define IS_IGD(seg, id) (0 == seg && 0 == PCI_BUS(id) && 2 ==
> PCI_SLOT(id)
> > && 0 == PCI_FUNC(id))
> > +
> >  struct mapped_rmrr {
> >      struct list_head list;
> >      u64 base, end;
> > @@ -872,6 +874,13 @@ static int iommu_page_fault_do_one(struct
> > vtd_iommu *iommu, int type,
> >      printk(XENLOG_G_WARNING VTDPREFIX "%s: reason %02x - %s\n",
> >             kind, fault_reason, reason);
> >
> > +    if ( DMA_REMAP == fault_type && type && IS_IGD(seg, source_id) ) {
> > +        u32 fectl = dmar_readl(iommu->reg, DMAR_FECTL_REG);
> > +        fectl |= DMA_FECTL_IM;
> > +        dmar_writel(iommu->reg, DMAR_FECTL_REG, fectl);
> > +        printk(XENLOG_G_WARNING VTDPREFIX "Disabling DMAR faults for
> > IGD\n");
> > +    }
> > +
>
> Several questions. First, I just note that FPD is not touched by any code
> today. then how is this errata being caught? Second, we already have
> pci_check_disable_device in place which stops DMAs from the problematic
> device if it triggers excessive fault reports. why doesn't it work for your
> case? Last, dma_msi_end just forces clearing the mask bit at end of
> handling
> the fault interrupt, making above fix meaningless. Those questions just
> make me wonder how you actually test this patch and whether it's
> necessary...
>
> Thanks
> Kevin
>
> >      if ( iommu_verbose && fault_type == DMA_REMAP )
> >          print_vtd_entries(iommu, PCI_BUS(source_id),
> PCI_DEVFN2(source_id),
> >                            addr >> PAGE_SHIFT);
> > --
> > 2.17.1
>
>
Jan Beulich April 21, 2020, 10:01 a.m. UTC | #3
On 17.04.2020 15:36, Brendan Kerrigan wrote:
> The Intel graphics device records DMAR faults regardless
> of the Fault Process Disable bit being set.

I can't seem to be able to find a place where we would set FPD.
Why do we need the workaround then, or what am I missing?

> When this fault
> occurs, enable the Interrupt Mask (IM) bit in the Fault
> Event Control (FECTL) register to prevent the continued
> recording of the fault.

This should mention the underlying erratum in one way or another.

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -41,6 +41,8 @@
>  #include "vtd.h"
>  #include "../ats.h"
>  
> +#define IS_IGD(seg, id) (0 == seg && 0 == PCI_BUS(id) && 2 == PCI_SLOT(id) && 0 == PCI_FUNC(id))
> +
>  struct mapped_rmrr {
>      struct list_head list;
>      u64 base, end;
> @@ -872,6 +874,13 @@ static int iommu_page_fault_do_one(struct vtd_iommu *iommu, int type,
>      printk(XENLOG_G_WARNING VTDPREFIX "%s: reason %02x - %s\n",
>             kind, fault_reason, reason);
>  
> +    if ( DMA_REMAP == fault_type && type && IS_IGD(seg, source_id) ) {

Using existing infrastructure would be at least some improvement, in
particular to avoid this workaround triggering on unaffected
hardware (including such where there's something else at 0000:00:02.0).
See e.g. is_igd_drhd(), and note that most uses of it are currently
further qualified to limit what hardware quirk workarounds get applied
to. In any event you'll want to clarify in the description that this
won't ever be done on hardware not having this issue, at least as long
as this isn't obvious from the code.

Also - why the "type" part of the conditional? The erratum text
doesn't talk about only DMA reads being affected.

Finally, style-wise the brace belongs on its own line.

> +        u32 fectl = dmar_readl(iommu->reg, DMAR_FECTL_REG);
> +        fectl |= DMA_FECTL_IM;

While this is the recommended workaround, I don't see you undoing
the masking anywhere later, and I'm also unconvinced this should be
done upon seeing the first fault. One option might be to do so on the
the first fault when FPD is set for the offending device. Or else,
following the title, it might want/need doing unconditionally at
initialization time.

> +        dmar_writel(iommu->reg, DMAR_FECTL_REG, fectl);
> +        printk(XENLOG_G_WARNING VTDPREFIX "Disabling DMAR faults for IGD\n");

If, as suggested above, IM will get cleared again at some point,
this printk() should imo not be issued more than once.

Jan
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 07d40b37fe..288399d816 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -41,6 +41,8 @@ 
 #include "vtd.h"
 #include "../ats.h"
 
+#define IS_IGD(seg, id) (0 == seg && 0 == PCI_BUS(id) && 2 == PCI_SLOT(id) && 0 == PCI_FUNC(id))
+
 struct mapped_rmrr {
     struct list_head list;
     u64 base, end;
@@ -872,6 +874,13 @@  static int iommu_page_fault_do_one(struct vtd_iommu *iommu, int type,
     printk(XENLOG_G_WARNING VTDPREFIX "%s: reason %02x - %s\n",
            kind, fault_reason, reason);
 
+    if ( DMA_REMAP == fault_type && type && IS_IGD(seg, source_id) ) {
+        u32 fectl = dmar_readl(iommu->reg, DMAR_FECTL_REG);
+        fectl |= DMA_FECTL_IM;
+        dmar_writel(iommu->reg, DMAR_FECTL_REG, fectl);
+        printk(XENLOG_G_WARNING VTDPREFIX "Disabling DMAR faults for IGD\n");
+    }
+
     if ( iommu_verbose && fault_type == DMA_REMAP )
         print_vtd_entries(iommu, PCI_BUS(source_id), PCI_DEVFN2(source_id),
                           addr >> PAGE_SHIFT);