diff mbox series

[for-4.17,v2,4/5] pci: do not disable memory decoding for devices

Message ID 20221025144418.66800-5-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series (v)pci: fixes related to memory decoding handling | expand

Commit Message

Roger Pau Monne Oct. 25, 2022, 2:44 p.m. UTC
Commit 75cc460a1b added checks to ensure the position of the BARs from
PCI devices don't overlap with regions defined on the memory map.
When there's a collision memory decoding is left disabled for the
device, assuming that dom0 will reposition the BAR if necessary and
enable memory decoding.

While this would be the case for devices being used by dom0, devices
being used by the firmware itself that have no driver would usually be
left with memory decoding disabled by dom0 if that's the state dom0
found them in, and thus firmware trying to make use of them will not
function correctly.

The initial intent of 75cc460a1b was to prevent vPCI from creating
MMIO mappings on the dom0 p2m over regions that would otherwise
already have mappings established.  It's my view now that we likely
went too far with 75cc460a1b, and Xen disabling memory decoding of
devices (as buggy as they might be) is harmful, and reduces the set of
hardware on which Xen works.

This commits reverts most of 75cc460a1b, and instead adds checks to
vPCI in order to prevent misplaced BARs from being added to the
hardware domain p2m.  Signaling on whether BARs are mapped is tracked
in the vpci structure, so that misplaced BARs are not mapped, and thus
Xen won't attempt to unmap them when memory decoding is disabled.

This restores the behavior of Xen for PV dom0 to the state it was
previous to 75cc460a1b, while also introducing a more contained fix
for the vPCI BAR mapping issues.

Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
AT Citrix we have a system with a device with the following BARs:

BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
BAR [0, 0x1fff] -> not positioned, outside host bridge window

And memory decoding enabled by the firmware.  With the current code
(or any of the previous fix proposals), Xen would still disable memory
decoding for the device, and the system will freeze when attempting to
set EFI vars.

I'm afraid the best solution to avoid regressions caused by 75cc460a1b
is the proposal here.
---
 xen/drivers/passthrough/pci.c | 69 -----------------------------------
 xen/drivers/vpci/header.c     | 22 ++++++++++-
 2 files changed, 20 insertions(+), 71 deletions(-)

Comments

Andrew Cooper Oct. 25, 2022, 2:57 p.m. UTC | #1
On 25/10/2022 15:44, Roger Pau Monne wrote:
> Commit 75cc460a1b added checks to ensure the position of the BARs from
> PCI devices don't overlap with regions defined on the memory map.
> When there's a collision memory decoding is left disabled for the
> device, assuming that dom0 will reposition the BAR if necessary and
> enable memory decoding.
>
> While this would be the case for devices being used by dom0, devices
> being used by the firmware itself that have no driver would usually be
> left with memory decoding disabled by dom0 if that's the state dom0
> found them in, and thus firmware trying to make use of them will not
> function correctly.
>
> The initial intent of 75cc460a1b was to prevent vPCI from creating
> MMIO mappings on the dom0 p2m over regions that would otherwise
> already have mappings established.  It's my view now that we likely
> went too far with 75cc460a1b, and Xen disabling memory decoding of
> devices (as buggy as they might be) is harmful, and reduces the set of
> hardware on which Xen works.
>
> This commits reverts most of 75cc460a1b, and instead adds checks to
> vPCI in order to prevent misplaced BARs from being added to the
> hardware domain p2m.  Signaling on whether BARs are mapped is tracked
> in the vpci structure, so that misplaced BARs are not mapped, and thus
> Xen won't attempt to unmap them when memory decoding is disabled.
>
> This restores the behavior of Xen for PV dom0 to the state it was
> previous to 75cc460a1b, while also introducing a more contained fix
> for the vPCI BAR mapping issues.
>
> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> AT Citrix we have a system with a device with the following BARs:
>
> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> BAR [0, 0x1fff] -> not positioned, outside host bridge window

This needs a bit more explanation (even if only in the email thread). 
The first item here is permitted under the UEFI spec, and exists on
production systems.  We've got several examples.

The second has only been seen on an SDP, and is hopefully a firmware bug
that doesn't get out to production, but we should boot nevertheless.

~Andrew
Jan Beulich Oct. 25, 2022, 3 p.m. UTC | #2
On 25.10.2022 16:57, Andrew Cooper wrote:
> On 25/10/2022 15:44, Roger Pau Monne wrote:
>> Commit 75cc460a1b added checks to ensure the position of the BARs from
>> PCI devices don't overlap with regions defined on the memory map.
>> When there's a collision memory decoding is left disabled for the
>> device, assuming that dom0 will reposition the BAR if necessary and
>> enable memory decoding.
>>
>> While this would be the case for devices being used by dom0, devices
>> being used by the firmware itself that have no driver would usually be
>> left with memory decoding disabled by dom0 if that's the state dom0
>> found them in, and thus firmware trying to make use of them will not
>> function correctly.
>>
>> The initial intent of 75cc460a1b was to prevent vPCI from creating
>> MMIO mappings on the dom0 p2m over regions that would otherwise
>> already have mappings established.  It's my view now that we likely
>> went too far with 75cc460a1b, and Xen disabling memory decoding of
>> devices (as buggy as they might be) is harmful, and reduces the set of
>> hardware on which Xen works.
>>
>> This commits reverts most of 75cc460a1b, and instead adds checks to
>> vPCI in order to prevent misplaced BARs from being added to the
>> hardware domain p2m.  Signaling on whether BARs are mapped is tracked
>> in the vpci structure, so that misplaced BARs are not mapped, and thus
>> Xen won't attempt to unmap them when memory decoding is disabled.
>>
>> This restores the behavior of Xen for PV dom0 to the state it was
>> previous to 75cc460a1b, while also introducing a more contained fix
>> for the vPCI BAR mapping issues.
>>
>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>> AT Citrix we have a system with a device with the following BARs:
>>
>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
> 
> This needs a bit more explanation (even if only in the email thread). 
> The first item here is permitted under the UEFI spec, and exists on
> production systems.  We've got several examples.

Afaict it is at best unclear whether this is really permitted. Generally
BARs are not supposed to be covered by memory map entries, be it E820 or
EFI.

Jan

> The second has only been seen on an SDP, and is hopefully a firmware bug
> that doesn't get out to production, but we should boot nevertheless.
> 
> ~Andrew
Roger Pau Monne Oct. 25, 2022, 3:56 p.m. UTC | #3
On Tue, Oct 25, 2022 at 02:57:57PM +0000, Andrew Cooper wrote:
> On 25/10/2022 15:44, Roger Pau Monne wrote:
> > Commit 75cc460a1b added checks to ensure the position of the BARs from
> > PCI devices don't overlap with regions defined on the memory map.
> > When there's a collision memory decoding is left disabled for the
> > device, assuming that dom0 will reposition the BAR if necessary and
> > enable memory decoding.
> >
> > While this would be the case for devices being used by dom0, devices
> > being used by the firmware itself that have no driver would usually be
> > left with memory decoding disabled by dom0 if that's the state dom0
> > found them in, and thus firmware trying to make use of them will not
> > function correctly.
> >
> > The initial intent of 75cc460a1b was to prevent vPCI from creating
> > MMIO mappings on the dom0 p2m over regions that would otherwise
> > already have mappings established.  It's my view now that we likely
> > went too far with 75cc460a1b, and Xen disabling memory decoding of
> > devices (as buggy as they might be) is harmful, and reduces the set of
> > hardware on which Xen works.
> >
> > This commits reverts most of 75cc460a1b, and instead adds checks to
> > vPCI in order to prevent misplaced BARs from being added to the
> > hardware domain p2m.  Signaling on whether BARs are mapped is tracked
> > in the vpci structure, so that misplaced BARs are not mapped, and thus
> > Xen won't attempt to unmap them when memory decoding is disabled.
> >
> > This restores the behavior of Xen for PV dom0 to the state it was
> > previous to 75cc460a1b, while also introducing a more contained fix
> > for the vPCI BAR mapping issues.
> >
> > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > AT Citrix we have a system with a device with the following BARs:
> >
> > BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> > BAR [0, 0x1fff] -> not positioned, outside host bridge window
> 
> This needs a bit more explanation (even if only in the email thread). 
> The first item here is permitted under the UEFI spec, and exists on
> production systems.  We've got several examples.
> 
> The second has only been seen on an SDP, and is hopefully a firmware bug
> that doesn't get out to production, but we should boot nevertheless.

I already saw the second on production systems, as is what triggered
the change in 75cc460a1b.  I might not have seen both in conjunction
on the same device on a production system.

Thanks, Roger.
Jan Beulich Oct. 26, 2022, 12:35 p.m. UTC | #4
On 25.10.2022 16:44, Roger Pau Monne wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -121,7 +121,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>          }
>  
>          if ( !rom_only &&
> -             (bar->type != VPCI_BAR_ROM || header->rom_enabled) )
> +             (bar->type != VPCI_BAR_ROM || header->rom_enabled) &&
> +             pci_check_bar(pdev, _mfn(PFN_DOWN(bar->addr)),
> +                           _mfn(PFN_DOWN(bar->addr + bar->size - 1))) )
>              bar->enabled = map;
>      }

What about the ROM handling immediately above from here?

> @@ -234,9 +236,25 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>  
>          if ( !MAPPABLE_BAR(bar) ||
>               (rom_only ? bar->type != VPCI_BAR_ROM
> -                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
> +                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) ||
> +             /* Skip BARs already in the requested state. */
> +             bar->enabled == !!(cmd & PCI_COMMAND_MEMORY) )
>              continue;
>  
> +        /*
> +         * Only do BAR position checks for the hardware domain, for guests it's
> +         * assumed that the hardware domain has changed the position of any
> +         * problematic BARs.
> +         */
> +        if ( is_hardware_domain(pdev->domain) &&
> +             !pci_check_bar(pdev, _mfn(start), _mfn(end)) )
> +        {
> +            printk(XENLOG_G_WARNING
> +                   "%pp: not mapping BAR [%lx, %lx] invalid position\n",
> +                   &pdev->sbdf, start, end);
> +            continue;
> +        }

I'm not convinced of it being appropriate to skip the check for DomU.
I'd rather consider this a "fixme", as (perhaps somewhere else) we
should return an error if a misconfigured device was passed. We cannot
blindly leave the security of the system to tool stack + Dom0 kernel
imo.

And then, if this is Dom0-only, I think it wants to be XENLOG_WARNING,
i.e. without the G infix.

Jan
Roger Pau Monne Oct. 26, 2022, 12:58 p.m. UTC | #5
On Wed, Oct 26, 2022 at 02:35:44PM +0200, Jan Beulich wrote:
> On 25.10.2022 16:44, Roger Pau Monne wrote:
> > --- a/xen/drivers/vpci/header.c
> > +++ b/xen/drivers/vpci/header.c
> > @@ -121,7 +121,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
> >          }
> >  
> >          if ( !rom_only &&
> > -             (bar->type != VPCI_BAR_ROM || header->rom_enabled) )
> > +             (bar->type != VPCI_BAR_ROM || header->rom_enabled) &&
> > +             pci_check_bar(pdev, _mfn(PFN_DOWN(bar->addr)),
> > +                           _mfn(PFN_DOWN(bar->addr + bar->size - 1))) )
> >              bar->enabled = map;
> >      }
> 
> What about the ROM handling immediately above from here?

Needs fixing indeed.  I had plans to short circuit the ROM only
mapping path earlier if the BAR wasn't correctly positioned, but
decided it was too complicated and then forgot to adjust the
conditional.

> > @@ -234,9 +236,25 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >  
> >          if ( !MAPPABLE_BAR(bar) ||
> >               (rom_only ? bar->type != VPCI_BAR_ROM
> > -                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
> > +                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) ||
> > +             /* Skip BARs already in the requested state. */
> > +             bar->enabled == !!(cmd & PCI_COMMAND_MEMORY) )
> >              continue;
> >  
> > +        /*
> > +         * Only do BAR position checks for the hardware domain, for guests it's
> > +         * assumed that the hardware domain has changed the position of any
> > +         * problematic BARs.
> > +         */
> > +        if ( is_hardware_domain(pdev->domain) &&
> > +             !pci_check_bar(pdev, _mfn(start), _mfn(end)) )
> > +        {
> > +            printk(XENLOG_G_WARNING
> > +                   "%pp: not mapping BAR [%lx, %lx] invalid position\n",
> > +                   &pdev->sbdf, start, end);
> > +            continue;
> > +        }
> 
> I'm not convinced of it being appropriate to skip the check for DomU.
> I'd rather consider this a "fixme", as (perhaps somewhere else) we
> should return an error if a misconfigured device was passed. We cannot
> blindly leave the security of the system to tool stack + Dom0 kernel
> imo.
> 
> And then, if this is Dom0-only, I think it wants to be XENLOG_WARNING,
> i.e. without the G infix.

OK, I don't mind removing the is_hardware_domain() condition, it's
still not clear how we want to handle all this when domU support is
added.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 149f68bb6e..b42acb8d7c 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -233,9 +233,6 @@  static void check_pdev(const struct pci_dev *pdev)
      PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
      PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)
     u16 val;
-    unsigned int nbars = 0, rom_pos = 0, i;
-    static const char warn[] = XENLOG_WARNING
-        "%pp disabled: %sBAR [%#lx, %#lx] overlaps with memory map\n";
 
     if ( command_mask )
     {
@@ -254,8 +251,6 @@  static void check_pdev(const struct pci_dev *pdev)
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
     case PCI_HEADER_TYPE_BRIDGE:
-        nbars = PCI_HEADER_BRIDGE_NR_BARS;
-        rom_pos = PCI_ROM_ADDRESS1;
         if ( !bridge_ctl_mask )
             break;
         val = pci_conf_read16(pdev->sbdf, PCI_BRIDGE_CONTROL);
@@ -272,75 +267,11 @@  static void check_pdev(const struct pci_dev *pdev)
         }
         break;
 
-    case PCI_HEADER_TYPE_NORMAL:
-        nbars = PCI_HEADER_NORMAL_NR_BARS;
-        rom_pos = PCI_ROM_ADDRESS;
-        break;
-
     case PCI_HEADER_TYPE_CARDBUS:
         /* TODO */
         break;
     }
 #undef PCI_STATUS_CHECK
-
-    /* Check if BARs overlap with other memory regions. */
-    val = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
-    if ( !(val & PCI_COMMAND_MEMORY) || pdev->ignore_bars )
-        return;
-
-    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val & ~PCI_COMMAND_MEMORY);
-    for ( i = 0; i < nbars; )
-    {
-        uint64_t addr, size;
-        unsigned int reg = PCI_BASE_ADDRESS_0 + i * 4;
-        int rc = 1;
-
-        if ( (pci_conf_read32(pdev->sbdf, reg) & PCI_BASE_ADDRESS_SPACE) !=
-             PCI_BASE_ADDRESS_SPACE_MEMORY )
-            goto next;
-
-        rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
-                              (i == nbars - 1) ? PCI_BAR_LAST : 0);
-        if ( rc < 0 )
-            /* Unable to size, better leave memory decoding disabled. */
-            return;
-        if ( size && !pci_check_bar(pdev, maddr_to_mfn(addr),
-                                    maddr_to_mfn(addr + size - 1)) )
-        {
-            /*
-             * Return without enabling memory decoding if BAR position is not
-             * in IO suitable memory. Let the hardware domain re-position the
-             * BAR.
-             */
-            printk(warn,
-                   &pdev->sbdf, "", PFN_DOWN(addr), PFN_DOWN(addr + size - 1));
-            return;
-        }
-
- next:
-        ASSERT(rc > 0);
-        i += rc;
-    }
-
-    if ( rom_pos &&
-         (pci_conf_read32(pdev->sbdf, rom_pos) & PCI_ROM_ADDRESS_ENABLE) )
-    {
-        uint64_t addr, size;
-        int rc = pci_size_mem_bar(pdev->sbdf, rom_pos, &addr, &size,
-                                  PCI_BAR_ROM);
-
-        if ( rc < 0 )
-            return;
-        if ( size && !pci_check_bar(pdev, maddr_to_mfn(addr),
-                                    maddr_to_mfn(addr + size - 1)) )
-        {
-            printk(warn, &pdev->sbdf, "ROM ", PFN_DOWN(addr),
-                   PFN_DOWN(addr + size - 1));
-            return;
-        }
-    }
-
-    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val);
 }
 
 static void apply_quirks(struct pci_dev *pdev)
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index eb9219a49a..4d7f8f4a30 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -121,7 +121,9 @@  static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
         }
 
         if ( !rom_only &&
-             (bar->type != VPCI_BAR_ROM || header->rom_enabled) )
+             (bar->type != VPCI_BAR_ROM || header->rom_enabled) &&
+             pci_check_bar(pdev, _mfn(PFN_DOWN(bar->addr)),
+                           _mfn(PFN_DOWN(bar->addr + bar->size - 1))) )
             bar->enabled = map;
     }
 
@@ -234,9 +236,25 @@  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
-                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
+                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) ||
+             /* Skip BARs already in the requested state. */
+             bar->enabled == !!(cmd & PCI_COMMAND_MEMORY) )
             continue;
 
+        /*
+         * Only do BAR position checks for the hardware domain, for guests it's
+         * assumed that the hardware domain has changed the position of any
+         * problematic BARs.
+         */
+        if ( is_hardware_domain(pdev->domain) &&
+             !pci_check_bar(pdev, _mfn(start), _mfn(end)) )
+        {
+            printk(XENLOG_G_WARNING
+                   "%pp: not mapping BAR [%lx, %lx] invalid position\n",
+                   &pdev->sbdf, start, end);
+            continue;
+        }
+
         rc = rangeset_add_range(mem, start, end);
         if ( rc )
         {