diff mbox series

[v10,12/17] vpci/header: emulate PCI_COMMAND register for guests

Message ID 20231012220854.2736994-13-volodymyr_babchuk@epam.com (mailing list archive)
State Superseded
Headers show
Series PCI devices passthrough on Arm, part 3 | expand

Commit Message

Volodymyr Babchuk Oct. 12, 2023, 10:09 p.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's view of this will want to be zero initially, the host having set
it to 1 may not easily be overwritten with 0, or else we'd effectively
imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the guest's view
of the command register.

Here is the full list of command register bits with notes about
emulation, along with QEMU behavior in the same situation:

PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
in real device. Instead it is always set to 1. A guest can write to this
register, but writes are ignored.

PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
Xen case, we handle writes to this bit by mapping/unmapping BAR
regions. For devices assigned to DomUs, memory decoding will be
disabled and the initialization.

PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
writes to this bit.

PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
access to host bridge that supports software generation of special
cycles. In our case guest has no access to host bridges at all. Value
after reset is 0. QEMU passes through writes of this bit, we will do
the same.

PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
to be generated. It requires additional configuration via Cacheline
Size register. We are not emulating this register right now and we
can't expect guest to properly configure it. QEMU "emulates" access to
Cachline Size register by ignoring all writes to it. QEMU passes through
writes of PCI_COMMAND_INVALIDATE bit, we will do the same.

PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
through writes of this bit, we will do the same.

PCI_COMMAND_PARITY - Controls how device response to parity
errors. QEMU ignores writes to this bit, we will do the same.

PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
through writes of this bit, so we will do the same.

PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
writes to this bit, we will do the same.

PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
transactions. It is configured by firmware, so we don't want guest to
control it. QEMU ignores writes to this bit, we will do the same.

PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
enabled, device is prohibited from asserting INTx as per
specification. Value after reset is 0. In QEMU case, it checks of INTx
was mapped for a device. If it is not, then guest can't control
PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest to
change value of this bit if MSI(X) is enabled.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
In v10:
- Added cf_check attribute to guest_cmd_read
- Removed warning about non-zero cmd
- Updated comment MSI code regarding disabling INTX
- Used ternary operator in vpci_add_register() call
- Disable memory decoding for DomUs in init_bars()
In v9:
- Reworked guest_cmd_read
- Added handling for more bits
Since v6:
- fold guest's logic into cmd_write
- implement cmd_read, so we can report emulated INTx state to guests
- introduce header->guest_cmd to hold the emulated state of the
  PCI_COMMAND register for guests
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c | 44 +++++++++++++++++++++++++++++++++++----
 xen/drivers/vpci/msi.c    |  6 ++++++
 xen/drivers/vpci/msix.c   |  4 ++++
 xen/include/xen/vpci.h    |  3 +++
 4 files changed, 53 insertions(+), 4 deletions(-)

Comments

Volodymyr Babchuk Oct. 13, 2023, 9:53 p.m. UTC | #1
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> writes:

> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
> guest's view of this will want to be zero initially, the host having set
> it to 1 may not easily be overwritten with 0, or else we'd effectively
> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
> proper emulation in order to honor host's settings.
>
> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
> Device Control" the reset state of the command register is typically 0,
> so when assigning a PCI device use 0 as the initial state for the guest's view
> of the command register.
>
> Here is the full list of command register bits with notes about
> emulation, along with QEMU behavior in the same situation:
>
> PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
> in real device. Instead it is always set to 1. A guest can write to this
> register, but writes are ignored.
>
> PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
> Xen case, we handle writes to this bit by mapping/unmapping BAR
> regions. For devices assigned to DomUs, memory decoding will be
> disabled and the initialization.
>
> PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
> writes to this bit.
>
> PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
> access to host bridge that supports software generation of special
> cycles. In our case guest has no access to host bridges at all. Value
> after reset is 0. QEMU passes through writes of this bit, we will do
> the same.
>
> PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
> to be generated. It requires additional configuration via Cacheline
> Size register. We are not emulating this register right now and we
> can't expect guest to properly configure it. QEMU "emulates" access to
> Cachline Size register by ignoring all writes to it. QEMU passes through
> writes of PCI_COMMAND_INVALIDATE bit, we will do the same.
>
> PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
> through writes of this bit, we will do the same.
>
> PCI_COMMAND_PARITY - Controls how device response to parity
> errors. QEMU ignores writes to this bit, we will do the same.
>
> PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
> through writes of this bit, so we will do the same.
>
> PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
> writes to this bit, we will do the same.
>
> PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
> transactions. It is configured by firmware, so we don't want guest to
> control it. QEMU ignores writes to this bit, we will do the same.
>
> PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
> enabled, device is prohibited from asserting INTx as per
> specification. Value after reset is 0. In QEMU case, it checks of INTx
> was mapped for a device. If it is not, then guest can't control
> PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest to
> change value of this bit if MSI(X) is enabled.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> In v10:
> - Added cf_check attribute to guest_cmd_read
> - Removed warning about non-zero cmd
> - Updated comment MSI code regarding disabling INTX
> - Used ternary operator in vpci_add_register() call
> - Disable memory decoding for DomUs in init_bars()
> In v9:
> - Reworked guest_cmd_read
> - Added handling for more bits
> Since v6:
> - fold guest's logic into cmd_write
> - implement cmd_read, so we can report emulated INTx state to guests
> - introduce header->guest_cmd to hold the emulated state of the
>   PCI_COMMAND register for guests
> Since v5:
> - add additional check for MSI-X enabled while altering INTX bit
> - make sure INTx disabled while guests enable MSI/MSI-X
> Since v3:
> - gate more code on CONFIG_HAS_MSI
> - removed logic for the case when MSI/MSI-X not enabled
> ---
>  xen/drivers/vpci/header.c | 44 +++++++++++++++++++++++++++++++++++----
>  xen/drivers/vpci/msi.c    |  6 ++++++
>  xen/drivers/vpci/msix.c   |  4 ++++
>  xen/include/xen/vpci.h    |  3 +++
>  4 files changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index efce0bc2ae..e8eed6a674 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -501,14 +501,32 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>      return 0;
>  }
>  
> +/* TODO: Add proper emulation for all bits of the command register. */
>  static void cf_check cmd_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
>  {
>      struct vpci_header *header = data;
>  
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        const struct vpci *vpci = pdev->vpci;
> +        uint16_t excluded = PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
> +            PCI_COMMAND_FAST_BACK;
> +
> +        header->guest_cmd = cmd;
> +
> +        if ( (vpci->msi && vpci->msi->enabled) ||
> +             (vpci->msix && vpci->msi->enabled) )

There is a nasty mistake. It should be
                (vpci->msix && vpci->msix->enabled)

> +            excluded |= PCI_COMMAND_INTX_DISABLE;
> +
> +        cmd &= ~excluded;
> +        cmd |= pci_conf_read16(pdev->sbdf, reg) & excluded;
> +    }
> +
>      /*
> -     * Let Dom0 play with all the bits directly except for the memory
> -     * decoding one.
> +     * Let guest play with all the bits directly except for the memory
> +     * decoding one. Bits that are not allowed for DomU are already
> +     * handled above.
>       */
>      if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
>          /*
> @@ -522,6 +540,14 @@ static void cf_check cmd_write(
>          pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> +static uint32_t cf_check guest_cmd_read(
> +    const struct pci_dev *pdev, unsigned int reg, void *data)
> +{
> +    const struct vpci_header *header = data;
> +
> +    return header->guest_cmd;
> +}
> +
>  static void cf_check bar_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> @@ -737,8 +763,9 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      }
>  
>      /* Setup a handler for the command register. */
> -    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
> -                           2, header);
> +    rc = vpci_add_register(pdev->vpci,
> +                           is_hwdom ? vpci_hw_read16 : guest_cmd_read,
> +                           cmd_write, PCI_COMMAND, 2, header);
>      if ( rc )
>          return rc;
>  
> @@ -750,6 +777,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      if ( cmd & PCI_COMMAND_MEMORY )
>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
>  
> +    /*
> +     * Clear PCI_COMMAND_MEMORY for DomUs, so they will always start with
> +     * memory decoding disabled and to ensure that we will not call modify_bars()
> +     * at the end of this function.
> +     */
> +    if ( !is_hwdom )
> +        cmd &= ~PCI_COMMAND_MEMORY;
> +    header->guest_cmd = cmd;
> +
>      for ( i = 0; i < num_bars; i++ )
>      {
>          uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 2faa54b7ce..0920bd071f 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -70,6 +70,12 @@ static void cf_check control_write(
>  
>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>              return;
> +
> +        /*
> +         * Make sure guest doesn't enable INTx while enabling MSI.
> +         */
> +        if ( !is_hardware_domain(pdev->domain) )
> +            pci_intx(pdev, false);
>      }
>      else
>          vpci_msi_arch_disable(msi, pdev);
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index b6abab47ef..9d0233d0e3 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -97,6 +97,10 @@ static void cf_check control_write(
>          for ( i = 0; i < msix->max_entries; i++ )
>              if ( !msix->entries[i].masked && msix->entries[i].updated )
>                  update_entry(&msix->entries[i], pdev, i);
> +
> +        /* Make sure guest doesn't enable INTx while enabling MSI-X. */
> +        if ( !is_hardware_domain(pdev->domain) )
> +            pci_intx(pdev, false);
>      }
>      else if ( !new_enabled && msix->enabled )
>      {
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index c5301e284f..60bdc10c13 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -87,6 +87,9 @@ struct vpci {
>          } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
>          /* At most 6 BARS + 1 expansion ROM BAR. */
>  
> +        /* Guest view of the PCI_COMMAND register. */
> +        uint16_t guest_cmd;
> +
>          /*
>           * Store whether the ROM enable bit is set (doesn't imply ROM BAR
>           * is mapped into guest p2m) if there's a ROM BAR on the device.
Roger Pau Monné Nov. 21, 2023, 2:17 p.m. UTC | #2
On Thu, Oct 12, 2023 at 10:09:18PM +0000, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
> guest's view of this will want to be zero initially, the host having set
> it to 1 may not easily be overwritten with 0, or else we'd effectively
> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
> proper emulation in order to honor host's settings.
> 
> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
> Device Control" the reset state of the command register is typically 0,
> so when assigning a PCI device use 0 as the initial state for the guest's view
> of the command register.
> 
> Here is the full list of command register bits with notes about
> emulation, along with QEMU behavior in the same situation:
> 
> PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
> in real device. Instead it is always set to 1. A guest can write to this
> register, but writes are ignored.
> 
> PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
> Xen case, we handle writes to this bit by mapping/unmapping BAR
> regions. For devices assigned to DomUs, memory decoding will be
> disabled and the initialization.
> 
> PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
> writes to this bit.
> 
> PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
> access to host bridge that supports software generation of special
> cycles. In our case guest has no access to host bridges at all. Value
> after reset is 0. QEMU passes through writes of this bit, we will do
> the same.
> 
> PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
> to be generated. It requires additional configuration via Cacheline
> Size register. We are not emulating this register right now and we
> can't expect guest to properly configure it. QEMU "emulates" access to
> Cachline Size register by ignoring all writes to it. QEMU passes through
> writes of PCI_COMMAND_INVALIDATE bit, we will do the same.
> 
> PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
> through writes of this bit, we will do the same.
> 
> PCI_COMMAND_PARITY - Controls how device response to parity
> errors. QEMU ignores writes to this bit, we will do the same.
> 
> PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
> through writes of this bit, so we will do the same.
> 
> PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
> writes to this bit, we will do the same.
> 
> PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
> transactions. It is configured by firmware, so we don't want guest to
> control it. QEMU ignores writes to this bit, we will do the same.
> 
> PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
> enabled, device is prohibited from asserting INTx as per
> specification. Value after reset is 0. In QEMU case, it checks of INTx
> was mapped for a device. If it is not, then guest can't control
> PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest to
> change value of this bit if MSI(X) is enabled.

FWIW, bits 11-15 are RsvdP, so we will need to add support for them
after the series from Stewart that adds support for register masks is
accepted.

> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
> In v10:
> - Added cf_check attribute to guest_cmd_read
> - Removed warning about non-zero cmd
> - Updated comment MSI code regarding disabling INTX
> - Used ternary operator in vpci_add_register() call
> - Disable memory decoding for DomUs in init_bars()
> In v9:
> - Reworked guest_cmd_read
> - Added handling for more bits
> Since v6:
> - fold guest's logic into cmd_write
> - implement cmd_read, so we can report emulated INTx state to guests
> - introduce header->guest_cmd to hold the emulated state of the
>   PCI_COMMAND register for guests
> Since v5:
> - add additional check for MSI-X enabled while altering INTX bit
> - make sure INTx disabled while guests enable MSI/MSI-X
> Since v3:
> - gate more code on CONFIG_HAS_MSI
> - removed logic for the case when MSI/MSI-X not enabled
> ---
>  xen/drivers/vpci/header.c | 44 +++++++++++++++++++++++++++++++++++----
>  xen/drivers/vpci/msi.c    |  6 ++++++
>  xen/drivers/vpci/msix.c   |  4 ++++
>  xen/include/xen/vpci.h    |  3 +++
>  4 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index efce0bc2ae..e8eed6a674 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -501,14 +501,32 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>      return 0;
>  }
>  
> +/* TODO: Add proper emulation for all bits of the command register. */
>  static void cf_check cmd_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
>  {
>      struct vpci_header *header = data;
>  
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        const struct vpci *vpci = pdev->vpci;
> +        uint16_t excluded = PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
> +            PCI_COMMAND_FAST_BACK;

You could implement those bits using the RsvdP mask also.

You seem to miss PCI_COMMAND_IO?  In the commit message you note that
writes are ignored, yet here you seem to pass through writes to the
underlying device (which might be OK, but needs to be coherent with
the commit message).

> +
> +        header->guest_cmd = cmd;

I'm kind of unsure whether we want to fake the guest view by returning
what the guest writes.

> +
> +        if ( (vpci->msi && vpci->msi->enabled) ||
> +             (vpci->msix && vpci->msi->enabled) )

The typo that you mentioned about msi vs msix.

> +            excluded |= PCI_COMMAND_INTX_DISABLE;
> +
> +        cmd &= ~excluded;
> +        cmd |= pci_conf_read16(pdev->sbdf, reg) & excluded;
> +    }
> +
>      /*
> -     * Let Dom0 play with all the bits directly except for the memory
> -     * decoding one.
> +     * Let guest play with all the bits directly except for the memory
> +     * decoding one. Bits that are not allowed for DomU are already
> +     * handled above.

I think this should be: "Let Dom0 play with all the bits directly ..."
as you mention both Dom0 and DomU.

>       */
>      if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
>          /*
> @@ -522,6 +540,14 @@ static void cf_check cmd_write(
>          pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> +static uint32_t cf_check guest_cmd_read(
> +    const struct pci_dev *pdev, unsigned int reg, void *data)
> +{
> +    const struct vpci_header *header = data;
> +
> +    return header->guest_cmd;
> +}
> +
>  static void cf_check bar_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> @@ -737,8 +763,9 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      }
>  
>      /* Setup a handler for the command register. */
> -    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
> -                           2, header);
> +    rc = vpci_add_register(pdev->vpci,
> +                           is_hwdom ? vpci_hw_read16 : guest_cmd_read,
> +                           cmd_write, PCI_COMMAND, 2, header);
>      if ( rc )
>          return rc;
>  
> @@ -750,6 +777,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      if ( cmd & PCI_COMMAND_MEMORY )
>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
>  
> +    /*
> +     * Clear PCI_COMMAND_MEMORY for DomUs, so they will always start with
> +     * memory decoding disabled and to ensure that we will not call modify_bars()
> +     * at the end of this function.
> +     */
> +    if ( !is_hwdom )
> +        cmd &= ~PCI_COMMAND_MEMORY;

Just for symmetry I would also disable PCI_COMMAND_IO.

I do wonder in which states does SeaBIOS or OVMF expects to find the
devices.

> +    header->guest_cmd = cmd;
> +
>      for ( i = 0; i < num_bars; i++ )
>      {
>          uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 2faa54b7ce..0920bd071f 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -70,6 +70,12 @@ static void cf_check control_write(
>  
>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>              return;
> +
> +        /*
> +         * Make sure guest doesn't enable INTx while enabling MSI.
> +         */
> +        if ( !is_hardware_domain(pdev->domain) )
> +            pci_intx(pdev, false);
>      }
>      else
>          vpci_msi_arch_disable(msi, pdev);
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index b6abab47ef..9d0233d0e3 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -97,6 +97,10 @@ static void cf_check control_write(
>          for ( i = 0; i < msix->max_entries; i++ )
>              if ( !msix->entries[i].masked && msix->entries[i].updated )
>                  update_entry(&msix->entries[i], pdev, i);
> +
> +        /* Make sure guest doesn't enable INTx while enabling MSI-X. */
> +        if ( !is_hardware_domain(pdev->domain) )
> +            pci_intx(pdev, false);

Note that if both new_enabled and new_masked are set, you won't get
inside of this condition, and that could lead to MSIX being enabled
with INTx set in the command register (albeit with the maskall bit
set).

You might have to add a new check before the pci_conf_write16() that
disables INTx if `new_enabled && !msix->enabled`.

>      }
>      else if ( !new_enabled && msix->enabled )
>      {
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index c5301e284f..60bdc10c13 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -87,6 +87,9 @@ struct vpci {
>          } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
>          /* At most 6 BARS + 1 expansion ROM BAR. */
>  
> +        /* Guest view of the PCI_COMMAND register. */

Maybe we want to add '(domU only)' to the comment.

Thanks, Roger.
Volodymyr Babchuk Dec. 1, 2023, 2:05 a.m. UTC | #3
Hi Roger, Stewart,

Roger Pau Monné <roger.pau@citrix.com> writes:

> On Thu, Oct 12, 2023 at 10:09:18PM +0000, Volodymyr Babchuk wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> 
>> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
>> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
>> guest's view of this will want to be zero initially, the host having set
>> it to 1 may not easily be overwritten with 0, or else we'd effectively
>> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
>> proper emulation in order to honor host's settings.
>> 
>> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
>> Device Control" the reset state of the command register is typically 0,
>> so when assigning a PCI device use 0 as the initial state for the guest's view
>> of the command register.
>> 
>> Here is the full list of command register bits with notes about
>> emulation, along with QEMU behavior in the same situation:
>> 
>> PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
>> in real device. Instead it is always set to 1. A guest can write to this
>> register, but writes are ignored.
>> 
>> PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
>> Xen case, we handle writes to this bit by mapping/unmapping BAR
>> regions. For devices assigned to DomUs, memory decoding will be
>> disabled and the initialization.
>> 
>> PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
>> writes to this bit.
>> 
>> PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
>> access to host bridge that supports software generation of special
>> cycles. In our case guest has no access to host bridges at all. Value
>> after reset is 0. QEMU passes through writes of this bit, we will do
>> the same.
>> 
>> PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
>> to be generated. It requires additional configuration via Cacheline
>> Size register. We are not emulating this register right now and we
>> can't expect guest to properly configure it. QEMU "emulates" access to
>> Cachline Size register by ignoring all writes to it. QEMU passes through
>> writes of PCI_COMMAND_INVALIDATE bit, we will do the same.
>> 
>> PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
>> through writes of this bit, we will do the same.
>> 
>> PCI_COMMAND_PARITY - Controls how device response to parity
>> errors. QEMU ignores writes to this bit, we will do the same.
>> 
>> PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
>> through writes of this bit, so we will do the same.
>> 
>> PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
>> writes to this bit, we will do the same.
>> 
>> PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
>> transactions. It is configured by firmware, so we don't want guest to
>> control it. QEMU ignores writes to this bit, we will do the same.
>> 
>> PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
>> enabled, device is prohibited from asserting INTx as per
>> specification. Value after reset is 0. In QEMU case, it checks of INTx
>> was mapped for a device. If it is not, then guest can't control
>> PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest to
>> change value of this bit if MSI(X) is enabled.
>
> FWIW, bits 11-15 are RsvdP, so we will need to add support for them
> after the series from Stewart that adds support for register masks is
> accepted.

Stewart's series implement much better register handling than this
patch. What about dropping this change at all in favor of Stewart's
implementation? I'll be 100% okay with this.

[...]
Roger Pau Monné Dec. 1, 2023, 9:04 a.m. UTC | #4
On Fri, Dec 01, 2023 at 02:05:54AM +0000, Volodymyr Babchuk wrote:
> 
> Hi Roger, Stewart,
> 
> Roger Pau Monné <roger.pau@citrix.com> writes:
> 
> > On Thu, Oct 12, 2023 at 10:09:18PM +0000, Volodymyr Babchuk wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >> 
> >> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
> >> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
> >> guest's view of this will want to be zero initially, the host having set
> >> it to 1 may not easily be overwritten with 0, or else we'd effectively
> >> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
> >> proper emulation in order to honor host's settings.
> >> 
> >> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
> >> Device Control" the reset state of the command register is typically 0,
> >> so when assigning a PCI device use 0 as the initial state for the guest's view
> >> of the command register.
> >> 
> >> Here is the full list of command register bits with notes about
> >> emulation, along with QEMU behavior in the same situation:
> >> 
> >> PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
> >> in real device. Instead it is always set to 1. A guest can write to this
> >> register, but writes are ignored.
> >> 
> >> PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
> >> Xen case, we handle writes to this bit by mapping/unmapping BAR
> >> regions. For devices assigned to DomUs, memory decoding will be
> >> disabled and the initialization.
> >> 
> >> PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
> >> writes to this bit.
> >> 
> >> PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
> >> access to host bridge that supports software generation of special
> >> cycles. In our case guest has no access to host bridges at all. Value
> >> after reset is 0. QEMU passes through writes of this bit, we will do
> >> the same.
> >> 
> >> PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
> >> to be generated. It requires additional configuration via Cacheline
> >> Size register. We are not emulating this register right now and we
> >> can't expect guest to properly configure it. QEMU "emulates" access to
> >> Cachline Size register by ignoring all writes to it. QEMU passes through
> >> writes of PCI_COMMAND_INVALIDATE bit, we will do the same.
> >> 
> >> PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
> >> through writes of this bit, we will do the same.
> >> 
> >> PCI_COMMAND_PARITY - Controls how device response to parity
> >> errors. QEMU ignores writes to this bit, we will do the same.
> >> 
> >> PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
> >> through writes of this bit, so we will do the same.
> >> 
> >> PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
> >> writes to this bit, we will do the same.
> >> 
> >> PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
> >> transactions. It is configured by firmware, so we don't want guest to
> >> control it. QEMU ignores writes to this bit, we will do the same.
> >> 
> >> PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
> >> enabled, device is prohibited from asserting INTx as per
> >> specification. Value after reset is 0. In QEMU case, it checks of INTx
> >> was mapped for a device. If it is not, then guest can't control
> >> PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest to
> >> change value of this bit if MSI(X) is enabled.
> >
> > FWIW, bits 11-15 are RsvdP, so we will need to add support for them
> > after the series from Stewart that adds support for register masks is
> > accepted.
> 
> Stewart's series implement much better register handling than this
> patch. What about dropping this change at all in favor of Stewart's
> implementation? I'll be 100% okay with this.

That's fine.  I expect Stewart's series will go in quite soon, and
then we can likely rework this commit on top of them?

Thanks, Roger.
Stewart Hildebrand Dec. 21, 2023, 10:58 p.m. UTC | #5
On 11/21/23 09:17, Roger Pau Monné wrote:
> On Thu, Oct 12, 2023 at 10:09:18PM +0000, Volodymyr Babchuk wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
>> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
>> guest's view of this will want to be zero initially, the host having set
>> it to 1 may not easily be overwritten with 0, or else we'd effectively
>> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
>> proper emulation in order to honor host's settings.
>>
>> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
>> Device Control" the reset state of the command register is typically 0,
>> so when assigning a PCI device use 0 as the initial state for the guest's view
>> of the command register.
>>
>> Here is the full list of command register bits with notes about
>> emulation, along with QEMU behavior in the same situation:
>>
>> PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
>> in real device. Instead it is always set to 1.

As far as I can tell QEMU only sets this bit to 1 (in hardware) if it exposes an I/O BAR to the guest.

>> A guest can write to this
>> register, but writes are ignored.

In Xen, I think we should use rsvdp_mask for domUs for now since we don't (yet) support I/O BARs for domUs in vPCI.

>>
>> PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
>> Xen case, we handle writes to this bit by mapping/unmapping BAR
>> regions. For devices assigned to DomUs, memory decoding will be
>> disabled and the initialization.
>>
>> PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
>> writes to this bit.
>>
>> PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
>> access to host bridge that supports software generation of special
>> cycles. In our case guest has no access to host bridges at all. Value
>> after reset is 0. QEMU passes through writes of this bit, we will do
>> the same.
>>
>> PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
>> to be generated. It requires additional configuration via Cacheline
>> Size register. We are not emulating this register right now and we
>> can't expect guest to properly configure it. QEMU "emulates" access to
>> Cachline Size register by ignoring all writes to it. QEMU passes through
>> writes of PCI_COMMAND_INVALIDATE bit, we will do the same.
>>
>> PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
>> through writes of this bit, we will do the same.
>>
>> PCI_COMMAND_PARITY - Controls how device response to parity
>> errors. QEMU ignores writes to this bit, we will do the same.
>>
>> PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
>> through writes of this bit, so we will do the same.

Actually, PCI_COMMAND_WAIT bit is in qemu's res_mask, meaning it only passes through the writes in permissive mode. By default qemu does not pass through writes. PCI LB 3.0 and PCIe 6.1 specifications say devices should hardwire this bit to 0, but that some devices may have implemented it as RW. So I think we should use rsvdp_mask in Xen for this bit.

>>
>> PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
>> writes to this bit, we will do the same.
>>
>> PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
>> transactions. It is configured by firmware, so we don't want guest to
>> control it. QEMU ignores writes to this bit, we will do the same.
>>
>> PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
>> enabled, device is prohibited from asserting INTx as per
>> specification. Value after reset is 0. In QEMU case, it checks of INTx
>> was mapped for a device. If it is not, then guest can't control
>> PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest to
>> change value of this bit if MSI(X) is enabled.
> 
> FWIW, bits 11-15 are RsvdP, so we will need to add support for them
> after the series from Stewart that adds support for register masks is
> accepted.

I am working on this.

> 
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> ---
>> In v10:
>> - Added cf_check attribute to guest_cmd_read
>> - Removed warning about non-zero cmd
>> - Updated comment MSI code regarding disabling INTX
>> - Used ternary operator in vpci_add_register() call
>> - Disable memory decoding for DomUs in init_bars()
>> In v9:
>> - Reworked guest_cmd_read
>> - Added handling for more bits
>> Since v6:
>> - fold guest's logic into cmd_write
>> - implement cmd_read, so we can report emulated INTx state to guests
>> - introduce header->guest_cmd to hold the emulated state of the
>>   PCI_COMMAND register for guests
>> Since v5:
>> - add additional check for MSI-X enabled while altering INTX bit
>> - make sure INTx disabled while guests enable MSI/MSI-X
>> Since v3:
>> - gate more code on CONFIG_HAS_MSI
>> - removed logic for the case when MSI/MSI-X not enabled
>> ---
>>  xen/drivers/vpci/header.c | 44 +++++++++++++++++++++++++++++++++++----
>>  xen/drivers/vpci/msi.c    |  6 ++++++
>>  xen/drivers/vpci/msix.c   |  4 ++++
>>  xen/include/xen/vpci.h    |  3 +++
>>  4 files changed, 53 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index efce0bc2ae..e8eed6a674 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -501,14 +501,32 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>      return 0;
>>  }
>>  
>> +/* TODO: Add proper emulation for all bits of the command register. */
>>  static void cf_check cmd_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
>>  {
>>      struct vpci_header *header = data;
>>  
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        const struct vpci *vpci = pdev->vpci;
>> +        uint16_t excluded = PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
>> +            PCI_COMMAND_FAST_BACK;
> 
> You could implement those bits using the RsvdP mask also.

Will do. The rsvdp_mask (in the write path) has already been applied before reaching this handler, so the guest's write value won't propagate to the header->guest_cmd variable. This is okay as long as ...

> 
> You seem to miss PCI_COMMAND_IO?  In the commit message you note that
> writes are ignored, yet here you seem to pass through writes to the
> underlying device (which might be OK, but needs to be coherent with
> the commit message).
> 
>> +
>> +        header->guest_cmd = cmd;
> 
> I'm kind of unsure whether we want to fake the guest view by returning
> what the guest writes.

... we don't provide an emulated view of the additional bits that we're treating as RsvdP. So let's not provide such an emulated/fake view for these bits (for now, at least).

Side note: qemu does provide such an emulated view, using a combination of emu_mask and emulated register variables. Looking at the qemu history, it looks like there may be other registers where we'd likely want to have such an emulated/fake view. So, regardless of whether we want to emulate certain bits in the command register, having the flexibility of a emulated mask/register in vPCI could (eventually) be beneficial (but not as part of this series). I'll make an in-code comment that we may want to re-visit this in the future.

> 
>> +
>> +        if ( (vpci->msi && vpci->msi->enabled) ||
>> +             (vpci->msix && vpci->msi->enabled) )
> 
> The typo that you mentioned about msi vs msix.
> 
>> +            excluded |= PCI_COMMAND_INTX_DISABLE;
>> +
>> +        cmd &= ~excluded;
>> +        cmd |= pci_conf_read16(pdev->sbdf, reg) & excluded;
>> +    }
>> +
>>      /*
>> -     * Let Dom0 play with all the bits directly except for the memory
>> -     * decoding one.
>> +     * Let guest play with all the bits directly except for the memory
>> +     * decoding one. Bits that are not allowed for DomU are already
>> +     * handled above.
> 
> I think this should be: "Let Dom0 play with all the bits directly ..."
> as you mention both Dom0 and DomU.
> 
>>       */
>>      if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
>>          /*
>> @@ -522,6 +540,14 @@ static void cf_check cmd_write(
>>          pci_conf_write16(pdev->sbdf, reg, cmd);
>>  }
>>  
>> +static uint32_t cf_check guest_cmd_read(
>> +    const struct pci_dev *pdev, unsigned int reg, void *data)
>> +{
>> +    const struct vpci_header *header = data;
>> +
>> +    return header->guest_cmd;
>> +}
>> +
>>  static void cf_check bar_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>  {
>> @@ -737,8 +763,9 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      }
>>  
>>      /* Setup a handler for the command register. */
>> -    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
>> -                           2, header);
>> +    rc = vpci_add_register(pdev->vpci,
>> +                           is_hwdom ? vpci_hw_read16 : guest_cmd_read,
>> +                           cmd_write, PCI_COMMAND, 2, header);
>>      if ( rc )
>>          return rc;
>>  
>> @@ -750,6 +777,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      if ( cmd & PCI_COMMAND_MEMORY )
>>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
>>  
>> +    /*
>> +     * Clear PCI_COMMAND_MEMORY for DomUs, so they will always start with
>> +     * memory decoding disabled and to ensure that we will not call modify_bars()
>> +     * at the end of this function.
>> +     */
>> +    if ( !is_hwdom )
>> +        cmd &= ~PCI_COMMAND_MEMORY;
> 
> Just for symmetry I would also disable PCI_COMMAND_IO.
> 
> I do wonder in which states does SeaBIOS or OVMF expects to find the
> devices.
> 
>> +    header->guest_cmd = cmd;
>> +
>>      for ( i = 0; i < num_bars; i++ )
>>      {
>>          uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index 2faa54b7ce..0920bd071f 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -70,6 +70,12 @@ static void cf_check control_write(
>>  
>>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>>              return;
>> +
>> +        /*
>> +         * Make sure guest doesn't enable INTx while enabling MSI.
>> +         */
>> +        if ( !is_hardware_domain(pdev->domain) )
>> +            pci_intx(pdev, false);
>>      }
>>      else
>>          vpci_msi_arch_disable(msi, pdev);
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index b6abab47ef..9d0233d0e3 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -97,6 +97,10 @@ static void cf_check control_write(
>>          for ( i = 0; i < msix->max_entries; i++ )
>>              if ( !msix->entries[i].masked && msix->entries[i].updated )
>>                  update_entry(&msix->entries[i], pdev, i);
>> +
>> +        /* Make sure guest doesn't enable INTx while enabling MSI-X. */
>> +        if ( !is_hardware_domain(pdev->domain) )
>> +            pci_intx(pdev, false);
> 
> Note that if both new_enabled and new_masked are set, you won't get
> inside of this condition, and that could lead to MSIX being enabled
> with INTx set in the command register (albeit with the maskall bit
> set).
> 
> You might have to add a new check before the pci_conf_write16() that
> disables INTx if `new_enabled && !msix->enabled`.
> 
>>      }
>>      else if ( !new_enabled && msix->enabled )
>>      {
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index c5301e284f..60bdc10c13 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -87,6 +87,9 @@ struct vpci {
>>          } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
>>          /* At most 6 BARS + 1 expansion ROM BAR. */
>>  
>> +        /* Guest view of the PCI_COMMAND register. */
> 
> Maybe we want to add '(domU only)' to the comment.
> 
> Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index efce0bc2ae..e8eed6a674 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -501,14 +501,32 @@  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
     return 0;
 }
 
+/* TODO: Add proper emulation for all bits of the command register. */
 static void cf_check cmd_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
 {
     struct vpci_header *header = data;
 
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        const struct vpci *vpci = pdev->vpci;
+        uint16_t excluded = PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
+            PCI_COMMAND_FAST_BACK;
+
+        header->guest_cmd = cmd;
+
+        if ( (vpci->msi && vpci->msi->enabled) ||
+             (vpci->msix && vpci->msi->enabled) )
+            excluded |= PCI_COMMAND_INTX_DISABLE;
+
+        cmd &= ~excluded;
+        cmd |= pci_conf_read16(pdev->sbdf, reg) & excluded;
+    }
+
     /*
-     * Let Dom0 play with all the bits directly except for the memory
-     * decoding one.
+     * Let guest play with all the bits directly except for the memory
+     * decoding one. Bits that are not allowed for DomU are already
+     * handled above.
      */
     if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
         /*
@@ -522,6 +540,14 @@  static void cf_check cmd_write(
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static uint32_t cf_check guest_cmd_read(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    const struct vpci_header *header = data;
+
+    return header->guest_cmd;
+}
+
 static void cf_check bar_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
@@ -737,8 +763,9 @@  static int cf_check init_bars(struct pci_dev *pdev)
     }
 
     /* Setup a handler for the command register. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-                           2, header);
+    rc = vpci_add_register(pdev->vpci,
+                           is_hwdom ? vpci_hw_read16 : guest_cmd_read,
+                           cmd_write, PCI_COMMAND, 2, header);
     if ( rc )
         return rc;
 
@@ -750,6 +777,15 @@  static int cf_check init_bars(struct pci_dev *pdev)
     if ( cmd & PCI_COMMAND_MEMORY )
         pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
 
+    /*
+     * Clear PCI_COMMAND_MEMORY for DomUs, so they will always start with
+     * memory decoding disabled and to ensure that we will not call modify_bars()
+     * at the end of this function.
+     */
+    if ( !is_hwdom )
+        cmd &= ~PCI_COMMAND_MEMORY;
+    header->guest_cmd = cmd;
+
     for ( i = 0; i < num_bars; i++ )
     {
         uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 2faa54b7ce..0920bd071f 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -70,6 +70,12 @@  static void cf_check control_write(
 
         if ( vpci_msi_arch_enable(msi, pdev, vectors) )
             return;
+
+        /*
+         * Make sure guest doesn't enable INTx while enabling MSI.
+         */
+        if ( !is_hardware_domain(pdev->domain) )
+            pci_intx(pdev, false);
     }
     else
         vpci_msi_arch_disable(msi, pdev);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index b6abab47ef..9d0233d0e3 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -97,6 +97,10 @@  static void cf_check control_write(
         for ( i = 0; i < msix->max_entries; i++ )
             if ( !msix->entries[i].masked && msix->entries[i].updated )
                 update_entry(&msix->entries[i], pdev, i);
+
+        /* Make sure guest doesn't enable INTx while enabling MSI-X. */
+        if ( !is_hardware_domain(pdev->domain) )
+            pci_intx(pdev, false);
     }
     else if ( !new_enabled && msix->enabled )
     {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index c5301e284f..60bdc10c13 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -87,6 +87,9 @@  struct vpci {
         } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
         /* At most 6 BARS + 1 expansion ROM BAR. */
 
+        /* Guest view of the PCI_COMMAND register. */
+        uint16_t guest_cmd;
+
         /*
          * Store whether the ROM enable bit is set (doesn't imply ROM BAR
          * is mapped into guest p2m) if there's a ROM BAR on the device.