diff mbox series

[v12,10/15] vpci/header: emulate PCI_COMMAND register for guests

Message ID 20240109215145.430207-11-stewart.hildebrand@amd.com (mailing list archive)
State Superseded
Headers show
Series PCI devices passthrough on Arm, part 3 | expand

Commit Message

Stewart Hildebrand Jan. 9, 2024, 9:51 p.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.

Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.

PCI_COMMAND_IO (bit 0)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if an I/O BAR is exposed to the guest.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
    don't yet support I/O BARs for domUs.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_MEMORY (bit 1)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if a Memory BAR is exposed to the guest.
  Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
    regions.
  Xen domU: For devices assigned to DomUs, memory decoding will be
    disabled at the time of initialization.

PCI_COMMAND_MASTER (bit 2)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_SPECIAL (bit 3)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_INVALIDATE (bit 4)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_VGA_PALETTE (bit 5)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_PARITY (bit 6)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_WAIT (bit 7)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: hardwire to 0
  QEMU: res_mask
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_SERR (bit 8)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_FAST_BACK (bit 9)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_INTX_DISABLE (bit 10)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU checks if INTx was mapped
    for a device. If it is not, then guest can't control
    PCI_COMMAND_INTX_DISABLE bit.
  Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
  Xen dom0: We allow dom0 to control this bit freely.

Bits 11-15
  PCIe 6.1: RsvdP
  PCI LB 3.0: Reserved
  QEMU: res_mask
  Xen domU/dom0: rsvdp_mask

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
In v12:
- Rework patch using vpci_add_register_mask()
- Add bitmask #define in pci_regs.h according to PCIe 6.1 spec, except
  don't add the RO bits because they were RW in PCI LB 3.0 spec.
- Move and expand TODO comment about properly emulating bits
- Update guest_cmd in msi.c/msix.c:control_write()
- Simplify cmd_write(), thanks to rsvdp_mask
- Update commit description

In v11:
- Fix copy-paste mistake: vpci->msi should be vpci->msix
- Handle PCI_COMMAND_IO
- Fix condition for disabling INTx in the MSI-X code
- Show domU changes to only allowed bits
- Show PCI_COMMAND_MEMORY write only after P2M was altered
- Update comments in the code
In v10:
- Added cf_check attribute to guest_cmd_read
- Removed warning about non-zero cmd
- Updated comment MSI code regarding disabling INTX
- Used ternary operator in vpci_add_register() call
- Disable memory decoding for DomUs in init_bars()
In v9:
- Reworked guest_cmd_read
- Added handling for more bits
Since v6:
- fold guest's logic into cmd_write
- implement cmd_read, so we can report emulated INTx state to guests
- introduce header->guest_cmd to hold the emulated state of the
  PCI_COMMAND register for guests
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c  | 59 +++++++++++++++++++++++++++++++++++---
 xen/drivers/vpci/msi.c     |  9 ++++++
 xen/drivers/vpci/msix.c    |  7 +++++
 xen/include/xen/pci_regs.h |  1 +
 xen/include/xen/vpci.h     |  3 ++
 5 files changed, 75 insertions(+), 4 deletions(-)

Comments

Jan Beulich Jan. 25, 2024, 3:43 p.m. UTC | #1
On 09.01.2024 22:51, Stewart Hildebrand wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>      if ( !rom_only )
>      {
>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> +        /* Show DomU that we updated P2M */
> +        header->guest_cmd &= ~PCI_COMMAND_MEMORY;
> +        header->guest_cmd |= cmd & PCI_COMMAND_MEMORY;
>          header->bars_mapped = map;
>      }

I don't follow what the comment means to say. The bit in question has no
real connection to the P2M, and the guest also may have no notion of the
underlying hypervisor's internals. Likely connected to ...

> @@ -524,9 +527,26 @@ static void cf_check cmd_write(
>  {
>      struct vpci_header *header = data;
>  
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        const struct vpci *vpci = pdev->vpci;
> +
> +        if ( (vpci->msi && vpci->msi->enabled) ||
> +             (vpci->msix && vpci->msix->enabled) )
> +            cmd |= PCI_COMMAND_INTX_DISABLE;
> +
> +        /*
> +         * Do not show change to PCI_COMMAND_MEMORY bit until we finish
> +         * modifying P2M mappings.
> +         */
> +        header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) |
> +                            (header->guest_cmd & PCI_COMMAND_MEMORY);
> +    }

... the comment here, but then shouldn't it be that the guest can't even
issue a 2nd cfg space access until the present write has been carried out?
Otherwise I'd be inclined to claim that such a partial update is unlikely
to be spec-conformant.

> @@ -843,6 +885,15 @@ static int cf_check init_header(struct pci_dev *pdev)
>      if ( cmd & PCI_COMMAND_MEMORY )
>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
>  
> +    /*
> +     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
> +     * always start with memory decoding disabled and to ensure that we will not
> +     * call modify_bars() at the end of this function.
> +     */
> +    if ( !is_hwdom )
> +        cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
> +    header->guest_cmd = cmd;

With PCI_COMMAND_MEMORY clear, the hw reg won't further be written on the
success return path. Yet wouldn't we better clear PCI_COMMAND_IO also in
hardware (until we properly support it)?

I also think the insertion point for the new code isn't well chosen: The
comment just out of context indicates that the code in context above is
connected to the subsequent code. Whereas the addition is not.

> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -70,6 +70,15 @@ static void cf_check control_write(
>  
>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>              return;
> +
> +        /*
> +         * Make sure domU doesn't enable INTx while enabling MSI.
> +         */

Nit: This ought to be a single line comment, just like ...

> +        if ( !is_hardware_domain(pdev->domain) )
> +        {
> +            pci_intx(pdev, false);
> +            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
> +        }
>      }
>      else
>          vpci_msi_arch_disable(msi, pdev);
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -135,6 +135,13 @@ static void cf_check control_write(
>          }
>      }
>  
> +    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
> +    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
> +    {
> +        pci_intx(pdev, false);
> +        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
> +    }

... the similar code here has it.

In both cases, is it really appropriate to set the bit in guest view?

Jan
Stewart Hildebrand Feb. 1, 2024, 4:50 a.m. UTC | #2
On 1/25/24 10:43, Jan Beulich wrote:
> On 09.01.2024 22:51, Stewart Hildebrand wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>      if ( !rom_only )
>>      {
>>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> +        /* Show DomU that we updated P2M */
>> +        header->guest_cmd &= ~PCI_COMMAND_MEMORY;
>> +        header->guest_cmd |= cmd & PCI_COMMAND_MEMORY;
>>          header->bars_mapped = map;
>>      }
> 
> I don't follow what the comment means to say. The bit in question has no
> real connection to the P2M, and the guest also may have no notion of the
> underlying hypervisor's internals. Likely connected to ...

Indeed. If the comment survives to v13, I'll update it to:

        /* Now that we updated P2M, show DomU change to PCI_COMMAND_MEMORY */

> 
>> @@ -524,9 +527,26 @@ static void cf_check cmd_write(
>>  {
>>      struct vpci_header *header = data;
>>  
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        const struct vpci *vpci = pdev->vpci;
>> +
>> +        if ( (vpci->msi && vpci->msi->enabled) ||
>> +             (vpci->msix && vpci->msix->enabled) )
>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>> +
>> +        /*
>> +         * Do not show change to PCI_COMMAND_MEMORY bit until we finish
>> +         * modifying P2M mappings.
>> +         */
>> +        header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) |
>> +                            (header->guest_cmd & PCI_COMMAND_MEMORY);
>> +    }
> 
> ... the comment here, but then shouldn't it be that the guest can't even
> issue a 2nd cfg space access until the present write has been carried out?
> Otherwise I'd be inclined to claim that such a partial update is unlikely
> to be spec-conformant.

Due to the raise_softirq() call added in

  3e568fa9e19c ("vpci: fix deferral of long operations")

my current understanding is: when the guest toggles memory decoding, the guest vcpu doesn't resume execution until vpci_process_pending() and modify_decoding() have finished. So I think the guest should see a consistent state of the register, unless it was trying to read from a different vcpu than the one doing the writing.

Regardless, if the guest did have an opportunity to successfully read the partially updated state of the register, I'm not really spotting what part of the spec that would be a violation of. PCIe 6.1 has this description regarding the bit: "When this bit is Set" and "When this bit is Clear" the device will decode (or not) memory accesses. The spec doesn't seem to distinguish whether the host or the device itself is the one to set/clear the bit. One might even try to argue the opposite: allowing the bit to be toggled before the device reflects the change would be a violation of spec. Since the spec is ambiguous in this regard, I don't think either argument is particularly strong.

Chesterton's fence: the logic for deferring the update of PCI_COMMAND_MEMORY in guest_cmd was added between v10 and v11 of this series. I went back to look at the review comments on v10 [1], but the rationale is still not entirely clear to me. At the end of the day, with the information I have at hand, I suspect it would be fine either way (whether updating guest_cmd is deferred or not). If no other info comes to light, I'm leaning toward not deferring because it would be simpler to update the bit right away in cmd_write().

[1] https://lore.kernel.org/xen-devel/ZVy73iJ3E8nJHvgf@macbook.local/

> 
>> @@ -843,6 +885,15 @@ static int cf_check init_header(struct pci_dev *pdev)
>>      if ( cmd & PCI_COMMAND_MEMORY )
>>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
>>  
>> +    /*
>> +     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
>> +     * always start with memory decoding disabled and to ensure that we will not
>> +     * call modify_bars() at the end of this function.
>> +     */
>> +    if ( !is_hwdom )
>> +        cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
>> +    header->guest_cmd = cmd;
> 
> With PCI_COMMAND_MEMORY clear, the hw reg won't further be written on the
> success return path. Yet wouldn't we better clear PCI_COMMAND_IO also in
> hardware (until we properly support it)?

Yes, I'll clear PCI_COMMAND_IO in hardware too

> 
> I also think the insertion point for the new code isn't well chosen: The
> comment just out of context indicates that the code in context above is
> connected to the subsequent code. Whereas the addition is not.

I'll rearrange it

> 
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -70,6 +70,15 @@ static void cf_check control_write(
>>  
>>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>>              return;
>> +
>> +        /*
>> +         * Make sure domU doesn't enable INTx while enabling MSI.
>> +         */
> 
> Nit: This ought to be a single line comment, just like ...

OK, I'll make it a single line

> 
>> +        if ( !is_hardware_domain(pdev->domain) )
>> +        {
>> +            pci_intx(pdev, false);
>> +            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
>> +        }
>>      }
>>      else
>>          vpci_msi_arch_disable(msi, pdev);
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -135,6 +135,13 @@ static void cf_check control_write(
>>          }
>>      }
>>  
>> +    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
>> +    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
>> +    {
>> +        pci_intx(pdev, false);
>> +        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
>> +    }
> 
> ... the similar code here has it.
> 
> In both cases, is it really appropriate to set the bit in guest view?

I added this based on Roger's comment at [2]. Roger, what do you think? I don't believe QEMU updates the guest view in this manner.

[2] https://lore.kernel.org/xen-devel/ZLqI65gmNj1XDBm4@MacBook-Air-de-Roger.local/
Jan Beulich Feb. 1, 2024, 8:14 a.m. UTC | #3
On 01.02.2024 05:50, Stewart Hildebrand wrote:
> On 1/25/24 10:43, Jan Beulich wrote:
>> On 09.01.2024 22:51, Stewart Hildebrand wrote:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>>      if ( !rom_only )
>>>      {
>>>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>>> +        /* Show DomU that we updated P2M */
>>> +        header->guest_cmd &= ~PCI_COMMAND_MEMORY;
>>> +        header->guest_cmd |= cmd & PCI_COMMAND_MEMORY;
>>>          header->bars_mapped = map;
>>>      }
>>
>> I don't follow what the comment means to say. The bit in question has no
>> real connection to the P2M, and the guest also may have no notion of the
>> underlying hypervisor's internals. Likely connected to ...
> 
> Indeed. If the comment survives to v13, I'll update it to:
> 
>         /* Now that we updated P2M, show DomU change to PCI_COMMAND_MEMORY */
> 
>>
>>> @@ -524,9 +527,26 @@ static void cf_check cmd_write(
>>>  {
>>>      struct vpci_header *header = data;
>>>  
>>> +    if ( !is_hardware_domain(pdev->domain) )
>>> +    {
>>> +        const struct vpci *vpci = pdev->vpci;
>>> +
>>> +        if ( (vpci->msi && vpci->msi->enabled) ||
>>> +             (vpci->msix && vpci->msix->enabled) )
>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>> +
>>> +        /*
>>> +         * Do not show change to PCI_COMMAND_MEMORY bit until we finish
>>> +         * modifying P2M mappings.
>>> +         */
>>> +        header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) |
>>> +                            (header->guest_cmd & PCI_COMMAND_MEMORY);
>>> +    }
>>
>> ... the comment here, but then shouldn't it be that the guest can't even
>> issue a 2nd cfg space access until the present write has been carried out?
>> Otherwise I'd be inclined to claim that such a partial update is unlikely
>> to be spec-conformant.
> 
> Due to the raise_softirq() call added in
> 
>   3e568fa9e19c ("vpci: fix deferral of long operations")
> 
> my current understanding is: when the guest toggles memory decoding, the guest vcpu doesn't resume execution until vpci_process_pending() and modify_decoding() have finished. So I think the guest should see a consistent state of the register, unless it was trying to read from a different vcpu than the one doing the writing.
> 
> Regardless, if the guest did have an opportunity to successfully read the partially updated state of the register, I'm not really spotting what part of the spec that would be a violation of. PCIe 6.1 has this description regarding the bit: "When this bit is Set" and "When this bit is Clear" the device will decode (or not) memory accesses. The spec doesn't seem to distinguish whether the host or the device itself is the one to set/clear the bit. One might even try to argue the opposite: allowing the bit to be toggled before the device reflects the change would be a violation of spec. Since the spec is ambiguous in this regard, I don't think either argument is particularly strong.
> 
> Chesterton's fence: the logic for deferring the update of PCI_COMMAND_MEMORY in guest_cmd was added between v10 and v11 of this series. I went back to look at the review comments on v10 [1], but the rationale is still not entirely clear to me.

Indeed. The only sentence possibly hinting in such a direction would imo
have been "I'm kind of unsure whether we want to fake the guest view by
returning what the guest writes." It's unclear to me whether it really
was meant that way.

> At the end of the day, with the information I have at hand, I suspect it would be fine either way (whether updating guest_cmd is deferred or not). If no other info comes to light, I'm leaning toward not deferring because it would be simpler to update the bit right away in cmd_write().

I'm not sure it would be fine either way. Config space writes aren't
posted writes, so they complete synchronously. IOW whatever internal
state updates are needed in the device, they ought to have finished by
the time the write completes.

> [1] https://lore.kernel.org/xen-devel/ZVy73iJ3E8nJHvgf@macbook.local/
> 
>>[...]
>>> --- a/xen/drivers/vpci/msix.c
>>> +++ b/xen/drivers/vpci/msix.c
>>> @@ -135,6 +135,13 @@ static void cf_check control_write(
>>>          }
>>>      }
>>>  
>>> +    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
>>> +    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
>>> +    {
>>> +        pci_intx(pdev, false);
>>> +        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
>>> +    }
>>
>> ... the similar code here has it.
>>
>> In both cases, is it really appropriate to set the bit in guest view?
> 
> I added this based on Roger's comment at [2]. Roger, what do you think? I don't believe QEMU updates the guest view in this manner.
> 
> [2] https://lore.kernel.org/xen-devel/ZLqI65gmNj1XDBm4@MacBook-Air-de-Roger.local/

Leaving this for Roger to answer.

Jan
diff mbox series

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index f0b0b64b0929..374e8e119231 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -168,6 +168,9 @@  static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
     if ( !rom_only )
     {
         pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+        /* Show DomU that we updated P2M */
+        header->guest_cmd &= ~PCI_COMMAND_MEMORY;
+        header->guest_cmd |= cmd & PCI_COMMAND_MEMORY;
         header->bars_mapped = map;
     }
     else
@@ -524,9 +527,26 @@  static void cf_check cmd_write(
 {
     struct vpci_header *header = data;
 
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        const struct vpci *vpci = pdev->vpci;
+
+        if ( (vpci->msi && vpci->msi->enabled) ||
+             (vpci->msix && vpci->msix->enabled) )
+            cmd |= PCI_COMMAND_INTX_DISABLE;
+
+        /*
+         * Do not show change to PCI_COMMAND_MEMORY bit until we finish
+         * modifying P2M mappings.
+         */
+        header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) |
+                            (header->guest_cmd & PCI_COMMAND_MEMORY);
+    }
+
     /*
      * Let Dom0 play with all the bits directly except for the memory
-     * decoding one.
+     * decoding one. Bits that are not allowed for DomU are already
+     * handled above and by the rsvdp_mask.
      */
     if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
         /*
@@ -540,6 +560,14 @@  static void cf_check cmd_write(
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static uint32_t cf_check guest_cmd_read(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    const struct vpci_header *header = data;
+
+    return header->guest_cmd;
+}
+
 static void cf_check bar_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
@@ -756,9 +784,23 @@  static int cf_check init_header(struct pci_dev *pdev)
         return -EOPNOTSUPP;
     }
 
-    /* Setup a handler for the command register. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-                           2, header);
+    /*
+     * Setup a handler for the command register.
+     *
+     * TODO: If support for emulated bits is added, re-visit how to handle
+     * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK.
+     */
+    rc = vpci_add_register_mask(pdev->vpci,
+                                is_hwdom ? vpci_hw_read16 : guest_cmd_read,
+                                cmd_write, PCI_COMMAND, 2, header, 0, 0,
+                                PCI_COMMAND_RSVDP_MASK |
+                                    (is_hwdom ? 0
+                                              : PCI_COMMAND_IO |
+                                                PCI_COMMAND_PARITY |
+                                                PCI_COMMAND_WAIT |
+                                                PCI_COMMAND_SERR |
+                                                PCI_COMMAND_FAST_BACK),
+                                0);
     if ( rc )
         return rc;
 
@@ -843,6 +885,15 @@  static int cf_check init_header(struct pci_dev *pdev)
     if ( cmd & PCI_COMMAND_MEMORY )
         pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
 
+    /*
+     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
+     * always start with memory decoding disabled and to ensure that we will not
+     * call modify_bars() at the end of this function.
+     */
+    if ( !is_hwdom )
+        cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
+    header->guest_cmd = cmd;
+
     for ( i = 0; i < num_bars; i++ )
     {
         uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 6ff71e5f9ab7..aae8055ce92d 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -70,6 +70,15 @@  static void cf_check control_write(
 
         if ( vpci_msi_arch_enable(msi, pdev, vectors) )
             return;
+
+        /*
+         * Make sure domU doesn't enable INTx while enabling MSI.
+         */
+        if ( !is_hardware_domain(pdev->domain) )
+        {
+            pci_intx(pdev, false);
+            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
+        }
     }
     else
         vpci_msi_arch_disable(msi, pdev);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index b6abab47efdd..d4f756ce287a 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -135,6 +135,13 @@  static void cf_check control_write(
         }
     }
 
+    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
+    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
+    {
+        pci_intx(pdev, false);
+        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
+    }
+
     msix->masked = new_masked;
     msix->enabled = new_enabled;
 
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 9909b27425a5..b2f2e43e864d 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -48,6 +48,7 @@ 
 #define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
 #define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
+#define  PCI_COMMAND_RSVDP_MASK	0xf800
 
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_IMM_READY	0x01	/* Immediate Readiness */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e89c571890b2..77320a667e55 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -107,6 +107,9 @@  struct vpci {
         } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
         /* At most 6 BARS + 1 expansion ROM BAR. */
 
+        /* Guest (domU only) view of the PCI_COMMAND register. */
+        uint16_t guest_cmd;
+
         /*
          * Store whether the ROM enable bit is set (doesn't imply ROM BAR
          * is mapped into guest p2m) if there's a ROM BAR on the device.