diff mbox series

[v2,1/1] pcie: Add hotplug detect state register to cmask

Message ID 20230706045546.593605-3-leobras@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/1] pcie: Add hotplug detect state register to cmask | expand

Commit Message

Leonardo Bras July 6, 2023, 4:55 a.m. UTC
When trying to migrate a machine type pc-q35-6.0 or lower with these
cmdline options,

-device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
-device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1

the following bug happens after all RAM pages have been sent:

qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
qemu-kvm: load of migration failed: Invalid argument

This happens on pc-q35-6.0 or lower because of:
{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }

In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
hotplug for the guest. After a while the guest will deal with this hotplug
and qemu will clear the above bit.

Then, during migration, get_pci_config_device() will compare the
configs of both the freshly created device and the one that is being
received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
and cause the bug to reproduce.

To avoid this fake incompatibility, there are three fields in PCIDevice that
can help:

- wmask: used to implement R/W bytes,
- w1cmask: used to implement RW1C (Write 1 to Clear) bytes, and
- cmask: used to enable config checks on load.

According to PCI Express® Base Specification Revision 5.0 Version 1.0,
table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
listed as RO (read-only), so it only makes sense to make use of the cmask
field.

So, clear the PCI_EXP_SLTSTA_PDS bit in cmask, so that this fake incompatibility
in get_pci_config_device() does not abort the migration.
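
For context, the load-time comparison that produces the "Bad config data"
error above is done byte by byte against these masks. A simplified sketch of
that check (an approximation of get_pci_config_device() in hw/pci/pci.c, not
the verbatim code):

    /*
     * Simplified sketch: a byte is only compared where cmask enables the
     * check and the guest could not legitimately have changed it (i.e.
     * bits not covered by wmask/w1cmask).
     */
    static int check_config_on_load(const uint8_t *incoming, PCIDevice *s,
                                    size_t size)
    {
        size_t i;

        for (i = 0; i < size; i++) {
            if ((incoming[i] ^ s->config[i]) &
                s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
                return -EINVAL; /* reported as "Bad config data" */
            }
        }
        return 0;
    }

With cmask 0xff for the Slot Status byte at 0x6e, any difference in
PCI_EXP_SLTSTA_PDS between source and destination fails this check.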

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 hw/pci/pcie.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Juan Quintela July 6, 2023, 7:37 a.m. UTC | #1
Leonardo Bras <leobras@redhat.com> wrote:
> When trying to migrate a machine type pc-q35-6.0 or lower, with this
> cmdline options,
>
> -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
>
> the following bug happens after all ram pages were sent:
>
> qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> qemu-kvm: Failed to load PCIDevice:config
> qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> qemu-kvm: load of migration failed: Invalid argument
>
> This happens on pc-q35-6.0 or lower because of:
> { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
>
> In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
> hotplug for the guest. After a while the guest will deal with this hotplug
> and qemu will clear the above bit.
>
> Then, during migration, get_pci_config_device() will compare the
> configs of both the freshly created device and the one that is being
> received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> and cause the bug to reproduce.
>
> To avoid this fake incompatibility, there are tree fields in PCIDevice that
> can help:
>
> - wmask: Used to implement R/W bytes, and
> - w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> - cmask: Used to enable config checks on load.
>
> According to PCI Express® Base Specification Revision 5.0 Version 1.0,
> table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
> listed as RO (read-only), so it only makes sense to make use of the cmask
> field.
>
> So, clear PCI_EXP_SLTSTA_PDS bit on cmask, so the fake incompatibility on
> get_pci_config_device() does not abort the migration.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> Signed-off-by: Leonardo Bras <leobras@redhat.com>




> ---
>  hw/pci/pcie.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index b8c24cf45f..cae56bf1c8 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -659,6 +659,10 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
>      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
>                                 PCI_EXP_HP_EV_SUPPORTED);
>  
> +    /* Avoid migration abortion when this device hot-removed by guest */

I would have included here the text in the commit:

 According to PCI Express® Base Specification Revision 5.0 Version 1.0,
 table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
 listed as RO (read-only), so it only makes sense to make use of the cmask
 field.

and

This happens on pc-q35-6.0 or lower because of:
{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }

so if we ever remove the machine type pc-q35-6.0, we can drop it.

Yes, I know that we don't drop machine types, but we should at some point.


> +    pci_word_test_and_clear_mask(dev->cmask + pos + PCI_EXP_SLTSTA,
> +                                 PCI_EXP_SLTSTA_PDS);
> +
>      dev->exp.hpev_notified = false;
>  
>      qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))),

I agree that this is (at least) a step in the right direction.

I would have expected to need some check related to the value
of:

{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }

But I will not claim _any_ understanding of the PCI specification.

So:

Reviewed-by: Juan Quintela <quintela@redhat.com>

in the sense that it fixes the migration bug.
Peter Xu July 6, 2023, 2:35 p.m. UTC | #2
On Thu, Jul 06, 2023 at 01:55:47AM -0300, Leonardo Bras wrote:
> When trying to migrate a machine type pc-q35-6.0 or lower, with this
> cmdline options,
> 
> -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> 
> the following bug happens after all ram pages were sent:
> 
> qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> qemu-kvm: Failed to load PCIDevice:config
> qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> qemu-kvm: load of migration failed: Invalid argument
> 
> This happens on pc-q35-6.0 or lower because of:
> { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> 
> In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
> hotplug for the guest. After a while the guest will deal with this hotplug
> and qemu will clear the above bit.

Do you mean that the bit will be cleared after this point for the whole
lifecycle of the VM, as long as the pcie topology doesn't change again?

"This bit indicates the presence of an adapter in the slot"

IIUC the adapter in the slot is there, why it's cleared rather than set?

> 
> Then, during migration, get_pci_config_device() will compare the
> configs of both the freshly created device and the one that is being
> received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> and cause the bug to reproduce.
> 
> To avoid this fake incompatibility, there are tree fields in PCIDevice that
> can help:
> 
> - wmask: Used to implement R/W bytes, and
> - w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> - cmask: Used to enable config checks on load.
> 
> According to PCI Express® Base Specification Revision 5.0 Version 1.0,
> table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
> listed as RO (read-only), so it only makes sense to make use of the cmask
> field.
> 
> So, clear PCI_EXP_SLTSTA_PDS bit on cmask, so the fake incompatibility on
> get_pci_config_device() does not abort the migration.

Yes, using cmask makes more sense to me, but we'd need some pci developer
to ack it at last I guess, anyway.

> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

I asked the same question, and I still keep confused: whether there's a
first bad commit?  Starting from when it fails?

For example, is this broken on 6.0 binaries too with pc-q35-6.0?

Thanks,

> ---
>  hw/pci/pcie.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index b8c24cf45f..cae56bf1c8 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -659,6 +659,10 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
>      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
>                                 PCI_EXP_HP_EV_SUPPORTED);
>  
> +    /* Avoid migration abortion when this device hot-removed by guest */
> +    pci_word_test_and_clear_mask(dev->cmask + pos + PCI_EXP_SLTSTA,
> +                                 PCI_EXP_SLTSTA_PDS);
> +
>      dev->exp.hpev_notified = false;
>  
>      qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))),
> -- 
> 2.41.0
>
Leonardo Bras July 6, 2023, 5:58 p.m. UTC | #3
On Thu, Jul 6, 2023 at 4:37 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > cmdline options,
> >
> > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> >
> > the following bug happens after all ram pages were sent:
> >
> > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > qemu-kvm: Failed to load PCIDevice:config
> > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > qemu-kvm: load of migration failed: Invalid argument
> >
> > This happens on pc-q35-6.0 or lower because of:
> > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> >
> > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
> > hotplug for the guest. After a while the guest will deal with this hotplug
> > and qemu will clear the above bit.
> >
> > Then, during migration, get_pci_config_device() will compare the
> > configs of both the freshly created device and the one that is being
> > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > and cause the bug to reproduce.
> >
> > To avoid this fake incompatibility, there are tree fields in PCIDevice that
> > can help:
> >
> > - wmask: Used to implement R/W bytes, and
> > - w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> > - cmask: Used to enable config checks on load.
> >
> > According to PCI Express® Base Specification Revision 5.0 Version 1.0,
> > table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
> > listed as RO (read-only), so it only makes sense to make use of the cmask
> > field.
> >
> > So, clear PCI_EXP_SLTSTA_PDS bit on cmask, so the fake incompatibility on
> > get_pci_config_device() does not abort the migration.
> >
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
>
>
>
> > ---
> >  hw/pci/pcie.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index b8c24cf45f..cae56bf1c8 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -659,6 +659,10 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
> >      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
> >                                 PCI_EXP_HP_EV_SUPPORTED);
> >
> > +    /* Avoid migration abortion when this device hot-removed by guest */
>
> I would have included here the text in the commit:
>
>  According to PCI Express® Base Specification Revision 5.0 Version 1.0,
>  table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
>  listed as RO (read-only), so it only makes sense to make use of the cmask
>  field.
>
> and
>
> This happens on pc-q35-6.0 or lower because of:
> { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
>
> so if we ever remove the machine type pc-q35-6.0, we can drop it.

It makes sense to add this to the code.

In the machine-type removal case, IIUC we would have to drop every
machine type older than pc-q35-6.0.
Also, we would not support migration to any machine without
ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, right?

>
> Yes, I know that we don't drop machine types, but we should at some point.
>
>
> > +    pci_word_test_and_clear_mask(dev->cmask + pos + PCI_EXP_SLTSTA,
> > +                                 PCI_EXP_SLTSTA_PDS);
> > +
> >      dev->exp.hpev_notified = false;
> >
> >      qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))),
>
> I agree that this is (at least) a step on the right direction.
>
> I wmould had expected to have to need some check related to the value
> of:
>
> { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
>

This bug affects versions older than qemu 6.0. If we add this, we
would have some extra work backporting this to older versions (if
necessary) because the 'property' did not exist back then.

> But I will not claim _any_ understanding of the PCI specification.
>
> So:
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> about that it fixes the migration bug.
>

Thanks Juan!
Leo
Leonardo Bras July 6, 2023, 6:07 p.m. UTC | #4
On Thu, Jul 6, 2023 at 11:35 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 01:55:47AM -0300, Leonardo Bras wrote:
> > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > cmdline options,
> >
> > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> >
> > the following bug happens after all ram pages were sent:
> >
> > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > qemu-kvm: Failed to load PCIDevice:config
> > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > qemu-kvm: load of migration failed: Invalid argument
> >
> > This happens on pc-q35-6.0 or lower because of:
> > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> >
> > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
> > hotplug for the guest. After a while the guest will deal with this hotplug
> > and qemu will clear the above bit.
>
> Do you mean that the bit will be cleared after this point for the whole
> lifecycle of the VM, as long as the pcie topology doesn't change again?
>
> "This bit indicates the presence of an adapter in the slot"
>
> IIUC the adapter in the slot is there, why it's cleared rather than set?

For some reason the guest is powering down the device, and we have in qemu:

    /*
     * If the slot is populated, power indicator is off and power
     * controller is off, it is safe to detach the devices.
     *
     * Note: don't detach if condition was already true:
     * this is a work around for guests that overwrite
     * control of powered off slots before powering them on.
     */
    if ((sltsta & PCI_EXP_SLTSTA_PDS) && pcie_sltctl_powered_off(val) &&
        !pcie_sltctl_powered_off(old_slt_ctl))
    {
        pcie_cap_slot_do_unplug(dev);  // clears PCI_EXP_SLTSTA_PDS
    }
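
pcie_cap_slot_do_unplug() then drops the presence bit from the live config
space, roughly like this (a simplified sketch of hw/pci/pcie.c, not the
verbatim code; pos stands for the express capability offset):

    /* simplified: the unplug path clears Presence Detect State in the
     * device's live config, so it no longer matches a freshly created
     * root port on the destination */
    pci_word_test_and_clear_mask(dev->config + pos + PCI_EXP_SLTSTA,
                                 PCI_EXP_SLTSTA_PDS);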


>
> >
> > Then, during migration, get_pci_config_device() will compare the
> > configs of both the freshly created device and the one that is being
> > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > and cause the bug to reproduce.
> >
> > To avoid this fake incompatibility, there are tree fields in PCIDevice that
> > can help:
> >
> > - wmask: Used to implement R/W bytes, and
> > - w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> > - cmask: Used to enable config checks on load.
> >
> > According to PCI Express® Base Specification Revision 5.0 Version 1.0,
> > table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
> > listed as RO (read-only), so it only makes sense to make use of the cmask
> > field.
> >
> > So, clear PCI_EXP_SLTSTA_PDS bit on cmask, so the fake incompatibility on
> > get_pci_config_device() does not abort the migration.
>
> Yes, using cmask makes more sense to me, but we'd need some pci developer
> to ack it at last I guess, anyway.

Agree! I am waiting for Michael's opinion on this.

>
> >
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
> I asked the same question, and I still keep confused: whether there's a
> first bad commit?  Starting from when it fails?
>
> For example, is this broken on 6.0 binaries too with pc-q35-6.0?

I tested with qemu 6.0, and it still reproduces, but I have not pursued
this any further.

>
> Thanks,


Thank you!
Leo

>
> > ---
> >  hw/pci/pcie.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index b8c24cf45f..cae56bf1c8 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -659,6 +659,10 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
> >      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
> >                                 PCI_EXP_HP_EV_SUPPORTED);
> >
> > +    /* Avoid migration abortion when this device hot-removed by guest */
> > +    pci_word_test_and_clear_mask(dev->cmask + pos + PCI_EXP_SLTSTA,
> > +                                 PCI_EXP_SLTSTA_PDS);
> > +
> >      dev->exp.hpev_notified = false;
> >
> >      qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))),
> > --
> > 2.41.0
> >
>
> --
> Peter Xu
>
Peter Xu July 6, 2023, 6:14 p.m. UTC | #5
On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > I asked the same question, and I still keep confused: whether there's a
> > first bad commit?  Starting from when it fails?
> >
> > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> 
> I tested for qemu 6.0, and it still reproduces, but have not pursued
> this any further.

I see, thanks!

But then do you know why it was never hit before? I assume it means this bug
has been there for a long time.
Leonardo Bras July 6, 2023, 6:37 p.m. UTC | #6
On Thu, Jul 6, 2023 at 3:24 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > I asked the same question, and I still keep confused: whether there's a
> > > first bad commit?  Starting from when it fails?
> > >
> > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> >
> > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > this any further.
>
> I see, thanks!
>
> But then do you know why it's never hit before?  I assume it means this bug
> has been there for a long time.


Oh, I totally missed updating the commit msg on this:

---
In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
- hotplug for the guest. After a while the guest will deal with this hotplug
- and qemu will clear the above bit.
+ hotplug for the guest. After a while, if the guest powers down the device
+ qemu will clear the above bit.
---

The whole idea is that the guest powers down the device, which causes
qemu to hot-remove it, and clear PCI_EXP_SLTSTA_PDS.

---
    /*
     * If the slot is populated, power indicator is off and power
     * controller is off, it is safe to detach the devices.
     *
     * Note: don't detach if condition was already true:
     * this is a work around for guests that overwrite
     * control of powered off slots before powering them on.
     */
    if ((sltsta & PCI_EXP_SLTSTA_PDS) && pcie_sltctl_powered_off(val) &&
        !pcie_sltctl_powered_off(old_slt_ctl))
    {
        pcie_cap_slot_do_unplug(dev); // clear PCI_EXP_SLTSTA_PDS
    }
---

Since the bit is different on source & target qemu, the migration is aborted.


>
> --
> Peter Xu
>

Thanks for reviewing Peter!
I will send a v3 with the updated commit msg and add the comments
suggested by Juan in the source code.

Thanks!
Leo
Michael S. Tsirkin July 6, 2023, 6:50 p.m. UTC | #7
On Thu, Jul 06, 2023 at 02:14:37PM -0400, Peter Xu wrote:
> On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > I asked the same question, and I still keep confused: whether there's a
> > > first bad commit?  Starting from when it fails?
> > >
> > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> > 
> > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > this any further.
> 
> I see, thanks!
> 
> But then do you know why it's never hit before?  I assume it means this bug
> has been there for a long time.

It's a race - you have to migrate after the bit has been set but before
the bit gets cleared.
cmask is exactly for bits that qemu modifies itself.
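
For reference, capability bytes are enabled for this check by default when
the capability is registered; a rough sketch, assuming the usual
pci_add_capability() behavior in hw/pci/pci.c (simplified, not verbatim):

    /* registering a capability makes its config range read-only for the
     * guest and checked on load */
    memset(pdev->wmask + offset, 0, size);    /* read-only by default */
    memset(pdev->cmask + offset, 0xFF, size); /* checked on load by default */

So a status bit that qemu flips on its own, like PCI_EXP_SLTSTA_PDS, has to
be cleared from cmask explicitly, which is what this patch does.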
Peter Xu July 6, 2023, 7:02 p.m. UTC | #8
On Thu, Jul 06, 2023 at 02:50:20PM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 06, 2023 at 02:14:37PM -0400, Peter Xu wrote:
> > On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > > I asked the same question, and I still keep confused: whether there's a
> > > > first bad commit?  Starting from when it fails?
> > > >
> > > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> > > 
> > > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > > this any further.
> > 
> > I see, thanks!
> > 
> > But then do you know why it's never hit before?  I assume it means this bug
> > has been there for a long time.
> 
> It's a race - you have to migrate after the bit has been set before
> the bit got cleared.
> cmask is exactly for bits that qemu modifies itself.

Michael, do you mean that Leo's patch is wrong?

I just understood why it got cleared - I think Leo didn't mention that
the device was actually offlined before migration; IIUC that's why the PDS
bit got cleared, if PDS is meant to describe the state of the slot.

According to:

    /* Used to enable checks on load. Note that writable bits are
     * never checked even if set in cmask. */
    uint8_t *cmask;

It does sound reasonable to me to have PDS cleared when the device is
offlined. Since the hypervisor doesn't really know what the condition of the
slot presence bit will be when migrating, it seems we should just clear the
bit in cmask.

So with the last reply from Leo, the patch looks all right to me. It's just
that, as Leo mentioned, the offline process should be described in the commit
message if that's the case, because that's definitely an important step to
reproduce the issue.

Thanks,
Michael S. Tsirkin July 6, 2023, 8 p.m. UTC | #9
On Thu, Jul 06, 2023 at 03:02:07PM -0400, Peter Xu wrote:
> On Thu, Jul 06, 2023 at 02:50:20PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 06, 2023 at 02:14:37PM -0400, Peter Xu wrote:
> > > On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > > > I asked the same question, and I still keep confused: whether there's a
> > > > > first bad commit?  Starting from when it fails?
> > > > >
> > > > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> > > > 
> > > > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > > > this any further.
> > > 
> > > I see, thanks!
> > > 
> > > But then do you know why it's never hit before?  I assume it means this bug
> > > has been there for a long time.
> > 
> > It's a race - you have to migrate after the bit has been set before
> > the bit got cleared.
> > cmask is exactly for bits that qemu modifies itself.
> 
> Michael, do you mean that Leo's patch is wrong?


I mean his patch is exactly right. cmask was designed with this
kind of use case in mind.
Will queue.

> I just got understood why it got cleared - I think Leo didn't mention that
> the device was actually offlined before migration, IIUC that's why the PDS
> bit got cleared, if PDS was trying to describe that of the slot.
> 
> According to:
> 
>     /* Used to enable checks on load. Note that writable bits are
>      * never checked even if set in cmask. */
>     uint8_t *cmask;
> 
> It does sound reasonable to me to have PDS cleared when device offlined.
> Since hypervisor doesn't really know what the condition the slot presence
> bit would be when migrating, it seems we should just clear the bit in
> cmask.
> 
> So with the last reply from Leo, the patch looks all right to me.  It's
> just that as Leo mentioned, we should mention the offline process if that's
> the case, because that's definitely an important step to reproduce the issue.
> 
> Thanks,

If you want to suggest more text to the commit log, for the benefit
of backporters, that is fine by me.

> -- 
> Peter Xu
Peter Xu July 6, 2023, 8:17 p.m. UTC | #10
On Thu, Jul 06, 2023 at 04:00:49PM -0400, Michael S. Tsirkin wrote:
> I mean his patch is exactly right. cmask was designed with this
> kind of use case in mind.
> Will queue.

That's great news.

> If you want to suggest more text to the commit log, for the benefit
> of backporters, that is fine by me.

If you're fine with it I've no issue; since we're reaching soft freeze I
think it better if it can be merged on time (or just slightly enhance the
commit message when merging).

Thanks!
Leonardo Bras July 6, 2023, 9:47 p.m. UTC | #11
On Thu, Jul 6, 2023 at 3:24 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > I asked the same question, and I still keep confused: whether there's a
> > > first bad commit?  Starting from when it fails?
> > >
> > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> >
> > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > this any further.
>
> I see, thanks!
>
> But then do you know why it's never hit before?  I assume it means this bug
> has been there for a long time.

Even longer than expected:

I did some testing looking for the bug:

qemu v5.0.1 reproduces
qemu v4.0.1 reproduces
qemu v3.0.1 reproduces
qemu v2.12.1 reproduces

I decided to stop testing at this point, because building qemu required
python2, and that is far enough back (5 years).

Seems to be a very old bug that just hasn't bothered anyone until now.

>
> --
> Peter Xu
>

Thanks!
Leo
Leonardo Bras July 10, 2023, 5:49 p.m. UTC | #12
On Thu, Jul 6, 2023 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 03:02:07PM -0400, Peter Xu wrote:
> > On Thu, Jul 06, 2023 at 02:50:20PM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 06, 2023 at 02:14:37PM -0400, Peter Xu wrote:
> > > > On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > > > > I asked the same question, and I still keep confused: whether there's a
> > > > > > first bad commit?  Starting from when it fails?
> > > > > >
> > > > > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> > > > >
> > > > > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > > > > this any further.
> > > >
> > > > I see, thanks!
> > > >
> > > > But then do you know why it's never hit before?  I assume it means this bug
> > > > has been there for a long time.
> > >
> > > It's a race - you have to migrate after the bit has been set before
> > > the bit got cleared.
> > > cmask is exactly for bits that qemu modifies itself.
> >
> > Michael, do you mean that Leo's patch is wrong?
>
>
> I mean his patch is exactly right. cmask was designed with this
> kind of use case in mind.
> Will queue.

Thanks Michael!

Any chance this will get in on time for v8.1?

>
> > I just got understood why it got cleared - I think Leo didn't mention that
> > the device was actually offlined before migration, IIUC that's why the PDS
> > bit got cleared, if PDS was trying to describe that of the slot.
> >
> > According to:
> >
> >     /* Used to enable checks on load. Note that writable bits are
> >      * never checked even if set in cmask. */
> >     uint8_t *cmask;
> >
> > It does sound reasonable to me to have PDS cleared when device offlined.
> > Since hypervisor doesn't really know what the condition the slot presence
> > bit would be when migrating, it seems we should just clear the bit in
> > cmask.
> >
> > So with the last reply from Leo, the patch looks all right to me.  It's
> > just that as Leo mentioned, we should mention the offline process if that's
> > the case, because that's definitely an important step to reproduce the issue.
> >
> > Thanks,
>
> If you want to suggest more text to the commit log, for the benefit
> of backporters, that is fine by me.
>
> > --
> > Peter Xu
>
Michael S. Tsirkin July 10, 2023, 6:16 p.m. UTC | #13
On Mon, Jul 10, 2023 at 02:49:05PM -0300, Leonardo Bras Soares Passos wrote:
> On Thu, Jul 6, 2023 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 06, 2023 at 03:02:07PM -0400, Peter Xu wrote:
> > > On Thu, Jul 06, 2023 at 02:50:20PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 06, 2023 at 02:14:37PM -0400, Peter Xu wrote:
> > > > > On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > > > > > I asked the same question, and I still keep confused: whether there's a
> > > > > > > first bad commit?  Starting from when it fails?
> > > > > > >
> > > > > > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> > > > > >
> > > > > > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > > > > > this any further.
> > > > >
> > > > > I see, thanks!
> > > > >
> > > > > But then do you know why it's never hit before?  I assume it means this bug
> > > > > has been there for a long time.
> > > >
> > > > It's a race - you have to migrate after the bit has been set before
> > > > the bit got cleared.
> > > > cmask is exactly for bits that qemu modifies itself.
> > >
> > > Michael, do you mean that Leo's patch is wrong?
> >
> >
> > I mean his patch is exactly right. cmask was designed with this
> > kind of use case in mind.
> > Will queue.
> 
> Thanks Michael!
> 
> Any chance this will get in on time for v8.1 ?

Yes, working on pull request now.


> >
> > > I just got understood why it got cleared - I think Leo didn't mention that
> > > the device was actually offlined before migration, IIUC that's why the PDS
> > > bit got cleared, if PDS was trying to describe that of the slot.
> > >
> > > According to:
> > >
> > >     /* Used to enable checks on load. Note that writable bits are
> > >      * never checked even if set in cmask. */
> > >     uint8_t *cmask;
> > >
> > > It does sound reasonable to me to have PDS cleared when device offlined.
> > > Since hypervisor doesn't really know what the condition the slot presence
> > > bit would be when migrating, it seems we should just clear the bit in
> > > cmask.
> > >
> > > So with the last reply from Leo, the patch looks all right to me.  It's
> > > just that as Leo mentioned, we should mention the offline process if that's
> > > the case, because that's definitely an important step to reproduce the issue.
> > >
> > > Thanks,
> >
> > If you want to suggest more text to the commit log, for the benefit
> > of backporters, that is fine by me.
> >
> > > --
> > > Peter Xu
> >
Leonardo Bras July 10, 2023, 9:48 p.m. UTC | #14
On Mon, Jul 10, 2023 at 3:16 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jul 10, 2023 at 02:49:05PM -0300, Leonardo Bras Soares Passos wrote:
> > On Thu, Jul 6, 2023 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jul 06, 2023 at 03:02:07PM -0400, Peter Xu wrote:
> > > > On Thu, Jul 06, 2023 at 02:50:20PM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 06, 2023 at 02:14:37PM -0400, Peter Xu wrote:
> > > > > > On Thu, Jul 06, 2023 at 03:07:40PM -0300, Leonardo Bras Soares Passos wrote:
> > > > > > > > I asked the same question, and I still keep confused: whether there's a
> > > > > > > > first bad commit?  Starting from when it fails?
> > > > > > > >
> > > > > > > > For example, is this broken on 6.0 binaries too with pc-q35-6.0?
> > > > > > >
> > > > > > > I tested for qemu 6.0, and it still reproduces, but have not pursued
> > > > > > > this any further.
> > > > > >
> > > > > > I see, thanks!
> > > > > >
> > > > > > But then do you know why it's never hit before?  I assume it means this bug
> > > > > > has been there for a long time.
> > > > >
> > > > > It's a race - you have to migrate after the bit has been set before
> > > > > the bit got cleared.
> > > > > cmask is exactly for bits that qemu modifies itself.
> > > >
> > > > Michael, do you mean that Leo's patch is wrong?
> > >
> > >
> > > I mean his patch is exactly right. cmask was designed with this
> > > kind of use case in mind.
> > > Will queue.
> >
> > Thanks Michael!
> >
> > Any chance this will get in on time for v8.1 ?
>
> Yes, working on pull request now.

Thanks!

>
>
> > >
> > > > I just got understood why it got cleared - I think Leo didn't mention that
> > > > the device was actually offlined before migration, IIUC that's why the PDS
> > > > bit got cleared, if PDS was trying to describe that of the slot.
> > > >
> > > > According to:
> > > >
> > > >     /* Used to enable checks on load. Note that writable bits are
> > > >      * never checked even if set in cmask. */
> > > >     uint8_t *cmask;
> > > >
> > > > It does sound reasonable to me to have PDS cleared when device offlined.
> > > > Since hypervisor doesn't really know what the condition the slot presence
> > > > bit would be when migrating, it seems we should just clear the bit in
> > > > cmask.
> > > >
> > > > So with the last reply from Leo, the patch looks all right to me.  It's
> > > > just that as Leo mentioned, we should mention the offline process if that's
> > > > the case, because that's definitely an important step to reproduce the issue.
> > > >
> > > > Thanks,
> > >
> > > If you want to suggest more text to the commit log, for the benefit
> > > of backporters, that is fine by me.
> > >
> > > > --
> > > > Peter Xu
> > >
>
diff mbox series

Patch

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index b8c24cf45f..cae56bf1c8 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -659,6 +659,10 @@  void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
     pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
                                PCI_EXP_HP_EV_SUPPORTED);
 
+    /* Avoid migration abortion when this device hot-removed by guest */
+    pci_word_test_and_clear_mask(dev->cmask + pos + PCI_EXP_SLTSTA,
+                                 PCI_EXP_SLTSTA_PDS);
+
     dev->exp.hpev_notified = false;
 
     qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))),