[1/1] pcie: Add hotplug detect state register to w1cmask

Message ID 20230629090500.438976-2-leobras@redhat.com (mailing list archive)
State New, archived

Commit Message

Leonardo Bras June 29, 2023, 9:05 a.m. UTC
When trying to migrate a machine of type pc-q35-6.0 or lower with these
command-line options:

-device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
-device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1

the following error occurs after all RAM pages have been sent:

qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
qemu-kvm: load of migration failed: Invalid argument

This happens on pc-q35-6.0 or lower because of:
{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }

In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
which sets bit PCI_EXP_SLTSTA_PDS in byte 0x6e of the bus dev->config to
signal PCI hotplug to the guest. After a while the guest will deal with
this hotplug and qemu will clear the above bit.

Then, during migration, get_pci_config_device() will compare the
configs of the freshly created device and the one being received via
migration, which will differ in the PCI_EXP_SLTSTA_PDS bit and cause the
migration to fail.

To avoid this fake incompatibility, there are two fields in PCIDevice that
can help:

.wmask: Used to implement R/W bytes, and
.w1cmask: Used to implement RW1C (Write 1 to Clear) bytes

According to pcie_cap_slot_init() the slot status register
(PCI_EXP_SLTSTA), in which PCI_EXP_SLTSTA_PDS is a flag, seems to fall
under the w1cmask field, which makes sense given how hotplug signaling
works.

So, add the PCI_EXP_SLTSTA_PDS bit to w1cmask so that the fake
incompatibility in get_pci_config_device() does not abort the migration.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 hw/pci/pcie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Peter Xu June 29, 2023, 5:01 p.m. UTC | #1
Hi, Leo,

Thanks for figuring this out.  Let me copy a few more potential reviewers
from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
Q35").

On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> When trying to migrate a machine type pc-q35-6.0 or lower, with this
> cmdline options:
> 
> -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> 
> the following bug happens after all ram pages were sent:
> 
> qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> qemu-kvm: Failed to load PCIDevice:config
> qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> qemu-kvm: load of migration failed: Invalid argument
> 
> This happens on pc-q35-6.0 or lower because of:
> { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> 
> In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> which sets the bus dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to 
> signal PCI hotplug for the guest. After a while the guest will deal with
> this hotplug and qemu will clear the above bit.
> 
> Then, during migration, get_pci_config_device() will compare the
> configs of both the freshly created device and the one that is being
> received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> and cause the bug to reproduce.
> 
> To avoid this fake incompatibility, there are two fields in PCIDevice that
> can help:
> 
> .wmask: Used to implement R/W bytes, and
> .w1cmask: Used to implement RW1C(Write 1 to Clear) bytes

Is there one more option to clear the bit in cmask?

IIUC w1cmask means the guest can now write to this bit, but afaiu from the
pcie spec it's RO.

> 
> According to pcie_cap_slot_init() the slot status register
> (PCI_EXP_SLTSTA), in which PCI_EXP_SLTSTA_PDS is a flag, seems to fall
> under w1cmask field, with makes sense due to the way signaling the hotplug
> works.
> 
> So, add PCI_EXP_SLTSTA_PDS bit to w1cmask, so the fake incompatibility on
> get_pci_config_device() does not abort the migration.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Do we need a Fixes: tag, and do we also need to copy stable?

> ---
>  hw/pci/pcie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index b8c24cf45f..2def1765a5 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -657,7 +657,7 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
>                                 PCI_EXP_SLTCTL_EIC);
>  
>      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
> -                               PCI_EXP_HP_EV_SUPPORTED);
> +                               PCI_EXP_HP_EV_SUPPORTED | PCI_EXP_SLTSTA_PDS);
>  
>      dev->exp.hpev_notified = false;
>  
> -- 
> 2.41.0
>
Michael S. Tsirkin June 29, 2023, 7:33 p.m. UTC | #2
On Thu, Jun 29, 2023 at 01:01:53PM -0400, Peter Xu wrote:
> Hi, Leo,
> 
> Thanks for figuring this out.  Let me copy a few more potential reviewers
> from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> Q35").
> 
> On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > cmdline options:
> > 
> > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> > 
> > the following bug happens after all ram pages were sent:
> > 
> > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > qemu-kvm: Failed to load PCIDevice:config
> > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > qemu-kvm: load of migration failed: Invalid argument
> > 
> > This happens on pc-q35-6.0 or lower because of:
> > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> > 
> > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > which sets the bus dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to 
> > signal PCI hotplug for the guest. After a while the guest will deal with
> > this hotplug and qemu will clear the above bit.

Presence Detect State – This bit indicates the presence of an
adapter in the slot, reflected by the logical “OR” of the Physical
Layer in-band presence detect mechanism and, if present, any
out-of-band presence detect mechanism defined for the slot’s
corresponding form factor. Note that the in-band presence
detect mechanism requires that power be applied to an adapter
for its presence to be detected. Consequently, form factors that
require a power controller for hot-plug must implement a
physical pin presence detect mechanism.
RO
Defined encodings are:
0b Slot Empty
1b Card Present in slot
This bit must be implemented on all Downstream Ports that
implement slots. For Downstream Ports not connected to slots
(where the Slot Implemented bit of the PCI Express Capabilities
register is 0b), this bit must be hardwired to 1b.


And this seems to match what QEMU is doing: it clears the bit on unplug,
not after the guest deals with the hotplug.


> > Then, during migration, get_pci_config_device() will compare the
> > configs of both the freshly created device and the one that is being
> > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > and cause the bug to reproduce.

So the bit is set on the source.
But why is the bit cleared on the destination? This is the part I don't get.


> > To avoid this fake incompatibility, there are two fields in PCIDevice that
> > can help:
> > 
> > .wmask: Used to implement R/W bytes, and
> > .w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> 
> Is there one more option to clear the bit in cmask?
> 
> IIUC w1cmask means the guest can now write to this bit, but afaiu from the
> pcie spec it's RO.

Yes this bit must be RO.

Peter Xu June 29, 2023, 8:01 p.m. UTC | #3
On Thu, Jun 29, 2023 at 03:33:06PM -0400, Michael S. Tsirkin wrote:
> So bit is set on source.
> But why is the bit cleared on destination? This is the part I don't get.

My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
device, we just won't ever set the PCI_EXP_SLTSTA_PDS bit?

Michael S. Tsirkin June 29, 2023, 8:06 p.m. UTC | #4
On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
> device, we just won't ever PCI_EXP_SLTSTA_PDS bit?

Why?


Peter Xu June 29, 2023, 8:56 p.m. UTC | #5
On Thu, Jun 29, 2023 at 04:06:53PM -0400, Michael S. Tsirkin wrote:
> On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> > My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
> > device, we just won't ever PCI_EXP_SLTSTA_PDS bit?
> 
> Why?

Never mind, spoke too soon, sorry. :(

I thought pcie_cap_slot_plug_cb() could skip setting the bit, but then I
just found that dev->hotplugged is not what I imagined there.

Leo should know better.

Leonardo Bras July 3, 2023, 5:20 a.m. UTC | #6
Hello Peter and Michael, thanks for reviewing!


On Thu, 2023-06-29 at 16:56 -0400, Peter Xu wrote:
> On Thu, Jun 29, 2023 at 04:06:53PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> > > On Thu, Jun 29, 2023 at 03:33:06PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 29, 2023 at 01:01:53PM -0400, Peter Xu wrote:
> > > > > Hi, Leo,
> > > > > 
> > > > > Thanks for figuring this out.  Let me copy a few more potential reviewers
> > > > > from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> > > > > Q35").
> > > > > 
> > > > > On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> > > > > > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > > > > > cmdline options:
> > > > > > 
> > > > > > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > > > > > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> > > > > > 
> > > > > > the following bug happens after all ram pages were sent:
> > > > > > 
> > > > > > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > > > > > qemu-kvm: Failed to load PCIDevice:config
> > > > > > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > > > > > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > > > > > qemu-kvm: load of migration failed: Invalid argument
> > > > > > 
> > > > > > This happens on pc-q35-6.0 or lower because of:
> > > > > > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> > > > > > 
> > > > > > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > > > > > which sets the bus dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to 
> > > > > > signal PCI hotplug for the guest. After a while the guest will deal with
> > > > > > this hotplug and qemu will clear the above bit.
> > > > 
> > > > Presence Detect State – This bit indicates the presence of an
> > > > adapter in the slot, reflected by the logical “OR” of the Physical
> > > > Layer in-band presence detect mechanism and, if present, any
> > > > out-of-band presence detect mechanism defined for the slot’s
> > > > corresponding form factor. Note that the in-band presence
> > > > detect mechanism requires that power be applied to an adapter
> > > > for its presence to be detected. Consequently, form factors that
> > > > require a power controller for hot-plug must implement a
> > > > physical pin presence detect mechanism.
> > > > RO
> > > > Defined encodings are:
> > > > 0b Slot Empty
> > > > 1b Card Present in slot
> > > > This bit must be implemented on all Downstream Ports that
> > > > implement slots. For Downstream Ports not connected to slots
> > > > (where the Slot Implemented bit of the PCI Express Capabilities
> > > > register is 0b), this bit must be hardwired to 1b.

Thank you for providing this doc!
I am new to PCI stuff, could you please point me to this doc?

> > > > 
> > > > 
> > > > And this seems to match what QEMU is doing: it clears on unplug
> > > > not after guest deals with hotplug.

Oh, that's weird.
It should not unplug the device, so IIUC it should not clear the bit.
Maybe something weird is happening in the guest, I will take a look.

> > > > 
> > > > 
> > > > > > Then, during migration, get_pci_config_device() will compare the
> > > > > > configs of both the freshly created device and the one that is being
> > > > > > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > > > > > and cause the bug to reproduce.
> > > > 
> > > > So bit is set on source.
> > > > But why is the bit cleared on destination? This is the part I don't get.

No, the bit is set when the device is created by qemu.
After some time running (once the boot process completes), the bit is cleared.

The receiving end of migration will then create the device with the bit set, and
then wait for migration. Once the source device state is received, the comparison
fails because those bits differ.



> > > 
> > > My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
> > > device, we just won't ever PCI_EXP_SLTSTA_PDS bit?
> > 
> > Why?
> 
> Never mind, spoke too soon, sorry. :(
> 
> I thought pcie_cap_slot_plug_cb() can skip the set, but then I just found
> that dev->hotplugged is not what I imagined there.
> 
> Leo should know better.

Which hotplug function is called depends on the
ACPI_PM_PROP_ACPI_PCIHP_BRIDGE option:

When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="off", hotplug_handler_plug() calls
pcie_cap_slot_plug_cb(), which sets bit PCI_EXP_SLTSTA_PDS in byte 0x6e of
the bus dev->config.

When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="on", hotplug_handler_plug() calls
ich9_pm_device_plug_cb(), which does not set this bit.

> 
> > 
> > 
> > > > 
> > > > 
> > > > > > To avoid this fake incompatibility, there are two fields in PCIDevice that
> > > > > > can help:
> > > > > > 
> > > > > > .wmask: Used to implement R/W bytes, and
> > > > > > .w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> > > > > 
> > > > > Is there one more option to clear the bit in cmask?

We could clear the bit in .cmask. I suggested w1cmask because I had
understood that the bit was guest-writable.

> > > > > 
> > > > > IIUC w1cmask means the guest can now write to this bit, but afaiu from the
> > > > > pcie spec it's RO.
> > > > 
> > > > Yes this bit must be RO.

My bad, I inferred the behavior from how the guest was working, and that went
wrong. Given the documentation above, I would suggest clearing the corresponding
bit in .cmask so qemu skips checking this one.

What is your opinion on that?

Leonardo Bras July 4, 2023, 6:20 a.m. UTC | #7
Hello Peter and Michael, I have a few updates on this:

On Mon, 2023-07-03 at 02:20 -0300, Leonardo Brás wrote:
> Hello Peter and Michael, thanks for reviewing!
> 
> 
> On Thu, 2023-06-29 at 16:56 -0400, Peter Xu wrote:
> > On Thu, Jun 29, 2023 at 04:06:53PM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> > > > On Thu, Jun 29, 2023 at 03:33:06PM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jun 29, 2023 at 01:01:53PM -0400, Peter Xu wrote:
> > > > > > Hi, Leo,
> > > > > > 
> > > > > > Thanks for figuring this out.  Let me copy a few more potential reviewers
> > > > > > from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> > > > > > Q35").
> > > > > > 
> > > > > > On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> > > > > > > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > > > > > > cmdline options:
> > > > > > > 
> > > > > > > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > > > > > > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> > > > > > > 
> > > > > > > the following bug happens after all ram pages were sent:
> > > > > > > 
> > > > > > > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > > > > > > qemu-kvm: Failed to load PCIDevice:config
> > > > > > > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > > > > > > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > > > > > > qemu-kvm: load of migration failed: Invalid argument
> > > > > > > 
> > > > > > > This happens on pc-q35-6.0 or lower because of:
> > > > > > > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> > > > > > > 
> > > > > > > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > > > > > > which sets the bus dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to 
> > > > > > > signal PCI hotplug for the guest. After a while the guest will deal with
> > > > > > > this hotplug and qemu will clear the above bit.
> > > > > 
> > > > > Presence Detect State – This bit indicates the presence of an
> > > > > adapter in the slot, reflected by the logical “OR” of the Physical
> > > > > Layer in-band presence detect mechanism and, if present, any
> > > > > out-of-band presence detect mechanism defined for the slot’s
> > > > > corresponding form factor. Note that the in-band presence
> > > > > detect mechanism requires that power be applied to an adapter
> > > > > for its presence to be detected. Consequently, form factors that
> > > > > require a power controller for hot-plug must implement a
> > > > > physical pin presence detect mechanism.
> > > > > RO
> > > > > Defined encodings are:
> > > > > 0b Slot Empty
> > > > > 1b Card Present in slot
> > > > > This bit must be implemented on all Downstream Ports that
> > > > > implement slots. For Downstream Ports not connected to slots
> > > > > (where the Slot Implemented bit of the PCI Express Capabilities
> > > > > register is 0b), this bit must be hardwired to 1b.
> 
> Thank you for providing this doc!
> I am new to PCI stuff, could you please point this doc?

(I mean, the link to the documentation)

> 
> > > > > 
> > > > > 
> > > > > And this seems to match what QEMU is doing: it clears on unplug
> > > > > not after guest deals with hotplug.
> 
> Oh, that's weird.
> It should not unplug the device, so IIUC it should not clear the bit.
> Maybe something weird is happening in the guest, I will take a look.

Updates on this:
You are right! For some reason the guest is hot-unplugging the device under some
conditions, so there is another bug here for me to look into.

> 
> > > > > 
> > > > > 
> > > > > > > Then, during migration, get_pci_config_device() will compare the
> > > > > > > configs of both the freshly created device and the one that is being
> > > > > > > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > > > > > > and cause the bug to reproduce.
> > > > > 
> > > > > So bit is set on source.
> > > > > But why is the bit cleared on destination? This is the part I don't get.
> 
> No, bit is set when the device is created by qemu.
> After some time running (boot process completion) the bit is cleared.

The 'after some time' here is about the guest hot-unplugging the device.

> 
> The receiving end of migration will then create the device with the bit set, and
> then wait for migration. After the source device is received, the compare fails
> due to those bits being different.
> 

In any case, the guest OS may hot-unplug the device for any reason, so we need
to cover this scenario so that it does not break migration.

> 
> 
> > > > 
> > > > My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
> > > > device, we just won't ever PCI_EXP_SLTSTA_PDS bit?
> > > 
> > > Why?
> > 
> > Never mind, spoke too soon, sorry. :(
> > 
> > I thought pcie_cap_slot_plug_cb() can skip the set, but then I just found
> > that dev->hotplugged is not what I imagined there.
> > 
> > Leo should know better.
> 
> There is a difference of which hotplug function is called based on the 
> ACPI_PM_PROP_ACPI_PCIHP_BRIDGE option:
> 
> When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="off", hotplug_handler_plug() calls
> pcie_cap_slot_plug_cb() which sets the bus dev->config byte 0x6e with bit
> PCI_EXP_SLTSTA_PDS.
> 
> When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="on", hotplug_handler_plug() calls
> ich9_pm_device_plug_cb(), which does not set this bit.
> 
> > 
> > > 
> > > 
> > > > > 
> > > > > 
> > > > > > > To avoid this fake incompatibility, there are two fields in PCIDevice that
> > > > > > > can help:
> > > > > > > 
> > > > > > > .wmask: Used to implement R/W bytes, and
> > > > > > > .w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> > > > > > 
> > > > > > Is there one more option to clear the bit in cmask?
> 
> We could clear the bit for .cmask . I suggested w1cmask because I previously
> understood that bit was guest-writeable.

IIUC, the bit is guest-writable, so we should use .wmask instead of .cmask.
Is this correct?

> 
> > > > > > 
> > > > > > IIUC w1cmask means the guest can now write to this bit, but afaiu from the
> > > > > > pcie spec it's RO.
> > > > > 
> > > > > Yes this bit must be RO.
> 
> My bad, I assumed behavior based on how the guest was working, and this gone
> wrong. With above documentation provided, I would suggest clearing the .config
> mask related bit so qemu skips checking this one.
> 
> What is your opinion on that?
> 
> > > > > 
> > > > > > > 
> > > > > > > According to pcie_cap_slot_init() the slot status register
> > > > > > > (PCI_EXP_SLTSTA), in which PCI_EXP_SLTSTA_PDS is a flag, seems to fall
> > > > > > > under w1cmask field, with makes sense due to the way signaling the hotplug
> > > > > > > works.
> > > > > > > 
> > > > > > > So, add PCI_EXP_SLTSTA_PDS bit to w1cmask, so the fake incompatibility on
> > > > > > > get_pci_config_device() does not abort the migration.
> > > > > > > 
> > > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > > > 
> > > > > > Do we need a Fixes: and also the need to copy stable?
> > > > > > 
> > > > > > > ---
> > > > > > >  hw/pci/pcie.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > > index b8c24cf45f..2def1765a5 100644
> > > > > > > --- a/hw/pci/pcie.c
> > > > > > > +++ b/hw/pci/pcie.c
> > > > > > > @@ -657,7 +657,7 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
> > > > > > >                                 PCI_EXP_SLTCTL_EIC);
> > > > > > >  
> > > > > > >      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
> > > > > > > -                               PCI_EXP_HP_EV_SUPPORTED);
> > > > > > > +                               PCI_EXP_HP_EV_SUPPORTED | PCI_EXP_SLTSTA_PDS);
> > > > > > >  
> > > > > > >      dev->exp.hpev_notified = false;
> > > > > > >  
> > > > > > > -- 
> > > > > > > 2.41.0
> > > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Peter Xu
> > > > > 
> > > > 
> > > > -- 
> > > > Peter Xu
> > > 
> > 
>
Michael S. Tsirkin July 4, 2023, 6:43 a.m. UTC | #8
On Tue, Jul 04, 2023 at 03:20:36AM -0300, Leonardo Brás wrote:
> Hello Peter and Michael, I have a few updates on this:
> 
> On Mon, 2023-07-03 at 02:20 -0300, Leonardo Brás wrote:
> > Hello Peter and Michael, thanks for reviewing!
> > 
> > 
> > On Thu, 2023-06-29 at 16:56 -0400, Peter Xu wrote:
> > > On Thu, Jun 29, 2023 at 04:06:53PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> > > > > On Thu, Jun 29, 2023 at 03:33:06PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jun 29, 2023 at 01:01:53PM -0400, Peter Xu wrote:
> > > > > > > Hi, Leo,
> > > > > > > 
> > > > > > > Thanks for figuring this out.  Let me copy a few more potential reviewers
> > > > > > > from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> > > > > > > Q35").
> > > > > > > 
> > > > > > > On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> > > > > > > > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > > > > > > > cmdline options:
> > > > > > > > 
> > > > > > > > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > > > > > > > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> > > > > > > > 
> > > > > > > > the following bug happens after all ram pages were sent:
> > > > > > > > 
> > > > > > > > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > > > > > > > qemu-kvm: Failed to load PCIDevice:config
> > > > > > > > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > > > > > > > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > > > > > > > qemu-kvm: load of migration failed: Invalid argument
> > > > > > > > 
> > > > > > > > This happens on pc-q35-6.0 or lower because of:
> > > > > > > > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> > > > > > > > 
> > > > > > > > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > > > > > > > which sets the bus dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to 
> > > > > > > > signal PCI hotplug for the guest. After a while the guest will deal with
> > > > > > > > this hotplug and qemu will clear the above bit.
> > > > > > 
> > > > > > Presence Detect State – This bit indicates the presence of an
> > > > > > adapter in the slot, reflected by the logical “OR” of the Physical
> > > > > > Layer in-band presence detect mechanism and, if present, any
> > > > > > out-of-band presence detect mechanism defined for the slot’s
> > > > > > corresponding form factor. Note that the in-band presence
> > > > > > detect mechanism requires that power be applied to an adapter
> > > > > > for its presence to be detected. Consequently, form factors that
> > > > > > require a power controller for hot-plug must implement a
> > > > > > physical pin presence detect mechanism.
> > > > > > RO
> > > > > > Defined encodings are:
> > > > > > 0b Slot Empty
> > > > > > 1b Card Present in slot
> > > > > > This bit must be implemented on all Downstream Ports that
> > > > > > implement slots. For Downstream Ports not connected to slots
> > > > > > (where the Slot Implemented bit of the PCI Express Capabilities
> > > > > > register is 0b), this bit must be hardwired to 1b.
> > 
> > Thank you for providing this doc!
> > I am new to PCI stuff, could you please point this doc?
> 
> (I mean, the link to the documentation)

The PCI specs are all here: https://pcisig.com/
Red Hat is a member, so just register; it's free.

I'd get the 5.0 version of the PCI Express Base spec:
https://members.pcisig.com/wg/PCI-SIG/document/13005

6.0 is out, but they did something that makes it take ages to open,
and it shouldn't matter for this.

> > 
> > > > > > 
> > > > > > 
> > > > > > And this seems to match what QEMU is doing: it clears on unplug
> > > > > > not after guest deals with hotplug.
> > 
> > Oh, that's weird.
> > It should not unplug the device, so IIUC it should not clear the bit.
> > Maybe something weird is happening in the guest, I will take a look.
> 
> Updates on this:
> You are right! For some reason the guest is hot-unplugging the device under some
> conditions, so there is another bug on this for me to look after.
> 
> > 
> > > > > > 
> > > > > > 
> > > > > > > > Then, during migration, get_pci_config_device() will compare the
> > > > > > > > configs of both the freshly created device and the one that is being
> > > > > > > > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > > > > > > > and cause the bug to reproduce.
> > > > > > 
> > > > > > So bit is set on source.
> > > > > > But why is the bit cleared on destination? This is the part I don't get.
> > 
> > No, bit is set when the device is created by qemu.
> > After some time running (boot process completion) the bit is cleared.
> 
> The 'after some time' here is about the guest hot-unplugging the device.
> 
> > 
> > The receiving end of migration will then create the device with the bit set, and
> > then wait for migration. After the source device is received, the compare fails
> > due to those bits being different.
> > 
> 
> But anyway, there is some chance the device will be hot-unplugged by the guest
> OS for any reason, so we need to cover this scenario so it does not break
> migration.
> 
> > 
> > 
> > > > > 
> > > > > My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
> > > > > device, we just won't ever PCI_EXP_SLTSTA_PDS bit?
> > > > 
> > > > Why?
> > > 
> > > Never mind, spoke too soon, sorry. :(
> > > 
> > > I thought pcie_cap_slot_plug_cb() can skip the set, but then I just found
> > > that dev->hotplugged is not what I imagined there.
> > > 
> > > Leo should know better.
> > 
> > There is a difference of which hotplug function is called based on the 
> > ACPI_PM_PROP_ACPI_PCIHP_BRIDGE option:
> > 
> > When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="off", hotplug_handler_plug() calls
> > pcie_cap_slot_plug_cb() which sets the bus dev->config byte 0x6e with bit
> > PCI_EXP_SLTSTA_PDS.
> > 
> > When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="on", hotplug_handler_plug() calls
> > ich9_pm_device_plug_cb(), which does not set this bit.
> > 
> > > 
> > > > 
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > > To avoid this fake incompatibility, there are two fields in PCIDevice that
> > > > > > > > can help:
> > > > > > > > 
> > > > > > > > .wmask: Used to implement R/W bytes, and
> > > > > > > > .w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> > > > > > > 
> > > > > > > Is there one more option to clear the bit in cmask?
> > 
> > We could clear the bit for .cmask . I suggested w1cmask because I previously
> > understood that bit was guest-writeable.
> 
> IIUC, the bit is guest-writeable, so we should use .wmask instead of .cmask .
> Is this correct?
> 
> > 
> > > > > > > 
> > > > > > > IIUC w1cmask means the guest can now write to this bit, but afaiu from the
> > > > > > > pcie spec it's RO.
> > > > > > 
> > > > > > Yes this bit must be RO.
> > 
> > My bad, I assumed behavior based on how the guest was working, and this gone
> > wrong. With above documentation provided, I would suggest clearing the .config
> > mask related bit so qemu skips checking this one.
> > 
> > What is your opinion on that?
> > 
> > > > > > 
> > > > > > > > 
> > > > > > > > According to pcie_cap_slot_init() the slot status register
> > > > > > > > (PCI_EXP_SLTSTA), in which PCI_EXP_SLTSTA_PDS is a flag, seems to fall
> > > > > > > > under w1cmask field, with makes sense due to the way signaling the hotplug
> > > > > > > > works.
> > > > > > > > 
> > > > > > > > So, add PCI_EXP_SLTSTA_PDS bit to w1cmask, so the fake incompatibility on
> > > > > > > > get_pci_config_device() does not abort the migration.
> > > > > > > > 
> > > > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> > > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > > > > 
> > > > > > > Do we need a Fixes: and also the need to copy stable?
> > > > > > > 
> > > > > > > > ---
> > > > > > > >  hw/pci/pcie.c | 2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > > > index b8c24cf45f..2def1765a5 100644
> > > > > > > > --- a/hw/pci/pcie.c
> > > > > > > > +++ b/hw/pci/pcie.c
> > > > > > > > @@ -657,7 +657,7 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
> > > > > > > >                                 PCI_EXP_SLTCTL_EIC);
> > > > > > > >  
> > > > > > > >      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
> > > > > > > > -                               PCI_EXP_HP_EV_SUPPORTED);
> > > > > > > > +                               PCI_EXP_HP_EV_SUPPORTED | PCI_EXP_SLTSTA_PDS);
> > > > > > > >  
> > > > > > > >      dev->exp.hpev_notified = false;
> > > > > > > >  
> > > > > > > > -- 
> > > > > > > > 2.41.0
> > > > > > > > 
> > > > > > > 
> > > > > > > -- 
> > > > > > > Peter Xu
> > > > > > 
> > > > > 
> > > > > -- 
> > > > > Peter Xu
> > > > 
> > > 
> >
Leonardo Bras July 5, 2023, 6:40 a.m. UTC | #9
On Tue, Jul 4, 2023 at 3:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jul 04, 2023 at 03:20:36AM -0300, Leonardo Brás wrote:
> > Hello Peter and Michael, I have a few updates on this:
> >
> > On Mon, 2023-07-03 at 02:20 -0300, Leonardo Brás wrote:
> > > Hello Peter and Michael, thanks for reviewing!
> > >
> > >
> > > On Thu, 2023-06-29 at 16:56 -0400, Peter Xu wrote:
> > > > On Thu, Jun 29, 2023 at 04:06:53PM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> > > > > > On Thu, Jun 29, 2023 at 03:33:06PM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Jun 29, 2023 at 01:01:53PM -0400, Peter Xu wrote:
> > > > > > > > Hi, Leo,
> > > > > > > >
> > > > > > > > Thanks for figuring this out.  Let me copy a few more potential reviewers
> > > > > > > > from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> > > > > > > > Q35").
> > > > > > > >
> > > > > > > > On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> > > > > > > > > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > > > > > > > > cmdline options:
> > > > > > > > >
> > > > > > > > > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > > > > > > > > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> > > > > > > > >
> > > > > > > > > the following bug happens after all ram pages were sent:
> > > > > > > > >
> > > > > > > > > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > > > > > > > > qemu-kvm: Failed to load PCIDevice:config
> > > > > > > > > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > > > > > > > > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > > > > > > > > qemu-kvm: load of migration failed: Invalid argument
> > > > > > > > >
> > > > > > > > > This happens on pc-q35-6.0 or lower because of:
> > > > > > > > > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> > > > > > > > >
> > > > > > > > > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > > > > > > > > which sets the bus dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to
> > > > > > > > > signal PCI hotplug for the guest. After a while the guest will deal with
> > > > > > > > > this hotplug and qemu will clear the above bit.
> > > > > > >
> > > > > > > Presence Detect State – This bit indicates the presence of an
> > > > > > > adapter in the slot, reflected by the logical “OR” of the Physical
> > > > > > > Layer in-band presence detect mechanism and, if present, any
> > > > > > > out-of-band presence detect mechanism defined for the slot’s
> > > > > > > corresponding form factor. Note that the in-band presence
> > > > > > > detect mechanism requires that power be applied to an adapter
> > > > > > > for its presence to be detected. Consequently, form factors that
> > > > > > > require a power controller for hot-plug must implement a
> > > > > > > physical pin presence detect mechanism.
> > > > > > > RO
> > > > > > > Defined encodings are:
> > > > > > > 0b Slot Empty
> > > > > > > 1b Card Present in slot
> > > > > > > This bit must be implemented on all Downstream Ports that
> > > > > > > implement slots. For Downstream Ports not connected to slots
> > > > > > > (where the Slot Implemented bit of the PCI Express Capabilities
> > > > > > > register is 0b), this bit must be hardwired to 1b.
> > >
> > > Thank you for providing this doc!
> > > I am new to PCI stuff, could you please point this doc?
> >
> > (I mean, the link to the documentation)
>
> The pci specs are all here: https://pcisig.com/
> Red Hat is a member so just register, it's free.
>
> I'd get the 5.0 version of pci express base:
> https://members.pcisig.com/wg/PCI-SIG/document/13005
>
> 6.0 is out but they did something to make it take years to open,
> and it shouldn't matter for this.

This is great! Thanks for sharing!

>
> > >
> > > > > > >
> > > > > > >
> > > > > > > And this seems to match what QEMU is doing: it clears on unplug
> > > > > > > not after guest deals with hotplug.
> > >
> > > Oh, that's weird.
> > > It should not unplug the device, so IIUC it should not clear the bit.
> > > Maybe something weird is happening in the guest, I will take a look.
> >
> > Updates on this:
> > You are right! For some reason the guest is hot-unplugging the device under some
> > conditions, so there is another bug on this for me to look after.
> >
> > >
> > > > > > >
> > > > > > >
> > > > > > > > > Then, during migration, get_pci_config_device() will compare the
> > > > > > > > > configs of both the freshly created device and the one that is being
> > > > > > > > > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > > > > > > > > and cause the bug to reproduce.
> > > > > > >
> > > > > > > So bit is set on source.
> > > > > > > But why is the bit cleared on destination? This is the part I don't get.
> > >
> > > No, bit is set when the device is created by qemu.
> > > After some time running (boot process completion) the bit is cleared.
> >
> > The 'after some time' here is about the guest hot-unplugging the device.
> >
> > >
> > > The receiving end of migration will then create the device with the bit set, and
> > > then wait for migration. After the source device is received, the compare fails
> > > due to those bits being different.
> > >
> >
> > But anyway, there is some chance the device will be hot-unplugged by the guest
> > OS for any reason, so we need to cover this scenario so it does not break
> > migration.
> >
> > >
> > >
> > > > > >
> > > > > > My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
> > > > > > device, we just won't ever PCI_EXP_SLTSTA_PDS bit?
> > > > >
> > > > > Why?
> > > >
> > > > Never mind, spoke too soon, sorry. :(
> > > >
> > > > I thought pcie_cap_slot_plug_cb() can skip the set, but then I just found
> > > > that dev->hotplugged is not what I imagined there.
> > > >
> > > > Leo should know better.
> > >
> > > There is a difference of which hotplug function is called based on the
> > > ACPI_PM_PROP_ACPI_PCIHP_BRIDGE option:
> > >
> > > When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="off", hotplug_handler_plug() calls
> > > pcie_cap_slot_plug_cb() which sets the bus dev->config byte 0x6e with bit
> > > PCI_EXP_SLTSTA_PDS.
> > >
> > > When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="on", hotplug_handler_plug() calls
> > > ich9_pm_device_plug_cb(), which does not set this bit.
> > >
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > To avoid this fake incompatibility, there are two fields in PCIDevice that
> > > > > > > > > can help:
> > > > > > > > >
> > > > > > > > > .wmask: Used to implement R/W bytes, and
> > > > > > > > > .w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> > > > > > > >
> > > > > > > > Is there one more option to clear the bit in cmask?
> > >
> > > We could clear the bit for .cmask . I suggested w1cmask because I previously
> > > understood that bit was guest-writeable.
> >
> > IIUC, the bit is guest-writeable, so we should use .wmask instead of .cmask .
> > Is this correct?

It was incorrect :/

> >
> > >
> > > > > > > >
> > > > > > > > IIUC w1cmask means the guest can now write to this bit, but afaiu from the
> > > > > > > > pcie spec it's RO.
> > > > > > >
> > > > > > > Yes this bit must be RO.
> > >
> > > My bad, I assumed behavior based on how the guest was working, and this gone
> > > wrong. With above documentation provided, I would suggest clearing the .config
> > > mask related bit so qemu skips checking this one.
> > >
> > > What is your opinion on that?

Michael,
Is the above fine?

If so, I will send a v2 of this.
Any other suggestions?

> > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > According to pcie_cap_slot_init() the slot status register
> > > > > > > > > (PCI_EXP_SLTSTA), in which PCI_EXP_SLTSTA_PDS is a flag, seems to fall
> > > > > > > > > under w1cmask field, with makes sense due to the way signaling the hotplug
> > > > > > > > > works.
> > > > > > > > >
> > > > > > > > > So, add PCI_EXP_SLTSTA_PDS bit to w1cmask, so the fake incompatibility on
> > > > > > > > > get_pci_config_device() does not abort the migration.
> > > > > > > > >
> > > > > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> > > > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > > > > >
> > > > > > > > Do we need a Fixes: and also the need to copy stable?
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  hw/pci/pcie.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > > > > index b8c24cf45f..2def1765a5 100644
> > > > > > > > > --- a/hw/pci/pcie.c
> > > > > > > > > +++ b/hw/pci/pcie.c
> > > > > > > > > @@ -657,7 +657,7 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
> > > > > > > > >                                 PCI_EXP_SLTCTL_EIC);
> > > > > > > > >
> > > > > > > > >      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
> > > > > > > > > -                               PCI_EXP_HP_EV_SUPPORTED);
> > > > > > > > > +                               PCI_EXP_HP_EV_SUPPORTED | PCI_EXP_SLTSTA_PDS);
> > > > > > > > >
> > > > > > > > >      dev->exp.hpev_notified = false;
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.41.0
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Peter Xu
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Peter Xu
> > > > >
> > > >
> > >
>
Leonardo Bras July 6, 2023, 4:19 a.m. UTC | #10
On Wed, Jul 5, 2023 at 3:40 AM Leonardo Bras Soares Passos
<leobras@redhat.com> wrote:
>
> On Tue, Jul 4, 2023 at 3:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jul 04, 2023 at 03:20:36AM -0300, Leonardo Brás wrote:
> > > Hello Peter and Michael, I have a few updates on this:
> > >
> > > On Mon, 2023-07-03 at 02:20 -0300, Leonardo Brás wrote:
> > > > Hello Peter and Michael, thanks for reviewing!
> > > >
> > > >
> > > > On Thu, 2023-06-29 at 16:56 -0400, Peter Xu wrote:
> > > > > On Thu, Jun 29, 2023 at 04:06:53PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> > > > > > > On Thu, Jun 29, 2023 at 03:33:06PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Jun 29, 2023 at 01:01:53PM -0400, Peter Xu wrote:
> > > > > > > > > Hi, Leo,
> > > > > > > > >
> > > > > > > > > Thanks for figuring this out.  Let me copy a few more potential reviewers
> > > > > > > > > from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> > > > > > > > > Q35").
> > > > > > > > >
> > > > > > > > > On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> > > > > > > > > > When trying to migrate a machine type pc-q35-6.0 or lower, with this
> > > > > > > > > > cmdline options:
> > > > > > > > > >
> > > > > > > > > > -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> > > > > > > > > > -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> > > > > > > > > >
> > > > > > > > > > the following bug happens after all ram pages were sent:
> > > > > > > > > >
> > > > > > > > > > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > > > > > > > > > qemu-kvm: Failed to load PCIDevice:config
> > > > > > > > > > qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> > > > > > > > > > qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> > > > > > > > > > qemu-kvm: load of migration failed: Invalid argument
> > > > > > > > > >
> > > > > > > > > > This happens on pc-q35-6.0 or lower because of:
> > > > > > > > > > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> > > > > > > > > >
> > > > > > > > > > In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> > > > > > > > > > which sets the bus dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to
> > > > > > > > > > signal PCI hotplug for the guest. After a while the guest will deal with
> > > > > > > > > > this hotplug and qemu will clear the above bit.
> > > > > > > >
> > > > > > > > Presence Detect State – This bit indicates the presence of an
> > > > > > > > adapter in the slot, reflected by the logical “OR” of the Physical
> > > > > > > > Layer in-band presence detect mechanism and, if present, any
> > > > > > > > out-of-band presence detect mechanism defined for the slot’s
> > > > > > > > corresponding form factor. Note that the in-band presence
> > > > > > > > detect mechanism requires that power be applied to an adapter
> > > > > > > > for its presence to be detected. Consequently, form factors that
> > > > > > > > require a power controller for hot-plug must implement a
> > > > > > > > physical pin presence detect mechanism.
> > > > > > > > RO
> > > > > > > > Defined encodings are:
> > > > > > > > 0b Slot Empty
> > > > > > > > 1b Card Present in slot
> > > > > > > > This bit must be implemented on all Downstream Ports that
> > > > > > > > implement slots. For Downstream Ports not connected to slots
> > > > > > > > (where the Slot Implemented bit of the PCI Express Capabilities
> > > > > > > > register is 0b), this bit must be hardwired to 1b.
> > > >
> > > > Thank you for providing this doc!
> > > > I am new to PCI stuff, could you please point me to this doc?
> > >
> > > (I mean, the link to the documentation)
> >
> > The pci specs are all here: https://pcisig.com/
> > Red Hat is a member so just register, it's free.
> >
> > I'd get the 5.0 version of pci express base:
> > https://members.pcisig.com/wg/PCI-SIG/document/13005
> >
> > 6.0 is out but they did something to make it take years to open,
> > and it shouldn't matter for this.
>
> This is great! Thanks for sharing!
>
> >
> > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > And this seems to match what QEMU is doing: it clears on unplug
> > > > > > > > not after guest deals with hotplug.
> > > >
> > > > Oh, that's weird.
> > > > It should not unplug the device, so IIUC it should not clear the bit.
> > > > Maybe something weird is happening in the guest, I will take a look.
> > >
> > > Updates on this:
> > > You are right! For some reason the guest is hot-unplugging the device under some
> > > conditions, so there is another bug here for me to look into.
> > >
> > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > Then, during migration, get_pci_config_device() will compare the
> > > > > > > > > > configs of both the freshly created device and the one that is being
> > > > > > > > > > received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> > > > > > > > > > and cause the bug to reproduce.
> > > > > > > >
> > > > > > > > So bit is set on source.
> > > > > > > > But why is the bit cleared on destination? This is the part I don't get.
> > > >
> > > > No, bit is set when the device is created by qemu.
> > > > After some time running (boot process completion) the bit is cleared.
> > >
> > > The 'after some time' here is about the guest hot-unplugging the device.
> > >
> > > >
> > > > The receiving end of migration will then create the device with the bit set, and
> > > > then wait for migration. After the source device is received, the compare fails
> > > > due to those bits being different.
> > > >
> > >
> > > But anyway, there is some chance the device will be hot-unplugged by the guest
> > > OS for any reason, so we need to cover this scenario so it does not break
> > > migration.
> > >
> > > >
> > > >
> > > > > > >
> > > > > > > My understanding is that when ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is off for the
> > > > > > > device, we just won't ever set the PCI_EXP_SLTSTA_PDS bit?
> > > > > >
> > > > > > Why?
> > > > >
> > > > > Never mind, spoke too soon, sorry. :(
> > > > >
> > > > > I thought pcie_cap_slot_plug_cb() can skip the set, but then I just found
> > > > > that dev->hotplugged is not what I imagined there.
> > > > >
> > > > > Leo should know better.
> > > >
> > > > There is a difference in which hotplug function is called based on the
> > > > ACPI_PM_PROP_ACPI_PCIHP_BRIDGE option:
> > > >
> > > > When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="off", hotplug_handler_plug() calls
> > > > pcie_cap_slot_plug_cb() which sets the bus dev->config byte 0x6e with bit
> > > > PCI_EXP_SLTSTA_PDS.
> > > >
> > > > When ACPI_PM_PROP_ACPI_PCIHP_BRIDGE=="on", hotplug_handler_plug() calls
> > > > ich9_pm_device_plug_cb(), which does not set this bit.
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > To avoid this fake incompatibility, there are two fields in PCIDevice that
> > > > > > > > > > can help:
> > > > > > > > > >
> > > > > > > > > > .wmask: Used to implement R/W bytes, and
> > > > > > > > > > .w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> > > > > > > > >
> > > > > > > > > Is there one more option to clear the bit in cmask?
> > > >
> > > > We could clear the bit for .cmask . I suggested w1cmask because I previously
> > > > understood that bit was guest-writeable.
> > >
> > > IIUC, the bit is guest-writeable, so we should use .wmask instead of .cmask .
> > > Is this correct?
>
> It was incorrect :/

I was under the impression the guest was writing to this bit, but in
fact it was writing to SLTCTL (RW), and qemu was the one detecting the
power on -> power off transition for the device and writing 0 to the
Presence Detect State bit in SLTSTA.
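
To make that sequence concrete, here is a minimal sketch of it. It is a
simplified model, not the QEMU code: struct slot_model and both functions
are hypothetical names, the power-off condition is reduced to the Power
Controller Control bit alone, and setting Presence Detect Changed on the
transition is an assumption of the model. The bit values are the ones
from pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_EXP_SLTCTL_PCC 0x0400 /* Power Controller Control, 1 = power off */
#define PCI_EXP_SLTSTA_PDC 0x0008 /* Presence Detect Changed (RW1C) */
#define PCI_EXP_SLTSTA_PDS 0x0040 /* Presence Detect State (RO) */

/* Hypothetical model of one slot; not a QEMU structure. */
struct slot_model {
    uint16_t sltctl;
    uint16_t sltsta;
};

/* Guest write to SLTCTL: the emulator, not the guest, updates SLTSTA. */
static void guest_write_sltctl(struct slot_model *s, uint16_t val)
{
    assert(s != NULL);
    uint16_t old = s->sltctl;

    s->sltctl = val;
    /* Power on -> power off with a device present is handled as an unplug. */
    if ((s->sltsta & PCI_EXP_SLTSTA_PDS) &&
        !(old & PCI_EXP_SLTCTL_PCC) && (val & PCI_EXP_SLTCTL_PCC)) {
        s->sltsta &= (uint16_t)~PCI_EXP_SLTSTA_PDS;
        s->sltsta |= PCI_EXP_SLTSTA_PDC; /* assumption of this model */
    }
}

/* Guest write to SLTSTA: only the RW1C bit reacts, PDS is read-only. */
static void guest_write_sltsta(struct slot_model *s, uint16_t val)
{
    assert(s != NULL);
    s->sltsta &= (uint16_t)~(val & PCI_EXP_SLTSTA_PDC);
}
```

In this model a guest write of PCI_EXP_SLTSTA_PDS to SLTSTA changes
nothing, while a guest write that sets PCI_EXP_SLTCTL_PCC in SLTCTL is
what ends up clearing the bit.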

>
> > >
> > > >
> > > > > > > > >
> > > > > > > > > IIUC w1cmask means the guest can now write to this bit, but afaiu from the
> > > > > > > > > pcie spec it's RO.
> > > > > > > >
> > > > > > > > Yes this bit must be RO.
> > > >
> > > > My bad, I assumed behavior based on how the guest was working, and this went
> > > > wrong. Given the documentation provided above, I would suggest clearing the
> > > > related bit in the config check mask (.cmask) so qemu skips checking this one.
> > > >
> > > > What is your opinion on that?
>
> Michael,
> Is the above fine?
>
> If so, I will send a v2 on this.
> Any other suggestions?

This v2 is mostly ready, so I will send it. In any case, if it's not
the best solution, we can improve on it in a possible v3.
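
For reference, a minimal standalone sketch of the per-byte check behind
the "Bad config data" error, using the values from the log (i=0x6e,
read: 0, device: 40, cmask: ff, wmask: 0, w1cmask: 19). The helper names
are hypothetical, and the mask expression is my reading of
get_pci_config_device(), so please treat it as an assumption rather
than the exact code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLTSTA_PDS      0x40 /* PCI_EXP_SLTSTA_PDS, low byte of SLTSTA */
#define HP_EV_SUPPORTED 0x19 /* w1cmask value printed in the log */

/*
 * A bit is compared only when it is set in cmask and clear in both
 * wmask and w1cmask. Returns nonzero when the byte would be rejected.
 */
static int config_byte_rejected(uint8_t incoming, uint8_t local,
                                uint8_t cmask, uint8_t wmask, uint8_t w1cmask)
{
    uint8_t diff = incoming ^ local;

    return (diff & cmask & (uint8_t)~wmask & (uint8_t)~w1cmask) != 0;
}

/* Index of the first rejected byte, or -1 if the config is accepted. */
static int first_bad_config_byte(const uint8_t *incoming, const uint8_t *local,
                                 const uint8_t *cmask, const uint8_t *wmask,
                                 const uint8_t *w1cmask, size_t size)
{
    assert(incoming && local && cmask && wmask && w1cmask);
    for (size_t i = 0; i < size; i++) {
        if (config_byte_rejected(incoming[i], local[i],
                                 cmask[i], wmask[i], w1cmask[i])) {
            return (int)i;
        }
    }
    return -1;
}
```

With the logged values, config_byte_rejected(0x00, 0x40, 0xff, 0x00,
HP_EV_SUPPORTED) is nonzero, which is the failure reported. It becomes
zero either with SLTSTA_PDS added to w1cmask (what v1 did) or with
SLTSTA_PDS cleared from cmask (the v2 approach), and only the second
one keeps the bit read-only for the guest.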


> > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > According to pcie_cap_slot_init() the slot status register
> > > > > > > > > > (PCI_EXP_SLTSTA), in which PCI_EXP_SLTSTA_PDS is a flag, seems to fall
> > > > > > > > > > under w1cmask field, which makes sense due to the way signaling the hotplug
> > > > > > > > > > works.
> > > > > > > > > >
> > > > > > > > > > So, add PCI_EXP_SLTSTA_PDS bit to w1cmask, so the fake incompatibility on
> > > > > > > > > > get_pci_config_device() does not abort the migration.
> > > > > > > > > >
> > > > > > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> > > > > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > > > > > >
> > > > > > > > > Do we need a Fixes: tag, and do we also need to copy stable?
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  hw/pci/pcie.c | 2 +-
> > > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > > > > > index b8c24cf45f..2def1765a5 100644
> > > > > > > > > > --- a/hw/pci/pcie.c
> > > > > > > > > > +++ b/hw/pci/pcie.c
> > > > > > > > > > @@ -657,7 +657,7 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
> > > > > > > > > >                                 PCI_EXP_SLTCTL_EIC);
> > > > > > > > > >
> > > > > > > > > >      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
> > > > > > > > > > -                               PCI_EXP_HP_EV_SUPPORTED);
> > > > > > > > > > +                               PCI_EXP_HP_EV_SUPPORTED | PCI_EXP_SLTSTA_PDS);
> > > > > > > > > >
> > > > > > > > > >      dev->exp.hpev_notified = false;
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.41.0
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Peter Xu
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Peter Xu
> > > > > >
> > > > >
> > > >
> >

Patch

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index b8c24cf45f..2def1765a5 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -657,7 +657,7 @@  void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
                                PCI_EXP_SLTCTL_EIC);
 
     pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
-                               PCI_EXP_HP_EV_SUPPORTED);
+                               PCI_EXP_HP_EV_SUPPORTED | PCI_EXP_SLTSTA_PDS);
 
     dev->exp.hpev_notified = false;