diff mbox series

[1/1] pcie: Do not set power state for some hot-plugged devices

Message ID 20211214215312.944-1-annie.li@oracle.com (mailing list archive)
State New, archived
Headers show
Series [1/1] pcie: Do not set power state for some hot-plugged devices | expand

Commit Message

Annie Li Dec. 14, 2021, 9:53 p.m. UTC
After the PCIe device is hot-plugged, the device's power state is
initialized as ON. However, the device isn't powered on yet, i.e.
the PCI_EXP_SYSCTL_PCC bit isn't set to PCI_EXP_SLTCTL_PWR_ON.
Later on, its power state will set back to OFF due to the non
PCI_EXP_SLTCTL_PWR_ON state. The device is invisible until
PCI_EXP_SLTCTL_PWR_ON is set.

This may be appropriate for general PCIe hot-plug cases. However,
if the device is hot-plugged when the VM is in VM_STATE_PRELAUNCH
state, especially the system disk device, the firmware will fail
to find the system disk. As a result, the guest fails to boot.

An extra flag(set_power) is added in this patch to indicate if
pci_set_power is needed. After the device is powered
on(PCI_EXP_SLTCTL_PWR_ON), its power state will be set as normal
devices.

Fixes: 090b32b8dae6 ("implement slot power control for pcie root ports")

Signed-off-by: Annie Li <annie.li@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
---
 hw/pci/pci.c         |  1 +
 hw/pci/pcie.c        | 29 +++++++++++++++++++++++++++--
 include/hw/pci/pci.h |  1 +
 3 files changed, 29 insertions(+), 2 deletions(-)

Comments

Gerd Hoffmann Dec. 15, 2021, 6:05 a.m. UTC | #1
On Tue, Dec 14, 2021 at 09:53:12PM +0000, Annie Li wrote:
> After the PCIe device is hot-plugged, the device's power state is
> initialized as ON. However, the device isn't powered on yet, i.e.
> the PCI_EXP_SYSCTL_PCC bit isn't set to PCI_EXP_SLTCTL_PWR_ON.
> Later on, its power state will set back to OFF due to the non
> PCI_EXP_SLTCTL_PWR_ON state. The device is invisible until
> PCI_EXP_SLTCTL_PWR_ON is set.
> 
> This may be appropriate for general PCIe hot-plug cases. However,
> if the device is hot-plugged when the VM is in VM_STATE_PRELAUNCH
> state, especially the system disk device, the firmware will fail
> to find the system disk. As a result, the guest fails to boot.

Maybe we should just not set DeviceState->hotplugged = true for devices
added in VM_STATE_PRELAUNCH?  It's not actual hotplug (i.e. device added
while the system is running) after all ...

There are lots of places checking DeviceState->hotplugged, and I suspect
we have similar issues elsewhere.

take care,
  Gerd
Annie Li Dec. 15, 2021, 7:20 p.m. UTC | #2
On 12/15/2021 1:05 AM, Gerd Hoffmann wrote:
> On Tue, Dec 14, 2021 at 09:53:12PM +0000, Annie Li wrote:
>> After the PCIe device is hot-plugged, the device's power state is
>> initialized as ON. However, the device isn't powered on yet, i.e.
>> the PCI_EXP_SYSCTL_PCC bit isn't set to PCI_EXP_SLTCTL_PWR_ON.
>> Later on, its power state will set back to OFF due to the non
>> PCI_EXP_SLTCTL_PWR_ON state. The device is invisible until
>> PCI_EXP_SLTCTL_PWR_ON is set.
>>
>> This may be appropriate for general PCIe hot-plug cases. However,
>> if the device is hot-plugged when the VM is in VM_STATE_PRELAUNCH
>> state, especially the system disk device, the firmware will fail
>> to find the system disk. As a result, the guest fails to boot.
> Maybe we should just not set DeviceState->hotplugged = true for devices
> added in VM_STATE_PRELAUNCH?  It's not actual hotplug (i.e. device added
> while the system is running) after all ...
Simply not setting "DeviceState->hotplugged" doesn't work. Devices 
created in
PHASE_MACHINE_READY phase are treated as hot-plugged devices. So I just 
tried
following change for the quick test, the device is still invisible to 
the firmware with
this change.

static void device_initfn(Object *obj)
{
...snip...
-     if (phase_check(PHASE_MACHINE_READY)) {
+    if (phase_check(PHASE_MACHINE_READY) && 
!runstate_check(RUN_STATE_PRELAUNCH)) {
         dev->hotplugged = 1;
         qdev_hot_added = true;
     }
...snip...
}

Thanks
Annie
>
> There are lots of places checking DeviceState->hotplugged, and I suspect
> we have similar issues elsewhere.
>
> take care,
>    Gerd
>
Gerd Hoffmann Dec. 16, 2021, 6:11 a.m. UTC | #3
Hi,

> > Maybe we should just not set DeviceState->hotplugged = true for devices
> > added in VM_STATE_PRELAUNCH?  It's not actual hotplug (i.e. device added
> > while the system is running) after all ...

> Simply not setting "DeviceState->hotplugged" doesn't work. Devices created
> in
> PHASE_MACHINE_READY phase are treated as hot-plugged devices. So I just
> tried
> following change for the quick test, the device is still invisible to the
> firmware with
> this change.

Looking again, the difference is probably the reset handling.
pcie_cap_slot_reset() will turn on power (via PCI_EXP_SLTCTL_PCC) in
case some device is plugged into the slot.

So I suspect when plugging devices during VM_STATE_PRELAUNCH they are
resetted individually (specifically before the device is plugged),
whereas otherwise they are resetted when all devices are plugged in.

Does resetting devices when leaving RUN_STATE_PRELAUNCH fix this?

take care,
  Gerd
Annie Li Dec. 16, 2021, 11:10 p.m. UTC | #4
Hello Gerd,

On 12/16/2021 1:11 AM, Gerd Hoffmann wrote:
>    Hi,
>
>>> Maybe we should just not set DeviceState->hotplugged = true for devices
>>> added in VM_STATE_PRELAUNCH?  It's not actual hotplug (i.e. device added
>>> while the system is running) after all ...
>> Simply not setting "DeviceState->hotplugged" doesn't work. Devices created
>> in
>> PHASE_MACHINE_READY phase are treated as hot-plugged devices. So I just
>> tried
>> following change for the quick test, the device is still invisible to the
>> firmware with
>> this change.
> Looking again, the difference is probably the reset handling.
> pcie_cap_slot_reset() will turn on power (via PCI_EXP_SLTCTL_PCC) in
> case some device is plugged into the slot.
If the VM is booting from the system disk in the qemu command line(no 
hot-plug),
pcie_cap_slot_reset turns on the power for this device. And this happens 
before
the VM runs into VM_STATE_PRELAUNCH state.(I add '-S' option in this case
for comparison)
>
> So I suspect when plugging devices during VM_STATE_PRELAUNCH they are
> resetted individually (specifically before the device is plugged),
When the device is hot-plugged in VM_STATE_PRELAUNCH state, there is no
reset for the device during this state. Before entering this state,
pcie_cap_slot_reset  does get called for the device(like general VM 
booting).
However, it doesn't turn on the power. I think this is due to that the 
device isn't
hot-plugged yet, and "populated" value is false.
> whereas otherwise they are resetted when all devices are plugged in.
>
> Does resetting devices when leaving RUN_STATE_PRELAUNCH fix this?
I suppose only calling pcie_cap_slot_reset  isn't sufficient? maybe 
rp_reset?
However, the place of state transition is in runstate_set. I don't know
hot to hook this function to trigger the reset yet.
Just thinking of the implementation, if the patch is deployed in this way,
isn't the change is more complicated than the current one? :) Maybe I've
missed something here.

Thanks
Annie
>
> take care,
>    Gerd
>
Gerd Hoffmann Dec. 17, 2021, 6:47 a.m. UTC | #5
On Thu, Dec 16, 2021 at 06:10:40PM -0500, Annie.li wrote:
> Hello Gerd,
> 
> On 12/16/2021 1:11 AM, Gerd Hoffmann wrote:
> > 
> > Looking again, the difference is probably the reset handling.
> > pcie_cap_slot_reset() will turn on power (via PCI_EXP_SLTCTL_PCC) in
> > case some device is plugged into the slot.
> If the VM is booting from the system disk in the qemu command line(no
> hot-plug),
> pcie_cap_slot_reset turns on the power for this device. And this happens
> before
> the VM runs into VM_STATE_PRELAUNCH state.(I add '-S' option in this case
> for comparison)
> > 
> > So I suspect when plugging devices during VM_STATE_PRELAUNCH they are
> > resetted individually (specifically before the device is plugged),
> When the device is hot-plugged in VM_STATE_PRELAUNCH state, there is no
> reset for the device during this state. Before entering this state,
> pcie_cap_slot_reset  does get called for the device(like general VM
> booting).
> However, it doesn't turn on the power. I think this is due to that the
> device isn't
> hot-plugged yet, and "populated" value is false.

Ok, so maybe we just need to move the device reset from "before
prelaunch" to "after prelaunch" ?

> > whereas otherwise they are resetted when all devices are plugged in.
> > 
> > Does resetting devices when leaving RUN_STATE_PRELAUNCH fix this?
> I suppose only calling pcie_cap_slot_reset  isn't sufficient? maybe
> rp_reset?

Well, the underlying problem I see here is that the setup workflow for
devices is different depending how they got added (cmd line vs.
prelauch).  This should not be the case.

I wouldn't be surprised if we find this causing similar issues in other
devices.

> Just thinking of the implementation, if the patch is deployed in this way,
> isn't the change is more complicated than the current one? :) Maybe I've
> missed something here.

I want the root cause be found and fixed, not band aid being applied ...

take care,
  Gerd
diff mbox series

Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e5993c1ef5..b61c547291 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2186,6 +2186,7 @@  static void pci_qdev_realize(DeviceState *qdev, Error **errp)
         return;
     }
 
+    pci_dev->set_power = true;
     pci_set_power(pci_dev, true);
 }
 
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index d7d73a31e4..e4ff23f3b9 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -28,6 +28,7 @@ 
 #include "hw/pci/pcie_regs.h"
 #include "hw/pci/pcie_port.h"
 #include "qemu/range.h"
+#include "sysemu/runstate.h"
 
 //#define DEBUG_PCIE
 #ifdef DEBUG_PCIE
@@ -385,8 +386,20 @@  static void pcie_cap_update_power(PCIDevice *hotplug_dev)
         power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON;
     }
 
-    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                        pcie_set_power_device, &power);
+    /*
+     * For devices hot-plugged in RUN_STATE_PRELAUNCH state, set_power is
+     * set to false to avoid unnecessary power state changes before the device
+     * is powered on. After the device is powered on, set_power has to be
+     * set back to true to allow general power state changes.
+     */
+    if (!hotplug_dev->set_power && power) {
+        hotplug_dev->set_power = true;
+    }
+
+    if (hotplug_dev->set_power) {
+        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
+                            pcie_set_power_device, &power);
+    }
 }
 
 /*
@@ -475,6 +488,18 @@  void pcie_cap_slot_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
         }
         pcie_cap_slot_event(hotplug_pdev,
                             PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
+
+        /*
+         * After the system disk device is hot-plugged during
+         * RUN_STATE_PRELAUNCH state, its power state will be set to OFF
+         * before the device is actually powered on. The device is invisible
+         * during this period. Hence the firmware won't find the system
+         * disk to boot. The set_power is set to false to avoid setting the
+         * power state to OFF.
+         */
+        if (runstate_check(RUN_STATE_PRELAUNCH)) {
+            hotplug_pdev->set_power = false;
+        }
         pcie_cap_update_power(hotplug_pdev);
     }
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e7cdf2d5ec..753df3523e 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -269,6 +269,7 @@  struct PCIDevice {
     DeviceState qdev;
     bool partially_hotplugged;
     bool has_power;
+    bool set_power;
 
     /* PCI config space */
     uint8_t *config;