Message ID | 20211011120504.254053-1-kraxel@redhat.com
---|---
Series | RfC: try improve native hotplug for pcie root ports
On Mon, Oct 11, 2021 at 02:04:58PM +0200, Gerd Hoffmann wrote:
>
>
> Gerd Hoffmann (6):
>   pci: implement power state
>   pcie: implement slow power control for pcie root ports
>   pcie: add power indicator blink check
>   pcie: factor out pcie_cap_slot_unplug()
>   pcie: fast unplug when slot power is off
>   pcie: expire pending delete

So what's left to do here?
I'm guessing more testing?

I would also like to see a shorter timeout, maybe 100ms, so
that we are more responsive to guest changes in resending request.

>  include/hw/pci/pci.h   |  2 ++
>  include/hw/qdev-core.h |  1 +
>  hw/pci/pci.c           | 25 +++++++++++--
>  hw/pci/pci_host.c      |  6 ++--
>  hw/pci/pcie.c          | 82 ++++++++++++++++++++++++++++++++++--------
>  softmmu/qdev-monitor.c |  4 ++-
>  6 files changed, 101 insertions(+), 19 deletions(-)
>
> --
> 2.31.1
>
On Mon, Oct 18, 2021 at 11:36:45AM -0400, Michael S. Tsirkin wrote:
> On Mon, Oct 11, 2021 at 02:04:58PM +0200, Gerd Hoffmann wrote:
> >
> >
> > Gerd Hoffmann (6):
> >   pci: implement power state
> >   pcie: implement slow power control for pcie root ports
> >   pcie: add power indicator blink check
> >   pcie: factor out pcie_cap_slot_unplug()
> >   pcie: fast unplug when slot power is off
> >   pcie: expire pending delete
>
> So what's left to do here?
> I'm guessing more testing?

Yes. Maybe ask rh qe to run the patch set through their hotplug test
suite (to avoid an acpi-hotplug style disaster where qe found various
issues after release)?

> I would also like to see a shorter timeout, maybe 100ms, so
> that we are more responsive to guest changes in resending request.

I don't think it is a good idea to go for a shorter timeout given that
the 5 seconds are in the specs and we want to avoid a resent request
being interpreted as a cancel.
It also wouldn't change anything, at least for linux guests, because
linux is waiting those 5 seconds (with the power indicator in blinking
state).
Only the reason for refusing 'device_del' changes from "5 secs not over
yet" to "guest is busy processing the hotplug request".

We could consider tackling the 5sec timeout on the guest side, i.e.
have linux skip the 5sec wait in case the root port is virtual (should
be easy to figure out by checking the pci id).

take care,
  Gerd
On Tue, Oct 19, 2021 at 07:21:44AM +0200, Gerd Hoffmann wrote:
> On Mon, Oct 18, 2021 at 11:36:45AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Oct 11, 2021 at 02:04:58PM +0200, Gerd Hoffmann wrote:
> > >
> > >
> > > Gerd Hoffmann (6):
> > >   pci: implement power state
> > >   pcie: implement slow power control for pcie root ports
> > >   pcie: add power indicator blink check
> > >   pcie: factor out pcie_cap_slot_unplug()
> > >   pcie: fast unplug when slot power is off
> > >   pcie: expire pending delete
> >
> > So what's left to do here?
> > I'm guessing more testing?
>
> Yes. Maybe ask rh qe to run the patch set through their hotplug test
> suite (to avoid an acpi-hotplug style disaster where qe found various
> issues after release)?

I'll poke around to see if they can help us... we'll need
a backport for that though.

> > I would also like to see a shorter timeout, maybe 100ms, so
> > that we are more responsive to guest changes in resending request.
>
> I don't think it is a good idea to go for a shorter timeout given that
> the 5 seconds are in the specs and we want to avoid a resent request
> being interpreted as a cancel.
> It also wouldn't change anything, at least for linux guests, because
> linux is waiting those 5 seconds (with the power indicator in blinking
> state).
> Only the reason for refusing 'device_del' changes from "5 secs not over
> yet" to "guest is busy processing the hotplug request".

First 5 seconds, yes. But the retries afterwards?

> We could consider tackling the 5sec timeout on the guest side, i.e.
> have linux skip the 5sec wait in case the root port is virtual (should
> be easy to figure out by checking the pci id).
>
> take care,
>   Gerd

Yes ... do we want to control how long it blinks from the hypervisor
side? And if we do, then what? Shorten the retry period?
Hi,

> > Yes. Maybe ask rh qe to run the patch set through their hotplug test
> > suite (to avoid an acpi-hotplug style disaster where qe found various
> > issues after release)?
>
> I'll poke around to see if they can help us... we'll need
> a backport for that though.

Easy, it's a clean cherry-pick for 6.1, a scratch build is on the way.

> > > I would also like to see a shorter timeout, maybe 100ms, so
> > > that we are more responsive to guest changes in resending request.
> >
> > I don't think it is a good idea to go for a shorter timeout given that
> > the 5 seconds are in the specs and we want to avoid a resent request
> > being interpreted as a cancel.
> > It also wouldn't change anything, at least for linux guests, because
> > linux is waiting those 5 seconds (with the power indicator in blinking
> > state).
> > Only the reason for refusing 'device_del' changes from "5 secs not over
> > yet" to "guest is busy processing the hotplug request".
>
> First 5 seconds, yes. But the retries afterwards?

Hmm, maybe, but I'd tend to keep it simple and go for 5 secs no matter
what. If the guest isn't responding (maybe because it is in the middle
of a reboot) it's unlikely that fast re-requests are fundamentally
changing things.

> > We could consider tackling the 5sec timeout on the guest side, i.e.
> > have linux skip the 5sec wait in case the root port is virtual (should
> > be easy to figure out by checking the pci id).
> >
> > take care,
> >   Gerd
>
> Yes ... do we want to control how long it blinks from the hypervisor
> side?

Is there a good reason for that?
If not, then no. Keep it simple.

When the guest powers off the slot, pcie_cap_slot_write_config() will
happily unplug the device without additional checks (no check whether
the 5 seconds are over, also no check whether there is a pending unplug
request in the first place).

So in theory the guest turning off slot power quickly should work just
fine and speed up the unplug process in the common case (guest is
up'n'running and responsive). Down to 1-2 secs instead of 5-7.
Didn't actually test that though.

take care,
  Gerd
On Tue, Oct 19, 2021 at 08:29:19AM +0200, Gerd Hoffmann wrote:
> Hi,
>
> > > Yes. Maybe ask rh qe to run the patch set through their hotplug test
> > > suite (to avoid an acpi-hotplug style disaster where qe found various
> > > issues after release)?
> >
> > I'll poke around to see if they can help us... we'll need
> > a backport for that though.
>
> Easy, it's a clean cherry-pick for 6.1, a scratch build is on the way.
>
> > > > I would also like to see a shorter timeout, maybe 100ms, so
> > > > that we are more responsive to guest changes in resending request.
> > >
> > > I don't think it is a good idea to go for a shorter timeout given that
> > > the 5 seconds are in the specs and we want to avoid a resent request
> > > being interpreted as a cancel.
> > > It also wouldn't change anything, at least for linux guests, because
> > > linux is waiting those 5 seconds (with the power indicator in blinking
> > > state).
> > > Only the reason for refusing 'device_del' changes from "5 secs not over
> > > yet" to "guest is busy processing the hotplug request".
> >
> > First 5 seconds, yes. But the retries afterwards?
>
> Hmm, maybe, but I'd tend to keep it simple and go for 5 secs no matter
> what. If the guest isn't responding (maybe because it is in the middle
> of a reboot) it's unlikely that fast re-requests are fundamentally
> changing things.
>
> > > We could consider tackling the 5sec timeout on the guest side, i.e.
> > > have linux skip the 5sec wait in case the root port is virtual (should
> > > be easy to figure out by checking the pci id).
> > >
> > > take care,
> > >   Gerd
> >
> > Yes ... do we want to control how long it blinks from the hypervisor
> > side?
>
> Is there a good reason for that?
> If not, then no. Keep it simple.
>
> When the guest powers off the slot, pcie_cap_slot_write_config() will
> happily unplug the device without additional checks (no check whether
> the 5 seconds are over, also no check whether there is a pending unplug
> request in the first place).
>
> So in theory the guest turning off slot power quickly should work just
> fine and speed up the unplug process in the common case (guest is
> up'n'running and responsive). Down to 1-2 secs instead of 5-7.
> Didn't actually test that though.
>
> take care,
>   Gerd

Even if this speeds up unplug, hotplug remains slow, right?
Hi,

> > So in theory the guest turning off slot power quickly should work just
> > fine and speed up the unplug process in the common case (guest is
> > up'n'running and responsive). Down to 1-2 secs instead of 5-7.
> > Didn't actually test that though.

Tried meanwhile, test patch (not polished yet) below.

> Even if this speeds up unplug, hotplug remains slow, right?

Hotplug never was slow for me ...

take care,
  Gerd

From 074627a24a54203f2b4baf787fd4110c78222e23 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue, 19 Oct 2021 09:12:22 +0200
Subject: [PATCH] pciehp: fast virtual unplug for VMs

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/pci/hotplug/pciehp.h      |  3 +++
 drivers/pci/hotplug/pciehp_core.c |  5 +++++
 drivers/pci/hotplug/pciehp_ctrl.c | 10 +++++++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..131ffec2e947 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -79,6 +79,7 @@ extern int pciehp_poll_time;
  * @request_result: result of last user request submitted to the IRQ thread
  * @requester: wait queue to wake up on completion of user request,
  *	used for synchronous slot enable/disable request via sysfs
+ * @is_virtual: virtual machine pcie port.
  *
  * PCIe hotplug has a 1:1 relationship between controller and slot, hence
  * unlike other drivers, the two aren't represented by separate structures.
@@ -109,6 +110,8 @@
 	unsigned int ist_running;
 	int request_result;
 	wait_queue_head_t requester;
+
+	bool is_virtual;
 };
 
 /**
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..28867ec33f5b 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -227,6 +227,11 @@ static int pciehp_probe(struct pcie_device *dev)
 		goto err_out_shutdown_notification;
 	}
 
+	if (dev->port->vendor == PCI_VENDOR_ID_REDHAT &&
+	    dev->port->device == 0x000c)
+		/* qemu pcie root port */
+		ctrl->is_virtual = true;
+
 	pciehp_check_presence(ctrl);
 
 	return 0;
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 529c34808440..c0a05bbdb948 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -21,6 +21,10 @@
 #include <linux/pci.h>
 #include "pciehp.h"
 
+static bool fast_virtual_unplug = true;
+module_param(fast_virtual_unplug, bool, 0644);
+MODULE_PARM_DESC(fast_virtual_unplug, "Fast unplug (don't wait 5 seconds) for virtual machines.");
+
 /* The following routines constitute the bulk of the hotplug controller
    logic */
 
@@ -176,7 +180,11 @@ void pciehp_handle_button_press(struct controller *ctrl)
 		/* blink power indicator and turn off attention */
 		pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK,
 				      PCI_EXP_SLTCTL_ATTN_IND_OFF);
-		schedule_delayed_work(&ctrl->button_work, 5 * HZ);
+		if (ctrl->is_virtual && fast_virtual_unplug) {
+			schedule_delayed_work(&ctrl->button_work, 1);
+		} else {
+			schedule_delayed_work(&ctrl->button_work, 5 * HZ);
+		}
 		break;
 	case BLINKINGOFF_STATE:
 	case BLINKINGON_STATE:
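A usage sketch for the knob added above, assuming pciehp is built in as
usual; the sysfs path is simply the standard module-parameter location,
not something the patch itself documents:

    # turn the fast path off and fall back to the spec-mandated 5 second wait
    echo 0 > /sys/module/pciehp/parameters/fast_virtual_unplug

    # or at boot, on the kernel command line
    pciehp.fast_virtual_unplug=0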
Given it's a bugfix, and given that I hear through internal channels
that QE results so far have been encouraging, I am inclined to bite the
bullet and merge this for -rc1.

I don't think this conflicts with Julia's patches as users can still
disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?

On Mon, Oct 11, 2021 at 02:04:58PM +0200, Gerd Hoffmann wrote:
>
>
> Gerd Hoffmann (6):
>   pci: implement power state
>   pcie: implement slow power control for pcie root ports
>   pcie: add power indicator blink check
>   pcie: factor out pcie_cap_slot_unplug()
>   pcie: fast unplug when slot power is off
>   pcie: expire pending delete
>
>  include/hw/pci/pci.h   |  2 ++
>  include/hw/qdev-core.h |  1 +
>  hw/pci/pci.c           | 25 +++++++++++--
>  hw/pci/pci_host.c      |  6 ++--
>  hw/pci/pcie.c          | 82 ++++++++++++++++++++++++++++++++++--------
>  softmmu/qdev-monitor.c |  4 ++-
>  6 files changed, 101 insertions(+), 19 deletions(-)
>
> --
> 2.31.1
>
Hi,

> Given it's a bugfix, and given that I hear through internal channels
> that QE results so far have been encouraging, I am inclined to bite the
> bullet and merge this for -rc1.

Fine with me.

> I don't think this conflicts with Julia's patches as users can still
> disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?

Combining this with Julia's patches looks a bit risky to me and surely
needs testing. I expect the problematic case is both native and acpi
hotplug being enabled. When the guest uses acpi hotplug nobody will
turn on slot power on the pcie root port ...

I'd suggest to just revert to pcie native hotplug for 6.2. Give acpi
hotplug another try for 7.0, and post patches early in the devel cycle
then instead of waiting until -rc0 is released.

take care,
  Gerd
On Thu, Nov 11, 2021 at 08:53:06AM +0100, Gerd Hoffmann wrote:
> Hi,
>
> > Given it's a bugfix, and given that I hear through internal channels
> > that QE results so far have been encouraging, I am inclined to bite the
> > bullet and merge this for -rc1.
>
> Fine with me.
>
> > I don't think this conflicts with Julia's patches as users can still
> > disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?
>
> Combining this with Julia's patches looks a bit risky to me and surely
> needs testing. I expect the problematic case is both native and acpi
> hotplug being enabled. When the guest uses acpi hotplug nobody will
> turn on slot power on the pcie root port ...

I'm not sure I understand what the situation is, and how to trigger it.
Could you clarify pls?

> I'd suggest to just revert to pcie native hotplug for 6.2.

Hmm that kind of change seems even riskier to me. I think I'll try with
Igor's patches.

> Give acpi
> hotplug another try for 7.0, and post patches early in the devel cycle
> then instead of waiting until -rc0 is released.
>
> take care,
>   Gerd
On Thu, Nov 11, 2021 at 03:20:07AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 08:53:06AM +0100, Gerd Hoffmann wrote:
> > Hi,
> >
> > > Given it's a bugfix, and given that I hear through internal channels
> > > that QE results so far have been encouraging, I am inclined to bite the
> > > bullet and merge this for -rc1.
> >
> > Fine with me.
> >
> > > I don't think this conflicts with Julia's patches as users can still
> > > disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?
> >
> > Combining this with Julia's patches looks a bit risky to me and surely
> > needs testing. I expect the problematic case is both native and acpi
> > hotplug being enabled. When the guest uses acpi hotplug nobody will
> > turn on slot power on the pcie root port ...
>
> I'm not sure I understand what the situation is, and how to trigger it.
> Could you clarify pls?

My patch series implements proper slot power emulation for pcie root
ports, i.e. the pci device plugged into the slot is only visible to the
guest when slot power is enabled.

When the pciehp driver is used the linux kernel will enable slot power
of course.

When the acpihp driver is used the linux kernel will just call the aml
methods and I suspect the pci device will stay invisible then because
nobody flips the slot power control bit (with native-hotplug=on, for
native-hotplug=off this isn't a problem of course).

> > I'd suggest to just revert to pcie native hotplug for 6.2.
>
> Hmm that kind of change seems even riskier to me.

It's not clear to me why you consider that riskier. It should fix the
qemu 6.1 regressions. Given QE has found no problems so far it should
not be worse than qemu 6.0 was. Most likely it will work better than
qemu 6.0.

Going with a new approach (enable both native and acpi hotplug) after
freeze looks risky to me, even without considering the conflict
outlined above. Last-minute changes with the chance to repeat the 6.1
disaster, only with a different set of bugs ...

take care,
  Gerd
On Thu, Nov 11, 2021 at 03:20:07AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 08:53:06AM +0100, Gerd Hoffmann wrote:
> > Hi,
> >
> > > Given it's a bugfix, and given that I hear through internal channels
> > > that QE results so far have been encouraging, I am inclined to bite the
> > > bullet and merge this for -rc1.
> >
> > Fine with me.
> >
> > > I don't think this conflicts with Julia's patches as users can still
> > > disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?
> >
> > Combining this with Julia's patches looks a bit risky to me and surely
> > needs testing. I expect the problematic case is both native and acpi
> > hotplug being enabled. When the guest uses acpi hotplug nobody will
> > turn on slot power on the pcie root port ...
>
> I'm not sure I understand what the situation is, and how to trigger it.
> Could you clarify pls?
>
> > I'd suggest to just revert to pcie native hotplug for 6.2.
>
> Hmm that kind of change seems even riskier to me. I think I'll try with
> Igor's patches.

Why would it be risky ? PCIE native hotplug is what we've used in
QEMU for years & years, until 6.1 enabled the buggy ACPI hotplug.
The behaviour of the current PCIE native hotplug impl is a known
quantity.

Regards,
Daniel
Hi,

> When the acpihp driver is used the linux kernel will just call the aml
> methods and I suspect the pci device will stay invisible then because
> nobody flips the slot power control bit (with native-hotplug=on, for
> native-hotplug=off this isn't a problem of course).

Hmm, on a quick smoke test with both patch series (mine + Igor's) applied
everything seems to work fine at a quick glance. Dunno why. Maybe the
pcieport driver turns on slot power even in case pciehp is not active.

I still feel a bit nervous about trying the new "both pcie + acpi
hotplug enabled" approach that close to the release date ...

take care,
  Gerd
On Thu, Nov 11, 2021 at 01:09:05PM +0100, Gerd Hoffmann wrote:
> Hi,
>
> > When the acpihp driver is used the linux kernel will just call the aml
> > methods and I suspect the pci device will stay invisible then because
> > nobody flips the slot power control bit (with native-hotplug=on, for
> > native-hotplug=off this isn't a problem of course).
>
> Hmm, on a quick smoke test with both patch series (mine + Igor's) applied
> everything seems to work fine at a quick glance. Dunno why. Maybe the
> pcieport driver turns on slot power even in case pciehp is not active.

Well power and hotplug capabilities are mostly unrelated, right?

> I still feel a bit nervous about trying the new "both pcie + acpi
> hotplug enabled" approach that close to the release date ...
>
> take care,
>   Gerd

I feel switching to native so late would be inappropriate, it looks
more like a feature than a bugfix. Given that, we need Igor's patches.
Given that, would you say I should apply yours? I'm inclined to do so
but if you prefer waiting until after the release I'll defer to your
judgement.
On Thu, Nov 11, 2021 at 09:35:37AM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 11, 2021 at 03:20:07AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2021 at 08:53:06AM +0100, Gerd Hoffmann wrote:
> > > Hi,
> > >
> > > > Given it's a bugfix, and given that I hear through internal channels
> > > > that QE results so far have been encouraging, I am inclined to bite the
> > > > bullet and merge this for -rc1.
> > >
> > > Fine with me.
> > >
> > > > I don't think this conflicts with Julia's patches as users can still
> > > > disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?
> > >
> > > Combining this with Julia's patches looks a bit risky to me and surely
> > > needs testing. I expect the problematic case is both native and acpi
> > > hotplug being enabled. When the guest uses acpi hotplug nobody will
> > > turn on slot power on the pcie root port ...
> >
> > I'm not sure I understand what the situation is, and how to trigger it.
> > Could you clarify pls?
> >
> > > I'd suggest to just revert to pcie native hotplug for 6.2.
> >
> > Hmm that kind of change seems even riskier to me. I think I'll try with
> > Igor's patches.
>
> Why would it be risky ? PCIE native hotplug is what we've used in
> QEMU for years & years, until 6.1 enabled the buggy ACPI hotplug.
> The behaviour of the current PCIE native hotplug impl is a known
> quantity.
>
> Regards,
> Daniel

For example we might regress some of the bugs that the switch to ACPI
fixed, back to the 6.0 state. There were a bunch of these. For example
it should be possible for guests to disable native and use ACPI then,
but isn't.

I'm very willing to consider the switch back to native by default, but
given the timing, doing big changes like that at the last minute seems
unusual.

> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Nov 11, 2021 at 12:11:19PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 09:35:37AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Nov 11, 2021 at 03:20:07AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 11, 2021 at 08:53:06AM +0100, Gerd Hoffmann wrote:
> > > > Hi,
> > > >
> > > > > Given it's a bugfix, and given that I hear through internal channels
> > > > > that QE results so far have been encouraging, I am inclined to bite the
> > > > > bullet and merge this for -rc1.
> > > >
> > > > Fine with me.
> > > >
> > > > > I don't think this conflicts with Julia's patches as users can still
> > > > > disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?
> > > >
> > > > Combining this with Julia's patches looks a bit risky to me and surely
> > > > needs testing. I expect the problematic case is both native and acpi
> > > > hotplug being enabled. When the guest uses acpi hotplug nobody will
> > > > turn on slot power on the pcie root port ...
> > >
> > > I'm not sure I understand what the situation is, and how to trigger it.
> > > Could you clarify pls?
> > >
> > > > I'd suggest to just revert to pcie native hotplug for 6.2.
> > >
> > > Hmm that kind of change seems even riskier to me. I think I'll try with
> > > Igor's patches.
> >
> > Why would it be risky ? PCIE native hotplug is what we've used in
> > QEMU for years & years, until 6.1 enabled the buggy ACPI hotplug.
> > The behaviour of the current PCIE native hotplug impl is a known
> > quantity.
> >
> > Regards,
> > Daniel
>
> For example we might regress some of the bugs that the switch to ACPI
> fixed, back to the 6.0 state. There were a bunch of these. For example
> it should be possible for guests to disable native and use ACPI then,
> but isn't.

Of course there were bugs fixed by switching to ACPI, but we'd lived
with native hotplug in production and the majority of the time it
worked as users needed. The bugs were edge cases essentially only
affecting a small subset of users.

The switch to ACPI broke the out of the box configuration used by
OpenStack. That's not an edge case, that's a serious impact.

> I'm very willing to consider the switch back to native by default, but
> given the timing, doing big changes like that at the last minute seems
> unusual.

I consider it to be fixing a serious regression by going back to a
known working safe impl, that has been used in production successfully
for a long time. We know there are bugs with native hotplug, but
they're *known* problems.

The unusual thing about timing is having a major functional regression
identified in the previous release and then not even having patches
proposed to fix it until after soft freeze for the next release
arrives :-(

It doesn't give a feeling of confidence, but makes me wonder what other
*unknown* problems we're liable to hit still.

Regards,
Daniel
On Thu, Nov 11, 2021 at 06:08:11PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 11, 2021 at 12:11:19PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2021 at 09:35:37AM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Nov 11, 2021 at 03:20:07AM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 11, 2021 at 08:53:06AM +0100, Gerd Hoffmann wrote:
> > > > > Hi,
> > > > >
> > > > > > Given it's a bugfix, and given that I hear through internal channels
> > > > > > that QE results so far have been encouraging, I am inclined to bite the
> > > > > > bullet and merge this for -rc1.
> > > > >
> > > > > Fine with me.
> > > > >
> > > > > > I don't think this conflicts with Julia's patches as users can still
> > > > > > disable ACPI hotplug into bridges. Gerd, agree? Worth the risk?
> > > > >
> > > > > Combining this with Julia's patches looks a bit risky to me and surely
> > > > > needs testing. I expect the problematic case is both native and acpi
> > > > > hotplug being enabled. When the guest uses acpi hotplug nobody will
> > > > > turn on slot power on the pcie root port ...
> > > >
> > > > I'm not sure I understand what the situation is, and how to trigger it.
> > > > Could you clarify pls?
> > > >
> > > > > I'd suggest to just revert to pcie native hotplug for 6.2.
> > > >
> > > > Hmm that kind of change seems even riskier to me. I think I'll try with
> > > > Igor's patches.
> > >
> > > Why would it be risky ? PCIE native hotplug is what we've used in
> > > QEMU for years & years, until 6.1 enabled the buggy ACPI hotplug.
> > > The behaviour of the current PCIE native hotplug impl is a known
> > > quantity.
> > >
> > > Regards,
> > > Daniel
> >
> > For example we might regress some of the bugs that the switch to ACPI
> > fixed, back to the 6.0 state. There were a bunch of these. For example
> > it should be possible for guests to disable native and use ACPI then,
> > but isn't.
>
> Of course there were bugs fixed by switching to ACPI, but we'd lived
> with native hotplug in production and the majority of the time it
> worked as users needed. The bugs were edge cases essentially only
> affecting a small subset of users.
>
> The switch to ACPI broke the out of the box configuration used by
> OpenStack. That's not an edge case, that's a serious impact.

Right. It's pretty easy for downstreams to switch back if they want to,
though.

> > I'm very willing to consider the switch back to native by default, but
> > given the timing, doing big changes like that at the last minute seems
> > unusual.
>
> I consider it to be fixing a serious regression by going back to a
> known working safe impl, that has been used in production successfully
> for a long time. We know there are bugs with native hotplug, but
> they're *known* problems.

Are you familiar with the issues or are you just arguing generally?
From my POV they made native too unreliable to be useful. It's great
that the narrow usecase of openstack managed not to hit any of the
races involved, but I think it's more a question of luck. That is,
until these specific patches, but they are only in v2 out of rfc status
just today.

> The unusual thing about timing is having a major functional regression
> identified in the previous release and then not even having patches
> proposed to fix it until after soft freeze for the next release
> arrives :-(
>
> It doesn't give a feeling of confidence, but makes me wonder what other
> *unknown* problems we're liable to hit still.
>
> Regards,
> Daniel

Right. And this applies to both approaches.
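For reference, the downstream switch-back mentioned above is a
one-line command-line toggle; a sketch, assuming the 6.1-era property
name for the q35 ACPI hotplug knob (worth double-checking against the
build in question):

    # keep the q35 machine type but turn ACPI hotplug for bridges off,
    # which falls back to pcie native hotplug on the root ports
    qemu-system-x86_64 -M q35 \
        -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off \
        ...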
So I thought hard about the balance here, and am inclined to go with a
rough consensus opinion.

I don't see any reasons, or in fact anyone, objecting strongly to
Igor's patches, so I'm taking these. These are just bugfixes.

But if we also reach a rough consensus on others, and e.g. Igor acks
Gerd's patch to revert to native, then I think I'll merge it; it's a
close thing.

> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Hi,

> involved, but I think it's more a question of luck. That is, until
> these specific patches, but they are only in v2 out of rfc status just
> today.

Never posted a formal non-rfc version because I had no pending changes.
Maybe I should have done that ...

v2 has no functional changes, it only resolves a conflict, so
effectively the same thing as the rfc series.

take care,
  Gerd
On Thu, Nov 11, 2021 at 10:39:59AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 01:09:05PM +0100, Gerd Hoffmann wrote:
> > Hi,
> >
> > > When the acpihp driver is used the linux kernel will just call the aml
> > > methods and I suspect the pci device will stay invisible then because
> > > nobody flips the slot power control bit (with native-hotplug=on, for
> > > native-hotplug=off this isn't a problem of course).
> >
> > Hmm, on a quick smoke test with both patch series (mine + Igor's) applied
> > everything seems to work fine at a quick glance. Dunno why. Maybe the
> > pcieport driver turns on slot power even in case pciehp is not active.

Dug deeper. Updating power status is handled by the plug() callback,
which is never called in case acpi hotplug is active. The guest seems
to never touch slot power control either, so it's working fine. Still
feels a bit fragile though.

> Well power and hotplug capabilities are mostly unrelated, right?

At least they are separate slot capabilities. The linux pciehp driver
checks whether the power control is present before using it, so having
PwrCtrl- HotPlug+ seems to be a valid combination.

We even have an option for that: pcie-root-port.power_controller_present

So flipping that to off in case acpi hotplug is active should make sure
we never run into trouble with pci devices being powered off.

Igor? Can you add that to your patch series?

> I feel switching to native so late would be inappropriate, it looks
> more like a feature than a bugfix. Given that, we need Igor's patches.
> Given that, would you say I should apply yours?

I think when setting power_controller_present=off for acpi hotplug it is
safe to merge both mine and Igor's.

take care,
  Gerd
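To make that concrete, a sketch of the configuration being proposed,
written here for illustration rather than taken from either series
(power_controller_present is the property named above; the other
root-port options are the usual boilerplate):

    # acpi hotplug owns the slot, so don't advertise a slot power controller
    -device pcie-root-port,id=rp1,bus=pcie.0,chassis=1,slot=1,power_controller_present=off

The idea in the thread is that Igor's series would flip this
automatically whenever acpi hotplug is enabled, rather than requiring
users to set it by hand.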
On Fri, 12 Nov 2021 12:15:28 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:

> On Thu, Nov 11, 2021 at 10:39:59AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2021 at 01:09:05PM +0100, Gerd Hoffmann wrote:
> > > Hi,
> > >
> > > > When the acpihp driver is used the linux kernel will just call the aml
> > > > methods and I suspect the pci device will stay invisible then because
> > > > nobody flips the slot power control bit (with native-hotplug=on, for
> > > > native-hotplug=off this isn't a problem of course).
> > >
> > > Hmm, on a quick smoke test with both patch series (mine + Igor's) applied
> > > everything seems to work fine at a quick glance. Dunno why. Maybe the
> > > pcieport driver turns on slot power even in case pciehp is not active.
>
> Dug deeper. Updating power status is handled by the plug() callback,
> which is never called in case acpi hotplug is active. The guest seems
> to never touch slot power control either, so it's working fine. Still
> feels a bit fragile though.
>
> > Well power and hotplug capabilities are mostly unrelated, right?
>
> At least they are separate slot capabilities. The linux pciehp driver
> checks whether the power control is present before using it, so having
> PwrCtrl- HotPlug+ seems to be a valid combination.
>
> We even have an option for that: pcie-root-port.power_controller_present
>
> So flipping that to off in case acpi hotplug is active should make sure
> we never run into trouble with pci devices being powered off.
>
> Igor? Can you add that to your patch series?

Sorry, saw it too late.
I'll test the idea with my set of guests to see if there are any
adverse effects.

> > I feel switching to native so late would be inappropriate, it looks
> > more like a feature than a bugfix. Given that, we need Igor's patches.
> > Given that, would you say I should apply yours?
>
> I think when setting power_controller_present=off for acpi hotplug it is
> safe to merge both mine and Igor's.
>
> take care,
>   Gerd
>
On Fri, Nov 12, 2021 at 12:15:28PM +0100, Gerd Hoffmann wrote:
> On Thu, Nov 11, 2021 at 10:39:59AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2021 at 01:09:05PM +0100, Gerd Hoffmann wrote:
> > > Hi,
> > >
> > > > When the acpihp driver is used the linux kernel will just call the aml
> > > > methods and I suspect the pci device will stay invisible then because
> > > > nobody flips the slot power control bit (with native-hotplug=on, for
> > > > native-hotplug=off this isn't a problem of course).
> > >
> > > Hmm, on a quick smoke test with both patch series (mine + Igor's) applied
> > > everything seems to work fine at a quick glance. Dunno why. Maybe the
> > > pcieport driver turns on slot power even in case pciehp is not active.
>
> Dug deeper. Updating power status is handled by the plug() callback,
> which is never called in case acpi hotplug is active. The guest seems
> to never touch slot power control either, so it's working fine. Still
> feels a bit fragile though.
>
> > Well power and hotplug capabilities are mostly unrelated, right?
>
> At least they are separate slot capabilities. The linux pciehp driver
> checks whether the power control is present before using it, so having
> PwrCtrl- HotPlug+ seems to be a valid combination.
>
> We even have an option for that: pcie-root-port.power_controller_present
>
> So flipping that to off in case acpi hotplug is active should make sure
> we never run into trouble with pci devices being powered off.
>
> Igor? Can you add that to your patch series?
>
> > I feel switching to native so late would be inappropriate, it looks
> > more like a feature than a bugfix. Given that, we need Igor's patches.
> > Given that, would you say I should apply yours?
>
> I think when setting power_controller_present=off for acpi hotplug it is
> safe to merge both mine and Igor's.
>
> take care,
>   Gerd

So this did not surface yet, but I guess we can do this as a patch on
top; either of you can post it.