diff mbox series

[7/7] spapr: Perform machine reset in a more sensible order

Message ID 20190911040452.8341-8-david@gibson.dropbear.id.au (mailing list archive)
State New, archived
Headers show
Series spapr: CAS and reset cleanup preliminaries | expand

Commit Message

David Gibson Sept. 11, 2019, 4:04 a.m. UTC
We've made several changes in the past to the machine reset order to fix
specific problems.  However, we've ended up with an order that doesn't make
a lot of logical sense.  This is an attempt to rectify this.

First we reset global CAS options, since that should depend on nothing
else.  Then we reset the CPUs, which shouldn't depend on external devices.
Then the irq subsystem, then the bulk of devices (which might rely on
irqs).  Finally we set up the entry state ready for boot, which could
potentially rely on a bunch of other things.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

Comments

Alexey Kardashevskiy Sept. 11, 2019, 7:40 a.m. UTC | #1
On 11/09/2019 14:04, David Gibson wrote:
> We've made several changes in the past to the machine reset order to fix
> specific problems.  However, we've ended up with an order that doesn't make
> a lot of logical sense.  This is an attempt to rectify this.
> 
> First we reset global CAS options, since that should depend on nothing
> else.  Then we reset the CPUs, which shouldn't depend on external devices.
> Then the irq subsystem, then the bulk of devices (which might rely on
> irqs).  Finally we set up the entry state ready for boot, which could
> potentially rely on a bunch of other things.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>


Breaks console on P8 and asserts on rebooting a P9 guest.



> ---
>  hw/ppc/spapr.c | 47 +++++++++++++++++++++++++----------------------
>  1 file changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5a919a6cc1..1560a11738 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1724,6 +1724,28 @@ static void spapr_machine_reset(MachineState *machine)
>      void *fdt;
>      int rc;
>  
> +    /*
> +     * If this reset wasn't generated by CAS, we should reset our
> +     * negotiated options and start from scratch
> +     */
> +    if (!spapr->cas_reboot) {
> +        spapr_ovec_cleanup(spapr->ov5_cas);
> +        spapr->ov5_cas = spapr_ovec_new();
> +
> +        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
> +    }
> +
> +    /*
> +     * There is no CAS under qtest. Simulate one to please the code that
> +     * depends on spapr->ov5_cas. This is especially needed to test device
> +     * unplug, so we do that before resetting the DRCs.
> +     */
> +    if (qtest_enabled()) {
> +        spapr_ovec_cleanup(spapr->ov5_cas);
> +        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
> +    }
> +
> +    /* Reset the CPUs */
>      spapr_caps_apply(spapr);
>  
>      first_ppc_cpu = POWERPC_CPU(first_cpu);
> @@ -1741,34 +1763,15 @@ static void spapr_machine_reset(MachineState *machine)
>          spapr_setup_hpt_and_vrma(spapr);
>      }
>  
> -    qemu_devices_reset();
> -
> -    /*
> -     * If this reset wasn't generated by CAS, we should reset our
> -     * negotiated options and start from scratch
> -     */
> -    if (!spapr->cas_reboot) {
> -        spapr_ovec_cleanup(spapr->ov5_cas);
> -        spapr->ov5_cas = spapr_ovec_new();
> -
> -        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
> -    }
> -
> +    /* Reset IRQ subsystem */
>      /*
>       * This is fixing some of the default configuration of the XIVE
>       * devices. To be called after the reset of the machine devices.
>       */
>      spapr_irq_reset(spapr, &error_fatal);
>  
> -    /*
> -     * There is no CAS under qtest. Simulate one to please the code that
> -     * depends on spapr->ov5_cas. This is especially needed to test device
> -     * unplug, so we do that before resetting the DRCs.
> -     */
> -    if (qtest_enabled()) {
> -        spapr_ovec_cleanup(spapr->ov5_cas);
> -        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
> -    }
> +    /* Reset other devices */
> +    qemu_devices_reset();
>  
>      /* DRC reset may cause a device to be unplugged. This will cause troubles
>       * if this device is used by another device (eg, a running vhost backend
>
David Gibson Sept. 11, 2019, 7:51 a.m. UTC | #2
On Wed, Sep 11, 2019 at 05:40:58PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 11/09/2019 14:04, David Gibson wrote:
> > We've made several changes in the past to the machine reset order to fix
> > specific problems.  However, we've ended up with an order that doesn't make
> > a lot of logical sense.  This is an attempt to rectify this.
> > 
> > First we reset global CAS options, since that should depend on nothing
> > else.  Then we reset the CPUs, which shouldn't depend on external devices.
> > Then the irq subsystem, then the bulk of devices (which might rely on
> > irqs).  Finally we set up the entry state ready for boot, which could
> > potentially rely on a bunch of other things.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> 
> 
> Breaks console on P8 and asserts on rebooting a P9 guest.

Yeah, I jumped the gun on this one - I need to rethink and test more thoroughly.

> 
> 
> 
> > ---
> >  hw/ppc/spapr.c | 47 +++++++++++++++++++++++++----------------------
> >  1 file changed, 25 insertions(+), 22 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 5a919a6cc1..1560a11738 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1724,6 +1724,28 @@ static void spapr_machine_reset(MachineState *machine)
> >      void *fdt;
> >      int rc;
> >  
> > +    /*
> > +     * If this reset wasn't generated by CAS, we should reset our
> > +     * negotiated options and start from scratch
> > +     */
> > +    if (!spapr->cas_reboot) {
> > +        spapr_ovec_cleanup(spapr->ov5_cas);
> > +        spapr->ov5_cas = spapr_ovec_new();
> > +
> > +        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
> > +    }
> > +
> > +    /*
> > +     * There is no CAS under qtest. Simulate one to please the code that
> > +     * depends on spapr->ov5_cas. This is especially needed to test device
> > +     * unplug, so we do that before resetting the DRCs.
> > +     */
> > +    if (qtest_enabled()) {
> > +        spapr_ovec_cleanup(spapr->ov5_cas);
> > +        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
> > +    }
> > +
> > +    /* Reset the CPUs */
> >      spapr_caps_apply(spapr);
> >  
> >      first_ppc_cpu = POWERPC_CPU(first_cpu);
> > @@ -1741,34 +1763,15 @@ static void spapr_machine_reset(MachineState *machine)
> >          spapr_setup_hpt_and_vrma(spapr);
> >      }
> >  
> > -    qemu_devices_reset();
> > -
> > -    /*
> > -     * If this reset wasn't generated by CAS, we should reset our
> > -     * negotiated options and start from scratch
> > -     */
> > -    if (!spapr->cas_reboot) {
> > -        spapr_ovec_cleanup(spapr->ov5_cas);
> > -        spapr->ov5_cas = spapr_ovec_new();
> > -
> > -        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
> > -    }
> > -
> > +    /* Reset IRQ subsystem */
> >      /*
> >       * This is fixing some of the default configuration of the XIVE
> >       * devices. To be called after the reset of the machine devices.
> >       */
> >      spapr_irq_reset(spapr, &error_fatal);
> >  
> > -    /*
> > -     * There is no CAS under qtest. Simulate one to please the code that
> > -     * depends on spapr->ov5_cas. This is especially needed to test device
> > -     * unplug, so we do that before resetting the DRCs.
> > -     */
> > -    if (qtest_enabled()) {
> > -        spapr_ovec_cleanup(spapr->ov5_cas);
> > -        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
> > -    }
> > +    /* Reset other devices */
> > +    qemu_devices_reset();
> >  
> >      /* DRC reset may cause a device to be unplugged. This will cause troubles
> >       * if this device is used by another device (eg, a running vhost backend
> > 
>
Cédric Le Goater Sept. 11, 2019, 7:54 a.m. UTC | #3
On 11/09/2019 06:04, David Gibson wrote:
> We've made several changes in the past to the machine reset order to fix
> specific problems.  However, we've ended up with an order that doesn't make
> a lot of logical sense.  This is an attempt to rectify this.

There are some more problems though. See below.

> 
> First we reset global CAS options, since that should depend on nothing
> else.  Then we reset the CPUs, which shouldn't depend on external devices.
> Then the irq subsystem, then the bulk of devices (which might rely on
> irqs).  Finally we set up the entry state ready for boot, which could
> potentially rely on a bunch of other things.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c | 47 +++++++++++++++++++++++++----------------------
>  1 file changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5a919a6cc1..1560a11738 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1724,6 +1724,28 @@ static void spapr_machine_reset(MachineState *machine)
>      void *fdt;
>      int rc;
>  
> +    /*
> +     * If this reset wasn't generated by CAS, we should reset our
> +     * negotiated options and start from scratch
> +     */
> +    if (!spapr->cas_reboot) {
> +        spapr_ovec_cleanup(spapr->ov5_cas);
> +        spapr->ov5_cas = spapr_ovec_new();
> +
> +        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
> +    }
> +
> +    /*
> +     * There is no CAS under qtest. Simulate one to please the code that
> +     * depends on spapr->ov5_cas. This is especially needed to test device
> +     * unplug, so we do that before resetting the DRCs.
> +     */
> +    if (qtest_enabled()) {
> +        spapr_ovec_cleanup(spapr->ov5_cas);
> +        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
> +    }
> +
> +    /* Reset the CPUs */
>      spapr_caps_apply(spapr);
>  
>      first_ppc_cpu = POWERPC_CPU(first_cpu);
> @@ -1741,34 +1763,15 @@ static void spapr_machine_reset(MachineState *machine)
>          spapr_setup_hpt_and_vrma(spapr);
>      }
>  
> -    qemu_devices_reset();
> -
> -    /*
> -     * If this reset wasn't generated by CAS, we should reset our
> -     * negotiated options and start from scratch
> -     */
> -    if (!spapr->cas_reboot) {
> -        spapr_ovec_cleanup(spapr->ov5_cas);
> -        spapr->ov5_cas = spapr_ovec_new();
> -
> -        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
> -    }
> -
> +    /* Reset IRQ subsystem */
>      /*
>       * This is fixing some of the default configuration of the XIVE
>       * devices. To be called after the reset of the machine devices.
>       */
>      spapr_irq_reset(spapr, &error_fatal);

spapr_irq_reset() is now called before qemu_devices_reset(). So it will
break the XIVE emulated model.

KVM P8 guests are also broken : 

 qemu-system-ppc64: kernel_irqchip requested but unavailable: Unable to restore KVM interrupt controller state (0x0) for CPU 0: Invalid argument

Something wrong in kvmppc_xics_set_icp(). I need to look closer.

and TCG P9 guests still do the reset after CAS. 

C.

>  
> -    /*
> -     * There is no CAS under qtest. Simulate one to please the code that
> -     * depends on spapr->ov5_cas. This is especially needed to test device
> -     * unplug, so we do that before resetting the DRCs.
> -     */
> -    if (qtest_enabled()) {
> -        spapr_ovec_cleanup(spapr->ov5_cas);
> -        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
> -    }
> +    /* Reset other devices */
> +    qemu_devices_reset();
>  
>      /* DRC reset may cause a device to be unplugged. This will cause troubles
>       * if this device is used by another device (eg, a running vhost backend
>
diff mbox series

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5a919a6cc1..1560a11738 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1724,6 +1724,28 @@  static void spapr_machine_reset(MachineState *machine)
     void *fdt;
     int rc;
 
+    /*
+     * If this reset wasn't generated by CAS, we should reset our
+     * negotiated options and start from scratch
+     */
+    if (!spapr->cas_reboot) {
+        spapr_ovec_cleanup(spapr->ov5_cas);
+        spapr->ov5_cas = spapr_ovec_new();
+
+        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
+    }
+
+    /*
+     * There is no CAS under qtest. Simulate one to please the code that
+     * depends on spapr->ov5_cas. This is especially needed to test device
+     * unplug, so we do that before resetting the DRCs.
+     */
+    if (qtest_enabled()) {
+        spapr_ovec_cleanup(spapr->ov5_cas);
+        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
+    }
+
+    /* Reset the CPUs */
     spapr_caps_apply(spapr);
 
     first_ppc_cpu = POWERPC_CPU(first_cpu);
@@ -1741,34 +1763,15 @@  static void spapr_machine_reset(MachineState *machine)
         spapr_setup_hpt_and_vrma(spapr);
     }
 
-    qemu_devices_reset();
-
-    /*
-     * If this reset wasn't generated by CAS, we should reset our
-     * negotiated options and start from scratch
-     */
-    if (!spapr->cas_reboot) {
-        spapr_ovec_cleanup(spapr->ov5_cas);
-        spapr->ov5_cas = spapr_ovec_new();
-
-        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
-    }
-
+    /* Reset IRQ subsystem */
     /*
      * This is fixing some of the default configuration of the XIVE
      * devices. To be called after the reset of the machine devices.
      */
     spapr_irq_reset(spapr, &error_fatal);
 
-    /*
-     * There is no CAS under qtest. Simulate one to please the code that
-     * depends on spapr->ov5_cas. This is especially needed to test device
-     * unplug, so we do that before resetting the DRCs.
-     */
-    if (qtest_enabled()) {
-        spapr_ovec_cleanup(spapr->ov5_cas);
-        spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
-    }
+    /* Reset other devices */
+    qemu_devices_reset();
 
     /* DRC reset may cause a device to be unplugged. This will cause troubles
      * if this device is used by another device (eg, a running vhost backend