diff mbox

[03/10] spapr: add option vector handling in CAS-generated resets

Message ID 1477370856-8940-4-git-send-email-mdroth@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Michael Roth Oct. 25, 2016, 4:47 a.m. UTC
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.

We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.

Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:

1) When ibm,client-architecture-support gets called, we
   call spapr_dt_cas_updates() with the set of capabilities
   added since the previous call to ibm,client-architecture-support.
   For the initial boot, or a system reset generated by something
   other than the CAS call itself, this set will consist of *all*
   options supported both the platform and the guest. For calls
   to ibm,client-architecture-support immediately after a CAS-induced
   reset, we call spapr_dt_cas_updates() with only the set
   of capabilities added since the previous call, since the other
   capabilities will have already been addressed by the boot-time
   device-tree this time around. In the unlikely event that
   capabilities are *removed* since the previous CAS, we will
   generate a CAS-induced reset. In the unlikely event that we
   cannot fit the device-tree updates into the buffer provided
   by the guest, well generate a CAS-induced reset.

2) When a CAS update results in the need to reset the machine and
   include the updates in the boot-time device tree, we call the
   spapr_dt_cas_updates() using the full set of negotiated
   capabilities as part of the reset path. At initial boot, or after
   a reset generated by something other than the CAS call itself,
   this set will be empty, resulting in what should be the same
   boot-time device-tree as we generated prior to this patch. For
   CAS-induced reset, this routine will be called with the full set of
   capabilities negotiated by the platform/guest in the previous
   CAS call, which should result in CAS updates from previous call
   being accounted for in the initial boot-time device tree.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c         | 40 ++++++++++++++++++++++++++++++++++------
 hw/ppc/spapr_hcall.c   | 22 ++++++++++++++++++----
 include/hw/ppc/spapr.h |  4 +++-
 3 files changed, 55 insertions(+), 11 deletions(-)

Comments

David Gibson Oct. 25, 2016, 11:56 p.m. UTC | #1
On Mon, Oct 24, 2016 at 11:47:29PM -0500, Michael Roth wrote:
> In some cases, ibm,client-architecture-support calls can fail. This
> could happen in the current code for situations where the modified
> device tree segment exceeds the buffer size provided by the guest
> via the call parameters. In these cases, QEMU will reset, allowing
> an opportunity to regenerate the device tree from scratch via
> boot-time handling. There are potentially other scenarios as well,
> not currently reachable in the current code, but possible in theory,
> such as cases where device-tree properties or nodes need to be removed.
> 
> We currently don't handle either of these properly for option vector
> capabilities however. Instead of carrying the negotiated capability
> beyond the reset and creating the boot-time device tree accordingly,
> we start from scratch, generating the same boot-time device tree as we
> did prior to the CAS-generated and the same device tree updates as we
> did before. This could (in theory) cause us to get stuck in a reset
> loop. This hasn't been observed, but depending on the extensiveness
> of CAS-induced device tree updates in the future, could eventually
> become an issue.
> 
> Address this by pulling capability-related device tree
> updates resulting from CAS calls into a common routine,
> spapr_dt_cas_updates(), and adding an sPAPROptionVector*
> parameter that allows us to test for newly-negotiated capabilities.
> We invoke it as follows:
> 
> 1) When ibm,client-architecture-support gets called, we
>    call spapr_dt_cas_updates() with the set of capabilities
>    added since the previous call to ibm,client-architecture-support.
>    For the initial boot, or a system reset generated by something
>    other than the CAS call itself, this set will consist of *all*
>    options supported both the platform and the guest. For calls
>    to ibm,client-architecture-support immediately after a CAS-induced
>    reset, we call spapr_dt_cas_updates() with only the set
>    of capabilities added since the previous call, since the other
>    capabilities will have already been addressed by the boot-time
>    device-tree this time around. In the unlikely event that
>    capabilities are *removed* since the previous CAS, we will
>    generate a CAS-induced reset. In the unlikely event that we
>    cannot fit the device-tree updates into the buffer provided
>    by the guest, well generate a CAS-induced reset.
> 
> 2) When a CAS update results in the need to reset the machine and
>    include the updates in the boot-time device tree, we call the
>    spapr_dt_cas_updates() using the full set of negotiated
>    capabilities as part of the reset path. At initial boot, or after
>    a reset generated by something other than the CAS call itself,
>    this set will be empty, resulting in what should be the same
>    boot-time device-tree as we generated prior to this patch. For
>    CAS-induced reset, this routine will be called with the full set of
>    capabilities negotiated by the platform/guest in the previous
>    CAS call, which should result in CAS updates from previous call
>    being accounted for in the initial boot-time device tree.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

One little nit..

[snip]
> @@ -1013,13 +1013,27 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
>       * of guest input. To model these properly we'd want some sort of mask,
>       * but since they only currently apply to memory migration as defined
>       * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
> -     * to worry about this.
> +     * to worry about this for now.
>       */
> +    ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
> +    /* full range of negotiated ov5 capabilities */
>      spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
>      spapr_ovec_cleanup(ov5_guest);
> +    /* capabilities that have been added since CAS-generated guest reset.
> +     * if capabilities have since been removed, generate another reset
> +     */
> +    ov5_updates = spapr_ovec_new();
> +    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
> +                                        ov5_cas_old, spapr->ov5_cas);
> +
> +    if (!spapr->cas_reboot) {
> +        spapr->cas_reboot =
> +            spapr_h_cas_compose_response(spapr, args[1], args[2], cpu_update,
> +                                         ov5_updates);

spapr->cas_reboot is a bool, whereas spapr_h_cas_compose_response()
returns an int error code.  Now that C has real bools, I think the
compiler will do the right thing here, but I'd prefer to see an explicit

cas_reboot = (spapr_h_cas_compose_response() != 0)

for clarity.

> +    }
> +    spapr_ovec_cleanup(ov5_updates);
>  
> -    if (spapr_h_cas_compose_response(spapr, args[1], args[2],
> -                                     cpu_update)) {
> +    if (spapr->cas_reboot) {
>          qemu_system_reset_request();
>      }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a37eee8..b6f9f1b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -75,6 +75,7 @@ struct sPAPRMachineState {
>      bool has_graphics;
>      sPAPROptionVector *ov5;         /* QEMU-supported option vectors */
>      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
> +    bool cas_reboot;
>  
>      uint32_t check_exception_irq;
>      Notifier epow_notifier;
> @@ -586,7 +587,8 @@ void spapr_events_init(sPAPRMachineState *sm);
>  void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
>  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
>                                   target_ulong addr, target_ulong size,
> -                                 bool cpu_update);
> +                                 bool cpu_update,
> +                                 sPAPROptionVector *ov5_updates);
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
>  void spapr_tce_table_enable(sPAPRTCETable *tcet,
>                              uint32_t page_shift, uint64_t bus_offset,
diff mbox

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index af5a239..3b64580 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -655,13 +655,28 @@  out:
     return ret;
 }
 
+static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
+                                sPAPROptionVector *ov5_updates)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    int ret = 0;
+
+    /* Generate ibm,dynamic-reconfiguration-memory node if required */
+    if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
+        g_assert(smc->dr_lmb_enabled);
+        ret = spapr_populate_drconf_memory(spapr, fdt);
+    }
+
+    return ret;
+}
+
 int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update)
+                                 bool cpu_update,
+                                 sPAPROptionVector *ov5_updates)
 {
     void *fdt, *fdt_skel;
     sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
 
     size -= sizeof(hdr);
 
@@ -680,10 +695,8 @@  int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
         _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
     }
 
-    /* Generate ibm,dynamic-reconfiguration-memory node if required */
-    if (spapr_ovec_test(spapr->ov5_cas, OV5_DRCONF_MEMORY)) {
-        g_assert(smc->dr_lmb_enabled);
-        _FDT((spapr_populate_drconf_memory(spapr, fdt)));
+    if (spapr_dt_cas_updates(spapr, fdt, ov5_updates)) {
+        return -1;
     }
 
     /* Pack resulting tree */
@@ -972,6 +985,13 @@  static void *spapr_build_fdt(sPAPRMachineState *spapr,
         _FDT((fdt_add_mem_rsv(fdt, spapr->initrd_base, spapr->initrd_size)));
     }
 
+    /* ibm,client-architecture-support updates */
+    ret = spapr_dt_cas_updates(spapr, fdt, spapr->ov5_cas);
+    if (ret < 0) {
+        error_report("couldn't setup CAS properties fdt");
+        exit(1);
+    }
+
     return fdt;
 }
 
@@ -1137,6 +1157,13 @@  static void ppc_spapr_reset(void)
     rtas_addr = rtas_limit - RTAS_MAX_SIZE;
     fdt_addr = rtas_addr - FDT_MAX_SIZE;
 
+    /* if this reset wasn't generated by CAS, we should reset our
+     * negotiated options and start from scratch */
+    if (!spapr->cas_reboot) {
+        spapr_ovec_cleanup(spapr->ov5_cas);
+        spapr->ov5_cas = spapr_ovec_new();
+    }
+
     fdt = spapr_build_fdt(spapr, rtas_addr, spapr->rtas_size);
 
     spapr_load_rtas(spapr, fdt, rtas_addr);
@@ -1164,6 +1191,7 @@  static void ppc_spapr_reset(void)
     first_cpu->halted = 0;
     first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
 
+    spapr->cas_reboot = false;
 }
 
 static void spapr_create_nvram(sPAPRMachineState *spapr)
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index f1d081b..d277813 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -950,7 +950,7 @@  static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     unsigned compat_lvl = 0, cpu_version = 0;
     unsigned max_lvl = get_compat_level(cpu_->max_compat);
     int counter;
-    sPAPROptionVector *ov5_guest;
+    sPAPROptionVector *ov5_guest, *ov5_cas_old, *ov5_updates;
 
     /* Parse PVR list */
     for (counter = 0; counter < 512; ++counter) {
@@ -1013,13 +1013,27 @@  static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
      * of guest input. To model these properly we'd want some sort of mask,
      * but since they only currently apply to memory migration as defined
      * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
-     * to worry about this.
+     * to worry about this for now.
      */
+    ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
+    /* full range of negotiated ov5 capabilities */
     spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
     spapr_ovec_cleanup(ov5_guest);
+    /* capabilities that have been added since CAS-generated guest reset.
+     * if capabilities have since been removed, generate another reset
+     */
+    ov5_updates = spapr_ovec_new();
+    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
+                                        ov5_cas_old, spapr->ov5_cas);
+
+    if (!spapr->cas_reboot) {
+        spapr->cas_reboot =
+            spapr_h_cas_compose_response(spapr, args[1], args[2], cpu_update,
+                                         ov5_updates);
+    }
+    spapr_ovec_cleanup(ov5_updates);
 
-    if (spapr_h_cas_compose_response(spapr, args[1], args[2],
-                                     cpu_update)) {
+    if (spapr->cas_reboot) {
         qemu_system_reset_request();
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a37eee8..b6f9f1b 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -75,6 +75,7 @@  struct sPAPRMachineState {
     bool has_graphics;
     sPAPROptionVector *ov5;         /* QEMU-supported option vectors */
     sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
+    bool cas_reboot;
 
     uint32_t check_exception_irq;
     Notifier epow_notifier;
@@ -586,7 +587,8 @@  void spapr_events_init(sPAPRMachineState *sm);
 void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update);
+                                 bool cpu_update,
+                                 sPAPROptionVector *ov5_updates);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
 void spapr_tce_table_enable(sPAPRTCETable *tcet,
                             uint32_t page_shift, uint64_t bus_offset,