diff mbox

[QEMU,v5,5/6] migration: spapr: migrate ccs_list in spapr state

Message ID 1475519097-27611-6-git-send-email-duanj@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jianjun Duan Oct. 3, 2016, 6:24 p.m. UTC
ccs_list in spapr state maintains the device tree related
information on the rtas side for hotplugged devices. In racing
situations between hotplug events and migration operation, a rtas
hotplug event could be migrated from the source guest to target
guest, or the source guest could have not yet finished fetching
the device tree when migration is started, the target will try
to finish fetching the device tree. By migrating ccs_list, the
target can fetch the device tree properly.

ccs_list is put in a subsection in the spapr state VMSD to make
sure migration across different versions is not broken.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Comments

David Gibson Oct. 7, 2016, 3:36 a.m. UTC | #1
On Mon, Oct 03, 2016 at 11:24:56AM -0700, Jianjun Duan wrote:
> ccs_list in spapr state maintains the device tree related
> information on the rtas side for hotplugged devices. In racing
> situations between hotplug events and migration operation, a rtas
> hotplug event could be migrated from the source guest to target
> guest, or the source guest could have not yet finished fetching
> the device tree when migration is started, the target will try
> to finish fetching the device tree. By migrating ccs_list, the
> target can fetch the device tree properly.
> 
> ccs_list is put in a subsection in the spapr state VMSD to make
> sure migration across different versions is not broken.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>

I'm still not entirely convinced we need to migrate the ccs_list.
What would happen if we did this:

   * Keep a flag which indicates whether the guest is in the middle of
     the configure_connector process.
       - I'm not sure if that would need to be a new bit of state, or
         if we could deduce it from the value of the isolation and
	 allocation states
       - If it's new state, we'd need to migrate it, obviously not if
         we can derive it from other state flags

   * On the destination during post_load, if there was an in-progress
     configure_connector on the source, we set another "stale
     configure" flag

   * When a configure_connector call is attempted on the destination
     with the stale configure flag set, return an error

The question is, if we choose the right error, can we get the guest to
either restart the configure from scratch, or fail gracefully, so the
operator can restart the hotplug

> ---
>  hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 63b6a0d..1847d35 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1255,6 +1255,36 @@ static bool version_before_3(void *opaque, int version_id)
>      return version_id < 3;
>  }
>  
> +static bool spapr_ccs_list_needed(void *opaque)
> +{
> +    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> +    return !QTAILQ_EMPTY(&spapr->ccs_list);
> +}
> +
> +static const VMStateDescription vmstate_spapr_ccs = {
> +    .name = "spaprconfigureconnectorstate",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorState),
> +        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorState),
> +        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_ccs_list = {
> +    .name = "spaprccslist",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_ccs_list_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_QTAILQ_V(ccs_list, sPAPRMachineState, 1,
> +                         vmstate_spapr_ccs, sPAPRConfigureConnectorState, next),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -1270,6 +1300,10 @@ static const VMStateDescription vmstate_spapr = {
>          VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription*[]) {
> +        &vmstate_spapr_ccs_list,
> +        NULL
> +    }
>  };
>  
>  static int htab_save_setup(QEMUFile *f, void *opaque)
Michael Roth Oct. 7, 2016, 2:52 p.m. UTC | #2
Quoting David Gibson (2016-10-06 22:36:07)
> On Mon, Oct 03, 2016 at 11:24:56AM -0700, Jianjun Duan wrote:
> > ccs_list in spapr state maintains the device tree related
> > information on the rtas side for hotplugged devices. In racing
> > situations between hotplug events and migration operation, a rtas
> > hotplug event could be migrated from the source guest to target
> > guest, or the source guest could have not yet finished fetching
> > the device tree when migration is started, the target will try
> > to finish fetching the device tree. By migrating ccs_list, the
> > target can fetch the device tree properly.
> > 
> > ccs_list is put in a subsection in the spapr state VMSD to make
> > sure migration across different versions is not broken.
> > 
> > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> 
> I'm still not entirely convinced we need to migrate the ccs_list.
> What would happen if we did this:
> 
>    * Keep a flag which indicates whether the guest is in the middle of
>      the configure_connector process.
>        - I'm not sure if that would need to be a new bit of state, or
>          if we could deduce it from the value of the isolation and
>          allocation states
>        - If it's new state, we'd need to migrate it, obviously not if
>          we can derive it from other state flags
> 
>    * On the destination during post_load, if there was an in-progress
>      configure_connector on the source, we set another "stale
>      configure" flag
> 
>    * When a configure_connector call is attempted on the destination
>      with the stale configure flag set, return an error
> 
> The question is, if we choose the right error, can we get the guest to
> either restart the configure from scratch, or fail gracefully, so the
> operator can restart the hotplug

To get the configure to restart, the guest's configure_connector
implementation would need to changed. Current code in drmgr would just
bail on any error, and I'd imagine the in-kernel version does the same.

So at least for existing guests, the only option is failing the command
at the operator's interface, namely device_add. device_add is
asynchronous to the actual hotplug event handling however. So if we want
to convey failure to the user, it would have to be either in the form of
a new QMP event emitted to convey hotplug success or error, or through
device_add itself by implementing something like the async QMP rework
that Marc-Andre posted some time back (which still seems to be a topic
of debate). Which either approach, something like libvirt, with adequate
state-tracking for pending hotplug events, could handle an error event
on the target side post-migrate and convey that to the user somehow.

That's probably a much larger discussion if it comes to that, but doable
in theory.

But even that wouldn't get us totally out of the woods: DRC state can
still be modified outside of hotplug. For instance, a guest should be
able to do:

  drmgr -c pci -s <drc_index> -r
  drmgr -c pci -s <drc_index> -a

to return a device to firmware and then later take it back and
reconfigure it. I'm not aware of any common case where this would occur,
but it's not disallowed by the specification, and performing a migration
between these 2 operations would currently break this since the default
coldplug state on target assumes a configured state in source.

> 
> > ---
> >  hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 63b6a0d..1847d35 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1255,6 +1255,36 @@ static bool version_before_3(void *opaque, int version_id)
> >      return version_id < 3;
> >  }
> >  
> > +static bool spapr_ccs_list_needed(void *opaque)
> > +{
> > +    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> > +    return !QTAILQ_EMPTY(&spapr->ccs_list);
> > +}
> > +
> > +static const VMStateDescription vmstate_spapr_ccs = {
> > +    .name = "spaprconfigureconnectorstate",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorState),
> > +        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorState),
> > +        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorState),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> > +static const VMStateDescription vmstate_spapr_ccs_list = {
> > +    .name = "spaprccslist",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = spapr_ccs_list_needed,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_QTAILQ_V(ccs_list, sPAPRMachineState, 1,
> > +                         vmstate_spapr_ccs, sPAPRConfigureConnectorState, next),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> >  static const VMStateDescription vmstate_spapr = {
> >      .name = "spapr",
> >      .version_id = 3,
> > @@ -1270,6 +1300,10 @@ static const VMStateDescription vmstate_spapr = {
> >          VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
> >          VMSTATE_END_OF_LIST()
> >      },
> > +    .subsections = (const VMStateDescription*[]) {
> > +        &vmstate_spapr_ccs_list,
> > +        NULL
> > +    }
> >  };
> >  
> >  static int htab_save_setup(QEMUFile *f, void *opaque)
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson
David Gibson Oct. 10, 2016, 5:05 a.m. UTC | #3
On Fri, Oct 07, 2016 at 09:52:51AM -0500, Michael Roth wrote:
> Quoting David Gibson (2016-10-06 22:36:07)
> > On Mon, Oct 03, 2016 at 11:24:56AM -0700, Jianjun Duan wrote:
> > > ccs_list in spapr state maintains the device tree related
> > > information on the rtas side for hotplugged devices. In racing
> > > situations between hotplug events and migration operation, a rtas
> > > hotplug event could be migrated from the source guest to target
> > > guest, or the source guest could have not yet finished fetching
> > > the device tree when migration is started, the target will try
> > > to finish fetching the device tree. By migrating ccs_list, the
> > > target can fetch the device tree properly.
> > > 
> > > ccs_list is put in a subsection in the spapr state VMSD to make
> > > sure migration across different versions is not broken.
> > > 
> > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > 
> > I'm still not entirely convinced we need to migrate the ccs_list.
> > What would happen if we did this:
> > 
> >    * Keep a flag which indicates whether the guest is in the middle of
> >      the configure_connector process.
> >        - I'm not sure if that would need to be a new bit of state, or
> >          if we could deduce it from the value of the isolation and
> >          allocation states
> >        - If it's new state, we'd need to migrate it, obviously not if
> >          we can derive it from other state flags
> > 
> >    * On the destination during post_load, if there was an in-progress
> >      configure_connector on the source, we set another "stale
> >      configure" flag
> > 
> >    * When a configure_connector call is attempted on the destination
> >      with the stale configure flag set, return an error
> > 
> > The question is, if we choose the right error, can we get the guest to
> > either restart the configure from scratch, or fail gracefully, so the
> > operator can restart the hotplug
> 
> To get the configure to restart, the guest's configure_connector
> implementation would need to changed. Current code in drmgr would just
> bail on any error, and I'd imagine the in-kernel version does the same.
> 
> So at least for existing guests, the only option is failing the command
> at the operator's interface, namely device_add. device_add is
> asynchronous to the actual hotplug event handling however. So if we want
> to convey failure to the user, it would have to be either in the form of
> a new QMP event emitted to convey hotplug success or error, or through
> device_add itself by implementing something like the async QMP rework
> that Marc-Andre posted some time back (which still seems to be a topic
> of debate). Which either approach, something like libvirt, with adequate
> state-tracking for pending hotplug events, could handle an error event
> on the target side post-migrate and convey that to the user somehow.
> 
> That's probably a much larger discussion if it comes to that, but doable
> in theory.
> 
> But even that wouldn't get us totally out of the woods: DRC state can
> still be modified outside of hotplug. For instance, a guest should be
> able to do:
> 
>   drmgr -c pci -s <drc_index> -r
>   drmgr -c pci -s <drc_index> -a
> 
> to return a device to firmware and then later take it back and
> reconfigure it. I'm not aware of any common case where this would occur,
> but it's not disallowed by the specification, and performing a migration
> between these 2 operations would currently break this since the default
> coldplug state on target assumes a configured state in source.

Thanks, that's a good case for why we need this.  Can you please fold
this description into your commit messages so it's there for posterity.
diff mbox

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 63b6a0d..1847d35 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1255,6 +1255,36 @@  static bool version_before_3(void *opaque, int version_id)
     return version_id < 3;
 }
 
+static bool spapr_ccs_list_needed(void *opaque)
+{
+    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
+    return !QTAILQ_EMPTY(&spapr->ccs_list);
+}
+
+static const VMStateDescription vmstate_spapr_ccs = {
+    .name = "spaprconfigureconnectorstate",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorState),
+        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorState),
+        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_ccs_list = {
+    .name = "spaprccslist",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_ccs_list_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_QTAILQ_V(ccs_list, sPAPRMachineState, 1,
+                         vmstate_spapr_ccs, sPAPRConfigureConnectorState, next),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -1270,6 +1300,10 @@  static const VMStateDescription vmstate_spapr = {
         VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
         VMSTATE_END_OF_LIST()
     },
+    .subsections = (const VMStateDescription*[]) {
+        &vmstate_spapr_ccs_list,
+        NULL
+    }
 };
 
 static int htab_save_setup(QEMUFile *f, void *opaque)