Message ID | 1475519097-27611-6-git-send-email-duanj@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Oct 03, 2016 at 11:24:56AM -0700, Jianjun Duan wrote: > ccs_list in spapr state maintains the device tree related > information on the rtas side for hotplugged devices. In racing > situations between hotplug events and migration operation, a rtas > hotplug event could be migrated from the source guest to target > guest, or the source guest could have not yet finished fetching > the device tree when migration is started, the target will try > to finish fetching the device tree. By migrating ccs_list, the > target can fetch the device tree properly. > > ccs_list is put in a subsection in the spapr state VMSD to make > sure migration across different versions is not broken. > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com> I'm still not entirely convinced we need to migrate the ccs_list. What would happen if we did this: * Keep a flag which indicates whether the guest is in the middle of the configure_connector process. - I'm not sure if that would need to be a new bit of state, or if we could deduce it from the value of the isolation and allocation states - If it's new state, we'd need to migrate it, obviously not if we can derive it from other state flags * On the destination during post_load, if there was an in-progress configure_connector on the source, we set another "stale configure" flag * When a configure_connector call is attempted on the destination with the stale configure flag set, return an error The question is, if we choose the right error, can we get the guest to either restart the configure from scratch, or fail gracefully, so the operator can restart the hotplug > --- > hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++ > 1 file changed, 34 insertions(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 63b6a0d..1847d35 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -1255,6 +1255,36 @@ static bool version_before_3(void *opaque, int version_id) > return version_id < 3; > } > > +static bool spapr_ccs_list_needed(void *opaque) > +{ > + sPAPRMachineState *spapr = (sPAPRMachineState *)opaque; > + return !QTAILQ_EMPTY(&spapr->ccs_list); > +} > + > +static const VMStateDescription vmstate_spapr_ccs = { > + .name = "spaprconfigureconnectorstate", > + .version_id = 1, > + .minimum_version_id = 1, > + .fields = (VMStateField[]) { > + VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorState), > + VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorState), > + VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorState), > + VMSTATE_END_OF_LIST() > + }, > +}; > + > +static const VMStateDescription vmstate_spapr_ccs_list = { > + .name = "spaprccslist", > + .version_id = 1, > + .minimum_version_id = 1, > + .needed = spapr_ccs_list_needed, > + .fields = (VMStateField[]) { > + VMSTATE_QTAILQ_V(ccs_list, sPAPRMachineState, 1, > + vmstate_spapr_ccs, sPAPRConfigureConnectorState, next), > + VMSTATE_END_OF_LIST() > + }, > +}; > + > static const VMStateDescription vmstate_spapr = { > .name = "spapr", > .version_id = 3, > @@ -1270,6 +1300,10 @@ static const VMStateDescription vmstate_spapr = { > VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2), > VMSTATE_END_OF_LIST() > }, > + .subsections = (const VMStateDescription*[]) { > + &vmstate_spapr_ccs_list, > + NULL > + } > }; > > static int htab_save_setup(QEMUFile *f, void *opaque)
Quoting David Gibson (2016-10-06 22:36:07) > On Mon, Oct 03, 2016 at 11:24:56AM -0700, Jianjun Duan wrote: > > ccs_list in spapr state maintains the device tree related > > information on the rtas side for hotplugged devices. In racing > > situations between hotplug events and migration operation, a rtas > > hotplug event could be migrated from the source guest to target > > guest, or the source guest could have not yet finished fetching > > the device tree when migration is started, the target will try > > to finish fetching the device tree. By migrating ccs_list, the > > target can fetch the device tree properly. > > > > ccs_list is put in a subsection in the spapr state VMSD to make > > sure migration across different versions is not broken. > > > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com> > > I'm still not entirely convinced we need to migrate the ccs_list. > What would happen if we did this: > > * Keep a flag which indicates whether the guest is in the middle of > the configure_connector process. > - I'm not sure if that would need to be a new bit of state, or > if we could deduce it from the value of the isolation and > allocation states > - If it's new state, we'd need to migrate it, obviously not if > we can derive it from other state flags > > * On the destination during post_load, if there was an in-progress > configure_connector on the source, we set another "stale > configure" flag > > * When a configure_connector call is attempted on the destination > with the stale configure flag set, return an error > > The question is, if we choose the right error, can we get the guest to > either restart the configure from scratch, or fail gracefully, so the > operator can restart the hotplug To get the configure to restart, the guest's configure_connector implementation would need to changed. Current code in drmgr would just bail on any error, and I'd imagine the in-kernel version does the same. So at least for existing guests, the only option is failing the command at the operator's interface, namely device_add. device_add is asynchronous to the actual hotplug event handling however. So if we want to convey failure to the user, it would have to be either in the form of a new QMP event emitted to convey hotplug success or error, or through device_add itself by implementing something like the async QMP rework that Marc-Andre posted some time back (which still seems to be a topic of debate). Which either approach, something like libvirt, with adequate state-tracking for pending hotplug events, could handle an error event on the target side post-migrate and convey that to the user somehow. That's probably a much larger discussion if it comes to that, but doable in theory. But even that wouldn't get us totally out of the woods: DRC state can still be modified outside of hotplug. For instance, a guest should be able to do: drmgr -c pci -s <drc_index> -r drmgr -c pci -s <drc_index> -a to return a device to firmware and then later take it back and reconfigure it. I'm not aware of any common case where this would occur, but it's not disallowed by the specification, and performing a migration between these 2 operations would currently break this since the default coldplug state on target assumes a configured state in source. > > > --- > > hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++ > > 1 file changed, 34 insertions(+) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 63b6a0d..1847d35 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -1255,6 +1255,36 @@ static bool version_before_3(void *opaque, int version_id) > > return version_id < 3; > > } > > > > +static bool spapr_ccs_list_needed(void *opaque) > > +{ > > + sPAPRMachineState *spapr = (sPAPRMachineState *)opaque; > > + return !QTAILQ_EMPTY(&spapr->ccs_list); > > +} > > + > > +static const VMStateDescription vmstate_spapr_ccs = { > > + .name = "spaprconfigureconnectorstate", > > + .version_id = 1, > > + .minimum_version_id = 1, > > + .fields = (VMStateField[]) { > > + VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorState), > > + VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorState), > > + VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorState), > > + VMSTATE_END_OF_LIST() > > + }, > > +}; > > + > > +static const VMStateDescription vmstate_spapr_ccs_list = { > > + .name = "spaprccslist", > > + .version_id = 1, > > + .minimum_version_id = 1, > > + .needed = spapr_ccs_list_needed, > > + .fields = (VMStateField[]) { > > + VMSTATE_QTAILQ_V(ccs_list, sPAPRMachineState, 1, > > + vmstate_spapr_ccs, sPAPRConfigureConnectorState, next), > > + VMSTATE_END_OF_LIST() > > + }, > > +}; > > + > > static const VMStateDescription vmstate_spapr = { > > .name = "spapr", > > .version_id = 3, > > @@ -1270,6 +1300,10 @@ static const VMStateDescription vmstate_spapr = { > > VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2), > > VMSTATE_END_OF_LIST() > > }, > > + .subsections = (const VMStateDescription*[]) { > > + &vmstate_spapr_ccs_list, > > + NULL > > + } > > }; > > > > static int htab_save_setup(QEMUFile *f, void *opaque) > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson
On Fri, Oct 07, 2016 at 09:52:51AM -0500, Michael Roth wrote: > Quoting David Gibson (2016-10-06 22:36:07) > > On Mon, Oct 03, 2016 at 11:24:56AM -0700, Jianjun Duan wrote: > > > ccs_list in spapr state maintains the device tree related > > > information on the rtas side for hotplugged devices. In racing > > > situations between hotplug events and migration operation, a rtas > > > hotplug event could be migrated from the source guest to target > > > guest, or the source guest could have not yet finished fetching > > > the device tree when migration is started, the target will try > > > to finish fetching the device tree. By migrating ccs_list, the > > > target can fetch the device tree properly. > > > > > > ccs_list is put in a subsection in the spapr state VMSD to make > > > sure migration across different versions is not broken. > > > > > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com> > > > > I'm still not entirely convinced we need to migrate the ccs_list. > > What would happen if we did this: > > > > * Keep a flag which indicates whether the guest is in the middle of > > the configure_connector process. > > - I'm not sure if that would need to be a new bit of state, or > > if we could deduce it from the value of the isolation and > > allocation states > > - If it's new state, we'd need to migrate it, obviously not if > > we can derive it from other state flags > > > > * On the destination during post_load, if there was an in-progress > > configure_connector on the source, we set another "stale > > configure" flag > > > > * When a configure_connector call is attempted on the destination > > with the stale configure flag set, return an error > > > > The question is, if we choose the right error, can we get the guest to > > either restart the configure from scratch, or fail gracefully, so the > > operator can restart the hotplug > > To get the configure to restart, the guest's configure_connector > implementation would need to changed. Current code in drmgr would just > bail on any error, and I'd imagine the in-kernel version does the same. > > So at least for existing guests, the only option is failing the command > at the operator's interface, namely device_add. device_add is > asynchronous to the actual hotplug event handling however. So if we want > to convey failure to the user, it would have to be either in the form of > a new QMP event emitted to convey hotplug success or error, or through > device_add itself by implementing something like the async QMP rework > that Marc-Andre posted some time back (which still seems to be a topic > of debate). Which either approach, something like libvirt, with adequate > state-tracking for pending hotplug events, could handle an error event > on the target side post-migrate and convey that to the user somehow. > > That's probably a much larger discussion if it comes to that, but doable > in theory. > > But even that wouldn't get us totally out of the woods: DRC state can > still be modified outside of hotplug. For instance, a guest should be > able to do: > > drmgr -c pci -s <drc_index> -r > drmgr -c pci -s <drc_index> -a > > to return a device to firmware and then later take it back and > reconfigure it. I'm not aware of any common case where this would occur, > but it's not disallowed by the specification, and performing a migration > between these 2 operations would currently break this since the default > coldplug state on target assumes a configured state in source. Thanks, that's a good case for why we need this. Can you please fold this description into your commit messages so it's there for posterity.
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 63b6a0d..1847d35 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1255,6 +1255,36 @@ static bool version_before_3(void *opaque, int version_id) return version_id < 3; } +static bool spapr_ccs_list_needed(void *opaque) +{ + sPAPRMachineState *spapr = (sPAPRMachineState *)opaque; + return !QTAILQ_EMPTY(&spapr->ccs_list); +} + +static const VMStateDescription vmstate_spapr_ccs = { + .name = "spaprconfigureconnectorstate", + .version_id = 1, + .minimum_version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorState), + VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorState), + VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorState), + VMSTATE_END_OF_LIST() + }, +}; + +static const VMStateDescription vmstate_spapr_ccs_list = { + .name = "spaprccslist", + .version_id = 1, + .minimum_version_id = 1, + .needed = spapr_ccs_list_needed, + .fields = (VMStateField[]) { + VMSTATE_QTAILQ_V(ccs_list, sPAPRMachineState, 1, + vmstate_spapr_ccs, sPAPRConfigureConnectorState, next), + VMSTATE_END_OF_LIST() + }, +}; + static const VMStateDescription vmstate_spapr = { .name = "spapr", .version_id = 3, @@ -1270,6 +1300,10 @@ static const VMStateDescription vmstate_spapr = { VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2), VMSTATE_END_OF_LIST() }, + .subsections = (const VMStateDescription*[]) { + &vmstate_spapr_ccs_list, + NULL + } }; static int htab_save_setup(QEMUFile *f, void *opaque)
ccs_list in spapr state maintains the device tree related information on the rtas side for hotplugged devices. In racing situations between hotplug events and migration operation, a rtas hotplug event could be migrated from the source guest to target guest, or the source guest could have not yet finished fetching the device tree when migration is started, the target will try to finish fetching the device tree. By migrating ccs_list, the target can fetch the device tree properly. ccs_list is put in a subsection in the spapr state VMSD to make sure migration across different versions is not broken. Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com> --- hw/ppc/spapr.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+)