Message ID | 20220220095716.153757-11-yishaih@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Series | Add mlx5 live migration driver and v2 migration protocol |
> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Sunday, February 20, 2022 5:57 PM
>
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> The RUNNING_P2P state is designed to support multiple devices in the same
> VM that are doing P2P transactions between themselves. When in RUNNING_P2P
> the device must be able to accept incoming P2P transactions but should not
> generate outgoing P2P transactions.
>
> As an optional extension to the mandatory states it is defined as
> in between STOP and RUNNING:
>    STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP
>
> For drivers that are unable to support RUNNING_P2P the core code
> silently merges RUNNING_P2P and RUNNING together. Unless driver support
> is present, the new state cannot be used in SET_STATE.
> Drivers that support this will be required to implement 4 FSM arcs
> beyond the basic FSM. 2 of the basic FSM arcs become combination
> transitions.
>
> Compared to the v1 clarification, NDMA is redefined into FSM states and is
> described in terms of the desired P2P quiescent behavior, noting that
> halting all DMA is an acceptable implementation.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  drivers/vfio/vfio.c       | 84 +++++++++++++++++++++++++++++++--------
>  include/linux/vfio.h      |  1 +
>  include/uapi/linux/vfio.h | 36 ++++++++++++++++-
>  3 files changed, 102 insertions(+), 19 deletions(-)
On Sun, 20 Feb 2022 11:57:11 +0200 Yishai Hadas <yishaih@nvidia.com> wrote:

> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 3bbadcdbc9c8..3176cb5d4464 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -33,6 +33,7 @@ struct vfio_device {
>  	struct vfio_group *group;
>  	struct vfio_device_set *dev_set;
>  	struct list_head dev_set_list;
> +	unsigned int migration_flags;

Maybe paranoia, but should we sanity test this in __vfio_register_dev()
to reinforce to driver authors that not all bit combinations are valid?
Thanks,

Alex
On Wed, Feb 23, 2022 at 10:42:48AM -0700, Alex Williamson wrote:
> On Sun, 20 Feb 2022 11:57:11 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 3bbadcdbc9c8..3176cb5d4464 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -33,6 +33,7 @@ struct vfio_device {
> >  	struct vfio_group *group;
> >  	struct vfio_device_set *dev_set;
> >  	struct list_head dev_set_list;
> > +	unsigned int migration_flags;
>
> Maybe paranoia, but should we sanity test this in __vfio_register_dev()
> to reinforce to driver authors that not all bit combinations are valid?
> Thanks,

I don't like sanity testing things that are easy to audit for..

Jason
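For reference, the kind of check Alex is asking about could be sketched roughly as below. This is only an illustration in user-space C and is not part of the posted series: `mig_flags_valid()` and the `MIG_*` macros are hypothetical names that mirror the bit values of VFIO_MIGRATION_STOP_COPY and VFIO_MIGRATION_P2P from the patch, and treating 0 as "no migration support" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the uapi flag bits defined in the patch. */
#define MIG_STOP_COPY (1u << 0)
#define MIG_P2P       (1u << 1)

/*
 * Accept only the combinations the uapi comment defines today:
 * STOP_COPY alone, or STOP_COPY together with P2P.
 * Assumption: 0 means the driver offers no migration support.
 */
bool mig_flags_valid(unsigned int flags)
{
	switch (flags) {
	case 0:
	case MIG_STOP_COPY:
	case MIG_STOP_COPY | MIG_P2P:
		return true;
	default:
		return false;
	}
}

/* Small usage example: P2P without STOP_COPY is rejected. */
void mig_flags_example(void)
{
	assert(mig_flags_valid(MIG_STOP_COPY | MIG_P2P));
	assert(!mig_flags_valid(MIG_P2P));
}
```

A registration-time check along these lines would presumably warn and fail for anything it does not recognise; as Jason notes, the same property can also be audited by reading each driver.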
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index b37ab27b511f..bdb5205bb358 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1577,39 +1577,55 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state new_fsm,
 			    enum vfio_device_mig_state *next_fsm)
 {
-	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RESUMING + 1 };
+	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
 	/*
-	 * The coding in this table requires the driver to implement 6
+	 * The coding in this table requires the driver to implement
 	 * FSM arcs:
 	 *         RESUMING -> STOP
-	 *         RUNNING -> STOP
 	 *         STOP -> RESUMING
-	 *         STOP -> RUNNING
 	 *         STOP -> STOP_COPY
 	 *         STOP_COPY -> STOP
 	 *
-	 * The coding will step through multiple states for these combination
-	 * transitions:
-	 *         RESUMING -> STOP -> RUNNING
+	 * If P2P is supported then the driver must also implement these FSM
+	 * arcs:
+	 *         RUNNING -> RUNNING_P2P
+	 *         RUNNING_P2P -> RUNNING
+	 *         RUNNING_P2P -> STOP
+	 *         STOP -> RUNNING_P2P
+	 * Without P2P the driver must implement:
+	 *         RUNNING -> STOP
+	 *         STOP -> RUNNING
+	 *
+	 * If all optional features are supported then the coding will step
+	 * through multiple states for these combination transitions:
+	 *         RESUMING -> STOP -> RUNNING_P2P
+	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
 	 *         RESUMING -> STOP -> STOP_COPY
-	 *         RUNNING -> STOP -> RESUMING
-	 *         RUNNING -> STOP -> STOP_COPY
+	 *         RUNNING -> RUNNING_P2P -> STOP
+	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
+	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
+	 *         RUNNING_P2P -> STOP -> RESUMING
+	 *         RUNNING_P2P -> STOP -> STOP_COPY
+	 *         STOP -> RUNNING_P2P -> RUNNING
 	 *         STOP_COPY -> STOP -> RESUMING
-	 *         STOP_COPY -> STOP -> RUNNING
+	 *         STOP_COPY -> STOP -> RUNNING_P2P
+	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
 	 */
 	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
-			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_RUNNING] = {
-			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
-			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
-			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_STOP_COPY] = {
@@ -1617,6 +1633,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_RESUMING] = {
@@ -1624,6 +1641,15 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
+		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_ERROR] = {
@@ -1631,17 +1657,41 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 	};
 
-	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table)))
+	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
+		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_RUNNING_P2P] =
+			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P,
+		[VFIO_DEVICE_STATE_ERROR] = ~0U,
+	};
+
+	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
+		    (state_flags_table[cur_fsm] & device->migration_flags) !=
+			state_flags_table[cur_fsm]))
 		return -EINVAL;
 
-	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
+	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
+	   (state_flags_table[new_fsm] & device->migration_flags) !=
+			state_flags_table[new_fsm])
 		return -EINVAL;
 
+	/*
+	 * Arcs touching optional and unsupported states are skipped over. The
+	 * driver will instead see an arc from the original state to the next
+	 * logical state, as per the above comment.
+	 */
 	*next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
+	while ((state_flags_table[*next_fsm] & device->migration_flags) !=
+			state_flags_table[*next_fsm])
+		*next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];
+
 	return (*next_fsm != VFIO_DEVICE_STATE_ERROR) ? 0 : -EINVAL;
 }
 EXPORT_SYMBOL_GPL(vfio_mig_get_next_state);
@@ -1731,7 +1781,7 @@ static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
 					       size_t argsz)
 {
 	struct vfio_device_feature_migration mig = {
-		.flags = VFIO_MIGRATION_STOP_COPY,
+		.flags = device->migration_flags,
 	};
 	int ret;
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 3bbadcdbc9c8..3176cb5d4464 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -33,6 +33,7 @@ struct vfio_device {
 	struct vfio_group *group;
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
+	unsigned int migration_flags;
 
 	/* Members below here are private, not for driver use */
 	refcount_t refcount;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 02b836ea8f46..46b06946f0a8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1010,10 +1010,16 @@ struct vfio_device_feature {
  *
  * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
  * RESUMING are supported.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
+ * is supported in addition to the STOP_COPY states.
+ *
+ * Other combinations of flags have behavior to be defined in the future.
  */
 struct vfio_device_feature_migration {
 	__aligned_u64 flags;
 #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
+#define VFIO_MIGRATION_P2P		(1 << 1)
 };
 #define VFIO_DEVICE_FEATURE_MIGRATION 1
 
@@ -1064,10 +1070,13 @@ struct vfio_device_feature_mig_state {
  *  RESUMING - The device is stopped and is loading a new internal state
  *  ERROR - The device has failed and must be reset
  *
+ * And 1 optional state to support VFIO_MIGRATION_P2P:
+ *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ *
  * The FSM takes actions on the arcs between FSM states. The driver implements
  * the following behavior for the FSM arcs:
  *
- * RUNNING -> STOP
+ * RUNNING_P2P -> STOP
  * STOP_COPY -> STOP
  *   While in STOP the device must stop the operation of the device. The device
  *   must not generate interrupts, DMA, or any other change to external state.
@@ -1094,11 +1103,16 @@ struct vfio_device_feature_mig_state {
  *
  *   To abort a RESUMING session the device must be reset.
  *
- * STOP -> RUNNING
+ * RUNNING_P2P -> RUNNING
  *   While in RUNNING the device is fully operational, the device may generate
  *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
  *   and the device may advance its internal state.
  *
+ * RUNNING -> RUNNING_P2P
+ * STOP -> RUNNING_P2P
+ *   While in RUNNING_P2P the device is partially running in the P2P quiescent
+ *   state defined below.
+ *
  * STOP -> STOP_COPY
  *   This arc begin the process of saving the device state and will return a
  *   new data_fd.
@@ -1128,6 +1142,18 @@ struct vfio_device_feature_mig_state {
  *   To recover from ERROR VFIO_DEVICE_RESET must be used to return the
  *   device_state back to RUNNING.
  *
+ * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
+ * state for the device for the purposes of managing multiple devices within a
+ * user context where peer-to-peer DMA between devices may be active. The
+ * RUNNING_P2P states must prevent the device from initiating
+ * any new P2P DMA transactions. If the device can identify P2P transactions
+ * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
+ * driver must complete any such outstanding operations prior to completing the
+ * FSM arc into a P2P state. For the purpose of specification the states
+ * behave as though the device was fully running if not supported. Like while in
+ * STOP or STOP_COPY the user must not touch the device, otherwise the state
+ * can be exited.
+ *
  * The remaining possible transitions are interpreted as combinations of the
  * above FSM arcs. As there are multiple paths through the FSM arcs the path
  * should be selected based on the following rules:
@@ -1140,6 +1166,11 @@ struct vfio_device_feature_mig_state {
  * fails. When handling these types of errors users should anticipate future
  * revisions of this protocol using new states and those states becoming
  * visible in this case.
+ *
+ * The optional states cannot be used with SET_STATE if the device does not
+ * support them. The user can discover if these states are supported by using
+ * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
+ * avoid knowing about these optional states if the kernel driver supports them.
  */
 enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_ERROR = 0,
@@ -1147,6 +1178,7 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_RUNNING = 2,
 	VFIO_DEVICE_STATE_STOP_COPY = 3,
 	VFIO_DEVICE_STATE_RESUMING = 4,
+	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
 };
 
 /* -------- API for Type1 VFIO IOMMU -------- */
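To see how the skip loop in vfio_mig_get_next_state() behaves with and without VFIO_MIGRATION_P2P, the table walk can be modelled in user space. The sketch below is an approximation under stated assumptions rather than the kernel code: the `ST_*` and `MIG_*` names, `next_state()` and `count_arcs()` are hypothetical stand-ins, and the table entries hidden by the diff context (the `[STOP]` column of the STOP_COPY and RESUMING rows) are filled in as STOP, matching the STOP_COPY -> STOP and RESUMING -> STOP arcs the comment lists.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring the values in the patch; not the kernel headers. */
enum mig_state {
	ST_ERROR = 0, ST_STOP = 1, ST_RUNNING = 2,
	ST_STOP_COPY = 3, ST_RESUMING = 4, ST_RUNNING_P2P = 5, ST_NUM
};
#define MIG_STOP_COPY (1u << 0)
#define MIG_P2P       (1u << 1)

/* next_tbl[cur][target]: next single arc when every state is supported. */
static const unsigned char next_tbl[ST_NUM][ST_NUM] = {
	[ST_STOP] = {
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_RUNNING_P2P,
		[ST_STOP_COPY] = ST_STOP_COPY, [ST_RESUMING] = ST_RESUMING,
		[ST_RUNNING_P2P] = ST_RUNNING_P2P,
	},
	[ST_RUNNING] = {
		[ST_STOP] = ST_RUNNING_P2P, [ST_RUNNING] = ST_RUNNING,
		[ST_STOP_COPY] = ST_RUNNING_P2P, [ST_RESUMING] = ST_RUNNING_P2P,
		[ST_RUNNING_P2P] = ST_RUNNING_P2P,
	},
	[ST_STOP_COPY] = {
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_STOP,
		[ST_STOP_COPY] = ST_STOP_COPY, [ST_RESUMING] = ST_STOP,
		[ST_RUNNING_P2P] = ST_STOP,
	},
	[ST_RESUMING] = {
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_STOP,
		[ST_STOP_COPY] = ST_STOP, [ST_RESUMING] = ST_RESUMING,
		[ST_RUNNING_P2P] = ST_STOP,
	},
	[ST_RUNNING_P2P] = {
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_RUNNING,
		[ST_STOP_COPY] = ST_STOP, [ST_RESUMING] = ST_STOP,
		[ST_RUNNING_P2P] = ST_RUNNING_P2P,
	},
	/* The ST_ERROR row and column are left zero, which is ST_ERROR. */
};

/* Flags a device must advertise for each state to be usable. */
static const unsigned int need[ST_NUM] = {
	[ST_ERROR] = ~0u,
	[ST_STOP] = MIG_STOP_COPY, [ST_RUNNING] = MIG_STOP_COPY,
	[ST_STOP_COPY] = MIG_STOP_COPY, [ST_RESUMING] = MIG_STOP_COPY,
	[ST_RUNNING_P2P] = MIG_STOP_COPY | MIG_P2P,
};

static int supported(unsigned int flags, int st)
{
	return (need[st] & flags) == need[st];
}

/* Returns 0 and sets *next to the next arc's endpoint, or -1 on error. */
int next_state(unsigned int flags, int cur, int target, int *next)
{
	assert(next != NULL);
	if (cur < 0 || cur >= ST_NUM || !supported(flags, cur))
		return -1;
	if (target < 0 || target >= ST_NUM || !supported(flags, target))
		return -1;

	*next = next_tbl[cur][target];
	/* Skip over optional states the device does not advertise. */
	while (!supported(flags, *next))
		*next = next_tbl[*next][target];
	return (*next != ST_ERROR) ? 0 : -1;
}

/* Number of arcs a driver would see going from cur to target, or -1. */
int count_arcs(unsigned int flags, int cur, int target)
{
	int arcs = 0, next;

	while (cur != target) {
		if (next_state(flags, cur, target, &next))
			return -1;
		cur = next;
		arcs++;
	}
	return arcs;
}
```

In this model a device advertising only `MIG_STOP_COPY` sees RUNNING -> STOP as a single arc, because the unsupported RUNNING_P2P hop is skipped, while a device that also sets `MIG_P2P` is stepped through RUNNING_P2P first.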