diff mbox series

[V9,mlx5-next,10/15] vfio: Extend the device migration protocol with RUNNING_P2P

Message ID 20220224142024.147653-11-yishaih@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Add mlx5 live migration driver and v2 migration protocol | expand

Commit Message

Yishai Hadas Feb. 24, 2022, 2:20 p.m. UTC
From: Jason Gunthorpe <jgg@nvidia.com>

The RUNNING_P2P state is designed to support multiple devices in the same
VM that are doing P2P transactions between themselves. When in RUNNING_P2P
the device must be able to accept incoming P2P transactions but should not
generate outgoing P2P transactions.

As an optional extension to the mandatory states it is defined as
inbetween STOP and RUNNING:
   STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP

For drivers that are unable to support RUNNING_P2P the core code
silently merges RUNNING_P2P and RUNNING together. Unless driver support
is present, the new state cannot be used in SET_STATE.
Drivers that support this will be required to implement 4 FSM arcs
beyond the basic FSM. 2 of the basic FSM arcs become combination
transitions.

Compared to the v1 clarification, NDMA is redefined into FSM states and is
described in terms of the desired P2P quiescent behavior, noting that
halting all DMA is an acceptable implementation.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/vfio.c       | 84 +++++++++++++++++++++++++++++++--------
 include/linux/vfio.h      |  1 +
 include/uapi/linux/vfio.h | 36 ++++++++++++++++-
 3 files changed, 102 insertions(+), 19 deletions(-)

Comments

Cornelia Huck Feb. 24, 2022, 3:21 p.m. UTC | #1
On Thu, Feb 24 2022, Yishai Hadas <yishaih@nvidia.com> wrote:

> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 22ed358c04c5..26a66f68371d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1011,10 +1011,16 @@ struct vfio_device_feature {
>   *
>   * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
>   * RESUMING are supported.
> + *
> + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
> + * is supported in addition to the STOP_COPY states.
> + *
> + * Other combinations of flags have behavior to be defined in the future.
>   */
>  struct vfio_device_feature_migration {
>  	__aligned_u64 flags;
>  #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
> +#define VFIO_MIGRATION_P2P		(1 << 1)
>  };

Coming back to my argument (for the previous series) that this should
rather be "at least one of the flags below must be set". If we operate
under the general assumption that each flag indicates that a certain
functionality (including some states) is supported, and that flags may
depend on other flags, we might have a future flag that defines a
different behaviour, but does not depend on STOP_COPY, but rather
conflicts with it. We should not create the impression that STOP_COPY
will neccessarily be mandatory for all time.

So, if we use my suggestion from the last round, what about making the
new addition

"VFIO_MIGRATION_P2P means that RUNNING_P2P is supported in addition to
the STOP_COPY states. It depends on VFIO_MIGRATION_STOP_COPY."

Maybe we could also use the additional clarification

"at least one of the flags below must be set, and flags may depend on or
conflict with each other."

That implies that VFIO_MIGRATION_STOP_COPY is mandatory with the current
set of defined flags. I would not really object to adding "This flag is
currently mandatory", but I do not like singling it out in the general
description of how the flags work.

Sorry if that sounds nitpicky, but I think we really need to make it
clear that we have some nice possible flexibility with how the flags
work.
Alex Williamson Feb. 24, 2022, 3:30 p.m. UTC | #2
On Thu, 24 Feb 2022 16:21:11 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, Feb 24 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
> 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 22ed358c04c5..26a66f68371d 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -1011,10 +1011,16 @@ struct vfio_device_feature {
> >   *
> >   * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
> >   * RESUMING are supported.
> > + *
> > + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
> > + * is supported in addition to the STOP_COPY states.
> > + *
> > + * Other combinations of flags have behavior to be defined in the future.
> >   */
> >  struct vfio_device_feature_migration {
> >  	__aligned_u64 flags;
> >  #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
> > +#define VFIO_MIGRATION_P2P		(1 << 1)
> >  };  
> 
> Coming back to my argument (for the previous series) that this should
> rather be "at least one of the flags below must be set". If we operate
> under the general assumption that each flag indicates that a certain
> functionality (including some states) is supported, and that flags may
> depend on other flags, we might have a future flag that defines a
> different behaviour, but does not depend on STOP_COPY, but rather
> conflicts with it. We should not create the impression that STOP_COPY
> will neccessarily be mandatory for all time.

This sounds more like an enum than a bitfield.  What happens when
VFIO_MIGRATION_FOO is defined that can be combined with either
STOP_COPY or P2P, do we then add two new enum values, one with, one
without?  Using a bitfield seems cleaner here.  Reporting P2P alone is
invalid because it doesn't provide a sufficient FSM, but we might also
later define STOP_COPYv2 and possibly it might be compatible with the
existing P2P definition.  Thanks,

Alex
Jason Gunthorpe Feb. 24, 2022, 4:13 p.m. UTC | #3
On Thu, Feb 24, 2022 at 08:30:42AM -0700, Alex Williamson wrote:
> On Thu, 24 Feb 2022 16:21:11 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Thu, Feb 24 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
> > 
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 22ed358c04c5..26a66f68371d 100644
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -1011,10 +1011,16 @@ struct vfio_device_feature {
> > >   *
> > >   * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
> > >   * RESUMING are supported.
> > > + *
> > > + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
> > > + * is supported in addition to the STOP_COPY states.
> > > + *
> > > + * Other combinations of flags have behavior to be defined in the future.
> > >   */
> > >  struct vfio_device_feature_migration {
> > >  	__aligned_u64 flags;
> > >  #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
> > > +#define VFIO_MIGRATION_P2P		(1 << 1)
> > >  };  
> > 
> > Coming back to my argument (for the previous series) that this should
> > rather be "at least one of the flags below must be set". If we operate
> > under the general assumption that each flag indicates that a certain
> > functionality (including some states) is supported, and that flags may
> > depend on other flags, we might have a future flag that defines a
> > different behaviour, but does not depend on STOP_COPY, but rather
> > conflicts with it. We should not create the impression that STOP_COPY
> > will neccessarily be mandatory for all time.
> 
> This sounds more like an enum than a bitfield. 

It is kind of working in both ways.

The comment enumerates all the valid tests of the flags. This is not
really a mandatory/optional scheme.

If userspace wants to check support for what is described by
VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P then it must test both
bits exactly as the comment says.

In this way the universe of valid tests is limited, and it acts sort
of like an enumeration.

Using a bit test, not an equality, allows better options for future
expansion.

The key takeaway is that userspace cannot test bit combinations that
are not defined in the comment and expect anything - which is exactly
what the comment says:

> * Other combinations of flags have behavior to be defined in the future.


> > conflicts with it. We should not create the impression that STOP_COPY
> > will neccessarily be mandatory for all time.

We really *should* create that impression because a userspace that
does not test STOP_COPY in the cases required above is *broken* and
must be strongly discouraged from existing.

The purpose of this comment is to inform the userspace implementator,
not to muse about possible future expansion options for kernel
developers. We all agree this expansion path exists and is valid, we
need to keep that option open by helping userspace implement
correctly.

Jason
Alex Williamson Feb. 24, 2022, 4:35 p.m. UTC | #4
On Thu, 24 Feb 2022 12:13:30 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Feb 24, 2022 at 08:30:42AM -0700, Alex Williamson wrote:
> > On Thu, 24 Feb 2022 16:21:11 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> > > On Thu, Feb 24 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
> > >   
> > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > index 22ed358c04c5..26a66f68371d 100644
> > > > +++ b/include/uapi/linux/vfio.h
> > > > @@ -1011,10 +1011,16 @@ struct vfio_device_feature {
> > > >   *
> > > >   * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
> > > >   * RESUMING are supported.
> > > > + *
> > > > + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
> > > > + * is supported in addition to the STOP_COPY states.
> > > > + *
> > > > + * Other combinations of flags have behavior to be defined in the future.
> > > >   */
> > > >  struct vfio_device_feature_migration {
> > > >  	__aligned_u64 flags;
> > > >  #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
> > > > +#define VFIO_MIGRATION_P2P		(1 << 1)
> > > >  };    
> > > 
> > > Coming back to my argument (for the previous series) that this should
> > > rather be "at least one of the flags below must be set". If we operate
> > > under the general assumption that each flag indicates that a certain
> > > functionality (including some states) is supported, and that flags may
> > > depend on other flags, we might have a future flag that defines a
> > > different behaviour, but does not depend on STOP_COPY, but rather
> > > conflicts with it. We should not create the impression that STOP_COPY
> > > will neccessarily be mandatory for all time.  
> > 
> > This sounds more like an enum than a bitfield.   
> 
> It is kind of working in both ways.
> 
> The comment enumerates all the valid tests of the flags. This is not
> really a mandatory/optional scheme.
> 
> If userspace wants to check support for what is described by
> VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P then it must test both
> bits exactly as the comment says.
> 
> In this way the universe of valid tests is limited, and it acts sort
> of like an enumeration.
> 
> Using a bit test, not an equality, allows better options for future
> expansion.

Yes.
 
> The key takeaway is that userspace cannot test bit combinations that
> are not defined in the comment and expect anything - which is exactly
> what the comment says:
> 
> > * Other combinations of flags have behavior to be defined in the future.  
> 
> 
> > > conflicts with it. We should not create the impression that STOP_COPY
> > > will neccessarily be mandatory for all time.  
> 
> We really *should* create that impression because a userspace that
> does not test STOP_COPY in the cases required above is *broken* and
> must be strongly discouraged from existing.
> 
> The purpose of this comment is to inform the userspace implementator,
> not to muse about possible future expansion options for kernel
> developers. We all agree this expansion path exists and is valid, we
> need to keep that option open by helping userspace implement
> correctly.

Chatting with Connie offline, I think the clarification that might help
is something alone the lines that the combination of bits must support
migration, which currently requires the STOP_COPY and RESUMING states.
The VFIO_MIGRATION_P2P flag alone does not provide these states.  The
only flag in the current specification to provide these states is
VFIO_MIGRATION_STOP_COPY.  I don't think we want to preclude that some
future flag might provide variants of STOP_COPY and RESUMING, so it's
not so much that VFIO_MIGRATION_STOP_COPY is mandatory, but it is
currently the only flag which provides the base degree of migration
support.

How or if that translates to an actual documentation update, I'm not
sure.  As it stands, we're not speculating about future support, we're
only stating these two combinations are valid.  Future combinations may
or may not include VFIO_MIGRATION_STOP_COPY.  As the existing proposed
comment indicates, other combinations are TBD.  Connie?  Thanks,

Alex
Cornelia Huck Feb. 24, 2022, 4:53 p.m. UTC | #5
On Thu, Feb 24 2022, Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu, 24 Feb 2022 12:13:30 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
>> On Thu, Feb 24, 2022 at 08:30:42AM -0700, Alex Williamson wrote:
>> > On Thu, 24 Feb 2022 16:21:11 +0100
>> > Cornelia Huck <cohuck@redhat.com> wrote:

>> > > conflicts with it. We should not create the impression that STOP_COPY
>> > > will neccessarily be mandatory for all time.  
>> 
>> We really *should* create that impression because a userspace that
>> does not test STOP_COPY in the cases required above is *broken* and
>> must be strongly discouraged from existing.

Well yes, you need STOP_COPY with the current implementation. I'm not
arguing against that.

>> 
>> The purpose of this comment is to inform the userspace implementator,
>> not to muse about possible future expansion options for kernel
>> developers. We all agree this expansion path exists and is valid, we
>> need to keep that option open by helping userspace implement
>> correctly.
>
> Chatting with Connie offline, I think the clarification that might help
> is something alone the lines that the combination of bits must support
> migration, which currently requires the STOP_COPY and RESUMING states.
> The VFIO_MIGRATION_P2P flag alone does not provide these states.  The
> only flag in the current specification to provide these states is
> VFIO_MIGRATION_STOP_COPY.  I don't think we want to preclude that some
> future flag might provide variants of STOP_COPY and RESUMING, so it's
> not so much that VFIO_MIGRATION_STOP_COPY is mandatory, but it is
> currently the only flag which provides the base degree of migration
> support.

Indeed.

>
> How or if that translates to an actual documentation update, I'm not
> sure.  As it stands, we're not speculating about future support, we're
> only stating these two combinations are valid.  Future combinations may
> or may not include VFIO_MIGRATION_STOP_COPY.  As the existing proposed
> comment indicates, other combinations are TBD.  Connie?  Thanks,
>
> Alex

Hm... "a flag indicating support for a migration state machine such as
VFIO_MIGRATION_STOP_COPY is mandatory"?
Alex Williamson Feb. 24, 2022, 8:46 p.m. UTC | #6
On Thu, 24 Feb 2022 17:53:59 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, Feb 24 2022, Alex Williamson <alex.williamson@redhat.com> wrote:
> > Chatting with Connie offline, I think the clarification that might help
> > is something alone the lines that the combination of bits must support
> > migration, which currently requires the STOP_COPY and RESUMING states.
> > The VFIO_MIGRATION_P2P flag alone does not provide these states.  The
> > only flag in the current specification to provide these states is
> > VFIO_MIGRATION_STOP_COPY.  I don't think we want to preclude that some
> > future flag might provide variants of STOP_COPY and RESUMING, so it's
> > not so much that VFIO_MIGRATION_STOP_COPY is mandatory, but it is
> > currently the only flag which provides the base degree of migration
> > support.  
> 
> Indeed.
> 
> >
> > How or if that translates to an actual documentation update, I'm not
> > sure.  As it stands, we're not speculating about future support, we're
> > only stating these two combinations are valid.  Future combinations may
> > or may not include VFIO_MIGRATION_STOP_COPY.  As the existing proposed
> > comment indicates, other combinations are TBD.  Connie?  Thanks,
> >
> > Alex  
> 
> Hm... "a flag indicating support for a migration state machine such as
> VFIO_MIGRATION_STOP_COPY is mandatory"?

TBH, I'm not sure this makes a substantive improvement.  We don't know
what those new flag bits will be used for, including which bit or bits
will combine to indicate a valid state machine.  Userspace written to
this spec needs to support STOP_COPY and optionally P2P as we're
stating.  Nothing really compels us to speculate general rules for
unknown future bit combinations.  Thanks,

Alex
Cornelia Huck March 2, 2022, 11:51 a.m. UTC | #7
On Thu, Feb 24 2022, Yishai Hadas <yishaih@nvidia.com> wrote:

> From: Jason Gunthorpe <jgg@nvidia.com>
>
> The RUNNING_P2P state is designed to support multiple devices in the same
> VM that are doing P2P transactions between themselves. When in RUNNING_P2P
> the device must be able to accept incoming P2P transactions but should not
> generate outgoing P2P transactions.
>
> As an optional extension to the mandatory states it is defined as
> inbetween STOP and RUNNING:
>    STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP
>
> For drivers that are unable to support RUNNING_P2P the core code
> silently merges RUNNING_P2P and RUNNING together. Unless driver support
> is present, the new state cannot be used in SET_STATE.
> Drivers that support this will be required to implement 4 FSM arcs
> beyond the basic FSM. 2 of the basic FSM arcs become combination
> transitions.
>
> Compared to the v1 clarification, NDMA is redefined into FSM states and is
> described in terms of the desired P2P quiescent behavior, noting that
> halting all DMA is an acceptable implementation.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/vfio/vfio.c       | 84 +++++++++++++++++++++++++++++++--------
>  include/linux/vfio.h      |  1 +
>  include/uapi/linux/vfio.h | 36 ++++++++++++++++-
>  3 files changed, 102 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index b37ab27b511f..bdb5205bb358 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1577,39 +1577,55 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			    enum vfio_device_mig_state new_fsm,
>  			    enum vfio_device_mig_state *next_fsm)
>  {
> -	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RESUMING + 1 };
> +	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
>  	/*
> -	 * The coding in this table requires the driver to implement 6
> +	 * The coding in this table requires the driver to implement
>  	 * FSM arcs:

Nit: this now reads a bit strange; maybe "requires the driver to
implement the following FSM arcs"?

>  	 *         RESUMING -> STOP
> -	 *         RUNNING -> STOP
>  	 *         STOP -> RESUMING
> -	 *         STOP -> RUNNING
>  	 *         STOP -> STOP_COPY
>  	 *         STOP_COPY -> STOP
>  	 *
> -	 * The coding will step through multiple states for these combination
> -	 * transitions:
> -	 *         RESUMING -> STOP -> RUNNING
> +	 * If P2P is supported then the driver must also implement these FSM
> +	 * arcs:
> +	 *         RUNNING -> RUNNING_P2P
> +	 *         RUNNING_P2P -> RUNNING
> +	 *         RUNNING_P2P -> STOP
> +	 *         STOP -> RUNNING_P2P
> +	 * Without P2P the driver must implement:
> +	 *         RUNNING -> STOP
> +	 *         STOP -> RUNNING
> +	 *
> +	 * If all optional features are supported then the coding will step
> +	 * through multiple states for these combination transitions:

Maybe "The coding will step through multiple states for some combination
transitions; if all optional features are supported, this means the
following ones:"?

> +	 *         RESUMING -> STOP -> RUNNING_P2P
> +	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
>  	 *         RESUMING -> STOP -> STOP_COPY
> -	 *         RUNNING -> STOP -> RESUMING
> -	 *         RUNNING -> STOP -> STOP_COPY
> +	 *         RUNNING -> RUNNING_P2P -> STOP
> +	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
> +	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
> +	 *         RUNNING_P2P -> STOP -> RESUMING
> +	 *         RUNNING_P2P -> STOP -> STOP_COPY
> +	 *         STOP -> RUNNING_P2P -> RUNNING
>  	 *         STOP_COPY -> STOP -> RESUMING
> -	 *         STOP_COPY -> STOP -> RUNNING
> +	 *         STOP_COPY -> STOP -> RUNNING_P2P
> +	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
>  	 */
diff mbox series

Patch

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index b37ab27b511f..bdb5205bb358 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1577,39 +1577,55 @@  int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state new_fsm,
 			    enum vfio_device_mig_state *next_fsm)
 {
-	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RESUMING + 1 };
+	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
 	/*
-	 * The coding in this table requires the driver to implement 6
+	 * The coding in this table requires the driver to implement
 	 * FSM arcs:
 	 *         RESUMING -> STOP
-	 *         RUNNING -> STOP
 	 *         STOP -> RESUMING
-	 *         STOP -> RUNNING
 	 *         STOP -> STOP_COPY
 	 *         STOP_COPY -> STOP
 	 *
-	 * The coding will step through multiple states for these combination
-	 * transitions:
-	 *         RESUMING -> STOP -> RUNNING
+	 * If P2P is supported then the driver must also implement these FSM
+	 * arcs:
+	 *         RUNNING -> RUNNING_P2P
+	 *         RUNNING_P2P -> RUNNING
+	 *         RUNNING_P2P -> STOP
+	 *         STOP -> RUNNING_P2P
+	 * Without P2P the driver must implement:
+	 *         RUNNING -> STOP
+	 *         STOP -> RUNNING
+	 *
+	 * If all optional features are supported then the coding will step
+	 * through multiple states for these combination transitions:
+	 *         RESUMING -> STOP -> RUNNING_P2P
+	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
 	 *         RESUMING -> STOP -> STOP_COPY
-	 *         RUNNING -> STOP -> RESUMING
-	 *         RUNNING -> STOP -> STOP_COPY
+	 *         RUNNING -> RUNNING_P2P -> STOP
+	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
+	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
+	 *         RUNNING_P2P -> STOP -> RESUMING
+	 *         RUNNING_P2P -> STOP -> STOP_COPY
+	 *         STOP -> RUNNING_P2P -> RUNNING
 	 *         STOP_COPY -> STOP -> RESUMING
-	 *         STOP_COPY -> STOP -> RUNNING
+	 *         STOP_COPY -> STOP -> RUNNING_P2P
+	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
 	 */
 	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
-			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_RUNNING] = {
-			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
-			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
-			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_STOP_COPY] = {
@@ -1617,6 +1633,7 @@  int vfio_mig_get_next_state(struct vfio_device *device,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_RESUMING] = {
@@ -1624,6 +1641,15 @@  int vfio_mig_get_next_state(struct vfio_device *device,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
+		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 		[VFIO_DEVICE_STATE_ERROR] = {
@@ -1631,17 +1657,41 @@  int vfio_mig_get_next_state(struct vfio_device *device,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
 	};
 
-	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table)))
+	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
+		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_RUNNING_P2P] =
+			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P,
+		[VFIO_DEVICE_STATE_ERROR] = ~0U,
+	};
+
+	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
+		    (state_flags_table[cur_fsm] & device->migration_flags) !=
+			state_flags_table[cur_fsm]))
 		return -EINVAL;
 
-	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
+	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
+	   (state_flags_table[new_fsm] & device->migration_flags) !=
+			state_flags_table[new_fsm])
 		return -EINVAL;
 
+	/*
+	 * Arcs touching optional and unsupported states are skipped over. The
+	 * driver will instead see an arc from the original state to the next
+	 * logical state, as per the above comment.
+	 */
 	*next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
+	while ((state_flags_table[*next_fsm] & device->migration_flags) !=
+			state_flags_table[*next_fsm])
+		*next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];
+
 	return (*next_fsm != VFIO_DEVICE_STATE_ERROR) ? 0 : -EINVAL;
 }
 EXPORT_SYMBOL_GPL(vfio_mig_get_next_state);
@@ -1731,7 +1781,7 @@  static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
 					       size_t argsz)
 {
 	struct vfio_device_feature_migration mig = {
-		.flags = VFIO_MIGRATION_STOP_COPY,
+		.flags = device->migration_flags,
 	};
 	int ret;
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index acc99aeea29b..f1b5d231b7ed 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -33,6 +33,7 @@  struct vfio_device {
 	struct vfio_group *group;
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
+	unsigned int migration_flags;
 
 	/* Members below here are private, not for driver use */
 	refcount_t refcount;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 22ed358c04c5..26a66f68371d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1011,10 +1011,16 @@  struct vfio_device_feature {
  *
  * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
  * RESUMING are supported.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
+ * is supported in addition to the STOP_COPY states.
+ *
+ * Other combinations of flags have behavior to be defined in the future.
  */
 struct vfio_device_feature_migration {
 	__aligned_u64 flags;
 #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
+#define VFIO_MIGRATION_P2P		(1 << 1)
 };
 #define VFIO_DEVICE_FEATURE_MIGRATION 1
 
@@ -1065,10 +1071,13 @@  struct vfio_device_feature_mig_state {
  *  RESUMING - The device is stopped and is loading a new internal state
  *  ERROR - The device has failed and must be reset
  *
+ * And 1 optional state to support VFIO_MIGRATION_P2P:
+ *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ *
  * The FSM takes actions on the arcs between FSM states. The driver implements
  * the following behavior for the FSM arcs:
  *
- * RUNNING -> STOP
+ * RUNNING_P2P -> STOP
  * STOP_COPY -> STOP
  *   While in STOP the device must stop the operation of the device. The device
  *   must not generate interrupts, DMA, or any other change to external state.
@@ -1095,11 +1104,16 @@  struct vfio_device_feature_mig_state {
  *
  *   To abort a RESUMING session the device must be reset.
  *
- * STOP -> RUNNING
+ * RUNNING_P2P -> RUNNING
  *   While in RUNNING the device is fully operational, the device may generate
  *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
  *   and the device may advance its internal state.
  *
+ * RUNNING -> RUNNING_P2P
+ * STOP -> RUNNING_P2P
+ *   While in RUNNING_P2P the device is partially running in the P2P quiescent
+ *   state defined below.
+ *
  * STOP -> STOP_COPY
  *   This arc begin the process of saving the device state and will return a
  *   new data_fd.
@@ -1129,6 +1143,18 @@  struct vfio_device_feature_mig_state {
  *   To recover from ERROR VFIO_DEVICE_RESET must be used to return the
  *   device_state back to RUNNING.
  *
+ * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
+ * state for the device for the purposes of managing multiple devices within a
+ * user context where peer-to-peer DMA between devices may be active. The
+ * RUNNING_P2P states must prevent the device from initiating
+ * any new P2P DMA transactions. If the device can identify P2P transactions
+ * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
+ * driver must complete any such outstanding operations prior to completing the
+ * FSM arc into a P2P state. For the purpose of specification the states
+ * behave as though the device was fully running if not supported. Like while in
+ * STOP or STOP_COPY the user must not touch the device, otherwise the state
+ * can be exited.
+ *
  * The remaining possible transitions are interpreted as combinations of the
  * above FSM arcs. As there are multiple paths through the FSM arcs the path
  * should be selected based on the following rules:
@@ -1141,6 +1167,11 @@  struct vfio_device_feature_mig_state {
  * fails. When handling these types of errors users should anticipate future
  * revisions of this protocol using new states and those states becoming
  * visible in this case.
+ *
+ * The optional states cannot be used with SET_STATE if the device does not
+ * support them. The user can discover if these states are supported by using
+ * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
+ * avoid knowing about these optional states if the kernel driver supports them.
  */
 enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_ERROR = 0,
@@ -1148,6 +1179,7 @@  enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_RUNNING = 2,
 	VFIO_DEVICE_STATE_STOP_COPY = 3,
 	VFIO_DEVICE_STATE_RESUMING = 4,
+	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
 };
 
 /* -------- API for Type1 VFIO IOMMU -------- */