diff mbox series

[Kernel,v20,4/8] vfio iommu: Add ioctl definition for dirty pages tracking

Message ID 1589488667-9683-5-git-send-email-kwankhede@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Add UAPIs to support migration for VFIO devices | expand

Commit Message

Kirti Wankhede May 14, 2020, 8:37 p.m. UTC
IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
All pages pinned by vendor driver through this API should be considered as
dirty during migration. When container consists of IOMMU capable device and
all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop dirtied pages tracking and to get bitmap of all
dirtied pages for requested IO virtual address range.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
---
 include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

Comments

Cornelia Huck May 15, 2020, 10:59 a.m. UTC | #1
On Fri, 15 May 2020 02:07:43 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
> All pages pinned by vendor driver through this API should be considered as
> dirty during migration. When container consists of IOMMU capable device and
> all pages are pinned and mapped, then all pages are marked dirty.
> Added support to start/stop dirtied pages tracking and to get bitmap of all
> dirtied pages for requested IO virtual address range.
> 
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> Reviewed-by: Neo Jia <cjia@nvidia.com>
> ---
>  include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ad9bb5af3463..123de3bc2dce 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1033,6 +1033,12 @@ struct vfio_iommu_type1_dma_map {
>  
>  #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
>  
> +struct vfio_bitmap {
> +	__u64        pgsize;	/* page size for bitmap in bytes */
> +	__u64        size;	/* in bytes */
> +	__u64 __user *data;	/* one bit per page */
> +};
> +
>  /**
>   * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
>   *							struct vfio_dma_unmap)
> @@ -1059,6 +1065,55 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/**
> + * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
> + *                                     struct vfio_iommu_type1_dirty_bitmap)
> + * IOCTL is used for dirty pages tracking.
> + * Caller should set flag depending on which operation to perform, details as
> + * below:
> + *
> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
> + * migration is active and IOMMU module should track pages which are dirtied or
> + * potentially dirtied by device.

"Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_START instructs the
IOMMU driver to track pages that are dirtied or potentially dirtied by
the device; designed to be used when a migration is in progress."

?

> + * Dirty pages are tracked until tracking is stopped by user application by
> + * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.

"...by calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP." ?

> + *
> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
> + * IOMMU should stop tracking dirtied pages.

"Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP instructs the
IOMMU driver to stop tracking dirtied pages."

?

> + *
> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
> + * IOCTL returns dirty pages bitmap for IOMMU container during migration for
> + * given IOVA range. 

"Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_GET_BITMAP returns the
dirty pages bitmap for the IOMMU container for a given IOVA range." ?

Q: How does this interact with the two other operations? I imagine
getting an empty bitmap before _START and a bitmap-in-progress between
_START and _STOP. After _STOP, will subsequent calls always give the
same bitmap?

> User must provide data[] as the structure
> + * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
> + * pgsize.

"The user must specify the IOVA range and the pgsize through the
vfio_iommu_type1_dirty_bitmap_get structure in the data[] portion."

?

> This interface supports to get bitmap of smallest supported pgsize
> + * only and can be modified in future to get bitmap of specified pgsize.

That's a current restriction? How can the user find out whether it has
been lifted (or, more generally, find out which pgsize values are
supported)?

> + * User must allocate memory for bitmap, zero the bitmap memory  and set size
> + * of allocated memory in bitmap.size field.

"The user must provide a zeroed memory area for the bitmap memory and
specify its size in bitmap.size."

?

> One bit is used to represent one
> + * page consecutively starting from iova offset. User should provide page size
> + * in bitmap.pgsize field. 

s/User/The user/

Is that the 'pgsize' the comment above talks about?

> Bit set in bitmap indicates page at that offset from
> + * iova is dirty. 

"A bit set in the bitmap indicates that the page at that offset from
iova is dirty." ?

> Caller must set argsz including size of structure
> + * vfio_iommu_type1_dirty_bitmap_get.

s/Caller/The caller/

Does argz also include the size of the bitmap?

> + *
> + * Only one of the flags _START, STOP and _GET may be specified at a time.

s/STOP/_STOP/

(just to be consistent)

> + *
> + */
> +struct vfio_iommu_type1_dirty_bitmap {
> +	__u32        argsz;
> +	__u32        flags;
> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
> +	__u8         data[];
> +};
> +
> +struct vfio_iommu_type1_dirty_bitmap_get {
> +	__u64              iova;	/* IO virtual address */
> +	__u64              size;	/* Size of iova range */
> +	struct vfio_bitmap bitmap;
> +};

That's for type1 only, right?

> +
> +#define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
Kirti Wankhede May 15, 2020, 5:35 p.m. UTC | #2
On 5/15/2020 4:29 PM, Cornelia Huck wrote:
> On Fri, 15 May 2020 02:07:43 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
>> All pages pinned by vendor driver through this API should be considered as
>> dirty during migration. When container consists of IOMMU capable device and
>> all pages are pinned and mapped, then all pages are marked dirty.
>> Added support to start/stop dirtied pages tracking and to get bitmap of all
>> dirtied pages for requested IO virtual address range.
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>> Reviewed-by: Neo Jia <cjia@nvidia.com>
>> ---
>>   include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 55 insertions(+)
>>
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index ad9bb5af3463..123de3bc2dce 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -1033,6 +1033,12 @@ struct vfio_iommu_type1_dma_map {
>>   
>>   #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
>>   
>> +struct vfio_bitmap {
>> +	__u64        pgsize;	/* page size for bitmap in bytes */
>> +	__u64        size;	/* in bytes */
>> +	__u64 __user *data;	/* one bit per page */
>> +};
>> +
>>   /**
>>    * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
>>    *							struct vfio_dma_unmap)
>> @@ -1059,6 +1065,55 @@ struct vfio_iommu_type1_dma_unmap {
>>   #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>>   #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>>   
>> +/**
>> + * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
>> + *                                     struct vfio_iommu_type1_dirty_bitmap)
>> + * IOCTL is used for dirty pages tracking.
>> + * Caller should set flag depending on which operation to perform, details as
>> + * below:
>> + *
>> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
>> + * migration is active and IOMMU module should track pages which are dirtied or
>> + * potentially dirtied by device.
> 
> "Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_START instructs the
> IOMMU driver to track pages that are dirtied or potentially dirtied by
> the device; designed to be used when a migration is in progress."
> 
> ?
> 

Ok, updating.

>> + * Dirty pages are tracked until tracking is stopped by user application by
>> + * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
> 
> "...by calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP." ?
> 
>> + *
>> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
>> + * IOMMU should stop tracking dirtied pages.
> 
> "Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP instructs the
> IOMMU driver to stop tracking dirtied pages."
> 
> ?
> 

Ok.

>> + *
>> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
>> + * IOCTL returns dirty pages bitmap for IOMMU container during migration for
>> + * given IOVA range.
> 
> "Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_GET_BITMAP returns the
> dirty pages bitmap for the IOMMU container for a given IOVA range." ?
> 
> Q: How does this interact with the two other operations? I imagine
> getting an empty bitmap before _START 

No, if dirty page tracking is not started, get_bitmap IOCTL will fail 
with -EINVAL.

> and a bitmap-in-progress between
> _START and _STOP. > After _STOP, will subsequent calls always give the
> same bitmap?
>

No, return -EINVAL.


>> User must provide data[] as the structure
>> + * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
>> + * pgsize.
> 
> "The user must specify the IOVA range and the pgsize through the
> vfio_iommu_type1_dirty_bitmap_get structure in the data[] portion."
> 
> ?
> 
>> This interface supports to get bitmap of smallest supported pgsize
>> + * only and can be modified in future to get bitmap of specified pgsize.
> 
> That's a current restriction? How can the user find out whether it has
> been lifted (or, more generally, find out which pgsize values are
> supported)?

Migration capability is added to IOMMU info chain. That gives supported 
pgsize bitmap by IOMMU driver.

> 
>> + * User must allocate memory for bitmap, zero the bitmap memory  and set size
>> + * of allocated memory in bitmap.size field.
> 
> "The user must provide a zeroed memory area for the bitmap memory and
> specify its size in bitmap.size."
> 
> ?
> 
>> One bit is used to represent one
>> + * page consecutively starting from iova offset. User should provide page size
>> + * in bitmap.pgsize field.
> 
> s/User/The user/
> 
> Is that the 'pgsize' the comment above talks about?
> 

By specifying pgsize here user can ask for bitmap of specific pgsize.

>> Bit set in bitmap indicates page at that offset from
>> + * iova is dirty.
> 
> "A bit set in the bitmap indicates that the page at that offset from
> iova is dirty." ?
> 
>> Caller must set argsz including size of structure
>> + * vfio_iommu_type1_dirty_bitmap_get.
> 
> s/Caller/The caller/
> 
> Does argz also include the size of the bitmap?

No.

> 
>> + *
>> + * Only one of the flags _START, STOP and _GET may be specified at a time.
> 
> s/STOP/_STOP/
> 
> (just to be consistent)
> 
>> + *
>> + */
>> +struct vfio_iommu_type1_dirty_bitmap {
>> +	__u32        argsz;
>> +	__u32        flags;
>> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
>> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
>> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
>> +	__u8         data[];
>> +};
>> +
>> +struct vfio_iommu_type1_dirty_bitmap_get {
>> +	__u64              iova;	/* IO virtual address */
>> +	__u64              size;	/* Size of iova range */
>> +	struct vfio_bitmap bitmap;
>> +};
> 
> That's for type1 only, right?
>

Yes.

Thanks,
Kirti

>> +
>> +#define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
>> +
>>   /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>>   
>>   /*
>
Cornelia Huck May 19, 2020, 3:35 p.m. UTC | #3
On Fri, 15 May 2020 23:05:24 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 5/15/2020 4:29 PM, Cornelia Huck wrote:
> > On Fri, 15 May 2020 02:07:43 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
> >> All pages pinned by vendor driver through this API should be considered as
> >> dirty during migration. When container consists of IOMMU capable device and
> >> all pages are pinned and mapped, then all pages are marked dirty.
> >> Added support to start/stop dirtied pages tracking and to get bitmap of all
> >> dirtied pages for requested IO virtual address range.
> >>
> >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> >> Reviewed-by: Neo Jia <cjia@nvidia.com>
> >> ---
> >>   include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 55 insertions(+)

(...)

> >> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
> >> + * IOCTL returns dirty pages bitmap for IOMMU container during migration for
> >> + * given IOVA range.  
> > 
> > "Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_GET_BITMAP returns the
> > dirty pages bitmap for the IOMMU container for a given IOVA range." ?
> > 
> > Q: How does this interact with the two other operations? I imagine
> > getting an empty bitmap before _START   
> 
> No, if dirty page tracking is not started, get_bitmap IOCTL will fail 
> with -EINVAL.
> 
> > and a bitmap-in-progress between
> > _START and _STOP. > After _STOP, will subsequent calls always give the
> > same bitmap?
> >  
> 
> No, return -EINVAL.

Maybe add

"If the IOCTL has not yet been called with
VFIO_IOMMU_DIRTY_PAGES_FLAG_START, or if it has been called with
VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP, calling it with
VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP will return -EINVAL." ?

> 
> 
> >> User must provide data[] as the structure
> >> + * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
> >> + * pgsize.  
> > 
> > "The user must specify the IOVA range and the pgsize through the
> > vfio_iommu_type1_dirty_bitmap_get structure in the data[] portion."
> > 
> > ?
> >   
> >> This interface supports to get bitmap of smallest supported pgsize
> >> + * only and can be modified in future to get bitmap of specified pgsize.  
> > 
> > That's a current restriction? How can the user find out whether it has
> > been lifted (or, more generally, find out which pgsize values are
> > supported)?  
> 
> Migration capability is added to IOMMU info chain. That gives supported 
> pgsize bitmap by IOMMU driver.

Add that info?

"The supported pgsize values for this interface are reported via the
migration capability in the IOMMU info chain."

> 
> >   
> >> + * User must allocate memory for bitmap, zero the bitmap memory  and set size
> >> + * of allocated memory in bitmap.size field.  
> > 
> > "The user must provide a zeroed memory area for the bitmap memory and
> > specify its size in bitmap.size."
> > 
> > ?
> >   
> >> One bit is used to represent one
> >> + * page consecutively starting from iova offset. User should provide page size
> >> + * in bitmap.pgsize field.  
> > 
> > s/User/The user/
> > 
> > Is that the 'pgsize' the comment above talks about?
> >   
> 
> By specifying pgsize here user can ask for bitmap of specific pgsize.

"The user should provide the page size for the bitmap in the
bitmap.pgsize field." ?

> 
> >> Bit set in bitmap indicates page at that offset from
> >> + * iova is dirty.  
> > 
> > "A bit set in the bitmap indicates that the page at that offset from
> > iova is dirty." ?
> >   
> >> Caller must set argsz including size of structure
> >> + * vfio_iommu_type1_dirty_bitmap_get.  
> > 
> > s/Caller/The caller/
> > 
> > Does argz also include the size of the bitmap?  
> 
> No.

"The caller must set argsz to a value including the size of stuct
vfio_io_type1_dirty_bitmap_get, but excluding the size of the actual
bitmap." ?
Alex Williamson May 19, 2020, 3:53 p.m. UTC | #4
On Tue, 19 May 2020 17:35:07 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 15 May 2020 23:05:24 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
> > On 5/15/2020 4:29 PM, Cornelia Huck wrote:  
> > > On Fri, 15 May 2020 02:07:43 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >     
> > >> IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
> > >> All pages pinned by vendor driver through this API should be considered as
> > >> dirty during migration. When container consists of IOMMU capable device and
> > >> all pages are pinned and mapped, then all pages are marked dirty.
> > >> Added support to start/stop dirtied pages tracking and to get bitmap of all
> > >> dirtied pages for requested IO virtual address range.
> > >>
> > >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> > >> Reviewed-by: Neo Jia <cjia@nvidia.com>
> > >> ---
> > >>   include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++
> > >>   1 file changed, 55 insertions(+)  
> 
> (...)
> 
> > >> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
> > >> + * IOCTL returns dirty pages bitmap for IOMMU container during migration for
> > >> + * given IOVA range.    
> > > 
> > > "Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_GET_BITMAP returns the
> > > dirty pages bitmap for the IOMMU container for a given IOVA range." ?
> > > 
> > > Q: How does this interact with the two other operations? I imagine
> > > getting an empty bitmap before _START     
> > 
> > No, if dirty page tracking is not started, get_bitmap IOCTL will fail 
> > with -EINVAL.
> >   
> > > and a bitmap-in-progress between
> > > _START and _STOP. > After _STOP, will subsequent calls always give the
> > > same bitmap?
> > >    
> > 
> > No, return -EINVAL.  
> 
> Maybe add
> 
> "If the IOCTL has not yet been called with
> VFIO_IOMMU_DIRTY_PAGES_FLAG_START, or if it has been called with
> VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP, calling it with
> VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP will return -EINVAL." ?

Let's not specify ourselves into a corner, I think we can simply say
that the dirty bitmap is only available while dirty logging is enabled.
We certainly don't need to specify specific errno values that'll trip
us up later.

> > >> User must provide data[] as the structure
> > >> + * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
> > >> + * pgsize.    
> > > 
> > > "The user must specify the IOVA range and the pgsize through the
> > > vfio_iommu_type1_dirty_bitmap_get structure in the data[] portion."
> > > 
> > > ?
> > >     
> > >> This interface supports to get bitmap of smallest supported pgsize
> > >> + * only and can be modified in future to get bitmap of specified pgsize.    
> > > 
> > > That's a current restriction? How can the user find out whether it has
> > > been lifted (or, more generally, find out which pgsize values are
> > > supported)?    
> > 
> > Migration capability is added to IOMMU info chain. That gives supported 
> > pgsize bitmap by IOMMU driver.  
> 
> Add that info?
> 
> "The supported pgsize values for this interface are reported via the
> migration capability in the IOMMU info chain."
> 
> >   
> > >     
> > >> + * User must allocate memory for bitmap, zero the bitmap memory  and set size
> > >> + * of allocated memory in bitmap.size field.    
> > > 
> > > "The user must provide a zeroed memory area for the bitmap memory and
> > > specify its size in bitmap.size."
> > > 
> > > ?
> > >     
> > >> One bit is used to represent one
> > >> + * page consecutively starting from iova offset. User should provide page size
> > >> + * in bitmap.pgsize field.    
> > > 
> > > s/User/The user/
> > > 
> > > Is that the 'pgsize' the comment above talks about?
> > >     
> > 
> > By specifying pgsize here user can ask for bitmap of specific pgsize.  
> 
> "The user should provide the page size for the bitmap in the
> bitmap.pgsize field." ?
> 
> >   
> > >> Bit set in bitmap indicates page at that offset from
> > >> + * iova is dirty.    
> > > 
> > > "A bit set in the bitmap indicates that the page at that offset from
> > > iova is dirty." ?
> > >     
> > >> Caller must set argsz including size of structure
> > >> + * vfio_iommu_type1_dirty_bitmap_get.    
> > > 
> > > s/Caller/The caller/
> > > 
> > > Does argz also include the size of the bitmap?    
> > 
> > No.  
> 
> "The caller must set argsz to a value including the size of stuct
> vfio_io_type1_dirty_bitmap_get, but excluding the size of the actual
> bitmap." ?

Yes, it wouldn't make sense for argsz to include the size of the bitmap
itself, that's accessed independently via a user provided pointer and
we have a separate size field for that.  Thanks,

Alex
Cornelia Huck May 19, 2020, 4 p.m. UTC | #5
On Tue, 19 May 2020 09:53:56 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 19 May 2020 17:35:07 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Fri, 15 May 2020 23:05:24 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> > > On 5/15/2020 4:29 PM, Cornelia Huck wrote:    
> > > > On Fri, 15 May 2020 02:07:43 +0530
> > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > >       
> > > >> IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
> > > >> All pages pinned by vendor driver through this API should be considered as
> > > >> dirty during migration. When container consists of IOMMU capable device and
> > > >> all pages are pinned and mapped, then all pages are marked dirty.
> > > >> Added support to start/stop dirtied pages tracking and to get bitmap of all
> > > >> dirtied pages for requested IO virtual address range.
> > > >>
> > > >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> > > >> Reviewed-by: Neo Jia <cjia@nvidia.com>
> > > >> ---
> > > >>   include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++
> > > >>   1 file changed, 55 insertions(+)    
> > 
> > (...)
> >   
> > > >> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
> > > >> + * IOCTL returns dirty pages bitmap for IOMMU container during migration for
> > > >> + * given IOVA range.      
> > > > 
> > > > "Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_GET_BITMAP returns the
> > > > dirty pages bitmap for the IOMMU container for a given IOVA range." ?
> > > > 
> > > > Q: How does this interact with the two other operations? I imagine
> > > > getting an empty bitmap before _START       
> > > 
> > > No, if dirty page tracking is not started, get_bitmap IOCTL will fail 
> > > with -EINVAL.
> > >     
> > > > and a bitmap-in-progress between
> > > > _START and _STOP. > After _STOP, will subsequent calls always give the
> > > > same bitmap?
> > > >      
> > > 
> > > No, return -EINVAL.    
> > 
> > Maybe add
> > 
> > "If the IOCTL has not yet been called with
> > VFIO_IOMMU_DIRTY_PAGES_FLAG_START, or if it has been called with
> > VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP, calling it with
> > VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP will return -EINVAL." ?  
> 
> Let's not specify ourselves into a corner, I think we can simply say
> that the dirty bitmap is only available while dirty logging is enabled.
> We certainly don't need to specify specific errno values that'll trip
> us up later.

"If dirty logging is not enabled, an error will be returned." ?

(...)

> > > >> Caller must set argsz including size of structure
> > > >> + * vfio_iommu_type1_dirty_bitmap_get.      
> > > > 
> > > > s/Caller/The caller/
> > > > 
> > > > Does argz also include the size of the bitmap?      
> > > 
> > > No.    
> > 
> > "The caller must set argsz to a value including the size of stuct
> > vfio_io_type1_dirty_bitmap_get, but excluding the size of the actual
> > bitmap." ?  
> 
> Yes, it wouldn't make sense for argsz to include the size of the bitmap
> itself, that's accessed independently via a user provided pointer and
> we have a separate size field for that.  Thanks,
> 
> Alex

Yes, I just wanted to make it as obvious as possible to make it easier
for folks trying to interact with this interface.
diff mbox series

Patch

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ad9bb5af3463..123de3bc2dce 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1033,6 +1033,12 @@  struct vfio_iommu_type1_dma_map {
 
 #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
 
+struct vfio_bitmap {
+	__u64        pgsize;	/* page size for bitmap in bytes */
+	__u64        size;	/* in bytes */
+	__u64 __user *data;	/* one bit per page */
+};
+
 /**
  * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
  *							struct vfio_dma_unmap)
@@ -1059,6 +1065,55 @@  struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ *                                     struct vfio_iommu_type1_dirty_bitmap)
+ * IOCTL is used for dirty pages tracking.
+ * Caller should set flag depending on which operation to perform, details as
+ * below:
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
+ * migration is active and IOMMU module should track pages which are dirtied or
+ * potentially dirtied by device.
+ * Dirty pages are tracked until tracking is stopped by user application by
+ * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
+ * IOMMU should stop tracking dirtied pages.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
+ * IOCTL returns dirty pages bitmap for IOMMU container during migration for
+ * given IOVA range. User must provide data[] as the structure
+ * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
+ * pgsize. This interface supports to get bitmap of smallest supported pgsize
+ * only and can be modified in future to get bitmap of specified pgsize.
+ * User must allocate memory for bitmap, zero the bitmap memory  and set size
+ * of allocated memory in bitmap.size field. One bit is used to represent one
+ * page consecutively starting from iova offset. User should provide page size
+ * in bitmap.pgsize field. Bit set in bitmap indicates page at that offset from
+ * iova is dirty. Caller must set argsz including size of structure
+ * vfio_iommu_type1_dirty_bitmap_get.
+ *
+ * Only one of the flags _START, STOP and _GET may be specified at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+	__u32        argsz;
+	__u32        flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
+	__u8         data[];
+};
+
+struct vfio_iommu_type1_dirty_bitmap_get {
+	__u64              iova;	/* IO virtual address */
+	__u64              size;	/* Size of iova range */
+	struct vfio_bitmap bitmap;
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*