[V1,5/5] vfio: block during VA suspend

Message ID	1609861013-129801-6-git-send-email-steven.sistare@oracle.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> From: Steve Sistare <steven.sistare@oracle.com> To: kvm@vger.kernel.org Cc: Alex Williamson <alex.williamson@redhat.com>, Cornelia Huck <cohuck@redhat.com>, Kirti Wankhede <kwankhede@nvidia.com>, Steve Sistare <steven.sistare@oracle.com> Subject: [PATCH V1 5/5] vfio: block during VA suspend Date: Tue, 5 Jan 2021 07:36:53 -0800 Message-Id: <1609861013-129801-6-git-send-email-steven.sistare@oracle.com> In-Reply-To: <1609861013-129801-1-git-send-email-steven.sistare@oracle.com> References: <1609861013-129801-1-git-send-email-steven.sistare@oracle.com> Precedence: bulk
Series	vfio virtual address update \| expand [V1,0/5] vfio virtual address update [V1,1/5] vfio: maintain dma_list order [V1,2/5] vfio: option to unmap all [V1,3/5] vfio: detect closed container [V1,4/5] vfio: VA suspend interface [V1,5/5] vfio: block during VA suspend

Steven Sistare Jan. 5, 2021, 3:36 p.m. UTC

Block translation of host virtual address while an iova range is suspended.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

kernel test robot Jan. 5, 2021, 6:08 p.m. UTC | #1

Hi Steve,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vfio/next]
[also build test WARNING on v5.11-rc2 next-20210104]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Steve-Sistare/vfio-virtual-address-update/20210106-000752
base:   https://github.com/awilliam/linux-vfio.git next
config: s390-randconfig-r015-20210105 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f680b8156755f4465e94574d5421747561d23749
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Steve-Sistare/vfio-virtual-address-update/20210106-000752
        git checkout f680b8156755f4465e94574d5421747561d23749
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/vfio/vfio_iommu_type1.c:503:6: warning: no previous prototype for 'vfio_vaddr_valid' [-Wmissing-prototypes]
     503 | bool vfio_vaddr_valid(struct vfio_iommu *iommu, struct vfio_dma *dma)
         |      ^~~~~~~~~~~~~~~~


vim +/vfio_vaddr_valid +503 drivers/vfio/vfio_iommu_type1.c

   501	
   502	
 > 503	bool vfio_vaddr_valid(struct vfio_iommu *iommu, struct vfio_dma *dma)
   504	{
   505		while (dma->suspended) {
   506			mutex_unlock(&iommu->lock);
   507			msleep_interruptible(10);
   508			mutex_lock(&iommu->lock);
   509			if (kthread_should_stop() || !vfio_iommu_contained(iommu) ||
   510			    fatal_signal_pending(current)) {
   511				return false;
   512			}
   513		}
   514		return true;
   515	}
   516	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

kernel test robot Jan. 5, 2021, 8:03 p.m. UTC | #2

Hi Steve,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vfio/next]
[also build test WARNING on v5.11-rc2 next-20210104]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Steve-Sistare/vfio-virtual-address-update/20210106-000752
base:   https://github.com/awilliam/linux-vfio.git next
config: x86_64-randconfig-s022-20210105 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-208-g46a52ca4-dirty
        # https://github.com/0day-ci/linux/commit/f680b8156755f4465e94574d5421747561d23749
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Steve-Sistare/vfio-virtual-address-update/20210106-000752
        git checkout f680b8156755f4465e94574d5421747561d23749
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
   drivers/vfio/vfio_iommu_type1.c:163:32: sparse: sparse: Using plain integer as NULL pointer
   drivers/vfio/vfio_iommu_type1.c:179:23: sparse: sparse: Using plain integer as NULL pointer
>> drivers/vfio/vfio_iommu_type1.c:503:6: sparse: sparse: symbol 'vfio_vaddr_valid' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Steven Sistare Jan. 7, 2021, 3:17 p.m. UTC | #3

I will fix the warnings in V2, and make W=1 C=1 for all my submissions!

- Steve

On 1/5/2021 3:03 PM, kernel test robot wrote:
> Hi Steve,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on vfio/next]
> [also build test WARNING on v5.11-rc2 next-20210104]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Steve-Sistare/vfio-virtual-address-update/20210106-000752
> base:   https://github.com/awilliam/linux-vfio.git next
> config: x86_64-randconfig-s022-20210105 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.3-208-g46a52ca4-dirty
>         # https://github.com/0day-ci/linux/commit/f680b8156755f4465e94574d5421747561d23749
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Steve-Sistare/vfio-virtual-address-update/20210106-000752
>         git checkout f680b8156755f4465e94574d5421747561d23749
>         # save the attached .config to linux build tree
>         make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> 
> "sparse warnings: (new ones prefixed by >>)"
>    drivers/vfio/vfio_iommu_type1.c:163:32: sparse: sparse: Using plain integer as NULL pointer
>    drivers/vfio/vfio_iommu_type1.c:179:23: sparse: sparse: Using plain integer as NULL pointer
>>> drivers/vfio/vfio_iommu_type1.c:503:6: sparse: sparse: symbol 'vfio_vaddr_valid' was not declared. Should it be static?
> 
> Please review and possibly fold the followup patch.
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>

Alex Williamson Jan. 8, 2021, 9:32 p.m. UTC | #4

On Tue,  5 Jan 2021 07:36:53 -0800
Steve Sistare <steven.sistare@oracle.com> wrote:

> Block translation of host virtual address while an iova range is suspended.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2c164a6..8035b9a 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -31,6 +31,8 @@
>  #include <linux/rbtree.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
> +#include <linux/delay.h>
> +#include <linux/kthread.h>
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
>  #include <linux/vfio.h>
> @@ -484,6 +486,34 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  	return ret;
>  }
>  
> +static bool vfio_iommu_contained(struct vfio_iommu *iommu)
> +{
> +	struct vfio_domain *domain = iommu->external_domain;
> +	struct vfio_group *group;
> +
> +	if (!domain)
> +		domain = list_first_entry(&iommu->domain_list,
> +					  struct vfio_domain, next);
> +
> +	group = list_first_entry(&domain->group_list, struct vfio_group, next);
> +	return vfio_iommu_group_contained(group->iommu_group);
> +}

This seems really backwards for a vfio iommu backend to ask vfio-core
whether its internal object is active.

> +
> +
> +bool vfio_vaddr_valid(struct vfio_iommu *iommu, struct vfio_dma *dma)
> +{
> +	while (dma->suspended) {
> +		mutex_unlock(&iommu->lock);
> +		msleep_interruptible(10);
> +		mutex_lock(&iommu->lock);
> +		if (kthread_should_stop() || !vfio_iommu_contained(iommu) ||
> +		    fatal_signal_pending(current)) {
> +			return false;
> +		}
> +	}
> +	return true;
> +}
> +
>  /*
>   * Attempt to pin pages.  We really don't want to track all the pfns and
>   * the iommu can only map chunks of consecutive pfns anyway, so get the
> @@ -690,6 +720,11 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>  			continue;
>  		}
>  
> +		if (!vfio_vaddr_valid(iommu, dma)) {
> +			ret = -EFAULT;
> +			goto pin_unwind;
> +		}

This could have dropped iommu->lock, we have no basis to believe our
vfio_dma object is still valid.  Also this is called with an elevated
reference to the container, so we theoretically should not need to test
whether the iommu is still contained.

> +
>  		remote_vaddr = dma->vaddr + (iova - dma->iova);
>  		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
>  					     do_accounting);
> @@ -1514,12 +1549,16 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>  					i += PAGE_SIZE;
>  				}
>  			} else {
> -				unsigned long pfn;
> -				unsigned long vaddr = dma->vaddr +
> -						     (iova - dma->iova);
> +				unsigned long pfn, vaddr;
>  				size_t n = dma->iova + dma->size - iova;
>  				long npage;
>  
> +				if (!vfio_vaddr_valid(iommu, dma)) {
> +					ret = -EFAULT;
> +					goto unwind;
> +				}

This one is even scarier, do we have any basis to assume either object
is still valid after iommu->lock is released?  We can only get here if
we're attaching a group to the iommu with suspended vfio_dmas.  Do we
need to allow that?  It seems we could -EAGAIN and let the user figure
out that they can't attach a group while mappings have invalid vaddrs.

> +				vaddr = dma->vaddr + (iova - dma->iova);
> +
>  				npage = vfio_pin_pages_remote(dma, vaddr,
>  							      n >> PAGE_SHIFT,
>  							      &pfn, limit);
> @@ -2965,6 +3004,9 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
>  	if (count > dma->size - offset)
>  		count = dma->size - offset;
>  
> +	if (!vfio_vaddr_valid(iommu, dma))

Same as the pinning path, this should hold a container user reference
but we cannot assume our vfio_dma object is valid if we've released the
lock.  Thanks,

Alex

> +		goto out;
> +
>  	vaddr = dma->vaddr + offset;
>  
>  	if (write) {

Steven Sistare Jan. 11, 2021, 9:16 p.m. UTC | #5

On 1/8/2021 4:32 PM, Alex Williamson wrote:
> On Tue,  5 Jan 2021 07:36:53 -0800
> Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>> Block translation of host virtual address while an iova range is suspended.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 45 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2c164a6..8035b9a 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -31,6 +31,8 @@
>>  #include <linux/rbtree.h>
>>  #include <linux/sched/signal.h>
>>  #include <linux/sched/mm.h>
>> +#include <linux/delay.h>
>> +#include <linux/kthread.h>
>>  #include <linux/slab.h>
>>  #include <linux/uaccess.h>
>>  #include <linux/vfio.h>
>> @@ -484,6 +486,34 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>>  	return ret;
>>  }
>>  
>> +static bool vfio_iommu_contained(struct vfio_iommu *iommu)
>> +{
>> +	struct vfio_domain *domain = iommu->external_domain;
>> +	struct vfio_group *group;
>> +
>> +	if (!domain)
>> +		domain = list_first_entry(&iommu->domain_list,
>> +					  struct vfio_domain, next);
>> +
>> +	group = list_first_entry(&domain->group_list, struct vfio_group, next);
>> +	return vfio_iommu_group_contained(group->iommu_group);
>> +}
> 
> This seems really backwards for a vfio iommu backend to ask vfio-core
> whether its internal object is active.

Yes.  I considered the architecturally purer approach of defining a new back end
interface and calling it from the core when the change to uncontained occurs,
but it seemed like overkill.
 
>> +
>> +
>> +bool vfio_vaddr_valid(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> +{
>> +	while (dma->suspended) {
>> +		mutex_unlock(&iommu->lock);
>> +		msleep_interruptible(10);
>> +		mutex_lock(&iommu->lock);
>> +		if (kthread_should_stop() || !vfio_iommu_contained(iommu) ||
>> +		    fatal_signal_pending(current)) {
>> +			return false;
>> +		}
>> +	}
>> +	return true;
>> +}
>> +
>>  /*
>>   * Attempt to pin pages.  We really don't want to track all the pfns and
>>   * the iommu can only map chunks of consecutive pfns anyway, so get the
>> @@ -690,6 +720,11 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>>  			continue;
>>  		}
>>  
>> +		if (!vfio_vaddr_valid(iommu, dma)) {
>> +			ret = -EFAULT;
>> +			goto pin_unwind;
>> +		}
> 
> This could have dropped iommu->lock, we have no basis to believe our
> vfio_dma object is still valid.  

Ouch, I am embarrassed. To fix, I will pass iova to vfio_vaddr_valid() and call
vfio_find_dma after re-acquiring the lock.

> Also this is called with an elevated
> reference to the container, so we theoretically should not need to test
> whether the iommu is still contained.

We still have a reference to the container, but userland may have closed the container fd
and hence will never call the ioctl to replace the vaddr, so we must bail.
 
>> +
>>  		remote_vaddr = dma->vaddr + (iova - dma->iova);
>>  		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
>>  					     do_accounting);
>> @@ -1514,12 +1549,16 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>  					i += PAGE_SIZE;
>>  				}
>>  			} else {
>> -				unsigned long pfn;
>> -				unsigned long vaddr = dma->vaddr +
>> -						     (iova - dma->iova);
>> +				unsigned long pfn, vaddr;
>>  				size_t n = dma->iova + dma->size - iova;
>>  				long npage;
>>  
>> +				if (!vfio_vaddr_valid(iommu, dma)) {
>> +					ret = -EFAULT;
>> +					goto unwind;
>> +				}
> 
> This one is even scarier, do we have any basis to assume either object
> is still valid after iommu->lock is released?  

Another ouch. The iommu object still exists, but its dma_list may have changed.  
We cannot even unwind and punt with EAGAIN.

I can fix this by adding a preamble loop that holds iommu->lock and verifies that
all dma_list entries are not suspended.  If the loop hits a suspended entry, it
drops the lock and sleeps (like vfio_vaddr_valid), and retries from the beginning
of the list.  On reaching the end of the list, the lock remains held and prevents
entries from being suspended.

Instead of retrying, I could return EAGAIN, but that would require unwinding of 
earlier groups in __vfio_container_attach_groups.

> We can only get here if
> we're attaching a group to the iommu with suspended vfio_dmas.  Do we
> need to allow that?  It seems we could -EAGAIN and let the user figure
> out that they can't attach a group while mappings have invalid vaddrs.

I would like to allow it to minimize userland code changes.

>> +				vaddr = dma->vaddr + (iova - dma->iova);
>> +
>>  				npage = vfio_pin_pages_remote(dma, vaddr,
>>  							      n >> PAGE_SHIFT,
>>  							      &pfn, limit);
>> @@ -2965,6 +3004,9 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
>>  	if (count > dma->size - offset)
>>  		count = dma->size - offset;
>>  
>> +	if (!vfio_vaddr_valid(iommu, dma))
> 
> Same as the pinning path, this should hold a container user reference
> but we cannot assume our vfio_dma object is valid if we've released the
> lock.  Thanks,

Yup.  Same fix needed.

- Steve

>> +		goto out;
>> +
>>  	vaddr = dma->vaddr + offset;
>>  
>>  	if (write) {
>

[V1,5/5] vfio: block during VA suspend

Commit Message

Comments

Patch