[V7,3/7] vfio/type1: track locked_vm per dma

Message ID	1671568765-297322-4-git-send-email-steven.sistare@oracle.com (mailing list archive)
State	New, archived
Headers	Return-Path: <kvm-owner@kernel.org>
From: Steve Sistare <steven.sistare@oracle.com>
To: kvm@vger.kernel.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
        Cornelia Huck <cohuck@redhat.com>,
        Jason Gunthorpe <jgg@nvidia.com>,
        Kevin Tian <kevin.tian@intel.com>,
        Steve Sistare <steven.sistare@oracle.com>
Subject: [PATCH V7 3/7] vfio/type1: track locked_vm per dma
Date: Tue, 20 Dec 2022 12:39:21 -0800
Message-Id: <1671568765-297322-4-git-send-email-steven.sistare@oracle.com>
In-Reply-To: <1671568765-297322-1-git-send-email-steven.sistare@oracle.com>
References: <1671568765-297322-1-git-send-email-steven.sistare@oracle.com>
Precedence: bulk
Series	fixes for virtual address update
[V7,0/7] fixes for virtual address update
[V7,1/7] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
[V7,2/7] vfio/type1: prevent underflow of locked_vm via exec()
[V7,3/7] vfio/type1: track locked_vm per dma
[V7,4/7] vfio/type1: restore locked_vm
[V7,5/7] vfio/type1: revert "block on invalid vaddr"
[V7,6/7] vfio/type1: revert "implement notify callback"
[V7,7/7] vfio: revert "iommu driver notify callback"

Commit Message

Steven Sistare Dec. 20, 2022, 8:39 p.m. UTC

Track locked_vm per dma struct, and create a new subroutine, both for use
in a subsequent patch.  No functional change.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

Comments

Jason Gunthorpe Jan. 3, 2023, 3:21 p.m. UTC | #1

On Tue, Dec 20, 2022 at 12:39:21PM -0800, Steve Sistare wrote:
> Track locked_vm per dma struct, and create a new subroutine, both for use
> in a subsequent patch.  No functional change.
> 
> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
> Cc: stable@vger.kernel.org
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 71f980b..588d690 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -101,6 +101,7 @@ struct vfio_dma {
>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>  	unsigned long		*bitmap;
>  	struct mm_struct	*mm;
> +	long			locked_vm;

Why is it long? Can it be negative?

>  };
>  
>  struct vfio_batch {
> @@ -413,22 +414,21 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
>  	return ret;
>  }
>  
> -static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> +static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
> +			bool lock_cap, long npage, bool async)
>  {

Now async is even more confusing, the caller really should have a
valid handle on the mm before using it as an argument like this.

Jason

Steven Sistare Jan. 3, 2023, 6:13 p.m. UTC | #2

On 1/3/2023 10:21 AM, Jason Gunthorpe wrote:
> On Tue, Dec 20, 2022 at 12:39:21PM -0800, Steve Sistare wrote:
>> Track locked_vm per dma struct, and create a new subroutine, both for use
>> in a subsequent patch.  No functional change.
>>
>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 20 +++++++++++++++-----
>>  1 file changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 71f980b..588d690 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -101,6 +101,7 @@ struct vfio_dma {
>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>  	unsigned long		*bitmap;
>>  	struct mm_struct	*mm;
>> +	long			locked_vm;
> 
> Why is it long? Can it be negative?

The existing code uses both long and uint64_t for page counts, and I picked one.
I'll use size_t instead to match vfio_dma size.

>>  };
>>  
>>  struct vfio_batch {
>> @@ -413,22 +414,21 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
>>  	return ret;
>>  }
>>  
>> -static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>> +static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
>> +			bool lock_cap, long npage, bool async)
>>  {
> 
> Now async is even more confusing, the caller really should have a
> valid handle on the mm before using it as an argument like this.

The caller holds a grab reference on mm, and mm_lock_acct does mmget_not_zero to 
validate the mm.  IMO this is a close analog of the original vfio_lock_acct code
where the caller holds a get reference on task, and does get_task_mm to validate
the mm.

However, I can hoist the mmget_not_zero from mm_lock_acct to its callsites in
vfio_lock_acct and vfio_change_dma_owner.

- Steve
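
For readers unfamiliar with the distinction drawn above between a "grab" reference and a "get" reference, here is a minimal user-space sketch. It is not kernel code. It assumes the usual kernel semantics: mmgrab() pins the mm_struct itself through mm_count, while mmget_not_zero() takes an mm_users reference only if the address space is still live. The struct and every model_ name are hypothetical stand-ins, and the real counters are atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the two refcounts in struct mm_struct. */
struct model_mm {
	int mm_users;	/* users of the address space; 0 once the process has exited */
	int mm_count;	/* references that keep the struct itself allocated */
};

/* Models mmgrab(): keeps the struct allocated, says nothing about liveness. */
static void model_mmgrab(struct model_mm *mm)
{
	mm->mm_count++;
}

/* Models mmget_not_zero(): succeeds only while the address space is live. */
static bool model_mmget_not_zero(struct model_mm *mm)
{
	if (mm->mm_users == 0)
		return false;
	mm->mm_users++;
	return true;
}

/* Models mmput(): drops an address-space reference. */
static void model_mmput(struct model_mm *mm)
{
	mm->mm_users--;
}

/* Returns 0 if the mm was usable, -1 (standing in for -ESRCH) otherwise. */
static int model_use_mm(struct model_mm *mm)
{
	if (!model_mmget_not_zero(mm))
		return -1;
	/* ... the address space may be touched safely here ... */
	model_mmput(mm);
	return 0;
}

/* Usage: a grabbed mm stays allocated but stops being usable after exit. */
static void model_refcount_demo(void)
{
	struct model_mm mm = { .mm_users = 1, .mm_count = 1 };

	model_mmgrab(&mm);		/* long-lived holder, as with dma->mm */
	assert(model_use_mm(&mm) == 0);	/* process still alive */
	mm.mm_users = 0;		/* simulate process exit */
	assert(model_use_mm(&mm) == -1);
	assert(mm.mm_count == 2);	/* struct itself is still pinned */
}
```

In this model, hoisting the check to the call sites only changes which function calls model_mmget_not_zero(); the helper then receives an mm that is already known to be live.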

Steven Sistare Jan. 9, 2023, 9:24 p.m. UTC | #3

On 1/3/2023 1:13 PM, Steven Sistare wrote:
> On 1/3/2023 10:21 AM, Jason Gunthorpe wrote:
>> On Tue, Dec 20, 2022 at 12:39:21PM -0800, Steve Sistare wrote:
>>> Track locked_vm per dma struct, and create a new subroutine, both for use
>>> in a subsequent patch.  No functional change.
>>>
>>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 20 +++++++++++++++-----
>>>  1 file changed, 15 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 71f980b..588d690 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -101,6 +101,7 @@ struct vfio_dma {
>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>  	unsigned long		*bitmap;
>>>  	struct mm_struct	*mm;
>>> +	long			locked_vm;
>>
>> Why is it long? Can it be negative?
> 
> The existing code uses both long and uint64_t for page counts, and I picked one.
> I'll use size_t instead to match vfio_dma size.
> 
>>>  };
>>>  
>>>  struct vfio_batch {
>>> @@ -413,22 +414,21 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
>>>  	return ret;
>>>  }
>>>  
>>> -static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>> +static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
>>> +			bool lock_cap, long npage, bool async)
>>>  {
>>
>> Now async is even more confusing, the caller really should have a
>> valid handle on the mm before using it as an argument like this.
> 
> The caller holds a grab reference on mm, and mm_lock_acct does mmget_not_zero to 
> validate the mm.  IMO this is a close analog of the original vfio_lock_acct code
> where the caller holds a get reference on task, and does get_task_mm to validate
> the mm.
> 
> However, I can hoist the mmget_not_zero from mm_lock_acct to its callsites in
> vfio_lock_acct and vfio_change_dma_owner.

Yielding:

static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
                        bool lock_cap, long npage)
{
        int ret = mmap_write_lock_killable(mm);

        if (!ret) {
                ret = __account_locked_vm(mm, abs(npage), npage > 0, task,
                                          lock_cap);
                mmap_write_unlock(mm);
        }

        return ret;
}

static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
{
        struct mm_struct *mm = dma->mm;
        int ret;

        if (!npage)
                return 0;

        if (async && !mmget_not_zero(mm))
                return -ESRCH; /* process exited */

        ret = mm_lock_acct(dma->task, mm, dma->lock_cap, npage);
        if (!ret)
                dma->locked_vm += npage;

        if (async)
                mmput(mm);

        return ret;
}

static int vfio_change_dma_owner(struct vfio_dma *dma)
{
...
                ret = mm_lock_acct(task, mm, lock_cap, npage);
                if (ret)
                        return ret;

                if (mmget_not_zero(dma->mm)) {
                        mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage);
                        mmput(dma->mm);
                }
...

- Steve
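
Message #2 proposes making locked_vm a size_t, while the code above still adds a signed npage to it. The following user-space sketch illustrates how that mixed arithmetic behaves in C. It is an illustration only; model_dma and model_account are hypothetical names, and the explicit cast is an editorial choice rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the single field of struct vfio_dma discussed here. */
struct model_dma {
	size_t locked_vm;	/* pages currently accounted to this mapping */
};

/*
 * Apply a signed page delta to an unsigned counter.  Converting a negative
 * npage to size_t wraps modulo 2^N, and so does the addition, so the result
 * is the expected total provided the true total never drops below zero.
 */
static void model_account(struct model_dma *dma, long npage)
{
	dma->locked_vm += (size_t)npage;
}

/* Usage: pin eight pages, then unpin three. */
static void model_account_demo(void)
{
	struct model_dma dma = { .locked_vm = 0 };

	model_account(&dma, 8);
	model_account(&dma, -3);
	assert(dma.locked_vm == 5);
}
```

The unsigned type does not detect an underflow: subtracting more pages than were accounted leaves a very large value instead of a negative one.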

Jason Gunthorpe Jan. 10, 2023, 12:31 a.m. UTC | #4

On Mon, Jan 09, 2023 at 04:24:03PM -0500, Steven Sistare wrote:
> On 1/3/2023 1:13 PM, Steven Sistare wrote:
> > On 1/3/2023 10:21 AM, Jason Gunthorpe wrote:
> >> On Tue, Dec 20, 2022 at 12:39:21PM -0800, Steve Sistare wrote:
> >>> Track locked_vm per dma struct, and create a new subroutine, both for use
> >>> in a subsequent patch.  No functional change.
> >>>
> >>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
> >>> Cc: stable@vger.kernel.org
> >>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> >>> ---
> >>>  drivers/vfio/vfio_iommu_type1.c | 20 +++++++++++++++-----
> >>>  1 file changed, 15 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>> index 71f980b..588d690 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -101,6 +101,7 @@ struct vfio_dma {
> >>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
> >>>  	unsigned long		*bitmap;
> >>>  	struct mm_struct	*mm;
> >>> +	long			locked_vm;
> >>
> >> Why is it long? Can it be negative?
> > 
> > The existing code uses both long and uint64_t for page counts, and I picked one.
> > I'll use size_t instead to match vfio_dma size.
> > 
> >>>  };
> >>>  
> >>>  struct vfio_batch {
> >>> @@ -413,22 +414,21 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
> >>>  	return ret;
> >>>  }
> >>>  
> >>> -static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>> +static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
> >>> +			bool lock_cap, long npage, bool async)
> >>>  {
> >>
> >> Now async is even more confusing, the caller really should have a
> >> valid handle on the mm before using it as an argument like this.
> > 
> > The caller holds a grab reference on mm, and mm_lock_acct does mmget_not_zero to 
> > validate the mm.  IMO this is a close analog of the original vfio_lock_acct code
> > where the caller holds a get reference on task, and does get_task_mm to validate
> > the mm.
> > 
> > However, I can hoist the mmget_not_zero from mm_lock_acct to its callsites in
> > vfio_lock_acct and vfio_change_dma_owner.
> 
> Yielding:
> 
> static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
>                         bool lock_cap, long npage)
> {
>         int ret = mmap_write_lock_killable(mm);
> 
>         if (!ret) {

Please don't write in the 'single return' style, that is not kernel
code.

'success oriented flow' means you have early returns and goto error so
a straight line read of the function tells what success looks like

Jason
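
As an illustration of the style requested here, the following user-space sketch writes the same lock, account, unlock shape with an early return. It is not the code from any later revision of this patch. fake_mm, stub_lock, stub_unlock and stub_account are hypothetical stubs standing in for struct mm_struct, mmap_write_lock_killable(), mmap_write_unlock() and __account_locked_vm().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mm_struct. */
struct fake_mm {
	bool locked;
	bool lock_fails;	/* simulate an interrupted wait for the lock */
	long locked_vm;
};

static int stub_lock(struct fake_mm *mm)
{
	if (mm->lock_fails)
		return -EINTR;	/* simulated lock failure */
	mm->locked = true;
	return 0;
}

static void stub_unlock(struct fake_mm *mm)
{
	mm->locked = false;
}

static int stub_account(struct fake_mm *mm, long pages, bool inc)
{
	mm->locked_vm += inc ? pages : -pages;
	return 0;
}

/* Early return on failure; the success path reads straight down. */
static int model_mm_lock_acct(struct fake_mm *mm, long npage)
{
	int ret;

	ret = stub_lock(mm);
	if (ret)
		return ret;

	ret = stub_account(mm, labs(npage), npage > 0);
	stub_unlock(mm);
	return ret;
}

/* Usage: one successful call, then one that fails before taking the lock. */
static void model_style_demo(void)
{
	struct fake_mm mm = { .locked = false, .lock_fails = false, .locked_vm = 0 };

	assert(model_mm_lock_acct(&mm, 4) == 0);
	mm.lock_fails = true;
	assert(model_mm_lock_acct(&mm, 2) == -EINTR);
	assert(mm.locked_vm == 4);	/* the failed call accounted nothing */
}
```

With only one resource to release, a plain early return is enough; a goto label becomes useful once several resources must be unwound in order.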

Patch

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 71f980b..588d690 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -101,6 +101,7 @@  struct vfio_dma {
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
 	struct mm_struct	*mm;
+	long			locked_vm;
 };
 
 struct vfio_batch {
@@ -413,22 +414,21 @@  static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
 	return ret;
 }
 
-static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
+static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
+			bool lock_cap, long npage, bool async)
 {
-	struct mm_struct *mm;
 	int ret;
 
 	if (!npage)
 		return 0;
 
-	mm = dma->mm;
 	if (async && !mmget_not_zero(mm))
 		return -ESRCH; /* process exited */
 
 	ret = mmap_write_lock_killable(mm);
 	if (!ret) {
-		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
-					  dma->lock_cap);
+		ret = __account_locked_vm(mm, abs(npage), npage > 0, task,
+					  lock_cap);
 		mmap_write_unlock(mm);
 	}
 
@@ -438,6 +438,16 @@  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	return ret;
 }
 
+static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
+{
+	int ret;
+
+	ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage, async);
+	if (!ret)
+		dma->locked_vm += npage;
+	return ret;
+}
+
 /*
  * Some mappings aren't backed by a struct page, for example an mmap'd
  * MMIO range for our own or another device.  These use a different
