[V1,2/2] vfio/type1: prevent locked_vm underflow

Message ID 1670946416-155307-3-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New, archived
Series: fixes for virtual address update

Commit Message

Steven Sistare Dec. 13, 2022, 3:46 p.m. UTC
When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
dma mapping, locked_vm underflows to a large unsigned value, and a
subsequent dma map request fails with ENOMEM in __account_locked_vm.

To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
add the mapping's pinned page count to the new mm->locked_vm, subject to
the rlimit.  Now that mediated devices are excluded when using
VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
mapping.

Underflow will not occur when all dma mappings are invalidated before exec.
An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
will fail with EINVAL because the mapping is in the vaddr_invalid state.
Underflow may still occur in a buggy application that fails to invalidate
all before exec.
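
For reference, the intended userspace sequence is roughly the following (a sketch
against the uapi flags; container_fd, new_vaddr, iova, and size are placeholder
names, and error handling is omitted):

	/* before exec: invalidate the host vaddr of every mapping */
	struct vfio_iommu_type1_dma_unmap unmap = {
		.argsz = sizeof(unmap),
		.flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL,
	};
	ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);

	execv(argv[0], argv);	/* the container fd remains open across exec */

	/* after exec, in the new program image: register the new vaddrs */
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_VADDR,
		.vaddr = (uintptr_t)new_vaddr,
		.iova  = iova,
		.size  = size,
	};
	ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);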

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Alex Williamson Dec. 13, 2022, 6:02 p.m. UTC | #1
On Tue, 13 Dec 2022 07:46:56 -0800
Steve Sistare <steven.sistare@oracle.com> wrote:

> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
> dma mapping, locked_vm underflows to a large unsigned value, and a
> subsequent dma map request fails with ENOMEM in __account_locked_vm.
> 
> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
> add the mapping's pinned page count to the new mm->locked_vm, subject to
> the rlimit.  Now that mediated devices are excluded when using
> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
> mapping.
> 
> Underflow will not occur when all dma mappings are invalidated before exec.
> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
> will fail with EINVAL because the mapping is in the vaddr_invalid state.

Where is this enforced?

> Underflow may still occur in a buggy application that fails to invalidate
> all before exec.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index f81e925..e5a02f8 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -100,6 +100,7 @@ struct vfio_dma {
>  	struct task_struct	*task;
>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>  	unsigned long		*bitmap;
> +	struct mm_struct	*mm;
>  };
>  
>  struct vfio_batch {
> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  	vfio_unmap_unpin(iommu, dma, true);
>  	vfio_unlink_dma(iommu, dma);
>  	put_task_struct(dma->task);
> +	mmdrop(dma->mm);
>  	vfio_dma_bitmap_free(dma);
>  	if (dma->vaddr_invalid) {
>  		iommu->vaddr_invalid_count--;
> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  			dma->vaddr = vaddr;
>  			dma->vaddr_invalid = false;
>  			iommu->vaddr_invalid_count--;
> +			if (current->mm != dma->mm) {
> +				mmdrop(dma->mm);
> +				dma->mm = current->mm;
> +				mmgrab(dma->mm);
> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
> +						     0);

What does it actually mean if this fails?  The pages are still pinned.
locked_vm doesn't get updated.  Underflow can still occur.  Thanks,

Alex

> +			}
>  			wake_up_all(&iommu->vaddr_wait);
>  		}
>  		goto out_unlock;
> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	get_task_struct(current->group_leader);
>  	dma->task = current->group_leader;
>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> +	dma->mm = dma->task->mm;
> +	mmgrab(dma->mm);
>  
>  	dma->pfn_list = RB_ROOT;
>
Steven Sistare Dec. 13, 2022, 6:17 p.m. UTC | #2
On 12/13/2022 1:02 PM, Alex Williamson wrote:
> On Tue, 13 Dec 2022 07:46:56 -0800
> Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>> dma mapping, locked_vm underflows to a large unsigned value, and a
>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>
>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>> the rlimit.  Now that mediated devices are excluded when using
>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>> mapping.
>>
>> Underflow will not occur when all dma mappings are invalidated before exec.
>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>> will fail with EINVAL because the mapping is in the vaddr_invalid state.
> 
> Where is this enforced?

In vfio_dma_do_unmap:
        if (invalidate_vaddr) {
                if (dma->vaddr_invalid) {
                        ...
                        ret = -EINVAL;

>> Underflow may still occur in a buggy application that fails to invalidate
>> all before exec.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index f81e925..e5a02f8 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>  	struct task_struct	*task;
>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>  	unsigned long		*bitmap;
>> +	struct mm_struct	*mm;
>>  };
>>  
>>  struct vfio_batch {
>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	vfio_unmap_unpin(iommu, dma, true);
>>  	vfio_unlink_dma(iommu, dma);
>>  	put_task_struct(dma->task);
>> +	mmdrop(dma->mm);
>>  	vfio_dma_bitmap_free(dma);
>>  	if (dma->vaddr_invalid) {
>>  		iommu->vaddr_invalid_count--;
>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  			dma->vaddr = vaddr;
>>  			dma->vaddr_invalid = false;
>>  			iommu->vaddr_invalid_count--;
>> +			if (current->mm != dma->mm) {
>> +				mmdrop(dma->mm);
>> +				dma->mm = current->mm;
>> +				mmgrab(dma->mm);
>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>> +						     0);
> 
> What does it actually mean if this fails?  The pages are still pinned.
> locked_vm doesn't get updated.  Underflow can still occur.  Thanks,

If this fails, the user has locked additional memory after exec and before making
this call -- more than was locked before exec -- and the rlimit is exceeded.
A misbehaving application, which will only hurt itself.

However, I should reorder these, and check ret before changing the other state.
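
Something along these lines, committing the state change only after the accounting
succeeds (untested sketch):

			if (current->mm != dma->mm) {
				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, 0);
				if (ret)
					goto out_unlock;
				mmdrop(dma->mm);
				dma->mm = current->mm;
				mmgrab(dma->mm);
			}
			dma->vaddr = vaddr;
			dma->vaddr_invalid = false;
			iommu->vaddr_invalid_count--;
			wake_up_all(&iommu->vaddr_wait);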

- Steve

>> +			}
>>  			wake_up_all(&iommu->vaddr_wait);
>>  		}
>>  		goto out_unlock;
>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	get_task_struct(current->group_leader);
>>  	dma->task = current->group_leader;
>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>> +	dma->mm = dma->task->mm;
>> +	mmgrab(dma->mm);
>>  
>>  	dma->pfn_list = RB_ROOT;
>>  
>
Steven Sistare Dec. 13, 2022, 6:21 p.m. UTC | #3
On 12/13/2022 1:17 PM, Steven Sistare wrote:
> On 12/13/2022 1:02 PM, Alex Williamson wrote:
>> On Tue, 13 Dec 2022 07:46:56 -0800
>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>
>>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>>> dma mapping, locked_vm underflows to a large unsigned value, and a
>>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>>
>>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>>> the rlimit.  Now that mediated devices are excluded when using
>>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>>> mapping.
>>>
>>> Underflow will not occur when all dma mappings are invalidated before exec.
>>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>>> will fail with EINVAL because the mapping is in the vaddr_invalid state.
>>
>> Where is this enforced?
> 
> In vfio_dma_do_unmap:
>         if (invalidate_vaddr) {
>                 if (dma->vaddr_invalid) {
>                         ...
>                         ret = -EINVAL;

My bad, this is a different case, and my comment in the commit message is
incorrect.  I should test mm != dma->mm during unmap as well, and suppress
the locked_vm deduction there.
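
Roughly, in the unmap/unpin accounting path (sketch only; "npages" stands for
whatever count that path already computes):

	/* only deduct locked_vm from the mm that was charged at map time */
	if (dma->task->mm == dma->mm)
		vfio_lock_acct(dma, -npages, true);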

- Steve

>>> Underflow may still occur in a buggy application that fails to invalidate
>>> all before exec.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index f81e925..e5a02f8 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>  	struct task_struct	*task;
>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>  	unsigned long		*bitmap;
>>> +	struct mm_struct	*mm;
>>>  };
>>>  
>>>  struct vfio_batch {
>>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>  	vfio_unlink_dma(iommu, dma);
>>>  	put_task_struct(dma->task);
>>> +	mmdrop(dma->mm);
>>>  	vfio_dma_bitmap_free(dma);
>>>  	if (dma->vaddr_invalid) {
>>>  		iommu->vaddr_invalid_count--;
>>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  			dma->vaddr = vaddr;
>>>  			dma->vaddr_invalid = false;
>>>  			iommu->vaddr_invalid_count--;
>>> +			if (current->mm != dma->mm) {
>>> +				mmdrop(dma->mm);
>>> +				dma->mm = current->mm;
>>> +				mmgrab(dma->mm);
>>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>>> +						     0);
>>
>> What does it actually mean if this fails?  The pages are still pinned.
>> locked_vm doesn't get updated.  Underflow can still occur.  Thanks,
> 
> If this fails, the user has locked additional memory after exec and before making
> this call -- more than was locked before exec -- and the rlimit is exceeded.
> A misbehaving application, which will only hurt itself.
> 
> However, I should reorder these, and check ret before changing the other state.
> 
> - Steve
> 
>>> +			}
>>>  			wake_up_all(&iommu->vaddr_wait);
>>>  		}
>>>  		goto out_unlock;
>>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  	get_task_struct(current->group_leader);
>>>  	dma->task = current->group_leader;
>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>> +	dma->mm = dma->task->mm;
>>> +	mmgrab(dma->mm);
>>>  
>>>  	dma->pfn_list = RB_ROOT;
>>>  
>>
Alex Williamson Dec. 13, 2022, 7:29 p.m. UTC | #4
On Tue, 13 Dec 2022 13:21:15 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 12/13/2022 1:17 PM, Steven Sistare wrote:
> > On 12/13/2022 1:02 PM, Alex Williamson wrote:  
> >> On Tue, 13 Dec 2022 07:46:56 -0800
> >> Steve Sistare <steven.sistare@oracle.com> wrote:
> >>  
> >>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
> >>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
> >>> dma mapping, locked_vm underflows to a large unsigned value, and a
> >>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
> >>>
> >>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
> >>> add the mapping's pinned page count to the new mm->locked_vm, subject to
> >>> the rlimit.  Now that mediated devices are excluded when using
> >>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
> >>> mapping.
> >>>
> >>> Underflow will not occur when all dma mappings are invalidated before exec.
> >>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
> >>> will fail with EINVAL because the mapping is in the vaddr_invalid state.  
> >>
> >> Where is this enforced?  
> > 
> > In vfio_dma_do_unmap:
> >         if (invalidate_vaddr) {
> >                 if (dma->vaddr_invalid) {
> >                         ...
> >                         ret = -EINVAL;  
> 
> My bad, this is a different case, and my comment in the commit message is
> incorrect.  I should test mm != dma->mm during unmap as well, and suppress
> the locked_vm deduction there.

I'm getting confused how this patch actually does anything.  We grab
the mm of the task doing mappings, and we swap that grab when updating
the vaddr, but vfio_lock_acct() uses the original dma->task mm for
accounting.  Therefore how can an underflow occur?  It seems we're
simply failing to adjust locked_vm for the new mm at all.

> >>> Underflow may still occur in a buggy application that fails to invalidate
> >>> all before exec.
> >>>
> >>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>> ---
> >>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
> >>>  1 file changed, 11 insertions(+)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>> index f81e925..e5a02f8 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -100,6 +100,7 @@ struct vfio_dma {
> >>>  	struct task_struct	*task;
> >>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
> >>>  	unsigned long		*bitmap;
> >>> +	struct mm_struct	*mm;
> >>>  };
> >>>  
> >>>  struct vfio_batch {
> >>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> >>>  	vfio_unmap_unpin(iommu, dma, true);
> >>>  	vfio_unlink_dma(iommu, dma);
> >>>  	put_task_struct(dma->task);
> >>> +	mmdrop(dma->mm);
> >>>  	vfio_dma_bitmap_free(dma);
> >>>  	if (dma->vaddr_invalid) {
> >>>  		iommu->vaddr_invalid_count--;
> >>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>>  			dma->vaddr = vaddr;
> >>>  			dma->vaddr_invalid = false;
> >>>  			iommu->vaddr_invalid_count--;
> >>> +			if (current->mm != dma->mm) {
> >>> +				mmdrop(dma->mm);
> >>> +				dma->mm = current->mm;
> >>> +				mmgrab(dma->mm);
> >>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
> >>> +						     0);  
> >>
> >> What does it actually mean if this fails?  The pages are still pinned.
> >> locked_vm doesn't get updated.  Underflow can still occur.  Thanks,
> > 
> > If this fails, the user has locked additional memory after exec and before making
> > this call -- more than was locked before exec -- and the rlimit is exceeded.
> > A misbehaving application, which will only hurt itself.
> > 
> > However, I should reorder these, and check ret before changing the other state.

The result would then be that the mapping remains with vaddr_invalid on
error?  Thanks,

Alex

> >>> +			}
> >>>  			wake_up_all(&iommu->vaddr_wait);
> >>>  		}
> >>>  		goto out_unlock;
> >>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>>  	get_task_struct(current->group_leader);
> >>>  	dma->task = current->group_leader;
> >>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> >>> +	dma->mm = dma->task->mm;
> >>> +	mmgrab(dma->mm);
> >>>  
> >>>  	dma->pfn_list = RB_ROOT;
> >>>    
> >>  
>
Steven Sistare Dec. 13, 2022, 7:40 p.m. UTC | #5
On 12/13/2022 2:29 PM, Alex Williamson wrote:
> On Tue, 13 Dec 2022 13:21:15 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 12/13/2022 1:17 PM, Steven Sistare wrote:
>>> On 12/13/2022 1:02 PM, Alex Williamson wrote:  
>>>> On Tue, 13 Dec 2022 07:46:56 -0800
>>>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>>>  
>>>>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>>>>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>>>>> dma mapping, locked_vm underflows to a large unsigned value, and a
>>>>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>>>>
>>>>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>>>>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>>>>> the rlimit.  Now that mediated devices are excluded when using
>>>>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>>>>> mapping.
>>>>>
>>>>> Underflow will not occur when all dma mappings are invalidated before exec.
>>>>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>>>>> will fail with EINVAL because the mapping is in the vaddr_invalid state.  
>>>>
>>>> Where is this enforced?  
>>>
>>> In vfio_dma_do_unmap:
>>>         if (invalidate_vaddr) {
>>>                 if (dma->vaddr_invalid) {
>>>                         ...
>>>                         ret = -EINVAL;  
>>
>> My bad, this is a different case, and my comment in the commit message is
>> incorrect.  I should test mm != dma->mm during unmap as well, and suppress
>> the locked_vm deduction there.
> 
> I'm getting confused how this patch actually does anything.  We grab
> the mm of the task doing mappings, and we swap that grab when updating
> the vaddr, but vfio_lock_acct() uses the original dma->task mm for
> accounting.  Therefore how can an underflow occur?  It seems we're
> simply failing to adjust locked_vm for the new mm at all.

The old code saves dma->task, but not dma->task->mm.  The task's mm changes 
across exec.
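
That is, the accounting helper resolves the mm from dma->task at call time, so after
exec any deduction lands on the brand-new mm whose locked_vm is still 0.  Paraphrased
(not the exact source):

	/* vfio_lock_acct(): the mm is always looked up via dma->task */
	mm = async ? get_task_mm(dma->task) : dma->task->mm;
	ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
				  dma->lock_cap);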

>>>>> Underflow may still occur in a buggy application that fails to invalidate
>>>>> all before exec.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>> index f81e925..e5a02f8 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>>>  	struct task_struct	*task;
>>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>>>  	unsigned long		*bitmap;
>>>>> +	struct mm_struct	*mm;
>>>>>  };
>>>>>  
>>>>>  struct vfio_batch {
>>>>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>>>  	vfio_unlink_dma(iommu, dma);
>>>>>  	put_task_struct(dma->task);
>>>>> +	mmdrop(dma->mm);
>>>>>  	vfio_dma_bitmap_free(dma);
>>>>>  	if (dma->vaddr_invalid) {
>>>>>  		iommu->vaddr_invalid_count--;
>>>>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>>  			dma->vaddr = vaddr;
>>>>>  			dma->vaddr_invalid = false;
>>>>>  			iommu->vaddr_invalid_count--;
>>>>> +			if (current->mm != dma->mm) {
>>>>> +				mmdrop(dma->mm);
>>>>> +				dma->mm = current->mm;
>>>>> +				mmgrab(dma->mm);
>>>>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>>>>> +						     0);  
>>>>
>>>> What does it actually mean if this fails?  The pages are still pinned.
>>>> locked_vm doesn't get updated.  Underflow can still occur.  Thanks,
>>>
>>> If this fails, the user has locked additional memory after exec and before making
>>> this call -- more than was locked before exec -- and the rlimit is exceeded.
>>> A misbehaving application, which will only hurt itself.
>>>
>>> However, I should reorder these, and check ret before changing the other state.
> 
> The result would then be that the mapping remains with vaddr_invalid on
> error?  Thanks,

Correct.  In theory the app could recover by releasing the extra locked memory that
it grabbed, or by increasing its rlimit, and then trying map_flag_vaddr again.
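
E.g. (userspace sketch; "map" is the same VFIO_DMA_MAP_FLAG_VADDR request that just
failed, and container_fd is a placeholder):

	if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0 && errno == ENOMEM) {
		/* munlock other memory or raise RLIMIT_MEMLOCK, then retry */
		ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
	}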

- Steve

>>>>> +			}
>>>>>  			wake_up_all(&iommu->vaddr_wait);
>>>>>  		}
>>>>>  		goto out_unlock;
>>>>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>>  	get_task_struct(current->group_leader);
>>>>>  	dma->task = current->group_leader;
>>>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>>>> +	dma->mm = dma->task->mm;
>>>>> +	mmgrab(dma->mm);
>>>>>  
>>>>>  	dma->pfn_list = RB_ROOT;
>>>>>    
>>>>  
>>
>

Patch

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f81e925..e5a02f8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -100,6 +100,7 @@  struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	struct mm_struct	*mm;
 };
 
 struct vfio_batch {
@@ -1174,6 +1175,7 @@  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
+	mmdrop(dma->mm);
 	vfio_dma_bitmap_free(dma);
 	if (dma->vaddr_invalid) {
 		iommu->vaddr_invalid_count--;
@@ -1622,6 +1624,13 @@  static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			dma->vaddr = vaddr;
 			dma->vaddr_invalid = false;
 			iommu->vaddr_invalid_count--;
+			if (current->mm != dma->mm) {
+				mmdrop(dma->mm);
+				dma->mm = current->mm;
+				mmgrab(dma->mm);
+				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
+						     0);
+			}
 			wake_up_all(&iommu->vaddr_wait);
 		}
 		goto out_unlock;
@@ -1679,6 +1688,8 @@  static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	get_task_struct(current->group_leader);
 	dma->task = current->group_leader;
 	dma->lock_cap = capable(CAP_IPC_LOCK);
+	dma->mm = dma->task->mm;
+	mmgrab(dma->mm);
 
 	dma->pfn_list = RB_ROOT;