diff mbox series

[V5,2/7] vfio/type1: prevent locked_vm underflow

Message ID 1671141424-81853-3-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New, archived
Headers show
Series fixes for virtual address update | expand

Commit Message

Steven Sistare Dec. 15, 2022, 9:56 p.m. UTC
When a vfio container is preserved across exec, the task does not change,
but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
mapping, locked_vm underflows to a large unsigned value, and a subsequent
dma map request fails with ENOMEM in __account_locked_vm.

To avoid underflow, grab and save the mm at the time a dma is mapped.
Use that mm when adjusting locked_vm, rather than re-acquiring the saved
task's mm, which may have changed.  If the saved mm is dead, do nothing.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

Comments

Jason Gunthorpe Dec. 16, 2022, 2:09 p.m. UTC | #1
On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:
> When a vfio container is preserved across exec, the task does not change,
> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> dma map request fails with ENOMEM in __account_locked_vm.
> 
> To avoid underflow, grab and save the mm at the time a dma is mapped.
> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)

Add fixes lines and a CC stable

The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'

> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	get_task_struct(current->group_leader);
>  	dma->task = current->group_leader;
>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> +	dma->mm = dma->task->mm;

This should be current->mm, current->group_leader->mm is not quite the
same thing (and maybe another bug, I'm not sure)

Jason
Steven Sistare Dec. 16, 2022, 3:42 p.m. UTC | #2
On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
> On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:
>> When a vfio container is preserved across exec, the task does not change,
>> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
>> mapping, locked_vm underflows to a large unsigned value, and a subsequent
>> dma map request fails with ENOMEM in __account_locked_vm.
>>
>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
>>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> Add fixes lines and a CC stable

This predates the update vaddr functionality, so AFAICT:

    Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")

I'll wait on cc'ing stable until alex has chimed in.

> The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'

Underflow is a more precise description of the first corruption. How about:

vfio/type1: Prevent underflow of locked_vm via exec()

>> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	get_task_struct(current->group_leader);
>>  	dma->task = current->group_leader;
>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>> +	dma->mm = dma->task->mm;
> 
> This should be current->mm, current->group_leader->mm is not quite the
> same thing (and maybe another bug, I'm not sure)

When are they different -- when the leader is a zombie?

BTW I just noticed I need to update the comments about mm preceding these lines.

- Steve
Alex Williamson Dec. 16, 2022, 4:10 p.m. UTC | #3
On Fri, 16 Dec 2022 10:42:13 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
> > On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:  
> >> When a vfio container is preserved across exec, the task does not change,
> >> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> >> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> >> dma map request fails with ENOMEM in __account_locked_vm.
> >>
> >> To avoid underflow, grab and save the mm at the time a dma is mapped.
> >> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> >> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
> >>  1 file changed, 10 insertions(+), 7 deletions(-)  
> > 
> > Add fixes lines and a CC stable  
> 
> This predates the update vaddr functionality, so AFAICT:
> 
>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
> 
> I'll wait on cc'ing stable until alex has chimed in.

Technically, adding the stable Cc tag is still the correct approach per
the stable process docs, but the Fixes: tag alone is generally
sufficient to crank up the backport engines.  The original
implementation is probably the correct commit to identify, exec was
certainly not considered there.  Thanks,

Alex
 
> > The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'  
> 
> Underflow is a more precise description of the first corruption. How about:
> 
> vfio/type1: Prevent underflow of locked_vm via exec()
> 
> >> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	get_task_struct(current->group_leader);
> >>  	dma->task = current->group_leader;
> >>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> >> +	dma->mm = dma->task->mm;  
> > 
> > This should be current->mm, current->group_leader->mm is not quite the
> > same thing (and maybe another bug, I'm not sure)  
> 
> When are they different -- when the leader is a zombie?
> 
> BTW I just noticed I need to update the comments about mm preceding these lines.
> 
> - Steve
>
Steven Sistare Dec. 16, 2022, 4:16 p.m. UTC | #4
On 12/16/2022 11:10 AM, Alex Williamson wrote:
> On Fri, 16 Dec 2022 10:42:13 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
>>> On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:  
>>>> When a vfio container is preserved across exec, the task does not change,
>>>> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
>>>> mapping, locked_vm underflows to a large unsigned value, and a subsequent
>>>> dma map request fails with ENOMEM in __account_locked_vm.
>>>>
>>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
>>>>  1 file changed, 10 insertions(+), 7 deletions(-)  
>>>
>>> Add fixes lines and a CC stable  
>>
>> This predates the update vaddr functionality, so AFAICT:
>>
>>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
>>
>> I'll wait on cc'ing stable until alex has chimed in.
> 
> Technically, adding the stable Cc tag is still the correct approach per
> the stable process docs, but the Fixes: tag alone is generally
> sufficient to crank up the backport engines.  The original
> implementation is probably the correct commit to identify, exec was
> certainly not considered there.  Thanks,

Should I cc stable on the whole series, or re-send individually?  If the
latter, which ones?

- Steve
  
>>> The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'  
>>
>> Underflow is a more precise description of the first corruption. How about:
>>
>> vfio/type1: Prevent underflow of locked_vm via exec()
>>
>>>> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	get_task_struct(current->group_leader);
>>>>  	dma->task = current->group_leader;
>>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>>> +	dma->mm = dma->task->mm;  
>>>
>>> This should be current->mm, current->group_leader->mm is not quite the
>>> same thing (and maybe another bug, I'm not sure)  
>>
>> When are they different -- when the leader is a zombie?
>>
>> BTW I just noticed I need to update the comments about mm preceding these lines.
>>
>> - Steve
>>
>
Alex Williamson Dec. 16, 2022, 4:33 p.m. UTC | #5
On Fri, 16 Dec 2022 11:16:59 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 12/16/2022 11:10 AM, Alex Williamson wrote:
> > On Fri, 16 Dec 2022 10:42:13 -0500
> > Steven Sistare <steven.sistare@oracle.com> wrote:
> >   
> >> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:  
> >>> On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:    
> >>>> When a vfio container is preserved across exec, the task does not change,
> >>>> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> >>>> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> >>>> dma map request fails with ENOMEM in __account_locked_vm.
> >>>>
> >>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
> >>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> >>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> >>>>
> >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>>> ---
> >>>>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
> >>>>  1 file changed, 10 insertions(+), 7 deletions(-)    
> >>>
> >>> Add fixes lines and a CC stable    
> >>
> >> This predates the update vaddr functionality, so AFAICT:
> >>
> >>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
> >>
> >> I'll wait on cc'ing stable until alex has chimed in.  
> > 
> > Technically, adding the stable Cc tag is still the correct approach per
> > the stable process docs, but the Fixes: tag alone is generally
> > sufficient to crank up the backport engines.  The original
> > implementation is probably the correct commit to identify, exec was
> > certainly not considered there.  Thanks,  
> 
> Should I cc stable on the whole series, or re-send individually?  If the
> latter, which ones?

Only for the Fixes: commits, and note that Cc: stable is effectively
just a tag like Fixes:, you don't actually need to copy the stable@vger
list on the patch (and IIRC you'll get some correctional email if you
do).  Thanks,

Alex
Jason Gunthorpe Dec. 16, 2022, 5:07 p.m. UTC | #6
On Fri, Dec 16, 2022 at 10:42:13AM -0500, Steven Sistare wrote:
> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
> > On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:
> >> When a vfio container is preserved across exec, the task does not change,
> >> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> >> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> >> dma map request fails with ENOMEM in __account_locked_vm.
> >>
> >> To avoid underflow, grab and save the mm at the time a dma is mapped.
> >> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> >> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
> >>  1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > Add fixes lines and a CC stable
> 
> This predates the update vaddr functionality, so AFAICT:
> 
>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
> 
> I'll wait on cc'ing stable until alex has chimed in.

Yes

> > The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'
> 
> Underflow is a more precise description of the first corruption. How about:
> 
> vfio/type1: Prevent underflow of locked_vm via exec()

sure
 
> >> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	get_task_struct(current->group_leader);
> >>  	dma->task = current->group_leader;
> >>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> >> +	dma->mm = dma->task->mm;
> > 
> > This should be current->mm, current->group_leader->mm is not quite the
> > same thing (and maybe another bug, I'm not sure)
> 
> When are they different -- when the leader is a zombie?

I'm actually not sure if they can be different, but if they are
different then group_leader is the wrong one. Better not to chance it

Jason
diff mbox series

Patch

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 144f5bb..cd49b656 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -100,6 +100,7 @@  struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	struct mm_struct	*mm;
 };
 
 struct vfio_batch {
@@ -420,8 +421,8 @@  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!npage)
 		return 0;
 
-	mm = async ? get_task_mm(dma->task) : dma->task->mm;
-	if (!mm)
+	mm = dma->mm;
+	if (async && !mmget_not_zero(mm))
 		return -ESRCH; /* process exited */
 
 	ret = mmap_write_lock_killable(mm);
@@ -794,8 +795,8 @@  static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	struct mm_struct *mm;
 	int ret;
 
-	mm = get_task_mm(dma->task);
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -ENODEV;
 
 	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, pages);
@@ -1180,6 +1181,7 @@  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
+	mmdrop(dma->mm);
 	vfio_dma_bitmap_free(dma);
 	if (dma->vaddr_invalid) {
 		iommu->vaddr_invalid_count--;
@@ -1687,6 +1689,8 @@  static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	get_task_struct(current->group_leader);
 	dma->task = current->group_leader;
 	dma->lock_cap = capable(CAP_IPC_LOCK);
+	dma->mm = dma->task->mm;
+	mmgrab(dma->mm);
 
 	dma->pfn_list = RB_ROOT;
 
@@ -3122,9 +3126,8 @@  static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
 			!(dma->prot & IOMMU_READ))
 		return -EPERM;
 
-	mm = get_task_mm(dma->task);
-
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -EPERM;
 
 	if (kthread)