Message ID | 1671568765-297322-3-git-send-email-steven.sistare@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | fixes for virtual address update |
On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote: > When a vfio container is preserved across exec, the task does not change, > but it gets a new mm with locked_vm=0, and loses the count from existing > dma mappings. If the user later unmaps a dma mapping, locked_vm underflows > to a large unsigned value, and a subsequent dma map request fails with > ENOMEM in __account_locked_vm. > > To avoid underflow, grab and save the mm at the time a dma is mapped. > Use that mm when adjusting locked_vm, rather than re-acquiring the saved > task's mm, which may have changed. If the saved mm is dead, do nothing. > > locked_vm is incremented for existing mappings in a subsequent patch. > > Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") > Cc: stable@vger.kernel.org > Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > Reviewed-by: Kevin Tian <kevin.tian@intel.com> > --- > drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++---------------- > 1 file changed, 11 insertions(+), 16 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 144f5bb..71f980b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -100,6 +100,7 @@ struct vfio_dma { > struct task_struct *task; > struct rb_root pfn_list; /* Ex-user pinned pfn list */ > unsigned long *bitmap; > + struct mm_struct *mm; > }; > > struct vfio_batch { > @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) > if (!npage) > return 0; > > - mm = async ? get_task_mm(dma->task) : dma->task->mm; > - if (!mm) > + mm = dma->mm; > + if (async && !mmget_not_zero(mm)) > return -ESRCH; /* process exited */ Just delete the async; the lock_acct always acts on the dma, which always has a singular mm. Fix the few callers that need it to do the mmget_not_zero() before calling in. > @@ -794,8 +795,8 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr, > struct mm_struct *mm; > int ret; > > - mm = get_task_mm(dma->task); > - if (!mm) > + mm = dma->mm; > + if (!mmget_not_zero(mm)) > return -ENODEV; E.g. things get all confused here where we have the mmget_not_zero, but then we go ahead and call vfio_lock_acct() with async = true. Jason
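A minimal sketch of the shape being suggested (async dropped, caller responsible for keeping dma->mm alive) might look as follows; the body is assembled from the existing vfio_lock_acct() and is illustrative rather than code from the patch:

	/* Sketch only: assumes the caller guarantees dma->mm cannot go away. */
	static int vfio_lock_acct(struct vfio_dma *dma, long npage)
	{
		struct mm_struct *mm = dma->mm;
		int ret;

		if (!npage)
			return 0;

		ret = mmap_write_lock_killable(mm);
		if (ret)
			return ret;

		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
					  dma->lock_cap);
		mmap_write_unlock(mm);
		return ret;
	}

	/* An asynchronous call site then takes the mm reference itself: */
		if (!mmget_not_zero(dma->mm))
			return -ESRCH;	/* process exited */
		ret = vfio_lock_acct(dma, -unlocked);
		mmput(dma->mm);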
On 1/3/2023 10:20 AM, Jason Gunthorpe wrote: > On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote: >> When a vfio container is preserved across exec, the task does not change, >> but it gets a new mm with locked_vm=0, and loses the count from existing >> dma mappings. If the user later unmaps a dma mapping, locked_vm underflows >> to a large unsigned value, and a subsequent dma map request fails with >> ENOMEM in __account_locked_vm. >> >> To avoid underflow, grab and save the mm at the time a dma is mapped. >> Use that mm when adjusting locked_vm, rather than re-acquiring the saved >> task's mm, which may have changed. If the saved mm is dead, do nothing. >> >> locked_vm is incremented for existing mappings in a subsequent patch. >> >> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") >> Cc: stable@vger.kernel.org >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >> Reviewed-by: Kevin Tian <kevin.tian@intel.com> >> --- >> drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++---------------- >> 1 file changed, 11 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index 144f5bb..71f980b 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -100,6 +100,7 @@ struct vfio_dma { >> struct task_struct *task; >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >> unsigned long *bitmap; >> + struct mm_struct *mm; >> }; >> >> struct vfio_batch { >> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) >> if (!npage) >> return 0; >> >> - mm = async ? get_task_mm(dma->task) : dma->task->mm; >> - if (!mm) >> + mm = dma->mm; >> + if (async && !mmget_not_zero(mm)) >> return -ESRCH; /* process exited */ > > Just delete the async, the lock_acct always acts on the dma which > always has a singular mm. > > FIx the few callers that need it to do the mmget_no_zero() before > calling in. Most of the callers pass async=true: ret = vfio_lock_acct(dma, lock_acct, false); vfio_lock_acct(dma, locked - unlocked, true); ret = vfio_lock_acct(dma, 1, true); vfio_lock_acct(dma, -unlocked, true); vfio_lock_acct(dma, -1, true); vfio_lock_acct(dma, -unlocked, true); ret = mm_lock_acct(task, mm, lock_cap, npage, false); mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true); vfio_lock_acct(dma, locked - unlocked, true); Hoisting mmget_not_zero and mmput will make many call sites messier. Alex, what is your opinion? >> @@ -794,8 +795,8 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr, >> struct mm_struct *mm; >> int ret; >> >> - mm = get_task_mm(dma->task); >> - if (!mm) >> + mm = dma->mm; >> + if (!mmget_not_zero(mm)) >> return -ENODEV; > > eg things get all confused here where we have the mmget_not_zero > > But then we go ahead and call vfio_lock_acct() with true Yes, I should pass false at this call site to optimize. - Steve
On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote: > On 1/3/2023 10:20 AM, Jason Gunthorpe wrote: > > On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote: > >> When a vfio container is preserved across exec, the task does not change, > >> but it gets a new mm with locked_vm=0, and loses the count from existing > >> dma mappings. If the user later unmaps a dma mapping, locked_vm underflows > >> to a large unsigned value, and a subsequent dma map request fails with > >> ENOMEM in __account_locked_vm. > >> > >> To avoid underflow, grab and save the mm at the time a dma is mapped. > >> Use that mm when adjusting locked_vm, rather than re-acquiring the saved > >> task's mm, which may have changed. If the saved mm is dead, do nothing. > >> > >> locked_vm is incremented for existing mappings in a subsequent patch. > >> > >> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") > >> Cc: stable@vger.kernel.org > >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > >> Reviewed-by: Kevin Tian <kevin.tian@intel.com> > >> --- > >> drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++---------------- > >> 1 file changed, 11 insertions(+), 16 deletions(-) > >> > >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >> index 144f5bb..71f980b 100644 > >> --- a/drivers/vfio/vfio_iommu_type1.c > >> +++ b/drivers/vfio/vfio_iommu_type1.c > >> @@ -100,6 +100,7 @@ struct vfio_dma { > >> struct task_struct *task; > >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ > >> unsigned long *bitmap; > >> + struct mm_struct *mm; > >> }; > >> > >> struct vfio_batch { > >> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) > >> if (!npage) > >> return 0; > >> > >> - mm = async ? get_task_mm(dma->task) : dma->task->mm; > >> - if (!mm) > >> + mm = dma->mm; > >> + if (async && !mmget_not_zero(mm)) > >> return -ESRCH; /* process exited */ > > > > Just delete the async, the lock_acct always acts on the dma which > > always has a singular mm. > > > > FIx the few callers that need it to do the mmget_no_zero() before > > calling in. > > Most of the callers pass async=true: > ret = vfio_lock_acct(dma, lock_acct, false); > vfio_lock_acct(dma, locked - unlocked, true); > ret = vfio_lock_acct(dma, 1, true); > vfio_lock_acct(dma, -unlocked, true); > vfio_lock_acct(dma, -1, true); > vfio_lock_acct(dma, -unlocked, true); > ret = mm_lock_acct(task, mm, lock_cap, npage, false); > mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true); > vfio_lock_acct(dma, locked - unlocked, true); Seems like if you make a lock_sub_acct() function that does the -1* and does the mmget it will be OK? Jason
On 1/3/2023 2:20 PM, Jason Gunthorpe wrote: > On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote: >> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote: >>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote: >>>> When a vfio container is preserved across exec, the task does not change, >>>> but it gets a new mm with locked_vm=0, and loses the count from existing >>>> dma mappings. If the user later unmaps a dma mapping, locked_vm underflows >>>> to a large unsigned value, and a subsequent dma map request fails with >>>> ENOMEM in __account_locked_vm. >>>> >>>> To avoid underflow, grab and save the mm at the time a dma is mapped. >>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved >>>> task's mm, which may have changed. If the saved mm is dead, do nothing. >>>> >>>> locked_vm is incremented for existing mappings in a subsequent patch. >>>> >>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") >>>> Cc: stable@vger.kernel.org >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com> >>>> --- >>>> drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++---------------- >>>> 1 file changed, 11 insertions(+), 16 deletions(-) >>>> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>>> index 144f5bb..71f980b 100644 >>>> --- a/drivers/vfio/vfio_iommu_type1.c >>>> +++ b/drivers/vfio/vfio_iommu_type1.c >>>> @@ -100,6 +100,7 @@ struct vfio_dma { >>>> struct task_struct *task; >>>> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >>>> unsigned long *bitmap; >>>> + struct mm_struct *mm; >>>> }; >>>> >>>> struct vfio_batch { >>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) >>>> if (!npage) >>>> return 0; >>>> >>>> - mm = async ? get_task_mm(dma->task) : dma->task->mm; >>>> - if (!mm) >>>> + mm = dma->mm; >>>> + if (async && !mmget_not_zero(mm)) >>>> return -ESRCH; /* process exited */ >>> >>> Just delete the async, the lock_acct always acts on the dma which >>> always has a singular mm. >>> >>> FIx the few callers that need it to do the mmget_no_zero() before >>> calling in. >> >> Most of the callers pass async=true: >> ret = vfio_lock_acct(dma, lock_acct, false); >> vfio_lock_acct(dma, locked - unlocked, true); >> ret = vfio_lock_acct(dma, 1, true); >> vfio_lock_acct(dma, -unlocked, true); >> vfio_lock_acct(dma, -1, true); >> vfio_lock_acct(dma, -unlocked, true); >> ret = mm_lock_acct(task, mm, lock_cap, npage, false); >> mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true); >> vfio_lock_acct(dma, locked - unlocked, true); > > Seems like if you make a lock_sub_acct() function that does the -1* > and does the mmget it will be OK? Do you mean, provide two versions of vfio_lock_acct? Simplified: vfio_lock_acct() { mm_lock_acct() dma->locked_vm += npage; } vfio_lock_acct_async() { mmget_not_zero(dma->mm) mm_lock_acct() dma->locked_vm += npage; mmput(dma->mm); } If so, I will factor this into a separate patch, since it is cosmetic, so stable can omit it. - Steve
On Fri, Jan 06, 2023 at 10:14:57AM -0500, Steven Sistare wrote: > On 1/3/2023 2:20 PM, Jason Gunthorpe wrote: > > On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote: > >> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote: > >>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote: > >>>> When a vfio container is preserved across exec, the task does not change, > >>>> but it gets a new mm with locked_vm=0, and loses the count from existing > >>>> dma mappings. If the user later unmaps a dma mapping, locked_vm underflows > >>>> to a large unsigned value, and a subsequent dma map request fails with > >>>> ENOMEM in __account_locked_vm. > >>>> > >>>> To avoid underflow, grab and save the mm at the time a dma is mapped. > >>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved > >>>> task's mm, which may have changed. If the saved mm is dead, do nothing. > >>>> > >>>> locked_vm is incremented for existing mappings in a subsequent patch. > >>>> > >>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") > >>>> Cc: stable@vger.kernel.org > >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > >>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com> > >>>> --- > >>>> drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++---------------- > >>>> 1 file changed, 11 insertions(+), 16 deletions(-) > >>>> > >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >>>> index 144f5bb..71f980b 100644 > >>>> --- a/drivers/vfio/vfio_iommu_type1.c > >>>> +++ b/drivers/vfio/vfio_iommu_type1.c > >>>> @@ -100,6 +100,7 @@ struct vfio_dma { > >>>> struct task_struct *task; > >>>> struct rb_root pfn_list; /* Ex-user pinned pfn list */ > >>>> unsigned long *bitmap; > >>>> + struct mm_struct *mm; > >>>> }; > >>>> > >>>> struct vfio_batch { > >>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) > >>>> if (!npage) > >>>> return 0; > >>>> > >>>> - mm = async ? get_task_mm(dma->task) : dma->task->mm; > >>>> - if (!mm) > >>>> + mm = dma->mm; > >>>> + if (async && !mmget_not_zero(mm)) > >>>> return -ESRCH; /* process exited */ > >>> > >>> Just delete the async, the lock_acct always acts on the dma which > >>> always has a singular mm. > >>> > >>> FIx the few callers that need it to do the mmget_no_zero() before > >>> calling in. > >> > >> Most of the callers pass async=true: > >> ret = vfio_lock_acct(dma, lock_acct, false); > >> vfio_lock_acct(dma, locked - unlocked, true); > >> ret = vfio_lock_acct(dma, 1, true); > >> vfio_lock_acct(dma, -unlocked, true); > >> vfio_lock_acct(dma, -1, true); > >> vfio_lock_acct(dma, -unlocked, true); > >> ret = mm_lock_acct(task, mm, lock_cap, npage, false); > >> mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true); > >> vfio_lock_acct(dma, locked - unlocked, true); > > > > Seems like if you make a lock_sub_acct() function that does the -1* > > and does the mmget it will be OK? > > Do you mean, provide two versions of vfio_lock_acct? Simplified: > > vfio_lock_acct() > { > mm_lock_acct() > dma->locked_vm += npage; > } > > vfio_lock_acct_async() > { > mmget_not_zero(dma->mm) > > mm_lock_acct() > dma->locked_vm += npage; > > mmput(dma->mm); > } I was thinking more like vfio_lock_acct_subtract() mmget_not_zero(dma->mm) mm->locked_vm -= npage Jason
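Filled out with the locking and reference drop that the one-line sketch leaves implicit, the subtraction-only helper (what Steve below calls vfio_lock_acct_subtract) might look roughly like this; the name, the open-coded decrement, and skipping __account_locked_vm() on the grounds that a decrement can never trip RLIMIT_MEMLOCK are assumptions, not code from the thread:

	/*
	 * Rough sketch only (assumed shape): decrementing never hits the
	 * RLIMIT_MEMLOCK check, so the accounting can be done directly
	 * under the mmap lock instead of via __account_locked_vm().
	 */
	static void vfio_lock_acct_subtract(struct vfio_dma *dma, long npage)
	{
		struct mm_struct *mm = dma->mm;

		if (!npage || !mmget_not_zero(mm))
			return;		/* nothing to do, or process exited */

		if (!mmap_write_lock_killable(mm)) {
			mm->locked_vm -= npage;
			mmap_write_unlock(mm);
		}
		mmput(mm);
	}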
On 1/9/2023 8:52 AM, Jason Gunthorpe wrote: > On Fri, Jan 06, 2023 at 10:14:57AM -0500, Steven Sistare wrote: >> On 1/3/2023 2:20 PM, Jason Gunthorpe wrote: >>> On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote: >>>> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote: >>>>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote: >>>>>> When a vfio container is preserved across exec, the task does not change, >>>>>> but it gets a new mm with locked_vm=0, and loses the count from existing >>>>>> dma mappings. If the user later unmaps a dma mapping, locked_vm underflows >>>>>> to a large unsigned value, and a subsequent dma map request fails with >>>>>> ENOMEM in __account_locked_vm. >>>>>> >>>>>> To avoid underflow, grab and save the mm at the time a dma is mapped. >>>>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved >>>>>> task's mm, which may have changed. If the saved mm is dead, do nothing. >>>>>> >>>>>> locked_vm is incremented for existing mappings in a subsequent patch. >>>>>> >>>>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") >>>>>> Cc: stable@vger.kernel.org >>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com> >>>>>> --- >>>>>> drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++---------------- >>>>>> 1 file changed, 11 insertions(+), 16 deletions(-) >>>>>> >>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>>>>> index 144f5bb..71f980b 100644 >>>>>> --- a/drivers/vfio/vfio_iommu_type1.c >>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c >>>>>> @@ -100,6 +100,7 @@ struct vfio_dma { >>>>>> struct task_struct *task; >>>>>> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >>>>>> unsigned long *bitmap; >>>>>> + struct mm_struct *mm; >>>>>> }; >>>>>> >>>>>> struct vfio_batch { >>>>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) >>>>>> if (!npage) >>>>>> return 0; >>>>>> >>>>>> - mm = async ? get_task_mm(dma->task) : dma->task->mm; >>>>>> - if (!mm) >>>>>> + mm = dma->mm; >>>>>> + if (async && !mmget_not_zero(mm)) >>>>>> return -ESRCH; /* process exited */ >>>>> >>>>> Just delete the async, the lock_acct always acts on the dma which >>>>> always has a singular mm. >>>>> >>>>> FIx the few callers that need it to do the mmget_no_zero() before >>>>> calling in. >>>> >>>> Most of the callers pass async=true: >>>> ret = vfio_lock_acct(dma, lock_acct, false); >>>> vfio_lock_acct(dma, locked - unlocked, true); >>>> ret = vfio_lock_acct(dma, 1, true); >>>> vfio_lock_acct(dma, -unlocked, true); >>>> vfio_lock_acct(dma, -1, true); >>>> vfio_lock_acct(dma, -unlocked, true); >>>> ret = mm_lock_acct(task, mm, lock_cap, npage, false); >>>> mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true); >>>> vfio_lock_acct(dma, locked - unlocked, true); >>> >>> Seems like if you make a lock_sub_acct() function that does the -1* >>> and does the mmget it will be OK? >> >> Do you mean, provide two versions of vfio_lock_acct? Simplified: >> >> vfio_lock_acct() >> { >> mm_lock_acct() >> dma->locked_vm += npage; >> } >> >> vfio_lock_acct_async() >> { >> mmget_not_zero(dma->mm) >> >> mm_lock_acct() >> dma->locked_vm += npage; >> >> mmput(dma->mm); >> } > > I was thinking more like > > () > mmget_not_zero(dma->mm) > mm->locked_vm -= npage ^^^^^^ Is this shorthand for open coding __account_locked_vm? If so, we are essentially saying the same thing. 
My function vfio_lock_acct_async calls mm_lock_acct which calls __account_locked_vm. But, your vfio_lock_acct_subtract does not call mmput, so maybe I still don't grok your suggestion.

FWIW here are my functions with all error checking:

static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
			bool lock_cap, long npage)
{
	int ret = mmap_write_lock_killable(mm);

	if (!ret) {
		ret = __account_locked_vm(mm, abs(npage), npage > 0, task,
					  lock_cap);
		mmap_write_unlock(mm);
	}

	return ret;
}

static int vfio_lock_acct(struct vfio_dma *dma, long npage)
{
	int ret;

	if (!npage)
		return 0;

	ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage);
	if (!ret)
		dma->locked_vm += npage;

	return ret;
}

static int vfio_lock_acct_async(struct vfio_dma *dma, long npage)
{
	int ret;

	if (!npage)
		return 0;

	if (!mmget_not_zero(dma->mm))
		return -ESRCH;	/* process exited */

	ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage);
	if (!ret)
		dma->locked_vm += npage;

	mmput(dma->mm);

	return ret;
}

- Steve
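Mapped onto the caller list earlier in the thread, the split would roughly be that the dma map path, which runs in the mapping task's own context, keeps the plain variant, while the external pin/unpin and unmap paths use the _async one. The pairing below is an inference from the thread, not code from it:

	/* vfio_dma_do_map(): current->mm is dma->mm, no extra reference needed */
	ret = vfio_lock_acct(dma, lock_acct);

	/* external pin/unpin or unmap paths: dma->mm may not be current's mm */
	vfio_lock_acct_async(dma, -unlocked);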
On 1/9/2023 10:54 AM, Steven Sistare wrote: > On 1/9/2023 8:52 AM, Jason Gunthorpe wrote: >> On Fri, Jan 06, 2023 at 10:14:57AM -0500, Steven Sistare wrote: >>> On 1/3/2023 2:20 PM, Jason Gunthorpe wrote: >>>> On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote: >>>>> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote: >>>>>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote: >>>>>>> When a vfio container is preserved across exec, the task does not change, >>>>>>> but it gets a new mm with locked_vm=0, and loses the count from existing >>>>>>> dma mappings. If the user later unmaps a dma mapping, locked_vm underflows >>>>>>> to a large unsigned value, and a subsequent dma map request fails with >>>>>>> ENOMEM in __account_locked_vm. >>>>>>> >>>>>>> To avoid underflow, grab and save the mm at the time a dma is mapped. >>>>>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved >>>>>>> task's mm, which may have changed. If the saved mm is dead, do nothing. >>>>>>> >>>>>>> locked_vm is incremented for existing mappings in a subsequent patch. >>>>>>> >>>>>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") >>>>>>> Cc: stable@vger.kernel.org >>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>>>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com> >>>>>>> --- >>>>>>> drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++---------------- >>>>>>> 1 file changed, 11 insertions(+), 16 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>>>>>> index 144f5bb..71f980b 100644 >>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c >>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c >>>>>>> @@ -100,6 +100,7 @@ struct vfio_dma { >>>>>>> struct task_struct *task; >>>>>>> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >>>>>>> unsigned long *bitmap; >>>>>>> + struct mm_struct *mm; >>>>>>> }; >>>>>>> >>>>>>> struct vfio_batch { >>>>>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) >>>>>>> if (!npage) >>>>>>> return 0; >>>>>>> >>>>>>> - mm = async ? get_task_mm(dma->task) : dma->task->mm; >>>>>>> - if (!mm) >>>>>>> + mm = dma->mm; >>>>>>> + if (async && !mmget_not_zero(mm)) >>>>>>> return -ESRCH; /* process exited */ >>>>>> >>>>>> Just delete the async, the lock_acct always acts on the dma which >>>>>> always has a singular mm. >>>>>> >>>>>> FIx the few callers that need it to do the mmget_no_zero() before >>>>>> calling in. >>>>> >>>>> Most of the callers pass async=true: >>>>> ret = vfio_lock_acct(dma, lock_acct, false); >>>>> vfio_lock_acct(dma, locked - unlocked, true); >>>>> ret = vfio_lock_acct(dma, 1, true); >>>>> vfio_lock_acct(dma, -unlocked, true); >>>>> vfio_lock_acct(dma, -1, true); >>>>> vfio_lock_acct(dma, -unlocked, true); >>>>> ret = mm_lock_acct(task, mm, lock_cap, npage, false); >>>>> mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true); >>>>> vfio_lock_acct(dma, locked - unlocked, true); >>>> >>>> Seems like if you make a lock_sub_acct() function that does the -1* >>>> and does the mmget it will be OK? >>> >>> Do you mean, provide two versions of vfio_lock_acct? 
Simplified: >>> >>> vfio_lock_acct() >>> { >>> mm_lock_acct() >>> dma->locked_vm += npage; >>> } >>> >>> vfio_lock_acct_async() >>> { >>> mmget_not_zero(dma->mm) >>> >>> mm_lock_acct() >>> dma->locked_vm += npage; >>> >>> mmput(dma->mm); >>> } >> >> I was thinking more like >> >> () >> mmget_not_zero(dma->mm) >> mm->locked_vm -= npage > ^^^^^^ > Is this shorthand for open coding __account_locked_vm? If so, we are > essentially saying the same thing. My function vfio_lock_acct_async calls > mm_lock_acct which calls __account_locked_vm. > > But, your vfio_lock_acct_subtract does not call mmput, so maybe I still don't > grok your suggestion. > > FWIW here are my functions with all error checking: > > static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm, > bool lock_cap, long npage) > { > int ret = mmap_write_lock_killable(mm); > > if (!ret) { > ret = __account_locked_vm(mm, abs(npage), npage > 0, task, > lock_cap); > mmap_write_unlock(mm); > } > > return ret; > } > > static int vfio_lock_acct(struct vfio_dma *dma, long npage) > { > int ret; > > if (!npage) > return 0; > > ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage); > if (!ret) > dma->locked_vm += npage; > > return ret; > } > > static int vfio_lock_acct_async(struct vfio_dma *dma, long npage) > { > int ret; > > if (!npage) > return 0; > > if (!mmget_not_zero(dma->mm)) > return -ESRCH; /* process exited */ > > ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage); > if (!ret) > dma->locked_vm += npage; > > mmput(dma->mm); > > return ret; > } Let's leave the async arg and vfio_lock_acct as is. We are over-polishing a small style issue, in pre-existing code, in a soon-to-be dead-end code base. - Steve
On Mon, Jan 09, 2023 at 04:16:18PM -0500, Steven Sistare wrote: > Let's leave the async arg and vfio_lock_acct as is. We are over-polishing a small > style issue, in pre-existing code, in a soon-to-be dead-end code base. fine Jason
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 144f5bb..71f980b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -100,6 +100,7 @@ struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	struct mm_struct	*mm;
 };
 
 struct vfio_batch {
@@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!npage)
 		return 0;
 
-	mm = async ? get_task_mm(dma->task) : dma->task->mm;
-	if (!mm)
+	mm = dma->mm;
+	if (async && !mmget_not_zero(mm))
 		return -ESRCH; /* process exited */
 
 	ret = mmap_write_lock_killable(mm);
@@ -794,8 +795,8 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	struct mm_struct *mm;
 	int ret;
 
-	mm = get_task_mm(dma->task);
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -ENODEV;
 
 	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, pages);
@@ -1180,6 +1181,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
+	mmdrop(dma->mm);
 	vfio_dma_bitmap_free(dma);
 	if (dma->vaddr_invalid) {
 		iommu->vaddr_invalid_count--;
@@ -1664,15 +1666,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	 * against the locked memory limit and we need to be able to do both
 	 * outside of this call path as pinning can be asynchronous via the
 	 * external interfaces for mdev devices. RLIMIT_MEMLOCK requires a
-	 * task_struct and VM locked pages requires an mm_struct, however
-	 * holding an indefinite mm reference is not recommended, therefore we
-	 * only hold a reference to a task. We could hold a reference to
-	 * current, however QEMU uses this call path through vCPU threads,
-	 * which can be killed resulting in a NULL mm and failure in the unmap
-	 * path when called via a different thread. Avoid this problem by
-	 * using the group_leader as threads within the same group require
-	 * both CLONE_THREAD and CLONE_VM and will therefore use the same
-	 * mm_struct.
+	 * task_struct and VM locked pages requires an mm_struct.
 	 *
 	 * Previously we also used the task for testing CAP_IPC_LOCK at the
 	 * time of pinning and accounting, however has_capability() makes use
@@ -1687,6 +1681,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	get_task_struct(current->group_leader);
 	dma->task = current->group_leader;
 	dma->lock_cap = capable(CAP_IPC_LOCK);
+	dma->mm = current->mm;
+	mmgrab(dma->mm);
 
 	dma->pfn_list = RB_ROOT;
 
@@ -3122,9 +3118,8 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
 			!(dma->prot & IOMMU_READ))
 		return -EPERM;
 
-	mm = get_task_mm(dma->task);
-
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -EPERM;
 
 	if (kthread)
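The reference-counting pattern the patch relies on, distilled from the hunks above as an illustration only (not additional code from the series): mmgrab() at map time pins the mm_struct itself, which is cheap and does not keep the address space alive; each accounting or pinning path then upgrades with mmget_not_zero() only for the duration of the operation, and vfio_remove_dma() balances the mmgrab() with mmdrop().

	dma->mm = current->mm;
	mmgrab(dma->mm);		/* hold the struct, not the address space */

	if (mmget_not_zero(dma->mm)) {	/* mm still live? take a usage reference */
		/* account or pin against dma->mm */
		mmput(dma->mm);
	}

	mmdrop(dma->mm);		/* in vfio_remove_dma() */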