Message ID | 20190506232942.12623-5-rcampbell@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/hmm: HMM documentation updates and code fixes | expand |
On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote: > > From: Ralph Campbell <rcampbell@nvidia.com> > > The helper function hmm_vma_fault() calls hmm_range_register() but is > missing a call to hmm_range_unregister() in one of the error paths. > This leads to a reference count leak and ultimately a memory leak on > struct hmm. > > Always call hmm_range_unregister() if hmm_range_register() succeeded. How about * Call hmm_range_unregister() in error path if hmm_range_register() succeeded* ? > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Balbir Singh <bsingharora@gmail.com> > Cc: Dan Carpenter <dan.carpenter@oracle.com> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Souptick Joarder <jrdr.linux@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > --- > include/linux/hmm.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h > index 35a429621e1e..fa0671d67269 100644 > --- a/include/linux/hmm.h > +++ b/include/linux/hmm.h > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > return (int)ret; > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { > + hmm_range_unregister(range); > /* > * The mmap_sem was taken by driver we release it here and > * returns -EAGAIN which correspond to mmap_sem have been > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > ret = hmm_range_fault(range, block); > if (ret <= 0) { > + hmm_range_unregister(range); what is the reason to moved it up ? > if (ret == -EBUSY || !ret) { > /* Same as above, drop mmap_sem to match old API. */ > up_read(&range->vma->vm_mm->mmap_sem); > ret = -EBUSY; > } else if (ret == -EAGAIN) > ret = -EBUSY; > - hmm_range_unregister(range); > return ret; > } > return 0; > -- > 2.20.1 >
On 5/7/19 6:15 AM, Souptick Joarder wrote: > On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote: >> >> From: Ralph Campbell <rcampbell@nvidia.com> >> >> The helper function hmm_vma_fault() calls hmm_range_register() but is >> missing a call to hmm_range_unregister() in one of the error paths. >> This leads to a reference count leak and ultimately a memory leak on >> struct hmm. >> >> Always call hmm_range_unregister() if hmm_range_register() succeeded. > > How about * Call hmm_range_unregister() in error path if > hmm_range_register() succeeded* ? Sure, sounds good. I'll include that in v2. >> >> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> >> Cc: John Hubbard <jhubbard@nvidia.com> >> Cc: Ira Weiny <ira.weiny@intel.com> >> Cc: Dan Williams <dan.j.williams@intel.com> >> Cc: Arnd Bergmann <arnd@arndb.de> >> Cc: Balbir Singh <bsingharora@gmail.com> >> Cc: Dan Carpenter <dan.carpenter@oracle.com> >> Cc: Matthew Wilcox <willy@infradead.org> >> Cc: Souptick Joarder <jrdr.linux@gmail.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> --- >> include/linux/hmm.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/hmm.h b/include/linux/hmm.h >> index 35a429621e1e..fa0671d67269 100644 >> --- a/include/linux/hmm.h >> +++ b/include/linux/hmm.h >> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >> return (int)ret; >> >> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { >> + hmm_range_unregister(range); >> /* >> * The mmap_sem was taken by driver we release it here and >> * returns -EAGAIN which correspond to mmap_sem have been >> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >> >> ret = hmm_range_fault(range, block); >> if (ret <= 0) { >> + hmm_range_unregister(range); > > what is the reason to moved it up ? I moved it up because the normal calling pattern is: down_read(&mm->mmap_sem) hmm_vma_fault() hmm_range_register() hmm_range_fault() hmm_range_unregister() up_read(&mm->mmap_sem) I don't think it is a bug to unlock mmap_sem and then unregister, it is just more consistent nesting. >> if (ret == -EBUSY || !ret) { >> /* Same as above, drop mmap_sem to match old API. */ >> up_read(&range->vma->vm_mm->mmap_sem); >> ret = -EBUSY; >> } else if (ret == -EAGAIN) >> ret = -EBUSY; >> - hmm_range_unregister(range); >> return ret; >> } >> return 0; >> -- >> 2.20.1 >>
On Tue, May 7, 2019 at 11:42 PM Ralph Campbell <rcampbell@nvidia.com> wrote: > > > On 5/7/19 6:15 AM, Souptick Joarder wrote: > > On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote: > >> > >> From: Ralph Campbell <rcampbell@nvidia.com> > >> > >> The helper function hmm_vma_fault() calls hmm_range_register() but is > >> missing a call to hmm_range_unregister() in one of the error paths. > >> This leads to a reference count leak and ultimately a memory leak on > >> struct hmm. > >> > >> Always call hmm_range_unregister() if hmm_range_register() succeeded. > > > > How about * Call hmm_range_unregister() in error path if > > hmm_range_register() succeeded* ? > > Sure, sounds good. > I'll include that in v2. Thanks. > > >> > >> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> > >> Cc: John Hubbard <jhubbard@nvidia.com> > >> Cc: Ira Weiny <ira.weiny@intel.com> > >> Cc: Dan Williams <dan.j.williams@intel.com> > >> Cc: Arnd Bergmann <arnd@arndb.de> > >> Cc: Balbir Singh <bsingharora@gmail.com> > >> Cc: Dan Carpenter <dan.carpenter@oracle.com> > >> Cc: Matthew Wilcox <willy@infradead.org> > >> Cc: Souptick Joarder <jrdr.linux@gmail.com> > >> Cc: Andrew Morton <akpm@linux-foundation.org> > >> --- > >> include/linux/hmm.h | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/include/linux/hmm.h b/include/linux/hmm.h > >> index 35a429621e1e..fa0671d67269 100644 > >> --- a/include/linux/hmm.h > >> +++ b/include/linux/hmm.h > >> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > >> return (int)ret; > >> > >> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { > >> + hmm_range_unregister(range); > >> /* > >> * The mmap_sem was taken by driver we release it here and > >> * returns -EAGAIN which correspond to mmap_sem have been > >> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > >> > >> ret = hmm_range_fault(range, block); > >> if (ret <= 0) { > >> + hmm_range_unregister(range); > > > > what is the reason to moved it up ? > > I moved it up because the normal calling pattern is: > down_read(&mm->mmap_sem) > hmm_vma_fault() > hmm_range_register() > hmm_range_fault() > hmm_range_unregister() > up_read(&mm->mmap_sem) > > I don't think it is a bug to unlock mmap_sem and then unregister, > it is just more consistent nesting. Ok. I think, adding it in change log will be helpful :) > > >> if (ret == -EBUSY || !ret) { > >> /* Same as above, drop mmap_sem to match old API. */ > >> up_read(&range->vma->vm_mm->mmap_sem); > >> ret = -EBUSY; > >> } else if (ret == -EAGAIN) > >> ret = -EBUSY; > >> - hmm_range_unregister(range); > >> return ret; > >> } > >> return 0; > >> -- > >> 2.20.1 > >>
On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote: > > On 5/7/19 6:15 AM, Souptick Joarder wrote: > > On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote: > > > > > > From: Ralph Campbell <rcampbell@nvidia.com> > > > > > > The helper function hmm_vma_fault() calls hmm_range_register() but is > > > missing a call to hmm_range_unregister() in one of the error paths. > > > This leads to a reference count leak and ultimately a memory leak on > > > struct hmm. > > > > > > Always call hmm_range_unregister() if hmm_range_register() succeeded. > > > > How about * Call hmm_range_unregister() in error path if > > hmm_range_register() succeeded* ? > > Sure, sounds good. > I'll include that in v2. NAK for the patch see below why > > > > > > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> > > > Cc: John Hubbard <jhubbard@nvidia.com> > > > Cc: Ira Weiny <ira.weiny@intel.com> > > > Cc: Dan Williams <dan.j.williams@intel.com> > > > Cc: Arnd Bergmann <arnd@arndb.de> > > > Cc: Balbir Singh <bsingharora@gmail.com> > > > Cc: Dan Carpenter <dan.carpenter@oracle.com> > > > Cc: Matthew Wilcox <willy@infradead.org> > > > Cc: Souptick Joarder <jrdr.linux@gmail.com> > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > --- > > > include/linux/hmm.h | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h > > > index 35a429621e1e..fa0671d67269 100644 > > > --- a/include/linux/hmm.h > > > +++ b/include/linux/hmm.h > > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > > return (int)ret; > > > > > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { > > > + hmm_range_unregister(range); > > > /* > > > * The mmap_sem was taken by driver we release it here and > > > * returns -EAGAIN which correspond to mmap_sem have been > > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > > > > > ret = hmm_range_fault(range, block); > > > if (ret <= 0) { > > > + hmm_range_unregister(range); > > > > what is the reason to moved it up ? > > I moved it up because the normal calling pattern is: > down_read(&mm->mmap_sem) > hmm_vma_fault() > hmm_range_register() > hmm_range_fault() > hmm_range_unregister() > up_read(&mm->mmap_sem) > > I don't think it is a bug to unlock mmap_sem and then unregister, > it is just more consistent nesting. So this is not the usage pattern with HMM usage pattern is: hmm_range_register() hmm_range_fault() hmm_range_unregister() The hmm_vma_fault() is gonne so this patch here break thing. See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
On Sun, May 12, 2019 at 11:07:24AM -0400, Jerome Glisse wrote: > On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote: > > > > On 5/7/19 6:15 AM, Souptick Joarder wrote: > > > On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote: > > > > > > > > From: Ralph Campbell <rcampbell@nvidia.com> > > > > > > > > The helper function hmm_vma_fault() calls hmm_range_register() but is > > > > missing a call to hmm_range_unregister() in one of the error paths. > > > > This leads to a reference count leak and ultimately a memory leak on > > > > struct hmm. > > > > > > > > Always call hmm_range_unregister() if hmm_range_register() succeeded. > > > > > > How about * Call hmm_range_unregister() in error path if > > > hmm_range_register() succeeded* ? > > > > Sure, sounds good. > > I'll include that in v2. > > NAK for the patch see below why > > > > > > > > > > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> > > > > Cc: John Hubbard <jhubbard@nvidia.com> > > > > Cc: Ira Weiny <ira.weiny@intel.com> > > > > Cc: Dan Williams <dan.j.williams@intel.com> > > > > Cc: Arnd Bergmann <arnd@arndb.de> > > > > Cc: Balbir Singh <bsingharora@gmail.com> > > > > Cc: Dan Carpenter <dan.carpenter@oracle.com> > > > > Cc: Matthew Wilcox <willy@infradead.org> > > > > Cc: Souptick Joarder <jrdr.linux@gmail.com> > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > --- > > > > include/linux/hmm.h | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h > > > > index 35a429621e1e..fa0671d67269 100644 > > > > --- a/include/linux/hmm.h > > > > +++ b/include/linux/hmm.h > > > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > > > return (int)ret; > > > > > > > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { > > > > + hmm_range_unregister(range); > > > > /* > > > > * The mmap_sem was taken by driver we release it here and > > > > * returns -EAGAIN which correspond to mmap_sem have been > > > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > > > > > > > ret = hmm_range_fault(range, block); > > > > if (ret <= 0) { > > > > + hmm_range_unregister(range); > > > > > > what is the reason to moved it up ? > > > > I moved it up because the normal calling pattern is: > > down_read(&mm->mmap_sem) > > hmm_vma_fault() > > hmm_range_register() > > hmm_range_fault() > > hmm_range_unregister() > > up_read(&mm->mmap_sem) > > > > I don't think it is a bug to unlock mmap_sem and then unregister, > > it is just more consistent nesting. > > So this is not the usage pattern with HMM usage pattern is: > > hmm_range_register() > hmm_range_fault() > hmm_range_unregister() > > The hmm_vma_fault() is gonne so this patch here break thing. > > See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3 Sorry not enough coffee on sunday morning, so yeah this patch looks good except that you do not need to move it up. Note that hmm_vma_fault() is a gonner once ODP to HMM is upstream and i converted nouveau/amd to new API then we can remove that one. Cheers, Jérôme
On 5/12/19 8:07 AM, Jerome Glisse wrote: > On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote: >> >> On 5/7/19 6:15 AM, Souptick Joarder wrote: >>> On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote: >>>> >>>> From: Ralph Campbell <rcampbell@nvidia.com> >>>> >>>> The helper function hmm_vma_fault() calls hmm_range_register() but is >>>> missing a call to hmm_range_unregister() in one of the error paths. >>>> This leads to a reference count leak and ultimately a memory leak on >>>> struct hmm. >>>> >>>> Always call hmm_range_unregister() if hmm_range_register() succeeded. >>> >>> How about * Call hmm_range_unregister() in error path if >>> hmm_range_register() succeeded* ? >> >> Sure, sounds good. >> I'll include that in v2. > > NAK for the patch see below why > >> >>>> >>>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> >>>> Cc: John Hubbard <jhubbard@nvidia.com> >>>> Cc: Ira Weiny <ira.weiny@intel.com> >>>> Cc: Dan Williams <dan.j.williams@intel.com> >>>> Cc: Arnd Bergmann <arnd@arndb.de> >>>> Cc: Balbir Singh <bsingharora@gmail.com> >>>> Cc: Dan Carpenter <dan.carpenter@oracle.com> >>>> Cc: Matthew Wilcox <willy@infradead.org> >>>> Cc: Souptick Joarder <jrdr.linux@gmail.com> >>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>> --- >>>> include/linux/hmm.h | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h >>>> index 35a429621e1e..fa0671d67269 100644 >>>> --- a/include/linux/hmm.h >>>> +++ b/include/linux/hmm.h >>>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >>>> return (int)ret; >>>> >>>> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { >>>> + hmm_range_unregister(range); >>>> /* >>>> * The mmap_sem was taken by driver we release it here and >>>> * returns -EAGAIN which correspond to mmap_sem have been >>>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >>>> >>>> ret = hmm_range_fault(range, block); >>>> if (ret <= 0) { >>>> + hmm_range_unregister(range); >>> >>> what is the reason to moved it up ? >> >> I moved it up because the normal calling pattern is: >> down_read(&mm->mmap_sem) >> hmm_vma_fault() >> hmm_range_register() >> hmm_range_fault() >> hmm_range_unregister() >> up_read(&mm->mmap_sem) >> >> I don't think it is a bug to unlock mmap_sem and then unregister, >> it is just more consistent nesting. > > So this is not the usage pattern with HMM usage pattern is: > > hmm_range_register() > hmm_range_fault() > hmm_range_unregister() > > The hmm_vma_fault() is gonne so this patch here break thing. > > See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3 The patch series is on top of v5.1-rc6-mmotm-2019-04-25-16-30. hmm_vma_fault() is defined there and in your hmm-5.2-v3 branch as a backward compatibility transition function in include/linux/hmm.h. So I agree the new API is to use hmm_range_register(), etc. This is intended to cover the transition period. Note that hmm_vma_fault() is being called from drivers/gpu/drm/nouveau/nouveau_svm.c in both trees.
On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote: > From: Ralph Campbell <rcampbell@nvidia.com> > > The helper function hmm_vma_fault() calls hmm_range_register() but is > missing a call to hmm_range_unregister() in one of the error paths. > This leads to a reference count leak and ultimately a memory leak on > struct hmm. > > Always call hmm_range_unregister() if hmm_range_register() succeeded. > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Balbir Singh <bsingharora@gmail.com> > Cc: Dan Carpenter <dan.carpenter@oracle.com> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Souptick Joarder <jrdr.linux@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > --- > include/linux/hmm.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > diff --git a/include/linux/hmm.h b/include/linux/hmm.h > index 35a429621e1e..fa0671d67269 100644 > --- a/include/linux/hmm.h > +++ b/include/linux/hmm.h > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > return (int)ret; > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { > + hmm_range_unregister(range); > /* > * The mmap_sem was taken by driver we release it here and > * returns -EAGAIN which correspond to mmap_sem have been > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > ret = hmm_range_fault(range, block); > if (ret <= 0) { > + hmm_range_unregister(range); While this seems to be a clear improvement, it seems there is still a bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but never calls hmm_range_unregister() for its on stack range - and hmm_vma_fault() still returns with the range registered. As hmm_vma_fault() is only used by nouveau and is marked as deprecated, I think we need to fix nouveau, either by dropping hmm_range_fault(), or by adding the missing unregister to nouveau in this patch. Also, I see in linux-next that amdgpu_ttm.c has wrongly copied use of this deprecated API, including these bugs... amd folks: Can you please push a patch for your driver to stop using hmm_vma_fault() and correct the use-after free? Ideally I'd like to delete this function this merge cycle from hmm.git Also if you missed it, I'm running a clean hmm.git that you can pull into the AMD tree, if necessary, to get the changes that will go into 5.3 - if you need/wish to do this please consult with me before making a merge commit, thanks. See: https://lore.kernel.org/lkml/20190524124455.GB16845@ziepe.ca/ So Ralph, you'll need to resend this. Thanks, Jason
On 6/6/19 7:50 AM, Jason Gunthorpe wrote: > On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote: >> From: Ralph Campbell <rcampbell@nvidia.com> >> >> The helper function hmm_vma_fault() calls hmm_range_register() but is >> missing a call to hmm_range_unregister() in one of the error paths. >> This leads to a reference count leak and ultimately a memory leak on >> struct hmm. >> >> Always call hmm_range_unregister() if hmm_range_register() succeeded. >> >> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> >> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> >> Cc: John Hubbard <jhubbard@nvidia.com> >> Cc: Ira Weiny <ira.weiny@intel.com> >> Cc: Dan Williams <dan.j.williams@intel.com> >> Cc: Arnd Bergmann <arnd@arndb.de> >> Cc: Balbir Singh <bsingharora@gmail.com> >> Cc: Dan Carpenter <dan.carpenter@oracle.com> >> Cc: Matthew Wilcox <willy@infradead.org> >> Cc: Souptick Joarder <jrdr.linux@gmail.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> --- >> include/linux/hmm.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> diff --git a/include/linux/hmm.h b/include/linux/hmm.h >> index 35a429621e1e..fa0671d67269 100644 >> --- a/include/linux/hmm.h >> +++ b/include/linux/hmm.h >> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >> return (int)ret; >> >> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { >> + hmm_range_unregister(range); >> /* >> * The mmap_sem was taken by driver we release it here and >> * returns -EAGAIN which correspond to mmap_sem have been >> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >> >> ret = hmm_range_fault(range, block); >> if (ret <= 0) { >> + hmm_range_unregister(range); > > While this seems to be a clear improvement, it seems there is still a > bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but > never calls hmm_range_unregister() for its on stack range - and > hmm_vma_fault() still returns with the range registered. > > As hmm_vma_fault() is only used by nouveau and is marked as > deprecated, I think we need to fix nouveau, either by dropping > hmm_range_fault(), or by adding the missing unregister to nouveau in > this patch. I will send a patch for nouveau to use hmm_range_register() and hmm_range_fault() and do some testing with OpenCL. I can also send a separate patch to then remove hmm_vma_fault() but I guess that should be after AMD's changes. > Also, I see in linux-next that amdgpu_ttm.c has wrongly copied use of > this deprecated API, including these bugs... > > amd folks: Can you please push a patch for your driver to stop using > hmm_vma_fault() and correct the use-after free? Ideally I'd like to > delete this function this merge cycle from hmm.git > > Also if you missed it, I'm running a clean hmm.git that you can pull > into the AMD tree, if necessary, to get the changes that will go into > 5.3 - if you need/wish to do this please consult with me before making a > merge commit, thanks. See: > > https://lore.kernel.org/lkml/20190524124455.GB16845@ziepe.ca/ > > So Ralph, you'll need to resend this. > > Thanks, > Jason >
On Thu, Jun 06, 2019 at 12:44:36PM -0700, Ralph Campbell wrote: > > On 6/6/19 7:50 AM, Jason Gunthorpe wrote: > > On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote: > > > From: Ralph Campbell <rcampbell@nvidia.com> > > > > > > The helper function hmm_vma_fault() calls hmm_range_register() but is > > > missing a call to hmm_range_unregister() in one of the error paths. > > > This leads to a reference count leak and ultimately a memory leak on > > > struct hmm. > > > > > > Always call hmm_range_unregister() if hmm_range_register() succeeded. > > > > > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> > > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > > > Cc: John Hubbard <jhubbard@nvidia.com> > > > Cc: Ira Weiny <ira.weiny@intel.com> > > > Cc: Dan Williams <dan.j.williams@intel.com> > > > Cc: Arnd Bergmann <arnd@arndb.de> > > > Cc: Balbir Singh <bsingharora@gmail.com> > > > Cc: Dan Carpenter <dan.carpenter@oracle.com> > > > Cc: Matthew Wilcox <willy@infradead.org> > > > Cc: Souptick Joarder <jrdr.linux@gmail.com> > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > include/linux/hmm.h | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h > > > index 35a429621e1e..fa0671d67269 100644 > > > +++ b/include/linux/hmm.h > > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > > return (int)ret; > > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { > > > + hmm_range_unregister(range); > > > /* > > > * The mmap_sem was taken by driver we release it here and > > > * returns -EAGAIN which correspond to mmap_sem have been > > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) > > > ret = hmm_range_fault(range, block); > > > if (ret <= 0) { > > > + hmm_range_unregister(range); > > > > While this seems to be a clear improvement, it seems there is still a > > bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but > > never calls hmm_range_unregister() for its on stack range - and > > hmm_vma_fault() still returns with the range registered. > > > > As hmm_vma_fault() is only used by nouveau and is marked as > > deprecated, I think we need to fix nouveau, either by dropping > > hmm_range_fault(), or by adding the missing unregister to nouveau in > > this patch. > > I will send a patch for nouveau to use hmm_range_register() and > hmm_range_fault() and do some testing with OpenCL. wow, thanks, I'd like to also really like to send such a thing through hmm.git - do you know who the nouveau maintainers are so we can collaborate on patch planning this? > I can also send a separate patch to then remove hmm_vma_fault() > but I guess that should be after AMD's changes. Let us wait to hear back from AMD how they can consume hmm.git - I'd very much like to get everything done in one kernel cycle! Regards, Jason
On 6/6/19 12:54 PM, Jason Gunthorpe wrote: > On Thu, Jun 06, 2019 at 12:44:36PM -0700, Ralph Campbell wrote: >> >> On 6/6/19 7:50 AM, Jason Gunthorpe wrote: >>> On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote: >>>> From: Ralph Campbell <rcampbell@nvidia.com> >>>> >>>> The helper function hmm_vma_fault() calls hmm_range_register() but is >>>> missing a call to hmm_range_unregister() in one of the error paths. >>>> This leads to a reference count leak and ultimately a memory leak on >>>> struct hmm. >>>> >>>> Always call hmm_range_unregister() if hmm_range_register() succeeded. >>>> >>>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> >>>> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> >>>> Cc: John Hubbard <jhubbard@nvidia.com> >>>> Cc: Ira Weiny <ira.weiny@intel.com> >>>> Cc: Dan Williams <dan.j.williams@intel.com> >>>> Cc: Arnd Bergmann <arnd@arndb.de> >>>> Cc: Balbir Singh <bsingharora@gmail.com> >>>> Cc: Dan Carpenter <dan.carpenter@oracle.com> >>>> Cc: Matthew Wilcox <willy@infradead.org> >>>> Cc: Souptick Joarder <jrdr.linux@gmail.com> >>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>> include/linux/hmm.h | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h >>>> index 35a429621e1e..fa0671d67269 100644 >>>> +++ b/include/linux/hmm.h >>>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >>>> return (int)ret; >>>> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { >>>> + hmm_range_unregister(range); >>>> /* >>>> * The mmap_sem was taken by driver we release it here and >>>> * returns -EAGAIN which correspond to mmap_sem have been >>>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) >>>> ret = hmm_range_fault(range, block); >>>> if (ret <= 0) { >>>> + hmm_range_unregister(range); >>> >>> While this seems to be a clear improvement, it seems there is still a >>> bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but >>> never calls hmm_range_unregister() for its on stack range - and >>> hmm_vma_fault() still returns with the range registered. >>> >>> As hmm_vma_fault() is only used by nouveau and is marked as >>> deprecated, I think we need to fix nouveau, either by dropping >>> hmm_range_fault(), or by adding the missing unregister to nouveau in >>> this patch. >> >> I will send a patch for nouveau to use hmm_range_register() and >> hmm_range_fault() and do some testing with OpenCL. > > wow, thanks, I'd like to also really like to send such a thing through > hmm.git - do you know who the nouveau maintainers are so we can > collaborate on patch planning this? Ben Skeggs <bskeggs@redhat.com> is the maintainer and nouveau@lists.freedesktop.org is the mailing list for changes. I'll be sure to CC them for the patch. >> I can also send a separate patch to then remove hmm_vma_fault() >> but I guess that should be after AMD's changes. > > Let us wait to hear back from AMD how they can consume hmm.git - I'd > very much like to get everything done in one kernel cycle! > > Regards, > Jason >
diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 35a429621e1e..fa0671d67269 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) return (int)ret; if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { + hmm_range_unregister(range); /* * The mmap_sem was taken by driver we release it here and * returns -EAGAIN which correspond to mmap_sem have been @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) ret = hmm_range_fault(range, block); if (ret <= 0) { + hmm_range_unregister(range); if (ret == -EBUSY || !ret) { /* Same as above, drop mmap_sem to match old API. */ up_read(&range->vma->vm_mm->mmap_sem); ret = -EBUSY; } else if (ret == -EAGAIN) ret = -EBUSY; - hmm_range_unregister(range); return ret; } return 0;