diff mbox series

[4/5] mm/hmm: hmm_vma_fault() doesn't always call hmm_range_unregister()

Message ID 20190506232942.12623-5-rcampbell@nvidia.com (mailing list archive)
State New, archived
Headers show
Series mm/hmm: HMM documentation updates and code fixes | expand

Commit Message

Ralph Campbell May 6, 2019, 11:29 p.m. UTC
From: Ralph Campbell <rcampbell@nvidia.com>

The helper function hmm_vma_fault() calls hmm_range_register() but is
missing a call to hmm_range_unregister() in one of the error paths.
This leads to a reference count leak and ultimately a memory leak on
struct hmm.

Always call hmm_range_unregister() if hmm_range_register() succeeded.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/hmm.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Souptick Joarder May 7, 2019, 1:15 p.m. UTC | #1
On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote:
>
> From: Ralph Campbell <rcampbell@nvidia.com>
>
> The helper function hmm_vma_fault() calls hmm_range_register() but is
> missing a call to hmm_range_unregister() in one of the error paths.
> This leads to a reference count leak and ultimately a memory leak on
> struct hmm.
>
> Always call hmm_range_unregister() if hmm_range_register() succeeded.

How about * Call hmm_range_unregister() in error path if
hmm_range_register() succeeded* ?

>
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/hmm.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 35a429621e1e..fa0671d67269 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>                 return (int)ret;
>
>         if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> +               hmm_range_unregister(range);
>                 /*
>                  * The mmap_sem was taken by driver we release it here and
>                  * returns -EAGAIN which correspond to mmap_sem have been
> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>
>         ret = hmm_range_fault(range, block);
>         if (ret <= 0) {
> +               hmm_range_unregister(range);

what is the reason to moved it up ?

>                 if (ret == -EBUSY || !ret) {
>                         /* Same as above, drop mmap_sem to match old API. */
>                         up_read(&range->vma->vm_mm->mmap_sem);
>                         ret = -EBUSY;
>                 } else if (ret == -EAGAIN)
>                         ret = -EBUSY;
> -               hmm_range_unregister(range);
>                 return ret;
>         }
>         return 0;
> --
> 2.20.1
>
Ralph Campbell May 7, 2019, 6:12 p.m. UTC | #2
On 5/7/19 6:15 AM, Souptick Joarder wrote:
> On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote:
>>
>> From: Ralph Campbell <rcampbell@nvidia.com>
>>
>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>> missing a call to hmm_range_unregister() in one of the error paths.
>> This leads to a reference count leak and ultimately a memory leak on
>> struct hmm.
>>
>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
> 
> How about * Call hmm_range_unregister() in error path if
> hmm_range_register() succeeded* ?

Sure, sounds good.
I'll include that in v2.

>>
>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Balbir Singh <bsingharora@gmail.com>
>> Cc: Dan Carpenter <dan.carpenter@oracle.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>   include/linux/hmm.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>> index 35a429621e1e..fa0671d67269 100644
>> --- a/include/linux/hmm.h
>> +++ b/include/linux/hmm.h
>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>                  return (int)ret;
>>
>>          if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>> +               hmm_range_unregister(range);
>>                  /*
>>                   * The mmap_sem was taken by driver we release it here and
>>                   * returns -EAGAIN which correspond to mmap_sem have been
>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>
>>          ret = hmm_range_fault(range, block);
>>          if (ret <= 0) {
>> +               hmm_range_unregister(range);
> 
> what is the reason to moved it up ?

I moved it up because the normal calling pattern is:
     down_read(&mm->mmap_sem)
     hmm_vma_fault()
         hmm_range_register()
         hmm_range_fault()
         hmm_range_unregister()
     up_read(&mm->mmap_sem)

I don't think it is a bug to unlock mmap_sem and then unregister,
it is just more consistent nesting.

>>                  if (ret == -EBUSY || !ret) {
>>                          /* Same as above, drop mmap_sem to match old API. */
>>                          up_read(&range->vma->vm_mm->mmap_sem);
>>                          ret = -EBUSY;
>>                  } else if (ret == -EAGAIN)
>>                          ret = -EBUSY;
>> -               hmm_range_unregister(range);
>>                  return ret;
>>          }
>>          return 0;
>> --
>> 2.20.1
>>
Souptick Joarder May 9, 2019, 4:42 a.m. UTC | #3
On Tue, May 7, 2019 at 11:42 PM Ralph Campbell <rcampbell@nvidia.com> wrote:
>
>
> On 5/7/19 6:15 AM, Souptick Joarder wrote:
> > On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote:
> >>
> >> From: Ralph Campbell <rcampbell@nvidia.com>
> >>
> >> The helper function hmm_vma_fault() calls hmm_range_register() but is
> >> missing a call to hmm_range_unregister() in one of the error paths.
> >> This leads to a reference count leak and ultimately a memory leak on
> >> struct hmm.
> >>
> >> Always call hmm_range_unregister() if hmm_range_register() succeeded.
> >
> > How about * Call hmm_range_unregister() in error path if
> > hmm_range_register() succeeded* ?
>
> Sure, sounds good.
> I'll include that in v2.

Thanks.
>
> >>
> >> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> >> Cc: John Hubbard <jhubbard@nvidia.com>
> >> Cc: Ira Weiny <ira.weiny@intel.com>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Cc: Arnd Bergmann <arnd@arndb.de>
> >> Cc: Balbir Singh <bsingharora@gmail.com>
> >> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> >> Cc: Matthew Wilcox <willy@infradead.org>
> >> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> ---
> >>   include/linux/hmm.h | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> >> index 35a429621e1e..fa0671d67269 100644
> >> --- a/include/linux/hmm.h
> >> +++ b/include/linux/hmm.h
> >> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> >>                  return (int)ret;
> >>
> >>          if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> >> +               hmm_range_unregister(range);
> >>                  /*
> >>                   * The mmap_sem was taken by driver we release it here and
> >>                   * returns -EAGAIN which correspond to mmap_sem have been
> >> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> >>
> >>          ret = hmm_range_fault(range, block);
> >>          if (ret <= 0) {
> >> +               hmm_range_unregister(range);
> >
> > what is the reason to moved it up ?
>
> I moved it up because the normal calling pattern is:
>      down_read(&mm->mmap_sem)
>      hmm_vma_fault()
>          hmm_range_register()
>          hmm_range_fault()
>          hmm_range_unregister()
>      up_read(&mm->mmap_sem)
>
> I don't think it is a bug to unlock mmap_sem and then unregister,
> it is just more consistent nesting.

Ok. I think, adding it in change log will be helpful :)
>
> >>                  if (ret == -EBUSY || !ret) {
> >>                          /* Same as above, drop mmap_sem to match old API. */
> >>                          up_read(&range->vma->vm_mm->mmap_sem);
> >>                          ret = -EBUSY;
> >>                  } else if (ret == -EAGAIN)
> >>                          ret = -EBUSY;
> >> -               hmm_range_unregister(range);
> >>                  return ret;
> >>          }
> >>          return 0;
> >> --
> >> 2.20.1
> >>
Jerome Glisse May 12, 2019, 3:07 p.m. UTC | #4
On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote:
> 
> On 5/7/19 6:15 AM, Souptick Joarder wrote:
> > On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote:
> > > 
> > > From: Ralph Campbell <rcampbell@nvidia.com>
> > > 
> > > The helper function hmm_vma_fault() calls hmm_range_register() but is
> > > missing a call to hmm_range_unregister() in one of the error paths.
> > > This leads to a reference count leak and ultimately a memory leak on
> > > struct hmm.
> > > 
> > > Always call hmm_range_unregister() if hmm_range_register() succeeded.
> > 
> > How about * Call hmm_range_unregister() in error path if
> > hmm_range_register() succeeded* ?
> 
> Sure, sounds good.
> I'll include that in v2.

NAK for the patch see below why

> 
> > > 
> > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: Balbir Singh <bsingharora@gmail.com>
> > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: Souptick Joarder <jrdr.linux@gmail.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > ---
> > >   include/linux/hmm.h | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > > index 35a429621e1e..fa0671d67269 100644
> > > --- a/include/linux/hmm.h
> > > +++ b/include/linux/hmm.h
> > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > >                  return (int)ret;
> > > 
> > >          if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> > > +               hmm_range_unregister(range);
> > >                  /*
> > >                   * The mmap_sem was taken by driver we release it here and
> > >                   * returns -EAGAIN which correspond to mmap_sem have been
> > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > 
> > >          ret = hmm_range_fault(range, block);
> > >          if (ret <= 0) {
> > > +               hmm_range_unregister(range);
> > 
> > what is the reason to moved it up ?
> 
> I moved it up because the normal calling pattern is:
>     down_read(&mm->mmap_sem)
>     hmm_vma_fault()
>         hmm_range_register()
>         hmm_range_fault()
>         hmm_range_unregister()
>     up_read(&mm->mmap_sem)
> 
> I don't think it is a bug to unlock mmap_sem and then unregister,
> it is just more consistent nesting.

So this is not the usage pattern with HMM usage pattern is:

hmm_range_register()
hmm_range_fault()
hmm_range_unregister()

The hmm_vma_fault() is gonne so this patch here break thing.

See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
Jerome Glisse May 12, 2019, 3:28 p.m. UTC | #5
On Sun, May 12, 2019 at 11:07:24AM -0400, Jerome Glisse wrote:
> On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote:
> > 
> > On 5/7/19 6:15 AM, Souptick Joarder wrote:
> > > On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote:
> > > > 
> > > > From: Ralph Campbell <rcampbell@nvidia.com>
> > > > 
> > > > The helper function hmm_vma_fault() calls hmm_range_register() but is
> > > > missing a call to hmm_range_unregister() in one of the error paths.
> > > > This leads to a reference count leak and ultimately a memory leak on
> > > > struct hmm.
> > > > 
> > > > Always call hmm_range_unregister() if hmm_range_register() succeeded.
> > > 
> > > How about * Call hmm_range_unregister() in error path if
> > > hmm_range_register() succeeded* ?
> > 
> > Sure, sounds good.
> > I'll include that in v2.
> 
> NAK for the patch see below why
> 
> > 
> > > > 
> > > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > > Cc: Balbir Singh <bsingharora@gmail.com>
> > > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > > Cc: Matthew Wilcox <willy@infradead.org>
> > > > Cc: Souptick Joarder <jrdr.linux@gmail.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > ---
> > > >   include/linux/hmm.h | 3 ++-
> > > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > > > index 35a429621e1e..fa0671d67269 100644
> > > > --- a/include/linux/hmm.h
> > > > +++ b/include/linux/hmm.h
> > > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > >                  return (int)ret;
> > > > 
> > > >          if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> > > > +               hmm_range_unregister(range);
> > > >                  /*
> > > >                   * The mmap_sem was taken by driver we release it here and
> > > >                   * returns -EAGAIN which correspond to mmap_sem have been
> > > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > > 
> > > >          ret = hmm_range_fault(range, block);
> > > >          if (ret <= 0) {
> > > > +               hmm_range_unregister(range);
> > > 
> > > what is the reason to moved it up ?
> > 
> > I moved it up because the normal calling pattern is:
> >     down_read(&mm->mmap_sem)
> >     hmm_vma_fault()
> >         hmm_range_register()
> >         hmm_range_fault()
> >         hmm_range_unregister()
> >     up_read(&mm->mmap_sem)
> > 
> > I don't think it is a bug to unlock mmap_sem and then unregister,
> > it is just more consistent nesting.
> 
> So this is not the usage pattern with HMM usage pattern is:
> 
> hmm_range_register()
> hmm_range_fault()
> hmm_range_unregister()
> 
> The hmm_vma_fault() is gonne so this patch here break thing.
> 
> See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3

Sorry not enough coffee on sunday morning, so yeah this patch
looks good except that you do not need to move it up.

Note that hmm_vma_fault() is a gonner once ODP to HMM is upstream
and i converted nouveau/amd to new API then we can remove that
one.

Cheers,
Jérôme
Ralph Campbell May 13, 2019, 5:23 p.m. UTC | #6
On 5/12/19 8:07 AM, Jerome Glisse wrote:
> On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote:
>>
>> On 5/7/19 6:15 AM, Souptick Joarder wrote:
>>> On Tue, May 7, 2019 at 5:00 AM <rcampbell@nvidia.com> wrote:
>>>>
>>>> From: Ralph Campbell <rcampbell@nvidia.com>
>>>>
>>>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>>>> missing a call to hmm_range_unregister() in one of the error paths.
>>>> This leads to a reference count leak and ultimately a memory leak on
>>>> struct hmm.
>>>>
>>>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>>>
>>> How about * Call hmm_range_unregister() in error path if
>>> hmm_range_register() succeeded* ?
>>
>> Sure, sounds good.
>> I'll include that in v2.
> 
> NAK for the patch see below why
> 
>>
>>>>
>>>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>>>> Cc: John Hubbard <jhubbard@nvidia.com>
>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>> Cc: Balbir Singh <bsingharora@gmail.com>
>>>> Cc: Dan Carpenter <dan.carpenter@oracle.com>
>>>> Cc: Matthew Wilcox <willy@infradead.org>
>>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> ---
>>>>    include/linux/hmm.h | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>>>> index 35a429621e1e..fa0671d67269 100644
>>>> --- a/include/linux/hmm.h
>>>> +++ b/include/linux/hmm.h
>>>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>>                   return (int)ret;
>>>>
>>>>           if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>>>> +               hmm_range_unregister(range);
>>>>                   /*
>>>>                    * The mmap_sem was taken by driver we release it here and
>>>>                    * returns -EAGAIN which correspond to mmap_sem have been
>>>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>>
>>>>           ret = hmm_range_fault(range, block);
>>>>           if (ret <= 0) {
>>>> +               hmm_range_unregister(range);
>>>
>>> what is the reason to moved it up ?
>>
>> I moved it up because the normal calling pattern is:
>>      down_read(&mm->mmap_sem)
>>      hmm_vma_fault()
>>          hmm_range_register()
>>          hmm_range_fault()
>>          hmm_range_unregister()
>>      up_read(&mm->mmap_sem)
>>
>> I don't think it is a bug to unlock mmap_sem and then unregister,
>> it is just more consistent nesting.
> 
> So this is not the usage pattern with HMM usage pattern is:
> 
> hmm_range_register()
> hmm_range_fault()
> hmm_range_unregister()
> 
> The hmm_vma_fault() is gonne so this patch here break thing.
> 
> See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3

The patch series is on top of v5.1-rc6-mmotm-2019-04-25-16-30.
hmm_vma_fault() is defined there and in your hmm-5.2-v3 branch as
a backward compatibility transition function in include/linux/hmm.h.
So I agree the new API is to use hmm_range_register(), etc.
This is intended to cover the transition period.
Note that hmm_vma_fault() is being called from
drivers/gpu/drm/nouveau/nouveau_svm.c in both trees.
Jason Gunthorpe June 6, 2019, 2:50 p.m. UTC | #7
On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote:
> From: Ralph Campbell <rcampbell@nvidia.com>
> 
> The helper function hmm_vma_fault() calls hmm_range_register() but is
> missing a call to hmm_range_unregister() in one of the error paths.
> This leads to a reference count leak and ultimately a memory leak on
> struct hmm.
> 
> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/hmm.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 35a429621e1e..fa0671d67269 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>  		return (int)ret;
>  
>  	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> +		hmm_range_unregister(range);
>  		/*
>  		 * The mmap_sem was taken by driver we release it here and
>  		 * returns -EAGAIN which correspond to mmap_sem have been
> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>  
>  	ret = hmm_range_fault(range, block);
>  	if (ret <= 0) {
> +		hmm_range_unregister(range);

While this seems to be a clear improvement, it seems there is still a
bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
never calls hmm_range_unregister() for its on stack range - and
hmm_vma_fault() still returns with the range registered.

As hmm_vma_fault() is only used by nouveau and is marked as
deprecated, I think we need to fix nouveau, either by dropping
hmm_range_fault(), or by adding the missing unregister to nouveau in
this patch.

Also, I see in linux-next that amdgpu_ttm.c has wrongly copied use of
this deprecated API, including these bugs...

amd folks: Can you please push a patch for your driver to stop using
hmm_vma_fault() and correct the use-after free? Ideally I'd like to
delete this function this merge cycle from hmm.git

Also if you missed it, I'm running a clean hmm.git that you can pull
into the AMD tree, if necessary, to get the changes that will go into
5.3 - if you need/wish to do this please consult with me before making a
merge commit, thanks. See:

 https://lore.kernel.org/lkml/20190524124455.GB16845@ziepe.ca/

So Ralph, you'll need to resend this.

Thanks,
Jason
Ralph Campbell June 6, 2019, 7:44 p.m. UTC | #8
On 6/6/19 7:50 AM, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote:
>> From: Ralph Campbell <rcampbell@nvidia.com>
>>
>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>> missing a call to hmm_range_unregister() in one of the error paths.
>> This leads to a reference count leak and ultimately a memory leak on
>> struct hmm.
>>
>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>>
>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Balbir Singh <bsingharora@gmail.com>
>> Cc: Dan Carpenter <dan.carpenter@oracle.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>   include/linux/hmm.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>> index 35a429621e1e..fa0671d67269 100644
>> --- a/include/linux/hmm.h
>> +++ b/include/linux/hmm.h
>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>   		return (int)ret;
>>   
>>   	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>> +		hmm_range_unregister(range);
>>   		/*
>>   		 * The mmap_sem was taken by driver we release it here and
>>   		 * returns -EAGAIN which correspond to mmap_sem have been
>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>   
>>   	ret = hmm_range_fault(range, block);
>>   	if (ret <= 0) {
>> +		hmm_range_unregister(range);
> 
> While this seems to be a clear improvement, it seems there is still a
> bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
> never calls hmm_range_unregister() for its on stack range - and
> hmm_vma_fault() still returns with the range registered.
> 
> As hmm_vma_fault() is only used by nouveau and is marked as
> deprecated, I think we need to fix nouveau, either by dropping
> hmm_range_fault(), or by adding the missing unregister to nouveau in
> this patch.

I will send a patch for nouveau to use hmm_range_register() and
hmm_range_fault() and do some testing with OpenCL.
I can also send a separate patch to then remove hmm_vma_fault()
but I guess that should be after AMD's changes.

> Also, I see in linux-next that amdgpu_ttm.c has wrongly copied use of
> this deprecated API, including these bugs...
> 
> amd folks: Can you please push a patch for your driver to stop using
> hmm_vma_fault() and correct the use-after free? Ideally I'd like to
> delete this function this merge cycle from hmm.git
> 
> Also if you missed it, I'm running a clean hmm.git that you can pull
> into the AMD tree, if necessary, to get the changes that will go into
> 5.3 - if you need/wish to do this please consult with me before making a
> merge commit, thanks. See:
> 
>   https://lore.kernel.org/lkml/20190524124455.GB16845@ziepe.ca/
> 
> So Ralph, you'll need to resend this.
> 
> Thanks,
> Jason
>
Jason Gunthorpe June 6, 2019, 7:54 p.m. UTC | #9
On Thu, Jun 06, 2019 at 12:44:36PM -0700, Ralph Campbell wrote:
> 
> On 6/6/19 7:50 AM, Jason Gunthorpe wrote:
> > On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote:
> > > From: Ralph Campbell <rcampbell@nvidia.com>
> > > 
> > > The helper function hmm_vma_fault() calls hmm_range_register() but is
> > > missing a call to hmm_range_unregister() in one of the error paths.
> > > This leads to a reference count leak and ultimately a memory leak on
> > > struct hmm.
> > > 
> > > Always call hmm_range_unregister() if hmm_range_register() succeeded.
> > > 
> > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: Balbir Singh <bsingharora@gmail.com>
> > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: Souptick Joarder <jrdr.linux@gmail.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > >   include/linux/hmm.h | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > > index 35a429621e1e..fa0671d67269 100644
> > > +++ b/include/linux/hmm.h
> > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > >   		return (int)ret;
> > >   	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> > > +		hmm_range_unregister(range);
> > >   		/*
> > >   		 * The mmap_sem was taken by driver we release it here and
> > >   		 * returns -EAGAIN which correspond to mmap_sem have been
> > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > >   	ret = hmm_range_fault(range, block);
> > >   	if (ret <= 0) {
> > > +		hmm_range_unregister(range);
> > 
> > While this seems to be a clear improvement, it seems there is still a
> > bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
> > never calls hmm_range_unregister() for its on stack range - and
> > hmm_vma_fault() still returns with the range registered.
> > 
> > As hmm_vma_fault() is only used by nouveau and is marked as
> > deprecated, I think we need to fix nouveau, either by dropping
> > hmm_range_fault(), or by adding the missing unregister to nouveau in
> > this patch.
> 
> I will send a patch for nouveau to use hmm_range_register() and
> hmm_range_fault() and do some testing with OpenCL.

wow, thanks, I'd like to also really like to send such a thing through
hmm.git - do you know who the nouveau maintainers are so we can
collaborate on patch planning this?

> I can also send a separate patch to then remove hmm_vma_fault()
> but I guess that should be after AMD's changes.

Let us wait to hear back from AMD how they can consume hmm.git - I'd
very much like to get everything done in one kernel cycle!

Regards,
Jason
Ralph Campbell June 6, 2019, 9:08 p.m. UTC | #10
On 6/6/19 12:54 PM, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2019 at 12:44:36PM -0700, Ralph Campbell wrote:
>>
>> On 6/6/19 7:50 AM, Jason Gunthorpe wrote:
>>> On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@nvidia.com wrote:
>>>> From: Ralph Campbell <rcampbell@nvidia.com>
>>>>
>>>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>>>> missing a call to hmm_range_unregister() in one of the error paths.
>>>> This leads to a reference count leak and ultimately a memory leak on
>>>> struct hmm.
>>>>
>>>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>>>>
>>>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>>>> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>>>> Cc: John Hubbard <jhubbard@nvidia.com>
>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>> Cc: Balbir Singh <bsingharora@gmail.com>
>>>> Cc: Dan Carpenter <dan.carpenter@oracle.com>
>>>> Cc: Matthew Wilcox <willy@infradead.org>
>>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>    include/linux/hmm.h | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>>>> index 35a429621e1e..fa0671d67269 100644
>>>> +++ b/include/linux/hmm.h
>>>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>>    		return (int)ret;
>>>>    	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>>>> +		hmm_range_unregister(range);
>>>>    		/*
>>>>    		 * The mmap_sem was taken by driver we release it here and
>>>>    		 * returns -EAGAIN which correspond to mmap_sem have been
>>>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>>    	ret = hmm_range_fault(range, block);
>>>>    	if (ret <= 0) {
>>>> +		hmm_range_unregister(range);
>>>
>>> While this seems to be a clear improvement, it seems there is still a
>>> bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
>>> never calls hmm_range_unregister() for its on stack range - and
>>> hmm_vma_fault() still returns with the range registered.
>>>
>>> As hmm_vma_fault() is only used by nouveau and is marked as
>>> deprecated, I think we need to fix nouveau, either by dropping
>>> hmm_range_fault(), or by adding the missing unregister to nouveau in
>>> this patch.
>>
>> I will send a patch for nouveau to use hmm_range_register() and
>> hmm_range_fault() and do some testing with OpenCL.
> 
> wow, thanks, I'd like to also really like to send such a thing through
> hmm.git - do you know who the nouveau maintainers are so we can
> collaborate on patch planning this?

Ben Skeggs <bskeggs@redhat.com> is the maintainer and
nouveau@lists.freedesktop.org is the mailing list for changes.
I'll be sure to CC them for the patch.

>> I can also send a separate patch to then remove hmm_vma_fault()
>> but I guess that should be after AMD's changes.
> 
> Let us wait to hear back from AMD how they can consume hmm.git - I'd
> very much like to get everything done in one kernel cycle!
> 
> Regards,
> Jason
>
diff mbox series

Patch

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 35a429621e1e..fa0671d67269 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -559,6 +559,7 @@  static inline int hmm_vma_fault(struct hmm_range *range, bool block)
 		return (int)ret;
 
 	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
+		hmm_range_unregister(range);
 		/*
 		 * The mmap_sem was taken by driver we release it here and
 		 * returns -EAGAIN which correspond to mmap_sem have been
@@ -570,13 +571,13 @@  static inline int hmm_vma_fault(struct hmm_range *range, bool block)
 
 	ret = hmm_range_fault(range, block);
 	if (ret <= 0) {
+		hmm_range_unregister(range);
 		if (ret == -EBUSY || !ret) {
 			/* Same as above, drop mmap_sem to match old API. */
 			up_read(&range->vma->vm_mm->mmap_sem);
 			ret = -EBUSY;
 		} else if (ret == -EAGAIN)
 			ret = -EBUSY;
-		hmm_range_unregister(range);
 		return ret;
 	}
 	return 0;