diff mbox series

[v2,10/11] mm/hmm: add helpers for driver to safely take the mmap_sem v2

Message ID 20190325144011.10560-11-jglisse@redhat.com (mailing list archive)
State New, archived
Headers show
Series Improve HMM driver API v2 | expand

Commit Message

Jerome Glisse March 25, 2019, 2:40 p.m. UTC
From: Jérôme Glisse <jglisse@redhat.com>

The device driver context which holds reference to mirror and thus to
core hmm struct might outlive the mm against which it was created. To
avoid every driver to check for that case provide an helper that check
if mm is still alive and take the mmap_sem in read mode if so. If the
mm have been destroy (mmu_notifier release call back did happen) then
we return -EINVAL so that calling code knows that it is trying to do
something against a mm that is no longer valid.

Changes since v1:
    - removed bunch of useless check (if API is use with bogus argument
      better to fail loudly so user fix their code)

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

Comments

Ira Weiny March 28, 2019, 6:44 p.m. UTC | #1
On Thu, Mar 28, 2019 at 04:34:04PM -0700, John Hubbard wrote:
> On 3/28/19 4:24 PM, Jerome Glisse wrote:
> > On Thu, Mar 28, 2019 at 04:20:37PM -0700, John Hubbard wrote:
> >> On 3/28/19 4:05 PM, Jerome Glisse wrote:
> >>> On Thu, Mar 28, 2019 at 03:43:33PM -0700, John Hubbard wrote:
> >>>> On 3/28/19 3:40 PM, Jerome Glisse wrote:
> >>>>> On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote:
> >>>>>> On 3/28/19 3:08 PM, Jerome Glisse wrote:
> >>>>>>> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
> >>>>>>>> On 3/28/19 2:30 PM, Jerome Glisse wrote:
> >>>>>>>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
> >>>>>>>>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
> >>>>>>>>>>> From: Jérôme Glisse <jglisse@redhat.com>
> >>>>>> [...]
> >>>>>> OK, so let's either drop this patch, or if merge windows won't allow that,
> >>>>>> then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
> >>>>>> that does the same checks.
> >>>>>
> >>>>> RDMA depends on this, so does the nouveau patchset that convert to new API.
> >>>>> So i do not see reason to drop this. They are user for this they are posted
> >>>>> and i hope i explained properly the benefit.
> >>>>>
> >>>>> It is a common pattern. Yes it only save couple lines of code but down the
> >>>>> road i will also help for people working on the mmap_sem patchset.
> >>>>>
> >>>>
> >>>> It *adds* a couple of lines that are misleading, because they look like they
> >>>> make things safer, but they don't actually do so.
> >>>
> >>> It is not about safety, sorry if it confused you but there is nothing about
> >>> safety here, i can add a big fat comment that explains that there is no safety
> >>> here. The intention is to allow the page fault handler that potential have
> >>> hundred of page fault queue up to abort as soon as it sees that it is pointless
> >>> to keep faulting on a dying process.
> >>>
> >>> Again if we race it is _fine_ nothing bad will happen, we are just doing use-
> >>> less work that gonna be thrown on the floor and we are just slowing down the
> >>> process tear down.
> >>>
> >>
> >> In addition to a comment, how about naming this thing to indicate the above 
> >> intention?  I have a really hard time with this odd down_read() wrapper, which
> >> allows code to proceed without really getting a lock. It's just too wrong-looking.
> >> If it were instead named:
> >>
> >> 	hmm_is_exiting()
> > 
> > What about: hmm_lock_mmap_if_alive() ?
> > 
> 
> That's definitely better, but I want to vote for just doing a check, not 
> taking any locks.
> 
> I'm not super concerned about the exact name, but I really want a routine that
> just checks (and optionally asserts, via WARN or BUG), and that's *all*. Then
> drivers can scatter that around like pixie dust as they see fit. Maybe right before
> taking a lock, maybe in other places. Decoupled from locking.

I agree.  Names matter and any function which is called *_down_read and could
potentially not take the lock should be called try_*_down_read.  Furthermore
users should be checking the return values from any try_*.

It is also odd that we are calling "down/up" on something which is not a
semaphore.  So the user here needs to _know_ that they are really getting the
lock on the mm which sits behind the scenes.  What John is proposing is more
explicit when reading driver code.

Ira

> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
>
John Hubbard March 28, 2019, 8:54 p.m. UTC | #2
On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> The device driver context which holds reference to mirror and thus to
> core hmm struct might outlive the mm against which it was created. To
> avoid every driver to check for that case provide an helper that check
> if mm is still alive and take the mmap_sem in read mode if so. If the
> mm have been destroy (mmu_notifier release call back did happen) then
> we return -EINVAL so that calling code knows that it is trying to do
> something against a mm that is no longer valid.
> 
> Changes since v1:
>     - removed bunch of useless check (if API is use with bogus argument
>       better to fail loudly so user fix their code)
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index f3b919b04eda..5f9deaeb9d77 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -438,6 +438,50 @@ struct hmm_mirror {
>  int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
>  void hmm_mirror_unregister(struct hmm_mirror *mirror);
>  
> +/*
> + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
> + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
> + *
> + * The device driver context which holds reference to mirror and thus to core
> + * hmm struct might outlive the mm against which it was created. To avoid every
> + * driver to check for that case provide an helper that check if mm is still
> + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
> + * (mmu_notifier release call back did happen) then we return -EINVAL so that
> + * calling code knows that it is trying to do something against a mm that is
> + * no longer valid.
> + */
> +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)

Hi Jerome,

Let's please not do this. There are at least two problems here:

1. The hmm_mirror_mm_down_read() wrapper around down_read() requires a 
return value. This is counter to how locking is normally done: callers do
not normally have to check the return value of most locks (other than
trylocks). And sure enough, your own code below doesn't check the return value.
That is a pretty good illustration of why not to do this.

2. This is a weird place to randomly check for semi-unrelated state, such 
as "is HMM still alive". By that I mean, if you have to detect a problem
at down_read() time, then the problem could have existed both before and
after the call to this wrapper. So it is providing a false sense of security,
and it is therefore actually undesirable to add the code.

If you insist on having this wrapper, I think it should have approximately 
this form:

void hmm_mirror_mm_down_read(...)
{
	WARN_ON(...)
	down_read(...)
} 

> +{
> +	struct mm_struct *mm;
> +
> +	/* Sanity check ... */
> +	if (!mirror || !mirror->hmm)
> +		return -EINVAL;
> +	/*
> +	 * Before trying to take the mmap_sem make sure the mm is still
> +	 * alive as device driver context might outlive the mm lifetime.

Let's find another way, and a better place, to solve this problem.
Ref counting?

> +	 *
> +	 * FIXME: should we also check for mm that outlive its owning
> +	 * task ?
> +	 */
> +	mm = READ_ONCE(mirror->hmm->mm);
> +	if (mirror->hmm->dead || !mm)
> +		return -EINVAL;
> +
> +	down_read(&mm->mmap_sem);
> +	return 0;
> +}
> +
> +/*
> + * hmm_mirror_mm_up_read() - unlock the mmap_sem from read mode
> + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> + */
> +static inline void hmm_mirror_mm_up_read(struct hmm_mirror *mirror)
> +{
> +	up_read(&mirror->hmm->mm->mmap_sem);
> +}
> +
>  
>  /*
>   * To snapshot the CPU page table you first have to call hmm_range_register()
> @@ -463,7 +507,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
>   *          if (ret)
>   *              return ret;
>   *
> - *          down_read(mm->mmap_sem);
> + *          hmm_mirror_mm_down_read(mirror);

See? The normal down_read() code never needs to check a return value, so when
someone does a "simple" upgrade, it introduces a fatal bug here: if the wrapper
returns early, then the caller proceeds without having acquired the mmap_sem.

>   *      again:
>   *
>   *          if (!hmm_range_wait_until_valid(&range, TIMEOUT)) {
> @@ -476,13 +520,13 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
>   *
>   *          ret = hmm_range_snapshot(&range); or hmm_range_fault(&range);
>   *          if (ret == -EAGAIN) {
> - *              down_read(mm->mmap_sem);
> + *              hmm_mirror_mm_down_read(mirror);

Same problem here.


thanks,
Jerome Glisse March 28, 2019, 9:30 p.m. UTC | #3
On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > The device driver context which holds reference to mirror and thus to
> > core hmm struct might outlive the mm against which it was created. To
> > avoid every driver to check for that case provide an helper that check
> > if mm is still alive and take the mmap_sem in read mode if so. If the
> > mm have been destroy (mmu_notifier release call back did happen) then
> > we return -EINVAL so that calling code knows that it is trying to do
> > something against a mm that is no longer valid.
> > 
> > Changes since v1:
> >     - removed bunch of useless check (if API is use with bogus argument
> >       better to fail loudly so user fix their code)
> > 
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 47 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > index f3b919b04eda..5f9deaeb9d77 100644
> > --- a/include/linux/hmm.h
> > +++ b/include/linux/hmm.h
> > @@ -438,6 +438,50 @@ struct hmm_mirror {
> >  int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
> >  void hmm_mirror_unregister(struct hmm_mirror *mirror);
> >  
> > +/*
> > + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
> > + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> > + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
> > + *
> > + * The device driver context which holds reference to mirror and thus to core
> > + * hmm struct might outlive the mm against which it was created. To avoid every
> > + * driver to check for that case provide an helper that check if mm is still
> > + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
> > + * (mmu_notifier release call back did happen) then we return -EINVAL so that
> > + * calling code knows that it is trying to do something against a mm that is
> > + * no longer valid.
> > + */
> > +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
> 
> Hi Jerome,
> 
> Let's please not do this. There are at least two problems here:
> 
> 1. The hmm_mirror_mm_down_read() wrapper around down_read() requires a 
> return value. This is counter to how locking is normally done: callers do
> not normally have to check the return value of most locks (other than
> trylocks). And sure enough, your own code below doesn't check the return value.
> That is a pretty good illustration of why not to do this.

Please read the function description this is not about checking lock
return value it is about checking wether we are racing with process
destruction and avoid trying to take lock in such cases so that driver
do abort as quickly as possible when a process is being kill.

> 
> 2. This is a weird place to randomly check for semi-unrelated state, such 
> as "is HMM still alive". By that I mean, if you have to detect a problem
> at down_read() time, then the problem could have existed both before and
> after the call to this wrapper. So it is providing a false sense of security,
> and it is therefore actually undesirable to add the code.

It is not, this function is use in device page fault handler which will
happens asynchronously from CPU event or process lifetime when a process
is killed or is dying we do want to avoid useless page fault work and
we do want to avoid blocking the page fault queue of the device. This
function reports to the caller that the process is dying and that it
should just abort the page fault and do whatever other device specific
thing that needs to happen.

> 
> If you insist on having this wrapper, I think it should have approximately 
> this form:
> 
> void hmm_mirror_mm_down_read(...)
> {
> 	WARN_ON(...)
> 	down_read(...)
> } 

I do insist as it is useful and use by both RDMA and nouveau and the
above would kill the intent. The intent is do not try to take the lock
if the process is dying.


> 
> > +{
> > +	struct mm_struct *mm;
> > +
> > +	/* Sanity check ... */
> > +	if (!mirror || !mirror->hmm)
> > +		return -EINVAL;
> > +	/*
> > +	 * Before trying to take the mmap_sem make sure the mm is still
> > +	 * alive as device driver context might outlive the mm lifetime.
> 
> Let's find another way, and a better place, to solve this problem.
> Ref counting?

This has nothing to do with refcount or use after free or anthing
like that. It is just about checking wether we are about to do
something pointless. If the process is dying then it is pointless
to try to take the lock and it is pointless for the device driver
to trigger handle_mm_fault().

> 
> > +	 *
> > +	 * FIXME: should we also check for mm that outlive its owning
> > +	 * task ?
> > +	 */
> > +	mm = READ_ONCE(mirror->hmm->mm);
> > +	if (mirror->hmm->dead || !mm)
> > +		return -EINVAL;
> > +
> > +	down_read(&mm->mmap_sem);
> > +	return 0;
> > +}
> > +
> > +/*
> > + * hmm_mirror_mm_up_read() - unlock the mmap_sem from read mode
> > + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> > + */
> > +static inline void hmm_mirror_mm_up_read(struct hmm_mirror *mirror)
> > +{
> > +	up_read(&mirror->hmm->mm->mmap_sem);
> > +}
> > +
> >  
> >  /*
> >   * To snapshot the CPU page table you first have to call hmm_range_register()
> > @@ -463,7 +507,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
> >   *          if (ret)
> >   *              return ret;
> >   *
> > - *          down_read(mm->mmap_sem);
> > + *          hmm_mirror_mm_down_read(mirror);
> 
> See? The normal down_read() code never needs to check a return value, so when
> someone does a "simple" upgrade, it introduces a fatal bug here: if the wrapper
> returns early, then the caller proceeds without having acquired the mmap_sem.

That convertion is useless can't remember why i did it.

> >   *      again:
> >   *
> >   *          if (!hmm_range_wait_until_valid(&range, TIMEOUT)) {
> > @@ -476,13 +520,13 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
> >   *
> >   *          ret = hmm_range_snapshot(&range); or hmm_range_fault(&range);
> >   *          if (ret == -EAGAIN) {
> > - *              down_read(mm->mmap_sem);
> > + *              hmm_mirror_mm_down_read(mirror);
> 
> Same problem here.

Again useless i can't remember why i did that one. This helper is
intended to be use by driver.

Cheers,
Jérôme
John Hubbard March 28, 2019, 9:41 p.m. UTC | #4
On 3/28/19 2:30 PM, Jerome Glisse wrote:
> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
>>> From: Jérôme Glisse <jglisse@redhat.com>
>>>
>>> The device driver context which holds reference to mirror and thus to
>>> core hmm struct might outlive the mm against which it was created. To
>>> avoid every driver to check for that case provide an helper that check
>>> if mm is still alive and take the mmap_sem in read mode if so. If the
>>> mm have been destroy (mmu_notifier release call back did happen) then
>>> we return -EINVAL so that calling code knows that it is trying to do
>>> something against a mm that is no longer valid.
>>>
>>> Changes since v1:
>>>     - removed bunch of useless check (if API is use with bogus argument
>>>       better to fail loudly so user fix their code)
>>>
>>> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>>> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: John Hubbard <jhubbard@nvidia.com>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>> ---
>>>  include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
>>>  1 file changed, 47 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>>> index f3b919b04eda..5f9deaeb9d77 100644
>>> --- a/include/linux/hmm.h
>>> +++ b/include/linux/hmm.h
>>> @@ -438,6 +438,50 @@ struct hmm_mirror {
>>>  int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
>>>  void hmm_mirror_unregister(struct hmm_mirror *mirror);
>>>  
>>> +/*
>>> + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
>>> + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
>>> + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
>>> + *
>>> + * The device driver context which holds reference to mirror and thus to core
>>> + * hmm struct might outlive the mm against which it was created. To avoid every
>>> + * driver to check for that case provide an helper that check if mm is still
>>> + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
>>> + * (mmu_notifier release call back did happen) then we return -EINVAL so that
>>> + * calling code knows that it is trying to do something against a mm that is
>>> + * no longer valid.
>>> + */
>>> +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
>>
>> Hi Jerome,
>>
>> Let's please not do this. There are at least two problems here:
>>
>> 1. The hmm_mirror_mm_down_read() wrapper around down_read() requires a 
>> return value. This is counter to how locking is normally done: callers do
>> not normally have to check the return value of most locks (other than
>> trylocks). And sure enough, your own code below doesn't check the return value.
>> That is a pretty good illustration of why not to do this.
> 
> Please read the function description this is not about checking lock
> return value it is about checking wether we are racing with process
> destruction and avoid trying to take lock in such cases so that driver
> do abort as quickly as possible when a process is being kill.
> 
>>
>> 2. This is a weird place to randomly check for semi-unrelated state, such 
>> as "is HMM still alive". By that I mean, if you have to detect a problem
>> at down_read() time, then the problem could have existed both before and
>> after the call to this wrapper. So it is providing a false sense of security,
>> and it is therefore actually undesirable to add the code.
> 
> It is not, this function is use in device page fault handler which will
> happens asynchronously from CPU event or process lifetime when a process
> is killed or is dying we do want to avoid useless page fault work and
> we do want to avoid blocking the page fault queue of the device. This
> function reports to the caller that the process is dying and that it
> should just abort the page fault and do whatever other device specific
> thing that needs to happen.
> 

But it's inherently racy, to check for a condition outside of any lock, so again,
it's a false sense of security.

>>
>> If you insist on having this wrapper, I think it should have approximately 
>> this form:
>>
>> void hmm_mirror_mm_down_read(...)
>> {
>> 	WARN_ON(...)
>> 	down_read(...)
>> } 
> 
> I do insist as it is useful and use by both RDMA and nouveau and the
> above would kill the intent. The intent is do not try to take the lock
> if the process is dying.

Could you provide me a link to those examples so I can take a peek? I
am still convinced that this whole thing is a race condition at best.

> 
> 
>>
>>> +{
>>> +	struct mm_struct *mm;
>>> +
>>> +	/* Sanity check ... */
>>> +	if (!mirror || !mirror->hmm)
>>> +		return -EINVAL;
>>> +	/*
>>> +	 * Before trying to take the mmap_sem make sure the mm is still
>>> +	 * alive as device driver context might outlive the mm lifetime.
>>
>> Let's find another way, and a better place, to solve this problem.
>> Ref counting?
> 
> This has nothing to do with refcount or use after free or anthing
> like that. It is just about checking wether we are about to do
> something pointless. If the process is dying then it is pointless
> to try to take the lock and it is pointless for the device driver
> to trigger handle_mm_fault().

Well, what happens if you let such pointless code run anyway? 
Does everything still work? If yes, then we don't need this change.
If no, then we need a race-free version of this change.

thanks,
Jerome Glisse March 28, 2019, 10:08 p.m. UTC | #5
On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
> On 3/28/19 2:30 PM, Jerome Glisse wrote:
> > On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
> >> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
> >>> From: Jérôme Glisse <jglisse@redhat.com>
> >>>
> >>> The device driver context which holds reference to mirror and thus to
> >>> core hmm struct might outlive the mm against which it was created. To
> >>> avoid every driver to check for that case provide an helper that check
> >>> if mm is still alive and take the mmap_sem in read mode if so. If the
> >>> mm have been destroy (mmu_notifier release call back did happen) then
> >>> we return -EINVAL so that calling code knows that it is trying to do
> >>> something against a mm that is no longer valid.
> >>>
> >>> Changes since v1:
> >>>     - removed bunch of useless check (if API is use with bogus argument
> >>>       better to fail loudly so user fix their code)
> >>>
> >>> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> >>> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
> >>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>> Cc: John Hubbard <jhubbard@nvidia.com>
> >>> Cc: Dan Williams <dan.j.williams@intel.com>
> >>> ---
> >>>  include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
> >>>  1 file changed, 47 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> >>> index f3b919b04eda..5f9deaeb9d77 100644
> >>> --- a/include/linux/hmm.h
> >>> +++ b/include/linux/hmm.h
> >>> @@ -438,6 +438,50 @@ struct hmm_mirror {
> >>>  int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
> >>>  void hmm_mirror_unregister(struct hmm_mirror *mirror);
> >>>  
> >>> +/*
> >>> + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
> >>> + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> >>> + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
> >>> + *
> >>> + * The device driver context which holds reference to mirror and thus to core
> >>> + * hmm struct might outlive the mm against which it was created. To avoid every
> >>> + * driver to check for that case provide an helper that check if mm is still
> >>> + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
> >>> + * (mmu_notifier release call back did happen) then we return -EINVAL so that
> >>> + * calling code knows that it is trying to do something against a mm that is
> >>> + * no longer valid.
> >>> + */
> >>> +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
> >>
> >> Hi Jerome,
> >>
> >> Let's please not do this. There are at least two problems here:
> >>
> >> 1. The hmm_mirror_mm_down_read() wrapper around down_read() requires a 
> >> return value. This is counter to how locking is normally done: callers do
> >> not normally have to check the return value of most locks (other than
> >> trylocks). And sure enough, your own code below doesn't check the return value.
> >> That is a pretty good illustration of why not to do this.
> > 
> > Please read the function description this is not about checking lock
> > return value it is about checking wether we are racing with process
> > destruction and avoid trying to take lock in such cases so that driver
> > do abort as quickly as possible when a process is being kill.
> > 
> >>
> >> 2. This is a weird place to randomly check for semi-unrelated state, such 
> >> as "is HMM still alive". By that I mean, if you have to detect a problem
> >> at down_read() time, then the problem could have existed both before and
> >> after the call to this wrapper. So it is providing a false sense of security,
> >> and it is therefore actually undesirable to add the code.
> > 
> > It is not, this function is use in device page fault handler which will
> > happens asynchronously from CPU event or process lifetime when a process
> > is killed or is dying we do want to avoid useless page fault work and
> > we do want to avoid blocking the page fault queue of the device. This
> > function reports to the caller that the process is dying and that it
> > should just abort the page fault and do whatever other device specific
> > thing that needs to happen.
> > 
> 
> But it's inherently racy, to check for a condition outside of any lock, so again,
> it's a false sense of security.

Yes and race are fine here, this is to avoid useless work if we are
unlucky and we race and fail to see the destruction that is just
happening then it is fine we are just going to do useless work. So
we do not care about race here we just want to bailout early if we
can witness the process dying.

> 
> >>
> >> If you insist on having this wrapper, I think it should have approximately 
> >> this form:
> >>
> >> void hmm_mirror_mm_down_read(...)
> >> {
> >> 	WARN_ON(...)
> >> 	down_read(...)
> >> } 
> > 
> > I do insist as it is useful and use by both RDMA and nouveau and the
> > above would kill the intent. The intent is do not try to take the lock
> > if the process is dying.
> 
> Could you provide me a link to those examples so I can take a peek? I
> am still convinced that this whole thing is a race condition at best.

The race is fine and ok see:

https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-odp-v2&id=eebd4f3095290a16ebc03182e2d3ab5dfa7b05ec

which has been posted and i think i provided a link in the cover
letter to that post. The same patch exist for nouveau i need to
cleanup that tree and push it.

> > 
> > 
> >>
> >>> +{
> >>> +	struct mm_struct *mm;
> >>> +
> >>> +	/* Sanity check ... */
> >>> +	if (!mirror || !mirror->hmm)
> >>> +		return -EINVAL;
> >>> +	/*
> >>> +	 * Before trying to take the mmap_sem make sure the mm is still
> >>> +	 * alive as device driver context might outlive the mm lifetime.
> >>
> >> Let's find another way, and a better place, to solve this problem.
> >> Ref counting?
> > 
> > This has nothing to do with refcount or use after free or anthing
> > like that. It is just about checking wether we are about to do
> > something pointless. If the process is dying then it is pointless
> > to try to take the lock and it is pointless for the device driver
> > to trigger handle_mm_fault().
> 
> Well, what happens if you let such pointless code run anyway? 
> Does everything still work? If yes, then we don't need this change.
> If no, then we need a race-free version of this change.

Yes everything work, nothing bad can happen from a race, it will just
do useless work which never hurt anyone.

Cheers,
Jérôme
John Hubbard March 28, 2019, 10:25 p.m. UTC | #6
On 3/28/19 3:08 PM, Jerome Glisse wrote:
> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
>> On 3/28/19 2:30 PM, Jerome Glisse wrote:
>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
>>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
>>>>> From: Jérôme Glisse <jglisse@redhat.com>
[...]
>>
>>>>
>>>> If you insist on having this wrapper, I think it should have approximately 
>>>> this form:
>>>>
>>>> void hmm_mirror_mm_down_read(...)
>>>> {
>>>> 	WARN_ON(...)
>>>> 	down_read(...)
>>>> } 
>>>
>>> I do insist as it is useful and use by both RDMA and nouveau and the
>>> above would kill the intent. The intent is do not try to take the lock
>>> if the process is dying.
>>
>> Could you provide me a link to those examples so I can take a peek? I
>> am still convinced that this whole thing is a race condition at best.
> 
> The race is fine and ok see:
> 
> https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-odp-v2&id=eebd4f3095290a16ebc03182e2d3ab5dfa7b05ec
> 
> which has been posted and i think i provided a link in the cover
> letter to that post. The same patch exist for nouveau i need to
> cleanup that tree and push it.

Thanks for that link, and I apologize for not keeping up with that
other review thread.

Looking it over, hmm_mirror_mm_down_read() is only used in one place.
So, what you really want there is not a down_read() wrapper, but rather,
something like

	hmm_sanity_check()

, that ib_umem_odp_map_dma_pages() calls.


> 
>>>
>>>
>>>>
>>>>> +{
>>>>> +	struct mm_struct *mm;
>>>>> +
>>>>> +	/* Sanity check ... */
>>>>> +	if (!mirror || !mirror->hmm)
>>>>> +		return -EINVAL;
>>>>> +	/*
>>>>> +	 * Before trying to take the mmap_sem make sure the mm is still
>>>>> +	 * alive as device driver context might outlive the mm lifetime.
>>>>
>>>> Let's find another way, and a better place, to solve this problem.
>>>> Ref counting?
>>>
>>> This has nothing to do with refcount or use after free or anthing
>>> like that. It is just about checking wether we are about to do
>>> something pointless. If the process is dying then it is pointless
>>> to try to take the lock and it is pointless for the device driver
>>> to trigger handle_mm_fault().
>>
>> Well, what happens if you let such pointless code run anyway? 
>> Does everything still work? If yes, then we don't need this change.
>> If no, then we need a race-free version of this change.
> 
> Yes everything work, nothing bad can happen from a race, it will just
> do useless work which never hurt anyone.
> 

OK, so let's either drop this patch, or if merge windows won't allow that,
then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
that does the same checks.


thanks,
Jerome Glisse March 28, 2019, 10:40 p.m. UTC | #7
On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote:
> On 3/28/19 3:08 PM, Jerome Glisse wrote:
> > On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
> >> On 3/28/19 2:30 PM, Jerome Glisse wrote:
> >>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
> >>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
> >>>>> From: Jérôme Glisse <jglisse@redhat.com>
> [...]
> >>
> >>>>
> >>>> If you insist on having this wrapper, I think it should have approximately 
> >>>> this form:
> >>>>
> >>>> void hmm_mirror_mm_down_read(...)
> >>>> {
> >>>> 	WARN_ON(...)
> >>>> 	down_read(...)
> >>>> } 
> >>>
> >>> I do insist as it is useful and use by both RDMA and nouveau and the
> >>> above would kill the intent. The intent is do not try to take the lock
> >>> if the process is dying.
> >>
> >> Could you provide me a link to those examples so I can take a peek? I
> >> am still convinced that this whole thing is a race condition at best.
> > 
> > The race is fine and ok see:
> > 
> > https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-odp-v2&id=eebd4f3095290a16ebc03182e2d3ab5dfa7b05ec
> > 
> > which has been posted and i think i provided a link in the cover
> > letter to that post. The same patch exist for nouveau i need to
> > cleanup that tree and push it.
> 
> Thanks for that link, and I apologize for not keeping up with that
> other review thread.
> 
> Looking it over, hmm_mirror_mm_down_read() is only used in one place.
> So, what you really want there is not a down_read() wrapper, but rather,
> something like
> 
> 	hmm_sanity_check()
> 
> , that ib_umem_odp_map_dma_pages() calls.

Why ? The device driver pattern is:
    if (hmm_is_it_dying()) {
        // handle when process die and abort the fault ie useless
        // to call within HMM
    }
    down_read(mmap_sem);

This pattern is common within nouveau and RDMA and other device driver in
the work. Hence why i am replacing it with just one helper. Also it has the
added benefit that changes being discussed around the mmap sem will be easier
to do as it avoid having to update each driver but instead it can be done
just once for the HMM helpers.

> 
> 
> > 
> >>>
> >>>
> >>>>
> >>>>> +{
> >>>>> +	struct mm_struct *mm;
> >>>>> +
> >>>>> +	/* Sanity check ... */
> >>>>> +	if (!mirror || !mirror->hmm)
> >>>>> +		return -EINVAL;
> >>>>> +	/*
> >>>>> +	 * Before trying to take the mmap_sem make sure the mm is still
> >>>>> +	 * alive as device driver context might outlive the mm lifetime.
> >>>>
> >>>> Let's find another way, and a better place, to solve this problem.
> >>>> Ref counting?
> >>>
> >>> This has nothing to do with refcount or use after free or anthing
> >>> like that. It is just about checking wether we are about to do
> >>> something pointless. If the process is dying then it is pointless
> >>> to try to take the lock and it is pointless for the device driver
> >>> to trigger handle_mm_fault().
> >>
> >> Well, what happens if you let such pointless code run anyway? 
> >> Does everything still work? If yes, then we don't need this change.
> >> If no, then we need a race-free version of this change.
> > 
> > Yes everything work, nothing bad can happen from a race, it will just
> > do useless work which never hurt anyone.
> > 
> 
> OK, so let's either drop this patch, or if merge windows won't allow that,
> then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
> that does the same checks.

RDMA depends on this, so does the nouveau patchset that convert to new API.
So i do not see reason to drop this. They are user for this they are posted
and i hope i explained properly the benefit.

It is a common pattern. Yes it only save couple lines of code but down the
road i will also help for people working on the mmap_sem patchset.


Cheers,
Jérôme
John Hubbard March 28, 2019, 10:43 p.m. UTC | #8
On 3/28/19 3:40 PM, Jerome Glisse wrote:
> On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote:
>> On 3/28/19 3:08 PM, Jerome Glisse wrote:
>>> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
>>>> On 3/28/19 2:30 PM, Jerome Glisse wrote:
>>>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
>>>>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
>>>>>>> From: Jérôme Glisse <jglisse@redhat.com>
>> [...]
>>>>
>>>>>>
>>>>>> If you insist on having this wrapper, I think it should have approximately 
>>>>>> this form:
>>>>>>
>>>>>> void hmm_mirror_mm_down_read(...)
>>>>>> {
>>>>>> 	WARN_ON(...)
>>>>>> 	down_read(...)
>>>>>> } 
>>>>>
>>>>> I do insist as it is useful and use by both RDMA and nouveau and the
>>>>> above would kill the intent. The intent is do not try to take the lock
>>>>> if the process is dying.
>>>>
>>>> Could you provide me a link to those examples so I can take a peek? I
>>>> am still convinced that this whole thing is a race condition at best.
>>>
>>> The race is fine and ok see:
>>>
>>> https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-odp-v2&id=eebd4f3095290a16ebc03182e2d3ab5dfa7b05ec
>>>
>>> which has been posted and i think i provided a link in the cover
>>> letter to that post. The same patch exist for nouveau i need to
>>> cleanup that tree and push it.
>>
>> Thanks for that link, and I apologize for not keeping up with that
>> other review thread.
>>
>> Looking it over, hmm_mirror_mm_down_read() is only used in one place.
>> So, what you really want there is not a down_read() wrapper, but rather,
>> something like
>>
>> 	hmm_sanity_check()
>>
>> , that ib_umem_odp_map_dma_pages() calls.
> 
> Why ? The device driver pattern is:
>     if (hmm_is_it_dying()) {
>         // handle when process die and abort the fault ie useless
>         // to call within HMM
>     }
>     down_read(mmap_sem);
> 
> This pattern is common within nouveau and RDMA and other device driver in
> the work. Hence why i am replacing it with just one helper. Also it has the
> added benefit that changes being discussed around the mmap sem will be easier
> to do as it avoid having to update each driver but instead it can be done
> just once for the HMM helpers.

Yes, and I'm saying that the pattern is broken. Because it's racy. :)

>>>>>>> +{
>>>>>>> +	struct mm_struct *mm;
>>>>>>> +
>>>>>>> +	/* Sanity check ... */
>>>>>>> +	if (!mirror || !mirror->hmm)
>>>>>>> +		return -EINVAL;
>>>>>>> +	/*
>>>>>>> +	 * Before trying to take the mmap_sem make sure the mm is still
>>>>>>> +	 * alive as device driver context might outlive the mm lifetime.
>>>>>>
>>>>>> Let's find another way, and a better place, to solve this problem.
>>>>>> Ref counting?
>>>>>
>>>>> This has nothing to do with refcount or use after free or anthing
>>>>> like that. It is just about checking wether we are about to do
>>>>> something pointless. If the process is dying then it is pointless
>>>>> to try to take the lock and it is pointless for the device driver
>>>>> to trigger handle_mm_fault().
>>>>
>>>> Well, what happens if you let such pointless code run anyway? 
>>>> Does everything still work? If yes, then we don't need this change.
>>>> If no, then we need a race-free version of this change.
>>>
>>> Yes everything work, nothing bad can happen from a race, it will just
>>> do useless work which never hurt anyone.
>>>
>>
>> OK, so let's either drop this patch, or if merge windows won't allow that,
>> then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
>> that does the same checks.
> 
> RDMA depends on this, so does the nouveau patchset that convert to new API.
> So i do not see reason to drop this. They are user for this they are posted
> and i hope i explained properly the benefit.
> 
> It is a common pattern. Yes it only save couple lines of code but down the
> road i will also help for people working on the mmap_sem patchset.
> 

It *adds* a couple of lines that are misleading, because they look like they
make things safer, but they don't actually do so.

thanks,
Jerome Glisse March 28, 2019, 11:05 p.m. UTC | #9
On Thu, Mar 28, 2019 at 03:43:33PM -0700, John Hubbard wrote:
> On 3/28/19 3:40 PM, Jerome Glisse wrote:
> > On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote:
> >> On 3/28/19 3:08 PM, Jerome Glisse wrote:
> >>> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
> >>>> On 3/28/19 2:30 PM, Jerome Glisse wrote:
> >>>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
> >>>>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
> >>>>>>> From: Jérôme Glisse <jglisse@redhat.com>
> >> [...]
> >>>>
> >>>>>>
> >>>>>> If you insist on having this wrapper, I think it should have approximately 
> >>>>>> this form:
> >>>>>>
> >>>>>> void hmm_mirror_mm_down_read(...)
> >>>>>> {
> >>>>>> 	WARN_ON(...)
> >>>>>> 	down_read(...)
> >>>>>> } 
> >>>>>
> >>>>> I do insist as it is useful and use by both RDMA and nouveau and the
> >>>>> above would kill the intent. The intent is do not try to take the lock
> >>>>> if the process is dying.
> >>>>
> >>>> Could you provide me a link to those examples so I can take a peek? I
> >>>> am still convinced that this whole thing is a race condition at best.
> >>>
> >>> The race is fine and ok see:
> >>>
> >>> https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-odp-v2&id=eebd4f3095290a16ebc03182e2d3ab5dfa7b05ec
> >>>
> >>> which has been posted and i think i provided a link in the cover
> >>> letter to that post. The same patch exist for nouveau i need to
> >>> cleanup that tree and push it.
> >>
> >> Thanks for that link, and I apologize for not keeping up with that
> >> other review thread.
> >>
> >> Looking it over, hmm_mirror_mm_down_read() is only used in one place.
> >> So, what you really want there is not a down_read() wrapper, but rather,
> >> something like
> >>
> >> 	hmm_sanity_check()
> >>
> >> , that ib_umem_odp_map_dma_pages() calls.
> > 
> > Why ? The device driver pattern is:
> >     if (hmm_is_it_dying()) {
> >         // handle when process die and abort the fault ie useless
> >         // to call within HMM
> >     }
> >     down_read(mmap_sem);
> > 
> > This pattern is common within nouveau and RDMA and other device driver in
> > the work. Hence why i am replacing it with just one helper. Also it has the
> > added benefit that changes being discussed around the mmap sem will be easier
> > to do as it avoid having to update each driver but instead it can be done
> > just once for the HMM helpers.
> 
> Yes, and I'm saying that the pattern is broken. Because it's racy. :)

And i explained why it is fine, it just an optimization, in most case
it takes time to tear down a process and the device page fault handler
can be trigger while that happens, so instead of having it pile more
work on we can detect that even if it is racy. It is just about avoiding
useless work. There is nothing about correctness here. It does not need
to identify dying process with 100% accuracy. The fact that the process
is dying will be identified race free later on and it just means that in
the meantime we are doing useless works, potential tons of useless works.

They are hardware that can storm the page fault handler and we end up
with hundred of page fault queued up against a process that might be
dying. It is a big waste to go over all those fault and do works that
will be trown on the floor later on.

> 
> >>>>>>> +{
> >>>>>>> +	struct mm_struct *mm;
> >>>>>>> +
> >>>>>>> +	/* Sanity check ... */
> >>>>>>> +	if (!mirror || !mirror->hmm)
> >>>>>>> +		return -EINVAL;
> >>>>>>> +	/*
> >>>>>>> +	 * Before trying to take the mmap_sem make sure the mm is still
> >>>>>>> +	 * alive as device driver context might outlive the mm lifetime.
> >>>>>>
> >>>>>> Let's find another way, and a better place, to solve this problem.
> >>>>>> Ref counting?
> >>>>>
> >>>>> This has nothing to do with refcount or use after free or anthing
> >>>>> like that. It is just about checking wether we are about to do
> >>>>> something pointless. If the process is dying then it is pointless
> >>>>> to try to take the lock and it is pointless for the device driver
> >>>>> to trigger handle_mm_fault().
> >>>>
> >>>> Well, what happens if you let such pointless code run anyway? 
> >>>> Does everything still work? If yes, then we don't need this change.
> >>>> If no, then we need a race-free version of this change.
> >>>
> >>> Yes everything work, nothing bad can happen from a race, it will just
> >>> do useless work which never hurt anyone.
> >>>
> >>
> >> OK, so let's either drop this patch, or if merge windows won't allow that,
> >> then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
> >> that does the same checks.
> > 
> > RDMA depends on this, so does the nouveau patchset that convert to new API.
> > So i do not see reason to drop this. They are user for this they are posted
> > and i hope i explained properly the benefit.
> > 
> > It is a common pattern. Yes it only save couple lines of code but down the
> > road i will also help for people working on the mmap_sem patchset.
> > 
> 
> It *adds* a couple of lines that are misleading, because they look like they
> make things safer, but they don't actually do so.

It is not about safety, sorry if it confused you but there is nothing about
safety here, i can add a big fat comment that explains that there is no safety
here. The intention is to allow the page fault handler that potential have
hundred of page fault queue up to abort as soon as it sees that it is pointless
to keep faulting on a dying process.

Again if we race it is _fine_ nothing bad will happen, we are just doing use-
less work that gonna be thrown on the floor and we are just slowing down the
process tear down.

Cheers,
Jérôme
John Hubbard March 28, 2019, 11:20 p.m. UTC | #10
On 3/28/19 4:05 PM, Jerome Glisse wrote:
> On Thu, Mar 28, 2019 at 03:43:33PM -0700, John Hubbard wrote:
>> On 3/28/19 3:40 PM, Jerome Glisse wrote:
>>> On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote:
>>>> On 3/28/19 3:08 PM, Jerome Glisse wrote:
>>>>> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
>>>>>> On 3/28/19 2:30 PM, Jerome Glisse wrote:
>>>>>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
>>>>>>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
>>>>>>>>> From: Jérôme Glisse <jglisse@redhat.com>
>>>> [...]
>>>> OK, so let's either drop this patch, or if merge windows won't allow that,
>>>> then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
>>>> that does the same checks.
>>>
>>> RDMA depends on this, so does the nouveau patchset that convert to new API.
>>> So i do not see reason to drop this. They are user for this they are posted
>>> and i hope i explained properly the benefit.
>>>
>>> It is a common pattern. Yes it only save couple lines of code but down the
>>> road i will also help for people working on the mmap_sem patchset.
>>>
>>
>> It *adds* a couple of lines that are misleading, because they look like they
>> make things safer, but they don't actually do so.
> 
> It is not about safety, sorry if it confused you but there is nothing about
> safety here, i can add a big fat comment that explains that there is no safety
> here. The intention is to allow the page fault handler that potential have
> hundred of page fault queue up to abort as soon as it sees that it is pointless
> to keep faulting on a dying process.
> 
> Again if we race it is _fine_ nothing bad will happen, we are just doing use-
> less work that gonna be thrown on the floor and we are just slowing down the
> process tear down.
> 

In addition to a comment, how about naming this thing to indicate the above 
intention?  I have a really hard time with this odd down_read() wrapper, which
allows code to proceed without really getting a lock. It's just too wrong-looking.
If it were instead named:

	hmm_is_exiting()

and had a comment about why racy is OK, then I'd be a lot happier. :)


thanks,
Jerome Glisse March 28, 2019, 11:24 p.m. UTC | #11
On Thu, Mar 28, 2019 at 04:20:37PM -0700, John Hubbard wrote:
> On 3/28/19 4:05 PM, Jerome Glisse wrote:
> > On Thu, Mar 28, 2019 at 03:43:33PM -0700, John Hubbard wrote:
> >> On 3/28/19 3:40 PM, Jerome Glisse wrote:
> >>> On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote:
> >>>> On 3/28/19 3:08 PM, Jerome Glisse wrote:
> >>>>> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
> >>>>>> On 3/28/19 2:30 PM, Jerome Glisse wrote:
> >>>>>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
> >>>>>>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
> >>>>>>>>> From: Jérôme Glisse <jglisse@redhat.com>
> >>>> [...]
> >>>> OK, so let's either drop this patch, or if merge windows won't allow that,
> >>>> then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
> >>>> that does the same checks.
> >>>
> >>> RDMA depends on this, so does the nouveau patchset that convert to new API.
> >>> So i do not see reason to drop this. They are user for this they are posted
> >>> and i hope i explained properly the benefit.
> >>>
> >>> It is a common pattern. Yes it only save couple lines of code but down the
> >>> road i will also help for people working on the mmap_sem patchset.
> >>>
> >>
> >> It *adds* a couple of lines that are misleading, because they look like they
> >> make things safer, but they don't actually do so.
> > 
> > It is not about safety, sorry if it confused you but there is nothing about
> > safety here, i can add a big fat comment that explains that there is no safety
> > here. The intention is to allow the page fault handler that potential have
> > hundred of page fault queue up to abort as soon as it sees that it is pointless
> > to keep faulting on a dying process.
> > 
> > Again if we race it is _fine_ nothing bad will happen, we are just doing use-
> > less work that gonna be thrown on the floor and we are just slowing down the
> > process tear down.
> > 
> 
> In addition to a comment, how about naming this thing to indicate the above 
> intention?  I have a really hard time with this odd down_read() wrapper, which
> allows code to proceed without really getting a lock. It's just too wrong-looking.
> If it were instead named:
> 
> 	hmm_is_exiting()

What about: hmm_lock_mmap_if_alive() ?


> 
> and had a comment about why racy is OK, then I'd be a lot happier. :)

Will add fat comment.

Cheers,
Jérôme
John Hubbard March 28, 2019, 11:34 p.m. UTC | #12
On 3/28/19 4:24 PM, Jerome Glisse wrote:
> On Thu, Mar 28, 2019 at 04:20:37PM -0700, John Hubbard wrote:
>> On 3/28/19 4:05 PM, Jerome Glisse wrote:
>>> On Thu, Mar 28, 2019 at 03:43:33PM -0700, John Hubbard wrote:
>>>> On 3/28/19 3:40 PM, Jerome Glisse wrote:
>>>>> On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote:
>>>>>> On 3/28/19 3:08 PM, Jerome Glisse wrote:
>>>>>>> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote:
>>>>>>>> On 3/28/19 2:30 PM, Jerome Glisse wrote:
>>>>>>>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote:
>>>>>>>>>> On 3/25/19 7:40 AM, jglisse@redhat.com wrote:
>>>>>>>>>>> From: Jérôme Glisse <jglisse@redhat.com>
>>>>>> [...]
>>>>>> OK, so let's either drop this patch, or if merge windows won't allow that,
>>>>>> then *eventually* drop this patch. And instead, put in a hmm_sanity_check()
>>>>>> that does the same checks.
>>>>>
>>>>> RDMA depends on this, so does the nouveau patchset that convert to new API.
>>>>> So i do not see reason to drop this. They are user for this they are posted
>>>>> and i hope i explained properly the benefit.
>>>>>
>>>>> It is a common pattern. Yes it only save couple lines of code but down the
>>>>> road i will also help for people working on the mmap_sem patchset.
>>>>>
>>>>
>>>> It *adds* a couple of lines that are misleading, because they look like they
>>>> make things safer, but they don't actually do so.
>>>
>>> It is not about safety, sorry if it confused you but there is nothing about
>>> safety here, i can add a big fat comment that explains that there is no safety
>>> here. The intention is to allow the page fault handler that potential have
>>> hundred of page fault queue up to abort as soon as it sees that it is pointless
>>> to keep faulting on a dying process.
>>>
>>> Again if we race it is _fine_ nothing bad will happen, we are just doing use-
>>> less work that gonna be thrown on the floor and we are just slowing down the
>>> process tear down.
>>>
>>
>> In addition to a comment, how about naming this thing to indicate the above 
>> intention?  I have a really hard time with this odd down_read() wrapper, which
>> allows code to proceed without really getting a lock. It's just too wrong-looking.
>> If it were instead named:
>>
>> 	hmm_is_exiting()
> 
> What about: hmm_lock_mmap_if_alive() ?
> 

That's definitely better, but I want to vote for just doing a check, not 
taking any locks.

I'm not super concerned about the exact name, but I really want a routine that
just checks (and optionally asserts, via WARN or BUG), and that's *all*. Then
drivers can scatter that around like pixie dust as they see fit. Maybe right before
taking a lock, maybe in other places. Decoupled from locking.

thanks,
diff mbox series

Patch

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index f3b919b04eda..5f9deaeb9d77 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -438,6 +438,50 @@  struct hmm_mirror {
 int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
 void hmm_mirror_unregister(struct hmm_mirror *mirror);
 
+/*
+ * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
+ * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
+ * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
+ *
+ * The device driver context which holds reference to mirror and thus to core
+ * hmm struct might outlive the mm against which it was created. To avoid every
+ * driver to check for that case provide an helper that check if mm is still
+ * alive and take the mmap_sem in read mode if so. If the mm have been destroy
+ * (mmu_notifier release call back did happen) then we return -EINVAL so that
+ * calling code knows that it is trying to do something against a mm that is
+ * no longer valid.
+ */
+static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
+{
+	struct mm_struct *mm;
+
+	/* Sanity check ... */
+	if (!mirror || !mirror->hmm)
+		return -EINVAL;
+	/*
+	 * Before trying to take the mmap_sem make sure the mm is still
+	 * alive as device driver context might outlive the mm lifetime.
+	 *
+	 * FIXME: should we also check for mm that outlive its owning
+	 * task ?
+	 */
+	mm = READ_ONCE(mirror->hmm->mm);
+	if (mirror->hmm->dead || !mm)
+		return -EINVAL;
+
+	down_read(&mm->mmap_sem);
+	return 0;
+}
+
+/*
+ * hmm_mirror_mm_up_read() - unlock the mmap_sem from read mode
+ * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
+ */
+static inline void hmm_mirror_mm_up_read(struct hmm_mirror *mirror)
+{
+	up_read(&mirror->hmm->mm->mmap_sem);
+}
+
 
 /*
  * To snapshot the CPU page table you first have to call hmm_range_register()
@@ -463,7 +507,7 @@  void hmm_mirror_unregister(struct hmm_mirror *mirror);
  *          if (ret)
  *              return ret;
  *
- *          down_read(mm->mmap_sem);
+ *          hmm_mirror_mm_down_read(mirror);
  *      again:
  *
  *          if (!hmm_range_wait_until_valid(&range, TIMEOUT)) {
@@ -476,13 +520,13 @@  void hmm_mirror_unregister(struct hmm_mirror *mirror);
  *
  *          ret = hmm_range_snapshot(&range); or hmm_range_fault(&range);
  *          if (ret == -EAGAIN) {
- *              down_read(mm->mmap_sem);
+ *              hmm_mirror_mm_down_read(mirror);
  *              goto again;
  *          } else if (ret == -EBUSY) {
  *              goto again;
  *          }
  *
- *          up_read(&mm->mmap_sem);
+ *          hmm_mirror_mm_up_read(mirror);
  *          if (ret) {
  *              hmm_range_unregister(range);
  *              return ret;