diff mbox series

[10/10] mm/hmm: add helpers for driver to safely take the mmap_sem

Message ID 20190129165428.3931-11-jglisse@redhat.com (mailing list archive)
State New, archived
Headers show
Series HMM updates for 5.1 | expand

Commit Message

Jerome Glisse Jan. 29, 2019, 4:54 p.m. UTC
From: Jérôme Glisse <jglisse@redhat.com>

The device driver context which holds reference to mirror and thus to
core hmm struct might outlive the mm against which it was created. To
avoid every driver to check for that case provide an helper that check
if mm is still alive and take the mmap_sem in read mode if so. If the
mm have been destroy (mmu_notifier release call back did happen) then
we return -EINVAL so that calling code knows that it is trying to do
something against a mm that is no longer valid.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

Comments

John Hubbard Feb. 20, 2019, 9:59 p.m. UTC | #1
On 1/29/19 8:54 AM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> The device driver context which holds reference to mirror and thus to
> core hmm struct might outlive the mm against which it was created. To
> avoid every driver to check for that case provide an helper that check
> if mm is still alive and take the mmap_sem in read mode if so. If the
> mm have been destroy (mmu_notifier release call back did happen) then
> we return -EINVAL so that calling code knows that it is trying to do
> something against a mm that is no longer valid.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> ---
>   include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index b3850297352f..4a1454e3efba 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -438,6 +438,50 @@ struct hmm_mirror {
>   int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
>   void hmm_mirror_unregister(struct hmm_mirror *mirror);
>   
> +/*
> + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
> + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
> + *
> + * The device driver context which holds reference to mirror and thus to core
> + * hmm struct might outlive the mm against which it was created. To avoid every
> + * driver to check for that case provide an helper that check if mm is still
> + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
> + * (mmu_notifier release call back did happen) then we return -EINVAL so that
> + * calling code knows that it is trying to do something against a mm that is
> + * no longer valid.
> + */

Hi Jerome,

Are you thinking that, throughout the HMM API, there is a problem that
the mm may have gone away, and so driver code needs to be littered with
checks to ensure that mm is non-NULL? If so, why doesn't HMM take a
reference on mm->count?

This solution here cannot work. I think you'd need refcounting in order
to avoid this kind of problem. Just doing a check will always be open to
races (see below).


> +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
> +{
> +	struct mm_struct *mm;
> +
> +	/* Sanity check ... */
> +	if (!mirror || !mirror->hmm)
> +		return -EINVAL;
> +	/*
> +	 * Before trying to take the mmap_sem make sure the mm is still
> +	 * alive as device driver context might outlive the mm lifetime.
> +	 *
> +	 * FIXME: should we also check for mm that outlive its owning
> +	 * task ?
> +	 */
> +	mm = READ_ONCE(mirror->hmm->mm);
> +	if (mirror->hmm->dead || !mm)
> +		return -EINVAL;
> +

Nothing really prevents mirror->hmm->mm from changing to NULL right here.

> +	down_read(&mm->mmap_sem);
> +	return 0;
> +}
> +

...maybe better to just drop this patch from the series, until we see a
pattern of uses in the calling code.

thanks,
Jerome Glisse Feb. 20, 2019, 10:19 p.m. UTC | #2
On Wed, Feb 20, 2019 at 01:59:13PM -0800, John Hubbard wrote:
> On 1/29/19 8:54 AM, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > The device driver context which holds reference to mirror and thus to
> > core hmm struct might outlive the mm against which it was created. To
> > avoid every driver to check for that case provide an helper that check
> > if mm is still alive and take the mmap_sem in read mode if so. If the
> > mm have been destroy (mmu_notifier release call back did happen) then
> > we return -EINVAL so that calling code knows that it is trying to do
> > something against a mm that is no longer valid.
> > 
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Ralph Campbell <rcampbell@nvidia.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > ---
> >   include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
> >   1 file changed, 47 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > index b3850297352f..4a1454e3efba 100644
> > --- a/include/linux/hmm.h
> > +++ b/include/linux/hmm.h
> > @@ -438,6 +438,50 @@ struct hmm_mirror {
> >   int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
> >   void hmm_mirror_unregister(struct hmm_mirror *mirror);
> > +/*
> > + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
> > + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> > + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
> > + *
> > + * The device driver context which holds reference to mirror and thus to core
> > + * hmm struct might outlive the mm against which it was created. To avoid every
> > + * driver to check for that case provide an helper that check if mm is still
> > + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
> > + * (mmu_notifier release call back did happen) then we return -EINVAL so that
> > + * calling code knows that it is trying to do something against a mm that is
> > + * no longer valid.
> > + */
> 
> Hi Jerome,
> 
> Are you thinking that, throughout the HMM API, there is a problem that
> the mm may have gone away, and so driver code needs to be littered with
> checks to ensure that mm is non-NULL? If so, why doesn't HMM take a
> reference on mm->count?
> 
> This solution here cannot work. I think you'd need refcounting in order
> to avoid this kind of problem. Just doing a check will always be open to
> races (see below).
> 
> 
> > +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
> > +{
> > +	struct mm_struct *mm;
> > +
> > +	/* Sanity check ... */
> > +	if (!mirror || !mirror->hmm)
> > +		return -EINVAL;
> > +	/*
> > +	 * Before trying to take the mmap_sem make sure the mm is still
> > +	 * alive as device driver context might outlive the mm lifetime.
> > +	 *
> > +	 * FIXME: should we also check for mm that outlive its owning
> > +	 * task ?
> > +	 */
> > +	mm = READ_ONCE(mirror->hmm->mm);
> > +	if (mirror->hmm->dead || !mm)
> > +		return -EINVAL;
> > +
> 
> Nothing really prevents mirror->hmm->mm from changing to NULL right here.

This is really just to catch driver mistake, if driver does not call
hmm_mirror_unregister() then the !mm will never be true ie the
mirror->hmm->mm can not go NULL until the last reference to hmm_mirror
is gone.

> 
> > +	down_read(&mm->mmap_sem);
> > +	return 0;
> > +}
> > +
> 
> ...maybe better to just drop this patch from the series, until we see a
> pattern of uses in the calling code.

It use by nouveau now.

Cheers,
Jérôme
John Hubbard Feb. 20, 2019, 10:40 p.m. UTC | #3
On 2/20/19 2:19 PM, Jerome Glisse wrote:
> On Wed, Feb 20, 2019 at 01:59:13PM -0800, John Hubbard wrote:
>> On 1/29/19 8:54 AM, jglisse@redhat.com wrote:
>>> From: Jérôme Glisse <jglisse@redhat.com>
>>>
>>> The device driver context which holds reference to mirror and thus to
>>> core hmm struct might outlive the mm against which it was created. To
>>> avoid every driver to check for that case provide an helper that check
>>> if mm is still alive and take the mmap_sem in read mode if so. If the
>>> mm have been destroy (mmu_notifier release call back did happen) then
>>> we return -EINVAL so that calling code knows that it is trying to do
>>> something against a mm that is no longer valid.
>>>
>>> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>>> Cc: John Hubbard <jhubbard@nvidia.com>
>>> ---
>>>    include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
>>>    1 file changed, 47 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>>> index b3850297352f..4a1454e3efba 100644
>>> --- a/include/linux/hmm.h
>>> +++ b/include/linux/hmm.h
>>> @@ -438,6 +438,50 @@ struct hmm_mirror {
>>>    int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
>>>    void hmm_mirror_unregister(struct hmm_mirror *mirror);
>>> +/*
>>> + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
>>> + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
>>> + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
>>> + *
>>> + * The device driver context which holds reference to mirror and thus to core
>>> + * hmm struct might outlive the mm against which it was created. To avoid every
>>> + * driver to check for that case provide an helper that check if mm is still
>>> + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
>>> + * (mmu_notifier release call back did happen) then we return -EINVAL so that
>>> + * calling code knows that it is trying to do something against a mm that is
>>> + * no longer valid.
>>> + */
>>
>> Hi Jerome,
>>
>> Are you thinking that, throughout the HMM API, there is a problem that
>> the mm may have gone away, and so driver code needs to be littered with
>> checks to ensure that mm is non-NULL? If so, why doesn't HMM take a
>> reference on mm->count?
>>
>> This solution here cannot work. I think you'd need refcounting in order
>> to avoid this kind of problem. Just doing a check will always be open to
>> races (see below).
>>
>>
>>> +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
>>> +{
>>> +	struct mm_struct *mm;
>>> +
>>> +	/* Sanity check ... */
>>> +	if (!mirror || !mirror->hmm)
>>> +		return -EINVAL;
>>> +	/*
>>> +	 * Before trying to take the mmap_sem make sure the mm is still
>>> +	 * alive as device driver context might outlive the mm lifetime.
>>> +	 *
>>> +	 * FIXME: should we also check for mm that outlive its owning
>>> +	 * task ?
>>> +	 */
>>> +	mm = READ_ONCE(mirror->hmm->mm);
>>> +	if (mirror->hmm->dead || !mm)
>>> +		return -EINVAL;
>>> +
>>
>> Nothing really prevents mirror->hmm->mm from changing to NULL right here.
> 
> This is really just to catch driver mistake, if driver does not call
> hmm_mirror_unregister() then the !mm will never be true ie the
> mirror->hmm->mm can not go NULL until the last reference to hmm_mirror
> is gone.

In that case, then this again seems unnecessary, and in fact undesirable.
If the driver code has a bug, then let's let the backtrace from a NULL
dereference just happen, loud and clear.

This patch, at best, hides bugs. And it adds code that should simply be
unnecessary, so I don't like it. :)  Let's make it go away.

> 
>>
>>> +	down_read(&mm->mmap_sem);
>>> +	return 0;
>>> +}
>>> +
>>
>> ...maybe better to just drop this patch from the series, until we see a
>> pattern of uses in the calling code.
> 
> It use by nouveau now.

Maybe you'd have to remove that use case in a couple steps, depending on the
order that patches are going in.


thanks,
Jerome Glisse Feb. 20, 2019, 11:09 p.m. UTC | #4
On Wed, Feb 20, 2019 at 02:40:20PM -0800, John Hubbard wrote:
> On 2/20/19 2:19 PM, Jerome Glisse wrote:
> > On Wed, Feb 20, 2019 at 01:59:13PM -0800, John Hubbard wrote:
> > > On 1/29/19 8:54 AM, jglisse@redhat.com wrote:
> > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > 
> > > > The device driver context which holds reference to mirror and thus to
> > > > core hmm struct might outlive the mm against which it was created. To
> > > > avoid every driver to check for that case provide an helper that check
> > > > if mm is still alive and take the mmap_sem in read mode if so. If the
> > > > mm have been destroy (mmu_notifier release call back did happen) then
> > > > we return -EINVAL so that calling code knows that it is trying to do
> > > > something against a mm that is no longer valid.
> > > > 
> > > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: Ralph Campbell <rcampbell@nvidia.com>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > ---
> > > >    include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
> > > >    1 file changed, 47 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > > > index b3850297352f..4a1454e3efba 100644
> > > > --- a/include/linux/hmm.h
> > > > +++ b/include/linux/hmm.h
> > > > @@ -438,6 +438,50 @@ struct hmm_mirror {
> > > >    int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
> > > >    void hmm_mirror_unregister(struct hmm_mirror *mirror);
> > > > +/*
> > > > + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
> > > > + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
> > > > + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
> > > > + *
> > > > + * The device driver context which holds reference to mirror and thus to core
> > > > + * hmm struct might outlive the mm against which it was created. To avoid every
> > > > + * driver to check for that case provide an helper that check if mm is still
> > > > + * alive and take the mmap_sem in read mode if so. If the mm have been destroy
> > > > + * (mmu_notifier release call back did happen) then we return -EINVAL so that
> > > > + * calling code knows that it is trying to do something against a mm that is
> > > > + * no longer valid.
> > > > + */
> > > 
> > > Hi Jerome,
> > > 
> > > Are you thinking that, throughout the HMM API, there is a problem that
> > > the mm may have gone away, and so driver code needs to be littered with
> > > checks to ensure that mm is non-NULL? If so, why doesn't HMM take a
> > > reference on mm->count?
> > > 
> > > This solution here cannot work. I think you'd need refcounting in order
> > > to avoid this kind of problem. Just doing a check will always be open to
> > > races (see below).
> > > 
> > > 
> > > > +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
> > > > +{
> > > > +	struct mm_struct *mm;
> > > > +
> > > > +	/* Sanity check ... */
> > > > +	if (!mirror || !mirror->hmm)
> > > > +		return -EINVAL;
> > > > +	/*
> > > > +	 * Before trying to take the mmap_sem make sure the mm is still
> > > > +	 * alive as device driver context might outlive the mm lifetime.
> > > > +	 *
> > > > +	 * FIXME: should we also check for mm that outlive its owning
> > > > +	 * task ?
> > > > +	 */
> > > > +	mm = READ_ONCE(mirror->hmm->mm);
> > > > +	if (mirror->hmm->dead || !mm)
> > > > +		return -EINVAL;
> > > > +
> > > 
> > > Nothing really prevents mirror->hmm->mm from changing to NULL right here.
> > 
> > This is really just to catch driver mistake, if driver does not call
> > hmm_mirror_unregister() then the !mm will never be true ie the
> > mirror->hmm->mm can not go NULL until the last reference to hmm_mirror
> > is gone.
> 
> In that case, then this again seems unnecessary, and in fact undesirable.
> If the driver code has a bug, then let's let the backtrace from a NULL
> dereference just happen, loud and clear.
> 
> This patch, at best, hides bugs. And it adds code that should simply be
> unnecessary, so I don't like it. :)  Let's make it go away.
> 
> > 
> > > 
> > > > +	down_read(&mm->mmap_sem);
> > > > +	return 0;
> > > > +}
> > > > +
> > > 
> > > ...maybe better to just drop this patch from the series, until we see a
> > > pattern of uses in the calling code.
> > 
> > It use by nouveau now.
> 
> Maybe you'd have to remove that use case in a couple steps, depending on the
> order that patches are going in.

Well all that is needed is removing if (mirror->hmm->dead || !mm) return -EINVAL;
from functions so it does not have any ordering conflict with anything really.

Cheers,
Jérôme
diff mbox series

Patch

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index b3850297352f..4a1454e3efba 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -438,6 +438,50 @@  struct hmm_mirror {
 int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
 void hmm_mirror_unregister(struct hmm_mirror *mirror);
 
+/*
+ * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
+ * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
+ * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
+ *
+ * The device driver context which holds reference to mirror and thus to core
+ * hmm struct might outlive the mm against which it was created. To avoid every
+ * driver to check for that case provide an helper that check if mm is still
+ * alive and take the mmap_sem in read mode if so. If the mm have been destroy
+ * (mmu_notifier release call back did happen) then we return -EINVAL so that
+ * calling code knows that it is trying to do something against a mm that is
+ * no longer valid.
+ */
+static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
+{
+	struct mm_struct *mm;
+
+	/* Sanity check ... */
+	if (!mirror || !mirror->hmm)
+		return -EINVAL;
+	/*
+	 * Before trying to take the mmap_sem make sure the mm is still
+	 * alive as device driver context might outlive the mm lifetime.
+	 *
+	 * FIXME: should we also check for mm that outlive its owning
+	 * task ?
+	 */
+	mm = READ_ONCE(mirror->hmm->mm);
+	if (mirror->hmm->dead || !mm)
+		return -EINVAL;
+
+	down_read(&mm->mmap_sem);
+	return 0;
+}
+
+/*
+ * hmm_mirror_mm_up_read() - unlock the mmap_sem from read mode
+ * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
+ */
+static inline void hmm_mirror_mm_up_read(struct hmm_mirror *mirror)
+{
+	up_read(&mirror->hmm->mm->mmap_sem);
+}
+
 
 /*
  * To snapshot the CPU page table you first have to call hmm_range_register()
@@ -463,7 +507,7 @@  void hmm_mirror_unregister(struct hmm_mirror *mirror);
  *          if (ret)
  *              return ret;
  *
- *          down_read(mm->mmap_sem);
+ *          hmm_mirror_mm_down_read(mirror);
  *      again:
  *
  *          if (!hmm_range_wait_until_valid(&range, TIMEOUT)) {
@@ -476,13 +520,13 @@  void hmm_mirror_unregister(struct hmm_mirror *mirror);
  *
  *          ret = hmm_range_snapshot(&range); or hmm_range_fault(&range);
  *          if (ret == -EAGAIN) {
- *              down_read(mm->mmap_sem);
+ *              hmm_mirror_mm_down_read(mirror);
  *              goto again;
  *          } else if (ret == -EBUSY) {
  *              goto again;
  *          }
  *
- *          up_read(&mm->mmap_sem);
+ *          hmm_mirror_mm_up_read(mirror);
  *          if (ret) {
  *              hmm_range_unregister(range);
  *              return ret;