diff mbox

[vfio] vfio: Use get_user_pages_longterm correctly

Message ID CAPcyv4i9g-RGnYrAJgRzCyUp3FGGbNfabayPAraY2e+O20YS5A@mail.gmail.com (mailing list archive)
State RFC
Headers show

Commit Message

Dan Williams July 1, 2018, 4:10 p.m. UTC
On Sat, Jun 30, 2018 at 1:17 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Fri, 29 Jun 2018 11:31:50 -0600
> Jason Gunthorpe <jgg@mellanox.com> wrote:
>
>> The patch noted in the fixes below converted get_user_pages_fast() to
>> get_user_pages_longterm(), however the two calls differ in a few ways.
>>
>> First _fast() is documented to not require the mmap_sem, while _longterm()
>> is documented to need it. Hold the mmap sem as required.
>>
>> Second, _fast accepts an 'int write' while _longterm uses 'unsigned int
>> gup_flags', so the expression '!!(prot & IOMMU_WRITE)' is only working by
>> luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE
>> constant instead.
>>
>> Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Minor change as shown below, we don't need both branches coming up with
> the FOLL_WRITE flag in slightly different ways.
>
>> I noticed this while trying to review some RDMA code that was touching
>> our get_user_pages_longterm() call site and wanted to see what others
>> are doing..
>>
>> If someone can explain that get_user_pages_longterm() is safe to call
>> without the mmap_sem held I'd love to here it!
>
> Me too :-\
>
>> The comments in gup.c do seem to pretty clearly state the
>> __get_user_pages_locked() called internally by
>> get_user_pages_longterm() needs mmap_sem held..
>>
>> This is confusing me because this is the only
>> get_user_pages_longterm() callsite that doesn't hold the mmap_sem, and
>> if it really isn't required I'd like to remove it from the RDMA code
>> as well :)
>
> commit 0e81a8fc0411c9baec88f3f65154285fede473f6
> Author: Jason Gunthorpe <jgg@mellanox.com>
> Date:   Fri Jun 29 11:31:50 2018 -0600
>
>     vfio: Use get_user_pages_longterm correctly
>
>     The patch noted in the fixes below converted get_user_pages_fast() to
>     get_user_pages_longterm(), however the two calls differ in a few ways.
>
>     First _fast() is documented to not require the mmap_sem, while _longterm()
>     is documented to need it. Hold the mmap sem as required.
>
>     Second, _fast accepts an 'int write' while _longterm uses 'unsigned int
>     gup_flags', so the expression '!!(prot & IOMMU_WRITE)' is only working by
>     luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE
>     constant instead.
>
>     Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning")
>     Cc: <stable@vger.kernel.org>
>     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Ugh, yes.

Acked-by: Dan Williams <dan.j.williams@intel.com>

I think we also need the following clue bat for people like me in the future:

                BUG_ON(vmas);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Leon Romanovsky July 1, 2018, 4:43 p.m. UTC | #1
On Sun, Jul 01, 2018 at 09:10:57AM -0700, Dan Williams wrote:
> On Sat, Jun 30, 2018 at 1:17 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Fri, 29 Jun 2018 11:31:50 -0600
> > Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> >> The patch noted in the fixes below converted get_user_pages_fast() to
> >> get_user_pages_longterm(), however the two calls differ in a few ways.
> >>
> >> First _fast() is documented to not require the mmap_sem, while _longterm()
> >> is documented to need it. Hold the mmap sem as required.
> >>
> >> Second, _fast accepts an 'int write' while _longterm uses 'unsigned int
> >> gup_flags', so the expression '!!(prot & IOMMU_WRITE)' is only working by
> >> luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE
> >> constant instead.
> >>
> >> Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning")
> >> Cc: <stable@vger.kernel.org>
> >> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > Minor change as shown below, we don't need both branches coming up with
> > the FOLL_WRITE flag in slightly different ways.
> >
> >> I noticed this while trying to review some RDMA code that was touching
> >> our get_user_pages_longterm() call site and wanted to see what others
> >> are doing..
> >>
> >> If someone can explain that get_user_pages_longterm() is safe to call
> >> without the mmap_sem held I'd love to here it!
> >
> > Me too :-\
> >
> >> The comments in gup.c do seem to pretty clearly state the
> >> __get_user_pages_locked() called internally by
> >> get_user_pages_longterm() needs mmap_sem held..
> >>
> >> This is confusing me because this is the only
> >> get_user_pages_longterm() callsite that doesn't hold the mmap_sem, and
> >> if it really isn't required I'd like to remove it from the RDMA code
> >> as well :)
> >
> > commit 0e81a8fc0411c9baec88f3f65154285fede473f6
> > Author: Jason Gunthorpe <jgg@mellanox.com>
> > Date:   Fri Jun 29 11:31:50 2018 -0600
> >
> >     vfio: Use get_user_pages_longterm correctly
> >
> >     The patch noted in the fixes below converted get_user_pages_fast() to
> >     get_user_pages_longterm(), however the two calls differ in a few ways.
> >
> >     First _fast() is documented to not require the mmap_sem, while _longterm()
> >     is documented to need it. Hold the mmap sem as required.
> >
> >     Second, _fast accepts an 'int write' while _longterm uses 'unsigned int
> >     gup_flags', so the expression '!!(prot & IOMMU_WRITE)' is only working by
> >     luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE
> >     constant instead.
> >
> >     Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning")
> >     Cc: <stable@vger.kernel.org>
> >     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>
> Ugh, yes.
>
> Acked-by: Dan Williams <dan.j.williams@intel.com>
>
> I think we also need the following clue bat for people like me in the future:
>
> diff --git a/mm/gup.c b/mm/gup.c
> index b70d7ba7cc13..388a5c799fa5 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -873,6 +873,8 @@ static __always_inline long
> __get_user_pages_locked(struct task_struct *ts
>         long ret, pages_done;
>         bool lock_dropped;
>
> +       lockdep_assert_held_read(&mm->mmap_sem);
> +

It should be "lockdep_assert_held(&mm->mmap_sem);" and not as you wrote.
There is nothing wrong in holding exclusive lock while accessing gup.

>         if (locked) {
>                 /* if VM_FAULT_RETRY can be returned, vmas become invalid */
>                 BUG_ON(vmas);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams July 1, 2018, 5:34 p.m. UTC | #2
On Sun, Jul 1, 2018 at 9:43 AM, Leon Romanovsky <leonro@mellanox.com> wrote:
> On Sun, Jul 01, 2018 at 09:10:57AM -0700, Dan Williams wrote:
>> On Sat, Jun 30, 2018 at 1:17 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Fri, 29 Jun 2018 11:31:50 -0600
>> > Jason Gunthorpe <jgg@mellanox.com> wrote:
>> >
>> >> The patch noted in the fixes below converted get_user_pages_fast() to
>> >> get_user_pages_longterm(), however the two calls differ in a few ways.
>> >>
>> >> First _fast() is documented to not require the mmap_sem, while _longterm()
>> >> is documented to need it. Hold the mmap sem as required.
>> >>
>> >> Second, _fast accepts an 'int write' while _longterm uses 'unsigned int
>> >> gup_flags', so the expression '!!(prot & IOMMU_WRITE)' is only working by
>> >> luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE
>> >> constant instead.
>> >>
>> >> Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning")
>> >> Cc: <stable@vger.kernel.org>
>> >> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>> >> ---
>> >>  drivers/vfio/vfio_iommu_type1.c | 4 ++--
>> >>  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > Minor change as shown below, we don't need both branches coming up with
>> > the FOLL_WRITE flag in slightly different ways.
>> >
>> >> I noticed this while trying to review some RDMA code that was touching
>> >> our get_user_pages_longterm() call site and wanted to see what others
>> >> are doing..
>> >>
>> >> If someone can explain that get_user_pages_longterm() is safe to call
>> >> without the mmap_sem held I'd love to here it!
>> >
>> > Me too :-\
>> >
>> >> The comments in gup.c do seem to pretty clearly state the
>> >> __get_user_pages_locked() called internally by
>> >> get_user_pages_longterm() needs mmap_sem held..
>> >>
>> >> This is confusing me because this is the only
>> >> get_user_pages_longterm() callsite that doesn't hold the mmap_sem, and
>> >> if it really isn't required I'd like to remove it from the RDMA code
>> >> as well :)
>> >
>> > commit 0e81a8fc0411c9baec88f3f65154285fede473f6
>> > Author: Jason Gunthorpe <jgg@mellanox.com>
>> > Date:   Fri Jun 29 11:31:50 2018 -0600
>> >
>> >     vfio: Use get_user_pages_longterm correctly
>> >
>> >     The patch noted in the fixes below converted get_user_pages_fast() to
>> >     get_user_pages_longterm(), however the two calls differ in a few ways.
>> >
>> >     First _fast() is documented to not require the mmap_sem, while _longterm()
>> >     is documented to need it. Hold the mmap sem as required.
>> >
>> >     Second, _fast accepts an 'int write' while _longterm uses 'unsigned int
>> >     gup_flags', so the expression '!!(prot & IOMMU_WRITE)' is only working by
>> >     luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE
>> >     constant instead.
>> >
>> >     Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning")
>> >     Cc: <stable@vger.kernel.org>
>> >     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>> >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>
>> Ugh, yes.
>>
>> Acked-by: Dan Williams <dan.j.williams@intel.com>
>>
>> I think we also need the following clue bat for people like me in the future:
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index b70d7ba7cc13..388a5c799fa5 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -873,6 +873,8 @@ static __always_inline long
>> __get_user_pages_locked(struct task_struct *ts
>>         long ret, pages_done;
>>         bool lock_dropped;
>>
>> +       lockdep_assert_held_read(&mm->mmap_sem);
>> +
>
> It should be "lockdep_assert_held(&mm->mmap_sem);" and not as you wrote.
> There is nothing wrong in holding exclusive lock while accessing gup.

You're right, I thought for the purposes of lockdep that an exclusive
hold also means protection against new read locks, but the assertion
goes the other way and lockdep_assert_held_read() is incorrect.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 3, 2018, 4:54 p.m. UTC | #3
On Sun, Jul 01, 2018 at 09:10:57AM -0700, Dan Williams wrote:

> I think we also need the following clue bat for people like me in the future:
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index b70d7ba7cc13..388a5c799fa5 100644
> +++ b/mm/gup.c
> @@ -873,6 +873,8 @@ static __always_inline long
> __get_user_pages_locked(struct task_struct *ts
>         long ret, pages_done;
>         bool lock_dropped;
> 
> +       lockdep_assert_held_read(&mm->mmap_sem);
> +
>         if (locked) {
>                 /* if VM_FAULT_RETRY can be returned, vmas become invalid */
>                 BUG_ON(vmas);

Yes please, I had the same thought..

Dan, can you make this happen?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams July 3, 2018, 4:58 p.m. UTC | #4
On Tue, Jul 3, 2018 at 9:54 AM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> On Sun, Jul 01, 2018 at 09:10:57AM -0700, Dan Williams wrote:
>
>> I think we also need the following clue bat for people like me in the future:
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index b70d7ba7cc13..388a5c799fa5 100644
>> +++ b/mm/gup.c
>> @@ -873,6 +873,8 @@ static __always_inline long
>> __get_user_pages_locked(struct task_struct *ts
>>         long ret, pages_done;
>>         bool lock_dropped;
>>
>> +       lockdep_assert_held_read(&mm->mmap_sem);
>> +
>>         if (locked) {
>>                 /* if VM_FAULT_RETRY can be returned, vmas become invalid */
>>                 BUG_ON(vmas);
>
> Yes please, I had the same thought..
>
> Dan, can you make this happen?
>

Sure, I'll send the correct version of this to Andrew and cc you.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/mm/gup.c b/mm/gup.c
index b70d7ba7cc13..388a5c799fa5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -873,6 +873,8 @@  static __always_inline long
__get_user_pages_locked(struct task_struct *ts
        long ret, pages_done;
        bool lock_dropped;

+       lockdep_assert_held_read(&mm->mmap_sem);
+
        if (locked) {
                /* if VM_FAULT_RETRY can be returned, vmas become invalid */