mbox series

[v2,0/5] userfaultfd: support access/write hints

Message ID 20220718114748.2623-1-namit@vmware.com (mailing list archive)
Headers show
Series userfaultfd: support access/write hints | expand

Message

Nadav Amit July 18, 2022, 11:47 a.m. UTC
From: Nadav Amit <namit@vmware.com>

This patch-set introduces access/write hints for userfaultfd. Unlike the
previous versions, the use of these hints in this version is 
limited. Yet, in order to keep introducing new features again and again,
hints are introduced for all of uffd related ioctls.

The access-hint is currently used to set the young bit, similarly to
do_set_pte(). This has no effect on x86, but may on arm64.

When a write-hint is provided on zeropage ioctl, a clear page is
allocated instead of mapping the zero-page.

Future patches would use the write-hint to decide whether to map the
writable pages on write-(un)protect ioctl.

Setting the access-bit and dirty-bit introduces a tradeoff. When the bit
is set access/write is faster, but memory reclamation might be slower.
Currently, in the common userfaultfd cases the access-bit is not set on
and the dirty-bit is set. This is a questionable behavior.

Allow userspace to control this behavior through hints access- and
write-likely hints. These hints are used to control access- and
dirty-bits. For zero-pages that with write-likely hint, allocate a clear
page instead of mapping the zero-page.

v1 -> v2:
* Leave dirty-bit as it was before [Peter Xu]

RFCv2 -> v1:
* Adding hints to zeropage and continue
* Fixing other issues pointed by David H. & Peter Xu
* Adding tests to ./run_vmtests.sh

Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>

Nadav Amit (5):
  userfaultfd: introduce uffd_flags
  userfaultfd: introduce access-likely mode for common operations
  userfaultfd: introduce write-likely mode for uffd operations
  userfaultfd: zero access/write hints
  selftest/userfaultfd: test read/write hints

 fs/userfaultfd.c                          |  77 ++++++++++++++--
 include/linux/hugetlb.h                   |   4 +-
 include/linux/shmem_fs.h                  |   8 +-
 include/linux/userfaultfd_k.h             |  26 ++++--
 include/uapi/linux/userfaultfd.h          |  31 ++++++-
 mm/hugetlb.c                              |   3 +-
 mm/internal.h                             |  13 +++
 mm/memory.c                               |  12 ---
 mm/shmem.c                                |   6 +-
 mm/userfaultfd.c                          | 103 +++++++++++++++-------
 tools/testing/selftests/vm/run_vmtests.sh |  16 ++--
 tools/testing/selftests/vm/userfaultfd.c  |  54 +++++++++++-
 12 files changed, 275 insertions(+), 78 deletions(-)

Comments

Peter Xu July 18, 2022, 8:05 p.m. UTC | #1
On Mon, Jul 18, 2022 at 04:47:44AM -0700, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> As the next patches are going to introduce more information that needs
> to be propagated regarding handled user requests, introduce uffd_flags
> that would be used to propagate this information.
> 
> Remove the unused UFFD_FLAGS_SET to avoid confusion in the constant
> names.
> 
> Introducing uffd flags also allows to avoid mm/userfaultfd from being
> using uapi (e.g., UFFDIO_COPY_MODE_WP).
> 
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>

Acked-by: Peter Xu <peterx@redhat.com>
David Hildenbrand July 22, 2022, 7:54 a.m. UTC | #2
On 18.07.22 13:47, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> As the next patches are going to introduce more information that needs
> to be propagated regarding handled user requests, introduce uffd_flags
> that would be used to propagate this information.
> 
> Remove the unused UFFD_FLAGS_SET to avoid confusion in the constant
> names.
> 
> Introducing uffd flags also allows to avoid mm/userfaultfd from being
> using uapi (e.g., UFFDIO_COPY_MODE_WP).
> 
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>

[...]

>  
>  int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> -			unsigned long len, bool enable_wp,
> -			atomic_t *mmap_changing)
> +			unsigned long len,
> +			atomic_t *mmap_changing, uffd_flags_t uffd_flags)
>  {
> +	bool enable_wp = uffd_flags & UFFD_FLAGS_WP;

Could be that this will trigger a sparse warnings, but I haven't fully
understood yet when/how sparse will start to complain. If so, this would
have to be

bool enable_wp = !!(uffd_flags & UFFD_FLAGS_WP);

I stumbled into something like that in
https://lore.kernel.org/lkml/202202252038.ij1YGn0d-lkp@intel.com/T/


Reviewed-by: David Hildenbrand <david@redhat.com>
Nadav Amit July 22, 2022, 6:47 p.m. UTC | #3
On Jul 22, 2022, at 12:54 AM, David Hildenbrand <david@redhat.com> wrote:

> ⚠ External Email
> 
> On 18.07.22 13:47, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> As the next patches are going to introduce more information that needs
>> to be propagated regarding handled user requests, introduce uffd_flags
>> that would be used to propagate this information.
>> 
>> Remove the unused UFFD_FLAGS_SET to avoid confusion in the constant
>> names.
>> 
>> Introducing uffd flags also allows to avoid mm/userfaultfd from being
>> using uapi (e.g., UFFDIO_COPY_MODE_WP).
>> 
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
> 
> [...]
> 
>> int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
>> -                     unsigned long len, bool enable_wp,
>> -                     atomic_t *mmap_changing)
>> +                     unsigned long len,
>> +                     atomic_t *mmap_changing, uffd_flags_t uffd_flags)
>> {
>> +     bool enable_wp = uffd_flags & UFFD_FLAGS_WP;
> 
> Could be that this will trigger a sparse warnings, but I haven't fully
> understood yet when/how sparse will start to complain. If so, this would
> have to be
> 
> bool enable_wp = !!(uffd_flags & UFFD_FLAGS_WP);
> 
> I stumbled into something like that in
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F202202252038.ij1YGn0d-lkp%40intel.com%2FT%2F&amp;data=05%7C01%7Cnamit%40vmware.com%7Cc237032d11f04972fdb708da6bb77ada%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637940733049609220%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=qW0xe6sS7PjP3papl890GoPTbZ97iE%2Ffztt1rA9t6%2F0%3D&amp;reserved=0

Oh, damn. Thanks for pointing it out. Sparse gives me segmentation faults
for some reason, but I guess it should be addressed - just in case.
Mike Rapoport July 23, 2022, 9:12 a.m. UTC | #4
On Mon, Jul 18, 2022 at 04:47:44AM -0700, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> As the next patches are going to introduce more information that needs
> to be propagated regarding handled user requests, introduce uffd_flags
> that would be used to propagate this information.
> 
> Remove the unused UFFD_FLAGS_SET to avoid confusion in the constant
> names.
> 
> Introducing uffd flags also allows to avoid mm/userfaultfd from being
> using uapi (e.g., UFFDIO_COPY_MODE_WP).
> 
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  fs/userfaultfd.c              | 22 +++++++++++---
>  include/linux/hugetlb.h       |  4 +--
>  include/linux/shmem_fs.h      |  8 +++--
>  include/linux/userfaultfd_k.h | 24 +++++++++------
>  mm/hugetlb.c                  |  3 +-
>  mm/shmem.c                    |  6 ++--
>  mm/userfaultfd.c              | 57 ++++++++++++++++++-----------------
>  7 files changed, 73 insertions(+), 51 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index e943370107d0..2ae24327beec 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1682,6 +1682,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>  	struct uffdio_copy uffdio_copy;
>  	struct uffdio_copy __user *user_uffdio_copy;
>  	struct userfaultfd_wake_range range;
> +	bool mode_wp;
> +	uffd_flags_t uffd_flags;
>  
>  	user_uffdio_copy = (struct uffdio_copy __user *) arg;
>  
> @@ -1708,10 +1710,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>  		goto out;
>  	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
>  		goto out;
> +
> +	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;

This seems to be the only place where mode_wp is used in this function. 
I'd just drop it, and set uffd_flags directly from uffdio_copy.mode. E.g.
something like

	uffd_flags_t uffd_flags = UFFD_FLAGS_NONE;

	...

	if (uffdio_copy.mode & UFFDIO_COPY_MODE_WP)
		uffd_flags = UFFD_FLAGS_WP;

Otherwise

Acked-by: Mike Rapoport <rppt@linux.ibm.com>

> +
> +	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
> +
>  	if (mmget_not_zero(ctx->mm)) {
>  		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
>  				   uffdio_copy.len, &ctx->mmap_changing,
> -				   uffdio_copy.mode);
> +				   uffd_flags);
>  		mmput(ctx->mm);
>  	} else {
>  		return -ESRCH;
Nadav Amit July 25, 2022, 5:23 p.m. UTC | #5
> On Jul 23, 2022, at 2:12 AM, Mike Rapoport <rppt@linux.ibm.com> wrote:
> 
> On Mon, Jul 18, 2022 at 04:47:44AM -0700, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> As the next patches are going to introduce more information that needs
>> to be propagated regarding handled user requests, introduce uffd_flags
>> that would be used to propagate this information.
>> 
>> Remove the unused UFFD_FLAGS_SET to avoid confusion in the constant
>> names.
>> 
>> Introducing uffd flags also allows to avoid mm/userfaultfd from being
>> using uapi (e.g., UFFDIO_COPY_MODE_WP).
>> 
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> fs/userfaultfd.c              | 22 +++++++++++---
>> include/linux/hugetlb.h       |  4 +--
>> include/linux/shmem_fs.h      |  8 +++--
>> include/linux/userfaultfd_k.h | 24 +++++++++------
>> mm/hugetlb.c                  |  3 +-
>> mm/shmem.c                    |  6 ++--
>> mm/userfaultfd.c              | 57 ++++++++++++++++++-----------------
>> 7 files changed, 73 insertions(+), 51 deletions(-)
>> 
>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
>> index e943370107d0..2ae24327beec 100644
>> --- a/fs/userfaultfd.c
>> +++ b/fs/userfaultfd.c
>> @@ -1682,6 +1682,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>> 	struct uffdio_copy uffdio_copy;
>> 	struct uffdio_copy __user *user_uffdio_copy;
>> 	struct userfaultfd_wake_range range;
>> +	bool mode_wp;
>> +	uffd_flags_t uffd_flags;
>> 
>> 	user_uffdio_copy = (struct uffdio_copy __user *) arg;
>> 
>> @@ -1708,10 +1710,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>> 		goto out;
>> 	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
>> 		goto out;
>> +
>> +	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
> 
> This seems to be the only place where mode_wp is used in this function. 
> I'd just drop it, and set uffd_flags directly from uffdio_copy.mode. E.g.
> something like
> 
> 	uffd_flags_t uffd_flags = UFFD_FLAGS_NONE;
> 
> 	...
> 
> 	if (uffdio_copy.mode & UFFDIO_COPY_MODE_WP)
> 		uffd_flags = UFFD_FLAGS_WP;

Good point; taken.