
[v2] userfaultfd: preserve user-supplied address tag in struct uffd_msg

Message ID 20210630232931.3779403-1-pcc@google.com
State New, archived
Series [v2] userfaultfd: preserve user-supplied address tag in struct uffd_msg

Commit Message

Peter Collingbourne June 30, 2021, 11:29 p.m. UTC
If a user program uses userfaultfd on ranges of heap memory, it may
end up passing a tagged pointer to the kernel in the range.start
field of the UFFDIO_REGISTER ioctl. This can happen when using an
MTE-capable allocator, or on Android if using the Tagged Pointers
feature for MTE readiness [1].

When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field
of struct uffd_msg. However, from the application's perspective,
the tagged address *is* the memory address, so if the application
is unaware of memory tags, it may get confused by receiving an
address that is, from its point of view, outside of the bounds of the
allocation. We observed this behavior in the kselftest for userfaultfd
[2] but other applications could have the same problem.
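To make the confusion concrete, here is a minimal C sketch. It assumes the arm64 Top Byte Ignore layout (tag in bits 63:56), and `untag()` is a hypothetical stand-in for the kernel stripping the tag. A tag-unaware bounds check against the tagged allocation rejects the reported fault address, because clearing the top byte makes it numerically smaller than the allocation's base.

```c
#include <stdint.h>

/* Simplified arm64 TBI model (assumption): the tag lives in bits 63:56. */
#define TAG_MASK (UINT64_C(0xff) << 56)

/* Hypothetical stand-in for the kernel stripping the tag from the address. */
static uint64_t untag(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/* A tag-unaware application's check on the uffd_msg fault address. */
static int in_allocation(uint64_t base, uint64_t len, uint64_t addr)
{
	return addr >= base && addr < base + len;
}
```

With `base = 0x0b00007f12340000`, a fault at `base + 0x1000` is reported as `0x7f12341000`, which lies below `base` and therefore fails the check.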

Fix this by remembering which tag was used to originally register the
userfaultfd and passing that tag back in fault.address. In a future
enhancement, we may want to pass back the original fault address,
but like SA_EXPOSE_TAGBITS, this should be guarded by a flag.
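The arithmetic the patch performs can be modeled in a few lines of C. This is a sketch rather than the kernel code: `untag()` is a hypothetical stand-in for `untagged_addr()` that assumes a user address with bit 55 clear, and the other function names are made up for illustration. The tag is the difference between the tagged and untagged `range.start`; adding it to the untagged fault address re-applies it; and wake-range comparisons must strip it again, as the `userfaultfd_wake_function()` hunk does.

```c
#include <stdint.h>

/* Assumed arm64 layout: the tag byte occupies bits 63:56. */
#define TAG_MASK (UINT64_C(0xff) << 56)

/* Hypothetical stand-in for untagged_addr() on a user address. */
static uint64_t untag(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/* UFFDIO_REGISTER: remember only the tag bits of range.start. */
static uint64_t register_tag(uint64_t range_start)
{
	return range_start - untag(range_start);
}

/* Fault delivery: vmf->address is untagged, so adding the tag re-applies it. */
static uint64_t report_fault(uint64_t fault_addr, uint64_t address_tag)
{
	return fault_addr + address_tag;
}

/* Wake: compare an untagged [start, start + len) against the message address. */
static int in_wake_range(uint64_t start, uint64_t len, uint64_t msg_addr)
{
	uint64_t addr = untag(msg_addr);

	/* len == 0 means wake all, as in userfaultfd_wake_function(). */
	return !len || (start <= addr && addr < start + len);
}
```

Storing the difference rather than a tag byte keeps the fault path to a single addition; an untagged registration yields `address_tag == 0` and leaves the reported address unchanged.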

[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0a25501b
Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Cc: <stable@vger.kernel.org> # 5.4
---
 Documentation/arm64/tagged-pointers.rst |  5 +++++
 fs/userfaultfd.c                        | 17 +++++++++++------
 include/linux/mm_types.h                |  3 ++-
 3 files changed, 18 insertions(+), 7 deletions(-)

Comments

Catalin Marinas July 1, 2021, 3:51 p.m. UTC | #1
Hi Peter,

On Wed, Jun 30, 2021 at 04:29:31PM -0700, Peter Collingbourne wrote:
> If a user program uses userfaultfd on ranges of heap memory, it may
> end up passing a tagged pointer to the kernel in the range.start
> field of the UFFDIO_REGISTER ioctl. This can happen when using an
> MTE-capable allocator, or on Android if using the Tagged Pointers
> feature for MTE readiness [1].

When we added the tagged addr ABI, we realised it's nearly impossible to
sort out all ioctls, so we added a note to the documentation that any
address other than pointer to user structures as arguments to ioctl()
should be untagged. Arguably, userfaultfd is not a random device but if
we place it in the same category as mmap/mremap/brk, those don't allow
tagged pointers either. And we do expect some apps to break when they
rely on malloc() to return untagged pointers.

> When a fault subsequently occurs, the tag is stripped from the fault
> address returned to the application in the fault.address field
> of struct uffd_msg. However, from the application's perspective,
> the tagged address *is* the memory address, so if the application
> is unaware of memory tags, it may get confused by receiving an
> address that is, from its point of view, outside of the bounds of the
> allocation. We observed this behavior in the kselftest for userfaultfd
> [2] but other applications could have the same problem.

Just curious, what's generating the tagged pointers in the kselftest? Is
it posix_memalign()?

> Fix this by remembering which tag was used to originally register the
> userfaultfd and passing that tag back in fault.address. In a future
> enhancement, we may want to pass back the original fault address,
> but like SA_EXPOSE_TAGBITS, this should be guarded by a flag.

I don't see exposing the tagged fault address vs making up a tag (from
the original request) that different. I find the former cleaner from an
ABI perspective, though it's a bit more intrusive to pass the tagged
address via handle_mm_fault().

My preference is to fix this in user-space entirely, by explicit
untagging of the malloc'ed pointer either before being passed to
userfaultfd or when handling the userfaultfd message. How common is it
for apps to register malloc'ed pointers with userfaultfd? I was hoping
that's more of an (anonymous) mmap() play.
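Catalin's user-space alternative could look roughly like the two hypothetical helpers below. They assume a 64-bit arm64 process with Top Byte Ignore and tags in bits 63:56. The first untags the allocation before it is stored in `uffdio_register.range.start`. The second keeps the tagged registration and re-applies the allocation's tag to `uffd_msg.arg.pagefault.address` when the message is handled.

```c
#include <stdint.h>

/* Assumption: 64-bit arm64 userspace, pointer tag in bits 63:56. */
#define PTR_TAG_MASK (UINT64_C(0xff) << 56)

/* Option 1: untag the pointer before filling uffdio_register.range.start. */
static uint64_t untagged_range_start(const void *p)
{
	return (uint64_t)(uintptr_t)p & ~PTR_TAG_MASK;
}

/*
 * Option 2: register the tagged pointer as-is, then give the untagged
 * fault address from uffd_msg the same tag as the allocation.
 */
static void *retag_fault_address(uint64_t fault_addr, const void *alloc)
{
	uint64_t tag = (uint64_t)(uintptr_t)alloc & PTR_TAG_MASK;

	return (void *)(uintptr_t)((fault_addr & ~PTR_TAG_MASK) | tag);
}
```

Option 1 keeps working if the kernel later starts rejecting tagged `range.start` values; option 2 relies on the current behavior of accepting them.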

Peter Collingbourne July 1, 2021, 5:50 p.m. UTC | #2
On Thu, Jul 1, 2021 at 8:51 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Hi Peter,
>
> On Wed, Jun 30, 2021 at 04:29:31PM -0700, Peter Collingbourne wrote:
> > If a user program uses userfaultfd on ranges of heap memory, it may
> > end up passing a tagged pointer to the kernel in the range.start
> > field of the UFFDIO_REGISTER ioctl. This can happen when using an
> > MTE-capable allocator, or on Android if using the Tagged Pointers
> > feature for MTE readiness [1].
>
> When we added the tagged addr ABI, we realised it's nearly impossible to
> sort out all ioctls, so we added a note to the documentation that any
> address other than pointer to user structures as arguments to ioctl()
> should be untagged. Arguably, userfaultfd is not a random device but if
> we place it in the same category as mmap/mremap/brk, those don't allow
> tagged pointers either. And we do expect some apps to break when they
> rely on malloc() to return untagged pointers.

Okay, so arguably another approach would be to make userfaultfd
consistent with mmap/mremap/brk and let the UFFDIO_REGISTER fail if
given a tagged address.

> > When a fault subsequently occurs, the tag is stripped from the fault
> > address returned to the application in the fault.address field
> > of struct uffd_msg. However, from the application's perspective,
> > the tagged address *is* the memory address, so if the application
> > is unaware of memory tags, it may get confused by receiving an
> > address that is, from its point of view, outside of the bounds of the
> > allocation. We observed this behavior in the kselftest for userfaultfd
> > [2] but other applications could have the same problem.
>
> Just curious, what's generating the tagged pointers in the kselftest? Is
> it posix_memalign()?

Yes, on Android that call goes into our allocator which returns the
tagged pointer.

> > Fix this by remembering which tag was used to originally register the
> > userfaultfd and passing that tag back in fault.address. In a future
> > enhancement, we may want to pass back the original fault address,
> > but like SA_EXPOSE_TAGBITS, this should be guarded by a flag.
>
> I don't see exposing the tagged fault address vs making up a tag (from
> the original request) that different. I find the former cleaner from an
> ABI perspective, though it's a bit more intrusive to pass the tagged
> address via handle_mm_fault().
>
> My preference is to fix this in user-space entirely, by explicit
> untagging of the malloc'ed pointer either before being passed to
> userfaultfd or when handling the userfaultfd message. How common is it
> for apps to register malloc'ed pointers with userfaultfd? I was hoping
> that's more of an (anonymous) mmap() play.

At least we haven't seen any apps do this so far, and the tagged
pointers feature has been in Android since last year's Android 11
release. So maybe we can say this is uncommon enough that we can just
let userspace handle this. So we would do:

1. Forbid tagged pointers in the ioctl as mentioned above.
2. Fix the kselftest (e.g. by untagging the pointer, or making it use
mmap). A fix would probably be needed here anyway because we noticed
that the test is later passing a tagged heap pointer to mremap (and
failing).

I'd be okay with this approach but I'd first like to hear from
Alistair and/or Lokesh since I think they favored the approach in my
patch.

Peter

Lokesh Gidra July 2, 2021, 5:27 a.m. UTC | #3
On Thu, Jul 1, 2021 at 10:50 AM Peter Collingbourne <pcc@google.com> wrote:
>
> On Thu, Jul 1, 2021 at 8:51 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> >
> > Hi Peter,
> >
> > On Wed, Jun 30, 2021 at 04:29:31PM -0700, Peter Collingbourne wrote:
> > > If a user program uses userfaultfd on ranges of heap memory, it may
> > > end up passing a tagged pointer to the kernel in the range.start
> > > field of the UFFDIO_REGISTER ioctl. This can happen when using an
> > > MTE-capable allocator, or on Android if using the Tagged Pointers
> > > feature for MTE readiness [1].
> >
> > When we added the tagged addr ABI, we realised it's nearly impossible to
> > sort out all ioctls, so we added a note to the documentation that any
> > address other than pointer to user structures as arguments to ioctl()
> > should be untagged. Arguably, userfaultfd is not a random device but if
> > we place it in the same category as mmap/mremap/brk, those don't allow
> > tagged pointers either. And we do expect some apps to break when they
> > rely on malloc() to return untagged pointers.
>
> Okay, so arguably another approach would be to make userfaultfd
> consistent with mmap/mremap/brk and let the UFFDIO_REGISTER fail if
> given a tagged address.
>
This approach also seems reasonable. The problem, as things stand
today, is that UFFDIO_REGISTER doesn't complain when a tagged pointer
is used to register a memory range. But the fault addresses eventually
returned in messages are untagged. If UFFDIO_REGISTER were to fail on
passing a tagged pointer, then the userspace can address the issue.

> > > When a fault subsequently occurs, the tag is stripped from the fault
> > > address returned to the application in the fault.address field
> > > of struct uffd_msg. However, from the application's perspective,
> > > the tagged address *is* the memory address, so if the application
> > > is unaware of memory tags, it may get confused by receiving an
> > > address that is, from its point of view, outside of the bounds of the
> > > allocation. We observed this behavior in the kselftest for userfaultfd
> > > [2] but other applications could have the same problem.
> >
> > Just curious, what's generating the tagged pointers in the kselftest? Is
> > it posix_memalign()?
>
> Yes, on Android that call goes into our allocator which returns the
> tagged pointer.
>
> > > Fix this by remembering which tag was used to originally register the
> > > userfaultfd and passing that tag back in fault.address. In a future
> > > enhancement, we may want to pass back the original fault address,
> > > but like SA_EXPOSE_TAGBITS, this should be guarded by a flag.
> >
> > I don't see exposing the tagged fault address vs making up a tag (from
> > the original request) that different. I find the former cleaner from an
> > ABI perspective, though it's a bit more intrusive to pass the tagged
> > address via handle_mm_fault().
> >
> > My preference is to fix this in user-space entirely, by explicit
> > untagging of the malloc'ed pointer either before being passed to
> > userfaultfd or when handling the userfaultfd message. How common is it
> > for apps to register malloc'ed pointers with userfaultfd? I was hoping
> > that's more of an (anonymous) mmap() play.

I think it is very unlikely for someone to use malloc'ed pointers with
userfaultfd.

>
> At least we haven't seen any apps do this so far, and the tagged
> pointers feature has been in Android since last year's Android 11
> release. So maybe we can say this is uncommon enough that we can just
> let userspace handle this. So we would do:
>
> 1. Forbid tagged pointers in the ioctl as mentioned above.
> 2. Fix the kselftest (e.g. by untagging the pointer, or making it use
> mmap). A fix would probably be needed here anyway because we noticed
> that the test is later passing a tagged heap pointer to mremap (and
> failing).

The plan looks good to me. Using mmap (instead of posix_memalign)
seems like a cleaner fix to the kselftest as compared to untagging the
pointer everywhere.
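A minimal sketch of the mmap()-based kselftest fix, assuming Linux; `alloc_uffd_area()` and `free_uffd_area()` are hypothetical helper names. An anonymous private mapping is page-aligned and does not carry a heap allocator's tag, so the same pointer can be passed to `UFFDIO_REGISTER` and later to `mremap()` without untagging.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Allocate the area to register with userfaultfd from mmap() instead of
 * posix_memalign(). Returns NULL on failure.
 */
static void *alloc_uffd_area(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Release an area obtained from alloc_uffd_area(); returns 0 on success. */
static int free_uffd_area(void *p, size_t len)
{
	return munmap(p, len);
}
```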
>
> I'd be okay with this approach but I'd first like to hear from
> Alistair and/or Lokesh since I think they favored the approach in my
> patch.
>
> Peter

Catalin Marinas July 2, 2021, 11:48 a.m. UTC | #4
On Thu, Jul 01, 2021 at 10:27:31PM -0700, Lokesh Gidra wrote:
> On Thu, Jul 1, 2021 at 10:50 AM Peter Collingbourne <pcc@google.com> wrote:
> > On Thu, Jul 1, 2021 at 8:51 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Wed, Jun 30, 2021 at 04:29:31PM -0700, Peter Collingbourne wrote:
> > > > If a user program uses userfaultfd on ranges of heap memory, it may
> > > > end up passing a tagged pointer to the kernel in the range.start
> > > > field of the UFFDIO_REGISTER ioctl. This can happen when using an
> > > > MTE-capable allocator, or on Android if using the Tagged Pointers
> > > > feature for MTE readiness [1].
> > >
> > > When we added the tagged addr ABI, we realised it's nearly impossible to
> > > sort out all ioctls, so we added a note to the documentation that any
> > > address other than pointer to user structures as arguments to ioctl()
> > > should be untagged. Arguably, userfaultfd is not a random device but if
> > > we place it in the same category as mmap/mremap/brk, those don't allow
> > > tagged pointers either. And we do expect some apps to break when they
> > > rely on malloc() to return untagged pointers.
> >
> > Okay, so arguably another approach would be to make userfaultfd
> > consistent with mmap/mremap/brk and let the UFFDIO_REGISTER fail if
> > given a tagged address.
>
> This approach also seems reasonable. The problem, as things stand
> today, is that UFFDIO_REGISTER doesn't complain when a tagged pointer
> is used to register a memory range. But the fault addresses eventually
> returned in messages are untagged. If UFFDIO_REGISTER were to fail on
> passing a tagged pointer, then the userspace can address the issue.

On the mmap etc. functions we get an error as a side effect of addr
being larger than TASK_SIZE (unless explicitly untagged). The
userfaultfd_register() function had similar checks but they were relaxed
by commit 7d0325749a6c ("userfaultfd: untag user pointers").
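The side effect described above can be illustrated with a simplified range check in the spirit of `validate_range()`, but with no untagging step. This is a model, not the kernel function: `model_validate_range()` is a hypothetical name, and `MODEL_TASK_SIZE` (48-bit VA) and `MODEL_PAGE_SIZE` (4 KiB) are assumed values. Because a tag sets bits above the top of the user address space, a tagged `start` fails the `TASK_SIZE` comparison and the caller gets `-EINVAL`.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration: 48-bit user VA, 4 KiB pages. */
#define MODEL_TASK_SIZE (UINT64_C(1) << 48)
#define MODEL_PAGE_SIZE UINT64_C(4096)

/* Range check with no untagging: a tagged start is >= TASK_SIZE. */
static int model_validate_range(uint64_t start, uint64_t len)
{
	if ((start | len) & (MODEL_PAGE_SIZE - 1))
		return -EINVAL; /* start and len must be page-aligned */
	if (!len)
		return -EINVAL;
	if (start >= MODEL_TASK_SIZE)
		return -EINVAL; /* catches any non-zero tag byte */
	if (len > MODEL_TASK_SIZE - start)
		return -EINVAL; /* range must end inside the address space */
	return 0;
}
```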

I think we should revert the above, or part of it. We did something
similar for mmap/mremap/brk when untagging the address broke glibc:
commit dcde237319e6 ("mm: Avoid creating virtual address aliases in
brk()/mmap()/mremap()").

Patch

diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst
index 19d284b70384..ec8e1f90b744 100644
--- a/Documentation/arm64/tagged-pointers.rst
+++ b/Documentation/arm64/tagged-pointers.rst
@@ -73,6 +73,11 @@  flag setting.
 Non-zero tags are never preserved in sigcontext.fault_address
 regardless of the SA_EXPOSE_TAGBITS flag setting.
 
+When using userfaultfd the address tag supplied in the range.start
+field of the UFFDIO_REGISTER ioctl is preserved and returned to
+userspace via the fault.address field of struct uffd_msg, and the
+tag of the original fault address is discarded.
+
 The architecture prevents the use of a tagged PC, so the upper byte will
 be set to a sign-extension of bit 55 on exception return.
 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index dd7a6c62b56f..adb0f7d0638a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -110,15 +110,15 @@  static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
 	struct userfaultfd_wake_range *range = key;
 	int ret;
 	struct userfaultfd_wait_queue *uwq;
-	unsigned long start, len;
+	unsigned long start, len, addr;
 
 	uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
 	ret = 0;
 	/* len == 0 means wake all */
 	start = range->start;
 	len = range->len;
-	if (len && (start > uwq->msg.arg.pagefault.address ||
-		    start + len <= uwq->msg.arg.pagefault.address))
+	addr = untagged_addr(uwq->msg.arg.pagefault.address);
+	if (len && (start > addr || start + len <= addr))
 		goto out;
 	WRITE_ONCE(uwq->waken, true);
 	/*
@@ -480,8 +480,9 @@  vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
-			ctx->features);
+	uwq.msg = userfault_msg(
+		vmf->address + vmf->vma->vm_userfaultfd_ctx.address_tag,
+		vmf->flags, reason, ctx->features);
 	uwq.ctx = ctx;
 	uwq.waken = false;
 
@@ -1287,7 +1288,7 @@  static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	unsigned long vm_flags, new_flags;
 	bool found;
 	bool basic_ioctls;
-	unsigned long start, end, vma_end;
+	unsigned long address_tag, start, end, vma_end;
 
 	user_uffdio_register = (struct uffdio_register __user *) arg;
 
@@ -1313,6 +1314,9 @@  static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vm_flags |= VM_UFFD_MINOR;
 	}
 
+	address_tag = uffdio_register.range.start -
+		      untagged_addr(uffdio_register.range.start);
+
 	ret = validate_range(mm, &uffdio_register.range.start,
 			     uffdio_register.range.len);
 	if (ret)
@@ -1462,6 +1466,7 @@  static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		 */
 		vma->vm_flags = new_flags;
 		vma->vm_userfaultfd_ctx.ctx = ctx;
+		vma->vm_userfaultfd_ctx.address_tag = address_tag;
 
 		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
 			hugetlb_unshare_all_pmds(vma);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8f0fb62e8975..cb93e5b17896 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -286,9 +286,10 @@  struct vm_region {
 };
 
 #ifdef CONFIG_USERFAULTFD
-#define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) { NULL, })
+#define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) { NULL, 0, })
 struct vm_userfaultfd_ctx {
 	struct userfaultfd_ctx *ctx;
+	unsigned long address_tag;
 };
 #else /* CONFIG_USERFAULTFD */
 #define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) {})