diff mbox series

mm/userfaultfd: provide unmasked address on page-fault

Message ID 20211007235055.469587-1-namit@vmware.com (mailing list archive)
State New
Headers show
Series mm/userfaultfd: provide unmasked address on page-fault | expand

Commit Message

Nadav Amit Oct. 7, 2021, 11:50 p.m. UTC
From: Nadav Amit <namit@vmware.com>

Userfaultfd is supposed to provide the full address (i.e., unmasked) of
the faulting access back to userspace. However, that is not the case for
quite some time.

Even running "userfaultfd_demo" from the userfaultfd man page provides
the wrong output (and contradicts the man page). Notice that
"UFFD_EVENT_PAGEFAULT event" shows the masked address.

	Address returned by mmap() = 0x7fc5e30b3000

	fault_handler_thread():
	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
		(uffdio_copy.copy returned 4096)
	Read address 0x7fc5e30b300f in main(): A
	Read address 0x7fc5e30b340f in main(): A
	Read address 0x7fc5e30b380f in main(): A
	Read address 0x7fc5e30b3c0f in main(): A

Add a new "real_address" field to vmf to hold the unmasked address. It
is possible to keep the unmasked address in the existing address field
(and mask whenever necessary) instead, but this is likely to cause
backporting problems of this patch.

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Fixes: 1a29d85eb0f19 ("mm: use vmf->address instead of of vmf->virtual_address")
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 fs/userfaultfd.c   | 2 +-
 include/linux/mm.h | 3 ++-
 mm/memory.c        | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Oct. 8, 2021, 8:05 a.m. UTC | #1
On 08.10.21 01:50, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> Userfaultfd is supposed to provide the full address (i.e., unmasked) of
> the faulting access back to userspace. However, that is not the case for
> quite some time.
> 
> Even running "userfaultfd_demo" from the userfaultfd man page provides
> the wrong output (and contradicts the man page). Notice that
> "UFFD_EVENT_PAGEFAULT event" shows the masked address.
> 
> 	Address returned by mmap() = 0x7fc5e30b3000
> 
> 	fault_handler_thread():
> 	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
> 	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
> 		(uffdio_copy.copy returned 4096)
> 	Read address 0x7fc5e30b300f in main(): A
> 	Read address 0x7fc5e30b340f in main(): A
> 	Read address 0x7fc5e30b380f in main(): A
> 	Read address 0x7fc5e30b3c0f in main(): A
> 
> Add a new "real_address" field to vmf to hold the unmasked address. It
> is possible to keep the unmasked address in the existing address field
> (and mask whenever necessary) instead, but this is likely to cause
> backporting problems of this patch.

Can we be sure that no existing users will rely on this behavior that 
has been the case since end of 2016 IIRC, one year after UFFD was 
upstreamed? I do wonder what the official ABI nowadays is, because man 
pages aren't necessarily the source of truth.

I checked QEMU (postcopy live migration), and I think it should be fine 
with this change.

If we don't want to change the current ABI behavior, we could add a new 
feature flag to change behavior.

@Peter, what are your thoughts?
Nadav Amit Oct. 8, 2021, 10:02 p.m. UTC | #2
> On Oct 8, 2021, at 1:05 AM, David Hildenbrand <david@redhat.com> wrote:
> 
> On 08.10.21 01:50, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> Userfaultfd is supposed to provide the full address (i.e., unmasked) of
>> the faulting access back to userspace. However, that is not the case for
>> quite some time.
>> Even running "userfaultfd_demo" from the userfaultfd man page provides
>> the wrong output (and contradicts the man page). Notice that
>> "UFFD_EVENT_PAGEFAULT event" shows the masked address.
>> 	Address returned by mmap() = 0x7fc5e30b3000
>> 	fault_handler_thread():
>> 	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
>> 	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
>> 		(uffdio_copy.copy returned 4096)
>> 	Read address 0x7fc5e30b300f in main(): A
>> 	Read address 0x7fc5e30b340f in main(): A
>> 	Read address 0x7fc5e30b380f in main(): A
>> 	Read address 0x7fc5e30b3c0f in main(): A
>> Add a new "real_address" field to vmf to hold the unmasked address. It
>> is possible to keep the unmasked address in the existing address field
>> (and mask whenever necessary) instead, but this is likely to cause
>> backporting problems of this patch.
> 
> Can we be sure that no existing users will rely on this behavior that has been the case since end of 2016 IIRC, one year after UFFD was upstreamed?

Let me to blow off your mind: how do you be sure that the current behavior does not make applications to misbehave? It might cause performance issues as it did for me or hidden correctness issues.

> I do wonder what the official ABI nowadays is, because man pages aren't necessarily the source of truth.

Documentation/admin-guide/mm/userfaultfd.rst says: "You get the address of the access that triggered the missing page
event”.

So it is a bug.
David Hildenbrand Oct. 9, 2021, 7:59 a.m. UTC | #3
On 09.10.21 00:02, Nadav Amit wrote:
> 
> 
>> On Oct 8, 2021, at 1:05 AM, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.10.21 01:50, Nadav Amit wrote:
>>> From: Nadav Amit <namit@vmware.com>
>>> Userfaultfd is supposed to provide the full address (i.e., unmasked) of
>>> the faulting access back to userspace. However, that is not the case for
>>> quite some time.
>>> Even running "userfaultfd_demo" from the userfaultfd man page provides
>>> the wrong output (and contradicts the man page). Notice that
>>> "UFFD_EVENT_PAGEFAULT event" shows the masked address.
>>> 	Address returned by mmap() = 0x7fc5e30b3000
>>> 	fault_handler_thread():
>>> 	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
>>> 	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
>>> 		(uffdio_copy.copy returned 4096)
>>> 	Read address 0x7fc5e30b300f in main(): A
>>> 	Read address 0x7fc5e30b340f in main(): A
>>> 	Read address 0x7fc5e30b380f in main(): A
>>> 	Read address 0x7fc5e30b3c0f in main(): A
>>> Add a new "real_address" field to vmf to hold the unmasked address. It
>>> is possible to keep the unmasked address in the existing address field
>>> (and mask whenever necessary) instead, but this is likely to cause
>>> backporting problems of this patch.
>>
>> Can we be sure that no existing users will rely on this behavior that has been the case since end of 2016 IIRC, one year after UFFD was upstreamed?
> 
> Let me to blow off your mind: how do you be sure that the current behavior does not make applications to misbehave? It might cause performance issues as it did for me or hidden correctness issues.
> 

Fair point, but now we can speculate what's more likely:

Having an app rely on >4 year old kernel behavior just after the feature 
was released or having and app rely on kernel behavior that was the case 
for the last 4 years?

<offtopic>
Someone once told me about the unwritten way to remove things from the 
kernel. 1) Silently break it upstream 2) Wait 2 kernel releases 3) 
Propose removal of the feature because it's broken and nobody complained.
<\offtopic>

You might ask "why does David even care?", here is why:

For the records, I *do* have a prototype from last year that breaks with 
this new behavior as far as I can tell: using uffd in the context of 
virtio-balloon in QEMU. I just pushed the latest state to a !private 
github tree:
   https://github.com/davidhildenbrand/qemu/tree/virtio-balloon-uffd


In that code, I made sure that I'm only dealing with 4k pages (because 
that's the only thing virtio-balloon really can deal with), and during 
the debugging I figured that the kernel always returns 4k aligned page 
fault addresses, so I didn't care about masking. I'll reuse the 
unmodified fault address for UFFDIO_ZEROPAGE()/UFFDIO_COPY()/... which 
should then fail because:

"
EINVAL The start or the len field of the ufdio_range structure
               was not a multiple of the system page size; or len was
               zero; or the specified range was otherwise invalid.
"


If I'm too lazy to read all documentation, I'm quite sure that there are 
other people that don't. I don't care to much if this patch breaks that 
prototype, it's just a prototype after all, but I am concerned that we 
might break other users in a similar way.

>> I do wonder what the official ABI nowadays is, because man pages aren't necessarily the source of truth.
> 
> Documentation/admin-guide/mm/userfaultfd.rst says: "You get the address of the access that triggered the missing page
> event”.
> 
> So it is a bug.

The least thing I would expect in the patch description is a better 
motivation ("who cares and why" -- I know you have a better motivation 
that making the doc correct :) ) and a discussion on the chances of this 
actually breaking other apps (see my example).

I'd sleep better if we'd glue the changed behavior to a new feature 
flag, but that's just my 2 cents.
Mike Rapoport Oct. 10, 2021, 5:29 a.m. UTC | #4
On Fri, Oct 08, 2021 at 10:05:36AM +0200, David Hildenbrand wrote:
> On 08.10.21 01:50, Nadav Amit wrote:
> > From: Nadav Amit <namit@vmware.com>
> > 
> > Userfaultfd is supposed to provide the full address (i.e., unmasked) of
> > the faulting access back to userspace. However, that is not the case for
> > quite some time.
> > 
> > Even running "userfaultfd_demo" from the userfaultfd man page provides
> > the wrong output (and contradicts the man page). Notice that
> > "UFFD_EVENT_PAGEFAULT event" shows the masked address.
> > 
> > 	Address returned by mmap() = 0x7fc5e30b3000
> > 
> > 	fault_handler_thread():
> > 	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
> > 	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
> > 		(uffdio_copy.copy returned 4096)
> > 	Read address 0x7fc5e30b300f in main(): A
> > 	Read address 0x7fc5e30b340f in main(): A
> > 	Read address 0x7fc5e30b380f in main(): A
> > 	Read address 0x7fc5e30b3c0f in main(): A
> > 
> > Add a new "real_address" field to vmf to hold the unmasked address. It
> > is possible to keep the unmasked address in the existing address field
> > (and mask whenever necessary) instead, but this is likely to cause
> > backporting problems of this patch.
> 
> Can we be sure that no existing users will rely on this behavior that has
> been the case since end of 2016 IIRC, one year after UFFD was upstreamed? I
> do wonder what the official ABI nowadays is, because man pages aren't
> necessarily the source of truth.
> 
> I checked QEMU (postcopy live migration), and I think it should be fine with
> this change.

CRIU is Ok with this change, we anyway mask the address.
diff mbox series

Patch

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 003f0d31743e..1dfc0fcd83c1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -481,7 +481,7 @@  vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
+	uwq.msg = userfault_msg(vmf->real_address, vmf->flags, reason,
 			ctx->features);
 	uwq.ctx = ctx;
 	uwq.waken = false;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bb2d938df4..f3f324e3f2bf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -523,7 +523,8 @@  struct vm_fault {
 		struct vm_area_struct *vma;	/* Target VMA */
 		gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 		pgoff_t pgoff;			/* Logical page offset based on vma */
-		unsigned long address;		/* Faulting virtual address */
+		unsigned long address;		/* Faulting virtual address - masked */
+		unsigned long real_address;	/* Faulting virtual address - unmaked */
 	};
 	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should really be 'const' */
diff --git a/mm/memory.c b/mm/memory.c
index 12a7b2094434..3d2d7fdbb7dc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4594,6 +4594,7 @@  static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address & PAGE_MASK,
+		.real_address = address,
 		.flags = flags,
 		.pgoff = linear_page_index(vma, address),
 		.gfp_mask = __get_fault_gfp_mask(vma),