[v3,20/23] fs: Allow copy_mount_options() to access user-space in a single pass

Message ID 20200421142603.3894-21-catalin.marinas@arm.com (mailing list archive)
State New, archived
Series arm64: Memory Tagging Extension user-space support

Commit Message

Catalin Marinas April 21, 2020, 2:26 p.m. UTC
The copy_mount_options() function takes a user pointer argument but not
a size. It tries to read up to PAGE_SIZE bytes. However, copy_from_user()
is not guaranteed to return all the accessible bytes if, for example, the
access crosses a page boundary and gets a fault on the second page. To
work around this, the current copy_mount_options() implementation
performs two copy_from_user() passes: first to the end of the current
page, and a second for what's left in the subsequent page.

Some architectures like arm64 can guarantee an exact copy_from_user()
depending on the size (since the arch function performs some alignment
on the source register). Introduce an arch_has_exact_copy_from_user()
function and allow copy_mount_options() to perform the user access in a
single pass.

While this function is not on a critical path, the single-pass behaviour
is required for arm64 MTE (memory tagging) support where a uaccess can
trigger intra-page faults (tag not matching). With the current
implementation, if this happens during the first page, the function will
return -EFAULT.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    New in v3.

 arch/arm64/include/asm/uaccess.h | 11 +++++++++++
 fs/namespace.c                   |  7 +++++--
 include/linux/uaccess.h          |  8 ++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)
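
For reference, the two-pass behaviour described in the commit message
corresponds roughly to the following condensed sketch of the pre-patch
logic in fs/namespace.c (paraphrased from the diff context further down,
not a verbatim excerpt):

void *copy_mount_options(const void __user *data)
{
	char *copy;
	unsigned size;

	if (!data)
		return NULL;

	copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
	if (!copy)
		return ERR_PTR(-ENOMEM);

	/* first pass: stop at the end of the page containing 'data' */
	size = PAGE_SIZE - offset_in_page(data);
	if (copy_from_user(copy, data, size)) {
		/* any fault in the first pass is treated as fatal */
		kfree(copy);
		return ERR_PTR(-EFAULT);
	}

	/* second pass: the next page may legitimately be inaccessible */
	if (size != PAGE_SIZE) {
		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
			memset(copy + size, 0, PAGE_SIZE - size);
	}
	return copy;
}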

Comments

Al Viro April 21, 2020, 3:29 p.m. UTC | #1
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:

> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.

Details, please.
Catalin Marinas April 21, 2020, 4:45 p.m. UTC | #2
On Tue, Apr 21, 2020 at 04:29:48PM +0100, Al Viro wrote:
> On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > While this function is not on a critical path, the single-pass behaviour
> > is required for arm64 MTE (memory tagging) support where a uaccess can
> > trigger intra-page faults (tag not matching). With the current
> > implementation, if this happens during the first page, the function will
> > return -EFAULT.
> 
> Details, please.

With the arm64 MTE support (memory tagging extensions, see [1] for the
full series), bits 56..59 of a pointer (the tag) are checked against the
corresponding tag/colour set in memory (on a 16-byte granule). When
copy_mount_options() gets such a tagged user pointer, it attempts to read
4K even though the user buffer is smaller. The user only guarantees a
matching tag for the data it passes to mount(), not for the whole 4K or
up to the end of a page. The side effect is that the first
copy_from_user() could still fault after reading some bytes but before
reaching the end of the page.

Prior to commit 12efec560274 ("saner copy_mount_options()"), this code
had a fallback to byte-by-byte copying. I thought I'd not revert this
commit since copy_mount_options() now looks cleaner.

[1] https://lore.kernel.org/linux-arm-kernel/20200421142603.3894-1-catalin.marinas@arm.com/
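
For illustration, a byte-by-byte fallback of the kind referred to above
might look roughly like this (an illustrative sketch only, not the
verbatim pre-12efec560274 helper; a real version would also check
access_ok() before using __get_user()):

/*
 * Sketch: copy until the first inaccessible byte, zero-fill the rest,
 * and return the number of bytes NOT copied.  A fault part-way through
 * is no longer fatal.
 */
static unsigned long bytewise_copy_from_user(void *to, const void __user *from,
					     unsigned long n)
{
	char *t = to;
	const char __user *f = from;
	char c;

	while (n) {
		if (__get_user(c, f)) {
			memset(t, 0, n);
			break;
		}
		*t++ = c;
		f++;
		n--;
	}
	return n;
}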
Dave Martin April 27, 2020, 4:56 p.m. UTC | #3
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> The copy_mount_options() function takes a user pointer argument but not
> a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> not guaranteed to return all the accessible bytes if, for example, the
> access crosses a page boundary and gets a fault on the second page. To
> work around this, the current copy_mount_options() implementations
> performs to copy_from_user() passes, first to the end of the current
> page and the second to what's left in the subsequent page.
> 
> Some architectures like arm64 can guarantee an exact copy_from_user()
> depending on the size (since the arch function performs some alignment
> on the source register). Introduce an arch_has_exact_copy_from_user()
> function and allow copy_mount_options() to perform the user access in a
> single pass.
> 
> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.

Do you know how much extra overhead we'd incur if we read at most one
tag granule at a time, instead of PAGE_SIZE?

I'm guessing that in practice strcpy_from_user() type operations copy
much less than a page most of the time, so what we lose in uaccess
overheads we _might_ regain in less redundant copying.

Would need benchmarking though.
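
For what it's worth, the granule-at-a-time variant being suggested might
look something like the sketch below (the helper name and the 16-byte
granule are illustrative assumptions; as Catalin notes in the reply that
follows, an unaligned chunk can still straddle a granule boundary, so
this alone would not remove the partial-fault problem):

/*
 * Sketch only: copy up to 'len' bytes in 16-byte (tag-granule-sized)
 * chunks, stopping at the first chunk that faults rather than failing
 * the whole copy.  Returns the number of bytes actually copied.
 */
static unsigned long copy_by_tag_granule(char *dst, const char __user *src,
					 unsigned long len)
{
	unsigned long copied = 0;

	while (copied < len) {
		unsigned long chunk = min_t(unsigned long, 16, len - copied);

		if (copy_from_user(dst + copied, src + copied, chunk))
			break;
		copied += chunk;
	}
	return copied;
}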

[...]

Cheers
---Dave
Catalin Marinas April 28, 2020, 2:06 p.m. UTC | #4
On Mon, Apr 27, 2020 at 05:56:42PM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > The copy_mount_options() function takes a user pointer argument but not
> > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> > not guaranteed to return all the accessible bytes if, for example, the
> > access crosses a page boundary and gets a fault on the second page. To
> > work around this, the current copy_mount_options() implementations
> > performs to copy_from_user() passes, first to the end of the current
> > page and the second to what's left in the subsequent page.
> > 
> > Some architectures like arm64 can guarantee an exact copy_from_user()
> > depending on the size (since the arch function performs some alignment
> > on the source register). Introduce an arch_has_exact_copy_from_user()
> > function and allow copy_mount_options() to perform the user access in a
> > single pass.
> > 
> > While this function is not on a critical path, the single-pass behaviour
> > is required for arm64 MTE (memory tagging) support where a uaccess can
> > trigger intra-page faults (tag not matching). With the current
> > implementation, if this happens during the first page, the function will
> > return -EFAULT.
> 
> Do you know how much extra overhead we'd incur if we read at must one
> tag granule at a time, instead of PAGE_SIZE?

Our copy routines already read 16 bytes at a time, so that's the tag
granule. With current copy_mount_options() we have the issue that it
assumes a fault in the first page is fatal.

Even if we change it to a loop of smaller uaccess, we still have the
issue of unaligned accesses which can fail without reading all that's
possible (i.e. the access goes across a tag granule boundary).

The previous copy_mount_options() implementation (from a couple of months
ago, I think) had a fallback to byte-by-byte copying, so it didn't have
this issue.

> I'm guessing that in practice strcpy_from_user() type operations copy
> much less than a page most of the time, so what we lose in uaccess
> overheads we _might_ regain in less redundant copying.

strncpy_from_user() has a fallback to byte by byte, so we don't have an
issue here.

The above is only for synchronous accesses. For async, in v3 I disabled
such checks for the uaccess routines.
Kevin Brodsky April 28, 2020, 6:16 p.m. UTC | #5
On 21/04/2020 15:26, Catalin Marinas wrote:
> The copy_mount_options() function takes a user pointer argument but not
> a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> not guaranteed to return all the accessible bytes if, for example, the
> access crosses a page boundary and gets a fault on the second page. To
> work around this, the current copy_mount_options() implementations
> performs to copy_from_user() passes, first to the end of the current
> page and the second to what's left in the subsequent page.
>
> Some architectures like arm64 can guarantee an exact copy_from_user()
> depending on the size (since the arch function performs some alignment
> on the source register). Introduce an arch_has_exact_copy_from_user()
> function and allow copy_mount_options() to perform the user access in a
> single pass.
>
> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Will Deacon <will@kernel.org>
> ---
>
> Notes:
>      New in v3.
>
>   arch/arm64/include/asm/uaccess.h | 11 +++++++++++
>   fs/namespace.c                   |  7 +++++--
>   include/linux/uaccess.h          |  8 ++++++++
>   3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..566da441eba2 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
>   #define INLINE_COPY_TO_USER
>   #define INLINE_COPY_FROM_USER
>   
> +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> +{
> +	/*
> +	 * copy_from_user() aligns the source pointer if the size is greater
> +	 * than 15. Since all the loads are naturally aligned, they can only
> +	 * fail on the first byte.
> +	 */
> +	return n > 15;
> +}
> +#define arch_has_exact_copy_from_user
> +
>   extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n);
>   static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n)
>   {
> diff --git a/fs/namespace.c b/fs/namespace.c
> index a28e4db075ed..8febc50dfc5d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
>   	if (!copy)
>   		return ERR_PTR(-ENOMEM);
>   
> -	size = PAGE_SIZE - offset_in_page(data);
> +	size = PAGE_SIZE;
> +	if (!arch_has_exact_copy_from_user(size))
> +		size -= offset_in_page(data);
>   
> -	if (copy_from_user(copy, data, size)) {
> +	if (copy_from_user(copy, data, size) == size) {
>   		kfree(copy);
>   		return ERR_PTR(-EFAULT);
>   	}
>   	if (size != PAGE_SIZE) {
> +		WARN_ON(1);

I'm not sure I understand the rationale here. If we don't have exact copy_from_user 
for size, then we will attempt to copy up to the end of the page. Assuming this 
doesn't fault, we then want to carry on copying from the start of the next page, 
until we reach a total size of up to 4K. Why would we warn in that case? AIUI, if you 
don't have exact copy_from_user, there are 3 cases:
1. copy_from_user() returns size, we bail out.
2. copy_from_user() returns 0, we carry on copying from the next page.
3. copy_from_user() returns anything else, we return immediately.

I think you're not handling case 3 here.

Kevin

>   		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
>   			memset(copy + size, 0, PAGE_SIZE - size);
>   	}
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 67f016010aad..00e097a9e8d6 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -152,6 +152,14 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
>   		n = _copy_to_user(to, from, n);
>   	return n;
>   }
> +
> +#ifndef arch_has_exact_copy_from_user
> +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> +{
> +	return false;
> +}
> +#endif
> +
>   #ifdef CONFIG_COMPAT
>   static __always_inline unsigned long __must_check
>   copy_in_user(void __user *to, const void __user *from, unsigned long n)
Catalin Marinas April 28, 2020, 7:36 p.m. UTC | #6
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
>  	if (!copy)
>  		return ERR_PTR(-ENOMEM);
>  
> -	size = PAGE_SIZE - offset_in_page(data);
> +	size = PAGE_SIZE;
> +	if (!arch_has_exact_copy_from_user(size))
> +		size -= offset_in_page(data);
>  
> -	if (copy_from_user(copy, data, size)) {
> +	if (copy_from_user(copy, data, size) == size) {
>  		kfree(copy);
>  		return ERR_PTR(-EFAULT);
>  	}
>  	if (size != PAGE_SIZE) {
> +		WARN_ON(1);
>  		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
>  			memset(copy + size, 0, PAGE_SIZE - size);
>  	}

Argh, this WARN_ON should not be here at all. It's something I added to
check that I don't reach this part on arm64. Will remove in v4.
Catalin Marinas April 28, 2020, 7:40 p.m. UTC | #7
On Tue, Apr 28, 2020 at 07:16:29PM +0100, Kevin Brodsky wrote:
> On 21/04/2020 15:26, Catalin Marinas wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > index 32fc8061aa76..566da441eba2 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
> >   #define INLINE_COPY_TO_USER
> >   #define INLINE_COPY_FROM_USER
> > +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> > +{
> > +	/*
> > +	 * copy_from_user() aligns the source pointer if the size is greater
> > +	 * than 15. Since all the loads are naturally aligned, they can only
> > +	 * fail on the first byte.
> > +	 */
> > +	return n > 15;
> > +}
> > +#define arch_has_exact_copy_from_user
> > +
> >   extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n);
> >   static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n)
> >   {
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index a28e4db075ed..8febc50dfc5d 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> >   	if (!copy)
> >   		return ERR_PTR(-ENOMEM);
> > -	size = PAGE_SIZE - offset_in_page(data);
> > +	size = PAGE_SIZE;
> > +	if (!arch_has_exact_copy_from_user(size))
> > +		size -= offset_in_page(data);
> > -	if (copy_from_user(copy, data, size)) {
> > +	if (copy_from_user(copy, data, size) == size) {
> >   		kfree(copy);
> >   		return ERR_PTR(-EFAULT);
> >   	}
> >   	if (size != PAGE_SIZE) {
> > +		WARN_ON(1);
> 
> I'm not sure I understand the rationale here. If we don't have exact
> copy_from_user for size, then we will attempt to copy up to the end of the
> page. Assuming this doesn't fault, we then want to carry on copying from the
> start of the next page, until we reach a total size of up to 4K. Why would
> we warn in that case?

We shouldn't warn, thanks for spotting this. I added it for some testing
and it somehow ended up in the commit.
Dave Martin April 29, 2020, 10:26 a.m. UTC | #8
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> The copy_mount_options() function takes a user pointer argument but not
> a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> not guaranteed to return all the accessible bytes if, for example, the
> access crosses a page boundary and gets a fault on the second page. To
> work around this, the current copy_mount_options() implementations
> performs to copy_from_user() passes, first to the end of the current

implementation performs two

> page and the second to what's left in the subsequent page.
> 
> Some architectures like arm64 can guarantee an exact copy_from_user()
> depending on the size (since the arch function performs some alignment
> on the source register). Introduce an arch_has_exact_copy_from_user()
> function and allow copy_mount_options() to perform the user access in a
> single pass.
> 
> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Will Deacon <will@kernel.org>
> ---
> 
> Notes:
>     New in v3.
> 
>  arch/arm64/include/asm/uaccess.h | 11 +++++++++++
>  fs/namespace.c                   |  7 +++++--
>  include/linux/uaccess.h          |  8 ++++++++
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..566da441eba2 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
>  #define INLINE_COPY_TO_USER
>  #define INLINE_COPY_FROM_USER
>  
> +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> +{
> +	/*
> +	 * copy_from_user() aligns the source pointer if the size is greater
> +	 * than 15. Since all the loads are naturally aligned, they can only
> +	 * fail on the first byte.
> +	 */
> +	return n > 15;
> +}
> +#define arch_has_exact_copy_from_user

Did you mean:

#define arch_has_exact_copy_from_user arch_has_exact_copy_from_user

Mind you, if this expands to 1 I'd have expected copy_mount_options()
not to compile, so I may be missing something.
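
For context, the override idiom being pointed at here is the usual
self-referential define, so that the generic header's #ifndef sees it.
A minimal sketch of the two halves (the fallback mirrors the
include/linux/uaccess.h hunk in this patch):

/* arch/arm64/include/asm/uaccess.h (arch override) */
static inline bool arch_has_exact_copy_from_user(unsigned long n)
{
	return n > 15;
}
#define arch_has_exact_copy_from_user arch_has_exact_copy_from_user

/* include/linux/uaccess.h (generic fallback) */
#ifndef arch_has_exact_copy_from_user
static inline bool arch_has_exact_copy_from_user(unsigned long n)
{
	return false;
}
#endif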

[...]

> diff --git a/fs/namespace.c b/fs/namespace.c
> index a28e4db075ed..8febc50dfc5d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)

[ Is this applying a band-aid to duct tape?

The fs presumably knows ahead of time whether it's expecting a string or
a fixed-size blob for data, so I'd hope we could just DTRT rather than
playing SEGV roulette here.

This might require more refactoring than makes sense for this series
though. ]

>  	if (!copy)
>  		return ERR_PTR(-ENOMEM);
>  
> -	size = PAGE_SIZE - offset_in_page(data);
> +	size = PAGE_SIZE;
> +	if (!arch_has_exact_copy_from_user(size))
> +		size -= offset_in_page(data);
>  
> -	if (copy_from_user(copy, data, size)) {
> +	if (copy_from_user(copy, data, size) == size) {
>  		kfree(copy);
>  		return ERR_PTR(-EFAULT);
>  	}
>  	if (size != PAGE_SIZE) {
> +		WARN_ON(1);
>  		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
>  			memset(copy + size, 0, PAGE_SIZE - size);
>  	}

[...]

Cheers
---Dave
Dave Martin April 29, 2020, 10:28 a.m. UTC | #9
On Tue, Apr 28, 2020 at 03:06:27PM +0100, Catalin Marinas wrote:
> On Mon, Apr 27, 2020 at 05:56:42PM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > > The copy_mount_options() function takes a user pointer argument but not
> > > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> > > not guaranteed to return all the accessible bytes if, for example, the
> > > access crosses a page boundary and gets a fault on the second page. To
> > > work around this, the current copy_mount_options() implementations
> > > performs to copy_from_user() passes, first to the end of the current
> > > page and the second to what's left in the subsequent page.
> > > 
> > > Some architectures like arm64 can guarantee an exact copy_from_user()
> > > depending on the size (since the arch function performs some alignment
> > > on the source register). Introduce an arch_has_exact_copy_from_user()
> > > function and allow copy_mount_options() to perform the user access in a
> > > single pass.
> > > 
> > > While this function is not on a critical path, the single-pass behaviour
> > > is required for arm64 MTE (memory tagging) support where a uaccess can
> > > trigger intra-page faults (tag not matching). With the current
> > > implementation, if this happens during the first page, the function will
> > > return -EFAULT.
> > 
> > Do you know how much extra overhead we'd incur if we read at must one
> > tag granule at a time, instead of PAGE_SIZE?
> 
> Our copy routines already read 16 bytes at a time, so that's the tag
> granule. With current copy_mount_options() we have the issue that it
> assumes a fault in the first page is fatal.
> 
> Even if we change it to a loop of smaller uaccess, we still have the
> issue of unaligned accesses which can fail without reading all that's
> possible (i.e. the access goes across a tag granule boundary).
> 
> The previous copy_mount_options() implementation (from couple of months
> ago I think) had a fallback to byte-by-byte, didn't have this issue.
> 
> > I'm guessing that in practice strcpy_from_user() type operations copy
> > much less than a page most of the time, so what we lose in uaccess
> > overheads we _might_ regain in less redundant copying.
> 
> strncpy_from_user() has a fallback to byte by byte, so we don't have an
> issue here.
> 
> The above is only for synchronous accesses. For async, in v3 I disabled
> such checks for the uaccess routines.

Fair enough, I hadn't fully got my head around what's going on here.

(But see my other reply.)


I was suspicious about the WARN_ON(), but I see people are on top of
that.

Cheers
---Dave
Catalin Marinas April 29, 2020, 11:58 a.m. UTC | #10
On Tue, Apr 28, 2020 at 07:16:29PM +0100, Kevin Brodsky wrote:
> On 21/04/2020 15:26, Catalin Marinas wrote:
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index a28e4db075ed..8febc50dfc5d 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> >   	if (!copy)
> >   		return ERR_PTR(-ENOMEM);
> > -	size = PAGE_SIZE - offset_in_page(data);
> > +	size = PAGE_SIZE;
> > +	if (!arch_has_exact_copy_from_user(size))
> > +		size -= offset_in_page(data);
> > -	if (copy_from_user(copy, data, size)) {
> > +	if (copy_from_user(copy, data, size) == size) {
> >   		kfree(copy);
> >   		return ERR_PTR(-EFAULT);
> >   	}
> >   	if (size != PAGE_SIZE) {
> > +		WARN_ON(1);
> 
> I'm not sure I understand the rationale here. If we don't have exact
> copy_from_user for size, then we will attempt to copy up to the end of the
> page. Assuming this doesn't fault, we then want to carry on copying from the
> start of the next page, until we reach a total size of up to 4K. Why would
> we warn in that case? AIUI, if you don't have exact copy_from_user, there
> are 3 cases:
> 1. copy_from_user() returns size, we bail out.
> 2. copy_from_user() returns 0, we carry on copying from the next page.
> 3. copy_from_user() returns anything else, we return immediately.
> 
> I think you're not handling case 3 here.

(3) is still handled as (2) since the only check we have is whether
copy_from_user() returned size. Since size is not updated, it falls
through to the second if block (where the WARN_ON should have been
removed).

Thinking some more about this, I think it can be simplified without
adding arch_has_exact_copy_from_user(). We do have to guarantee on arm64
that a copy_from_user() to the end of a page (4K aligned, hence tag
granule aligned) is exact but that's just matching the current
semantics.

What about this new patch below, replacing the current one:

-------------8<-------------------------------
From cf9a1c9668ce77af3ef6589ee8038e91df127dab Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 15 Apr 2020 18:45:44 +0100
Subject: [PATCH] fs: Handle intra-page faults in copy_mount_options()

The copy_mount_options() function takes a user pointer argument but no
size. It tries to read up to PAGE_SIZE bytes. However, copy_from_user() is
not guaranteed to return all the accessible bytes if, for example, the
access crosses a page boundary and gets a fault on the second page. To
work around this, the current copy_mount_options() implementation
performs two copy_from_user() passes, first to the end of the current
page and the second to what's left in the subsequent page.

On arm64 with MTE enabled, access to a user page may trigger a fault
after part of the buffer has been copied (when the user pointer tag,
bits 56-59, no longer matches the allocation tag stored in memory).
Allow copy_mount_options() to handle such a case by returning -EFAULT
only if the first copy_from_user() has not copied any bytes.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
---
 fs/namespace.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a28e4db075ed..51eecbd8ea89 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3016,7 +3016,7 @@ static void shrink_submounts(struct mount *mnt)
 void *copy_mount_options(const void __user * data)
 {
 	char *copy;
-	unsigned size;
+	unsigned size, left;
 
 	if (!data)
 		return NULL;
@@ -3027,11 +3027,22 @@ void *copy_mount_options(const void __user * data)
 
 	size = PAGE_SIZE - offset_in_page(data);
 
-	if (copy_from_user(copy, data, size)) {
+	/*
+	 * Attempt to copy to the end of the first user page. On success,
+	 * left == 0, copy the rest from the second user page (if it is
+	 * accessible).
+	 *
+	 * On architectures with intra-page faults (arm64 with MTE), the read
+	 * from the first page may fail after copying part of the user data
+	 * (left > 0 && left < size). Do not attempt the second copy in this
+	 * case as the end of the valid user buffer has already been reached.
+	 */
+	left = copy_from_user(copy, data, size);
+	if (left == size) {
 		kfree(copy);
 		return ERR_PTR(-EFAULT);
 	}
-	if (size != PAGE_SIZE) {
+	if (left == 0 && size != PAGE_SIZE) {
 		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
 			memset(copy + size, 0, PAGE_SIZE - size);
 	}
Catalin Marinas April 29, 2020, 1:52 p.m. UTC | #11
On Wed, Apr 29, 2020 at 11:26:51AM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > index 32fc8061aa76..566da441eba2 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
> >  #define INLINE_COPY_TO_USER
> >  #define INLINE_COPY_FROM_USER
> >  
> > +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> > +{
> > +	/*
> > +	 * copy_from_user() aligns the source pointer if the size is greater
> > +	 * than 15. Since all the loads are naturally aligned, they can only
> > +	 * fail on the first byte.
> > +	 */
> > +	return n > 15;
> > +}
> > +#define arch_has_exact_copy_from_user
> 
> Did you mean:
> 
> #define arch_has_exact_copy_from_user arch_has_exact_copy_from_user

Yes (and I shouldn't write patches late in the day).

> Mind you, if this expands to 1 I'd have expected copy_mount_options()
> not to compile, so I may be missing something.

I think arch_has_exact_copy_from_user() (with the parentheses) is looked
up in the function namespace, so the macro isn't expanded. So arguably
the patch is correct but pretty dodgy ;).

I scrapped this in my second attempt in reply to Kevin.

> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index a28e4db075ed..8febc50dfc5d 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> 
> [ Is this applying a band-aid to duct tape?
> 
> The fs presumably knows ahead of time whether it's expecting a string or
> a fixed-size blob for data, so I'd hope we could just DTRT rather than
> playing SEGV roulette here.
> 
> This might require more refactoring than makes sense for this series
> though. ]

That's possible but it means moving the copy from sys_mount() to the
specific places where it has additional information (the filesystems).
I'm not even sure it's guaranteed to be strings. If it is, we could just
replace all this with a strncpy_from_user().
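
If the data really were guaranteed to be a NUL-terminated string, the
strncpy_from_user() variant alluded to here could look roughly like the
sketch below (the function name is made up for illustration, and the
truncation/zero-fill policy would still need deciding):

void *copy_mount_options_string(const void __user *data)
{
	char *copy;
	long len;

	if (!data)
		return NULL;

	copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
	if (!copy)
		return ERR_PTR(-ENOMEM);

	/* copies byte by byte, so an intra-page fault is not a problem */
	len = strncpy_from_user(copy, data, PAGE_SIZE);
	if (len < 0) {
		kfree(copy);
		return ERR_PTR(len);
	}

	/* zero the unused tail; len == PAGE_SIZE means the string was truncated */
	memset(copy + len, 0, PAGE_SIZE - len);
	return copy;
}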
Dave Martin May 4, 2020, 4:40 p.m. UTC | #12
On Wed, Apr 29, 2020 at 02:52:25PM +0100, Catalin Marinas wrote:
> On Wed, Apr 29, 2020 at 11:26:51AM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > > index 32fc8061aa76..566da441eba2 100644
> > > --- a/arch/arm64/include/asm/uaccess.h
> > > +++ b/arch/arm64/include/asm/uaccess.h
> > > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
> > >  #define INLINE_COPY_TO_USER
> > >  #define INLINE_COPY_FROM_USER
> > >  
> > > +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> > > +{
> > > +	/*
> > > +	 * copy_from_user() aligns the source pointer if the size is greater
> > > +	 * than 15. Since all the loads are naturally aligned, they can only
> > > +	 * fail on the first byte.
> > > +	 */
> > > +	return n > 15;
> > > +}
> > > +#define arch_has_exact_copy_from_user
> > 
> > Did you mean:
> > 
> > #define arch_has_exact_copy_from_user arch_has_exact_copy_from_user
> 
> Yes (and I shouldn't write patches late in the day).
> 
> > Mind you, if this expands to 1 I'd have expected copy_mount_options()
> > not to compile, so I may be missing something.
> 
> I think arch_has_exact_copy_from_user() (with the braces) is looked up
> in the function namespace, so the macro isn't expanded. So arguably the
> patch is correct but pretty dodgy ;).
> 
> I scrapped this in my second attempt in reply to Kevin.

Problem solved!

> > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > index a28e4db075ed..8febc50dfc5d 100644
> > > --- a/fs/namespace.c
> > > +++ b/fs/namespace.c
> > > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> > 
> > [ Is this applying a band-aid to duct tape?
> > 
> > The fs presumably knows ahead of time whether it's expecting a string or
> > a fixed-size blob for data, so I'd hope we could just DTRT rather than
> > playing SEGV roulette here.
> > 
> > This might require more refactoring than makes sense for this series
> > though. ]
> 
> That's possible but it means moving the copy from sys_mount() to the
> specific places where it has additional information (the filesystems).
> I'm not even sure it's guaranteed to be strings. If it is, we could just
> replace all this with a strncpy_from_user().

Fair enough.  I'll add it to my wishlist...

Cheers
---Dave

Patch

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 32fc8061aa76..566da441eba2 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -416,6 +416,17 @@  extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
 #define INLINE_COPY_TO_USER
 #define INLINE_COPY_FROM_USER
 
+static inline bool arch_has_exact_copy_from_user(unsigned long n)
+{
+	/*
+	 * copy_from_user() aligns the source pointer if the size is greater
+	 * than 15. Since all the loads are naturally aligned, they can only
+	 * fail on the first byte.
+	 */
+	return n > 15;
+}
+#define arch_has_exact_copy_from_user
+
 extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n);
 static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n)
 {
diff --git a/fs/namespace.c b/fs/namespace.c
index a28e4db075ed..8febc50dfc5d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3025,13 +3025,16 @@  void *copy_mount_options(const void __user * data)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	size = PAGE_SIZE - offset_in_page(data);
+	size = PAGE_SIZE;
+	if (!arch_has_exact_copy_from_user(size))
+		size -= offset_in_page(data);
 
-	if (copy_from_user(copy, data, size)) {
+	if (copy_from_user(copy, data, size) == size) {
 		kfree(copy);
 		return ERR_PTR(-EFAULT);
 	}
 	if (size != PAGE_SIZE) {
+		WARN_ON(1);
 		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
 			memset(copy + size, 0, PAGE_SIZE - size);
 	}
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad..00e097a9e8d6 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -152,6 +152,14 @@  copy_to_user(void __user *to, const void *from, unsigned long n)
 		n = _copy_to_user(to, from, n);
 	return n;
 }
+
+#ifndef arch_has_exact_copy_from_user
+static inline bool arch_has_exact_copy_from_user(unsigned long n)
+{
+	return false;
+}
+#endif
+
 #ifdef CONFIG_COMPAT
 static __always_inline unsigned long __must_check
 copy_in_user(void __user *to, const void __user *from, unsigned long n)