| Message ID | 20200421142603.3894-21-catalin.marinas@arm.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | arm64: Memory Tagging Extension user-space support |
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > While this function is not on a critical path, the single-pass behaviour > is required for arm64 MTE (memory tagging) support where a uaccess can > trigger intra-page faults (tag not matching). With the current > implementation, if this happens during the first page, the function will > return -EFAULT. Details, please.
On Tue, Apr 21, 2020 at 04:29:48PM +0100, Al Viro wrote: > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > > While this function is not on a critical path, the single-pass behaviour > > is required for arm64 MTE (memory tagging) support where a uaccess can > > trigger intra-page faults (tag not matching). With the current > > implementation, if this happens during the first page, the function will > > return -EFAULT. > > Details, please. With the arm64 MTE support (memory tagging extensions, see [1] for the full series), bits 56..59 of a pointer (the tag) are checked against the corresponding tag/colour set in memory (on a 16-byte granule). When copy_mount_options() gets such a tagged user pointer, it attempts to read 4K even though the user buffer is smaller. The user only guarantees a matching tag for the data it passes to mount(), not for the whole 4K or up to the end of the page. The side effect is that the first copy_from_user() could still fault after reading some bytes but before reaching the end of the page. Prior to commit 12efec560274 ("saner copy_mount_options()"), this code had a fallback to byte-by-byte copying. I thought I'd not revert that commit, as copy_mount_options() now looks cleaner. [1] https://lore.kernel.org/linux-arm-kernel/20200421142603.3894-1-catalin.marinas@arm.com/
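[Editor's note: to make the granularity concrete, here is a minimal sketch of the check described above. The constants and the ptr_tag() helper are illustrative only, not the kernel's MTE API: the logical tag lives in pointer bits 56..59, and every 16-byte granule touched by an access must carry a matching allocation tag.]

#include <stdint.h>

#define MTE_TAG_SHIFT	56
#define MTE_TAG_MASK	0xfULL
#define MTE_GRANULE	16

/* Logical tag carried in bits 56..59 of a tagged user pointer. */
static inline uint64_t ptr_tag(const void *p)
{
	return ((uint64_t)(uintptr_t)p >> MTE_TAG_SHIFT) & MTE_TAG_MASK;
}

/*
 * A PAGE_SIZE (4K) read therefore spans up to 4096 / MTE_GRANULE = 256
 * granules. The caller of mount(2) only guarantees matching tags for
 * its actual buffer, so any later granule within the same page may
 * carry a different allocation tag and fault mid-copy.
 */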
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > The copy_mount_options() function takes a user pointer argument but not > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is > not guaranteed to return all the accessible bytes if, for example, the > access crosses a page boundary and gets a fault on the second page. To > work around this, the current copy_mount_options() implementations > performs to copy_from_user() passes, first to the end of the current > page and the second to what's left in the subsequent page. > > Some architectures like arm64 can guarantee an exact copy_from_user() > depending on the size (since the arch function performs some alignment > on the source register). Introduce an arch_has_exact_copy_from_user() > function and allow copy_mount_options() to perform the user access in a > single pass. > > While this function is not on a critical path, the single-pass behaviour > is required for arm64 MTE (memory tagging) support where a uaccess can > trigger intra-page faults (tag not matching). With the current > implementation, if this happens during the first page, the function will > return -EFAULT. Do you know how much extra overhead we'd incur if we read at most one tag granule at a time, instead of PAGE_SIZE? I'm guessing that in practice strncpy_from_user() type operations copy much less than a page most of the time, so what we lose in uaccess overheads we _might_ regain in less redundant copying. Would need benchmarking, though. [...] Cheers ---Dave
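[Editor's note: Dave's suggestion would look roughly like the sketch below. copy_by_granule() is a hypothetical helper, and as Catalin notes in the reply, an unaligned chunk can still straddle a granule boundary and fail after a partial read.]

#include <linux/uaccess.h>
#include <linux/kernel.h>	/* min_t() */

#define TAG_GRANULE	16	/* MTE allocation tag granule */

/*
 * Copy at most one tag granule per uaccess, so a tag mismatch only
 * costs the faulting granule rather than the whole page.
 */
static unsigned long copy_by_granule(char *dst, const char __user *src,
				     unsigned long n)
{
	unsigned long done = 0;

	while (done < n) {
		unsigned long chunk = min_t(unsigned long, TAG_GRANULE,
					    n - done);

		if (copy_from_user(dst + done, src + done, chunk))
			break;		/* stop at the first fault */
		done += chunk;
	}
	return done;			/* bytes successfully copied */
}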
On Mon, Apr 27, 2020 at 05:56:42PM +0100, Dave P Martin wrote: > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > > The copy_mount_options() function takes a user pointer argument but not > > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is > > not guaranteed to return all the accessible bytes if, for example, the > > access crosses a page boundary and gets a fault on the second page. To > > work around this, the current copy_mount_options() implementations > > performs to copy_from_user() passes, first to the end of the current > > page and the second to what's left in the subsequent page. > > > > Some architectures like arm64 can guarantee an exact copy_from_user() > > depending on the size (since the arch function performs some alignment > > on the source register). Introduce an arch_has_exact_copy_from_user() > > function and allow copy_mount_options() to perform the user access in a > > single pass. > > > > While this function is not on a critical path, the single-pass behaviour > > is required for arm64 MTE (memory tagging) support where a uaccess can > > trigger intra-page faults (tag not matching). With the current > > implementation, if this happens during the first page, the function will > > return -EFAULT. > > Do you know how much extra overhead we'd incur if we read at most one > tag granule at a time, instead of PAGE_SIZE? Our copy routines already read 16 bytes at a time, so that's the tag granule. With the current copy_mount_options() we have the issue that it assumes a fault in the first page is fatal. Even if we change it to a loop of smaller uaccesses, we still have the issue of unaligned accesses which can fail without reading all that's possible (i.e. the access goes across a tag granule boundary). The previous copy_mount_options() implementation (from a couple of months ago, I think) had a fallback to byte-by-byte copying and didn't have this issue. > I'm guessing that in practice strncpy_from_user() type operations copy > much less than a page most of the time, so what we lose in uaccess > overheads we _might_ regain in less redundant copying. strncpy_from_user() has a fallback to byte by byte, so we don't have an issue here. The above is only for synchronous accesses. For async, in v3 I disabled such checks for the uaccess routines.
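[Editor's note: the byte-by-byte fallback Catalin refers to (pre-12efec560274) worked along these lines. This is a sketch, not the exact historical code.]

#include <linux/uaccess.h>

/*
 * Retry byte by byte after a failed bulk copy_from_user(), so a
 * mid-page fault (e.g. an MTE tag mismatch) still yields every byte
 * up to the faulting granule.
 */
static unsigned long copy_options_bytewise(char *dst,
					   const char __user *src,
					   unsigned long n)
{
	unsigned long i;

	for (i = 0; i < n; i++) {
		char c;

		if (get_user(c, src + i))
			break;		/* first inaccessible byte */
		dst[i] = c;
	}
	return i;			/* bytes actually copied */
}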
On 21/04/2020 15:26, Catalin Marinas wrote: > The copy_mount_options() function takes a user pointer argument but not > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is > not guaranteed to return all the accessible bytes if, for example, the > access crosses a page boundary and gets a fault on the second page. To > work around this, the current copy_mount_options() implementations > performs to copy_from_user() passes, first to the end of the current > page and the second to what's left in the subsequent page. > > Some architectures like arm64 can guarantee an exact copy_from_user() > depending on the size (since the arch function performs some alignment > on the source register). Introduce an arch_has_exact_copy_from_user() > function and allow copy_mount_options() to perform the user access in a > single pass. > > While this function is not on a critical path, the single-pass behaviour > is required for arm64 MTE (memory tagging) support where a uaccess can > trigger intra-page faults (tag not matching). With the current > implementation, if this happens during the first page, the function will > return -EFAULT. > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: Will Deacon <will@kernel.org> > --- > > Notes: > New in v3. > > arch/arm64/include/asm/uaccess.h | 11 +++++++++++ > fs/namespace.c | 7 +++++-- > include/linux/uaccess.h | 8 ++++++++ > 3 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h > index 32fc8061aa76..566da441eba2 100644 > --- a/arch/arm64/include/asm/uaccess.h > +++ b/arch/arm64/include/asm/uaccess.h > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi > #define INLINE_COPY_TO_USER > #define INLINE_COPY_FROM_USER > > +static inline bool arch_has_exact_copy_from_user(unsigned long n) > +{ > + /* > + * copy_from_user() aligns the source pointer if the size is greater > + * than 15. Since all the loads are naturally aligned, they can only > + * fail on the first byte. > + */ > + return n > 15; > +} > +#define arch_has_exact_copy_from_user > + > extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n); > static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n) > { > diff --git a/fs/namespace.c b/fs/namespace.c > index a28e4db075ed..8febc50dfc5d 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data) > if (!copy) > return ERR_PTR(-ENOMEM); > > - size = PAGE_SIZE - offset_in_page(data); > + size = PAGE_SIZE; > + if (!arch_has_exact_copy_from_user(size)) > + size -= offset_in_page(data); > > - if (copy_from_user(copy, data, size)) { > + if (copy_from_user(copy, data, size) == size) { > kfree(copy); > return ERR_PTR(-EFAULT); > } > if (size != PAGE_SIZE) { > + WARN_ON(1); I'm not sure I understand the rationale here. If we don't have exact copy_from_user for size, then we will attempt to copy up to the end of the page. Assuming this doesn't fault, we then want to carry on copying from the start of the next page, until we reach a total size of up to 4K. Why would we warn in that case? AIUI, if you don't have exact copy_from_user, there are 3 cases: 1. copy_from_user() returns size, we bail out. 2. copy_from_user() returns 0, we carry on copying from the next page. 3. copy_from_user() returns anything else, we return immediately. 
I think you're not handling case 3 here. Kevin > if (copy_from_user(copy + size, data + size, PAGE_SIZE - size)) > memset(copy + size, 0, PAGE_SIZE - size); > } > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h > index 67f016010aad..00e097a9e8d6 100644 > --- a/include/linux/uaccess.h > +++ b/include/linux/uaccess.h > @@ -152,6 +152,14 @@ copy_to_user(void __user *to, const void *from, unsigned long n) > n = _copy_to_user(to, from, n); > return n; > } > + > +#ifndef arch_has_exact_copy_from_user > +static inline bool arch_has_exact_copy_from_user(unsigned long n) > +{ > + return false; > +} > +#endif > + > #ifdef CONFIG_COMPAT > static __always_inline unsigned long __must_check > copy_in_user(void __user *to, const void __user *from, unsigned long n)
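[Editor's note: for reference, here is the v3 control flow with Kevin's three cases annotated. This is a sketch of the existing code paths, not new code; 'left' names copy_from_user()'s return value, i.e. the number of bytes *not* copied.]

	unsigned long left = copy_from_user(copy, data, size);

	if (left == size) {
		/* Case 1: nothing was readable -- bail out. */
		kfree(copy);
		return ERR_PTR(-EFAULT);
	}
	if (size != PAGE_SIZE) {
		/*
		 * Case 2 (left == 0): the whole first chunk was read;
		 * carry on with the rest of the page.
		 *
		 * Case 3 (0 < left < size): a mid-page fault. The v3
		 * code takes this same branch, i.e. case 3 is folded
		 * into case 2 -- the gap Kevin points out and Catalin
		 * addresses in his follow-up patch below.
		 */
		if (copy_from_user(copy + size, data + size,
				   PAGE_SIZE - size))
			memset(copy + size, 0, PAGE_SIZE - size);
	}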
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data) > if (!copy) > return ERR_PTR(-ENOMEM); > > - size = PAGE_SIZE - offset_in_page(data); > + size = PAGE_SIZE; > + if (!arch_has_exact_copy_from_user(size)) > + size -= offset_in_page(data); > > - if (copy_from_user(copy, data, size)) { > + if (copy_from_user(copy, data, size) == size) { > kfree(copy); > return ERR_PTR(-EFAULT); > } > if (size != PAGE_SIZE) { > + WARN_ON(1); > if (copy_from_user(copy + size, data + size, PAGE_SIZE - size)) > memset(copy + size, 0, PAGE_SIZE - size); > } Argh, this WARN_ON should not be here at all. It's something I added to check that I don't reach this part on arm64. Will remove in v4.
On Tue, Apr 28, 2020 at 07:16:29PM +0100, Kevin Brodsky wrote: > On 21/04/2020 15:26, Catalin Marinas wrote: > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h > > index 32fc8061aa76..566da441eba2 100644 > > --- a/arch/arm64/include/asm/uaccess.h > > +++ b/arch/arm64/include/asm/uaccess.h > > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi > > #define INLINE_COPY_TO_USER > > #define INLINE_COPY_FROM_USER > > +static inline bool arch_has_exact_copy_from_user(unsigned long n) > > +{ > > + /* > > + * copy_from_user() aligns the source pointer if the size is greater > > + * than 15. Since all the loads are naturally aligned, they can only > > + * fail on the first byte. > > + */ > > + return n > 15; > > +} > > +#define arch_has_exact_copy_from_user > > + > > extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n); > > static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n) > > { > > diff --git a/fs/namespace.c b/fs/namespace.c > > index a28e4db075ed..8febc50dfc5d 100644 > > --- a/fs/namespace.c > > +++ b/fs/namespace.c > > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data) > > if (!copy) > > return ERR_PTR(-ENOMEM); > > - size = PAGE_SIZE - offset_in_page(data); > > + size = PAGE_SIZE; > > + if (!arch_has_exact_copy_from_user(size)) > > + size -= offset_in_page(data); > > - if (copy_from_user(copy, data, size)) { > > + if (copy_from_user(copy, data, size) == size) { > > kfree(copy); > > return ERR_PTR(-EFAULT); > > } > > if (size != PAGE_SIZE) { > > + WARN_ON(1); > > I'm not sure I understand the rationale here. If we don't have exact > copy_from_user for size, then we will attempt to copy up to the end of the > page. Assuming this doesn't fault, we then want to carry on copying from the > start of the next page, until we reach a total size of up to 4K. Why would > we warn in that case? We shouldn't warn, thanks for spotting this. I added it for some testing and it somehow ended up in the commit.
On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > The copy_mount_options() function takes a user pointer argument but not > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is > not guaranteed to return all the accessible bytes if, for example, the > access crosses a page boundary and gets a fault on the second page. To > work around this, the current copy_mount_options() implementations > performs to copy_from_user() passes, first to the end of the current implementation performs two > page and the second to what's left in the subsequent page. > > Some architectures like arm64 can guarantee an exact copy_from_user() > depending on the size (since the arch function performs some alignment > on the source register). Introduce an arch_has_exact_copy_from_user() > function and allow copy_mount_options() to perform the user access in a > single pass. > > While this function is not on a critical path, the single-pass behaviour > is required for arm64 MTE (memory tagging) support where a uaccess can > trigger intra-page faults (tag not matching). With the current > implementation, if this happens during the first page, the function will > return -EFAULT. > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: Will Deacon <will@kernel.org> > --- > > Notes: > New in v3. > > arch/arm64/include/asm/uaccess.h | 11 +++++++++++ > fs/namespace.c | 7 +++++-- > include/linux/uaccess.h | 8 ++++++++ > 3 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h > index 32fc8061aa76..566da441eba2 100644 > --- a/arch/arm64/include/asm/uaccess.h > +++ b/arch/arm64/include/asm/uaccess.h > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi > #define INLINE_COPY_TO_USER > #define INLINE_COPY_FROM_USER > > +static inline bool arch_has_exact_copy_from_user(unsigned long n) > +{ > + /* > + * copy_from_user() aligns the source pointer if the size is greater > + * than 15. Since all the loads are naturally aligned, they can only > + * fail on the first byte. > + */ > + return n > 15; > +} > +#define arch_has_exact_copy_from_user Did you mean: #define arch_has_exact_copy_from_user arch_has_exact_copy_from_user Mind you, if this expands to 1 I'd have expected copy_mount_options() not to compile, so I may be missing something. [...] > diff --git a/fs/namespace.c b/fs/namespace.c > index a28e4db075ed..8febc50dfc5d 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data) [ Is this applying a band-aid to duct tape? The fs presumably knows ahead of time whether it's expecting a string or a fixed-size blob for data, so I'd hope we could just DTRT rather than playing SEGV roulette here. This might require more refactoring than makes sense for this series though. ] > if (!copy) > return ERR_PTR(-ENOMEM); > > - size = PAGE_SIZE - offset_in_page(data); > + size = PAGE_SIZE; > + if (!arch_has_exact_copy_from_user(size)) > + size -= offset_in_page(data); > > - if (copy_from_user(copy, data, size)) { > + if (copy_from_user(copy, data, size) == size) { > kfree(copy); > return ERR_PTR(-EFAULT); > } > if (size != PAGE_SIZE) { > + WARN_ON(1); > if (copy_from_user(copy + size, data + size, PAGE_SIZE - size)) > memset(copy + size, 0, PAGE_SIZE - size); > } [...] Cheers ---Dave
On Tue, Apr 28, 2020 at 03:06:27PM +0100, Catalin Marinas wrote: > On Mon, Apr 27, 2020 at 05:56:42PM +0100, Dave P Martin wrote: > > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > > > The copy_mount_options() function takes a user pointer argument but not > > > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is > > > not guaranteed to return all the accessible bytes if, for example, the > > > access crosses a page boundary and gets a fault on the second page. To > > > work around this, the current copy_mount_options() implementations > > > performs to copy_from_user() passes, first to the end of the current > > > page and the second to what's left in the subsequent page. > > > > > > Some architectures like arm64 can guarantee an exact copy_from_user() > > > depending on the size (since the arch function performs some alignment > > > on the source register). Introduce an arch_has_exact_copy_from_user() > > > function and allow copy_mount_options() to perform the user access in a > > > single pass. > > > > > > While this function is not on a critical path, the single-pass behaviour > > > is required for arm64 MTE (memory tagging) support where a uaccess can > > > trigger intra-page faults (tag not matching). With the current > > > implementation, if this happens during the first page, the function will > > > return -EFAULT. > > > > Do you know how much extra overhead we'd incur if we read at most one > > tag granule at a time, instead of PAGE_SIZE? > > Our copy routines already read 16 bytes at a time, so that's the tag > granule. With the current copy_mount_options() we have the issue that it > assumes a fault in the first page is fatal. > > Even if we change it to a loop of smaller uaccesses, we still have the > issue of unaligned accesses which can fail without reading all that's > possible (i.e. the access goes across a tag granule boundary). > > The previous copy_mount_options() implementation (from a couple of months > ago, I think) had a fallback to byte-by-byte copying and didn't have this issue. > > > I'm guessing that in practice strncpy_from_user() type operations copy > > much less than a page most of the time, so what we lose in uaccess > > overheads we _might_ regain in less redundant copying. > > strncpy_from_user() has a fallback to byte by byte, so we don't have an > issue here. > > The above is only for synchronous accesses. For async, in v3 I disabled > such checks for the uaccess routines. Fair enough, I hadn't fully got my head around what's going on here. (But see my other reply.) I was suspicious about the WARN_ON(), but I see people are on top of that. Cheers ---Dave
On Tue, Apr 28, 2020 at 07:16:29PM +0100, Kevin Brodsky wrote: > On 21/04/2020 15:26, Catalin Marinas wrote: > > diff --git a/fs/namespace.c b/fs/namespace.c > > index a28e4db075ed..8febc50dfc5d 100644 > > --- a/fs/namespace.c > > +++ b/fs/namespace.c > > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data) > > if (!copy) > > return ERR_PTR(-ENOMEM); > > - size = PAGE_SIZE - offset_in_page(data); > > + size = PAGE_SIZE; > > + if (!arch_has_exact_copy_from_user(size)) > > + size -= offset_in_page(data); > > - if (copy_from_user(copy, data, size)) { > > + if (copy_from_user(copy, data, size) == size) { > > kfree(copy); > > return ERR_PTR(-EFAULT); > > } > > if (size != PAGE_SIZE) { > > + WARN_ON(1); > > I'm not sure I understand the rationale here. If we don't have exact > copy_from_user for size, then we will attempt to copy up to the end of the > page. Assuming this doesn't fault, we then want to carry on copying from the > start of the next page, until we reach a total size of up to 4K. Why would > we warn in that case? AIUI, if you don't have exact copy_from_user, there > are 3 cases: > 1. copy_from_user() returns size, we bail out. > 2. copy_from_user() returns 0, we carry on copying from the next page. > 3. copy_from_user() returns anything else, we return immediately. > > I think you're not handling case 3 here. (3) is still handled as (2) since the only check we have is whether copy_from_user() returned size. Since size is not updated, it falls through to the second if block (where the WARN_ON should have disappeared). Thinking some more about this, I think it can be simplified without adding arch_has_exact_copy_from_user(). We do have to guarantee on arm64 that a copy_from_user() to the end of a page (4K aligned, hence tag granule aligned) is exact, but that's just matching the current semantics. What about this new patch below, replacing the current one:

-------------8<-------------------------------

From cf9a1c9668ce77af3ef6589ee8038e91df127dab Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 15 Apr 2020 18:45:44 +0100
Subject: [PATCH] fs: Handle intra-page faults in copy_mount_options()

The copy_mount_options() function takes a user pointer argument but no size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is not guaranteed to return all the accessible bytes if, for example, the access crosses a page boundary and gets a fault on the second page. To work around this, the current copy_mount_options() implementation performs two copy_from_user() passes, first to the end of the current page and the second to what's left in the subsequent page.

On arm64 with MTE enabled, access to a user page may trigger a fault after part of the buffer has been copied (when the user pointer tag, bits 56-59, no longer matches the allocation tag stored in memory). Allow copy_mount_options() to handle such a case by only returning -EFAULT if the first copy_from_user() has not copied any bytes.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
---
 fs/namespace.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a28e4db075ed..51eecbd8ea89 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3016,7 +3016,7 @@ static void shrink_submounts(struct mount *mnt)
 void *copy_mount_options(const void __user * data)
 {
 	char *copy;
-	unsigned size;
+	unsigned size, left;
 
 	if (!data)
 		return NULL;
@@ -3027,11 +3027,22 @@ void *copy_mount_options(const void __user * data)
 
 	size = PAGE_SIZE - offset_in_page(data);
 
-	if (copy_from_user(copy, data, size)) {
+	/*
+	 * Attempt to copy to the end of the first user page. On success,
+	 * left == 0, copy the rest from the second user page (if it is
+	 * accessible).
+	 *
+	 * On architectures with intra-page faults (arm64 with MTE), the read
+	 * from the first page may fail after copying part of the user data
+	 * (left > 0 && left < size). Do not attempt the second copy in this
+	 * case as the end of the valid user buffer has already been reached.
+	 */
+	left = copy_from_user(copy, data, size);
+	if (left == size) {
 		kfree(copy);
 		return ERR_PTR(-EFAULT);
 	}
-	if (size != PAGE_SIZE) {
+	if (left == 0 && size != PAGE_SIZE) {
 		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
 			memset(copy + size, 0, PAGE_SIZE - size);
 	}
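[Editor's note: one detail behind the partial-copy handling above: the generic copy_from_user() zeroes whatever part of the attempted range it failed to copy, so bytes [size - left, size) of 'copy' are zero-filled rather than stale kernel memory. A simplified paraphrase of include/linux/uaccess.h from this era, not a verbatim quote:]

static inline unsigned long
_copy_from_user(void *to, const void __user *from, unsigned long n)
{
	unsigned long res = n;		/* bytes not copied */

	might_fault();
	if (likely(access_ok(from, n)))
		res = raw_copy_from_user(to, from, n);
	if (unlikely(res))
		memset(to + (n - res), 0, res);	/* zero the uncopied tail */
	return res;
}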
On Wed, Apr 29, 2020 at 11:26:51AM +0100, Dave P Martin wrote: > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h > > index 32fc8061aa76..566da441eba2 100644 > > --- a/arch/arm64/include/asm/uaccess.h > > +++ b/arch/arm64/include/asm/uaccess.h > > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi > > #define INLINE_COPY_TO_USER > > #define INLINE_COPY_FROM_USER > > > > +static inline bool arch_has_exact_copy_from_user(unsigned long n) > > +{ > > + /* > > + * copy_from_user() aligns the source pointer if the size is greater > > + * than 15. Since all the loads are naturally aligned, they can only > > + * fail on the first byte. > > + */ > > + return n > 15; > > +} > > +#define arch_has_exact_copy_from_user > > Did you mean: > > #define arch_has_exact_copy_from_user arch_has_exact_copy_from_user Yes (and I shouldn't write patches late in the day). > Mind you, if this expands to 1 I'd have expected copy_mount_options() > not to compile, so I may be missing something. I think arch_has_exact_copy_from_user() (with the braces) is looked up in the function namespace, so the macro isn't expanded. So arguably the patch is correct but pretty dodgy ;). I scrapped this in my second attempt in reply to Kevin. > > diff --git a/fs/namespace.c b/fs/namespace.c > > index a28e4db075ed..8febc50dfc5d 100644 > > --- a/fs/namespace.c > > +++ b/fs/namespace.c > > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data) > > [ Is this applying a band-aid to duct tape? > > The fs presumably knows ahead of time whether it's expecting a string or > a fixed-size blob for data, so I'd hope we could just DTRT rather than > playing SEGV roulette here. > > This might require more refactoring than makes sense for this series > though. ] That's possible but it means moving the copy from sys_mount() to the specific places where it has additional information (the filesystems). I'm not even sure it's guaranteed to be strings. If it is, we could just replace all this with a strncpy_from_user().
On Wed, Apr 29, 2020 at 02:52:25PM +0100, Catalin Marinas wrote: > On Wed, Apr 29, 2020 at 11:26:51AM +0100, Dave P Martin wrote: > > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote: > > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h > > > index 32fc8061aa76..566da441eba2 100644 > > > --- a/arch/arm64/include/asm/uaccess.h > > > +++ b/arch/arm64/include/asm/uaccess.h > > > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi > > > #define INLINE_COPY_TO_USER > > > #define INLINE_COPY_FROM_USER > > > > > > +static inline bool arch_has_exact_copy_from_user(unsigned long n) > > > +{ > > > + /* > > > + * copy_from_user() aligns the source pointer if the size is greater > > > + * than 15. Since all the loads are naturally aligned, they can only > > > + * fail on the first byte. > > > + */ > > > + return n > 15; > > > +} > > > +#define arch_has_exact_copy_from_user > > > > Did you mean: > > > > #define arch_has_exact_copy_from_user arch_has_exact_copy_from_user > > Yes (and I shouldn't write patches late in the day). > > > Mind you, if this expands to 1 I'd have expected copy_mount_options() > > not to compile, so I may be missing something. > > I think arch_has_exact_copy_from_user() (with the braces) is looked up > in the function namespace, so the macro isn't expanded. So arguably the > patch is correct but pretty dodgy ;). > > I scrapped this in my second attempt in reply to Kevin. Problem solved! > > > diff --git a/fs/namespace.c b/fs/namespace.c > > > index a28e4db075ed..8febc50dfc5d 100644 > > > --- a/fs/namespace.c > > > +++ b/fs/namespace.c > > > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data) > > > > [ Is this applying a band-aid to duct tape? > > > > The fs presumably knows ahead of time whether it's expecting a string or > > a fixed-size blob for data, so I'd hope we could just DTRT rather than > > playing SEGV roulette here. > > > > This might require more refactoring than makes sense for this series > > though. ] > > That's possible but it means moving the copy from sys_mount() to the > specific places where it has additional information (the filesystems). > I'm not even sure it's guaranteed to be strings. If it is, we could just > replace all this with a strncpy_from_user(). Fair enough. I'll add it to my wishlist... Cheers ---Dave
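[Editor's note: for the record, the override idiom Dave suggested looks like this, combining the two headers from the patch below into one sketch. The self-referential define is what makes the #ifndef in include/linux/uaccess.h skip the generic fallback, while the name still resolves to the arch inline function at the C level.]

/* arch header, e.g. arch/arm64/include/asm/uaccess.h: */
static inline bool arch_has_exact_copy_from_user(unsigned long n)
{
	return n > 15;
}
#define arch_has_exact_copy_from_user arch_has_exact_copy_from_user

/* generic header, include/linux/uaccess.h: */
#ifndef arch_has_exact_copy_from_user
static inline bool arch_has_exact_copy_from_user(unsigned long n)
{
	return false;
}
#endif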
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 32fc8061aa76..566da441eba2 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
 #define INLINE_COPY_TO_USER
 #define INLINE_COPY_FROM_USER
 
+static inline bool arch_has_exact_copy_from_user(unsigned long n)
+{
+	/*
+	 * copy_from_user() aligns the source pointer if the size is greater
+	 * than 15. Since all the loads are naturally aligned, they can only
+	 * fail on the first byte.
+	 */
+	return n > 15;
+}
+#define arch_has_exact_copy_from_user
+
 extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n);
 static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n)
 {
diff --git a/fs/namespace.c b/fs/namespace.c
index a28e4db075ed..8febc50dfc5d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	size = PAGE_SIZE - offset_in_page(data);
+	size = PAGE_SIZE;
+	if (!arch_has_exact_copy_from_user(size))
+		size -= offset_in_page(data);
 
-	if (copy_from_user(copy, data, size)) {
+	if (copy_from_user(copy, data, size) == size) {
 		kfree(copy);
 		return ERR_PTR(-EFAULT);
 	}
 	if (size != PAGE_SIZE) {
+		WARN_ON(1);
 		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
 			memset(copy + size, 0, PAGE_SIZE - size);
 	}
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad..00e097a9e8d6 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -152,6 +152,14 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 	n = _copy_to_user(to, from, n);
 	return n;
 }
+
+#ifndef arch_has_exact_copy_from_user
+static inline bool arch_has_exact_copy_from_user(unsigned long n)
+{
+	return false;
+}
+#endif
+
 #ifdef CONFIG_COMPAT
 static __always_inline unsigned long __must_check
 copy_in_user(void __user *to, const void __user *from, unsigned long n)
The copy_mount_options() function takes a user pointer argument but not a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is not guaranteed to return all the accessible bytes if, for example, the access crosses a page boundary and gets a fault on the second page. To work around this, the current copy_mount_options() implementation performs two copy_from_user() passes, first to the end of the current page and the second to what's left in the subsequent page.

Some architectures like arm64 can guarantee an exact copy_from_user() depending on the size (since the arch function performs some alignment on the source register). Introduce an arch_has_exact_copy_from_user() function and allow copy_mount_options() to perform the user access in a single pass.

While this function is not on a critical path, the single-pass behaviour is required for arm64 MTE (memory tagging) support where a uaccess can trigger intra-page faults (tag not matching). With the current implementation, if this happens during the first page, the function will return -EFAULT.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    New in v3.

 arch/arm64/include/asm/uaccess.h | 11 +++++++++++
 fs/namespace.c                   |  7 +++++--
 include/linux/uaccess.h          |  8 ++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)