diff mbox series

[RESEND,1/4] uaccess: Add user_read_access_begin/end and user_write_access_begin/end

Message ID 27106d62fdbd4ffb47796236050e418131cb837f.1585811416.git.christophe.leroy@c-s.fr (mailing list archive)
State New, archived
Headers show
Series [RESEND,1/4] uaccess: Add user_read_access_begin/end and user_write_access_begin/end | expand

Commit Message

Christophe Leroy April 2, 2020, 7:34 a.m. UTC
Some architectures like powerpc64 have the capability to separate
read access and write access protection.
For get_user() and copy_from_user(), powerpc64 only open read access.
For put_user() and copy_to_user(), powerpc64 only open write access.
But when using unsafe_get_user() or unsafe_put_user(),
user_access_begin open both read and write.

Other architectures like powerpc book3s 32 bits only allow write
access protection. And on this architecture protection is an heavy
operation as it requires locking/unlocking per segment of 256Mbytes.
On those architecture it is therefore desirable to do the unlocking
only for write access. (Note that book3s/32 ranges from very old
powermac from the 90's with powerpc 601 processor, till modern
ADSL boxes with PowerQuicc II modern processors for instance so it
is still worth considering)

In order to avoid any risk based of hacking some variable parameters
passed to user_access_begin/end that would allow hacking and
leaving user access open or opening too much, it is preferable to
use dedicated static functions that can't be overridden.

Add a user_read_access_begin and user_read_access_end to only open
read access.

Add a user_write_access_begin and user_write_access_end to only open
write access.

By default, when undefined, those new access helpers default on the
existing user_access_begin and user_access_end.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://patchwork.ozlabs.org/patch/1227926/
---
Resending this series as I mistakenly only sent it to powerpc list
begining of February (https://patchwork.ozlabs.org/patch/1233172/)

This series is based on the discussion we had in January, see
https://patchwork.ozlabs.org/patch/1227926/ . I tried to
take into account all remarks, especially @hpa 's remark to use
a fixed API on not base the relocking on a magic id returned at
unlocking.

This series is awaited for implementing selective lkdtm test to
test powerpc64 independant read and write protection, see
https://patchwork.ozlabs.org/patch/1231765/

 include/linux/uaccess.h | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Kees Cook April 2, 2020, 7:46 a.m. UTC | #1
On Thu, Apr 02, 2020 at 07:34:16AM +0000, Christophe Leroy wrote:
> Some architectures like powerpc64 have the capability to separate
> read access and write access protection.
> For get_user() and copy_from_user(), powerpc64 only open read access.
> For put_user() and copy_to_user(), powerpc64 only open write access.
> But when using unsafe_get_user() or unsafe_put_user(),
> user_access_begin open both read and write.
> 
> Other architectures like powerpc book3s 32 bits only allow write
> access protection. And on this architecture protection is an heavy
> operation as it requires locking/unlocking per segment of 256Mbytes.
> On those architecture it is therefore desirable to do the unlocking
> only for write access. (Note that book3s/32 ranges from very old
> powermac from the 90's with powerpc 601 processor, till modern
> ADSL boxes with PowerQuicc II modern processors for instance so it
> is still worth considering)
> 
> In order to avoid any risk based of hacking some variable parameters
> passed to user_access_begin/end that would allow hacking and
> leaving user access open or opening too much, it is preferable to
> use dedicated static functions that can't be overridden.
> 
> Add a user_read_access_begin and user_read_access_end to only open
> read access.
> 
> Add a user_write_access_begin and user_write_access_end to only open
> write access.
> 
> By default, when undefined, those new access helpers default on the
> existing user_access_begin and user_access_end.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> Link: https://patchwork.ozlabs.org/patch/1227926/
> ---
> Resending this series as I mistakenly only sent it to powerpc list
> begining of February (https://patchwork.ozlabs.org/patch/1233172/)
> 
> This series is based on the discussion we had in January, see
> https://patchwork.ozlabs.org/patch/1227926/ . I tried to
> take into account all remarks, especially @hpa 's remark to use
> a fixed API on not base the relocking on a magic id returned at
> unlocking.
> 
> This series is awaited for implementing selective lkdtm test to
> test powerpc64 independant read and write protection, see
> https://patchwork.ozlabs.org/patch/1231765/
> 
>  include/linux/uaccess.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 67f016010aad..9861c89f93be 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -378,6 +378,14 @@ extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
>  static inline unsigned long user_access_save(void) { return 0UL; }
>  static inline void user_access_restore(unsigned long flags) { }
>  #endif
> +#ifndef user_write_access_begin
> +#define user_write_access_begin user_access_begin
> +#define user_write_access_end user_access_end
> +#endif
> +#ifndef user_read_access_begin
> +#define user_read_access_begin user_access_begin
> +#define user_read_access_end user_access_end
> +#endif
>  
>  #ifdef CONFIG_HARDENED_USERCOPY
>  void usercopy_warn(const char *name, const char *detail, bool to_user,
> -- 
> 2.25.0
>
Al Viro April 2, 2020, 4:29 p.m. UTC | #2
On Thu, Apr 02, 2020 at 07:34:16AM +0000, Christophe Leroy wrote:
> Some architectures like powerpc64 have the capability to separate
> read access and write access protection.
> For get_user() and copy_from_user(), powerpc64 only open read access.
> For put_user() and copy_to_user(), powerpc64 only open write access.
> But when using unsafe_get_user() or unsafe_put_user(),
> user_access_begin open both read and write.
> 
> Other architectures like powerpc book3s 32 bits only allow write
> access protection. And on this architecture protection is an heavy
> operation as it requires locking/unlocking per segment of 256Mbytes.
> On those architecture it is therefore desirable to do the unlocking
> only for write access. (Note that book3s/32 ranges from very old
> powermac from the 90's with powerpc 601 processor, till modern
> ADSL boxes with PowerQuicc II modern processors for instance so it
> is still worth considering)
> 
> In order to avoid any risk based of hacking some variable parameters
> passed to user_access_begin/end that would allow hacking and
> leaving user access open or opening too much, it is preferable to
> use dedicated static functions that can't be overridden.
> 
> Add a user_read_access_begin and user_read_access_end to only open
> read access.
> 
> Add a user_write_access_begin and user_write_access_end to only open
> write access.
> 
> By default, when undefined, those new access helpers default on the
> existing user_access_begin and user_access_end.

The only problem I have is that we'd better choose the calling
conventions that work for other architectures as well.

AFAICS, aside of ppc and x86 we have (at least) this:
arm:
	unsigned int __ua_flags = uaccess_save_and_enable();
	...
	uaccess_restore(__ua_flags);
arm64:
	uaccess_enable_not_uao();
	...
	uaccess_disable_not_uao();
riscv:
	__enable_user_access();
	...
	__disable_user_access();
s390/mvc:
	old_fs = enable_sacf_uaccess();
	...
        disable_sacf_uaccess(old_fs);

arm64 and riscv are easy - they map well on what we have now.
The interesting ones are ppc, arm and s390.

You wants to specify the kind of access; OK, but...  it's not just read
vs. write - there's read-write as well.  AFAICS, there are 3 users of
that:
	* copy_in_user()
	* arch_futex_atomic_op_inuser()
	* futex_atomic_cmpxchg_inatomic()
The former is of dubious utility (all users outside of arch are in
the badly done compat code), but the other two are not going to go
away.

What should we do about that?  Do we prohibit such blocks outside
of arch?

What should we do about arm and s390?  There we want a cookie passed
from beginning of block to its end; should that be a return value?

And at least on arm that thing nests (see e.g. __clear_user_memset()
there), so "stash that cookie in current->something" is not a solution...

Folks, let's sort that out while we still have few users of that
interface; changing the calling conventions later will be much harder.
Comments?
Christophe Leroy April 2, 2020, 5:03 p.m. UTC | #3
Le 02/04/2020 à 18:29, Al Viro a écrit :
> On Thu, Apr 02, 2020 at 07:34:16AM +0000, Christophe Leroy wrote:
>> Some architectures like powerpc64 have the capability to separate
>> read access and write access protection.
>> For get_user() and copy_from_user(), powerpc64 only open read access.
>> For put_user() and copy_to_user(), powerpc64 only open write access.
>> But when using unsafe_get_user() or unsafe_put_user(),
>> user_access_begin open both read and write.
>>
>> Other architectures like powerpc book3s 32 bits only allow write
>> access protection. And on this architecture protection is an heavy
>> operation as it requires locking/unlocking per segment of 256Mbytes.
>> On those architecture it is therefore desirable to do the unlocking
>> only for write access. (Note that book3s/32 ranges from very old
>> powermac from the 90's with powerpc 601 processor, till modern
>> ADSL boxes with PowerQuicc II modern processors for instance so it
>> is still worth considering)
>>
>> In order to avoid any risk based of hacking some variable parameters
>> passed to user_access_begin/end that would allow hacking and
>> leaving user access open or opening too much, it is preferable to
>> use dedicated static functions that can't be overridden.
>>
>> Add a user_read_access_begin and user_read_access_end to only open
>> read access.
>>
>> Add a user_write_access_begin and user_write_access_end to only open
>> write access.
>>
>> By default, when undefined, those new access helpers default on the
>> existing user_access_begin and user_access_end.
> 
> The only problem I have is that we'd better choose the calling
> conventions that work for other architectures as well.
> 
> AFAICS, aside of ppc and x86 we have (at least) this:
> arm:
> 	unsigned int __ua_flags = uaccess_save_and_enable();
> 	...
> 	uaccess_restore(__ua_flags);
> arm64:
> 	uaccess_enable_not_uao();
> 	...
> 	uaccess_disable_not_uao();
> riscv:
> 	__enable_user_access();
> 	...
> 	__disable_user_access();
> s390/mvc:
> 	old_fs = enable_sacf_uaccess();
> 	...
>          disable_sacf_uaccess(old_fs);
> 
> arm64 and riscv are easy - they map well on what we have now.
> The interesting ones are ppc, arm and s390.
> 
> You wants to specify the kind of access; OK, but...  it's not just read
> vs. write - there's read-write as well.  AFAICS, there are 3 users of
> that:
> 	* copy_in_user()
> 	* arch_futex_atomic_op_inuser()
> 	* futex_atomic_cmpxchg_inatomic()
> The former is of dubious utility (all users outside of arch are in
> the badly done compat code), but the other two are not going to go
> away.

user_access_begin() grants both read and write.

This patch adds user_read_access_begin() and user_write_access_begin() 
but it doesn't remove user_access_begin()

> 
> What should we do about that?  Do we prohibit such blocks outside
> of arch?
> 
> What should we do about arm and s390?  There we want a cookie passed
> from beginning of block to its end; should that be a return value?

That was the way I implemented it in January, see 
https://patchwork.ozlabs.org/patch/1227926/

There was some discussion around that and most noticeable was:

H. Peter (hpa) said about it: "I have *deep* concern with carrying state 
in a "key" variable: it's a direct attack vector for a crowbar attack, 
especially since it is by definition live inside a user access region."

> 
> And at least on arm that thing nests (see e.g. __clear_user_memset()
> there), so "stash that cookie in current->something" is not a solution...
> 
> Folks, let's sort that out while we still have few users of that
> interface; changing the calling conventions later will be much harder.
> Comments?
> 

This patch minimises the change by just adding user_read_access_begin() 
and user_write_access_begin() keeping the same parameters as the 
existing user_access_begin().

So I can come back to a mix of this patch and the January version if it 
corresponds to everyone's view, it will also be a bit easier for powerpc 
(especially book3s/32). But that didn't seem to be the expected 
direction back when we discussed it in January.

Christophe
Kees Cook April 2, 2020, 5:38 p.m. UTC | #4
On Thu, Apr 02, 2020 at 07:03:28PM +0200, Christophe Leroy wrote:
> > What should we do about arm and s390?  There we want a cookie passed
> > from beginning of block to its end; should that be a return value?
> 
> That was the way I implemented it in January, see
> https://patchwork.ozlabs.org/patch/1227926/
> 
> There was some discussion around that and most noticeable was:
> 
> H. Peter (hpa) said about it: "I have *deep* concern with carrying state in
> a "key" variable: it's a direct attack vector for a crowbar attack,
> especially since it is by definition live inside a user access region."

I share this concern -- we want to keep user/kernel access as static as
possible. It should be provable with static analysis, etc (e.g. objtool
does this already for x86).

Since this doesn't disrupt existing R+W access, I'd prefer the design of
this series as-is.
Al Viro April 2, 2020, 5:50 p.m. UTC | #5
On Thu, Apr 02, 2020 at 07:03:28PM +0200, Christophe Leroy wrote:

> user_access_begin() grants both read and write.
> 
> This patch adds user_read_access_begin() and user_write_access_begin() but
> it doesn't remove user_access_begin()

Ouch...  So the most generic name is for the rarest case?
 
> > What should we do about that?  Do we prohibit such blocks outside
> > of arch?
> > 
> > What should we do about arm and s390?  There we want a cookie passed
> > from beginning of block to its end; should that be a return value?
> 
> That was the way I implemented it in January, see
> https://patchwork.ozlabs.org/patch/1227926/
> 
> There was some discussion around that and most noticeable was:
> 
> H. Peter (hpa) said about it: "I have *deep* concern with carrying state in
> a "key" variable: it's a direct attack vector for a crowbar attack,
> especially since it is by definition live inside a user access region."

> This patch minimises the change by just adding user_read_access_begin() and
> user_write_access_begin() keeping the same parameters as the existing
> user_access_begin().

Umm...  What about the arm situation?  The same concerns would apply there,
wouldn't they?  Currently we have
static __always_inline unsigned int uaccess_save_and_enable(void)
{
#ifdef CONFIG_CPU_SW_DOMAIN_PAN
        unsigned int old_domain = get_domain();

        /* Set the current domain access to permit user accesses */
        set_domain((old_domain & ~domain_mask(DOMAIN_USER)) |
                   domain_val(DOMAIN_USER, DOMAIN_CLIENT));

        return old_domain;
#else
        return 0;
#endif
}
and
static __always_inline void uaccess_restore(unsigned int flags)
{
#ifdef CONFIG_CPU_SW_DOMAIN_PAN
        /* Restore the user access mask */
        set_domain(flags);
#endif
}

How much do we need nesting on those, anyway?  rmk?
Christophe Leroy April 2, 2020, 6:35 p.m. UTC | #6
Le 02/04/2020 à 19:50, Al Viro a écrit :
> On Thu, Apr 02, 2020 at 07:03:28PM +0200, Christophe Leroy wrote:
> 
>> user_access_begin() grants both read and write.
>>
>> This patch adds user_read_access_begin() and user_write_access_begin() but
>> it doesn't remove user_access_begin()
> 
> Ouch...  So the most generic name is for the rarest case?
>   

I can add another patch at the end of the series to rename 
user_access_begin() to user_full_access_begin().

Would it suit you ?

Christophe
Kees Cook April 2, 2020, 6:35 p.m. UTC | #7
On Thu, Apr 02, 2020 at 06:50:32PM +0100, Al Viro wrote:
> On Thu, Apr 02, 2020 at 07:03:28PM +0200, Christophe Leroy wrote:
> 
> > user_access_begin() grants both read and write.
> > 
> > This patch adds user_read_access_begin() and user_write_access_begin() but
> > it doesn't remove user_access_begin()
> 
> Ouch...  So the most generic name is for the rarest case?
>  
> > > What should we do about that?  Do we prohibit such blocks outside
> > > of arch?
> > > 
> > > What should we do about arm and s390?  There we want a cookie passed
> > > from beginning of block to its end; should that be a return value?
> > 
> > That was the way I implemented it in January, see
> > https://patchwork.ozlabs.org/patch/1227926/
> > 
> > There was some discussion around that and most noticeable was:
> > 
> > H. Peter (hpa) said about it: "I have *deep* concern with carrying state in
> > a "key" variable: it's a direct attack vector for a crowbar attack,
> > especially since it is by definition live inside a user access region."
> 
> > This patch minimises the change by just adding user_read_access_begin() and
> > user_write_access_begin() keeping the same parameters as the existing
> > user_access_begin().
> 
> Umm...  What about the arm situation?  The same concerns would apply there,
> wouldn't they?  Currently we have
> static __always_inline unsigned int uaccess_save_and_enable(void)
> {
> #ifdef CONFIG_CPU_SW_DOMAIN_PAN
>         unsigned int old_domain = get_domain();
> 
>         /* Set the current domain access to permit user accesses */
>         set_domain((old_domain & ~domain_mask(DOMAIN_USER)) |
>                    domain_val(DOMAIN_USER, DOMAIN_CLIENT));
> 
>         return old_domain;
> #else
>         return 0;
> #endif
> }
> and
> static __always_inline void uaccess_restore(unsigned int flags)
> {
> #ifdef CONFIG_CPU_SW_DOMAIN_PAN
>         /* Restore the user access mask */
>         set_domain(flags);
> #endif
> }
> 
> How much do we need nesting on those, anyway?  rmk?

Yup, I think it's a weakness of the ARM implementation and I'd like to
not extend it further. AFAIK we should never nest, but I would not be
surprised at all if we did.

If we were looking at a design goal for all architectures, I'd like
to be doing what the public PaX patchset did for their memory access
switching, which is to alarm if calling into "enable" found the access
already enabled, etc. Such a condition would show an unexpected nesting
(like we've seen with similar constructs with set_fs() not getting reset
during an exception handler, etc etc).
Linus Torvalds April 2, 2020, 7:26 p.m. UTC | #8
On Thu, Apr 2, 2020 at 11:36 AM Kees Cook <keescook@chromium.org> wrote:
>
> Yup, I think it's a weakness of the ARM implementation and I'd like to
> not extend it further. AFAIK we should never nest, but I would not be
> surprised at all if we did.

Wel, at least the user_access_begin/end() sections can't nest. objtool
verifies and warns about that on x86.

> If we were looking at a design goal for all architectures, I'd like
> to be doing what the public PaX patchset

We already do better than PaX ever did. Seriously. Mainline has long
since passed their hacky garbage.

Plus PaX  and grsecurity should be actively shunned. Don't look at it,
don't use it, and tell everybody you know to not use that shit.

                Linus
Kees Cook April 2, 2020, 8:27 p.m. UTC | #9
On Thu, Apr 02, 2020 at 12:26:52PM -0700, Linus Torvalds wrote:
> On Thu, Apr 2, 2020 at 11:36 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > Yup, I think it's a weakness of the ARM implementation and I'd like to
> > not extend it further. AFAIK we should never nest, but I would not be
> > surprised at all if we did.
> 
> Wel, at least the user_access_begin/end() sections can't nest. objtool
> verifies and warns about that on x86.

Right, yes, I mentioned that earlier in the thread. I meant I wasn't
100% sure about ARM's corner cases. I would _hope_ it doesn't.

> > If we were looking at a design goal for all architectures, I'd like
> > to be doing what the public PaX patchset
> 
> We already do better than PaX ever did. Seriously. Mainline has long
> since passed their hacky garbage.

I was just speaking to design principles in this area: if the "enable"
is called when already enabled, Something Is Wrong. :) (And one thing
still missing in this general subject is that x86 still lacks SMAP
emulation. And yes, I understand it's just not been a priority for anyone
that can work on it, but it is still a gap.)
Linus Torvalds April 2, 2020, 8:47 p.m. UTC | #10
On Thu, Apr 2, 2020 at 1:27 PM Kees Cook <keescook@chromium.org> wrote:
>
> I was just speaking to design principles in this area: if the "enable"
> is called when already enabled, Something Is Wrong. :)

Well, the "something is wrong" could easily be "the hardware does not
support this".

I'm not at all interested in the crazy code to do this in software.
Nobody sane should ever do that.

Yes, I realize that PaX did software emulation of things like that,
and it was one of the reasons why it was never useful to any normal
use.

Security is not an end goal in itself, it's always secondary to "can I
use this".

Security that means "normal people can't use this, it's only for the
special l33t users" is not security, it's garbage. That "do page
tables in software" was a prime example of garbage.

               Linus
Al Viro April 3, 2020, 12:58 a.m. UTC | #11
On Thu, Apr 02, 2020 at 11:35:57AM -0700, Kees Cook wrote:

> Yup, I think it's a weakness of the ARM implementation and I'd like to
> not extend it further. AFAIK we should never nest, but I would not be
> surprised at all if we did.
> 
> If we were looking at a design goal for all architectures, I'd like
> to be doing what the public PaX patchset did for their memory access
> switching, which is to alarm if calling into "enable" found the access
> already enabled, etc. Such a condition would show an unexpected nesting
> (like we've seen with similar constructs with set_fs() not getting reset
> during an exception handler, etc etc).

FWIW, maybe I'm misreading the ARM uaccess logics, but... it smells like
KERNEL_DS is somewhat more dangerous there than on e.g. x86.

Look: with CONFIG_CPU_DOMAINS, set_fs(KERNEL_DS) tells MMU to ignore
per-page permission bits in DOMAIN_KERNEL (i.e. for kernel address
ranges), allowing them even if they would normally be denied.  We need
that for actual uaccess loads/stores, since those use insns that pretend
to be done in user mode and we want them to access the kernel pages.
But that affects the normal loads/stores as well; unless I'm misreading
that code, it will ignore (supervisor) r/o on a page.  And that's not
just for the code inside the uaccess blocks; *everything* done under
KERNEL_DS is subject to that.

Why do we do that (modify_domain(), that is) inside set_fs() and not
in uaccess_enable() et.al.?
Russell King (Oracle) April 3, 2020, 9:49 a.m. UTC | #12
On Fri, Apr 03, 2020 at 01:58:31AM +0100, Al Viro wrote:
> On Thu, Apr 02, 2020 at 11:35:57AM -0700, Kees Cook wrote:
> 
> > Yup, I think it's a weakness of the ARM implementation and I'd like to
> > not extend it further. AFAIK we should never nest, but I would not be
> > surprised at all if we did.
> > 
> > If we were looking at a design goal for all architectures, I'd like
> > to be doing what the public PaX patchset did for their memory access
> > switching, which is to alarm if calling into "enable" found the access
> > already enabled, etc. Such a condition would show an unexpected nesting
> > (like we've seen with similar constructs with set_fs() not getting reset
> > during an exception handler, etc etc).
> 
> FWIW, maybe I'm misreading the ARM uaccess logics, but... it smells like
> KERNEL_DS is somewhat more dangerous there than on e.g. x86.
> 
> Look: with CONFIG_CPU_DOMAINS, set_fs(KERNEL_DS) tells MMU to ignore
> per-page permission bits in DOMAIN_KERNEL (i.e. for kernel address
> ranges), allowing them even if they would normally be denied.  We need
> that for actual uaccess loads/stores, since those use insns that pretend
> to be done in user mode and we want them to access the kernel pages.
> But that affects the normal loads/stores as well; unless I'm misreading
> that code, it will ignore (supervisor) r/o on a page.  And that's not
> just for the code inside the uaccess blocks; *everything* done under
> KERNEL_DS is subject to that.
> 
> Why do we do that (modify_domain(), that is) inside set_fs() and not
> in uaccess_enable() et.al.?

First, CONFIG_CPU_DOMAINS is used on older ARMs, not ARMv7. Second,
the kernel image itself is not RO-protected on any ARM32 platform.

If we get rid of CONFIG_CPU_DOMAINS, we will use the ARMv7 method of
user access, which is to use normal load/stores for the user accessors
and every access must check against the address limit, even the
__-accessors.
Russell King (Oracle) April 3, 2020, 10:02 a.m. UTC | #13
On Thu, Apr 02, 2020 at 11:35:57AM -0700, Kees Cook wrote:
> On Thu, Apr 02, 2020 at 06:50:32PM +0100, Al Viro wrote:
> > On Thu, Apr 02, 2020 at 07:03:28PM +0200, Christophe Leroy wrote:
> > 
> > > user_access_begin() grants both read and write.
> > > 
> > > This patch adds user_read_access_begin() and user_write_access_begin() but
> > > it doesn't remove user_access_begin()
> > 
> > Ouch...  So the most generic name is for the rarest case?
> >  
> > > > What should we do about that?  Do we prohibit such blocks outside
> > > > of arch?
> > > > 
> > > > What should we do about arm and s390?  There we want a cookie passed
> > > > from beginning of block to its end; should that be a return value?
> > > 
> > > That was the way I implemented it in January, see
> > > https://patchwork.ozlabs.org/patch/1227926/
> > > 
> > > There was some discussion around that and most noticeable was:
> > > 
> > > H. Peter (hpa) said about it: "I have *deep* concern with carrying state in
> > > a "key" variable: it's a direct attack vector for a crowbar attack,
> > > especially since it is by definition live inside a user access region."
> > 
> > > This patch minimises the change by just adding user_read_access_begin() and
> > > user_write_access_begin() keeping the same parameters as the existing
> > > user_access_begin().
> > 
> > Umm...  What about the arm situation?  The same concerns would apply there,
> > wouldn't they?  Currently we have
> > static __always_inline unsigned int uaccess_save_and_enable(void)
> > {
> > #ifdef CONFIG_CPU_SW_DOMAIN_PAN
> >         unsigned int old_domain = get_domain();
> > 
> >         /* Set the current domain access to permit user accesses */
> >         set_domain((old_domain & ~domain_mask(DOMAIN_USER)) |
> >                    domain_val(DOMAIN_USER, DOMAIN_CLIENT));
> > 
> >         return old_domain;
> > #else
> >         return 0;
> > #endif
> > }
> > and
> > static __always_inline void uaccess_restore(unsigned int flags)
> > {
> > #ifdef CONFIG_CPU_SW_DOMAIN_PAN
> >         /* Restore the user access mask */
> >         set_domain(flags);
> > #endif
> > }
> > 
> > How much do we need nesting on those, anyway?  rmk?

It's that way because it's easy, logical, and actually *more* efficient
to do it that way, rather than read-modify-write the domain register
each time we want to change it.

> Yup, I think it's a weakness of the ARM implementation and I'd like to
> not extend it further. AFAIK we should never nest, but I would not be
> surprised at all if we did.

There is one known nesting, which is __clear_user() when used with
the (IMHO horrid and I don't care about) UACCESS_WITH_MEMCPY feature.
That's not intentional however.

When I introduced this on ARM, the placement I adopted was to locate
it _as close as sanely possible_ to the userspace access so we
minimised the kernel accesses, so we minimise the number of accesses
that could go stray because of the domain issue - we ideally only
want the access done by the accessor itself to be affected, which
we achieve for most accesses.

Thinking laterally, maybe we should get rid of the whole KERNEL_DS
stuff entirely, and come up with an alternative way of handling the
kernel-wants-to-access-kernelspace-via-user-accessors problem.
Such as, copying some data back to userspace memory?
Catalin Marinas April 3, 2020, 11:26 a.m. UTC | #14
On Fri, Apr 03, 2020 at 01:58:31AM +0100, Al Viro wrote:
> On Thu, Apr 02, 2020 at 11:35:57AM -0700, Kees Cook wrote:
> > Yup, I think it's a weakness of the ARM implementation and I'd like to
> > not extend it further. AFAIK we should never nest, but I would not be
> > surprised at all if we did.
> > 
> > If we were looking at a design goal for all architectures, I'd like
> > to be doing what the public PaX patchset did for their memory access
> > switching, which is to alarm if calling into "enable" found the access
> > already enabled, etc. Such a condition would show an unexpected nesting
> > (like we've seen with similar constructs with set_fs() not getting reset
> > during an exception handler, etc etc).
> 
> FWIW, maybe I'm misreading the ARM uaccess logics, but... it smells like
> KERNEL_DS is somewhat more dangerous there than on e.g. x86.
> 
> Look: with CONFIG_CPU_DOMAINS, set_fs(KERNEL_DS) tells MMU to ignore
> per-page permission bits in DOMAIN_KERNEL (i.e. for kernel address
> ranges), allowing them even if they would normally be denied.  We need
> that for actual uaccess loads/stores, since those use insns that pretend
> to be done in user mode and we want them to access the kernel pages.
> But that affects the normal loads/stores as well; unless I'm misreading
> that code, it will ignore (supervisor) r/o on a page.  And that's not
> just for the code inside the uaccess blocks; *everything* done under
> KERNEL_DS is subject to that.

That's correct. Luckily this only affects ARMv5 and earlier. From ARMv6
onwards, CONFIG_CPU_USE_DOMAINS is no longer selected and the uaccess
instructions are just plain ldr/str.

Russell should know the details on whether there was much choice. Since
the kernel was living in the linear map with full rwx permissions, the
KERNEL_DS overriding was probably not a concern and the ldrt/strt for
uaccess deemed more secure. We also have weird permission setting
pre-ARMv6 (or rather v6k) where RO user pages are writable from the
kernel with standard str instructions (breaking CoW). I don't recall
whether it was a choice made by the kernel or something the architecture
enforced. The vectors page has to be kernel writable (and user RO) to
store the TLS value in the absence of a TLS register but maybe we could
do this via the linear alias together with the appropriate cache
maintenance.

From ARMv6, the domain overriding had the side-effect of ignoring the XN
bit and causing random instruction fetches from ioremap() areas. So we
had to remove the domain switching. We also gained a dedicated TLS
register.

> Why do we do that (modify_domain(), that is) inside set_fs() and not
> in uaccess_enable() et.al.?

I think uaccess_enable() could indeed switch the kernel domain if
KERNEL_DS is set and move this out of set_fs(). It would reduce the
window the kernel domain permissions are overridden. Anyway,
uaccess_enable() appeared much later on arm when Russell introduced PAN
(SMAP) like support by switching the user domain.
Russell King (Oracle) April 3, 2020, 1:37 p.m. UTC | #15
On Fri, Apr 03, 2020 at 12:26:10PM +0100, Catalin Marinas wrote:
> On Fri, Apr 03, 2020 at 01:58:31AM +0100, Al Viro wrote:
> > On Thu, Apr 02, 2020 at 11:35:57AM -0700, Kees Cook wrote:
> > > Yup, I think it's a weakness of the ARM implementation and I'd like to
> > > not extend it further. AFAIK we should never nest, but I would not be
> > > surprised at all if we did.
> > > 
> > > If we were looking at a design goal for all architectures, I'd like
> > > to be doing what the public PaX patchset did for their memory access
> > > switching, which is to alarm if calling into "enable" found the access
> > > already enabled, etc. Such a condition would show an unexpected nesting
> > > (like we've seen with similar constructs with set_fs() not getting reset
> > > during an exception handler, etc etc).
> > 
> > FWIW, maybe I'm misreading the ARM uaccess logics, but... it smells like
> > KERNEL_DS is somewhat more dangerous there than on e.g. x86.
> > 
> > Look: with CONFIG_CPU_DOMAINS, set_fs(KERNEL_DS) tells MMU to ignore
> > per-page permission bits in DOMAIN_KERNEL (i.e. for kernel address
> > ranges), allowing them even if they would normally be denied.  We need
> > that for actual uaccess loads/stores, since those use insns that pretend
> > to be done in user mode and we want them to access the kernel pages.
> > But that affects the normal loads/stores as well; unless I'm misreading
> > that code, it will ignore (supervisor) r/o on a page.  And that's not
> > just for the code inside the uaccess blocks; *everything* done under
> > KERNEL_DS is subject to that.
> 
> That's correct. Luckily this only affects ARMv5 and earlier. From ARMv6
> onwards, CONFIG_CPU_USE_DOMAINS is no longer selected and the uaccess
> instructions are just plain ldr/str.
> 
> Russell should know the details on whether there was much choice. Since
> the kernel was living in the linear map with full rwx permissions, the
> KERNEL_DS overriding was probably not a concern and the ldrt/strt for
> uaccess deemed more secure. We also have weird permission setting
> pre-ARMv6 (or rather v6k) where RO user pages are writable from the
> kernel with standard str instructions (breaking CoW). I don't recall
> whether it was a choice made by the kernel or something the architecture
> enforced. The vectors page has to be kernel writable (and user RO) to
> store the TLS value in the absence of a TLS register but maybe we could
> do this via the linear alias together with the appropriate cache
> maintenance.
> 
> From ARMv6, the domain overriding had the side-effect of ignoring the XN
> bit and causing random instruction fetches from ioremap() areas. So we
> had to remove the domain switching. We also gained a dedicated TLS
> register.

Indeed.  On pre-ARMv6, we have the following choices for protection
attributes:

Page tables	Control Reg	Privileged	User
AP		S,R		permission	permission
00		0,0		No access	No access
00		1,0		Read-only	No access
00		0,1		Read-only	Read-only
00		1,1		Unpredictable	Unpredictable
01		X,X		Read/Write	No access
10		X,X		Read/Write	Read-only
11		X,X		Read/Write	Read/Write

We use S,R=1,0 under Linux because this allows us to read-protect
kernel pages without making them visible to userspace.  If we
changed to S,R=0,1, then we could have our read-only permissions
for both kernel and userspace, drop domain switching, and use the
plain LDR/STR instructions, but we then lose the ability to
write-protect module executable code and other parts of kernel
space without making them visible to userspace.

So, it essentially boils down to making a choice - which set of
security features we think are the most important.

> I think uaccess_enable() could indeed switch the kernel domain if
> KERNEL_DS is set and move this out of set_fs(). It would reduce the
> window the kernel domain permissions are overridden. Anyway,
> uaccess_enable() appeared much later on arm when Russell introduced PAN
> (SMAP) like support by switching the user domain.

Yes, that would be a possibility.  Another possibility would be to
eliminate as much usage of KERNEL_DS as possible - I've just found
one instance in sys_oabi-compat.c that can be eliminated (epoll_ctl)
but there's several there that can't with the current code structure,
and re-coding the contents of some fs/* functions to work around that
is a very bad idea.  If there's some scope for rejigging some of the
fs/* code, it may be possible to elimate some other cases in there.

I notice that the fs/* code seems like some of the last remaining
users of KERNEL_DS, although I suspect that some aren't possible to
eliminate. :(
Al Viro April 3, 2020, 5:26 p.m. UTC | #16
On Fri, Apr 03, 2020 at 02:37:19PM +0100, Russell King - ARM Linux admin wrote:

> > I think uaccess_enable() could indeed switch the kernel domain if
> > KERNEL_DS is set and move this out of set_fs(). It would reduce the
> > window the kernel domain permissions are overridden. Anyway,
> > uaccess_enable() appeared much later on arm when Russell introduced PAN
> > (SMAP) like support by switching the user domain.
> 
> Yes, that would be a possibility.  Another possibility would be to
> eliminate as much usage of KERNEL_DS as possible

That's definitely worth doing, but that's another long-term project ;-/

> - I've just found
> one instance in sys_oabi-compat.c that can be eliminated (epoll_ctl)
> but there's several there that can't with the current code structure,
> and re-coding the contents of some fs/* functions to work around that
> is a very bad idea.  If there's some scope for rejigging some of the
> fs/* code, it may be possible to elimate some other cases in there.

Well, your do_locks() definitely can be converted.  epoll_wait()...
not sure, need to look into that.  Is that about the layout mismatch
between struct oabi_epoll_event and struct epoll_event?  In case of
semtimedop...  Hell knows, I would probably consider moving that thing
into ipc/sem.c under ifdef...
diff mbox series

Patch

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad..9861c89f93be 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -378,6 +378,14 @@  extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 static inline unsigned long user_access_save(void) { return 0UL; }
 static inline void user_access_restore(unsigned long flags) { }
 #endif
+#ifndef user_write_access_begin
+#define user_write_access_begin user_access_begin
+#define user_write_access_end user_access_end
+#endif
+#ifndef user_read_access_begin
+#define user_read_access_begin user_access_begin
+#define user_read_access_end user_access_end
+#endif
 
 #ifdef CONFIG_HARDENED_USERCOPY
 void usercopy_warn(const char *name, const char *detail, bool to_user,