[v7,0/8] arm64: untag user pointers passed to the kernel
mbox series

Message ID cover.1538485901.git.andreyknvl@google.com
Headers show
Series
  • arm64: untag user pointers passed to the kernel
Related show

Message

Andrey Konovalov Oct. 2, 2018, 1:12 p.m. UTC
arm64 has a feature called Top Byte Ignore, which allows to embed pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.

Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:

1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
             tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
	      pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
	      pointers")

When passing tagged pointers to syscalls, there's a special case of such a
pointer being passed to one of the memory syscalls (mmap, mprotect, etc.).
These syscalls don't do memory accesses but rather deal with memory
ranges, hence an untagged pointer is better suited.

This patchset extends tagged pointer support to non-memory syscalls. This
is done by reusing the untagged_addr macro to untag user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok).

The following testing approaches has been taken to find potential issues
with user pointer untagging:

1. Static testing (with sparse [2] and separately with a custom static
   analyzer based on Clang) to track casts of __user pointers to integer
   types to find places where untagging needs to be done.

2. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
   a modified syzkaller version that passes tagged pointers to the kernel.

Based on the results of the testing the requried patches have been added
to the patchset.

This patchset is a prerequisite for ARM's memory tagging hardware feature
support [3].

Thanks!

[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292

[3] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a

Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
  the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
  passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
  patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.

Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).

Changes in v5:
- Added 3 new patches that add untagging to places found with static
  analysis.
- Rebased onto 44c929e1 (4.18-rc8).

Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
  kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).

Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.

Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.

Changes in v1:
- Rebased onto 4.17-rc1.

Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
  defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).

Andrey Konovalov (8):
  arm64: add type casts to untagged_addr macro
  uaccess: add untagged_addr definition for other arches
  arm64: untag user addresses in access_ok and __uaccess_mask_ptr
  mm, arm64: untag user addresses in mm/gup.c
  lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user
  fs, arm64: untag user address in copy_mount_options
  arm64: update Documentation/arm64/tagged-pointers.txt
  selftests, arm64: add a selftest for passing tagged pointers to kernel

 Documentation/arm64/tagged-pointers.txt       | 24 +++++++++++--------
 arch/arm64/include/asm/uaccess.h              | 14 +++++++----
 fs/namespace.c                                |  2 +-
 include/linux/uaccess.h                       |  4 ++++
 lib/strncpy_from_user.c                       |  2 ++
 lib/strnlen_user.c                            |  2 ++
 mm/gup.c                                      |  4 ++++
 tools/testing/selftests/arm64/.gitignore      |  1 +
 tools/testing/selftests/arm64/Makefile        | 11 +++++++++
 .../testing/selftests/arm64/run_tags_test.sh  | 12 ++++++++++
 tools/testing/selftests/arm64/tags_test.c     | 19 +++++++++++++++
 11 files changed, 79 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/arm64/.gitignore
 create mode 100644 tools/testing/selftests/arm64/Makefile
 create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
 create mode 100644 tools/testing/selftests/arm64/tags_test.c

Comments

Luc Van Oostenryck Oct. 3, 2018, 1:31 p.m. UTC | #1
On Tue, Oct 02, 2018 at 03:12:35PM +0200, Andrey Konovalov wrote:
...

> Changes in v7:
> - Rebased onto 17b57b18 (4.19-rc6).
> - Dropped the "arm64: untag user address in __do_user_fault" patch, since
>   the existing patches already handle user faults properly.
> - Dropped the "usb, arm64: untag user addresses in devio" patch, since the
>   passed pointer must come from a vma and therefore be untagged.
> - Dropped the "arm64: annotate user pointers casts detected by sparse"
>   patch (see the discussion to the replies of the v6 of this patchset).
> - Added more context to the cover letter.
> - Updated Documentation/arm64/tagged-pointers.txt.

Hi,

I'm quite hapy now with what concerns me (sparse).
Please, feel free to add my:
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>


Cheers,
-- Luc Van Oostenryck
Vincenzo Frascino Oct. 17, 2018, 2:06 p.m. UTC | #2
Hi Andrey,

On 02/10/2018 14:12, Andrey Konovalov wrote:
> arm64 has a feature called Top Byte Ignore, which allows to embed pointer
> tags into the top byte of each pointer. Userspace programs (such as
> HWASan, a memory debugging tool [1]) might use this feature and pass
> tagged user pointers to the kernel through syscalls or other interfaces.
> 
> Right now the kernel is already able to handle user faults with tagged
> pointers, due to these patches:
> 
> 1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
>              tagged pointer")
> 2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
> 	      pointers")
> 3. 276e9327 ("arm64: entry: improve data abort handling of tagged
> 	      pointers")
> 
> When passing tagged pointers to syscalls, there's a special case of such a
> pointer being passed to one of the memory syscalls (mmap, mprotect, etc.).
> These syscalls don't do memory accesses but rather deal with memory
> ranges, hence an untagged pointer is better suited.
> 
> This patchset extends tagged pointer support to non-memory syscalls. This
> is done by reusing the untagged_addr macro to untag user pointers when the
> kernel performs pointer checking to find out whether the pointer comes
> from userspace (most notably in access_ok).
> 
> The following testing approaches has been taken to find potential issues
> with user pointer untagging:
> 
> 1. Static testing (with sparse [2] and separately with a custom static
>    analyzer based on Clang) to track casts of __user pointers to integer
>    types to find places where untagging needs to be done.
> 
> 2. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
>    a modified syzkaller version that passes tagged pointers to the kernel.
> 
...

I have been thinking a bit lately on how to address the problem of user tagged pointers passed to the kernel through syscalls, and IMHO probably the best way we have to catch them all and make sure that the approach is maintainable in the long term is to introduce shims that tag/untag the pointers passed to the kernel.

In details, what I am proposing can live either in userspace (preferred solution so that we do not have to relax the ABI) or in kernel space and can be summarized as follows:
 - A shim is specific to a syscall and is called by the libc when it needs to invoke the respective syscall.
 - It is required only if the syscall accepts pointers.
 - It saves the tags of a pointers passed to the syscall in memory (same approach if the we are passing a struct that contains pointers to the kernel, with the difference that all the tags of the pointers in the struct need to be saved singularly)
 - Untags the pointers
 - Invokes the syscall
 - Retags the pointers with the tags stored in memory
 - Returns

What do you think?
Andrey Konovalov Oct. 17, 2018, 2:20 p.m. UTC | #3
On Wed, Oct 17, 2018 at 4:06 PM, Vincenzo Frascino
<vincenzo.frascino@arm.com> wrote:
> Hi Andrey,
> I have been thinking a bit lately on how to address the problem of user tagged pointers passed to the kernel through syscalls, and IMHO probably the best way we have to catch them all and make sure that the approach is maintainable in the long term is to introduce shims that tag/untag the pointers passed to the kernel.
>
> In details, what I am proposing can live either in userspace (preferred solution so that we do not have to relax the ABI) or in kernel space and can be summarized as follows:
>  - A shim is specific to a syscall and is called by the libc when it needs to invoke the respective syscall.
>  - It is required only if the syscall accepts pointers.
>  - It saves the tags of a pointers passed to the syscall in memory (same approach if the we are passing a struct that contains pointers to the kernel, with the difference that all the tags of the pointers in the struct need to be saved singularly)
>  - Untags the pointers
>  - Invokes the syscall
>  - Retags the pointers with the tags stored in memory
>  - Returns
>
> What do you think?

Hi Vincenzo,

If I correctly understand what you are proposing, I'm not sure if that
would work with the countless number of different ioctl calls. For
example when an ioctl accepts a struct with a bunch of pointer fields.
In this case a shim like the one you propose can't live in userspace,
since libc doesn't know about the interface of all ioctls, so it can't
know which fields to untag. The kernel knows about those interfaces
(since the kernel implements them), but then we would need a custom
shim for each ioctl variation, which doesn't seem practical.

Thanks!
Evgenii Stepanov Oct. 17, 2018, 8:25 p.m. UTC | #4
On Wed, Oct 17, 2018 at 7:20 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
> On Wed, Oct 17, 2018 at 4:06 PM, Vincenzo Frascino
> <vincenzo.frascino@arm.com> wrote:
>> Hi Andrey,
>> I have been thinking a bit lately on how to address the problem of user tagged pointers passed to the kernel through syscalls, and IMHO probably the best way we have to catch them all and make sure that the approach is maintainable in the long term is to introduce shims that tag/untag the pointers passed to the kernel.
>>
>> In details, what I am proposing can live either in userspace (preferred solution so that we do not have to relax the ABI) or in kernel space and can be summarized as follows:
>>  - A shim is specific to a syscall and is called by the libc when it needs to invoke the respective syscall.
>>  - It is required only if the syscall accepts pointers.
>>  - It saves the tags of a pointers passed to the syscall in memory (same approach if the we are passing a struct that contains pointers to the kernel, with the difference that all the tags of the pointers in the struct need to be saved singularly)
>>  - Untags the pointers
>>  - Invokes the syscall
>>  - Retags the pointers with the tags stored in memory
>>  - Returns
>>
>> What do you think?
>
> Hi Vincenzo,
>
> If I correctly understand what you are proposing, I'm not sure if that
> would work with the countless number of different ioctl calls. For
> example when an ioctl accepts a struct with a bunch of pointer fields.
> In this case a shim like the one you propose can't live in userspace,
> since libc doesn't know about the interface of all ioctls, so it can't
> know which fields to untag. The kernel knows about those interfaces
> (since the kernel implements them), but then we would need a custom
> shim for each ioctl variation, which doesn't seem practical.

The current patchset handles majority of pointers in a just a few
common places, like copy_from_user. Userspace shims will need to untag
& retag all pointer arguments - we are looking at hundreds if not
thousands of shims. They will also be located in a different code base
from the syscall / ioctl implementations, which would make them
impossible to keep up to date.
Catalin Marinas Oct. 18, 2018, 5:33 p.m. UTC | #5
On Wed, Oct 17, 2018 at 01:25:42PM -0700, Evgenii Stepanov wrote:
> On Wed, Oct 17, 2018 at 7:20 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
> > On Wed, Oct 17, 2018 at 4:06 PM, Vincenzo Frascino
> > <vincenzo.frascino@arm.com> wrote:
> >> I have been thinking a bit lately on how to address the problem of
> >> user tagged pointers passed to the kernel through syscalls, and
> >> IMHO probably the best way we have to catch them all and make sure
> >> that the approach is maintainable in the long term is to introduce
> >> shims that tag/untag the pointers passed to the kernel.
> >>
> >> In details, what I am proposing can live either in userspace
> >> (preferred solution so that we do not have to relax the ABI) or in
> >> kernel space and can be summarized as follows:
> >>  - A shim is specific to a syscall and is called by the libc when
> >>  it needs to invoke the respective syscall.
> >>  - It is required only if the syscall accepts pointers.
> >>  - It saves the tags of a pointers passed to the syscall in memory
> >>  (same approach if the we are passing a struct that contains
> >>  pointers to the kernel, with the difference that all the tags of
> >>  the pointers in the struct need to be saved singularly)
> >>  - Untags the pointers
> >>  - Invokes the syscall
> >>  - Retags the pointers with the tags stored in memory
> >>  - Returns
> >>
> >> What do you think?
> >
> > If I correctly understand what you are proposing, I'm not sure if that
> > would work with the countless number of different ioctl calls. For
> > example when an ioctl accepts a struct with a bunch of pointer fields.
> > In this case a shim like the one you propose can't live in userspace,
> > since libc doesn't know about the interface of all ioctls, so it can't
> > know which fields to untag. The kernel knows about those interfaces
> > (since the kernel implements them), but then we would need a custom
> > shim for each ioctl variation, which doesn't seem practical.
> 
> The current patchset handles majority of pointers in a just a few
> common places, like copy_from_user. Userspace shims will need to untag
> & retag all pointer arguments - we are looking at hundreds if not
> thousands of shims. They will also be located in a different code base
> from the syscall / ioctl implementations, which would make them
> impossible to keep up to date.

I think ioctls are a good reason not to attempt such user-space shim
layer (though it would have been much easier for the kernel ;)).
Vincenzo Frascino Oct. 19, 2018, 9:04 a.m. UTC | #6
On 10/17/18 9:25 PM, Evgenii Stepanov wrote:
> On Wed, Oct 17, 2018 at 7:20 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
>> On Wed, Oct 17, 2018 at 4:06 PM, Vincenzo Frascino
>> <vincenzo.frascino@arm.com> wrote:
>>> Hi Andrey,
>>> I have been thinking a bit lately on how to address the problem of user tagged pointers passed to the kernel through syscalls, and IMHO probably the best way we have to catch them all and make sure that the approach is maintainable in the long term is to introduce shims that tag/untag the pointers passed to the kernel.
>>>
>>> In details, what I am proposing can live either in userspace (preferred solution so that we do not have to relax the ABI) or in kernel space and can be summarized as follows:
>>>  - A shim is specific to a syscall and is called by the libc when it needs to invoke the respective syscall.
>>>  - It is required only if the syscall accepts pointers.
>>>  - It saves the tags of a pointers passed to the syscall in memory (same approach if the we are passing a struct that contains pointers to the kernel, with the difference that all the tags of the pointers in the struct need to be saved singularly)
>>>  - Untags the pointers
>>>  - Invokes the syscall
>>>  - Retags the pointers with the tags stored in memory
>>>  - Returns
>>>
>>> What do you think?
>>
>> Hi Vincenzo,
>>
>> If I correctly understand what you are proposing, I'm not sure if that
>> would work with the countless number of different ioctl calls. For
>> example when an ioctl accepts a struct with a bunch of pointer fields.
>> In this case a shim like the one you propose can't live in userspace,
>> since libc doesn't know about the interface of all ioctls, so it can't
>> know which fields to untag. The kernel knows about those interfaces
>> (since the kernel implements them), but then we would need a custom
>> shim for each ioctl variation, which doesn't seem practical.
> 
> The current patchset handles majority of pointers in a just a few
> common places, like copy_from_user. Userspace shims will need to untag
> & retag all pointer arguments - we are looking at hundreds if not
> thousands of shims. They will also be located in a different code base
> from the syscall / ioctl implementations, which would make them
> impossible to keep up to date.
> 

I agree with both of you, ioctl is the real show stopper for this approach. Thanks for pointing this out.