[v2,2/4] arm64: Define Documentation/arm64/elf_at_flags.txt

Message ID 20190318163533.26838-3-vincenzo.frascino@arm.com
State New
Series
  • [v2,1/4] elf: Make AT_FLAGS arch configurable

Commit Message

Vincenzo Frascino March 18, 2019, 4:35 p.m. UTC
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence
userspace (EL0) is allowed to set a non-zero value in the top byte
of a pointer, but the resulting tagged pointers are not allowed at
the user-kernel syscall ABI boundary.

The relaxed ABI proposed in this document makes it possible to pass
tagged pointers to syscalls when these pointers are in memory ranges
obtained by an anonymous (MAP_ANONYMOUS) mmap() or by brk().

This change in the ABI requires a mechanism to inform userspace that
the relaxed ABI is available.

Specify and document how AT_FLAGS is used to advertise this feature
to userspace.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>

Squash with "arm64: Define Documentation/arm64/elf_at_flags.txt"
---
 Documentation/arm64/elf_at_flags.txt | 133 +++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 Documentation/arm64/elf_at_flags.txt

Comments

Amit Kachhap March 22, 2019, 6:22 a.m. UTC | #1
Hi Vincenzo,

On Mon, Mar 18, 2019 at 10:06 PM Vincenzo Frascino
<vincenzo.frascino@arm.com> wrote:
> +Example of correct usage (pseudo-code) for a userspace application:
> +
> +bool arm64_syscall_tbi_is_present(void)
> +{
> +       unsigned long at_flags = getauxval(AT_FLAGS);
> +       if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
> +                       return true;
> +
> +       return false;
> +}
> +
> +void main(void)
> +{
> +       char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
> +                         MAP_ANONYMOUS, -1, 0);
> +
> +       int fd = open("test.txt", O_WRONLY);
> +
> +       /* Check if the relaxed ABI is supported */
> +       if (arm64_syscall_tbi_is_present()) {
> +               /* Add a tag to the pointer */
> +               addr = tag_pointer(addr);
> +       }
> +
> +       strcpy("Hello World\n", addr);
Nit: s/strcpy("Hello World\n", addr)/strcpy(addr, "Hello World\n")

Thanks,
Amit D
Catalin Marinas March 22, 2019, 10:48 a.m. UTC | #2
On Fri, Mar 22, 2019 at 11:52:37AM +0530, Amit Daniel Kachhap wrote:
> On Mon, Mar 18, 2019 at 10:06 PM Vincenzo Frascino
> <vincenzo.frascino@arm.com> wrote:
> > +Example of correct usage (pseudo-code) for a userspace application:
> > +
> > +bool arm64_syscall_tbi_is_present(void)
> > +{
> > +       unsigned long at_flags = getauxval(AT_FLAGS);
> > +       if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
> > +                       return true;
> > +
> > +       return false;
> > +}
> > +
> > +void main(void)
> > +{
> > +       char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
> > +                         MAP_ANONYMOUS, -1, 0);
> > +
> > +       int fd = open("test.txt", O_WRONLY);
> > +
> > +       /* Check if the relaxed ABI is supported */
> > +       if (arm64_syscall_tbi_is_present()) {
> > +               /* Add a tag to the pointer */
> > +               addr = tag_pointer(addr);
> > +       }
> > +
> > +       strcpy("Hello World\n", addr);
> 
> Nit: s/strcpy("Hello World\n", addr)/strcpy(addr, "Hello World\n")

Not exactly a nit ;).

> > +
> > +       /* Write to a file */
> > +       write(fd, addr, sizeof(addr));

I presume this was supposed to write "Hello World\n" to a file but
sizeof(addr) is 1.

Since we already support tagged pointers in user space (as long as they
are not passed into the kernel), the above example could tag the pointer
unconditionally and only clear it before write() if
!arm64_syscall_tbi_is_present().
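
For illustration, the flow suggested above could look like the following minimal sketch: the pointer is tagged unconditionally (userspace dereferences already work because TBI0 is enabled), and the tag is cleared just before write() only when the relaxed ABI is not advertised. The helpers tag_pointer(), untag_pointer(), syscall_ptr() and write_tagged() are hypothetical, and the fallback definition of ARM64_AT_FLAGS_SYSCALL_TBI as bit 0 only mirrors the proposed document; a real program would take it from the proposed uapi/asm/atflags.h. The sketch assumes 64-bit pointers with the tag in bits 63:56.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Assumption: bit[0] of AT_FLAGS, as proposed in elf_at_flags.txt. */
#ifndef ARM64_AT_FLAGS_SYSCALL_TBI
#define ARM64_AT_FLAGS_SYSCALL_TBI	(1UL << 0)
#endif

#define TAG_SHIFT	56
#define TAG_MASK	((uintptr_t)0xff << TAG_SHIFT)

/* Hypothetical helper: store an 8-bit tag in the top byte of p. */
static void *tag_pointer(const void *p, uint8_t tag)
{
	return (void *)(((uintptr_t)p & ~TAG_MASK) | ((uintptr_t)tag << TAG_SHIFT));
}

/* Hypothetical helper: clear the top byte of p. */
static void *untag_pointer(const void *p)
{
	return (void *)((uintptr_t)p & ~TAG_MASK);
}

static bool arm64_syscall_tbi_is_present(void)
{
	return (getauxval(AT_FLAGS) & ARM64_AT_FLAGS_SYSCALL_TBI) != 0;
}

/* Keep the tag only if the kernel accepts tagged pointers in syscalls. */
static const void *syscall_ptr(const void *p, bool relaxed_abi)
{
	return relaxed_abi ? p : untag_pointer(p);
}

/* write() wrapper for a buffer that may carry a tag. */
static ssize_t write_tagged(int fd, const void *buf, size_t len)
{
	return write(fd, syscall_ptr(buf, arm64_syscall_tbi_is_present()), len);
}
```

On arm64 the usage would mirror the document's example: addr = tag_pointer(addr, 0x2a); strcpy(addr, "Hello World\n"); write_tagged(fd, addr, strlen(addr)); passing strlen() rather than sizeof(addr) as the length. Dereferencing the tagged pointer relies on TBI and would fault on architectures without it.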
Kevin Brodsky March 22, 2019, 3:52 p.m. UTC | #3
On 18/03/2019 16:35, Vincenzo Frascino wrote:
> +2. Features exposed via AT_FLAGS
> +--------------------------------
> +
> +bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
> +
> +    On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
> +    kernel, hence the userspace (EL0) is allowed to set a non-zero value
> +    in the top byte but the resulting pointers are not allowed at the
> +    user-kernel syscall ABI boundary.
> +    When bit[0] is set to 1 the kernel is advertising to the userspace
> +    that a relaxed ABI is supported hence this type of pointers are now
> +    allowed to be passed to the syscalls, when these pointers are in
> +    memory ranges privately owned by a process and obtained by the
> +    process in accordance with the definition of "valid tagged pointer"
> +    in paragraph 3.
> +    In these cases the tag is preserved as the pointer goes through the
> +    kernel. Only when the kernel needs to check if a pointer is coming
> +    from userspace an untag operation is required.

I would leave this last sentence out, because:
1. It is an implementation detail that doesn't impact this user ABI.
2. It is not entirely accurate: untagging the pointer may be needed for various kinds 
of address lookup (like finding the corresponding VMA), at which point the kernel 
usually already knows it is a userspace pointer.

> +
> +3. ARM64_AT_FLAGS_SYSCALL_TBI
> +-----------------------------
> +
> +From the kernel syscall interface prospective, we define, for the purposes
> +of this document, a "valid tagged pointer" as a pointer that either it has
> +a zero value set in the top byte or it has a non-zero value, it is in memory
> +ranges privately owned by a userspace process and it is obtained in one of
> +the following ways:
> +  - mmap() done by the process itself, where either:
> +    * flags = MAP_PRIVATE | MAP_ANONYMOUS
> +    * flags = MAP_PRIVATE and the file descriptor refers to a regular
> +      file or "/dev/zero"
> +  - a mapping below sbrk(0) done by the process itself

I don't think that's very clear, this doesn't say how the mapping is obtained. Maybe 
"a mapping obtained by the process using brk() or sbrk()"?

> +  - any memory mapped by the kernel in the process's address space during
> +    creation and following the restrictions presented above (i.e. data, bss,
> +    stack).

With the rules above, the code section is included as well. Replacing "i.e." with 
"e.g." would avoid having to list every single section (which is probably not a good 
idea anyway).

Kevin

Catalin Marinas April 3, 2019, 4:50 p.m. UTC | #4
On Fri, Mar 22, 2019 at 03:52:49PM +0000, Kevin Brodsky wrote:
> On 18/03/2019 16:35, Vincenzo Frascino wrote:
> > +2. Features exposed via AT_FLAGS
> > +--------------------------------
> > +
> > +bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
> > +
> > +    On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
> > +    kernel, hence the userspace (EL0) is allowed to set a non-zero value
> > +    in the top byte but the resulting pointers are not allowed at the
> > +    user-kernel syscall ABI boundary.
> > +    When bit[0] is set to 1 the kernel is advertising to the userspace
> > +    that a relaxed ABI is supported hence this type of pointers are now
> > +    allowed to be passed to the syscalls, when these pointers are in
> > +    memory ranges privately owned by a process and obtained by the
> > +    process in accordance with the definition of "valid tagged pointer"
> > +    in paragraph 3.
> > +    In these cases the tag is preserved as the pointer goes through the
> > +    kernel. Only when the kernel needs to check if a pointer is coming
> > +    from userspace an untag operation is required.
> 
> I would leave this last sentence out, because:
> 1. It is an implementation detail that doesn't impact this user ABI.
> 2. It is not entirely accurate: untagging the pointer may be needed for
> various kinds of address lookup (like finding the corresponding VMA), at
> which point the kernel usually already knows it is a userspace pointer.

I fully agree, the above paragraph should not be part of the user ABI
document.

> > +3. ARM64_AT_FLAGS_SYSCALL_TBI
> > +-----------------------------
> > +
> > +From the kernel syscall interface prospective, we define, for the purposes
> > +of this document, a "valid tagged pointer" as a pointer that either it has
> > +a zero value set in the top byte or it has a non-zero value, it is in memory
> > +ranges privately owned by a userspace process and it is obtained in one of
> > +the following ways:
> > +  - mmap() done by the process itself, where either:
> > +    * flags = MAP_PRIVATE | MAP_ANONYMOUS
> > +    * flags = MAP_PRIVATE and the file descriptor refers to a regular
> > +      file or "/dev/zero"
> > +  - a mapping below sbrk(0) done by the process itself
> 
> I don't think that's very clear, this doesn't say how the mapping is
> obtained. Maybe "a mapping obtained by the process using brk() or sbrk()"?

I think what we mean here is anything in the "[heap]" section as per
/proc/*/maps (in the kernel this would be start_brk to brk).
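
As an illustration of this "[heap]" definition, the hypothetical helper in_heap() below checks from userspace whether a possibly tagged pointer falls inside a "[heap]" range of /proc/self/maps. Note that the top byte has to be cleared before comparing against the listed ranges, since those are untagged addresses. This is a minimal sketch, not a kernel or libc API, and it assumes 64-bit pointers with the tag in bits 63:56 and maps lines shorter than the read buffer.

```c
#define _DEFAULT_SOURCE	/* for sbrk() in the usage example */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TAG_MASK	((uintptr_t)0xff << 56)

/*
 * Hypothetical helper: return true if p (tagged or not) lies inside a
 * "[heap]" entry of /proc/self/maps, i.e. between start_brk and brk.
 */
static bool in_heap(const void *p)
{
	/* The ranges in maps are untagged, so clear the top byte first. */
	uintptr_t addr = (uintptr_t)p & ~TAG_MASK;
	char line[512];
	bool found = false;
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return false;

	while (!found && fgets(line, sizeof(line), f)) {
		unsigned long start, end;

		if (!strstr(line, "[heap]"))
			continue;
		if (sscanf(line, "%lx-%lx", &start, &end) == 2)
			found = addr >= start && addr < end;
	}

	fclose(f);
	return found;
}
```

For example, memory returned by sbrk(4096) lies in [heap], with or without a tag in its top byte, whereas the address of a local variable lies in [stack] and is rejected.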

> > +  - any memory mapped by the kernel in the process's address space during
> > +    creation and following the restrictions presented above (i.e. data, bss,
> > +    stack).
> 
> With the rules above, the code section is included as well. Replacing "i.e."
> with "e.g." would avoid having to list every single section (which is
> probably not a good idea anyway).

We could mention [stack] explicitly as that's documented in the
Documentation/filesystems/proc.txt and it's likely considered ABI
already.

The code section is MAP_PRIVATE, and can be done by the dynamic loader
(user process), so it falls under the mmap() rules listed above. I guess
we could simply drop "done by the process itself" here and allow
MAP_PRIVATE|MAP_ANONYMOUS or MAP_PRIVATE of regular file. This would
cover the [heap] and [stack] and we won't have to debate the brk() case
at all.

We probably mention somewhere (or we should in the tagged pointers doc)
that we don't support tagged PC.
Kevin Brodsky April 12, 2019, 2:16 p.m. UTC | #5
On 03/04/2019 17:50, Catalin Marinas wrote:
> On Fri, Mar 22, 2019 at 03:52:49PM +0000, Kevin Brodsky wrote:
>> On 18/03/2019 16:35, Vincenzo Frascino wrote:
>>> +2. Features exposed via AT_FLAGS
>>> +--------------------------------
>>> +
>>> +bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
>>> +
>>> +    On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
>>> +    kernel, hence the userspace (EL0) is allowed to set a non-zero value
>>> +    in the top byte but the resulting pointers are not allowed at the
>>> +    user-kernel syscall ABI boundary.
>>> +    When bit[0] is set to 1 the kernel is advertising to the userspace
>>> +    that a relaxed ABI is supported hence this type of pointers are now
>>> +    allowed to be passed to the syscalls, when these pointers are in
>>> +    memory ranges privately owned by a process and obtained by the
>>> +    process in accordance with the definition of "valid tagged pointer"
>>> +    in paragraph 3.
>>> +    In these cases the tag is preserved as the pointer goes through the
>>> +    kernel. Only when the kernel needs to check if a pointer is coming
>>> +    from userspace an untag operation is required.
>> I would leave this last sentence out, because:
>> 1. It is an implementation detail that doesn't impact this user ABI.
>> 2. It is not entirely accurate: untagging the pointer may be needed for
>> various kinds of address lookup (like finding the corresponding VMA), at
>> which point the kernel usually already knows it is a userspace pointer.
> I fully agree, the above paragraph should not be part of the user ABI
> document.
>
>>> +3. ARM64_AT_FLAGS_SYSCALL_TBI
>>> +-----------------------------
>>> +
>>> +From the kernel syscall interface prospective, we define, for the purposes
>>> +of this document, a "valid tagged pointer" as a pointer that either it has
>>> +a zero value set in the top byte or it has a non-zero value, it is in memory
>>> +ranges privately owned by a userspace process and it is obtained in one of
>>> +the following ways:
>>> +  - mmap() done by the process itself, where either:
>>> +    * flags = MAP_PRIVATE | MAP_ANONYMOUS
>>> +    * flags = MAP_PRIVATE and the file descriptor refers to a regular
>>> +      file or "/dev/zero"
>>> +  - a mapping below sbrk(0) done by the process itself
>> I don't think that's very clear, this doesn't say how the mapping is
>> obtained. Maybe "a mapping obtained by the process using brk() or sbrk()"?
> I think what we mean here is anything in the "[heap]" section as per
> /proc/*/maps (in the kernel this would be start_brk to brk).
>
>>> +  - any memory mapped by the kernel in the process's address space during
>>> +    creation and following the restrictions presented above (i.e. data, bss,
>>> +    stack).
>> With the rules above, the code section is included as well. Replacing "i.e."
>> with "e.g." would avoid having to list every single section (which is
>> probably not a good idea anyway).
> We could mention [stack] explicitly as that's documented in the
> Documentation/filesystems/proc.txt and it's likely considered ABI
> already.
>
> The code section is MAP_PRIVATE, and can be done by the dynamic loader
> (user process), so it falls under the mmap() rules listed above. I guess
> we could simply drop "done by the process itself" here and allow
> MAP_PRIVATE|MAP_ANONYMOUS or MAP_PRIVATE of regular file. This would
> cover the [heap] and [stack] and we won't have to debate the brk() case
> at all.

That's probably the best option. I initially used this wording because I was worried 
that there could be cases where the kernel allocates "magic" memory for userspace 
that is MAP_PRIVATE|MAP_ANONYMOUS, but in fact it's probably not the case (presumably 
such mapping should always be done via install_special_mapping(), which is definitely 
not MAP_PRIVATE).

> We probably mention somewhere (or we should in the tagged pointers doc)
> that we don't support tagged PC.

I think that Documentation/arm64/tagged-pointers.txt already makes it reasonably 
clear (anyway, with the architecture not supporting it, you can't expect much from 
the kernel).

Kevin

Patch

diff --git a/Documentation/arm64/elf_at_flags.txt b/Documentation/arm64/elf_at_flags.txt
new file mode 100644
index 000000000000..9b3494207c14
--- /dev/null
+++ b/Documentation/arm64/elf_at_flags.txt
@@ -0,0 +1,133 @@ 
+ARM64 ELF AT_FLAGS
+==================
+
+This document describes the usage and semantics of AT_FLAGS on arm64.
+
+1. Introduction
+---------------
+
+AT_FLAGS is an entry of the ELF auxiliary vector that holds a set of
+flags. On arm64 the kernel sets it to zero unless one or more of the
+features detailed in section 2 are present.
+
+Userspace can access the auxiliary vector through the getauxval()
+function provided by the C library. getauxval(AT_FLAGS) returns an
+unsigned long in which the bit corresponding to each present flag is
+set to 1.
+
+The AT_FLAGS bits with defined semantics on arm64 are exposed to
+userspace via the user API header (uapi/asm/atflags.h).
+All AT_FLAGS bits to which this document does not assign an explicit
+meaning are reserved for future use, and the kernel sets them to zero
+until meanings are assigned to them. If and when meanings are
+assigned, it is guaranteed that they will not affect the functional
+operation of existing userspace software. Userspace software should
+ignore any AT_FLAGS bit whose meaning was not defined when the software
+was written.
+
+Userspace software can test for a feature by reading the AT_FLAGS
+entry of the auxiliary vector and checking whether the relevant flag
+is set.
+
+Example of a userspace test function:
+
+bool feature_x_is_present(void)
+{
+	unsigned long at_flags = getauxval(AT_FLAGS);
+	if (at_flags & FEATURE_X)
+		return true;
+
+	return false;
+}
+
+Software that relies on a feature advertised by AT_FLAGS must check
+that the feature is present before attempting to use it.
+
+2. Features exposed via AT_FLAGS
+--------------------------------
+
+bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
+
+    The arm64 kernel has always enabled the TCR_EL1.TBI0 bit, hence
+    userspace (EL0) is allowed to set a non-zero value in the top byte
+    of a pointer, but the resulting tagged pointers are not allowed at
+    the user-kernel syscall ABI boundary.
+    When bit[0] is set to 1, the kernel advertises to userspace that a
+    relaxed ABI is supported, under which such pointers may be passed to
+    syscalls provided that they are "valid tagged pointers" as defined
+    in section 3.
+
+3. ARM64_AT_FLAGS_SYSCALL_TBI
+-----------------------------
+
+From the kernel syscall interface perspective, we define, for the
+purposes of this document, a "valid tagged pointer" as a pointer that
+either has a zero value in the top byte, or has a non-zero value in the
+top byte and points into a memory range privately owned by a userspace
+process that was obtained in one of the following ways:
+  - mmap() done by the process itself, where either:
+    * flags = MAP_PRIVATE | MAP_ANONYMOUS
+    * flags = MAP_PRIVATE and the file descriptor refers to a regular
+      file or "/dev/zero"
+  - brk() or sbrk() done by the process itself, i.e. memory in the
+    "[heap]" region reported by /proc/<pid>/maps, below sbrk(0)
+  - any memory mapped by the kernel in the process's address space during
+    process creation that follows the restrictions above (e.g. data, bss,
+    stack).
+
+When the ARM64_AT_FLAGS_SYSCALL_TBI flag is set by the kernel, the ABI
+guarantees the following behaviours:
+
+  - Every current or newly introduced syscall accepts any valid tagged
+    pointer.
+
+  - If a pointer that is not a valid tagged pointer is passed to a
+    syscall, the behaviour is undefined.
+
+  - Every valid tagged pointer is expected to work as its untagged
+    equivalent.
+
+  - The kernel preserves valid tagged pointers and returns them to
+    userspace unchanged, except in the cases documented in the
+    "Preserving tags" section of Documentation/arm64/tagged-pointers.txt.
+
+The meaning of tagged pointers on arm64 is defined in
+Documentation/arm64/tagged-pointers.txt.
+
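+Example of correct usage for a userspace application. tag_pointer() and
+untag_pointer() are application-defined helpers (not part of this ABI)
+and the tag value 0x2a is arbitrary. Since TBI0 is always enabled, the
+pointer is tagged unconditionally; the tag only has to be cleared before
+a syscall when the relaxed ABI is not advertised:
+
+#include <fcntl.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <asm/atflags.h>	/* ARM64_AT_FLAGS_SYSCALL_TBI */
+
+#define TAG_SHIFT	56
+#define TAG_MASK	((uintptr_t)0xff << TAG_SHIFT)
+
+static char *tag_pointer(char *p, uint8_t tag)
+{
+	return (char *)(((uintptr_t)p & ~TAG_MASK) | ((uintptr_t)tag << TAG_SHIFT));
+}
+
+static char *untag_pointer(char *p)
+{
+	return (char *)((uintptr_t)p & ~TAG_MASK);
+}
+
+static bool arm64_syscall_tbi_is_present(void)
+{
+	return getauxval(AT_FLAGS) & ARM64_AT_FLAGS_SYSCALL_TBI;
+}
+
+int main(void)
+{
+	long page_size = sysconf(_SC_PAGESIZE);
+	char *addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (addr == MAP_FAILED)
+		return 1;
+
+	int fd = open("test.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
+	if (fd < 0)
+		return 1;
+
+	/* Tagged pointers can always be dereferenced in userspace */
+	addr = tag_pointer(addr, 0x2a);
+	strcpy(addr, "Hello World\n");
+
+	/* Without the relaxed ABI, clear the tag before the syscall */
+	char *buf = arm64_syscall_tbi_is_present() ? addr : untag_pointer(addr);
+
+	/* Write the string (not the size of the pointer) to the file */
+	if (write(fd, buf, strlen(addr)) < 0)
+		return 1;
+
+	close(fd);
+	return 0;
+}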
+