diff mbox series

[v7] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis

Message ID 20210629235704.2495183-1-pcc@google.com (mailing list archive)
State New, archived
Headers show
Series [v7] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis | expand

Commit Message

Peter Collingbourne June 29, 2021, 11:57 p.m. UTC
On some CPUs the performance of MTE in synchronous mode is similar
to that of asynchronous mode. This makes it worthwhile to enable
synchronous mode on those CPUs when asynchronous mode is requested,
in order to gain the error detection benefits of synchronous mode
without the performance downsides. Therefore, make it possible for
user programs to opt into upgrading to synchronous mode on those CPUs.

This is done by introducing a notion of a preferred TCF mode, which
is controlled on a per-CPU basis by a sysfs node. The existing SYNC
and ASYNC TCF settings are repurposed as bitfields that specify a set
of possible modes. If the preferred TCF mode for a particular CPU is
in the user-provided mode set then that mode is used when running on
that CPU, otherwise the least strict mode in the mode set is used.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8
---
v7:
- switch to new API proposed on list

v6:
- switch to strings in sysfs nodes instead of TCF values

v5:
- updated documentation
- address some nits in mte.c

v4:
- switch to new mte_ctrl field
- make register_mte_upgrade_async_sysctl return an int
- change the sysctl to take 0 or 1 instead of raw TCF values
- "same as" -> "similar to"

v3:
- drop the device tree support
- add documentation
- add static_assert to ensure no overlap with real HW bits
- move per-CPU variable initialization to mte.c
- use smp_call_function_single instead of stop_machine

v2:
- make it an opt-in behavior
- change the format of the device tree node
- also allow controlling the feature via sysfs

 .../arm64/memory-tagging-extension.rst        |  36 +++-
 arch/arm64/include/asm/mte.h                  |   4 +
 arch/arm64/include/asm/processor.h            |   9 +-
 arch/arm64/kernel/asm-offsets.c               |   2 +-
 arch/arm64/kernel/entry.S                     |   4 +-
 arch/arm64/kernel/mte.c                       | 160 ++++++++++++------
 include/uapi/linux/prctl.h                    |   9 +-
 7 files changed, 162 insertions(+), 62 deletions(-)

Comments

Catalin Marinas June 30, 2021, 5:34 p.m. UTC | #1
On Tue, Jun 29, 2021 at 04:57:04PM -0700, Peter Collingbourne wrote:
> diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
> index b540178a93f8..39c04541f57c 100644
> --- a/Documentation/arm64/memory-tagging-extension.rst
> +++ b/Documentation/arm64/memory-tagging-extension.rst
> @@ -77,14 +77,18 @@ configurable behaviours:
>    address is unknown).
>  
>  The user can select the above modes, per thread, using the
> -``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> -``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
> +contains any number of the following values in the ``PR_MTE_TCF_MASK``
>  bit-field:
>  
> -- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults

I'd keep this and maybe add a comment that it will be ignored if
combined with any other options. IOW, it's already user API and I'd keep
it together with all the definition in prctl.h (you haven't removed it).

>  - ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
>  - ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
>  
> +If no modes are specified, tag check faults are ignored. If a single
> +mode is specified, the program will run in that mode. If multiple
> +modes are specified, the mode is selected as described in the "Per-CPU
> +preferred tag checking modes" section below.
> +
>  The current tag check fault mode can be read using the
>  ``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
>  
> @@ -120,13 +124,37 @@ in the ``PR_MTE_TAG_MASK`` bit-field.
>  interface provides an include mask. An include mask of ``0`` (exclusion
>  mask ``0xffff``) results in the CPU always generating tag ``0``.
>  
> +Per-CPU preferred tag checking mode
> +-----------------------------------
> +
> +On some CPUs the performance of MTE in stricter tag checking modes
> +is similar to that of less strict tag checking modes. This makes it
> +worthwhile to enable stricter checks on those CPUs when a less strict
> +checking mode is requested, in order to gain the error detection
> +benefits of the stricter checks without the performance downsides. To
> +support this scenario, a privileged user may configure a stricter
> +tag checking mode as the CPU's preferred tag checking mode.
> +
> +The preferred tag checking mode for each CPU is controlled by
> +``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
> +privileged user may write the value ``async`` or ``sync``.  The default
> +preferred mode for each CPU is ``async``.
> +
> +To allow a program to potentially run in the CPU's preferred tag
> +checking mode, the user program may set multiple tag check fault mode
> +bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> +flags, 0, 0, 0)`` system call. If the CPU's preferred tag checking mode
> +is in the task's set of provided tag checking modes, that mode will
> +be selected. Otherwise, the least strict mode in the set is selected,
> +where ``async`` is considered less strict than ``sync``.

Specifying async vs sync order at this stage probably doesn't bring
much, though I don't mind as we'll add asym at some point. We already
have the default "async" in the mte_tcf_preferred, so async will be
selected if an app selects both. Also the intersection of task mode and
mte_tcf_preferred is only void if a single mode is selected by the user,
in which that case that mode would have to be honoured.

Where it gets more interesting is when we add the asym mode and current
applications only set sync|async. I expect CPUs that implement sync
sufficiently fast will continue with this preferred mode. The others may
likely want asym as a middle ground. With the bit-based order, sync will
be selected.

We might as well leave this debate until we have asym support since the
order doesn't matter with only two modes.

> +
>  Initial process state
>  ---------------------
>  
>  On ``execve()``, the new process has the following configuration:
>  
>  - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
> -- Tag checking mode set to ``PR_MTE_TCF_NONE``
> +- No tag checking modes are selected (tag check faults ignored)
>  - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
>  - ``PSTATE.TCO`` set to 0
>  - ``PROT_MTE`` not set on any of the initial memory maps

I think we should also update the example to show the or'ing of the
modes into the prctl() flags.

> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 9df3feeee890..6ad7b6c188f4 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -16,6 +16,13 @@
>   */
>  #define NET_IP_ALIGN	0
>  
> +#define MTE_CTRL_GCR_USER_EXCL_SHIFT	0
> +#define MTE_CTRL_GCR_USER_EXCL_MASK	0xffff
> +
> +#define MTE_CTRL_TCF_NONE		(1UL << 16)

Do we need this bit for none? I think we can get rid of it and simplify
the if/else blocks slightly.

> +#define MTE_CTRL_TCF_SYNC		(1UL << 17)
> +#define MTE_CTRL_TCF_ASYNC		(1UL << 18)
> +
>  #ifndef __ASSEMBLY__
>  
>  #include <linux/build_bug.h>
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 125a10e413e9..c85eb9bd5a37 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -216,15 +209,34 @@ void mte_thread_init_user(void)
>  	dsb(ish);
>  	write_sysreg_s(0, SYS_TFSRE0_EL1);
>  	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> -	/* disable tag checking */
> -	set_task_sctlr_el1((current->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK) |
> -			   SCTLR_EL1_TCF0_NONE);
> -	/* reset tag generation mask */
> -	set_gcr_el1_excl(SYS_GCR_EL1_EXCL_MASK);
> +	/* disable tag checking and reset tag generation mask */
> +	current->thread.mte_ctrl =
> +		MTE_CTRL_GCR_USER_EXCL_MASK | MTE_CTRL_TCF_NONE;

We can drop MTE_CTRL_TCF_NONE here. If no async or sync mode is implied,
we assume no tag checking without an explicit 'none'.

> +	mte_update_sctlr_user(current);
> +	set_task_sctlr_el1(current->thread.sctlr_user);
> +}
> +
> +void mte_update_sctlr_user(struct task_struct *task)
> +{
> +	unsigned long sctlr = task->thread.sctlr_user;
> +	unsigned long pref = __this_cpu_read(mte_tcf_preferred);

We need to ensure this is called in a non-preemptible context. I don't
think the prctl() can guarantee that, so we get some weird behaviour if
it gets context switched mid-way through this. The callback from
mte_thread_switch() is fine and I think smp_call_function_single() does
a get_cpu() already.

I'd also add a function comment that it can only be called on the
current or next task otherwise since the CPU must match where the thread
is going to run.

Maybe easier to just pass the CPU number in addition to task. In most
cases it is smp_processor_id() while for prctl() we'd do a
get_cpu/put_cpu(). This would be cleaner unless this_cpu_read() is
(marginally) faster than per_cpu().

> +	unsigned long mte_ctrl = task->thread.mte_ctrl;
> +	unsigned long resolved_mte_tcf = (mte_ctrl & pref) ? pref : mte_ctrl;
> +
> +	sctlr &= ~SCTLR_EL1_TCF0_MASK;
> +	if (resolved_mte_tcf & MTE_CTRL_TCF_NONE)
> +		sctlr |= SCTLR_EL1_TCF0_NONE;

Same comment on dropping MTE_CTRL_TCF_NONE.

> +	else if (resolved_mte_tcf & MTE_CTRL_TCF_ASYNC)
> +		sctlr |= SCTLR_EL1_TCF0_ASYNC;
> +	else if (resolved_mte_tcf & MTE_CTRL_TCF_SYNC)
> +		sctlr |= SCTLR_EL1_TCF0_SYNC;
> +	task->thread.sctlr_user = sctlr;
>  }
>  
>  void mte_thread_switch(struct task_struct *next)
>  {
> +	mte_update_sctlr_user(next);
> +
>  	/*
>  	 * Check if an async tag exception occurred at EL1.
>  	 *
> @@ -262,33 +274,24 @@ void mte_suspend_exit(void)
>  
>  long set_mte_ctrl(struct task_struct *task, unsigned long arg)
>  {
> -	u64 sctlr = task->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK;
> -	u64 gcr_excl = ~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) &
> -		       SYS_GCR_EL1_EXCL_MASK;
> +	u64 mte_ctrl = (~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) &
> +			SYS_GCR_EL1_EXCL_MASK)
> +		       << MTE_CTRL_GCR_USER_EXCL_SHIFT;

Nit: just move the << MTE_CTRL_GCR_USER_EXCL_SHIFT to the previous line
(it gets to about 80 characters, I don't have a strict limit).

>  	if (!system_supports_mte())
>  		return 0;
>  
> -	switch (arg & PR_MTE_TCF_MASK) {
> -	case PR_MTE_TCF_NONE:
> -		sctlr |= SCTLR_EL1_TCF0_NONE;
> -		break;
> -	case PR_MTE_TCF_SYNC:
> -		sctlr |= SCTLR_EL1_TCF0_SYNC;
> -		break;
> -	case PR_MTE_TCF_ASYNC:
> -		sctlr |= SCTLR_EL1_TCF0_ASYNC;
> -		break;
> -	default:
> -		return -EINVAL;
> -	}
> -
> -	if (task != current) {
> -		task->thread.sctlr_user = sctlr;
> -		task->thread.gcr_user_excl = gcr_excl;
> -	} else {
> -		set_task_sctlr_el1(sctlr);
> -		set_gcr_el1_excl(gcr_excl);
> +	if ((arg & PR_MTE_TCF_MASK) == PR_MTE_TCF_NONE)
> +		mte_ctrl |= MTE_CTRL_TCF_NONE;
> +	if (arg & PR_MTE_TCF_ASYNC)
> +		mte_ctrl |= MTE_CTRL_TCF_ASYNC;
> +	if (arg & PR_MTE_TCF_SYNC)
> +		mte_ctrl |= MTE_CTRL_TCF_SYNC;
> +
> +	task->thread.mte_ctrl = mte_ctrl;
> +	if (task == current) {
> +		mte_update_sctlr_user(task);
> +		set_task_sctlr_el1(task->thread.sctlr_user);

Here I don't think we guarantee the preemption being disabled.

> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 18a9f59dc067..48de354b847f 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -234,11 +234,10 @@ struct prctl_mm_map {
>  #define PR_GET_TAGGED_ADDR_CTRL		56
>  # define PR_TAGGED_ADDR_ENABLE		(1UL << 0)
>  /* MTE tag check fault modes */
> -# define PR_MTE_TCF_SHIFT		1
> -# define PR_MTE_TCF_NONE		(0UL << PR_MTE_TCF_SHIFT)
> -# define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
> -# define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
> -# define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
> +# define PR_MTE_TCF_NONE		0
> +# define PR_MTE_TCF_SYNC		(1UL << 1)
> +# define PR_MTE_TCF_ASYNC		(1UL << 2)
> +# define PR_MTE_TCF_MASK		(PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC)

I'm fine with redefining these but removing PR_MTE_TCF_SHIFT may break
some current user-space compilation. It does look weird to keep it
around though.

Anyway, I think the patch looks alright overall but I'd split it in 3-4
small patches for easier reviewing. Something like:

0. Cover letter ;)

1. Rename gcr_user_excl to mte_ctrl

2. Use individual bits for TCF sync/async and allow combinations. This
   would update the uapi prctl.h. On its own, this patch will have to
   choose a preferred mode since we don't have the sysfs interface yet.

3. Introduce the sysfs interface and the preferred mode selection

4. Update documentation

Thanks.
Peter Collingbourne June 30, 2021, 11:15 p.m. UTC | #2
On Wed, Jun 30, 2021 at 10:34 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> On Tue, Jun 29, 2021 at 04:57:04PM -0700, Peter Collingbourne wrote:
> > diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
> > index b540178a93f8..39c04541f57c 100644
> > --- a/Documentation/arm64/memory-tagging-extension.rst
> > +++ b/Documentation/arm64/memory-tagging-extension.rst
> > @@ -77,14 +77,18 @@ configurable behaviours:
> >    address is unknown).
> >
> >  The user can select the above modes, per thread, using the
> > -``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> > -``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> > +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
> > +contains any number of the following values in the ``PR_MTE_TCF_MASK``
> >  bit-field:
> >
> > -- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
>
> I'd keep this and maybe add a comment that it will be ignored if
> combined with any other options. IOW, it's already user API and I'd keep
> it together with all the definition in prctl.h (you haven't removed it).

Done.

> >  - ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> >  - ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> >
> > +If no modes are specified, tag check faults are ignored. If a single
> > +mode is specified, the program will run in that mode. If multiple
> > +modes are specified, the mode is selected as described in the "Per-CPU
> > +preferred tag checking modes" section below.
> > +
> >  The current tag check fault mode can be read using the
> >  ``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
> >
> > @@ -120,13 +124,37 @@ in the ``PR_MTE_TAG_MASK`` bit-field.
> >  interface provides an include mask. An include mask of ``0`` (exclusion
> >  mask ``0xffff``) results in the CPU always generating tag ``0``.
> >
> > +Per-CPU preferred tag checking mode
> > +-----------------------------------
> > +
> > +On some CPUs the performance of MTE in stricter tag checking modes
> > +is similar to that of less strict tag checking modes. This makes it
> > +worthwhile to enable stricter checks on those CPUs when a less strict
> > +checking mode is requested, in order to gain the error detection
> > +benefits of the stricter checks without the performance downsides. To
> > +support this scenario, a privileged user may configure a stricter
> > +tag checking mode as the CPU's preferred tag checking mode.
> > +
> > +The preferred tag checking mode for each CPU is controlled by
> > +``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
> > +privileged user may write the value ``async`` or ``sync``.  The default
> > +preferred mode for each CPU is ``async``.
> > +
> > +To allow a program to potentially run in the CPU's preferred tag
> > +checking mode, the user program may set multiple tag check fault mode
> > +bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > +flags, 0, 0, 0)`` system call. If the CPU's preferred tag checking mode
> > +is in the task's set of provided tag checking modes, that mode will
> > +be selected. Otherwise, the least strict mode in the set is selected,
> > +where ``async`` is considered less strict than ``sync``.
>
> Specifying async vs sync order at this stage probably doesn't bring
> much, though I don't mind as we'll add asym at some point. We already
> have the default "async" in the mte_tcf_preferred, so async will be
> selected if an app selects both. Also the intersection of task mode and
> mte_tcf_preferred is only void if a single mode is selected by the user,
> in which that case that mode would have to be honoured.
>
> Where it gets more interesting is when we add the asym mode and current
> applications only set sync|async. I expect CPUs that implement sync
> sufficiently fast will continue with this preferred mode. The others may
> likely want asym as a middle ground. With the bit-based order, sync will
> be selected.
>
> We might as well leave this debate until we have asym support since the
> order doesn't matter with only two modes.

I'm not sure I agree about the bit-based order but we don't need to
specify it now, as you mentioned. So I reworded this to avoid
specifying it.

> > +
> >  Initial process state
> >  ---------------------
> >
> >  On ``execve()``, the new process has the following configuration:
> >
> >  - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
> > -- Tag checking mode set to ``PR_MTE_TCF_NONE``
> > +- No tag checking modes are selected (tag check faults ignored)
> >  - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
> >  - ``PSTATE.TCO`` set to 0
> >  - ``PROT_MTE`` not set on any of the initial memory maps
>
> I think we should also update the example to show the or'ing of the
> modes into the prctl() flags.

Done.

> > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > index 9df3feeee890..6ad7b6c188f4 100644
> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -16,6 +16,13 @@
> >   */
> >  #define NET_IP_ALIGN 0
> >
> > +#define MTE_CTRL_GCR_USER_EXCL_SHIFT 0
> > +#define MTE_CTRL_GCR_USER_EXCL_MASK  0xffff
> > +
> > +#define MTE_CTRL_TCF_NONE            (1UL << 16)
>
> Do we need this bit for none? I think we can get rid of it and simplify
> the if/else blocks slightly.

No, I've removed it. I figured maybe we would want this if we wanted
to allow upgrading from NONE, but that doesn't need to be implemented
now.

> > +#define MTE_CTRL_TCF_SYNC            (1UL << 17)
> > +#define MTE_CTRL_TCF_ASYNC           (1UL << 18)
> > +
> >  #ifndef __ASSEMBLY__
> >
> >  #include <linux/build_bug.h>
> > diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> > index 125a10e413e9..c85eb9bd5a37 100644
> > --- a/arch/arm64/kernel/mte.c
> > +++ b/arch/arm64/kernel/mte.c
> > @@ -216,15 +209,34 @@ void mte_thread_init_user(void)
> >       dsb(ish);
> >       write_sysreg_s(0, SYS_TFSRE0_EL1);
> >       clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> > -     /* disable tag checking */
> > -     set_task_sctlr_el1((current->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK) |
> > -                        SCTLR_EL1_TCF0_NONE);
> > -     /* reset tag generation mask */
> > -     set_gcr_el1_excl(SYS_GCR_EL1_EXCL_MASK);
> > +     /* disable tag checking and reset tag generation mask */
> > +     current->thread.mte_ctrl =
> > +             MTE_CTRL_GCR_USER_EXCL_MASK | MTE_CTRL_TCF_NONE;
>
> We can drop MTE_CTRL_TCF_NONE here. If no async or sync mode is implied,
> we assume no tag checking without an explicit 'none'.

Done.

> > +     mte_update_sctlr_user(current);
> > +     set_task_sctlr_el1(current->thread.sctlr_user);
> > +}
> > +
> > +void mte_update_sctlr_user(struct task_struct *task)
> > +{
> > +     unsigned long sctlr = task->thread.sctlr_user;
> > +     unsigned long pref = __this_cpu_read(mte_tcf_preferred);
>
> We need to ensure this is called in a non-preemptible context. I don't
> think the prctl() can guarantee that, so we get some weird behaviour if
> it gets context switched mid-way through this. The callback from
> mte_thread_switch() is fine and I think smp_call_function_single() does
> a get_cpu() already.
>
> I'd also add a function comment that it can only be called on the
> current or next task otherwise since the CPU must match where the thread
> is going to run.

Done.

> Maybe easier to just pass the CPU number in addition to task. In most
> cases it is smp_processor_id() while for prctl() we'd do a
> get_cpu/put_cpu(). This would be cleaner unless this_cpu_read() is
> (marginally) faster than per_cpu().

It seems like the simplest thing to do would be to add
preempt_disable()/preempt_enable() to this function, so that's what
I've done.

> > +     unsigned long mte_ctrl = task->thread.mte_ctrl;
> > +     unsigned long resolved_mte_tcf = (mte_ctrl & pref) ? pref : mte_ctrl;
> > +
> > +     sctlr &= ~SCTLR_EL1_TCF0_MASK;
> > +     if (resolved_mte_tcf & MTE_CTRL_TCF_NONE)
> > +             sctlr |= SCTLR_EL1_TCF0_NONE;
>
> Same comment on dropping MTE_CTRL_TCF_NONE.

Done.

> > +     else if (resolved_mte_tcf & MTE_CTRL_TCF_ASYNC)
> > +             sctlr |= SCTLR_EL1_TCF0_ASYNC;
> > +     else if (resolved_mte_tcf & MTE_CTRL_TCF_SYNC)
> > +             sctlr |= SCTLR_EL1_TCF0_SYNC;
> > +     task->thread.sctlr_user = sctlr;
> >  }
> >
> >  void mte_thread_switch(struct task_struct *next)
> >  {
> > +     mte_update_sctlr_user(next);
> > +
> >       /*
> >        * Check if an async tag exception occurred at EL1.
> >        *
> > @@ -262,33 +274,24 @@ void mte_suspend_exit(void)
> >
> >  long set_mte_ctrl(struct task_struct *task, unsigned long arg)
> >  {
> > -     u64 sctlr = task->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK;
> > -     u64 gcr_excl = ~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) &
> > -                    SYS_GCR_EL1_EXCL_MASK;
> > +     u64 mte_ctrl = (~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) &
> > +                     SYS_GCR_EL1_EXCL_MASK)
> > +                    << MTE_CTRL_GCR_USER_EXCL_SHIFT;
>
> Nit: just move the << MTE_CTRL_GCR_USER_EXCL_SHIFT to the previous line
> (it gets to about 80 characters, I don't have a strict limit).

Done. It looks like it's 79 columns in the end, not sure why
clang-format puts it on a separate line.

> >       if (!system_supports_mte())
> >               return 0;
> >
> > -     switch (arg & PR_MTE_TCF_MASK) {
> > -     case PR_MTE_TCF_NONE:
> > -             sctlr |= SCTLR_EL1_TCF0_NONE;
> > -             break;
> > -     case PR_MTE_TCF_SYNC:
> > -             sctlr |= SCTLR_EL1_TCF0_SYNC;
> > -             break;
> > -     case PR_MTE_TCF_ASYNC:
> > -             sctlr |= SCTLR_EL1_TCF0_ASYNC;
> > -             break;
> > -     default:
> > -             return -EINVAL;
> > -     }
> > -
> > -     if (task != current) {
> > -             task->thread.sctlr_user = sctlr;
> > -             task->thread.gcr_user_excl = gcr_excl;
> > -     } else {
> > -             set_task_sctlr_el1(sctlr);
> > -             set_gcr_el1_excl(gcr_excl);
> > +     if ((arg & PR_MTE_TCF_MASK) == PR_MTE_TCF_NONE)
> > +             mte_ctrl |= MTE_CTRL_TCF_NONE;
> > +     if (arg & PR_MTE_TCF_ASYNC)
> > +             mte_ctrl |= MTE_CTRL_TCF_ASYNC;
> > +     if (arg & PR_MTE_TCF_SYNC)
> > +             mte_ctrl |= MTE_CTRL_TCF_SYNC;
> > +
> > +     task->thread.mte_ctrl = mte_ctrl;
> > +     if (task == current) {
> > +             mte_update_sctlr_user(task);
> > +             set_task_sctlr_el1(task->thread.sctlr_user);
>
> Here I don't think we guarantee the preemption being disabled.
>
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 18a9f59dc067..48de354b847f 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -234,11 +234,10 @@ struct prctl_mm_map {
> >  #define PR_GET_TAGGED_ADDR_CTRL              56
> >  # define PR_TAGGED_ADDR_ENABLE               (1UL << 0)
> >  /* MTE tag check fault modes */
> > -# define PR_MTE_TCF_SHIFT            1
> > -# define PR_MTE_TCF_NONE             (0UL << PR_MTE_TCF_SHIFT)
> > -# define PR_MTE_TCF_SYNC             (1UL << PR_MTE_TCF_SHIFT)
> > -# define PR_MTE_TCF_ASYNC            (2UL << PR_MTE_TCF_SHIFT)
> > -# define PR_MTE_TCF_MASK             (3UL << PR_MTE_TCF_SHIFT)
> > +# define PR_MTE_TCF_NONE             0
> > +# define PR_MTE_TCF_SYNC             (1UL << 1)
> > +# define PR_MTE_TCF_ASYNC            (1UL << 2)
> > +# define PR_MTE_TCF_MASK             (PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC)
>
> I'm fine with redefining these but removing PR_MTE_TCF_SHIFT may break
> some current user-space compilation. It does look weird to keep it
> around though.

Okay, I brought back PR_MTE_TCF_SHIFT with a comment to say that it's unused.

> Anyway, I think the patch looks alright overall but I'd split it in 3-4
> small patches for easier reviewing. Something like:
>
> 0. Cover letter ;)
>
> 1. Rename gcr_user_excl to mte_ctrl
>
> 2. Use individual bits for TCF sync/async and allow combinations. This
>    would update the uapi prctl.h. On its own, this patch will have to
>    choose a preferred mode since we don't have the sysfs interface yet.
>
> 3. Introduce the sysfs interface and the preferred mode selection
>
> 4. Update documentation

Done.

Peter
diff mbox series

Patch

diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
index b540178a93f8..39c04541f57c 100644
--- a/Documentation/arm64/memory-tagging-extension.rst
+++ b/Documentation/arm64/memory-tagging-extension.rst
@@ -77,14 +77,18 @@  configurable behaviours:
   address is unknown).
 
 The user can select the above modes, per thread, using the
-``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
-``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
+``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
+contains any number of the following values in the ``PR_MTE_TCF_MASK``
 bit-field:
 
-- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
 - ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
 - ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
 
+If no modes are specified, tag check faults are ignored. If a single
+mode is specified, the program will run in that mode. If multiple
+modes are specified, the mode is selected as described in the "Per-CPU
+preferred tag checking modes" section below.
+
 The current tag check fault mode can be read using the
 ``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
 
@@ -120,13 +124,37 @@  in the ``PR_MTE_TAG_MASK`` bit-field.
 interface provides an include mask. An include mask of ``0`` (exclusion
 mask ``0xffff``) results in the CPU always generating tag ``0``.
 
+Per-CPU preferred tag checking mode
+-----------------------------------
+
+On some CPUs the performance of MTE in stricter tag checking modes
+is similar to that of less strict tag checking modes. This makes it
+worthwhile to enable stricter checks on those CPUs when a less strict
+checking mode is requested, in order to gain the error detection
+benefits of the stricter checks without the performance downsides. To
+support this scenario, a privileged user may configure a stricter
+tag checking mode as the CPU's preferred tag checking mode.
+
+The preferred tag checking mode for each CPU is controlled by
+``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
+privileged user may write the value ``async`` or ``sync``.  The default
+preferred mode for each CPU is ``async``.
+
+To allow a program to potentially run in the CPU's preferred tag
+checking mode, the user program may set multiple tag check fault mode
+bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
+flags, 0, 0, 0)`` system call. If the CPU's preferred tag checking mode
+is in the task's set of provided tag checking modes, that mode will
+be selected. Otherwise, the least strict mode in the set is selected,
+where ``async`` is considered less strict than ``sync``.
+
 Initial process state
 ---------------------
 
 On ``execve()``, the new process has the following configuration:
 
 - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
-- Tag checking mode set to ``PR_MTE_TCF_NONE``
+- No tag checking modes are selected (tag check faults ignored)
 - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
 - ``PSTATE.TCO`` set to 0
 - ``PROT_MTE`` not set on any of the initial memory maps
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index bc88a1ced0d7..719687412798 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -40,6 +40,7 @@  void mte_free_tag_storage(char *storage);
 void mte_sync_tags(pte_t *ptep, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
 void mte_thread_init_user(void);
+void mte_update_sctlr_user(struct task_struct *task);
 void mte_thread_switch(struct task_struct *next);
 void mte_suspend_enter(void);
 void mte_suspend_exit(void);
@@ -62,6 +63,9 @@  static inline void mte_copy_page_tags(void *kto, const void *kfrom)
 static inline void mte_thread_init_user(void)
 {
 }
+static inline void mte_update_sctlr_user(struct task_struct *task)
+{
+}
 static inline void mte_thread_switch(struct task_struct *next)
 {
 }
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 9df3feeee890..6ad7b6c188f4 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -16,6 +16,13 @@ 
  */
 #define NET_IP_ALIGN	0
 
+#define MTE_CTRL_GCR_USER_EXCL_SHIFT	0
+#define MTE_CTRL_GCR_USER_EXCL_MASK	0xffff
+
+#define MTE_CTRL_TCF_NONE		(1UL << 16)
+#define MTE_CTRL_TCF_SYNC		(1UL << 17)
+#define MTE_CTRL_TCF_ASYNC		(1UL << 18)
+
 #ifndef __ASSEMBLY__
 
 #include <linux/build_bug.h>
@@ -151,7 +158,7 @@  struct thread_struct {
 	struct ptrauth_keys_kernel	keys_kernel;
 #endif
 #ifdef CONFIG_ARM64_MTE
-	u64			gcr_user_excl;
+	u64			mte_ctrl;
 #endif
 	u64			sctlr_user;
 };
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 0cb34ccb6e73..63d02cd67b44 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -49,7 +49,7 @@  int main(void)
   DEFINE(THREAD_KEYS_KERNEL,	offsetof(struct task_struct, thread.keys_kernel));
 #endif
 #ifdef CONFIG_ARM64_MTE
-  DEFINE(THREAD_GCR_EL1_USER,	offsetof(struct task_struct, thread.gcr_user_excl));
+  DEFINE(THREAD_MTE_CTRL,	offsetof(struct task_struct, thread.mte_ctrl));
 #endif
   BLANK();
   DEFINE(S_X0,			offsetof(struct pt_regs, regs[0]));
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 3513984a88bd..ce59280355c5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -182,7 +182,7 @@  alternative_else_nop_endif
 	 * the RRND (bit[16]) setting.
 	 */
 	mrs_s	\tmp2, SYS_GCR_EL1
-	bfi	\tmp2, \tmp, #0, #16
+	bfxil	\tmp2, \tmp, #MTE_CTRL_GCR_USER_EXCL_SHIFT, #16
 	msr_s	SYS_GCR_EL1, \tmp2
 #endif
 	.endm
@@ -205,7 +205,7 @@  alternative_else_nop_endif
 alternative_if_not ARM64_MTE
 	b	1f
 alternative_else_nop_endif
-	ldr	\tmp, [\tsk, #THREAD_GCR_EL1_USER]
+	ldr	\tmp, [\tsk, #THREAD_MTE_CTRL]
 
 	mte_set_gcr \tmp, \tmp2
 1:
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 125a10e413e9..c85eb9bd5a37 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -4,6 +4,7 @@ 
  */
 
 #include <linux/bitops.h>
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/prctl.h>
@@ -26,6 +27,8 @@  u64 gcr_kernel_excl __ro_after_init;
 
 static bool report_fault_once = true;
 
+static DEFINE_PER_CPU_READ_MOSTLY(u64, mte_tcf_preferred);
+
 #ifdef CONFIG_KASAN_HW_TAGS
 /* Whether the MTE asynchronous mode is enabled. */
 DEFINE_STATIC_KEY_FALSE(mte_async_mode);
@@ -197,16 +200,6 @@  static void update_gcr_el1_excl(u64 excl)
 	sysreg_clear_set_s(SYS_GCR_EL1, SYS_GCR_EL1_EXCL_MASK, excl);
 }
 
-static void set_gcr_el1_excl(u64 excl)
-{
-	current->thread.gcr_user_excl = excl;
-
-	/*
-	 * SYS_GCR_EL1 will be set to current->thread.gcr_user_excl value
-	 * by mte_set_user_gcr() in kernel_exit,
-	 */
-}
-
 void mte_thread_init_user(void)
 {
 	if (!system_supports_mte())
@@ -216,15 +209,34 @@  void mte_thread_init_user(void)
 	dsb(ish);
 	write_sysreg_s(0, SYS_TFSRE0_EL1);
 	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
-	/* disable tag checking */
-	set_task_sctlr_el1((current->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK) |
-			   SCTLR_EL1_TCF0_NONE);
-	/* reset tag generation mask */
-	set_gcr_el1_excl(SYS_GCR_EL1_EXCL_MASK);
+	/* disable tag checking and reset tag generation mask */
+	current->thread.mte_ctrl =
+		MTE_CTRL_GCR_USER_EXCL_MASK | MTE_CTRL_TCF_NONE;
+	mte_update_sctlr_user(current);
+	set_task_sctlr_el1(current->thread.sctlr_user);
+}
+
+void mte_update_sctlr_user(struct task_struct *task)
+{
+	unsigned long sctlr = task->thread.sctlr_user;
+	unsigned long pref = __this_cpu_read(mte_tcf_preferred);
+	unsigned long mte_ctrl = task->thread.mte_ctrl;
+	unsigned long resolved_mte_tcf = (mte_ctrl & pref) ? pref : mte_ctrl;
+
+	sctlr &= ~SCTLR_EL1_TCF0_MASK;
+	if (resolved_mte_tcf & MTE_CTRL_TCF_NONE)
+		sctlr |= SCTLR_EL1_TCF0_NONE;
+	else if (resolved_mte_tcf & MTE_CTRL_TCF_ASYNC)
+		sctlr |= SCTLR_EL1_TCF0_ASYNC;
+	else if (resolved_mte_tcf & MTE_CTRL_TCF_SYNC)
+		sctlr |= SCTLR_EL1_TCF0_SYNC;
+	task->thread.sctlr_user = sctlr;
 }
 
 void mte_thread_switch(struct task_struct *next)
 {
+	mte_update_sctlr_user(next);
+
 	/*
 	 * Check if an async tag exception occurred at EL1.
 	 *
@@ -262,33 +274,24 @@  void mte_suspend_exit(void)
 
 long set_mte_ctrl(struct task_struct *task, unsigned long arg)
 {
-	u64 sctlr = task->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK;
-	u64 gcr_excl = ~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) &
-		       SYS_GCR_EL1_EXCL_MASK;
+	u64 mte_ctrl = (~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) &
+			SYS_GCR_EL1_EXCL_MASK)
+		       << MTE_CTRL_GCR_USER_EXCL_SHIFT;
 
 	if (!system_supports_mte())
 		return 0;
 
-	switch (arg & PR_MTE_TCF_MASK) {
-	case PR_MTE_TCF_NONE:
-		sctlr |= SCTLR_EL1_TCF0_NONE;
-		break;
-	case PR_MTE_TCF_SYNC:
-		sctlr |= SCTLR_EL1_TCF0_SYNC;
-		break;
-	case PR_MTE_TCF_ASYNC:
-		sctlr |= SCTLR_EL1_TCF0_ASYNC;
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	if (task != current) {
-		task->thread.sctlr_user = sctlr;
-		task->thread.gcr_user_excl = gcr_excl;
-	} else {
-		set_task_sctlr_el1(sctlr);
-		set_gcr_el1_excl(gcr_excl);
+	if ((arg & PR_MTE_TCF_MASK) == PR_MTE_TCF_NONE)
+		mte_ctrl |= MTE_CTRL_TCF_NONE;
+	if (arg & PR_MTE_TCF_ASYNC)
+		mte_ctrl |= MTE_CTRL_TCF_ASYNC;
+	if (arg & PR_MTE_TCF_SYNC)
+		mte_ctrl |= MTE_CTRL_TCF_SYNC;
+
+	task->thread.mte_ctrl = mte_ctrl;
+	if (task == current) {
+		mte_update_sctlr_user(task);
+		set_task_sctlr_el1(task->thread.sctlr_user);
 	}
 
 	return 0;
@@ -297,24 +300,20 @@  long set_mte_ctrl(struct task_struct *task, unsigned long arg)
 long get_mte_ctrl(struct task_struct *task)
 {
 	unsigned long ret;
-	u64 incl = ~task->thread.gcr_user_excl & SYS_GCR_EL1_EXCL_MASK;
+	u64 mte_ctrl = task->thread.mte_ctrl;
+	u64 incl = (~mte_ctrl >> MTE_CTRL_GCR_USER_EXCL_SHIFT) &
+		   SYS_GCR_EL1_EXCL_MASK;
 
 	if (!system_supports_mte())
 		return 0;
 
 	ret = incl << PR_MTE_TAG_SHIFT;
-
-	switch (task->thread.sctlr_user & SCTLR_EL1_TCF0_MASK) {
-	case SCTLR_EL1_TCF0_NONE:
+	if (mte_ctrl & MTE_CTRL_TCF_NONE)
 		ret |= PR_MTE_TCF_NONE;
-		break;
-	case SCTLR_EL1_TCF0_SYNC:
-		ret |= PR_MTE_TCF_SYNC;
-		break;
-	case SCTLR_EL1_TCF0_ASYNC:
+	if (mte_ctrl & MTE_CTRL_TCF_ASYNC)
 		ret |= PR_MTE_TCF_ASYNC;
-		break;
-	}
+	if (mte_ctrl & MTE_CTRL_TCF_SYNC)
+		ret |= PR_MTE_TCF_SYNC;
 
 	return ret;
 }
@@ -453,3 +452,66 @@  int mte_ptrace_copy_tags(struct task_struct *child, long request,
 
 	return ret;
 }
+
+static ssize_t mte_tcf_preferred_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	switch (per_cpu(mte_tcf_preferred, dev->id)) {
+	case MTE_CTRL_TCF_ASYNC:
+		return sysfs_emit(buf, "async\n");
+	case MTE_CTRL_TCF_SYNC:
+		return sysfs_emit(buf, "sync\n");
+	default:
+		return sysfs_emit(buf, "???\n");
+	}
+}
+
+static void sync_sctlr(void *arg)
+{
+	mte_update_sctlr_user(current);
+	set_task_sctlr_el1(current->thread.sctlr_user);
+}
+
+static ssize_t mte_tcf_preferred_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t count)
+{
+	ssize_t ret = 0;
+	u64 tcf;
+
+	if (sysfs_streq(buf, "async"))
+		tcf = MTE_CTRL_TCF_ASYNC;
+	else if (sysfs_streq(buf, "sync"))
+		tcf = MTE_CTRL_TCF_SYNC;
+	else
+		return -EINVAL;
+
+	device_lock(dev);
+	per_cpu(mte_tcf_preferred, dev->id) = tcf;
+
+	if (cpu_online(dev->id))
+		ret = smp_call_function_single(dev->id, sync_sctlr, NULL, 0);
+	if (ret == 0)
+		ret = count;
+	device_unlock(dev);
+
+	return ret;
+}
+static DEVICE_ATTR_RW(mte_tcf_preferred);
+
+static int register_mte_tcf_preferred_sysctl(void)
+{
+	unsigned int cpu;
+
+	if (!system_supports_mte())
+		return 0;
+
+	for_each_possible_cpu(cpu) {
+		per_cpu(mte_tcf_preferred, cpu) = MTE_CTRL_TCF_ASYNC;
+		device_create_file(get_cpu_device(cpu),
+				   &dev_attr_mte_tcf_preferred);
+	}
+
+	return 0;
+}
+subsys_initcall(register_mte_tcf_preferred_sysctl);
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 18a9f59dc067..48de354b847f 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -234,11 +234,10 @@  struct prctl_mm_map {
 #define PR_GET_TAGGED_ADDR_CTRL		56
 # define PR_TAGGED_ADDR_ENABLE		(1UL << 0)
 /* MTE tag check fault modes */
-# define PR_MTE_TCF_SHIFT		1
-# define PR_MTE_TCF_NONE		(0UL << PR_MTE_TCF_SHIFT)
-# define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
-# define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
-# define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
+# define PR_MTE_TCF_NONE		0
+# define PR_MTE_TCF_SYNC		(1UL << 1)
+# define PR_MTE_TCF_ASYNC		(1UL << 2)
+# define PR_MTE_TCF_MASK		(PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC)
 /* MTE tag inclusion mask */
 # define PR_MTE_TAG_SHIFT		3
 # define PR_MTE_TAG_MASK		(0xffffUL << PR_MTE_TAG_SHIFT)