
arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis

Message ID: 20210602232445.3829248-1-pcc@google.com
State: New, archived

Commit Message

Peter Collingbourne June 2, 2021, 11:24 p.m. UTC
On some CPUs the performance of MTE in synchronous mode is the same
as that of asynchronous mode. This makes it worthwhile to enable
synchronous mode on those CPUs when asynchronous mode is requested,
in order to gain the error detection benefits of synchronous mode
without the performance downsides. Therefore, make it possible for CPUs
to opt into upgrading to synchronous mode via a new mte-prefer-sync
device tree attribute.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8
---
 arch/arm64/kernel/process.c |  8 ++++++++
 arch/arm64/kernel/smp.c     | 22 ++++++++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Vincenzo Frascino June 3, 2021, 1:02 p.m. UTC | #1
Hi Peter,

On 6/3/21 12:24 AM, Peter Collingbourne wrote:
> On some CPUs the performance of MTE in synchronous mode is the same
> as that of asynchronous mode. This makes it worthwhile to enable
> synchronous mode on those CPUs when asynchronous mode is requested,
> in order to gain the error detection benefits of synchronous mode
> without the performance downsides. Therefore, make it possible for CPUs
> to opt into upgrading to synchronous mode via a new mte-prefer-sync
> device tree attribute.
> 

I had a look at your patch and I think there are a few points worth
mentioning:
1) The approach you are using is per-CPU, hence we might end up with a system
that has some PEs configured as sync and others as async. We currently
support only a system-wide setting.
2) async and sync have slightly different semantics (e.g. in sync mode the
access does not take place and requires emulation); this means that a mixed
configuration affects the ABI.
3) In your patch you use DT to enforce sync mode on a CPU; it is probably
better to have an MIDR scheme to mark these CPUs.

> Signed-off-by: Peter Collingbourne <pcc@google.com>
> Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8
> ---
>  arch/arm64/kernel/process.c |  8 ++++++++
>  arch/arm64/kernel/smp.c     | 22 ++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index b4bb67f17a2c..ba6ed0c1390c 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -527,8 +527,16 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
>  	write_sysreg(val, cntkctl_el1);
>  }
>  
> +DECLARE_PER_CPU_READ_MOSTLY(bool, mte_prefer_sync);
> +
>  static void update_sctlr_el1(u64 sctlr)
>  {
> +	if ((sctlr & SCTLR_EL1_TCF0_MASK) == SCTLR_EL1_TCF0_ASYNC &&
> +	    __this_cpu_read(mte_prefer_sync)) {
> +		sctlr &= ~SCTLR_EL1_TCF0_MASK;
> +		sctlr |= SCTLR_EL1_TCF0_SYNC;
> +	}
> +
>  	/*
>  	 * EnIA must not be cleared while in the kernel as this is necessary for
>  	 * in-kernel PAC. It will be cleared on kernel exit if needed.
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index dcd7041b2b07..3a475722f768 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -56,6 +56,8 @@
>  
>  DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
>  EXPORT_PER_CPU_SYMBOL(cpu_number);
> +DEFINE_PER_CPU_READ_MOSTLY(bool, mte_prefer_sync);
> +EXPORT_PER_CPU_SYMBOL(mte_prefer_sync);
>  
>  /*
>   * as from 2.5, kernels no longer have an init_tasks structure
> @@ -649,6 +651,16 @@ static void __init acpi_parse_and_init_cpus(void)
>  #define acpi_parse_and_init_cpus(...)	do { } while (0)
>  #endif
>  
> +/*
> + * Read per-CPU properties from the device tree and store them in per-CPU
> + * variables for efficient access later.
> + */
> +static void __init of_read_cpu_properties(int cpu, struct device_node *dn)
> +{
> +	per_cpu(mte_prefer_sync, cpu) =
> +		of_property_read_bool(dn, "mte-prefer-sync");
> +}
> +
>  /*
>   * Enumerate the possible CPU set from the device tree and build the
>   * cpu logical map array containing MPIDR values related to logical
> @@ -789,6 +801,16 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  		set_cpu_present(cpu, true);
>  		numa_store_cpu_info(cpu);
>  	}
> +
> +	if (acpi_disabled) {
> +		struct device_node *dn;
> +		int cpu = 0;
> +
> +		for_each_of_cpu_node(dn) {
> +			of_read_cpu_properties(cpu, dn);
> +			cpu++;
> +		}
> +	}
>  }
>  
>  static const char *ipi_types[NR_IPI] __tracepoint_string = {
>
Catalin Marinas June 3, 2021, 2:30 p.m. UTC | #2
Hi Peter,

On Wed, Jun 02, 2021 at 04:24:45PM -0700, Peter Collingbourne wrote:
> On some CPUs the performance of MTE in synchronous mode is the same
> as that of asynchronous mode. This makes it worthwhile to enable
> synchronous mode on those CPUs when asynchronous mode is requested,
> in order to gain the error detection benefits of synchronous mode
> without the performance downsides. Therefore, make it possible for CPUs
> to opt into upgrading to synchronous mode via a new mte-prefer-sync
> device tree attribute.

As Vincenzo said, there's an ABI change which I don't particularly like.
I think the current PR_MTE_TCF_* modes should be honoured as described.
We could introduce a new control, PR_MTE_TCF_DYNAMIC (or a better name),
that would cover both your mixed system and some future scenario where we
want to switch between async and sync (and the new asymmetric mode in
8.7) at run-time via some other control/timer, transparently from the
application. But that would be an application opt-in.
Peter Collingbourne June 3, 2021, 5:49 p.m. UTC | #3
On Thu, Jun 3, 2021 at 6:01 AM Vincenzo Frascino
<vincenzo.frascino@arm.com> wrote:
>
> Hi Peter,
>
> On 6/3/21 12:24 AM, Peter Collingbourne wrote:
> > On some CPUs the performance of MTE in synchronous mode is the same
> > as that of asynchronous mode. This makes it worthwhile to enable
> > synchronous mode on those CPUs when asynchronous mode is requested,
> > in order to gain the error detection benefits of synchronous mode
> > without the performance downsides. Therefore, make it possible for CPUs
> > to opt into upgrading to synchronous mode via a new mte-prefer-sync
> > device tree attribute.
> >
>
> I had a look at your patch and I think that there are few points that are worth
> mentioning:
> 1) The approach you are using is per-CPU hence we might end up with a system
> that has some PE configured as sync and some configured as async. We currently
> support only a system wide setting.

This is the intent. On e.g. a big/little system this means that we
would effectively have sampling of sync MTE faults at a higher rate
than a pure userspace implementation could achieve, at zero cost.

> 2) async and sync have slightly different semantics (e.g. in sync mode the
> access does not take place and it requires emulation) this means that a mixed
> configuration affects the ABI.

We considered the ABI question and think that it is somewhat academic.
While it's true that we would prevent the first access from succeeding
(and, more visibly, use SEGV_MTESERR in the signal rather than
SEGV_MTEAERR) I'm not aware of a reasonable way that a userspace
program could depend on the access succeeding. While it's slightly
more plausible that there could be a dependency on the signal type, we
don't depend on that in Android, at least not in a way that would lead
to worse outcomes if we get MTESERR instead of MTEAERR (it would lead
to better outcomes, in the form of a more accurate/detailed crash
report, which is what motivates this change). I also checked glibc and
they don't appear to have any dependencies on the signal type, or
indeed have any detailed crash reporting at all as far as I can tell.
Furthermore, basically nobody has hardware at the moment so I don't
think we would be breaking any actual users by doing this.

> 3) In your patch you use DT to enforce sync mode on a CPU, probably it is better
> to have an MIDR scheme to mark these CPUs.

Okay, so in your scheme we would say that e.g. all Cortex-A510 CPUs
should be subject to this treatment. Can we guarantee that all
Cortex-A510 CPUs would have the same performance for sync and async or
could the system designer tweak some aspect of the system such that
they could get different performance? The possibility of the latter is
what led me to specify the information via DT.

Peter
Catalin Marinas June 8, 2021, 2:43 p.m. UTC | #4
On Thu, Jun 03, 2021 at 10:49:24AM -0700, Peter Collingbourne wrote:
> On Thu, Jun 3, 2021 at 6:01 AM Vincenzo Frascino
> <vincenzo.frascino@arm.com> wrote:
> > On 6/3/21 12:24 AM, Peter Collingbourne wrote:
> > > On some CPUs the performance of MTE in synchronous mode is the same
> > > as that of asynchronous mode. This makes it worthwhile to enable
> > > synchronous mode on those CPUs when asynchronous mode is requested,
> > > in order to gain the error detection benefits of synchronous mode
> > > without the performance downsides. Therefore, make it possible for CPUs
> > > to opt into upgrading to synchronous mode via a new mte-prefer-sync
> > > device tree attribute.
> > >
> >
> > I had a look at your patch and I think that there are few points that are worth
> > mentioning:
> > 1) The approach you are using is per-CPU hence we might end up with a system
> > that has some PE configured as sync and some configured as async. We currently
> > support only a system wide setting.
> 
> This is the intent. On e.g. a big/little system this means that we
> would effectively have sampling of sync MTE faults at a higher rate
> than a pure userspace implementation could achieve, at zero cost.
> 
> > 2) async and sync have slightly different semantics (e.g. in sync mode the
> > access does not take place and it requires emulation) this means that a mixed
> > configuration affects the ABI.
> 
> We considered the ABI question and think that is somewhat academic.
> While it's true that we would prevent the first access from succeeding
> (and, more visibly, use SEGV_MTESERR in the signal rather than
> SEGV_MTEAERR) I'm not aware of a reasonable way that a userspace
> program could depend on the access succeeding.

It's more about whether some software relies on the async mode only for
logging without any intention of handling synchronous faults. In such
a scenario, the async signal is fairly simple: it logs and continues
safely (well, as "safe" as before MTE). With the sync mode, however, the
signal handler will have to ensure the access takes place before
continuing, either by emulating it, by restarting the instruction with
PSTATE.TCO set, or by falling back to the async mode.

IOW, I don't expect all programs making use of MTE to simply die on an
MTE fault (though I guess they'll be in a minority, we still need to
allow such scenarios).
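The distinction drawn above is visible to userspace through the si_code
delivered with SIGSEGV. A minimal sketch of the dispatch a recovery-oriented
handler would need (the si_code values mirror the Linux UAPI siginfo headers
and are redefined as a fallback so the snippet builds on non-arm64 hosts):

```c
#include <signal.h>

/* SEGV_MTEAERR/SEGV_MTESERR as defined in the Linux UAPI siginfo headers;
 * fallback definitions so this sketch builds anywhere. */
#ifndef SEGV_MTEAERR
#define SEGV_MTEAERR 8	/* asynchronous MTE fault: access already happened */
#endif
#ifndef SEGV_MTESERR
#define SEGV_MTESERR 9	/* synchronous MTE fault: access did not happen */
#endif

/* A logging-only handler may simply log and return for an async fault;
 * for a sync fault it must first make the access take effect (emulate it,
 * retry with PSTATE.TCO set, or fall back to the async mode). */
static int can_log_and_continue(int si_code)
{
	return si_code == SEGV_MTEAERR;
}
```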

> While it's slightly
> more plausible that there could be a dependency on the signal type, we
> don't depend on that in Android, at least not in a way that would lead
> to worse outcomes if we get MTESERR instead of MTEAERR (it would lead
> to better outcomes, in the form of a more accurate/detailed crash
> report, which is what motivates this change). I also checked glibc and
> they don't appear to have any dependencies on the signal type, or
> indeed have any detailed crash reporting at all as far as I can tell.
> Furthermore, basically nobody has hardware at the moment so I don't
> think we would be breaking any actual users by doing this.

While there's no user-space out there yet, given that MTE was merged in
5.10 and that's an LTS kernel, we'll see software running on this kernel
version at some point in the next few years. So any fix will need
backporting but an ABI change for better performance on specific SoCs
hardly counts as a fix.

I'm ok with improving the ABI but not breaking the current one, hence
the suggestion for a new PR_MTE_TCF_* field or maybe a new bit that
allows the mode to be "upgraded", for some definition of this (for
example, if TCF_NONE is as fast as TCF_ASYNC, should NONE be upgraded?).

> > 3) In your patch you use DT to enforce sync mode on a CPU, probably it is better
> > to have an MIDR scheme to mark these CPUs.
> 
> Okay, so in your scheme we would say that e.g. all Cortex-A510 CPUs
> should be subject to this treatment. Can we guarantee that all
> Cortex-A510 CPUs would have the same performance for sync and async or
> could the system designer tweak some aspect of the system such that
> they could get different performance? The possibility of the latter is
> what led me to specify the information via DT.

While it's more of a CPU microarchitecture issue, there's indeed a good
chance that the SoC has a non-trivial impact on the performance of the
synchronous mode, so it may tip the balance one way or another.

Another idea would be to introduce a PR_MTE_TCF_DEFAULT mode together
with some /sys/*/cpu*/ entries to control the default MTE mode per CPU.
We'd leave it entirely to user-space (privileged user), maybe it even
wants to run some checks before tuning the default mode per CPU.

I'm pretty sure the sync vs async decision is not clear cut (e.g. sync
may still be slower but within a tolerable margin for certain
benchmarks). Leaving the decision to the hardware vendor, hard-coded in
the DT, is probably not the best option (nor is the MIDR). Some future
benchmark with a weird memory access pattern could expose slowness in
the sync mode and vendors will scramble to change the DTs already
deployed (maybe they can do this OTA already).
Peter Collingbourne June 8, 2021, 7:54 p.m. UTC | #5
On Tue, Jun 8, 2021 at 7:44 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, Jun 03, 2021 at 10:49:24AM -0700, Peter Collingbourne wrote:
> > On Thu, Jun 3, 2021 at 6:01 AM Vincenzo Frascino
> > <vincenzo.frascino@arm.com> wrote:
> > > On 6/3/21 12:24 AM, Peter Collingbourne wrote:
> > > > On some CPUs the performance of MTE in synchronous mode is the same
> > > > as that of asynchronous mode. This makes it worthwhile to enable
> > > > synchronous mode on those CPUs when asynchronous mode is requested,
> > > > in order to gain the error detection benefits of synchronous mode
> > > > without the performance downsides. Therefore, make it possible for CPUs
> > > > to opt into upgrading to synchronous mode via a new mte-prefer-sync
> > > > device tree attribute.
> > > >
> > >
> > > I had a look at your patch and I think that there are few points that are worth
> > > mentioning:
> > > 1) The approach you are using is per-CPU hence we might end up with a system
> > > that has some PE configured as sync and some configured as async. We currently
> > > support only a system wide setting.
> >
> > This is the intent. On e.g. a big/little system this means that we
> > would effectively have sampling of sync MTE faults at a higher rate
> > than a pure userspace implementation could achieve, at zero cost.
> >
> > > 2) async and sync have slightly different semantics (e.g. in sync mode the
> > > access does not take place and it requires emulation) this means that a mixed
> > > configuration affects the ABI.
> >
> > We considered the ABI question and think that is somewhat academic.
> > While it's true that we would prevent the first access from succeeding
> > (and, more visibly, use SEGV_MTESERR in the signal rather than
> > SEGV_MTEAERR) I'm not aware of a reasonable way that a userspace
> > program could depend on the access succeeding.
>
> It's more about whether some software relies on the async mode only for
> logging without any intention of handling the synchronous faults. In
> such scenario, the async signal is fairly simple, it logs and continues
> safely (well, as "safe" as before MTE). With the sync mode, however, the
> signal handler will have to ensure the access took place either before
> continuing, either by emulating, restarting the instruction with
> PSTATE.TCO set or by falling back to the async mode.
>
> IOW, I don't expect all programs making use of MTE to simply die on an
> MTE fault (though I guess they'll be in a minority but we still need to
> allow such scenarios).

Okay, allowing such programs seems reasonable to me. The question is
then whether the new behavior should be an opt-in or an opt-out.

> > While it's slightly
> > more plausible that there could be a dependency on the signal type, we
> > don't depend on that in Android, at least not in a way that would lead
> > to worse outcomes if we get MTESERR instead of MTEAERR (it would lead
> > to better outcomes, in the form of a more accurate/detailed crash
> > report, which is what motivates this change). I also checked glibc and
> > they don't appear to have any dependencies on the signal type, or
> > indeed have any detailed crash reporting at all as far as I can tell.
> > Furthermore, basically nobody has hardware at the moment so I don't
> > think we would be breaking any actual users by doing this.
>
> While there's no user-space out there yet, given that MTE was merged in
> 5.10 and that's an LTS kernel, we'll see software running on this kernel
> version at some point in the next few years. So any fix will need
> backporting but an ABI change for better performance on specific SoCs
> hardly counts as a fix.

Okay, seems reasonable enough.

> I'm ok with improving the ABI but not breaking the current one, hence
> the suggestion for a new PR_MTE_TCF_* field or maybe a new bit that
> allows the mode to be "upgraded", for some definition of this (for
> example, if TCF_NONE is as fast as TCF_ASYNC, should NONE be upgraded?)

I think the possibility of upgrading NONE to ASYNC is a good reason to
make this an opt-in, since it alters the behavior in a more
significant way than ASYNC to SYNC.

> > > 3) In your patch you use DT to enforce sync mode on a CPU, probably it is better
> > > to have an MIDR scheme to mark these CPUs.
> >
> > Okay, so in your scheme we would say that e.g. all Cortex-A510 CPUs
> > should be subject to this treatment. Can we guarantee that all
> > Cortex-A510 CPUs would have the same performance for sync and async or
> > could the system designer tweak some aspect of the system such that
> > they could get different performance? The possibility of the latter is
> > what led me to specify the information via DT.
>
> While it's more of a CPU microarchitecture issue, there's indeed a good
> chance that the SoC has a non-trivial impact on the performance of the
> synchronous mode, so it may tip the balance one way or another.
>
> Another idea would be to introduce a PR_MTE_TCF_DEFAULT mode together
> with some /sys/*/cpu*/ entries to control the default MTE mode per CPU.
> We'd leave it entirely to user-space (privileged user), maybe it even
> wants to run some checks before tuning the default mode per CPU.

That's an interesting idea, but it sounds like it wouldn't permit
upgrading from NONE as a separate feature from upgrading from ASYNC.

> I'm pretty sure the sync vs async decision is not a clear cut (e.g. sync
> may still be slower but within a tolerable margin for certain
> benchmarks). Leaving the decision to the hardware vendor, hard-coded in
> the DT, is probably not the best option (nor is the MIDR). Some future
> benchmark with a weird memory access pattern could expose slowness in
> the sync mode and vendors will scramble on how to change the DT already
> deployed (maybe they can do this OTA already).

Perhaps we can let the default upgrading settings be specified via DT,
and allow them to be overridden later via sysfs. That seems like the
best of both worlds, since I think that in most cases DT should be
enough, while the ability to override should satisfy the remaining
cases. It also means that ACPI users have a way to use this feature
until it gets added to the spec.

By the way, if we want to allow upgrading from NONE in the future, or
upgrading to asymmetric mode, perhaps we should spell the option
"mte-upgrade-async" and have it take an argument in the range 0-3
(corresponding to the SCTLR_EL1.TCF bits) specifying the desired mode
to upgrade to. Then we could have e.g. "mte-upgrade-none" later with
the same semantics.

Peter
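For illustration, the spelling proposed above might look like this in a cpu
node (the property name and encoding are only a proposal in this thread, not
a ratified binding; the value 1 is the SCTLR_EL1.TCF encoding for sync):

```dts
cpu@100 {
	device_type = "cpu";
	compatible = "arm,cortex-a510";
	reg = <0x100>;
	/* Proposed: upgrade requested async MTE to TCF encoding 1 (sync). */
	mte-upgrade-async = <1>;
};
```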
Evgenii Stepanov June 8, 2021, 9:55 p.m. UTC | #6
On Tue, Jun 8, 2021 at 12:54 PM Peter Collingbourne <pcc@google.com> wrote:
>
> On Tue, Jun 8, 2021 at 7:44 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> >
> > On Thu, Jun 03, 2021 at 10:49:24AM -0700, Peter Collingbourne wrote:
> > > On Thu, Jun 3, 2021 at 6:01 AM Vincenzo Frascino
> > > <vincenzo.frascino@arm.com> wrote:
> > > > On 6/3/21 12:24 AM, Peter Collingbourne wrote:
> > > > > On some CPUs the performance of MTE in synchronous mode is the same
> > > > > as that of asynchronous mode. This makes it worthwhile to enable
> > > > > synchronous mode on those CPUs when asynchronous mode is requested,
> > > > > in order to gain the error detection benefits of synchronous mode
> > > > > without the performance downsides. Therefore, make it possible for CPUs
> > > > > to opt into upgrading to synchronous mode via a new mte-prefer-sync
> > > > > device tree attribute.
> > > > >
> > > >
> > > > I had a look at your patch and I think that there are few points that are worth
> > > > mentioning:
> > > > 1) The approach you are using is per-CPU hence we might end up with a system
> > > > that has some PE configured as sync and some configured as async. We currently
> > > > support only a system wide setting.
> > >
> > > This is the intent. On e.g. a big/little system this means that we
> > > would effectively have sampling of sync MTE faults at a higher rate
> > > than a pure userspace implementation could achieve, at zero cost.
> > >
> > > > 2) async and sync have slightly different semantics (e.g. in sync mode the
> > > > access does not take place and it requires emulation) this means that a mixed
> > > > configuration affects the ABI.
> > >
> > > We considered the ABI question and think that is somewhat academic.
> > > While it's true that we would prevent the first access from succeeding
> > > (and, more visibly, use SEGV_MTESERR in the signal rather than
> > > SEGV_MTEAERR) I'm not aware of a reasonable way that a userspace
> > > program could depend on the access succeeding.
> >
> > It's more about whether some software relies on the async mode only for
> > logging without any intention of handling the synchronous faults. In
> > such scenario, the async signal is fairly simple, it logs and continues
> > safely (well, as "safe" as before MTE). With the sync mode, however, the
> > signal handler will have to ensure the access took place either before
> > continuing, either by emulating, restarting the instruction with
> > PSTATE.TCO set or by falling back to the async mode.
> >
> > IOW, I don't expect all programs making use of MTE to simply die on an
> > MTE fault (though I guess they'll be in a minority but we still need to
> > allow such scenarios).
>
> Okay, allowing such programs seems reasonable to me. The question is
> then whether the new behavior should be an opt-in or an opt-out.
>
> > > While it's slightly
> > > more plausible that there could be a dependency on the signal type, we
> > > don't depend on that in Android, at least not in a way that would lead
> > > to worse outcomes if we get MTESERR instead of MTEAERR (it would lead
> > > to better outcomes, in the form of a more accurate/detailed crash
> > > report, which is what motivates this change). I also checked glibc and
> > > they don't appear to have any dependencies on the signal type, or
> > > indeed have any detailed crash reporting at all as far as I can tell.
> > > Furthermore, basically nobody has hardware at the moment so I don't
> > > think we would be breaking any actual users by doing this.
> >
> > While there's no user-space out there yet, given that MTE was merged in
> > 5.10 and that's an LTS kernel, we'll see software running on this kernel
> > version at some point in the next few years. So any fix will need
> > backporting but an ABI change for better performance on specific SoCs
> > hardly counts as a fix.
>
> Okay, seems reasonable enough.
>
> > I'm ok with improving the ABI but not breaking the current one, hence
> > the suggestion for a new PR_MTE_TCF_* field or maybe a new bit that
> > allows the mode to be "upgraded", for some definition of this (for
> > example, if TCF_NONE is as fast as TCF_ASYNC, should NONE be upgraded?)
>
> I think the possibility of upgrading NONE to ASYNC is a good reason to
> make this an opt-in, since it alters the behavior in a more
> significant way than ASYNC to SYNC.
>
> > > > 3) In your patch you use DT to enforce sync mode on a CPU, probably it is better
> > > > to have an MIDR scheme to mark these CPUs.
> > >
> > > Okay, so in your scheme we would say that e.g. all Cortex-A510 CPUs
> > > should be subject to this treatment. Can we guarantee that all
> > > Cortex-A510 CPUs would have the same performance for sync and async or
> > > could the system designer tweak some aspect of the system such that
> > > they could get different performance? The possibility of the latter is
> > > what led me to specify the information via DT.
> >
> > While it's more of a CPU microarchitecture issue, there's indeed a good
> > chance that the SoC has a non-trivial impact on the performance of the
> > synchronous mode, so it may tip the balance one way or another.
> >
> > Another idea would be to introduce a PR_MTE_TCF_DEFAULT mode together
> > with some /sys/*/cpu*/ entries to control the default MTE mode per CPU.
> > We'd leave it entirely to user-space (privileged user), maybe it even
> > wants to run some checks before tuning the default mode per CPU.
>
> That's an interesting idea, but it sounds like it wouldn't permit
> upgrading from NONE as a separate feature from upgrading from ASYNC.

For the userspace API, I like the idea of a new bit flag with the
meaning "upgrade to a stricter mode with similar performance".
On a hypothetical CPU where SYNC mode has negligible overhead this
could upgrade NONE all the way to SYNC.
It would also allow transparent opt-in to any future modes, like asymmetric.

A /sys interface in addition to DT would be nice to have, but with
this API we already have an escape hatch in userspace: remove the
bit, or even have libc forcibly remove it for all prctl calls.
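Kernel-side, the bit flag described above could resolve to an effective mode
along these lines (a sketch only: the flag name, its bit position, and the
decision rule are assumptions drawn from this thread, not a merged ABI):

```c
/* TCF encodings as in SCTLR_EL1.TCF0 (Armv8.5-MemTag). */
enum tcf_mode { TCF_NONE = 0, TCF_SYNC = 1, TCF_ASYNC = 2 };

/* Hypothetical opt-in bit: "upgrade to a stricter mode with similar
 * performance", as proposed in this thread. */
#define MTE_TCF_UPGRADE (1U << 3)

/* Upgrade only when the task opted in and this CPU reports sync checking
 * as cheap as async; an unadorned request is honoured as-is. */
static enum tcf_mode effective_mode(enum tcf_mode requested,
				    unsigned int flags, int cpu_prefers_sync)
{
	if ((flags & MTE_TCF_UPGRADE) && cpu_prefers_sync &&
	    requested == TCF_ASYNC)
		return TCF_SYNC;
	return requested;
}
```

On a hypothetical CPU where even sync checking is near-free, the same rule
could later be extended to upgrade TCF_NONE, which is why an explicit opt-in
bit matters.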

>
> > I'm pretty sure the sync vs async decision is not a clear cut (e.g. sync
> > may still be slower but within a tolerable margin for certain
> > benchmarks). Leaving the decision to the hardware vendor, hard-coded in
> > the DT, is probably not the best option (nor is the MIDR). Some future
> > benchmark with a weird memory access pattern could expose slowness in
> > the sync mode and vendors will scramble on how to change the DT already
> > deployed (maybe they can do this OTA already).
>
> Perhaps we can let the default upgrading settings be specified via DT,
> and allow them to be overridden later via sysfs. That seems like the
> best of both worlds, since I think that in most cases DT should be
> enough, while the ability to override should satisfy the remaining
> cases. It also means that ACPI users have a way to use this feature
> until it gets added to the spec.
>
> By the way, if we want to allow upgrading from NONE in the future, or
> upgrading to asymmetric mode, perhaps we should spell the option
> "mte-upgrade-async" and have it take an argument in the range 0-3
> (corresponding to the SCTLR_EL1.TCF bits) specifying the desired mode
> to upgrade to. Then we could have e.g. "mte-upgrade-none" later with
> the same semantics.
>
> Peter

Patch

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index b4bb67f17a2c..ba6ed0c1390c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -527,8 +527,16 @@  static void erratum_1418040_thread_switch(struct task_struct *prev,
 	write_sysreg(val, cntkctl_el1);
 }
 
+DECLARE_PER_CPU_READ_MOSTLY(bool, mte_prefer_sync);
+
 static void update_sctlr_el1(u64 sctlr)
 {
+	if ((sctlr & SCTLR_EL1_TCF0_MASK) == SCTLR_EL1_TCF0_ASYNC &&
+	    __this_cpu_read(mte_prefer_sync)) {
+		sctlr &= ~SCTLR_EL1_TCF0_MASK;
+		sctlr |= SCTLR_EL1_TCF0_SYNC;
+	}
+
 	/*
 	 * EnIA must not be cleared while in the kernel as this is necessary for
 	 * in-kernel PAC. It will be cleared on kernel exit if needed.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dcd7041b2b07..3a475722f768 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -56,6 +56,8 @@ 
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
+DEFINE_PER_CPU_READ_MOSTLY(bool, mte_prefer_sync);
+EXPORT_PER_CPU_SYMBOL(mte_prefer_sync);
 
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
@@ -649,6 +651,16 @@  static void __init acpi_parse_and_init_cpus(void)
 #define acpi_parse_and_init_cpus(...)	do { } while (0)
 #endif
 
+/*
+ * Read per-CPU properties from the device tree and store them in per-CPU
+ * variables for efficient access later.
+ */
+static void __init of_read_cpu_properties(int cpu, struct device_node *dn)
+{
+	per_cpu(mte_prefer_sync, cpu) =
+		of_property_read_bool(dn, "mte-prefer-sync");
+}
+
 /*
  * Enumerate the possible CPU set from the device tree and build the
  * cpu logical map array containing MPIDR values related to logical
@@ -789,6 +801,16 @@  void __init smp_prepare_cpus(unsigned int max_cpus)
 		set_cpu_present(cpu, true);
 		numa_store_cpu_info(cpu);
 	}
+
+	if (acpi_disabled) {
+		struct device_node *dn;
+		int cpu = 0;
+
+		for_each_of_cpu_node(dn) {
+			of_read_cpu_properties(cpu, dn);
+			cpu++;
+		}
+	}
 }
 
 static const char *ipi_types[NR_IPI] __tracepoint_string = {
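The upgrade step in update_sctlr_el1() above is pure bit manipulation and can
be exercised as a standalone userspace sketch (the TCF0 field occupies bits
[39:38] of SCTLR_EL1 per the Armv8.5 spec; the constants are redefined
locally here rather than pulled from kernel headers):

```c
#include <stdint.h>

/* Field layout of SCTLR_EL1.TCF0, mirroring arch/arm64/include/asm/sysreg.h. */
#define SCTLR_EL1_TCF0_SHIFT	38
#define SCTLR_EL1_TCF0_MASK	(3ULL << SCTLR_EL1_TCF0_SHIFT)
#define SCTLR_EL1_TCF0_SYNC	(1ULL << SCTLR_EL1_TCF0_SHIFT)
#define SCTLR_EL1_TCF0_ASYNC	(2ULL << SCTLR_EL1_TCF0_SHIFT)

/* Mirrors the patch's upgrade step, with the per-CPU mte_prefer_sync flag
 * passed as a parameter: only an async request on a preferring CPU is
 * rewritten; none/sync requests pass through untouched. */
static uint64_t upgrade_tcf0(uint64_t sctlr, int mte_prefer_sync)
{
	if ((sctlr & SCTLR_EL1_TCF0_MASK) == SCTLR_EL1_TCF0_ASYNC &&
	    mte_prefer_sync) {
		sctlr &= ~SCTLR_EL1_TCF0_MASK;
		sctlr |= SCTLR_EL1_TCF0_SYNC;
	}
	return sctlr;
}
```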