Message ID | 20211220234114.3926-1-scott@os.amperecomputing.com |
---|---|
State | New, archived |
Series | [v5] arm64: errata: Fix exec handling in erratum 1418040 workaround |
On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
> when executing compat threads. The workaround is applied when switching
> between tasks, but the need for the workaround could also change at an
> exec(), when a non-compat task execs a compat binary or vice versa. Apply
> the workaround in arch_setup_new_exec().
>
> This leaves a small window of time between SET_PERSONALITY and
> arch_setup_new_exec where preemption could occur and confuse the old
> workaround logic that compares TIF_32BIT between prev and next. Instead, we
> can just read cntkctl to make sure it's in the state that the next task
> needs. I measured cntkctl read time to be about the same as a mov from a
> general-purpose register on N1. Update the workaround logic to examine the
> current value of cntkctl instead of the previous task's compat state.

The patch looks fine to me but I was wondering what the cost of writing
CNTKCTL_EL1 is, compared to a read. If it turns out to be negligible, we
can simplify this patch further ;).
Catalin Marinas <catalin.marinas@arm.com> writes:
> On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
>> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
>> when executing compat threads. The workaround is applied when switching
>> between tasks, but the need for the workaround could also change at an
>> exec(), when a non-compat task execs a compat binary or vice versa. Apply
>> the workaround in arch_setup_new_exec().
>>
>> This leaves a small window of time between SET_PERSONALITY and
>> arch_setup_new_exec where preemption could occur and confuse the old
>> workaround logic that compares TIF_32BIT between prev and next. Instead, we
>> can just read cntkctl to make sure it's in the state that the next task
>> needs. I measured cntkctl read time to be about the same as a mov from a
>> general-purpose register on N1. Update the workaround logic to examine the
>> current value of cntkctl instead of the previous task's compat state.
>
> The patch looks fine to me but I was wondering what the cost of writing
> CNTKCTL_EL1 is, compared to a read. If it turns out to be negligible, we
> can simplify this patch further ;).

I measured it at something like 20-30x the time of a read, though that
was in a tight loop of writing, so maybe the cost could be hidden some
by out-of-order execution. Are you thinking of moving the erratum
workaround back to the exit to user path?

> --
> Catalin
On Tue, Dec 21, 2021 at 12:10:08PM -0800, D Scott Phillips wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> > On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
> >> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
> >> when executing compat threads. The workaround is applied when switching
> >> between tasks, but the need for the workaround could also change at an
> >> exec(), when a non-compat task execs a compat binary or vice versa. Apply
> >> the workaround in arch_setup_new_exec().
> >>
> >> This leaves a small window of time between SET_PERSONALITY and
> >> arch_setup_new_exec where preemption could occur and confuse the old
> >> workaround logic that compares TIF_32BIT between prev and next. Instead, we
> >> can just read cntkctl to make sure it's in the state that the next task
> >> needs. I measured cntkctl read time to be about the same as a mov from a
> >> general-purpose register on N1. Update the workaround logic to examine the
> >> current value of cntkctl instead of the previous task's compat state.
> >
> > The patch looks fine to me but I was wondering what the cost of writing
> > CNTKCTL_EL1 is, compared to a read. If it turns out to be negligible, we
> > can simplify this patch further ;).
>
> I measured it at something like 20-30x the time of a read, though that
> was in a tight loop of writing, so maybe the cost could be hidden some
> by out-of-order execution. Are you thinking of moving the erratum
> workaround back to the exit to user path?

No, just wondering whether we can avoid the read/check/write with
preemption disabled. Thread switches happen less often than the return
to user.

I'll probably take your current patch as a fix of Marc's commit. Waiting
a bit to see if Marc has any further comments.
On Wed, 22 Dec 2021 11:03:13 +0000, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Dec 21, 2021 at 12:10:08PM -0800, D Scott Phillips wrote:
> > Catalin Marinas <catalin.marinas@arm.com> writes:
> > > On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
> > >> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
> > >> when executing compat threads. The workaround is applied when switching
> > >> between tasks, but the need for the workaround could also change at an
> > >> exec(), when a non-compat task execs a compat binary or vice versa. Apply
> > >> the workaround in arch_setup_new_exec().
> > >>
> > >> This leaves a small window of time between SET_PERSONALITY and
> > >> arch_setup_new_exec where preemption could occur and confuse the old
> > >> workaround logic that compares TIF_32BIT between prev and next. Instead, we
> > >> can just read cntkctl to make sure it's in the state that the next task
> > >> needs. I measured cntkctl read time to be about the same as a mov from a
> > >> general-purpose register on N1. Update the workaround logic to examine the
> > >> current value of cntkctl instead of the previous task's compat state.
> > >
> > > The patch looks fine to me but I was wondering what the cost of writing
> > > CNTKCTL_EL1 is, compared to a read. If it turns out to be negligible, we
> > > can simplify this patch further ;).
> >
> > I measured it at something like 20-30x the time of a read, though that
> > was in a tight loop of writing, so maybe the cost could be hidden some
> > by out-of-order execution.

I'm not overly surprised. Writing to this register is likely to
require some level of synchronisation pretty deep inside the core as
event stream changes would take effect immediately.

> > Are you thinking of moving the erratum workaround back to the exit
> > to user path?
>
> No, just wondering whether we can avoid the read/check/write with
> preemption disabled. Thread switches happen less often than the return
> to user.
>
> I'll probably take your current patch as a fix of Marc's commit. Waiting
> a bit to see if Marc has any further comments.

No, I think this is pretty much it. Feel free to apply it with my

Reviewed-by: Marc Zyngier <maz@kernel.org>

Thanks,

	M.
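Catalin's question is whether the read/check/write could be dropped in favour of simply writing the register. The sketch below counts register writes under both strategies for a sequence of context switches. It is a userspace model on one simulated CPU, not kernel code: CNTKCTL_EL1 is reduced to a variable, and the hypothetical `VCT_ACCESS_EN` constant stands in for `ARCH_TIMER_USR_VCT_ACCESS_EN`. The compare-before-write strategy mirrors, as far as we understand it, what the kernel's `sysreg_clear_set()` does. The model counts operations only; it says nothing about cycle cost, which is what Scott's 20-30x measurement is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ARCH_TIMER_USR_VCT_ACCESS_EN. */
#define VCT_ACCESS_EN (UINT64_C(1) << 1)

/* One simulated CPU: CNTKCTL_EL1 as a variable plus a write counter. */
struct sim_cpu {
	uint64_t cntkctl;
	unsigned int writes;
};

/* Value the register should hold for the incoming task. */
static uint64_t wanted_value(uint64_t cur, bool next_compat)
{
	return next_compat ? (cur & ~VCT_ACCESS_EN) : (cur | VCT_ACCESS_EN);
}

/* Strategy 1: read-modify-write on every switch, no comparison. */
static void switch_always_write(struct sim_cpu *cpu, bool next_compat)
{
	cpu->cntkctl = wanted_value(cpu->cntkctl, next_compat);
	cpu->writes++;
}

/* Strategy 2: read, compare, and write only when the value changes. */
static void switch_check_first(struct sim_cpu *cpu, bool next_compat)
{
	uint64_t want = wanted_value(cpu->cntkctl, next_compat);

	if (want != cpu->cntkctl) {
		cpu->cntkctl = want;
		cpu->writes++;
	}
}

/*
 * Apply a schedule of incoming tasks (true = compat), starting from
 * "access enabled", and return the number of register writes issued.
 */
static unsigned int count_writes(const bool *schedule, size_t n,
				 bool check_first)
{
	struct sim_cpu cpu = { .cntkctl = VCT_ACCESS_EN, .writes = 0 };

	for (size_t i = 0; i < n; i++) {
		if (check_first)
			switch_check_first(&cpu, schedule[i]);
		else
			switch_always_write(&cpu, schedule[i]);
		/* Both strategies must leave the bit right for the new task. */
		assert(((cpu.cntkctl & VCT_ACCESS_EN) != 0) == !schedule[i]);
	}
	return cpu.writes;
}
```

For a schedule of 64-bit, 64-bit, compat, compat, 64-bit, 64-bit tasks, `count_writes()` reports six writes without the comparison and two with it: the compare saves a write whenever the incoming task needs the bit value the register already holds.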
On Mon, 20 Dec 2021 15:41:14 -0800, D Scott Phillips wrote:
> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
> when executing compat threads. The workaround is applied when switching
> between tasks, but the need for the workaround could also change at an
> exec(), when a non-compat task execs a compat binary or vice versa. Apply
> the workaround in arch_setup_new_exec().
>
> This leaves a small window of time between SET_PERSONALITY and
> arch_setup_new_exec where preemption could occur and confuse the old
> workaround logic that compares TIF_32BIT between prev and next. Instead, we
> can just read cntkctl to make sure it's in the state that the next task
> needs. I measured cntkctl read time to be about the same as a mov from a
> general-purpose register on N1. Update the workaround logic to examine the
> current value of cntkctl instead of the previous task's compat state.
>
> [...]

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: errata: Fix exec handling in erratum 1418040 workaround
      https://git.kernel.org/arm64/c/38e0257e0e6f
On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
> when executing compat threads. The workaround is applied when switching
> between tasks, but the need for the workaround could also change at an
> exec(), when a non-compat task execs a compat binary or vice versa. Apply
> the workaround in arch_setup_new_exec().
>
> This leaves a small window of time between SET_PERSONALITY and
> arch_setup_new_exec where preemption could occur and confuse the old
> workaround logic that compares TIF_32BIT between prev and next. Instead, we
> can just read cntkctl to make sure it's in the state that the next task
> needs. I measured cntkctl read time to be about the same as a mov from a
> general-purpose register on N1. Update the workaround logic to examine the
> current value of cntkctl instead of the previous task's compat state.
>
> Fixes: d49f7d7376d0 ("arm64: Move handling of erratum 1418040 into C code")
> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> Cc: <stable@vger.kernel.org> # 5.4.x

Why 5.4? I think the fixed commit is 5.9.
Catalin Marinas <catalin.marinas@arm.com> writes:
> On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
>> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
>> when executing compat threads. The workaround is applied when switching
>> between tasks, but the need for the workaround could also change at an
>> exec(), when a non-compat task execs a compat binary or vice versa. Apply
>> the workaround in arch_setup_new_exec().
>>
>> This leaves a small window of time between SET_PERSONALITY and
>> arch_setup_new_exec where preemption could occur and confuse the old
>> workaround logic that compares TIF_32BIT between prev and next. Instead, we
>> can just read cntkctl to make sure it's in the state that the next task
>> needs. I measured cntkctl read time to be about the same as a mov from a
>> general-purpose register on N1. Update the workaround logic to examine the
>> current value of cntkctl instead of the previous task's compat state.
>>
>> Fixes: d49f7d7376d0 ("arm64: Move handling of erratum 1418040 into C code")
>> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
>> Cc: <stable@vger.kernel.org> # 5.4.x
>
> Why 5.4? I think the fixed commit is 5.9.

d49f7d7376d0 got backported into v5.4.62's 82b05f0838aa. That looks like
that's the farthest it's made it back. Is this the correct way to handle
fixing the backported change, or should a separate backport be sent to
5.4 for the fix?
On Wed, Dec 22, 2021 at 08:12:27AM -0800, D Scott Phillips wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> > On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
> >> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
> >> when executing compat threads. The workaround is applied when switching
> >> between tasks, but the need for the workaround could also change at an
> >> exec(), when a non-compat task execs a compat binary or vice versa. Apply
> >> the workaround in arch_setup_new_exec().
> >>
> >> This leaves a small window of time between SET_PERSONALITY and
> >> arch_setup_new_exec where preemption could occur and confuse the old
> >> workaround logic that compares TIF_32BIT between prev and next. Instead, we
> >> can just read cntkctl to make sure it's in the state that the next task
> >> needs. I measured cntkctl read time to be about the same as a mov from a
> >> general-purpose register on N1. Update the workaround logic to examine the
> >> current value of cntkctl instead of the previous task's compat state.
> >>
> >> Fixes: d49f7d7376d0 ("arm64: Move handling of erratum 1418040 into C code")
> >> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> >> Cc: <stable@vger.kernel.org> # 5.4.x
> >
> > Why 5.4? I think the fixed commit is 5.9.
>
> d49f7d7376d0 got backported into v5.4.62's 82b05f0838aa. That looks like
> that's the farthest it's made it back. Is this the correct way to handle
> fixing the backported change, or should a separate backport be sent to
> 5.4 for the fix?

If it applies cleanly to 5.4.62, I'll just tweak the fixes line back to
5.4 (I changed it to 5.9 as I have a git hook that adjusts the Fixes
line automatically when applying). If it doesn't apply cleanly, you can
send it separately.
Catalin Marinas <catalin.marinas@arm.com> writes:
> On Wed, Dec 22, 2021 at 08:12:27AM -0800, D Scott Phillips wrote:
>> Catalin Marinas <catalin.marinas@arm.com> writes:
>> > On Mon, Dec 20, 2021 at 03:41:14PM -0800, D Scott Phillips wrote:
>> >> The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
>> >> when executing compat threads. The workaround is applied when switching
>> >> between tasks, but the need for the workaround could also change at an
>> >> exec(), when a non-compat task execs a compat binary or vice versa. Apply
>> >> the workaround in arch_setup_new_exec().
>> >>
>> >> This leaves a small window of time between SET_PERSONALITY and
>> >> arch_setup_new_exec where preemption could occur and confuse the old
>> >> workaround logic that compares TIF_32BIT between prev and next. Instead, we
>> >> can just read cntkctl to make sure it's in the state that the next task
>> >> needs. I measured cntkctl read time to be about the same as a mov from a
>> >> general-purpose register on N1. Update the workaround logic to examine the
>> >> current value of cntkctl instead of the previous task's compat state.
>> >>
>> >> Fixes: d49f7d7376d0 ("arm64: Move handling of erratum 1418040 into C code")
>> >> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
>> >> Cc: <stable@vger.kernel.org> # 5.4.x
>> >
>> > Why 5.4? I think the fixed commit is 5.9.
>>
>> d49f7d7376d0 got backported into v5.4.62's 82b05f0838aa. That looks like
>> that's the farthest it's made it back. Is this the correct way to handle
>> fixing the backported change, or should a separate backport be sent to
>> 5.4 for the fix?
>
> If it applies cleanly to 5.4.62, I'll just tweak the fixes line back to
> 5.4 (I changed it to 5.9 as I have a git hook that adjusts the Fixes
> line automatically when applying). If it doesn't apply cleanly, you can
> send it separately.

Ah, of course, that makes sense. Looks like it's got some superficial
conflicts applying back to 5.4 as is, so 5.9 sounds like the way to go.
Thanks for fixing that up,

Scott
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index aacf2f5559a8..271d4bbf468e 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -439,34 +439,26 @@ static void entry_task_switch(struct task_struct *next)
 
 /*
  * ARM erratum 1418040 handling, affecting the 32bit view of CNTVCT.
- * Assuming the virtual counter is enabled at the beginning of times:
- *
- * - disable access when switching from a 64bit task to a 32bit task
- * - enable access when switching from a 32bit task to a 64bit task
+ * Ensure access is disabled when switching to a 32bit task, ensure
+ * access is enabled when switching to a 64bit task.
  */
-static void erratum_1418040_thread_switch(struct task_struct *prev,
-					  struct task_struct *next)
+static void erratum_1418040_thread_switch(struct task_struct *next)
 {
-	bool prev32, next32;
-	u64 val;
-
-	if (!IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040))
-		return;
-
-	prev32 = is_compat_thread(task_thread_info(prev));
-	next32 = is_compat_thread(task_thread_info(next));
-
-	if (prev32 == next32 || !this_cpu_has_cap(ARM64_WORKAROUND_1418040))
+	if (!IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040) ||
+	    !this_cpu_has_cap(ARM64_WORKAROUND_1418040))
 		return;
 
-	val = read_sysreg(cntkctl_el1);
-
-	if (!next32)
-		val |= ARCH_TIMER_USR_VCT_ACCESS_EN;
+	if (is_compat_thread(task_thread_info(next)))
+		sysreg_clear_set(cntkctl_el1, ARCH_TIMER_USR_VCT_ACCESS_EN, 0);
 	else
-		val &= ~ARCH_TIMER_USR_VCT_ACCESS_EN;
+		sysreg_clear_set(cntkctl_el1, 0, ARCH_TIMER_USR_VCT_ACCESS_EN);
+}
 
-	write_sysreg(val, cntkctl_el1);
+static void erratum_1418040_new_exec(void)
+{
+	preempt_disable();
+	erratum_1418040_thread_switch(current);
+	preempt_enable();
 }
 
 /*
@@ -501,7 +493,7 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	contextidr_thread_switch(next);
 	entry_task_switch(next);
 	ssbs_thread_switch(next);
-	erratum_1418040_thread_switch(prev, next);
+	erratum_1418040_thread_switch(next);
 	ptrauth_thread_switch_user(next);
 
 	/*
@@ -611,6 +603,7 @@ void arch_setup_new_exec(void)
 	current->mm->context.flags = mmflags;
 	ptrauth_thread_init_user();
 	mte_thread_init_user();
+	erratum_1418040_new_exec();
 
 	if (task_spec_ssb_noexec(current)) {
 		arch_prctl_spec_ctrl_set(current, PR_SPEC_STORE_BYPASS,
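The effect of the new `erratum_1418040_new_exec()` hook in `arch_setup_new_exec()` can be illustrated with a userspace model. This is a sketch under stated assumptions, not kernel code: `sim_clear_set()` imitates the compare-before-write behaviour of `sysreg_clear_set()` as we understand it, the hypothetical `sim_exec()` collapses SET_PERSONALITY() and `arch_setup_new_exec()` into one helper, and `VCT_ACCESS_EN` is a stand-in for `ARCH_TIMER_USR_VCT_ACCESS_EN`. Preemption is not modelled, so the sketch has no counterpart to the `preempt_disable()`/`preempt_enable()` pair, which the v2 notes tie to `this_cpu_has_cap()` warning when preemptible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ARCH_TIMER_USR_VCT_ACCESS_EN. */
#define VCT_ACCESS_EN (UINT64_C(1) << 1)

static uint64_t sim_cntkctl;		/* simulated CNTKCTL_EL1 of one CPU */
static unsigned int sim_writes;	/* simulated register writes so far */

struct sim_task {
	bool compat;	/* models is_compat_thread() / TIF_32BIT */
};

/* Model of sysreg_clear_set(): clear, then set, write only on change. */
static void sim_clear_set(uint64_t clear, uint64_t set)
{
	uint64_t cur = sim_cntkctl;
	uint64_t val = (cur & ~clear) | set;

	if (val != cur) {
		sim_cntkctl = val;
		sim_writes++;
	}
}

/* Mirrors the patched erratum_1418040_thread_switch(next). */
static void sim_thread_switch(const struct sim_task *next)
{
	if (next->compat)
		sim_clear_set(VCT_ACCESS_EN, 0);
	else
		sim_clear_set(0, VCT_ACCESS_EN);
}

/* True if EL0 counter access matches what the task needs. */
static bool sim_state_ok(const struct sim_task *t)
{
	return ((sim_cntkctl & VCT_ACCESS_EN) != 0) == !t->compat;
}

/*
 * Hypothetical exec(): SET_PERSONALITY() flips the compat flag, then
 * arch_setup_new_exec() runs the erratum hook if exec_hook is true.
 */
static void sim_exec(struct sim_task *task, bool compat_binary, bool exec_hook)
{
	task->compat = compat_binary;
	if (exec_hook) {
		sim_thread_switch(task);	/* erratum_1418040_new_exec() */
		assert(sim_state_ok(task));
	}
}
```

In this model, a 64-bit task that execs a compat binary without the hook keeps EL0 counter access enabled until some later context switch corrects it. With the hook, the register is fixed during exec with a single write, and a subsequent switch back to the same task issues no write at all.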
The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
when executing compat threads. The workaround is applied when switching
between tasks, but the need for the workaround could also change at an
exec(), when a non-compat task execs a compat binary or vice versa. Apply
the workaround in arch_setup_new_exec().

This leaves a small window of time between SET_PERSONALITY and
arch_setup_new_exec where preemption could occur and confuse the old
workaround logic that compares TIF_32BIT between prev and next. Instead, we
can just read cntkctl to make sure it's in the state that the next task
needs. I measured cntkctl read time to be about the same as a mov from a
general-purpose register on N1. Update the workaround logic to examine the
current value of cntkctl instead of the previous task's compat state.

Fixes: d49f7d7376d0 ("arm64: Move handling of erratum 1418040 into C code")
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Cc: <stable@vger.kernel.org> # 5.4.x
---

v5:
 - Move preempt_enable/disable out of switch_to path (Marc)
 - commit message nits (Marc)

v4:
 - Move exec() handling into arch_setup_new_exec(), drop prev32==next32
   comparison to fix possible confusion in the small window between
   SET_PERSONALITY() and arch_setup_new_exec(). (Catalin)

v3:
 - Un-nest conditionals (Marc)

v2:
 - Use sysreg_clear_set instead of open coding (Marc)
 - guard this_cpu_has_cap() check under IS_ENABLED() to avoid tons of
   WARN_ON(preemptible()) when built with !CONFIG_ARM64_ERRATUM_1418040

 arch/arm64/kernel/process.c | 39 +++++++++++++++----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)
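The preemption window described in the second paragraph of the commit message can be reproduced with a small single-CPU model. The names (`struct task`, `old_switch()`, `new_switch()`, `VCT_ACCESS_EN`) are hypothetical stand-ins, not kernel symbols; the model drops the `IS_ENABLED()` and `this_cpu_has_cap()` checks and keeps only the logic that decides the EL0 virtual-counter access bit. Scenario: 64-bit task A is running with access enabled and execs a compat binary; SET_PERSONALITY() has already marked A compat when A is preempted in favour of compat task B, before `arch_setup_new_exec()` runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ARCH_TIMER_USR_VCT_ACCESS_EN. */
#define VCT_ACCESS_EN (UINT64_C(1) << 1)

struct task {
	bool compat;	/* models TIF_32BIT / is_compat_thread() */
};

static uint64_t cntkctl;	/* one simulated CPU's CNTKCTL_EL1 */

static bool vct_enabled(void)
{
	return (cntkctl & VCT_ACCESS_EN) != 0;
}

/* Pre-patch logic: assumes prev's compat flag describes the register. */
static void old_switch(const struct task *prev, const struct task *next)
{
	if (prev->compat == next->compat)
		return;
	if (next->compat)
		cntkctl &= ~VCT_ACCESS_EN;
	else
		cntkctl |= VCT_ACCESS_EN;
}

/* Patched logic: derives the register state from next and cntkctl only. */
static void new_switch(const struct task *next)
{
	uint64_t want = next->compat ? (cntkctl & ~VCT_ACCESS_EN)
				     : (cntkctl | VCT_ACCESS_EN);

	if (want != cntkctl)
		cntkctl = want;
	assert(vct_enabled() == !next->compat);
}

/* Run the scenario; returns whether compat task B gets direct access. */
static bool b_runs_with_access(bool patched)
{
	struct task a = { .compat = false };
	struct task b = { .compat = true };

	cntkctl = VCT_ACCESS_EN;	/* correct for 64-bit A */
	a.compat = true;		/* SET_PERSONALITY() */
	if (patched)
		new_switch(&b);
	else
		old_switch(&a, &b);	/* prev32 == next32: nothing written */
	return vct_enabled();
}
```

With the old prev/next comparison both tasks already look compat, so nothing is written and compat task B runs with direct CNTVCT access, that is, without the workaround. The patched logic never consults prev, so the stale flag on A cannot mislead it.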