Message ID | 20221027112741.1678057-1-ardb@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] arm64: Enable data independent timing (DIT) in the kernel | expand |
On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote: > The ARM architecture revision v8.4 introduces a data independent timing > control (DIT) which can be set at any exception level, and instructs the > CPU to avoid optimizations that may result in a correlation between the > execution time of certain instructions and the value of the data they > operate on. > > The DIT bit is part of PSTATE, and is therefore context switched as > usual, given that it becomes part of the saved program state (SPSR) when > taking an exception. We have also defined a hwcap for DIT, and so user > space can discover already whether or nor DIT is available. This means > that, as far as user space is concerned, DIT is wired up and fully > functional. > > In the kernel, however, we never bothered with DIT: we disable at it > boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we > might run with DIT enabled if user space happened to set it. FWIW, I did raise this with some architects a while back, and the thinking at the time was that implementations were likely to have data independent timing regardless of the value of the DIT bit, and so manipulating this at exception entry was busy work with no actual gain (especially if we had to handle mismatched big.LITTLE systems). Since then, I have become aware of some possible implementation choices which would consider the bit (though I have no idea if anyone is building such implementations). > Given that running privileged code with DIT disabled on a CPU that > implements support for it may result in a side channel that exposes > privileged data to unprivileged user space processes, I appreciate this is a simple way to rule out issues of that sort, but I think the "may" in that sentence is doing a lot of work, since: * IIUC, we don't have a specific case in mind that we're concerned about. I can believe that we think all the crypto code we intend to be constant time is theoretically affected. * IIUC we haven't gone an audited all constant-time code to check it doesn't happen to use instructions with data-dependent-timing. So there might be more work to do atop this to ensure theoretical correctness. * AFAIK there are no contemporary implementations where the DIT is both implemented and it being clear results in data-dependent-timing. i.e. we have nothing to actually test on. Given that, it would be nice if we could make this a bit clearer, e.g. state that this is currently a belt-and-braces approach for theoretical cases, rather than an extant side-channel today (as far as we're aware). > let's enable DIT while running in the kernel if supported by all CPUs. FWIW, I think it's reasonable to do this when all CPUs have DIT. I have a slight fear that (as above) if there are future CPUs which do consider DIT, there's presumably a noticeable performance difference (or the CPU would just provide data-independent-timing regardless), but I'm not sure if that's just something we have to live with or could punt on until we notice such cases. > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Will Deacon <will@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Marc Zyngier <maz@kernel.org> > Cc: Eric Biggers <ebiggers@kernel.org> > Cc: Jason A. Donenfeld <Jason@zx2c4.com> > Cc: Kees Cook <keescook@chromium.org> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > Cc: Adam Langley <agl@google.com> > Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@gmail.com/ > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/arm64/include/asm/sysreg.h | 3 +++ > arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++ > arch/arm64/kernel/entry.S | 3 +++ > arch/arm64/tools/cpucaps | 1 + > 4 files changed, 23 insertions(+) Don't we also need to touch __cpu_suspend_exit() in arch/am64/kernel/suspend.c? I'm assuming so given that has __uaccess_enable_hw_pan() today. Presumably we'd also care about this in KVM hyp code? > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index 7d301700d1a9..18e065f5130c 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -94,15 +94,18 @@ > #define PSTATE_PAN pstate_field(0, 4) > #define PSTATE_UAO pstate_field(0, 3) > #define PSTATE_SSBS pstate_field(3, 1) > +#define PSTATE_DIT pstate_field(3, 2) > #define PSTATE_TCO pstate_field(3, 4) > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift)) > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) > > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x)) > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x)) > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x)) > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x)) > > #define __SYS_BARRIER_INSN(CRm, op2, Rt) \ > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > index 6062454a9067..273a74df24fe 100644 > --- a/arch/arm64/kernel/cpufeature.c > +++ b/arch/arm64/kernel/cpufeature.c > @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused) > sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP); > } > > +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused) > +{ > + set_pstate_dit(1); > +} I think this wants the same treatment as PAN, i.e. WARN_ON_ONCE(in_interrupt()); with a comment about PSTATE being discareded upon ERET. Thanks, Mark. > + > /* Internal helper functions to match cpu capability type */ > static bool > cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap) > @@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = { > .matches = has_cpuid_feature, > .cpu_enable = cpu_trap_el0_impdef, > }, > + { > + .desc = "Data independent timing control (DIT)", > + .capability = ARM64_HAS_DIT, > + .type = ARM64_CPUCAP_SYSTEM_FEATURE, > + .matches = has_cpuid_feature, > + .sys_reg = SYS_ID_AA64PFR0_EL1, > + .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT, > + .field_width = 4, > + .min_field_value = 1, > + .cpu_enable = cpu_enable_dit, > + }, > {}, > }; > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > index e28137d64b76..229b505e6366 100644 > --- a/arch/arm64/kernel/entry.S > +++ b/arch/arm64/kernel/entry.S > @@ -197,6 +197,9 @@ alternative_cb_end > .endm > > .macro kernel_entry, el, regsize = 64 > + .if \el == 0 > + ALTERNATIVE(nop, SET_PSTATE_DIT(1), ARM64_HAS_DIT) > + .endif > .if \regsize == 32 > mov w0, w0 // zero upper 32 bits of x0 > .endif > diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps > index f1c0347ec31a..a86ee376920a 100644 > --- a/arch/arm64/tools/cpucaps > +++ b/arch/arm64/tools/cpucaps > @@ -20,6 +20,7 @@ HAS_CNP > HAS_CRC32 > HAS_DCPODP > HAS_DCPOP > +HAS_DIT > HAS_E0PD > HAS_ECV > HAS_EPAN > -- > 2.35.1 >
On Thu, Oct 27, 2022 at 01:12:16PM +0100, Mark Rutland wrote: > I appreciate this is a simple way to rule out issues of that sort, but I think > the "may" in that sentence is doing a lot of work, since: > > * IIUC, we don't have a specific case in mind that we're concerned about. I can > believe that we think all the crypto code we intend to be constant time is > theoretically affected. > > * IIUC we haven't gone an audited all constant-time code to check it doesn't > happen to use instructions with data-dependent-timing. So there might be more > work to do atop this to ensure theoretical correctness. > > * AFAIK there are no contemporary implementations where the DIT is both > implemented and it being clear results in data-dependent-timing. i.e. we have > nothing to actually test on. > > I have a slight fear that (as above) if there are future CPUs which do consider > DIT, there's presumably a noticeable performance difference (or the CPU would > just provide data-independent-timing regardless), but I'm not sure if that's > just something we have to live with or could punt on until we notice such > cases. You're heading on a road to disaster reasoning like that. You wrote, "we don't have a specific case in mind that we're concerned about", but actually, all you can say here is that you're not personally aware of a specific case in mind to be concerned about. As somebody who actually works on this code, I do have specific cases I'm worried about, and I know there are certain assumptions I've made about various coding patterns being CT, resulting in various instructions that I assume to be CT, which is something I tend to check by hand, while others have entire frameworks to automatically verify this kind of thing. In other words, one man's theory is another man's practice. Then you write that there aren't any contemporary instructions where this matters, but you fear they could come up in the future. Okay, good, that's a perspective we can both share. The logical thing to do about that would be Ard's patch here. However, you then conclude something vague about performance and suggest punting this down the road. I guess this makes sense to you because you don't think timing attacks are real anyway. You're entitled to your opinion, of course, but I don't think it's a popular one, and it certainly is contrary to that of most implementers of the concerned code. On the contrary, we should be proactive in ensuring the kernel remains a suitable environment for CT code, preventing the problem *before* it happens, rather than having to deal with vulnerability response down the road, "punt[ing]" it, as you said. And perhaps if we handle this now, CPU designers also won't feel like they can get away with silly performance gains at the cost of CT instructions. Jason
Hi Jason, I think my wording below might have been unclear -- I was not arguing to punt on this patch. I've tried to clarify that below. Tone is always hard in text... On Thu, Oct 27, 2022 at 02:40:14PM +0200, Jason A. Donenfeld wrote: > On Thu, Oct 27, 2022 at 01:12:16PM +0100, Mark Rutland wrote: > > I appreciate this is a simple way to rule out issues of that sort, but I think > > the "may" in that sentence is doing a lot of work, since: > > > > * IIUC, we don't have a specific case in mind that we're concerned about. I can > > believe that we think all the crypto code we intend to be constant time is > > theoretically affected. > > > > * IIUC we haven't gone an audited all constant-time code to check it doesn't > > happen to use instructions with data-dependent-timing. So there might be more > > work to do atop this to ensure theoretical correctness. > > > > * AFAIK there are no contemporary implementations where the DIT is both > > implemented and it being clear results in data-dependent-timing. i.e. we have > > nothing to actually test on. > > > > I have a slight fear that (as above) if there are future CPUs which do consider > > DIT, there's presumably a noticeable performance difference (or the CPU would > > just provide data-independent-timing regardless), but I'm not sure if that's > > just something we have to live with or could punt on until we notice such > > cases. > > You're heading on a road to disaster reasoning like that. > > You wrote, "we don't have a specific case in mind that we're concerned > about", but actually, all you can say here is that you're not personally > aware of a specific case in mind to be concerned about. As somebody who > actually works on this code, I do have specific cases I'm worried about, > and I know there are certain assumptions I've made about various coding > patterns being CT, resulting in various instructions that I assume to be > CT, which is something I tend to check by hand, while others have entire > frameworks to automatically verify this kind of thing. In other words, > one man's theory is another man's practice. What I'm trying to say here is that as-written the "may result in a side channel" wording is sufficiently vague that it can easily be read both more strongly and wore weakly than may have been intended (and can easily be editotorialised to "there's a new side-channel bug" style reporting that isn't helpful). Hence I'm calling that out explicitly for the benefit of others. If we say something like "in the absence of DIT we can't ensure that instructions sequences have data independent timing, and so where this matters there could in theory be a side channel", or something like that, I'd be happier. As you say, people have different levels of riguour here, and (unfortunately) there's plenty of extant code with informal assumptions that are hard to reason about. If you have a specific example to contribute to the commit messge, that would be great. > Then you write that there aren't any contemporary instructions where > this matters, but you fear they could come up in the future. Okay, good, > that's a perspective we can both share. The logical thing to do about > that would be Ard's patch here. However, you then conclude something > vague about performance and suggest punting this down the road. Sorry, I think my wording was unclear here: I meant punting on the performance concern (i.e. take Ard's patch now, and if we see a performance issue in future, address it then). I think we're actually agreed on that? My reason for noting the performance concern was to lampshade that for anyone aware of any extant / in-progress implementations, both to get them to tell their HW folk that they might not get the practical gains they expect, and since if those do exist we could consider following up with mechanisms to opt-out of data-independent-timing where we are both satisifed that doing so is safe and where there's a noticeable performance impact. > I guess this makes sense to you because you don't think timing attacks are > real anyway. You're entitled to your opinion, of course, but I don't think > it's a popular one, and it certainly is contrary to that of most implementers > of the concerned code. That is not at all what I was trying to say here, but I can see how it's possible to read that, so apologies for not being clear. I am well aware that timing attacks are real, having specnt quite a while on spectre its acquaintances, and having argued for stronger architectural guarantees / better mechanisms to deal with this sort of thing. > On the contrary, we should be proactive in ensuring the kernel remains > a suitable environment for CT code, preventing the problem *before* it > happens, I agree. I'm just trying to make sure that we're clear that we're doing that and this isn't a fix for an extant seurity bug which is actively exploitable today. > rather than having to deal with vulnerability response down the road, > "punt[ing]" it, as you said. And perhaps if we handle this now, CPU designers > also won't feel like they can get away with silly performance gains at the > cost of CT instructions. Sure. Thanks, Mark.
On Thu, 27 Oct 2022 at 14:12, Mark Rutland <mark.rutland@arm.com> wrote: > > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote: > > The ARM architecture revision v8.4 introduces a data independent timing > > control (DIT) which can be set at any exception level, and instructs the > > CPU to avoid optimizations that may result in a correlation between the > > execution time of certain instructions and the value of the data they > > operate on. > > > > The DIT bit is part of PSTATE, and is therefore context switched as > > usual, given that it becomes part of the saved program state (SPSR) when > > taking an exception. We have also defined a hwcap for DIT, and so user > > space can discover already whether or nor DIT is available. This means > > that, as far as user space is concerned, DIT is wired up and fully > > functional. > > > > In the kernel, however, we never bothered with DIT: we disable at it > > boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we > > might run with DIT enabled if user space happened to set it. > > FWIW, I did raise this with some architects a while back, and the thinking at > the time was that implementations were likely to have data independent timing > regardless of the value of the DIT bit, and so manipulating this at exception > entry was busy work with no actual gain (especially if we had to handle > mismatched big.LITTLE systems). > > Since then, I have become aware of some possible implementation choices which > would consider the bit (though I have no idea if anyone is building such > implementations). > It's a bit unfortunate that FEAT_DIT is mandatory even if the uarch in question behaves the same regardless. And the fact that it is documented as resetting to 0x0, and already being exposed to user space doesn't help either. But that doesn't justify keeping this disabled in the kernel. The architecture does not permit us to distinguish between the cases where this is just busywork and where it does make a difference. So we have to assume it is the latter. > > Given that running privileged code with DIT disabled on a CPU that > > implements support for it may result in a side channel that exposes > > privileged data to unprivileged user space processes, > > I appreciate this is a simple way to rule out issues of that sort, but I think > the "may" in that sentence is doing a lot of work, since: > > * IIUC, we don't have a specific case in mind that we're concerned about. I can > believe that we think all the crypto code we intend to be constant time is > theoretically affected. > I think this reasoning is backwards. I don't think it is necessary to go and identify where we are relying on timing invariance. Crypto keeps coming up, and of course, it is a well known example of where timing variances may leak the plaintext or the key. But crypto is just one way to keep data confidential, and another method we rely on heavily is the privilege boundary between kernel and user space. So just like all crypto should be constant time, all privileged execution should [ideally] be constant time as well. > * IIUC we haven't gone an audited all constant-time code to check it doesn't > happen to use instructions with data-dependent-timing. So there might be more > work to do atop this to ensure theoretical correctness. > Sure. > * AFAIK there are no contemporary implementations where the DIT is both > implemented and it being clear results in data-dependent-timing. i.e. we have > nothing to actually test on. > Then why on earth are such implementations required to implement FEAT_DIT?? > Given that, it would be nice if we could make this a bit clearer, e.g. state > that this is currently a belt-and-braces approach for theoretical cases, rather > than an extant side-channel today (as far as we're aware). > Sure - I'll add some extra prose to the commit log to capture the above. > > let's enable DIT while running in the kernel if supported by all CPUs. > > FWIW, I think it's reasonable to do this when all CPUs have DIT. > > I have a slight fear that (as above) if there are future CPUs which do consider > DIT, there's presumably a noticeable performance difference (or the CPU would > just provide data-independent-timing regardless), but I'm not sure if that's > just something we have to live with or could punt on until we notice such > cases. > Sure. Another concern might be the overhead of toggling the bit, so getting this change in sooner than later might actually help direct the development towards implementations where the performance uplift substantially outweighs the cycles spent on managing the DIT state. To me, it seems unlikely that timing dependent optimizations in privileged code would benefit actual real-world workloads while not resulting in exploitable timing leajs, but user space should be able to manage this however it wants. IOW, yes. Let's pick a safe default, and when use cases turn up where disabling DIT makes a substantial difference, we can revisit this. > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Cc: Will Deacon <will@kernel.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Marc Zyngier <maz@kernel.org> > > Cc: Eric Biggers <ebiggers@kernel.org> > > Cc: Jason A. Donenfeld <Jason@zx2c4.com> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > Cc: Adam Langley <agl@google.com> > > Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@gmail.com/ > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > --- > > arch/arm64/include/asm/sysreg.h | 3 +++ > > arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++ > > arch/arm64/kernel/entry.S | 3 +++ > > arch/arm64/tools/cpucaps | 1 + > > 4 files changed, 23 insertions(+) > > Don't we also need to touch __cpu_suspend_exit() in arch/am64/kernel/suspend.c? > I'm assuming so given that has __uaccess_enable_hw_pan() today. > Indeed, I missed that. > Presumably we'd also care about this in KVM hyp code? > Presumably, yes but I didn't investigate thoroughly what that would entail. I think this can be managed separately, and I'll look into it once the discussion here converges. > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > > index 7d301700d1a9..18e065f5130c 100644 > > --- a/arch/arm64/include/asm/sysreg.h > > +++ b/arch/arm64/include/asm/sysreg.h > > @@ -94,15 +94,18 @@ > > #define PSTATE_PAN pstate_field(0, 4) > > #define PSTATE_UAO pstate_field(0, 3) > > #define PSTATE_SSBS pstate_field(3, 1) > > +#define PSTATE_DIT pstate_field(3, 2) > > #define PSTATE_TCO pstate_field(3, 4) > > > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) > > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) > > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift)) > > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) > > > > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x)) > > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x)) > > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x)) > > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x)) > > > > #define __SYS_BARRIER_INSN(CRm, op2, Rt) \ > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > > index 6062454a9067..273a74df24fe 100644 > > --- a/arch/arm64/kernel/cpufeature.c > > +++ b/arch/arm64/kernel/cpufeature.c > > @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused) > > sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP); > > } > > > > +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused) > > +{ > > + set_pstate_dit(1); > > +} > > I think this wants the same treatment as PAN, i.e. > WARN_ON_ONCE(in_interrupt()); with a comment about PSTATE being discareded upon > ERET. > This is only called from the CPU feature code, which calls it under stop_machine(), so I decided it was not needed. I don't mind adding it, though, just seems unnecessary to me.
> From: Ard Biesheuvel <ardb@kernel.org> > Date: Thu, 27 Oct 2022 15:17:56 +0200 > > On Thu, 27 Oct 2022 at 14:12, Mark Rutland <mark.rutland@arm.com> wrote: > > > > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote: > > > The ARM architecture revision v8.4 introduces a data independent timing > > > control (DIT) which can be set at any exception level, and instructs the > > > CPU to avoid optimizations that may result in a correlation between the > > > execution time of certain instructions and the value of the data they > > > operate on. > > > > > > The DIT bit is part of PSTATE, and is therefore context switched as > > > usual, given that it becomes part of the saved program state (SPSR) when > > > taking an exception. We have also defined a hwcap for DIT, and so user > > > space can discover already whether or nor DIT is available. This means > > > that, as far as user space is concerned, DIT is wired up and fully > > > functional. > > > > > > In the kernel, however, we never bothered with DIT: we disable at it > > > boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we > > > might run with DIT enabled if user space happened to set it. > > > > FWIW, I did raise this with some architects a while back, and the thinking at > > the time was that implementations were likely to have data independent timing > > regardless of the value of the DIT bit, and so manipulating this at exception > > entry was busy work with no actual gain (especially if we had to handle > > mismatched big.LITTLE systems). > > > > Since then, I have become aware of some possible implementation choices which > > would consider the bit (though I have no idea if anyone is building such > > implementations). > > > > It's a bit unfortunate that FEAT_DIT is mandatory even if the uarch in > question behaves the same regardless. And the fact that it is > documented as resetting to 0x0, and already being exposed to user > space doesn't help either. But that doesn't justify keeping this > disabled in the kernel. > > The architecture does not permit us to distinguish between the cases > where this is just busywork and where it does make a difference. So we > have to assume it is the latter. > > > > Given that running privileged code with DIT disabled on a CPU that > > > implements support for it may result in a side channel that exposes > > > privileged data to unprivileged user space processes, > > > > I appreciate this is a simple way to rule out issues of that sort, but I think > > the "may" in that sentence is doing a lot of work, since: > > > > * IIUC, we don't have a specific case in mind that we're concerned about. I can > > believe that we think all the crypto code we intend to be constant time is > > theoretically affected. > > > > I think this reasoning is backwards. I don't think it is necessary to > go and identify where we are relying on timing invariance. Crypto > keeps coming up, and of course, it is a well known example of where > timing variances may leak the plaintext or the key. But crypto is just > one way to keep data confidential, and another method we rely on > heavily is the privilege boundary between kernel and user space. So > just like all crypto should be constant time, all privileged execution > should [ideally] be constant time as well. > > > * IIUC we haven't gone an audited all constant-time code to check it doesn't > > happen to use instructions with data-dependent-timing. So there might be more > > work to do atop this to ensure theoretical correctness. > > > > Sure. > > > * AFAIK there are no contemporary implementations where the DIT is both > > implemented and it being clear results in data-dependent-timing. i.e. we have > > nothing to actually test on. > > > > Then why on earth are such implementations required to implement FEAT_DIT?? > > > Given that, it would be nice if we could make this a bit clearer, e.g. state > > that this is currently a belt-and-braces approach for theoretical cases, rather > > than an extant side-channel today (as far as we're aware). > > > > Sure - I'll add some extra prose to the commit log to capture the above. > > > > let's enable DIT while running in the kernel if supported by all CPUs. > > > > FWIW, I think it's reasonable to do this when all CPUs have DIT. > > > > I have a slight fear that (as above) if there are future CPUs which do consider > > DIT, there's presumably a noticeable performance difference (or the CPU would > > just provide data-independent-timing regardless), but I'm not sure if that's > > just something we have to live with or could punt on until we notice such > > cases. > > > > Sure. Another concern might be the overhead of toggling the bit, so > getting this change in sooner than later might actually help direct > the development towards implementations where the performance uplift > substantially outweighs the cycles spent on managing the DIT state. > > To me, it seems unlikely that timing dependent optimizations in > privileged code would benefit actual real-world workloads while not > resulting in exploitable timing leajs, but user space should be able > to manage this however it wants. > > IOW, yes. Let's pick a safe default, and when use cases turn up where > disabling DIT makes a substantial difference, we can revisit this. FWIW, I came to the same conclusion and that's what I implemented for OpenBSD. > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > > Cc: Will Deacon <will@kernel.org> > > > Cc: Mark Rutland <mark.rutland@arm.com> > > > Cc: Marc Zyngier <maz@kernel.org> > > > Cc: Eric Biggers <ebiggers@kernel.org> > > > Cc: Jason A. Donenfeld <Jason@zx2c4.com> > > > Cc: Kees Cook <keescook@chromium.org> > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > > Cc: Adam Langley <agl@google.com> > > > Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@gmail.com/ > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > > --- > > > arch/arm64/include/asm/sysreg.h | 3 +++ > > > arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++ > > > arch/arm64/kernel/entry.S | 3 +++ > > > arch/arm64/tools/cpucaps | 1 + > > > 4 files changed, 23 insertions(+) > > > > Don't we also need to touch __cpu_suspend_exit() in arch/am64/kernel/suspend.c? > > I'm assuming so given that has __uaccess_enable_hw_pan() today. > > > > Indeed, I missed that. > > > Presumably we'd also care about this in KVM hyp code? > > > > Presumably, yes but I didn't investigate thoroughly what that would > entail. I think this can be managed separately, and I'll look into it > once the discussion here converges. > > > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > > > index 7d301700d1a9..18e065f5130c 100644 > > > --- a/arch/arm64/include/asm/sysreg.h > > > +++ b/arch/arm64/include/asm/sysreg.h > > > @@ -94,15 +94,18 @@ > > > #define PSTATE_PAN pstate_field(0, 4) > > > #define PSTATE_UAO pstate_field(0, 3) > > > #define PSTATE_SSBS pstate_field(3, 1) > > > +#define PSTATE_DIT pstate_field(3, 2) > > > #define PSTATE_TCO pstate_field(3, 4) > > > > > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > > > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) > > > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) > > > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift)) > > > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) > > > > > > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x)) > > > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x)) > > > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x)) > > > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x)) > > > > > > #define __SYS_BARRIER_INSN(CRm, op2, Rt) \ > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > > > index 6062454a9067..273a74df24fe 100644 > > > --- a/arch/arm64/kernel/cpufeature.c > > > +++ b/arch/arm64/kernel/cpufeature.c > > > @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused) > > > sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP); > > > } > > > > > > +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused) > > > +{ > > > + set_pstate_dit(1); > > > +} > > > > I think this wants the same treatment as PAN, i.e. > > WARN_ON_ONCE(in_interrupt()); with a comment about PSTATE being discareded upon > > ERET. > > > > This is only called from the CPU feature code, which calls it under > stop_machine(), so I decided it was not needed. I don't mind adding > it, though, just seems unnecessary to me. > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
Hi Ard, On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote: > Given that running privileged code with DIT disabled on a CPU that > implements support for it may result in a side channel that exposes > privileged data to unprivileged user space processes, let's enable DIT > while running in the kernel if supported by all CPUs. This patch looks good to me, though I'm not an expert in low-level arm64 stuff. It's a bit unfortunate that we have to manually create the .inst to enable DIT instead of just using the assembler. But it looks like there's a reason for it (it's done for PAN et al. too), and based on the manual it looks correct. Two nits below: > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index 7d301700d1a9..18e065f5130c 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -94,15 +94,18 @@ > #define PSTATE_PAN pstate_field(0, 4) > #define PSTATE_UAO pstate_field(0, 3) > #define PSTATE_SSBS pstate_field(3, 1) > +#define PSTATE_DIT pstate_field(3, 2) > #define PSTATE_TCO pstate_field(3, 4) > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift)) > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) > > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x)) > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x)) > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x)) > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x)) To keep the order consistent, set_pstate_dit() should be defined after set_pstate_ssbs(), not before. > /* Internal helper functions to match cpu capability type */ > static bool > cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap) > @@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = { > .matches = has_cpuid_feature, > .cpu_enable = cpu_trap_el0_impdef, > }, > + { > + .desc = "Data independent timing control (DIT)", > + .capability = ARM64_HAS_DIT, > + .type = ARM64_CPUCAP_SYSTEM_FEATURE, > + .matches = has_cpuid_feature, > + .sys_reg = SYS_ID_AA64PFR0_EL1, > + .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT, > + .field_width = 4, > + .min_field_value = 1, > + .cpu_enable = cpu_enable_dit, > + }, The other entries in this array are explicit about '.sign = FTR_UNSIGNED' (even though FTR_UNSIGNED is defined to false, so it's the default value). - Eric
On Fri, 4 Nov 2022 at 09:09, Eric Biggers <ebiggers@kernel.org> wrote: > > Hi Ard, > > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote: > > Given that running privileged code with DIT disabled on a CPU that > > implements support for it may result in a side channel that exposes > > privileged data to unprivileged user space processes, let's enable DIT > > while running in the kernel if supported by all CPUs. > > This patch looks good to me, though I'm not an expert in low-level arm64 stuff. > It's a bit unfortunate that we have to manually create the .inst to enable DIT > instead of just using the assembler. But it looks like there's a reason for it > (it's done for PAN et al. too), and based on the manual it looks correct. > Yes. The reason is that the assembler requires -march=armv8.2-a to be passed when using the DIT register (and similar requirements apply to the other registers). However, doing so may result in object code that can no longer run on pre-v8.2 cores, whereas the DIT accesses themselves are only emitted in a carefully controlled manner anyway, so keeping the arch baseline to v8.0 and using .inst is the cleanest way around this. > Two nits below: > > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > > index 7d301700d1a9..18e065f5130c 100644 > > --- a/arch/arm64/include/asm/sysreg.h > > +++ b/arch/arm64/include/asm/sysreg.h > > @@ -94,15 +94,18 @@ > > #define PSTATE_PAN pstate_field(0, 4) > > #define PSTATE_UAO pstate_field(0, 3) > > #define PSTATE_SSBS pstate_field(3, 1) > > +#define PSTATE_DIT pstate_field(3, 2) > > #define PSTATE_TCO pstate_field(3, 4) > > > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) > > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) > > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) > > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift)) > > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) > > > > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x)) > > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x)) > > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x)) > > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x)) > > To keep the order consistent, set_pstate_dit() should be defined after > set_pstate_ssbs(), not before. > Ack. Seems I just inserted it one from the bottom without actually reading :-) > > /* Internal helper functions to match cpu capability type */ > > static bool > > cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap) > > @@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = { > > .matches = has_cpuid_feature, > > .cpu_enable = cpu_trap_el0_impdef, > > }, > > + { > > + .desc = "Data independent timing control (DIT)", > > + .capability = ARM64_HAS_DIT, > > + .type = ARM64_CPUCAP_SYSTEM_FEATURE, > > + .matches = has_cpuid_feature, > > + .sys_reg = SYS_ID_AA64PFR0_EL1, > > + .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT, > > + .field_width = 4, > > + .min_field_value = 1, > > + .cpu_enable = cpu_enable_dit, > > + }, > > The other entries in this array are explicit about '.sign = FTR_UNSIGNED' > (even though FTR_UNSIGNED is defined to false, so it's the default value). > Ack.
On Fri, Nov 04, 2022 at 10:29:10AM +0100, Ard Biesheuvel wrote: > On Fri, 4 Nov 2022 at 09:09, Eric Biggers <ebiggers@kernel.org> wrote: > > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote: > > > Given that running privileged code with DIT disabled on a CPU that > > > implements support for it may result in a side channel that exposes > > > privileged data to unprivileged user space processes, let's enable DIT > > > while running in the kernel if supported by all CPUs. > > > > This patch looks good to me, though I'm not an expert in low-level arm64 stuff. > > It's a bit unfortunate that we have to manually create the .inst to enable DIT > > instead of just using the assembler. But it looks like there's a reason for it > > (it's done for PAN et al. too), and based on the manual it looks correct. > > Yes. The reason is that the assembler requires -march=armv8.2-a to be > passed when using the DIT register (and similar requirements apply to > the other registers). However, doing so may result in object code that > can no longer run on pre-v8.2 cores, whereas the DIT accesses > themselves are only emitted in a carefully controlled manner anyway, > so keeping the arch baseline to v8.0 and using .inst is the cleanest > way around this. We worked around this already by defining asm-arch in arch/arm64/Makefile to the latest that the assembler supports while keeping the C compiler on armv8.0. Unlike the C compiler, the assembler shouldn't generate new instructions unless specifically asked through inline asm or .S files. We use this trick for MTE already (and LSE atomics), though we needed another __MTE_PREAMBLE as armv8.5-a wasn't sufficient for these instructions. I think we ended up with .inst initially as binutils did not support some of those instructions. We could try to clean them up but it's a bit of a hassle to check which versions your binutils supports.
On Fri, 4 Nov 2022 at 17:19, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Fri, Nov 04, 2022 at 10:29:10AM +0100, Ard Biesheuvel wrote: > > On Fri, 4 Nov 2022 at 09:09, Eric Biggers <ebiggers@kernel.org> wrote: > > > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote: > > > > Given that running privileged code with DIT disabled on a CPU that > > > > implements support for it may result in a side channel that exposes > > > > privileged data to unprivileged user space processes, let's enable DIT > > > > while running in the kernel if supported by all CPUs. > > > > > > This patch looks good to me, though I'm not an expert in low-level arm64 stuff. > > > It's a bit unfortunate that we have to manually create the .inst to enable DIT > > > instead of just using the assembler. But it looks like there's a reason for it > > > (it's done for PAN et al. too), and based on the manual it looks correct. > > > > Yes. The reason is that the assembler requires -march=armv8.2-a to be > > passed when using the DIT register (and similar requirements apply to > > the other registers). However, doing so may result in object code that > > can no longer run on pre-v8.2 cores, whereas the DIT accesses > > themselves are only emitted in a carefully controlled manner anyway, > > so keeping the arch baseline to v8.0 and using .inst is the cleanest > > way around this. > > We worked around this already by defining asm-arch in > arch/arm64/Makefile to the latest that the assembler supports while > keeping the C compiler on armv8.0. Unlike the C compiler, the assembler > shouldn't generate new instructions unless specifically asked through > inline asm or .S files. We use this trick for MTE already (and LSE > atomics), though we needed another __MTE_PREAMBLE as armv8.5-a wasn't > sufficient for these instructions. > > I think we ended up with .inst initially as binutils did not support > some of those instructions. We could try to clean them up but it's a bit > of a hassle to check which versions your binutils supports. > OK, good to know. However, I double checked, and DIT needs v8.4-a (not v8.2 as i mentioned above), and my ubuntu 16.04 toolchain, which has GCC 5.3, only goes up to v8.2 So I guess we should be able to fix this at /some/ point, but for now, I'll need to stick with the __inst()
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 7d301700d1a9..18e065f5130c 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -94,15 +94,18 @@ #define PSTATE_PAN pstate_field(0, 4) #define PSTATE_UAO pstate_field(0, 3) #define PSTATE_SSBS pstate_field(3, 1) +#define PSTATE_DIT pstate_field(3, 2) #define PSTATE_TCO pstate_field(3, 4) #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x)) #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x)) +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x)) #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x)) #define __SYS_BARRIER_INSN(CRm, op2, Rt) \ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 6062454a9067..273a74df24fe 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused) sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP); } +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused) +{ + set_pstate_dit(1); +} + /* Internal helper functions to match cpu capability type */ static bool cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap) @@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .cpu_enable = cpu_trap_el0_impdef, }, + { + .desc = "Data independent timing control (DIT)", + .capability = ARM64_HAS_DIT, + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .matches = has_cpuid_feature, + .sys_reg = SYS_ID_AA64PFR0_EL1, + .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT, + .field_width = 4, + .min_field_value = 1, + .cpu_enable = cpu_enable_dit, + }, {}, }; diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index e28137d64b76..229b505e6366 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -197,6 +197,9 @@ alternative_cb_end .endm .macro kernel_entry, el, regsize = 64 + .if \el == 0 + ALTERNATIVE(nop, SET_PSTATE_DIT(1), ARM64_HAS_DIT) + .endif .if \regsize == 32 mov w0, w0 // zero upper 32 bits of x0 .endif diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps index f1c0347ec31a..a86ee376920a 100644 --- a/arch/arm64/tools/cpucaps +++ b/arch/arm64/tools/cpucaps @@ -20,6 +20,7 @@ HAS_CNP HAS_CRC32 HAS_DCPODP HAS_DCPOP +HAS_DIT HAS_E0PD HAS_ECV HAS_EPAN
The ARM architecture revision v8.4 introduces a data independent timing control (DIT) which can be set at any exception level, and instructs the CPU to avoid optimizations that may result in a correlation between the execution time of certain instructions and the value of the data they operate on. The DIT bit is part of PSTATE, and is therefore context switched as usual, given that it becomes part of the saved program state (SPSR) when taking an exception. We have also defined a hwcap for DIT, and so user space can discover already whether or nor DIT is available. This means that, as far as user space is concerned, DIT is wired up and fully functional. In the kernel, however, we never bothered with DIT: we disable at it boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we might run with DIT enabled if user space happened to set it. Given that running privileged code with DIT disabled on a CPU that implements support for it may result in a side channel that exposes privileged data to unprivileged user space processes, let's enable DIT while running in the kernel if supported by all CPUs. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Kees Cook <keescook@chromium.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Adam Langley <agl@google.com> Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@gmail.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> --- arch/arm64/include/asm/sysreg.h | 3 +++ arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++ arch/arm64/kernel/entry.S | 3 +++ arch/arm64/tools/cpucaps | 1 + 4 files changed, 23 insertions(+)