Message ID: 20230125012801.362496-1-ebiggers@kernel.org (mailing list archive)
State: Handled Elsewhere
Series: x86: enable Data Operand Independent Timing Mode
On Tue, Jan 24, 2023 at 05:28:01PM -0800, Eric Biggers wrote:
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +DODT - Data Operand Dependent Timing
> +====================================
> +
> +Data Operand Dependent Timing (DODT) is a CPU vulnerability that makes the
> +execution times of instructions depend on the values of the data operated on.
> +
> +This vulnerability potentially enables side-channel attacks on data, including
> +cryptographic keys.  Most cryptography algorithms require that a variety of
> +instructions be constant-time in order to prevent side-channel attacks.
> +
> +Affected CPUs
> +-------------
> +
> +This vulnerability affects Intel Core family processors based on the Ice Lake
> +and later microarchitectures, and Intel Atom family processors based on the
> +Gracemont and later microarchitectures.  For more information, see Intel's
> +documentation [1]_.
> +
> +Mitigation
> +----------
> +
> +Mitigation of this vulnerability involves setting a Model Specific Register
> +(MSR) bit to enable Data Operand Independent Timing Mode (DOITM).
> +
> +By default, the kernel does this on all CPUs.  This mitigation is global, so
> +it applies to both the kernel and userspace.
> +
> +This mitigation can be disabled by adding ``doitm=off`` to the kernel command
> +line.  It's also one of the mitigations that can be disabled by
> +``mitigations=off``.
> +
> +References
> +----------
> +.. [1] Data Operand Independent Timing Instruction Set Architecture (ISA) Guidance
> +       https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
> diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
> index 4df436e7c4177..cd962f9634dad 100644
> --- a/Documentation/admin-guide/hw-vuln/index.rst
> +++ b/Documentation/admin-guide/hw-vuln/index.rst
> @@ -18,3 +18,4 @@ are configurable at compile, boot or run time.
>     core-scheduling.rst
>     l1d_flush.rst
>     processor_mmio_stale_data.rst
> +   dodt.rst
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6cfa6e3996cf7..a6a872c4365e6 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1119,6 +1119,12 @@
> 			The filter can be disabled or changed to another
> 			driver later using sysfs.
>
> +	doitm=off	[X86,INTEL] Disable the use of Data Operand Independent
> +			Timing Mode (DOITM).  I.e., disable the mitigation for
> +			the Data Operand Dependent Timing (DODT) CPU
> +			vulnerability.  For details, see
> +			Documentation/admin-guide/hw-vuln/dodt.rst
> +
> 	driver_async_probe=  [KNL]
> 			List of driver names to be probed asynchronously. *
> 			matches with all driver names. If * is specified, the
> @@ -3259,6 +3265,7 @@
> 			no_uaccess_flush	[PPC]
> 			mmio_stale_data=off	[X86]
> 			retbleed=off		[X86]
> +			doitm=off		[X86,INTEL]
>
> 			Exceptions:
> 			       This does not have any effect on

The doc LGTM, thanks!

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
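The command-line behavior the patch documents — the mitigation is on by default, and either `doitm=off` or the umbrella `mitigations=off` turns it off — can be sketched as a trivial parser. This is a hypothetical illustration of the documented semantics, not code from the patch:

```python
def doitm_enabled(cmdline: str) -> bool:
    """DOITM mitigation state implied by a kernel command line string.

    Per the patch text above: enabled by default; disabled by either
    doitm=off or the umbrella mitigations=off.
    """
    opts = cmdline.split()
    return "doitm=off" not in opts and "mitigations=off" not in opts
```

For example, `doitm_enabled("quiet ro doitm=off")` is False, while a command line that mentions neither option leaves the mitigation on.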
On 1/24/23 17:28, Eric Biggers wrote:
> To mitigate this CPU vulnerability, it's possible to enable "Data
> Operand Independent Timing Mode" (DOITM) by setting a bit in a MSR.
> While Intel's documentation suggests that this bit should only be set
> where "necessary", that is highly impractical, given the fact that
> cryptography can happen nearly anywhere in the kernel and userspace, and
> the fact that the entire kernel likely needs to be protected anyway.

I think this misses a key point from the documentation:

	This functionality is intended for use by software which has
	already applied other techniques to mitigate software timing
	side channels, such as those documented in Intel's Guidelines
	for Mitigating Timing Side Channels Against Cryptographic
	Implementations.

Translating from Intel-speak: Intel thinks that DOITM is purely a way to
make the CPU run slower if you haven't already written code specifically
to mitigate timing side channels.  All pain, no gain.

The kernel as a whole is not written that way.  I'm sure the crypto
folks that are cc'd can tell us specifically if the kernel crypto code
is written following those recommendations.

So, let's call this patch what it is: a potential global slowdown which
protects a very small amount of crypto code, probably just in userspace.
That is probably the code that's generating your RSA keys, so it's
quite important, but it's also a _very_ small total amount of code.

There's another part here which I think was recently added to the
documentation:

	Intel expects the performance impact of this mode may be
	significantly higher on future processors.

That's _meant_ to be really scary and keep folks from turning this on by
default, aka. what this patch does.  Your new CPU will be really slow if
you turn this on!  Boo!

All that said, and given the information that Intel has released, I
think this patch is generally the right thing to do.  I don't think
people are wrong for looking at "DODT" as being a new vulnerability.
Intel obviously doesn't see it that way, which is why "DODT" has (as far
as I can tell) not been treated with the same security pomp and
circumstance as other stuff.

Last, if you're going to propose that this be turned on, I expect to see
at least _some_ performance data.  DOITM=1 isn't free, even on Ice Lake.
On 1/25/23 07:29, Dave Hansen wrote:
> There's another part here which I think was recently added to the
> documentation:
>
> 	Intel expects the performance impact of this mode may be
> 	significantly higher on future processors.
>
> That's _meant_ to be really scary and keep folks from turning this on by
> default, aka. what this patch does.  Your new CPU will be really slow if
> you turn this on!  Boo!

*If* we go forward with this patch's approach in the kernel, I think we
should at least consider what the kernel will do in a future where there
are two classes of systems:

1. Ice Lake era ones where DOITM=1 is cheap enough that it can be on by
   default.
2. "Future processors" where DOITM=1 by default is too costly.

Maybe for #2 we set DOITM=0 in the kernel.  Maybe we add per-task
controls.  But, there *is* DOITM cost where the large fleets are going
to be tempted to turn it off somehow, somewhere.  The kernel will be
better off if we can design that in now.
On Wed, 25 Jan 2023 at 16:29, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/24/23 17:28, Eric Biggers wrote:
> > To mitigate this CPU vulnerability, it's possible to enable "Data
> > Operand Independent Timing Mode" (DOITM) by setting a bit in a MSR.
> > While Intel's documentation suggests that this bit should only be set
> > where "necessary", that is highly impractical, given the fact that
> > cryptography can happen nearly anywhere in the kernel and userspace, and
> > the fact that the entire kernel likely needs to be protected anyway.
>
> I think this misses a key point from the documentation:
>
>         This functionality is intended for use by software which has
>         already applied other techniques to mitigate software timing
>         side channels, such as those documented in Intel's Guidelines
>         for Mitigating Timing Side Channels Against Cryptographic
>         Implementations.
>
> Translating from Intel-speak: Intel thinks that DOITM is purely a way to
> make the CPU run slower if you haven't already written code specifically
> to mitigate timing side channels.  All pain, no gain.
>
> The kernel as a whole is not written that way.  I'm sure the crypto
> folks that are cc'd can tell us specifically if the kernel crypto code
> is written following those recommendations.
>

Cryptography is often singled out because it deals with confidential
data, and if timing variances leak the data, the confidentiality is
violated.  However, this is not fundamentally different from execution
at a higher privilege level.  In this case, the data is shielded from
other observers by the h/w enforced privilege boundary rather than
encryption, but if timing variances leak the data, the result is the
same, i.e., data that was assumed to be confidential as far as user
space is concerned is no longer confidential.

All the nospec stuff we added for Spectre v1 serves the same purpose,
essentially, although the timing variances due to cache misses are
likely easier to measure.  IOW, some of the kernel is now written that
way in fact, although the author of that doc may have had something
else in mind.

So IMHO, the scope is really not as narrow as you think.

> So, let's call this patch what it is: a potential global slowdown which
> protects a very small amount of crypto code, probably just in userspace.
> That is probably the code that's generating your RSA keys, so it's
> quite important, but it's also a _very_ small total amount of code.
>
> There's another part here which I think was recently added to the
> documentation:
>
>         Intel expects the performance impact of this mode may be
>         significantly higher on future processors.
>
> That's _meant_ to be really scary and keep folks from turning this on by
> default, aka. what this patch does.  Your new CPU will be really slow if
> you turn this on!  Boo!
>

What is the penalty for switching it on and off?  On arm64, it is now
on by default in the kernel, and off by default in user space, and user
space can opt into it using an unprivileged instruction.

> All that said, and given the information that Intel has released, I
> think this patch is generally the right thing to do.  I don't think
> people are wrong for looking at "DODT" as being a new vulnerability.
> Intel obviously doesn't see it that way, which is why "DODT" has (as far
> as I can tell) not been treated with the same security pomp and
> circumstance as other stuff.
>
> Last, if you're going to propose that this be turned on, I expect to see
> at least _some_ performance data.  DOITM=1 isn't free, even on Ice Lake.
On 1/25/23 08:22, Ard Biesheuvel wrote:
...
> All the nospec stuff we added for Spectre v1 serves the same purpose,
> essentially, although the timing variances due to cache misses are
> likely easier to measure.  IOW, some of the kernel is now written that
> way in fact, although the author of that doc may have had something
> else in mind.
>
> So IMHO, the scope is really not as narrow as you think.

I've spoken with the folks who wrote that doc.  They've told me
repeatedly that the scope is super narrow.  Seriously, look at just
*one* thing in the other Intel doc about mitigating timing
side-channels[1]:

	be wary of code generated from high-level language source code
	that appears to adhere to all of these recommendations.

The kernel has a fair amount of code written in high-level languages.

The authors of the DOIT doc truly intend the real-world benefits of
DOITM to be exceedingly narrow.  I think it would be fair to say that
they think:

	DOITM is basically useless for most code written in C,
	including basically the entire kernel.

I'll forward this on to them and make sure I'm not overstating this
_too_ much.

>> That's _meant_ to be really scary and keep folks from turning this on by
>> default, aka. what this patch does.  Your new CPU will be really slow if
>> you turn this on!  Boo!
>
> What is the penalty for switching it on and off?  On arm64, it is now
> on by default in the kernel, and off by default in user space, and
> user space can opt into it using an unprivileged instruction.

Right now, DOITM is controlled by a bit in an MSR and it applies
everywhere.  It is (thankfully) one of the cheap MSRs and is not
architecturally serializing.

That's still not ideal and there is a desire to expose the bit to
userspace *somehow* to make it much, much cheaper to toggle.  But,
it'll still be an extra bit that needs to get managed and context
switched.

When I looked, the arm64 bit seemed to be in some flags register that
got naturally saved and restored already on user<->kernel transitions.
Was I reading it right?  It seemed like a really nice, simple mechanism
to me.

1. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/secure-coding/mitigate-timing-side-channel-crypto-implementation.html
On Wed, 25 Jan 2023 at 17:46, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/25/23 08:22, Ard Biesheuvel wrote:
> ...
> > All the nospec stuff we added for Spectre v1 serves the same purpose,
> > essentially, although the timing variances due to cache misses are
> > likely easier to measure.  IOW, some of the kernel is now written that
> > way in fact, although the author of that doc may have had something
> > else in mind.
> >
> > So IMHO, the scope is really not as narrow as you think.
>
> I've spoken with the folks who wrote that doc.  They've told me
> repeatedly that the scope is super narrow.  Seriously, look at just
> *one* thing in the other Intel doc about mitigating timing side-channels[1]:
>
>         be wary of code generated from high-level language source code
>         that appears to adhere to all of these recommendations.
>
> The kernel has a fair amount of code written in high-level languages.
>

This is why we have crypto_memneq(), for instance, which is intended to
be time invariant, whereas the time taken by ordinary memcmp() is
typically correlated with the byte index of the first unequal byte.  So
what we do there is compare every byte, instead of returning early on
the first mismatch.

We do, however, perform the comparison in the native word size and not
byte by byte.  So if these optimizations result in word comparisons
potentially taking less time if the first byte is a mismatch, we
definitely have a problem.  (This particular example may be far fetched
but you get my point)

> The authors of the DOIT doc truly intend the real-world benefits of
> DOITM to be exceedingly narrow.

I understand that this is the intent.  But for privileged execution,
this should really be the other way around: the scope for optimizations
relying on data dependent timing is exceedingly narrow in the kernel,
because any data it processes must be assumed to be confidential by
default (wrt user space), and it will probably be rather tricky to
identify CPU bound workloads in the kernel where data dependent
optimizations are guaranteed to be safe and result in a significant
speedup.

This is basically the same argument I made for arm64.

> I think it would be fair to say that they think:
>
>         DOITM is basically useless for most code written in C,
>         including basically the entire kernel.
>
> I'll forward this on to them and make sure I'm not overstating this
> _too_ much.
>

C code that was not specifically written with data independent timing
in mind may still behave that way today; C code that *was* specifically
written with data independent timing in mind (such as crypto_memneq())
could potentially lose that property under these optimizations.

> >> That's _meant_ to be really scary and keep folks from turning this on by
> >> default, aka. what this patch does.  Your new CPU will be really slow if
> >> you turn this on!  Boo!
> >
> > What is the penalty for switching it on and off?  On arm64, it is now
> > on by default in the kernel, and off by default in user space, and
> > user space can opt into it using an unprivileged instruction.
>
> Right now, DOITM is controlled by a bit in an MSR and it applies
> everywhere.  It is (thankfully) one of the cheap MSRs and is not
> architecturally serializing.
>
> That's still not ideal and there is a desire to expose the bit to
> userspace *somehow* to make it much, much cheaper to toggle.  But,
> it'll still be an extra bit that needs to get managed and context
> switched.
>
> When I looked, the arm64 bit seemed to be in some flags register that
> got naturally saved and restored already on user<->kernel transitions.
> Was I reading it right?  It seemed like a really nice, simple mechanism
> to me.
>

Indeed.  It is part of PSTATE, which means it gets preserved/restored
along with the rest of the SPSR (saved program state) when an exception
is taken.
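The crypto_memneq() contrast described above can be sketched in a few lines. This is a simplified, byte-wise illustration only — the kernel's real implementation works in native word sizes and is carefully written to resist compiler optimization, which Python cannot model:

```python
def leaky_compare(a: bytes, b: bytes) -> bool:
    # Ordinary early-exit comparison, like memcmp(): the number of loop
    # iterations correlates with the index of the first mismatching
    # byte, which is the timing signal an attacker measures.
    for x, y in zip(a, b):
        if x != y:
            return False
    return len(a) == len(b)

def memneq(a: bytes, b: bytes) -> bool:
    # crypto_memneq()-style comparison: OR together the XOR of every
    # byte pair so the loop always runs to completion, regardless of
    # where (or whether) the inputs differ.  Returns True if unequal.
    assert len(a) == len(b)
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff != 0
```

Note the inverted sense: like the kernel helper, `memneq` answers "are these *not* equal", so constant-time callers can branch once on the accumulated result.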
On Wed, Jan 25, 2023 at 4:30 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 1/24/23 17:28, Eric Biggers wrote:
> > To mitigate this CPU vulnerability, it's possible to enable "Data
> > Operand Independent Timing Mode" (DOITM) by setting a bit in a MSR.
> > While Intel's documentation suggests that this bit should only be set
> > where "necessary", that is highly impractical, given the fact that
> > cryptography can happen nearly anywhere in the kernel and userspace, and
> > the fact that the entire kernel likely needs to be protected anyway.
>
> I think this misses a key point from the documentation:
>
>         This functionality is intended for use by software which has
>         already applied other techniques to mitigate software timing
>         side channels, such as those documented in Intel's Guidelines
>         for Mitigating Timing Side Channels Against Cryptographic
>         Implementations.
>
> Translating from Intel-speak: Intel thinks that DOITM is purely a way to
> make the CPU run slower if you haven't already written code specifically
> to mitigate timing side channels.  All pain, no gain.
>
> The kernel as a whole is not written that way.

The kernel as a whole also doesn't really use the FPU registers for
anything other than checksumming and cryptography and stuff like that
(it's disabled in the compiler flags because the FPU registers normally
contain userspace state that must not be clobbered).  The instructions
listed on that Intel help page are all weird PM* and VP* arithmetic
instructions that can't be generated from C code in the kernel (except
for weird subsystems in which every function is only callable in
kernel-FPU-enabled mode and the compiler is set to enable FPU
instruction generation, by which I mean amdgpu).

So with the current generations of CPUs, based on that Intel help page,
"the kernel as a whole" should not be affected by this setting; it's
mostly just cryptographic helpers written in assembly.  From a
quick-and-dirty grep:

$ find arch/x86/ -name '*.S' | xargs egrep -li '(PMADDUBSW|PMULUDQ|VPMULHRSW|PMADDWD|VPLZCNTD|VPMULHUW|PMULDQ|VPLZCNTQ|VPMULHW|PMULHRSW|VPMADD52HUQ|VPMULLD|PMULHUW|VPMADD52LUQ|VPMULLQ|PMULHW|VPMADDUBSW|VPMULLW|PMULLD|VPMADDWD|VPMULUDQ|PMULLW|VPMULDQ)'
arch/x86/crypto/nh-avx2-x86_64.S
arch/x86/crypto/nh-sse2-x86_64.S
arch/x86/crypto/poly1305-x86_64-cryptogams.S

That's NHPoly1305 and Poly1305, two crypto functions.  Poly1305 is used
for the authentication part of AEAD schemes (in particular WireGuard
uses Poly1305 for all its authentication), NHPoly1305 is for Adiantum
(weird Android disk encryption stuff AFAIK).

> I'm sure the crypto
> folks that are cc'd can tell us specifically if the kernel crypto code
> is written following those recommendations.

Note that all the memory-management and block layer code that copies
page contents around and swaps them to disk and so on should (ideally)
be cryptographically timing-safe (yes I know that's very much not true
with zram swap), because it operates on basically all userspace memory,
including userspace memory used by cryptographic code.  On top of that,
some cryptographic secrets (including SSH authentication keys) need to
be persisted somewhere - some cryptographic key material is stored in
filesystems.  And of course there are also secrets that are not quite
cryptographic, and can be similarly important and small as cryptographic
keys, such as cookies and OAuth tokens stored in the various databases
of something like a browser.

I think part of the reason why KSM is nowadays disabled in most cloud
environments is that it introduced timing differences in the MM
subsystem (between taking a write fault on a private page and taking a
write fault on a KSM-shared page), and those timing differences could
be used to leak information between VMs.  (See, for example,
<https://gruss.cc/files/remote_dedup.pdf>.)

So I think there is quite a bit more kernel code that *handles
cryptographic intermediate states and cryptographic secrets* (and
non-cryptographic secrets that have similar properties) than kernel
code that *performs cryptography*.  Do we have a guarantee that things
like page copying (for example as part of migration or forking),
usercopy (for example for filesystem access), and whatever checksumming
and other things might reasonably be happening in the block layer
(including for swap) and in filesystems are going to be
data-timing-independent on future processors?

> So, let's call this patch what it is: a potential global slowdown which
> protects a very small amount of crypto code, probably just in userspace.

And a bunch of data paths that this crypto code's secrets might go
through, in particular in the kernel.  Also note that the kernel can do
a bunch of important cryptography on its own, like for disk encryption,
filesystem encryption, SSL/TLS sockets (yes you can have the kernel do
the easy parts of TLS for you), IPSec, WireGuard, and probably a bunch
of other things.

> That is probably the code that's generating your RSA keys, so it's
> quite important, but it's also a _very_ small total amount of code.

Yeah but all the code that handles storage of your RSA keys really also
needs a similar level of protection when touching them.  See
https://arxiv.org/pdf/2108.04600.pdf for an example:

	"We analyze the exploitability of base64 decoding functions
	across several widely used cryptographic libraries.  Base64
	decoding is used when loading keys stored in PEM format.  We
	show that these functions by themselves leak sufficient
	information even if libraries are executed in trusted execution
	environments.  In fact, we show that recent countermeasures to
	transient execution attacks such as LVI ease the exploitability
	of the observed faint leakages, allowing us to robustly infer
	sufficient information about RSA private keys with a single
	trace.  We present a complete attack, including a broad library
	analysis, a high-resolution last level cache attack on SGX
	enclaves, and a fully parallelized implementation of the
	extend-and-prune approach that allows a complete key recovery
	at medium costs."

> There's another part here which I think was recently added to the
> documentation:
>
>         Intel expects the performance impact of this mode may be
>         significantly higher on future processors.
>
> That's _meant_ to be really scary and keep folks from turning this on by
> default, aka. what this patch does.  Your new CPU will be really slow if
> you turn this on!  Boo!

It does sound really scary, because it sounds like that list of
instructions Intel published might be useless for trying to reason
about the security impact of turning this optimization on for future
hardware?  If the performance impact is going to be higher on newer
CPUs then I would *definitely* want DOITM to be on by default, and then
when Intel can say what instructions are actually affected on a new
CPU, we can figure out whether it's safe to turn it off as a
per-cpu-generation thing?

> All that said, and given the information that Intel has released, I
> think this patch is generally the right thing to do.  I don't think
> people are wrong for looking at "DODT" as being a new vulnerability.
> Intel obviously doesn't see it that way, which is why "DODT" has (as far
> as I can tell) not been treated with the same security pomp and
> circumstance as other stuff.
>
> Last, if you're going to propose that this be turned on, I expect to see
> at least _some_ performance data.  DOITM=1 isn't free, even on Ice Lake.

So all in all, it sounds to me like this should only affect the
performance of crypto code in current CPUs (where we really want
data-independent timing), so it should definitely be enabled on current
CPUs.  And on future CPUs it's a surprise package of potential security
problems, so for those we should be turning it on even more?
On 1/26/23 05:52, Jann Horn wrote:
> On Wed, Jan 25, 2023 at 4:30 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> Translating from Intel-speak: Intel thinks that DOITM is purely a way to
>> make the CPU run slower if you haven't already written code specifically
>> to mitigate timing side channels.  All pain, no gain.
>>
>> The kernel as a whole is not written that way.
>
> The kernel as a whole also doesn't really use the FPU registers for
> anything other than checksumming and cryptography and stuff like that
> (it's disabled in the compiler flags because the FPU registers
> normally contain userspace state that must not be clobbered).  The
> instructions listed on that Intel help page are all weird PM* and VP*
> arithmetic instructions that can't be generated from C code in the
> kernel (except for weird subsystems in which every function is only
> callable in kernel-FPU-enabled mode and the compiler is set to enable
> FPU instruction generation, by which I mean amdgpu).

Maybe I'm totally missing something, but I thought the scope here was
the "non-data operand independent timing behavior for the listed
instructions" referenced here:

> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html

where the "listed instructions" is this list:

> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html

For example, that includes XOR with the 0x31 and 0x81 opcodes which
there are plenty of in the kernel.  That's a bit wider scope than the
crazy instructions like VPLZCNTD.  The crazy instructions list that I
_think_ you were grepping for is the "Instructions That May Exhibit
MCDT Behavior".  That's also a fun one, but it is more narrow than the
DOITM list.
On Thu, Jan 26, 2023 at 5:40 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 1/26/23 05:52, Jann Horn wrote:
> > On Wed, Jan 25, 2023 at 4:30 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >> Translating from Intel-speak: Intel thinks that DOITM is purely a way to
> >> make the CPU run slower if you haven't already written code specifically
> >> to mitigate timing side channels.  All pain, no gain.
> >>
> >> The kernel as a whole is not written that way.
> >
> > The kernel as a whole also doesn't really use the FPU registers for
> > anything other than checksumming and cryptography and stuff like that
> > (it's disabled in the compiler flags because the FPU registers
> > normally contain userspace state that must not be clobbered).  The
> > instructions listed on that Intel help page are all weird PM* and VP*
> > arithmetic instructions that can't be generated from C code in the
> > kernel (except for weird subsystems in which every function is only
> > callable in kernel-FPU-enabled mode and the compiler is set to enable
> > FPU instruction generation, by which I mean amdgpu).
>
> Maybe I'm totally missing something, but I thought the scope here was
> the "non-data operand independent timing behavior for the listed
> instructions" referenced here:
>
> > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
>
> where the "listed instructions" is this list:
>
> > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
>
> For example, that includes XOR with the 0x31 and 0x81 opcodes which
> there are plenty of in the kernel.

That list says at the top: "The table below lists instructions that
have data-independent timing."

And the "MCDT (MXCSR-Sensitivity)" column that marks instructions that
do not actually have data-independent timing if you set the MSR to the
unsafe state is only marked for PMADDUBSW, PMADDWD, PMULDQ, PMULHRSW,
PMULHUW, PMULHW, PMULLD, PMULLW, PMULUDQ, VPLZCNTD, VPLZCNTQ,
VPMADD52HUQ, VPMADD52LUQ, VPMADDUBSW, VPMADDWD, VPMULDQ, VPMULHRSW,
VPMULHUW, VPMULHW, VPMULLD, VPMULLQ, VPMULLW, VPMULUDQ.  All the others
are guaranteed to always have data-independent timing, if I understand
the table correctly.

> That's a bit wider scope than the crazy instructions like VPLZCNTD.  The
> crazy instructions list that I _think_ you were grepping for is the
> "Instructions That May Exhibit MCDT Behavior".  That's also a fun one,
> but it is more narrow than the DOITM list.
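The distinction being drawn here — the small MCDT-sensitive subset versus the rest of the DOIT list — can be made concrete with a trivial lookup. The set contents below are copied from the message above; treat the column interpretation as Jann's reading of Intel's table, not an authoritative restatement of it:

```python
# Mnemonics marked in the "MCDT (MXCSR-Sensitivity)" column of Intel's
# DOIT instruction table, as enumerated in the message above.
MCDT_SENSITIVE = {
    "PMADDUBSW", "PMADDWD", "PMULDQ", "PMULHRSW", "PMULHUW", "PMULHW",
    "PMULLD", "PMULLW", "PMULUDQ", "VPLZCNTD", "VPLZCNTQ",
    "VPMADD52HUQ", "VPMADD52LUQ", "VPMADDUBSW", "VPMADDWD", "VPMULDQ",
    "VPMULHRSW", "VPMULHUW", "VPMULHW", "VPMULLD", "VPMULLQ",
    "VPMULLW", "VPMULUDQ",
}

def may_exhibit_mcdt(mnemonic: str) -> bool:
    """True if the mnemonic is in the MCDT-sensitive subset above."""
    return mnemonic.upper() in MCDT_SENSITIVE
```

So an assembly scan for MCDT exposure would match only these 23 mnemonics, while XOR (on the wider DOIT list but not MCDT-marked) would not be flagged.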
On 1/26/23 09:52, Jann Horn wrote:
>> Maybe I'm totally missing something, but I thought the scope here was
>> the "non-data operand independent timing behavior for the listed
>> instructions" referenced here:
>>
>>> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
>>
>> where the "listed instructions" is this list:
>>
>>> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
>>
>> For example, that includes XOR with the 0x31 and 0x81 opcodes which
>> there are plenty of in the kernel.
>
> That list says at the top: "The table below lists instructions that
> have data-independent timing."

So, first of all, apologies for the documentation.  It needs some work,
and I see where the confusion is coming from.

But, I did just confirm with the folks that wrote it.  The "listed
instructions" *ARE* within the scope of being affected by the DOITM=0/1
setting.

Instead of saying:

	The table below lists instructions that have data-independent
	timing.

I think it should probably say something like:

	The table below lists instructions that have data-independent
	timing when DOITM is enabled.

(Modulo the MXCSR interactions for now)

Somebody from Intel please thwack me over the head if I'm managing to
get this wrong (wouldn't be the first time).
On Thu, Jan 26, 2023 at 11:12:36AM -0800, Dave Hansen wrote:
> On 1/26/23 09:52, Jann Horn wrote:
> >> Maybe I'm totally missing something, but I thought the scope here was
> >> the "non-data operand independent timing behavior for the listed
> >> instructions" referenced here:
> >>
> >>> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
> >>
> >> where the "listed instructions" is this list:
> >>
> >>> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
> >>
> >> For example, that includes XOR with the 0x31 and 0x81 opcodes which
> >> there are plenty of in the kernel.
> >
> > That list says at the top: "The table below lists instructions that
> > have data-independent timing."
>
> So, first of all, apologies for the documentation.  It needs some work,
> and I see where the confusion is coming from.
>
> But, I did just confirm with the folks that wrote it.  The "listed
> instructions" *ARE* within the scope of being affected by the DOITM=0/1
> setting.
>
> Instead of saying:
>
> 	The table below lists instructions that have data-independent
> 	timing.
>
> I think it should probably say something like:
>
> 	The table below lists instructions that have data-independent
> 	timing when DOITM is enabled.
>
> (Modulo the MXCSR interactions for now)
>
> Somebody from Intel please thwack me over the head if I'm managing to
> get this wrong (wouldn't be the first time).

That's my understanding too, based on the part of
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
that describes DOITM.  The page
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
is actively misleading, so yes please get that fixed.

I think the following is also extremely important:

	For Intel® Core™ family processors based on microarchitectures
	before Ice Lake and Intel Atom® family processors based on
	microarchitectures before Gracemont that do not enumerate
	IA32_UARCH_MISC_CTL, developers may assume that the
	instructions listed here operate as if DOITM is enabled.

The end result is that on older CPUs, Intel explicitly guarantees that
the instructions in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
have data operand independent timing.  But on newer CPUs, Intel has
explicitly removed that guarantee, and enabling DOITM is needed to get
it back.

By the way, surely the importance of using DOITM on a particular CPU
correlates strongly with its performance overhead?  So I'm not sure
that benchmarks of DOITM would even be very interesting, as we couldn't
necessarily decide on something like "don't use DOITM if the overhead
is more than X percent", since that would exclude exactly the CPUs
where it's the most important to use...

I think the real takeaway here is that the optimizations that Intel is
apparently trying to introduce are a bad idea and not safe at all.  To
the extent that they exist at all, they should be an opt-in thing, not
opt-out.  The CPU gets that wrong, but Linux can flip that and do it
right.

- Eric
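The enumerate-then-enable dance discussed above (CPUs that don't enumerate IA32_UARCH_MISC_CTL behave as if DOITM is always on; newer CPUs must have the bit set) can be sketched with plain bitmask helpers. The MSR indices and bit positions here are assumptions recalled from Intel's DOITM guidance and the patch under discussion — verify them against the SDM before relying on them:

```python
# Assumed architectural constants (check against Intel's documentation):
IA32_ARCH_CAPABILITIES = 0x10A   # enumeration MSR
ARCH_CAP_DOITM = 1 << 12         # assumed DOITM enumeration bit
IA32_UARCH_MISC_CTL = 0x1B01     # assumed control MSR
UARCH_MISC_CTL_DOITM = 1 << 0    # assumed DOITM enable bit

def doitm_enumerated(arch_capabilities: int) -> bool:
    """True if a CPU with this IA32_ARCH_CAPABILITIES value advertises
    the DOITM control; older CPUs that don't are documented as behaving
    as if DOITM were always enabled."""
    return bool(arch_capabilities & ARCH_CAP_DOITM)

def with_doitm_enabled(uarch_misc_ctl: int) -> int:
    """Value to write back to IA32_UARCH_MISC_CTL to enable DOITM,
    preserving any other bits already set."""
    return uarch_misc_ctl | UARCH_MISC_CTL_DOITM
```

A kernel would do the equivalent with rdmsr/wrmsr on each CPU at boot (and again on CPU hotplug/resume); the helpers above only model the bit arithmetic.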
On 1/26/23 14:37, Eric Biggers wrote:
> The end result is that on older CPUs, Intel explicitly guarantees that the
> instructions in
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
> have data operand independent timing.  But on newer CPUs, Intel has explicitly
> removed that guarantee, and enabling DOITM is needed to get it back.

Yep.

> By the way, surely the importance of using DOITM on a particular CPU correlates
> strongly with its performance overhead?  So I'm not sure that benchmarks of
> DOITM would even be very interesting, as we couldn't necessarily decide on
> something like "don't use DOITM if the overhead is more than X percent", since
> that would exclude exactly the CPUs where it's the most important to use...

We've looked at how bad the cure is compared to the disease for *every*
one of these issues.  As far as I know, either the disease has always
gotten better over time or the cure gets cheaper (think IBRS/retpoline ->
EIBRS or RDCL_NO for Meltdown).

DOITM doesn't follow that pattern.  It appears that it's fairly cheap now,
but Intel is reserving the right to make it worse over time.  I don't know
how we can come up with a kernel policy which will be sane going forward
when things get worse over time.

> I think the real takeaway here is that the optimizations that Intel is
> apparently trying to introduce are a bad idea and not safe at all.  To the
> extent that they exist at all, they should be an opt-in thing, not opt-out.
> The CPU gets that wrong, but Linux can flip that and do it right.

Let's try to break this down a _bit_.

The code most sensitive to the DOITM behavior is code written to be
resistant to timing side-channels today.  DOITM=0 behavior is obviously
unsafe there.  This code is, unfortunately, out in the wild and completely
unannotated and completely unaware of DOITM.
The only way to mitigate it is to set DOITM=1 whenever it _might_ be
running, which is when userspace runs.  If the kernel can't be bothered to
switch DOITM on kernel entry/exit, then it's game over and DOITM=1 is the
only choice.  For this code, it's arguable that even mitigations=off
shouldn't set DOITM=0.

The other extreme is code that's known to be vulnerable to timing
side-channels no matter the value of DOITM.  KSM is an example here.
Setting DOITM=1 isn't going to make KSM's side channels go away.

Then, there's the middle ground.  There is surely some code that does not
have timing side-channels with DOITM=0, but some appear with DOITM=1.  I
_think_ my colleagues at Intel consider this code to be in the same bucket
as "known vulnerable".  Basically, "if it's not designed to be timing
side-channel resistant then it surely is vulnerable to one, regardless of
DOITM."

That middle ground matters a *lot* because it's probably 99.9% of all
software, including essentially the whole kernel.

I think what I'm hearing from folks in this thread is that if going from
DOITM=1->0 _makes_ any code vulnerable in practice, then they view that as
a bug -- a real-world security bug.  It doesn't matter whether the code
was designed to be side-channel resistant or not.  A regression is a
regression.

I'm also hearing that the old (pre-Ice Lake), universal, DOITM=1 behavior
is basically architecture because software depends on it.  It can't be
changed except with an explicit software opt-in.  Even if DOITM were the
right basic interface for this opt-in, the default polarity is wrong for
an opt-in.
We've been talking about this inside Intel.  Suffice to say that DOITM was
not designed to be turned on all the time.  If software turns it on all
the time, it won't accomplish what it was designed to do.

The _most_ likely thing that's going to happen is that DOITM gets
redefined to be much more limited in scope.  The current DOITM
architecture is very broad, but the implementations have much more limited
effects.  This means that the architecture can probably be constrained and
still match the hardware that's out there today.  That's pure speculation
(ha!) on my part.

I think we should hold off on merging any DOITM patches until the dust
settles.  As far as I know, there is no pressing practical reason to apply
something urgently.

Any objections?
On Tue, Jan 31, 2023 at 02:48:05PM -0800, Dave Hansen wrote:
> We've been talking about this inside Intel.  Suffice to say that DOITM
> was not designed to be turned on all the time.  If software turns it on
> all the time, it won't accomplish what it was designed to do.

Why wouldn't it accomplish what it is designed to do?  Does it not
actually work?

> The _most_ likely thing that's going to happen is that DOITM gets
> redefined to be much more limited in scope.  The current DOITM
> architecture is very broad, but the implementations have much more
> limited effects.  This means that the architecture can probably be
> constrained and still match the hardware that's out there today.  That's
> pure speculation (ha!) on my part.
>
> I think we should hold off on merging any DOITM patches until the dust
> settles.  As far as I know, there is no pressing practical reason to
> apply something urgently.
>
> Any objections?

Does this mean that Intel will be restoring the data operand independent
timing guarantee of some instructions that have had it removed?  If so,
which instructions will still be left?

Also, given that the DOITM flag can only be set and cleared by the kernel,
and assuming that Linux won't support DOITM in any way yet (as you're
recommending), what should the developers of userspace cryptographic
libraries do?  Does Intel have a list of which instructions *already* have
started having data operand dependent timing on released CPUs, so that the
existing security impact can be assessed and so that developers can avoid
using those instructions?

- Eric
On 1/31/23 22:54, Eric Biggers wrote:
> On Tue, Jan 31, 2023 at 02:48:05PM -0800, Dave Hansen wrote:
>> We've been talking about this inside Intel.  Suffice to say that DOITM
>> was not designed to be turned on all the time.  If software turns it on
>> all the time, it won't accomplish what it was designed to do.
>
> Why wouldn't it accomplish what it is designed to do?  Does it not
> actually work?

It was designed with the idea that timing-sensitive and *ONLY*
timing-sensitive code would be wrapped with DOITM.  Wrapping that code
would allow the rest of the system to safely operate with fancy new
optimizations that exhibit data dependent timing.

But, first of all, that code isn't wrapped with DOITM-prodding APIs today.
That model falls apart with current software on current DOITM
implementations.

Second, we consider the kernel in general to be timing-sensitive enough
that we want DOITM=1 behavior in the kernel.

Those lead software folks to set DOITM=1 on all the time.  The fallout
from that is that nobody will ever use those fancy new optimizations.  If
nobody can take advantage of them, silicon shouldn't be wasted on them in
the first place.

Basically, DOITM was meant to open the door for adding fancy new
optimizations.  It's a failure if it doesn't do that.

>> The _most_ likely thing that's going to happen is that DOITM gets
>> redefined to be much more limited in scope.  The current DOITM
>> architecture is very broad, but the implementations have much more
>> limited effects.  This means that the architecture can probably be
>> constrained and still match the hardware that's out there today.  That's
>> pure speculation (ha!) on my part.
>>
>> I think we should hold off on merging any DOITM patches until the dust
>> settles.  As far as I know, there is no pressing practical reason to
>> apply something urgently.
>>
>> Any objections?
> Does this mean that Intel will be restoring the data operand independent
> timing guarantee of some instructions that have had it removed?  If so,
> which instructions will still be left?

Speaking for myself and *not* the official Intel plan here.  Yes, I think
the pre-DOITM behavior can and will be restored for basically the entire
list[1] (ignoring MCDT of course).

> Also, given that the DOITM flag can only be set and cleared by the
> kernel, and assuming that Linux won't support DOITM in any way yet
> (as you're recommending), what should the developers of userspace
> cryptographic libraries do?  Does Intel have a list of which
> instructions *already* have started having data operand dependent
> timing on released CPUs, so that the existing security impact can be
> assessed and so that developers can avoid using those instructions?

Good questions.  I want to make sure what I'm saying here is accurate, and
I don't have good answers to those right now.  We're working on it, and
should have something useful to say in the next couple of days.

1. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
On Wed, Feb 01, 2023 at 10:09:16AM -0800, Dave Hansen wrote:
> It was designed with the idea that timing-sensitive and *ONLY*
> timing-sensitive code would be wrapped with DOITM.  Wrapping that code
> would allow the rest of the system to safely operate with fancy new
> optimizations that exhibit data dependent timing.
>
> But, first of all, that code isn't wrapped with DOITM-prodding APIs
> today.  That model falls apart with current software on current DOITM
> implementations.
>
> Second, we consider the kernel in general to be timing-sensitive enough
> that we want DOITM=1 behavior in the kernel.
>
> Those lead software folks to set DOITM=1 on all the time.  The fallout
> from that is that nobody will ever use those fancy new optimizations.
> If nobody can take advantage of them, silicon shouldn't be wasted on
> them in the first place.
>
> Basically, DOITM was meant to open the door for adding fancy new
> optimizations.  It's a failure if it doesn't do that.

It seems like it still potentially accomplishes its intended purpose even
if it's opt-in.  A prctl to turn it on for a particular process, or a
particular process and its children, could work if that process knows it
wants all the performance it can get and won't be handling data for which
privilege boundaries matter.  If this actually has the potential for
substantial optimizations, that would be worth it.

But yeah, opt-out seems like a non-starter, since it'd require fixing all
the cryptographic code in the world to request DOITM=1.
On 2/1/23 10:09, Dave Hansen wrote:
> Good questions.  I want to make sure what I'm saying here is accurate,
> and I don't have good answers to those right now.  We're working on it,
> and should have something useful to say in the next couple of days.

This is an attempt to make sure that everyone that is concerned about
DOITM behavior has all the same information as Intel folks before we make
a decision about a kernel implementation.  Here we go...

The execution latency of the DOIT instructions[1] does not depend on the
value of data operands on all currently-supported Intel processors.  This
includes all processors that enumerate DOITM support.  There are no plans
for any processors where this behavior would change, despite the DOITM
architecture theoretically allowing it.

So, what's the point of DOITM in the first place?  Fixed execution latency
does not mean that programs as a whole will have constant overall latency.
DOITM currently affects features which do not affect execution latency but
may, for instance, affect overall program latency due to side-effects of
prefetching on the cache.  Even with fixed instruction execution latency,
these side-effects can matter, especially to the paranoid.

Today, those affected features are:

 * Data Dependent Prefetchers (DDP)[2]
 * Some Fast Store Forwarding Predictors (FSFP)[3]

There are existing controls for those features, including
spec_store_bypass_disable=[4].  Some paranoid software may already have
mitigations in place that are a superset of DOITM.  In addition, both DDP
and FSFP were also designed to limit nastiness when crossing privilege
boundaries.  Please see the linked docs for more details.

That's basically the Intel side of things.  Everyone else should have all
the background that I have.  Now back to maintainer mode...

So, in short, I don't think everyone's crypto libraries are busted today
when DOITM=0.  I don't think they're going to _become_ busted any time
soon.

Where do we go from here?  There are a few choices:

 1. Apply the patch in this thread, set DOITM=1 always.  Today, this
    reduces exposure to DDP and FSFP, but probably only for userspace.
    It reduces exposure to any future predictors under the DOITM umbrella
    and also from Intel changing its mind.
 2. Ignore DOITM, leave it to the hardware default of DOITM=0.  Today,
    this probably just steers folks to using relatively heavyweight
    mitigations (like SSBD) if they want DDP/FSFP disabled.  It also
    leaves Linux exposed to Intel changing its mind on its future plans.
 3. Apply the patch in this thread, but leave DOITM=0 as the default.
    This lets folks enable DOITM on a moment's notice if the need arises.

There are some other crazier choices like adding ABI to toggle DOITM for
userspace, but I'm not sure they're even worth discussing.

#1 is obviously the only way to go if the DOITM architecture remains
as-is.  There is talk of making changes, like completely removing the idea
of variable execution latency.  But that's a slow process and would be a
non-starter if *anyone* (like other OSes) is depending on the existing
DOITM definition.

My inclination is to wait a couple of weeks to see which way DOITM is
headed and if the definition is likely to get changed.  Does anyone feel
any greater sense of urgency?

1. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html
2. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/data-dependent-prefetcher.html
3. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/fast-store-forwarding-predictor.html
4. https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
BTW, I'm basically moving forward assuming that we're going to apply this
patch in _some_ form.  I'm going to make some changes, but I'll discuss
them in this thread to make sure we're all on the same page first.

On 1/24/23 17:28, Eric Biggers wrote:
> +Affected CPUs
> +-------------
> +
> +This vulnerability affects Intel Core family processors based on the Ice Lake
> +and later microarchitectures, and Intel Atom family processors based on the
> +Gracemont and later microarchitectures.  For more information, see Intel's
> +documentation [1]_.

I had a hard time following the docs in this area.  But I'm not sure this
statement is correct.  The docs actually say:

	For Intel® Core™ family processors based on microarchitectures
	before Ice Lake and Intel Atom® family processors based on
	microarchitectures before Gracemont that do not enumerate
	IA32_UARCH_MISC_CTL, developers may assume that the instructions
	listed here operate as if DOITM is enabled.

A processor needs to be before "Ice Lake" and friends *AND* not enumerate
IA32_UARCH_MISC_CTL to be unaffected.

There's also another tweak that's needed because:

	Processors that do not enumerate IA32_ARCH_CAPABILITIES[DOITM]
	when the latest microcode is applied do not need to set
	IA32_UARCH_MISC_CTL [DOITM] in order to have the behavior
	described in this document...

First, we need to mention the "latest microcode" thing in the kernel docs.
I also _think_ the whole "microarchitectures before" stuff is rather
irrelevant and we can simplify this down to:

	This vulnerability affects all Intel processors that support
	MSR_IA32_ARCH_CAPABILITIES and set MSR_IA32_ARCH_CAPABILITIES[DOITM]
	when the latest microcode is applied.

Which reminds me.
This:

> +void update_doitm_msr(void)
> +{
> +	u64 msr;
> +
> +	if (doitm_off)
> +		return;
> +
> +	rdmsrl(MSR_IA32_UARCH_MISC_CTL, msr);
> +	wrmsrl(MSR_IA32_UARCH_MISC_CTL, msr | UARCH_MISC_DOITM);
> +}

should probably be:

void update_doitm_msr(void)
{
	u64 msr;

	/*
	 * All processors that enumerate support for DOIT
	 * are affected *and* have the mitigation available.
	 */
	if (!boot_cpu_has_bug(X86_BUG_DODT))
		return;

	rdmsrl(MSR_IA32_UARCH_MISC_CTL, msr);
	if (doitm_off)
		msr &= ~UARCH_MISC_DOITM;
	else
		msr |= UARCH_MISC_DOITM;
	wrmsrl(MSR_IA32_UARCH_MISC_CTL, msr);
}

in case the CPU isn't actually coming out of reset, like if kexec() left
DOITM=1.
> On Feb 3, 2023, at 10:25 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> BTW, I'm basically moving forward assuming that we're going to apply
> this patch in _some_ form.  I'm going to make some changes, but I'll
> discuss them in this thread to make sure we're all on the same page first.

Just checking in on the changes mentioned here.

> On 1/24/23 17:28, Eric Biggers wrote:
>> +Affected CPUs
>> +-------------
>> +
>> +This vulnerability affects Intel Core family processors based on the Ice Lake
>> +and later microarchitectures, and Intel Atom family processors based on the
>> +Gracemont and later microarchitectures.  For more information, see Intel's
>> +documentation [1]_.
>
> I had a hard time following the docs in this area.
>
> But I'm not sure this statement is correct.  The docs actually say:
>
>	For Intel® Core™ family processors based on microarchitectures
>	before Ice Lake and Intel Atom® family processors based on
>	microarchitectures before Gracemont that do not enumerate
>	IA32_UARCH_MISC_CTL, developers may assume that the instructions
>	listed here operate as if DOITM is enabled.

Have we been able to clarify if this assumption is guaranteed?

> A processor needs to be before "Ice Lake" and friends *AND* not
> enumerate IA32_UARCH_MISC_CTL to be unaffected.
>
> There's also another tweak that's needed because:
>
>	Processors that do not enumerate IA32_ARCH_CAPABILITIES[DOITM]
>	when the latest microcode is applied do not need to set
>	IA32_UARCH_MISC_CTL [DOITM] in order to have the behavior
>	described in this document...
>
> First, we need to mention the "latest microcode" thing in the kernel
> docs.  I also _think_ the whole "microarchitectures before" stuff is
> rather irrelevant and we can simplify this down to:
>
>	This vulnerability affects all Intel processors that support
>	MSR_IA32_ARCH_CAPABILITIES and set MSR_IA32_ARCH_CAPABILITIES[DOITM]
>	when the latest microcode is applied.

Certainly a lot cleaner.
Would be great if the Intel docs reflected this.

— Regards,
Roxana

> Which reminds me.  This:
>
>> +void update_doitm_msr(void)
>> +{
>> +	u64 msr;
>> +
>> +	if (doitm_off)
>> +		return;
>> +
>> +	rdmsrl(MSR_IA32_UARCH_MISC_CTL, msr);
>> +	wrmsrl(MSR_IA32_UARCH_MISC_CTL, msr | UARCH_MISC_DOITM);
>> +}
>
> should probably be:
>
> void update_doitm_msr(void)
> {
>	u64 msr;
>
>	/*
>	 * All processors that enumerate support for DOIT
>	 * are affected *and* have the mitigation available.
>	 */
>	if (!boot_cpu_has_bug(X86_BUG_DODT))
>		return;
>
>	rdmsrl(MSR_IA32_UARCH_MISC_CTL, msr);
>	if (doitm_off)
>		msr &= ~UARCH_MISC_DOITM;
>	else
>		msr |= UARCH_MISC_DOITM;
>	wrmsrl(MSR_IA32_UARCH_MISC_CTL, msr);
> }
>
> in case the CPU isn't actually coming out of reset, like if kexec() left
> DOITM=1.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index f54867cadb0f6..b76966aac3470 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -524,6 +524,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
 		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
 		/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
 		/sys/devices/system/cpu/vulnerabilities/retbleed
+		/sys/devices/system/cpu/vulnerabilities/dodt
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
diff --git a/Documentation/admin-guide/hw-vuln/dodt.rst b/Documentation/admin-guide/hw-vuln/dodt.rst
new file mode 100644
index 0000000000000..03d24f4d97100
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/dodt.rst
@@ -0,0 +1,37 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+DODT - Data Operand Dependent Timing
+====================================
+
+Data Operand Dependent Timing (DODT) is a CPU vulnerability that makes the
+execution times of instructions depend on the values of the data operated on.
+
+This vulnerability potentially enables side-channel attacks on data, including
+cryptographic keys.  Most cryptography algorithms require that a variety of
+instructions be constant-time in order to prevent side-channel attacks.
+
+Affected CPUs
+-------------
+
+This vulnerability affects Intel Core family processors based on the Ice Lake
+and later microarchitectures, and Intel Atom family processors based on the
+Gracemont and later microarchitectures.  For more information, see Intel's
+documentation [1]_.
+
+Mitigation
+----------
+
+Mitigation of this vulnerability involves setting a Model Specific Register
+(MSR) bit to enable Data Operand Independent Timing Mode (DOITM).
+
+By default, the kernel does this on all CPUs.  This mitigation is global, so
+it applies to both the kernel and userspace.
+
+This mitigation can be disabled by adding ``doitm=off`` to the kernel command
+line.  It's also one of the mitigations that can be disabled by
+``mitigations=off``.
+
+References
+----------
+.. [1] Data Operand Independent Timing Instruction Set Architecture (ISA) Guidance
+       https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index 4df436e7c4177..cd962f9634dad 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -18,3 +18,4 @@ are configurable at compile, boot or run time.
    core-scheduling.rst
    l1d_flush.rst
    processor_mmio_stale_data.rst
+   dodt.rst
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3996cf7..a6a872c4365e6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1119,6 +1119,12 @@
			The filter can be disabled or changed to another
			driver later using sysfs.

+	doitm=off	[X86,INTEL] Disable the use of Data Operand Independent
+			Timing Mode (DOITM).  I.e., disable the mitigation for
+			the Data Operand Dependent Timing (DODT) CPU
+			vulnerability.  For details, see
+			Documentation/admin-guide/hw-vuln/dodt.rst
+
	driver_async_probe=  [KNL]
			List of driver names to be probed asynchronously.
			* matches with all driver names.  If * is specified, the
@@ -3259,6 +3265,7 @@
			       no_uaccess_flush [PPC]
			       mmio_stale_data=off [X86]
			       retbleed=off [X86]
+			       doitm=off [X86,INTEL]

			       Exceptions:
			       This does not have any effect on
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 61012476d66e0..7ddb7390c8a60 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -466,5 +466,6 @@
 #define X86_BUG_MMIO_UNKNOWN		X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
 #define X86_BUG_RETBLEED		X86_BUG(27) /* CPU is affected by RETBleed */
 #define X86_BUG_EIBRS_PBRSB		X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
+#define X86_BUG_DODT			X86_BUG(29) /* CPU has data operand dependent timing by default */

 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 37ff47552bcb7..1d929ad4301a8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -114,6 +114,11 @@
						 * Not susceptible to
						 * TSX Async Abort (TAA) vulnerabilities.
						 */
+#define ARCH_CAP_DOITM			BIT(12)	/*
+						 * The processor supports Data Operand
+						 * Independent Timing Mode, i.e. it is
+						 * vulnerable to data operand dependent
+						 * timing.
+						 */
 #define ARCH_CAP_SBDR_SSDP_NO		BIT(13)	/*
						 * Not susceptible to SBDR and SSDP
						 * variants of Processor MMIO stale data
@@ -155,6 +160,9 @@
						 * supported
						 */

+#define MSR_IA32_UARCH_MISC_CTL		0x00001b01
+#define UARCH_MISC_DOITM		BIT(0)	/* Enable Data Operand Independent Timing Mode. */
+
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			BIT(0)	/*
						 * Writeback and invalidate the
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bca0bd8f48464..7ea20a2618232 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -49,6 +49,7 @@ static void __init taa_select_mitigation(void);
 static void __init mmio_select_mitigation(void);
 static void __init srbds_select_mitigation(void);
 static void __init l1d_flush_select_mitigation(void);
+static void __init dodt_select_mitigation(void);

 /* The base value of the SPEC_CTRL MSR without task-specific bits set */
 u64 x86_spec_ctrl_base;
@@ -124,6 +125,8 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
 DEFINE_STATIC_KEY_FALSE(mmio_stale_data_clear);
 EXPORT_SYMBOL_GPL(mmio_stale_data_clear);

+static bool __ro_after_init doitm_off;
+
 void __init check_bugs(void)
 {
	identify_boot_cpu();
@@ -167,6 +170,7 @@ void __init check_bugs(void)
	md_clear_select_mitigation();
	srbds_select_mitigation();
	l1d_flush_select_mitigation();
+	dodt_select_mitigation();

	arch_smt_update();
@@ -2218,6 +2222,42 @@ static int __init l1tf_cmdline(char *str)
 }
 early_param("l1tf", l1tf_cmdline);

+#undef pr_fmt
+#define pr_fmt(fmt)	"DODT: " fmt
+
+static __init int doitm_parse_cmdline(char *str)
+{
+	if (!str)
+		return -EINVAL;
+
+	if (!boot_cpu_has_bug(X86_BUG_DODT))
+		return 0;
+
+	doitm_off = !strcmp(str, "off");
+	if (doitm_off)
+		pr_info("'doitm=off' specified, not enabling Data Operand Independent Timing Mode\n");
+	return 0;
+}
+early_param("doitm", doitm_parse_cmdline);
+
+static __init void dodt_select_mitigation(void)
+{
+	if (cpu_mitigations_off() || !boot_cpu_has_bug(X86_BUG_DODT))
+		doitm_off = true;
+	update_doitm_msr();
+}
+
+void update_doitm_msr(void)
+{
+	u64 msr;
+
+	if (doitm_off)
+		return;
+
+	rdmsrl(MSR_IA32_UARCH_MISC_CTL, msr);
+	wrmsrl(MSR_IA32_UARCH_MISC_CTL, msr | UARCH_MISC_DOITM);
+}
+
 #undef pr_fmt
 #define pr_fmt(fmt)	fmt
@@ -2415,6 +2455,14 @@ static ssize_t retbleed_show_state(char *buf)
	return sysfs_emit(buf, "%s\n", retbleed_strings[retbleed_mitigation]);
 }

+static ssize_t dodt_show_state(char *buf)
+{
+	if (doitm_off)
+		return sysfs_emit(buf, "Vulnerable\n");
+	else
+		return sysfs_emit(buf, "Mitigation: DOITM\n");
+}
+
 static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
			       char *buf, unsigned int bug)
 {
@@ -2464,6 +2512,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
	case X86_BUG_RETBLEED:
		return retbleed_show_state(buf);

+	case X86_BUG_DODT:
+		return dodt_show_state(buf);
+
	default:
		break;
	}
@@ -2528,4 +2579,9 @@ ssize_t cpu_show_retbleed(struct device *dev, struct device_attribute *attr, cha
 {
	return cpu_show_common(dev, attr, buf, X86_BUG_RETBLEED);
 }
+
+ssize_t cpu_show_dodt(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_DODT);
+}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9cfca3d7d0e20..b27fa7e730f32 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1325,6 +1325,9 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
	    !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
		setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);

+	if (ia32_cap & ARCH_CAP_DOITM)
+		setup_force_cpu_bug(X86_BUG_DODT);
+
	if (cpu_matches(cpu_vuln_whitelist, NO_SPECULATION))
		return;
@@ -1972,6 +1975,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
	validate_apic_and_package_id(c);
	x86_spec_ctrl_setup_ap();
	update_srbds_msr();
+	update_doitm_msr();

	tsx_ap_init();
 }
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 7c9b5893c30ab..187190cf1ccca 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -82,6 +82,7 @@ unsigned int aperfmperf_get_khz(int cpu);

 extern void x86_spec_ctrl_setup_ap(void);
 extern void update_srbds_msr(void);
+extern void update_doitm_msr(void);

 extern u64 x86_read_arch_cap_msr(void);
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 4c98849577d4e..281d4e18dddd2 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -576,6 +576,12 @@ ssize_t __weak cpu_show_retbleed(struct device *dev,
	return sysfs_emit(buf, "Not affected\n");
 }

+ssize_t __weak cpu_show_dodt(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
@@ -587,6 +593,7 @@ static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
 static DEVICE_ATTR(srbds, 0444, cpu_show_srbds, NULL);
 static DEVICE_ATTR(mmio_stale_data, 0444, cpu_show_mmio_stale_data, NULL);
 static DEVICE_ATTR(retbleed, 0444, cpu_show_retbleed, NULL);
+static DEVICE_ATTR(dodt, 0444, cpu_show_dodt, NULL);

 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
	&dev_attr_meltdown.attr,
@@ -600,6 +607,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
	&dev_attr_srbds.attr,
	&dev_attr_mmio_stale_data.attr,
	&dev_attr_retbleed.attr,
+	&dev_attr_dodt.attr,
	NULL
 };
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 314802f98b9da..8fa28be9f48bc 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -70,6 +70,8 @@ extern ssize_t cpu_show_mmio_stale_data(struct device *dev,
					char *buf);
 extern ssize_t cpu_show_retbleed(struct device *dev,
				 struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_dodt(struct device *dev,
+			     struct device_attribute *attr, char *buf);

 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,