Message ID | 20231221140239.4349-1-weijiang.yang@intel.com |
---|---|
Series | Enable CET Virtualization |
On Thu, 2023-12-21 at 09:02 -0500, Yang Weijiang wrote:
> Control-flow Enforcement Technology (CET) is a kind of CPU feature used to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two sub-features (SHSTK, IBT) to defend against ROP/COP/JOP style control-flow subversion attacks.
>
> Shadow Stack (SHSTK):
> A shadow stack is a second stack used exclusively for control transfer operations. The shadow stack is separate from the data/normal stack and can be enabled individually in user and kernel mode. When shadow stack is enabled, CALL pushes the return address on both the data and shadow stack. RET pops the return address from both stacks and compares them. If the return addresses from the two stacks do not match, the processor generates a #CP.
>
> Indirect Branch Tracking (IBT):
> IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses of indirect branches (CALL, JMP, etc.). If an indirect branch is executed and the next instruction is _not_ an ENDBRANCH, the processor generates a #CP. These instructions behave as NOPs on platforms that don't support CET.

What is the design around CET and the KVM emulator?

My understanding is that the KVM emulator kind of does what it has to keep things running, and isn't expected to emulate every possible instruction. With CET though, it is changing the behavior of existing supported instructions. I could imagine a guest could skip over CET enforcement by causing an MMIO exit and racing to overwrite the exit-causing instruction from a different vcpu to be an indirect CALL/RET, etc. With reasonable assumptions around the threat model in use by the guest this is probably not a huge problem. And I guess also reasonable assumptions about functional expectations, as a mishandled CALL or RET by the emulator would corrupt the shadow stack.

But, another thing to do could be to just return X86EMUL_UNHANDLEABLE or X86EMUL_RETRY_INSTR when CET is active and RET or CALL are emulated. And I guess also for all instructions if the TRACKER bit is set. It might tie up that loose end without too much trouble.

Anyway, was there a conscious decision to just punt on CET enforcement in the emulator?
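The shadow-stack rule quoted above (CALL pushes the return address to both stacks, and RET pops both and compares them) can be illustrated with a small C model. Everything in it is hypothetical. `struct toy_cpu`, `toy_call()`, `toy_ret()` and `TOY_CP_FAULT` are not KVM or hardware interfaces, and a real #CP also carries an error code that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_DEPTH 64

/* Toy CPU with shadow stacks enabled (hypothetical; not a KVM structure). */
struct toy_cpu {
	uint64_t data_stack[TOY_STACK_DEPTH];   /* ordinary stack, writable by normal stores */
	uint64_t shadow_stack[TOY_STACK_DEPTH]; /* written only by CALL/RET in this model */
	size_t dsp;                             /* number of entries on the data stack */
	size_t ssp;                             /* number of entries on the shadow stack */
};

enum toy_result { TOY_OK, TOY_CP_FAULT };

/* CALL: push the return address onto both stacks. */
static void toy_call(struct toy_cpu *cpu, uint64_t ret_addr)
{
	assert(cpu->dsp < TOY_STACK_DEPTH && cpu->ssp < TOY_STACK_DEPTH);
	cpu->data_stack[cpu->dsp++] = ret_addr;
	cpu->shadow_stack[cpu->ssp++] = ret_addr;
}

/* RET: pop both stacks and compare; a mismatch raises #CP instead of
 * returning (this toy does not roll back the pops on a fault). */
static enum toy_result toy_ret(struct toy_cpu *cpu, uint64_t *rip)
{
	assert(cpu->dsp > 0 && cpu->ssp > 0);
	uint64_t data_ret = cpu->data_stack[--cpu->dsp];
	uint64_t shadow_ret = cpu->shadow_stack[--cpu->ssp];

	if (data_ret != shadow_ret)
		return TOY_CP_FAULT;
	*rip = data_ret;
	return TOY_OK;
}
```

In this model, corrupting `data_stack[dsp - 1]` before `toy_ret()` yields `TOY_CP_FAULT`. An emulator that implemented RET by popping only the data stack would silently skip that comparison. That is the gap being asked about here.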
On 1/4/2024 2:50 AM, Edgecombe, Rick P wrote:
> On Thu, 2023-12-21 at 09:02 -0500, Yang Weijiang wrote:
>> Control-flow Enforcement Technology (CET) is a kind of CPU feature used to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two sub-features (SHSTK, IBT) to defend against ROP/COP/JOP style control-flow subversion attacks.
>>
>> Shadow Stack (SHSTK):
>> A shadow stack is a second stack used exclusively for control transfer operations. The shadow stack is separate from the data/normal stack and can be enabled individually in user and kernel mode. When shadow stack is enabled, CALL pushes the return address on both the data and shadow stack. RET pops the return address from both stacks and compares them. If the return addresses from the two stacks do not match, the processor generates a #CP.
>>
>> Indirect Branch Tracking (IBT):
>> IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses of indirect branches (CALL, JMP, etc.). If an indirect branch is executed and the next instruction is _not_ an ENDBRANCH, the processor generates a #CP. These instructions behave as NOPs on platforms that don't support CET.
> What is the design around CET and the KVM emulator?

KVM doesn't emulate CET HW behavior for guest CET. Instead, it leaves CET-related checks and handling to the guest kernel. E.g., if an emulated JMP/CALL triggers a mismatch between data stack and shadow stack contents, #CP is generated in non-root mode instead of being injected by KVM. KVM only emulates basic x86 HW behaviors, e.g., call/jmp/ret/in/out etc.

> My understanding is that the KVM emulator kind of does what it has to keep things running, and isn't expected to emulate every possible instruction. With CET though, it is changing the behavior of existing supported instructions. I could imagine a guest could skip over CET enforcement by causing an MMIO exit and racing to overwrite the exit-causing instruction from a different vcpu to be an indirect CALL/RET, etc.

Can you elaborate on the case? I cannot figure out how it works.

> With reasonable assumptions around the threat model in use by the guest this is probably not a huge problem. And I guess also reasonable assumptions about functional expectations, as a mishandled CALL or RET by the emulator would corrupt the shadow stack.

KVM emulates general x86 HW behaviors. If something goes wrong after emulation, then it could happen even on bare metal, i.e., guest SW most likely went wrong somewhere, and it's expected to trigger CET exceptions in the guest kernel.

> But, another thing to do could be to just return X86EMUL_UNHANDLEABLE or X86EMUL_RETRY_INSTR when CET is active and RET or CALL are emulated.

IMHO, translating the CET-induced exceptions into X86EMUL_UNHANDLEABLE or X86EMUL_RETRY_INSTR would confuse the guest kernel or even the VMM. I prefer letting the guest kernel handle #CP directly.

> And I guess also for all instructions if the TRACKER bit is set. It might tie up that loose end without too much trouble.
>
> Anyway, was there a conscious decision to just punt on CET enforcement in the emulator?

I don't remember that we ever discussed it in the community, but since KVM maintainers have reviewed the CET virtualization series for a long time, I assume we're moving in the right direction :-)
On Thu, 2024-01-04 at 15:11 +0800, Yang, Weijiang wrote:
> > What is the design around CET and the KVM emulator?
>
> KVM doesn't emulate CET HW behavior for guest CET. Instead, it leaves CET-related checks and handling to the guest kernel. E.g., if an emulated JMP/CALL triggers a mismatch between data stack and shadow stack contents, #CP is generated in non-root mode instead of being injected by KVM. KVM only emulates basic x86 HW behaviors, e.g., call/jmp/ret/in/out etc.

Right. In the case of CET, those basic behaviors (call/jmp/ret) now have host emulation behavior that doesn't match what guest execution would do.

> > My understanding is that the KVM emulator kind of does what it has to keep things running, and isn't expected to emulate every possible instruction. With CET though, it is changing the behavior of existing supported instructions. I could imagine a guest could skip over CET enforcement by causing an MMIO exit and racing to overwrite the exit-causing instruction from a different vcpu to be an indirect CALL/RET, etc.
>
> Can you elaborate on the case? I cannot figure out how it works.

The point is that it should be possible for KVM to emulate call/ret with CET enabled. Not saying the specific case is critical, but the one I used as an example was that the KVM emulator can (or at least could, in the not-too-distant past) be forced to emulate arbitrary instructions if the guest overwrites the instruction between the exit and the SW fetch from the host.

The steps are:

    vcpu 1                            vcpu 2
    ----------------------------------------------------------------
    mov to mmio addr
    vm exit ept_misconfig
                                      overwrite mov instruction to call %rax
    host emulator fetches
    host emulates call instruction

So then the guest call operation will skip the endbranch check. But I'm not sure that there aren't less exotic cases that would run across it. I see a bunch of cases where write-protected memory kicks to the emulator as well. I'm not sure of the exact scenarios and whether this could happen naturally in races during live migration, dirty tracking, etc. Again, I'm more just asking about the exposure and the thinking on it.

> > With reasonable assumptions around the threat model in use by the guest this is probably not a huge problem. And I guess also reasonable assumptions about functional expectations, as a mishandled CALL or RET by the emulator would corrupt the shadow stack.
>
> KVM emulates general x86 HW behaviors. If something goes wrong after emulation, then it could happen even on bare metal, i.e., guest SW most likely went wrong somewhere, and it's expected to trigger CET exceptions in the guest kernel.
>
> > But, another thing to do could be to just return X86EMUL_UNHANDLEABLE or X86EMUL_RETRY_INSTR when CET is active and RET or CALL are emulated.
>
> IMHO, translating the CET-induced exceptions into X86EMUL_UNHANDLEABLE or X86EMUL_RETRY_INSTR would confuse the guest kernel or even the VMM. I prefer letting the guest kernel handle #CP directly.

Doesn't X86EMUL_RETRY_INSTR kick it back to the guest, which is what you want? Today it will do the operations without the special CET behavior.

But I do see how it could be tricky to avoid the guest getting stuck in a loop with X86EMUL_RETRY_INSTR. I guess the question is: if this situation is encountered, when KVM can't handle the emulation correctly, what should happen? I think usually it returns KVM_INTERNAL_ERROR_EMULATION to userspace? So I don't see why the CET case is different.

If the scenario (call/ret emulation with CET enabled) doesn't happen, how can the guest be confused? If it does happen, won't it be an issue?

> > And I guess also for all instructions if the TRACKER bit is set. It might tie up that loose end without too much trouble.
> >
> > Anyway, was there a conscious decision to just punt on CET enforcement in the emulator?
>
> I don't remember that we ever discussed it in the community, but since KVM maintainers have reviewed the CET virtualization series for a long time, I assume we're moving in the right direction :-)

It seems like kind of a leap that, if it never came up, they must be approving of the specific detail. Don't know. Maybe they will chime in.
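The suggestion above is to refuse to emulate CALL/RET (and similar instructions) when CET is active. It could look roughly like the sketch below. This is a hypothetical illustration. `cet_gate()`, `enum insn_class` and `EMUL_UNHANDLEABLE` are invented stand-ins for KVM's real decoder flags and `X86EMUL_*` return codes, and the instruction classes are deliberately coarse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for KVM's X86EMUL_* codes and decoder flags. */
enum emul_result { EMUL_CONTINUE, EMUL_UNHANDLEABLE };

enum insn_class {
	INSN_MOV,          /* ordinary data movement, unaffected by CET */
	INSN_NEAR_CALL,
	INSN_NEAR_RET,
	INSN_INDIRECT_JMP,
	INSN_FAR_TRANSFER, /* far CALL/JMP/RET, IRET, INT, SYSCALL, ... */
};

struct guest_cet_state {
	bool shstk_enabled; /* shadow stacks enabled at the current CPL */
	bool ibt_enabled;   /* indirect branch tracking enabled at the current CPL */
};

static bool insn_touches_shstk(enum insn_class c)
{
	return c == INSN_NEAR_CALL || c == INSN_NEAR_RET || c == INSN_FAR_TRANSFER;
}

static bool insn_touches_ibt(enum insn_class c)
{
	/* Conservative: treat every call and indirect/far jump as IBT-relevant. */
	return c == INSN_NEAR_CALL || c == INSN_INDIRECT_JMP || c == INSN_FAR_TRANSFER;
}

/* Refuse to emulate anything whose architectural behavior CET changes, so
 * the caller's normal failure path (retry or exit to userspace) takes over. */
static enum emul_result cet_gate(enum insn_class c, const struct guest_cet_state *cet)
{
	assert(cet != NULL);
	if (cet->shstk_enabled && insn_touches_shstk(c))
		return EMUL_UNHANDLEABLE;
	if (cet->ibt_enabled && insn_touches_ibt(c))
		return EMUL_UNHANDLEABLE;
	return EMUL_CONTINUE;
}
```

Returning "unhandleable" rather than emulating without CET semantics hands the decision to the existing failure path, instead of silently skipping the checks hardware would have done.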
On Thu, 2023-12-21 at 09:02 -0500, Yang Weijiang wrote:
> Tests:
> ======================
> This series passed basic CET user shadow stack test and kernel IBT test in L1 and L2 guest.

With the build fix, I reproduced the basic IBT and user shadow stack tests, plus the CET-enabled glibc unit tests.

Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
On Thu, Jan 04, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-01-04 at 15:11 +0800, Yang, Weijiang wrote:
> > > What is the design around CET and the KVM emulator?
> >
> > KVM doesn't emulate CET HW behavior for guest CET. Instead, it leaves CET-related checks and handling to the guest kernel. E.g., if an emulated JMP/CALL triggers a mismatch between data stack and shadow stack contents, #CP is generated in non-root mode instead of being injected by KVM. KVM only emulates basic x86 HW behaviors, e.g., call/jmp/ret/in/out etc.
>
> Right. In the case of CET, those basic behaviors (call/jmp/ret) now have host emulation behavior that doesn't match what guest execution would do.

I wouldn't say that KVM emulates "basic" x86. KVM emulates instructions that BIOS and kernels execute in Big Real Mode (and other "illegal" modes prior to Intel adding unrestricted guest), instructions that guests commonly use for MMIO, I/O, and page table modifications, and a few other tidbits that have cropped up over the years.

In other words, as Weijiang suspects below, KVM's emulator handles juuust enough stuff to squeak by and not barf on real-world guests. It is not, and has never been, anything remotely resembling a fully capable architectural emulator.

> > > My understanding is that the KVM emulator kind of does what it has to keep things running, and isn't expected to emulate every possible instruction. With CET though, it is changing the behavior of existing supported instructions. I could imagine a guest could skip over CET enforcement by causing an MMIO exit and racing to overwrite the exit-causing instruction from a different vcpu to be an indirect CALL/RET, etc.
> >
> > Can you elaborate on the case? I cannot figure out how it works.
>
> The point is that it should be possible for KVM to emulate call/ret with CET enabled. Not saying the specific case is critical, but the one I used as an example was that the KVM emulator can (or at least could, in the not-too-distant past) be forced to emulate arbitrary instructions if the guest overwrites the instruction between the exit and the SW fetch from the host.
>
> The steps are:
>
>     vcpu 1                            vcpu 2
>     ----------------------------------------------------------------
>     mov to mmio addr
>     vm exit ept_misconfig
>                                       overwrite mov instruction to call %rax
>     host emulator fetches
>     host emulates call instruction
>
> So then the guest call operation will skip the endbranch check. But I'm not sure that there aren't less exotic cases that would run across it. I see a bunch of cases where write-protected memory kicks to the emulator as well. I'm not sure of the exact scenarios and whether this could happen naturally in races during live migration, dirty tracking, etc.

It's for shadow paging. Instead of _immediately_ zapping SPTEs on any write to a shadowed guest PTE, KVM instead tries to emulate the faulting instruction (and then still zaps the SPTE). If KVM can't emulate the instruction for whatever reason, then KVM will _usually_ just zap the SPTE and resume the guest, i.e. retry the faulting instruction.

The reason KVM doesn't automatically/unconditionally zap and retry is that there are circumstances where the guest can't make forward progress, e.g. if an instruction is using a guest PTE that it is writing, if L2 is modifying L1 PTEs, and probably a few other edge cases I'm forgetting.

> Again, I'm more just asking about the exposure and the thinking on it.

If you care about exposure to the emulator from a guest security perspective, assume that a compromised guest can coerce KVM into attempting to emulate arbitrary bytes. As in the situation described above, it's not _that_ difficult to play games with TLBs and instruction vs. data caches.

If all you care about is not breaking misbehaving guests, I wouldn't worry too much about it.

> > > With reasonable assumptions around the threat model in use by the guest this is probably not a huge problem. And I guess also reasonable assumptions about functional expectations, as a mishandled CALL or RET by the emulator would corrupt the shadow stack.
> >
> > KVM emulates general x86 HW behaviors. If something goes wrong after emulation, then it could happen even on bare metal, i.e., guest SW most likely went wrong somewhere, and it's expected to trigger CET exceptions in the guest kernel.

No, the days of KVM making shit up are done. IIUC, you're advocating that it's ok for KVM to induce a #CP that architecturally should not happen. That is not acceptable, full stop.

Retrying the instruction in the guest, exiting to userspace, and even terminating the VM are all perfectly acceptable behaviors if KVM encounters something it can't *correctly* emulate. But clobbering the shadow stack or not detecting a CFI violation, even if the guest is misbehaving, is not ok.

> > > But, another thing to do could be to just return X86EMUL_UNHANDLEABLE or X86EMUL_RETRY_INSTR when CET is active and RET or CALL are emulated.
> >
> > IMHO, translating the CET-induced exceptions into X86EMUL_UNHANDLEABLE or X86EMUL_RETRY_INSTR would confuse the guest kernel or even the VMM. I prefer letting the guest kernel handle #CP directly.
>
> Doesn't X86EMUL_RETRY_INSTR kick it back to the guest, which is what you want? Today it will do the operations without the special CET behavior.
>
> But I do see how it could be tricky to avoid the guest getting stuck in a loop with X86EMUL_RETRY_INSTR. I guess the question is: if this situation is encountered, when KVM can't handle the emulation correctly, what should happen? I think usually it returns KVM_INTERNAL_ERROR_EMULATION to userspace? So I don't see why the CET case is different.
>
> If the scenario (call/ret emulation with CET enabled) doesn't happen, how can the guest be confused? If it does happen, won't it be an issue?
>
> > > And I guess also for all instructions if the TRACKER bit is set. It might tie up that loose end without too much trouble.
> > >
> > > Anyway, was there a conscious decision to just punt on CET enforcement in the emulator?
> >
> > I don't remember that we ever discussed it in the community, but since KVM maintainers have reviewed the CET virtualization series for a long time, I assume we're moving in the right direction :-)
>
> It seems like kind of a leap that, if it never came up, they must be approving of the specific detail. Don't know. Maybe they will chime in.

Yeah, I don't even know what the TRACKER bit does (I don't feel like reading the SDM right now), let alone if what KVM does or doesn't do in response is remotely correct.

For CALL/RET (and presumably any branch instructions with IBT?) and other instructions that are directly affected by CET, the simplest thing would probably be to disable those in KVM's emulator if shadow stacks and/or IBT are enabled, and let KVM's failure paths take it from there.

Then, *if* a use case comes along where the guest is utilizing CET and "needs" KVM to emulate affected instructions, we can add the necessary support to the emulator.

Alternatively, if teaching KVM's emulator to play nice with shadow stacks and IBT is easy-ish, just do that.
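Per the SDM, IBT keeps a small per-privilege-level state machine, and the TRACKER bit in the CET MSRs reflects its state. After an indirect CALL or JMP the tracker is in WAIT_FOR_ENDBRANCH, and the next instruction must be an ENDBRANCH or the CPU raises #CP. The toy model below shows why an emulator that fetches an instruction while TRACKER is set would need to perform the same check. All names are hypothetical, and the NOTRACK prefix, the legacy code-page bitmap and the SUPPRESS bit are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* The two tracker states (toy names mirroring the SDM's IDLE / WAIT_FOR_ENDBRANCH). */
enum ibt_tracker { IBT_IDLE, IBT_WAIT_FOR_ENDBRANCH };

enum toy_insn {
	TOY_ENDBR64,       /* the landing-pad instruction */
	TOY_INDIRECT_CALL, /* e.g. call *%rax */
	TOY_INDIRECT_JMP,  /* e.g. jmp *%rax */
	TOY_OTHER,         /* anything else, including direct branches */
};

enum toy_step { STEP_OK, STEP_CP_FAULT };

/* Execute one instruction against the tracker of the current privilege level. */
static enum toy_step ibt_step(enum ibt_tracker *tracker, enum toy_insn insn)
{
	assert(tracker != NULL);
	if (*tracker == IBT_WAIT_FOR_ENDBRANCH) {
		if (insn != TOY_ENDBR64)
			return STEP_CP_FAULT; /* branch target is not a landing pad */
		*tracker = IBT_IDLE;
		return STEP_OK;
	}
	if (insn == TOY_INDIRECT_CALL || insn == TOY_INDIRECT_JMP)
		*tracker = IBT_WAIT_FOR_ENDBRANCH;
	return STEP_OK;
}
```

An emulator that decodes and executes the instruction at an indirect-branch target without consulting the tracker behaves as if `ibt_step()` were never called, so the `STEP_CP_FAULT` case is lost.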
On Thu, 2024-01-04 at 16:22 -0800, Sean Christopherson wrote:
> No, the days of KVM making shit up are done. IIUC, you're advocating that it's ok for KVM to induce a #CP that architecturally should not happen. That is not acceptable, full stop.

Nope, not advocating that at all. I'm noticing that in this series KVM has special emulator behavior that doesn't match the HW when CET is enabled: it *skips* emitting #CPs (and other CET behaviors SW depends on), and I'm wondering if that is a problem.

I'm worried that there is some way attackers will induce the host to emulate an instruction and skip CET enforcement that the HW would normally do.

> Retrying the instruction in the guest, exiting to userspace, and even terminating the VM are all perfectly acceptable behaviors if KVM encounters something it can't *correctly* emulate. But clobbering the shadow stack or not detecting a CFI violation, even if the guest is misbehaving, is not ok.
>
[snip]
> Yeah, I don't even know what the TRACKER bit does (I don't feel like reading the SDM right now), let alone if what KVM does or doesn't do in response is remotely correct.
>
> For CALL/RET (and presumably any branch instructions with IBT?) and other instructions that are directly affected by CET, the simplest thing would probably be to disable those in KVM's emulator if shadow stacks and/or IBT are enabled, and let KVM's failure paths take it from there.

Right, that is what I was wondering might be the normal solution for situations like this.

> Then, *if* a use case comes along where the guest is utilizing CET and "needs" KVM to emulate affected instructions, we can add the necessary support to the emulator.
>
> Alternatively, if teaching KVM's emulator to play nice with shadow stacks and IBT is easy-ish, just do that.

I think it will not be very easy.
On Thu, Jan 4, 2024 at 4:34 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Thu, 2024-01-04 at 16:22 -0800, Sean Christopherson wrote:
> > No, the days of KVM making shit up are done. IIUC, you're advocating that it's ok for KVM to induce a #CP that architecturally should not happen. That is not acceptable, full stop.
>
> Nope, not advocating that at all. I'm noticing that in this series KVM has special emulator behavior that doesn't match the HW when CET is enabled: it *skips* emitting #CPs (and other CET behaviors SW depends on), and I'm wondering if that is a problem.
>
> I'm worried that there is some way attackers will induce the host to emulate an instruction and skip CET enforcement that the HW would normally do.
>
> > Retrying the instruction in the guest, exiting to userspace, and even terminating the VM are all perfectly acceptable behaviors if KVM encounters something it can't *correctly* emulate. But clobbering the shadow stack or not detecting a CFI violation, even if the guest is misbehaving, is not ok.
> >
> [snip]
> > Yeah, I don't even know what the TRACKER bit does (I don't feel like reading the SDM right now), let alone if what KVM does or doesn't do in response is remotely correct.
> >
> > For CALL/RET (and presumably any branch instructions with IBT?) and other instructions that are directly affected by CET, the simplest thing would probably be to disable those in KVM's emulator if shadow stacks and/or IBT are enabled, and let KVM's failure paths take it from there.
>
> Right, that is what I was wondering might be the normal solution for situations like this.

On AMD CPUs and on Intel CPUs with "unrestricted guest," I don't think there is any need to emulate an instruction that doesn't either (a) cause a VM-exit by opcode (e.g. CPUID) or (b) access memory. I think we should probably disable emulation of anything else, for both security and sanity.

> > Then, *if* a use case comes along where the guest is utilizing CET and "needs" KVM to emulate affected instructions, we can add the necessary support to the emulator.
> >
> > Alternatively, if teaching KVM's emulator to play nice with shadow stacks and IBT is easy-ish, just do that.
>
> I think it will not be very easy.
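Jim's rule of thumb can be written as a predicate. This is only a sketch. `struct insn_info` and `should_emulate()` are hypothetical, and KVM's decoder tracks these properties through its own opcode tables rather than a struct like this. The `unrestricted_guest` flag stands in for "AMD, or Intel with unrestricted guest".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction attributes (not KVM's decoder representation). */
struct insn_info {
	bool exits_by_opcode; /* e.g. CPUID, which unconditionally causes a VM-exit */
	bool accesses_memory; /* reads or writes guest memory, e.g. an MMIO store */
};

static bool should_emulate(const struct insn_info *insn, bool unrestricted_guest)
{
	assert(insn != NULL);
	if (!unrestricted_guest)
		return true; /* outside Jim's rule: keep the existing behavior */
	return insn->exits_by_opcode || insn->accesses_memory;
}
```

Anything for which `should_emulate()` returns false would then go straight to the normal emulation-failure path instead of being emulated.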
On Fri, Jan 05, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-01-04 at 16:22 -0800, Sean Christopherson wrote:
> > No, the days of KVM making shit up are done. IIUC, you're advocating that it's ok for KVM to induce a #CP that architecturally should not happen. That is not acceptable, full stop.
>
> Nope, not advocating that at all.

Heh, wrong "you". That "you" was directed at Weijiang, who I *think* is saying that clobbering the shadow stack by emulating CALL+RET and thus inducing a bogus #CP in the guest is ok.

> I'm noticing that in this series KVM has special emulator behavior that doesn't match the HW when CET is enabled: it *skips* emitting #CPs (and other CET behaviors SW depends on), and I'm wondering if that is a problem.

Yes, it's a problem. But IIUC, as is, KVM would also induce bogus #CPs (which is probably less of a problem in practice, but still not acceptable).

> I'm worried that there is some way attackers will induce the host to emulate an instruction and skip CET enforcement that the HW would normally do.

Yep. The best behavior for this is likely KVM's existing behavior, i.e. retry the instruction in the guest, and if that doesn't work, kick out to userspace and let userspace try to sort things out.

> > For CALL/RET (and presumably any branch instructions with IBT?) and other instructions that are directly affected by CET, the simplest thing would probably be to disable those in KVM's emulator if shadow stacks and/or IBT are enabled, and let KVM's failure paths take it from there.
>
> Right, that is what I was wondering might be the normal solution for situations like this.

If KVM can't emulate something, it either retries the instruction (with some decent logic to guard against infinite retries) or punts to userspace.

Or if the platform owner likes to play with fire and doesn't enable KVM_CAP_EXIT_ON_EMULATION_FAILURE, KVM will inject a #UD (and still exit to userspace if the emulation happened at CPL0). And yes, that #UD is 100% KVM making shit up, and yes, it has caused problems and confusion. :-)

> > Then, *if* a use case comes along where the guest is utilizing CET and "needs" KVM to emulate affected instructions, we can add the necessary support to the emulator.
> >
> > Alternatively, if teaching KVM's emulator to play nice with shadow stacks and IBT is easy-ish, just do that.
>
> I think it will not be very easy.

Yeah. As Jim alluded to, I think it's probably time to admit that emulating instructions for modern CPUs is a fool's errand and KVM should simply stop trying.
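The failure-path ordering described above can be sketched as follows. KVM first retries, with a guard against infinite retries. It then exits to userspace if `KVM_CAP_EXIT_ON_EMULATION_FAILURE` is enabled. Otherwise it injects #UD, and also exits if the failure happened at CPL0. The struct, enum and function names are hypothetical, and the single-RIP retry guard is a simplification of KVM's real bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes of an emulation failure. */
enum failure_action {
	ACT_RETRY_IN_GUEST,
	ACT_EXIT_TO_USERSPACE,
	ACT_INJECT_UD,
	ACT_INJECT_UD_AND_EXIT,
};

struct failure_ctx {
	bool can_retry;           /* e.g. zapping a shadow PTE may let the guest make progress */
	uint64_t last_retry_rip;  /* RIP of the last instruction that was retried */
	uint64_t rip;             /* RIP of the instruction that failed now */
	bool exit_on_failure_cap; /* userspace enabled KVM_CAP_EXIT_ON_EMULATION_FAILURE */
	int cpl;
};

static enum failure_action handle_emulation_failure(struct failure_ctx *ctx)
{
	assert(ctx != NULL);
	/* Guard against retrying the same instruction forever. */
	if (ctx->can_retry && ctx->rip != ctx->last_retry_rip) {
		ctx->last_retry_rip = ctx->rip;
		return ACT_RETRY_IN_GUEST;
	}
	if (ctx->exit_on_failure_cap)
		return ACT_EXIT_TO_USERSPACE;
	/* Legacy behavior: an architecturally unjustified #UD. */
	return ctx->cpl == 0 ? ACT_INJECT_UD_AND_EXIT : ACT_INJECT_UD;
}
```

In this model, a CET gate in the emulator only needs to report failure; where the vCPU ends up is decided entirely by this existing path.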
On 1/5/2024 5:10 AM, Edgecombe, Rick P wrote:
> On Thu, 2024-01-04 at 15:11 +0800, Yang, Weijiang wrote:
[...]
>>> My understanding is that the KVM emulator kind of does what it has to keep things running, and isn't expected to emulate every possible instruction. With CET though, it is changing the behavior of existing supported instructions. I could imagine a guest could skip over CET enforcement by causing an MMIO exit and racing to overwrite the exit-causing instruction from a different vcpu to be an indirect CALL/RET, etc.
>> Can you elaborate on the case? I cannot figure out how it works.
> The point is that it should be possible for KVM to emulate call/ret with CET enabled. Not saying the specific case is critical, but the one I used as an example was that the KVM emulator can (or at least could, in the not-too-distant past) be forced to emulate arbitrary instructions if the guest overwrites the instruction between the exit and the SW fetch from the host.
>
> The steps are:
>
>     vcpu 1                            vcpu 2
>     ----------------------------------------------------------------
>     mov to mmio addr
>     vm exit ept_misconfig
>                                       overwrite mov instruction to call %rax
>     host emulator fetches
>     host emulates call instruction
>
> So then the guest call operation will skip the endbranch check. But I'm not sure that there aren't less exotic cases that would run across it. I see a bunch of cases where write-protected memory kicks to the emulator as well. I'm not sure of the exact scenarios and whether this could happen naturally in races during live migration, dirty tracking, etc. Again, I'm more just asking about the exposure and the thinking on it.

Now I get your point. I didn't think of exposure from the guest and just thought of the normal execution flow in the guest, so I said to let the guest handle #CP directly.

Yes, I think we need to take these cases into account. As Sean suggested in the following replies, stopping emulation of JMP/CALL/RET etc. instructions when guest CET is enabled is effective and simple. I'll investigate the emulator code.

Thanks for raising the concerns!
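The race in the quoted steps is a time-of-check/time-of-use gap. The VM-exit is caused by one instruction, but the host emulator decodes whatever bytes are in guest memory when it performs its own fetch. The deterministic, single-threaded model below makes that concrete. All names are hypothetical, and the decoder recognizes only the two encodings used here: `89 03` is `mov %eax,(%rbx)` and `FF D0` is `call *%rax`. The TLB and cache games a real attack would need are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_decoded { DEC_MOV_STORE, DEC_INDIRECT_CALL, DEC_UNKNOWN };

/* Decode just enough x86 to tell the two instructions in the example apart. */
static enum toy_decoded toy_decode(const uint8_t *bytes)
{
	assert(bytes != NULL);
	uint8_t mod = bytes[1] >> 6;        /* ModRM.mod */
	uint8_t reg = (bytes[1] >> 3) & 7;  /* ModRM.reg (opcode extension for FF) */

	if (bytes[0] == 0x89 && mod != 3)
		return DEC_MOV_STORE;       /* 89 /r: MOV r/m32, r32 with a memory operand */
	if (bytes[0] == 0xFF && reg == 2 && mod == 3)
		return DEC_INDIRECT_CALL;   /* FF /2: CALL through a register */
	return DEC_UNKNOWN;
}

/* What the exit records in this model: where the instruction is, not its bytes. */
struct toy_exit {
	size_t rip;
};

/* vcpu 1: the MMIO store exits (EPT misconfiguration). */
static struct toy_exit vcpu1_mmio_exit(size_t rip)
{
	struct toy_exit e = { .rip = rip };
	return e;
}

/* vcpu 2: rewrite the faulting instruction as "call *%rax" before the host fetches it. */
static void vcpu2_overwrite(uint8_t *guest_mem, size_t rip)
{
	static const uint8_t call_rax[] = { 0xFF, 0xD0 };
	memcpy(guest_mem + rip, call_rax, sizeof(call_rax));
}

/* Host: fetch and decode from guest memory now, not as of the exit. */
static enum toy_decoded host_emulator_fetch(const uint8_t *guest_mem, struct toy_exit e)
{
	return toy_decode(guest_mem + e.rip);
}
```

If vcpu 2's write lands between the exit and the fetch, the emulator sees an indirect CALL even though the exit was caused by a store. Unless the emulator applies IBT and shadow-stack rules itself, that CALL runs without the ENDBRANCH check.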
On 1/5/2024 8:54 AM, Sean Christopherson wrote:
> On Fri, Jan 05, 2024, Rick P Edgecombe wrote:
>> On Thu, 2024-01-04 at 16:22 -0800, Sean Christopherson wrote:
>>> No, the days of KVM making shit up are done. IIUC, you're advocating that it's ok for KVM to induce a #CP that architecturally should not happen. That is not acceptable, full stop.
>> Nope, not advocating that at all.
> Heh, wrong "you". That "you" was directed at Weijiang, who I *think* is saying that clobbering the shadow stack by emulating CALL+RET and thus inducing a bogus #CP in the guest is ok.

My fault, I just thought of the normal execution instead of the subverting cases :-)

>
>> I'm noticing that in this series KVM has special emulator behavior that doesn't match the HW when CET is enabled: it *skips* emitting #CPs (and other CET behaviors SW depends on), and I'm wondering if that is a problem.
> Yes, it's a problem. But IIUC, as is, KVM would also induce bogus #CPs (which is probably less of a problem in practice, but still not acceptable).

I'd choose to stop emulating the CET-sensitive instructions while CET is enabled in the guest, as re-entering the guest after emulation would raise some kind of risk, but I don't know how to stop the emulation cleanly.

>> I'm worried that there is some way attackers will induce the host to emulate an instruction and skip CET enforcement that the HW would normally do.
> Yep. The best behavior for this is likely KVM's existing behavior, i.e. retry the instruction in the guest, and if that doesn't work, kick out to userspace and let userspace try to sort things out.
>
>>> For CALL/RET (and presumably any branch instructions with IBT?) and other instructions that are directly affected by CET, the simplest thing would probably be to disable those in KVM's emulator if shadow stacks and/or IBT are enabled, and let KVM's failure paths take it from there.
>> Right, that is what I was wondering might be the normal solution for situations like this.
> If KVM can't emulate something, it either retries the instruction (with some decent logic to guard against infinite retries) or punts to userspace.

What kind of error is proper if KVM has to punt to userspace? Or just inject #UD into the guest on detecting this case?

>
> Or if the platform owner likes to play with fire and doesn't enable KVM_CAP_EXIT_ON_EMULATION_FAILURE, KVM will inject a #UD (and still exit to userspace if the emulation happened at CPL0). And yes, that #UD is 100% KVM making shit up, and yes, it has caused problems and confusion. :-)
>
>>> Then, *if* a use case comes along where the guest is utilizing CET and "needs" KVM to emulate affected instructions, we can add the necessary support to the emulator.
>>>
>>> Alternatively, if teaching KVM's emulator to play nice with shadow stacks and IBT is easy-ish, just do that.
>> I think it will not be very easy.
> Yeah. As Jim alluded to, I think it's probably time to admit that emulating instructions for modern CPUs is a fool's errand and KVM should simply stop trying.
>
On Fri, Jan 05, 2024, Weijiang Yang wrote:
> On 1/5/2024 8:54 AM, Sean Christopherson wrote:
> > On Fri, Jan 05, 2024, Rick P Edgecombe wrote:
> > > > For CALL/RET (and presumably any branch instructions with IBT?) and other instructions that are directly affected by CET, the simplest thing would probably be to disable those in KVM's emulator if shadow stacks and/or IBT are enabled, and let KVM's failure paths take it from there.
> > > Right, that is what I was wondering might be the normal solution for situations like this.
> > If KVM can't emulate something, it either retries the instruction (with some decent logic to guard against infinite retries) or punts to userspace.
>
> What kind of error is proper if KVM has to punt to userspace?

KVM_INTERNAL_ERROR_EMULATION. See prepare_emulation_failure_exit().

> Or just inject #UD into the guest on detecting this case?

No, do not inject #UD or do anything else that deviates from architecturally defined behavior.
On Fri, 2024-01-05 at 08:21 -0800, Sean Christopherson wrote:
> No, do not inject #UD or do anything else that deviates from architecturally defined behavior.

Here is an (at least partial) list of CET touch points I just created by searching the SDM:
1. The emulator SW fetch with TRACKER=1
2. CALL, RET, JMP, IRET, INT, SYSCALL, SYSENTER, SYSEXIT, SYSRET
3. Task switching
4. The new CET instructions (which I guess should be handled by default): CLRSSBSY, INCSSPD, RSTORSSP, SAVEPREVSSP, SETSSBSY, WRSS, WRUSS

Not all of those are security checks, but they would have some functional implications. It's still not clear to me if this could happen naturally (the TDP shadowing stuff), or only via strange attacker behavior. If we only care about the attacker case, then we could have a smaller list.

It also sounds like the instructions in 2 could maybe be filtered by mode instead of caring about CET being enabled. But maybe it's not good to mix the CET problem with the bigger emulator issues. Don't know.
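The list above can be turned into a lookup table that an emulator-side deny-list might consult. This is illustrative only. Real code would key on decoder flags or opcodes rather than mnemonic strings, and the table contains exactly the instructions named in the (admittedly partial) list. Items 1 (fetch with TRACKER=1) and 3 (task switching) are not per-instruction, so they are left out. `cet_classify()` and the enum are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum cet_touch_point {
	CET_TP_NONE,             /* not on the list */
	CET_TP_CONTROL_TRANSFER, /* item 2 of the list */
	CET_TP_CET_INSN,         /* item 4 of the list */
};

struct cet_entry {
	const char *mnemonic;
	enum cet_touch_point kind;
};

static const struct cet_entry cet_table[] = {
	{ "CALL", CET_TP_CONTROL_TRANSFER },
	{ "RET", CET_TP_CONTROL_TRANSFER },
	{ "JMP", CET_TP_CONTROL_TRANSFER },
	{ "IRET", CET_TP_CONTROL_TRANSFER },
	{ "INT", CET_TP_CONTROL_TRANSFER },
	{ "SYSCALL", CET_TP_CONTROL_TRANSFER },
	{ "SYSENTER", CET_TP_CONTROL_TRANSFER },
	{ "SYSEXIT", CET_TP_CONTROL_TRANSFER },
	{ "SYSRET", CET_TP_CONTROL_TRANSFER },
	{ "CLRSSBSY", CET_TP_CET_INSN },
	{ "INCSSPD", CET_TP_CET_INSN },
	{ "RSTORSSP", CET_TP_CET_INSN },
	{ "SAVEPREVSSP", CET_TP_CET_INSN },
	{ "SETSSBSY", CET_TP_CET_INSN },
	{ "WRSS", CET_TP_CET_INSN },
	{ "WRUSS", CET_TP_CET_INSN },
};

/* Linear search is fine for a table this small. */
static enum cet_touch_point cet_classify(const char *mnemonic)
{
	assert(mnemonic != NULL);
	for (size_t i = 0; i < sizeof(cet_table) / sizeof(cet_table[0]); i++) {
		if (strcmp(cet_table[i].mnemonic, mnemonic) == 0)
			return cet_table[i].kind;
	}
	return CET_TP_NONE;
}
```

Keeping the two groups separate leaves room for the idea above of filtering item 2 by mode while handling item 4 unconditionally.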
On Fri, Jan 5, 2024 at 9:53 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Fri, 2024-01-05 at 08:21 -0800, Sean Christopherson wrote:
> > No, do not inject #UD or do anything else that deviates from architecturally defined behavior.
>
> Here is an (at least partial) list of CET touch points I just created by searching the SDM:
> 1. The emulator SW fetch with TRACKER=1
> 2. CALL, RET, JMP, IRET, INT, SYSCALL, SYSENTER, SYSEXIT, SYSRET
> 3. Task switching

Sigh. KVM is forced to emulate task switch, because the hardware is incapable of virtualizing it. How hard would it be to make KVM's task-switch emulation CET-aware?

> 4. The new CET instructions (which I guess should be handled by default): CLRSSBSY, INCSSPD, RSTORSSP, SAVEPREVSSP, SETSSBSY, WRSS, WRUSS
>
> Not all of those are security checks, but they would have some functional implications. It's still not clear to me if this could happen naturally (the TDP shadowing stuff), or only via strange attacker behavior. If we only care about the attacker case, then we could have a smaller list.
>
> It also sounds like the instructions in 2 could maybe be filtered by mode instead of caring about CET being enabled. But maybe it's not good to mix the CET problem with the bigger emulator issues. Don't know.
On Fri, 2024-01-05 at 10:09 -0800, Jim Mattson wrote:
> > 3. Task switching
>
> Sigh. KVM is forced to emulate task switch, because the hardware is
> incapable of virtualizing it. How hard would it be to make KVM's
> task-switch emulation CET-aware?

(I am not too familiar with this part of the arch.)

See SDM Vol. 3A, section 7.3, steps 8 and 15. The behavior concerns actual
task switching. At first glance it looks annoying, at least: it would need
to do a CMPXCHG to guest memory at some points and take care not to
implement the "Complex Shadow-Stack Updates" behavior.

But would anyone use it? I'm not aware of any 32-bit supervisor shadow
stack support out there, so maybe it is OK to just punt to userspace in
this case?
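To make the "CMPXCHG to guest memory" point concrete, here is a sketch of the kind of atomic busy-bit claim a CET-aware emulator would have to perform. It illustrates the pattern only and is not the SDM's task-switch algorithm: it assumes a simplified shadow-stack token in which bit 0 is the busy flag and the remaining bits must equal the token's own 8-byte-aligned address, guest memory is modeled as a plain C11 atomic, and the helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SS_TOKEN_BUSY 0x1ULL	/* assumed: bit 0 marks the token as in use */

/*
 * Hypothetical helper: atomically mark the token at 'addr' busy.
 * The exchange succeeds only if the token currently holds its own
 * address with the busy bit clear; otherwise nothing is written and
 * false is returned (where real hardware would fault).
 */
static bool claim_ss_token(_Atomic uint64_t *token, uint64_t addr)
{
	uint64_t expected = addr;		/* free token */
	uint64_t desired = addr | SS_TOKEN_BUSY;

	if (addr & 0x7)				/* tokens are 8-byte aligned */
		return false;
	return atomic_compare_exchange_strong(token, &expected, desired);
}

/* Hypothetical inverse: clear the busy bit of a token we own. */
static bool release_ss_token(_Atomic uint64_t *token, uint64_t addr)
{
	uint64_t expected = addr | SS_TOKEN_BUSY;

	return atomic_compare_exchange_strong(token, &expected, addr);
}
```

In an emulator the compare-exchange would have to target real guest memory, and a failed exchange would have to be turned into the architecturally defined fault, which is part of why punting to userspace is attractive for a rarely used path.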
On Fri, Jan 05, 2024, Rick P Edgecombe wrote:
> On Fri, 2024-01-05 at 10:09 -0800, Jim Mattson wrote:
> > > 3. Task switching
> >
> > Sigh. KVM is forced to emulate task switch, because the hardware is
> > incapable of virtualizing it. How hard would it be to make KVM's
> > task-switch emulation CET-aware?
>
> (I am not too familiar with this part of the arch.)
>
> See SDM Vol. 3A, section 7.3, steps 8 and 15. The behavior concerns actual
> task switching. At first glance it looks annoying, at least: it would need
> to do a CMPXCHG to guest memory at some points and take care not to
> implement the "Complex Shadow-Stack Updates" behavior.
>
> But would anyone use it? I'm not aware of any 32-bit supervisor shadow
> stack support out there, so maybe it is OK to just punt to userspace in
> this case?

Yeah, I think KVM can punt.
On 1/6/2024 12:21 AM, Sean Christopherson wrote:
> On Fri, Jan 05, 2024, Weijiang Yang wrote:
>> On 1/5/2024 8:54 AM, Sean Christopherson wrote:
>>> On Fri, Jan 05, 2024, Rick P Edgecombe wrote:
>>>>> For CALL/RET (and presumably any branch instructions with IBT?) other
>>>>> instructions that are directly affected by CET, the simplest thing would
>>>>> probably be to disable those in KVM's emulator if shadow stacks and/or IBT
>>>>> are enabled, and let KVM's failure paths take it from there.
>>>> Right, that is what I was wondering might be the normal solution for
>>>> situations like this.
>>> If KVM can't emulate something, it either retries the instruction (with some
>>> decent logic to guard against infinite retries) or punts to userspace.
>> What kind of error is proper if KVM has to punt to userspace?
> KVM_INTERNAL_ERROR_EMULATION. See prepare_emulation_failure_exit().
>
>> Or just inject #UD into guest on detecting this case?
> No, do not inject #UD or do anything else that deviates from architecturally
> defined behavior.

Thanks!
But based on the current KVM implementation and patch 24, it seems that if CET
is exposed to the guest, the emulation code or shadow paging mode can't be
activated at the same time:

In vmx.c, hardware_setup(void):

	if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
		enable_unrestricted_guest = 0;

in vmx_set_cr0():

	[...]
	if (enable_unrestricted_guest)
		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST;
	else {
		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON;
		if (!enable_ept)
			hw_cr0 |= X86_CR0_WP;

		if (vmx->rmode.vm86_active && (cr0 & X86_CR0_PE))
			enter_pmode(vcpu);

		if (!vmx->rmode.vm86_active && !(cr0 & X86_CR0_PE))
			enter_rmode(vcpu);
	}
	[...]

And in patch 24:

+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+	    !cpu_has_vmx_basic_no_hw_errcode()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}

Not sure if I missed anything.
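The chain of conditions quoted above can be condensed into a small predicate. This is a model, not KVM code: the struct and function names are hypothetical, and the booleans stand in for the helpers and module parameter in the quoted snippets (cpu_has_load_cet_ctrl(), enable_unrestricted_guest, cpu_has_vmx_basic_no_hw_errcode()). It shows why exposing CET implies unrestricted guest, and therefore EPT, which rules out the vm86/real-mode and shadow-paging paths. It says nothing about whether KVM may still emulate individual instructions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the host capabilities checked in patch 24. */
struct cet_host_caps {
	bool load_cet_ctrl;		/* cpu_has_load_cet_ctrl() */
	bool unrestricted_guest;	/* enable_unrestricted_guest */
	bool basic_no_hw_errcode;	/* cpu_has_vmx_basic_no_hw_errcode() */
};

/* Mirrors hardware_setup(): unrestricted guest is forced off without EPT. */
static bool unrestricted_guest_enabled(bool cpu_has_ug, bool enable_ept)
{
	return cpu_has_ug && enable_ept;
}

/* SHSTK and IBT stay advertised only if every prerequisite holds. */
static bool cet_exposable(const struct cet_host_caps *c)
{
	return c->load_cet_ctrl && c->unrestricted_guest &&
	       c->basic_no_hw_errcode;
}
```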
On Mon, Jan 08, 2024, Weijiang Yang wrote:
> On 1/6/2024 12:21 AM, Sean Christopherson wrote:
> [...]
> Thanks!
> But based on the current KVM implementation and patch 24, it seems that if CET
> is exposed to the guest, the emulation code or shadow paging mode can't be
> activated at the same time:

No, requiring unrestricted guest only disables the paths where KVM *deliberately*
emulates the entire guest code stream. In no way, shape, or form does it prevent
KVM from attempting to emulate arbitrary instructions.

> In vmx.c,
> [...]
On 1/9/2024 11:10 PM, Sean Christopherson wrote:
> On Mon, Jan 08, 2024, Weijiang Yang wrote:
>> [...]
>> But based on the current KVM implementation and patch 24, it seems that if CET
>> is exposed to the guest, the emulation code or shadow paging mode can't be
>> activated at the same time:
> No, requiring unrestricted guest only disables the paths where KVM *deliberately*
> emulates the entire guest code stream. In no way, shape, or form does it prevent
> KVM from attempting to emulate arbitrary instructions.

Yes, we also need to prevent sporadic emulation. How about adding the patch
below to the emulator?
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e223043ef5b2..e817d8560ceb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,7 @@
 #define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
 #define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
 #define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
+#define IsProtected ((u64)1 << 57) /* Instruction is protected by CET. */
 
 #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -4098,9 +4099,9 @@ static const struct opcode group4[] = {
 static const struct opcode group5[] = {
 	F(DstMem | SrcNone | Lock, em_inc),
 	F(DstMem | SrcNone | Lock, em_dec),
-	I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
-	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
-	I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
+	I(SrcMem | NearBranch | IsBranch | IsProtected, em_call_near_abs),
+	I(SrcMemFAddr | ImplicitOps | IsBranch | IsProtected, em_call_far),
+	I(SrcMem | NearBranch | IsBranch | IsProtected, em_jmp_abs),
 	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
 	I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),
 };
@@ -4362,11 +4363,11 @@ static const struct opcode opcode_table[256] = {
 	/* 0xC8 - 0xCF */
 	I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
 	I(Stack | IsBranch, em_leave),
-	I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
-	I(ImplicitOps | IsBranch, em_ret_far),
-	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+	I(ImplicitOps | SrcImmU16 | IsBranch | IsProtected, em_ret_far_imm),
+	I(ImplicitOps | IsBranch | IsProtected, em_ret_far),
+	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | IsProtected, intn),
 	D(ImplicitOps | No64 | IsBranch),
-	II(ImplicitOps | IsBranch, em_iret, iret),
+	II(ImplicitOps | IsBranch | IsProtected, em_iret, iret),
 	/* 0xD0 - 0xD7 */
 	G(Src2One | ByteOp, group2), G(Src2One, group2),
 	G(Src2CL | ByteOp, group2), G(Src2CL, group2),
@@ -4382,7 +4383,7 @@ static const struct opcode opcode_table[256] = {
 	I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in),
 	I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
 	/* 0xE8 - 0xEF */
-	I(SrcImm | NearBranch | IsBranch, em_call),
+	I(SrcImm | NearBranch | IsBranch | IsProtected, em_call),
 	D(SrcImm | ImplicitOps | NearBranch | IsBranch),
 	I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
 	D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
@@ -4401,7 +4402,7 @@ static const struct opcode opcode_table[256] = {
 static const struct opcode twobyte_table[256] = {
 	/* 0x00 - 0x0F */
 	G(0, group6), GD(0, &group7), N, N,
-	N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
+	N, I(ImplicitOps | EmulateOnUD | IsBranch | IsProtected, em_syscall),
 	II(ImplicitOps | Priv, em_clts, clts), N,
 	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
 	N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
@@ -4432,8 +4433,8 @@ static const struct opcode twobyte_table[256] = {
 	IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
 	II(ImplicitOps | Priv, em_rdmsr, rdmsr),
 	IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
-	I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
-	I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
+	I(ImplicitOps | EmulateOnUD | IsBranch | IsProtected, em_sysenter),
+	I(ImplicitOps | Priv | EmulateOnUD | IsBranch | IsProtected, em_sysexit),
 	N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x40 - 0x4F */
@@ -4971,6 +4972,12 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
 	if (ctxt->d == 0)
 		return EMULATION_FAILED;
 
+	if ((opcode.flags & IsProtected) &&
+	    (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET)) {
+		WARN_ONCE(1, "CET is active, emulation aborted.\n");
+		return EMULATION_FAILED;
+	}
+
 	ctxt->execute = opcode.u.execute;
 
 	if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
On Thu, Jan 11, 2024 at 10:56:55PM +0800, Yang, Weijiang wrote:
>On 1/9/2024 11:10 PM, Sean Christopherson wrote:
>> [...]
>> No, requiring unrestricted guest only disables the paths where KVM *deliberately*
>> emulates the entire guest code stream. In no way, shape, or form does it prevent
>> KVM from attempting to emulate arbitrary instructions.
>
>Yes, we also need to prevent sporadic emulation. How about adding the patch
>below to the emulator?
>
>diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>index e223043ef5b2..e817d8560ceb 100644
>--- a/arch/x86/kvm/emulate.c
>+++ b/arch/x86/kvm/emulate.c
>[...]
>@@ -4971,6 +4972,12 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
> 	if (ctxt->d == 0)
> 		return EMULATION_FAILED;
>
>+	if ((opcode.flags & IsProtected) &&
>+	    (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET)) {

CR4.CET doesn't necessarily mean IBT or shadow stack is enabled. Why not check
CPL and IA32_S/U_CET?

>+		WARN_ONCE(1, "CET is active, emulation aborted.\n");

Remove this WARN_ONCE(). The guest can trigger it at will and overflow the host
dmesg.
If you really want to tell userspace that the emulation failure is due to CET,
maybe you can add a new flag along the lines of the existing
KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES. For now, I wouldn't bother
adding one, because userspace probably just terminates the VM on any instruction
emulation failure (i.e., it won't try to figure out the reason for the failure
and fix it).
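Chao's suggestion can be sketched as follows: treat CR4.CET as the master switch, then consult IA32_U_CET at CPL 3 or IA32_S_CET otherwise, and refuse to emulate only if the feature the instruction depends on is enabled at that level. The bit positions (CR4.CET = bit 23, SH_STK_EN = bit 0, ENDBR_EN = bit 2 of IA32_{U,S}_CET) follow the SDM; the per-instruction flags, the state struct and the function name are hypothetical. Privilege-changing instructions such as SYSCALL, SYSRET or IRET also touch the other level's CET state, so a more conservative variant could OR the two MSRs instead of selecting one by CPL.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_CET	(1ULL << 23)	/* CR4.CET: CET master enable */
#define CET_SHSTK_EN	(1ULL << 0)	/* IA32_{U,S}_CET.SH_STK_EN */
#define CET_ENDBR_EN	(1ULL << 2)	/* IA32_{U,S}_CET.ENDBR_EN */

/* Hypothetical per-instruction flags, splitting IsProtected by feature. */
#define INSN_USES_SHSTK	(1u << 0)
#define INSN_USES_IBT	(1u << 1)

/* Hypothetical snapshot of the guest state the check needs. */
struct guest_cet_state {
	uint64_t cr4;
	unsigned int cpl;
	uint64_t u_cet;		/* IA32_U_CET */
	uint64_t s_cet;		/* IA32_S_CET */
};

/* Return true if emulating an instruction with 'insn_flags' must fail. */
static bool cet_blocks_emulation(const struct guest_cet_state *g,
				 unsigned int insn_flags)
{
	uint64_t cet;

	if (!(g->cr4 & X86_CR4_CET))
		return false;		/* no CET feature can be active */

	cet = (g->cpl == 3) ? g->u_cet : g->s_cet;

	if ((insn_flags & INSN_USES_SHSTK) && (cet & CET_SHSTK_EN))
		return true;
	if ((insn_flags & INSN_USES_IBT) && (cet & CET_ENDBR_EN))
		return true;
	return false;
}
```

With CR4.CET set but both MSRs zero, this predicate still allows emulation, whereas the CR4.CET-only check in the patch would abort it; that is the case the reply below argues about.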
On 1/15/2024 9:55 AM, Chao Gao wrote:
> On Thu, Jan 11, 2024 at 10:56:55PM +0800, Yang, Weijiang wrote:
>> On 1/9/2024 11:10 PM, Sean Christopherson wrote:
>>> [...]
>> Yes, we also need to prevent sporadic emulation. How about adding the patch
>> below to the emulator?
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index e223043ef5b2..e817d8560ceb 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> [...]
>> @@ -4971,6 +4972,12 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
>>  	if (ctxt->d == 0)
>>  		return EMULATION_FAILED;
>>
>> +	if ((opcode.flags & IsProtected) &&
>> +	    (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET)) {
> CR4.CET doesn't necessarily mean IBT or shadow stack is enabled. Why not check
> CPL and IA32_S/U_CET?
CR4.CET is the master control bit for CET features; a sane guest should set the
bit iff it wants to activate CET features. By contrast, the IBT/SHSTK bits in
IA32_S/U_CET only mean a feature is enabled, but it may not be active at the
moment the emulator is running, so there is no need to stop emulation in that
case.

>
>> +		WARN_ONCE(1, "CET is active, emulation aborted.\n");
> Remove this WARN_ONCE(). The guest can trigger it at will and overflow the host
> dmesg.

OK, the purpose was to give an informative message when the guest hits the
prohibited cases. I can remove it. Thanks!

>
> If you really want to tell userspace that the emulation failure is due to CET,
> maybe you can add a new flag along the lines of the existing
> KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES. For now, I wouldn't bother
> adding one, because userspace probably just terminates the VM on any instruction
> emulation failure (i.e., it won't try to figure out the reason for the failure
> and fix it).

Agreed, we don't need another flag to indicate that the failure is due to CET
being on.