Message ID | 20220307205313.31247-1-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/kexec: Fix kexec-reboot with CET active | expand |
On 07/03/2022 20:53, Andrew Cooper wrote: > The kexec_reloc() asm has an indirect jump to relocate onto the identity > trampoline. While we clear CET in machine_crash_shutdown(), we fail to clear > CET for the non-crash path. This in turn highlights that the same is true of > resetting the CPUID masking/faulting. > > Move both pieces of logic from machine_crash_shutdown() to machine_kexec(), > the latter being common for all kexec transitions. Adjust the condition for > CET being considered active to check in CR4, which is simpler and more robust. Reviewed-by: David Vrabel <dvrabel@amazon.co.uk> > Fixes: 311434bfc9d1 ("x86/setup: Rework MSR_S_CET handling for CET-IBT") > Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks") > Fixes: 5ab9564c6fa1 ("x86/cpu: Context switch cpuid masks and faulting state in context_switch()") > Reported-by: David Vrabel (XXX which alias to use?) Amazon, please. David
On 07.03.2022 21:53, Andrew Cooper wrote: > --- a/xen/arch/x86/machine_kexec.c > +++ b/xen/arch/x86/machine_kexec.c > @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image) > */ > local_irq_disable(); > > + /* Reset CPUID masking and faulting to the host's default. */ > + ctxt_switch_levelling(NULL); > + > + /* Disable CET. */ > + if ( read_cr4() & X86_CR4_CET ) > + { > + wrmsrl(MSR_S_CET, 0); > + write_cr4(read_cr4() & ~X86_CR4_CET); > + } > + > /* Now regular interrupts are disabled, we need to reduce the impact > * of interrupts not disabled by 'cli'. > * Besides introducing somewhat of a disconnect between the comment in context here and the earlier local_irq_disable(), is it really necessary to do both actions with IRQs off? Jan
On 08/03/2022 08:15, Jan Beulich wrote: > On 07.03.2022 21:53, Andrew Cooper wrote: >> --- a/xen/arch/x86/machine_kexec.c >> +++ b/xen/arch/x86/machine_kexec.c >> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image) >> */ >> local_irq_disable(); >> >> + /* Reset CPUID masking and faulting to the host's default. */ >> + ctxt_switch_levelling(NULL); >> + >> + /* Disable CET. */ >> + if ( read_cr4() & X86_CR4_CET ) >> + { >> + wrmsrl(MSR_S_CET, 0); >> + write_cr4(read_cr4() & ~X86_CR4_CET); >> + } >> + >> /* Now regular interrupts are disabled, we need to reduce the impact >> * of interrupts not disabled by 'cli'. >> * > Besides introducing somewhat of a disconnect between the comment in > context here and the earlier local_irq_disable(), is it really > necessary to do both actions with IRQs off? We are a handful of instructions away from discarding Xen's context entirely. IRQs are not a relevant concern. If we're nitpicking, irqs want to be off before kexecing gets set, because absolutely nothing good can come of handling interrupts later than that point. ~Andrew
On 08.03.2022 17:22, Andrew Cooper wrote: > On 08/03/2022 08:15, Jan Beulich wrote: >> On 07.03.2022 21:53, Andrew Cooper wrote: >>> --- a/xen/arch/x86/machine_kexec.c >>> +++ b/xen/arch/x86/machine_kexec.c >>> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image) >>> */ >>> local_irq_disable(); >>> >>> + /* Reset CPUID masking and faulting to the host's default. */ >>> + ctxt_switch_levelling(NULL); >>> + >>> + /* Disable CET. */ >>> + if ( read_cr4() & X86_CR4_CET ) >>> + { >>> + wrmsrl(MSR_S_CET, 0); >>> + write_cr4(read_cr4() & ~X86_CR4_CET); >>> + } >>> + >>> /* Now regular interrupts are disabled, we need to reduce the impact >>> * of interrupts not disabled by 'cli'. >>> * >> Besides introducing somewhat of a disconnect between the comment in >> context here and the earlier local_irq_disable(), is it really >> necessary to do both actions with IRQs off? > > We are a handful of instructions away from discarding Xen's context > entirely. IRQs are not a relevant concern. Well, as said - the comment was what caught my eye. But as you appear to think that slight disconnect is not an issue: I don't mean my remark to be an objection. Feel free to commit with David's R-b. Jan
On 07.03.2022 21:53, Andrew Cooper wrote: > --- a/xen/arch/x86/machine_kexec.c > +++ b/xen/arch/x86/machine_kexec.c > @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image) > */ > local_irq_disable(); > > + /* Reset CPUID masking and faulting to the host's default. */ > + ctxt_switch_levelling(NULL); > + > + /* Disable CET. */ > + if ( read_cr4() & X86_CR4_CET ) > + { > + wrmsrl(MSR_S_CET, 0); > + write_cr4(read_cr4() & ~X86_CR4_CET); > + } It just occurred to me: Isn't using read_cr4() here somewhat risky? If we crashed on the CR4 write in write_cr4(), we'd try to load CR4 here again with a value which may cause another fault. Jan
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c index 003222c0f1ac..99089f77a7eb 100644 --- a/xen/arch/x86/crash.c +++ b/xen/arch/x86/crash.c @@ -187,16 +187,6 @@ void machine_crash_shutdown(void) nmi_shootdown_cpus(); - /* Reset CPUID masking and faulting to the host's default. */ - ctxt_switch_levelling(NULL); - - /* Disable CET. */ - if ( cpu_has_xen_shstk || cpu_has_xen_ibt ) - { - wrmsrl(MSR_S_CET, 0); - write_cr4(read_cr4() & ~X86_CR4_CET); - } - info = kexec_crash_save_info(); info->xen_phys_start = xen_phys_start; info->dom0_pfn_to_mfn_frame_list_list = diff --git a/xen/arch/x86/machine_kexec.c b/xen/arch/x86/machine_kexec.c index 751a9efcaf6a..d83aa4e7e93b 100644 --- a/xen/arch/x86/machine_kexec.c +++ b/xen/arch/x86/machine_kexec.c @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image) */ local_irq_disable(); + /* Reset CPUID masking and faulting to the host's default. */ + ctxt_switch_levelling(NULL); + + /* Disable CET. */ + if ( read_cr4() & X86_CR4_CET ) + { + wrmsrl(MSR_S_CET, 0); + write_cr4(read_cr4() & ~X86_CR4_CET); + } + /* Now regular interrupts are disabled, we need to reduce the impact * of interrupts not disabled by 'cli'. *
The kexec_reloc() asm has an indirect jump to relocate onto the identity trampoline. While we clear CET in machine_crash_shutdown(), we fail to clear CET for the non-crash path. This in turn highlights that the same is true of resetting the CPUID masking/faulting. Move both pieces of logic from machine_crash_shutdown() to machine_kexec(), the latter being common for all kexec transitions. Adjust the condition for CET being considered active to check in CR4, which is simpler and more robust. Fixes: 311434bfc9d1 ("x86/setup: Rework MSR_S_CET handling for CET-IBT") Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks") Fixes: 5ab9564c6fa1 ("x86/cpu: Context switch cpuid masks and faulting state in context_switch()") Reported-by: David Vrabel (XXX which alias to use?) Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: David Vrabel <dvrabel@cantab.net> --- xen/arch/x86/crash.c | 10 ---------- xen/arch/x86/machine_kexec.c | 10 ++++++++++ 2 files changed, 10 insertions(+), 10 deletions(-)