Message ID | 20230914063325.85503-4-weijiang.yang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Enable CET Virtualization |
On Thu, 2023-09-14 at 02:33 -0400, Yang Weijiang wrote:
> Add supervisor mode state support within FPU xstate management
> framework.
> Although supervisor shadow stack is not enabled/used today in
> kernel,KVM
        ^ Nit: needs a space
> requires the support because when KVM advertises shadow stack feature
> to
> guest, architechturally it claims the support for both user and
         ^ Spelling: "architecturally"
> supervisor
> modes for Linux and non-Linux guest OSes.
>
> With the xstate support, guest supervisor mode shadow stack state can
> be
> properly saved/restored when 1) guest/host FPU context is swapped
> 2) vCPU
> thread is sched out/in.

(2) is a little bit confusing, because the lazy FPU stuff won't always
save/restore while scheduling. But trying to explain the details in
this commit log is probably unnecessary. Maybe something like?

   2) At the proper times while other tasks are scheduled

I think also a key part of this is that XFEATURE_CET_KERNEL is not
*all* of the "guest supervisor mode shadow stack state", at least with
respect to the MSRs. It might be worth calling that out a little more
loudly.

>
> The alternative is to enable it in KVM domain, but KVM maintainers
> NAKed
> the solution. The external discussion can be found at [*], it ended
> up
> with adding the support in kernel instead of KVM domain.
>
> Note, in KVM case, guest CET supervisor state i.e.,
> IA32_PL{0,1,2}_MSRs,
> are preserved after VM-Exit until host/guest fpstates are swapped,
> but
> since host supervisor shadow stack is disabled, the preserved MSRs
> won't
> hurt host.

It might beg the question of if this solution will need to be redone by
some future Linux supervisor shadow stack effort. I *think* the answer
is no.

Most of the xsave managed features are restored before returning to
userspace because they would have userspace effect. But
XFEATURE_CET_KERNEL is different. It only effects the kernel. But the
IA32_PL{0,1,2}_MSRs are used when transitioning to those rings. So for
Linux they would get used when transitioning back from userspace. In
order for it to be used when control transfers back *from* userspace,
it needs to be restored before returning *to* userspace. So despite
being needed only for the kernel, and having no effect on userspace, it
might need to be swapped/restored at the same time as the rest of the
FPU state that only affects userspace.

Probably supervisor shadow stack for Linux needs much more analysis,
but trying to leave some breadcrumbs on the thinking from internal
reviews. I don't know if it might be good to include some of this
reasoning in the commit log. It's a bit hand wavy.

>
> [*]:
> https://lore.kernel.org/all/806e26c2-8d21-9cc9-a0b7-7787dd231729@intel.com/
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

Otherwise, the code looked good to me.
On 9/15/2023 8:06 AM, Edgecombe, Rick P wrote:
> On Thu, 2023-09-14 at 02:33 -0400, Yang Weijiang wrote:
>> Add supervisor mode state support within FPU xstate management
>> framework.
>> Although supervisor shadow stack is not enabled/used today in
>> kernel,KVM
>         ^ Nit: needs a space
>> requires the support because when KVM advertises shadow stack feature
>> to
>> guest, architechturally it claims the support for both user and
>          ^ Spelling: "architecturally"

Thank you!!

>> supervisor
>> modes for Linux and non-Linux guest OSes.
>>
>> With the xstate support, guest supervisor mode shadow stack state can
>> be
>> properly saved/restored when 1) guest/host FPU context is swapped
>> 2) vCPU
>> thread is sched out/in.
> (2) is a little bit confusing, because the lazy FPU stuff won't always
> save/restore while scheduling.

It's true for normal thread, but for vCPU thread, it's a bit different, on the path to
vm-entry, after host/guest fpu states swapped, preemption is not disabled and
vCPU thread could be sched out/in, in this case, guest FPU states will be saved/
restored because TIF_NEED_FPU_LOAD is always cleared after swap.

> But trying to explain the details in
> this commit log is probably unnecessary. Maybe something like?
>
>    2) At the proper times while other tasks are scheduled

I just want to justify that enabling of supervisor xstate is necessary for guest.
Maybe I need to reword a bit :-)

> I think also a key part of this is that XFEATURE_CET_KERNEL is not
> *all* of the "guest supervisor mode shadow stack state", at least with
> respect to the MSRs. It might be worth calling that out a little more
> loudly.

OK, I will call it out that supervisor mode shadow stack state also includes IA32_S_CET msr.

>> The alternative is to enable it in KVM domain, but KVM maintainers
>> NAKed
>> the solution. The external discussion can be found at [*], it ended
>> up
>> with adding the support in kernel instead of KVM domain.
>>
>> Note, in KVM case, guest CET supervisor state i.e.,
>> IA32_PL{0,1,2}_MSRs,
>> are preserved after VM-Exit until host/guest fpstates are swapped,
>> but
>> since host supervisor shadow stack is disabled, the preserved MSRs
>> won't
>> hurt host.
> It might beg the question of if this solution will need to be redone by
> some future Linux supervisor shadow stack effort. I *think* the answer
> is no.

AFAICT KVM needs to be modified if host shadow stack is implemented, at least
guest/host CET supervisor MSRs should be swapped at the earliest time after
vm-exit so that host won't misbehavior on *guest* MSR contents.

> Most of the xsave managed features are restored before returning to
> userspace because they would have userspace effect. But
> XFEATURE_CET_KERNEL is different. It only effects the kernel. But the
> IA32_PL{0,1,2}_MSRs are used when transitioning to those rings. So for
> Linux they would get used when transitioning back from userspace. In
> order for it to be used when control transfers back *from* userspace,
> it needs to be restored before returning *to* userspace. So despite
> being needed only for the kernel, and having no effect on userspace, it
> might need to be swapped/restored at the same time as the rest of the
> FPU state that only affects userspace.

You're right, for enabling of supervisor mode shadow stack, we need to take
it carefully whenever ring/stack is switching. But we still have time to figure out
the points.

Thanks a lot for bring up such kind of thinking!

> Probably supervisor shadow stack for Linux needs much more analysis,
> but trying to leave some breadcrumbs on the thinking from internal
> reviews. I don't know if it might be good to include some of this
> reasoning in the commit log. It's a bit hand wavy.

IMO, we have put much assumption on the fact that CET supervisor shadow stack is not
enabled in kernel and this patch itself is straightforward and simple, it's just a small
brick for enabling supervisor shadow stack, we would revisit whether something is an
issue based on how SSS is implemented in kernel. So let's not add such kind of reasoning :-)

Thank you for the enlightenment!

>> [*]:
>> https://lore.kernel.org/all/806e26c2-8d21-9cc9-a0b7-7787dd231729@intel.com/
>>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Otherwise, the code looked good to me.
On Fri, 2023-09-15 at 14:30 +0800, Yang, Weijiang wrote:
> On 9/15/2023 8:06 AM, Edgecombe, Rick P wrote:
> > On Thu, 2023-09-14 at 02:33 -0400, Yang Weijiang wrote:
> > > Add supervisor mode state support within FPU xstate management
> > > framework.
> > > Although supervisor shadow stack is not enabled/used today in
> > > kernel,KVM
> >           ^ Nit: needs a space
> > > requires the support because when KVM advertises shadow stack feature
> > > to
> > > guest, architechturally it claims the support for both user and
> >            ^ Spelling: "architecturally"
>
> Thank you!!
>
> > > supervisor
> > > modes for Linux and non-Linux guest OSes.
> > >
> > > With the xstate support, guest supervisor mode shadow stack state can
> > > be
> > > properly saved/restored when 1) guest/host FPU context is swapped
> > > 2) vCPU
> > > thread is sched out/in.
> > (2) is a little bit confusing, because the lazy FPU stuff won't always
> > save/restore while scheduling.
>
> It's true for normal thread, but for vCPU thread, it's a bit different, on the path to
> vm-entry, after host/guest fpu states swapped, preemption is not disabled and
> vCPU thread could be sched out/in, in this case, guest FPU states will be saved/
> restored because TIF_NEED_FPU_LOAD is always cleared after swap.
>
> > But trying to explain the details in
> > this commit log is probably unnecessary. Maybe something like?
> >
> >    2) At the proper times while other tasks are scheduled
>
> I just want to justify that enabling of supervisor xstate is necessary for guest.
> Maybe I need to reword a bit :-)
>
> > I think also a key part of this is that XFEATURE_CET_KERNEL is not
> > *all* of the "guest supervisor mode shadow stack state", at least with
> > respect to the MSRs. It might be worth calling that out a little more
> > loudly.
>
> OK, I will call it out that supervisor mode shadow stack state also includes IA32_S_CET msr.
>
> > > The alternative is to enable it in KVM domain, but KVM maintainers
> > > NAKed
> > > the solution. The external discussion can be found at [*], it ended
> > > up
> > > with adding the support in kernel instead of KVM domain.
> > >
> > > Note, in KVM case, guest CET supervisor state i.e.,
> > > IA32_PL{0,1,2}_MSRs,
> > > are preserved after VM-Exit until host/guest fpstates are swapped,
> > > but
> > > since host supervisor shadow stack is disabled, the preserved MSRs
> > > won't
> > > hurt host.
> > It might beg the question of if this solution will need to be redone by
> > some future Linux supervisor shadow stack effort. I *think* the answer
> > is no.
>
> AFAICT KVM needs to be modified if host shadow stack is implemented, at least
> guest/host CET supervisor MSRs should be swapped at the earliest time after
> vm-exit so that host won't misbehavior on *guest* MSR contents.

I agree.

>
> > Most of the xsave managed features are restored before returning to
> > userspace because they would have userspace effect. But
> > XFEATURE_CET_KERNEL is different. It only effects the kernel. But the
> > IA32_PL{0,1,2}_MSRs are used when transitioning to those rings. So for
> > Linux they would get used when transitioning back from userspace. In
> > order for it to be used when control transfers back *from* userspace,
> > it needs to be restored before returning *to* userspace. So despite
> > being needed only for the kernel, and having no effect on userspace, it
> > might need to be swapped/restored at the same time as the rest of the
> > FPU state that only affects userspace.
>
> You're right, for enabling of supervisor mode shadow stack, we need to take
> it carefully whenever ring/stack is switching. But we still have time to figure out
> the points.
>
> Thanks a lot for bring up such kind of thinking!
>
> > Probably supervisor shadow stack for Linux needs much more analysis,
> > but trying to leave some breadcrumbs on the thinking from internal
> > reviews. I don't know if it might be good to include some of this
> > reasoning in the commit log. It's a bit hand wavy.
>
> IMO, we have put much assumption on the fact that CET supervisor shadow stack is not
> enabled in kernel and this patch itself is straightforward and simple, it's just a small
> brick for enabling supervisor shadow stack, we would revisit whether something is an
> issue based on how SSS is implemented in kernel. So let's not add such kind of reasoning :-)

Overall the patch looks OK to me.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky

>
> Thank you for the enlightenment!
>
> > > [*]:
> > > https://lore.kernel.org/all/806e26c2-8d21-9cc9-a0b7-7787dd231729@intel.com/
> > >
> > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > Otherwise, the code looked good to me.
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index eb810074f1e7..c6fd13a17205 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -116,7 +116,7 @@ enum xfeature {
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
 	XFEATURE_CET_USER,
-	XFEATURE_CET_KERNEL_UNUSED,
+	XFEATURE_CET_KERNEL,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -139,7 +139,7 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
-#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNUSED)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -264,6 +264,16 @@ struct cet_user_state {
 	u64 user_ssp;
 };
 
+/*
+ * State component 12 is Control-flow Enforcement supervisor states
+ */
+struct cet_supervisor_state {
+	/* supervisor ssp pointers */
+	u64 pl0_ssp;
+	u64 pl1_ssp;
+	u64 pl2_ssp;
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index d4427b88ee12..3b4a038d3c57 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -51,7 +51,8 @@
 
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
-					    XFEATURE_MASK_CET_USER)
+					    XFEATURE_MASK_CET_USER | \
+					    XFEATURE_MASK_CET_KERNEL)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -78,8 +79,7 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
-					      XFEATURE_MASK_CET_KERNEL)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 12c8cb278346..c3ed86732d33 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -51,7 +51,7 @@ static const char *xfeature_names[] =
 	"Protection Keys User registers",
 	"PASID state",
 	"Control-flow User registers",
-	"Control-flow Kernel registers (unused)",
+	"Control-flow Kernel registers",
 	"unknown xstate feature",
 	"unknown xstate feature",
 	"unknown xstate feature",
@@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_OSPKE,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_KERNEL]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -277,6 +278,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
 	print_xstate_feature(XFEATURE_MASK_CET_USER);
+	print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -346,6 +348,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 				     XFEATURE_MASK_BNDCSR |	\
 				     XFEATURE_MASK_PASID |	\
 				     XFEATURE_MASK_CET_USER |	\
+				     XFEATURE_MASK_CET_KERNEL |	\
 				     XFEATURE_MASK_XTILE)
 
 /*
@@ -546,6 +549,7 @@ static bool __init check_xstate_against_struct(int nr)
 	case XFEATURE_PASID:	  return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
 	case XFEATURE_XTILE_CFG:  return XCHECK_SZ(sz, nr, struct xtile_cfg);
 	case XFEATURE_CET_USER:	  return XCHECK_SZ(sz, nr, struct cet_user_state);
+	case XFEATURE_CET_KERNEL: return XCHECK_SZ(sz, nr, struct cet_supervisor_state);
 	case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
 	default:
 		XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
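The `XCHECK_SZ` case added for `XFEATURE_CET_KERNEL` compares the size reported for state component 12 against `sizeof(struct cet_supervisor_state)`. The following is a minimal user-space sketch of that idea, not the kernel's implementation: `uint64_t` stands in for the kernel's `u64`, `xstate_size_matches()` is a hypothetical helper rather than the kernel macro, and the 24-byte figure is an assumption that follows from three 8-byte SSP fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the struct added by the patch (u64 -> uint64_t). */
struct cet_supervisor_state {
    uint64_t pl0_ssp;   /* ring 0 shadow stack pointer */
    uint64_t pl1_ssp;   /* ring 1 shadow stack pointer */
    uint64_t pl2_ssp;   /* ring 2 shadow stack pointer */
};

/*
 * Hypothetical helper: returns 1 when the size enumerated by hardware
 * equals the size of the C struct describing the component, else 0.
 */
static inline int xstate_size_matches(size_t enumerated_size, size_t struct_size)
{
    return enumerated_size == struct_size;
}

/* Assumption: three 8-byte fields with no padding give 24 bytes. */
static_assert(sizeof(struct cet_supervisor_state) == 24,
              "cet_supervisor_state is expected to be 24 bytes");
```

A caller would pass the size it read from CPUID for the component as `enumerated_size`; a mismatch means the struct definition and the hardware layout disagree.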
Add supervisor mode state support within FPU xstate management framework.
Although supervisor shadow stack is not enabled/used today in kernel,KVM
requires the support because when KVM advertises shadow stack feature to
guest, architechturally it claims the support for both user and supervisor
modes for Linux and non-Linux guest OSes.

With the xstate support, guest supervisor mode shadow stack state can be
properly saved/restored when 1) guest/host FPU context is swapped 2) vCPU
thread is sched out/in.

The alternative is to enable it in KVM domain, but KVM maintainers NAKed
the solution. The external discussion can be found at [*], it ended up
with adding the support in kernel instead of KVM domain.

Note, in KVM case, guest CET supervisor state i.e., IA32_PL{0,1,2}_MSRs,
are preserved after VM-Exit until host/guest fpstates are swapped, but
since host supervisor shadow stack is disabled, the preserved MSRs won't
hurt host.

[*]: https://lore.kernel.org/all/806e26c2-8d21-9cc9-a0b7-7787dd231729@intel.com/

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/fpu/types.h  | 14 ++++++++++++--
 arch/x86/include/asm/fpu/xstate.h |  6 +++---
 arch/x86/kernel/fpu/xstate.c      |  6 +++++-
 3 files changed, 20 insertions(+), 6 deletions(-)
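The xstate.h part of the patch moves `XFEATURE_MASK_CET_KERNEL` from the unsupported supervisor mask to the supported one. The sketch below shows that bookkeeping in plain C. It is an illustration under assumptions: the `_BEFORE`/`_AFTER` names and `supervisor_all()` are hypothetical, bit 12 for CET kernel state comes from the "State component 12" comment in the patch, and bits 8, 10 and 11 for PT, PASID and CET user state are assumed from the enum order.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit positions (12 is stated in the patch; 8, 10, 11 are assumed). */
#define XFEATURE_MASK_PT          (1u << 8)
#define XFEATURE_MASK_PASID       (1u << 10)
#define XFEATURE_MASK_CET_USER    (1u << 11)
#define XFEATURE_MASK_CET_KERNEL  (1u << 12)

/* Supervisor masks as they were before the patch (hypothetical names). */
#define SUPPORTED_BEFORE   (XFEATURE_MASK_PASID | XFEATURE_MASK_CET_USER)
#define UNSUPPORTED_BEFORE (XFEATURE_MASK_PT | XFEATURE_MASK_CET_KERNEL)

/* Supervisor masks after the patch: CET kernel state is now supported. */
#define SUPPORTED_AFTER    (XFEATURE_MASK_PASID | XFEATURE_MASK_CET_USER | \
                            XFEATURE_MASK_CET_KERNEL)
#define UNSUPPORTED_AFTER  (XFEATURE_MASK_PT)

/* Hypothetical helper: union of supported and unsupported supervisor bits. */
static inline uint32_t supervisor_all(uint32_t supported, uint32_t unsupported)
{
    return supported | unsupported;
}

/* Moving a bit between the two masks must leave their union unchanged. */
static_assert((SUPPORTED_BEFORE | UNSUPPORTED_BEFORE) ==
              (SUPPORTED_AFTER | UNSUPPORTED_AFTER),
              "the set of all supervisor features should not change");
```

The point of the check is that the patch only reclassifies one bit: the combined supervisor mask stays the same, while the supported mask gains bit 12 and the unsupported mask shrinks to PT alone.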