Message ID | 20200501225838.9866-3-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Series | x86: Support for CET Supervisor Shadow Stacks |
On 02.05.2020 00:58, Andrew Cooper wrote:
> For one, they render the vector in a different base.
>
> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
> mnemonic, which starts bringing the code/diagnostics in line with the Intel
> and AMD manuals.

For this "bringing in line" purpose I'd like to see whether you could
live with some adjustments to how you're currently doing things:
- NMI is nowhere prefixed by #, hence I think we'd better not do so
  either; may require embedding the #-es in the names[] table, or not
  using N() for NMI
- neither Coprocessor Segment Overrun nor vector 0x0f have a mnemonic
  and hence I think we shouldn't invent one; just treat them like
  other reserved vectors (of which at least vector 0x09 indeed is one
  on x86-64)?

Jan
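The first option raised here, embedding the # in the names[] table so that NMI can stay unprefixed, could look roughly like the following. This is only an illustrative sketch, not code from the patch or the thread: vec_name_prefixed() is a hypothetical name, the table holds just a subset of vectors, and ARRAY_SIZE() is redefined locally so the snippet stands alone.

```c
#include <stddef.h>

/* Subset of the patch's X86_EXC_* vector numbers, enough for the sketch. */
#define X86_EXC_DE   0
#define X86_EXC_DB   1
#define X86_EXC_NMI  2
#define X86_EXC_BP   3
#define X86_EXC_GP  13
#define X86_EXC_PF  14

/* Local stand-in for Xen's ARRAY_SIZE() so the example is self-contained. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical variant of vec_name(): the '#' is stored in the table, so the
 * caller's format string no longer adds it and NMI can be left bare.
 */
static const char *vec_name_prefixed(unsigned int vec)
{
    /* 5 bytes per entry: '#', up to three letters, and the NUL terminator. */
    static const char names[][5] = {
#define N(x) [X86_EXC_ ## x] = "#" #x
        N(DE), N(DB), N(BP), N(GP), N(PF),
#undef N
        [X86_EXC_NMI] = "NMI",   /* Not routed through N(): no '#' prefix. */
    };

    /* Unlisted slots are zero-filled, so an empty first byte means "no name". */
    return (vec < ARRAY_SIZE(names) && names[vec][0]) ? names[vec] : "??";
}
```

With this layout a caller would print the result with a plain %s rather than #%s, at the cost of one extra byte per table entry.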
On 04/05/2020 14:08, Jan Beulich wrote:
> On 02.05.2020 00:58, Andrew Cooper wrote:
>> For one, they render the vector in a different base.
>>
>> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
>> mnemonic, which starts bringing the code/diagnostics in line with the Intel
>> and AMD manuals.
> For this "bringing in line" purpose I'd like to see whether you could
> live with some adjustments to how you're currently doing things:
> - NMI is nowhere prefixed by #, hence I think we'd better not do so
>   either; may require embedding the #-es in the names[] table, or not
>   using N() for NMI

No-one is going to get confused at seeing #NMI in an error message. I
don't mind juggling the existing names table, but anything more
complicated is overkill.

> - neither Coprocessor Segment Overrun nor vector 0x0f have a mnemonic
>   and hence I think we shouldn't invent one; just treat them like
>   other reserved vectors (of which at least vector 0x09 indeed is one
>   on x86-64)?

This I disagree with. Coprocessor Segment Overrun *is* its name in both
manuals, and the avoidance of vector 0xf is clearly documented as well,
due to it being the default PIC Spurious Interrupt Vector.

Neither CSO nor SPV are expected to be encountered in practice, but if
they are, highlighting them is a damn-sight more helpful than pretending
they don't exist.

~Andrew
On 11.05.2020 17:01, Andrew Cooper wrote:
> On 04/05/2020 14:08, Jan Beulich wrote:
>> On 02.05.2020 00:58, Andrew Cooper wrote:
>>> For one, they render the vector in a different base.
>>>
>>> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
>>> mnemonic, which starts bringing the code/diagnostics in line with the Intel
>>> and AMD manuals.
>> For this "bringing in line" purpose I'd like to see whether you could
>> live with some adjustments to how you're currently doing things:
>> - NMI is nowhere prefixed by #, hence I think we'd better not do so
>>   either; may require embedding the #-es in the names[] table, or not
>>   using N() for NMI
>
> No-one is going to get confused at seeing #NMI in an error message. I
> don't mind juggling the existing names table, but anything more
> complicated is overkill.
>
>> - neither Coprocessor Segment Overrun nor vector 0x0f have a mnemonic
>>   and hence I think we shouldn't invent one; just treat them like
>>   other reserved vectors (of which at least vector 0x09 indeed is one
>>   on x86-64)?
>
> This I disagree with. Coprocessor Segment Overrun *is* its name in both
> manuals, and the avoidance of vector 0xf is clearly documented as well,
> due to it being the default PIC Spurious Interrupt Vector.
>
> Neither CSO nor SPV are expected to be encountered in practice, but if
> they are, highlighting them is a damn-sight more helpful than pretending
> they don't exist.

How is them occurring (and getting logged with their vector numbers)
any different from other reserved, acronym-less vectors? I particularly
didn't suggest to pretend they don't exist; instead I did suggest that
they are as reserved as, say, vector 0x18. By inventing an acronym and
logging this instead of the vector number you'll make people other than
you have to look up what the odd acronym means iff such an exception
ever got raised.

Jan
On 11/05/2020 16:09, Jan Beulich wrote:
> On 11.05.2020 17:01, Andrew Cooper wrote:
>> On 04/05/2020 14:08, Jan Beulich wrote:
>>> On 02.05.2020 00:58, Andrew Cooper wrote:
>>>> For one, they render the vector in a different base.
>>>>
>>>> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
>>>> mnemonic, which starts bringing the code/diagnostics in line with the Intel
>>>> and AMD manuals.
>>> For this "bringing in line" purpose I'd like to see whether you could
>>> live with some adjustments to how you're currently doing things:
>>> - NMI is nowhere prefixed by #, hence I think we'd better not do so
>>>   either; may require embedding the #-es in the names[] table, or not
>>>   using N() for NMI
>> No-one is going to get confused at seeing #NMI in an error message. I
>> don't mind juggling the existing names table, but anything more
>> complicated is overkill.
>>
>>> - neither Coprocessor Segment Overrun nor vector 0x0f have a mnemonic
>>>   and hence I think we shouldn't invent one; just treat them like
>>>   other reserved vectors (of which at least vector 0x09 indeed is one
>>>   on x86-64)?
>> This I disagree with. Coprocessor Segment Overrun *is* its name in both
>> manuals, and the avoidance of vector 0xf is clearly documented as well,
>> due to it being the default PIC Spurious Interrupt Vector.
>>
>> Neither CSO nor SPV are expected to be encountered in practice, but if
>> they are, highlighting them is a damn-sight more helpful than pretending
>> they don't exist.
> How is them occurring (and getting logged with their vector numbers)
> any different from other reserved, acronym-less vectors? I particularly
> didn't suggest to pretend they don't exist; instead I did suggest that
> they are as reserved as, say, vector 0x18. By inventing an acronym and
> logging this instead of the vector number you'll make people other than
> you have to look up what the odd acronym means iff such an exception
> ever got raised.

You snipped the bits in the patch where both the vector number and
acronym are printed together.

Anyone who doesn't know the vector has to look it up anyway, at which
point they'll find that what Xen prints out matches what both manuals
say. OTOH, people who know what a coprocessor segment overrun or PIC
spurious vector is won't need to look it up.

~Andrew
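For reference, the combined output described here (vector number and mnemonic together) can be reproduced outside Xen with the format string from the patch's fatal_trap() hunk. A minimal sketch, assuming snprintf() in place of Xen's panic() and a hypothetical helper format_fatal_trap(); the mnemonic is passed in directly rather than looked up via vec_name():

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a message with the same format string the
 * patch passes to panic() in fatal_trap(), minus the trailing newline.
 * irqs_enabled stands in for the (regs->eflags & X86_EFLAGS_IF) test.
 */
static int format_fatal_trap(char *buf, size_t len, unsigned int vec,
                             const char *name, unsigned int error_code,
                             int irqs_enabled)
{
    return snprintf(buf, len, "FATAL TRAP: vec %u, #%s[%04x]%s",
                    vec, name, error_code,
                    irqs_enabled ? "" : " IN INTERRUPT CONTEXT");
}
```

Called with vector 13, the name "GP", error code 0 and interrupts enabled, this yields `FATAL TRAP: vec 13, #GP[0000]`, so the decimal vector number is always present alongside the mnemonic.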
On 18.05.2020 18:54, Andrew Cooper wrote:
> On 11/05/2020 16:09, Jan Beulich wrote:
>> On 11.05.2020 17:01, Andrew Cooper wrote:
>>> On 04/05/2020 14:08, Jan Beulich wrote:
>>>> On 02.05.2020 00:58, Andrew Cooper wrote:
>>>>> For one, they render the vector in a different base.
>>>>>
>>>>> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
>>>>> mnemonic, which starts bringing the code/diagnostics in line with the Intel
>>>>> and AMD manuals.
>>>> For this "bringing in line" purpose I'd like to see whether you could
>>>> live with some adjustments to how you're currently doing things:
>>>> - NMI is nowhere prefixed by #, hence I think we'd better not do so
>>>>   either; may require embedding the #-es in the names[] table, or not
>>>>   using N() for NMI
>>> No-one is going to get confused at seeing #NMI in an error message. I
>>> don't mind juggling the existing names table, but anything more
>>> complicated is overkill.
>>>
>>>> - neither Coprocessor Segment Overrun nor vector 0x0f have a mnemonic
>>>>   and hence I think we shouldn't invent one; just treat them like
>>>>   other reserved vectors (of which at least vector 0x09 indeed is one
>>>>   on x86-64)?
>>> This I disagree with. Coprocessor Segment Overrun *is* its name in both
>>> manuals, and the avoidance of vector 0xf is clearly documented as well,
>>> due to it being the default PIC Spurious Interrupt Vector.
>>>
>>> Neither CSO nor SPV are expected to be encountered in practice, but if
>>> they are, highlighting them is a damn-sight more helpful than pretending
>>> they don't exist.
>> How is them occurring (and getting logged with their vector numbers)
>> any different from other reserved, acronym-less vectors? I particularly
>> didn't suggest to pretend they don't exist; instead I did suggest that
>> they are as reserved as, say, vector 0x18. By inventing an acronym and
>> logging this instead of the vector number you'll make people other than
>> you have to look up what the odd acronym means iff such an exception
>> ever got raised.
>
> You snipped the bits in the patch where both the vector number and
> acronym are printed together.
>
> Anyone who doesn't know the vector has to look it up anyway, at which
> point they'll find that what Xen prints out matches what both manuals
> say. OTOH, people who know what a coprocessor segment overrun or PIC
> spurious vector is won't need to look it up.

And who know how to decipher the non-standard CPO and SPV (which are what
triggered my comments in the first place). What I continue to fail to
see is why these reserved vectors need treatment different from all
others. In addition I'm having trouble seeing how the default spurious
PIC vector matters for us - we program the PIC to vectors 0x20-0x2f,
i.e. a spurious PIC0 IRQ would show up at vector 0x27. (I notice we
still blindly assume there's a pair of PICs in the first place.)

Jan
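The vector arithmetic behind this point is simple: an 8259A delivers IRQ line n at vector base + n, and a spurious interrupt is reported on its line 7. A small sketch, assuming the conventional legacy BIOS mapping of the master PIC at base 0x08 (an assumption, not stated in the thread) next to the 0x20-0x2f programming mentioned above; pic_vector() and PIC_SPURIOUS_LINE are hypothetical names:

```c
#include <assert.h>

/* Line on which an 8259A reports a spurious interrupt. */
#define PIC_SPURIOUS_LINE 7u

/*
 * Hypothetical helper: vector delivered for input line `line` (0-7) of a
 * single 8259A whose vector base has been programmed to `base`.
 */
static unsigned int pic_vector(unsigned int base, unsigned int line)
{
    assert(line < 8);   /* Each 8259A has only eight input lines. */
    return base + line;
}
```

Under these assumptions, pic_vector(0x08, PIC_SPURIOUS_LINE) gives 0x0f (the "default" spurious vector), while a master PIC remapped to base 0x20 gives 0x27 and a slave at base 0x28 gives 0x2f.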
On 19/05/2020 09:50, Jan Beulich wrote:
> On 18.05.2020 18:54, Andrew Cooper wrote:
>> On 11/05/2020 16:09, Jan Beulich wrote:
>>> On 11.05.2020 17:01, Andrew Cooper wrote:
>>>> On 04/05/2020 14:08, Jan Beulich wrote:
>>>>> On 02.05.2020 00:58, Andrew Cooper wrote:
>>>>>> For one, they render the vector in a different base.
>>>>>>
>>>>>> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
>>>>>> mnemonic, which starts bringing the code/diagnostics in line with the Intel
>>>>>> and AMD manuals.
>>>>> For this "bringing in line" purpose I'd like to see whether you could
>>>>> live with some adjustments to how you're currently doing things:
>>>>> - NMI is nowhere prefixed by #, hence I think we'd better not do so
>>>>>   either; may require embedding the #-es in the names[] table, or not
>>>>>   using N() for NMI
>>>> No-one is going to get confused at seeing #NMI in an error message. I
>>>> don't mind juggling the existing names table, but anything more
>>>> complicated is overkill.
>>>>
>>>>> - neither Coprocessor Segment Overrun nor vector 0x0f have a mnemonic
>>>>>   and hence I think we shouldn't invent one; just treat them like
>>>>>   other reserved vectors (of which at least vector 0x09 indeed is one
>>>>>   on x86-64)?
>>>> This I disagree with. Coprocessor Segment Overrun *is* its name in both
>>>> manuals, and the avoidance of vector 0xf is clearly documented as well,
>>>> due to it being the default PIC Spurious Interrupt Vector.
>>>>
>>>> Neither CSO nor SPV are expected to be encountered in practice, but if
>>>> they are, highlighting them is a damn-sight more helpful than pretending
>>>> they don't exist.
>>> How is them occurring (and getting logged with their vector numbers)
>>> any different from other reserved, acronym-less vectors? I particularly
>>> didn't suggest to pretend they don't exist; instead I did suggest that
>>> they are as reserved as, say, vector 0x18. By inventing an acronym and
>>> logging this instead of the vector number you'll make people other than
>>> you have to look up what the odd acronym means iff such an exception
>>> ever got raised.
>> You snipped the bits in the patch where both the vector number and
>> acronym are printed together.
>>
>> Anyone who doesn't know the vector has to look it up anyway, at which
>> point they'll find that what Xen prints out matches what both manuals
>> say. OTOH, people who know what a coprocessor segment overrun or PIC
>> spurious vector is won't need to look it up.
> And who know how to decipher the non-standard CPO and SPV (which are what
> triggered my comments in the first place).

CSO, and no.

Anyone who doesn't know the text still has the vector number to work
with, and still needs to look it up.

At which point they will observe that the text is appropriate in context.

> What I continue to fail to
> see is why these reserved vectors need treatment different from all
> others.

Because it has nothing to do with reserved-ness.

It is about providing clarifying information (for all vectors which
currently have, or have ever had, meaning) for mere mortals who can't
(or rather, don't want to) debug crashes based on raw numbers alone.

> In addition I'm having trouble seeing how the default spurious
> PIC vector matters for us - we program the PIC to vectors 0x20-0x2f,
> i.e. a spurious PIC0 IRQ would show up at vector 0x27. (I notice we
> still blindly assume there's a pair of PICs in the first place.)

That's not relevant. What is relevant is the actions taken when we see
vector 15 being raised.

Hitting CSO means that legacy #FERR_FREEZE external signal has been
wired up (and it is very SMP-unsafe, hence why it was phased out with
the introduction of integrated x87s).

Hitting SPV means that the PIC wasn't reprogrammed and something wonky
is going on with one of the input pins.

Both of these are strictly more helpful in a log than "something went
wrong - figure it out yourself", and both indicate that something is
very wrong with the system.

~Andrew
On 26.05.2020 17:38, Andrew Cooper wrote:
> On 19/05/2020 09:50, Jan Beulich wrote:
>> On 18.05.2020 18:54, Andrew Cooper wrote:
>>> On 11/05/2020 16:09, Jan Beulich wrote:
>>>> On 11.05.2020 17:01, Andrew Cooper wrote:
>>>>> On 04/05/2020 14:08, Jan Beulich wrote:
>>>>>> On 02.05.2020 00:58, Andrew Cooper wrote:
>>>>>>> For one, they render the vector in a different base.
>>>>>>>
>>>>>>> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
>>>>>>> mnemonic, which starts bringing the code/diagnostics in line with the Intel
>>>>>>> and AMD manuals.
>>>>>> For this "bringing in line" purpose I'd like to see whether you could
>>>>>> live with some adjustments to how you're currently doing things:
>>>>>> - NMI is nowhere prefixed by #, hence I think we'd better not do so
>>>>>>   either; may require embedding the #-es in the names[] table, or not
>>>>>>   using N() for NMI
>>>>> No-one is going to get confused at seeing #NMI in an error message. I
>>>>> don't mind juggling the existing names table, but anything more
>>>>> complicated is overkill.
>>>>>
>>>>>> - neither Coprocessor Segment Overrun nor vector 0x0f have a mnemonic
>>>>>>   and hence I think we shouldn't invent one; just treat them like
>>>>>>   other reserved vectors (of which at least vector 0x09 indeed is one
>>>>>>   on x86-64)?
>>>>> This I disagree with. Coprocessor Segment Overrun *is* its name in both
>>>>> manuals, and the avoidance of vector 0xf is clearly documented as well,
>>>>> due to it being the default PIC Spurious Interrupt Vector.
>>>>>
>>>>> Neither CSO nor SPV are expected to be encountered in practice, but if
>>>>> they are, highlighting them is a damn-sight more helpful than pretending
>>>>> they don't exist.
>>>> How is them occurring (and getting logged with their vector numbers)
>>>> any different from other reserved, acronym-less vectors? I particularly
>>>> didn't suggest to pretend they don't exist; instead I did suggest that
>>>> they are as reserved as, say, vector 0x18. By inventing an acronym and
>>>> logging this instead of the vector number you'll make people other than
>>>> you have to look up what the odd acronym means iff such an exception
>>>> ever got raised.
>>> You snipped the bits in the patch where both the vector number and
>>> acronym are printed together.
>>>
>>> Anyone who doesn't know the vector has to look it up anyway, at which
>>> point they'll find that what Xen prints out matches what both manuals
>>> say. OTOH, people who know what a coprocessor segment overrun or PIC
>>> spurious vector is won't need to look it up.
>> And who know how to decipher the non-standard CPO and SPV (which are what
>> triggered my comments in the first place).
>
> CSO, and no.
>
> Anyone who doesn't know the text still has the vector number to work
> with, and still needs to look it up.
>
> At which point they will observe that the text is appropriate in context.
>
>> What I continue to fail to
>> see is why these reserved vectors need treatment different from all
>> others.
>
> Because it has nothing to do with reserved-ness.

How does it not? The SDM page, among historic information, specifically
says "Intel reserved". Seeing more exception vectors getting used after
many years of "silence" in this area, I'm pretty sure if they ran out of
vectors they'd re-use this one. Vector 15 doesn't even have a page, which
puts it even more in the same group as other reserved ones.

> It is about providing clarifying information (for all vectors which
> currently have, or have ever had, meaning) for mere mortals who can't
> (or rather, don't want to) debug crashes based on raw numbers alone.
>
>> In addition I'm having trouble seeing how the default spurious
>> PIC vector matters for us - we program the PIC to vectors 0x20-0x2f,
>> i.e. a spurious PIC0 IRQ would show up at vector 0x27. (I notice we
>> still blindly assume there's a pair of PICs in the first place.)
>
> That's not relevant. What is relevant is the actions taken when we see
> vector 15 being raised.
>
> Hitting CSO means that legacy #FERR_FREEZE external signal has been
> wired up (and it is very SMP-unsafe, hence why it was phased out with
> the introduction of integrated x87s).

What does FERR have to do with this vector? This exception is a stand-in
for #GP (and maybe #PF) on the 386/387 pair.

> Hitting SPV means that the PIC wasn't reprogrammed and something wonky
> is going on with one of the input pins.

If the PIC was neither re-programmed nor properly masked, we're in
bigger trouble, I'm afraid.

> Both of these are strictly more helpful in a log than "something went
> wrong - figure it out yourself", and both indicate that something is
> very wrong with the system.

So what do we do? We can't seem to be able to reach agreement here,
because our views are different and neither of us can convince the
other. Looking back at my initial reply, hesitantly
Acked-by: Jan Beulich <jbeulich@suse.com>
then.

Jan
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index fe9457cdb6..e73f07f28a 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -686,6 +686,20 @@ const char *trapstr(unsigned int trapnr)
     return trapnr < ARRAY_SIZE(strings) ? strings[trapnr] : "???";
 }
 
+static const char *vec_name(unsigned int vec)
+{
+    static const char names[][4] = {
+#define N(x) [X86_EXC_ ## x] = #x
+        N(DE),  N(DB),  N(NMI), N(BP),  N(OF),  N(BR),  N(UD),  N(NM),
+        N(DF),  N(CSO), N(TS),  N(NP),  N(SS),  N(GP),  N(PF),  N(SPV),
+        N(MF),  N(AC),  N(MC),  N(XM),  N(VE),  N(CP),
+        N(HV),  N(VC),  N(SX),
+#undef N
+    };
+
+    return (vec < ARRAY_SIZE(names) && names[vec][0]) ? names[vec] : "??";
+}
+
 /*
  * This is called for faults at very unexpected times (e.g., when interrupts
  * are disabled). In such situations we can't do much that is safe. We try to
@@ -743,10 +757,9 @@ void fatal_trap(const struct cpu_user_regs *regs, bool show_remote)
         }
     }
 
-    panic("FATAL TRAP: vector = %d (%s)\n"
-          "[error_code=%04x] %s\n",
-          trapnr, trapstr(trapnr), regs->error_code,
-          (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
+    panic("FATAL TRAP: vec %u, #%s[%04x]%s\n",
+          trapnr, vec_name(trapnr), regs->error_code,
+          (regs->eflags & X86_EFLAGS_IF) ? "" : " IN INTERRUPT CONTEXT");
 }
 
 static void do_reserved_trap(struct cpu_user_regs *regs)
@@ -757,7 +770,8 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
         return;
 
     show_execution_state(regs);
-    panic("FATAL RESERVED TRAP %#x: %s\n", trapnr, trapstr(trapnr));
+    panic("FATAL RESERVED TRAP: vec %u, #%s[%04x]\n",
+          trapnr, vec_name(trapnr), regs->error_code);
 }
 
 static void do_trap(struct cpu_user_regs *regs)
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 8f6f5a97dd..12b55e1022 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -43,11 +43,7 @@
 #define TRAP_virtualisation   20
 #define TRAP_nr               32
 
-#define TRAP_HAVE_EC                                                    \
-    ((1u << TRAP_double_fault) | (1u << TRAP_invalid_tss) |             \
-     (1u << TRAP_no_segment) | (1u << TRAP_stack_error) |               \
-     (1u << TRAP_gp_fault) | (1u << TRAP_page_fault) |                  \
-     (1u << TRAP_alignment_check))
+#define TRAP_HAVE_EC X86_EXC_HAVE_EC
 
 /* Set for entry via SYSCALL. Informs return code to use SYSRETQ not IRETQ. */
 /* NB. Same as VGCF_in_syscall. No bits in common with any other TRAP_ defn. */
diff --git a/xen/include/asm-x86/x86-defns.h b/xen/include/asm-x86/x86-defns.h
index 8bf503220a..84e15b15be 100644
--- a/xen/include/asm-x86/x86-defns.h
+++ b/xen/include/asm-x86/x86-defns.h
@@ -118,4 +118,39 @@
 
 #define X86_NR_VECTORS 256
 
+/* Exception Vectors */
+#define X86_EXC_DE             0 /* Divide Error. */
+#define X86_EXC_DB             1 /* Debug Exception. */
+#define X86_EXC_NMI            2 /* NMI. */
+#define X86_EXC_BP             3 /* Breakpoint. */
+#define X86_EXC_OF             4 /* Overflow. */
+#define X86_EXC_BR             5 /* BOUND Range. */
+#define X86_EXC_UD             6 /* Invalid Opcode. */
+#define X86_EXC_NM             7 /* Device Not Available. */
+#define X86_EXC_DF             8 /* Double Fault. */
+#define X86_EXC_CSO            9 /* Coprocessor Segment Overrun. */
+#define X86_EXC_TS            10 /* Invalid TSS. */
+#define X86_EXC_NP            11 /* Segment Not Present. */
+#define X86_EXC_SS            12 /* Stack-Segment Fault. */
+#define X86_EXC_GP            13 /* General Protection Fault. */
+#define X86_EXC_PF            14 /* Page Fault. */
+#define X86_EXC_SPV           15 /* PIC Spurious Interrupt Vector. */
+#define X86_EXC_MF            16 /* Maths fault (x87 FPU). */
+#define X86_EXC_AC            17 /* Alignment Check. */
+#define X86_EXC_MC            18 /* Machine Check. */
+#define X86_EXC_XM            19 /* SIMD Exception. */
+#define X86_EXC_VE            20 /* Virtualisation Exception. */
+#define X86_EXC_CP            21 /* Control-flow Protection. */
+#define X86_EXC_HV            28 /* Hypervisor Injection. */
+#define X86_EXC_VC            29 /* VMM Communication. */
+#define X86_EXC_SX            30 /* Security Exception. */
+
+/* Bitmap of exceptions which have error codes. */
+#define X86_EXC_HAVE_EC                                             \
+    ((1u << X86_EXC_DF) | (1u << X86_EXC_TS) | (1u << X86_EXC_NP) | \
+     (1u << X86_EXC_SS) | (1u << X86_EXC_GP) | (1u << X86_EXC_PF) | \
+     (1u << X86_EXC_AC) | (1u << X86_EXC_CP) |                      \
+     (1u << X86_EXC_VC) | (1u << X86_EXC_SX))
+
+
 #endif /* __XEN_X86_DEFNS_H__ */
For one, they render the vector in a different base.

Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
mnemonic, which starts bringing the code/diagnostics in line with the Intel
and AMD manuals.

Provide constants for every architecturally defined exception, even those not
implemented by Xen yet, as do_reserved_trap() is a catch-all handler.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/traps.c            | 24 +++++++++++++++++++-----
 xen/include/asm-x86/processor.h |  6 +-----
 xen/include/asm-x86/x86-defns.h | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 10 deletions(-)
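As an illustration of how the X86_EXC_HAVE_EC bitmap from the x86-defns.h hunk can be consumed, the sketch below tests whether a given vector pushes an error code. The constants are copied from the patch; vec_has_error_code() is a hypothetical helper for this example, not something the patch adds.

```c
#include <stdbool.h>

/* Subset of the patch's X86_EXC_* constants: the vectors with error codes. */
#define X86_EXC_DF   8 /* Double Fault. */
#define X86_EXC_TS  10 /* Invalid TSS. */
#define X86_EXC_NP  11 /* Segment Not Present. */
#define X86_EXC_SS  12 /* Stack-Segment Fault. */
#define X86_EXC_GP  13 /* General Protection Fault. */
#define X86_EXC_PF  14 /* Page Fault. */
#define X86_EXC_AC  17 /* Alignment Check. */
#define X86_EXC_CP  21 /* Control-flow Protection. */
#define X86_EXC_VC  29 /* VMM Communication. */
#define X86_EXC_SX  30 /* Security Exception. */

/* Bitmap of exceptions which have error codes (as in the patch). */
#define X86_EXC_HAVE_EC                                             \
    ((1u << X86_EXC_DF) | (1u << X86_EXC_TS) | (1u << X86_EXC_NP) | \
     (1u << X86_EXC_SS) | (1u << X86_EXC_GP) | (1u << X86_EXC_PF) | \
     (1u << X86_EXC_AC) | (1u << X86_EXC_CP) |                      \
     (1u << X86_EXC_VC) | (1u << X86_EXC_SX))

/* Hypothetical helper: does exception vector `vec` push an error code? */
static bool vec_has_error_code(unsigned int vec)
{
    /* Only vectors 0-31 are exceptions; this also keeps the shift in range. */
    return vec < 32 && (X86_EXC_HAVE_EC & (1u << vec));
}
```

Compared with the old TRAP_HAVE_EC definition, the new bitmap additionally covers #CP, #VC and #SX, so vec_has_error_code(21) is true here whereas the previous bitmap had no bit for vector 21.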