[3/6] x86/kvm/emulate: Avoid RET for fastops

Message ID 20250414113754.172767741@infradead.org (mailing list archive)
State New
Series objtool: Detect and warn about indirect calls in __nocfi functions

Commit Message

Peter Zijlstra April 14, 2025, 11:11 a.m. UTC
Since there is only a single fastop() function, convert the FASTOP
stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
thunks and all that jazz.

Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
which not all of them can trivially do (call depth tracing suffers
here).

Objtool strenuously complains about things, therefore fix up the
various problems:

 - indirect jump without a .rodata jump table: objtool fails to
   determine JUMP_TABLE, so add an annotation for this.
 - fastop functions fall through: create an exception for this case.
 - unreachable instruction after fastop_return: use UNWIND_HINT
   save/restore around the jump.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/emulate.c              |   20 +++++++++++++++-----
 include/linux/objtool_types.h       |    1 +
 tools/include/linux/objtool_types.h |    1 +
 tools/objtool/check.c               |   11 ++++++++++-
 4 files changed, 27 insertions(+), 6 deletions(-)

Comments

Josh Poimboeuf April 14, 2025, 10:36 p.m. UTC | #1
On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> Since there is only a single fastop() function, convert the FASTOP
> stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> thunks and all that jazz.
> 
> Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> which not all of them can trivially do (call depth tracing suffers
> here).
> 
> Objtool strenuously complains about things, therefore fix up the
> various problems:
> 
>  - indirect call without a .rodata, fails to determine JUMP_TABLE,
>    add an annotation for this.
>  - fastop functions fall through, create an exception for this case
>  - unreachable instruction after fastop_return, save/restore

I think this breaks unwinding.  Each of the individual fastops inherits
fastop()'s stack but the ORC doesn't reflect that.

Should they just be moved to a proper .S file?

Peter Zijlstra April 15, 2025, 7:44 a.m. UTC | #2
On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> I think this breaks unwinding.  Each of the individual fastops inherits
> fastop()'s stack but the ORC doesn't reflect that.

I'm not sure I understand. There is only the one location, and we
simply save/restore the state around the one 'call'.

Josh Poimboeuf April 15, 2025, 2:39 p.m. UTC | #3
On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> > I think this breaks unwinding.  Each of the individual fastops inherits
> > fastop()'s stack but the ORC doesn't reflect that.
> 
> I'm not sure I understand. There is only the one location, and we
> simply save/restore the state around the one 'call'.

The problem isn't fastop() but rather the tiny functions it "calls".
Each of those is marked STT_FUNC so it gets its own ORC data saying the
return address is at RSP+8.

Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address
isn't pushed before the branch.  Thus they become part of fastop()
rather than separate functions.  RSP+8 is only correct if it happens to
have not pushed anything to the stack before the indirect JMP.

The addresses aren't stored in an .rodata jump table so objtool doesn't
know the control flow.  Even if we made them non-FUNC, objtool wouldn't
be able to transfer the stack state.
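
The stack-state argument above can be illustrated with a small user-space model. This is only a sketch under assumptions, not kernel or objtool code: the stack is an array of 8-byte slots, the marker values and all helper names (sim_push, unwind_return_address, ra_seen_in_stub) are hypothetical, and the unwind rule is the usual function-entry one (CFA = SP + 8, return address just below the CFA).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker values, chosen only to make the slots recognisable. */
#define RA_FASTOP_CALLER 0x1000u	/* pushed by the CALL into fastop() */
#define RA_INSIDE_FASTOP 0x2000u	/* what a CALL to a fastop stub would push */
#define SAVED_WORD       0xdeadu	/* anything fastop() itself pushed */

#define STACK_WORDS 16

struct sim_stack {
	uint64_t slot[STACK_WORDS];
	int sp;			/* index of the top-of-stack slot; grows down */
};

static void sim_push(struct sim_stack *s, uint64_t v)
{
	assert(s->sp > 0);
	s->slot[--s->sp] = v;
}

/*
 * Function-entry unwind rule: CFA = SP + 8 bytes (one slot in this model),
 * with the return address stored just below the CFA.
 */
static uint64_t unwind_return_address(const struct sim_stack *s)
{
	int cfa = s->sp + 1;

	return s->slot[cfa - 1];
}

/*
 * Return the "return address" an unwinder would find on entry to a fastop
 * stub, given how many words fastop() pushed first and whether the stub was
 * reached with CALL (pushes a return address) or JMP (pushes nothing).
 */
static uint64_t ra_seen_in_stub(int words_pushed_by_fastop, int use_call)
{
	struct sim_stack s = { .sp = STACK_WORDS };
	int i;

	sim_push(&s, RA_FASTOP_CALLER);
	for (i = 0; i < words_pushed_by_fastop; i++)
		sim_push(&s, SAVED_WORD);
	if (use_call)
		sim_push(&s, RA_INSIDE_FASTOP);

	return unwind_return_address(&s);
}
```

In this model the CALL variant always finds RA_INSIDE_FASTOP. The JMP variant finds a plausible return address only when fastop() pushed nothing; with any pushed word the entry rule lands on SAVED_WORD instead.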

Peter Zijlstra April 16, 2025, 8:38 a.m. UTC | #4
On Tue, Apr 15, 2025 at 07:39:41AM -0700, Josh Poimboeuf wrote:
> On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> > > I think this breaks unwinding.  Each of the individual fastops inherits
> > > fastop()'s stack but the ORC doesn't reflect that.
> > 
> > I'm not sure I understand. There is only the one location, and we
> > simply save/restore the state around the one 'call'.
> 
> The problem isn't fastop() but rather the tiny functions it "calls".
> Each of those is marked STT_FUNC so it gets its own ORC data saying the
> return address is at RSP+8.
> 
> Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address
> isn't pushed before the branch.  Thus they become part of fastop()
> rather than separate functions.  RSP+8 is only correct if it happens to
> have not pushed anything to the stack before the indirect JMP.

Yeah, I finally got there. I'll go cook up something else.

Patch
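
The layout comment and the `fop += __ffs(...) * FASTOP_SIZE` line in the emulate.c hunks can be checked with a small stand-alone sketch. It assumes the slot size equals the 16-byte alignment named in the comment; the constant and function names below (FASTOP_SLOT, fastop_body_budget, lowest_set_bit, fastop_slot_offset) are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts as stated in the updated comment; the names are illustrative. */
enum {
	FASTOP_SLOT = 16,	/* assumed equal to the 16-byte alignment */
	JMP_BYTES   = 5,	/* jmp fastop_return */
	ENDBR_BYTES = 4,	/* ENDBR at the start of each stub */
	INT3_BYTES  = 1,	/* straight-line-speculation INT3 */
};

/* Bytes left for the body of one fastop stub. */
static int fastop_body_budget(void)
{
	return FASTOP_SLOT - JMP_BYTES - ENDBR_BYTES - INT3_BYTES;
}

/* Index of the lowest set bit; stands in for __ffs() on nonzero input. */
static unsigned int lowest_set_bit(unsigned long v)
{
	unsigned int n = 0;

	assert(v != 0);
	while (!(v & 1)) {
		v >>= 1;
		n++;
	}
	return n;
}

/*
 * Offset of the stub for a given operand size, relative to the first
 * (byte-sized) stub. Byte operations use the first slot unchanged.
 */
static size_t fastop_slot_offset(unsigned int operand_bytes, int byte_op)
{
	if (byte_op)
		return 0;
	return (size_t)lowest_set_bit(operand_bytes) * FASTOP_SLOT;
}
```

With these numbers the body budget comes out at 6 bytes, and operand sizes of 2, 4 and 8 bytes select the stubs at offsets 16, 32 and 48.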

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -285,8 +285,8 @@  static void invalidate_registers(struct
  * different operand sizes can be reached by calculation, rather than a jump
  * table (which would be bigger than the code).
  *
- * The 16 byte alignment, considering 5 bytes for the RET thunk, 3 for ENDBR
- * and 1 for the straight line speculation INT3, leaves 7 bytes for the
+ * The 16 byte alignment, considering 5 bytes for the JMP, 4 for ENDBR
+ * and 1 for the straight line speculation INT3, leaves 6 bytes for the
  * body of the function.  Currently none is larger than 4.
  */
 static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop);
@@ -304,7 +304,7 @@  static int fastop(struct x86_emulate_ctx
 	__FOP_FUNC(#name)
 
 #define __FOP_RET(name) \
-	"11: " ASM_RET \
+	"11: jmp fastop_return; int3 \n\t" \
 	".size " name ", .-" name "\n\t"
 
 #define FOP_RET(name) \
@@ -5044,14 +5044,24 @@  static void fetch_possible_mmx_operand(s
 		kvm_read_mmx_reg(op->addr.mm, &op->mm_val);
 }
 
-static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
+/*
+ * All the FASTOP magic above relies on there being *one* instance of this
+ * so it can JMP back, avoiding RET and its various thunks.
+ */
+static noinline int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
 {
 	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
 
 	if (!(ctxt->d & ByteOp))
 		fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
 
-	asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
+	asm("push %[flags]; popf \n\t"
+	    UNWIND_HINT(UNWIND_HINT_TYPE_SAVE, 0, 0, 0)
+	    ASM_ANNOTATE(ANNOTYPE_JUMP_TABLE)
+	    JMP_NOSPEC
+	    "fastop_return: \n\t"
+	    UNWIND_HINT(UNWIND_HINT_TYPE_RESTORE, 0, 0, 0)
+	    "pushf; pop %[flags]\n"
 	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
 	      [thunk_target]"+S"(fop), ASM_CALL_CONSTRAINT
 	    : "c"(ctxt->src2.val));
--- a/include/linux/objtool_types.h
+++ b/include/linux/objtool_types.h
@@ -65,5 +65,6 @@  struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS		6
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
+#define ANNOTYPE_JUMP_TABLE		9
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/include/linux/objtool_types.h
+++ b/tools/include/linux/objtool_types.h
@@ -65,5 +65,6 @@  struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS		6
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
+#define ANNOTYPE_JUMP_TABLE		9
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2428,6 +2428,14 @@  static int __annotate_late(struct objtoo
 		insn->dead_end = false;
 		break;
 
+	/*
+	 * Must be after add_jump_table(); for it doesn't set a sane
+	 * _jump_table value.
+	 */
+	case ANNOTYPE_JUMP_TABLE:
+		insn->_jump_table = (void *)1;
+		break;
+
 	default:
 		ERROR_INSN(insn, "Unknown annotation type: %d", type);
 		return -1;
@@ -3559,7 +3567,8 @@  static int validate_branch(struct objtoo
 		if (func && insn_func(insn) && func != insn_func(insn)->pfunc) {
 			/* Ignore KCFI type preambles, which always fall through */
 			if (!strncmp(func->name, "__cfi_", 6) ||
-			    !strncmp(func->name, "__pfx_", 6))
+			    !strncmp(func->name, "__pfx_", 6) ||
+			    !strcmp(insn_func(insn)->name, "fastop"))
 				return 0;
 
 			if (file->ignore_unreachables)