diff mbox series

KVM/VMX: Do not declare vmread_error asmlinkage

Message ID 20220817144045.3206-1-ubizjak@gmail.com (mailing list archive)
State New, archived
Headers show
Series KVM/VMX: Do not declare vmread_error asmlinkage | expand

Commit Message

Uros Bizjak Aug. 17, 2022, 2:40 p.m. UTC
There is no need to declare vmread_error asmlinkage, its arguments
can be passed via registers for both, 32-bit and 64-bit targets.
Function argument registers are considered call-clobbered registers,
they are saved in the trampoline just before the function call and
restored afterwards.

Note that asmlinkage and __attribute__((regparm(0))) have no effect
on 64-bit targets. The trampoline is called from the assembler glue
code that implements its own stack-passing function calling convention,
so the attribute on the trampoline declaration does not change anything
for 64-bit as well as 32-bit targets. We can declare it asmlinkage for
documentation purposes.

The patch unifies trampoline function argument handling between 32-bit
and 64-bit targets and improves generated code for 32-bit targets.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/kvm/vmx/vmenter.S | 15 +++------------
 arch/x86/kvm/vmx/vmx.c     |  2 +-
 arch/x86/kvm/vmx/vmx_ops.h |  6 +++---
 3 files changed, 7 insertions(+), 16 deletions(-)

Comments

Sean Christopherson Aug. 17, 2022, 3:58 p.m. UTC | #1
+PeterZ

On Wed, Aug 17, 2022, Uros Bizjak wrote:
> There is no need to declare vmread_error asmlinkage, its arguments
> can be passed via registers for both, 32-bit and 64-bit targets.
> Function argument registers are considered call-clobbered registers,
> they are saved in the trampoline just before the function call and
> restored afterwards.

I'm officially confused.  What's the purpose of asmlinkage when used in the kernel?
Is it some historical wart that's no longer truly necessary and only causes pain?

When I wrote this code, I thought that the intent was that it should be applied to
any and all asm => C function calls.  But that's obviously not required as there
are multiple instances of asm code calling C functions without annotations of any
kind.

And vmread_error() isn't the only case where asmlinkage appears to be a burden, e.g.
schedule_tail_wrapper() => schedule_tail() seems to exist purely to deal with the
side affect of asmlinkage generating -regparm=0 on 32-bit kernels.
Uros Bizjak Aug. 31, 2022, 7:10 a.m. UTC | #2
On Wed, Aug 17, 2022 at 5:58 PM Sean Christopherson <seanjc@google.com> wrote:
>
> +PeterZ
>
> On Wed, Aug 17, 2022, Uros Bizjak wrote:
> > There is no need to declare vmread_error asmlinkage, its arguments
> > can be passed via registers for both, 32-bit and 64-bit targets.
> > Function argument registers are considered call-clobbered registers,
> > they are saved in the trampoline just before the function call and
> > restored afterwards.
>
> I'm officially confused.  What's the purpose of asmlinkage when used in the kernel?
> Is it some historical wart that's no longer truly necessary and only causes pain?
>
> When I wrote this code, I thought that the intent was that it should be applied to
> any and all asm => C function calls.  But that's obviously not required as there
> are multiple instances of asm code calling C functions without annotations of any
> kind.

It is the other way around. As written in coding-style.rst:

Large, non-trivial assembly functions should go in .S files, with corresponding
C prototypes defined in C header files.  The C prototypes for assembly
functions should use ``asmlinkage``.

So, prototypes for *assembly functions* should use asmlinkage.

That said, asmlinkage for i386 just switches ABI to the default
stack-passing ABI. However, we are calling assembly files, so the
argument handling in the callee is totally under our control and there
is no need to switch ABIs. It looks to me that besides syscalls,
asmlinkage is and should be used only for a large imported body of asm
functions that use standard stack-passing ABI (e.g. x86 crypto and
math-emu functions), otherwise it is just a burden to push and pop
registers to/from stack for no apparent benefit.

> And vmread_error() isn't the only case where asmlinkage appears to be a burden, e.g.
> schedule_tail_wrapper() => schedule_tail() seems to exist purely to deal with the
> side affect of asmlinkage generating -regparm=0 on 32-bit kernels.

schedule_tail is external to the x86 arch directory, and for some
reason marked asmlinkage. So, the call from asm must follow asmlinkage
ABI.

FYI, there is some discussion about asmlinkage on stackoverflow:

https://stackoverflow.com/questions/61635931/does-asmlinkage-mean-stack-or-registers

Uros.
Sean Christopherson Sept. 1, 2022, 3:37 p.m. UTC | #3
On Wed, Aug 31, 2022, Uros Bizjak wrote:
> On Wed, Aug 17, 2022 at 5:58 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > +PeterZ
> >
> > On Wed, Aug 17, 2022, Uros Bizjak wrote:
> > > There is no need to declare vmread_error asmlinkage, its arguments
> > > can be passed via registers for both, 32-bit and 64-bit targets.
> > > Function argument registers are considered call-clobbered registers,
> > > they are saved in the trampoline just before the function call and
> > > restored afterwards.
> >
> > I'm officially confused.  What's the purpose of asmlinkage when used in the kernel?
> > Is it some historical wart that's no longer truly necessary and only causes pain?
> >
> > When I wrote this code, I thought that the intent was that it should be applied to
> > any and all asm => C function calls.  But that's obviously not required as there
> > are multiple instances of asm code calling C functions without annotations of any
> > kind.
> 
> It is the other way around. As written in coding-style.rst:
> 
> Large, non-trivial assembly functions should go in .S files, with corresponding
> C prototypes defined in C header files.  The C prototypes for assembly
> functions should use ``asmlinkage``.
> 
> So, prototypes for *assembly functions* should use asmlinkage.

I gotta imagine that documentation is stale.  I don't understand why asmlinkage
would be a one-way thing.

> That said, asmlinkage for i386 just switches ABI to the default
> stack-passing ABI. However, we are calling assembly files, so the
> argument handling in the callee is totally under our control and there
> is no need to switch ABIs. It looks to me that besides syscalls,
> asmlinkage is and should be used only for a large imported body of asm
> functions that use standard stack-passing ABI (e.g. x86 crypto and
> math-emu functions), otherwise it is just a burden to push and pop
> registers to/from stack for no apparent benefit.

Yeah, this is what I'm confused about.  Unless there's something we're missing,
we should update the docs to clarify when asmlinkage is actually needed.

> > And vmread_error() isn't the only case where asmlinkage appears to be a burden, e.g.
> > schedule_tail_wrapper() => schedule_tail() seems to exist purely to deal with the
> > side affect of asmlinkage generating -regparm=0 on 32-bit kernels.
> 
> schedule_tail is external to the x86 arch directory, and for some
> reason marked asmlinkage. So, the call from asm must follow asmlinkage
> ABI.

Ahhh, it's a common helper that's called from assembly on other architectures.
That makes sense.

Thanks much!
Sean Christopherson Sept. 1, 2022, 3:39 p.m. UTC | #4
On Wed, Aug 17, 2022, Uros Bizjak wrote:
> There is no need to declare vmread_error asmlinkage, its arguments
> can be passed via registers for both, 32-bit and 64-bit targets.
> Function argument registers are considered call-clobbered registers,
> they are saved in the trampoline just before the function call and
> restored afterwards.
> 
> Note that asmlinkage and __attribute__((regparm(0))) have no effect
> on 64-bit targets. The trampoline is called from the assembler glue
> code that implements its own stack-passing function calling convention,
> so the attribute on the trampoline declaration does not change anything
> for 64-bit as well as 32-bit targets. We can declare it asmlinkage for
> documentation purposes.

...

> diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
> index 5cfc49ddb1b4..550a89394d9f 100644
> --- a/arch/x86/kvm/vmx/vmx_ops.h
> +++ b/arch/x86/kvm/vmx/vmx_ops.h
> @@ -10,9 +10,9 @@
>  #include "vmcs.h"
>  #include "../x86.h"
>  
> -asmlinkage void vmread_error(unsigned long field, bool fault);
> -__attribute__((regparm(0))) void vmread_error_trampoline(unsigned long field,
> -							 bool fault);
> +void vmread_error(unsigned long field, bool fault);
> +asmlinkage void vmread_error_trampoline(unsigned long field,
> +					bool fault);
>  void vmwrite_error(unsigned long field, unsigned long value);
>  void vmclear_error(struct vmcs *vmcs, u64 phys_addr);
>  void vmptrld_error(struct vmcs *vmcs, u64 phys_addr);

If it's ok with you, I'll split this into two patches.  One to drop asmlinkage
from vmread_error(), and one to convert the open coded regparm to asmlinkage.
Uros Bizjak Sept. 1, 2022, 5:29 p.m. UTC | #5
On Thu, Sep 1, 2022 at 5:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Aug 17, 2022, Uros Bizjak wrote:
> > There is no need to declare vmread_error asmlinkage, its arguments
> > can be passed via registers for both, 32-bit and 64-bit targets.
> > Function argument registers are considered call-clobbered registers,
> > they are saved in the trampoline just before the function call and
> > restored afterwards.
> >
> > Note that asmlinkage and __attribute__((regparm(0))) have no effect
> > on 64-bit targets. The trampoline is called from the assembler glue
> > code that implements its own stack-passing function calling convention,
> > so the attribute on the trampoline declaration does not change anything
> > for 64-bit as well as 32-bit targets. We can declare it asmlinkage for
> > documentation purposes.
>
> ...
>
> > diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
> > index 5cfc49ddb1b4..550a89394d9f 100644
> > --- a/arch/x86/kvm/vmx/vmx_ops.h
> > +++ b/arch/x86/kvm/vmx/vmx_ops.h
> > @@ -10,9 +10,9 @@
> >  #include "vmcs.h"
> >  #include "../x86.h"
> >
> > -asmlinkage void vmread_error(unsigned long field, bool fault);
> > -__attribute__((regparm(0))) void vmread_error_trampoline(unsigned long field,
> > -                                                      bool fault);
> > +void vmread_error(unsigned long field, bool fault);
> > +asmlinkage void vmread_error_trampoline(unsigned long field,
> > +                                     bool fault);
> >  void vmwrite_error(unsigned long field, unsigned long value);
> >  void vmclear_error(struct vmcs *vmcs, u64 phys_addr);
> >  void vmptrld_error(struct vmcs *vmcs, u64 phys_addr);
>
> If it's ok with you, I'll split this into two patches.  One to drop asmlinkage
> from vmread_error(), and one to convert the open coded regparm to asmlinkage.

Sure, please go ahead.

Thanks,
Uros.
Wang, Wei W Sept. 6, 2022, 7:28 a.m. UTC | #6
On Thursday, September 1, 2022 11:37 PM, Sean Christopherson wrote:
> > > And vmread_error() isn't the only case where asmlinkage appears to be a
> burden, e.g.
> > > schedule_tail_wrapper() => schedule_tail() seems to exist purely to
> > > deal with the side affect of asmlinkage generating -regparm=0 on 32-bit
> kernels.
> >
> > schedule_tail is external to the x86 arch directory, and for some
> > reason marked asmlinkage. So, the call from asm must follow asmlinkage
> > ABI.
> 
> Ahhh, it's a common helper that's called from assembly on other architectures.
> That makes sense.

I still doubt the necessity. The compilation is architecture specific, and we don't
build one architecture-agnostic kernel binary to run on different architectures,
right?

Thanks,
Wei
Sean Christopherson Sept. 8, 2022, 3:47 p.m. UTC | #7
On Tue, Sep 06, 2022, Wang, Wei W wrote:
> On Thursday, September 1, 2022 11:37 PM, Sean Christopherson wrote:
> > > > And vmread_error() isn't the only case where asmlinkage appears to be a
> > burden, e.g.
> > > > schedule_tail_wrapper() => schedule_tail() seems to exist purely to
> > > > deal with the side affect of asmlinkage generating -regparm=0 on 32-bit
> > kernels.
> > >
> > > schedule_tail is external to the x86 arch directory, and for some
> > > reason marked asmlinkage. So, the call from asm must follow asmlinkage
> > > ABI.
> > 
> > Ahhh, it's a common helper that's called from assembly on other architectures.
> > That makes sense.
> 
> I still doubt the necessity. The compilation is architecture specific, and we don't
> build one architecture-agnostic kernel binary to run on different architectures,
> right?

Right, it's not strictly necessary, e.g. wrapping schedule_tail()'s asmlinkage in
"#ifndef CONFIG_X86" would allow for the removal of schedule_tail_wrapper().  But
that's arguably worse than forcing i386 to use a wrapper given that the few extra
instructions are unlikely to add meaningful overhead, and since i386 is a rather
uncommon configuration these days.
Sean Christopherson Sept. 8, 2022, 5:23 p.m. UTC | #8
On Thu, Sep 01, 2022, Uros Bizjak wrote:
> On Thu, Sep 1, 2022 at 5:40 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Aug 17, 2022, Uros Bizjak wrote:
> > > There is no need to declare vmread_error asmlinkage, its arguments
> > > can be passed via registers for both, 32-bit and 64-bit targets.
> > > Function argument registers are considered call-clobbered registers,
> > > they are saved in the trampoline just before the function call and
> > > restored afterwards.
> > >
> > > Note that asmlinkage and __attribute__((regparm(0))) have no effect
> > > on 64-bit targets. The trampoline is called from the assembler glue
> > > code that implements its own stack-passing function calling convention,
> > > so the attribute on the trampoline declaration does not change anything
> > > for 64-bit as well as 32-bit targets. We can declare it asmlinkage for
> > > documentation purposes.
> >
> > ...
> >
> > > diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
> > > index 5cfc49ddb1b4..550a89394d9f 100644
> > > --- a/arch/x86/kvm/vmx/vmx_ops.h
> > > +++ b/arch/x86/kvm/vmx/vmx_ops.h
> > > @@ -10,9 +10,9 @@
> > >  #include "vmcs.h"
> > >  #include "../x86.h"
> > >
> > > -asmlinkage void vmread_error(unsigned long field, bool fault);
> > > -__attribute__((regparm(0))) void vmread_error_trampoline(unsigned long field,
> > > -                                                      bool fault);
> > > +void vmread_error(unsigned long field, bool fault);
> > > +asmlinkage void vmread_error_trampoline(unsigned long field,
> > > +                                     bool fault);
> > >  void vmwrite_error(unsigned long field, unsigned long value);
> > >  void vmclear_error(struct vmcs *vmcs, u64 phys_addr);
> > >  void vmptrld_error(struct vmcs *vmcs, u64 phys_addr);
> >
> > If it's ok with you, I'll split this into two patches.  One to drop asmlinkage
> > from vmread_error(), and one to convert the open coded regparm to asmlinkage.
> 
> Sure, please go ahead.

On second thought, even though "__attribute__((regparm(0)))" doesn't actually do
anything for 64-bit targets, I'd prefer to keep the open coded weirdness _because_
the whole thing is open coded weirdness.  The attribute isn't strictly necessary
for 32-bit targets either since the CALL is emitted from inline assembly.  

I now remember that I added the explicit regparm(0) to try and document that
vmread_error_trampoline() _always_ passes params on the stack, even for 64-bit
targets, i.e. even if "asmlinkage" is a nop.

Alternatively, given that the trampoline exists purely to support inline asm, i.e.
should never be called from C code in any circumstance, what about turning the
function declaration into an opaque symbol and then writing a proper comment.

That way, attempting to invoke vmread_error_trampoline() from C yields:

arch/x86/kernel/../kvm/vmx/vmx_ops.h: In function ‘__vmcs_readl’:
arch/x86/kernel/../kvm/vmx/vmx_ops.h:113:2: error: called object ‘vmread_error_trampoline’ is not a function or function pointer
  113 |  vmread_error_trampoline(field, false);
      |  ^~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kernel/../kvm/vmx/vmx_ops.h:33:22: note: declared here
   33 | extern unsigned long vmread_error_trampoline;
      |                      ^~~~~~~~~~~~~~~~~~~~~~~
---
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 8 Sep 2022 10:17:40 -0700
Subject: [PATCH] KVM: VMX: Make vmread_error_trampoline() uncallable from C
 code

Declare vmread_error_trampoline() as an opaque symbol so that it cannot
be called from C code, at least not without some serious fudging.  The
trampoline always passes parameters on the stack so that the inline
VMREAD sequence doesn't need to clobber registers.  regparm(0) was
originally added to document the stack behavior, but it ended up being
confusing because regparm(0) is a nop for 64-bit targets.

Opportunustically wrap the trampoline and its declaration in #ifdeffery
to make it even harder to invoke incorrectly, to document why it exists,
and so that it's not left behind if/when CONFIG_CC_HAS_ASM_GOTO_OUTPUT
is true for all supported toolchains.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmenter.S |  2 ++
 arch/x86/kvm/vmx/vmx_ops.h | 18 ++++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 8477d8bdd69c..24c54577ac84 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -269,6 +269,7 @@ SYM_FUNC_END(__vmx_vcpu_run)
 
 .section .text, "ax"
 
+#ifndef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 /**
  * vmread_error_trampoline - Trampoline from inline asm to vmread_error()
  * @field:	VMCS field encoding that failed
@@ -317,6 +318,7 @@ SYM_FUNC_START(vmread_error_trampoline)
 
 	RET
 SYM_FUNC_END(vmread_error_trampoline)
+#endif
 
 SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
 	/*
diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
index ec268df83ed6..7ea99e6b4908 100644
--- a/arch/x86/kvm/vmx/vmx_ops.h
+++ b/arch/x86/kvm/vmx/vmx_ops.h
@@ -11,14 +11,28 @@
 #include "../x86.h"
 
 void vmread_error(unsigned long field, bool fault);
-__attribute__((regparm(0))) void vmread_error_trampoline(unsigned long field,
-							 bool fault);
 void vmwrite_error(unsigned long field, unsigned long value);
 void vmclear_error(struct vmcs *vmcs, u64 phys_addr);
 void vmptrld_error(struct vmcs *vmcs, u64 phys_addr);
 void invvpid_error(unsigned long ext, u16 vpid, gva_t gva);
 void invept_error(unsigned long ext, u64 eptp, gpa_t gpa);
 
+#ifndef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
+/*
+ * The VMREAD error trampoline _always_ uses the stack to pass parameters, even
+ * for 64-bit targets.  Preserving all registers allows the VMREAD inline asm
+ * blob to avoid clobbering GPRs, which in turn allows the compiler to better
+ * optimize sequences of VMREADs.
+ *
+ * Declare trampoline as an opaque label as it's not safe to call from C code;
+ * there is no way to tell the compiler to pass params on the stack for 64-bit
+ * targets.
+ *
+ * void vmread_error_trampoline(unsigned long field, bool fault);
+ */
+extern unsigned long vmread_error_trampoline;
+#endif
+
 static __always_inline void vmcs_check16(unsigned long field)
 {
 	BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2000,

base-commit: d2a22504d86e106c63236e4d6a085c2ac91bfa73
--
Sean Christopherson Sept. 8, 2022, 5:25 p.m. UTC | #9
On Wed, Aug 17, 2022, Uros Bizjak wrote:
> There is no need to declare vmread_error asmlinkage, its arguments
> can be passed via registers for both, 32-bit and 64-bit targets.
> Function argument registers are considered call-clobbered registers,
> they are saved in the trampoline just before the function call and
> restored afterwards.
> 
> Note that asmlinkage and __attribute__((regparm(0))) have no effect
> on 64-bit targets. The trampoline is called from the assembler glue
> code that implements its own stack-passing function calling convention,
> so the attribute on the trampoline declaration does not change anything
> for 64-bit as well as 32-bit targets. We can declare it asmlinkage for
> documentation purposes.
> 
> The patch unifies trampoline function argument handling between 32-bit
> and 64-bit targets and improves generated code for 32-bit targets.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> ---

Minus the vmread_error_trampoline() change, pushed to branch `for_paolo/6.1` at:

    https://github.com/sean-jc/linux.git

Unless you hear otherwise, it will make its way to kvm/queue "soon".

Note, the commit IDs are not guaranteed to be stable.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 6de96b943804..2b83bab6e371 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -293,22 +293,13 @@  SYM_FUNC_START(vmread_error_trampoline)
 	push %r10
 	push %r11
 #endif
-#ifdef CONFIG_X86_64
+
 	/* Load @field and @fault to arg1 and arg2 respectively. */
-	mov 3*WORD_SIZE(%rbp), %_ASM_ARG2
-	mov 2*WORD_SIZE(%rbp), %_ASM_ARG1
-#else
-	/* Parameters are passed on the stack for 32-bit (see asmlinkage). */
-	push 3*WORD_SIZE(%ebp)
-	push 2*WORD_SIZE(%ebp)
-#endif
+	mov 3*WORD_SIZE(%_ASM_BP), %_ASM_ARG2
+	mov 2*WORD_SIZE(%_ASM_BP), %_ASM_ARG1
 
 	call vmread_error
 
-#ifndef CONFIG_X86_64
-	add $8, %esp
-#endif
-
 	/* Zero out @fault, which will be popped into the result register. */
 	_ASM_MOV $0, 3*WORD_SIZE(%_ASM_BP)
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d7f8331d6f7e..c940688ceaa4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -439,7 +439,7 @@  do {					\
 	pr_warn_ratelimited(fmt);	\
 } while (0)
 
-asmlinkage void vmread_error(unsigned long field, bool fault)
+void vmread_error(unsigned long field, bool fault)
 {
 	if (fault)
 		kvm_spurious_fault();
diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
index 5cfc49ddb1b4..550a89394d9f 100644
--- a/arch/x86/kvm/vmx/vmx_ops.h
+++ b/arch/x86/kvm/vmx/vmx_ops.h
@@ -10,9 +10,9 @@ 
 #include "vmcs.h"
 #include "../x86.h"
 
-asmlinkage void vmread_error(unsigned long field, bool fault);
-__attribute__((regparm(0))) void vmread_error_trampoline(unsigned long field,
-							 bool fault);
+void vmread_error(unsigned long field, bool fault);
+asmlinkage void vmread_error_trampoline(unsigned long field,
+					bool fault);
 void vmwrite_error(unsigned long field, unsigned long value);
 void vmclear_error(struct vmcs *vmcs, u64 phys_addr);
 void vmptrld_error(struct vmcs *vmcs, u64 phys_addr);