Message ID | 20200105164801.26278-3-liuwe@microsoft.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | More Hyper-V infrastructure | expand |
> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, paddr_t output) > +{ > + uint64_t status; > + > + asm volatile ("mov %[output], %%r8\n" > + "call hv_hypercall_page" > + : "=a" (status), "+c" (control), > + "+d" (input) ASM_CALL_CONSTRAINT > + : [output] "rm" (output) > + : "cc", "memory", "r8", "r9", "r10", "r11"); I think you want: register unsigned long r8 asm("r8") = output; and "+r" (r8) as an output constraint. In particular, that doesn't force the compiler to put output into a register other than r8 (or worse, spill it to the stack) to have the opaque blob of asm move it back into r8. What it will do in practice is cause the compiler to construct output directly in r8. As for the other clobbers, I can't find anything at all in the spec which even mentions those registers. There will be a decent improvement to code generation if we don't force them to be spilled around a hypercall. However, HyperV will(may?) modify %xmm{0..5} in some cases, and this will corrupt the current vcpu's FPU context. Care will need to be taken to spill these if necessary. ~Andrew
On Sun, Jan 05, 2020 at 07:08:28PM +0000, Andrew Cooper wrote: > > > +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, paddr_t output) > > +{ > > + uint64_t status; > > + > > + asm volatile ("mov %[output], %%r8\n" > > + "call hv_hypercall_page" > > + : "=a" (status), "+c" (control), > > + "+d" (input) ASM_CALL_CONSTRAINT > > + : [output] "rm" (output) > > + : "cc", "memory", "r8", "r9", "r10", "r11"); > > I think you want: > > register unsigned long r8 asm("r8") = output; > > and "+r" (r8) as an output constraint. Although it is named `output`, it is really just an input parameter from Hyper-V's PoV. It contains the address of the page Hyper-V should write its output to. > > In particular, that doesn't force the compiler to put output into a > register other than r8 (or worse, spill it to the stack) to have the > opaque blob of asm move it back into r8. What it will do in practice is > cause the compiler to construct output directly in r8. > > As for the other clobbers, I can't find anything at all in the spec > which even mentions those registers. There will be a decent improvement > to code generation if we don't force them to be spilled around a hypercall. > Neither can I. But Linux's commit says that's needed, so I chose to err on the safe side. I'm fine with dropping r9-r11. > However, HyperV will(may?) modify %xmm{0..5} in some cases, and this > will corrupt the current vcpu's FPU context. Care will need to be taken > to spill these if necessary. > The hypercalls we care about (TLB, APIC etc) don't use that ABI as far as I can tell. At the very least Linux, which uses the same set of hypercalls, doesn't need to save / restore XMM registers around the calls. Wei. > ~Andrew
On 05/01/2020 21:22, Wei Liu wrote: > On Sun, Jan 05, 2020 at 07:08:28PM +0000, Andrew Cooper wrote: >>> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, paddr_t output) >>> +{ >>> + uint64_t status; >>> + >>> + asm volatile ("mov %[output], %%r8\n" >>> + "call hv_hypercall_page" >>> + : "=a" (status), "+c" (control), >>> + "+d" (input) ASM_CALL_CONSTRAINT >>> + : [output] "rm" (output) >>> + : "cc", "memory", "r8", "r9", "r10", "r11"); >> I think you want: >> >> register unsigned long r8 asm("r8") = output; >> >> and "+r" (r8) as an output constraint. > Although it is named `output`, it is really just an input parameter from > Hyper-V's PoV. Yes, but it is also clobbered. This is an awkward corner case of gnu inline assembly. It is not permitted to have a clobber list overlap with any input/output operations, and because r8 doesn't have a unique letter, you can't do the usual trick of "=r8" (discard) : "r8" (input). The only available option is to mark it as read and written (which is "+r" in the output list), and not use the C variable r8 at any point later. Having looked through the spec a bit more, is this a wise API to have in the first place? input and output (perhaps better named input_addr and output_addr) are fixed per CPU, and control is semantically linked to the hypercall and its particular ABI. I suppose the answer ultimately depends on what the callers look like. > >> In particular, that doesn't force the compiler to put output into a >> register other than r8 (or worse, spill it to the stack) to have the >> opaque blob of asm move it back into r8. What it will do in practice is >> cause the compiler to construct output directly in r8. >> >> As for the other clobbers, I can't find anything at all in the spec >> which even mentions those registers. There will be a decent improvement >> to code generation if we don't force them to be spilled around a hypercall. >> > Neither can I. But Linux's commit says that's needed, so I chose to err > on the safe side. That's dull. Is there any qualifying information? >> However, HyperV will(may?) modify %xmm{0..5} in some cases, and this >> will corrupt the current vcpu's FPU context. Care will need to be taken >> to spill these if necessary. >> > The hypercalls we care about (TLB, APIC etc) don't use that ABI as far > as I can tell. At the very least Linux, which uses the same set of > hypercalls, doesn't need to save / restore XMM registers around the > calls. Ok - it looks to be fine until we need to use a hypercall which uses the fast extended ABI, which is the first to introduce the use of the %xmm registers. ~Andrew
On 05.01.2020 17:47, Wei Liu wrote: > +static inline uint64_t hv_do_fast_hypercall(uint16_t code, > + uint64_t input1, uint64_t input2) > +{ > + uint64_t status; > + uint64_t control = (uint64_t)code | HV_HYPERCALL_FAST_BIT; Unnecessary (afaict) cast. > + asm volatile ("mov %[input2], %%r8\n" > + "call hv_hypercall_page" > + : "=a" (status), "+c" (control), > + "+d" (input1) ASM_CALL_CONSTRAINT > + : [input2] "rm" (input2) > + : "cc", "r8", "r9", "r10", "r11"); > + > + return status; > +} > + > +static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t rep_count, > + uint16_t varhead_size, > + paddr_t input, paddr_t output) > +{ > + uint64_t control = code; > + uint64_t status; > + uint16_t rep_comp; > + > + control |= (uint64_t)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET; > + control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET; > + > + do { > + status = hv_do_hypercall(control, input, output); > + if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS ) > + break; > + > + rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >> > + HV_HYPERCALL_REP_COMP_OFFSET; MASK_EXTR()? (I then also wonder whether MASK_INSR() would better be used with some of the other constructs here.) What's worse though - looking at the definition of HV_HYPERCALL_REP_COMP_MASK I notice that it and a few others use GENMASK_ULL(), when it was clearly said during review (perhaps of another but related patch) that this macro should not be used outside of Arm-specific code until it gets put into better shape: https://lists.xenproject.org/archives/html/xen-devel/2019-12/msg00705.html > + control &= ~HV_HYPERCALL_REP_START_MASK; > + control |= (uint64_t)rep_comp << HV_HYPERCALL_REP_COMP_OFFSET; > + > + } while ( rep_comp < rep_count ); We don't normally have a blank line ahead of the closing brace of a block. Jan
On Sun, Jan 05, 2020 at 10:06:08PM +0000, Andrew Cooper wrote: > On 05/01/2020 21:22, Wei Liu wrote: > > On Sun, Jan 05, 2020 at 07:08:28PM +0000, Andrew Cooper wrote: > >>> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, paddr_t output) > >>> +{ > >>> + uint64_t status; > >>> + > >>> + asm volatile ("mov %[output], %%r8\n" > >>> + "call hv_hypercall_page" > >>> + : "=a" (status), "+c" (control), > >>> + "+d" (input) ASM_CALL_CONSTRAINT > >>> + : [output] "rm" (output) > >>> + : "cc", "memory", "r8", "r9", "r10", "r11"); > >> I think you want: > >> > >> register unsigned long r8 asm("r8") = output; > >> > >> and "+r" (r8) as an output constraint. > > Although it is named `output`, it is really just an input parameter from > > Hyper-V's PoV. > > Yes, but it is also clobbered. > > This is an awkward corner case of gnu inline assembly. > > It is not permitted to have a clobber list overlap with any input/output > operations, and because r8 doesn't have a unique letter, you can't do > the usual trick of "=r8" (discard) : "r8" (input). > > The only available option is to mark it as read and written (which is > "+r" in the output list), and not use the C variable r8 at any point later. But r8 is only listed in clobber list, so it certainly doesn't overlap with any input register. I fail to see what the bug (if there is any) is here. I think what you're asking for here is an optimisation. Is that correct? I don't mind changing the code. What I need is clarification here. > > > Having looked through the spec a bit more, is this a wise API to have in > the first place? input and output (perhaps better named input_addr and > output_addr) are fixed per CPU, and control is semantically linked to > the hypercall and its particular ABI. > > I suppose the answer ultimately depends on what the callers look like. The call sites will be like struct hv_input_arg *input_arg; input_arg = per_cpu_input_page; input_arg.foo = xxx; input_arg.bar = xxx; hv_do_hypercall(control, virt_to_maddr(input_arg), NULL); . (Alternatively, we can put virt_to_maddr in hv_do_hypercall now that we're sure the input page is from xenheap) > > > > >> In particular, that doesn't force the compiler to put output into a > >> register other than r8 (or worse, spill it to the stack) to have the > >> opaque blob of asm move it back into r8. What it will do in practice is > >> cause the compiler to construct output directly in r8. > >> > >> As for the other clobbers, I can't find anything at all in the spec > >> which even mentions those registers. There will be a decent improvement > >> to code generation if we don't force them to be spilled around a hypercall. > >> > > Neither can I. But Linux's commit says that's needed, so I chose to err > > on the safe side. > > That's dull. Is there any qualifying information? See Linux commit fc53662f13b. I will also ask my contact in Hyper-V team for clarification. Wei.
On Mon, Jan 06, 2020 at 10:38:23AM +0100, Jan Beulich wrote: [...] > > + > > +static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t rep_count, > > + uint16_t varhead_size, > > + paddr_t input, paddr_t output) > > +{ > > + uint64_t control = code; > > + uint64_t status; > > + uint16_t rep_comp; > > + > > + control |= (uint64_t)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET; > > + control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET; > > + > > + do { > > + status = hv_do_hypercall(control, input, output); > > + if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS ) > > + break; > > + > > + rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >> > > + HV_HYPERCALL_REP_COMP_OFFSET; > > MASK_EXTR()? (I then also wonder whether MASK_INSR() would better be > used with some of the other constructs here.) Sure, I can see if that can be used. > > What's worse though - looking at the definition of > HV_HYPERCALL_REP_COMP_MASK I notice that it and a few others use > GENMASK_ULL(), when it was clearly said during review (perhaps of > another but related patch) that this macro should not be used > outside of Arm-specific code until it gets put into better shape: > https://lists.xenproject.org/archives/html/xen-devel/2019-12/msg00705.html That's a straight import from Linux. I only made the header build without further inspection. That can be fixed, of course. Wei.
On Sun, Jan 05, 2020 at 07:08:28PM +0000, Andrew Cooper wrote: > > > +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, paddr_t output) > > +{ > > + uint64_t status; > > + > > + asm volatile ("mov %[output], %%r8\n" > > + "call hv_hypercall_page" > > + : "=a" (status), "+c" (control), > > + "+d" (input) ASM_CALL_CONSTRAINT > > + : [output] "rm" (output) > > + : "cc", "memory", "r8", "r9", "r10", "r11"); > > I think you want: > > register unsigned long r8 asm("r8") = output; > > and "+r" (r8) as an output constraint. > > In particular, that doesn't force the compiler to put output into a > register other than r8 (or worse, spill it to the stack) to have the > opaque blob of asm move it back into r8. What it will do in practice is > cause the compiler to construct output directly in r8. > > As for the other clobbers, I can't find anything at all in the spec > which even mentions those registers. There will be a decent improvement > to code generation if we don't force them to be spilled around a hypercall. This is actually from Windows 2012 R2's TLFS. Current version of Hyper-V doesn't clobber those registers anymore. I think just putting "memory" there should be fine. Wei.
On 07/01/2020 16:17, Wei Liu wrote: > On Sun, Jan 05, 2020 at 10:06:08PM +0000, Andrew Cooper wrote: >> On 05/01/2020 21:22, Wei Liu wrote: >>> On Sun, Jan 05, 2020 at 07:08:28PM +0000, Andrew Cooper wrote: >>>>> +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, paddr_t output) >>>>> +{ >>>>> + uint64_t status; >>>>> + >>>>> + asm volatile ("mov %[output], %%r8\n" >>>>> + "call hv_hypercall_page" >>>>> + : "=a" (status), "+c" (control), >>>>> + "+d" (input) ASM_CALL_CONSTRAINT >>>>> + : [output] "rm" (output) >>>>> + : "cc", "memory", "r8", "r9", "r10", "r11"); >>>> I think you want: >>>> >>>> register unsigned long r8 asm("r8") = output; >>>> >>>> and "+r" (r8) as an output constraint. >>> Although it is named `output`, it is really just an input parameter from >>> Hyper-V's PoV. >> Yes, but it is also clobbered. >> >> This is an awkward corner case of gnu inline assembly. >> >> It is not permitted to have a clobber list overlap with any input/output >> operations, and because r8 doesn't have a unique letter, you can't do >> the usual trick of "=r8" (discard) : "r8" (input). >> >> The only available option is to mark it as read and written (which is >> "+r" in the output list), and not use the C variable r8 at any point later. > But r8 is only listed in clobber list, so it certainly doesn't overlap > with any input register. I fail to see what the bug (if there is any) is > here. %r8 is an input parameter. You have "mov %[output], %%r8" in the asm. The way this is written causes the compiler to construct %[output] in a register, pass it in via the sole input parameter, and behind the scenes move it into %r8. There is a small chance of the emitted asm being "mov %r8, %r8", and a much larger chance of the compiler being forced to spill a different register when it could have used %r8 in the first place. > I think what you're asking for here is an optimisation. Is that correct? > I don't mind changing the code. What I need is clarification here. I'm on the fence as to whether to put in the category of optimisation, or buggy asm. I think the generated code will DTRT, but it is a suspicious looking piece of assembly, so "optimisation" I guess? ~Andrew
diff --git a/xen/include/asm-x86/guest/hyperv-hcall.h b/xen/include/asm-x86/guest/hyperv-hcall.h new file mode 100644 index 0000000000..4b87dcfe64 --- /dev/null +++ b/xen/include/asm-x86/guest/hyperv-hcall.h @@ -0,0 +1,95 @@ +/****************************************************************************** + * asm-x86/guest/hyperv-hcall.h + * + * This program is free software; you can redistribute it and/or + * modify it under the terms and conditions of the GNU General Public + * License, version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; If not, see <http://www.gnu.org/licenses/>. + * + * Copyright (c) 2019 Microsoft. + */ + +#ifndef __X86_HYPERV_HCALL_H__ +#define __X86_HYPERV_HCALL_H__ + +#include <xen/types.h> + +#include <asm/asm_defns.h> +#include <asm/guest/hyperv-tlfs.h> +#include <asm/page.h> + +static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input, paddr_t output) +{ + uint64_t status; + + asm volatile ("mov %[output], %%r8\n" + "call hv_hypercall_page" + : "=a" (status), "+c" (control), + "+d" (input) ASM_CALL_CONSTRAINT + : [output] "rm" (output) + : "cc", "memory", "r8", "r9", "r10", "r11"); + + return status; +} + +static inline uint64_t hv_do_fast_hypercall(uint16_t code, + uint64_t input1, uint64_t input2) +{ + uint64_t status; + uint64_t control = (uint64_t)code | HV_HYPERCALL_FAST_BIT; + + asm volatile ("mov %[input2], %%r8\n" + "call hv_hypercall_page" + : "=a" (status), "+c" (control), + "+d" (input1) ASM_CALL_CONSTRAINT + : [input2] "rm" (input2) + : "cc", "r8", "r9", "r10", "r11"); + + return status; +} + +static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t rep_count, + uint16_t varhead_size, + paddr_t input, paddr_t output) +{ + uint64_t control = code; + uint64_t status; + uint16_t rep_comp; + + control |= (uint64_t)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET; + control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET; + + do { + status = hv_do_hypercall(control, input, output); + if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS ) + break; + + rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >> + HV_HYPERCALL_REP_COMP_OFFSET; + + control &= ~HV_HYPERCALL_REP_START_MASK; + control |= (uint64_t)rep_comp << HV_HYPERCALL_REP_COMP_OFFSET; + + } while ( rep_comp < rep_count ); + + return status; +} + +#endif /* __X86_HYPERV_HCALL_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */
These functions will be used later to make hypercalls to Hyper-V. I couldn't find reference in TLFS that Hyper-V clobbers flags and r9-r11, but Linux's commit message says it does. Err on the safe side. Signed-off-by: Wei Liu <liuwe@microsoft.com> --- v3: 1. Name the file hyperv-hcall.h v2: 1. Use direct call --- xen/include/asm-x86/guest/hyperv-hcall.h | 95 ++++++++++++++++++++++++ 1 file changed, 95 insertions(+) create mode 100644 xen/include/asm-x86/guest/hyperv-hcall.h