Message ID | 20190805124301.12887-3-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/boot: Cleanup | expand |
On 05.08.2019 14:42, Andrew Cooper wrote: > Split up the long asm block by commenting the logical subsections. > > The movabs for obtaining __start_xen can be a rip-relative lea instead. This > has the added advantage that objdump can now cross reference it during > disassembly. I'm surprised this works, but I take it that you've tested it: At the time the asm() executes, I thought we're still in (what EFI calls) physical mode, i.e. %rip holding a value less than 4Gb. Accessing memory using %rip-relative addressing is fine, since the Xen image is mapped in both places, but obtaining the new linear address for %rip this way via lea should not be, as this wouldn't move us to the XEN_VIRT_{START,END} range. I'm curious to learn which part of my understanding is wrong here. > The stack handing is confusing to follow. %rsp is set up by reading > stack_start which is a pointer to cpu0_stack, then constructing an lret frame > under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack), and uses > the Pascal form of lret to move %rsp to the base of cpu0_stack. > > Remove stack_start from the mix and use a single lea to load cpu0_stack base > directly, I disagree with this change, at least as long as xen/arch/x86/boot/x86_64.S also reads from stack_start, rather than accessing cpu0_stack directly. The code here is intended to mirror what's being done on the non-EFI path. And it was always my understanding that it's done this way such that one would need to go hunt for uses if one wanted to change what (right now) stack_start points at during pre-SMP boot. Otherwise stack_start wouldn't need an initializer anymore, and hence could move to .bss. > and use the more common push/push/lretq sequence for reloading %cs. I don't see what's wrong with what you call "Pascal form" of lret (C's __stdcall uses this as well, for example). I don't heavily mind this transformation, but (I'm sorry to say that) it looks to me as if this was a change for the sake of changing the code, not for making it any "better" (for whatever definition of "better"). Jan
On 06/08/2019 16:20, Jan Beulich wrote: > On 05.08.2019 14:42, Andrew Cooper wrote: >> Split up the long asm block by commenting the logical subsections. >> >> The movabs for obtaining __start_xen can be a rip-relative lea >> instead. This >> has the added advantage that objdump can now cross reference it during >> disassembly. > > I'm surprised this works, but I take it that you've tested it: So I did specifically test it, but it now occurs to me that the test I did was via the MB2 64-bit EFI path, which isn't this path. /sigh > At the time the asm() executes, I thought we're still in (what EFI > calls) physical mode, i.e. %rip holding a value less than 4Gb. In which case, what is the point of using a file format which does identify the virtual address it would prefer to run at... (This is a rhetorical question. The answer is "because EFI seems to always do the unhelpful thing, given the choice".) > Accessing memory using %rip-relative addressing is fine, since > the Xen image is mapped in both places, but obtaining the new > linear address for %rip this way via lea should not be, as this > wouldn't move us to the XEN_VIRT_{START,END} range. I'm curious > to learn which part of my understanding is wrong here. > >> The stack handing is confusing to follow. %rsp is set up by reading >> stack_start which is a pointer to cpu0_stack, then constructing an >> lret frame >> under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack), >> and uses >> the Pascal form of lret to move %rsp to the base of cpu0_stack. >> >> Remove stack_start from the mix and use a single lea to load >> cpu0_stack base >> directly, > > I disagree with this change, at least as long as > xen/arch/x86/boot/x86_64.S also reads from stack_start, rather than > accessing cpu0_stack directly. That doesn't mean that a) its conceptually the right thing to do ... > The code here is intended to mirror > what's being done on the non-EFI path. And it was always my > understanding that it's done this way such that one would need to go > hunt for uses if one wanted to change what (right now) stack_start > points at during pre-SMP boot. Otherwise stack_start wouldn't need > an initializer anymore, and hence could move to .bss. ... or b) that I have any intention of letting stack_start survive. Specifically, it is an unnecessary point of serialisation for APs, which needs to disappear. cpu0_stack is where cpu0 should have its stack, and this path is exclusive to cpu0. > >> and use the more common push/push/lretq sequence for reloading %cs. > > I don't see what's wrong with what you call "Pascal form" of lret > (C's __stdcall uses this as well, for example). I'm afraid that this statement clearly highlights the problem I'm trying to solve. > I don't heavily > mind this transformation, but (I'm sorry to say that) it looks to > me as if this was a change for the sake of changing the code, not > for making it any "better" (for whatever definition of "better"). It really doesn't matter if you can follow the code, or whether I can follow it when I've double checked the instruction behaviour because, while I'm aware this form exists, frankly I'm a little rusty on Pascal it having ceased to be a dominant programming language before I was born... The easier code is to follow, the easier time other people will have to debug and develop on it, and the healthier the Xen community will be as result. ~Andrew
On 07.08.2019 12:33, Andrew Cooper wrote: > On 06/08/2019 16:20, Jan Beulich wrote: >> On 05.08.2019 14:42, Andrew Cooper wrote: >>> Split up the long asm block by commenting the logical subsections. >>> >>> The movabs for obtaining __start_xen can be a rip-relative lea >>> instead. This >>> has the added advantage that objdump can now cross reference it during >>> disassembly. >> >> I'm surprised this works, but I take it that you've tested it: > > So I did specifically test it, but it now occurs to me that the test I > did was via the MB2 64-bit EFI path, which isn't this path. /sigh > >> At the time the asm() executes, I thought we're still in (what EFI >> calls) physical mode, i.e. %rip holding a value less than 4Gb. > > In which case, what is the point of using a file format which does > identify the virtual address it would prefer to run at... > > (This is a rhetorical question. The answer is "because EFI seems to > always do the unhelpful thing, given the choice".) Not a rhetorical question at all. As said - the pre-OS environment is a physical one, hence relocating binaries to their preferred addresses may (and with the addresses we use definitely is) not possible. Hence them relocating images by default. >> Accessing memory using %rip-relative addressing is fine, since >> the Xen image is mapped in both places, but obtaining the new >> linear address for %rip this way via lea should not be, as this >> wouldn't move us to the XEN_VIRT_{START,END} range. I'm curious >> to learn which part of my understanding is wrong here. >> >>> The stack handing is confusing to follow. %rsp is set up by reading >>> stack_start which is a pointer to cpu0_stack, then constructing an >>> lret frame >>> under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack), >>> and uses >>> the Pascal form of lret to move %rsp to the base of cpu0_stack. >>> >>> Remove stack_start from the mix and use a single lea to load >>> cpu0_stack base >>> directly, >> >> I disagree with this change, at least as long as >> xen/arch/x86/boot/x86_64.S also reads from stack_start, rather than >> accessing cpu0_stack directly. > > That doesn't mean that a) its conceptually the right thing to do ... > >> The code here is intended to mirror >> what's being done on the non-EFI path. And it was always my >> understanding that it's done this way such that one would need to go >> hunt for uses if one wanted to change what (right now) stack_start >> points at during pre-SMP boot. Otherwise stack_start wouldn't need >> an initializer anymore, and hence could move to .bss. > > ... or b) that I have any intention of letting stack_start survive. > Specifically, it is an unnecessary point of serialisation for APs, which > needs to disappear. > > cpu0_stack is where cpu0 should have its stack, and this path is > exclusive to cpu0. And I'd be okay (but not enthusiastic) to see the other CPU0 use disappear at the same time (same series at least, not necessarily same patch). >>> and use the more common push/push/lretq sequence for reloading %cs. >> >> I don't see what's wrong with what you call "Pascal form" of lret >> (C's __stdcall uses this as well, for example). > > I'm afraid that this statement clearly highlights the problem I'm trying > to solve. ??? >> I don't heavily >> mind this transformation, but (I'm sorry to say that) it looks to >> me as if this was a change for the sake of changing the code, not >> for making it any "better" (for whatever definition of "better"). > > It really doesn't matter if you can follow the code, or whether I can > follow it when I've double checked the instruction behaviour because, > while I'm aware this form exists, frankly I'm a little rusty on Pascal > it having ceased to be a dominant programming language before I was born... Still, I don't see what Pascal has got to do with it other than it being (having been) one of the users of it. I don't think insn use in our code base should be influenced by what languages or other environments use them. If they're suitable for the purpose, they're fine to use. Jan
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 7a13a30bc0..2f59d8bdbd 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -249,17 +249,20 @@ static void __init noreturn efi_arch_post_exit_boot(void) "or $"__stringify(X86_CR4_PGE)", %[cr4]\n\t" "mov %[cr4], %%cr4\n\t" #endif - "movabs $__start_xen, %[rip]\n\t" + /* Load data segments. */ "lgdt gdt_descr(%%rip)\n\t" - "mov stack_start(%%rip), %%rsp\n\t" "mov %[ds], %%ss\n\t" "mov %[ds], %%ds\n\t" "mov %[ds], %%es\n\t" "mov %[ds], %%fs\n\t" "mov %[ds], %%gs\n\t" - "movl %[cs], 8(%%rsp)\n\t" - "mov %[rip], (%%rsp)\n\t" - "lretq %[stkoff]-16" + + /* Switch stack, reload %cs and jump. */ + "lea %c[stkoff] + cpu0_stack(%%rip), %%rsp\n\t" + "lea __start_xen(%%rip), %[rip]\n\t" + "push %[cs]\n\t" + "push %[rip]\n\t" + "lretq" : [rip] "=&r" (efer/* any dead 64-bit variable */), [cr4] "+&r" (cr4) : [cr3] "r" (idle_pg_table), @@ -268,7 +271,7 @@ static void __init noreturn efi_arch_post_exit_boot(void) [stkoff] "i" (STACK_SIZE - sizeof(struct cpu_info)), "D" (&mbi) : "memory" ); - for( ; ; ); /* not reached */ + unreachable(); } static void __init efi_arch_cfg_file_early(EFI_FILE_HANDLE dir_handle, char *section)
Split up the long asm block by commenting the logical subsections. The movabs for obtaining __start_xen can be a rip-relative lea instead. This has the added advantage that objdump can now cross reference it during disassembly. The stack handing is confusing to follow. %rsp is set up by reading stack_start which is a pointer to cpu0_stack, then constructing an lret frame under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack), and uses the Pascal form of lret to move %rsp to the base of cpu0_stack. Remove stack_start from the mix and use a single lea to load cpu0_stack base directly, and use the more common push/push/lretq sequence for reloading %cs. Use unreachable() rather than an infinite for loop, which lets the compiler discard all the epilogue code that it inserted previously. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wl@xen.org> CC: Roger Pau Monné <roger.pau@citrix.com> Overall, the asm block is 10 bytes shorter, not that this was the point of the change. In principle, the constraints for [cs] and [ds] could be relaxed to include "m", but Clang decided to insert 5 rip-relative memory operands for the segment loads, which isn't a clever optimisation to make. --- xen/arch/x86/efi/efi-boot.h | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)