diff mbox series

[2/4] x86/boot: Minor improvements to efi_arch_post_exit_boot()

Message ID 20190805124301.12887-3-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/boot: Cleanup | expand

Commit Message

Andrew Cooper Aug. 5, 2019, 12:42 p.m. UTC
Split up the long asm block by commenting the logical subsections.

The movabs for obtaining __start_xen can be a rip-relative lea instead.  This
has the added advantage that objdump can now cross reference it during
disassembly.

The stack handing is confusing to follow.  %rsp is set up by reading
stack_start which is a pointer to cpu0_stack, then constructing an lret frame
under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack), and uses
the Pascal form of lret to move %rsp to the base of cpu0_stack.

Remove stack_start from the mix and use a single lea to load cpu0_stack base
directly, and use the more common push/push/lretq sequence for reloading %cs.

Use unreachable() rather than an infinite for loop, which lets the compiler
discard all the epilogue code that it inserted previously.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

Overall, the asm block is 10 bytes shorter, not that this was the point of the
change.

In principle, the constraints for [cs] and [ds] could be relaxed to include
"m", but Clang decided to insert 5 rip-relative memory operands for the
segment loads, which isn't a clever optimisation to make.
---
 xen/arch/x86/efi/efi-boot.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Jan Beulich Aug. 6, 2019, 3:20 p.m. UTC | #1
On 05.08.2019 14:42, Andrew Cooper wrote:
> Split up the long asm block by commenting the logical subsections.
> 
> The movabs for obtaining __start_xen can be a rip-relative lea instead.  This
> has the added advantage that objdump can now cross reference it during
> disassembly.

I'm surprised this works, but I take it that you've tested it: At
the time the asm() executes, I thought we're still in (what EFI
calls) physical mode, i.e. %rip holding a value less than 4Gb.
Accessing memory using %rip-relative addressing is fine, since
the Xen image is mapped in both places, but obtaining the new
linear address for %rip this way via lea should not be, as this
wouldn't move us to the XEN_VIRT_{START,END} range. I'm curious
to learn which part of my understanding is wrong here.

> The stack handing is confusing to follow.  %rsp is set up by reading
> stack_start which is a pointer to cpu0_stack, then constructing an lret frame
> under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack), and uses
> the Pascal form of lret to move %rsp to the base of cpu0_stack.
> 
> Remove stack_start from the mix and use a single lea to load cpu0_stack base
> directly,

I disagree with this change, at least as long as
xen/arch/x86/boot/x86_64.S also reads from stack_start, rather than
accessing cpu0_stack directly. The code here is intended to mirror
what's being done on the non-EFI path. And it was always my
understanding that it's done this way such that one would need to go
hunt for uses if one wanted to change what (right now) stack_start
points at during pre-SMP boot. Otherwise stack_start wouldn't need
an initializer anymore, and hence could move to .bss.

> and use the more common push/push/lretq sequence for reloading %cs.

I don't see what's wrong with what you call "Pascal form" of lret
(C's __stdcall uses this as well, for example). I don't heavily
mind this transformation, but (I'm sorry to say that) it looks to
me as if this was a change for the sake of changing the code, not
for making it any "better" (for whatever definition of "better").

Jan
Andrew Cooper Aug. 7, 2019, 10:33 a.m. UTC | #2
On 06/08/2019 16:20, Jan Beulich wrote:
> On 05.08.2019 14:42, Andrew Cooper wrote:
>> Split up the long asm block by commenting the logical subsections.
>>
>> The movabs for obtaining __start_xen can be a rip-relative lea
>> instead.  This
>> has the added advantage that objdump can now cross reference it during
>> disassembly.
>
> I'm surprised this works, but I take it that you've tested it:

So I did specifically test it, but it now occurs to me that the test I
did was via the MB2 64-bit EFI path, which isn't this path.  /sigh

> At the time the asm() executes, I thought we're still in (what EFI
> calls) physical mode, i.e. %rip holding a value less than 4Gb.

In which case, what is the point of using a file format which does
identify the virtual address it would prefer to run at...

(This is a rhetorical question.  The answer is "because EFI seems to
always do the unhelpful thing, given the choice".)

> Accessing memory using %rip-relative addressing is fine, since
> the Xen image is mapped in both places, but obtaining the new
> linear address for %rip this way via lea should not be, as this
> wouldn't move us to the XEN_VIRT_{START,END} range. I'm curious
> to learn which part of my understanding is wrong here.
>
>> The stack handing is confusing to follow.  %rsp is set up by reading
>> stack_start which is a pointer to cpu0_stack, then constructing an
>> lret frame
>> under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack),
>> and uses
>> the Pascal form of lret to move %rsp to the base of cpu0_stack.
>>
>> Remove stack_start from the mix and use a single lea to load
>> cpu0_stack base
>> directly,
>
> I disagree with this change, at least as long as
> xen/arch/x86/boot/x86_64.S also reads from stack_start, rather than
> accessing cpu0_stack directly.

That doesn't mean that a) its conceptually the right thing to do ...

> The code here is intended to mirror
> what's being done on the non-EFI path.  And it was always my
> understanding that it's done this way such that one would need to go
> hunt for uses if one wanted to change what (right now) stack_start
> points at during pre-SMP boot. Otherwise stack_start wouldn't need
> an initializer anymore, and hence could move to .bss.

... or b) that I have any intention of letting stack_start survive. 
Specifically, it is an unnecessary point of serialisation for APs, which
needs to disappear.

cpu0_stack is where cpu0 should have its stack, and this path is
exclusive to cpu0.

>
>> and use the more common push/push/lretq sequence for reloading %cs.
>
> I don't see what's wrong with what you call "Pascal form" of lret
> (C's __stdcall uses this as well, for example).

I'm afraid that this statement clearly highlights the problem I'm trying
to solve.

> I don't heavily
> mind this transformation, but (I'm sorry to say that) it looks to
> me as if this was a change for the sake of changing the code, not
> for making it any "better" (for whatever definition of "better").

It really doesn't matter if you can follow the code, or whether I can
follow it when I've double checked the instruction behaviour because,
while I'm aware this form exists, frankly I'm a little rusty on Pascal
it having ceased to be a dominant programming language before I was born...

The easier code is to follow, the easier time other people will have to
debug and develop on it, and the healthier the Xen community will be as
result.

~Andrew
Jan Beulich Aug. 7, 2019, 10:50 a.m. UTC | #3
On 07.08.2019 12:33, Andrew Cooper wrote:
> On 06/08/2019 16:20, Jan Beulich wrote:
>> On 05.08.2019 14:42, Andrew Cooper wrote:
>>> Split up the long asm block by commenting the logical subsections.
>>>
>>> The movabs for obtaining __start_xen can be a rip-relative lea
>>> instead.  This
>>> has the added advantage that objdump can now cross reference it during
>>> disassembly.
>>
>> I'm surprised this works, but I take it that you've tested it:
> 
> So I did specifically test it, but it now occurs to me that the test I
> did was via the MB2 64-bit EFI path, which isn't this path.  /sigh
> 
>> At the time the asm() executes, I thought we're still in (what EFI
>> calls) physical mode, i.e. %rip holding a value less than 4Gb.
> 
> In which case, what is the point of using a file format which does
> identify the virtual address it would prefer to run at...
> 
> (This is a rhetorical question.  The answer is "because EFI seems to
> always do the unhelpful thing, given the choice".)

Not a rhetorical question at all. As said - the pre-OS environment
is a physical one, hence relocating binaries to their preferred
addresses may (and with the addresses we use definitely is) not
possible. Hence them relocating images by default.

>> Accessing memory using %rip-relative addressing is fine, since
>> the Xen image is mapped in both places, but obtaining the new
>> linear address for %rip this way via lea should not be, as this
>> wouldn't move us to the XEN_VIRT_{START,END} range. I'm curious
>> to learn which part of my understanding is wrong here.
>>
>>> The stack handing is confusing to follow.  %rsp is set up by reading
>>> stack_start which is a pointer to cpu0_stack, then constructing an
>>> lret frame
>>> under %rsp (to avoid clobbering whatever is adjacent to cpu0_stack),
>>> and uses
>>> the Pascal form of lret to move %rsp to the base of cpu0_stack.
>>>
>>> Remove stack_start from the mix and use a single lea to load
>>> cpu0_stack base
>>> directly,
>>
>> I disagree with this change, at least as long as
>> xen/arch/x86/boot/x86_64.S also reads from stack_start, rather than
>> accessing cpu0_stack directly.
> 
> That doesn't mean that a) its conceptually the right thing to do ...
> 
>> The code here is intended to mirror
>> what's being done on the non-EFI path.  And it was always my
>> understanding that it's done this way such that one would need to go
>> hunt for uses if one wanted to change what (right now) stack_start
>> points at during pre-SMP boot. Otherwise stack_start wouldn't need
>> an initializer anymore, and hence could move to .bss.
> 
> ... or b) that I have any intention of letting stack_start survive.
> Specifically, it is an unnecessary point of serialisation for APs, which
> needs to disappear.
> 
> cpu0_stack is where cpu0 should have its stack, and this path is
> exclusive to cpu0.

And I'd be okay (but not enthusiastic) to see the other CPU0 use
disappear at the same time (same series at least, not necessarily
same patch).

>>> and use the more common push/push/lretq sequence for reloading %cs.
>>
>> I don't see what's wrong with what you call "Pascal form" of lret
>> (C's __stdcall uses this as well, for example).
> 
> I'm afraid that this statement clearly highlights the problem I'm trying
> to solve.

???

>> I don't heavily
>> mind this transformation, but (I'm sorry to say that) it looks to
>> me as if this was a change for the sake of changing the code, not
>> for making it any "better" (for whatever definition of "better").
> 
> It really doesn't matter if you can follow the code, or whether I can
> follow it when I've double checked the instruction behaviour because,
> while I'm aware this form exists, frankly I'm a little rusty on Pascal
> it having ceased to be a dominant programming language before I was born...

Still, I don't see what Pascal has got to do with it other than it
being (having been) one of the users of it. I don't think insn use
in our code base should be influenced by what languages or other
environments use them. If they're suitable for the purpose, they're
fine to use.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 7a13a30bc0..2f59d8bdbd 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -249,17 +249,20 @@  static void __init noreturn efi_arch_post_exit_boot(void)
                    "or     $"__stringify(X86_CR4_PGE)", %[cr4]\n\t"
                    "mov    %[cr4], %%cr4\n\t"
 #endif
-                   "movabs $__start_xen, %[rip]\n\t"
+                   /* Load data segments. */
                    "lgdt   gdt_descr(%%rip)\n\t"
-                   "mov    stack_start(%%rip), %%rsp\n\t"
                    "mov    %[ds], %%ss\n\t"
                    "mov    %[ds], %%ds\n\t"
                    "mov    %[ds], %%es\n\t"
                    "mov    %[ds], %%fs\n\t"
                    "mov    %[ds], %%gs\n\t"
-                   "movl   %[cs], 8(%%rsp)\n\t"
-                   "mov    %[rip], (%%rsp)\n\t"
-                   "lretq  %[stkoff]-16"
+
+                   /* Switch stack, reload %cs and jump. */
+                   "lea    %c[stkoff] + cpu0_stack(%%rip), %%rsp\n\t"
+                   "lea    __start_xen(%%rip), %[rip]\n\t"
+                   "push   %[cs]\n\t"
+                   "push   %[rip]\n\t"
+                   "lretq"
                    : [rip] "=&r" (efer/* any dead 64-bit variable */),
                      [cr4] "+&r" (cr4)
                    : [cr3] "r" (idle_pg_table),
@@ -268,7 +271,7 @@  static void __init noreturn efi_arch_post_exit_boot(void)
                      [stkoff] "i" (STACK_SIZE - sizeof(struct cpu_info)),
                      "D" (&mbi)
                    : "memory" );
-    for( ; ; ); /* not reached */
+    unreachable();
 }
 
 static void __init efi_arch_cfg_file_early(EFI_FILE_HANDLE dir_handle, char *section)