
[v5,15/16] x86: make Xen early boot code relocatable

Message ID 1471646606-28519-16-git-send-email-daniel.kiper@oracle.com (mailing list archive)
State New, archived

Commit Message

Daniel Kiper Aug. 19, 2016, 10:43 p.m. UTC
Every multiboot protocol (regardless of version) compatible image must
specify its load address (in the ELF or multiboot header), and a multiboot
protocol compatible loader has to load the image at that address. However,
there is no guarantee that the requested memory region (in the case of Xen it
starts at 1 MiB and ends at 17 MiB) where the image should initially be loaded
is RAM and that it is free (legacy BIOS platforms are merciful to Xen, but I
found at least one EFI platform on which the Xen load address conflicts with
EFI boot services; it is a Dell PowerEdge R820 with the latest firmware). To
cope with that problem we must make the Xen early boot code relocatable and
help the boot loader relocate the image properly by suggesting allowed
address ranges instead of requesting a specific load address as is done right
now. This patch does the former. It does not add the multiboot2 protocol
interface, which is done in the "x86: add multiboot2 protocol support for
relocatable images" patch.

This patch changes the following things:
  - the default load address is changed from 1 MiB to 2 MiB; I did that
    because the initial page tables use 2 MiB huge pages, which makes the
    required updates to them quite easy; it means that e.g. we avoid special
    cases for the start and end of the required memory region if it lives at
    an address not aligned to 2 MiB,
  - the %esi and %r15d registers are used as storage for the Xen image load
    base address (%r15d briefly, because %rsi is used for the EFI SystemTable
    address in 64-bit code); both registers are (%esi mostly) unused in early
    boot code and preserved during C function calls,
  - %fs is used as the base for Xen data relative addressing in 32-bit code
    where possible; %esi is used for that purpose during error printing
    because it is not always possible to properly and efficiently
    initialize %fs.

PS I am still not convinced that the move to %fs relative addressing is a
   good idea. As you can see, the code grows larger due to the GDT
   initialization stuff, etc. However, I cannot see potential gains now or in
   the future (there probably would be some if the whole of Xen, not just the
   early boot code, played segment register games). Well, maybe in one or two
   places where the base register is not used in full SIB addressing mode.
   So, the question is: does it pay? Do the gains outweigh all the effort
   related to the %fs games? Maybe we should stay with %esi relative
   addressing? Of course I am aware that it is not perfect. However, IMO, it
   is much simpler and clearer.
   This is my suggestion. If you agree with me I can change the code once
   again and go back to %esi. This is not a big problem. If not, I am not
   going to argue any longer. I will do what you request. Well, it would be
   nice if you convinced me that your idea is good and I am wrong then...  ;-)))

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
v4 - suggestions/fixes:
   - do not relocate Xen image if boot loader did work for us
     (suggested by Andrew Cooper and Jan Beulich),
   - initialize xen_img_load_base_addr in EFI boot code too,
   - properly initialize trampoline_xen_phys_start,
   - calculate Xen image load base address in
     x86_64 code ourselves,
     (suggested by Jan Beulich),
   - change how and when Xen image base address is printed,
   - use %fs instead of %esi for relative addressing
     (suggested by Andrew Cooper and Jan Beulich),
   - create esi_offset and fs_offset() macros in assembly,
   - calculate <final-exec-addr> mkelf32 argument automatically,
   - optimize and cleanup code,
   - improve comments,
   - improve commit message.

v3 - suggestions/fixes:
   - improve segment registers initialization
     (suggested by Jan Beulich),
   - simplify Xen image load base address calculation
     (suggested by Jan Beulich),
   - use %esi and %r15d instead of %ebp to store
     Xen image load base address,
   - use %esi instead of %fs for relative addressing;
     this way we get shorter and simpler code,
   - rename some variables and constants
     (suggested by Jan Beulich),
   - improve comments
     (suggested by Konrad Rzeszutek Wilk),
   - improve commit message
     (suggested by Jan Beulich).
---
 xen/arch/x86/Makefile          |    4 +-
 xen/arch/x86/Rules.mk          |    4 +
 xen/arch/x86/boot/head.S       |  204 +++++++++++++++++++++++++++++++---------
 xen/arch/x86/boot/trampoline.S |   10 +-
 xen/arch/x86/boot/wakeup.S     |    4 +-
 xen/arch/x86/boot/x86_64.S     |   51 ++++------
 xen/arch/x86/efi/efi-boot.h    |    3 +-
 xen/arch/x86/setup.c           |   31 +++---
 xen/arch/x86/xen.lds.S         |    8 +-
 xen/include/asm-x86/config.h   |    1 +
 xen/include/asm-x86/page.h     |    2 +-
 11 files changed, 217 insertions(+), 105 deletions(-)

Comments

Jan Beulich Aug. 25, 2016, 2:41 p.m. UTC | #1
>>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
> Every multiboot protocol (regardless of version) compatible image must
> specify its load address (in ELF or multiboot header). Multiboot protocol
> compatible loader have to load image at specified address. However, there
> is no guarantee that the requested memory region (in case of Xen it starts
> at 1 MiB and ends at 17 MiB) where image should be loaded initially is a RAM
> and it is free (legacy BIOS platforms are merciful for Xen but I found at
> least one EFI platform on which Xen load address conflicts with EFI boot
> services; it is Dell PowerEdge R820 with latest firmware). To cope with that
> problem we must make Xen early boot code relocatable and help boot loader to
> relocate image in proper way by suggesting, not requesting specific load
> addresses as it is right now, allowed address ranges. This patch does former.
> It does not add multiboot2 protocol interface which is done in "x86: add
> multiboot2 protocol support for relocatable images" patch.
> 
> This patch changes following things:
>   - default load address is changed from 1 MiB to 2 MiB; I did that because
>     initial page tables are using 2 MiB huge pages and this way required
>     updates for them are quite easy; it means that e.g. we avoid spacial
>     cases for start and end of required memory region if it live at address
>     not aligned to 2 MiB,

Should this be a separate change then, to separate possible
regressions due to that from such due to any other of the changes
here? And then I don't see why, when making the image relocatable
anyway, the link time load address actually still matters.

>   - %esi and %r15d registers are used as a storage for Xen image load base
>     address (%r15d shortly because %rsi is used for EFI SystemTable address
>     in 64-bit code); both registers are (%esi is mostly) unused in early
>     boot code and preserved during C functions calls,

In a context where you (also) talk about 64-bit code, %esi can't
be called preserved unilaterally. You should make clear that this is
for 32-bit function calls.

> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -12,13 +12,16 @@
>          .text
>          .code32
>  
> -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
> +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)

In a patch already large and hence hard to review, did you really
need to do this rename?

> @@ -126,26 +130,26 @@ vga_text_buffer:
>          .section .init.text, "ax", @progbits
>  
>  bad_cpu:
> -        mov     $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
> +        lea     esi_offset(.Lbad_cpu_msg),%esi  # Error message

Why not just

        add     $sym_offset(.Lbad_cpu_msg),%esi  # Error message

? Together with not doing said rename, this would be more obviously
correct.

> @@ -409,36 +436,93 @@ trampoline_bios_setup:
>          cmovb   %edx,%ecx           /* and use the smaller */
>  
>  trampoline_setup:
> +        /*
> +         * Called on legacy BIOS and EFI platforms.
> +         *
> +         * Compute 0-15 bits of BOOT_FS segment descriptor base address.
> +         */
> +        mov     %esi,%edx
> +        shl     $16,%edx
> +        or      %edx,BOOT_FS+esi_offset(trampoline_gdt)

	mov   %si,BOOT_FS+esi_offset(trampoline_gdt)

> +        /* Compute 16-23 bits of BOOT_FS segment descriptor base address. */
> +        mov     %esi,%edx
> +        shr     $16,%edx
> +        and     $0x000000ff,%edx
> +        or      %edx,BOOT_FS+4+esi_offset(trampoline_gdt)

	mov   %dl,BOOT_FS+4+esi_offset(trampoline_gdt)

> +        /* Compute 24-31 bits of BOOT_FS segment descriptor base address. */
> +        mov     %esi,%edx
> +        and     $0xff000000,%edx
> +        or      %edx,BOOT_FS+4+esi_offset(trampoline_gdt)

	mov   %dh,BOOT_FS+7+esi_offset(trampoline_gdt)

(with various of the intermediate instructions dropped)

That'll reduce your code size concern for the GDT setup quite a bit.
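
For readers less fluent in descriptor byte layouts, the suggested byte/word
stores can be sketched in C (the helper name is made up for illustration; the
byte offsets are the architectural GDT descriptor layout):

```c
#include <stdint.h>

/*
 * Patch a 32-bit base address into an 8-byte GDT descriptor, the way
 * the suggested "mov %si / mov %dl / mov %dh" stores do: bytes 2-3
 * hold base[15:0], byte 4 holds base[23:16], byte 7 holds base[31:24].
 */
static void set_desc_base(uint8_t desc[8], uint32_t base)
{
    desc[2] = base & 0xff;          /* base[7:0]   */
    desc[3] = (base >> 8) & 0xff;   /* base[15:8]  */
    desc[4] = (base >> 16) & 0xff;  /* base[23:16] */
    desc[7] = base >> 24;           /* base[31:24] */
}
```

Because base[15:0] occupies two adjacent bytes at offset 2, a single 16-bit
`mov %si` store replaces the whole shift-and-or sequence.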

> +        /*
> +         * Initialise %fs and later use it to access Xen data if possible.
> +         * According to Intel 64 and IA-32 Architectures Software Developer’s
> +         * Manual it is safe to do that without reloading GDTR before.
> +         *
> +         * Please check Intel 64 and IA-32 Architectures Software Developer’s
> +         * Manual, Volume 2 (2A, 2B & 2C): Instruction Set Reference,
> +         * LGDT and MOV instructions description and
> +         * Intel 64 and IA-32 Architectures Software Developer’s
> +         * Manual Volume 3 (3A, 3B & 3C): System Programming Guide,
> +         * section 3.4.3, Segment Registers for more details.
> +         *
> +         * AIUI, only GDT address and limit are loaded into GDTR when
> +         * lgdt is executed. Segment descriptor is loaded directly from
> +         * memory into segment register (hiden part) only when relevant
> +         * load instruction is used (e.g. mov %edx,%fs). Though GDT content
> +         * probably could be stored in CPU cache but nothing suggest that
> +         * CPU caching interfere in one way or another with segment descriptor
> +         * load. So, it looks that every change in active GDT is immediately
> +         * available for relevant segment descriptor load instruction.
> +         *
> +         * I was not able to find anything which invalidates above.
> +         * So, everything suggest that we do not need an extra lgdt here.
> +         */

All of this comment except for the first sentence is just stating basic
architecture facts. Please drop all that.

> +        mov     $BOOT_FS,%edx
> +        mov     %edx,%fs
> +
>          /* Reserve 64kb for the trampoline. */
>          sub     $0x1000,%ecx
>  
>          /* From arch/x86/smpboot.c: start_eip had better be page-aligned! */
>          xor     %cl, %cl
>          shl     $4, %ecx
> -        mov     %ecx,sym_phys(trampoline_phys)
> +        mov     %ecx,fs_offset(trampoline_phys)

Seeing the first instance, together with the earlier comment about
not renaming sym_phys(), I think this would end up more consistent
with sym_fs() (and then obviously sym_esi()) as names.

> @@ -461,62 +545,88 @@ trampoline_setup:
>  
>          /* Stash TSC to calculate a good approximation of time-since-boot */
>          rdtsc
> -        mov     %eax,sym_phys(boot_tsc_stamp)
> -        mov     %edx,sym_phys(boot_tsc_stamp+4)
> +        mov     %eax,fs_offset(boot_tsc_stamp)
> +        mov     %edx,fs_offset(boot_tsc_stamp)+4
> +
> +        /* Update frame addresses in page tables. */
> +        mov     $((__page_tables_end-__page_tables_start)/8),%ecx
> +1:      testl   $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
> +        jz      2f
> +        add     %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
> +2:      loop    1b

This loop includes the mapping of the low Mb, which I don't think it
should modify. Or wait - you move __page_tables_start, which by
itself is fragile (but looks to be acceptable if accompanied by a
comment explaining why it doesn't cover l1_identmap).

Also, just to double check - all these page table setup adjustments
don't require reflecting in xen.efi's page table setup code?
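
The quoted fixup loop walks the page-table region as an array of 8-byte
entries and adds the load-base delta to every present one. Roughly, in C
(an illustrative sketch, not Xen code; only the present-bit constant mirrors
the assembly):

```c
#include <stdint.h>

#define _PAGE_PRESENT 0x1u

/*
 * Add the image load base delta to the frame address of every present
 * entry in a page-table blob, as the head.S loop does.
 */
static void relocate_page_tables(uint64_t *start, uint64_t *end,
                                 uint32_t delta)
{
    for ( uint64_t *e = start; e < end; e++ )
        if ( *e & _PAGE_PRESENT )
            *e += delta;
}
```

Per the remark above, the [start, end) range must exclude l1_identmap so the
low-1MiB identity mapping is left untouched.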

>          /* Apply relocations to bootstrap trampoline. */
> -        mov     sym_phys(trampoline_phys),%edx
> -        mov     $sym_phys(__trampoline_rel_start),%edi
> +        mov     fs_offset(trampoline_phys),%edx
> +        mov     $sym_offset(__trampoline_rel_start),%edi
> +        mov     $sym_offset(__trampoline_rel_stop),%ebx
>  1:
> -        mov     (%edi),%eax
> -        add     %edx,(%edi,%eax)
> +        mov     %fs:(%edi),%eax
> +        add     %edx,%fs:(%edi,%eax)
>          add     $4,%edi
> -        cmp     $sym_phys(__trampoline_rel_stop),%edi
> +        cmp     %ebx,%edi
>          jb      1b

And again it looks like the switch to using %ebx here is not needed.
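
The relocation records applied here are self-relative: each 32-bit entry
holds the distance from the record itself to the 32-bit site being patched
(`mov (%edi),%eax; add %edx,(%edi,%eax)`). A hedged C sketch of that
mechanism, using int32_t for the offset so negative distances work (the
32-bit assembly gets that for free from address wraparound):

```c
#include <stdint.h>

/*
 * Apply self-relative relocation records: each record holds the
 * offset from the record to a 32-bit site that needs the trampoline's
 * physical base added.
 */
static void apply_trampoline_rel(int32_t *rel_start, int32_t *rel_stop,
                                 uint32_t tramp_phys)
{
    for ( int32_t *rec = rel_start; rec < rel_stop; rec++ )
    {
        uint32_t *site = (uint32_t *)((uint8_t *)rec + *rec);
        *site += tramp_phys;
    }
}
```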

>          /* Patch in the trampoline segment. */
>          shr     $4,%edx
> -        mov     $sym_phys(__trampoline_seg_start),%edi
> +        mov     $sym_offset(__trampoline_seg_start),%edi
> +        mov     $sym_offset(__trampoline_seg_stop),%ebx
>  1:
> -        mov     (%edi),%eax
> -        mov     %dx,(%edi,%eax)
> +        mov     %fs:(%edi),%eax
> +        mov     %dx,%fs:(%edi,%eax)
>          add     $4,%edi
> -        cmp     $sym_phys(__trampoline_seg_stop),%edi
> +        cmp     %ebx,%edi
>          jb      1b

Same here then, obviously.

> --- a/xen/arch/x86/boot/trampoline.S
> +++ b/xen/arch/x86/boot/trampoline.S
> @@ -54,12 +54,20 @@ trampoline_gdt:
>          /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
>          .long   0x0000ffff
>          .long   0x00009200
> +        /*
> +         * 0x0030: ring 0 Xen data, 16 MiB size, base
> +         * address is computed during runtime.
> +         */
> +        .quad   0x00c0920000001000

This does not look like it covers 16Mb, it's more like 16Mb+4k-1.

>          .pushsection .trampoline_rel, "a"
>          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
>          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
>          .popsection
>  
> +GLOBAL(xen_img_load_base_addr)
> +        .long   0

I've yet to understand the purpose of this symbol, but in any case:
If the trampoline code doesn't reference it, why does it get placed
here?

> @@ -861,15 +860,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
>  #endif
>  
> +    /* Do not relocate Xen image if boot loader did work for us. */
> +    if ( xen_img_load_base_addr )
> +        xen_phys_start = xen_img_load_base_addr;

Okay, with this change the question really is: Why do you need the
new symbol? I.e. why can't you just use xen_phys_start, just like I
managed to re-use it when I introduced boot from EFI?

>      for ( i = boot_e820.nr_map-1; i >= 0; i-- )
>      {
>          uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
>          uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
>  
> -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */

I can see that you want to get rid of BOOTSTRAP_MAP_BASE, but
please don't delete the comment as a whole.

>          s = (boot_e820.map[i].addr + mask) & ~mask;
>          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
>              continue;

This means you now map memory between 2Mb and 16Mb here. Is
this necessary?

> @@ -901,7 +902,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>              l4_pgentry_t *pl4e;
>              l3_pgentry_t *pl3e;
>              l2_pgentry_t *pl2e;
> -            uint64_t load_start;
>              int i, j, k;
>  
>              /* Select relocation address. */
> @@ -915,9 +915,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>               * with a barrier(). After this we must *not* modify static/global
>               * data until after we have switched to the relocated pagetables!
>               */
> -            load_start = (unsigned long)_start - XEN_VIRT_START;
>              barrier();
> -            move_memory(e + load_start, load_start, _end - _start, 1);
> +            move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);

Assuming _start - XEN_VIRT_START == XEN_IMG_OFFSET, is this
change necessary? Or is it rather just simplification, which again
should be split from an already complex patch.

> @@ -957,15 +956,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>               * Undo the temporary-hooking of the l1_identmap.  __2M_text_start
>               * is contained in this PTE.
>               */
> -            BUG_ON(l2_table_offset((unsigned long)_erodata) ==
> -                   l2_table_offset((unsigned long)_stext));

At least when using_2M_mapping() this surely ought to hold?

> @@ -1019,6 +1017,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>                  : "memory" );
>  
>              bootstrap_map(NULL);
> +
> +            printk("New Xen image base address: 0x%08lx\n", xen_phys_start);

I don't see the motivation for adding such a message in this patch,
but if, then please use %#lx.

> @@ -1082,6 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      if ( !xen_phys_start )
>          panic("Not enough memory to relocate Xen.");
> +
>      reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));

Another stray change.

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -55,7 +55,7 @@ SECTIONS
>    __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
>  #endif
>  
> -  . = __XEN_VIRT_START + MB(1);
> +  . = __XEN_VIRT_START + XEN_IMG_OFFSET;
>    _start = .;
>    .text : {
>          _stext = .;            /* Text and read-only data */
> @@ -257,12 +257,14 @@ SECTIONS
>    .reloc : {
>      *(.reloc)
>    } :text
> -  /* Trick the linker into setting the image size to exactly 16Mb. */
>    . = ALIGN(__section_alignment__);
> +#endif
> +
> +  /* Trick the linker into setting the image size to exactly 16Mb. */
>    .pad : {
>      . = ALIGN(MB(16));
> +    __end_of_image__ = .;

I see that you use this symbol in xen/arch/x86/Makefile, but I again
don't follow why this logic needs to change.

> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -288,7 +288,7 @@ extern root_pgentry_t 
> idle_pg_table[ROOT_PAGETABLE_ENTRIES];
>  extern l2_pgentry_t  *compat_idle_pg_table_l2;
>  extern unsigned int   m2p_compat_vstart;
>  extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> -    l2_bootmap[L2_PAGETABLE_ENTRIES];
> +    l2_bootmap[4*L2_PAGETABLE_ENTRIES];

Why do you need 4 of them? I can see why one might not suffice,
but two surely should?

Jan
Daniel Kiper Aug. 31, 2016, 2:59 p.m. UTC | #2
On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
> >>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
> > Every multiboot protocol (regardless of version) compatible image must
> > specify its load address (in ELF or multiboot header). Multiboot protocol
> > compatible loader have to load image at specified address. However, there
> > is no guarantee that the requested memory region (in case of Xen it starts
> > at 1 MiB and ends at 17 MiB) where image should be loaded initially is a RAM
> > and it is free (legacy BIOS platforms are merciful for Xen but I found at
> > least one EFI platform on which Xen load address conflicts with EFI boot
> > services; it is Dell PowerEdge R820 with latest firmware). To cope with that
> > problem we must make Xen early boot code relocatable and help boot loader to
> > relocate image in proper way by suggesting, not requesting specific load
> > addresses as it is right now, allowed address ranges. This patch does former.
> > It does not add multiboot2 protocol interface which is done in "x86: add
> > multiboot2 protocol support for relocatable images" patch.
> >
> > This patch changes following things:
> >   - default load address is changed from 1 MiB to 2 MiB; I did that because
> >     initial page tables are using 2 MiB huge pages and this way required
> >     updates for them are quite easy; it means that e.g. we avoid spacial
> >     cases for start and end of required memory region if it live at address
> >     not aligned to 2 MiB,
>
> Should this be a separate change then, to separate possible
> regressions due to that from such due to any other of the changes

Potentially yes. However, it should be done properly. Otherwise, in
case of a revert, we could break the Xen relocatable infrastructure and
other things. So, I am not sure it pays off. Anyway, I will check
whether it is possible.

> here? And then I don't see why, when making the image relocatable
> anyway, the link time load address actually still matters.

It matters for multiboot (v1) and multiboot2 compatible boot
loaders not supporting relocatable images.

> >   - %esi and %r15d registers are used as a storage for Xen image load base
> >     address (%r15d shortly because %rsi is used for EFI SystemTable address
> >     in 64-bit code); both registers are (%esi is mostly) unused in early
> >     boot code and preserved during C functions calls,
>
> In a context where you (also) talk about 64-bit code, %esi can't
> be called preserved unilaterally. You should make clear that this is
> for 32-bit function calls.

OK.

> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -12,13 +12,16 @@
> >          .text
> >          .code32
> >
> > -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
> > +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)
>
> In a patch already large and hence hard to review, did you really
> need to do this rename?

I think that sym_offset() is the better term here. However, if you wish
I can leave sym_phys() as is. Though, IMO, the sym_phys() name suggests
that we are playing with physical addresses, which is not correct in all
contexts; e.g. look at fs_offset() (or sym_fs() if you wish).

> > @@ -126,26 +130,26 @@ vga_text_buffer:
> >          .section .init.text, "ax", @progbits
> >
> >  bad_cpu:
> > -        mov     $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
> > +        lea     esi_offset(.Lbad_cpu_msg),%esi  # Error message
>
> Why not just
>
>         add     $sym_offset(.Lbad_cpu_msg),%esi  # Error message

OK.

[...]

> > @@ -461,62 +545,88 @@ trampoline_setup:
> >
> >          /* Stash TSC to calculate a good approximation of time-since-boot */
> >          rdtsc
> > -        mov     %eax,sym_phys(boot_tsc_stamp)
> > -        mov     %edx,sym_phys(boot_tsc_stamp+4)
> > +        mov     %eax,fs_offset(boot_tsc_stamp)
> > +        mov     %edx,fs_offset(boot_tsc_stamp)+4
> > +
> > +        /* Update frame addresses in page tables. */
> > +        mov     $((__page_tables_end-__page_tables_start)/8),%ecx
> > +1:      testl   $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
> > +        jz      2f
> > +        add     %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
> > +2:      loop    1b
>
> This loop includes the mapping of the low Mb, which I don't think it
> should modify. Or wait - you move __page_tables_start, which by
> itself is fragile (but looks to be acceptable if accompanied by a
> comment explaining why it doesn't cover l1_identmap).

OK, I will add relevant comment somewhere.

> Also, just to double check - all these page table setup adjustments
> don't require reflecting in xen.efi's page table setup code?

I will check it once again but I do not think so.

[...]

> > --- a/xen/arch/x86/boot/trampoline.S
> > +++ b/xen/arch/x86/boot/trampoline.S
> > @@ -54,12 +54,20 @@ trampoline_gdt:
> >          /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
> >          .long   0x0000ffff
> >          .long   0x00009200
> > +        /*
> > +         * 0x0030: ring 0 Xen data, 16 MiB size, base
> > +         * address is computed during runtime.
> > +         */
> > +        .quad   0x00c0920000001000
>
> This does not look like it covers 1Mb, it's more like 1Mb+4k-1.

I have checked it once again. It covers 16 MiB as required:
  4 KiB * 0x1000 = 0x1000000 (not taking relocation into account).

By the way, it looks like we could define this segment as 0x200000 - 0x1000000.
However, then sym_fs() should be defined as:

#define sym_fs(sym)    %fs:(sym_offset(sym) - XEN_IMG_OFFSET)

Does it make sense?
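
For what it's worth, decoding that descriptor under the standard IA-32
layout supports Jan's size remark: with G=1 the byte limit is
(limit << 12) | 0xFFF, so a limit of 0x1000 pages gives a maximum offset of
0x1000FFF, i.e. 16 MiB + 4 KiB - 1 rather than exactly 16 MiB. A small
sketch (illustrative helper, not Xen code):

```c
#include <stdint.h>

/* Decode the byte limit (maximum offset) of an 8-byte GDT descriptor. */
static uint64_t desc_byte_limit(uint64_t d)
{
    /* Limit bits 15:0 in the low word, bits 19:16 in bits 51:48. */
    uint64_t limit = (d & 0xffff) | ((d >> 32) & 0xf0000);
    int g = (d >> 55) & 1;  /* granularity flag: 1 => 4 KiB units */

    return g ? (limit << 12) | 0xfff : limit;
}
```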

> >          .pushsection .trampoline_rel, "a"
> >          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
> >          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
> >          .popsection
> >
> > +GLOBAL(xen_img_load_base_addr)
> > +        .long   0
>
> I've yet to understand the purpose of this symbol, but in any case:
> If the trampoline code doesn't reference it, why does it get placed
> here?

This symbol was used by code which unconditionally relocated the Xen image
even if the boot loader had done the work for us. I removed most of this code
because we agreed that if the boot loader relocated the Xen image then we do
not have to relocate it higher once again. If you are still OK with this idea
I can go further, as I planned, and remove all such remnants in the next
release. However, it looks like I then have to store the load address in
xen_phys_start from 32-bit assembly code. So, it would be nice if I could
define it as "unsigned int" instead of "unsigned long". Is it safe? I am
asking because this variable is used in quite important places and I do not
want to break something by mistake. At first sight it looks safe, but it
would be nice if you double-checked it.

> > @@ -861,15 +860,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >          highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
> >  #endif
> >
> > +    /* Do not relocate Xen image if boot loader did work for us. */
> > +    if ( xen_img_load_base_addr )
> > +        xen_phys_start = xen_img_load_base_addr;
>
> Okay, with this change the question really is: Why do you need the
> new symbol? I.e. why can't you just use xen_phys_start, just like I
> managed to re-use it when I introduced boot from EFI?

This is what I was going to do in next release... If... Please look above...

> >      for ( i = boot_e820.nr_map-1; i >= 0; i-- )
> >      {
> >          uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
> >          uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
> >
> > -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
>
> I can see that you want to get rid of BOOTSTRAP_MAP_BASE, but
> please don't delete the comment as a whole.

OK.

> >          s = (boot_e820.map[i].addr + mask) & ~mask;
> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> >              continue;
>
> This means you now map memory between 2Mb and 16Mb here. Is
> this necessary?

Without this change, Xen relocated by the boot loader crashes with:

(XEN) Panic on CPU 0:
(XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902

So, it must be mapped. However, maybe we should not map this
region when Xen is not relocated by the boot loader.

> > @@ -901,7 +902,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >              l4_pgentry_t *pl4e;
> >              l3_pgentry_t *pl3e;
> >              l2_pgentry_t *pl2e;
> > -            uint64_t load_start;
> >              int i, j, k;
> >
> >              /* Select relocation address. */
> > @@ -915,9 +915,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >               * with a barrier(). After this we must *not* modify static/global
> >               * data until after we have switched to the relocated pagetables!
> >               */
> > -            load_start = (unsigned long)_start - XEN_VIRT_START;
> >              barrier();
> > -            move_memory(e + load_start, load_start, _end - _start, 1);
> > +            move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);
>
> Assuming _start - XEN_VIRT_START == XEN_IMG_OFFSET, is this
> change necessary? Or is it rather just simplification, which again
> should be split from an already complex patch.

Just simplification. I will split it from this patch.

> > @@ -957,15 +956,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >               * Undo the temporary-hooking of the l1_identmap.  __2M_text_start
> >               * is contained in this PTE.
> >               */
> > -            BUG_ON(l2_table_offset((unsigned long)_erodata) ==
> > -                   l2_table_offset((unsigned long)_stext));
>
> At least when using_2M_mapping() this surely ought to hold?

OK.

> > @@ -1019,6 +1017,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >                  : "memory" );
> >
> >              bootstrap_map(NULL);
> > +
> > +            printk("New Xen image base address: 0x%08lx\n", xen_phys_start);
>
> I don't see the motivation for adding such a message in this patch,
> but if, then please use %#lx.

OK, I will use %#lx.

> > @@ -1082,6 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >
> >      if ( !xen_phys_start )
> >          panic("Not enough memory to relocate Xen.");
> > +
> >      reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));
>
> Another stray change.

I will fix it.

> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -55,7 +55,7 @@ SECTIONS
> >    __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
> >  #endif
> >
> > -  . = __XEN_VIRT_START + MB(1);
> > +  . = __XEN_VIRT_START + XEN_IMG_OFFSET;
> >    _start = .;
> >    .text : {
> >          _stext = .;            /* Text and read-only data */
> > @@ -257,12 +257,14 @@ SECTIONS
> >    .reloc : {
> >      *(.reloc)
> >    } :text
> > -  /* Trick the linker into setting the image size to exactly 16Mb. */
> >    . = ALIGN(__section_alignment__);
> > +#endif
> > +
> > +  /* Trick the linker into setting the image size to exactly 16Mb. */
> >    .pad : {
> >      . = ALIGN(MB(16));
> > +    __end_of_image__ = .;
>
> I see that you use this symbol in xen/arch/x86/Makefile, but I again
> don't follow why this logic needs to change.

The current logic does not work because it gets the wrong address from
xen/xen-syms. This way the boot loader allocates an incorrect, usually huge,
buffer for the Xen image (wait, there is a chance that this is a fix for the
issues related to 2 MiB mappings found by Andrew). I do not see a simple,
reliable fix for the current solution. So, I introduced __end_of_image__ and
look for its address. This is a much better method to establish the
end-of-image address than the previous one. However, I agree that this could
be introduced in a separate patch.

> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -288,7 +288,7 @@ extern root_pgentry_t
> > idle_pg_table[ROOT_PAGETABLE_ENTRIES];
> >  extern l2_pgentry_t  *compat_idle_pg_table_l2;
> >  extern unsigned int   m2p_compat_vstart;
> >  extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> > -    l2_bootmap[L2_PAGETABLE_ENTRIES];
> > +    l2_bootmap[4*L2_PAGETABLE_ENTRIES];
>
> Why do you need 4 of them? I can see why one might not suffice,
> but two surely should?

In the worst case we need three: one for the l1_identmap hook and two
for the Xen image mapping. So, I am not sure it pays to complicate the
assembly mapping code just to save one page. Additionally, you should
remember that l2_bootmap is freed after init.
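
The worst-case count of three can be sanity-checked with simple arithmetic:
one L2 table spans 1 GiB (512 entries of 2 MiB), so a 16 MiB image can
straddle at most one 1 GiB boundary, needing two tables, plus one for the
l1_identmap hook. A sketch (hypothetical helper, for illustration only):

```c
#include <stdint.h>

#define L2_SPAN (1ULL << 30)  /* one L2 table maps 512 * 2 MiB = 1 GiB */

/* Number of L2 tables an identity mapping of [base, base+size) touches. */
static unsigned int l2_tables_needed(uint64_t base, uint64_t size)
{
    return ((base + size - 1) / L2_SPAN) - (base / L2_SPAN) + 1;
}
```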

Daniel
Jan Beulich Aug. 31, 2016, 3:46 p.m. UTC | #3
>>> On 31.08.16 at 16:59, <daniel.kiper@oracle.com> wrote:
> On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
>> >>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
>> > Every multiboot protocol (regardless of version) compatible image must
>> > specify its load address (in ELF or multiboot header). Multiboot protocol
>> > compatible loaders have to load the image at the specified address. However, there
>> > is no guarantee that the requested memory region (in case of Xen it starts
>> > at 1 MiB and ends at 17 MiB) where image should be loaded initially is a RAM
>> > and it is free (legacy BIOS platforms are merciful for Xen but I found at
>> > least one EFI platform on which Xen load address conflicts with EFI boot
>> > services; it is Dell PowerEdge R820 with latest firmware). To cope with that
>> > problem we must make Xen early boot code relocatable and help boot loader to
>> > relocate image in proper way by suggesting, not requesting specific load
>> > addresses as it is right now, allowed address ranges. This patch does former.
>> > It does not add multiboot2 protocol interface which is done in "x86: add
>> > multiboot2 protocol support for relocatable images" patch.
>> >
>> > This patch changes following things:
>> >   - default load address is changed from 1 MiB to 2 MiB; I did that because
>> >     initial page tables are using 2 MiB huge pages and this way required
>> >     updates for them are quite easy; it means that e.g. we avoid special
>> >     cases for the start and end of the required memory region if it lives
>> >     at an address not aligned to 2 MiB,
>>
>> Should this be a separate change then, to separate possible
>> regressions due to that from such due to any other of the changes
> 
> Potentially yes. However, it should be done properly; otherwise, in
> case of a revert, we could break the Xen relocation infrastructure and
> other things. So, I am not sure it pays off. Anyway, I will check
> whether it is possible.

It's not so much about possibly reverting (we'd rather revert
everything if such a need arises) as about bisecting.

>> here? And then I don't see why, when making the image relocatable
>> anyway, the link time load address actually still matters.
> 
> It matters for multiboot (v1) and multiboot2 compatible boot
> loaders not supporting relocatable images.

Oh, right.

>> > --- a/xen/arch/x86/boot/head.S
>> > +++ b/xen/arch/x86/boot/head.S
>> > @@ -12,13 +12,16 @@
>> >          .text
>> >          .code32
>> >
>> > -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
>> > +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)
>>
>> In a patch already large and hence hard to review, did you really
>> need to do this rename?
> 
> I think that sym_offset() is the better term here. However, if you wish
> I can leave sym_phys() as is. Though, IMO, the name sym_phys() suggests
> that we are dealing with physical addresses, which is not correct in
> all contexts; e.g. look at fs_offset() (or sym_fs() if you prefer).

After some more thinking about this I agree sym_phys() isn't
ideal anymore after this patch. Still, to avoid unrelated changes in
this quite hard to review patch, I'd like to ask that you keep the
name here and do just the rename in a subsequent patch.
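
To illustrate the naming point: the macro computes a displacement from
__XEN_VIRT_START, which coincides with a physical address only while the image
sits at its link-time position. A toy model (plain Python, values assumed for
illustration rather than taken from the tree):

```python
XEN_VIRT_START = 0xffff82d080000000  # example link-time virtual base
XEN_IMG_OFFSET = 0x200000            # new default in-image offset

def sym_offset(sym_vaddr):
    """Displacement of a symbol from the virtual start of the image."""
    return sym_vaddr - XEN_VIRT_START

def sym_phys(sym_vaddr, virt_zero_phys=0):
    """Physical address of a symbol; virt_zero_phys is where
    __XEN_VIRT_START ends up physically (0 in the historical
    non-relocated layout, hence offset == physical address there)."""
    return virt_zero_phys + sym_offset(sym_vaddr)

start = XEN_VIRT_START + XEN_IMG_OFFSET
assert sym_offset(start) == 0x200000
assert sym_phys(start) == 0x200000              # non-relocated: equal
assert sym_phys(start, 0x4000000) == 0x4200000  # relocated: they differ
```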

>> Also, just to double check - all these page table setup adjustments
>> don't require reflecting in xen.efi's page table setup code?
> 
> I will check it once again but I do not think so.

Thanks for doing so. You perhaps realize that Andrew said the same
for what became 2ce5963727, and then recently we had to fix things.

>> > --- a/xen/arch/x86/boot/trampoline.S
>> > +++ b/xen/arch/x86/boot/trampoline.S
>> > @@ -54,12 +54,20 @@ trampoline_gdt:
>> >          /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
>> >          .long   0x0000ffff
>> >          .long   0x00009200
>> > +        /*
>> > +         * 0x0030: ring 0 Xen data, 16 MiB size, base
>> > +         * address is computed during runtime.
>> > +         */
>> > +        .quad   0x00c0920000001000
>>
>> This does not look like it covers 1Mb, it's more like 1Mb+4k-1.
> 
> I have checked it once again. It covers 16 MiB as required:
>   4 KiB * 0x1000 = 0x1000000 (not taking relocation into account).

I'm sorry, but no. The raw limit value taken from that descriptor
is 0x1000, and the G bit is set. That makes the actual limit 0x1000fff.

> By the way, it looks that we can define this segment as 0x200000 - 0x1000000.
> However, then sym_fs() should be defined as:
> 
> #define sym_fs(sym)    %fs:(sym_offset(sym) - XEN_IMG_OFFSET)
> 
> Does it make sense?

If you feel like it's worth it ...

>> >          .pushsection .trampoline_rel, "a"
>> >          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
>> >          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
>> >          .popsection
>> >
>> > +GLOBAL(xen_img_load_base_addr)
>> > +        .long   0
>>
>> I've yet to understand the purpose of this symbol, but in any case:
>> If the trampoline code doesn't reference it, why does it get placed
>> here?
> 
> This symbol was used by code which unconditionally relocated the Xen image
> even if the boot loader had already done that work for us. I removed most of
> that code because we agreed that if the boot loader relocated the Xen image
> then we do not have to relocate it higher once again. If you are still OK with
> this idea I can go further, as I planned, and remove all such remnants in the
> next revision. However, it looks like I then have to store the load address in
> xen_phys_start from 32-bit assembly code. So, it would be nice if I could
> define it as "unsigned int" instead of "unsigned long". Is it safe?

I don't see why you shouldn't be able to store into the low 32 bits of
the variable without altering the high ones (which are all zero).

>> >          s = (boot_e820.map[i].addr + mask) & ~mask;
>> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
>> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
>> >              continue;
>>
>> This means you now map memory between 2Mb and 16Mb here. Is
>> this necessary?
> 
> Without this change Xen relocated by boot loader crashes with:
> 
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
> 
> So, it must be mapped.

That's too little context to be useful for understanding the full
background.

>> > --- a/xen/arch/x86/xen.lds.S
>> > +++ b/xen/arch/x86/xen.lds.S
>> > @@ -55,7 +55,7 @@ SECTIONS
>> >    __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
>> >  #endif
>> >
>> > -  . = __XEN_VIRT_START + MB(1);
>> > +  . = __XEN_VIRT_START + XEN_IMG_OFFSET;
>> >    _start = .;
>> >    .text : {
>> >          _stext = .;            /* Text and read-only data */
>> > @@ -257,12 +257,14 @@ SECTIONS
>> >    .reloc : {
>> >      *(.reloc)
>> >    } :text
>> > -  /* Trick the linker into setting the image size to exactly 16Mb. */
>> >    . = ALIGN(__section_alignment__);
>> > +#endif
>> > +
>> > +  /* Trick the linker into setting the image size to exactly 16Mb. */
>> >    .pad : {
>> >      . = ALIGN(MB(16));
>> > +    __end_of_image__ = .;
>>
>> I see that you use this symbol in xen/arch/x86/Makefile, but I again
>> don't follow why this logic needs to change.
> 
> The current logic does not work because it gets the wrong address from
> xen/xen-syms. This way the boot loader allocates an incorrect, usually huge,
> buffer for the Xen image (incidentally, there is a chance that this fixes the
> issues related to 2 MiB mappings found by Andrew). I do not see a simple,
> reliable fix for the current solution, so I introduced __end_of_image__ and
> look up its address. This is a much better way to establish the end-of-image
> address than the previous one. However, I agree that this could be introduced
> in a separate patch.

Yes please, as that also will (hopefully) result in the need for the
change getting properly explained.

>> > --- a/xen/include/asm-x86/page.h
>> > +++ b/xen/include/asm-x86/page.h
>> > @@ -288,7 +288,7 @@ extern root_pgentry_t
>> > idle_pg_table[ROOT_PAGETABLE_ENTRIES];
>> >  extern l2_pgentry_t  *compat_idle_pg_table_l2;
>> >  extern unsigned int   m2p_compat_vstart;
>> >  extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
>> > -    l2_bootmap[L2_PAGETABLE_ENTRIES];
>> > +    l2_bootmap[4*L2_PAGETABLE_ENTRIES];
>>
>> Why do you need 4 of them? I can see why one might not suffice,
>> but two surely should?
> 
> In the worst case we need three: one for the l1_identmap hook and two
> for the Xen image mapping. So, I am not sure that it pays to complicate
> the assembly mapping code just to save one page. Additionally, remember
> that l2_bootmap is freed after init.

Well, I was asking, not saying this needs to change. And you
pointing out the l1_identmap aspect makes clear this is fine as is.

Jan
Daniel Kiper Aug. 31, 2016, 8:50 p.m. UTC | #4
On Wed, Aug 31, 2016 at 09:46:31AM -0600, Jan Beulich wrote:
> >>> On 31.08.16 at 16:59, <daniel.kiper@oracle.com> wrote:
> > On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
> >> >>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:

[...]

> >> > --- a/xen/arch/x86/boot/head.S
> >> > +++ b/xen/arch/x86/boot/head.S
> >> > @@ -12,13 +12,16 @@
> >> >          .text
> >> >          .code32
> >> >
> >> > -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
> >> > +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)
> >>
> >> In a patch already large and hence hard to review, did you really
> >> need to do this rename?
> >
> > I think that sym_offset() is the better term here. However, if you wish
> > I can leave sym_phys() as is. Though, IMO, the name sym_phys() suggests
> > that we are dealing with physical addresses, which is not correct in
> > all contexts; e.g. look at fs_offset() (or sym_fs() if you prefer).
>
> After some more thinking about this I agree sym_phys() isn't
> ideal anymore after this patch. Still, to avoid unrelated changes in
> this quite hard to review patch, I'd like to ask that you keep the
> name here and do just the rename in a subsequent patch.

Granted!

[...]

> >> > --- a/xen/arch/x86/boot/trampoline.S
> >> > +++ b/xen/arch/x86/boot/trampoline.S
> >> > @@ -54,12 +54,20 @@ trampoline_gdt:
> >> >          /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
> >> >          .long   0x0000ffff
> >> >          .long   0x00009200
> >> > +        /*
> >> > +         * 0x0030: ring 0 Xen data, 16 MiB size, base
> >> > +         * address is computed during runtime.
> >> > +         */
> >> > +        .quad   0x00c0920000001000
> >>
> >> This does not look like it covers 1Mb, it's more like 1Mb+4k-1.
> >
> > I have checked it once again. It covers 16 MiB as required:
> >   4 KiB * 0x1000 = 0x1000000 (not taking relocation into account).
>
> I'm sorry, but no. The raw limit value taken from that descriptor
> is 0x1000, and the G bit is set. That makes the actual limit 0x1000fff.

I was not sure why, but this explains it: "When the granularity flag is
set, the twelve least significant bits of an offset are not tested when
checking the offset against the segment limit. For example, when the
granularity flag is set, a limit of 0 results in valid offsets from 0 to 4095."
(Intel(R) 64 and IA-32 Architectures Software Developer’s Manual Volume 3
(3A, 3B & 3C): System Programming Guide). So this should be:

  .quad 0x00c0920000000fff
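
[To make the granularity rule concrete: with the G flag set, the CPU shifts the
raw 20-bit limit left by 12 and fills the low bits with ones. A small sketch,
in Python rather than Xen code, decoding the two descriptor quads discussed
above:]

```python
def gdt_effective_limit(desc: int) -> int:
    """Decode the effective limit of a GDT descriptor encoded as a
    64-bit quad (as written with .quad in trampoline_gdt)."""
    raw = (desc & 0xffff) | ((desc >> 32) & 0xf0000)  # limit[15:0] | limit[19:16]
    if (desc >> 55) & 1:                              # G (granularity) flag
        return (raw << 12) | 0xfff                    # 4 KiB units; low 12 bits pass
    return raw                                        # byte granularity

# Raw limit 0x1000 with G set: 16 MiB + 4 KiB - 1, as Jan points out.
assert gdt_effective_limit(0x00c0920000001000) == 0x1000fff
# Raw limit 0xfff with G set: exactly 16 MiB - 1, i.e. a 16 MiB segment.
assert gdt_effective_limit(0x00c0920000000fff) == 0xffffff
```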

[...]

> >> >          .pushsection .trampoline_rel, "a"
> >> >          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
> >> >          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
> >> >          .popsection
> >> >
> >> > +GLOBAL(xen_img_load_base_addr)
> >> > +        .long   0
> >>
> >> I've yet to understand the purpose of this symbol, but in any case:
> >> If the trampoline code doesn't reference it, why does it get placed
> >> here?
> >
> > This symbol was used by code which unconditionally relocated the Xen image
> > even if the boot loader had already done that work for us. I removed most of
> > that code because we agreed that if the boot loader relocated the Xen image
> > then we do not have to relocate it higher once again. If you are still OK with
> > this idea I can go further, as I planned, and remove all such remnants in the
> > next revision. However, it looks like I then have to store the load address in
> > xen_phys_start from 32-bit assembly code. So, it would be nice if I could
> > define it as "unsigned int" instead of "unsigned long". Is it safe?
>
> I don't see why you shouldn't be able to store into the low 32 bits of
> the variable without altering the high ones (which are all zero).

I was thinking about that too but IMO it is not nice. However, if you
are OK with that I can do it.

> >> >          s = (boot_e820.map[i].addr + mask) & ~mask;
> >> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> >> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> >> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> >> >              continue;
> >>
> >> This means you now map memory between 2Mb and 16Mb here. Is
> >> this necessary?
> >
> > Without this change Xen relocated by boot loader crashes with:
> >
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
> >
> > So, it must be mapped.
>
> That's too little context to be useful for understanding the full
> background.

Here it is:

(XEN) Detected 2194.733 MHz processor.
(XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
(XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:  C   ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d0803c55d4>] subarch_init_memory+0x53e/0x5ea
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: ffff830000200000   rbx: ffffffffffffffff   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 000000007ea04000   rdi: 8000000000000000
(XEN) rbp: ffff82d080417d78   rsp: ffff82d080417d38   r8:  ffff830000000000
(XEN) r9:  00000007c7ffffff   r10: ffffffffffffffff   r11: 0000000000000000
(XEN) r12: ffff83007ea04008   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006e0
(XEN) cr3: 000000007ea0c000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d0803c55d4> (subarch_init_memory+0x53e/0x5ea):
(XEN)  09 fe 41 f6 c6 01 75 02 <0f> 0b 41 f6 c6 80 74 0f 48 09 fa 49 89 14 24 48
(XEN) Xen stack trace from rsp=ffff82d080417d38:
(XEN)    ffff82d0802324c3 000000000007ea04 ffff82d080417d78 0000000000180000
(XEN)    0000000000000009 0000000000180000 ffff82e000000000 0000000000180000
(XEN)    ffff82d080417db8 ffff82d0803c42af ffff82d080417da8 ffff82d080417fff
(XEN)    ffff83017f1e0670 ffff82d08054cad0 ffff82d08054cad0 0000000000000003
(XEN)    ffff82d080417f08 ffff82d0803ca8d5 0000000000000000 0000000000000001
(XEN)    0000024d00000000 000000000034cc00 000000000000014d 00000000000001f5
(XEN)    00000000000001f5 000000000000019f 000000000000019f 000000000000014e
(XEN)    000000000000014e 0000000000000002 0000000000000001 0000000000000001
(XEN)    0000000000000001 0000000000000001 0000000000000001 0000000000000001
(XEN)    0000000000b42000 ffff83000008eee0 0000000000000000 000000000000000e
(XEN)    0000000000180000 ffff83000008efa0 000000017f1f2000 fffffffffffff001
(XEN)    ffff83000008efb0 ffff82d0803ef99c 0000000000000000 0000000000000000
(XEN)    0000000800000000 000000010000006e 0000000000000003 00000000000002f8
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d0802000f3
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d0803c55d4>] subarch_init_memory+0x53e/0x5ea
(XEN)    [<ffff82d0803c42af>] arch_init_memory+0x2f3/0x5d3
(XEN)    [<ffff82d0803ca8d5>] __start_xen+0x242f/0x2965
(XEN)    [<ffff82d0802000f3>] __high_start+0x53/0x60
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
(XEN) ****************************************

Is it sufficient? If not, please drop me a line about what else you need.

Daniel
Jan Beulich Sept. 1, 2016, 7:46 a.m. UTC | #5
>>> On 31.08.16 at 22:50, <daniel.kiper@oracle.com> wrote:
> On Wed, Aug 31, 2016 at 09:46:31AM -0600, Jan Beulich wrote:
>> >>> On 31.08.16 at 16:59, <daniel.kiper@oracle.com> wrote:
>> > On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
>> >> >>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
>> >> > --- a/xen/arch/x86/boot/head.S
>> >> > +++ b/xen/arch/x86/boot/head.S
>> >> > @@ -12,13 +12,16 @@
>> >> >          .text
>> >> >          .code32
>> >> >
>> >> > -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
>> >> > +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)
>> >>
>> >> In a patch already large and hence hard to review, did you really
>> >> need to do this rename?
>> >
>> > I think that sym_offset() is the better term here. However, if you wish
>> > I can leave sym_phys() as is. Though, IMO, the name sym_phys() suggests
>> > that we are dealing with physical addresses, which is not correct in
>> > all contexts; e.g. look at fs_offset() (or sym_fs() if you prefer).
>>
>> After some more thinking about this I agree sym_phys() isn't
>> ideal anymore after this patch. Still, to avoid unrelated changes in
>> this quite hard to review patch, I'd like to ask that you keep the
>> name here and do just the rename in a subsequent patch.
> 
> Granted!

Btw., to save on typing and since you will need to touch this anyway
- can I talk you into using sym_offs() instead of sym_offset()?

>> >> >          .pushsection .trampoline_rel, "a"
>> >> >          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
>> >> >          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
>> >> >          .popsection
>> >> >
>> >> > +GLOBAL(xen_img_load_base_addr)
>> >> > +        .long   0
>> >>
>> >> I've yet to understand the purpose of this symbol, but in any case:
>> >> If the trampoline code doesn't reference it, why does it get placed
>> >> here?
>> >
>> > This symbol was used by code which unconditionally relocated the Xen image
>> > even if the boot loader had already done that work for us. I removed most of
>> > that code because we agreed that if the boot loader relocated the Xen image
>> > then we do not have to relocate it higher once again. If you are still OK with
>> > this idea I can go further, as I planned, and remove all such remnants in the
>> > next revision. However, it looks like I then have to store the load address in
>> > xen_phys_start from 32-bit assembly code. So, it would be nice if I could
>> > define it as "unsigned int" instead of "unsigned long". Is it safe?
>>
>> I don't see why you shouldn't be able to store into the low 32 bits of
>> the variable without altering the high ones (which are all zero).
> 
> I was thinking about that too but IMO it is not nice. However, if you
> are OK with that I can do it.

Well, imo that's something which is perfectly fine in assembly code.

>> >> >          s = (boot_e820.map[i].addr + mask) & ~mask;
>> >> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
>> >> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>> >> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
>> >> >              continue;
>> >>
>> >> This means you now map memory between 2Mb and 16Mb here. Is
>> >> this necessary?
>> >
>> > Without this change Xen relocated by boot loader crashes with:
>> >
>> > (XEN) Panic on CPU 0:
>> > (XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
>> >
>> > So, it must be mapped.
>>
>> That's too little context to be useful for understanding the full
>> background.
> 
> Here it is:
>[...]
> > Is it sufficient? If not, please drop me a line about what else you need.

I'm sorry, but no. I didn't mean the whole register and stack dump. I
meant an explanation of _why_ this assertion triggers (after all it
doesn't trigger without your patches, and hence I can't just go and
check the relevant source files).

Jan
Daniel Kiper Sept. 1, 2016, 9:19 p.m. UTC | #6
On Thu, Sep 01, 2016 at 01:46:24AM -0600, Jan Beulich wrote:
> >>> On 31.08.16 at 22:50, <daniel.kiper@oracle.com> wrote:
> > On Wed, Aug 31, 2016 at 09:46:31AM -0600, Jan Beulich wrote:
> >> >>> On 31.08.16 at 16:59, <daniel.kiper@oracle.com> wrote:
> >> > On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
> >> >> >>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
> >> >> > --- a/xen/arch/x86/boot/head.S
> >> >> > +++ b/xen/arch/x86/boot/head.S
> >> >> > @@ -12,13 +12,16 @@
> >> >> >          .text
> >> >> >          .code32
> >> >> >
> >> >> > -#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
> >> >> > +#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)
> >> >>
> >> >> In a patch already large and hence hard to review, did you really
> >> >> need to do this rename?
> >> >
> >> > I think that sym_offset() is the better term here. However, if you wish
> >> > I can leave sym_phys() as is. Though, IMO, the name sym_phys() suggests
> >> > that we are dealing with physical addresses, which is not correct in
> >> > all contexts; e.g. look at fs_offset() (or sym_fs() if you prefer).
> >>
> >> After some more thinking about this I agree sym_phys() isn't
> >> ideal anymore after this patch. Still, to avoid unrelated changes in
> >> this quite hard to review patch, I'd like to ask that you keep the
> >> name here and do just the rename in a subsequent patch.
> >
> > Granted!
>
> Btw., to save on typing and since you will need to touch this anyway
> - can I talk you into using sym_offs() instead of sym_offset()?

Sure thing!

> >> >> >          .pushsection .trampoline_rel, "a"
> >> >> >          .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
> >> >> >          .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
> >> >> >          .popsection
> >> >> >
> >> >> > +GLOBAL(xen_img_load_base_addr)
> >> >> > +        .long   0
> >> >>
> >> >> I've yet to understand the purpose of this symbol, but in any case:
> >> >> If the trampoline code doesn't reference it, why does it get placed
> >> >> here?
> >> >
> >> > This symbol was used by code which unconditionally relocated the Xen image
> >> > even if the boot loader had already done that work for us. I removed most of
> >> > that code because we agreed that if the boot loader relocated the Xen image
> >> > then we do not have to relocate it higher once again. If you are still OK with
> >> > this idea I can go further, as I planned, and remove all such remnants in the
> >> > next revision. However, it looks like I then have to store the load address in
> >> > xen_phys_start from 32-bit assembly code. So, it would be nice if I could
> >> > define it as "unsigned int" instead of "unsigned long". Is it safe?
> >>
> >> I don't see why you shouldn't be able to store into the low 32 bits of
> >> the variable without altering the high ones (which are all zero).
> >
> > I was thinking about that too but IMO it is not nice. However, if you
> > are OK with that I can do it.
>
> Well, imo that's something which is perfectly fine in assembly code.

OK, I will do that.

> >> >> >          s = (boot_e820.map[i].addr + mask) & ~mask;
> >> >> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> >> >> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> >> >> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> >> >> >              continue;
> >> >>
> >> >> This means you now map memory between 2Mb and 16Mb here. Is
> >> >> this necessary?
> >> >
> >> > Without this change Xen relocated by boot loader crashes with:
> >> >
> >> > (XEN) Panic on CPU 0:
> >> > (XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
> >> >
> >> > So, it must be mapped.
> >>
> >> That's too little context to be useful for understanding the full
> >> background.
> >
> > Here it is:
> >[...]
> > Is it sufficient? If not, please drop me a line about what else you need.
>
> I'm sorry, but no. I didn't mean the whole register and stack dump. I
> meant an explanation of _why_ this assertion triggers (after all it
> doesn't trigger without your patches, and hence I can't just go and
> check the relevant source files).

AIUI, if the Xen image is not relocated by the boot loader (e.g. GRUB2), the
2 MiB - 16 MiB region is mapped. Later, after relocating itself, Xen marks the
1 MiB - 16 MiB region as NX (if NX is supported by the hardware) in
subarch_init_memory(). However, if the boot loader relocates Xen, the
2 MiB - 16 MiB region is not mapped unless the change under consideration is
applied; then ASSERT(l2e_get_flags(l2e) & _PAGE_PRESENT) in
subarch_init_memory() triggers and Xen crashes.
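
[A toy model of that setup.c loop (the e820 entry, its layout, and the
BOOTSTRAP_MAP_BASE value are invented for illustration; the real code walks a
boot_e820 array and mask-aligns the addresses first) shows why removing the
clamp makes the difference:]

```python
MB = 1 << 20
BOOTSTRAP_MAP_BASE = 16 * MB  # assumed value, for illustration only

def mapped_ranges(e820, clamp_low):
    """Which RAM ranges the loop would map; clamp_low models the
    's = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);' line the patch removes."""
    out = []
    for base, size, typ in e820:
        s, e = base, base + size
        if clamp_low:
            s = max(s, BOOTSTRAP_MAP_BASE)
        if typ != "RAM" or s >= e:
            continue
        out.append((s, e))
    return out

e820 = [(2 * MB, 62 * MB, "RAM")]
# With the clamp, 2 MiB - 16 MiB is never mapped here, so a relocated Xen
# hits the _PAGE_PRESENT assertion when that region is examined later.
assert mapped_ranges(e820, clamp_low=True) == [(16 * MB, 64 * MB)]
# Without the clamp, the low region is covered as well.
assert mapped_ranges(e820, clamp_low=False) == [(2 * MB, 64 * MB)]
```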

I hope that helps.

Daniel
Jan Beulich Sept. 2, 2016, 6:58 a.m. UTC | #7
>>> On 01.09.16 at 23:19, <daniel.kiper@oracle.com> wrote:
> On Thu, Sep 01, 2016 at 01:46:24AM -0600, Jan Beulich wrote:
>> >>> On 31.08.16 at 22:50, <daniel.kiper@oracle.com> wrote:
>> > On Wed, Aug 31, 2016 at 09:46:31AM -0600, Jan Beulich wrote:
>> >> >>> On 31.08.16 at 16:59, <daniel.kiper@oracle.com> wrote:
>> >> > On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
>> >> >> >>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
>> >> >> >          s = (boot_e820.map[i].addr + mask) & ~mask;
>> >> >> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
>> >> >> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>> >> >> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
>> >> >> >              continue;
>> >> >>
>> >> >> This means you now map memory between 2Mb and 16Mb here. Is
>> >> >> this necessary?
>> >> >
>> >> > Without this change Xen relocated by boot loader crashes with:
>> >> >
>> >> > (XEN) Panic on CPU 0:
>> >> > (XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
>> >> >
>> >> > So, it must be mapped.
>> >>
>> >> That's too little context to be useful for understanding the full
>> >> background.
>> >
>> > Here it is:
>> >[...]
>> > Is it sufficient? If not please drop me a line what else do you need.
>>
>> I'm sorry, but no. I didn't mean the whole register and stack dump. I
>> meant an explanation of _why_ this assertion triggers (after all it
>> doesn't trigger without your patches, and hence I can't just go and
>> check the relevant source files).
> 
> AIUI, if the Xen image is not relocated by the boot loader (e.g. GRUB2), the
> 2 MiB - 16 MiB region is mapped. Later, after relocating itself, Xen marks the
> 1 MiB - 16 MiB region as NX (if NX is supported by the hardware) in
> subarch_init_memory(). However, if the boot loader relocates Xen, the
> 2 MiB - 16 MiB region is not mapped unless the change under consideration is
> applied; then ASSERT(l2e_get_flags(l2e) & _PAGE_PRESENT) in
> subarch_init_memory() triggers and Xen crashes.

Ah, I see now. But instead of altering the setup.c logic even for the
not relocated case, perhaps you'd better establish the here expected
mapping in the relocated case? I'm rather hesitant to see that setup.c
logic change for pre-existing cases; it may alternatively be acceptable
to make the line you delete conditional.

Jan
Daniel Kiper Sept. 2, 2016, 7:28 a.m. UTC | #8
On Fri, Sep 02, 2016 at 12:58:14AM -0600, Jan Beulich wrote:
> >>> On 01.09.16 at 23:19, <daniel.kiper@oracle.com> wrote:
> > On Thu, Sep 01, 2016 at 01:46:24AM -0600, Jan Beulich wrote:
> >> >>> On 31.08.16 at 22:50, <daniel.kiper@oracle.com> wrote:
> >> > On Wed, Aug 31, 2016 at 09:46:31AM -0600, Jan Beulich wrote:
> >> >> >>> On 31.08.16 at 16:59, <daniel.kiper@oracle.com> wrote:
> >> >> > On Thu, Aug 25, 2016 at 08:41:39AM -0600, Jan Beulich wrote:
> >> >> >> >>> On 20.08.16 at 00:43, <daniel.kiper@oracle.com> wrote:
> >> >> >> >          s = (boot_e820.map[i].addr + mask) & ~mask;
> >> >> >> >          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> >> >> >> > -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> >> >> >> >          if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> >> >> >> >              continue;
> >> >> >>
> >> >> >> This means you now map memory between 2Mb and 16Mb here. Is
> >> >> >> this necessary?
> >> >> >
> >> >> > Without this change Xen relocated by boot loader crashes with:
> >> >> >
> >> >> > (XEN) Panic on CPU 0:
> >> >> > (XEN) Assertion 'l2e_get_flags(l2e) & _PAGE_PRESENT' failed at x86_64/mm.c:902
> >> >> >
> >> >> > So, it must be mapped.
> >> >>
> >> >> That's too little context to be useful for understanding the full
> >> >> background.
> >> >
> >> > Here it is:
> >> >[...]
> >> > Is it sufficient? If not please drop me a line what else do you need.
> >>
> >> I'm sorry, but no. I didn't mean the whole register and stack dump. I
> >> meant an explanation of _why_ this assertion triggers (after all it
> >> doesn't trigger without your patches, and hence I can't just go and
> >> check the relevant source files).
> >
> > AIUI, if the Xen image is not relocated by the boot loader (e.g. GRUB2), the
> > 2 MiB - 16 MiB region is mapped. Later, after relocating itself, Xen marks the
> > 1 MiB - 16 MiB region as NX (if NX is supported by the hardware) in
> > subarch_init_memory(). However, if the boot loader relocates Xen, the
> > 2 MiB - 16 MiB region is not mapped unless the change under consideration is
> > applied; then ASSERT(l2e_get_flags(l2e) & _PAGE_PRESENT) in
> > subarch_init_memory() triggers and Xen crashes.
>
> Ah, I see now. But instead of altering the setup.c logic even for the
> not relocated case, perhaps you'd better establish the here expected
> mapping in the relocated case? I'm rather hesitant to see that setup.c
> logic change for pre-existing cases; it may alternatively be acceptable
> to make the line you delete conditional.

...or map the 2 MiB - 16 MiB region in xen/arch/x86/boot/head.S. It could
be done in a way similar to the Xen image mapping. However, this would
need testing...

Daniel

Patch

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 9464b7b..df899c1 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -89,8 +89,8 @@  all_symbols =
 endif
 
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
-	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) 0x100000 \
-	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
+	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \
+		`$(NM) -nr $(TARGET)-syms | awk '$$3 == "__end_of_image__" {print "0x"$$1}'`
 
 .PHONY: tests
 tests:
diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
index 42be4bc..dd10afe 100644
--- a/xen/arch/x86/Rules.mk
+++ b/xen/arch/x86/Rules.mk
@@ -1,6 +1,10 @@ 
 ########################################
 # x86-specific definitions
 
+XEN_IMG_OFFSET = 0x200000
+
+CFLAGS += -DXEN_IMG_OFFSET=$(XEN_IMG_OFFSET)
+
 CFLAGS += -I$(BASEDIR)/include
 CFLAGS += -I$(BASEDIR)/include/asm-x86/mach-generic
 CFLAGS += -I$(BASEDIR)/include/asm-x86/mach-default
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index b832b21..a1b0c05 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -12,13 +12,16 @@ 
         .text
         .code32
 
-#define sym_phys(sym)     ((sym) - __XEN_VIRT_START)
+#define sym_offset(sym)   ((sym) - __XEN_VIRT_START)
+#define esi_offset(sym)   sym_offset(sym)(%esi)
+#define fs_offset(sym)    %fs:sym_offset(sym)
 
 #define BOOT_CS32        0x0008
 #define BOOT_CS64        0x0010
 #define BOOT_DS          0x0018
 #define BOOT_PSEUDORM_CS 0x0020
 #define BOOT_PSEUDORM_DS 0x0028
+#define BOOT_FS          0x0030
 
 #define MB2_HT(name)      (MULTIBOOT2_HEADER_TAG_##name)
 #define MB2_TT(name)      (MULTIBOOT2_TAG_TYPE_##name)
@@ -94,7 +97,7 @@  multiboot2_header_start:
 
         /* EFI64 entry point. */
         mb2ht_init MB2_HT(ENTRY_ADDRESS_EFI64), MB2_HT(OPTIONAL), \
-                   sym_phys(__efi64_start)
+                   sym_offset(__efi64_start)
 
         /* Multiboot2 header end tag. */
         mb2ht_init MB2_HT(END), MB2_HT(REQUIRED)
@@ -105,12 +108,13 @@  multiboot2_header_start:
 
         .word   0
 gdt_boot_descr:
-        .word   6*8-1
-        .long   sym_phys(trampoline_gdt)
+        .word   7*8-1
+gdt_boot_base:
+        .long   sym_offset(trampoline_gdt)
         .long   0 /* Needed for 64-bit lgdt */
 
 cs32_switch_addr:
-        .long   sym_phys(cs32_switch)
+        .long   sym_offset(cs32_switch)
         .word   BOOT_CS32
 
 vga_text_buffer:
@@ -126,26 +130,26 @@  vga_text_buffer:
         .section .init.text, "ax", @progbits
 
 bad_cpu:
-        mov     $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
+        lea     esi_offset(.Lbad_cpu_msg),%esi  # Error message
         jmp     0f
 not_multiboot:
-        mov     $(sym_phys(.Lbad_ldr_msg)),%esi # Error message
+        lea     esi_offset(.Lbad_ldr_msg),%esi  # Error message
         jmp     0f
 mb2_no_st:
-        mov     $(sym_phys(.Lbad_ldr_nst)),%esi # Error message
+        lea     esi_offset(.Lbad_ldr_nst),%esi  # Error message
         jmp     0f
 mb2_no_ih:
-        mov     $(sym_phys(.Lbad_ldr_nih)),%esi # Error message
+        lea     esi_offset(.Lbad_ldr_nih),%esi  # Error message
         jmp     0f
 mb2_no_bs:
-        mov     $(sym_phys(.Lbad_ldr_nbs)),%esi # Error message
+        lea     esi_offset(.Lbad_ldr_nbs),%esi  # Error message
         xor     %edi,%edi                       # No VGA text buffer
         jmp     1f
 mb2_efi_ia_32:
-        mov     $(sym_phys(.Lbad_efi_msg)),%esi # Error message
+        lea     esi_offset(.Lbad_efi_msg),%esi  # Error message
         xor     %edi,%edi                       # No VGA text buffer
         jmp     1f
-0:      mov     sym_phys(vga_text_buffer),%edi
+0:      mov     esi_offset(vga_text_buffer),%edi
 1:      mov     (%esi),%bl
         test    %bl,%bl        # Terminate on '\0' sentinel
         je      .Lhalt
@@ -173,6 +177,9 @@  __efi64_start:
         /* VGA is not available on EFI platforms. */
         movl   $0,vga_text_buffer(%rip)
 
+        /* Load Xen image load base address. */
+        lea     __image_base__(%rip),%r15d
+
         /* Check for Multiboot2 bootloader. */
         cmp     $MULTIBOOT2_BOOTLOADER_MAGIC,%eax
         je      .Lefi_multiboot2_proto
@@ -288,6 +295,9 @@  run_bs:
 
         pop     %rax
 
+        /* Store Xen image load base address in a place accessible to 32-bit code. */
+        mov     %r15d,%esi
+
         /* Jump to trampoline_setup after switching CPU to x86_32 mode. */
         lea     trampoline_setup(%rip),%edi
 
@@ -295,9 +305,11 @@  x86_32_switch:
         cli
 
         /* Initialise GDT. */
+        add     %esi,gdt_boot_base(%rip)
         lgdt    gdt_boot_descr(%rip)
 
         /* Reload code selector. */
+        add     %esi,cs32_switch_addr(%rip)
         ljmpl   *cs32_switch_addr(%rip)
 
         .code32
@@ -327,12 +339,8 @@  __start:
         cld
         cli
 
-        /* Initialise GDT and basic data segments. */
-        lgdt    %cs:sym_phys(gdt_boot_descr)
-        mov     $BOOT_DS,%ecx
-        mov     %ecx,%ds
-        mov     %ecx,%es
-        mov     %ecx,%ss
+        /* Load default Xen image load base address. */
+        mov     $sym_offset(__image_base__),%esi
 
         /* Bootloaders may set multiboot{1,2}.mem_lower to a nonzero value. */
         xor     %edx,%edx
@@ -388,6 +396,25 @@  __start:
         jmp     0b
 
 trampoline_bios_setup:
+        /*
+         * Called on legacy BIOS platforms only.
+         *
+         * Initialise GDT and basic data segments.
+         */
+        add     %esi,esi_offset(gdt_boot_base)
+        lgdt    esi_offset(gdt_boot_descr)
+
+        mov     $BOOT_DS,%ecx
+        mov     %ecx,%ds
+        mov     %ecx,%es
+        mov     %ecx,%ss
+        /* %esp is initialised later. */
+
+        /* Load a null descriptor into the unused segment registers. */
+        xor     %ecx,%ecx
+        mov     %ecx,%fs
+        mov     %ecx,%gs
+
         /* Set up trampoline segment 64k below EBDA */
         movzwl  0x40e,%ecx          /* EBDA segment */
         cmp     $0xa000,%ecx        /* sanity check (high) */
@@ -409,36 +436,93 @@  trampoline_bios_setup:
         cmovb   %edx,%ecx           /* and use the smaller */
 
 trampoline_setup:
+        /*
+         * Called on legacy BIOS and EFI platforms.
+         *
+         * Compute bits 0-15 of the BOOT_FS segment descriptor base address.
+         */
+        mov     %esi,%edx
+        shl     $16,%edx
+        or      %edx,BOOT_FS+esi_offset(trampoline_gdt)
+
+        /* Compute bits 16-23 of the BOOT_FS segment descriptor base address. */
+        mov     %esi,%edx
+        shr     $16,%edx
+        and     $0x000000ff,%edx
+        or      %edx,BOOT_FS+4+esi_offset(trampoline_gdt)
+
+        /* Compute bits 24-31 of the BOOT_FS segment descriptor base address. */
+        mov     %esi,%edx
+        and     $0xff000000,%edx
+        or      %edx,BOOT_FS+4+esi_offset(trampoline_gdt)
+
+        /*
+         * Initialise %fs and use it later to access Xen data where possible.
+         * According to the Intel 64 and IA-32 Architectures Software
+         * Developer's Manual it is safe to do that without reloading GDTR
+         * first.
+         *
+         * For more details please check the LGDT and MOV instruction
+         * descriptions in the Intel 64 and IA-32 Architectures Software
+         * Developer's Manual, Volume 2 (2A, 2B & 2C): Instruction Set
+         * Reference, and section 3.4.3, Segment Registers, in Volume 3
+         * (3A, 3B & 3C): System Programming Guide.
+         *
+         * AIUI, only the GDT address and limit are loaded into GDTR when
+         * lgdt is executed. A segment descriptor is loaded directly from
+         * memory into the segment register (hidden part) only when the
+         * relevant load instruction is used (e.g. mov %edx,%fs). Though the
+         * GDT contents could be stored in the CPU cache, nothing suggests
+         * that CPU caching interferes in any way with segment descriptor
+         * loads. So, it looks like every change to the active GDT is
+         * immediately visible to subsequent segment descriptor loads.
+         *
+         * I was not able to find anything which invalidates the above,
+         * so everything suggests that we do not need an extra lgdt here.
+         */
+        mov     $BOOT_FS,%edx
+        mov     %edx,%fs
+
         /* Reserve 64kb for the trampoline. */
         sub     $0x1000,%ecx
 
         /* From arch/x86/smpboot.c: start_eip had better be page-aligned! */
         xor     %cl, %cl
         shl     $4, %ecx
-        mov     %ecx,sym_phys(trampoline_phys)
+        mov     %ecx,fs_offset(trampoline_phys)
+
+        /* Save Xen image load base address for later use. */
+        mov     %esi,fs_offset(xen_img_load_base_addr)
+        mov     %esi,fs_offset(trampoline_xen_phys_start)
+
+        /* Set up the stack. %ss was initialised earlier. */
+        lea     1024+esi_offset(cpu0_stack),%esp
 
         /* Save the Multiboot info struct (after relocation) for later use. */
-        mov     $sym_phys(cpu0_stack)+1024,%esp
         push    %ecx                /* Boot trampoline address. */
         push    %ebx                /* Multiboot information address. */
         push    %eax                /* Multiboot magic. */
         call    reloc
-        mov     %eax,sym_phys(multiboot_ptr)
+        mov     %eax,fs_offset(multiboot_ptr)
 
         /*
          * Do not zero BSS on EFI platform here.
          * It was initialized earlier.
          */
-        cmpb    $0,sym_phys(skip_realmode)
+        cmpb    $0,fs_offset(skip_realmode)
         jnz     1f
 
         /* Initialize BSS (no nasty surprises!). */
-        mov     $sym_phys(__bss_start),%edi
-        mov     $sym_phys(__bss_end),%ecx
+        mov     $sym_offset(__bss_start),%edi
+        mov     $sym_offset(__bss_end),%ecx
+        push    %fs
+        pop     %es
         sub     %edi,%ecx
         shr     $2,%ecx
         xor     %eax,%eax
         rep stosl
+        push    %ds
+        pop     %es
 
 1:
         /* Interrogate CPU extended features via CPUID. */
@@ -452,8 +536,8 @@  trampoline_setup:
         jbe     1f
         mov     $0x80000001,%eax
         cpuid
-1:      mov     %edx,sym_phys(cpuid_ext_features)
-        mov     %edx,sym_phys(boot_cpu_data)+CPUINFO_FEATURE_OFFSET(X86_FEATURE_LM)
+1:      mov     %edx,fs_offset(cpuid_ext_features)
+        mov     %edx,fs_offset(boot_cpu_data)+CPUINFO_FEATURE_OFFSET(X86_FEATURE_LM)
 
         /* Check for availability of long mode. */
         bt      $cpufeat_bit(X86_FEATURE_LM),%edx
@@ -461,62 +545,88 @@  trampoline_setup:
 
         /* Stash TSC to calculate a good approximation of time-since-boot */
         rdtsc
-        mov     %eax,sym_phys(boot_tsc_stamp)
-        mov     %edx,sym_phys(boot_tsc_stamp+4)
+        mov     %eax,fs_offset(boot_tsc_stamp)
+        mov     %edx,fs_offset(boot_tsc_stamp)+4
+
+        /* Update frame addresses in page tables. */
+        mov     $((__page_tables_end-__page_tables_start)/8),%ecx
+1:      testl   $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
+        jz      2f
+        add     %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
+2:      loop    1b
+
+        /* Initialise L2 boot-map/direct map page table entries (14MB). */
+        lea     esi_offset(start),%ebx
+        lea     (1<<L2_PAGETABLE_SHIFT)*7+(PAGE_HYPERVISOR|_PAGE_PSE)(%ebx),%eax
+        shr     $(L2_PAGETABLE_SHIFT-3),%ebx
+        mov     $8,%ecx
+1:      mov     %eax,fs_offset(l2_bootmap)-8(%ebx,%ecx,8)
+        mov     %eax,fs_offset(l2_identmap)-8(%ebx,%ecx,8)
+        sub     $(1<<L2_PAGETABLE_SHIFT),%eax
+        loop    1b
+
+        /* Initialise L3 boot-map page directory entry. */
+        lea     __PAGE_HYPERVISOR+(L2_PAGETABLE_ENTRIES*8)*3+esi_offset(l2_bootmap),%eax
+        mov     $4,%ecx
+1:      mov     %eax,fs_offset(l3_bootmap)-8(,%ecx,8)
+        sub     $(L2_PAGETABLE_ENTRIES*8),%eax
+        loop    1b
 
         /*
          * During boot, hook 4kB mappings of first 2MB of memory into L2.
-         * This avoids mixing cachability for the legacy VGA region, and is
-         * corrected when Xen relocates itself.
+         * This avoids mixing cachability for the legacy VGA region.
          */
-        mov     $sym_phys(l1_identmap)+__PAGE_HYPERVISOR,%edi
-        mov     %edi,sym_phys(l2_xenmap)
+        lea     __PAGE_HYPERVISOR+esi_offset(l1_identmap),%edi
+        mov     %edi,fs_offset(l2_bootmap)
 
         /* Apply relocations to bootstrap trampoline. */
-        mov     sym_phys(trampoline_phys),%edx
-        mov     $sym_phys(__trampoline_rel_start),%edi
+        mov     fs_offset(trampoline_phys),%edx
+        mov     $sym_offset(__trampoline_rel_start),%edi
+        mov     $sym_offset(__trampoline_rel_stop),%ebx
 1:
-        mov     (%edi),%eax
-        add     %edx,(%edi,%eax)
+        mov     %fs:(%edi),%eax
+        add     %edx,%fs:(%edi,%eax)
         add     $4,%edi
-        cmp     $sym_phys(__trampoline_rel_stop),%edi
+        cmp     %ebx,%edi
         jb      1b
 
         /* Patch in the trampoline segment. */
         shr     $4,%edx
-        mov     $sym_phys(__trampoline_seg_start),%edi
+        mov     $sym_offset(__trampoline_seg_start),%edi
+        mov     $sym_offset(__trampoline_seg_stop),%ebx
 1:
-        mov     (%edi),%eax
-        mov     %dx,(%edi,%eax)
+        mov     %fs:(%edi),%eax
+        mov     %dx,%fs:(%edi,%eax)
         add     $4,%edi
-        cmp     $sym_phys(__trampoline_seg_stop),%edi
+        cmp     %ebx,%edi
         jb      1b
 
         /* Do not parse command line on EFI platform here. */
-        cmpb    $0,sym_phys(skip_realmode)
+        cmpb    $0,fs_offset(skip_realmode)
         jnz     1f
 
         /* Bail if there is no command line to parse. */
-        mov     sym_phys(multiboot_ptr),%ebx
+        mov     fs_offset(multiboot_ptr),%ebx
         testl   $MBI_CMDLINE,MB_flags(%ebx)
         jz      1f
 
-        pushl   $sym_phys(early_boot_opts)
+        lea     esi_offset(early_boot_opts),%eax
+        push    %eax
         pushl   MB_cmdline(%ebx)
         call    cmdline_parse_early
 
 1:
         /* Switch to low-memory stack.  */
-        mov     sym_phys(trampoline_phys),%edi
+        mov     fs_offset(trampoline_phys),%edi
         lea     0x10000(%edi),%esp
         lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
         pushl   $BOOT_CS32
         push    %eax
 
         /* Copy bootstrap trampoline to low memory, below 1MB. */
-        mov     $sym_phys(trampoline_start),%esi
+        mov     $sym_offset(trampoline_start),%esi
         mov     $((trampoline_end - trampoline_start) / 4),%ecx
-        rep movsl
+        rep movsl %fs:(%esi),%es:(%edi)
 
         /* Jump into the relocated trampoline. */
         lret
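(Editorial note on the head.S hunk above.) The three OR operations in trampoline_setup scatter the 32-bit load base across the BOOT_FS descriptor's split base fields: base bits 0-15 land in descriptor bits 16-31, bits 16-23 in bits 32-39, and bits 24-31 in bits 56-63. A minimal C sketch of the same packing, with hypothetical helper names, assuming the flat segment descriptor layout from the Intel SDM:

```c
#include <stdint.h>

/*
 * Patch a 32-bit base address into a flat x86 segment descriptor,
 * mirroring the three OR operations applied to trampoline_gdt's
 * BOOT_FS entry: base bits 0-15 land in descriptor bits 16-31,
 * bits 16-23 in descriptor bits 32-39, bits 24-31 in bits 56-63.
 */
static uint64_t set_desc_base(uint64_t desc, uint32_t base)
{
    desc |= (uint64_t)(base & 0x0000ffffu) << 16;   /* base bits 0-15  */
    desc |= (uint64_t)((base >> 16) & 0xffu) << 32; /* base bits 16-23 */
    desc |= (uint64_t)(base & 0xff000000u) << 32;   /* base bits 24-31 */
    return desc;
}

/* Recover the base from a descriptor, for a round-trip sanity check. */
static uint32_t get_desc_base(uint64_t desc)
{
    return (uint32_t)(((desc >> 16) & 0xffffffu) |
                      ((desc >> 32) & 0xff000000u));
}
```

Packing the default 2 MiB load base (0x200000) into the BOOT_FS template quad 0x00c0920000001000 yields 0x00c0922000001000, i.e. only the base 16-23 byte changes, matching what the assembly does with %esi.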
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 8a32728..cfb47a4 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -54,12 +54,20 @@  trampoline_gdt:
         /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
         .long   0x0000ffff
         .long   0x00009200
+        /*
+         * 0x0030: ring 0 Xen data, 16 MiB size, base
+         * address is computed at runtime.
+         */
+        .quad   0x00c0920000001000
 
         .pushsection .trampoline_rel, "a"
         .long   trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
         .long   trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
         .popsection
 
+GLOBAL(xen_img_load_base_addr)
+        .long   0
+
 GLOBAL(trampoline_misc_enable_off)
         .quad   0
 
@@ -87,7 +95,7 @@  trampoline_protmode_entry:
         mov     %ecx,%cr4
 
         /* Load pagetable base register. */
-        mov     $sym_phys(idle_pg_table),%eax
+        mov     $sym_offset(idle_pg_table),%eax
         add     bootsym_rel(trampoline_xen_phys_start,4,%eax)
         mov     %eax,%cr3
 
diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
index 08ea9b2..3c56d12 100644
--- a/xen/arch/x86/boot/wakeup.S
+++ b/xen/arch/x86/boot/wakeup.S
@@ -120,7 +120,7 @@  wakeup_32:
         mov     $bootsym_rel(wakeup_stack, 4, %esp)
 
         # check saved magic again
-        mov     $sym_phys(saved_magic), %eax
+        mov     $sym_offset(saved_magic),%eax
         add     bootsym_rel(trampoline_xen_phys_start, 4, %eax)
         mov     (%eax), %eax
         cmp     $0x9abcdef0, %eax
@@ -133,7 +133,7 @@  wakeup_32:
         mov     %ecx, %cr4
 
         /* Load pagetable base register */
-        mov     $sym_phys(idle_pg_table),%eax
+        mov     $sym_offset(idle_pg_table),%eax
         add     bootsym_rel(trampoline_xen_phys_start,4,%eax)
         mov     %eax,%cr3
 
diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index 9ab9231..9929dd0 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -81,7 +81,6 @@  GLOBAL(boot_cpu_compat_gdt_table)
         .quad 0x0000910000000000     /* per-CPU entry (limit == cpu)      */
         .align PAGE_SIZE, 0
 
-GLOBAL(__page_tables_start)
 /*
  * Mapping of first 2 megabytes of memory. This is mapped with 4kB mappings
  * to avoid type conflicts with fixed-range MTRRs covering the lowest megabyte
@@ -101,18 +100,12 @@  GLOBAL(l1_identmap)
         .endr
         .size l1_identmap, . - l1_identmap
 
-/*
- * Space for mapping the first 4GB of memory, with the first 16 megabytes
- * actualy mapped (mostly using superpages).  Uses 4x 4k pages.
- */
+GLOBAL(__page_tables_start)
+
+/* Space for mapping the first 4GB of memory. Uses 4x 4k pages. */
 GLOBAL(l2_identmap)
-        .quad sym_phys(l1_identmap) + __PAGE_HYPERVISOR
-        idx = 1
-        .rept 7
-        .quad (idx << L2_PAGETABLE_SHIFT) | PAGE_HYPERVISOR | _PAGE_PSE
-        idx = idx + 1
-        .endr
-        .fill 4 * L2_PAGETABLE_ENTRIES - 8, 8, 0
+        .quad sym_offset(l1_identmap) + __PAGE_HYPERVISOR
+        .fill 4 * L2_PAGETABLE_ENTRIES - 1, 8, 0
         .size l2_identmap, . - l2_identmap
 
 /*
@@ -121,9 +114,10 @@  GLOBAL(l2_identmap)
  * page.
  */
 GLOBAL(l2_xenmap)
-        idx = 0
-        .rept 8
-        .quad sym_phys(__image_base__) + (idx << L2_PAGETABLE_SHIFT) + (PAGE_HYPERVISOR | _PAGE_PSE)
+        .quad 0
+        idx = 1
+        .rept 7
+        .quad sym_offset(__image_base__) + (idx << L2_PAGETABLE_SHIFT) + (PAGE_HYPERVISOR | _PAGE_PSE)
         idx = idx + 1
         .endr
         .fill L2_PAGETABLE_ENTRIES - 8, 8, 0
@@ -134,7 +128,7 @@  l2_fixmap:
         idx = 0
         .rept L2_PAGETABLE_ENTRIES
         .if idx == l2_table_offset(FIXADDR_TOP - 1)
-        .quad sym_phys(l1_fixmap) + __PAGE_HYPERVISOR
+        .quad sym_offset(l1_fixmap) + __PAGE_HYPERVISOR
         .else
         .quad 0
         .endif
@@ -146,7 +140,7 @@  l2_fixmap:
 GLOBAL(l3_identmap)
         idx = 0
         .rept 4
-        .quad sym_phys(l2_identmap) + (idx << PAGE_SHIFT) + __PAGE_HYPERVISOR
+        .quad sym_offset(l2_identmap) + (idx << PAGE_SHIFT) + __PAGE_HYPERVISOR
         idx = idx + 1
         .endr
         .fill L3_PAGETABLE_ENTRIES - 4, 8, 0
@@ -157,9 +151,9 @@  l3_xenmap:
         idx = 0
         .rept L3_PAGETABLE_ENTRIES
         .if idx == l3_table_offset(XEN_VIRT_START)
-        .quad sym_phys(l2_xenmap) + __PAGE_HYPERVISOR
+        .quad sym_offset(l2_xenmap) + __PAGE_HYPERVISOR
         .elseif idx == l3_table_offset(FIXADDR_TOP - 1)
-        .quad sym_phys(l2_fixmap) + __PAGE_HYPERVISOR
+        .quad sym_offset(l2_fixmap) + __PAGE_HYPERVISOR
         .else
         .quad 0
         .endif
@@ -169,13 +163,13 @@  l3_xenmap:
 
 /* Top-level master (and idle-domain) page directory. */
 GLOBAL(idle_pg_table)
-        .quad sym_phys(l3_bootmap) + __PAGE_HYPERVISOR
+        .quad sym_offset(l3_bootmap) + __PAGE_HYPERVISOR
         idx = 1
         .rept L4_PAGETABLE_ENTRIES - 1
         .if idx == l4_table_offset(DIRECTMAP_VIRT_START)
-        .quad sym_phys(l3_identmap) + __PAGE_HYPERVISOR
+        .quad sym_offset(l3_identmap) + __PAGE_HYPERVISOR
         .elseif idx == l4_table_offset(XEN_VIRT_START)
-        .quad sym_phys(l3_xenmap) + __PAGE_HYPERVISOR
+        .quad sym_offset(l3_xenmap) + __PAGE_HYPERVISOR
         .else
         .quad 0
         .endif
@@ -185,21 +179,14 @@  GLOBAL(idle_pg_table)
 
 GLOBAL(__page_tables_end)
 
-/* Init pagetables.  Enough page directories to map into the bottom 1GB. */
+/* Init pagetables. Enough page directories to map into 4GB. */
         .section .init.data, "a", @progbits
         .align PAGE_SIZE, 0
 
 GLOBAL(l2_bootmap)
-        .quad sym_phys(l1_identmap) + __PAGE_HYPERVISOR
-        idx = 1
-        .rept 7
-        .quad (idx << L2_PAGETABLE_SHIFT) | __PAGE_HYPERVISOR | _PAGE_PSE
-        idx = idx + 1
-        .endr
-        .fill L2_PAGETABLE_ENTRIES - 8, 8, 0
+        .fill 4 * L2_PAGETABLE_ENTRIES, 8, 0
         .size l2_bootmap, . - l2_bootmap
 
 GLOBAL(l3_bootmap)
-        .quad sym_phys(l2_bootmap) + __PAGE_HYPERVISOR
-        .fill L3_PAGETABLE_ENTRIES - 1, 8, 0
+        .fill L3_PAGETABLE_ENTRIES, 8, 0
         .size l3_bootmap, . - l3_bootmap
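(Editorial note on the x86_64.S hunk above.) Since the page tables are now assembled with image-relative offsets (sym_offset) instead of fixed physical addresses, head.S walks every 8-byte entry between __page_tables_start and __page_tables_end and adds the load base to each present entry. A minimal C sketch of that fixup loop (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x1u

/*
 * Sketch of the head.S fixup loop: every present entry between
 * __page_tables_start and __page_tables_end holds an image-relative
 * offset at build time, so adding the runtime load base turns it
 * into a real physical address. Non-present (zero) slots are skipped.
 */
static void relocate_page_table_entries(uint64_t *entries, size_t count,
                                        uint32_t load_base)
{
    for ( size_t i = 0; i < count; i++ )
        if ( entries[i] & _PAGE_PRESENT )
            entries[i] += load_base;
}
```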
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 2622cf7..82f4216 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -136,7 +136,7 @@  void __init free_ebmalloc_unused_mem(void)
 
     if ( ebmalloc_free )
     {
-        start = (unsigned long)ebmalloc_free - xen_phys_start;
+        start = (unsigned long)ebmalloc_free - xen_img_load_base_addr;
         start = PAGE_ALIGN(start + XEN_VIRT_START);
     }
     else
@@ -679,6 +679,7 @@  static void __init efi_arch_load_addr_check(EFI_LOADED_IMAGE *loaded_image)
         blexit(L"Xen must be loaded below 4Gb.");
     if ( xen_phys_start & ((1 << L2_PAGETABLE_SHIFT) - 1) )
         blexit(L"Xen must be loaded at a 2Mb boundary.");
+    xen_img_load_base_addr = xen_phys_start;
     trampoline_xen_phys_start = xen_phys_start;
 }
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index bc48351..9706b43 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -286,9 +286,6 @@  static void *__init bootstrap_map(const module_t *mod)
     if ( start >= end )
         return NULL;
 
-    if ( end <= BOOTSTRAP_MAP_BASE )
-        return (void *)(unsigned long)start;
-
     ret = (void *)(map_cur + (unsigned long)(start & mask));
     start &= ~mask;
     end = (end + mask) & ~mask;
@@ -674,6 +671,8 @@  void __init noreturn __start_xen(unsigned long mbi_p)
 
     printk("Command line: %s\n", cmdline);
 
+    printk("Xen image load base address: 0x%08x\n", xen_img_load_base_addr);
+
     printk("Video information:\n");
 
     /* Print VGA display mode information. */
@@ -861,15 +860,17 @@  void __init noreturn __start_xen(unsigned long mbi_p)
         highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
 #endif
 
+    /* Do not relocate the Xen image if the boot loader did the work for us. */
+    if ( xen_img_load_base_addr )
+        xen_phys_start = xen_img_load_base_addr;
+
     for ( i = boot_e820.nr_map-1; i >= 0; i-- )
     {
         uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
         uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
 
-        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
         s = (boot_e820.map[i].addr + mask) & ~mask;
         e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
-        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
         if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
             continue;
 
@@ -901,7 +902,6 @@  void __init noreturn __start_xen(unsigned long mbi_p)
             l4_pgentry_t *pl4e;
             l3_pgentry_t *pl3e;
             l2_pgentry_t *pl2e;
-            uint64_t load_start;
             int i, j, k;
 
             /* Select relocation address. */
@@ -915,9 +915,8 @@  void __init noreturn __start_xen(unsigned long mbi_p)
              * with a barrier(). After this we must *not* modify static/global
              * data until after we have switched to the relocated pagetables!
              */
-            load_start = (unsigned long)_start - XEN_VIRT_START;
             barrier();
-            move_memory(e + load_start, load_start, _end - _start, 1);
+            move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);
 
             /* Walk initial pagetables, relocating page directory entries. */
             pl4e = __va(__pa(idle_pg_table));
@@ -933,7 +932,7 @@  void __init noreturn __start_xen(unsigned long mbi_p)
                     /* Not present, 1GB mapping, or already relocated? */
                     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ||
                          (l3e_get_flags(*pl3e) & _PAGE_PSE) ||
-                         (l3e_get_pfn(*pl3e) > 0x1000) )
+                         (l3e_get_pfn(*pl3e) > PFN_DOWN(xen_phys_start)) )
                         continue;
                     *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) +
                                             xen_phys_start);
@@ -943,7 +942,7 @@  void __init noreturn __start_xen(unsigned long mbi_p)
                         /* Not present, PSE, or already relocated? */
                         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
                              (l2e_get_flags(*pl2e) & _PAGE_PSE) ||
-                             (l2e_get_pfn(*pl2e) > 0x1000) )
+                             (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) )
                             continue;
                         *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) +
                                                 xen_phys_start);
@@ -957,15 +956,14 @@  void __init noreturn __start_xen(unsigned long mbi_p)
              * Undo the temporary-hooking of the l1_identmap.  __2M_text_start
              * is contained in this PTE.
              */
-            BUG_ON(l2_table_offset((unsigned long)_erodata) ==
-                   l2_table_offset((unsigned long)_stext));
             *pl2e++ = l2e_from_pfn(xen_phys_start >> PAGE_SHIFT,
                                    PAGE_HYPERVISOR_RX | _PAGE_PSE);
             for ( i = 1; i < L2_PAGETABLE_ENTRIES; i++, pl2e++ )
             {
                 unsigned int flags;
 
-                if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
+                if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
+                     (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) )
                     continue;
 
                 if ( !using_2M_mapping() )
@@ -1019,6 +1017,8 @@  void __init noreturn __start_xen(unsigned long mbi_p)
                 : "memory" );
 
             bootstrap_map(NULL);
+
+            printk("New Xen image base address: 0x%08lx\n", xen_phys_start);
         }
 
         /* Is the region suitable for relocating the multiboot modules? */
@@ -1082,6 +1082,7 @@  void __init noreturn __start_xen(unsigned long mbi_p)
 
     if ( !xen_phys_start )
         panic("Not enough memory to relocate Xen.");
+
     reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));
 
     /* Late kexec reservation (dynamic start address). */
@@ -1154,14 +1155,12 @@  void __init noreturn __start_xen(unsigned long mbi_p)
 
         set_pdx_range(s >> PAGE_SHIFT, e >> PAGE_SHIFT);
 
-        /* Need to create mappings above BOOTSTRAP_MAP_BASE. */
-        map_s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
+        map_s = s;
         map_e = min_t(uint64_t, e,
                       ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT);
 
         /* Pass mapped memory to allocator /before/ creating new mappings. */
         init_boot_pages(s, min(map_s, e));
-        s = map_s;
         if ( s < map_e )
         {
             uint64_t mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index a6246c0..a0d1ea3 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -55,7 +55,7 @@  SECTIONS
   __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
 #endif
 
-  . = __XEN_VIRT_START + MB(1);
+  . = __XEN_VIRT_START + XEN_IMG_OFFSET;
   _start = .;
   .text : {
         _stext = .;            /* Text and read-only data */
@@ -257,12 +257,14 @@  SECTIONS
   .reloc : {
     *(.reloc)
   } :text
-  /* Trick the linker into setting the image size to exactly 16Mb. */
   . = ALIGN(__section_alignment__);
+#endif
+
+  /* Trick the linker into setting the image size to exactly 16Mb. */
   .pad : {
     . = ALIGN(MB(16));
+    __end_of_image__ = .;
   } :text
-#endif
 
   efi = DEFINED(efi) ? efi : .;
 
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index 6fd84e7..f5a2d2f 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -96,6 +96,7 @@  extern unsigned long trampoline_phys;
                  trampoline_phys-__pa(trampoline_start)))
 extern char trampoline_start[], trampoline_end[];
 extern char trampoline_realmode_entry[];
+extern unsigned int xen_img_load_base_addr;
 extern unsigned int trampoline_xen_phys_start;
 extern unsigned char trampoline_cpu_started;
 extern char wakeup_start[];
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 4ae387f..7324afe 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -288,7 +288,7 @@  extern root_pgentry_t idle_pg_table[ROOT_PAGETABLE_ENTRIES];
 extern l2_pgentry_t  *compat_idle_pg_table_l2;
 extern unsigned int   m2p_compat_vstart;
 extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
-    l2_bootmap[L2_PAGETABLE_ENTRIES];
+    l2_bootmap[4*L2_PAGETABLE_ENTRIES];
 extern l3_pgentry_t l3_bootmap[L3_PAGETABLE_ENTRIES];
 extern l2_pgentry_t l2_identmap[4*L2_PAGETABLE_ENTRIES];
 extern l1_pgentry_t l1_identmap[L1_PAGETABLE_ENTRIES],