diff mbox

[v4,5/5] fix: add multiboot2 protocol support for EFI platforms

Message ID 2bf37d7ba36646f8245fcaabf820c87683cce83e.1484748781.git-series.cardoe@cardoe.com (mailing list archive)
State New, archived
Headers show

Commit Message

Douglas Goldstein Jan. 18, 2017, 2:17 p.m. UTC
This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol
support for EFI platforms'.

- fix incorrect assembly (identified by Andrew Cooper)
- fix issue where the trampoline size was left as 0 and the
  way the memory is allocated for the trampolines we would go to
  the end of an available section and then subtract off the size
  to decide where to place it. The end result was that we would
  always copy the trampolines and the 32-bit stack into some
  form of reserved memory after the conventional region we
  wanted to put things into. On some systems this did not
  manifest as a crash while on others it did. Reworked the
  changes to always reserve 64kb for both the stack and the size
  of the trampolines. Added a build time assert to make sure we have
  enough room always.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
---
Doug v4 - change wording around "stack base"
          (found by Jan Beulich)
        - added build time assert as suggested by Jan Beulich
        - added a KB() macro to make our sizes consistent with MB() and
          GB().
Doug v3 - drop ASSERTs since they are runtime only without any output.
          This should be completely mitigated by using max() and
          ensuring we have a sane value.
          (found by Jan Beulich)
        - removed extra_mem variable that was incorrectly left behind.
          (found by Jan Beulich)
        - fix comment around the "start of stack"
          (found by Jan Beulich)
Doug v2 - new in this version to help show what's changed
---
---
 xen/arch/x86/boot/head.S    | 1 +
 xen/arch/x86/efi/efi-boot.h | 8 ++++----
 xen/arch/x86/efi/stub.c     | 2 +-
 xen/arch/x86/xen.lds.S      | 7 +++++++
 xen/include/xen/config.h    | 1 +
 5 files changed, 14 insertions(+), 5 deletions(-)

Comments

Jan Beulich Jan. 18, 2017, 2:52 p.m. UTC | #1
>>> On 18.01.17 at 15:17, <cardoe@cardoe.com> wrote:
> This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol
> support for EFI platforms'.
> 
> - fix incorrect assembly (identified by Andrew Cooper)
> - fix issue where the trampoline size was left as 0 and the
>   way the memory is allocated for the trampolines we would go to
>   the end of an available section and then subtract off the size
>   to decide where to place it. The end result was that we would
>   always copy the trampolines and the 32-bit stack into some
>   form of reserved memory after the conventional region we
>   wanted to put things into. On some systems this did not
>   manifest as a crash while on others it did. Reworked the
>   changes to always reserve 64kb for both the stack and the size
>   of the trampolines. Added a build time assert to make sure we have
>   enough room always.
> 
> Signed-off-by: Doug Goldstein <cardoe@cardoe.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with ...

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -333,3 +333,10 @@ ASSERT(IS_ALIGNED(trampoline_start, 4), 
> "trampoline_start misaligned")
>  ASSERT(IS_ALIGNED(trampoline_end,   4), "trampoline_end misaligned")
>  ASSERT(IS_ALIGNED(__bss_start,      8), "__bss_start misaligned")
>  ASSERT(IS_ALIGNED(__bss_end,        8), "__bss_end misaligned")
> +
> +/* The trampolines and the low memory stack must fit in 64kb. In testing
> + * the low memory stack never exceeded 1kb so just require that the
> + * trampolines fit in 63kb, leaving 1kb for the stack.
> + */
> +ASSERT((trampoline_end - trampoline_start) < KB(63),
> +    "not enough room for trampolines and low memory stack")

... the comment style here fixed (which could be done upon commit
or when Daniel merges this back into his series).

Jan
Andrew Cooper Jan. 18, 2017, 7:53 p.m. UTC | #2
On 18/01/17 14:17, Doug Goldstein wrote:
> This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol
> support for EFI platforms'.
>
> - fix incorrect assembly (identified by Andrew Cooper)
> - fix issue where the trampoline size was left as 0 and the
>   way the memory is allocated for the trampolines we would go to
>   the end of an available section and then subtract off the size
>   to decide where to place it. The end result was that we would
>   always copy the trampolines and the 32-bit stack into some
>   form of reserved memory after the conventional region we
>   wanted to put things into. On some systems this did not
>   manifest as a crash while on others it did. Reworked the
>   changes to always reserve 64kb for both the stack and the size
>   of the trampolines. Added a build time assert to make sure we have
>   enough room always.
>
> Signed-off-by: Doug Goldstein <cardoe@cardoe.com>

LGTM, but this really does want squashing.

~Andrew
Daniel Kiper Jan. 18, 2017, 9:25 p.m. UTC | #3
On Wed, Jan 18, 2017 at 07:52:44AM -0700, Jan Beulich wrote:
> >>> On 18.01.17 at 15:17, <cardoe@cardoe.com> wrote:
> > This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol
> > support for EFI platforms'.
> >
> > - fix incorrect assembly (identified by Andrew Cooper)
> > - fix issue where the trampoline size was left as 0 and the
> >   way the memory is allocated for the trampolines we would go to
> >   the end of an available section and then subtract off the size
> >   to decide where to place it. The end result was that we would
> >   always copy the trampolines and the 32-bit stack into some
> >   form of reserved memory after the conventional region we
> >   wanted to put things into. On some systems this did not
> >   manifest as a crash while on others it did. Reworked the
> >   changes to always reserve 64kb for both the stack and the size
> >   of the trampolines. Added a build time assert to make sure we have
> >   enough room always.
> >
> > Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with ...
>
> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -333,3 +333,10 @@ ASSERT(IS_ALIGNED(trampoline_start, 4),
> > "trampoline_start misaligned")
> >  ASSERT(IS_ALIGNED(trampoline_end,   4), "trampoline_end misaligned")
> >  ASSERT(IS_ALIGNED(__bss_start,      8), "__bss_start misaligned")
> >  ASSERT(IS_ALIGNED(__bss_end,        8), "__bss_end misaligned")
> > +
> > +/* The trampolines and the low memory stack must fit in 64kb. In testing
> > + * the low memory stack never exceeded 1kb so just require that the
> > + * trampolines fit in 63kb, leaving 1kb for the stack.
> > + */
> > +ASSERT((trampoline_end - trampoline_start) < KB(63),
> > +    "not enough room for trampolines and low memory stack")
>
> ... the comment style here fixed (which could be done upon commit
> or when Daniel merges this back into his series).

Hmmm... Why this patch and #0 was not CC-ed to me?

Anyway, it looks that I have cleared my backlog to some extent and now I am
able to take a stab on v12. There is a chance that I will have it with Doug's
and some additional fixes on Friday or Monday.

Daniel
Douglas Goldstein Jan. 19, 2017, 3:40 a.m. UTC | #4
On 1/18/17 4:25 PM, Daniel Kiper wrote:
> On Wed, Jan 18, 2017 at 07:52:44AM -0700, Jan Beulich wrote:

>>
>> ... the comment style here fixed (which could be done upon commit
>> or when Daniel merges this back into his series).
> 
> Hmmm... Why this patch and #0 was not CC-ed to me?
>

You say this like I did it intentionally to upset you. I switched to
using git-series and as a result my workflow changed slightly. When I
glanced at 1/5 it had you CCed and I assumed this one was as well.

> Anyway, it looks that I have cleared my backlog to some extent and now I am
> able to take a stab on v12. There is a chance that I will have it with Doug's
> and some additional fixes on Friday or Monday.

So then there should be no issue with this landing before that or not.
The changes for you will be identical. A simple rebase --onto would
actually be easier than folding this into your larger series.

Can you tell me what were the fixes to the relocation code?
Daniel Kiper Jan. 19, 2017, 11:56 a.m. UTC | #5
On Wed, Jan 18, 2017 at 10:40:07PM -0500, Doug Goldstein wrote:
> On 1/18/17 4:25 PM, Daniel Kiper wrote:
> > On Wed, Jan 18, 2017 at 07:52:44AM -0700, Jan Beulich wrote:
> >> ... the comment style here fixed (which could be done upon commit
> >> or when Daniel merges this back into his series).
> >
> > Hmmm... Why this patch and #0 was not CC-ed to me?
>
> You say this like I did it intentionally to upset you. I switched to
> using git-series and as a result my workflow changed slightly. When I
> glanced at 1/5 it had you CCed and I assumed this one was as well.

I do not know why you read this in that way. I have just stated the fact.
No more no less.

> > Anyway, it looks that I have cleared my backlog to some extent and now I am
> > able to take a stab on v12. There is a chance that I will have it with Doug's
> > and some additional fixes on Friday or Monday.
>
> So then there should be no issue with this landing before that or not.
> The changes for you will be identical. A simple rebase --onto would
> actually be easier than folding this into your larger series.

I do not fully agree, however, I respect your point of view. Though I am not
going to discuss this any longer. I have just stated my preference clearly
earlier. Maintainers are the Supreme Court and they will decide what to do.
And I am not going to discuss this too. I prefer to focus on my work.

> Can you tell me what were the fixes to the relocation code?

Unfortunately I have not received any details from you. Just vague
statements that it does not work. So, I am not able to say anything
about that. If you provide more details I am happy to help.

Daniel
Douglas Goldstein Jan. 19, 2017, 2:25 p.m. UTC | #6
On 1/19/17 6:56 AM, Daniel Kiper wrote:
>> Can you tell me what were the fixes to the relocation code?
> 
> Unfortunately I have not received any details from you. Just vague
> statements that it does not work. So, I am not able to say anything
> about that. If you provide more details I am happy to help.

We discussed this on your original v11 thread. It fails on Intel NUCs,
Lenovo T430, Dell PowerEdge something (I'm on vacation so I can't see
the model number), HP Proliant ML10, QEMU (with OVMF) and some other
broads that I don't remember the model number off of my head. Its about
a dozen machines all together. They fail in two different ways:

- the other CPUs are reported as stuck and the machine boots but 'xl
info' reports 1 available CPU.
- when the non-CPUs are brought up it dies when setting paging back into
cr0.

Every single machine fails in one of these two ways.
Daniel Kiper Jan. 20, 2017, 1:45 a.m. UTC | #7
On Thu, Jan 19, 2017 at 09:25:08AM -0500, Doug Goldstein wrote:
> On 1/19/17 6:56 AM, Daniel Kiper wrote:
> >> Can you tell me what were the fixes to the relocation code?
> >
> > Unfortunately I have not received any details from you. Just vague
> > statements that it does not work. So, I am not able to say anything
> > about that. If you provide more details I am happy to help.
>
> We discussed this on your original v11 thread. It fails on Intel NUCs,
> Lenovo T430, Dell PowerEdge something (I'm on vacation so I can't see
> the model number), HP Proliant ML10, QEMU (with OVMF) and some other

Do you use QEMU with KVM enabled? I was hit twice by an issue (I do not
remember the detail right now) when OVMF was running in KVM. When I disabled
KVM then everything worked without any issue. It does not happen often but...

> broads that I don't remember the model number off of my head. Its about
> a dozen machines all together. They fail in two different ways:
>
> - the other CPUs are reported as stuck and the machine boots but 'xl
> info' reports 1 available CPU.
> - when the non-CPUs are brought up it dies when setting paging back into
> cr0.
>
> Every single machine fails in one of these two ways.

Hmmm... I have not seen that anywhere. And nobody except you reported such
issue. Is it happening with GRUB2 and/or iPXE?

Anyway, I have just released v12. It contains various fixes including
yours. I hope that I have not missed anything. Please take a look.

In the meantime I will try to find one of machines mentioned above and
reproduce this issue there.

Daniel
diff mbox

Patch

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index ac93df0..df5fdab 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -519,6 +519,7 @@  trampoline_setup:
 1:
         /* Switch to low-memory stack.  */
         mov     sym_phys(trampoline_phys),%edi
+        /* The stack ends 64kb after the location of trampoline_phys */
         lea     0x10000(%edi),%esp
         lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
         pushl   $BOOT_CS32
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index dc857d8..d2ebf21 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -146,8 +146,6 @@  static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
 {
     struct e820entry *e;
     unsigned int i;
-    /* Check for extra mem for mbi data if Xen is loaded via multiboot2 protocol. */
-    UINTN extra_mem = efi_enabled(EFI_LOADER) ? 0 : (64 << 10);
 
     /* Populate E820 table and check trampoline area availability. */
     e = e820map - 1;
@@ -170,8 +168,7 @@  static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
             /* fall through */
         case EfiConventionalMemory:
             if ( !trampoline_phys && desc->PhysicalStart + len <= 0x100000 &&
-                 len >= cfg.size + extra_mem &&
-                 desc->PhysicalStart + len > cfg.addr )
+                 len >= cfg.size && desc->PhysicalStart + len > cfg.addr )
                 cfg.addr = (desc->PhysicalStart + len - cfg.size) & PAGE_MASK;
             /* fall through */
         case EfiLoaderCode:
@@ -686,6 +683,9 @@  paddr_t __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTa
     setup_efi_pci();
     efi_variables();
 
+    /* This is the maximum size of our trampoline + our low memory stack */
+    cfg.size = KB(64);
+
     if ( gop )
         efi_set_gop_mode(gop, gop_mode);
 
diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c
index 6ea6aa1..b81adc0 100644
--- a/xen/arch/x86/efi/stub.c
+++ b/xen/arch/x86/efi/stub.c
@@ -33,7 +33,7 @@  paddr_t __init noreturn efi_multiboot2(EFI_HANDLE ImageHandle,
      * not be directly supported by C compiler.
      */
     asm volatile(
-    "    call %2                      \n"
+    "    call *%2                     \n"
     "0:  hlt                          \n"
     "    jmp  0b                      \n"
        : "+c" (StdErr), "+d" (err) : "g" (StdErr->OutputString)
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index e0e2529..504348e 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -333,3 +333,10 @@  ASSERT(IS_ALIGNED(trampoline_start, 4), "trampoline_start misaligned")
 ASSERT(IS_ALIGNED(trampoline_end,   4), "trampoline_end misaligned")
 ASSERT(IS_ALIGNED(__bss_start,      8), "__bss_start misaligned")
 ASSERT(IS_ALIGNED(__bss_end,        8), "__bss_end misaligned")
+
+/* The trampolines and the low memory stack must fit in 64kb. In testing
+ * the low memory stack never exceeded 1kb so just require that the
+ * trampolines fit in 63kb, leaving 1kb for the stack.
+ */
+ASSERT((trampoline_end - trampoline_start) < KB(63),
+    "not enough room for trampolines and low memory stack")
diff --git a/xen/include/xen/config.h b/xen/include/xen/config.h
index 473c5e8..04e4da5 100644
--- a/xen/include/xen/config.h
+++ b/xen/include/xen/config.h
@@ -70,6 +70,7 @@ 
 #define __force
 #define __bitwise
 
+#define KB(_kb)     (_AC(_kb, ULL) << 10)
 #define MB(_mb)     (_AC(_mb, ULL) << 20)
 #define GB(_gb)     (_AC(_gb, ULL) << 30)