Message ID | 2bf37d7ba36646f8245fcaabf820c87683cce83e.1484748781.git-series.cardoe@cardoe.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 18.01.17 at 15:17, <cardoe@cardoe.com> wrote: > This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol > support for EFI platforms'. > > - fix incorrect assembly (identified by Andrew Cooper) > - fix issue where the trampoline size was left as 0 and the > way the memory is allocated for the trampolines we would go to > the end of an available section and then subtract off the size > to decide where to place it. The end result was that we would > always copy the trampolines and the 32-bit stack into some > form of reserved memory after the conventional region we > wanted to put things into. On some systems this did not > manifest as a crash while on others it did. Reworked the > changes to always reserve 64kb for both the stack and the size > of the trampolines. Added a build time assert to make sure we have > enough room always. > > Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> with ... > --- a/xen/arch/x86/xen.lds.S > +++ b/xen/arch/x86/xen.lds.S > @@ -333,3 +333,10 @@ ASSERT(IS_ALIGNED(trampoline_start, 4), > "trampoline_start misaligned") > ASSERT(IS_ALIGNED(trampoline_end, 4), "trampoline_end misaligned") > ASSERT(IS_ALIGNED(__bss_start, 8), "__bss_start misaligned") > ASSERT(IS_ALIGNED(__bss_end, 8), "__bss_end misaligned") > + > +/* The trampolines and the low memory stack must fit in 64kb. In testing > + * the low memory stack never exceeded 1kb so just require that the > + * trampolines fit in 63kb, leaving 1kb for the stack. > + */ > +ASSERT((trampoline_end - trampoline_start) < KB(63), > + "not enough room for trampolines and low memory stack") ... the comment style here fixed (which could be done upon commit or when Daniel merges this back into his series). Jan
On 18/01/17 14:17, Doug Goldstein wrote: > This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol > support for EFI platforms'. > > - fix incorrect assembly (identified by Andrew Cooper) > - fix issue where the trampoline size was left as 0 and the > way the memory is allocated for the trampolines we would go to > the end of an available section and then subtract off the size > to decide where to place it. The end result was that we would > always copy the trampolines and the 32-bit stack into some > form of reserved memory after the conventional region we > wanted to put things into. On some systems this did not > manifest as a crash while on others it did. Reworked the > changes to always reserve 64kb for both the stack and the size > of the trampolines. Added a build time assert to make sure we have > enough room always. > > Signed-off-by: Doug Goldstein <cardoe@cardoe.com> LGTM, but this really does want squashing. ~Andrew
On Wed, Jan 18, 2017 at 07:52:44AM -0700, Jan Beulich wrote: > >>> On 18.01.17 at 15:17, <cardoe@cardoe.com> wrote: > > This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol > > support for EFI platforms'. > > > > - fix incorrect assembly (identified by Andrew Cooper) > > - fix issue where the trampoline size was left as 0 and the > > way the memory is allocated for the trampolines we would go to > > the end of an available section and then subtract off the size > > to decide where to place it. The end result was that we would > > always copy the trampolines and the 32-bit stack into some > > form of reserved memory after the conventional region we > > wanted to put things into. On some systems this did not > > manifest as a crash while on others it did. Reworked the > > changes to always reserve 64kb for both the stack and the size > > of the trampolines. Added a build time assert to make sure we have > > enough room always. > > > > Signed-off-by: Doug Goldstein <cardoe@cardoe.com> > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > with ... > > > --- a/xen/arch/x86/xen.lds.S > > +++ b/xen/arch/x86/xen.lds.S > > @@ -333,3 +333,10 @@ ASSERT(IS_ALIGNED(trampoline_start, 4), > > "trampoline_start misaligned") > > ASSERT(IS_ALIGNED(trampoline_end, 4), "trampoline_end misaligned") > > ASSERT(IS_ALIGNED(__bss_start, 8), "__bss_start misaligned") > > ASSERT(IS_ALIGNED(__bss_end, 8), "__bss_end misaligned") > > + > > +/* The trampolines and the low memory stack must fit in 64kb. In testing > > + * the low memory stack never exceeded 1kb so just require that the > > + * trampolines fit in 63kb, leaving 1kb for the stack. > > + */ > > +ASSERT((trampoline_end - trampoline_start) < KB(63), > > + "not enough room for trampolines and low memory stack") > > ... the comment style here fixed (which could be done upon commit > or when Daniel merges this back into his series). Hmmm... Why this patch and #0 was not CC-ed to me? Anyway, it looks that I have cleared my backlog to some extent and now I am able to take a stab on v12. There is a chance that I will have it with Doug's and some additional fixes on Friday or Monday. Daniel
On 1/18/17 4:25 PM, Daniel Kiper wrote: > On Wed, Jan 18, 2017 at 07:52:44AM -0700, Jan Beulich wrote: >> >> ... the comment style here fixed (which could be done upon commit >> or when Daniel merges this back into his series). > > Hmmm... Why this patch and #0 was not CC-ed to me? > You say this like I did it intentionally to upset you. I switched to using git-series and as a result my workflow changed slightly. When I glanced at 1/5 it had you CCed and I assumed this one was as well. > Anyway, it looks that I have cleared my backlog to some extent and now I am > able to take a stab on v12. There is a chance that I will have it with Doug's > and some additional fixes on Friday or Monday. So then there should be no issue with this landing before that or not. The changes for you will be identical. A simple rebase --onto would actually be easier than folding this into your larger series. Can you tell me what were the fixes to the relocation code?
On Wed, Jan 18, 2017 at 10:40:07PM -0500, Doug Goldstein wrote: > On 1/18/17 4:25 PM, Daniel Kiper wrote: > > On Wed, Jan 18, 2017 at 07:52:44AM -0700, Jan Beulich wrote: > >> ... the comment style here fixed (which could be done upon commit > >> or when Daniel merges this back into his series). > > > > Hmmm... Why this patch and #0 was not CC-ed to me? > > You say this like I did it intentionally to upset you. I switched to > using git-series and as a result my workflow changed slightly. When I > glanced at 1/5 it had you CCed and I assumed this one was as well. I do not know why you read this in that way. I have just stated the fact. No more no less. > > Anyway, it looks that I have cleared my backlog to some extent and now I am > > able to take a stab on v12. There is a chance that I will have it with Doug's > > and some additional fixes on Friday or Monday. > > So then there should be no issue with this landing before that or not. > The changes for you will be identical. A simple rebase --onto would > actually be easier than folding this into your larger series. I do not fully agree, however, I respect your point of view. Though I am not going to discuss this any longer. I have just stated my preference clearly earlier. Maintainers are the Supreme Court and they will decide what to do. And I am not going to discuss this too. I prefer to focus on my work. > Can you tell me what were the fixes to the relocation code? Unfortunately I have not received any details from you. Just vague statements that it does not work. So, I am not able to say anything about that. If you provide more details I am happy to help. Daniel
On 1/19/17 6:56 AM, Daniel Kiper wrote: >> Can you tell me what were the fixes to the relocation code? > > Unfortunately I have not received any details from you. Just vague > statements that it does not work. So, I am not able to say anything > about that. If you provide more details I am happy to help. We discussed this on your original v11 thread. It fails on Intel NUCs, Lenovo T430, Dell PowerEdge something (I'm on vacation so I can't see the model number), HP Proliant ML10, QEMU (with OVMF) and some other broads that I don't remember the model number off of my head. Its about a dozen machines all together. They fail in two different ways: - the other CPUs are reported as stuck and the machine boots but 'xl info' reports 1 available CPU. - when the non-CPUs are brought up it dies when setting paging back into cr0. Every single machine fails in one of these two ways.
On Thu, Jan 19, 2017 at 09:25:08AM -0500, Doug Goldstein wrote: > On 1/19/17 6:56 AM, Daniel Kiper wrote: > >> Can you tell me what were the fixes to the relocation code? > > > > Unfortunately I have not received any details from you. Just vague > > statements that it does not work. So, I am not able to say anything > > about that. If you provide more details I am happy to help. > > We discussed this on your original v11 thread. It fails on Intel NUCs, > Lenovo T430, Dell PowerEdge something (I'm on vacation so I can't see > the model number), HP Proliant ML10, QEMU (with OVMF) and some other Do you use QEMU with KVM enabled? I was hit twice by an issue (I do not remember the detail right now) when OVMF was running in KVM. When I disabled KVM then everything worked without any issue. It does not happen often but... > broads that I don't remember the model number off of my head. Its about > a dozen machines all together. They fail in two different ways: > > - the other CPUs are reported as stuck and the machine boots but 'xl > info' reports 1 available CPU. > - when the non-CPUs are brought up it dies when setting paging back into > cr0. > > Every single machine fails in one of these two ways. Hmmm... I have not seen that anywhere. And nobody except you reported such issue. Is it happening with GRUB2 and/or iPXE? Anyway, I have just released v12. It contains various fixes including yours. I hope that I have not missed anything. Please take a look. In the meantime I will try to find one of machines mentioned above and reproduce this issue there. Daniel
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index ac93df0..df5fdab 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -519,6 +519,7 @@ trampoline_setup: 1: /* Switch to low-memory stack. */ mov sym_phys(trampoline_phys),%edi + /* The stack ends 64kb after the location of trampoline_phys */ lea 0x10000(%edi),%esp lea trampoline_boot_cpu_entry-trampoline_start(%edi),%eax pushl $BOOT_CS32 diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index dc857d8..d2ebf21 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -146,8 +146,6 @@ static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable, { struct e820entry *e; unsigned int i; - /* Check for extra mem for mbi data if Xen is loaded via multiboot2 protocol. */ - UINTN extra_mem = efi_enabled(EFI_LOADER) ? 0 : (64 << 10); /* Populate E820 table and check trampoline area availability. */ e = e820map - 1; @@ -170,8 +168,7 @@ static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable, /* fall through */ case EfiConventionalMemory: if ( !trampoline_phys && desc->PhysicalStart + len <= 0x100000 && - len >= cfg.size + extra_mem && - desc->PhysicalStart + len > cfg.addr ) + len >= cfg.size && desc->PhysicalStart + len > cfg.addr ) cfg.addr = (desc->PhysicalStart + len - cfg.size) & PAGE_MASK; /* fall through */ case EfiLoaderCode: @@ -686,6 +683,9 @@ paddr_t __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTa setup_efi_pci(); efi_variables(); + /* This is the maximum size of our trampoline + our low memory stack */ + cfg.size = KB(64); + if ( gop ) efi_set_gop_mode(gop, gop_mode); diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c index 6ea6aa1..b81adc0 100644 --- a/xen/arch/x86/efi/stub.c +++ b/xen/arch/x86/efi/stub.c @@ -33,7 +33,7 @@ paddr_t __init noreturn efi_multiboot2(EFI_HANDLE ImageHandle, * not be directly supported by C compiler. */ asm volatile( - " call %2 \n" + " call *%2 \n" "0: hlt \n" " jmp 0b \n" : "+c" (StdErr), "+d" (err) : "g" (StdErr->OutputString) diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S index e0e2529..504348e 100644 --- a/xen/arch/x86/xen.lds.S +++ b/xen/arch/x86/xen.lds.S @@ -333,3 +333,10 @@ ASSERT(IS_ALIGNED(trampoline_start, 4), "trampoline_start misaligned") ASSERT(IS_ALIGNED(trampoline_end, 4), "trampoline_end misaligned") ASSERT(IS_ALIGNED(__bss_start, 8), "__bss_start misaligned") ASSERT(IS_ALIGNED(__bss_end, 8), "__bss_end misaligned") + +/* The trampolines and the low memory stack must fit in 64kb. In testing + * the low memory stack never exceeded 1kb so just require that the + * trampolines fit in 63kb, leaving 1kb for the stack. + */ +ASSERT((trampoline_end - trampoline_start) < KB(63), + "not enough room for trampolines and low memory stack") diff --git a/xen/include/xen/config.h b/xen/include/xen/config.h index 473c5e8..04e4da5 100644 --- a/xen/include/xen/config.h +++ b/xen/include/xen/config.h @@ -70,6 +70,7 @@ #define __force #define __bitwise +#define KB(_kb) (_AC(_kb, ULL) << 10) #define MB(_mb) (_AC(_mb, ULL) << 20) #define GB(_gb) (_AC(_gb, ULL) << 30)
This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol support for EFI platforms'. - fix incorrect assembly (identified by Andrew Cooper) - fix issue where the trampoline size was left as 0 and the way the memory is allocated for the trampolines we would go to the end of an available section and then subtract off the size to decide where to place it. The end result was that we would always copy the trampolines and the 32-bit stack into some form of reserved memory after the conventional region we wanted to put things into. On some systems this did not manifest as a crash while on others it did. Reworked the changes to always reserve 64kb for both the stack and the size of the trampolines. Added a build time assert to make sure we have enough room always. Signed-off-by: Doug Goldstein <cardoe@cardoe.com> --- Doug v4 - change wording around "stack base" (found by Jan Beulich) - added build time assert as suggested by Jan Beulich - added a KB() macro to make our sizes consistent with MB() and GB(). Doug v3 - drop ASSERTs since they are runtime only without any output. This should be completely mitigated by using max() and ensuring we have a sane value. (found by Jan Beulich) - removed extra_mem variable that was incorrectly left behind. (found by Jan Beulich) - fix comment around the "start of stack" (found by Jan Beulich) Doug v2 - new in this version to help show what's changed --- --- xen/arch/x86/boot/head.S | 1 + xen/arch/x86/efi/efi-boot.h | 8 ++++---- xen/arch/x86/efi/stub.c | 2 +- xen/arch/x86/xen.lds.S | 7 +++++++ xen/include/xen/config.h | 1 + 5 files changed, 14 insertions(+), 5 deletions(-)