Message ID | 20211207105339.28440-2-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/boot: Cleanup | expand |
On 07.12.2021 11:53, Andrew Cooper wrote: > This check has existed in one form or another since c/s 369bafdb1c1 "xen: Big > changes to x86 start-of-day" in 2007. Even then, AFAICT, it wasn't necessary > because there was a clean split between the statically created entries (L3 and > higher) and the dynamically created ones (L2 and lower). > > Without dissecting the myriad changes over the past 14 years, I'm pretty > certain Xen only booted by accident when l2_xenmap[0] was handled specially > and skipped the pte_update_limit check which would have left it corrupt. > > Nevertheless, as of right now, all non-leaf PTEs (the first loop), and all 2M > xenmap leaf mappings (the second loop) need relocating. They are no unused > mappings in the range, no mappings which will be encountered multiple times, > and it is unlikely that such mappings would be introduced. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> However, as to the description and ... > --- a/xen/arch/x86/setup.c > +++ b/xen/arch/x86/setup.c > @@ -1230,7 +1230,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) > l3_pgentry_t *pl3e; > l2_pgentry_t *pl2e; > int i, j, k; > - unsigned long pte_update_limit; > > /* Select relocation address. */ > xen_phys_start = end - reloc_size; > @@ -1238,14 +1237,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) > bootsym(trampoline_xen_phys_start) = xen_phys_start; > > /* > - * No PTEs pointing above this address are candidates for relocation. > - * Due to possibility of partial overlap of the end of source image > - * and the beginning of region for destination image some PTEs may > - * point to addresses in range [e, e + XEN_IMG_OFFSET). > - */ > - pte_update_limit = PFN_DOWN(e); ... considering the comment here: Isn't it 0d31d1680868 ("x86/setup: do not relocate Xen over current Xen image placement") where need for this disappeared? Afaict the non-overlap of source and destination is the crucial factor here, yet your description doesn't mention this aspect at all. I'd therefore like to ask for an adjustment there. Jan
On 07/12/2021 11:37, Jan Beulich wrote: > On 07.12.2021 11:53, Andrew Cooper wrote: >> This check has existed in one form or another since c/s 369bafdb1c1 "xen: Big >> changes to x86 start-of-day" in 2007. Even then, AFAICT, it wasn't necessary >> because there was a clean split between the statically created entries (L3 and >> higher) and the dynamically created ones (L2 and lower). >> >> Without dissecting the myriad changes over the past 14 years, I'm pretty >> certain Xen only booted by accident when l2_xenmap[0] was handled specially >> and skipped the pte_update_limit check which would have left it corrupt. >> >> Nevertheless, as of right now, all non-leaf PTEs (the first loop), and all 2M >> xenmap leaf mappings (the second loop) need relocating. They are no unused >> mappings in the range, no mappings which will be encountered multiple times, >> and it is unlikely that such mappings would be introduced. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > > However, as to the description and ... > >> --- a/xen/arch/x86/setup.c >> +++ b/xen/arch/x86/setup.c >> @@ -1230,7 +1230,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) >> l3_pgentry_t *pl3e; >> l2_pgentry_t *pl2e; >> int i, j, k; >> - unsigned long pte_update_limit; >> >> /* Select relocation address. */ >> xen_phys_start = end - reloc_size; >> @@ -1238,14 +1237,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) >> bootsym(trampoline_xen_phys_start) = xen_phys_start; >> >> /* >> - * No PTEs pointing above this address are candidates for relocation. >> - * Due to possibility of partial overlap of the end of source image >> - * and the beginning of region for destination image some PTEs may >> - * point to addresses in range [e, e + XEN_IMG_OFFSET). >> - */ >> - pte_update_limit = PFN_DOWN(e); > ... considering the comment here: Isn't it 0d31d1680868 ("x86/setup: do > not relocate Xen over current Xen image placement") where need for this > disappeared? Afaict the non-overlap of source and destination is the > crucial factor here, yet your description doesn't mention this aspect at > all. I'd therefore like to ask for an adjustment there. I don't consider that commit relevant. There is no circumstance ever where you can relocate Xen with actually-overlapping ranges, because one way or another, you'd end up copying non-pagetable data over the live pagetables. That particular commit was part of trying to make Xen's entry code relocatable, so the MB2 path could load Xen at somewhere which wasn't 0, but to this day we still skip any physical relocation if Xen isn't started at 0. To the comment specifically, it's actively wrong. Back when XEN_IMG_OFFSET was 1M, and we had 16M worth of unconditional mappings, then we could get PTE overlap as described, in the corner case where we were moving Xen from 0 to anywhere between 4 and 16M physical (2M physical was in principle a problem, but not an eligible position to relocate to, given Xen's compile size). And in that corner case, the logic would "corrupt" various PTEs by not relocating them. The PTE coving _start at 1M was special cased ahead of the 2nd loop (finally fixed last week) and the PTEs mapping beyond _end were unused which is why nothing actually went wrong. ~Andrew
On 09.12.2021 18:34, Andrew Cooper wrote: > On 07/12/2021 11:37, Jan Beulich wrote: >> On 07.12.2021 11:53, Andrew Cooper wrote: >>> --- a/xen/arch/x86/setup.c >>> +++ b/xen/arch/x86/setup.c >>> @@ -1230,7 +1230,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) >>> l3_pgentry_t *pl3e; >>> l2_pgentry_t *pl2e; >>> int i, j, k; >>> - unsigned long pte_update_limit; >>> >>> /* Select relocation address. */ >>> xen_phys_start = end - reloc_size; >>> @@ -1238,14 +1237,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) >>> bootsym(trampoline_xen_phys_start) = xen_phys_start; >>> >>> /* >>> - * No PTEs pointing above this address are candidates for relocation. >>> - * Due to possibility of partial overlap of the end of source image >>> - * and the beginning of region for destination image some PTEs may >>> - * point to addresses in range [e, e + XEN_IMG_OFFSET). >>> - */ >>> - pte_update_limit = PFN_DOWN(e); >> ... considering the comment here: Isn't it 0d31d1680868 ("x86/setup: do >> not relocate Xen over current Xen image placement") where need for this >> disappeared? Afaict the non-overlap of source and destination is the >> crucial factor here, yet your description doesn't mention this aspect at >> all. I'd therefore like to ask for an adjustment there. > > I don't consider that commit relevant. > > There is no circumstance ever where you can relocate Xen with > actually-overlapping ranges, because one way or another, you'd end up > copying non-pagetable data over the live pagetables. That was fragile, yes. I think I (vaguely!) recall a discussion I had with Keir about this, with him pointing out that the logic builds upon all necessary mappings being held in the TLB. If you strictly think that's not worthwhile to consider or mention, then so be it. > That particular commit was part of trying to make Xen's entry code > relocatable, so the MB2 path could load Xen at somewhere which wasn't 0, > but to this day we still skip any physical relocation if Xen isn't > started at 0. > > > To the comment specifically, it's actively wrong. > > Back when XEN_IMG_OFFSET was 1M, and we had 16M worth of unconditional > mappings, then we could get PTE overlap as described, in the corner case > where we were moving Xen from 0 to anywhere between 4 and 16M physical > (2M physical was in principle a problem, but not an eligible position to > relocate to, given Xen's compile size). > > And in that corner case, the logic would "corrupt" various PTEs by not > relocating them. The PTE coving _start at 1M was special cased ahead of > the 2nd loop (finally fixed last week) and the PTEs mapping beyond _end > were unused which is why nothing actually went wrong. The latter fact being why I guess you've put "corrupt" in quotes. Jan
On 10/12/2021 07:17, Jan Beulich wrote: > On 09.12.2021 18:34, Andrew Cooper wrote: >> On 07/12/2021 11:37, Jan Beulich wrote: >>> On 07.12.2021 11:53, Andrew Cooper wrote: >>>> --- a/xen/arch/x86/setup.c >>>> +++ b/xen/arch/x86/setup.c >>>> @@ -1230,7 +1230,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) >>>> l3_pgentry_t *pl3e; >>>> l2_pgentry_t *pl2e; >>>> int i, j, k; >>>> - unsigned long pte_update_limit; >>>> >>>> /* Select relocation address. */ >>>> xen_phys_start = end - reloc_size; >>>> @@ -1238,14 +1237,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) >>>> bootsym(trampoline_xen_phys_start) = xen_phys_start; >>>> >>>> /* >>>> - * No PTEs pointing above this address are candidates for relocation. >>>> - * Due to possibility of partial overlap of the end of source image >>>> - * and the beginning of region for destination image some PTEs may >>>> - * point to addresses in range [e, e + XEN_IMG_OFFSET). >>>> - */ >>>> - pte_update_limit = PFN_DOWN(e); >>> ... considering the comment here: Isn't it 0d31d1680868 ("x86/setup: do >>> not relocate Xen over current Xen image placement") where need for this >>> disappeared? Afaict the non-overlap of source and destination is the >>> crucial factor here, yet your description doesn't mention this aspect at >>> all. I'd therefore like to ask for an adjustment there. >> I don't consider that commit relevant. >> >> There is no circumstance ever where you can relocate Xen with >> actually-overlapping ranges, because one way or another, you'd end up >> copying non-pagetable data over the live pagetables. > That was fragile, yes. I think I (vaguely!) recall a discussion I had > with Keir about this, with him pointing out that the logic builds upon > all necessary mappings being held in the TLB. If you strictly think > that's not worthwhile to consider or mention, then so be it. Ok, I'll tweak the commit message. But having come back to this after more than a year away, I think it's worth pointing out that the details in 0d31d1680868 demonstrate that the "partial overlap" logic was indeed buggy. In a VM, a vcpu migration at just the wrong moment will empty the TLB under your feet. So will a whole slew of micro-architectural conditions, or a poorly timed SMI. Whatever people have tried to call it in the past, it was broken plain and simple. ~Andrew
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index c8641c227d9a..0492856292cf 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -1230,7 +1230,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) l3_pgentry_t *pl3e; l2_pgentry_t *pl2e; int i, j, k; - unsigned long pte_update_limit; /* Select relocation address. */ xen_phys_start = end - reloc_size; @@ -1238,14 +1237,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) bootsym(trampoline_xen_phys_start) = xen_phys_start; /* - * No PTEs pointing above this address are candidates for relocation. - * Due to possibility of partial overlap of the end of source image - * and the beginning of region for destination image some PTEs may - * point to addresses in range [e, e + XEN_IMG_OFFSET). - */ - pte_update_limit = PFN_DOWN(e); - - /* * Perform relocation to new physical address. * Before doing so we must sync static/global data with main memory * with a barrier(). After this we must *not* modify static/global @@ -1267,8 +1258,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) { /* Not present, 1GB mapping, or already relocated? */ if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) || - (l3e_get_flags(*pl3e) & _PAGE_PSE) || - (l3e_get_pfn(*pl3e) >= pte_update_limit) ) + (l3e_get_flags(*pl3e) & _PAGE_PSE) ) continue; *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) + xen_phys_start); @@ -1277,8 +1267,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) { /* Not present, PSE, or already relocated? */ if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) || - (l2e_get_flags(*pl2e) & _PAGE_PSE) || - (l2e_get_pfn(*pl2e) >= pte_update_limit) ) + (l2e_get_flags(*pl2e) & _PAGE_PSE) ) continue; *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start); @@ -1291,8 +1280,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, pl2e++ ) { if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) || - !(l2e_get_flags(*pl2e) & _PAGE_PSE) || - (l2e_get_pfn(*pl2e) >= pte_update_limit) ) + !(l2e_get_flags(*pl2e) & _PAGE_PSE) ) continue; *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start);
This check has existed in one form or another since c/s 369bafdb1c1 "xen: Big changes to x86 start-of-day" in 2007. Even then, AFAICT, it wasn't necessary because there was a clean split between the statically created entries (L3 and higher) and the dynamically created ones (L2 and lower). Without dissecting the myriad changes over the past 14 years, I'm pretty certain Xen only booted by accident when l2_xenmap[0] was handled specially and skipped the pte_update_limit check which would have left it corrupt. Nevertheless, as of right now, all non-leaf PTEs (the first loop), and all 2M xenmap leaf mappings (the second loop) need relocating. They are no unused mappings in the range, no mappings which will be encountered multiple times, and it is unlikely that such mappings would be introduced. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> --- xen/arch/x86/setup.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-)