Message ID | 20250410195012.363658-1-jason.andryuk@amd.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] x86/xen: Fix PVH dom0 xen_hypercall detection | expand |
On 10/04/2025 8:50 pm, Jason Andryuk wrote: > A Xen PVH dom0 on an AMD processor triple faults early in boot on > 6.6.86. CPU detection appears to fail, as the faulting instruction is > vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). > > Detection fails because __xen_hypercall_setfunc() returns the full > kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - > e.g. 0xffffffff815b93f0. But this is compared against the rip-relative > xen_hypercall_amd(%rip), which when running from identity mapping, is > only 0x015b93f0. > > Replace the rip-relative address with just loading the actual address to > restore the proper comparision. > > This only seems to affect PVH dom0 boot. This is probably because the > XENMEM_memory_map hypercall is issued early on from the identity > mappings. With a domU, the memory map is provided via hvm_start_info > and the hypercall is skipped. The domU is probably running from the > kernel high mapping when it issues hypercalls. > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> > --- > I think this sort of address mismatch would be addresed by > e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") > > That could be backported instead, but it depends on a fair number of > patches. I've just spoken to Ard, and he thinks that it's standalone. Should be ok to backport as a fix. > Not sure on how getting a patch just into 6.6 would work. This patch > could go into upstream Linux though it's not strictly necessary when the > rip-relative address is a high address. Do we know which other trees are broken? I only found 6.6 because I was messing around with other bits of CI that happen to use 6.6. ~Andrew
On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > On 10/04/2025 8:50 pm, Jason Andryuk wrote: > > A Xen PVH dom0 on an AMD processor triple faults early in boot on > > 6.6.86. CPU detection appears to fail, as the faulting instruction is > > vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). > > > > Detection fails because __xen_hypercall_setfunc() returns the full > > kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - > > e.g. 0xffffffff815b93f0. But this is compared against the rip-relative > > xen_hypercall_amd(%rip), which when running from identity mapping, is > > only 0x015b93f0. > > > > Replace the rip-relative address with just loading the actual address to > > restore the proper comparision. > > > > This only seems to affect PVH dom0 boot. This is probably because the > > XENMEM_memory_map hypercall is issued early on from the identity > > mappings. With a domU, the memory map is provided via hvm_start_info > > and the hypercall is skipped. The domU is probably running from the > > kernel high mapping when it issues hypercalls. > > > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> > > --- > > I think this sort of address mismatch would be addresed by > > e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") > > > > That could be backported instead, but it depends on a fair number of > > patches. > > I've just spoken to Ard, and he thinks that it's standalone. Should be > ok to backport as a fix. > I've tried building and booting 6.6.y with the patch applied - GS will still be set to the 1:1 mapped address but that shouldn't matter, given that it is only used for the stack canary, and we don't do address comparisons on that afaik. > > Not sure on how getting a patch just into 6.6 would work. This patch > > could go into upstream Linux though it's not strictly necessary when the > > rip-relative address is a high address. > > Do we know which other trees are broken? I only found 6.6 because I was > messing around with other bits of CI that happen to use 6.6. > I'd assume all trees that had the hypercall page removal patch backported to them will be broken in the same way.
On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote: > A Xen PVH dom0 on an AMD processor triple faults early in boot on > 6.6.86. CPU detection appears to fail, as the faulting instruction is > vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). > > Detection fails because __xen_hypercall_setfunc() returns the full > kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - > e.g. 0xffffffff815b93f0. But this is compared against the rip-relative > xen_hypercall_amd(%rip), which when running from identity mapping, is > only 0x015b93f0. > > Replace the rip-relative address with just loading the actual address to > restore the proper comparision. > > This only seems to affect PVH dom0 boot. This is probably because the > XENMEM_memory_map hypercall is issued early on from the identity > mappings. With a domU, the memory map is provided via hvm_start_info > and the hypercall is skipped. The domU is probably running from the > kernel high mapping when it issues hypercalls. > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> > --- > I think this sort of address mismatch would be addresed by > e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") > > That could be backported instead, but it depends on a fair number of > patches. > > Not sure on how getting a patch just into 6.6 would work. This patch > could go into upstream Linux though it's not strictly necessary when the > rip-relative address is a high address. > --- > arch/x86/xen/xen-head.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S > index 059f343da76d..71a0eda2da60 100644 > --- a/arch/x86/xen/xen-head.S > +++ b/arch/x86/xen/xen-head.S > @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm) > pop %ebx > pop %eax > #else > - lea xen_hypercall_amd(%rip), %rcx > + mov $xen_hypercall_amd, %rcx (Now that this is known to be the fix upstream) This probably wants to be plain lea without RIP-relative addressing, like the x86_32 branch above? > cmp %rax, %rcx > #ifdef CONFIG_FRAME_POINTER > pop %rax /* Dummy pop. */
On 11.04.2025 14:46, Alejandro Vallejo wrote: > On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote: >> A Xen PVH dom0 on an AMD processor triple faults early in boot on >> 6.6.86. CPU detection appears to fail, as the faulting instruction is >> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). >> >> Detection fails because __xen_hypercall_setfunc() returns the full >> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - >> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative >> xen_hypercall_amd(%rip), which when running from identity mapping, is >> only 0x015b93f0. >> >> Replace the rip-relative address with just loading the actual address to >> restore the proper comparision. >> >> This only seems to affect PVH dom0 boot. This is probably because the >> XENMEM_memory_map hypercall is issued early on from the identity >> mappings. With a domU, the memory map is provided via hvm_start_info >> and the hypercall is skipped. The domU is probably running from the >> kernel high mapping when it issues hypercalls. >> >> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> >> --- >> I think this sort of address mismatch would be addresed by >> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") >> >> That could be backported instead, but it depends on a fair number of >> patches. >> >> Not sure on how getting a patch just into 6.6 would work. This patch >> could go into upstream Linux though it's not strictly necessary when the >> rip-relative address is a high address. >> --- >> arch/x86/xen/xen-head.S | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S >> index 059f343da76d..71a0eda2da60 100644 >> --- a/arch/x86/xen/xen-head.S >> +++ b/arch/x86/xen/xen-head.S >> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm) >> pop %ebx >> pop %eax >> #else >> - lea xen_hypercall_amd(%rip), %rcx >> + mov $xen_hypercall_amd, %rcx > > (Now that this is known to be the fix upstream) This probably wants to > be plain lea without RIP-relative addressing, like the x86_32 branch > above? Why would you want to use LEA there? It's functionally identical, but the MOV can be encoded without ModR/M byte. Jan
On Fri Apr 11, 2025 at 2:08 PM BST, Jan Beulich wrote: > On 11.04.2025 14:46, Alejandro Vallejo wrote: >> On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote: >>> A Xen PVH dom0 on an AMD processor triple faults early in boot on >>> 6.6.86. CPU detection appears to fail, as the faulting instruction is >>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). >>> >>> Detection fails because __xen_hypercall_setfunc() returns the full >>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - >>> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative >>> xen_hypercall_amd(%rip), which when running from identity mapping, is >>> only 0x015b93f0. >>> >>> Replace the rip-relative address with just loading the actual address to >>> restore the proper comparision. >>> >>> This only seems to affect PVH dom0 boot. This is probably because the >>> XENMEM_memory_map hypercall is issued early on from the identity >>> mappings. With a domU, the memory map is provided via hvm_start_info >>> and the hypercall is skipped. The domU is probably running from the >>> kernel high mapping when it issues hypercalls. >>> >>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> >>> --- >>> I think this sort of address mismatch would be addresed by >>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") >>> >>> That could be backported instead, but it depends on a fair number of >>> patches. >>> >>> Not sure on how getting a patch just into 6.6 would work. This patch >>> could go into upstream Linux though it's not strictly necessary when the >>> rip-relative address is a high address. >>> --- >>> arch/x86/xen/xen-head.S | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S >>> index 059f343da76d..71a0eda2da60 100644 >>> --- a/arch/x86/xen/xen-head.S >>> +++ b/arch/x86/xen/xen-head.S >>> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm) >>> pop %ebx >>> pop %eax >>> #else >>> - lea xen_hypercall_amd(%rip), %rcx >>> + mov $xen_hypercall_amd, %rcx >> >> (Now that this is known to be the fix upstream) This probably wants to >> be plain lea without RIP-relative addressing, like the x86_32 branch >> above? > > Why would you want to use LEA there? It's functionally identical, but the > MOV can be encoded without ModR/M byte. > > Jan It's not the using of a particular encoding that I meant, but not using the same on both 32 and 64 bit paths. Surely whatever argument in favour of either would hold for both 32 and 64 bits. Cheers, Alejandro
On 2025-04-11 07:35, Ard Biesheuvel wrote: > On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> >> On 10/04/2025 8:50 pm, Jason Andryuk wrote: >>> A Xen PVH dom0 on an AMD processor triple faults early in boot on >>> 6.6.86. CPU detection appears to fail, as the faulting instruction is >>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). >>> >>> Detection fails because __xen_hypercall_setfunc() returns the full >>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - >>> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative >>> xen_hypercall_amd(%rip), which when running from identity mapping, is >>> only 0x015b93f0. >>> >>> Replace the rip-relative address with just loading the actual address to >>> restore the proper comparision. >>> >>> This only seems to affect PVH dom0 boot. This is probably because the >>> XENMEM_memory_map hypercall is issued early on from the identity >>> mappings. With a domU, the memory map is provided via hvm_start_info >>> and the hypercall is skipped. The domU is probably running from the >>> kernel high mapping when it issues hypercalls. >>> >>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> >>> --- >>> I think this sort of address mismatch would be addresed by >>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") >>> >>> That could be backported instead, but it depends on a fair number of >>> patches. >> >> I've just spoken to Ard, and he thinks that it's standalone. Should be >> ok to backport as a fix. >> > > I've tried building and booting 6.6.y with the patch applied - GS will > still be set to the 1:1 mapped address but that shouldn't matter, > given that it is only used for the stack canary, and we don't do > address comparisons on that afaik. Yes, it seems to work - I tested with dom0 and it booted. I removed the use of phys_base - the diff is included below. Does that match what you did? >>> Not sure on how getting a patch just into 6.6 would work. This patch >>> could go into upstream Linux though it's not strictly necessary when the >>> rip-relative address is a high address. >> >> Do we know which other trees are broken? I only found 6.6 because I was >> messing around with other bits of CI that happen to use 6.6. >> > > I'd assume all trees that had the hypercall page removal patch > backported to them will be broken in the same way. Yes, I think so. Looks like it went back to 5.10 but not to 5.4. Ard, I can submit the stable request unless you want to. Regards, Jason diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S index c4365a05ab83..9bf4cc04f079 100644 --- a/arch/x86/platform/pvh/head.S +++ b/arch/x86/platform/pvh/head.S @@ -100,7 +100,11 @@ SYM_CODE_START_LOCAL(pvh_start_xen) xor %edx, %edx wrmsr - call xen_prepare_pvh + /* Call xen_prepare_pvh() via the kernel virtual mapping */ + leaq xen_prepare_pvh(%rip), %rax + addq $__START_KERNEL_map, %rax + ANNOTATE_RETPOLINE_SAFE + call *%rax /* startup_64 expects boot_params in %rsi. */ mov $_pa(pvh_bootparams), %rsi
On Fri, 11 Apr 2025 at 16:28, Jason Andryuk <jason.andryuk@amd.com> wrote: > > > > On 2025-04-11 07:35, Ard Biesheuvel wrote: > > On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > >> > >> On 10/04/2025 8:50 pm, Jason Andryuk wrote: > >>> A Xen PVH dom0 on an AMD processor triple faults early in boot on > >>> 6.6.86. CPU detection appears to fail, as the faulting instruction is > >>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). > >>> > >>> Detection fails because __xen_hypercall_setfunc() returns the full > >>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - > >>> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative > >>> xen_hypercall_amd(%rip), which when running from identity mapping, is > >>> only 0x015b93f0. > >>> > >>> Replace the rip-relative address with just loading the actual address to > >>> restore the proper comparision. > >>> > >>> This only seems to affect PVH dom0 boot. This is probably because the > >>> XENMEM_memory_map hypercall is issued early on from the identity > >>> mappings. With a domU, the memory map is provided via hvm_start_info > >>> and the hypercall is skipped. The domU is probably running from the > >>> kernel high mapping when it issues hypercalls. > >>> > >>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> > >>> --- > >>> I think this sort of address mismatch would be addresed by > >>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") > >>> > >>> That could be backported instead, but it depends on a fair number of > >>> patches. > >> > >> I've just spoken to Ard, and he thinks that it's standalone. Should be > >> ok to backport as a fix. > >> > > > > I've tried building and booting 6.6.y with the patch applied - GS will > > still be set to the 1:1 mapped address but that shouldn't matter, > > given that it is only used for the stack canary, and we don't do > > address comparisons on that afaik. > > Yes, it seems to work - I tested with dom0 and it booted. I removed the > use of phys_base - the diff is included below. Does that match what you > did? > The stable tree maintainers generally prefer the backports to be as close to the originals as possible, and given that phys_base is guaranteed to be 0x0, you might as well keep the subtraction. > >>> Not sure on how getting a patch just into 6.6 would work. This patch > >>> could go into upstream Linux though it's not strictly necessary when the > >>> rip-relative address is a high address. > >> > >> Do we know which other trees are broken? I only found 6.6 because I was > >> messing around with other bits of CI that happen to use 6.6. > >> > > > > I'd assume all trees that had the hypercall page removal patch > > backported to them will be broken in the same way. > > Yes, I think so. Looks like it went back to 5.10 but not to 5.4. > > Ard, I can submit the stable request unless you want to. > Please go ahead.
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S index 059f343da76d..71a0eda2da60 100644 --- a/arch/x86/xen/xen-head.S +++ b/arch/x86/xen/xen-head.S @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm) pop %ebx pop %eax #else - lea xen_hypercall_amd(%rip), %rcx + mov $xen_hypercall_amd, %rcx cmp %rax, %rcx #ifdef CONFIG_FRAME_POINTER pop %rax /* Dummy pop. */
A Xen PVH dom0 on an AMD processor triple faults early in boot on 6.6.86. CPU detection appears to fail, as the faulting instruction is vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd(). Detection fails because __xen_hypercall_setfunc() returns the full kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() - e.g. 0xffffffff815b93f0. But this is compared against the rip-relative xen_hypercall_amd(%rip), which when running from identity mapping, is only 0x015b93f0. Replace the rip-relative address with just loading the actual address to restore the proper comparision. This only seems to affect PVH dom0 boot. This is probably because the XENMEM_memory_map hypercall is issued early on from the identity mappings. With a domU, the memory map is provided via hvm_start_info and the hypercall is skipped. The domU is probably running from the kernel high mapping when it issues hypercalls. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> --- I think this sort of address mismatch would be addresed by e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping") That could be backported instead, but it depends on a fair number of patches. Not sure on how getting a patch just into 6.6 would work. This patch could go into upstream Linux though it's not strictly necessary when the rip-relative address is a high address. --- arch/x86/xen/xen-head.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)