Message ID | 0102018c8d11471f-9a6d73eb-0c34-4f61-8d37-5a4418f9e0d7-000000@eu-west-1.amazonses.com |
---|---|
State | New, archived |
Series | [v2] target/i386: Fix physical address truncation |
Queued, thanks.

Paolo
On 12/22/23 02:49, Michael Brown wrote:
> The address translation logic in get_physical_address() will currently
> truncate physical addresses to 32 bits unless long mode is enabled.
> This is incorrect when using physical address extensions (PAE) outside
> of long mode, with the result that a 32-bit operating system using PAE
> to access memory above 4G will experience undefined behaviour.
>
> The truncation code was originally introduced in commit 33dfdb5 ("x86:
> only allow real mode to access 32bit without LMA"), where it applied
> only to translations performed while paging is disabled (and so cannot
> affect guests using PAE).
>
> Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
> rearranged the code such that the truncation also applied to the use
> of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use
> atomic operations for pte updates") brought this truncation into scope
> for page table entry accesses, and is the first commit for which a
> Windows 10 32-bit guest will reliably fail to boot if memory above 4G
> is present.
>
> The original truncation code (now ten years old) appears to be wholly
> redundant in the current codebase. With paging disabled, the CPU
> cannot be in long mode and so the maximum address size for any
> executed instruction is 32 bits. This will already cause the linear
> address to be truncated to 32 bits, and there is therefore no way for
> get_physical_address() to be asked to translate an address outside of
> the 32-bit range.
>
> Fix by removing the address truncation in get_physical_address().
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
> Signed-off-by: Michael Brown <mcb30@ipxe.org>
> ---
>  target/i386/tcg/sysemu/excp_helper.c | 6 ------
>  1 file changed, 6 deletions(-)
>
> diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
> index 5b86f439ad..707f7326d4 100644
> --- a/target/i386/tcg/sysemu/excp_helper.c
> +++ b/target/i386/tcg/sysemu/excp_helper.c
> @@ -582,12 +582,6 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
>
>      /* Translation disabled. */
>      out->paddr = addr & x86_get_a20_mask(env);
> -#ifdef TARGET_X86_64
> -    if (!(env->hflags & HF_LMA_MASK)) {
> -        /* Without long mode we can only address 32bits in real mode */
> -        out->paddr = (uint32_t)out->paddr;
> -    }
> -#endif

If the extension is not needed, then the a20 mask isn't either.

But I think there are some missing masks within mmu_translate that need fixing at the same
time:

>         /*
>          * Page table level 3
>          */
>         pte_addr = ((in->cr3 & ~0x1f) + ((addr >> 27) & 0x18)) & a20_mask;

Bits 32-63 of cr3 must be ignored when !LMA.

>         /*
>          * Page table level 2
>          */
>         pte_addr = ((in->cr3 & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask;
>         if (!ptw_translate(&pte_trans, pte_addr)) {
>             return false;
>         }
>     restart_2_nopae:

Likewise.

Looking again, it appears that all of the actual pte_addr calculations have both
PG_ADDRESS_MASK and a20_mask applied, and have verified that bits beyond MAXPHYSADDR are
zero via rsvd_mask.


r~
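[Editorial note: to make the masking Richard asks for concrete, here is a minimal standalone sketch. The helper name and signature are invented for illustration; this is not the actual QEMU fix.]

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only, not the QEMU patch: outside long mode (!LMA),
 * only the low 32 bits of CR3 locate the PAE page-directory-pointer table,
 * so they must be masked off before the level-3 entry address is formed.
 * The existing a20 masking is still applied last.
 */
static uint64_t pae_pdpte_addr(uint64_t cr3, uint32_t vaddr,
                               bool lma, uint64_t a20_mask)
{
    uint64_t base = cr3 & ~0x1fULL;      /* PDPT base, 32-byte aligned */

    if (!lma) {
        base = (uint32_t)base;           /* ignore CR3 bits 32-63 */
    }
    /* bits 31:30 of the linear address select one of four 8-byte PDPTEs */
    return (base + ((vaddr >> 27) & 0x18)) & a20_mask;
}

The same treatment would apply to the non-PAE level-2 calculation quoted above.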
On Thu, Dec 21, 2023 at 10:33 PM Richard Henderson <richard.henderson@linaro.org> wrote:
>
> On 12/22/23 02:49, Michael Brown wrote:
> > The address translation logic in get_physical_address() will currently
> > truncate physical addresses to 32 bits unless long mode is enabled.
> > This is incorrect when using physical address extensions (PAE) outside
> > of long mode, with the result that a 32-bit operating system using PAE
> > to access memory above 4G will experience undefined behaviour.
> >
> > The truncation code was originally introduced in commit 33dfdb5 ("x86:
> > only allow real mode to access 32bit without LMA"), where it applied
> > only to translations performed while paging is disabled (and so cannot
> > affect guests using PAE).
> >
> > Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
> > rearranged the code such that the truncation also applied to the use
> > of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use
> > atomic operations for pte updates") brought this truncation into scope
> > for page table entry accesses, and is the first commit for which a
> > Windows 10 32-bit guest will reliably fail to boot if memory above 4G
> > is present.
> >
> > The original truncation code (now ten years old) appears to be wholly
> > redundant in the current codebase. With paging disabled, the CPU
> > cannot be in long mode and so the maximum address size for any
> > executed instruction is 32 bits. This will already cause the linear
> > address to be truncated to 32 bits, and there is therefore no way for
> > get_physical_address() to be asked to translate an address outside of
> > the 32-bit range.
> >
> > Fix by removing the address truncation in get_physical_address().
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
> > Signed-off-by: Michael Brown <mcb30@ipxe.org>
> > ---
> >  target/i386/tcg/sysemu/excp_helper.c | 6 ------
> >  1 file changed, 6 deletions(-)
> >
> > diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
> > index 5b86f439ad..707f7326d4 100644
> > --- a/target/i386/tcg/sysemu/excp_helper.c
> > +++ b/target/i386/tcg/sysemu/excp_helper.c
> > @@ -582,12 +582,6 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
> >
> >      /* Translation disabled. */
> >      out->paddr = addr & x86_get_a20_mask(env);
> > -#ifdef TARGET_X86_64
> > -    if (!(env->hflags & HF_LMA_MASK)) {
> > -        /* Without long mode we can only address 32bits in real mode */
> > -        out->paddr = (uint32_t)out->paddr;
> > -    }
> > -#endif
>
> If the extension is not needed, then the a20 mask isn't either.

I think it is. The extension is not needed because the masking is
applied by either TCG (e.g. in gen_lea_v_seg_dest or gen_add_A0_im) or
mmu_translate(); but the a20 mask is never applied elsewhere for
either non-paging mode or page table walks.

> But I think there are some missing masks within mmu_translate that need fixing at the same
> time:

Right.

> >         /*
> >          * Page table level 3
> >          */
> >         pte_addr = ((in->cr3 & ~0x1f) + ((addr >> 27) & 0x18)) & a20_mask;
>
> Bits 32-63 of cr3 must be ignored when !LMA.
>
> >         /*
> >          * Page table level 2
> >          */
> >         pte_addr = ((in->cr3 & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask;
> >         if (!ptw_translate(&pte_trans, pte_addr)) {
> >             return false;
> >         }
> >     restart_2_nopae:
>
> Likewise.
>
> Looking again, it appears that all of the actual pte_addr calculations have both
> PG_ADDRESS_MASK and a20_mask applied, and have verified that bits beyond MAXPHYSADDR are
> zero via rsvd_mask.

In fact, applying a20_mask is incorrect when there will be an NPT walk.

I'll include Michael's patch in a more complete series and send it out
after testing.

Paolo
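[Editorial note: as a rough standalone illustration of the point above that the 32-bit limit is already enforced by TCG when the linear address is formed, before any translation is requested. Names are invented for illustration and are not QEMU's internal API.]

#include <stdint.h>

/*
 * Rough illustration, not QEMU code: when the effective address size is
 * 32 bits (always the case outside long mode), the linear address is
 * truncated at address-formation time, so the translation code never
 * sees bits 32-63 of a non-paging access.
 */
static uint64_t form_linear_address(uint64_t seg_base, uint64_t offset,
                                    int addr_size_bits)
{
    uint64_t la = seg_base + offset;

    if (addr_size_bits <= 32) {
        la = (uint32_t)la;   /* truncation applied where the address is formed */
    }
    return la;
}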
On Fri, Dec 22, 2023 at 10:04 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > If the extension is not needed, then the a20 mask isn't either.
>
> I think it is. The extension is not needed because the masking is
> applied by either TCG (e.g. in gen_lea_v_seg_dest or gen_add_A0_im) or
> mmu_translate(); but the a20 mask is never applied elsewhere for
> either non-paging mode or page table walks.

Hmm, except helpers do not apply the masking. :/

So Michael's patch would for example break something as silly as a
BOUND, FSAVE or XSAVE operation invoked around the 4GB boundary.

The easiest way to proceed is to introduce a new MMU index
MMU_PTW_IDX, which is the same as MMU_PHYS_IDX except it does not mask
32-bit addresses. Any objections?

Paolo
On Fri, Dec 22, 2023 at 5:16 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Fri, Dec 22, 2023 at 10:04 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > If the extension is not needed, then the a20 mask isn't either.
> >
> > I think it is. The extension is not needed because the masking is
> > applied by either TCG (e.g. in gen_lea_v_seg_dest or gen_add_A0_im) or
> > mmu_translate(); but the a20 mask is never applied elsewhere for
> > either non-paging mode or page table walks.
>
> Hmm, except helpers do not apply the masking. :/
>
> So Michael's patch would for example break something as silly as a
> BOUND, FSAVE or XSAVE operation invoked around the 4GB boundary.
>
> The easiest way to proceed is to introduce a new MMU index
> MMU_PTW_IDX, which is the same as MMU_PHYS_IDX except it does not mask
> 32-bit addresses. Any objections?

Nevermind, I wasn't thinking straight. Helpers will not use
MMU_PHYS_IDX. So those are fine, we just need to keep the masking
before the "break".

The only user of MMU_PHYS_IDX is VMRUN/VMLOAD/VMSAVE. We need to add
checks that the VMCB is aligned there, and same for writing to
MSR_HSAVE_PA.

Paolo
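[Editorial note: a minimal standalone sketch of the kind of check described above for the remaining MMU_PHYS_IDX users. The helper is hypothetical and is not the change that was eventually merged.]

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the eventual QEMU change: a VMCB address used
 * by VMRUN/VMLOAD/VMSAVE (or a value written to MSR_HSAVE_PA) is only
 * acceptable if it is 4 KiB aligned and fits in the guest's physical
 * address width, so MMU_PHYS_IDX accesses are never handed stray bits.
 */
static bool svm_phys_addr_ok(uint64_t addr, unsigned guest_phys_bits)
{
    uint64_t limit = (guest_phys_bits < 64) ? (1ULL << guest_phys_bits)
                                            : UINT64_MAX;

    if (addr & 0xfffULL) {       /* must be page aligned */
        return false;
    }
    return addr < limit;         /* must be within MAXPHYADDR */
}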
diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 5b86f439ad..707f7326d4 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -582,12 +582,6 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
 
     /* Translation disabled. */
     out->paddr = addr & x86_get_a20_mask(env);
-#ifdef TARGET_X86_64
-    if (!(env->hflags & HF_LMA_MASK)) {
-        /* Without long mode we can only address 32bits in real mode */
-        out->paddr = (uint32_t)out->paddr;
-    }
-#endif
     out->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
     out->page_size = TARGET_PAGE_SIZE;
     return true;
The address translation logic in get_physical_address() will currently
truncate physical addresses to 32 bits unless long mode is enabled.
This is incorrect when using physical address extensions (PAE) outside
of long mode, with the result that a 32-bit operating system using PAE
to access memory above 4G will experience undefined behaviour.

The truncation code was originally introduced in commit 33dfdb5 ("x86:
only allow real mode to access 32bit without LMA"), where it applied
only to translations performed while paging is disabled (and so cannot
affect guests using PAE).

Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
rearranged the code such that the truncation also applied to the use
of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use
atomic operations for pte updates") brought this truncation into scope
for page table entry accesses, and is the first commit for which a
Windows 10 32-bit guest will reliably fail to boot if memory above 4G
is present.

The original truncation code (now ten years old) appears to be wholly
redundant in the current codebase. With paging disabled, the CPU
cannot be in long mode and so the maximum address size for any
executed instruction is 32 bits. This will already cause the linear
address to be truncated to 32 bits, and there is therefore no way for
get_physical_address() to be asked to translate an address outside of
the 32-bit range.

Fix by removing the address truncation in get_physical_address().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
Signed-off-by: Michael Brown <mcb30@ipxe.org>
---
 target/i386/tcg/sysemu/excp_helper.c | 6 ------
 1 file changed, 6 deletions(-)