
[v2] target/i386: Fix physical address truncation

Message ID 0102018c8d11471f-9a6d73eb-0c34-4f61-8d37-5a4418f9e0d7-000000@eu-west-1.amazonses.com (mailing list archive)
State New, archived
Series [v2] target/i386: Fix physical address truncation

Commit Message

Michael Brown Dec. 21, 2023, 3:49 p.m. UTC
The address translation logic in get_physical_address() will currently
truncate physical addresses to 32 bits unless long mode is enabled.
This is incorrect when using physical address extensions (PAE) outside
of long mode, with the result that a 32-bit operating system using PAE
to access memory above 4G will experience undefined behaviour.

The truncation code was originally introduced in commit 33dfdb5 ("x86:
only allow real mode to access 32bit without LMA"), where it applied
only to translations performed while paging is disabled (and so cannot
affect guests using PAE).

Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
rearranged the code such that the truncation also applied to the use
of MMU_PHYS_IDX and MMU_NESTED_IDX.  Commit 4a1e9d4 ("target/i386: Use
atomic operations for pte updates") brought this truncation into scope
for page table entry accesses, and is the first commit for which a
Windows 10 32-bit guest will reliably fail to boot if memory above 4G
is present.

The original truncation code (now ten years old) appears to be wholly
redundant in the current codebase.  With paging disabled, the CPU
cannot be in long mode and so the maximum address size for any
executed instruction is 32 bits.  This will already cause the linear
address to be truncated to 32 bits, and there is therefore no way for
get_physical_address() to be asked to translate an address outside of
the 32-bit range.

Fix by removing the address truncation in get_physical_address().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
Signed-off-by: Michael Brown <mcb30@ipxe.org>
---
 target/i386/tcg/sysemu/excp_helper.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Paolo Bonzini Dec. 21, 2023, 5:21 p.m. UTC | #1
Queued, thanks.

Paolo
Richard Henderson Dec. 21, 2023, 9:33 p.m. UTC | #2
On 12/22/23 02:49, Michael Brown wrote:
> The address translation logic in get_physical_address() will currently
> truncate physical addresses to 32 bits unless long mode is enabled.
> This is incorrect when using physical address extensions (PAE) outside
> of long mode, with the result that a 32-bit operating system using PAE
> to access memory above 4G will experience undefined behaviour.
> 
> The truncation code was originally introduced in commit 33dfdb5 ("x86:
> only allow real mode to access 32bit without LMA"), where it applied
> only to translations performed while paging is disabled (and so cannot
> affect guests using PAE).
> 
> Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
> rearranged the code such that the truncation also applied to the use
> of MMU_PHYS_IDX and MMU_NESTED_IDX.  Commit 4a1e9d4 ("target/i386: Use
> atomic operations for pte updates") brought this truncation into scope
> for page table entry accesses, and is the first commit for which a
> Windows 10 32-bit guest will reliably fail to boot if memory above 4G
> is present.
> 
> The original truncation code (now ten years old) appears to be wholly
> redundant in the current codebase.  With paging disabled, the CPU
> cannot be in long mode and so the maximum address size for any
> executed instruction is 32 bits.  This will already cause the linear
> address to be truncated to 32 bits, and there is therefore no way for
> get_physical_address() to be asked to translate an address outside of
> the 32-bit range.
> 
> Fix by removing the address truncation in get_physical_address().
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
> Signed-off-by: Michael Brown <mcb30@ipxe.org>
> ---
>   target/i386/tcg/sysemu/excp_helper.c | 6 ------
>   1 file changed, 6 deletions(-)
> 
> diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
> index 5b86f439ad..707f7326d4 100644
> --- a/target/i386/tcg/sysemu/excp_helper.c
> +++ b/target/i386/tcg/sysemu/excp_helper.c
> @@ -582,12 +582,6 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
>   
>       /* Translation disabled. */
>       out->paddr = addr & x86_get_a20_mask(env);
> -#ifdef TARGET_X86_64
> -    if (!(env->hflags & HF_LMA_MASK)) {
> -        /* Without long mode we can only address 32bits in real mode */
> -        out->paddr = (uint32_t)out->paddr;
> -    }
> -#endif

If the extension is not needed, then the a20 mask isn't either.

But I think there are some missing masks within mmu_translate that need fixing at the same 
time:

>             /*
>              * Page table level 3
>              */
>             pte_addr = ((in->cr3 & ~0x1f) + ((addr >> 27) & 0x18)) & a20_mask;

Bits 32-63 of cr3 must be ignored when !LMA.

>         /*
>          * Page table level 2
>          */
>         pte_addr = ((in->cr3 & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask;
>         if (!ptw_translate(&pte_trans, pte_addr)) {
>             return false;
>         }
>     restart_2_nopae:

Likewise.

Looking again, it appears that all of the actual pte_addr calculations have both 
PG_ADDRESS_MASK and a20_mask applied, and that bits beyond MAXPHYSADDR have already been 
verified to be zero via rsvd_mask.
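
To illustrate the CR3 masking suggested above, a hypothetical sketch against the level-3 
calculation (long_mode_active stands in for the real LMA check, e.g. 
!(env->hflags & HF_LMA_MASK); this is not the eventual fix):

    /* Hypothetical sketch: outside long mode, bits 63:32 of CR3 are not
     * architecturally meaningful, so the PDPTE base could be truncated to
     * 32 bits before the A20 mask is applied. */
    uint64_t cr3 = in->cr3;
    if (!long_mode_active) {
        cr3 = (uint32_t)cr3;    /* bits 63:32 of CR3 ignored when !LMA */
    }
    pte_addr = ((cr3 & ~0x1f) + ((addr >> 27) & 0x18)) & a20_mask;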


r~
Paolo Bonzini Dec. 22, 2023, 9:04 a.m. UTC | #3
On Thu, Dec 21, 2023 at 10:33 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 12/22/23 02:49, Michael Brown wrote:
> > The address translation logic in get_physical_address() will currently
> > truncate physical addresses to 32 bits unless long mode is enabled.
> > This is incorrect when using physical address extensions (PAE) outside
> > of long mode, with the result that a 32-bit operating system using PAE
> > to access memory above 4G will experience undefined behaviour.
> >
> > The truncation code was originally introduced in commit 33dfdb5 ("x86:
> > only allow real mode to access 32bit without LMA"), where it applied
> > only to translations performed while paging is disabled (and so cannot
> > affect guests using PAE).
> >
> > Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
> > rearranged the code such that the truncation also applied to the use
> > of MMU_PHYS_IDX and MMU_NESTED_IDX.  Commit 4a1e9d4 ("target/i386: Use
> > atomic operations for pte updates") brought this truncation into scope
> > for page table entry accesses, and is the first commit for which a
> > Windows 10 32-bit guest will reliably fail to boot if memory above 4G
> > is present.
> >
> > The original truncation code (now ten years old) appears to be wholly
> > redundant in the current codebase.  With paging disabled, the CPU
> > cannot be in long mode and so the maximum address size for any
> > executed instruction is 32 bits.  This will already cause the linear
> > address to be truncated to 32 bits, and there is therefore no way for
> > get_physical_address() to be asked to translate an address outside of
> > the 32-bit range.
> >
> > Fix by removing the address truncation in get_physical_address().
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
> > Signed-off-by: Michael Brown <mcb30@ipxe.org>
> > ---
> >   target/i386/tcg/sysemu/excp_helper.c | 6 ------
> >   1 file changed, 6 deletions(-)
> >
> > diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
> > index 5b86f439ad..707f7326d4 100644
> > --- a/target/i386/tcg/sysemu/excp_helper.c
> > +++ b/target/i386/tcg/sysemu/excp_helper.c
> > @@ -582,12 +582,6 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
> >
> >       /* Translation disabled. */
> >       out->paddr = addr & x86_get_a20_mask(env);
> > -#ifdef TARGET_X86_64
> > -    if (!(env->hflags & HF_LMA_MASK)) {
> > -        /* Without long mode we can only address 32bits in real mode */
> > -        out->paddr = (uint32_t)out->paddr;
> > -    }
> > -#endif
>
> If the extension is not needed, then the a20 mask isn't either.

I think it is. The extension is not needed because the masking is
applied by either TCG (e.g. in gen_lea_v_seg_dest or gen_add_A0_im) or
mmu_translate(); but the a20 mask is never applied elsewhere for
either non-paging mode or page table walks.
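
(As a minimal illustration of what the A20 mask does, independent of the QEMU helper that 
computes it; a20_enabled and paddr here are stand-ins, not actual variables in the code:)

    /* Illustration only: with the A20 gate disabled, physical address bit 20
     * is forced to zero, reproducing the 8086-style 1 MB wrap-around; with it
     * enabled, the address passes through unchanged. */
    uint64_t a20_mask = a20_enabled ? ~0ULL : ~(1ULL << 20);
    paddr &= a20_mask;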

> But I think there are some missing masks within mmu_translate that need fixing at the same
> time:

Right.

> >             /*
> >              * Page table level 3
> >              */
> >             pte_addr = ((in->cr3 & ~0x1f) + ((addr >> 27) & 0x18)) & a20_mask;
>
> Bits 32-63 of cr3 must be ignored when !LMA.
>
> >         /*
> >          * Page table level 2
> >          */
> >         pte_addr = ((in->cr3 & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask;
> >         if (!ptw_translate(&pte_trans, pte_addr)) {
> >             return false;
> >         }
> >     restart_2_nopae:
>
> Likewise.
>
> Looking again, it appears that all of the actual pte_addr calculations have both
> PG_ADDRESS_MASK and a20_mask applied, and that bits beyond MAXPHYSADDR have already
> been verified to be zero via rsvd_mask.

In fact, applying a20_mask is incorrect when there will be an NPT
walk.  I'll include Michael's patch in a more complete series and send
it out after testing.

Paolo
Paolo Bonzini Dec. 22, 2023, 4:16 p.m. UTC | #4
On Fri, Dec 22, 2023 at 10:04 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > If the extension is not needed, then the a20 mask isn't either.
>
> I think it is. The extension is not needed because the masking is
> applied by either TCG (e.g. in gen_lea_v_seg_dest or gen_add_A0_im) or
> mmu_translate(); but the a20 mask is never applied elsewhere for
> either non-paging mode or page table walks.

Hmm, except helpers do not apply the masking. :/

So Michael's patch would for example break something as silly as a
BOUND, FSAVE or XSAVE operation invoked around the 4GB boundary.

The easiest way to proceed is to introduce a new MMU index
MMU_PTW_IDX, which is the same as MMU_PHYS_IDX except it does not truncate
addresses to 32 bits. Any objections?

Paolo
Paolo Bonzini Dec. 22, 2023, 4:52 p.m. UTC | #5
On Fri, Dec 22, 2023 at 5:16 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Fri, Dec 22, 2023 at 10:04 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > If the extension is not needed, then the a20 mask isn't either.
> >
> > I think it is. The extension is not needed because the masking is
> > applied by either TCG (e.g. in gen_lea_v_seg_dest or gen_add_A0_im) or
> > mmu_translate(); but the a20 mask is never applied elsewhere for
> > either non-paging mode or page table walks.
>
> Hmm, except helpers do not apply the masking. :/
>
> So Michael's patch would for example break something as silly as a
> BOUND, FSAVE or XSAVE operation invoked around the 4GB boundary.
>
> The easiest way to proceed is to introduce a new MMU index
> > MMU_PTW_IDX, which is the same as MMU_PHYS_IDX except it does not truncate
> > addresses to 32 bits. Any objections?

Nevermind, I wasn't thinking straight.

Helpers will not use MMU_PHYS_IDX. So those are fine; we just need to
keep the masking before the "break".

The only user of MMU_PHYS_IDX is VMRUN/VMLOAD/VMSAVE. We need to add
checks that the VMCB is aligned there, and same for writing to
MSR_HSAVE_PA.
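
A rough sketch of the kind of VMCB alignment check meant here (hypothetical only; "addr" 
is the VMCB physical address taken from RAX, and the exact error reporting would follow 
the SVM spec and the eventual patch rather than this sketch):

    /* Hypothetical sketch for helper_vmrun(): the VMCB must be 4 KiB
     * aligned, so reject a misaligned address before the world switch. */
    if (addr & 0xfff) {
        cpu_vmexit(env, SVM_EXIT_ERR, 0, GETPC());
    }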

Paolo

Patch

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 5b86f439ad..707f7326d4 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -582,12 +582,6 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
 
     /* Translation disabled. */
     out->paddr = addr & x86_get_a20_mask(env);
-#ifdef TARGET_X86_64
-    if (!(env->hflags & HF_LMA_MASK)) {
-        /* Without long mode we can only address 32bits in real mode */
-        out->paddr = (uint32_t)out->paddr;
-    }
-#endif
     out->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
     out->page_size = TARGET_PAGE_SIZE;
     return true;