diff mbox series

[v3] target-i386: Walk NPT in guest real mode

Message ID 20240921085712.28902-1-graf@amazon.com (mailing list archive)
State New, archived
Headers show
Series [v3] target-i386: Walk NPT in guest real mode | expand

Commit Message

Alexander Graf Sept. 21, 2024, 8:57 a.m. UTC
When translating virtual to physical address with a guest CPU that
supports nested paging (NPT), we need to perform every page table walk
access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) special: In that case,
we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform NPT walk to do GVA -> GPA -> HPA
which we fail to do so far.

The net result of that is that TCG VMs with NPT enabled that execute
real mode code (like SeaBIOS) end up with GPA==HPA mappings which means
the guest accesses host code and data. This typically shows as failure
to boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation and then skip any logic that
requires an actual PTE.

That way, all remaining logic to walk the NPT stays and we successfully
walk the NPT in real mode.

Fixes: fe441054bb3f0 ("target-i386: Add NPT support")

Signed-off-by: Alexander Graf <graf@amazon.com>
Reported-by: Eduard Vlad <evlad@amazon.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

---

v1 -> v2:

  - Remove hack where we fake a PTE and instead just set the
    corresponding resolved variables and jump straight to the
    stage2 code.

v2 -> v3:

  - Fix comment
---
 target/i386/tcg/sysemu/excp_helper.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Richard Henderson Oct. 22, 2024, 1:55 a.m. UTC | #1
On 9/21/24 01:57, Alexander Graf wrote:
> When translating virtual to physical address with a guest CPU that
> supports nested paging (NPT), we need to perform every page table walk
> access indirectly through the NPT, which we correctly do.
> 
> However, we treat real mode (no page table walk) special: In that case,
> we currently just skip any walks and translate VA -> PA. With NPT
> enabled, we also need to then perform NPT walk to do GVA -> GPA -> HPA
> which we fail to do so far.
> 
> The net result of that is that TCG VMs with NPT enabled that execute
> real mode code (like SeaBIOS) end up with GPA==HPA mappings which means
> the guest accesses host code and data. This typically shows as failure
> to boot guests.
> 
> This patch changes the page walk logic for NPT enabled guests so that we
> always perform a GVA -> GPA translation and then skip any logic that
> requires an actual PTE.
> 
> That way, all remaining logic to walk the NPT stays and we successfully
> walk the NPT in real mode.
> 
> Fixes: fe441054bb3f0 ("target-i386: Add NPT support")
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Reported-by: Eduard Vlad <evlad@amazon.de>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> ---
> 
> v1 -> v2:
> 
>    - Remove hack where we fake a PTE and instead just set the
>      corresponding resolved variables and jump straight to the
>      stage2 code.
> 
> v2 -> v3:
> 
>    - Fix comment

Thanks, queued.


r~
Mark Cave-Ayland Nov. 5, 2024, 10:54 p.m. UTC | #2
On 21/09/2024 09:57, Alexander Graf wrote:

> When translating virtual to physical address with a guest CPU that
> supports nested paging (NPT), we need to perform every page table walk
> access indirectly through the NPT, which we correctly do.
> 
> However, we treat real mode (no page table walk) special: In that case,
> we currently just skip any walks and translate VA -> PA. With NPT
> enabled, we also need to then perform NPT walk to do GVA -> GPA -> HPA
> which we fail to do so far.
> 
> The net result of that is that TCG VMs with NPT enabled that execute
> real mode code (like SeaBIOS) end up with GPA==HPA mappings which means
> the guest accesses host code and data. This typically shows as failure
> to boot guests.
> 
> This patch changes the page walk logic for NPT enabled guests so that we
> always perform a GVA -> GPA translation and then skip any logic that
> requires an actual PTE.
> 
> That way, all remaining logic to walk the NPT stays and we successfully
> walk the NPT in real mode.
> 
> Fixes: fe441054bb3f0 ("target-i386: Add NPT support")
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Reported-by: Eduard Vlad <evlad@amazon.de>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> ---
> 
> v1 -> v2:
> 
>    - Remove hack where we fake a PTE and instead just set the
>      corresponding resolved variables and jump straight to the
>      stage2 code.
> 
> v2 -> v3:
> 
>    - Fix comment
> ---
>   target/i386/tcg/sysemu/excp_helper.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
> index 8fb05b1f53..24dd6935f9 100644
> --- a/target/i386/tcg/sysemu/excp_helper.c
> +++ b/target/i386/tcg/sysemu/excp_helper.c
> @@ -298,7 +298,7 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
>           /* combine pde and pte nx, user and rw protections */
>           ptep &= pte ^ PG_NX_MASK;
>           page_size = 4096;
> -    } else {
> +    } else if (pg_mode) {
>           /*
>            * Page table level 2
>            */
> @@ -343,6 +343,15 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
>           ptep &= pte | PG_NX_MASK;
>           page_size = 4096;
>           rsvd_mask = 0;
> +    } else {
> +        /*
> +         * No paging (real mode), let's tentatively resolve the address as 1:1
> +         * here, but conditionally still perform an NPT walk on it later.
> +         */
> +        page_size = 0x40000000;
> +        paddr = in->addr;
> +        prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
> +        goto stage2;
>       }
>   
>   do_check_protect:
> @@ -420,6 +429,7 @@ do_check_protect_pse36:
>   
>       /* merge offset within page */
>       paddr = (pte & PG_ADDRESS_MASK & ~(page_size - 1)) | (addr & (page_size - 1));
> +stage2:
>   
>       /*
>        * Note that NPT is walked (for both paging structures and final guest
> @@ -562,7 +572,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
>               addr = (uint32_t)addr;
>           }
>   
> -        if (likely(env->cr[0] & CR0_PG_MASK)) {
> +        if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
>               in.cr3 = env->cr[3];
>               in.mmu_idx = mmu_idx;
>               in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;

Hi Alex,

This commit appears to break my WinXP boot test: with this patch applied, attempting 
to boot WinXP from CDROM fails with SeaBIOS getting stuck early in a boot loop. It is 
possible to reproduce the issue easily with:

   ./build/qemu-system-x86_64 -cdrom winxp.iso -boot d


ATB,

Mark.
Alexander Graf Nov. 6, 2024, 5:18 a.m. UTC | #3
Hey Mark,

On 05.11.24 23:54, Mark Cave-Ayland wrote:
>
> Hi Alex,
>
> This commit appears to break my WinXP boot test: with this patch 
> applied, attempting
> to boot WinXP from CDROM fails with SeaBIOS getting stuck early in a 
> boot loop. It is
> possible to reproduce the issue easily with:
>
>   ./build/qemu-system-x86_64 -cdrom winxp.iso -boot d


Thanks a lot for the report! It also seems to break NetBSD 
(https://gitlab.com/qemu-project/qemu/-/issues/2654). I'll have a look 
tomorrow to root cause and will send a patch to fix.


Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
diff mbox series

Patch

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..24dd6935f9 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -298,7 +298,7 @@  static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         /* combine pde and pte nx, user and rw protections */
         ptep &= pte ^ PG_NX_MASK;
         page_size = 4096;
-    } else {
+    } else if (pg_mode) {
         /*
          * Page table level 2
          */
@@ -343,6 +343,15 @@  static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         ptep &= pte | PG_NX_MASK;
         page_size = 4096;
         rsvd_mask = 0;
+    } else {
+        /*
+         * No paging (real mode), let's tentatively resolve the address as 1:1
+         * here, but conditionally still perform an NPT walk on it later.
+         */
+        page_size = 0x40000000;
+        paddr = in->addr;
+        prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        goto stage2;
     }
 
 do_check_protect:
@@ -420,6 +429,7 @@  do_check_protect_pse36:
 
     /* merge offset within page */
     paddr = (pte & PG_ADDRESS_MASK & ~(page_size - 1)) | (addr & (page_size - 1));
+stage2:
 
     /*
      * Note that NPT is walked (for both paging structures and final guest
@@ -562,7 +572,7 @@  static bool get_physical_address(CPUX86State *env, vaddr addr,
             addr = (uint32_t)addr;
         }
 
-        if (likely(env->cr[0] & CR0_PG_MASK)) {
+        if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
             in.cr3 = env->cr[3];
             in.mmu_idx = mmu_idx;
             in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;