target/i386: fix direction of "32-bit MMU" test

Message ID	20240311075806.668555-1-pbonzini@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> From: Paolo Bonzini <pbonzini@redhat.com> To: qemu-devel@nongnu.org Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>, qemu-stable@nongnu.org Subject: [PATCH] target/i386: fix direction of "32-bit MMU" test Date: Mon, 11 Mar 2024 08:58:06 +0100 Message-ID: <20240311075806.668555-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -30 X-Spam_score: -3.1 X-Spam_bar: --- X-Spam_report: (-3.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.945, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	target/i386: fix direction of "32-bit MMU" test \| expand target/i386: fix direction of "32-bit MMU" test

Paolo Bonzini March 11, 2024, 7:58 a.m. UTC

The low bit of MMU indices for x86 TCG indicates whether the processor is
in 32-bit mode and therefore linear addresses have to be masked to 32 bits.
However, the index was computed incorrectly, leading to possible conflicts
in the TLB for any address above 4G.

Analyzed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Fixes: b1661801c18 ("target/i386: Fix physical address truncation", 2024-02-28)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2206
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/cpu.h | 2 +-
 target/i386/cpu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Richard Henderson March 11, 2024, 4:57 p.m. UTC | #1

On 3/10/24 21:58, Paolo Bonzini wrote:
> The low bit of MMU indices for x86 TCG indicates whether the processor is
> in 32-bit mode and therefore linear addresses have to be masked to 32 bits.
> However, the index was computed incorrectly, leading to possible conflicts
> in the TLB for any address above 4G.
> 
> Analyzed-by: Mark Cave-Ayland<mark.cave-ayland@ilande.co.uk>
> Fixes: b1661801c18 ("target/i386: Fix physical address truncation", 2024-02-28)
> Cc:qemu-stable@nongnu.org
> Resolves:https://gitlab.com/qemu-project/qemu/-/issues/2206
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
>   target/i386/cpu.h | 2 +-
>   target/i386/cpu.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)

Oopsie.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

Mark Cave-Ayland March 11, 2024, 8:37 p.m. UTC | #2

On 11/03/2024 07:58, Paolo Bonzini wrote:

> The low bit of MMU indices for x86 TCG indicates whether the processor is
> in 32-bit mode and therefore linear addresses have to be masked to 32 bits.
> However, the index was computed incorrectly, leading to possible conflicts
> in the TLB for any address above 4G.
> 
> Analyzed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Fixes: b1661801c18 ("target/i386: Fix physical address truncation", 2024-02-28)
> Cc: qemu-stable@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2206
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/cpu.h | 2 +-
>   target/i386/cpu.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 952174bb6f5..6b057380791 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -2334,7 +2334,7 @@ static inline bool is_mmu_index_32(int mmu_index)
>   
>   static inline int cpu_mmu_index_kernel(CPUX86State *env)
>   {
> -    int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 1 : 0;
> +    int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1;
>       int mmu_index_base =
>           !(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX :
>           ((env->hflags & HF_CPL_MASK) < 3 && (env->eflags & AC_MASK)) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX;
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 2666ef38089..78524bc6073 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -7735,7 +7735,7 @@ static bool x86_cpu_has_work(CPUState *cs)
>   static int x86_cpu_mmu_index(CPUState *cs, bool ifetch)
>   {
>       CPUX86State *env = cpu_env(cs);
> -    int mmu_index_32 = (env->hflags & HF_CS64_MASK) ? 1 : 0;
> +    int mmu_index_32 = (env->hflags & HF_CS64_MASK) ? 0 : 1;
>       int mmu_index_base =
>           (env->hflags & HF_CPL_MASK) == 3 ? MMU_USER64_IDX :
>           !(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX :

LGTM. I've just done a few quick Windows boot tests, and all of Win98SE, WinXP and 
Win7 64-bit now appear to be working fine with this patch so:

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>


ATB,

Mark.

Michael Tokarev April 1, 2024, 6:02 a.m. UTC | #3

11.03.2024 10:58, Paolo Bonzini wrote:
> The low bit of MMU indices for x86 TCG indicates whether the processor is
> in 32-bit mode and therefore linear addresses have to be masked to 32 bits.
> However, the index was computed incorrectly, leading to possible conflicts
> in the TLB for any address above 4G.
> 
> Analyzed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Fixes: b1661801c18 ("target/i386: Fix physical address truncation", 2024-02-28)
> Cc: qemu-stable@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2206
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Anyone can guess why this rather trivial and obviously correct patch causes segfaults
in a few tests in staging-7.2 - when run in tcg mode, namely:

   pxe-test
   migration-test
   boot-serial-test
   bios-tables-test
   vmgenid-test
   cdrom-test

When reverting this single commit from staging-7.2, it all works fine again.

This area in 7.2 is a bit twisty since I picked up previous commit (for which
this one is a fix), b1661801c18 "target/i386: Fix physical address truncation",
before a few others in the same place, especially before 90f641531c782c8
"target/i386: use separate MMU indexes for 32-bit accesses", so that picking
up the other commits has become more problematic due to differences in context.
Also, 7.2 lacks ace0c5fe59 "target/i386: Populate CPUClass.mmu_index" which moves
one of the places this commit touches to another file.  Hopefully I haven't
screwed up this place entirely and the prob is just some other missing commit.

I'm re-verifying my cherry-picks once again. Meanwhile, help from someone who
is more familiar with this place is definitely welcome.

BTW, Paolo, it looks like this patch is missing two tags - Reviewed-by Richard
and Tested-by Mark.  It's obviously too late to change that now, but it's
something to check for in the future.

Thanks,

/mjt

> ---
>   target/i386/cpu.h | 2 +-
>   target/i386/cpu.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 952174bb6f5..6b057380791 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -2334,7 +2334,7 @@ static inline bool is_mmu_index_32(int mmu_index)
>   
>   static inline int cpu_mmu_index_kernel(CPUX86State *env)
>   {
> -    int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 1 : 0;
> +    int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1;
>       int mmu_index_base =
>           !(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX :
>           ((env->hflags & HF_CPL_MASK) < 3 && (env->eflags & AC_MASK)) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX;
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 2666ef38089..78524bc6073 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -7735,7 +7735,7 @@ static bool x86_cpu_has_work(CPUState *cs)
>   static int x86_cpu_mmu_index(CPUState *cs, bool ifetch)
>   {
>       CPUX86State *env = cpu_env(cs);
> -    int mmu_index_32 = (env->hflags & HF_CS64_MASK) ? 1 : 0;
> +    int mmu_index_32 = (env->hflags & HF_CS64_MASK) ? 0 : 1;
>       int mmu_index_base =
>           (env->hflags & HF_CPL_MASK) == 3 ? MMU_USER64_IDX :
>           !(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX :

Michael Tokarev April 5, 2024, 5:30 p.m. UTC | #4

01.04.2024 09:02, Michael Tokarev:

> Anyone can guess why this rather trivial and obviously correct patch causes segfaults
> in a few tests in staging-7.2 - when run in tcg mode, namely:
> 
>    pxe-test
>    migration-test
>    boot-serial-test
>    bios-tables-test
>    vmgenid-test
>    cdrom-test
> 
> When reverting this single commit from staging-7.2, it all works fine again.

It sigsegvs in probe_access_internal():

   CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one returns NULL,

and next there's a call

   tlb_addr = tlb_read_ofs(entry, elt_ofs);

which fails.

#0  0x0000555555c5de8a in tlb_read_ofs (ofs=8, entry=0x0) at 7.2/accel/tcg/cputlb.c:1455
#1  probe_access_internal
     (env=0x555556a862a0, addr=4294967280, fault_size=fault_size@entry=1, access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=5, 
nonfault=nonfault@entry=false, phost=0x7fffea4d32a0, pfull=0x7fffea4d3298, retaddr=0)
     at 7.2/accel/tcg/cputlb.c:1555
#2  0x0000555555c62aba in get_page_addr_code_hostp
     (env=<optimized out>, addr=addr@entry=4294967280, hostp=hostp@entry=0x0)
     at 7.2/accel/tcg/cputlb.c:1691
#3  0x0000555555c52b54 in get_page_addr_code (addr=4294967280, env=<optimized out>)
     at 7.2/include/exec/exec-all.h:714
#4  tb_htable_lookup
     (cpu=cpu@entry=0x555556a85530, pc=pc@entry=4294967280, cs_base=cs_base@entry=4294901760, flags=flags@entry=64, cflags=cflags@entry=4278190080) at 
7.2/accel/tcg/cpu-exec.c:236
#5  0x0000555555c53e8e in tb_lookup
     (cflags=4278190080, flags=64, cs_base=4294901760, pc=4294967280, cpu=0x555556a85530)
     at 7.2/accel/tcg/cpu-exec.c:270
#6  cpu_exec (cpu=cpu@entry=0x555556a85530) at 7.2/accel/tcg/cpu-exec.c:1001
#7  0x0000555555c75d2f in tcg_cpus_exec (cpu=cpu@entry=0x555556a85530)
     at 7.2/accel/tcg/tcg-accel-ops.c:69
#8  0x0000555555c75e80 in mttcg_cpu_thread_fn (arg=arg@entry=0x555556a85530)
     at 7.2/accel/tcg/tcg-accel-ops-mttcg.c:95
#9  0x0000555555ded098 in qemu_thread_start (args=0x555556adac40)
     at 7.2/util/qemu-thread-posix.c:505
#10 0x00007ffff5793134 in start_thread (arg=<optimized out>)
#11 0x00007ffff58137dc in clone3 ()


I'm removing this whole set from 7.2 for now:

  2cc68629a6fc target/i386: fix direction of "32-bit MMU" test
  90f641531c78 target/i386: use separate MMU indexes for 32-bit accesses
  5f97afe2543f target/i386: introduce function to query MMU indices

This leaves us with

  b1661801c184 "target/i386: Fix physical address truncation"

but without its fix, 2cc68629a6fc.

It looks like I should revert b1661801c184 from 7.2 too, re-opening
https://gitlab.com/qemu-project/qemu/-/issues/2040 - since to me it isn't
clear if this change actually fixes this issue or not without the
previous change, 90f641531c78, which is missing from 7.2.10.

At the very least this will simplify possible another attempt to cherry-pick
these changes to 7.2.

Thanks,

/mjt

Philippe Mathieu-Daudé April 8, 2024, 9:35 a.m. UTC | #5

On 5/4/24 19:30, Michael Tokarev wrote:
> 01.04.2024 09:02, Michael Tokarev:
> 
>> Anyone can guess why this rather trivial and obviously correct patch 
>> causes segfaults
>> in a few tests in staging-7.2 - when run in tcg mode, namely:
>>
>>    pxe-test
>>    migration-test
>>    boot-serial-test
>>    bios-tables-test
>>    vmgenid-test
>>    cdrom-test
>>
>> When reverting this single commit from staging-7.2, it all works fine 
>> again.
> 
> It sigsegvs in probe_access_internal():
> 
>    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one 
> returns NULL,
> 
> and next there's a call
> 
>    tlb_addr = tlb_read_ofs(entry, elt_ofs);
> 
> which fails.
> 
> #0  0x0000555555c5de8a in tlb_read_ofs (ofs=8, entry=0x0) at 
> 7.2/accel/tcg/cputlb.c:1455
> #1  probe_access_internal
>      (env=0x555556a862a0, addr=4294967280, 
> fault_size=fault_size@entry=1, 
> access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=5, 
> nonfault=nonfault@entry=false, phost=0x7fffea4d32a0, 
> pfull=0x7fffea4d3298, retaddr=0)
>      at 7.2/accel/tcg/cputlb.c:1555
> #2  0x0000555555c62aba in get_page_addr_code_hostp
>      (env=<optimized out>, addr=addr@entry=4294967280, 
> hostp=hostp@entry=0x0)
>      at 7.2/accel/tcg/cputlb.c:1691
> #3  0x0000555555c52b54 in get_page_addr_code (addr=4294967280, 
> env=<optimized out>)
>      at 7.2/include/exec/exec-all.h:714
> #4  tb_htable_lookup
>      (cpu=cpu@entry=0x555556a85530, pc=pc@entry=4294967280, 
> cs_base=cs_base@entry=4294901760, flags=flags@entry=64, 
> cflags=cflags@entry=4278190080) at 7.2/accel/tcg/cpu-exec.c:236
> #5  0x0000555555c53e8e in tb_lookup
>      (cflags=4278190080, flags=64, cs_base=4294901760, pc=4294967280, 
> cpu=0x555556a85530)
>      at 7.2/accel/tcg/cpu-exec.c:270
> #6  cpu_exec (cpu=cpu@entry=0x555556a85530) at 
> 7.2/accel/tcg/cpu-exec.c:1001
> #7  0x0000555555c75d2f in tcg_cpus_exec (cpu=cpu@entry=0x555556a85530)
>      at 7.2/accel/tcg/tcg-accel-ops.c:69
> #8  0x0000555555c75e80 in mttcg_cpu_thread_fn 
> (arg=arg@entry=0x555556a85530)
>      at 7.2/accel/tcg/tcg-accel-ops-mttcg.c:95
> #9  0x0000555555ded098 in qemu_thread_start (args=0x555556adac40)
>      at 7.2/util/qemu-thread-posix.c:505
> #10 0x00007ffff5793134 in start_thread (arg=<optimized out>)
> #11 0x00007ffff58137dc in clone3 ()
> 
> 
> I'm removing this whole set from 7.2 for now:
> 
>   2cc68629a6fc target/i386: fix direction of "32-bit MMU" test
>   90f641531c78 target/i386: use separate MMU indexes for 32-bit accesses
>   5f97afe2543f target/i386: introduce function to query MMU indices

Cc'ing Giuseppe Ghibò for
https://gitlab.com/qemu-project/qemu/-/issues/2264

> This leaves us with
> 
>   b1661801c184 "target/i386: Fix physical address truncation"
> 
> but without its fix, 2cc68629a6fc.
> 
> It looks like I should revert b1661801c184 from 7.2 too, re-opening
> https://gitlab.com/qemu-project/qemu/-/issues/2040 - since to me it isn't
> clear if this change actually fixes this issue or not without the
> previous change, 90f641531c78, which is missing from 7.2.10.
> 
> At the very least this will simplify possible another attempt to 
> cherry-pick
> these changes to 7.2.
> 
> Thanks,
> 
> /mjt
>

Paolo Bonzini April 8, 2024, 8:12 p.m. UTC | #6

Il ven 5 apr 2024, 19:30 Michael Tokarev <mjt@tls.msk.ru> ha scritto:

> 01.04.2024 09:02, Michael Tokarev:
>
> > Anyone can guess why this rather trivial and obviously correct patch
> causes segfaults
> > in a few tests in staging-7.2 - when run in tcg mode, namely:
> >
> >    pxe-test
> >    migration-test
> >    boot-serial-test
> >    bios-tables-test
> >    vmgenid-test
> >    cdrom-test
> >
> > When reverting this single commit from staging-7.2, it all works fine
> again.
>
> It sigsegvs in probe_access_internal():
>
>    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one returns
> NULL,
>
> and next there's a call
>
>    tlb_addr = tlb_read_ofs(entry, elt_ofs);
>
> which fails.
>

I will take a look tomorrow.

Paolo


> #0  0x0000555555c5de8a in tlb_read_ofs (ofs=8, entry=0x0) at
> 7.2/accel/tcg/cputlb.c:1455
> #1  probe_access_internal
>      (env=0x555556a862a0, addr=4294967280, fault_size=fault_size@entry=1,
> access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=5,
> nonfault=nonfault@entry=false, phost=0x7fffea4d32a0,
> pfull=0x7fffea4d3298, retaddr=0)
>      at 7.2/accel/tcg/cputlb.c:1555
> #2  0x0000555555c62aba in get_page_addr_code_hostp
>      (env=<optimized out>, addr=addr@entry=4294967280, hostp=hostp@entry
> =0x0)
>      at 7.2/accel/tcg/cputlb.c:1691
> #3  0x0000555555c52b54 in get_page_addr_code (addr=4294967280,
> env=<optimized out>)
>      at 7.2/include/exec/exec-all.h:714
> #4  tb_htable_lookup
>      (cpu=cpu@entry=0x555556a85530, pc=pc@entry=4294967280,
> cs_base=cs_base@entry=4294901760, flags=flags@entry=64,
> cflags=cflags@entry=4278190080) at
> 7.2/accel/tcg/cpu-exec.c:236
> #5  0x0000555555c53e8e in tb_lookup
>      (cflags=4278190080, flags=64, cs_base=4294901760, pc=4294967280,
> cpu=0x555556a85530)
>      at 7.2/accel/tcg/cpu-exec.c:270
> #6  cpu_exec (cpu=cpu@entry=0x555556a85530) at
> 7.2/accel/tcg/cpu-exec.c:1001
> #7  0x0000555555c75d2f in tcg_cpus_exec (cpu=cpu@entry=0x555556a85530)
>      at 7.2/accel/tcg/tcg-accel-ops.c:69
> #8  0x0000555555c75e80 in mttcg_cpu_thread_fn (arg=arg@entry
> =0x555556a85530)
>      at 7.2/accel/tcg/tcg-accel-ops-mttcg.c:95
> #9  0x0000555555ded098 in qemu_thread_start (args=0x555556adac40)
>      at 7.2/util/qemu-thread-posix.c:505
> #10 0x00007ffff5793134 in start_thread (arg=<optimized out>)
> #11 0x00007ffff58137dc in clone3 ()
>
>
> I'm removing this whole set from 7.2 for now:
>
>   2cc68629a6fc target/i386: fix direction of "32-bit MMU" test
>   90f641531c78 target/i386: use separate MMU indexes for 32-bit accesses
>   5f97afe2543f target/i386: introduce function to query MMU indices
>
> This leaves us with
>
>   b1661801c184 "target/i386: Fix physical address truncation"
>
> but without its fix, 2cc68629a6fc.
>
> It looks like I should revert b1661801c184 from 7.2 too, re-opening
> https://gitlab.com/qemu-project/qemu/-/issues/2040 - since to me it isn't
> clear if this change actually fixes this issue or not without the
> previous change, 90f641531c78, which is missing from 7.2.10.
>
> At the very least this will simplify possible another attempt to
> cherry-pick
> these changes to 7.2.
>
> Thanks,
>
> /mjt
>
>

Michael Tokarev April 8, 2024, 8:18 p.m. UTC | #7

08.04.2024 23:12, Paolo Bonzini wrote:
> Il ven 5 apr 2024, 19:30 Michael Tokarev <mjt@tls.msk.ru <mailto:mjt@tls.msk.ru>> ha scritto:

>     It sigsegvs in probe_access_internal():
> 
>         CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one returns NULL,
> 
>     and next there's a call
> 
>         tlb_addr = tlb_read_ofs(entry, elt_ofs);
> 
>     which fails.
> 
> 
> I will take a look tomorrow.

The changes on top of 7.2.10 are available at https://gitlab.com/mjt0k/qemu/-/commits/7.2-i386-mmu-idx/ -
I'm still blaming myself for bad back-port, but I can't find where I failed.

Thanks,

/mjt

Paolo Bonzini April 9, 2024, 11:02 a.m. UTC | #8

On Tue, Apr 9, 2024 at 12:59 PM Zhao Liu <zhao1.liu@intel.com> wrote:
>
> Hi Michael & Paolo,
>
> On Fri, Apr 05, 2024 at 08:30:43PM +0300, Michael Tokarev wrote:
> > Date: Fri, 5 Apr 2024 20:30:43 +0300
> > From: Michael Tokarev <mjt@tls.msk.ru>
> > Subject: Re: [PATCH] target/i386: fix direction of "32-bit MMU" test
> >
> > 01.04.2024 09:02, Michael Tokarev:
> >
> > > Anyone can guess why this rather trivial and obviously correct patch causes segfaults
> > > in a few tests in staging-7.2 - when run in tcg mode, namely:
> > >
> > >    pxe-test
> > >    migration-test
> > >    boot-serial-test
> > >    bios-tables-test
> > >    vmgenid-test
> > >    cdrom-test
> > >
> > > When reverting this single commit from staging-7.2, it all works fine again.
> >
> > It sigsegvs in probe_access_internal():
> >
> >   CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one returns NULL,
> >
> > and next there's a call
> >
> >   tlb_addr = tlb_read_ofs(entry, elt_ofs);
> >
> > which fails.
> >
> > #0  0x0000555555c5de8a in tlb_read_ofs (ofs=8, entry=0x0) at 7.2/accel/tcg/cputlb.c:1455
> > #1  probe_access_internal
> >     (env=0x555556a862a0, addr=4294967280, fault_size=fault_size@entry=1,
> > access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=5,
> > nonfault=nonfault@entry=false, phost=0x7fffea4d32a0, pfull=0x7fffea4d3298,
> > retaddr=0)
> >     at 7.2/accel/tcg/cputlb.c:1555
> > #2  0x0000555555c62aba in get_page_addr_code_hostp
> >     (env=<optimized out>, addr=addr@entry=4294967280, hostp=hostp@entry=0x0)
> >     at 7.2/accel/tcg/cputlb.c:1691
> > #3  0x0000555555c52b54 in get_page_addr_code (addr=4294967280, env=<optimized out>)
> >     at 7.2/include/exec/exec-all.h:714
> > #4  tb_htable_lookup
> >     (cpu=cpu@entry=0x555556a85530, pc=pc@entry=4294967280,
> > cs_base=cs_base@entry=4294901760, flags=flags@entry=64,
> > cflags=cflags@entry=4278190080) at 7.2/accel/tcg/cpu-exec.c:236
> > #5  0x0000555555c53e8e in tb_lookup
> >     (cflags=4278190080, flags=64, cs_base=4294901760, pc=4294967280, cpu=0x555556a85530)
> >     at 7.2/accel/tcg/cpu-exec.c:270
> > #6  cpu_exec (cpu=cpu@entry=0x555556a85530) at 7.2/accel/tcg/cpu-exec.c:1001
> > #7  0x0000555555c75d2f in tcg_cpus_exec (cpu=cpu@entry=0x555556a85530)
> >     at 7.2/accel/tcg/tcg-accel-ops.c:69
> > #8  0x0000555555c75e80 in mttcg_cpu_thread_fn (arg=arg@entry=0x555556a85530)
> >     at 7.2/accel/tcg/tcg-accel-ops-mttcg.c:95
> > #9  0x0000555555ded098 in qemu_thread_start (args=0x555556adac40)
> >     at 7.2/util/qemu-thread-posix.c:505
> > #10 0x00007ffff5793134 in start_thread (arg=<optimized out>)
> > #11 0x00007ffff58137dc in clone3 ()
> >
>
> I debugged it manually, and found the problem occurs in tlb_index() with
> mmu_idx=5.
>
> For v7.2, the maximum mmu index supported by i386 is 4 (since
> NB_MMU_MODES = 5 defined in target/i386/cpu-param.h).
>
> On Michael's 7.2-i386-mmu-idx tree, the commit 9fc3a7828d25 ("target/i386:
> use separate MMU indexes for 32-bit accesses") introduced more indexes
> without relaxing the NB_MMU_MODES for i386.
>
> Before this fix, probe_access_internal() just got the wrong mmu_idx as 4,
> and it's not out of bounds. After this fix, the right mmu_idx=5 is truly
> out of bounds.
>
> On the master branch, there's no such issue since the commits ffd824f3f32d
> ("include/exec: Set default NB_MMU_MODES to 16") and 6787318a5d86
> ("target/i386: Remove NB_MMU_MODES define") relaxed upper limit of MMU
> index for i386.

Thanks Zhao! Alternatively, it's enough to set NB_MMU_MODES to 8 in
commit 9fc3a7828d25.

Paolo

Zhao Liu April 9, 2024, 11:13 a.m. UTC | #9

Hi Michael & Paolo,

On Fri, Apr 05, 2024 at 08:30:43PM +0300, Michael Tokarev wrote:
> Date: Fri, 5 Apr 2024 20:30:43 +0300
> From: Michael Tokarev <mjt@tls.msk.ru>
> Subject: Re: [PATCH] target/i386: fix direction of "32-bit MMU" test
> 
> 01.04.2024 09:02, Michael Tokarev:
> 
> > Anyone can guess why this rather trivial and obviously correct patch causes segfaults
> > in a few tests in staging-7.2 - when run in tcg mode, namely:
> > 
> >    pxe-test
> >    migration-test
> >    boot-serial-test
> >    bios-tables-test
> >    vmgenid-test
> >    cdrom-test
> > 
> > When reverting this single commit from staging-7.2, it all works fine again.
> 
> It sigsegvs in probe_access_internal():
> 
>   CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one returns NULL,
> 
> and next there's a call
> 
>   tlb_addr = tlb_read_ofs(entry, elt_ofs);
> 
> which fails.
> 
> #0  0x0000555555c5de8a in tlb_read_ofs (ofs=8, entry=0x0) at 7.2/accel/tcg/cputlb.c:1455
> #1  probe_access_internal
>     (env=0x555556a862a0, addr=4294967280, fault_size=fault_size@entry=1,
> access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=5,
> nonfault=nonfault@entry=false, phost=0x7fffea4d32a0, pfull=0x7fffea4d3298,
> retaddr=0)
>     at 7.2/accel/tcg/cputlb.c:1555
> #2  0x0000555555c62aba in get_page_addr_code_hostp
>     (env=<optimized out>, addr=addr@entry=4294967280, hostp=hostp@entry=0x0)
>     at 7.2/accel/tcg/cputlb.c:1691
> #3  0x0000555555c52b54 in get_page_addr_code (addr=4294967280, env=<optimized out>)
>     at 7.2/include/exec/exec-all.h:714
> #4  tb_htable_lookup
>     (cpu=cpu@entry=0x555556a85530, pc=pc@entry=4294967280,
> cs_base=cs_base@entry=4294901760, flags=flags@entry=64,
> cflags=cflags@entry=4278190080) at 7.2/accel/tcg/cpu-exec.c:236
> #5  0x0000555555c53e8e in tb_lookup
>     (cflags=4278190080, flags=64, cs_base=4294901760, pc=4294967280, cpu=0x555556a85530)
>     at 7.2/accel/tcg/cpu-exec.c:270
> #6  cpu_exec (cpu=cpu@entry=0x555556a85530) at 7.2/accel/tcg/cpu-exec.c:1001
> #7  0x0000555555c75d2f in tcg_cpus_exec (cpu=cpu@entry=0x555556a85530)
>     at 7.2/accel/tcg/tcg-accel-ops.c:69
> #8  0x0000555555c75e80 in mttcg_cpu_thread_fn (arg=arg@entry=0x555556a85530)
>     at 7.2/accel/tcg/tcg-accel-ops-mttcg.c:95
> #9  0x0000555555ded098 in qemu_thread_start (args=0x555556adac40)
>     at 7.2/util/qemu-thread-posix.c:505
> #10 0x00007ffff5793134 in start_thread (arg=<optimized out>)
> #11 0x00007ffff58137dc in clone3 ()
> 

I debugged it manually, and found the problem occurs in tlb_index() with
mmu_idx=5.

For v7.2, the maximum mmu index supported by i386 is 4 (since
NB_MMU_MODES = 5 defined in target/i386/cpu-param.h).

On Michael's 7.2-i386-mmu-idx tree, the commit 9fc3a7828d25 ("target/i386:
use separate MMU indexes for 32-bit accesses") introduced more indexes
without relaxing the NB_MMU_MODES for i386.

Before this fix, probe_access_internal() just got the wrong mmu_idx as 4,
and it's not out of bounds. After this fix, the right mmu_idx=5 is truly
out of bounds.

On the master branch, there's no such issue since the commits ffd824f3f32d
("include/exec: Set default NB_MMU_MODES to 16") and 6787318a5d86
("target/i386: Remove NB_MMU_MODES define") relaxed upper limit of MMU
index for i386.

So these 2 commits should also be picked (with these 2 commits, tests in
"make check" passed).

However, the cleanup for NB_MMU_MODES is a big series (total 23 patches,
from the commit ffd824f3f32d ("include/exec: Set default NB_MMU_MODES to
16") to 00da6b49a227 ("include/exec: Remove guards around NB_MMU_MODES")),
maybe changes in other arches should be picked together?

In addition, I think maybe we could add assert() in tlb_index() and
tlb_entry() to ensure the mmu_idx not exceed the limit of NB_MMU_MODES.
But I'm a little unsure if this will hurt performance, because these 2
helpers look like the hotspot functions.

-Zhao

Michael Tokarev April 9, 2024, 6:21 p.m. UTC | #10

09.04.2024 14:02, Paolo Bonzini wrote:
> On Tue, Apr 9, 2024 at 12:59 PM Zhao Liu <zhao1.liu@intel.com> wrote:
>>
>> Hi Michael & Paolo,

>> I debugged it manually, and found the problem occurs in tlb_index() with
>> mmu_idx=5.
>>
>> For v7.2, the maximum mmu index supported by i386 is 4 (since
>> NB_MMU_MODES = 5 defined in target/i386/cpu-param.h).

>> On Michael's 7.2-i386-mmu-idx tree, the commit 9fc3a7828d25 ("target/i386:
>> use separate MMU indexes for 32-bit accesses") introduced more indexes
>> without relaxing the NB_MMU_MODES for i386.

Oh well.  I starred a few times at the new MMU_foo_IDXes in v8.2.0-1876-g90f641531c78
"target/i386: use separate MMU indexes for 32-bit accesses" a few times, it looked
somehow wrong, but it didn't occur to me what exactly was wrong there :)

>> Before this fix, probe_access_internal() just got the wrong mmu_idx as 4,
>> and it's not out of bounds. After this fix, the right mmu_idx=5 is truly
>> out of bounds.
>>
>> On the master branch, there's no such issue since the commits ffd824f3f32d
>> ("include/exec: Set default NB_MMU_MODES to 16") and 6787318a5d86
>> ("target/i386: Remove NB_MMU_MODES define") relaxed upper limit of MMU
>> index for i386.
> 
> Thanks Zhao! Alternatively, it's enough to set NB_MMU_MODES to 8 in
> commit 9fc3a7828d25.

Thank you, both of you, this is exactly the prob.  I fixed it by increasing
NB_MMU_NODES in "use separate MMU indexes for 32-bit accesses", since this
is the change which introduces the new indexes.

Now 7.2 work again, and appears to be under control, which is an excellent
news! :)

/mjt

target/i386: fix direction of "32-bit MMU" test

Commit Message

Comments

Patch