
target/loongarch: Support 4K page size

Message ID 20231023024059.3858349-1-gaosong@loongson.cn (mailing list archive)
State New, archived
Series target/loongarch: Support 4K page size

Commit Message

gaosong Oct. 23, 2023, 2:40 a.m. UTC
The LoongArch kernel supports 4K page size.
Change TARGET_PAGE_BITS to 12.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu-param.h  | 2 +-
 target/loongarch/tlb_helper.c | 9 ++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)

Comments

Bibo Mao Oct. 23, 2023, 4:06 a.m. UTC | #1
On 2023/10/23 10:40 AM, Song Gao wrote:
> The LoongArch kernel supports 4K page size.
> Change TARGET_PAGE_BITS to 12.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/cpu-param.h  | 2 +-
>   target/loongarch/tlb_helper.c | 9 ++++-----
>   2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
> index 1265dc7cb5..cfe195db4e 100644
> --- a/target/loongarch/cpu-param.h
> +++ b/target/loongarch/cpu-param.h
> @@ -12,6 +12,6 @@
>   #define TARGET_PHYS_ADDR_SPACE_BITS 48
>   #define TARGET_VIRT_ADDR_SPACE_BITS 48
>   
> -#define TARGET_PAGE_BITS 14
> +#define TARGET_PAGE_BITS 12
Hi Gaosong,

Popular operating systems on LoongArch still use a 16K page size; QEMU should
follow the OS convention rather than defining a 4K page size on its own.

Regards
Bibo Mao

>   
>   #endif
> diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
> index c8b8b0497f..449043c68b 100644
> --- a/target/loongarch/tlb_helper.c
> +++ b/target/loongarch/tlb_helper.c
> @@ -60,6 +60,9 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
>           tlb_rplv = 0;
>       }
>   
> +    /* Remove sw bit between bit12 -- bit PS*/
> +    tlb_ppn = tlb_ppn & ~(((0x1UL << (tlb_ps - 12)) -1));
> +
>       /* Check access rights */
>       if (!tlb_v) {
>           return TLBRET_INVALID;
> @@ -82,10 +85,6 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
>           return TLBRET_DIRTY;
>       }
>   
> -    /*
> -     * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15]
> -     * need adjust.
> -     */
>       *physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
>                   (address & MAKE_64BIT_MASK(0, tlb_ps));
>       *prot = PAGE_READ;
> @@ -774,7 +773,7 @@ void helper_ldpte(CPULoongArchState *env, target_ulong base, target_ulong odd,
>           /* Move Global bit */
>           tmp0 = ((tmp0 & (1 << LOONGARCH_HGLOBAL_SHIFT))  >>
>                   LOONGARCH_HGLOBAL_SHIFT) << R_TLBENTRY_G_SHIFT |
> -                (tmp0 & (~(1 << R_TLBENTRY_G_SHIFT)));
> +                (tmp0 & (~(1 << LOONGARCH_HGLOBAL_SHIFT)));
>           ps = ptbase + ptwidth - 1;
>           if (odd) {
>               tmp0 += MAKE_64BIT_MASK(ps, 1);
>
Peter Maydell Oct. 23, 2023, 10:22 a.m. UTC | #2
On Mon, 23 Oct 2023 at 05:06, maobibo <maobibo@loongson.cn> wrote:
>
>
>
> On 2023/10/23 10:40 AM, Song Gao wrote:
> > The LoongArch kernel supports 4K page size.
> > Change TARGET_PAGE_BITS to 12.
> >
> > Signed-off-by: Song Gao <gaosong@loongson.cn>
> > ---
> >   target/loongarch/cpu-param.h  | 2 +-
> >   target/loongarch/tlb_helper.c | 9 ++++-----
> >   2 files changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
> > index 1265dc7cb5..cfe195db4e 100644
> > --- a/target/loongarch/cpu-param.h
> > +++ b/target/loongarch/cpu-param.h
> > @@ -12,6 +12,6 @@
> >   #define TARGET_PHYS_ADDR_SPACE_BITS 48
> >   #define TARGET_VIRT_ADDR_SPACE_BITS 48
> >
> > -#define TARGET_PAGE_BITS 14
> > +#define TARGET_PAGE_BITS 12
> Hi Gaosong,
>
> Popular operating systems on LoongArch still use a 16K page size; QEMU should
> follow the OS convention rather than defining a 4K page size on its own.

The TARGET_PAGE_BITS value in QEMU is a property of the hardware,
not the guest OS. It should specify the smallest page size the
guest can configure the CPU to use. If the guest asks for a
larger page size than the minimum then that works fine. See
for example PPC64 -- on this architecture both 4K and 64K
pages are possible, so we define TARGET_PAGE_BITS to 12,
even though a lot of Linux guests use 64K pages.

It is slightly less efficient when the guest uses a page size
larger than the TARGET_PAGE_BITS value indicates, so if you
have an architecture where some CPUs support small pages
but most do not, you can do what Arm does, and use the
TARGET_PAGE_BITS_VARY support. This makes the TARGET_PAGE_BITS
macro be a runtime-configurable value, where a machine model can
set the mc->minimum_page_bits value to indicate that that
machine doesn't need the small-pages handling.

thanks
-- PMM
Bibo Mao Oct. 23, 2023, 12:56 p.m. UTC | #3
On 2023/10/23 6:22 PM, Peter Maydell wrote:
> On Mon, 23 Oct 2023 at 05:06, maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2023/10/23 10:40 AM, Song Gao wrote:
>>> The LoongArch kernel supports 4K page size.
>>> Change TARGET_PAGE_BITS to 12.
>>>
>>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>>> ---
>>>    target/loongarch/cpu-param.h  | 2 +-
>>>    target/loongarch/tlb_helper.c | 9 ++++-----
>>>    2 files changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
>>> index 1265dc7cb5..cfe195db4e 100644
>>> --- a/target/loongarch/cpu-param.h
>>> +++ b/target/loongarch/cpu-param.h
>>> @@ -12,6 +12,6 @@
>>>    #define TARGET_PHYS_ADDR_SPACE_BITS 48
>>>    #define TARGET_VIRT_ADDR_SPACE_BITS 48
>>>
>>> -#define TARGET_PAGE_BITS 14
>>> +#define TARGET_PAGE_BITS 12
>> Hi Gaosong,
>>
>> Popular operating systems on LoongArch still use a 16K page size; QEMU should
>> follow the OS convention rather than defining a 4K page size on its own.
> 
> The TARGET_PAGE_BITS value in QEMU is a property of the hardware,
> not the guest OS. It should specify the smallest page size the
> guest can configure the CPU to use. If the guest asks for a
> larger page size than the minimum then that works fine. See
> for example PPC64 -- on this architecture both 4K and 64K
> pages are possible, so we define TARGET_PAGE_BITS to 12,
> even though a lot of Linux guests use 64K pages.
> 
> It is slightly less efficient when the guest uses a page size
> larger than the TARGET_PAGE_BITS value indicates, so if you
> have an architecture where some CPUs support small pages
> but most do not, you can do what Arm does, and use the
> TARGET_PAGE_BITS_VARY support. This makes the TARGET_PAGE_BITS
> macro be a runtime-configurable value, where a machine model can
> set the mc->minimum_page_bits value to indicate that that
> machine doesn't need the small-pages handling.
Peter,

Thanks for your guidance; the TARGET_PAGE_BITS setting has puzzled
us for a long time. I ran a simple test with kernels using 4K and 16K
page sizes, and both boot well with TARGET_PAGE_BITS set to 12.
We will do more testing and switch TARGET_PAGE_BITS to 12 if all
the tests pass.

Regards
Bibo Mao

> 
> thanks
> -- PMM
>
Philippe Mathieu-Daudé Oct. 23, 2023, 2:43 p.m. UTC | #4
On 23/10/23 12:22, Peter Maydell wrote:
> On Mon, 23 Oct 2023 at 05:06, maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2023/10/23 10:40 AM, Song Gao wrote:
>>> The LoongArch kernel supports 4K page size.
>>> Change TARGET_PAGE_BITS to 12.
>>>
>>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>>> ---
>>>    target/loongarch/cpu-param.h  | 2 +-
>>>    target/loongarch/tlb_helper.c | 9 ++++-----
>>>    2 files changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
>>> index 1265dc7cb5..cfe195db4e 100644
>>> --- a/target/loongarch/cpu-param.h
>>> +++ b/target/loongarch/cpu-param.h
>>> @@ -12,6 +12,6 @@
>>>    #define TARGET_PHYS_ADDR_SPACE_BITS 48
>>>    #define TARGET_VIRT_ADDR_SPACE_BITS 48
>>>
>>> -#define TARGET_PAGE_BITS 14
>>> +#define TARGET_PAGE_BITS 12
>> Hi Gaosong,
>>
>> Popular operating systems on LoongArch still use a 16K page size; QEMU should
>> follow the OS convention rather than defining a 4K page size on its own.
> 
> The TARGET_PAGE_BITS value in QEMU is a property of the hardware,
> not the guest OS. It should specify the smallest page size the
> guest can configure the CPU to use. If the guest asks for a
> larger page size than the minimum then that works fine. See
> for example PPC64 -- on this architecture both 4K and 64K
> pages are possible, so we define TARGET_PAGE_BITS to 12,
> even though a lot of Linux guests use 64K pages.
> 
> It is slightly less efficient when the guest uses a page size
> larger than the TARGET_PAGE_BITS value indicates, so if you
> have an architecture where some CPUs support small pages
> but most do not, you can do what Arm does, and use the
> TARGET_PAGE_BITS_VARY support. This makes the TARGET_PAGE_BITS
> macro be a runtime-configurable value, where a machine model can
> set the mc->minimum_page_bits value to indicate that that
> machine doesn't need the small-pages handling.

With heterogeneous architectures emulation, eventually all
targets will use TARGET_PAGE_BITS_VARY.
Michael Tokarev Oct. 7, 2024, 2:48 p.m. UTC | #5
23.10.2023 05:40, Song Gao wrote:
> The LoongArch kernel supports 4K page size.
> Change TARGET_PAGE_BITS to 12.

This change appears to have 2 issues.

First, the subject is misleading: it does not merely introduce support for a 4K page
size, it actually *switches* to a 4K page size.  But this is a minor point.

More interesting is that it has a quite noticeable effect on performance.  For
example, https://gitlab.com/qemu-project/qemu/-/issues/2491 - I confirm that 7z
decompression performance drops from ~110Mb/s before this change to ~73Mb/s
after it.

Is such a performance drop expected?

Thanks,

/mjt
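For scale, the reported 7z figures (~110Mb/s before, ~73Mb/s after) correspond to roughly a one-third loss of throughput:

```latex
\frac{110 - 73}{110} = \frac{37}{110} \approx 0.336,
\qquad
\frac{110}{73} \approx 1.51
```

That is, about 34% lower throughput, or about 1.5 times as long to process the same amount of data.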
Richard Henderson Oct. 7, 2024, 3:09 p.m. UTC | #6
On 10/7/24 07:48, Michael Tokarev wrote:
> 23.10.2023 05:40, Song Gao wrote:
>> The LoongArch kernel supports 4K page size.
>> Change TARGET_PAGE_BITS to 12.
> 
> This change appears to have 2 issues.
> 
> First, the subject is misleading: it does not merely introduce support for a 4K page
> size, it actually *switches* to a 4K page size.  But this is a minor point.
> 
> More interesting is that it has a quite noticeable effect on performance.  For
> example, https://gitlab.com/qemu-project/qemu/-/issues/2491 - I confirm that 7z
> decompression performance drops from ~110Mb/s before this change to ~73Mb/s
> after it.
> after it.
> 
> Is such a performance drop expected?

The #2491 issue appears to be for user-mode emulation.  Because the reported host is x86, 
I would expect guest page size == host page size to improve performance, not degrade it.

If this were system mode emulation, quite possibly.  If the guest loongarch kernel is 
still using 16k pages, then all pages that are given to softmmu are "large pages", which 
perform poorly.  I hope to address this at some point.

If this is really about user-mode, then perf may be your friend in determining where the 
extra overhead is coming from.


r~
Michael Tokarev Oct. 7, 2024, 3:45 p.m. UTC | #7
07.10.2024 18:09, Richard Henderson wrote:

> The #2491 issue appears to be for user-mode emulation.  Because the reported host is x86, I would expect guest page size == host page size to improve 
> performance, not degrade it.

Yes, it is about linux-user.

> If this is really about user-mode, then perf may be your friend in determining where the extra overhead is coming from.

I updated the issue with some perf output.  It looks like
the 4K page size case simply calls tb_lookup() and extract64()
significantly more often than the 16K case.

Thanks,

/mjt

Patch

diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
index 1265dc7cb5..cfe195db4e 100644
--- a/target/loongarch/cpu-param.h
+++ b/target/loongarch/cpu-param.h
@@ -12,6 +12,6 @@ 
 #define TARGET_PHYS_ADDR_SPACE_BITS 48
 #define TARGET_VIRT_ADDR_SPACE_BITS 48
 
-#define TARGET_PAGE_BITS 14
+#define TARGET_PAGE_BITS 12
 
 #endif
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index c8b8b0497f..449043c68b 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -60,6 +60,9 @@  static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
         tlb_rplv = 0;
     }
 
+    /* Remove sw bit between bit12 -- bit PS*/
+    tlb_ppn = tlb_ppn & ~(((0x1UL << (tlb_ps - 12)) -1));
+
     /* Check access rights */
     if (!tlb_v) {
         return TLBRET_INVALID;
@@ -82,10 +85,6 @@  static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
         return TLBRET_DIRTY;
     }
 
-    /*
-     * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15]
-     * need adjust.
-     */
     *physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
                 (address & MAKE_64BIT_MASK(0, tlb_ps));
     *prot = PAGE_READ;
@@ -774,7 +773,7 @@  void helper_ldpte(CPULoongArchState *env, target_ulong base, target_ulong odd,
         /* Move Global bit */
         tmp0 = ((tmp0 & (1 << LOONGARCH_HGLOBAL_SHIFT))  >>
                 LOONGARCH_HGLOBAL_SHIFT) << R_TLBENTRY_G_SHIFT |
-                (tmp0 & (~(1 << R_TLBENTRY_G_SHIFT)));
+                (tmp0 & (~(1 << LOONGARCH_HGLOBAL_SHIFT)));
         ps = ptbase + ptwidth - 1;
         if (odd) {
             tmp0 += MAKE_64BIT_MASK(ps, 1);