mbox series

[v8,0/4] riscv: Use PUD/P4D/PGD pages for the linear mapping

Message ID 20230316131711.1284451-1-alexghiti@rivosinc.com (mailing list archive)
Headers show
Series riscv: Use PUD/P4D/PGD pages for the linear mapping | expand

Message

Alexandre Ghiti March 16, 2023, 1:17 p.m. UTC
This patchset intends to improve tlb utilization by using hugepages for
the linear mapping.

As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
take care of isolating the kernel text and rodata so that they are not
mapped with a PUD mapping which would then assign wrong permissions to
the whole region: it is achieved by introducing a new memblock API.

Another patch makes use of this new API in arm64 which used some sort of
hack to solve this issue: it was built/boot tested successfully.

base-commit-tag: v6.3-rc1

v8:
- Fix rv32, as reported by Anup
- Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
- Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
- Fix arm64 double mapping (which to me did not work in v7), but ends up not
  being pretty at all, will wait for comments from arm64 reviewers, but
  this patch can easily be dropped if they do not want it.

v7:
- Fix Anup bug report by introducing memblock_isolate_memory which
  allows us to split the memblock mappings and then avoid to map the
  the PUD which contains the kernel as read only
- Add a patch to arm64 to use this newly introduced API

v6:
- quiet LLVM warning by casting phys_ram_base into an unsigned long

v5:
- Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
  Conor
- Add RB from Andrew

v4:
- Rebase on top of v6.2-rc3, as noted by Conor
- Add Acked-by Rob

v3:
- Change the comment about initrd_start VA conversion so that it fits
  ARM64 and RISCV64 (and others in the future if needed), as suggested
  by Rob

v2:
- Add a comment on why RISCV64 does not need to set initrd_start/end that
  early in the boot process, as asked by Rob

Alexandre Ghiti (4):
  riscv: Get rid of riscv_pfn_base variable
  mm: Introduce memblock_isolate_memory
  arm64: Make use of memblock_isolate_memory for the linear mapping
  riscv: Use PUD/P4D/PGD pages for the linear mapping

 arch/arm64/mm/mmu.c           | 25 +++++++++++------
 arch/riscv/include/asm/page.h | 19 +++++++++++--
 arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
 arch/riscv/mm/physaddr.c      | 16 +++++++++++
 drivers/of/fdt.c              | 11 ++++----
 include/linux/memblock.h      |  1 +
 mm/memblock.c                 | 20 +++++++++++++
 7 files changed, 119 insertions(+), 26 deletions(-)

Comments

Anup Patel March 23, 2023, 12:18 p.m. UTC | #1
Hi Alex,

On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> This patchset intends to improve tlb utilization by using hugepages for
> the linear mapping.
>
> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> take care of isolating the kernel text and rodata so that they are not
> mapped with a PUD mapping which would then assign wrong permissions to
> the whole region: it is achieved by introducing a new memblock API.
>
> Another patch makes use of this new API in arm64 which used some sort of
> hack to solve this issue: it was built/boot tested successfully.
>
> base-commit-tag: v6.3-rc1
>
> v8:
> - Fix rv32, as reported by Anup
> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
>   being pretty at all, will wait for comments from arm64 reviewers, but
>   this patch can easily be dropped if they do not want it.
>
> v7:
> - Fix Anup bug report by introducing memblock_isolate_memory which
>   allows us to split the memblock mappings and then avoid to map the
>   the PUD which contains the kernel as read only
> - Add a patch to arm64 to use this newly introduced API
>
> v6:
> - quiet LLVM warning by casting phys_ram_base into an unsigned long
>
> v5:
> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
>   Conor
> - Add RB from Andrew
>
> v4:
> - Rebase on top of v6.2-rc3, as noted by Conor
> - Add Acked-by Rob
>
> v3:
> - Change the comment about initrd_start VA conversion so that it fits
>   ARM64 and RISCV64 (and others in the future if needed), as suggested
>   by Rob
>
> v2:
> - Add a comment on why RISCV64 does not need to set initrd_start/end that
>   early in the boot process, as asked by Rob
>
> Alexandre Ghiti (4):
>   riscv: Get rid of riscv_pfn_base variable
>   mm: Introduce memblock_isolate_memory
>   arm64: Make use of memblock_isolate_memory for the linear mapping
>   riscv: Use PUD/P4D/PGD pages for the linear mapping

Kernel boot fine on RV64 but there is a failure which is still not
addressed. You can see this failure as following message in
kernel boot log:
    0.000000] Failed to add a System RAM resource at 80200000

Regards,
Anup

>
>  arch/arm64/mm/mmu.c           | 25 +++++++++++------
>  arch/riscv/include/asm/page.h | 19 +++++++++++--
>  arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
>  arch/riscv/mm/physaddr.c      | 16 +++++++++++
>  drivers/of/fdt.c              | 11 ++++----
>  include/linux/memblock.h      |  1 +
>  mm/memblock.c                 | 20 +++++++++++++
>  7 files changed, 119 insertions(+), 26 deletions(-)
>
> --
> 2.37.2
>
Alexandre Ghiti March 23, 2023, 12:54 p.m. UTC | #2
Hi Anup,

On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
>
> Hi Alex,
>
> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > This patchset intends to improve tlb utilization by using hugepages for
> > the linear mapping.
> >
> > As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > take care of isolating the kernel text and rodata so that they are not
> > mapped with a PUD mapping which would then assign wrong permissions to
> > the whole region: it is achieved by introducing a new memblock API.
> >
> > Another patch makes use of this new API in arm64 which used some sort of
> > hack to solve this issue: it was built/boot tested successfully.
> >
> > base-commit-tag: v6.3-rc1
> >
> > v8:
> > - Fix rv32, as reported by Anup
> > - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> >   being pretty at all, will wait for comments from arm64 reviewers, but
> >   this patch can easily be dropped if they do not want it.
> >
> > v7:
> > - Fix Anup bug report by introducing memblock_isolate_memory which
> >   allows us to split the memblock mappings and then avoid to map the
> >   the PUD which contains the kernel as read only
> > - Add a patch to arm64 to use this newly introduced API
> >
> > v6:
> > - quiet LLVM warning by casting phys_ram_base into an unsigned long
> >
> > v5:
> > - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> >   Conor
> > - Add RB from Andrew
> >
> > v4:
> > - Rebase on top of v6.2-rc3, as noted by Conor
> > - Add Acked-by Rob
> >
> > v3:
> > - Change the comment about initrd_start VA conversion so that it fits
> >   ARM64 and RISCV64 (and others in the future if needed), as suggested
> >   by Rob
> >
> > v2:
> > - Add a comment on why RISCV64 does not need to set initrd_start/end that
> >   early in the boot process, as asked by Rob
> >
> > Alexandre Ghiti (4):
> >   riscv: Get rid of riscv_pfn_base variable
> >   mm: Introduce memblock_isolate_memory
> >   arm64: Make use of memblock_isolate_memory for the linear mapping
> >   riscv: Use PUD/P4D/PGD pages for the linear mapping
>
> Kernel boot fine on RV64 but there is a failure which is still not
> addressed. You can see this failure as following message in
> kernel boot log:
>     0.000000] Failed to add a System RAM resource at 80200000

Hmmm I don't get that in any of my test configs, would you mind
sharing yours and your qemu command line?

Thanks

>
> Regards,
> Anup
>
> >
> >  arch/arm64/mm/mmu.c           | 25 +++++++++++------
> >  arch/riscv/include/asm/page.h | 19 +++++++++++--
> >  arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> >  arch/riscv/mm/physaddr.c      | 16 +++++++++++
> >  drivers/of/fdt.c              | 11 ++++----
> >  include/linux/memblock.h      |  1 +
> >  mm/memblock.c                 | 20 +++++++++++++
> >  7 files changed, 119 insertions(+), 26 deletions(-)
> >
> > --
> > 2.37.2
> >
Anup Patel March 23, 2023, 2:55 p.m. UTC | #3
On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Hi Anup,
>
> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> >
> > Hi Alex,
> >
> > On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >
> > > This patchset intends to improve tlb utilization by using hugepages for
> > > the linear mapping.
> > >
> > > As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > > take care of isolating the kernel text and rodata so that they are not
> > > mapped with a PUD mapping which would then assign wrong permissions to
> > > the whole region: it is achieved by introducing a new memblock API.
> > >
> > > Another patch makes use of this new API in arm64 which used some sort of
> > > hack to solve this issue: it was built/boot tested successfully.
> > >
> > > base-commit-tag: v6.3-rc1
> > >
> > > v8:
> > > - Fix rv32, as reported by Anup
> > > - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > > - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > > - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > >   being pretty at all, will wait for comments from arm64 reviewers, but
> > >   this patch can easily be dropped if they do not want it.
> > >
> > > v7:
> > > - Fix Anup bug report by introducing memblock_isolate_memory which
> > >   allows us to split the memblock mappings and then avoid to map the
> > >   the PUD which contains the kernel as read only
> > > - Add a patch to arm64 to use this newly introduced API
> > >
> > > v6:
> > > - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > >
> > > v5:
> > > - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > >   Conor
> > > - Add RB from Andrew
> > >
> > > v4:
> > > - Rebase on top of v6.2-rc3, as noted by Conor
> > > - Add Acked-by Rob
> > >
> > > v3:
> > > - Change the comment about initrd_start VA conversion so that it fits
> > >   ARM64 and RISCV64 (and others in the future if needed), as suggested
> > >   by Rob
> > >
> > > v2:
> > > - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > >   early in the boot process, as asked by Rob
> > >
> > > Alexandre Ghiti (4):
> > >   riscv: Get rid of riscv_pfn_base variable
> > >   mm: Introduce memblock_isolate_memory
> > >   arm64: Make use of memblock_isolate_memory for the linear mapping
> > >   riscv: Use PUD/P4D/PGD pages for the linear mapping
> >
> > Kernel boot fine on RV64 but there is a failure which is still not
> > addressed. You can see this failure as following message in
> > kernel boot log:
> >     0.000000] Failed to add a System RAM resource at 80200000
>
> Hmmm I don't get that in any of my test configs, would you mind
> sharing yours and your qemu command line?

Try alexghiti_test branch at
https://github.com/avpatel/linux.git

I am building the kernel using defconfig and my rootfs is
based on busybox.

My QEMU command is:
qemu-system-riscv64 -M virt -m 512M -nographic -bios
opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4

Regards,
Anup

>
> Thanks
>
> >
> > Regards,
> > Anup
> >
> > >
> > >  arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > >  arch/riscv/include/asm/page.h | 19 +++++++++++--
> > >  arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > >  arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > >  drivers/of/fdt.c              | 11 ++++----
> > >  include/linux/memblock.h      |  1 +
> > >  mm/memblock.c                 | 20 +++++++++++++
> > >  7 files changed, 119 insertions(+), 26 deletions(-)
> > >
> > > --
> > > 2.37.2
> > >
Alexandre Ghiti March 24, 2023, 9:59 a.m. UTC | #4
On 3/23/23 15:55, Anup Patel wrote:
> On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>> Hi Anup,
>>
>> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
>>> Hi Alex,
>>>
>>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>>>> This patchset intends to improve tlb utilization by using hugepages for
>>>> the linear mapping.
>>>>
>>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
>>>> take care of isolating the kernel text and rodata so that they are not
>>>> mapped with a PUD mapping which would then assign wrong permissions to
>>>> the whole region: it is achieved by introducing a new memblock API.
>>>>
>>>> Another patch makes use of this new API in arm64 which used some sort of
>>>> hack to solve this issue: it was built/boot tested successfully.
>>>>
>>>> base-commit-tag: v6.3-rc1
>>>>
>>>> v8:
>>>> - Fix rv32, as reported by Anup
>>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
>>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
>>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
>>>>    being pretty at all, will wait for comments from arm64 reviewers, but
>>>>    this patch can easily be dropped if they do not want it.
>>>>
>>>> v7:
>>>> - Fix Anup bug report by introducing memblock_isolate_memory which
>>>>    allows us to split the memblock mappings and then avoid to map the
>>>>    the PUD which contains the kernel as read only
>>>> - Add a patch to arm64 to use this newly introduced API
>>>>
>>>> v6:
>>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
>>>>
>>>> v5:
>>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
>>>>    Conor
>>>> - Add RB from Andrew
>>>>
>>>> v4:
>>>> - Rebase on top of v6.2-rc3, as noted by Conor
>>>> - Add Acked-by Rob
>>>>
>>>> v3:
>>>> - Change the comment about initrd_start VA conversion so that it fits
>>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
>>>>    by Rob
>>>>
>>>> v2:
>>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
>>>>    early in the boot process, as asked by Rob
>>>>
>>>> Alexandre Ghiti (4):
>>>>    riscv: Get rid of riscv_pfn_base variable
>>>>    mm: Introduce memblock_isolate_memory
>>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
>>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
>>> Kernel boot fine on RV64 but there is a failure which is still not
>>> addressed. You can see this failure as following message in
>>> kernel boot log:
>>>      0.000000] Failed to add a System RAM resource at 80200000
>> Hmmm I don't get that in any of my test configs, would you mind
>> sharing yours and your qemu command line?
> Try alexghiti_test branch at
> https://github.com/avpatel/linux.git
>
> I am building the kernel using defconfig and my rootfs is
> based on busybox.
>
> My QEMU command is:
> qemu-system-riscv64 -M virt -m 512M -nographic -bios
> opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4


So splitting memblock.memory is the culprit, it "confuses" the resources 
addition and I can only find hacky ways to fix that...

So given that the arm64 patch with the new API is not pretty and that 
the simplest solution is to re-merge the memblock regions afterwards 
(which is done by memblock_clear_nomap), I'll drop the new API and the 
arm64 patch to use the nomap API like arm64: I'll take advantage of that 
to clean setup_vm_final which I have wanted to do for a long time.

@Mike Thanks for you reviews!

@Anup Thanks for all your bug reports on this patchset, I have to 
improve my test flow (it is in the work :)).


> Regards,
> Anup
>
>> Thanks
>>
>>> Regards,
>>> Anup
>>>
>>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
>>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
>>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
>>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
>>>>   drivers/of/fdt.c              | 11 ++++----
>>>>   include/linux/memblock.h      |  1 +
>>>>   mm/memblock.c                 | 20 +++++++++++++
>>>>   7 files changed, 119 insertions(+), 26 deletions(-)
>>>>
>>>> --
>>>> 2.37.2
>>>>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Nylon Chen Jan. 18, 2024, 8:23 a.m. UTC | #5
> On 3/23/23 15:55, Anup Patel wrote:
> > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >> Hi Anup,
> >>
> >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> >>> Hi Alex,
> >>>
> >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >>>> This patchset intends to improve tlb utilization by using hugepages for
> >>>> the linear mapping.
> >>>>
> >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> >>>> take care of isolating the kernel text and rodata so that they are not
> >>>> mapped with a PUD mapping which would then assign wrong permissions to
> >>>> the whole region: it is achieved by introducing a new memblock API.
> >>>>
> >>>> Another patch makes use of this new API in arm64 which used some sort of
> >>>> hack to solve this issue: it was built/boot tested successfully.
> >>>>
> >>>> base-commit-tag: v6.3-rc1
> >>>>
> >>>> v8:
> >>>> - Fix rv32, as reported by Anup
> >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> >>>>    this patch can easily be dropped if they do not want it.
> >>>>
> >>>> v7:
> >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> >>>>    allows us to split the memblock mappings and then avoid to map the
> >>>>    the PUD which contains the kernel as read only
> >>>> - Add a patch to arm64 to use this newly introduced API
> >>>>
> >>>> v6:
> >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> >>>>
> >>>> v5:
> >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> >>>>    Conor
> >>>> - Add RB from Andrew
> >>>>
> >>>> v4:
> >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> >>>> - Add Acked-by Rob
> >>>>
> >>>> v3:
> >>>> - Change the comment about initrd_start VA conversion so that it fits
> >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> >>>>    by Rob
> >>>>
> >>>> v2:
> >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> >>>>    early in the boot process, as asked by Rob
> >>>>
> >>>> Alexandre Ghiti (4):
> >>>>    riscv: Get rid of riscv_pfn_base variable
> >>>>    mm: Introduce memblock_isolate_memory
> >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> >>> Kernel boot fine on RV64 but there is a failure which is still not
> >>> addressed. You can see this failure as following message in
> >>> kernel boot log:
> >>>      0.000000] Failed to add a System RAM resource at 80200000
> >> Hmmm I don't get that in any of my test configs, would you mind
> >> sharing yours and your qemu command line?
> > Try alexghiti_test branch at
> > https://github.com/avpatel/linux.git
> >
> > I am building the kernel using defconfig and my rootfs is
> > based on busybox.
> >
> > My QEMU command is:
> > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> 
> 
> So splitting memblock.memory is the culprit, it "confuses" the resources
> addition and I can only find hacky ways to fix that...
Hi Alexandre,

We encountered the same error as Anup. After adding your patch
(3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
error again.

What I have observed so far is

- before your patch
When merging consecutive memblocks, if the memblock types are different,
they will be merged into reserved
- after your patch
When consecutive memblocks are merged, if the memblock types are
different, they will be merged into memory.

Such a result will cause the memory location of OpenSBI to be changed
from reserved to memory. Will this have any side effects?
> 
> So given that the arm64 patch with the new API is not pretty and that
> the simplest solution is to re-merge the memblock regions afterwards
> (which is done by memblock_clear_nomap), I'll drop the new API and the
> arm64 patch to use the nomap API like arm64: I'll take advantage of that
> to clean setup_vm_final which I have wanted to do for a long time.
> 
> @Mike Thanks for you reviews!
> 
> @Anup Thanks for all your bug reports on this patchset, I have to
> improve my test flow (it is in the work :)).
> 
> 
> > Regards,
> > Anup
> >
> >> Thanks
> >>
> >>> Regards,
> >>> Anup
> >>>
> >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> >>>>   drivers/of/fdt.c              | 11 ++++----
> >>>>   include/linux/memblock.h      |  1 +
> >>>>   mm/memblock.c                 | 20 +++++++++++++
> >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> >>>>
> >>>> --
> >>>> 2.37.2
> >>>>
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Alexandre Ghiti Jan. 18, 2024, 1:01 p.m. UTC | #6
Hi Nylon,

On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@sifive.com> wrote:
>
> > On 3/23/23 15:55, Anup Patel wrote:
> > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >> Hi Anup,
> > >>
> > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > >>> Hi Alex,
> > >>>
> > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >>>> This patchset intends to improve tlb utilization by using hugepages for
> > >>>> the linear mapping.
> > >>>>
> > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > >>>> take care of isolating the kernel text and rodata so that they are not
> > >>>> mapped with a PUD mapping which would then assign wrong permissions to
> > >>>> the whole region: it is achieved by introducing a new memblock API.
> > >>>>
> > >>>> Another patch makes use of this new API in arm64 which used some sort of
> > >>>> hack to solve this issue: it was built/boot tested successfully.
> > >>>>
> > >>>> base-commit-tag: v6.3-rc1
> > >>>>
> > >>>> v8:
> > >>>> - Fix rv32, as reported by Anup
> > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> > >>>>    this patch can easily be dropped if they do not want it.
> > >>>>
> > >>>> v7:
> > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> > >>>>    allows us to split the memblock mappings and then avoid to map the
> > >>>>    the PUD which contains the kernel as read only
> > >>>> - Add a patch to arm64 to use this newly introduced API
> > >>>>
> > >>>> v6:
> > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > >>>>
> > >>>> v5:
> > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > >>>>    Conor
> > >>>> - Add RB from Andrew
> > >>>>
> > >>>> v4:
> > >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> > >>>> - Add Acked-by Rob
> > >>>>
> > >>>> v3:
> > >>>> - Change the comment about initrd_start VA conversion so that it fits
> > >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> > >>>>    by Rob
> > >>>>
> > >>>> v2:
> > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > >>>>    early in the boot process, as asked by Rob
> > >>>>
> > >>>> Alexandre Ghiti (4):
> > >>>>    riscv: Get rid of riscv_pfn_base variable
> > >>>>    mm: Introduce memblock_isolate_memory
> > >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> > >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> > >>> Kernel boot fine on RV64 but there is a failure which is still not
> > >>> addressed. You can see this failure as following message in
> > >>> kernel boot log:
> > >>>      0.000000] Failed to add a System RAM resource at 80200000
> > >> Hmmm I don't get that in any of my test configs, would you mind
> > >> sharing yours and your qemu command line?
> > > Try alexghiti_test branch at
> > > https://github.com/avpatel/linux.git
> > >
> > > I am building the kernel using defconfig and my rootfs is
> > > based on busybox.
> > >
> > > My QEMU command is:
> > > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> >
> >
> > So splitting memblock.memory is the culprit, it "confuses" the resources
> > addition and I can only find hacky ways to fix that...
> Hi Alexandre,
>
> We encountered the same error as Anup. After adding your patch
> (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
> error again.
>
> What I have observed so far is
>
> - before your patch
> When merging consecutive memblocks, if the memblock types are different,
> they will be merged into reserved
> - after your patch
> When consecutive memblocks are merged, if the memblock types are
> different, they will be merged into memory.
>
> Such a result will cause the memory location of OpenSBI to be changed
> from reserved to memory. Will this have any side effects?

I guess it will end up in the memory pool and pages from openSBI
region will be allocated, so we should see very quickly bad stuff
happening (either PMP violation or M-mode ecall never
returning/trapping/etc).

But I don't observe the same thing, I always see the openSBI region
being reserved:

reserved[0x0] [0x0000000080000000-0x000000008007ffff],
0x0000000000080000 bytes flags: 0x0

Can you elaborate a bit more about "When consecutive memblocks are
merged, if the memblock types are different, they will be merged into
memory"? Where/when does this merge happen? Can you give me a config
file and a kernel revision so that I can take a look?

Thanks,

Alex

> >
> > So given that the arm64 patch with the new API is not pretty and that
> > the simplest solution is to re-merge the memblock regions afterwards
> > (which is done by memblock_clear_nomap), I'll drop the new API and the
> > arm64 patch to use the nomap API like arm64: I'll take advantage of that
> > to clean setup_vm_final which I have wanted to do for a long time.
> >
> > @Mike Thanks for you reviews!
> >
> > @Anup Thanks for all your bug reports on this patchset, I have to
> > improve my test flow (it is in the work :)).
> >
> >
> > > Regards,
> > > Anup
> > >
> > >> Thanks
> > >>
> > >>> Regards,
> > >>> Anup
> > >>>
> > >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> > >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > >>>>   drivers/of/fdt.c              | 11 ++++----
> > >>>>   include/linux/memblock.h      |  1 +
> > >>>>   mm/memblock.c                 | 20 +++++++++++++
> > >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> > >>>>
> > >>>> --
> > >>>> 2.37.2
> > >>>>
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
Nylon Chen Jan. 19, 2024, 9:26 a.m. UTC | #7
Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年1月18日 週四 下午9:01寫道:
>
> Hi Nylon,
Hi Alexandre, thanks for your feedback,
>
> On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> >
> > > On 3/23/23 15:55, Anup Patel wrote:
> > > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > >> Hi Anup,
> > > >>
> > > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > > >>> Hi Alex,
> > > >>>
> > > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > >>>> This patchset intends to improve tlb utilization by using hugepages for
> > > >>>> the linear mapping.
> > > >>>>
> > > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > > >>>> take care of isolating the kernel text and rodata so that they are not
> > > >>>> mapped with a PUD mapping which would then assign wrong permissions to
> > > >>>> the whole region: it is achieved by introducing a new memblock API.
> > > >>>>
> > > >>>> Another patch makes use of this new API in arm64 which used some sort of
> > > >>>> hack to solve this issue: it was built/boot tested successfully.
> > > >>>>
> > > >>>> base-commit-tag: v6.3-rc1
> > > >>>>
> > > >>>> v8:
> > > >>>> - Fix rv32, as reported by Anup
> > > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > > >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> > > >>>>    this patch can easily be dropped if they do not want it.
> > > >>>>
> > > >>>> v7:
> > > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> > > >>>>    allows us to split the memblock mappings and then avoid to map the
> > > >>>>    the PUD which contains the kernel as read only
> > > >>>> - Add a patch to arm64 to use this newly introduced API
> > > >>>>
> > > >>>> v6:
> > > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > > >>>>
> > > >>>> v5:
> > > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > > >>>>    Conor
> > > >>>> - Add RB from Andrew
> > > >>>>
> > > >>>> v4:
> > > >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> > > >>>> - Add Acked-by Rob
> > > >>>>
> > > >>>> v3:
> > > >>>> - Change the comment about initrd_start VA conversion so that it fits
> > > >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> > > >>>>    by Rob
> > > >>>>
> > > >>>> v2:
> > > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > > >>>>    early in the boot process, as asked by Rob
> > > >>>>
> > > >>>> Alexandre Ghiti (4):
> > > >>>>    riscv: Get rid of riscv_pfn_base variable
> > > >>>>    mm: Introduce memblock_isolate_memory
> > > >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> > > >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> > > >>> Kernel boot fine on RV64 but there is a failure which is still not
> > > >>> addressed. You can see this failure as following message in
> > > >>> kernel boot log:
> > > >>>      0.000000] Failed to add a System RAM resource at 80200000
> > > >> Hmmm I don't get that in any of my test configs, would you mind
> > > >> sharing yours and your qemu command line?
> > > > Try alexghiti_test branch at
> > > > https://github.com/avpatel/linux.git
> > > >
> > > > I am building the kernel using defconfig and my rootfs is
> > > > based on busybox.
> > > >
> > > > My QEMU command is:
> > > > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> > >
> > >
> > > So splitting memblock.memory is the culprit, it "confuses" the resources
> > > addition and I can only find hacky ways to fix that...
> > Hi Alexandre,
> >
> > We encountered the same error as Anup. After adding your patch
> > (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
> > error again.
> >
> > What I have observed so far is
> >
> > - before your patch
> > When merging consecutive memblocks, if the memblock types are different,
> > they will be merged into reserved
> > - after your patch
> > When consecutive memblocks are merged, if the memblock types are
> > different, they will be merged into memory.
> >
> > Such a result will cause the memory location of OpenSBI to be changed
> > from reserved to memory. Will this have any side effects?
>
> I guess it will end up in the memory pool and pages from openSBI
> region will be allocated, so we should see very quickly bad stuff
> happening (either PMP violation or M-mode ecall never
> returning/trapping/etc).
>
> But I don't observe the same thing, I always see the openSBI region
> being reserved:
>
> reserved[0x0] [0x0000000080000000-0x000000008007ffff],
> 0x0000000000080000 bytes flags: 0x0
>
> Can you elaborate a bit more about "When consecutive memblocks are
> merged, if the memblock types are different, they will be merged into
> memory"? Where/when does this merge happen? Can you give me a config
> file and a kernel revision so that I can take a look?
Ok, If you want to reproduce the same results you just need to modify OpenSBI

[ lib/sbi/sbi_domain.c ]
+#define TEST_SIZE 0x200000

-                                 (scratch->fw_size - scratch->fw_rw_offset),
+                                 (TEST_SIZE - scratch->fw_rw_offset),

In addition, you can insert checks in the kernel merged function
[ mm/memblock.c ]
static void __init_memblock memblock_merge_regions(struct memblock_type *type)
        while (i < type->cnt - 1) {
         ...
                /* move forward from next + 1, index of which is i + 2 */
                memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next));
                type->cnt--;
        }
+       pr_info("Merged memblock_type: cnt = %lu, max = %lu,
total_size = 0x%llx\n",type->cnt, type->max, type->total_size);
+       for (i = 0; i < type->cnt; i++) {
+               const char *region_type =
memblock_is_memory(type->regions[i].base) ? "memory" : "reserve";
+               pr_info("Region %d: base = 0x%llx, size = 0x%llx, type
= %s\n", i, type->regions[i].base, type->regions[i].size,
region_type);
+       }
 }
This is kernel boot log
- before your patch
...
[    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
[    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1628501
[    0.000000] Region 0: base = 0x80000000, size = 0x1600000, type = reserve
...

- after your patch
...
[    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
[    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x180c42e
[    0.000000] Region 0: base = 0x80000000, size = 0x1800000, type = memory
...
[    0.000000] Failed to add a system RAM resource at 80200000
...
>
> Thanks,
>
> Alex
>
> > >
> > > So given that the arm64 patch with the new API is not pretty and that
> > > the simplest solution is to re-merge the memblock regions afterwards
> > > (which is done by memblock_clear_nomap), I'll drop the new API and the
> > > arm64 patch to use the nomap API like arm64: I'll take advantage of that
> > > to clean setup_vm_final which I have wanted to do for a long time.
> > >
> > > @Mike Thanks for you reviews!
> > >
> > > @Anup Thanks for all your bug reports on this patchset, I have to
> > > improve my test flow (it is in the work :)).
> > >
> > >
> > > > Regards,
> > > > Anup
> > > >
> > > >> Thanks
> > > >>
> > > >>> Regards,
> > > >>> Anup
> > > >>>
> > > >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > > >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> > > >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > > >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > > >>>>   drivers/of/fdt.c              | 11 ++++----
> > > >>>>   include/linux/memblock.h      |  1 +
> > > >>>>   mm/memblock.c                 | 20 +++++++++++++
> > > >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> > > >>>>
> > > >>>> --
> > > >>>> 2.37.2
> > > >>>>
> > > > _______________________________________________
> > > > linux-riscv mailing list
> > > > linux-riscv@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
Alexandre Ghiti Feb. 5, 2024, 9:32 a.m. UTC | #8
Hi Nylon,

On Fri, Jan 19, 2024 at 10:27 AM Nylon Chen <nylon.chen@sifive.com> wrote:
>
> Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年1月18日 週四 下午9:01寫道:
> >
> > Hi Nylon,
> Hi Alexandre, thanks for your feedback,
> >
> > On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > >
> > > > On 3/23/23 15:55, Anup Patel wrote:
> > > > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > >> Hi Anup,
> > > > >>
> > > > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > > > >>> Hi Alex,
> > > > >>>
> > > > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > >>>> This patchset intends to improve tlb utilization by using hugepages for
> > > > >>>> the linear mapping.
> > > > >>>>
> > > > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > > > >>>> take care of isolating the kernel text and rodata so that they are not
> > > > >>>> mapped with a PUD mapping which would then assign wrong permissions to
> > > > >>>> the whole region: it is achieved by introducing a new memblock API.
> > > > >>>>
> > > > >>>> Another patch makes use of this new API in arm64 which used some sort of
> > > > >>>> hack to solve this issue: it was built/boot tested successfully.
> > > > >>>>
> > > > >>>> base-commit-tag: v6.3-rc1
> > > > >>>>
> > > > >>>> v8:
> > > > >>>> - Fix rv32, as reported by Anup
> > > > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > > > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > > > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > > > >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> > > > >>>>    this patch can easily be dropped if they do not want it.
> > > > >>>>
> > > > >>>> v7:
> > > > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> > > > >>>>    allows us to split the memblock mappings and then avoid to map the
> > > > >>>>    the PUD which contains the kernel as read only
> > > > >>>> - Add a patch to arm64 to use this newly introduced API
> > > > >>>>
> > > > >>>> v6:
> > > > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > > > >>>>
> > > > >>>> v5:
> > > > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > > > >>>>    Conor
> > > > >>>> - Add RB from Andrew
> > > > >>>>
> > > > >>>> v4:
> > > > >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> > > > >>>> - Add Acked-by Rob
> > > > >>>>
> > > > >>>> v3:
> > > > >>>> - Change the comment about initrd_start VA conversion so that it fits
> > > > >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> > > > >>>>    by Rob
> > > > >>>>
> > > > >>>> v2:
> > > > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > > > >>>>    early in the boot process, as asked by Rob
> > > > >>>>
> > > > >>>> Alexandre Ghiti (4):
> > > > >>>>    riscv: Get rid of riscv_pfn_base variable
> > > > >>>>    mm: Introduce memblock_isolate_memory
> > > > >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> > > > >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> > > > >>> Kernel boot fine on RV64 but there is a failure which is still not
> > > > >>> addressed. You can see this failure as following message in
> > > > >>> kernel boot log:
> > > > >>>      0.000000] Failed to add a System RAM resource at 80200000
> > > > >> Hmmm I don't get that in any of my test configs, would you mind
> > > > >> sharing yours and your qemu command line?
> > > > > Try alexghiti_test branch at
> > > > > https://github.com/avpatel/linux.git
> > > > >
> > > > > I am building the kernel using defconfig and my rootfs is
> > > > > based on busybox.
> > > > >
> > > > > My QEMU command is:
> > > > > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > > > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > > > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > > > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> > > >
> > > >
> > > > So splitting memblock.memory is the culprit, it "confuses" the resources
> > > > addition and I can only find hacky ways to fix that...
> > > Hi Alexandre,
> > >
> > > We encountered the same error as Anup. After adding your patch
> > > (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
> > > error again.
> > >
> > > What I have observed so far is
> > >
> > > - before your patch
> > > When merging consecutive memblocks, if the memblock types are different,
> > > they will be merged into reserved
> > > - after your patch
> > > When consecutive memblocks are merged, if the memblock types are
> > > different, they will be merged into memory.
> > >
> > > Such a result will cause the memory location of OpenSBI to be changed
> > > from reserved to memory. Will this have any side effects?
> >
> > I guess it will end up in the memory pool and pages from openSBI
> > region will be allocated, so we should see very quickly bad stuff
> > happening (either PMP violation or M-mode ecall never
> > returning/trapping/etc).
> >
> > But I don't observe the same thing, I always see the openSBI region
> > being reserved:
> >
> > reserved[0x0] [0x0000000080000000-0x000000008007ffff],
> > 0x0000000000080000 bytes flags: 0x0
> >
> > Can you elaborate a bit more about "When consecutive memblocks are
> > merged, if the memblock types are different, they will be merged into
> > memory"? Where/when does this merge happen? Can you give me a config
> > file and a kernel revision so that I can take a look?
> Ok, If you want to reproduce the same results you just need to modify OpenSBI
>
> [ lib/sbi/sbi_domain.c ]
> +#define TEST_SIZE 0x200000
>
> -                                 (scratch->fw_size - scratch->fw_rw_offset),
> +                                 (TEST_SIZE - scratch->fw_rw_offset),
>
> In addition, you can insert checks in the kernel merged function
> [ mm/memblock.c ]
> static void __init_memblock memblock_merge_regions(struct memblock_type *type)
>         while (i < type->cnt - 1) {
>          ...
>                 /* move forward from next + 1, index of which is i + 2 */
>                 memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next));
>                 type->cnt--;
>         }
> +       pr_info("Merged memblock_type: cnt = %lu, max = %lu,
> total_size = 0x%llx\n",type->cnt, type->max, type->total_size);
> +       for (i = 0; i < type->cnt; i++) {
> +               const char *region_type =
> memblock_is_memory(type->regions[i].base) ? "memory" : "reserve";
> +               pr_info("Region %d: base = 0x%llx, size = 0x%llx, type
> = %s\n", i, type->regions[i].base, type->regions[i].size,
> region_type);
> +       }
>  }
> This is kernel boot log
> - before your patch
> ...
> [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1628501
> [    0.000000] Region 0: base = 0x80000000, size = 0x1600000, type = reserve
> ...
>
> - after your patch
> ...
> [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x180c42e
> [    0.000000] Region 0: base = 0x80000000, size = 0x1800000, type = memory

So the openSBI region is marked as memory, and not reserved because
this region is now described as nomap, and memblock_mark_nomap() does
not move this region into the reserved memblock list, but keep it in
the memory list with the nomap flag
(https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L479).
But as stated in the description of memblock_mark_nomap()
(https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L969),
the pages associated with the region will be marked as PageReserved
and the region will not be covered in the linear mapping.

So to me, this is normal and we are safe. Let me know if I made a mistake.

And sorry for the long delay, that slipped my mind!

Thanks,

Alex

> ...
> [    0.000000] Failed to add a system RAM resource at 80200000
> ...
> >
> > Thanks,
> >
> > Alex
> >
> > > >
> > > > So given that the arm64 patch with the new API is not pretty and that
> > > > the simplest solution is to re-merge the memblock regions afterwards
> > > > (which is done by memblock_clear_nomap), I'll drop the new API and the
> > > > arm64 patch to use the nomap API like arm64: I'll take advantage of that
> > > > to clean setup_vm_final which I have wanted to do for a long time.
> > > >
> > > > @Mike Thanks for you reviews!
> > > >
> > > > @Anup Thanks for all your bug reports on this patchset, I have to
> > > > improve my test flow (it is in the work :)).
> > > >
> > > >
> > > > > Regards,
> > > > > Anup
> > > > >
> > > > >> Thanks
> > > > >>
> > > > >>> Regards,
> > > > >>> Anup
> > > > >>>
> > > > >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > > > >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> > > > >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > > > >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > > > >>>>   drivers/of/fdt.c              | 11 ++++----
> > > > >>>>   include/linux/memblock.h      |  1 +
> > > > >>>>   mm/memblock.c                 | 20 +++++++++++++
> > > > >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> > > > >>>>
> > > > >>>> --
> > > > >>>> 2.37.2
> > > > >>>>
> > > > > _______________________________________________
> > > > > linux-riscv mailing list
> > > > > linux-riscv@lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > >
> > > > _______________________________________________
> > > > linux-riscv mailing list
> > > > linux-riscv@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
Nylon Chen March 12, 2024, 6:40 a.m. UTC | #9
Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年2月5日 週一 下午5:32寫道:
>
> Hi Nylon,
>
> On Fri, Jan 19, 2024 at 10:27 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> >
> > Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年1月18日 週四 下午9:01寫道:
> > >
> > > Hi Nylon,
> > Hi Alexandre, thanks for your feedback,
> > >
> > > On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > > >
> > > > > On 3/23/23 15:55, Anup Patel wrote:
> > > > > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > >> Hi Anup,
> > > > > >>
> > > > > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > > > > >>> Hi Alex,
> > > > > >>>
> > > > > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > >>>> This patchset intends to improve tlb utilization by using hugepages for
> > > > > >>>> the linear mapping.
> > > > > >>>>
> > > > > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > > > > >>>> take care of isolating the kernel text and rodata so that they are not
> > > > > >>>> mapped with a PUD mapping which would then assign wrong permissions to
> > > > > >>>> the whole region: it is achieved by introducing a new memblock API.
> > > > > >>>>
> > > > > >>>> Another patch makes use of this new API in arm64 which used some sort of
> > > > > >>>> hack to solve this issue: it was built/boot tested successfully.
> > > > > >>>>
> > > > > >>>> base-commit-tag: v6.3-rc1
> > > > > >>>>
> > > > > >>>> v8:
> > > > > >>>> - Fix rv32, as reported by Anup
> > > > > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > > > > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > > > > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > > > > >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> > > > > >>>>    this patch can easily be dropped if they do not want it.
> > > > > >>>>
> > > > > >>>> v7:
> > > > > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> > > > > >>>>    allows us to split the memblock mappings and then avoid to map the
> > > > > >>>>    the PUD which contains the kernel as read only
> > > > > >>>> - Add a patch to arm64 to use this newly introduced API
> > > > > >>>>
> > > > > >>>> v6:
> > > > > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > > > > >>>>
> > > > > >>>> v5:
> > > > > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > > > > >>>>    Conor
> > > > > >>>> - Add RB from Andrew
> > > > > >>>>
> > > > > >>>> v4:
> > > > > >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> > > > > >>>> - Add Acked-by Rob
> > > > > >>>>
> > > > > >>>> v3:
> > > > > >>>> - Change the comment about initrd_start VA conversion so that it fits
> > > > > >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> > > > > >>>>    by Rob
> > > > > >>>>
> > > > > >>>> v2:
> > > > > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > > > > >>>>    early in the boot process, as asked by Rob
> > > > > >>>>
> > > > > >>>> Alexandre Ghiti (4):
> > > > > >>>>    riscv: Get rid of riscv_pfn_base variable
> > > > > >>>>    mm: Introduce memblock_isolate_memory
> > > > > >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> > > > > >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> > > > > >>> Kernel boot fine on RV64 but there is a failure which is still not
> > > > > >>> addressed. You can see this failure as following message in
> > > > > >>> kernel boot log:
> > > > > >>>      0.000000] Failed to add a System RAM resource at 80200000
> > > > > >> Hmmm I don't get that in any of my test configs, would you mind
> > > > > >> sharing yours and your qemu command line?
> > > > > > Try alexghiti_test branch at
> > > > > > https://github.com/avpatel/linux.git
> > > > > >
> > > > > > I am building the kernel using defconfig and my rootfs is
> > > > > > based on busybox.
> > > > > >
> > > > > > My QEMU command is:
> > > > > > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > > > > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > > > > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > > > > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> > > > >
> > > > >
> > > > > So splitting memblock.memory is the culprit, it "confuses" the resources
> > > > > addition and I can only find hacky ways to fix that...
> > > > Hi Alexandre,
> > > >
> > > > We encountered the same error as Anup. After adding your patch
> > > > (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
> > > > error again.
> > > >
> > > > What I have observed so far is
> > > >
> > > > - before your patch
> > > > When merging consecutive memblocks, if the memblock types are different,
> > > > they will be merged into reserved
> > > > - after your patch
> > > > When consecutive memblocks are merged, if the memblock types are
> > > > different, they will be merged into memory.
> > > >
> > > > Such a result will cause the memory location of OpenSBI to be changed
> > > > from reserved to memory. Will this have any side effects?
> > >
> > > I guess it will end up in the memory pool and pages from openSBI
> > > region will be allocated, so we should see very quickly bad stuff
> > > happening (either PMP violation or M-mode ecall never
> > > returning/trapping/etc).
> > >
> > > But I don't observe the same thing, I always see the openSBI region
> > > being reserved:
> > >
> > > reserved[0x0] [0x0000000080000000-0x000000008007ffff],
> > > 0x0000000000080000 bytes flags: 0x0
> > >
> > > Can you elaborate a bit more about "When consecutive memblocks are
> > > merged, if the memblock types are different, they will be merged into
> > > memory"? Where/when does this merge happen? Can you give me a config
> > > file and a kernel revision so that I can take a look?
> > Ok, If you want to reproduce the same results you just need to modify OpenSBI
> >
> > [ lib/sbi/sbi_domain.c ]
> > +#define TEST_SIZE 0x200000
> >
> > -                                 (scratch->fw_size - scratch->fw_rw_offset),
> > +                                 (TEST_SIZE - scratch->fw_rw_offset),
> >
> > In addition, you can insert checks in the kernel merged function
> > [ mm/memblock.c ]
> > static void __init_memblock memblock_merge_regions(struct memblock_type *type)
> >         while (i < type->cnt - 1) {
> >          ...
> >                 /* move forward from next + 1, index of which is i + 2 */
> >                 memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next));
> >                 type->cnt--;
> >         }
> > +       pr_info("Merged memblock_type: cnt = %lu, max = %lu,
> > total_size = 0x%llx\n",type->cnt, type->max, type->total_size);
> > +       for (i = 0; i < type->cnt; i++) {
> > +               const char *region_type =
> > memblock_is_memory(type->regions[i].base) ? "memory" : "reserve";
> > +               pr_info("Region %d: base = 0x%llx, size = 0x%llx, type
> > = %s\n", i, type->regions[i].base, type->regions[i].size,
> > region_type);
> > +       }
> >  }
> > This is kernel boot log
> > - before your patch
> > ...
> > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1628501
> > [    0.000000] Region 0: base = 0x80000000, size = 0x1600000, type = reserve
> > ...
> >
> > - after your patch
> > ...
> > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x180c42e
> > [    0.000000] Region 0: base = 0x80000000, size = 0x1800000, type = memory
>
Hi Alex, thanks for your feedback.
> So the openSBI region is marked as memory, and not reserved because
> this region is now described as nomap, and memblock_mark_nomap() does
> not move this region into the reserved memblock list, but keep it in
> the memory list with the nomap flag
> (https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L479).
> But as stated in the description of memblock_mark_nomap()
> (https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L969),
> the pages associated with the region will be marked as PageReserved
> and the region will not be covered in the linear mapping.
I traced it via GDB, and indeed, it enters
early_init_dt_reserve_memory() and calls memblock_reserve to reserve
this block of memory.

[before your patch]
[    0.000000] OF: fdt: check nomap Reserving memory: base =
0x80000000, size = 0x200000
[    0.000000] ---  Reserved memory: Base address: 80000000, Size:
200000---
[    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size =
0x1e28501
[    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type =
reserve
[    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type =
memory
....
[    0.000000] OF: fdt: Reserved memory: reserved region for node
'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
[    0.000000] OF: reserved mem:
0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
mmode_resv0@80000000

[after your patch]
[    0.000000] OF: fdt: check nomap Reserving memory: base =
0x80000000, size = 0x200000
[    0.000000] --- Reserved memory: Base address: 80000000, Size: 200000---
[    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1e25501
[    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type = memory
[    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type = memory
...
[    0.000000] OF: fdt: Reserved memory: reserved region for node
'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
[    0.000000] OF: reserved mem:
0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
mmode_resv0@80000000

At the moment, it can be confirmed that there is no need to worry
about this block of memory being used.

But I still have a question I'd like to ask, which is why this
location is flagged as 'reserve' instead of 'memory' in the memblock

Thanks,
Nylon
>
> So to me, this is normal and we are safe. Let me know if I made a mistake.
>
> And sorry for the long delay, that slipped my mind!
>
> Thanks,
>
> Alex
>
> > ...
> > [    0.000000] Failed to add a system RAM resource at 80200000
> > ...
> > >
> > > Thanks,
> > >
> > > Alex
> > >
> > > > >
> > > > > So given that the arm64 patch with the new API is not pretty and that
> > > > > the simplest solution is to re-merge the memblock regions afterwards
> > > > > (which is done by memblock_clear_nomap), I'll drop the new API and the
> > > > > arm64 patch to use the nomap API like arm64: I'll take advantage of that
> > > > > to clean setup_vm_final which I have wanted to do for a long time.
> > > > >
> > > > > @Mike Thanks for you reviews!
> > > > >
> > > > > @Anup Thanks for all your bug reports on this patchset, I have to
> > > > > improve my test flow (it is in the work :)).
> > > > >
> > > > >
> > > > > > Regards,
> > > > > > Anup
> > > > > >
> > > > > >> Thanks
> > > > > >>
> > > > > >>> Regards,
> > > > > >>> Anup
> > > > > >>>
> > > > > >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > > > > >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> > > > > >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > > > > >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > > > > >>>>   drivers/of/fdt.c              | 11 ++++----
> > > > > >>>>   include/linux/memblock.h      |  1 +
> > > > > >>>>   mm/memblock.c                 | 20 +++++++++++++
> > > > > >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> > > > > >>>>
> > > > > >>>> --
> > > > > >>>> 2.37.2
> > > > > >>>>
> > > > > > _______________________________________________
> > > > > > linux-riscv mailing list
> > > > > > linux-riscv@lists.infradead.org
> > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > > >
> > > > > _______________________________________________
> > > > > linux-riscv mailing list
> > > > > linux-riscv@lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
Nylon Chen March 12, 2024, 6:48 a.m. UTC | #10
Nylon Chen <nylon.chen@sifive.com> 於 2024年3月12日 週二 下午2:40寫道:
>
> Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年2月5日 週一 下午5:32寫道:
> >
> > Hi Nylon,
> >
> > On Fri, Jan 19, 2024 at 10:27 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > >
> > > Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年1月18日 週四 下午9:01寫道:
> > > >
> > > > Hi Nylon,
> > > Hi Alexandre, thanks for your feedback,
> > > >
> > > > On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > > > >
> > > > > > On 3/23/23 15:55, Anup Patel wrote:
> > > > > > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > >> Hi Anup,
> > > > > > >>
> > > > > > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > > > > > >>> Hi Alex,
> > > > > > >>>
> > > > > > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > >>>> This patchset intends to improve tlb utilization by using hugepages for
> > > > > > >>>> the linear mapping.
> > > > > > >>>>
> > > > > > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > > > > > >>>> take care of isolating the kernel text and rodata so that they are not
> > > > > > >>>> mapped with a PUD mapping which would then assign wrong permissions to
> > > > > > >>>> the whole region: it is achieved by introducing a new memblock API.
> > > > > > >>>>
> > > > > > >>>> Another patch makes use of this new API in arm64 which used some sort of
> > > > > > >>>> hack to solve this issue: it was built/boot tested successfully.
> > > > > > >>>>
> > > > > > >>>> base-commit-tag: v6.3-rc1
> > > > > > >>>>
> > > > > > >>>> v8:
> > > > > > >>>> - Fix rv32, as reported by Anup
> > > > > > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > > > > > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > > > > > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > > > > > >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> > > > > > >>>>    this patch can easily be dropped if they do not want it.
> > > > > > >>>>
> > > > > > >>>> v7:
> > > > > > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> > > > > > >>>>    allows us to split the memblock mappings and then avoid to map the
> > > > > > >>>>    the PUD which contains the kernel as read only
> > > > > > >>>> - Add a patch to arm64 to use this newly introduced API
> > > > > > >>>>
> > > > > > >>>> v6:
> > > > > > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > > > > > >>>>
> > > > > > >>>> v5:
> > > > > > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > > > > > >>>>    Conor
> > > > > > >>>> - Add RB from Andrew
> > > > > > >>>>
> > > > > > >>>> v4:
> > > > > > >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> > > > > > >>>> - Add Acked-by Rob
> > > > > > >>>>
> > > > > > >>>> v3:
> > > > > > >>>> - Change the comment about initrd_start VA conversion so that it fits
> > > > > > >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> > > > > > >>>>    by Rob
> > > > > > >>>>
> > > > > > >>>> v2:
> > > > > > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > > > > > >>>>    early in the boot process, as asked by Rob
> > > > > > >>>>
> > > > > > >>>> Alexandre Ghiti (4):
> > > > > > >>>>    riscv: Get rid of riscv_pfn_base variable
> > > > > > >>>>    mm: Introduce memblock_isolate_memory
> > > > > > >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> > > > > > >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> > > > > > >>> Kernel boot fine on RV64 but there is a failure which is still not
> > > > > > >>> addressed. You can see this failure as following message in
> > > > > > >>> kernel boot log:
> > > > > > >>>      0.000000] Failed to add a System RAM resource at 80200000
> > > > > > >> Hmmm I don't get that in any of my test configs, would you mind
> > > > > > >> sharing yours and your qemu command line?
> > > > > > > Try alexghiti_test branch at
> > > > > > > https://github.com/avpatel/linux.git
> > > > > > >
> > > > > > > I am building the kernel using defconfig and my rootfs is
> > > > > > > based on busybox.
> > > > > > >
> > > > > > > My QEMU command is:
> > > > > > > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > > > > > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > > > > > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > > > > > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> > > > > >
> > > > > >
> > > > > > So splitting memblock.memory is the culprit, it "confuses" the resources
> > > > > > addition and I can only find hacky ways to fix that...
> > > > > Hi Alexandre,
> > > > >
> > > > > We encountered the same error as Anup. After adding your patch
> > > > > (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
> > > > > error again.
> > > > >
> > > > > What I have observed so far is
> > > > >
> > > > > - before your patch
> > > > > When merging consecutive memblocks, if the memblock types are different,
> > > > > they will be merged into reserved
> > > > > - after your patch
> > > > > When consecutive memblocks are merged, if the memblock types are
> > > > > different, they will be merged into memory.
> > > > >
> > > > > Such a result will cause the memory location of OpenSBI to be changed
> > > > > from reserved to memory. Will this have any side effects?
> > > >
> > > > I guess it will end up in the memory pool and pages from openSBI
> > > > region will be allocated, so we should see very quickly bad stuff
> > > > happening (either PMP violation or M-mode ecall never
> > > > returning/trapping/etc).
> > > >
> > > > But I don't observe the same thing, I always see the openSBI region
> > > > being reserved:
> > > >
> > > > reserved[0x0] [0x0000000080000000-0x000000008007ffff],
> > > > 0x0000000000080000 bytes flags: 0x0
> > > >
> > > > Can you elaborate a bit more about "When consecutive memblocks are
> > > > merged, if the memblock types are different, they will be merged into
> > > > memory"? Where/when does this merge happen? Can you give me a config
> > > > file and a kernel revision so that I can take a look?
> > > Ok, If you want to reproduce the same results you just need to modify OpenSBI
> > >
> > > [ lib/sbi/sbi_domain.c ]
> > > +#define TEST_SIZE 0x200000
> > >
> > > -                                 (scratch->fw_size - scratch->fw_rw_offset),
> > > +                                 (TEST_SIZE - scratch->fw_rw_offset),
> > >
> > > In addition, you can insert checks in the kernel merged function
> > > [ mm/memblock.c ]
> > > static void __init_memblock memblock_merge_regions(struct memblock_type *type)
> > >         while (i < type->cnt - 1) {
> > >          ...
> > >                 /* move forward from next + 1, index of which is i + 2 */
> > >                 memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next));
> > >                 type->cnt--;
> > >         }
> > > +       pr_info("Merged memblock_type: cnt = %lu, max = %lu,
> > > total_size = 0x%llx\n",type->cnt, type->max, type->total_size);
> > > +       for (i = 0; i < type->cnt; i++) {
> > > +               const char *region_type =
> > > memblock_is_memory(type->regions[i].base) ? "memory" : "reserve";
> > > +               pr_info("Region %d: base = 0x%llx, size = 0x%llx, type
> > > = %s\n", i, type->regions[i].base, type->regions[i].size,
> > > region_type);
> > > +       }
> > >  }
> > > This is kernel boot log
> > > - before your patch
> > > ...
> > > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1628501
> > > [    0.000000] Region 0: base = 0x80000000, size = 0x1600000, type = reserve
> > > ...
> > >
> > > - after your patch
> > > ...
> > > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x180c42e
> > > [    0.000000] Region 0: base = 0x80000000, size = 0x1800000, type = memory
> >
> Hi Alex, thanks for your feedback.
> > So the openSBI region is marked as memory, and not reserved because
> > this region is now described as nomap, and memblock_mark_nomap() does
> > not move this region into the reserved memblock list, but keep it in
> > the memory list with the nomap flag
> > (https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L479).
> > But as stated in the description of memblock_mark_nomap()
> > (https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L969),
> > the pages associated with the region will be marked as PageReserved
> > and the region will not be covered in the linear mapping.
> I traced it via GDB, and indeed, it enters
> early_init_dt_reserve_memory() and calls memblock_reserve to reserve
> this block of memory.
>
> [before your patch]
> [    0.000000] OF: fdt: check nomap Reserving memory: base =
> 0x80000000, size = 0x200000
> [    0.000000] ---  Reserved memory: Base address: 80000000, Size:
> 200000---
> [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size =
> 0x1e28501
> [    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type =
> reserve
> [    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type =
> memory
> ....
> [    0.000000] OF: fdt: Reserved memory: reserved region for node
> 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
> [    0.000000] OF: reserved mem:
> 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
> mmode_resv0@80000000
>
> [after your patch]
> [    0.000000] OF: fdt: check nomap Reserving memory: base =
> 0x80000000, size = 0x200000
> [    0.000000] --- Reserved memory: Base address: 80000000, Size: 200000---
> [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1e25501
> [    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type = memory
> [    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type = memory
> ...
> [    0.000000] OF: fdt: Reserved memory: reserved region for node
> 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
> [    0.000000] OF: reserved mem:
> 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
> mmode_resv0@80000000
>
> At the moment, it can be confirmed that there is no need to worry
> about this block of memory being used.
>
> But I still have a question I'd like to ask, which is why this
> location is flagged as 'reserve' instead of 'memory' in the memblock
Sorry, I asked the wrong question.

Why is this location marked as "memory" instead of "reserve" in the memblock?
>
> Thanks,
> Nylon
> >
> > So to me, this is normal and we are safe. Let me know if I made a mistake.
> >
> > And sorry for the long delay, that slipped my mind!
> >
> > Thanks,
> >
> > Alex
> >
> > > ...
> > > [    0.000000] Failed to add a system RAM resource at 80200000
> > > ...
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > > >
> > > > > >
> > > > > > So given that the arm64 patch with the new API is not pretty and that
> > > > > > the simplest solution is to re-merge the memblock regions afterwards
> > > > > > (which is done by memblock_clear_nomap), I'll drop the new API and the
> > > > > > arm64 patch to use the nomap API like arm64: I'll take advantage of that
> > > > > > to clean setup_vm_final which I have wanted to do for a long time.
> > > > > >
> > > > > > @Mike Thanks for you reviews!
> > > > > >
> > > > > > @Anup Thanks for all your bug reports on this patchset, I have to
> > > > > > improve my test flow (it is in the work :)).
> > > > > >
> > > > > >
> > > > > > > Regards,
> > > > > > > Anup
> > > > > > >
> > > > > > >> Thanks
> > > > > > >>
> > > > > > >>> Regards,
> > > > > > >>> Anup
> > > > > > >>>
> > > > > > >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > > > > > >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> > > > > > >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > > > > > >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > > > > > >>>>   drivers/of/fdt.c              | 11 ++++----
> > > > > > >>>>   include/linux/memblock.h      |  1 +
> > > > > > >>>>   mm/memblock.c                 | 20 +++++++++++++
> > > > > > >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> > > > > > >>>>
> > > > > > >>>> --
> > > > > > >>>> 2.37.2
> > > > > > >>>>
> > > > > > > _______________________________________________
> > > > > > > linux-riscv mailing list
> > > > > > > linux-riscv@lists.infradead.org
> > > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > > > >
> > > > > > _______________________________________________
> > > > > > linux-riscv mailing list
> > > > > > linux-riscv@lists.infradead.org
> > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
Alexandre Ghiti March 12, 2024, 9:33 a.m. UTC | #11
Hi Nylon,

On Tue, Mar 12, 2024 at 7:48 AM Nylon Chen <nylon.chen@sifive.com> wrote:
>
> Nylon Chen <nylon.chen@sifive.com> 於 2024年3月12日 週二 下午2:40寫道:
> >
> > Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年2月5日 週一 下午5:32寫道:
> > >
> > > Hi Nylon,
> > >
> > > On Fri, Jan 19, 2024 at 10:27 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > > >
> > > > Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年1月18日 週四 下午9:01寫道:
> > > > >
> > > > > Hi Nylon,
> > > > Hi Alexandre, thanks for your feedback,
> > > > >
> > > > > On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > > > > >
> > > > > > > On 3/23/23 15:55, Anup Patel wrote:
> > > > > > > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > >> Hi Anup,
> > > > > > > >>
> > > > > > > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > > > > > > >>> Hi Alex,
> > > > > > > >>>
> > > > > > > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > >>>> This patchset intends to improve tlb utilization by using hugepages for
> > > > > > > >>>> the linear mapping.
> > > > > > > >>>>
> > > > > > > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > > > > > > >>>> take care of isolating the kernel text and rodata so that they are not
> > > > > > > >>>> mapped with a PUD mapping which would then assign wrong permissions to
> > > > > > > >>>> the whole region: it is achieved by introducing a new memblock API.
> > > > > > > >>>>
> > > > > > > >>>> Another patch makes use of this new API in arm64 which used some sort of
> > > > > > > >>>> hack to solve this issue: it was built/boot tested successfully.
> > > > > > > >>>>
> > > > > > > >>>> base-commit-tag: v6.3-rc1
> > > > > > > >>>>
> > > > > > > >>>> v8:
> > > > > > > >>>> - Fix rv32, as reported by Anup
> > > > > > > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > > > > > > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > > > > > > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > > > > > > >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> > > > > > > >>>>    this patch can easily be dropped if they do not want it.
> > > > > > > >>>>
> > > > > > > >>>> v7:
> > > > > > > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> > > > > > > >>>>    allows us to split the memblock mappings and then avoid to map the
> > > > > > > >>>>    the PUD which contains the kernel as read only
> > > > > > > >>>> - Add a patch to arm64 to use this newly introduced API
> > > > > > > >>>>
> > > > > > > >>>> v6:
> > > > > > > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > > > > > > >>>>
> > > > > > > >>>> v5:
> > > > > > > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > > > > > > >>>>    Conor
> > > > > > > >>>> - Add RB from Andrew
> > > > > > > >>>>
> > > > > > > >>>> v4:
> > > > > > > >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> > > > > > > >>>> - Add Acked-by Rob
> > > > > > > >>>>
> > > > > > > >>>> v3:
> > > > > > > >>>> - Change the comment about initrd_start VA conversion so that it fits
> > > > > > > >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> > > > > > > >>>>    by Rob
> > > > > > > >>>>
> > > > > > > >>>> v2:
> > > > > > > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > > > > > > >>>>    early in the boot process, as asked by Rob
> > > > > > > >>>>
> > > > > > > >>>> Alexandre Ghiti (4):
> > > > > > > >>>>    riscv: Get rid of riscv_pfn_base variable
> > > > > > > >>>>    mm: Introduce memblock_isolate_memory
> > > > > > > >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> > > > > > > >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> > > > > > > >>> Kernel boot fine on RV64 but there is a failure which is still not
> > > > > > > >>> addressed. You can see this failure as following message in
> > > > > > > >>> kernel boot log:
> > > > > > > >>>      0.000000] Failed to add a System RAM resource at 80200000
> > > > > > > >> Hmmm I don't get that in any of my test configs, would you mind
> > > > > > > >> sharing yours and your qemu command line?
> > > > > > > > Try alexghiti_test branch at
> > > > > > > > https://github.com/avpatel/linux.git
> > > > > > > >
> > > > > > > > I am building the kernel using defconfig and my rootfs is
> > > > > > > > based on busybox.
> > > > > > > >
> > > > > > > > My QEMU command is:
> > > > > > > > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > > > > > > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > > > > > > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > > > > > > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> > > > > > >
> > > > > > >
> > > > > > > So splitting memblock.memory is the culprit, it "confuses" the resources
> > > > > > > addition and I can only find hacky ways to fix that...
> > > > > > Hi Alexandre,
> > > > > >
> > > > > > We encountered the same error as Anup. After adding your patch
> > > > > > (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
> > > > > > error again.
> > > > > >
> > > > > > What I have observed so far is
> > > > > >
> > > > > > - before your patch
> > > > > > When merging consecutive memblocks, if the memblock types are different,
> > > > > > they will be merged into reserved
> > > > > > - after your patch
> > > > > > When consecutive memblocks are merged, if the memblock types are
> > > > > > different, they will be merged into memory.
> > > > > >
> > > > > > Such a result will cause the memory location of OpenSBI to be changed
> > > > > > from reserved to memory. Will this have any side effects?
> > > > >
> > > > > I guess it will end up in the memory pool and pages from openSBI
> > > > > region will be allocated, so we should see very quickly bad stuff
> > > > > happening (either PMP violation or M-mode ecall never
> > > > > returning/trapping/etc).
> > > > >
> > > > > But I don't observe the same thing, I always see the openSBI region
> > > > > being reserved:
> > > > >
> > > > > reserved[0x0] [0x0000000080000000-0x000000008007ffff],
> > > > > 0x0000000000080000 bytes flags: 0x0
> > > > >
> > > > > Can you elaborate a bit more about "When consecutive memblocks are
> > > > > merged, if the memblock types are different, they will be merged into
> > > > > memory"? Where/when does this merge happen? Can you give me a config
> > > > > file and a kernel revision so that I can take a look?
> > > > Ok, If you want to reproduce the same results you just need to modify OpenSBI
> > > >
> > > > [ lib/sbi/sbi_domain.c ]
> > > > +#define TEST_SIZE 0x200000
> > > >
> > > > -                                 (scratch->fw_size - scratch->fw_rw_offset),
> > > > +                                 (TEST_SIZE - scratch->fw_rw_offset),
> > > >
> > > > In addition, you can insert checks in the kernel merged function
> > > > [ mm/memblock.c ]
> > > > static void __init_memblock memblock_merge_regions(struct memblock_type *type)
> > > >         while (i < type->cnt - 1) {
> > > >          ...
> > > >                 /* move forward from next + 1, index of which is i + 2 */
> > > >                 memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next));
> > > >                 type->cnt--;
> > > >         }
> > > > +       pr_info("Merged memblock_type: cnt = %lu, max = %lu,
> > > > total_size = 0x%llx\n",type->cnt, type->max, type->total_size);
> > > > +       for (i = 0; i < type->cnt; i++) {
> > > > +               const char *region_type =
> > > > memblock_is_memory(type->regions[i].base) ? "memory" : "reserve";
> > > > +               pr_info("Region %d: base = 0x%llx, size = 0x%llx, type
> > > > = %s\n", i, type->regions[i].base, type->regions[i].size,
> > > > region_type);
> > > > +       }
> > > >  }
> > > > This is kernel boot log
> > > > - before your patch
> > > > ...
> > > > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > > > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1628501
> > > > [    0.000000] Region 0: base = 0x80000000, size = 0x1600000, type = reserve
> > > > ...
> > > >
> > > > - after your patch
> > > > ...
> > > > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > > > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x180c42e
> > > > [    0.000000] Region 0: base = 0x80000000, size = 0x1800000, type = memory
> > >
> > Hi Alex, thanks for your feedback.
> > > So the openSBI region is marked as memory, and not reserved because
> > > this region is now described as nomap, and memblock_mark_nomap() does
> > > not move this region into the reserved memblock list, but keep it in
> > > the memory list with the nomap flag
> > > (https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L479).
> > > But as stated in the description of memblock_mark_nomap()
> > > (https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L969),
> > > the pages associated with the region will be marked as PageReserved
> > > and the region will not be covered in the linear mapping.
> > I traced it via GDB, and indeed, it enters
> > early_init_dt_reserve_memory() and calls memblock_reserve to reserve
> > this block of memory.
> >
> > [before your patch]
> > [    0.000000] OF: fdt: check nomap Reserving memory: base =
> > 0x80000000, size = 0x200000
> > [    0.000000] ---  Reserved memory: Base address: 80000000, Size:
> > 200000---
> > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size =
> > 0x1e28501
> > [    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type =
> > reserve
> > [    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type =
> > memory
> > ....
> > [    0.000000] OF: fdt: Reserved memory: reserved region for node
> > 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
> > [    0.000000] OF: reserved mem:
> > 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
> > mmode_resv0@80000000
> >
> > [after your patch]
> > [    0.000000] OF: fdt: check nomap Reserving memory: base =
> > 0x80000000, size = 0x200000
> > [    0.000000] --- Reserved memory: Base address: 80000000, Size: 200000---
> > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1e25501
> > [    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type = memory
> > [    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type = memory
> > ...
> > [    0.000000] OF: fdt: Reserved memory: reserved region for node
> > 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
> > [    0.000000] OF: reserved mem:
> > 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
> > mmode_resv0@80000000
> >
> > At the moment, it can be confirmed that there is no need to worry
> > about this block of memory being used.
> >
> > But I still have a question I'd like to ask, which is why this
> > location is flagged as 'reserve' instead of 'memory' in the memblock
> Sorry, I asked the wrong question.
>
> Why is this location marked as "memory" instead of "reserve" in the memblock?

No idea, let's see if @Mike Rapoport can answer this :)

> >
> > Thanks,
> > Nylon
> > >
> > > So to me, this is normal and we are safe. Let me know if I made a mistake.
> > >
> > > And sorry for the long delay, that slipped my mind!
> > >
> > > Thanks,
> > >
> > > Alex
> > >
> > > > ...
> > > > [    0.000000] Failed to add a system RAM resource at 80200000
> > > > ...
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Alex
> > > > >
> > > > > > >
> > > > > > > So given that the arm64 patch with the new API is not pretty and that
> > > > > > > the simplest solution is to re-merge the memblock regions afterwards
> > > > > > > (which is done by memblock_clear_nomap), I'll drop the new API and the
> > > > > > > arm64 patch to use the nomap API like arm64: I'll take advantage of that
> > > > > > > to clean setup_vm_final which I have wanted to do for a long time.
> > > > > > >
> > > > > > > @Mike Thanks for you reviews!
> > > > > > >
> > > > > > > @Anup Thanks for all your bug reports on this patchset, I have to
> > > > > > > improve my test flow (it is in the work :)).
> > > > > > >
> > > > > > >
> > > > > > > > Regards,
> > > > > > > > Anup
> > > > > > > >
> > > > > > > >> Thanks
> > > > > > > >>
> > > > > > > >>> Regards,
> > > > > > > >>> Anup
> > > > > > > >>>
> > > > > > > >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > > > > > > >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> > > > > > > >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > > > > > > >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > > > > > > >>>>   drivers/of/fdt.c              | 11 ++++----
> > > > > > > >>>>   include/linux/memblock.h      |  1 +
> > > > > > > >>>>   mm/memblock.c                 | 20 +++++++++++++
> > > > > > > >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> > > > > > > >>>>
> > > > > > > >>>> --
> > > > > > > >>>> 2.37.2
> > > > > > > >>>>
> > > > > > > > _______________________________________________
> > > > > > > > linux-riscv mailing list
> > > > > > > > linux-riscv@lists.infradead.org
> > > > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > linux-riscv mailing list
> > > > > > > linux-riscv@lists.infradead.org
> > > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv