[2/3] Documentation: riscv: Add documentation that describes the VM layout

Message ID 20210225080453.1314-3-alex@ghiti.fr (mailing list archive)
State New, archived
Series Move kernel mapping outside the linear mapping

Commit Message

Alexandre Ghiti Feb. 25, 2021, 8:04 a.m. UTC
This new document presents the RISC-V virtual memory layout and is based
on the x86 one: it describes the limits of the different regions of the
virtual address space.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
 Documentation/riscv/index.rst     |  1 +
 Documentation/riscv/vm-layout.rst | 61 +++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100644 Documentation/riscv/vm-layout.rst

Comments

David Hildenbrand Feb. 25, 2021, 10:34 a.m. UTC | #1
> +                    |            |                  |         |
> +   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
> +   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
> +   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
> +   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
> +   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
> +   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory

^ So you could never ever have more than 126 GB, correct?

I assume that's nothing new.
Alexandre Ghiti Feb. 25, 2021, 11:56 a.m. UTC | #2
On 2/25/21 5:34 AM, David Hildenbrand wrote:
>> +                    |            |                  |         |
>> +   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
>> +   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
>> +   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
>> +   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
>> +   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
>> +   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory
> 
> ^ So you could never ever have more than 126 GB, correct?
> 
> I assume that's nothing new.
> 

Before this patch, the limit was 128GB, so in that sense, there is nothing
new. If we ever want to increase that limit, we'll just have to lower
PAGE_OFFSET: there are still some unused virtual addresses after kasan,
for example.
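
To make that concrete, here is a quick user-space sketch of where the
126GB figure above comes from, using the Sv39 constants from the table in
this patch (the variable names are illustrative, not the kernel's):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          /* Constants taken from the layout table in this patch. */
          uint64_t page_offset = 0xffffffe000000000ULL; /* start of the direct map */
          uint64_t kernel_base = 0xffffffff80000000ULL; /* "kernel, BPF" region    */

          /* The linear mapping can only grow until it hits the next
           * fixed region above it, hence the ceiling David points at. */
          uint64_t direct_map = kernel_base - page_offset;

          printf("direct map: %llu GB\n",
                 (unsigned long long)(direct_map >> 30)); /* prints 126 */

          /* Lowering page_offset, e.g. into the unused hole after kasan,
           * widens this window without moving anything above it. */
          return 0;
  }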

Thanks,

Alex
Arnd Bergmann March 10, 2021, 11:42 a.m. UTC | #3
On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
>
> On 2/25/21 5:34 AM, David Hildenbrand wrote:
> >> +                    |            |                  |         |
> >> +   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
> >> +   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
> >> +   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
> >> +   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
> >> +   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
> >> +   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory
> >
> > ^ So you could never ever have more than 126 GB, correct?
> >
> > I assume that's nothing new.
> >
>
> Before this patch, the limit was 128GB, so in that sense, there is nothing
> new. If we ever want to increase that limit, we'll just have to lower
> PAGE_OFFSET: there are still some unused virtual addresses after kasan,
> for example.

Linus Walleij is looking into changing the arm32 code to have the kernel
direct map inside of the vmalloc area, which would be another place
that you could use here. It would be nice to not have too many different
ways of doing this, but I'm not sure how hard it would be to rework your
code, or if there are any downsides of doing this.

        Arnd
Alexandre Ghiti March 10, 2021, 7:12 p.m. UTC | #4
Hi Arnd,

On 3/10/21 6:42 AM, Arnd Bergmann wrote:
> On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>
>> On 2/25/21 5:34 AM, David Hildenbrand wrote:
>>>> +                    |            |                  |         |
>>>> +   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
>>>> +   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
>>>> +   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
>>>> +   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
>>>> +   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
>>>> +   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory
>>>
>>> ^ So you could never ever have more than 126 GB, correct?
>>>
>>> I assume that's nothing new.
>>>
>>
>> Before this patch, the limit was 128GB, so in that sense, there is nothing
>> new. If we ever want to increase that limit, we'll just have to lower
>> PAGE_OFFSET: there are still some unused virtual addresses after kasan,
>> for example.
> 
> Linus Walleij is looking into changing the arm32 code to have the kernel
> direct map inside of the vmalloc area, which would be another place
> that you could use here. It would be nice to not have too many different
> ways of doing this, but I'm not sure how hard it would be to rework your
> code, or if there are any downsides of doing this.

This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.

This approach was not very well received, and it only fixed the problem of
implementing a relocatable kernel. The second issue I'm trying to resolve
here is supporting both 3- and 4-level page tables with the same kernel
without making it relocatable (which would introduce a performance
penalty). I can't do that when the kernel mapping is in the vmalloc region,
since the vmalloc region relies on PAGE_OFFSET, which differs between 3-
and 4-level page tables, and that would then require the kernel to be
relocatable.
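
For reference, the dependency looks roughly like this; a simplified
user-space sketch, where the Sv39 value comes from this patch and the
Sv48 value is made up purely to show that PAGE_OFFSET moves between page
table modes:

  #include <stdint.h>
  #include <stdio.h>

  #define PAGE_OFFSET_SV39 0xffffffe000000000ULL /* from this patch    */
  #define PAGE_OFFSET_SV48 0xffffc00000000000ULL /* hypothetical value */

  /* Mirrors the shape of the kernel's definitions: the vmalloc window
   * sits directly below PAGE_OFFSET, so it moves with it. */
  static uint64_t vmalloc_start(uint64_t page_offset, uint64_t size)
  {
          return page_offset - size;
  }

  int main(void)
  {
          uint64_t size = 64ULL << 30; /* 64 GB, as in the table above */

          printf("Sv39 vmalloc: %#llx\n",
                 (unsigned long long)vmalloc_start(PAGE_OFFSET_SV39, size));
          printf("Sv48 vmalloc: %#llx\n",
                 (unsigned long long)vmalloc_start(PAGE_OFFSET_SV48, size));
          /* A kernel mapped inside vmalloc would move with PAGE_OFFSET
           * and would therefore have to be relocatable. */
          return 0;
  }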

Alex

> 
>          Arnd
Arnd Bergmann March 11, 2021, 8:42 a.m. UTC | #5
On Wed, Mar 10, 2021 at 8:12 PM Alex Ghiti <alex@ghiti.fr> wrote:
> On 3/10/21 6:42 AM, Arnd Bergmann wrote:
> > On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>
> >> On 2/25/21 5:34 AM, David Hildenbrand wrote:
> >>>> +                    |            |                  |         |
> >>>> +   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
> >>>> +   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
> >>>> +   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
> >>>> +   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
> >>>> +   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
> >>>> +   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory
> >>>
> >>> ^ So you could never ever have more than 126 GB, correct?
> >>>
> >>> I assume that's nothing new.
> >>>
> >>
> >> Before this patch, the limit was 128GB, so in that sense, there is nothing
> >> new. If we ever want to increase that limit, we'll just have to lower
> >> PAGE_OFFSET: there are still some unused virtual addresses after kasan,
> >> for example.
> >
> > Linus Walleij is looking into changing the arm32 code to have the kernel
> > direct map inside of the vmalloc area, which would be another place
> > that you could use here. It would be nice to not have too many different
> > ways of doing this, but I'm not sure how hard it would be to rework your
> > code, or if there are any downsides of doing this.
>
> This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.
>
> This approach was not very well received, and it only fixed the problem of
> implementing a relocatable kernel. The second issue I'm trying to resolve
> here is supporting both 3- and 4-level page tables with the same kernel
> without making it relocatable (which would introduce a performance
> penalty). I can't do that when the kernel mapping is in the vmalloc region,
> since the vmalloc region relies on PAGE_OFFSET, which differs between 3-
> and 4-level page tables, and that would then require the kernel to be
> relocatable.

Ok, I see.

I suppose it might work if you moved the direct-map to the lowest
address and the vmalloc area (incorporating the kernel mapping,
modules, pio, and fixmap at fixed addresses) to the very top of the
address space, but you probably already considered and rejected
that for other reasons.

         Arnd
Alexandre Ghiti March 13, 2021, 8:23 a.m. UTC | #6
Hi Arnd,

On 3/11/21 3:42 AM, Arnd Bergmann wrote:
> On Wed, Mar 10, 2021 at 8:12 PM Alex Ghiti <alex@ghiti.fr> wrote:
>> On 3/10/21 6:42 AM, Arnd Bergmann wrote:
>>> On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>
>>>> On 2/25/21 5:34 AM, David Hildenbrand wrote:
>>>>>> +                    |            |                  |         |
>>>>>> +   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
>>>>>> +   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
>>>>>> +   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
>>>>>> +   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
>>>>>> +   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
>>>>>> +   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory
>>>>>
>>>>> ^ So you could never ever have more than 126 GB, correct?
>>>>>
>>>>> I assume that's nothing new.
>>>>>
>>>>
>>>> Before this patch, the limit was 128GB, so in that sense, there is nothing
>>>> new. If we ever want to increase that limit, we'll just have to lower
>>>> PAGE_OFFSET: there are still some unused virtual addresses after kasan,
>>>> for example.
>>>
>>> Linus Walleij is looking into changing the arm32 code to have the kernel
>>> direct map inside of the vmalloc area, which would be another place
>>> that you could use here. It would be nice to not have too many different
>>> ways of doing this, but I'm not sure how hard it would be to rework your
>>> code, or if there are any downsides of doing this.
>>
>> This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.
>>
>> This approach was not very well received, and it only fixed the problem of
>> implementing a relocatable kernel. The second issue I'm trying to resolve
>> here is supporting both 3- and 4-level page tables with the same kernel
>> without making it relocatable (which would introduce a performance
>> penalty). I can't do that when the kernel mapping is in the vmalloc region,
>> since the vmalloc region relies on PAGE_OFFSET, which differs between 3-
>> and 4-level page tables, and that would then require the kernel to be
>> relocatable.
> 
> Ok, I see.
> 
> I suppose it might work if you moved the direct-map to the lowest
> address and the vmalloc area (incorporating the kernel mapping,
> modules, pio, and fixmap at fixed addresses) to the very top of the
> address space, but you probably already considered and rejected
> that for other reasons.
> 

Yes, I considered it... when you re-proposed it :) I'm not opposed to your
solution in the vmalloc region, but I can't find any advantage over the
current one, are there any? That would harmonize with Linus's work,
but then we'd be quite different from the x86 address space.

And by the way, thanks for having suggested the current solution in a 
previous conversation :)

Thanks again,

Alex

>           Arnd
>
Arnd Bergmann March 13, 2021, 10:34 p.m. UTC | #7
On Sat, Mar 13, 2021 at 9:23 AM Alex Ghiti <alex@ghiti.fr> wrote:
>
> Yes, I considered it... when you re-proposed it :) I'm not opposed to your
> solution in the vmalloc region, but I can't find any advantage over the
> current one, are there any? That would harmonize with Linus's work,
> but then we'd be quite different from the x86 address space.
>
> And by the way, thanks for having suggested the current solution in a
> previous conversation :)

Ah, I really need to keep better track of what I already commented on...

      Arnd

Patch

diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst
index 6e6e39482502..ea915c196048 100644
--- a/Documentation/riscv/index.rst
+++ b/Documentation/riscv/index.rst
@@ -6,6 +6,7 @@  RISC-V architecture
     :maxdepth: 1
 
     boot-image-header
+    vm-layout
     pmu
     patch-acceptance
 
diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
new file mode 100644
index 000000000000..e8e569e2686a
--- /dev/null
+++ b/Documentation/riscv/vm-layout.rst
@@ -0,0 +1,61 @@ 
+=====================================
+Virtual Memory Layout on RISC-V Linux
+=====================================
+
+:Author: Alexandre Ghiti <alex@ghiti.fr>
+:Date: 12 February 2021
+
+This document describes the virtual memory layout used by the RISC-V Linux
+Kernel.
+
+RISC-V Linux Kernel 32bit
+=========================
+
+RISC-V Linux Kernel SV32
+------------------------
+
+TODO
+
+RISC-V Linux Kernel 64bit
+=========================
+
+The RISC-V privileged architecture document states that 64bit addresses
+"must have bits 63–48 all equal to bit 47, or else a page-fault exception will
+occur." This splits the virtual address space into two halves separated by a
+very big hole: the lower half is where userspace resides, the upper half is
+where the RISC-V Linux Kernel resides.
+
+RISC-V Linux Kernel SV39
+------------------------
+
+::
+
+  ========================================================================================================================
+      Start addr    |   Offset   |     End addr     |  Size   | VM area description
+  ========================================================================================================================
+                    |            |                  |         |
+   0000000000000000 |    0       | 0000003fffffffff |  256 GB | user-space virtual memory, different per mm
+  __________________|____________|__________________|_________|___________________________________________________________
+                    |            |                  |         |
+   0000004000000000 | +256    GB | ffffffbfffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
+                    |            |                  |         |     virtual memory addresses up to the -256 GB
+                    |            |                  |         |     starting offset of kernel mappings.
+  __________________|____________|__________________|_________|___________________________________________________________
+                                                              |
+                                                              | Kernel-space virtual memory, shared between all processes:
+  ____________________________________________________________|___________________________________________________________
+                    |            |                  |         |
+   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
+   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
+   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
+   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
+   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
+   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory
+  __________________|____________|__________________|_________|____________________________________________________________
+                                                              |
+                                                              |
+  ____________________________________________________________|____________________________________________________________
+                    |            |                  |         |
+   ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | modules
+   ffffffff80000000 |   -2    GB | ffffffffffffffff |    2 GB | kernel, BPF
+  __________________|____________|__________________|_________|____________________________________________________________
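
Side note on the new document: the "bits 63–48 all equal to bit 47" rule it
quotes is plain sign-extension. A minimal user-space sketch of such a check
(the helper name is hypothetical, this is not a kernel API):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* An address is canonical iff sign-extending it from bit 47 gives the
   * address back. The arithmetic right shift of a negative value is
   * implementation-defined in C, but behaves as expected on mainstream
   * compilers. */
  static bool is_canonical(uint64_t va)
  {
          return (uint64_t)((int64_t)(va << 16) >> 16) == va;
  }

  int main(void)
  {
          printf("%d\n", is_canonical(0x0000003fffffffffULL)); /* 1: user half   */
          printf("%d\n", is_canonical(0xffffffe000000000ULL)); /* 1: kernel half */
          printf("%d\n", is_canonical(0x0000800000000000ULL)); /* 0: in the hole */
          return 0;
  }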