
[v2,00/12] 52-bit kernel + user VAs

Message ID 20190528161026.13193-1-steve.capper@arm.com (mailing list archive)

Message

Steve Capper May 28, 2019, 4:10 p.m. UTC
This patch series adds support for 52-bit kernel VAs using some of the
machinery already introduced by the 52-bit userspace VA code in 5.0.

As 52-bit virtual address support is an optional hardware feature,
software support for 52-bit kernel VAs needs to be deduced at early boot
time. If HW support is not available, the kernel falls back to 48-bit.
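
As a rough illustration of the check involved (the actual detection in this
series happens in early assembly before the MMU is up, and the config/helper
names below are just placeholders), the LVA capability is advertised in
ID_AA64MMFR2_EL1.VARange:

#include <linux/init.h>
#include <asm/sysreg.h>

/*
 * Illustrative only -- not the early-boot assembly used by this series.
 * ARMv8.2-LVA is advertised in ID_AA64MMFR2_EL1.VARange (bits [19:16]);
 * a non-zero value means 52-bit VAs are supported (with 64K pages).
 */
static unsigned int __init effective_kernel_va_bits(void)
{
	u64 mmfr2 = read_sysreg(id_aa64mmfr2_el1);
	unsigned int varange = (mmfr2 >> 16) & 0xf;	/* VARange field */

	if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && varange)
		return 52;

	return 48;	/* HW support absent: fall back to 48-bit */
}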

A significant proportion of this series focuses on "de-constifying"
VA_BITS related constants.

In order to allow for a KASAN shadow that changes size at boot time, one
must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
start address. Also, it is highly desirable to maintain the same
function addresses in the kernel .text between VA sizes. Both of these
requirements necessitate us to flip the kernel address space halves s.t.
the direct linear map occupies the lower addresses.

In V2 of this series (apologies for the long delay from V1), the major
change is that PAGE_OFFSET is retained as a constant. This allows for
much faster virt_to_page computations. This is achieved by expanding the
size of the VMEMMAP region to accommodate a disjoint 52-bit/48-bit
direct linear map. This has been found to work well in my testing, but I
would appreciate any feedback on this if it needs changing. To aid with
git bisect, this logic is broken down into a few smaller patches.

As far as I'm aware, there are two outstanding issues with this series
that need to be resolved:
 1) Is the code patching for ttbr1_offset safe? I need to analyse this
    a little more,
 2) How can this memory map be advertised to kdump tools/documentation?
    I was planning on getting the kernel VA structure agreed on, then I
    would add the relevant exports/documentation.

Cheers,

Comments

Anshuman Khandual June 7, 2019, 1:53 p.m. UTC | #1
Hello Steve,

On 05/28/2019 09:40 PM, Steve Capper wrote:
> This patch series adds support for 52-bit kernel VAs using some of the
> machinery already introduced by the 52-bit userspace VA code in 5.0.
> 
> As 52-bit virtual address support is an optional hardware feature,
> software support for 52-bit kernel VAs needs to be deduced at early boot
> time. If HW support is not available, the kernel falls back to 48-bit.

Just to summarize.

If the kernel is configured for 52 bits, it just sets up the infrastructure
for a 52-bit kernel VA space.

Then at boot:

a. Detects HW feature	   -> Use 52-bit VA on 52-bit infra
b. Does not detect feature -> Use 48-bit VA on 52-bit infra (adjusted)

> A significant proportion of this series focuses on "de-constifying"
> VA_BITS related constants.

I assume this is required for situation (b), because of the adjustments
which will be required at boot time after detecting that 52-bit is not
supported in the HW.
  
> 
> In order to allow for a KASAN shadow that changes size at boot time, one

Ditto as above ?

> must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
> start address. Also, it is highly desirable to maintain the same

Is there any particular reason why KASAN_SHADOW_START cannot be fixed and
KASAN_SHADOW_END "grow" instead? Is it because we are trying to make the start
address (which will be closer to VA_START) variable for all required sections?

> function addresses in the kernel .text between VA sizes. Both of these

The kernel .text range should remain the same, as the kernel is already loaded
in memory at boot and executing while it is also trying to fix the effective
VA_BITS after detecting (or not) the 52-bit HW feature.

> requirements necessitate us to flip the kernel address space halves s.t.
> the direct linear map occupies the lower addresses.

Still trying to understand all the reasons for this VA space flip here.

The current kernel 48 bit VA range is split into two halves

1. Higher half	- [UL(~0) ...... PAGE_OFFSET] for linear mapping
2. Lower half	- [PAGE_OFFSET ... VA_START]  for everything else

The split in the middle is based on VA_BITS. When that becomes variable, the
boot-time computed lower half sections like kernel text, fixed mappings etc.
become problematic, as they are already running or being used and cannot be
relocated. This is caused by the fact that the 48-bit to 52-bit adjustment can
only happen at the VA_START end, as the other end, UL(~0), is fixed. Hence move
those non-relocatable/fixed sections to the higher half so they don't get
impacted by the 48/52-bit adjustment. The linear mapping (and so PAGE_OFFSET)
on the other hand will have to grow/shrink (or not) during the 48/52-bit
adjustment. Hence it can be aligned with the VA_START end instead. Is that
correct, or am I missing something?
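
For reference, the constants behind that pre-flip split look roughly as
follows (simplified from the 5.2-era arch/arm64/include/asm/memory.h):

/* Simplified, pre-flip layout: */
#define VA_BITS		(CONFIG_ARM64_VA_BITS)
#define VA_START	(UL(0xffffffffffffffff) - (UL(1) << VA_BITS) + 1)
#define PAGE_OFFSET	(UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)

/*
 * With VA_BITS = 48:
 *   VA_START    = 0xffff000000000000  (bottom of the kernel VA space)
 *   PAGE_OFFSET = 0xffff800000000000  (start of the linear map, upper half)
 * Flipping the halves means only the linear map end of the space moves
 * when the boot-time 48/52-bit decision is made.
 */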

> 
> In V2 of this series (apologies for the long delay from V1), the major
> change is that PAGE_OFFSET is retained as a constant. This allows for
> much faster virt_to_page computations. This is achieved by expanding the

virt_to_page(), __va() and __pa() need to be based on just linear offset
calculations, else there will be a performance impact.

> size of the VMEMMAP region to accommodate a disjoint 52-bit/48-bit
> direct linear map. This has been found to work well in my testing, but I

I assume it means that we create a linear mapping for the entire 52-bit VA
space but back it up with a vmemmap struct page mapping only for the actual
number of bits (48 or 52) in use.
Steve Capper June 7, 2019, 2:24 p.m. UTC | #2
On Fri, Jun 07, 2019 at 07:23:59PM +0530, Anshuman Khandual wrote:
> Hello Steve,

Hi Anshuman,

> 
> On 05/28/2019 09:40 PM, Steve Capper wrote:
> > This patch series adds support for 52-bit kernel VAs using some of the
> > machinery already introduced by the 52-bit userspace VA code in 5.0.
> > 
> > As 52-bit virtual address support is an optional hardware feature,
> > software support for 52-bit kernel VAs needs to be deduced at early boot
> > time. If HW support is not available, the kernel falls back to 48-bit.
> 
> Just to summarize.
> 
> If the kernel is configured for 52 bits, it just sets up the infrastructure
> for a 52-bit kernel VA space.
> 
> Then at boot:
> 
> a. Detects HW feature	   -> Use 52-bit VA on 52-bit infra
> b. Does not detect feature -> Use 48-bit VA on 52-bit infra (adjusted)
> 
> > A significant proportion of this series focuses on "de-constifying"
> > VA_BITS related constants.
> 
> I assume this is required for situation (b), because of the adjustments
> which will be required at boot time after detecting that 52-bit is not
> supported in the HW.
>   
> > 
> > In order to allow for a KASAN shadow that changes size at boot time, one
> 
> Ditto as above ?
> 
> > must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
> > start address. Also, it is highly desirable to maintain the same
> 
> Is there any particular reason why KASAN_SHADOW_START cannot be fixed and
> KASAN_SHADOW_END "grow" instead? Is it because we are trying to make the start
> address (which will be closer to VA_START) variable for all required sections?
> 

KASAN has a mode of operation whereby the shadow offset computation:
shadowPtr = (ptr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET 

is inlined into the executable with a constant scale and offset. As we
are dealing with TTBR1 style addresses (i.e. prefixed by 0xfff...) this
effectively means that the KASAN shadow end address becomes fixed (the
highest ptr is always ~0UL which is invariant to VA space size
changes).

The only way that I am aware of fixing the start address is to somehow
patch the KASAN_SHADOW_OFFSET, or prohibit the KASAN inline mode (which
would then hurt performance).
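
As a purely numeric illustration of the above (the offset below is an
arbitrary example value, not the real arm64 KASAN_SHADOW_OFFSET):

#include <stdio.h>

#define KASAN_SHADOW_SCALE_SHIFT 3	/* generic KASAN scale */

static unsigned long shadow(unsigned long addr, unsigned long offset)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + offset;
}

int main(void)
{
	unsigned long off = 0xdfff200000000000UL;	/* arbitrary example offset */

	/* The shadow end depends only on the highest address, ~0UL... */
	printf("shadow end            : %lx\n", shadow(~0UL, off));
	/* ...while the start moves with the size of the TTBR1 VA space. */
	printf("shadow start (48-bit) : %lx\n", shadow(0xffff000000000000UL, off));
	printf("shadow start (52-bit) : %lx\n", shadow(0xfff0000000000000UL, off));
	return 0;
}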

> > function addresses in the kernel .text between VA sizes. Both of these
> 
> The kernel .text range should remain the same, as the kernel is already loaded
> in memory at boot and executing while it is also trying to fix the effective
> VA_BITS after detecting (or not) the 52-bit HW feature.
> 
> > requirements necessitate us to flip the kernel address space halves s.t.
> > the direct linear map occupies the lower addresses.
> 
> Still trying to understand all the reasons for this VA space flip here.
> 
> The current kernel 48 bit VA range is split into two halves
> 
> 1. Higher half	- [UL(~0) ...... PAGE_OFFSET] for linear mapping
> 2. Lower half	- [PAGE_OFFSET ... VA_START]  for everything else
> 
> The split in the middle is based on VA_BITS. When that becomes variable, the
> boot-time computed lower half sections like kernel text, fixed mappings etc.
> become problematic, as they are already running or being used and cannot be
> relocated. This is caused by the fact that the 48-bit to 52-bit adjustment can
> only happen at the VA_START end, as the other end, UL(~0), is fixed. Hence move
> those non-relocatable/fixed sections to the higher half so they don't get
> impacted by the 48/52-bit adjustment. The linear mapping (and so PAGE_OFFSET)
> on the other hand will have to grow/shrink (or not) during the 48/52-bit
> adjustment. Hence it can be aligned with the VA_START end instead. Is that
> correct, or am I missing something?
> 

Agreed with the .text addresses. For PAGE_OFFSET we don't strictly need
it to point to the start of the linear map if we grow the vmemmap and
adjust the (already variable) vmemmap offset (along with physvirt_offset).

Also we need to flip the VA space to fit KASAN in as it will grow from
the start.

> > 
> > In V2 of this series (apologies for the long delay from V1), the major
> > change is that PAGE_OFFSET is retained as a constant. This allows for
> > much faster virt_to_page computations. This is achieved by expanding the
> 
> virt_to_page(), __va() and __pa() need to be based on just linear offset
> calculations, else there will be a performance impact.
> 

IIUC I've maintained equal perf for these, but if I've missed something
please shout :-).
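
Roughly, the property preserved is that the linear map helpers remain pure
offset arithmetic; a simplified sketch of the idea (not the exact macros from
the series), with physvirt_offset and the vmemmap base being the two
variables fixed up once at boot:

/* Set once during early boot, after the effective VA size is known. */
extern s64 physvirt_offset;		/* PHYS_OFFSET - PAGE_OFFSET */
extern struct page *vmemmap;		/* vmemmap base, adjusted at boot */

/* Linear map <-> physical conversions remain a single add/sub. */
#define __lm_to_phys(addr)	((phys_addr_t)((addr) + physvirt_offset))
#define __phys_to_lm(paddr)	((unsigned long)((paddr) - physvirt_offset))

/* virt_to_page() stays a shift and an add, independent of 48 vs 52 bits. */
#define virt_to_page(vaddr) \
	(vmemmap + (__lm_to_phys((unsigned long)(vaddr)) >> PAGE_SHIFT))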

> > size of the VMEMMAP region to accommodate a disjoint 52-bit/48-bit
> > direct linear map. This has been found to work well in my testing, but I
> 
> I assume it means that we create a linear mapping for the entire 52-bit VA
> space but back it up with a vmemmap struct page mapping only for the actual
> number of bits (48 or 52) in use.
>

That is my understanding too.

A big thank you for looking at this!

Cheers,
Bhupesh Sharma June 10, 2019, 10:40 a.m. UTC | #3
Hi Steve,

Thanks for the v2. I have not yet had much time to go through this in
depth and try it out on the LVA-supporting prototype platforms or the
older CPUs (which don't support the ARMv8.2 LVA/LPA extensions) that I
have. Maybe I will give it a quick check on those in a day or two.

On 05/28/2019 09:40 PM, Steve Capper wrote:
> This patch series adds support for 52-bit kernel VAs using some of the
> machinery already introduced by the 52-bit userspace VA code in 5.0.
> 
> As 52-bit virtual address support is an optional hardware feature,
> software support for 52-bit kernel VAs needs to be deduced at early boot
> time. If HW support is not available, the kernel falls back to 48-bit.
> 
> A significant proportion of this series focuses on "de-constifying"
> VA_BITS related constants.
> 
> In order to allow for a KASAN shadow that changes size at boot time, one
> must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
> start address. Also, it is highly desirable to maintain the same
> function addresses in the kernel .text between VA sizes. Both of these
> requirements necessitate us to flip the kernel address space halves s.t.
> the direct linear map occupies the lower addresses.
> 
> In V2 of this series (apologies for the long delay from V1), the major
> change is that PAGE_OFFSET is retained as a constant. This allows for
> much faster virt_to_page computations. This is achieved by expanding the
> size of the VMEMMAP region to accommodate a disjoint 52-bit/48-bit
> direct linear map. This has been found to work well in my testing, but I
> would appreciate any feedback on this if it needs changing. To aid with
> git bisect, this logic is broken down into a few smaller patches.
> 
> As far as I'm aware, there are two outstanding issues with this series
> that need to be resolved:
>   1) Is the code patching for ttbr1_offset safe? I need to analyse this
>      a little more,
>   2) How can this memory map be advertised to kdump tools/documentation?
>      I was planning on getting the kernel VA structure agreed on, then I
>      would add the relevant exports/documentation.


Indeed, in the absence of corresponding changes to the Documentation
section, it is hard to visualize the changes being made in the memory map.

Also, I would suggest that we note in the patchset itself (maybe in the git
log) that kdump tools (or even crash, for that matter) will be broken by
this patchset, to prevent spurious kernel bugs being reported.

BTW, James and I are already discussing more coherent methods (see [0])
to manage this exporting of information to user-land (so that we can
save ourselves from having to export new variables in the vmcoreinfo
in case we have similar changes to the virtual/physical address spaces
in future).

I will work on and send a patchset addressing the same shortly.

[0]. http://lists.infradead.org/pipermail/kexec/2019-June/023105.html

Thanks,
Bhupesh
Catalin Marinas June 10, 2019, 10:54 a.m. UTC | #4
On Mon, Jun 10, 2019 at 04:10:50PM +0530, Bhupesh Sharma wrote:
> On 05/28/2019 09:40 PM, Steve Capper wrote:
> >   2) How can this memory map be advertised to kdump tools/documentation?
> >      I was planning on getting the kernel VA structure agreed on, then I
> >      would add the relevant exports/documentation.
> 
> Indeed, in the absence of corresponding changes to the Documentation
> section, it is hard to visualize the changes being made in the memory
> map.

We used to have some better documentation in the arm64 memory.txt until
commit 08375198b010 ("arm64: Determine the vmalloc/vmemmap space at
build time based on VA_BITS") which removed it in favour of what the
kernel was printing. Subsequently, the kernel VA layout printing was
also removed. It would be nice to bring back the memory.txt, even if it
is for a single configuration as per defconfig.
Bhupesh Sharma June 10, 2019, 11:15 a.m. UTC | #5
Hi Catalin,

On Mon, Jun 10, 2019 at 4:24 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Mon, Jun 10, 2019 at 04:10:50PM +0530, Bhupesh Sharma wrote:
> > On 05/28/2019 09:40 PM, Steve Capper wrote:
> > >   2) How can this memory map be advertised to kdump tools/documentation?
> > >      I was planning on getting the kernel VA structure agreed on, then I
> > >      would add the relevant exports/documentation.
> >
> > Indeed, in the absence of corresponding changes to the Documentation
> > section, it is hard to visualize the changes being made in the memory
> > map.
>
> We used to have some better documentation in the arm64 memory.txt until
> commit 08375198b010 ("arm64: Determine the vmalloc/vmemmap space at
> build time based on VA_BITS") which removed it in favour of what the
> kernel was printing. Subsequently, the kernel VA layout printing was
> also removed. It would be nice to bring back the memory.txt, even if it
> is for a single configuration as per defconfig.

Indeed, that's what I suggested during the v1 review as well. See
<https://www.spinics.net/lists/arm-kernel/msg718096.html> for details.

Also, we may want to have a doc dedicated to 52-bit address space
details on arm64, similar to what we have currently for x86 (see [1a]
and [1b])

[1a]. https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/5level-paging.txt
[1b]. https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt

Thanks,
Bhupesh