[v3,0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores

Message ID 20200224121733.2202-1-ardb@kernel.org (mailing list archive)

Ard Biesheuvel Feb. 24, 2020, 12:17 p.m. UTC
While making changes to the EFI stub startup code, I noticed that we are
still doing set/way maintenance on the caches when booting on v7 cores.
This works today on VMs by virtue of the fact that KVM traps set/way ops
and cleans the whole address space by VA on behalf of the guest, and on
most v7 hardware, the set/way ops are in fact sufficient when only one
core is running, as there usually is no system cache. But on systems
like SynQuacer, for which 32-bit firmware is available, the current cache
maintenance only pushes the data out to the L3 system cache, where it
is not visible to the CPU once it turns the MMU and caches off.

So instead, switch to the by-VA cache maintenance that the architecture
requires for v7 and later (and ARM1176, as a side effect).

Changes since v2:
- add a patch to factor out the code sequence that obtains the inflated image
  size by doing an unaligned LE32 load from the end of the compressed data
- use new macro to load the inflated image size instead of doing a potentially
  unaligned load
- avoid using the stack when obtaining the base and size of the self-relocated
  zImage
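
As an illustration of the first two items above, here is a minimal C sketch of
reading a little-endian 32-bit size from the last four bytes of a buffer without
performing an unaligned word load. The series itself implements this as an
assembler macro in head.S; the function names below (load_le32_unaligned,
inflated_size) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a little-endian 32-bit value one byte at a
 * time, so the alignment of p does not matter.
 */
static uint32_t load_le32_unaligned(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: assume the inflated image size is stored as an LE32
 * word in the final four bytes of the compressed data.
 */
static uint32_t inflated_size(const uint8_t *data, size_t len)
{
    assert(len >= 4);
    return load_le32_unaligned(data + len - 4);
}
```

Because the value is assembled from individual byte loads, the result is the
same regardless of where the compressed data happens to end in memory.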

Changes since v1:
- include the EFI patch that was sent out separately before (#1)
- split the preparatory work to pass the region to clean in r0/r1 into an
  EFI-specific patch and one for the decompressor - this way, the first two
  patches can go on a stable branch that is shared between the ARM tree and
  the EFI tree
- document the meaning of the values in r0/r1 upon entry to cache_clean_flush
- take care to treat the region end address as exclusive
- switch to clean+invalidate to align with the other implementations
- drop some code that manages the stack pointer value before calling
  cache_clean_flush(), which is no longer necessary
- take care to clean the entire region that is covered by the relocated zImage
  if it needs to relocate itself before decompressing
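
To illustrate the range handling described above (start and end passed in, end
treated as exclusive, clean+invalidate per line), here is a host-side C sketch
of the loop structure. It is not the code from the series, which is ARM
assembly in head.S. The names clean_inval_range, line_op_t and
record_last_line are hypothetical, and the per-line operation is a callback so
that the sketch can run anywhere; on a v7 core the callback would correspond to
a clean+invalidate by VA (DCCIMVAC), with the line size derived from the
hardware rather than passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-line operation, standing in for clean+invalidate by VA. */
typedef void (*line_op_t)(uintptr_t line_addr, void *ctx);

/*
 * Walk [start, end) one cache line at a time. The end address is exclusive:
 * a line that begins exactly at 'end' is not touched. Returns the number of
 * lines visited. Assumes line_size is a power of two.
 */
static unsigned clean_inval_range(uintptr_t start, uintptr_t end,
                                  uintptr_t line_size, line_op_t op, void *ctx)
{
    unsigned count = 0;

    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    assert(start <= end);

    if (start == end)
        return 0;                       /* empty region, nothing to do */

    /* Round the start down so a partially covered first line is included. */
    for (uintptr_t addr = start & ~(line_size - 1); addr < end;
         addr += line_size) {
        op(addr, ctx);
        count++;
    }
    return count;
}

/* Example callback: remember the address of the last line operated on. */
static void record_last_line(uintptr_t line_addr, void *ctx)
{
    *(uintptr_t *)ctx = line_addr;
}
```

With 64-byte lines, the region 0x1010..0x1080 covers the lines at 0x1000 and
0x1040 only; the line at 0x1080 is left alone because the end is exclusive.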

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm32-efi-cache-ops

[ Several people asked me offline why on earth I am running SynQuacer on 32 bit:
  the answer is that this is simply to prove that it is currently broken, and
  this implies that for 32-bit VMs running under KVM, we are relying on the
  special, non-architectural cache management done by the hypervisor on behalf
  of the guest to be able to run this code. ]

Cc: Russell King <linux@armlinux.org.uk>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Linus Walleij <linus.walleij@linaro.org>

Ard Biesheuvel (5):
  efi/arm: Work around missing cache maintenance in decompressor
    handover
  efi/arm: Pass start and end addresses to cache_clean_flush()
  ARM: decompressor: factor out routine to obtain the inflated image
    size
  ARM: decompressor: prepare cache_clean_flush for doing by-VA
    maintenance
  ARM: decompressor: switch to by-VA cache maintenance for v7 cores

 arch/arm/boot/compressed/head.S | 166 +++++++++++---------
 1 file changed, 91 insertions(+), 75 deletions(-)

Comments

Linus Walleij Feb. 25, 2020, 3:48 p.m. UTC | #1
On Mon, Feb 24, 2020 at 1:17 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> While making changes to the EFI stub startup code, I noticed that we are
> still doing set/way maintenance on the caches when booting on v7 cores.
> This works today on VMs by virtue of the fact that KVM traps set/way ops
> and cleans the whole address space by VA on behalf of the guest, and on
> most v7 hardware, the set/way ops are in fact sufficient when only one
> core is running, as there usually is no system cache. But on systems
> like SynQuacer, for which 32-bit firmware is available, the current cache
> maintenance only pushes the data out to the L3 system cache, where it
> is not visible to the CPU once it turns the MMU and caches off.
>
> So instead, switch to the by-VA cache maintenance that the architecture
> requires for v7 and later (and ARM1176, as a side effect).

I took this v3 patch set for a ride on some ARMv7 and ARMv6
(hardware) boards using zImage:s so the compressed path
should be exercised:

- Ux500 (ARMv7 Cortex A9 x 2) works like a charm
- RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm

Tested-by: Linus Walleij <linus.walleij@linaro.org>

I can do more thorough tests with more boards if needed.

Yours,
Linus Walleij

Ard Biesheuvel Feb. 25, 2020, 5:18 p.m. UTC | #2
On Tue, 25 Feb 2020 at 16:48, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Mon, Feb 24, 2020 at 1:17 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > While making changes to the EFI stub startup code, I noticed that we are
> > still doing set/way maintenance on the caches when booting on v7 cores.
> > This works today on VMs by virtue of the fact that KVM traps set/way ops
> > and cleans the whole address space by VA on behalf of the guest, and on
> > most v7 hardware, the set/way ops are in fact sufficient when only one
> > core is running, as there usually is no system cache. But on systems
> > like SynQuacer, for which 32-bit firmware is available, the current cache
> > maintenance only pushes the data out to the L3 system cache, where it
> > is not visible to the CPU once it turns the MMU and caches off.
> >
> > So instead, switch to the by-VA cache maintenance that the architecture
> > requires for v7 and later (and ARM1176, as a side effect).
>
> I took this v3 patch set for a ride on some ARMv7 and ARMv6
> (hardware) boards using zImage:s so the compressed path
> should be exercised:
>
> - Ux500 (ARMv7 Cortex A9 x 2) works like a charm
> - RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm
>
> Tested-by: Linus Walleij <linus.walleij@linaro.org>
>
> I can do more thorough tests with more boards if needed.
>

Thanks Linus. Do you happen to have any boards that boot with appended DTB?

Ard Biesheuvel Feb. 25, 2020, 5:30 p.m. UTC | #3
On Tue, 25 Feb 2020 at 18:18, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 25 Feb 2020 at 16:48, Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > On Mon, Feb 24, 2020 at 1:17 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > While making changes to the EFI stub startup code, I noticed that we are
> > > still doing set/way maintenance on the caches when booting on v7 cores.
> > > This works today on VMs by virtue of the fact that KVM traps set/way ops
> > > and cleans the whole address space by VA on behalf of the guest, and on
> > > most v7 hardware, the set/way ops are in fact sufficient when only one
> > > core is running, as there usually is no system cache. But on systems
> > > like SynQuacer, for which 32-bit firmware is available, the current cache
> > > maintenance only pushes the data out to the L3 system cache, where it
> > > is not visible to the CPU once it turns the MMU and caches off.
> > >
> > > So instead, switch to the by-VA cache maintenance that the architecture
> > > requires for v7 and later (and ARM1176, as a side effect).
> >
> > I took this v3 patch set for a ride on some ARMv7 and ARMv6
> > (hardware) boards using zImage:s so the compressed path
> > should be exercised:
> >
> > - Ux500 (ARMv7 Cortex A9 x 2) works like a charm
> > - RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm
> >
> > Tested-by: Linus Walleij <linus.walleij@linaro.org>
> >
> > I can do more thorough tests with more boards if needed.
> >
>
> Thanks Linus. Do you happen to have any boards that boot with appended DTB?

Actually, I can easily test that myself as well in QEMU.

Linus Walleij Feb. 25, 2020, 9:25 p.m. UTC | #4
On Tue, Feb 25, 2020 at 6:18 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Tue, 25 Feb 2020 at 16:48, Linus Walleij <linus.walleij@linaro.org> wrote:

> > I took this v3 patch set for a ride on some ARMv7 and ARMv6
> > (hardware) boards using zImage:s so the compressed path
> > should be exercised:
> >
> > - Ux500 (ARMv7 Cortex A9 x 2) works like a charm
> > - RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm
> >
> > Tested-by: Linus Walleij <linus.walleij@linaro.org>
> >
> > I can do more thorough tests with more boards if needed.
>
> Thanks Linus. Do you happen to have any boards that boot with appended DTB?

Oh, both of these use appended DTB so it's definitely working.

Yours,
Linus Walleij