diff mbox series

[PULL,18/38] hw/arm/virt: Honor highmem setting when computing the memory map

Message ID 20220120123630.267975-19-peter.maydell@linaro.org (mailing list archive)
State New, archived
Headers show
Series [PULL,01/38] hw/arm/virt: KVM: Enable PAuth when supported by the host | expand

Commit Message

Peter Maydell Jan. 20, 2022, 12:36 p.m. UTC
From: Marc Zyngier <maz@kernel.org>

Even when the VM is configured with highmem=off, the highest_gpa
field includes devices that are above the 4GiB limit.
Similarily, nothing seem to check that the memory is within
the limit set by the highmem=off option.

This leads to failures in virt_kvm_type() on systems that have
a crippled IPA range, as the reported IPA space is larger than
what it should be.

Instead, honor the user-specified limit to only use the devices
at the lowest end of the spectrum, and fail if we have memory
crossing the 4GiB limit.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-id: 20220114140741.1358263-4-maz@kernel.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Akihiko Odaki Feb. 13, 2022, 5:05 a.m. UTC | #1
On 2022/01/20 21:36, Peter Maydell wrote:
> From: Marc Zyngier <maz@kernel.org>
> 
> Even when the VM is configured with highmem=off, the highest_gpa
> field includes devices that are above the 4GiB limit.
> Similarily, nothing seem to check that the memory is within
> the limit set by the highmem=off option.
> 
> This leads to failures in virt_kvm_type() on systems that have
> a crippled IPA range, as the reported IPA space is larger than
> what it should be.
> 
> Instead, honor the user-specified limit to only use the devices
> at the lowest end of the spectrum, and fail if we have memory
> crossing the 4GiB limit.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Message-id: 20220114140741.1358263-4-maz@kernel.org
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   hw/arm/virt.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 62bdce1eb4b..3b839ba78ba 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1670,7 +1670,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>   static void virt_set_memmap(VirtMachineState *vms)
>   {
>       MachineState *ms = MACHINE(vms);
> -    hwaddr base, device_memory_base, device_memory_size;
> +    hwaddr base, device_memory_base, device_memory_size, memtop;
>       int i;
>   
>       vms->memmap = extended_memmap;
> @@ -1697,7 +1697,11 @@ static void virt_set_memmap(VirtMachineState *vms)
>       device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB;
>   
>       /* Base address of the high IO region */
> -    base = device_memory_base + ROUND_UP(device_memory_size, GiB);
> +    memtop = base = device_memory_base + ROUND_UP(device_memory_size, GiB);
> +    if (!vms->highmem && memtop > 4 * GiB) {
> +        error_report("highmem=off, but memory crosses the 4GiB limit\n");
> +        exit(EXIT_FAILURE);
> +    }
>       if (base < device_memory_base) {
>           error_report("maxmem/slots too huge");
>           exit(EXIT_FAILURE);
> @@ -1714,7 +1718,7 @@ static void virt_set_memmap(VirtMachineState *vms)
>           vms->memmap[i].size = size;
>           base += size;
>       }
> -    vms->highest_gpa = base - 1;
> +    vms->highest_gpa = (vms->highmem ? base : memtop) - 1;
>       if (device_memory_size > 0) {
>           ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
>           ms->device_memory->base = device_memory_base;

Hi,
This breaks in a case where highmem is disabled but can have more than 4 
GiB of RAM. M1 (Apple Silicon) actually can have 36-bit PA with HVF, 
which is not enough for highmem MMIO but is enough to contain 32 GiB of RAM.

Where the magic number of 4 GiB / 32-bit came from? I also don't quite 
understand what failures virt_kvm_type() had.

Regards,
Akihiko Odaki
Marc Zyngier Feb. 13, 2022, 10:22 a.m. UTC | #2
[+ Alex for HVF]

On Sun, 13 Feb 2022 05:05:33 +0000,
Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> 
> On 2022/01/20 21:36, Peter Maydell wrote:
> > From: Marc Zyngier <maz@kernel.org>
> > 
> > Even when the VM is configured with highmem=off, the highest_gpa
> > field includes devices that are above the 4GiB limit.
> > Similarily, nothing seem to check that the memory is within
> > the limit set by the highmem=off option.
> > 
> > This leads to failures in virt_kvm_type() on systems that have
> > a crippled IPA range, as the reported IPA space is larger than
> > what it should be.
> > 
> > Instead, honor the user-specified limit to only use the devices
> > at the lowest end of the spectrum, and fail if we have memory
> > crossing the 4GiB limit.
> > 
> > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Message-id: 20220114140741.1358263-4-maz@kernel.org
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> >   hw/arm/virt.c | 10 +++++++---
> >   1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 62bdce1eb4b..3b839ba78ba 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1670,7 +1670,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
> >   static void virt_set_memmap(VirtMachineState *vms)
> >   {
> >       MachineState *ms = MACHINE(vms);
> > -    hwaddr base, device_memory_base, device_memory_size;
> > +    hwaddr base, device_memory_base, device_memory_size, memtop;
> >       int i;
> >         vms->memmap = extended_memmap;
> > @@ -1697,7 +1697,11 @@ static void virt_set_memmap(VirtMachineState *vms)
> >       device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB;
> >         /* Base address of the high IO region */
> > -    base = device_memory_base + ROUND_UP(device_memory_size, GiB);
> > +    memtop = base = device_memory_base + ROUND_UP(device_memory_size, GiB);
> > +    if (!vms->highmem && memtop > 4 * GiB) {
> > +        error_report("highmem=off, but memory crosses the 4GiB limit\n");
> > +        exit(EXIT_FAILURE);
> > +    }
> >       if (base < device_memory_base) {
> >           error_report("maxmem/slots too huge");
> >           exit(EXIT_FAILURE);
> > @@ -1714,7 +1718,7 @@ static void virt_set_memmap(VirtMachineState *vms)
> >           vms->memmap[i].size = size;
> >           base += size;
> >       }
> > -    vms->highest_gpa = base - 1;
> > +    vms->highest_gpa = (vms->highmem ? base : memtop) - 1;
> >       if (device_memory_size > 0) {
> >           ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
> >           ms->device_memory->base = device_memory_base;
> 
> Hi,
> This breaks in a case where highmem is disabled but can have more than
> 4 GiB of RAM. M1 (Apple Silicon) actually can have 36-bit PA with HVF,
> which is not enough for highmem MMIO but is enough to contain 32 GiB
> of RAM.

Funny. The whole point of this series is to make it all work correctly
on M1.

> Where the magic number of 4 GiB / 32-bit came from?

Not exactly a magic number. From QEMU's docs/system/arm/virt.rst:

<quote>
highmem
  Set ``on``/``off`` to enable/disable placing devices and RAM in physical
  address space above 32 bits. The default is ``on`` for machine types
  later than ``virt-2.12``.
</quote>

TL;DR: Removing the bogus 'highmem=off' option from your command-line
should get you going with large memory spaces, up to the IPA limit.

The fact that you could run with 32GB of RAM while mandating that the
guest IPA space was limited to 32bit was nothing but a bug, further
"exploited" by HVF to allow disabling the highhmem devices which are
out of reach given the HW limitations (see [1] for details on the
discussion, specially around patch 3).

This is now fixed, and has been extended to work with any IPA size
(including 36bit machines such as M1).

> I also don't quite understand what failures virt_kvm_type() had.

QEMU works by first computing the memory map and passing the required
IPA limit to KVM as part of the VM type. By failing to take into
account the initial limit requirements to the IPA space (either via a
command-line option such as 'highmem', or by using the value provided
by KVM itself), QEMU would try to create a VM that cannot run on the
HW, and KVM would simply return an error.

All of this is documented as part of the KVM/arm64 API [2]. And with
this fixed, QEMU is able to correctly drive KVM on M1.

	M.

[1] https://lore.kernel.org/all/20210822144441.1290891-1-maz@kernel.org
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/virt/kvm/api.rst#n138
Peter Maydell Feb. 13, 2022, 10:45 a.m. UTC | #3
On Sun, 13 Feb 2022 at 10:22, Marc Zyngier <maz@kernel.org> wrote:
>
> [+ Alex for HVF]
>
> On Sun, 13 Feb 2022 05:05:33 +0000,
> Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > Hi,
> > This breaks in a case where highmem is disabled but can have more than
> > 4 GiB of RAM. M1 (Apple Silicon) actually can have 36-bit PA with HVF,
> > which is not enough for highmem MMIO but is enough to contain 32 GiB
> > of RAM.
>
> Funny. The whole point of this series is to make it all work correctly
> on M1.
>
> > Where the magic number of 4 GiB / 32-bit came from?
>
> Not exactly a magic number. From QEMU's docs/system/arm/virt.rst:
>
> <quote>
> highmem
>   Set ``on``/``off`` to enable/disable placing devices and RAM in physical
>   address space above 32 bits. The default is ``on`` for machine types
>   later than ``virt-2.12``.
> </quote>
>
> TL;DR: Removing the bogus 'highmem=off' option from your command-line
> should get you going with large memory spaces, up to the IPA limit.

Yep. I've tested this with hvf, and we now correctly:
 * refuse to put RAM above 32-bits if you asked for a 32-bit
   IPA space with highmem=off
 * use the full 36-bit address space if you don't say highmem=off
   on an M1

Note that there is a macos bug where if you don't say highmem=off
on an M1 Pro then you'll get a macos kernel panic. M1 non-Pro is fine.

thanks
-- PMM
Akihiko Odaki Feb. 13, 2022, 11:38 a.m. UTC | #4
On Sun, Feb 13, 2022 at 7:46 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Sun, 13 Feb 2022 at 10:22, Marc Zyngier <maz@kernel.org> wrote:
> >
> > [+ Alex for HVF]
> >
> > On Sun, 13 Feb 2022 05:05:33 +0000,
> > Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > > Hi,
> > > This breaks in a case where highmem is disabled but can have more than
> > > 4 GiB of RAM. M1 (Apple Silicon) actually can have 36-bit PA with HVF,
> > > which is not enough for highmem MMIO but is enough to contain 32 GiB
> > > of RAM.
> >
> > Funny. The whole point of this series is to make it all work correctly
> > on M1.
> >
> > > Where the magic number of 4 GiB / 32-bit came from?
> >
> > Not exactly a magic number. From QEMU's docs/system/arm/virt.rst:
> >
> > <quote>
> > highmem
> >   Set ``on``/``off`` to enable/disable placing devices and RAM in physical
> >   address space above 32 bits. The default is ``on`` for machine types
> >   later than ``virt-2.12``.
> > </quote>
> >
> > TL;DR: Removing the bogus 'highmem=off' option from your command-line
> > should get you going with large memory spaces, up to the IPA limit.
>
> Yep. I've tested this with hvf, and we now correctly:
>  * refuse to put RAM above 32-bits if you asked for a 32-bit
>    IPA space with highmem=off
>  * use the full 36-bit address space if you don't say highmem=off
>    on an M1
>
> Note that there is a macos bug where if you don't say highmem=off
> on an M1 Pro then you'll get a macos kernel panic. M1 non-Pro is fine.
>
> thanks
> -- PMM

I found that it actually gets the available PA bit of the emulated CPU
when highmem=on. I used "cortex-a72", which can have more than 36
bits. I just simply switched to "host"; hvf didn't support "host" when
I set up my VM but now it does.
Thanks for your prompt replies.

Regards,
Akihiko Odaki
Peter Maydell Feb. 13, 2022, 12:57 p.m. UTC | #5
On Sun, 13 Feb 2022 at 11:38, Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> I found that it actually gets the available PA bit of the emulated CPU
> when highmem=on. I used "cortex-a72", which can have more than 36
> bits. I just simply switched to "host"; hvf didn't support "host" when
> I set up my VM but now it does.

It's a bug that we accept 'cortex-a72' there. What should happen
is something like:
 * we want to use the ID register values of a cortex-a72
 * QEMU's hvf layer should say "no, that doesn't match the actual
   CPU we're running on", and give an error

This works correctly with KVM because there the kernel refuses
attempts to set ID registers to values that don't match the host;
for hvf the hvf APIs do permit lying to the guest about ID register
values so we need to do the check ourselves.

(The other approach would be to check the ID register values
and allow them to the extent that the host CPU actually has
the support for the features they imply, so you could "downgrade"
to a less capable CPU but not tell the guest it has feature X
if it isn't really there. But this is (a) a lot more complicated
and (b) gets into the swamp of trying to figure out how to tell
the guest about CPU errata -- the guest needs to apply errata
workarounds for the real host CPU, not for whatever the emulated
CPU is. So "just reject anything that's not an exact match" is
the easy approach.)

-- PMM
diff mbox series

Patch

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 62bdce1eb4b..3b839ba78ba 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1670,7 +1670,7 @@  static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 static void virt_set_memmap(VirtMachineState *vms)
 {
     MachineState *ms = MACHINE(vms);
-    hwaddr base, device_memory_base, device_memory_size;
+    hwaddr base, device_memory_base, device_memory_size, memtop;
     int i;
 
     vms->memmap = extended_memmap;
@@ -1697,7 +1697,11 @@  static void virt_set_memmap(VirtMachineState *vms)
     device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB;
 
     /* Base address of the high IO region */
-    base = device_memory_base + ROUND_UP(device_memory_size, GiB);
+    memtop = base = device_memory_base + ROUND_UP(device_memory_size, GiB);
+    if (!vms->highmem && memtop > 4 * GiB) {
+        error_report("highmem=off, but memory crosses the 4GiB limit\n");
+        exit(EXIT_FAILURE);
+    }
     if (base < device_memory_base) {
         error_report("maxmem/slots too huge");
         exit(EXIT_FAILURE);
@@ -1714,7 +1718,7 @@  static void virt_set_memmap(VirtMachineState *vms)
         vms->memmap[i].size = size;
         base += size;
     }
-    vms->highest_gpa = base - 1;
+    vms->highest_gpa = (vms->highmem ? base : memtop) - 1;
     if (device_memory_size > 0) {
         ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
         ms->device_memory->base = device_memory_base;