
[v3,3/5] hw/arm/virt: Honor highmem setting when computing the memory map

Message ID 20211227211642.994461-4-maz@kernel.org (mailing list archive)
State New, archived
Series target/arm: Reduced-IPA space and highmem=off fixes

Commit Message

Marc Zyngier Dec. 27, 2021, 9:16 p.m. UTC
Even when the VM is configured with highmem=off, the highest_gpa
field includes devices that are above the 4GiB limit.
Similarly, nothing seems to check that the memory is within
the limit set by the highmem=off option.

This leads to failures in virt_kvm_type() on systems that have
a crippled IPA range, as the reported IPA space is larger than
what it should be.

Instead, honor the user-specified limit to only use the devices
at the lowest end of the spectrum, and fail if we have memory
crossing the 4GiB limit.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 hw/arm/virt.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Eric Auger Jan. 5, 2022, 9:22 a.m. UTC | #1
Hi Marc,

On 12/27/21 10:16 PM, Marc Zyngier wrote:
> Even when the VM is configured with highmem=off, the highest_gpa
> field includes devices that are above the 4GiB limit.
> Similarily, nothing seem to check that the memory is within
> the limit set by the highmem=off option.
>
> This leads to failures in virt_kvm_type() on systems that have
> a crippled IPA range, as the reported IPA space is larger than
> what it should be.
>
> Instead, honor the user-specified limit to only use the devices
> at the lowest end of the spectrum, and fail if we have memory
> crossing the 4GiB limit.
>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  hw/arm/virt.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 8b600d82c1..84dd3b36fb 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1678,6 +1678,11 @@ static void virt_set_memmap(VirtMachineState *vms)
>          exit(EXIT_FAILURE);
>      }
>  
> +    if (!vms->highmem &&
> +        vms->memmap[VIRT_MEM].base + ms->maxram_size > 4 * GiB) {
> +        error_report("highmem=off, but memory crosses the 4GiB limit\n");
> +        exit(EXIT_FAILURE);

The memory is composed of initial memory and device memory.
Device memory is placed after the initial memory but with a 1GiB
alignment, and on top of that you have 1GiB of alignment per device
memory slot,

so potentially the highest memory address is larger than
vms->memmap[VIRT_MEM].base + ms->maxram_size.
I would rather do the check on device_memory_base + device_memory_size.
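
For illustration, the check Eric suggests might look something like the
sketch below, reusing the device_memory_base/device_memory_size locals
that virt_set_memmap() already computes (where exactly it would sit in
the function is an assumption, not part of the posted patch):

    if (!vms->highmem &&
        device_memory_base + device_memory_size > 4 * GiB) {
        error_report("highmem=off, but RAM (including device memory) "
                     "crosses the 4GiB limit");
        exit(EXIT_FAILURE);
    }
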
> +    }
>      /*
>       * We compute the base of the high IO region depending on the
>       * amount of initial and device memory. The device memory start/size
> @@ -1707,7 +1712,9 @@ static void virt_set_memmap(VirtMachineState *vms)
>          vms->memmap[i].size = size;
>          base += size;
>      }
> -    vms->highest_gpa = base - 1;
> +    vms->highest_gpa = (vms->highmem ?
> +                        base :
> +                        vms->memmap[VIRT_MEM].base + ms->maxram_size) - 1;
As per the previous comment this looks wrong to me if !highmem.

If !highmem and RAM requirements are low, we could still benefit from
REDIST2 and HIGH ECAM, which could fit within the 4GB limit. But maybe
we simply don't care? If we don't, why don't we simply skip the
extended_memmap overlay as suggested in v2? Sorry, I did not get your
reply.

Thanks

Eric
>      if (device_memory_size > 0) {
>          ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
>          ms->device_memory->base = device_memory_base;
Eric Auger Jan. 5, 2022, 9:36 a.m. UTC | #2
Hi Marc,

On 1/5/22 10:22 AM, Eric Auger wrote:
> Hi Marc,
> 
> On 12/27/21 10:16 PM, Marc Zyngier wrote:
>> Even when the VM is configured with highmem=off, the highest_gpa
>> field includes devices that are above the 4GiB limit.
>> Similarily, nothing seem to check that the memory is within
>> the limit set by the highmem=off option.
>>
>> This leads to failures in virt_kvm_type() on systems that have
>> a crippled IPA range, as the reported IPA space is larger than
>> what it should be.
>>
>> Instead, honor the user-specified limit to only use the devices
>> at the lowest end of the spectrum, and fail if we have memory
>> crossing the 4GiB limit.
>>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>  hw/arm/virt.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 8b600d82c1..84dd3b36fb 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -1678,6 +1678,11 @@ static void virt_set_memmap(VirtMachineState *vms)
>>          exit(EXIT_FAILURE);
>>      }
>>  
>> +    if (!vms->highmem &&
>> +        vms->memmap[VIRT_MEM].base + ms->maxram_size > 4 * GiB) {
>> +        error_report("highmem=off, but memory crosses the 4GiB limit\n");
>> +        exit(EXIT_FAILURE);
> 
> The memory is composed of initial memory and device memory.
> device memory is put after the initial memory but has a 1GB alignment
> On top of that you have 1G page alignment per device memory slot
> 
> so potentially the highest mem address is larger than
> vms->memmap[VIRT_MEM].base + ms->maxram_size.
> I would rather do the check on device_memory_base + device_memory_size
>> +    }
>>      /*
>>       * We compute the base of the high IO region depending on the
>>       * amount of initial and device memory. The device memory start/size
>> @@ -1707,7 +1712,9 @@ static void virt_set_memmap(VirtMachineState *vms)
>>          vms->memmap[i].size = size;
>>          base += size;
>>      }
>> -    vms->highest_gpa = base - 1;
>> +    vms->highest_gpa = (vms->highmem ?
>> +                        base :
>> +                        vms->memmap[VIRT_MEM].base + ms->maxram_size) - 1;
> As per the previous comment this looks wrong to me if !highmem.
> 
> If !highmem, if RAM requirements are low we still could get benefit from
> REDIST2 and HIGH ECAM which could fit within the 4GB limit. But maybe we
The above assertion is wrong, sorry: we kept the legacy memory map in
that situation (the HIGH regions are always put above 256GB). So we can
skip the extended memmap anyway if !highmem.

Eric
> simply don't care? If we don't, why don't we simply skip the
> extended_memmap overlay as suggested in v2? I did not get your reply sorry.
> 
> Thanks
> 
> Eric
>>      if (device_memory_size > 0) {
>>          ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
>>          ms->device_memory->base = device_memory_base;
>
Marc Zyngier Jan. 6, 2022, 9:26 p.m. UTC | #3
On Wed, 05 Jan 2022 09:22:39 +0000,
Eric Auger <eric.auger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 12/27/21 10:16 PM, Marc Zyngier wrote:
> > Even when the VM is configured with highmem=off, the highest_gpa
> > field includes devices that are above the 4GiB limit.
> > Similarily, nothing seem to check that the memory is within
> > the limit set by the highmem=off option.
> >
> > This leads to failures in virt_kvm_type() on systems that have
> > a crippled IPA range, as the reported IPA space is larger than
> > what it should be.
> >
> > Instead, honor the user-specified limit to only use the devices
> > at the lowest end of the spectrum, and fail if we have memory
> > crossing the 4GiB limit.
> >
> > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  hw/arm/virt.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 8b600d82c1..84dd3b36fb 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1678,6 +1678,11 @@ static void virt_set_memmap(VirtMachineState *vms)
> >          exit(EXIT_FAILURE);
> >      }
> >  
> > +    if (!vms->highmem &&
> > +        vms->memmap[VIRT_MEM].base + ms->maxram_size > 4 * GiB) {
> > +        error_report("highmem=off, but memory crosses the 4GiB limit\n");
> > +        exit(EXIT_FAILURE);
> 
> The memory is composed of initial memory and device memory.
> device memory is put after the initial memory but has a 1GB alignment
> On top of that you have 1G page alignment per device memory slot
> 
> so potentially the highest mem address is larger than
> vms->memmap[VIRT_MEM].base + ms->maxram_size.
> I would rather do the check on device_memory_base + device_memory_size

Yup, that's a good point.

There is also a corner case in one of the later patches where I check
this limit against the PA using the rounded-up device_memory_size.
This could result in returning an error if the last memory slot would
still fit in the PA space, but the rounded-up quantity wouldn't. I
don't think it matters much, but I'll fix it anyway.
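
To make that corner case concrete (made-up numbers; the 1GiB-per-slot
padding comes from the ms->ram_slots * GiB term in virt_set_memmap()):

    PA limit                                  : 1TiB
    device_memory_base                        : 1TiB - 3GiB
    hotpluggable RAM (maxram_size - ram_size) : 1GiB
    ram_slots                                 : 4
    device_memory_size                        : 1GiB + 4 * 1GiB = 5GiB
    padded end used by the check              : 1TiB + 2GiB -> rejected
    actual end of the plugged RAM             : well below 1TiB -> fits
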

> > +    }
> >      /*
> >       * We compute the base of the high IO region depending on the
> >       * amount of initial and device memory. The device memory start/size
> > @@ -1707,7 +1712,9 @@ static void virt_set_memmap(VirtMachineState *vms)
> >          vms->memmap[i].size = size;
> >          base += size;
> >      }
> > -    vms->highest_gpa = base - 1;
> > +    vms->highest_gpa = (vms->highmem ?
> > +                        base :
> > +                        vms->memmap[VIRT_MEM].base + ms->maxram_size) - 1;
> As per the previous comment this looks wrong to me if !highmem.

Agreed.

> If !highmem, if RAM requirements are low we still could get benefit from
> REDIST2 and HIGH ECAM which could fit within the 4GB limit. But maybe we
> simply don't care?

I don't see how. These devices live at a minimum of 256GB, which
contradicts the very meaning of !highmem being a 4GB limit.

> If we don't, why don't we simply skip the extended_memmap overlay as
> suggested in v2? I did not get your reply sorry.

Because although this makes sense if you only care about a 32bit
limit, we eventually want to check against an arbitrary PA limit and
enable the individual devices that do fit in that space.

In order to do that, we need to compute the base addresses for these
extra devices. Also, computing 3 base addresses isn't going to be
massively expensive.
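
As a rough sketch of that direction (illustrative only: the pa_bits
parameter and the shape of the loop are assumptions about what Marc
describes, not the code that eventually landed):

    hwaddr limit = BIT_ULL(pa_bits);    /* pa_bits: assumed new argument */

    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
        hwaddr size = extended_memmap[i].size;

        base = ROUND_UP(base, size);
        vms->memmap[i].base = base;
        vms->memmap[i].size = size;
        /* Only regions that fit below the PA limit (and only when
         * highmem is on) may grow the reported IPA space. */
        if (vms->highmem && base + size <= limit) {
            vms->highest_gpa = base + size - 1;
        }
        base += size;
    }
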

Thanks,

	M.
Eric Auger Jan. 7, 2022, 5:15 p.m. UTC | #4
Hi Marc,

On 1/6/22 10:26 PM, Marc Zyngier wrote:
> On Wed, 05 Jan 2022 09:22:39 +0000,
> Eric Auger <eric.auger@redhat.com> wrote:
>> Hi Marc,
>>
>> On 12/27/21 10:16 PM, Marc Zyngier wrote:
>>> Even when the VM is configured with highmem=off, the highest_gpa
>>> field includes devices that are above the 4GiB limit.
>>> Similarily, nothing seem to check that the memory is within
>>> the limit set by the highmem=off option.
>>>
>>> This leads to failures in virt_kvm_type() on systems that have
>>> a crippled IPA range, as the reported IPA space is larger than
>>> what it should be.
>>>
>>> Instead, honor the user-specified limit to only use the devices
>>> at the lowest end of the spectrum, and fail if we have memory
>>> crossing the 4GiB limit.
>>>
>>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>> ---
>>>  hw/arm/virt.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 8b600d82c1..84dd3b36fb 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -1678,6 +1678,11 @@ static void virt_set_memmap(VirtMachineState *vms)
>>>          exit(EXIT_FAILURE);
>>>      }
>>>  
>>> +    if (!vms->highmem &&
>>> +        vms->memmap[VIRT_MEM].base + ms->maxram_size > 4 * GiB) {
>>> +        error_report("highmem=off, but memory crosses the 4GiB limit\n");
>>> +        exit(EXIT_FAILURE);
>> The memory is composed of initial memory and device memory.
>> device memory is put after the initial memory but has a 1GB alignment
>> On top of that you have 1G page alignment per device memory slot
>>
>> so potentially the highest mem address is larger than
>> vms->memmap[VIRT_MEM].base + ms->maxram_size.
>> I would rather do the check on device_memory_base + device_memory_size
> Yup, that's a good point.
>
> There is also a corner case in one of the later patches where I check
> this limit against the PA using the rounded-up device_memory_size.
> This could result in returning an error if the last memory slot would
> still fit in the PA space, but the rounded-up quantity wouldn't. I
> don't think it matters much, but I'll fix it anyway.
>
>>> +    }
>>>      /*
>>>       * We compute the base of the high IO region depending on the
>>>       * amount of initial and device memory. The device memory start/size
>>> @@ -1707,7 +1712,9 @@ static void virt_set_memmap(VirtMachineState *vms)
>>>          vms->memmap[i].size = size;
>>>          base += size;
>>>      }
>>> -    vms->highest_gpa = base - 1;
>>> +    vms->highest_gpa = (vms->highmem ?
>>> +                        base :
>>> +                        vms->memmap[VIRT_MEM].base + ms->maxram_size) - 1;
>> As per the previous comment this looks wrong to me if !highmem.
> Agreed.
>
>> If !highmem, if RAM requirements are low we still could get benefit from
>> REDIST2 and HIGH ECAM which could fit within the 4GB limit. But maybe we
>> simply don't care?
> I don't see how. These devices live at a minimum of 256GB, which
> contradicts the very meaning of !highmem being a 4GB limit.
Yes, I corrected the above statement afterwards, sorry for the noise.
>
>> If we don't, why don't we simply skip the extended_memmap overlay as
>> suggested in v2? I did not get your reply sorry.
> Because although this makes sense if you only care about a 32bit
> limit, we eventually want to check against an arbitrary PA limit and
> enable the individual devices that do fit in that space.

In my understanding that is what virt_kvm_type() was supposed to do, by
comparing the result of kvm_arm_get_max_vm_ipa_size() with
requested_pa_size (which accounted for the high regions) and exiting if
they were incompatible. But I must be missing something.
>
> In order to do that, we need to compute the base addresses for these
> extra devices. Also, computing 3 base addresses isn't going to be
> massively expensive.
>
> Thanks,
>
> 	M.
>
Eric
Marc Zyngier Jan. 7, 2022, 6:18 p.m. UTC | #5
Hi Eric,

On Fri, 07 Jan 2022 17:15:19 +0000,
Eric Auger <eric.auger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 1/6/22 10:26 PM, Marc Zyngier wrote:
> > On Wed, 05 Jan 2022 09:22:39 +0000,
> > Eric Auger <eric.auger@redhat.com> wrote:
> >> Hi Marc,
> >>
> >> On 12/27/21 10:16 PM, Marc Zyngier wrote:
> >>> Even when the VM is configured with highmem=off, the highest_gpa
> >>> field includes devices that are above the 4GiB limit.
> >>> Similarily, nothing seem to check that the memory is within
> >>> the limit set by the highmem=off option.
> >>>
> >>> This leads to failures in virt_kvm_type() on systems that have
> >>> a crippled IPA range, as the reported IPA space is larger than
> >>> what it should be.
> >>>
> >>> Instead, honor the user-specified limit to only use the devices
> >>> at the lowest end of the spectrum, and fail if we have memory
> >>> crossing the 4GiB limit.
> >>>
> >>> Reviewed-by: Andrew Jones <drjones@redhat.com>
> >>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> >>> ---
> >>>  hw/arm/virt.c | 9 ++++++++-
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>> index 8b600d82c1..84dd3b36fb 100644
> >>> --- a/hw/arm/virt.c
> >>> +++ b/hw/arm/virt.c
> >>> @@ -1678,6 +1678,11 @@ static void virt_set_memmap(VirtMachineState *vms)
> >>>          exit(EXIT_FAILURE);
> >>>      }
> >>>  
> >>> +    if (!vms->highmem &&
> >>> +        vms->memmap[VIRT_MEM].base + ms->maxram_size > 4 * GiB) {
> >>> +        error_report("highmem=off, but memory crosses the 4GiB limit\n");
> >>> +        exit(EXIT_FAILURE);
> >> The memory is composed of initial memory and device memory.
> >> device memory is put after the initial memory but has a 1GB alignment
> >> On top of that you have 1G page alignment per device memory slot
> >>
> >> so potentially the highest mem address is larger than
> >> vms->memmap[VIRT_MEM].base + ms->maxram_size.
> >> I would rather do the check on device_memory_base + device_memory_size
> > Yup, that's a good point.
> >
> > There is also a corner case in one of the later patches where I check
> > this limit against the PA using the rounded-up device_memory_size.
> > This could result in returning an error if the last memory slot would
> > still fit in the PA space, but the rounded-up quantity wouldn't. I
> > don't think it matters much, but I'll fix it anyway.
> >
> >>> +    }
> >>>      /*
> >>>       * We compute the base of the high IO region depending on the
> >>>       * amount of initial and device memory. The device memory start/size
> >>> @@ -1707,7 +1712,9 @@ static void virt_set_memmap(VirtMachineState *vms)
> >>>          vms->memmap[i].size = size;
> >>>          base += size;
> >>>      }
> >>> -    vms->highest_gpa = base - 1;
> >>> +    vms->highest_gpa = (vms->highmem ?
> >>> +                        base :
> >>> +                        vms->memmap[VIRT_MEM].base + ms->maxram_size) - 1;
> >> As per the previous comment this looks wrong to me if !highmem.
> > Agreed.
> >
> >> If !highmem, if RAM requirements are low we still could get benefit from
> >> REDIST2 and HIGH ECAM which could fit within the 4GB limit. But maybe we
> >> simply don't care?
> > I don't see how. These devices live at a minimum of 256GB, which
> > contradicts the very meaning of !highmem being a 4GB limit.
> Yes I corrected the above statement afterwards, sorry for the noise.
> >
> >> If we don't, why don't we simply skip the extended_memmap overlay as
> >> suggested in v2? I did not get your reply sorry.
> > Because although this makes sense if you only care about a 32bit
> > limit, we eventually want to check against an arbitrary PA limit and
> > enable the individual devices that do fit in that space.
> 
> In my understanding that is what virt_kvm_type() was supposed to do by
> testing the result of kvm_arm_get_max_vm_ipa_size and requested_pa_size
> (which accounted the high regions) and exiting if they were
> incompatible. But I must miss something.

This is a chicken and egg problem: you need the IPA size to compute
the memory map, and you need the memory map to compute the IPA
size. Fun, isn't it?

At the moment, virt_set_memmap() doesn't know about the IPA space,
generates a highest_gpa that may not work, and we end up failing
because the resulting VM type is out of bounds.

My solution to that is to feed the *maximum* IPA size to
virt_set_memmap(), compute the memory map there, and then use
highest_gpa to compute the actual IPA size that is used to create the
VM. By knowing the IPA limit in virt_set_memmap(), I'm able to keep it
in check and avoid generating an unusable memory map.

I've tried to make that clearer in my v4. Hopefully I succeeded.
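
Concretely, the flow described above could be sketched like this
(loosely modelled on the existing virt_kvm_type(); the two-argument
virt_set_memmap() is the new step being described, and the error
message wording is a placeholder, not the v4 code):

    static int virt_kvm_type(MachineState *ms, const char *type_str)
    {
        VirtMachineState *vms = VIRT_MACHINE(ms);
        /* 1. Maximum IPA size the host/KVM can offer. */
        int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
        int requested_pa_size;

        /* 2. Build the memory map, clamped to that maximum. */
        virt_set_memmap(vms, max_vm_pa_size);

        /* 3. Derive the IPA size we actually need from highest_gpa. */
        requested_pa_size = 64 - clz64(vms->highest_gpa);
        if (requested_pa_size > max_vm_pa_size) {
            error_report("RAM and devices need a %d-bit IPA space, "
                         "but the host only supports %d bits",
                         requested_pa_size, max_vm_pa_size);
            exit(EXIT_FAILURE);
        }

        /* 4. Create the VM with the smaller size; returning 0 keeps
         *    KVM's legacy 40-bit default. */
        return requested_pa_size > 40 ? requested_pa_size : 0;
    }
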

Thanks,

	M.
Peter Maydell Jan. 7, 2022, 6:48 p.m. UTC | #6
On Fri, 7 Jan 2022 at 18:18, Marc Zyngier <maz@kernel.org> wrote:
> This is a chicken and egg problem: you need the IPA size to compute
> the memory map, and you need the memory map to compute the IPA
> size. Fun, isn't it?
>
> At the moment, virt_set_memmap() doesn't know about the IPA space,
> generates a highest_gpa that may not work, and we end-up failing
> because the resulting VM type is out of bound.
>
> My solution to that is to feed the *maximum* IPA size to
> virt_set_memmap(), compute the memory map there, and then use
> highest_gpa to compute the actual IPA size that is used to create the
> VM. By knowing the IPA limit in virt_set_memmap(), I'm able to keep it
> in check and avoid generating an unusable memory map.

Is there any reason not to just always create the VM with the
maximum supported IPA size, rather than trying to create it
with the smallest IPA size that will work? (ie skip the last
step of computing the IPA size to create the VM with)

-- PMM
Marc Zyngier Jan. 7, 2022, 7:04 p.m. UTC | #7
On Fri, 07 Jan 2022 18:48:16 +0000,
Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> On Fri, 7 Jan 2022 at 18:18, Marc Zyngier <maz@kernel.org> wrote:
> > This is a chicken and egg problem: you need the IPA size to compute
> > the memory map, and you need the memory map to compute the IPA
> > size. Fun, isn't it?
> >
> > At the moment, virt_set_memmap() doesn't know about the IPA space,
> > generates a highest_gpa that may not work, and we end-up failing
> > because the resulting VM type is out of bound.
> >
> > My solution to that is to feed the *maximum* IPA size to
> > virt_set_memmap(), compute the memory map there, and then use
> > highest_gpa to compute the actual IPA size that is used to create the
> > VM. By knowing the IPA limit in virt_set_memmap(), I'm able to keep it
> > in check and avoid generating an unusable memory map.
> 
> Is there any reason not to just always create the VM with the
> maximum supported IPA size, rather than trying to create it
> with the smallest IPA size that will work? (ie skip the last
> step of computing the IPA size to create the VM with)

That gives KVM the opportunity to reduce the depth of the S2 page
tables. On HW that supports a large PA space, there is a real
advantage in keeping these shallow if at all possible.
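
For a rough sense of the saving (a generic illustration, not KVM's
actual stage-2 setup code): with a 4KiB granule each translation level
resolves 9 bits on top of the 12-bit page offset, and stage-2 can also
concatenate tables at the start level to shave off one more level for
mid-sized IPA spaces.

    /* Levels needed for a plain (non-concatenated) 4KiB-granule walk. */
    static int levels_for_ipa(int ipa_bits)
    {
        return (ipa_bits - 12 + 8) / 9;   /* ceil((ipa_bits - 12) / 9) */
    }
    /* levels_for_ipa(36) == 3, levels_for_ipa(40) == 4,
     * levels_for_ipa(48) == 4 -- and with stage-2 start-level
     * concatenation a 40-bit IPA space can typically stay at 3. */
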

Thanks,

	M.

Patch

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8b600d82c1..84dd3b36fb 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1678,6 +1678,11 @@  static void virt_set_memmap(VirtMachineState *vms)
         exit(EXIT_FAILURE);
     }
 
+    if (!vms->highmem &&
+        vms->memmap[VIRT_MEM].base + ms->maxram_size > 4 * GiB) {
+        error_report("highmem=off, but memory crosses the 4GiB limit\n");
+        exit(EXIT_FAILURE);
+    }
     /*
      * We compute the base of the high IO region depending on the
      * amount of initial and device memory. The device memory start/size
@@ -1707,7 +1712,9 @@  static void virt_set_memmap(VirtMachineState *vms)
         vms->memmap[i].size = size;
         base += size;
     }
-    vms->highest_gpa = base - 1;
+    vms->highest_gpa = (vms->highmem ?
+                        base :
+                        vms->memmap[VIRT_MEM].base + ms->maxram_size) - 1;
     if (device_memory_size > 0) {
         ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
         ms->device_memory->base = device_memory_base;