
hvmloader: probe memory below 4G before allocation for OVMF

Message ID 1585844328-30654-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State New, archived
Series hvmloader: probe memory below 4G before allocation for OVMF

Commit Message

Igor Druzhinin April 2, 2020, 4:18 p.m. UTC
The area just below 4G where the OVMF image is originally relocated is not
necessarily a hole - it might contain pages preallocated by the device model
or the toolstack. By unconditionally populating on top of this memory, the
original pages get lost while potentially still being foreign-mapped in Dom0.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
This doesn't seem to be necessary for the upstream toolstack at the moment.
An alternative might be to move population of this area to the toolstack,
where there is more control over the memory layout.
---
 tools/firmware/hvmloader/ovmf.c |  3 ++-
 tools/firmware/hvmloader/util.c | 14 ++++++++++++++
 tools/firmware/hvmloader/util.h |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

Comments

Jan Beulich April 3, 2020, 1:53 p.m. UTC | #1
On 02.04.2020 18:18, Igor Druzhinin wrote:
> The area just below 4G where OVMF image is originally relocated is not
> necessarily a hole - it might contain pages preallocated by device model
> or the toolstack. By unconditionally populating on top of this memory
> the original pages are getting lost while still potentially foreign mapped
> in Dom0.

When there are pre-allocated pages - have they been orphaned? If
so, shouldn't whoever populated them unpopulate rather than
orphaning them? Or if not - how is the re-use you do safe?

> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -398,6 +398,20 @@ int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
>      return rc;
>  }
>  
> +bool mem_probe_ram(xen_pfn_t mfn)
> +{
> +    uint32_t tmp, magic = 0xdeadbeef;
> +    volatile uint32_t *addr = (volatile uint32_t *)(mfn << PAGE_SHIFT);
> +
> +    tmp = *addr;
> +    *addr = magic;
> +    if ( *addr != magic )
> +        return 0;
> +
> +    *addr = tmp;
> +    return 1;
> +}

This looks to probe r/o behavior. If there was a ROM page pre-populated,
wouldn't it then be equally lost once you populate RAM over it? And what
if this is MMIO, i.e. writable but potentially with side effects?

Whether moving population of this space to the tool stack, as you suggest
as an alternative, is feasible I don't know. If it is, I would wonder,
though, why it wasn't done like this in the first place.

Jan
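
As an aside, purely to illustrate the write/read-back probing technique under
discussion - this sketch is not from the patch or the thread, and the helper
name is made up - a variant probing with two distinct patterns would also avoid
misdetecting a page whose existing contents happen to equal the single magic
value, although it still cannot tell RAM apart from MMIO that latches writes:

static bool mem_probe_ram_strict(xen_pfn_t mfn)
{
    volatile uint32_t *addr = (volatile uint32_t *)(mfn << PAGE_SHIFT);
    uint32_t orig = *addr;
    bool ok;

    /* First pattern. */
    *addr = 0xdeadbeef;
    ok = (*addr == 0xdeadbeef);

    /*
     * Second, complementary pattern - guards against the page already
     * containing the first value.
     */
    *addr = 0x21524110;
    ok = ok && (*addr == 0x21524110);

    /* Restore the original contents whether or not the probe succeeded. */
    *addr = orig;
    return ok;
}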
Igor Druzhinin April 3, 2020, 2:26 p.m. UTC | #2
On 03/04/2020 14:53, Jan Beulich wrote:
> On 02.04.2020 18:18, Igor Druzhinin wrote:
>> The area just below 4G where OVMF image is originally relocated is not
>> necessarily a hole - it might contain pages preallocated by device model
>> or the toolstack. By unconditionally populating on top of this memory
>> the original pages are getting lost while still potentially foreign mapped
>> in Dom0.
> 
> When there are pre-allocated pages - have they been orphaned? If
> so, shouldn't whoever populated them unpopulate rather than
> orphaning them? Or if not - how is the re-use you do safe?

There is no signal to the device model when that happens - it has no idea when
the memory it populated and holds a mapping to suddenly becomes unmapped in the
guest. Re-use is safe because the memory prepopulated before OVMF starts is not
put to use until after OVMF has finished initializing.

>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -398,6 +398,20 @@ int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
>>      return rc;
>>  }
>>  
>> +bool mem_probe_ram(xen_pfn_t mfn)
>> +{
>> +    uint32_t tmp, magic = 0xdeadbeef;
>> +    volatile uint32_t *addr = (volatile uint32_t *)(mfn << PAGE_SHIFT);
>> +
>> +    tmp = *addr;
>> +    *addr = magic;
>> +    if ( *addr != magic )
>> +        return 0;
>> +
>> +    *addr = tmp;
>> +    return 1;
>> +}
> 
> This looks to probe r/o behavior. If there was a ROM page pre-populated,
> wouldn't it then be equally lost once you populate RAM over it? And what
> if this is MMIO, i.e. writable but potentially with side effects?

I assume that if the pages behind it are not r/w, there is no other way to
avoid crashing immediately except to go and repopulate on top. MMIO is a
problematic corner case that I expect the device model would try to avoid.

> Whether, as you suggest as an alternative, moving populating of this
> space to the tool stack is feasible I don't know. If it was, I would
> wonder though why it wasn't done like this in the first place.

I expect one complication is knowing the type of firmware at the moment the
domain is constructed, to make sure the area is populated for OVMF only.

Igor
Jan Beulich April 3, 2020, 2:34 p.m. UTC | #3
On 03.04.2020 16:26, Igor Druzhinin wrote:
> On 03/04/2020 14:53, Jan Beulich wrote:
>> On 02.04.2020 18:18, Igor Druzhinin wrote:
>>> The area just below 4G where OVMF image is originally relocated is not
>>> necessarily a hole - it might contain pages preallocated by device model
>>> or the toolstack. By unconditionally populating on top of this memory
>>> the original pages are getting lost while still potentially foreign mapped
>>> in Dom0.
>>
>> When there are pre-allocated pages - have they been orphaned? If
>> so, shouldn't whoever populated them unpopulate rather than
>> orphaning them? Or if not - how is the re-use you do safe?
> 
> There is no signal to the device model when that happens - it has no idea when the
> memory it populated and hold a mapping to becomes suddenly unmapped in the guest.
> Re-use is safe as the memory prepopulated before OVMF starts has not been put
> in use yet until after it's finished initializing.

I guess I'm lacking some details here to fully understand what
you're saying: What is it that may pre-populate this range, and
for what purpose?

Jan
Andrew Cooper April 3, 2020, 2:39 p.m. UTC | #4
On 03/04/2020 14:53, Jan Beulich wrote:
> On 02.04.2020 18:18, Igor Druzhinin wrote:
>> The area just below 4G where OVMF image is originally relocated is not
>> necessarily a hole - it might contain pages preallocated by device model
>> or the toolstack. By unconditionally populating on top of this memory
>> the original pages are getting lost while still potentially foreign mapped
>> in Dom0.
> When there are pre-allocated pages - have they been orphaned? If
> so, shouldn't whoever populated them unpopulate rather than
> orphaning them? Or if not - how is the re-use you do safe?

So this is a mess.

OVMF is linked to run at a fixed address suitable for native hardware,
which is in the SPI ROM immediately below the 4G boundary (this is
correct).  We also put the framebuffer there (this is not correct).

This was fine for RomBIOS which is located under the 1M boundary.

It is also fine for a fully-emulated VGA device in Qemu, because the
framebuffer is moved (re-set up, actually) when the virtual BAR is moved,
but with a real GPU (SR-IOV in this case), there is no logic to play such games.

The problem is entirely caused by the framebuffer in Xen not being like
any real system.  The framebuffer isn't actually in a BAR, and also
doesn't manifest itself in the way that graphics-stolen-ram normally
does, either.

~Andrew
Igor Druzhinin April 3, 2020, 2:47 p.m. UTC | #5
On 03/04/2020 15:39, Andrew Cooper wrote:
> On 03/04/2020 14:53, Jan Beulich wrote:
>> On 02.04.2020 18:18, Igor Druzhinin wrote:
>>> The area just below 4G where OVMF image is originally relocated is not
>>> necessarily a hole - it might contain pages preallocated by device model
>>> or the toolstack. By unconditionally populating on top of this memory
>>> the original pages are getting lost while still potentially foreign mapped
>>> in Dom0.
>> When there are pre-allocated pages - have they been orphaned? If
>> so, shouldn't whoever populated them unpopulate rather than
>> orphaning them? Or if not - how is the re-use you do safe?
> 
> So this is a mess.
> 
> OVMF is linked to run at a fixed address suitable for native hardware,
> which is in the SPI ROM immediately below the 4G boundary (this is
> correct).  We also put the framebuffer there (this is not correct).
> 
> This was fine for RomBIOS which is located under the 1M boundary.
> 
> It is also fine for a fully-emulated VGA device in Qemu, because the the
> framebuffer if moved (re-set up actually) when the virtual BAR is moved,
> but with a real GPU (SR-IOV in this case), there is no logic to play games.
> 
> The problem is entirely caused by the framebuffer in Xen not being like
> any real system.  The framebuffer isn't actually in a BAR, and also
> doesn't manifest itself in the way that graphics-stolen-ram normally
> does, either.

Adding to what Andrew said:

There are multiple technical complications that caused this mess.
One of them is that there is unfortunately no better place for the
framebuffer to be located initially. Second, the SR-IOV device
is real, and adding a virtual BAR to it is also complicated (for
compatibility reasons), so NVIDIA decided to avoid that.

Igor
Jan Beulich April 3, 2020, 3:05 p.m. UTC | #6
On 03.04.2020 16:47, Igor Druzhinin wrote:
> On 03/04/2020 15:39, Andrew Cooper wrote:
>> On 03/04/2020 14:53, Jan Beulich wrote:
>>> On 02.04.2020 18:18, Igor Druzhinin wrote:
>>>> The area just below 4G where OVMF image is originally relocated is not
>>>> necessarily a hole - it might contain pages preallocated by device model
>>>> or the toolstack. By unconditionally populating on top of this memory
>>>> the original pages are getting lost while still potentially foreign mapped
>>>> in Dom0.
>>> When there are pre-allocated pages - have they been orphaned? If
>>> so, shouldn't whoever populated them unpopulate rather than
>>> orphaning them? Or if not - how is the re-use you do safe?
>>
>> So this is a mess.
>>
>> OVMF is linked to run at a fixed address suitable for native hardware,
>> which is in the SPI ROM immediately below the 4G boundary (this is
>> correct).  We also put the framebuffer there (this is not correct).
>>
>> This was fine for RomBIOS which is located under the 1M boundary.
>>
>> It is also fine for a fully-emulated VGA device in Qemu, because the the
>> framebuffer if moved (re-set up actually) when the virtual BAR is moved,
>> but with a real GPU (SR-IOV in this case), there is no logic to play games.

So are you saying that, in the OVMF case, OVMF starts out appearing to run
in VRAM until the framebuffer gets moved? If so, with the logic added by
this patch, how would both places (the old VRAM address, where OVMF lives,
and the new VRAM address) end up backed by distinct pages? Wouldn't the
avoided re-populate result in the same page having two uses? Or,
alternatively, in there being a hole in OVMF space, which would be a
problem if this was backing runtime memory?

>> The problem is entirely caused by the framebuffer in Xen not being like
>> any real system.  The framebuffer isn't actually in a BAR, and also
>> doesn't manifest itself in the way that graphics-stolen-ram normally
>> does, either.
> 
> Adding to what Andrew said:
> 
> There multiple technical complications that caused this mess.
> One of them is that there is no unfortunately a better place for the
> framebuffer to be located initially. Second, SR-IOV device
> is real and adding a virtual BAR to it is also complicated (due to
> compatibility reasons) and NVIDIA decided to avoid that.

In which case I wonder - aren't you ending up with the MMIO case
that I had mentioned, and that you said is difficult to deal with?

Jan
Igor Druzhinin April 3, 2020, 3:17 p.m. UTC | #7
On 03/04/2020 16:05, Jan Beulich wrote:
> On 03.04.2020 16:47, Igor Druzhinin wrote:
>> On 03/04/2020 15:39, Andrew Cooper wrote:
>>> On 03/04/2020 14:53, Jan Beulich wrote:
>>>> On 02.04.2020 18:18, Igor Druzhinin wrote:
>>>>> The area just below 4G where OVMF image is originally relocated is not
>>>>> necessarily a hole - it might contain pages preallocated by device model
>>>>> or the toolstack. By unconditionally populating on top of this memory
>>>>> the original pages are getting lost while still potentially foreign mapped
>>>>> in Dom0.
>>>> When there are pre-allocated pages - have they been orphaned? If
>>>> so, shouldn't whoever populated them unpopulate rather than
>>>> orphaning them? Or if not - how is the re-use you do safe?
>>>
>>> So this is a mess.
>>>
>>> OVMF is linked to run at a fixed address suitable for native hardware,
>>> which is in the SPI ROM immediately below the 4G boundary (this is
>>> correct).  We also put the framebuffer there (this is not correct).
>>>
>>> This was fine for RomBIOS which is located under the 1M boundary.
>>>
>>> It is also fine for a fully-emulated VGA device in Qemu, because the the
>>> framebuffer if moved (re-set up actually) when the virtual BAR is moved,
>>> but with a real GPU (SR-IOV in this case), there is no logic to play games.
> 
> So are you saying OVMF starts out appearing to run in VRAM then
> in the OVMF case, until the frame buffer gets moved? If so,
> with the logic added by this patch, how would both places end
> (old VRAM address, where OVMF lives, and new VRAM address) get
> backed by distinct pages? Wouldn't the avoided re-populate
> result in the same page having two uses? Or alternatively there
> being a hole in OVMF space, which would be a problem if this
> was backing runtime memory? 

In the normal (non-SR-IOV) case VRAM gets evacuated (by the PCI logic) before
hvmloader overwrites it, so the issue is avoided. But with SR-IOV the VRAM
stays, so the VRAM area is temporarily used to hold the OVMF image until
decompression is complete. With this patch the existing VRAM pages would be
used for that purpose instead of new ones.

>>> The problem is entirely caused by the framebuffer in Xen not being like
>>> any real system.  The framebuffer isn't actually in a BAR, and also
>>> doesn't manifest itself in the way that graphics-stolen-ram normally
>>> does, either.
>>
>> Adding to what Andrew said:
>>
>> There multiple technical complications that caused this mess.
>> One of them is that there is no unfortunately a better place for the
>> framebuffer to be located initially. Second, SR-IOV device
>> is real and adding a virtual BAR to it is also complicated (due to
>> compatibility reasons) and NVIDIA decided to avoid that.
> 
> In which case I wonder - aren't you ending up with the MMIO case
> that I had mentioned, and that you said is difficult to deal with?

No, it's VRAM area (normal RAM pages) - not MMIO.

Igor
Jan Beulich April 3, 2020, 3:28 p.m. UTC | #8
On 03.04.2020 17:17, Igor Druzhinin wrote:
> On 03/04/2020 16:05, Jan Beulich wrote:
>> On 03.04.2020 16:47, Igor Druzhinin wrote:
>>> There multiple technical complications that caused this mess.
>>> One of them is that there is no unfortunately a better place for the
>>> framebuffer to be located initially. Second, SR-IOV device
>>> is real and adding a virtual BAR to it is also complicated (due to
>>> compatibility reasons) and NVIDIA decided to avoid that.
>>
>> In which case I wonder - aren't you ending up with the MMIO case
>> that I had mentioned, and that you said is difficult to deal with?
> 
> No, it's VRAM area (normal RAM pages) - not MMIO.

Well, VRAM is still MMIO from the CPU's perspective, just without
any side effects. But if it was another device that was passed
through, couldn't its MMIO similarly end up in that area?

Jan
Igor Druzhinin April 3, 2020, 3:36 p.m. UTC | #9
On 03/04/2020 16:28, Jan Beulich wrote:
> On 03.04.2020 17:17, Igor Druzhinin wrote:
>> On 03/04/2020 16:05, Jan Beulich wrote:
>>> On 03.04.2020 16:47, Igor Druzhinin wrote:
>>>> There multiple technical complications that caused this mess.
>>>> One of them is that there is no unfortunately a better place for the
>>>> framebuffer to be located initially. Second, SR-IOV device
>>>> is real and adding a virtual BAR to it is also complicated (due to
>>>> compatibility reasons) and NVIDIA decided to avoid that.
>>>
>>> In which case I wonder - aren't you ending up with the MMIO case
>>> that I had mentioned, and that you said is difficult to deal with?
>>
>> No, it's VRAM area (normal RAM pages) - not MMIO.
> 
> Well, VRAM is still MMIO from the CPU's perspective, just without
> any side effects. But if it was another device that was passed
> through, couldn't its MMIO similarly end up in that area?

As Andrew said, Xen's VRAM is just normal RAM. It's originally populated in
this area for lack of a better place (at least for now). It's special and has
nothing to do with a device passed through using conventional PCI mechanisms,
whose BARs will only occupy the MMIO hole.

Igor
Paul Durrant April 3, 2020, 4:16 p.m. UTC | #10
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Igor Druzhinin
> Sent: 03 April 2020 16:36
> To: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; roger.pau@citrix.com; ian.jackson@eu.citrix.com;
> wl@xen.org; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH] hvmloader: probe memory below 4G before allocation for OVMF
> 
> On 03/04/2020 16:28, Jan Beulich wrote:
> > On 03.04.2020 17:17, Igor Druzhinin wrote:
> >> On 03/04/2020 16:05, Jan Beulich wrote:
> >>> On 03.04.2020 16:47, Igor Druzhinin wrote:
> >>>> There multiple technical complications that caused this mess.
> >>>> One of them is that there is no unfortunately a better place for the
> >>>> framebuffer to be located initially. Second, SR-IOV device
> >>>> is real and adding a virtual BAR to it is also complicated (due to
> >>>> compatibility reasons) and NVIDIA decided to avoid that.
> >>>
> >>> In which case I wonder - aren't you ending up with the MMIO case
> >>> that I had mentioned, and that you said is difficult to deal with?
> >>
> >> No, it's VRAM area (normal RAM pages) - not MMIO.
> >
> > Well, VRAM is still MMIO from the CPU's perspective, just without
> > any side effects. But if it was another device that was passed
> > through, couldn't its MMIO similarly end up in that area?
> 
> As Andrew said, Xen VRAM is just normal RAM. It's originally
> populated in this area for the lack of a better place (at least now).
> It's special and has nothing to do with the device passed through
> using conventional PCI mechanisms which BARs will only occupy MMIO hole.
> 

I assume Jan's point is that the guest doesn't access it as if it is normal RAM. It's only accessed directly if it is present in a PCI BAR, otherwise it is accessed indirectly (via QEMU) in response to accesses to the VGA aperture.

  Paul
Igor Druzhinin April 5, 2020, 2:48 p.m. UTC | #11
On 02/04/2020 17:18, Igor Druzhinin wrote:
> The area just below 4G where OVMF image is originally relocated is not
> necessarily a hole - it might contain pages preallocated by device model
> or the toolstack. By unconditionally populating on top of this memory
> the original pages are getting lost while still potentially foreign mapped
> in Dom0.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
> That doesn't seem necessary for at least upstream toolstack now.
> Alternative might be - to move population of this area to the toolstack
> where there is more control over memory layout.
> ---

Thanks for the discussion; please ignore this patch. We found a better way to
deal with our problem locally: we will introduce a reserved region within the
MMIO hole that is managed similarly to stolen memory.

Igor
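
For illustration only - not part of this thread's patch or any posted design -
a reserved region like the one described above could be expressed as an extra
e820 entry. The base address and size below are made up, and the entry layout
is assumed to follow hvmloader's e820.h:

#include <stdint.h>

/*
 * Entry layout as in tools/firmware/hvmloader/e820.h (repeated here only
 * to keep the sketch self-contained).
 */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
} __attribute__((packed));

#define E820_RESERVED 2

/*
 * Hypothetical example: mark 16MB just below the 4G boundary as reserved,
 * so that guest firmware treats it like stolen memory rather than a hole
 * it is free to populate over.
 */
static const struct e820entry example_reserved_region = {
    .addr = 0xff000000ULL,  /* made-up base inside the MMIO hole */
    .size = 0x01000000ULL,  /* made-up size: 16MB */
    .type = E820_RESERVED,
};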

Patch

diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 23610a0..70d5f70 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -106,7 +106,8 @@  static void ovmf_load(const struct bios_config *config,
     {
         mfn = (uint32_t) (addr >> PAGE_SHIFT);
         addr += PAGE_SIZE;
-        mem_hole_populate_ram(mfn, 1);
+        if ( !mem_probe_ram(mfn) )
+            mem_hole_populate_ram(mfn, 1);
     }
 
     /* Check that source and destination does not overlaps. */
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d2..724cea0 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -398,6 +398,20 @@  int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
     return rc;
 }
 
+bool mem_probe_ram(xen_pfn_t mfn)
+{
+    uint32_t tmp, magic = 0xdeadbeef;
+    volatile uint32_t *addr = (volatile uint32_t *)(mfn << PAGE_SHIFT);
+
+    tmp = *addr;
+    *addr = magic;
+    if ( *addr != magic )
+        return 0;
+
+    *addr = tmp;
+    return 1;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7bca641..00a7c13 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -194,6 +194,9 @@  int vprintf(const char *fmt, va_list ap);
 /* Buffer output */
 int snprintf(char *buf, size_t size, const char *fmt, ...) __attribute__ ((format (printf, 3, 4)));
 
+/* Probe whether a page is populated with RAM. */
+bool mem_probe_ram(xen_pfn_t mfn);
+
 /* Populate specified memory hole with RAM. */
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns);