diff mbox series

[v3,24/35] OvmfPkg/XenPlatformPei: Rework memory detection

Message ID 20190704144233.27968-25-anthony.perard@citrix.com (mailing list archive)
State Superseded
Headers show
Series Specific platform to run OVMF in Xen PVH and HVM guests | expand

Commit Message

Anthony PERARD July 4, 2019, 2:42 p.m. UTC
When running as a Xen PVH guest, there is no CMOS to read the memory
size from.  Rework GetSystemMemorySize(Below|Above)4gb() so they can
works without CMOS by reading the e820 table.

Rework XenPublishRamRegions for PVH, handle the Reserve type and explain
about the ACPI type. MTRR settings aren't modified anymore, on HVM, it's
already done by hvmloader, on PVH it is supposed to have sane default.

Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
---

Notes:
    Comment for Xen people:
    About MTRR, should we redo the setting in OVMF? Even if in both case of
    PVH and HVM, something would have setup the default type to write back
    and handle a few other ranges like PCI hole, hvmloader for HVM or and
    libxc I think for PVH.
    
    (For PVH, it's in the spec as well
    https://xenbits.xenproject.org/docs/unstable/misc/pvh.html#mtrr )

 OvmfPkg/XenPlatformPei/Platform.h  |  6 +++
 OvmfPkg/XenPlatformPei/MemDetect.c | 71 ++++++++++++++++++++++++++++++
 OvmfPkg/XenPlatformPei/Xen.c       | 47 ++++++++++++++------
 3 files changed, 111 insertions(+), 13 deletions(-)

Comments

Roger Pau Monne July 15, 2019, 2:15 p.m. UTC | #1
On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> When running as a Xen PVH guest, there is no CMOS to read the memory
> size from.  Rework GetSystemMemorySize(Below|Above)4gb() so they can
> works without CMOS by reading the e820 table.
> 
> Rework XenPublishRamRegions for PVH, handle the Reserve type and explain
> about the ACPI type. MTRR settings aren't modified anymore, on HVM, it's
> already done by hvmloader, on PVH it is supposed to have sane default.
> 
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Acked-by: Laszlo Ersek <lersek@redhat.com>
> ---
> 
> Notes:
>     Comment for Xen people:
>     About MTRR, should we redo the setting in OVMF? Even if in both case of
>     PVH and HVM, something would have setup the default type to write back
>     and handle a few other ranges like PCI hole, hvmloader for HVM or and
>     libxc I think for PVH.

That's a tricky question. Ideally we would like the firmware (OVMF) to
take care of that, because it already has code to do so. Problem here
is that PVH can also be booted without firmware, in which case it
needs the hypervisor to have setup some sane initial MTRR state.

The statement in the PVH document about initial MTRR state is vague
enough that allows Xen to boot into the guest with a minimal MTRR
state, that can for example not contain UC regions for the MMIO
regions of passed through devices, hence I think OVMF should be in
charge of creating a more complete MTRR state if possible.

Is this something OVMF already has logic for?

Also accounting for the MMIO regions of devices?

>     (For PVH, it's in the spec as well
>     https://xenbits.xenproject.org/docs/unstable/misc/pvh.html#mtrr )
> 
>  OvmfPkg/XenPlatformPei/Platform.h  |  6 +++
>  OvmfPkg/XenPlatformPei/MemDetect.c | 71 ++++++++++++++++++++++++++++++
>  OvmfPkg/XenPlatformPei/Xen.c       | 47 ++++++++++++++------
>  3 files changed, 111 insertions(+), 13 deletions(-)
> 
> diff --git a/OvmfPkg/XenPlatformPei/Platform.h b/OvmfPkg/XenPlatformPei/Platform.h
> index db9a62572f..e8e0b835a5 100644
> --- a/OvmfPkg/XenPlatformPei/Platform.h
> +++ b/OvmfPkg/XenPlatformPei/Platform.h
> @@ -114,6 +114,12 @@ XenPublishRamRegions (
>    VOID
>    );
>  
> +EFI_STATUS
> +XenGetE820Map (
> +  EFI_E820_ENTRY64 **Entries,
> +  UINT32 *Count
> +  );
> +
>  extern EFI_BOOT_MODE mBootMode;
>  
>  extern UINT8 mPhysMemAddressWidth;
> diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c b/OvmfPkg/XenPlatformPei/MemDetect.c
> index cb7dd93ad6..3e33e7f414 100644
> --- a/OvmfPkg/XenPlatformPei/MemDetect.c
> +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
> @@ -96,6 +96,47 @@ Q35TsegMbytesInitialization (
>    mQ35TsegMbytes = ExtendedTsegMbytes;
>  }
>  
> +STATIC
> +UINT64
> +GetHighestSystemMemoryAddress (
> +  BOOLEAN       Below4gb
> +  )
> +{
> +  EFI_E820_ENTRY64    *E820Map;
> +  UINT32              E820EntriesCount;
> +  EFI_E820_ENTRY64    *Entry;
> +  EFI_STATUS          Status;
> +  UINT32              Loop;
> +  UINT64              HighestAddress;
> +  UINT64              EntryEnd;
> +
> +  HighestAddress = 0;
> +
> +  Status = XenGetE820Map (&E820Map, &E820EntriesCount);

You could maybe initialize this as a global to avoid having to issue
a hypercall each time you need to get something from the memory map.

> +  ASSERT_EFI_ERROR (Status);
> +
> +  for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> +    Entry = E820Map + Loop;
> +    EntryEnd = Entry->BaseAddr + Entry->Length;
> +
> +    if (Entry->Type == EfiAcpiAddressRangeMemory &&
> +        EntryEnd > HighestAddress) {
> +
> +      if (Below4gb && (EntryEnd <= BASE_4GB)) {
> +        HighestAddress = EntryEnd;
> +      } else if (!Below4gb && (EntryEnd >= BASE_4GB)) {
> +        HighestAddress = EntryEnd;
> +      }
> +    }
> +  }
> +
> +  //
> +  // Round down the end address.
> +  //
> +  HighestAddress &= ~(UINT64)EFI_PAGE_MASK;
> +
> +  return HighestAddress;

You could do the rounding on the return statement.

> +}
>  
>  UINT32
>  GetSystemMemorySizeBelow4gb (
> @@ -105,6 +146,19 @@ GetSystemMemorySizeBelow4gb (
>    UINT8 Cmos0x34;
>    UINT8 Cmos0x35;
>  
> +  //
> +  // In PVH case, there is no CMOS, we have to calculate the memory size
> +  // from parsing the E820
> +  //
> +  if (XenPvhDetected ()) {

IIRC on HVM you can also get the memory map from the hypercall, in
which case you could use the same code path for both HVM and PVH.

> +    UINT64  HighestAddress;
> +
> +    HighestAddress = GetHighestSystemMemoryAddress (TRUE);
> +    ASSERT (HighestAddress > 0 && HighestAddress <= BASE_4GB);
> +
> +    return HighestAddress;

The name of the function here is GetSystemMemorySizeBelow4gb, but you
are returning the highest memory address in the range, is this
expected?

ie: highest address != memory size

On HVM there are quite some holes in the memory map, and nothing
guarantees there are no memory regions after the holes or non-RAM
regions.

> +  }
> +
>    //
>    // CMOS 0x34/0x35 specifies the system memory above 16 MB.
>    // * CMOS(0x35) is the high byte
> @@ -129,6 +183,23 @@ GetSystemMemorySizeAbove4gb (
>    UINT32 Size;
>    UINTN  CmosIndex;
>  
> +  //
> +  // In PVH case, there is no CMOS, we have to calculate the memory size
> +  // from parsing the E820
> +  //
> +  if (XenPvhDetected ()) {
> +    UINT64  HighestAddress;
> +
> +    HighestAddress = GetHighestSystemMemoryAddress (FALSE);
> +    ASSERT (HighestAddress == 0 || HighestAddress >= BASE_4GB);
> +
> +    if (HighestAddress >= BASE_4GB) {
> +      HighestAddress -= BASE_4GB;
> +    }
> +
> +    return HighestAddress;
> +  }
> +
>    //
>    // CMOS 0x5b-0x5d specifies the system memory above 4GB MB.
>    // * CMOS(0x5d) is the most significant size byte
> diff --git a/OvmfPkg/XenPlatformPei/Xen.c b/OvmfPkg/XenPlatformPei/Xen.c
> index cbfd8058fc..62a2c3ed93 100644
> --- a/OvmfPkg/XenPlatformPei/Xen.c
> +++ b/OvmfPkg/XenPlatformPei/Xen.c
> @@ -279,6 +279,8 @@ XenPublishRamRegions (
>    EFI_E820_ENTRY64  *E820Map;
>    UINT32            E820EntriesCount;
>    EFI_STATUS        Status;
> +  EFI_E820_ENTRY64 *Entry;
> +  UINTN Index;
>  
>    DEBUG ((EFI_D_INFO, "Using memory map provided by Xen\n"));
>  
> @@ -287,26 +289,45 @@ XenPublishRamRegions (
>    //
>    E820EntriesCount = 0;
>    Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> -
>    ASSERT_EFI_ERROR (Status);
>  
> -  if (E820EntriesCount > 0) {
> -    EFI_E820_ENTRY64 *Entry;
> -    UINT32 Loop;
> +  for (Index = 0; Index < E820EntriesCount; Index++) {
> +    UINT64 Base;
> +    UINT64 End;
>  
> -    for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> -      Entry = E820Map + Loop;
> +    Entry = &E820Map[Index];
>  
> +
> +    //
> +    // Round up the start address, and round down the end address.
> +    //
> +    Base = ALIGN_VALUE (Entry->BaseAddr, (UINT64)EFI_PAGE_SIZE);
> +    End = (Entry->BaseAddr + Entry->Length) & ~(UINT64)EFI_PAGE_MASK;
> +
> +    switch (Entry->Type) {
> +    case EfiAcpiAddressRangeMemory:
> +      AddMemoryRangeHob (Base, End);
> +      break;
> +    case EfiAcpiAddressRangeACPI:
> +      //
> +      // Ignore, OVMF should read the ACPI tables and provide them to linux
> +      // from a different location.

Will OVMF also parse dynamic tables to check for references there?

> +      //
> +      break;
> +    case EfiAcpiAddressRangeReserved:
>        //
> -      // Only care about RAM
> +      // Avoid ranges marked as reserved in the e820 table provided by
> +      // hvmloader as it conflicts with an other aperture.

I think you want the last part of the sentence to be: '... as it
conflicts with other apertures.'

I think however that you should make sure ranges marked as reserved in
the original memory map also end up in the final one, hence
overlapping ranges should be merged, instead of discarded.

> +      // error message: CpuDxe: IntersectMemoryDescriptor:
> +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
>        //
> -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> -        continue;
> +      if (!XenHvmloaderDetected ()) {
> +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);

This special casing for PVH looks weird, ideally we would like to use
the same code path, or else it should be explicitly mentioned why PVH
has diverging behaviour.

Thanks, Roger.
Anthony PERARD July 22, 2019, 2:53 p.m. UTC | #2
On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > When running as a Xen PVH guest, there is no CMOS to read the memory
> > size from.  Rework GetSystemMemorySize(Below|Above)4gb() so they can
> > works without CMOS by reading the e820 table.
> > 
> > Rework XenPublishRamRegions for PVH, handle the Reserve type and explain
> > about the ACPI type. MTRR settings aren't modified anymore, on HVM, it's
> > already done by hvmloader, on PVH it is supposed to have sane default.
> > 
> > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
> > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> > Acked-by: Laszlo Ersek <lersek@redhat.com>
> > ---
> > 
> > Notes:
> >     Comment for Xen people:
> >     About MTRR, should we redo the setting in OVMF? Even if in both case of
> >     PVH and HVM, something would have setup the default type to write back
> >     and handle a few other ranges like PCI hole, hvmloader for HVM or and
> >     libxc I think for PVH.
> 
> That's a tricky question. Ideally we would like the firmware (OVMF) to
> take care of that, because it already has code to do so. Problem here
> is that PVH can also be booted without firmware, in which case it
> needs the hypervisor to have setup some sane initial MTRR state.
> 
> The statement in the PVH document about initial MTRR state is vague
> enough that allows Xen to boot into the guest with a minimal MTRR
> state, that can for example not contain UC regions for the MMIO
> regions of passed through devices, hence I think OVMF should be in
> charge of creating a more complete MTRR state if possible.
> 
> Is this something OVMF already has logic for?

Well, there are some logic but it's for QEMU (and uses an interface that
isn't available when running on Xen, fwcfg).

The logic that was there for Xen HVM was very simple, a single set
cache-write-back for the RAM, that's why I remove it (and because I'm
not sure yet I figured out how to run the mtrr functions correctly in
OVMF).

I probably going to have to write a new logic which would rewrite the
MTRR from scratch instead of relying on the existing setup.

> Also accounting for the MMIO regions of devices?

I'll have to dig deeper into OVMF codes, and PCI device handling. On
HVM, we have a different logic than the one for QEMU, OVMF only scan
what hvmloader have done instead of re-setup the pci devices. I'm
probably missing other stuff.

> > diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c b/OvmfPkg/XenPlatformPei/MemDetect.c
> > index cb7dd93ad6..3e33e7f414 100644
> > --- a/OvmfPkg/XenPlatformPei/MemDetect.c
> > +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
> > @@ -96,6 +96,47 @@ Q35TsegMbytesInitialization (
> >    mQ35TsegMbytes = ExtendedTsegMbytes;
> >  }
> >  
> > +STATIC
> > +UINT64
> > +GetHighestSystemMemoryAddress (
> > +  BOOLEAN       Below4gb
> > +  )
> > +{
> > +  EFI_E820_ENTRY64    *E820Map;
> > +  UINT32              E820EntriesCount;
> > +  EFI_E820_ENTRY64    *Entry;
> > +  EFI_STATUS          Status;
> > +  UINT32              Loop;
> > +  UINT64              HighestAddress;
> > +  UINT64              EntryEnd;
> > +
> > +  HighestAddress = 0;
> > +
> > +  Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> 
> You could maybe initialize this as a global to avoid having to issue
> a hypercall each time you need to get something from the memory map.

That function does that, it only make the hypercall once. (The hypercall
can only be made once anyway, the second time Xen doesn't return the
map.)

> > +  ASSERT_EFI_ERROR (Status);
> > +
> > +  for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > +    Entry = E820Map + Loop;
> > +    EntryEnd = Entry->BaseAddr + Entry->Length;
> > +
> > +    if (Entry->Type == EfiAcpiAddressRangeMemory &&
> > +        EntryEnd > HighestAddress) {
> > +
> > +      if (Below4gb && (EntryEnd <= BASE_4GB)) {
> > +        HighestAddress = EntryEnd;
> > +      } else if (!Below4gb && (EntryEnd >= BASE_4GB)) {
> > +        HighestAddress = EntryEnd;
> > +      }
> > +    }
> > +  }
> > +
> > +  //
> > +  // Round down the end address.
> > +  //
> > +  HighestAddress &= ~(UINT64)EFI_PAGE_MASK;
> > +
> > +  return HighestAddress;
> 
> You could do the rounding on the return statement.

Yes, I think that can be done.

> > +}
> >  
> >  UINT32
> >  GetSystemMemorySizeBelow4gb (
> > @@ -105,6 +146,19 @@ GetSystemMemorySizeBelow4gb (
> >    UINT8 Cmos0x34;
> >    UINT8 Cmos0x35;
> >  
> > +  //
> > +  // In PVH case, there is no CMOS, we have to calculate the memory size
> > +  // from parsing the E820
> > +  //
> > +  if (XenPvhDetected ()) {
> 
> IIRC on HVM you can also get the memory map from the hypercall, in
> which case you could use the same code path for both HVM and PVH.

I think that wouldn't work because in my experiment, the hypercall would
only return the map the first time (at least on PVH). hvmloader already
make the hypercall so OVMF can't.

On the other hand, XenGetE820Map() return an E820 map, it doesn't matter
if it's the one passed by hvmloader, or the one we've got directly from
Xen. So I guess we could ignore what hvmloader have written in the CMOS
and use the information in the e820 directly. But I think I let this
change for future patch.

> > +    UINT64  HighestAddress;
> > +
> > +    HighestAddress = GetHighestSystemMemoryAddress (TRUE);
> > +    ASSERT (HighestAddress > 0 && HighestAddress <= BASE_4GB);
> > +
> > +    return HighestAddress;
> 
> The name of the function here is GetSystemMemorySizeBelow4gb, but you
> are returning the highest memory address in the range, is this
> expected?
> 
> ie: highest address != memory size
> 
> On HVM there are quite some holes in the memory map, and nothing
> guarantees there are no memory regions after the holes or non-RAM
> regions.

I think that's what is expected by caller of the function.

> > +  }
> > +
> >    //
> >    // CMOS 0x34/0x35 specifies the system memory above 16 MB.
> >    // * CMOS(0x35) is the high byte
> > diff --git a/OvmfPkg/XenPlatformPei/Xen.c b/OvmfPkg/XenPlatformPei/Xen.c
> > index cbfd8058fc..62a2c3ed93 100644
> > --- a/OvmfPkg/XenPlatformPei/Xen.c
> > +++ b/OvmfPkg/XenPlatformPei/Xen.c
> > @@ -279,6 +279,8 @@ XenPublishRamRegions (
> >    EFI_E820_ENTRY64  *E820Map;
> >    UINT32            E820EntriesCount;
> >    EFI_STATUS        Status;
> > +  EFI_E820_ENTRY64 *Entry;
> > +  UINTN Index;
> >  
> >    DEBUG ((EFI_D_INFO, "Using memory map provided by Xen\n"));
> >  
> > @@ -287,26 +289,45 @@ XenPublishRamRegions (
> >    //
> >    E820EntriesCount = 0;
> >    Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> > -
> >    ASSERT_EFI_ERROR (Status);
> >  
> > -  if (E820EntriesCount > 0) {
> > -    EFI_E820_ENTRY64 *Entry;
> > -    UINT32 Loop;
> > +  for (Index = 0; Index < E820EntriesCount; Index++) {
> > +    UINT64 Base;
> > +    UINT64 End;
> >  
> > -    for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > -      Entry = E820Map + Loop;
> > +    Entry = &E820Map[Index];
> >  
> > +
> > +    //
> > +    // Round up the start address, and round down the end address.
> > +    //
> > +    Base = ALIGN_VALUE (Entry->BaseAddr, (UINT64)EFI_PAGE_SIZE);
> > +    End = (Entry->BaseAddr + Entry->Length) & ~(UINT64)EFI_PAGE_MASK;
> > +
> > +    switch (Entry->Type) {
> > +    case EfiAcpiAddressRangeMemory:
> > +      AddMemoryRangeHob (Base, End);
> > +      break;
> > +    case EfiAcpiAddressRangeACPI:
> > +      //
> > +      // Ignore, OVMF should read the ACPI tables and provide them to linux
> > +      // from a different location.
> 
> Will OVMF also parse dynamic tables to check for references there?

I haven't looked at what OVMF does with the ACPI tables, but Linux seems
fine. I've compared the boot output of linux running as PVH vs booted
via OVMF. Beside the location of the table been different, the number of
table where the same, I don't remember other difference.

> > +      //
> > +      break;
> > +    case EfiAcpiAddressRangeReserved:
> >        //
> > -      // Only care about RAM
> > +      // Avoid ranges marked as reserved in the e820 table provided by
> > +      // hvmloader as it conflicts with an other aperture.
> 
> I think you want the last part of the sentence to be: '... as it
> conflicts with other apertures.'
> 
> I think however that you should make sure ranges marked as reserved in
> the original memory map also end up in the final one, hence
> overlapping ranges should be merged, instead of discarded.

I'll look into that.

> > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> >        //
> > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > -        continue;
> > +      if (!XenHvmloaderDetected ()) {
> > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> 
> This special casing for PVH looks weird, ideally we would like to use
> the same code path, or else it should be explicitly mentioned why PVH
> has diverging behaviour.

I think hvmloader is the issue rather than PVH. Here is part of the
"memory map" as found in hvmloader/config.h:

  /* Special BIOS mappings, etc. are allocated from here upwards... */
  #define RESERVED_MEMBASE              0xFC000000
  /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
  #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
  #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
  #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000

and hvmloader simply creates a single e820 reserved entry, from
RESERVED_MEMBASE to the top of 4GB. It's probably too much.
If hvmloader only reserved
ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not have
to special case hvmloader. As far as I know 0xfee00000 isn't a special
bios mapping, but something the hardware provides.


Anyway, thanks for the feedback, there's probably quite a bit to do to
cleanup the memory stuff. I do think about one day running OVMF without
running hvmloader first :-), but there's a bit more to do.
Laszlo Ersek July 22, 2019, 7:45 p.m. UTC | #3
On 07/22/19 16:53, Anthony PERARD wrote:
> On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
>> On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
>>> When running as a Xen PVH guest, there is no CMOS to read the memory
>>> size from.  Rework GetSystemMemorySize(Below|Above)4gb() so they can
>>> works without CMOS by reading the e820 table.
>>>
>>> Rework XenPublishRamRegions for PVH, handle the Reserve type and explain
>>> about the ACPI type. MTRR settings aren't modified anymore, on HVM, it's
>>> already done by hvmloader, on PVH it is supposed to have sane default.
>>>
>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
>>> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
>>> Acked-by: Laszlo Ersek <lersek@redhat.com>
>>> ---
>>>
>>> Notes:
>>>     Comment for Xen people:
>>>     About MTRR, should we redo the setting in OVMF? Even if in both case of
>>>     PVH and HVM, something would have setup the default type to write back
>>>     and handle a few other ranges like PCI hole, hvmloader for HVM or and
>>>     libxc I think for PVH.
>>
>> That's a tricky question. Ideally we would like the firmware (OVMF) to
>> take care of that, because it already has code to do so. Problem here
>> is that PVH can also be booted without firmware, in which case it
>> needs the hypervisor to have setup some sane initial MTRR state.
>>
>> The statement in the PVH document about initial MTRR state is vague
>> enough that allows Xen to boot into the guest with a minimal MTRR
>> state, that can for example not contain UC regions for the MMIO
>> regions of passed through devices, hence I think OVMF should be in
>> charge of creating a more complete MTRR state if possible.
>>
>> Is this something OVMF already has logic for?
> 
> Well, there are some logic but it's for QEMU (and uses an interface that
> isn't available when running on Xen, fwcfg).
> 
> The logic that was there for Xen HVM was very simple, a single set
> cache-write-back for the RAM, that's why I remove it (and because I'm
> not sure yet I figured out how to run the mtrr functions correctly in
> OVMF).
> 
> I probably going to have to write a new logic which would rewrite the
> MTRR from scratch instead of relying on the existing setup.

MTRR setup is complex in OVMF, in comparison to firmware that runs on
physical machines, because:

- the physical RAM size can change from boot to boot, with almost total
freedom, and that can incur some unexpected changes in the physical RAM
map too (i.e. affect not just the end, but holes)

- the number of variable MTRRs is severely limited and can't cover an
arbitrary physical RAM map. And, some platform-independent modules in
edk2 consume variable MTRRs too, via gDS->SetMemorySpaceAttributes(), so
we have to be very conservative with even those variable MTRRs that exist.

Even on QEMU i440fx & pc, we've *just* made OVMF cope with an arbitrary
guest RAM size (that is, beyond 128MB), and that logic relies on some
open-coded board-specific knowledge about low (<4G) RAM size. So much so
that, on i440fx, we place the 32-bit PCI IOMMU aperture based on what we
can configure with a minimal amount of variable MTRRs, and not vice
versa (i.e. we don't first set the 32-bit MMIO aperture and then attempt
to mark it as uncached). Please see:

  https://bugzilla.tianocore.org/show_bug.cgi?id=1814

This is one of the nastiest parts of OVMF. (PlatformPei is, in general.)

Thanks
Laszlo
Laszlo Ersek July 22, 2019, 11:08 p.m. UTC | #4
On 07/22/19 21:45, Laszlo Ersek wrote:

> we place the 32-bit PCI IOMMU aperture based on [...]

Do I get a medal for this hugely confusing typo? :)

In earnest, I'm sorry about it -- my comment had nothing to do with
"IOMMU"; I meant "MMIO". (At least I got it right in the rest of the email.)

Sorry!
Laszlo
Roger Pau Monne July 23, 2019, 9:42 a.m. UTC | #5
On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > When running as a Xen PVH guest, there is no CMOS to read the memory
> > > size from.  Rework GetSystemMemorySize(Below|Above)4gb() so they can
> > > works without CMOS by reading the e820 table.
> > > 
> > > Rework XenPublishRamRegions for PVH, handle the Reserve type and explain
> > > about the ACPI type. MTRR settings aren't modified anymore, on HVM, it's
> > > already done by hvmloader, on PVH it is supposed to have sane default.
> > > 
> > > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
> > > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> > > Acked-by: Laszlo Ersek <lersek@redhat.com>
> > > ---
> > > 
> > > Notes:
> > >     Comment for Xen people:
> > >     About MTRR, should we redo the setting in OVMF? Even if in both case of
> > >     PVH and HVM, something would have setup the default type to write back
> > >     and handle a few other ranges like PCI hole, hvmloader for HVM or and
> > >     libxc I think for PVH.
> > 
> > That's a tricky question. Ideally we would like the firmware (OVMF) to
> > take care of that, because it already has code to do so. Problem here
> > is that PVH can also be booted without firmware, in which case it
> > needs the hypervisor to have setup some sane initial MTRR state.
> > 
> > The statement in the PVH document about initial MTRR state is vague
> > enough that allows Xen to boot into the guest with a minimal MTRR
> > state, that can for example not contain UC regions for the MMIO
> > regions of passed through devices, hence I think OVMF should be in
> > charge of creating a more complete MTRR state if possible.
> > 
> > Is this something OVMF already has logic for?
> 
> Well, there are some logic but it's for QEMU (and uses an interface that
> isn't available when running on Xen, fwcfg).
> 
> The logic that was there for Xen HVM was very simple, a single set
> cache-write-back for the RAM, that's why I remove it (and because I'm
> not sure yet I figured out how to run the mtrr functions correctly in
> OVMF).
> 
> I probably going to have to write a new logic which would rewrite the
> MTRR from scratch instead of relying on the existing setup.
> 
> > Also accounting for the MMIO regions of devices?
> 
> I'll have to dig deeper into OVMF codes, and PCI device handling. On
> HVM, we have a different logic than the one for QEMU, OVMF only scan
> what hvmloader have done instead of re-setup the pci devices. I'm
> probably missing other stuff.

MTRR setup it's always a PITA, I was hoping OVMF could manage to do
that based on the memory map plus scanning for devices and positioning
BARs, but if it gets the information from other side-channels it's
going to be more complicated.

Anyway, something to improve in the future in order to get rid of
hvmloader.

> > > diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c b/OvmfPkg/XenPlatformPei/MemDetect.c
> > > index cb7dd93ad6..3e33e7f414 100644
> > > --- a/OvmfPkg/XenPlatformPei/MemDetect.c
> > > +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
> > > @@ -96,6 +96,47 @@ Q35TsegMbytesInitialization (
> > >    mQ35TsegMbytes = ExtendedTsegMbytes;
> > >  }
> > >  
> > > +STATIC
> > > +UINT64
> > > +GetHighestSystemMemoryAddress (
> > > +  BOOLEAN       Below4gb
> > > +  )
> > > +{
> > > +  EFI_E820_ENTRY64    *E820Map;
> > > +  UINT32              E820EntriesCount;
> > > +  EFI_E820_ENTRY64    *Entry;
> > > +  EFI_STATUS          Status;
> > > +  UINT32              Loop;
> > > +  UINT64              HighestAddress;
> > > +  UINT64              EntryEnd;
> > > +
> > > +  HighestAddress = 0;
> > > +
> > > +  Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> > 
> > You could maybe initialize this as a global to avoid having to issue
> > a hypercall each time you need to get something from the memory map.
> 
> That function does that, it only make the hypercall once. (The hypercall
> can only be made once anyway, the second time Xen doesn't return the
> map.)

Why? I'm looking at the implementation in Xen of XENMEM_memory_map and
I'm not sure I see how/why the hypercall can only be made once. AFAICT
you should be able to call XENMEM_memory_map multiple times without
issues, or else it's a bug somewhere.

> > > +  ASSERT_EFI_ERROR (Status);
> > > +
> > > +  for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > > +    Entry = E820Map + Loop;
> > > +    EntryEnd = Entry->BaseAddr + Entry->Length;
> > > +
> > > +    if (Entry->Type == EfiAcpiAddressRangeMemory &&
> > > +        EntryEnd > HighestAddress) {
> > > +
> > > +      if (Below4gb && (EntryEnd <= BASE_4GB)) {
> > > +        HighestAddress = EntryEnd;
> > > +      } else if (!Below4gb && (EntryEnd >= BASE_4GB)) {
> > > +        HighestAddress = EntryEnd;
> > > +      }
> > > +    }
> > > +  }
> > > +
> > > +  //
> > > +  // Round down the end address.
> > > +  //
> > > +  HighestAddress &= ~(UINT64)EFI_PAGE_MASK;
> > > +
> > > +  return HighestAddress;
> > 
> > You could do the rounding on the return statement.
> 
> Yes, I think that can be done.
> 
> > > +}
> > >  
> > >  UINT32
> > >  GetSystemMemorySizeBelow4gb (
> > > @@ -105,6 +146,19 @@ GetSystemMemorySizeBelow4gb (
> > >    UINT8 Cmos0x34;
> > >    UINT8 Cmos0x35;
> > >  
> > > +  //
> > > +  // In PVH case, there is no CMOS, we have to calculate the memory size
> > > +  // from parsing the E820
> > > +  //
> > > +  if (XenPvhDetected ()) {
> > 
> > IIRC on HVM you can also get the memory map from the hypercall, in
> > which case you could use the same code path for both HVM and PVH.
> 
> I think that wouldn't work because in my experiment, the hypercall would
> only return the map the first time (at least on PVH). hvmloader already
> make the hypercall so OVMF can't.

OK, I'm not sure the reason for this, as I said above I think this is
a bug somewhere. You should be able to call XENMEM_memory_map multiple
times.

> On the other hand, XenGetE820Map() return an E820 map, it doesn't matter
> if it's the one passed by hvmloader, or the one we've got directly from
> Xen. So I guess we could ignore what hvmloader have written in the CMOS

Hm, I'm not sure hvmloader uploads a new memory map to Xen (using
XENMEM_set_memory_map) if it does any modifications to it. It should
certainly do it, so that the guest OS gets the same memory map from
the hypercall or from the firmware.

> and use the information in the e820 directly. But I think I let this
> change for future patch.
> 
> > > +    UINT64  HighestAddress;
> > > +
> > > +    HighestAddress = GetHighestSystemMemoryAddress (TRUE);
> > > +    ASSERT (HighestAddress > 0 && HighestAddress <= BASE_4GB);
> > > +
> > > +    return HighestAddress;
> > 
> > The name of the function here is GetSystemMemorySizeBelow4gb, but you
> > are returning the highest memory address in the range, is this
> > expected?
> > 
> > ie: highest address != memory size
> > 
> > On HVM there are quite some holes in the memory map, and nothing
> > guarantees there are no memory regions after the holes or non-RAM
> > regions.
> 
> I think that's what is expected by caller of the function.

OK, the naming is slightly misleading, just wanted to make sure this
was the expected value.

> > > +  }
> > > +
> > >    //
> > >    // CMOS 0x34/0x35 specifies the system memory above 16 MB.
> > >    // * CMOS(0x35) is the high byte
> > > diff --git a/OvmfPkg/XenPlatformPei/Xen.c b/OvmfPkg/XenPlatformPei/Xen.c
> > > index cbfd8058fc..62a2c3ed93 100644
> > > --- a/OvmfPkg/XenPlatformPei/Xen.c
> > > +++ b/OvmfPkg/XenPlatformPei/Xen.c
> > > @@ -279,6 +279,8 @@ XenPublishRamRegions (
> > >    EFI_E820_ENTRY64  *E820Map;
> > >    UINT32            E820EntriesCount;
> > >    EFI_STATUS        Status;
> > > +  EFI_E820_ENTRY64 *Entry;
> > > +  UINTN Index;
> > >  
> > >    DEBUG ((EFI_D_INFO, "Using memory map provided by Xen\n"));
> > >  
> > > @@ -287,26 +289,45 @@ XenPublishRamRegions (
> > >    //
> > >    E820EntriesCount = 0;
> > >    Status = XenGetE820Map (&E820Map, &E820EntriesCount);
> > > -
> > >    ASSERT_EFI_ERROR (Status);
> > >  
> > > -  if (E820EntriesCount > 0) {
> > > -    EFI_E820_ENTRY64 *Entry;
> > > -    UINT32 Loop;
> > > +  for (Index = 0; Index < E820EntriesCount; Index++) {
> > > +    UINT64 Base;
> > > +    UINT64 End;
> > >  
> > > -    for (Loop = 0; Loop < E820EntriesCount; Loop++) {
> > > -      Entry = E820Map + Loop;
> > > +    Entry = &E820Map[Index];
> > >  
> > > +
> > > +    //
> > > +    // Round up the start address, and round down the end address.
> > > +    //
> > > +    Base = ALIGN_VALUE (Entry->BaseAddr, (UINT64)EFI_PAGE_SIZE);
> > > +    End = (Entry->BaseAddr + Entry->Length) & ~(UINT64)EFI_PAGE_MASK;
> > > +
> > > +    switch (Entry->Type) {
> > > +    case EfiAcpiAddressRangeMemory:
> > > +      AddMemoryRangeHob (Base, End);
> > > +      break;
> > > +    case EfiAcpiAddressRangeACPI:
> > > +      //
> > > +      // Ignore, OVMF should read the ACPI tables and provide them to linux
> > > +      // from a different location.
> > 
> > Will OVMF also parse dynamic tables to check for references there?
> 
> I haven't looked at what OVMF does with the ACPI tables, but Linux seems
> fine. I've compared the boot output of linux running as PVH vs booted
> via OVMF. Beside the location of the table been different, the number of
> table where the same, I don't remember other difference.

OK, what I find weird is that you seem to discard quite a lot of stuff
from the original memory map, and then reconstruct it afterwards I
assume?

It would seem safer to not discard regions from the memory map
provided to OVMF, and instead just build on top of it. I would expect
for example that OVMF will use some of the RAM regions on the memory
map, and it should likely turn those areas from RAM into reserved
regions.

> > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > >        //
> > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > -        continue;
> > > +      if (!XenHvmloaderDetected ()) {
> > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > 
> > This special casing for PVH looks weird, ideally we would like to use
> > the same code path, or else it should be explicitly mentioned why PVH
> > has diverging behaviour.
> 
> I think hvmloader is the issue rather than PVH. Here is part of the
> "memory map" as found in hvmloader/config.h:
> 
>   /* Special BIOS mappings, etc. are allocated from here upwards... */
>   #define RESERVED_MEMBASE              0xFC000000
>   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
>   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
>   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
>   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> 
> and hvmloader simply creates a single e820 reserved entry, from
> RESERVED_MEMBASE to the top of 4GB. It's probably too much.

But isn't this kind of dangerous? How can you assure future versions
of hvmloader won't use this space?

> If hvmloader only reserved
> ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not have
> to special case hvmloader.

Could we look into getting this fixed in hvmloader then?

I think it's dangerous for OVMF to play such tricks with the memory
map.

> As far as I know 0xfee00000 isn't a special
> bios mapping, but something the hardware provides.

Yes, that's used by the lapic, so it's not specific to hvmloader.

> Anyway, thanks for the feedback, there's probably quite a bit to do to
> cleanup the memory stuff. I do think about one day running OVMF without
> running hvmloader first :-), but there's a bit more to do.

Sorry if some of my comments are out of place, I certainly know very
little about OVMF code base, or the Xen bits in it.

Running OVMF without hvmloader would be my end goal also if possible
:).

Thanks, Roger.
Anthony PERARD July 23, 2019, 4:06 p.m. UTC | #6
On Tue, Jul 23, 2019 at 11:42:07AM +0200, Roger Pau Monné wrote:
> On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> > On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > You could maybe initialize this as a global to avoid having to issue
> > > a hypercall each time you need to get something from the memory map.
> > 
> > That function does that, it only make the hypercall once. (The hypercall
> > can only be made once anyway, the second time Xen doesn't return the
> > map.)
> 
> Why? I'm looking at the implementation in Xen of XENMEM_memory_map and
> I'm not sure I see how/why the hypercall can only be made once. AFAICT
> you should be able to call XENMEM_memory_map multiple times without
> issues, or else it's a bug somewhere.

:-(, I probably made a mistake when testing that. I tried again and
calling the hypercall serveral time gave the same result. Sorry for the
noise.

> > > > +}
> > > >  
> > > >  UINT32
> > > >  GetSystemMemorySizeBelow4gb (
> > > > @@ -105,6 +146,19 @@ GetSystemMemorySizeBelow4gb (
> > > >    UINT8 Cmos0x34;
> > > >    UINT8 Cmos0x35;
> > > >  
> > > > +  //
> > > > +  // In PVH case, there is no CMOS, we have to calculate the memory size
> > > > +  // from parsing the E820
> > > > +  //
> > > > +  if (XenPvhDetected ()) {
> > > 
> > > IIRC on HVM you can also get the memory map from the hypercall, in
> > > which case you could use the same code path for both HVM and PVH.
> > 
> > I think that wouldn't work because in my experiment, the hypercall would
> > only return the map the first time (at least on PVH). hvmloader already
> > make the hypercall so OVMF can't.
> 
> OK, I'm not sure the reason for this, as I said above I think this is
> a bug somewhere. You should be able to call XENMEM_memory_map multiple
> times.
> 
> > On the other hand, XenGetE820Map() return an E820 map, it doesn't matter
> > if it's the one passed by hvmloader, or the one we've got directly from
> > Xen. So I guess we could ignore what hvmloader have written in the CMOS
> 
> Hm, I'm not sure hvmloader uploads a new memory map to Xen (using
> XENMEM_set_memory_map) if it does any modifications to it. It should
> certainly do it, so that the guest OS gets the same memory map from
> the hypercall or from the firmware.

hvmloader doesn't call XENMEM_set_memory_map (I don't find that string
in the source code), also, I've tested again calling the get memory_map
hypercall in HVM guests and the e820 from hvmloader is different from
the one from the hypercall:

from hvmloader:
    Type Mem  - 00000000 -> 000A0000
    Type Res  - 000F0000 -> 00100000
    Type Mem  - 00100000 -> 3F6B3000
    Type Res  - FC000000 -> 100000000
from Xen:
    Type Mem  - 00100000 -> 3F800000

> > > > +    switch (Entry->Type) {
> > > > +    case EfiAcpiAddressRangeMemory:
> > > > +      AddMemoryRangeHob (Base, End);
> > > > +      break;
> > > > +    case EfiAcpiAddressRangeACPI:
> > > > +      //
> > > > +      // Ignore, OVMF should read the ACPI tables and provide them to linux
> > > > +      // from a different location.
> > > 
> > > Will OVMF also parse dynamic tables to check for references there?
> > 
> > I haven't looked at what OVMF does with the ACPI tables, but Linux seems
> > fine. I've compared the boot output of linux running as PVH vs booted
> > via OVMF. Beside the location of the table been different, the number of
> > table where the same, I don't remember other difference.
> 
> OK, what I find weird is that you seem to discard quite a lot of stuff
> from the original memory map, and then reconstruct it afterwards I
> assume?
> 
> It would seem safer to not discard regions from the memory map
> provided to OVMF, and instead just build on top of it. I would expect

OK, I'll add back the EfiAcpiAddressRangeACPI into the reserved regions.

> for example that OVMF will use some of the RAM regions on the memory
> map, and it should likely turn those areas from RAM into reserved
> regions.
> 
> > > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > > >        //
> > > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > > -        continue;
> > > > +      if (!XenHvmloaderDetected ()) {
> > > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > > 
> > > This special casing for PVH looks weird, ideally we would like to use
> > > the same code path, or else it should be explicitly mentioned why PVH
> > > has diverging behaviour.
> > 
> > I think hvmloader is the issue rather than PVH. Here is part of the
> > "memory map" as found in hvmloader/config.h:
> > 
> >   /* Special BIOS mappings, etc. are allocated from here upwards... */
> >   #define RESERVED_MEMBASE              0xFC000000
> >   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
> >   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
> >   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
> >   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> > 
> > and hvmloader simply creates a single e820 reserved entry, from
> > RESERVED_MEMBASE to the top of 4GB. It's probably too much.
> 
> But isn't this kind of dangerous? How can you assure future versions
> of hvmloader won't use this space?
> 
> > If hvmloader only reserved
> > ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not have
> > to special case hvmloader.
> 
> Could we look into getting this fixed in hvmloader then?
> 
> I think it's dangerous for OVMF to play such tricks with the memory
> map.

It's not worse than before :), there's a lot more to cleanup :(.

I've left an hvmloader specific call
    AddReservedMemoryBaseSizeHob (0xFC000000, 0x1000000, FALSE);
which is outside of that XenPublishRamRegions() function. See:
https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg00360.html
I probably need to remove that now and see if I can simply use
hvmloader's e820.
Anthony PERARD July 24, 2019, 4:17 p.m. UTC | #7
On Tue, Jul 23, 2019 at 11:42:07AM +0200, Roger Pau Monné wrote:
> On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> > On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > > >        //
> > > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > > -        continue;
> > > > +      if (!XenHvmloaderDetected ()) {
> > > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > > 
> > > This special casing for PVH looks weird, ideally we would like to use
> > > the same code path, or else it should be explicitly mentioned why PVH
> > > has diverging behaviour.
> > 
> > I think hvmloader is the issue rather than PVH. Here is part of the
> > "memory map" as found in hvmloader/config.h:
> > 
> >   /* Special BIOS mappings, etc. are allocated from here upwards... */
> >   #define RESERVED_MEMBASE              0xFC000000
> >   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
> >   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
> >   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
> >   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> > 
> > and hvmloader simply creates a single e820 reserved entry, from
> > RESERVED_MEMBASE to the top of 4GB. It's probably too much.
> 
> But isn't this kind of dangerous? How can you assure future versions
> of hvmloader won't use this space?
> 
> > If hvmloader only reserved
> > ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not have
> > to special case hvmloader.
> 
> Could we look into getting this fixed in hvmloader then?
> 
> I think it's dangerous for OVMF to play such tricks with the memory
> map.
> 
> > As far as I know 0xfee00000 isn't a special
> > bios mapping, but something the hardware provides.
> 
> Yes, that's used by the lapic, so it's not specific to hvmloader.

Right, I've got a closer look at that CpuDxe module, it wants the local
APIC memory mapped space to be "mapped IO", and that different than
"reserved".

So while parsing the e820 from hvmloader, instead of ignoring all
reserved region, I'm going to avoid adding the local apic memory mapped
space.

something like:
  if (hvmloaderDetected())
    Base = $(start of the e820 entry);
    End = $(start of the e820 entry + size);
    LocalApic = 0xfee00000;
    if (Base < LocalApic && LocalApic < End) {
      AddReservedMemoryRangeHob (Base, LocalApic, FALSE);
      if (End > (LocalApic + SIZE_4KB)) {
        AddReservedMemoryRangeHob (LocalApic + SIZE_4KB, End, FALSE);
      }
    }
  }

Also, I will always add the 0xfee00000 as mapped IO, CpuDxe will not
complain as the region will be of the expected type.

I think with that change (and the other about the ACPI entries),
everything from the e820 table will be put into OVMF's memory map.
Roger Pau Monne July 25, 2019, 9:08 a.m. UTC | #8
On Wed, Jul 24, 2019 at 05:17:59PM +0100, Anthony PERARD wrote:
> On Tue, Jul 23, 2019 at 11:42:07AM +0200, Roger Pau Monné wrote:
> > On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> > > On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > > > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > > > >        //
> > > > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > > > -        continue;
> > > > > +      if (!XenHvmloaderDetected ()) {
> > > > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > > > 
> > > > This special casing for PVH looks weird, ideally we would like to use
> > > > the same code path, or else it should be explicitly mentioned why PVH
> > > > has diverging behaviour.
> > > 
> > > I think hvmloader is the issue rather than PVH. Here is part of the
> > > "memory map" as found in hvmloader/config.h:
> > > 
> > >   /* Special BIOS mappings, etc. are allocated from here upwards... */
> > >   #define RESERVED_MEMBASE              0xFC000000
> > >   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
> > >   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
> > >   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
> > >   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> > > 
> > > and hvmloader simply creates a single e820 reserved entry, from
> > > RESERVED_MEMBASE to the top of 4GB. It's probably too much.
> > 
> > But isn't this kind of dangerous? How can you assure future versions
> > of hvmloader won't use this space?
> > 
> > > If hvmloader only reserved
> > > ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not have
> > > to special case hvmloader.
> > 
> > Could we look into getting this fixed in hvmloader then?
> > 
> > I think it's dangerous for OVMF to play such tricks with the memory
> > map.
> > 
> > > As far as I know 0xfee00000 isn't a special
> > > bios mapping, but something the hardware provides.
> > 
> > Yes, that's used by the lapic, so it's not specific to hvmloader.
> 
> Right, I've got a closer look at that CpuDxe module, it wants the local
> APIC memory mapped space to be "mapped IO", and that different than
> "reserved".
> 
> So while parsing the e820 from hvmloader, instead of ignoring all
> reserved region, I'm going to avoid adding the local apic memory mapped
> space.
> 
> something like:
>   if (hvmloaderDetected())

I don't think you need to gate this on hvmloader being used, while
it's true that PVH memory map doesn't contain such reserved memory
region ATM I don't see any harm in doing this for PVH also.

>     Base = $(start of the e820 entry);
>     End = $(start of the e820 entry + size);
>     LocalApic = 0xfee00000;
>     if (Base < LocalApic && LocalApic < End) {
>       AddReservedMemoryRangeHob (Base, LocalApic, FALSE);
>       if (End > (LocalApic + SIZE_4KB)) {

The range is actually from 0xfee00000 to 0xfeefffff (2MB), so you
likely want to make sure non of this is added as reserved?

Thanks, Roger.
Anthony PERARD July 25, 2019, 10:05 a.m. UTC | #9
On Thu, Jul 25, 2019 at 11:08:29AM +0200, Roger Pau Monné wrote:
> On Wed, Jul 24, 2019 at 05:17:59PM +0100, Anthony PERARD wrote:
> > On Tue, Jul 23, 2019 at 11:42:07AM +0200, Roger Pau Monné wrote:
> > > On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> > > > On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > > > > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > > > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > > > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > > > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > > > > >        //
> > > > > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > > > > -        continue;
> > > > > > +      if (!XenHvmloaderDetected ()) {
> > > > > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > > > > 
> > > > > This special casing for PVH looks weird, ideally we would like to use
> > > > > the same code path, or else it should be explicitly mentioned why PVH
> > > > > has diverging behaviour.
> > > > 
> > > > I think hvmloader is the issue rather than PVH. Here is part of the
> > > > "memory map" as found in hvmloader/config.h:
> > > > 
> > > >   /* Special BIOS mappings, etc. are allocated from here upwards... */
> > > >   #define RESERVED_MEMBASE              0xFC000000
> > > >   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
> > > >   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
> > > >   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
> > > >   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> > > > 
> > > > and hvmloader simply creates a single e820 reserved entry, from
> > > > RESERVED_MEMBASE to the top of 4GB. It's probably too much.
> > > 
> > > But isn't this kind of dangerous? How can you assure future versions
> > > of hvmloader won't use this space?
> > > 
> > > > If hvmloader only reserved
> > > > ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not have
> > > > to special case hvmloader.
> > > 
> > > Could we look into getting this fixed in hvmloader then?
> > > 
> > > I think it's dangerous for OVMF to play such tricks with the memory
> > > map.
> > > 
> > > > As far as I know 0xfee00000 isn't a special
> > > > bios mapping, but something the hardware provides.
> > > 
> > > Yes, that's used by the lapic, so it's not specific to hvmloader.
> > 
> > Right, I've got a closer look at that CpuDxe module, it wants the local
> > APIC memory mapped space to be "mapped IO", and that different than
> > "reserved".
> > 
> > So while parsing the e820 from hvmloader, instead of ignoring all
> > reserved region, I'm going to avoid adding the local apic memory mapped
> > space.
> > 
> > something like:
> >   if (hvmloaderDetected())
> 
> I don't think you need to gate this on hvmloader being used, while
> it's true that PVH memory map doesn't contain such reserved memory
> region ATM I don't see any harm in doing this for PVH also.

Ok.

> >     Base = $(start of the e820 entry);
> >     End = $(start of the e820 entry + size);
> >     LocalApic = 0xfee00000;
> >     if (Base < LocalApic && LocalApic < End) {
> >       AddReservedMemoryRangeHob (Base, LocalApic, FALSE);
> >       if (End > (LocalApic + SIZE_4KB)) {
> 
> The range is actually from 0xfee00000 to 0xfeefffff (2MB), so you
> likely want to make sure non of this is added as reserved?

You mean 1MB, right ? :-). I've try to find out in the Intel manual why
it would be 1MB and couldn't find that, but on the other hand the
initialisation code for OVMF running on QEMU does also reserve 1MB for
the local apic. So I'll change to 1MB.

Thanks,
Roger Pau Monne July 25, 2019, 11:03 a.m. UTC | #10
On Thu, Jul 25, 2019 at 11:05:34AM +0100, Anthony PERARD wrote:
> On Thu, Jul 25, 2019 at 11:08:29AM +0200, Roger Pau Monné wrote:
> > On Wed, Jul 24, 2019 at 05:17:59PM +0100, Anthony PERARD wrote:
> > > On Tue, Jul 23, 2019 at 11:42:07AM +0200, Roger Pau Monné wrote:
> > > > On Mon, Jul 22, 2019 at 03:53:19PM +0100, Anthony PERARD wrote:
> > > > > On Mon, Jul 15, 2019 at 04:15:21PM +0200, Roger Pau Monné wrote:
> > > > > > On Thu, Jul 04, 2019 at 03:42:22PM +0100, Anthony PERARD wrote:
> > > > > > > +      // error message: CpuDxe: IntersectMemoryDescriptor:
> > > > > > > +      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
> > > > > > > +      //        conflicts with aperture [FEE00000, FEE01000) cap 1
> > > > > > >        //
> > > > > > > -      if (Entry->Type != EfiAcpiAddressRangeMemory) {
> > > > > > > -        continue;
> > > > > > > +      if (!XenHvmloaderDetected ()) {
> > > > > > > +        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
> > > > > > 
> > > > > > This special casing for PVH looks weird, ideally we would like to use
> > > > > > the same code path, or else it should be explicitly mentioned why PVH
> > > > > > has diverging behaviour.
> > > > > 
> > > > > I think hvmloader is the issue rather than PVH. Here is part of the
> > > > > "memory map" as found in hvmloader/config.h:
> > > > > 
> > > > >   /* Special BIOS mappings, etc. are allocated from here upwards... */
> > > > >   #define RESERVED_MEMBASE              0xFC000000
> > > > >   /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
> > > > >   #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
> > > > >   #define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
> > > > >   #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
> > > > > 
> > > > > and hvmloader simply creates a single e820 reserved entry, from
> > > > > RESERVED_MEMBASE to the top of 4GB. It's probably too much.
> > > > 
> > > > But isn't this kind of dangerous? How can you assure future versions
> > > > of hvmloader won't use this space?
> > > > 
> > > > > If hvmloader only reserved
> > > > > ACPI_INFO_PHYSICAL_ADDRESS-RESERVED_MEMORY_DYNAMIC_END, I might not have
> > > > > to special case hvmloader.
> > > > 
> > > > Could we look into getting this fixed in hvmloader then?
> > > > 
> > > > I think it's dangerous for OVMF to play such tricks with the memory
> > > > map.
> > > > 
> > > > > As far as I know 0xfee00000 isn't a special
> > > > > bios mapping, but something the hardware provides.
> > > > 
> > > > Yes, that's used by the lapic, so it's not specific to hvmloader.
> > > 
> > > Right, I've got a closer look at that CpuDxe module, it wants the local
> > > APIC memory mapped space to be "mapped IO", and that different than
> > > "reserved".
> > > 
> > > So while parsing the e820 from hvmloader, instead of ignoring all
> > > reserved region, I'm going to avoid adding the local apic memory mapped
> > > space.
> > > 
> > > something like:
> > >   if (hvmloaderDetected())
> > 
> > I don't think you need to gate this on hvmloader being used, while
> > it's true that PVH memory map doesn't contain such reserved memory
> > region ATM I don't see any harm in doing this for PVH also.
> 
> Ok.
> 
> > >     Base = $(start of the e820 entry);
> > >     End = $(start of the e820 entry + size);
> > >     LocalApic = 0xfee00000;
> > >     if (Base < LocalApic && LocalApic < End) {
> > >       AddReservedMemoryRangeHob (Base, LocalApic, FALSE);
> > >       if (End > (LocalApic + SIZE_4KB)) {
> > 
> > The range is actually from 0xfee00000 to 0xfeefffff (2MB), so you
> > likely want to make sure non of this is added as reserved?
> 
> You mean 1MB, right ? :-).

D'oh, yes :).

> I've try to find out in the Intel manual why
> it would be 1MB and couldn't find that, but on the other hand the
> initialisation code for OVMF running on QEMU does also reserve 1MB for
> the local apic. So I'll change to 1MB.

It's the Interrupt Address Range, which contains the mmio lapic
registers and the mmio region where devices write in order to signal
interrupts to the apic (used as the address field for MSI(-X)).

Thanks, Roger.
diff mbox series

Patch

diff --git a/OvmfPkg/XenPlatformPei/Platform.h b/OvmfPkg/XenPlatformPei/Platform.h
index db9a62572f..e8e0b835a5 100644
--- a/OvmfPkg/XenPlatformPei/Platform.h
+++ b/OvmfPkg/XenPlatformPei/Platform.h
@@ -114,6 +114,12 @@  XenPublishRamRegions (
   VOID
   );
 
+EFI_STATUS
+XenGetE820Map (
+  EFI_E820_ENTRY64 **Entries,
+  UINT32 *Count
+  );
+
 extern EFI_BOOT_MODE mBootMode;
 
 extern UINT8 mPhysMemAddressWidth;
diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c b/OvmfPkg/XenPlatformPei/MemDetect.c
index cb7dd93ad6..3e33e7f414 100644
--- a/OvmfPkg/XenPlatformPei/MemDetect.c
+++ b/OvmfPkg/XenPlatformPei/MemDetect.c
@@ -96,6 +96,47 @@  Q35TsegMbytesInitialization (
   mQ35TsegMbytes = ExtendedTsegMbytes;
 }
 
+STATIC
+UINT64
+GetHighestSystemMemoryAddress (
+  BOOLEAN       Below4gb
+  )
+{
+  EFI_E820_ENTRY64    *E820Map;
+  UINT32              E820EntriesCount;
+  EFI_E820_ENTRY64    *Entry;
+  EFI_STATUS          Status;
+  UINT32              Loop;
+  UINT64              HighestAddress;
+  UINT64              EntryEnd;
+
+  HighestAddress = 0;
+
+  Status = XenGetE820Map (&E820Map, &E820EntriesCount);
+  ASSERT_EFI_ERROR (Status);
+
+  for (Loop = 0; Loop < E820EntriesCount; Loop++) {
+    Entry = E820Map + Loop;
+    EntryEnd = Entry->BaseAddr + Entry->Length;
+
+    if (Entry->Type == EfiAcpiAddressRangeMemory &&
+        EntryEnd > HighestAddress) {
+
+      if (Below4gb && (EntryEnd <= BASE_4GB)) {
+        HighestAddress = EntryEnd;
+      } else if (!Below4gb && (EntryEnd >= BASE_4GB)) {
+        HighestAddress = EntryEnd;
+      }
+    }
+  }
+
+  //
+  // Round down the end address.
+  //
+  HighestAddress &= ~(UINT64)EFI_PAGE_MASK;
+
+  return HighestAddress;
+}
 
 UINT32
 GetSystemMemorySizeBelow4gb (
@@ -105,6 +146,19 @@  GetSystemMemorySizeBelow4gb (
   UINT8 Cmos0x34;
   UINT8 Cmos0x35;
 
+  //
+  // In PVH case, there is no CMOS, we have to calculate the memory size
+  // from parsing the E820
+  //
+  if (XenPvhDetected ()) {
+    UINT64  HighestAddress;
+
+    HighestAddress = GetHighestSystemMemoryAddress (TRUE);
+    ASSERT (HighestAddress > 0 && HighestAddress <= BASE_4GB);
+
+    return HighestAddress;
+  }
+
   //
   // CMOS 0x34/0x35 specifies the system memory above 16 MB.
   // * CMOS(0x35) is the high byte
@@ -129,6 +183,23 @@  GetSystemMemorySizeAbove4gb (
   UINT32 Size;
   UINTN  CmosIndex;
 
+  //
+  // In PVH case, there is no CMOS, we have to calculate the memory size
+  // from parsing the E820
+  //
+  if (XenPvhDetected ()) {
+    UINT64  HighestAddress;
+
+    HighestAddress = GetHighestSystemMemoryAddress (FALSE);
+    ASSERT (HighestAddress == 0 || HighestAddress >= BASE_4GB);
+
+    if (HighestAddress >= BASE_4GB) {
+      HighestAddress -= BASE_4GB;
+    }
+
+    return HighestAddress;
+  }
+
   //
   // CMOS 0x5b-0x5d specifies the system memory above 4GB MB.
   // * CMOS(0x5d) is the most significant size byte
diff --git a/OvmfPkg/XenPlatformPei/Xen.c b/OvmfPkg/XenPlatformPei/Xen.c
index cbfd8058fc..62a2c3ed93 100644
--- a/OvmfPkg/XenPlatformPei/Xen.c
+++ b/OvmfPkg/XenPlatformPei/Xen.c
@@ -279,6 +279,8 @@  XenPublishRamRegions (
   EFI_E820_ENTRY64  *E820Map;
   UINT32            E820EntriesCount;
   EFI_STATUS        Status;
+  EFI_E820_ENTRY64 *Entry;
+  UINTN Index;
 
   DEBUG ((EFI_D_INFO, "Using memory map provided by Xen\n"));
 
@@ -287,26 +289,45 @@  XenPublishRamRegions (
   //
   E820EntriesCount = 0;
   Status = XenGetE820Map (&E820Map, &E820EntriesCount);
-
   ASSERT_EFI_ERROR (Status);
 
-  if (E820EntriesCount > 0) {
-    EFI_E820_ENTRY64 *Entry;
-    UINT32 Loop;
+  for (Index = 0; Index < E820EntriesCount; Index++) {
+    UINT64 Base;
+    UINT64 End;
 
-    for (Loop = 0; Loop < E820EntriesCount; Loop++) {
-      Entry = E820Map + Loop;
+    Entry = &E820Map[Index];
 
+
+    //
+    // Round up the start address, and round down the end address.
+    //
+    Base = ALIGN_VALUE (Entry->BaseAddr, (UINT64)EFI_PAGE_SIZE);
+    End = (Entry->BaseAddr + Entry->Length) & ~(UINT64)EFI_PAGE_MASK;
+
+    switch (Entry->Type) {
+    case EfiAcpiAddressRangeMemory:
+      AddMemoryRangeHob (Base, End);
+      break;
+    case EfiAcpiAddressRangeACPI:
+      //
+      // Ignore, OVMF should read the ACPI tables and provide them to linux
+      // from a different location.
+      //
+      break;
+    case EfiAcpiAddressRangeReserved:
       //
-      // Only care about RAM
+      // Avoid ranges marked as reserved in the e820 table provided by
+      // hvmloader as it conflicts with an other aperture.
+      // error message: CpuDxe: IntersectMemoryDescriptor:
+      //        desc [FC000000, 100000000) type 1 cap 8700000000026001
+      //        conflicts with aperture [FEE00000, FEE01000) cap 1
       //
-      if (Entry->Type != EfiAcpiAddressRangeMemory) {
-        continue;
+      if (!XenHvmloaderDetected ()) {
+        AddReservedMemoryBaseSizeHob (Base, End - Base, FALSE);
       }
-
-      AddMemoryBaseSizeHob (Entry->BaseAddr, Entry->Length);
-
-      MtrrSetMemoryAttribute (Entry->BaseAddr, Entry->Length, CacheWriteBack);
+      break;
+    default:
+      break;
     }
   }
 }