[v3,3/4] Add a new hypercall to get the ESRT

Message ID Yl7aC2a+TtOaFtqZ@itl-email (mailing list archive)
State New
Series EFI System Resource Table support

Commit Message

Demi Marie Obenour April 19, 2022, 3:49 p.m. UTC
This hypercall can be used to get the ESRT from the hypervisor.  A
successful return also indicates that Xen has reserved the ESRT, so
dom0 can safely parse it.
---
 xen/common/efi/boot.c         | 15 ++++++++++-----
 xen/common/efi/efi.h          |  2 ++
 xen/common/efi/runtime.c      | 14 ++++++++++++++
 xen/include/public/platform.h |  7 +++++++
 4 files changed, 33 insertions(+), 5 deletions(-)

Comments

Jan Beulich April 27, 2022, 8:56 a.m. UTC | #1
On 19.04.2022 17:49, Demi Marie Obenour wrote:
> This hypercall can be used to get the ESRT from the hypervisor.  It
> returning successfully also indicates that Xen has reserved the ESRT and
> it can safely be parsed by dom0.

I'm not convinced of the need, and I view such an addition as inconsistent
with the original intentions. The pointer comes from the config table,
which Dom0 already has access to. All a Dom0 kernel may need to know in
addition is whether the range was properly reserved. This could be achieved
by splitting the EFI memory map entry in patch 2, instead of only splitting
the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out
the range's type. Another way to find out would be for Dom0 to attempt to
map this area as MMIO, after first checking that no part of the range is in
its own memory allocation. This 2nd approach may, however, not really be
suitable for PVH Dom0, I think.

Jan
Demi Marie Obenour April 27, 2022, 7:08 p.m. UTC | #2
On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote:
> On 19.04.2022 17:49, Demi Marie Obenour wrote:
> > This hypercall can be used to get the ESRT from the hypervisor.  It
> > returning successfully also indicates that Xen has reserved the ESRT and
> > it can safely be parsed by dom0.
> 
> I'm not convinced of the need, and I view such an addition as inconsistent
> with the original intentions. The pointer comes from the config table,
> which Dom0 already has access to. All a Dom0 kernel may need to know in
> addition is whether the range was properly reserved. This could be achieved
> by splitting the EFI memory map entry in patch 2, instead of only splitting
> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out
> the range's type. Another way to find out would be for Dom0 to attempt to
> map this area as MMIO, after first checking that no part of the range is in
> its own memory allocation. This 2nd approach may, however, not really be
> suitable for PVH Dom0, I think.

On further thought, I think the hypercall approach is actually better
than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
return anything other than the actual firmware-provided memory
information, and the current approach seems to require more and more
special-casing of the ESRT, not to mention potentially wasting memory
and splitting a potentially large memory region into two smaller ones.
By copying the entire ESRT into memory owned by Xen, the logic becomes
significantly simpler on both the Xen and dom0 sides.

Is using ebmalloc() to allocate a copy of the ESRT a reasonable option?
Is it possible that the ESRT is so large that this causes boot to fail?
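
For a sense of how large a copy would be, here is a minimal sketch of the size computation, assuming the ESRT layout from the UEFI specification (a 16-byte header followed by 40-byte entries). The function name `esrt_copy_size` and the `ESRT_MAX_COPY_BYTES` cap are hypothetical and are not taken from Xen or the specification; the cap only illustrates refusing an implausibly large table instead of failing boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts follow the ESRT definition in the UEFI specification. */
struct esrt_header {
    uint32_t fw_resource_count;
    uint32_t fw_resource_count_max;
    uint64_t fw_resource_version;
};

struct esrt_entry {
    uint8_t  fw_class[16];                 /* firmware class GUID */
    uint32_t fw_type;
    uint32_t fw_version;
    uint32_t lowest_supported_fw_version;
    uint32_t capsule_flags;
    uint32_t last_attempt_version;
    uint32_t last_attempt_status;
};

_Static_assert(sizeof(struct esrt_header) == 16, "unexpected header size");
_Static_assert(sizeof(struct esrt_entry) == 40, "unexpected entry size");

/* Hypothetical policy limit; not taken from Xen or the specification. */
#define ESRT_MAX_COPY_BYTES (64u * 1024u)

/*
 * Compute how many bytes a full copy of the table needs.
 * 'avail' is the number of bytes known to be readable at 'hdr'.
 * Returns false if the table does not fit in 'avail' or exceeds the cap.
 */
static bool esrt_copy_size(const struct esrt_header *hdr, size_t avail,
                           size_t *out)
{
    uint64_t need;

    assert(hdr != NULL && out != NULL);

    if (avail < sizeof(*hdr))
        return false;

    /* A 32-bit count times 40 cannot overflow 64-bit arithmetic. */
    need = sizeof(*hdr) +
           (uint64_t)hdr->fw_resource_count * sizeof(struct esrt_entry);

    if (need > avail || need > ESRT_MAX_COPY_BYTES)
        return false;

    *out = (size_t)need;
    return true;
}
```

With a cap like this, an oversized table would simply not be copied rather than exhausting memory during boot.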
Jan Beulich April 28, 2022, 6:47 a.m. UTC | #3
On 27.04.2022 21:08, Demi Marie Obenour wrote:
> On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote:
>> On 19.04.2022 17:49, Demi Marie Obenour wrote:
>>> This hypercall can be used to get the ESRT from the hypervisor.  It
>>> returning successfully also indicates that Xen has reserved the ESRT and
>>> it can safely be parsed by dom0.
>>
>> I'm not convinced of the need, and I view such an addition as inconsistent
>> with the original intentions. The pointer comes from the config table,
>> which Dom0 already has access to. All a Dom0 kernel may need to know in
>> addition is whether the range was properly reserved. This could be achieved
>> by splitting the EFI memory map entry in patch 2, instead of only splitting
>> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out
>> the range's type. Another way to find out would be for Dom0 to attempt to
>> map this area as MMIO, after first checking that no part of the range is in
>> its own memory allocation. This 2nd approach may, however, not really be
>> suitable for PVH Dom0, I think.
> 
> On further thought, I think the hypercall approach is actually better
> than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
> return anything other than the actual firmware-provided memory
> information, and the current approach seems to require more and more
> special-casing of the ESRT, not to mention potentially wasting memory
> and splitting a potentially large memory region into two smaller ones.
> By copying the entire ESRT into memory owned by Xen, the logic becomes
> significantly simpler on both the Xen and dom0 sides.

I actually did consider the option of making a private copy when you did
send the initial version of this, but I'm not convinced this simplifies
things from a kernel perspective: They'd now need to discover the table
by some entirely different means. In Linux at least such divergence
"just for Xen" hasn't been liked in the past.

There's also the question of how to propagate the information across
kexec. But I guess that question exists even outside of Xen, with the
area living in memory which the OS is expected to recycle.

> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option?

I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to
make the copy before ExitBootServices(), via normal EFI allocation. If
replacing a pointer in the config table was okay(ish), this could even
be utilized to overcome the kexec problem.

> Is it possible that the ESRT is so large that this causes boot to fail?

I don't know - that's a question firmware folks would need to answer.

Jan
Demi Marie Obenour April 28, 2022, 10:54 p.m. UTC | #4
On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
> On 27.04.2022 21:08, Demi Marie Obenour wrote:
> > On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote:
> >> On 19.04.2022 17:49, Demi Marie Obenour wrote:
> >>> This hypercall can be used to get the ESRT from the hypervisor.  It
> >>> returning successfully also indicates that Xen has reserved the ESRT and
> >>> it can safely be parsed by dom0.
> >>
> >> I'm not convinced of the need, and I view such an addition as inconsistent
> >> with the original intentions. The pointer comes from the config table,
> >> which Dom0 already has access to. All a Dom0 kernel may need to know in
> >> addition is whether the range was properly reserved. This could be achieved
> >> by splitting the EFI memory map entry in patch 2, instead of only splitting
> >> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out
> >> the range's type. Another way to find out would be for Dom0 to attempt to
> >> map this area as MMIO, after first checking that no part of the range is in
> >> its own memory allocation. This 2nd approach may, however, not really be
> >> suitable for PVH Dom0, I think.
> > 
> > On further thought, I think the hypercall approach is actually better
> > than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
> > return anything other than the actual firmware-provided memory
> > information, and the current approach seems to require more and more
> > special-casing of the ESRT, not to mention potentially wasting memory
> > and splitting a potentially large memory region into two smaller ones.
> > By copying the entire ESRT into memory owned by Xen, the logic becomes
> > significantly simpler on both the Xen and dom0 sides.
> 
> I actually did consider the option of making a private copy when you did
> send the initial version of this, but I'm not convinced this simplifies
> things from a kernel perspective: They'd now need to discover the table
> by some entirely different means. In Linux at least such divergence
> "just for Xen" hasn't been liked in the past.
> 
> There's also the question of how to propagate the information across
> kexec. But I guess that question exists even outside of Xen, with the
> area living in memory which the OS is expected to recycle.

Indeed it does.  A simple rule might be, “Only trust the ESRT if it is
in memory of type EfiRuntimeServicesData.”  That is easy to achieve by
monkeypatching the config table as you suggested below.
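
The rule above could be sketched roughly as follows. This is an illustration only: `struct mem_desc` is a simplified, hypothetical stand-in for the real EFI memory descriptor, and `esrt_in_runtime_data` is a made-up helper name. The memory type value 6 for EfiRuntimeServicesData is the one from the UEFI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of EfiRuntimeServicesData in the UEFI EFI_MEMORY_TYPE enum. */
#define EFI_RUNTIME_SERVICES_DATA 6u
#define EFI_PAGE_SHIFT 12

/* Simplified stand-in for an EFI memory descriptor (hypothetical layout). */
struct mem_desc {
    uint32_t type;
    uint64_t phys_start;
    uint64_t num_pages;
};

/*
 * Return true only if [addr, addr + len) lies entirely inside a single
 * descriptor of type EfiRuntimeServicesData.
 */
static bool esrt_in_runtime_data(const struct mem_desc *map, size_t n,
                                 uint64_t addr, uint64_t len)
{
    assert(map != NULL || n == 0);

    if (len == 0 || addr + len < addr)      /* empty or wrapping range */
        return false;

    for (size_t i = 0; i < n; i++) {
        uint64_t start = map[i].phys_start;
        uint64_t size = map[i].num_pages << EFI_PAGE_SHIFT;

        if (addr >= start && addr - start < size) {
            /* Descriptor holding the first byte: check the end and type. */
            return len <= size - (addr - start) &&
                   map[i].type == EFI_RUNTIME_SERVICES_DATA;
        }
    }

    return false;       /* not described by the memory map at all */
}
```

The check is deliberately conservative: a table spanning two adjacent descriptors of the right type is rejected here, which a real implementation might want to handle.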

I *am* worried that the config table might be mapped read-only on some
systems, in which case the overwrite would cause a fatal page fault.  Is
there a way for Xen to check for this?  It could also be undefined
behavior to modify it.

> > Is using ebmalloc() to allocate a copy of the ESRT a reasonable option?
> 
> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to
> make the copy before ExitBootServices(), via normal EFI allocation. If
> replacing a pointer in the config table was okay(ish), this could even
> be utilized to overcome the kexec problem.

What type should I use for the allocation?  EfiLoaderData looks like the
most consistent choice, but I am not sure if memory so allocated remains
valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a
better choice.  To avoid memory leaks from repeated kexec(), this could
be made conditional on the ESRT not being in memory of type
EfiRuntimeServicesData to begin with.
Jan Beulich April 29, 2022, 8:40 a.m. UTC | #5
On 29.04.2022 00:54, Demi Marie Obenour wrote:
> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
>> On 27.04.2022 21:08, Demi Marie Obenour wrote:
>>> On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote:
>>>> On 19.04.2022 17:49, Demi Marie Obenour wrote:
>>>>> This hypercall can be used to get the ESRT from the hypervisor.  It
>>>>> returning successfully also indicates that Xen has reserved the ESRT and
>>>>> it can safely be parsed by dom0.
>>>>
>>>> I'm not convinced of the need, and I view such an addition as inconsistent
>>>> with the original intentions. The pointer comes from the config table,
>>>> which Dom0 already has access to. All a Dom0 kernel may need to know in
>>>> addition is whether the range was properly reserved. This could be achieved
>>>> by splitting the EFI memory map entry in patch 2, instead of only splitting
>>>> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out
>>>> the range's type. Another way to find out would be for Dom0 to attempt to
>>>> map this area as MMIO, after first checking that no part of the range is in
>>>> its own memory allocation. This 2nd approach may, however, not really be
>>>> suitable for PVH Dom0, I think.
>>>
>>> On further thought, I think the hypercall approach is actually better
>>> than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
>>> return anything other than the actual firmware-provided memory
>>> information, and the current approach seems to require more and more
>>> special-casing of the ESRT, not to mention potentially wasting memory
>>> and splitting a potentially large memory region into two smaller ones.
>>> By copying the entire ESRT into memory owned by Xen, the logic becomes
>>> significantly simpler on both the Xen and dom0 sides.
>>
>> I actually did consider the option of making a private copy when you did
>> send the initial version of this, but I'm not convinced this simplifies
>> things from a kernel perspective: They'd now need to discover the table
>> by some entirely different means. In Linux at least such divergence
>> "just for Xen" hasn't been liked in the past.
>>
>> There's also the question of how to propagate the information across
>> kexec. But I guess that question exists even outside of Xen, with the
>> area living in memory which the OS is expected to recycle.
> 
> Indeed it does.  A simple rule might be, “Only trust the ESRT if it is
> in memory of type EfiRuntimeServicesData.”  That is easy to achieve by
> monkeypatching the config table as you suggested below.
> 
> I *am* worried that the config table might be mapped read-only on some
> systems, in which case the overwrite would cause a fatal page fault.  Is
> there a way for Xen to check for this?

While in boot mode, aiui page tables aren't supposed to be enforcing
access restrictions. Recall that on other architectures EFI even runs
with paging disabled; this simply is not possible for x86-64. So
portable firmware shouldn't map anything r/o. In principle the pointer
could still be in ROM; I consider this unlikely, but we could check
for that (just like we could do a page table walk to figure out
whether a r/o mapping would prevent us from updating the field).

>  It could also be undefined behavior to modify it.

That's the bigger worry I have.

>>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option?
>>
>> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to
>> make the copy before ExitBootServices(), via normal EFI allocation. If
>> replacing a pointer in the config table was okay(ish), this could even
>> be utilized to overcome the kexec problem.
> 
> What type should I use for the allocation?  EfiLoaderData looks like the
> most consistent choice, but I am not sure if memory so allocated remains
> valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a
> better choice.

It definitely is. We do recycle EfiLoaderData ourselves.

>  To avoid memory leaks from repeated kexec(), this could
> be made conditional on the ESRT not being in memory of type
> EfiRuntimeServicesData to begin with.

Of course - there's no point relocating the blob when it already is
immune to recycling.

Jan
Demi Marie Obenour April 29, 2022, 5:06 p.m. UTC | #6
On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote:
> On 29.04.2022 00:54, Demi Marie Obenour wrote:
> > On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
> >> On 27.04.2022 21:08, Demi Marie Obenour wrote:
> >>> On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote:
> >>>> On 19.04.2022 17:49, Demi Marie Obenour wrote:
> >>>>> This hypercall can be used to get the ESRT from the hypervisor.  It
> >>>>> returning successfully also indicates that Xen has reserved the ESRT and
> >>>>> it can safely be parsed by dom0.
> >>>>
> >>>> I'm not convinced of the need, and I view such an addition as inconsistent
> >>>> with the original intentions. The pointer comes from the config table,
> >>>> which Dom0 already has access to. All a Dom0 kernel may need to know in
> >>>> addition is whether the range was properly reserved. This could be achieved
> >>>> by splitting the EFI memory map entry in patch 2, instead of only splitting
> >>>> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out
> >>>> the range's type. Another way to find out would be for Dom0 to attempt to
> >>>> map this area as MMIO, after first checking that no part of the range is in
> >>>> its own memory allocation. This 2nd approach may, however, not really be
> >>>> suitable for PVH Dom0, I think.
> >>>
> >>> On further thought, I think the hypercall approach is actually better
> >>> than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
> >>> return anything other than the actual firmware-provided memory
> >>> information, and the current approach seems to require more and more
> >>> special-casing of the ESRT, not to mention potentially wasting memory
> >>> and splitting a potentially large memory region into two smaller ones.
> >>> By copying the entire ESRT into memory owned by Xen, the logic becomes
> >>> significantly simpler on both the Xen and dom0 sides.
> >>
> >> I actually did consider the option of making a private copy when you did
> >> send the initial version of this, but I'm not convinced this simplifies
> >> things from a kernel perspective: They'd now need to discover the table
> >> by some entirely different means. In Linux at least such divergence
> >> "just for Xen" hasn't been liked in the past.
> >>
> >> There's also the question of how to propagate the information across
> >> kexec. But I guess that question exists even outside of Xen, with the
> >> area living in memory which the OS is expected to recycle.
> > 
> > Indeed it does.  A simple rule might be, “Only trust the ESRT if it is
> > in memory of type EfiRuntimeServicesData.”  That is easy to achieve by
> > monkeypatching the config table as you suggested below.
> > 
> > I *am* worried that the config table might be mapped read-only on some
> > systems, in which case the overwrite would cause a fatal page fault.  Is
> > there a way for Xen to check for this?
> 
> While in boot mode, aiui page tables aren't supposed to be enforcing
> access restrictions. Recall that on other architectures EFI even runs
> with paging disabled; this simply is not possible for x86-64.

Yikes!  No wonder firmware has nonexistent exploit mitigations.  They
really ought to start porting UEFI to Rust, with ASLR, NX, stack
canaries, a hardened allocator, and support for de-privileged services
that run in user mode.

That reminds me: Can Xen itself run from ROM?  Xen is being ported to
POWER for use in Qubes OS, and one approach under consideration is to
have Xen and a mini-dom0 be part of the firmware.  Personally, I really
like this approach, as it makes untrusted storage domains much simpler.
If this should be a separate email thread, let me know.

> So
> portable firmware shouldn't map anything r/o. In principle the pointer
> could still be in ROM; I consider this unlikely, but we could check
> for that (just like we could do a page table walk to figure out
> whether a r/o mapping would prevent us from updating the field).

Is there a utility function that could be used for this?

> >  It could also be undefined behavior to modify it.
> 
> That's the bigger worry I have.

Turns out that it is *not* undefined behavior, so long as
ExitBootServices() has not been called.  This is because EFI drivers
will modify the config table, so firmware cannot assume it to be
read-only.

> >>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option?
> >>
> >> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to
> >> make the copy before ExitBootServices(), via normal EFI allocation. If
> >> replacing a pointer in the config table was okay(ish), this could even
> >> be utilized to overcome the kexec problem.
> > 
> > What type should I use for the allocation?  EfiLoaderData looks like the
> > most consistent choice, but I am not sure if memory so allocated remains
> > valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a
> > better choice.
> 
> It definitely is. We do recycle EfiLoaderData ourselves.

I wonder why the ESRT was not in EfiRuntimeServicesData to begin with.

> >  To avoid memory leaks from repeated kexec(), this could
> > be made conditional on the ESRT not being in memory of type
> > EfiRuntimeServicesData to begin with.
> 
> Of course - there's no point relocating the blob when it already is
> immune to recycling.

Yup.  Is it reasonable for dom0 to check that the ESRT is in
EfiRuntimeServicesData when under Xen?
Jan Beulich May 2, 2022, 6:24 a.m. UTC | #7
On 29.04.2022 19:06, Demi Marie Obenour wrote:
> On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote:
>> On 29.04.2022 00:54, Demi Marie Obenour wrote:
>>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
>>>> On 27.04.2022 21:08, Demi Marie Obenour wrote:
>>>>> On further thought, I think the hypercall approach is actually better
>>>>> than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
>>>>> return anything other than the actual firmware-provided memory
>>>>> information, and the current approach seems to require more and more
>>>>> special-casing of the ESRT, not to mention potentially wasting memory
>>>>> and splitting a potentially large memory region into two smaller ones.
>>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes
>>>>> significantly simpler on both the Xen and dom0 sides.
>>>>
>>>> I actually did consider the option of making a private copy when you did
>>>> send the initial version of this, but I'm not convinced this simplifies
>>>> things from a kernel perspective: They'd now need to discover the table
>>>> by some entirely different means. In Linux at least such divergence
>>>> "just for Xen" hasn't been liked in the past.
>>>>
>>>> There's also the question of how to propagate the information across
>>>> kexec. But I guess that question exists even outside of Xen, with the
>>>> area living in memory which the OS is expected to recycle.
>>>
>>> Indeed it does.  A simple rule might be, “Only trust the ESRT if it is
>>> in memory of type EfiRuntimeServicesData.”  That is easy to achieve by
>>> monkeypatching the config table as you suggested below.
>>>
>>> I *am* worried that the config table might be mapped read-only on some
>>> systems, in which case the overwrite would cause a fatal page fault.  Is
>>> there a way for Xen to check for this?
>>
>> While in boot mode, aiui page tables aren't supposed to be enforcing
>> access restrictions. Recall that on other architectures EFI even runs
>> with paging disabled; this simply is not possible for x86-64.
> 
> Yikes!  No wonder firmware has nonexistent exploit mitigations.  They
> really ought to start porting UEFI to Rust, with ASLR, NX, stack
> canaries, a hardened allocator, and support for de-privileged services
> that run in user mode.
> 
> That reminds me: Can Xen itself run from ROM?

I guess that could be possible in principle, but would certainly require
some work.

>  Xen is being ported to
> POWER for use in Qubes OS, and one approach under consideration is to
> have Xen and a mini-dom0 be part of the firmware.  Personally, I really
> like this approach, as it makes untrusted storage domains much simpler.
> If this should be a separate email thread, let me know.

It probably should be.

>> So
>> portable firmware shouldn't map anything r/o. In principle the pointer
>> could still be in ROM; I consider this unlikely, but we could check
>> for that (just like we could do a page table walk to figure out
>> whether a r/o mapping would prevent us from updating the field).
> 
> Is there a utility function that could be used for this?

I don't think there is.

>>>  It could also be undefined behavior to modify it.
>>
>> That's the bigger worry I have.
> 
> Turns out that it is *not* undefined behavior, so long as
> ExitBootServices() has not been called.  This is because EFI drivers
> will modify the config table, so firmware cannot assume it to be
> read-only.

Ah, right - we could even use InstallConfigurationTable() ourselves
to make the adjustment.
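
A rough sketch of that flow, under stated assumptions: the table is copied only when it is not already in EfiRuntimeServicesData, and the copy is then published through the configuration table. `struct mock_boot_services` and the `mock_*` helpers are hypothetical stand-ins with simplified signatures so the example runs outside firmware; the real AllocatePool(), FreePool() and InstallConfigurationTable() boot services return EFI_STATUS and take different parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Value of EfiRuntimeServicesData in the UEFI EFI_MEMORY_TYPE enum. */
#define EFI_RUNTIME_SERVICES_DATA 6u

/* Mock of the boot services used here; signatures are simplified. */
struct mock_boot_services {
    void *(*allocate_pool)(uint32_t type, size_t size);
    void  (*free_pool)(void *buf);
    int   (*install_configuration_table)(const uint8_t guid[16], void *table);
};

/* Records what the mock "firmware" was last asked to install. */
static void *installed_table;

static void *mock_alloc(uint32_t type, size_t size)
{
    (void)type;                 /* the mock ignores the memory type */
    return malloc(size);
}

static void mock_free(void *buf)
{
    free(buf);
}

static int mock_install(const uint8_t guid[16], void *table)
{
    (void)guid;
    installed_table = table;
    return 0;                   /* 0 means success in this mock */
}

/*
 * Must run before ExitBootServices().  Returns the table location to use
 * afterwards, or NULL if relocation was needed but failed.
 */
static const void *relocate_esrt(const struct mock_boot_services *bs,
                                 const uint8_t guid[16], const void *esrt,
                                 size_t size, uint32_t cur_type)
{
    void *copy;

    assert(bs != NULL && esrt != NULL && size != 0);

    if (cur_type == EFI_RUNTIME_SERVICES_DATA)
        return esrt;            /* already immune to recycling */

    copy = bs->allocate_pool(EFI_RUNTIME_SERVICES_DATA, size);
    if (copy == NULL)
        return NULL;

    memcpy(copy, esrt, size);

    if (bs->install_configuration_table(guid, copy) != 0) {
        bs->free_pool(copy);    /* do not leak the copy on failure */
        return NULL;
    }

    return copy;
}
```

Skipping the copy when the table already sits in EfiRuntimeServicesData is what keeps repeated kexec from leaking one allocation per boot.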

>>>>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option?
>>>>
>>>> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to
>>>> make the copy before ExitBootServices(), via normal EFI allocation. If
>>>> replacing a pointer in the config table was okay(ish), this could even
>>>> be utilized to overcome the kexec problem.
>>>
>>> What type should I use for the allocation?  EfiLoaderData looks like the
>>> most consistent choice, but I am not sure if memory so allocated remains
>>> valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a
>>> better choice.
>>
>> It definitely is. We do recycle EfiLoaderData ourselves.
> 
> I wonder why the ESRT was not in EfiRuntimeServicesData to begin with.

So do I.

>>>  To avoid memory leaks from repeated kexec(), this could
>>> be made conditional on the ESRT not being in memory of type
>>> EfiRuntimeServicesData to begin with.
>>
>> Of course - there's no point relocating the blob when it already is
>> immune to recycling.
> 
> Yup.  Is it reasonable for dom0 to check that the ESRT is in
> EfiRuntimeServicesData when under Xen?

I think it is, but kernel folks may not like Xen specific code in this
(or about any) area.

Jan
Demi Marie Obenour May 2, 2022, 7:11 a.m. UTC | #8
On Mon, May 02, 2022 at 08:24:30AM +0200, Jan Beulich wrote:
> On 29.04.2022 19:06, Demi Marie Obenour wrote:
> > On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote:
> >> On 29.04.2022 00:54, Demi Marie Obenour wrote:
> >>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
> >>>> On 27.04.2022 21:08, Demi Marie Obenour wrote:
> >>>>> On further thought, I think the hypercall approach is actually better
> >>>>> than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
> >>>>> return anything other than the actual firmware-provided memory
> >>>>> information, and the current approach seems to require more and more
> >>>>> special-casing of the ESRT, not to mention potentially wasting memory
> >>>>> and splitting a potentially large memory region into two smaller ones.
> >>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes
> >>>>> significantly simpler on both the Xen and dom0 sides.
> >>>>
> >>>> I actually did consider the option of making a private copy when you did
> >>>> send the initial version of this, but I'm not convinced this simplifies
> >>>> things from a kernel perspective: They'd now need to discover the table
> >>>> by some entirely different means. In Linux at least such divergence
> >>>> "just for Xen" hasn't been liked in the past.
> >>>>
> >>>> There's also the question of how to propagate the information across
> >>>> kexec. But I guess that question exists even outside of Xen, with the
> >>>> area living in memory which the OS is expected to recycle.
> >>>
> >>> Indeed it does.  A simple rule might be, “Only trust the ESRT if it is
> >>> in memory of type EfiRuntimeServicesData.”  That is easy to achieve by
> >>> monkeypatching the config table as you suggested below.
> >>>
> >>> I *am* worried that the config table might be mapped read-only on some
> >>> systems, in which case the overwrite would cause a fatal page fault.  Is
> >>> there a way for Xen to check for this?
> >>
> >> While in boot mode, aiui page tables aren't supposed to be enforcing
> >> access restrictions. Recall that on other architectures EFI even runs
> >> with paging disabled; this simply is not possible for x86-64.
> > 
> > Yikes!  No wonder firmware has nonexistent exploit mitigations.  They
> > really ought to start porting UEFI to Rust, with ASLR, NX, stack
> > canaries, a hardened allocator, and support for de-privileged services
> > that run in user mode.
> > 
> > That reminds me: Can Xen itself run from ROM?
> 
> I guess that could be possible in principle, but would certainly require
> some work.
> 
> >  Xen is being ported to
> > POWER for use in Qubes OS, and one approach under consideration is to
> > have Xen and a mini-dom0 be part of the firmware.  Personally, I really
> > like this approach, as it makes untrusted storage domains much simpler.
> > If this should be a separate email thread, let me know.
> 
> It probably should be.

I will make one at some point.

> >> So
> >> portable firmware shouldn't map anything r/o. In principle the pointer
> >> could still be in ROM; I consider this unlikely, but we could check
> >> for that (just like we could do a page table walk to figure out
> >> whether a r/o mapping would prevent us from updating the field).
> > 
> > Is there a utility function that could be used for this?
> 
> I don't think there is.

Then it is good that none is necessary :)

Also, should the various bug checks I added be replaced by ASSERT()?

> >>>  It could also be undefined behavior to modify it.
> >>
> >> That's the bigger worry I have.
> > 
> > Turns out that it is *not* undefined behavior, so long as
> > ExitBootServices() has not been called.  This is because EFI drivers
> > will modify the config table, so firmware cannot assume it to be
> > read-only.
> 
> Ah, right - we could even use InstallConfigurationTable() ourselves
> to make the adjustment.

That is even simpler than I thought!  I was worried that
InstallConfigurationTable() would assume that memory for the table was
allocated a certain way and cause invalid free errors, but at least
TianoCore does not do that.

> >>>>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option?
> >>>>
> >>>> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to
> >>>> make the copy before ExitBootServices(), via normal EFI allocation. If
> >>>> replacing a pointer in the config table was okay(ish), this could even
> >>>> be utilized to overcome the kexec problem.
> >>>
> >>> What type should I use for the allocation?  EfiLoaderData looks like the
> >>> most consistent choice, but I am not sure if memory so allocated remains
> >>> valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a
> >>> better choice.
> >>
> >> It definitely is. We do recycle EfiLoaderData ourselves.
> > 
> > I wonder why the ESRT was not in EfiRuntimeServicesData to begin with.
> 
> So do I.

I suspect the assumption was that the ESRT would be parsed by the OS
before ExitBootServices(), and that the OS would have no need for the
ESRT after that.

> >>>  To avoid memory leaks from repeated kexec(), this could
> >>> be made conditional on the ESRT not being in memory of type
> >>> EfiRuntimeServicesData to begin with.
> >>
> >> Of course - there's no point relocating the blob when it already is
> >> immune to recycling.
> > 
> > Yup.  Is it reasonable for dom0 to check that the ESRT is in
> > EfiRuntimeServicesData when under Xen?
> 
> I think it is, but kernel folks may not like Xen specific code in this
> (or about any) area.
> 
> Jan

There is PVops et al already :)
Jan Beulich May 2, 2022, 7:37 a.m. UTC | #9
On 02.05.2022 09:11, Demi Marie Obenour wrote:
> On Mon, May 02, 2022 at 08:24:30AM +0200, Jan Beulich wrote:
>> On 29.04.2022 19:06, Demi Marie Obenour wrote:
>>> On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote:
>>>> On 29.04.2022 00:54, Demi Marie Obenour wrote:
>>>>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
>>>>>> On 27.04.2022 21:08, Demi Marie Obenour wrote:
>>>>>>> On further thought, I think the hypercall approach is actually better
>>>>>>> than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
>>>>>>> return anything other than the actual firmware-provided memory
>>>>>>> information, and the current approach seems to require more and more
>>>>>>> special-casing of the ESRT, not to mention potentially wasting memory
>>>>>>> and splitting a potentially large memory region into two smaller ones.
>>>>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes
>>>>>>> significantly simpler on both the Xen and dom0 sides.
>>>>>>
>>>>>> I actually did consider the option of making a private copy when you did
>>>>>> send the initial version of this, but I'm not convinced this simplifies
>>>>>> things from a kernel perspective: They'd now need to discover the table
>>>>>> by some entirely different means. In Linux at least such divergence
>>>>>> "just for Xen" hasn't been liked in the past.
>>>>>>
>>>>>> There's also the question of how to propagate the information across
>>>>>> kexec. But I guess that question exists even outside of Xen, with the
>>>>>> area living in memory which the OS is expected to recycle.
>>>>>
>>>>> Indeed it does.  A simple rule might be, “Only trust the ESRT if it is
>>>>> in memory of type EfiRuntimeServicesData.”  That is easy to achieve by
>>>>> monkeypatching the config table as you suggested below.
>>>>>
>>>>> I *am* worried that the config table might be mapped read-only on some
>>>>> systems, in which case the overwrite would cause a fatal page fault.  Is
>>>>> there a way for Xen to check for this?
>>>>
>>>> While in boot mode, aiui page tables aren't supposed to be enforcing
>>>> access restrictions. Recall that on other architectures EFI even runs
>>>> with paging disabled; this simply is not possible for x86-64.
>>>
>>> Yikes!  No wonder firmware has nonexistent exploit mitigations.  They
>>> really ought to start porting UEFI to Rust, with ASLR, NX, stack
>>> canaries, a hardened allocator, and support for de-privileged services
>>> that run in user mode.
>>>
>>> That reminds me: Can Xen itself run from ROM?
>>
>> I guess that could be possible in principle, but would certainly require
>> some work.
>>
>>>  Xen is being ported to
>>> POWER for use in Qubes OS, and one approach under consideration is to
>>> have Xen and a mini-dom0 be part of the firmware.  Personally, I really
>>> like this approach, as it makes untrusted storage domains much simpler.
>>> If this should be a separate email thread, let me know.
>>
>> It probably should be.
> 
> I will make one at some point.
> 
>>>> So
>>>> portable firmware shouldn't map anything r/o. In principle the pointer
>>>> could still be in ROM; I consider this unlikely, but we could check
>>>> for that (just like we could do a page table walk to figure out
>>>> whether a r/o mapping would prevent us from updating the field).
>>>
>>> Is there a utility function that could be used for this?
>>
>> I don't think there is.
> 
> Then it is good that none is necessary :)
> 
> Also, should the various bug checks I added be replaced by ASSERT()?

You mean those in the earlier patch(es)? Not sure - depends on what you
would be doing for release builds. In the cases where you simply re-
check what was checked earlier on, ASSERT() would probably indeed be
preferable over BUG_ON() (and there I wouldn't even see a strong need
to consider alternatives for release builds).

Jan
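
The distinction drawn above, between validating input once and merely re-checking what was already validated, can be sketched in user space as follows. Everything prefixed `sketch_`/`SKETCH_` is hypothetical and is not Xen code: `SKETCH_BUG_ON` stands in for a check that stays active in every build, and `SKETCH_ASSERT` is mapped onto the standard `assert()`, which disappears when `NDEBUG` is defined, roughly as ASSERT() does in a release build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for BUG_ON(): evaluated in every build. */
#define SKETCH_BUG_ON(cond)                          \
    do {                                             \
        if ( cond )                                  \
        {                                            \
            fprintf(stderr, "BUG: %s\n", #cond);     \
            abort();                                 \
        }                                            \
    } while ( 0 )

/* Hypothetical stand-in for ASSERT(): compiled out when NDEBUG is defined. */
#define SKETCH_ASSERT(cond) assert(cond)

/* The real validation: run once on untrusted input, with an error path. */
static bool sketch_count_fits(size_t count, size_t avail,
                              size_t hdr, size_t entry)
{
    return entry != 0 && avail >= hdr && count != 0 &&
           count <= (avail - hdr) / entry;
}

/* Re-check guarded by the always-on variant. */
static size_t sketch_table_size_bug(size_t count, size_t avail,
                                    size_t hdr, size_t entry)
{
    SKETCH_BUG_ON(!sketch_count_fits(count, avail, hdr, entry));
    return hdr + count * entry;
}

/* Same re-check, but free of cost (and of effect) in a release build. */
static size_t sketch_table_size_assert(size_t count, size_t avail,
                                       size_t hdr, size_t entry)
{
    SKETCH_ASSERT(sketch_count_fits(count, avail, hdr, entry));
    return hdr + count * entry;
}
```

Both helpers return the same value for valid input; they differ only in what happens in a release build if the earlier validation was somehow bypassed.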
Demi Marie Obenour May 2, 2022, 7:42 a.m. UTC | #10
On Mon, May 02, 2022 at 09:37:39AM +0200, Jan Beulich wrote:
> On 02.05.2022 09:11, Demi Marie Obenour wrote:
> > On Mon, May 02, 2022 at 08:24:30AM +0200, Jan Beulich wrote:
> >> On 29.04.2022 19:06, Demi Marie Obenour wrote:
> >>> On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote:
> >>>> On 29.04.2022 00:54, Demi Marie Obenour wrote:
> >>>>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
> >>>>>> On 27.04.2022 21:08, Demi Marie Obenour wrote:
> >>>>>>> On further thought, I think the hypercall approach is actually better
> >>>>>>> than reserving the ESRT.  I really do not want XEN_FW_EFI_MEM_INFO to
> >>>>>>> return anything other than the actual firmware-provided memory
> >>>>>>> information, and the current approach seems to require more and more
> >>>>>>> special-casing of the ESRT, not to mention potentially wasting memory
> >>>>>>> and splitting a potentially large memory region into two smaller ones.
> >>>>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes
> >>>>>>> significantly simpler on both the Xen and dom0 sides.
> >>>>>>
> >>>>>> I actually did consider the option of making a private copy when you did
> >>>>>> send the initial version of this, but I'm not convinced this simplifies
> >>>>>> things from a kernel perspective: They'd now need to discover the table
> >>>>>> by some entirely different means. In Linux at least such divergence
> >>>>>> "just for Xen" hasn't been liked in the past.
> >>>>>>
> >>>>>> There's also the question of how to propagate the information across
> >>>>>> kexec. But I guess that question exists even outside of Xen, with the
> >>>>>> area living in memory which the OS is expected to recycle.
> >>>>>
> >>>>> Indeed it does.  A simple rule might be, “Only trust the ESRT if it is
> >>>>> in memory of type EfiRuntimeServicesData.”  That is easy to achieve by
> >>>>> monkeypatching the config table as you suggested below.
> >>>>>
> >>>>> I *am* worried that the config table might be mapped read-only on some
> >>>>> systems, in which case the overwrite would cause a fatal page fault.  Is
> >>>>> there a way for Xen to check for this?
> >>>>
> >>>> While in boot mode, aiui page tables aren't supposed to be enforcing
> >>>> access restrictions. Recall that on other architectures EFI even runs
> >>>> with paging disabled; this simply is not possible for x86-64.
> >>>
> >>> Yikes!  No wonder firmware has nonexistent exploit mitigations.  They
> >>> really ought to start porting UEFI to Rust, with ASLR, NX, stack
> >>> canaries, a hardened allocator, and support for de-privileged services
> >>> that run in user mode.
> >>>
> >>> That reminds me: Can Xen itself run from ROM?
> >>
> >> I guess that could be possible in principle, but would certainly require
> >> some work.
> >>
> >>>  Xen is being ported to
> >>> POWER for use in Qubes OS, and one approach under consideration is to
> >>> have Xen and a mini-dom0 be part of the firmware.  Personally, I really
> >>> like this approach, as it makes untrusted storage domains much simpler.
> >>> If this should be a separate email thread, let me know.
> >>
> >> It probably should be.
> > 
> > I will make one at some point.
> > 
> >>>> So
> >>>> portable firmware shouldn't map anything r/o. In principle the pointer
> >>>> could still be in ROM; I consider this unlikely, but we could check
> >>>> for that (just like we could do a page table walk to figure out
> >>>> whether a r/o mapping would prevent us from updating the field).
> >>>
> >>> Is there a utility function that could be used for this?
> >>
> >> I don't think there is.
> > 
> > Then it is good that none is necessary :)
> > 
> > Also, should the various bug checks I added be replaced by ASSERT()?
> 
> You mean those in the earlier patch(es)? Not sure - depends on what you
> would be doing for release builds. In the cases where you simply re-
> check what was checked earlier on, ASSERT() would probably indeed be
> preferable over BUG_ON() (and there I wouldn't even see a strong need
> to consider alternatives for release builds).

Yup, that’s what the BUG_ON()s were for.  I will use ASSERT() in the
next round.
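
For orientation, the bounds check and size computation that the patch below adds to is_esrt_valid() can be tried outside Xen with a sketch like the following. The struct layouts are an assumption modelled on the UEFI ESRT definition, with the field names (`Version`, `Count`, `Entries`) taken from the patch; `sketch_esrt_size()` and the `sketch_` types are hypothetical and not part of the series. The initial length guard is the sketch's own addition, since the corresponding lines of the real function are not shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout (40 bytes), modelled on the UEFI ESRT entry. */
typedef struct {
    uint8_t  FwClass[16];
    uint32_t FwType;
    uint32_t FwVersion;
    uint32_t LowestSupportedFwVersion;
    uint32_t CapsuleFlags;
    uint32_t LastAttemptVersion;
    uint32_t LastAttemptStatus;
} sketch_esrt_entry;

/* Assumed header layout (16 bytes) followed by Count entries. */
typedef struct {
    uint32_t Count;
    uint32_t CountMax;
    uint64_t Version;
    sketch_esrt_entry Entries[];
} sketch_esrt;

/*
 * Returns true and stores the table's total size if the header is
 * plausible and all Count entries fit in available_len bytes.
 */
static bool sketch_esrt_size(const sketch_esrt *t, size_t available_len,
                             size_t *size_out)
{
    /* Sketch's own guard: the header itself must fit. */
    if ( available_len < sizeof(*t) )
        return false;
    if ( t->Version != 1 || !t->Count )
        return false;
    /* Dividing rather than multiplying keeps the comparison overflow-free. */
    if ( t->Count > (available_len - sizeof(*t)) / sizeof(t->Entries[0]) )
        return false;
    *size_out = sizeof(*t) + (size_t)t->Count * sizeof(t->Entries[0]);
    return true;
}
```

Only once the count is known to fit is the multiplication performed, so the stored size can never exceed `available_len`.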

Patch

diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 31664818c1..01b2409c5e 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -567,8 +567,6 @@  static int __init efi_check_dt_boot(const EFI_LOADED_IMAGE *loaded_image)
 }
 #endif
 
-static UINTN __initdata esrt = EFI_INVALID_TABLE_ADDR;
-
 static bool __init is_esrt_valid(
     const EFI_MEMORY_DESCRIPTOR *const desc)
 {
@@ -594,9 +592,13 @@  static bool __init is_esrt_valid(
     esrt_ptr = (const ESRT *)esrt;
     if ( esrt_ptr->Version != 1 || !esrt_ptr->Count )
         return false;
-    return esrt_ptr->Count <=
-           (available_len - sizeof(*esrt_ptr)) /
-           sizeof(esrt_ptr->Entries[0]);
+    if ( esrt_ptr->Count >
+         (available_len - sizeof(*esrt_ptr)) /
+         sizeof(esrt_ptr->Entries[0]) )
+        return false;
+    esrt_size = sizeof(*esrt_ptr) +
+        esrt_ptr->Count * sizeof(esrt_ptr->Entries[0]);
+    return true;
 }
 
 /*
@@ -1121,6 +1123,9 @@  static void __init efi_exit_boot(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *Syste
             }
         }
 
+        if ( esrt_desc == (const EFI_MEMORY_DESCRIPTOR *)EFI_INVALID_TABLE_ADDR )
+            esrt = EFI_INVALID_TABLE_ADDR;
+
         efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size,
                                     efi_mdesc_size, mdesc_ver);
 
diff --git a/xen/common/efi/efi.h b/xen/common/efi/efi.h
index 02f499071a..0736662ebc 100644
--- a/xen/common/efi/efi.h
+++ b/xen/common/efi/efi.h
@@ -46,6 +46,8 @@  extern const EFI_RUNTIME_SERVICES *efi_rs;
 extern UINTN efi_memmap_size, efi_mdesc_size;
 extern void *efi_memmap;
 extern const EFI_MEMORY_DESCRIPTOR *esrt_desc;
+extern UINTN esrt;
+extern UINTN esrt_size;
 
 #ifdef CONFIG_X86
 extern mfn_t efi_l4_mfn;
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 0d09647952..4466d5379c 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -227,6 +227,12 @@  const CHAR16 *wmemchr(const CHAR16 *s, CHAR16 c, UINTN n)
 #endif /* COMPAT */
 
 #ifndef CONFIG_ARM /* TODO - disabled until implemented on ARM */
+
+#ifndef COMPAT
+UINTN esrt = EFI_INVALID_TABLE_ADDR;
+UINTN esrt_size = 0;
+#endif
+
 int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
 {
     unsigned int i, n;
@@ -311,6 +317,14 @@  int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
         info->apple_properties.size = efi_apple_properties_len;
         break;
 
+    case XEN_FW_EFI_ESRT:
+        if ( esrt_desc == (const EFI_MEMORY_DESCRIPTOR *)EFI_INVALID_TABLE_ADDR )
+            return -ENODATA;
+        if ( info->esrt.size < esrt_size )
+            return -ERANGE;
+        if ( copy_to_guest(info->esrt.table, (const ESRT *)esrt, esrt_size) )
+            return -EFAULT;
+        break;
     default:
         return -EINVAL;
     }
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 8100133509..a848df2066 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -243,6 +243,7 @@  DEFINE_XEN_GUEST_HANDLE(xenpf_efi_runtime_call_t);
 #define  XEN_FW_EFI_RT_VERSION     4
 #define  XEN_FW_EFI_PCI_ROM        5
 #define  XEN_FW_EFI_APPLE_PROPERTIES 6
+#define  XEN_FW_EFI_ESRT           7
 #define XEN_FW_KBD_SHIFT_FLAGS    5
 struct xenpf_firmware_info {
     /* IN variables. */
@@ -307,6 +308,12 @@  struct xenpf_firmware_info {
                 uint64_t address;
                 xen_ulong_t size;
             } apple_properties;
+            struct {
+                /* IN variables */
+                uint64_t size;
+                /* OUT variables */
+                XEN_GUEST_HANDLE(void) table;
+            } esrt;
         } efi_info; /* XEN_FW_EFI_INFO */
 
         /* Int16, Fn02: Get keyboard shift flags. */