Message ID | Yl7aC2a+TtOaFtqZ@itl-email (mailing list archive) |
---|---|
State | New, archived |
Series | EFI System Resource Table support |
On 19.04.2022 17:49, Demi Marie Obenour wrote: > This hypercall can be used to get the ESRT from the hypervisor. It > returning successfully also indicates that Xen has reserved the ESRT and > it can safely be parsed by dom0. I'm not convinced of the need, and I view such an addition as inconsistent with the original intentions. The pointer comes from the config table, which Dom0 already has access to. All a Dom0 kernel may need to know in addition is whether the range was properly reserved. This could be achieved by splitting the EFI memory map entry in patch 2, instead of only splitting the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out the range's type. Another way to find out would be for Dom0 to attempt to map this area as MMIO, after first checking that no part of the range is in its own memory allocation. This 2nd approach may, however, not really be suitable for PVH Dom0, I think. Jan
On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote: > On 19.04.2022 17:49, Demi Marie Obenour wrote: > > This hypercall can be used to get the ESRT from the hypervisor. It > > returning successfully also indicates that Xen has reserved the ESRT and > > it can safely be parsed by dom0. > > I'm not convinced of the need, and I view such an addition as inconsistent > with the original intentions. The pointer comes from the config table, > which Dom0 already has access to. All a Dom0 kernel may need to know in > addition is whether the range was properly reserved. This could be achieved > by splitting the EFI memory map entry in patch 2, instead of only splitting > the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out > the range's type. Another way to find out would be for Dom0 to attempt to > map this area as MMIO, after first checking that no part of the range is in > its own memory allocation. This 2nd approach may, however, not really be > suitable for PVH Dom0, I think. On further thought, I think the hypercall approach is actually better than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to return anything other than the actual firmware-provided memory information, and the current approach seems to require more and more special-casing of the ESRT, not to mention potentially wasting memory and splitting a potentially large memory region into two smaller ones. By copying the entire ESRT into memory owned by Xen, the logic becomes significantly simpler on both the Xen and dom0 sides. Is using ebmalloc() to allocate a copy of the ESRT a reasonable option? Is it possible that the ESRT is so large that this causes boot to fail?
On 27.04.2022 21:08, Demi Marie Obenour wrote: > On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote: >> On 19.04.2022 17:49, Demi Marie Obenour wrote: >>> This hypercall can be used to get the ESRT from the hypervisor. It >>> returning successfully also indicates that Xen has reserved the ESRT and >>> it can safely be parsed by dom0. >> >> I'm not convinced of the need, and I view such an addition as inconsistent >> with the original intentions. The pointer comes from the config table, >> which Dom0 already has access to. All a Dom0 kernel may need to know in >> addition is whether the range was properly reserved. This could be achieved >> by splitting the EFI memory map entry in patch 2, instead of only splitting >> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out >> the range's type. Another way to find out would be for Dom0 to attempt to >> map this area as MMIO, after first checking that no part of the range is in >> its own memory allocation. This 2nd approach may, however, not really be >> suitable for PVH Dom0, I think. > > On further thought, I think the hypercall approach is actually better > than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to > return anything other than the actual firmware-provided memory > information, and the current approach seems to require more and more > special-casing of the ESRT, not to mention potentially wasting memory > and splitting a potentially large memory region into two smaller ones. > By copying the entire ESRT into memory owned by Xen, the logic becomes > significantly simpler on both the Xen and dom0 sides. I actually did consider the option of making a private copy when you did send the initial version of this, but I'm not convinced this simplifies things from a kernel perspective: They'd now need to discover the table by some entirely different means. In Linux at least such divergence "just for Xen" hasn't been liked in the past. 
There's also the question of how to propagate the information across kexec. But I guess that question exists even outside of Xen, with the area living in memory which the OS is expected to recycle. > Is using ebmalloc() to allocate a copy of the ESRT a reasonable option? I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to make the copy before ExitBootServices(), via normal EFI allocation. If replacing a pointer in the config table was okay(ish), this could even be utilized to overcome the kexec problem. > Is it possible that the ESRT is so large that this causes boot to fail? I don't know - that's a question firmware folks would need to answer. Jan
On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote: > On 27.04.2022 21:08, Demi Marie Obenour wrote: > > On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote: > >> On 19.04.2022 17:49, Demi Marie Obenour wrote: > >>> This hypercall can be used to get the ESRT from the hypervisor. It > >>> returning successfully also indicates that Xen has reserved the ESRT and > >>> it can safely be parsed by dom0. > >> > >> I'm not convinced of the need, and I view such an addition as inconsistent > >> with the original intentions. The pointer comes from the config table, > >> which Dom0 already has access to. All a Dom0 kernel may need to know in > >> addition is whether the range was properly reserved. This could be achieved > >> by splitting the EFI memory map entry in patch 2, instead of only splitting > >> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out > >> the range's type. Another way to find out would be for Dom0 to attempt to > >> map this area as MMIO, after first checking that no part of the range is in > >> its own memory allocation. This 2nd approach may, however, not really be > >> suitable for PVH Dom0, I think. > > > > On further thought, I think the hypercall approach is actually better > > than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to > > return anything other than the actual firmware-provided memory > > information, and the current approach seems to require more and more > > special-casing of the ESRT, not to mention potentially wasting memory > > and splitting a potentially large memory region into two smaller ones. > > By copying the entire ESRT into memory owned by Xen, the logic becomes > > significantly simpler on both the Xen and dom0 sides. 
> > I actually did consider the option of making a private copy when you did > send the initial version of this, but I'm not convinced this simplifies > things from a kernel perspective: They'd now need to discover the table > by some entirely different means. In Linux at least such divergence > "just for Xen" hasn't been liked in the past. > > There's also the question of how to propagate the information across > kexec. But I guess that question exists even outside of Xen, with the > area living in memory which the OS is expected to recycle. Indeed it does. A simple rule might be, “Only trust the ESRT if it is in memory of type EfiRuntimeServicesData.” That is easy to achieve by monkeypatching the config table as you suggested below. I *am* worried that the config table might be mapped read-only on some systems, in which case the overwrite would cause a fatal page fault. Is there a way for Xen to check for this? It could also be undefined behavior to modify it. > > Is using ebmalloc() to allocate a copy of the ESRT a reasonable option? > > I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to > make the copy before ExitBootServices(), via normal EFI allocation. If > replacing a pointer in the config table was okay(ish), this could even > be utilized to overcome the kexec problem. What type should I use for the allocation? EfiLoaderData looks like the most consistent choice, but I am not sure if memory so allocated remains valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a better choice. To avoid memory leaks from repeated kexec(), this could be made conditional on the ESRT not being in memory of type EfiRuntimeServicesData to begin with.
On 29.04.2022 00:54, Demi Marie Obenour wrote: > On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote: >> On 27.04.2022 21:08, Demi Marie Obenour wrote: >>> On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote: >>>> On 19.04.2022 17:49, Demi Marie Obenour wrote: >>>>> This hypercall can be used to get the ESRT from the hypervisor. It >>>>> returning successfully also indicates that Xen has reserved the ESRT and >>>>> it can safely be parsed by dom0. >>>> >>>> I'm not convinced of the need, and I view such an addition as inconsistent >>>> with the original intentions. The pointer comes from the config table, >>>> which Dom0 already has access to. All a Dom0 kernel may need to know in >>>> addition is whether the range was properly reserved. This could be achieved >>>> by splitting the EFI memory map entry in patch 2, instead of only splitting >>>> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out >>>> the range's type. Another way to find out would be for Dom0 to attempt to >>>> map this area as MMIO, after first checking that no part of the range is in >>>> its own memory allocation. This 2nd approach may, however, not really be >>>> suitable for PVH Dom0, I think. >>> >>> On further thought, I think the hypercall approach is actually better >>> than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to >>> return anything other than the actual firmware-provided memory >>> information, and the current approach seems to require more and more >>> special-casing of the ESRT, not to mention potentially wasting memory >>> and splitting a potentially large memory region into two smaller ones. >>> By copying the entire ESRT into memory owned by Xen, the logic becomes >>> significantly simpler on both the Xen and dom0 sides. 
>> >> I actually did consider the option of making a private copy when you did >> send the initial version of this, but I'm not convinced this simplifies >> things from a kernel perspective: They'd now need to discover the table >> by some entirely different means. In Linux at least such divergence >> "just for Xen" hasn't been liked in the past. >> >> There's also the question of how to propagate the information across >> kexec. But I guess that question exists even outside of Xen, with the >> area living in memory which the OS is expected to recycle. > > Indeed it does. A simple rule might be, “Only trust the ESRT if it is > in memory of type EfiRuntimeServicesData.” That is easy to achieve by > monkeypatching the config table as you suggested below. > > I *am* worried that the config table might be mapped read-only on some > systems, in which case the overwrite would cause a fatal page fault. Is > there a way for Xen to check for this? While in boot mode, aiui page tables aren't supposed to be enforcing access restrictions. Recall that on other architectures EFI even runs with paging disabled; this simply is not possible for x86-64. So portable firmware shouldn't map anything r/o. In principle the pointer could still be in ROM; I consider this unlikely, but we could check for that (just like we could do a page table walk to figure out whether a r/o mapping would prevent us from updating the field). > It could also be undefined behavior to modify it. That's the bigger worry I have. >>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option? >> >> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to >> make the copy before ExitBootServices(), via normal EFI allocation. If >> replacing a pointer in the config table was okay(ish), this could even >> be utilized to overcome the kexec problem. > > What type should I use for the allocation? 
EfiLoaderData looks like the > most consistent choice, but I am not sure if memory so allocated remains > valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a > better choice. It definitely is. We do recycle EfiLoaderData ourselves. > To avoid memory leaks from repeated kexec(), this could > be made conditional on the ESRT not being in memory of type > EfiRuntimeServicesData to begin with. Of course - there's no point relocating the blob when it already is immune to recycling. Jan
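The rule discussed above (relocate the ESRT only when it is not already in EfiRuntimeServicesData) can be sketched as below. This is a minimal illustration and not Xen code: `struct mem_desc`, `find_desc()` and `esrt_needs_relocation()` are hypothetical names, and the descriptor is a simplified stand-in for the real EFI memory descriptor. The memory type numbers and the ESRT layout (16-byte header followed by 40-byte entries) follow the UEFI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory type values as numbered in the UEFI specification. */
enum {
    EFI_LOADER_DATA           = 2,
    EFI_BOOT_SERVICES_DATA    = 4,
    EFI_RUNTIME_SERVICES_DATA = 6,
};

#define EFI_PAGE_SHIFT   12
#define ESRT_HEADER_SIZE 16u  /* count, max count, version */
#define ESRT_ENTRY_SIZE  40u  /* GUID plus six 32-bit fields */

/* Hypothetical, simplified stand-in for an EFI memory descriptor. */
struct mem_desc {
    uint32_t type;
    uint64_t phys_start;
    uint64_t num_pages;
};

/* Total size in bytes of an ESRT holding fw_resource_count entries. */
static uint64_t esrt_size(uint32_t fw_resource_count)
{
    return ESRT_HEADER_SIZE + (uint64_t)fw_resource_count * ESRT_ENTRY_SIZE;
}

/* Return the descriptor that fully contains [addr, addr + len), or NULL. */
static const struct mem_desc *find_desc(const struct mem_desc *map, size_t n,
                                        uint64_t addr, uint64_t len)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t size = map[i].num_pages << EFI_PAGE_SHIFT;

        if (addr < map[i].phys_start)
            continue;
        uint64_t off = addr - map[i].phys_start;
        if (off < size && len <= size - off)
            return &map[i];
    }
    return NULL;
}

/*
 * 1:  the table lives in memory that may be recycled, so a copy is needed.
 * 0:  the table is already in EfiRuntimeServicesData.
 * -1: the table is not covered by a single descriptor; do not trust it.
 */
static int esrt_needs_relocation(const struct mem_desc *map, size_t n,
                                 uint64_t esrt_addr, uint32_t fw_resource_count)
{
    const struct mem_desc *d =
        find_desc(map, n, esrt_addr, esrt_size(fw_resource_count));

    if (!d)
        return -1;
    return d->type != EFI_RUNTIME_SERVICES_DATA;
}
```

Returning 0 for a table already in EfiRuntimeServicesData is what would keep repeated kexec() from leaking one copy per boot, as noted in the exchange above.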
On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote: > On 29.04.2022 00:54, Demi Marie Obenour wrote: > > On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote: > >> On 27.04.2022 21:08, Demi Marie Obenour wrote: > >>> On Wed, Apr 27, 2022 at 10:56:34AM +0200, Jan Beulich wrote: > >>>> On 19.04.2022 17:49, Demi Marie Obenour wrote: > >>>>> This hypercall can be used to get the ESRT from the hypervisor. It > >>>>> returning successfully also indicates that Xen has reserved the ESRT and > >>>>> it can safely be parsed by dom0. > >>>> > >>>> I'm not convinced of the need, and I view such an addition as inconsistent > >>>> with the original intentions. The pointer comes from the config table, > >>>> which Dom0 already has access to. All a Dom0 kernel may need to know in > >>>> addition is whether the range was properly reserved. This could be achieved > >>>> by splitting the EFI memory map entry in patch 2, instead of only splitting > >>>> the E820 derivation, as then XEN_FW_EFI_MEM_INFO can be used to find out > >>>> the range's type. Another way to find out would be for Dom0 to attempt to > >>>> map this area as MMIO, after first checking that no part of the range is in > >>>> its own memory allocation. This 2nd approach may, however, not really be > >>>> suitable for PVH Dom0, I think. > >>> > >>> On further thought, I think the hypercall approach is actually better > >>> than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to > >>> return anything other than the actual firmware-provided memory > >>> information, and the current approach seems to require more and more > >>> special-casing of the ESRT, not to mention potentially wasting memory > >>> and splitting a potentially large memory region into two smaller ones. > >>> By copying the entire ESRT into memory owned by Xen, the logic becomes > >>> significantly simpler on both the Xen and dom0 sides. 
> >> > >> I actually did consider the option of making a private copy when you did > >> send the initial version of this, but I'm not convinced this simplifies > >> things from a kernel perspective: They'd now need to discover the table > >> by some entirely different means. In Linux at least such divergence > >> "just for Xen" hasn't been liked in the past. > >> > >> There's also the question of how to propagate the information across > >> kexec. But I guess that question exists even outside of Xen, with the > >> area living in memory which the OS is expected to recycle. > > > > Indeed it does. A simple rule might be, “Only trust the ESRT if it is > > in memory of type EfiRuntimeServicesData.” That is easy to achieve by > > monkeypatching the config table as you suggested below. > > > > I *am* worried that the config table might be mapped read-only on some > > systems, in which case the overwrite would cause a fatal page fault. Is > > there a way for Xen to check for this? > > While in boot mode, aiui page tables aren't supposed to be enforcing > access restrictions. Recall that on other architectures EFI even runs > with paging disabled; this simply is not possible for x86-64. Yikes! No wonder firmware has nonexistent exploit mitigations. They really ought to start porting UEFI to Rust, with ASLR, NX, stack canaries, a hardened allocator, and support for de-privileged services that run in user mode. That reminds me: Can Xen itself run from ROM? Xen is being ported to POWER for use in Qubes OS, and one approach under consideration is to have Xen and a mini-dom0 be part of the firmware. Personally, I really like this approach, as it makes untrusted storage domains much simpler. If this should be a separate email thread, let me know. > So > portable firmware shouldn't map anything r/o. 
In principle the pointer > could still be in ROM; I consider this unlikely, but we could check > for that (just like we could do a page table walk to figure out > whether a r/o mapping would prevent us from updating the field). Is there a utility function that could be used for this? > > It could also be undefined behavior to modify it. > > That's the bigger worry I have. Turns out that it is *not* undefined behavior, so long as ExitBootServices() has not been called. This is because EFI drivers will modify the config table, so firmware cannot assume it to be read-only. > >>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option? > >> > >> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to > >> make the copy before ExitBootServices(), via normal EFI allocation. If > >> replacing a pointer in the config table was okay(ish), this could even > >> be utilized to overcome the kexec problem. > > > > What type should I use for the allocation? EfiLoaderData looks like the > > most consistent choice, but I am not sure if memory so allocated remains > > valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a > > better choice. > > It definitely is. We do recycle EfiLoaderData ourselves. I wonder why the ESRT was not in EfiRuntimeServicesData to begin with. > > To avoid memory leaks from repeated kexec(), this could > > be made conditional on the ESRT not being in memory of type > > EfiRuntimeServicesData to begin with. > > Of course - there's no point relocating the blob when it already is > immune to recycling. Yup. Is it reasonable for dom0 to check that the ESRT is in EfiRuntimeServicesData when under Xen?
On 29.04.2022 19:06, Demi Marie Obenour wrote: > On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote: >> On 29.04.2022 00:54, Demi Marie Obenour wrote: >>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote: >>>> On 27.04.2022 21:08, Demi Marie Obenour wrote: >>>>> On further thought, I think the hypercall approach is actually better >>>>> than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to >>>>> return anything other than the actual firmware-provided memory >>>>> information, and the current approach seems to require more and more >>>>> special-casing of the ESRT, not to mention potentially wasting memory >>>>> and splitting a potentially large memory region into two smaller ones. >>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes >>>>> significantly simpler on both the Xen and dom0 sides. >>>> >>>> I actually did consider the option of making a private copy when you did >>>> send the initial version of this, but I'm not convinced this simplifies >>>> things from a kernel perspective: They'd now need to discover the table >>>> by some entirely different means. In Linux at least such divergence >>>> "just for Xen" hasn't been liked in the past. >>>> >>>> There's also the question of how to propagate the information across >>>> kexec. But I guess that question exists even outside of Xen, with the >>>> area living in memory which the OS is expected to recycle. >>> >>> Indeed it does. A simple rule might be, “Only trust the ESRT if it is >>> in memory of type EfiRuntimeServicesData.” That is easy to achieve by >>> monkeypatching the config table as you suggested below. >>> >>> I *am* worried that the config table might be mapped read-only on some >>> systems, in which case the overwrite would cause a fatal page fault. Is >>> there a way for Xen to check for this? >> >> While in boot mode, aiui page tables aren't supposed to be enforcing >> access restrictions. 
Recall that on other architectures EFI even runs >> with paging disabled; this simply is not possible for x86-64. > > Yikes! No wonder firmware has nonexistent exploit mitigations. They > really ought to start porting UEFI to Rust, with ASLR, NX, stack > canaries, a hardened allocator, and support for de-priviliged services > that run in user mode. > > That reminds me: Can Xen itself run from ROM? I guess that could be possible in principle, but would certainly require some work. > Xen is being ported to > POWER for use in Qubes OS, and one approach under consideration is to > have Xen and a mini-dom0 be part of the firmware. Personally, I really > like this approach, as it makes untrusted storage domains much simpler. > If this should be a separate email thread, let me know. It probably should be. >> So >> portable firmware shouldn't map anything r/o. In principle the pointer >> could still be in ROM; I consider this unlikely, but we could check >> for that (just like we could do a page table walk to figure out >> whether a r/o mapping would prevent us from updating the field). > > Is there a utility function that could be used for this? I don't think there is. >>> It could also be undefined behavior to modify it. >> >> That's the bigger worry I have. > > Turns out that it is *not* undefined behavior, so long as > ExitBootServices() has not been called. This is becaues EFI drivers > will modify the config table, so firmware cannot assume it to be > read-only. Ah, right - we could even use InstallConfigurationTable() ourselves to make the adjustment. >>>>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option? >>>> >>>> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to >>>> make the copy before ExitBootServices(), via normal EFI allocation. If >>>> replacing a pointer in the config table was okay(ish), this could even >>>> be utilized to overcome the kexec problem. >>> >>> What type should I use for the allocation? 
EfiLoaderData looks like the >>> most consistent choice, but I am not sure if memory so allocated remains >>> valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a >>> better choice. >> >> It definitely is. We do recycle EfiLoaderData ourselves. > > I wonder why the ESRT was not in EfiRuntimeServicesData to begin with. So do I. >>> To avoid memory leaks from repeated kexec(), this could >>> be made conditional on the ESRT not being in memory of type >>> EfiRuntimeServicesData to begin with. >> >> Of course - there's no point relocating the blob when it already is >> immune to recycling. > > Yup. Is it reasonable for dom0 to check that the ESRT is in > EfiRuntimeServicesData when under Xen? I think it is, but kernel folks may not like Xen specific code in this (or about any) area. Jan
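To make the InstallConfigurationTable() idea concrete, the following sketch models the replace-or-append behaviour of that boot service on a plain in-memory array and uses it to republish a copied ESRT under the same GUID. Everything here is an assumption for illustration: `struct cfg_table`, `model_install_config_table()`, `model_lookup()` and `relocate_esrt()` are hypothetical names, the caller-provided buffer stands in for an EfiRuntimeServicesData allocation, and removal of an entry by passing a NULL table is not modelled. It is not the real boot-services interface and not what the patch series does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified GUID and configuration table entry. */
struct guid { uint8_t b[16]; };

struct cfg_entry {
    struct guid guid;
    void *table;
};

#define CFG_MAX 8

struct cfg_table {
    size_t count;
    struct cfg_entry entries[CFG_MAX];
};

/* Return the table pointer published under guid, or NULL if absent. */
static void *model_lookup(const struct cfg_table *t, const struct guid *g)
{
    for (size_t i = 0; i < t->count; i++)
        if (!memcmp(&t->entries[i].guid, g, sizeof(*g)))
            return t->entries[i].table;
    return NULL;
}

/*
 * Model of the replace-or-append behaviour: an existing entry with the
 * same GUID has its pointer updated, otherwise a new entry is appended.
 * Returns 0 on success, -1 when the model's fixed-size array is full.
 */
static int model_install_config_table(struct cfg_table *t,
                                      const struct guid *g, void *table)
{
    for (size_t i = 0; i < t->count; i++) {
        if (!memcmp(&t->entries[i].guid, g, sizeof(*g))) {
            t->entries[i].table = table;
            return 0;
        }
    }
    if (t->count == CFG_MAX)
        return -1;
    t->entries[t->count].guid = *g;
    t->entries[t->count].table = table;
    t->count++;
    return 0;
}

/*
 * Copy len bytes of the currently published ESRT into new_buf, then
 * republish new_buf under the same GUID. Returns -1 if no ESRT exists.
 */
static int relocate_esrt(struct cfg_table *t, const struct guid *esrt_guid,
                         void *new_buf, size_t len)
{
    void *old = model_lookup(t, esrt_guid);

    assert(new_buf != NULL);
    if (!old)
        return -1;
    memcpy(new_buf, old, len);
    return model_install_config_table(t, esrt_guid, new_buf);
}
```

Because the GUID is unchanged, a consumer that finds the ESRT through the configuration table would pick up the relocated copy without a Xen-specific discovery path, which is the property the thread is after.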
On Mon, May 02, 2022 at 08:24:30AM +0200, Jan Beulich wrote: > On 29.04.2022 19:06, Demi Marie Obenour wrote: > > On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote: > >> On 29.04.2022 00:54, Demi Marie Obenour wrote: > >>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote: > >>>> On 27.04.2022 21:08, Demi Marie Obenour wrote: > >>>>> On further thought, I think the hypercall approach is actually better > >>>>> than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to > >>>>> return anything other than the actual firmware-provided memory > >>>>> information, and the current approach seems to require more and more > >>>>> special-casing of the ESRT, not to mention potentially wasting memory > >>>>> and splitting a potentially large memory region into two smaller ones. > >>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes > >>>>> significantly simpler on both the Xen and dom0 sides. > >>>> > >>>> I actually did consider the option of making a private copy when you did > >>>> send the initial version of this, but I'm not convinced this simplifies > >>>> things from a kernel perspective: They'd now need to discover the table > >>>> by some entirely different means. In Linux at least such divergence > >>>> "just for Xen" hasn't been liked in the past. > >>>> > >>>> There's also the question of how to propagate the information across > >>>> kexec. But I guess that question exists even outside of Xen, with the > >>>> area living in memory which the OS is expected to recycle. > >>> > >>> Indeed it does. A simple rule might be, “Only trust the ESRT if it is > >>> in memory of type EfiRuntimeServicesData.” That is easy to achieve by > >>> monkeypatching the config table as you suggested below. > >>> > >>> I *am* worried that the config table might be mapped read-only on some > >>> systems, in which case the overwrite would cause a fatal page fault. Is > >>> there a way for Xen to check for this? 
> >> > >> While in boot mode, aiui page tables aren't supposed to be enforcing > >> access restrictions. Recall that on other architectures EFI even runs > >> with paging disabled; this simply is not possible for x86-64. > > > > Yikes! No wonder firmware has nonexistent exploit mitigations. They > > really ought to start porting UEFI to Rust, with ASLR, NX, stack > > canaries, a hardened allocator, and support for de-priviliged services > > that run in user mode. > > > > That reminds me: Can Xen itself run from ROM? > > I guess that could be possible in principle, but would certainly require > some work. > > > Xen is being ported to > > POWER for use in Qubes OS, and one approach under consideration is to > > have Xen and a mini-dom0 be part of the firmware. Personally, I really > > like this approach, as it makes untrusted storage domains much simpler. > > If this should be a separate email thread, let me know. > > It probably should be. I will make one at some point. > >> So > >> portable firmware shouldn't map anything r/o. In principle the pointer > >> could still be in ROM; I consider this unlikely, but we could check > >> for that (just like we could do a page table walk to figure out > >> whether a r/o mapping would prevent us from updating the field). > > > > Is there a utility function that could be used for this? > > I don't think there is. Then it is good that none is necessary :) Also, should the various bug checks I added be replaced by ASSERT()? > >>> It could also be undefined behavior to modify it. > >> > >> That's the bigger worry I have. > > > > Turns out that it is *not* undefined behavior, so long as > > ExitBootServices() has not been called. This is becaues EFI drivers > > will modify the config table, so firmware cannot assume it to be > > read-only. > > Ah, right - we could even use InstallConfigurationTable() ourselves > to make the adjustment. That is even simpler than I thought! 
I was worried that InstallConfigurationTable() would assume that memory for the table was allocated a certain way and cause invalid free errors, but at least TianoCore does not do that. > >>>>> Is using ebmalloc() to allocate a copy of the ESRT a reasonable option? > >>>> > >>>> I'd suggest to try hard to avoid ebmalloc(). It ought to be possible to > >>>> make the copy before ExitBootServices(), via normal EFI allocation. If > >>>> replacing a pointer in the config table was okay(ish), this could even > >>>> be utilized to overcome the kexec problem. > >>> > >>> What type should I use for the allocation? EfiLoaderData looks like the > >>> most consistent choice, but I am not sure if memory so allocated remains > >>> valid when Xen hands off to the OS, so EfiRuntimeServicesData might be a > >>> better choice. > >> > >> It definitely is. We do recycle EfiLoaderData ourselves. > > > > I wonder why the ESRT was not in EfiRuntimeServicesData to begin with. > > So do I. I suspect the assumption was that the ESRT would be parsed by the OS before ExitBootServices(), and that the OS would have no need for the ESRT after that. > >>> To avoid memory leaks from repeated kexec(), this could > >>> be made conditional on the ESRT not being in memory of type > >>> EfiRuntimeServicesData to begin with. > >> > >> Of course - there's no point relocating the blob when it already is > >> immune to recycling. > > > > Yup. Is it reasonable for dom0 to check that the ESRT is in > > EfiRuntimeServicesData when under Xen? > > I think it is, but kernel folks may not like Xen specific code in this > (or about any) area. > > Jan There is PVops et al already :)
On 02.05.2022 09:11, Demi Marie Obenour wrote: > On Mon, May 02, 2022 at 08:24:30AM +0200, Jan Beulich wrote: >> On 29.04.2022 19:06, Demi Marie Obenour wrote: >>> On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote: >>>> On 29.04.2022 00:54, Demi Marie Obenour wrote: >>>>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote: >>>>>> On 27.04.2022 21:08, Demi Marie Obenour wrote: >>>>>>> On further thought, I think the hypercall approach is actually better >>>>>>> than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to >>>>>>> return anything other than the actual firmware-provided memory >>>>>>> information, and the current approach seems to require more and more >>>>>>> special-casing of the ESRT, not to mention potentially wasting memory >>>>>>> and splitting a potentially large memory region into two smaller ones. >>>>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes >>>>>>> significantly simpler on both the Xen and dom0 sides. >>>>>> >>>>>> I actually did consider the option of making a private copy when you did >>>>>> send the initial version of this, but I'm not convinced this simplifies >>>>>> things from a kernel perspective: They'd now need to discover the table >>>>>> by some entirely different means. In Linux at least such divergence >>>>>> "just for Xen" hasn't been liked in the past. >>>>>> >>>>>> There's also the question of how to propagate the information across >>>>>> kexec. But I guess that question exists even outside of Xen, with the >>>>>> area living in memory which the OS is expected to recycle. >>>>> >>>>> Indeed it does. A simple rule might be, “Only trust the ESRT if it is >>>>> in memory of type EfiRuntimeServicesData.” That is easy to achieve by >>>>> monkeypatching the config table as you suggested below. >>>>> >>>>> I *am* worried that the config table might be mapped read-only on some >>>>> systems, in which case the overwrite would cause a fatal page fault. 
Is >>>>> there a way for Xen to check for this? >>>> >>>> While in boot mode, aiui page tables aren't supposed to be enforcing >>>> access restrictions. Recall that on other architectures EFI even runs >>>> with paging disabled; this simply is not possible for x86-64. >>> >>> Yikes! No wonder firmware has nonexistent exploit mitigations. They >>> really ought to start porting UEFI to Rust, with ASLR, NX, stack >>> canaries, a hardened allocator, and support for de-priviliged services >>> that run in user mode. >>> >>> That reminds me: Can Xen itself run from ROM? >> >> I guess that could be possible in principle, but would certainly require >> some work. >> >>> Xen is being ported to >>> POWER for use in Qubes OS, and one approach under consideration is to >>> have Xen and a mini-dom0 be part of the firmware. Personally, I really >>> like this approach, as it makes untrusted storage domains much simpler. >>> If this should be a separate email thread, let me know. >> >> It probably should be. > > I will make one at some point. > >>>> So >>>> portable firmware shouldn't map anything r/o. In principle the pointer >>>> could still be in ROM; I consider this unlikely, but we could check >>>> for that (just like we could do a page table walk to figure out >>>> whether a r/o mapping would prevent us from updating the field). >>> >>> Is there a utility function that could be used for this? >> >> I don't think there is. > > Then it is good that none is necessary :) > > Also, should the various bug checks I added be replaced by ASSERT()? You mean those in the earlier patch(es)? Not sure - depends on what you would be doing for release builds. In the cases where you simply re- check what was checked earlier on, ASSERT() would probably indeed be preferable over BUG_ON() (and there I wouldn't even see a strong need to consider alternatives for release builds). Jan
On Mon, May 02, 2022 at 09:37:39AM +0200, Jan Beulich wrote:
> On 02.05.2022 09:11, Demi Marie Obenour wrote:
> > On Mon, May 02, 2022 at 08:24:30AM +0200, Jan Beulich wrote:
> >> On 29.04.2022 19:06, Demi Marie Obenour wrote:
> >>> On Fri, Apr 29, 2022 at 10:40:42AM +0200, Jan Beulich wrote:
> >>>> On 29.04.2022 00:54, Demi Marie Obenour wrote:
> >>>>> On Thu, Apr 28, 2022 at 08:47:49AM +0200, Jan Beulich wrote:
> >>>>>> On 27.04.2022 21:08, Demi Marie Obenour wrote:
> >>>>>>> On further thought, I think the hypercall approach is actually better
> >>>>>>> than reserving the ESRT. I really do not want XEN_FW_EFI_MEM_INFO to
> >>>>>>> return anything other than the actual firmware-provided memory
> >>>>>>> information, and the current approach seems to require more and more
> >>>>>>> special-casing of the ESRT, not to mention potentially wasting memory
> >>>>>>> and splitting a potentially large memory region into two smaller ones.
> >>>>>>> By copying the entire ESRT into memory owned by Xen, the logic becomes
> >>>>>>> significantly simpler on both the Xen and dom0 sides.
> >>>>>>
> >>>>>> I actually did consider the option of making a private copy when you did
> >>>>>> send the initial version of this, but I'm not convinced this simplifies
> >>>>>> things from a kernel perspective: They'd now need to discover the table
> >>>>>> by some entirely different means. In Linux at least such divergence
> >>>>>> "just for Xen" hasn't been liked in the past.
> >>>>>>
> >>>>>> There's also the question of how to propagate the information across
> >>>>>> kexec. But I guess that question exists even outside of Xen, with the
> >>>>>> area living in memory which the OS is expected to recycle.
> >>>>>
> >>>>> Indeed it does. A simple rule might be, “Only trust the ESRT if it is
> >>>>> in memory of type EfiRuntimeServicesData.” That is easy to achieve by
> >>>>> monkeypatching the config table as you suggested below.
> >>>>>
> >>>>> I *am* worried that the config table might be mapped read-only on some
> >>>>> systems, in which case the overwrite would cause a fatal page fault. Is
> >>>>> there a way for Xen to check for this?
> >>>>
> >>>> While in boot mode, aiui page tables aren't supposed to be enforcing
> >>>> access restrictions. Recall that on other architectures EFI even runs
> >>>> with paging disabled; this simply is not possible for x86-64.
> >>>
> >>> Yikes! No wonder firmware has nonexistent exploit mitigations. They
> >>> really ought to start porting UEFI to Rust, with ASLR, NX, stack
> >>> canaries, a hardened allocator, and support for de-privileged services
> >>> that run in user mode.
> >>>
> >>> That reminds me: Can Xen itself run from ROM?
> >>
> >> I guess that could be possible in principle, but would certainly require
> >> some work.
> >>
> >>> Xen is being ported to
> >>> POWER for use in Qubes OS, and one approach under consideration is to
> >>> have Xen and a mini-dom0 be part of the firmware. Personally, I really
> >>> like this approach, as it makes untrusted storage domains much simpler.
> >>> If this should be a separate email thread, let me know.
> >>
> >> It probably should be.
> >
> > I will make one at some point.
> >
> >>>> So
> >>>> portable firmware shouldn't map anything r/o. In principle the pointer
> >>>> could still be in ROM; I consider this unlikely, but we could check
> >>>> for that (just like we could do a page table walk to figure out
> >>>> whether a r/o mapping would prevent us from updating the field).
> >>>
> >>> Is there a utility function that could be used for this?
> >>
> >> I don't think there is.
> >
> > Then it is good that none is necessary :)
> >
> > Also, should the various bug checks I added be replaced by ASSERT()?
>
> You mean those in the earlier patch(es)? Not sure - depends on what you
> would be doing for release builds. In the cases where you simply re-check
> what was checked earlier on, ASSERT() would probably indeed be
> preferable over BUG_ON() (and there I wouldn't even see a strong need
> to consider alternatives for release builds).

Yup, that’s what the BUG_ON()s were for. I will use ASSERT() in the
next round.
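
To make the ASSERT() versus BUG_ON() distinction above concrete, here is a minimal standalone sketch, not Xen code. It assumes that standard `assert()` (which disappears when `NDEBUG` is defined) is a fair stand-in for a debug-only check, and it uses a hypothetical `SKETCH_BUG_ON()` macro for a check that stays active in every build. The function names `table_size()`, `last_entry_offset()` and `last_entry_offset_always()` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical always-on check, standing in for a BUG_ON()-style macro:
 * it stays active even when NDEBUG is defined.
 */
#define SKETCH_BUG_ON(cond)                          \
    do {                                             \
        if ( cond )                                  \
        {                                            \
            fprintf(stderr, "BUG: %s\n", #cond);     \
            abort();                                 \
        }                                            \
    } while ( 0 )

/*
 * First check: the inputs are untrusted, so bad values are rejected
 * gracefully with an error code rather than by crashing.
 */
int table_size(size_t count, size_t entry_size, size_t available,
               size_t *size)
{
    if ( entry_size == 0 || count == 0 || count > available / entry_size )
        return -1;
    *size = count * entry_size;
    return 0;
}

/*
 * Later consumer: the condition was already established by table_size(),
 * so this is only a re-check. assert() is compiled out under NDEBUG.
 */
size_t last_entry_offset(size_t size, size_t entry_size)
{
    assert(entry_size != 0 && size >= entry_size);
    return size - entry_size;
}

/* Same consumer, but with a check that survives release builds. */
size_t last_entry_offset_always(size_t size, size_t entry_size)
{
    SKETCH_BUG_ON(entry_size == 0 || size < entry_size);
    return size - entry_size;
}
```

In this sketch the debug-only variant costs nothing in a release build, which is acceptable only because the same condition was already validated at the point where the untrusted data entered.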
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 31664818c1..01b2409c5e 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -567,8 +567,6 @@ static int __init efi_check_dt_boot(const EFI_LOADED_IMAGE *loaded_image)
 }
 #endif
 
-static UINTN __initdata esrt = EFI_INVALID_TABLE_ADDR;
-
 static bool __init is_esrt_valid(
     const EFI_MEMORY_DESCRIPTOR *const desc)
 {
@@ -594,9 +592,13 @@ static bool __init is_esrt_valid(
     esrt_ptr = (const ESRT *)esrt;
     if ( esrt_ptr->Version != 1 || !esrt_ptr->Count )
         return false;
-    return esrt_ptr->Count <=
-           (available_len - sizeof(*esrt_ptr)) /
-           sizeof(esrt_ptr->Entries[0]);
+    if ( esrt_ptr->Count >
+         (available_len - sizeof(*esrt_ptr)) /
+         sizeof(esrt_ptr->Entries[0]) )
+        return false;
+    esrt_size = sizeof(*esrt_ptr) +
+        esrt_ptr->Count * sizeof(esrt_ptr->Entries[0]);
+    return true;
 }
 
 /*
@@ -1121,6 +1123,9 @@ static void __init efi_exit_boot(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *Syste
         }
     }
 
+    if ( esrt_desc == (const EFI_MEMORY_DESCRIPTOR *)EFI_INVALID_TABLE_ADDR )
+        esrt = EFI_INVALID_TABLE_ADDR;
+
     efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size,
                                 efi_mdesc_size, mdesc_ver);
 
diff --git a/xen/common/efi/efi.h b/xen/common/efi/efi.h
index 02f499071a..0736662ebc 100644
--- a/xen/common/efi/efi.h
+++ b/xen/common/efi/efi.h
@@ -46,6 +46,8 @@ extern const EFI_RUNTIME_SERVICES *efi_rs;
 extern UINTN efi_memmap_size, efi_mdesc_size;
 extern void *efi_memmap;
 extern const EFI_MEMORY_DESCRIPTOR *esrt_desc;
+extern UINTN esrt;
+extern UINTN esrt_size;
 
 #ifdef CONFIG_X86
 extern mfn_t efi_l4_mfn;
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 0d09647952..4466d5379c 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -227,6 +227,12 @@ const CHAR16 *wmemchr(const CHAR16 *s, CHAR16 c, UINTN n)
 #endif /* COMPAT */
 
 #ifndef CONFIG_ARM /* TODO - disabled until implemented on ARM */
+
+#ifndef COMPAT
+UINTN esrt = EFI_INVALID_TABLE_ADDR;
+UINTN esrt_size = 0;
+#endif
+
 int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
 {
     unsigned int i, n;
@@ -311,6 +317,14 @@ int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
         info->apple_properties.size = efi_apple_properties_len;
         break;
 
+    case XEN_FW_EFI_ESRT:
+        if ( esrt_desc == (const EFI_MEMORY_DESCRIPTOR *)EFI_INVALID_TABLE_ADDR )
+            return -ENODATA;
+        if ( info->esrt.size < esrt_size )
+            return -ERANGE;
+        if ( copy_to_guest(info->esrt.table, (const ESRT *)esrt, esrt_size) )
+            return -EFAULT;
+        break;
     default:
         return -EINVAL;
     }
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 8100133509..a848df2066 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -243,6 +243,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_efi_runtime_call_t);
 #define XEN_FW_EFI_RT_VERSION 4
 #define XEN_FW_EFI_PCI_ROM 5
 #define XEN_FW_EFI_APPLE_PROPERTIES 6
+#define XEN_FW_EFI_ESRT 7
 #define XEN_FW_KBD_SHIFT_FLAGS 5
 struct xenpf_firmware_info {
     /* IN variables. */
@@ -307,6 +308,12 @@ struct xenpf_firmware_info {
                 uint64_t address;
                 xen_ulong_t size;
             } apple_properties;
+            struct {
+                /* IN variables */
+                uint64_t size;
+                /* OUT variables */
+                XEN_GUEST_HANDLE(void) table;
+            } esrt;
         } efi_info; /* XEN_FW_EFI_INFO */
 
         /* Int16, Fn02: Get keyboard shift flags. */
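
For readers who want to follow the size arithmetic in is_esrt_valid() and the error ordering of the new XEN_FW_EFI_ESRT case outside of Xen, the following is a rough standalone sketch. It is not the patch's code: the struct layouts are simplified stand-ins assumed to match the UEFI ESRT definition (16-byte header, 40-byte entries), `memcpy()` replaces `copy_to_guest()`, and the names `esrt_checked_size()` and `esrt_copy_out()` are hypothetical. The extra `available_len < sizeof(*hdr)` guard is an assumption added here so that the subtraction cannot underflow; the hunk above does not show whether the patch checks this earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for one ESRT entry (assumed 40 bytes). */
struct esrt_entry {
    uint8_t  fw_class[16];
    uint32_t fw_type;
    uint32_t fw_version;
    uint32_t lowest_supported_fw_version;
    uint32_t capsule_flags;
    uint32_t last_attempt_version;
    uint32_t last_attempt_status;
};

/* Simplified stand-in for the ESRT header (assumed 16 bytes). */
struct esrt_header {
    uint32_t count;
    uint32_t count_max;
    uint64_t version;
};

/*
 * Mirror of the patch's validation: reject unknown versions and empty
 * tables, bound the entry count by the space available, and only then
 * compute the total size. Dividing instead of multiplying in the bound
 * check avoids overflow on a hostile count.
 */
bool esrt_checked_size(const struct esrt_header *hdr, size_t available_len,
                       size_t *size_out)
{
    assert(hdr != NULL && size_out != NULL);

    if ( available_len < sizeof(*hdr) )   /* assumption: guard underflow */
        return false;
    if ( hdr->version != 1 || hdr->count == 0 )
        return false;
    if ( hdr->count >
         (available_len - sizeof(*hdr)) / sizeof(struct esrt_entry) )
        return false;

    *size_out = sizeof(*hdr) + (size_t)hdr->count * sizeof(struct esrt_entry);
    return true;
}

/*
 * Mirror of the error ordering in the XEN_FW_EFI_ESRT case: no table,
 * then buffer too small, then the copy itself (memcpy() cannot fail
 * here, so there is no -EFAULT path in this sketch).
 */
int esrt_copy_out(const void *table, size_t table_size,
                  void *dst, size_t dst_size)
{
    if ( table == NULL )
        return -ENODATA;
    if ( dst_size < table_size )
        return -ERANGE;
    memcpy(dst, table, table_size);
    return 0;
}
```

A caller in this model would size its buffer generously and treat -ERANGE as "retry with a larger buffer"; note that the interface in the patch does not report the required size back on -ERANGE, so the caller has to pick the new size itself.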