| Message ID | 20250130134346.1754143-9-clg@redhat.com (mailing list archive) |
|---|---|
| State | New |
| Series | vfio: Improve error reporting when MMIO region mapping fails |
On Thu, 30 Jan 2025 14:43:45 +0100
Cédric Le Goater <clg@redhat.com> wrote:

> Print a warning if IOMMU address space width is smaller than the
> physical address width. In this case, PCI peer-to-peer transactions on
> BARs are not supported and failures of device MMIO regions are to be
> expected.
>
> This can occur with the 39-bit IOMMU address space width as found on
> consumer grade processors or when using a vIOMMU device with default
> settings.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>  hw/vfio/common.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5c9d8657d746ce30af5ae8f9122101e086a61ef5..e5ef93c589b2bed68f790608868ec3c7779d4cb8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -44,6 +44,8 @@
>  #include "migration/qemu-file.h"
>  #include "system/tpm.h"
>
> +#include "hw/core/cpu.h"
> +
>  VFIODeviceList vfio_device_list =
>      QLIST_HEAD_INITIALIZER(vfio_device_list);
>  static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
> @@ -1546,12 +1548,28 @@ retry:
>      return info;
>  }
>
> +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
> +{
> +    uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
> +    uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
> +
> +    if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {

Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
than the vCPU physical address width?

Maybe we're just stuck with an indirect solution currently. AIUI,
we're testing the vCPU physical address width because (obviously) all
devices and memory need to be addressable within that address space.
However, as we've explored offline, there are bare metal systems where
the CPU physical address width exceeds the IOMMU address width and this
is not a fundamental problem. It places restrictions on the maximum
RAM physical address and the location of the 64-bit MMIO space.

RAM tends not to be a problem on these bare metal systems since they
physically cannot hold enough RAM to exceed the IOMMU width and, for
the most part, we map RAM starting from the bottom of the address
space. So long as the VMM doesn't decide to map RAM at the top of the
physical address space, this doesn't become a problem.

However, we've decided to do exactly that for the 64-bit MMIO window.
It's not that the vCPU width being greater than the IOMMU width is a
fundamental problem, it's that we've chosen a 64-bit MMIO policy that
makes this become a problem and QEMU only has a convenient mechanism to
check the host IOMMU width when a vfio device is present. Arguably,
when a vIOMMU is present guest firmware should be enlightened to
understand the address width of that vIOMMU when placing the 64-bit
MMIO window. I'd say the failure to do so is a current firmware bug.

If the vIOMMU address width were honored, we could recommend users set
that to match the host, regardless of the vCPU physical address width.
We also already have a failure condition if the vIOMMU address width
exceeds the vfio-device (ie. indirectly the host) IOMMU width.

Of course without a vIOMMU, given our 64-bit MMIO policy, we don't have
a good solution without specifying the 64-bit window or implementing a
more conservative placement.

Not sure how much of this is immediately solvable and some indication
to the user how they can resolve the issue, such as implemented here, is
better than none, but maybe we can elaborate in a comment that this is
really more of a workaround for the current behavior of firmware
relative to the 64-bit MMIO placement policy. Thanks,

Alex

> +        error_setg(errp, "Host physical address space (%u) is larger than "
> +                   "the host IOMMU address space (%u).", cpu_aw_bits,
> +                   iommu_aw_bits);
> +        vfio_device_error_append(vbasedev, errp);
> +        return false;
> +    }
> +    return true;
> +}
> +
>  bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>                          AddressSpace *as, Error **errp)
>  {
>      const VFIOIOMMUClass *ops =
>          VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>      HostIOMMUDevice *hiod = NULL;
> +    Error *local_err = NULL;
>
>      if (vbasedev->iommufd) {
>          ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> @@ -1571,6 +1589,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>          return false;
>      }
>
> +    if (!vfio_device_check_address_space(vbasedev, &local_err)) {
> +        warn_report_err(local_err);
> +    }
>      return true;
>  }
>
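To make the distinction concrete, here is a standalone sketch of the
alternative check Alex describes: testing the guest addresses that
actually need DMA mapping (top of RAM, top of the 64-bit MMIO window)
against the IOMMU limit, rather than comparing raw address widths.
Every name and value is hypothetical, and as noted later in the thread,
QEMU cannot know the 64-bit window placement before firmware runs.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool dma_reachable(uint64_t max_ram_gpa, uint64_t mmio64_end,
                          uint32_t iommu_aw_bits)
{
    /* Highest guest physical address the IOMMU can translate. */
    uint64_t iommu_limit = (1ULL << iommu_aw_bits) - 1;

    return max_ram_gpa <= iommu_limit && mmio64_end <= iommu_limit;
}

int main(void)
{
    /* Hypothetical layout: 64 GiB of RAM, 64-bit MMIO window placed
     * near the top of a 46-bit address space, 39-bit host IOMMU. */
    uint64_t max_ram_gpa = (64ULL << 30) - 1;
    uint64_t mmio64_end  = (1ULL << 46) - 1;

    printf("p2p-safe: %s\n",
           dma_reachable(max_ram_gpa, mmio64_end, 39) ? "yes" : "no");
    return 0;
}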
+ Gerd for insights regarding vIOMMU support in edk2.

On 1/30/25 19:58, Alex Williamson wrote:
> On Thu, 30 Jan 2025 14:43:45 +0100
> Cédric Le Goater <clg@redhat.com> wrote:
>
>> Print a warning if IOMMU address space width is smaller than the
>> physical address width. In this case, PCI peer-to-peer transactions on
>> BARs are not supported and failures of device MMIO regions are to be
>> expected.
>>
>> This can occur with the 39-bit IOMMU address space width as found on
>> consumer grade processors or when using a vIOMMU device with default
>> settings.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>  hw/vfio/common.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5c9d8657d746ce30af5ae8f9122101e086a61ef5..e5ef93c589b2bed68f790608868ec3c7779d4cb8 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -44,6 +44,8 @@
>>  #include "migration/qemu-file.h"
>>  #include "system/tpm.h"
>>
>> +#include "hw/core/cpu.h"
>> +
>>  VFIODeviceList vfio_device_list =
>>      QLIST_HEAD_INITIALIZER(vfio_device_list);
>>  static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
>> @@ -1546,12 +1548,28 @@ retry:
>>      return info;
>>  }
>>
>> +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
>> +{
>> +    uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
>> +    uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
>> +
>> +    if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
>
> Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
> than the vCPU physical address width?
>
> Maybe we're just stuck with an indirect solution currently. AIUI,
> we're testing the vCPU physical address width because (obviously) all
> devices and memory need to be addressable within that address space.
> However, as we've explored offline, there are bare metal systems where
> the CPU physical address width exceeds the IOMMU address width and this
> is not a fundamental problem. It places restrictions on the maximum
> RAM physical address and the location of the 64-bit MMIO space.
>
> RAM tends not to be a problem on these bare metal systems since they
> physically cannot hold enough RAM to exceed the IOMMU width and, for
> the most part, we map RAM starting from the bottom of the address
> space. So long as the VMM doesn't decide to map RAM at the top of the
> physical address space, this doesn't become a problem.
>
> However, we've decided to do exactly that for the 64-bit MMIO window.
> It's not that the vCPU width being greater than the IOMMU width is a
> fundamental problem, it's that we've chosen a 64-bit MMIO policy that
> makes this become a problem and QEMU only has a convenient mechanism to
> check the host IOMMU width when a vfio device is present. Arguably,
> when a vIOMMU is present guest firmware should be enlightened to
> understand the address width of that vIOMMU when placing the 64-bit
> MMIO window. I'd say the failure to do so is a current firmware bug.
>
> If the vIOMMU address width were honored, we could recommend users set
> that to match the host, regardless of the vCPU physical address width.
> We also already have a failure condition if the vIOMMU address width
> exceeds the vfio-device (ie. indirectly the host) IOMMU width.
>
> Of course without a vIOMMU, given our 64-bit MMIO policy, we don't have
> a good solution without specifying the 64-bit window or implementing a
> more conservative placement.
>
> Not sure how much of this is immediately solvable and some indication
> to the user how they can resolve the issue, such as implemented here, is
> better than none, but maybe we can elaborate in a comment that this is
> really more of a workaround for the current behavior of firmware
> relative to the 64-bit MMIO placement policy. Thanks,

Sure. I will improve the commit log in v3.

Thanks,

C.

> Alex
>
>> +        error_setg(errp, "Host physical address space (%u) is larger than "
>> +                   "the host IOMMU address space (%u).", cpu_aw_bits,
>> +                   iommu_aw_bits);
>> +        vfio_device_error_append(vbasedev, errp);
>> +        return false;
>> +    }
>> +    return true;
>> +}
>> +
>>  bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>                          AddressSpace *as, Error **errp)
>>  {
>>      const VFIOIOMMUClass *ops =
>>          VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>      HostIOMMUDevice *hiod = NULL;
>> +    Error *local_err = NULL;
>>
>>      if (vbasedev->iommufd) {
>>          ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> @@ -1571,6 +1589,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>          return false;
>>      }
>>
>> +    if (!vfio_device_check_address_space(vbasedev, &local_err)) {
>> +        warn_report_err(local_err);
>> +    }
>>      return true;
>>  }
>>
On Fri, Jan 31, 2025 at 01:42:31PM +0100, Cédric Le Goater wrote:
> + Gerd for insights regarding vIOMMU support in edk2.
>
> > > +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
> > > +{
> > > +    uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
> > > +    uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
> > > +
> > > +    if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
> >
> > Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
> > than the vCPU physical address width?

Placing the 64-bit mmio window is done by the guest firmware,
so this isn't something qemu can check before boot.

> > However, we've decided to do exactly that for the 64-bit MMIO window.
> > It's not that the vCPU width being greater than the IOMMU width is a
> > fundamental problem, it's that we've chosen a 64-bit MMIO policy that
> > makes this become a problem and QEMU only has a convenient mechanism to
> > check the host IOMMU width when a vfio device is present. Arguably,
> > when a vIOMMU is present guest firmware should be enlightened to
> > understand the address width of that vIOMMU when placing the 64-bit
> > MMIO window. I'd say the failure to do so is a current firmware bug.

edk2 has no iommu driver ...

Is there some simple way to figure what the iommu width is (inside the
guest)?

Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
hint for the guest what the usable address width is. It was added
because there are cases where the guest simply can't figure that it is
not possible to use the full physical address space of the cpu. There
are some non-obvious limitations around 5-level paging. Intel has some
CPUs with phys-bits > 48 but only 4-level EPT for example.

So one option to handle this is to make sure guest-phys-bits is not
larger than the iommu width.

take care,
  Gerd
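Gerd's clamp reduces to a few lines. In this standalone sketch the
struct only mimics the x86 'phys-bits'/'guest-phys-bits' properties
(with 0 meaning guest-phys-bits is unset); it is not QEMU's actual data
layout or implementation.

#include <stdint.h>
#include <stdio.h>

struct cpu_cfg {
    uint32_t phys_bits;        /* CPUID-reported physical address width */
    uint32_t guest_phys_bits;  /* hint for the guest, 0 = unset */
};

static uint32_t usable_aw_bits(const struct cpu_cfg *cpu,
                               uint32_t iommu_aw_bits)
{
    /* Prefer the guest-visible hint when it is set, then clamp it to
     * what the IOMMU can actually translate. */
    uint32_t aw = cpu->guest_phys_bits ? cpu->guest_phys_bits
                                       : cpu->phys_bits;

    return aw < iommu_aw_bits ? aw : iommu_aw_bits;
}

int main(void)
{
    struct cpu_cfg cpu = { .phys_bits = 46, .guest_phys_bits = 0 };

    /* 39-bit host IOMMU, as on many consumer-grade parts. */
    printf("usable width: %u bits\n", usable_aw_bits(&cpu, 39));
    return 0;
}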
Hello Gerd,

On 1/31/25 14:23, Gerd Hoffmann wrote:
> On Fri, Jan 31, 2025 at 01:42:31PM +0100, Cédric Le Goater wrote:
>> + Gerd for insights regarding vIOMMU support in edk2.
>>
>>>> +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
>>>> +{
>>>> +    uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
>>>> +    uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
>>>> +
>>>> +    if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
>>>
>>> Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
>>> than the vCPU physical address width?
>
> Placing the 64-bit mmio window is done by the guest firmware,
> so this isn't something qemu can check before boot.
>
>>> However, we've decided to do exactly that for the 64-bit MMIO window.
>>> It's not that the vCPU width being greater than the IOMMU width is a
>>> fundamental problem, it's that we've chosen a 64-bit MMIO policy that
>>> makes this become a problem and QEMU only has a convenient mechanism to
>>> check the host IOMMU width when a vfio device is present. Arguably,
>>> when a vIOMMU is present guest firmware should be enlightened to
>>> understand the address width of that vIOMMU when placing the 64-bit
>>> MMIO window. I'd say the failure to do so is a current firmware bug.
>
> edk2 has no iommu driver ...
>
> Is there some simple way to figure what the iommu width is (inside the
> guest)?

When using VFIO, vfio_device_get_aw_bits() will return the IOMMU
address space width. There are checks when using a vIOMMU that the
host and vIOMMU address space are compatible. vIOMMU should be smaller.

> Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
> hint for the guest what the usable address width is. It was added
> because there are cases where the guest simply can't figure that it is
> not possible to use the full physical address space of the cpu. There
> are some non-obvious limitations around 5-level paging. Intel has some
> CPUs with phys-bits > 48 but only 4-level EPT for example.
>
> So one option to handle this is to make sure guest-phys-bits is not
> larger than the iommu width.

Yes. This is what I am trying to do.

Patch [1] returns X86_CPU(cs)->phys_bits. I was not sure which *phys*
property to use. If you think this is incorrect and not returning the
right information, I will change the proposal with guest-phys-bits.

Thanks,

C.

[1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-8-clg@redhat.com/
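The vIOMMU/host compatibility rule Cédric mentions amounts to a one-line
comparison. This standalone sketch uses made-up names, not QEMU's
actual check.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool viommu_width_compatible(uint32_t viommu_aw_bits,
                                    uint32_t host_iommu_aw_bits)
{
    /* A vIOMMU must not claim a wider address space than the host
     * IOMMU backing the vfio device can translate, or guest DMA
     * programming could exceed what the host can map. */
    return viommu_aw_bits <= host_iommu_aw_bits;
}

int main(void)
{
    /* A 48-bit vIOMMU over a 39-bit host IOMMU should be rejected. */
    printf("48 over 39: %s\n",
           viommu_width_compatible(48, 39) ? "ok" : "reject");
    printf("39 over 39: %s\n",
           viommu_width_compatible(39, 39) ? "ok" : "reject");
    return 0;
}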
On Fri, 31 Jan 2025 14:23:58 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:

> On Fri, Jan 31, 2025 at 01:42:31PM +0100, Cédric Le Goater wrote:
> > + Gerd for insights regarding vIOMMU support in edk2.
> >
> > > > +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
> > > > +{
> > > > +    uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
> > > > +    uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
> > > > +
> > > > +    if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
> > >
> > > Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
> > > than the vCPU physical address width?
>
> Placing the 64-bit mmio window is done by the guest firmware,
> so this isn't something qemu can check before boot.
>
> > > However, we've decided to do exactly that for the 64-bit MMIO window.
> > > It's not that the vCPU width being greater than the IOMMU width is a
> > > fundamental problem, it's that we've chosen a 64-bit MMIO policy that
> > > makes this become a problem and QEMU only has a convenient mechanism to
> > > check the host IOMMU width when a vfio device is present. Arguably,
> > > when a vIOMMU is present guest firmware should be enlightened to
> > > understand the address width of that vIOMMU when placing the 64-bit
> > > MMIO window. I'd say the failure to do so is a current firmware bug.
>
> edk2 has no iommu driver ...
>
> Is there some simple way to figure what the iommu width is (inside the
> guest)?

If the guest firmware is exposing a DMAR table (VT-d), there's a host
address width field in that table. Otherwise there are capability
registers on the DRHD units that could be queried. AMD-Vi seems to
have similar information in the IVinfo table linked from the IVRS
table. Maybe an iterative solution is ok, starting with the most
common IOMMU types and assuming others are 64-bits wide until proven
otherwise.

> Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
> hint for the guest what the usable address width is. It was added
> because there are cases where the guest simply can't figure that it is
> not possible to use the full physical address space of the cpu. There
> are some non-obvious limitations around 5-level paging. Intel has some
> CPUs with phys-bits > 48 but only 4-level EPT for example.
>
> So one option to handle this is to make sure guest-phys-bits is not
> larger than the iommu width.

Right, as Cédric notes that's sort of what happens here, but my concern
was that we're kind of incorrectly linking the cpu address width to
the platform address width, which isn't specifically the requirement.
It's valid to have CPU physical address width greater than IOMMU
address width, that exists on bare metal, but guest firmware needs to
take IOMMU address width into account in placing the 64-bit MMIO
window if we want PCI BARs to be DMA addressable. Thanks,

Alex
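A sketch of the DMAR route Alex describes: in the ACPI DMAR table, the
Host Address Width byte sits right after the standard 36-byte ACPI
header and encodes the width minus one. The buffer below is fabricated
for illustration; real firmware would locate the table via the
RSDT/XSDT.

#include <stdint.h>
#include <stdio.h>

#define ACPI_HDR_LEN 36u  /* common ACPI system description table header */

static uint32_t dmar_host_address_width(const uint8_t *dmar)
{
    /* Host Address Width is stored as (width - 1). */
    return (uint32_t)dmar[ACPI_HDR_LEN] + 1;
}

int main(void)
{
    uint8_t dmar[ACPI_HDR_LEN + 12] = { 'D', 'M', 'A', 'R' };

    dmar[ACPI_HDR_LEN] = 38;  /* encodes a 39-bit IOMMU */
    printf("DMAR host address width: %u bits\n",
           dmar_host_address_width(dmar));
    return 0;
}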
> > Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
> > hint for the guest what the usable address width is. It was added
> > because there are cases where the guest simply can't figure that it is
> > not possible to use the full physical address space of the cpu. There
> > are some non-obvious limitations around 5-level paging. Intel has some
> > CPUs with phys-bits > 48 but only 4-level EPT for example.
> >
> > So one option to handle this is to make sure guest-phys-bits is not
> > larger than the iommu width.
>
> Yes. This is what I am trying to do.
>
> Patch [1] returns X86_CPU(cs)->phys_bits. I was not sure which *phys*
> property to use. If you think this is incorrect and not returning the
> right information, I will change the proposal with guest-phys-bits.

> [1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-8-clg@redhat.com/

Yes, guest-phys-bits should be used here (and the helpers renamed too
for consistency, to avoid making all this more complex than it already
is).

take care,
  Gerd
Hi,

> > Is there some simple way to figure what the iommu width is (inside the
> > guest)?
>
> If the guest firmware is exposing a DMAR table (VT-d), there's a host
> address width field in that table. Otherwise there are capability
> registers on the DRHD units that could be queried. AMD-Vi seems to
> have similar information in the IVinfo table linked from the IVRS
> table. Maybe an iterative solution is ok, starting with the most
> common IOMMU types and assuming others are 64-bits wide until proven
> otherwise.

Hmm. The guest firmware simply exposes the tables it gets from qemu
(without parsing it). Also the acpi tables are loaded after the
firmware created the address space layout, because qemu adjusts them
according to the programming done by the firmware (set pci bus ranges
according to bridge windows etc).

So checking ACPI tables for that information doesn't look very
attractive. The firmware would have to load the tables twice (once
early to parse DMAR, once late for the final tables).

Going for the capability registers might be possible. Can the hardware
be probed for safely? Has AMD capability registers too?

Third option would be using another channel to communicate the
information from qemu to firmware, where using 'guest-phys-bits' would
be one candidate.

> Right, as Cédric notes that's sort of what happens here, but my concern
> was that we're kind of incorrectly linking the cpu address width to
> the platform address width, which isn't specifically the requirement.

There is a separate 'guest-phys-bits' property for that reason.
The phys-bits of the CPU are not changed.

take care,
  Gerd
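For the capability-register option, VT-d's 64-bit Capability Register
lives at offset 8 of a DRHD unit's register page (QEMU's Q35 vIOMMU
conventionally sits at 0xfed90000), and bits 21:16 (MGAW) encode the
maximum guest address width minus one. The register value in this
sketch is simulated rather than read via MMIO, and AMD-Vi would need
its own decoding.

#include <stdint.h>
#include <stdio.h>

#define VTD_CAP_REG_OFFSET 0x08u  /* CAP_REG within the DRHD page */

static uint32_t vtd_mgaw_bits(uint64_t cap_reg)
{
    /* MGAW is bits 21:16 of CAP_REG, stored as (width - 1). */
    return (uint32_t)((cap_reg >> 16) & 0x3f) + 1;
}

int main(void)
{
    /* Simulated CAP_REG with MGAW = 38, i.e. a 39-bit address width. */
    uint64_t cap_reg = 38ULL << 16;

    printf("VT-d MGAW: %u bits\n", vtd_mgaw_bits(cap_reg));
    return 0;
}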
On 2/6/25 08:54, Gerd Hoffmann wrote:
>>> Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
>>> hint for the guest what the usable address width is. It was added
>>> because there are cases where the guest simply can't figure that it is
>>> not possible to use the full physical address space of the cpu. There
>>> are some non-obvious limitations around 5-level paging. Intel has some
>>> CPUs with phys-bits > 48 but only 4-level EPT for example.
>>>
>>> So one option to handle this is to make sure guest-phys-bits is not
>>> larger than the iommu width.
>>
>> Yes. This is what I am trying to do.
>>
>> Patch [1] returns X86_CPU(cs)->phys_bits. I was not sure which *phys*
>> property to use. If you think this is incorrect and not returning the
>> right information, I will change the proposal with guest-phys-bits.
>
>> [1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-8-clg@redhat.com/
>
> Yes, guest-phys-bits should be used here (and the helpers renamed too
> for consistency, to avoid making all this more complex than it already
> is).

yep. I will do that in a dedicated series.

Thanks

C.
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5c9d8657d746ce30af5ae8f9122101e086a61ef5..e5ef93c589b2bed68f790608868ec3c7779d4cb8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,6 +44,8 @@
 #include "migration/qemu-file.h"
 #include "system/tpm.h"
 
+#include "hw/core/cpu.h"
+
 VFIODeviceList vfio_device_list =
     QLIST_HEAD_INITIALIZER(vfio_device_list);
 static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
@@ -1546,12 +1548,28 @@ retry:
     return info;
 }
 
+static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
+{
+    uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
+    uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
+
+    if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
+        error_setg(errp, "Host physical address space (%u) is larger than "
+                   "the host IOMMU address space (%u).", cpu_aw_bits,
+                   iommu_aw_bits);
+        vfio_device_error_append(vbasedev, errp);
+        return false;
+    }
+    return true;
+}
+
 bool vfio_attach_device(char *name, VFIODevice *vbasedev,
                         AddressSpace *as, Error **errp)
 {
     const VFIOIOMMUClass *ops =
         VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
     HostIOMMUDevice *hiod = NULL;
+    Error *local_err = NULL;
 
     if (vbasedev->iommufd) {
         ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
@@ -1571,6 +1589,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
         return false;
     }
 
+    if (!vfio_device_check_address_space(vbasedev, &local_err)) {
+        warn_report_err(local_err);
+    }
     return true;
 }
Print a warning if IOMMU address space width is smaller than the
physical address width. In this case, PCI peer-to-peer transactions on
BARs are not supported and failures of device MMIO regions are to be
expected.

This can occur with the 39-bit IOMMU address space width as found on
consumer grade processors or when using a vIOMMU device with default
settings.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
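For reference, with this check in place a guest whose vCPU reports a
46-bit physical address width, attached to a device behind a 39-bit
host IOMMU, would warn at attach time along these lines (the numbers
are illustrative and the device detail appended by
vfio_device_error_append() is elided):

warning: Host physical address space (46) is larger than the host IOMMU address space (39).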