Message ID | 20230808145916.81657-1-lersek@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF | expand |
On Tue, 8 Aug 2023 16:59:16 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > The Solarflare Communications SFC9220 NIC's physical function (PF) appears > to expose an expansion ROM with the following characteristics: > > (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver). > Alex's rom-parser utility dumps it like this: > > > Valid ROM signature found @0h, PCIR offset 20h > > PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002 > > PCIR: revision 3, vendor revision: 1 > > Last image > > (2) The BIOS image crashes when booted on i440fx. > > (3) The BIOS image prints the following messages on q35: > > > Solarflare Boot Manager (v5.2.2.1006) > > Solarflare Communications 2008-2019 > > gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...] > > So it appears like a modified derivative of old gPXE. > > Alex surmised in advance that the BIOS image could be accessing > host-physical addresses rather than guest-phys ones, leading to the crash > on i440fx. ROMs sometimes take shortcuts around the standard interfaces to the device and can therefore hit gaps in the virtualization, which is why that's suspect to me. However if it works on q35 but not 440fx it might be more that we're not matching a PCI topology expectation of the ROM. Was it only tested on 440fx attached to the root bus or does it also fail if the PF is attached downstream of a PCI-to-PCI bridge? > Don't expose the option ROM BAR to the VM by default. While this prevents > netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario), > it does not make any difference for UEFI, and at least the VM doesn't > crash during boot on i440fx/SeaBIOS (a relatively frequent scenario). > Users can restore the original behavior via the QEMU cmdline and the > libvirt domain XML. > > (In two years, we've not seen any customer interest in this bug, hence > there's no incentive to investigate (2).) > > Cc: Alex Williamson <alex.williamson@redhat.com> (supporter:VFIO) > Cc: "Cédric Le Goater" <clg@redhat.com> (supporter:VFIO) > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776 > Signed-off-by: Laszlo Ersek <lersek@redhat.com> > --- > hw/vfio/pci-quirks.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c > index f4ff83680572..270eb16b91fa 100644 > --- a/hw/vfio/pci-quirks.c > +++ b/hw/vfio/pci-quirks.c > @@ -45,6 +45,10 @@ static const struct { > uint32_t device; > } rom_denylist[] = { > { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */ > + { 0x1924, 0x0a03 }, /* Solarflare Communications > + * SFC9220 10/40G Ethernet Controller > + * https://bugzilla.redhat.com/show_bug.cgi?id=1975776 Unfortunately this is not a public bz so there's not much point in referencing it in public code or commit log :-\ Thanks, Alex > + */ > }; > > bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)
On 8/8/23 17:40, Alex Williamson wrote: > On Tue, 8 Aug 2023 16:59:16 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> The Solarflare Communications SFC9220 NIC's physical function (PF) appears >> to expose an expansion ROM with the following characteristics: >> >> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver). >> Alex's rom-parser utility dumps it like this: >> >>> Valid ROM signature found @0h, PCIR offset 20h >>> PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002 >>> PCIR: revision 3, vendor revision: 1 >>> Last image >> >> (2) The BIOS image crashes when booted on i440fx. >> >> (3) The BIOS image prints the following messages on q35: >> >>> Solarflare Boot Manager (v5.2.2.1006) >>> Solarflare Communications 2008-2019 >>> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...] >> >> So it appears like a modified derivative of old gPXE. >> >> Alex surmised in advance that the BIOS image could be accessing >> host-physical addresses rather than guest-phys ones, leading to the crash >> on i440fx. > > ROMs sometimes take shortcuts around the standard interfaces to the > device and can therefore hit gaps in the virtualization, which is why > that's suspect to me. However if it works on q35 but not 440fx it > might be more that we're not matching a PCI topology expectation of the > ROM. Was it only tested on 440fx attached to the root bus or does it > also fail if the PF is attached downstream of a PCI-to-PCI bridge? I don't know; I'll need to test both of these setups then. > >> Don't expose the option ROM BAR to the VM by default. While this prevents >> netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario), >> it does not make any difference for UEFI, and at least the VM doesn't >> crash during boot on i440fx/SeaBIOS (a relatively frequent scenario). >> Users can restore the original behavior via the QEMU cmdline and the >> libvirt domain XML. >> >> (In two years, we've not seen any customer interest in this bug, hence >> there's no incentive to investigate (2).) >> >> Cc: Alex Williamson <alex.williamson@redhat.com> (supporter:VFIO) >> Cc: "Cédric Le Goater" <clg@redhat.com> (supporter:VFIO) >> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776 >> Signed-off-by: Laszlo Ersek <lersek@redhat.com> >> --- >> hw/vfio/pci-quirks.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c >> index f4ff83680572..270eb16b91fa 100644 >> --- a/hw/vfio/pci-quirks.c >> +++ b/hw/vfio/pci-quirks.c >> @@ -45,6 +45,10 @@ static const struct { >> uint32_t device; >> } rom_denylist[] = { >> { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */ >> + { 0x1924, 0x0a03 }, /* Solarflare Communications >> + * SFC9220 10/40G Ethernet Controller >> + * https://bugzilla.redhat.com/show_bug.cgi?id=1975776 > > Unfortunately this is not a public bz so there's not much point in > referencing it in public code or commit log :-\ Thanks, The BZ is not public because it was originally (mis-)filed for the RH kernel, and those BZs are private by default. I'd corrected the BZ component yesterday, but didn't realize I should relax the BZ's visibility. I'll do that now. (That's the right thing to do regardless of whether this patch gets in or not.) Thanks! Laszlo > > Alex > >> + */ >> }; >> >> bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev) >
On 8/8/23 17:40, Alex Williamson wrote: > On Tue, 8 Aug 2023 16:59:16 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> The Solarflare Communications SFC9220 NIC's physical function (PF) appears >> to expose an expansion ROM with the following characteristics: >> >> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver). >> Alex's rom-parser utility dumps it like this: >> >>> Valid ROM signature found @0h, PCIR offset 20h >>> PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002 >>> PCIR: revision 3, vendor revision: 1 >>> Last image >> >> (2) The BIOS image crashes when booted on i440fx. >> >> (3) The BIOS image prints the following messages on q35: >> >>> Solarflare Boot Manager (v5.2.2.1006) >>> Solarflare Communications 2008-2019 >>> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...] >> >> So it appears like a modified derivative of old gPXE. >> >> Alex surmised in advance that the BIOS image could be accessing >> host-physical addresses rather than guest-phys ones, leading to the crash >> on i440fx. > > ROMs sometimes take shortcuts around the standard interfaces to the > device and can therefore hit gaps in the virtualization, which is why > that's suspect to me. However if it works on q35 but not 440fx it > might be more that we're not matching a PCI topology expectation of the > ROM. Was it only tested on 440fx attached to the root bus or does it > also fail if the PF is attached downstream of a PCI-to-PCI bridge? Turns out the oprom wants the NIC to have PCI device number 0, regardless of the bus number, and regardless of the bus's location in the PCI topology. Please drop this patch; I've documented the workaround in the BZ for now (which I've also opened up to the public). We should probably find a more visible place for the documentation, though. Thanks for pointing me in the right direction! Laszlo > >> Don't expose the option ROM BAR to the VM by default. While this prevents >> netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario), >> it does not make any difference for UEFI, and at least the VM doesn't >> crash during boot on i440fx/SeaBIOS (a relatively frequent scenario). >> Users can restore the original behavior via the QEMU cmdline and the >> libvirt domain XML. >> >> (In two years, we've not seen any customer interest in this bug, hence >> there's no incentive to investigate (2).) >> >> Cc: Alex Williamson <alex.williamson@redhat.com> (supporter:VFIO) >> Cc: "Cédric Le Goater" <clg@redhat.com> (supporter:VFIO) >> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776 >> Signed-off-by: Laszlo Ersek <lersek@redhat.com> >> --- >> hw/vfio/pci-quirks.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c >> index f4ff83680572..270eb16b91fa 100644 >> --- a/hw/vfio/pci-quirks.c >> +++ b/hw/vfio/pci-quirks.c >> @@ -45,6 +45,10 @@ static const struct { >> uint32_t device; >> } rom_denylist[] = { >> { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */ >> + { 0x1924, 0x0a03 }, /* Solarflare Communications >> + * SFC9220 10/40G Ethernet Controller >> + * https://bugzilla.redhat.com/show_bug.cgi?id=1975776 > > Unfortunately this is not a public bz so there's not much point in > referencing it in public code or commit log :-\ Thanks, > > Alex > >> + */ >> }; >> >> bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev) >
On Wed, 9 Aug 2023 14:07:07 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 8/8/23 17:40, Alex Williamson wrote: > > On Tue, 8 Aug 2023 16:59:16 +0200 > > Laszlo Ersek <lersek@redhat.com> wrote: > > > >> The Solarflare Communications SFC9220 NIC's physical function (PF) appears > >> to expose an expansion ROM with the following characteristics: > >> > >> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver). > >> Alex's rom-parser utility dumps it like this: > >> > >>> Valid ROM signature found @0h, PCIR offset 20h > >>> PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002 > >>> PCIR: revision 3, vendor revision: 1 > >>> Last image > >> > >> (2) The BIOS image crashes when booted on i440fx. > >> > >> (3) The BIOS image prints the following messages on q35: > >> > >>> Solarflare Boot Manager (v5.2.2.1006) > >>> Solarflare Communications 2008-2019 > >>> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...] > >> > >> So it appears like a modified derivative of old gPXE. > >> > >> Alex surmised in advance that the BIOS image could be accessing > >> host-physical addresses rather than guest-phys ones, leading to the crash > >> on i440fx. > > > > ROMs sometimes take shortcuts around the standard interfaces to the > > device and can therefore hit gaps in the virtualization, which is why > > that's suspect to me. However if it works on q35 but not 440fx it > > might be more that we're not matching a PCI topology expectation of the > > ROM. Was it only tested on 440fx attached to the root bus or does it > > also fail if the PF is attached downstream of a PCI-to-PCI bridge? > > Turns out the oprom wants the NIC to have PCI device number 0, > regardless of the bus number, and regardless of the bus's location in > the PCI topology. > > Please drop this patch; I've documented the workaround in the BZ for now > (which I've also opened up to the public). > > We should probably find a more visible place for the documentation, though. > > Thanks for pointing me in the right direction! Thanks for pursuing this, it's an interesting discovery and just the sort of shortcut I'd expect to find in a ROM. Documentation seems like a reasonable solution to me, especially given that the ROM works in the current recommended default configuration. We could probably come up with a device quirk to flag an incompatible configuration, but it's tricky to get the right balance since ROMs can be updated and we're not identifying the ROM itself, we're only essentially flagging that a bad ROM has been observed on a given device. Given the lack of prevalence of this device for assignment versus the VF use case, it's probably not worth the effort to do more. Thanks, Alex
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c index f4ff83680572..270eb16b91fa 100644 --- a/hw/vfio/pci-quirks.c +++ b/hw/vfio/pci-quirks.c @@ -45,6 +45,10 @@ static const struct { uint32_t device; } rom_denylist[] = { { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */ + { 0x1924, 0x0a03 }, /* Solarflare Communications + * SFC9220 10/40G Ethernet Controller + * https://bugzilla.redhat.com/show_bug.cgi?id=1975776 + */ }; bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)