diff mbox series

vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF

Message ID 20230808145916.81657-1-lersek@redhat.com (mailing list archive)
State New, archived
Headers show
Series vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF | expand

Commit Message

Laszlo Ersek Aug. 8, 2023, 2:59 p.m. UTC
The Solarflare Communications SFC9220 NIC's physical function (PF) appears
to expose an expansion ROM with the following characteristics:

(1) Single-image ROM, with only a legacy BIOS image (no UEFI driver).
Alex's rom-parser utility dumps it like this:

> Valid ROM signature found @0h, PCIR offset 20h
>         PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002
>         PCIR: revision 3, vendor revision: 1
>         Last image

(2) The BIOS image crashes when booted on i440fx.

(3) The BIOS image prints the following messages on q35:

> Solarflare Boot Manager (v5.2.2.1006)
> Solarflare Communications 2008-2019
> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...]

So it appears like a modified derivative of old gPXE.

Alex surmised in advance that the BIOS image could be accessing
host-physical addresses rather than guest-phys ones, leading to the crash
on i440fx.

Don't expose the option ROM BAR to the VM by default. While this prevents
netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario),
it does not make any difference for UEFI, and at least the VM doesn't
crash during boot on i440fx/SeaBIOS (a relatively frequent scenario).
Users can restore the original behavior via the QEMU cmdline and the
libvirt domain XML.

(In two years, we've not seen any customer interest in this bug, hence
there's no incentive to investigate (2).)

Cc: Alex Williamson <alex.williamson@redhat.com> (supporter:VFIO)
Cc: "Cédric Le Goater" <clg@redhat.com> (supporter:VFIO)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 hw/vfio/pci-quirks.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Alex Williamson Aug. 8, 2023, 3:40 p.m. UTC | #1
On Tue,  8 Aug 2023 16:59:16 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> The Solarflare Communications SFC9220 NIC's physical function (PF) appears
> to expose an expansion ROM with the following characteristics:
> 
> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver).
> Alex's rom-parser utility dumps it like this:
> 
> > Valid ROM signature found @0h, PCIR offset 20h
> >         PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002
> >         PCIR: revision 3, vendor revision: 1
> >         Last image  
> 
> (2) The BIOS image crashes when booted on i440fx.
> 
> (3) The BIOS image prints the following messages on q35:
> 
> > Solarflare Boot Manager (v5.2.2.1006)
> > Solarflare Communications 2008-2019
> > gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...]  
> 
> So it appears like a modified derivative of old gPXE.
> 
> Alex surmised in advance that the BIOS image could be accessing
> host-physical addresses rather than guest-phys ones, leading to the crash
> on i440fx.

ROMs sometimes take shortcuts around the standard interfaces to the
device and can therefore hit gaps in the virtualization, which is why
that's suspect to me.  However if it works on q35 but not 440fx it
might be more that we're not matching a PCI topology expectation of the
ROM.  Was it only tested on 440fx attached to the root bus or does it
also fail if the PF is attached downstream of a PCI-to-PCI bridge?

> Don't expose the option ROM BAR to the VM by default. While this prevents
> netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario),
> it does not make any difference for UEFI, and at least the VM doesn't
> crash during boot on i440fx/SeaBIOS (a relatively frequent scenario).
> Users can restore the original behavior via the QEMU cmdline and the
> libvirt domain XML.
> 
> (In two years, we've not seen any customer interest in this bug, hence
> there's no incentive to investigate (2).)
> 
> Cc: Alex Williamson <alex.williamson@redhat.com> (supporter:VFIO)
> Cc: "Cédric Le Goater" <clg@redhat.com> (supporter:VFIO)
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  hw/vfio/pci-quirks.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index f4ff83680572..270eb16b91fa 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -45,6 +45,10 @@ static const struct {
>      uint32_t device;
>  } rom_denylist[] = {
>      { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */
> +    { 0x1924, 0x0a03 }, /* Solarflare Communications
> +                         * SFC9220 10/40G Ethernet Controller
> +                         * https://bugzilla.redhat.com/show_bug.cgi?id=1975776

Unfortunately this is not a public bz so there's not much point in
referencing it in public code or commit log :-\  Thanks,

Alex

> +                         */
>  };
>  
>  bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)
Laszlo Ersek Aug. 9, 2023, 9:07 a.m. UTC | #2
On 8/8/23 17:40, Alex Williamson wrote:
> On Tue,  8 Aug 2023 16:59:16 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> The Solarflare Communications SFC9220 NIC's physical function (PF) appears
>> to expose an expansion ROM with the following characteristics:
>>
>> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver).
>> Alex's rom-parser utility dumps it like this:
>>
>>> Valid ROM signature found @0h, PCIR offset 20h
>>>         PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002
>>>         PCIR: revision 3, vendor revision: 1
>>>         Last image  
>>
>> (2) The BIOS image crashes when booted on i440fx.
>>
>> (3) The BIOS image prints the following messages on q35:
>>
>>> Solarflare Boot Manager (v5.2.2.1006)
>>> Solarflare Communications 2008-2019
>>> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...]  
>>
>> So it appears like a modified derivative of old gPXE.
>>
>> Alex surmised in advance that the BIOS image could be accessing
>> host-physical addresses rather than guest-phys ones, leading to the crash
>> on i440fx.
> 
> ROMs sometimes take shortcuts around the standard interfaces to the
> device and can therefore hit gaps in the virtualization, which is why
> that's suspect to me.  However if it works on q35 but not 440fx it
> might be more that we're not matching a PCI topology expectation of the
> ROM.  Was it only tested on 440fx attached to the root bus or does it
> also fail if the PF is attached downstream of a PCI-to-PCI bridge?

I don't know; I'll need to test both of these setups then.

> 
>> Don't expose the option ROM BAR to the VM by default. While this prevents
>> netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario),
>> it does not make any difference for UEFI, and at least the VM doesn't
>> crash during boot on i440fx/SeaBIOS (a relatively frequent scenario).
>> Users can restore the original behavior via the QEMU cmdline and the
>> libvirt domain XML.
>>
>> (In two years, we've not seen any customer interest in this bug, hence
>> there's no incentive to investigate (2).)
>>
>> Cc: Alex Williamson <alex.williamson@redhat.com> (supporter:VFIO)
>> Cc: "Cédric Le Goater" <clg@redhat.com> (supporter:VFIO)
>> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>> ---
>>  hw/vfio/pci-quirks.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index f4ff83680572..270eb16b91fa 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -45,6 +45,10 @@ static const struct {
>>      uint32_t device;
>>  } rom_denylist[] = {
>>      { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */
>> +    { 0x1924, 0x0a03 }, /* Solarflare Communications
>> +                         * SFC9220 10/40G Ethernet Controller
>> +                         * https://bugzilla.redhat.com/show_bug.cgi?id=1975776
> 
> Unfortunately this is not a public bz so there's not much point in
> referencing it in public code or commit log :-\  Thanks,

The BZ is not public because it was originally (mis-)filed for the RH
kernel, and those BZs are private by default. I'd corrected the BZ
component yesterday, but didn't realize I should relax the BZ's
visibility. I'll do that now. (That's the right thing to do regardless
of whether this patch gets in or not.)

Thanks!
Laszlo

> 
> Alex
> 
>> +                         */
>>  };
>>  
>>  bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)
>
Laszlo Ersek Aug. 9, 2023, 12:07 p.m. UTC | #3
On 8/8/23 17:40, Alex Williamson wrote:
> On Tue,  8 Aug 2023 16:59:16 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> The Solarflare Communications SFC9220 NIC's physical function (PF) appears
>> to expose an expansion ROM with the following characteristics:
>>
>> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver).
>> Alex's rom-parser utility dumps it like this:
>>
>>> Valid ROM signature found @0h, PCIR offset 20h
>>>         PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002
>>>         PCIR: revision 3, vendor revision: 1
>>>         Last image  
>>
>> (2) The BIOS image crashes when booted on i440fx.
>>
>> (3) The BIOS image prints the following messages on q35:
>>
>>> Solarflare Boot Manager (v5.2.2.1006)
>>> Solarflare Communications 2008-2019
>>> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...]  
>>
>> So it appears like a modified derivative of old gPXE.
>>
>> Alex surmised in advance that the BIOS image could be accessing
>> host-physical addresses rather than guest-phys ones, leading to the crash
>> on i440fx.
> 
> ROMs sometimes take shortcuts around the standard interfaces to the
> device and can therefore hit gaps in the virtualization, which is why
> that's suspect to me.  However if it works on q35 but not 440fx it
> might be more that we're not matching a PCI topology expectation of the
> ROM.  Was it only tested on 440fx attached to the root bus or does it
> also fail if the PF is attached downstream of a PCI-to-PCI bridge?

Turns out the oprom wants the NIC to have PCI device number 0,
regardless of the bus number, and regardless of the bus's location in
the PCI topology.

Please drop this patch; I've documented the workaround in the BZ for now
(which I've also opened up to the public).

We should probably find a more visible place for the documentation, though.

Thanks for pointing me in the right direction!
Laszlo

> 
>> Don't expose the option ROM BAR to the VM by default. While this prevents
>> netbooting the VM off the PF on q35/SeaBIOS (a relatively rare scenario),
>> it does not make any difference for UEFI, and at least the VM doesn't
>> crash during boot on i440fx/SeaBIOS (a relatively frequent scenario).
>> Users can restore the original behavior via the QEMU cmdline and the
>> libvirt domain XML.
>>
>> (In two years, we've not seen any customer interest in this bug, hence
>> there's no incentive to investigate (2).)
>>
>> Cc: Alex Williamson <alex.williamson@redhat.com> (supporter:VFIO)
>> Cc: "Cédric Le Goater" <clg@redhat.com> (supporter:VFIO)
>> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1975776
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>> ---
>>  hw/vfio/pci-quirks.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index f4ff83680572..270eb16b91fa 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -45,6 +45,10 @@ static const struct {
>>      uint32_t device;
>>  } rom_denylist[] = {
>>      { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */
>> +    { 0x1924, 0x0a03 }, /* Solarflare Communications
>> +                         * SFC9220 10/40G Ethernet Controller
>> +                         * https://bugzilla.redhat.com/show_bug.cgi?id=1975776
> 
> Unfortunately this is not a public bz so there's not much point in
> referencing it in public code or commit log :-\  Thanks,
> 
> Alex
> 
>> +                         */
>>  };
>>  
>>  bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)
>
Alex Williamson Aug. 9, 2023, 5:03 p.m. UTC | #4
On Wed, 9 Aug 2023 14:07:07 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 8/8/23 17:40, Alex Williamson wrote:
> > On Tue,  8 Aug 2023 16:59:16 +0200
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> The Solarflare Communications SFC9220 NIC's physical function (PF) appears
> >> to expose an expansion ROM with the following characteristics:
> >>
> >> (1) Single-image ROM, with only a legacy BIOS image (no UEFI driver).
> >> Alex's rom-parser utility dumps it like this:
> >>  
> >>> Valid ROM signature found @0h, PCIR offset 20h
> >>>         PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002
> >>>         PCIR: revision 3, vendor revision: 1
> >>>         Last image    
> >>
> >> (2) The BIOS image crashes when booted on i440fx.
> >>
> >> (3) The BIOS image prints the following messages on q35:
> >>  
> >>> Solarflare Boot Manager (v5.2.2.1006)
> >>> Solarflare Communications 2008-2019
> >>> gPXE (http://etherboot.org) - [...] PCI[...] PnP PMM[...]    
> >>
> >> So it appears like a modified derivative of old gPXE.
> >>
> >> Alex surmised in advance that the BIOS image could be accessing
> >> host-physical addresses rather than guest-phys ones, leading to the crash
> >> on i440fx.  
> > 
> > ROMs sometimes take shortcuts around the standard interfaces to the
> > device and can therefore hit gaps in the virtualization, which is why
> > that's suspect to me.  However if it works on q35 but not 440fx it
> > might be more that we're not matching a PCI topology expectation of the
> > ROM.  Was it only tested on 440fx attached to the root bus or does it
> > also fail if the PF is attached downstream of a PCI-to-PCI bridge?  
> 
> Turns out the oprom wants the NIC to have PCI device number 0,
> regardless of the bus number, and regardless of the bus's location in
> the PCI topology.
> 
> Please drop this patch; I've documented the workaround in the BZ for now
> (which I've also opened up to the public).
> 
> We should probably find a more visible place for the documentation, though.
> 
> Thanks for pointing me in the right direction!

Thanks for pursuing this, it's an interesting discovery and just the
sort of shortcut I'd expect to find in a ROM.  Documentation seems like
a reasonable solution to me, especially given that the ROM works in the
current recommended default configuration.

We could probably come up with a device quirk to flag an incompatible
configuration, but it's tricky to get the right balance since ROMs can
be updated and we're not identifying the ROM itself, we're only
essentially flagging that a bad ROM has been observed on a given device.

Given the lack of prevalence of this device for assignment versus the
VF use case, it's probably not worth the effort to do more.  Thanks,

Alex
diff mbox series

Patch

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f4ff83680572..270eb16b91fa 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -45,6 +45,10 @@  static const struct {
     uint32_t device;
 } rom_denylist[] = {
     { 0x14e4, 0x168e }, /* Broadcom BCM 57810 */
+    { 0x1924, 0x0a03 }, /* Solarflare Communications
+                         * SFC9220 10/40G Ethernet Controller
+                         * https://bugzilla.redhat.com/show_bug.cgi?id=1975776
+                         */
 };
 
 bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev)