mbox series

[v5,0/8] virtio-net: add support for SR-IOV emulation

Message ID 20240715-sriov-v5-0-3f5539093ffc@daynix.com (mailing list archive)
Headers show
Series virtio-net: add support for SR-IOV emulation | expand

Message

Akihiko Odaki July 15, 2024, 5:19 a.m. UTC
Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com>
("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")

Introduction
------------

This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices
in a VM. vDPA devices can also provide L2 switching feature for
offloading though it is out of scope to allow the guest to configure
such a feature.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because of lack of validation.

User Interface
--------------

A user can configure a SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
  -netdev user,id=n -netdev user,id=o
  -netdev user,id=p -netdev user,id=q
  -device pcie-root-port,id=b
  -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with "sriov-pf" property. The PF must be
added after all VFs. It is user's responsibility to ensure that VFs have
function numbers larger than one of the PF, and the function numbers
have a consistent stride.

Keeping VF instances
--------------------

A problem with SR-IOV emulation is that it needs to hotplug the VFs as
the guest requests. Previously, this behavior was implemented by
realizing and unrealizing VFs at runtime. However, this strategy does
not work well for the proposed virtio-net emulation; in this proposal,
device options passed in the command line must be maintained as VFs
are hotplugged, but they are consumed when the machine starts and not
available after that, which makes realizing VFs at runtime impossible.

As an strategy alternative to runtime realization/unrealization, this
series proposes to reuse the code to power down PCI Express devices.
When a PCI Express device is powered down, it will be hidden from the
guest but will be kept realized. This effectively implements the
behavior we need for the SR-IOV emulation.

Summary
-------

Patch 1 disables ROM BAR, which virtio-net-pci enables by default, for
VFs.
Patch 2 makes zero stride valid for 1 VF configuration.
Patch 3 and 4 adds validations.
Patch 5 adds user-created SR-IOV VF infrastructure.
Patch 6 makes virtio-pci work as SR-IOV PF for user-created VFs.
Patch 7 allows user to create SR-IOV VFs with virtio-net-pci.

[1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
[2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/

Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Changes in v5:
- Dropped the RFC tag.
- Fixed device unrealization.
- Rebased.
- Link to v4: https://lore.kernel.org/r/20240428-sriov-v4-0-ac8ac6212982@daynix.com

Changes in v4:
- Added patch "hw/pci: Fix SR-IOV VF number calculation" to fix division
  by zero reported by Yui Washizu.
- Rebased.
- Link to v3: https://lore.kernel.org/r/20240305-sriov-v3-0-abdb75770372@daynix.com

Changes in v3:
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com

Changes in v2:
- Changed to keep VF instances.
- Link to v1: https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7bd6@daynix.com

---
Akihiko Odaki (8):
      hw/pci: Do not add ROM BAR for SR-IOV VF
      hw/pci: Fix SR-IOV VF number calculation
      pcie_sriov: Ensure PF and VF are mutually exclusive
      pcie_sriov: Check PCI Express for SR-IOV PF
      pcie_sriov: Allow user to create SR-IOV device
      virtio-pci: Implement SR-IOV PF
      virtio-net: Implement SR-IOV VF
      docs: Document composable SR-IOV device

 MAINTAINERS                    |   1 +
 docs/system/index.rst          |   1 +
 docs/system/sriov.rst          |  36 +++++
 include/hw/pci/pci_device.h    |   6 +-
 include/hw/pci/pcie_sriov.h    |  18 +++
 include/hw/virtio/virtio-pci.h |   1 +
 hw/pci/pci.c                   |  76 +++++++----
 hw/pci/pcie_sriov.c            | 298 +++++++++++++++++++++++++++++++++--------
 hw/virtio/virtio-net-pci.c     |   1 +
 hw/virtio/virtio-pci.c         |  20 ++-
 10 files changed, 369 insertions(+), 89 deletions(-)
---
base-commit: e7b8390b9167be9272c2c959919d9b397842da7f
change-id: 20231202-sriov-9402fb262be8

Best regards,

Comments

Michael S. Tsirkin July 30, 2024, 11:37 a.m. UTC | #1
On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote:
> Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com>
> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")

OK I will revert this for now. We'll try again after the release,
there will be time to address s390.
Akihiko Odaki July 30, 2024, 12:26 p.m. UTC | #2
On 2024/07/30 20:37, Michael S. Tsirkin wrote:
> On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote:
>> Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com>
>> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
> 
> OK I will revert this for now. We'll try again after the release,
> there will be time to address s390.
> 

I'd like to know if anybody wants to use igb on a s390x machine 
configured with libvirt. Such a configuration is already kind of broken, 
and it is likely to require significant effort on both side of libvirt 
and QEMU to fix it.

As an alternative, I'm also introducing SR-IOV support to 
virtio-net-pci. It does not suffer the same problem with igb thanks to 
its flexible configuration mechanism.

Regards,
Akihiko Odaki
Michael S. Tsirkin July 30, 2024, 5:56 p.m. UTC | #3
On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote:
> On 2024/07/30 20:37, Michael S. Tsirkin wrote:
> > On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote:
> > > Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com>
> > > ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
> > 
> > OK I will revert this for now. We'll try again after the release,
> > there will be time to address s390.
> > 
> 
> I'd like to know if anybody wants to use igb on a s390x machine configured
> with libvirt. Such a configuration is already kind of broken, and it is
> likely to require significant effort on both side of libvirt and QEMU to fix
> it.


I assume Cédric wouldn't report it if was unused.


> As an alternative, I'm also introducing SR-IOV support to virtio-net-pci. It
> does not suffer the same problem with igb thanks to its flexible
> configuration mechanism.
> 
> Regards,
> Akihiko Odaki

Sounds like this needs more review time, anyway.
Cédric Le Goater July 31, 2024, 8:58 a.m. UTC | #4
On 7/30/24 19:56, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote:
>> On 2024/07/30 20:37, Michael S. Tsirkin wrote:
>>> On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote:
>>>> Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com>
>>>> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
>>>
>>> OK I will revert this for now. We'll try again after the release,
>>> there will be time to address s390.
>>>
>>
>> I'd like to know if anybody wants to use igb on a s390x machine configured
>> with libvirt. Such a configuration is already kind of broken, and it is
>> likely to require significant effort on both side of libvirt and QEMU to fix
>> it.
> 
> 
> I assume Cédric wouldn't report it if was unused.
> 
> 
>> As an alternative, I'm also introducing SR-IOV support to virtio-net-pci. It
>> does not suffer the same problem with igb thanks to its flexible
>> configuration mechanism.
>>
>> Regards,
>> Akihiko Odaki
> 
> Sounds like this needs more review time, anyway.
> 
> 

Using an emulated IGB device in a s390x VM is not a common scenario.
The IGB device is not supported downstream (RHEL on Z). However, the
change broke the s390x machines I use for upstream QEMU development,
I removed the IGB device as a fix for now.

The Z PCI device model has 'uid' and 'fid' properties which are set
by the VMM, and in this case, they are auto-generated, hence the
conflicting ids with libvirt. This is Z specific but, possibly there
are subtle use cases on other platforms which could have similar
consequences. Something to be aware of.


Also, and this is why I thought it was important to report. Creating
PCI devices at machine init time (with -nodefaults) in the back of the
management layer is a no-no. libvirt requests to have full control
on the machine topology and this change seems like a violation of
this rule, even if VFs are kind of special.

Daniel, did I understand correctly the above constraint on the machine
definition ?


I can't say if we should revert for 9.1. Again, this Z is specific.
The changes were introduced to catch errors early when the PF device
is realized. There is no doubt they are useful. Some rework is needed
to avoid the conflicting id issue though.


Thanks,

C.
Akihiko Odaki Aug. 1, 2024, 7:13 a.m. UTC | #5
On 2024/07/31 17:58, Cédric Le Goater wrote:
> On 7/30/24 19:56, Michael S. Tsirkin wrote:
>> On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote:
>>> On 2024/07/30 20:37, Michael S. Tsirkin wrote:
>>>> On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote:
>>>>> Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com>
>>>>> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
>>>>
>>>> OK I will revert this for now. We'll try again after the release,
>>>> there will be time to address s390.
>>>>
>>>
>>> I'd like to know if anybody wants to use igb on a s390x machine 
>>> configured
>>> with libvirt. Such a configuration is already kind of broken, and it is
>>> likely to require significant effort on both side of libvirt and QEMU 
>>> to fix
>>> it.
>>
>>
>> I assume Cédric wouldn't report it if was unused.
>>
>>
>>> As an alternative, I'm also introducing SR-IOV support to 
>>> virtio-net-pci. It
>>> does not suffer the same problem with igb thanks to its flexible
>>> configuration mechanism.
>>>
>>> Regards,
>>> Akihiko Odaki
>>
>> Sounds like this needs more review time, anyway.
>>
>>
> 
> Using an emulated IGB device in a s390x VM is not a common scenario.
> The IGB device is not supported downstream (RHEL on Z). However, the
> change broke the s390x machines I use for upstream QEMU development,
> I removed the IGB device as a fix for now.
> 
> The Z PCI device model has 'uid' and 'fid' properties which are set
> by the VMM, and in this case, they are auto-generated, hence the
> conflicting ids with libvirt. This is Z specific but, possibly there
> are subtle use cases on other platforms which could have similar
> consequences. Something to be aware of.
> 
> 
> Also, and this is why I thought it was important to report. Creating
> PCI devices at machine init time (with -nodefaults) in the back of the
> management layer is a no-no. libvirt requests to have full control
> on the machine topology and this change seems like a violation of
> this rule, even if VFs are kind of special.
> 
> Daniel, did I understand correctly the above constraint on the machine
> definition ?

It is problematic even if it is not init time. QEMU shouldn't implicitly 
add a PCI device without letting the management layer know.

For the virtio-net SR-IOV support which I have been preparing, I'm 
avoiding automatically adding PCI devices but instead letting the user 
create VFs. However, fixing existing SR-IOV devices (igb and nvme) will 
require more efforts and possible breaking changes of the command line.

> 
> 
> I can't say if we should revert for 9.1. Again, this Z is specific.
> The changes were introduced to catch errors early when the PF device
> is realized. There is no doubt they are useful. Some rework is needed
> to avoid the conflicting id issue though.

I think it is a good idea to revert these patches for now, and add them 
back along with the virtio-net-pci SR-IOV support when the next merge 
window starts though I don't require to do so. If someone experiences 
this problem in the next merge window, we can advise to use 
virtio-net-pci, or to specify qemu:commandline to workaround it. We can 
start thinking of making a breaking change if neither of them satisfies 
the requirement.

Regards,
Akihiko Odaki
Michael S. Tsirkin Aug. 1, 2024, 7:51 a.m. UTC | #6
On Thu, Aug 01, 2024 at 04:13:14PM +0900, Akihiko Odaki wrote:
> I think it is a good idea to revert these patches for now

OK I reverted the 2 patchsets. there were some bugfixes there
but I had to revert them too due to the dependency.
If appropriate, feel free to resubmit just the fixes.