diff mbox series

[PATCH/RFC,net-next,1/2] devlink: expose port function commands to assign VFs to multiple netdevs

Message ID 20230206153603.2801791-2-simon.horman@corigine.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series devlink: expose port function commands to assign VFs to multiple devlink | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 383 this patch: 383
netdev/cc_maintainers warning 4 maintainers not CCed: corbet@lwn.net jiri@nvidia.com linux-doc@vger.kernel.org edumazet@google.com
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 534 this patch: 534
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 14 this patch: 14
netdev/source_inline success Was 0 now: 0

Commit Message

Simon Horman Feb. 6, 2023, 3:36 p.m. UTC
From: Fei Qin <fei.qin@corigine.com>

Multiple physical ports of the same NIC may share the single
PCI address. In some cases, assigning VFs to different physical
ports can be demanded, especially under high-traffic scenario.
Load balancing can be realized in virtualised use¬cases through
distributing packets between different physical ports with LAGs
of VFs which are assigned to those physical ports.

This patch adds new attribute "vf_count" to 'devlink port function'
API which only can be shown and configured under devlink ports
with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL".

e.g.
$ devlink port show pci/0000:82:00.0/0
pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical
port 0 splittable true lanes 4
    function:
       vf_count 0

$ devlink port function set pci/0000:82:00.0/0 vf_count 3

$ devlink port show pci/0000:82:00.0/0
pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical
port 0 splittable true lanes 4
    function:
       vf_count 3

Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
---
 .../networking/devlink/devlink-port.rst       | 24 +++++++
 include/net/devlink.h                         | 21 ++++++
 include/uapi/linux/devlink.h                  |  1 +
 net/devlink/leftover.c                        | 65 +++++++++++++++++++
 4 files changed, 111 insertions(+)

Comments

Jakub Kicinski Feb. 7, 2023, 2:42 a.m. UTC | #1
On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
> +VF assignment setup
> +---------------------------
> +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
> +different ports.

Please make sure you run make htmldocs when changing docs,
this will warn.

> +- Get count of VFs assigned to physical port::
> +
> +    $ devlink port show pci/0000:82:00.0/0
> +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4

Physical port has VFs? My knee jerk reaction is that allocating
resources via devlink is fine but this seems to lean a bit into
forwarding. How do other vendors do it? What's the mapping of VFs
to ports?

What do you suggest should happen when user enables switchdev mode?
Simon Horman Feb. 8, 2023, 10:38 a.m. UTC | #2
On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
> On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
> > +VF assignment setup
> > +---------------------------
> > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
> > +different ports.
> 
> Please make sure you run make htmldocs when changing docs,
> this will warn.

Thanks, will do.

> > +- Get count of VFs assigned to physical port::
> > +
> > +    $ devlink port show pci/0000:82:00.0/0
> > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
> 
> Physical port has VFs? My knee jerk reaction is that allocating
> resources via devlink is fine but this seems to lean a bit into
> forwarding. How do other vendors do it? What's the mapping of VFs
> to ports?

We are not aware of any non-vendor-specific mechanism.
If there is one we'd gladly consider it.

> What do you suggest should happen when user enables switchdev mode?

Thanks for pointing this out. It ought to be documented.

Prior to this patch-set, for all NFP application firmwares supported by
upstream (and I'm not implying there is anything sneaky elsewhere - I am
honestly not aware of any such things), the embedded switch on the NIC
(which as you know is software running on the NIC), allows forwarding of
packets between VFs and physical ports without any partitioning - other
than what policy may implement.

This remains the behaviour if the feature proposed by this patch-set is not
enabled, either because it is unsupported by the application firmware, or
has not been enabled, i.e. using devlink.

If the feature is enabled, then in effect there is a partition between
VFs on one physical port and those on another. And some sort of forwarding
in software (on the host) is required. It is my understanding that
this is the dominant behaviour amongst multi-port NICs from other vendors
which, by default (and perhaps can only), associate VFs with specific
physical ports.

This is my understanding of things.
And I believe this applies in both switchdev and legacy mode.
Leon Romanovsky Feb. 8, 2023, 11:21 a.m. UTC | #3
On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
> On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
> > +VF assignment setup
> > +---------------------------
> > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
> > +different ports.
> 
> Please make sure you run make htmldocs when changing docs,
> this will warn.
> 
> > +- Get count of VFs assigned to physical port::
> > +
> > +    $ devlink port show pci/0000:82:00.0/0
> > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
> 
> Physical port has VFs? My knee jerk reaction is that allocating
> resources via devlink is fine but this seems to lean a bit into
> forwarding. How do other vendors do it? What's the mapping of VFs
> to ports?

I don't understand the meaning of VFs here. If we are talking about PCI
VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which
talks about having one bit to enable all VFs at once. All these VFs will
have separate netdevs.

> 
> What do you suggest should happen when user enables switchdev mode?
>
Simon Horman Feb. 8, 2023, 11:36 a.m. UTC | #4
On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote:
> On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
> > On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
> > > +VF assignment setup
> > > +---------------------------
> > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
> > > +different ports.
> > 
> > Please make sure you run make htmldocs when changing docs,
> > this will warn.
> > 
> > > +- Get count of VFs assigned to physical port::
> > > +
> > > +    $ devlink port show pci/0000:82:00.0/0
> > > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
> > 
> > Physical port has VFs? My knee jerk reaction is that allocating
> > resources via devlink is fine but this seems to lean a bit into
> > forwarding. How do other vendors do it? What's the mapping of VFs
> > to ports?
> 
> I don't understand the meaning of VFs here. If we are talking about PCI
> VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which
> talks about having one bit to enable all VFs at once. All these VFs will
> have separate netdevs.

Yes, that is the case here too (before and after).

What we are talking about is the association of VFs to physical ports
(in the case where a NIC has more than one physical port).
Jiri Pirko Feb. 8, 2023, 11:40 a.m. UTC | #5
Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote:
>From: Fei Qin <fei.qin@corigine.com>
>
>Multiple physical ports of the same NIC may share the single
>PCI address. In some cases, assigning VFs to different physical
>ports can be demanded, especially under high-traffic scenario.
>Load balancing can be realized in virtualised use¬cases through
>distributing packets between different physical ports with LAGs
>of VFs which are assigned to those physical ports.
>
>This patch adds new attribute "vf_count" to 'devlink port function'
>API which only can be shown and configured under devlink ports
>with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL".

I have to be missing something. That is the meaning of "assigning VF"
to a physical port? Why there should be any relationship between
physical port and VF other than configured forwarding (using TC for
example)?

This seems very wrong. Preliminary NAK.


>
>e.g.
>$ devlink port show pci/0000:82:00.0/0
>pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical
>port 0 splittable true lanes 4
>    function:
>       vf_count 0
>
>$ devlink port function set pci/0000:82:00.0/0 vf_count 3
>
>$ devlink port show pci/0000:82:00.0/0
>pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical
>port 0 splittable true lanes 4
>    function:
>       vf_count 3
>
>Signed-off-by: Fei Qin <fei.qin@corigine.com>
>Signed-off-by: Simon Horman <simon.horman@corigine.com>
Jiri Pirko Feb. 8, 2023, 11:41 a.m. UTC | #6
Wed, Feb 08, 2023 at 12:36:53PM CET, simon.horman@corigine.com wrote:
>On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote:
>> On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
>> > On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
>> > > +VF assignment setup
>> > > +---------------------------
>> > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
>> > > +different ports.
>> > 
>> > Please make sure you run make htmldocs when changing docs,
>> > this will warn.
>> > 
>> > > +- Get count of VFs assigned to physical port::
>> > > +
>> > > +    $ devlink port show pci/0000:82:00.0/0
>> > > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
>> > 
>> > Physical port has VFs? My knee jerk reaction is that allocating
>> > resources via devlink is fine but this seems to lean a bit into
>> > forwarding. How do other vendors do it? What's the mapping of VFs
>> > to ports?
>> 
>> I don't understand the meaning of VFs here. If we are talking about PCI
>> VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which
>> talks about having one bit to enable all VFs at once. All these VFs will
>> have separate netdevs.
>
>Yes, that is the case here too (before and after).
>
>What we are talking about is the association of VFs to physical ports
>(in the case where a NIC has more than one physical port).

What is "the association"?
Leon Romanovsky Feb. 8, 2023, 11:53 a.m. UTC | #7
On Wed, Feb 08, 2023 at 12:36:53PM +0100, Simon Horman wrote:
> On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote:
> > On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
> > > On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
> > > > +VF assignment setup
> > > > +---------------------------
> > > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
> > > > +different ports.
> > > 
> > > Please make sure you run make htmldocs when changing docs,
> > > this will warn.
> > > 
> > > > +- Get count of VFs assigned to physical port::
> > > > +
> > > > +    $ devlink port show pci/0000:82:00.0/0
> > > > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
> > > 
> > > Physical port has VFs? My knee jerk reaction is that allocating
> > > resources via devlink is fine but this seems to lean a bit into
> > > forwarding. How do other vendors do it? What's the mapping of VFs
> > > to ports?
> > 
> > I don't understand the meaning of VFs here. If we are talking about PCI
> > VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which
> > talks about having one bit to enable all VFs at once. All these VFs will
> > have separate netdevs.
> 
> Yes, that is the case here too (before and after).
> 
> What we are talking about is the association of VFs to physical ports
> (in the case where a NIC has more than one physical port).

We have devices with multiple ports too, but don't have such issues.
So it will help if you can provide more context here.

I'm failing to see connection between physical ports and physical VFs.

Are you saying that physical ports are actual PCI VFs, which spans L2 VFs,
which you want to assign to another port (PF)?

Thanks
Simon Horman Feb. 8, 2023, 12:05 p.m. UTC | #8
On Wed, Feb 08, 2023 at 01:53:48PM +0200, Leon Romanovsky wrote:
> On Wed, Feb 08, 2023 at 12:36:53PM +0100, Simon Horman wrote:
> > On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote:
> > > On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
> > > > On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
> > > > > +VF assignment setup
> > > > > +---------------------------
> > > > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
> > > > > +different ports.
> > > > 
> > > > Please make sure you run make htmldocs when changing docs,
> > > > this will warn.
> > > > 
> > > > > +- Get count of VFs assigned to physical port::
> > > > > +
> > > > > +    $ devlink port show pci/0000:82:00.0/0
> > > > > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
> > > > 
> > > > Physical port has VFs? My knee jerk reaction is that allocating
> > > > resources via devlink is fine but this seems to lean a bit into
> > > > forwarding. How do other vendors do it? What's the mapping of VFs
> > > > to ports?
> > > 
> > > I don't understand the meaning of VFs here. If we are talking about PCI
> > > VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which
> > > talks about having one bit to enable all VFs at once. All these VFs will
> > > have separate netdevs.
> > 
> > Yes, that is the case here too (before and after).
> > 
> > What we are talking about is the association of VFs to physical ports
> > (in the case where a NIC has more than one physical port).
> 
> We have devices with multiple ports too, but don't have such issues.
> So it will help if you can provide more context here.
> 
> I'm failing to see connection between physical ports and physical VFs.
> 
> Are you saying that physical ports are actual PCI VFs, which spans L2 VFs,
> which you want to assign to another port (PF)?

No, a physical port is not a VF (nor a PF, FWIIW).

The topic here is about two modes of behaviour.

One, where VFs are associated with physical ports - conceptually one might
think of this as some VFs and one phys port being plugged into one VEB,
while other VFs and another phys port are plugged into another VEB.

I believe this is the mode on most multi-port NICs.

And another mode where all VFs are associated with one physical port,
even if more than one is present. The NFP currently implements this model.

This patch is about allowing NICs, in particular the NFP based NICs,
to switch modes.
Simon Horman Feb. 8, 2023, 12:07 p.m. UTC | #9
On Wed, Feb 08, 2023 at 12:40:45PM +0100, Jiri Pirko wrote:
> Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote:
> >From: Fei Qin <fei.qin@corigine.com>
> >
> >Multiple physical ports of the same NIC may share the single
> >PCI address. In some cases, assigning VFs to different physical
> >ports can be demanded, especially under high-traffic scenario.
> >Load balancing can be realized in virtualised use¬cases through
> >distributing packets between different physical ports with LAGs
> >of VFs which are assigned to those physical ports.
> >
> >This patch adds new attribute "vf_count" to 'devlink port function'
> >API which only can be shown and configured under devlink ports
> >with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL".
> 
> I have to be missing something. That is the meaning of "assigning VF"
> to a physical port? Why there should be any relationship between
> physical port and VF other than configured forwarding (using TC for
> example)?
> 
> This seems very wrong. Preliminary NAK.

Of course if TC is involved, then we have flexibility.

What we are talking about here is primarily legacy mode.
And the behaviour described would, when enabled allow NFP based NICs
to behave more like most other multi-port NICs.

That is, we can envisage a VEB with some VFs and one physical port.
And anther with other VFs and another physical port.

This is as opposed to a single VEB with all VFs, as is currently
the case on NFP based NICs (but not most other multi-port NICs).
Simon Horman Feb. 8, 2023, 12:09 p.m. UTC | #10
On Wed, Feb 08, 2023 at 12:41:45PM +0100, Jiri Pirko wrote:
> Wed, Feb 08, 2023 at 12:36:53PM CET, simon.horman@corigine.com wrote:
> >On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote:
> >> On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
> >> > On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
> >> > > +VF assignment setup
> >> > > +---------------------------
> >> > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
> >> > > +different ports.
> >> > 
> >> > Please make sure you run make htmldocs when changing docs,
> >> > this will warn.
> >> > 
> >> > > +- Get count of VFs assigned to physical port::
> >> > > +
> >> > > +    $ devlink port show pci/0000:82:00.0/0
> >> > > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
> >> > 
> >> > Physical port has VFs? My knee jerk reaction is that allocating
> >> > resources via devlink is fine but this seems to lean a bit into
> >> > forwarding. How do other vendors do it? What's the mapping of VFs
> >> > to ports?
> >> 
> >> I don't understand the meaning of VFs here. If we are talking about PCI
> >> VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which
> >> talks about having one bit to enable all VFs at once. All these VFs will
> >> have separate netdevs.
> >
> >Yes, that is the case here too (before and after).
> >
> >What we are talking about is the association of VFs to physical ports
> >(in the case where a NIC has more than one physical port).
> 
> What is "the association"?

My current explanation - and I'm sure I can dig and find others - is
to association == plugged into same VEB. But I feel that description
will lead to further confusion :(
Jiri Pirko Feb. 8, 2023, 12:34 p.m. UTC | #11
Wed, Feb 08, 2023 at 01:07:54PM CET, simon.horman@corigine.com wrote:
>On Wed, Feb 08, 2023 at 12:40:45PM +0100, Jiri Pirko wrote:
>> Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote:
>> >From: Fei Qin <fei.qin@corigine.com>
>> >
>> >Multiple physical ports of the same NIC may share the single
>> >PCI address. In some cases, assigning VFs to different physical
>> >ports can be demanded, especially under high-traffic scenario.
>> >Load balancing can be realized in virtualised use¬cases through
>> >distributing packets between different physical ports with LAGs
>> >of VFs which are assigned to those physical ports.
>> >
>> >This patch adds new attribute "vf_count" to 'devlink port function'
>> >API which only can be shown and configured under devlink ports
>> >with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL".
>> 
>> I have to be missing something. That is the meaning of "assigning VF"
>> to a physical port? Why there should be any relationship between
>> physical port and VF other than configured forwarding (using TC for
>> example)?
>> 
>> This seems very wrong. Preliminary NAK.
>
>Of course if TC is involved, then we have flexibility.
>
>What we are talking about here is primarily legacy mode.

I don't see any reason to add knobs for purpose of supporting the legacy
mode, sorry.

If you need this functionality, use TC.



>And the behaviour described would, when enabled allow NFP based NICs
>to behave more like most other multi-port NICs.
>
>That is, we can envisage a VEB with some VFs and one physical port.
>And anther with other VFs and another physical port.
>
>This is as opposed to a single VEB with all VFs, as is currently
>the case on NFP based NICs (but not most other multi-port NICs).
>
Simon Horman Feb. 8, 2023, 12:37 p.m. UTC | #12
On Wed, Feb 08, 2023 at 01:34:19PM +0100, Jiri Pirko wrote:
> Wed, Feb 08, 2023 at 01:07:54PM CET, simon.horman@corigine.com wrote:
> >On Wed, Feb 08, 2023 at 12:40:45PM +0100, Jiri Pirko wrote:
> >> Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote:
> >> >From: Fei Qin <fei.qin@corigine.com>
> >> >
> >> >Multiple physical ports of the same NIC may share the single
> >> >PCI address. In some cases, assigning VFs to different physical
> >> >ports can be demanded, especially under high-traffic scenario.
> >> >Load balancing can be realized in virtualised use¬cases through
> >> >distributing packets between different physical ports with LAGs
> >> >of VFs which are assigned to those physical ports.
> >> >
> >> >This patch adds new attribute "vf_count" to 'devlink port function'
> >> >API which only can be shown and configured under devlink ports
> >> >with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL".
> >> 
> >> I have to be missing something. That is the meaning of "assigning VF"
> >> to a physical port? Why there should be any relationship between
> >> physical port and VF other than configured forwarding (using TC for
> >> example)?
> >> 
> >> This seems very wrong. Preliminary NAK.
> >
> >Of course if TC is involved, then we have flexibility.
> >
> >What we are talking about here is primarily legacy mode.
> 
> I don't see any reason to add knobs for purpose of supporting the legacy
> mode, sorry.
> 
> If you need this functionality, use TC.

I understand your point, even if I don't agree in this case.

> >And the behaviour described would, when enabled allow NFP based NICs
> >to behave more like most other multi-port NICs.
> >
> >That is, we can envisage a VEB with some VFs and one physical port.
> >And anther with other VFs and another physical port.
> >
> >This is as opposed to a single VEB with all VFs, as is currently
> >the case on NFP based NICs (but not most other multi-port NICs).
> >
Saeed Mahameed Feb. 8, 2023, 9:37 p.m. UTC | #13
On 08 Feb 13:05, Simon Horman wrote:
>On Wed, Feb 08, 2023 at 01:53:48PM +0200, Leon Romanovsky wrote:
>> On Wed, Feb 08, 2023 at 12:36:53PM +0100, Simon Horman wrote:
>> > On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote:
>> > > On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote:
>> > > > On Mon,  6 Feb 2023 16:36:02 +0100 Simon Horman wrote:
>> > > > > +VF assignment setup
>> > > > > +---------------------------
>> > > > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
>> > > > > +different ports.
>> > > >
>> > > > Please make sure you run make htmldocs when changing docs,
>> > > > this will warn.
>> > > >
>> > > > > +- Get count of VFs assigned to physical port::
>> > > > > +
>> > > > > +    $ devlink port show pci/0000:82:00.0/0
>> > > > > +    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
>> > > >
>> > > > Physical port has VFs? My knee jerk reaction is that allocating
>> > > > resources via devlink is fine but this seems to lean a bit into
>> > > > forwarding. How do other vendors do it? What's the mapping of VFs
>> > > > to ports?
>> > >
>> > > I don't understand the meaning of VFs here. If we are talking about PCI
>> > > VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which
>> > > talks about having one bit to enable all VFs at once. All these VFs will
>> > > have separate netdevs.
>> >
>> > Yes, that is the case here too (before and after).
>> >
>> > What we are talking about is the association of VFs to physical ports
>> > (in the case where a NIC has more than one physical port).
>>
>> We have devices with multiple ports too, but don't have such issues.
>> So it will help if you can provide more context here.
>>
>> I'm failing to see connection between physical ports and physical VFs.
>>
>> Are you saying that physical ports are actual PCI VFs, which spans L2 VFs,
>> which you want to assign to another port (PF)?
>
>No, a physical port is not a VF (nor a PF, FWIIW).
>
>The topic here is about two modes of behaviour.
>
>One, where VFs are associated with physical ports - conceptually one might
>think of this as some VFs and one phys port being plugged into one VEB,
>while other VFs and another phys port are plugged into another VEB.
>
>I believe this is the mode on most multi-port NICs.
>
>And another mode where all VFs are associated with one physical port,
>even if more than one is present. The NFP currently implements this model.
>
>This patch is about allowing NICs, in particular the NFP based NICs,
>to switch modes.

I don't understand the difference between the two modes, 
1) "where VFs are associated with physical ports"
2) "another mode where all VFs are associated with one physical port"

anyway here how it works for ConnectX devices, and i think the model should
be generalized to others as it simplifies the user life in my opinion.

Each physical port is represented by a PCI PF function.

devlink dev show
pci/0000:81:00.0 #port 1
pci/0000:81:00.1 #port 2

when you enable sriov, you enable it on a specific PF, eventually port,
meaning those vfs are only associated with that port.

# enable vfs on port 1
echo 4 > /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs

# enable vfs on port 2
echo 4 > /sys/bus/pci/devices/0000:81:00.1/sriov_numvfs

$ devlink dev show
# port 1 PF
pci/0000:81:00.0

# port 2 PF
pci/0000:81:00.1

# port 1 VFs
pci/0000:81:00.2
pci/0000:81:00.3
pci/0000:81:00.4
pci/0000:81:00.5

# port 2 VFs
pci/0000:81:01.2
pci/0000:81:01.3
pci/0000:81:01.4
pci/0000:81:01.5

The pci address enumeration is device specific, but i don't think user
should care.
Jakub Kicinski Feb. 8, 2023, 11:35 p.m. UTC | #14
On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote:
> I don't understand the difference between the two modes, 
> 1) "where VFs are associated with physical ports"
> 2) "another mode where all VFs are associated with one physical port"
> 
> anyway here how it works for ConnectX devices, and i think the model should
> be generalized to others as it simplifies the user life in my opinion.

I'm guessing the version of the NFP Simon posted this for behaves 
much like CX3 / mlx4. One PF, multiple Ethernet ports.
Jakub Kicinski Feb. 8, 2023, 11:41 p.m. UTC | #15
On Wed, 8 Feb 2023 13:34:19 +0100 Jiri Pirko wrote:
>> Of course if TC is involved, then we have flexibility.
>>
>> What we are talking about here is primarily legacy mode.  
> 
> I don't see any reason to add knobs for purpose of supporting the legacy
> mode, sorry.
> 
> If you need this functionality, use TC.

Agreed, I seem to remember that mlx4 had some custom module param 
to do exactly the same thing. But this is a new addition so we should
just say no.
Saeed Mahameed Feb. 9, 2023, 12:55 a.m. UTC | #16
On 08 Feb 15:35, Jakub Kicinski wrote:
>On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote:
>> I don't understand the difference between the two modes,
>> 1) "where VFs are associated with physical ports"
>> 2) "another mode where all VFs are associated with one physical port"
>>
>> anyway here how it works for ConnectX devices, and i think the model should
>> be generalized to others as it simplifies the user life in my opinion.
>
>I'm guessing the version of the NFP Simon posted this for behaves
>much like CX3 / mlx4. One PF, multiple Ethernet ports.

Then the question is, can they do PF per port and avoid such complex APIs ?
Yinjun Zhang Feb. 9, 2023, 2:20 a.m. UTC | #17
On Wed, 8 Feb 2023 16:55:12 -0800, Saeed Mahameed wrote:
> On 08 Feb 15:35, Jakub Kicinski wrote:
> >On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote:
> >> I don't understand the difference between the two modes,
> >> 1) "where VFs are associated with physical ports"
> >> 2) "another mode where all VFs are associated with one physical port"
> >>
> >> anyway here how it works for ConnectX devices, and i think the model
> should
> >> be generalized to others as it simplifies the user life in my opinion.
> >
> >I'm guessing the version of the NFP Simon posted this for behaves
> >much like CX3 / mlx4. One PF, multiple Ethernet ports.
> 
> Then the question is, can they do PF per port and avoid such complex APIs ?
> 

To answer your last question, it needs silicon support, so we can't for some old products.

Then let me clarify something more for this patch-set's purpose. 
Indeed, one port per PF is current mainstream. In this case, all the VFs created from PF0
use physical port 0 as the uplink port(outlet to external world), and all the VFs from PF1
use p1 as the uplink port. Let me call them two switch-sets. And they're isolated, you can't 
make the traffic input from VFs of PF0 output to p1 or VFs of PF1, right? Even with TC in
switchdev mode, the two switch-sets are still isolated, right? Correct me if I'm wrong here.
And the posted configuration in this patch-set is useless in this case, it's for one PF with
multi ports.

Let me take NFP implementation for example here, all the VFs created from the single PF
use p0 as the uplink port by default. In legacy mode, by no means we can choose other
ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split
one switch-set to several switch-sets with every physical port as the uplink port respectively,
by grouping the VFs and assigning them to physical ports.
Jiri Pirko Feb. 9, 2023, 3:15 p.m. UTC | #18
Thu, Feb 09, 2023 at 03:20:48AM CET, yinjun.zhang@corigine.com wrote:
>On Wed, 8 Feb 2023 16:55:12 -0800, Saeed Mahameed wrote:
>> On 08 Feb 15:35, Jakub Kicinski wrote:
>> >On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote:
>> >> I don't understand the difference between the two modes,
>> >> 1) "where VFs are associated with physical ports"
>> >> 2) "another mode where all VFs are associated with one physical port"
>> >>
>> >> anyway here how it works for ConnectX devices, and i think the model
>> should
>> >> be generalized to others as it simplifies the user life in my opinion.
>> >
>> >I'm guessing the version of the NFP Simon posted this for behaves
>> >much like CX3 / mlx4. One PF, multiple Ethernet ports.
>> 
>> Then the question is, can they do PF per port and avoid such complex APIs ?
>> 
>
>To answer your last question, it needs silicon support, so we can't for some old products.
>
>Then let me clarify something more for this patch-set's purpose. 
>Indeed, one port per PF is current mainstream. In this case, all the VFs created from PF0
>use physical port 0 as the uplink port(outlet to external world), and all the VFs from PF1
>use p1 as the uplink port. Let me call them two switch-sets. And they're isolated, you can't 
>make the traffic input from VFs of PF0 output to p1 or VFs of PF1, right? Even with TC in
>switchdev mode, the two switch-sets are still isolated, right? Correct me if I'm wrong here.
>And the posted configuration in this patch-set is useless in this case, it's for one PF with
>multi ports.
>
>Let me take NFP implementation for example here, all the VFs created from the single PF
>use p0 as the uplink port by default. In legacy mode, by no means we can choose other

Legacy is legacy. I believe it is like 5 years already no knobs for
legacy mode are accepted. You should not use it for new features.
Why this is any different?

Implement TC offloading and then you can ballance the hell out of the
thing :)


>ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split
>one switch-set to several switch-sets with every physical port as the uplink port respectively,
>by grouping the VFs and assigning them to physical ports.
Yinjun Zhang Feb. 10, 2023, 2:14 a.m. UTC | #19
On Thu, 9 Feb 2023 16:15:58 +0100, Jiri Pirko wrote:
> Thu, Feb 09, 2023 at 03:20:48AM CET, yinjun.zhang@corigine.com wrote:
> >
> >Let me take NFP implementation for example here, all the VFs created from the single PF
> >use p0 as the uplink port by default. In legacy mode, by no means we can choose other
> 
> Legacy is legacy. I believe it is like 5 years already no knobs for
> legacy mode are accepted. You should not use it for new features.
> Why this is any different?
> 
> Implement TC offloading and then you can ballance the hell out of the
> thing :)

I understand in switchdev mode, the fine-grained manipulation by TC can do it.
While legacy has fixed forwarding rule, and we hope it can be implemented without
too much involved configuration from user if they only want legacy forwarding.

As multi-port mapping to one PF NIC is scarce, maybe we should implement is as
vendor specific configuration, make sense?

> 
> 
> >ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split
> >one switch-set to several switch-sets with every physical port as the uplink port respectively,
> >by grouping the VFs and assigning them to physical ports.
Jakub Kicinski Feb. 10, 2023, 3:30 a.m. UTC | #20
On Fri, 10 Feb 2023 02:14:27 +0000 Yinjun Zhang wrote:
> I understand in switchdev mode, the fine-grained manipulation by TC can do it.
> While legacy has fixed forwarding rule, and we hope it can be implemented without
> too much involved configuration from user if they only want legacy forwarding.
> 
> As multi-port mapping to one PF NIC is scarce, maybe we should implement is as
> vendor specific configuration, make sense?

Vendor extension or not we are disallowing adding configuration 
for legacy SR-IOV mode. We want people to move to switchdev mode,
otherwise we'll have to keep extending both for ever.
Jiri Pirko Feb. 10, 2023, 9:45 a.m. UTC | #21
Fri, Feb 10, 2023 at 03:14:27AM CET, yinjun.zhang@corigine.com wrote:
>On Thu, 9 Feb 2023 16:15:58 +0100, Jiri Pirko wrote:
>> Thu, Feb 09, 2023 at 03:20:48AM CET, yinjun.zhang@corigine.com wrote:
>> >
>> >Let me take NFP implementation for example here, all the VFs created from the single PF
>> >use p0 as the uplink port by default. In legacy mode, by no means we can choose other
>> 
>> Legacy is legacy. I believe it is like 5 years already no knobs for
>> legacy mode are accepted. You should not use it for new features.
>> Why this is any different?
>> 
>> Implement TC offloading and then you can ballance the hell out of the
>> thing :)
>
>I understand in switchdev mode, the fine-grained manipulation by TC can do it.
>While legacy has fixed forwarding rule, and we hope it can be implemented without
>too much involved configuration from user if they only want legacy forwarding.
>
>As multi-port mapping to one PF NIC is scarce, maybe we should implement is as
>vendor specific configuration, make sense?

No, it does not make sense what so ever.

You want to extend legacy, which is no longer an option (for many years).

If you need this feature, implement switchdev mode for your device.
Simple as that. I think this was clearly stated in multiple emails in
this thread, I don't follow why it needs to be repeated.


>
>> 
>> 
>> >ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split
>> >one switch-set to several switch-sets with every physical port as the uplink port respectively,
>> >by grouping the VFs and assigning them to physical ports.
diff mbox series

Patch

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 3da590953ce8..5c3996bce6d9 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -128,6 +128,9 @@  Users may also set the RoCE capability of the function using
 Users may also set the function as migratable using
 'devlink port function set migratable' command.
 
+Users may also assign VFs to physical ports using
+'devlink port function set vf_count' command.
+
 Function attributes
 ===================
 
@@ -240,6 +243,27 @@  Attach VF to the VM.
 Start the VM.
 Perform live migration.
 
+
+VF assignment setup
+---------------------------
+In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to
+different ports.
+
+- Get count of VFs assigned to physical port::
+
+    $ devlink port show pci/0000:82:00.0/0
+    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
+        function:
+            vf_count 0
+
+- Set count of VFs assigned to physical port::
+    $ devlink port function set pci/0000:82:00.0/0 vf_count 3
+
+    $ devlink port show pci/0000:82:00.0/0
+    pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4
+        function:
+            vf_count 3
+
 Subfunction
 ============
 
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2e85a5970a32..3e98fa3d251f 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1491,6 +1491,27 @@  struct devlink_ops {
 	int (*port_fn_migratable_set)(struct devlink_port *devlink_port,
 				      bool enable,
 				      struct netlink_ext_ack *extack);
+
+	/**
+	 * @port_fn_vf_count_get: Port function's VF count get function
+	 *
+	 * Get assigned VF count of a function managed by the devlink port,
+	 * should only be used for DEVLINK_PORT_FLAVOUR_PHYSICAL.
+	 * Return -EOPNOTSUPP if port function vf_count setup is not supported.
+	 */
+	int (*port_fn_vf_count_get)(struct devlink_port *port, u16 *vf_count,
+				    struct netlink_ext_ack *extack);
+
+	/**
+	 * @port_fn_vf_count_set: Port function's VF count set function
+	 *
+	 * Set assigned VF count of a function managed by the devlink port,
+	 * should only be used for DEVLINK_PORT_FLAVOUR_PHYSICAL.
+	 * Return -EOPNOTSUPP if port function vf_count setup is not supported.
+	 */
+	int (*port_fn_vf_count_set)(struct devlink_port *port, u16 vf_count,
+				    struct netlink_ext_ack *extack);
+
 	/**
 	 * port_new() - Add a new port function of a specified flavor
 	 * @devlink: Devlink instance
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 3782d4219ac9..774e17f6100b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -676,6 +676,7 @@  enum devlink_port_function_attr {
 	DEVLINK_PORT_FN_ATTR_STATE,	/* u8 */
 	DEVLINK_PORT_FN_ATTR_OPSTATE,	/* u8 */
 	DEVLINK_PORT_FN_ATTR_CAPS,	/* bitfield32 */
+	DEVLINK_PORT_FN_ATTR_VF_COUNT,	/* u16 */
 
 	__DEVLINK_PORT_FUNCTION_ATTR_MAX,
 	DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index 97d30ea98b00..6dac8b562232 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -141,6 +141,7 @@  static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_
 				 DEVLINK_PORT_FN_STATE_ACTIVE),
 	[DEVLINK_PORT_FN_ATTR_CAPS] =
 		NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK),
+	[DEVLINK_PORT_FN_ATTR_VF_COUNT] = { .type = NLA_U16 },
 };
 
 #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port)				\
@@ -520,6 +521,35 @@  static int devlink_port_fn_caps_fill(const struct devlink_ops *ops,
 	return 0;
 }
 
+static int devlink_port_fn_vf_count_fill(const struct devlink_ops *ops,
+					 struct devlink_port *devlink_port,
+					 struct sk_buff *msg,
+					 struct netlink_ext_ack *extack,
+					 bool *msg_updated)
+{
+	u16 vf_count;
+	int err;
+
+	if (!ops->port_fn_vf_count_get ||
+	    devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PHYSICAL)
+		return 0;
+
+	err = ops->port_fn_vf_count_get(devlink_port, &vf_count, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	err = nla_put_u16(msg, DEVLINK_PORT_FN_ATTR_VF_COUNT, vf_count);
+	if (err)
+		return err;
+
+	*msg_updated = true;
+
+	return 0;
+}
+
 static int
 devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
 				  struct genl_info *info,
@@ -871,6 +901,16 @@  static int devlink_port_fn_caps_set(struct devlink_port *devlink_port,
 	return 0;
 }
 
+static int devlink_port_fn_vf_count_set(struct devlink_port *devlink_port,
+					const struct nlattr *attr,
+					struct netlink_ext_ack *extack)
+{
+	const struct devlink_ops *ops = devlink_port->devlink->ops;
+	u16 vf_count = nla_get_u16(attr);
+
+	return ops->port_fn_vf_count_set(devlink_port, vf_count, extack);
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
 				   struct netlink_ext_ack *extack)
@@ -893,6 +933,11 @@  devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
 					&msg_updated);
 	if (err)
 		goto out;
+
+	err = devlink_port_fn_vf_count_fill(ops, port, msg, extack, &msg_updated);
+	if (err)
+		goto out;
+
 	err = devlink_port_fn_state_fill(ops, port, msg, extack, &msg_updated);
 out:
 	if (err || !msg_updated)
@@ -1219,6 +1264,19 @@  static int devlink_port_function_validate(struct devlink_port *devlink_port,
 				    "Function does not support state setting");
 		return -EOPNOTSUPP;
 	}
+	attr = tb[DEVLINK_PORT_FN_ATTR_VF_COUNT];
+	if (attr) {
+		if (!ops->port_fn_vf_count_set) {
+			NL_SET_ERR_MSG_ATTR(extack, attr,
+					    "Function doesn't support VF assignment");
+			return -EOPNOTSUPP;
+		}
+		if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PHYSICAL) {
+			NL_SET_ERR_MSG_ATTR(extack, attr,
+					    "VFs assignment supported for physical ports only");
+			return -EOPNOTSUPP;
+		}
+	}
 	attr = tb[DEVLINK_PORT_FN_ATTR_CAPS];
 	if (attr) {
 		struct nla_bitfield32 caps;
@@ -1278,6 +1336,13 @@  static int devlink_port_function_set(struct devlink_port *port,
 			return err;
 	}
 
+	attr = tb[DEVLINK_PORT_FN_ATTR_VF_COUNT];
+	if (attr) {
+		err = devlink_port_fn_vf_count_set(port, attr, extack);
+		if (err)
+			return err;
+	}
+
 	/* Keep this as the last function attribute set, so that when
 	 * multiple port function attributes are set along with state,
 	 * Those can be applied first before activating the state.