
[net-next,v4,0/2] devlink: Add port function attribute for IO EQs

Message ID 20240406010538.220167-1-parav@nvidia.com

Message

Parav Pandit April 6, 2024, 1:05 a.m. UTC
Currently, PCI SFs and VFs use IO event queues to deliver netdev
per-channel events. The number of netdev channels is a function of the
IO event queues. Similarly, for an RDMA device, the completion vectors
are a function of the IO event queues. Today, an administrator on the
hypervisor has no means to provision the number of IO event queues for
an SF or VF device; the device/firmware picks some arbitrary value for
them. Because of this, the number of SF netdev channels is
unpredictable, and consequently so is the performance.

This short series introduces a new port function attribute: max_io_eqs.
The goal is to provide administrators at the hypervisor level with the
ability to provision the maximum number of IO event queues for a
function. This gives the administrator control to provision the right
number of IO event queues and achieve predictable performance.
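
A driver exposes this attribute by implementing get/set callbacks in
its devlink port ops. The sketch below is illustrative only: the
callback names, signatures, error message, and the fixed return value
are assumptions modeled on the existing port function attributes, not
code copied from this series.

  #include <linux/netlink.h>
  #include <net/devlink.h>

  /* Illustrative sketch; names and signatures are assumed, not taken
   * verbatim from the patch.
   */
  static int example_port_fn_max_io_eqs_get(struct devlink_port *port,
                                            u32 *max_io_eqs,
                                            struct netlink_ext_ack *extack)
  {
          /* A real driver queries device/firmware here; a fixed value
           * keeps the sketch self-contained.
           */
          *max_io_eqs = 10;
          return 0;
  }

  static int example_port_fn_max_io_eqs_set(struct devlink_port *port,
                                            u32 max_io_eqs,
                                            struct netlink_ext_ack *extack)
  {
          if (!max_io_eqs) {
                  NL_SET_ERR_MSG(extack, "max_io_eqs must be non-zero");
                  return -EINVAL;
          }
          /* A real driver programs the limit in device/firmware here,
           * before the VF/SF function is enumerated.
           */
          return 0;
  }

  static const struct devlink_port_ops example_port_ops = {
          .port_fn_max_io_eqs_get = example_port_fn_max_io_eqs_get,
          .port_fn_max_io_eqs_set = example_port_fn_max_io_eqs_set,
  };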

Example of an administrator provisioning (setting) the maximum number
of IO event queues while using switchdev mode:

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10

  $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20

This sets the corresponding maximum IO event queues of the function
before it is enumerated. Thus, when the VF/SF driver reads the
capability from the device, it sees the value provisioned by the
hypervisor. The driver is then able to configure the number of channels
for the net device, as well as the number of completion vectors
for the RDMA device. The device/firmware also honors the provisioned
value, so any attempt by a VF/SF driver to create IO EQs beyond the
provisioned value results in an error.
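
To see why this makes the channel count predictable, consider how a
VF/SF driver typically sizes its data path: the default number of
channels (or completion vectors) is derived from the IO EQs the
function may create. The helper below is a hypothetical sketch of such
a policy, not code taken from the mlx5 patch.

  #include <linux/cpumask.h>
  #include <linux/minmax.h>
  #include <linux/types.h>

  /* Hypothetical policy: with the function capped at 'max_io_eqs' by
   * the hypervisor, the default channel count follows directly from
   * the provisioned value instead of an arbitrary firmware default.
   */
  static unsigned int example_default_num_channels(u32 max_io_eqs)
  {
          /* One EQ is commonly reserved for async/control events. */
          u32 io_eqs = max_io_eqs > 1 ? max_io_eqs - 1 : 1;

          return min_t(unsigned int, io_eqs, num_online_cpus());
  }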

With the above settings in place, the administrator achieved 2x
performance with the SF device using 20 channels. In the second
example, when the SF was provisioned for a container with 2 CPUs, the
administrator provisioned only 2 IO event queues, thereby saving
device resources.
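
For the container case, the provisioning step would look like the
following (the port index shown here is hypothetical):

  $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 2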

changelog:
v2->v3:
- limited to 80 chars per line in devlink
- fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
  on error path
v1->v2:
- limited comment to 80 chars per line in header file
- fixed set function variables for reverse christmas tree
- fixed comments from Kalesh
- fixed missing kfree in get call
- returning error code for get cmd failure
- fixed error msg copy paste error in set on cmd failure

Parav Pandit (2):
  devlink: Support setting max_io_eqs
  mlx5/core: Support max_io_eqs for a function

 .../networking/devlink/devlink-port.rst       | 33 +++++++
 .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
 .../mellanox/mlx5/core/eswitch_offloads.c     | 97 +++++++++++++++++++
 include/net/devlink.h                         | 14 +++
 include/uapi/linux/devlink.h                  |  1 +
 net/devlink/port.c                            | 53 ++++++++++
 7 files changed, 209 insertions(+)

Comments

Zhu Yanjun April 6, 2024, 9:05 a.m. UTC | #1
On 2024/4/6 3:05, Parav Pandit wrote:
> Currently, PCI SFs and VFs use IO event queues to deliver netdev per
> channel events. The number of netdev channels is a function of IO
> event queues. In the second scenario of an RDMA device, the
> completion vectors are also a function of IO event queues. Currently, an
> administrator on the hypervisor has no means to provision the number
> of IO event queues for the SF device or the VF device. Device/firmware
> determines some arbitrary value for these IO event queues. Due to this,
> the SF netdev channels are unpredictable, and consequently, the
> performance is too.
> 
> This short series introduces a new port function attribute: max_io_eqs.
> The goal is to provide administrators at the hypervisor level with the
> ability to provision the maximum number of IO event queues for a
> function. This gives the control to the administrator to provision
> right number of IO event queues and have predictable performance.
> 
> Examples of when an administrator provisions (set) maximum number of
> IO event queues when using switchdev mode:
> 
>    $ devlink port show pci/0000:06:00.0/1
>        pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
>            function:
>            hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
> 
>    $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
> 
>    $ devlink port show pci/0000:06:00.0/1
>        pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
>            function:
>            hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
> 
> This sets the corresponding maximum IO event queues of the function
> before it is enumerated. Thus, when the VF/SF driver reads the
> capability from the device, it sees the value provisioned by the
> hypervisor. The driver is then able to configure the number of channels
> for the net device, as well as the number of completion vectors
> for the RDMA device. The device/firmware also honors the provisioned
> value, hence any VF/SF driver attempting to create IO EQs
> beyond provisioned value results in an error.
> 
> With above setting now, the administrator is able to achieve the 2x
> performance on SFs with 20 channels. In second example when SF was
> provisioned for a container with 2 cpus, the administrator provisioned only
> 2 IO event queues, thereby saving device resources.
> 

The following paragraph is the same as the paragraph above?

> With the above settings now in place, the administrator achieved 2x
> performance with the SF device with 20 channels. In the second example,
> when the SF was provisioned for a container with 2 CPUs, the administrator
> provisioned only 2 IO event queues, thereby saving device resources.
> 
> changelog:
> v2->v3:
> - limited to 80 chars per line in devlink
> - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
>    on error path
> v1->v2:
> - limited comment to 80 chars per line in header file
> - fixed set function variables for reverse christmas tree
> - fixed comments from Kalesh
> - fixed missing kfree in get call
> - returning error code for get cmd failure
> - fixed error msg copy paste error in set on cmd failure
> 
> Parav Pandit (2):
>    devlink: Support setting max_io_eqs
>    mlx5/core: Support max_io_eqs for a function
> 
>   .../networking/devlink/devlink-port.rst       | 33 +++++++
>   .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +
>   .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
>   .../mellanox/mlx5/core/eswitch_offloads.c     | 97 +++++++++++++++++++
>   include/net/devlink.h                         | 14 +++
>   include/uapi/linux/devlink.h                  |  1 +
>   net/devlink/port.c                            | 53 ++++++++++
>   7 files changed, 209 insertions(+)
>
Parav Pandit April 8, 2024, 3:20 a.m. UTC | #2
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> Sent: Saturday, April 6, 2024 2:36 PM
> 
> On 2024/4/6 3:05, Parav Pandit wrote:
> > Currently, PCI SFs and VFs use IO event queues to deliver netdev per
> > channel events. The number of netdev channels is a function of IO
> > event queues. In the second scenario of an RDMA device, the completion
> > vectors are also a function of IO event queues. Currently, an
> > administrator on the hypervisor has no means to provision the number
> > of IO event queues for the SF device or the VF device. Device/firmware
> > determines some arbitrary value for these IO event queues. Due to
> > this, the SF netdev channels are unpredictable, and consequently, the
> > performance is too.
> >
> > This short series introduces a new port function attribute: max_io_eqs.
> > The goal is to provide administrators at the hypervisor level with the
> > ability to provision the maximum number of IO event queues for a
> > function. This gives the control to the administrator to provision
> > right number of IO event queues and have predictable performance.
> >
> > Examples of when an administrator provisions (set) maximum number of
> > IO event queues when using switchdev mode:
> >
> >    $ devlink port show pci/0000:06:00.0/1
> >        pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum
> 0 vfnum 0
> >            function:
> >            hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
> >
> >    $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
> >
> >    $ devlink port show pci/0000:06:00.0/1
> >        pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum
> 0 vfnum 0
> >            function:
> >            hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
> >
> > This sets the corresponding maximum IO event queues of the function
> > before it is enumerated. Thus, when the VF/SF driver reads the
> > capability from the device, it sees the value provisioned by the
> > hypervisor. The driver is then able to configure the number of
> > channels for the net device, as well as the number of completion
> > vectors for the RDMA device. The device/firmware also honors the
> > provisioned value, hence any VF/SF driver attempting to create IO EQs
> > beyond provisioned value results in an error.
> >
> > With above setting now, the administrator is able to achieve the 2x
> > performance on SFs with 20 channels. In second example when SF was
> > provisioned for a container with 2 cpus, the administrator provisioned
> > only
> > 2 IO event queues, thereby saving device resources.
> >
> 
> The following paragraph is the same with the above paragraph?
>
Ah, yes. I forgot to remove one of them while doing minor grammar changes.

 
> > With the above settings now in place, the administrator achieved 2x
> > performance with the SF device with 20 channels. In the second
> > example, when the SF was provisioned for a container with 2 CPUs, the
> > administrator provisioned only 2 IO event queues, thereby saving device
> resources.
> >
> > changelog:
> > v2->v3:
> > - limited to 80 chars per line in devlink
> > - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
> >    on error path
> > v1->v2:
> > - limited comment to 80 chars per line in header file
> > - fixed set function variables for reverse christmas tree
> > - fixed comments from Kalesh
> > - fixed missing kfree in get call
> > - returning error code for get cmd failure
> > - fixed error msg copy paste error in set on cmd failure
> >
> > Parav Pandit (2):
> >    devlink: Support setting max_io_eqs
> >    mlx5/core: Support max_io_eqs for a function
> >
> >   .../networking/devlink/devlink-port.rst       | 33 +++++++
> >   .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +
> >   .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
> >   .../mellanox/mlx5/core/eswitch_offloads.c     | 97 +++++++++++++++++++
> >   include/net/devlink.h                         | 14 +++
> >   include/uapi/linux/devlink.h                  |  1 +
> >   net/devlink/port.c                            | 53 ++++++++++
> >   7 files changed, 209 insertions(+)
> >
patchwork-bot+netdevbpf@kernel.org April 8, 2024, 1:20 p.m. UTC | #3
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Sat, 6 Apr 2024 04:05:36 +0300 you wrote:
> Currently, PCI SFs and VFs use IO event queues to deliver netdev per
> channel events. The number of netdev channels is a function of IO
> event queues. In the second scenario of an RDMA device, the
> completion vectors are also a function of IO event queues. Currently, an
> administrator on the hypervisor has no means to provision the number
> of IO event queues for the SF device or the VF device. Device/firmware
> determines some arbitrary value for these IO event queues. Due to this,
> the SF netdev channels are unpredictable, and consequently, the
> performance is too.
> 
> [...]

Here is the summary with links:
  - [net-next,v4,1/2] devlink: Support setting max_io_eqs
    https://git.kernel.org/netdev/net-next/c/5af3e3876d56
  - [net-next,v4,2/2] mlx5/core: Support max_io_eqs for a function
    https://git.kernel.org/netdev/net-next/c/93197c7c509d

You are awesome, thank you!