Message ID | 20241204220931.254964-1-tariqt@nvidia.com (mailing list archive) |
---|---|
Series | net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes |

On Thu, Dec 05, 2024 at 12:09:20AM +0200, Tariq Toukan wrote:
> Hi,
>
> This patchset starts with 4 patches that modify the IFC, targeted to
> mlx5-next in order to be taken to rdma-next branch side sooner than in
> the next merge window.
>
> This patchset consists of two features:
> 1. In patches 5-6, Itamar adds SW Steering support for ConnectX-8.
> 2. Followed by patches by Carolina that add rate management support on
> traffic classes in devlink and mlx5, more details below [1].
>
> Series generated against:
> commit bb18265c3aba ("r8169: remove support for chip version 11")
>
> Regards,
> Tariq

<...>

> Carolina Jubran (6):
>   net/mlx5: Add support for new scheduling elements
>
> Cosmin Ratiu (2):
>   net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits
>   net/mlx5: qos: Add ifc support for cross-esw scheduling
>
> Yevgeny Kliteynik (1):
>   net/mlx5: Add ConnectX-8 device to ifc

I applied these IFC patches to our mlx5-next shared branch.
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-next

Thanks

On Thu, 5 Dec 2024 00:09:20 +0200 Tariq Toukan wrote:
> This patch series extends the devlink-rate API to support traffic class
> (TC) bandwidth management, enabling more granular control over traffic
> shaping and rate limiting across multiple TCs. The API now allows users
> to specify bandwidth proportions for different traffic classes in a
> single command. This is particularly useful for managing Enhanced
> Transmission Selection (ETS) for groups of Virtual Functions (VFs),
> allowing precise bandwidth allocation across traffic classes.
>
> Additionally the series refines the QoS handling in net/mlx5 to support
> TC arbitration and bandwidth management on vports and rate nodes.
>
> Extend devlink-rate API to support rate management on TCs:
> - devlink: Extend the devlink rate API to support traffic class
>   bandwidth management
>
> Introduce a no-op implementation:
> - net/mlx5: Add no-op implementation for setting tc-bw on rate objects
>
> Add support for enabling and disabling TC QoS on vports and nodes:
> - net/mlx5: Add support for setting tc-bw on nodes
> - net/mlx5: Add traffic class scheduling support for vport QoS
>
> Support for setting tc-bw on rate objects:
> - net/mlx5: Manage TC arbiter nodes and implement full support for
>   tc-bw

Do you expect TC bw allocation to work on non-leaf nodes?

How does this relate to the rate API which Paolo added? He was asked
to build in a way to integrate with devlink now devlink is growing
extra features again, which presumably the other API will also need.
And the integration may turn out to be challenging.

On 07/12/2024 4:13, Jakub Kicinski wrote:
> On Thu, 5 Dec 2024 00:09:20 +0200 Tariq Toukan wrote:
>> This patch series extends the devlink-rate API to support traffic class
>> (TC) bandwidth management, enabling more granular control over traffic
>> shaping and rate limiting across multiple TCs. The API now allows users
>> to specify bandwidth proportions for different traffic classes in a
>> single command. This is particularly useful for managing Enhanced
>> Transmission Selection (ETS) for groups of Virtual Functions (VFs),
>> allowing precise bandwidth allocation across traffic classes.
>>
>> Additionally the series refines the QoS handling in net/mlx5 to support
>> TC arbitration and bandwidth management on vports and rate nodes.
>>
>> Extend devlink-rate API to support rate management on TCs:
>> - devlink: Extend the devlink rate API to support traffic class
>>   bandwidth management
>>
>> Introduce a no-op implementation:
>> - net/mlx5: Add no-op implementation for setting tc-bw on rate objects
>>
>> Add support for enabling and disabling TC QoS on vports and nodes:
>> - net/mlx5: Add support for setting tc-bw on nodes
>> - net/mlx5: Add traffic class scheduling support for vport QoS
>>
>> Support for setting tc-bw on rate objects:
>> - net/mlx5: Manage TC arbiter nodes and implement full support for
>>   tc-bw
>
> Do you expect TC bw allocation to work on non-leaf nodes?
>

Yes. That's the point. It works.

> How does this relate to the rate API which Paolo added? He was asked
> to build in a way to integrate with devlink now devlink is growing
> extra features again, which presumably the other API will also need.
> And the integration may turn out to be challenging.
>

AFAIU Paolo's work is not for shapers 'above' the network device level,
i.e. groups.

On Mon, 9 Dec 2024 21:32:11 +0200 Tariq Toukan wrote:
> > Do you expect TC bw allocation to work on non-leaf nodes?
>
> Yes. That's the point. It works.

Let's level -- I'm not trying to be difficult, but you're defining
uAPI with little to no documentation. "It works" is not going to cut
it.

> > How does this relate to the rate API which Paolo added? He was asked
> > to build in a way to integrate with devlink now devlink is growing
> > extra features again, which presumably the other API will also need.
> > And the integration may turn out to be challenging.
>
> AFAIU Paolo's work is not for shapers 'above' the network device level,
> i.e. groups.

What's the difference between queue group and a VF?

On Mon, 2024-12-09 at 13:41 -0800, Jakub Kicinski wrote:
> On Mon, 9 Dec 2024 21:32:11 +0200 Tariq Toukan wrote:
> > > Do you expect TC bw allocation to work on non-leaf nodes?
> >
> > Yes. That's the point. It works.
>
> Let's level -- I'm not trying to be difficult, but you're defining
> uAPI with little to no documentation. "It works" is not going to cut
> it.

The original intent was to document this in the devlink man page. But
we will add something in the kernel documentation as well in the next
submission.

>
> > > How does this relate to the rate API which Paolo added? He was asked
> > > to build in a way to integrate with devlink now devlink is growing
> > > extra features again, which presumably the other API will also need.
> > > And the integration may turn out to be challenging.
> >
> > AFAIU Paolo's work is not for shapers 'above' the network device level,
> > i.e. groups.
>
> What's the difference between queue group and a VF?
>

I've looked over the latest version of the net-shapers API.
There is some conceptual overlap between this patchset and net-shapers
ability to define a group of device queues and manipulate its tx
limits. But as far as I am aware ([1]), the net-shapers API doesn't
intend to shape entities above netdev level.

So there are two things to discuss here:

1. Integrating device-level TC shaping into net-shapers. The
net-shapers model would need to be extended with the ability to define
TC queues. At the moment I see it's concerned with device tx queues
which don't necessarily map 1:1 to traffic classes.
Then, it would need to have the ability to group TC queues into a node.
Then the integration should be easy. Either API can call the device
driver implementation or one API can call the other's function to do
so.
Paolo, what are your thoughts on tc shaping in the net-shapers API?

2. VF-group TC shaping.
The current patchset offers the ability to split TC bandwidth on a
devlink rate node, applying to all VFs in the node. As far as I am
aware, net-shapers doesn't intend to address this use case.
Do we want to have two completely different APIs to manipulate tc
bandwidth?

Cosmin.

[1] https://lore.kernel.org/netdev/7195630a-1021-4e1e-b48b-a07945477863@redhat.com/

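The idea of splitting a rate node's bandwidth across traffic classes by proportion can be pictured with a small model. This is only an illustrative sketch, not the devlink or mlx5 implementation: the function name `tc_shares_to_rates`, the count of eight traffic classes, and the rule that shares are whole percentages summing to 100 are all assumptions that the thread itself does not state.

```python
from typing import Dict

# Assumption: eight traffic classes, numbered 0..7.
NUM_TCS = 8


def tc_shares_to_rates(node_rate_mbps: int, shares: Dict[int, int]) -> Dict[int, float]:
    """Convert per-TC percentage shares into per-TC bandwidth for one node.

    Hypothetical helper: treats each share as a plain fraction of the
    node's total rate, which is a simplification of what a hardware
    arbiter does.
    """
    # Every traffic class must be given a share, even if it is zero.
    if set(shares) != set(range(NUM_TCS)):
        raise ValueError("a share is required for each TC 0..7")
    if any(pct < 0 for pct in shares.values()):
        raise ValueError("shares must be non-negative")
    # Assumption: shares are percentages that add up to exactly 100.
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    return {tc: node_rate_mbps * pct / 100 for tc, pct in shares.items()}


if __name__ == "__main__":
    # A node covering a group of VFs: 20% for TC0, 80% for TC5.
    shares = {tc: 0 for tc in range(NUM_TCS)}
    shares[0], shares[5] = 20, 80
    for tc, rate in sorted(tc_shares_to_rates(10_000, shares).items()):
        print(f"TC{tc}: {rate:.0f} Mbps")
```

In this model the split applies to the node as a whole, so every VF attached to the node shares the same per-TC budget, which matches the "applying to all VFs in the node" wording above.
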
On Wed, 11 Dec 2024 09:49:28 +0000 Cosmin Ratiu wrote:
> I've looked over the latest version of the net-shapers API.
> There is some conceptual overlap between this patchset and net-shapers
> ability to define a group of device queues and manipulate its tx
> limits. But as far as I am aware ([1]), the net-shapers API doesn't
> intend to shape entities above netdev level.

It's not about the uAPI but about having a uniform way of
representing the shaping hierarchy.

> So there are two things to discuss here:
> 1. Integrating device-level TC shaping into net-shapers. The net-
> shapers model would need to be extended with the ability to define TC
> queues. At the moment I see it's concerned with device tx queues which
> don't necessarily map 1:1 to traffic classes.

What are "TC queues"? NIC queues with assigned TC? Your patches shape
on a group of VFs, so the equivalent would be a group of queues
(e.g. group of queues assigned to a container).

> Then, it would need to have the ability to group TC queues into a node.

On Wed, 2024-12-11 at 17:49 -0800, Jakub Kicinski wrote:
> On Wed, 11 Dec 2024 09:49:28 +0000 Cosmin Ratiu wrote:
> > I've looked over the latest version of the net-shapers API.
> > There is some conceptual overlap between this patchset and net-shapers
> > ability to define a group of device queues and manipulate its tx
> > limits. But as far as I am aware ([1]), the net-shapers API doesn't
> > intend to shape entities above netdev level.
>
> It's not about the uAPI but about having a uniform way of
> representing the shaping hierarchy.

I understand your point now.

> > So there are two things to discuss here:
> > 1. Integrating device-level TC shaping into net-shapers. The
> > net-shapers model would need to be extended with the ability to define
> > TC queues. At the moment I see it's concerned with device tx queues
> > which don't necessarily map 1:1 to traffic classes.
>
> What are "TC queues"? NIC queues with assigned TC? Your patches shape
> on a group of VFs, so the equivalent would be a group of queues
> (e.g. group of queues assigned to a container).

My terminology was slightly off. "TC queues" are a logical construct,
not necessarily corresponding to device queues. As far as I know,
packet traffic classes are determined with a variety of methods, and
can be encoded in the IP header (ToS) or as metadata in the tx
descriptor somewhere. I am not sure there's any correspondence with
device queues although one could define specific queues for specific
traffic classes, I guess. The "TC queues" I was mentioning are a
logical representation of the packet flow and refer to the hardware's
ability to treat different TCs differently with HW scheduling
elements.

> > Then, it would need to have the ability to group TC queues into a
> > node.
> >