mbox series

[net-next,v9,0/9] Implement devlink-rate API and extend it

Message ID 20221104143102.1120076-1-michal.wilczynski@intel.com (mailing list archive)
Headers show
Series Implement devlink-rate API and extend it | expand

Message

Wilczynski, Michal Nov. 4, 2022, 2:30 p.m. UTC
This patch series implements devlink-rate for ice driver. Unfortunately
current API isn't flexible enough for our use case, so there is a need to
extend it. Some functions have been introduced to enable the driver to
export current Tx scheduling configuration.

Pasting justification for this series from commit implementing devlink-rate
in ice driver(that is a part of this series):

There is a need to support modification of Tx scheduler tree, in the
ice driver. This will allow user to control Tx settings of each node in
the internal hierarchy of nodes. As a result user will be able to use
Hierarchy QoS implemented entirely in the hardware.

This patch implemenents devlink-rate API. It also exports initial
default hierarchy. It's mostly dictated by the fact that the tree
can't be removed entirely, all we can do is enable the user to modify
it. For example root node shouldn't ever be removed, also nodes that
have children are off-limits.

Example initial tree with 2 VF's:

[root@fedora ~]# devlink port function rate show
pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27


Let me visualize part of the tree:

                        +---------+
                        |  node_0 |
                        +---------+
                             |
                        +----v----+
                        | node_26 |
                        +----+----+
                             |
                        +----v----+
                        | node_27 |
                        +----+----+
                             |
                    |-----------------|
               +----v----+       +----v----+
               |   VF 1  |       |   VF 2  |
               +----+----+       +----+----+

So at this point there is a couple things that can be done.
For example we could only assign parameters to VF's.

[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
                 tx_max 5Gbps

This would cap the VF 1 BW to 5Gbps.

But let's say you would like to create a completely new branch.
This can be done like this:

[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
                 pci/0000:4b:00.0/1 parent node_custom_1

This creates a completely new branch and reassigns VF 1 to it.

A number of parameters is supported per each node: tx_max, tx_share,
tx_priority and tx_weight.


V9:
 - changed misleading name from 'parameter' to 'attribute'
 - removed mechanism referring for kernel object by string,
   now it's referring to them as pointers
 - removed limiting name size in devl_rate_node_create()
 - removed commit that allowed for change of priv in set_parent
   callback
 - added commit that allows for pre-allocation of ice_sched
   elements


V8:
- address minor formatting issues
- fix memory leak
- address warnings

V7:
- split into smaller commits
- paste justification for this series to cover letter

V6:
- replaced strncpy with strscpy
- renamed rate_vport -> rate_leaf

V5:
- removed queue support per community request
- fix division of 64bit variable with 32bit divisor by using div_u64()
- remove RDMA, ADQ exlusion as it's not necessary anymore
- changed how driver exports configuration, as queues are not supported
  anymore
- changed IDA to Xarray for unique node identification


V4:
- changed static variable counter to per port IDA to
  uniquely identify nodes

V3:
- removed shift macros, since FIELD_PREP is used
- added static_assert for struct
- removed unnecessary functions
- used tab instead of space in define

V2:
- fixed Alexandr comments
- refactored code to fix checkpatch issues
- added mutual exclusion for RDMA, DCB


Michal Wilczynski (9):
  devlink: Introduce new attribute 'tx_priority' to devlink-rate
  devlink: Introduce new attribute 'tx_weight' to devlink-rate
  devlink: Enable creation of the devlink-rate nodes from the driver
  devlink: Allow for devlink-rate nodes parent reassignment
  devlink: Allow to set up parent in devl_rate_leaf_create()
  ice: Introduce new parameters in ice_sched_node
  ice: Add an option to pre-allocate memory for ice_sched_node
  ice: Implement devlink-rate API
  ice: Prevent ADQ, DCB coexistence with Custom Tx scheduler

 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   4 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |   7 +-
 drivers/net/ethernet/intel/ice/ice_dcb.c      |   2 +-
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |   4 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 483 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h  |   2 +
 drivers/net/ethernet/intel/ice/ice_repr.c     |  13 +
 drivers/net/ethernet/intel/ice/ice_sched.c    | 101 +++-
 drivers/net/ethernet/intel/ice/ice_sched.h    |  31 +-
 drivers/net/ethernet/intel/ice/ice_type.h     |   9 +
 .../mellanox/mlx5/core/esw/devlink_port.c     |  14 +-
 drivers/net/netdevsim/dev.c                   |   9 +-
 include/net/devlink.h                         |  18 +-
 include/uapi/linux/devlink.h                  |   3 +
 net/core/devlink.c                            | 133 ++++-
 15 files changed, 799 insertions(+), 34 deletions(-)

Comments

Jakub Kicinski Nov. 5, 2022, 2:05 a.m. UTC | #1
On Fri,  4 Nov 2022 15:30:53 +0100 Michal Wilczynski wrote:
>  .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   4 +-
>  drivers/net/ethernet/intel/ice/ice_common.c   |   7 +-
>  drivers/net/ethernet/intel/ice/ice_dcb.c      |   2 +-
>  drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |   4 +
>  drivers/net/ethernet/intel/ice/ice_devlink.c  | 483 ++++++++++++++++++
>  drivers/net/ethernet/intel/ice/ice_devlink.h  |   2 +
>  drivers/net/ethernet/intel/ice/ice_repr.c     |  13 +
>  drivers/net/ethernet/intel/ice/ice_sched.c    | 101 +++-
>  drivers/net/ethernet/intel/ice/ice_sched.h    |  31 +-
>  drivers/net/ethernet/intel/ice/ice_type.h     |   9 +
>  .../mellanox/mlx5/core/esw/devlink_port.c     |  14 +-
>  drivers/net/netdevsim/dev.c                   |   9 +-
>  include/net/devlink.h                         |  18 +-
>  include/uapi/linux/devlink.h                  |   3 +
>  net/core/devlink.c                            | 133 ++++-

Please provide some documentation.
Wilczynski, Michal Nov. 7, 2022, 6:18 p.m. UTC | #2
On 11/5/2022 3:05 AM, Jakub Kicinski wrote:
> On Fri,  4 Nov 2022 15:30:53 +0100 Michal Wilczynski wrote:
>>   .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   4 +-
>>   drivers/net/ethernet/intel/ice/ice_common.c   |   7 +-
>>   drivers/net/ethernet/intel/ice/ice_dcb.c      |   2 +-
>>   drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |   4 +
>>   drivers/net/ethernet/intel/ice/ice_devlink.c  | 483 ++++++++++++++++++
>>   drivers/net/ethernet/intel/ice/ice_devlink.h  |   2 +
>>   drivers/net/ethernet/intel/ice/ice_repr.c     |  13 +
>>   drivers/net/ethernet/intel/ice/ice_sched.c    | 101 +++-
>>   drivers/net/ethernet/intel/ice/ice_sched.h    |  31 +-
>>   drivers/net/ethernet/intel/ice/ice_type.h     |   9 +
>>   .../mellanox/mlx5/core/esw/devlink_port.c     |  14 +-
>>   drivers/net/netdevsim/dev.c                   |   9 +-
>>   include/net/devlink.h                         |  18 +-
>>   include/uapi/linux/devlink.h                  |   3 +
>>   net/core/devlink.c                            | 133 ++++-
> Please provide some documentation.

I provided some documentation in v10 in ice.rst file.
Unfortunately there is no devlink-rate.rst as far as I can
tell and at some point we even discussed adding this with Edward,
but honestly I think this could be added in a separate patch
series to not unnecessarily prolong merging this.

BR,
Michał
Jakub Kicinski Nov. 7, 2022, 6:31 p.m. UTC | #3
On Mon, 7 Nov 2022 19:18:10 +0100 Wilczynski, Michal wrote:
> I provided some documentation in v10 in ice.rst file.
> Unfortunately there is no devlink-rate.rst as far as I can
> tell and at some point we even discussed adding this with Edward,
> but honestly I think this could be added in a separate patch
> series to not unnecessarily prolong merging this.

You can't reply to email and then immediately post a new version :/
How am I supposed to have a conversation with you? Extremely annoying.

I'm tossing v10 from patchwork, and v11 better come with the docs :/
Wilczynski, Michal Nov. 8, 2022, 8:08 a.m. UTC | #4
On 11/7/2022 7:31 PM, Jakub Kicinski wrote:
> On Mon, 7 Nov 2022 19:18:10 +0100 Wilczynski, Michal wrote:
>> I provided some documentation in v10 in ice.rst file.
>> Unfortunately there is no devlink-rate.rst as far as I can
>> tell and at some point we even discussed adding this with Edward,
>> but honestly I think this could be added in a separate patch
>> series to not unnecessarily prolong merging this.
> You can't reply to email and then immediately post a new version :/
> How am I supposed to have a conversation with you? Extremely annoying.

I'm sorry if you find this annoying, however I can't see any harm here ?
I fixed some legit issues that you've pointed in v9, wrote some
documentation and basically said, "I wrote some documentation in
the next patchset, is it enough ?". I think it's better to get feedback
for smaller commits faster, this way I send the updated patchset
quickly.

>
> I'm tossing v10 from patchwork, and v11 better come with the docs :/

I will however create a new devlink-rate.rst file if you insist.

BR,
Michał
Wilczynski, Michal Nov. 8, 2022, 8:36 a.m. UTC | #5
On 11/8/2022 9:08 AM, Wilczynski, Michal wrote:
>
>
> On 11/7/2022 7:31 PM, Jakub Kicinski wrote:
>> On Mon, 7 Nov 2022 19:18:10 +0100 Wilczynski, Michal wrote:
>>> I provided some documentation in v10 in ice.rst file.
>>> Unfortunately there is no devlink-rate.rst as far as I can
>>> tell and at some point we even discussed adding this with Edward,
>>> but honestly I think this could be added in a separate patch
>>> series to not unnecessarily prolong merging this.
>> You can't reply to email and then immediately post a new version :/
>> How am I supposed to have a conversation with you? Extremely annoying.
>
> I'm sorry if you find this annoying, however I can't see any harm here ?
> I fixed some legit issues that you've pointed in v9, wrote some
> documentation and basically said, "I wrote some documentation in
> the next patchset, is it enough ?". I think it's better to get feedback
> for smaller commits faster, this way I send the updated patchset
> quickly.
>
>>
>> I'm tossing v10 from patchwork, and v11 better come with the docs :/
>
> I will however create a new devlink-rate.rst file if you insist.
>
> BR,
> Michał

There is however a mention about rate-object management in
devlink-port.rst. Would it be okay to extend devlink-por.rstt with new
attributes tx_priority, tx_weight instead of creating a new
devlink-rate.rst ?

BR,
Michał

>
>
Keller, Jacob E Nov. 8, 2022, 4:54 p.m. UTC | #6
On 11/8/2022 12:08 AM, Wilczynski, Michal wrote:
> 
> 
> On 11/7/2022 7:31 PM, Jakub Kicinski wrote:
>> On Mon, 7 Nov 2022 19:18:10 +0100 Wilczynski, Michal wrote:
>>> I provided some documentation in v10 in ice.rst file.
>>> Unfortunately there is no devlink-rate.rst as far as I can
>>> tell and at some point we even discussed adding this with Edward,
>>> but honestly I think this could be added in a separate patch
>>> series to not unnecessarily prolong merging this.
>> You can't reply to email and then immediately post a new version :/
>> How am I supposed to have a conversation with you? Extremely annoying.
> 
> I'm sorry if you find this annoying, however I can't see any harm here ?
> I fixed some legit issues that you've pointed in v9, wrote some
> documentation and basically said, "I wrote some documentation in
> the next patchset, is it enough ?". I think it's better to get feedback
> for smaller commits faster, this way I send the updated patchset
> quickly.
> 

 From 
https://docs.kernel.org/process/maintainer-netdev.html?highlight=netdev

> 2. netdev FAQ¶
> 2.1. tl;dr¶
> designate your patch to a tree - [PATCH net] or [PATCH net-next]
> 
> for fixes the Fixes: tag is required, regardless of the tree
> 
> don’t post large series (> 15 patches), break them up
> 
> don’t repost your patches within one 24h period
> 
> reverse xmas tree

Giving everyone at least 24 hours per posting (if not more) helps ensure 
that limited reviewer time doesn't get overloaded by constantly 
re-reviewing the same code. It also helps ensure that everyone has a 
chance to look at the patches.

Thanks,
Jake
Wilczynski, Michal Nov. 8, 2022, 6:10 p.m. UTC | #7
On 11/8/2022 5:54 PM, Jacob Keller wrote:
>
>
> On 11/8/2022 12:08 AM, Wilczynski, Michal wrote:
>>
>>
>> On 11/7/2022 7:31 PM, Jakub Kicinski wrote:
>>> On Mon, 7 Nov 2022 19:18:10 +0100 Wilczynski, Michal wrote:
>>>> I provided some documentation in v10 in ice.rst file.
>>>> Unfortunately there is no devlink-rate.rst as far as I can
>>>> tell and at some point we even discussed adding this with Edward,
>>>> but honestly I think this could be added in a separate patch
>>>> series to not unnecessarily prolong merging this.
>>> You can't reply to email and then immediately post a new version :/
>>> How am I supposed to have a conversation with you? Extremely annoying.
>>
>> I'm sorry if you find this annoying, however I can't see any harm here ?
>> I fixed some legit issues that you've pointed in v9, wrote some
>> documentation and basically said, "I wrote some documentation in
>> the next patchset, is it enough ?". I think it's better to get feedback
>> for smaller commits faster, this way I send the updated patchset
>> quickly.
>>
>
> From 
> https://docs.kernel.org/process/maintainer-netdev.html?highlight=netdev
>
>> 2. netdev FAQ¶
>> 2.1. tl;dr¶
>> designate your patch to a tree - [PATCH net] or [PATCH net-next]
>>
>> for fixes the Fixes: tag is required, regardless of the tree
>>
>> don’t post large series (> 15 patches), break them up
>>
>> don’t repost your patches within one 24h period
>>
>> reverse xmas tree
>
> Giving everyone at least 24 hours per posting (if not more) helps 
> ensure that limited reviewer time doesn't get overloaded by constantly 
> re-reviewing the same code. It also helps ensure that everyone has a 
> chance to look at the patches.

Again, I'm sorry if my actions were annoying, or seemed overly impatient.
My question would be then, when I am allowed to re-post v11 ?
Technically I didn't re-post my series during 24h period, I just responded
to comments after re-sending a new version :(.
I guess I should wait for Kuba response now, and only after getting a
response re-send a v11  ?

Thanks,
Michał


>
> Thanks,
> Jake
Jakub Kicinski Nov. 8, 2022, 10:34 p.m. UTC | #8
On Tue, 8 Nov 2022 09:36:01 +0100 Wilczynski, Michal wrote:
> > On 11/7/2022 7:31 PM, Jakub Kicinski wrote:  
> >> You can't reply to email and then immediately post a new version :/
> >> How am I supposed to have a conversation with you? Extremely annoying.  
> >
> > I'm sorry if you find this annoying, however I can't see any harm here ?
> > I fixed some legit issues that you've pointed in v9, wrote some
> > documentation and basically said, "I wrote some documentation in
> > the next patchset, is it enough ?". I think it's better to get feedback
> > for smaller commits faster, this way I send the updated patchset
> > quickly.

Perhaps spending some time reading the list would help you understand
what normal development upstream looks like. Posting the N + 1 version
of a patch set and then replying to comments on version N is confusing
because it's impossible to decide where the conversation is taking
place. Should I reply to you on version N or reply to N + 1 even tho
there I can't quote your reply. And will the maintainer who's applying
the patches understand that N + 1 got rejected if the discussion is
happening under the thread of version N?

Perhaps one has to read the list to appreciate the challenges involved.

> >> I'm tossing v10 from patchwork, and v11 better come with the docs :/  
> >
> > I will however create a new devlink-rate.rst file if you insist.
> 
> There is however a mention about rate-object management in
> devlink-port.rst. Would it be okay to extend devlink-por.rstt with new
> attributes tx_priority, tx_weight instead of creating a new
> devlink-rate.rst ?

Sounds good, but please make sure you describe the interaction between
the params and the algorithm. We don't have a SW implementation like we
have for qdiscs here, and we don't want each vendor to be coming up
with their own interpretation of the arguments. So we need solid docs,
and some pseudo code to describe the behavior, perhaps?

Let me ask some extra questions on the doc.