Message ID: 1622636251-29892-1-git-send-email-dlinkin@nvidia.com (mailing list archive)
Series: devlink: rate objects API
On Wed, 2 Jun 2021 15:17:13 +0300 dlinkin@nvidia.com wrote:
> From: Dmytro Linkin <dlinkin@nvidia.com>
>
> Resending without RFC.
>
> Currently the kernel provides a way to change the tx rate of a single
> VF in switchdev mode via the tc-police action. When lots of VFs are
> configured, managing their rates becomes a non-trivial task and some
> grouping mechanism is required. Implementing such grouping in
> tc-police would bring flow-related limitations and unwanted
> complications:
> - tc-police is a policer, while there is a user request for a traffic
>   shaper, so a shared tc-police action is not suitable;
> - flows require a net device to be attached to, meaning "groups" would
>   not have a net device instance of their own. With the previous point
>   in mind, a solution was reviewed where the representor has a policer
>   and the driver uses a shaper if the qdisc contains a group of VFs;
>   such an approach is ugly, complicated and misleading;
> - TC is ingress-only, while configuring the "other" side of the wire
>   better matches the "real" picture, where shaping sits outside of the
>   steering world, similar to the "ip link" command.
>
> According to that, devlink is the most appropriate place.

I don't think you researched TC well enough. But whatever, I'm tired
of being the only one who pushes back given I neither work on nor use
any of these features.

You need to provide a real implementation for this new uAPI, tho.
netdevsim won't cut it.
On 6/2/21 7:58 PM, Jakub Kicinski wrote:
> On Wed, 2 Jun 2021 15:17:13 +0300 dlinkin@nvidia.com wrote:
>> From: Dmytro Linkin <dlinkin@nvidia.com>
>>
>> [...]
>>
>> According to that devlink is the most appropriate place.
>
> I don't think you researched TC well enough. But whatever, I'm tired
> of being the only one who pushes back given I neither work on or use
> any of these features.
>
> You need to provide a real implementation for this new uAPI, tho.
> netdevsim won't cut it.

+Saeed

The series is already big enough to add more patches to it, and the
implementation (mlx5_core) must go through Saeed. How would you like
to proceed?
On 2021/6/3 0:58, Jakub Kicinski wrote:
> On Wed, 2 Jun 2021 15:17:13 +0300 dlinkin@nvidia.com wrote:
>> From: Dmytro Linkin <dlinkin@nvidia.com>
>>
>> [...]
>>
>> According to that devlink is the most appropriate place.
>
> I don't think you researched TC well enough. But whatever, I'm tired
> of being the only one who pushes back given I neither work on or use
> any of these features.

The tc action offload feature used in [1] seems to solve the police
action lifecycle problem? It also seems to allow different flows to use
the same action; I am not sure whether different functions can use the
same action. It seems Jianbo has mentioned the same use case.

1. https://lore.kernel.org/netdev/CALnP8ZaZQAbvm1girLUSLcFZTKV5MvBMEtN67OiA55OAvsO_1Q@mail.gmail.com/T/
From: Dmytro Linkin <dlinkin@nvidia.com>

Resending without RFC.

Currently the kernel provides a way to change the tx rate of a single VF
in switchdev mode via the tc-police action. When lots of VFs are
configured, managing their rates becomes a non-trivial task and some
grouping mechanism is required. Implementing such grouping in tc-police
would bring flow-related limitations and unwanted complications:
- tc-police is a policer, while there is a user request for a traffic
  shaper, so a shared tc-police action is not suitable;
- flows require a net device to be attached to, meaning "groups" would
  not have a net device instance of their own. With the previous point
  in mind, a solution was reviewed where the representor has a policer
  and the driver uses a shaper if the qdisc contains a group of VFs;
  such an approach is ugly, complicated and misleading;
- TC is ingress-only, while configuring the "other" side of the wire
  better matches the "real" picture, where shaping sits outside of the
  steering world, similar to the "ip link" command.

According to that, devlink is the most appropriate place.

This series introduces a devlink API for managing the tx rate of a
single devlink port or of a group by invoking callbacks (see below) of
the corresponding driver. A devlink port or a group can also be added
to a parent group, where the driver is responsible for handling the
rates of the group's elements. To achieve all of that, a new rate
object is added. It can be one of two types:
- leaf - represents a single devlink port; created/destroyed by the
  driver and bound to the devlink port. As an example, a driver may
  create a leaf rate object for every devlink port associated with a
  VF. Since a leaf has a 1:1 mapping to its devlink port, in user space
  it is referred to as pci/<bus_addr>/<port_index>;
- node - represents a group of rate objects; created/deleted by request
  from user space; initially empty (no rate objects added).
  In user space it is referred to as pci/<bus_addr>/<node_name>, where
  the node name can be anything except a decimal number, to avoid
  collisions with leafs.

devlink_ops is extended with the following callbacks:
- rate_{leaf|node}_tx_{share|max}_set
- rate_node_{new|del}
- rate_{leaf|node}_parent_set

KAPI provides:
- creation/destruction of the leaf rate object associated with a
  devlink port
- destruction of rate nodes, to allow a vendor driver to free allocated
  resources on driver removal, or for other reasons when node
  destruction is required

UAPI provides:
- dumping all or single rate objects
- setting tx_{share|max} of a rate object of any type
- creating/deleting node rate objects
- setting/unsetting the parent of any rate object

Added devlink rate object support for the netdevsim driver.

Issues/open questions:
- Does the user need a DEVLINK_CMD_RATE_DEL_ALL_CHILD command to clean
  all children of a particular parent node? For example:
  $ devlink port function rate flush netdevsim/netdevsim10/group
- The priv pointer passed to the callbacks is a source of bugs; in the
  leaf case the driver can embed the rate object into an internal
  structure and use container_of() on it; in the node case this cannot
  be done, since nodes are created from user space.

v1->v2:
- fixed kernel-doc for devlink_rate_leaf_{create|destroy}()
- s/func/function/ for all devlink port command occurrences

v2->v3:
- devlink:
  - added devlink_rate_nodes_destroy() function
- netdevsim:
  - added call of devlink_rate_nodes_destroy() function

Dmytro Linkin (18):
  netdevsim: Add max_vfs to bus_dev
  netdevsim: Disable VFs on nsim_dev_reload_destroy() call
  netdevsim: Implement port types and indexing
  netdevsim: Implement VFs
  netdevsim: Implement legacy/switchdev mode for VFs
  devlink: Introduce rate object
  netdevsim: Register devlink rate leaf objects per VF
  selftest: netdevsim: Add devlink rate test
  devlink: Allow setting tx rate for devlink rate leaf objects
  netdevsim: Implement devlink rate leafs tx rate support
  selftest: netdevsim: Add devlink port shared/max tx rate test
  devlink: Introduce rate nodes
  netdevsim: Implement support for devlink rate nodes
  selftest: netdevsim: Add devlink rate nodes test
  devlink: Allow setting parent node of rate objects
  netdevsim: Allow setting parent node of rate objects
  selftest: netdevsim: Add devlink rate grouping test
  Documentation: devlink rate objects

 Documentation/networking/devlink/devlink-port.rst  |  35 ++
 Documentation/networking/devlink/netdevsim.rst     |  26 +
 drivers/net/netdevsim/bus.c                        | 131 +++-
 drivers/net/netdevsim/dev.c                        | 396 ++++++++++++-
 drivers/net/netdevsim/netdev.c                     |  95 ++-
 drivers/net/netdevsim/netdevsim.h                  |  48 ++
 include/net/devlink.h                              |  48 ++
 include/uapi/linux/devlink.h                       |  17 +
 net/core/devlink.c                                 | 660 ++++++++++++++++++++-
 .../selftests/drivers/net/netdevsim/devlink.sh     | 167 +++++-
 10 files changed, 1565 insertions(+), 58 deletions(-)
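To make the intended workflow concrete, below is a hedged sketch of how the new rate objects might be driven from user space with the devlink utility, in the style of the flush example above. This is a configuration transcript, not part of the series: it assumes matching iproute2 support, a running netdevsim instance, and illustrative names (netdevsim10, node "group1", leaf port index 1); the exact command syntax may differ from what ships with the final user-space patches.

```shell
# Create an (initially empty) node rate object; the node name must not
# be a decimal number, to avoid colliding with leaf port indices.
devlink port function rate add netdevsim/netdevsim10/group1

# Shape the whole group: tx_share is the guaranteed bandwidth,
# tx_max the ceiling (values here are purely illustrative).
devlink port function rate set netdevsim/netdevsim10/group1 \
        tx_share 10mbit tx_max 100mbit

# Attach a VF's leaf rate object (referred to by its port index)
# to the node, so the driver enforces the group limits over it.
devlink port function rate set netdevsim/netdevsim10/1 parent group1

# Dump all rate objects, leafs and nodes alike.
devlink port function rate show
```

Note how leafs and nodes share one namespace under the device (pci/<bus_addr>/... or, for netdevsim, netdevsim/<dev>/...), which is why node names are forbidden from being decimal numbers.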