diff mbox series

[RFC] net: introduce HW Rate Limiting Driver API

Message ID 3d1e2d945904a0fb55258559eb7322d7e11066b6.1715199358.git.pabeni@redhat.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC] net: introduce HW Rate Limiting Driver API | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4858 this patch: 4858
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: edumazet@google.com
netdev/build_clang success Errors and warnings before: 1038 this patch: 1038
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5134 this patch: 5134
netdev/checkpatch warning WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? WARNING: line length of 84 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc fail Errors and warnings before: 22 this patch: 23
netdev/source_inline success Was 0 now: 0

Commit Message

Paolo Abeni May 8, 2024, 8:20 p.m. UTC
This is the first incarnation in a formal (pre-RFC) patch of the
HW TX Rate Limiting Driver API proposal shared here[1].

The goal is to outline the proposed APIs before pushing the actual
implementation.

The network devices gain a new ops struct to directly manipulate the
H/W shapers implemented by the NIC.

The shapers can be attached to a pre-defined set of 'domains' - port,
vf, etc. - and the overall shapers configuration pushed to the H/W is
maintained by the kernel.

Each shaper is identified by an unique integer id based on the domain
and additional domain-specific information - e.g. for the VF domain, the
virtual function number/identifier.

[1] https://lore.kernel.org/netdev/20240405102313.GA310894@kernel.org/

Co-developed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/netdevice.h |  15 +++
 include/net/net_shaper.h  | 206 ++++++++++++++++++++++++++++++++++++++
 net/Kconfig               |   3 +
 3 files changed, 224 insertions(+)
 create mode 100644 include/net/net_shaper.h

Comments

Andrew Lunn May 8, 2024, 9:47 p.m. UTC | #1
Hi Paolo

> + * struct net_shaper_info - represents a shaping node on the NIC H/W
> + * @metric: Specify if the bw limits refers to PPS or BPS
> + * @bw_min: Minimum guaranteed rate for this shaper
> + * @bw_max: Maximum peak bw allowed for this shaper
> + * @burst: Maximum burst for the peek rate of this shaper
> + * @priority: Scheduling priority for this shaper
> + * @weight: Scheduling weight for this shaper
> + */
> +struct net_shaper_info {
> +	enum net_shaper_metric metric;
> +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> +	u64 bw_max;	/* maximum allowed bandwidth */
> +	u32 burst;	/* maximum burst in bytes for bw_max */
> +	u32 priority;	/* scheduling strict priority */

Above it say priority. Here is strict priority? Is there a difference?

> +	u32 weight;	/* scheduling WRR weight*/
> +};

Are there any special semantics for weight? Looking at the hardware i
have, which has 8 queues for a switch port, i can either set it to
strict priority, meaning queue 7 needs to be empty before it look at
queue 6, and it will only look at queue 5 when 6 is empty etc. Or i
can set weights per queue. How would i expect strict priority?

Shaping itself is not performed on the queues, but the port. So it
seems like i should create 8 net_shaper_info and set the weight in
each, and everything else to 0? And then create a queue group shaper,
put the 8 queue shapers into it, and then configure bw_max for the
group? Everything else is 0, because that is all i can configure.

Does this sound correct?

> + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.

Are they also available on plain boring devices which don't have PFs
or VFs?

Would i be correct in assuming my driver should just create these
shapers. There will be some netlink calls to allow user space to
enumerate them, and display the relationships between them? And
netlink calls to set values in a shaper? Will there be a way to say
which fields are actually settable, since i doubt most hardware will
have everything? In my case, only one field appears to be relevant in
each shaper, and maybe we want to give a hint about that to userspace?

	Andrew
Paolo Abeni May 9, 2024, 2:19 p.m. UTC | #2
On Wed, 2024-05-08 at 23:47 +0200, Andrew Lunn wrote:
> > + * struct net_shaper_info - represents a shaping node on the NIC H/W
> > + * @metric: Specify if the bw limits refers to PPS or BPS
> > + * @bw_min: Minimum guaranteed rate for this shaper
> > + * @bw_max: Maximum peak bw allowed for this shaper
> > + * @burst: Maximum burst for the peek rate of this shaper
> > + * @priority: Scheduling priority for this shaper
> > + * @weight: Scheduling weight for this shaper
> > + */
> > +struct net_shaper_info {
> > +	enum net_shaper_metric metric;
> > +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> > +	u64 bw_max;	/* maximum allowed bandwidth */
> > +	u32 burst;	/* maximum burst in bytes for bw_max */
> > +	u32 priority;	/* scheduling strict priority */
> 
> Above it say priority. Here is strict priority? Is there a difference?

the 'priority' field is the strict priority for scheduling. We will
clarify the first comment.

> 
> > +	u32 weight;	/* scheduling WRR weight*/
> > +};
> 
> Are there any special semantics for weight? Looking at the hardware i
> have, which has 8 queues for a switch port, i can either set it to
> strict priority, meaning queue 7 needs to be empty before it look at
> queue 6, and it will only look at queue 5 when 6 is empty etc. Or i
> can set weights per queue. How would i expect strict priority?

The expected behaviour is that schedulers at the same level with the
same priority will be served according to their weight.

I'm unsure if I read your question correctly. My understanding is that
the your H/W don't allow strict priority and WRR simultaneously. In
such case, the set/create operations should reject calls setting both
non zero weight and priority values.

> Shaping itself is not performed on the queues, but the port. So it
> seems like i should create 8 net_shaper_info and set the weight in
> each, and everything else to 0? And then create a queue group shaper,
> put the 8 queue shapers into it, and then configure bw_max for the
> group? Everything else is 0, because that is all i can configure.
> 
> Does this sound correct?

It sounds correct. You can avoid creating the queue group and set the
rate at the NETDEV scope, or possibly not even set/create such shaper:
the transmission is expected to be limited by the line rate.

> > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> 
> Are they also available on plain boring devices which don't have PFs
> or VFs?

Yes, a driver is free to implement/support an arbitrary subset of the
available feature. Operations attempting to set an unsupported state
should fail with a relevant extended ack.

> Would i be correct in assuming my driver should just create these
> shapers. There will be some netlink calls to allow user space to
> enumerate them, and display the relationships between them?

Yes, the idea/plan/roadmap is to implement yml-based netlink API to
fully expose the feature to the user-space.

>  And
> netlink calls to set values in a shaper? Will there be a way to say
> which fields are actually settable, since i doubt most hardware will
> have everything? 

This was not planned yet, even because some (most?) H/W can have non-
trivial constraints to expose. e.g. WRR is available at the queue scope
only, and only if there is at most one SP shaper somewhere else. 

> In my case, only one field appears to be relevant in
> each shaper, and maybe we want to give a hint about that to userspace?

Any suggestion on how to expose that in reasonable way more then
welcome!

Thanks,

Paolo
Andrew Lunn May 9, 2024, 3 p.m. UTC | #3
On Thu, May 09, 2024 at 04:19:52PM +0200, Paolo Abeni wrote:
> On Wed, 2024-05-08 at 23:47 +0200, Andrew Lunn wrote:
> > > + * struct net_shaper_info - represents a shaping node on the NIC H/W
> > > + * @metric: Specify if the bw limits refers to PPS or BPS
> > > + * @bw_min: Minimum guaranteed rate for this shaper
> > > + * @bw_max: Maximum peak bw allowed for this shaper
> > > + * @burst: Maximum burst for the peek rate of this shaper
> > > + * @priority: Scheduling priority for this shaper
> > > + * @weight: Scheduling weight for this shaper
> > > + */
> > > +struct net_shaper_info {
> > > +	enum net_shaper_metric metric;
> > > +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> > > +	u64 bw_max;	/* maximum allowed bandwidth */
> > > +	u32 burst;	/* maximum burst in bytes for bw_max */
> > > +	u32 priority;	/* scheduling strict priority */
> > 
> > Above it say priority. Here is strict priority? Is there a difference?
> 
> the 'priority' field is the strict priority for scheduling. We will
> clarify the first comment.
> 
> > 
> > > +	u32 weight;	/* scheduling WRR weight*/
> > > +};
> > 
> > Are there any special semantics for weight? Looking at the hardware i
> > have, which has 8 queues for a switch port, i can either set it to
> > strict priority, meaning queue 7 needs to be empty before it look at
> > queue 6, and it will only look at queue 5 when 6 is empty etc. Or i
> > can set weights per queue. How would i expect strict priority?
> 
> The expected behaviour is that schedulers at the same level with the
> same priority will be served according to their weight.
> 
> I'm unsure if I read your question correctly. My understanding is that
> the your H/W don't allow strict priority and WRR simultaneously.

Correct. There is one bit to select between the two. If WRR is
enabled, it then becomes possible for some generations of the hardware
to configure the weights, for others the weights are fixed in silicon.

> In
> such case, the set/create operations should reject calls setting both
> non zero weight and priority values.

So in order to set strict priority, i need to set the priority field
to 7, 6, 5, 4, 3, 2, 1, 0, for my 8 queues, and weight to 0. For WRR,
i need to set the priority fields to 0, and the weight values, either
to what is hard coded in the silicon, or valid weights when it is
configurable.

Now the question is, how do i get between these two states? It is not
possible to mix WRR and strict priority. Any kAPI which only modifies
one queue at once will go straight into an invalid state, and the
driver will need to return -EOPNOTSUPP. So it seems like there needs
to be an atomic set N queue configuration at once, so i can cleanly go
from strict priority across 8 queues to WRR across 8 queues. Is that
foreseen?

> It sounds correct. You can avoid creating the queue group and set the
> rate at the NETDEV scope, or possibly not even set/create such shaper:
> the transmission is expected to be limited by the line rate.

Well, the hardware exists, i probably should just create the shapers
to match the hardware. However, if i set the hardware equivalent of
bw_max to 0, it then uses line rate. Is this something we want to
document? You can disable a shaper from shaping by setting the
bandwidth to 0? Or do we want a separate enable/disable bit in the
structure?

> 
> > > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> > > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> > 
> > Are they also available on plain boring devices which don't have PFs
> > or VFs?
> 
> Yes, a driver is free to implement/support an arbitrary subset of the
> available feature. Operations attempting to set an unsupported state
> should fail with a relevant extended ack.

Great. Please update the comment.

> > In my case, only one field appears to be relevant in
> > each shaper, and maybe we want to give a hint about that to userspace?
> 
> Any suggestion on how to expose that in reasonable way more then
> welcome!

We might first want to gather knowledge on what these shapers actually
look like in different hardwares. There are going to be some which are
very limited fixed functions as in the hardware i have, probably most
SOHO switches are like this. And then we have TOR and smart NICs which
might be able to create and destroy shapers on the fly, and are a lot
less restrictive.

     Andrew
Andrew Lunn May 9, 2024, 3:09 p.m. UTC | #4
> +/**
> + * enum net_shaper_metric - the metric of the shaper
> + * @NET_SHAPER_METRIC_PPS: Shaper operates on a packets per second basis
> + * @NET_SHAPER_METRIC_BPS: Shaper operates on a bits per second basis
> + */
> +enum net_shaper_metric {
> +	NET_SHAPER_METRIC_PPS,
> +	NET_SHAPER_METRIC_BPS
> +};

I think we need a third value. In the hardware i'm talking about, the
8 queues themselves don't do any shaping. All i'm configuring is WRR
vs strict priority order. So ideally i want something to indicate this
shaper does not actually shape.

       Andrew
Paolo Abeni May 9, 2024, 3:43 p.m. UTC | #5
On Thu, 2024-05-09 at 17:00 +0200, Andrew Lunn wrote:
> On Thu, May 09, 2024 at 04:19:52PM +0200, Paolo Abeni wrote:
> > On Wed, 2024-05-08 at 23:47 +0200, Andrew Lunn wrote:
> > > > + * struct net_shaper_info - represents a shaping node on the NIC H/W
> > > > + * @metric: Specify if the bw limits refers to PPS or BPS
> > > > + * @bw_min: Minimum guaranteed rate for this shaper
> > > > + * @bw_max: Maximum peak bw allowed for this shaper
> > > > + * @burst: Maximum burst for the peek rate of this shaper
> > > > + * @priority: Scheduling priority for this shaper
> > > > + * @weight: Scheduling weight for this shaper
> > > > + */
> > > > +struct net_shaper_info {
> > > > +	enum net_shaper_metric metric;
> > > > +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> > > > +	u64 bw_max;	/* maximum allowed bandwidth */
> > > > +	u32 burst;	/* maximum burst in bytes for bw_max */
> > > > +	u32 priority;	/* scheduling strict priority */
> > > 
> > > Above it say priority. Here is strict priority? Is there a difference?
> > 
> > the 'priority' field is the strict priority for scheduling. We will
> > clarify the first comment.
> > 
> > > 
> > > > +	u32 weight;	/* scheduling WRR weight*/
> > > > +};
> > > 
> > > Are there any special semantics for weight? Looking at the hardware i
> > > have, which has 8 queues for a switch port, i can either set it to
> > > strict priority, meaning queue 7 needs to be empty before it look at
> > > queue 6, and it will only look at queue 5 when 6 is empty etc. Or i
> > > can set weights per queue. How would i expect strict priority?
> > 
> > The expected behaviour is that schedulers at the same level with the
> > same priority will be served according to their weight.
> > 
> > I'm unsure if I read your question correctly. My understanding is that
> > the your H/W don't allow strict priority and WRR simultaneously.
> 
> Correct. There is one bit to select between the two. If WRR is
> enabled, it then becomes possible for some generations of the hardware
> to configure the weights, for others the weights are fixed in silicon.
> 
> > In
> > such case, the set/create operations should reject calls setting both
> > non zero weight and priority values.
> 
> So in order to set strict priority, i need to set the priority field
> to 7, 6, 5, 4, 3, 2, 1, 0, for my 8 queues, and weight to 0. For WRR,
> i need to set the priority fields to 0, and the weight values, either
> to what is hard coded in the silicon, or valid weights when it is
> configurable.
> 
> Now the question is, how do i get between these two states? It is not
> possible to mix WRR and strict priority. Any kAPI which only modifies
> one queue at once will go straight into an invalid state, and the
> driver will need to return -EOPNOTSUPP. So it seems like there needs
> to be an atomic set N queue configuration at once, so i can cleanly go
> from strict priority across 8 queues to WRR across 8 queues. Is that
> foreseen?

You could delete all the WRR shapers and then create/add SP based ones.

> > It sounds correct. You can avoid creating the queue group and set the
> > rate at the NETDEV scope, or possibly not even set/create such shaper:
> > the transmission is expected to be limited by the line rate.
> 
> Well, the hardware exists, i probably should just create the shapers
> to match the hardware. However, if i set the hardware equivalent of
> bw_max to 0, it then uses line rate. Is this something we want to
> document? You can disable a shaper from shaping by setting the
> bandwidth to 0? Or do we want a separate enable/disable bit in the
> structure?

I documenting that '0' means 'unlimited' for bw_max fields should be
good enough. If the NIC can't support any kind of shaping it will
reject with an appropriate extended ack any configuration change with
bw != 0.

I guess we need another comment update here :)


> > > > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> > > > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> > > 
> > > Are they also available on plain boring devices which don't have PFs
> > > or VFs?
> > 
> > Yes, a driver is free to implement/support an arbitrary subset of the
> > available feature. Operations attempting to set an unsupported state
> > should fail with a relevant extended ack.
> 
> Great. Please update the comment.

Sure, will do.

> > > In my case, only one field appears to be relevant in
> > > each shaper, and maybe we want to give a hint about that to userspace?
> > 
> > Any suggestion on how to expose that in reasonable way more then
> > welcome!
> 
> We might first want to gather knowledge on what these shapers actually
> look like in different hardwares. There are going to be some which are
> very limited fixed functions as in the hardware i have, probably most
> SOHO switches are like this. And then we have TOR and smart NICs which
> might be able to create and destroy shapers on the fly, and are a lot
> less restrictive.

As a reference the initial configuration status has been discussed
here:

https://lore.kernel.org/netdev/20240410075745.4637c537@kernel.org/

(grep for 'Transforming arbitrary pre-existing driver hierarchy')

The conclusion was (to my understanding) to reduce the core complexity
enforcing an uniform views of the NIC defaults.

The 'create' op is just an abstraction to tell the NIC to switch from
the default configuration to the specified one.

The default configuration means 'no restrictions/shaping beyond the
link rate'. If the NIC has buildin shapers it can't disable, it must
initialize them to match the above. In both case the kernel view of the
default will be 'no shapers added'

The 'delete' op will restore the default for the specified scope
location. If the NIC has a persistent shaper there, it will update the
setting to the default, or actually delete the H/W shaper if possible.
In both case the kernel view of such scope location with return to be
'no shapers added'

My understading/guess/hope is that is possible to implement the above
default with any H/W.

Does the above solve your doubt/answer your question?

Thanks!

Paolo
Andrew Lunn May 9, 2024, 4:17 p.m. UTC | #6
> > Now the question is, how do i get between these two states? It is not
> > possible to mix WRR and strict priority. Any kAPI which only modifies
> > one queue at once will go straight into an invalid state, and the
> > driver will need to return -EOPNOTSUPP. So it seems like there needs
> > to be an atomic set N queue configuration at once, so i can cleanly go
> > from strict priority across 8 queues to WRR across 8 queues. Is that
> > foreseen?
> 
> You could delete all the WRR shapers and then create/add SP based ones.

But that does not match the hardware. I cannot delete the hardware. It
will either do strict priority or WRR. If i delete the software
representation of the shaper, the hardware shaper will keep on doing
what it was doing. So i don't see this as a good model. I think the
driver will create shapers to represent the hardware, and you are not
allowed to delete them or add more of them, because that is what the
hardware is. All you can do is configure the shapers that exist.

> The 'create' op is just an abstraction to tell the NIC to switch from
> the default configuration to the specified one.

Well, the hardware default is i think WRR for the queues, and line
rate. That will be what the software representation of the shapers
will be set to when the driver probes and creates the shapers
representors.

	Andrew
Naveen Mamindlapalli May 10, 2024, 7:10 a.m. UTC | #7
Hi Paolo,

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, May 9, 2024 1:51 AM
> To: netdev@vger.kernel.org
> Cc: Jakub Kicinski <kuba@kernel.org>; Jiri Pirko <jiri@resnulli.us>; Madhu
> Chittim <madhu.chittim@intel.com>; Sridhar Samudrala
> <sridhar.samudrala@intel.com>; Simon Horman <horms@kernel.org>; John
> Fastabend <john.fastabend@gmail.com>; Sunil Kovvuri Goutham
> <sgoutham@marvell.com>; Jamal Hadi Salim <jhs@mojatatu.com>
> Subject: [RFC PATCH] net: introduce HW Rate Limiting Driver API
> 
> This is the first incarnation in a formal (pre-RFC) patch of the HW TX Rate
> Limiting Driver API proposal shared here[1].
> 
> The goal is to outline the proposed APIs before pushing the actual implementation.
> 
> The network devices gain a new ops struct to directly manipulate the H/W
> shapers implemented by the NIC.
> 
> The shapers can be attached to a pre-defined set of 'domains' - port, vf, etc. - and
> the overall shapers configuration pushed to the H/W is maintained by the kernel.
> 
> Each shaper is identified by an unique integer id based on the domain and
> additional domain-specific information - e.g. for the VF domain, the virtual function
> number/identifier.
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__lore.kernel.org_netdev_20240405102313.GA310894-
> 40kernel.org_&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=TwreqwV6mQ8K9wI
> pqwFO8yjikO_w1jUOe2MzChg4Rmg&m=g3KFBILzzqT9znnKDO0ePHOi5tc0Zzy6r
> G9xL3GdGKS_9IkdC3q-
> okhCTptMrIKN&s=hVQbF8IF41zg6I197jY8r96uQq0oQbgTfe4-4HF6c3M&e=
> 
> Co-developed-by: Simon Horman <horms@kernel.org>
> Signed-off-by: Simon Horman <horms@kernel.org>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/netdevice.h |  15 +++
>  include/net/net_shaper.h  | 206
> ++++++++++++++++++++++++++++++++++++++
>  net/Kconfig               |   3 +
>  3 files changed, 224 insertions(+)
>  create mode 100644 include/net/net_shaper.h
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index
> cf261fb89d73..39f66af014be 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -79,6 +79,8 @@ struct xdp_buff;
>  struct xdp_frame;
>  struct xdp_metadata_ops;
>  struct xdp_md;
> +struct net_shaper_ops;
> +struct net_shaper_data;
> 
>  typedef u32 xdp_features_t;
> 
> @@ -1596,6 +1598,13 @@ struct net_device_ops {
>  	int			(*ndo_hwtstamp_set)(struct net_device *dev,
>  						    struct kernel_hwtstamp_config
> *kernel_config,
>  						    struct netlink_ext_ack
> *extack);
> +
> +#if IS_ENABLED(CONFIG_NET_SHAPER)
> +	/** @net_shaper_ops: Device shaping offload operations
> +	 * see include/net/net_shapers.h
> +	 */
> +	const struct net_shaper_ops *net_shaper_ops; #endif
>  };
> 
>  /**
> @@ -2403,6 +2412,12 @@ struct net_device {
>  	/** @page_pools: page pools created for this netdevice */
>  	struct hlist_head	page_pools;
>  #endif
> +#if IS_ENABLED(CONFIG_NET_SHAPER)
> +	/** @net_shaper_data: data tracking the current shaper status
> +	 *  see include/net/net_shapers.h
> +	 */
> +	struct net_shaper_data *net_shaper_data; #endif
>  };
>  #define to_net_dev(d) container_of(d, struct net_device, dev)
> 
> diff --git a/include/net/net_shaper.h b/include/net/net_shaper.h new file mode
> 100644 index 000000000000..a4fbadd99870
> --- /dev/null
> +++ b/include/net/net_shaper.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#ifndef _NET_SHAPER_H_
> +#define _NET_SHAPER_H_
> +
> +#include <linux/types.h>
> +#include <linux/netdevice.h>
> +#include <linux/netlink.h>
> +
> +/**
> + * enum net_shaper_metric - the metric of the shaper
> + * @NET_SHAPER_METRIC_PPS: Shaper operates on a packets per second
> +basis
> + * @NET_SHAPER_METRIC_BPS: Shaper operates on a bits per second basis
> +*/ enum net_shaper_metric {
> +	NET_SHAPER_METRIC_PPS,
> +	NET_SHAPER_METRIC_BPS
> +};
> +
> +/**
> + * struct net_shaper_info - represents a shaping node on the NIC H/W
> + * @metric: Specify if the bw limits refers to PPS or BPS
> + * @bw_min: Minimum guaranteed rate for this shaper
> + * @bw_max: Maximum peak bw allowed for this shaper
> + * @burst: Maximum burst for the peek rate of this shaper
> + * @priority: Scheduling priority for this shaper
> + * @weight: Scheduling weight for this shaper  */ struct
> +net_shaper_info {
> +	enum net_shaper_metric metric;
> +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> +	u64 bw_max;	/* maximum allowed bandwidth */
> +	u32 burst;	/* maximum burst in bytes for bw_max */
> +	u32 priority;	/* scheduling strict priority */
> +	u32 weight;	/* scheduling WRR weight*/
> +};
> +
> +/**
> + * enum net_shaper_scope - the different scopes where a shaper could be
> attached
> + * @NET_SHAPER_SCOPE_PORT:   The root shaper for the whole H/W.
> + * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network
> device.
> + * @NET_SHAPER_SCOPE_VF:     The shaper is attached to the given virtual
> + * function.
> + * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple
> queues
> +under the
> + * same device.
> + * @NET_SHAPER_SCOPE_QUEUE:  The shaper is attached to the given
> device queue.
> + *
> + * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only
> available on
> + * PF devices, usually inside the host/hypervisor.

What is the difference between NET_SHAPER_SCOPE_VF and NET_SHAPER_SCOPE_NETDEV from a VF traffic shaping perspective?
Is NET_SHAPER_SCOPE_VF used to configure a shaper for a VF which is in a separate domain that is not a NETDEV?

Thanks,
Naveen

> + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP
> and
> + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> + */
> +enum net_shaper_scope {
> +	NET_SHAPER_SCOPE_PORT,
> +	NET_SHAPER_SCOPE_NETDEV,
> +	NET_SHAPER_SCOPE_VF,
> +	NET_SHAPER_SCOPE_QUEUE_GROUP,
> +	NET_SHAPER_SCOPE_QUEUE,
> +};
> +
> +/**
> + * struct net_shaper_ops - Operations on device H/W shapers
> + * @add: Creates a new shaper in the specified scope.
> + * @set: Modify the existing shaper.
> + * @delete: Delete the specified shaper.
> + * @move: Move an existing shaper under a different parent.
> + *
> + * The initial shaping configuration ad device initialization is empty/
> + * a no-op/does not constraint the b/w in any way.
> + * The network core keeps track of the applied user-configuration in
> + * per device storage.
> + *
> + * Each shaper is uniquely identified within the device with an
> +'handle',
> + * dependent on the shaper scope and other data, see
> +@shaper_make_handle()  */ struct net_shaper_ops {
> +	/** add - Add a shaper inside the shaper hierarchy
> +	 * @dev: netdevice to operate on
> +	 * @handle: the shaper indetifier
> +	 * @shaper: configuration of shaper
> +	 * @extack: Netlink extended ACK for reporting errors.
> +	 *
> +	 * Return:
> +	 * * 0 on success
> +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> +	 *                  or core for any reason. @extack should be set to
> +	 *                  text describing the reason.
> +	 * * Other negative error values on failure.
> +	 *
> +	 * Examples or reasons this operation may fail include:
> +	 * * H/W resources limits.
> +	 * * Can’t respect the requested bw limits.
> +	 */
> +	int (*add)(struct net_device *dev, u32 handle,
> +		   const struct net_shaper_info *shaper,
> +		   struct netlink_ext_ack *extack);
> +
> +	/** set - Update the specified shaper, if it exists
> +	 * @dev: Netdevice to operate on.
> +	 * @handle: the shaper identifier
> +	 * @shaper: Configuration of shaper.
> +	 * @extack: Netlink extended ACK for reporting errors.
> +	 *
> +	 * Return:
> +	 * * %0 - Success
> +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> +	 *                  or core for any reason. @extack should be set to
> +	 *                  text describing the reason.
> +	 * * Other negative error values on failure.
> +	 */
> +	int (*set)(struct net_device *dev, u32 handle,
> +		   const struct net_shaper_info *shaper,
> +		   struct netlink_ext_ack *extack);
> +
> +	/** delete - Removes a shaper from the NIC
> +	 * @dev: netdevice to operate on.
> +	 * @handle: the shaper identifier
> +	 * @extack: Netlink extended ACK for reporting errors.
> +	 *
> +	 * Return:
> +	 * * %0 - Success
> +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> +	 *                  or core for any reason. @extack should be set to
> +	 *                  text describing the reason.
> +	 * * Other negative error value on failure.
> +	 */
> +	int (*delete)(struct net_device *dev, u32 handle,
> +		      struct netlink_ext_ack *extack);
> +
> +	/** Move - change the parent id of the specified shaper
> +	 * @dev: netdevice to operate on.
> +	 * @handle: unique identifier for the shaper
> +	 * @new_parent_id: identifier of the new parent for this shaper
> +	 * @extack: Netlink extended ACK for reporting errors.
> +	 *
> +	 * Move the specified shaper in the hierarchy replacing its
> +	 * current parent shaper with @new_parent_id
> +	 *
> +	 * Return:
> +	 * * %0 - Success
> +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> +	 *                  or core for any reason. @extack should be set to
> +	 *                  text describing the reason.
> +	 * * Other negative error values on failure.
> +	 */
> +	int (*move)(struct net_device *dev, u32 handle,
> +		    u32 new_parent_handle, struct netlink_ext_ack *extack); };
> +
> +/**
> + * net_shaper_make_handle - creates an unique shaper identifier
> + * @scope: the shaper scope
> + * @vf: virtual function number
> + * @id: queue group or queue id
> + *
> + * Return: an unique identifier for the shaper
> + *
> + * Combines the specified arguments to create an unique identifier for
> + * the shaper.
> + * The virtual function number is only used within
> +@NET_SHAPER_SCOPE_VF,
> + * @NET_SHAPER_SCOPE_QUEUE_GROUP and
> @NET_SHAPER_SCOPE_QUEUE.
> + * The @id number is only used for @NET_SHAPER_SCOPE_QUEUE_GROUP
> and
> + * @NET_SHAPER_SCOPE_QUEUE, and must be, respectively, the queue
> group
> + * identifier or the queue number.
> + */
> +u32 net_shaper_make_handle(enum net_shaper_scope scope, int vf, int
> +id);
> +
> +/*
> + * Examples:
> + * - set shaping on a given queue
> + *   struct shaper_info info = { }; // fill this
> + *   u32 handle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE, 0,
> queue_id);
> + *   dev->shaper_ops->add(dev, handle, &info, NULL);
> + *
> + * - create a queue group with a queue group shaping limits.
> + *   Assuming the following topology already exists:
> + *                       < netdev shaper  >
> + *                        /              \
> + *               <queue 0 shaper> . . .  <queue N shaper>
> + *
> + *   struct shaper_info ginfo = { }; // fill this
> + *   u32 ghandle =
> shaper_make_handle(NET_SHAPER_SCOPE_QUEUE_GROUP, 0, 0);
> + *   dev->shaper_ops->add(dev, ghandle, &ginfo);
> + *
> + *   // now topology is:
> + *   //                              < netdev shaper  >
> + *   //                             /         |          \
> + *   //                            /          |       < newly created shaper  >
> + *   //                           /           |
> + *   //	<queue 0 shaper> . . .    <queue N shaper>
> + *
> + *   // move a shapers for queues 3..n out of such queue group
> + *   for (i = 0; i <= 2; ++i) {
> + *       u32 qhandle = net_shaper_make_handle(NET_SHAPER_SCOPE_QUEUE,
> 0, i);
> + *       dev->netshaper_ops->move(dev, qhandle, ghandle, NULL);
> + *   }
> + *
> + *   // now the topology is:
> + *   //                                < netdev shaper  >
> + *   //                                 /            \
> + *   //               < newly created shaper>   <queue 3 shaper> .. <queue n shaper>
> + *   //                /               \
> + *   //	<queue 0 shaper> . . .    <queue 2 shaper>
> + */
> +#endif
> +
> diff --git a/net/Kconfig b/net/Kconfig
> index f0a8692496ff..29c6fec54711 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -66,6 +66,9 @@ config SKB_DECRYPTED
>  config SKB_EXTENSIONS
>  	bool
> 
> +config NET_SHAPER
> +	bool
> +
>  menu "Networking options"
> 
>  source "net/packet/Kconfig"
> --
> 2.43.2
>
Paolo Abeni May 10, 2024, 7:58 a.m. UTC | #8
On Fri, 2024-05-10 at 07:10 +0000, Naveen Mamindlapalli wrote:

> On Thursday, May 9, 2024 1:51 AM Paolo Abeni <pabeni@redhat.com> wrote:
[...]
> > +/**
> > + * enum net_shaper_scope - the different scopes where a shaper could be
> > attached
> > + * @NET_SHAPER_SCOPE_PORT:   The root shaper for the whole H/W.
> > + * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network
> > device.
> > + * @NET_SHAPER_SCOPE_VF:     The shaper is attached to the given virtual
> > + * function.
> > + * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple
> > queues
> > +under the
> > + * same device.
> > + * @NET_SHAPER_SCOPE_QUEUE:  The shaper is attached to the given
> > device queue.
> > + *
> > + * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only
> > available on
> > + * PF devices, usually inside the host/hypervisor.
> 
> What is the difference between NET_SHAPER_SCOPE_VF and NET_SHAPER_SCOPE_NETDEV 
> from a VF traffic shaping perspective?

With NET_SHAPER_SCOPE_VF, the user can control VF-level shapers from
within the hypervisor.

With NET_SHAPER_SCOPE_NETDEV, the user will control whatever is really
under the gear of the given 'dev' struct: the PF-level shaper if the
user itself is within the hypervisor, or the relevant VF if the user is
within the VM.

Cheers,

Paolo
Paolo Abeni May 10, 2024, 11:05 a.m. UTC | #9
On Thu, 2024-05-09 at 18:17 +0200, Andrew Lunn wrote:
> > > Now the question is, how do i get between these two states? It is not
> > > possible to mix WRR and strict priority. Any kAPI which only modifies
> > > one queue at once will go straight into an invalid state, and the
> > > driver will need to return -EOPNOTSUPP. So it seems like there needs
> > > to be an atomic set N queue configuration at once, so i can cleanly go
> > > from strict priority across 8 queues to WRR across 8 queues. Is that
> > > foreseen?
> > 
> > You could delete all the WRR shapers and then create/add SP based ones.
> 
> But that does not match the hardware. I cannot delete the hardware. It
> will either do strict priority or WRR. If i delete the software
> representation of the shaper, the hardware shaper will keep on doing
> what it was doing. So i don't see this as a good model. I think the
> driver will create shapers to represent the hardware, and you are not
> allowed to delete them or add more of them, because that is what the
> hardware is. All you can do is configure the shapers that exist.
> 
> > The 'create' op is just an abstraction to tell the NIC to switch from
> > the default configuration to the specified one.
> 
> Well, the hardware default is i think WRR for the queues, and line
> rate. That will be what the software representation of the shapers
> will be set to when the driver probes and creates the shapers
> representors.

If I read correctly, allowing each NIC to expose it's own different
starting configuration still will not solve the problem for this H/W to
switch from WRR to SP (and vice versa).

AFAICS, what would be needed there is an atomic set of operations:
'set_many' (and e.v. 'delete_many', 'create_many') that will allow
changing all the shapers at once. 

With such operations, that H/W could still fit the expected 'no-op'
default, as WRR on the queue shapers is what we expect. I agree with
Jakub, handling the complexity of arbitrary starting configuration
would pose a lot of trouble to the user/admin.

If all the above stands together, I think we have a few options (in
random order):

- add both set of operations: the ones operating on a single shaper and
the ones operating on multiple shapers
- use only the multiple shapers ops.

And the latter looks IMHO the simple/better. At that point I would
probably drop the 'add' op and would rename 'delete' as
'reset':

int (*set)(struct net_device *dev, int how_many, const u32 *handles,
	   const struct net_shaper_info *shapers,
           struct netlink_ext_ack *extack);
int (*reset)(struct net_device *dev, int how_many, const u32 *handles,
             struct netlink_ext_ack *extack);
int (*move)(struct net_device *dev, int how_many, const u32 *handles,
            const u32 *new_parent_handles,
	    struct netlink_ext_ack *extack);

An NIC with 'static' shapers can implement a dummy move always
returning EOPNOTSUPP and eventually filling a detailed extack.

NIC without any constraints on mixing and matching different kind of
shapers could implement the above as a loop over whatever they will do
for the corresponding 'single shaper op'

NIC with constrains alike the one you pointed out could validate the
final state before atomically applying the specified operation.

After a successful  'reset' operation, the kernel could drop any data
it retains/caches for the relevant shapers - the current idea is to
keep a copy of all successfully configured shaper_info in a xarray,
using the 'handle' as the index.

Side note: the move() operation could look like a complex artifact, but
it's the simplest instrument I could think of to support scenarios
where the user creates/configures/sets a queue group and 'move' some
queue under the newly created group

WDYT?

Thanks,

Paolo
Simon Horman May 15, 2024, 9:51 a.m. UTC | #10
On Fri, May 10, 2024 at 01:05:41PM +0200, Paolo Abeni wrote:
> On Thu, 2024-05-09 at 18:17 +0200, Andrew Lunn wrote:
> > > > Now the question is, how do i get between these two states? It is not
> > > > possible to mix WRR and strict priority. Any kAPI which only modifies
> > > > one queue at once will go straight into an invalid state, and the
> > > > driver will need to return -EOPNOTSUPP. So it seems like there needs
> > > > to be an atomic set N queue configuration at once, so i can cleanly go
> > > > from strict priority across 8 queues to WRR across 8 queues. Is that
> > > > foreseen?
> > > 
> > > You could delete all the WRR shapers and then create/add SP based ones.
> > 
> > But that does not match the hardware. I cannot delete the hardware. It
> > will either do strict priority or WRR. If i delete the software
> > representation of the shaper, the hardware shaper will keep on doing
> > what it was doing. So i don't see this as a good model. I think the
> > driver will create shapers to represent the hardware, and you are not
> > allowed to delete them or add more of them, because that is what the
> > hardware is. All you can do is configure the shapers that exist.
> > 
> > > The 'create' op is just an abstraction to tell the NIC to switch from
> > > the default configuration to the specified one.
> > 
> > Well, the hardware default is i think WRR for the queues, and line
> > rate. That will be what the software representation of the shapers
> > will be set to when the driver probes and creates the shapers
> > representors.
> 
> If I read correctly, allowing each NIC to expose it's own different
> starting configuration still will not solve the problem for this H/W to
> switch from WRR to SP (and vice versa).
> 
> AFAICS, what would be needed there is an atomic set of operations:
> 'set_many' (and e.v. 'delete_many', 'create_many') that will allow
> changing all the shapers at once. 
> 
> With such operations, that H/W could still fit the expected 'no-op'
> default, as WRR on the queue shapers is what we expect. I agree with
> Jakub, handling the complexity of arbitrary starting configuration
> would pose a lot of trouble to the user/admin.
> 
> If all the above stands together, I think we have a few options (in
> random order):
> 
> - add both set of operations: the ones operating on a single shaper and
> the ones operating on multiple shapers
> - use only the multiple shapers ops.
> 
> And the latter looks IMHO the simple/better. At that point I would
> probably drop the 'add' op and would rename 'delete' as
> 'reset':
> 
> int (*set)(struct net_device *dev, int how_many, const u32 *handles,
> 	   const struct net_shaper_info *shapers,
>            struct netlink_ext_ack *extack);
> int (*reset)(struct net_device *dev, int how_many, const u32 *handles,
>              struct netlink_ext_ack *extack);
> int (*move)(struct net_device *dev, int how_many, const u32 *handles,
>             const u32 *new_parent_handles,
> 	    struct netlink_ext_ack *extack);
> 
> An NIC with 'static' shapers can implement a dummy move always
> returning EOPNOTSUPP and eventually filling a detailed extack.
> 
> NIC without any constraints on mixing and matching different kind of
> shapers could implement the above as a loop over whatever they will do
> for the corresponding 'single shaper op'
> 
> NIC with constrains alike the one you pointed out could validate the
> final state before atomically applying the specified operation.
> 
> After a successful  'reset' operation, the kernel could drop any data
> it retains/caches for the relevant shapers - the current idea is to
> keep a copy of all successfully configured shaper_info in a xarray,
> using the 'handle' as the index.
> 
> Side note: the move() operation could look like a complex artifact, but
> it's the simplest instrument I could think of to support scenarios
> where the user creates/configures/sets a queue group and 'move' some
> queue under the newly created group
> 
> WDYT?

Hi Andrew,

Picking up this discussion, I'm wondering if Paolo's proposal
addresses your concerns.
Andrew Lunn May 15, 2024, 2:19 p.m. UTC | #11
> > If I read correctly, allowing each NIC to expose it's own different
> > starting configuration still will not solve the problem for this H/W to
> > switch from WRR to SP (and vice versa).

I also suspect this is not unique to this hardware. I've not looked at
other SOHO switches, but it is reasonably common to have different
queues for different priority classes, and then one shaper for the
overall port rate.

> > AFAICS, what would be needed there is an atomic set of operations:
> > 'set_many' (and e.v. 'delete_many', 'create_many') that will allow
> > changing all the shapers at once. 

Yep.

> > With such operations, that H/W could still fit the expected 'no-op'
> > default, as WRR on the queue shapers is what we expect. I agree with
> > Jakub, handling the complexity of arbitrary starting configuration
> > would pose a lot of trouble to the user/admin.
> > 
> > If all the above stands together, I think we have a few options (in
> > random order):
> > 
> > - add both set of operations: the ones operating on a single shaper and
> > the ones operating on multiple shapers
> > - use only the multiple shapers ops.
> > 
> > And the latter looks IMHO the simple/better.

I would agree, start with only multiple shaper opps. If we find that
many implementation end up just iterating the list and dealing with
them individually, would could pull that iterator into the core, and
expand the ops to either/or, multiple or single.

> > int (*set)(struct net_device *dev, int how_many, const u32 *handles,
> > 	   const struct net_shaper_info *shapers,
> >            struct netlink_ext_ack *extack);
> > int (*reset)(struct net_device *dev, int how_many, const u32 *handles,
> >              struct netlink_ext_ack *extack);
> > int (*move)(struct net_device *dev, int how_many, const u32 *handles,
> >             const u32 *new_parent_handles,
> > 	    struct netlink_ext_ack *extack);
> > 
> > An NIC with 'static' shapers can implement a dummy move always
> > returning EOPNOTSUPP and eventually filling a detailed extack.

The extack is going to be important here, we are going to need
meaningful error messages.

Overall, i think this can be made to work with the hardware i have.

	 Andrew
Andrew Lunn May 15, 2024, 2:41 p.m. UTC | #12
> + * struct net_shaper_info - represents a shaping node on the NIC H/W
> + * @metric: Specify if the bw limits refers to PPS or BPS
> + * @bw_min: Minimum guaranteed rate for this shaper
> + * @bw_max: Maximum peak bw allowed for this shaper
> + * @burst: Maximum burst for the peek rate of this shaper
> + * @priority: Scheduling priority for this shaper
> + * @weight: Scheduling weight for this shaper
> + */
> +struct net_shaper_info {
> +	enum net_shaper_metric metric;
> +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> +	u64 bw_max;	/* maximum allowed bandwidth */
> +	u32 burst;	/* maximum burst in bytes for bw_max */
> +	u32 priority;	/* scheduling strict priority */
> +	u32 weight;	/* scheduling WRR weight*/
> +};

...

> +	/** set - Update the specified shaper, if it exists
> +	 * @dev: Netdevice to operate on.
> +	 * @handle: the shaper identifier
> +	 * @shaper: Configuration of shaper.
> +	 * @extack: Netlink extended ACK for reporting errors.
> +	 *
> +	 * Return:
> +	 * * %0 - Success
> +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> +	 *                  or core for any reason. @extack should be set to
> +	 *                  text describing the reason.
> +	 * * Other negative error values on failure.
> +	 */
> +	int (*set)(struct net_device *dev, u32 handle,
> +		   const struct net_shaper_info *shaper,
> +		   struct netlink_ext_ack *extack);

> + * net_shaper_make_handle - creates an unique shaper identifier
> + * @scope: the shaper scope
> + * @vf: virtual function number
> + * @id: queue group or queue id
> + *
> + * Return: an unique identifier for the shaper
> + *
> + * Combines the specified arguments to create an unique identifier for
> + * the shaper.
> + * The virtual function number is only used within @NET_SHAPER_SCOPE_VF,
> + * @NET_SHAPER_SCOPE_QUEUE_GROUP and @NET_SHAPER_SCOPE_QUEUE.
> + * The @id number is only used for @NET_SHAPER_SCOPE_QUEUE_GROUP and
> + * @NET_SHAPER_SCOPE_QUEUE, and must be, respectively, the queue group
> + * identifier or the queue number.
> + */
> +u32 net_shaper_make_handle(enum net_shaper_scope scope, int vf, int id);

One thing i'm missing here is a function which does the opposite of
net_shaper_make_handle(). Given a handle, it returns the scope, vf and
the id.

When the set() op is called, i somehow need to find the software
instance representing the hardware block. If i know the id, it is just
an array access. Otherwise i need additional bookkeeping, maybe a
linked list of handles and pointers to structures etc.

Or net_shaper_make_handle() could maybe take an addition void * priv,
and provide a function void * net_shape_priv(u32 handle);

    Andrew
Simon Horman May 15, 2024, 2:50 p.m. UTC | #13
On Wed, May 15, 2024 at 04:41:09PM +0200, Andrew Lunn wrote:
> > + * struct net_shaper_info - represents a shaping node on the NIC H/W
> > + * @metric: Specify if the bw limits refers to PPS or BPS
> > + * @bw_min: Minimum guaranteed rate for this shaper
> > + * @bw_max: Maximum peak bw allowed for this shaper
> > + * @burst: Maximum burst for the peek rate of this shaper
> > + * @priority: Scheduling priority for this shaper
> > + * @weight: Scheduling weight for this shaper
> > + */
> > +struct net_shaper_info {
> > +	enum net_shaper_metric metric;
> > +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> > +	u64 bw_max;	/* maximum allowed bandwidth */
> > +	u32 burst;	/* maximum burst in bytes for bw_max */
> > +	u32 priority;	/* scheduling strict priority */
> > +	u32 weight;	/* scheduling WRR weight*/
> > +};
> 
> ...
> 
> > +	/** set - Update the specified shaper, if it exists
> > +	 * @dev: Netdevice to operate on.
> > +	 * @handle: the shaper identifier
> > +	 * @shaper: Configuration of shaper.
> > +	 * @extack: Netlink extended ACK for reporting errors.
> > +	 *
> > +	 * Return:
> > +	 * * %0 - Success
> > +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> > +	 *                  or core for any reason. @extack should be set to
> > +	 *                  text describing the reason.
> > +	 * * Other negative error values on failure.
> > +	 */
> > +	int (*set)(struct net_device *dev, u32 handle,
> > +		   const struct net_shaper_info *shaper,
> > +		   struct netlink_ext_ack *extack);
> 
> > + * net_shaper_make_handle - creates an unique shaper identifier
> > + * @scope: the shaper scope
> > + * @vf: virtual function number
> > + * @id: queue group or queue id
> > + *
> > + * Return: an unique identifier for the shaper
> > + *
> > + * Combines the specified arguments to create an unique identifier for
> > + * the shaper.
> > + * The virtual function number is only used within @NET_SHAPER_SCOPE_VF,
> > + * @NET_SHAPER_SCOPE_QUEUE_GROUP and @NET_SHAPER_SCOPE_QUEUE.
> > + * The @id number is only used for @NET_SHAPER_SCOPE_QUEUE_GROUP and
> > + * @NET_SHAPER_SCOPE_QUEUE, and must be, respectively, the queue group
> > + * identifier or the queue number.
> > + */
> > +u32 net_shaper_make_handle(enum net_shaper_scope scope, int vf, int id);
> 
> One thing i'm missing here is a function which does the opposite of
> net_shaper_make_handle(). Given a handle, it returns the scope, vf and
> the id.
> 
> When the set() op is called, i somehow need to find the software
> instance representing the hardware block. If i know the id, it is just
> an array access. Otherwise i need additional bookkeeping, maybe a
> linked list of handles and pointers to structures etc.
> 
> Or net_shaper_make_handle() could maybe take an addition void * priv,
> and provide a function void * net_shape_priv(u32 handle);

Hi Andrew,

I think that, initially at least, the implementation of
net_shaper_make_handle() can be fairly simple, involving packing
fields into an integer in a static manner.

As such I think implementing a helper or helpers to an to extract fields
should be trivial.

In other words, yes, I think that can be added.
Simon Horman May 15, 2024, 2:56 p.m. UTC | #14
On Wed, May 15, 2024 at 04:19:57PM +0200, Andrew Lunn wrote:
> > > If I read correctly, allowing each NIC to expose it's own different
> > > starting configuration still will not solve the problem for this H/W to
> > > switch from WRR to SP (and vice versa).
> 
> I also suspect this is not unique to this hardware. I've not looked at
> other SOHO switches, but it is reasonably common to have different
> queues for different priority classes, and then one shaper for the
> overall port rate.

Yes, understood. It's about creating a sufficiently general solution.
And the HW you have in mind has lead us to see some shortcomings
of the proposed API in that area. Because it drew a bit too much on
understanding of a different category of HW.

> > > AFAICS, what would be needed there is an atomic set of operations:
> > > 'set_many' (and e.v. 'delete_many', 'create_many') that will allow
> > > changing all the shapers at once. 
> 
> Yep.
> 
> > > With such operations, that H/W could still fit the expected 'no-op'
> > > default, as WRR on the queue shapers is what we expect. I agree with
> > > Jakub, handling the complexity of arbitrary starting configuration
> > > would pose a lot of trouble to the user/admin.
> > > 
> > > If all the above stands together, I think we have a few options (in
> > > random order):
> > > 
> > > - add both set of operations: the ones operating on a single shaper and
> > > the ones operating on multiple shapers
> > > - use only the multiple shapers ops.
> > > 
> > > And the latter looks IMHO the simple/better.
> 
> I would agree, start with only multiple shaper opps. If we find that
> many implementation end up just iterating the list and dealing with
> them individually, would could pull that iterator into the core, and
> expand the ops to either/or, multiple or single.

FWIIW, this was my thinking too.

> > > int (*set)(struct net_device *dev, int how_many, const u32 *handles,
> > > 	   const struct net_shaper_info *shapers,
> > >            struct netlink_ext_ack *extack);
> > > int (*reset)(struct net_device *dev, int how_many, const u32 *handles,
> > >              struct netlink_ext_ack *extack);
> > > int (*move)(struct net_device *dev, int how_many, const u32 *handles,
> > >             const u32 *new_parent_handles,
> > > 	    struct netlink_ext_ack *extack);
> > > 
> > > An NIC with 'static' shapers can implement a dummy move always
> > > returning EOPNOTSUPP and eventually filling a detailed extack.
> 
> The extack is going to be important here, we are going to need
> meaningful error messages.

Always :)

> Overall, i think this can be made to work with the hardware i have.

Great, I think the next step is for us to propose a revised API
with multiple shaper ops in place of single shaper ops.
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf261fb89d73..39f66af014be 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -79,6 +79,8 @@  struct xdp_buff;
 struct xdp_frame;
 struct xdp_metadata_ops;
 struct xdp_md;
+struct net_shaper_ops;
+struct net_shaper_data;
 
 typedef u32 xdp_features_t;
 
@@ -1596,6 +1598,13 @@  struct net_device_ops {
 	int			(*ndo_hwtstamp_set)(struct net_device *dev,
 						    struct kernel_hwtstamp_config *kernel_config,
 						    struct netlink_ext_ack *extack);
+
+#if IS_ENABLED(CONFIG_NET_SHAPER)
+	/** @net_shaper_ops: Device shaping offload operations
+	 * see include/net/net_shapers.h
+	 */
+	const struct net_shaper_ops *net_shaper_ops;
+#endif
 };
 
 /**
@@ -2403,6 +2412,12 @@  struct net_device {
 	/** @page_pools: page pools created for this netdevice */
 	struct hlist_head	page_pools;
 #endif
+#if IS_ENABLED(CONFIG_NET_SHAPER)
+	/** @net_shaper_data: data tracking the current shaper status
+	 *  see include/net/net_shapers.h
+	 */
+	struct net_shaper_data *net_shaper_data;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/net/net_shaper.h b/include/net/net_shaper.h
new file mode 100644
index 000000000000..a4fbadd99870
--- /dev/null
+++ b/include/net/net_shaper.h
@@ -0,0 +1,206 @@ 
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef _NET_SHAPER_H_
+#define _NET_SHAPER_H_
+
+#include <linux/types.h>
+#include <linux/netdevice.h>
+#include <linux/netlink.h>
+
+/**
+ * enum net_shaper_metric - the metric of the shaper
+ * @NET_SHAPER_METRIC_PPS: Shaper operates on a packets per second basis
+ * @NET_SHAPER_METRIC_BPS: Shaper operates on a bits per second basis
+ */
+enum net_shaper_metric {
+	NET_SHAPER_METRIC_PPS,
+	NET_SHAPER_METRIC_BPS
+};
+
+/**
+ * struct net_shaper_info - represents a shaping node on the NIC H/W
+ * @metric: Specify if the bw limits refers to PPS or BPS
+ * @bw_min: Minimum guaranteed rate for this shaper
+ * @bw_max: Maximum peak bw allowed for this shaper
+ * @burst: Maximum burst for the peek rate of this shaper
+ * @priority: Scheduling priority for this shaper
+ * @weight: Scheduling weight for this shaper
+ */
+struct net_shaper_info {
+	enum net_shaper_metric metric;
+	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
+	u64 bw_max;	/* maximum allowed bandwidth */
+	u32 burst;	/* maximum burst in bytes for bw_max */
+	u32 priority;	/* scheduling strict priority */
+	u32 weight;	/* scheduling WRR weight*/
+};
+
+/**
+ * enum net_shaper_scope - the different scopes where a shaper could be attached
+ * @NET_SHAPER_SCOPE_PORT:   The root shaper for the whole H/W.
+ * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network device.
+ * @NET_SHAPER_SCOPE_VF:     The shaper is attached to the given virtual
+ * function.
+ * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple queues under the
+ * same device.
+ * @NET_SHAPER_SCOPE_QUEUE:  The shaper is attached to the given device queue.
+ *
+ * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only available on
+ * PF devices, usually inside the host/hypervisor.
+ * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
+ * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
+ */
+enum net_shaper_scope {
+	NET_SHAPER_SCOPE_PORT,
+	NET_SHAPER_SCOPE_NETDEV,
+	NET_SHAPER_SCOPE_VF,
+	NET_SHAPER_SCOPE_QUEUE_GROUP,
+	NET_SHAPER_SCOPE_QUEUE,
+};
+
+/**
+ * struct net_shaper_ops - Operations on device H/W shapers
+ * @add: Creates a new shaper in the specified scope.
+ * @set: Modify the existing shaper.
+ * @delete: Delete the specified shaper.
+ * @move: Move an existing shaper under a different parent.
+ *
+ * The initial shaping configuration ad device initialization is empty/
+ * a no-op/does not constraint the b/w in any way.
+ * The network core keeps track of the applied user-configuration in
+ * per device storage.
+ *
+ * Each shaper is uniquely identified within the device with an 'handle',
+ * dependent on the shaper scope and other data, see @shaper_make_handle()
+ */
+struct net_shaper_ops {
+	/** add - Add a shaper inside the shaper hierarchy
+	 * @dev: netdevice to operate on
+	 * @handle: the shaper indetifier
+	 * @shaper: configuration of shaper
+	 * @extack: Netlink extended ACK for reporting errors.
+	 *
+	 * Return:
+	 * * 0 on success
+	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
+	 *                  or core for any reason. @extack should be set to
+	 *                  text describing the reason.
+	 * * Other negative error values on failure.
+	 *
+	 * Examples or reasons this operation may fail include:
+	 * * H/W resources limits.
+	 * * Can’t respect the requested bw limits.
+	 */
+	int (*add)(struct net_device *dev, u32 handle,
+		   const struct net_shaper_info *shaper,
+		   struct netlink_ext_ack *extack);
+
+	/** set - Update the specified shaper, if it exists
+	 * @dev: Netdevice to operate on.
+	 * @handle: the shaper identifier
+	 * @shaper: Configuration of shaper.
+	 * @extack: Netlink extended ACK for reporting errors.
+	 *
+	 * Return:
+	 * * %0 - Success
+	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
+	 *                  or core for any reason. @extack should be set to
+	 *                  text describing the reason.
+	 * * Other negative error values on failure.
+	 */
+	int (*set)(struct net_device *dev, u32 handle,
+		   const struct net_shaper_info *shaper,
+		   struct netlink_ext_ack *extack);
+
+	/** delete - Removes a shaper from the NIC
+	 * @dev: netdevice to operate on.
+	 * @handle: the shaper identifier
+	 * @extack: Netlink extended ACK for reporting errors.
+	 *
+	 * Return:
+	 * * %0 - Success
+	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
+	 *                  or core for any reason. @extack should be set to
+	 *                  text describing the reason.
+	 * * Other negative error value on failure.
+	 */
+	int (*delete)(struct net_device *dev, u32 handle,
+		      struct netlink_ext_ack *extack);
+
+	/** Move - change the parent id of the specified shaper
+	 * @dev: netdevice to operate on.
+	 * @handle: unique identifier for the shaper
+	 * @new_parent_id: identifier of the new parent for this shaper
+	 * @extack: Netlink extended ACK for reporting errors.
+	 *
+	 * Move the specified shaper in the hierarchy replacing its
+	 * current parent shaper with @new_parent_id
+	 *
+	 * Return:
+	 * * %0 - Success
+	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
+	 *                  or core for any reason. @extack should be set to
+	 *                  text describing the reason.
+	 * * Other negative error values on failure.
+	 */
+	int (*move)(struct net_device *dev, u32 handle,
+		    u32 new_parent_handle, struct netlink_ext_ack *extack);
+};
+
+/**
+ * net_shaper_make_handle - creates an unique shaper identifier
+ * @scope: the shaper scope
+ * @vf: virtual function number
+ * @id: queue group or queue id
+ *
+ * Return: an unique identifier for the shaper
+ *
+ * Combines the specified arguments to create an unique identifier for
+ * the shaper.
+ * The virtual function number is only used within @NET_SHAPER_SCOPE_VF,
+ * @NET_SHAPER_SCOPE_QUEUE_GROUP and @NET_SHAPER_SCOPE_QUEUE.
+ * The @id number is only used for @NET_SHAPER_SCOPE_QUEUE_GROUP and
+ * @NET_SHAPER_SCOPE_QUEUE, and must be, respectively, the queue group
+ * identifier or the queue number.
+ */
+u32 net_shaper_make_handle(enum net_shaper_scope scope, int vf, int id);
+
+/*
+ * Examples:
+ * - set shaping on a given queue
+ *   struct shaper_info info = { }; // fill this
+ *   u32 handle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE, 0, queue_id);
+ *   dev->shaper_ops->add(dev, handle, &info, NULL);
+ *
+ * - create a queue group with a queue group shaping limits.
+ *   Assuming the following topology already exists:
+ *                       < netdev shaper  >
+ *                        /              \
+ *               <queue 0 shaper> . . .  <queue N shaper>
+ *
+ *   struct shaper_info ginfo = { }; // fill this
+ *   u32 ghandle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE_GROUP, 0, 0);
+ *   dev->shaper_ops->add(dev, ghandle, &ginfo);
+ *
+ *   // now topology is:
+ *   //                              < netdev shaper  >
+ *   //                             /         |          \
+ *   //                            /          |       < newly created shaper  >
+ *   //                           /           |
+ *   //	<queue 0 shaper> . . .    <queue N shaper>
+ *
+ *   // move a shapers for queues 3..n out of such queue group
+ *   for (i = 0; i <= 2; ++i) {
+ *       u32 qhandle = net_shaper_make_handle(NET_SHAPER_SCOPE_QUEUE, 0, i);
+ *       dev->netshaper_ops->move(dev, qhandle, ghandle, NULL);
+ *   }
+ *
+ *   // now the topology is:
+ *   //                                < netdev shaper  >
+ *   //                                 /            \
+ *   //               < newly created shaper>   <queue 3 shaper> .. <queue n shaper>
+ *   //                /               \
+ *   //	<queue 0 shaper> . . .    <queue 2 shaper>
+ */
+#endif
+
diff --git a/net/Kconfig b/net/Kconfig
index f0a8692496ff..29c6fec54711 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -66,6 +66,9 @@  config SKB_DECRYPTED
 config SKB_EXTENSIONS
 	bool
 
+config NET_SHAPER
+	bool
+
 menu "Networking options"
 
 source "net/packet/Kconfig"