
[net-next,v2] ipv6: ioam: Support for Queue depth data field

Message ID 20211224135000.9291-1-justin.iurman@uliege.be
State Superseded
Delegated to: Netdev Maintainers
Series [net-next,v2] ipv6: ioam: Support for Queue depth data field

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 32 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Justin Iurman Dec. 24, 2021, 1:50 p.m. UTC
v2:
 - Fix sparse warning (use rcu_dereference)

This patch adds support for the queue depth in IOAM trace data fields.

The draft [1] says the following:

   The "queue depth" field is a 4-octet unsigned integer field.  This
   field indicates the current length of the egress interface queue of
   the interface from where the packet is forwarded out.  The queue
   depth is expressed as the current amount of memory buffers used by
   the queue (a packet could consume one or more memory buffers,
   depending on its size).

An existing function (i.e., qdisc_qstats_qlen_backlog) is used to
retrieve the current queue length without reinventing the wheel.

Note: this was tested; qlen increases when an artificial delay is
added on the egress with tc.

  [1] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.2.7

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 net/ipv6/ioam6.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Ido Schimmel Dec. 24, 2021, 5:53 p.m. UTC | #1
On Fri, Dec 24, 2021 at 02:50:00PM +0100, Justin Iurman wrote:
> v2:
>  - Fix sparse warning (use rcu_dereference)
> 
> This patch adds support for the queue depth in IOAM trace data fields.
> 
> The draft [1] says the following:
> 
>    The "queue depth" field is a 4-octet unsigned integer field.  This
>    field indicates the current length of the egress interface queue of
>    the interface from where the packet is forwarded out.  The queue
>    depth is expressed as the current amount of memory buffers used by
>    the queue (a packet could consume one or more memory buffers,
>    depending on its size).
> 
> An existing function (i.e., qdisc_qstats_qlen_backlog) is used to
> retrieve the current queue length without reinventing the wheel.
> 
> Note: this was tested; qlen increases when an artificial delay is
> added on the egress with tc.
> 
>   [1] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.2.7
> 
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  net/ipv6/ioam6.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> index 122a3d47424c..969a5adbaf5c 100644
> --- a/net/ipv6/ioam6.c
> +++ b/net/ipv6/ioam6.c
> @@ -13,10 +13,12 @@
>  #include <linux/ioam6.h>
>  #include <linux/ioam6_genl.h>
>  #include <linux/rhashtable.h>
> +#include <linux/netdevice.h>
>  
>  #include <net/addrconf.h>
>  #include <net/genetlink.h>
>  #include <net/ioam6.h>
> +#include <net/sch_generic.h>
>  
>  static void ioam6_ns_release(struct ioam6_namespace *ns)
>  {
> @@ -717,7 +719,19 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
>  
>  	/* queue depth */
>  	if (trace->type.bit6) {
> -		*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
> +		struct netdev_queue *queue;
> +		struct Qdisc *qdisc;
> +		__u32 qlen, backlog;
> +
> +		if (skb_dst(skb)->dev->flags & IFF_LOOPBACK) {
> +			*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
> +		} else {
> +			queue = skb_get_tx_queue(skb_dst(skb)->dev, skb);
> +			qdisc = rcu_dereference(queue->qdisc);
> +			qdisc_qstats_qlen_backlog(qdisc, &qlen, &backlog);
> +
> +			*(__be32 *)data = cpu_to_be32(qlen);

Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
seems that queue depth needs to take into account the size of the
enqueued packets, not only their number.

Did you check what other IOAM implementations (SW/HW) report for queue
depth? I would assume that they report bytes.

> +		}
>  		data += sizeof(__be32);
>  	}
>  
> -- 
> 2.25.1
>
Justin Iurman Dec. 26, 2021, 11:47 a.m. UTC | #2
On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
> seems that queue depth needs to take into account the size of the
> enqueued packets, not only their number.

The quoted paragraph contains the following sentence:

   "The queue depth is expressed as the current amount of memory
    buffers used by the queue"

So my understanding is that we need their number, not their size.

> Did you check what other IOAM implementations (SW/HW) report for queue
> depth? I would assume that they report bytes.

Unfortunately, IOAM is quite new, so implementations don't grow on
trees. The Linux kernel implementation is one of the first, apart from
VPP and Cisco IOS, neither of which implements the queue depth data
field.
Ido Schimmel Dec. 26, 2021, 12:40 p.m. UTC | #3
On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> > Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
> > seems that queue depth needs to take into account the size of the
> > enqueued packets, not only their number.
> 
> The quoted paragraph contains the following sentence:
> 
>    "The queue depth is expressed as the current amount of memory
>     buffers used by the queue"
> 
> So my understanding is that we need their number, not their size.

It also says "a packet could consume one or more memory buffers,
depending on its size". If, for example, you define tc-red limit as 1M,
then it makes a lot of difference if the 1,000 packets you have in the
queue are 9,000 bytes in size or 64 bytes.

> 
> > Did you check what other IOAM implementations (SW/HW) report for queue
> > depth? I would assume that they report bytes.
> 
> Unfortunately, IOAM is quite new, so implementations don't grow on
> trees. The Linux kernel implementation is one of the first, apart from
> VPP and Cisco IOS, neither of which implements the queue depth data
> field.

At least on Mellanox/Nvidia switches, queue depth (not necessarily for
IOAM) is always reported in bytes. I have a colleague who authored a few
IOAM IETF drafts; I will ask for his input on this and share it.
Justin Iurman Dec. 26, 2021, 12:59 p.m. UTC | #4
On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
> On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
>> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
>> > Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
>> > seems that queue depth needs to take into account the size of the
>> > enqueued packets, not only their number.
>> 
>> The quoted paragraph contains the following sentence:
>> 
>>    "The queue depth is expressed as the current amount of memory
>>     buffers used by the queue"
>> 
>> So my understanding is that we need their number, not their size.
> 
> It also says "a packet could consume one or more memory buffers,
> depending on its size". If, for example, you define tc-red limit as 1M,
> then it makes a lot of difference if the 1,000 packets you have in the
> queue are 9,000 bytes in size or 64 bytes.

Agreed. We could probably use 'backlog' instead, given this
statement:

  "It should be noted that the semantics of some of the node data fields
   that are defined below, such as the queue depth and buffer occupancy,
   are implementation specific.  This approach is intended to allow IOAM
   nodes with various different architectures."

It would indeed make more sense, based on your example. However, the
limit (32 bits) could be reached faster using 'backlog' rather than
'qlen'. But I guess this tradeoff is the price to pay to be as close
as possible to the spec.
Ido Schimmel Dec. 26, 2021, 1:15 p.m. UTC | #5
On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> >> > Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
> >> > seems that queue depth needs to take into account the size of the
> >> > enqueued packets, not only their number.
> >> 
> >> The quoted paragraph contains the following sentence:
> >> 
> >>    "The queue depth is expressed as the current amount of memory
> >>     buffers used by the queue"
> >> 
> >> So my understanding is that we need their number, not their size.
> > 
> > It also says "a packet could consume one or more memory buffers,
> > depending on its size". If, for example, you define tc-red limit as 1M,
> > then it makes a lot of difference if the 1,000 packets you have in the
> > queue are 9,000 bytes in size or 64 bytes.
> 
> Agreed. We could probably use 'backlog' instead, given this
> statement:
> 
>   "It should be noted that the semantics of some of the node data fields
>    that are defined below, such as the queue depth and buffer occupancy,
>    are implementation specific.  This approach is intended to allow IOAM
>    nodes with various different architectures."
> 
> It would indeed make more sense, based on your example. However, the
> limit (32 bits) could be reached faster using 'backlog' rather than
> 'qlen'. But I guess this tradeoff is the price to pay to be as close
> as possible to the spec.

At least in Linux 'backlog' is 32 bits so we are OK :)
We don't have such big buffers in hardware and I'm not sure what
insights an operator will get from a queue depth larger than 4GB...

I just got an OOO auto-reply from my colleague so I'm not sure I will be
able to share his input before next week. Anyway, reporting 'backlog'
makes sense to me, FWIW.
Justin Iurman Dec. 27, 2021, 2:06 p.m. UTC | #6
On Dec 26, 2021, at 2:15 PM, Ido Schimmel idosch@idosch.org wrote:
> On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
>> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
>> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
>> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
>> >> > Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
>> >> > seems that queue depth needs to take into account the size of the
>> >> > enqueued packets, not only their number.
>> >> 
>> >> The quoted paragraph contains the following sentence:
>> >> 
>> >>    "The queue depth is expressed as the current amount of memory
>> >>     buffers used by the queue"
>> >> 
>> >> So my understanding is that we need their number, not their size.
>> > 
>> > It also says "a packet could consume one or more memory buffers,
>> > depending on its size". If, for example, you define tc-red limit as 1M,
>> > then it makes a lot of difference if the 1,000 packets you have in the
>> > queue are 9,000 bytes in size or 64 bytes.
>> 
>> Agreed. We could probably use 'backlog' instead, given this
>> statement:
>> 
>>   "It should be noted that the semantics of some of the node data fields
>>    that are defined below, such as the queue depth and buffer occupancy,
>>    are implementation specific.  This approach is intended to allow IOAM
>>    nodes with various different architectures."
>> 
>> It would indeed make more sense, based on your example. However, the
>> limit (32 bits) could be reached faster using 'backlog' rather than
>> 'qlen'. But I guess this tradeoff is the price to pay to be as close
>> as possible to the spec.
> 
> At least in Linux 'backlog' is 32 bits so we are OK :)
> We don't have such big buffers in hardware and I'm not sure what
> insights an operator will get from a queue depth larger than 4GB...

Indeed :-)

> I just got an OOO auto-reply from my colleague so I'm not sure I will be
> able to share his input before next week. Anyway, reporting 'backlog'
> makes sense to me, FWIW.

Right. I read that Linus is planning to release a -rc8 so I think I can
wait another week before posting -v3.
Ido Schimmel Dec. 30, 2021, 2:47 p.m. UTC | #7
On Mon, Dec 27, 2021 at 03:06:42PM +0100, Justin Iurman wrote:
> On Dec 26, 2021, at 2:15 PM, Ido Schimmel idosch@idosch.org wrote:
> > On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
> >> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
> >> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
> >> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> >> >> > Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
> >> >> > seems that queue depth needs to take into account the size of the
> >> >> > enqueued packets, not only their number.
> >> >> 
> >> >> The quoted paragraph contains the following sentence:
> >> >> 
> >> >>    "The queue depth is expressed as the current amount of memory
> >> >>     buffers used by the queue"
> >> >> 
> >> >> So my understanding is that we need their number, not their size.
> >> > 
> >> > It also says "a packet could consume one or more memory buffers,
> >> > depending on its size". If, for example, you define tc-red limit as 1M,
> >> > then it makes a lot of difference if the 1,000 packets you have in the
> >> > queue are 9,000 bytes in size or 64 bytes.
> >> 
> >> Agreed. We could probably use 'backlog' instead, given this
> >> statement:
> >> 
> >>   "It should be noted that the semantics of some of the node data fields
> >>    that are defined below, such as the queue depth and buffer occupancy,
> >>    are implementation specific.  This approach is intended to allow IOAM
> >>    nodes with various different architectures."
> >> 
> >> It would indeed make more sense, based on your example. However, the
> >> limit (32 bits) could be reached faster using 'backlog' rather than
> >> 'qlen'. But I guess this tradeoff is the price to pay to be as close
> >> as possible to the spec.
> > 
> > At least in Linux 'backlog' is 32 bits so we are OK :)
> > We don't have such big buffers in hardware and I'm not sure what
> > insights an operator will get from a queue depth larger than 4GB...
> 
> Indeed :-)
> 
> > I just got an OOO auto-reply from my colleague so I'm not sure I will be
> > able to share his input before next week. Anyway, reporting 'backlog'
> > makes sense to me, FWIW.
> 
> Right. I read that Linus is planning to release a -rc8 so I think I can
> wait another week before posting -v3.

The answer I got from my colleagues is that they expect the field to
either encode bytes (what Mellanox/Nvidia is doing) or "cells", which is
an "allocation granularity of memory within the shared buffer" (see man
devlink-sb).
Justin Iurman Dec. 30, 2021, 4:50 p.m. UTC | #8
On Dec 30, 2021, at 3:47 PM, Ido Schimmel idosch@idosch.org wrote:
> On Mon, Dec 27, 2021 at 03:06:42PM +0100, Justin Iurman wrote:
>> On Dec 26, 2021, at 2:15 PM, Ido Schimmel idosch@idosch.org wrote:
>> > On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
>> >> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
>> >> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
>> >> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
>> >> >> > Why is 'qlen' used and not 'backlog'? From the paragraph you quoted, it
>> >> >> > seems that queue depth needs to take into account the size of the
>> >> >> > enqueued packets, not only their number.
>> >> >> 
>> >> >> The quoted paragraph contains the following sentence:
>> >> >> 
>> >> >>    "The queue depth is expressed as the current amount of memory
>> >> >>     buffers used by the queue"
>> >> >> 
>> >> >> So my understanding is that we need their number, not their size.
>> >> > 
>> >> > It also says "a packet could consume one or more memory buffers,
>> >> > depending on its size". If, for example, you define tc-red limit as 1M,
>> >> > then it makes a lot of difference if the 1,000 packets you have in the
>> >> > queue are 9,000 bytes in size or 64 bytes.
>> >> 
>> >> Agreed. We could probably use 'backlog' instead, given this
>> >> statement:
>> >> 
>> >>   "It should be noted that the semantics of some of the node data fields
>> >>    that are defined below, such as the queue depth and buffer occupancy,
>> >>    are implementation specific.  This approach is intended to allow IOAM
>> >>    nodes with various different architectures."
>> >> 
>> >> It would indeed make more sense, based on your example. However, the
>> >> limit (32 bits) could be reached faster using 'backlog' rather than
>> >> 'qlen'. But I guess this tradeoff is the price to pay to be as close
>> >> as possible to the spec.
>> > 
>> > At least in Linux 'backlog' is 32 bits so we are OK :)
>> > We don't have such big buffers in hardware and I'm not sure what
>> > insights an operator will get from a queue depth larger than 4GB...
>> 
>> Indeed :-)
>> 
>> > I just got an OOO auto-reply from my colleague so I'm not sure I will be
>> > able to share his input before next week. Anyway, reporting 'backlog'
>> > makes sense to me, FWIW.
>> 
>> Right. I read that Linus is planning to release a -rc8 so I think I can
>> wait another week before posting -v3.
> 
> The answer I got from my colleagues is that they expect the field to
> either encode bytes (what Mellanox/Nvidia is doing) or "cells", which is
> an "allocation granularity of memory within the shared buffer" (see man
> devlink-sb).

Thanks for that. It looks like devlink-sb would be gold for IOAM. But
based on what we discussed previously with Jakub, it cannot be used here
unfortunately. So I guess we have no choice but to use 'backlog' and
therefore report bytes, which is fine anyway. Thanks again for your
helpful comments, Ido. I appreciate it.
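
For reference, a minimal sketch of what the queue-depth branch could
look like once 'backlog' replaces 'qlen', as agreed in the thread above.
This is an assumption based on the discussion, not the posted v3, which
may differ:

	/* queue depth */
	if (trace->type.bit6) {
		struct netdev_queue *queue;
		struct Qdisc *qdisc;
		__u32 qlen, backlog;

		if (skb_dst(skb)->dev->flags & IFF_LOOPBACK) {
			/* no meaningful egress queue on loopback */
			*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
		} else {
			queue = skb_get_tx_queue(skb_dst(skb)->dev, skb);
			qdisc = rcu_dereference(queue->qdisc);
			qdisc_qstats_qlen_backlog(qdisc, &qlen, &backlog);

			/* report the backlog (bytes); qlen (packets) is
			 * still filled in by the helper but goes unused
			 */
			*(__be32 *)data = cpu_to_be32(backlog);
		}
		data += sizeof(__be32);
	}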

Patch

diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index 122a3d47424c..969a5adbaf5c 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -13,10 +13,12 @@
 #include <linux/ioam6.h>
 #include <linux/ioam6_genl.h>
 #include <linux/rhashtable.h>
+#include <linux/netdevice.h>
 
 #include <net/addrconf.h>
 #include <net/genetlink.h>
 #include <net/ioam6.h>
+#include <net/sch_generic.h>
 
 static void ioam6_ns_release(struct ioam6_namespace *ns)
 {
@@ -717,7 +719,19 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
 
 	/* queue depth */
 	if (trace->type.bit6) {
-		*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
+		struct netdev_queue *queue;
+		struct Qdisc *qdisc;
+		__u32 qlen, backlog;
+
+		if (skb_dst(skb)->dev->flags & IFF_LOOPBACK) {
+			*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
+		} else {
+			queue = skb_get_tx_queue(skb_dst(skb)->dev, skb);
+			qdisc = rcu_dereference(queue->qdisc);
+			qdisc_qstats_qlen_backlog(qdisc, &qlen, &backlog);
+
+			*(__be32 *)data = cpu_to_be32(qlen);
+		}
 		data += sizeof(__be32);
 	}