diff mbox series

[net-next] net: dsa: Remove indirect function call for flow dissection

Message ID 20200102233657.12933-1-f.fainelli@gmail.com (mailing list archive)
State New, archived
Headers show
Series [net-next] net: dsa: Remove indirect function call for flow dissection | expand

Commit Message

Florian Fainelli Jan. 2, 2020, 11:36 p.m. UTC
We only need "static" information to be given for DSA flow dissection,
so replace the expensive call to .flow_dissect() with an integer giving
us the offset into the packet array of bytes that we must de-reference
to obtain the protocol number. The overhead was alreayd available from
the dsa_device_ops structure so use that directly.

The presence of a flow_dissect callback used to indicate that the DSA
tagger supported returning that information,we now encode this with a
proto_off value of DSA_PROTO_OFF_UNPSEC if the tagger does not support
providing that information yet.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes since RFC:

- use a constant instead of the "magic" -1
- update all tag drivers and build test correctly

 include/net/dsa.h         |  5 +++--
 net/core/flow_dissector.c | 15 ++++++++++-----
 net/dsa/tag_brcm.c        |  2 ++
 net/dsa/tag_dsa.c         | 10 +---------
 net/dsa/tag_edsa.c        | 10 +---------
 net/dsa/tag_gswip.c       |  1 +
 net/dsa/tag_ksz.c         |  3 +++
 net/dsa/tag_lan9303.c     |  1 +
 net/dsa/tag_mtk.c         | 11 +----------
 net/dsa/tag_ocelot.c      |  1 +
 net/dsa/tag_qca.c         | 11 +----------
 net/dsa/tag_sja1105.c     |  1 +
 12 files changed, 26 insertions(+), 45 deletions(-)

Comments

Vladimir Oltean Jan. 3, 2020, 12:19 a.m. UTC | #1
Hi Florian,

On Fri, 3 Jan 2020 at 01:39, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> We only need "static" information to be given for DSA flow dissection,
> so replace the expensive call to .flow_dissect() with an integer giving
> us the offset into the packet array of bytes that we must de-reference

packet array? packed array?

> to obtain the protocol number. The overhead was alreayd available from

already

> the dsa_device_ops structure so use that directly.
>
> The presence of a flow_dissect callback used to indicate that the DSA
> tagger supported returning that information,we now encode this with a
> proto_off value of DSA_PROTO_OFF_UNPSEC if the tagger does not support

UNSPEC

> providing that information yet.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---

Unfortunately I don't really understand the DSA implementations of flow_dissect.
Is proto_off supposed to mean "the __be16 pointer difference A - B
between A. the position of the real EtherType and B. the current
skb->data (aka ETH_HLEN bytes into the frame, aka 2 bytes after the
normal EtherType was supposed to be)"?
Otherwise said, the offset in bytes between the real EtherType
position and skb->data is 2 * (proto_off + 1).
Furthermore, the offset in bytes is exactly equal to the tagger
overhead in bytes, unless it's a tag that doesn't push the EtherType
to the right, such as the trailer tag.

If the above is indeed correct, can you just skip DSA_PROTO_OFF_UNSPEC
and add proper proto_off values "in blind" for all taggers? I think
it's rather safe to assume that they all push the EtherType to the
right with the exception of the trailer tag, which will have an offset
of -1 in terms of __be16 pointers, by the way (so your -1 encoding of
DSA_PROTO_OFF_UNSPEC won't work for it anyway).

Also, documenting the unit of measurement for proto_off would really
go a long way.

What is a good test that the flow_dissector does what it's supposed to
do with DSA?

Regards,
-Vladimir
Florian Fainelli Jan. 3, 2020, 8:50 p.m. UTC | #2
On 1/2/20 4:19 PM, Vladimir Oltean wrote:
> Hi Florian,
> 
> On Fri, 3 Jan 2020 at 01:39, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> We only need "static" information to be given for DSA flow dissection,
>> so replace the expensive call to .flow_dissect() with an integer giving
>> us the offset into the packet array of bytes that we must de-reference
> 
> packet array? packed array?

Yes, packet array skb->data[] if you prefe

> 
>> to obtain the protocol number. The overhead was alreayd available from
> 
> already
> 
>> the dsa_device_ops structure so use that directly.
>>
>> The presence of a flow_dissect callback used to indicate that the DSA
>> tagger supported returning that information,we now encode this with a
>> proto_off value of DSA_PROTO_OFF_UNPSEC if the tagger does not support
> 
> UNSPEC
> 
>> providing that information yet.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
> 
> Unfortunately I don't really understand the DSA implementations of flow_dissect.
> Is proto_off supposed to mean "the __be16 pointer difference A - B
> between A. the position of the real EtherType and B. the current
> skb->data (aka ETH_HLEN bytes into the frame, aka 2 bytes after the
> normal EtherType was supposed to be)"?
> Otherwise said, the offset in bytes between the real EtherType
> position and skb->data is 2 * (proto_off + 1).> Furthermore, the offset in bytes is exactly equal to the tagger
> overhead in bytes, unless it's a tag that doesn't push the EtherType
> to the right, such as the trailer tag.

The call path is the following on TX (e.g.: when you run a DHCP client),
so this is always hit for raw packets:

__sys_sendto
  packet_sendmsg
    packet_parse_headers
       __skb_flow_dissect

and on RX this is not hit by default until you configure a RFS map on
your DSA master network device (more on that below), then the call stack is:

napi_complete_done
  gro_normal_list
     netif_receive_skb_list_internal
        get_rps_cpu
           skb_get_hash
              __skb_get_hash
                  __skb_flow_dissect

and this is called from the DSA master's RX path, so with an Ethernet
frame that has the DSA switch tag, and for which eth_type_trans() has
already been called so the SKB has already been pulled by ETH_HLEN and
skb->protocol is ETH_P_XDSA.

I don't think your formula works for EDSA which has an EtherType, but
this would probably work for all tags we currently support except trailer.

proto = (__be16 *)(skb->data)[overhead / 2 - 1];

> 
> If the above is indeed correct, can you just skip DSA_PROTO_OFF_UNSPEC
> and add proper proto_off values "in blind" for all taggers? I think
> it's rather safe to assume that they all push the EtherType to the
> right with the exception of the trailer tag, which will have an offset
> of -1 in terms of __be16 pointers, by the way (so your -1 encoding of
> DSA_PROTO_OFF_UNSPEC won't work for it anyway).
> 
> Also, documenting the unit of measurement for proto_off would really
> go a long way.
> 
> What is a good test that the flow_dissector does what it's supposed to
> do with DSA?

The commit that introduced flow dissection meant to fix it for the DSA
master network device (as you can see from the call trace), and this was
presumably meant to be used to steer traffic onto different RX or TX
queues on the DSA master, which is IMHO the wrong location where it
should be done for a number of reasons, but mainly because DSA slave
network devices can inherit the number of RX queues of their DSA master
and you can perform RFS there, in a standard way, without requiring
further changes down net/core/flow_dissect.c.

I don't think anyone except Alexander did serious investigation this.
For now, what I am interested in is reducing the amount of technical
debt and expensive function calls.
Vladimir Oltean Jan. 3, 2020, 9:28 p.m. UTC | #3
On Fri, 3 Jan 2020 at 22:50, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> The call path is the following on TX (e.g.: when you run a DHCP client),

Oh, it gets called on TX too, ok.
In that case, static proto_off information won't work for asymmetric
taggers such as ocelot which may have an independently configurable
prefix length on RX and TX.
I want to get rid of the RX tag prefix in ocelot though, but just saying.

> I don't think your formula works for EDSA which has an EtherType, but

Why doesn't it work with edsa?

> this would probably work for all tags we currently support except trailer.
>
> proto = (__be16 *)(skb->data)[overhead / 2 - 1];
>

I wasn't suggesting to do this exact calculation in flow_dissector.c,
but rather to pre-populate proto_off with a value statically derived
from it on a piece of paper, with the trailer exception where it would
be -2 in bytes or -1 in shorts, but nonetheless a negative and valid
value.

>
> I don't think anyone except Alexander did serious investigation this.
> For now, what I am interested in is reducing the amount of technical
> debt and expensive function calls.

Does the change bring any measurable improvement?

> --
> Florian

Regards,
-Vladimir
David Miller Jan. 5, 2020, 10:25 p.m. UTC | #4
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu,  2 Jan 2020 15:36:53 -0800

> We only need "static" information to be given for DSA flow dissection,
> so replace the expensive call to .flow_dissect() with an integer giving
> us the offset into the packet array of bytes that we must de-reference
> to obtain the protocol number. The overhead was alreayd available from
> the dsa_device_ops structure so use that directly.
> 
> The presence of a flow_dissect callback used to indicate that the DSA
> tagger supported returning that information,we now encode this with a
> proto_off value of DSA_PROTO_OFF_UNPSEC if the tagger does not support
> providing that information yet.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Changes since RFC:
> 
> - use a constant instead of the "magic" -1
> - update all tag drivers and build test correctly

At the very least the typos need to be fixed, so marking this changes
requested in patchwork just FYI...
Florian Fainelli Jan. 8, 2020, 6:03 p.m. UTC | #5
On 1/3/20 1:28 PM, Vladimir Oltean wrote:
> On Fri, 3 Jan 2020 at 22:50, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> The call path is the following on TX (e.g.: when you run a DHCP client),
> 
> Oh, it gets called on TX too, ok.
> In that case, static proto_off information won't work for asymmetric
> taggers such as ocelot which may have an independently configurable
> prefix length on RX and TX.
> I want to get rid of the RX tag prefix in ocelot though, but just saying.
> 
>> I don't think your formula works for EDSA which has an EtherType, but
> 
> Why doesn't it work with edsa?

It would, my bad.

> 
>> this would probably work for all tags we currently support except trailer.
>>
>> proto = (__be16 *)(skb->data)[overhead / 2 - 1];
>>
> 
> I wasn't suggesting to do this exact calculation in flow_dissector.c,
> but rather to pre-populate proto_off with a value statically derived
> from it on a piece of paper, with the trailer exception where it would
> be -2 in bytes or -1 in shorts, but nonetheless a negative and valid
> value.

With the trailer, the EtherType is actually at the expected location,
that is 12 bytes from the beginning of the Ethernet frame, so we can
simplify things even more.

> 
>>
>> I don't think anyone except Alexander did serious investigation this.
>> For now, what I am interested in is reducing the amount of technical
>> debt and expensive function calls.
> 
> Does the change bring any measurable improvement?

I did not implement flow dissection for Broadcom tags largely because it
did not show up as a performance problem with the different customers
but I will try to collect some numbers. At any rate, the patch is not
meant to be a performance improvement (though it might provide some
improvements) but ease maintenance and make it more straight forward for
future protocols to automatically gain dissection without having to
provide a function pointer.
diff mbox series

Patch

diff --git a/include/net/dsa.h b/include/net/dsa.h
index da5578db228e..5b77eb7eea02 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -68,18 +68,19 @@  enum dsa_tag_protocol {
 struct packet_type;
 struct dsa_switch;
 
+#define DSA_PROTO_OFF_UNSPEC	-1
+
 struct dsa_device_ops {
 	struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
 	struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
 			       struct packet_type *pt);
-	int (*flow_dissect)(const struct sk_buff *skb, __be16 *proto,
-			    int *offset);
 	/* Used to determine which traffic should match the DSA filter in
 	 * eth_type_trans, and which, if any, should bypass it and be processed
 	 * as regular on the master net device.
 	 */
 	bool (*filter)(const struct sk_buff *skb, struct net_device *dev);
 	unsigned int overhead;
+	int proto_off;
 	const char *name;
 	enum dsa_tag_protocol proto;
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2dbbb030fbed..1d8f1ecde51e 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -972,13 +972,18 @@  bool __skb_flow_dissect(const struct net *net,
 		if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) &&
 			     proto == htons(ETH_P_XDSA))) {
 			const struct dsa_device_ops *ops;
-			int offset = 0;
+			unsigned int overhead;
+			int proto_off;
 
 			ops = skb->dev->dsa_ptr->tag_ops;
-			if (ops->flow_dissect &&
-			    !ops->flow_dissect(skb, &proto, &offset)) {
-				hlen -= offset;
-				nhoff += offset;
+			overhead = ops->overhead;
+			proto_off = ops->proto_off;
+			if (likely(overhead &&
+				   proto_off != DSA_PROTO_OFF_UNSPEC &&
+				   proto_off < skb->len)) {
+				hlen -= overhead;
+				nhoff += overhead;
+				proto = ((__be16 *)skb->data)[proto_off];
 			}
 		}
 #endif
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index 9c3114179690..abc050e3c092 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -177,6 +177,7 @@  static const struct dsa_device_ops brcm_netdev_ops = {
 	.xmit	= brcm_tag_xmit,
 	.rcv	= brcm_tag_rcv,
 	.overhead = BRCM_TAG_LEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 DSA_TAG_DRIVER(brcm_netdev_ops);
@@ -205,6 +206,7 @@  static const struct dsa_device_ops brcm_prepend_netdev_ops = {
 	.xmit	= brcm_tag_xmit_prepend,
 	.rcv	= brcm_tag_rcv_prepend,
 	.overhead = BRCM_TAG_LEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 DSA_TAG_DRIVER(brcm_prepend_netdev_ops);
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 7ddec9794477..4a970e959fef 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -142,21 +142,13 @@  static struct sk_buff *dsa_rcv(struct sk_buff *skb, struct net_device *dev,
 	return skb;
 }
 
-static int dsa_tag_flow_dissect(const struct sk_buff *skb, __be16 *proto,
-				int *offset)
-{
-	*offset = 4;
-	*proto = ((__be16 *)skb->data)[1];
-	return 0;
-}
-
 static const struct dsa_device_ops dsa_netdev_ops = {
 	.name	= "dsa",
 	.proto	= DSA_TAG_PROTO_DSA,
 	.xmit	= dsa_xmit,
 	.rcv	= dsa_rcv,
-	.flow_dissect   = dsa_tag_flow_dissect,
 	.overhead = DSA_HLEN,
+	.proto_off = 1,
 };
 
 MODULE_LICENSE("GPL");
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index e8eaa804ccb9..c7cb0df17287 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -161,21 +161,13 @@  static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 	return skb;
 }
 
-static int edsa_tag_flow_dissect(const struct sk_buff *skb, __be16 *proto,
-				 int *offset)
-{
-	*offset = 8;
-	*proto = ((__be16 *)skb->data)[3];
-	return 0;
-}
-
 static const struct dsa_device_ops edsa_netdev_ops = {
 	.name	= "edsa",
 	.proto	= DSA_TAG_PROTO_EDSA,
 	.xmit	= edsa_xmit,
 	.rcv	= edsa_rcv,
-	.flow_dissect   = edsa_tag_flow_dissect,
 	.overhead = EDSA_HLEN,
+	.proto_off = 3,
 };
 
 MODULE_LICENSE("GPL");
diff --git a/net/dsa/tag_gswip.c b/net/dsa/tag_gswip.c
index b678160bbd66..4161852d871d 100644
--- a/net/dsa/tag_gswip.c
+++ b/net/dsa/tag_gswip.c
@@ -109,6 +109,7 @@  static const struct dsa_device_ops gswip_netdev_ops = {
 	.xmit = gswip_tag_xmit,
 	.rcv = gswip_tag_rcv,
 	.overhead = GSWIP_RX_HEADER_LEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 MODULE_LICENSE("GPL");
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 90d055c4df9e..4c9576201963 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -123,6 +123,7 @@  static const struct dsa_device_ops ksz8795_netdev_ops = {
 	.xmit	= ksz8795_xmit,
 	.rcv	= ksz8795_rcv,
 	.overhead = KSZ_INGRESS_TAG_LEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 DSA_TAG_DRIVER(ksz8795_netdev_ops);
@@ -198,6 +199,7 @@  static const struct dsa_device_ops ksz9477_netdev_ops = {
 	.xmit	= ksz9477_xmit,
 	.rcv	= ksz9477_rcv,
 	.overhead = KSZ9477_INGRESS_TAG_LEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 DSA_TAG_DRIVER(ksz9477_netdev_ops);
@@ -236,6 +238,7 @@  static const struct dsa_device_ops ksz9893_netdev_ops = {
 	.xmit	= ksz9893_xmit,
 	.rcv	= ksz9477_rcv,
 	.overhead = KSZ_INGRESS_TAG_LEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 DSA_TAG_DRIVER(ksz9893_netdev_ops);
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index eb0e7a32e53d..16cdc2e4c050 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -134,6 +134,7 @@  static const struct dsa_device_ops lan9303_netdev_ops = {
 	.xmit = lan9303_xmit,
 	.rcv = lan9303_rcv,
 	.overhead = LAN9303_TAG_LEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 MODULE_LICENSE("GPL");
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index b5705cba8318..c96354f12317 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -89,22 +89,13 @@  static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 	return skb;
 }
 
-static int mtk_tag_flow_dissect(const struct sk_buff *skb, __be16 *proto,
-				int *offset)
-{
-	*offset = 4;
-	*proto = ((__be16 *)skb->data)[1];
-
-	return 0;
-}
-
 static const struct dsa_device_ops mtk_netdev_ops = {
 	.name		= "mtk",
 	.proto		= DSA_TAG_PROTO_MTK,
 	.xmit		= mtk_tag_xmit,
 	.rcv		= mtk_tag_rcv,
-	.flow_dissect	= mtk_tag_flow_dissect,
 	.overhead	= MTK_HDR_LEN,
+	.proto_off	= 1,
 };
 
 MODULE_LICENSE("GPL");
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index 8e3e7283d430..f9d9cc705caf 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -233,6 +233,7 @@  static struct dsa_device_ops ocelot_netdev_ops = {
 	.xmit			= ocelot_xmit,
 	.rcv			= ocelot_rcv,
 	.overhead		= OCELOT_TAG_LEN + OCELOT_LONG_PREFIX_LEN,
+	.proto_off		= DSA_PROTO_OFF_UNSPEC,
 };
 
 MODULE_LICENSE("GPL v2");
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index c95885215525..87cf2b9f78ea 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -90,22 +90,13 @@  static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 	return skb;
 }
 
-static int qca_tag_flow_dissect(const struct sk_buff *skb, __be16 *proto,
-                                int *offset)
-{
-	*offset = QCA_HDR_LEN;
-	*proto = ((__be16 *)skb->data)[0];
-
-	return 0;
-}
-
 static const struct dsa_device_ops qca_netdev_ops = {
 	.name	= "qca",
 	.proto	= DSA_TAG_PROTO_QCA,
 	.xmit	= qca_tag_xmit,
 	.rcv	= qca_tag_rcv,
-	.flow_dissect = qca_tag_flow_dissect,
 	.overhead = QCA_HDR_LEN,
+	.proto_off = 0,
 };
 
 MODULE_LICENSE("GPL");
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 63ef2a14c934..9be591186638 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -300,6 +300,7 @@  static struct dsa_device_ops sja1105_netdev_ops = {
 	.rcv = sja1105_rcv,
 	.filter = sja1105_filter,
 	.overhead = VLAN_HLEN,
+	.proto_off = DSA_PROTO_OFF_UNSPEC,
 };
 
 MODULE_LICENSE("GPL v2");