diff mbox series

[net-next,v2] docs: net: dsa: describe issues with checksum offload

Message ID 20220411230305.28951-1-luizluca@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] docs: net: dsa: describe issues with checksum offload | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 23 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Luiz Angelo Daros de Luca April 11, 2022, 11:03 p.m. UTC
DSA tags before IP header (categories 1 and 2) or after the payload (3)
might introduce offload checksum issues.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
---
 Documentation/networking/dsa/dsa.rst | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Vladimir Oltean April 13, 2022, 8:08 p.m. UTC | #1
I've copied a bunch of new people to this email.

TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060,
does the DSA master declare any of the following features as "on"?

ethtool -k eth0 | grep tx-checksum-ip

I would expect not. Otherwise, we've either found a bug, or discovered the Sasquatch.

On Mon, Apr 11, 2022 at 08:03:06PM -0300, Luiz Angelo Daros de Luca wrote:
> DSA tags before IP header (categories 1 and 2) or after the payload (3)
> might introduce offload checksum issues.
> 
> Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
> ---

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>

>  Documentation/networking/dsa/dsa.rst | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
> index ddc1dd039337..ed7fa76e7a40 100644
> --- a/Documentation/networking/dsa/dsa.rst
> +++ b/Documentation/networking/dsa/dsa.rst
> @@ -193,6 +193,23 @@ protocol. If not all packets are of equal size, the tagger can implement the
>  default behavior by specifying the correct offset incurred by each individual
>  RX packet. Tail taggers do not cause issues to the flow dissector.
>  
> +Checksum offload should work with category 1 and 2 taggers when the DSA master
> +driver declares NETIF_F_HW_CSUM in vlan_features and looks at csum_start and
> +csum_offset. For those cases, DSA will shift the checksum start and offset by
> +the tag size. If the DSA master driver still uses the legacy NETIF_F_IP_CSUM
> +or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the
> +offload hardware already expects that specific tag (perhaps due to matching
> +vendors). DSA slaves inherit those flags from the master port, and it is up to
> +the driver to correctly fall back to software checksum when the IP header is not
> +where the hardware expects. If that check is ineffective, the packets might go
> +to the network without a proper checksum (the checksum field will have the
> +pseudo IP header sum). For category 3, when the offload hardware does not
> +already expect the switch tag in use, the checksum must be calculated before any
> +tag is inserted (i.e. inside the tagger). Otherwise, the DSA master would
> +include the tail tag in the (software or hardware) checksum calculation. Then,
> +when the tag gets stripped by the switch during transmission, it will leave an
> +incorrect IP checksum in place.
> +

While what you're describing here is truthful to what is currently being
done, I'm re-reading this conversation:
https://lore.kernel.org/netdev/20210715114908.ripblpevmdujkf2m@skbuf/T/#m13a2e3a78d22b82f14bcdf85d988844053b1e8f9
and trying to remember why I didn't point out what now seems obvious.

It was said that inheriting master->vlan_features & NETIF_F_HW_CSUM is
counter-productive for tail taggers, since now we have to patch all of
them to do that "skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))"
dance. And that is most definitely true.

It was also said that some systems where the DSA master vendor coincides
with the DSA switch vendor rely on the switch inheriting NETIF_F_HW_CSUM
from master->vlan_features, for a boost in performance. That is also
most definitely true.

But none of the examples given was for a tail tagger, which is what the
discussion was about. With the exception of the obsolete tag_trailer.c
used by mv88e6060, Marvell use Ethertype headers, and Broadcom either
use an Ethertype header or a header prepended to the Ethernet header.

Of all tagging protocol drivers which declare a non-zero needed_tailroom:

- tag_ksz.c also calls skb_checksum_help() so I don't have doubts that
  there aren't masters which offload checksumming for it

- tag_trailer.c doesn't call skb_checksum_help(), but it's orphan and
  probably super broken anyway

- tag_hellcreek.c doesn't call skb_checksum_help() and is therefore
  probably broken with checksum offloading masters. But it was probably
  only tested on FPGA (or at least I assume "hirschmann,hellcreek-de1soc-r1"
  stands for "Altera DE1") and it happens to work there.

- tag_xrs700x.c doesn't call skb_checksum_help() either, so there are
  probably breakages waiting to happen

- tag_sja1105.c (actually only SJA1110) uses a tail tag only for
  link-local traffic, which is non-IP so there is no checksum offload
  breakage there

- tag_rtl8_4.c (the tail tagging version) has been added by yourself
  with a call to skb_checksum_help().

In any case, we give this advice to driver writers so off-hand that it's
comical (I'm not attacking you, Luiz, for merely writing it down):

| For category 3, when the offload hardware does not already expect the
| switch tag in use, the checksum must be calculated before any tag is
| inserted (i.e. inside the tagger).

As if the tagging protocol driver has any crystal ball to guess whether
the offload hardware of the DSA master in current use is going to expect
the tail tag or not. BS. A tail tagging protocol concerned with correctness
and portability is always going to call skb_checksum_help(), hence the
absurdity of allowing tail taggers to inherit NETIF_F_HW_CSUM |
NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM from the master in the first place.

>  Due to various reasons (most common being category 1 taggers being associated
>  with DSA-unaware masters, mangling what the master perceives as MAC DA), the
>  tagging protocol may require the DSA master to operate in promiscuous mode, to
> -- 
> 2.35.1
>
Andrew Lunn April 13, 2022, 8:49 p.m. UTC | #2
On Wed, Apr 13, 2022 at 08:08:41PM +0000, Vladimir Oltean wrote:
> I've copied a bunch of new people to this email.
> 
> TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060,
> does the DSA master declare any of the following features as "on"?
> 
> ethtool -k eth0 | grep tx-checksum-ip

Zii-devel-c, which uses a FEC as master:

root@zii-devel-c:~# ethtool -k eth1 | grep tx-checksum-ip
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]

370RD is a Marvell reference design, using mvneta as the master

andrew@370rd:~$ /usr/sbin/ethtool -k eth1 | grep tx-checksum-ip
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on

WRT1900AC is a WiFi access point, also mvneta

root@wrt1900ac:~# ethtool -k eth0 | grep tx-checksum-ip
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on

I have one more system i can check, using a Marvell Kirkwood SoC using
the mv643xx as master. I need to blow the dust off it first, i've not
booted it in years.

    Andrew
Vladimir Oltean April 13, 2022, 8:53 p.m. UTC | #3
On Wed, Apr 13, 2022 at 10:49:53PM +0200, Andrew Lunn wrote:
> On Wed, Apr 13, 2022 at 08:08:41PM +0000, Vladimir Oltean wrote:
> > I've copied a bunch of new people to this email.
> > 
> > TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060,
> > does the DSA master declare any of the following features as "on"?
> > 
> > ethtool -k eth0 | grep tx-checksum-ip
> 
> Zii-devel-c, which uses a FEC as master:
> 
> root@zii-devel-c:~# ethtool -k eth1 | grep tx-checksum-ip
> 	tx-checksum-ipv4: off [fixed]
> 	tx-checksum-ip-generic: off [fixed]
> 	tx-checksum-ipv6: off [fixed]
> 
> 370RD is a Marvell reference design, using mvneta as the master
> 
> andrew@370rd:~$ /usr/sbin/ethtool -k eth1 | grep tx-checksum-ip
> 	tx-checksum-ipv4: on
> 	tx-checksum-ip-generic: off [fixed]
> 	tx-checksum-ipv6: on
> 
> WRT1900AC is a WiFi access point, also mvneta
> 
> root@wrt1900ac:~# ethtool -k eth0 | grep tx-checksum-ip
>         tx-checksum-ipv4: on
>         tx-checksum-ip-generic: off [fixed]
>         tx-checksum-ipv6: on
> 
> I have one more system i can check, using a Marvell Kirkwood SoC using
> the mv643xx as master. I need to blow the dust off it first, i've not
> booted it in years.
> 
>     Andrew

I meant to ask about the actual mv88e6060 driver, the one that uses
tag_trailer.c, not mv88e6xxx.
Andrew Lunn April 13, 2022, 8:56 p.m. UTC | #4
On Wed, Apr 13, 2022 at 08:08:41PM +0000, Vladimir Oltean wrote:
> I've copied a bunch of new people to this email.
> 
> TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060,
> does the DSA master declare any of the following features as "on"?
> 
> ethtool -k eth0 | grep tx-checksum-ip
> 
> I would expect not. Otherwise, we've either found a bug, or discovered the Sasquatch.

Easier than i was expecting. As i said, Marvell Kirkwood with a
mv643xx as master:

root@dir665:~# ethtool -k eth0 | grep tx-checksum-ip
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off [fixed]

	Andrew
Andrew Lunn April 13, 2022, 8:57 p.m. UTC | #5
> I meant to ask about the actual mv88e6060 driver, the one that uses
> tag_trailer.c, not mv88e6xxx.

Ah, sorry, i don't have one of those. And i've no idea of anybody who
does. It is a long long time since i heard of anybody with one.

      Andrew
Vladimir Oltean April 13, 2022, 9 p.m. UTC | #6
On Wed, Apr 13, 2022 at 10:57:42PM +0200, Andrew Lunn wrote:
> > I meant to ask about the actual mv88e6060 driver, the one that uses
> > tag_trailer.c, not mv88e6xxx.
> 
> Ah, sorry, i don't have one of those. And i've no idea of anybody who
> does. It is a long long time since i heard of anybody with one.
> 
>       Andrew

Ok, I'll go with "no checksum offload for its trailer tag, and bugs
never fixed because no one uses it", in any case no Sasquatch. Thanks.
Luiz Angelo Daros de Luca April 14, 2022, 2:48 a.m. UTC | #7
> Ok, I'll go with "no checksum offload for its trailer tag, and bugs
> never fixed because no one uses it", in any case no Sasquatch. Thanks.

Vladimir, so the DSA switch will not copy the offload flags when a tag
requests tail room? At least it will work.

Now, if the offload HW does support that tag, what would be the
options? Set the slave port checksum flag from userland?
It would be nice to have some type of "magic trick" to have it enabled
by default. I'm already expecting a no, but wouldn't it be a nice case
for a DSA property in the device tree?

Regards,

Luiz
Kurt Kanzenbach April 14, 2022, 6:29 a.m. UTC | #8
On Wed Apr 13 2022, Vladimir Oltean wrote:
> I've copied a bunch of new people to this email.
>
> TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060,
> does the DSA master declare any of the following features as "on"?
>
> ethtool -k eth0 | grep tx-checksum-ip

It's a Cyclone V with stmmac as master:

|root@tsn:~# ethtool -k eth0 | grep tx-checksum-ip
|        tx-checksum-ipv4: on
|        tx-checksum-ip-generic: off [fixed]
|        tx-checksum-ipv6: on

>
> I would expect not. Otherwise, we've either found a bug, or discovered
> the Sasquatch.

Now, I'm wondering how this actually works. Anyway, I'll send a patch to
add the missing skb_checksum_help().

Thanks,
Kurt
Vladimir Oltean April 14, 2022, 12:58 p.m. UTC | #9
On Wed, Apr 13, 2022 at 11:48:46PM -0300, Luiz Angelo Daros de Luca wrote:
> > Ok, I'll go with "no checksum offload for its trailer tag, and bugs
> > never fixed because no one uses it", in any case no Sasquatch. Thanks.
> 
> Vladimir, so the DSA switch will not copy the offload flags when a tag
> requests tail room? At least it will work.
> 
> Now, if the offload HW does support that tag, what would be the
> options? Set the slave port checksum flag from userland?
> It would be nice to have some type of "magic trick" to have it enabled
> by default. I'm already expecting a no, but wouldn't it be a nice case
> for a DSA property in the device tree?
> 
> Regards,
> 
> Luiz

DSA calls netdev_upper_dev_link(master, slave_dev, NULL) to establish a
relationship with its master and the master driver can detect this by
monitoring NETDEV_CHANGEUPPER.

If we look a bit closer at the implementation of netdev_upper_dev_link
we see it calls __netdev_upper_dev_link() which contains an interesting
pair of arguments "void *upper_priv, void *upper_info". These are
accessible to netdev_master_upper_dev_link(), and the bonding driver
(for example) makes use of them, see bond_master_upper_dev_link().

I'm thinking DSA could create a struct netdev_dsa_upper_info too, and
certain DSA masters could populate things in it. Then DSA could look at
what the DSA master has said (or not said) and fix up features
accordingly.

One information that could get populated by the master is a bit field of
whether checksumming is supported for a certain tagging protocol.
I'd rather pass a full bit field of all tagging protocols, rather than
just the protocol in current use by the slave, because:
(a) it's less gory compared to having the master look at
    dsa_port_from_netdev(info->upper_dev)->cpu_dp->tag_ops->proto
(b) the DSA tagging protocol can change at runtime, and when it does, no
    NETDEV_CHANGEUPPER will be emitted, so the master won't have a
    chance to inform us whether it can offload checksumming for the new
    protocol. So it's better to have this information from the get go.

We'd also need the DSA master to populate a "bool acked=true" from this
new struct netdev_dsa_upper_info. The reason is to be able to
distinguish between an empty bit mask that means "yup, I really can't
offload checksumming for anything", and a bit mask that means "DSA who?"
(where checksum offloading is expected to work under the normal
circumstances described by you, no special code required).
Vladimir Oltean April 14, 2022, 3:22 p.m. UTC | #10
On Thu, Apr 14, 2022 at 08:29:12AM +0200, Kurt Kanzenbach wrote:
> On Wed Apr 13 2022, Vladimir Oltean wrote:
> > I've copied a bunch of new people to this email.
> >
> > TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060,
> > does the DSA master declare any of the following features as "on"?
> >
> > ethtool -k eth0 | grep tx-checksum-ip
> 
> It's a Cyclone V with stmmac as master:
> 
> |root@tsn:~# ethtool -k eth0 | grep tx-checksum-ip
> |        tx-checksum-ipv4: on
> |        tx-checksum-ip-generic: off [fixed]
> |        tx-checksum-ipv6: on
> 
> >
> > I would expect not. Otherwise, we've either found a bug, or discovered
> > the Sasquatch.
> 
> Now, I'm wondering how this actually works. Anyway, I'll send a patch to
> add the missing skb_checksum_help().
> 
> Thanks,
> Kurt

Thanks, Kurt.

It works because stmmac declares NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM in
ndev->hw_features but not in ndev->vlan_features. Whereas DSA inherits
what it inherits from ndev->vlan_features.

It's good to fix this in hellcreek anyway.
Luiz Angelo Daros de Luca April 15, 2022, 8:01 a.m. UTC | #11
> On Wed, Apr 13, 2022 at 11:48:46PM -0300, Luiz Angelo Daros de Luca wrote:
> > > Ok, I'll go with "no checksum offload for its trailer tag, and bugs
> > > never fixed because no one uses it", in any case no Sasquatch. Thanks.
> >
> > Vladimir, so the DSA switch will not copy the offload flags when a tag
> > requests tail room? At least it will work.
> >
> > Now, if the offload HW does support that tag, what would be the
> > options? Set the slave port checksum flag from userland?
> > It would be nice to have some type of "magic trick" to have it enabled
> > by default. I'm already expecting a no, but wouldn't it be a nice case
> > for a DSA property in the device tree?
> >
> > Regards,
> >
> > Luiz
>
> DSA calls netdev_upper_dev_link(master, slave_dev, NULL) to establish a
> relationship with its master and the master driver can detect this by
> monitoring NETDEV_CHANGEUPPER.
>
> If we look a bit closer at the implementation of netdev_upper_dev_link
> we see it calls __netdev_upper_dev_link() which contains an interesting
> pair of arguments "void *upper_priv, void *upper_info". These are
> accessible to netdev_master_upper_dev_link(), and the bonding driver
> (for example) makes use of them, see bond_master_upper_dev_link().
>
> I'm thinking DSA could create a struct netdev_dsa_upper_info too, and
> certain DSA masters could populate things in it. Then DSA could look at
> what the DSA master has said (or not said) and fix up features
> accordingly.
>
> One information that could get populated by the master is a bit field of
> whether checksumming is supported for a certain tagging protocol.
> I'd rather pass a full bit field of all tagging protocols, rather than
> just the protocol in current use by the slave, because:
> (a) it's less gory compared to having the master look at
>     dsa_port_from_netdev(info->upper_dev)->cpu_dp->tag_ops->proto
> (b) the DSA tagging protocol can change at runtime, and when it does, no
>     NETDEV_CHANGEUPPER will be emitted, so the master won't have a
>     chance to inform us whether it can offload checksumming for the new
>     protocol. So it's better to have this information from the get go.
>
> We'd also need the DSA master to populate a "bool acked=true" from this
> new struct netdev_dsa_upper_info. The reason is to be able to
> distinguish between an empty bit mask that means "yup, I really can't
> offload checksumming for anything", and a bit mask that means "DSA who?"
> (where checksum offloading is expected to work under the normal
> circumstances described by you, no special code required).

That looks like a larger change. Should we put this patch on hold
waiting for the code refactor or we merge it as is (as it tells no
lies about current code).
Vladimir Oltean April 15, 2022, 10:22 a.m. UTC | #12
On Fri, Apr 15, 2022 at 05:01:46AM -0300, Luiz Angelo Daros de Luca wrote:
> That looks like a larger change.

Well, you asked what are the options in the unlikely event that we'll
see offloading DSA masters for tail tags.

> Should we put this patch on hold waiting for the code refactor or we
> merge it as is (as it tells no lies about current code).

It looks like the patch was marked as "changes requested", my side
comments other than the Reviewed-by tag were probably misinterpreted:
https://patchwork.kernel.org/project/netdevbpf/patch/20220411230305.28951-1-luizluca@gmail.com/

So please repost.
Luiz Angelo Daros de Luca April 16, 2022, 5:30 a.m. UTC | #13
> > Should we put this patch on hold waiting for the code refactor or we
> > merge it as is (as it tells no lies about current code).
>
> It looks like the patch was marked as "changes requested", my side
> comments other than the Reviewed-by tag were probably misinterpreted:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220411230305.28951-1-luizluca@gmail.com/
>
> So please repost.

Done as v3 (unchanged except for you review tag). Thanks!
diff mbox series

Patch

diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index ddc1dd039337..ed7fa76e7a40 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -193,6 +193,23 @@  protocol. If not all packets are of equal size, the tagger can implement the
 default behavior by specifying the correct offset incurred by each individual
 RX packet. Tail taggers do not cause issues to the flow dissector.
 
+Checksum offload should work with category 1 and 2 taggers when the DSA master
+driver declares NETIF_F_HW_CSUM in vlan_features and looks at csum_start and
+csum_offset. For those cases, DSA will shift the checksum start and offset by
+the tag size. If the DSA master driver still uses the legacy NETIF_F_IP_CSUM
+or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the
+offload hardware already expects that specific tag (perhaps due to matching
+vendors). DSA slaves inherit those flags from the master port, and it is up to
+the driver to correctly fall back to software checksum when the IP header is not
+where the hardware expects. If that check is ineffective, the packets might go
+to the network without a proper checksum (the checksum field will have the
+pseudo IP header sum). For category 3, when the offload hardware does not
+already expect the switch tag in use, the checksum must be calculated before any
+tag is inserted (i.e. inside the tagger). Otherwise, the DSA master would
+include the tail tag in the (software or hardware) checksum calculation. Then,
+when the tag gets stripped by the switch during transmission, it will leave an
+incorrect IP checksum in place.
+
 Due to various reasons (most common being category 1 taggers being associated
 with DSA-unaware masters, mangling what the master perceives as MAC DA), the
 tagging protocol may require the DSA master to operate in promiscuous mode, to