Message ID | 20230927075124.23941-10-larysa.zaremba@intel.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | XDP metadata via kfuncs for ice + mlx5 | expand |
On Wed, 27 Sep 2023 09:51:09 +0200 Larysa Zaremba wrote: > Implement functionality that enables drivers to expose VLAN tag > to XDP code. > > VLAN tag is represented by 2 variables: > - protocol ID, which is passed to bpf code in BE > - VLAN TCI, in host byte order Sorry for a random chime-in but was there any discussion about the validity of VLAN stripping as an offload? I always thought this is a legacy "Windows" thing which allowed Windows drivers to operate on VLAN-tagged networks even before the OS itself understood VLANs... Do people actually care about having it enabled?
From: Jakub Kicinski <kuba@kernel.org> Date: Tue, 3 Oct 2023 05:35:19 -0700 > On Wed, 27 Sep 2023 09:51:09 +0200 Larysa Zaremba wrote: >> Implement functionality that enables drivers to expose VLAN tag >> to XDP code. >> >> VLAN tag is represented by 2 variables: >> - protocol ID, which is passed to bpf code in BE >> - VLAN TCI, in host byte order > > Sorry for a random chime-in but was there any discussion about > the validity of VLAN stripping as an offload? > > I always thought this is a legacy "Windows" thing which allowed > Windows drivers to operate on VLAN-tagged networks even before > the OS itself understood VLANs... Do people actually care about > having it enabled? On MIPS routers, I actually have some perf gains from having it enabled. So they do, I'd say. Mediatek even has DSA tag stripping. Both save you some skb->data push-pulls, csum corrections when CHECKSUM_COMPLETE, skb unsharing in some cases, reduce L3/L4 headers cacheline spanning etc. Thanks, Olek
On Tue, 3 Oct 2023 15:09:39 +0200 Alexander Lobakin wrote: > > Sorry for a random chime-in but was there any discussion about > > the validity of VLAN stripping as an offload? > > > > I always thought this is a legacy "Windows" thing which allowed > > Windows drivers to operate on VLAN-tagged networks even before > > the OS itself understood VLANs... Do people actually care about > > having it enabled? > > On MIPS routers, I actually have some perf gains from having it enabled. > So they do, I'd say. Mediatek even has DSA tag stripping. Both save you > some skb->data push-pulls, csum corrections when CHECKSUM_COMPLETE, skb > unsharing in some cases, reduce L3/L4 headers cacheline spanning etc. No unsharing - you can still strip it in the driver. Do you really think that for XDP kfunc call will be cheaper?
From: Jakub Kicinski <kuba@kernel.org> Date: Wed, 4 Oct 2023 11:08:50 -0700 > On Tue, 3 Oct 2023 15:09:39 +0200 Alexander Lobakin wrote: >>> Sorry for a random chime-in but was there any discussion about >>> the validity of VLAN stripping as an offload? >>> >>> I always thought this is a legacy "Windows" thing which allowed >>> Windows drivers to operate on VLAN-tagged networks even before >>> the OS itself understood VLANs... Do people actually care about >>> having it enabled? >> >> On MIPS routers, I actually have some perf gains from having it enabled. >> So they do, I'd say. Mediatek even has DSA tag stripping. Both save you >> some skb->data push-pulls, csum corrections when CHECKSUM_COMPLETE, skb >> unsharing in some cases, reduce L3/L4 headers cacheline spanning etc. > > No unsharing - you can still strip it in the driver. Nobody manually strips VLAN tags in the drivers. You either have HW stripping or pass VLAN-tagged skb to the stack, so that skb_vlan_untag() takes care of it. > Do you really think that for XDP kfunc call will be cheaper? Wait, you initially asked: * discussion about the validity of VLAN stripping as an offload? * Do people actually care about having it enabled? I did read this as "do we still need HW VLAN stripping in general?", not only for XDP. So I replied for "in general" -- yes. Forcefully disabling stripping when XDP is active is obscure IMO, let the user decide. Thanks, Olek
On Thu, 5 Oct 2023 18:58:33 +0200 Alexander Lobakin wrote: > > No unsharing - you can still strip it in the driver. > > Nobody manually strips VLAN tags in the drivers. You either have HW > stripping or pass VLAN-tagged skb to the stack, so that skb_vlan_untag() > takes care of it. Isn't it just a case of circular logic tho? We don't optimize the stack for SW stripping because HW does it. Then HW does it because SW is not optimized. > > Do you really think that for XDP kfunc call will be cheaper? > > Wait, you initially asked: > > * discussion about the validity of VLAN stripping as an offload? > * Do people actually care about having it enabled? > > I did read this as "do we still need HW VLAN stripping in general?", not > only for XDP. So I replied for "in general" -- yes. > Forcefully disabling stripping when XDP is active is obscure IMO, let > the user decide. Every time I'm involved in conversations about NIC datapath host interfaces I cringe at this stupid VLAN offload. Maybe I'm too daft to understand it's amazing value but we just shift 2B from the packet to the descriptor and then we have to worry about all the corner cases that come from vlan stacking :(
On 10/5/23 11:16 AM, Jakub Kicinski wrote: > On Thu, 5 Oct 2023 18:58:33 +0200 Alexander Lobakin wrote: >>> No unsharing - you can still strip it in the driver. >> >> Nobody manually strips VLAN tags in the drivers. You either have HW >> stripping or pass VLAN-tagged skb to the stack, so that skb_vlan_untag() >> takes care of it. > > Isn't it just a case of circular logic tho? > We don't optimize the stack for SW stripping because HW does it. > Then HW does it because SW is not optimized. > >>> Do you really think that for XDP kfunc call will be cheaper? >> >> Wait, you initially asked: >> >> * discussion about the validity of VLAN stripping as an offload? >> * Do people actually care about having it enabled? >> >> I did read this as "do we still need HW VLAN stripping in general?", not >> only for XDP. So I replied for "in general" -- yes. >> Forcefully disabling stripping when XDP is active is obscure IMO, let >> the user decide. > > Every time I'm involved in conversations about NIC datapath host > interfaces I cringe at this stupid VLAN offload. Maybe I'm too > daft to understand it's amazing value but we just shift 2B from > the packet to the descriptor and then we have to worry about all > the corner cases that come from vlan stacking :( 4B (vlan tci + protocol). VLAN stripping in S/W and pushing the header on Tx is measurable and does have a noticeable performance impact. XDP programs need to co-exist with enabled offloads. If the tag is not stripped, XDP program needs to handle it. If the tag is stripped, the XDP program needs to access to the value.
On Thu, 5 Oct 2023 11:20:49 -0600 David Ahern wrote: > > Every time I'm involved in conversations about NIC datapath host > > interfaces I cringe at this stupid VLAN offload. Maybe I'm too > > daft to understand it's amazing value but we just shift 2B from > > the packet to the descriptor and then we have to worry about all > > the corner cases that come from vlan stacking :( > > 4B (vlan tci + protocol). > > VLAN stripping in S/W and pushing the header on Tx is measurable and > does have a noticeable performance impact. > > XDP programs need to co-exist with enabled offloads. If the tag is not > stripped, XDP program needs to handle it. If the tag is stripped, the > XDP program needs to access to the value. Well, I thought I'd ask :) I'm not opposed. But if either of you have the data on how much slower well-implemented Rx stripping in the driver is than putting the info in the descriptor, I'd be very interested. Tx is a different situation.
diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst index 205696780b78..bb53b00d1b2c 100644 --- a/Documentation/networking/xdp-rx-metadata.rst +++ b/Documentation/networking/xdp-rx-metadata.rst @@ -18,7 +18,13 @@ Currently, the following kfuncs are supported. In the future, as more metadata is supported, this set will grow: .. kernel-doc:: net/core/xdp.c - :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash + :identifiers: bpf_xdp_metadata_rx_timestamp + +.. kernel-doc:: net/core/xdp.c + :identifiers: bpf_xdp_metadata_rx_hash + +.. kernel-doc:: net/core/xdp.c + :identifiers: bpf_xdp_metadata_rx_vlan_tag An XDP program can use these kfuncs to read the metadata into stack variables for its own consumption. Or, to pass the metadata on to other diff --git a/include/net/xdp.h b/include/net/xdp.h index eb77040b4825..ef79f124dbcf 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -399,6 +399,10 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, NETDEV_XDP_RX_METADATA_HASH, \ bpf_xdp_metadata_rx_hash, \ xmo_rx_hash) \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \ + NETDEV_XDP_RX_METADATA_VLAN_TAG, \ + bpf_xdp_metadata_rx_vlan_tag, \ + xmo_rx_vlan_tag) \ enum xdp_rx_metadata { #define XDP_METADATA_KFUNC(name, _, __, ___) name, @@ -460,6 +464,8 @@ struct xdp_metadata_ops { int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash, enum xdp_rss_hash_type *rss_type); + int (*xmo_rx_vlan_tag)(const struct xdp_md *ctx, __be16 *vlan_proto, + u16 *vlan_tci); }; #ifdef CONFIG_NET diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index 2943a151d4f1..661f603e3e43 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -44,13 +44,16 @@ enum netdev_xdp_act { * timestamp via bpf_xdp_metadata_rx_timestamp(). * @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet * hash via bpf_xdp_metadata_rx_hash(). + * @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing stripped + * receive VLAN tag (proto and TCI) via bpf_xdp_metadata_rx_vlan_tag(). */ enum netdev_xdp_rx_metadata { NETDEV_XDP_RX_METADATA_TIMESTAMP = 1, NETDEV_XDP_RX_METADATA_HASH = 2, + NETDEV_XDP_RX_METADATA_VLAN_TAG = 4, /* private: */ - NETDEV_XDP_RX_METADATA_MASK = 3, + NETDEV_XDP_RX_METADATA_MASK = 7, }; enum { diff --git a/net/core/xdp.c b/net/core/xdp.c index df4789ab512d..fb87925b3dc3 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -738,6 +738,39 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash, return -EOPNOTSUPP; } +/** + * bpf_xdp_metadata_rx_vlan_tag - Get XDP packet outermost VLAN tag + * @ctx: XDP context pointer. + * @vlan_proto: Destination pointer for VLAN Tag protocol identifier (TPID). + * @vlan_tci: Destination pointer for VLAN TCI (VID + DEI + PCP) + * + * In case of success, ``vlan_proto`` contains *Tag protocol identifier (TPID)*, + * usually ``ETH_P_8021Q`` or ``ETH_P_8021AD``, but some networks can use + * custom TPIDs. ``vlan_proto`` is stored in **network byte order (BE)** + * and should be used as follows: + * ``if (vlan_proto == bpf_htons(ETH_P_8021Q)) do_something();`` + * + * ``vlan_tci`` contains the remaining 16 bits of a VLAN tag. + * Driver is expected to provide those in **host byte order (usually LE)**, + * so the bpf program should not perform byte conversion. + * According to 802.1Q standard, *VLAN TCI (Tag control information)* + * is a bit field that contains: + * *VLAN identifier (VID)* that can be read with ``vlan_tci & 0xfff``, + * *Drop eligible indicator (DEI)* - 1 bit, + * *Priority code point (PCP)* - 3 bits. + * For detailed meaning of DEI and PCP, please refer to other sources. + * + * Return: + * * Returns 0 on success or ``-errno`` on error. + * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc + * * ``-ENODATA`` : VLAN tag was not stripped or is not available + */ +__bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, + __be16 *vlan_proto, u16 *vlan_tci) +{ + return -EOPNOTSUPP; +} + __diag_pop(); BTF_SET8_START(xdp_metadata_kfunc_ids) diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h index 2943a151d4f1..661f603e3e43 100644 --- a/tools/include/uapi/linux/netdev.h +++ b/tools/include/uapi/linux/netdev.h @@ -44,13 +44,16 @@ enum netdev_xdp_act { * timestamp via bpf_xdp_metadata_rx_timestamp(). * @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet * hash via bpf_xdp_metadata_rx_hash(). + * @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing stripped + * receive VLAN tag (proto and TCI) via bpf_xdp_metadata_rx_vlan_tag(). */ enum netdev_xdp_rx_metadata { NETDEV_XDP_RX_METADATA_TIMESTAMP = 1, NETDEV_XDP_RX_METADATA_HASH = 2, + NETDEV_XDP_RX_METADATA_VLAN_TAG = 4, /* private: */ - NETDEV_XDP_RX_METADATA_MASK = 3, + NETDEV_XDP_RX_METADATA_MASK = 7, }; enum {