diff mbox series

[RFC,bpf-next,v2,09/24] xdp: Add VLAN tag hint

Message ID 20230927075124.23941-10-larysa.zaremba@intel.com (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series XDP metadata via kfuncs for ice + mlx5 | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-10 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on s390x with gcc
netdev/series_format fail Series longer than 15 patches (and no cover letter)
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 5487 this patch: 5488
netdev/cc_maintainers warning 7 maintainers not CCed: corbet@lwn.net linux-doc@vger.kernel.org pabeni@redhat.com davem@davemloft.net edumazet@google.com lorenzo@kernel.org hawk@kernel.org
netdev/build_clang success Errors and warnings before: 1660 this patch: 1660
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 5867 this patch: 5868
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 105 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Larysa Zaremba Sept. 27, 2023, 7:51 a.m. UTC
Implement functionality that enables drivers to expose VLAN tag
to XDP code.

VLAN tag is represented by 2 variables:
- protocol ID, which is passed to bpf code in BE
- VLAN TCI, in host byte order

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 Documentation/networking/xdp-rx-metadata.rst |  8 ++++-
 include/net/xdp.h                            |  6 ++++
 include/uapi/linux/netdev.h                  |  5 ++-
 net/core/xdp.c                               | 33 ++++++++++++++++++++
 tools/include/uapi/linux/netdev.h            |  5 ++-
 5 files changed, 54 insertions(+), 3 deletions(-)

Comments

Jakub Kicinski Oct. 3, 2023, 12:35 p.m. UTC | #1
On Wed, 27 Sep 2023 09:51:09 +0200 Larysa Zaremba wrote:
> Implement functionality that enables drivers to expose VLAN tag
> to XDP code.
> 
> VLAN tag is represented by 2 variables:
> - protocol ID, which is passed to bpf code in BE
> - VLAN TCI, in host byte order

Sorry for a random chime-in but was there any discussion about 
the validity of VLAN stripping as an offload?

I always thought this is a legacy "Windows" thing which allowed
Windows drivers to operate on VLAN-tagged networks even before
the OS itself understood VLANs...  Do people actually care about
having it enabled?
Alexander Lobakin Oct. 3, 2023, 1:09 p.m. UTC | #2
From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 3 Oct 2023 05:35:19 -0700

> On Wed, 27 Sep 2023 09:51:09 +0200 Larysa Zaremba wrote:
>> Implement functionality that enables drivers to expose VLAN tag
>> to XDP code.
>>
>> VLAN tag is represented by 2 variables:
>> - protocol ID, which is passed to bpf code in BE
>> - VLAN TCI, in host byte order
> 
> Sorry for a random chime-in but was there any discussion about 
> the validity of VLAN stripping as an offload?
> 
> I always thought this is a legacy "Windows" thing which allowed
> Windows drivers to operate on VLAN-tagged networks even before
> the OS itself understood VLANs...  Do people actually care about
> having it enabled?

On MIPS routers, I actually have some perf gains from having it enabled.
So they do, I'd say. Mediatek even has DSA tag stripping. Both save you
some skb->data push-pulls, csum corrections when CHECKSUM_COMPLETE, skb
unsharing in some cases, reduce L3/L4 headers cacheline spanning etc.

Thanks,
Olek
Jakub Kicinski Oct. 4, 2023, 6:08 p.m. UTC | #3
On Tue, 3 Oct 2023 15:09:39 +0200 Alexander Lobakin wrote:
> > Sorry for a random chime-in but was there any discussion about 
> > the validity of VLAN stripping as an offload?
> > 
> > I always thought this is a legacy "Windows" thing which allowed
> > Windows drivers to operate on VLAN-tagged networks even before
> > the OS itself understood VLANs...  Do people actually care about
> > having it enabled?  
> 
> On MIPS routers, I actually have some perf gains from having it enabled.
> So they do, I'd say. Mediatek even has DSA tag stripping. Both save you
> some skb->data push-pulls, csum corrections when CHECKSUM_COMPLETE, skb
> unsharing in some cases, reduce L3/L4 headers cacheline spanning etc.

No unsharing - you can still strip it in the driver.
Do you really think that for XDP kfunc call will be cheaper?
Alexander Lobakin Oct. 5, 2023, 4:58 p.m. UTC | #4
From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 4 Oct 2023 11:08:50 -0700

> On Tue, 3 Oct 2023 15:09:39 +0200 Alexander Lobakin wrote:
>>> Sorry for a random chime-in but was there any discussion about 
>>> the validity of VLAN stripping as an offload?
>>>
>>> I always thought this is a legacy "Windows" thing which allowed
>>> Windows drivers to operate on VLAN-tagged networks even before
>>> the OS itself understood VLANs...  Do people actually care about
>>> having it enabled?  
>>
>> On MIPS routers, I actually have some perf gains from having it enabled.
>> So they do, I'd say. Mediatek even has DSA tag stripping. Both save you
>> some skb->data push-pulls, csum corrections when CHECKSUM_COMPLETE, skb
>> unsharing in some cases, reduce L3/L4 headers cacheline spanning etc.
> 
> No unsharing - you can still strip it in the driver.

Nobody manually strips VLAN tags in the drivers. You either have HW
stripping or pass VLAN-tagged skb to the stack, so that skb_vlan_untag()
takes care of it.

> Do you really think that for XDP kfunc call will be cheaper?

Wait, you initially asked:

* discussion about the validity of VLAN stripping as an offload?
* Do people actually care about having it enabled?

I did read this as "do we still need HW VLAN stripping in general?", not
only for XDP. So I replied for "in general" -- yes.
Forcefully disabling stripping when XDP is active is obscure IMO, let
the user decide.

Thanks,
Olek
Jakub Kicinski Oct. 5, 2023, 5:16 p.m. UTC | #5
On Thu, 5 Oct 2023 18:58:33 +0200 Alexander Lobakin wrote:
> > No unsharing - you can still strip it in the driver.  
> 
> Nobody manually strips VLAN tags in the drivers. You either have HW
> stripping or pass VLAN-tagged skb to the stack, so that skb_vlan_untag()
> takes care of it.

Isn't it just a case of circular logic tho?
We don't optimize the stack for SW stripping because HW does it.
Then HW does it because SW is not optimized.

> > Do you really think that for XDP kfunc call will be cheaper?  
> 
> Wait, you initially asked:
> 
> * discussion about the validity of VLAN stripping as an offload?
> * Do people actually care about having it enabled?
> 
> I did read this as "do we still need HW VLAN stripping in general?", not
> only for XDP. So I replied for "in general" -- yes.
> Forcefully disabling stripping when XDP is active is obscure IMO, let
> the user decide.

Every time I'm involved in conversations about NIC datapath host
interfaces I cringe at this stupid VLAN offload. Maybe I'm too
daft to understand it's amazing value but we just shift 2B from
the packet to the descriptor and then we have to worry about all
the corner cases that come from vlan stacking :(
David Ahern Oct. 5, 2023, 5:20 p.m. UTC | #6
On 10/5/23 11:16 AM, Jakub Kicinski wrote:
> On Thu, 5 Oct 2023 18:58:33 +0200 Alexander Lobakin wrote:
>>> No unsharing - you can still strip it in the driver.  
>>
>> Nobody manually strips VLAN tags in the drivers. You either have HW
>> stripping or pass VLAN-tagged skb to the stack, so that skb_vlan_untag()
>> takes care of it.
> 
> Isn't it just a case of circular logic tho?
> We don't optimize the stack for SW stripping because HW does it.
> Then HW does it because SW is not optimized.
> 
>>> Do you really think that for XDP kfunc call will be cheaper?  
>>
>> Wait, you initially asked:
>>
>> * discussion about the validity of VLAN stripping as an offload?
>> * Do people actually care about having it enabled?
>>
>> I did read this as "do we still need HW VLAN stripping in general?", not
>> only for XDP. So I replied for "in general" -- yes.
>> Forcefully disabling stripping when XDP is active is obscure IMO, let
>> the user decide.
> 
> Every time I'm involved in conversations about NIC datapath host
> interfaces I cringe at this stupid VLAN offload. Maybe I'm too
> daft to understand it's amazing value but we just shift 2B from
> the packet to the descriptor and then we have to worry about all
> the corner cases that come from vlan stacking :(

4B (vlan tci + protocol).

VLAN stripping in S/W and pushing the header on Tx is measurable and
does have a noticeable performance impact.

XDP programs need to co-exist with enabled offloads. If the tag is not
stripped, XDP program needs to handle it. If the tag is stripped, the
XDP program needs to access to the value.
Jakub Kicinski Oct. 5, 2023, 6:06 p.m. UTC | #7
On Thu, 5 Oct 2023 11:20:49 -0600 David Ahern wrote:
> > Every time I'm involved in conversations about NIC datapath host
> > interfaces I cringe at this stupid VLAN offload. Maybe I'm too
> > daft to understand it's amazing value but we just shift 2B from
> > the packet to the descriptor and then we have to worry about all
> > the corner cases that come from vlan stacking :(  
> 
> 4B (vlan tci + protocol).
> 
> VLAN stripping in S/W and pushing the header on Tx is measurable and
> does have a noticeable performance impact.
> 
> XDP programs need to co-exist with enabled offloads. If the tag is not
> stripped, XDP program needs to handle it. If the tag is stripped, the
> XDP program needs to access to the value.

Well, I thought I'd ask :) I'm not opposed.

But if either of you have the data on how much slower well-implemented
Rx stripping in the driver is than putting the info in the descriptor,
I'd be very interested.

Tx is a different situation.
diff mbox series

Patch

diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index 205696780b78..bb53b00d1b2c 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -18,7 +18,13 @@  Currently, the following kfuncs are supported. In the future, as more
 metadata is supported, this set will grow:
 
 .. kernel-doc:: net/core/xdp.c
-   :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
+   :identifiers: bpf_xdp_metadata_rx_timestamp
+
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_hash
+
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_vlan_tag
 
 An XDP program can use these kfuncs to read the metadata into stack
 variables for its own consumption. Or, to pass the metadata on to other
diff --git a/include/net/xdp.h b/include/net/xdp.h
index eb77040b4825..ef79f124dbcf 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -399,6 +399,10 @@  void xdp_attachment_setup(struct xdp_attachment_info *info,
 			   NETDEV_XDP_RX_METADATA_HASH, \
 			   bpf_xdp_metadata_rx_hash, \
 			   xmo_rx_hash) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
+			   NETDEV_XDP_RX_METADATA_VLAN_TAG, \
+			   bpf_xdp_metadata_rx_vlan_tag, \
+			   xmo_rx_vlan_tag) \
 
 enum xdp_rx_metadata {
 #define XDP_METADATA_KFUNC(name, _, __, ___) name,
@@ -460,6 +464,8 @@  struct xdp_metadata_ops {
 	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
 	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash,
 			       enum xdp_rss_hash_type *rss_type);
+	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, __be16 *vlan_proto,
+				   u16 *vlan_tci);
 };
 
 #ifdef CONFIG_NET
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 2943a151d4f1..661f603e3e43 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -44,13 +44,16 @@  enum netdev_xdp_act {
  *   timestamp via bpf_xdp_metadata_rx_timestamp().
  * @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet
  *   hash via bpf_xdp_metadata_rx_hash().
+ * @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing stripped
+ *   receive VLAN tag (proto and TCI) via bpf_xdp_metadata_rx_vlan_tag().
  */
 enum netdev_xdp_rx_metadata {
 	NETDEV_XDP_RX_METADATA_TIMESTAMP = 1,
 	NETDEV_XDP_RX_METADATA_HASH = 2,
+	NETDEV_XDP_RX_METADATA_VLAN_TAG = 4,
 
 	/* private: */
-	NETDEV_XDP_RX_METADATA_MASK = 3,
+	NETDEV_XDP_RX_METADATA_MASK = 7,
 };
 
 enum {
diff --git a/net/core/xdp.c b/net/core/xdp.c
index df4789ab512d..fb87925b3dc3 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -738,6 +738,39 @@  __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
 	return -EOPNOTSUPP;
 }
 
+/**
+ * bpf_xdp_metadata_rx_vlan_tag - Get XDP packet outermost VLAN tag
+ * @ctx: XDP context pointer.
+ * @vlan_proto: Destination pointer for VLAN Tag protocol identifier (TPID).
+ * @vlan_tci: Destination pointer for VLAN TCI (VID + DEI + PCP)
+ *
+ * In case of success, ``vlan_proto`` contains *Tag protocol identifier (TPID)*,
+ * usually ``ETH_P_8021Q`` or ``ETH_P_8021AD``, but some networks can use
+ * custom TPIDs. ``vlan_proto`` is stored in **network byte order (BE)**
+ * and should be used as follows:
+ * ``if (vlan_proto == bpf_htons(ETH_P_8021Q)) do_something();``
+ *
+ * ``vlan_tci`` contains the remaining 16 bits of a VLAN tag.
+ * Driver is expected to provide those in **host byte order (usually LE)**,
+ * so the bpf program should not perform byte conversion.
+ * According to 802.1Q standard, *VLAN TCI (Tag control information)*
+ * is a bit field that contains:
+ * *VLAN identifier (VID)* that can be read with ``vlan_tci & 0xfff``,
+ * *Drop eligible indicator (DEI)* - 1 bit,
+ * *Priority code point (PCP)* - 3 bits.
+ * For detailed meaning of DEI and PCP, please refer to other sources.
+ *
+ * Return:
+ * * Returns 0 on success or ``-errno`` on error.
+ * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc
+ * * ``-ENODATA``    : VLAN tag was not stripped or is not available
+ */
+__bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
+					     __be16 *vlan_proto, u16 *vlan_tci)
+{
+	return -EOPNOTSUPP;
+}
+
 __diag_pop();
 
 BTF_SET8_START(xdp_metadata_kfunc_ids)
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 2943a151d4f1..661f603e3e43 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -44,13 +44,16 @@  enum netdev_xdp_act {
  *   timestamp via bpf_xdp_metadata_rx_timestamp().
  * @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet
  *   hash via bpf_xdp_metadata_rx_hash().
+ * @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing stripped
+ *   receive VLAN tag (proto and TCI) via bpf_xdp_metadata_rx_vlan_tag().
  */
 enum netdev_xdp_rx_metadata {
 	NETDEV_XDP_RX_METADATA_TIMESTAMP = 1,
 	NETDEV_XDP_RX_METADATA_HASH = 2,
+	NETDEV_XDP_RX_METADATA_VLAN_TAG = 4,
 
 	/* private: */
-	NETDEV_XDP_RX_METADATA_MASK = 3,
+	NETDEV_XDP_RX_METADATA_MASK = 7,
 };
 
 enum {