diff mbox series

[net-next,4/4] xsk: add helper to get &xdp_desc's DMA and meta pointer in one go

Message ID 20250206182630.3914318-5-aleksander.lobakin@intel.com (mailing list archive)
State Accepted
Commit 23d9324a27a48858cfdd7f0342f52328e8595c6d
Delegated to: Netdev Maintainers
Headers show
Series xsk: the lost bits from Chapter III | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers warning 4 maintainers not CCed: jonathan.lemon@gmail.com bjorn@kernel.org hawk@kernel.org horms@kernel.org
netdev/build_clang success Errors and warnings before: 157 this patch: 157
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 30 this patch: 30
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 152 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-07--21-00 (tests: 890)

Commit Message

Alexander Lobakin Feb. 6, 2025, 6:26 p.m. UTC
Currently, when your driver supports XSk Tx metadata and you want to
send an XSk frame, you need to do the following:

* call external xsk_buff_raw_get_dma();
* call inline xsk_buff_get_metadata(), which calls external
  xsk_buff_raw_get_data() and then do some inline checks.

This effectively means that the following piece:

addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;

is done twice per frame, plus you have 2 external calls per frame, plus
this:

	meta = pool->addrs + addr - pool->tx_metadata_len;
	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))

is always inlined, even if there's no meta or it's invalid.

Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
in one go. It returns a small structure with 2 fields: DMA address,
filled unconditionally, and metadata pointer, non-NULL only if it's
present and valid. The address correction is performed only once and
you also have only 1 external call per XSk frame, which does all the
calculations and checks outside of your hotpath. You only need to
check `if (ctx.meta)` for the metadata presence.
To not copy any existing code, derive address correction and getting
virtual and DMA address into small helpers. bloat-o-meter reports no
object code changes for the existing functionality.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/xdp_sock_drv.h  | 43 +++++++++++++++++++++++++++++++---
 include/net/xsk_buff_pool.h |  8 +++++++
 net/xdp/xsk_buff_pool.c     | 46 +++++++++++++++++++++++++++++++++----
 3 files changed, 90 insertions(+), 7 deletions(-)

Comments

Simon Horman Feb. 9, 2025, 11:03 a.m. UTC | #1
On Thu, Feb 06, 2025 at 07:26:29PM +0100, Alexander Lobakin wrote:
> Currently, when your driver supports XSk Tx metadata and you want to
> send an XSk frame, you need to do the following:
> 
> * call external xsk_buff_raw_get_dma();
> * call inline xsk_buff_get_metadata(), which calls external
>   xsk_buff_raw_get_data() and then do some inline checks.
> 
> This effectively means that the following piece:
> 
> addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
> 
> is done twice per frame, plus you have 2 external calls per frame, plus
> this:
> 
> 	meta = pool->addrs + addr - pool->tx_metadata_len;
> 	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
> 
> is always inlined, even if there's no meta or it's invalid.
> 
> Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
> in one go. It returns a small structure with 2 fields: DMA address,
> filled unconditionally, and metadata pointer, non-NULL only if it's
> present and valid. The address correction is performed only once and
> you also have only 1 external call per XSk frame, which does all the
> calculations and checks outside of your hotpath. You only need to
> check `if (ctx.meta)` for the metadata presence.
> To not copy any existing code, derive address correction and getting
> virtual and DMA address into small helpers. bloat-o-meter reports no
> object code changes for the existing functionality.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>

Hi Alexander,

I think that this patch needs to be accompanied by at least one
patch that uses xsk_buff_raw_get_ctx() in a driver.

Also, as this seems to be an optimisation, some performance data would
be nice too.

Which brings me to my last point. I'd always understood that
returning a struct was discouraged due to performance implications.
Perhaps that information is out of date, doesn't apply because
the returned struct is so small in this case, or just plain wrong.
But I'd appreciate it if you could add some colour to this.
Alexander Lobakin Feb. 10, 2025, 4 p.m. UTC | #2
From: Simon Horman <horms@kernel.org>
Date: Sun, 9 Feb 2025 11:03:44 +0000

> On Thu, Feb 06, 2025 at 07:26:29PM +0100, Alexander Lobakin wrote:
>> Currently, when your driver supports XSk Tx metadata and you want to
>> send an XSk frame, you need to do the following:
>>
>> * call external xsk_buff_raw_get_dma();
>> * call inline xsk_buff_get_metadata(), which calls external
>>   xsk_buff_raw_get_data() and then do some inline checks.
>>
>> This effectively means that the following piece:
>>
>> addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
>>
>> is done twice per frame, plus you have 2 external calls per frame, plus
>> this:
>>
>> 	meta = pool->addrs + addr - pool->tx_metadata_len;
>> 	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
>>
>> is always inlined, even if there's no meta or it's invalid.
>>
>> Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
>> in one go. It returns a small structure with 2 fields: DMA address,
>> filled unconditionally, and metadata pointer, non-NULL only if it's
>> present and valid. The address correction is performed only once and
>> you also have only 1 external call per XSk frame, which does all the
>> calculations and checks outside of your hotpath. You only need to
>> check `if (ctx.meta)` for the metadata presence.
>> To not copy any existing code, derive address correction and getting
>> virtual and DMA address into small helpers. bloat-o-meter reports no
>> object code changes for the existing functionality.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> 
> Hi Alexander,
> 
> I think that this patch needs to be accompanied by at least one
> patch that uses xsk_buff_raw_get_ctx() in a driver.

This mini-series is the final part of my Chapter III, which was all
about prereqs in order to add libeth_xdp and then XDP for idpf.
This helper will be used in the next series (Chapter IV) I'll send once
this lands.

> 
> Also, as this seems to be an optimisation, some performance data would
> be nice too.

-1 Kb of object code which has an unrolled-by-8 loop which used this
function each iteration. I don't remember the perf numbers since it was
a year ago and since then I've been using this helper only, but it was
something around a couple procent (which is several hundred Kpps when it
comes to XSk).

> 
> Which brings me to my last point. I'd always understood that
> returning a struct was discouraged due to performance implications.

Rather stack usage, not perf implications. Compound returns are used
heavily throughout the kernel code when sizeof(result) <= 16 bytes.
Here it's also 16 bytes. Just the same as one __u128. Plus this function
doesn't recurse, so the stack won't blow up.

> Perhaps that information is out of date, doesn't apply because
> the returned struct is so small in this case, or just plain wrong.
> But I'd appreciate it if you could add some colour to this.

Moreover, the function is global, not inline, so passing a pointer here
instead of returning a struct may even behave worse in this case.

(and we'll save basically only 8 bytes on the stack, which I don't
 believe is worth it).

Thanks,
Olek
Simon Horman Feb. 10, 2025, 9:06 p.m. UTC | #3
On Mon, Feb 10, 2025 at 05:00:36PM +0100, Alexander Lobakin wrote:
> From: Simon Horman <horms@kernel.org>
> Date: Sun, 9 Feb 2025 11:03:44 +0000
> 
> > On Thu, Feb 06, 2025 at 07:26:29PM +0100, Alexander Lobakin wrote:
> >> Currently, when your driver supports XSk Tx metadata and you want to
> >> send an XSk frame, you need to do the following:
> >>
> >> * call external xsk_buff_raw_get_dma();
> >> * call inline xsk_buff_get_metadata(), which calls external
> >>   xsk_buff_raw_get_data() and then do some inline checks.
> >>
> >> This effectively means that the following piece:
> >>
> >> addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
> >>
> >> is done twice per frame, plus you have 2 external calls per frame, plus
> >> this:
> >>
> >> 	meta = pool->addrs + addr - pool->tx_metadata_len;
> >> 	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
> >>
> >> is always inlined, even if there's no meta or it's invalid.
> >>
> >> Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
> >> in one go. It returns a small structure with 2 fields: DMA address,
> >> filled unconditionally, and metadata pointer, non-NULL only if it's
> >> present and valid. The address correction is performed only once and
> >> you also have only 1 external call per XSk frame, which does all the
> >> calculations and checks outside of your hotpath. You only need to
> >> check `if (ctx.meta)` for the metadata presence.
> >> To not copy any existing code, derive address correction and getting
> >> virtual and DMA address into small helpers. bloat-o-meter reports no
> >> object code changes for the existing functionality.
> >>
> >> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> > 
> > Hi Alexander,
> > 
> > I think that this patch needs to be accompanied by at least one
> > patch that uses xsk_buff_raw_get_ctx() in a driver.
> 
> This mini-series is the final part of my Chapter III, which was all
> about prereqs in order to add libeth_xdp and then XDP for idpf.
> This helper will be used in the next series (Chapter IV) I'll send once
> this lands.

Understood. If it's going to be used in chapter IV then, given
that we've made it to chapter II, that is fine by me.

> > Also, as this seems to be an optimisation, some performance data would
> > be nice too.
> 
> -1 Kb of object code which has an unrolled-by-8 loop which used this
> function each iteration. I don't remember the perf numbers since it was
> a year ago and since then I've been using this helper only, but it was
> something around a couple procent (which is several hundred Kpps when it
> comes to XSk).

Thanks. It might be worth including some of that information in the commit
message, but I don't feel strongly about it.

> 
> > 
> > Which brings me to my last point. I'd always understood that
> > returning a struct was discouraged due to performance implications.
> 
> Rather stack usage, not perf implications. Compound returns are used
> heavily throughout the kernel code when sizeof(result) <= 16 bytes.
> Here it's also 16 bytes. Just the same as one __u128. Plus this function
> doesn't recurse, so the stack won't blow up.

Also understood. It seems my assumptions were somewhat wrong.
So I have no objections to this approach.

> > Perhaps that information is out of date, doesn't apply because
> > the returned struct is so small in this case, or just plain wrong.
> > But I'd appreciate it if you could add some colour to this.
> 
> Moreover, the function is global, not inline, so passing a pointer here
> instead of returning a struct may even behave worse in this case.
> 
> (and we'll save basically only 8 bytes on the stack, which I don't
>  believe is worth it).
> 
> Thanks,
> Olek
>
diff mbox series

Patch

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 784cd34f5bba..15086dcf51d8 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -196,6 +196,23 @@  static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return xp_raw_get_data(pool, addr);
 }
 
+/**
+ * xsk_buff_raw_get_ctx - get &xdp_desc context
+ * @pool: XSk buff pool desc address belongs to
+ * @addr: desc address (from userspace)
+ *
+ * Wrapper for xp_raw_get_ctx() to be used in drivers, see its kdoc for
+ * details.
+ *
+ * Return: new &xdp_desc_ctx struct containing desc's DMA address and metadata
+ * pointer, if it is present and valid (initialized to %NULL otherwise).
+ */
+static inline struct xdp_desc_ctx
+xsk_buff_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr)
+{
+	return xp_raw_get_ctx(pool, addr);
+}
+
 #define XDP_TXMD_FLAGS_VALID ( \
 		XDP_TXMD_FLAGS_TIMESTAMP | \
 		XDP_TXMD_FLAGS_CHECKSUM | \
@@ -207,20 +224,27 @@  xsk_buff_valid_tx_metadata(const struct xsk_tx_metadata *meta)
 	return !(meta->flags & ~XDP_TXMD_FLAGS_VALID);
 }
 
-static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr)
+static inline struct xsk_tx_metadata *
+__xsk_buff_get_metadata(const struct xsk_buff_pool *pool, void *data)
 {
 	struct xsk_tx_metadata *meta;
 
 	if (!pool->tx_metadata_len)
 		return NULL;
 
-	meta = xp_raw_get_data(pool, addr) - pool->tx_metadata_len;
+	meta = data - pool->tx_metadata_len;
 	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
 		return NULL; /* no way to signal the error to the user */
 
 	return meta;
 }
 
+static inline struct xsk_tx_metadata *
+xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr)
+{
+	return __xsk_buff_get_metadata(pool, xp_raw_get_data(pool, addr));
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp)
 {
 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
@@ -388,12 +412,25 @@  static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return NULL;
 }
 
+static inline struct xdp_desc_ctx
+xsk_buff_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr)
+{
+	return (struct xdp_desc_ctx){ };
+}
+
 static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta)
 {
 	return false;
 }
 
-static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr)
+static inline struct xsk_tx_metadata *
+__xsk_buff_get_metadata(const struct xsk_buff_pool *pool, void *data)
+{
+	return NULL;
+}
+
+static inline struct xsk_tx_metadata *
+xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr)
 {
 	return NULL;
 }
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 50779406bc2d..1dcd4d71468a 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -141,6 +141,14 @@  u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max);
 bool xp_can_alloc(struct xsk_buff_pool *pool, u32 count);
 void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr);
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr);
+
+struct xdp_desc_ctx {
+	dma_addr_t dma;
+	struct xsk_tx_metadata *meta;
+};
+
+struct xdp_desc_ctx xp_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr);
+
 static inline dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb)
 {
 	return xskb->dma;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 1f7975b49657..c263fb7a68dc 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -699,18 +699,56 @@  void xp_free(struct xdp_buff_xsk *xskb)
 }
 EXPORT_SYMBOL(xp_free);
 
-void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
+static u64 __xp_raw_get_addr(const struct xsk_buff_pool *pool, u64 addr)
+{
+	return pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
+}
+
+static void *__xp_raw_get_data(const struct xsk_buff_pool *pool, u64 addr)
 {
-	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
 	return pool->addrs + addr;
 }
+
+void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
+{
+	return __xp_raw_get_data(pool, __xp_raw_get_addr(pool, addr));
+}
 EXPORT_SYMBOL(xp_raw_get_data);
 
-dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
+static dma_addr_t __xp_raw_get_dma(const struct xsk_buff_pool *pool, u64 addr)
 {
-	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
 	return (pool->dma_pages[addr >> PAGE_SHIFT] &
 		~XSK_NEXT_PG_CONTIG_MASK) +
 		(addr & ~PAGE_MASK);
 }
+
+dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
+{
+	return __xp_raw_get_dma(pool, __xp_raw_get_addr(pool, addr));
+}
 EXPORT_SYMBOL(xp_raw_get_dma);
+
+/**
+ * xp_raw_get_ctx - get &xdp_desc context
+ * @pool: XSk buff pool desc address belongs to
+ * @addr: desc address (from userspace)
+ *
+ * Helper for getting desc's DMA address and metadata pointer, if present.
+ * Saves one call on hotpath, double calculation of the actual address,
+ * and inline checks for metadata presence and sanity.
+ *
+ * Return: new &xdp_desc_ctx struct containing desc's DMA address and metadata
+ * pointer, if it is present and valid (initialized to %NULL otherwise).
+ */
+struct xdp_desc_ctx xp_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr)
+{
+	struct xdp_desc_ctx ret;
+
+	addr = __xp_raw_get_addr(pool, addr);
+
+	ret.dma = __xp_raw_get_dma(pool, addr);
+	ret.meta = __xsk_buff_get_metadata(pool, __xp_raw_get_data(pool, addr));
+
+	return ret;
+}
+EXPORT_SYMBOL(xp_raw_get_ctx);