diff mbox series

[net-next,v3] page_pool: check for dma_sync_size earlier

Message ID 20250106030225.3901305-1-0x1207@gmail.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v3] page_pool: check for dma_sync_size earlier | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-01-06--15-00 (tests: 887)

Commit Message

Furong Xu Jan. 6, 2025, 3:02 a.m. UTC
Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
already do it.
We can save a couple of function calls if we check dma_sync_size earlier.

This is a micro-optimization: about a 0.6% PPS performance improvement
has been observed on a single Cortex-A53 CPU core with a 64-byte UDP RX
traffic test.

Before this patch:
The average over one minute is 234026 packets per second.

After this patch:
The average over one minute is 235537 packets per second.

Signed-off-by: Furong Xu <0x1207@gmail.com>
---
V2 -> V3: Add more details about measurement in commit message
V2: https://lore.kernel.org/r/20250103082814.3850096-1-0x1207@gmail.com

V1 -> V2: Add measurement data about performance improvement in commit message
V1: https://lore.kernel.org/r/20241010114019.1734573-1-0x1207@gmail.com
---
 net/core/page_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jason Xing Jan. 6, 2025, 3:15 a.m. UTC | #1
On Mon, Jan 6, 2025 at 11:02 AM Furong Xu <0x1207@gmail.com> wrote:
>
> Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
> already do it.
> We can save a couple of function calls if we check dma_sync_size earlier.
>
> This is a micro-optimization: about a 0.6% PPS performance improvement
> has been observed on a single Cortex-A53 CPU core with a 64-byte UDP RX
> traffic test.
>
> Before this patch:
> The average over one minute is 234026 packets per second.
>
> After this patch:
> The average over one minute is 235537 packets per second.

Sorry, I remain skeptical that such a small improvement can be
statistically observed. What exact tool or benchmark are you using?

Thanks,
Jason
Furong Xu Jan. 6, 2025, 3:31 a.m. UTC | #2
On Mon, 6 Jan 2025 11:15:45 +0800, Jason Xing <kerneljasonxing@gmail.com> wrote:

> On Mon, Jan 6, 2025 at 11:02 AM Furong Xu <0x1207@gmail.com> wrote:
> >
> > Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
> > already do it.
> > We can save a couple of function calls if we check dma_sync_size earlier.
> >
> > This is a micro-optimization: about a 0.6% PPS performance improvement
> > has been observed on a single Cortex-A53 CPU core with a 64-byte UDP RX
> > traffic test.
> >
> > Before this patch:
> > The average over one minute is 234026 packets per second.
> >
> > After this patch:
> > The average over one minute is 235537 packets per second.
> 
> Sorry, I remain skeptical that such a small improvement can be
> statistically observed. What exact tool or benchmark are you using?

An x86 PC sends out UDP packets, and the sar command from the Sysstat
package reports the PPS on the RX side:
sar -n DEV 60 1

Patch

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9733206d6406..9bb2d2300d0b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -458,7 +458,7 @@  page_pool_dma_sync_for_device(const struct page_pool *pool,
 			      netmem_ref netmem,
 			      u32 dma_sync_size)
 {
-	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
+	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev) && dma_sync_size)
 		__page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
 }