Message ID | 20230525125746.553874-8-aleksander.lobakin@intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: intel: start The Great Code Dedup + Page Pool for iavf | expand |
Context | Check | Description |
---|---|---|
netdev/series_format | success | Posting correctly formatted |
netdev/tree_selection | success | Clearly marked for net-next, async |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 99 this patch: 99 |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/build_clang | fail | Errors and warnings before: 313 this patch: 36 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/deprecated_api | success | None detected |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | No Fixes tag |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 478 this patch: 150 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 31 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
diff --git a/include/net/page_pool.h b/include/net/page_pool.h index 821c75bba125..08e9571d2545 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -46,6 +46,9 @@ * device driver responsibility */ #define PP_FLAG_PAGE_FRAG BIT(2) /* for page frag feature */ +#define PP_FLAG_DMA_MAYBE_SYNC BIT(3) /* Internal, should not be used in + * drivers + */ #define PP_FLAG_ALL (PP_FLAG_DMA_MAP |\ PP_FLAG_DMA_SYNC_DEV |\ PP_FLAG_PAGE_FRAG) diff --git a/net/core/page_pool.c b/net/core/page_pool.c index e212e9d7edcb..57f323dee6c4 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -175,6 +175,10 @@ static int page_pool_init(struct page_pool *pool, /* pool->p.offset has to be set according to the address * offset used by the DMA engine to start copying rx data */ + + /* Try to avoid calling no-op syncs */ + pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC; + pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV; } if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT && @@ -323,6 +327,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page) page_pool_set_dma_addr(page, dma); + if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) && + dma_need_sync(pool->p.dev, dma)) { + pool->p.flags |= PP_FLAG_DMA_SYNC_DEV; + pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC; + } + if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles even when on DMA-coherent platforms (like x86) with no active IOMMU or swiotlb, just for the call ladder. Indeed, it's page_pool_put_page() page_pool_put_defragged_page() <- external __page_pool_put_page() page_pool_dma_sync_for_device() <- non-inline dma_sync_single_range_for_device() dma_sync_single_for_device() <- external dma_direct_sync_single_for_device() dev_is_dma_coherent() <- exit For the inline functions, no guarantees the compiler won't uninline them (they're clearly not one-liners and sometimes compilers uninline even 2 + 2). The first external call is necessary, but the rest 2+ are done for nothing each time, plus a bunch of checks here and there. Since Page Pool mappings are long-term and for one "device + addr" pair dma_need_sync() will always return the same value (basically, whether it belongs to an swiotlb pool), addresses can be tested once right after they're obtained and the result can be reused until the page is unmapped. Define new PP flag, which will mean "do DMA syncs for device, but only when needed" and turn it on by default when the driver asks to sync pages. When a page is mapped, check whether it needs syncs and if so, replace that "sync when needed" back to "always do syncs" globally for the whole pool (better safe than sorry). As long as a pool has no pages requiring DMA syncs, this cuts off a good piece of calls and checks. On my x86_64, this gives from 2% to 5% performance benefit with no negative impact for cases when IOMMU is on and the shortcut can't be used. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> --- include/net/page_pool.h | 3 +++ net/core/page_pool.c | 10 ++++++++++ 2 files changed, 13 insertions(+)