Message ID | 1626255285-5079-3-git-send-email-linyunsheng@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | add frag page support in page pool |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | warning | 2 maintainers not CCed: jhubbard@nvidia.com aarcange@redhat.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | fail | Errors and warnings before: 18729 this patch: 26958 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 100 lines checked |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 18166 this patch: 17409 |
netdev/header_inline | success | Link |
On 14/07/2021 11.34, Yunsheng Lin wrote:
> As suggested by Alexander, "A DMA mapping should be page
> aligned anyway so the lower 12 bits would be reserved 0",
> so it might make more sense to repurpose the lower 12 bits
> of the dma address to store the frag count for frag page
> support in page pool for 32-bit systems with 64-bit DMA,
> which should be rare these days.

Do we have any real driver users with a 32-bit arch and 64-bit DMA that
want to use this new frag-count system you are adding to page_pool?

This "lower 12-bit use" complicates the code we need to maintain
forever. My guess is that it is never used, but we need to update and
maintain it, and it will never be tested.

Why don't you simply reject the page_pool flag PP_FLAG_PAGE_FRAG
during setup of the page_pool for this case?

        if ((pool->p.flags & PP_FLAG_PAGE_FRAG) &&
            (sizeof(dma_addr_t) > sizeof(unsigned long)))
                goto reject-setup;

[...]
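A minimal sketch of what this setup-time rejection could look like; the helper name and the -EINVAL choice are assumptions for illustration, not part of the posted series:

        /* Sketch only (names hypothetical): refuse PP_FLAG_PAGE_FRAG at
         * pool setup when dma_addr_t is wider than unsigned long, i.e. a
         * 32-bit arch with 64-bit DMA, instead of packing the frag count
         * into the low 12 bits of the DMA address.
         */
        static int page_pool_check_frag_config(const struct page_pool *pool)
        {
                if ((pool->p.flags & PP_FLAG_PAGE_FRAG) &&
                    (sizeof(dma_addr_t) > sizeof(unsigned long)))
                        return -EINVAL; /* frag API unsupported here */

                return 0;
        }

page_pool_init() would call this before doing any allocation, so the failure mode becomes an explicit error at setup rather than a rarely exercised bit-packing path at runtime.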
On Wed, 14 Jul 2021 at 13:18, Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
>
> Why don't you simply reject the page_pool flag PP_FLAG_PAGE_FRAG
> during setup of the page_pool for this case?
>
>         if ((pool->p.flags & PP_FLAG_PAGE_FRAG) &&
>             (sizeof(dma_addr_t) > sizeof(unsigned long)))
>                 goto reject-setup;
>

+1

[...]
On Wed, Jul 14, 2021 at 3:18 AM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
>
> Why don't you simply reject the page_pool flag PP_FLAG_PAGE_FRAG
> during setup of the page_pool for this case?
>
>         if ((pool->p.flags & PP_FLAG_PAGE_FRAG) &&
>             (sizeof(dma_addr_t) > sizeof(unsigned long)))
>                 goto reject-setup;
>
[...]

That sounds good to me if we want to go that route. It would simplify
this quite a bit, since essentially we could just drop these if blocks.

Thanks.

- Alex
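To illustrate what "drop these if blocks" would mean in practice: assuming the setup-time rejection above makes the 32-bit/64-bit-DMA case unreachable, the two helpers from the patch reduce to their existing else branches, kept as the only path — a sketch:

        static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
                                                                  long nr)
        {
                long ret;

                /* If we are the last user of the page, skip the atomic
                 * sub-and-return and report the page as free (0).
                 */
                if (atomic_long_read(&page->pp_frag_count) == nr)
                        return 0;

                ret = atomic_long_sub_return(nr, &page->pp_frag_count);
                WARN_ON(ret < 0);
                return ret;
        }

        static inline void page_pool_set_frag_count(struct page *page, long nr)
        {
                atomic_long_set(&page->pp_frag_count, nr);
        }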
On 2021/7/14 22:46, Alexander Duyck wrote:
> On Wed, Jul 14, 2021 at 3:18 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>> Why don't you simply reject the page_pool flag PP_FLAG_PAGE_FRAG
>> during setup of the page_pool for this case?
>>
>>         if ((pool->p.flags & PP_FLAG_PAGE_FRAG) &&
>>             (sizeof(dma_addr_t) > sizeof(unsigned long)))
>>                 goto reject-setup;
>>
> [...]
>
> That sounds good to me if we want to go that route. It would simplify
> this quite a bit, since essentially we could just drop these if blocks.

Ok, let's wait a few days to see if anyone with a 32-bit arch and
64-bit DMA system cares enough to use the frag-count support in the
page pool.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d33d97c..82bcbb0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -103,11 +103,15 @@ struct page {
 			unsigned long pp_magic;
 			struct page_pool *pp;
 			unsigned long _pp_mapping_pad;
+			atomic_long_t pp_frag_count;
 			/**
 			 * @dma_addr: might require a 64-bit value on
-			 * 32-bit architectures.
+			 * 32-bit architectures, if so, store the lower 32
+			 * bits in pp_frag_count, and a DMA mapping should
+			 * be page aligned, so the frag count can be stored
+			 * in lower 12 bits for 4K page size.
 			 */
-			unsigned long dma_addr[2];
+			unsigned long dma_addr;
 		};
 		struct {	/* slab, slob and slub */
 			union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 8d7744d..ef449c2 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -198,19 +198,61 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
+#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
+		(sizeof(dma_addr_t) > sizeof(unsigned long))
+
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-	dma_addr_t ret = page->dma_addr[0];
-	if (sizeof(dma_addr_t) > sizeof(unsigned long))
-		ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
+	dma_addr_t ret = page->dma_addr;
+
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		ret <<= 32;
+		ret |= atomic_long_read(&page->pp_frag_count) & PAGE_MASK;
+	}
+
 	return ret;
 }
 
 static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
 {
-	page->dma_addr[0] = addr;
-	if (sizeof(dma_addr_t) > sizeof(unsigned long))
-		page->dma_addr[1] = upper_32_bits(addr);
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		atomic_long_set(&page->pp_frag_count, addr & PAGE_MASK);
+		addr >>= 32;
+	}
+
+	page->dma_addr = addr;
+}
+
+static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
+							   long nr)
+{
+	long frag_count = atomic_long_read(&page->pp_frag_count);
+	long ret;
+
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		if ((frag_count & ~PAGE_MASK) == nr)
+			return 0;
+
+		ret = atomic_long_sub_return(nr, &page->pp_frag_count);
+		WARN_ON((ret & PAGE_MASK) != (frag_count & PAGE_MASK));
+		ret &= ~PAGE_MASK;
+	} else {
+		if (frag_count == nr)
+			return 0;
+
+		ret = atomic_long_sub_return(nr, &page->pp_frag_count);
+		WARN_ON(ret < 0);
+	}
+
+	return ret;
+}
+
+static inline void page_pool_set_frag_count(struct page *page, long nr)
+{
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
+		nr |= atomic_long_read(&page->pp_frag_count) & PAGE_MASK;
+
+	atomic_long_set(&page->pp_frag_count, nr);
 }
 
 static inline bool is_page_pool_compiled_in(void)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 78838c6..0082f33 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -198,6 +198,16 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 	if (dma_mapping_error(pool->p.dev, dma))
 		return false;
 
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
+	    WARN_ON(pool->p.flags & PP_FLAG_PAGE_FRAG &&
+		    dma & ~PAGE_MASK)) {
+		dma_unmap_page_attrs(pool->p.dev, dma,
+				     PAGE_SIZE << pool->p.order,
+				     pool->p.dma_dir,
+				     DMA_ATTR_SKIP_CPU_SYNC);
+		return false;
+	}
+
 	page_pool_set_dma_addr(page, dma);
 
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
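For orientation, a hypothetical driver-side use of the two new helpers — no driver in this series does this yet; the function names and the frags-per-page constant are invented for illustration:

        #define RX_FRAGS_PER_PAGE	4	/* illustrative split factor */

        /* Called once, before the page is handed to any user. */
        static void rx_setup_page(struct page *page)
        {
                page_pool_set_frag_count(page, RX_FRAGS_PER_PAGE);
        }

        /* Called once per consumed frag; a zero return means this was the
         * last outstanding frag, so the whole page can go back to the pool.
         */
        static void rx_put_frag(struct page_pool *pool, struct page *page)
        {
                if (!page_pool_atomic_sub_frag_count_return(page, 1))
                        page_pool_put_full_page(pool, page, false);
        }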
As suggested by Alexander, "A DMA mapping should be page aligned
anyway so the lower 12 bits would be reserved 0", so it might make
more sense to repurpose the lower 12 bits of the dma address to store
the frag count for frag page support in page pool for 32-bit systems
with 64-bit DMA, which should be rare these days.

For normal systems, the dma_addr[1] in 'struct page' is not used, so
we can reuse one of the dma_addr fields for storing the frag count,
which means how many frags this page might be split into.

The PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to decide where to
store the frag count; as "sizeof(dma_addr_t) > sizeof(unsigned long)"
is false for most systems these days, the compiler should optimize
out the unused code path on those systems.

The newly added page_pool_set_frag_count() should be called before
the page is passed to any user. Otherwise, call the newly added
page_pool_atomic_sub_frag_count_return().

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/mm_types.h |  8 +++++--
 include/net/page_pool.h  | 54 ++++++++++++++++++++++++++++++++++++++++++------
 net/core/page_pool.c     | 10 +++++++++
 3 files changed, 64 insertions(+), 8 deletions(-)
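To make the 32-bit/64-bit-DMA packing concrete, a small standalone illustration of the bit layout described above; userspace C with fixed-width types standing in for the kernel's unsigned long dma_addr and atomic_long_t pp_frag_count, and an arbitrary address and count:

        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_MASK_32	0xfffff000u	/* 4K pages: low 12 bits free */

        int main(void)
        {
                uint64_t dma = 0x123456000ULL;	/* page-aligned 64-bit DMA address */
                long frags = 5;

                /* page_pool_set_dma_addr(): the page-aligned low 32 bits go
                 * into pp_frag_count, the high 32 bits into dma_addr.
                 */
                uint32_t pp_frag_count = (uint32_t)dma & PAGE_MASK_32;
                uint32_t dma_addr = (uint32_t)(dma >> 32);

                /* page_pool_set_frag_count(): the frag count lives in the low
                 * 12 bits, alongside the preserved address bits.
                 */
                pp_frag_count |= (uint32_t)frags;

                /* page_pool_get_dma_addr(): reassemble the full address by
                 * masking the frag count back out.
                 */
                uint64_t back = ((uint64_t)dma_addr << 32) |
                                (pp_frag_count & PAGE_MASK_32);

                /* Prints: dma=0x123456000 back=0x123456000 frags=5 */
                printf("dma=0x%llx back=0x%llx frags=%u\n",
                       (unsigned long long)dma, (unsigned long long)back,
                       pp_frag_count & ~PAGE_MASK_32);
                return 0;
        }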