Message ID | 20250308214045.1160445-3-almasrymina@google.com (mailing list archive)
---|---
State | New
Delegated to: | Netdev Maintainers
Series | Device memory TCP TX
On 3/8/25 10:40 PM, Mina Almasry wrote:
> Currently net_iovs support only pp ref counts, and do not support a
> page ref equivalent.
>
> This is fine for the RX path as net_iovs are used exclusively with the
> pp and only pp refcounting is needed there. The TX path however does not
> use pp ref counts, thus, support for get_page/put_page equivalent is
> needed for netmem.
>
> Support get_netmem/put_netmem. Check the type of the netmem before
> passing it to page or net_iov specific code to obtain a page ref
> equivalent.
>
> For dmabuf net_iovs, we obtain a ref on the underlying binding. This
> ensures the entire binding doesn't disappear until all the net_iovs have
> been put_netmem'ed. We do not need to track the refcount of individual
> dmabuf net_iovs as we don't allocate/free them from a pool similar to
> what the buddy allocator does for pages.
>
> This code is written to be extensible by other net_iov implementers.
> get_netmem/put_netmem will check the type of the netmem and route it to
> the correct helper:
>
>   pages -> [get|put]_page()
>   dmabuf net_iovs -> net_devmem_[get|put]_net_iov()
>   new net_iovs -> new helpers
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
>
> ---
>
> v5: https://lore.kernel.org/netdev/20250227041209.2031104-2-almasrymina@google.com/
>
> - Updated to check that the net_iov is devmem before calling
>   net_devmem_put_net_iov().
>
> - Jakub requested that callers of __skb_frag_ref()/skb_page_unref be
>   inspected to make sure that they generate / anticipate skbs with the
>   correct pp_recycle and unreadable setting:
>
> skb_page_unref
> ==============
>
> - callers that are unreachable for unreadable skbs:
>
>   gro_pull_from_frag0, skb_copy_ubufs, __pskb_pull_tail

Why is `__pskb_pull_tail` not reachable? It's called by __pskb_trim(),
via skb_condense().

/P
On Thu, Mar 13, 2025 at 3:47 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 3/8/25 10:40 PM, Mina Almasry wrote:
[snip]
> > - callers that are unreachable for unreadable skbs:
> >
> >   gro_pull_from_frag0, skb_copy_ubufs, __pskb_pull_tail
>
> Why is `__pskb_pull_tail` not reachable? It's called by __pskb_trim(),
> via skb_condense().
>

I meant to say "the skb_page_unref call from __pskb_pull_tail is not
reachable for unreadable skbs". This is because __pskb_pull_tail early
returns on unreadable skbs.
diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488..9e49372ef1a0 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,7 @@
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	get_netmem(skb_frag_netmem(frag));
 }
 
 /**
@@ -40,7 +40,7 @@ static inline void skb_page_unref(netmem_ref netmem, bool recycle)
 	if (recycle && napi_pp_put_page(netmem))
 		return;
 #endif
-	put_page(netmem_to_page(netmem));
+	put_netmem(netmem);
 }
 
 /**
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 16ef53ea713a..e8afe6b654aa 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -277,4 +277,7 @@ static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
 	return __netmem_clear_lsb(netmem)->dma_addr;
 }
 
+void get_netmem(netmem_ref netmem);
+void put_netmem(netmem_ref netmem);
+
 #endif /* _NET_NETMEM_H */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 69c160ad3ebd..0cf3d189f06c 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -326,6 +326,16 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 	return ERR_PTR(err);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_get(net_devmem_iov_binding(niov));
+}
+
+void net_devmem_put_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_put(net_devmem_iov_binding(niov));
+}
+
 /*** "Dmabuf devmem memory provider" ***/
 
 int mp_dmabuf_devmem_init(struct page_pool *pool)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d52729..946f2e015746 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -29,6 +29,10 @@ struct net_devmem_dmabuf_binding {
 	 * The binding undos itself and unmaps the underlying dmabuf once all
 	 * those refs are dropped and the binding is no longer desired or in
 	 * use.
+	 *
+	 * net_devmem_get_net_iov() on dmabuf net_iovs will increment this
+	 * reference, making sure that the binding remains alive until all the
+	 * net_iovs are no longer used.
 	 */
 	refcount_t ref;
 
@@ -111,6 +115,9 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
 	__net_devmem_dmabuf_binding_free(binding);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov);
+void net_devmem_put_net_iov(struct net_iov *niov);
+
 struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
 void net_devmem_free_dmabuf(struct net_iov *ppiov);
@@ -120,6 +127,19 @@ bool net_is_devmem_iov(struct net_iov *niov);
 #else
 struct net_devmem_dmabuf_binding;
 
+static inline void
+net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
+{
+}
+
+static inline void net_devmem_get_net_iov(struct net_iov *niov)
+{
+}
+
+static inline void net_devmem_put_net_iov(struct net_iov *niov)
+{
+}
+
 static inline void
 __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ab8acb737b93..ee2d1b769c13 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,7 @@
 #include <linux/textsearch.h>
 
 #include "dev.h"
+#include "devmem.h"
 #include "netmem_priv.h"
 #include "sock_destructor.h"
 
@@ -7315,3 +7316,32 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
 	return false;
 }
 EXPORT_SYMBOL(csum_and_copy_from_iter_full);
+
+void get_netmem(netmem_ref netmem)
+{
+	struct net_iov *niov;
+
+	if (netmem_is_net_iov(netmem)) {
+		niov = netmem_to_net_iov(netmem);
+		if (net_is_devmem_iov(niov))
+			net_devmem_get_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+
+	get_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(get_netmem);
+
+void put_netmem(netmem_ref netmem)
+{
+	struct net_iov *niov;
+
+	if (netmem_is_net_iov(netmem)) {
+		niov = netmem_to_net_iov(netmem);
+		if (net_is_devmem_iov(niov))
+			net_devmem_put_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+
+	put_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(put_netmem);