[vhost,v2,4/7] virtio_net: big mode support premapped

Message ID	20240422072408.126821-5-xuanzhuo@linux.alibaba.com (mailing list archive)
State	Superseded
Delegated to:	Netdev Maintainers
Headers	show Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A54650267 for <netdev@vger.kernel.org>; Mon, 22 Apr 2024 07:24:16 +0000 (UTC) From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> To: virtualization@lists.linux.dev Cc: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Xuan Zhuo <xuanzhuo@linux.alibaba.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org Subject: [PATCH vhost v2 4/7] virtio_net: big mode support premapped Date: Mon, 22 Apr 2024 15:24:05 +0800 Message-Id: <20240422072408.126821-5-xuanzhuo@linux.alibaba.com> In-Reply-To: <20240422072408.126821-1-xuanzhuo@linux.alibaba.com> References: <20240422072408.126821-1-xuanzhuo@linux.alibaba.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	virtio_net: rx enable premapped mode by default \| expand [vhost,v2,0/7] virtio_net: rx enable premapped mode by default [vhost,v2,1/7] virtio_ring: introduce dma map api for page [vhost,v2,2/7] virtio_ring: enable premapped mode whatever use_dma_api [vhost,v2,3/7] virtio_net: replace private by pp struct inside page [vhost,v2,4/7] virtio_net: big mode support premapped [vhost,v2,5/7] virtio_net: enable premapped by default [vhost,v2,6/7] virtio_net: rx remove premapped failover code [vhost,v2,7/7] virtio_net: remove the misleading comment

Context	Check	Description
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	fail	Errors and warnings before: 926 this patch: 928
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 8 of 8 maintainers
netdev/build_clang	success	Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	fail	Errors and warnings before: 937 this patch: 939
netdev/checkpatch	warning	WARNING: line length of 81 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Xuan Zhuo April 22, 2024, 7:24 a.m. UTC

In big mode, pre-mapping DMA is beneficial because if the pages are not
used, we can reuse them without needing to unmap and remap.

We require space to store the DMA address. I use the page.dma_addr to
store the DMA address from the pp structure inside the page.

Every page retrieved from get_a_page() is mapped, and its DMA address is
stored in page.dma_addr. When a page is returned to the chain, we check
the DMA status; if it is not mapped (potentially having been unmapped),
we remap it before returning it to the chain.

Based on the following points, we do not use page pool to manage these
pages:

1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
   we can only prevent the page pool from performing DMA operations, and
   let the driver perform DMA operations on the allocated pages.
2. But when the page pool releases the page, we have no chance to
   execute dma unmap.
3. A solution to #2 is to execute dma unmap every time before putting
   the page back to the page pool. (This is actually a waste, we don't
   execute unmap so frequently.)
4. But there is another problem, we still need to use page.dma_addr to
   save the dma address. Using page.dma_addr while using page pool is
   unsafe behavior.

More:
    https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 108 insertions(+), 15 deletions(-)

Jason Wang April 23, 2024, 4:36 a.m. UTC | #1

On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In big mode, pre-mapping DMA is beneficial because if the pages are not
> used, we can reuse them without needing to unmap and remap.
>
> We require space to store the DMA address. I use the page.dma_addr to
> store the DMA address from the pp structure inside the page.
>
> Every page retrieved from get_a_page() is mapped, and its DMA address is
> stored in page.dma_addr. When a page is returned to the chain, we check
> the DMA status; if it is not mapped (potentially having been unmapped),
> we remap it before returning it to the chain.
>
> Based on the following points, we do not use page pool to manage these
> pages:
>
> 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
>    we can only prevent the page pool from performing DMA operations, and
>    let the driver perform DMA operations on the allocated pages.
> 2. But when the page pool releases the page, we have no chance to
>    execute dma unmap.
> 3. A solution to #2 is to execute dma unmap every time before putting
>    the page back to the page pool. (This is actually a waste, we don't
>    execute unmap so frequently.)
> 4. But there is another problem, we still need to use page.dma_addr to
>    save the dma address. Using page.dma_addr while using page pool is
>    unsafe behavior.
>
> More:
>     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 108 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2c7a67ad4789..d4f5e65b247e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
>         return (struct virtio_net_common_hdr *)skb->cb;
>  }
>
> +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> +{
> +       sg->dma_address = addr;
> +       sg->length = len;
> +}
> +
> +/* For pages submitted to the ring, we need to record its dma for unmap.
> + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> + * address.
> + */
> +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> +{
> +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {

Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.

> +               p->dma_addr = lower_32_bits(addr);
> +               p->pp_magic = upper_32_bits(addr);

And this uses three fields on page_pool which I'm not sure the other
maintainers are happy with. For example, re-using pp_maing might be
dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").

I think a more safe way is to reuse page pool, for example introducing
a new flag with dma callbacks?

Thanks

Xuan Zhuo April 23, 2024, 5:47 a.m. UTC | #2

On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > used, we can reuse them without needing to unmap and remap.
> >
> > We require space to store the DMA address. I use the page.dma_addr to
> > store the DMA address from the pp structure inside the page.
> >
> > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > stored in page.dma_addr. When a page is returned to the chain, we check
> > the DMA status; if it is not mapped (potentially having been unmapped),
> > we remap it before returning it to the chain.
> >
> > Based on the following points, we do not use page pool to manage these
> > pages:
> >
> > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> >    we can only prevent the page pool from performing DMA operations, and
> >    let the driver perform DMA operations on the allocated pages.
> > 2. But when the page pool releases the page, we have no chance to
> >    execute dma unmap.
> > 3. A solution to #2 is to execute dma unmap every time before putting
> >    the page back to the page pool. (This is actually a waste, we don't
> >    execute unmap so frequently.)
> > 4. But there is another problem, we still need to use page.dma_addr to
> >    save the dma address. Using page.dma_addr while using page pool is
> >    unsafe behavior.
> >
> > More:
> >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 108 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2c7a67ad4789..d4f5e65b247e 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> >         return (struct virtio_net_common_hdr *)skb->cb;
> >  }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > +       sg->dma_address = addr;
> > +       sg->length = len;
> > +}
> > +
> > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > + * address.
> > + */
> > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > +{
> > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
>
> Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
>
> > +               p->dma_addr = lower_32_bits(addr);
> > +               p->pp_magic = upper_32_bits(addr);
>
> And this uses three fields on page_pool which I'm not sure the other
> maintainers are happy with. For example, re-using pp_maing might be
> dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
>
> I think a more safe way is to reuse page pool, for example introducing
> a new flag with dma callbacks?

Let me try.

Thanks.

>
> Thanks
>

Xuan Zhuo April 23, 2024, 12:31 p.m. UTC | #3

On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > used, we can reuse them without needing to unmap and remap.
> >
> > We require space to store the DMA address. I use the page.dma_addr to
> > store the DMA address from the pp structure inside the page.
> >
> > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > stored in page.dma_addr. When a page is returned to the chain, we check
> > the DMA status; if it is not mapped (potentially having been unmapped),
> > we remap it before returning it to the chain.
> >
> > Based on the following points, we do not use page pool to manage these
> > pages:
> >
> > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> >    we can only prevent the page pool from performing DMA operations, and
> >    let the driver perform DMA operations on the allocated pages.
> > 2. But when the page pool releases the page, we have no chance to
> >    execute dma unmap.
> > 3. A solution to #2 is to execute dma unmap every time before putting
> >    the page back to the page pool. (This is actually a waste, we don't
> >    execute unmap so frequently.)
> > 4. But there is another problem, we still need to use page.dma_addr to
> >    save the dma address. Using page.dma_addr while using page pool is
> >    unsafe behavior.
> >
> > More:
> >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 108 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2c7a67ad4789..d4f5e65b247e 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> >         return (struct virtio_net_common_hdr *)skb->cb;
> >  }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > +       sg->dma_address = addr;
> > +       sg->length = len;
> > +}
> > +
> > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > + * address.
> > + */
> > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > +{
> > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
>
> Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
>
> > +               p->dma_addr = lower_32_bits(addr);
> > +               p->pp_magic = upper_32_bits(addr);
>
> And this uses three fields on page_pool which I'm not sure the other
> maintainers are happy with. For example, re-using pp_maing might be
> dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
>
> I think a more safe way is to reuse page pool, for example introducing
> a new flag with dma callbacks?

If we use page pool, how can we chain the pages allocated for a packet?

Yon know the "private" can not be used.


If the pp struct inside the page is not safe, how about:

		struct {	/* Page cache and anonymous pages */
			/**
			 * @lru: Pageout list, eg. active_list protected by
			 * lruvec->lru_lock.  Sometimes used as a generic list
			 * by the page owner.
			 */
			union {
				struct list_head lru;

				/* Or, for the Unevictable "LRU list" slot */
				struct {
					/* Always even, to negate PageTail */
					void *__filler;
					/* Count page's or folio's mlocks */
					unsigned int mlock_count;
				};

				/* Or, free page */
				struct list_head buddy_list;
				struct list_head pcp_list;
			};
			/* See page-flags.h for PAGE_MAPPING_FLAGS */
			struct address_space *mapping;
			union {
				pgoff_t index;		/* Our offset within mapping. */
				unsigned long share;	/* share count for fsdax */
			};
			/**
			 * @private: Mapping-private opaque data.
			 * Usually used for buffer_heads if PagePrivate.
			 * Used for swp_entry_t if PageSwapCache.
			 * Indicates order in the buddy system if PageBuddy.
			 */
			unsigned long private;
		};

Or, we can map the private space of the page as a new structure.

Thanks.


>
> Thanks
>

Jason Wang April 24, 2024, 12:43 a.m. UTC | #4

On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > used, we can reuse them without needing to unmap and remap.
> > >
> > > We require space to store the DMA address. I use the page.dma_addr to
> > > store the DMA address from the pp structure inside the page.
> > >
> > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > we remap it before returning it to the chain.
> > >
> > > Based on the following points, we do not use page pool to manage these
> > > pages:
> > >
> > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > >    we can only prevent the page pool from performing DMA operations, and
> > >    let the driver perform DMA operations on the allocated pages.
> > > 2. But when the page pool releases the page, we have no chance to
> > >    execute dma unmap.
> > > 3. A solution to #2 is to execute dma unmap every time before putting
> > >    the page back to the page pool. (This is actually a waste, we don't
> > >    execute unmap so frequently.)
> > > 4. But there is another problem, we still need to use page.dma_addr to
> > >    save the dma address. Using page.dma_addr while using page pool is
> > >    unsafe behavior.
> > >
> > > More:
> > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > >         return (struct virtio_net_common_hdr *)skb->cb;
> > >  }
> > >
> > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > +{
> > > +       sg->dma_address = addr;
> > > +       sg->length = len;
> > > +}
> > > +
> > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > + * address.
> > > + */
> > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > +{
> > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> >
> > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> >
> > > +               p->dma_addr = lower_32_bits(addr);
> > > +               p->pp_magic = upper_32_bits(addr);
> >
> > And this uses three fields on page_pool which I'm not sure the other
> > maintainers are happy with. For example, re-using pp_maing might be
> > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> >
> > I think a more safe way is to reuse page pool, for example introducing
> > a new flag with dma callbacks?
>
> If we use page pool, how can we chain the pages allocated for a packet?

I'm not sure I get this, it is chained via the descriptor flag.

>
> Yon know the "private" can not be used.
>
>
> If the pp struct inside the page is not safe, how about:
>
>                 struct {        /* Page cache and anonymous pages */
>                         /**
>                          * @lru: Pageout list, eg. active_list protected by
>                          * lruvec->lru_lock.  Sometimes used as a generic list
>                          * by the page owner.
>                          */
>                         union {
>                                 struct list_head lru;
>
>                                 /* Or, for the Unevictable "LRU list" slot */
>                                 struct {
>                                         /* Always even, to negate PageTail */
>                                         void *__filler;
>                                         /* Count page's or folio's mlocks */
>                                         unsigned int mlock_count;
>                                 };
>
>                                 /* Or, free page */
>                                 struct list_head buddy_list;
>                                 struct list_head pcp_list;
>                         };
>                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
>                         struct address_space *mapping;
>                         union {
>                                 pgoff_t index;          /* Our offset within mapping. */
>                                 unsigned long share;    /* share count for fsdax */
>                         };
>                         /**
>                          * @private: Mapping-private opaque data.
>                          * Usually used for buffer_heads if PagePrivate.
>                          * Used for swp_entry_t if PageSwapCache.
>                          * Indicates order in the buddy system if PageBuddy.
>                          */
>                         unsigned long private;
>                 };
>
> Or, we can map the private space of the page as a new structure.

It could be a way. But such allocation might be huge if we are using
indirect descriptors or I may miss something.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
>

Xuan Zhuo April 24, 2024, 12:53 a.m. UTC | #5

On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > used, we can reuse them without needing to unmap and remap.
> > > >
> > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > store the DMA address from the pp structure inside the page.
> > > >
> > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > we remap it before returning it to the chain.
> > > >
> > > > Based on the following points, we do not use page pool to manage these
> > > > pages:
> > > >
> > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > >    we can only prevent the page pool from performing DMA operations, and
> > > >    let the driver perform DMA operations on the allocated pages.
> > > > 2. But when the page pool releases the page, we have no chance to
> > > >    execute dma unmap.
> > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > >    the page back to the page pool. (This is actually a waste, we don't
> > > >    execute unmap so frequently.)
> > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > >    save the dma address. Using page.dma_addr while using page pool is
> > > >    unsafe behavior.
> > > >
> > > > More:
> > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > >  }
> > > >
> > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > +{
> > > > +       sg->dma_address = addr;
> > > > +       sg->length = len;
> > > > +}
> > > > +
> > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > + * address.
> > > > + */
> > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > +{
> > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > >
> > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > >
> > > > +               p->dma_addr = lower_32_bits(addr);
> > > > +               p->pp_magic = upper_32_bits(addr);
> > >
> > > And this uses three fields on page_pool which I'm not sure the other
> > > maintainers are happy with. For example, re-using pp_maing might be
> > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > >
> > > I think a more safe way is to reuse page pool, for example introducing
> > > a new flag with dma callbacks?
> >
> > If we use page pool, how can we chain the pages allocated for a packet?
>
> I'm not sure I get this, it is chained via the descriptor flag.


In the big mode, we will commit many pages to the virtio core by
virtqueue_add_inbuf().

By virtqueue_get_buf_ctx(), we got the data. That is the first page.
Other pages are chained by the "private".

If we use the page pool, how can we chain the pages.
After virtqueue_add_inbuf(), we need to get the pages to fill the skb.



>
> >
> > Yon know the "private" can not be used.
> >
> >
> > If the pp struct inside the page is not safe, how about:
> >
> >                 struct {        /* Page cache and anonymous pages */
> >                         /**
> >                          * @lru: Pageout list, eg. active_list protected by
> >                          * lruvec->lru_lock.  Sometimes used as a generic list
> >                          * by the page owner.
> >                          */
> >                         union {
> >                                 struct list_head lru;
> >
> >                                 /* Or, for the Unevictable "LRU list" slot */
> >                                 struct {
> >                                         /* Always even, to negate PageTail */
> >                                         void *__filler;
> >                                         /* Count page's or folio's mlocks */
> >                                         unsigned int mlock_count;
> >                                 };
> >
> >                                 /* Or, free page */
> >                                 struct list_head buddy_list;
> >                                 struct list_head pcp_list;
> >                         };
> >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >                         struct address_space *mapping;
> >                         union {
> >                                 pgoff_t index;          /* Our offset within mapping. */
> >                                 unsigned long share;    /* share count for fsdax */
> >                         };
> >                         /**
> >                          * @private: Mapping-private opaque data.
> >                          * Usually used for buffer_heads if PagePrivate.
> >                          * Used for swp_entry_t if PageSwapCache.
> >                          * Indicates order in the buddy system if PageBuddy.
> >                          */
> >                         unsigned long private;
> >                 };
> >
> > Or, we can map the private space of the page as a new structure.
>
> It could be a way. But such allocation might be huge if we are using
> indirect descriptors or I may miss something.

No. we only need to store the "chain next" and the dma as this patch set did.
The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
That is enough for us.

If you worry about the change of the pp structure, we can use the "private" as
origin and use the "struct list_head lru" to store the dma.

The min size of "struct list_head lru" is 8 bytes, that is enough for the
dma_addr_t.

We can do this more simper:

static void page_chain_set_dma(struct page *p, dma_addr_t dma)
{
	BUILD_BUG_ON(sizeof(p->lru)) < sizeof(dma));

	dma_addr_t *addr;

	addr = &page->lru;

	*addr = dma;
}

Thanks.



>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>

Jason Wang April 24, 2024, 2:34 a.m. UTC | #6

On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > used, we can reuse them without needing to unmap and remap.
> > > > >
> > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > store the DMA address from the pp structure inside the page.
> > > > >
> > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > we remap it before returning it to the chain.
> > > > >
> > > > > Based on the following points, we do not use page pool to manage these
> > > > > pages:
> > > > >
> > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > 2. But when the page pool releases the page, we have no chance to
> > > > >    execute dma unmap.
> > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > >    execute unmap so frequently.)
> > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > >    unsafe behavior.
> > > > >
> > > > > More:
> > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > >  }
> > > > >
> > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > +{
> > > > > +       sg->dma_address = addr;
> > > > > +       sg->length = len;
> > > > > +}
> > > > > +
> > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > + * address.
> > > > > + */
> > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > +{
> > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > >
> > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > >
> > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > +               p->pp_magic = upper_32_bits(addr);
> > > >
> > > > And this uses three fields on page_pool which I'm not sure the other
> > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > >
> > > > I think a more safe way is to reuse page pool, for example introducing
> > > > a new flag with dma callbacks?
> > >
> > > If we use page pool, how can we chain the pages allocated for a packet?
> >
> > I'm not sure I get this, it is chained via the descriptor flag.
>
>
> In the big mode, we will commit many pages to the virtio core by
> virtqueue_add_inbuf().
>
> By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> Other pages are chained by the "private".
>
> If we use the page pool, how can we chain the pages.
> After virtqueue_add_inbuf(), we need to get the pages to fill the skb.

Right, technically it could be solved by providing helpers in the
virtio core, but considering it's an optimization for big mode which
is not popular, it's not worth to bother.

>
>
>
> >
> > >
> > > Yon know the "private" can not be used.
> > >
> > >
> > > If the pp struct inside the page is not safe, how about:
> > >
> > >                 struct {        /* Page cache and anonymous pages */
> > >                         /**
> > >                          * @lru: Pageout list, eg. active_list protected by
> > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > >                          * by the page owner.
> > >                          */
> > >                         union {
> > >                                 struct list_head lru;
> > >
> > >                                 /* Or, for the Unevictable "LRU list" slot */
> > >                                 struct {
> > >                                         /* Always even, to negate PageTail */
> > >                                         void *__filler;
> > >                                         /* Count page's or folio's mlocks */
> > >                                         unsigned int mlock_count;
> > >                                 };
> > >
> > >                                 /* Or, free page */
> > >                                 struct list_head buddy_list;
> > >                                 struct list_head pcp_list;
> > >                         };
> > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > >                         struct address_space *mapping;
> > >                         union {
> > >                                 pgoff_t index;          /* Our offset within mapping. */
> > >                                 unsigned long share;    /* share count for fsdax */
> > >                         };
> > >                         /**
> > >                          * @private: Mapping-private opaque data.
> > >                          * Usually used for buffer_heads if PagePrivate.
> > >                          * Used for swp_entry_t if PageSwapCache.
> > >                          * Indicates order in the buddy system if PageBuddy.
> > >                          */
> > >                         unsigned long private;
> > >                 };
> > >
> > > Or, we can map the private space of the page as a new structure.
> >
> > It could be a way. But such allocation might be huge if we are using
> > indirect descriptors or I may miss something.
>
> No. we only need to store the "chain next" and the dma as this patch set did.
> The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> That is enough for us.
>
> If you worry about the change of the pp structure, we can use the "private" as
> origin and use the "struct list_head lru" to store the dma.

This looks even worse, as it uses fields belonging to the different
structures in the union.

>
> The min size of "struct list_head lru" is 8 bytes, that is enough for the
> dma_addr_t.
>
> We can do this more simper:
>
> static void page_chain_set_dma(struct page *p, dma_addr_t dma)
> {
>         BUILD_BUG_ON(sizeof(p->lru)) < sizeof(dma));
>
>         dma_addr_t *addr;
>
>         addr = &page->lru;
>
>         *addr = dma;
> }

So we had this in the kernel code.

       /*
         * Five words (20/40 bytes) are available in this union.
         * WARNING: bit 0 of the first word is used for PageTail(). That
         * means the other users of this union MUST NOT use the bit to
         * avoid collision and false-positive PageTail().
         */

And by looking at the discussion that introduces the pp_magic, reusing
fields seems to be tricky as we may end up with side effects of
aliasing fields in page structure. Technically, we can invent new
structures in the union, but it might not be worth it to bother.

So I think we can leave the fallback code and revisit this issue in the future.

Thanks

>
> Thanks.
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>

Xuan Zhuo April 24, 2024, 2:39 a.m. UTC | #7

On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > >
> > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > store the DMA address from the pp structure inside the page.
> > > > > >
> > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > we remap it before returning it to the chain.
> > > > > >
> > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > pages:
> > > > > >
> > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > >    execute dma unmap.
> > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > >    execute unmap so frequently.)
> > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > >    unsafe behavior.
> > > > > >
> > > > > > More:
> > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > >  }
> > > > > >
> > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > +{
> > > > > > +       sg->dma_address = addr;
> > > > > > +       sg->length = len;
> > > > > > +}
> > > > > > +
> > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > + * address.
> > > > > > + */
> > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > +{
> > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > >
> > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > >
> > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > >
> > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > >
> > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > a new flag with dma callbacks?
> > > >
> > > > If we use page pool, how can we chain the pages allocated for a packet?
> > >
> > > I'm not sure I get this, it is chained via the descriptor flag.
> >
> >
> > In the big mode, we will commit many pages to the virtio core by
> > virtqueue_add_inbuf().
> >
> > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > Other pages are chained by the "private".
> >
> > If we use the page pool, how can we chain the pages.
> > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
>
> Right, technically it could be solved by providing helpers in the
> virtio core, but considering it's an optimization for big mode which
> is not popular, it's not worth to bother.
>
> >
> >
> >
> > >
> > > >
> > > > Yon know the "private" can not be used.
> > > >
> > > >
> > > > If the pp struct inside the page is not safe, how about:
> > > >
> > > >                 struct {        /* Page cache and anonymous pages */
> > > >                         /**
> > > >                          * @lru: Pageout list, eg. active_list protected by
> > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > >                          * by the page owner.
> > > >                          */
> > > >                         union {
> > > >                                 struct list_head lru;
> > > >
> > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > >                                 struct {
> > > >                                         /* Always even, to negate PageTail */
> > > >                                         void *__filler;
> > > >                                         /* Count page's or folio's mlocks */
> > > >                                         unsigned int mlock_count;
> > > >                                 };
> > > >
> > > >                                 /* Or, free page */
> > > >                                 struct list_head buddy_list;
> > > >                                 struct list_head pcp_list;
> > > >                         };
> > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > >                         struct address_space *mapping;
> > > >                         union {
> > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > >                                 unsigned long share;    /* share count for fsdax */
> > > >                         };
> > > >                         /**
> > > >                          * @private: Mapping-private opaque data.
> > > >                          * Usually used for buffer_heads if PagePrivate.
> > > >                          * Used for swp_entry_t if PageSwapCache.
> > > >                          * Indicates order in the buddy system if PageBuddy.
> > > >                          */
> > > >                         unsigned long private;
> > > >                 };
> > > >
> > > > Or, we can map the private space of the page as a new structure.
> > >
> > > It could be a way. But such allocation might be huge if we are using
> > > indirect descriptors or I may miss something.
> >
> > No. we only need to store the "chain next" and the dma as this patch set did.
> > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > That is enough for us.
> >
> > If you worry about the change of the pp structure, we can use the "private" as
> > origin and use the "struct list_head lru" to store the dma.
>
> This looks even worse, as it uses fields belonging to the different
> structures in the union.

I mean we do not use the elems from the pp structure inside the page,
if we worry the change of the pp structure.

I mean use the "private" and "lru", these in the same structure.

I think this is a good way.

Thanks.

>
> >
> > The min size of "struct list_head lru" is 8 bytes, that is enough for the
> > dma_addr_t.
> >
> > We can do this more simper:
> >
> > static void page_chain_set_dma(struct page *p, dma_addr_t dma)
> > {
> >         BUILD_BUG_ON(sizeof(p->lru)) < sizeof(dma));
> >
> >         dma_addr_t *addr;
> >
> >         addr = &page->lru;
> >
> >         *addr = dma;
> > }
>
> So we had this in the kernel code.
>
>        /*
>          * Five words (20/40 bytes) are available in this union.
>          * WARNING: bit 0 of the first word is used for PageTail(). That
>          * means the other users of this union MUST NOT use the bit to
>          * avoid collision and false-positive PageTail().
>          */
>
> And by looking at the discussion that introduces the pp_magic, reusing
> fields seems to be tricky as we may end up with side effects of
> aliasing fields in page structure. Technically, we can invent new
> structures in the union, but it might not be worth it to bother.
>
> So I think we can leave the fallback code and revisit this issue in the future.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

Jason Wang April 24, 2024, 2:45 a.m. UTC | #8

On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > >
> > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > >
> > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > we remap it before returning it to the chain.
> > > > > > >
> > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > pages:
> > > > > > >
> > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > >    execute dma unmap.
> > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > >    execute unmap so frequently.)
> > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > >    unsafe behavior.
> > > > > > >
> > > > > > > More:
> > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > +{
> > > > > > > +       sg->dma_address = addr;
> > > > > > > +       sg->length = len;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > + * address.
> > > > > > > + */
> > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > +{
> > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > >
> > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > >
> > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > >
> > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > >
> > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > a new flag with dma callbacks?
> > > > >
> > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > >
> > > > I'm not sure I get this, it is chained via the descriptor flag.
> > >
> > >
> > > In the big mode, we will commit many pages to the virtio core by
> > > virtqueue_add_inbuf().
> > >
> > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > Other pages are chained by the "private".
> > >
> > > If we use the page pool, how can we chain the pages.
> > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> >
> > Right, technically it could be solved by providing helpers in the
> > virtio core, but considering it's an optimization for big mode which
> > is not popular, it's not worth to bother.
> >
> > >
> > >
> > >
> > > >
> > > > >
> > > > > Yon know the "private" can not be used.
> > > > >
> > > > >
> > > > > If the pp struct inside the page is not safe, how about:
> > > > >
> > > > >                 struct {        /* Page cache and anonymous pages */
> > > > >                         /**
> > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > >                          * by the page owner.
> > > > >                          */
> > > > >                         union {
> > > > >                                 struct list_head lru;
> > > > >
> > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > >                                 struct {
> > > > >                                         /* Always even, to negate PageTail */
> > > > >                                         void *__filler;
> > > > >                                         /* Count page's or folio's mlocks */
> > > > >                                         unsigned int mlock_count;
> > > > >                                 };
> > > > >
> > > > >                                 /* Or, free page */
> > > > >                                 struct list_head buddy_list;
> > > > >                                 struct list_head pcp_list;
> > > > >                         };
> > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > >                         struct address_space *mapping;
> > > > >                         union {
> > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > >                         };
> > > > >                         /**
> > > > >                          * @private: Mapping-private opaque data.
> > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > >                          */
> > > > >                         unsigned long private;
> > > > >                 };
> > > > >
> > > > > Or, we can map the private space of the page as a new structure.
> > > >
> > > > It could be a way. But such allocation might be huge if we are using
> > > > indirect descriptors or I may miss something.
> > >
> > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > That is enough for us.
> > >
> > > If you worry about the change of the pp structure, we can use the "private" as
> > > origin and use the "struct list_head lru" to store the dma.
> >
> > This looks even worse, as it uses fields belonging to the different
> > structures in the union.
>
> I mean we do not use the elems from the pp structure inside the page,
> if we worry the change of the pp structure.
>
> I mean use the "private" and "lru", these in the same structure.
>
> I think this is a good way.
>
> Thanks.

See this:

https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/

Thanks

Xuan Zhuo April 24, 2024, 2:54 a.m. UTC | #9

On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > >
> > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > >
> > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > we remap it before returning it to the chain.
> > > > > > > >
> > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > pages:
> > > > > > > >
> > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > >    execute dma unmap.
> > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > >    execute unmap so frequently.)
> > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > >    unsafe behavior.
> > > > > > > >
> > > > > > > > More:
> > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > +{
> > > > > > > > +       sg->dma_address = addr;
> > > > > > > > +       sg->length = len;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > + * address.
> > > > > > > > + */
> > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > +{
> > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > >
> > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > >
> > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > >
> > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > >
> > > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > > a new flag with dma callbacks?
> > > > > >
> > > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > > >
> > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > >
> > > >
> > > > In the big mode, we will commit many pages to the virtio core by
> > > > virtqueue_add_inbuf().
> > > >
> > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > Other pages are chained by the "private".
> > > >
> > > > If we use the page pool, how can we chain the pages.
> > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > >
> > > Right, technically it could be solved by providing helpers in the
> > > virtio core, but considering it's an optimization for big mode which
> > > is not popular, it's not worth to bother.
> > >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Yon know the "private" can not be used.
> > > > > >
> > > > > >
> > > > > > If the pp struct inside the page is not safe, how about:
> > > > > >
> > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > >                         /**
> > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > >                          * by the page owner.
> > > > > >                          */
> > > > > >                         union {
> > > > > >                                 struct list_head lru;
> > > > > >
> > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > >                                 struct {
> > > > > >                                         /* Always even, to negate PageTail */
> > > > > >                                         void *__filler;
> > > > > >                                         /* Count page's or folio's mlocks */
> > > > > >                                         unsigned int mlock_count;
> > > > > >                                 };
> > > > > >
> > > > > >                                 /* Or, free page */
> > > > > >                                 struct list_head buddy_list;
> > > > > >                                 struct list_head pcp_list;
> > > > > >                         };
> > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > >                         struct address_space *mapping;
> > > > > >                         union {
> > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > >                         };
> > > > > >                         /**
> > > > > >                          * @private: Mapping-private opaque data.
> > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > >                          */
> > > > > >                         unsigned long private;
> > > > > >                 };
> > > > > >
> > > > > > Or, we can map the private space of the page as a new structure.
> > > > >
> > > > > It could be a way. But such allocation might be huge if we are using
> > > > > indirect descriptors or I may miss something.
> > > >
> > > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > > That is enough for us.
> > > >
> > > > If you worry about the change of the pp structure, we can use the "private" as
> > > > origin and use the "struct list_head lru" to store the dma.
> > >
> > > This looks even worse, as it uses fields belonging to the different
> > > structures in the union.
> >
> > I mean we do not use the elems from the pp structure inside the page,
> > if we worry the change of the pp structure.
> >
> > I mean use the "private" and "lru", these in the same structure.
> >
> > I think this is a good way.
> >
> > Thanks.
>
> See this:
>
> https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/


I think that is because that the page pool will share the page with
the skbs.  I'm not entirely sure.

In our case, virtio-net fully owns the page. After the page is referenced by skb,
virtio-net no longer references the page. I don't think there is any problem
here.

The key is that who owns the page, who can use the page private space (20/40 bytes).

Is that?

Thanks.


>
> Thanks
>

Jason Wang April 24, 2024, 3:50 a.m. UTC | #10

On Wed, Apr 24, 2024 at 10:58 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > > >
> > > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > > >
> > > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > > we remap it before returning it to the chain.
> > > > > > > > >
> > > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > > pages:
> > > > > > > > >
> > > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > > >    execute dma unmap.
> > > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > > >    execute unmap so frequently.)
> > > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > > >    unsafe behavior.
> > > > > > > > >
> > > > > > > > > More:
> > > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > > +{
> > > > > > > > > +       sg->dma_address = addr;
> > > > > > > > > +       sg->length = len;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > > + * address.
> > > > > > > > > + */
> > > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > > +{
> > > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > > >
> > > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > > >
> > > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > > >
> > > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > > >
> > > > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > > > a new flag with dma callbacks?
> > > > > > >
> > > > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > > > >
> > > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > > >
> > > > >
> > > > > In the big mode, we will commit many pages to the virtio core by
> > > > > virtqueue_add_inbuf().
> > > > >
> > > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > > Other pages are chained by the "private".
> > > > >
> > > > > If we use the page pool, how can we chain the pages.
> > > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > > >
> > > > Right, technically it could be solved by providing helpers in the
> > > > virtio core, but considering it's an optimization for big mode which
> > > > is not popular, it's not worth to bother.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Yon know the "private" can not be used.
> > > > > > >
> > > > > > >
> > > > > > > If the pp struct inside the page is not safe, how about:
> > > > > > >
> > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > >                         /**
> > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > >                          * by the page owner.
> > > > > > >                          */
> > > > > > >                         union {
> > > > > > >                                 struct list_head lru;
> > > > > > >
> > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > >                                 struct {
> > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > >                                         void *__filler;
> > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > >                                         unsigned int mlock_count;
> > > > > > >                                 };
> > > > > > >
> > > > > > >                                 /* Or, free page */
> > > > > > >                                 struct list_head buddy_list;
> > > > > > >                                 struct list_head pcp_list;
> > > > > > >                         };
> > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > >                         struct address_space *mapping;
> > > > > > >                         union {
> > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > >                         };
> > > > > > >                         /**
> > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > >                          */
> > > > > > >                         unsigned long private;
> > > > > > >                 };
> > > > > > >
> > > > > > > Or, we can map the private space of the page as a new structure.
> > > > > >
> > > > > > It could be a way. But such allocation might be huge if we are using
> > > > > > indirect descriptors or I may miss something.
> > > > >
> > > > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > > > That is enough for us.
> > > > >
> > > > > If you worry about the change of the pp structure, we can use the "private" as
> > > > > origin and use the "struct list_head lru" to store the dma.
> > > >
> > > > This looks even worse, as it uses fields belonging to the different
> > > > structures in the union.
> > >
> > > I mean we do not use the elems from the pp structure inside the page,
> > > if we worry the change of the pp structure.
> > >
> > > I mean use the "private" and "lru", these in the same structure.
> > >
> > > I think this is a good way.
> > >
> > > Thanks.
> >
> > See this:
> >
> > https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/
>
>
> I think that is because that the page pool will share the page with
> the skbs.  I'm not entirely sure.
>
> In our case, virtio-net fully owns the page. After the page is referenced by skb,
> virtio-net no longer references the page. I don't think there is any problem
> here.

Well, in the rx path, though the page is allocated by the virtio-net,
unlike the page pool, those pages are not freed by virtio-net. So it
may leave things in the page structure which is problematic. I don't
think we can introduce a virtio-net specific hook for kfree_skb() in
this case. That's why I think leveraging the page pool is better.

For reusing page pool. Maybe we can reuse __pp_mapping_pad for
virtio-net specific use cases like chaining, and clear it in
page_pool_clear_pp_info(). And we need to make sure we don't break
things like TCP RX zerocopy since mapping is aliasied with
__pp_mapping_pad at a first glance.

>
> The key is that who owns the page, who can use the page private space (20/40 bytes).
>
> Is that?

I'm not saying we can't investigate in this direction. But it needs
more comments from mm guys and we need to evaluate the price we pay
for that.

The motivation is to drop the fallback code when pre mapping is not
supported to improve the maintainability of the code and ease the
AF_XDP support for virtio-net. But it turns out to be not easy.

Considering the rx fallback code we need to maintain is not too huge,
maybe we can leave it as is, for example forbid AF_XDP in big modes.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
>

Xuan Zhuo April 24, 2024, 5:43 a.m. UTC | #11

On Wed, 24 Apr 2024 11:50:44 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 10:58 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > > > >
> > > > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > > > >
> > > > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > > > we remap it before returning it to the chain.
> > > > > > > > > >
> > > > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > > > pages:
> > > > > > > > > >
> > > > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > > > >    execute dma unmap.
> > > > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > > > >    execute unmap so frequently.)
> > > > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > > > >    unsafe behavior.
> > > > > > > > > >
> > > > > > > > > > More:
> > > > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > > > +{
> > > > > > > > > > +       sg->dma_address = addr;
> > > > > > > > > > +       sg->length = len;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > > > + * address.
> > > > > > > > > > + */
> > > > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > > > +{
> > > > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > > > >
> > > > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > > > >
> > > > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > > > >
> > > > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > > > >
> > > > > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > > > > a new flag with dma callbacks?
> > > > > > > >
> > > > > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > > > > >
> > > > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > > > >
> > > > > >
> > > > > > In the big mode, we will commit many pages to the virtio core by
> > > > > > virtqueue_add_inbuf().
> > > > > >
> > > > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > > > Other pages are chained by the "private".
> > > > > >
> > > > > > If we use the page pool, how can we chain the pages.
> > > > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > > > >
> > > > > Right, technically it could be solved by providing helpers in the
> > > > > virtio core, but considering it's an optimization for big mode which
> > > > > is not popular, it's not worth to bother.
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Yon know the "private" can not be used.
> > > > > > > >
> > > > > > > >
> > > > > > > > If the pp struct inside the page is not safe, how about:
> > > > > > > >
> > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > >                         /**
> > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > >                          * by the page owner.
> > > > > > > >                          */
> > > > > > > >                         union {
> > > > > > > >                                 struct list_head lru;
> > > > > > > >
> > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > >                                 struct {
> > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > >                                         void *__filler;
> > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > >                                         unsigned int mlock_count;
> > > > > > > >                                 };
> > > > > > > >
> > > > > > > >                                 /* Or, free page */
> > > > > > > >                                 struct list_head buddy_list;
> > > > > > > >                                 struct list_head pcp_list;
> > > > > > > >                         };
> > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > >                         struct address_space *mapping;
> > > > > > > >                         union {
> > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > >                         };
> > > > > > > >                         /**
> > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > >                          */
> > > > > > > >                         unsigned long private;
> > > > > > > >                 };
> > > > > > > >
> > > > > > > > Or, we can map the private space of the page as a new structure.
> > > > > > >
> > > > > > > It could be a way. But such allocation might be huge if we are using
> > > > > > > indirect descriptors or I may miss something.
> > > > > >
> > > > > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > > > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > > > > That is enough for us.
> > > > > >
> > > > > > If you worry about the change of the pp structure, we can use the "private" as
> > > > > > origin and use the "struct list_head lru" to store the dma.
> > > > >
> > > > > This looks even worse, as it uses fields belonging to the different
> > > > > structures in the union.
> > > >
> > > > I mean we do not use the elems from the pp structure inside the page,
> > > > if we worry the change of the pp structure.
> > > >
> > > > I mean use the "private" and "lru", these in the same structure.
> > > >
> > > > I think this is a good way.
> > > >
> > > > Thanks.
> > >
> > > See this:
> > >
> > > https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/
> >
> >
> > I think that is because that the page pool will share the page with
> > the skbs.  I'm not entirely sure.
> >
> > In our case, virtio-net fully owns the page. After the page is referenced by skb,
> > virtio-net no longer references the page. I don't think there is any problem
> > here.
>
> Well, in the rx path, though the page is allocated by the virtio-net,
> unlike the page pool, those pages are not freed by virtio-net. So it
> may leave things in the page structure which is problematic. I don't
> think we can introduce a virtio-net specific hook for kfree_skb() in
> this case. That's why I think leveraging the page pool is better.
>
> For reusing page pool. Maybe we can reuse __pp_mapping_pad for
> virtio-net specific use cases like chaining, and clear it in
> page_pool_clear_pp_info(). And we need to make sure we don't break
> things like TCP RX zerocopy since mapping is aliasied with
> __pp_mapping_pad at a first glance.
>
> >
> > The key is that who owns the page, who can use the page private space (20/40 bytes).
> >
> > Is that?
>
> I'm not saying we can't investigate in this direction. But it needs
> more comments from mm guys and we need to evaluate the price we pay
> for that.
>
> The motivation is to drop the fallback code when pre mapping is not
> supported to improve the maintainability of the code and ease the
> AF_XDP support for virtio-net. But it turns out to be not easy.
>
> Considering the rx fallback code we need to maintain is not too huge,
> maybe we can leave it as is, for example forbid AF_XDP in big modes.

I see.

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>

[vhost,v2,4/7] virtio_net: big mode support premapped

Checks

Commit Message

Comments

Patch