
[vhost,v2,4/7] virtio_net: big mode support premapped

Message ID 20240422072408.126821-5-xuanzhuo@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series: virtio_net: rx enable premapped mode by default

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 926 this patch: 928
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 937 this patch: 939
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Xuan Zhuo April 22, 2024, 7:24 a.m. UTC
In big mode, pre-mapping DMA is beneficial because if the pages are not
used, we can reuse them without needing to unmap and remap.

We require space to store the DMA address. I use page.dma_addr, the
field from the pp (page pool) part of struct page, to store the DMA
address.

Every page retrieved from get_a_page() is mapped, and its DMA address is
stored in page.dma_addr. When a page is returned to the chain, we check
the DMA status; if it is not mapped (potentially having been unmapped),
we remap it before returning it to the chain.
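
(The diff itself is not reproduced on this page, so the following is only a
rough sketch of the flow described above, assuming the receive_queue/vq
context of virtio_net.c. The helper name big_mode_alloc_mapped_page() is
invented for illustration; page_chain_set_dma() is the helper added by this
patch, and the virtqueue_dma_*() calls are assumed to be the virtio-core DMA
wrappers referred to in point 1 below.)

static struct page *big_mode_alloc_mapped_page(struct receive_queue *rq)
{
	struct page *p = alloc_page(GFP_ATOMIC);
	dma_addr_t dma;

	if (!p)
		return NULL;

	/* Map once; the address is remembered in the page so that a page
	 * recycled through the chain can be reused without unmap/remap.
	 */
	dma = virtqueue_dma_map_single_attrs(rq->vq, page_address(p),
					     PAGE_SIZE, DMA_FROM_DEVICE, 0);
	if (virtqueue_dma_mapping_error(rq->vq, dma)) {
		__free_pages(p, 0);
		return NULL;
	}

	page_chain_set_dma(p, dma);
	return p;
}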

Based on the following points, we do not use page pool to manage these
pages:

1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
   we can only prevent the page pool from performing DMA operations, and
   let the driver perform DMA operations on the allocated pages.
2. But when the page pool releases the page, we have no chance to
   execute dma unmap.
3. A solution to #2 is to execute dma unmap every time before putting
   the page back to the page pool. (This is actually wasteful; we do
   not need to unmap that frequently.)
4. But there is another problem: we still need to use page.dma_addr to
   save the dma address. Using page.dma_addr while using the page pool
   is unsafe behavior.

More:
    https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 108 insertions(+), 15 deletions(-)

Comments

Jason Wang April 23, 2024, 4:36 a.m. UTC | #1
On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In big mode, pre-mapping DMA is beneficial because if the pages are not
> used, we can reuse them without needing to unmap and remap.
>
> We require space to store the DMA address. I use the page.dma_addr to
> store the DMA address from the pp structure inside the page.
>
> Every page retrieved from get_a_page() is mapped, and its DMA address is
> stored in page.dma_addr. When a page is returned to the chain, we check
> the DMA status; if it is not mapped (potentially having been unmapped),
> we remap it before returning it to the chain.
>
> Based on the following points, we do not use page pool to manage these
> pages:
>
> 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
>    we can only prevent the page pool from performing DMA operations, and
>    let the driver perform DMA operations on the allocated pages.
> 2. But when the page pool releases the page, we have no chance to
>    execute dma unmap.
> 3. A solution to #2 is to execute dma unmap every time before putting
>    the page back to the page pool. (This is actually a waste, we don't
>    execute unmap so frequently.)
> 4. But there is another problem, we still need to use page.dma_addr to
>    save the dma address. Using page.dma_addr while using page pool is
>    unsafe behavior.
>
> More:
>     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 108 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2c7a67ad4789..d4f5e65b247e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
>         return (struct virtio_net_common_hdr *)skb->cb;
>  }
>
> +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> +{
> +       sg->dma_address = addr;
> +       sg->length = len;
> +}
> +
> +/* For pages submitted to the ring, we need to record its dma for unmap.
> + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> + * address.
> + */
> +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> +{
> +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {

Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
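
(For illustration only: a guard of that kind, matching the sizeof() check in
the quoted code, could be spelled as follows. Whether the page pool already
exports it under exactly this name is not shown on this page.)

/* True on 32-bit architectures whose dma_addr_t is 64 bits, i.e. where a
 * DMA address does not fit in a single unsigned long field of struct page.
 */
#define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA \
	(sizeof(dma_addr_t) > sizeof(unsigned long))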

> +               p->dma_addr = lower_32_bits(addr);
> +               p->pp_magic = upper_32_bits(addr);

And this uses three page_pool fields of struct page, which I'm not sure
the other maintainers are happy with. For example, re-using pp_magic
might be dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").

I think a safer way is to reuse the page pool, for example by
introducing a new flag with DMA callbacks?

Thanks
Xuan Zhuo April 23, 2024, 5:47 a.m. UTC | #2
On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > used, we can reuse them without needing to unmap and remap.
> >
> > We require space to store the DMA address. I use the page.dma_addr to
> > store the DMA address from the pp structure inside the page.
> >
> > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > stored in page.dma_addr. When a page is returned to the chain, we check
> > the DMA status; if it is not mapped (potentially having been unmapped),
> > we remap it before returning it to the chain.
> >
> > Based on the following points, we do not use page pool to manage these
> > pages:
> >
> > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> >    we can only prevent the page pool from performing DMA operations, and
> >    let the driver perform DMA operations on the allocated pages.
> > 2. But when the page pool releases the page, we have no chance to
> >    execute dma unmap.
> > 3. A solution to #2 is to execute dma unmap every time before putting
> >    the page back to the page pool. (This is actually a waste, we don't
> >    execute unmap so frequently.)
> > 4. But there is another problem, we still need to use page.dma_addr to
> >    save the dma address. Using page.dma_addr while using page pool is
> >    unsafe behavior.
> >
> > More:
> >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 108 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2c7a67ad4789..d4f5e65b247e 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> >         return (struct virtio_net_common_hdr *)skb->cb;
> >  }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > +       sg->dma_address = addr;
> > +       sg->length = len;
> > +}
> > +
> > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > + * address.
> > + */
> > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > +{
> > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
>
> Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
>
> > +               p->dma_addr = lower_32_bits(addr);
> > +               p->pp_magic = upper_32_bits(addr);
>
> And this uses three fields on page_pool which I'm not sure the other
> maintainers are happy with. For example, re-using pp_maing might be
> dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
>
> I think a more safe way is to reuse page pool, for example introducing
> a new flag with dma callbacks?

Let me try.

Thanks.

>
> Thanks
>
Xuan Zhuo April 23, 2024, 12:31 p.m. UTC | #3
On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > used, we can reuse them without needing to unmap and remap.
> >
> > We require space to store the DMA address. I use the page.dma_addr to
> > store the DMA address from the pp structure inside the page.
> >
> > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > stored in page.dma_addr. When a page is returned to the chain, we check
> > the DMA status; if it is not mapped (potentially having been unmapped),
> > we remap it before returning it to the chain.
> >
> > Based on the following points, we do not use page pool to manage these
> > pages:
> >
> > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> >    we can only prevent the page pool from performing DMA operations, and
> >    let the driver perform DMA operations on the allocated pages.
> > 2. But when the page pool releases the page, we have no chance to
> >    execute dma unmap.
> > 3. A solution to #2 is to execute dma unmap every time before putting
> >    the page back to the page pool. (This is actually a waste, we don't
> >    execute unmap so frequently.)
> > 4. But there is another problem, we still need to use page.dma_addr to
> >    save the dma address. Using page.dma_addr while using page pool is
> >    unsafe behavior.
> >
> > More:
> >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 108 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2c7a67ad4789..d4f5e65b247e 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> >         return (struct virtio_net_common_hdr *)skb->cb;
> >  }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > +       sg->dma_address = addr;
> > +       sg->length = len;
> > +}
> > +
> > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > + * address.
> > + */
> > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > +{
> > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
>
> Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
>
> > +               p->dma_addr = lower_32_bits(addr);
> > +               p->pp_magic = upper_32_bits(addr);
>
> And this uses three fields on page_pool which I'm not sure the other
> maintainers are happy with. For example, re-using pp_maing might be
> dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
>
> I think a more safe way is to reuse page pool, for example introducing
> a new flag with dma callbacks?

If we use page pool, how can we chain the pages allocated for a packet?

You know the "private" field cannot be used.


If the pp struct inside the page is not safe, how about:

		struct {	/* Page cache and anonymous pages */
			/**
			 * @lru: Pageout list, eg. active_list protected by
			 * lruvec->lru_lock.  Sometimes used as a generic list
			 * by the page owner.
			 */
			union {
				struct list_head lru;

				/* Or, for the Unevictable "LRU list" slot */
				struct {
					/* Always even, to negate PageTail */
					void *__filler;
					/* Count page's or folio's mlocks */
					unsigned int mlock_count;
				};

				/* Or, free page */
				struct list_head buddy_list;
				struct list_head pcp_list;
			};
			/* See page-flags.h for PAGE_MAPPING_FLAGS */
			struct address_space *mapping;
			union {
				pgoff_t index;		/* Our offset within mapping. */
				unsigned long share;	/* share count for fsdax */
			};
			/**
			 * @private: Mapping-private opaque data.
			 * Usually used for buffer_heads if PagePrivate.
			 * Used for swp_entry_t if PageSwapCache.
			 * Indicates order in the buddy system if PageBuddy.
			 */
			unsigned long private;
		};

Or, we can map the private space of the page as a new structure.
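
(A sketch of that idea, with invented names: overlay a small driver-owned
structure on those words while the page sits in the chain. Bit 0 of the first
word must stay clear to avoid a false PageTail(), which is the concern raised
later in this thread.)

struct virtnet_big_priv {
	unsigned long next;	/* next page in the chain; always even */
	dma_addr_t dma;		/* premapped DMA address */
};

static struct virtnet_big_priv *page_big_priv(struct page *p)
{
	/* Reuse the driver-owned words of struct page while it is chained. */
	return (struct virtnet_big_priv *)&p->lru;
}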

Thanks.


>
> Thanks
>
Jason Wang April 24, 2024, 12:43 a.m. UTC | #4
On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > used, we can reuse them without needing to unmap and remap.
> > >
> > > We require space to store the DMA address. I use the page.dma_addr to
> > > store the DMA address from the pp structure inside the page.
> > >
> > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > we remap it before returning it to the chain.
> > >
> > > Based on the following points, we do not use page pool to manage these
> > > pages:
> > >
> > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > >    we can only prevent the page pool from performing DMA operations, and
> > >    let the driver perform DMA operations on the allocated pages.
> > > 2. But when the page pool releases the page, we have no chance to
> > >    execute dma unmap.
> > > 3. A solution to #2 is to execute dma unmap every time before putting
> > >    the page back to the page pool. (This is actually a waste, we don't
> > >    execute unmap so frequently.)
> > > 4. But there is another problem, we still need to use page.dma_addr to
> > >    save the dma address. Using page.dma_addr while using page pool is
> > >    unsafe behavior.
> > >
> > > More:
> > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > >         return (struct virtio_net_common_hdr *)skb->cb;
> > >  }
> > >
> > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > +{
> > > +       sg->dma_address = addr;
> > > +       sg->length = len;
> > > +}
> > > +
> > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > + * address.
> > > + */
> > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > +{
> > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> >
> > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> >
> > > +               p->dma_addr = lower_32_bits(addr);
> > > +               p->pp_magic = upper_32_bits(addr);
> >
> > And this uses three fields on page_pool which I'm not sure the other
> > maintainers are happy with. For example, re-using pp_maing might be
> > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> >
> > I think a more safe way is to reuse page pool, for example introducing
> > a new flag with dma callbacks?
>
> If we use page pool, how can we chain the pages allocated for a packet?

I'm not sure I get this; it is chained via the descriptor flag.

>
> Yon know the "private" can not be used.
>
>
> If the pp struct inside the page is not safe, how about:
>
>                 struct {        /* Page cache and anonymous pages */
>                         /**
>                          * @lru: Pageout list, eg. active_list protected by
>                          * lruvec->lru_lock.  Sometimes used as a generic list
>                          * by the page owner.
>                          */
>                         union {
>                                 struct list_head lru;
>
>                                 /* Or, for the Unevictable "LRU list" slot */
>                                 struct {
>                                         /* Always even, to negate PageTail */
>                                         void *__filler;
>                                         /* Count page's or folio's mlocks */
>                                         unsigned int mlock_count;
>                                 };
>
>                                 /* Or, free page */
>                                 struct list_head buddy_list;
>                                 struct list_head pcp_list;
>                         };
>                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
>                         struct address_space *mapping;
>                         union {
>                                 pgoff_t index;          /* Our offset within mapping. */
>                                 unsigned long share;    /* share count for fsdax */
>                         };
>                         /**
>                          * @private: Mapping-private opaque data.
>                          * Usually used for buffer_heads if PagePrivate.
>                          * Used for swp_entry_t if PageSwapCache.
>                          * Indicates order in the buddy system if PageBuddy.
>                          */
>                         unsigned long private;
>                 };
>
> Or, we can map the private space of the page as a new structure.

It could be a way. But such an allocation might be huge if we are using
indirect descriptors, or I may be missing something.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
>
Xuan Zhuo April 24, 2024, 12:53 a.m. UTC | #5
On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > used, we can reuse them without needing to unmap and remap.
> > > >
> > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > store the DMA address from the pp structure inside the page.
> > > >
> > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > we remap it before returning it to the chain.
> > > >
> > > > Based on the following points, we do not use page pool to manage these
> > > > pages:
> > > >
> > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > >    we can only prevent the page pool from performing DMA operations, and
> > > >    let the driver perform DMA operations on the allocated pages.
> > > > 2. But when the page pool releases the page, we have no chance to
> > > >    execute dma unmap.
> > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > >    the page back to the page pool. (This is actually a waste, we don't
> > > >    execute unmap so frequently.)
> > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > >    save the dma address. Using page.dma_addr while using page pool is
> > > >    unsafe behavior.
> > > >
> > > > More:
> > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > >  }
> > > >
> > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > +{
> > > > +       sg->dma_address = addr;
> > > > +       sg->length = len;
> > > > +}
> > > > +
> > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > + * address.
> > > > + */
> > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > +{
> > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > >
> > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > >
> > > > +               p->dma_addr = lower_32_bits(addr);
> > > > +               p->pp_magic = upper_32_bits(addr);
> > >
> > > And this uses three fields on page_pool which I'm not sure the other
> > > maintainers are happy with. For example, re-using pp_maing might be
> > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > >
> > > I think a more safe way is to reuse page pool, for example introducing
> > > a new flag with dma callbacks?
> >
> > If we use page pool, how can we chain the pages allocated for a packet?
>
> I'm not sure I get this, it is chained via the descriptor flag.


In big mode, we commit many pages to the virtio core via
virtqueue_add_inbuf().

From virtqueue_get_buf_ctx() we get the buffer back; that is the first
page. The other pages are chained through "private".

If we use the page pool, how can we chain the pages? After
virtqueue_add_inbuf(), we need to get those pages back to fill the skb.
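
A minimal sketch of that chaining (the helper names are invented here;
set_page_private() and page_private() are the standard struct page
accessors):

static void page_chain_add(struct page *head, struct page *p)
{
	/* Link p into the chain right after head; 0 terminates the chain. */
	set_page_private(p, page_private(head));
	set_page_private(head, (unsigned long)p);
}

static struct page *page_chain_next(struct page *p)
{
	return (struct page *)page_private(p);
}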



>
> >
> > Yon know the "private" can not be used.
> >
> >
> > If the pp struct inside the page is not safe, how about:
> >
> >                 struct {        /* Page cache and anonymous pages */
> >                         /**
> >                          * @lru: Pageout list, eg. active_list protected by
> >                          * lruvec->lru_lock.  Sometimes used as a generic list
> >                          * by the page owner.
> >                          */
> >                         union {
> >                                 struct list_head lru;
> >
> >                                 /* Or, for the Unevictable "LRU list" slot */
> >                                 struct {
> >                                         /* Always even, to negate PageTail */
> >                                         void *__filler;
> >                                         /* Count page's or folio's mlocks */
> >                                         unsigned int mlock_count;
> >                                 };
> >
> >                                 /* Or, free page */
> >                                 struct list_head buddy_list;
> >                                 struct list_head pcp_list;
> >                         };
> >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >                         struct address_space *mapping;
> >                         union {
> >                                 pgoff_t index;          /* Our offset within mapping. */
> >                                 unsigned long share;    /* share count for fsdax */
> >                         };
> >                         /**
> >                          * @private: Mapping-private opaque data.
> >                          * Usually used for buffer_heads if PagePrivate.
> >                          * Used for swp_entry_t if PageSwapCache.
> >                          * Indicates order in the buddy system if PageBuddy.
> >                          */
> >                         unsigned long private;
> >                 };
> >
> > Or, we can map the private space of the page as a new structure.
>
> It could be a way. But such allocation might be huge if we are using
> indirect descriptors or I may miss something.

No, we only need to store the "chain next" pointer and the DMA address,
as this patch set does. The size of the private space inside the page is
20 (32-bit) / 40 (64-bit) bytes. That is enough for us.

If you worry about changes to the pp structure, we can keep using
"private" as before and use the "struct list_head lru" to store the DMA
address.

The minimum size of "struct list_head lru" is 8 bytes, which is enough
for a dma_addr_t.

We can do this more simply:

static void page_chain_set_dma(struct page *p, dma_addr_t dma)
{
	dma_addr_t *addr;

	/* The lru space must be large enough to hold a dma_addr_t. */
	BUILD_BUG_ON(sizeof(p->lru) < sizeof(dma));

	/* Store the dma address in the lru space of the page. */
	addr = (dma_addr_t *)&p->lru;

	*addr = dma;
}
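
The matching read side, again only a sketch under the same assumption (the
lru words are not used for anything else while the page sits in the chain),
would be:

static dma_addr_t page_chain_get_dma(struct page *p)
{
	dma_addr_t *addr = (dma_addr_t *)&p->lru;

	return *addr;
}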

Thanks.



>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>
Jason Wang April 24, 2024, 2:34 a.m. UTC | #6
On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > used, we can reuse them without needing to unmap and remap.
> > > > >
> > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > store the DMA address from the pp structure inside the page.
> > > > >
> > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > we remap it before returning it to the chain.
> > > > >
> > > > > Based on the following points, we do not use page pool to manage these
> > > > > pages:
> > > > >
> > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > 2. But when the page pool releases the page, we have no chance to
> > > > >    execute dma unmap.
> > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > >    execute unmap so frequently.)
> > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > >    unsafe behavior.
> > > > >
> > > > > More:
> > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > >  }
> > > > >
> > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > +{
> > > > > +       sg->dma_address = addr;
> > > > > +       sg->length = len;
> > > > > +}
> > > > > +
> > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > + * address.
> > > > > + */
> > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > +{
> > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > >
> > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > >
> > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > +               p->pp_magic = upper_32_bits(addr);
> > > >
> > > > And this uses three fields on page_pool which I'm not sure the other
> > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > >
> > > > I think a more safe way is to reuse page pool, for example introducing
> > > > a new flag with dma callbacks?
> > >
> > > If we use page pool, how can we chain the pages allocated for a packet?
> >
> > I'm not sure I get this, it is chained via the descriptor flag.
>
>
> In the big mode, we will commit many pages to the virtio core by
> virtqueue_add_inbuf().
>
> By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> Other pages are chained by the "private".
>
> If we use the page pool, how can we chain the pages.
> After virtqueue_add_inbuf(), we need to get the pages to fill the skb.

Right, technically it could be solved by providing helpers in the
virtio core, but considering it's an optimization for big mode, which
is not popular, it's not worth the bother.

>
>
>
> >
> > >
> > > Yon know the "private" can not be used.
> > >
> > >
> > > If the pp struct inside the page is not safe, how about:
> > >
> > >                 struct {        /* Page cache and anonymous pages */
> > >                         /**
> > >                          * @lru: Pageout list, eg. active_list protected by
> > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > >                          * by the page owner.
> > >                          */
> > >                         union {
> > >                                 struct list_head lru;
> > >
> > >                                 /* Or, for the Unevictable "LRU list" slot */
> > >                                 struct {
> > >                                         /* Always even, to negate PageTail */
> > >                                         void *__filler;
> > >                                         /* Count page's or folio's mlocks */
> > >                                         unsigned int mlock_count;
> > >                                 };
> > >
> > >                                 /* Or, free page */
> > >                                 struct list_head buddy_list;
> > >                                 struct list_head pcp_list;
> > >                         };
> > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > >                         struct address_space *mapping;
> > >                         union {
> > >                                 pgoff_t index;          /* Our offset within mapping. */
> > >                                 unsigned long share;    /* share count for fsdax */
> > >                         };
> > >                         /**
> > >                          * @private: Mapping-private opaque data.
> > >                          * Usually used for buffer_heads if PagePrivate.
> > >                          * Used for swp_entry_t if PageSwapCache.
> > >                          * Indicates order in the buddy system if PageBuddy.
> > >                          */
> > >                         unsigned long private;
> > >                 };
> > >
> > > Or, we can map the private space of the page as a new structure.
> >
> > It could be a way. But such allocation might be huge if we are using
> > indirect descriptors or I may miss something.
>
> No. we only need to store the "chain next" and the dma as this patch set did.
> The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> That is enough for us.
>
> If you worry about the change of the pp structure, we can use the "private" as
> origin and use the "struct list_head lru" to store the dma.

This looks even worse, as it uses fields belonging to different
structures in the union.

>
> The min size of "struct list_head lru" is 8 bytes, that is enough for the
> dma_addr_t.
>
> We can do this more simper:
>
> static void page_chain_set_dma(struct page *p, dma_addr_t dma)
> {
>         BUILD_BUG_ON(sizeof(p->lru)) < sizeof(dma));
>
>         dma_addr_t *addr;
>
>         addr = &page->lru;
>
>         *addr = dma;
> }

So we have this in the kernel code.

       /*
         * Five words (20/40 bytes) are available in this union.
         * WARNING: bit 0 of the first word is used for PageTail(). That
         * means the other users of this union MUST NOT use the bit to
         * avoid collision and false-positive PageTail().
         */

And looking at the discussion that introduced pp_magic, reusing
fields seems tricky, as we may end up with side effects from
aliasing fields in the page structure. Technically, we could invent new
structures in the union, but it might not be worth the bother.

So I think we can leave the fallback code and revisit this issue in the future.

Thanks

>
> Thanks.
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>
Xuan Zhuo April 24, 2024, 2:39 a.m. UTC | #7
On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > >
> > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > store the DMA address from the pp structure inside the page.
> > > > > >
> > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > we remap it before returning it to the chain.
> > > > > >
> > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > pages:
> > > > > >
> > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > >    execute dma unmap.
> > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > >    execute unmap so frequently.)
> > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > >    unsafe behavior.
> > > > > >
> > > > > > More:
> > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > >  }
> > > > > >
> > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > +{
> > > > > > +       sg->dma_address = addr;
> > > > > > +       sg->length = len;
> > > > > > +}
> > > > > > +
> > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > + * address.
> > > > > > + */
> > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > +{
> > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > >
> > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > >
> > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > >
> > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > >
> > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > a new flag with dma callbacks?
> > > >
> > > > If we use page pool, how can we chain the pages allocated for a packet?
> > >
> > > I'm not sure I get this, it is chained via the descriptor flag.
> >
> >
> > In the big mode, we will commit many pages to the virtio core by
> > virtqueue_add_inbuf().
> >
> > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > Other pages are chained by the "private".
> >
> > If we use the page pool, how can we chain the pages.
> > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
>
> Right, technically it could be solved by providing helpers in the
> virtio core, but considering it's an optimization for big mode which
> is not popular, it's not worth to bother.
>
> >
> >
> >
> > >
> > > >
> > > > Yon know the "private" can not be used.
> > > >
> > > >
> > > > If the pp struct inside the page is not safe, how about:
> > > >
> > > >                 struct {        /* Page cache and anonymous pages */
> > > >                         /**
> > > >                          * @lru: Pageout list, eg. active_list protected by
> > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > >                          * by the page owner.
> > > >                          */
> > > >                         union {
> > > >                                 struct list_head lru;
> > > >
> > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > >                                 struct {
> > > >                                         /* Always even, to negate PageTail */
> > > >                                         void *__filler;
> > > >                                         /* Count page's or folio's mlocks */
> > > >                                         unsigned int mlock_count;
> > > >                                 };
> > > >
> > > >                                 /* Or, free page */
> > > >                                 struct list_head buddy_list;
> > > >                                 struct list_head pcp_list;
> > > >                         };
> > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > >                         struct address_space *mapping;
> > > >                         union {
> > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > >                                 unsigned long share;    /* share count for fsdax */
> > > >                         };
> > > >                         /**
> > > >                          * @private: Mapping-private opaque data.
> > > >                          * Usually used for buffer_heads if PagePrivate.
> > > >                          * Used for swp_entry_t if PageSwapCache.
> > > >                          * Indicates order in the buddy system if PageBuddy.
> > > >                          */
> > > >                         unsigned long private;
> > > >                 };
> > > >
> > > > Or, we can map the private space of the page as a new structure.
> > >
> > > It could be a way. But such allocation might be huge if we are using
> > > indirect descriptors or I may miss something.
> >
> > No. we only need to store the "chain next" and the dma as this patch set did.
> > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > That is enough for us.
> >
> > If you worry about the change of the pp structure, we can use the "private" as
> > origin and use the "struct list_head lru" to store the dma.
>
> This looks even worse, as it uses fields belonging to the different
> structures in the union.

I mean we do not use the members of the pp structure inside the page,
if we worry about changes to the pp structure.

I mean using "private" and "lru", which are in the same structure.

I think this is a good way.

Thanks.

>
> >
> > The min size of "struct list_head lru" is 8 bytes, that is enough for the
> > dma_addr_t.
> >
> > We can do this more simper:
> >
> > static void page_chain_set_dma(struct page *p, dma_addr_t dma)
> > {
> >         BUILD_BUG_ON(sizeof(p->lru)) < sizeof(dma));
> >
> >         dma_addr_t *addr;
> >
> >         addr = &page->lru;
> >
> >         *addr = dma;
> > }
>
> So we had this in the kernel code.
>
>        /*
>          * Five words (20/40 bytes) are available in this union.
>          * WARNING: bit 0 of the first word is used for PageTail(). That
>          * means the other users of this union MUST NOT use the bit to
>          * avoid collision and false-positive PageTail().
>          */
>
> And by looking at the discussion that introduces the pp_magic, reusing
> fields seems to be tricky as we may end up with side effects of
> aliasing fields in page structure. Technically, we can invent new
> structures in the union, but it might not be worth it to bother.
>
> So I think we can leave the fallback code and revisit this issue in the future.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>
Jason Wang April 24, 2024, 2:45 a.m. UTC | #8
On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > >
> > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > >
> > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > we remap it before returning it to the chain.
> > > > > > >
> > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > pages:
> > > > > > >
> > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > >    execute dma unmap.
> > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > >    execute unmap so frequently.)
> > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > >    unsafe behavior.
> > > > > > >
> > > > > > > More:
> > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > +{
> > > > > > > +       sg->dma_address = addr;
> > > > > > > +       sg->length = len;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > + * address.
> > > > > > > + */
> > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > +{
> > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > >
> > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > >
> > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > >
> > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > >
> > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > a new flag with dma callbacks?
> > > > >
> > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > >
> > > > I'm not sure I get this, it is chained via the descriptor flag.
> > >
> > >
> > > In the big mode, we will commit many pages to the virtio core by
> > > virtqueue_add_inbuf().
> > >
> > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > Other pages are chained by the "private".
> > >
> > > If we use the page pool, how can we chain the pages.
> > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> >
> > Right, technically it could be solved by providing helpers in the
> > virtio core, but considering it's an optimization for big mode which
> > is not popular, it's not worth to bother.
> >
> > >
> > >
> > >
> > > >
> > > > >
> > > > > Yon know the "private" can not be used.
> > > > >
> > > > >
> > > > > If the pp struct inside the page is not safe, how about:
> > > > >
> > > > >                 struct {        /* Page cache and anonymous pages */
> > > > >                         /**
> > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > >                          * by the page owner.
> > > > >                          */
> > > > >                         union {
> > > > >                                 struct list_head lru;
> > > > >
> > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > >                                 struct {
> > > > >                                         /* Always even, to negate PageTail */
> > > > >                                         void *__filler;
> > > > >                                         /* Count page's or folio's mlocks */
> > > > >                                         unsigned int mlock_count;
> > > > >                                 };
> > > > >
> > > > >                                 /* Or, free page */
> > > > >                                 struct list_head buddy_list;
> > > > >                                 struct list_head pcp_list;
> > > > >                         };
> > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > >                         struct address_space *mapping;
> > > > >                         union {
> > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > >                         };
> > > > >                         /**
> > > > >                          * @private: Mapping-private opaque data.
> > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > >                          */
> > > > >                         unsigned long private;
> > > > >                 };
> > > > >
> > > > > Or, we can map the private space of the page as a new structure.
> > > >
> > > > It could be a way. But such allocation might be huge if we are using
> > > > indirect descriptors or I may miss something.
> > >
> > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > That is enough for us.
> > >
> > > If you worry about the change of the pp structure, we can use the "private" as
> > > origin and use the "struct list_head lru" to store the dma.
> >
> > This looks even worse, as it uses fields belonging to the different
> > structures in the union.
>
> I mean we do not use the elems from the pp structure inside the page,
> if we worry the change of the pp structure.
>
> I mean use the "private" and "lru", these in the same structure.
>
> I think this is a good way.
>
> Thanks.

See this:

https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/

Thanks
Xuan Zhuo April 24, 2024, 2:54 a.m. UTC | #9
On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > >
> > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > >
> > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > we remap it before returning it to the chain.
> > > > > > > >
> > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > pages:
> > > > > > > >
> > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > >    execute dma unmap.
> > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > >    execute unmap so frequently.)
> > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > >    unsafe behavior.
> > > > > > > >
> > > > > > > > More:
> > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > +{
> > > > > > > > +       sg->dma_address = addr;
> > > > > > > > +       sg->length = len;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > + * address.
> > > > > > > > + */
> > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > +{
> > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > >
> > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > >
> > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > >
> > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > >
> > > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > > a new flag with dma callbacks?
> > > > > >
> > > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > > >
> > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > >
> > > >
> > > > In the big mode, we will commit many pages to the virtio core by
> > > > virtqueue_add_inbuf().
> > > >
> > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > Other pages are chained by the "private".
> > > >
> > > > If we use the page pool, how can we chain the pages.
> > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > >
> > > Right, technically it could be solved by providing helpers in the
> > > virtio core, but considering it's an optimization for big mode which
> > > is not popular, it's not worth to bother.
> > >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Yon know the "private" can not be used.
> > > > > >
> > > > > >
> > > > > > If the pp struct inside the page is not safe, how about:
> > > > > >
> > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > >                         /**
> > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > >                          * by the page owner.
> > > > > >                          */
> > > > > >                         union {
> > > > > >                                 struct list_head lru;
> > > > > >
> > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > >                                 struct {
> > > > > >                                         /* Always even, to negate PageTail */
> > > > > >                                         void *__filler;
> > > > > >                                         /* Count page's or folio's mlocks */
> > > > > >                                         unsigned int mlock_count;
> > > > > >                                 };
> > > > > >
> > > > > >                                 /* Or, free page */
> > > > > >                                 struct list_head buddy_list;
> > > > > >                                 struct list_head pcp_list;
> > > > > >                         };
> > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > >                         struct address_space *mapping;
> > > > > >                         union {
> > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > >                         };
> > > > > >                         /**
> > > > > >                          * @private: Mapping-private opaque data.
> > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > >                          */
> > > > > >                         unsigned long private;
> > > > > >                 };
> > > > > >
> > > > > > Or, we can map the private space of the page as a new structure.
> > > > >
> > > > > It could be a way. But such allocation might be huge if we are using
> > > > > indirect descriptors or I may miss something.
> > > >
> > > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > > That is enough for us.
> > > >
> > > > If you worry about the change of the pp structure, we can use the "private" as
> > > > origin and use the "struct list_head lru" to store the dma.
> > >
> > > This looks even worse, as it uses fields belonging to the different
> > > structures in the union.
> >
> > I mean we do not use the elems from the pp structure inside the page,
> > if we worry the change of the pp structure.
> >
> > I mean use the "private" and "lru", these in the same structure.
> >
> > I think this is a good way.
> >
> > Thanks.
>
> See this:
>
> https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/


I think that is because the page pool shares the page with the skbs,
but I'm not entirely sure.

In our case, virtio-net fully owns the page. Once the page has been handed
to the skb, virtio-net no longer references it. I don't think there is any
problem here.

The key point is that whoever owns the page is the one who can use the page's
private space (20/40 bytes).

Is that right?
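
To make that concrete, here is a minimal sketch (not part of this series; the
struct and field names are hypothetical) of everything big mode has to stash
in a page while virtio-net owns it:

struct vnet_page_priv {
	struct page	*chain_next;	/* next page in the packet chain */
	dma_addr_t	dma;		/* premapped DMA address of this page */
};

That is at most 16 bytes, so it fits easily in the 20/40 bytes of private
space counted above.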

Thanks.


>
> Thanks
>
Jason Wang April 24, 2024, 3:50 a.m. UTC | #10
On Wed, Apr 24, 2024 at 10:58 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > [...]
> >
> > See this:
> >
> > https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/
>
>
> I think that is because the page pool shares the page with the skbs,
> but I'm not entirely sure.
>
> In our case, virtio-net fully owns the page. Once the page has been handed
> to the skb, virtio-net no longer references it. I don't think there is any
> problem here.

Well, in the rx path, although the pages are allocated by virtio-net,
unlike with the page pool they are not freed by virtio-net. So stale
values may be left in the page structure, which is problematic. I don't
think we can introduce a virtio-net-specific hook for kfree_skb() in
this case. That's why I think leveraging the page pool is better.
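
For reference, the usual way a driver leans on the page pool for allocation
only, keeping the DMA mapping in its own hands, is roughly the sketch below
(the sizes and the device pointer are placeholders, and this is not what the
series proposes):

	struct page_pool_params pp_params = {
		.order		= 0,
		.pool_size	= 256,		/* placeholder */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,		/* placeholder device */
		.dma_dir	= DMA_FROM_DEVICE,
		.flags		= 0,	/* no PP_FLAG_DMA_MAP: driver maps pages itself */
	};
	struct page_pool *pool = page_pool_create(&pp_params);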

As for reusing the page pool: maybe we can reuse __pp_mapping_pad for
virtio-net-specific use cases like chaining, and clear it in
page_pool_clear_pp_info(). We also need to make sure we don't break
things like TCP RX zerocopy, since, at first glance, mapping is aliased
with __pp_mapping_pad.
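
As a rough illustration of that concern (hypothetical helper, not a real
patch): whatever virtio-net parks in that slot has to be wiped before the
page can leave the driver, because the same word is read as page->mapping
elsewhere:

	/* Hypothetical: clear the driver-owned scratch word before the page
	 * escapes virtio-net, so code that inspects page->mapping (such as
	 * the TCP RX zerocopy path) never sees a chain pointer there.
	 */
	static void virtnet_page_clear_chain(struct page *page)
	{
		page->mapping = NULL;
	}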

>
> The key is that who owns the page, who can use the page private space (20/40 bytes).
>
> Is that?

I'm not saying we can't investigate in this direction. But it needs
more input from the mm folks, and we need to evaluate the price we pay
for that.

The motivation is to drop the fallback code for the case where pre-mapping
is not supported, to improve the maintainability of the code and to ease
AF_XDP support for virtio-net. But it turns out not to be easy.

Considering the rx fallback code we need to maintain is not too large,
maybe we can leave it as is and, for example, forbid AF_XDP in big mode.
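
The guard itself would be tiny; roughly (helper name hypothetical, the two
flags already exist in struct virtnet_info):

	static int virtnet_check_xsk_rx_mode(struct virtnet_info *vi)
	{
		/* big mode keeps the unmapped fallback rx path, so refuse
		 * to bind an XSK pool to it
		 */
		if (vi->big_packets && !vi->mergeable_rx_bufs)
			return -EOPNOTSUPP;

		return 0;
	}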

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
>
Xuan Zhuo April 24, 2024, 5:43 a.m. UTC | #11
On Wed, 24 Apr 2024 11:50:44 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 10:58 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > [...]
> > >
> > > See this:
> > >
> > > https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/
> >
> >
> > I think that is because the page pool shares the page with the skbs,
> > but I'm not entirely sure.
> >
> > In our case, virtio-net fully owns the page. Once the page has been handed
> > to the skb, virtio-net no longer references it. I don't think there is any
> > problem here.
>
> Well, in the rx path, although the pages are allocated by virtio-net,
> unlike with the page pool they are not freed by virtio-net. So stale
> values may be left in the page structure, which is problematic. I don't
> think we can introduce a virtio-net-specific hook for kfree_skb() in
> this case. That's why I think leveraging the page pool is better.
>
> As for reusing the page pool: maybe we can reuse __pp_mapping_pad for
> virtio-net-specific use cases like chaining, and clear it in
> page_pool_clear_pp_info(). We also need to make sure we don't break
> things like TCP RX zerocopy, since, at first glance, mapping is aliased
> with __pp_mapping_pad.
>
> >
> > The key point is that whoever owns the page is the one who can use the
> > page's private space (20/40 bytes).
> >
> > Is that right?
>
> I'm not saying we can't investigate in this direction. But it needs
> more input from the mm folks, and we need to evaluate the price we pay
> for that.
>
> The motivation is to drop the fallback code for the case where pre-mapping
> is not supported, to improve the maintainability of the code and to ease
> AF_XDP support for virtio-net. But it turns out not to be easy.
>
> Considering the rx fallback code we need to maintain is not too large,
> maybe we can leave it as is and, for example, forbid AF_XDP in big mode.

I see.

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>
diff mbox series

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2c7a67ad4789..d4f5e65b247e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -439,6 +439,81 @@  skb_vnet_common_hdr(struct sk_buff *skb)
 	return (struct virtio_net_common_hdr *)skb->cb;
 }
 
+static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
+{
+	sg->dma_address = addr;
+	sg->length = len;
+}
+
+/* For pages submitted to the ring, we need to record its dma for unmap.
+ * Here, we use the page.dma_addr and page.pp_magic to store the dma
+ * address.
+ */
+static void page_chain_set_dma(struct page *p, dma_addr_t addr)
+{
+	if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
+		p->dma_addr = lower_32_bits(addr);
+		p->pp_magic = upper_32_bits(addr);
+	} else {
+		p->dma_addr = addr;
+	}
+}
+
+static dma_addr_t page_chain_get_dma(struct page *p)
+{
+	if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
+		u64 addr;
+
+		addr = p->pp_magic;
+		return (addr << 32) + p->dma_addr;
+	} else {
+		return p->dma_addr;
+	}
+}
+
+static void page_chain_sync_for_cpu(struct receive_queue *rq, struct page *p)
+{
+	virtqueue_dma_sync_single_range_for_cpu(rq->vq, page_chain_get_dma(p),
+						0, PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
+static void page_chain_unmap(struct receive_queue *rq, struct page *p, bool sync)
+{
+	int attr = 0;
+
+	if (!sync)
+		attr = DMA_ATTR_SKIP_CPU_SYNC;
+
+	virtqueue_dma_unmap_page_attrs(rq->vq, page_chain_get_dma(p), PAGE_SIZE,
+				       DMA_FROM_DEVICE, attr);
+}
+
+static int page_chain_map(struct receive_queue *rq, struct page *p)
+{
+	dma_addr_t addr;
+
+	addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
+	if (virtqueue_dma_mapping_error(rq->vq, addr))
+		return -ENOMEM;
+
+	page_chain_set_dma(p, addr);
+	return 0;
+}
+
+static void page_chain_release(struct receive_queue *rq)
+{
+	struct page *p, *n;
+
+	for (p = rq->pages; p; p = n) {
+		n = page_chain_next(p);
+
+		page_chain_unmap(rq, p, true);
+		__free_pages(p, 0);
+	}
+
+	rq->pages = NULL;
+}
+
 /*
  * put the whole most recent used list in the beginning for reuse
  */
@@ -461,8 +536,15 @@  static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 		rq->pages = page_chain_next(p);
 		/* clear chain here, it is used to chain pages */
 		page_chain_add(p, NULL);
-	} else
+	} else {
 		p = alloc_page(gfp_mask);
+
+		if (page_chain_map(rq, p)) {
+			__free_pages(p, 0);
+			return NULL;
+		}
+	}
+
 	return p;
 }
 
@@ -617,9 +699,12 @@  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		if (unlikely(!skb))
 			return NULL;
 
-		page = page_chain_next(page);
-		if (page)
-			give_pages(rq, page);
+		if (!vi->mergeable_rx_bufs) {
+			page_chain_unmap(rq, page, false);
+			page = page_chain_next(page);
+			if (page)
+				give_pages(rq, page);
+		}
 		goto ok;
 	}
 
@@ -662,6 +747,9 @@  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	BUG_ON(offset >= PAGE_SIZE);
 	while (len) {
 		unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
+
+		page_chain_unmap(rq, page, !offset);
+
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
 				frag_size, truesize);
 		len -= frag_size;
@@ -828,7 +916,8 @@  static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
 
 	rq = &vi->rq[i];
 
-	if (rq->do_dma)
+	/* Skip the unmap for big mode. */
+	if (!vi->big_packets || vi->mergeable_rx_bufs)
 		virtnet_rq_unmap(rq, buf, 0);
 
 	virtnet_rq_free_buf(vi, rq, buf);
@@ -1351,8 +1440,12 @@  static struct sk_buff *receive_big(struct net_device *dev,
 				   struct virtnet_rq_stats *stats)
 {
 	struct page *page = buf;
-	struct sk_buff *skb =
-		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
+	struct sk_buff *skb;
+
+	/* sync first page. The following code may read this page. */
+	page_chain_sync_for_cpu(rq, page);
+
+	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
 
 	u64_stats_add(&stats->bytes, len - vi->hdr_len);
 	if (unlikely(!skb))
@@ -1901,7 +1994,7 @@  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 			   gfp_t gfp)
 {
 	struct page *first, *list = NULL;
-	char *p;
+	dma_addr_t p;
 	int i, err, offset;
 
 	sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2);
@@ -1914,7 +2007,7 @@  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 				give_pages(rq, list);
 			return -ENOMEM;
 		}
-		sg_set_buf(&rq->sg[i], page_address(first), PAGE_SIZE);
+		sg_fill_dma(&rq->sg[i], page_chain_get_dma(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
 		page_chain_add(first, list);
@@ -1926,15 +2019,16 @@  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 		give_pages(rq, list);
 		return -ENOMEM;
 	}
-	p = page_address(first);
+
+	p = page_chain_get_dma(first);
 
 	/* rq->sg[0], rq->sg[1] share the same page */
 	/* a separated rq->sg[0] for header - required in case !any_header_sg */
-	sg_set_buf(&rq->sg[0], p, vi->hdr_len);
+	sg_fill_dma(&rq->sg[0], p, vi->hdr_len);
 
 	/* rq->sg[1] for data packet, from offset */
 	offset = sizeof(struct padded_vnet_hdr);
-	sg_set_buf(&rq->sg[1], p + offset, PAGE_SIZE - offset);
+	sg_fill_dma(&rq->sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
 	page_chain_add(first, list);
@@ -2136,7 +2230,7 @@  static int virtnet_receive(struct receive_queue *rq, int budget,
 		}
 	} else {
 		while (packets < budget &&
-		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
+		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
 			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
 			packets++;
 		}
@@ -4257,8 +4351,7 @@  static void _free_receive_bufs(struct virtnet_info *vi)
 	int i;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		while (vi->rq[i].pages)
-			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
+		page_chain_release(&vi->rq[i]);
 
 		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
 		RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
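
For readers skimming the helpers at the top of the patch, a quick worked
example of how page_chain_set_dma()/page_chain_get_dma() round-trip an
address on a configuration with a 32-bit unsigned long and a 64-bit
dma_addr_t (the value is made up):

	addr        = 0x0000001234abcd00
	p->dma_addr = lower_32_bits(addr) = 0x34abcd00
	p->pp_magic = upper_32_bits(addr) = 0x00000012

	page_chain_get_dma():
	    ((u64)p->pp_magic << 32) + p->dma_addr = 0x0000001234abcd00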