diff mbox series

[1/5] virtio-net: fix overflow inside virtnet_rq_alloc

Message ID 20241014031234.7659-2-xuanzhuo@linux.alibaba.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series virtio_net: enable premapped mode by default | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: line length of 97 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-14--09-00 (tests: 777)

Commit Message

Xuan Zhuo Oct. 14, 2024, 3:12 a.m. UTC
When the frag only gets a single page, it may lead to a regression on
the VM. In particular, if the sysctl net.core.high_order_alloc_disable
value is 1, the frag always gets a single page on refill.

This can cause reliable crashes or scp failures (scp'ing a file 100M in
size to the VM):

The issue is that virtnet_rq_dma takes up 16 bytes at the beginning
of a new frag. When the frag size is larger than PAGE_SIZE,
everything is fine. However, if the frag is only one page and the
total size of the buffer and virtnet_rq_dma is larger than one page, an
overflow may occur.

Here, when the frag size is not enough, we reduce the buffer len to fix
this problem.

Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Paolo Abeni Oct. 17, 2024, 1:42 p.m. UTC | #1
On 10/14/24 05:12, Xuan Zhuo wrote:
> When the frag just got a page, then may lead to regression on VM.
> Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> then the frag always get a page when do refill.
> 
> Which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
> 
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
> 
> Here, when the frag size is not enough, we reduce the buffer len to fix
> this problem.
> 
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

This looks like a fix that should target the net tree, but the following 
patches looks like net-next material. Any special reason to bundle them 
together?

Also, please explicitly include the target tree in the subject of future 
submissions, thanks!

Paolo
Jason Wang Oct. 18, 2024, 7:41 a.m. UTC | #2
On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> When the frag just got a page, then may lead to regression on VM.
> Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> then the frag always get a page when do refill.
>
> Which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
>
> Here, when the frag size is not enough, we reduce the buffer len to fix
> this problem.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Though this may fix the problem and we need it now, I would like to
have some tweaks in the future, basically because of the security
implications of sharing driver metadata with the device (most IOMMUs
can only enforce protection at the granularity of a page). So we may
end up with device-triggerable behaviour. For safety, we should use
driver-private memory for the DMA metadata.

> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f8131f92a392..59a99bbaf852 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>         void *buf, *head;
>         dma_addr_t addr;
>
> -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -               return NULL;
> -
>         head = page_address(alloc_frag->page);
>
>         if (rq->do_dma) {
> @@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>         len = SKB_DATA_ALIGN(len) +
>               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +               return -ENOMEM;
> +
>         buf = virtnet_rq_alloc(rq, len, gfp);
>         if (unlikely(!buf))
>                 return -ENOMEM;
> @@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>          */
>         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>
> +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +               return -ENOMEM;

This makes me think of another question: how can we guarantee that
len + room is less than or equal to PAGE_SIZE, especially considering
we need extra head- and tailroom for XDP? If we can't, it is a bug:

"""
/**
 * skb_page_frag_refill - check that a page_frag contains enough room
 * @sz: minimum size of the fragment we want to get
 * @pfrag: pointer to page_frag
 * @gfp: priority for memory allocation
 *
 * Note: While this allocator tries to use high order pages, there is
 * no guarantee that allocations succeed. Therefore, @sz MUST be
 * less or equal than PAGE_SIZE.
 */
"""

> +
> +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +               len -= sizeof(struct virtnet_rq_dma);

Any reason we need to check alloc_frag->offset?

> +
>         buf = virtnet_rq_alloc(rq, len + room, gfp);

Btw, as pointed out in previous review, we should have a consistent API:

1) hide the alloc_frag like virtnet_rq_alloc()

or

2) pass the alloc_frag to virtnet_rq_alloc()

>         if (unlikely(!buf))
>                 return -ENOMEM;
> --
> 2.32.0.3.g01195cf9f
>

Thanks
Michael S. Tsirkin Oct. 19, 2024, 4:34 p.m. UTC | #3
On Thu, Oct 17, 2024 at 03:42:59PM +0200, Paolo Abeni wrote:
> 
> 
> On 10/14/24 05:12, Xuan Zhuo wrote:
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> > 
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> > 
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> > 
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> > 
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> 
> This looks like a fix that should target the net tree, but the following
> patches looks like net-next material. Any special reason to bundle them
> together?
> 
> Also, please explicitly include the the target tree in the subj on next
> submissions, thanks!
> 
> Paolo

The issue is that net-next is not rebased on net, so if patches 2-5 land
in next and 1 only in net, next will be broken. I think the simplest fix
is just to have 1 on net-next, backport it to net too, and git will
figure it out. Right?
Xuan Zhuo Oct. 25, 2024, 2:35 a.m. UTC | #4
On Thu, 17 Oct 2024 15:42:59 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
>
>
> On 10/14/24 05:12, Xuan Zhuo wrote:
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> This looks like a fix that should target the net tree, but the following
> patches looks like net-next material. Any special reason to bundle them
> together?

Sorry, I forgot to add net-next as a target tree.

This may look like a fix, but the feature was disabled in the last Linux
release, so the bug cannot be triggered and we don't need to push this to
the net tree.

Thanks.

>
> Also, please explicitly include the the target tree in the subj on next
> submissions, thanks!
>
> Paolo
>
>
Xuan Zhuo Oct. 25, 2024, 2:40 a.m. UTC | #5
On Fri, 18 Oct 2024 15:41:41 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> Though this may fix the problem and we need it now, I would like to
> have some tweaks in the future. Basically because of the security
> implication for sharing driver metadata with the device (most IOMMU
> can only do protection at the granule at the page). So we may end up
> with device-triggerable behaviour. For safety, we should use driver
> private memory for DMA metadata.
>
> > ---
> >  drivers/net/virtio_net.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f8131f92a392..59a99bbaf852 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >         void *buf, *head;
> >         dma_addr_t addr;
> >
> > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > -               return NULL;
> > -
> >         head = page_address(alloc_frag->page);
> >
> >         if (rq->do_dma) {
> > @@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         len = SKB_DATA_ALIGN(len) +
> >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >
> > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > +               return -ENOMEM;
> > +
> >         buf = virtnet_rq_alloc(rq, len, gfp);
> >         if (unlikely(!buf))
> >                 return -ENOMEM;
> > @@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >          */
> >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >
> > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > +               return -ENOMEM;
>
> This makes me think of another question, how could we guarantee len +
> room is less or equal to PAGE_SIZE. Especially considering we need
> extra head and tailroom for XDP? If we can't it is a bug:

get_mergeable_buf_len() does this.

>
> """
> /**
>  * skb_page_frag_refill - check that a page_frag contains enough room
>  * @sz: minimum size of the fragment we want to get
>  * @pfrag: pointer to page_frag
>  * @gfp: priority for memory allocation
>  *
>  * Note: While this allocator tries to use high order pages, there is
>  * no guarantee that allocations succeed. Therefore, @sz MUST be
>  * less or equal than PAGE_SIZE.
>  */
> """
>
> > +
> > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > +               len -= sizeof(struct virtnet_rq_dma);
>
> Any reason we need to check alloc_frag->offset?

We only need the check when the page of the alloc frag is new
(offset == 0), because that is when space for the DMA metadata must be
carved out at the start of the frag. If the offset > 0, the metadata
space has already been reserved, so it is safe.


>
> > +
> >         buf = virtnet_rq_alloc(rq, len + room, gfp);
>
> Btw, as pointed out in previous review, we should have a consistent API:
>
> 1) hide the alloc_frag like virtnet_rq_alloc()
>
> or
>
> 2) pass the alloc_frag to virtnet_rq_alloc()


Now we need to check len + room before calling skb_page_frag_refill(),
so we must move skb_page_frag_refill() outside virtnet_rq_alloc().

Thanks

>
> >         if (unlikely(!buf))
> >                 return -ENOMEM;
> > --
> > 2.32.0.3.g01195cf9f
> >
>
> Thanks
>
Simon Horman Oct. 25, 2024, 1:22 p.m. UTC | #6
On Fri, Oct 25, 2024 at 10:35:53AM +0800, Xuan Zhuo wrote:
> On Thu, 17 Oct 2024 15:42:59 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
> >
> >
> > On 10/14/24 05:12, Xuan Zhuo wrote:
> > > When the frag just got a page, then may lead to regression on VM.
> > > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > > then the frag always get a page when do refill.
> > >
> > > Which could see reliable crashes or scp failure (scp a file 100M in size
> > > to VM):
> > >
> > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > everything is fine. However, if the frag is only one page and the
> > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > overflow may occur.
> > >
> > > Here, when the frag size is not enough, we reduce the buffer len to fix
> > > this problem.
> > >
> > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> > This looks like a fix that should target the net tree, but the following
> > patches looks like net-next material. Any special reason to bundle them
> > together?
> 
> Sorry, I forgot to add net-next as a target tree.
> 
> This may look like a fix. But the feature was disabled in the last Linux
> version. So the bug cannot be triggered, so we don't need to push to the net
> tree.

I think it would be useful to be clear in the commit message, use of tags,
and target tree regarding fixes and non-fixes.

Please describe in the commit message why this is not fixing a bug, as you
have described above. And please do not include Fixes tags in patches that
are not bug fixes, which seems to be the case here.

If you want to refer to the patch that introduced the problem, you can use
the following syntax, in the commit message, before the tags. Unlike Fixes
tags, this may be line wrapped.

  This problem is not a bug fix for net because... It was introduced by
  commit f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api").

  Reported-by: ...
  ...
diff mbox series

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f8131f92a392..59a99bbaf852 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -926,9 +926,6 @@  static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 	void *buf, *head;
 	dma_addr_t addr;
 
-	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
-		return NULL;
-
 	head = page_address(alloc_frag->page);
 
 	if (rq->do_dma) {
@@ -2423,6 +2420,9 @@  static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	len = SKB_DATA_ALIGN(len) +
 	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
+	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
+		return -ENOMEM;
+
 	buf = virtnet_rq_alloc(rq, len, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
@@ -2525,6 +2525,12 @@  static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	 */
 	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
 
+	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
+		return -ENOMEM;
+
+	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
+		len -= sizeof(struct virtnet_rq_dma);
+
 	buf = virtnet_rq_alloc(rq, len + room, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;