[net-next] udp6: Fix __ip6_append_data()'s handling of MSG_SPLICE_PAGES

Message ID: 1580952.1690961810@warthog.procyon.org.uk
State: Accepted
Commit: ce650a1663354a6cad7145e7f5131008458b39d4
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 29 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

David Howells Aug. 2, 2023, 7:36 a.m. UTC
__ip6_append_data() has a similar problem to __ip_append_data()[1] when
asked to splice into a partially-built UDP message that has more than the
frag-limit data and up to the MTU limit, but in the ipv6 case, it errors
out with EINVAL.  This can be triggered with something like:

        pipe(pfd);
        sfd = socket(AF_INET6, SOCK_DGRAM, 0);
        connect(sfd, ...);
        send(sfd, buffer, 8137, MSG_CONFIRM|MSG_MORE);
        write(pfd[1], buffer, 8);
        splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);

where the amount of data given to send() is dependent on the MTU size (in
this instance an interface with an MTU of 8192).

The problem is that the calculation of the amount to copy in
__ip6_append_data() goes negative in two places, but a check has been put
in to give an error in this case.

This happens because when pagedlen > 0 (which happens for MSG_ZEROCOPY and
MSG_SPLICE_PAGES), the terms in:

        copy = datalen - transhdrlen - fraggap - pagedlen;

then mostly cancel when pagedlen is substituted for, leaving just -fraggap.

Fix this by:

 (1) Insert a note about the dodgy calculation of 'copy'.

 (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above
     equation, so that 'offset' isn't regressed and 'length' isn't
     increased, which will mean that length and thus copy should match the
     amount left in the iterator.

 (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if
     we're asked to splice more than is in the iterator.  It might be
     better to not give the warning or even just give a 'short' write.

 (4) If MSG_SPLICE_PAGES, override the copy<0 check.

[!] Note that this should also affect MSG_ZEROCOPY, but that will return
-EINVAL for the range of send sizes that requires the skbuff to be split.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/ [1]
---
 net/ipv6/ip6_output.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Willem de Bruijn Aug. 2, 2023, 2:25 p.m. UTC | #1
David Howells wrote:
> __ip6_append_data() has a similar problem to __ip_append_data()[1] when
> asked to splice into a partially-built UDP message that has more than the
> frag-limit data and up to the MTU limit, but in the ipv6 case, it errors
> out with EINVAL.  This can be triggered with something like:
> 
>         pipe(pfd);
>         sfd = socket(AF_INET6, SOCK_DGRAM, 0);
>         connect(sfd, ...);
>         send(sfd, buffer, 8137, MSG_CONFIRM|MSG_MORE);
>         write(pfd[1], buffer, 8);
>         splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);
> 
> where the amount of data given to send() is dependent on the MTU size (in
> this instance an interface with an MTU of 8192).
> 
> The problem is that the calculation of the amount to copy in
> __ip6_append_data() goes negative in two places, but a check has been put
> in to give an error in this case.
> 
> This happens because when pagedlen > 0 (which happens for MSG_ZEROCOPY and
> MSG_SPLICE_PAGES), the terms in:
> 
>         copy = datalen - transhdrlen - fraggap - pagedlen;
> 
> then mostly cancel when pagedlen is substituted for, leaving just -fraggap.
> 
> Fix this by:
> 
>  (1) Insert a note about the dodgy calculation of 'copy'.
> 
>  (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above
>      equation, so that 'offset' isn't regressed and 'length' isn't
>      increased, which will mean that length and thus copy should match the
>      amount left in the iterator.
> 
>  (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if
>      we're asked to splice more than is in the iterator.  It might be
>      better to not give the warning or even just give a 'short' write.
> 
>  (4) If MSG_SPLICE_PAGES, override the copy<0 check.
> 
> [!] Note that this should also affect MSG_ZEROCOPY, but that will return
> -EINVAL for the range of send sizes that requires the skbuff to be split.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: David Ahern <dsahern@kernel.org>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: netdev@vger.kernel.org
> Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/ [1]

Reviewed-by: Willem de Bruijn <willemb@google.com>

I'm beginning to understand your point that the bug is older and copy
should never end up equal to -fraggap. pagedlen includes all of
datalen, which includes fraggap. This is wrong, as fraggap is always
copied to skb->linear. Haven't really thought it through, but would
this solve it as well?

                        else {
                                alloclen = fragheaderlen + transhdrlen;
-                               pagedlen = datalen - transhdrlen;
+                               pagedlen = datalen - transhdrlen - fraggap;

After that copy no longer subtracts fraggap twice.

                        copy = datalen - transhdrlen - fraggap - pagedlen;

But don't mean to delay these targeted fixes for MSG_SPLICE_PAGES any
further.

patchwork-bot+netdevbpf@kernel.org Aug. 3, 2023, 1:10 p.m. UTC | #2
Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Wed, 02 Aug 2023 08:36:50 +0100 you wrote:
> __ip6_append_data() has a similar problem to __ip_append_data()[1] when
> asked to splice into a partially-built UDP message that has more than the
> frag-limit data and up to the MTU limit, but in the ipv6 case, it errors
> out with EINVAL.  This can be triggered with something like:
> 
>         pipe(pfd);
>         sfd = socket(AF_INET6, SOCK_DGRAM, 0);
>         connect(sfd, ...);
>         send(sfd, buffer, 8137, MSG_CONFIRM|MSG_MORE);
>         write(pfd[1], buffer, 8);
>         splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);
> 
> [...]

Here is the summary with links:
  - [net-next] udp6: Fix __ip6_append_data()'s handling of MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/ce650a166335

You are awesome, thank you!

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 1e8c90e97608..bc96559bbf0f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1693,7 +1693,10 @@  static int __ip6_append_data(struct sock *sk,
 			fraglen = datalen + fragheaderlen;
 
 			copy = datalen - transhdrlen - fraggap - pagedlen;
-			if (copy < 0) {
+			/* [!] NOTE: copy may be negative if pagedlen>0
+			 * because then the equation may reduces to -fraggap.
+			 */
+			if (copy < 0 && !(flags & MSG_SPLICE_PAGES)) {
 				err = -EINVAL;
 				goto error;
 			}
@@ -1744,6 +1747,8 @@  static int __ip6_append_data(struct sock *sk,
 				err = -EFAULT;
 				kfree_skb(skb);
 				goto error;
+			} else if (flags & MSG_SPLICE_PAGES) {
+				copy = 0;
 			}
 
 			offset += copy;
@@ -1791,6 +1796,10 @@  static int __ip6_append_data(struct sock *sk,
 		} else if (flags & MSG_SPLICE_PAGES) {
 			struct msghdr *msg = from;
 
+			err = -EIO;
+			if (WARN_ON_ONCE(copy > msg->msg_iter.count))
+				goto error;
+
 			err = skb_splice_from_iter(skb, &msg->msg_iter, copy,
 						   sk->sk_allocation);
 			if (err < 0)