
[v2,net,1/3] udp: Fix multiple wraparounds of sk->sk_rmem_alloc.

Message ID 20250325195826.52385-2-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series udp: Fix two integer overflows when sk->sk_rcvbuf is close to INT_MAX.

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 6 this patch: 6
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 40 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0

Commit Message

Kuniyuki Iwashima March 25, 2025, 7:58 p.m. UTC
__udp_enqueue_schedule_skb() has the following condition:

  if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
          goto drop;

sk->sk_rcvbuf is initialised from net.core.rmem_default and can later
be configured with SO_RCVBUF, which is capped at net.core.rmem_max, or
with SO_RCVBUFFORCE.
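
For reference, a minimal sketch of how a privileged process can push
sk->sk_rcvbuf to its ceiling (the helper name is made up for
illustration; SO_RCVBUFFORCE requires CAP_NET_ADMIN, and plain
SO_RCVBUF would be clamped to net.core.rmem_max):

  #include <limits.h>
  #include <sys/socket.h>

  /* __sock_set_rcvbuf() clamps the value to INT_MAX / 2 and stores
   * twice that, which is why ss reports rb2147483646 below.
   */
  int set_huge_rcvbuf(int fd)
  {
          int val = INT_MAX;

          return setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
                            &val, sizeof(val));
  }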

If sk->sk_rcvbuf is set to INT_MAX, the condition is always false
because sk->sk_rmem_alloc is also a signed int and can never exceed
INT_MAX.
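
This can be sketched in userspace (not kernel code; signed overflow is
undefined in ISO C, so build with -fwrapv to mimic the kernel's
-fno-strict-overflow, and the truesize value is only an example):

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
          int rmem = 0, rcvbuf = INT_MAX;
          int truesize = 65536;              /* example skb->truesize */
          long i;

          for (i = 0; i < 40000; i++) {
                  if (rmem > rcvbuf)         /* never true: int <= INT_MAX */
                          break;
                  rmem += truesize;          /* eventually wraps */
          }
          printf("rmem = %d\n", rmem);       /* negative after wraparound */
          return 0;
  }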

Then, the size of the incoming skb is added to sk->sk_rmem_alloc
unconditionally.

This results in integer overflow (possibly multiple times) of
sk->sk_rmem_alloc and allows a single socket to accumulate skbs up to
the global limit, net.core.udp_mem[1].

For example, if we set udp_mem[1] to a large value, set sk->sk_rcvbuf
to INT_MAX, and flood the socket with packets, we can observe multiple
overflows:

  # cat /proc/net/sockstat | grep UDP:
  UDP: inuse 3 mem 7956736  <-- (7956736 << 12) bytes > INT_MAX * 15
                                             ^- PAGE_SHIFT
  # ss -uam
  State  Recv-Q      ...
  UNCONN -1757018048 ...    <-- flipping the sign repeatedly
         skmem:(r2537949248,rb2147483646,t0,tb212992,f1984,w0,o0,bl0,d0)
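
The arithmetic can be double-checked with a throwaway sketch (assuming
PAGE_SHIFT == 12, as on x86-64):

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
          long long pages = 7956736;         /* "mem" field of sockstat */
          long long bytes = pages << 12;     /* pages to bytes */

          /* Prints roughly 32.6 GB, about 15.2 x INT_MAX. */
          printf("%lld bytes = %.1f x INT_MAX\n",
                 bytes, (double)bytes / INT_MAX);
          return 0;
  }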

Previously, we had a boundary check for INT_MAX, which was removed by
commit 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc").

A complete fix would be to revert it and cap the right operand at
INT_MAX:

  rmem = atomic_add_return(size, &sk->sk_rmem_alloc);
  if (rmem > min(size + (unsigned int)sk->sk_rcvbuf, INT_MAX))
          goto uncharge_drop;

but we do not want to bring the expensive atomic_add_return() back
just for this corner case.

Casting rmem to unsigned int prevents multiple wraparounds, but it
still allows a single wraparound:

  # cat /proc/net/sockstat | grep UDP:
  UDP: inuse 3 mem 524288  <-- (INT_MAX + 1) >> 12

  # ss -uam
  State  Recv-Q      ...
  UNCONN -2147482816 ...   <-- INT_MAX + 831 bytes
         skmem:(r2147484480,rb2147483646,t0,tb212992,f3264,w0,o0,bl0,d14468947)
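
The reason a second wrap cannot happen: once the counter crosses
INT_MAX, its unsigned value exceeds any possible rcvbuf, so the drop
path finally triggers. A small sketch (the signed reinterpretation is
implementation-defined in ISO C but matches the ss output above on
common ABIs):

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int rcvbuf = INT_MAX;
          unsigned int rmem = 2147484480u;   /* the r value above */

          printf("signed: %d  unsigned: %u\n", (int)rmem, rmem);
          printf("dropped: %s\n", rmem > rcvbuf ? "yes" : "no");
          return 0;
  }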

So, let's define rmem and rcvbuf as unsigned int and check
skb->truesize only when rcvbuf is large enough, to reduce the
possibility of overflow.

Note that we still have a small chance of seeing an overflow if
multiple skbs destined for the same socket are processed on different
cores at the same time and each size does not exceed the limit but the
total size does.
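
That window can be replayed deterministically in userspace (all values
are arbitrary stand-ins; the sizes play the role of skb->truesize):

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int rcvbuf = INT_MAX;
          unsigned int rmem = INT_MAX - 60000;       /* near the cap */
          unsigned int size0 = 50000, size1 = 50000;

          /* Both CPUs read the counter before either adds to it: */
          unsigned int seen0 = rmem, seen1 = rmem;

          if (seen0 + size0 <= rcvbuf)               /* CPU0 passes */
                  rmem += size0;
          if (seen1 + size1 <= rcvbuf)               /* CPU1 passes, stale */
                  rmem += size1;

          printf("signed view: %d\n", (int)rmem);    /* one wraparound */
          return 0;
  }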

Note also that we must ignore skb->truesize for a small buffer as
explained in commit 363dc73acacb ("udp: be less conservative with
sock rmem accounting").

Fixes: 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
v2:
  * Define rmem and rcvbuf as unsigned int
  * Take skb->truesize into account for the large rcvbuf case
---
 net/ipv4/udp.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

Comments

Willem de Bruijn March 26, 2025, 2:09 p.m. UTC | #1
Kuniyuki Iwashima wrote:
> __udp_enqueue_schedule_skb() has the following condition:
> [...]

Reviewed-by: Willem de Bruijn <willemb@google.com>

Patch

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index a9bb9ce5438e..4499e1fe4d50 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1726,17 +1726,25 @@ static int udp_rmem_schedule(struct sock *sk, int size)
 int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct sk_buff_head *list = &sk->sk_receive_queue;
-	int rmem, err = -ENOMEM;
+	unsigned int rmem, rcvbuf;
 	spinlock_t *busy = NULL;
-	int size, rcvbuf;
+	int size, err = -ENOMEM;
 
-	/* Immediately drop when the receive queue is full.
-	 * Always allow at least one packet.
-	 */
 	rmem = atomic_read(&sk->sk_rmem_alloc);
 	rcvbuf = READ_ONCE(sk->sk_rcvbuf);
-	if (rmem > rcvbuf)
-		goto drop;
+	size = skb->truesize;
+
+	/* Immediately drop when the receive queue is full.
+	 * Cast to unsigned int performs the boundary check for INT_MAX.
+	 */
+	if (rmem + size > rcvbuf) {
+		if (rcvbuf > INT_MAX >> 1)
+			goto drop;
+
+		/* Always allow at least one packet for small buffer. */
+		if (rmem > rcvbuf)
+			goto drop;
+	}
 
 	/* Under mem pressure, it might be helpful to help udp_recvmsg()
 	 * having linear skbs :
@@ -1749,7 +1757,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 
 		busy = busylock_acquire(sk);
 	}
-	size = skb->truesize;
+
 	udp_set_dev_scratch(skb);
 
 	atomic_add(size, &sk->sk_rmem_alloc);