diff mbox series

[bpf,v3,01/12] bpf: sockmap, pass skb ownership through read_skb

Message ID 20230403200138.937569-2-john.fastabend@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf sockmap fixes | expand

Checks

Context Check Description
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-3 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-5 success Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-7 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-8 success Logs for set-matrix
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf, async
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 35 this patch: 37
netdev/cc_maintainers warning 6 maintainers not CCed: pabeni@redhat.com dsahern@kernel.org willemdebruijn.kernel@gmail.com kuba@kernel.org kuniyu@amazon.com davem@davemloft.net
netdev/build_clang fail Errors and warnings before: 18 this patch: 22
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn fail Errors and warnings before: 35 this patch: 37
netdev/checkpatch warning WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: 04919bed948d ("tcp: Introduce tcp_read_skb()")' WARNING: Reported-by: should be immediately followed by Link: with a URL to the report
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

John Fastabend April 3, 2023, 8:01 p.m. UTC
The read_skb hook calls consume_skb() now, but this means that if the
recv_actor program wants to use the skb it needs to inc the ref cnt
so that the consume_skb() doesn't kfree the sk_buff.

This is problematic because in some error cases under memory pressure
we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
Then we get this,

 skb_linearize()
   __pskb_pull_tail()
     pskb_expand_head()
       BUG_ON(skb_shared(skb))

Because we incremented users refcnt from sk_psock_verdict_recv() we
hit the bug on with refcnt > 1 and trip it.

To fix lets simply pass ownership of the sk_buff through the skb_read
call. Then we can drop the consume from read_skb handlers and assume
the verdict recv does any required kfree.

Bug found while testing in our CI which runs in VMs that hit memory
constraints rather regularly. William tested TCP read_skb handlers.

[  106.536188] ------------[ cut here ]------------
[  106.536197] kernel BUG at net/core/skbuff.c:1693!
[  106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[  106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[  106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[  106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[  106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[  106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[  106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[  106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[  106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[  106.540568] FS:  00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[  106.540954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[  106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  106.542255] Call Trace:
[  106.542383]  <IRQ>
[  106.542487]  __pskb_pull_tail+0x4b/0x3e0
[  106.542681]  skb_ensure_writable+0x85/0xa0
[  106.542882]  sk_skb_pull_data+0x18/0x20
[  106.543084]  bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[  106.543536]  ? migrate_disable+0x66/0x80
[  106.543871]  sk_psock_verdict_recv+0xe2/0x310
[  106.544258]  ? sk_psock_write_space+0x1f0/0x1f0
[  106.544561]  tcp_read_skb+0x7b/0x120
[  106.544740]  tcp_data_queue+0x904/0xee0
[  106.544931]  tcp_rcv_established+0x212/0x7c0
[  106.545142]  tcp_v4_do_rcv+0x174/0x2a0
[  106.545326]  tcp_v4_rcv+0xe70/0xf60
[  106.545500]  ip_protocol_deliver_rcu+0x48/0x290
[  106.545744]  ip_local_deliver_finish+0xa7/0x150

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay <will@isovalent.com>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/skmsg.c   | 2 --
 net/ipv4/tcp.c     | 1 -
 net/ipv4/udp.c     | 5 +----
 net/unix/af_unix.c | 5 +----
 4 files changed, 2 insertions(+), 11 deletions(-)

Comments

kernel test robot April 4, 2023, 2:10 a.m. UTC | #1
Hi John,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf/master]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Fastabend/bpf-sockmap-pass-skb-ownership-through-read_skb/20230404-040431
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
patch link:    https://lore.kernel.org/r/20230403200138.937569-2-john.fastabend%40gmail.com
patch subject: [PATCH bpf v3 01/12] bpf: sockmap, pass skb ownership through read_skb
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20230404/202304040949.mjn0pmKV-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/44ddc6a14f8903f3f97d347b5303f678a8e2c3ea
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review John-Fastabend/bpf-sockmap-pass-skb-ownership-through-read_skb/20230404-040431
        git checkout 44ddc6a14f8903f3f97d347b5303f678a8e2c3ea
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/ipv4/ net/unix/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304040949.mjn0pmKV-lkp@intel.com/

All warnings (new ones prefixed by >>):

   net/ipv4/udp.c: In function 'udp_read_skb':
>> net/ipv4/udp.c:1816:18: warning: unused variable 'copied' [-Wunused-variable]
    1816 |         int err, copied;
         |                  ^~~~~~
--
   net/unix/af_unix.c: In function 'unix_read_skb':
>> net/unix/af_unix.c:2556:18: warning: unused variable 'copied' [-Wunused-variable]
    2556 |         int err, copied;
         |                  ^~~~~~


vim +/copied +1816 net/ipv4/udp.c

2276f58ac5890e Paolo Abeni     2017-05-16  1812  
965b57b469a589 Cong Wang       2022-06-15  1813  int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
d7f571188ecf25 Cong Wang       2021-03-30  1814  {
d7f571188ecf25 Cong Wang       2021-03-30  1815  	struct sk_buff *skb;
31f1fbcb346c93 Peilin Ye       2022-09-22 @1816  	int err, copied;
d7f571188ecf25 Cong Wang       2021-03-30  1817  
31f1fbcb346c93 Peilin Ye       2022-09-22  1818  try_again:
ec095263a96572 Oliver Hartkopp 2022-04-11  1819  	skb = skb_recv_udp(sk, MSG_DONTWAIT, &err);
d7f571188ecf25 Cong Wang       2021-03-30  1820  	if (!skb)
d7f571188ecf25 Cong Wang       2021-03-30  1821  		return err;
099f896f498a2b Cong Wang       2021-11-14  1822  
099f896f498a2b Cong Wang       2021-11-14  1823  	if (udp_lib_checksum_complete(skb)) {
31f1fbcb346c93 Peilin Ye       2022-09-22  1824  		int is_udplite = IS_UDPLITE(sk);
31f1fbcb346c93 Peilin Ye       2022-09-22  1825  		struct net *net = sock_net(sk);
31f1fbcb346c93 Peilin Ye       2022-09-22  1826  
31f1fbcb346c93 Peilin Ye       2022-09-22  1827  		__UDP_INC_STATS(net, UDP_MIB_CSUMERRORS, is_udplite);
31f1fbcb346c93 Peilin Ye       2022-09-22  1828  		__UDP_INC_STATS(net, UDP_MIB_INERRORS, is_udplite);
099f896f498a2b Cong Wang       2021-11-14  1829  		atomic_inc(&sk->sk_drops);
099f896f498a2b Cong Wang       2021-11-14  1830  		kfree_skb(skb);
31f1fbcb346c93 Peilin Ye       2022-09-22  1831  		goto try_again;
099f896f498a2b Cong Wang       2021-11-14  1832  	}
099f896f498a2b Cong Wang       2021-11-14  1833  
db39dfdc1c3bd9 Peilin Ye       2022-09-20  1834  	WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
44ddc6a14f8903 John Fastabend  2023-04-03  1835  	return recv_actor(sk, skb);
d7f571188ecf25 Cong Wang       2021-03-30  1836  }
965b57b469a589 Cong Wang       2022-06-15  1837  EXPORT_SYMBOL(udp_read_skb);
d7f571188ecf25 Cong Wang       2021-03-30  1838
diff mbox series

Patch

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index f81883759d38..4a3dc8d27295 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1183,8 +1183,6 @@  static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 	int ret = __SK_DROP;
 	int len = skb->len;
 
-	skb_get(skb);
-
 	rcu_read_lock();
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 288693981b00..1be305e3d3c7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1770,7 +1770,6 @@  int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 		WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
 		tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
 		used = recv_actor(sk, skb);
-		consume_skb(skb);
 		if (used < 0) {
 			if (!copied)
 				copied = used;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c605d171eb2d..73cb561c3a4f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1832,10 +1832,7 @@  int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 	}
 
 	WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
-	copied = recv_actor(sk, skb);
-	kfree_skb(skb);
-
-	return copied;
+	return recv_actor(sk, skb);
 }
 EXPORT_SYMBOL(udp_read_skb);
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0b0f18ecce44..162ec061d438 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2561,10 +2561,7 @@  static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 	if (!skb)
 		return err;
 
-	copied = recv_actor(sk, skb);
-	kfree_skb(skb);
-
-	return copied;
+	return recv_actor(sk, skb);
 }
 
 /*