From patchwork Tue Mar 21 21:52:02 2023
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13183295
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf 01/11] bpf: sockmap, pass skb ownership through read_skb
Date: Tue, 21 Mar 2023 14:52:02 -0700
Message-Id: <20230321215212.525630-2-john.fastabend@gmail.com>
In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com>
References: <20230321215212.525630-1-john.fastabend@gmail.com>
List-ID: X-Mailing-List: bpf@vger.kernel.org

The read_skb hook currently calls consume_skb(), which means that if the recv_actor program wants to keep using the skb it must take an extra reference so that consume_skb() does not free the sk_buff. This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this call chain,

  skb_linearize()
    __pskb_pull_tail()
      pskb_expand_head()
        BUG_ON(skb_shared(skb))

Because we incremented the users refcnt from sk_psock_verdict_recv(), the skb is shared (refcnt > 1) and we trip the BUG_ON.

To fix, simply pass ownership of the sk_buff through the read_skb call. Then we can drop the consume_skb() from the read_skb handlers and assume the verdict recv path does any required kfree.

Bug found while testing in our CI, which runs in VMs that hit memory constraints rather regularly. William tested the TCP read_skb handlers.

[ 106.536188] ------------[ cut here ]------------
[ 106.536197] kernel BUG at net/core/skbuff.c:1693!
[ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.542255] Call Trace:
[ 106.542383]
[ 106.542487] __pskb_pull_tail+0x4b/0x3e0
[ 106.542681] skb_ensure_writable+0x85/0xa0
[ 106.542882] sk_skb_pull_data+0x18/0x20
[ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[ 106.543536] ? migrate_disable+0x66/0x80
[ 106.543871] sk_psock_verdict_recv+0xe2/0x310
[ 106.544258] ? sk_psock_write_space+0x1f0/0x1f0
[ 106.544561] tcp_read_skb+0x7b/0x120
[ 106.544740] tcp_data_queue+0x904/0xee0
[ 106.544931] tcp_rcv_established+0x212/0x7c0
[ 106.545142] tcp_v4_do_rcv+0x174/0x2a0
[ 106.545326] tcp_v4_rcv+0xe70/0xf60
[ 106.545500] ip_protocol_deliver_rcu+0x48/0x290
[ 106.545744] ip_local_deliver_finish+0xa7/0x150

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay
Tested-by: William Findlay
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 2 -- net/ipv4/tcp.c | 1 - net/ipv4/udp.c | 5 +---- net/unix/af_unix.c | 5 +---- 4 files changed, 2 insertions(+), 11 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 53d0251788aa..2b6d9519ff29 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1180,8 +1180,6 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) int ret = __SK_DROP; int len = skb->len; - skb_get(skb); - rcu_read_lock(); psock = sk_psock(sk); if (unlikely(!psock)) { diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 33f559f491c8..6572962b0237 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1770,7 +1770,6 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk)); tcp_flags = TCP_SKB_CB(skb)->tcp_flags; used = recv_actor(sk, skb); - consume_skb(skb); if (used < 0) { if (!copied) copied = used; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 9592fe3e444a..04e8c6385246 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1832,10 +1832,7 @@ int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) } WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk)); - copied = recv_actor(sk, skb); - kfree_skb(skb); - - return copied; + return recv_actor(sk, skb); } EXPORT_SYMBOL(udp_read_skb); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f0c2293f1d3b..a5dd2ee0cfed 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2554,10 +2554,7 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor) if (!skb) return err; - copied = recv_actor(sk, skb); - kfree_skb(skb); - - return copied; + return recv_actor(sk, skb); } /*
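A minimal sketch of the ownership contract this patch moves to (illustration only, not part of the series; the example_* helpers are hypothetical): with consume_skb() dropped from the read_skb() callers above, the recv_actor now owns the skb on every path and must free or hand it on exactly once.

  /* Hypothetical recv_actor written against the new read_skb() contract. */
  static int example_recv_actor(struct sock *sk, struct sk_buff *skb)
  {
  	int len = skb->len;

  	if (!example_can_handle(sk)) {	/* hypothetical readiness check */
  		kfree_skb(skb);		/* we own the skb, so we must drop it */
  		return 0;
  	}

  	example_enqueue(sk, skb);	/* hypothetical: ownership passes onward */
  	return len;			/* the read_skb() caller no longer calls consume_skb() */
  }
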
From patchwork Tue Mar 21 21:52:03 2023
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13183296
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf 02/11] bpf: sockmap, convert schedule_work into delayed_work
Date: Tue, 21 Mar 2023 14:52:03 -0700
Message-Id: <20230321215212.525630-3-john.fastabend@gmail.com>
In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com>
References: <20230321215212.525630-1-john.fastabend@gmail.com>
List-ID: X-Mailing-List: bpf@vger.kernel.org

Sk_buffs are fed into sockmap verdict programs either from a strparser (when the user might want to decide how framing of the skb is done by attaching another parser program) or directly through tcp_read_sock. The tcp_read_sock path is the preferred method for performance when the BPF logic is itself the stream parser.

The flow for Cilium's common use case with a stream parser is,

  tcp_read_sock()
    sk_psock_verdict_recv
      ret = bpf_prog_run_pin_on_cpu()
      sk_psock_verdict_apply(sock, skb, ret)
        // if system is under memory pressure or app is slow we may
        // need to queue skb. Do this queuing through ingress_skb and
        // then kick timer to wake up handler
        skb_queue_tail(ingress_skb, skb)
        schedule_work(work);

The work queue is wired up to sk_psock_backlog(). This will then walk the ingress_skb list that holds the sk_buffs that could not be handled immediately, but should be OK to run at some later point. However, it is possible that the workqueue doing this work still hits an error when sending the skb. When this happens the skbuff is requeued on a temporary 'state' struct kept with the workqueue.
This is necessary because it is possible to partially send an skbuff before hitting an error, and we need to know how and where to restart when the workqueue runs next.

The trouble is that we do not re-kick the workqueue. This can cause a stall where the skbuff we just cached in the state variable might never be sent. This happens when it is the last packet in a flow and no further packets come along that would kick the workqueue from that side.

To fix this we could do a simple schedule_work(), but while under memory pressure it makes sense to back off somewhat instead of retrying repeatedly. So instead convert schedule_work() to schedule_delayed_work() and add backoff logic to reschedule from the backlog queue on errors. It is not obvious what a good backoff value is, so use '1'.

While testing we observed some flakes when running the NGINX compliance test with sockmap; we attributed these failed tests to this bug and the subsequent issues fixed later in this series.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Tested-by: William Findlay
Signed-off-by: John Fastabend
---
 include/linux/skmsg.h | 2 +- net/core/skmsg.c | 19 ++++++++++++------- net/core/sock_map.c | 3 ++- 3 files changed, 15 insertions(+), 9 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 84f787416a54..904ff9a32ad6 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -105,7 +105,7 @@ struct sk_psock { struct proto *sk_proto; struct mutex work_mutex; struct sk_psock_work_state work_state; - struct work_struct work; + struct delayed_work work; struct rcu_work rwork; }; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 2b6d9519ff29..96a6a3a74a67 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -481,7 +481,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, } out: if (psock->work_state.skb && copied > 0) - schedule_work(&psock->work); + schedule_delayed_work(&psock->work, 0); return copied; } EXPORT_SYMBOL_GPL(sk_msg_recvmsg); @@ -639,7 +639,8 @@ static void sk_psock_skb_state(struct sk_psock *psock, static void sk_psock_backlog(struct work_struct *work) { - struct sk_psock *psock = container_of(work, struct sk_psock, work); + struct delayed_work *dwork = to_delayed_work(work); + struct sk_psock *psock = container_of(dwork, struct sk_psock, work); struct sk_psock_work_state *state = &psock->work_state; struct sk_buff *skb = NULL; bool ingress; @@ -679,6 +680,10 @@ static void sk_psock_backlog(struct work_struct *work) if (ret == -EAGAIN) { sk_psock_skb_state(psock, state, skb, len, off); + + // Delay slightly to prioritize any + // other work that might be here. + schedule_delayed_work(&psock->work, 1); goto end; } /* Hard errors break pipe and stop xmit. 
*/ @@ -733,7 +738,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) INIT_LIST_HEAD(&psock->link); spin_lock_init(&psock->link_lock); - INIT_WORK(&psock->work, sk_psock_backlog); + INIT_DELAYED_WORK(&psock->work, sk_psock_backlog); mutex_init(&psock->work_mutex); INIT_LIST_HEAD(&psock->ingress_msg); spin_lock_init(&psock->ingress_lock); @@ -822,7 +827,7 @@ static void sk_psock_destroy(struct work_struct *work) sk_psock_done_strp(psock); - cancel_work_sync(&psock->work); + cancel_delayed_work_sync(&psock->work); mutex_destroy(&psock->work_mutex); psock_progs_drop(&psock->progs); @@ -937,7 +942,7 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb) } skb_queue_tail(&psock_other->ingress_skb, skb); - schedule_work(&psock_other->work); + schedule_delayed_work(&psock_other->work, 0); spin_unlock_bh(&psock_other->ingress_lock); return 0; } @@ -1017,7 +1022,7 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, spin_lock_bh(&psock->ingress_lock); if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { skb_queue_tail(&psock->ingress_skb, skb); - schedule_work(&psock->work); + schedule_delayed_work(&psock->work, 0); err = 0; } spin_unlock_bh(&psock->ingress_lock); @@ -1048,7 +1053,7 @@ static void sk_psock_write_space(struct sock *sk) psock = sk_psock(sk); if (likely(psock)) { if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) - schedule_work(&psock->work); + schedule_delayed_work(&psock->work, 0); write_space = psock->saved_write_space; } rcu_read_unlock(); diff --git a/net/core/sock_map.c b/net/core/sock_map.c index a68a7290a3b2..d38267201892 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1624,9 +1624,10 @@ void sock_map_close(struct sock *sk, long timeout) rcu_read_unlock(); sk_psock_stop(psock); release_sock(sk); - cancel_work_sync(&psock->work); + cancel_delayed_work_sync(&psock->work); sk_psock_put(sk, psock); } + /* Make sure we do not recurse. This is a bug. * Leak the socket instead of crashing on a stack overflow. 
*/

From patchwork Tue Mar 21 21:52:04 2023
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13183297
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf 03/11] bpf: sockmap, improved check for empty queue
Date: Tue, 21 Mar 2023 14:52:04 -0700
Message-Id: <20230321215212.525630-4-john.fastabend@gmail.com>
In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com>
References: <20230321215212.525630-1-john.fastabend@gmail.com>
List-ID: X-Mailing-List: bpf@vger.kernel.org

We noticed some rare sk_buffs were stepping past the queue when the system was under memory pressure. The general approach is to skip enqueueing sk_buffs when it is not necessary, which is the normal case for a system that is properly provisioned for the task: no memory pressure and enough CPU assigned. But, if we hit an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue, we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty. However, there is a problem with simply checking the queue length. When an sk_buff is being processed from the ingress queue but is not yet on the sockmap msg receive queue, it is possible to also receive an sk_buff through the normal path. That path will see the ingress queue as empty and skip ahead of the packet still being processed. Previously we used the sock lock from both contexts, which made the problem harder to hit, but not impossible.

To fix this, also check the 'state' variable where we cache a partially processed sk_buff. This catches the majority of cases. But we also need to take the mutex lock around this check, because we cannot have both code paths running and still check sensibly. We could perhaps do this with atomic bit checks, but we are already here due to memory pressure, so slowing things down a bit seems OK and it is simpler to just grab a lock. (See the condensed sketch after this patch's diff.)

To reproduce the issue we run the NGINX compliance test with sockmap and observe some flakes in our testing that we attributed to this issue.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Tested-by: William Findlay
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 96a6a3a74a67..34de0605694e 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -985,6 +985,7 @@ EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read); static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, int verdict) { + struct sk_psock_work_state *state; struct sock *sk_other; int err = 0; u32 len, off; @@ -1001,13 +1002,28 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, skb_bpf_set_ingress(skb); + /* We need to grab mutex here because in-flight skb is in one of + * the following states: either on ingress_skb, in psock->state + * or being processed by backlog and neither in state->skb and + * ingress_skb may be also empty. The troublesome case is when + * the skb has been dequeued from ingress_skb list or taken from + * state->skb because we can not easily test this case. Maybe we + * could be clever with flags and resolve this but being clever + * got us here in the first place and we note this is done under + * sock lock and backlog conditions mean we are already running + * into ENOMEM or other performance hindering cases so lets do + * the obvious thing and grab the mutex. + */ + mutex_lock(&psock->work_mutex); + state = &psock->work_state; + /* If the queue is empty then we can submit directly * into the msg queue. If its not empty we have to * queue work otherwise we may get OOO data. 
Otherwise, * if sk_psock_skb_ingress errors will be handled by * retrying later from workqueue. */ - if (skb_queue_empty(&psock->ingress_skb)) { + if (skb_queue_empty(&psock->ingress_skb) && likely(!state->skb)) { len = skb->len; off = 0; if (skb_bpf_strparser(skb)) { @@ -1028,9 +1044,11 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, spin_unlock_bh(&psock->ingress_lock); if (err < 0) { skb_bpf_redirect_clear(skb); + mutex_unlock(&psock->work_mutex); goto out_free; } } + mutex_unlock(&psock->work_mutex); break; case __SK_REDIRECT: err = sk_psock_skb_redirect(psock, skb);
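Condensed, the invariant this patch enforces can be read as the predicate below (a sketch only, not code from the series; the helper name is hypothetical): data may bypass the backlog only when both the ingress_skb queue and the skb cached in work_state are empty, and the check plus the enqueue that acts on it must both happen while holding work_mutex so the backlog worker cannot race in between.

  /* Hypothetical helper restating the bypass condition. The caller must
   * hold psock->work_mutex across this check and the enqueue it guards.
   */
  static bool sk_psock_queue_really_empty(struct sk_psock *psock)
  {
  	lockdep_assert_held(&psock->work_mutex);
  	return skb_queue_empty(&psock->ingress_skb) &&
  	       !psock->work_state.skb;
  }
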
From patchwork Tue Mar 21 21:52:05 2023
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13183298
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf 04/11] bpf: sockmap, handle fin correctly
Date: Tue, 21 Mar 2023 14:52:05 -0700
Message-Id: <20230321215212.525630-5-john.fastabend@gmail.com>
In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com>
References: <20230321215212.525630-1-john.fastabend@gmail.com>
List-ID: X-Mailing-List: bpf@vger.kernel.org

The sockmap code is returning EAGAIN after a FIN packet is received and no more data is on the receive queue. Correct behavior is to return 0 to the user, so the user can then close the socket. The EAGAIN causes many apps to retry, which masks the problem. Eventually the socket is evicted from the sockmap because it is released from the sockmap sock free handling. The issue creates a delay and can cause some errors on the application side.

To fix this, check on the sk_msg_recvmsg side if the length is zero and the FIN flag is set, and in that case return zero. A selftest will be added to check this condition.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Tested-by: William Findlay
Signed-off-by: John Fastabend
---
 net/ipv4/tcp_bpf.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index cf26d65ca389..3a0f43f3afd8 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -174,6 +174,24 @@ static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock, return ret; } +static bool is_next_msg_fin(struct sk_psock *psock) +{ + struct scatterlist *sge; + struct sk_msg *msg_rx; + int i; + + msg_rx = sk_psock_peek_msg(psock); + i = msg_rx->sg.start; + sge = sk_msg_elem(msg_rx, i); + if (!sge->length) { + struct sk_buff *skb = msg_rx->skb; + + if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) + return true; + } + return false; +} + static int tcp_bpf_recvmsg_parser(struct sock *sk, struct msghdr *msg, size_t len, @@ -193,6 +211,19 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, lock_sock(sk); msg_bytes_ready: copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + /* The typical case for EFAULT is the socket was gracefully + * shutdown with a FIN pkt. So check here the other case is + * some error on copy_page_to_iter which would be unexpected. + * On fin return correct return code to zero. 
+ */ + if (copied == -EFAULT) { + bool is_fin = is_next_msg_fin(psock); + + if (is_fin) { + copied = 0; + goto out; + } + } if (!copied) { long timeo; int data;

From patchwork Tue Mar 21 21:52:06 2023
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13183299
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf 05/11] bpf:
sockmap, TCP data stall on recv before accept Date: Tue, 21 Mar 2023 14:52:06 -0700 Message-Id: <20230321215212.525630-6-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com> References: <20230321215212.525630-1-john.fastabend@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net A common mechanism to put a TCP socket into the sockmap is to hook the BPF_SOCK_OPS_{ACTIVE_PASSIVE}_ESTABLISHED_CB event with a BPF program that can map the socket info to the correct BPF verdict parser. When the user adds the socket to the map the psock is created and the new ops are assigned to ensure the verdict program will 'see' the sk_buffs as they arrive. Part of this process hooks the sk_data_ready op with a BPF specific handler to wake up the BPF verdict program when data is ready to read. The logic is simple enough (posted here for easy reading) static void sk_psock_verdict_data_ready(struct sock *sk) { struct socket *sock = sk->sk_socket; if (unlikely(!sock || !sock->ops || !sock->ops->read_skb)) return; sock->ops->read_skb(sk, sk_psock_verdict_recv); } The oversight here is sk->sk_socket is not assigned until the application accepts() the new socket. However, its entirely ok for the peer application to do a connect() followed immediately by sends. The socket on the receiver is sitting on the backlog queue of the listening socket until its accepted and the data is queued up. If the peer never accepts the socket or is slow it will eventually hit data limits and rate limit the session. But, important for BPF sockmap hooks when this data is received TCP stack does the sk_data_ready() call but the read_skb() for this data is never called because sk_socket is missing. The data sits on the sk_receive_queue. Then once the socket is accepted if we never receive more data from the peer there will be no further sk_data_ready calls and all the data is still on the sk_receive_queue(). Then user calls recvmsg after accept() and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler. The handler checks for data in the sk_msg ingress queue expecting that the BPF program has already run from the sk_data_ready hook and enqueued the data as needed. So we are stuck. To fix do an unlikely check in recvmsg handler for data on the sk_receive_queue and if it exists wake up data_ready. We have the sock locked in both read_skb and recvmsg so should avoid having multiple runners. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend --- net/ipv4/tcp_bpf.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 3a0f43f3afd8..b1ba58be0c5a 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -209,6 +209,26 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, return tcp_recvmsg(sk, msg, len, flags, addr_len); lock_sock(sk); + + /* We may have received data on the sk_receive_queue pre-accept and + * then we can not use read_skb in this context because we haven't + * assigned a sk_socket yet so have no link to the ops. The work-around + * is to check the sk_receive_queue and in these cases read skbs off + * queue again. The read_skb hook is not running at this point because + * of lock_sock so we avoid having multiple runners in read_skb. 
+ */ + if (unlikely(!skb_queue_empty_lockless(&sk->sk_receive_queue))) { + tcp_data_ready(sk); + /* This handles the ENOMEM errors if we both receive data + * pre accept and are already under memory pressure. At least + * let user know to retry. + */ + if (unlikely(!skb_queue_empty_lockless(&sk->sk_receive_queue))) { + copied = -EAGAIN; + goto out; + } + } + msg_bytes_ready: copied = sk_msg_recvmsg(sk, psock, msg, len, flags); /* The typical case for EFAULT is the socket was gracefully
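For reference, the user-space pattern the commit message describes looks roughly like the sketch below (illustration only, not part of the patch; assumes the usual <sys/socket.h>, <netinet/in.h>, <unistd.h> includes and a server that has not yet called accept()). The client connects and writes immediately, so the data lands on the receive queue of a socket that is still sitting on the listener's backlog; the fix makes the sockmap recvmsg handler notice that queue after accept() instead of stalling.

  /* Hypothetical client helper: send before the peer accept()s. */
  static void send_before_accept(const struct sockaddr_in *srv)
  {
  	int fd = socket(AF_INET, SOCK_STREAM, 0);

  	connect(fd, (const struct sockaddr *)srv, sizeof(*srv));
  	send(fd, "early data", 10, 0);	/* queued on sk_receive_queue pre-accept */
  	close(fd);
  }
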
From patchwork Tue Mar 21 21:52:07 2023
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13183300
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf 06/11] bpf: sockmap, wake up polling after data copy
Date: Tue, 21 Mar 2023 14:52:07 -0700
Message-Id: <20230321215212.525630-7-john.fastabend@gmail.com>
In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com>
References: <20230321215212.525630-1-john.fastabend@gmail.com>
List-ID: X-Mailing-List: bpf@vger.kernel.org

When the TCP stack has data ready to read, sk_data_ready() is called. Sockmap overwrites this with its own handler to call into the BPF verdict program. But the original TCP socket had sock_def_readable(), which would additionally wake up any user space waiters with sk_wake_async(). Sockmap saved that callback when the socket was created, so call the saved data ready callback and then we can wake up any epoll() logic waiting on the read.

Note we call it on 'copied >= 0' to account for returning 0 when a FIN is received, because we need to wake up the user for this as well so they can do the recvmsg() -> 0 and detect the shutdown.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 34de0605694e..10e5481da662 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1230,10 +1230,19 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) static void sk_psock_verdict_data_ready(struct sock *sk) { struct socket *sock = sk->sk_socket; + int copied; if (unlikely(!sock || !sock->ops || !sock->ops->read_skb)) return; - sock->ops->read_skb(sk, sk_psock_verdict_recv); + copied = sock->ops->read_skb(sk, sk_psock_verdict_recv); + if (copied >= 0) { + struct sk_psock *psock; + + rcu_read_lock(); + psock = sk_psock(sk); + psock->saved_data_ready(sk); + rcu_read_unlock(); + } } void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
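On the application side, the effect of this wake-up can be pictured with the fragment below (a sketch, not part of the patch; 'epfd' is assumed to be an epoll instance already watching the sockmap socket). Because saved_data_ready() is now invoked even when read_skb() copied zero bytes, a task sleeping in epoll_wait() is woken for a bare FIN and can observe recv() == 0 instead of hanging.

  struct epoll_event ev;
  char buf[512];

  if (epoll_wait(epfd, &ev, 1, -1) == 1) {
  	ssize_t n = recv(ev.data.fd, buf, sizeof(buf), 0);
  	if (n == 0)		/* FIN: orderly shutdown, not EAGAIN */
  		close(ev.data.fd);
  }
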
From patchwork Tue Mar 21 21:52:08 2023
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13183301
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf 07/11] bpf: sockmap incorrectly handling copied_seq
Date: Tue, 21 Mar 2023 14:52:08 -0700
Message-Id: <20230321215212.525630-8-john.fastabend@gmail.com>
In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com>
References: <20230321215212.525630-1-john.fastabend@gmail.com>
List-ID: X-Mailing-List: bpf@vger.kernel.org

The read_skb() logic is incrementing tcp->copied_seq, which is used, among other things, to calculate how many outstanding bytes can be read by the application. This results in application errors: if the application does an ioctl(FIONREAD) we return zero, because that value is calculated from copied_seq.

To fix this, move the tcp->copied_seq accounting into the recv handler so that we update it when the recvmsg() hook is called and data is in fact copied into user buffers. This gives an accurate FIONREAD value as expected and improves ACK handling. Before, we were calling tcp_rcv_space_adjust(), which updates the 'number of bytes copied to user in last RTT'; that is wrong for programs returning SK_PASS, because the bytes are only copied to the user when recvmsg is handled.

Doing the fix for recvmsg is straightforward, but fixing redirect and SK_DROP pkts is a bit trickier. Build a tcp_eat_skb() helper and then call it from the skmsg handlers.
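For reference, the dependence on copied_seq described above is essentially the tcp_inq()-style calculation sketched below (illustration only, not code from this patch): if read_skb() advances copied_seq while no data has reached the user, this difference collapses to zero and ioctl(FIONREAD) under-reports.

  /* Bytes the application can still read, as reported to FIONREAD. */
  static u32 bytes_readable(const struct tcp_sock *tp)
  {
  	return tp->rcv_nxt - tp->copied_seq;
  }
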
This fixes another issue where a broken socket with a BPF program doing a resubmit could hang the receiver. This happened because although read_skb() consumed the skb through sock_drop() it did not update the copied_seq. Now if a single reccv socket is redirecting to many sockets (for example for lb) the receiver sk will be hung even though we might expect it to continue. The hang comes from not updating the copied_seq numbers and memory pressure resulting from that. We have a slight layer problem of calling tcp_eat_skb even if its not a TCP socket. To fix we could refactor and create per type receiver handlers. I decided this is more work than we want in the fix and we already have some small tweaks depending on caller that use the helper skb_bpf_strparser(). So we extend that a bit and always set the strparser bit when it is in use and then we can gate the seq_copied updates on this. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend --- include/net/tcp.h | 3 +++ net/core/skmsg.c | 7 +++++-- net/ipv4/tcp.c | 10 +--------- net/ipv4/tcp_bpf.c | 28 +++++++++++++++++++++++++++- 4 files changed, 36 insertions(+), 12 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index db9f828e9d1e..674044b8bdaf 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1467,6 +1467,8 @@ static inline void tcp_adjust_rcv_ssthresh(struct sock *sk) } void tcp_cleanup_rbuf(struct sock *sk, int copied); +void __tcp_cleanup_rbuf(struct sock *sk, int copied); + /* We provision sk_rcvbuf around 200% of sk_rcvlowat. * If 87.5 % (7/8) of the space has been consumed, we want to override @@ -2321,6 +2323,7 @@ struct sk_psock; struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); +void tcp_eat_skb(struct sock *sk, struct sk_buff *skb); #endif /* CONFIG_BPF_SYSCALL */ int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress, diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 10e5481da662..b141b422697c 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1051,11 +1051,14 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, mutex_unlock(&psock->work_mutex); break; case __SK_REDIRECT: + tcp_eat_skb(psock->sk, skb); err = sk_psock_skb_redirect(psock, skb); break; case __SK_DROP: default: out_free: + tcp_eat_skb(psock->sk, skb); + skb_bpf_redirect_clear(skb); sock_drop(psock->sk, skb); } @@ -1100,8 +1103,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb) skb_dst_drop(skb); skb_bpf_redirect_clear(skb); ret = bpf_prog_run_pin_on_cpu(prog, skb); - if (ret == SK_PASS) - skb_bpf_set_strparser(skb); + skb_bpf_set_strparser(skb); ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb)); skb->sk = NULL; } @@ -1207,6 +1209,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) psock = sk_psock(sk); if (unlikely(!psock)) { len = 0; + tcp_eat_skb(sk, skb); sock_drop(sk, skb); goto out; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 6572962b0237..e2594d8e3429 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1568,7 +1568,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len) * calculation of whether or not we must ACK for the sake of * a window update. 
*/ -static void __tcp_cleanup_rbuf(struct sock *sk, int copied) +void __tcp_cleanup_rbuf(struct sock *sk, int copied) { struct tcp_sock *tp = tcp_sk(sk); bool time_to_ack = false; @@ -1783,14 +1783,6 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) break; } } - WRITE_ONCE(tp->copied_seq, seq); - - tcp_rcv_space_adjust(sk); - - /* Clean up data we have read: This will do ACK frames. */ - if (copied > 0) - __tcp_cleanup_rbuf(sk, copied); - return copied; } EXPORT_SYMBOL(tcp_read_skb); diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index b1ba58be0c5a..c0e5680dccc0 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -11,6 +11,24 @@ #include #include +void tcp_eat_skb(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_sock *tcp; + int copied; + + if (!skb || !skb->len || !sk_is_tcp(sk)) + return; + + if (skb_bpf_strparser(skb)) + return; + + tcp = tcp_sk(sk); + copied = tcp->copied_seq + skb->len; + WRITE_ONCE(tcp->copied_seq, skb->len); + tcp_rcv_space_adjust(sk); + __tcp_cleanup_rbuf(sk, skb->len); +} + static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg, u32 apply_bytes, int flags) { @@ -198,8 +216,10 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, int flags, int *addr_len) { + struct tcp_sock *tcp = tcp_sk(sk); + u32 seq = tcp->copied_seq; struct sk_psock *psock; - int copied; + int copied = 0; if (unlikely(flags & MSG_ERRQUEUE)) return inet_recv_error(sk, msg, len, addr_len); @@ -241,9 +261,11 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, if (is_fin) { copied = 0; + seq++; goto out; } } + seq += copied; if (!copied) { long timeo; int data; @@ -281,6 +303,10 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, copied = -EAGAIN; } out: + WRITE_ONCE(tcp->copied_seq, seq); + tcp_rcv_space_adjust(sk); + if (copied > 0) + __tcp_cleanup_rbuf(sk, copied); release_sock(sk); sk_psock_put(sk, psock); return copied; From patchwork Tue Mar 21 21:52:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13183302 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58319C6FD20 for ; Tue, 21 Mar 2023 21:52:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229847AbjCUVw4 (ORCPT ); Tue, 21 Mar 2023 17:52:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230000AbjCUVwk (ORCPT ); Tue, 21 Mar 2023 17:52:40 -0400 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C4BB4AFF0; Tue, 21 Mar 2023 14:52:33 -0700 (PDT) Received: by mail-pj1-x102b.google.com with SMTP id fy10-20020a17090b020a00b0023b4bcf0727so17334956pjb.0; Tue, 21 Mar 2023 14:52:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679435552; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=a6bFJnafED6Q9SZ4mvXold+p94/0Uyx0b8mGj4R6+no=; b=cGabqS+qbZGH+f7IqHAUAbMcDGZ1KZh7LuuQ6xYf+/5Y4yuLl7ciWcbGGVzq8tk8a0 
cqU1FRD7aeoVOKT079F1HVTVMfrEMEPoxPCX4k1G04OeS0DQJ5v+kWjN6eEYRCuZR6Pn imHepfc8uURuRXDZDVmrJ6/dyzgApdsV7Py+7tqqTw9X1az+4eMU6LdUwf08c3BFVfH1 zkJ6d36APVbZVl8b7soQZxP7SeC0dlX8T5kJvRBB0bLoYJier3KBYpodRliGx4hOcEfc aCLYlvqW0xPdiberHy1Hmysp0WQPa3Q3a0dtDMLUMBlSjr01hp7UKwJzhoef10nVzMlP iLHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679435552; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=a6bFJnafED6Q9SZ4mvXold+p94/0Uyx0b8mGj4R6+no=; b=QxBWiMun5t57aEUAFTV5GJT7kHPJpk1+9GLkdfZCJJZltwrwEkP/xEgAP/d8NMMRxe 7oS2BoDlM73JLsXA0YvP0K6AuutUyjssmp5kwErk76CGDUYl7K0h5feWTMtWFD6CZA0L qPflmaLx3eXKhM10zXTusVsEX+oq4nHFjG2r5LkgoMPqiFIBwlOz8O9vGLqHmLW75lH0 W7xrxLS8vLPxuRIJA+L3lxiHdfLcLNIDwVTESmBl/Kn8pa+SEbogFY0T8NXo16qbVBIs rxfBENsgwyMBwI9AaTbk/CKAI6YK7TWsc69VHMBHQbe3OLXETv1jzbZRfMRBf8TfpeGZ OkUw== X-Gm-Message-State: AO0yUKXIpkvmJnHwWkvPc0bXPVvffnXsCyzAYYbm+k3pnFmpIiW4cYbg rgZsV1lqDi/QmB3286JuqHE= X-Google-Smtp-Source: AK7set9oro4/CVE6yt0zqiLBmlzZrsPoryMrOXODycjOXKFLRBV8dBoM7VSYzWg8ABVGtf0L0wz36g== X-Received: by 2002:a05:6a20:4c29:b0:d7:4339:fac6 with SMTP id fm41-20020a056a204c2900b000d74339fac6mr2912784pzb.5.1679435552377; Tue, 21 Mar 2023 14:52:32 -0700 (PDT) Received: from john.lan ([98.97.36.54]) by smtp.gmail.com with ESMTPSA id m3-20020a63fd43000000b004facdf070d6sm8661331pgj.39.2023.03.21.14.52.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Mar 2023 14:52:31 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf 08/11] bpf: sockmap, pull socket helpers out of listen test for general use Date: Tue, 21 Mar 2023 14:52:09 -0700 Message-Id: <20230321215212.525630-9-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com> References: <20230321215212.525630-1-john.fastabend@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net No functional change here we merely pull the helpers in sockmap_listen.c into a header file so we can use these in other programs. The tests we are about to add aren't really _listen tests so doesn't make sense to add them here. Signed-off-by: John Fastabend --- .../bpf/prog_tests/sockmap_helpers.h | 249 ++++++++++++++++++ .../selftests/bpf/prog_tests/sockmap_listen.c | 245 +---------------- 2 files changed, 250 insertions(+), 244 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h new file mode 100644 index 000000000000..bff56844e745 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h @@ -0,0 +1,249 @@ +#ifndef __SOCKAMP_HELPERS__ +#define __SOCKMAP_HELPERS__ + +#define IO_TIMEOUT_SEC 30 +#define MAX_STRERR_LEN 256 +#define MAX_TEST_NAME 80 + +#define __always_unused __attribute__((__unused__)) + +#define _FAIL(errnum, fmt...) \ + ({ \ + error_at_line(0, (errnum), __func__, __LINE__, fmt); \ + CHECK_FAIL(true); \ + }) +#define FAIL(fmt...) _FAIL(0, fmt) +#define FAIL_ERRNO(fmt...) 
_FAIL(errno, fmt) +#define FAIL_LIBBPF(err, msg) \ + ({ \ + char __buf[MAX_STRERR_LEN]; \ + libbpf_strerror((err), __buf, sizeof(__buf)); \ + FAIL("%s: %s", (msg), __buf); \ + }) + +/* Wrappers that fail the test on error and report it. */ + +#define xaccept_nonblock(fd, addr, len) \ + ({ \ + int __ret = \ + accept_timeout((fd), (addr), (len), IO_TIMEOUT_SEC); \ + if (__ret == -1) \ + FAIL_ERRNO("accept"); \ + __ret; \ + }) + +#define xbind(fd, addr, len) \ + ({ \ + int __ret = bind((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("bind"); \ + __ret; \ + }) + +#define xclose(fd) \ + ({ \ + int __ret = close((fd)); \ + if (__ret == -1) \ + FAIL_ERRNO("close"); \ + __ret; \ + }) + +#define xconnect(fd, addr, len) \ + ({ \ + int __ret = connect((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("connect"); \ + __ret; \ + }) + +#define xgetsockname(fd, addr, len) \ + ({ \ + int __ret = getsockname((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("getsockname"); \ + __ret; \ + }) + +#define xgetsockopt(fd, level, name, val, len) \ + ({ \ + int __ret = getsockopt((fd), (level), (name), (val), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("getsockopt(" #name ")"); \ + __ret; \ + }) + +#define xlisten(fd, backlog) \ + ({ \ + int __ret = listen((fd), (backlog)); \ + if (__ret == -1) \ + FAIL_ERRNO("listen"); \ + __ret; \ + }) + +#define xsetsockopt(fd, level, name, val, len) \ + ({ \ + int __ret = setsockopt((fd), (level), (name), (val), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("setsockopt(" #name ")"); \ + __ret; \ + }) + +#define xsend(fd, buf, len, flags) \ + ({ \ + ssize_t __ret = send((fd), (buf), (len), (flags)); \ + if (__ret == -1) \ + FAIL_ERRNO("send"); \ + __ret; \ + }) + +#define xrecv_nonblock(fd, buf, len, flags) \ + ({ \ + ssize_t __ret = recv_timeout((fd), (buf), (len), (flags), \ + IO_TIMEOUT_SEC); \ + if (__ret == -1) \ + FAIL_ERRNO("recv"); \ + __ret; \ + }) + +#define xsocket(family, sotype, flags) \ + ({ \ + int __ret = socket(family, sotype, flags); \ + if (__ret == -1) \ + FAIL_ERRNO("socket"); \ + __ret; \ + }) + +#define xbpf_map_delete_elem(fd, key) \ + ({ \ + int __ret = bpf_map_delete_elem((fd), (key)); \ + if (__ret < 0) \ + FAIL_ERRNO("map_delete"); \ + __ret; \ + }) + +#define xbpf_map_lookup_elem(fd, key, val) \ + ({ \ + int __ret = bpf_map_lookup_elem((fd), (key), (val)); \ + if (__ret < 0) \ + FAIL_ERRNO("map_lookup"); \ + __ret; \ + }) + +#define xbpf_map_update_elem(fd, key, val, flags) \ + ({ \ + int __ret = bpf_map_update_elem((fd), (key), (val), (flags)); \ + if (__ret < 0) \ + FAIL_ERRNO("map_update"); \ + __ret; \ + }) + +#define xbpf_prog_attach(prog, target, type, flags) \ + ({ \ + int __ret = \ + bpf_prog_attach((prog), (target), (type), (flags)); \ + if (__ret < 0) \ + FAIL_ERRNO("prog_attach(" #type ")"); \ + __ret; \ + }) + +#define xbpf_prog_detach2(prog, target, type) \ + ({ \ + int __ret = bpf_prog_detach2((prog), (target), (type)); \ + if (__ret < 0) \ + FAIL_ERRNO("prog_detach2(" #type ")"); \ + __ret; \ + }) + +#define xpthread_create(thread, attr, func, arg) \ + ({ \ + int __ret = pthread_create((thread), (attr), (func), (arg)); \ + errno = __ret; \ + if (__ret) \ + FAIL_ERRNO("pthread_create"); \ + __ret; \ + }) + +#define xpthread_join(thread, retval) \ + ({ \ + int __ret = pthread_join((thread), (retval)); \ + errno = __ret; \ + if (__ret) \ + FAIL_ERRNO("pthread_join"); \ + __ret; \ + }) + +static inline int poll_read(int fd, unsigned int timeout_sec) +{ + struct timeval timeout = { .tv_sec = timeout_sec }; + fd_set 
rfds; + int r; + + FD_ZERO(&rfds); + FD_SET(fd, &rfds); + + r = select(fd + 1, &rfds, NULL, NULL, &timeout); + if (r == 0) + errno = ETIME; + + return r == 1 ? 0 : -1; +} + +static inline int accept_timeout(int fd, struct sockaddr *addr, socklen_t *len, + unsigned int timeout_sec) +{ + if (poll_read(fd, timeout_sec)) + return -1; + + return accept(fd, addr, len); +} + +static inline int recv_timeout(int fd, void *buf, size_t len, int flags, + unsigned int timeout_sec) +{ + if (poll_read(fd, timeout_sec)) + return -1; + + return recv(fd, buf, len, flags); +} + +static inline void init_addr_loopback4(struct sockaddr_storage *ss, socklen_t *len) +{ + struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss)); + + addr4->sin_family = AF_INET; + addr4->sin_port = 0; + addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK); + *len = sizeof(*addr4); +} + +static inline void init_addr_loopback6(struct sockaddr_storage *ss, socklen_t *len) +{ + struct sockaddr_in6 *addr6 = memset(ss, 0, sizeof(*ss)); + + addr6->sin6_family = AF_INET6; + addr6->sin6_port = 0; + addr6->sin6_addr = in6addr_loopback; + *len = sizeof(*addr6); +} + +static inline void init_addr_loopback(int family, struct sockaddr_storage *ss, + socklen_t *len) +{ + switch (family) { + case AF_INET: + init_addr_loopback4(ss, len); + return; + case AF_INET6: + init_addr_loopback6(ss, len); + return; + default: + FAIL("unsupported address family %d", family); + } +} + +static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss) +{ + return (struct sockaddr *)ss; +} + +#endif // __SOCKMAP_HELPERS__ diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index 567e07c19ecc..0f0cddd4e15e 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -26,250 +26,7 @@ #include "test_progs.h" #include "test_sockmap_listen.skel.h" -#define IO_TIMEOUT_SEC 30 -#define MAX_STRERR_LEN 256 -#define MAX_TEST_NAME 80 - -#define __always_unused __attribute__((__unused__)) - -#define _FAIL(errnum, fmt...) \ - ({ \ - error_at_line(0, (errnum), __func__, __LINE__, fmt); \ - CHECK_FAIL(true); \ - }) -#define FAIL(fmt...) _FAIL(0, fmt) -#define FAIL_ERRNO(fmt...) _FAIL(errno, fmt) -#define FAIL_LIBBPF(err, msg) \ - ({ \ - char __buf[MAX_STRERR_LEN]; \ - libbpf_strerror((err), __buf, sizeof(__buf)); \ - FAIL("%s: %s", (msg), __buf); \ - }) - -/* Wrappers that fail the test on error and report it. 
*/ - -#define xaccept_nonblock(fd, addr, len) \ - ({ \ - int __ret = \ - accept_timeout((fd), (addr), (len), IO_TIMEOUT_SEC); \ - if (__ret == -1) \ - FAIL_ERRNO("accept"); \ - __ret; \ - }) - -#define xbind(fd, addr, len) \ - ({ \ - int __ret = bind((fd), (addr), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("bind"); \ - __ret; \ - }) - -#define xclose(fd) \ - ({ \ - int __ret = close((fd)); \ - if (__ret == -1) \ - FAIL_ERRNO("close"); \ - __ret; \ - }) - -#define xconnect(fd, addr, len) \ - ({ \ - int __ret = connect((fd), (addr), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("connect"); \ - __ret; \ - }) - -#define xgetsockname(fd, addr, len) \ - ({ \ - int __ret = getsockname((fd), (addr), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("getsockname"); \ - __ret; \ - }) - -#define xgetsockopt(fd, level, name, val, len) \ - ({ \ - int __ret = getsockopt((fd), (level), (name), (val), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("getsockopt(" #name ")"); \ - __ret; \ - }) - -#define xlisten(fd, backlog) \ - ({ \ - int __ret = listen((fd), (backlog)); \ - if (__ret == -1) \ - FAIL_ERRNO("listen"); \ - __ret; \ - }) - -#define xsetsockopt(fd, level, name, val, len) \ - ({ \ - int __ret = setsockopt((fd), (level), (name), (val), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("setsockopt(" #name ")"); \ - __ret; \ - }) - -#define xsend(fd, buf, len, flags) \ - ({ \ - ssize_t __ret = send((fd), (buf), (len), (flags)); \ - if (__ret == -1) \ - FAIL_ERRNO("send"); \ - __ret; \ - }) - -#define xrecv_nonblock(fd, buf, len, flags) \ - ({ \ - ssize_t __ret = recv_timeout((fd), (buf), (len), (flags), \ - IO_TIMEOUT_SEC); \ - if (__ret == -1) \ - FAIL_ERRNO("recv"); \ - __ret; \ - }) - -#define xsocket(family, sotype, flags) \ - ({ \ - int __ret = socket(family, sotype, flags); \ - if (__ret == -1) \ - FAIL_ERRNO("socket"); \ - __ret; \ - }) - -#define xbpf_map_delete_elem(fd, key) \ - ({ \ - int __ret = bpf_map_delete_elem((fd), (key)); \ - if (__ret < 0) \ - FAIL_ERRNO("map_delete"); \ - __ret; \ - }) - -#define xbpf_map_lookup_elem(fd, key, val) \ - ({ \ - int __ret = bpf_map_lookup_elem((fd), (key), (val)); \ - if (__ret < 0) \ - FAIL_ERRNO("map_lookup"); \ - __ret; \ - }) - -#define xbpf_map_update_elem(fd, key, val, flags) \ - ({ \ - int __ret = bpf_map_update_elem((fd), (key), (val), (flags)); \ - if (__ret < 0) \ - FAIL_ERRNO("map_update"); \ - __ret; \ - }) - -#define xbpf_prog_attach(prog, target, type, flags) \ - ({ \ - int __ret = \ - bpf_prog_attach((prog), (target), (type), (flags)); \ - if (__ret < 0) \ - FAIL_ERRNO("prog_attach(" #type ")"); \ - __ret; \ - }) - -#define xbpf_prog_detach2(prog, target, type) \ - ({ \ - int __ret = bpf_prog_detach2((prog), (target), (type)); \ - if (__ret < 0) \ - FAIL_ERRNO("prog_detach2(" #type ")"); \ - __ret; \ - }) - -#define xpthread_create(thread, attr, func, arg) \ - ({ \ - int __ret = pthread_create((thread), (attr), (func), (arg)); \ - errno = __ret; \ - if (__ret) \ - FAIL_ERRNO("pthread_create"); \ - __ret; \ - }) - -#define xpthread_join(thread, retval) \ - ({ \ - int __ret = pthread_join((thread), (retval)); \ - errno = __ret; \ - if (__ret) \ - FAIL_ERRNO("pthread_join"); \ - __ret; \ - }) - -static int poll_read(int fd, unsigned int timeout_sec) -{ - struct timeval timeout = { .tv_sec = timeout_sec }; - fd_set rfds; - int r; - - FD_ZERO(&rfds); - FD_SET(fd, &rfds); - - r = select(fd + 1, &rfds, NULL, NULL, &timeout); - if (r == 0) - errno = ETIME; - - return r == 1 ? 
0 : -1; -} - -static int accept_timeout(int fd, struct sockaddr *addr, socklen_t *len, - unsigned int timeout_sec) -{ - if (poll_read(fd, timeout_sec)) - return -1; - - return accept(fd, addr, len); -} - -static int recv_timeout(int fd, void *buf, size_t len, int flags, - unsigned int timeout_sec) -{ - if (poll_read(fd, timeout_sec)) - return -1; - - return recv(fd, buf, len, flags); -} - -static void init_addr_loopback4(struct sockaddr_storage *ss, socklen_t *len) -{ - struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss)); - - addr4->sin_family = AF_INET; - addr4->sin_port = 0; - addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK); - *len = sizeof(*addr4); -} - -static void init_addr_loopback6(struct sockaddr_storage *ss, socklen_t *len) -{ - struct sockaddr_in6 *addr6 = memset(ss, 0, sizeof(*ss)); - - addr6->sin6_family = AF_INET6; - addr6->sin6_port = 0; - addr6->sin6_addr = in6addr_loopback; - *len = sizeof(*addr6); -} - -static void init_addr_loopback(int family, struct sockaddr_storage *ss, - socklen_t *len) -{ - switch (family) { - case AF_INET: - init_addr_loopback4(ss, len); - return; - case AF_INET6: - init_addr_loopback6(ss, len); - return; - default: - FAIL("unsupported address family %d", family); - } -} - -static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss) -{ - return (struct sockaddr *)ss; -} +#include "sockmap_helpers.h" static int enable_reuseport(int s, int progfd) { From patchwork Tue Mar 21 21:52:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13183303 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC887C6FD1D for ; Tue, 21 Mar 2023 21:52:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230051AbjCUVw6 (ORCPT ); Tue, 21 Mar 2023 17:52:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230093AbjCUVwm (ORCPT ); Tue, 21 Mar 2023 17:52:42 -0400 Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C3F1358C19; Tue, 21 Mar 2023 14:52:34 -0700 (PDT) Received: by mail-pl1-x629.google.com with SMTP id c18so17494029ple.11; Tue, 21 Mar 2023 14:52:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679435554; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+TaDLCIXSz9eYXTku+ilKJmltzAIAJaTUBVD0OtqxmY=; b=LdEgNXfa23i45cUVYwzdKak5nkZQLAls+YSDZPbahr9Rbenw14n+GzL1LFpB0gk8lE GUiYx2qMOuO113T8FiQfxs4/yIO3X69Q29zHOE4PfMQ2hMDKDdxesUm0X+xDs4nFxUbr 4ONkCaYjxTeGUsPNkhSGE7c+et0kuQVCzbqnDMsqj3jXhiZBgn11oLlTNAxdFPuUa8Cs B1zSVaYL/dP580wiIqphGBG+So1Iv5jZs24OQIhXt70St2pJLC2EktokquRa1sVrTmTA XxNHZKZd0asgvhpjg0eWyzmK4b07wIryAIXTTtCin5pYQp7VcT/PcDZVP3Cctb7e5wv3 jL1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679435554; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+TaDLCIXSz9eYXTku+ilKJmltzAIAJaTUBVD0OtqxmY=; 
b=kA+LVU1pNlSEZHz4jjr8uLpwHGI8f3n44AmOOlgUC960i/DvnEhMEibKuPxZQe47fU hyQhu2AncfJzbYx1+PP1LlYpmrXD9fV+WctfZns4miakkEGX5jV19mWaLSv4Q26MsX3Y Dqn3Gp9q/vaic+ZP6Rw7sZZ+UW8BC2BG1jhP26x1lqk7TkiQ+VvZgmmL/9MGibz4qlVA lsX/wyj4QaFnMEY/vU6s+o5p7hiwY8qAAJ0rvrfTK+RcMYrm62MkStZsSypOWMrTh0f+ 5h/P0uq9mNBWkVRq41ph3XU8fzOz2ViLTjdB13kb5NVNWIWIx+y72IGrdZXHHXji5AQf wJew== X-Gm-Message-State: AO0yUKUyONZLL20swNtoW2o8MRL2igix7rUegdUghle2l7NRuz+HBKXv M4n02mVmRgHk8gRFxcGGKUc= X-Google-Smtp-Source: AK7set/2FKbqF16GkpBq97ZtUJwU245rbguqy/t/vfw/dVPKK8IO+rvftcmfJ1DLciBh8h0CXR8hCw== X-Received: by 2002:a05:6a20:19b:b0:d5:c14c:1263 with SMTP id 27-20020a056a20019b00b000d5c14c1263mr2916827pzy.53.1679435554445; Tue, 21 Mar 2023 14:52:34 -0700 (PDT) Received: from john.lan ([98.97.36.54]) by smtp.gmail.com with ESMTPSA id m3-20020a63fd43000000b004facdf070d6sm8661331pgj.39.2023.03.21.14.52.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Mar 2023 14:52:34 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf 09/11] bpf: sockmap, build helper to create connected socket pair Date: Tue, 21 Mar 2023 14:52:10 -0700 Message-Id: <20230321215212.525630-10-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com> References: <20230321215212.525630-1-john.fastabend@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net A common operation for testing is to spin up a pair of sockets that are connected. Then we can use these to run specific tests that need to send data, check BPF programs and so on. The sockmap_listen programs already have this logic lets move it into the new sockmap_helpers header file for general use. 
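For reviewers unfamiliar with these helpers, a minimal usage sketch follows. It is illustrative only and not part of the diff below; the test body and the function name example_echo_over_loopback() are hypothetical, while socket_loopback(), create_socket_pairs(), xsend(), recv_timeout(), xclose(), FAIL() and IO_TIMEOUT_SEC are the helpers this series exposes through sockmap_helpers.h.

	/* Sketch of how a test might consume the helpers once they live in
	 * sockmap_helpers.h. Only the helper calls come from this series;
	 * everything else here is illustrative.
	 */
	#include "sockmap_helpers.h"

	static void example_echo_over_loopback(void)
	{
		int s, c0, c1, p0, p1;
		ssize_t n;
		char b;

		/* Listening TCP socket bound to 127.0.0.1, ephemeral port. */
		s = socket_loopback(AF_INET, SOCK_STREAM);
		if (s < 0)
			return;

		/* Two accepted connections: (c0, p0) and (c1, p1). */
		if (create_socket_pairs(s, AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1))
			goto close_srv;

		/* Data written on a client end shows up on its accepted peer. */
		if (xsend(c0, "a", 1, 0) == 1) {
			n = recv_timeout(p0, &b, 1, 0, IO_TIMEOUT_SEC);
			if (n != 1)
				FAIL("example: expected 1 byte, got %zd", n);
		}

		xclose(c0);
		xclose(c1);
		xclose(p0);
		xclose(p1);
	close_srv:
		xclose(s);
	}
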
Signed-off-by: John Fastabend --- .../bpf/prog_tests/sockmap_helpers.h | 125 ++++++++++++++++++ .../selftests/bpf/prog_tests/sockmap_listen.c | 107 +-------------- 2 files changed, 130 insertions(+), 102 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h index bff56844e745..54e3a019ba72 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h @@ -246,4 +246,129 @@ static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss) return (struct sockaddr *)ss; } +static inline int add_to_sockmap(int sock_mapfd, int fd1, int fd2) +{ + u64 value; + u32 key; + int err; + + key = 0; + value = fd1; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + return err; + + key = 1; + value = fd2; + return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); +} + +static inline int create_socket_pairs(int s, int family, int sotype, + int *c0, int *c1, int *p0, int *p1) +{ + struct sockaddr_storage addr; + socklen_t len; + int err = 0; + + len = sizeof(addr); + err = xgetsockname(s, sockaddr(&addr), &len); + if (err) + return err; + + *c0 = xsocket(family, sotype, 0); + if (*c0 < 0) + return errno; + err = xconnect(*c0, sockaddr(&addr), len); + if (err) { + err = errno; + goto close_cli0; + } + + *p0 = xaccept_nonblock(s, NULL, NULL); + if (*p0 < 0) { + err = errno; + goto close_cli0; + } + + *c1 = xsocket(family, sotype, 0); + if (*c1 < 0) { + err = errno; + goto close_peer0; + } + err = xconnect(*c1, sockaddr(&addr), len); + if (err) { + err = errno; + goto close_cli1; + } + + *p1 = xaccept_nonblock(s, NULL, NULL); + if (*p1 < 0) { + err = errno; + goto close_peer1; + } + return err; +close_peer1: + close(*p1); +close_cli1: + close(*c1); +close_peer0: + close(*p0); +close_cli0: + close(*c0); + return err; +} + +static inline int enable_reuseport(int s, int progfd) +{ + int err, one = 1; + + err = xsetsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); + if (err) + return -1; + err = xsetsockopt(s, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &progfd, + sizeof(progfd)); + if (err) + return -1; + + return 0; +} + +static inline int socket_loopback_reuseport(int family, int sotype, int progfd) +{ + struct sockaddr_storage addr; + socklen_t len; + int err, s; + + init_addr_loopback(family, &addr, &len); + + s = xsocket(family, sotype, 0); + if (s == -1) + return -1; + + if (progfd >= 0) + enable_reuseport(s, progfd); + + err = xbind(s, sockaddr(&addr), len); + if (err) + goto close; + + if (sotype & SOCK_DGRAM) + return s; + + err = xlisten(s, SOMAXCONN); + if (err) + goto close; + + return s; +close: + xclose(s); + return -1; +} + +static inline int socket_loopback(int family, int sotype) +{ + return socket_loopback_reuseport(family, sotype, -1); +} + + #endif // __SOCKMAP_HELPERS__ diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index 0f0cddd4e15e..f3913ba9e899 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -28,58 +28,6 @@ #include "sockmap_helpers.h" -static int enable_reuseport(int s, int progfd) -{ - int err, one = 1; - - err = xsetsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); - if (err) - return -1; - err = xsetsockopt(s, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &progfd, - sizeof(progfd)); - if (err) - return -1; - - return 0; -} 
- -static int socket_loopback_reuseport(int family, int sotype, int progfd) -{ - struct sockaddr_storage addr; - socklen_t len; - int err, s; - - init_addr_loopback(family, &addr, &len); - - s = xsocket(family, sotype, 0); - if (s == -1) - return -1; - - if (progfd >= 0) - enable_reuseport(s, progfd); - - err = xbind(s, sockaddr(&addr), len); - if (err) - goto close; - - if (sotype & SOCK_DGRAM) - return s; - - err = xlisten(s, SOMAXCONN); - if (err) - goto close; - - return s; -close: - xclose(s); - return -1; -} - -static int socket_loopback(int family, int sotype) -{ - return socket_loopback_reuseport(family, sotype, -1); -} - static void test_insert_invalid(struct test_sockmap_listen *skel __always_unused, int family, int sotype, int mapfd) { @@ -722,31 +670,12 @@ static const char *redir_mode_str(enum redir_mode mode) } } -static int add_to_sockmap(int sock_mapfd, int fd1, int fd2) -{ - u64 value; - u32 key; - int err; - - key = 0; - value = fd1; - err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); - if (err) - return err; - - key = 1; - value = fd2; - return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); -} - static void redir_to_connected(int family, int sotype, int sock_mapfd, int verd_mapfd, enum redir_mode mode) { const char *log_prefix = redir_mode_str(mode); - struct sockaddr_storage addr; int s, c0, c1, p0, p1; unsigned int pass; - socklen_t len; int err, n; u32 key; char b; @@ -757,36 +686,13 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd, if (s < 0) return; - len = sizeof(addr); - err = xgetsockname(s, sockaddr(&addr), &len); + err = create_socket_pairs(s, family, sotype, &c0, &c1, &p0, &p1); if (err) goto close_srv; - c0 = xsocket(family, sotype, 0); - if (c0 < 0) - goto close_srv; - err = xconnect(c0, sockaddr(&addr), len); - if (err) - goto close_cli0; - - p0 = xaccept_nonblock(s, NULL, NULL); - if (p0 < 0) - goto close_cli0; - - c1 = xsocket(family, sotype, 0); - if (c1 < 0) - goto close_peer0; - err = xconnect(c1, sockaddr(&addr), len); - if (err) - goto close_cli1; - - p1 = xaccept_nonblock(s, NULL, NULL); - if (p1 < 0) - goto close_cli1; - err = add_to_sockmap(sock_mapfd, p0, p1); if (err) - goto close_peer1; + goto close; n = write(mode == REDIR_INGRESS ? 
c1 : p1, "a", 1); if (n < 0) @@ -794,12 +700,12 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd, if (n == 0) FAIL("%s: incomplete write", log_prefix); if (n < 1) - goto close_peer1; + goto close; key = SK_PASS; err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); if (err) - goto close_peer1; + goto close; if (pass != 1) FAIL("%s: want pass count 1, have %d", log_prefix, pass); n = recv_timeout(c0, &b, 1, 0, IO_TIMEOUT_SEC); @@ -808,13 +714,10 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd, if (n == 0) FAIL("%s: incomplete recv", log_prefix); -close_peer1: +close: xclose(p1); -close_cli1: xclose(c1); -close_peer0: xclose(p0); -close_cli0: xclose(c0); close_srv: xclose(s); From patchwork Tue Mar 21 21:52:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13183304 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3ED23C6FD1D for ; Tue, 21 Mar 2023 21:53:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229813AbjCUVxH (ORCPT ); Tue, 21 Mar 2023 17:53:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49308 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229550AbjCUVwp (ORCPT ); Tue, 21 Mar 2023 17:52:45 -0400 Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7DAAE58C3C; Tue, 21 Mar 2023 14:52:36 -0700 (PDT) Received: by mail-pl1-x629.google.com with SMTP id c18so17494129ple.11; Tue, 21 Mar 2023 14:52:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679435556; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4hZZQ2mIzy3LU2Gpjy5oQ2SLZ1jhZM9Dhcqx1wEiVz8=; b=a5fSOkkJYv+0+E8Z0hmI4oaTmUkfo3t+Vi5kPHoT17nd1IKEg8d4sGp1ntUFeHF4bn KSf91HSiPyESIJxYsAawXnLjoff0jd6c48lr+5umetF/J4IeWkcqJZ/j75TD0rae+7+i ClD9Mv2UblqbefErJw8ym8sv0tVxcJdbbVGqdlTvNXkOWqigF6mtuRRDkNISrB9skDhz LVW+8rM8zxo3FlbONwriq7GHO1h9qkVCPyOAUyo/vdUYUQVi0TAVHwl+cG0+mq2+PxFQ YAYuWFPH6qW1ntep/vdAEXbYOGQnq3um88aMjwaCLNMKmlpkvszg88a5NUmN4goGumPc UCDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679435556; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4hZZQ2mIzy3LU2Gpjy5oQ2SLZ1jhZM9Dhcqx1wEiVz8=; b=clT+pgaMAbLOIhxQxCKltnt04ZDzSU+lXX5cVmvSUhWFIe2zO0ofrsPPN2Me4Ai0Pq w5DWwktsuBlgCKRuRApu1ev5qGjUfbOou5ZPiYT2rydbZgyVlFdgnBYmRB4Rzktqu+Pi F9F9gcSMQiUUhbdGHvvLo7vaH4P0rdu+DMW0/rG65tGFSL89qI8WxNHGZN3oDu/wDYkV OcFIW2CaSJXgr7mF5kc8fVDU7KB7A3M8r/bXGg4HDn0I8GZ6+Jw+STWv87jAkagNAyGX AFl465jH+7USEczaZN6NZDzbpy5o++Y/InPa6PyfmKRePku8+/PPx/xruiBq75X419vF IocA== X-Gm-Message-State: AO0yUKWJ8/2XgfbizW1SoBEGHTPnZZAtEVfjHF+a+lKGKvS1Y28BkV6E hwp21Qt/Yqxpp5NffcWE0Yk= X-Google-Smtp-Source: AK7set+fdc0fZqq4yFDJHuweU3uPvmhd4ZlYbkPY060kyYg4q7/4VYWqHeGWwKEByPc7IizXZkdBpg== X-Received: by 2002:a17:903:4306:b0:1a1:dd3a:7512 with SMTP id jz6-20020a170903430600b001a1dd3a7512mr511954plb.21.1679435556032; Tue, 
21 Mar 2023 14:52:36 -0700 (PDT) Received: from john.lan ([98.97.36.54]) by smtp.gmail.com with ESMTPSA id m3-20020a63fd43000000b004facdf070d6sm8661331pgj.39.2023.03.21.14.52.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Mar 2023 14:52:35 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf 10/11] bpf: sockmap, test shutdown() correctly exits epoll and recv()=0 Date: Tue, 21 Mar 2023 14:52:11 -0700 Message-Id: <20230321215212.525630-11-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com> References: <20230321215212.525630-1-john.fastabend@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net When session gracefully shutdowns epoll needs to wake up and any recv() readers should return 0 not the -EAGAIN they previously returned. Note we use epoll instead of select to test the epoll wake on shutdown event as well. Signed-off-by: John Fastabend --- .../selftests/bpf/prog_tests/sockmap_basic.c | 71 ++++++++++++++++++- .../bpf/progs/test_sockmap_pass_prog.c | 32 +++++++++ 2 files changed, 100 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 0aa088900699..38a22c71b8dd 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -2,6 +2,7 @@ // Copyright (c) 2020 Cloudflare #include #include +#include #include "test_progs.h" #include "test_skmsg_load_helpers.skel.h" @@ -9,8 +10,11 @@ #include "test_sockmap_invalid_update.skel.h" #include "test_sockmap_skb_verdict_attach.skel.h" #include "test_sockmap_progs_query.skel.h" +#include "test_sockmap_pass_prog.skel.h" #include "bpf_iter_sockmap.skel.h" +#include "sockmap_helpers.h" + #define TCP_REPAIR 19 /* TCP sock is under repair right now */ #define TCP_REPAIR_ON 1 @@ -286,9 +290,6 @@ static void test_sockmap_skb_verdict_attach(enum bpf_attach_type first, err = bpf_prog_attach(verdict, map, second, 0); ASSERT_EQ(err, -EBUSY, "prog_attach_fail"); - err = bpf_prog_detach2(verdict, map, first); - if (!ASSERT_OK(err, "bpf_prog_detach2")) - goto out; out: test_sockmap_skb_verdict_attach__destroy(skel); } @@ -350,6 +351,68 @@ static void test_sockmap_progs_query(enum bpf_attach_type attach_type) test_sockmap_progs_query__destroy(skel); } +#define MAX_EVENTS 10 +static void test_sockmap_skb_verdict_shutdown(void) +{ + int n, err, map, verdict, s, c0, c1, p0, p1; + struct epoll_event ev, events[MAX_EVENTS]; + struct test_sockmap_pass_prog *skel; + int epollfd; + int zero = 0; + char b; + + skel = test_sockmap_pass_prog__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + map = bpf_map__fd(skel->maps.sock_map_rx); + + err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach")) + goto out; + + s = socket_loopback(AF_INET, SOCK_STREAM); + if (s < 0) + goto out; + err = create_socket_pairs(s, AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1); + if (err < 0) + goto out; + + err = 
bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST); + if (err < 0) + goto out_close; + + shutdown(c0, SHUT_RDWR); + shutdown(p1, SHUT_WR); + + ev.events = EPOLLIN; + ev.data.fd = c1; + + epollfd = epoll_create1(0); + if (!ASSERT_GT(epollfd, -1, "epoll_create(0)")) + goto out_close; + err = epoll_ctl(epollfd, EPOLL_CTL_ADD, c1, &ev); + if (!ASSERT_OK(err, "epoll_ctl(EPOLL_CTL_ADD)")) + goto out_close; + err = epoll_wait(epollfd, events, MAX_EVENTS, -1); + if (!ASSERT_EQ(err, 1, "epoll_wait(fd)")) + goto out_close; + + n = recv(c1, &b, 1, SOCK_NONBLOCK); + ASSERT_EQ(n, 0, "recv_timeout(fin)"); + n = recv(p0, &b, 1, SOCK_NONBLOCK); + ASSERT_EQ(n, 0, "recv_timeout(fin)"); + +out_close: + close(c0); + close(p0); + close(c1); + close(p1); +out: + test_sockmap_pass_prog__destroy(skel); +} + void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -384,4 +447,6 @@ void test_sockmap_basic(void) test_sockmap_progs_query(BPF_SK_SKB_STREAM_VERDICT); if (test__start_subtest("sockmap skb_verdict progs query")) test_sockmap_progs_query(BPF_SK_SKB_VERDICT); + if (test__start_subtest("sockmap skb_verdict shutdown")) + test_sockmap_skb_verdict_shutdown(); } diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c new file mode 100644 index 000000000000..1d86a717a290 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c @@ -0,0 +1,32 @@ +#include +#include +#include + +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 20); + __type(key, int); + __type(value, int); +} sock_map_rx SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 20); + __type(key, int); + __type(value, int); +} sock_map_tx SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 20); + __type(key, int); + __type(value, int); +} sock_map_msg SEC(".maps"); + +SEC("sk_skb") +int prog_skb_verdict(struct __sk_buff *skb) +{ + return SK_PASS; +} + +char _license[] SEC("license") = "GPL"; From patchwork Tue Mar 21 21:52:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13183305 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FBDEC76195 for ; Tue, 21 Mar 2023 21:53:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229550AbjCUVxJ (ORCPT ); Tue, 21 Mar 2023 17:53:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230110AbjCUVwx (ORCPT ); Tue, 21 Mar 2023 17:52:53 -0400 Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B53833D0AD; Tue, 21 Mar 2023 14:52:38 -0700 (PDT) Received: by mail-pj1-x102a.google.com with SMTP id f6-20020a17090ac28600b0023b9bf9eb63so17297801pjt.5; Tue, 21 Mar 2023 14:52:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679435557; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=ryxn+5k5MD1criPhX9Z+EEFH37S3XNjS5ric5THSlS8=; b=dHVyFEo/2DWq2+kLMcYWhJqMyaaZDJGAeePwX5ONAMEblLoCjNYvVSQXkMv669q6AI Zv4L0Qw0pvpJH9eRnqD+P1rkV3f5ejTygFe9u/LbXVNgkLN7QQGCIinpj8vZzOLTqtUL d4gkEFVgurLHfaP20amdX/bXmoy8ri6Sx0DI7P/O0LWiYCAVuXpnn7cvvCYeg6b1ZPR/ 9ojJZzHuAv/xgTCAvTaatlMmkHJSyZg9Ry0gKott+9SGB0axYqu+uwL4tDLKIYl1m1M7 EMandROxu6sOTRGH3f6M1mkZVSaAh89cq/DZ1tmnUK6ROPwBY9qR3pBR6TUVDmUCwS4p ZtjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679435557; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ryxn+5k5MD1criPhX9Z+EEFH37S3XNjS5ric5THSlS8=; b=c6dTs8l44RYq9oUzVLMMTpdBsjLHKRWuHgYCg/IpA7NCnejxFwhi7PSVHOo8c/H+kk c16O6wROrFHELhdDKBOfnTyCY4ZdMMhHVjh/Fq0hAQbrRf+Cm/teLuC/XOwp5LKdmff5 2XRBPzxxYYVJbJ46nAgfVC7N1cS1WEqxUusx8DBZSE4WJaShaND0XPK88zV+4iWTmZae jPlqfh8DJeet4hI4dgXCNOxC5Mg7uMvU2aNq160STH8KzVlbCyJ11xAiJIHWVGqqUJNy 8ao4H2qcMcsgRACdFlnhYpIfmr8wsGVRar37t+SfOQ2aIS+77uMj8u0yAJiMwaWXXfLl mAIw== X-Gm-Message-State: AO0yUKUIJX5DfQ3giuQKWXyFUoxq3xokd85ypBFpIrwNPzjIYiRGNgr1 Z3dhDM+EOlnQYTEigv2Npec= X-Google-Smtp-Source: AK7set86hnr0tSI2zIGyZtp8UiGA+CMlm35cQ2akSabsiI+Cxpij++FQDC/+eIvUw1znXeFNnup5Xg== X-Received: by 2002:a05:6a20:c29c:b0:da:368e:7c73 with SMTP id bs28-20020a056a20c29c00b000da368e7c73mr3185014pzb.37.1679435557681; Tue, 21 Mar 2023 14:52:37 -0700 (PDT) Received: from john.lan ([98.97.36.54]) by smtp.gmail.com with ESMTPSA id m3-20020a63fd43000000b004facdf070d6sm8661331pgj.39.2023.03.21.14.52.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Mar 2023 14:52:37 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net, lmb@isovalent.com, cong.wang@bytedance.com Cc: bpf@vger.kernel.org, john.fastabend@gmail.com, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf 11/11] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer Date: Tue, 21 Mar 2023 14:52:12 -0700 Message-Id: <20230321215212.525630-12-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230321215212.525630-1-john.fastabend@gmail.com> References: <20230321215212.525630-1-john.fastabend@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net A bug was reported where ioctl(FIONREAD) returned zero even though the socket with a SK_SKB verdict program attached had bytes in the msg queue. The result is programs may hang or more likely try to recover, but use suboptimal buffer sizes. Add a test to check that ioctl(FIONREAD) returns the correct number of bytes. 
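To make the user-visible impact concrete, the pattern such applications follow is roughly the sketch below. It is illustrative only and not part of the diff; the fallback buffer-sizing policy and the function name read_ready_bytes() are assumptions, only ioctl(FIONREAD) and recv() are the interfaces under discussion.

	/* Hypothetical reader that sizes its receive buffer from FIONREAD.
	 * If the ioctl reports 0 while the sk_msg queue actually holds data,
	 * the caller either stalls or falls back to an undersized buffer.
	 */
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <stdlib.h>

	static ssize_t read_ready_bytes(int fd)
	{
		int avail = 0;
		char *buf;
		ssize_t n;

		if (ioctl(fd, FIONREAD, &avail) < 0)
			return -1;
		if (!avail)
			avail = 1;	/* fallback; this is where a bogus 0 hurts */

		buf = malloc(avail);
		if (!buf)
			return -1;
		n = recv(fd, buf, avail, 0);
		free(buf);
		return n;
	}
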
Signed-off-by: John Fastabend --- .../selftests/bpf/prog_tests/sockmap_basic.c | 48 +++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 38a22c71b8dd..b092355a8833 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -413,6 +413,52 @@ static void test_sockmap_skb_verdict_shutdown(void) test_sockmap_pass_prog__destroy(skel); } +static void test_sockmap_skb_verdict_fionread(void) +{ + int err, map, verdict, s, c0, c1, p0, p1; + struct test_sockmap_pass_prog *skel; + int zero = 0, sent, recvd, avail; + char buf[256] = "0123456789"; + + skel = test_sockmap_pass_prog__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + map = bpf_map__fd(skel->maps.sock_map_rx); + + err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach")) + goto out; + + s = socket_loopback(AF_INET, SOCK_STREAM); + if (!ASSERT_GT(s, -1, "socket_loopback(s)")) + goto out; + err = create_socket_pairs(s, AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1); + if (!ASSERT_OK(err, "create_socket_pairs(s)")) + goto out; + + err = bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(c1)")) + goto out_close; + + sent = xsend(p1, &buf, sizeof(buf), 0); + ASSERT_EQ(sent, sizeof(buf), "xsend(p0)"); + err = ioctl(c1, FIONREAD, &avail); + ASSERT_OK(err, "ioctl(FIONREAD) error"); + ASSERT_EQ(avail, sizeof(buf), "ioctl(FIONREAD)"); + recvd = recv_timeout(c1, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC); + ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(c0)"); + +out_close: + close(c0); + close(p0); + close(c1); + close(p1); +out: + test_sockmap_pass_prog__destroy(skel); +} + void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -449,4 +495,6 @@ void test_sockmap_basic(void) test_sockmap_progs_query(BPF_SK_SKB_VERDICT); if (test__start_subtest("sockmap skb_verdict shutdown")) test_sockmap_skb_verdict_shutdown(); + if (test__start_subtest("sockmap skb_verdict fionread")) + test_sockmap_skb_verdict_fionread(); }