From patchwork Thu Mar 30 15:17:52 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13194324
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v5 bpf-next 1/7] bpf: tcp: Avoid taking fast sock lock in iterator
Date: Thu, 30 Mar 2023 15:17:52 +0000
Message-Id: <20230330151758.531170-2-aditi.ghag@isovalent.com>
In-Reply-To: <20230330151758.531170-1-aditi.ghag@isovalent.com>
References: <20230330151758.531170-1-aditi.ghag@isovalent.com>
X-Mailing-List: bpf@vger.kernel.org

Previously, the BPF TCP iterator was acquiring the fast version of the sock lock, which disables BH. This introduced a circular dependency with code paths that later acquire the sockets hash table bucket lock.
Replace the fast version of the sock lock with the slow one, which facilitates BPF programs executed from the iterator destroying TCP listening sockets using the bpf_sock_destroy kfunc (implemented in follow-up commits).

Here is a stack trace that motivated this change:

```
1) sock_lock with BH disabled + bucket lock

lock_acquire+0xcd/0x330
_raw_spin_lock_bh+0x38/0x50
inet_unhash+0x96/0xd0
tcp_set_state+0x6a/0x210
tcp_abort+0x12b/0x230
bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
bpf_iter_run_prog+0x1ff/0x340
bpf_iter_tcp_seq_show+0xca/0x190
bpf_seq_read+0x177/0x450
vfs_read+0xc6/0x300
ksys_read+0x69/0xf0
do_syscall_64+0x3c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

2) sock lock with BH enabled

[ 1.499968] lock_acquire+0xcd/0x330
[ 1.500316] _raw_spin_lock+0x33/0x40
[ 1.500670] sk_clone_lock+0x146/0x520
[ 1.501030] inet_csk_clone_lock+0x1b/0x110
[ 1.501433] tcp_create_openreq_child+0x22/0x3f0
[ 1.501873] tcp_v6_syn_recv_sock+0x96/0x940
[ 1.502284] tcp_check_req+0x137/0x660
[ 1.502646] tcp_v6_rcv+0xa63/0xe80
[ 1.502994] ip6_protocol_deliver_rcu+0x78/0x590
[ 1.503434] ip6_input_finish+0x72/0x140
[ 1.503818] __netif_receive_skb_one_core+0x63/0xa0
[ 1.504281] process_backlog+0x79/0x260
[ 1.504668] __napi_poll.constprop.0+0x27/0x170
[ 1.505104] net_rx_action+0x14a/0x2a0
[ 1.505469] __do_softirq+0x165/0x510
[ 1.505842] do_softirq+0xcd/0x100
[ 1.506172] __local_bh_enable_ip+0xcc/0xf0
[ 1.506588] ip6_finish_output2+0x2a8/0xb00
[ 1.506988] ip6_finish_output+0x274/0x510
[ 1.507377] ip6_xmit+0x319/0x9b0
[ 1.507726] inet6_csk_xmit+0x12b/0x2b0
[ 1.508096] __tcp_transmit_skb+0x549/0xc40
[ 1.508498] tcp_rcv_state_process+0x362/0x1180
```

Signed-off-by: Aditi Ghag
---
 net/ipv4/tcp_ipv4.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ea370afa70ed..f2d370a9450f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 	struct sock *sk = v;
-	bool slow;
 	uid_t uid;
 	int ret;

@@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 		return 0;

 	if (sk_fullsock(sk))
-		slow = lock_sock_fast(sk);
+		lock_sock(sk);

 	if (unlikely(sk_unhashed(sk))) {
 		ret = SEQ_SKIP;
@@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)

 unlock:
 	if (sk_fullsock(sk))
-		unlock_sock_fast(sk, slow);
+		release_sock(sk);

 	return ret;
 }

From patchwork Thu Mar 30 15:17:53 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13194325
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com, Martin KaFai Lau
Subject: [PATCH v5 bpf-next 2/7] udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
Date: Thu, 30 Mar 2023 15:17:53 +0000
Message-Id:
 <20230330151758.531170-3-aditi.ghag@isovalent.com>
In-Reply-To: <20230330151758.531170-1-aditi.ghag@isovalent.com>
References: <20230330151758.531170-1-aditi.ghag@isovalent.com>
X-Mailing-List: bpf@vger.kernel.org

This is a preparatory commit to remove the bpf_seq_afinfo field. The field was previously shared between the proc fs and BPF UDP socket iterators. As follow-up commits will decouple the implementations of the two iterators, remove the field. For the BPF socket iterator, filtering of sockets is expected to be done in BPF programs.

Suggested-by: Martin KaFai Lau
Signed-off-by: Aditi Ghag
---
 include/net/udp.h |  1 -
 net/ipv4/udp.c    | 15 +++------------
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index de4b528522bb..5cad44318d71 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -437,7 +437,6 @@ struct udp_seq_afinfo {
 struct udp_iter_state {
 	struct seq_net_private	p;
 	int			bucket;
-	struct udp_seq_afinfo	*bpf_seq_afinfo;
 };

 void *udp_seq_start(struct seq_file *seq, loff_t *pos);

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c605d171eb2d..c574c8c17ec9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2997,10 +2997,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 	struct udp_table *udptable;
 	struct sock *sk;

-	if (state->bpf_seq_afinfo)
-		afinfo = state->bpf_seq_afinfo;
-	else
-		afinfo = pde_data(file_inode(seq->file));
+	afinfo = pde_data(file_inode(seq->file));

 	udptable = udp_get_table_afinfo(afinfo, net);

@@ -3033,10 +3030,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 	struct udp_seq_afinfo *afinfo;
 	struct udp_table *udptable;

-	if (state->bpf_seq_afinfo)
-		afinfo = state->bpf_seq_afinfo;
-	else
-		afinfo = pde_data(file_inode(seq->file));
+	afinfo = pde_data(file_inode(seq->file));

 	do {
 		sk = sk_next(sk);
@@ -3094,10 +3088,7 @@ void
udp_seq_stop(struct seq_file *seq, void *v)
 	struct udp_seq_afinfo *afinfo;
 	struct udp_table *udptable;

-	if (state->bpf_seq_afinfo)
-		afinfo = state->bpf_seq_afinfo;
-	else
-		afinfo = pde_data(file_inode(seq->file));
+	afinfo = pde_data(file_inode(seq->file));

 	udptable = udp_get_table_afinfo(afinfo, seq_file_net(seq));

From patchwork Thu Mar 30 15:17:54 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13194326
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
Date: Thu, 30 Mar 2023 15:17:54 +0000
Message-Id: <20230330151758.531170-4-aditi.ghag@isovalent.com>
In-Reply-To: <20230330151758.531170-1-aditi.ghag@isovalent.com>
References:
 <20230330151758.531170-1-aditi.ghag@isovalent.com>
X-Mailing-List: bpf@vger.kernel.org

This is a preparatory commit that refactors the code matching socket attributes in iterators into a helper function, and uses it in the proc fs iterator.

Signed-off-by: Aditi Ghag
---
 net/ipv4/udp.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c574c8c17ec9..cead4acb64c6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2983,6 +2983,8 @@ EXPORT_SYMBOL(udp_prot);
 /* ------------------------------------------------------------------------ */
 #ifdef CONFIG_PROC_FS

+static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
+
 static struct udp_table *udp_get_table_afinfo(struct udp_seq_afinfo *afinfo,
 					      struct net *net)
 {
@@ -3010,10 +3012,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 		spin_lock_bh(&hslot->lock);
 		sk_for_each(sk, &hslot->head) {
-			if (!net_eq(sock_net(sk), net))
-				continue;
-			if (afinfo->family == AF_UNSPEC ||
-			    sk->sk_family == afinfo->family)
+			if (seq_sk_match(seq, sk))
 				goto found;
 		}
 		spin_unlock_bh(&hslot->lock);
@@ -3034,9 +3033,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 	do {
 		sk = sk_next(sk);
-	} while (sk && (!net_eq(sock_net(sk), net) ||
-			(afinfo->family != AF_UNSPEC &&
-			 sk->sk_family != afinfo->family)));
+	} while (sk && !seq_sk_match(seq, sk));

 	if (!sk) {
 		udptable = udp_get_table_afinfo(afinfo, net);
@@ -3143,6 +3140,17 @@ struct bpf_iter__udp {
 	int bucket __aligned(8);
 };

+static unsigned short seq_file_family(const struct seq_file *seq);
+
+static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
+{
+	unsigned short family = seq_file_family(seq);
+
+	/* AF_UNSPEC is used as a match all */
+	return ((family == AF_UNSPEC || family == sk->sk_family) &&
+		net_eq(sock_net(sk),
+		       seq_file_net(seq)));
+}
+
 static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
 			     struct udp_sock *udp_sk, uid_t uid, int bucket)
 {
@@ -3194,6 +3202,19 @@ static const struct seq_operations bpf_iter_udp_seq_ops = {
 	.stop = bpf_iter_udp_seq_stop,
 	.show = bpf_iter_udp_seq_show,
 };
+
+static unsigned short seq_file_family(const struct seq_file *seq)
+{
+	const struct udp_seq_afinfo *afinfo;
+
+	/* BPF iterator: bpf programs to filter sockets. */
+	if (seq->op == &bpf_iter_udp_seq_ops)
+		return AF_UNSPEC;
+
+	/* Proc fs iterator */
+	afinfo = pde_data(file_inode(seq->file));
+	return afinfo->family;
+}
 #endif

 const struct seq_operations udp_seq_ops = {

From patchwork Thu Mar 30 15:17:55 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13194329
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com,
 Martin KaFai Lau
Subject: [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator
Date: Thu, 30 Mar 2023 15:17:55 +0000
Message-Id: <20230330151758.531170-5-aditi.ghag@isovalent.com>
In-Reply-To: <20230330151758.531170-1-aditi.ghag@isovalent.com>
References: <20230330151758.531170-1-aditi.ghag@isovalent.com>
X-Mailing-List: bpf@vger.kernel.org

Batch UDP sockets from the BPF iterator so that BPF/kernel helpers executed from BPF programs can use overlapping locking semantics. This allows the BPF socket destroy kfunc (introduced in follow-up patches) to execute from BPF iterator programs.

Previously, BPF iterators held the sock lock and the sockets hash table bucket lock while executing BPF programs, which prevented BPF helpers that acquire those same locks from being executed from BPF iterators. With the batching approach, the iterator acquires a bucket lock, batches all the sockets in that bucket, and then releases the bucket lock. This enables BPF or kernel helpers to skip sock locking when invoked in the supported BPF contexts.

The batching logic is similar to the logic implemented in the TCP iterator: https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com/.
Suggested-by: Martin KaFai Lau
Signed-off-by: Aditi Ghag
---
 net/ipv4/udp.c | 230 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 213 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cead4acb64c6..9af23d1c8d6b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3140,7 +3140,19 @@ struct bpf_iter__udp {
 	int bucket __aligned(8);
 };

+struct bpf_udp_iter_state {
+	struct udp_iter_state state;
+	unsigned int cur_sk;
+	unsigned int end_sk;
+	unsigned int max_sk;
+	int offset;
+	struct sock **batch;
+	bool st_bucket_done;
+};
+
 static unsigned short seq_file_family(const struct seq_file *seq);
+static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
+				      unsigned int new_batch_sz);

 static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
 {
@@ -3151,6 +3163,149 @@ static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
 			net_eq(sock_net(sk), seq_file_net(seq)));
 }

+static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
+{
+	struct bpf_udp_iter_state *iter = seq->private;
+	struct udp_iter_state *state = &iter->state;
+	struct net *net = seq_file_net(seq);
+	struct sock *first_sk = NULL;
+	struct udp_seq_afinfo afinfo;
+	struct udp_table *udptable;
+	unsigned int batch_sks = 0;
+	bool resized = false;
+	struct sock *sk;
+	int offset = 0;
+	int new_offset;
+
+	/* The current batch is done, so advance the bucket. */
+	if (iter->st_bucket_done) {
+		state->bucket++;
+		iter->offset = 0;
+	}
+
+	afinfo.family = AF_UNSPEC;
+	afinfo.udp_table = NULL;
+	udptable = udp_get_table_afinfo(&afinfo, net);
+
+	if (state->bucket > udptable->mask) {
+		state->bucket = 0;
+		iter->offset = 0;
+		return NULL;
+	}
+
+again:
+	/* New batch for the next bucket.
+	 * Iterate over the hash table to find a bucket with sockets matching
+	 * the iterator attributes, and return the first matching socket from
+	 * the bucket. The remaining matched sockets from the bucket are batched
+	 * before releasing the bucket lock. This allows BPF programs that are
+	 * called in seq_show to acquire the bucket lock if needed.
+	 */
+	iter->cur_sk = 0;
+	iter->end_sk = 0;
+	iter->st_bucket_done = false;
+	first_sk = NULL;
+	batch_sks = 0;
+	offset = iter->offset;
+
+	for (; state->bucket <= udptable->mask; state->bucket++) {
+		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
+
+		if (hlist_empty(&hslot2->head)) {
+			offset = 0;
+			continue;
+		}
+		new_offset = offset;
+
+		spin_lock_bh(&hslot2->lock);
+		udp_portaddr_for_each_entry(sk, &hslot2->head) {
+			if (seq_sk_match(seq, sk)) {
+				/* Resume from the last iterated socket at the
+				 * offset in the bucket before iterator was stopped.
+				 */
+				if (offset) {
+					--offset;
+					continue;
+				}
+				if (!first_sk)
+					first_sk = sk;
+				if (iter->end_sk < iter->max_sk) {
+					sock_hold(sk);
+					iter->batch[iter->end_sk++] = sk;
+				}
+				batch_sks++;
+				new_offset++;
+			}
+		}
+		spin_unlock_bh(&hslot2->lock);
+
+		if (first_sk)
+			break;
+
+		/* Reset the current bucket's offset before moving to the next bucket. */
+		offset = 0;
+	}
+
+	/* All done: no batch made. */
+	if (!first_sk)
+		goto ret;
+
+	if (iter->end_sk == batch_sks) {
+		/* Batching is done for the current bucket; return the first
+		 * socket to be iterated from the batch.
+		 */
+		iter->st_bucket_done = true;
+		goto ret;
+	}
+	if (!resized && !bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2)) {
+		resized = true;
+		/* Go back to the previous bucket to resize its batch. */
+		state->bucket--;
+		goto again;
+	}
+ret:
+	iter->offset = new_offset;
+	return first_sk;
+}
+
+static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpf_udp_iter_state *iter = seq->private;
+	struct sock *sk;
+
+	/* Whenever seq_next() is called, the iter->cur_sk is
+	 * done with seq_show(), so unref the iter->cur_sk.
+	 */
+	if (iter->cur_sk < iter->end_sk) {
+		sock_put(iter->batch[iter->cur_sk++]);
+		++iter->offset;
+	}
+
+	/* After updating iter->cur_sk, check if there are more sockets
+	 * available in the current bucket batch.
+	 */
+	if (iter->cur_sk < iter->end_sk) {
+		sk = iter->batch[iter->cur_sk];
+	} else {
+		/* Prepare a new batch. */
+		sk = bpf_iter_udp_batch(seq);
+	}
+
+	++*pos;
+	return sk;
+}
+
+static void *bpf_iter_udp_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	/* bpf iter does not support lseek, so it always
+	 * continues from where it was stop()-ped.
+	 */
+	if (*pos)
+		return bpf_iter_udp_batch(seq);
+
+	return SEQ_START_TOKEN;
+}
+
 static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
 			     struct udp_sock *udp_sk, uid_t uid, int bucket)
 {
@@ -3171,18 +3326,37 @@ static int bpf_iter_udp_seq_show(struct seq_file *seq, void *v)
 	struct bpf_prog *prog;
 	struct sock *sk = v;
 	uid_t uid;
+	int rc;

 	if (v == SEQ_START_TOKEN)
 		return 0;

+	lock_sock(sk);
+
+	if (unlikely(sk_unhashed(sk))) {
+		rc = SEQ_SKIP;
+		goto unlock;
+	}
+
 	uid = from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk));
 	meta.seq = seq;
 	prog = bpf_iter_get_info(&meta, false);
-	return udp_prog_seq_show(prog, &meta, v, uid, state->bucket);
+	rc = udp_prog_seq_show(prog, &meta, v, uid, state->bucket);
+
+unlock:
+	release_sock(sk);
+	return rc;
+}
+
+static void bpf_iter_udp_put_batch(struct bpf_udp_iter_state *iter)
+{
+	while (iter->cur_sk < iter->end_sk)
+		sock_put(iter->batch[iter->cur_sk++]);
 }

 static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
 {
+	struct bpf_udp_iter_state *iter = seq->private;
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;

@@ -3193,12 +3367,15 @@ static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
 		(void)udp_prog_seq_show(prog, &meta, v, 0, 0);
 	}

-	udp_seq_stop(seq, v);
+	if (iter->cur_sk < iter->end_sk) {
+		bpf_iter_udp_put_batch(iter);
+		iter->st_bucket_done = false;
+	}
 }

 static const struct seq_operations bpf_iter_udp_seq_ops = {
-	.start = udp_seq_start,
-	.next = udp_seq_next,
+	.start = bpf_iter_udp_seq_start,
+	.next = bpf_iter_udp_seq_next,
 	.stop = bpf_iter_udp_seq_stop,
 	.show = bpf_iter_udp_seq_show,
 };
@@ -3425,38 +3602,57 @@ static struct pernet_operations __net_initdata udp_sysctl_ops = {
 DEFINE_BPF_ITER_FUNC(udp, struct bpf_iter_meta *meta,
 		     struct udp_sock *udp_sk, uid_t uid, int bucket)

-static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
+static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
+				      unsigned int new_batch_sz)
 {
-	struct udp_iter_state *st = priv_data;
-	struct udp_seq_afinfo *afinfo;
-	int ret;
+	struct sock **new_batch;

-	afinfo = kmalloc(sizeof(*afinfo), GFP_USER | __GFP_NOWARN);
-	if (!afinfo)
+	new_batch = kvmalloc_array(new_batch_sz, sizeof(*new_batch),
+				   GFP_USER | __GFP_NOWARN);
+	if (!new_batch)
 		return -ENOMEM;

-	afinfo->family = AF_UNSPEC;
-	afinfo->udp_table = NULL;
-	st->bpf_seq_afinfo = afinfo;
+	bpf_iter_udp_put_batch(iter);
+	kvfree(iter->batch);
+	iter->batch = new_batch;
+	iter->max_sk = new_batch_sz;
+
+	return 0;
+}
+
+#define INIT_BATCH_SZ 16
+
+static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	struct bpf_udp_iter_state *iter = priv_data;
+	int ret;
+
 	ret = bpf_iter_init_seq_net(priv_data, aux);
 	if (ret)
-		kfree(afinfo);
+		return ret;
+
+	ret = bpf_iter_udp_realloc_batch(iter, INIT_BATCH_SZ);
+	if (ret) {
+		bpf_iter_fini_seq_net(priv_data);
+		return ret;
+	}
+
 	return ret;
 }

 static void bpf_iter_fini_udp(void *priv_data)
 {
-	struct udp_iter_state *st = priv_data;
+	struct bpf_udp_iter_state *iter = priv_data;

-	kfree(st->bpf_seq_afinfo);
 	bpf_iter_fini_seq_net(priv_data);
+	kvfree(iter->batch);
 }

 static const struct bpf_iter_seq_info udp_seq_info = {
 	.seq_ops = &bpf_iter_udp_seq_ops,
 	.init_seq_private = bpf_iter_init_udp,
 	.fini_seq_private = bpf_iter_fini_udp,
-	.seq_priv_size = sizeof(struct udp_iter_state),
+	.seq_priv_size = sizeof(struct bpf_udp_iter_state),
 };

 static struct
bpf_iter_reg udp_reg_info = {

From patchwork Thu Mar 30 15:17:56 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13194327
X-Patchwork-Delegate: bpf@iogearbox.net
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/Pe+L74dLf3bG/I8RXaN8we2k3/vEMVt3UylCiPqjW4=; b=5gUmY4rz67boro1Zo5yoQ0xBCEO1kiCG1+jTad5bDEzYpOSOoaIhoyDxCZVUeYQiLU lff9i4rzChUtiJr+ppqmwLMb7L32SgGbH7ZuBDl1nBQ35u1h2jTMeHss9EbcfvS1QYpE uGdfqa/aMLaUMs2ZOKFEuLpZKvf2uovGWidGiWgx32AOlmq6XCVzX8h6FTEy65EYRu6g HHYWnE78sSqthkU/1GbNeS1G6YWZMJgndkT4675pDTBKpmKDWBwLJFxG3mlxCFegkJYv 0niCJfF+htWHmc2Zm4H4xnwuDPphN2j5iNIi3RH69XGsR5B0JOpxHTk+Ubl3DdSsh8Q6 NzSA== X-Gm-Message-State: AO0yUKWNy/KpJiq6xU9JA3derNKPkD+9iJ9sSRwCSEORYgLeakPFRB9z cuMyadd20s6PuUReMjaAvBM8yS8mr7VW1z7cuGg= X-Google-Smtp-Source: AK7set/GWaPNzOkR3FYhQrAy7sstmaeQM3BUjIJ2aQkHiAzsVp6r6ZMFL8of7uU7sSmq0BZWs01mYg== X-Received: by 2002:a05:6a20:7788:b0:d9:18ab:16be with SMTP id c8-20020a056a20778800b000d918ab16bemr19660085pzg.29.1680189503425; Thu, 30 Mar 2023 08:18:23 -0700 (PDT) Received: from localhost.localdomain ([2604:1380:4611:8100::1]) by smtp.gmail.com with ESMTPSA id f17-20020a63de11000000b004fc1d91e695sm23401177pgg.79.2023.03.30.08.18.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Mar 2023 08:18:21 -0700 (PDT) From: Aditi Ghag To: bpf@vger.kernel.org Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com Subject: [PATCH v5 bpf-next 5/7] bpf: Add bpf_sock_destroy kfunc Date: Thu, 30 Mar 2023 15:17:56 +0000 Message-Id: <20230330151758.531170-6-aditi.ghag@isovalent.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230330151758.531170-1-aditi.ghag@isovalent.com> References: <20230330151758.531170-1-aditi.ghag@isovalent.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The socket destroy kfunc is used to forcefully terminate sockets from certain BPF contexts. 
We plan to use this capability in Cilium to force client sockets to
reconnect when their remote load-balancing backends are deleted. Another
use case is on-the-fly policy enforcement, where existing socket
connections that are now disallowed by policy need to be forcefully
terminated. The helper can terminate sockets whether or not they are
actively sending traffic.

The helper is currently exposed to certain BPF iterators, where users can
filter and terminate selected sockets. Additionally, it can only be
called from BPF contexts that ensure socket locking, so that destroy
handlers that also acquire socket locks can run synchronously. The
previous commit, which batches UDP sockets during iteration, enables this
synchronous invocation from BPF context by skipping the socket locks in
the destroy handler; TCP iterators already supported batching.

The helper takes a `sock_common` pointer as its argument, even though it
expects a full socket, and casts it to a `sock` pointer. This lets the
verifier accept calls to the sock_destroy kfunc both for TCP iterators,
which yield `sock_common`, and for UDP iterators, which yield `sock`. As
a comparison, BPF helpers enable this behavior with the
`ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type, but no such option exists
in the verifier logic that handles kfuncs, where BTF types are inferred.
Furthermore, since `sock_common` holds only a subset of the fields of
`sock`, casting a pointer to the latter type is not always safe for
certain sockets such as request sockets, but these get special handling
in the diag_destroy handlers.
Signed-off-by: Aditi Ghag
---
 net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp.c    | 10 ++++++---
 net/ipv4/udp.c    |  6 ++++--
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 3370efad1dda..a70c7b9876fa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11724,3 +11724,57 @@ static int __init bpf_kfunc_init(void)
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
 }
 late_initcall(bpf_kfunc_init);
+
+/* Disables missing prototype warnings */
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
+ *
+ * The helper expects a non-NULL pointer to a socket. It invokes the
+ * protocol specific socket destroy handlers.
+ *
+ * The helper can only be called from BPF contexts that have acquired the socket
+ * locks.
+ *
+ * Parameters:
+ * @sock: Pointer to socket to be destroyed
+ *
+ * Return:
+ * On error, may return EPROTONOSUPPORT, EINVAL.
+ * EPROTONOSUPPORT if protocol specific destroy handler is not implemented.
+ * 0 otherwise
+ */
+__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
+{
+	struct sock *sk = (struct sock *)sock;
+
+	if (!sk)
+		return -EINVAL;
+
+	/* The locking semantics that allow for synchronous execution of the
+	 * destroy handlers are only supported for TCP and UDP.
+	 */
+	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
+		return -EOPNOTSUPP;
+
+	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
+}
+
+__diag_pop()
+
+BTF_SET8_START(sock_destroy_kfunc_set)
+BTF_ID_FLAGS(func, bpf_sock_destroy)
+BTF_SET8_END(sock_destroy_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &sock_destroy_kfunc_set,
+};
+
+static int init_subsystem(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
+}
+late_initcall(init_subsystem);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 288693981b00..2259b4facc2f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4679,8 +4679,10 @@ int tcp_abort(struct sock *sk, int err)
 		return 0;
 	}

-	/* Don't race with userspace socket closes such as tcp_close. */
-	lock_sock(sk);
+	/* BPF context ensures sock locking. */
+	if (!has_current_bpf_ctx())
+		/* Don't race with userspace socket closes such as tcp_close. */
+		lock_sock(sk);

 	if (sk->sk_state == TCP_LISTEN) {
 		tcp_set_state(sk, TCP_CLOSE);
@@ -4702,9 +4704,11 @@ int tcp_abort(struct sock *sk, int err)
 	}

 	bh_unlock_sock(sk);
+	local_bh_enable();
 	tcp_write_queue_purge(sk);
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_abort);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9af23d1c8d6b..576a2ad272a7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll);

 int udp_abort(struct sock *sk, int err)
 {
-	lock_sock(sk);
+	if (!has_current_bpf_ctx())
+		lock_sock(sk);

 	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
 	 * with close()
@@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err)
 	__udp_disconnect(sk, 0);

 out:
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);

 	return 0;
 }

From patchwork Thu Mar 30 15:17:57 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13194328
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname
Date: Thu, 30 Mar 2023 15:17:57 +0000
Message-Id: <20230330151758.531170-7-aditi.ghag@isovalent.com>
In-Reply-To: <20230330151758.531170-1-aditi.ghag@isovalent.com>
References: <20230330151758.531170-1-aditi.ghag@isovalent.com>

The helper will be used to programmatically retrieve and pass ports in
userspace and kernel selftest programs.

Suggested-by: Stanislav Fomichev
Signed-off-by: Aditi Ghag
---
 tools/testing/selftests/bpf/network_helpers.c | 14 ++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 596caa176582..4c1dc7cf7390 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -427,3 +427,17 @@ void close_netns(struct nstoken *token)
 	close(token->orig_netns_fd);
 	free(token);
 }
+
+int get_sock_port6(int sock_fd, __u16 *out_port)
+{
+	struct sockaddr_in6 addr = {};
+	socklen_t addr_len = sizeof(addr);
+	int err;
+
+	err = getsockname(sock_fd, (struct sockaddr *)&addr, &addr_len);
+	if (err < 0)
+		return err;
+	*out_port = addr.sin6_port;
+
+	return err;
+}
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index f882c691b790..2ab3b50de0b7 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -56,6 +56,7 @@ int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
 int make_sockaddr(int family, const char *addr_str, __u16 port,
 		  struct sockaddr_storage *addr, socklen_t *len);
 char *ping_command(int family);
+int get_sock_port6(int sock_fd, __u16 *out_port);

 struct nstoken;
 /**

From patchwork Thu Mar 30 15:17:58 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13194330
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
Date: Thu, 30 Mar 2023 15:17:58 +0000
Message-Id: <20230330151758.531170-8-aditi.ghag@isovalent.com>
In-Reply-To: <20230330151758.531170-1-aditi.ghag@isovalent.com>
References: <20230330151758.531170-1-aditi.ghag@isovalent.com>

The test cases for destroying sockets mirror the intended usages of the
bpf_sock_destroy kfunc using iterators.
The destroy handlers set the `ECONNABORTED` error code, which we can
validate in the test code on client sockets. UDP sockets, however, end up
with an overriding error code from the disconnect performed during abort,
so the error-code validation is only done for TCP sockets.

Signed-off-by: Aditi Ghag
---
 .../selftests/bpf/prog_tests/sock_destroy.c   | 203 ++++++++++++++++++
 .../selftests/bpf/progs/sock_destroy_prog.c   | 147 +++++++++++++
 2 files changed, 350 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy.c
 create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sock_destroy.c b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
new file mode 100644
index 000000000000..d5d16fabac48
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+
+#include "sock_destroy_prog.skel.h"
+#include "network_helpers.h"
+
+static void start_iter_sockets(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+	char buf[50] = {};
+	int iter_fd, len;
+
+	link = bpf_program__attach_iter(prog, NULL);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+		goto free_link;
+
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	ASSERT_GE(len, 0, "read");
+
+	close(iter_fd);
+
+free_link:
+	bpf_link__destroy(link);
+}
+
+static void test_tcp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys connected client sockets. */
+	start_iter_sockets(skel->progs.iter_tcp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket");
+
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_tcp_server(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0, err;
+	__u16 serv_port = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+	err = get_sock_port6(serv, &serv_port);
+	if (!ASSERT_EQ(err, 0, "get_sock_port6"))
+		goto cleanup;
+	skel->bss->serv_port = ntohs(serv_port);
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys server sockets. */
+	start_iter_sockets(skel->progs.iter_tcp6_server);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNRESET, "error code on destroyed socket");
+
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+
+static void test_udp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_DGRAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys sockets. */
+	start_iter_sockets(skel->progs.iter_udp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	/* UDP sockets have an overriding error code after they are disconnected,
+	 * so we don't check for ECONNABORTED error code.
+	 */
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_udp_server(struct sock_destroy_prog *skel)
+{
+	int *listen_fds = NULL, n, i, err;
+	unsigned int num_listens = 5;
+	char buf[1];
+	__u16 serv_port;
+
+	/* Start reuseport servers. */
+	listen_fds = start_reuseport_server(AF_INET6, SOCK_DGRAM,
+					    "::1", 0, 0, num_listens);
+	if (!ASSERT_OK_PTR(listen_fds, "start_reuseport_server"))
+		goto cleanup;
+	err = get_sock_port6(listen_fds[0], &serv_port);
+	if (!ASSERT_EQ(err, 0, "get_sock_port6"))
+		goto cleanup;
+	skel->bss->serv_port = ntohs(serv_port);
+
+	/* Run iterator program that destroys server sockets. */
+	start_iter_sockets(skel->progs.iter_udp6_server);
+
+	for (i = 0; i < num_listens; ++i) {
+		n = read(listen_fds[i], buf, sizeof(buf));
+		if (!ASSERT_EQ(n, -1, "read") ||
+		    !ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket"))
+			break;
+	}
+	ASSERT_EQ(i, num_listens, "server socket");
+
+cleanup:
+	free_fds(listen_fds, num_listens);
+}
+
+void test_sock_destroy(void)
+{
+	struct sock_destroy_prog *skel;
+	int cgroup_fd = 0;
+
+	skel = sock_destroy_prog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	cgroup_fd = test__join_cgroup("/sock_destroy");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto close_cgroup_fd;
+
+	skel->links.sock_connect = bpf_program__attach_cgroup(
+		skel->progs.sock_connect, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.sock_connect, "prog_attach"))
+		goto close_cgroup_fd;
+
+	if (test__start_subtest("tcp_client"))
+		test_tcp_client(skel);
+	if (test__start_subtest("tcp_server"))
+		test_tcp_server(skel);
+	if (test__start_subtest("udp_client"))
+		test_udp_client(skel);
+	if (test__start_subtest("udp_server"))
+		test_udp_server(skel);
+
+
+close_cgroup_fd:
+	close(cgroup_fd);
+	sock_destroy_prog__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
new file mode 100644
index 000000000000..5c1e65d50598
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include
+#include
+
+#include "bpf_tracing_net.h"
+
+#define AF_INET6 10
+
+__u16 serv_port = 0;
+
+int bpf_sock_destroy(struct sock_common *sk) __ksym;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} tcp_conn_sockets SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} udp_conn_sockets SEC(".maps");
+
+SEC("cgroup/connect6")
+int sock_connect(struct bpf_sock_addr *ctx)
+{
+	int key = 0;
+	__u64 sock_cookie = 0;
+	__u32 keyc = 0;
+
+	if (ctx->family != AF_INET6 || ctx->user_family != AF_INET6)
+		return 1;
+
+	sock_cookie = bpf_get_socket_cookie(ctx);
+	if (ctx->protocol == IPPROTO_TCP)
+		bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
+	else if (ctx->protocol == IPPROTO_UDP)
+		bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
+	else
+		return 1;
+
+	return 1;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_client(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	__u64 sock_cookie = 0;
+	__u64 *val;
+	int key = 0;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	sock_cookie = bpf_get_socket_cookie(sk_common);
+	val = bpf_map_lookup_elem(&tcp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets. */
+	if (sock_cookie == *val)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_server(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	struct tcp6_sock *tcp_sk;
+	const struct inet_connection_sock *icsk;
+	const struct inet_sock *inet;
+	__u16 srcp;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	tcp_sk = bpf_skc_to_tcp6_sock(sk_common);
+	if (!tcp_sk)
+		return 0;
+
+	icsk = &tcp_sk->tcp.inet_conn;
+	inet = &icsk->icsk_inet;
+	srcp = bpf_ntohs(inet->inet_sport);
+
+	/* Destroy server sockets. */
+	if (srcp == serv_port)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+
+SEC("iter/udp")
+int iter_udp6_client(struct bpf_iter__udp *ctx)
+{
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+	__u64 sock_cookie = 0, *val;
+	int key = 0;
+
+	if (!sk)
+		return 0;
+
+	sock_cookie = bpf_get_socket_cookie(sk);
+	val = bpf_map_lookup_elem(&udp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets. */
+	if (sock_cookie == *val)
+		bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+SEC("iter/udp")
+int iter_udp6_server(struct bpf_iter__udp *ctx)
+{
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+	__u16 srcp;
+	struct inet_sock *inet;
+
+	if (!sk)
+		return 0;
+
+	inet = &udp_sk->inet;
+	srcp = bpf_ntohs(inet->inet_sport);
+	if (srcp == serv_port)
+		bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";