From patchwork Thu Mar 13 23:35:25 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14016084
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC
Date: Thu, 13 Mar 2025 23:35:25 +0000
In-Reply-To: <20250313233615.2329869-1-jrife@google.com>
References: <20250313233615.2329869-1-jrife@google.com>
Message-ID: <20250313233615.2329869-2-jrife@google.com>
Subject: [RFC PATCH bpf-next 1/3] bpf: udp: Avoid socket skips during iteration
From: Jordan Rife <jrife@google.com>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Daniel Borkmann, Martin KaFai Lau, Yonghong Song, Aditi Ghag

Replace the offset-based approach for tracking progress through a
bucket in the UDP table with one based on unique, monotonically
increasing index numbers associated with each socket in a bucket.

Signed-off-by: Jordan Rife <jrife@google.com>
---
 include/net/sock.h |  2 ++
 include/net/udp.h  |  1 +
 net/ipv4/udp.c     | 38 +++++++++++++++++++++++++-------------
 3 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8036b3b79cd8..b11f43e8e7ec 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -228,6 +228,7 @@ struct sock_common {
 		u32		skc_window_clamp;
 		u32		skc_tw_snd_nxt; /* struct tcp_timewait_sock */
 	};
+	__s64		skc_idx;
 	/* public: */
 };
@@ -378,6 +379,7 @@ struct sock {
 #define sk_incoming_cpu		__sk_common.skc_incoming_cpu
 #define sk_flags		__sk_common.skc_flags
 #define sk_rxhash		__sk_common.skc_rxhash
+#define sk_idx			__sk_common.skc_idx
 
 	__cacheline_group_begin(sock_write_rx);
diff --git a/include/net/udp.h b/include/net/udp.h
index 6e89520e100d..9398561addc6 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -102,6 +102,7 @@ struct udp_table {
 #endif
 	unsigned int		mask;
 	unsigned int		log;
+	atomic64_t		ver;
 };
 
 extern struct udp_table udp_table;
 void udp_table_init(struct udp_table *, const char *);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index a9bb9ce5438e..d7e9b3346983 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -229,6 +229,11 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot)
 	return reuseport_alloc(sk, inet_rcv_saddr_any(sk));
 }
 
+static inline __s64 udp_table_next_idx(struct udp_table *udptable, bool pos)
+{
+	return (pos ? 1 : -1) * atomic64_inc_return(&udptable->ver);
+}
+
 /**
  *	udp_lib_get_port  -  UDP/-Lite port lookup for IPv4 and IPv6
  *
@@ -244,6 +249,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 	struct udp_hslot *hslot, *hslot2;
 	struct net *net = sock_net(sk);
 	int error = -EADDRINUSE;
+	bool add_tail;
 
 	if (!snum) {
 		DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
@@ -335,14 +341,16 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
 		spin_lock(&hslot2->lock);
-		if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
-		    sk->sk_family == AF_INET6)
+		add_tail = IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
+			   sk->sk_family == AF_INET6;
+		if (add_tail)
 			hlist_add_tail_rcu(&udp_sk(sk)->udp_portaddr_node,
 					   &hslot2->head);
 		else
 			hlist_add_head_rcu(&udp_sk(sk)->udp_portaddr_node,
 					   &hslot2->head);
 		hslot2->count++;
+		sk->sk_idx = udp_table_next_idx(udptable, add_tail);
 		spin_unlock(&hslot2->lock);
 	}
@@ -2250,6 +2258,8 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
 				hlist_add_head_rcu(&udp_sk(sk)->udp_portaddr_node,
 						   &nhslot2->head);
 				nhslot2->count++;
+				sk->sk_idx = udp_table_next_idx(udptable,
+								false);
 				spin_unlock(&nhslot2->lock);
 			}
@@ -3390,9 +3400,9 @@ struct bpf_udp_iter_state {
 	unsigned int cur_sk;
 	unsigned int end_sk;
 	unsigned int max_sk;
-	int offset;
 	struct sock **batch;
 	bool st_bucket_done;
+	__s64 prev_idx;
 };
 
 static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
@@ -3402,14 +3412,13 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
 	struct bpf_udp_iter_state *iter = seq->private;
 	struct udp_iter_state *state = &iter->state;
 	struct net *net = seq_file_net(seq);
-	int resume_bucket, resume_offset;
 	struct udp_table *udptable;
 	unsigned int batch_sks = 0;
 	bool resized = false;
+	int resume_bucket;
 	struct sock *sk;
 
 	resume_bucket = state->bucket;
-	resume_offset = iter->offset;
 
 	/* The current batch is done, so advance the bucket. */
 	if (iter->st_bucket_done)
@@ -3436,18 +3445,19 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
 		if (hlist_empty(&hslot2->head))
 			continue;
 
-		iter->offset = 0;
 		spin_lock_bh(&hslot2->lock);
+		/* Reset prev_idx if this is a new bucket. */
+		if (!resume_bucket || state->bucket != resume_bucket)
+			iter->prev_idx = 0;
 		udp_portaddr_for_each_entry(sk, &hslot2->head) {
 			if (seq_sk_match(seq, sk)) {
-				/* Resume from the last iterated socket at the
-				 * offset in the bucket before iterator was stopped.
+				/* Resume from the first socket that we didn't
+				 * see last time around.
 				 */
 				if (state->bucket == resume_bucket &&
-				    iter->offset < resume_offset) {
-					++iter->offset;
+				    iter->prev_idx &&
+				    sk->sk_idx <= iter->prev_idx)
 					continue;
-				}
 				if (iter->end_sk < iter->max_sk) {
 					sock_hold(sk);
 					iter->batch[iter->end_sk++] = sk;
@@ -3492,8 +3502,9 @@ static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	 * done with seq_show(), so unref the iter->cur_sk.
 	 */
 	if (iter->cur_sk < iter->end_sk) {
-		sock_put(iter->batch[iter->cur_sk++]);
-		++iter->offset;
+		sk = iter->batch[iter->cur_sk++];
+		iter->prev_idx = sk->sk_idx;
+		sock_put(sk);
 	}
 
 	/* After updating iter->cur_sk, check if there are more sockets
@@ -3740,6 +3751,7 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
 	udptable->hash2 = (void *)(udptable->hash + hash_entries);
 	udptable->mask = hash_entries - 1;
 	udptable->log = ilog2(hash_entries);
+	atomic64_set(&udptable->ver, 0);
 
 	for (i = 0; i < hash_entries; i++) {
 		INIT_HLIST_HEAD(&udptable->hash[i].head);
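
The effect of the new scheme is easiest to see outside the kernel. Below
is a small userspace sketch of the same resume logic -- illustrative
only, with made-up names (ver, insert_head(), resume()) standing in for
udptable->ver, hlist_add_head_rcu(), and the skip loop in
bpf_iter_udp_batch(). Head inserts take negative indices of growing
magnitude, and resuming skips everything at or below the last index
seen, so removing an already-seen socket (or prepending a new one) no
longer shifts the resume point the way offset counting did:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	int64_t idx;
	struct node *next;
};

static int64_t ver;	/* stands in for udptable->ver */

/* Mirror of udp_table_next_idx(): head inserts get negative indices. */
static struct node *insert_head(struct node *head)
{
	struct node *n = malloc(sizeof(*n));

	n->idx = -(++ver);
	n->next = head;
	return n;
}

/* Mirror of the skip loop: find the first entry not yet seen. */
static struct node *resume(struct node *head, int64_t prev_idx)
{
	while (head && prev_idx && head->idx <= prev_idx)
		head = head->next;
	return head;
}

int main(void)
{
	struct node *bucket = NULL;
	int64_t prev_idx;
	int i;

	for (i = 0; i < 4; i++)
		bucket = insert_head(bucket);	/* list: -4 -3 -2 -1 */

	prev_idx = bucket->next->idx;		/* saw -4 and -3, stopped */

	/* Prepending another entry (or removing -4) does not disturb the
	 * resume point: the next unseen entry is still -2.
	 */
	bucket = insert_head(bucket);		/* list: -5 -4 -3 -2 -1 */
	printf("resume at %lld\n",
	       (long long)resume(bucket, prev_idx)->idx);	/* -2 */
	return 0;
}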
From patchwork Thu Mar 13 23:35:26 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14016085
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC
Date: Thu, 13 Mar 2025 23:35:26 +0000
In-Reply-To: <20250313233615.2329869-1-jrife@google.com>
References: <20250313233615.2329869-1-jrife@google.com>
Message-ID: <20250313233615.2329869-3-jrife@google.com>
Subject: [RFC PATCH bpf-next 2/3] bpf: tcp: Avoid socket skips during iteration
From: Jordan Rife <jrife@google.com>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Daniel Borkmann, Martin KaFai Lau, Yonghong Song, Aditi Ghag

Replace the offset-based approach for tracking progress through a
bucket in the TCP table with one based on unique, monotonically
increasing index numbers associated with each socket in a bucket.

Signed-off-by: Jordan Rife <jrife@google.com>
---
 include/net/inet_hashtables.h |  2 ++
 include/net/tcp.h             |  3 ++-
 net/ipv4/inet_hashtables.c    | 18 +++++++++++++++---
 net/ipv4/tcp.c                |  1 +
 net/ipv4/tcp_ipv4.c           | 29 ++++++++++++++++-------------
 5 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 5eea47f135a4..c95d3b1da199 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -172,6 +172,8 @@ struct inet_hashinfo {
 	struct inet_listen_hashbucket	*lhash2;
 
 	bool				pernet;
+
+	atomic64_t			ver;
 } ____cacheline_aligned_in_smp;
 
 static inline struct inet_hashinfo *tcp_or_dccp_get_hashinfo(const struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2d08473a6dc0..499acd6da35f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2202,7 +2202,8 @@ struct tcp_iter_state {
 	struct seq_net_private	p;
 	enum tcp_seq_states	state;
 	struct sock		*syn_wait_sk;
-	int			bucket, offset, sbucket, num;
+	int			bucket, sbucket, num;
+	__s64			prev_idx;
 	loff_t			last_pos;
 };
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18..bc9f58172790 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -534,6 +534,12 @@ struct sock *__inet_lookup_established(const struct net *net,
 }
 EXPORT_SYMBOL_GPL(__inet_lookup_established);
 
+static inline __s64 inet_hashinfo_next_idx(struct inet_hashinfo *hinfo,
+					   bool pos)
+{
+	return (pos ? 1 : -1) * atomic64_inc_return(&hinfo->ver);
+}
+
 /* called with local bh disabled */
 static int __inet_check_established(struct inet_timewait_death_row *death_row,
 				    struct sock *sk, __u16 lport,
@@ -581,6 +587,7 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
 	sk->sk_hash = hash;
 	WARN_ON(!sk_unhashed(sk));
 	__sk_nulls_add_node_rcu(sk, &head->chain);
+	sk->sk_idx = inet_hashinfo_next_idx(hinfo, false);
 	if (tw) {
 		sk_nulls_del_node_init_rcu((struct sock *)tw);
 		__NET_INC_STATS(net, LINUX_MIB_TIMEWAITRECYCLED);
@@ -678,8 +685,10 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
 			ret = false;
 	}
 
-	if (ret)
+	if (ret) {
 		__sk_nulls_add_node_rcu(sk, list);
+		sk->sk_idx = inet_hashinfo_next_idx(hashinfo, false);
+	}
 
 	spin_unlock(lock);
 
@@ -729,6 +738,7 @@ int __inet_hash(struct sock *sk, struct sock *osk)
 {
 	struct inet_hashinfo *hashinfo = tcp_or_dccp_get_hashinfo(sk);
 	struct inet_listen_hashbucket *ilb2;
+	bool add_tail;
 	int err = 0;
 
 	if (sk->sk_state != TCP_LISTEN) {
@@ -747,11 +757,13 @@ int __inet_hash(struct sock *sk, struct sock *osk)
 		goto unlock;
 	}
 	sock_set_flag(sk, SOCK_RCU_FREE);
-	if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
-	    sk->sk_family == AF_INET6)
+	add_tail = IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
+		   sk->sk_family == AF_INET6;
+	if (add_tail)
 		__sk_nulls_add_node_tail_rcu(sk, &ilb2->nulls_head);
 	else
 		__sk_nulls_add_node_rcu(sk, &ilb2->nulls_head);
+	sk->sk_idx = inet_hashinfo_next_idx(hashinfo, add_tail);
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
 unlock:
 	spin_unlock(&ilb2->lock);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 285678d8ce07..63693af0c05c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -5147,6 +5147,7 @@ void __init tcp_init(void)
 
 	cnt = tcp_hashinfo.ehash_mask + 1;
 	sysctl_tcp_max_orphans = cnt / 2;
+	atomic64_set(&tcp_hashinfo.ver, 0);
 
 	tcp_init_mem();
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 2632844d2c35..d0ddb307e2a1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2602,7 +2602,7 @@ static void *listening_get_first(struct seq_file *seq)
 	struct inet_hashinfo *hinfo =
 		seq_file_net(seq)->ipv4.tcp_death_row.hashinfo;
 	struct tcp_iter_state *st = seq->private;
 
-	st->offset = 0;
+	st->prev_idx = 0;
 	for (; st->bucket <= hinfo->lhash2_mask; st->bucket++) {
 		struct inet_listen_hashbucket *ilb2;
 		struct hlist_nulls_node *node;
@@ -2637,7 +2637,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
 	struct sock *sk = cur;
 
 	++st->num;
-	++st->offset;
+	st->prev_idx = sk->sk_idx;
 
 	sk = sk_nulls_next(sk);
 	sk_nulls_for_each_from(sk, node) {
@@ -2658,7 +2658,6 @@ static void *listening_get_idx(struct seq_file *seq, loff_t *pos)
 	void *rc;
 
 	st->bucket = 0;
-	st->offset = 0;
 	rc = listening_get_first(seq);
 
 	while (rc && *pos) {
@@ -2683,7 +2682,7 @@ static void *established_get_first(struct seq_file *seq)
 	struct inet_hashinfo *hinfo =
 		seq_file_net(seq)->ipv4.tcp_death_row.hashinfo;
 	struct tcp_iter_state *st = seq->private;
 
-	st->offset = 0;
+	st->prev_idx = 0;
 	for (; st->bucket <= hinfo->ehash_mask; ++st->bucket) {
 		struct sock *sk;
 		struct hlist_nulls_node *node;
@@ -2714,7 +2713,7 @@ static void *established_get_next(struct seq_file *seq, void *cur)
 	struct sock *sk = cur;
 
 	++st->num;
-	++st->offset;
+	st->prev_idx = sk->sk_idx;
 
 	sk = sk_nulls_next(sk);
 
@@ -2763,8 +2762,8 @@ static void *tcp_seek_last_pos(struct seq_file *seq)
 {
 	struct inet_hashinfo *hinfo =
 		seq_file_net(seq)->ipv4.tcp_death_row.hashinfo;
 	struct tcp_iter_state *st = seq->private;
+	__s64 prev_idx = st->prev_idx;
 	int bucket = st->bucket;
-	int offset = st->offset;
 	int orig_num = st->num;
 	void *rc = NULL;
 
@@ -2773,18 +2772,21 @@ static void *tcp_seek_last_pos(struct seq_file *seq)
 		if (st->bucket > hinfo->lhash2_mask)
 			break;
 		rc = listening_get_first(seq);
-		while (offset-- && rc && bucket == st->bucket)
+		while (rc && bucket == st->bucket && prev_idx &&
+		       ((struct sock *)rc)->sk_idx <= prev_idx)
 			rc = listening_get_next(seq, rc);
 		if (rc)
 			break;
 		st->bucket = 0;
+		prev_idx = 0;
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 		fallthrough;
 	case TCP_SEQ_STATE_ESTABLISHED:
 		if (st->bucket > hinfo->ehash_mask)
 			break;
 		rc = established_get_first(seq);
-		while (offset-- && rc && bucket == st->bucket)
+		while (rc && bucket == st->bucket && prev_idx &&
+		       ((struct sock *)rc)->sk_idx <= prev_idx)
 			rc = established_get_next(seq, rc);
 	}
 
@@ -2807,7 +2809,7 @@ void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
 	st->state = TCP_SEQ_STATE_LISTENING;
 	st->num = 0;
 	st->bucket = 0;
-	st->offset = 0;
+	st->prev_idx = 0;
 	rc = *pos ? tcp_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 
 out:
@@ -2832,7 +2834,7 @@ void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 		if (!rc) {
 			st->state = TCP_SEQ_STATE_ESTABLISHED;
 			st->bucket = 0;
-			st->offset = 0;
+			st->prev_idx = 0;
 			rc = established_get_first(seq);
 		}
 		break;
@@ -3124,7 +3126,7 @@ static struct sock *bpf_iter_tcp_batch(struct seq_file *seq)
 	 * it has to advance to the next bucket.
 	 */
 	if (iter->st_bucket_done) {
-		st->offset = 0;
+		st->prev_idx = 0;
 		st->bucket++;
 		if (st->state == TCP_SEQ_STATE_LISTENING &&
 		    st->bucket > hinfo->lhash2_mask) {
@@ -3192,9 +3194,10 @@ static void *bpf_iter_tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
-		 * the future start() will resume at st->offset in
+		 * the future start() will resume at st->prev_idx in
 		 * st->bucket. See tcp_seek_last_pos().
 		 */
-		st->offset++;
-		sock_gen_put(iter->batch[iter->cur_sk++]);
+		sk = iter->batch[iter->cur_sk++];
+		st->prev_idx = sk->sk_idx;
+		sock_gen_put(sk);
 	}
 
 	if (iter->cur_sk < iter->end_sk)
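
A property worth noting: head inserts take negative indices of growing
magnitude while tail inserts take positive ones, so sk_idx ends up
strictly increasing in bucket traversal order. That invariant is what
the sk_idx <= prev_idx checks in tcp_seek_last_pos() (and the UDP
equivalent) rely on. A minimal standalone check -- illustrative only,
not kernel code:

#include <assert.h>
#include <stdint.h>

#define NNODES 8

int main(void)
{
	int64_t ver = 0, list[NNODES];
	int head = NNODES / 2, tail = NNODES / 2, i;

	/* Alternate head and tail inserts into a conceptual bucket. */
	for (i = 0; i < NNODES / 2; i++) {
		list[--head] = -(++ver);	/* add_node_tail == false */
		list[tail++] = ++ver;		/* add_node_tail == true */
	}

	/* Traversal order yields strictly increasing indices, so skipping
	 * while idx <= prev_idx resumes exactly after the last seen node.
	 */
	for (i = 1; i < NNODES; i++)
		assert(list[i - 1] < list[i]);
	return 0;
}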
From patchwork Thu Mar 13 23:35:27 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14016086
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC
Date: Thu, 13 Mar 2025 23:35:27 +0000
In-Reply-To: <20250313233615.2329869-1-jrife@google.com>
References: <20250313233615.2329869-1-jrife@google.com>
Message-ID: <20250313233615.2329869-4-jrife@google.com>
Subject: [RFC PATCH bpf-next 3/3] selftests/bpf: Add tests for socket skips and repeats
From: Jordan Rife <jrife@google.com>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Daniel Borkmann, Martin KaFai Lau, Yonghong Song, Aditi Ghag

Add do_skip_test() and do_repeat_test() subtests to the sock_iter_batch
prog_test to check for socket skips and repeats, respectively. Extend
the sock_iter_batch BPF program to output the socket cookie as well, so
that we can check for uniqueness.

The skip test works by partially iterating through a bucket, then
closing one of the sockets that has already been seen to remove it from
the bucket. Before, this would have resulted in skipping the fourth
socket; now, the fourth socket is seen.

The repeat test works by partially iterating through a bucket, then
adding four more sockets to the head of the bucket. Before, this would
have resulted in repeating several of the sockets from the first batch;
now, each socket is seen exactly once.
Signed-off-by: Jordan Rife <jrife@google.com>
---
 .../bpf/prog_tests/sock_iter_batch.c          | 293 +++++++++++++++++-
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../selftests/bpf/progs/sock_iter_batch.c     |  24 +-
 3 files changed, 300 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c b/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c
index d56e18b25528..cf554e8fdf79 100644
--- a/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c
+++ b/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c
@@ -6,15 +6,275 @@
 #include "sock_iter_batch.skel.h"
 
 #define TEST_NS "sock_iter_batch_netns"
+#define nr_soreuse 4
 
-static const int nr_soreuse = 4;
+static const __u16 reuse_port = 10001;
+
+struct iter_out {
+	int idx;
+	__u64 cookie;
+} __packed;
+
+struct sock_count {
+	__u64 cookie;
+	int count;
+};
+
+static int insert(__u64 cookie, struct sock_count counts[], int counts_len)
+{
+	int insert = -1;
+	int i = 0;
+
+	for (; i < counts_len; i++) {
+		if (!counts[i].cookie) {
+			insert = i;
+		} else if (counts[i].cookie == cookie) {
+			insert = i;
+			break;
+		}
+	}
+	if (insert < 0)
+		return insert;
+
+	counts[insert].cookie = cookie;
+	counts[insert].count++;
+
+	return counts[insert].count;
+}
+
+static int read_n(int iter_fd, int n, struct sock_count counts[],
+		  int counts_len)
+{
+	struct iter_out out;
+	int nread = 1;
+	int i = 0;
+
+	for (; nread > 0 && (n < 0 || i < n); i++) {
+		nread = read(iter_fd, &out, sizeof(out));
+		if (!nread || !ASSERT_GE(nread, 1, "nread"))
+			break;
+		ASSERT_GE(insert(out.cookie, counts, counts_len), 0, "insert");
+	}
+
+	ASSERT_TRUE(n < 0 || i == n, "n < 0 || i == n");
+
+	return i;
+}
+
+static __u64 socket_cookie(int fd)
+{
+	__u64 cookie;
+	socklen_t cookie_len = sizeof(cookie);
+	static __u32 duration;	/* for CHECK macro */
+
+	if (CHECK(getsockopt(fd, SOL_SOCKET, SO_COOKIE, &cookie, &cookie_len) < 0,
+		  "getsockopt(SO_COOKIE)", "%s\n", strerror(errno)))
+		return 0;
+	return cookie;
+}
+
+static bool was_seen(int fd, struct sock_count counts[], int counts_len)
+{
+	__u64 cookie = socket_cookie(fd);
+	int i = 0;
+
+	for (; cookie && i < counts_len; i++)
+		if (cookie == counts[i].cookie)
+			return true;
+
+	return false;
+}
+
+static int get_seen_socket(int *fds, struct sock_count counts[], int n)
+{
+	int i = 0;
+
+	for (; i < n; i++)
+		if (was_seen(fds[i], counts, n))
+			return i;
+	return -1;
+}
+
+static int get_seen_count(int fd, struct sock_count counts[], int n)
+{
+	__u64 cookie = socket_cookie(fd);
+	int count = 0;
+	int i = 0;
+
+	for (; cookie && !count && i < n; i++)
+		if (cookie == counts[i].cookie)
+			count = counts[i].count;
+
+	return count;
+}
+
+static void check_n_were_seen_once(int *fds, int fds_len, int n,
+				   struct sock_count counts[], int counts_len)
+{
+	int seen_once = 0;
+	int seen_cnt;
+	int i = 0;
+
+	for (; i < fds_len; i++) {
+		/* Skip any sockets that were closed or that weren't seen
+		 * exactly once.
+		 */
+		if (fds[i] < 0)
+			continue;
+		seen_cnt = get_seen_count(fds[i], counts, counts_len);
+		if (seen_cnt && ASSERT_EQ(seen_cnt, 1, "seen_cnt"))
+			seen_once++;
+	}
+
+	ASSERT_EQ(seen_once, n, "seen_once");
+}
+
+static void do_skip_test(int sock_type)
+{
+	struct sock_count counts[nr_soreuse] = {};
+	struct bpf_link *link = NULL;
+	struct sock_iter_batch *skel;
+	int err, iter_fd = -1;
+	int close_idx;
+	int *fds;
+
+	skel = sock_iter_batch__open();
+	if (!ASSERT_OK_PTR(skel, "sock_iter_batch__open"))
+		return;
+
+	/* Prepare a bucket of sockets in the kernel hashtable */
+	int local_port;
+
+	fds = start_reuseport_server(AF_INET, sock_type, "127.0.0.1", 0, 0,
+				     nr_soreuse);
+	if (!ASSERT_OK_PTR(fds, "start_reuseport_server"))
+		goto done;
+	local_port = get_socket_local_port(*fds);
+	if (!ASSERT_GE(local_port, 0, "get_socket_local_port"))
+		goto done;
+	skel->rodata->ports[0] = ntohs(local_port);
+	skel->rodata->sf = AF_INET;
+
+	err = sock_iter_batch__load(skel);
+	if (!ASSERT_OK(err, "sock_iter_batch__load"))
+		goto done;
+
+	link = bpf_program__attach_iter(sock_type == SOCK_STREAM ?
+					skel->progs.iter_tcp_soreuse :
+					skel->progs.iter_udp_soreuse,
+					NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_iter"))
+		goto done;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "bpf_iter_create"))
+		goto done;
+
+	/* Iterate through the first three sockets. */
+	read_n(iter_fd, nr_soreuse - 1, counts, nr_soreuse);
+
+	/* Make sure we saw three sockets from fds exactly once. */
+	check_n_were_seen_once(fds, nr_soreuse, nr_soreuse - 1, counts,
+			       nr_soreuse);
+
+	/* Close a socket we've already seen to remove it from the bucket. */
+	close_idx = get_seen_socket(fds, counts, nr_soreuse);
+	if (!ASSERT_GE(close_idx, 0, "close_idx"))
+		goto done;
+	close(fds[close_idx]);
+	fds[close_idx] = -1;
+
+	/* Iterate through the rest of the sockets. */
+	read_n(iter_fd, -1, counts, nr_soreuse);
+
+	/* Make sure the last socket wasn't skipped and that there were no
+	 * repeats.
+	 */
+	check_n_were_seen_once(fds, nr_soreuse, nr_soreuse - 1, counts,
+			       nr_soreuse);
+done:
+	free_fds(fds, nr_soreuse);
+	if (iter_fd >= 0)
+		close(iter_fd);
+	bpf_link__destroy(link);
+	sock_iter_batch__destroy(skel);
+}
+
+static void do_repeat_test(int sock_type)
+{
+	struct sock_count counts[nr_soreuse] = {};
+	struct bpf_link *link = NULL;
+	struct sock_iter_batch *skel;
+	int err, i, iter_fd = -1;
+	int *fds[2] = {};
+
+	skel = sock_iter_batch__open();
+	if (!ASSERT_OK_PTR(skel, "sock_iter_batch__open"))
+		return;
+
+	/* Prepare a bucket of sockets in the kernel hashtable */
+	int local_port;
+
+	fds[0] = start_reuseport_server(AF_INET, sock_type, "127.0.0.1",
+					reuse_port, 0, nr_soreuse);
+	if (!ASSERT_OK_PTR(fds[0], "start_reuseport_server"))
+		goto done;
+	local_port = get_socket_local_port(*fds[0]);
+	if (!ASSERT_GE(local_port, 0, "get_socket_local_port"))
+		goto done;
+	skel->rodata->ports[0] = ntohs(local_port);
+	skel->rodata->sf = AF_INET;
+
+	err = sock_iter_batch__load(skel);
+	if (!ASSERT_OK(err, "sock_iter_batch__load"))
+		goto done;
+
+	link = bpf_program__attach_iter(sock_type == SOCK_STREAM ?
+					skel->progs.iter_tcp_soreuse :
+					skel->progs.iter_udp_soreuse,
+					NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_iter"))
+		goto done;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "bpf_iter_create"))
+		goto done;
+
+	/* Iterate through the first three sockets */
+	read_n(iter_fd, nr_soreuse - 1, counts, nr_soreuse);
+
+	/* Make sure we saw three sockets from fds exactly once. */
+	check_n_were_seen_once(fds[0], nr_soreuse, nr_soreuse - 1, counts,
+			       nr_soreuse);
+
+	/* Add nr_soreuse more sockets to the bucket. */
+	fds[1] = start_reuseport_server(AF_INET, sock_type, "127.0.0.1",
+					reuse_port, 0, nr_soreuse);
+	if (!ASSERT_OK_PTR(fds[1], "start_reuseport_server"))
+		goto done;
+
+	/* Iterate through the rest of the sockets. */
+	read_n(iter_fd, -1, counts, nr_soreuse);
+
+	/* Make sure each socket from the first set was seen exactly once. */
+	check_n_were_seen_once(fds[0], nr_soreuse, nr_soreuse, counts,
+			       nr_soreuse);
done:
+	for (i = 0; i < ARRAY_SIZE(fds); i++)
+		free_fds(fds[i], nr_soreuse);
+	if (iter_fd >= 0)
+		close(iter_fd);
+	bpf_link__destroy(link);
+	sock_iter_batch__destroy(skel);
+}
 
 static void do_test(int sock_type, bool onebyone)
 {
 	int err, i, nread, to_read, total_read, iter_fd = -1;
-	int first_idx, second_idx, indices[nr_soreuse];
+	struct iter_out outputs[nr_soreuse];
 	struct bpf_link *link = NULL;
 	struct sock_iter_batch *skel;
+	int first_idx, second_idx;
 	int *fds[2] = {};
 
 	skel = sock_iter_batch__open();
@@ -34,6 +294,7 @@ static void do_test(int sock_type, bool onebyone)
 			goto done;
 		skel->rodata->ports[i] = ntohs(local_port);
 	}
+	skel->rodata->sf = AF_INET6;
 
 	err = sock_iter_batch__load(skel);
 	if (!ASSERT_OK(err, "sock_iter_batch__load"))
@@ -55,38 +316,38 @@ static void do_test(int sock_type, bool onebyone)
 	 * from a bucket and leave one socket out from
 	 * that bucket on purpose.
 	 */
-	to_read = (nr_soreuse - 1) * sizeof(*indices);
+	to_read = (nr_soreuse - 1) * sizeof(*outputs);
 	total_read = 0;
 	first_idx = -1;
 	do {
-		nread = read(iter_fd, indices, onebyone ? sizeof(*indices) : to_read);
-		if (nread <= 0 || nread % sizeof(*indices))
+		nread = read(iter_fd, outputs, onebyone ? sizeof(*outputs) : to_read);
+		if (nread <= 0 || nread % sizeof(*outputs))
 			break;
 		total_read += nread;
 
 		if (first_idx == -1)
-			first_idx = indices[0];
-		for (i = 0; i < nread / sizeof(*indices); i++)
-			ASSERT_EQ(indices[i], first_idx, "first_idx");
+			first_idx = outputs[0].idx;
+		for (i = 0; i < nread / sizeof(*outputs); i++)
+			ASSERT_EQ(outputs[i].idx, first_idx, "first_idx");
 	} while (total_read < to_read);
-	ASSERT_EQ(nread, onebyone ? sizeof(*indices) : to_read, "nread");
+	ASSERT_EQ(nread, onebyone ? sizeof(*outputs) : to_read, "nread");
 	ASSERT_EQ(total_read, to_read, "total_read");
 
 	free_fds(fds[first_idx], nr_soreuse);
 	fds[first_idx] = NULL;
 
 	/* Read the "whole" second bucket */
-	to_read = nr_soreuse * sizeof(*indices);
+	to_read = nr_soreuse * sizeof(*outputs);
 	total_read = 0;
 	second_idx = !first_idx;
 	do {
-		nread = read(iter_fd, indices, onebyone ? sizeof(*indices) : to_read);
-		if (nread <= 0 || nread % sizeof(*indices))
+		nread = read(iter_fd, outputs, onebyone ? sizeof(*outputs) : to_read);
+		if (nread <= 0 || nread % sizeof(*outputs))
 			break;
 		total_read += nread;
 
-		for (i = 0; i < nread / sizeof(*indices); i++)
-			ASSERT_EQ(indices[i], second_idx, "second_idx");
+		for (i = 0; i < nread / sizeof(*outputs); i++)
+			ASSERT_EQ(outputs[i].idx, second_idx, "second_idx");
 	} while (total_read <= to_read);
 	ASSERT_EQ(nread, 0, "nread");
 	/* Both so_reuseport ports should be in different buckets, so
@@ -123,10 +384,14 @@ void test_sock_iter_batch(void)
 	if (test__start_subtest("tcp")) {
 		do_test(SOCK_STREAM, true);
 		do_test(SOCK_STREAM, false);
+		do_skip_test(SOCK_STREAM);
+		do_repeat_test(SOCK_STREAM);
 	}
 	if (test__start_subtest("udp")) {
 		do_test(SOCK_DGRAM, true);
 		do_test(SOCK_DGRAM, false);
+		do_skip_test(SOCK_DGRAM);
+		do_repeat_test(SOCK_DGRAM);
 	}
 	close_netns(nstoken);
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 59843b430f76..82928cc5d87b 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -123,6 +123,7 @@
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_state		__sk_common.skc_state
 #define sk_net			__sk_common.skc_net
+#define sk_rcv_saddr		__sk_common.skc_rcv_saddr
 #define sk_v6_daddr		__sk_common.skc_v6_daddr
 #define sk_v6_rcv_saddr		__sk_common.skc_v6_rcv_saddr
 #define sk_flags		__sk_common.skc_flags
diff --git a/tools/testing/selftests/bpf/progs/sock_iter_batch.c b/tools/testing/selftests/bpf/progs/sock_iter_batch.c
index 96531b0d9d55..8f483337e103 100644
--- a/tools/testing/selftests/bpf/progs/sock_iter_batch.c
+++ b/tools/testing/selftests/bpf/progs/sock_iter_batch.c
@@ -17,6 +17,12 @@ static bool ipv6_addr_loopback(const struct in6_addr *a)
 		a->s6_addr32[2] | (a->s6_addr32[3] ^ bpf_htonl(1))) == 0;
 }
 
+static bool ipv4_addr_loopback(__be32 a)
+{
+	return a == bpf_ntohl(0x7f000001);
+}
+
+volatile const unsigned int sf;
 volatile const __u16 ports[2];
 unsigned int bucket[2];
 
@@ -26,16 +32,20 @@ int iter_tcp_soreuse(struct bpf_iter__tcp *ctx)
 	struct sock *sk = (struct sock *)ctx->sk_common;
 	struct inet_hashinfo *hinfo;
 	unsigned int hash;
+	__u64 sock_cookie;
 	struct net *net;
 	int idx;
 
 	if (!sk)
 		return 0;
 
+	sock_cookie = bpf_get_socket_cookie(sk);
 	sk = bpf_core_cast(sk, struct sock);
-	if (sk->sk_family != AF_INET6 ||
+	if (sk->sk_family != sf ||
 	    sk->sk_state != TCP_LISTEN ||
-	    !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr))
+	    (sk->sk_family == AF_INET6 ?
+	     !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr) :
+	     !ipv4_addr_loopback(sk->sk_rcv_saddr)))
 		return 0;
 
 	if (sk->sk_num == ports[0])
@@ -52,6 +62,7 @@ int iter_tcp_soreuse(struct bpf_iter__tcp *ctx)
 	hinfo = net->ipv4.tcp_death_row.hashinfo;
 	bucket[idx] = hash & hinfo->lhash2_mask;
 	bpf_seq_write(ctx->meta->seq, &idx, sizeof(idx));
+	bpf_seq_write(ctx->meta->seq, &sock_cookie, sizeof(sock_cookie));
 
 	return 0;
 }
@@ -63,14 +74,18 @@
 int iter_udp_soreuse(struct bpf_iter__udp *ctx)
 {
 	struct sock *sk = (struct sock *)ctx->udp_sk;
 	struct udp_table *udptable;
+	__u64 sock_cookie;
 	int idx;
 
 	if (!sk)
 		return 0;
 
+	sock_cookie = bpf_get_socket_cookie(sk);
 	sk = bpf_core_cast(sk, struct sock);
-	if (sk->sk_family != AF_INET6 ||
-	    !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr))
+	if (sk->sk_family != sf ||
+	    (sk->sk_family == AF_INET6 ?
+	     !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr) :
+	     !ipv4_addr_loopback(sk->sk_rcv_saddr)))
 		return 0;
 
 	if (sk->sk_num == ports[0])
@@ -84,6 +99,7 @@ int iter_udp_soreuse(struct bpf_iter__udp *ctx)
 	udptable = sk->sk_net.net->ipv4.udp_table;
 	bucket[idx] = udp_sk(sk)->udp_portaddr_hash & udptable->mask;
 	bpf_seq_write(ctx->meta->seq, &idx, sizeof(idx));
+	bpf_seq_write(ctx->meta->seq, &sock_cookie, sizeof(sock_cookie));
 
 	return 0;
 }
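
For completeness: the two bpf_seq_write() calls in each program emit one
packed record per socket, matching the selftest's struct iter_out. A
minimal sketch of a consumer -- illustrative only, assuming iter_fd came
from bpf_iter_create() as in the test:

#include <stdio.h>
#include <unistd.h>

/* Layout mirrors the test's struct iter_out: an int bucket index
 * followed by a 64-bit socket cookie, packed to 12 bytes.
 */
struct iter_out {
	int idx;
	unsigned long long cookie;
} __attribute__((__packed__));

static void dump_records(int iter_fd)
{
	struct iter_out out;

	/* Record-sized reads yield one whole record at a time, as read_n()
	 * relies on; read() returning 0 means the iterator is exhausted.
	 */
	while (read(iter_fd, &out, sizeof(out)) == sizeof(out))
		printf("bucket %d cookie %llu\n", out.idx, out.cookie);
}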