From patchwork Sat Jan 11 23:07:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 13936193 X-Patchwork-Delegate: kuba@kernel.org Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B06861CFBC; Sat, 11 Jan 2025 23:08:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.188.207 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636890; cv=none; b=uSocqzRBwuHwOY8TJqxcs8AR925B2/oXP75YfZSqZE0VlsW+sxXpbKOGTsOmf9d/c344iQwU+11Ro9ehSrvlu5DgE+JIPQIBUOvwGBhqwDWEC4/gfWct6++43i6Z7RhAS8AwI8rtup76gf69xLkS9+ag4JWcSfY1lM52o82wb70= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636890; c=relaxed/simple; bh=92EaKkeevPv19MH0eBUuD5KCmOqEQ8K6HwML4uHUX1M=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=DsoUfGIqHs1J3YWCr8/AVTVJGlnfb18NharXIMiaSwwdjUMipO0m5FCWonOtdktZZ4UM1X5ayeAgo/jdEE6vgQpVRu7yb99oHUny8jhk57qcsf9KgaTxMSEGRmGG7G3WbQ6K5dM3EyXM1Uu+Y42azxajHqlhfIJ+Clx02GVnu3U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org; spf=pass smtp.mailfrom=netfilter.org; arc=none smtp.client-ip=217.70.188.207 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=netfilter.org From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, fw@strlen.de, kadlec@netfilter.org Subject: [PATCH net-next 1/4] netfilter: nf_tables: remove the genmask parameter Date: Sun, 12 Jan 2025 00:07:57 +0100 Message-Id: <20250111230800.67349-2-pablo@netfilter.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: 
<20250111230800.67349-1-pablo@netfilter.org> References: <20250111230800.67349-1-pablo@netfilter.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: tuqiang The genmask parameter is not used within the nf_tables_addchain function body. It should be removed to simplify the function parameter list. Signed-off-by: tuqiang Signed-off-by: Jiang Kun Reviewed-by: Simon Horman Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 0b9f1e8dfe49..f7ca7165e66e 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2598,9 +2598,8 @@ int nft_chain_add(struct nft_table *table, struct nft_chain *chain) static u64 chain_id; -static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, - u8 policy, u32 flags, - struct netlink_ext_ack *extack) +static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 policy, + u32 flags, struct netlink_ext_ack *extack) { const struct nlattr * const *nla = ctx->nla; struct nft_table *table = ctx->table; @@ -3038,7 +3037,7 @@ static int nf_tables_newchain(struct sk_buff *skb, const struct nfnl_info *info, extack); } - return nf_tables_addchain(&ctx, family, genmask, policy, flags, extack); + return nf_tables_addchain(&ctx, family, policy, flags, extack); } static int nft_delchain_hook(struct nft_ctx *ctx, From patchwork Sat Jan 11 23:07:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 13936195 X-Patchwork-Delegate: kuba@kernel.org Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 652FE1BBBD3; Sat, 11 Jan 2025 23:08:09 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.188.207 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636891; cv=none; b=lnoJaO1afE1ljeyVRxa1/EkxJWtR0GczEOuhH2MZLobSR02FfvKepdlgxByO9QZZpTxBAIZn0vY8ZhGWtbQj/h28Uo+XWjoH4wPNjXtgU/xpnUGnDfs05m7t31BTQRO9uxIryyvoTJITFpZRzAtnxeboSr2tJxCoZgxNm/JjRJM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636891; c=relaxed/simple; bh=5BBujJnzs13bQBzd2BKpJQXOIZnHmbKqBsFueEYWZVk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=RffjxKVlu/WQC/+xhUax9qz4QzpH2eqZ99fZMn2LxuuYObJ40v1nfx5hJkogPYOkBC8jjDrhH2Ms7Dui/vMwK0DT57ffYjLbCbH151M6SZLV+xQ+N8uReX2bx0XyhRa6onVA2eaJ1RQxu8o6Re554mp4aySdtCFtFWzGmmW4dWw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org; spf=pass smtp.mailfrom=netfilter.org; arc=none smtp.client-ip=217.70.188.207 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=netfilter.org From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, fw@strlen.de, kadlec@netfilter.org Subject: [PATCH net-next 2/4] ipvs: speed up reads from ip_vs_conn proc file Date: Sun, 12 Jan 2025 00:07:58 +0100 Message-Id: <20250111230800.67349-3-pablo@netfilter.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20250111230800.67349-1-pablo@netfilter.org> References: <20250111230800.67349-1-pablo@netfilter.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Florian Westphal Reading is very slow because ->start() performs a linear re-scan of the entire hash table until it finds the successor to the last dumped element. 
The current implementation uses 'pos' as the 'number of elements to skip', then does linear iteration until it has skipped 'pos' entries. Store the last bucket and the number of elements to skip in that bucket instead, so we can resume from bucket b directly. Before this patch, it's possible to read ~35k entries in one second, but each read() gets slower as the number of entries to skip grows: time timeout 60 cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all real 1m0.007s user 0m0.003s sys 0m59.956s 140386 /tmp/all Only ~100k more got read in the remaining 59s, nowhere near the 1m entries that were stored at the time. After this patch, the dump completes very quickly: time cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all real 0m2.286s user 0m0.004s sys 0m2.281s 1000001 /tmp/all Signed-off-by: Florian Westphal Acked-by: Julian Anastasov Signed-off-by: Pablo Neira Ayuso --- net/netfilter/ipvs/ip_vs_conn.c | 50 ++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 22 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index c0289f83f96d..20a1727e2457 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -1046,28 +1046,35 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p, int dest_af, #ifdef CONFIG_PROC_FS struct ip_vs_iter_state { struct seq_net_private p; - struct hlist_head *l; + unsigned int bucket; + unsigned int skip_elems; }; -static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos) +static void *ip_vs_conn_array(struct ip_vs_iter_state *iter) { int idx; struct ip_vs_conn *cp; - struct ip_vs_iter_state *iter = seq->private; - for (idx = 0; idx < ip_vs_conn_tab_size; idx++) { + for (idx = iter->bucket; idx < ip_vs_conn_tab_size; idx++) { + unsigned int skip = 0; + hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) { /* __ip_vs_conn_get() is not needed by * ip_vs_conn_seq_show and ip_vs_conn_sync_seq_show */ - if (pos-- == 0) { - iter->l = 
&ip_vs_conn_tab[idx]; + if (skip >= iter->skip_elems) { + iter->bucket = idx; return cp; } + + ++skip; } + + iter->skip_elems = 0; cond_resched_rcu(); } + iter->bucket = idx; return NULL; } @@ -1076,9 +1083,14 @@ static void *ip_vs_conn_seq_start(struct seq_file *seq, loff_t *pos) { struct ip_vs_iter_state *iter = seq->private; - iter->l = NULL; rcu_read_lock(); - return *pos ? ip_vs_conn_array(seq, *pos - 1) :SEQ_START_TOKEN; + if (*pos == 0) { + iter->skip_elems = 0; + iter->bucket = 0; + return SEQ_START_TOKEN; + } + + return ip_vs_conn_array(iter); } static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos) @@ -1086,28 +1098,22 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos) struct ip_vs_conn *cp = v; struct ip_vs_iter_state *iter = seq->private; struct hlist_node *e; - struct hlist_head *l = iter->l; - int idx; ++*pos; if (v == SEQ_START_TOKEN) - return ip_vs_conn_array(seq, 0); + return ip_vs_conn_array(iter); /* more on same hash chain? 
*/ e = rcu_dereference(hlist_next_rcu(&cp->c_list)); - if (e) + if (e) { + iter->skip_elems++; return hlist_entry(e, struct ip_vs_conn, c_list); - - idx = l - ip_vs_conn_tab; - while (++idx < ip_vs_conn_tab_size) { - hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) { - iter->l = &ip_vs_conn_tab[idx]; - return cp; - } - cond_resched_rcu(); } - iter->l = NULL; - return NULL; + + iter->skip_elems = 0; + iter->bucket++; + + return ip_vs_conn_array(iter); } static void ip_vs_conn_seq_stop(struct seq_file *seq, void *v) From patchwork Sat Jan 11 23:07:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 13936196 X-Patchwork-Delegate: kuba@kernel.org Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4FA591BBBC0; Sat, 11 Jan 2025 23:08:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.188.207 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636892; cv=none; b=a5Rxn5lDWGBA1957ZQ/om6XchRHHCg/nfcFbfyau5s0XOI1dBNYXNUeHszBQLW2rqhWQ/Q7NESh778+lZAyAYfBH+Wl2O0vStY13OQS1EgSmmyJPLQ8wMd0OqD08+ywxxqRJkZ7cB4KV/X5qSpk4Nr9JTpf6aHowObOaJa2ba24= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636892; c=relaxed/simple; bh=rh+NXkFkOSCld0emXqgPxXDTnO4nE88OW8sOnUZBH8Y=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=CEy5+JZbgcDmKRkdhPT2eUXUEyY4t0zsB+c+6Zm4s/+R+BP9aYz73V7c3iLARYqpAxXO1blJbJPD9AElJkBKPM3lGq4kCmdkTjbP957qMOnthtoswIhYA8VuUvumshREHtTJlwaLJS3KJWYWGenaMbFwo0sh1AaQ8+iVNqPOqIs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org; spf=pass smtp.mailfrom=netfilter.org; arc=none smtp.client-ip=217.70.188.207 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) 
header.from=netfilter.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=netfilter.org From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, fw@strlen.de, kadlec@netfilter.org Subject: [PATCH net-next 3/4] netfilter: xt_hashlimit: htable_selective_cleanup() optimization Date: Sun, 12 Jan 2025 00:07:59 +0100 Message-Id: <20250111230800.67349-4-pablo@netfilter.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20250111230800.67349-1-pablo@netfilter.org> References: <20250111230800.67349-1-pablo@netfilter.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Eric Dumazet I have seen syzbot reports hinting at xt_hashlimit abuse: [ 105.783066][ T4331] xt_hashlimit: max too large, truncated to 1048576 [ 105.811405][ T4331] xt_hashlimit: size too large, truncated to 1048576 And worker threads using up to 1 second per htable_selective_cleanup() invocation. [ 269.734496][ C1] [] ? __local_bh_enable_ip+0x1a0/0x1a0 [ 269.734513][ C1] [] ? lockdep_hardirqs_on_prepare+0x740/0x740 [ 269.734533][ C1] [] ? htable_selective_cleanup+0x25f/0x310 [ 269.734549][ C1] [] ? __lock_acquire+0x2060/0x2060 [ 269.734567][ C1] [] ? do_raw_spin_lock+0x14a/0x370 [ 269.734583][ C1] [] ? htable_selective_cleanup+0x25f/0x310 [ 269.734599][ C1] [] __local_bh_enable_ip+0x167/0x1a0 [ 269.734616][ C1] [] ? _local_bh_enable+0xa0/0xa0 [ 269.734634][ C1] [] ? htable_selective_cleanup+0x25f/0x310 [ 269.734651][ C1] [] htable_selective_cleanup+0x25f/0x310 [ 269.734670][ C1] [] ? process_one_work+0x7a9/0x1170 [ 269.734685][ C1] [] htable_gc+0x1b/0xa0 [ 269.734700][ C1] [] ? process_one_work+0x7a9/0x1170 [ 269.734714][ C1] [] process_one_work+0x8a9/0x1170 [ 269.734733][ C1] [] ? worker_detach_from_pool+0x260/0x260 [ 269.734749][ C1] [] ? 
_raw_spin_lock_irq+0xb7/0xf0 [ 269.734763][ C1] [] ? _raw_spin_lock_irqsave+0x100/0x100 [ 269.734777][ C1] [] ? wq_worker_sleeping+0x5f/0x270 [ 269.734800][ C1] [] worker_thread+0xa47/0x1200 [ 269.734815][ C1] [] ? _raw_spin_lock+0x40/0x40 [ 269.734835][ C1] [] kthread+0x25a/0x2e0 [ 269.734853][ C1] [] ? worker_clr_flags+0x190/0x190 [ 269.734866][ C1] [] ? kthread_blkcg+0xd0/0xd0 [ 269.734885][ C1] [] ret_from_fork+0x3a/0x50 We can skip over empty buckets, avoiding the lockdep penalty for debug kernels, and avoid atomic operations on non debug ones. Signed-off-by: Eric Dumazet Signed-off-by: Pablo Neira Ayuso --- net/netfilter/xt_hashlimit.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c index 0859b8f76764..fa02aab56724 100644 --- a/net/netfilter/xt_hashlimit.c +++ b/net/netfilter/xt_hashlimit.c @@ -363,11 +363,15 @@ static void htable_selective_cleanup(struct xt_hashlimit_htable *ht, bool select unsigned int i; for (i = 0; i < ht->cfg.size; i++) { + struct hlist_head *head = &ht->hash[i]; struct dsthash_ent *dh; struct hlist_node *n; + if (hlist_empty(head)) + continue; + spin_lock_bh(&ht->lock); - hlist_for_each_entry_safe(dh, n, &ht->hash[i], node) { + hlist_for_each_entry_safe(dh, n, head, node) { if (time_after_eq(jiffies, dh->expires) || select_all) dsthash_free(ht, dh); } From patchwork Sat Jan 11 23:08:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 13936197 X-Patchwork-Delegate: kuba@kernel.org Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B6E771BD9C8; Sat, 11 Jan 2025 23:08:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.188.207 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636892; cv=none; 
b=YQpeqJvenNQnHqGqIQvDPYcmcb3RYVlRZ9mrerGEC7Tz9enN/ZNp04OLAVigebq60Lcb8Oh4cR8oS34PuZ1y+LmuRVaIh3ZL4Re9c7l/TQ1u2WTlb/d9/rW3G8iG19Oi8J/TEQYr2dNuZr6Y7NDLBUNZpS0EvVsFaDcVML2lKH8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736636892; c=relaxed/simple; bh=a/b/3iDX3jtrrVaU98TxS5yUDymtDH4DbBDbtQPbMNI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=bgnrBunRh3a/x/lkoq3TvdN/BnmptODKLEmMuQ1TEcaeRITRD2Iv2KOSUL/a97QhaabL5ANioX6Ba6dOQEK5o6zEShqJr9D58ZQc/vCdN0eXOFCRyfbM1VaN8P518xuiqQHgrSRS6Z1bIbcCXfLS9pckGj7dLZ7nWQKfhNbjEoc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org; spf=pass smtp.mailfrom=netfilter.org; arc=none smtp.client-ip=217.70.188.207 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=netfilter.org From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, fw@strlen.de, kadlec@netfilter.org Subject: [PATCH net-next 4/4] netfilter: conntrack: add conntrack event timestamp Date: Sun, 12 Jan 2025 00:08:00 +0100 Message-Id: <20250111230800.67349-5-pablo@netfilter.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20250111230800.67349-1-pablo@netfilter.org> References: <20250111230800.67349-1-pablo@netfilter.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Florian Westphal Nadia Pinaeva writes: I am working on a tool that allows collecting network performance metrics by using conntrack events. Start time of a conntrack entry is used to evaluate seen_reply latency, therefore the sooner it is timestamped, the better the precision is. 
In particular, when using this tool to compare the performance of the same feature implemented using iptables/nftables/OVS it is crucial to have the entry timestamped earlier to see any difference. At this time, conntrack events can only get timestamped at recv time in userspace, so there can be some delay between the event being generated and the userspace process consuming the message. There is sys/net/netfilter/nf_conntrack_timestamp, which adds a 64-bit timestamp (ns resolution) that records start and stop times, but it is not suited for this either: the start time is the 'hashtable insertion time', not the 'conntrack allocation time'. There is concern that moving the start-time moment to conntrack allocation will add overhead in case of flooding, where conntrack entries are allocated and released right away without getting inserted into the hashtable. Also, even if this was changed, it would not work with events other than new (start time) and destroy (stop time). Pablo suggested adding a new CTA_TIMESTAMP_EVENT attribute; this patch adds it. The timestamp is recorded in case both events are requested and the sys/net/netfilter/nf_conntrack_timestamp toggle is enabled. 
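For illustration only, a minimal userspace sketch of how a listener might interpret the new attribute. It assumes the receiver has already located the 8-byte CTA_TIMESTAMP_EVENT payload in the event message; the kernel side emits it with nla_put_be64(), so the value is a big-endian nanosecond count taken from ktime_get_real_ns(). The helper names event_ts_from_be64() and event_ts_split() and the struct event_ts type are hypothetical and not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: convert the 8-byte
 * big-endian attribute payload into a host-order nanosecond count.
 * Done byte by byte so it does not depend on host endianness.
 */
static uint64_t event_ts_from_be64(const unsigned char *payload)
{
	uint64_t ns = 0;
	size_t i;

	for (i = 0; i < 8; i++)
		ns = (ns << 8) | payload[i];

	return ns;
}

/* Hypothetical container for a split timestamp. */
struct event_ts {
	uint64_t sec;	/* whole seconds since the epoch */
	uint32_t nsec;	/* remaining nanoseconds, 0..999999999 */
};

/* Split a nanosecond count into seconds and nanoseconds. */
static struct event_ts event_ts_split(uint64_t ns)
{
	struct event_ts ts;

	ts.sec = ns / 1000000000ULL;
	ts.nsec = (uint32_t)(ns % 1000000000ULL);

	return ts;
}
```

A listener would call event_ts_from_be64() on the attribute payload only when the attribute is present: the kernel omits it when the stored timestamp is zero, for example when the nf_conntrack_timestamp toggle is off.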
Reported-by: Nadia Pinaeva Suggested-by: Pablo Neira Ayuso Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_conntrack_ecache.h | 12 +++++++++ .../linux/netfilter/nfnetlink_conntrack.h | 1 + net/netfilter/nf_conntrack_ecache.c | 23 +++++++++++++++++ net/netfilter/nf_conntrack_netlink.c | 25 +++++++++++++++++++ 4 files changed, 61 insertions(+) diff --git a/include/net/netfilter/nf_conntrack_ecache.h b/include/net/netfilter/nf_conntrack_ecache.h index 0c1dac318e02..8dcf7c371ee9 100644 --- a/include/net/netfilter/nf_conntrack_ecache.h +++ b/include/net/netfilter/nf_conntrack_ecache.h @@ -12,6 +12,7 @@ #include #include #include +#include enum nf_ct_ecache_state { NFCT_ECACHE_DESTROY_FAIL, /* tried but failed to send destroy event */ @@ -20,6 +21,9 @@ enum nf_ct_ecache_state { struct nf_conntrack_ecache { unsigned long cache; /* bitops want long */ +#ifdef CONFIG_NF_CONNTRACK_TIMESTAMP + local64_t timestamp; /* event timestamp, in nanoseconds */ +#endif u16 ctmask; /* bitmask of ct events to be delivered */ u16 expmask; /* bitmask of expect events to be delivered */ u32 missed; /* missed events */ @@ -108,6 +112,14 @@ nf_conntrack_event_cache(enum ip_conntrack_events event, struct nf_conn *ct) if (e == NULL) return; +#ifdef CONFIG_NF_CONNTRACK_TIMESTAMP + /* renew only if this is the first cached event, so that the + * timestamp reflects the first, not the last, generated event. 
+ */ + if (local64_read(&e->timestamp) && READ_ONCE(e->cache) == 0) + local64_set(&e->timestamp, ktime_get_real_ns()); +#endif + set_bit(event, &e->cache); #endif } diff --git a/include/uapi/linux/netfilter/nfnetlink_conntrack.h b/include/uapi/linux/netfilter/nfnetlink_conntrack.h index c2ac7269acf7..43233af75b9d 100644 --- a/include/uapi/linux/netfilter/nfnetlink_conntrack.h +++ b/include/uapi/linux/netfilter/nfnetlink_conntrack.h @@ -57,6 +57,7 @@ enum ctattr_type { CTA_SYNPROXY, CTA_FILTER, CTA_STATUS_MASK, + CTA_TIMESTAMP_EVENT, __CTA_MAX }; #define CTA_MAX (__CTA_MAX - 1) diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c index 69948e1d6974..af68c64acaab 100644 --- a/net/netfilter/nf_conntrack_ecache.c +++ b/net/netfilter/nf_conntrack_ecache.c @@ -162,6 +162,14 @@ static int __nf_conntrack_eventmask_report(struct nf_conntrack_ecache *e, return ret; } +static void nf_ct_ecache_tstamp_refresh(struct nf_conntrack_ecache *e) +{ +#ifdef CONFIG_NF_CONNTRACK_TIMESTAMP + if (local64_read(&e->timestamp)) + local64_set(&e->timestamp, ktime_get_real_ns()); +#endif +} + int nf_conntrack_eventmask_report(unsigned int events, struct nf_conn *ct, u32 portid, int report) { @@ -186,6 +194,8 @@ int nf_conntrack_eventmask_report(unsigned int events, struct nf_conn *ct, /* This is a resent of a destroy event? If so, skip missed */ missed = e->portid ? 
0 : e->missed; + nf_ct_ecache_tstamp_refresh(e); + ret = __nf_conntrack_eventmask_report(e, events, missed, &item); if (unlikely(ret < 0 && (events & (1 << IPCT_DESTROY)))) { /* This is a destroy event that has been triggered by a process, @@ -297,6 +307,18 @@ void nf_conntrack_ecache_work(struct net *net, enum nf_ct_ecache_state state) } } +static void nf_ct_ecache_tstamp_new(const struct nf_conn *ct, struct nf_conntrack_ecache *e) +{ +#ifdef CONFIG_NF_CONNTRACK_TIMESTAMP + u64 ts = 0; + + if (nf_ct_ext_exist(ct, NF_CT_EXT_TSTAMP)) + ts = ktime_get_real_ns(); + + local64_set(&e->timestamp, ts); +#endif +} + bool nf_ct_ecache_ext_add(struct nf_conn *ct, u16 ctmask, u16 expmask, gfp_t gfp) { struct net *net = nf_ct_net(ct); @@ -326,6 +348,7 @@ bool nf_ct_ecache_ext_add(struct nf_conn *ct, u16 ctmask, u16 expmask, gfp_t gfp e = nf_ct_ext_add(ct, NF_CT_EXT_ECACHE, gfp); if (e) { + nf_ct_ecache_tstamp_new(ct, e); e->ctmask = ctmask; e->expmask = expmask; } diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 36168f8b6efa..2277b744eb2c 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -383,6 +383,23 @@ static int ctnetlink_dump_secctx(struct sk_buff *skb, const struct nf_conn *ct) #endif #ifdef CONFIG_NF_CONNTRACK_EVENTS +static int +ctnetlink_dump_event_timestamp(struct sk_buff *skb, const struct nf_conn *ct) +{ +#ifdef CONFIG_NF_CONNTRACK_TIMESTAMP + const struct nf_conntrack_ecache *e = nf_ct_ecache_find(ct); + + if (e) { + u64 ts = local64_read(&e->timestamp); + + if (ts) + return nla_put_be64(skb, CTA_TIMESTAMP_EVENT, + cpu_to_be64(ts), CTA_TIMESTAMP_PAD); + } +#endif + return 0; +} + static inline int ctnetlink_label_size(const struct nf_conn *ct) { struct nf_conn_labels *labels = nf_ct_labels_find(ct); @@ -717,6 +734,9 @@ static size_t ctnetlink_nlmsg_size(const struct nf_conn *ct) #endif + ctnetlink_proto_size(ct) + ctnetlink_label_size(ct) +#ifdef 
CONFIG_NF_CONNTRACK_TIMESTAMP + + nla_total_size(sizeof(u64)) /* CTA_TIMESTAMP_EVENT */ +#endif ; } @@ -838,6 +858,10 @@ ctnetlink_conntrack_event(unsigned int events, const struct nf_ct_event *item) if (ctnetlink_dump_mark(skb, ct, events & (1 << IPCT_MARK))) goto nla_put_failure; #endif + + if (ctnetlink_dump_event_timestamp(skb, ct)) + goto nla_put_failure; + nlmsg_end(skb, nlh); err = nfnetlink_send(skb, net, item->portid, group, item->report, GFP_ATOMIC); @@ -1557,6 +1581,7 @@ static const struct nla_policy ct_nla_policy[CTA_MAX+1] = { .len = NF_CT_LABELS_MAX_SIZE }, [CTA_FILTER] = { .type = NLA_NESTED }, [CTA_STATUS_MASK] = { .type = NLA_U32 }, + [CTA_TIMESTAMP_EVENT] = { .type = NLA_REJECT }, }; static int ctnetlink_flush_iterate(struct nf_conn *ct, void *data)