
[net-next,01/19] netfilter: conntrack: revisit gc autotuning

Message ID 20220321123052.70553-2-pablo@netfilter.org (mailing list archive)
State Accepted
Commit 2cfadb761d3d0219412fd8150faea60c7e863833
Delegated to: Netdev Maintainers
Series [net-next,01/19] netfilter: conntrack: revisit gc autotuning

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Pull request is its own cover letter
netdev/patch_count warning Series longer than 15 patches
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/cc_maintainers warning 4 maintainers not CCed: kadlec@netfilter.org coreteam@netfilter.org pabeni@redhat.com fw@strlen.de
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/checkpatch warning WARNING: line length of 106 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Pablo Neira Ayuso March 21, 2022, 12:30 p.m. UTC
From: Florian Westphal <fw@strlen.de>

As of commit 4608fdfc07e1
("netfilter: conntrack: collect all entries in one cycle"),
conntrack gc was changed to run every 2 minutes.

On systems where the conntrack hash table is set to a large value, most
evictions happen from the gc worker rather than from the packet path, due
to the hash table distribution.

This causes netlink event overflows when the events are collected.

This change collects the average expiry of scanned entries and reschedules
to the average remaining value, within a 1 to 60 second interval.

To avoid event overflows, reschedule after each bucket and add a
limit on both the run time and the number of evictions per run.

If more entries have to be evicted, reschedule and restart 1 jiffy
into the future.

Reported-by: Karel Rericha <karel@maxtel.cz>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c | 85 ++++++++++++++++++++++++-------
 1 file changed, 68 insertions(+), 17 deletions(-)
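
The rescheduling heuristic above is compact enough to illustrate standalone.
The following is a minimal userspace C sketch of it, not the kernel code
itself: each live entry's remaining expiry, clamped to 300s, is folded into
a running average by add-then-halve, and the result, clamped to a 1..60s
window, becomes the delay before the next gc run. All names here (HZ_SIM,
SCAN_INTERVAL_*, the remaining[] sample data) are illustrative stand-ins for
the patch's GC_SCAN_* constants and hash-table walk; the real worker
additionally bounds each run by GC_SCAN_MAX_DURATION (10ms) and
GC_SCAN_EXPIRED_MAX evictions before rescheduling.

#include <stdio.h>
#include <stdint.h>

#define HZ_SIM               100u            /* simulated ticks per second */
#define SCAN_INTERVAL_MIN    (1u * HZ_SIM)   /* never sooner than 1s */
#define SCAN_INTERVAL_MAX    (60u * HZ_SIM)  /* never later than 60s */
#define SCAN_INTERVAL_CLAMP  (300u * HZ_SIM) /* cap long (e.g. TCP) timeouts */
#define SCAN_INTERVAL_INIT   INT32_MAX       /* large initial bias, as in the patch */

static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

int main(void)
{
	/* remaining lifetimes (in ticks) of live entries seen in one scan */
	const uint32_t remaining[] = { 30u * HZ_SIM, 120u * HZ_SIM, 5u * HZ_SIM };
	uint32_t next_run = SCAN_INTERVAL_INIT;

	for (size_t i = 0; i < sizeof(remaining) / sizeof(remaining[0]); i++) {
		uint32_t expires = clamp_u32(remaining[i], SCAN_INTERVAL_MIN,
					     SCAN_INTERVAL_CLAMP);

		/* same running average as the patch: add, then halve */
		next_run += expires;
		next_run /= 2u;
	}

	/* final reschedule delay, clamped to the 1..60s window */
	next_run = clamp_u32(next_run, SCAN_INTERVAL_MIN, SCAN_INTERVAL_MAX);
	printf("reschedule gc worker in %u ticks (~%us)\n",
	       next_run, next_run / HZ_SIM);
	return 0;
}

Note how the large SCAN_INTERVAL_INIT bias keeps a handful of short-lived
entries from forcing frequent scans: with only three entries the average
stays huge and the final clamp yields the 60-second maximum.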

Comments

patchwork-bot+netdevbpf@kernel.org March 21, 2022, 1:20 p.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (master)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Mon, 21 Mar 2022 13:30:34 +0100 you wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> As of commit 4608fdfc07e1
> ("netfilter: conntrack: collect all entries in one cycle"),
> conntrack gc was changed to run every 2 minutes.
> 
> On systems where the conntrack hash table is set to a large value, most
> evictions happen from the gc worker rather than from the packet path, due
> to the hash table distribution.
> 
> [...]

Here is the summary with links:
  - [net-next,01/19] netfilter: conntrack: revisit gc autotuning
    https://git.kernel.org/netdev/net-next/c/2cfadb761d3d
  - [net-next,02/19] netfilter: conntrack: Add and use nf_ct_set_auto_assign_helper_warned()
    https://git.kernel.org/netdev/net-next/c/31d0bb9763ef
  - [net-next,03/19] netfilter: nf_tables: do not reduce read-only expressions
    https://git.kernel.org/netdev/net-next/c/b2d306542ff9
  - [net-next,04/19] netfilter: nf_tables: cancel tracking for clobbered destination registers
    https://git.kernel.org/netdev/net-next/c/34cc9e52884a
  - [net-next,05/19] netfilter: nft_ct: track register operations
    https://git.kernel.org/netdev/net-next/c/03858af0135f
  - [net-next,06/19] netfilter: nft_lookup: only cancel tracking for clobbered dregs
    https://git.kernel.org/netdev/net-next/c/e50ae445fb70
  - [net-next,07/19] netfilter: nft_meta: extend reduce support to bridge family
    https://git.kernel.org/netdev/net-next/c/aaa7b20bd4d6
  - [net-next,08/19] netfilter: nft_numgen: cancel register tracking
    https://git.kernel.org/netdev/net-next/c/4e2b29d88168
  - [net-next,09/19] netfilter: nft_osf: track register operations
    https://git.kernel.org/netdev/net-next/c/ffe6488e624e
  - [net-next,10/19] netfilter: nft_hash: track register operations
    https://git.kernel.org/netdev/net-next/c/5da03b566626
  - [net-next,11/19] netfilter: nft_immediate: cancel register tracking for data destination register
    https://git.kernel.org/netdev/net-next/c/71ef842d73f6
  - [net-next,12/19] netfilter: nft_socket: track register operations
    https://git.kernel.org/netdev/net-next/c/d77a721d212d
  - [net-next,13/19] netfilter: nft_xfrm: track register operations
    https://git.kernel.org/netdev/net-next/c/48f1910326ea
  - [net-next,14/19] netfilter: nft_tunnel: track register operations
    https://git.kernel.org/netdev/net-next/c/611580d2df1f
  - [net-next,15/19] netfilter: nft_fib: add reduce support
    https://git.kernel.org/netdev/net-next/c/3c1eb413a45b
  - [net-next,16/19] netfilter: nft_exthdr: add reduce support
    https://git.kernel.org/netdev/net-next/c/e86dbdb9d461
  - [net-next,17/19] netfilter: nf_nat_h323: eliminate anonymous module_init & module_exit
    https://git.kernel.org/netdev/net-next/c/fd4213929053
  - [net-next,18/19] netfilter: flowtable: remove redundant field in flow_offload_work struct
    https://git.kernel.org/netdev/net-next/c/bb321ed6bbaa
  - [net-next,19/19] netfilter: flowtable: pass flowtable to nf_flow_table_iterate()
    https://git.kernel.org/netdev/net-next/c/217cff36e885

You are awesome, thank you!

Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index d1a58ed357a4..0164e5f522e8 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -66,6 +66,8 @@  EXPORT_SYMBOL_GPL(nf_conntrack_hash);
 struct conntrack_gc_work {
 	struct delayed_work	dwork;
 	u32			next_bucket;
+	u32			avg_timeout;
+	u32			start_time;
 	bool			exiting;
 	bool			early_drop;
 };
@@ -77,8 +79,19 @@  static __read_mostly bool nf_conntrack_locks_all;
 /* serialize hash resizes and nf_ct_iterate_cleanup */
 static DEFINE_MUTEX(nf_conntrack_mutex);
 
-#define GC_SCAN_INTERVAL	(120u * HZ)
+#define GC_SCAN_INTERVAL_MAX	(60ul * HZ)
+#define GC_SCAN_INTERVAL_MIN	(1ul * HZ)
+
+/* clamp timeouts to this value (TCP unacked) */
+#define GC_SCAN_INTERVAL_CLAMP	(300ul * HZ)
+
+/* large initial bias so that we don't scan often just because we have
+ * three entries with a 1s timeout.
+ */
+#define GC_SCAN_INTERVAL_INIT	INT_MAX
+
 #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
+#define GC_SCAN_EXPIRED_MAX	(64000u / HZ)
 
 #define MIN_CHAINLEN	8u
 #define MAX_CHAINLEN	(32u - MIN_CHAINLEN)
@@ -1420,16 +1433,28 @@  static bool gc_worker_can_early_drop(const struct nf_conn *ct)
 
 static void gc_worker(struct work_struct *work)
 {
-	unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION;
 	unsigned int i, hashsz, nf_conntrack_max95 = 0;
-	unsigned long next_run = GC_SCAN_INTERVAL;
+	u32 end_time, start_time = nfct_time_stamp;
 	struct conntrack_gc_work *gc_work;
+	unsigned int expired_count = 0;
+	unsigned long next_run;
+	s32 delta_time;
+
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
 	i = gc_work->next_bucket;
 	if (gc_work->early_drop)
 		nf_conntrack_max95 = nf_conntrack_max / 100u * 95u;
 
+	if (i == 0) {
+		gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT;
+		gc_work->start_time = start_time;
+	}
+
+	next_run = gc_work->avg_timeout;
+
+	end_time = start_time + GC_SCAN_MAX_DURATION;
+
 	do {
 		struct nf_conntrack_tuple_hash *h;
 		struct hlist_nulls_head *ct_hash;
@@ -1446,6 +1471,7 @@  static void gc_worker(struct work_struct *work)
 
 		hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
 			struct nf_conntrack_net *cnet;
+			unsigned long expires;
 			struct net *net;
 
 			tmp = nf_ct_tuplehash_to_ctrack(h);
@@ -1455,11 +1481,29 @@  static void gc_worker(struct work_struct *work)
 				continue;
 			}
 
+			if (expired_count > GC_SCAN_EXPIRED_MAX) {
+				rcu_read_unlock();
+
+				gc_work->next_bucket = i;
+				gc_work->avg_timeout = next_run;
+
+				delta_time = nfct_time_stamp - gc_work->start_time;
+
+				/* re-sched immediately if total cycle time is exceeded */
+				next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX;
+				goto early_exit;
+			}
+
 			if (nf_ct_is_expired(tmp)) {
 				nf_ct_gc_expired(tmp);
+				expired_count++;
 				continue;
 			}
 
+			expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP);
+			next_run += expires;
+			next_run /= 2u;
+
 			if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp))
 				continue;
 
@@ -1477,8 +1521,10 @@  static void gc_worker(struct work_struct *work)
 				continue;
 			}
 
-			if (gc_worker_can_early_drop(tmp))
+			if (gc_worker_can_early_drop(tmp)) {
 				nf_ct_kill(tmp);
+				expired_count++;
+			}
 
 			nf_ct_put(tmp);
 		}
@@ -1491,33 +1537,38 @@  static void gc_worker(struct work_struct *work)
 		cond_resched();
 		i++;
 
-		if (time_after(jiffies, end_time) && i < hashsz) {
+		delta_time = nfct_time_stamp - end_time;
+		if (delta_time > 0 && i < hashsz) {
+			gc_work->avg_timeout = next_run;
 			gc_work->next_bucket = i;
 			next_run = 0;
-			break;
+			goto early_exit;
 		}
 	} while (i < hashsz);
 
+	gc_work->next_bucket = 0;
+
+	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
+
+	delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1);
+	if (next_run > (unsigned long)delta_time)
+		next_run -= delta_time;
+	else
+		next_run = 1;
+
+early_exit:
 	if (gc_work->exiting)
 		return;
 
-	/*
-	 * Eviction will normally happen from the packet path, and not
-	 * from this gc worker.
-	 *
-	 * This worker is only here to reap expired entries when system went
-	 * idle after a busy period.
-	 */
-	if (next_run) {
+	if (next_run)
 		gc_work->early_drop = false;
-		gc_work->next_bucket = 0;
-	}
+
 	queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run);
 }
 
 static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
 {
-	INIT_DEFERRABLE_WORK(&gc_work->dwork, gc_worker);
+	INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
 	gc_work->exiting = false;
 }