diff mbox series

[net] netfilter: fix conntrack flows stack issue on cleanup.

Message ID 1633714526-31678-1-git-send-email-volodymyr.mytnyk@plvision.eu (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [net] netfilter: fix conntrack flows stack issue on cleanup. | expand

Checks

Context Check Description
netdev/cover_letter success Single patches do not need cover letters
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success No Fixes tag
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Volodymyr Mytnyk Oct. 8, 2021, 5:35 p.m. UTC
From: Volodymyr Mytnyk <vmytnyk@marvell.com>

On busy system with big number (few thousands) of HW offloaded flows, it
is possible to hit the situation, where some of the conntack flows are
stuck in conntrack table (as offloaded) and cannot be removed by user.

This behaviour happens if user has configured conntack using tc sub-system,
offloaded those flows for HW and then deleted tc configuration from Linux
system by deleting the tc qdiscs.

When qdiscs are removed, the nf_flow_table_free() is called to do the
cleanup of HW offloaded flows in conntrack table.

...
process_one_work
  tcf_ct_flow_table_cleanup_work()
    nf_flow_table_free()

The nf_flow_table_free() does the following things:

  1. cancels gc workqueue
  2. marks all flows as teardown
  3. executes nf_flow_offload_gc_step() once for each flow to
     trigger correct teardown flow procedure (e.g., allocate
     work to delete the HW flow and marks the flow as "dying").
  4. waits for all scheduled flow offload works to be finished.
  5. executes nf_flow_offload_gc_step() once for each flow to
     trigger the deleting of flows.

Root cause:

In step 3, nf_flow_offload_gc_step() expects to move flow to "dying"
state by using nf_flow_offload_del() and deletes the flow in next
nf_flow_offload_gc_step() iteration. But, if flow is in "pending" state
for some reason (e.g., reading HW stats), it will not be moved to
"dying" state as expected by nf_flow_offload_gc_step() and will not
be marked as "dead" for delition.

In step 5, nf_flow_offload_gc_step() assumes that all flows marked
as "dead" and will be deleted by this call, but this is not true since
the state was not set diring previous nf_flow_offload_gc_step()
call.

It issue causes some of the flows to get stuck in connection tracking
system or not release properly.

To fix this problem, add nf_flow_table_offload_flush() call between 2 & 3
step, to make sure no other flow offload works will be in "pending" state
during step 3.

Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
---
 net/netfilter/nf_flow_table_core.c | 2 ++
 1 file changed, 2 insertions(+)
diff mbox series

Patch

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 1e50908b1b7e..0de79835f628 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -638,6 +638,8 @@  void nf_flow_table_free(struct nf_flowtable *flow_table)
 
 	cancel_delayed_work_sync(&flow_table->gc_work);
 	nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
+	/* wait to finish */
+	nf_flow_table_offload_flush(flow_table);
 	nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, flow_table);
 	nf_flow_table_offload_flush(flow_table);
 	if (nf_flowtable_hw_offload(flow_table))