diff mbox series

[bpf,v2,1/3] bpf: Enable IRQ after irq_work_raise() completes in unit_alloc()

Message ID 20230901111954.1804721-2-houtao@huaweicloud.com (mailing list archive)
State Accepted
Commit d2fc491bc41d54006c67fb589d4920c3c6c37101
Delegated to: BPF
Headers show
Series bpf: Enable IRQ after irq_work_raise() completes | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1335 this patch: 1335
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1358 this patch: 1358
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 18 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-VM_Test-0 success Logs for ShellCheck
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-1 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-19 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-28 success Logs for veristat
bpf/vmtest-bpf-VM_Test-6 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-10 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-14 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-18 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-21 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-15 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-25 fail Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for test_progs on s390x with gcc

Commit Message

Hou Tao Sept. 1, 2023, 11:19 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

When doing stress test for qp-trie, bpf_mem_alloc() returned NULL
unexpectedly because all qp-trie operations were initiated from
bpf syscalls and there was still available free memory. bpf_obj_new()
has the same problem as shown by the following selftest.

The failure is due to the preemption. irq_work_raise() will invoke
irq_work_claim() first to mark the irq work as pending and then inovke
__irq_work_queue_local() to raise an IPI. So when the current task
which is invoking irq_work_raise() is preempted by other task,
unit_alloc() may return NULL for preemption task as shown below:

task A         task B

unit_alloc()
  // low_watermark = 32
  // free_cnt = 31 after alloc
  irq_work_raise()
    // mark irq work as IRQ_WORK_PENDING
    irq_work_claim()

	       // task B preempts task A
	       unit_alloc()
	         // free_cnt = 30 after alloc
	         // irq work is already PENDING,
	         // so just return
	         irq_work_raise()
	       // does unit_alloc() 30-times
	       ......
	       unit_alloc()
	         // free_cnt = 0 before alloc
	         return NULL

Fix it by enabling IRQ after irq_work_raise() completes. An alternative
fix is using preempt_{disable|enable}_notrace() pair, but it may have
extra overhead. Another feasible fix is to only disable preemption or
IRQ before invoking irq_work_queue() and enable preemption or IRQ after
the invocation completes, but it can't handle the case when
c->low_watermark is 1.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/memalloc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 9c49ae53deaf..962472ace98c 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -734,12 +734,17 @@  static void notrace *unit_alloc(struct bpf_mem_cache *c)
 		}
 	}
 	local_dec(&c->active);
-	local_irq_restore(flags);
 
 	WARN_ON(cnt < 0);
 
 	if (cnt < c->low_watermark)
 		irq_work_raise(c);
+	/* Enable IRQ after the enqueue of irq work completes, so irq work
+	 * will run after IRQ is enabled and free_llist may be refilled by
+	 * irq work before other task preempts current task.
+	 */
+	local_irq_restore(flags);
+
 	return llnode;
 }