[bpf-next] bpf: Use separate RCU callbacks for freeing selem

Message ID 20230303141542.300068-1-memxor@gmail.com (mailing list archive)
State Deferred
Delegated to: BPF

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-17
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 93 this patch: 94
netdev/cc_maintainers warning 7 maintainers not CCed: jolsa@kernel.org john.fastabend@gmail.com martin.lau@linux.dev yhs@fb.com song@kernel.org haoluo@google.com sdf@google.com
netdev/build_clang success Errors and warnings before: 12 this patch: 12
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 93 this patch: 94
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns; WARNING: line length of 91 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kumar Kartikeya Dwivedi March 3, 2023, 2:15 p.m. UTC
Martin suggested that instead of using a byte in the hole (which he has
a use for in his future patch) in bpf_local_storage_elem, we can
dispatch a different call_rcu callback based on whether we need to free
special fields in bpf_local_storage_elem data. The free path, described
in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"),
only waits for call_rcu callbacks when there are special (kptrs, etc.)
fields in the map value, hence it is necessary that we only access
smap in this case.

Therefore, dispatch different RCU callbacks based on whether the BPF map has
a valid btf_record, and dereference and use smap's btf_record only when it
is valid.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_local_storage.h |  6 ---
 kernel/bpf/bpf_local_storage.c    | 76 +++++++++++++++++++------------
 2 files changed, 46 insertions(+), 36 deletions(-)

--
2.39.2

Comments

Martin KaFai Lau March 3, 2023, 3:40 p.m. UTC | #1
On 3/3/23 6:15 AM, Kumar Kartikeya Dwivedi wrote:
> Martin suggested that instead of using a byte in the hole (which he has
> a use for in his future patch) in bpf_local_storage_elem, we can
> dispatch a different call_rcu callback based on whether we need to free
> special fields in bpf_local_storage_elem data. The free path, described
> in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"),
> only waits for call_rcu callbacks when there are special (kptrs, etc.)
> fields in the map value, hence it is necessary that we only access
> smap in this case.
> 
> Therefore, dispatch different RCU callbacks based on whether the BPF map has
> a valid btf_record, and dereference and use smap's btf_record only when it
> is valid.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

Thanks for your patch. I have already made a similar change in my local branch, 
with some differences since I refactored it a little for my work. The set is 
almost ready. Do you mind if I include your patch in my set and keep your SOB?
Kumar Kartikeya Dwivedi March 3, 2023, 5:45 p.m. UTC | #2
On Fri, 3 Mar 2023 at 16:40, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 3/3/23 6:15 AM, Kumar Kartikeya Dwivedi wrote:
> > Martin suggested that instead of using a byte in the hole (which he has
> > a use for in his future patch) in bpf_local_storage_elem, we can
> > dispatch a different call_rcu callback based on whether we need to free
> > special fields in bpf_local_storage_elem data. The free path, described
> > in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"),
> > only waits for call_rcu callbacks when there are special (kptrs, etc.)
> > fields in the map value, hence it is necessary that we only access
> > smap in this case.
> >
> > Therefore, dispatch different RCU callbacks based on whether the BPF map
> > has a valid btf_record, and dereference and use smap's btf_record only
> > when it is valid.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>
> Thanks for your patch. I have already made a similar change in my local branch,
> with some differences since I refactored it a little for my work. The set is
> almost ready. Do you mind if I include your patch in my set and keep your SOB?
>

No problem, please do.
Martin KaFai Lau March 3, 2023, 5:55 p.m. UTC | #3
On 3/3/23 9:45 AM, Kumar Kartikeya Dwivedi wrote:
> On Fri, 3 Mar 2023 at 16:40, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 3/3/23 6:15 AM, Kumar Kartikeya Dwivedi wrote:
>>> Martin suggested that instead of using a byte in the hole (which he has
>>> a use for in his future patch) in bpf_local_storage_elem, we can
>>> dispatch a different call_rcu callback based on whether we need to free
>>> special fields in bpf_local_storage_elem data. The free path, described
>>> in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"),
>>> only waits for call_rcu callbacks when there are special (kptrs, etc.)
>>> fields in the map value, hence it is necessary that we only access
>>> smap in this case.
>>>
>>> Therefore, dispatch different RCU callbacks based on whether the BPF map
>>> has a valid btf_record, and dereference and use smap's btf_record only
>>> when it is valid.
>>>
>>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>
>> Thanks for your patch. I have already made a similar change in my local branch
>> which has some differences like refactored it a little for my work. The set is
>> almost ready. Do you mind if I include your patch in my set and keep your SOB?
>>
> 
> No problem, please do.

Please ignore my previous message. I will make some adjustments on my set.

Applied with some changes to the dereferencing of SDATA(selem)->smap, to 
address these warnings:

# this is addressed by rcu_dereference_protected(..., true)
../kernel/bpf/bpf_local_storage.c:117:41: warning: dereference of noderef expression

# this is addressed by directly using the earlier dereferenced smap pointer
../kernel/bpf/bpf_local_storage.c:200:27: warning: dereference of noderef expression
Patch

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 0fe92986412b..6d37a40cd90e 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -74,12 +74,6 @@  struct bpf_local_storage_elem {
 	struct hlist_node snode;	/* Linked to bpf_local_storage */
 	struct bpf_local_storage __rcu *local_storage;
 	struct rcu_head rcu;
-	bool can_use_smap; /* Is it safe to access smap in bpf_selem_free_* RCU
-			    * callbacks? bpf_local_storage_map_free only
-			    * executes rcu_barrier when there are special
-			    * fields, this field remembers that to ensure we
-			    * don't access already freed smap in sdata.
-			    */
 	/* 8 bytes hole */
 	/* The data is stored in another cacheline to minimize
 	 * the number of cachelines access during a cache hit.
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 2bdd722fe293..3b4caf1d86d1 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -109,30 +109,33 @@  void bpf_local_storage_free_rcu(struct rcu_head *rcu)
 		kfree_rcu(local_storage, rcu);
 }

-static void bpf_selem_free_rcu(struct rcu_head *rcu)
+static void bpf_selem_free_fields_rcu(struct rcu_head *rcu)
 {
 	struct bpf_local_storage_elem *selem;

 	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
-	/* The can_use_smap bool is set whenever we need to free additional
-	 * fields in selem data before freeing selem. bpf_local_storage_map_free
-	 * only executes rcu_barrier to wait for RCU callbacks when it has
-	 * special fields, hence we can only conditionally dereference smap, as
-	 * by this time the map might have already been freed without waiting
-	 * for our call_rcu callback if it did not have any special fields.
-	 */
-	if (selem->can_use_smap)
-		bpf_obj_free_fields(SDATA(selem)->smap->map.record, SDATA(selem)->data);
+	bpf_obj_free_fields(SDATA(selem)->smap->map.record, SDATA(selem)->data);
 	kfree(selem);
 }

-static void bpf_selem_free_tasks_trace_rcu(struct rcu_head *rcu)
+static void bpf_selem_free_fields_trace_rcu(struct rcu_head *rcu)
 {
 	/* Free directly if Tasks Trace RCU GP also implies RCU GP */
 	if (rcu_trace_implies_rcu_gp())
-		bpf_selem_free_rcu(rcu);
+		bpf_selem_free_fields_rcu(rcu);
 	else
-		call_rcu(rcu, bpf_selem_free_rcu);
+		call_rcu(rcu, bpf_selem_free_fields_rcu);
+}
+
+static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
+{
+	struct bpf_local_storage_elem *selem;
+
+	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
+	if (rcu_trace_implies_rcu_gp())
+		kfree(selem);
+	else
+		kfree_rcu(selem, rcu);
 }

 /* local_storage->lock must be held and selem->local_storage == local_storage.
@@ -145,6 +148,7 @@  static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
 {
 	struct bpf_local_storage_map *smap;
 	bool free_local_storage;
+	struct btf_record *rec;
 	void *owner;

 	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
@@ -185,10 +189,26 @@  static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
 	    SDATA(selem))
 		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);

-	if (use_trace_rcu)
-		call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_tasks_trace_rcu);
-	else
-		call_rcu(&selem->rcu, bpf_selem_free_rcu);
+	/* A different RCU callback is chosen whenever we need to free
+	 * additional fields in selem data before freeing selem.
+	 * bpf_local_storage_map_free only executes rcu_barrier to wait for RCU
+	 * callbacks when it has special fields, hence we can only conditionally
+	 * dereference smap, as by this time the map might have already been
+	 * freed without waiting for our call_rcu callback if it did not have
+	 * any special fields.
+	 */
+	rec = SDATA(selem)->smap->map.record;
+	if (use_trace_rcu) {
+		if (!IS_ERR_OR_NULL(rec))
+			call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_fields_trace_rcu);
+		else
+			call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
+	} else {
+		if (!IS_ERR_OR_NULL(rec))
+			call_rcu(&selem->rcu, bpf_selem_free_fields_rcu);
+		else
+			kfree_rcu(selem, rcu);
+	}

 	return free_local_storage;
 }
@@ -256,11 +276,6 @@  void bpf_selem_link_map(struct bpf_local_storage_map *smap,
 	RCU_INIT_POINTER(SDATA(selem)->smap, smap);
 	hlist_add_head_rcu(&selem->map_node, &b->list);
 	raw_spin_unlock_irqrestore(&b->lock, flags);
-
-	/* If our data will have special fields, smap will wait for us to use
-	 * its record in bpf_selem_free_* RCU callbacks before freeing itself.
-	 */
-	selem->can_use_smap = !IS_ERR_OR_NULL(smap->map.record);
 }

 void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu)
@@ -748,19 +763,20 @@  void bpf_local_storage_map_free(struct bpf_map *map,
 	kvfree(smap->buckets);

 	/* When local storage has special fields, callbacks for
-	 * bpf_selem_free_rcu and bpf_selem_free_tasks_trace_rcu will keep using
-	 * the map BTF record, we need to execute an RCU barrier to wait for
-	 * them as the record will be freed right after our map_free callback.
+	 * bpf_selem_free_fields_rcu and bpf_selem_free_fields_trace_rcu will
+	 * keep using the map BTF record, we need to execute an RCU barrier to
+	 * wait for them as the record will be freed right after our map_free
+	 * callback.
 	 */
 	if (!IS_ERR_OR_NULL(smap->map.record)) {
 		rcu_barrier_tasks_trace();
 		/* We cannot skip rcu_barrier() when rcu_trace_implies_rcu_gp()
 		 * is true, because while call_rcu invocation is skipped in that
-		 * case in bpf_selem_free_tasks_trace_rcu (and all local storage
-		 * maps pass use_trace_rcu = true), there can be call_rcu
-		 * callbacks based on use_trace_rcu = false in the earlier while
-		 * ((selem = ...)) loop or from bpf_local_storage_unlink_nolock
-		 * called from owner's free path.
+		 * case in bpf_selem_free_fields_trace_rcu (and all local
+		 * storage maps pass use_trace_rcu = true), there can be
+		 * call_rcu callbacks based on use_trace_rcu = false in the
+		 * while ((selem = ...)) loop above or when owner's free path
+		 * calls bpf_local_storage_unlink_nolock.
 		 */
 		rcu_barrier();
 	}