
[bpf-next,v2,1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map

Message ID 20231211073843.1888058-2-houtao@huaweicloud.com (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf: Use GFP_KERNEL in bpf_event_entry_gen()

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1130 this patch: 1130
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 1143 this patch: 1143
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1157 this patch: 1157
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 33 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-7 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-10 success Logs for x86_64-gcc / build-release

Commit Message

Hou Tao Dec. 11, 2023, 7:38 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
ops->map_fd_put_ptr(), so don't use rcu-read-lock for these two
callbacks; only use rcu-read-lock for the underlying update
operations in bpf_fd_{array,htab}_map_update_elem().

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/arraymap.c | 2 ++
 kernel/bpf/hashtab.c  | 2 ++
 kernel/bpf/syscall.c  | 4 ----
 3 files changed, 4 insertions(+), 4 deletions(-)

Comments

Alexei Starovoitov Dec. 14, 2023, 1:10 a.m. UTC | #1
On Sun, Dec 10, 2023 at 11:37 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> callbacks and only uses rcu-read-lock for the underlying update
> operations in bpf_fd_{array,htab}_map_update_elem().
>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/arraymap.c | 2 ++
>  kernel/bpf/hashtab.c  | 2 ++
>  kernel/bpf/syscall.c  | 4 ----
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 8d365bda9a8b..6cf47bcb7b83 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -863,7 +863,9 @@ int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
>                 map->ops->map_poke_run(map, index, old_ptr, new_ptr);
>                 mutex_unlock(&array->aux->poke_mutex);
>         } else {
> +               rcu_read_lock();
>                 old_ptr = xchg(array->ptrs + index, new_ptr);
> +               rcu_read_unlock();
>         }
>
>         if (old_ptr)
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 5b9146fa825f..4c28fd51ac01 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2523,7 +2523,9 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>         if (IS_ERR(ptr))
>                 return PTR_ERR(ptr);
>
> +       rcu_read_lock();
>         ret = htab_map_update_elem(map, key, &ptr, map_flags);
> +       rcu_read_unlock();
>         if (ret)
>                 map->ops->map_fd_put_ptr(map, ptr, false);
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a76467fda558..019d18d33d63 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -183,15 +183,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>                 err = bpf_percpu_cgroup_storage_update(map, key, value,
>                                                        flags);
>         } else if (IS_FD_ARRAY(map)) {
> -               rcu_read_lock();
>                 err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>                                                    flags);
> -               rcu_read_unlock();
>         } else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
> -               rcu_read_lock();
>                 err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>                                                   flags);
> -               rcu_read_unlock();

Sorry. I misunderstood the previous diff.
Dropping rcu_read_lock() around bpf_fd_array_map_update_elem()
is actually mandatory, since it may do mutex_lock
which will splat under rcu CS.

Adding rcu_read_lock() to bpf_fd_htab_map_update_elem()
is necessary just to avoid the WARN.
The RCU CS doesn't provide any protection to any pointer.
It's worth adding a comment.

And
 +               rcu_read_lock();
                 old_ptr = xchg(array->ptrs + index, new_ptr);
 +               rcu_read_unlock();
is wrong and unnecessary.
Neither old_ptr nor new_ptr are rcu protected.
This rcu_read_lock() only causes confusion.

pw-bot: cr
Hou Tao Dec. 14, 2023, 1:57 a.m. UTC | #2
Hi,

On 12/14/2023 9:10 AM, Alexei Starovoitov wrote:
> On Sun, Dec 10, 2023 at 11:37 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
>> callbacks and only uses rcu-read-lock for the underlying update
>> operations in bpf_fd_{array,htab}_map_update_elem().
>>
>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/arraymap.c | 2 ++
>>  kernel/bpf/hashtab.c  | 2 ++
>>  kernel/bpf/syscall.c  | 4 ----
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 8d365bda9a8b..6cf47bcb7b83 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -863,7 +863,9 @@ int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
>>                 map->ops->map_poke_run(map, index, old_ptr, new_ptr);
>>                 mutex_unlock(&array->aux->poke_mutex);
>>         } else {
>> +               rcu_read_lock();
>>                 old_ptr = xchg(array->ptrs + index, new_ptr);
>> +               rcu_read_unlock();
>>         }
>>
>>         if (old_ptr)
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 5b9146fa825f..4c28fd51ac01 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2523,7 +2523,9 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>         if (IS_ERR(ptr))
>>                 return PTR_ERR(ptr);
>>
>> +       rcu_read_lock();
>>         ret = htab_map_update_elem(map, key, &ptr, map_flags);
>> +       rcu_read_unlock();
>>         if (ret)
>>                 map->ops->map_fd_put_ptr(map, ptr, false);
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index a76467fda558..019d18d33d63 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -183,15 +183,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>>                 err = bpf_percpu_cgroup_storage_update(map, key, value,
>>                                                        flags);
>>         } else if (IS_FD_ARRAY(map)) {
>> -               rcu_read_lock();
>>                 err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>>                                                    flags);
>> -               rcu_read_unlock();
>>         } else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
>> -               rcu_read_lock();
>>                 err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>>                                                   flags);
>> -               rcu_read_unlock();
> Sorry. I misunderstood the previous diff.
> Dropping rcu_read_lock() around bpf_fd_array_map_update_elem()
> is actually mandatory, since it may do mutex_lock
> which will splat under rcu CS.

Acquiring the mutex is only possible for a prog fd array, but
bpf_fd_array_map_update_elem() has already been called above to handle
prog fd arrays, and there is no rcu_read_lock() acquired around that
call.
>
> Adding rcu_read_lock() to bpf_fd_htab_map_update_elem()
> is necessary just to avoid the WARN.
> The RCU CS doesn't provide any protection to any pointer.
> It's worth adding a comment.

Yes. There is no spin-lock support in the fd htab; the update operation
for the fd htab is done under the bucket lock. So the RCU CS is only
there to make the WARN_ON_ONCE() in htab_map_update_elem() happy.

To ensure I fully understand what you mean, let me rephrase the things
that need to be done:
1) Repost v3 based on v1
2) In v3, add comments in bpf_fd_htab_map_update_elem() to explain why
the RCU CS is needed. Is that correct?

>
> And
>  +               rcu_read_lock();
>                  old_ptr = xchg(array->ptrs + index, new_ptr);
>  +               rcu_read_unlock();
> is wrong and unnecessary.
> Neither old_ptr nor new_ptr are rcu protected.
> This rcu_read_lock() only causes confusion.

OK. Will remove.
>
> pw-bot: cr
Alexei Starovoitov Dec. 14, 2023, 2:19 a.m. UTC | #3
On Wed, Dec 13, 2023 at 5:57 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 12/14/2023 9:10 AM, Alexei Starovoitov wrote:
> > On Sun, Dec 10, 2023 at 11:37 PM Hou Tao <houtao@huaweicloud.com> wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> >> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> >> callbacks and only uses rcu-read-lock for the underlying update
> >> operations in bpf_fd_{array,htab}_map_update_elem().
> >>
> >> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >> ---
> >>  kernel/bpf/arraymap.c | 2 ++
> >>  kernel/bpf/hashtab.c  | 2 ++
> >>  kernel/bpf/syscall.c  | 4 ----
> >>  3 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> >> index 8d365bda9a8b..6cf47bcb7b83 100644
> >> --- a/kernel/bpf/arraymap.c
> >> +++ b/kernel/bpf/arraymap.c
> >> @@ -863,7 +863,9 @@ int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
> >>                 map->ops->map_poke_run(map, index, old_ptr, new_ptr);
> >>                 mutex_unlock(&array->aux->poke_mutex);
> >>         } else {
> >> +               rcu_read_lock();
> >>                 old_ptr = xchg(array->ptrs + index, new_ptr);
> >> +               rcu_read_unlock();
> >>         }
> >>
> >>         if (old_ptr)
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 5b9146fa825f..4c28fd51ac01 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -2523,7 +2523,9 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
> >>         if (IS_ERR(ptr))
> >>                 return PTR_ERR(ptr);
> >>
> >> +       rcu_read_lock();
> >>         ret = htab_map_update_elem(map, key, &ptr, map_flags);
> >> +       rcu_read_unlock();
> >>         if (ret)
> >>                 map->ops->map_fd_put_ptr(map, ptr, false);
> >>
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index a76467fda558..019d18d33d63 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -183,15 +183,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
> >>                 err = bpf_percpu_cgroup_storage_update(map, key, value,
> >>                                                        flags);
> >>         } else if (IS_FD_ARRAY(map)) {
> >> -               rcu_read_lock();
> >>                 err = bpf_fd_array_map_update_elem(map, map_file, key, value,
> >>                                                    flags);
> >> -               rcu_read_unlock();
> >>         } else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
> >> -               rcu_read_lock();
> >>                 err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
> >>                                                   flags);
> >> -               rcu_read_unlock();
> > Sorry. I misunderstood the previous diff.
> > Dropping rcu_read_lock() around bpf_fd_array_map_update_elem()
> > is actually mandatory, since it may do mutex_lock
> > which will splat under rcu CS.
>
> Acquiring mutex_lock is only possible for program fd array, but
> bpf_fd_array_map_update_elem() has already been called above to handle
> program fd array and there is no rcu_read_lock() being acquired.

ahh. right. That explains why we don't have a splat now. good.

> >
> > Adding rcu_read_lock() to bpf_fd_htab_map_update_elem()
> > is necessary just to avoid the WARN.
> > The RCU CS doesn't provide any protection to any pointer.
> > It's worth adding a comment.
>
> Yes. There is no spin-lock support in fd htab, the update operation for
> fd htab is taken under bucket lock. So the RCU CS is only to make the
> WARN_ON_ONCE() in htab_map_update_elem() happy.
>
> To ensure I fully understand what you mean, let me rephrase the things
> that need to done:
> 1) Repost v3 based on v1
> 2) In v3, add comments in bpf_fd_htab_map_update_elem() to explain why
> the RCU CS is needed. Is that correct ?

Yes.
Thanks

Patch

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 8d365bda9a8b..6cf47bcb7b83 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -863,7 +863,9 @@  int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
 		map->ops->map_poke_run(map, index, old_ptr, new_ptr);
 		mutex_unlock(&array->aux->poke_mutex);
 	} else {
+		rcu_read_lock();
 		old_ptr = xchg(array->ptrs + index, new_ptr);
+		rcu_read_unlock();
 	}
 
 	if (old_ptr)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5b9146fa825f..4c28fd51ac01 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2523,7 +2523,9 @@  int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 
+	rcu_read_lock();
 	ret = htab_map_update_elem(map, key, &ptr, map_flags);
+	rcu_read_unlock();
 	if (ret)
 		map->ops->map_fd_put_ptr(map, ptr, false);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a76467fda558..019d18d33d63 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -183,15 +183,11 @@  static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 		err = bpf_percpu_cgroup_storage_update(map, key, value,
 						       flags);
 	} else if (IS_FD_ARRAY(map)) {
-		rcu_read_lock();
 		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
 						   flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
-		rcu_read_lock();
 		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
 						  flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		/* rcu_read_lock() is not needed */
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,