diff mbox series

[bpf-next,09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible

Message ID 20230814172857.1366162-1-yonghong.song@linux.dev (mailing list archive)
State Changes Requested
Delegated to: BPF
Series Add support for local percpu kptr

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1342 this patch: 1342
netdev/cc_maintainers warning 7 maintainers not CCed: kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com sdf@google.com song@kernel.org jolsa@kernel.org haoluo@google.com
netdev/build_clang success Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1365 this patch: 1365
netdev/checkpatch warning WARNING: line length of 100 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-30 fail Logs for veristat
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 fail Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 fail Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 fail Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 fail Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 fail Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 fail Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 fail Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 fail Logs for veristat
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs_no_alu32 on aarch64 with gcc

Commit Message

Yonghong Song Aug. 14, 2023, 5:28 p.m. UTC
In the previous selftests/bpf patch, we have:
  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
          goto out;

  p1 = bpf_kptr_xchg(&e->pc, p);
  if (p1) {
          /* race condition */
          bpf_percpu_obj_drop(p1);
  }

  p = e->pc;
  if (!p)
          goto out;

After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
This is because the second argument of bpf_kptr_xchg() is marked
OBJ_RELEASE, so the verifier invalidates it after the call.
After bpf_kptr_xchg(), 'p' is therefore an unknown scalar,
and the bpf program needs to re-read it from the map value.

This patch checks whether 'p' has both MEM_ALLOC and MEM_PERCPU set
and whether 'p' is RCU protected. If so, 'p' can be marked
as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is no longer
an owning reference. With this change, the re-read
from the map value becomes unnecessary.
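
The type transition described above can be modelled in userspace. This is a minimal sketch, not kernel code: struct model_reg, the MODEL_* flag values and model_release() are hypothetical stand-ins for the verifier's register state, its type flags and the loop body in release_reference(), and in_rcu_cs is passed in as a plain boolean rather than derived from the verifier environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; the kernel defines its own. */
#define MODEL_MEM_ALLOC   (1u << 0)
#define MODEL_MEM_PERCPU  (1u << 1)
#define MODEL_MEM_RCU     (1u << 2)

/* Hypothetical, heavily reduced stand-in for a verifier register. */
struct model_reg {
	uint32_t type;        /* combination of MODEL_* flags */
	uint32_t ref_obj_id;  /* non-zero while the register owns a reference */
	bool valid;           /* false once the register may no longer be used */
};

/*
 * Models the per-register step of the posted patch: a released percpu
 * allocation inside an RCU critical section stays usable as a
 * non-owning, RCU-protected pointer; every other release invalidates it.
 */
static void model_release(struct model_reg *reg, uint32_t ref_obj_id,
			  bool in_rcu_cs)
{
	if (reg->ref_obj_id != ref_obj_id)
		return;

	if (in_rcu_cs && (reg->type & MODEL_MEM_ALLOC) &&
	    (reg->type & MODEL_MEM_PERCPU)) {
		reg->ref_obj_id = 0;            /* no longer an owning reference */
		reg->type &= ~MODEL_MEM_ALLOC;
		reg->type |= MODEL_MEM_RCU;
	} else {
		/* stand-in for mark_reg_invalid(): only records "unusable" */
		reg->valid = false;
		reg->type = 0;
	}
}
```

In this model, a register holding MODEL_MEM_ALLOC | MODEL_MEM_PERCPU that is released inside an RCU critical section ends up with MODEL_MEM_PERCPU | MODEL_MEM_RCU and no reference id, while the same release outside an RCU critical section invalidates it.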

Note that re-reading 'e->pc' after bpf_kptr_xchg() might return
a value different from 'p': immediately before 'p = e->pc',
another cpu may do its own bpf_kptr_xchg() and swap another value
into 'e->pc'. In that case 'p = e->pc' may
get either 'p' or another value, so the race condition already exists.
Removing the direct re-read therefore seems fine too.
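
The interleaving described above can be replayed serially in userspace. The sketch below is only an illustration under stated assumptions: struct model_elem and the model_* helpers are hypothetical names, C11 atomic_exchange() stands in for bpf_kptr_xchg(), and the two "cpus" are simulated by calling the exchange twice in a fixed order instead of running real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a map value holding one pointer slot. */
struct model_elem {
	_Atomic(int *) pc;
};

static void model_elem_init(struct model_elem *e)
{
	atomic_init(&e->pc, NULL);
}

/* Stand-in for bpf_kptr_xchg(): store new_ptr, return the old pointer. */
static int *model_kptr_xchg(struct model_elem *e, int *new_ptr)
{
	return atomic_exchange(&e->pc, new_ptr);
}

/*
 * Replays one serial interleaving and returns what "cpu A" sees when it
 * re-reads the slot after "cpu B" has swapped in its own pointer.
 */
static int *model_reread_after_race(struct model_elem *e, int *a_new,
				    int *b_new)
{
	int *old;

	old = model_kptr_xchg(e, a_new);  /* cpu A publishes a_new */
	(void)old;                        /* a real program would drop a non-NULL old value */
	old = model_kptr_xchg(e, b_new);  /* cpu B swaps before A re-reads */
	(void)old;

	return atomic_load(&e->pc);       /* cpu A: p = e->pc */
}
```

With this ordering, the re-read on "cpu A" returns the pointer installed by "cpu B" rather than the one "cpu A" swapped in, which is the pre-existing race the commit message refers to.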

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Kumar Kartikeya Dwivedi Aug. 19, 2023, 1:44 a.m. UTC | #1
On Mon, 14 Aug 2023 at 23:00, Yonghong Song <yonghong.song@linux.dev> wrote:
>
> In previous selftests/bpf patch, we have
>   p = bpf_percpu_obj_new(struct val_t);
>   if (!p)
>           goto out;
>
>   p1 = bpf_kptr_xchg(&e->pc, p);
>   if (p1) {
>           /* race condition */
>           bpf_percpu_obj_drop(p1);
>   }
>
>   p = e->pc;
>   if (!p)
>           goto out;
>
> After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
> This is due to that the second argument of bpf_kptr_xchg() is marked
> OBJ_RELEASE and it will be marked as invalid after the call.
> So after bpf_kptr_xchg(), 'p' is an unknown scalar,
> and the bpf program needs to reread from the map value.
>
> This patch checks if the 'p' has type MEM_ALLOC and MEM_PERCPU,
> and if 'p' is RCU protected. If this is the case, 'p' can be marked
> as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is not
> an owning reference any more. Such a change makes re-read
> from the map value unnecessary.
>
> Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
> a different value from 'p' if immediately before 'p = e->pc',
> another cpu may do another bpf_kptr_xchg() and swap in another value
> into 'e->pc'. If this is the case, then 'p = e->pc' may
> get either 'p' or another value, and race condition already exists.
> So removing direct re-reading seems fine too.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  kernel/bpf/verifier.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6fc200cb68b6..6fa458e13bfc 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env,
>                 return err;
>
>         bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> -               if (reg->ref_obj_id == ref_obj_id)
> -                       mark_reg_invalid(env, reg);
> +               if (reg->ref_obj_id == ref_obj_id) {
> +                       if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {

Wouldn't this check also be true in case of bpf_percpu_obj_drop(p)
inside RCU CS/non-sleepable prog?
Do we want to permit access to p after drop in that case? I think it
will be a bit unintuitive.
I think we should preserve normal behavior for everything except for
kptr_xchg of a percpu_kptr.

> +                               reg->ref_obj_id = 0;
> +                               reg->type &= ~MEM_ALLOC;
> +                               reg->type |= MEM_RCU;
> +                       } else {
> +                               mark_reg_invalid(env, reg);
> +                       }
> +               }
>         }));
>
>         return 0;
> --
> 2.34.1
>
>
Yonghong Song Aug. 20, 2023, 4:19 a.m. UTC | #2
On 8/18/23 6:44 PM, Kumar Kartikeya Dwivedi wrote:
> On Mon, 14 Aug 2023 at 23:00, Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> In previous selftests/bpf patch, we have
>>    p = bpf_percpu_obj_new(struct val_t);
>>    if (!p)
>>            goto out;
>>
>>    p1 = bpf_kptr_xchg(&e->pc, p);
>>    if (p1) {
>>            /* race condition */
>>            bpf_percpu_obj_drop(p1);
>>    }
>>
>>    p = e->pc;
>>    if (!p)
>>            goto out;
>>
>> After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
>> This is due to that the second argument of bpf_kptr_xchg() is marked
>> OBJ_RELEASE and it will be marked as invalid after the call.
>> So after bpf_kptr_xchg(), 'p' is an unknown scalar,
>> and the bpf program needs to reread from the map value.
>>
>> This patch checks if the 'p' has type MEM_ALLOC and MEM_PERCPU,
>> and if 'p' is RCU protected. If this is the case, 'p' can be marked
>> as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is not
>> an owning reference any more. Such a change makes re-read
>> from the map value unnecessary.
>>
>> Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
>> a different value from 'p' if immediately before 'p = e->pc',
>> another cpu may do another bpf_kptr_xchg() and swap in another value
>> into 'e->pc'. If this is the case, then 'p = e->pc' may
>> get either 'p' or another value, and race condition already exists.
>> So removing direct re-reading seems fine too.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   kernel/bpf/verifier.c | 11 +++++++++--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 6fc200cb68b6..6fa458e13bfc 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env,
>>                  return err;
>>
>>          bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>> -               if (reg->ref_obj_id == ref_obj_id)
>> -                       mark_reg_invalid(env, reg);
>> +               if (reg->ref_obj_id == ref_obj_id) {
>> +                       if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
> 
> Wouldn't this check also be true in case of bpf_percpu_obj_drop(p)
> inside RCU CS/non-sleepable prog?
> Do we want to permit access to p after drop in that case? I think it
> will be a bit unintuitive.
> I think we should preserve normal behavior for everything except for
> kptr_xchg of a percpu_kptr.

You are correct. The above condition also applies to bpf_percpu_obj_drop(),
and we should change MEM_ALLOC to MEM_RCU only for
bpf_percpu_obj_new(). Will fix.

> 
>> +                               reg->ref_obj_id = 0;
>> +                               reg->type &= ~MEM_ALLOC;
>> +                               reg->type |= MEM_RCU;
>> +                       } else {
>> +                               mark_reg_invalid(env, reg);
>> +                       }
>> +               }
>>          }));
>>
>>          return 0;
>> --
>> 2.34.1
>>
>>
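
The concern raised in this exchange can be made concrete with a small userspace model. This is a sketch under stated assumptions, not the follow-up patch: the MODEL_* flag values, enum model_release_site and both predicates are hypothetical, and how a later revision actually tells bpf_kptr_xchg() apart from bpf_percpu_obj_drop() is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; the kernel defines its own. */
#define MODEL_MEM_ALLOC   (1u << 0)
#define MODEL_MEM_PERCPU  (1u << 1)

/* Hypothetical tag for the call that triggered the release. */
enum model_release_site {
	MODEL_RELEASE_KPTR_XCHG,
	MODEL_RELEASE_OBJ_DROP,
};

/* Condition as posted: it cannot see which call released the object. */
static bool model_keeps_access_as_posted(uint32_t type, bool in_rcu_cs)
{
	return in_rcu_cs && (type & MODEL_MEM_ALLOC) &&
	       (type & MODEL_MEM_PERCPU);
}

/*
 * Narrowed condition along the lines suggested in the review: keep
 * access only when the release came from a kptr exchange, so a dropped
 * object is still invalidated.
 */
static bool model_keeps_access_narrowed(uint32_t type, bool in_rcu_cs,
					enum model_release_site site)
{
	return site == MODEL_RELEASE_KPTR_XCHG &&
	       model_keeps_access_as_posted(type, in_rcu_cs);
}
```

In this model the posted condition returns true for a percpu allocation in an RCU critical section regardless of the releasing call, whereas the narrowed predicate returns true for MODEL_RELEASE_KPTR_XCHG and false for MODEL_RELEASE_OBJ_DROP.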

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6fc200cb68b6..6fa458e13bfc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8854,8 +8854,15 @@  static int release_reference(struct bpf_verifier_env *env,
 		return err;
 
 	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-		if (reg->ref_obj_id == ref_obj_id)
-			mark_reg_invalid(env, reg);
+		if (reg->ref_obj_id == ref_obj_id) {
+			if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
+				reg->ref_obj_id = 0;
+				reg->type &= ~MEM_ALLOC;
+				reg->type |= MEM_RCU;
+			} else {
+				mark_reg_invalid(env, reg);
+			}
+		}
 	}));
 
 	return 0;