diff mbox series

[bpf,v1,3/4] bpf: Augment raw_tp arguments with PTR_MAYBE_NULL

Message ID 20241211020156.18966-4-memxor@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series Explicit raw_tp NULL arguments | expand

Checks

Context Check Description
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat-kernel
bpf/vmtest-bpf-VM_Test-11 success Logs for aarch64-gcc / veristat-meta
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for s390x-gcc / veristat-kernel
bpf/vmtest-bpf-VM_Test-18 success Logs for s390x-gcc / veristat-meta
bpf/vmtest-bpf-VM_Test-19 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-gcc / veristat-kernel / x86_64-gcc veristat_kernel
bpf/vmtest-bpf-VM_Test-29 fail Logs for x86_64-gcc / veristat-meta / x86_64-gcc veristat_meta
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-17 / veristat-kernel
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-17 / veristat-meta
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-44 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-45 success Logs for x86_64-llvm-18 / veristat-kernel
bpf/vmtest-bpf-VM_Test-46 success Logs for x86_64-llvm-18 / veristat-meta
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-42 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-43 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-PR fail PR summary
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc

Commit Message

Kumar Kartikeya Dwivedi Dec. 11, 2024, 2:01 a.m. UTC
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL.  However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].

Thus, there is a discrepancy between the reality, that raw_tp arguments can
actually be NULL, and the verifier's knowledge, that they are never NULL,
causing explicit NULL checks to be deleted, and accesses to such pointers
potentially crashing the kernel.

A previous attempt [1], i.e. the second fixed commit, was made to
simulate symbolic execution as if in most accesses, the argument is a
non-NULL raw_tp, except for conditional jumps.  This tried to suppress
branch prediction while preserving compatibility, but surfaced issues
with production programs that were difficult to solve without increasing
verifier complexity. A more complete discussion of issues and fixes is
available at [2].

Fix this by maintaining an explicit, incomplete list of tracepoints
where the arguments are known to be NULL, and mark the positional
arguments as PTR_MAYBE_NULL. Additionally, capture the tracepoints where
arguments are known to be PTR_ERR, and mark these arguments as scalar
values to prevent potential dereference.

In the future, an automated pass will be used to produce such a list, or
insert __nullable annotations automatically for tracepoints. Anyhow,
this is an attempt to close the gap until the automation lands, and
reflets the current best known list according to Jiri's analysis in [3].

  [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
  [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
  [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
  [3]: https://lore.kernel.org/bpf/Z1d-qbCdtJqg6Er4@krava

Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/btf.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

Comments

Jiri Olsa Dec. 11, 2024, 12:24 p.m. UTC | #1
On Tue, Dec 10, 2024 at 06:01:55PM -0800, Kumar Kartikeya Dwivedi wrote:
> Arguments to a raw tracepoint are tagged as trusted, which carries the
> semantics that the pointer will be non-NULL.  However, in certain cases,
> a raw tracepoint argument may end up being NULL. More context about this
> issue is available in [0].
> 
> Thus, there is a discrepancy between the reality, that raw_tp arguments can
> actually be NULL, and the verifier's knowledge, that they are never NULL,
> causing explicit NULL checks to be deleted, and accesses to such pointers
> potentially crashing the kernel.
> 
> A previous attempt [1], i.e. the second fixed commit, was made to
> simulate symbolic execution as if in most accesses, the argument is a
> non-NULL raw_tp, except for conditional jumps.  This tried to suppress
> branch prediction while preserving compatibility, but surfaced issues
> with production programs that were difficult to solve without increasing
> verifier complexity. A more complete discussion of issues and fixes is
> available at [2].
> 
> Fix this by maintaining an explicit, incomplete list of tracepoints
> where the arguments are known to be NULL, and mark the positional
> arguments as PTR_MAYBE_NULL. Additionally, capture the tracepoints where
> arguments are known to be PTR_ERR, and mark these arguments as scalar
> values to prevent potential dereference.
> 
> In the future, an automated pass will be used to produce such a list, or
> insert __nullable annotations automatically for tracepoints. Anyhow,
> this is an attempt to close the gap until the automation lands, and

so this won't cover modules with raw tracepoints, but I guess it's fine
as temporary solution until we have __nullable annotation support

SNIP

>  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  		    const struct bpf_prog *prog,
>  		    struct bpf_insn_access_aux *info)
> @@ -6449,6 +6539,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  	const char *tname = prog->aux->attach_func_name;
>  	struct bpf_verifier_log *log = info->log;
>  	const struct btf_param *args;
> +	bool ptr_err_raw_tp = false;
>  	const char *tag_value;
>  	u32 nr_args, arg;
>  	int i, ret;
> @@ -6591,6 +6682,36 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  	if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
>  		info->reg_type |= PTR_MAYBE_NULL;
>  
> +	if (prog->expected_attach_type == BPF_TRACE_RAW_TP) {
> +		struct btf *btf = prog->aux->attach_btf;
> +		const struct btf_type *t;
> +		const char *tname;
> +
> +		t = btf_type_by_id(btf, prog->aux->attach_btf_id);
> +		if (!t)
> +			goto done;
> +		tname = btf_name_by_offset(btf, t->name_off);
> +		if (!tname)
> +			goto done;

I think both btf_type_by_id and btf_name_by_offset should succeed for
BPF_TRACE_RAW_TP .. should be already checked in bpf_check_attach_target

> +		for (int i = 0; i < ARRAY_SIZE(raw_tp_null_args); i++) {
> +			/* Is this a func with potential NULL args? */
> +			if (strcmp(tname, raw_tp_null_args[i].func))
> +				continue;
> +			/* Is the current arg NULL? */
> +			if (raw_tp_null_args[i].mask & NULL_ARG(arg + 1))
> +				info->reg_type |= PTR_MAYBE_NULL;
> +			break;
> +		}
> +		/* Hardcode the only cases which has a IS_ERR pointer, i.e.
> +		 * mr_integ_alloc's 4th argument (mr), and
> +		 * cachefiles_lookup's 3rd argument (de).
> +		 */
> +		if (!strcmp(tname, "btf_trace_mr_integ_alloc") && (arg + 1) == 4)
> +			ptr_err_raw_tp = true;
> +		if (!strcmp(tname, "btf_trace_cachefiles_lookup") && (arg + 1) == 3)
> +			ptr_err_raw_tp = true;

could we have extra mask value (or split the current one in half) in
struct bpf_raw_tp_null_args and use it for scalar arguments? so we don't
have special checks and handle everything in the loop above

jirka

> +	}
> +done:
>  	if (tgt_prog) {
>  		enum bpf_prog_type tgt_type;
>  
> @@ -6635,6 +6756,14 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  	bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
>  		tname, arg, info->btf_id, btf_type_str(t),
>  		__btf_name_by_offset(btf, t->name_off));
> +
> +	/* Perform all checks on the validity of type for this argument, but if
> +	 * we know it can be IS_ERR at runtime, scrub pointer type and mark as
> +	 * scalar. We do not handle is_retval case as we hardcode ptr_err_raw_tp
> +	 * handling for known tps.
> +	 */
> +	if (ptr_err_raw_tp)
> +		info->reg_type = SCALAR_VALUE;
>  	return true;
>  }
>  EXPORT_SYMBOL_GPL(btf_ctx_access);
> -- 
> 2.43.5
>
Jiri Olsa Dec. 11, 2024, 2:56 p.m. UTC | #2
On Tue, Dec 10, 2024 at 06:01:55PM -0800, Kumar Kartikeya Dwivedi wrote:

SNIP

> +struct bpf_raw_tp_null_args raw_tp_null_args[] = {
> +	/* sched */
> +	RAW_TP_NULL_ARGS(sched_pi_setprio, NULL_ARG(2)),
> +	/* ... from sched_numa_pair_template event class */
> +	RAW_TP_NULL_ARGS(sched_stick_numa, NULL_ARG(3)),
> +	RAW_TP_NULL_ARGS(sched_swap_numa, NULL_ARG(3)),
> +	/* afs */
> +	RAW_TP_NULL_ARGS(afs_make_fs_call, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(afs_make_fs_calli, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(afs_make_fs_call1, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(afs_make_fs_call2, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(afs_protocol_error, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(afs_flock_ev, NULL_ARG(2)),
> +	/* cachefiles */
> +	RAW_TP_NULL_ARGS(cachefiles_lookup, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_unlink, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_rename, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_prep_read, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_mark_active, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_mark_failed, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_mark_inactive, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_vfs_error, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_io_error, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_ondemand_open, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_ondemand_copen, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_ondemand_close, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_ondemand_read, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_ondemand_cread, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_ondemand_fd_write, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(cachefiles_ondemand_fd_release, NULL_ARG(1)),
> +	/* ext4, from ext4__mballoc event class */
> +	RAW_TP_NULL_ARGS(ext4_mballoc_discard, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(ext4_mballoc_free, NULL_ARG(2)),
> +	/* fib */
> +	RAW_TP_NULL_ARGS(fib_table_lookup, NULL_ARG(3)),
> +	/* filelock */
> +	/* ... from filelock_lock event class */
> +	RAW_TP_NULL_ARGS(posix_lock_inode, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(fcntl_setlk, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(locks_remove_posix, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(flock_lock_inode, NULL_ARG(2)),
> +	/* ... from filelock_lease event class */
> +	RAW_TP_NULL_ARGS(break_lease_noblock, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(break_lease_block, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(break_lease_unblock, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(generic_delete_lease, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(time_out_leases, NULL_ARG(2)),
> +	/* host1x */
> +	RAW_TP_NULL_ARGS(host1x_cdma_push_gather, NULL_ARG(5)),
> +	/* huge_memory */
> +	RAW_TP_NULL_ARGS(mm_khugepaged_scan_pmd, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(mm_collapse_huge_page_isolate, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(mm_khugepaged_scan_file, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(mm_khugepaged_collapse_file, NULL_ARG(2)),
> +	/* kmem */
> +	RAW_TP_NULL_ARGS(mm_page_alloc, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(mm_page_pcpu_drain, NULL_ARG(1)),
> +	/* .. from mm_page event class */
> +	RAW_TP_NULL_ARGS(mm_page_alloc_zone_locked, NULL_ARG(1)),
> +	/* netfs */
> +	RAW_TP_NULL_ARGS(netfs_failure, NULL_ARG(2)),
> +	/* power */
> +	RAW_TP_NULL_ARGS(device_pm_callback_start, NULL_ARG(2)),
> +	/* qdisc */
> +	RAW_TP_NULL_ARGS(qdisc_dequeue, NULL_ARG(4)),
> +	/* rxrpc */
> +	RAW_TP_NULL_ARGS(rxrpc_recvdata, NULL_ARG(1)),
> +	RAW_TP_NULL_ARGS(rxrpc_resend, NULL_ARG(2)),
> +	/* sunrpc */
> +	RAW_TP_NULL_ARGS(xs_stream_read_data, NULL_ARG(1)),

I missed one more in sunrpc: xprt_cong_event class

jirka

> +	/* tcp */
> +	RAW_TP_NULL_ARGS(tcp_send_reset, NULL_ARG(1) | NULL_ARG(2)),
> +	/* tegra_apb_dma */
> +	RAW_TP_NULL_ARGS(tegra_dma_tx_status, NULL_ARG(3)),
> +	/* timer_migration */
> +	RAW_TP_NULL_ARGS(tmigr_update_events, NULL_ARG(1)),
> +	/* writeback, from writeback_folio_template event class */
> +	RAW_TP_NULL_ARGS(writeback_dirty_folio, NULL_ARG(2)),
> +	RAW_TP_NULL_ARGS(folio_wait_writeback, NULL_ARG(2)),
> +};
> +
>  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  		    const struct bpf_prog *prog,
>  		    struct bpf_insn_access_aux *info)
> @@ -6449,6 +6539,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  	const char *tname = prog->aux->attach_func_name;
>  	struct bpf_verifier_log *log = info->log;
>  	const struct btf_param *args;
> +	bool ptr_err_raw_tp = false;
>  	const char *tag_value;
>  	u32 nr_args, arg;
>  	int i, ret;
> @@ -6591,6 +6682,36 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  	if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
>  		info->reg_type |= PTR_MAYBE_NULL;
>  
> +	if (prog->expected_attach_type == BPF_TRACE_RAW_TP) {
> +		struct btf *btf = prog->aux->attach_btf;
> +		const struct btf_type *t;
> +		const char *tname;
> +
> +		t = btf_type_by_id(btf, prog->aux->attach_btf_id);
> +		if (!t)
> +			goto done;
> +		tname = btf_name_by_offset(btf, t->name_off);
> +		if (!tname)
> +			goto done;
> +		for (int i = 0; i < ARRAY_SIZE(raw_tp_null_args); i++) {
> +			/* Is this a func with potential NULL args? */
> +			if (strcmp(tname, raw_tp_null_args[i].func))
> +				continue;
> +			/* Is the current arg NULL? */
> +			if (raw_tp_null_args[i].mask & NULL_ARG(arg + 1))
> +				info->reg_type |= PTR_MAYBE_NULL;
> +			break;
> +		}
> +		/* Hardcode the only cases which has a IS_ERR pointer, i.e.
> +		 * mr_integ_alloc's 4th argument (mr), and
> +		 * cachefiles_lookup's 3rd argument (de).
> +		 */
> +		if (!strcmp(tname, "btf_trace_mr_integ_alloc") && (arg + 1) == 4)
> +			ptr_err_raw_tp = true;
> +		if (!strcmp(tname, "btf_trace_cachefiles_lookup") && (arg + 1) == 3)
> +			ptr_err_raw_tp = true;
> +	}
> +done:
>  	if (tgt_prog) {
>  		enum bpf_prog_type tgt_type;
>  
> @@ -6635,6 +6756,14 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  	bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
>  		tname, arg, info->btf_id, btf_type_str(t),
>  		__btf_name_by_offset(btf, t->name_off));
> +
> +	/* Perform all checks on the validity of type for this argument, but if
> +	 * we know it can be IS_ERR at runtime, scrub pointer type and mark as
> +	 * scalar. We do not handle is_retval case as we hardcode ptr_err_raw_tp
> +	 * handling for known tps.
> +	 */
> +	if (ptr_err_raw_tp)
> +		info->reg_type = SCALAR_VALUE;
>  	return true;
>  }
>  EXPORT_SYMBOL_GPL(btf_ctx_access);
> -- 
> 2.43.5
>
Alexei Starovoitov Dec. 11, 2024, 3:56 p.m. UTC | #3
On Tue, Dec 10, 2024 at 6:02 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Arguments to a raw tracepoint are tagged as trusted, which carries the
> semantics that the pointer will be non-NULL.  However, in certain cases,
> a raw tracepoint argument may end up being NULL. More context about this
> issue is available in [0].
>
> Thus, there is a discrepancy between the reality, that raw_tp arguments can
> actually be NULL, and the verifier's knowledge, that they are never NULL,
> causing explicit NULL checks to be deleted, and accesses to such pointers
> potentially crashing the kernel.
>
> A previous attempt [1], i.e. the second fixed commit, was made to
> simulate symbolic execution as if in most accesses, the argument is a
> non-NULL raw_tp, except for conditional jumps.  This tried to suppress
> branch prediction while preserving compatibility, but surfaced issues
> with production programs that were difficult to solve without increasing
> verifier complexity. A more complete discussion of issues and fixes is
> available at [2].
>
> Fix this by maintaining an explicit, incomplete list of tracepoints
> where the arguments are known to be NULL, and mark the positional
> arguments as PTR_MAYBE_NULL. Additionally, capture the tracepoints where
> arguments are known to be PTR_ERR, and mark these arguments as scalar
> values to prevent potential dereference.
>
> In the future, an automated pass will be used to produce such a list, or
> insert __nullable annotations automatically for tracepoints. Anyhow,
> this is an attempt to close the gap until the automation lands, and
> reflets the current best known list according to Jiri's analysis in [3].
>
>   [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
>   [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
>   [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
>   [3]: https://lore.kernel.org/bpf/Z1d-qbCdtJqg6Er4@krava
>
> Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
> Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
> Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
> Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
> Co-developed-by: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  kernel/bpf/btf.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 129 insertions(+)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index ed3219da7181..cb72cbf04d12 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6439,6 +6439,96 @@ int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto,
>         return off;
>  }
>
> +struct bpf_raw_tp_null_args {
> +       const char *func;
> +       u64 mask;
> +};
> +
> +#define RAW_TP_NULL_ARGS(str, arg) { .func = "btf_trace_" #str, .mask = (arg) }
> +/* Use 1-based indexing for argno */
> +#define NULL_ARG(argno) (1 << (argno))
> +
> +struct bpf_raw_tp_null_args raw_tp_null_args[] = {
> +       /* sched */
> +       RAW_TP_NULL_ARGS(sched_pi_setprio, NULL_ARG(2)),
> +       /* ... from sched_numa_pair_template event class */
> +       RAW_TP_NULL_ARGS(sched_stick_numa, NULL_ARG(3)),

Let's avoid LOUD macros.
"btf_trace_" prefix can also be dropped to save space.
How about the following encoding:
 {"sched_pi_setprio", 0x10},
 {"sched_stick_numa", 0x100},

> +       RAW_TP_NULL_ARGS(cachefiles_lookup, NULL_ARG(1)),
 {"cachefiles_lookup", 0x1},

There is no need for arg0, since the count starts from arg1.


> +               if (!strcmp(tname, "btf_trace_mr_integ_alloc") && (arg + 1) == 4)
> +                       ptr_err_raw_tp = true;
> +               if (!strcmp(tname, "btf_trace_cachefiles_lookup") && (arg + 1) == 3)
> +                       ptr_err_raw_tp = true;

+1 to Jiri's point.
Let's use 0x2 to encode the scalar ?
kernel test robot Dec. 12, 2024, 11:39 a.m. UTC | #4
Hi Kumar,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 7d0d673627e20cfa3b21a829a896ce03b58a4f1c]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/bpf-Revert-bpf-Mark-raw_tp-arguments-with-PTR_MAYBE_NULL/20241211-100358
base:   7d0d673627e20cfa3b21a829a896ce03b58a4f1c
patch link:    https://lore.kernel.org/r/20241211020156.18966-4-memxor%40gmail.com
patch subject: [PATCH bpf v1 3/4] bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
config: i386-randconfig-062-20241212 (https://download.01.org/0day-ci/archive/20241212/202412121921.xPnFS8u5-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241212/202412121921.xPnFS8u5-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412121921.xPnFS8u5-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> kernel/bpf/btf.c:6451:29: sparse: sparse: symbol 'raw_tp_null_args' was not declared. Should it be static?
   kernel/bpf/btf.c: note: in included file (through include/linux/bpf.h, include/linux/bpf_verifier.h):
   include/linux/bpfptr.h:65:40: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:65:40: sparse: sparse: cast from non-scalar

vim +/raw_tp_null_args +6451 kernel/bpf/btf.c

  6450	
> 6451	struct bpf_raw_tp_null_args raw_tp_null_args[] = {
  6452		/* sched */
  6453		RAW_TP_NULL_ARGS(sched_pi_setprio, NULL_ARG(2)),
  6454		/* ... from sched_numa_pair_template event class */
  6455		RAW_TP_NULL_ARGS(sched_stick_numa, NULL_ARG(3)),
  6456		RAW_TP_NULL_ARGS(sched_swap_numa, NULL_ARG(3)),
  6457		/* afs */
  6458		RAW_TP_NULL_ARGS(afs_make_fs_call, NULL_ARG(2)),
  6459		RAW_TP_NULL_ARGS(afs_make_fs_calli, NULL_ARG(2)),
  6460		RAW_TP_NULL_ARGS(afs_make_fs_call1, NULL_ARG(2)),
  6461		RAW_TP_NULL_ARGS(afs_make_fs_call2, NULL_ARG(2)),
  6462		RAW_TP_NULL_ARGS(afs_protocol_error, NULL_ARG(1)),
  6463		RAW_TP_NULL_ARGS(afs_flock_ev, NULL_ARG(2)),
  6464		/* cachefiles */
  6465		RAW_TP_NULL_ARGS(cachefiles_lookup, NULL_ARG(1)),
  6466		RAW_TP_NULL_ARGS(cachefiles_unlink, NULL_ARG(1)),
  6467		RAW_TP_NULL_ARGS(cachefiles_rename, NULL_ARG(1)),
  6468		RAW_TP_NULL_ARGS(cachefiles_prep_read, NULL_ARG(1)),
  6469		RAW_TP_NULL_ARGS(cachefiles_mark_active, NULL_ARG(1)),
  6470		RAW_TP_NULL_ARGS(cachefiles_mark_failed, NULL_ARG(1)),
  6471		RAW_TP_NULL_ARGS(cachefiles_mark_inactive, NULL_ARG(1)),
  6472		RAW_TP_NULL_ARGS(cachefiles_vfs_error, NULL_ARG(1)),
  6473		RAW_TP_NULL_ARGS(cachefiles_io_error, NULL_ARG(1)),
  6474		RAW_TP_NULL_ARGS(cachefiles_ondemand_open, NULL_ARG(1)),
  6475		RAW_TP_NULL_ARGS(cachefiles_ondemand_copen, NULL_ARG(1)),
  6476		RAW_TP_NULL_ARGS(cachefiles_ondemand_close, NULL_ARG(1)),
  6477		RAW_TP_NULL_ARGS(cachefiles_ondemand_read, NULL_ARG(1)),
  6478		RAW_TP_NULL_ARGS(cachefiles_ondemand_cread, NULL_ARG(1)),
  6479		RAW_TP_NULL_ARGS(cachefiles_ondemand_fd_write, NULL_ARG(1)),
  6480		RAW_TP_NULL_ARGS(cachefiles_ondemand_fd_release, NULL_ARG(1)),
  6481		/* ext4, from ext4__mballoc event class */
  6482		RAW_TP_NULL_ARGS(ext4_mballoc_discard, NULL_ARG(2)),
  6483		RAW_TP_NULL_ARGS(ext4_mballoc_free, NULL_ARG(2)),
  6484		/* fib */
  6485		RAW_TP_NULL_ARGS(fib_table_lookup, NULL_ARG(3)),
  6486		/* filelock */
  6487		/* ... from filelock_lock event class */
  6488		RAW_TP_NULL_ARGS(posix_lock_inode, NULL_ARG(2)),
  6489		RAW_TP_NULL_ARGS(fcntl_setlk, NULL_ARG(2)),
  6490		RAW_TP_NULL_ARGS(locks_remove_posix, NULL_ARG(2)),
  6491		RAW_TP_NULL_ARGS(flock_lock_inode, NULL_ARG(2)),
  6492		/* ... from filelock_lease event class */
  6493		RAW_TP_NULL_ARGS(break_lease_noblock, NULL_ARG(2)),
  6494		RAW_TP_NULL_ARGS(break_lease_block, NULL_ARG(2)),
  6495		RAW_TP_NULL_ARGS(break_lease_unblock, NULL_ARG(2)),
  6496		RAW_TP_NULL_ARGS(generic_delete_lease, NULL_ARG(2)),
  6497		RAW_TP_NULL_ARGS(time_out_leases, NULL_ARG(2)),
  6498		/* host1x */
  6499		RAW_TP_NULL_ARGS(host1x_cdma_push_gather, NULL_ARG(5)),
  6500		/* huge_memory */
  6501		RAW_TP_NULL_ARGS(mm_khugepaged_scan_pmd, NULL_ARG(2)),
  6502		RAW_TP_NULL_ARGS(mm_collapse_huge_page_isolate, NULL_ARG(1)),
  6503		RAW_TP_NULL_ARGS(mm_khugepaged_scan_file, NULL_ARG(2)),
  6504		RAW_TP_NULL_ARGS(mm_khugepaged_collapse_file, NULL_ARG(2)),
  6505		/* kmem */
  6506		RAW_TP_NULL_ARGS(mm_page_alloc, NULL_ARG(1)),
  6507		RAW_TP_NULL_ARGS(mm_page_pcpu_drain, NULL_ARG(1)),
  6508		/* .. from mm_page event class */
  6509		RAW_TP_NULL_ARGS(mm_page_alloc_zone_locked, NULL_ARG(1)),
  6510		/* netfs */
  6511		RAW_TP_NULL_ARGS(netfs_failure, NULL_ARG(2)),
  6512		/* power */
  6513		RAW_TP_NULL_ARGS(device_pm_callback_start, NULL_ARG(2)),
  6514		/* qdisc */
  6515		RAW_TP_NULL_ARGS(qdisc_dequeue, NULL_ARG(4)),
  6516		/* rxrpc */
  6517		RAW_TP_NULL_ARGS(rxrpc_recvdata, NULL_ARG(1)),
  6518		RAW_TP_NULL_ARGS(rxrpc_resend, NULL_ARG(2)),
  6519		/* sunrpc */
  6520		RAW_TP_NULL_ARGS(xs_stream_read_data, NULL_ARG(1)),
  6521		/* tcp */
  6522		RAW_TP_NULL_ARGS(tcp_send_reset, NULL_ARG(1) | NULL_ARG(2)),
  6523		/* tegra_apb_dma */
  6524		RAW_TP_NULL_ARGS(tegra_dma_tx_status, NULL_ARG(3)),
  6525		/* timer_migration */
  6526		RAW_TP_NULL_ARGS(tmigr_update_events, NULL_ARG(1)),
  6527		/* writeback, from writeback_folio_template event class */
  6528		RAW_TP_NULL_ARGS(writeback_dirty_folio, NULL_ARG(2)),
  6529		RAW_TP_NULL_ARGS(folio_wait_writeback, NULL_ARG(2)),
  6530	};
  6531
diff mbox series

Patch

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ed3219da7181..cb72cbf04d12 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6439,6 +6439,96 @@  int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto,
 	return off;
 }
 
+struct bpf_raw_tp_null_args {
+	const char *func;
+	u64 mask;
+};
+
+#define RAW_TP_NULL_ARGS(str, arg) { .func = "btf_trace_" #str, .mask = (arg) }
+/* Use 1-based indexing for argno */
+#define NULL_ARG(argno) (1 << (argno))
+
+struct bpf_raw_tp_null_args raw_tp_null_args[] = {
+	/* sched */
+	RAW_TP_NULL_ARGS(sched_pi_setprio, NULL_ARG(2)),
+	/* ... from sched_numa_pair_template event class */
+	RAW_TP_NULL_ARGS(sched_stick_numa, NULL_ARG(3)),
+	RAW_TP_NULL_ARGS(sched_swap_numa, NULL_ARG(3)),
+	/* afs */
+	RAW_TP_NULL_ARGS(afs_make_fs_call, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(afs_make_fs_calli, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(afs_make_fs_call1, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(afs_make_fs_call2, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(afs_protocol_error, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(afs_flock_ev, NULL_ARG(2)),
+	/* cachefiles */
+	RAW_TP_NULL_ARGS(cachefiles_lookup, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_unlink, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_rename, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_prep_read, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_mark_active, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_mark_failed, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_mark_inactive, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_vfs_error, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_io_error, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_ondemand_open, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_ondemand_copen, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_ondemand_close, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_ondemand_read, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_ondemand_cread, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_ondemand_fd_write, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(cachefiles_ondemand_fd_release, NULL_ARG(1)),
+	/* ext4, from ext4__mballoc event class */
+	RAW_TP_NULL_ARGS(ext4_mballoc_discard, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(ext4_mballoc_free, NULL_ARG(2)),
+	/* fib */
+	RAW_TP_NULL_ARGS(fib_table_lookup, NULL_ARG(3)),
+	/* filelock */
+	/* ... from filelock_lock event class */
+	RAW_TP_NULL_ARGS(posix_lock_inode, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(fcntl_setlk, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(locks_remove_posix, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(flock_lock_inode, NULL_ARG(2)),
+	/* ... from filelock_lease event class */
+	RAW_TP_NULL_ARGS(break_lease_noblock, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(break_lease_block, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(break_lease_unblock, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(generic_delete_lease, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(time_out_leases, NULL_ARG(2)),
+	/* host1x */
+	RAW_TP_NULL_ARGS(host1x_cdma_push_gather, NULL_ARG(5)),
+	/* huge_memory */
+	RAW_TP_NULL_ARGS(mm_khugepaged_scan_pmd, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(mm_collapse_huge_page_isolate, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(mm_khugepaged_scan_file, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(mm_khugepaged_collapse_file, NULL_ARG(2)),
+	/* kmem */
+	RAW_TP_NULL_ARGS(mm_page_alloc, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(mm_page_pcpu_drain, NULL_ARG(1)),
+	/* .. from mm_page event class */
+	RAW_TP_NULL_ARGS(mm_page_alloc_zone_locked, NULL_ARG(1)),
+	/* netfs */
+	RAW_TP_NULL_ARGS(netfs_failure, NULL_ARG(2)),
+	/* power */
+	RAW_TP_NULL_ARGS(device_pm_callback_start, NULL_ARG(2)),
+	/* qdisc */
+	RAW_TP_NULL_ARGS(qdisc_dequeue, NULL_ARG(4)),
+	/* rxrpc */
+	RAW_TP_NULL_ARGS(rxrpc_recvdata, NULL_ARG(1)),
+	RAW_TP_NULL_ARGS(rxrpc_resend, NULL_ARG(2)),
+	/* sunrpc */
+	RAW_TP_NULL_ARGS(xs_stream_read_data, NULL_ARG(1)),
+	/* tcp */
+	RAW_TP_NULL_ARGS(tcp_send_reset, NULL_ARG(1) | NULL_ARG(2)),
+	/* tegra_apb_dma */
+	RAW_TP_NULL_ARGS(tegra_dma_tx_status, NULL_ARG(3)),
+	/* timer_migration */
+	RAW_TP_NULL_ARGS(tmigr_update_events, NULL_ARG(1)),
+	/* writeback, from writeback_folio_template event class */
+	RAW_TP_NULL_ARGS(writeback_dirty_folio, NULL_ARG(2)),
+	RAW_TP_NULL_ARGS(folio_wait_writeback, NULL_ARG(2)),
+};
+
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info)
@@ -6449,6 +6539,7 @@  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 	const char *tname = prog->aux->attach_func_name;
 	struct bpf_verifier_log *log = info->log;
 	const struct btf_param *args;
+	bool ptr_err_raw_tp = false;
 	const char *tag_value;
 	u32 nr_args, arg;
 	int i, ret;
@@ -6591,6 +6682,36 @@  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 	if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
 		info->reg_type |= PTR_MAYBE_NULL;
 
+	if (prog->expected_attach_type == BPF_TRACE_RAW_TP) {
+		struct btf *btf = prog->aux->attach_btf;
+		const struct btf_type *t;
+		const char *tname;
+
+		t = btf_type_by_id(btf, prog->aux->attach_btf_id);
+		if (!t)
+			goto done;
+		tname = btf_name_by_offset(btf, t->name_off);
+		if (!tname)
+			goto done;
+		for (int i = 0; i < ARRAY_SIZE(raw_tp_null_args); i++) {
+			/* Is this a func with potential NULL args? */
+			if (strcmp(tname, raw_tp_null_args[i].func))
+				continue;
+			/* Is the current arg NULL? */
+			if (raw_tp_null_args[i].mask & NULL_ARG(arg + 1))
+				info->reg_type |= PTR_MAYBE_NULL;
+			break;
+		}
+		/* Hardcode the only cases which has a IS_ERR pointer, i.e.
+		 * mr_integ_alloc's 4th argument (mr), and
+		 * cachefiles_lookup's 3rd argument (de).
+		 */
+		if (!strcmp(tname, "btf_trace_mr_integ_alloc") && (arg + 1) == 4)
+			ptr_err_raw_tp = true;
+		if (!strcmp(tname, "btf_trace_cachefiles_lookup") && (arg + 1) == 3)
+			ptr_err_raw_tp = true;
+	}
+done:
 	if (tgt_prog) {
 		enum bpf_prog_type tgt_type;
 
@@ -6635,6 +6756,14 @@  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 	bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
 		tname, arg, info->btf_id, btf_type_str(t),
 		__btf_name_by_offset(btf, t->name_off));
+
+	/* Perform all checks on the validity of type for this argument, but if
+	 * we know it can be IS_ERR at runtime, scrub pointer type and mark as
+	 * scalar. We do not handle is_retval case as we hardcode ptr_err_raw_tp
+	 * handling for known tps.
+	 */
+	if (ptr_err_raw_tp)
+		info->reg_type = SCALAR_VALUE;
 	return true;
 }
 EXPORT_SYMBOL_GPL(btf_ctx_access);