
[v2] bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD

Message ID 20221223185531.222689-1-paul@paul-moore.com (mailing list archive)
State Superseded
Delegated to: BPF

Checks

Context Check Description
netdev/tree_selection success Not a local patch, async
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-9 success Logs for set-matrix

Commit Message

Paul Moore Dec. 23, 2022, 6:55 p.m. UTC
When the ebpf program put() routines were changed to support being
called from within IRQ context, the program ID was reset to zero
before the perf event and audit UNLOAD record generators were called,
so the reported ebpf program ID was bogus (always zero).  This patch
resolves the problem by adding a new flag, bpf_prog::valid_id, to
indicate when the bpf_prog_aux ID field is valid; it is set to true/1
in bpf_prog_alloc_id() and to false/0 in bpf_prog_free_id().  To help
ensure that accesses to the bpf_prog_aux ID field take the new
valid_id flag into account, the field is renamed to bpf_prog_aux::__id
and a getter function, bpf_prog_get_id(), is added; all users of
bpf_prog_aux::id are converted to the new getter.  Exceptions include
some internal ebpf functions and the xdp trace points, although the
latter still honor the valid_id flag.

I also modified the bpf_audit_prog() logic used to associate the
AUDIT_BPF record with other related records, i.e. passing a non-NULL
@ctx.  Instead of keying off the operation, it now keys off the
execution context, i.e. '!in_irq() && !irqs_disabled()', which is more
appropriate and should better connect the UNLOAD operations with the
associated audit state (other audit records).

Fixes: d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.")
Reported-by: Burn Alting <burn.alting@iinet.net.au>
Reported-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

--
* v2
  - change subj
  - add mention of the perf regression
  - drop the dedicated program audit ID
  - add the bpf_prog::valid_id flag, bpf_prog_get_id() getter
  - convert prog ID users to new ID getter
* v1
  - subj was: "bpf: restore the ebpf audit UNLOAD id field"
  - initial draft
---
 drivers/net/netdevsim/bpf.c  |  6 ++++--
 include/linux/bpf.h          | 11 +++++++++--
 include/linux/bpf_verifier.h |  2 +-
 include/trace/events/xdp.h   |  4 ++--
 kernel/bpf/arraymap.c        |  2 +-
 kernel/bpf/bpf_struct_ops.c  |  2 +-
 kernel/bpf/cgroup.c          |  2 +-
 kernel/bpf/core.c            |  2 +-
 kernel/bpf/cpumap.c          |  2 +-
 kernel/bpf/devmap.c          |  2 +-
 kernel/bpf/syscall.c         | 27 +++++++++++++++------------
 kernel/events/core.c         |  6 +++++-
 kernel/trace/bpf_trace.c     |  2 +-
 net/core/dev.c               |  2 +-
 net/core/filter.c            |  3 ++-
 net/core/rtnetlink.c         |  2 +-
 net/core/sock_map.c          |  2 +-
 net/ipv6/seg6_local.c        |  3 ++-
 net/sched/act_bpf.c          |  2 +-
 net/sched/cls_bpf.c          |  2 +-
 20 files changed, 52 insertions(+), 34 deletions(-)

Comments

Paul Moore Dec. 23, 2022, 9:26 p.m. UTC | #1
On Fri, Dec 23, 2022 at 1:55 PM Paul Moore <paul@paul-moore.com> wrote:
>

...

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9e7d46d16032..18e965bd7db9 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
>  struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
>  void bpf_prog_put(struct bpf_prog *prog);
>
> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> +{
> +       if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
> +               return 0;
> +       return prog->aux->__id;
> +}
>  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
>  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);

The bpf_prog_get_id() either needs to be moved outside the `#ifdef
CONFIG_BPF_SYSCALL` block, or a dummy function needs to be added when
CONFIG_BPF_SYSCALL is undefined.  I can fix that up easily enough, but
given the time of year I'll wait a bit to see if there are any other
comments before posting another revision.
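
For illustration, the usual kernel header pattern would pair the real
static inline with a stub in the #else branch; this is only a sketch
of how the fix could look, not the actual follow-up patch:

```c
/* hypothetical layout in include/linux/bpf.h */
#ifdef CONFIG_BPF_SYSCALL
static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
{
	if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
		return 0;
	return prog->aux->__id;
}
#else /* !CONFIG_BPF_SYSCALL */
static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
{
	return 0;
}
#endif /* CONFIG_BPF_SYSCALL */
```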
Stanislav Fomichev Dec. 24, 2022, 1:49 a.m. UTC | #2

On Fri, Dec 23, 2022 at 10:55 AM Paul Moore <paul@paul-moore.com> wrote:
>
...
>
> diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
> index 50854265864d..2795f03f5f34 100644
> --- a/drivers/net/netdevsim/bpf.c
> +++ b/drivers/net/netdevsim/bpf.c
> @@ -109,7 +109,7 @@ nsim_bpf_offload(struct netdevsim *ns, struct bpf_prog *prog, bool oldprog)
>              "bad offload state, expected offload %sto be active",
>              oldprog ? "" : "not ");
>         ns->bpf_offloaded = prog;
> -       ns->bpf_offloaded_id = prog ? prog->aux->id : 0;
> +       ns->bpf_offloaded_id = prog ? bpf_prog_get_id(prog) : 0;
>         nsim_prog_set_loaded(prog, true);
>
>         return 0;
> @@ -221,6 +221,7 @@ static int nsim_bpf_create_prog(struct nsim_dev *nsim_dev,
>         struct nsim_bpf_bound_prog *state;
>         char name[16];
>         int ret;
> +       u32 id;
>
>         state = kzalloc(sizeof(*state), GFP_KERNEL);
>         if (!state)
> @@ -239,7 +240,8 @@ static int nsim_bpf_create_prog(struct nsim_dev *nsim_dev,
>                 return ret;
>         }
>
> -       debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id);
> +       id = bpf_prog_get_id(prog);
> +       debugfs_create_u32("id", 0400, state->ddir, &id);
>         debugfs_create_file("state", 0400, state->ddir,
>                             &state->state, &nsim_bpf_string_fops);
>         debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9e7d46d16032..18e965bd7db9 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1102,7 +1102,7 @@ struct bpf_prog_aux {
>         u32 max_pkt_offset;
>         u32 max_tp_access;
>         u32 stack_depth;
> -       u32 id;
> +       u32 __id; /* access via bpf_prog_get_id() to check bpf_prog::valid_id */
>         u32 func_cnt; /* used by non-func prog as the number of func progs */
>         u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
>         u32 attach_btf_id; /* in-kernel BTF type id to attach to */
> @@ -1197,7 +1197,8 @@ struct bpf_prog {
>                                 enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
>                                 call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
>                                 call_get_func_ip:1, /* Do we call get_func_ip() */
> -                               tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> +                               tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
> +                               valid_id:1; /* Is bpf_prog::aux::__id valid? */
>         enum bpf_prog_type      type;           /* Type of BPF program */
>         enum bpf_attach_type    expected_attach_type; /* For some prog types */
>         u32                     len;            /* Number of filter blocks */
> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
>  struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
>  void bpf_prog_put(struct bpf_prog *prog);
>
> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> +{
> +       if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
> +               return 0;
> +       return prog->aux->__id;
> +}

I'm still missing why we need to have this WARN and check at all.
IIUC, we're actually too eager in resetting the id to 0, and need to
keep that stale id around at least for perf/audit.
Why not have a flag only to protect against a double idr_remove() in
bpf_prog_free_id() and keep the rest as is?
Which places are we concerned about that used to report id=0 but would
now report a stale id?


>  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
>  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 9e1e6965f407..525c02cc12ea 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -604,7 +604,7 @@ static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
>                                              struct btf *btf, u32 btf_id)
>  {
>         if (tgt_prog)
> -               return ((u64)tgt_prog->aux->id << 32) | btf_id;
> +               return ((u64)bpf_prog_get_id(tgt_prog) << 32) | btf_id;
>         else
>                 return ((u64)btf_obj_id(btf) << 32) | 0x80000000 | btf_id;
>  }
> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
> index c40fc97f9417..a1c3048872ea 100644
> --- a/include/trace/events/xdp.h
> +++ b/include/trace/events/xdp.h
> @@ -39,7 +39,7 @@ TRACE_EVENT(xdp_exception,
>         ),
>
>         TP_fast_assign(
> -               __entry->prog_id        = xdp->aux->id;
> +               __entry->prog_id        = (xdp->valid_id ? xdp->aux->__id : 0);
>                 __entry->act            = act;
>                 __entry->ifindex        = dev->ifindex;
>         ),
> @@ -120,7 +120,7 @@ DECLARE_EVENT_CLASS(xdp_redirect_template,
>                         map_index = 0;
>                 }
>
> -               __entry->prog_id        = xdp->aux->id;
> +               __entry->prog_id        = (xdp->valid_id ? xdp->aux->__id : 0);
>                 __entry->act            = XDP_REDIRECT;
>                 __entry->ifindex        = dev->ifindex;
>                 __entry->err            = err;
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 832b2659e96e..d19db5980b1b 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -905,7 +905,7 @@ static void prog_fd_array_put_ptr(void *ptr)
>
>  static u32 prog_fd_array_sys_lookup_elem(void *ptr)
>  {
> -       return ((struct bpf_prog *)ptr)->aux->id;
> +       return bpf_prog_get_id((struct bpf_prog *)ptr);
>  }
>
>  /* decrement refcnt of all bpf_progs that are stored in this map */
> diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
> index 84b2d9dba79a..6c20e6cd9442 100644
> --- a/kernel/bpf/bpf_struct_ops.c
> +++ b/kernel/bpf/bpf_struct_ops.c
> @@ -488,7 +488,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
>                 image += err;
>
>                 /* put prog_id to udata */
> -               *(unsigned long *)(udata + moff) = prog->aux->id;
> +               *(unsigned long *)(udata + moff) = bpf_prog_get_id(prog);
>         }
>
>         refcount_set(&kvalue->refcnt, 1);
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index bf2fdb33fb31..4a8d26f1d5d1 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1091,7 +1091,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>                         i = 0;
>                         hlist_for_each_entry(pl, progs, node) {
>                                 prog = prog_list_prog(pl);
> -                               id = prog->aux->id;
> +                               id = bpf_prog_get_id(prog);
>                                 if (copy_to_user(prog_ids + i, &id, sizeof(id)))
>                                         return -EFAULT;
>                                 if (++i == cnt)
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 25a54e04560e..ea3938ab6f5b 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2293,7 +2293,7 @@ static bool bpf_prog_array_copy_core(struct bpf_prog_array *array,
>         for (item = array->items; item->prog; item++) {
>                 if (item->prog == &dummy_bpf_prog.prog)
>                         continue;
> -               prog_ids[i] = item->prog->aux->id;
> +               prog_ids[i] = bpf_prog_get_id(item->prog);
>                 if (++i == request_cnt) {
>                         item++;
>                         break;
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index b5ba34ddd4b6..3f3423d03aea 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -413,7 +413,7 @@ static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu,
>                 return -EINVAL;
>         }
>
> -       rcpu->value.bpf_prog.id = prog->aux->id;
> +       rcpu->value.bpf_prog.id = bpf_prog_get_id(prog);
>         rcpu->prog = prog;
>
>         return 0;
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index f9a87dcc5535..d46309d4aa9e 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -868,7 +868,7 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
>         dev->dtab = dtab;
>         if (prog) {
>                 dev->xdp_prog = prog;
> -               dev->val.bpf_prog.id = prog->aux->id;
> +               dev->val.bpf_prog.id = bpf_prog_get_id(prog);
>         } else {
>                 dev->xdp_prog = NULL;
>                 dev->val.bpf_prog.id = 0;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 7b373a5e861f..9e862ef792cb 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1958,13 +1958,14 @@ static void bpf_audit_prog(const struct bpf_prog *prog, unsigned int op)
>                 return;
>         if (audit_enabled == AUDIT_OFF)
>                 return;
> -       if (op == BPF_AUDIT_LOAD)
> +       if (!in_irq() && !irqs_disabled())
>                 ctx = audit_context();
>         ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_BPF);
>         if (unlikely(!ab))
>                 return;
> +       /* log the id regardless of bpf_prog::valid_id */
>         audit_log_format(ab, "prog-id=%u op=%s",
> -                        prog->aux->id, bpf_audit_str[op]);
> +                        prog->aux->__id, bpf_audit_str[op]);
>         audit_log_end(ab);
>  }
>
> @@ -1975,8 +1976,10 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
>         idr_preload(GFP_KERNEL);
>         spin_lock_bh(&prog_idr_lock);
>         id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC);
> -       if (id > 0)
> -               prog->aux->id = id;
> +       if (id > 0) {
> +               prog->aux->__id = id;
> +               prog->valid_id = true;
> +       }
>         spin_unlock_bh(&prog_idr_lock);
>         idr_preload_end();
>
> @@ -1996,7 +1999,7 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
>          * disappears - even if someone grabs an fd to them they are unusable,
>          * simply waiting for refcnt to drop to be freed.
>          */
> -       if (!prog->aux->id)
> +       if (!prog->valid_id)
>                 return;
>
>         if (do_idr_lock)
> @@ -2004,8 +2007,8 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
>         else
>                 __acquire(&prog_idr_lock);
>
> -       idr_remove(&prog_idr, prog->aux->id);
> -       prog->aux->id = 0;
> +       idr_remove(&prog_idr, prog->aux->__id);
> +       prog->valid_id = false;
>
>         if (do_idr_lock)
>                 spin_unlock_irqrestore(&prog_idr_lock, flags);
> @@ -2154,7 +2157,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
>                    prog->jited,
>                    prog_tag,
>                    prog->pages * 1ULL << PAGE_SHIFT,
> -                  prog->aux->id,
> +                  bpf_prog_get_id(prog),
>                    stats.nsecs,
>                    stats.cnt,
>                    stats.misses,
> @@ -2786,7 +2789,7 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
>                    bpf_link_type_strs[link->type],
>                    link->id,
>                    prog_tag,
> -                  prog->aux->id);
> +                  bpf_prog_get_id(prog));
>         if (link->ops->show_fdinfo)
>                 link->ops->show_fdinfo(link, m);
>  }
> @@ -3914,7 +3917,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
>                 return -EFAULT;
>
>         info.type = prog->type;
> -       info.id = prog->aux->id;
> +       info.id = bpf_prog_get_id(prog);
>         info.load_time = prog->aux->load_time;
>         info.created_by_uid = from_kuid_munged(current_user_ns(),
>                                                prog->aux->user->uid);
> @@ -4261,7 +4264,7 @@ static int bpf_link_get_info_by_fd(struct file *file,
>
>         info.type = link->type;
>         info.id = link->id;
> -       info.prog_id = link->prog->aux->id;
> +       info.prog_id = bpf_prog_get_id(link->prog);
>
>         if (link->ops->fill_link_info) {
>                 err = link->ops->fill_link_info(link, &info);
> @@ -4426,7 +4429,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>                         struct bpf_raw_event_map *btp = raw_tp->btp;
>
>                         err = bpf_task_fd_query_copy(attr, uattr,
> -                                                    raw_tp->link.prog->aux->id,
> +                                                    bpf_prog_get_id(raw_tp->link.prog),
>                                                      BPF_FD_TYPE_RAW_TRACEPOINT,
>                                                      btp->tp->name, 0, 0);
>                         goto put_file;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index aefc1e08e015..c24e897d27f1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9001,7 +9001,11 @@ void perf_event_bpf_event(struct bpf_prog *prog,
>                         },
>                         .type = type,
>                         .flags = flags,
> -                       .id = prog->aux->id,
> +                       /*
> +                        * don't use bpf_prog_get_id() as the id may be marked
> +                        * invalid on PERF_BPF_EVENT_PROG_UNLOAD events
> +                        */
> +                       .id = prog->aux->__id,
>                 },
>         };
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 49fb9ec8366d..7cd0eb83b137 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2344,7 +2344,7 @@ int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
>         if (prog->type == BPF_PROG_TYPE_PERF_EVENT)
>                 return -EOPNOTSUPP;
>
> -       *prog_id = prog->aux->id;
> +       *prog_id = bpf_prog_get_id(prog);
>         flags = event->tp_event->flags;
>         is_tracepoint = flags & TRACE_EVENT_FL_TRACEPOINT;
>         is_syscall_tp = is_syscall_trace_event(event->tp_event);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fa53830d0683..0d39ef22cf4b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9068,7 +9068,7 @@ u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode)
>  {
>         struct bpf_prog *prog = dev_xdp_prog(dev, mode);
>
> -       return prog ? prog->aux->id : 0;
> +       return prog ? bpf_prog_get_id(prog) : 0;
>  }
>
>  static void dev_xdp_set_link(struct net_device *dev, enum bpf_xdp_mode mode,
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bb0136e7a8e4..282ccfe34ced 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8729,7 +8729,8 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
>
>         pr_warn_once("%s XDP return value %u on prog %s (id %d) dev %s, expect packet loss!\n",
>                      act > act_max ? "Illegal" : "Driver unsupported",
> -                    act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A");
> +                    act, prog->aux->name, bpf_prog_get_id(prog),
> +                    dev ? dev->name : "N/A");
>  }
>  EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 74864dc46a7e..1f7e36909541 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1453,7 +1453,7 @@ static u32 rtnl_xdp_prog_skb(struct net_device *dev)
>         generic_xdp_prog = rtnl_dereference(dev->xdp_prog);
>         if (!generic_xdp_prog)
>                 return 0;
> -       return generic_xdp_prog->aux->id;
> +       return bpf_prog_get_id(generic_xdp_prog);
>  }
>
>  static u32 rtnl_xdp_prog_drv(struct net_device *dev)
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index a660baedd9e7..550ec6cb3aee 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -1518,7 +1518,7 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
>         /* we do not hold the refcnt, the bpf prog may be released
>          * asynchronously and the id would be set to 0.
>          */
> -       id = data_race(prog->aux->id);
> +       id = data_race(bpf_prog_get_id(prog));
>         if (id == 0)
>                 prog_cnt = 0;
>
> diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
> index 8370726ae7bf..440ce3aba802 100644
> --- a/net/ipv6/seg6_local.c
> +++ b/net/ipv6/seg6_local.c
> @@ -1543,7 +1543,8 @@ static int put_nla_bpf(struct sk_buff *skb, struct seg6_local_lwt *slwt)
>         if (!nest)
>                 return -EMSGSIZE;
>
> -       if (nla_put_u32(skb, SEG6_LOCAL_BPF_PROG, slwt->bpf.prog->aux->id))
> +       if (nla_put_u32(skb, SEG6_LOCAL_BPF_PROG,
> +                       bpf_prog_get_id(slwt->bpf.prog)))
>                 return -EMSGSIZE;
>
>         if (slwt->bpf.name &&
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index b79eee44e24e..604a29e482b0 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -121,7 +121,7 @@ static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
>             nla_put_string(skb, TCA_ACT_BPF_NAME, prog->bpf_name))
>                 return -EMSGSIZE;
>
> -       if (nla_put_u32(skb, TCA_ACT_BPF_ID, prog->filter->aux->id))
> +       if (nla_put_u32(skb, TCA_ACT_BPF_ID, bpf_prog_get_id(prog->filter)))
>                 return -EMSGSIZE;
>
>         nla = nla_reserve(skb, TCA_ACT_BPF_TAG, sizeof(prog->filter->tag));
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index bc317b3eac12..eb5ac6be589e 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -565,7 +565,7 @@ static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
>             nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
>                 return -EMSGSIZE;
>
> -       if (nla_put_u32(skb, TCA_BPF_ID, prog->filter->aux->id))
> +       if (nla_put_u32(skb, TCA_BPF_ID, bpf_prog_get_id(prog->filter)))
>                 return -EMSGSIZE;
>
>         nla = nla_reserve(skb, TCA_BPF_TAG, sizeof(prog->filter->tag));
> --
> 2.39.0
>
Paul Moore Dec. 24, 2022, 3:31 p.m. UTC | #3
On Fri, Dec 23, 2022 at 8:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> On Fri, Dec 23, 2022 at 10:55 AM Paul Moore <paul@paul-moore.com> wrote:
> >

...

> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 9e7d46d16032..18e965bd7db9 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
> >  struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
> >  void bpf_prog_put(struct bpf_prog *prog);
> >
> > +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> > +{
> > +       if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
> > +               return 0;
> > +       return prog->aux->__id;
> > +}
>
> I'm still missing why we need to have this WARN and have a check at all.

I believe I explained my reasoning in the other posting, but as I also
mentioned, it's your subsystem so I don't really care about the
details as long as we fix the bug/regression in the ebpf code.

> IIUC, we're actually too eager in resetting the id to 0, and need to
> keep that stale id around at least for perf/audit.

Agreed.

> Why not have a flag only to protect against double-idr_remove
> bpf_prog_free_id and keep the rest as is?

I'll send an updated patch next week with the only protection being a
check in bpf_prog_free_id().
Jiri Olsa Dec. 25, 2022, 2:13 p.m. UTC | #4
On Fri, Dec 23, 2022 at 01:55:31PM -0500, Paul Moore wrote:

SNIP

> diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
> index 50854265864d..2795f03f5f34 100644
> --- a/drivers/net/netdevsim/bpf.c
> +++ b/drivers/net/netdevsim/bpf.c
> @@ -109,7 +109,7 @@ nsim_bpf_offload(struct netdevsim *ns, struct bpf_prog *prog, bool oldprog)
>  	     "bad offload state, expected offload %sto be active",
>  	     oldprog ? "" : "not ");
>  	ns->bpf_offloaded = prog;
> -	ns->bpf_offloaded_id = prog ? prog->aux->id : 0;
> +	ns->bpf_offloaded_id = prog ? bpf_prog_get_id(prog) : 0;
>  	nsim_prog_set_loaded(prog, true);
>  
>  	return 0;
> @@ -221,6 +221,7 @@ static int nsim_bpf_create_prog(struct nsim_dev *nsim_dev,
>  	struct nsim_bpf_bound_prog *state;
>  	char name[16];
>  	int ret;
> +	u32 id;
>  
>  	state = kzalloc(sizeof(*state), GFP_KERNEL);
>  	if (!state)
> @@ -239,7 +240,8 @@ static int nsim_bpf_create_prog(struct nsim_dev *nsim_dev,
>  		return ret;
>  	}
>  
> -	debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id);
> +	id = bpf_prog_get_id(prog);
> +	debugfs_create_u32("id", 0400, state->ddir, &id);
>  	debugfs_create_file("state", 0400, state->ddir,
>  			    &state->state, &nsim_bpf_string_fops);
>  	debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9e7d46d16032..18e965bd7db9 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1102,7 +1102,7 @@ struct bpf_prog_aux {
>  	u32 max_pkt_offset;
>  	u32 max_tp_access;
>  	u32 stack_depth;
> -	u32 id;
> +	u32 __id; /* access via bpf_prog_get_id() to check bpf_prog::valid_id */

it breaks bpftool, which uses

  BPF_CORE_READ((struct bpf_prog *)ent, aux, id);

and the bpffs selftest, because the preload iterators object uses aux->id:

  kernel/bpf/preload/iterators/iterators.bpf.c

it'd be great to have a solution that keeps the 'id' field,
because it's probably already used in many bpf programs

jirka

>  	u32 func_cnt; /* used by non-func prog as the number of func progs */
>  	u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
>  	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
> @@ -1197,7 +1197,8 @@ struct bpf_prog {
>  				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
>  				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
>  				call_get_func_ip:1, /* Do we call get_func_ip() */
> -				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> +				tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
> +				valid_id:1; /* Is bpf_prog::aux::__id valid? */
>  	enum bpf_prog_type	type;		/* Type of BPF program */
>  	enum bpf_attach_type	expected_attach_type; /* For some prog types */
>  	u32			len;		/* Number of filter blocks */
> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
>  struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
>  void bpf_prog_put(struct bpf_prog *prog);
>  

SNIP
Paul Moore Dec. 25, 2022, 7:14 p.m. UTC | #5
Apologies for the top post, but as I mentioned in my last message in this 
thread, next week I'll post a version without the getter/checks so this 
should not be an issue.

--
paul-moore.com

On December 25, 2022 9:13:40 AM Jiri Olsa <olsajiri@gmail.com> wrote:

> On Fri, Dec 23, 2022 at 01:55:31PM -0500, Paul Moore wrote:
>
> SNIP
Alexei Starovoitov Dec. 25, 2022, 10:16 p.m. UTC | #6
On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
SNIP
> > +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> > +{
> > +       if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
> > +               return 0;
> > +       return prog->aux->__id;
> > +}
>
> I'm still missing why we need to have this WARN and have a check at all.
> IIUC, we're actually too eager in resetting the id to 0, and need to
> keep that stale id around at least for perf/audit.
> Why not have a flag only to protect against double-idr_remove
> bpf_prog_free_id and keep the rest as is?
> Which places are we concerned about that used to report id=0 but now
> would report stale id?

What double-idr_remove are you concerned about?
bpf_prog_by_id() is doing bpf_prog_inc_not_zero
while __bpf_prog_put just dropped it to zero.

Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
after perf_event_bpf_event and bpf_audit_prog ?
Probably can remove the obsolete do_idr_lock bool flag as
separate patch?

Much simpler fix and no code churn.
Both valid_id and saved_id approaches have flaws.
Stanislav Fomichev Dec. 27, 2022, 3:35 a.m. UTC | #7
> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> SNIP
> >
> > I'm still missing why we need to have this WARN and have a check at all.
> > IIUC, we're actually too eager in resetting the id to 0, and need to
> > keep that stale id around at least for perf/audit.
> > Why not have a flag only to protect against double-idr_remove
> > bpf_prog_free_id and keep the rest as is?
> > Which places are we concerned about that used to report id=0 but now
> > would report stale id?
> 
> What double-idr_remove are you concerned about?
> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> while __bpf_prog_put just dropped it to zero.

(traveling, sending from an untested setup, hope it reaches everyone)

There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
tries to make offloaded program disappear from the idr when the netdev
goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
is to handle that case where we do bpf_prog_free_id much earlier than the
rest of the __bpf_prog_put stuff.

> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> after perf_event_bpf_event and bpf_audit_prog ?
> Probably can remove the obsolete do_idr_lock bool flag as
> separate patch?

+1 on removing do_idr_lock separately.

> Much simpler fix and no code churn.
> Both valid_id and saved_id approaches have flaws.

Given the __bpf_prog_offload_destroy path above, we still probably need
some flag to indicate that the id has been already removed from the idr?
Paul Moore Dec. 27, 2022, 4:40 p.m. UTC | #8
On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev@yandex.ru> 
wrote:
>> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
>> SNIP
>>>
>>> I'm still missing why we need to have this WARN and have a check at all.
>>> IIUC, we're actually too eager in resetting the id to 0, and need to
>>> keep that stale id around at least for perf/audit.
>>> Why not have a flag only to protect against double-idr_remove
>>> bpf_prog_free_id and keep the rest as is?
>>> Which places are we concerned about that used to report id=0 but now
>>> would report stale id?
>>
>> What double-idr_remove are you concerned about?
>> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
>> while __bpf_prog_put just dropped it to zero.
>
> (traveling, sending from an untested setup, hope it reaches everyone)
>
> There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> tries to make offloaded program disappear from the idr when the netdev
> goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
> is to handle that case where we do bpf_prog_free_id much earlier than the
> rest of the __bpf_prog_put stuff.
>
>> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
>> after perf_event_bpf_event and bpf_audit_prog ?
>> Probably can remove the obsolete do_idr_lock bool flag as
>> separate patch?
>
> +1 on removing do_idr_lock separately.
>
>> Much simpler fix and no code churn.
>> Both valid_id and saved_id approaches have flaws.
>
> Given the __bpf_prog_offload_destroy path above, we still probably need
> some flag to indicate that the id has been already removed from the idr?

So what do you guys want in a patch?  Is there a consensus on what you 
would merge to fix this bug/regression?

--
paul-moore.com
Alexei Starovoitov Dec. 27, 2022, 5:49 p.m. UTC | #9
On Mon, Dec 26, 2022 at 7:35 PM Stanislav Fomichev <stfomichev@yandex.ru> wrote:
>
> > On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > SNIP
> > >
> > > I'm still missing why we need to have this WARN and have a check at all.
> > > IIUC, we're actually too eager in resetting the id to 0, and need to
> > > keep that stale id around at least for perf/audit.
> > > Why not have a flag only to protect against double-idr_remove
> > > bpf_prog_free_id and keep the rest as is?
> > > Which places are we concerned about that used to report id=0 but now
> > > would report stale id?
> >
> > What double-idr_remove are you concerned about?
> > bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> > while __bpf_prog_put just dropped it to zero.
>
> (traveling, sending from an untested setup, hope it reaches everyone)
>
> There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> tries to make offloaded program disappear from the idr when the netdev
> goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
> is to handle that case where we do bpf_prog_free_id much earlier than the
> rest of the __bpf_prog_put stuff.

That remove was done in
commit ad8ad79f4f60 ("bpf: offload: free program id when device disappears")
Back in 2017 there was no bpf audit and no
PERF_BPF_EVENT_PROG_LOAD/UNLOAD events.

The removal of id made sense back then to avoid showing this
'useless' orphaned offloaded prog in 'bpftool prog show',
but with addition of perf load/unload and audit it was no longer
correct to zero out ID in a prog which refcnt is still not zero.

So we should simply remove bpf_prog_free_id from __bpf_prog_offload_destroy.
There won't be any adverse effect other than bpftool prog show
will show orphaned progs.

>
> > Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> > after perf_event_bpf_event and bpf_audit_prog ?
> > Probably can remove the obsolete do_idr_lock bool flag as
> > separate patch?
>
> +1 on removing do_idr_lock separately.
>
> > Much simpler fix and no code churn.
> > Both valid_id and saved_id approaches have flaws.
>
> Given the __bpf_prog_offload_destroy path above, we still probably need
> some flag to indicate that the id has been already removed from the idr?

No. ID should be valid until prog went through perf and audit unload
events. Don't know about audit, but for perf it's essential to have
valid ID otherwise perf record will not be able to properly associate events.
Stanislav Fomichev Dec. 28, 2022, 12:25 a.m. UTC | #10
> On Mon, Dec 26, 2022 at 7:35 PM Stanislav Fomichev <stfomichev@yandex.ru> wrote:
> >
> > > On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > SNIP
> > > >
> > > > I'm still missing why we need to have this WARN and have a check at all.
> > > > IIUC, we're actually too eager in resetting the id to 0, and need to
> > > > keep that stale id around at least for perf/audit.
> > > > Why not have a flag only to protect against double-idr_remove
> > > > bpf_prog_free_id and keep the rest as is?
> > > > Which places are we concerned about that used to report id=0 but now
> > > > would report stale id?
> > >
> > > What double-idr_remove are you concerned about?
> > > bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> > > while __bpf_prog_put just dropped it to zero.
> >
> > (traveling, sending from an untested setup, hope it reaches everyone)
> >
> > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> > tries to make offloaded program disappear from the idr when the netdev
> > goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
> > is to handle that case where we do bpf_prog_free_id much earlier than the
> > rest of the __bpf_prog_put stuff.
> 
> That remove was done in
> commit ad8ad79f4f60 ("bpf: offload: free program id when device disappears")
> Back in 2017 there was no bpf audit and no
> PERF_BPF_EVENT_PROG_LOAD/UNLOAD events.
> 
> The removal of id made sense back then to avoid showing this
> 'useless' orphaned offloaded prog in 'bpftool prog show',
> but with addition of perf load/unload and audit it was no longer
> correct to zero out ID in a prog which refcnt is still not zero.
> 
> So we should simply remove bpf_prog_free_id from __bpf_prog_offload_destroy.
> There won't be any adverse effect other than bpftool prog show
> will show orphaned progs.

SGTM, that would simplify everything..

Stanislav Fomichev Dec. 30, 2022, 2:13 a.m. UTC | #11
On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev@yandex.ru>
> wrote:
> >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >> SNIP
> >>>
> >>> I'm still missing why we need to have this WARN and have a check at all.
> >>> IIUC, we're actually too eager in resetting the id to 0, and need to
> >>> keep that stale id around at least for perf/audit.
> >>> Why not have a flag only to protect against double-idr_remove
> >>> bpf_prog_free_id and keep the rest as is?
> >>> Which places are we concerned about that used to report id=0 but now
> >>> would report stale id?
> >>
> >> What double-idr_remove are you concerned about?
> >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> >> while __bpf_prog_put just dropped it to zero.
> >
> > (traveling, sending from an untested setup, hope it reaches everyone)
> >
> > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> > tries to make offloaded program disappear from the idr when the netdev
> > goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
> > is to handle that case where we do bpf_prog_free_id much earlier than the
> > rest of the __bpf_prog_put stuff.
> >
> >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> >> after perf_event_bpf_event and bpf_audit_prog ?
> >> Probably can remove the obsolete do_idr_lock bool flag as
> >> separate patch?
> >
> > +1 on removing do_idr_lock separately.
> >
> >> Much simpler fix and no code churn.
> >> Both valid_id and saved_id approaches have flaws.
> >
> > Given the __bpf_prog_offload_destroy path above, we still probably need
> > some flag to indicate that the id has been already removed from the idr?
>
> So what do you guys want in a patch?  Is there a consensus on what you
> would merge to fix this bug/regression?

Can we try the following?

1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from
kernel/bpf/offload.c; that should make it easier to reason about those
'!id' checks
2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after
audit/perf in kernel/bpf/syscall.c (there are comments that say "must
be called first", but I don't see why; seems like GET_FD_BY_ID would
correctly return -ENOENT; maybe Martin can chime in, CC'ed him
explicitly)
3. (optionally) Remove do_idr_lock arguments (all callers are passing 'true')
Alexei Starovoitov Dec. 30, 2022, 3:10 a.m. UTC | #12
On Thu, Dec 29, 2022 at 6:13 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev@yandex.ru>
> > wrote:
> > >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >> SNIP
> > >>>
> > >>> I'm still missing why we need to have this WARN and have a check at all.
> > >>> IIUC, we're actually too eager in resetting the id to 0, and need to
> > >>> keep that stale id around at least for perf/audit.
> > >>> Why not have a flag only to protect against double-idr_remove
> > >>> bpf_prog_free_id and keep the rest as is?
> > >>> Which places are we concerned about that used to report id=0 but now
> > >>> would report stale id?
> > >>
> > >> What double-idr_remove are you concerned about?
> > >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> > >> while __bpf_prog_put just dropped it to zero.
> > >
> > > (traveling, sending from an untested setup, hope it reaches everyone)
> > >
> > > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> > > tries to make offloaded program disappear from the idr when the netdev
> > > goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
> > > is to handle that case where we do bpf_prog_free_id much earlier than the
> > > rest of the __bpf_prog_put stuff.
> > >
> > >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> > >> after perf_event_bpf_event and bpf_audit_prog ?
> > >> Probably can remove the obsolete do_idr_lock bool flag as
> > >> separate patch?
> > >
> > > +1 on removing do_idr_lock separately.
> > >
> > >> Much simpler fix and no code churn.
> > >> Both valid_id and saved_id approaches have flaws.
> > >
> > > Given the __bpf_prog_offload_destroy path above, we still probably need
> > > some flag to indicate that the id has been already removed from the idr?
> >
> > So what do you guys want in a patch?  Is there a consensus on what you
> > would merge to fix this bug/regression?
>
> Can we try the following?
>
> 1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from
> kernel/bpf/offload.c; that should make it easier to reason about those
> '!id' checks

calls? you mean a single call, right?

> 2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after
> audit/perf in kernel/bpf/syscall.c (there are comments that say "must
> be called first", but I don't see why; seems like GET_FD_BY_ID would
> correctly return -ENOENT; maybe Martin can chime in, CC'ed him
> explicitly)

The comment says that it should be removed from idr
before __bpf_prog_put_noref will proceed to clean up.

> 3. (optionally) Remove do_idr_lock arguments (all callers are passing 'true')

yes. please.
Stanislav Fomichev Dec. 30, 2022, 3:38 a.m. UTC | #13
On Thu, Dec 29, 2022 at 7:10 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Dec 29, 2022 at 6:13 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev@yandex.ru>
> > > wrote:
> > > >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >> SNIP
> > > >>>
> > > >>> I'm still missing why we need to have this WARN and have a check at all.
> > > >>> IIUC, we're actually too eager in resetting the id to 0, and need to
> > > >>> keep that stale id around at least for perf/audit.
> > > >>> Why not have a flag only to protect against double-idr_remove
> > > >>> bpf_prog_free_id and keep the rest as is?
> > > >>> Which places are we concerned about that used to report id=0 but now
> > > >>> would report stale id?
> > > >>
> > > >> What double-idr_remove are you concerned about?
> > > >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> > > >> while __bpf_prog_put just dropped it to zero.
> > > >
> > > > (traveling, sending from an untested setup, hope it reaches everyone)
> > > >
> > > > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> > > > tries to make offloaded program disappear from the idr when the netdev
> > > > goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
> > > > is to handle that case where we do bpf_prog_free_id much earlier than the
> > > > rest of the __bpf_prog_put stuff.
> > > >
> > > >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> > > >> after perf_event_bpf_event and bpf_audit_prog ?
> > > >> Probably can remove the obsolete do_idr_lock bool flag as
> > > >> separate patch?
> > > >
> > > > +1 on removing do_idr_lock separately.
> > > >
> > > >> Much simpler fix and no code churn.
> > > >> Both valid_id and saved_id approaches have flaws.
> > > >
> > > > Given the __bpf_prog_offload_destroy path above, we still probably need
> > > > some flag to indicate that the id has been already removed from the idr?
> > >
> > > So what do you guys want in a patch?  Is there a consensus on what you
> > > would merge to fix this bug/regression?
> >
> > Can we try the following?
> >
> > 1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from
> > kernel/bpf/offload.c; that should make it easier to reason about those
> > '!id' checks
>
> calls? you mean a single call, right?

Right, there is a single call to bpf_prog_free_id. But there is also
another single call to bpf_map_free_id with the same "remove it from
idr so it can't be found if GET_NEXT_ID" reasoning.
It's probably worth it to look into whether we can remove it as well
to have consistent id management for progs and maps?

> > 2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after
> > audit/perf in kernel/bpf/syscall.c (there are comments that say "must
> > be called first", but I don't see why; seems like GET_FD_BY_ID would
> > correctly return -ENOENT; maybe Martin can chime in, CC'ed him
> > explicitly)
>
> The comment says that it should be removed from idr
> before __bpf_prog_put_noref will proceed to clean up.

Which one? I was trying to see if there is any reasoning in the
original commit 34ad5580f8f9 ("bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID
command"), but couldn't find anything useful :-(

> > 3. (optionally) Remove do_idr_lock arguments (all callers are passing 'true')
>
> yes. please.
Alexei Starovoitov Dec. 30, 2022, 4:18 a.m. UTC | #14
On Thu, Dec 29, 2022 at 7:38 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Thu, Dec 29, 2022 at 7:10 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Dec 29, 2022 at 6:13 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev@yandex.ru>
> > > > wrote:
> > > > >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >> get_func_ip() */
> > > > >>>> -                               tstamp_type_access:1; /* Accessed
> > > > >>>> __sk_buff->tstamp_type */
> > > > >>>> +                               tstamp_type_access:1, /* Accessed
> > > > >>>> __sk_buff->tstamp_type */
> > > > >>>> +                               valid_id:1; /* Is bpf_prog::aux::__id valid? */
> > > > >>>>    enum bpf_prog_type      type;           /* Type of BPF program */
> > > > >>>>    enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > >>>>    u32                     len;            /* Number of filter blocks */
> > > > >>>> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
> > > > >>>> struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
> > > > >>>> void bpf_prog_put(struct bpf_prog *prog);
> > > > >>>>
> > > > >>>> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> > > > >>>> +{
> > > > >>>> +       if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
> > > > >>>> +               return 0;
> > > > >>>> +       return prog->aux->__id;
> > > > >>>> +}
> > > > >>>
> > > > >>> I'm still missing why we need to have this WARN and have a check at all.
> > > > >>> IIUC, we're actually too eager in resetting the id to 0, and need to
> > > > >>> keep that stale id around at least for perf/audit.
> > > > >>> Why not have a flag only to protect against double-idr_remove
> > > > >>> bpf_prog_free_id and keep the rest as is?
> > > > >>> Which places are we concerned about that used to report id=0 but now
> > > > >>> would report stale id?
> > > > >>
> > > > >> What double-idr_remove are you concerned about?
> > > > >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> > > > >> while __bpf_prog_put just dropped it to zero.
> > > > >
> > > > > (traveling, sending from an untested setup, hope it reaches everyone)
> > > > >
> > > > > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which
> > > > > tries to make offloaded program disappear from the idr when the netdev
> > > > > goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id
> > > > > is to handle that case where we do bpf_prog_free_id much earlier than the
> > > > > rest of the __bpf_prog_put stuff.
> > > > >
> > > > >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> > > > >> after perf_event_bpf_event and bpf_audit_prog ?
> > > > >> Probably can remove the obsolete do_idr_lock bool flag as
> > > > >> separate patch?
> > > > >
> > > > > +1 on removing do_idr_lock separately.
> > > > >
> > > > >> Much simpler fix and no code churn.
> > > > >> Both valid_id and saved_id approaches have flaws.
> > > > >
> > > > > Given the __bpf_prog_offload_destroy path above, we still probably need
> > > > > some flag to indicate that the id has been already removed from the idr?
> > > >
> > > > So what do you guys want in a patch?  Is there a consensus on what you
> > > > would merge to fix this bug/regression?
> > >
> > > Can we try the following?
> > >
> > > 1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from
> > > kernel/bpf/offload.c; that should make it easier to reason about those
> > > '!id' checks
> >
> > calls? you mean a single call, right?
>
> Right, there is a single call to bpf_prog_free_id. But there is also
> another single call to bpf_map_free_id with the same "remove it from
> idr so it can't be found if GET_NEXT_ID" reasoning.

map offloading is different from prog offload.
Like:
        if (bpf_map_is_dev_bound(map))
                return bpf_map_offload_lookup_elem(map, key, value);

gotta be much more careful with them and offload.

> It's probably worth it to look into whether we can remove it as well
> to have consistent id management for progs and maps?

I'd rather not at this point.
Consistency sounds nice, but requires a ton more work.

> > > 2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after
> > > audit/perf in kernel/bpf/syscall.c (there are comments that say "must
> > > be called first", but I don't see why; seems like GET_FD_BY_ID would
> > > correctly return -ENOENT; maybe Martin can chime in, CC'ed him
> > > explicitly)
> >
> > The comment says that it should be removed from idr
> > before __bpf_prog_put_noref will proceed to clean up.
>
> Which one? I was trying to see if there is any reasoning in the
> original commit 34ad5580f8f9 ("bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID
> command"), but couldn't find anything useful :-(

Maybe back then we didn't have atomic_inc_not_zero(prog/map->refcnt) ?
I don't really recall what race we were worried about.

> > > 3. (optionally) Remove do_idr_lock arguments (all callers are passing 'true')
> >
> > yes. please.
Yujie Liu Jan. 3, 2023, 12:33 a.m. UTC | #15
Hi Paul,

We noticed that there has been a lot of discussion on this patch, and a
new version will be posted soon. Not sure if the problem in this report
has been spotted or not, so we are sending this report FYI. Thanks.

Greetings,

We noticed a "BUG: unable to handle page fault for address" oops due to the commit below (built with gcc-11):

commit: 30e779c8882f2f84869405eef26e37785a1849ac ("[PATCH v2] bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD")
url: https://github.com/intel-lab-lkp/linux/commits/Paul-Moore/bpf-restore-the-ebpf-program-ID-for-BPF_AUDIT_UNLOAD-and-PERF_BPF_EVENT_PROG_UNLOAD/20221224-025703
base: https://git.kernel.org/cgit/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/all/20221223185531.222689-1-paul@paul-moore.com/
patch subject: [PATCH v2] bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


[   83.246474][    T1] BUG: unable to handle page fault for address: ffffc90000026003
[   83.249440][    T1] #PF: supervisor write access in kernel mode
[   83.251774][    T1] #PF: error_code(0x0003) - permissions violation
[   83.254275][    T1] PGD 100000067 P4D 100000067 PUD 100122067 PMD 100123067 PTE 800000014a9c4161
[   83.257884][    T1] Oops: 0003 [#1] KASAN
[   83.259578][    T1] CPU: 0 PID: 1 Comm: swapper Tainted: G                T  6.1.0-09655-g30e779c8882f #28 fbb398f715584ab16b1be88180e395d344d64436
[   83.264371][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 83.268137][ T1] RIP: 0010:bpf_prog_load (syscall.c:?) 
[ 83.270295][ T1] Code: ff 37 00 45 89 65 20 48 89 fa 48 c1 e0 2a 48 c1 ea 03 8a 14 02 48 89 f8 83 e0 07 ff c0 38 d0 7c 09 84 d2 74 05 e8 78 8c 19 00 <80> 4b 03 40 48 c7 c7 a0 01 36 85 e8 9e 77 28 02 e8 04 7a ff ff 45
All code
========
   0:	ff 37                	pushq  (%rdi)
   2:	00 45 89             	add    %al,-0x77(%rbp)
   5:	65 20 48 89          	and    %cl,%gs:-0x77(%rax)
   9:	fa                   	cli    
   a:	48 c1 e0 2a          	shl    $0x2a,%rax
   e:	48 c1 ea 03          	shr    $0x3,%rdx
  12:	8a 14 02             	mov    (%rdx,%rax,1),%dl
  15:	48 89 f8             	mov    %rdi,%rax
  18:	83 e0 07             	and    $0x7,%eax
  1b:	ff c0                	inc    %eax
  1d:	38 d0                	cmp    %dl,%al
  1f:	7c 09                	jl     0x2a
  21:	84 d2                	test   %dl,%dl
  23:	74 05                	je     0x2a
  25:	e8 78 8c 19 00       	callq  0x198ca2
  2a:*	80 4b 03 40          	orb    $0x40,0x3(%rbx)		<-- trapping instruction
  2e:	48 c7 c7 a0 01 36 85 	mov    $0xffffffff853601a0,%rdi
  35:	e8 9e 77 28 02       	callq  0x22877d8
  3a:	e8 04 7a ff ff       	callq  0xffffffffffff7a43
  3f:	45                   	rex.RB

Code starting with the faulting instruction
===========================================
   0:	80 4b 03 40          	orb    $0x40,0x3(%rbx)
   4:	48 c7 c7 a0 01 36 85 	mov    $0xffffffff853601a0,%rdi
   b:	e8 9e 77 28 02       	callq  0x22877ae
  10:	e8 04 7a ff ff       	callq  0xffffffffffff7a19
  15:	45                   	rex.RB
[   83.277723][    T1] RSP: 0000:ffffc9000001f900 EFLAGS: 00010246
[   83.280272][    T1] RAX: 0000000000000003 RBX: ffffc90000026000 RCX: 000000007ffffffe
[   83.283494][    T1] RDX: 1ffff92000004c00 RSI: 0000000000000008 RDI: ffffc90000026002
[   83.286512][    T1] RBP: ffffc9000001fa88 R08: 0000000000000008 R09: 0000000000000001
[   83.289897][    T1] R10: ffffed1028b397b6 R11: ffff8881459cbdaf R12: 0000000000000001
[   83.293058][    T1] R13: ffff88814aad2000 R14: ffffffff83ea1f60 R15: ffff88814aad2000
[   83.296246][    T1] FS:  0000000000000000(0000) GS:ffffffff84ed4000(0000) knlGS:0000000000000000
[   83.299784][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.302378][    T1] CR2: ffffc90000026003 CR3: 0000000004e3a000 CR4: 00000000000406b0
[   83.305526][    T1] Call Trace:
[   83.307004][    T1]  <TASK>
[ 83.308267][ T1] ? bpf_prog_get (syscall.c:?) 
[ 83.310014][ T1] ? __sys_bpf (syscall.c:?) 
[ 83.311820][ T1] ? bpf_link_by_id (syscall.c:?) 
[ 83.313794][ T1] ? copy_from_kernel_nofault (??:?) 
[ 83.315860][ T1] ? copy_from_bpfptr (syscall.c:?) 
[ 83.317717][ T1] ? bpf_obj_memcpy (arraymap.c:?) 
[ 83.323880][ T1] __sys_bpf (syscall.c:?) 
[ 83.325623][ T1] ? bpf_link_by_id (syscall.c:?) 
[ 83.327526][ T1] ? kern_sys_bpf (??:?) 
[ 83.329365][ T1] ? find_held_lock (lockdep.c:?) 
[ 83.331305][ T1] kern_sys_bpf (??:?) 
[ 83.333077][ T1] bpf_load_and_run+0x284/0x3c8 
[ 83.335332][ T1] ? iterators_bpf__destroy+0x14d/0x14d 
[ 83.337424][ T1] ? kasan_unpoison (??:?) 
[ 83.339268][ T1] ? __kasan_slab_alloc (??:?) 
[ 83.341334][ T1] ? trace_kmalloc (slab_common.c:?) 
[ 83.343249][ T1] ? __kmalloc_node (??:?) 
[ 83.345040][ T1] load_skel (bpf_preload_kern.c:?) 
[ 83.346671][ T1] ? free_links_and_skel (bpf_preload_kern.c:?) 
[ 83.348756][ T1] ? rcu_read_lock_sched_held (??:?) 
[ 83.350996][ T1] ? bpf_iter_cgroup (bpf_preload_kern.c:?) 
[ 83.352705][ T1] load (bpf_preload_kern.c:?) 
[ 83.354259][ T1] do_one_initcall (??:?) 
[ 83.356051][ T1] ? rcu_lock_acquire (??:?) 
[ 83.358022][ T1] ? rcu_read_lock_sched_held (??:?) 
[ 83.360100][ T1] ? rcu_read_lock_bh_held (??:?) 
[ 83.362036][ T1] do_initcalls (main.c:?) 
[ 83.363846][ T1] kernel_init_freeable (main.c:?) 
[ 83.365850][ T1] ? rest_init (main.c:?) 
[ 83.367612][ T1] kernel_init (main.c:?) 
[ 83.369180][ T1] ret_from_fork (??:?) 
[   83.370863][    T1]  </TASK>
[   83.372036][    T1] Modules linked in:
[   83.373544][    T1] CR2: ffffc90000026003
[   83.375076][    T1] ---[ end trace 0000000000000000 ]---
[ 83.377006][ T1] RIP: 0010:bpf_prog_load (syscall.c:?) 
[ 83.378816][ T1] Code: ff 37 00 45 89 65 20 48 89 fa 48 c1 e0 2a 48 c1 ea 03 8a 14 02 48 89 f8 83 e0 07 ff c0 38 d0 7c 09 84 d2 74 05 e8 78 8c 19 00 <80> 4b 03 40 48 c7 c7 a0 01 36 85 e8 9e 77 28 02 e8 04 7a ff ff 45


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@intel.com>
| Link: https://lore.kernel.org/oe-lkp/202301022358.7f742b86-yujie.liu@intel.com


To reproduce:

        # build kernel
        cd linux
        cp config-6.1.0-09655-g30e779c8882f .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
diff mbox series

Patch

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 50854265864d..2795f03f5f34 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -109,7 +109,7 @@  nsim_bpf_offload(struct netdevsim *ns, struct bpf_prog *prog, bool oldprog)
 	     "bad offload state, expected offload %sto be active",
 	     oldprog ? "" : "not ");
 	ns->bpf_offloaded = prog;
-	ns->bpf_offloaded_id = prog ? prog->aux->id : 0;
+	ns->bpf_offloaded_id = prog ? bpf_prog_get_id(prog) : 0;
 	nsim_prog_set_loaded(prog, true);
 
 	return 0;
@@ -221,6 +221,7 @@  static int nsim_bpf_create_prog(struct nsim_dev *nsim_dev,
 	struct nsim_bpf_bound_prog *state;
 	char name[16];
 	int ret;
+	u32 id;
 
 	state = kzalloc(sizeof(*state), GFP_KERNEL);
 	if (!state)
@@ -239,7 +240,8 @@  static int nsim_bpf_create_prog(struct nsim_dev *nsim_dev,
 		return ret;
 	}
 
-	debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id);
+	id = bpf_prog_get_id(prog);
+	debugfs_create_u32("id", 0400, state->ddir, &id);
 	debugfs_create_file("state", 0400, state->ddir,
 			    &state->state, &nsim_bpf_string_fops);
 	debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9e7d46d16032..18e965bd7db9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1102,7 +1102,7 @@  struct bpf_prog_aux {
 	u32 max_pkt_offset;
 	u32 max_tp_access;
 	u32 stack_depth;
-	u32 id;
+	u32 __id; /* access via bpf_prog_get_id() to check bpf_prog::valid_id */
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
 	u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
 	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
@@ -1197,7 +1197,8 @@  struct bpf_prog {
 				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
 				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
 				call_get_func_ip:1, /* Do we call get_func_ip() */
-				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
+				tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
+				valid_id:1; /* Is bpf_prog::aux::__id valid? */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
@@ -1688,6 +1689,12 @@  void bpf_prog_inc(struct bpf_prog *prog);
 struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 
+static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
+{
+	if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program"))
+		return 0;
+	return prog->aux->__id;
+}
 void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9e1e6965f407..525c02cc12ea 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -604,7 +604,7 @@  static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
 					     struct btf *btf, u32 btf_id)
 {
 	if (tgt_prog)
-		return ((u64)tgt_prog->aux->id << 32) | btf_id;
+		return ((u64)bpf_prog_get_id(tgt_prog) << 32) | btf_id;
 	else
 		return ((u64)btf_obj_id(btf) << 32) | 0x80000000 | btf_id;
 }
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index c40fc97f9417..a1c3048872ea 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -39,7 +39,7 @@  TRACE_EVENT(xdp_exception,
 	),
 
 	TP_fast_assign(
-		__entry->prog_id	= xdp->aux->id;
+		__entry->prog_id	= (xdp->valid_id ? xdp->aux->__id : 0);
 		__entry->act		= act;
 		__entry->ifindex	= dev->ifindex;
 	),
@@ -120,7 +120,7 @@  DECLARE_EVENT_CLASS(xdp_redirect_template,
 			map_index = 0;
 		}
 
-		__entry->prog_id	= xdp->aux->id;
+		__entry->prog_id	= (xdp->valid_id ? xdp->aux->__id : 0);
 		__entry->act		= XDP_REDIRECT;
 		__entry->ifindex	= dev->ifindex;
 		__entry->err		= err;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 832b2659e96e..d19db5980b1b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -905,7 +905,7 @@  static void prog_fd_array_put_ptr(void *ptr)
 
 static u32 prog_fd_array_sys_lookup_elem(void *ptr)
 {
-	return ((struct bpf_prog *)ptr)->aux->id;
+	return bpf_prog_get_id((struct bpf_prog *)ptr);
 }
 
 /* decrement refcnt of all bpf_progs that are stored in this map */
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 84b2d9dba79a..6c20e6cd9442 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -488,7 +488,7 @@  static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
 		image += err;
 
 		/* put prog_id to udata */
-		*(unsigned long *)(udata + moff) = prog->aux->id;
+		*(unsigned long *)(udata + moff) = bpf_prog_get_id(prog);
 	}
 
 	refcount_set(&kvalue->refcnt, 1);
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index bf2fdb33fb31..4a8d26f1d5d1 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1091,7 +1091,7 @@  static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 			i = 0;
 			hlist_for_each_entry(pl, progs, node) {
 				prog = prog_list_prog(pl);
-				id = prog->aux->id;
+				id = bpf_prog_get_id(prog);
 				if (copy_to_user(prog_ids + i, &id, sizeof(id)))
 					return -EFAULT;
 				if (++i == cnt)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 25a54e04560e..ea3938ab6f5b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2293,7 +2293,7 @@  static bool bpf_prog_array_copy_core(struct bpf_prog_array *array,
 	for (item = array->items; item->prog; item++) {
 		if (item->prog == &dummy_bpf_prog.prog)
 			continue;
-		prog_ids[i] = item->prog->aux->id;
+		prog_ids[i] = bpf_prog_get_id(item->prog);
 		if (++i == request_cnt) {
 			item++;
 			break;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index b5ba34ddd4b6..3f3423d03aea 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -413,7 +413,7 @@  static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu,
 		return -EINVAL;
 	}
 
-	rcpu->value.bpf_prog.id = prog->aux->id;
+	rcpu->value.bpf_prog.id = bpf_prog_get_id(prog);
 	rcpu->prog = prog;
 
 	return 0;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f9a87dcc5535..d46309d4aa9e 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -868,7 +868,7 @@  static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 	dev->dtab = dtab;
 	if (prog) {
 		dev->xdp_prog = prog;
-		dev->val.bpf_prog.id = prog->aux->id;
+		dev->val.bpf_prog.id = bpf_prog_get_id(prog);
 	} else {
 		dev->xdp_prog = NULL;
 		dev->val.bpf_prog.id = 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7b373a5e861f..9e862ef792cb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1958,13 +1958,14 @@  static void bpf_audit_prog(const struct bpf_prog *prog, unsigned int op)
 		return;
 	if (audit_enabled == AUDIT_OFF)
 		return;
-	if (op == BPF_AUDIT_LOAD)
+	if (!in_irq() && !irqs_disabled())
 		ctx = audit_context();
 	ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_BPF);
 	if (unlikely(!ab))
 		return;
+	/* log the id regardless of bpf_prog::valid_id */
 	audit_log_format(ab, "prog-id=%u op=%s",
-			 prog->aux->id, bpf_audit_str[op]);
+			 prog->aux->__id, bpf_audit_str[op]);
 	audit_log_end(ab);
 }
 
@@ -1975,8 +1976,10 @@  static int bpf_prog_alloc_id(struct bpf_prog *prog)
 	idr_preload(GFP_KERNEL);
 	spin_lock_bh(&prog_idr_lock);
 	id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC);
-	if (id > 0)
-		prog->aux->id = id;
+	if (id > 0) {
+		prog->aux->__id = id;
+		prog->valid_id = true;
+	}
 	spin_unlock_bh(&prog_idr_lock);
 	idr_preload_end();
 
@@ -1996,7 +1999,7 @@  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 	 * disappears - even if someone grabs an fd to them they are unusable,
 	 * simply waiting for refcnt to drop to be freed.
 	 */
-	if (!prog->aux->id)
+	if (!prog->valid_id)
 		return;
 
 	if (do_idr_lock)
@@ -2004,8 +2007,8 @@  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 	else
 		__acquire(&prog_idr_lock);
 
-	idr_remove(&prog_idr, prog->aux->id);
-	prog->aux->id = 0;
+	idr_remove(&prog_idr, prog->aux->__id);
+	prog->valid_id = false;
 
 	if (do_idr_lock)
 		spin_unlock_irqrestore(&prog_idr_lock, flags);
@@ -2154,7 +2157,7 @@  static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
 		   prog->jited,
 		   prog_tag,
 		   prog->pages * 1ULL << PAGE_SHIFT,
-		   prog->aux->id,
+		   bpf_prog_get_id(prog),
 		   stats.nsecs,
 		   stats.cnt,
 		   stats.misses,
@@ -2786,7 +2789,7 @@  static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 		   bpf_link_type_strs[link->type],
 		   link->id,
 		   prog_tag,
-		   prog->aux->id);
+		   bpf_prog_get_id(prog));
 	if (link->ops->show_fdinfo)
 		link->ops->show_fdinfo(link, m);
 }
@@ -3914,7 +3917,7 @@  static int bpf_prog_get_info_by_fd(struct file *file,
 		return -EFAULT;
 
 	info.type = prog->type;
-	info.id = prog->aux->id;
+	info.id = bpf_prog_get_id(prog);
 	info.load_time = prog->aux->load_time;
 	info.created_by_uid = from_kuid_munged(current_user_ns(),
 					       prog->aux->user->uid);
@@ -4261,7 +4264,7 @@  static int bpf_link_get_info_by_fd(struct file *file,
 
 	info.type = link->type;
 	info.id = link->id;
-	info.prog_id = link->prog->aux->id;
+	info.prog_id = bpf_prog_get_id(link->prog);
 
 	if (link->ops->fill_link_info) {
 		err = link->ops->fill_link_info(link, &info);
@@ -4426,7 +4429,7 @@  static int bpf_task_fd_query(const union bpf_attr *attr,
 			struct bpf_raw_event_map *btp = raw_tp->btp;
 
 			err = bpf_task_fd_query_copy(attr, uattr,
-						     raw_tp->link.prog->aux->id,
+						     bpf_prog_get_id(raw_tp->link.prog),
 						     BPF_FD_TYPE_RAW_TRACEPOINT,
 						     btp->tp->name, 0, 0);
 			goto put_file;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index aefc1e08e015..c24e897d27f1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9001,7 +9001,11 @@  void perf_event_bpf_event(struct bpf_prog *prog,
 			},
 			.type = type,
 			.flags = flags,
-			.id = prog->aux->id,
+			/*
+			 * don't use bpf_prog_get_id() as the id may be marked
+			 * invalid on PERF_BPF_EVENT_PROG_UNLOAD events
+			 */
+			.id = prog->aux->__id,
 		},
 	};
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 49fb9ec8366d..7cd0eb83b137 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2344,7 +2344,7 @@  int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 	if (prog->type == BPF_PROG_TYPE_PERF_EVENT)
 		return -EOPNOTSUPP;
 
-	*prog_id = prog->aux->id;
+	*prog_id = bpf_prog_get_id(prog);
 	flags = event->tp_event->flags;
 	is_tracepoint = flags & TRACE_EVENT_FL_TRACEPOINT;
 	is_syscall_tp = is_syscall_trace_event(event->tp_event);
diff --git a/net/core/dev.c b/net/core/dev.c
index fa53830d0683..0d39ef22cf4b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9068,7 +9068,7 @@  u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode)
 {
 	struct bpf_prog *prog = dev_xdp_prog(dev, mode);
 
-	return prog ? prog->aux->id : 0;
+	return prog ? bpf_prog_get_id(prog) : 0;
 }
 
 static void dev_xdp_set_link(struct net_device *dev, enum bpf_xdp_mode mode,
diff --git a/net/core/filter.c b/net/core/filter.c
index bb0136e7a8e4..282ccfe34ced 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8729,7 +8729,8 @@  void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
 
 	pr_warn_once("%s XDP return value %u on prog %s (id %d) dev %s, expect packet loss!\n",
 		     act > act_max ? "Illegal" : "Driver unsupported",
-		     act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A");
+		     act, prog->aux->name, bpf_prog_get_id(prog),
+		     dev ? dev->name : "N/A");
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 74864dc46a7e..1f7e36909541 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1453,7 +1453,7 @@  static u32 rtnl_xdp_prog_skb(struct net_device *dev)
 	generic_xdp_prog = rtnl_dereference(dev->xdp_prog);
 	if (!generic_xdp_prog)
 		return 0;
-	return generic_xdp_prog->aux->id;
+	return bpf_prog_get_id(generic_xdp_prog);
 }
 
 static u32 rtnl_xdp_prog_drv(struct net_device *dev)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index a660baedd9e7..550ec6cb3aee 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1518,7 +1518,7 @@  int sock_map_bpf_prog_query(const union bpf_attr *attr,
 	/* we do not hold the refcnt, the bpf prog may be released
 	 * asynchronously and the id would be set to 0.
 	 */
-	id = data_race(prog->aux->id);
+	id = data_race(bpf_prog_get_id(prog));
 	if (id == 0)
 		prog_cnt = 0;
 
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 8370726ae7bf..440ce3aba802 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -1543,7 +1543,8 @@  static int put_nla_bpf(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 	if (!nest)
 		return -EMSGSIZE;
 
-	if (nla_put_u32(skb, SEG6_LOCAL_BPF_PROG, slwt->bpf.prog->aux->id))
+	if (nla_put_u32(skb, SEG6_LOCAL_BPF_PROG,
+			bpf_prog_get_id(slwt->bpf.prog)))
 		return -EMSGSIZE;
 
 	if (slwt->bpf.name &&
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index b79eee44e24e..604a29e482b0 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -121,7 +121,7 @@  static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
 	    nla_put_string(skb, TCA_ACT_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
 
-	if (nla_put_u32(skb, TCA_ACT_BPF_ID, prog->filter->aux->id))
+	if (nla_put_u32(skb, TCA_ACT_BPF_ID, bpf_prog_get_id(prog->filter)))
 		return -EMSGSIZE;
 
 	nla = nla_reserve(skb, TCA_ACT_BPF_TAG, sizeof(prog->filter->tag));
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index bc317b3eac12..eb5ac6be589e 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -565,7 +565,7 @@  static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
 	    nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
 
-	if (nla_put_u32(skb, TCA_BPF_ID, prog->filter->aux->id))
+	if (nla_put_u32(skb, TCA_BPF_ID, bpf_prog_get_id(prog->filter)))
 		return -EMSGSIZE;
 
 	nla = nla_reserve(skb, TCA_BPF_TAG, sizeof(prog->filter->tag));