mbox series

[bpf-next,v3,00/12] xdp: hints via kfuncs

Message ID 20221206024554.3826186-1-sdf@google.com (mailing list archive)
Headers show
Series xdp: hints via kfuncs | expand

Message

Stanislav Fomichev Dec. 6, 2022, 2:45 a.m. UTC
Please see the first patch in the series for the overall
design and use-cases.

Changes since v3:

- Rework prog->bound_netdev refcounting (Jakub/Marin)

  Now it's based on the offload.c framework. It mostly fits, except
  I had to automatically insert a HT entry for the netdev. In the
  offloaded case, the netdev is added via a call to
  bpf_offload_dev_netdev_register from the driver init path; with
  a dev-bound programs, we have to manually add (and remove) the entry.

  As suggested by Toke, I'm also prohibiting putting dev-bound programs
  into prog-array map; essentially prohibiting tail calling into it.
  I'm also disabling freplace of the dev-bound programs. Both of those
  restrictions can be loosened up eventually.
  Note that we don't require maps to be dev-bound when the program is
  dev-bound.

  Confirmed with the test_offload.py that the existing parts are still
  operational.

- Fix compile issues with CONFIG_NET=n and mlx5 driver (lkp@intel.com)

Changes since v2:

- Rework bpf_prog_aux->xdp_netdev refcnt (Martin)

  Switched to dropping the count early, after loading / verification is
  done. At attach time, the pointer value is used only for comparing
  the actual netdev at attach vs netdev at load.

  (potentially can be a problem if the same slub slot is reused
  for another netdev later on?)

- Use correct RX queue number in xdp_hw_metadata (Toke / Jakub)

- Fix wrongly placed '*cnt=0' in fixup_kfunc_call after merge (Toke)

- Fix sorted BTF_SET8_START (Toke)

  Introduce old-school unsorted BTF_ID_LIST for lookup purposes.

- Zero-initialize mlx4_xdp_buff (Tariq)

- Separate common timestamp handling into mlx4_en_get_hwtstamp (Tariq)

- mlx5 patches (Toke)

  Note, I've renamed the following for consistency with the rest:
  - s/mlx5_xdp_ctx/mlx5_xdp_buff/
  - s/mctx/mxbuf/

Changes since v1:

- Drop xdp->skb metadata path (Jakub)

  No consensus yet on exposing xdp_skb_metadata in UAPI. Exploring
  whether everyone would be ok with kfunc to access that part..
  Will follow up separately.

- Drop kfunc unrolling (Alexei)

  Starting with simple code to resolve per-device ndo kfuncs.
  We can always go back to unrolling and keep the same kfuncs
  interface in the future.

- Add rx hash metadata (Toke)

  Not adding the rest (csum/hash_type/etc), I'd like us to agree on
  the framework.

- use dev_get_by_index and add proper refcnt (Toke)

Changes since last RFC:

- drop ice/bnxt example implementation (Alexander)

  -ENOHARDWARE to test

- fix/test mlx4 implementation

  Confirmed that I get reasonable looking timestamp.
  The last patch in the series is the small xsk program that can
  be used to dump incoming metadata.

- bpf_push64/bpf_pop64 (Alexei)

  x86_64+arm64(untested)+disassembler

- struct xdp_to_skb_metadata -> struct xdp_skb_metadata (Toke)

  s/xdp_to_skb/xdp_skb/

- Documentation/bpf/xdp-rx-metadata.rst

  Documents functionality, assumptions and limitations.

- bpf_xdp_metadata_export_to_skb returns true/false (Martin)

  Plus xdp_md->skb_metadata field to access it.

- BPF_F_XDP_HAS_METADATA flag (Toke/Martin)

  Drop magic, use the flag instead.

- drop __randomize_layout

  Not sure it's possible to sanely expose it via UAPI. Because every
  .o potentially gets its own randomized layout, test_progs
  refuses to link.

- remove __net_timestamp in veth driver (John/Jesper)

  Instead, calling ktime_get from the kfunc; enough for the selftests.

Future work on RX side:

- Support more devices besides veth and mlx4
- Support more metadata besides RX timestamp.
- Convert skb_metadata_set() callers to xdp_convert_skb_metadata()
  which handles extra xdp_skb_metadata

Prior art (to record pros/cons for different approaches):

- Stable UAPI approach:
  https://lore.kernel.org/bpf/20220628194812.1453059-1-alexandr.lobakin@intel.com/
- Metadata+BTF_ID appoach:
  https://lore.kernel.org/bpf/166256538687.1434226.15760041133601409770.stgit@firesoul/
- v1:
  https://lore.kernel.org/bpf/20221115030210.3159213-1-sdf@google.com/T/#t
- kfuncs v2 RFC:
  https://lore.kernel.org/bpf/20221027200019.4106375-1-sdf@google.com/
- kfuncs v1 RFC:
  https://lore.kernel.org/bpf/20221104032532.1615099-1-sdf@google.com/

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org

Stanislav Fomichev (9):
  bpf: Document XDP RX metadata
  bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
  bpf: XDP metadata RX kfuncs
  veth: Introduce veth_xdp_buff wrapper for xdp_buff
  veth: Support RX XDP metadata
  selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  mxl4: Support RX XDP metadata
  selftests/bpf: Simple program to dump XDP RX metadata

Toke Høiland-Jørgensen (3):
  xsk: Add cb area to struct xdp_buff_xsk
  mlx5: Introduce mlx5_xdp_buff wrapper for xdp_buff
  mlx5: Support RX XDP metadata

 Documentation/bpf/xdp-rx-metadata.rst         |  90 ++++
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |  13 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  10 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  68 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  32 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  13 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  35 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   4 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  98 ++---
 drivers/net/veth.c                            |  88 ++--
 include/linux/bpf.h                           |  26 +-
 include/linux/mlx4/device.h                   |   7 +
 include/linux/netdevice.h                     |   5 +
 include/net/xdp.h                             |  29 ++
 include/net/xsk_buff_pool.h                   |   5 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/arraymap.c                         |  17 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/offload.c                          | 162 +++++--
 kernel/bpf/syscall.c                          |  25 +-
 kernel/bpf/verifier.c                         |  42 +-
 net/core/dev.c                                |   7 +-
 net/core/filter.c                             |   2 +-
 net/core/xdp.c                                |  58 +++
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   8 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c   | 394 +++++++++++++++++
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  93 ++++
 .../selftests/bpf/progs/xdp_metadata.c        |  70 +++
 .../selftests/bpf/progs/xdp_metadata2.c       |  15 +
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 405 ++++++++++++++++++
 tools/testing/selftests/bpf/xdp_metadata.h    |   7 +
 36 files changed, 1688 insertions(+), 167 deletions(-)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata2.c
 create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_metadata.h

Comments

Toke Høiland-Jørgensen Dec. 8, 2022, 10:28 p.m. UTC | #1
Stanislav Fomichev <sdf@google.com> writes:

> Please see the first patch in the series for the overall
> design and use-cases.
>
> Changes since v3:
>
> - Rework prog->bound_netdev refcounting (Jakub/Marin)
>
>   Now it's based on the offload.c framework. It mostly fits, except
>   I had to automatically insert a HT entry for the netdev. In the
>   offloaded case, the netdev is added via a call to
>   bpf_offload_dev_netdev_register from the driver init path; with
>   a dev-bound programs, we have to manually add (and remove) the entry.
>
>   As suggested by Toke, I'm also prohibiting putting dev-bound programs
>   into prog-array map; essentially prohibiting tail calling into it.
>   I'm also disabling freplace of the dev-bound programs. Both of those
>   restrictions can be loosened up eventually.

I thought it would be a shame that we don't support at least freplace
programs from the get-go (as that would exclude libxdp from taking
advantage of this). So see below for a patch implementing this :)

-Toke




commit 3abb333e5fd2e8a0920b77013499bdae0ee3db43
Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date:   Thu Dec 8 23:10:54 2022 +0100

    bpf: Support consuming XDP HW metadata from fext programs
    
    Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
    programs that consume HW metadata, implement support for propagating the
    offload information. The extension program doesn't need to set a flag or
    ifindex, it these will just be propagated from the target by the verifier.
    We need to create a separate offload object for the extension program,
    though, since it can be reattached to a different program later (which
    means we can't just inhering the offload information from the target).
    
    An additional check is added on attach that the new target is compatible
    with the offload information in the extension prog.
    
    Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b46b60f4eae1..cfa5c847cf2c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2482,6 +2482,7 @@ void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
 void unpriv_ebpf_notify(int new_state);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
+int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev);
 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
 void bpf_offload_bound_netdev_unregister(struct net_device *dev);
 
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index bad8bab916eb..b059a7b53457 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -83,36 +83,25 @@ bpf_offload_find_netdev(struct net_device *netdev)
 	return rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
 }
 
-int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
+int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev)
 {
 	struct bpf_offload_netdev *ondev;
 	struct bpf_prog_offload *offload;
 	int err;
 
-	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
-	    attr->prog_type != BPF_PROG_TYPE_XDP)
+	if (!netdev)
 		return -EINVAL;
 
-	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
-		return -EINVAL;
+	err = __bpf_offload_init();
+	if (err)
+		return err;
 
 	offload = kzalloc(sizeof(*offload), GFP_USER);
 	if (!offload)
 		return -ENOMEM;
 
-	err = __bpf_offload_init();
-	if (err)
-		return err;
-
 	offload->prog = prog;
-
-	offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
-					   attr->prog_ifindex);
-	err = bpf_dev_offload_check(offload->netdev);
-	if (err)
-		goto err_maybe_put;
-
-	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
+	offload->netdev = netdev;
 
 	down_write(&bpf_devs_lock);
 	ondev = bpf_offload_find_netdev(offload->netdev);
@@ -135,19 +124,46 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
 	offload->offdev = ondev->offdev;
 	prog->aux->offload = offload;
 	list_add_tail(&offload->offloads, &ondev->progs);
-	dev_put(offload->netdev);
 	up_write(&bpf_devs_lock);
 
 	return 0;
 err_unlock:
 	up_write(&bpf_devs_lock);
-err_maybe_put:
-	if (offload->netdev)
-		dev_put(offload->netdev);
 	kfree(offload);
 	return err;
 }
 
+int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
+{
+	struct net_device *netdev;
+	int err;
+
+	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
+	    attr->prog_type != BPF_PROG_TYPE_XDP)
+		return -EINVAL;
+
+	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
+		return -EINVAL;
+
+	netdev = dev_get_by_index(current->nsproxy->net_ns, attr->prog_ifindex);
+	if (!netdev)
+		return -EINVAL;
+
+	err = bpf_dev_offload_check(netdev);
+	if (err)
+		goto out;
+
+	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
+
+	err = __bpf_prog_offload_init(prog, netdev);
+	if (err)
+		goto out;
+
+out:
+	dev_put(netdev);
+	return err;
+}
+
 int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
 {
 	struct bpf_prog_offload *offload;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b345a273f7d0..606e6de5f716 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 			goto out_put_prog;
 		}
 
+		if (bpf_prog_is_dev_bound(tgt_prog->aux) &&
+		    (bpf_prog_is_offloaded(tgt_prog->aux) ||
+		     !bpf_prog_is_dev_bound(prog->aux) ||
+		     !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
+			err = -EINVAL;
+			goto out_put_prog;
+		}
+
 		key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
 	}
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bc8d9b8d4f47..d92e28dd220e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16379,11 +16379,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 	if (tgt_prog) {
 		struct bpf_prog_aux *aux = tgt_prog->aux;
 
-		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
-			bpf_log(log, "Replacing device-bound programs not supported\n");
-			return -EINVAL;
-		}
-
 		for (i = 0; i < aux->func_info_cnt; i++)
 			if (aux->func_info[i].type_id == btf_id) {
 				subprog = i;
@@ -16644,10 +16639,22 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	if (tgt_prog && prog->type == BPF_PROG_TYPE_EXT) {
 		/* to make freplace equivalent to their targets, they need to
 		 * inherit env->ops and expected_attach_type for the rest of the
-		 * verification
+		 * verification; we also need to propagate the prog offload data
+		 * for resolving kfuncs.
 		 */
 		env->ops = bpf_verifier_ops[tgt_prog->type];
 		prog->expected_attach_type = tgt_prog->expected_attach_type;
+
+		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
+			if (bpf_prog_is_offloaded(tgt_prog->aux))
+				return -EINVAL;
+
+			prog->aux->dev_bound = true;
+			ret = __bpf_prog_offload_init(prog,
+						      tgt_prog->aux->offload->netdev);
+			if (ret)
+				return ret;
+		}
 	}
 
 	/* store info about the attachment target that will be used later */
Stanislav Fomichev Dec. 8, 2022, 11:47 p.m. UTC | #2
On Thu, Dec 8, 2022 at 2:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > Please see the first patch in the series for the overall
> > design and use-cases.
> >
> > Changes since v3:
> >
> > - Rework prog->bound_netdev refcounting (Jakub/Marin)
> >
> >   Now it's based on the offload.c framework. It mostly fits, except
> >   I had to automatically insert a HT entry for the netdev. In the
> >   offloaded case, the netdev is added via a call to
> >   bpf_offload_dev_netdev_register from the driver init path; with
> >   a dev-bound programs, we have to manually add (and remove) the entry.
> >
> >   As suggested by Toke, I'm also prohibiting putting dev-bound programs
> >   into prog-array map; essentially prohibiting tail calling into it.
> >   I'm also disabling freplace of the dev-bound programs. Both of those
> >   restrictions can be loosened up eventually.
>
> I thought it would be a shame that we don't support at least freplace
> programs from the get-go (as that would exclude libxdp from taking
> advantage of this). So see below for a patch implementing this :)
>
> -Toke

Damn, now I need to write a selftest :-)
But seriously, thank you for taking care of this, will try to include
preserving SoB!


> commit 3abb333e5fd2e8a0920b77013499bdae0ee3db43
> Author: Toke Høiland-Jørgensen <toke@redhat.com>
> Date:   Thu Dec 8 23:10:54 2022 +0100
>
>     bpf: Support consuming XDP HW metadata from fext programs
>
>     Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
>     programs that consume HW metadata, implement support for propagating the
>     offload information. The extension program doesn't need to set a flag or
>     ifindex, it these will just be propagated from the target by the verifier.
>     We need to create a separate offload object for the extension program,
>     though, since it can be reattached to a different program later (which
>     means we can't just inhering the offload information from the target).
>
>     An additional check is added on attach that the new target is compatible
>     with the offload information in the extension prog.
>
>     Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b46b60f4eae1..cfa5c847cf2c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2482,6 +2482,7 @@ void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
>  void unpriv_ebpf_notify(int new_state);
>
>  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> +int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev);
>  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
>  void bpf_offload_bound_netdev_unregister(struct net_device *dev);
>
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index bad8bab916eb..b059a7b53457 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -83,36 +83,25 @@ bpf_offload_find_netdev(struct net_device *netdev)
>         return rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
>  }
>
> -int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> +int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev)
>  {
>         struct bpf_offload_netdev *ondev;
>         struct bpf_prog_offload *offload;
>         int err;
>
> -       if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
> -           attr->prog_type != BPF_PROG_TYPE_XDP)
> +       if (!netdev)
>                 return -EINVAL;
>
> -       if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
> -               return -EINVAL;
> +       err = __bpf_offload_init();
> +       if (err)
> +               return err;
>
>         offload = kzalloc(sizeof(*offload), GFP_USER);
>         if (!offload)
>                 return -ENOMEM;
>
> -       err = __bpf_offload_init();
> -       if (err)
> -               return err;
> -
>         offload->prog = prog;
> -
> -       offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
> -                                          attr->prog_ifindex);
> -       err = bpf_dev_offload_check(offload->netdev);
> -       if (err)
> -               goto err_maybe_put;
> -
> -       prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> +       offload->netdev = netdev;
>
>         down_write(&bpf_devs_lock);
>         ondev = bpf_offload_find_netdev(offload->netdev);
> @@ -135,19 +124,46 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>         offload->offdev = ondev->offdev;
>         prog->aux->offload = offload;
>         list_add_tail(&offload->offloads, &ondev->progs);
> -       dev_put(offload->netdev);
>         up_write(&bpf_devs_lock);
>
>         return 0;
>  err_unlock:
>         up_write(&bpf_devs_lock);
> -err_maybe_put:
> -       if (offload->netdev)
> -               dev_put(offload->netdev);
>         kfree(offload);
>         return err;
>  }
>
> +int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> +{
> +       struct net_device *netdev;
> +       int err;
> +
> +       if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
> +           attr->prog_type != BPF_PROG_TYPE_XDP)
> +               return -EINVAL;
> +
> +       if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
> +               return -EINVAL;
> +
> +       netdev = dev_get_by_index(current->nsproxy->net_ns, attr->prog_ifindex);
> +       if (!netdev)
> +               return -EINVAL;
> +
> +       err = bpf_dev_offload_check(netdev);
> +       if (err)
> +               goto out;
> +
> +       prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> +
> +       err = __bpf_prog_offload_init(prog, netdev);
> +       if (err)
> +               goto out;
> +
> +out:
> +       dev_put(netdev);
> +       return err;
> +}
> +
>  int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
>  {
>         struct bpf_prog_offload *offload;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index b345a273f7d0..606e6de5f716 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>                         goto out_put_prog;
>                 }
>
> +               if (bpf_prog_is_dev_bound(tgt_prog->aux) &&
> +                   (bpf_prog_is_offloaded(tgt_prog->aux) ||
> +                    !bpf_prog_is_dev_bound(prog->aux) ||
> +                    !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
> +                       err = -EINVAL;
> +                       goto out_put_prog;
> +               }
> +
>                 key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
>         }
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index bc8d9b8d4f47..d92e28dd220e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -16379,11 +16379,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>         if (tgt_prog) {
>                 struct bpf_prog_aux *aux = tgt_prog->aux;
>
> -               if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> -                       bpf_log(log, "Replacing device-bound programs not supported\n");
> -                       return -EINVAL;
> -               }
> -
>                 for (i = 0; i < aux->func_info_cnt; i++)
>                         if (aux->func_info[i].type_id == btf_id) {
>                                 subprog = i;
> @@ -16644,10 +16639,22 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>         if (tgt_prog && prog->type == BPF_PROG_TYPE_EXT) {
>                 /* to make freplace equivalent to their targets, they need to
>                  * inherit env->ops and expected_attach_type for the rest of the
> -                * verification
> +                * verification; we also need to propagate the prog offload data
> +                * for resolving kfuncs.
>                  */
>                 env->ops = bpf_verifier_ops[tgt_prog->type];
>                 prog->expected_attach_type = tgt_prog->expected_attach_type;
> +
> +               if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> +                       if (bpf_prog_is_offloaded(tgt_prog->aux))
> +                               return -EINVAL;
> +
> +                       prog->aux->dev_bound = true;
> +                       ret = __bpf_prog_offload_init(prog,
> +                                                     tgt_prog->aux->offload->netdev);
> +                       if (ret)
> +                               return ret;
> +               }
>         }
>
>         /* store info about the attachment target that will be used later */
>
Toke Høiland-Jørgensen Dec. 9, 2022, 12:14 a.m. UTC | #3
Stanislav Fomichev <sdf@google.com> writes:

> On Thu, Dec 8, 2022 at 2:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > Please see the first patch in the series for the overall
>> > design and use-cases.
>> >
>> > Changes since v3:
>> >
>> > - Rework prog->bound_netdev refcounting (Jakub/Marin)
>> >
>> >   Now it's based on the offload.c framework. It mostly fits, except
>> >   I had to automatically insert a HT entry for the netdev. In the
>> >   offloaded case, the netdev is added via a call to
>> >   bpf_offload_dev_netdev_register from the driver init path; with
>> >   a dev-bound programs, we have to manually add (and remove) the entry.
>> >
>> >   As suggested by Toke, I'm also prohibiting putting dev-bound programs
>> >   into prog-array map; essentially prohibiting tail calling into it.
>> >   I'm also disabling freplace of the dev-bound programs. Both of those
>> >   restrictions can be loosened up eventually.
>>
>> I thought it would be a shame that we don't support at least freplace
>> programs from the get-go (as that would exclude libxdp from taking
>> advantage of this). So see below for a patch implementing this :)
>>
>> -Toke
>
> Damn, now I need to write a selftest :-)
> But seriously, thank you for taking care of this, will try to include
> preserving SoB!

Cool, thanks! I just realised I made on mistake in the attach check,
though:

[...]

>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index b345a273f7d0..606e6de5f716 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>>                         goto out_put_prog;
>>                 }
>>
>> +               if (bpf_prog_is_dev_bound(tgt_prog->aux) &&
>> +                   (bpf_prog_is_offloaded(tgt_prog->aux) ||
>> +                    !bpf_prog_is_dev_bound(prog->aux) ||
>> +                    !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {

This should switch the order of the is_dev_bound() checks, like:

+               if (bpf_prog_is_dev_bound(prog->aux) &&
+                   (bpf_prog_is_offloaded(tgt_prog->aux) ||
+                    !bpf_prog_is_dev_bound(tgt_prog->aux) ||
+                    !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {

I.e., first check bpf_prog_is_dev_bound(prog->aux) (the program being
attached), and only perform the other checks if we're attaching
something that has been verified as being dev-bound. It should be fine
to attach a non-devbound function to a devbound parent program (since
that non-devbound function can't call any of the kfuncs).

-Toke