Message ID | 20221115030210.3159213-7-sdf@google.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | xdp: hints via kfuncs | expand |
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index b444b1118c4f..71e3bc7ad839 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -6116,6 +6116,12 @@ enum xdp_action { > XDP_REDIRECT, > }; > > +/* Subset of XDP metadata exported to skb context. > + */ > +struct xdp_skb_metadata { > + __u64 rx_timestamp; > +}; Okay, so given Alexei's comment about __randomize_struct not actually working, I think we need to come up with something else for this. Just sticking this in a regular UAPI header seems like a bad idea; we'd just be inviting people to use it as-is. Do we actually need the full definition here? It's just a pointer declaration below, so is an opaque forward-definition enough? Then we could have the full definition in an internal header, moving the full definition back to being in vmlinux.h only? -Toke
On Tue, Nov 15, 2022 at 3:20 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index b444b1118c4f..71e3bc7ad839 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -6116,6 +6116,12 @@ enum xdp_action { > > XDP_REDIRECT, > > }; > > > > +/* Subset of XDP metadata exported to skb context. > > + */ > > +struct xdp_skb_metadata { > > + __u64 rx_timestamp; > > +}; > > Okay, so given Alexei's comment about __randomize_struct not actually > working, I think we need to come up with something else for this. Just > sticking this in a regular UAPI header seems like a bad idea; we'd just > be inviting people to use it as-is. > > Do we actually need the full definition here? It's just a pointer > declaration below, so is an opaque forward-definition enough? Then we > could have the full definition in an internal header, moving the full > definition back to being in vmlinux.h only? Looks like having a uapi-declaration only (and moving the definition into the kernel headers) might work. At least it does in my limited testing :-) So let's go with that for now. Alexei, thanks for the context on __randomize_struct!
Hi Stanislav,
I love your patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20221115030210.3159213-7-sdf%40google.com
patch subject: [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context
config: hexagon-randconfig-r031-20221115
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 463da45892e2d2a262277b91b96f5f8c05dc25d0)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/5c7e4e60d13af8491563dd4d2ad5fc684420e540
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547
git checkout 5c7e4e60d13af8491563dd4d2ad5fc684420e540
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash net/sched/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from net/sched/cls_flow.c:14:
In file included from include/linux/skbuff.h:17:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from net/sched/cls_flow.c:14:
In file included from include/linux/skbuff.h:17:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from net/sched/cls_flow.c:14:
In file included from include/linux/skbuff.h:17:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
In file included from net/sched/cls_flow.c:17:
In file included from include/linux/ipv6.h:93:
In file included from include/linux/tcp.h:19:
In file included from include/net/sock.h:46:
In file included from include/linux/netdevice.h:43:
include/net/xdp.h:449:2: error: void function 'xdp_metadata_export_to_skb' should not return a value [-Wreturn-type]
return 0;
^ ~
>> net/sched/cls_flow.c:63:52: warning: shift count >= width of type [-Wshift-count-overflow]
return (a & 0xFFFFFFFF) ^ (BITS_PER_LONG > 32 ? a >> 32 : 0);
^ ~~
7 warnings and 1 error generated.
vim +63 net/sched/cls_flow.c
e5dfb815181fcb Patrick McHardy 2008-01-31 58
e5dfb815181fcb Patrick McHardy 2008-01-31 59 static inline u32 addr_fold(void *addr)
e5dfb815181fcb Patrick McHardy 2008-01-31 60 {
e5dfb815181fcb Patrick McHardy 2008-01-31 61 unsigned long a = (unsigned long)addr;
e5dfb815181fcb Patrick McHardy 2008-01-31 62
e5dfb815181fcb Patrick McHardy 2008-01-31 @63 return (a & 0xFFFFFFFF) ^ (BITS_PER_LONG > 32 ? a >> 32 : 0);
e5dfb815181fcb Patrick McHardy 2008-01-31 64 }
e5dfb815181fcb Patrick McHardy 2008-01-31 65
On 11/14/22 7:02 PM, Stanislav Fomichev wrote: > Implement new bpf_xdp_metadata_export_to_skb kfunc which > prepares compatible xdp metadata for kernel consumption. > This kfunc should be called prior to bpf_redirect > or when XDP_PASS'ing the frame into the kernel (note, the drivers > have to be updated to enable consuming XDP_PASS'ed metadata). > > veth driver is amended to consume this metadata when converting to skb. > > Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate > whether the frame has skb metadata. The metadata is currently > stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses > to work after a call to bpf_xdp_metadata_export_to_skb (can lift > this requirement later on if needed, we'd have to memmove > xdp_skb_metadata). It is ok to refuse bpf_xdp_adjust_meta() after bpf_xdp_metadata_export_to_skb() for now. However, it will also need to refuse bpf_xdp_adjust_head(). [ ... ] > +/* For the packets directed to the kernel, this kfunc exports XDP metadata > + * into skb context. > + */ > +noinline int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) > +{ > + return 0; > +} > + I think it is still better to return 'struct xdp_skb_metata *' instead of true/false. Like: noinline struct xdp_skb_metata *bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) { return 0; } The KF_RET_NULL has already been set in BTF_SET8_START_GLOBAL(xdp_metadata_kfunc_ids). There is "xdp_btf_struct_access()" that can allow write access to 'struct xdp_skb_metata' What else is missing? We can try to solve it. Then there is no need for this double check in patch 8 selftest which is not easy to use: + if (bpf_xdp_metadata_export_to_skb(ctx) < 0) { + bpf_printk("bpf_xdp_metadata_export_to_skb failed"); + return XDP_DROP; + } [ ... ] + skb_metadata = ctx->skb_metadata; + if (!skb_metadata) { + bpf_printk("no ctx->skb_metadata"); + return XDP_DROP; + } [ ... ] > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index b444b1118c4f..71e3bc7ad839 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -6116,6 +6116,12 @@ enum xdp_action { > XDP_REDIRECT, > }; > > +/* Subset of XDP metadata exported to skb context. > + */ > +struct xdp_skb_metadata { > + __u64 rx_timestamp; > +}; > + > /* user accessible metadata for XDP packet hook > * new fields must be added to the end of this structure > */ > @@ -6128,6 +6134,7 @@ struct xdp_md { > __u32 rx_queue_index; /* rxq->queue_index */ > > __u32 egress_ifindex; /* txq->dev->ifindex */ > + __bpf_md_ptr(struct xdp_skb_metadata *, skb_metadata); Once the above bpf_xdp_metadata_export_to_skb() returning a pointer works, then it can be another kfunc 'struct xdp_skb_metata * bpf_xdp_get_skb_metadata(const struct xdp_md *ctx)' to return the skb_metadata which was a similar point discussed in the previous RFC.
Hi Stanislav, I love your patch! Yet something to improve: [auto build test ERROR on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master patch link: https://lore.kernel.org/r/20221115030210.3159213-7-sdf%40google.com patch subject: [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context config: parisc-randconfig-r006-20221115 compiler: hppa-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/5c7e4e60d13af8491563dd4d2ad5fc684420e540 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547 git checkout 5c7e4e60d13af8491563dd4d2ad5fc684420e540 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=parisc SHELL=/bin/bash net/core/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): In file included from include/linux/netdevice.h:43, from include/linux/if_vlan.h:10, from include/linux/filter.h:20, from net/core/xdp.c:9: include/net/xdp.h: In function 'xdp_metadata_export_to_skb': include/net/xdp.h:449:16: error: 'return' with a value, in function returning void [-Werror=return-type] 449 | return 0; | ^ include/net/xdp.h:447:13: note: declared here 447 | static void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/xdp.c: At top level: net/core/xdp.c:734:14: warning: no previous prototype for 'bpf_xdp_metadata_export_to_skb' [-Wmissing-prototypes] 734 | noinline int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/xdp.c:743:14: warning: no previous prototype for 'bpf_xdp_metadata_rx_timestamp_supported' [-Wmissing-prototypes] 743 | noinline int bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/xdp.c:751:10: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 751 | noinline const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) | ^~~~~ net/core/xdp.c:751:22: warning: no previous prototype for 'bpf_xdp_metadata_rx_timestamp' [-Wmissing-prototypes] 751 | noinline const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> net/core/xdp.c:881:6: error: redefinition of 'xdp_metadata_export_to_skb' 881 | void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ include/net/xdp.h:447:13: note: previous definition of 'xdp_metadata_export_to_skb' with type 'void(const struct bpf_prog *, struct bpf_patch *)' 447 | static void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ include/net/xdp.h:447:13: warning: 'xdp_metadata_export_to_skb' defined but not used [-Wunused-function] cc1: some warnings being treated as errors vim +/xdp_metadata_export_to_skb +881 net/core/xdp.c 872 873 static int __init xdp_metadata_init(void) 874 { 875 return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set); 876 } 877 late_initcall(xdp_metadata_init); 878 #else 879 struct btf_id_set8 xdp_metadata_kfunc_ids = {}; 880 EXPORT_SYMBOL(xdp_metadata_kfunc_ids); > 881 void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch)
Hi Stanislav,
I love your patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20221115030210.3159213-7-sdf%40google.com
patch subject: [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context
config: arm64-randconfig-r001-20221115
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 463da45892e2d2a262277b91b96f5f8c05dc25d0)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/5c7e4e60d13af8491563dd4d2ad5fc684420e540
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547
git checkout 5c7e4e60d13af8491563dd4d2ad5fc684420e540
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash net/mac80211/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from net/mac80211/s1g.c:6:
In file included from include/linux/ieee80211.h:20:
In file included from include/linux/etherdevice.h:21:
In file included from include/linux/netdevice.h:43:
include/net/xdp.h:449:2: error: void function 'xdp_metadata_export_to_skb' should not return a value [-Wreturn-type]
return 0;
^ ~
>> net/mac80211/s1g.c:103:36: warning: implicit conversion from 'unsigned long' to '__u16' (aka 'unsigned short') changes value from 18446744073709551614 to 65534 [-Wconstant-conversion]
twt_agrt->req_type &= cpu_to_le16(~IEEE80211_TWT_REQTYPE_REQUEST);
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/byteorder/generic.h:90:21: note: expanded from macro 'cpu_to_le16'
#define cpu_to_le16 __cpu_to_le16
^
include/uapi/linux/byteorder/big_endian.h:36:53: note: expanded from macro '__cpu_to_le16'
#define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
~~~~~~~~~~^~~
include/uapi/linux/swab.h:107:12: note: expanded from macro '__swab16'
__fswab16(x))
~~~~~~~~~ ^
1 warning and 1 error generated.
vim +103 net/mac80211/s1g.c
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 94
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 95 static void
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 96 ieee80211_s1g_rx_twt_setup(struct ieee80211_sub_if_data *sdata,
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 97 struct sta_info *sta, struct sk_buff *skb)
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 98 {
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 99 struct ieee80211_mgmt *mgmt = (void *)skb->data;
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 100 struct ieee80211_twt_setup *twt = (void *)mgmt->u.action.u.s1g.variable;
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 101 struct ieee80211_twt_params *twt_agrt = (void *)twt->params;
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 102
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 @103 twt_agrt->req_type &= cpu_to_le16(~IEEE80211_TWT_REQTYPE_REQUEST);
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 104
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 105 /* broadcast TWT not supported yet */
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 106 if (twt->control & IEEE80211_TWT_CONTROL_NEG_TYPE_BROADCAST) {
7ff379ba2d4b7b Johannes Berg 2021-09-27 107 twt_agrt->req_type &=
7ff379ba2d4b7b Johannes Berg 2021-09-27 108 ~cpu_to_le16(IEEE80211_TWT_REQTYPE_SETUP_CMD);
7ff379ba2d4b7b Johannes Berg 2021-09-27 109 twt_agrt->req_type |=
7ff379ba2d4b7b Johannes Berg 2021-09-27 110 le16_encode_bits(TWT_SETUP_CMD_REJECT,
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 111 IEEE80211_TWT_REQTYPE_SETUP_CMD);
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 112 goto out;
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 113 }
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 114
30ac96f7cc973b Howard Hsu 2022-10-27 115 /* TWT Information not supported yet */
30ac96f7cc973b Howard Hsu 2022-10-27 116 twt->control |= IEEE80211_TWT_CONTROL_RX_DISABLED;
30ac96f7cc973b Howard Hsu 2022-10-27 117
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 118 drv_add_twt_setup(sdata->local, sdata, &sta->sta, twt);
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 119 out:
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 120 ieee80211_s1g_send_twt_setup(sdata, mgmt->sa, sdata->vif.addr, twt);
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 121 }
f5a4c24e689f54 Lorenzo Bianconi 2021-08-23 122
Stanislav Fomichev <sdf@google.com> writes: > On Tue, Nov 15, 2022 at 3:20 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> > index b444b1118c4f..71e3bc7ad839 100644 >> > --- a/include/uapi/linux/bpf.h >> > +++ b/include/uapi/linux/bpf.h >> > @@ -6116,6 +6116,12 @@ enum xdp_action { >> > XDP_REDIRECT, >> > }; >> > >> > +/* Subset of XDP metadata exported to skb context. >> > + */ >> > +struct xdp_skb_metadata { >> > + __u64 rx_timestamp; >> > +}; >> >> Okay, so given Alexei's comment about __randomize_struct not actually >> working, I think we need to come up with something else for this. Just >> sticking this in a regular UAPI header seems like a bad idea; we'd just >> be inviting people to use it as-is. >> >> Do we actually need the full definition here? It's just a pointer >> declaration below, so is an opaque forward-definition enough? Then we >> could have the full definition in an internal header, moving the full >> definition back to being in vmlinux.h only? > > Looks like having a uapi-declaration only (and moving the definition > into the kernel headers) might work. At least it does in my limited > testing :-) So let's go with that for now. Alexei, thanks for the > context on __randomize_struct! Sounds good! :) -Toke
Martin KaFai Lau <martin.lau@linux.dev> writes: > On 11/14/22 7:02 PM, Stanislav Fomichev wrote: >> Implement new bpf_xdp_metadata_export_to_skb kfunc which >> prepares compatible xdp metadata for kernel consumption. >> This kfunc should be called prior to bpf_redirect >> or when XDP_PASS'ing the frame into the kernel (note, the drivers >> have to be updated to enable consuming XDP_PASS'ed metadata). >> >> veth driver is amended to consume this metadata when converting to skb. >> >> Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate >> whether the frame has skb metadata. The metadata is currently >> stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses >> to work after a call to bpf_xdp_metadata_export_to_skb (can lift >> this requirement later on if needed, we'd have to memmove >> xdp_skb_metadata). > > It is ok to refuse bpf_xdp_adjust_meta() after bpf_xdp_metadata_export_to_skb() > for now. However, it will also need to refuse bpf_xdp_adjust_head(). I'm also OK with deferring this, although I'm wondering if it isn't just as easy to just add the memmove() straight away? :) -Toke
Hi Stanislav, I love your patch! Yet something to improve: [auto build test ERROR on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master patch link: https://lore.kernel.org/r/20221115030210.3159213-7-sdf%40google.com patch subject: [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context config: hexagon-randconfig-r041-20221115 compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 463da45892e2d2a262277b91b96f5f8c05dc25d0) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/5c7e4e60d13af8491563dd4d2ad5fc684420e540 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Stanislav-Fomichev/xdp-hints-via-kfuncs/20221115-110547 git checkout 5c7e4e60d13af8491563dd4d2ad5fc684420e540 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash net/core/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): In file included from net/core/dev_addr_lists_test.c:4: In file included from include/linux/etherdevice.h:20: In file included from include/linux/if_ether.h:19: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __raw_readb(PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu' #define __le16_to_cpu(x) ((__force __u16)(__le16)(x)) ^ In file included from net/core/dev_addr_lists_test.c:4: In file included from include/linux/etherdevice.h:20: In file included from include/linux/if_ether.h:19: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu' #define __le32_to_cpu(x) ((__force __u32)(__le32)(x)) ^ In file included from net/core/dev_addr_lists_test.c:4: In file included from include/linux/etherdevice.h:20: In file included from include/linux/if_ether.h:19: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writeb(value, PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ In file included from net/core/dev_addr_lists_test.c:4: In file included from include/linux/etherdevice.h:21: In file included from include/linux/netdevice.h:43: >> include/net/xdp.h:449:2: error: void function 'xdp_metadata_export_to_skb' should not return a value [-Wreturn-type] return 0; ^ ~ 6 warnings and 1 error generated. -- In file included from net/core/xdp.c:9: In file included from include/linux/filter.h:12: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __raw_readb(PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu' #define __le16_to_cpu(x) ((__force __u16)(__le16)(x)) ^ In file included from net/core/xdp.c:9: In file included from include/linux/filter.h:12: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu' #define __le32_to_cpu(x) ((__force __u32)(__le32)(x)) ^ In file included from net/core/xdp.c:9: In file included from include/linux/filter.h:12: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writeb(value, PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ In file included from net/core/xdp.c:9: In file included from include/linux/filter.h:20: In file included from include/linux/if_vlan.h:10: In file included from include/linux/netdevice.h:43: >> include/net/xdp.h:449:2: error: void function 'xdp_metadata_export_to_skb' should not return a value [-Wreturn-type] return 0; ^ ~ net/core/xdp.c:734:14: warning: no previous prototype for function 'bpf_xdp_metadata_export_to_skb' [-Wmissing-prototypes] noinline int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) ^ net/core/xdp.c:734:10: note: declare 'static' if the function is not intended to be used outside of this translation unit noinline int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) ^ static net/core/xdp.c:743:14: warning: no previous prototype for function 'bpf_xdp_metadata_rx_timestamp_supported' [-Wmissing-prototypes] noinline int bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) ^ net/core/xdp.c:743:10: note: declare 'static' if the function is not intended to be used outside of this translation unit noinline int bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) ^ static net/core/xdp.c:751:10: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers] noinline const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) ^~~~~~ net/core/xdp.c:751:22: warning: no previous prototype for function 'bpf_xdp_metadata_rx_timestamp' [-Wmissing-prototypes] noinline const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) ^ net/core/xdp.c:751:16: note: declare 'static' if the function is not intended to be used outside of this translation unit noinline const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) ^ static net/core/xdp.c:881:6: error: redefinition of 'xdp_metadata_export_to_skb' void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) ^ include/net/xdp.h:447:13: note: previous definition is here static void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) ^ 10 warnings and 2 errors generated. -- In file included from net/core/net-traces.c:8: In file included from include/linux/netdevice.h:38: In file included from include/net/net_namespace.h:43: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __raw_readb(PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu' #define __le16_to_cpu(x) ((__force __u16)(__le16)(x)) ^ In file included from net/core/net-traces.c:8: In file included from include/linux/netdevice.h:38: In file included from include/net/net_namespace.h:43: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu' #define __le32_to_cpu(x) ((__force __u32)(__le32)(x)) ^ In file included from net/core/net-traces.c:8: In file included from include/linux/netdevice.h:38: In file included from include/net/net_namespace.h:43: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:334: include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writeb(value, PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ In file included from net/core/net-traces.c:8: In file included from include/linux/netdevice.h:43: >> include/net/xdp.h:449:2: error: void function 'xdp_metadata_export_to_skb' should not return a value [-Wreturn-type] return 0; ^ ~ In file included from net/core/net-traces.c:50: In file included from include/trace/events/neigh.h:255: In file included from include/trace/define_trace.h:102: In file included from include/trace/trace_events.h:419: include/trace/events/neigh.h:42:20: warning: variable 'pin6' set but not used [-Wunused-but-set-variable] struct in6_addr *pin6; ^ In file included from net/core/net-traces.c:50: In file included from include/trace/events/neigh.h:255: In file included from include/trace/define_trace.h:103: In file included from include/trace/perf.h:113: include/trace/events/neigh.h:42:20: warning: variable 'pin6' set but not used [-Wunused-but-set-variable] struct in6_addr *pin6; ^ 8 warnings and 1 error generated. vim +/xdp_metadata_export_to_skb +449 include/net/xdp.h 437 438 #ifdef CONFIG_DEBUG_INFO_BTF 439 extern struct btf_id_set8 xdp_metadata_kfunc_ids; 440 static inline u32 xdp_metadata_kfunc_id(int id) 441 { 442 return xdp_metadata_kfunc_ids.pairs[id].id; 443 } 444 void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch); 445 #else 446 static inline u32 xdp_metadata_kfunc_id(int id) { return 0; } 447 static void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) 448 { > 449 return 0; 450 } 451 #endif 452
On Wed, Nov 16, 2022 at 1:48 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Martin KaFai Lau <martin.lau@linux.dev> writes: > > > On 11/14/22 7:02 PM, Stanislav Fomichev wrote: > >> Implement new bpf_xdp_metadata_export_to_skb kfunc which > >> prepares compatible xdp metadata for kernel consumption. > >> This kfunc should be called prior to bpf_redirect > >> or when XDP_PASS'ing the frame into the kernel (note, the drivers > >> have to be updated to enable consuming XDP_PASS'ed metadata). > >> > >> veth driver is amended to consume this metadata when converting to skb. > >> > >> Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate > >> whether the frame has skb metadata. The metadata is currently > >> stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses > >> to work after a call to bpf_xdp_metadata_export_to_skb (can lift > >> this requirement later on if needed, we'd have to memmove > >> xdp_skb_metadata). > > > > It is ok to refuse bpf_xdp_adjust_meta() after bpf_xdp_metadata_export_to_skb() > > for now. However, it will also need to refuse bpf_xdp_adjust_head(). > > I'm also OK with deferring this, although I'm wondering if it isn't just > as easy to just add the memmove() straight away? :) SG, let me try that! Martin also mentioned bpf_xdp_adjust_head needs to be taken care of..
On Tue, Nov 15, 2022 at 11:04 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 11/14/22 7:02 PM, Stanislav Fomichev wrote: > > Implement new bpf_xdp_metadata_export_to_skb kfunc which > > prepares compatible xdp metadata for kernel consumption. > > This kfunc should be called prior to bpf_redirect > > or when XDP_PASS'ing the frame into the kernel (note, the drivers > > have to be updated to enable consuming XDP_PASS'ed metadata). > > > > veth driver is amended to consume this metadata when converting to skb. > > > > Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate > > whether the frame has skb metadata. The metadata is currently > > stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses > > to work after a call to bpf_xdp_metadata_export_to_skb (can lift > > this requirement later on if needed, we'd have to memmove > > xdp_skb_metadata). > > It is ok to refuse bpf_xdp_adjust_meta() after bpf_xdp_metadata_export_to_skb() > for now. However, it will also need to refuse bpf_xdp_adjust_head(). Good catch! > [ ... ] > > > +/* For the packets directed to the kernel, this kfunc exports XDP metadata > > + * into skb context. > > + */ > > +noinline int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) > > +{ > > + return 0; > > +} > > + > > I think it is still better to return 'struct xdp_skb_metata *' instead of > true/false. Like: > > noinline struct xdp_skb_metata *bpf_xdp_metadata_export_to_skb(const struct > xdp_md *ctx) > { > return 0; > } > > The KF_RET_NULL has already been set in > BTF_SET8_START_GLOBAL(xdp_metadata_kfunc_ids). There is > "xdp_btf_struct_access()" that can allow write access to 'struct xdp_skb_metata' > What else is missing? We can try to solve it. Ah, that KF_RET_NULL is a left-over from my previous attempt to return pointers :-) I can retry with returning a pointer, I don't have a preference, but I felt like returning simple -errno might be more simple api-wise. (with bpf_xdp_metadata_export_to_skb returning NULL vs return loggable errno - I'd prefer the latter, but not strongly) > Then there is no need for this double check in patch 8 selftest which is not > easy to use: > > + if (bpf_xdp_metadata_export_to_skb(ctx) < 0) { > + bpf_printk("bpf_xdp_metadata_export_to_skb failed"); > + return XDP_DROP; > + } > > [ ... ] > > + skb_metadata = ctx->skb_metadata; > + if (!skb_metadata) { > + bpf_printk("no ctx->skb_metadata"); > + return XDP_DROP; > + } > > [ ... ] > > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > index b444b1118c4f..71e3bc7ad839 100644 > > --- a/tools/include/uapi/linux/bpf.h > > +++ b/tools/include/uapi/linux/bpf.h > > @@ -6116,6 +6116,12 @@ enum xdp_action { > > XDP_REDIRECT, > > }; > > > > +/* Subset of XDP metadata exported to skb context. > > + */ > > +struct xdp_skb_metadata { > > + __u64 rx_timestamp; > > +}; > > + > > /* user accessible metadata for XDP packet hook > > * new fields must be added to the end of this structure > > */ > > @@ -6128,6 +6134,7 @@ struct xdp_md { > > __u32 rx_queue_index; /* rxq->queue_index */ > > > > __u32 egress_ifindex; /* txq->dev->ifindex */ > > + __bpf_md_ptr(struct xdp_skb_metadata *, skb_metadata); > > Once the above bpf_xdp_metadata_export_to_skb() returning a pointer works, then > it can be another kfunc 'struct xdp_skb_metata * bpf_xdp_get_skb_metadata(const > struct xdp_md *ctx)' to return the skb_metadata which was a similar point > discussed in the previous RFC. I see. I think you've mentioned it previously somewhere to have a kfunc accessor vs uapi field and I totally forgot. Will try to address it, ty!
On Mon, 14 Nov 2022 19:02:05 -0800 Stanislav Fomichev wrote: > Implement new bpf_xdp_metadata_export_to_skb kfunc which > prepares compatible xdp metadata for kernel consumption. > This kfunc should be called prior to bpf_redirect > or when XDP_PASS'ing the frame into the kernel (note, the drivers > have to be updated to enable consuming XDP_PASS'ed metadata). > > veth driver is amended to consume this metadata when converting to skb. > > Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate > whether the frame has skb metadata. The metadata is currently > stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses > to work after a call to bpf_xdp_metadata_export_to_skb (can lift > this requirement later on if needed, we'd have to memmove > xdp_skb_metadata). IMO we should split the xdp -> skb work from the pure HW data access in XDP. We have a proof point here that there is a way of building on top of the first 5 patches to achieve the objective, and that's sufficient to let the prior work going in. ..because some of us may not agree that we should be pushing in a fixed-format structure that's also listed in uAPI. And prefer to, say, let the user define their format and add a call point for a BPF prog to populate the skb from whatever data they decided to stash... That's how I interpreted some of John's comments as well, but I may be wrong. Either way, there is no technical reason for the xdp -> skb field propagation to hold up the HW descriptor access, right? If anything we may be able to make quicker progress if prior work is in tree and multiple people can hack...
On 11/16/22 1:12 PM, Jakub Kicinski wrote: > On Mon, 14 Nov 2022 19:02:05 -0800 Stanislav Fomichev wrote: >> Implement new bpf_xdp_metadata_export_to_skb kfunc which >> prepares compatible xdp metadata for kernel consumption. >> This kfunc should be called prior to bpf_redirect >> or when XDP_PASS'ing the frame into the kernel (note, the drivers >> have to be updated to enable consuming XDP_PASS'ed metadata). >> >> veth driver is amended to consume this metadata when converting to skb. >> >> Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate >> whether the frame has skb metadata. The metadata is currently >> stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses >> to work after a call to bpf_xdp_metadata_export_to_skb (can lift >> this requirement later on if needed, we'd have to memmove >> xdp_skb_metadata). > > IMO we should split the xdp -> skb work from the pure HW data access > in XDP. We have a proof point here that there is a way of building > on top of the first 5 patches to achieve the objective, and that's > sufficient to let the prior work going in. +1 Good idea.
On 15/11/2022 04.02, Stanislav Fomichev wrote: > Implement new bpf_xdp_metadata_export_to_skb kfunc which > prepares compatible xdp metadata for kernel consumption. > This kfunc should be called prior to bpf_redirect > or when XDP_PASS'ing the frame into the kernel (note, the drivers > have to be updated to enable consuming XDP_PASS'ed metadata). > > veth driver is amended to consume this metadata when converting to skb. > > Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate > whether the frame has skb metadata. The metadata is currently > stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses > to work after a call to bpf_xdp_metadata_export_to_skb (can lift > this requirement later on if needed, we'd have to memmove > xdp_skb_metadata). > I think it is wrong to refuses using metadata area (bpf_xdp_adjust_meta) when the function bpf_xdp_metadata_export_to_skb() have been called. In my design they were suppose to co-exist, and BPF-prog was expected to access this directly themselves. With this current design, I think it is better to place the struct xdp_skb_metadata (maybe call it xdp_skb_hints) after xdp_frame (in the top of the frame). This way we don't conflict with metadata and headroom use-cases. Plus, verifier will keep BPF-prog from accessing this area directly (which seems to be one of the new design goals). By placing it after xdp_frame, I think it would be possible to let veth unroll functions seamlessly access this info for XDP_REDIRECT'ed xdp_frame's. WDYT? --Jesper
On Fri, Nov 18, 2022 at 6:05 AM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > On 15/11/2022 04.02, Stanislav Fomichev wrote: > > Implement new bpf_xdp_metadata_export_to_skb kfunc which > > prepares compatible xdp metadata for kernel consumption. > > This kfunc should be called prior to bpf_redirect > > or when XDP_PASS'ing the frame into the kernel (note, the drivers > > have to be updated to enable consuming XDP_PASS'ed metadata). > > > > veth driver is amended to consume this metadata when converting to skb. > > > > Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate > > whether the frame has skb metadata. The metadata is currently > > stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses > > to work after a call to bpf_xdp_metadata_export_to_skb (can lift > > this requirement later on if needed, we'd have to memmove > > xdp_skb_metadata). > > > > I think it is wrong to refuses using metadata area (bpf_xdp_adjust_meta) > when the function bpf_xdp_metadata_export_to_skb() have been called. > In my design they were suppose to co-exist, and BPF-prog was expected to > access this directly themselves. > > With this current design, I think it is better to place the struct > xdp_skb_metadata (maybe call it xdp_skb_hints) after xdp_frame (in the > top of the frame). This way we don't conflict with metadata and > headroom use-cases. Plus, verifier will keep BPF-prog from accessing > this area directly (which seems to be one of the new design goals). > > By placing it after xdp_frame, I think it would be possible to let veth > unroll functions seamlessly access this info for XDP_REDIRECT'ed > xdp_frame's. > > WDYT? Not everyone seems to be happy with exposing this xdp_skb_metadata via uapi though :-( So I'll drop this part in the v2 for now. But let's definitely keep talking about the future approach. Putting it after xdp_frame SGTM; with this we seem to avoid the need to memmove it on adjust_{head,meta}. But going back to the uapi part, what if we add separate kfunc accessors for skb exported metadata? To export: bpf_xdp_metadata_export_rx_timestamp_to_skb(ctx, rx_timestamp) bpf_xdp_metadata_export_rx_hash_to_skb(ctx, rx_hash) // ^^ these prepare xdp_skb_metadata after xdp_frame, but not expose it via uapi/af_xdp/etc Then bpf_xdp_metadata_export_to_skb can be 'static inline' define in the headers: void bpf_xdp_metadata_export_to_skb(ctx) { if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) bpf_xdp_metadata_export_rx_timestamp_to_skb(ctx, bpf_xdp_metadata_rx_timestamp(ctx)); if (bpf_xdp_metadata_rx_hash_supported(ctx)) bpf_xdp_metadata_export_rx_hash_to_skb(ctx, bpf_xdp_metadata_rx_hash(ctx)); } We can also do the accessors: u64 bpf_xdp_metadata_skb_rx_timestamp(ctx) u32 bpf_xdp_metadata_skb_rx_hash(ctx) Hopefully we can unroll at least these, since they are not part of the drivers, it should be easier to argue... The only issue, it seems, is that if the final bpf program would like to export this metadata to af_xdp, it has to manually adj_meta and use bpf_xdp_metadata_skb_rx_xxx to prepare a custom layout. Not sure whether performance would suffer with this extra copy; but we can at least try and see..
Stanislav Fomichev <sdf@google.com> writes: > On Fri, Nov 18, 2022 at 6:05 AM Jesper Dangaard Brouer > <jbrouer@redhat.com> wrote: >> >> >> On 15/11/2022 04.02, Stanislav Fomichev wrote: >> > Implement new bpf_xdp_metadata_export_to_skb kfunc which >> > prepares compatible xdp metadata for kernel consumption. >> > This kfunc should be called prior to bpf_redirect >> > or when XDP_PASS'ing the frame into the kernel (note, the drivers >> > have to be updated to enable consuming XDP_PASS'ed metadata). >> > >> > veth driver is amended to consume this metadata when converting to skb. >> > >> > Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate >> > whether the frame has skb metadata. The metadata is currently >> > stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses >> > to work after a call to bpf_xdp_metadata_export_to_skb (can lift >> > this requirement later on if needed, we'd have to memmove >> > xdp_skb_metadata). >> > >> >> I think it is wrong to refuses using metadata area (bpf_xdp_adjust_meta) >> when the function bpf_xdp_metadata_export_to_skb() have been called. >> In my design they were suppose to co-exist, and BPF-prog was expected to >> access this directly themselves. >> >> With this current design, I think it is better to place the struct >> xdp_skb_metadata (maybe call it xdp_skb_hints) after xdp_frame (in the >> top of the frame). This way we don't conflict with metadata and >> headroom use-cases. Plus, verifier will keep BPF-prog from accessing >> this area directly (which seems to be one of the new design goals). >> >> By placing it after xdp_frame, I think it would be possible to let veth >> unroll functions seamlessly access this info for XDP_REDIRECT'ed >> xdp_frame's. >> >> WDYT? > > Not everyone seems to be happy with exposing this xdp_skb_metadata via > uapi though :-( > So I'll drop this part in the v2 for now. But let's definitely keep > talking about the future approach. Jakub was objecting to putting it in the UAPI header, but didn't we already agree that this wasn't necessary? I.e., if we just define struct xdp_skb_metadata *bpf_xdp_metadata_export_to_skb() as a kfunc, the xdp_skb_metadata struct won't appear in any UAPI headers and will only be accessible via BTF? And we can put the actual data wherever we choose, since that bit is nicely hidden behind the kfunc, while the returned pointer still allows programs to access it. We could even make that kfunc smart enough that it checks if the field is already populated and just return the pointer to the existing data instead of re-populating it int his case (with a flag to override, maybe?). > Putting it after xdp_frame SGTM; with this we seem to avoid the need > to memmove it on adjust_{head,meta}. > > But going back to the uapi part, what if we add separate kfunc > accessors for skb exported metadata? > > To export: > bpf_xdp_metadata_export_rx_timestamp_to_skb(ctx, rx_timestamp) > bpf_xdp_metadata_export_rx_hash_to_skb(ctx, rx_hash) > // ^^ these prepare xdp_skb_metadata after xdp_frame, but not expose > it via uapi/af_xdp/etc > > Then bpf_xdp_metadata_export_to_skb can be 'static inline' define in > the headers: > > void bpf_xdp_metadata_export_to_skb(ctx) > { > if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) > bpf_xdp_metadata_export_rx_timestamp_to_skb(ctx, > bpf_xdp_metadata_rx_timestamp(ctx)); > if (bpf_xdp_metadata_rx_hash_supported(ctx)) > bpf_xdp_metadata_export_rx_hash_to_skb(ctx, bpf_xdp_metadata_rx_hash(ctx)); > } The problem with this is that the BPF programs then have to keep up with the kernel. I.e., if the kernel later adds support for a new field that is used in the SKB, old XDP programs won't be populating it, which seems suboptimal. I think rather the kernel should be in control of the SKB metadata, and just allow XDP to consume it (and change individual fields as needed). > The only issue, it seems, is that if the final bpf program would like > to export this metadata to af_xdp, it has to manually adj_meta and use > bpf_xdp_metadata_skb_rx_xxx to prepare a custom layout. Not sure > whether performance would suffer with this extra copy; but we can at > least try and see.. If we write the metadata after the packet data, that could still be transferred to AF_XDP, couldn't it? Userspace would just have to know how to find and read it, like it would if it's before the metadata. -Toke
On Sat, Nov 19, 2022 at 4:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Stanislav Fomichev <sdf@google.com> writes: > > > On Fri, Nov 18, 2022 at 6:05 AM Jesper Dangaard Brouer > > <jbrouer@redhat.com> wrote: > >> > >> > >> On 15/11/2022 04.02, Stanislav Fomichev wrote: > >> > Implement new bpf_xdp_metadata_export_to_skb kfunc which > >> > prepares compatible xdp metadata for kernel consumption. > >> > This kfunc should be called prior to bpf_redirect > >> > or when XDP_PASS'ing the frame into the kernel (note, the drivers > >> > have to be updated to enable consuming XDP_PASS'ed metadata). > >> > > >> > veth driver is amended to consume this metadata when converting to skb. > >> > > >> > Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate > >> > whether the frame has skb metadata. The metadata is currently > >> > stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses > >> > to work after a call to bpf_xdp_metadata_export_to_skb (can lift > >> > this requirement later on if needed, we'd have to memmove > >> > xdp_skb_metadata). > >> > > >> > >> I think it is wrong to refuses using metadata area (bpf_xdp_adjust_meta) > >> when the function bpf_xdp_metadata_export_to_skb() have been called. > >> In my design they were suppose to co-exist, and BPF-prog was expected to > >> access this directly themselves. > >> > >> With this current design, I think it is better to place the struct > >> xdp_skb_metadata (maybe call it xdp_skb_hints) after xdp_frame (in the > >> top of the frame). This way we don't conflict with metadata and > >> headroom use-cases. Plus, verifier will keep BPF-prog from accessing > >> this area directly (which seems to be one of the new design goals). > >> > >> By placing it after xdp_frame, I think it would be possible to let veth > >> unroll functions seamlessly access this info for XDP_REDIRECT'ed > >> xdp_frame's. > >> > >> WDYT? > > > > Not everyone seems to be happy with exposing this xdp_skb_metadata via > > uapi though :-( > > So I'll drop this part in the v2 for now. But let's definitely keep > > talking about the future approach. > > Jakub was objecting to putting it in the UAPI header, but didn't we > already agree that this wasn't necessary? > > I.e., if we just define > > struct xdp_skb_metadata *bpf_xdp_metadata_export_to_skb() > > as a kfunc, the xdp_skb_metadata struct won't appear in any UAPI headers > and will only be accessible via BTF? And we can put the actual data > wherever we choose, since that bit is nicely hidden behind the kfunc, > while the returned pointer still allows programs to access it. > > We could even make that kfunc smart enough that it checks if the field > is already populated and just return the pointer to the existing data > instead of re-populating it int his case (with a flag to override, > maybe?). Even if we only expose it via btf, I think the fact that we still expose a somewhat fixed layout is the problem? I'm not sure the fact that we're not technically putting in the uapi header is the issue here, but maybe I'm wrong? Jakub? > > Putting it after xdp_frame SGTM; with this we seem to avoid the need > > to memmove it on adjust_{head,meta}. > > > > But going back to the uapi part, what if we add separate kfunc > > accessors for skb exported metadata? > > > > To export: > > bpf_xdp_metadata_export_rx_timestamp_to_skb(ctx, rx_timestamp) > > bpf_xdp_metadata_export_rx_hash_to_skb(ctx, rx_hash) > > // ^^ these prepare xdp_skb_metadata after xdp_frame, but not expose > > it via uapi/af_xdp/etc > > > > Then bpf_xdp_metadata_export_to_skb can be 'static inline' define in > > the headers: > > > > void bpf_xdp_metadata_export_to_skb(ctx) > > { > > if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) > > bpf_xdp_metadata_export_rx_timestamp_to_skb(ctx, > > bpf_xdp_metadata_rx_timestamp(ctx)); > > if (bpf_xdp_metadata_rx_hash_supported(ctx)) > > bpf_xdp_metadata_export_rx_hash_to_skb(ctx, bpf_xdp_metadata_rx_hash(ctx)); > > } > > The problem with this is that the BPF programs then have to keep up with > the kernel. I.e., if the kernel later adds support for a new field that > is used in the SKB, old XDP programs won't be populating it, which seems > suboptimal. I think rather the kernel should be in control of the SKB > metadata, and just allow XDP to consume it (and change individual fields > as needed). Good point. Although doesn't sound like a huge drawback to me? If that bpf_xdp_metadata_export_to_skb is a part of libbpf/libxdp, the new fields will get populated after a library update.. > > The only issue, it seems, is that if the final bpf program would like > > to export this metadata to af_xdp, it has to manually adj_meta and use > > bpf_xdp_metadata_skb_rx_xxx to prepare a custom layout. Not sure > > whether performance would suffer with this extra copy; but we can at > > least try and see.. > > If we write the metadata after the packet data, that could still be > transferred to AF_XDP, couldn't it? Userspace would just have to know > how to find and read it, like it would if it's before the metadata. Right, but here we again bump into the fact that we need to somehow communicate that layout to the userspace (via btf ids) which doesn't make everybody excited :-)
On Mon, 21 Nov 2022 09:53:02 -0800 Stanislav Fomichev wrote: > > Jakub was objecting to putting it in the UAPI header, but didn't we > > already agree that this wasn't necessary? > > > > I.e., if we just define > > > > struct xdp_skb_metadata *bpf_xdp_metadata_export_to_skb() > > > > as a kfunc, the xdp_skb_metadata struct won't appear in any UAPI headers > > and will only be accessible via BTF? And we can put the actual data > > wherever we choose, since that bit is nicely hidden behind the kfunc, > > while the returned pointer still allows programs to access it. > > > > We could even make that kfunc smart enough that it checks if the field > > is already populated and just return the pointer to the existing data > > instead of re-populating it int his case (with a flag to override, > > maybe?). > > Even if we only expose it via btf, I think the fact that we still > expose a somewhat fixed layout is the problem? > I'm not sure the fact that we're not technically putting in the uapi > header is the issue here, but maybe I'm wrong? > Jakub? Until the device metadata access from BPF is in bpf-next the only opinion I have on this is something along the lines of "not right now". I may be missing some concerns / perspectives, in which case - when is the next "BPF office hours" meeting?
On Mon, Nov 21, 2022 at 10:47 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 21 Nov 2022 09:53:02 -0800 Stanislav Fomichev wrote: > > > Jakub was objecting to putting it in the UAPI header, but didn't we > > > already agree that this wasn't necessary? > > > > > > I.e., if we just define > > > > > > struct xdp_skb_metadata *bpf_xdp_metadata_export_to_skb() > > > > > > as a kfunc, the xdp_skb_metadata struct won't appear in any UAPI headers > > > and will only be accessible via BTF? And we can put the actual data > > > wherever we choose, since that bit is nicely hidden behind the kfunc, > > > while the returned pointer still allows programs to access it. > > > > > > We could even make that kfunc smart enough that it checks if the field > > > is already populated and just return the pointer to the existing data > > > instead of re-populating it int his case (with a flag to override, > > > maybe?). > > > > Even if we only expose it via btf, I think the fact that we still > > expose a somewhat fixed layout is the problem? > > I'm not sure the fact that we're not technically putting in the uapi > > header is the issue here, but maybe I'm wrong? > > Jakub? > > Until the device metadata access from BPF is in bpf-next the only > opinion I have on this is something along the lines of "not right now". > > I may be missing some concerns / perspectives, in which case - when > is the next "BPF office hours" meeting? SG! Let's get back to it once we get the basic rx metadata sorted out. I'll probably look at the tx part next though; that xdp->skb path is of least priority for me.
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index c626580a2294..35349a232209 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -803,7 +803,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, void *orig_data, *orig_data_end; struct bpf_prog *xdp_prog; struct xdp_buff xdp; - u32 act, metalen; + u32 act; int off; skb_prepare_for_gro(skb); @@ -886,9 +886,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, skb->protocol = eth_type_trans(skb, rq->dev); - metalen = xdp.data - xdp.data_meta; - if (metalen) - skb_metadata_set(skb, metalen); + xdp_convert_skb_metadata(&xdp, skb); out: return skb; drop: @@ -1663,7 +1661,9 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp) static void veth_unroll_kfunc(const struct bpf_prog *prog, u32 func_id, struct bpf_patch *patch) { - if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) { + if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_EXPORT_TO_SKB)) { + return xdp_metadata_export_to_skb(prog, patch); + } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) { /* return true; */ bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 1)); } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) { diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4e464a27adaf..be6a9559dbf1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4219,6 +4219,10 @@ static inline bool skb_metadata_differs(const struct sk_buff *skb_a, true : __skb_metadata_differs(skb_a, skb_b, len_a); } +struct xdp_skb_metadata; +bool skb_metadata_import_from_xdp(struct sk_buff *skb, + struct xdp_skb_metadata *meta); + static inline void skb_metadata_set(struct sk_buff *skb, u8 meta_len) { skb_shinfo(skb)->meta_len = meta_len; diff --git a/include/net/xdp.h b/include/net/xdp.h index 2a82a98f2f9f..547a6a0e99f8 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -73,6 +73,7 @@ enum xdp_buff_flags { XDP_FLAGS_FRAGS_PF_MEMALLOC = BIT(1), /* xdp paged memory is under * pressure */ + XDP_FLAGS_HAS_SKB_METADATA = BIT(2), /* xdp_skb_metadata */ }; struct xdp_buff { @@ -91,6 +92,11 @@ static __always_inline bool xdp_buff_has_frags(struct xdp_buff *xdp) return !!(xdp->flags & XDP_FLAGS_HAS_FRAGS); } +static __always_inline bool xdp_buff_has_skb_metadata(struct xdp_buff *xdp) +{ + return !!(xdp->flags & XDP_FLAGS_HAS_SKB_METADATA); +} + static __always_inline void xdp_buff_set_frags_flag(struct xdp_buff *xdp) { xdp->flags |= XDP_FLAGS_HAS_FRAGS; @@ -306,6 +312,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp) return xdp_frame; } +bool xdp_convert_skb_metadata(struct xdp_buff *xdp, struct sk_buff *skb); + void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, struct xdp_buff *xdp); void xdp_return_frame(struct xdp_frame *xdpf); @@ -411,6 +419,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE #define XDP_METADATA_KFUNC_xxx \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_EXPORT_TO_SKB, \ + bpf_xdp_metadata_export_to_skb) \ XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \ bpf_xdp_metadata_rx_timestamp_supported) \ XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \ @@ -423,14 +433,21 @@ XDP_METADATA_KFUNC_xxx MAX_XDP_METADATA_KFUNC, }; +struct bpf_patch; + #ifdef CONFIG_DEBUG_INFO_BTF extern struct btf_id_set8 xdp_metadata_kfunc_ids; static inline u32 xdp_metadata_kfunc_id(int id) { return xdp_metadata_kfunc_ids.pairs[id].id; } +void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch); #else static inline u32 xdp_metadata_kfunc_id(int id) { return 0; } +static void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) +{ + return 0; +} #endif #endif /* __LINUX_NET_XDP_H__ */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b444b1118c4f..71e3bc7ad839 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6116,6 +6116,12 @@ enum xdp_action { XDP_REDIRECT, }; +/* Subset of XDP metadata exported to skb context. + */ +struct xdp_skb_metadata { + __u64 rx_timestamp; +}; + /* user accessible metadata for XDP packet hook * new fields must be added to the end of this structure */ @@ -6128,6 +6134,7 @@ struct xdp_md { __u32 rx_queue_index; /* rxq->queue_index */ __u32 egress_ifindex; /* txq->dev->ifindex */ + __bpf_md_ptr(struct xdp_skb_metadata *, skb_metadata); }; /* DEVMAP map-value layout diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b657ed6eb277..6879ad3a6026 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -14023,6 +14023,7 @@ static int unroll_kfunc_call(struct bpf_verifier_env *env, enum bpf_prog_type prog_type; struct bpf_prog_aux *aux; struct btf *desc_btf; + u32 allowed, mangled; u32 *kfunc_flags; u32 func_id; @@ -14050,6 +14051,20 @@ static int unroll_kfunc_call(struct bpf_verifier_env *env, */ bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 0)); } + + allowed = 1 << BPF_REG_0; + allowed |= 1 << BPF_REG_1; + allowed |= 1 << BPF_REG_2; + allowed |= 1 << BPF_REG_3; + allowed |= 1 << BPF_REG_4; + allowed |= 1 << BPF_REG_5; + mangled = bpf_patch_magles_registers(patch); + if (WARN_ON_ONCE(mangled & ~allowed)) { + bpf_patch_free(patch); + verbose(env, "bpf verifier is misconfigured\n"); + return -EINVAL; + } + return bpf_patch_err(patch); } diff --git a/net/core/filter.c b/net/core/filter.c index 6dd2baf5eeb2..2497144e4216 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4094,6 +4094,8 @@ BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset) return -EINVAL; if (unlikely(xdp_metalen_invalid(metalen))) return -EACCES; + if (unlikely(xdp_buff_has_skb_metadata(xdp))) + return -EACCES; xdp->data_meta = meta; @@ -8690,6 +8692,8 @@ static bool __is_valid_xdp_access(int off, int size) return true; } +BTF_ID_LIST_SINGLE(xdp_to_skb_metadata_btf_ids, struct, xdp_skb_metadata); + static bool xdp_is_valid_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, @@ -8722,6 +8726,18 @@ static bool xdp_is_valid_access(int off, int size, case offsetof(struct xdp_md, data_end): info->reg_type = PTR_TO_PACKET_END; break; + case offsetof(struct xdp_md, skb_metadata): + info->btf = bpf_get_btf_vmlinux(); + if (IS_ERR(info->btf)) + return PTR_ERR(info->btf); + if (!info->btf) + return -EINVAL; + + info->reg_type = PTR_TO_BTF_ID_OR_NULL; + info->btf_id = xdp_to_skb_metadata_btf_ids[0]; + if (size == sizeof(__u64)) + return true; + return false; } return __is_valid_xdp_access(off, size); @@ -9814,6 +9830,30 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type, *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, offsetof(struct net_device, ifindex)); break; + case offsetof(struct xdp_md, skb_metadata): + /* dst_reg = xdp_buff->flags; */ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, flags)); + /* dst_reg &= XDP_FLAGS_HAS_SKB_METADATA; */ + *insn++ = BPF_ALU64_IMM(BPF_AND, si->dst_reg, + XDP_FLAGS_HAS_SKB_METADATA); + + /* if (dst_reg != 0) { */ + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 3); + /* dst_reg = xdp_buff->data_meta; */ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_meta), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, data_meta)); + /* dst_reg -= sizeof(struct xdp_skb_metadata); */ + *insn++ = BPF_ALU64_IMM(BPF_SUB, si->dst_reg, + sizeof(struct xdp_skb_metadata)); + *insn++ = BPF_JMP_A(1); + /* } else { */ + /* return 0; */ + *insn++ = BPF_MOV32_IMM(si->dst_reg, 0); + /* } */ + break; } return insn - insn_buf; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 90d085290d49..0cc24ca20e4d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -72,6 +72,7 @@ #include <net/mptcp.h> #include <net/mctp.h> #include <net/page_pool.h> +#include <net/xdp.h> #include <linux/uaccess.h> #include <trace/events/skb.h> @@ -6675,3 +6676,22 @@ nodefer: __kfree_skb(skb); if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1)) smp_call_function_single_async(cpu, &sd->defer_csd); } + +bool skb_metadata_import_from_xdp(struct sk_buff *skb, + struct xdp_skb_metadata *meta) +{ + /* Optional SKB info, currently missing: + * - HW checksum info (skb->ip_summed) + * - HW RX hash (skb_set_hash) + * - RX ring dev queue index (skb_record_rx_queue) + */ + + if (meta->rx_timestamp) { + *skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){ + .hwtstamp = ns_to_ktime(meta->rx_timestamp), + }; + } + + return true; +} +EXPORT_SYMBOL(skb_metadata_import_from_xdp); diff --git a/net/core/xdp.c b/net/core/xdp.c index 22f1e44700eb..ede9b1b987d9 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -368,6 +368,22 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); +bool xdp_convert_skb_metadata(struct xdp_buff *xdp, struct sk_buff *skb) +{ + struct xdp_skb_metadata *meta; + u32 metalen; + + metalen = xdp->data - xdp->data_meta; + if (metalen) + skb_metadata_set(skb, metalen); + if (xdp_buff_has_skb_metadata(xdp)) { + meta = xdp->data_meta - sizeof(*meta); + return skb_metadata_import_from_xdp(skb, meta); + } + return false; +} +EXPORT_SYMBOL(xdp_convert_skb_metadata); + /* XDP RX runs under NAPI protection, and in different delivery error * scenarios (e.g. queue full), it is possible to return the xdp_frame * while still leveraging this protection. The @napi_direct boolean @@ -619,6 +635,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, { struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf); unsigned int headroom, frame_size; + struct xdp_skb_metadata *meta; void *hard_start; u8 nr_frags; @@ -653,11 +670,10 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, /* Essential SKB info: protocol and skb->dev */ skb->protocol = eth_type_trans(skb, dev); - /* Optional SKB info, currently missing: - * - HW checksum info (skb->ip_summed) - * - HW RX hash (skb_set_hash) - * - RX ring dev queue index (skb_record_rx_queue) - */ + if (xdpf->flags & XDP_FLAGS_HAS_SKB_METADATA) { + meta = xdpf->data - xdpf->metasize - sizeof(*meta); + skb_metadata_import_from_xdp(skb, meta); + } /* Until page_pool get SKB return path, release DMA here */ xdp_release_frame(xdpf); @@ -712,6 +728,14 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) return nxdpf; } +/* For the packets directed to the kernel, this kfunc exports XDP metadata + * into skb context. + */ +noinline int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) +{ + return 0; +} + /* Indicates whether particular device supports rx_timestamp metadata. * This is an optional helper to support marking some branches as * "dead code" in the BPF programs. @@ -736,15 +760,126 @@ BTF_SET8_START_GLOBAL(xdp_metadata_kfunc_ids) XDP_METADATA_KFUNC_xxx #undef XDP_METADATA_KFUNC BTF_SET8_END(xdp_metadata_kfunc_ids) +EXPORT_SYMBOL(xdp_metadata_kfunc_ids); static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = { .owner = THIS_MODULE, .set = &xdp_metadata_kfunc_ids, }; +/* Since we're not actually doing a call but instead rewriting + * in place, we can only afford to use R0-R5 scratch registers + * and hidden BPF_PUSH64/BPF_POP64 opcodes to spill to the stack. + */ +void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) +{ + u32 func_id; + + /* The code below generates the following: + * + * int bpf_xdp_metadata_export_to_skb(struct xdp_md *ctx) + * { + * struct xdp_skb_metadata *meta = ctx->data_meta - sizeof(*meta); + * int ret; + * + * if (ctx->flags & XDP_FLAGS_HAS_SKB_METADATA) + * return -1; + * + * if (meta < ctx->data_hard_start + sizeof(struct xdp_frame)) + * return -1; + * + * meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx); + * ctx->flags |= BPF_F_XDP_HAS_METADATA; + * + * return 0; + * } + */ + + bpf_patch_append(patch, + BPF_MOV64_IMM(BPF_REG_0, -1), + + /* r2 = ((struct xdp_buff *)r1)->flags; */ + BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags), + BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_buff, flags)), + + /* r2 &= XDP_FLAGS_HAS_SKB_METADATA; */ + BPF_ALU64_IMM(BPF_AND, BPF_REG_2, XDP_FLAGS_HAS_SKB_METADATA), + + /* if (xdp_buff->flags & XDP_FLAGS_HAS_SKB_METADATA) return -1; */ + BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, S16_MAX), + + /* r2 = ((struct xdp_buff *)r1)->data_meta; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_buff, data_meta)), + /* r2 -= sizeof(struct xdp_skb_metadata); */ + BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, + sizeof(struct xdp_skb_metadata)), + /* r3 = ((struct xdp_buff *)r1)->data_hard_start; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1, + offsetof(struct xdp_buff, data_hard_start)), + /* r3 += sizeof(struct xdp_frame) */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, + sizeof(struct xdp_frame)), + /* if (data_meta-sizeof(struct xdp_skb_metadata) < + * data_hard_start+sizeof(struct xdp_frame)) return -1; + */ + BPF_JMP_REG(BPF_JLT, BPF_REG_2, BPF_REG_3, S16_MAX), + + /* r2 = ((struct xdp_buff *)r1)->flags; */ + BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags), + BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_buff, flags)), + + /* r2 |= XDP_FLAGS_HAS_SKB_METADATA; */ + BPF_ALU64_IMM(BPF_OR, BPF_REG_2, XDP_FLAGS_HAS_SKB_METADATA), + + /* ((struct xdp_buff *)r1)->flags = r2; */ + BPF_STX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags), + BPF_REG_1, BPF_REG_2, + offsetof(struct xdp_buff, flags)), + + /* push r1 */ + BPF_PUSH64(BPF_REG_1), + ); + + /* r0 = bpf_xdp_metadata_rx_timestamp(ctx); */ + func_id = xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP); + prog->aux->xdp_kfunc_ndo->ndo_unroll_kfunc(prog, func_id, patch); + + bpf_patch_append(patch, + /* pop r1 */ + BPF_POP64(BPF_REG_1), + + /* r2 = ((struct xdp_buff *)r1)->data_meta; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_buff, data_meta)), + /* r2 -= sizeof(struct xdp_skb_metadata); */ + BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, + sizeof(struct xdp_skb_metadata)), + + /* *((struct xdp_skb_metadata *)r2)->rx_timestamp = r0; */ + BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, + offsetof(struct xdp_skb_metadata, rx_timestamp)), + + /* return 0; */ + BPF_MOV64_IMM(BPF_REG_0, 0), + ); + + bpf_patch_resolve_jmp(patch); +} +EXPORT_SYMBOL(xdp_metadata_export_to_skb); + static int __init xdp_metadata_init(void) { return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set); } late_initcall(xdp_metadata_init); +#else +struct btf_id_set8 xdp_metadata_kfunc_ids = {}; +EXPORT_SYMBOL(xdp_metadata_kfunc_ids); +void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) +{ +} +EXPORT_SYMBOL(xdp_metadata_export_to_skb); #endif diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b444b1118c4f..71e3bc7ad839 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -6116,6 +6116,12 @@ enum xdp_action { XDP_REDIRECT, }; +/* Subset of XDP metadata exported to skb context. + */ +struct xdp_skb_metadata { + __u64 rx_timestamp; +}; + /* user accessible metadata for XDP packet hook * new fields must be added to the end of this structure */ @@ -6128,6 +6134,7 @@ struct xdp_md { __u32 rx_queue_index; /* rxq->queue_index */ __u32 egress_ifindex; /* txq->dev->ifindex */ + __bpf_md_ptr(struct xdp_skb_metadata *, skb_metadata); }; /* DEVMAP map-value layout
Implement new bpf_xdp_metadata_export_to_skb kfunc which prepares compatible xdp metadata for kernel consumption. This kfunc should be called prior to bpf_redirect or when XDP_PASS'ing the frame into the kernel (note, the drivers have to be updated to enable consuming XDP_PASS'ed metadata). veth driver is amended to consume this metadata when converting to skb. Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate whether the frame has skb metadata. The metadata is currently stored prior to xdp->data_meta. bpf_xdp_adjust_meta refuses to work after a call to bpf_xdp_metadata_export_to_skb (can lift this requirement later on if needed, we'd have to memmove xdp_skb_metadata). Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> --- drivers/net/veth.c | 10 +-- include/linux/skbuff.h | 4 + include/net/xdp.h | 17 ++++ include/uapi/linux/bpf.h | 7 ++ kernel/bpf/verifier.c | 15 ++++ net/core/filter.c | 40 +++++++++ net/core/skbuff.c | 20 +++++ net/core/xdp.c | 145 +++++++++++++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 7 ++ 9 files changed, 255 insertions(+), 10 deletions(-)