Message ID | 20230119221536.3349901-18-sdf@google.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | BPF |
Headers | show |
Series | xdp: hints via kfuncs | expand |
On 1/19/23 2:15 PM, Stanislav Fomichev wrote: > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile > index e09bef2b7502..9c961d2d868e 100644 > --- a/tools/testing/selftests/bpf/Makefile > +++ b/tools/testing/selftests/bpf/Makefile > @@ -83,7 +83,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \ > TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ > flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ > test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ > - xskxceiver xdp_redirect_multi xdp_synproxy veristat > + xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata > > TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file > TEST_GEN_FILES += liburandom_read.so > @@ -383,6 +383,7 @@ test_subskeleton.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib > test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o > test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o > xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o > +xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o > > LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps))) > > @@ -580,6 +581,10 @@ $(OUTPUT)/xskxceiver: xskxceiver.c $(OUTPUT)/xsk.o $(OUTPUT)/xsk_xdp_progs.skel. > $(call msg,BINARY,,$@) > $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ > > +$(OUTPUT)/xdp_hw_metadata: xdp_hw_metadata.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xsk.o $(OUTPUT)/xdp_hw_metadata.skel.h | $(OUTPUT) > + $(call msg,BINARY,,$@) > + $(Q)$(CC) $(CFLAGS) -static $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ My dev machine fails on '-static' :(. A few machines that I got also don't have those static libraries, so likely the default environment that I got here. It seems to be the only binary using '-static' in this Makefile. Can it be removed or at least not the default?
On Fri, Jan 20, 2023 at 2:30 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 1/19/23 2:15 PM, Stanislav Fomichev wrote: > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile > > index e09bef2b7502..9c961d2d868e 100644 > > --- a/tools/testing/selftests/bpf/Makefile > > +++ b/tools/testing/selftests/bpf/Makefile > > @@ -83,7 +83,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \ > > TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ > > flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ > > test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ > > - xskxceiver xdp_redirect_multi xdp_synproxy veristat > > + xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata > > > > TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file > > TEST_GEN_FILES += liburandom_read.so > > @@ -383,6 +383,7 @@ test_subskeleton.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib > > test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o > > test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o > > xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o > > +xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o > > > > LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps))) > > > > @@ -580,6 +581,10 @@ $(OUTPUT)/xskxceiver: xskxceiver.c $(OUTPUT)/xsk.o $(OUTPUT)/xsk_xdp_progs.skel. > > $(call msg,BINARY,,$@) > > $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ > > > > +$(OUTPUT)/xdp_hw_metadata: xdp_hw_metadata.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xsk.o $(OUTPUT)/xdp_hw_metadata.skel.h | $(OUTPUT) > > + $(call msg,BINARY,,$@) > > + $(Q)$(CC) $(CFLAGS) -static $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ > > My dev machine fails on '-static' :(. A few machines that I got also don't have > those static libraries, so likely the default environment that I got here. > > It seems to be the only binary using '-static' in this Makefile. Can it be > removed or at least not the default? Sure, I can leave it out. It's mostly here due to G's environment where it is easier to work with static binaries.
Testing this on mlx5 and I'm not getting the RX-timestamp. See command details below. On 19/01/2023 23.15, Stanislav Fomichev wrote: > To be used for verification of driver implementations. Note that > the skb path is gone from the series, but I'm still keeping the > implementation for any possible future work. > > $ xdp_hw_metadata <ifname> sudo ./xdp_hw_metadata mlx5p1 Output: [...cut ...] open bpf program... load bpf program... prepare skb endpoint... XXX timestamping_enable(): setsockopt(SO_TIMESTAMPING) ret:0 prepare xsk map... map[0] = 3 map[1] = 4 map[2] = 5 map[3] = 6 map[4] = 7 map[5] = 8 attach bpf program... poll: 0 (0) poll: 0 (0) poll: 0 (0) poll: 1 (0) xsk_ring_cons__peek: 1 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_timestamp: 0 rx_hash: 2773355807 0x1821788: complete idx=8 addr=8000 poll: 0 (0) The trace_pipe: $ sudo cat /sys/kernel/debug/tracing/trace_pipe <idle>-0 [005] ..s2. 2722.884762: bpf_trace_printk: forwarding UDP:9091 to AF_XDP <idle>-0 [005] ..s2. 2722.884771: bpf_trace_printk: populated rx_hash with 2773355807 > On the other machine: > > $ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP Fixing the source-port to see if RX-hash remains the same. $ echo xdp | nc --source-port=2000 --udp 198.18.1.1 9091 > $ echo -n skb | nc -u -q1 <target> 9092 # for skb > > Sample output: > > # xdp > xsk_ring_cons__peek: 1 > 0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > rx_timestamp_supported: 1 > rx_timestamp: 1667850075063948829 > 0x19f9090: complete idx=8 addr=8000 xsk_ring_cons__peek: 1 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_timestamp: 0 rx_hash: 2773355807 0x1821788: complete idx=8 addr=8000 It doesn't look like hardware RX-timestamps are getting enabled. [... cut to relevant code ...] > diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c > new file mode 100644 > index 000000000000..0008f0f239e8 > --- /dev/null > +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c > @@ -0,0 +1,403 @@ [...] > +static void timestamping_enable(int fd, int val) > +{ > + int ret; > + > + ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); > + if (ret < 0) > + error(-1, errno, "setsockopt(SO_TIMESTAMPING)"); > +} > + > +int main(int argc, char *argv[]) > +{ [...] > + printf("prepare skb endpoint...\n"); > + server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, 1000); > + if (server_fd < 0) > + error(-1, errno, "start_server"); > + timestamping_enable(server_fd, > + SOF_TIMESTAMPING_SOFTWARE | > + SOF_TIMESTAMPING_RAW_HARDWARE); > + I don't think this timestamping_enable() with these flags are enough to enable hardware timestamping. --Jesper
On Tue, Jan 24, 2023 at 7:26 AM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > Testing this on mlx5 and I'm not getting the RX-timestamp. > See command details below. CC'ed Toke since I've never tested mlx5 myself. I was pretty close to getting the setup late last week, let me try to see whether it's ready or not. > On 19/01/2023 23.15, Stanislav Fomichev wrote: > > To be used for verification of driver implementations. Note that > > the skb path is gone from the series, but I'm still keeping the > > implementation for any possible future work. > > > > $ xdp_hw_metadata <ifname> > > sudo ./xdp_hw_metadata mlx5p1 > > Output: > [...cut ...] > open bpf program... > load bpf program... > prepare skb endpoint... > XXX timestamping_enable(): setsockopt(SO_TIMESTAMPING) ret:0 > prepare xsk map... > map[0] = 3 > map[1] = 4 > map[2] = 5 > map[3] = 6 > map[4] = 7 > map[5] = 8 > attach bpf program... > poll: 0 (0) > poll: 0 (0) > poll: 0 (0) > poll: 1 (0) > xsk_ring_cons__peek: 1 > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > rx_timestamp: 0 > rx_hash: 2773355807 > 0x1821788: complete idx=8 addr=8000 > poll: 0 (0) > > The trace_pipe: > > $ sudo cat /sys/kernel/debug/tracing/trace_pipe > <idle>-0 [005] ..s2. 2722.884762: bpf_trace_printk: > forwarding UDP:9091 to AF_XDP > <idle>-0 [005] ..s2. 2722.884771: bpf_trace_printk: > populated rx_hash with 2773355807 > > > > On the other machine: > > > > $ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP > > Fixing the source-port to see if RX-hash remains the same. > > $ echo xdp | nc --source-port=2000 --udp 198.18.1.1 9091 > > > $ echo -n skb | nc -u -q1 <target> 9092 # for skb > > > > Sample output: > > > > # xdp > > xsk_ring_cons__peek: 1 > > 0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > > rx_timestamp_supported: 1 > > rx_timestamp: 1667850075063948829 > > 0x19f9090: complete idx=8 addr=8000 > > xsk_ring_cons__peek: 1 > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > rx_timestamp: 0 > rx_hash: 2773355807 > 0x1821788: complete idx=8 addr=8000 > > It doesn't look like hardware RX-timestamps are getting enabled. > > [... cut to relevant code ...] > > > diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c > > new file mode 100644 > > index 000000000000..0008f0f239e8 > > --- /dev/null > > +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c > > @@ -0,0 +1,403 @@ > [...] > > > +static void timestamping_enable(int fd, int val) > > +{ > > + int ret; > > + > > + ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); > > + if (ret < 0) > > + error(-1, errno, "setsockopt(SO_TIMESTAMPING)"); > > +} > > + > > +int main(int argc, char *argv[]) > > +{ > [...] > > > + printf("prepare skb endpoint...\n"); > > + server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, 1000); > > + if (server_fd < 0) > > + error(-1, errno, "start_server"); > > + timestamping_enable(server_fd, > > + SOF_TIMESTAMPING_SOFTWARE | > > + SOF_TIMESTAMPING_RAW_HARDWARE); > > + > > I don't think this timestamping_enable() with these flags are enough to > enable hardware timestamping. > > --Jesper >
On 01/24, Stanislav Fomichev wrote: > On Tue, Jan 24, 2023 at 7:26 AM Jesper Dangaard Brouer > <jbrouer@redhat.com> wrote: > > > > > > Testing this on mlx5 and I'm not getting the RX-timestamp. > > See command details below. > CC'ed Toke since I've never tested mlx5 myself. > I was pretty close to getting the setup late last week, let me try to > see whether it's ready or not. > > On 19/01/2023 23.15, Stanislav Fomichev wrote: > > > To be used for verification of driver implementations. Note that > > > the skb path is gone from the series, but I'm still keeping the > > > implementation for any possible future work. > > > > > > $ xdp_hw_metadata <ifname> > > > > sudo ./xdp_hw_metadata mlx5p1 > > > > Output: > > [...cut ...] > > open bpf program... > > load bpf program... > > prepare skb endpoint... > > XXX timestamping_enable(): setsockopt(SO_TIMESTAMPING) ret:0 > > prepare xsk map... > > map[0] = 3 > > map[1] = 4 > > map[2] = 5 > > map[3] = 6 > > map[4] = 7 > > map[5] = 8 > > attach bpf program... > > poll: 0 (0) > > poll: 0 (0) > > poll: 0 (0) > > poll: 1 (0) > > xsk_ring_cons__peek: 1 > > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > > rx_timestamp: 0 > > rx_hash: 2773355807 > > 0x1821788: complete idx=8 addr=8000 > > poll: 0 (0) > > > > The trace_pipe: > > > > $ sudo cat /sys/kernel/debug/tracing/trace_pipe > > <idle>-0 [005] ..s2. 2722.884762: bpf_trace_printk: > > forwarding UDP:9091 to AF_XDP > > <idle>-0 [005] ..s2. 2722.884771: bpf_trace_printk: > > populated rx_hash with 2773355807 > > > > > > > On the other machine: > > > > > > $ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP > > > > Fixing the source-port to see if RX-hash remains the same. > > > > $ echo xdp | nc --source-port=2000 --udp 198.18.1.1 9091 > > > > > $ echo -n skb | nc -u -q1 <target> 9092 # for skb > > > > > > Sample output: > > > > > > # xdp > > > xsk_ring_cons__peek: 1 > > > 0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 > comp_addr=8000 > > > rx_timestamp_supported: 1 > > > rx_timestamp: 1667850075063948829 > > > 0x19f9090: complete idx=8 addr=8000 > > > > xsk_ring_cons__peek: 1 > > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > > rx_timestamp: 0 > > rx_hash: 2773355807 > > 0x1821788: complete idx=8 addr=8000 > > > > It doesn't look like hardware RX-timestamps are getting enabled. > > > > [... cut to relevant code ...] > > > > > diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c > b/tools/testing/selftests/bpf/xdp_hw_metadata.c > > > new file mode 100644 > > > index 000000000000..0008f0f239e8 > > > --- /dev/null > > > +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c > > > @@ -0,0 +1,403 @@ > > [...] > > > > > +static void timestamping_enable(int fd, int val) > > > +{ > > > + int ret; > > > + > > > + ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, > sizeof(val)); > > > + if (ret < 0) > > > + error(-1, errno, "setsockopt(SO_TIMESTAMPING)"); > > > +} > > > + > > > +int main(int argc, char *argv[]) > > > +{ > > [...] > > > > > + printf("prepare skb endpoint...\n"); > > > + server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, > 1000); > > > + if (server_fd < 0) > > > + error(-1, errno, "start_server"); > > > + timestamping_enable(server_fd, > > > + SOF_TIMESTAMPING_SOFTWARE | > > > + SOF_TIMESTAMPING_RAW_HARDWARE); > > > + > > > > I don't think this timestamping_enable() with these flags are enough to > > enable hardware timestamping. Yeah, agreed, looks like that's the issue. timestamping_enable() has been used for the xdp->skb path that I've eventually removed from the series, so it's mostly a noop here.. Maybe you can try the following before I send a proper patch? diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c index 0008f0f239e8..dceddb17fbc9 100644 --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -24,6 +24,7 @@ #include <linux/net_tstamp.h> #include <linux/udp.h> #include <linux/sockios.h> +#include <linux/net_tstamp.h> #include <sys/mman.h> #include <net/if.h> #include <poll.h> @@ -278,13 +279,37 @@ static int rxq_num(const char *ifname) ret = ioctl(fd, SIOCETHTOOL, &ifr); if (ret < 0) - error(-1, errno, "socket"); + error(-1, errno, "ioctl(SIOCETHTOOL)"); close(fd); return ch.rx_count + ch.combined_count; } +static void hwtstamp_enable(const char *ifname) +{ + struct hwtstamp_config cfg = { + .rx_filter = HWTSTAMP_FILTER_ALL, + + }; + + struct ifreq ifr = { + .ifr_data = (void *)&cfg, + }; + strcpy(ifr.ifr_name, ifname); + int fd, ret; + + fd = socket(AF_UNIX, SOCK_DGRAM, 0); + if (fd < 0) + error(-1, errno, "socket"); + + ret = ioctl(fd, SIOCSHWTSTAMP, &ifr); + if (ret < 0) + error(-1, errno, "ioctl(SIOCSHWTSTAMP)"); + + close(fd); +} + static void cleanup(void) { LIBBPF_OPTS(bpf_xdp_attach_opts, opts); @@ -341,6 +366,8 @@ int main(int argc, char *argv[]) printf("rxq: %d\n", rxq); + hwtstamp_enable(ifname); + rx_xsk = malloc(sizeof(struct xsk) * rxq); if (!rx_xsk) error(-1, ENOMEM, "malloc"); > > --Jesper > >
On 24/01/2023 19.48, sdf@google.com wrote: > On 01/24, Stanislav Fomichev wrote: >> On Tue, Jan 24, 2023 at 7:26 AM Jesper Dangaard Brouer >> <jbrouer@redhat.com> wrote: >> > >> > >> > Testing this on mlx5 and I'm not getting the RX-timestamp. >> > See command details below. > >> CC'ed Toke since I've never tested mlx5 myself. >> I was pretty close to getting the setup late last week, let me try to >> see whether it's ready or not. > >> > On 19/01/2023 23.15, Stanislav Fomichev wrote: >> > > To be used for verification of driver implementations. Note that >> > > the skb path is gone from the series, but I'm still keeping the >> > > implementation for any possible future work. >> > > >> > > $ xdp_hw_metadata <ifname> >> > >> > sudo ./xdp_hw_metadata mlx5p1 >> > >> > Output: >> > [...cut ...] >> > open bpf program... >> > load bpf program... >> > prepare skb endpoint... >> > XXX timestamping_enable(): setsockopt(SO_TIMESTAMPING) ret:0 >> > prepare xsk map... >> > map[0] = 3 >> > map[1] = 4 >> > map[2] = 5 >> > map[3] = 6 >> > map[4] = 7 >> > map[5] = 8 >> > attach bpf program... >> > poll: 0 (0) >> > poll: 0 (0) >> > poll: 0 (0) >> > poll: 1 (0) >> > xsk_ring_cons__peek: 1 >> > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 >> > rx_timestamp: 0 >> > rx_hash: 2773355807 >> > 0x1821788: complete idx=8 addr=8000 >> > poll: 0 (0) >> > >> > The trace_pipe: >> > >> > $ sudo cat /sys/kernel/debug/tracing/trace_pipe >> > <idle>-0 [005] ..s2. 2722.884762: bpf_trace_printk: >> > forwarding UDP:9091 to AF_XDP >> > <idle>-0 [005] ..s2. 2722.884771: bpf_trace_printk: >> > populated rx_hash with 2773355807 >> > >> > >> > > On the other machine: >> > > >> > > $ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP >> > >> > Fixing the source-port to see if RX-hash remains the same. >> > >> > $ echo xdp | nc --source-port=2000 --udp 198.18.1.1 9091 >> > >> > > $ echo -n skb | nc -u -q1 <target> 9092 # for skb >> > > >> > > Sample output: >> > > >> > > # xdp >> > > xsk_ring_cons__peek: 1 >> > > 0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 >> comp_addr=8000 >> > > rx_timestamp_supported: 1 >> > > rx_timestamp: 1667850075063948829 >> > > 0x19f9090: complete idx=8 addr=8000 >> > >> > xsk_ring_cons__peek: 1 >> > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 >> > rx_timestamp: 0 >> > rx_hash: 2773355807 >> > 0x1821788: complete idx=8 addr=8000 >> > >> > It doesn't look like hardware RX-timestamps are getting enabled. >> > >> > [... cut to relevant code ...] >> > >> > > diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c >> b/tools/testing/selftests/bpf/xdp_hw_metadata.c >> > > new file mode 100644 >> > > index 000000000000..0008f0f239e8 >> > > --- /dev/null >> > > +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c >> > > @@ -0,0 +1,403 @@ >> > [...] >> > >> > > +static void timestamping_enable(int fd, int val) >> > > +{ >> > > + int ret; >> > > + >> > > + ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, >> sizeof(val)); >> > > + if (ret < 0) >> > > + error(-1, errno, "setsockopt(SO_TIMESTAMPING)"); >> > > +} >> > > + >> > > +int main(int argc, char *argv[]) >> > > +{ >> > [...] >> > >> > > + printf("prepare skb endpoint...\n"); >> > > + server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, >> 1000); >> > > + if (server_fd < 0) >> > > + error(-1, errno, "start_server"); >> > > + timestamping_enable(server_fd, >> > > + SOF_TIMESTAMPING_SOFTWARE | >> > > + SOF_TIMESTAMPING_RAW_HARDWARE); >> > > + >> > >> > I don't think this timestamping_enable() with these flags are enough to >> > enable hardware timestamping. > > Yeah, agreed, looks like that's the issue. timestamping_enable() has > been used for the xdp->skb path that I've eventually removed from the > series, so it's mostly a noop here.. > > Maybe you can try the following before I send a proper patch? Yes, below patch fixed the issue, thx. Now I get HW timestamps, plus I added some software CLOCK_TAI timestamps to compare against. Output is now: poll: 1 (0) xsk_ring_cons__peek: 1 0xf64788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_hash: 3697961069 rx_timestamp: 1674657672142214773 (sec:1674657672.1422) XDP RX-time: 1674657709561774876 (sec:1674657709.5618) delta sec:37.4196 AF_XDP time: 1674657709561871034 (sec:1674657709.5619) delta sec:0.0001 (96.158 usec) 0xf64788: complete idx=8 addr=8000 My NIC hardware clock is clearly not synced with system time, as above delta say 37.4 seconds between HW and XDP timestamps (using bpf_ktime_get_tai_ns()). Time between XDP and AF_XDP wakeup is reported to be 96 usec, which is also higher than I expected. As explained in [1] this is caused by CPU sleep states. My /dev/cpu_dma_latency was set to 2000000000. Applying tuned-adm profile latency-performance this value change to 2. $ sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency 2000000000 $ sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency 2 Now the time between XDP and AF_XDP wakeup is reduced to approx 12 usec. rx_timestamp: 1674659206344977544 (sec:1674659206.3450) XDP RX-time: 1674659243776087765 (sec:1674659243.7761) delta sec:37.4311 AF_XDP time: 1674659243776099841 (sec:1674659243.7761) delta sec:0.0000 (12.076 usec) [1] https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-interaction > diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c > b/tools/testing/selftests/bpf/xdp_hw_metadata.c > index 0008f0f239e8..dceddb17fbc9 100644 > --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c > +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c > @@ -24,6 +24,7 @@ > #include <linux/net_tstamp.h> > #include <linux/udp.h> > #include <linux/sockios.h> > +#include <linux/net_tstamp.h> > #include <sys/mman.h> > #include <net/if.h> > #include <poll.h> > @@ -278,13 +279,37 @@ static int rxq_num(const char *ifname) > > ret = ioctl(fd, SIOCETHTOOL, &ifr); > if (ret < 0) > - error(-1, errno, "socket"); > + error(-1, errno, "ioctl(SIOCETHTOOL)"); > > close(fd); > > return ch.rx_count + ch.combined_count; > } > > +static void hwtstamp_enable(const char *ifname) > +{ > + struct hwtstamp_config cfg = { > + .rx_filter = HWTSTAMP_FILTER_ALL, > + > + }; > + > + struct ifreq ifr = { > + .ifr_data = (void *)&cfg, > + }; > + strcpy(ifr.ifr_name, ifname); > + int fd, ret; > + > + fd = socket(AF_UNIX, SOCK_DGRAM, 0); > + if (fd < 0) > + error(-1, errno, "socket"); > + > + ret = ioctl(fd, SIOCSHWTSTAMP, &ifr); > + if (ret < 0) > + error(-1, errno, "ioctl(SIOCSHWTSTAMP)"); > + > + close(fd); > +} > + > static void cleanup(void) > { > LIBBPF_OPTS(bpf_xdp_attach_opts, opts); > @@ -341,6 +366,8 @@ int main(int argc, char *argv[]) > > printf("rxq: %d\n", rxq); > > + hwtstamp_enable(ifname); > + > rx_xsk = malloc(sizeof(struct xsk) * rxq); > if (!rx_xsk) > error(-1, ENOMEM, "malloc"); > >
On Wed, Jan 25, 2023 at 7:10 AM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > On 24/01/2023 19.48, sdf@google.com wrote: > > On 01/24, Stanislav Fomichev wrote: > >> On Tue, Jan 24, 2023 at 7:26 AM Jesper Dangaard Brouer > >> <jbrouer@redhat.com> wrote: > >> > > >> > > >> > Testing this on mlx5 and I'm not getting the RX-timestamp. > >> > See command details below. > > > >> CC'ed Toke since I've never tested mlx5 myself. > >> I was pretty close to getting the setup late last week, let me try to > >> see whether it's ready or not. > > > >> > On 19/01/2023 23.15, Stanislav Fomichev wrote: > >> > > To be used for verification of driver implementations. Note that > >> > > the skb path is gone from the series, but I'm still keeping the > >> > > implementation for any possible future work. > >> > > > >> > > $ xdp_hw_metadata <ifname> > >> > > >> > sudo ./xdp_hw_metadata mlx5p1 > >> > > >> > Output: > >> > [...cut ...] > >> > open bpf program... > >> > load bpf program... > >> > prepare skb endpoint... > >> > XXX timestamping_enable(): setsockopt(SO_TIMESTAMPING) ret:0 > >> > prepare xsk map... > >> > map[0] = 3 > >> > map[1] = 4 > >> > map[2] = 5 > >> > map[3] = 6 > >> > map[4] = 7 > >> > map[5] = 8 > >> > attach bpf program... > >> > poll: 0 (0) > >> > poll: 0 (0) > >> > poll: 0 (0) > >> > poll: 1 (0) > >> > xsk_ring_cons__peek: 1 > >> > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > >> > rx_timestamp: 0 > >> > rx_hash: 2773355807 > >> > 0x1821788: complete idx=8 addr=8000 > >> > poll: 0 (0) > >> > > >> > The trace_pipe: > >> > > >> > $ sudo cat /sys/kernel/debug/tracing/trace_pipe > >> > <idle>-0 [005] ..s2. 2722.884762: bpf_trace_printk: > >> > forwarding UDP:9091 to AF_XDP > >> > <idle>-0 [005] ..s2. 2722.884771: bpf_trace_printk: > >> > populated rx_hash with 2773355807 > >> > > >> > > >> > > On the other machine: > >> > > > >> > > $ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP > >> > > >> > Fixing the source-port to see if RX-hash remains the same. > >> > > >> > $ echo xdp | nc --source-port=2000 --udp 198.18.1.1 9091 > >> > > >> > > $ echo -n skb | nc -u -q1 <target> 9092 # for skb > >> > > > >> > > Sample output: > >> > > > >> > > # xdp > >> > > xsk_ring_cons__peek: 1 > >> > > 0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 > >> comp_addr=8000 > >> > > rx_timestamp_supported: 1 > >> > > rx_timestamp: 1667850075063948829 > >> > > 0x19f9090: complete idx=8 addr=8000 > >> > > >> > xsk_ring_cons__peek: 1 > >> > 0x1821788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > >> > rx_timestamp: 0 > >> > rx_hash: 2773355807 > >> > 0x1821788: complete idx=8 addr=8000 > >> > > >> > It doesn't look like hardware RX-timestamps are getting enabled. > >> > > >> > [... cut to relevant code ...] > >> > > >> > > diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c > >> b/tools/testing/selftests/bpf/xdp_hw_metadata.c > >> > > new file mode 100644 > >> > > index 000000000000..0008f0f239e8 > >> > > --- /dev/null > >> > > +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c > >> > > @@ -0,0 +1,403 @@ > >> > [...] > >> > > >> > > +static void timestamping_enable(int fd, int val) > >> > > +{ > >> > > + int ret; > >> > > + > >> > > + ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, > >> sizeof(val)); > >> > > + if (ret < 0) > >> > > + error(-1, errno, "setsockopt(SO_TIMESTAMPING)"); > >> > > +} > >> > > + > >> > > +int main(int argc, char *argv[]) > >> > > +{ > >> > [...] > >> > > >> > > + printf("prepare skb endpoint...\n"); > >> > > + server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, > >> 1000); > >> > > + if (server_fd < 0) > >> > > + error(-1, errno, "start_server"); > >> > > + timestamping_enable(server_fd, > >> > > + SOF_TIMESTAMPING_SOFTWARE | > >> > > + SOF_TIMESTAMPING_RAW_HARDWARE); > >> > > + > >> > > >> > I don't think this timestamping_enable() with these flags are enough to > >> > enable hardware timestamping. > > > > Yeah, agreed, looks like that's the issue. timestamping_enable() has > > been used for the xdp->skb path that I've eventually removed from the > > series, so it's mostly a noop here.. > > > > Maybe you can try the following before I send a proper patch? > > Yes, below patch fixed the issue, thx. > > Now I get HW timestamps, plus I added some software CLOCK_TAI timestamps > to compare against. > > Output is now: > > poll: 1 (0) > xsk_ring_cons__peek: 1 > 0xf64788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 > rx_hash: 3697961069 > rx_timestamp: 1674657672142214773 (sec:1674657672.1422) > XDP RX-time: 1674657709561774876 (sec:1674657709.5618) delta sec:37.4196 > AF_XDP time: 1674657709561871034 (sec:1674657709.5619) delta > sec:0.0001 (96.158 usec) > 0xf64788: complete idx=8 addr=8000 > > My NIC hardware clock is clearly not synced with system time, as above > delta say 37.4 seconds between HW and XDP timestamps (using > bpf_ktime_get_tai_ns()). > > Time between XDP and AF_XDP wakeup is reported to be 96 usec, which is > also higher than I expected. As explained in [1] this is caused by CPU > sleep states. > > My /dev/cpu_dma_latency was set to 2000000000. Applying tuned-adm > profile latency-performance this value change to 2. > > $ sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency > 2000000000 > $ sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency > 2 > > Now the time between XDP and AF_XDP wakeup is reduced to approx 12 usec. > > rx_timestamp: 1674659206344977544 (sec:1674659206.3450) > XDP RX-time: 1674659243776087765 (sec:1674659243.7761) delta sec:37.4311 > AF_XDP time: 1674659243776099841 (sec:1674659243.7761) delta > sec:0.0000 (12.076 usec) > > > [1] > https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-interaction Great, thank you for testing and investigating the clock discrepancy! Will send it as a patch later today, will add your Tested-by (if you don't mind). > > diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c > > b/tools/testing/selftests/bpf/xdp_hw_metadata.c > > index 0008f0f239e8..dceddb17fbc9 100644 > > --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c > > +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c > > @@ -24,6 +24,7 @@ > > #include <linux/net_tstamp.h> > > #include <linux/udp.h> > > #include <linux/sockios.h> > > +#include <linux/net_tstamp.h> > > #include <sys/mman.h> > > #include <net/if.h> > > #include <poll.h> > > @@ -278,13 +279,37 @@ static int rxq_num(const char *ifname) > > > > ret = ioctl(fd, SIOCETHTOOL, &ifr); > > if (ret < 0) > > - error(-1, errno, "socket"); > > + error(-1, errno, "ioctl(SIOCETHTOOL)"); > > > > close(fd); > > > > return ch.rx_count + ch.combined_count; > > } > > > > +static void hwtstamp_enable(const char *ifname) > > +{ > > + struct hwtstamp_config cfg = { > > + .rx_filter = HWTSTAMP_FILTER_ALL, > > + > > + }; > > + > > + struct ifreq ifr = { > > + .ifr_data = (void *)&cfg, > > + }; > > + strcpy(ifr.ifr_name, ifname); > > + int fd, ret; > > + > > + fd = socket(AF_UNIX, SOCK_DGRAM, 0); > > + if (fd < 0) > > + error(-1, errno, "socket"); > > + > > + ret = ioctl(fd, SIOCSHWTSTAMP, &ifr); > > + if (ret < 0) > > + error(-1, errno, "ioctl(SIOCSHWTSTAMP)"); > > + > > + close(fd); > > +} > > + > > static void cleanup(void) > > { > > LIBBPF_OPTS(bpf_xdp_attach_opts, opts); > > @@ -341,6 +366,8 @@ int main(int argc, char *argv[]) > > > > printf("rxq: %d\n", rxq); > > > > + hwtstamp_enable(ifname); > > + > > rx_xsk = malloc(sizeof(struct xsk) * rxq); > > if (!rx_xsk) > > error(-1, ENOMEM, "malloc"); > > > > >
diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index 401a75844cc0..4aa5bba956ff 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -47,3 +47,4 @@ test_cpp xskxceiver xdp_redirect_multi xdp_synproxy +xdp_hw_metadata diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index e09bef2b7502..9c961d2d868e 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -83,7 +83,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ - xskxceiver xdp_redirect_multi xdp_synproxy veristat + xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file TEST_GEN_FILES += liburandom_read.so @@ -383,6 +383,7 @@ test_subskeleton.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o +xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps))) @@ -580,6 +581,10 @@ $(OUTPUT)/xskxceiver: xskxceiver.c $(OUTPUT)/xsk.o $(OUTPUT)/xsk_xdp_progs.skel. $(call msg,BINARY,,$@) $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ +$(OUTPUT)/xdp_hw_metadata: xdp_hw_metadata.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xsk.o $(OUTPUT)/xdp_hw_metadata.skel.h | $(OUTPUT) + $(call msg,BINARY,,$@) + $(Q)$(CC) $(CFLAGS) -static $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ + # Make sure we are able to include and link libbpf against c++. $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ) $(call msg,CXX,,$@) diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c new file mode 100644 index 000000000000..25b8178735ee --- /dev/null +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c @@ -0,0 +1,81 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <vmlinux.h> +#include "xdp_metadata.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_endian.h> + +struct { + __uint(type, BPF_MAP_TYPE_XSKMAP); + __uint(max_entries, 256); + __type(key, __u32); + __type(value, __u32); +} xsk SEC(".maps"); + +extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, + __u64 *timestamp) __ksym; +extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, + __u32 *hash) __ksym; + +SEC("xdp") +int rx(struct xdp_md *ctx) +{ + void *data, *data_meta, *data_end; + struct ipv6hdr *ip6h = NULL; + struct ethhdr *eth = NULL; + struct udphdr *udp = NULL; + struct iphdr *iph = NULL; + struct xdp_meta *meta; + int ret; + + data = (void *)(long)ctx->data; + data_end = (void *)(long)ctx->data_end; + eth = data; + if (eth + 1 < data_end) { + if (eth->h_proto == bpf_htons(ETH_P_IP)) { + iph = (void *)(eth + 1); + if (iph + 1 < data_end && iph->protocol == IPPROTO_UDP) + udp = (void *)(iph + 1); + } + if (eth->h_proto == bpf_htons(ETH_P_IPV6)) { + ip6h = (void *)(eth + 1); + if (ip6h + 1 < data_end && ip6h->nexthdr == IPPROTO_UDP) + udp = (void *)(ip6h + 1); + } + if (udp && udp + 1 > data_end) + udp = NULL; + } + + if (!udp) + return XDP_PASS; + + if (udp->dest != bpf_htons(9091)) + return XDP_PASS; + + bpf_printk("forwarding UDP:9091 to AF_XDP"); + + ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta)); + if (ret != 0) { + bpf_printk("bpf_xdp_adjust_meta returned %d", ret); + return XDP_PASS; + } + + data = (void *)(long)ctx->data; + data_meta = (void *)(long)ctx->data_meta; + meta = data_meta; + + if (meta + 1 > data) { + bpf_printk("bpf_xdp_adjust_meta doesn't appear to work"); + return XDP_PASS; + } + + if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp)) + bpf_printk("populated rx_timestamp with %u", meta->rx_timestamp); + + if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash)) + bpf_printk("populated rx_hash with %u", meta->rx_hash); + + return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c new file mode 100644 index 000000000000..0008f0f239e8 --- /dev/null +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -0,0 +1,403 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* Reference program for verifying XDP metadata on real HW. Functional test + * only, doesn't test the performance. + * + * RX: + * - UDP 9091 packets are diverted into AF_XDP + * - Metadata verified: + * - rx_timestamp + * - rx_hash + * + * TX: + * - TBD + */ + +#include <test_progs.h> +#include <network_helpers.h> +#include "xdp_hw_metadata.skel.h" +#include "xsk.h" + +#include <error.h> +#include <linux/errqueue.h> +#include <linux/if_link.h> +#include <linux/net_tstamp.h> +#include <linux/udp.h> +#include <linux/sockios.h> +#include <sys/mman.h> +#include <net/if.h> +#include <poll.h> + +#include "xdp_metadata.h" + +#define UMEM_NUM 16 +#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE +#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM) +#define XDP_FLAGS (XDP_FLAGS_DRV_MODE | XDP_FLAGS_REPLACE) + +struct xsk { + void *umem_area; + struct xsk_umem *umem; + struct xsk_ring_prod fill; + struct xsk_ring_cons comp; + struct xsk_ring_prod tx; + struct xsk_ring_cons rx; + struct xsk_socket *socket; +}; + +struct xdp_hw_metadata *bpf_obj; +struct xsk *rx_xsk; +const char *ifname; +int ifindex; +int rxq; + +void test__fail(void) { /* for network_helpers.c */ } + +static int open_xsk(int ifindex, struct xsk *xsk, __u32 queue_id) +{ + int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE; + const struct xsk_socket_config socket_config = { + .rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .bind_flags = XDP_COPY, + }; + const struct xsk_umem_config umem_config = { + .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, + .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE, + .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG, + }; + __u32 idx; + u64 addr; + int ret; + int i; + + xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0); + if (xsk->umem_area == MAP_FAILED) + return -ENOMEM; + + ret = xsk_umem__create(&xsk->umem, + xsk->umem_area, UMEM_SIZE, + &xsk->fill, + &xsk->comp, + &umem_config); + if (ret) + return ret; + + ret = xsk_socket__create(&xsk->socket, ifindex, queue_id, + xsk->umem, + &xsk->rx, + &xsk->tx, + &socket_config); + if (ret) + return ret; + + /* First half of umem is for TX. This way address matches 1-to-1 + * to the completion queue index. + */ + + for (i = 0; i < UMEM_NUM / 2; i++) { + addr = i * UMEM_FRAME_SIZE; + printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr); + } + + /* Second half of umem is for RX. */ + + ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx); + for (i = 0; i < UMEM_NUM / 2; i++) { + addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE; + printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr); + *xsk_ring_prod__fill_addr(&xsk->fill, i) = addr; + } + xsk_ring_prod__submit(&xsk->fill, ret); + + return 0; +} + +static void close_xsk(struct xsk *xsk) +{ + if (xsk->umem) + xsk_umem__delete(xsk->umem); + if (xsk->socket) + xsk_socket__delete(xsk->socket); + munmap(xsk->umem, UMEM_SIZE); +} + +static void refill_rx(struct xsk *xsk, __u64 addr) +{ + __u32 idx; + + if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) == 1) { + printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr); + *xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr; + xsk_ring_prod__submit(&xsk->fill, 1); + } +} + +static void verify_xdp_metadata(void *data) +{ + struct xdp_meta *meta; + + meta = data - sizeof(*meta); + + printf("rx_timestamp: %llu\n", meta->rx_timestamp); + printf("rx_hash: %u\n", meta->rx_hash); +} + +static void verify_skb_metadata(int fd) +{ + char cmsg_buf[1024]; + char packet_buf[128]; + + struct scm_timestamping *ts; + struct iovec packet_iov; + struct cmsghdr *cmsg; + struct msghdr hdr; + + memset(&hdr, 0, sizeof(hdr)); + hdr.msg_iov = &packet_iov; + hdr.msg_iovlen = 1; + packet_iov.iov_base = packet_buf; + packet_iov.iov_len = sizeof(packet_buf); + + hdr.msg_control = cmsg_buf; + hdr.msg_controllen = sizeof(cmsg_buf); + + if (recvmsg(fd, &hdr, 0) < 0) + error(-1, errno, "recvmsg"); + + for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL; + cmsg = CMSG_NXTHDR(&hdr, cmsg)) { + + if (cmsg->cmsg_level != SOL_SOCKET) + continue; + + switch (cmsg->cmsg_type) { + case SCM_TIMESTAMPING: + ts = (struct scm_timestamping *)CMSG_DATA(cmsg); + if (ts->ts[2].tv_sec || ts->ts[2].tv_nsec) { + printf("found skb hwtstamp = %lu.%lu\n", + ts->ts[2].tv_sec, ts->ts[2].tv_nsec); + return; + } + break; + default: + break; + } + } + + printf("skb hwtstamp is not found!\n"); +} + +static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd) +{ + const struct xdp_desc *rx_desc; + struct pollfd fds[rxq + 1]; + __u64 comp_addr; + __u64 addr; + __u32 idx; + int ret; + int i; + + for (i = 0; i < rxq; i++) { + fds[i].fd = xsk_socket__fd(rx_xsk[i].socket); + fds[i].events = POLLIN; + fds[i].revents = 0; + } + + fds[rxq].fd = server_fd; + fds[rxq].events = POLLIN; + fds[rxq].revents = 0; + + while (true) { + errno = 0; + ret = poll(fds, rxq + 1, 1000); + printf("poll: %d (%d)\n", ret, errno); + if (ret < 0) + break; + if (ret == 0) + continue; + + if (fds[rxq].revents) + verify_skb_metadata(server_fd); + + for (i = 0; i < rxq; i++) { + if (fds[i].revents == 0) + continue; + + struct xsk *xsk = &rx_xsk[i]; + + ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx); + printf("xsk_ring_cons__peek: %d\n", ret); + if (ret != 1) + continue; + + rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx); + comp_addr = xsk_umem__extract_addr(rx_desc->addr); + addr = xsk_umem__add_offset_to_addr(rx_desc->addr); + printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n", + xsk, idx, rx_desc->addr, addr, comp_addr); + verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr)); + xsk_ring_cons__release(&xsk->rx, 1); + refill_rx(xsk, comp_addr); + } + } + + return 0; +} + +struct ethtool_channels { + __u32 cmd; + __u32 max_rx; + __u32 max_tx; + __u32 max_other; + __u32 max_combined; + __u32 rx_count; + __u32 tx_count; + __u32 other_count; + __u32 combined_count; +}; + +#define ETHTOOL_GCHANNELS 0x0000003c /* Get no of channels */ + +static int rxq_num(const char *ifname) +{ + struct ethtool_channels ch = { + .cmd = ETHTOOL_GCHANNELS, + }; + + struct ifreq ifr = { + .ifr_data = (void *)&ch, + }; + strcpy(ifr.ifr_name, ifname); + int fd, ret; + + fd = socket(AF_UNIX, SOCK_DGRAM, 0); + if (fd < 0) + error(-1, errno, "socket"); + + ret = ioctl(fd, SIOCETHTOOL, &ifr); + if (ret < 0) + error(-1, errno, "socket"); + + close(fd); + + return ch.rx_count + ch.combined_count; +} + +static void cleanup(void) +{ + LIBBPF_OPTS(bpf_xdp_attach_opts, opts); + int ret; + int i; + + if (bpf_obj) { + opts.old_prog_fd = bpf_program__fd(bpf_obj->progs.rx); + if (opts.old_prog_fd >= 0) { + printf("detaching bpf program....\n"); + ret = bpf_xdp_detach(ifindex, XDP_FLAGS, &opts); + if (ret) + printf("failed to detach XDP program: %d\n", ret); + } + } + + for (i = 0; i < rxq; i++) + close_xsk(&rx_xsk[i]); + + if (bpf_obj) + xdp_hw_metadata__destroy(bpf_obj); +} + +static void handle_signal(int sig) +{ + /* interrupting poll() is all we need */ +} + +static void timestamping_enable(int fd, int val) +{ + int ret; + + ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); + if (ret < 0) + error(-1, errno, "setsockopt(SO_TIMESTAMPING)"); +} + +int main(int argc, char *argv[]) +{ + int server_fd = -1; + int ret; + int i; + + struct bpf_program *prog; + + if (argc != 2) { + fprintf(stderr, "pass device name\n"); + return -1; + } + + ifname = argv[1]; + ifindex = if_nametoindex(ifname); + rxq = rxq_num(ifname); + + printf("rxq: %d\n", rxq); + + rx_xsk = malloc(sizeof(struct xsk) * rxq); + if (!rx_xsk) + error(-1, ENOMEM, "malloc"); + + for (i = 0; i < rxq; i++) { + printf("open_xsk(%s, %p, %d)\n", ifname, &rx_xsk[i], i); + ret = open_xsk(ifindex, &rx_xsk[i], i); + if (ret) + error(-1, -ret, "open_xsk"); + + printf("xsk_socket__fd() -> %d\n", xsk_socket__fd(rx_xsk[i].socket)); + } + + printf("open bpf program...\n"); + bpf_obj = xdp_hw_metadata__open(); + if (libbpf_get_error(bpf_obj)) + error(-1, libbpf_get_error(bpf_obj), "xdp_hw_metadata__open"); + + prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx"); + bpf_program__set_ifindex(prog, ifindex); + bpf_program__set_flags(prog, BPF_F_XDP_DEV_BOUND_ONLY); + + printf("load bpf program...\n"); + ret = xdp_hw_metadata__load(bpf_obj); + if (ret) + error(-1, -ret, "xdp_hw_metadata__load"); + + printf("prepare skb endpoint...\n"); + server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, 1000); + if (server_fd < 0) + error(-1, errno, "start_server"); + timestamping_enable(server_fd, + SOF_TIMESTAMPING_SOFTWARE | + SOF_TIMESTAMPING_RAW_HARDWARE); + + printf("prepare xsk map...\n"); + for (i = 0; i < rxq; i++) { + int sock_fd = xsk_socket__fd(rx_xsk[i].socket); + __u32 queue_id = i; + + printf("map[%d] = %d\n", queue_id, sock_fd); + ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0); + if (ret) + error(-1, -ret, "bpf_map_update_elem"); + } + + printf("attach bpf program...\n"); + ret = bpf_xdp_attach(ifindex, + bpf_program__fd(bpf_obj->progs.rx), + XDP_FLAGS, NULL); + if (ret) + error(-1, -ret, "bpf_xdp_attach"); + + signal(SIGINT, handle_signal); + ret = verify_metadata(rx_xsk, rxq, server_fd); + close(server_fd); + cleanup(); + if (ret) + error(-1, -ret, "verify_metadata"); +}
To be used for verification of driver implementations. Note that the skb path is gone from the series, but I'm still keeping the implementation for any possible future work. $ xdp_hw_metadata <ifname> On the other machine: $ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP $ echo -n skb | nc -u -q1 <target> 9092 # for skb Sample output: # xdp xsk_ring_cons__peek: 1 0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_timestamp_supported: 1 rx_timestamp: 1667850075063948829 0x19f9090: complete idx=8 addr=8000 # skb found skb hwtstamp = 1668314052.854274681 Decoding: # xdp rx_timestamp=1667850075.063948829 $ date -d @1667850075 Mon Nov 7 11:41:15 AM PST 2022 $ date Mon Nov 7 11:42:05 AM PST 2022 # skb $ date -d @1668314052 Sat Nov 12 08:34:12 PM PST 2022 $ date Sat Nov 12 08:37:06 PM PST 2022 Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> --- tools/testing/selftests/bpf/.gitignore | 1 + tools/testing/selftests/bpf/Makefile | 7 +- .../selftests/bpf/progs/xdp_hw_metadata.c | 81 ++++ tools/testing/selftests/bpf/xdp_hw_metadata.c | 403 ++++++++++++++++++ 4 files changed, 491 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c