diff mbox series

[bpf-next,V1,5/5] selftests/bpf: xdp_hw_metadata track more timestamps

Message ID 168174344813.593471.4026230439937368990.stgit@firesoul (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series XDP-hints: XDP kfunc metadata for driver igc | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 18 this patch: 18
netdev/cc_maintainers warning 10 maintainers not CCed: mykolal@fb.com andrii@kernel.org song@kernel.org shuah@kernel.org haoluo@google.com yhs@fb.com kpsingh@kernel.org jolsa@kernel.org martin.lau@linux.dev linux-kselftest@vger.kernel.org
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 18 this patch: 18
netdev/checkpatch warning CHECK: No space is necessary after a cast WARNING: line length of 85 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on s390x with gcc

Commit Message

Jesper Dangaard Brouer April 17, 2023, 2:57 p.m. UTC
To correlate the hardware RX timestamp with something, add tracking of
two software timestamps both clock source CLOCK_TAI (see description in
man clock_gettime(2)).

XDP metadata is extended with xdp_timestamp for capturing when XDP
received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
could not find a BPF helper for getting CLOCK_REALTIME, which would have
been preferred. In userspace when AF_XDP sees the packet another
software timestamp is recorded via clock_gettime() also clock source
CLOCK_TAI.

Example output shortly after loading igc driver:

  poll: 1 (0) skip=1 fail=0 redir=2
  xsk_ring_cons__peek: 1
  0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
  rx_hash: 0x82A96531 with RSS type:0x1
  rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
  XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
  AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
  0x12557a8: complete idx=9 addr=9000

The first observation is that the 37 sec difference between RX HW vs XDP
timestamps, which indicate hardware is likely clock source
CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
with a 37 sec offset.

The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
userspace wakeup time. On this hardware it was caused by CPU idle sleep
states, which can be reduced by tuning /dev/cpu_dma_latency.

View current requested/allowed latency bound via:
  hexdump --format '"%d\n"' /dev/cpu_dma_latency

More explanation of the output and how this can be used to identify
clock drift for the HW clock can be seen here[1]:

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../testing/selftests/bpf/progs/xdp_hw_metadata.c  |    4 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c      |   47 ++++++++++++++++++--
 tools/testing/selftests/bpf/xdp_metadata.h         |    1 
 3 files changed, 46 insertions(+), 6 deletions(-)

Comments

Jesper Dangaard Brouer April 17, 2023, 3:04 p.m. UTC | #1
On 17/04/2023 16.57, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
> 
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
> 
> Example output shortly after loading igc driver:
> 
>    poll: 1 (0) skip=1 fail=0 redir=2
>    xsk_ring_cons__peek: 1
>    0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>    rx_hash: 0x82A96531 with RSS type:0x1
>    rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>    XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>    AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>    0x12557a8: complete idx=9 addr=9000
> 

For QA verification testing, I want to mention that this fix[0] were
applied, in-order to get "rx_timestamp" working on igc:

  [0] 
https://lore.kernel.org/all/20230414154902.2950535-1-yoong.siang.song@intel.com/


> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.
> 
> The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
> userspace wakeup time. On this hardware it was caused by CPU idle sleep
> states, which can be reduced by tuning /dev/cpu_dma_latency.
> 
> View current requested/allowed latency bound via:
>    hexdump --format '"%d\n"' /dev/cpu_dma_latency
> 
> More explanation of the output and how this can be used to identify
> clock drift for the HW clock can be seen here[1]:
> 
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
>
Kurt Kanzenbach April 17, 2023, 3:31 p.m. UTC | #2
On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
>
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
>
> Example output shortly after loading igc driver:
>
>   poll: 1 (0) skip=1 fail=0 redir=2
>   xsk_ring_cons__peek: 1
>   0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>   rx_hash: 0x82A96531 with RSS type:0x1
>   rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>   XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>   AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>   0x12557a8: complete idx=9 addr=9000
>
> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.

Maybe I'm missing something here, but in order to compare the hardware
with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
time sources have to be synchronized by using something like
phc2sys. That should make them comparable within reasonable range
(nanoseconds).

Thanks,
Kurt
Song Yoong Siang April 18, 2023, 6:07 a.m. UTC | #3
On Monday, April 17, 2023 11:32 PM, Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> wrote:
>On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>> To correlate the hardware RX timestamp with something, add tracking of
>> two software timestamps both clock source CLOCK_TAI (see description
>> in man clock_gettime(2)).
>>
>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns().
>> I could not find a BPF helper for getting CLOCK_REALTIME, which would
>> have been preferred. In userspace when AF_XDP sees the packet another
>> software timestamp is recorded via clock_gettime() also clock source
>> CLOCK_TAI.
>>
>> Example output shortly after loading igc driver:
>>
>>   poll: 1 (0) skip=1 fail=0 redir=2
>>   xsk_ring_cons__peek: 1
>>   0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>>   rx_hash: 0x82A96531 with RSS type:0x1
>>   rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>>   XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta
>sec:37.0001 (37000059.407 usec)
>>   AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta
>sec:0.0001 (92.999 usec)
>>   0x12557a8: complete idx=9 addr=9000
>>
>> The first observation is that the 37 sec difference between RX HW vs
>> XDP timestamps, which indicate hardware is likely clock source
>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>> with a 37 sec offset.
>
>Maybe I'm missing something here, but in order to compare the hardware with
>software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the time sources
>have to be synchronized by using something like phc2sys. That should make them
>comparable within reasonable range (nanoseconds).
>
>Thanks,
>Kurt

Tested-by: Song Yoong Siang <yoong.siang.song@intel.com>

I tested this patchset by using I226-LM (rev 04) NIC on Tiger Lake Platform.
I use testptp selftest tool to make sure PHC is almost same as system clock.
Below are the detail of test steps and result.

1. Run xdp_hw_metadata tool.
   @DUT: sudo ./xdp_hw_metadata eth0

2. Enable Rx HWTS for all incoming packets. Note: This step is not needed if
   https://lore.kernel.org/all/20230414154902.2950535-1-yoong.siang.song@intel.com/
   bug fix patch is applied to the igc driver.
   @DUT: sudo hwstamp_ctl -i eth0 -r 1

3. Set the ptp clock time from the system time using testptp tool.
   @DUT: sudo ./testptp -d /dev/ptp0 -s

4. Send UDP packet with 9091 port from link partner immediately after step 3.
   @LinkPartner: echo -n xdp | nc -u -q1 <Destination IPv4 addr> 9091

Result:
   poll: 1 (0) skip=1 fail=0 redir=1
   xsk_ring_cons__peek: 1
   0x5626248d16d0: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
   rx_hash: 0x35E1B60E with RSS type:0x1
   rx_timestamp:  1677762195217129600 (sec:1677762195.2171)
   XDP RX-time:   1677762195217202099 (sec:1677762195.2172) delta sec:0.0001 (72.499 usec)
   AF_XDP time:   1677762195217231775 (sec:1677762195.2172) delta sec:0.0000 (29.676 usec)
   0x5626248d16d0: complete idx=8 addr=8000
Kurt Kanzenbach April 18, 2023, 6:38 a.m. UTC | #4
On Tue Apr 18 2023, Song, Yoong Siang wrote:
> Tested-by: Song Yoong Siang <yoong.siang.song@intel.com>
>
> I tested this patchset by using I226-LM (rev 04) NIC on Tiger Lake Platform.
> I use testptp selftest tool to make sure PHC is almost same as system clock.

OK, your test result looks sane.

Thanks,
Kurt
Jesper Dangaard Brouer April 18, 2023, 2:01 p.m. UTC | #5
On 17/04/2023 17.31, Kurt Kanzenbach wrote:
> On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>> To correlate the hardware RX timestamp with something, add tracking of
>> two software timestamps both clock source CLOCK_TAI (see description in
>> man clock_gettime(2)).
>>
>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
>> could not find a BPF helper for getting CLOCK_REALTIME, which would have
>> been preferred. In userspace when AF_XDP sees the packet another
>> software timestamp is recorded via clock_gettime() also clock source
>> CLOCK_TAI.
>>
>> Example output shortly after loading igc driver:
>>
>>    poll: 1 (0) skip=1 fail=0 redir=2
>>    xsk_ring_cons__peek: 1
>>    0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>>    rx_hash: 0x82A96531 with RSS type:0x1
>>    rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>>    XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>>    AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>>    0x12557a8: complete idx=9 addr=9000
>>
>> The first observation is that the 37 sec difference between RX HW vs XDP
>> timestamps, which indicate hardware is likely clock source
>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>> with a 37 sec offset.
> 
> Maybe I'm missing something here, but in order to compare the hardware
> with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
> time sources have to be synchronized by using something like
> phc2sys. That should make them comparable within reasonable range
> (nanoseconds).

Precisely, in this test I've not synchronized the clocks.
The observation is that driver igc clock gets initialized to
CLOCK_REALTIME wall-clock time, and it slowly drifts as documented in 
provided link[1].

  [1] 
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#driver-igc-clock-drift-observations
  [2] 
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#quick-time-sync-setup

I've also played with using phc2sys (in same doc[2]) to sync HW clock
with SW clock. I do *seek input* if I'm using it correctly?!?.

I don't have a PTP clock setup , so I manually: Use phc2sys to
synchronize the system clock to the PTP hardware clock (PHC) on the
network card (which driver inited to CLOCK_REALTIME wall-clock).

Stop ntp clock sync and disable most CPU sleep states:

   sudo systemctl stop chronyd
   sudo tuned-adm profile latency-performance
   sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency
   2

Adjust for the 37 sec offset to TAI, such that our BPF-prog using TAI 
will align:

   sudo phc2sys -s igc1 -O -37 -R 2 -u 10

Result on igc with xdp_hw_metadata:

  poll: 1 (0) skip=1 fail=0 redir=6
  xsk_ring_cons__peek: 1
  rx_hash: 0x82A96531 with RSS type:0x1
  rx_timestamp:  1681825632645744805 (sec:1681825632.6457)
  XDP RX-time:   1681825632645755858 (sec:1681825632.6458) delta 
sec:0.0000 (11.053 usec)
  AF_XDP time:   1681825632645769371 (sec:1681825632.6458) delta 
sec:0.0000 (13.513 usec)

The log file from phc2sys says:

  phc2sys[1294263]: [86275.140] CLOCK_REALTIME rms    6 max   11 freq 
+13719 +/-   5 delay  1435 +/-   5

Notice the delta between HW and SW timestamps is 11.053 usec.
Even-though it is small, I don't really trust it, because the phc2sys
log says frequency offset mean is "+13719" nanosec.

So, it is true that latency/delay between HW to XDP-SW is 11 usec?
Or is this due to (in)accuracy of phc2sys sync?

--Jesper
Kurt Kanzenbach April 18, 2023, 7:08 p.m. UTC | #6
On Tue Apr 18 2023, Jesper Dangaard Brouer wrote:
> On 17/04/2023 17.31, Kurt Kanzenbach wrote:
>> On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>>> To correlate the hardware RX timestamp with something, add tracking of
>>> two software timestamps both clock source CLOCK_TAI (see description in
>>> man clock_gettime(2)).
>>>
>>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
>>> could not find a BPF helper for getting CLOCK_REALTIME, which would have
>>> been preferred. In userspace when AF_XDP sees the packet another
>>> software timestamp is recorded via clock_gettime() also clock source
>>> CLOCK_TAI.
>>>
>>> Example output shortly after loading igc driver:
>>>
>>>    poll: 1 (0) skip=1 fail=0 redir=2
>>>    xsk_ring_cons__peek: 1
>>>    0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>>>    rx_hash: 0x82A96531 with RSS type:0x1
>>>    rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>>>    XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>>>    AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>>>    0x12557a8: complete idx=9 addr=9000
>>>
>>> The first observation is that the 37 sec difference between RX HW vs XDP
>>> timestamps, which indicate hardware is likely clock source
>>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>>> with a 37 sec offset.
>> 
>> Maybe I'm missing something here, but in order to compare the hardware
>> with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
>> time sources have to be synchronized by using something like
>> phc2sys. That should make them comparable within reasonable range
>> (nanoseconds).
>
> Precisely, in this test I've not synchronized the clocks.
> The observation is that driver igc clock gets initialized to
> CLOCK_REALTIME wall-clock time

Yes. The igc driver uses ktime_get_real() to initialize the PHC time in
init() and reset(). However, that's driver specific. PTP is based on
TAI.

>, and it slowly drifts as documented in provided link[1].

Yes, it does without proper synchronization. Linux has its own
independent system clock. Therefore, tools like phc2sys are required.

>
>   [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#driver-igc-clock-drift-observations
>   [2] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#quick-time-sync-setup
>
> I've also played with using phc2sys (in same doc[2]) to sync HW clock
> with SW clock. I do *seek input* if I'm using it correctly?!?.

Looks correct.

>
> I don't have a PTP clock setup , so I manually: Use phc2sys to
> synchronize the system clock to the PTP hardware clock (PHC) on the
> network card (which driver inited to CLOCK_REALTIME wall-clock).
>
> Stop ntp clock sync and disable most CPU sleep states:
>
>    sudo systemctl stop chronyd
>    sudo tuned-adm profile latency-performance
>    sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency
>    2
>
> Adjust for the 37 sec offset to TAI, such that our BPF-prog using TAI 
> will align:
>
>    sudo phc2sys -s igc1 -O -37 -R 2 -u 10
>
> Result on igc with xdp_hw_metadata:
>
>   poll: 1 (0) skip=1 fail=0 redir=6
>   xsk_ring_cons__peek: 1
>   rx_hash: 0x82A96531 with RSS type:0x1
>   rx_timestamp:  1681825632645744805 (sec:1681825632.6457)
>   XDP RX-time:   1681825632645755858 (sec:1681825632.6458) delta 
> sec:0.0000 (11.053 usec)
>   AF_XDP time:   1681825632645769371 (sec:1681825632.6458) delta 
> sec:0.0000 (13.513 usec)
>
> The log file from phc2sys says:
>
>   phc2sys[1294263]: [86275.140] CLOCK_REALTIME rms    6 max   11 freq 
> +13719 +/-   5 delay  1435 +/-   5
>
> Notice the delta between HW and SW timestamps is 11.053 usec.
> Even-though it is small, I don't really trust it, because the phc2sys
> log says frequency offset mean is "+13719" nanosec.

The offset between the system and PHC clock is 11ns at maximum (and 6ns
in mean) which is quite good. The frequency offset is displayed in ppb.

>
> So, it is true that latency/delay between HW to XDP-SW is 11 usec?

I think so.

> Or is this due to (in)accuracy of phc2sys sync?

Nope.

Thanks,
Kurt
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index e1c787815e44..b2dfd7066c6e 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -77,7 +77,9 @@  int rx(struct xdp_md *ctx)
 	}
 
 	err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
-	if (err)
+	if (!err)
+		meta->xdp_timestamp = bpf_ktime_get_tai_ns();
+	else
 		meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
 
 	err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 987cf0db5ebc..613321eb84c1 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -27,6 +27,7 @@ 
 #include <sys/mman.h>
 #include <net/if.h>
 #include <poll.h>
+#include <time.h>
 
 #include "xdp_metadata.h"
 
@@ -134,18 +135,52 @@  static void refill_rx(struct xsk *xsk, __u64 addr)
 	}
 }
 
-static void verify_xdp_metadata(void *data)
+#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
+static __u64 gettime(clockid_t clock_id)
+{
+	struct timespec t;
+	int res;
+
+	/* See man clock_gettime(2) for type of clock_id's */
+	res = clock_gettime(clock_id, &t);
+
+	if (res < 0)
+		error(res, errno, "Error with clock_gettime()");
+
+	return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
+}
+
+static void verify_xdp_metadata(void *data, clockid_t clock_id)
 {
 	struct xdp_meta *meta;
 
 	meta = data - sizeof(*meta);
 
-	printf("rx_timestamp: %llu\n", meta->rx_timestamp);
 	if (meta->rx_hash_err < 0)
 		printf("No rx_hash err=%d\n", meta->rx_hash_err);
 	else
 		printf("rx_hash: 0x%X with RSS type:0x%X\n",
 		       meta->rx_hash, meta->rx_hash_type);
+
+	printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
+	       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
+	if (meta->rx_timestamp) {
+		__u64 usr_clock = gettime(clock_id);
+		__u64 xdp_clock = meta->xdp_timestamp;
+		__s64 delta_X = xdp_clock - meta->rx_timestamp;
+		__s64 delta_X2U = usr_clock - xdp_clock;
+
+		printf("XDP RX-time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+		       xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC,
+		       (double)delta_X / NANOSEC_PER_SEC,
+		       (double)delta_X / 1000);
+
+		printf("AF_XDP time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+		       usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
+		       (double)delta_X2U / NANOSEC_PER_SEC,
+		       (double)delta_X2U / 1000);
+	}
+
 }
 
 static void verify_skb_metadata(int fd)
@@ -193,7 +228,7 @@  static void verify_skb_metadata(int fd)
 	printf("skb hwtstamp is not found!\n");
 }
 
-static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id)
 {
 	const struct xdp_desc *rx_desc;
 	struct pollfd fds[rxq + 1];
@@ -243,7 +278,8 @@  static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
 			addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
 			printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
 			       xsk, idx, rx_desc->addr, addr, comp_addr);
-			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr),
+					    clock_id);
 			xsk_ring_cons__release(&xsk->rx, 1);
 			refill_rx(xsk, comp_addr);
 		}
@@ -370,6 +406,7 @@  static void timestamping_enable(int fd, int val)
 
 int main(int argc, char *argv[])
 {
+	clockid_t clock_id = CLOCK_TAI;
 	int server_fd = -1;
 	int ret;
 	int i;
@@ -443,7 +480,7 @@  int main(int argc, char *argv[])
 		error(1, -ret, "bpf_xdp_attach");
 
 	signal(SIGINT, handle_signal);
-	ret = verify_metadata(rx_xsk, rxq, server_fd);
+	ret = verify_metadata(rx_xsk, rxq, server_fd, clock_id);
 	close(server_fd);
 	cleanup();
 	if (ret)
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 0c4624dc6f2f..938a729bd307 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -11,6 +11,7 @@ 
 
 struct xdp_meta {
 	__u64 rx_timestamp;
+	__u64 xdp_timestamp;
 	__u32 rx_hash;
 	union {
 		__u32 rx_hash_type;