
[net-next,v1] bpf: add bpf_ktime_get_real_ns helper

Message ID 20220420122307.5290-1-xiangxia.m.yue@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net-next,v1] bpf: add bpf_ktime_get_real_ns helper

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1813 this patch: 1815
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 195 this patch: 195
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 1823 this patch: 1825
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: please, no space before tabs
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Tonghao Zhang April 20, 2022, 12:23 p.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch introduces a new bpf_ktime_get_real_ns helper, which can
help us measure skb latency in the ingress/forwarding path:

HW/SW[1] -> ip_rcv/tcp_rcv_established -> tcp_recvmsg_locked/tcp_update_recv_tstamps

* Insert a BPF kprobe into ip_rcv/tcp_rcv_established that invokes this
  helper. Then we can inspect how much time has elapsed since the HW/SW
  timestamp (a sketch follows the notes below).
* By inserting a BPF kprobe into tcp_update_recv_tstamps (invoked by
  tcp_recvmsg), we can measure how long an skb sits in the TCP receive
  queue. A large delay here can mean the application fetches TCP
  messages too late.

[1]:
- HW: drivers may set skb_hwtstamps(skb)->hwtstamp
- SW: __netif_receive_skb_core sets skb->tstamp with ktime_get_real()
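
As a rough illustration of the first bullet, a minimal libbpf CO-RE
kprobe program might look as follows. This sketch is not part of the
patch: the program name is made up, and the bpf_ktime_get_real_ns()
declaration assumes headers generated from a kernel with this patch
applied.

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>
  #include <bpf/bpf_core_read.h>

  char LICENSE[] SEC("license") = "GPL";

  SEC("kprobe/tcp_rcv_established")
  int BPF_KPROBE(trace_rcv_established, struct sock *sk, struct sk_buff *skb)
  {
          u64 tstamp, now;

          /* skb->tstamp carries the HW/SW receive timestamp (realtime). */
          tstamp = BPF_CORE_READ(skb, tstamp);
          if (!tstamp)
                  return 0;

          /* Read the same clock to compute the elapsed time. */
          now = bpf_ktime_get_real_ns();
          bpf_printk("skb latency: %llu ns", now - tstamp);
          return 0;
  }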

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: Joanne Koong <joannekoong@fb.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 include/uapi/linux/bpf.h       | 13 +++++++++++++
 kernel/bpf/core.c              |  1 +
 kernel/bpf/helpers.c           | 14 ++++++++++++++
 tools/include/uapi/linux/bpf.h | 13 +++++++++++++
 4 files changed, 41 insertions(+)

Comments

Toke Høiland-Jørgensen April 20, 2022, 12:53 p.m. UTC | #1
xiangxia.m.yue@gmail.com writes:

> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patch introduces a new bpf_ktime_get_real_ns helper, which can
> help us measure skb latency in the ingress/forwarding path:
>
> HW/SW[1] -> ip_rcv/tcp_rcv_established -> tcp_recvmsg_locked/tcp_update_recv_tstamps
>
> * Insert a BPF kprobe into ip_rcv/tcp_rcv_established that invokes this
>   helper. Then we can inspect how much time has elapsed since the HW/SW
>   timestamp (a sketch follows the notes below).
> * By inserting a BPF kprobe into tcp_update_recv_tstamps (invoked by
>   tcp_recvmsg), we can measure how long an skb sits in the TCP receive
>   queue. A large delay here can mean the application fetches TCP
>   messages too late.

Why not just use one of the existing ktime helpers and also add a BPF
probe to set the initial timestamp instead of relying on skb->tstamp?
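
A minimal sketch of that alternative (not from this thread): record
bpf_ktime_get_ns() at ip_rcv() and compare at tcp_update_recv_tstamps(),
keyed on the skb pointer. Keying on the pointer is a simplification,
since skbs can be cloned or coalesced along the path.

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  struct {
          __uint(type, BPF_MAP_TYPE_LRU_HASH);
          __uint(max_entries, 10240);
          __type(key, u64);   /* skb pointer */
          __type(value, u64); /* bpf_ktime_get_ns() at entry */
  } start SEC(".maps");

  SEC("kprobe/ip_rcv")
  int BPF_KPROBE(rcv_entry, struct sk_buff *skb)
  {
          u64 key = (u64)skb, ts = bpf_ktime_get_ns();

          bpf_map_update_elem(&start, &key, &ts, BPF_ANY);
          return 0;
  }

  SEC("kprobe/tcp_update_recv_tstamps")
  int BPF_KPROBE(recv_tstamps, struct sk_buff *skb)
  {
          u64 key = (u64)skb;
          u64 *tsp = bpf_map_lookup_elem(&start, &key);

          if (tsp)
                  bpf_printk("rx path latency: %llu ns",
                             bpf_ktime_get_ns() - *tsp);
          return 0;
  }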

-Toke
Andrii Nakryiko April 20, 2022, 4:17 p.m. UTC | #2
On Wed, Apr 20, 2022 at 5:53 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> xiangxia.m.yue@gmail.com writes:
>
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch introduces a new bpf_ktime_get_real_ns helper, which can
> > help us measure skb latency in the ingress/forwarding path:
> >
> > HW/SW[1] -> ip_rcv/tcp_rcv_established -> tcp_recvmsg_locked/tcp_update_recv_tstamps
> >
> > * Insert a BPF kprobe into ip_rcv/tcp_rcv_established that invokes this
> >   helper. Then we can inspect how much time has elapsed since the HW/SW
> >   timestamp (a sketch follows the notes below).
> > * By inserting a BPF kprobe into tcp_update_recv_tstamps (invoked by
> >   tcp_recvmsg), we can measure how long an skb sits in the TCP receive
> >   queue. A large delay here can mean the application fetches TCP
> >   messages too late.
>
> Why not just use one of the existing ktime helpers and also add a BPF
> probe to set the initial timestamp instead of relying on skb->tstamp?
>

You don't even need a BPF probe for this. See [0] for how retsnoop
converts bpf_ktime_get_ns() timestamps into real time.

  [0] https://github.com/anakryiko/retsnoop/blob/master/src/retsnoop.c#L649-L668
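
The gist of the linked code: calibrate a CLOCK_MONOTONIC-to-
CLOCK_REALTIME offset once in userspace and add it to BPF-side
timestamps afterwards. A simplified userspace sketch (not retsnoop's
exact code):

  #include <stdint.h>
  #include <time.h>

  static uint64_t ts_ns(const struct timespec *ts)
  {
          return (uint64_t)ts->tv_sec * 1000000000ULL + ts->tv_nsec;
  }

  /* Estimate CLOCK_REALTIME - CLOCK_MONOTONIC. Sampling realtime on
   * both sides of the monotonic read bounds the sampling error. */
  static int64_t mono_to_real_offset_ns(void)
  {
          struct timespec before, mono, after;

          clock_gettime(CLOCK_REALTIME, &before);
          clock_gettime(CLOCK_MONOTONIC, &mono);
          clock_gettime(CLOCK_REALTIME, &after);

          return (int64_t)((ts_ns(&before) + ts_ns(&after)) / 2
                           - ts_ns(&mono));
  }

  /* Usage: real_ns = bpf_side_mono_ns + mono_to_real_offset_ns() */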

> -Toke
Toke Høiland-Jørgensen April 20, 2022, 4:26 p.m. UTC | #3
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Wed, Apr 20, 2022 at 5:53 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>>
>> xiangxia.m.yue@gmail.com writes:
>>
>> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> >
>> > This patch introduces a new bpf_ktime_get_real_ns helper, which can
>> > help us measure skb latency in the ingress/forwarding path:
>> >
>> > HW/SW[1] -> ip_rcv/tcp_rcv_established -> tcp_recvmsg_locked/tcp_update_recv_tstamps
>> >
>> > * Insert a BPF kprobe into ip_rcv/tcp_rcv_established that invokes this
>> >   helper. Then we can inspect how much time has elapsed since the HW/SW
>> >   timestamp (a sketch follows the notes below).
>> > * By inserting a BPF kprobe into tcp_update_recv_tstamps (invoked by
>> >   tcp_recvmsg), we can measure how long an skb sits in the TCP receive
>> >   queue. A large delay here can mean the application fetches TCP
>> >   messages too late.
>>
>> Why not just use one of the existing ktime helpers and also add a BPF
>> probe to set the initial timestamp instead of relying on skb->tstamp?
>>
>
> You don't even need a BPF probe for this. See [0] for how retsnoop
> converts bpf_ktime_get_ns() timestamps into real time.
>
>   [0] https://github.com/anakryiko/retsnoop/blob/master/src/retsnoop.c#L649-L668

Uh, neat! Thanks for the link :)

-Toke
Tonghao Zhang April 21, 2022, 2:27 a.m. UTC | #4
On Wed, Apr 20, 2022 at 8:53 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> xiangxia.m.yue@gmail.com writes:
>
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch introduces a new bpf_ktime_get_real_ns helper, which can
> > help us measure skb latency in the ingress/forwarding path:
> >
> > HW/SW[1] -> ip_rcv/tcp_rcv_established -> tcp_recvmsg_locked/tcp_update_recv_tstamps
> >
> > * Insert a BPF kprobe into ip_rcv/tcp_rcv_established that invokes this
> >   helper. Then we can inspect how much time has elapsed since the HW/SW
> >   timestamp (a sketch follows the notes below).
> > * By inserting a BPF kprobe into tcp_update_recv_tstamps (invoked by
> >   tcp_recvmsg), we can measure how long an skb sits in the TCP receive
> >   queue. A large delay here can mean the application fetches TCP
> >   messages too late.
>
> Why not just use one of the existing ktime helpers and also add a BPF
> probe to set the initial timestamp instead of relying on skb->tstamp?
Yes, that also looks good to me.
> -Toke
Tonghao Zhang April 21, 2022, 2:37 a.m. UTC | #5
On Thu, Apr 21, 2022 at 12:17 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Apr 20, 2022 at 5:53 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> >
> > xiangxia.m.yue@gmail.com writes:
> >
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > This patch introduces a new bpf_ktime_get_real_ns helper, which can
> > > help us measure skb latency in the ingress/forwarding path:
> > >
> > > HW/SW[1] -> ip_rcv/tcp_rcv_established -> tcp_recvmsg_locked/tcp_update_recv_tstamps
> > >
> > > * Insert a BPF kprobe into ip_rcv/tcp_rcv_established that invokes this
> > >   helper. Then we can inspect how much time has elapsed since the HW/SW
> > >   timestamp (a sketch follows the notes below).
> > > * By inserting a BPF kprobe into tcp_update_recv_tstamps (invoked by
> > >   tcp_recvmsg), we can measure how long an skb sits in the TCP receive
> > >   queue. A large delay here can mean the application fetches TCP
> > >   messages too late.
> >
> > Why not just use one of the existing ktime helpers and also add a BPF
> > probe to set the initial timestamp instead of relying on skb->tstamp?
> >
>
> You don't even need a BPF probe for this. See [0] for how retsnoop
> converts bpf_ktime_get_ns() timestamps into real time.
>
>   [0] https://github.com/anakryiko/retsnoop/blob/master/src/retsnoop.c#L649-L668
I tried to calculate this offset too, but there is one case to handle:
if the administrator changes the clock manually, or NTP adjusts it, we
have to recalculate the offset. How do we learn about such changes? One
solution is to insert a kprobe into the tk_set_wall_to_mono() kernel
function and use a perf_event to notify userspace.
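
A hypothetical sketch of that idea (the map and program names are
invented, and tk_set_wall_to_mono() is a static function, so it may
not be kprobe-able on every kernel/config):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* Perf buffer used to tell userspace to re-derive the offset. */
  struct {
          __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
          __uint(key_size, sizeof(u32));
          __uint(value_size, sizeof(u32));
  } clock_events SEC(".maps");

  SEC("kprobe/tk_set_wall_to_mono")
  int BPF_KPROBE(on_wall_clock_change)
  {
          u64 mono_ns = bpf_ktime_get_ns();

          /* Wake userspace; it recomputes the offset via clock_gettime(). */
          bpf_perf_event_output(ctx, &clock_events, BPF_F_CURRENT_CPU,
                                &mono_ns, sizeof(mono_ns));
          return 0;
  }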

> > -Toke

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d14b10b85e51..2565c587fe1b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5143,6 +5143,18 @@  union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * u64 bpf_ktime_get_real_ns(void)
+ * 	Description
+ * 		Return a fine-grained version of the real (i.e., wall-clock) time,
+ * 		in nanoseconds. This clock is affected by discontinuous jumps in
+ * 		the system time (e.g., if the system administrator manually changes
+ * 		the clock), and by the incremental adjustments performed by adjtime(3)
+ * 		and NTP.
+ * 		See: **clock_gettime**\ (**CLOCK_REALTIME**)
+ * 	Return
+ * 		Current *ktime*.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5351,7 @@  union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(ktime_get_real_ns),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..acdf538b1dcd 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2627,6 +2627,7 @@  const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
 const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
 const struct bpf_func_proto bpf_get_numa_node_id_proto __weak;
 const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
+const struct bpf_func_proto bpf_ktime_get_real_ns_proto __weak;
 const struct bpf_func_proto bpf_ktime_get_boot_ns_proto __weak;
 const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto __weak;
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..d38548ed292f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -159,6 +159,18 @@  const struct bpf_func_proto bpf_ktime_get_ns_proto = {
 	.ret_type	= RET_INTEGER,
 };
 
+BPF_CALL_0(bpf_ktime_get_real_ns)
+{
+	/* NMI safe access to clock realtime. */
+	return ktime_get_real_fast_ns();
+}
+
+const struct bpf_func_proto bpf_ktime_get_real_ns_proto = {
+	.func		= bpf_ktime_get_real_ns,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+
 BPF_CALL_0(bpf_ktime_get_boot_ns)
 {
 	/* NMI safe access to clock boottime */
@@ -1410,6 +1422,8 @@  bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_ktime_get_ns_proto;
 	case BPF_FUNC_ktime_get_boot_ns:
 		return &bpf_ktime_get_boot_ns_proto;
+	case BPF_FUNC_ktime_get_real_ns:
+		return &bpf_ktime_get_real_ns_proto;
 	case BPF_FUNC_ringbuf_output:
 		return &bpf_ringbuf_output_proto;
 	case BPF_FUNC_ringbuf_reserve:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d14b10b85e51..2565c587fe1b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5143,6 +5143,18 @@  union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * u64 bpf_ktime_get_real_ns(void)
+ * 	Description
+ * 		Return a fine-grained version of the real (i.e., wall-clock) time,
+ * 		in nanoseconds. This clock is affected by discontinuous jumps in
+ * 		the system time (e.g., if the system administrator manually changes
+ * 		the clock), and by the incremental adjustments performed by adjtime(3)
+ * 		and NTP.
+ * 		See: **clock_gettime**\ (**CLOCK_REALTIME**)
+ * 	Return
+ * 		Current *ktime*.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5351,7 @@  union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(ktime_get_real_ns),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper