diff mbox series

[RFC,bpf-next,v6,2/3] net: Add additional bit to support clockid_t timestamp type

Message ID 20240504031331.2737365-3-quic_abchauha@quicinc.com (mailing list archive)
State Superseded
Delegated to: BPF
Series Replace mono_delivery_time with tstamp_type

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 fail Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 fail Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 fail Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-38 fail Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 fail Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 fail Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-7 fail Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7516 this patch: 7516
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 11 maintainers not CCed: kpsingh@kernel.org john.fastabend@gmail.com andrii@kernel.org jolsa@kernel.org eddyz87@gmail.com yonghong.song@linux.dev song@kernel.org dsahern@kernel.org ast@kernel.org haoluo@google.com sdf@google.com
netdev/build_clang success Errors and warnings before: 1216 this patch: 1216
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 7900 this patch: 7900
netdev/checkpatch warning CHECK: Prefer using the BIT macro WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 92 exceeds 80 columns WARNING: line length of 96 exceeds 80 columns WARNING: line length of 97 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 69 this patch: 69
netdev/source_inline success Was 0 now: 0

Commit Message

Abhishek Chauhan (ABC) May 4, 2024, 3:13 a.m. UTC
tstamp_type is now set based on the actual clockid_t, compressed
into 2 bits.

To make the design scalable for future needs, this commit extends
tstamp_type:1 to tstamp_type:2 to support other clockid_t
timestamps.

With this commit, tstamp_type supports CLOCK_TAI in addition to the
existing CLOCK_MONOTONIC and CLOCK_REALTIME.

Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
---
Changes since v5
- Addressed Willem's comments on the tstamp_type
  documentation in skbuff.h.
- Used complete words instead of abbreviations in
  macro definitions, as suggested by Willem.
- Fixed indentation problems.
- Marked BPF_SKB_TSTAMP_UNSPEC as deprecated in a
  comment and introduced BPF_SKB_CLOCK_REALTIME
  in its place.
- Added BUILD_BUG_ON checks for the newly introduced enums.
- __ip_make_skb and __ip6_make_skb now check for TCP
  and mark TCP packets as having a mono tstamp base.
- Separated the selftests/bpf changes into another patch.
- Made the changes Martin requested in the selftest BPF
  code and in tools/include/uapi/linux/bpf.h.

Changes since v4
- Made changes to the BPF code in filter.c as per
  Martin's comments.
- Minor fixes for Willem's comments on the documentation
  in skbuff.h (removed the obvious ones).
- Made changes to ctx_rewrite.c and test_tc_dtime.c.
- For test_tc_dtime.c, I am not sure that I covered
  all the required changes, as I am not very familiar
  with the framework.
- Introduced a common mask, SKB_TSTAMP_TYPE_MASK, instead
  of multiple SKB masks.
- Optimised the BPF code as suggested by Martin.
- Set the default case to SKB_CLOCK_REALTIME.

Changes since v3
- Carefully reviewed the BPF APIs and made changes in
  the BPF code as well.
- Re-used the actual clockid_t values, since skbuff.h
  indirectly includes uapi/linux/time.h.
- Added CLOCK_TAI to the skb_set_delivery_time
  handling instead of CLOCK_USER.
- Added a default case to the switch for unsupported
  and invalid timestamps, with a WARN_ONCE.
- All of the above address comments from Willem.
- Made changes in filter.c as per Martin's comments
  to handle invalid cases in the BPF code, with the
  addition of SKB_TAI_DELIVERY_TIME_MASK.

Changes since v2
- Minor changes to commit subject

Changes since v1
- Identified additional changes needed in the BPF framework.
- Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK.
- Made changes in skb_set_delivery_time to keep the code similar to
  the previous mono_delivery_time code, only setting tstamp_type
  bit 1 for userspace timestamps.


 include/linux/skbuff.h   | 21 +++++++++++--------
 include/uapi/linux/bpf.h | 15 +++++++++-----
 net/core/filter.c        | 44 +++++++++++++++++++++++-----------------
 net/ipv4/ip_output.c     |  5 ++++-
 net/ipv4/raw.c           |  2 +-
 net/ipv6/ip6_output.c    |  5 ++++-
 net/ipv6/raw.c           |  2 +-
 net/packet/af_packet.c   |  7 +++----
 8 files changed, 61 insertions(+), 40 deletions(-)
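
The mask change mentioned in the changelog (one common SKB_TSTAMP_TYPE_MASK, with TC_AT_INGRESS_MASK moved along by one bit) can be sketched in user space. This is only an illustration under assumptions: the mask values are taken from the skbuff.h hunk of this patch, while the MODEL_ names, the layout enum and the pack/unpack helpers are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values as they appear in the skbuff.h hunk of this patch. The
 * MODEL_ prefix, the layout enum and the helpers below are hypothetical
 * and exist only for this sketch.
 */
#define MODEL_BE_TSTAMP_TYPE_MASK	(3 << 6)
#define MODEL_BE_TSTAMP_TYPE_RSHIFT	(6)
#define MODEL_BE_TC_AT_INGRESS_MASK	(1 << 5)
#define MODEL_LE_TSTAMP_TYPE_MASK	(3)
#define MODEL_LE_TC_AT_INGRESS_MASK	(1 << 2)

enum model_layout {
	MODEL_LITTLE_ENDIAN_BITFIELD,
	MODEL_BIG_ENDIAN_BITFIELD,
};

/* Build the flag byte by hand for the chosen bitfield layout. */
static uint8_t model_pack(enum model_layout layout, unsigned int tstamp_type,
			  unsigned int tc_at_ingress)
{
	unsigned int byte = 0;

	if (layout == MODEL_BIG_ENDIAN_BITFIELD) {
		byte |= (tstamp_type << MODEL_BE_TSTAMP_TYPE_RSHIFT) &
			MODEL_BE_TSTAMP_TYPE_MASK;
		if (tc_at_ingress)
			byte |= MODEL_BE_TC_AT_INGRESS_MASK;
	} else {
		byte |= tstamp_type & MODEL_LE_TSTAMP_TYPE_MASK;
		if (tc_at_ingress)
			byte |= MODEL_LE_TC_AT_INGRESS_MASK;
	}
	return (uint8_t)byte;
}

/* Recover the 2-bit clock type from the flag byte. */
static unsigned int model_tstamp_type(enum model_layout layout, uint8_t byte)
{
	if (layout == MODEL_BIG_ENDIAN_BITFIELD)
		return (byte & MODEL_BE_TSTAMP_TYPE_MASK) >>
		       MODEL_BE_TSTAMP_TYPE_RSHIFT;
	return byte & MODEL_LE_TSTAMP_TYPE_MASK;
}

/* Recover tc_at_ingress, which sits right after the 2-bit field. */
static unsigned int model_tc_at_ingress(enum model_layout layout, uint8_t byte)
{
	if (layout == MODEL_BIG_ENDIAN_BITFIELD)
		return !!(byte & MODEL_BE_TC_AT_INGRESS_MASK);
	return !!(byte & MODEL_LE_TC_AT_INGRESS_MASK);
}

/* Round-trip every clock type and ingress flag through both layouts. */
static void model_demo(void)
{
	for (unsigned int type = 0; type <= 2; type++) {
		for (unsigned int ingress = 0; ingress <= 1; ingress++) {
			uint8_t le = model_pack(MODEL_LITTLE_ENDIAN_BITFIELD,
						type, ingress);
			uint8_t be = model_pack(MODEL_BIG_ENDIAN_BITFIELD,
						type, ingress);

			assert(model_tstamp_type(MODEL_LITTLE_ENDIAN_BITFIELD, le) == type);
			assert(model_tstamp_type(MODEL_BIG_ENDIAN_BITFIELD, be) == type);
			assert(model_tc_at_ingress(MODEL_LITTLE_ENDIAN_BITFIELD, le) == ingress);
			assert(model_tc_at_ingress(MODEL_BIG_ENDIAN_BITFIELD, be) == ingress);
		}
	}
}
```

Calling model_demo() from a small main() exercises both layouts. Only the big-endian layout needs a right shift, which matches the patch defining SKB_TSTAMP_TYPE_RSHIFT in that branch alone.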

Comments

Willem de Bruijn May 6, 2024, 7 p.m. UTC | #1
Abhishek Chauhan wrote:
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index de3915e2bfdb..fe7d8dbef77e 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -709,6 +709,8 @@ typedef unsigned char *sk_buff_data_t;
>  enum skb_tstamp_type {
>  	SKB_CLOCK_REALTIME,
>  	SKB_CLOCK_MONOTONIC,
> +	SKB_CLOCK_TAI,
> +	__SKB_CLOCK_MAX = SKB_CLOCK_TAI,
>  };
>  
>  /**
> @@ -829,8 +831,7 @@ enum skb_tstamp_type {
>   *	@decrypted: Decrypted SKB
>   *	@slow_gro: state present at GRO time, slower prepare step required
>   *	@tstamp_type: When set, skb->tstamp has the
> - *		delivery_time in mono clock base Otherwise, the
> - *		timestamp is considered real clock base.
> + *		delivery_time clock base of skb->tstamp.
>   *	@napi_id: id of the NAPI struct this skb came from
>   *	@sender_cpu: (aka @napi_id) source CPU in XPS
>   *	@alloc_cpu: CPU which did the skb allocation.
> @@ -958,7 +959,7 @@ struct sk_buff {
>  	/* private: */
>  	__u8			__mono_tc_offset[0];
>  	/* public: */
> -	__u8			tstamp_type:1;	/* See skb_tstamp_type */
> +	__u8			tstamp_type:2;	/* See skb_tstamp_type */
>  #ifdef CONFIG_NET_XGRESS
>  	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
>  	__u8			tc_skip_classify:1;
> @@ -1088,15 +1089,16 @@ struct sk_buff {
>  #endif
>  #define PKT_TYPE_OFFSET		offsetof(struct sk_buff, __pkt_type_offset)
>  
> -/* if you move tc_at_ingress or mono_delivery_time
> +/* if you move tc_at_ingress or tstamp_type
>   * around, you also must adapt these constants.
>   */
>  #ifdef __BIG_ENDIAN_BITFIELD
> -#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 7)
> -#define TC_AT_INGRESS_MASK		(1 << 6)
> +#define SKB_TSTAMP_TYPE_MASK		(3 << 6)
> +#define SKB_TSTAMP_TYPE_RSHIFT		(6)
> +#define TC_AT_INGRESS_MASK		(1 << 5)
>  #else
> -#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 0)
> -#define TC_AT_INGRESS_MASK		(1 << 1)
> +#define SKB_TSTAMP_TYPE_MASK		(3)
> +#define TC_AT_INGRESS_MASK		(1 << 2)
>  #endif
>  #define SKB_BF_MONO_TC_OFFSET		offsetof(struct sk_buff, __mono_tc_offset)
>  
> @@ -4213,6 +4215,9 @@ static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
>  	case CLOCK_MONOTONIC:
>  		tstamp_type = SKB_CLOCK_MONOTONIC;
>  		break;
> +	case CLOCK_TAI:
> +		tstamp_type = SKB_CLOCK_TAI;
> +		break;
>  	default:
>  		WARN_ON_ONCE(1);
>  		kt = 0;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 90706a47f6ff..25ea393cf084 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6207,12 +6207,17 @@ union {					\
>  	__u64 :64;			\
>  } __attribute__((aligned(8)))
>  
> +/* The enum used in skb->tstamp_type. It specifies the clock type
> + * of the time stored in the skb->tstamp.
> + */
>  enum {
> -	BPF_SKB_TSTAMP_UNSPEC,
> -	BPF_SKB_TSTAMP_DELIVERY_MONO,	/* tstamp has mono delivery time */
> -	/* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
> -	 * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
> -	 * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
> +	BPF_SKB_TSTAMP_UNSPEC = 0,		/* DEPRECATED */
> +	BPF_SKB_TSTAMP_DELIVERY_MONO = 1,	/* DEPRECATED */
> +	BPF_SKB_CLOCK_REALTIME = 0,
> +	BPF_SKB_CLOCK_MONOTONIC = 1,
> +	BPF_SKB_CLOCK_TAI = 2,
> +	/* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
> +	 * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
>  	 */
>  };
>  
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a3781a796da4..9f3df4a0d1ee 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -7726,16 +7726,20 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
>  		return -EOPNOTSUPP;
>  
>  	switch (tstamp_type) {
> -	case BPF_SKB_TSTAMP_DELIVERY_MONO:
> +	case BPF_SKB_CLOCK_MONOTONIC:
>  		if (!tstamp)
>  			return -EINVAL;
>  		skb->tstamp = tstamp;
>  		skb->tstamp_type = SKB_CLOCK_MONOTONIC;
>  		break;
> -	case BPF_SKB_TSTAMP_UNSPEC:
> -		if (tstamp)
> +	case BPF_SKB_CLOCK_TAI:
> +		if (!tstamp)
>  			return -EINVAL;
> -		skb->tstamp = 0;
> +		skb->tstamp = tstamp;
> +		skb->tstamp_type = SKB_CLOCK_TAI;
> +		break;
> +	case BPF_SKB_CLOCK_REALTIME:
> +		skb->tstamp = tstamp;
>  		skb->tstamp_type = SKB_CLOCK_REALTIME;

Only since there is another reason to respin.

The previous code did not do this, but let's order cases by their enum
value, starting with realtime.

Also in anticipation of possible future expansions.
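
A user-space sketch of the ordering suggested above, with the cases listed by enum value starting from realtime. It is a model under assumptions, not the code of a later revision: the struct, the MODEL_ names and model_skb_set_tstamp() are hypothetical, the -EOPNOTSUPP protocol check is left out, and the default case is an assumption because the quoted hunk does not show it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same numeric values as BPF_SKB_CLOCK_* in the patched uapi header.
 * The model uses one enum for both the BPF and the skb side.
 */
enum {
	MODEL_BPF_SKB_CLOCK_REALTIME = 0,
	MODEL_BPF_SKB_CLOCK_MONOTONIC = 1,
	MODEL_BPF_SKB_CLOCK_TAI = 2,
};

/* Hypothetical stand-in for the two sk_buff fields involved. */
struct model_skb {
	uint64_t tstamp;
	unsigned int tstamp_type:2;
};

static int model_skb_set_tstamp(struct model_skb *skb, uint64_t tstamp,
				uint32_t tstamp_type)
{
	switch (tstamp_type) {
	case MODEL_BPF_SKB_CLOCK_REALTIME:
		/* Realtime accepts any value, including 0. */
		skb->tstamp = tstamp;
		skb->tstamp_type = MODEL_BPF_SKB_CLOCK_REALTIME;
		break;
	case MODEL_BPF_SKB_CLOCK_MONOTONIC:
		if (!tstamp)
			return -EINVAL;
		skb->tstamp = tstamp;
		skb->tstamp_type = MODEL_BPF_SKB_CLOCK_MONOTONIC;
		break;
	case MODEL_BPF_SKB_CLOCK_TAI:
		if (!tstamp)
			return -EINVAL;
		skb->tstamp = tstamp;
		skb->tstamp_type = MODEL_BPF_SKB_CLOCK_TAI;
		break;
	default:
		/* Assumption: unknown types are rejected. */
		return -EINVAL;
	}
	return 0;
}

/* Small usage example for the model. */
static void model_demo(void)
{
	struct model_skb skb = { 0 };

	assert(model_skb_set_tstamp(&skb, 0, MODEL_BPF_SKB_CLOCK_REALTIME) == 0);
	assert(model_skb_set_tstamp(&skb, 0, MODEL_BPF_SKB_CLOCK_MONOTONIC) == -EINVAL);
	assert(model_skb_set_tstamp(&skb, 42, MODEL_BPF_SKB_CLOCK_TAI) == 0);
	assert(skb.tstamp == 42 && skb.tstamp_type == MODEL_BPF_SKB_CLOCK_TAI);
}
```

With the cases in enum order, a future clock type would simply be appended before the default case.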
Abhishek Chauhan (ABC) May 6, 2024, 7:57 p.m. UTC | #2
On 5/6/2024 12:00 PM, Willem de Bruijn wrote:
> 
> Only since there is another reason to respin.
> 
> The previous code did not do this, but let's order cases by their enum
> value, starting with realtime.
> 
> Also in anticipation with possible future expansions.
> 
Noted, I will take care of this.
Martin KaFai Lau May 7, 2024, 12:44 a.m. UTC | #3
On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index fe86cadfa85b..c3d852eecb01 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>   
>   	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>   	skb->mark = cork->mark;
> -	skb->tstamp = cork->transmit_time;
> +	if (sk_is_tcp(sk))

This does not seem to catch all IPPROTO_TCP cases. In particular, the percpu
"ipv4_tcp_sk" is SOCK_RAW, while sk_is_tcp() checks for SOCK_STREAM:

void __init tcp_v4_init(void)
{

	/* ... */
	res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
				   IPPROTO_TCP, &init_net);

	/* ... */
}

"while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done" 
failed pretty often exactly in this case.

> +		skb_set_delivery_type_by_clockid(skb, cork->transmit_time, CLOCK_MONOTONIC);
> +	else
> +		skb_set_delivery_type_by_clockid(skb, cork->transmit_time, sk->sk_clockid);
>   	/*
>   	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
>   	 * on dst refcount

[ ... ]

> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 05067bd44775..797a9764e8fe 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1924,7 +1924,10 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
>   
>   	skb->priority = READ_ONCE(sk->sk_priority);
>   	skb->mark = cork->base.mark;
> -	skb->tstamp = cork->base.transmit_time;
> +	if (sk_is_tcp(sk))
> +		skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, CLOCK_MONOTONIC);
> +	else
> +		skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, sk->sk_clockid);
>   
>   	ip6_cork_steal_dst(skb, cork);
>   	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
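
The mismatch described above can be reproduced with a small user-space model. This is a sketch under assumptions: struct model_sock and both helpers are hypothetical, and model_sk_is_tcp() only mirrors the check as it is described in this mail (socket type plus protocol).

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>		/* IPPROTO_TCP */
#include <sys/socket.h>		/* SOCK_STREAM, SOCK_RAW */

/* Hypothetical stand-in for the two struct sock fields involved. */
struct model_sock {
	int sk_type;
	int sk_protocol;
};

/* Type-and-protocol test, as sk_is_tcp() is described above. */
static bool model_sk_is_tcp(const struct model_sock *sk)
{
	return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP;
}

/* Protocol-only test, for comparison. */
static bool model_sk_proto_is_tcp(const struct model_sock *sk)
{
	return sk->sk_protocol == IPPROTO_TCP;
}

/* Compare a regular TCP socket with the raw control socket. */
static void model_demo(void)
{
	const struct model_sock stream = { SOCK_STREAM, IPPROTO_TCP };
	const struct model_sock ctl = { SOCK_RAW, IPPROTO_TCP };

	assert(model_sk_is_tcp(&stream));
	assert(model_sk_proto_is_tcp(&stream));

	/* The control socket carries TCP but is not SOCK_STREAM. */
	assert(!model_sk_is_tcp(&ctl));
	assert(model_sk_proto_is_tcp(&ctl));
}
```

In this model the raw control socket falls into the else branch of the sk_is_tcp() check, so its packets take their clock base from sk->sk_clockid instead of CLOCK_MONOTONIC.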
Willem de Bruijn May 7, 2024, 11:39 a.m. UTC | #4
Martin KaFai Lau wrote:
> On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index fe86cadfa85b..c3d852eecb01 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
> >   
> >   	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
> >   	skb->mark = cork->mark;
> > -	skb->tstamp = cork->transmit_time;
> > +	if (sk_is_tcp(sk))
> 
> This seems not catching all IPPROTO_TCP case. In particular, the percpu 
> "ipv4_tcp_sk" is SOCK_RAW. sk_is_tcp() is checking SOCK_STREAM:
> 
> void __init tcp_v4_init(void)
> {
> 
> 	/* ... */
> 	res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
> 				   IPPROTO_TCP, &init_net);
> 
> 	/* ... */
> }
> 
> "while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done" 
> failed pretty often exactly in this case.
> 

Interesting. The TCP stack opens non-TCP sockets.

Initializing sk->sk_clockid for this socket should address that.
Abhishek Chauhan (ABC) May 7, 2024, 7:08 p.m. UTC | #5
On 5/7/2024 4:39 AM, Willem de Bruijn wrote:
> 
> Interesting. The TCP stack opens non TCP sockets.
> 
> Initializing sk->sk_clockid for this socket should address that.
> 
Willem, are you referring to your point from the previous patch?

"I think we want to avoid special casing if we can. Note the if.

If TCP always uses monotonic, we could consider initializing
sk_clockid to CLOCK_MONOTONIC in tcp_init_sock.

I guess TCP logic currently entirely ignores sk_clockid. If we are to
start using this, then setsockopt SO_TXTIME must explicitly fail or
ignore for TCP sockets, or silently skip the write.

All of that is more complexity than is maybe warranted for this one
case. So no objections from me to special casing using sk_is_tcp(sk)
either."

There are a few places where we would need to initialize the clock base
for TCP to monotonic:
1. tcp_init_sock
2. void __init tcp_v4_init(void) in tcp_ipv4.c
3. static int __net_init tcpv6_net_init(struct net *net)
4. Ignore setsockopt for SO_TXTIME if sk->protocol is TCP.

Is it safe to assume that TCP will never use any other clock base?


OR


For now, we could do just a protocol-level check in ip_make_skb and
ip6_make_skb, like:
if (iph->protocol == IPPROTO_TCP)
    /* ... */
else
    /* ... */
Willem de Bruijn May 7, 2024, 7:18 p.m. UTC | #6
Abhishek Chauhan (ABC) wrote:
> Willem, Are you suggesting your point from the previous patch ? 
> 

No, just for this custom socket to initialize the sk_clockid. It is
not a TCP socket, but only used by TCP.
Abhishek Chauhan (ABC) May 7, 2024, 7:38 p.m. UTC | #7
On 5/7/2024 12:18 PM, Willem de Bruijn wrote:
> Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 5/7/2024 4:39 AM, Willem de Bruijn wrote:
>>> Martin KaFai Lau wrote:
>>>> On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
>>>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>>>> index fe86cadfa85b..c3d852eecb01 100644
>>>>> --- a/net/ipv4/ip_output.c
>>>>> +++ b/net/ipv4/ip_output.c
>>>>> @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>>>>>   
>>>>>   	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>>>>>   	skb->mark = cork->mark;
>>>>> -	skb->tstamp = cork->transmit_time;
>>>>> +	if (sk_is_tcp(sk))
>>>>
>>>> This seems not catching all IPPROTO_TCP case. In particular, the percpu 
>>>> "ipv4_tcp_sk" is SOCK_RAW. sk_is_tcp() is checking SOCK_STREAM:
>>>>
>>>> void __init tcp_v4_init(void)
>>>> {
>>>>
>>>> 	/* ... */
>>>> 	res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
>>>> 				   IPPROTO_TCP, &init_net);
>>>>
>>>> 	/* ... */
>>>> }
>>>>
>>>> "while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done" 
>>>> failed pretty often exactly in this case.
>>>>
>>>
>>> Interesting. The TCP stack opens non TCP sockets.
>>>
>>> Initializing sk->sk_clockid for this socket should address that.
>>>
>> Willem, Are you suggesting your point from the previous patch ? 
>>
> 
> No, just for this custom socket to initialize the sk_clockid. It is
> not a TCP socket, but only used by TCP.
Thanks Willem,
Noted! That means there are only two places where these custom RAW TCP sockets
are created:

1. tcp_ipv4.c
2. tcp_ipv6.c

I will take care of initializing sk_clockid to monotonic in those two files
in the next patch.

Let me know if I missed anything.
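
To make the effect of that initialization concrete, a small userspace model
follows. It is loosely based on the fragment of
skb_set_delivery_type_by_clockid() visible in the patch; model_skb,
model_tstamp_type and model_set_delivery_type_by_clockid are hypothetical
names, and the kernel helper's WARN_ON_ONCE() is reduced to a comment. Treat
it as a sketch of the clockid-to-type mapping, not as the kernel code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for CLOCK_* in strict ISO C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11		/* Linux value; fallback for older headers */
#endif

/* Hypothetical mirror of enum skb_tstamp_type from the patch. */
enum model_tstamp_type {
	MODEL_SKB_CLOCK_REALTIME,
	MODEL_SKB_CLOCK_MONOTONIC,
	MODEL_SKB_CLOCK_TAI,
};

/* Hypothetical stand-in for the two sk_buff fields involved. */
struct model_skb {
	int64_t tstamp;
	enum model_tstamp_type tstamp_type;
};

/* Map a socket clock id onto the timestamp type stored with the packet. */
static inline void model_set_delivery_type_by_clockid(struct model_skb *skb,
						       int64_t kt, int clockid)
{
	enum model_tstamp_type type = MODEL_SKB_CLOCK_REALTIME;

	switch (clockid) {
	case CLOCK_REALTIME:
		break;
	case CLOCK_MONOTONIC:
		type = MODEL_SKB_CLOCK_MONOTONIC;
		break;
	case CLOCK_TAI:
		type = MODEL_SKB_CLOCK_TAI;
		break;
	default:
		/* The kernel helper warns here; the model just drops the time. */
		kt = 0;
		break;
	}

	skb->tstamp = kt;
	skb->tstamp_type = type;
}
```

In this model, a socket whose clock id is CLOCK_REALTIME produces a
realtime-typed timestamp, while the same socket with its clock id set to
CLOCK_MONOTONIC produces a monotonic-typed one, with no sk_is_tcp() special
case on the send path.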

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index de3915e2bfdb..fe7d8dbef77e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -709,6 +709,8 @@  typedef unsigned char *sk_buff_data_t;
 enum skb_tstamp_type {
 	SKB_CLOCK_REALTIME,
 	SKB_CLOCK_MONOTONIC,
+	SKB_CLOCK_TAI,
+	__SKB_CLOCK_MAX = SKB_CLOCK_TAI,
 };
 
 /**
@@ -829,8 +831,7 @@  enum skb_tstamp_type {
  *	@decrypted: Decrypted SKB
  *	@slow_gro: state present at GRO time, slower prepare step required
  *	@tstamp_type: When set, skb->tstamp has the
- *		delivery_time in mono clock base Otherwise, the
- *		timestamp is considered real clock base.
+ *		delivery_time clock base of skb->tstamp.
  *	@napi_id: id of the NAPI struct this skb came from
  *	@sender_cpu: (aka @napi_id) source CPU in XPS
  *	@alloc_cpu: CPU which did the skb allocation.
@@ -958,7 +959,7 @@  struct sk_buff {
 	/* private: */
 	__u8			__mono_tc_offset[0];
 	/* public: */
-	__u8			tstamp_type:1;	/* See skb_tstamp_type */
+	__u8			tstamp_type:2;	/* See skb_tstamp_type */
 #ifdef CONFIG_NET_XGRESS
 	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
 	__u8			tc_skip_classify:1;
@@ -1088,15 +1089,16 @@  struct sk_buff {
 #endif
 #define PKT_TYPE_OFFSET		offsetof(struct sk_buff, __pkt_type_offset)
 
-/* if you move tc_at_ingress or mono_delivery_time
+/* if you move tc_at_ingress or tstamp_type
  * around, you also must adapt these constants.
  */
 #ifdef __BIG_ENDIAN_BITFIELD
-#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 7)
-#define TC_AT_INGRESS_MASK		(1 << 6)
+#define SKB_TSTAMP_TYPE_MASK		(3 << 6)
+#define SKB_TSTAMP_TYPE_RSHIFT		(6)
+#define TC_AT_INGRESS_MASK		(1 << 5)
 #else
-#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 0)
-#define TC_AT_INGRESS_MASK		(1 << 1)
+#define SKB_TSTAMP_TYPE_MASK		(3)
+#define TC_AT_INGRESS_MASK		(1 << 2)
 #endif
 #define SKB_BF_MONO_TC_OFFSET		offsetof(struct sk_buff, __mono_tc_offset)
 
@@ -4213,6 +4215,9 @@  static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
 	case CLOCK_MONOTONIC:
 		tstamp_type = SKB_CLOCK_MONOTONIC;
 		break;
+	case CLOCK_TAI:
+		tstamp_type = SKB_CLOCK_TAI;
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		kt = 0;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 90706a47f6ff..25ea393cf084 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6207,12 +6207,17 @@  union {					\
 	__u64 :64;			\
 } __attribute__((aligned(8)))
 
+/* The enum used in skb->tstamp_type. It specifies the clock type
+ * of the time stored in the skb->tstamp.
+ */
 enum {
-	BPF_SKB_TSTAMP_UNSPEC,
-	BPF_SKB_TSTAMP_DELIVERY_MONO,	/* tstamp has mono delivery time */
-	/* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
-	 * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
-	 * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
+	BPF_SKB_TSTAMP_UNSPEC = 0,		/* DEPRECATED */
+	BPF_SKB_TSTAMP_DELIVERY_MONO = 1,	/* DEPRECATED */
+	BPF_SKB_CLOCK_REALTIME = 0,
+	BPF_SKB_CLOCK_MONOTONIC = 1,
+	BPF_SKB_CLOCK_TAI = 2,
+	/* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
+	 * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
 	 */
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a3781a796da4..9f3df4a0d1ee 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7726,16 +7726,20 @@  BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
 		return -EOPNOTSUPP;
 
 	switch (tstamp_type) {
-	case BPF_SKB_TSTAMP_DELIVERY_MONO:
+	case BPF_SKB_CLOCK_MONOTONIC:
 		if (!tstamp)
 			return -EINVAL;
 		skb->tstamp = tstamp;
 		skb->tstamp_type = SKB_CLOCK_MONOTONIC;
 		break;
-	case BPF_SKB_TSTAMP_UNSPEC:
-		if (tstamp)
+	case BPF_SKB_CLOCK_TAI:
+		if (!tstamp)
 			return -EINVAL;
-		skb->tstamp = 0;
+		skb->tstamp = tstamp;
+		skb->tstamp_type = SKB_CLOCK_TAI;
+		break;
+	case BPF_SKB_CLOCK_REALTIME:
+		skb->tstamp = tstamp;
 		skb->tstamp_type = SKB_CLOCK_REALTIME;
 		break;
 	default:
@@ -9387,16 +9391,17 @@  static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
 {
 	__u8 value_reg = si->dst_reg;
 	__u8 skb_reg = si->src_reg;
-	/* AX is needed because src_reg and dst_reg could be the same */
-	__u8 tmp_reg = BPF_REG_AX;
-
-	*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
-			      SKB_BF_MONO_TC_OFFSET);
-	*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
-				SKB_MONO_DELIVERY_TIME_MASK, 2);
-	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
-	*insn++ = BPF_JMP_A(1);
-	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
+	BUILD_BUG_ON(__SKB_CLOCK_MAX != (int)BPF_SKB_CLOCK_TAI);
+	BUILD_BUG_ON(SKB_CLOCK_REALTIME != (int)BPF_SKB_CLOCK_REALTIME);
+	BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != (int)BPF_SKB_CLOCK_MONOTONIC);
+	BUILD_BUG_ON(SKB_CLOCK_TAI != (int)BPF_SKB_CLOCK_TAI);
+	*insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
+	*insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK);
+#ifdef __BIG_ENDIAN_BITFIELD
+	*insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSHIFT);
+#else
+	BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1));
+#endif
 
 	return insn;
 }
@@ -9439,10 +9444,11 @@  static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
 		__u8 tmp_reg = BPF_REG_AX;
 
 		*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
-		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
-					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
-		*insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
-					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
+		/* check if the ingress mask bit is set */
+		*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
+		*insn++ = BPF_JMP_A(4);
+		*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_TSTAMP_TYPE_MASK, 1);
+		*insn++ = BPF_JMP_A(2);
 		/* skb->tc_at_ingress && skb->tstamp_type,
 		 * read 0 as the (rcv) timestamp.
 		 */
@@ -9479,7 +9485,7 @@  static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
 		/* goto <store> */
 		*insn++ = BPF_JMP_A(2);
 		/* <clear>: skb->tstamp_type */
-		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
+		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TSTAMP_TYPE_MASK);
 		*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET);
 	}
 #endif
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fe86cadfa85b..c3d852eecb01 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1457,7 +1457,10 @@  struct sk_buff *__ip_make_skb(struct sock *sk,
 
 	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
 	skb->mark = cork->mark;
-	skb->tstamp = cork->transmit_time;
+	if (sk_is_tcp(sk))
+		skb_set_delivery_type_by_clockid(skb, cork->transmit_time, CLOCK_MONOTONIC);
+	else
+		skb_set_delivery_type_by_clockid(skb, cork->transmit_time, sk->sk_clockid);
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
 	 * on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 4cb43401e0e0..1a0953650356 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -360,7 +360,7 @@  static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	skb->protocol = htons(ETH_P_IP);
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
-	skb->tstamp = sockc->transmit_time;
+	skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
 	skb_dst_set(skb, &rt->dst);
 	*rtp = NULL;
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 05067bd44775..797a9764e8fe 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1924,7 +1924,10 @@  struct sk_buff *__ip6_make_skb(struct sock *sk,
 
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = cork->base.mark;
-	skb->tstamp = cork->base.transmit_time;
+	if (sk_is_tcp(sk))
+		skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, CLOCK_MONOTONIC);
+	else
+		skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, sk->sk_clockid);
 
 	ip6_cork_steal_dst(skb, cork);
 	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 2eedf255600b..f838366e8256 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -621,7 +621,7 @@  static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 	skb->protocol = htons(ETH_P_IPV6);
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
-	skb->tstamp = sockc->transmit_time;
+	skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
 
 	skb_put(skb, length);
 	skb_reset_network_header(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8c6d3fbb4ed8..89b54021d196 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2056,8 +2056,7 @@  static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 	skb->dev = dev;
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = READ_ONCE(sk->sk_mark);
-	skb->tstamp = sockc.transmit_time;
-
+	skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);
 	skb_setup_tx_timestamp(skb, sockc.tsflags);
 
 	if (unlikely(extra_len == 4))
@@ -2585,7 +2584,7 @@  static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->dev = dev;
 	skb->priority = READ_ONCE(po->sk.sk_priority);
 	skb->mark = READ_ONCE(po->sk.sk_mark);
-	skb->tstamp = sockc->transmit_time;
+	skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, po->sk.sk_clockid);
 	skb_setup_tx_timestamp(skb, sockc->tsflags);
 	skb_zcopy_set_nouarg(skb, ph.raw);
 
@@ -3063,7 +3062,7 @@  static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	skb->dev = dev;
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc.mark;
-	skb->tstamp = sockc.transmit_time;
+	skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);
 
 	if (unlikely(extra_len == 4))
 		skb->no_fcs = 1;
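
For readers following the mask changes in skbuff.h and filter.c, the sketch
below models in userspace C how the two-bit tstamp_type is extracted from the
shared byte in both bitfield layouts. The mask and shift values are copied
from the patch; the MODEL_ names and the pack/read helpers are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror enum skb_tstamp_type in the patch; names are hypothetical. */
enum model_tstamp_type {
	MODEL_CLOCK_REALTIME,
	MODEL_CLOCK_MONOTONIC,
	MODEL_CLOCK_TAI,
};

/* Little-endian bitfield layout: type in bits 0-1, ingress flag in bit 2. */
#define MODEL_LE_TSTAMP_TYPE_MASK	(3)
#define MODEL_LE_TC_AT_INGRESS_MASK	(1 << 2)

/* Big-endian bitfield layout: type in bits 7-6, ingress flag in bit 5. */
#define MODEL_BE_TSTAMP_TYPE_MASK	(3 << 6)
#define MODEL_BE_TSTAMP_TYPE_RSHIFT	(6)
#define MODEL_BE_TC_AT_INGRESS_MASK	(1 << 5)

/* Build the shared byte as the little-endian layout would store it. */
static inline uint8_t model_le_pack(unsigned int type, int ingress)
{
	return (uint8_t)((type & 3u) |
			 (ingress ? MODEL_LE_TC_AT_INGRESS_MASK : 0));
}

/* Little-endian read: masking alone yields the enum value. */
static inline unsigned int model_le_read_type(uint8_t byte)
{
	return byte & MODEL_LE_TSTAMP_TYPE_MASK;
}

/* Build the shared byte as the big-endian layout would store it. */
static inline uint8_t model_be_pack(unsigned int type, int ingress)
{
	return (uint8_t)(((type & 3u) << MODEL_BE_TSTAMP_TYPE_RSHIFT) |
			 (ingress ? MODEL_BE_TC_AT_INGRESS_MASK : 0));
}

/* Big-endian read: mask, then shift the two bits down to bit 0. */
static inline unsigned int model_be_read_type(uint8_t byte)
{
	return (byte & MODEL_BE_TSTAMP_TYPE_MASK) >> MODEL_BE_TSTAMP_TYPE_RSHIFT;
}
```

The little-endian read needs no shift because the field already starts at
bit 0, which is what the BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1)) in the
patch guards; the big-endian read needs the extra right shift by 6.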