Message ID | 20240409210547.3815806-4-quic_abchauha@quicinc.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | Rename mono_delivery_time to | expand |
Abhishek Chauhan wrote: > tstamp_type can be real, mono or userspace timestamp. > > This commit adds userspace timestamp and sets it if there is > valid transmit_time available in socket coming from userspace. > > To make the design scalable for future needs this commit bring in > the change to extend the tstamp_type:1 to tstamp_type:2 to support > userspace timestamp. > > Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ > Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > --- > include/linux/skbuff.h | 19 +++++++++++++++++-- > net/ipv4/ip_output.c | 2 +- > net/ipv4/raw.c | 2 +- > net/ipv6/ip6_output.c | 2 +- > net/ipv6/raw.c | 2 +- > net/packet/af_packet.c | 6 +++--- > 6 files changed, 24 insertions(+), 9 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 6160185f0fe0..2f91a8a2157a 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -705,6 +705,9 @@ typedef unsigned char *sk_buff_data_t; > enum skb_tstamp_type { > SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ > SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ > + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock > + * is in skb->sk->sk_clockid. > + */ Weird indentation? More fundamentally: instead of defining a type TX_USER, can we use a real clockid (e.g., CLOCK_TAI) based on skb->sk->sk_clockid? Rather than store an id that means "go look at sk_clockid". > }; > > /** > @@ -830,6 +833,9 @@ enum skb_tstamp_type { > * delivery_time in mono clock base (i.e. EDT). Otherwise, the > * skb->tstamp has the (rcv) timestamp at ingress and > * delivery_time at egress. > + * delivery_time in mono clock base (i.e., EDT) or a clock base chosen > + * by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at > + * ingress. > * @napi_id: id of the NAPI struct this skb came from > * @sender_cpu: (aka @napi_id) source CPU in XPS > * @alloc_cpu: CPU which did the skb allocation. > @@ -960,7 +966,7 @@ struct sk_buff { > /* private: */ > __u8 __mono_tc_offset[0]; > /* public: */ > - __u8 tstamp_type:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ > + __u8 tstamp_type:2; /* See SKB_MONO_DELIVERY_TIME_MASK */ > #ifdef CONFIG_NET_XGRESS > __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ > __u8 tc_skip_classify:1; With pahole, does this have an effect on sk_buff layout? > @@ -4274,7 +4280,16 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, > enum skb_tstamp_type tstamp_type) > { > skb->tstamp = kt; > - skb->tstamp_type = kt && tstamp_type; > + > + if (skb->tstamp_type) > + return; > + Why bail if a type is already set? And what if skb->tstamp_type != tstamp_type? Should skb->tstamp then not be written to (i.e., the test moved up), and perhaps a rate limited warning. > + if (kt && tstamp_type == SKB_TSTAMP_TYPE_TX_MONO) > + skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO; > + > + if (kt && tstamp_type == SKB_TSTAMP_TYPE_TX_USER) > + skb->tstamp_type = SKB_TSTAMP_TYPE_TX_USER; Simpler if (kt) skb->tstamp_type = tstamp_type;
On 4/10/2024 8:42 AM, Willem de Bruijn wrote: > Abhishek Chauhan wrote: >> tstamp_type can be real, mono or userspace timestamp. >> >> This commit adds userspace timestamp and sets it if there is >> valid transmit_time available in socket coming from userspace. >> >> To make the design scalable for future needs this commit bring in >> the change to extend the tstamp_type:1 to tstamp_type:2 to support >> userspace timestamp. >> >> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> >> --- >> include/linux/skbuff.h | 19 +++++++++++++++++-- >> net/ipv4/ip_output.c | 2 +- >> net/ipv4/raw.c | 2 +- >> net/ipv6/ip6_output.c | 2 +- >> net/ipv6/raw.c | 2 +- >> net/packet/af_packet.c | 6 +++--- >> 6 files changed, 24 insertions(+), 9 deletions(-) >> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >> index 6160185f0fe0..2f91a8a2157a 100644 >> --- a/include/linux/skbuff.h >> +++ b/include/linux/skbuff.h >> @@ -705,6 +705,9 @@ typedef unsigned char *sk_buff_data_t; >> enum skb_tstamp_type { >> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >> + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock >> + * is in skb->sk->sk_clockid. >> + */ > > Weird indentation? > I will correct it. > More fundamentally: instead of defining a type TX_USER, can we use a > real clockid (e.g., CLOCK_TAI) based on skb->sk->sk_clockid? Rather > than store an id that means "go look at sk_clockid". > >> }; >> >> /** >> @@ -830,6 +833,9 @@ enum skb_tstamp_type { >> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >> * skb->tstamp has the (rcv) timestamp at ingress and >> * delivery_time at egress. >> + * delivery_time in mono clock base (i.e., EDT) or a clock base chosen >> + * by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at >> + * ingress. >> * @napi_id: id of the NAPI struct this skb came from >> * @sender_cpu: (aka @napi_id) source CPU in XPS >> * @alloc_cpu: CPU which did the skb allocation. >> @@ -960,7 +966,7 @@ struct sk_buff { >> /* private: */ >> __u8 __mono_tc_offset[0]; >> /* public: */ >> - __u8 tstamp_type:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ >> + __u8 tstamp_type:2; /* See SKB_MONO_DELIVERY_TIME_MASK */ >> #ifdef CONFIG_NET_XGRESS >> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >> __u8 tc_skip_classify:1; > > With pahole, does this have an effect on sk_buff layout? > I think it does and it also impacts BPF testing. Hence in my cover letter i have mentioned that these changes will impact BPF. My level of expertise is very limited to BPF hence the reason for RFC. That being said i am actually trying to understand/learn BPF instructions to know things better. I think we need to also change the offset SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK #ifdef __BIG_ENDIAN_BITFIELD #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) //Suspecting changes here too #define TC_AT_INGRESS_MASK (1 << 6) // and here #else #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) #define TC_AT_INGRESS_MASK (1 << 1) (this might have to change to 1<<2 ) #endif #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) Also i suspect i change in /selftests/bpf/prog_tests/ctx_rewrite.c I am trying to figure out what this part of the code is doing. https://lore.kernel.org/all/20230321014115.997841-4-kuba@kernel.org/ Please correct me if i am wrong here. >> @@ -4274,7 +4280,16 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >> enum skb_tstamp_type tstamp_type) >> { >> skb->tstamp = kt; >> - skb->tstamp_type = kt && tstamp_type; >> + >> + if (skb->tstamp_type) >> + return; >> + > I can put a warn on here incase if both MONO and TAI are set. OR Rather make it simple as you have mentioned below. > Why bail if a type is already set? And what if > skb->tstamp_type != tstamp_type? Should skb->tstamp then not be > written to (i.e., the test moved up), and perhaps a rate limited > warning. > >> + if (kt && tstamp_type == SKB_TSTAMP_TYPE_TX_MONO) >> + skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO; >> + >> + if (kt && tstamp_type == SKB_TSTAMP_TYPE_TX_USER) >> + skb->tstamp_type = SKB_TSTAMP_TYPE_TX_USER; > > Simpler > > if (kt) > skb->tstamp_type = tstamp_type;
Abhishek Chauhan (ABC) wrote: > > > On 4/10/2024 8:42 AM, Willem de Bruijn wrote: > > Abhishek Chauhan wrote: > >> tstamp_type can be real, mono or userspace timestamp. > >> > >> This commit adds userspace timestamp and sets it if there is > >> valid transmit_time available in socket coming from userspace. > >> > >> To make the design scalable for future needs this commit bring in > >> the change to extend the tstamp_type:1 to tstamp_type:2 to support > >> userspace timestamp. > >> > >> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ > >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > > With pahole, does this have an effect on sk_buff layout? > > > I think it does and it also impacts BPF testing. Hence in my cover letter i have mentioned that these > changes will impact BPF. My level of expertise is very limited to BPF hence the reason for RFC. > That being said i am actually trying to understand/learn BPF instructions to know things better. > I think we need to also change the offset SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK > > > #ifdef __BIG_ENDIAN_BITFIELD > #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) //Suspecting changes here too > #define TC_AT_INGRESS_MASK (1 << 6) // and here > #else > #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) > #define TC_AT_INGRESS_MASK (1 << 1) (this might have to change to 1<<2 ) > #endif > #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) > > Also i suspect i change in /selftests/bpf/prog_tests/ctx_rewrite.c > I am trying to figure out what this part of the code is doing. > https://lore.kernel.org/all/20230321014115.997841-4-kuba@kernel.org/ > > Please correct me if i am wrong here. I broadly agree. We should convert all references to SKB_MONO_DELIVERY_TIME_MASK to an skb_tstamp_type equivalent. > > >> @@ -4274,7 +4280,16 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, > >> enum skb_tstamp_type tstamp_type) > >> { > >> skb->tstamp = kt; > >> - skb->tstamp_type = kt && tstamp_type; > >> + > >> + if (skb->tstamp_type) > >> + return; > >> + > > > I can put a warn on here incase if both MONO and TAI are set. > OR > Rather make it simple as you have mentioned below. When might skb->tstamp_type already be non-zero when skb_set_deliver_type is called? In most cases, this is called for a fresh skb. In br_ip6_fragment, it is called with a previous value. But this is the value of skb->tstamp_type. It just clears it if kt is 0. If skb->tstamp_type != tstamp_type is not a condition that can be forced by an unprivileged user, then we can warn.
On 4/10/24 1:25 PM, Abhishek Chauhan (ABC) wrote: >>> @@ -830,6 +833,9 @@ enum skb_tstamp_type { >>> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >>> * skb->tstamp has the (rcv) timestamp at ingress and >>> * delivery_time at egress. >>> + * delivery_time in mono clock base (i.e., EDT) or a clock base chosen >>> + * by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at >>> + * ingress. >>> * @napi_id: id of the NAPI struct this skb came from >>> * @sender_cpu: (aka @napi_id) source CPU in XPS >>> * @alloc_cpu: CPU which did the skb allocation. >>> @@ -960,7 +966,7 @@ struct sk_buff { >>> /* private: */ >>> __u8 __mono_tc_offset[0]; >>> /* public: */ >>> - __u8 tstamp_type:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ >>> + __u8 tstamp_type:2; /* See SKB_MONO_DELIVERY_TIME_MASK */ >>> #ifdef CONFIG_NET_XGRESS >>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ The above "tstamp_type:2" change shifted the tc_at_ingress bit. TC_AT_INGRESS_MASK needs to be adjusted. >>> __u8 tc_skip_classify:1; >> >> With pahole, does this have an effect on sk_buff layout? >> > I think it does and it also impacts BPF testing. Hence in my cover letter i have mentioned that these > changes will impact BPF. My level of expertise is very limited to BPF hence the reason for RFC. > That being said i am actually trying to understand/learn BPF instructions to know things better. > I think we need to also change the offset SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK > > > #ifdef __BIG_ENDIAN_BITFIELD > #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) //Suspecting changes here too > #define TC_AT_INGRESS_MASK (1 << 6) // and here > #else > #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) > #define TC_AT_INGRESS_MASK (1 << 1) (this might have to change to 1<<2 ) This should be (1 << 2) now. Similar adjustment for the big endian. > #endif > #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) > > Also i suspect i change in /selftests/bpf/prog_tests/ctx_rewrite.c ctx_rewrite.c tests the bpf ctx rewrite code. In this particular case, it tests the bpf_convert_tstamp_read() and bpf_convert_tstamp_write() generate the correct bpf instructions. e.g. "w11 &= 3;" is testing the following in bpf_convert_tstamp_read(): *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); The existing "TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK" is 0x3 and it should become 0x5 if my hand counts correctly. The patch set cannot be applied to the bpf-next: https://patchwork.kernel.org/project/netdevbpf/patch/20240409210547.3815806-4-quic_abchauha@quicinc.com/ , so bpf CI cannot run to reproduce the issue.
On 4/10/2024 4:25 PM, Martin KaFai Lau wrote: > On 4/10/24 1:25 PM, Abhishek Chauhan (ABC) wrote: >>>> @@ -830,6 +833,9 @@ enum skb_tstamp_type { >>>> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >>>> * skb->tstamp has the (rcv) timestamp at ingress and >>>> * delivery_time at egress. >>>> + * delivery_time in mono clock base (i.e., EDT) or a clock base chosen >>>> + * by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at >>>> + * ingress. >>>> * @napi_id: id of the NAPI struct this skb came from >>>> * @sender_cpu: (aka @napi_id) source CPU in XPS >>>> * @alloc_cpu: CPU which did the skb allocation. >>>> @@ -960,7 +966,7 @@ struct sk_buff { >>>> /* private: */ >>>> __u8 __mono_tc_offset[0]; >>>> /* public: */ >>>> - __u8 tstamp_type:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ >>>> + __u8 tstamp_type:2; /* See SKB_MONO_DELIVERY_TIME_MASK */ >>>> #ifdef CONFIG_NET_XGRESS >>>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ > > The above "tstamp_type:2" change shifted the tc_at_ingress bit. > TC_AT_INGRESS_MASK needs to be adjusted. > >>>> __u8 tc_skip_classify:1; >>> >>> With pahole, does this have an effect on sk_buff layout? >>> >> I think it does and it also impacts BPF testing. Hence in my cover letter i have mentioned that these >> changes will impact BPF. My level of expertise is very limited to BPF hence the reason for RFC. >> That being said i am actually trying to understand/learn BPF instructions to know things better. >> I think we need to also change the offset SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK >> >> >> #ifdef __BIG_ENDIAN_BITFIELD >> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) //Suspecting changes here too >> #define TC_AT_INGRESS_MASK (1 << 6) // and here >> #else >> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >> #define TC_AT_INGRESS_MASK (1 << 1) (this might have to change to 1<<2 ) > > This should be (1 << 2) now. Similar adjustment for the big endian. > >> #endif >> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >> >> Also i suspect i change in /selftests/bpf/prog_tests/ctx_rewrite.c > > ctx_rewrite.c tests the bpf ctx rewrite code. In this particular case, it tests > the bpf_convert_tstamp_read() and bpf_convert_tstamp_write() generate the > correct bpf instructions. > e.g. "w11 &= 3;" is testing the following in bpf_convert_tstamp_read(): > *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, > TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); > > The existing "TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK" is 0x3 > and it should become 0x5 if my hand counts correctly. > so the changes will be as follows (Martin correct me if am wrong) //w11 is checked againt 0x5 (Binary = 101) N(SCHED_CLS, struct __sk_buff, tstamp), .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);" "w11 &= 5;" <== here "if w11 != 0x5 goto pc+2;" <==here "$dst = 0;" "goto pc+1;" "$dst = *(u64 *)($ctx + sk_buff::tstamp);", //w11 is checked againt 0x4 (100) .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);" "if w11 & 0x4 goto pc+1;" <== here "goto pc+2;" "w11 &= -4;" <==here "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;" "*(u64 *)($ctx + sk_buff::tstamp) = $src;", > The patch set cannot be applied to the bpf-next: > https://patchwork.kernel.org/project/netdevbpf/patch/20240409210547.3815806-4-quic_abchauha@quicinc.com/ > , so bpf CI cannot run to reproduce the issue. >
On 4/10/2024 4:39 PM, Abhishek Chauhan (ABC) wrote: > > > On 4/10/2024 4:25 PM, Martin KaFai Lau wrote: >> On 4/10/24 1:25 PM, Abhishek Chauhan (ABC) wrote: >>>>> @@ -830,6 +833,9 @@ enum skb_tstamp_type { >>>>> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >>>>> * skb->tstamp has the (rcv) timestamp at ingress and >>>>> * delivery_time at egress. >>>>> + * delivery_time in mono clock base (i.e., EDT) or a clock base chosen >>>>> + * by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at >>>>> + * ingress. >>>>> * @napi_id: id of the NAPI struct this skb came from >>>>> * @sender_cpu: (aka @napi_id) source CPU in XPS >>>>> * @alloc_cpu: CPU which did the skb allocation. >>>>> @@ -960,7 +966,7 @@ struct sk_buff { >>>>> /* private: */ >>>>> __u8 __mono_tc_offset[0]; >>>>> /* public: */ >>>>> - __u8 tstamp_type:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ >>>>> + __u8 tstamp_type:2; /* See SKB_MONO_DELIVERY_TIME_MASK */ >>>>> #ifdef CONFIG_NET_XGRESS >>>>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >> >> The above "tstamp_type:2" change shifted the tc_at_ingress bit. >> TC_AT_INGRESS_MASK needs to be adjusted. >> >>>>> __u8 tc_skip_classify:1; >>>> >>>> With pahole, does this have an effect on sk_buff layout? >>>> >>> I think it does and it also impacts BPF testing. Hence in my cover letter i have mentioned that these >>> changes will impact BPF. My level of expertise is very limited to BPF hence the reason for RFC. >>> That being said i am actually trying to understand/learn BPF instructions to know things better. >>> I think we need to also change the offset SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK >>> >>> >>> #ifdef __BIG_ENDIAN_BITFIELD >>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) //Suspecting changes here too >>> #define TC_AT_INGRESS_MASK (1 << 6) // and here >>> #else >>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >>> #define TC_AT_INGRESS_MASK (1 << 1) (this might have to change to 1<<2 ) >> >> This should be (1 << 2) now. Similar adjustment for the big endian. >> >>> #endif >>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >>> >>> Also i suspect i change in /selftests/bpf/prog_tests/ctx_rewrite.c >> >> ctx_rewrite.c tests the bpf ctx rewrite code. In this particular case, it tests >> the bpf_convert_tstamp_read() and bpf_convert_tstamp_write() generate the >> correct bpf instructions. >> e.g. "w11 &= 3;" is testing the following in bpf_convert_tstamp_read(): >> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >> >> The existing "TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK" is 0x3 >> and it should become 0x5 if my hand counts correctly. >> > > so the changes will be as follows (Martin correct me if am wrong) > > //w11 is checked againt 0x5 (Binary = 101) > N(SCHED_CLS, struct __sk_buff, tstamp), > .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);" > "w11 &= 5;" <== here > "if w11 != 0x5 goto pc+2;" <==here > "$dst = 0;" > "goto pc+1;" > "$dst = *(u64 *)($ctx + sk_buff::tstamp);", > > //w11 is checked againt 0x4 (100) > .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);" > "if w11 & 0x4 goto pc+1;" <== here > "goto pc+2;" > "w11 &= -4;" <==here > "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;" > "*(u64 *)($ctx + sk_buff::tstamp) = $src;", > > Martin and Willem, After the above changes, patchset v3 of these changes passed BPF test cases . Looks like we are good to go with final review now. If you have any further comments Thank you for all the comments and design discussion that we had as part of this patch set series. Testing :- 1. https://patchwork.kernel.org/project/netdevbpf/patch/20240412210125.1780574-2-quic_abchauha@quicinc.com/ 2. https://patchwork.kernel.org/project/netdevbpf/patch/20240412210125.1780574-3-quic_abchauha@quicinc.com/ >> The patch set cannot be applied to the bpf-next: >> https://patchwork.kernel.org/project/netdevbpf/patch/20240409210547.3815806-4-quic_abchauha@quicinc.com/ >> , so bpf CI cannot run to reproduce the issue. >>
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 6160185f0fe0..2f91a8a2157a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -705,6 +705,9 @@ typedef unsigned char *sk_buff_data_t; enum skb_tstamp_type { SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock + * is in skb->sk->sk_clockid. + */ }; /** @@ -830,6 +833,9 @@ enum skb_tstamp_type { * delivery_time in mono clock base (i.e. EDT). Otherwise, the * skb->tstamp has the (rcv) timestamp at ingress and * delivery_time at egress. + * delivery_time in mono clock base (i.e., EDT) or a clock base chosen + * by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at + * ingress. * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS * @alloc_cpu: CPU which did the skb allocation. @@ -960,7 +966,7 @@ struct sk_buff { /* private: */ __u8 __mono_tc_offset[0]; /* public: */ - __u8 tstamp_type:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ + __u8 tstamp_type:2; /* See SKB_MONO_DELIVERY_TIME_MASK */ #ifdef CONFIG_NET_XGRESS __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ __u8 tc_skip_classify:1; @@ -4274,7 +4280,16 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, enum skb_tstamp_type tstamp_type) { skb->tstamp = kt; - skb->tstamp_type = kt && tstamp_type; + + if (skb->tstamp_type) + return; + + if (kt && tstamp_type == SKB_TSTAMP_TYPE_TX_MONO) + skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO; + + if (kt && tstamp_type == SKB_TSTAMP_TYPE_TX_USER) + skb->tstamp_type = SKB_TSTAMP_TYPE_TX_USER; + } DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 62e457f7c02c..9aea6e810f52 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); skb->mark = cork->mark; - skb->tstamp = cork->transmit_time; + skb_set_delivery_time(skb, cork->transmit_time, SKB_TSTAMP_TYPE_TX_USER); /* * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec * on dst refcount diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index dcb11f22cbf2..d8f52bc06ed3 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -360,7 +360,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, skb->protocol = htons(ETH_P_IP); skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; - skb->tstamp = sockc->transmit_time; + skb_set_delivery_time(skb, sockc->transmit_time, SKB_TSTAMP_TYPE_TX_USER); skb_dst_set(skb, &rt->dst); *rtp = NULL; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a9e819115622..2beb9fc8c0b1 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1924,7 +1924,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = cork->base.mark; - skb->tstamp = cork->base.transmit_time; + skb_set_delivery_time(skb, cork->base.transmit_time, SKB_TSTAMP_TYPE_TX_USER); ip6_cork_steal_dst(skb, cork); IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 0d896ca7b589..3a68ca80bf83 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -621,7 +621,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb->protocol = htons(ETH_P_IPV6); skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; - skb->tstamp = sockc->transmit_time; + skb_set_delivery_time(skb, sockc->transmit_time, SKB_TSTAMP_TYPE_TX_USER); skb_put(skb, length); skb_reset_network_header(skb); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 18f616f487ea..27ea972dfc56 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2056,7 +2056,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, skb->dev = dev; skb->priority = READ_ONCE(sk->sk_priority); skb->mark = READ_ONCE(sk->sk_mark); - skb->tstamp = sockc.transmit_time; + skb_set_delivery_time(skb, sockc.transmit_time, SKB_TSTAMP_TYPE_TX_USER); skb_setup_tx_timestamp(skb, sockc.tsflags); @@ -2585,7 +2585,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, skb->dev = dev; skb->priority = READ_ONCE(po->sk.sk_priority); skb->mark = READ_ONCE(po->sk.sk_mark); - skb->tstamp = sockc->transmit_time; + skb_set_delivery_time(skb, sockc->transmit_time, SKB_TSTAMP_TYPE_TX_USER); skb_setup_tx_timestamp(skb, sockc->tsflags); skb_zcopy_set_nouarg(skb, ph.raw); @@ -3063,7 +3063,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) skb->dev = dev; skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc.mark; - skb->tstamp = sockc.transmit_time; + skb_set_delivery_time(skb, sockc.transmit_time, SKB_TSTAMP_TYPE_TX_USER); if (unlikely(extra_len == 4)) skb->no_fcs = 1;
tstamp_type can be real, mono or userspace timestamp. This commit adds userspace timestamp and sets it if there is valid transmit_time available in socket coming from userspace. To make the design scalable for future needs this commit bring in the change to extend the tstamp_type:1 to tstamp_type:2 to support userspace timestamp. Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> --- include/linux/skbuff.h | 19 +++++++++++++++++-- net/ipv4/ip_output.c | 2 +- net/ipv4/raw.c | 2 +- net/ipv6/ip6_output.c | 2 +- net/ipv6/raw.c | 2 +- net/packet/af_packet.c | 6 +++--- 6 files changed, 24 insertions(+), 9 deletions(-)