Message ID | 20240228011219.1119105-1-quic_abchauha@quicinc.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,v2] net: Modify mono_delivery_time with clockid_delivery_time | expand |
Abhishek Chauhan wrote: > Bridge driver today has no support to forward the userspace timestamp > packets and ends up resetting the timestamp. ETF qdisc checks the > packet coming from userspace and encounters to be 0 thereby dropping > time sensitive packets. These changes will allow userspace timestamps > packets to be forwarded from the bridge to NIC drivers. > > Existing functionality of mono_delivery_time is not altered here > instead just extended with userspace tstamp support for bridge > forwarding path. > > Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > --- > Changes since v1 > - Changed the commit subject as i am modifying the mono_delivery_time > bit with clockid_delivery_time. > - Took care of suggestion mentioned by Willem to use the same bit for > userspace delivery time as there are no conflicts between TCP and > SCM_TXTIME, because explicit cmsg makes no sense for TCP and only > RAW and DGRAM sockets interprets it. The variable rename churn makes it hard to spot the functional changes. Perhaps it makes sense just keep the variable name as is, even though the "mono" is not always technically correct anymore. Or else to split into two patches. One that renames the field. And one that adds the new behavior of setting the bit for SO_TXTIME. Sidenote: with sk_clockid, FQ could detect when skb->tstamp is not set in monotonic (i.e., set by SO_TXTIME) and drop the packet or ignore the embedded timestamp, warn, etc. > - Clear explaination of why this is needed mentioned below and this > is extending the work done by Martin for mono_delivery_time > https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/ > - Version 1 patch can be referenced with below link which states > the exact problem with tc-etf and discussions which took place > https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/ > > include/linux/skbuff.h | 22 +++++++++++++--------- > net/bridge/netfilter/nf_conntrack_bridge.c | 2 +- > net/core/dev.c | 2 +- > net/core/filter.c | 6 +++--- > net/ieee802154/6lowpan/reassembly.c | 2 +- > net/ipv4/inet_fragment.c | 2 +- > net/ipv4/ip_fragment.c | 2 +- > net/ipv4/ip_output.c | 13 +++++++++++-- > net/ipv4/raw.c | 9 +++++++++ > net/ipv6/ip6_output.c | 12 ++++++++++-- > net/ipv6/netfilter.c | 2 +- > net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- > net/ipv6/raw.c | 10 +++++++++- > net/ipv6/reassembly.c | 2 +- > net/sched/act_bpf.c | 4 ++-- > net/sched/cls_bpf.c | 4 ++-- > 16 files changed, 67 insertions(+), 29 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 2dde34c29203..24a34d56cfa3 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -816,10 +816,14 @@ typedef unsigned char *sk_buff_data_t; > * @dst_pending_confirm: need to confirm neighbour > * @decrypted: Decrypted SKB > * @slow_gro: state present at GRO time, slower prepare step required > - * @mono_delivery_time: When set, skb->tstamp has the > + * @clockid_delivery_time: When set, skb->tstamp has the > * delivery_time in mono clock base (i.e. EDT). Otherwise, the This is not true anymore. Perhaps "delivery_time in sk_clockid clock base" > * skb->tstamp has the (rcv) timestamp at ingress and > * delivery_time at egress. > + * This bit is also set if the tstamp is set from userspace which > + * acts as an information in the bridge forwarding path to net > + * reset the tstamp value when user sets the timestamp using > + * SO_TXTIME sockopts > * @napi_id: id of the NAPI struct this skb came from > * @sender_cpu: (aka @napi_id) source CPU in XPS > * @alloc_cpu: CPU which did the skb allocation. > @@ -950,7 +954,7 @@ struct sk_buff { > /* private: */ > __u8 __mono_tc_offset[0]; > /* public: */ > - __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ > + __u8 clockid_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ > #ifdef CONFIG_NET_XGRESS > __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ > __u8 tc_skip_classify:1; > @@ -4172,7 +4176,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb, > static inline void __net_timestamp(struct sk_buff *skb) > { > skb->tstamp = ktime_get_real(); > - skb->mono_delivery_time = 0; > + skb->clockid_delivery_time = 0; > } > > static inline ktime_t net_timedelta(ktime_t t) > @@ -4184,7 +4188,7 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, > bool mono) > { > skb->tstamp = kt; > - skb->mono_delivery_time = kt && mono; > + skb->clockid_delivery_time = kt && mono; > } > > DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); > @@ -4194,8 +4198,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); > */ > static inline void skb_clear_delivery_time(struct sk_buff *skb) > { > - if (skb->mono_delivery_time) { > - skb->mono_delivery_time = 0; > + if (skb->clockid_delivery_time) { > + skb->clockid_delivery_time = 0; > if (static_branch_unlikely(&netstamp_needed_key)) > skb->tstamp = ktime_get_real(); > else > @@ -4205,7 +4209,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb) > > static inline void skb_clear_tstamp(struct sk_buff *skb) > { > - if (skb->mono_delivery_time) > + if (skb->clockid_delivery_time) > return; > > skb->tstamp = 0; > @@ -4213,7 +4217,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb) > > static inline ktime_t skb_tstamp(const struct sk_buff *skb) > { > - if (skb->mono_delivery_time) > + if (skb->clockid_delivery_time) > return 0; > > return skb->tstamp; > @@ -4221,7 +4225,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb) > > static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond) > { > - if (!skb->mono_delivery_time && skb->tstamp) > + if (!skb->clockid_delivery_time && skb->tstamp) > return skb->tstamp; > > if (static_branch_unlikely(&netstamp_needed_key) || cond) > diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c > index abb090f94ed2..cfc9eeaf23fb 100644 > --- a/net/bridge/netfilter/nf_conntrack_bridge.c > +++ b/net/bridge/netfilter/nf_conntrack_bridge.c > @@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, > struct sk_buff *)) > { > int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; > - bool mono_delivery_time = skb->mono_delivery_time; > + bool mono_delivery_time = skb->clockid_delivery_time; > unsigned int hlen, ll_rs, mtu; > ktime_t tstamp = skb->tstamp; > struct ip_frag_state state; > diff --git a/net/core/dev.c b/net/core/dev.c > index bd230ce8fa79..2d6556444f55 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2137,7 +2137,7 @@ EXPORT_SYMBOL(net_disable_timestamp); > static inline void net_timestamp_set(struct sk_buff *skb) > { > skb->tstamp = 0; > - skb->mono_delivery_time = 0; > + skb->clockid_delivery_time = 0; > if (static_branch_unlikely(&netstamp_needed_key)) > skb->tstamp = ktime_get_real(); > } > diff --git a/net/core/filter.c b/net/core/filter.c > index 524adf1fa6d0..c653cff185cb 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -7698,13 +7698,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb, > if (!tstamp) > return -EINVAL; > skb->tstamp = tstamp; > - skb->mono_delivery_time = 1; > + skb->clockid_delivery_time = 1; > break; > case BPF_SKB_TSTAMP_UNSPEC: > if (tstamp) > return -EINVAL; > skb->tstamp = 0; > - skb->mono_delivery_time = 0; > + skb->clockid_delivery_time = 0; > break; > default: > return -EINVAL; > @@ -9413,7 +9413,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, > TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); > *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, > TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); > - /* skb->tc_at_ingress && skb->mono_delivery_time, > + /* skb->tc_at_ingress && skb->clockid_delivery_time, > * read 0 as the (rcv) timestamp. > */ > *insn++ = BPF_MOV64_IMM(value_reg, 0); > diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c > index 6dd960ec558c..651ca0a81e61 100644 > --- a/net/ieee802154/6lowpan/reassembly.c > +++ b/net/ieee802154/6lowpan/reassembly.c > @@ -130,7 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq, > goto err; > > fq->q.stamp = skb->tstamp; > - fq->q.mono_delivery_time = skb->mono_delivery_time; > + fq->q.mono_delivery_time = skb->clockid_delivery_time; > if (frag_type == LOWPAN_DISPATCH_FRAG1) > fq->q.flags |= INET_FRAG_FIRST_IN; > > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c > index 7072fc0783ef..71958ea6ffd6 100644 > --- a/net/ipv4/inet_fragment.c > +++ b/net/ipv4/inet_fragment.c > @@ -578,7 +578,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, > skb_mark_not_on_list(head); > head->prev = NULL; > head->tstamp = q->stamp; > - head->mono_delivery_time = q->mono_delivery_time; > + head->clockid_delivery_time = q->mono_delivery_time; > } > EXPORT_SYMBOL(inet_frag_reasm_finish); > > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c > index a4941f53b523..6f41fdfda84e 100644 > --- a/net/ipv4/ip_fragment.c > +++ b/net/ipv4/ip_fragment.c > @@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) > qp->iif = dev->ifindex; > > qp->q.stamp = skb->tstamp; > - qp->q.mono_delivery_time = skb->mono_delivery_time; > + qp->q.mono_delivery_time = skb->clockid_delivery_time; > qp->q.meat += skb->len; > qp->ecn |= ecn; > add_frag_mem_limit(qp->q.fqdir, skb->truesize); > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 5b5a0adb927f..1b31171b58a4 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, > { > struct iphdr *iph; > struct sk_buff *skb2; > - bool mono_delivery_time = skb->mono_delivery_time; > + bool mono_delivery_time = skb->clockid_delivery_time; > struct rtable *rt = skb_rtable(skb); > unsigned int mtu, hlen, ll_rs; > struct ip_fraglist_iter iter; > @@ -1455,6 +1455,15 @@ struct sk_buff *__ip_make_skb(struct sock *sk, > skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); > skb->mark = cork->mark; > skb->tstamp = cork->transmit_time; > + /* TX timestamp from userspace can be set by setting SO_TXTIME flag > + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set > + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is > + * set or not. > + * Note:- the reason why we cannot use clockid to set the bool flag > + * is because if userspace sets Monotonic clocks the macro value for > + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. > + */ Let's avoid such long comments duplicated across multiple locations. > + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); skb->clockid_delivery_time = !!cork->transmit_time; > /* > * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec > * on dst refcount > @@ -1646,7 +1655,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, > arg->csumoffset) = csum_fold(csum_add(nskb->csum, > arg->csum)); > nskb->ip_summed = CHECKSUM_NONE; > - nskb->mono_delivery_time = !!transmit_time; > + nskb->clockid_delivery_time = !!transmit_time; > if (txhash) > skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4); > ip_push_pending_frames(sk, &fl4); > diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c > index aea89326c697..6ade217c4251 100644 > --- a/net/ipv4/raw.c > +++ b/net/ipv4/raw.c > @@ -353,6 +353,15 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, > skb->priority = READ_ONCE(sk->sk_priority); > skb->mark = sockc->mark; > skb->tstamp = sockc->transmit_time; > + /* TX timestamp from userspace can be set by setting SO_TXTIME flag > + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set > + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is > + * set or not. > + * Note:- the reason why we cannot use clockid to set the bool flag > + * is because if userspace sets Monotonic clocks the macro value for > + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. > + */ > + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); > skb_dst_set(skb, &rt->dst); > *rtp = NULL; > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c > index a722a43dd668..e4548e395c3d 100644 > --- a/net/ipv6/ip6_output.c > +++ b/net/ipv6/ip6_output.c > @@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, > struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); > struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ? > inet6_sk(skb->sk) : NULL; > - bool mono_delivery_time = skb->mono_delivery_time; > + bool mono_delivery_time = skb->clockid_delivery_time; > struct ip6_frag_state state; > unsigned int mtu, hlen, nexthdr_offset; > ktime_t tstamp = skb->tstamp; > @@ -1922,7 +1922,15 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, > skb->priority = READ_ONCE(sk->sk_priority); > skb->mark = cork->base.mark; > skb->tstamp = cork->base.transmit_time; > - > + /* TX timestamp from userspace can be set by setting SO_TXTIME flag > + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set > + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is > + * set or not. > + * Note:- the reason why we cannot use clockid to set the bool flag > + * is because if userspace sets Monotonic clocks the macro value for > + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. > + */ > + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); > ip6_cork_steal_dst(skb, cork); > IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS); > if (proto == IPPROTO_ICMPV6) { > diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c > index 53d255838e6a..33bc6b0f3b13 100644 > --- a/net/ipv6/netfilter.c > +++ b/net/ipv6/netfilter.c > @@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, > struct sk_buff *)) > { > int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; > - bool mono_delivery_time = skb->mono_delivery_time; > + bool mono_delivery_time = skb->clockid_delivery_time; > ktime_t tstamp = skb->tstamp; > struct ip6_frag_state state; > u8 *prevhdr, nexthdr = 0; > diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c > index b2dd48911c8d..65cdc41c2bba 100644 > --- a/net/ipv6/netfilter/nf_conntrack_reasm.c > +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c > @@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, > fq->iif = dev->ifindex; > > fq->q.stamp = skb->tstamp; > - fq->q.mono_delivery_time = skb->mono_delivery_time; > + fq->q.mono_delivery_time = skb->clockid_delivery_time; > fq->q.meat += skb->len; > fq->ecn |= ecn; > if (payload_len > fq->q.max_size) > diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c > index 03dbb874c363..1b95ccb50066 100644 > --- a/net/ipv6/raw.c > +++ b/net/ipv6/raw.c > @@ -616,7 +616,15 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, > skb->priority = READ_ONCE(sk->sk_priority); > skb->mark = sockc->mark; > skb->tstamp = sockc->transmit_time; > - > + /* TX timestamp from userspace can be set by setting SO_TXTIME flag > + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set > + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is > + * set or not. > + * Note:- the reason why we cannot use clockid to set the bool flag > + * is because if userspace sets Monotonic clocks the macro value for > + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. > + */ > + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); > skb_put(skb, length); > skb_reset_network_header(skb); > iph = ipv6_hdr(skb); > diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c > index 5ebc47da1000..ad0e6832b199 100644 > --- a/net/ipv6/reassembly.c > +++ b/net/ipv6/reassembly.c > @@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, > fq->iif = dev->ifindex; > > fq->q.stamp = skb->tstamp; > - fq->q.mono_delivery_time = skb->mono_delivery_time; > + fq->q.mono_delivery_time = skb->clockid_delivery_time; > fq->q.meat += skb->len; > fq->ecn |= ecn; > add_frag_mem_limit(fq->q.fqdir, skb->truesize); > diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c > index 0e3cf11ae5fc..47f46fcb1827 100644 > --- a/net/sched/act_bpf.c > +++ b/net/sched/act_bpf.c > @@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb, > bpf_compute_data_pointers(skb); > filter_res = bpf_prog_run(filter, skb); > } > - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) > - skb->mono_delivery_time = 0; > + if (unlikely(!skb->tstamp && skb->clockid_delivery_time)) > + skb->clockid_delivery_time = 0; > if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK) > skb_orphan(skb); > > diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c > index 5e83e890f6a4..fc85a7619f07 100644 > --- a/net/sched/cls_bpf.c > +++ b/net/sched/cls_bpf.c > @@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb, > bpf_compute_data_pointers(skb); > filter_res = bpf_prog_run(prog->filter, skb); > } > - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) > - skb->mono_delivery_time = 0; > + if (unlikely(!skb->tstamp && skb->clockid_delivery_time)) > + skb->clockid_delivery_time = 0; > > if (prog->exts_integrated) { > res->class = 0; > -- > 2.25.1 >
On 2/28/2024 7:53 AM, Willem de Bruijn wrote: > Abhishek Chauhan wrote: >> Bridge driver today has no support to forward the userspace timestamp >> packets and ends up resetting the timestamp. ETF qdisc checks the >> packet coming from userspace and encounters to be 0 thereby dropping >> time sensitive packets. These changes will allow userspace timestamps >> packets to be forwarded from the bridge to NIC drivers. >> >> Existing functionality of mono_delivery_time is not altered here >> instead just extended with userspace tstamp support for bridge >> forwarding path. >> >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> >> --- >> Changes since v1 >> - Changed the commit subject as i am modifying the mono_delivery_time >> bit with clockid_delivery_time. >> - Took care of suggestion mentioned by Willem to use the same bit for >> userspace delivery time as there are no conflicts between TCP and >> SCM_TXTIME, because explicit cmsg makes no sense for TCP and only >> RAW and DGRAM sockets interprets it. > > The variable rename churn makes it hard to spot the functional > changes. Perhaps it makes sense just keep the variable name as is, > even though the "mono" is not always technically correct anymore. > I think a better approach would be to keep the variable as ease and add comments and documentation in the header file of skbuff.h like how i have done in this patch. The reason why i say this is a. We can avoid alot of code churn just to solve this basic problem of propagating timestamp through forwarding bridge path b. Re-use the same variable name and have better documentation c. Complexity will be as minimal as possible. Let me know what you think. > Or else to split into two patches. One that renames the field. > And one that adds the new behavior of setting the bit for SO_TXTIME. > Regarding the sidenote. I dont see how they are using clock_id to determine if the skb->tstamp is set in monotonic. Please correct me or point me to the piece of code which is doing so. I hope the check against sock_flag is a better implementation as it clearly stats and is inline with the implementation that the tstamp is coming from userspace. skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME); > Sidenote: with sk_clockid, FQ could detect when skb->tstamp is not > set in monotonic (i.e., set by SO_TXTIME) and drop the packet or > ignore the embedded timestamp, warn, etc. > >> - Clear explaination of why this is needed mentioned below and this >> is extending the work done by Martin for mono_delivery_time >> https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/ >> - Version 1 patch can be referenced with below link which states >> the exact problem with tc-etf and discussions which took place >> https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/ >> >> include/linux/skbuff.h | 22 +++++++++++++--------- >> net/bridge/netfilter/nf_conntrack_bridge.c | 2 +- >> net/core/dev.c | 2 +- >> net/core/filter.c | 6 +++--- >> net/ieee802154/6lowpan/reassembly.c | 2 +- >> net/ipv4/inet_fragment.c | 2 +- >> net/ipv4/ip_fragment.c | 2 +- >> net/ipv4/ip_output.c | 13 +++++++++++-- >> net/ipv4/raw.c | 9 +++++++++ >> net/ipv6/ip6_output.c | 12 ++++++++++-- >> net/ipv6/netfilter.c | 2 +- >> net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- >> net/ipv6/raw.c | 10 +++++++++- >> net/ipv6/reassembly.c | 2 +- >> net/sched/act_bpf.c | 4 ++-- >> net/sched/cls_bpf.c | 4 ++-- >> 16 files changed, 67 insertions(+), 29 deletions(-) >> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >> index 2dde34c29203..24a34d56cfa3 100644 >> --- a/include/linux/skbuff.h >> +++ b/include/linux/skbuff.h >> @@ -816,10 +816,14 @@ typedef unsigned char *sk_buff_data_t; >> * @dst_pending_confirm: need to confirm neighbour >> * @decrypted: Decrypted SKB >> * @slow_gro: state present at GRO time, slower prepare step required >> - * @mono_delivery_time: When set, skb->tstamp has the >> + * @clockid_delivery_time: When set, skb->tstamp has the >> * delivery_time in mono clock base (i.e. EDT). Otherwise, the > > This is not true anymore. Perhaps "delivery_time in sk_clockid clock base" > >> * skb->tstamp has the (rcv) timestamp at ingress and >> * delivery_time at egress. >> + * This bit is also set if the tstamp is set from userspace which >> + * acts as an information in the bridge forwarding path to net >> + * reset the tstamp value when user sets the timestamp using >> + * SO_TXTIME sockopts >> * @napi_id: id of the NAPI struct this skb came from >> * @sender_cpu: (aka @napi_id) source CPU in XPS >> * @alloc_cpu: CPU which did the skb allocation. >> @@ -950,7 +954,7 @@ struct sk_buff { >> /* private: */ >> __u8 __mono_tc_offset[0]; >> /* public: */ >> - __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ >> + __u8 clockid_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ >> #ifdef CONFIG_NET_XGRESS >> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >> __u8 tc_skip_classify:1; >> @@ -4172,7 +4176,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb, >> static inline void __net_timestamp(struct sk_buff *skb) >> { >> skb->tstamp = ktime_get_real(); >> - skb->mono_delivery_time = 0; >> + skb->clockid_delivery_time = 0; >> } >> >> static inline ktime_t net_timedelta(ktime_t t) >> @@ -4184,7 +4188,7 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >> bool mono) >> { >> skb->tstamp = kt; >> - skb->mono_delivery_time = kt && mono; >> + skb->clockid_delivery_time = kt && mono; >> } >> >> DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); >> @@ -4194,8 +4198,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); >> */ >> static inline void skb_clear_delivery_time(struct sk_buff *skb) >> { >> - if (skb->mono_delivery_time) { >> - skb->mono_delivery_time = 0; >> + if (skb->clockid_delivery_time) { >> + skb->clockid_delivery_time = 0; >> if (static_branch_unlikely(&netstamp_needed_key)) >> skb->tstamp = ktime_get_real(); >> else >> @@ -4205,7 +4209,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb) >> >> static inline void skb_clear_tstamp(struct sk_buff *skb) >> { >> - if (skb->mono_delivery_time) >> + if (skb->clockid_delivery_time) >> return; >> >> skb->tstamp = 0; >> @@ -4213,7 +4217,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb) >> >> static inline ktime_t skb_tstamp(const struct sk_buff *skb) >> { >> - if (skb->mono_delivery_time) >> + if (skb->clockid_delivery_time) >> return 0; >> >> return skb->tstamp; >> @@ -4221,7 +4225,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb) >> >> static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond) >> { >> - if (!skb->mono_delivery_time && skb->tstamp) >> + if (!skb->clockid_delivery_time && skb->tstamp) >> return skb->tstamp; >> >> if (static_branch_unlikely(&netstamp_needed_key) || cond) >> diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c >> index abb090f94ed2..cfc9eeaf23fb 100644 >> --- a/net/bridge/netfilter/nf_conntrack_bridge.c >> +++ b/net/bridge/netfilter/nf_conntrack_bridge.c >> @@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, >> struct sk_buff *)) >> { >> int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; >> - bool mono_delivery_time = skb->mono_delivery_time; >> + bool mono_delivery_time = skb->clockid_delivery_time; >> unsigned int hlen, ll_rs, mtu; >> ktime_t tstamp = skb->tstamp; >> struct ip_frag_state state; >> diff --git a/net/core/dev.c b/net/core/dev.c >> index bd230ce8fa79..2d6556444f55 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -2137,7 +2137,7 @@ EXPORT_SYMBOL(net_disable_timestamp); >> static inline void net_timestamp_set(struct sk_buff *skb) >> { >> skb->tstamp = 0; >> - skb->mono_delivery_time = 0; >> + skb->clockid_delivery_time = 0; >> if (static_branch_unlikely(&netstamp_needed_key)) >> skb->tstamp = ktime_get_real(); >> } >> diff --git a/net/core/filter.c b/net/core/filter.c >> index 524adf1fa6d0..c653cff185cb 100644 >> --- a/net/core/filter.c >> +++ b/net/core/filter.c >> @@ -7698,13 +7698,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb, >> if (!tstamp) >> return -EINVAL; >> skb->tstamp = tstamp; >> - skb->mono_delivery_time = 1; >> + skb->clockid_delivery_time = 1; >> break; >> case BPF_SKB_TSTAMP_UNSPEC: >> if (tstamp) >> return -EINVAL; >> skb->tstamp = 0; >> - skb->mono_delivery_time = 0; >> + skb->clockid_delivery_time = 0; >> break; >> default: >> return -EINVAL; >> @@ -9413,7 +9413,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >> *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, >> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); >> - /* skb->tc_at_ingress && skb->mono_delivery_time, >> + /* skb->tc_at_ingress && skb->clockid_delivery_time, >> * read 0 as the (rcv) timestamp. >> */ >> *insn++ = BPF_MOV64_IMM(value_reg, 0); >> diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c >> index 6dd960ec558c..651ca0a81e61 100644 >> --- a/net/ieee802154/6lowpan/reassembly.c >> +++ b/net/ieee802154/6lowpan/reassembly.c >> @@ -130,7 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq, >> goto err; >> >> fq->q.stamp = skb->tstamp; >> - fq->q.mono_delivery_time = skb->mono_delivery_time; >> + fq->q.mono_delivery_time = skb->clockid_delivery_time; >> if (frag_type == LOWPAN_DISPATCH_FRAG1) >> fq->q.flags |= INET_FRAG_FIRST_IN; >> >> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c >> index 7072fc0783ef..71958ea6ffd6 100644 >> --- a/net/ipv4/inet_fragment.c >> +++ b/net/ipv4/inet_fragment.c >> @@ -578,7 +578,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, >> skb_mark_not_on_list(head); >> head->prev = NULL; >> head->tstamp = q->stamp; >> - head->mono_delivery_time = q->mono_delivery_time; >> + head->clockid_delivery_time = q->mono_delivery_time; >> } >> EXPORT_SYMBOL(inet_frag_reasm_finish); >> >> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c >> index a4941f53b523..6f41fdfda84e 100644 >> --- a/net/ipv4/ip_fragment.c >> +++ b/net/ipv4/ip_fragment.c >> @@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) >> qp->iif = dev->ifindex; >> >> qp->q.stamp = skb->tstamp; >> - qp->q.mono_delivery_time = skb->mono_delivery_time; >> + qp->q.mono_delivery_time = skb->clockid_delivery_time; >> qp->q.meat += skb->len; >> qp->ecn |= ecn; >> add_frag_mem_limit(qp->q.fqdir, skb->truesize); >> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c >> index 5b5a0adb927f..1b31171b58a4 100644 >> --- a/net/ipv4/ip_output.c >> +++ b/net/ipv4/ip_output.c >> @@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, >> { >> struct iphdr *iph; >> struct sk_buff *skb2; >> - bool mono_delivery_time = skb->mono_delivery_time; >> + bool mono_delivery_time = skb->clockid_delivery_time; >> struct rtable *rt = skb_rtable(skb); >> unsigned int mtu, hlen, ll_rs; >> struct ip_fraglist_iter iter; >> @@ -1455,6 +1455,15 @@ struct sk_buff *__ip_make_skb(struct sock *sk, >> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); >> skb->mark = cork->mark; >> skb->tstamp = cork->transmit_time; >> + /* TX timestamp from userspace can be set by setting SO_TXTIME flag >> + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set >> + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is >> + * set or not. >> + * Note:- the reason why we cannot use clockid to set the bool flag >> + * is because if userspace sets Monotonic clocks the macro value for >> + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. >> + */ > > Let's avoid such long comments duplicated across multiple locations. > >> + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); > > skb->clockid_delivery_time = !!cork->transmit_time; > >> /* >> * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec >> * on dst refcount >> @@ -1646,7 +1655,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, >> arg->csumoffset) = csum_fold(csum_add(nskb->csum, >> arg->csum)); >> nskb->ip_summed = CHECKSUM_NONE; >> - nskb->mono_delivery_time = !!transmit_time; >> + nskb->clockid_delivery_time = !!transmit_time; >> if (txhash) >> skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4); >> ip_push_pending_frames(sk, &fl4); >> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c >> index aea89326c697..6ade217c4251 100644 >> --- a/net/ipv4/raw.c >> +++ b/net/ipv4/raw.c >> @@ -353,6 +353,15 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, >> skb->priority = READ_ONCE(sk->sk_priority); >> skb->mark = sockc->mark; >> skb->tstamp = sockc->transmit_time; >> + /* TX timestamp from userspace can be set by setting SO_TXTIME flag >> + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set >> + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is >> + * set or not. >> + * Note:- the reason why we cannot use clockid to set the bool flag >> + * is because if userspace sets Monotonic clocks the macro value for >> + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. >> + */ >> + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); >> skb_dst_set(skb, &rt->dst); >> *rtp = NULL; >> >> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c >> index a722a43dd668..e4548e395c3d 100644 >> --- a/net/ipv6/ip6_output.c >> +++ b/net/ipv6/ip6_output.c >> @@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, >> struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); >> struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ? >> inet6_sk(skb->sk) : NULL; >> - bool mono_delivery_time = skb->mono_delivery_time; >> + bool mono_delivery_time = skb->clockid_delivery_time; >> struct ip6_frag_state state; >> unsigned int mtu, hlen, nexthdr_offset; >> ktime_t tstamp = skb->tstamp; >> @@ -1922,7 +1922,15 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, >> skb->priority = READ_ONCE(sk->sk_priority); >> skb->mark = cork->base.mark; >> skb->tstamp = cork->base.transmit_time; >> - >> + /* TX timestamp from userspace can be set by setting SO_TXTIME flag >> + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set >> + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is >> + * set or not. >> + * Note:- the reason why we cannot use clockid to set the bool flag >> + * is because if userspace sets Monotonic clocks the macro value for >> + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. >> + */ >> + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); >> ip6_cork_steal_dst(skb, cork); >> IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS); >> if (proto == IPPROTO_ICMPV6) { >> diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c >> index 53d255838e6a..33bc6b0f3b13 100644 >> --- a/net/ipv6/netfilter.c >> +++ b/net/ipv6/netfilter.c >> @@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, >> struct sk_buff *)) >> { >> int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; >> - bool mono_delivery_time = skb->mono_delivery_time; >> + bool mono_delivery_time = skb->clockid_delivery_time; >> ktime_t tstamp = skb->tstamp; >> struct ip6_frag_state state; >> u8 *prevhdr, nexthdr = 0; >> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c >> index b2dd48911c8d..65cdc41c2bba 100644 >> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c >> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c >> @@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, >> fq->iif = dev->ifindex; >> >> fq->q.stamp = skb->tstamp; >> - fq->q.mono_delivery_time = skb->mono_delivery_time; >> + fq->q.mono_delivery_time = skb->clockid_delivery_time; >> fq->q.meat += skb->len; >> fq->ecn |= ecn; >> if (payload_len > fq->q.max_size) >> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c >> index 03dbb874c363..1b95ccb50066 100644 >> --- a/net/ipv6/raw.c >> +++ b/net/ipv6/raw.c >> @@ -616,7 +616,15 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, >> skb->priority = READ_ONCE(sk->sk_priority); >> skb->mark = sockc->mark; >> skb->tstamp = sockc->transmit_time; >> - >> + /* TX timestamp from userspace can be set by setting SO_TXTIME flag >> + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set >> + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is >> + * set or not. >> + * Note:- the reason why we cannot use clockid to set the bool flag >> + * is because if userspace sets Monotonic clocks the macro value for >> + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. >> + */ >> + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); >> skb_put(skb, length); >> skb_reset_network_header(skb); >> iph = ipv6_hdr(skb); >> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c >> index 5ebc47da1000..ad0e6832b199 100644 >> --- a/net/ipv6/reassembly.c >> +++ b/net/ipv6/reassembly.c >> @@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, >> fq->iif = dev->ifindex; >> >> fq->q.stamp = skb->tstamp; >> - fq->q.mono_delivery_time = skb->mono_delivery_time; >> + fq->q.mono_delivery_time = skb->clockid_delivery_time; >> fq->q.meat += skb->len; >> fq->ecn |= ecn; >> add_frag_mem_limit(fq->q.fqdir, skb->truesize); >> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c >> index 0e3cf11ae5fc..47f46fcb1827 100644 >> --- a/net/sched/act_bpf.c >> +++ b/net/sched/act_bpf.c >> @@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb, >> bpf_compute_data_pointers(skb); >> filter_res = bpf_prog_run(filter, skb); >> } >> - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) >> - skb->mono_delivery_time = 0; >> + if (unlikely(!skb->tstamp && skb->clockid_delivery_time)) >> + skb->clockid_delivery_time = 0; >> if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK) >> skb_orphan(skb); >> >> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c >> index 5e83e890f6a4..fc85a7619f07 100644 >> --- a/net/sched/cls_bpf.c >> +++ b/net/sched/cls_bpf.c >> @@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb, >> bpf_compute_data_pointers(skb); >> filter_res = bpf_prog_run(prog->filter, skb); >> } >> - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) >> - skb->mono_delivery_time = 0; >> + if (unlikely(!skb->tstamp && skb->clockid_delivery_time)) >> + skb->clockid_delivery_time = 0; >> >> if (prog->exts_integrated) { >> res->class = 0; >> -- >> 2.25.1 >> > >
Abhishek Chauhan (ABC) wrote: > > > On 2/28/2024 7:53 AM, Willem de Bruijn wrote: > > Abhishek Chauhan wrote: > >> Bridge driver today has no support to forward the userspace timestamp > >> packets and ends up resetting the timestamp. ETF qdisc checks the > >> packet coming from userspace and encounters to be 0 thereby dropping > >> time sensitive packets. These changes will allow userspace timestamps > >> packets to be forwarded from the bridge to NIC drivers. > >> > >> Existing functionality of mono_delivery_time is not altered here > >> instead just extended with userspace tstamp support for bridge > >> forwarding path. > >> > >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > >> --- > >> Changes since v1 > >> - Changed the commit subject as i am modifying the mono_delivery_time > >> bit with clockid_delivery_time. > >> - Took care of suggestion mentioned by Willem to use the same bit for > >> userspace delivery time as there are no conflicts between TCP and > >> SCM_TXTIME, because explicit cmsg makes no sense for TCP and only > >> RAW and DGRAM sockets interprets it. > > > > The variable rename churn makes it hard to spot the functional > > changes. Perhaps it makes sense just keep the variable name as is, > > even though the "mono" is not always technically correct anymore. > > > > > I think a better approach would be to keep the variable as ease and add > comments and documentation in the header file of skbuff.h like > how i have done in this patch. The reason why i say this is > a. We can avoid alot of code churn just to solve this basic problem of > propagating timestamp through forwarding bridge path > b. Re-use the same variable name and have better documentation > c. Complexity will be as minimal as possible. > > Let me know what you think. Agreed > > Or else to split into two patches. One that renames the field. > > And one that adds the new behavior of setting the bit for SO_TXTIME. > > > > Regarding the sidenote. I dont see how they are using clock_id to determine > if the skb->tstamp is set in monotonic. Please correct me or point me to > the piece of code which is doing so. That's really out of scope of this series anyway > > I hope the check against sock_flag is a better implementation as > it clearly stats and is inline with the implementation that the tstamp is > coming from userspace. > skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME); Enabling the socket flag is not sufficient to configure a delivery time on a packet. A transmit time must be communicated per packet with cork->transmit_time. And on top of that, it is cheaper to test.
On 2/28/2024 12:14 PM, Willem de Bruijn wrote: > Abhishek Chauhan (ABC) wrote: >> >> >> On 2/28/2024 7:53 AM, Willem de Bruijn wrote: >>> Abhishek Chauhan wrote: >>>> Bridge driver today has no support to forward the userspace timestamp >>>> packets and ends up resetting the timestamp. ETF qdisc checks the >>>> packet coming from userspace and encounters to be 0 thereby dropping >>>> time sensitive packets. These changes will allow userspace timestamps >>>> packets to be forwarded from the bridge to NIC drivers. >>>> >>>> Existing functionality of mono_delivery_time is not altered here >>>> instead just extended with userspace tstamp support for bridge >>>> forwarding path. >>>> >>>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> >>>> --- >>>> Changes since v1 >>>> - Changed the commit subject as i am modifying the mono_delivery_time >>>> bit with clockid_delivery_time. >>>> - Took care of suggestion mentioned by Willem to use the same bit for >>>> userspace delivery time as there are no conflicts between TCP and >>>> SCM_TXTIME, because explicit cmsg makes no sense for TCP and only >>>> RAW and DGRAM sockets interprets it. >>> >>> The variable rename churn makes it hard to spot the functional >>> changes. Perhaps it makes sense just keep the variable name as is, >>> even though the "mono" is not always technically correct anymore. >>> >> >> >> I think a better approach would be to keep the variable as ease and add >> comments and documentation in the header file of skbuff.h like >> how i have done in this patch. The reason why i say this is >> a. We can avoid alot of code churn just to solve this basic problem of >> propagating timestamp through forwarding bridge path >> b. Re-use the same variable name and have better documentation >> c. Complexity will be as minimal as possible. >> >> Let me know what you think. > > Agreed > Okay i will make the changes accordingly. >>> Or else to split into two patches. One that renames the field. >>> And one that adds the new behavior of setting the bit for SO_TXTIME. >>> >> >> Regarding the sidenote. I dont see how they are using clock_id to determine >> if the skb->tstamp is set in monotonic. Please correct me or point me to >> the piece of code which is doing so. > > That's really out of scope of this series anyway > Sounds good. Really appreciate your review and discussion on this topic. >> >> I hope the check against sock_flag is a better implementation as >> it clearly stats and is inline with the implementation that the tstamp is >> coming from userspace. >> skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME); > > Enabling the socket flag is not sufficient to configure a delivery > time on a packet. A transmit time must be communicated per packet > with cork->transmit_time. And on top of that, it is cheaper to test. So to re-use the same bit of mono_delivery_time. I want to set this bit when user-space sets the timestamps using SCM_TXTIME. Is it okay if i do the below when we make skb in ipv4/ipv6 and raw packets to ensure that bridge doesn't reset the packet tstamp or do you have a better suggestion to set the bit so br_forward_finish does not reset the timestamp. skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME);
Abhishek Chauhan (ABC) wrote: > > > On 2/28/2024 12:14 PM, Willem de Bruijn wrote: > > Abhishek Chauhan (ABC) wrote: > >> > >> > >> On 2/28/2024 7:53 AM, Willem de Bruijn wrote: > >>> Abhishek Chauhan wrote: > >>>> Bridge driver today has no support to forward the userspace timestamp > >>>> packets and ends up resetting the timestamp. ETF qdisc checks the > >>>> packet coming from userspace and encounters to be 0 thereby dropping > >>>> time sensitive packets. These changes will allow userspace timestamps > >>>> packets to be forwarded from the bridge to NIC drivers. > >>>> > >>>> Existing functionality of mono_delivery_time is not altered here > >>>> instead just extended with userspace tstamp support for bridge > >>>> forwarding path. > >>>> > >>>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > >>>> --- > >>>> Changes since v1 > >>>> - Changed the commit subject as i am modifying the mono_delivery_time > >>>> bit with clockid_delivery_time. > >>>> - Took care of suggestion mentioned by Willem to use the same bit for > >>>> userspace delivery time as there are no conflicts between TCP and > >>>> SCM_TXTIME, because explicit cmsg makes no sense for TCP and only > >>>> RAW and DGRAM sockets interprets it. > >>> > >>> The variable rename churn makes it hard to spot the functional > >>> changes. Perhaps it makes sense just keep the variable name as is, > >>> even though the "mono" is not always technically correct anymore. > >>> > >> > >> > >> I think a better approach would be to keep the variable as ease and add > >> comments and documentation in the header file of skbuff.h like > >> how i have done in this patch. The reason why i say this is > >> a. We can avoid alot of code churn just to solve this basic problem of > >> propagating timestamp through forwarding bridge path > >> b. Re-use the same variable name and have better documentation > >> c. Complexity will be as minimal as possible. > >> > >> Let me know what you think. > > > > Agreed > > > > Okay i will make the changes accordingly. > >>> Or else to split into two patches. One that renames the field. > >>> And one that adds the new behavior of setting the bit for SO_TXTIME. > >>> > >> > >> Regarding the sidenote. I dont see how they are using clock_id to determine > >> if the skb->tstamp is set in monotonic. Please correct me or point me to > >> the piece of code which is doing so. > > > > That's really out of scope of this series anyway > > > > Sounds good. Really appreciate your review and discussion on this topic. > > >> > >> I hope the check against sock_flag is a better implementation as > >> it clearly stats and is inline with the implementation that the tstamp is > >> coming from userspace. > >> skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME); > > > > Enabling the socket flag is not sufficient to configure a delivery > > time on a packet. A transmit time must be communicated per packet > > with cork->transmit_time. And on top of that, it is cheaper to test. > > > So to re-use the same bit of mono_delivery_time. I want to set this bit > when user-space sets the timestamps using SCM_TXTIME. > Is it okay if i do the below when we make skb in ipv4/ipv6 and raw packets > to ensure that bridge doesn't reset the packet tstamp or do you have a better > suggestion to set the bit so br_forward_finish does not reset the timestamp. > > skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME); I already gave my suggestion. The timestamp is passed using sockcm_cookie field transmit_time, which is set from a control message of type SCM_TXTIME passed to sendmsg. Right above where you wanted to add this check is where skb->tstamp is initialized from cork->transmit_time, which got it from this field. So clearly that is the indication that a transmit time is set. And the field is hot in the cache.
On 2/28/2024 4:09 PM, Willem de Bruijn wrote: > Abhishek Chauhan (ABC) wrote: >> >> >> On 2/28/2024 12:14 PM, Willem de Bruijn wrote: >>> Abhishek Chauhan (ABC) wrote: >>>> >>>> >>>> On 2/28/2024 7:53 AM, Willem de Bruijn wrote: >>>>> Abhishek Chauhan wrote: >>>>>> Bridge driver today has no support to forward the userspace timestamp >>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the >>>>>> packet coming from userspace and encounters to be 0 thereby dropping >>>>>> time sensitive packets. These changes will allow userspace timestamps >>>>>> packets to be forwarded from the bridge to NIC drivers. >>>>>> >>>>>> Existing functionality of mono_delivery_time is not altered here >>>>>> instead just extended with userspace tstamp support for bridge >>>>>> forwarding path. >>>>>> >>>>>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> >>>>>> --- >>>>>> Changes since v1 >>>>>> - Changed the commit subject as i am modifying the mono_delivery_time >>>>>> bit with clockid_delivery_time. >>>>>> - Took care of suggestion mentioned by Willem to use the same bit for >>>>>> userspace delivery time as there are no conflicts between TCP and >>>>>> SCM_TXTIME, because explicit cmsg makes no sense for TCP and only >>>>>> RAW and DGRAM sockets interprets it. >>>>> >>>>> The variable rename churn makes it hard to spot the functional >>>>> changes. Perhaps it makes sense just keep the variable name as is, >>>>> even though the "mono" is not always technically correct anymore. >>>>> >>>> >>>> >>>> I think a better approach would be to keep the variable as ease and add >>>> comments and documentation in the header file of skbuff.h like >>>> how i have done in this patch. The reason why i say this is >>>> a. We can avoid alot of code churn just to solve this basic problem of >>>> propagating timestamp through forwarding bridge path >>>> b. Re-use the same variable name and have better documentation >>>> c. Complexity will be as minimal as possible. >>>> >>>> Let me know what you think. >>> >>> Agreed >>> >> >> Okay i will make the changes accordingly. >>>>> Or else to split into two patches. One that renames the field. >>>>> And one that adds the new behavior of setting the bit for SO_TXTIME. >>>>> >>>> >>>> Regarding the sidenote. I dont see how they are using clock_id to determine >>>> if the skb->tstamp is set in monotonic. Please correct me or point me to >>>> the piece of code which is doing so. >>> >>> That's really out of scope of this series anyway >>> >> >> Sounds good. Really appreciate your review and discussion on this topic. >> >>>> >>>> I hope the check against sock_flag is a better implementation as >>>> it clearly stats and is inline with the implementation that the tstamp is >>>> coming from userspace. >>>> skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME); >>> >>> Enabling the socket flag is not sufficient to configure a delivery >>> time on a packet. A transmit time must be communicated per packet >>> with cork->transmit_time. And on top of that, it is cheaper to test. >> >> >> So to re-use the same bit of mono_delivery_time. I want to set this bit >> when user-space sets the timestamps using SCM_TXTIME. >> Is it okay if i do the below when we make skb in ipv4/ipv6 and raw packets >> to ensure that bridge doesn't reset the packet tstamp or do you have a better >> suggestion to set the bit so br_forward_finish does not reset the timestamp. >> >> skb->mono_delivery_time = sock_flag(sk, SOCK_TXTIME); > > I already gave my suggestion. > > The timestamp is passed using sockcm_cookie field transmit_time, which > is set from a control message of type SCM_TXTIME passed to sendmsg. > > Right above where you wanted to add this check is where skb->tstamp is > initialized from cork->transmit_time, which got it from this field. So > clearly that is the indication that a transmit time is set. And the > field is hot in the cache. Oh i see what you are saying. I got it . I will make the changes accordingly. Let me post patch set v3 after testing it internally. Thanks again for all the review comments.
On 2/28/24 7:53 AM, Willem de Bruijn wrote: > Sidenote: with sk_clockid, FQ could detect when skb->tstamp is not > set in monotonic (i.e., set by SO_TXTIME) and drop the packet or > ignore the embedded timestamp, warn, etc. Thanks for cc-ing me. Sorry for the late reply. I just catch up to this thread and the v1. I think it is needed to detect if skb->tstamp is monotonic or not in fq. The container (with the veth setup) may use sch_etf while the host usually uses fq at the physical NIC and expects monotonic skb->tstamp. During forward (e.g. by bpf_redirect / ip[6]_forward from a veth to a physical NIC), skb_clear_tstamp() only forwards the monotonic skb->tstamp now. While sch_etf does check sk_clockid first before using skb->tstamp, fq does not check that now. or fq_packet_beyond_horizon() is enough to catch this clock discrepancy?
On Tue, 2024-02-27 at 17:12 -0800, Abhishek Chauhan wrote: > Bridge driver today has no support to forward the userspace timestamp > packets and ends up resetting the timestamp. ETF qdisc checks the > packet coming from userspace and encounters to be 0 thereby dropping > time sensitive packets. These changes will allow userspace timestamps > packets to be forwarded from the bridge to NIC drivers. > > Existing functionality of mono_delivery_time is not altered here > instead just extended with userspace tstamp support for bridge > forwarding path. > > Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Since this is touching a quite sensitive code path, I think v3 would deserve some paired self-tests. Thanks! Paolo
Hey ABC, One minor nit below. On Tue, Feb 27, 2024 at 05:12:19PM -0800, Abhishek Chauhan wrote: > Bridge driver today has no support to forward the userspace timestamp > packets and ends up resetting the timestamp. ETF qdisc checks the > packet coming from userspace and encounters to be 0 thereby dropping > time sensitive packets. These changes will allow userspace timestamps > packets to be forwarded from the bridge to NIC drivers. > > Existing functionality of mono_delivery_time is not altered here > instead just extended with userspace tstamp support for bridge > forwarding path. > > Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > --- > Changes since v1 > - Changed the commit subject as i am modifying the mono_delivery_time > bit with clockid_delivery_time. > - Took care of suggestion mentioned by Willem to use the same bit for > userspace delivery time as there are no conflicts between TCP and > SCM_TXTIME, because explicit cmsg makes no sense for TCP and only > RAW and DGRAM sockets interprets it. > - Clear explaination of why this is needed mentioned below and this > is extending the work done by Martin for mono_delivery_time > https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/ > - Version 1 patch can be referenced with below link which states > the exact problem with tc-etf and discussions which took place > https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/ > > include/linux/skbuff.h | 22 +++++++++++++--------- > net/bridge/netfilter/nf_conntrack_bridge.c | 2 +- > net/core/dev.c | 2 +- > net/core/filter.c | 6 +++--- > net/ieee802154/6lowpan/reassembly.c | 2 +- > net/ipv4/inet_fragment.c | 2 +- > net/ipv4/ip_fragment.c | 2 +- > net/ipv4/ip_output.c | 13 +++++++++++-- > net/ipv4/raw.c | 9 +++++++++ > net/ipv6/ip6_output.c | 12 ++++++++++-- > net/ipv6/netfilter.c | 2 +- > net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- > net/ipv6/raw.c | 10 +++++++++- > net/ipv6/reassembly.c | 2 +- > net/sched/act_bpf.c | 4 ++-- > net/sched/cls_bpf.c | 4 ++-- > 16 files changed, 67 insertions(+), 29 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 2dde34c29203..24a34d56cfa3 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -816,10 +816,14 @@ typedef unsigned char *sk_buff_data_t; > * @dst_pending_confirm: need to confirm neighbour > * @decrypted: Decrypted SKB > * @slow_gro: state present at GRO time, slower prepare step required > - * @mono_delivery_time: When set, skb->tstamp has the > + * @clockid_delivery_time: When set, skb->tstamp has the > * delivery_time in mono clock base (i.e. EDT). Otherwise, the > * skb->tstamp has the (rcv) timestamp at ingress and > * delivery_time at egress. > + * This bit is also set if the tstamp is set from userspace which > + * acts as an information in the bridge forwarding path to net s/net/not/ or maybe "avoid resetting" ? > + * reset the tstamp value when user sets the timestamp using > + * SO_TXTIME sockopts
Martin KaFai Lau wrote: > On 2/28/24 7:53 AM, Willem de Bruijn wrote: > > Sidenote: with sk_clockid, FQ could detect when skb->tstamp is not > > set in monotonic (i.e., set by SO_TXTIME) and drop the packet or > > ignore the embedded timestamp, warn, etc. > > Thanks for cc-ing me. Sorry for the late reply. I just catch up to this thread > and the v1. > > I think it is needed to detect if skb->tstamp is monotonic or not in fq. The > container (with the veth setup) may use sch_etf while the host usually uses fq > at the physical NIC and expects monotonic skb->tstamp. > > During forward (e.g. by bpf_redirect / ip[6]_forward from a veth to a physical > NIC), skb_clear_tstamp() only forwards the monotonic skb->tstamp now. While > sch_etf does check sk_clockid first before using skb->tstamp, fq does not check > that now. > or fq_packet_beyond_horizon() is enough to catch this clock discrepancy? Before your patch, I believe FQ had no such guard rails. An skb with any clockid from SO_TXTIME can arrive at FQ. With the new clockid field, we could add guard rails in fq_enqueue. If the bit is set, look up sk_clockid.
On 2/29/2024 6:44 AM, Willem de Bruijn wrote: > Martin KaFai Lau wrote: >> On 2/28/24 7:53 AM, Willem de Bruijn wrote: >>> Sidenote: with sk_clockid, FQ could detect when skb->tstamp is not >>> set in monotonic (i.e., set by SO_TXTIME) and drop the packet or >>> ignore the embedded timestamp, warn, etc. >> >> Thanks for cc-ing me. Sorry for the late reply. I just catch up to this thread >> and the v1. >> >> I think it is needed to detect if skb->tstamp is monotonic or not in fq. The >> container (with the veth setup) may use sch_etf while the host usually uses fq >> at the physical NIC and expects monotonic skb->tstamp. >> >> During forward (e.g. by bpf_redirect / ip[6]_forward from a veth to a physical >> NIC), skb_clear_tstamp() only forwards the monotonic skb->tstamp now. While >> sch_etf does check sk_clockid first before using skb->tstamp, fq does not check >> that now. >> or fq_packet_beyond_horizon() is enough to catch this clock discrepancy? > > Before your patch, I believe FQ had no such guard rails. An skb with > any clockid from SO_TXTIME can arrive at FQ. > > With the new clockid field, we could add guard rails in fq_enqueue. > If the bit is set, look up sk_clockid. > I agree with Willem here. Just how Sch_etf is doing in the validate_input_params function , it checks for qopt->clockid != CLOCK_TAI and also in validate packet its checking if the q->clockid and the skb->clockid are the same. Only then it proceeds to XMIT the packet. Such Guards can be added in FQ as well. I believe this is some optimization which can be done in future which is not in scope of this current problem (correct me if i am wrong). The main issue with bridge forwarding path which anyone using latest kernel would face is all ETF packets will be dropped since only monotonic tstamps are forwarded and ETF uses TAI clocks . Hence my v3 patch would be helpful to solve the problem. Later we can take up the optimization part to add Guard checks in the FQ as well as a seperate patch. > >
On 2/29/2024 1:30 AM, Paolo Abeni wrote: > On Tue, 2024-02-27 at 17:12 -0800, Abhishek Chauhan wrote: >> Bridge driver today has no support to forward the userspace timestamp >> packets and ends up resetting the timestamp. ETF qdisc checks the >> packet coming from userspace and encounters to be 0 thereby dropping >> time sensitive packets. These changes will allow userspace timestamps >> packets to be forwarded from the bridge to NIC drivers. >> >> Existing functionality of mono_delivery_time is not altered here >> instead just extended with userspace tstamp support for bridge >> forwarding path. >> >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > > Since this is touching a quite sensitive code path, I think v3 would > deserve some paired self-tests. > I did verify the existing so_txtime selftest for the same which does FQ and ETF tests which sends/recieve packets with both mono and tai clocks. OK. All tests passed (so_txtime.sh) Sometimes i saw resource temporarily unavailable in RX which i believe is a known issue https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg06792.html Along with this i also did mqprio + etf offload testing on internal platform. 1. iperf3 best effort testing UDP/TCP 2. Sent ETF packets from userspace application just like so_txtime.c but with SO_PRIORITY and added dumpstack to check the flow of packets. I saw no issues on internal platform. 3. I will mention all the testing in Tx/Rx path done for v3 patch. > Thanks! > > Paolo >
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2dde34c29203..24a34d56cfa3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -816,10 +816,14 @@ typedef unsigned char *sk_buff_data_t; * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB * @slow_gro: state present at GRO time, slower prepare step required - * @mono_delivery_time: When set, skb->tstamp has the + * @clockid_delivery_time: When set, skb->tstamp has the * delivery_time in mono clock base (i.e. EDT). Otherwise, the * skb->tstamp has the (rcv) timestamp at ingress and * delivery_time at egress. + * This bit is also set if the tstamp is set from userspace which + * acts as an information in the bridge forwarding path to net + * reset the tstamp value when user sets the timestamp using + * SO_TXTIME sockopts * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS * @alloc_cpu: CPU which did the skb allocation. @@ -950,7 +954,7 @@ struct sk_buff { /* private: */ __u8 __mono_tc_offset[0]; /* public: */ - __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ + __u8 clockid_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ #ifdef CONFIG_NET_XGRESS __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ __u8 tc_skip_classify:1; @@ -4172,7 +4176,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb, static inline void __net_timestamp(struct sk_buff *skb) { skb->tstamp = ktime_get_real(); - skb->mono_delivery_time = 0; + skb->clockid_delivery_time = 0; } static inline ktime_t net_timedelta(ktime_t t) @@ -4184,7 +4188,7 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, bool mono) { skb->tstamp = kt; - skb->mono_delivery_time = kt && mono; + skb->clockid_delivery_time = kt && mono; } DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); @@ -4194,8 +4198,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); */ static inline void skb_clear_delivery_time(struct sk_buff *skb) { - if (skb->mono_delivery_time) { - skb->mono_delivery_time = 0; + if (skb->clockid_delivery_time) { + skb->clockid_delivery_time = 0; if (static_branch_unlikely(&netstamp_needed_key)) skb->tstamp = ktime_get_real(); else @@ -4205,7 +4209,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb) static inline void skb_clear_tstamp(struct sk_buff *skb) { - if (skb->mono_delivery_time) + if (skb->clockid_delivery_time) return; skb->tstamp = 0; @@ -4213,7 +4217,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb) static inline ktime_t skb_tstamp(const struct sk_buff *skb) { - if (skb->mono_delivery_time) + if (skb->clockid_delivery_time) return 0; return skb->tstamp; @@ -4221,7 +4225,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb) static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond) { - if (!skb->mono_delivery_time && skb->tstamp) + if (!skb->clockid_delivery_time && skb->tstamp) return skb->tstamp; if (static_branch_unlikely(&netstamp_needed_key) || cond) diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index abb090f94ed2..cfc9eeaf23fb 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, struct sk_buff *)) { int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; - bool mono_delivery_time = skb->mono_delivery_time; + bool mono_delivery_time = skb->clockid_delivery_time; unsigned int hlen, ll_rs, mtu; ktime_t tstamp = skb->tstamp; struct ip_frag_state state; diff --git a/net/core/dev.c b/net/core/dev.c index bd230ce8fa79..2d6556444f55 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2137,7 +2137,7 @@ EXPORT_SYMBOL(net_disable_timestamp); static inline void net_timestamp_set(struct sk_buff *skb) { skb->tstamp = 0; - skb->mono_delivery_time = 0; + skb->clockid_delivery_time = 0; if (static_branch_unlikely(&netstamp_needed_key)) skb->tstamp = ktime_get_real(); } diff --git a/net/core/filter.c b/net/core/filter.c index 524adf1fa6d0..c653cff185cb 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7698,13 +7698,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb, if (!tstamp) return -EINVAL; skb->tstamp = tstamp; - skb->mono_delivery_time = 1; + skb->clockid_delivery_time = 1; break; case BPF_SKB_TSTAMP_UNSPEC: if (tstamp) return -EINVAL; skb->tstamp = 0; - skb->mono_delivery_time = 0; + skb->clockid_delivery_time = 0; break; default: return -EINVAL; @@ -9413,7 +9413,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); - /* skb->tc_at_ingress && skb->mono_delivery_time, + /* skb->tc_at_ingress && skb->clockid_delivery_time, * read 0 as the (rcv) timestamp. */ *insn++ = BPF_MOV64_IMM(value_reg, 0); diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c index 6dd960ec558c..651ca0a81e61 100644 --- a/net/ieee802154/6lowpan/reassembly.c +++ b/net/ieee802154/6lowpan/reassembly.c @@ -130,7 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq, goto err; fq->q.stamp = skb->tstamp; - fq->q.mono_delivery_time = skb->mono_delivery_time; + fq->q.mono_delivery_time = skb->clockid_delivery_time; if (frag_type == LOWPAN_DISPATCH_FRAG1) fq->q.flags |= INET_FRAG_FIRST_IN; diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 7072fc0783ef..71958ea6ffd6 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -578,7 +578,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, skb_mark_not_on_list(head); head->prev = NULL; head->tstamp = q->stamp; - head->mono_delivery_time = q->mono_delivery_time; + head->clockid_delivery_time = q->mono_delivery_time; } EXPORT_SYMBOL(inet_frag_reasm_finish); diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index a4941f53b523..6f41fdfda84e 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) qp->iif = dev->ifindex; qp->q.stamp = skb->tstamp; - qp->q.mono_delivery_time = skb->mono_delivery_time; + qp->q.mono_delivery_time = skb->clockid_delivery_time; qp->q.meat += skb->len; qp->ecn |= ecn; add_frag_mem_limit(qp->q.fqdir, skb->truesize); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 5b5a0adb927f..1b31171b58a4 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, { struct iphdr *iph; struct sk_buff *skb2; - bool mono_delivery_time = skb->mono_delivery_time; + bool mono_delivery_time = skb->clockid_delivery_time; struct rtable *rt = skb_rtable(skb); unsigned int mtu, hlen, ll_rs; struct ip_fraglist_iter iter; @@ -1455,6 +1455,15 @@ struct sk_buff *__ip_make_skb(struct sock *sk, skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); skb->mark = cork->mark; skb->tstamp = cork->transmit_time; + /* TX timestamp from userspace can be set by setting SO_TXTIME flag + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is + * set or not. + * Note:- the reason why we cannot use clockid to set the bool flag + * is because if userspace sets Monotonic clocks the macro value for + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. + */ + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); /* * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec * on dst refcount @@ -1646,7 +1655,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, arg->csumoffset) = csum_fold(csum_add(nskb->csum, arg->csum)); nskb->ip_summed = CHECKSUM_NONE; - nskb->mono_delivery_time = !!transmit_time; + nskb->clockid_delivery_time = !!transmit_time; if (txhash) skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4); ip_push_pending_frames(sk, &fl4); diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index aea89326c697..6ade217c4251 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -353,6 +353,15 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time; + /* TX timestamp from userspace can be set by setting SO_TXTIME flag + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is + * set or not. + * Note:- the reason why we cannot use clockid to set the bool flag + * is because if userspace sets Monotonic clocks the macro value for + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. + */ + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); skb_dst_set(skb, &rt->dst); *rtp = NULL; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a722a43dd668..e4548e395c3d 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ? inet6_sk(skb->sk) : NULL; - bool mono_delivery_time = skb->mono_delivery_time; + bool mono_delivery_time = skb->clockid_delivery_time; struct ip6_frag_state state; unsigned int mtu, hlen, nexthdr_offset; ktime_t tstamp = skb->tstamp; @@ -1922,7 +1922,15 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = cork->base.mark; skb->tstamp = cork->base.transmit_time; - + /* TX timestamp from userspace can be set by setting SO_TXTIME flag + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is + * set or not. + * Note:- the reason why we cannot use clockid to set the bool flag + * is because if userspace sets Monotonic clocks the macro value for + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. + */ + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); ip6_cork_steal_dst(skb, cork); IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS); if (proto == IPPROTO_ICMPV6) { diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c index 53d255838e6a..33bc6b0f3b13 100644 --- a/net/ipv6/netfilter.c +++ b/net/ipv6/netfilter.c @@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, struct sk_buff *)) { int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; - bool mono_delivery_time = skb->mono_delivery_time; + bool mono_delivery_time = skb->clockid_delivery_time; ktime_t tstamp = skb->tstamp; struct ip6_frag_state state; u8 *prevhdr, nexthdr = 0; diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index b2dd48911c8d..65cdc41c2bba 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; - fq->q.mono_delivery_time = skb->mono_delivery_time; + fq->q.mono_delivery_time = skb->clockid_delivery_time; fq->q.meat += skb->len; fq->ecn |= ecn; if (payload_len > fq->q.max_size) diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 03dbb874c363..1b95ccb50066 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -616,7 +616,15 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time; - + /* TX timestamp from userspace can be set by setting SO_TXTIME flag + * Once userspace sets the SO_TXTIME flag internally SOCK_TXTIME is set + * So we are setting the clockid_delivery_time based on if SOCK_TXTIME is + * set or not. + * Note:- the reason why we cannot use clockid to set the bool flag + * is because if userspace sets Monotonic clocks the macro value for + * monotonic clocks is 0 and !!clockid will return 0 for monotonic. + */ + skb->clockid_delivery_time = sock_flag(sk, SOCK_TXTIME); skb_put(skb, length); skb_reset_network_header(skb); iph = ipv6_hdr(skb); diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 5ebc47da1000..ad0e6832b199 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; - fq->q.mono_delivery_time = skb->mono_delivery_time; + fq->q.mono_delivery_time = skb->clockid_delivery_time; fq->q.meat += skb->len; fq->ecn |= ecn; add_frag_mem_limit(fq->q.fqdir, skb->truesize); diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c index 0e3cf11ae5fc..47f46fcb1827 100644 --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb, bpf_compute_data_pointers(skb); filter_res = bpf_prog_run(filter, skb); } - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) - skb->mono_delivery_time = 0; + if (unlikely(!skb->tstamp && skb->clockid_delivery_time)) + skb->clockid_delivery_time = 0; if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK) skb_orphan(skb); diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 5e83e890f6a4..fc85a7619f07 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb, bpf_compute_data_pointers(skb); filter_res = bpf_prog_run(prog->filter, skb); } - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) - skb->mono_delivery_time = 0; + if (unlikely(!skb->tstamp && skb->clockid_delivery_time)) + skb->clockid_delivery_time = 0; if (prog->exts_integrated) { res->class = 0;
Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc checks the packet coming from userspace and encounters to be 0 thereby dropping time sensitive packets. These changes will allow userspace timestamps packets to be forwarded from the bridge to NIC drivers. Existing functionality of mono_delivery_time is not altered here instead just extended with userspace tstamp support for bridge forwarding path. Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> --- Changes since v1 - Changed the commit subject as i am modifying the mono_delivery_time bit with clockid_delivery_time. - Took care of suggestion mentioned by Willem to use the same bit for userspace delivery time as there are no conflicts between TCP and SCM_TXTIME, because explicit cmsg makes no sense for TCP and only RAW and DGRAM sockets interprets it. - Clear explaination of why this is needed mentioned below and this is extending the work done by Martin for mono_delivery_time https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/ - Version 1 patch can be referenced with below link which states the exact problem with tc-etf and discussions which took place https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/ include/linux/skbuff.h | 22 +++++++++++++--------- net/bridge/netfilter/nf_conntrack_bridge.c | 2 +- net/core/dev.c | 2 +- net/core/filter.c | 6 +++--- net/ieee802154/6lowpan/reassembly.c | 2 +- net/ipv4/inet_fragment.c | 2 +- net/ipv4/ip_fragment.c | 2 +- net/ipv4/ip_output.c | 13 +++++++++++-- net/ipv4/raw.c | 9 +++++++++ net/ipv6/ip6_output.c | 12 ++++++++++-- net/ipv6/netfilter.c | 2 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- net/ipv6/raw.c | 10 +++++++++- net/ipv6/reassembly.c | 2 +- net/sched/act_bpf.c | 4 ++-- net/sched/cls_bpf.c | 4 ++-- 16 files changed, 67 insertions(+), 29 deletions(-)