diff mbox series

[net-next,v3,2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets

Message ID 20240904113153.2196238-3-vadfed@meta.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Add option to provide OPT_ID value via cmsg | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 14 this patch: 14
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: edumazet@google.com
netdev/build_clang fail Errors and warnings before: 17 this patch: 17
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 14 this patch: 14
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

Vadim Fedorenko Sept. 4, 2024, 11:31 a.m. UTC
TCP sockets have different flow for providing timestamp OPT_ID value.
Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
 net/ipv4/tcp.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Vadim Fedorenko Sept. 5, 2024, 2:58 p.m. UTC | #1
On 04/09/2024 12:31, Vadim Fedorenko wrote:
> TCP sockets have different flow for providing timestamp OPT_ID value.
> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> 
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
>   net/ipv4/tcp.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 8a5680b4e786..5553a8aeee80 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>   }
>   EXPORT_SYMBOL(tcp_init_sock);
>   
> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>   {
>   	struct sk_buff *skb = tcp_write_queue_tail(sk);
> +	u32 tsflags = sockc->tsflags;
>   
>   	if (tsflags && skb) {
>   		struct skb_shared_info *shinfo = skb_shinfo(skb);
> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>   		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>   		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>   			tcb->txstamp_ack = 1;
> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> +				shinfo->tskey = sockc->ts_opt_id;
> +			else
> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> +		}
>   	}
>   }
>   
> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>   
>   out:
>   	if (copied) {
> -		tcp_tx_timestamp(sk, sockc.tsflags);
> +		tcp_tx_timestamp(sk, &sockc);
>   		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>   	}
>   out_nopush:

Hi Willem,

Unfortunately, these changes are not enough to enable custom OPT_ID for
TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
flow:

tcp_skb_collapse_tstamp()
tcp_fragment_tstamp()
tcp_gso_tstamp()

I believe the last one breaks tests, but the problem is that there is no
easy way to provide the flag of constant tskey to it. Only
shinfo::tx_flags are available at the caller side and we have already
discussed that we shouldn't use the last bit of this field.

So, how should we deal with the problem? Or is it better to postpone
support for TCP sockets in this case?

Thanks,
Vadim
Jason Xing Sept. 5, 2024, 3:37 p.m. UTC | #2
Hello Vadim,

On Thu, Sep 5, 2024 at 10:58 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> > TCP sockets have different flow for providing timestamp OPT_ID value.
> > Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >
> > Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > ---
> >   net/ipv4/tcp.c | 13 +++++++++----
> >   1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 8a5680b4e786..5553a8aeee80 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >   }
> >   EXPORT_SYMBOL(tcp_init_sock);
> >
> > -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >   {
> >       struct sk_buff *skb = tcp_write_queue_tail(sk);
> > +     u32 tsflags = sockc->tsflags;
> >
> >       if (tsflags && skb) {
> >               struct skb_shared_info *shinfo = skb_shinfo(skb);
> > @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >               sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >               if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >                       tcb->txstamp_ack = 1;
> > -             if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> > -                     shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > +             if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> > +                     if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> > +                             shinfo->tskey = sockc->ts_opt_id;
> > +                     else
> > +                             shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > +             }
> >       }
> >   }
> >
> > @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >
> >   out:
> >       if (copied) {
> > -             tcp_tx_timestamp(sk, sockc.tsflags);
> > +             tcp_tx_timestamp(sk, &sockc);
> >               tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >       }
> >   out_nopush:
>
> Hi Willem,
>
> Unfortunately, these changes are not enough to enable custom OPT_ID for
> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> flow:

I was planning to test locally after your new version is posted... Now
I can see the good of a useful selftest from this :)

>
> tcp_skb_collapse_tstamp()
> tcp_fragment_tstamp()
> tcp_gso_tstamp()
>
> I believe the last one breaks tests, but the problem is that there is no
> easy way to provide the flag of constant tskey to it. Only
> shinfo::tx_flags are available at the caller side and we have already
> discussed that we shouldn't use the last bit of this field.
>
> So, how should we deal with the problem? Or is it better to postpone
> support for TCP sockets in this case?

I'm not Willem, but my intuition is to postpone it since this will
make the series much bigger than the previous expectation.

Let Willem decide finally :)
Willem de Bruijn Sept. 5, 2024, 4:39 p.m. UTC | #3
Vadim Fedorenko wrote:
> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> > TCP sockets have different flow for providing timestamp OPT_ID value.
> > Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> > 
> > Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > ---
> >   net/ipv4/tcp.c | 13 +++++++++----
> >   1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 8a5680b4e786..5553a8aeee80 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >   }
> >   EXPORT_SYMBOL(tcp_init_sock);
> >   
> > -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >   {
> >   	struct sk_buff *skb = tcp_write_queue_tail(sk);
> > +	u32 tsflags = sockc->tsflags;
> >   
> >   	if (tsflags && skb) {
> >   		struct skb_shared_info *shinfo = skb_shinfo(skb);
> > @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >   		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >   		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >   			tcb->txstamp_ack = 1;
> > -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> > -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> > +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> > +				shinfo->tskey = sockc->ts_opt_id;
> > +			else
> > +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > +		}
> >   	}
> >   }
> >   
> > @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >   
> >   out:
> >   	if (copied) {
> > -		tcp_tx_timestamp(sk, sockc.tsflags);
> > +		tcp_tx_timestamp(sk, &sockc);
> >   		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >   	}
> >   out_nopush:
> 
> Hi Willem,
> 
> Unfortunately, these changes are not enough to enable custom OPT_ID for
> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> flow:
> 
> tcp_skb_collapse_tstamp()
> tcp_fragment_tstamp()
> tcp_gso_tstamp()
> 
> I believe the last one breaks tests, but the problem is that there is no
> easy way to provide the flag of constant tskey to it. Only
> shinfo::tx_flags are available at the caller side and we have already
> discussed that we shouldn't use the last bit of this field.
> 
> So, how should we deal with the problem? Or is it better to postpone
> support for TCP sockets in this case?

Are you sure that this is a problem. These functions pass on the
skb_shinfo(skb)->ts_key from one skb to another.

As long as tcp_tx_timestamp sets the skb_shinfo(skb)->ts_key of the
original skb correctly, either from the cmsg or sk_tskey, then I don't
immediate see how this passing on from one skb to another would break
the intent.
Vadim Fedorenko Sept. 6, 2024, 12:20 p.m. UTC | #4
On 05/09/2024 17:39, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>
>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>> ---
>>>    net/ipv4/tcp.c | 13 +++++++++----
>>>    1 file changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 8a5680b4e786..5553a8aeee80 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>    }
>>>    EXPORT_SYMBOL(tcp_init_sock);
>>>    
>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>    {
>>>    	struct sk_buff *skb = tcp_write_queue_tail(sk);
>>> +	u32 tsflags = sockc->tsflags;
>>>    
>>>    	if (tsflags && skb) {
>>>    		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>    		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>    		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>    			tcb->txstamp_ack = 1;
>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>> +				shinfo->tskey = sockc->ts_opt_id;
>>> +			else
>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>> +		}
>>>    	}
>>>    }
>>>    
>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>    
>>>    out:
>>>    	if (copied) {
>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
>>> +		tcp_tx_timestamp(sk, &sockc);
>>>    		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>    	}
>>>    out_nopush:
>>
>> Hi Willem,
>>
>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>> flow:
>>
>> tcp_skb_collapse_tstamp()
>> tcp_fragment_tstamp()
>> tcp_gso_tstamp()
>>
>> I believe the last one breaks tests, but the problem is that there is no
>> easy way to provide the flag of constant tskey to it. Only
>> shinfo::tx_flags are available at the caller side and we have already
>> discussed that we shouldn't use the last bit of this field.
>>
>> So, how should we deal with the problem? Or is it better to postpone
>> support for TCP sockets in this case?
> 
> Are you sure that this is a problem. These functions pass on the
> skb_shinfo(skb)->ts_key from one skb to another.

Yes, you are right, the problem is in a different place.

__skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
value:

	if (sk_is_tcp(sk))
		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);

It happens because of assumption that for TCP sockets shinfo::tskey will
have sequence number and the logic has to recalculate it back to the
bytes sent. The same logic exists in all TCP TX timestamping functions
(mentioned in the previous mail) and may trigger some unexpected
behavior. To fix the issue we have to provide some kind of signal that
tskey value is provided from user-space and shouldn't be changed. And we
have to have it somewhere in skb info. Again, tx_flags looks like the
best candidate, but it's impossible to use. I'm thinking of using
special flag in tcp_skb_cb - gonna test more, but open for other
suggestions.
Willem de Bruijn Sept. 6, 2024, 3:25 p.m. UTC | #5
Vadim Fedorenko wrote:
> On 05/09/2024 17:39, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>
> >>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>> ---
> >>>    net/ipv4/tcp.c | 13 +++++++++----
> >>>    1 file changed, 9 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>> index 8a5680b4e786..5553a8aeee80 100644
> >>> --- a/net/ipv4/tcp.c
> >>> +++ b/net/ipv4/tcp.c
> >>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>    }
> >>>    EXPORT_SYMBOL(tcp_init_sock);
> >>>    
> >>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>    {
> >>>    	struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>> +	u32 tsflags = sockc->tsflags;
> >>>    
> >>>    	if (tsflags && skb) {
> >>>    		struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>    		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>    		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>    			tcb->txstamp_ack = 1;
> >>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>> +				shinfo->tskey = sockc->ts_opt_id;
> >>> +			else
> >>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>> +		}
> >>>    	}
> >>>    }
> >>>    
> >>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>    
> >>>    out:
> >>>    	if (copied) {
> >>> -		tcp_tx_timestamp(sk, sockc.tsflags);
> >>> +		tcp_tx_timestamp(sk, &sockc);
> >>>    		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>    	}
> >>>    out_nopush:
> >>
> >> Hi Willem,
> >>
> >> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >> flow:
> >>
> >> tcp_skb_collapse_tstamp()
> >> tcp_fragment_tstamp()
> >> tcp_gso_tstamp()
> >>
> >> I believe the last one breaks tests, but the problem is that there is no
> >> easy way to provide the flag of constant tskey to it. Only
> >> shinfo::tx_flags are available at the caller side and we have already
> >> discussed that we shouldn't use the last bit of this field.
> >>
> >> So, how should we deal with the problem? Or is it better to postpone
> >> support for TCP sockets in this case?
> > 
> > Are you sure that this is a problem. These functions pass on the
> > skb_shinfo(skb)->ts_key from one skb to another.
> 
> Yes, you are right, the problem is in a different place.
> 
> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> value:
> 
> 	if (sk_is_tcp(sk))
> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> 
> It happens because of assumption that for TCP sockets shinfo::tskey will
> have sequence number and the logic has to recalculate it back to the
> bytes sent. The same logic exists in all TCP TX timestamping functions
> (mentioned in the previous mail) and may trigger some unexpected
> behavior. To fix the issue we have to provide some kind of signal that
> tskey value is provided from user-space and shouldn't be changed. And we
> have to have it somewhere in skb info. Again, tx_flags looks like the
> best candidate, but it's impossible to use. I'm thinking of using
> special flag in tcp_skb_cb - gonna test more, but open for other
> suggestions.

Ai, that is tricky. tx_flags is full/scarce indeed.

CB does not persist as the skb transitions between layers.

The most obvious solution would be to set the flag in sk_tsflags
itself. But then the cmsg would no long work on a per request basis:
either the socket uses OPT_ID with counter or OPT_ID_CMSG.

Good that we catch this now before the ABI is finalized.

If necessary TCP semantics can diverge from datagrams. So we could
punt on this. But it's not ideal.
Willem de Bruijn Sept. 6, 2024, 4:33 p.m. UTC | #6
Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
> > On 05/09/2024 17:39, Willem de Bruijn wrote:
> > > Vadim Fedorenko wrote:
> > >> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> > >>> TCP sockets have different flow for providing timestamp OPT_ID value.
> > >>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> > >>>
> > >>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > >>> ---
> > >>>    net/ipv4/tcp.c | 13 +++++++++----
> > >>>    1 file changed, 9 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > >>> index 8a5680b4e786..5553a8aeee80 100644
> > >>> --- a/net/ipv4/tcp.c
> > >>> +++ b/net/ipv4/tcp.c
> > >>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> > >>>    }
> > >>>    EXPORT_SYMBOL(tcp_init_sock);
> > >>>    
> > >>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > >>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> > >>>    {
> > >>>    	struct sk_buff *skb = tcp_write_queue_tail(sk);
> > >>> +	u32 tsflags = sockc->tsflags;
> > >>>    
> > >>>    	if (tsflags && skb) {
> > >>>    		struct skb_shared_info *shinfo = skb_shinfo(skb);
> > >>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > >>>    		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> > >>>    		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> > >>>    			tcb->txstamp_ack = 1;
> > >>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> > >>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > >>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> > >>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> > >>> +				shinfo->tskey = sockc->ts_opt_id;
> > >>> +			else
> > >>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > >>> +		}
> > >>>    	}
> > >>>    }
> > >>>    
> > >>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> > >>>    
> > >>>    out:
> > >>>    	if (copied) {
> > >>> -		tcp_tx_timestamp(sk, sockc.tsflags);
> > >>> +		tcp_tx_timestamp(sk, &sockc);
> > >>>    		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> > >>>    	}
> > >>>    out_nopush:
> > >>
> > >> Hi Willem,
> > >>
> > >> Unfortunately, these changes are not enough to enable custom OPT_ID for
> > >> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> > >> flow:
> > >>
> > >> tcp_skb_collapse_tstamp()
> > >> tcp_fragment_tstamp()
> > >> tcp_gso_tstamp()
> > >>
> > >> I believe the last one breaks tests, but the problem is that there is no
> > >> easy way to provide the flag of constant tskey to it. Only
> > >> shinfo::tx_flags are available at the caller side and we have already
> > >> discussed that we shouldn't use the last bit of this field.
> > >>
> > >> So, how should we deal with the problem? Or is it better to postpone
> > >> support for TCP sockets in this case?
> > > 
> > > Are you sure that this is a problem. These functions pass on the
> > > skb_shinfo(skb)->ts_key from one skb to another.
> > 
> > Yes, you are right, the problem is in a different place.
> > 
> > __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> > provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> > value:
> > 
> > 	if (sk_is_tcp(sk))
> > 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> > 
> > It happens because of assumption that for TCP sockets shinfo::tskey will
> > have sequence number and the logic has to recalculate it back to the
> > bytes sent. The same logic exists in all TCP TX timestamping functions
> > (mentioned in the previous mail) and may trigger some unexpected
> > behavior. To fix the issue we have to provide some kind of signal that
> > tskey value is provided from user-space and shouldn't be changed. And we
> > have to have it somewhere in skb info. Again, tx_flags looks like the
> > best candidate, but it's impossible to use. I'm thinking of using
> > special flag in tcp_skb_cb - gonna test more, but open for other
> > suggestions.
> 
> Ai, that is tricky. tx_flags is full/scarce indeed.
> 
> CB does not persist as the skb transitions between layers.

Though specifically for TCP, it is possible to look up the fast
clone on the rtx queue, whose tcp_skb_cb will be unperturbed. But
the tcb currently does not have this data either.
Vadim Fedorenko Sept. 6, 2024, 5:27 p.m. UTC | #7
On 06/09/2024 16:25, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>
>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>> ---
>>>>>     net/ipv4/tcp.c | 13 +++++++++----
>>>>>     1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>> --- a/net/ipv4/tcp.c
>>>>> +++ b/net/ipv4/tcp.c
>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>     }
>>>>>     EXPORT_SYMBOL(tcp_init_sock);
>>>>>     
>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>     {
>>>>>     	struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>> +	u32 tsflags = sockc->tsflags;
>>>>>     
>>>>>     	if (tsflags && skb) {
>>>>>     		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>     		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>     		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>     			tcb->txstamp_ack = 1;
>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>> +				shinfo->tskey = sockc->ts_opt_id;
>>>>> +			else
>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>> +		}
>>>>>     	}
>>>>>     }
>>>>>     
>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>     
>>>>>     out:
>>>>>     	if (copied) {
>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
>>>>> +		tcp_tx_timestamp(sk, &sockc);
>>>>>     		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>     	}
>>>>>     out_nopush:
>>>>
>>>> Hi Willem,
>>>>
>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>> flow:
>>>>
>>>> tcp_skb_collapse_tstamp()
>>>> tcp_fragment_tstamp()
>>>> tcp_gso_tstamp()
>>>>
>>>> I believe the last one breaks tests, but the problem is that there is no
>>>> easy way to provide the flag of constant tskey to it. Only
>>>> shinfo::tx_flags are available at the caller side and we have already
>>>> discussed that we shouldn't use the last bit of this field.
>>>>
>>>> So, how should we deal with the problem? Or is it better to postpone
>>>> support for TCP sockets in this case?
>>>
>>> Are you sure that this is a problem. These functions pass on the
>>> skb_shinfo(skb)->ts_key from one skb to another.
>>
>> Yes, you are right, the problem is in a different place.
>>
>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>> value:
>>
>> 	if (sk_is_tcp(sk))
>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>
>> It happens because of assumption that for TCP sockets shinfo::tskey will
>> have sequence number and the logic has to recalculate it back to the
>> bytes sent. The same logic exists in all TCP TX timestamping functions
>> (mentioned in the previous mail) and may trigger some unexpected
>> behavior. To fix the issue we have to provide some kind of signal that
>> tskey value is provided from user-space and shouldn't be changed. And we
>> have to have it somewhere in skb info. Again, tx_flags looks like the
>> best candidate, but it's impossible to use. I'm thinking of using
>> special flag in tcp_skb_cb - gonna test more, but open for other
>> suggestions.
> 
> Ai, that is tricky. tx_flags is full/scarce indeed.
> 
> CB does not persist as the skb transitions between layers.
> 
> The most obvious solution would be to set the flag in sk_tsflags
> itself. But then the cmsg would no long work on a per request basis:
> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> 
> Good that we catch this now before the ABI is finalized.
> 
> If necessary TCP semantics can diverge from datagrams. So we could
> punt on this. But it's not ideal.

I have done proof of concept code which uses hwtsamp as a storage for
custom tskey in case of TCP:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a52638363ea5..40ed49e61bf7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct 
sk_buff *skb,
         serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
         if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
                 serr->ee.ee_data = skb_shinfo(skb)->tskey;
-               if (sk_is_tcp(sk))
-                       serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
         }

         err = sock_queue_err_skb(sk, skb);
@@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
          * but only if the socket refcount is not zero.
          */
         if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
+               if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) & 
SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
+                       skb_shinfo(skb)->tskey = 
(u32)skb_hwtstamps(skb)->hwtstamp;
                 *skb_hwtstamps(skb) = *hwtstamps;
                 __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, 
false);
                 sock_put(sk);
@@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
                 skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
         }

+       if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
+               if (skb_hwtstamps(orig_skb)->hwtstamp)
+                       skb_shinfo(skb)->tskey = 
(u32)skb_hwtstamps(orig_skb)->hwtstamp;
+               else
+                       skb_shinfo(skb)->tskey -= 
atomic_read(&sk->sk_tskey);
+       }
         if (hwtstamps)
                 *skb_hwtstamps(skb) = *hwtstamps;
         else
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d3decc13a99..1a161a2155b5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct 
sockcm_cookie *sockc)
                         tcb->txstamp_ack = 1;
                 if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
                         if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
-                               shinfo->tskey = sockc->ts_opt_id;
-                       else
-                               shinfo->tskey = TCP_SKB_CB(skb)->seq + 
skb->len - 1;
+                               skb_hwtstamps(skb)->hwtstamp = 
sockc->ts_opt_id;
+                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
                 }
         }
  }


Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
it. netdev_data field is only used on RX timestamp side, so should be
fine to reuse. WDYT?
Vadim Fedorenko Sept. 6, 2024, 5:27 p.m. UTC | #8
On 06/09/2024 17:33, Willem de Bruijn wrote:
> Willem de Bruijn wrote:
>> Vadim Fedorenko wrote:
>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>>> Vadim Fedorenko wrote:
>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>>
>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>>> ---
>>>>>>     net/ipv4/tcp.c | 13 +++++++++----
>>>>>>     1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>>> --- a/net/ipv4/tcp.c
>>>>>> +++ b/net/ipv4/tcp.c
>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>>     }
>>>>>>     EXPORT_SYMBOL(tcp_init_sock);
>>>>>>     
>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>     {
>>>>>>     	struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>>> +	u32 tsflags = sockc->tsflags;
>>>>>>     
>>>>>>     	if (tsflags && skb) {
>>>>>>     		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>     		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>>     		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>>     			tcb->txstamp_ack = 1;
>>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>>> +				shinfo->tskey = sockc->ts_opt_id;
>>>>>> +			else
>>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>> +		}
>>>>>>     	}
>>>>>>     }
>>>>>>     
>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>>     
>>>>>>     out:
>>>>>>     	if (copied) {
>>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
>>>>>> +		tcp_tx_timestamp(sk, &sockc);
>>>>>>     		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>>     	}
>>>>>>     out_nopush:
>>>>>
>>>>> Hi Willem,
>>>>>
>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>>> flow:
>>>>>
>>>>> tcp_skb_collapse_tstamp()
>>>>> tcp_fragment_tstamp()
>>>>> tcp_gso_tstamp()
>>>>>
>>>>> I believe the last one breaks tests, but the problem is that there is no
>>>>> easy way to provide the flag of constant tskey to it. Only
>>>>> shinfo::tx_flags are available at the caller side and we have already
>>>>> discussed that we shouldn't use the last bit of this field.
>>>>>
>>>>> So, how should we deal with the problem? Or is it better to postpone
>>>>> support for TCP sockets in this case?
>>>>
>>>> Are you sure that this is a problem. These functions pass on the
>>>> skb_shinfo(skb)->ts_key from one skb to another.
>>>
>>> Yes, you are right, the problem is in a different place.
>>>
>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>>> value:
>>>
>>> 	if (sk_is_tcp(sk))
>>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>
>>> It happens because of assumption that for TCP sockets shinfo::tskey will
>>> have sequence number and the logic has to recalculate it back to the
>>> bytes sent. The same logic exists in all TCP TX timestamping functions
>>> (mentioned in the previous mail) and may trigger some unexpected
>>> behavior. To fix the issue we have to provide some kind of signal that
>>> tskey value is provided from user-space and shouldn't be changed. And we
>>> have to have it somewhere in skb info. Again, tx_flags looks like the
>>> best candidate, but it's impossible to use. I'm thinking of using
>>> special flag in tcp_skb_cb - gonna test more, but open for other
>>> suggestions.
>>
>> Ai, that is tricky. tx_flags is full/scarce indeed.
>>
>> CB does not persist as the skb transitions between layers.
> 
> Though specifically for TCP, it is possible to look up the fast
> clone on the rtx queue, whose tcp_skb_cb will be unperturbed. But
> the tcb currently does not have this data either.

It will work fine for software timestamps, but we cannot do the same
trick in case of HW timestamps, right?
Willem de Bruijn Sept. 6, 2024, 11:48 p.m. UTC | #9
Vadim Fedorenko wrote:
> On 06/09/2024 16:25, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>> Vadim Fedorenko wrote:
> >>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>
> >>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>>>> ---
> >>>>>     net/ipv4/tcp.c | 13 +++++++++----
> >>>>>     1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>> --- a/net/ipv4/tcp.c
> >>>>> +++ b/net/ipv4/tcp.c
> >>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>>     }
> >>>>>     EXPORT_SYMBOL(tcp_init_sock);
> >>>>>     
> >>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>>     {
> >>>>>     	struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>> +	u32 tsflags = sockc->tsflags;
> >>>>>     
> >>>>>     	if (tsflags && skb) {
> >>>>>     		struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>     		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>>     		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>>     			tcb->txstamp_ack = 1;
> >>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>> +				shinfo->tskey = sockc->ts_opt_id;
> >>>>> +			else
> >>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>> +		}
> >>>>>     	}
> >>>>>     }
> >>>>>     
> >>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>     
> >>>>>     out:
> >>>>>     	if (copied) {
> >>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>> +		tcp_tx_timestamp(sk, &sockc);
> >>>>>     		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>>     	}
> >>>>>     out_nopush:
> >>>>
> >>>> Hi Willem,
> >>>>
> >>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>> flow:
> >>>>
> >>>> tcp_skb_collapse_tstamp()
> >>>> tcp_fragment_tstamp()
> >>>> tcp_gso_tstamp()
> >>>>
> >>>> I believe the last one breaks tests, but the problem is that there is no
> >>>> easy way to provide the flag of constant tskey to it. Only
> >>>> shinfo::tx_flags are available at the caller side and we have already
> >>>> discussed that we shouldn't use the last bit of this field.
> >>>>
> >>>> So, how should we deal with the problem? Or is it better to postpone
> >>>> support for TCP sockets in this case?
> >>>
> >>> Are you sure that this is a problem. These functions pass on the
> >>> skb_shinfo(skb)->ts_key from one skb to another.
> >>
> >> Yes, you are right, the problem is in a different place.
> >>
> >> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >> value:
> >>
> >> 	if (sk_is_tcp(sk))
> >> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>
> >> It happens because of assumption that for TCP sockets shinfo::tskey will
> >> have sequence number and the logic has to recalculate it back to the
> >> bytes sent. The same logic exists in all TCP TX timestamping functions
> >> (mentioned in the previous mail) and may trigger some unexpected
> >> behavior. To fix the issue we have to provide some kind of signal that
> >> tskey value is provided from user-space and shouldn't be changed. And we
> >> have to have it somewhere in skb info. Again, tx_flags looks like the
> >> best candidate, but it's impossible to use. I'm thinking of using
> >> special flag in tcp_skb_cb - gonna test more, but open for other
> >> suggestions.
> > 
> > Ai, that is tricky. tx_flags is full/scarce indeed.
> > 
> > CB does not persist as the skb transitions between layers.
> > 
> > The most obvious solution would be to set the flag in sk_tsflags
> > itself. But then the cmsg would no long work on a per request basis:
> > either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> > 
> > Good that we catch this now before the ABI is finalized.
> > 
> > If necessary TCP semantics can diverge from datagrams. So we could
> > punt on this. But it's not ideal.
> 
> I have done proof of concept code which uses hwtsamp as a storage for
> custom tskey in case of TCP:
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a52638363ea5..40ed49e61bf7 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct 
> sk_buff *skb,
>          serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
>          if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
>                  serr->ee.ee_data = skb_shinfo(skb)->tskey;
> -               if (sk_is_tcp(sk))
> -                       serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>          }
> 
>          err = sock_queue_err_skb(sk, skb);
> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>           * but only if the socket refcount is not zero.
>           */
>          if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
> +               if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) & 
> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
> +                       skb_shinfo(skb)->tskey = 
> (u32)skb_hwtstamps(skb)->hwtstamp;
>                  *skb_hwtstamps(skb) = *hwtstamps;
>                  __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, 
> false);
>                  sock_put(sk);
> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>                  skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
>          }
> 
> +       if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> +               if (skb_hwtstamps(orig_skb)->hwtstamp)
> +                       skb_shinfo(skb)->tskey = 
> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
> +               else
> +                       skb_shinfo(skb)->tskey -= 
> atomic_read(&sk->sk_tskey);
> +       }
>          if (hwtstamps)
>                  *skb_hwtstamps(skb) = *hwtstamps;
>          else
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0d3decc13a99..1a161a2155b5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct 
> sockcm_cookie *sockc)
>                          tcb->txstamp_ack = 1;
>                  if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>                          if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> -                               shinfo->tskey = sockc->ts_opt_id;
> -                       else
> -                               shinfo->tskey = TCP_SKB_CB(skb)->seq + 
> skb->len - 1;
> +                               skb_hwtstamps(skb)->hwtstamp = 
> sockc->ts_opt_id;
> +                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>                  }
>          }
>   }
> 
> 
> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
> it. netdev_data field is only used on RX timestamp side, so should be
> fine to reuse. WDYT?

We cannot really extend the struct. skb_shared_info is scarce.
hwtstamps is a union. But on tx the hw tstamp is queued using
skb_tstamp_tx, not through this shinfo data at all.

It just seems weird to have a shinfo->tskey, but then ignore it and
find yet another 32b field. Easier would be to find 1b to toggle
whether tskey is to be interpreted as counter based or OPT_ID_CMSG.

I don't immediate see a perfect solution either.
Willem de Bruijn Sept. 6, 2024, 11:50 p.m. UTC | #10
Vadim Fedorenko wrote:
> On 06/09/2024 17:33, Willem de Bruijn wrote:
> > Willem de Bruijn wrote:
> >> Vadim Fedorenko wrote:
> >>> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>>> Vadim Fedorenko wrote:
> >>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>>
> >>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>>>>> ---
> >>>>>>     net/ipv4/tcp.c | 13 +++++++++----
> >>>>>>     1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>>
> >>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>>> --- a/net/ipv4/tcp.c
> >>>>>> +++ b/net/ipv4/tcp.c
> >>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>>>     }
> >>>>>>     EXPORT_SYMBOL(tcp_init_sock);
> >>>>>>     
> >>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>>>     {
> >>>>>>     	struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>>> +	u32 tsflags = sockc->tsflags;
> >>>>>>     
> >>>>>>     	if (tsflags && skb) {
> >>>>>>     		struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>>     		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>>>     		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>>>     			tcb->txstamp_ack = 1;
> >>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>>> +				shinfo->tskey = sockc->ts_opt_id;
> >>>>>> +			else
> >>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>> +		}
> >>>>>>     	}
> >>>>>>     }
> >>>>>>     
> >>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>>     
> >>>>>>     out:
> >>>>>>     	if (copied) {
> >>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>>> +		tcp_tx_timestamp(sk, &sockc);
> >>>>>>     		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>>>     	}
> >>>>>>     out_nopush:
> >>>>>
> >>>>> Hi Willem,
> >>>>>
> >>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>>> flow:
> >>>>>
> >>>>> tcp_skb_collapse_tstamp()
> >>>>> tcp_fragment_tstamp()
> >>>>> tcp_gso_tstamp()
> >>>>>
> >>>>> I believe the last one breaks tests, but the problem is that there is no
> >>>>> easy way to provide the flag of constant tskey to it. Only
> >>>>> shinfo::tx_flags are available at the caller side and we have already
> >>>>> discussed that we shouldn't use the last bit of this field.
> >>>>>
> >>>>> So, how should we deal with the problem? Or is it better to postpone
> >>>>> support for TCP sockets in this case?
> >>>>
> >>>> Are you sure that this is a problem. These functions pass on the
> >>>> skb_shinfo(skb)->ts_key from one skb to another.
> >>>
> >>> Yes, you are right, the problem is in a different place.
> >>>
> >>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >>> value:
> >>>
> >>> 	if (sk_is_tcp(sk))
> >>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>>
> >>> It happens because of assumption that for TCP sockets shinfo::tskey will
> >>> have sequence number and the logic has to recalculate it back to the
> >>> bytes sent. The same logic exists in all TCP TX timestamping functions
> >>> (mentioned in the previous mail) and may trigger some unexpected
> >>> behavior. To fix the issue we have to provide some kind of signal that
> >>> tskey value is provided from user-space and shouldn't be changed. And we
> >>> have to have it somewhere in skb info. Again, tx_flags looks like the
> >>> best candidate, but it's impossible to use. I'm thinking of using
> >>> special flag in tcp_skb_cb - gonna test more, but open for other
> >>> suggestions.
> >>
> >> Ai, that is tricky. tx_flags is full/scarce indeed.
> >>
> >> CB does not persist as the skb transitions between layers.
> > 
> > Though specifically for TCP, it is possible to look up the fast
> > clone on the rtx queue, whose tcp_skb_cb will be unperturbed. But
> > the tcb currently does not have this data either.
> 
> It will work fine for software timestamps, but we cannot do the same
> trick in case of HW timestamps, right?

I'm not really advocating it. The only user of this trick that I can
find is skb_still_in_host_queue, through skb_fclone_busy.

That said, it would also work for hardware, as the SKB_FCLONE_ORIG
remains on the rtx queue. In the common case. But skb_fclone_busy
points out edge cases, such as if a driver call skb_orphan..
Vadim Fedorenko Sept. 7, 2024, 9:55 p.m. UTC | #11
On 07/09/2024 00:48, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 06/09/2024 16:25, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>>>> Vadim Fedorenko wrote:
>>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>>>
>>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>>>> ---
>>>>>>>      net/ipv4/tcp.c | 13 +++++++++----
>>>>>>>      1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>>>> --- a/net/ipv4/tcp.c
>>>>>>> +++ b/net/ipv4/tcp.c
>>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>>>      }
>>>>>>>      EXPORT_SYMBOL(tcp_init_sock);
>>>>>>>      
>>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>>      {
>>>>>>>      	struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>>>> +	u32 tsflags = sockc->tsflags;
>>>>>>>      
>>>>>>>      	if (tsflags && skb) {
>>>>>>>      		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>>      		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>>>      		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>>>      			tcb->txstamp_ack = 1;
>>>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>>>> +				shinfo->tskey = sockc->ts_opt_id;
>>>>>>> +			else
>>>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>> +		}
>>>>>>>      	}
>>>>>>>      }
>>>>>>>      
>>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>>>      
>>>>>>>      out:
>>>>>>>      	if (copied) {
>>>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
>>>>>>> +		tcp_tx_timestamp(sk, &sockc);
>>>>>>>      		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>>>      	}
>>>>>>>      out_nopush:
>>>>>>
>>>>>> Hi Willem,
>>>>>>
>>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>>>> flow:
>>>>>>
>>>>>> tcp_skb_collapse_tstamp()
>>>>>> tcp_fragment_tstamp()
>>>>>> tcp_gso_tstamp()
>>>>>>
>>>>>> I believe the last one breaks tests, but the problem is that there is no
>>>>>> easy way to provide the flag of constant tskey to it. Only
>>>>>> shinfo::tx_flags are available at the caller side and we have already
>>>>>> discussed that we shouldn't use the last bit of this field.
>>>>>>
>>>>>> So, how should we deal with the problem? Or is it better to postpone
>>>>>> support for TCP sockets in this case?
>>>>>
>>>>> Are you sure that this is a problem. These functions pass on the
>>>>> skb_shinfo(skb)->ts_key from one skb to another.
>>>>
>>>> Yes, you are right, the problem is in a different place.
>>>>
>>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>>>> value:
>>>>
>>>> 	if (sk_is_tcp(sk))
>>>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>>
>>>> It happens because of assumption that for TCP sockets shinfo::tskey will
>>>> have sequence number and the logic has to recalculate it back to the
>>>> bytes sent. The same logic exists in all TCP TX timestamping functions
>>>> (mentioned in the previous mail) and may trigger some unexpected
>>>> behavior. To fix the issue we have to provide some kind of signal that
>>>> tskey value is provided from user-space and shouldn't be changed. And we
>>>> have to have it somewhere in skb info. Again, tx_flags looks like the
>>>> best candidate, but it's impossible to use. I'm thinking of using
>>>> special flag in tcp_skb_cb - gonna test more, but open for other
>>>> suggestions.
>>>
>>> Ai, that is tricky. tx_flags is full/scarce indeed.
>>>
>>> CB does not persist as the skb transitions between layers.
>>>
>>> The most obvious solution would be to set the flag in sk_tsflags
>>> itself. But then the cmsg would no long work on a per request basis:
>>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
>>>
>>> Good that we catch this now before the ABI is finalized.
>>>
>>> If necessary TCP semantics can diverge from datagrams. So we could
>>> punt on this. But it's not ideal.
>>
>> I have done proof of concept code which uses hwtsamp as a storage for
>> custom tskey in case of TCP:
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a52638363ea5..40ed49e61bf7 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
>> sk_buff *skb,
>>           serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
>>           if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
>>                   serr->ee.ee_data = skb_shinfo(skb)->tskey;
>> -               if (sk_is_tcp(sk))
>> -                       serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>           }
>>
>>           err = sock_queue_err_skb(sk, skb);
>> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>>            * but only if the socket refcount is not zero.
>>            */
>>           if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
>> +               if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
>> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
>> +                       skb_shinfo(skb)->tskey =
>> (u32)skb_hwtstamps(skb)->hwtstamp;
>>                   *skb_hwtstamps(skb) = *hwtstamps;
>>                   __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
>> false);
>>                   sock_put(sk);
>> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>>                   skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
>>           }
>>
>> +       if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
>> +               if (skb_hwtstamps(orig_skb)->hwtstamp)
>> +                       skb_shinfo(skb)->tskey =
>> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
>> +               else
>> +                       skb_shinfo(skb)->tskey -=
>> atomic_read(&sk->sk_tskey);
>> +       }
>>           if (hwtstamps)
>>                   *skb_hwtstamps(skb) = *hwtstamps;
>>           else
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 0d3decc13a99..1a161a2155b5 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
>> sockcm_cookie *sockc)
>>                           tcb->txstamp_ack = 1;
>>                   if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>                           if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>> -                               shinfo->tskey = sockc->ts_opt_id;
>> -                       else
>> -                               shinfo->tskey = TCP_SKB_CB(skb)->seq +
>> skb->len - 1;
>> +                               skb_hwtstamps(skb)->hwtstamp =
>> sockc->ts_opt_id;
>> +                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>                   }
>>           }
>>    }
>>
>>
>> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
>> it. netdev_data field is only used on RX timestamp side, so should be
>> fine to reuse. WDYT?
> 
> We cannot really extend the struct. skb_shared_info is scarce.
> hwtstamps is a union. But on tx the hw tstamp is queued using
> skb_tstamp_tx, not through this shinfo data at all.
> 
> It just seems weird to have a shinfo->tskey, but then ignore it and
> find yet another 32b field. Easier would be to find 1b to toggle
> whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
> 
> I don't immediate see a perfect solution either.

Well, with this said, and given that having 2 different tskey values is
weird solution, I'm thinking of skipping TCP part until we can find some
proper way. Will it be OK to have UDP and RAW sockets only opted into
the feature?
Willem de Bruijn Sept. 8, 2024, 7:19 p.m. UTC | #12
Vadim Fedorenko wrote:
> On 07/09/2024 00:48, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 06/09/2024 16:25, Willem de Bruijn wrote:
> >>> Vadim Fedorenko wrote:
> >>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>>>> Vadim Fedorenko wrote:
> >>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>>>
> >>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>>>>>> ---
> >>>>>>>      net/ipv4/tcp.c | 13 +++++++++----
> >>>>>>>      1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>>>> --- a/net/ipv4/tcp.c
> >>>>>>> +++ b/net/ipv4/tcp.c
> >>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>>>>      }
> >>>>>>>      EXPORT_SYMBOL(tcp_init_sock);
> >>>>>>>      
> >>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>>>>      {
> >>>>>>>      	struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>>>> +	u32 tsflags = sockc->tsflags;
> >>>>>>>      
> >>>>>>>      	if (tsflags && skb) {
> >>>>>>>      		struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>>>      		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>>>>      		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>>>>      			tcb->txstamp_ack = 1;
> >>>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>>>> +				shinfo->tskey = sockc->ts_opt_id;
> >>>>>>> +			else
> >>>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>>> +		}
> >>>>>>>      	}
> >>>>>>>      }
> >>>>>>>      
> >>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>>>      
> >>>>>>>      out:
> >>>>>>>      	if (copied) {
> >>>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>>>> +		tcp_tx_timestamp(sk, &sockc);
> >>>>>>>      		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>>>>      	}
> >>>>>>>      out_nopush:
> >>>>>>
> >>>>>> Hi Willem,
> >>>>>>
> >>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>>>> flow:
> >>>>>>
> >>>>>> tcp_skb_collapse_tstamp()
> >>>>>> tcp_fragment_tstamp()
> >>>>>> tcp_gso_tstamp()
> >>>>>>
> >>>>>> I believe the last one breaks tests, but the problem is that there is no
> >>>>>> easy way to provide the flag of constant tskey to it. Only
> >>>>>> shinfo::tx_flags are available at the caller side and we have already
> >>>>>> discussed that we shouldn't use the last bit of this field.
> >>>>>>
> >>>>>> So, how should we deal with the problem? Or is it better to postpone
> >>>>>> support for TCP sockets in this case?
> >>>>>
> >>>>> Are you sure that this is a problem. These functions pass on the
> >>>>> skb_shinfo(skb)->ts_key from one skb to another.
> >>>>
> >>>> Yes, you are right, the problem is in a different place.
> >>>>
> >>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >>>> value:
> >>>>
> >>>> 	if (sk_is_tcp(sk))
> >>>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>>>
> >>>> It happens because of assumption that for TCP sockets shinfo::tskey will
> >>>> have sequence number and the logic has to recalculate it back to the
> >>>> bytes sent. The same logic exists in all TCP TX timestamping functions
> >>>> (mentioned in the previous mail) and may trigger some unexpected
> >>>> behavior. To fix the issue we have to provide some kind of signal that
> >>>> tskey value is provided from user-space and shouldn't be changed. And we
> >>>> have to have it somewhere in skb info. Again, tx_flags looks like the
> >>>> best candidate, but it's impossible to use. I'm thinking of using
> >>>> special flag in tcp_skb_cb - gonna test more, but open for other
> >>>> suggestions.
> >>>
> >>> Ai, that is tricky. tx_flags is full/scarce indeed.
> >>>
> >>> CB does not persist as the skb transitions between layers.
> >>>
> >>> The most obvious solution would be to set the flag in sk_tsflags
> >>> itself. But then the cmsg would no long work on a per request basis:
> >>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> >>>
> >>> Good that we catch this now before the ABI is finalized.
> >>>
> >>> If necessary TCP semantics can diverge from datagrams. So we could
> >>> punt on this. But it's not ideal.
> >>
> >> I have done proof of concept code which uses hwtsamp as a storage for
> >> custom tskey in case of TCP:
> >>
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index a52638363ea5..40ed49e61bf7 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
> >> sk_buff *skb,
> >>           serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
> >>           if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> >>                   serr->ee.ee_data = skb_shinfo(skb)->tskey;
> >> -               if (sk_is_tcp(sk))
> >> -                       serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>           }
> >>
> >>           err = sock_queue_err_skb(sk, skb);
> >> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >>            * but only if the socket refcount is not zero.
> >>            */
> >>           if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
> >> +               if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
> >> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
> >> +                       skb_shinfo(skb)->tskey =
> >> (u32)skb_hwtstamps(skb)->hwtstamp;
> >>                   *skb_hwtstamps(skb) = *hwtstamps;
> >>                   __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
> >> false);
> >>                   sock_put(sk);
> >> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >>                   skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
> >>           }
> >>
> >> +       if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> >> +               if (skb_hwtstamps(orig_skb)->hwtstamp)
> >> +                       skb_shinfo(skb)->tskey =
> >> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
> >> +               else
> >> +                       skb_shinfo(skb)->tskey -=
> >> atomic_read(&sk->sk_tskey);
> >> +       }
> >>           if (hwtstamps)
> >>                   *skb_hwtstamps(skb) = *hwtstamps;
> >>           else
> >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >> index 0d3decc13a99..1a161a2155b5 100644
> >> --- a/net/ipv4/tcp.c
> >> +++ b/net/ipv4/tcp.c
> >> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
> >> sockcm_cookie *sockc)
> >>                           tcb->txstamp_ack = 1;
> >>                   if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>                           if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >> -                               shinfo->tskey = sockc->ts_opt_id;
> >> -                       else
> >> -                               shinfo->tskey = TCP_SKB_CB(skb)->seq +
> >> skb->len - 1;
> >> +                               skb_hwtstamps(skb)->hwtstamp =
> >> sockc->ts_opt_id;
> >> +                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>                   }
> >>           }
> >>    }
> >>
> >>
> >> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
> >> it. netdev_data field is only used on RX timestamp side, so should be
> >> fine to reuse. WDYT?
> > 
> > We cannot really extend the struct. skb_shared_info is scarce.
> > hwtstamps is a union. But on tx the hw tstamp is queued using
> > skb_tstamp_tx, not through this shinfo data at all.
> > 
> > It just seems weird to have a shinfo->tskey, but then ignore it and
> > find yet another 32b field. Easier would be to find 1b to toggle
> > whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
> > 
> > I don't immediate see a perfect solution either.
> 
> Well, with this said, and given that having 2 different tskey values is
> weird solution, I'm thinking of skipping TCP part until we can find some
> proper way. Will it be OK to have UDP and RAW sockets only opted into
> the feature?

Yes, I think that's fine.

Let's make sure that tcp sendmsg fails if the new cmsg is passed. So
that we can add support for it later.

We also have to give some thought what it means to coalesce skbs with
non-sequential IDs, in the functions you mentioned before. Might be
fine to just say: that's the application's issue. But even then we
should document that behavior.
Vadim Fedorenko Sept. 8, 2024, 7:55 p.m. UTC | #13
On 08/09/2024 20:19, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 07/09/2024 00:48, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 06/09/2024 16:25, Willem de Bruijn wrote:
>>>>> Vadim Fedorenko wrote:
>>>>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>>>>>> Vadim Fedorenko wrote:
>>>>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>>>>>> ---
>>>>>>>>>       net/ipv4/tcp.c | 13 +++++++++----
>>>>>>>>>       1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>>>>>> --- a/net/ipv4/tcp.c
>>>>>>>>> +++ b/net/ipv4/tcp.c
>>>>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>>>>>       }
>>>>>>>>>       EXPORT_SYMBOL(tcp_init_sock);
>>>>>>>>>       
>>>>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>>>>       {
>>>>>>>>>       	struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>>>>>> +	u32 tsflags = sockc->tsflags;
>>>>>>>>>       
>>>>>>>>>       	if (tsflags && skb) {
>>>>>>>>>       		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>>>>       		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>>>>>       		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>>>>>       			tcb->txstamp_ack = 1;
>>>>>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>>>>>> +				shinfo->tskey = sockc->ts_opt_id;
>>>>>>>>> +			else
>>>>>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>>>> +		}
>>>>>>>>>       	}
>>>>>>>>>       }
>>>>>>>>>       
>>>>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>>>>>       
>>>>>>>>>       out:
>>>>>>>>>       	if (copied) {
>>>>>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
>>>>>>>>> +		tcp_tx_timestamp(sk, &sockc);
>>>>>>>>>       		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>>>>>       	}
>>>>>>>>>       out_nopush:
>>>>>>>>
>>>>>>>> Hi Willem,
>>>>>>>>
>>>>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>>>>>> flow:
>>>>>>>>
>>>>>>>> tcp_skb_collapse_tstamp()
>>>>>>>> tcp_fragment_tstamp()
>>>>>>>> tcp_gso_tstamp()
>>>>>>>>
>>>>>>>> I believe the last one breaks tests, but the problem is that there is no
>>>>>>>> easy way to provide the flag of constant tskey to it. Only
>>>>>>>> shinfo::tx_flags are available at the caller side and we have already
>>>>>>>> discussed that we shouldn't use the last bit of this field.
>>>>>>>>
>>>>>>>> So, how should we deal with the problem? Or is it better to postpone
>>>>>>>> support for TCP sockets in this case?
>>>>>>>
>>>>>>> Are you sure that this is a problem. These functions pass on the
>>>>>>> skb_shinfo(skb)->ts_key from one skb to another.
>>>>>>
>>>>>> Yes, you are right, the problem is in a different place.
>>>>>>
>>>>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>>>>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>>>>>> value:
>>>>>>
>>>>>> 	if (sk_is_tcp(sk))
>>>>>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>>>>
>>>>>> It happens because of assumption that for TCP sockets shinfo::tskey will
>>>>>> have sequence number and the logic has to recalculate it back to the
>>>>>> bytes sent. The same logic exists in all TCP TX timestamping functions
>>>>>> (mentioned in the previous mail) and may trigger some unexpected
>>>>>> behavior. To fix the issue we have to provide some kind of signal that
>>>>>> tskey value is provided from user-space and shouldn't be changed. And we
>>>>>> have to have it somewhere in skb info. Again, tx_flags looks like the
>>>>>> best candidate, but it's impossible to use. I'm thinking of using
>>>>>> special flag in tcp_skb_cb - gonna test more, but open for other
>>>>>> suggestions.
>>>>>
>>>>> Ai, that is tricky. tx_flags is full/scarce indeed.
>>>>>
>>>>> CB does not persist as the skb transitions between layers.
>>>>>
>>>>> The most obvious solution would be to set the flag in sk_tsflags
>>>>> itself. But then the cmsg would no long work on a per request basis:
>>>>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
>>>>>
>>>>> Good that we catch this now before the ABI is finalized.
>>>>>
>>>>> If necessary TCP semantics can diverge from datagrams. So we could
>>>>> punt on this. But it's not ideal.
>>>>
>>>> I have done proof of concept code which uses hwtsamp as a storage for
>>>> custom tskey in case of TCP:
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index a52638363ea5..40ed49e61bf7 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
>>>> sk_buff *skb,
>>>>            serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
>>>>            if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
>>>>                    serr->ee.ee_data = skb_shinfo(skb)->tskey;
>>>> -               if (sk_is_tcp(sk))
>>>> -                       serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>>            }
>>>>
>>>>            err = sock_queue_err_skb(sk, skb);
>>>> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>>>>             * but only if the socket refcount is not zero.
>>>>             */
>>>>            if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
>>>> +               if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
>>>> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
>>>> +                       skb_shinfo(skb)->tskey =
>>>> (u32)skb_hwtstamps(skb)->hwtstamp;
>>>>                    *skb_hwtstamps(skb) = *hwtstamps;
>>>>                    __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
>>>> false);
>>>>                    sock_put(sk);
>>>> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>>>>                    skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
>>>>            }
>>>>
>>>> +       if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
>>>> +               if (skb_hwtstamps(orig_skb)->hwtstamp)
>>>> +                       skb_shinfo(skb)->tskey =
>>>> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
>>>> +               else
>>>> +                       skb_shinfo(skb)->tskey -=
>>>> atomic_read(&sk->sk_tskey);
>>>> +       }
>>>>            if (hwtstamps)
>>>>                    *skb_hwtstamps(skb) = *hwtstamps;
>>>>            else
>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>> index 0d3decc13a99..1a161a2155b5 100644
>>>> --- a/net/ipv4/tcp.c
>>>> +++ b/net/ipv4/tcp.c
>>>> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
>>>> sockcm_cookie *sockc)
>>>>                            tcb->txstamp_ack = 1;
>>>>                    if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>                            if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>> -                               shinfo->tskey = sockc->ts_opt_id;
>>>> -                       else
>>>> -                               shinfo->tskey = TCP_SKB_CB(skb)->seq +
>>>> skb->len - 1;
>>>> +                               skb_hwtstamps(skb)->hwtstamp =
>>>> sockc->ts_opt_id;
>>>> +                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>                    }
>>>>            }
>>>>     }
>>>>
>>>>
>>>> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
>>>> it. netdev_data field is only used on RX timestamp side, so should be
>>>> fine to reuse. WDYT?
>>>
>>> We cannot really extend the struct. skb_shared_info is scarce.
>>> hwtstamps is a union. But on tx the hw tstamp is queued using
>>> skb_tstamp_tx, not through this shinfo data at all.
>>>
>>> It just seems weird to have a shinfo->tskey, but then ignore it and
>>> find yet another 32b field. Easier would be to find 1b to toggle
>>> whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
>>>
>>> I don't immediate see a perfect solution either.
>>
>> Well, with this said, and given that having 2 different tskey values is
>> weird solution, I'm thinking of skipping TCP part until we can find some
>> proper way. Will it be OK to have UDP and RAW sockets only opted into
>> the feature?
> 
> Yes, I think that's fine.
> 
> Let's make sure that tcp sendmsg fails if the new cmsg is passed. So
> that we can add support for it later.

Thanks Willem, I'll prepare patchset tomorrow.

> We also have to give some thought what it means to coalesce skbs with
> non-sequential IDs, in the functions you mentioned before. Might be
> fine to just say: that's the application's issue. But even then we
> should document that behavior.

We can discuss it in a separate RFC series, there are more open
questions, I believe.
diff mbox series

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a5680b4e786..5553a8aeee80 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -474,9 +474,10 @@  void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
-static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
+static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
 {
 	struct sk_buff *skb = tcp_write_queue_tail(sk);
+	u32 tsflags = sockc->tsflags;
 
 	if (tsflags && skb) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
@@ -485,8 +486,12 @@  static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
 		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
 		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
 			tcb->txstamp_ack = 1;
-		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
-			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
+			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
+				shinfo->tskey = sockc->ts_opt_id;
+			else
+				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+		}
 	}
 }
 
@@ -1318,7 +1323,7 @@  int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 out:
 	if (copied) {
-		tcp_tx_timestamp(sk, sockc.tsflags);
+		tcp_tx_timestamp(sk, &sockc);
 		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
 	}
 out_nopush: