diff mbox series

[bpf-next] bpf: Allow setting SO_TIMESTAMPING* with bpf_setsockopt()

Message ID 20240115134110.11624-1-j-t.hinz@alumni.tu-berlin.de (mailing list archive)
State New
Headers show
Series [bpf-next] bpf: Allow setting SO_TIMESTAMPING* with bpf_setsockopt() | expand

Commit Message

Jörn-Thorben Hinz Jan. 15, 2024, 1:41 p.m. UTC
A BPF application, e.g., a TCP congestion control, might benefit from or
even require precise (=hardware) packet timestamps. These timestamps are
already available through __sk_buff.hwtstamp and
bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
not allowed to set SO_TIMESTAMPING* on sockets.

Enable BPF programs to actively request the generation of timestamps
from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
network device must still be done separately, in user space.

This patch had previously been submitted in a two-part series (first
link below). The second patch has been independently applied in commit
7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)")
(second link below).

On the earlier submission, there was the open question whether to only
allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:

For a BPF program, this won't make a difference: A timestamp, when
accessed through the fields mentioned above, is directly read from
skb_shared_info.hwtstamps, independent of the places where NEW/OLD is
relevant. See bpf_convert_ctx_access() besides others.

I am unsure, though, when it comes to the interconnection of user space
and BPF "space", when both are interested in the timestamps. I think it
would cause an unsolvable conflict when user space is bound to use
SO_TIMESTAMPING_OLD with a BPF program only allowed to set
SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
mistaken.

Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/
Link: https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin.de/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Jörn-Thorben Hinz <j-t.hinz@alumni.tu-berlin.de>
---
 include/uapi/linux/bpf.h                            | 3 ++-
 net/core/filter.c                                   | 2 ++
 tools/include/uapi/linux/bpf.h                      | 3 ++-
 tools/testing/selftests/bpf/progs/bpf_tracing_net.h | 2 ++
 tools/testing/selftests/bpf/progs/setget_sockopt.c  | 4 ++++
 5 files changed, 12 insertions(+), 2 deletions(-)

Comments

Willem de Bruijn Jan. 16, 2024, 3:17 p.m. UTC | #1
Jörn-Thorben Hinz wrote:
> A BPF application, e.g., a TCP congestion control, might benefit from or
> even require precise (=hardware) packet timestamps. These timestamps are
> already available through __sk_buff.hwtstamp and
> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
> not allowed to set SO_TIMESTAMPING* on sockets.
> 
> Enable BPF programs to actively request the generation of timestamps
> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
> network device must still be done separately, in user space.
> 
> This patch had previously been submitted in a two-part series (first
> link below). The second patch has been independently applied in commit
> 7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)")
> (second link below).
> 
> On the earlier submission, there was the open question whether to only
> allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
> 
> For a BPF program, this won't make a difference: A timestamp, when
> accessed through the fields mentioned above, is directly read from
> skb_shared_info.hwtstamps, independent of the places where NEW/OLD is
> relevant. See bpf_convert_ctx_access() besides others.
> 
> I am unsure, though, when it comes to the interconnection of user space
> and BPF "space", when both are interested in the timestamps. I think it
> would cause an unsolvable conflict when user space is bound to use
> SO_TIMESTAMPING_OLD with a BPF program only allowed to set
> SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
> mistaken.

The difference between OLD and NEW only affects the system calls. It
is not reflected in how the data is stored in the skb, or how BPF can
read the data. A process setting SO_TIMESTAMPING_OLD will still allow
BPF to read data using SO_TIMESTAMPING_NEW.

But, he one place where I see a conflict is in setting sock_flag
SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg
is written:

                if (sock_flag(sk, SOCK_TSTAMP_NEW))
                        put_cmsg_scm_timestamping64(msg, tss);
                else
                        put_cmsg_scm_timestamping(msg, tss);

So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by
a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this
would flip SOCK_TSTAMP_NEW.

Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it
just adds the inverse case.

A related problem is how does the BPF program know which of the two
variants to set. The BPF program is usually compiled and loaded
independently of the running process.

Perhaps one option is to fail the setsockop if it would flip
sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as else it
changes existing ABI.

Then a BPF program can attempt to set SO_TIMESTAMPING NEW, be
prepared to handle a particular errno, and retry with
SO_TIMESTAMPING_OLD.



 
> Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/
> Link: https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin.de/
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Deepa Dinamani <deepa.kernel@gmail.com>
> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Signed-off-by: Jörn-Thorben Hinz <j-t.hinz@alumni.tu-berlin.de>
Martin KaFai Lau Jan. 17, 2024, 7:33 a.m. UTC | #2
On 1/16/24 7:17 AM, Willem de Bruijn wrote:
> Jörn-Thorben Hinz wrote:
>> A BPF application, e.g., a TCP congestion control, might benefit from or
>> even require precise (=hardware) packet timestamps. These timestamps are
>> already available through __sk_buff.hwtstamp and
>> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
>> not allowed to set SO_TIMESTAMPING* on sockets.

This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about 
others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue 
which seems not good. If rx tstamp is useful, tx tstamp should be useful also?

>>
>> Enable BPF programs to actively request the generation of timestamps
>> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
>> network device must still be done separately, in user space.

hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the 
SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?

I likely miss something. When skb is created in the driver rx path, the sk is 
not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the 
skb_shinfo(skb)->hwtstamps?
Willem de Bruijn Jan. 17, 2024, 3:55 p.m. UTC | #3
Martin KaFai Lau wrote:
> On 1/16/24 7:17 AM, Willem de Bruijn wrote:
> > Jörn-Thorben Hinz wrote:
> >> A BPF application, e.g., a TCP congestion control, might benefit from or
> >> even require precise (=hardware) packet timestamps. These timestamps are
> >> already available through __sk_buff.hwtstamp and
> >> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
> >> not allowed to set SO_TIMESTAMPING* on sockets.
> 
> This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about 
> others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue 
> which seems not good. If rx tstamp is useful, tx tstamp should be useful also?

Good point. Or should not be allowed to be set from BPF.

That significantly changes process behavior, e.g., by returning POLLERR.
 
> >>
> >> Enable BPF programs to actively request the generation of timestamps
> >> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
> >> network device must still be done separately, in user space.
> 
> hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the 
> SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
> 
> I likely miss something. When skb is created in the driver rx path, the sk is 
> not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the 
> skb_shinfo(skb)->hwtstamps?

Indeed it does not seem to do anything in the datapath.

Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call net_enable_timestamp
to start timestamping packets.

But SOF_TIMESTAMPING_RX_HARDWARE does not so thing.

Drivers do use it in ethtool get_ts_info to signal hardware
capabilities. But those must be configured using the ioctl.

It is there more for consistency with the other timestamp recording
options, I suppose.
Martin KaFai Lau Jan. 17, 2024, 9:23 p.m. UTC | #4
On 1/17/24 7:55 AM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 1/16/24 7:17 AM, Willem de Bruijn wrote:
>>> Jörn-Thorben Hinz wrote:
>>>> A BPF application, e.g., a TCP congestion control, might benefit from or
>>>> even require precise (=hardware) packet timestamps. These timestamps are
>>>> already available through __sk_buff.hwtstamp and
>>>> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
>>>> not allowed to set SO_TIMESTAMPING* on sockets.
>>
>> This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about
>> others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue
>> which seems not good. If rx tstamp is useful, tx tstamp should be useful also?
> 
> Good point. Or should not be allowed to be set from BPF.
> 
> That significantly changes process behavior, e.g., by returning POLLERR.
>   
>>>>
>>>> Enable BPF programs to actively request the generation of timestamps
>>>> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
>>>> network device must still be done separately, in user space.
>>
>> hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the
>> SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
>>
>> I likely miss something. When skb is created  in the driver rx path, the sk is
>> not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the
>> skb_shinfo(skb)->hwtstamps?
> 
> Indeed it does not seem to do anything in the datapath.
> 
> Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call net_enable_timestamp
> to start timestamping packets.
> 
> But SOF_TIMESTAMPING_RX_HARDWARE does not so thing.
> 
> Drivers do use it in ethtool get_ts_info to signal hardware
> capabilities. But those must be configured using the ioctl.
> 
> It is there more for consistency with the other timestamp recording
> options, I suppose.
> 

Thanks for the explanation on the SOF_TIMESTAMPING_RX_{HARDWARE,SOFTWARE}.

__sk_buff.hwtstamp should have the NIC rx timestamp then as long as the NIC is 
ioctl configured.

Jorn, do you need RX_SOFTWARE? From looking at net_timestamp_set(), any socket 
requested RX_SOFTWARE should be enough to get a skb->tstamp for all skbs. A 
workaround is to manually create a socket and turn on RX_SOFTWARE.

It will still be nice to get proper bpf_setsockopt() support for RX_SOFTWARE but 
it should be considered together with how SO_TIMESTAMPING_TX_* should work in 
bpf prog considering the TX tstamping does not have a workaround solution like 
RX_SOFTWARE.

It is probably cleaner to have a separate bit in sk->sk_tsflags for bpf such 
that the bpf prog won't be affected by the userspace turning it on/off and it 
won't change the userspace's expectation also (e.g. sk_error_queue and POLLERR).

The part that needs more thoughts in the tx tstamp is how to notify the bpf prog 
to consume it. Potentially the kernel can involve a bpf prog to collect the tx 
timestamp when the bpf bit in sk->sk_tsflags is set. An example on how TCP-CC is 
using it will help to think of the approach here.
Jörn-Thorben Hinz Jan. 18, 2024, 11:04 a.m. UTC | #5
On Tue, 2024-01-16 at 10:17 -0500, Willem de Bruijn wrote:
> Jörn-Thorben Hinz wrote:
> > A BPF application, e.g., a TCP congestion control, might benefit
> > from or
> > even require precise (=hardware) packet timestamps. These
> > timestamps are
> > already available through __sk_buff.hwtstamp and
> > bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs
> > were
> > not allowed to set SO_TIMESTAMPING* on sockets.
> > 
> > Enable BPF programs to actively request the generation of
> > timestamps
> > from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
> > network device must still be done separately, in user space.
> > 
> > This patch had previously been submitted in a two-part series
> > (first
> > link below). The second patch has been independently applied in
> > commit
> > 7f6ca95d16b9 ("net: Implement missing
> > getsockopt(SO_TIMESTAMPING_NEW)")
> > (second link below).
> > 
> > On the earlier submission, there was the open question whether to
> > only
> > allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
> > 
> > For a BPF program, this won't make a difference: A timestamp, when
> > accessed through the fields mentioned above, is directly read from
> > skb_shared_info.hwtstamps, independent of the places where NEW/OLD
> > is
> > relevant. See bpf_convert_ctx_access() besides others.
> > 
> > I am unsure, though, when it comes to the interconnection of user
> > space
> > and BPF "space", when both are interested in the timestamps. I
> > think it
> > would cause an unsolvable conflict when user space is bound to use
> > SO_TIMESTAMPING_OLD with a BPF program only allowed to set
> > SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
> > mistaken.
> 
> The difference between OLD and NEW only affects the system calls. It
> is not reflected in how the data is stored in the skb, or how BPF can
> read the data. A process setting SO_TIMESTAMPING_OLD will still allow
> BPF to read data using SO_TIMESTAMPING_NEW.
> 
> But, he one place where I see a conflict is in setting sock_flag
> SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg
> is written:
> 
>                 if (sock_flag(sk, SOCK_TSTAMP_NEW))
>                         put_cmsg_scm_timestamping64(msg, tss);
>                 else
>                         put_cmsg_scm_timestamping(msg, tss);
> 
> So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by
> a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this
> would flip SOCK_TSTAMP_NEW.
> 
> Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it
> just adds the inverse case.
Thanks for elaborating on this. I see I only thought of half the
possible conflicting situations.

> 
> A related problem is how does the BPF program know which of the two
> variants to set. The BPF program is usually compiled and loaded
> independently of the running process.
True, that is an additional challenge. And with respect to CO-RE, I
think a really portable BPF program could (or at least should) not even
decide on NEW or OLD at compile time.

> 
> Perhaps one option is to fail the setsockop if it would flip
> sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as else it
> changes existing ABI.
> 
> Then a BPF program can attempt to set SO_TIMESTAMPING NEW, be
> prepared to handle a particular errno, and retry with
> SO_TIMESTAMPING_OLD.
Hmm, would be possible, yes. But sounds like a weird and unexpected
special-case behavior to the occasional BPF user.

> 
> 
> 
>  
> > Link:
> > https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/
> > Link:
> > https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin.de/
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Deepa Dinamani <deepa.kernel@gmail.com>
> > Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Signed-off-by: Jörn-Thorben Hinz <j-t.hinz@alumni.tu-berlin.de>
> 
>
Willem de Bruijn Jan. 18, 2024, 2:46 p.m. UTC | #6
Jörn-Thorben Hinz wrote:
> On Tue, 2024-01-16 at 10:17 -0500, Willem de Bruijn wrote:
> > Jörn-Thorben Hinz wrote:
> > > A BPF application, e.g., a TCP congestion control, might benefit
> > > from or
> > > even require precise (=hardware) packet timestamps. These
> > > timestamps are
> > > already available through __sk_buff.hwtstamp and
> > > bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs
> > > were
> > > not allowed to set SO_TIMESTAMPING* on sockets.
> > > 
> > > Enable BPF programs to actively request the generation of
> > > timestamps
> > > from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
> > > network device must still be done separately, in user space.
> > > 
> > > This patch had previously been submitted in a two-part series
> > > (first
> > > link below). The second patch has been independently applied in
> > > commit
> > > 7f6ca95d16b9 ("net: Implement missing
> > > getsockopt(SO_TIMESTAMPING_NEW)")
> > > (second link below).
> > > 
> > > On the earlier submission, there was the open question whether to
> > > only
> > > allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
> > > 
> > > For a BPF program, this won't make a difference: A timestamp, when
> > > accessed through the fields mentioned above, is directly read from
> > > skb_shared_info.hwtstamps, independent of the places where NEW/OLD
> > > is
> > > relevant. See bpf_convert_ctx_access() besides others.
> > > 
> > > I am unsure, though, when it comes to the interconnection of user
> > > space
> > > and BPF "space", when both are interested in the timestamps. I
> > > think it
> > > would cause an unsolvable conflict when user space is bound to use
> > > SO_TIMESTAMPING_OLD with a BPF program only allowed to set
> > > SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
> > > mistaken.
> > 
> > The difference between OLD and NEW only affects the system calls. It
> > is not reflected in how the data is stored in the skb, or how BPF can
> > read the data. A process setting SO_TIMESTAMPING_OLD will still allow
> > BPF to read data using SO_TIMESTAMPING_NEW.
> > 
> > But, he one place where I see a conflict is in setting sock_flag
> > SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg
> > is written:
> > 
> >                 if (sock_flag(sk, SOCK_TSTAMP_NEW))
> >                         put_cmsg_scm_timestamping64(msg, tss);
> >                 else
> >                         put_cmsg_scm_timestamping(msg, tss);
> > 
> > So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by
> > a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this
> > would flip SOCK_TSTAMP_NEW.
> > 
> > Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it
> > just adds the inverse case.
> Thanks for elaborating on this. I see I only thought of half the
> possible conflicting situations.
> 
> > 
> > A related problem is how does the BPF program know which of the two
> > variants to set. The BPF program is usually compiled and loaded
> > independently of the running process.
> True, that is an additional challenge. And with respect to CO-RE, I
> think a really portable BPF program could (or at least should) not even
> decide on NEW or OLD at compile time.
> 
> > 
> > Perhaps one option is to fail the setsockop if it would flip
> > sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as else it
> > changes existing ABI.
> > 
> > Then a BPF program can attempt to set SO_TIMESTAMPING NEW, be
> > prepared to handle a particular errno, and retry with
> > SO_TIMESTAMPING_OLD.
> Hmm, would be possible, yes. But sounds like a weird and unexpected
> special-case behavior to the occasional BPF user.

Agreed. So perhaps we're back to where we say: this is a new feature
for BPF, only support it on modern environments that use
SO_TIMESTAMPING_NEW?
Jörn-Thorben Hinz Jan. 18, 2024, 2:53 p.m. UTC | #7
Hmm, after taking a new look at it today, I think my patch can be
disregarded---at least for having a BPF program access *RX* *hardware*
timestamps. (Sorry about the noise then.)

When I looked into this a few months ago, I half-blindly followed
Documentation/networking/timestamping.rst, afterwards assuming
bpf_setsockopt(SO_TIMESTAMPING*) will be necessary for my use case (see
about it at the end).

Looking at it again today, it seems the ioctl(SIOCSHWTSTAMP) is
sufficient here: It enables the hardware timestamping on the device,
which are placed in skb's/skb_shared_info's hwtstamps field. This
hwtstamps is where the values of __sk_buff.hwtstamp and
bpf_sock_ops.skb_hwtstamp are coming from. No further timestamp
processing is involved when a BPF program reads the these two fields.
Meaning bpf_setsockopt(SOF_TIMESTAMPING_RX_HARDWARE) would be a no-op
from the view of a BPF program.

I started this message before coming to the above understanding but
I've left my replies in below.

With bpf_setsockopt(SOF_TIMESTAMPING_RX_HARDWARE) being unnecessary,
and bpf_setsockopt(SOF_TIMESTAMPING_RX_SOFTWARE), as I understand,
having a number of possibly unwanted implications---should we leave it
at that here?

On Wed, 2024-01-17 at 13:23 -0800, Martin KaFai Lau wrote:
> > On 1/17/24 7:55 AM, Willem de Bruijn wrote:
> > > > Martin KaFai Lau wrote:
> > > > > > On 1/16/24 7:17 AM, Willem de Bruijn wrote:
> > > > > > > > Jörn-Thorben Hinz wrote:
> > > > > > > > > > A BPF application, e.g., a TCP congestion control,
> > > > > > > > > > might
> > > > > > > > > > benefit from or
> > > > > > > > > > even require precise (=hardware) packet timestamps.
> > > > > > > > > > These
> > > > > > > > > > timestamps are
> > > > > > > > > > already available through __sk_buff.hwtstamp and
> > > > > > > > > > bpf_sock_ops.skb_hwtstamp, but could not be
> > > > > > > > > > requested: BPF
> > > > > > > > > > programs were
> > > > > > > > > > not allowed to set SO_TIMESTAMPING* on sockets.
> > > > > > 
> > > > > > This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in
> > > > > > the
> > > > > > selftest. How about
> > > > > > others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the
> > > > > > sk->sk_error_queue
> > > > > > which seems not good. If rx tstamp is useful, tx tstamp
> > > > > > should be
> > > > > > useful also?
I admit I only ever looked at enabling and using
SOF_TIMESTAMPING_RX_HARDWARE for my/our use case. With that, I was not
aware that _SOFTWARE has more, possibly complicating implications.

> > > > 
> > > > Good point. Or should not be allowed to be set from BPF.
> > > > 
> > > > That significantly changes process behavior, e.g., by returning
> > > > POLLERR.
> > > >   
> > > > > > > > > > 
> > > > > > > > > > Enable BPF programs to actively request the
> > > > > > > > > > generation of
> > > > > > > > > > timestamps
> > > > > > > > > > from a stream socket. The also required
> > > > > > > > > > ioctl(SIOCSHWTSTAMP)
> > > > > > > > > > on the
> > > > > > > > > > network device must still be done separately, in
> > > > > > > > > > user space.
> > > > > > 
> > > > > > hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and
> > > > > > the
> > > > > > SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
> > > > > > 
> > > > > > I likely miss something. When skb is created  in the driver
> > > > > > rx
> > > > > > path, the sk is
> > > > > > not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE
> > > > > > of the
> > > > > > sk affects the
> > > > > > skb_shinfo(skb)->hwtstamps?
I mostly followed Documentation/networking/timestamping.rst (section 3)
to understand how the hardware timestamps are to be setup and used.

From my understanding, the ioctl(SIOCSHWTSTAMP) makes a persistent
setting for the device/driver, independent of the lifetime of any
socket or skb.

I used a simplified program[1] when trying out this patch a few months
ago.

> > > > 
> > > > Indeed it does not seem to do anything in the datapath.
> > > > 
> > > > Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call
> > > > net_enable_timestamp
> > > > to start timestamping packets.
> > > > 
> > > > But SOF_TIMESTAMPING_RX_HARDWARE does not so thing.
> > > > 
> > > > Drivers do use it in ethtool get_ts_info to signal hardware
> > > > capabilities. But those must be configured using the ioctl.
> > > > 
> > > > It is there more for consistency with the other timestamp
> > > > recording
> > > > options, I suppose.
> > > > 
> > 
> > Thanks for the explanation on the
> > SOF_TIMESTAMPING_RX_{HARDWARE,SOFTWARE}.
> > 
> > __sk_buff.hwtstamp should have the NIC rx timestamp then as long as
> > the NIC is 
> > ioctl configured.

> > 
> > Jorn, do you need RX_SOFTWARE? From looking at net_timestamp_set(),
> > any socket 
> > requested RX_SOFTWARE should be enough to get a skb->tstamp for all
> > skbs. A 
> > workaround is to manually create a socket and turn on RX_SOFTWARE.
No, my use case was only for the RX hardware timestamps, as close to
the packet reception time point as possible.

> > 
> > It will still be nice to get proper bpf_setsockopt() support for
> > RX_SOFTWARE but 
> > it should be considered together with how SO_TIMESTAMPING_TX_*
> > should
> > work in 
> > bpf prog considering the TX tstamping does not have a workaround
> > solution like 
> > RX_SOFTWARE.
> > 
> > It is probably cleaner to have a separate bit in sk->sk_tsflags for
> > bpf such 
> > that the bpf prog won't be affected by the userspace turning it
> > on/off and it 
> > won't change the userspace's expectation also (e.g. sk_error_queue
> > and POLLERR).
> > 
> > The part that needs more thoughts in the tx tstamp is how to notify
> > the bpf prog 
> > to consume it. Potentially the kernel can involve a bpf prog to
> > collect the tx 
> > timestamp when the bpf bit in sk->sk_tsflags is set. An example on
> > how TCP-CC is 
> > using it will help to think of the approach here.
My (academic) application was an implementation[2,3] of PowerTCP[4], a
CC that (in its simplified variant) profits from precise timestamping.
Only the RX timestamps would be of use there.

As mentioned above, I used[1] a while ago when I looked into timestamp
usage. It shows how I imagine the timestamps could be accessed and used
(similarly implemented in [2]).

[1] https://github.com/jtdor/bpf_hwtstamps
[2] https://github.com/inet-tub/powertcp-linux
[3] https://schmiste.github.io/ebpf23.pdf
[4] https://schmiste.github.io/nsdi22powertcp.pdf

> > 
> >
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 754e68ca8744..8825d0648efe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@  union bpf_attr {
  * 		  **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
  * 		  **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
  * 		  **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * 		  **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * 		  **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * 		  **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
  * 		* **IPPROTO_TCP**, which supports the following *optname*\ s:
  * 		  **TCP_CONGESTION**, **TCP_BPF_IW**,
  * 		  **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/net/core/filter.c b/net/core/filter.c
index 8c9f67c81e22..4f5280874fd8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5144,6 +5144,8 @@  static int sol_socket_sockopt(struct sock *sk, int optname,
 	case SO_MAX_PACING_RATE:
 	case SO_BINDTOIFINDEX:
 	case SO_TXREHASH:
+	case SO_TIMESTAMPING_NEW:
+	case SO_TIMESTAMPING_OLD:
 		if (*optlen != sizeof(int))
 			return -EINVAL;
 		break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7f24d898efbb..09eaafa6ab43 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@  union bpf_attr {
  * 		  **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
  * 		  **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
  * 		  **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * 		  **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * 		  **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * 		  **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
  * 		* **IPPROTO_TCP**, which supports the following *optname*\ s:
  * 		  **TCP_CONGESTION**, **TCP_BPF_IW**,
  * 		  **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 1bdc680b0e0e..95f5f169819e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -15,8 +15,10 @@ 
 #define SO_RCVLOWAT		18
 #define SO_BINDTODEVICE		25
 #define SO_MARK			36
+#define SO_TIMESTAMPING_OLD     37
 #define SO_MAX_PACING_RATE	47
 #define SO_BINDTOIFINDEX	62
+#define SO_TIMESTAMPING_NEW     65
 #define SO_TXREHASH		74
 #define __SO_ACCEPTCON		(1 << 16)
 
diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c
index 7a438600ae98..54205d10793c 100644
--- a/tools/testing/selftests/bpf/progs/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c
@@ -48,6 +48,10 @@  static const struct sockopt_test sol_socket_tests[] = {
 	{ .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, },
 	{ .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, },
 	{ .opt = SO_TXREHASH, .flip = 1, },
+	{ .opt = SO_TIMESTAMPING_NEW, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+		.expected = SOF_TIMESTAMPING_RX_HARDWARE, },
+	{ .opt = SO_TIMESTAMPING_OLD, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+		.expected = SOF_TIMESTAMPING_RX_HARDWARE, },
 	{ .opt = 0, },
 };