diff mbox series

[net-next,v3,03/14] net-timestamp: open gate for bpf_setsockopt/_getsockopt

Message ID 20241028110535.82999-4-kerneljasonxing@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net-timestamp: bpf extension to equip applications transparently | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/build_tools success Errors and warnings before: 0 (+0) this patch: 0 (+0)
netdev/cc_maintainers warning 5 maintainers not CCed: horms@kernel.org matttbe@kernel.org geliang@kernel.org mptcp@lists.linux.dev martineau@kernel.org
netdev/build_clang success Errors and warnings before: 55 this patch: 55
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3307 this patch: 3307
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 47 this patch: 47
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Xing Oct. 28, 2024, 11:05 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

For now, we support bpf_setsockopt to set or clear timestamps flags.

Users can use something like this in bpf program to turn on the feature:
flags = SOF_TIMESTAMPING_TX_SCHED;
bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
The specific use cases can be seen in the bpf selftest in this series.

Later, I will support each flags one by one based on this.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/sock.h              |  4 ++--
 include/uapi/linux/net_tstamp.h |  7 +++++++
 net/core/filter.c               |  7 +++++--
 net/core/sock.c                 | 34 ++++++++++++++++++++++++++-------
 net/ipv4/udp.c                  |  2 +-
 net/mptcp/sockopt.c             |  2 +-
 net/socket.c                    |  2 +-
 7 files changed, 44 insertions(+), 14 deletions(-)

Comments

Willem de Bruijn Oct. 29, 2024, 12:59 a.m. UTC | #1
Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> For now, we support bpf_setsockopt to set or clear timestamps flags.
> 
> Users can use something like this in bpf program to turn on the feature:
> flags = SOF_TIMESTAMPING_TX_SCHED;
> bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
> The specific use cases can be seen in the bpf selftest in this series.
> 
> Later, I will support each flags one by one based on this.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  include/net/sock.h              |  4 ++--
>  include/uapi/linux/net_tstamp.h |  7 +++++++
>  net/core/filter.c               |  7 +++++--
>  net/core/sock.c                 | 34 ++++++++++++++++++++++++++-------
>  net/ipv4/udp.c                  |  2 +-
>  net/mptcp/sockopt.c             |  2 +-
>  net/socket.c                    |  2 +-
>  7 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 5384f1e49f5c..062f405c744e 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1775,7 +1775,7 @@ static inline void skb_set_owner_edemux(struct sk_buff *skb, struct sock *sk)
>  #endif
>  
>  int sk_setsockopt(struct sock *sk, int level, int optname,
> -		  sockptr_t optval, unsigned int optlen);
> +		  sockptr_t optval, unsigned int optlen, bool bpf_timetamping);

timestamping, not timetamping

More importantly, is there perhaps a cleaner way to add a BPF
setsockopt than to have to update the existing API and all its
callers?

>  int sock_setsockopt(struct socket *sock, int level, int op,
>  		    sockptr_t optval, unsigned int optlen);
>  int do_sock_setsockopt(struct socket *sock, bool compat, int level,
> @@ -1784,7 +1784,7 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>  		       int optname, sockptr_t optval, sockptr_t optlen);
>  
>  int sk_getsockopt(struct sock *sk, int level, int optname,
> -		  sockptr_t optval, sockptr_t optlen);
> +		  sockptr_t optval, sockptr_t optlen, bool bpf_timetamping);
>  int sock_gettstamp(struct socket *sock, void __user *userstamp,
>  		   bool timeval, bool time32);
>  struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index 858339d1c1c4..0696699cf964 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -49,6 +49,13 @@ enum {
>  					 SOF_TIMESTAMPING_TX_SCHED | \
>  					 SOF_TIMESTAMPING_TX_ACK)
>  
> +#define SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK (SOF_TIMESTAMPING_SOFTWARE | \
> +					      SOF_TIMESTAMPING_TX_SCHED | \
> +					      SOF_TIMESTAMPING_TX_SOFTWARE | \
> +					      SOF_TIMESTAMPING_TX_ACK | \
> +					      SOF_TIMESTAMPING_OPT_ID | \
> +					      SOF_TIMESTAMPING_OPT_ID_TCP)
> +

We discussed the subtle distinction between OPT_ID and OPT_ID_TCP before.

Basically, OPT_ID_TCP is a fix for OPT_ID on TCP sockets, and should always be
passed. On a new API like this one, we can even require this.

Not super important, only if it does not make the code more complex.
Jason Xing Oct. 29, 2024, 1:18 a.m. UTC | #2
On Tue, Oct 29, 2024 at 8:59 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > For now, we support bpf_setsockopt to set or clear timestamps flags.
> >
> > Users can use something like this in bpf program to turn on the feature:
> > flags = SOF_TIMESTAMPING_TX_SCHED;
> > bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
> > The specific use cases can be seen in the bpf selftest in this series.
> >
> > Later, I will support each flags one by one based on this.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  include/net/sock.h              |  4 ++--
> >  include/uapi/linux/net_tstamp.h |  7 +++++++
> >  net/core/filter.c               |  7 +++++--
> >  net/core/sock.c                 | 34 ++++++++++++++++++++++++++-------
> >  net/ipv4/udp.c                  |  2 +-
> >  net/mptcp/sockopt.c             |  2 +-
> >  net/socket.c                    |  2 +-
> >  7 files changed, 44 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 5384f1e49f5c..062f405c744e 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1775,7 +1775,7 @@ static inline void skb_set_owner_edemux(struct sk_buff *skb, struct sock *sk)
> >  #endif
> >
> >  int sk_setsockopt(struct sock *sk, int level, int optname,
> > -               sockptr_t optval, unsigned int optlen);
> > +               sockptr_t optval, unsigned int optlen, bool bpf_timetamping);
>
> timestamping, not timetamping

Oh, right...

>
> More importantly, is there perhaps a cleaner way to add a BPF
> setsockopt than to have to update the existing API and all its
> callers?

I've thought about that as well. As you may notice, this version
changes the prior implementation [1] that makes the code more clear
from my perspective.

[1]: https://lore.kernel.org/all/20241012040651.95616-3-kerneljasonxing@gmail.com/

The link here didn't support the bpf_setsockopt which requires more
strange modification in sol_socket_sockopt() and return earlier
compared to other uses of SO_xxx. That's why I changed here.

>
> >  int sock_setsockopt(struct socket *sock, int level, int op,
> >                   sockptr_t optval, unsigned int optlen);
> >  int do_sock_setsockopt(struct socket *sock, bool compat, int level,
> > @@ -1784,7 +1784,7 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> >                      int optname, sockptr_t optval, sockptr_t optlen);
> >
> >  int sk_getsockopt(struct sock *sk, int level, int optname,
> > -               sockptr_t optval, sockptr_t optlen);
> > +               sockptr_t optval, sockptr_t optlen, bool bpf_timetamping);
> >  int sock_gettstamp(struct socket *sock, void __user *userstamp,
> >                  bool timeval, bool time32);
> >  struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> > diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> > index 858339d1c1c4..0696699cf964 100644
> > --- a/include/uapi/linux/net_tstamp.h
> > +++ b/include/uapi/linux/net_tstamp.h
> > @@ -49,6 +49,13 @@ enum {
> >                                        SOF_TIMESTAMPING_TX_SCHED | \
> >                                        SOF_TIMESTAMPING_TX_ACK)
> >
> > +#define SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK (SOF_TIMESTAMPING_SOFTWARE | \
> > +                                           SOF_TIMESTAMPING_TX_SCHED | \
> > +                                           SOF_TIMESTAMPING_TX_SOFTWARE | \
> > +                                           SOF_TIMESTAMPING_TX_ACK | \
> > +                                           SOF_TIMESTAMPING_OPT_ID | \
> > +                                           SOF_TIMESTAMPING_OPT_ID_TCP)
> > +
>
> We discussed the subtle distinction between OPT_ID and OPT_ID_TCP before.
>
> Basically, OPT_ID_TCP is a fix for OPT_ID on TCP sockets, and should always be
> passed. On a new API like this one, we can even require this.

Good idea. Will do it. Thanks.

>
> Not super important, only if it does not make the code more complex.

I need to ponder on this point more.

Thanks,
Jason
Martin KaFai Lau Oct. 30, 2024, 12:32 a.m. UTC | #3
On 10/28/24 4:05 AM, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> For now, we support bpf_setsockopt to set or clear timestamps flags.
> 
> Users can use something like this in bpf program to turn on the feature:
> flags = SOF_TIMESTAMPING_TX_SCHED;
> bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
> The specific use cases can be seen in the bpf selftest in this series.
> 
> Later, I will support each flags one by one based on this.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   include/net/sock.h              |  4 ++--
>   include/uapi/linux/net_tstamp.h |  7 +++++++
>   net/core/filter.c               |  7 +++++--
>   net/core/sock.c                 | 34 ++++++++++++++++++++++++++-------
>   net/ipv4/udp.c                  |  2 +-
>   net/mptcp/sockopt.c             |  2 +-
>   net/socket.c                    |  2 +-
>   7 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 5384f1e49f5c..062f405c744e 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1775,7 +1775,7 @@ static inline void skb_set_owner_edemux(struct sk_buff *skb, struct sock *sk)
>   #endif
>   
>   int sk_setsockopt(struct sock *sk, int level, int optname,
> -		  sockptr_t optval, unsigned int optlen);
> +		  sockptr_t optval, unsigned int optlen, bool bpf_timetamping);
>   int sock_setsockopt(struct socket *sock, int level, int op,
>   		    sockptr_t optval, unsigned int optlen);
>   int do_sock_setsockopt(struct socket *sock, bool compat, int level,
> @@ -1784,7 +1784,7 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>   		       int optname, sockptr_t optval, sockptr_t optlen);
>   
>   int sk_getsockopt(struct sock *sk, int level, int optname,
> -		  sockptr_t optval, sockptr_t optlen);
> +		  sockptr_t optval, sockptr_t optlen, bool bpf_timetamping);
>   int sock_gettstamp(struct socket *sock, void __user *userstamp,
>   		   bool timeval, bool time32);
>   struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index 858339d1c1c4..0696699cf964 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -49,6 +49,13 @@ enum {
>   					 SOF_TIMESTAMPING_TX_SCHED | \
>   					 SOF_TIMESTAMPING_TX_ACK)
>   
> +#define SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK (SOF_TIMESTAMPING_SOFTWARE | \

hmm... so we are allowing it but SOF_TIMESTAMPING_SOFTWARE won't do anything 
(meaning set and not-set are both no-op) ?

> +					      SOF_TIMESTAMPING_TX_SCHED | \
> +					      SOF_TIMESTAMPING_TX_SOFTWARE | \
> +					      SOF_TIMESTAMPING_TX_ACK | \
> +					      SOF_TIMESTAMPING_OPT_ID | \
> +					      SOF_TIMESTAMPING_OPT_ID_TCP)
> +
>   /**
>    * struct so_timestamping - SO_TIMESTAMPING parameter
>    *
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 58761263176c..dc8ecf899ced 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5238,6 +5238,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
>   		break;
>   	case SO_BINDTODEVICE:
>   		break;
> +	case SO_TIMESTAMPING_NEW:

How about only allow bpf_setsockopt(SO_TIMESTAMPING_NEW) instead of 
bpf_setsockopt(SO_TIMESTAMPING). Does it solve the issue reported in v2?

> +	case SO_TIMESTAMPING_OLD:
> +		break;
>   	default:
>   		return -EINVAL;
>   	}
> @@ -5247,11 +5250,11 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
>   			return -EINVAL;
>   		return sk_getsockopt(sk, SOL_SOCKET, optname,
>   				     KERNEL_SOCKPTR(optval),
> -				     KERNEL_SOCKPTR(optlen));
> +				     KERNEL_SOCKPTR(optlen), true);
>   	}
>   
>   	return sk_setsockopt(sk, SOL_SOCKET, optname,
> -			     KERNEL_SOCKPTR(optval), *optlen);
> +			     KERNEL_SOCKPTR(optval), *optlen, true);
>   }
>   
>   static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 7f398bd07fb7..7e05748b1a06 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -941,6 +941,19 @@ int sock_set_timestamping(struct sock *sk, int optname,
>   	return 0;
>   }
>   
> +static int sock_set_timestamping_bpf(struct sock *sk,
> +				     struct so_timestamping timestamping)
> +{
> +	u32 flags = timestamping.flags;
> +
> +	if (flags & ~SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK)
> +		return -EINVAL;
> +
> +	WRITE_ONCE(sk->sk_tsflags_bpf, flags);

I think it is cleaner to directly "WRITE_ONCE(sk->sk_tsflags_bpf, flags);" in 
sol_socket_sockopt() instead of adding "bool bpf_timestamping" to sk_setsockopt. 
sk_tsflags_bpf is a separate u32 anyway, so not a lot of code to share. The same 
for getsockopt.

[ will continue the remaining patches a little later ]

> +
> +	return 0;
> +}
> +
>   void sock_set_keepalive(struct sock *sk)
>   {
>   	lock_sock(sk);
> @@ -1159,7 +1172,7 @@ static int sockopt_validate_clockid(__kernel_clockid_t value)
>    */
>   
>   int sk_setsockopt(struct sock *sk, int level, int optname,
> -		  sockptr_t optval, unsigned int optlen)
> +		  sockptr_t optval, unsigned int optlen, bool bpf_timetamping)
>   {
>   	struct so_timestamping timestamping;
>   	struct socket *sock = sk->sk_socket;
> @@ -1409,7 +1422,10 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
>   			memset(&timestamping, 0, sizeof(timestamping));
>   			timestamping.flags = val;
>   		}
> -		ret = sock_set_timestamping(sk, optname, timestamping);
> +		if (!bpf_timetamping)
> +			ret = sock_set_timestamping(sk, optname, timestamping);
> +		else
> +			ret = sock_set_timestamping_bpf(sk, timestamping);
>   		break;
>   
>   	case SO_RCVLOWAT:
> @@ -1626,7 +1642,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>   		    sockptr_t optval, unsigned int optlen)
>   {
>   	return sk_setsockopt(sock->sk, level, optname,
> -			     optval, optlen);
> +			     optval, optlen, false);
>   }
>   EXPORT_SYMBOL(sock_setsockopt);
>   
> @@ -1670,7 +1686,7 @@ static int groups_to_user(sockptr_t dst, const struct group_info *src)
>   }
>   
>   int sk_getsockopt(struct sock *sk, int level, int optname,
> -		  sockptr_t optval, sockptr_t optlen)
> +		  sockptr_t optval, sockptr_t optlen, bool bpf_timetamping)
>   {
>   	struct socket *sock = sk->sk_socket;
>   
> @@ -1793,9 +1809,13 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
>   		 * returning the flags when they were set through the same option.
>   		 * Don't change the beviour for the old case SO_TIMESTAMPING_OLD.
>   		 */
> -		if (optname == SO_TIMESTAMPING_OLD || sock_flag(sk, SOCK_TSTAMP_NEW)) {
> -			v.timestamping.flags = READ_ONCE(sk->sk_tsflags);
> -			v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc);
> +		if (!bpf_timetamping) {
> +			if (optname == SO_TIMESTAMPING_OLD || sock_flag(sk, SOCK_TSTAMP_NEW)) {
> +				v.timestamping.flags = READ_ONCE(sk->sk_tsflags);
> +				v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc);
> +			}
> +		} else {
> +			v.timestamping.flags = READ_ONCE(sk->sk_tsflags_bpf);
>   		}
>   		break;
>   
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 0e24916b39d4..9a20af41e272 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2679,7 +2679,7 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
>   	int is_udplite = IS_UDPLITE(sk);
>   
>   	if (level == SOL_SOCKET) {
> -		err = sk_setsockopt(sk, level, optname, optval, optlen);
> +		err = sk_setsockopt(sk, level, optname, optval, optlen, false);
>   
>   		if (optname == SO_RCVBUF || optname == SO_RCVBUFFORCE) {
>   			sockopt_lock_sock(sk);
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 505445a9598f..7b12cc2db136 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -306,7 +306,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
>   			return PTR_ERR(ssk);
>   		}
>   
> -		ret = sk_setsockopt(ssk, SOL_SOCKET, optname, optval, optlen);
> +		ret = sk_setsockopt(ssk, SOL_SOCKET, optname, optval, optlen, false);
>   		if (ret == 0) {
>   			if (optname == SO_REUSEPORT)
>   				sk->sk_reuseport = ssk->sk_reuseport;
> diff --git a/net/socket.c b/net/socket.c
> index 9a8e4452b9b2..4bdca39685a6 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2385,7 +2385,7 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>   
>   	ops = READ_ONCE(sock->ops);
>   	if (level == SOL_SOCKET) {
> -		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
> +		err = sk_getsockopt(sock->sk, level, optname, optval, optlen, false);
>   	} else if (unlikely(!ops->getsockopt)) {
>   		err = -EOPNOTSUPP;
>   	} else {
Jason Xing Oct. 30, 2024, 1:15 a.m. UTC | #4
On Wed, Oct 30, 2024 at 8:32 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/28/24 4:05 AM, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > For now, we support bpf_setsockopt to set or clear timestamps flags.
> >
> > Users can use something like this in bpf program to turn on the feature:
> > flags = SOF_TIMESTAMPING_TX_SCHED;
> > bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
> > The specific use cases can be seen in the bpf selftest in this series.
> >
> > Later, I will support each flags one by one based on this.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   include/net/sock.h              |  4 ++--
> >   include/uapi/linux/net_tstamp.h |  7 +++++++
> >   net/core/filter.c               |  7 +++++--
> >   net/core/sock.c                 | 34 ++++++++++++++++++++++++++-------
> >   net/ipv4/udp.c                  |  2 +-
> >   net/mptcp/sockopt.c             |  2 +-
> >   net/socket.c                    |  2 +-
> >   7 files changed, 44 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 5384f1e49f5c..062f405c744e 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1775,7 +1775,7 @@ static inline void skb_set_owner_edemux(struct sk_buff *skb, struct sock *sk)
> >   #endif
> >
> >   int sk_setsockopt(struct sock *sk, int level, int optname,
> > -               sockptr_t optval, unsigned int optlen);
> > +               sockptr_t optval, unsigned int optlen, bool bpf_timetamping);
> >   int sock_setsockopt(struct socket *sock, int level, int op,
> >                   sockptr_t optval, unsigned int optlen);
> >   int do_sock_setsockopt(struct socket *sock, bool compat, int level,
> > @@ -1784,7 +1784,7 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> >                      int optname, sockptr_t optval, sockptr_t optlen);
> >
> >   int sk_getsockopt(struct sock *sk, int level, int optname,
> > -               sockptr_t optval, sockptr_t optlen);
> > +               sockptr_t optval, sockptr_t optlen, bool bpf_timetamping);
> >   int sock_gettstamp(struct socket *sock, void __user *userstamp,
> >                  bool timeval, bool time32);
> >   struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> > diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> > index 858339d1c1c4..0696699cf964 100644
> > --- a/include/uapi/linux/net_tstamp.h
> > +++ b/include/uapi/linux/net_tstamp.h
> > @@ -49,6 +49,13 @@ enum {
> >                                        SOF_TIMESTAMPING_TX_SCHED | \
> >                                        SOF_TIMESTAMPING_TX_ACK)
> >
> > +#define SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK (SOF_TIMESTAMPING_SOFTWARE | \
>
> hmm... so we are allowing it but SOF_TIMESTAMPING_SOFTWARE won't do anything
> (meaning set and not-set are both no-op) ?

I was thinking of writing a separate patch to control the output
function by using this flag. Apparently, I didn't do that, so I think
I can remove it from this series.

>
> > +                                           SOF_TIMESTAMPING_TX_SCHED | \
> > +                                           SOF_TIMESTAMPING_TX_SOFTWARE | \
> > +                                           SOF_TIMESTAMPING_TX_ACK | \
> > +                                           SOF_TIMESTAMPING_OPT_ID | \
> > +                                           SOF_TIMESTAMPING_OPT_ID_TCP)
> > +
> >   /**
> >    * struct so_timestamping - SO_TIMESTAMPING parameter
> >    *
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 58761263176c..dc8ecf899ced 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5238,6 +5238,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
> >               break;
> >       case SO_BINDTODEVICE:
> >               break;
> > +     case SO_TIMESTAMPING_NEW:
>
> How about only allow bpf_setsockopt(SO_TIMESTAMPING_NEW) instead of
> bpf_setsockopt(SO_TIMESTAMPING). Does it solve the issue reported in v2?

No, it doesn't. Sorry, I will handle it in a proper way.

>
> > +     case SO_TIMESTAMPING_OLD:
> > +             break;
> >       default:
> >               return -EINVAL;
> >       }
> > @@ -5247,11 +5250,11 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
> >                       return -EINVAL;
> >               return sk_getsockopt(sk, SOL_SOCKET, optname,
> >                                    KERNEL_SOCKPTR(optval),
> > -                                  KERNEL_SOCKPTR(optlen));
> > +                                  KERNEL_SOCKPTR(optlen), true);
> >       }
> >
> >       return sk_setsockopt(sk, SOL_SOCKET, optname,
> > -                          KERNEL_SOCKPTR(optval), *optlen);
> > +                          KERNEL_SOCKPTR(optval), *optlen, true);
> >   }
> >
> >   static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 7f398bd07fb7..7e05748b1a06 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -941,6 +941,19 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >       return 0;
> >   }
> >
> > +static int sock_set_timestamping_bpf(struct sock *sk,
> > +                                  struct so_timestamping timestamping)
> > +{
> > +     u32 flags = timestamping.flags;
> > +
> > +     if (flags & ~SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK)
> > +             return -EINVAL;
> > +
> > +     WRITE_ONCE(sk->sk_tsflags_bpf, flags);
>
> I think it is cleaner to directly "WRITE_ONCE(sk->sk_tsflags_bpf, flags);" in
> sol_socket_sockopt() instead of adding "bool bpf_timestamping" to sk_setsockopt.
> sk_tsflags_bpf is a separate u32 anyway, so not a lot of code to share. The same
> for getsockopt.

As I replied to Willem, I feel this way (that is also the same as v2)
[1] introduces more extra duplicated code and returns earlier compared
to other use cases of SO_xxx, which do you think is a bit weird?

[1]: https://lore.kernel.org/all/20241012040651.95616-3-kerneljasonxing@gmail.com/

Surely, I can write it like how v2 works. Which one would you prefer :) ?

>
> [ will continue the remaining patches a little later ]

Thanks!

Thanks,
Jason
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 5384f1e49f5c..062f405c744e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1775,7 +1775,7 @@  static inline void skb_set_owner_edemux(struct sk_buff *skb, struct sock *sk)
 #endif
 
 int sk_setsockopt(struct sock *sk, int level, int optname,
-		  sockptr_t optval, unsigned int optlen);
+		  sockptr_t optval, unsigned int optlen, bool bpf_timetamping);
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
 int do_sock_setsockopt(struct socket *sock, bool compat, int level,
@@ -1784,7 +1784,7 @@  int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 		       int optname, sockptr_t optval, sockptr_t optlen);
 
 int sk_getsockopt(struct sock *sk, int level, int optname,
-		  sockptr_t optval, sockptr_t optlen);
+		  sockptr_t optval, sockptr_t optlen, bool bpf_timetamping);
 int sock_gettstamp(struct socket *sock, void __user *userstamp,
 		   bool timeval, bool time32);
 struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 858339d1c1c4..0696699cf964 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -49,6 +49,13 @@  enum {
 					 SOF_TIMESTAMPING_TX_SCHED | \
 					 SOF_TIMESTAMPING_TX_ACK)
 
+#define SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK (SOF_TIMESTAMPING_SOFTWARE | \
+					      SOF_TIMESTAMPING_TX_SCHED | \
+					      SOF_TIMESTAMPING_TX_SOFTWARE | \
+					      SOF_TIMESTAMPING_TX_ACK | \
+					      SOF_TIMESTAMPING_OPT_ID | \
+					      SOF_TIMESTAMPING_OPT_ID_TCP)
+
 /**
  * struct so_timestamping - SO_TIMESTAMPING parameter
  *
diff --git a/net/core/filter.c b/net/core/filter.c
index 58761263176c..dc8ecf899ced 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5238,6 +5238,9 @@  static int sol_socket_sockopt(struct sock *sk, int optname,
 		break;
 	case SO_BINDTODEVICE:
 		break;
+	case SO_TIMESTAMPING_NEW:
+	case SO_TIMESTAMPING_OLD:
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -5247,11 +5250,11 @@  static int sol_socket_sockopt(struct sock *sk, int optname,
 			return -EINVAL;
 		return sk_getsockopt(sk, SOL_SOCKET, optname,
 				     KERNEL_SOCKPTR(optval),
-				     KERNEL_SOCKPTR(optlen));
+				     KERNEL_SOCKPTR(optlen), true);
 	}
 
 	return sk_setsockopt(sk, SOL_SOCKET, optname,
-			     KERNEL_SOCKPTR(optval), *optlen);
+			     KERNEL_SOCKPTR(optval), *optlen, true);
 }
 
 static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
diff --git a/net/core/sock.c b/net/core/sock.c
index 7f398bd07fb7..7e05748b1a06 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -941,6 +941,19 @@  int sock_set_timestamping(struct sock *sk, int optname,
 	return 0;
 }
 
+static int sock_set_timestamping_bpf(struct sock *sk,
+				     struct so_timestamping timestamping)
+{
+	u32 flags = timestamping.flags;
+
+	if (flags & ~SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK)
+		return -EINVAL;
+
+	WRITE_ONCE(sk->sk_tsflags_bpf, flags);
+
+	return 0;
+}
+
 void sock_set_keepalive(struct sock *sk)
 {
 	lock_sock(sk);
@@ -1159,7 +1172,7 @@  static int sockopt_validate_clockid(__kernel_clockid_t value)
  */
 
 int sk_setsockopt(struct sock *sk, int level, int optname,
-		  sockptr_t optval, unsigned int optlen)
+		  sockptr_t optval, unsigned int optlen, bool bpf_timetamping)
 {
 	struct so_timestamping timestamping;
 	struct socket *sock = sk->sk_socket;
@@ -1409,7 +1422,10 @@  int sk_setsockopt(struct sock *sk, int level, int optname,
 			memset(&timestamping, 0, sizeof(timestamping));
 			timestamping.flags = val;
 		}
-		ret = sock_set_timestamping(sk, optname, timestamping);
+		if (!bpf_timetamping)
+			ret = sock_set_timestamping(sk, optname, timestamping);
+		else
+			ret = sock_set_timestamping_bpf(sk, timestamping);
 		break;
 
 	case SO_RCVLOWAT:
@@ -1626,7 +1642,7 @@  int sock_setsockopt(struct socket *sock, int level, int optname,
 		    sockptr_t optval, unsigned int optlen)
 {
 	return sk_setsockopt(sock->sk, level, optname,
-			     optval, optlen);
+			     optval, optlen, false);
 }
 EXPORT_SYMBOL(sock_setsockopt);
 
@@ -1670,7 +1686,7 @@  static int groups_to_user(sockptr_t dst, const struct group_info *src)
 }
 
 int sk_getsockopt(struct sock *sk, int level, int optname,
-		  sockptr_t optval, sockptr_t optlen)
+		  sockptr_t optval, sockptr_t optlen, bool bpf_timetamping)
 {
 	struct socket *sock = sk->sk_socket;
 
@@ -1793,9 +1809,13 @@  int sk_getsockopt(struct sock *sk, int level, int optname,
 		 * returning the flags when they were set through the same option.
 		 * Don't change the beviour for the old case SO_TIMESTAMPING_OLD.
 		 */
-		if (optname == SO_TIMESTAMPING_OLD || sock_flag(sk, SOCK_TSTAMP_NEW)) {
-			v.timestamping.flags = READ_ONCE(sk->sk_tsflags);
-			v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc);
+		if (!bpf_timetamping) {
+			if (optname == SO_TIMESTAMPING_OLD || sock_flag(sk, SOCK_TSTAMP_NEW)) {
+				v.timestamping.flags = READ_ONCE(sk->sk_tsflags);
+				v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc);
+			}
+		} else {
+			v.timestamping.flags = READ_ONCE(sk->sk_tsflags_bpf);
 		}
 		break;
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0e24916b39d4..9a20af41e272 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2679,7 +2679,7 @@  int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 	int is_udplite = IS_UDPLITE(sk);
 
 	if (level == SOL_SOCKET) {
-		err = sk_setsockopt(sk, level, optname, optval, optlen);
+		err = sk_setsockopt(sk, level, optname, optval, optlen, false);
 
 		if (optname == SO_RCVBUF || optname == SO_RCVBUFFORCE) {
 			sockopt_lock_sock(sk);
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 505445a9598f..7b12cc2db136 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -306,7 +306,7 @@  static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 			return PTR_ERR(ssk);
 		}
 
-		ret = sk_setsockopt(ssk, SOL_SOCKET, optname, optval, optlen);
+		ret = sk_setsockopt(ssk, SOL_SOCKET, optname, optval, optlen, false);
 		if (ret == 0) {
 			if (optname == SO_REUSEPORT)
 				sk->sk_reuseport = ssk->sk_reuseport;
diff --git a/net/socket.c b/net/socket.c
index 9a8e4452b9b2..4bdca39685a6 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2385,7 +2385,7 @@  int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 
 	ops = READ_ONCE(sock->ops);
 	if (level == SOL_SOCKET) {
-		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
+		err = sk_getsockopt(sock->sk, level, optname, optval, optlen, false);
 	} else if (unlikely(!ops->getsockopt)) {
 		err = -EOPNOTSUPP;
 	} else {