diff mbox series

[bpf-next,01/14] net: Change sock_setsockopt from taking sock ptr to sk ptr

Message ID 20220727060902.2370689-1-kafai@fb.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series bpf: net: Remove duplicated codes from bpf_setsockopt() | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2749 this patch: 2749
netdev/cc_maintainers warning 13 maintainers not CCed: hch@lst.de sfrench@samba.org matthieu.baerts@tessares.net mathew.j.martineau@linux.intel.com sagi@grimberg.me senozhatsky@chromium.org mptcp@lists.linux.dev kbusch@kernel.org axboe@fb.com hyc.lee@gmail.com linkinjeon@kernel.org linux-nvme@lists.infradead.org linux-cifs@vger.kernel.org
netdev/build_clang success Errors and warnings before: 609 this patch: 609
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2881 this patch: 2881
netdev/checkpatch warning WARNING: line length of 83 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Kernel LATEST on ubuntu-latest with llvm-15
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for Kernel LATEST on z15 with gcc

Commit Message

Martin KaFai Lau July 27, 2022, 6:09 a.m. UTC
A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the
sock_setsockopt() to avoid code duplication and code
drift between the two duplicates.

The current sock_setsockopt() takes sock ptr as the argument.
The very first thing of this function is to get back the sk ptr
by 'sk = sock->sk'.

bpf_setsockopt() could be called when the sk does not have
a userspace owner.  Meaning sk->sk_socket is NULL.  For example,
when a passive tcp connection has just been established.  Thus,
it cannot use the sock_setsockopt(sk->sk_socket) or else it will
pass a NULL sock ptr.

All existing callers have both sock->sk and sk->sk_socket pointer.
Thus, this patch changes the sock_setsockopt() to take a sk ptr
instead of the sock ptr.  The bpf_setsockopt() only allows
optnames that do not require a sock ptr.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 drivers/nvme/host/tcp.c  |  2 +-
 fs/ksmbd/transport_tcp.c |  2 +-
 include/net/sock.h       |  2 +-
 net/core/sock.c          |  4 ++--
 net/mptcp/sockopt.c      | 12 ++++++------
 net/socket.c             |  2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

Comments

David Laight July 27, 2022, 8:11 a.m. UTC | #1
From: Martin KaFai Lau
> Sent: 27 July 2022 07:09
> 
> A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the
> sock_setsockopt() to avoid code duplication and code
> drift between the two duplicates.
> 
> The current sock_setsockopt() takes sock ptr as the argument.
> The very first thing of this function is to get back the sk ptr
> by 'sk = sock->sk'.
> 
> bpf_setsockopt() could be called when the sk does not have
> a userspace owner.  Meaning sk->sk_socket is NULL.  For example,
> when a passive tcp connection has just been established.  Thus,
> it cannot use the sock_setsockopt(sk->sk_socket) or else it will
> pass a NULL sock ptr.

I'm intrigued, I've some code that uses sock_create_kern() to create
sockets without a userspace owner - I'd have though bpf is doing
much the same.

I end up doing:
        if (level == SOL_SOCKET)
                err = sock_setsockopt(sock, level, optname, koptval, optlen);
        else
                err = sock->ops->setsockopt(sock, level, optname, koptval,
                                            optlen);
to set options.
(This code used to use kern_setsockopt() - but that got removed.)

I'd have though bpf would need similar code??

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Eric Dumazet July 27, 2022, 8:16 a.m. UTC | #2
On Wed, Jul 27, 2022 at 8:09 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the
> sock_setsockopt() to avoid code duplication and code
> drift between the two duplicates.
>
> The current sock_setsockopt() takes sock ptr as the argument.
> The very first thing of this function is to get back the sk ptr
> by 'sk = sock->sk'.
>
> bpf_setsockopt() could be called when the sk does not have
> a userspace owner.  Meaning sk->sk_socket is NULL.  For example,
> when a passive tcp connection has just been established.  Thus,
> it cannot use the sock_setsockopt(sk->sk_socket) or else it will
> pass a NULL sock ptr.
>
> All existing callers have both sock->sk and sk->sk_socket pointer.
> Thus, this patch changes the sock_setsockopt() to take a sk ptr
> instead of the sock ptr.  The bpf_setsockopt() only allows
> optnames that do not require a sock ptr.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

...

> diff --git a/include/net/sock.h b/include/net/sock.h
> index f7ad1a7705e9..9e2539dcc293 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1795,7 +1795,7 @@ void sock_pfree(struct sk_buff *skb);
>  #define sock_edemux sock_efree
>  #endif
>
> -int sock_setsockopt(struct socket *sock, int level, int op,
> +int sock_setsockopt(struct sock *sk, int level, int op,
>                     sockptr_t optval, unsigned int optlen);
>

SGTM, but I feel we should rename this to sk_setsockopt() ?
Martin KaFai Lau July 27, 2022, 6:50 p.m. UTC | #3
On Wed, Jul 27, 2022 at 10:16:28AM +0200, Eric Dumazet wrote:
> On Wed, Jul 27, 2022 at 8:09 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the
> > sock_setsockopt() to avoid code duplication and code
> > drift between the two duplicates.
> >
> > The current sock_setsockopt() takes sock ptr as the argument.
> > The very first thing of this function is to get back the sk ptr
> > by 'sk = sock->sk'.
> >
> > bpf_setsockopt() could be called when the sk does not have
> > a userspace owner.  Meaning sk->sk_socket is NULL.  For example,
> > when a passive tcp connection has just been established.  Thus,
> > it cannot use the sock_setsockopt(sk->sk_socket) or else it will
> > pass a NULL sock ptr.
> >
> > All existing callers have both sock->sk and sk->sk_socket pointer.
> > Thus, this patch changes the sock_setsockopt() to take a sk ptr
> > instead of the sock ptr.  The bpf_setsockopt() only allows
> > optnames that do not require a sock ptr.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> 
> ...
> 
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index f7ad1a7705e9..9e2539dcc293 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1795,7 +1795,7 @@ void sock_pfree(struct sk_buff *skb);
> >  #define sock_edemux sock_efree
> >  #endif
> >
> > -int sock_setsockopt(struct socket *sock, int level, int op,
> > +int sock_setsockopt(struct sock *sk, int level, int op,
> >                     sockptr_t optval, unsigned int optlen);
> >
> 
> SGTM, but I feel we should rename this to sk_setsockopt() ?
Ah, right.  will rename it.
Martin KaFai Lau July 27, 2022, 8:42 p.m. UTC | #4
On Wed, Jul 27, 2022 at 08:11:26AM +0000, David Laight wrote:
> From: Martin KaFai Lau
> > Sent: 27 July 2022 07:09
> > 
> > A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the
> > sock_setsockopt() to avoid code duplication and code
> > drift between the two duplicates.
> > 
> > The current sock_setsockopt() takes sock ptr as the argument.
> > The very first thing of this function is to get back the sk ptr
> > by 'sk = sock->sk'.
> > 
> > bpf_setsockopt() could be called when the sk does not have
> > a userspace owner.  Meaning sk->sk_socket is NULL.  For example,
> > when a passive tcp connection has just been established.  Thus,
> > it cannot use the sock_setsockopt(sk->sk_socket) or else it will
> > pass a NULL sock ptr.
> 
> I'm intrigued, I've some code that uses sock_create_kern() to create
> sockets without a userspace owner - I'd have though bpf is doing
> much the same.
> 
> I end up doing:
>         if (level == SOL_SOCKET)
>                 err = sock_setsockopt(sock, level, optname, koptval, optlen);
>         else
>                 err = sock->ops->setsockopt(sock, level, optname, koptval,
>                                             optlen);
> to set options.
> (This code used to use kern_setsockopt() - but that got removed.)
> 
> I'd have though bpf would need similar code??
By no userspace owner, I was referring a sk has not been accept()-ed by
the userspace yet instead of a 'struct socket *sock' created for
the kernel internal use.  After another thought, that tcp_sock
is sort of owned by the listen sk,  I will rephrase the commit
message to avoid the confusion.

bpf prog does not have a 'sock' ptr because the sock
has not been created yet.
diff mbox series

Patch

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 7a9e6ffa2342..60e14cc39e49 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1555,7 +1555,7 @@  static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
 		char *iface = nctrl->opts->host_iface;
 		sockptr_t optval = KERNEL_SOCKPTR(iface);
 
-		ret = sock_setsockopt(queue->sock, SOL_SOCKET, SO_BINDTODEVICE,
+		ret = sock_setsockopt(queue->sock->sk, SOL_SOCKET, SO_BINDTODEVICE,
 				      optval, strlen(iface));
 		if (ret) {
 			dev_err(nctrl->device,
diff --git a/fs/ksmbd/transport_tcp.c b/fs/ksmbd/transport_tcp.c
index 143bba4e4db8..982eed2dd575 100644
--- a/fs/ksmbd/transport_tcp.c
+++ b/fs/ksmbd/transport_tcp.c
@@ -420,7 +420,7 @@  static int create_socket(struct interface *iface)
 	ksmbd_tcp_nodelay(ksmbd_socket);
 	ksmbd_tcp_reuseaddr(ksmbd_socket);
 
-	ret = sock_setsockopt(ksmbd_socket,
+	ret = sock_setsockopt(ksmbd_socket->sk,
 			      SOL_SOCKET,
 			      SO_BINDTODEVICE,
 			      KERNEL_SOCKPTR(iface->name),
diff --git a/include/net/sock.h b/include/net/sock.h
index f7ad1a7705e9..9e2539dcc293 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1795,7 +1795,7 @@  void sock_pfree(struct sk_buff *skb);
 #define sock_edemux sock_efree
 #endif
 
-int sock_setsockopt(struct socket *sock, int level, int op,
+int sock_setsockopt(struct sock *sk, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
 
 int sock_getsockopt(struct socket *sock, int level, int op,
diff --git a/net/core/sock.c b/net/core/sock.c
index 4cb957d934a2..18bb4f269cf1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1041,12 +1041,12 @@  static int sock_reserve_memory(struct sock *sk, int bytes)
  *	at the socket level. Everything here is generic.
  */
 
-int sock_setsockopt(struct socket *sock, int level, int optname,
+int sock_setsockopt(struct sock *sk, int level, int optname,
 		    sockptr_t optval, unsigned int optlen)
 {
 	struct so_timestamping timestamping;
+	struct socket *sock = sk->sk_socket;
 	struct sock_txtime sk_txtime;
-	struct sock *sk = sock->sk;
 	int val;
 	int valbool;
 	struct linger ling;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 423d3826ca1e..5684499b4d39 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -124,7 +124,7 @@  static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val)
 	struct sock *sk = (struct sock *)msk;
 	int ret;
 
-	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname,
+	ret = sock_setsockopt(sk, SOL_SOCKET, optname,
 			      optval, sizeof(val));
 	if (ret)
 		return ret;
@@ -149,7 +149,7 @@  static int mptcp_setsockopt_sol_socket_tstamp(struct mptcp_sock *msk, int optnam
 	struct sock *sk = (struct sock *)msk;
 	int ret;
 
-	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname,
+	ret = sock_setsockopt(sk, SOL_SOCKET, optname,
 			      optval, sizeof(val));
 	if (ret)
 		return ret;
@@ -225,7 +225,7 @@  static int mptcp_setsockopt_sol_socket_timestamping(struct mptcp_sock *msk,
 		return -EINVAL;
 	}
 
-	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname,
+	ret = sock_setsockopt(sk, SOL_SOCKET, optname,
 			      KERNEL_SOCKPTR(&timestamping),
 			      sizeof(timestamping));
 	if (ret)
@@ -262,7 +262,7 @@  static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
 		return -EFAULT;
 
 	kopt = KERNEL_SOCKPTR(&ling);
-	ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, SO_LINGER, kopt, sizeof(ling));
+	ret = sock_setsockopt(sk, SOL_SOCKET, SO_LINGER, kopt, sizeof(ling));
 	if (ret)
 		return ret;
 
@@ -306,7 +306,7 @@  static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 			return -EINVAL;
 		}
 
-		ret = sock_setsockopt(ssock, SOL_SOCKET, optname, optval, optlen);
+		ret = sock_setsockopt(ssock->sk, SOL_SOCKET, optname, optval, optlen);
 		if (ret == 0) {
 			if (optname == SO_REUSEPORT)
 				sk->sk_reuseport = ssock->sk->sk_reuseport;
@@ -349,7 +349,7 @@  static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_PREFER_BUSY_POLL:
 	case SO_BUSY_POLL_BUDGET:
 		/* No need to copy: only relevant for msk */
-		return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen);
+		return sock_setsockopt(sk, SOL_SOCKET, optname, optval, optlen);
 	case SO_NO_CHECK:
 	case SO_DONTROUTE:
 	case SO_BROADCAST:
diff --git a/net/socket.c b/net/socket.c
index b6bd4cf44d3f..c6911d613ae2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2245,7 +2245,7 @@  int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 	if (kernel_optval)
 		optval = KERNEL_SOCKPTR(kernel_optval);
 	if (level == SOL_SOCKET && !sock_use_custom_sol_socket(sock))
-		err = sock_setsockopt(sock, level, optname, optval, optlen);
+		err = sock_setsockopt(sock->sk, level, optname, optval, optlen);
 	else if (unlikely(!sock->ops->setsockopt))
 		err = -EOPNOTSUPP;
 	else