diff mbox series

[v2,net,3/5] net: Convert @kern of __sock_create() to enum.

Message ID 20240227011041.97375-4-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series tcp/rds: Fix use-after-free around kernel TCP reqsk. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5783 this patch: 5783
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 2090 this patch: 2090
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 6088 this patch: 6088
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 61 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-02-27--06-00 (tests: 1455)

Commit Message

Kuniyuki Iwashima Feb. 27, 2024, 1:10 a.m. UTC
Historically, syzbot has reported many use-after-free of struct
net by kernel sockets.

In most cases, the root cause was a timer kicked by a kernel socket
which does not hold netns refcount nor clean it up during netns
dismantle.

This patch converts the @kern argument of __sock_create() to enum
so that we can pass SOCKET_KERN_NET_REF and later sk_alloc() can
hold refcount of net for kernel sockets.

We pass !!kern to security_socket(_post)?_create() but kern as is
to pf->create() because 3 functions (atalk_create(), inet_create(),
inet6_create()) use it for the following check:

  if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))

The conversion for rest of the callers of __sock_create() and
sk_alloc() will be completed in net-next.git as the change is
too large to backport.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/linux/net.h |  6 ++++++
 net/core/sock.c     |  2 +-
 net/socket.c        | 11 ++++++-----
 3 files changed, 13 insertions(+), 6 deletions(-)

Comments

David Laight Feb. 29, 2024, 9:51 p.m. UTC | #1
From: Kuniyuki Iwashima
> Sent: 27 February 2024 01:11
> Subject: [PATCH v2 net 3/5] net: Convert @kern of __sock_create() to enum.

Should probably be (something like):
	Allow __sock_create() create kernel sockets that hold a reference
	to the network namespace.

> Historically, syzbot has reported many use-after-free of struct
> net by kernel sockets.
> 
> In most cases, the root cause was a timer kicked by a kernel socket
> which does not hold netns refcount nor clean it up during netns
> dismantle.
> 
> This patch converts the @kern argument of __sock_create() to enum
> so that we can pass SOCKET_KERN_NET_REF and later sk_alloc() can
> hold refcount of net for kernel sockets.

I think you should add a 'hold netns' parameter to sock_create_kern().
Indeed, that is likely to be used for a real connection
(which would need the 'hold netns') and code that doesn't need it
(because the socket is some internal housekeeping socket) could
directly call __sock_create().

Fortunately both functions are exported non-gpl.

I've this comment in a driver...

    /* sock_create_kern() creates a socket that doesn't hold a reference
     * to the namespace (they get used for sockets needed by the protocol
     * stack code itself).
     * We need a socket that holds a reference to the namespace, so create
     * a 'user' socket in a specific namespace.
     * This adds an extra security check which we should pass because all the
     * sockets are created by kernel threads.
     */
    rval = __sock_create(net, family, type, protocol, sockp, 0);


	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/include/linux/net.h b/include/linux/net.h
index c9b4a63791a4..62ef0954be75 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -245,6 +245,12 @@  enum {
 	SOCK_WAKE_URG,
 };
 
+enum socket_user {
+	SOCKET_USER,
+	SOCKET_KERN,
+	SOCKET_KERN_NET_REF,
+};
+
 int sock_wake_async(struct socket_wq *sk_wq, int how, int band);
 int sock_register(const struct net_proto_family *fam);
 void sock_unregister(int family);
diff --git a/net/core/sock.c b/net/core/sock.c
index 5e78798456fd..6f417cdbcf50 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2138,7 +2138,7 @@  struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sk->sk_prot = sk->sk_prot_creator = prot;
 		sk->sk_kern_sock = kern;
 		sock_lock_init(sk);
-		sk->sk_net_refcnt = kern ? 0 : 1;
+		sk->sk_net_refcnt = kern != SOCKET_KERN;
 		if (likely(sk->sk_net_refcnt)) {
 			get_net_track(net, &sk->ns_tracker, priority);
 			sock_inuse_add(net, 1);
diff --git a/net/socket.c b/net/socket.c
index ed3df2f749bf..f5ec613d9e3b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1489,7 +1489,7 @@  EXPORT_SYMBOL(sock_wake_async);
  *	@type: communication type (SOCK_STREAM, ...)
  *	@protocol: protocol (0, ...)
  *	@res: new socket
- *	@kern: boolean for kernel space sockets
+ *	@kern: enum for kernel space sockets
  *
  *	Creates a new socket and assigns it to @res, passing through LSM.
  *	Returns 0 or an error. On failure @res is set to %NULL. @kern must
@@ -1523,7 +1523,7 @@  int __sock_create(struct net *net, int family, int type, int protocol,
 		family = PF_PACKET;
 	}
 
-	err = security_socket_create(family, type, protocol, kern);
+	err = security_socket_create(family, type, protocol, !!kern);
 	if (err)
 		return err;
 
@@ -1584,7 +1584,7 @@  int __sock_create(struct net *net, int family, int type, int protocol,
 	 * module can have its refcnt decremented
 	 */
 	module_put(pf->owner);
-	err = security_socket_post_create(sock, family, type, protocol, kern);
+	err = security_socket_post_create(sock, family, type, protocol, !!kern);
 	if (err)
 		goto out_sock_release;
 	*res = sock;
@@ -1619,7 +1619,8 @@  EXPORT_SYMBOL(__sock_create);
 
 int sock_create(int family, int type, int protocol, struct socket **res)
 {
-	return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
+	return __sock_create(current->nsproxy->net_ns, family, type, protocol,
+			     res, SOCKET_USER);
 }
 EXPORT_SYMBOL(sock_create);
 
@@ -1637,7 +1638,7 @@  EXPORT_SYMBOL(sock_create);
 
 int sock_create_kern(struct net *net, int family, int type, int protocol, struct socket **res)
 {
-	return __sock_create(net, family, type, protocol, res, 1);
+	return __sock_create(net, family, type, protocol, res, SOCKET_KERN);
 }
 EXPORT_SYMBOL(sock_create_kern);