From patchwork Wed Dec 18 03:24:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve French X-Patchwork-Id: 13912992 Received: from mail-lj1-f180.google.com (mail-lj1-f180.google.com [209.85.208.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 44A281494C3 for ; Wed, 18 Dec 2024 03:24:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734492270; cv=none; b=k/VcQOWhawmtHvIYGQUTjB1toi/Tv+Qj7S9/K3vqPecgdTCN3Ze/ZKxRedjsQSFygVmWiIwRuRJYQ7eeUGxoUGBAXBbl2L0+Ofo/D48XmT2p8prVe31J74Y4tJjLfsW6jPQ2AVzKw/sH3cTvkDg4t3SS3hP9+Xo9CXRKKnFGDMI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734492270; c=relaxed/simple; bh=c7JODEXksCUlACS5BFUotXmeQ7S81wE12FTG37257a0=; h=MIME-Version:From:Date:Message-ID:Subject:To:Cc:Content-Type; b=tp3KSM/cGLqh938wJtNEx9ezyc0F+VikcKeRG1SJ5AWdqtf+4Wb4pqUTES8FZgYebtfHFUMoc6AMPA6kxzcqskZKM8zVgxd4FtKuUgOlEepv1iZX/waM6QZTuPI6lYAm0pnG4gUL1KpfRY5K+9H5wbwFYg/2vN/5j6GH2PPcILI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=V1lwjdat; arc=none smtp.client-ip=209.85.208.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="V1lwjdat" Received: by mail-lj1-f180.google.com with SMTP id 38308e7fff4ca-30033e07ef3so3950421fa.0 for ; Tue, 17 Dec 2024 19:24:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734492265; x=1735097065; darn=vger.kernel.org; h=cc:to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=dCX3GcnHoVOaB9KmiYj8QNDfWxPyq3jtetYwuX+Kc5E=; b=V1lwjdat+dP2tV7oPxqc/8GRDE6RC98UeR7EloeUZZqZ1UkC/fpF2po8E/CRAODlHr 0HMMotY8Ot2kzO9QfkOy2QTHAHLSeUA7Ti/Ox/MCqthKygz5OTbH3ofkwF2TKuNeNz4Q ARLDkHZ9v866VPk2qdfBHhC31yEByLRo4PqRUj3xERPPnk9oIRUDwx56xFdTQkuB23OZ nZtTHK/gPHf4+egxba4kAHakvoYrtz9kNcRsmsdZ2zlF7Rf95Ipdvu9X/SmYgKmKhQEs PjPbmQdUoG/q5GtGPTdEtj6cIOwvXwWMrPYVtP1SnygVEojcFyIEbXLvV4WZkZdNRInR +O5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734492265; x=1735097065; h=cc:to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=dCX3GcnHoVOaB9KmiYj8QNDfWxPyq3jtetYwuX+Kc5E=; b=jypMM3P2F5PhZtc/thuE9gVYhtIHsJJ/8TGdwJk8tfXGxOxusFvEONPECtCootnntE ChzL/Br1M/hUlzPxEbKxHhf3aIUZjQiHFkVBcaE2xY+dp7UvYcLbIObuRvn0aIgepopZ l3nd5yppJroVt58SiMcUR1uSYy2BNJWGdy+nL7/Px4Ov9m+e8Fmf6xz8I4mYa60XwTyD ZZmKWXEVhwAKSyQju+KSc71D/3S6SFrL4kglCW5wiTN5QHIGQXN6b42IFrGAW1syORCM uxr/UbNhOT/m5wi4FGRrA1JfKzuysEW7+fqCTDgno+/tOMGlFLm6OJ255NYUY8rhK4Tj aP1Q== X-Gm-Message-State: AOJu0YztgoAO/jRYHaUlgLi+EEUqtX5+fThRl4GuhEuaEPlzPGUj3wZs Jhe9TiUbCwuVXdK6NaRkZsznnvbE3ryJfwv+INJe6jfQfhfMDLEy1jX8lgpjh2kidDRMqEaxkN2 ke/KKAsyIkgZmxI2lxH2lsv/Q5ivrxszY X-Gm-Gg: ASbGnculkHDU9bXMOQ8pg+Me9QoFKlYjXnnV4MV+z/UGH55SiYNoXWxARt3vqiwi2NJ ZY+hFKvWzsuPwNduqbG+rxyyTQYbA7DfwU6OnVkSW4eQCP4JN7pWiApOhZdK3u3ch/2YHKoTu X-Google-Smtp-Source: AGHT+IGY9pRp7SB0b8W4JLgHkbt1Bhkt4YDLuWmTebba4x1klLIDMVX5+wcayzjpqGjOxQYfQ10Qb+9gZS1eGIBL5LQ= X-Received: by 2002:a05:651c:b0d:b0:302:251a:bcfe with SMTP id 38308e7fff4ca-3044e624278mr3474441fa.6.1734492264597; Tue, 17 Dec 2024 19:24:24 -0800 (PST) Precedence: bulk X-Mailing-List: linux-cifs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Steve French Date: Tue, 17 Dec 2024 21:24:12 -0600 Message-ID: Subject: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod To: CIFS Cc: Kuniyuki Iwashima , Enzo Matsumiya Enzo had an interesting patch, that seems to fix an important problem. Here was his repro scenario: tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10 //someserver/target1 /mnt/test tw:~ # ls /mnt/test abc dir1 dir3 target1_file.txt tsub tw:~ # iptables -A INPUT -s someserver -j DROP Trigger reconnect and wait for 3*echo_interval: tw:~ # cat /mnt/test/target1_file.txt cat: /mnt/test/target1_file.txt: Host is down Then umount and rmmod. Note that rmmod might take several iterations until it properly tears down everything, so make sure you see the "not loaded" message before proceeding: tw:~ # umount /mnt/*; rmmod cifs umount: /mnt/az: not mounted. umount: /mnt/dfs: not mounted. umount: /mnt/local: not mounted. umount: /mnt/scratch: not mounted. rmmod: ERROR: Module cifs is in use ... tw:~ # rmmod cifs rmmod: ERROR: Module cifs is not currently loaded Then kickoff the TCP internals: tw:~ # iptables -F Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref later on. Any thoughts on his patch? See below (and attached) Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") fixed a netns UAF by manually enabled socket refcounting (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)). The reason the patch worked for that bug was because we now hold references to the netns (get_net_track() gets a ref internally) and they're properly released (internally, on __sk_destruct()), but only because sk->sk_net_refcnt was set. Problem: (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless if init_net or other) Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not only out of cifs scope, but also technically wrong -- it's set conditionally based on user (=1) vs kernel (=0) sockets. And net/ implementations seem to base their user vs kernel space operations on it. e.g. upon TCP socket close, the TCP timers are not cleared because sk->sk_net_refcnt=1: (cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets")) net/ipv4/tcp.c: void tcp_close(struct sock *sk, long timeout) { lock_sock(sk); __tcp_close(sk, timeout); release_sock(sk); if (!sk->sk_net_refcnt) inet_csk_clear_xmit_timers_sync(sk); sock_put(sk); } Which will throw a lockdep warning and then, as expected, deadlock on tcp_write_timer(). A way to reproduce this is by running the reproducer from ef7134c7fc48 and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep warning shows up. Fix: We shouldn't mess with socket internals ourselves, so do not set sk_net_refcnt manually. Also change __sock_create() to sock_create_kern() for explicitness. As for non-init_net network namespaces, we deal with it the best way we can -- hold an extra netns reference for server->ssocket and drop it when it's released. This ensures that the netns still exists whenever we need to create/destroy server->ssocket, but is not directly tied to it. Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") From f6cfa4bc261477f7a91c46f34b8d163f19870249 Mon Sep 17 00:00:00 2001 From: Enzo Matsumiya Date: Tue, 10 Dec 2024 18:15:12 -0300 Subject: [PATCH 1/4] smb: client: fix TCP timers deadlock after rmmod Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") fixed a netns UAF by manually enabled socket refcounting (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)). The reason the patch worked for that bug was because we now hold references to the netns (get_net_track() gets a ref internally) and they're properly released (internally, on __sk_destruct()), but only because sk->sk_net_refcnt was set. Problem: (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless if init_net or other) Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not only out of cifs scope, but also technically wrong -- it's set conditionally based on user (=1) vs kernel (=0) sockets. And net/ implementations seem to base their user vs kernel space operations on it. e.g. upon TCP socket close, the TCP timers are not cleared because sk->sk_net_refcnt=1: (cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets")) net/ipv4/tcp.c: void tcp_close(struct sock *sk, long timeout) { lock_sock(sk); __tcp_close(sk, timeout); release_sock(sk); if (!sk->sk_net_refcnt) inet_csk_clear_xmit_timers_sync(sk); sock_put(sk); } Which will throw a lockdep warning and then, as expected, deadlock on tcp_write_timer(). A way to reproduce this is by running the reproducer from ef7134c7fc48 and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep warning shows up. Fix: We shouldn't mess with socket internals ourselves, so do not set sk_net_refcnt manually. Also change __sock_create() to sock_create_kern() for explicitness. As for non-init_net network namespaces, we deal with it the best way we can -- hold an extra netns reference for server->ssocket and drop it when it's released. This ensures that the netns still exists whenever we need to create/destroy server->ssocket, but is not directly tied to it. Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") Signed-off-by: Enzo Matsumiya Signed-off-by: Steve French --- fs/smb/client/connect.c | 36 ++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 2372538a1211..ddcc9e514a0e 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -987,9 +987,13 @@ clean_demultiplex_info(struct TCP_Server_Info *server) msleep(125); if (cifs_rdma_enabled(server)) smbd_destroy(server); + if (server->ssocket) { sock_release(server->ssocket); server->ssocket = NULL; + + /* Release netns reference for the socket. */ + put_net(cifs_net_ns(server)); } if (!list_empty(&server->pending_mid_q)) { @@ -1037,6 +1041,7 @@ clean_demultiplex_info(struct TCP_Server_Info *server) */ } + /* Release netns reference for this server. */ put_net(cifs_net_ns(server)); kfree(server->leaf_fullpath); kfree(server); @@ -1713,6 +1718,8 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx, tcp_ses->ops = ctx->ops; tcp_ses->vals = ctx->vals; + + /* Grab netns reference for this server. */ cifs_set_net_ns(tcp_ses, get_net(current->nsproxy->net_ns)); tcp_ses->conn_id = atomic_inc_return(&tcpSesNextId); @@ -1844,6 +1851,7 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx, out_err_crypto_release: cifs_crypto_secmech_release(tcp_ses); + /* Release netns reference for this server. */ put_net(cifs_net_ns(tcp_ses)); out_err: @@ -1852,8 +1860,10 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx, cifs_put_tcp_session(tcp_ses->primary_server, false); kfree(tcp_ses->hostname); kfree(tcp_ses->leaf_fullpath); - if (tcp_ses->ssocket) + if (tcp_ses->ssocket) { sock_release(tcp_ses->ssocket); + put_net(cifs_net_ns(tcp_ses)); + } kfree(tcp_ses); } return ERR_PTR(rc); @@ -3131,20 +3141,20 @@ generic_ip_connect(struct TCP_Server_Info *server) socket = server->ssocket; } else { struct net *net = cifs_net_ns(server); - struct sock *sk; - rc = __sock_create(net, sfamily, SOCK_STREAM, - IPPROTO_TCP, &server->ssocket, 1); + rc = sock_create_kern(net, sfamily, SOCK_STREAM, IPPROTO_TCP, &server->ssocket); if (rc < 0) { cifs_server_dbg(VFS, "Error %d creating socket\n", rc); return rc; } - sk = server->ssocket->sk; - __netns_tracker_free(net, &sk->ns_tracker, false); - sk->sk_net_refcnt = 1; - get_net_track(net, &sk->ns_tracker, GFP_KERNEL); - sock_inuse_add(net, 1); + /* + * Grab netns reference for the socket. + * + * It'll be released here, on error, or in clean_demultiplex_info() upon server + * teardown. + */ + get_net(net); /* BB other socket options to set KEEPALIVE, NODELAY? */ cifs_dbg(FYI, "Socket created\n"); @@ -3158,8 +3168,10 @@ generic_ip_connect(struct TCP_Server_Info *server) } rc = bind_socket(server); - if (rc < 0) + if (rc < 0) { + put_net(cifs_net_ns(server)); return rc; + } /* * Eventually check for other socket options to change from @@ -3196,6 +3208,7 @@ generic_ip_connect(struct TCP_Server_Info *server) if (rc < 0) { cifs_dbg(FYI, "Error %d connecting to server\n", rc); trace_smb3_connect_err(server->hostname, server->conn_id, &server->dstaddr, rc); + put_net(cifs_net_ns(server)); sock_release(socket); server->ssocket = NULL; return rc; @@ -3204,6 +3217,9 @@ generic_ip_connect(struct TCP_Server_Info *server) if (sport == htons(RFC1001_PORT)) rc = ip_rfc1001_connect(server); + if (rc < 0) + put_net(cifs_net_ns(server)); + return rc; } -- 2.43.0