
[stable,5.15/6.1/6.6] af_unix: Clear oob_skb in scan_inflight().

Message ID 20250304030149.82265-1-kuniyu@amazon.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Series [stable,5.15/6.1/6.6] af_unix: Clear oob_skb in scan_inflight().

Checks

Context                 Check    Description
netdev/tree_selection   success  Guessing tree name failed - patch did not apply

Commit Message

Kuniyuki Iwashima March 4, 2025, 3:01 a.m. UTC
An embryo socket (one that has not been accept()ed yet) is never queued
in gc_candidates, so the GC cannot drop the reference held by its oob_skb.

Let's say we create a listener socket and an embryo socket, send the
listener's fd to the embryo as OOB data, and close() both without
recv()ing the OOB data.

This creates a self-reference cycle:

  listener -> embryo.oob_skb -> listener

which must be cleaned up by the GC.  The old GC does not break it, so
the listener's refcnt is never released and the sockets are leaked:

  # unshare -n
  # cat /proc/net/protocols | grep UNIX-STREAM
  UNIX-STREAM 1024      0      -1   NI       0   yes  kernel ...

  # python3
  >>> from array import array
  >>> from socket import *
  >>>
  >>> s = socket(AF_UNIX, SOCK_STREAM)
  >>> s.bind('\0test\0')
  >>> s.listen()
  >>>
  >>> c = socket(AF_UNIX, SOCK_STREAM)
  >>> c.connect(s.getsockname())
  >>> c.sendmsg([b'x'], [(SOL_SOCKET, SCM_RIGHTS, array('i', [s.fileno()]))], MSG_OOB)
  1
  >>> quit()

  # cat /proc/net/protocols | grep UNIX-STREAM
  UNIX-STREAM 1024      3      -1   NI       0   yes  kernel ...
                        ^^^
                        3 sockets still in use after FDs are close()d
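For reference, the same reproducer can be sketched in C. This is a minimal userspace sketch, assuming Linux with CONFIG_AF_UNIX_OOB enabled (otherwise sendmsg() with MSG_OOB fails); the function names are hypothetical. The helper reads the third ("sockets") column of the UNIX-STREAM line in /proc/net/protocols, which is the in-use count marked with ^^^ above.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <unistd.h>

/* Parse one /proc/net/protocols line: return the third ("sockets")
 * column if the line is the UNIX-STREAM entry, else -1. */
int unix_stream_inuse(const char *line)
{
	char name[32];
	int size, sockets;

	if (sscanf(line, "%31s %d %d", name, &size, &sockets) != 3)
		return -1;
	return strcmp(name, "UNIX-STREAM") == 0 ? sockets : -1;
}

/* Return the UNIX-STREAM in-use count of the current netns, or -1. */
int read_unix_stream_inuse(void)
{
	char line[512];
	int n = -1;
	FILE *f = fopen("/proc/net/protocols", "r");

	if (!f)
		return -1;
	while (n < 0 && fgets(line, sizeof(line), f))
		n = unix_stream_inuse(line);
	fclose(f);
	return n;
}

/* Same steps as the Python session: pass the listener's own fd via
 * SCM_RIGHTS on a MSG_OOB byte to the not-yet-accepted (embryo) peer,
 * then close both fds without recv()ing.  Returns 0 if 1 byte was sent. */
int send_listener_fd_as_oob(void)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	/* Abstract address "\0test\0" (6 bytes), as in s.bind('\0test\0'). */
	socklen_t len = offsetof(struct sockaddr_un, sun_path) + 6;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;
	int s, c, ret = -1;

	memcpy(addr.sun_path, "\0test\0", 6);
	memset(&ctrl, 0, sizeof(ctrl));
	s = socket(AF_UNIX, SOCK_STREAM, 0);
	c = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s < 0 || c < 0)
		goto out;
	if (bind(s, (struct sockaddr *)&addr, len) || listen(s, 1) ||
	    connect(c, (struct sockaddr *)&addr, len))
		goto out;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &s, sizeof(int));

	/* Fails (e.g. EOPNOTSUPP) if the kernel lacks CONFIG_AF_UNIX_OOB. */
	if (sendmsg(c, &msg, MSG_OOB) == 1)
		ret = 0;
out:
	if (c >= 0)
		close(c);
	if (s >= 0)
		close(s);
	return ret;
}
```

Run it inside a fresh network namespace (unshare -n) and compare read_unix_stream_inuse() before and after send_listener_fd_as_oob(); per the report above, an affected kernel keeps the count elevated after both fds are closed.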

Let's drop the embryo socket's oob_skb ref in scan_inflight().
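The leak and the fix can be illustrated with a toy refcount model. This is a hypothetical sketch in plain C, not kernel code: struct toy_sock, struct toy_skb and the toy_* helpers are invented for illustration. toy_skb_unref() mimics the skb_unref() semantics the patch relies on (report "last reference" without decrementing when only one user remains), and the model assumes, as the WARN_ON_ONCE() in the patch implies, that oob_skb holds its own reference on an skb that is also on the embryo's receive queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the objects in the cycle; not kernel code. */
struct toy_skb;

struct toy_sock {
	int refcnt;                 /* stands in for the socket/file refcount */
	struct toy_skb *oob_skb;    /* extra skb reference kept for MSG_OOB */
};

struct toy_skb {
	int users;                  /* stands in for skb->users */
	struct toy_sock *inflight;  /* socket whose fd the skb carries (SCM_RIGHTS) */
};

/* Mimics skb_unref(): with a single user, report "last ref" without
 * decrementing (the caller would free); otherwise drop one reference
 * and report whether it was the last. */
bool toy_skb_unref(struct toy_skb *skb)
{
	if (skb->users == 1)
		return true;
	return --skb->users == 0;
}

/* kfree_skb()-like: drop one reference; when it was the last one,
 * release the in-flight fd (the fput() of the listener). */
void toy_kfree_skb(struct toy_skb *skb)
{
	if (toy_skb_unref(skb)) {
		skb->users = 0;
		if (skb->inflight)
			skb->inflight->refcnt--;
	}
}

/* One GC pass over the commit-message scenario after both user fds
 * are closed.  Returns the listener's remaining refcount (0 = freed). */
int toy_gc_pass(bool fixed)
{
	struct toy_sock listener = { .refcnt = 1 };  /* only the in-flight fd */
	struct toy_skb skb = { .users = 2, .inflight = &listener }; /* queue + oob_skb */
	struct toy_sock embryo = { .refcnt = 1, .oob_skb = &skb };

	/* Old unix_gc(): only gc_candidates' oob_skb is freed.  The listener
	 * has none, and the embryo is not a candidate. */
	if (!fixed && listener.oob_skb) {
		toy_kfree_skb(listener.oob_skb);
		listener.oob_skb = NULL;
	}

	/* Patched scan_inflight(): the skb is moved to the hitlist, so also
	 * drop the embryo's oob_skb reference. */
	if (fixed && embryo.oob_skb) {
		bool last = toy_skb_unref(embryo.oob_skb);

		(void)last;             /* the patch WARN_ON_ONCE()s if true */
		embryo.oob_skb = NULL;
	}

	/* Purging the hitlist drops the receive queue's reference. */
	toy_kfree_skb(&skb);
	return listener.refcnt;
}
```

In this model toy_gc_pass(false) returns 1 (the listener is never released), while toy_gc_pass(true) returns 0.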

This also fixes a racy access to oob_skb, the same race that commit
9841991a446c ("af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue
lock.") fixed for the new GC based on Tarjan's algorithm.
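The race fix follows from where the access now happens: scan_inflight() walks x->sk_receive_queue with that queue's spinlock held (hence the non-locking __skb_unlink()/__skb_queue_tail() in the patch), whereas the removed code in unix_gc() read and cleared u->oob_skb outside it. A hypothetical userspace analogue is sketched below, assuming, as the title of commit 9841991a446c suggests, that the other paths updating oob_skb hold the same lock; the mutex stands in for sk_receive_queue.lock, the pointer for oob_skb, and all names are invented for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical analogue: queue_lock ~ sk_receive_queue.lock,
 * oob ~ unix_sk(sk)->oob_skb.  Not kernel code. */
struct toy_queue {
	pthread_mutex_t queue_lock;
	void *oob;
};

/* Read and clear oob in one critical section.  If every path that
 * takes the pointer (e.g. the GC and a MSG_OOB reader) does so under
 * the same lock, each pointer is handed out at most once, so its
 * reference is dropped exactly once. */
void *toy_take_oob(struct toy_queue *q)
{
	void *oob;

	pthread_mutex_lock(&q->queue_lock);
	oob = q->oob;
	q->oob = NULL;
	pthread_mutex_unlock(&q->queue_lock);
	return oob;
}
```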

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Reported-by: Lei Lu <llfamsec@gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
This has no upstream commit because I replaced the entire GC in
6.10 and the new GC does not have this bug.  This fix applies only
to the old GC (<= 6.9), hence the 5.15/6.1/6.6 targets.
---
 net/unix/garbage.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Comments

Greg KH March 5, 2025, 2:08 p.m. UTC | #1
On Mon, Mar 03, 2025 at 07:01:49PM -0800, Kuniyuki Iwashima wrote:
> [...]
> ---
> This has no upstream commit because I replaced the entire GC in
> 6.10 and the new GC does not have this bug, and this fix is only
> applicable to the old GC (<= 6.9), thus for 5.15/6.1/6.6.

You need to get the networking maintainers to review and agree that this
is ok for us to take, as we really don't want to take "custom" stuff
like this at all.  Why not just take the commits that are in newer
kernels instead?

thanks,

greg k-h

Kuniyuki Iwashima March 5, 2025, 6:10 p.m. UTC | #2
+Paolo

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 5 Mar 2025 15:08:26 +0100
> On Mon, Mar 03, 2025 at 07:01:49PM -0800, Kuniyuki Iwashima wrote:
> > [...]
> > ---
> > This has no upstream commit because I replaced the entire GC in
> > 6.10 and the new GC does not have this bug, and this fix is only
> > applicable to the old GC (<= 6.9), thus for 5.15/6.1/6.6.
> 
> You need to get the networking maintainers to review and agree that this
> is ok for us to take, as we really don't want to take "custom" stuff
> like this at all.

Paolo, could you take a look at this patch?
https://lore.kernel.org/netdev/20250304030149.82265-1-kuniyu@amazon.com/


> Why not just take the commits that are in newer
> kernels instead?

That would be about 20 patches that rewrite most lines of
net/unix/garbage.c and cannot be applied cleanly.

I think backporting these commits is overkill to fix a small
bug that can be fixed with a much smaller diff.

927fa5b3e4f5 af_unix: Fix uninit-value in __unix_walk_scc()
041933a1ec7b af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
7172dc93d621 af_unix: Add dead flag to struct scm_fp_list.
1af2dface5d2 af_unix: Don't access successor in unix_del_edges() during GC.
fd86344823b5 af_unix: Try not to hold unix_gc_lock during accept().
118f457da9ed af_unix: Remove lock dance in unix_peek_fds().
4090fa373f0e af_unix: Replace garbage collection algorithm.
a15702d8b3aa af_unix: Detect dead SCC.
bfdb01283ee8 af_unix: Assign a unique index to SCC.
ad081928a8b0 af_unix: Avoid Tarjan's algorithm if unnecessary.
77e5593aebba af_unix: Skip GC if no cycle exists.
ba31b4a4e101 af_unix: Save O(n) setup of Tarjan's algo.
dcf70df2048d af_unix: Fix up unix_edge.successor for embryo socket.
3484f063172d af_unix: Detect Strongly Connected Components.
6ba76fd2848e af_unix: Iterate all vertices by DFS.
22c3c0c52d32 af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
42f298c06b30 af_unix: Link struct unix_edge when queuing skb.
29b64e354029 af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.
1fbfdfaa5902 af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd.

Greg KH March 5, 2025, 6:22 p.m. UTC | #3
On Wed, Mar 05, 2025 at 10:10:41AM -0800, Kuniyuki Iwashima wrote:
> [...]
>
> > Why not just take the commits that are in newer
> > kernels instead?
> 
> That would be about 20 patches that rewrite most lines of
> net/unix/garbage.c and cannot be applied cleanly.
> 
> I think backporting these commits is overkill to fix a small
> bug that can be fixed with a much smaller diff.
> 
> [...]

Sure, but now all fixes made upstream after these changes will not apply
to older kernels at all, making supporting this old one-off change
harder and harder over time.

But I'll defer to the maintainers here as to what they want.  Taking 20+
patches in a stable tree is trivial for us, not a problem at all.

thanks,

greg k-h

Patch

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 2a758531e102..b3fbdf129944 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -102,13 +102,14 @@  static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 			/* Process the descriptors of this socket */
 			int nfd = UNIXCB(skb).fp->count;
 			struct file **fp = UNIXCB(skb).fp->fp;
+			struct unix_sock *u;
 
 			while (nfd--) {
 				/* Get the socket the fd matches if it indeed does so */
 				struct sock *sk = unix_get_socket(*fp++);
 
 				if (sk) {
-					struct unix_sock *u = unix_sk(sk);
+					u = unix_sk(sk);
 
 					/* Ignore non-candidates, they could
 					 * have been added to the queues after
@@ -122,6 +123,13 @@  static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 				}
 			}
 			if (hit && hitlist != NULL) {
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+				u = unix_sk(x);
+				if (u->oob_skb) {
+					WARN_ON_ONCE(skb_unref(u->oob_skb));
+					u->oob_skb = NULL;
+				}
+#endif
 				__skb_unlink(skb, &x->sk_receive_queue);
 				__skb_queue_tail(hitlist, skb);
 			}
@@ -299,17 +307,9 @@  void unix_gc(void)
 	 * which are creating the cycle(s).
 	 */
 	skb_queue_head_init(&hitlist);
-	list_for_each_entry(u, &gc_candidates, link) {
+	list_for_each_entry(u, &gc_candidates, link)
 		scan_children(&u->sk, inc_inflight, &hitlist);
 
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
-		if (u->oob_skb) {
-			kfree_skb(u->oob_skb);
-			u->oob_skb = NULL;
-		}
-#endif
-	}
-
 	/* not_cycle_list contains those sockets which do not make up a
 	 * cycle.  Restore these to the inflight list.
 	 */