diff mbox series

[v1,net] af_unix: Clear stale u->oob_skb.

Message ID 20240405204145.93169-1-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [v1,net] af_unix: Clear stale u->oob_skb. | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 948 this patch: 948
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dhowells@redhat.com
netdev/build_clang success Errors and warnings before: 954 this patch: 954
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 959 this patch: 959
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kuniyuki Iwashima April 5, 2024, 8:41 p.m. UTC
syzkaller started to report deadlock of unix_gc_lock after commit
4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but
it just uncovers the bug that has been there since commit 314001f0bf92
("af_unix: Add OOB support").

The repro basically does the following.

  from socket import *
  from array import array

  c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
  c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
  c2.recv(1)  # blocked as no normal data in recv queue

  c2.close()  # done async and unblock recv()
  c1.close()  # done async and trigger GC

A socket sends its file descriptor to itself as OOB data and tries to
receive normal data, but finally recv() fails due to async close().

The problem here is wrong handling of OOB skb in manage_oob().  When
recvmsg() is called without MSG_OOB, manage_oob() is called to check
if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
out of the receive queue but does not clear unix_sock(sk)->oob_skb.
This is wrong in terms of uAPI.

Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
The 'o' is handled as OOB data.  When recv() is called twice without
MSG_OOB, the OOB data should be lost.

  >>> from socket import *
  >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
  >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
  5
  >>> c1.send(b'world')
  5
  >>> c2.recv(5)  # OOB data is not received
  b'hell'
  >>> c2.recv(5)  # OOB date is skippeed
  b'world'
  >>> c2.recv(5, MSG_OOB)  # This should return an error
  b'o'

In the same situation, TCP actually returns -EINVAL for the last
recv().

Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set
EPOLLPRI even though the data has passed through by previous recv().

To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
it from recv queue.

The reason why the old GC did not trigger the deadlock is because the
old GC relied on the receive queue to detect the loop.

When it is triggered, the socket with OOB data is marked as GC candidate
because file refcount == inflight count (1).  However, after traversing
all inflight sockets, the socket still has a positive inflight count (1),
thus the socket is excluded from candidates.  Then, the old GC lose the
chance to garbage-collect the socket.

With the old GC, the repro continues to create true garbage that will
never be freed nor detected by kmemleak as it's linked to the global
inflight list.  That's why we couldn't even notice the issue.

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Eric Dumazet April 5, 2024, 9:06 p.m. UTC | #1
On Fri, Apr 5, 2024 at 10:42 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> syzkaller started to report deadlock of unix_gc_lock after commit
> 4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but
> it just uncovers the bug that has been there since commit 314001f0bf92
> ("af_unix: Add OOB support").
>
> The repro basically does the following.
>
>   from socket import *
>   from array import array
>
>   c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>   c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
>   c2.recv(1)  # blocked as no normal data in recv queue
>
>   c2.close()  # done async and unblock recv()
>   c1.close()  # done async and trigger GC
>
> A socket sends its file descriptor to itself as OOB data and tries to
> receive normal data, but finally recv() fails due to async close().
>
> The problem here is wrong handling of OOB skb in manage_oob().  When
> recvmsg() is called without MSG_OOB, manage_oob() is called to check
> if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
> out of the receive queue but does not clear unix_sock(sk)->oob_skb.
> This is wrong in terms of uAPI.
>
> Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
> The 'o' is handled as OOB data.  When recv() is called twice without
> MSG_OOB, the OOB data should be lost.
>
>   >>> from socket import *
>   >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
>   >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
>   5
>   >>> c1.send(b'world')
>   5
>   >>> c2.recv(5)  # OOB data is not received
>   b'hell'
>   >>> c2.recv(5)  # OOB date is skippeed
>   b'world'
>   >>> c2.recv(5, MSG_OOB)  # This should return an error
>   b'o'
>
> In the same situation, TCP actually returns -EINVAL for the last
> recv().
>
> Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set
> EPOLLPRI even though the data has passed through by previous recv().
>
> To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
> it from recv queue.
>
> The reason why the old GC did not trigger the deadlock is because the
> old GC relied on the receive queue to detect the loop.
>
> When it is triggered, the socket with OOB data is marked as GC candidate
> because file refcount == inflight count (1).  However, after traversing
> all inflight sockets, the socket still has a positive inflight count (1),
> thus the socket is excluded from candidates.  Then, the old GC lose the
> chance to garbage-collect the socket.
>
> With the old GC, the repro continues to create true garbage that will
> never be freed nor detected by kmemleak as it's linked to the global
> inflight list.  That's why we couldn't even notice the issue.
>
> Fixes: 314001f0bf92 ("af_unix: Add OOB support")
> Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  net/unix/af_unix.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 5b41e2321209..8f105cf535be 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2665,7 +2665,9 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
>                                 }
>                         } else if (!(flags & MSG_PEEK)) {
>                                 skb_unlink(skb, &sk->sk_receive_queue);
> -                               consume_skb(skb);
> +                               WRITE_ONCE(u->oob_skb, NULL);
> +                               kfree_skb(skb);

I dunno, this duplicate kfree_skb() is quite unusual and would deserve
a comment.

I would perhaps use

   if (!WARN_ON_ONCE(skb_unref(skb))
      kfree_skb(skb);

> +                               kfree_skb(skb);
>                                 skb = skb_peek(&sk->sk_receive_queue);
>                         }
>                 }
> --
> 2.30.2
>
Kuniyuki Iwashima April 5, 2024, 9:54 p.m. UTC | #2
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 5 Apr 2024 23:06:32 +0200
> On Fri, Apr 5, 2024 at 10:42 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > syzkaller started to report deadlock of unix_gc_lock after commit
> > 4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but
> > it just uncovers the bug that has been there since commit 314001f0bf92
> > ("af_unix: Add OOB support").
> >
> > The repro basically does the following.
> >
> >   from socket import *
> >   from array import array
> >
> >   c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
> >   c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
> >   c2.recv(1)  # blocked as no normal data in recv queue
> >
> >   c2.close()  # done async and unblock recv()
> >   c1.close()  # done async and trigger GC
> >
> > A socket sends its file descriptor to itself as OOB data and tries to
> > receive normal data, but finally recv() fails due to async close().
> >
> > The problem here is wrong handling of OOB skb in manage_oob().  When
> > recvmsg() is called without MSG_OOB, manage_oob() is called to check
> > if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
> > out of the receive queue but does not clear unix_sock(sk)->oob_skb.
> > This is wrong in terms of uAPI.
> >
> > Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
> > The 'o' is handled as OOB data.  When recv() is called twice without
> > MSG_OOB, the OOB data should be lost.
> >
> >   >>> from socket import *
> >   >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
> >   >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
> >   5
> >   >>> c1.send(b'world')
> >   5
> >   >>> c2.recv(5)  # OOB data is not received
> >   b'hell'
> >   >>> c2.recv(5)  # OOB date is skippeed
> >   b'world'
> >   >>> c2.recv(5, MSG_OOB)  # This should return an error
> >   b'o'
> >
> > In the same situation, TCP actually returns -EINVAL for the last
> > recv().
> >
> > Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set
> > EPOLLPRI even though the data has passed through by previous recv().
> >
> > To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
> > it from recv queue.
> >
> > The reason why the old GC did not trigger the deadlock is because the
> > old GC relied on the receive queue to detect the loop.
> >
> > When it is triggered, the socket with OOB data is marked as GC candidate
> > because file refcount == inflight count (1).  However, after traversing
> > all inflight sockets, the socket still has a positive inflight count (1),
> > thus the socket is excluded from candidates.  Then, the old GC lose the
> > chance to garbage-collect the socket.
> >
> > With the old GC, the repro continues to create true garbage that will
> > never be freed nor detected by kmemleak as it's linked to the global
> > inflight list.  That's why we couldn't even notice the issue.
> >
> > Fixes: 314001f0bf92 ("af_unix: Add OOB support")
> > Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> >  net/unix/af_unix.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > index 5b41e2321209..8f105cf535be 100644
> > --- a/net/unix/af_unix.c
> > +++ b/net/unix/af_unix.c
> > @@ -2665,7 +2665,9 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
> >                                 }
> >                         } else if (!(flags & MSG_PEEK)) {
> >                                 skb_unlink(skb, &sk->sk_receive_queue);
> > -                               consume_skb(skb);
> > +                               WRITE_ONCE(u->oob_skb, NULL);
> > +                               kfree_skb(skb);
> 
> I dunno, this duplicate kfree_skb() is quite unusual and would deserve
> a comment.
> 
> I would perhaps use
> 
>    if (!WARN_ON_ONCE(skb_unref(skb))
>       kfree_skb(skb);
>

Ah, this is what I wanted..!  Somehow I was wondering if I should use
either kfree_skb() or refcount_dec() directly :S

Will post v2, thanks!
diff mbox series

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5b41e2321209..8f105cf535be 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2665,7 +2665,9 @@  static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
 				}
 			} else if (!(flags & MSG_PEEK)) {
 				skb_unlink(skb, &sk->sk_receive_queue);
-				consume_skb(skb);
+				WRITE_ONCE(u->oob_skb, NULL);
+				kfree_skb(skb);
+				kfree_skb(skb);
 				skb = skb_peek(&sk->sk_receive_queue);
 			}
 		}