[net] net: mptcp: fix unreleased socket in accept queue

Message ID 20220905050400.1136241-1-imagedong@tencent.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net] net: mptcp: fix unreleased socket in accept queue

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            fail     Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              fail     Errors and warnings before: 2 this patch: 5
netdev/cc_maintainers           success  CCed 8 of 8 maintainers
netdev/build_clang              fail     Errors and warnings before: 5 this patch: 6
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  fail     Errors and warnings before: 2 this patch: 5
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Menglong Dong Sept. 5, 2022, 5:04 a.m. UTC
From: Menglong Dong <imagedong@tencent.com>

An MPTCP socket and its subflow sockets that are still in the accept
queue can't be released after the process exits.

When an MPTCP socket in the listening state is released, the
corresponding TCP socket is released too, and so are the TCP sockets in
its accept queue that have not been accepted yet. However, only the
initial subflow sits in that queue; a joined subflow does not. The
joined subflow is therefore never released, and neither is the
corresponding unaccepted MPTCP socket.
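
To make the lifetime problem concrete, here is a small userspace model of
the description above. It is only a sketch, not kernel code: every type
and function name in it (toy_msk, toy_subflow, toy_queue_clean, and so on)
is invented for illustration, and the reference counting is a deliberate
simplification. It assumes each subflow pins its MPTCP-level socket, and
that listener teardown walks only the accept queue, which holds the
initial subflow but not the joined one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an MPTCP-level socket (hypothetical type). */
struct toy_msk {
	int refcnt;		/* one reference per attached subflow */
};

/* Toy stand-in for a TCP subflow (hypothetical type). */
struct toy_subflow {
	struct toy_msk *conn;	/* owning MPTCP-level socket */
	int released;
};

static void toy_subflow_attach(struct toy_subflow *ssk, struct toy_msk *msk)
{
	ssk->conn = msk;
	ssk->released = 0;
	msk->refcnt++;
}

static void toy_subflow_release(struct toy_subflow *ssk)
{
	if (ssk->released)
		return;
	assert(ssk->conn->refcnt > 0);
	ssk->released = 1;
	ssk->conn->refcnt--;
}

/* Listener teardown as described: only queued subflows are released. */
static void toy_queue_clean(struct toy_subflow **queue, size_t n)
{
	for (size_t i = 0; i < n; i++)
		toy_subflow_release(queue[i]);
}

/* Explicit close of the MPTCP-level socket: releases every subflow. */
static void toy_msk_close(struct toy_msk *msk, struct toy_subflow *all,
			  size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (all[i].conn == msk)
			toy_subflow_release(&all[i]);
}

/* Returns the references still pinning the msk after listener teardown. */
static int toy_leak_demo(int with_fix)
{
	struct toy_msk msk = { 0 };
	struct toy_subflow flows[2];
	struct toy_subflow *accept_queue[1];

	toy_subflow_attach(&flows[0], &msk);	/* initial subflow */
	toy_subflow_attach(&flows[1], &msk);	/* joined subflow */
	accept_queue[0] = &flows[0];		/* only the initial one is queued */

	toy_queue_clean(accept_queue, 1);
	if (with_fix)
		toy_msk_close(&msk, flows, 2);
	return msk.refcnt;
}
```

In this model toy_leak_demo(0) leaves one reference behind (the joined
subflow), while toy_leak_demo(1) drops to zero because the explicit close
reaches subflows that the accept queue never sees.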

This can be reproduced easily with the following steps:

1. Create two namespaces and a veth pair:
   $ ip netns add mptcp-client
   $ ip netns add mptcp-server
   $ sysctl -w net.ipv4.conf.all.rp_filter=0
   $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
   $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
   $ ip link add red-client netns mptcp-client type veth peer red-server \
     netns mptcp-server
   $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
   $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
   $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
   $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
   $ ip -n mptcp-server link set red-server up
   $ ip -n mptcp-client link set red-client up

2. Configure the endpoints and limits for client and server:
   $ ip -n mptcp-server mptcp endpoint flush
   $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
   $ ip -n mptcp-client mptcp endpoint flush
   $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
   $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
     1 subflow

3. Listen and accept on a port, such as 9999. The nc command used here
   is modified to use the MPTCP protocol by default, and its default
   backlog is 1:
   $ ip netns exec mptcp-server nc -l -k -p 9999

4. Open another *two* terminals and connect to the server with the
   following command:
   $ ip netns exec mptcp-client nc 10.0.0.1 9999
   Type something after connecting, to trigger the connection of the
   second subflow.

5. Exit all the nc commands and check the TCP sockets in the server
   namespace. You will find one TCP socket stuck in CLOSE_WAIT state
   that is never released.

I thought of a few possible solutions:

1. Release all unaccepted MPTCP sockets with mptcp_close() in
   mptcp_subflow_queue_clean() when the listening TCP socket is
   released. This is what this commit does.
2. Release the MPTCP socket with mptcp_close() in subflow_ulp_release().
3. etc.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
 net/mptcp/subflow.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

kernel test robot Sept. 5, 2022, 6:47 a.m. UTC | #1
Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/intel-lab-lkp/linux/commits/menglong8-dong-gmail-com/net-mptcp-fix-unreleased-socket-in-accept-queue/20220905-130457
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git c90714017cb3f197e71c7ff1317335b96d4d19e8
config: s390-randconfig-r015-20220905
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project c55b41d5199d2394dd6cdb8f52180d8b81d809d4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/496c680afa6c8a180858e88ba2b5a6aa6d262bed
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review menglong8-dong-gmail-com/net-mptcp-fix-unreleased-socket-in-accept-queue/20220905-130457
        git checkout 496c680afa6c8a180858e88ba2b5a6aa6d262bed
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash net/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from net/mptcp/subflow.c:11:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:43:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from net/mptcp/subflow.c:11:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:43:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from net/mptcp/subflow.c:11:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:43:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> net/mptcp/subflow.c:1776:24: error: too few arguments to function call, expected 2, have 1
                   sk->sk_prot->close(sk);
                   ~~~~~~~~~~~~~~~~~~   ^
   12 warnings and 1 error generated.
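
The single error in the log above comes from the close callback in
struct proto, which takes a timeout argument in addition to the socket:
void (*close)(struct sock *sk, long timeout). The userspace sketch below
mirrors that shape with invented toy_* names; it is not kernel code. The
timeout of 0 is only a placeholder assumption, since the thread does not
settle which value would be appropriate.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock;

/* Mirrors the two-argument shape of the close callback in struct proto. */
struct toy_proto {
	void (*close)(struct toy_sock *sk, long timeout);
};

/* Hypothetical minimal socket: just enough state to observe the call. */
struct toy_sock {
	const struct toy_proto *sk_prot;
	long last_timeout;
	int closed;
};

static void toy_close(struct toy_sock *sk, long timeout)
{
	sk->closed = 1;
	sk->last_timeout = timeout;
}

static const struct toy_proto toy_prot = { .close = toy_close };

static int toy_close_via_prot(struct toy_sock *sk)
{
	assert(sk->sk_prot != NULL && sk->sk_prot->close != NULL);

	/* sk->sk_prot->close(sk); would be rejected: one argument too few. */
	sk->sk_prot->close(sk, 0);	/* 0 is an assumed placeholder timeout */
	return sk->closed;
}
```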


vim +1776 net/mptcp/subflow.c

  1726	
  1727	void mptcp_subflow_queue_clean(struct sock *listener_ssk)
  1728	{
  1729		struct request_sock_queue *queue = &inet_csk(listener_ssk)->icsk_accept_queue;
  1730		struct mptcp_sock *msk, *next, *head = NULL;
  1731		struct request_sock *req;
  1732	
  1733		/* build a list of all unaccepted mptcp sockets */
  1734		spin_lock_bh(&queue->rskq_lock);
  1735		for (req = queue->rskq_accept_head; req; req = req->dl_next) {
  1736			struct mptcp_subflow_context *subflow;
  1737			struct sock *ssk = req->sk;
  1738			struct mptcp_sock *msk;
  1739	
  1740			if (!sk_is_mptcp(ssk))
  1741				continue;
  1742	
  1743			subflow = mptcp_subflow_ctx(ssk);
  1744			if (!subflow || !subflow->conn)
  1745				continue;
  1746	
  1747			/* skip if already in list */
  1748			msk = mptcp_sk(subflow->conn);
  1749			if (msk->dl_next || msk == head)
  1750				continue;
  1751	
  1752			msk->dl_next = head;
  1753			head = msk;
  1754		}
  1755		spin_unlock_bh(&queue->rskq_lock);
  1756		if (!head)
  1757			return;
  1758	
  1759		/* can't acquire the msk socket lock under the subflow one,
  1760		 * or will cause ABBA deadlock
  1761		 */
  1762		release_sock(listener_ssk);
  1763	
  1764		for (msk = head; msk; msk = next) {
  1765			struct sock *sk = (struct sock *)msk;
  1766			bool slow;
  1767	
  1768			slow = lock_sock_fast_nested(sk);
  1769			next = msk->dl_next;
  1770			msk->first = NULL;
  1771			msk->dl_next = NULL;
  1772			unlock_sock_fast(sk, slow);
  1773	
  1774			/*  */
  1775			sock_hold(sk);
> 1776			sk->sk_prot->close(sk);
  1777		}
  1778	
  1779		/* we are still under the listener msk socket lock */
  1780		lock_sock_nested(listener_ssk, SINGLE_DEPTH_NESTING);
  1781	}
  1782

Paolo Abeni Sept. 5, 2022, 8:26 a.m. UTC | #2
Hello,

On Mon, 2022-09-05 at 13:04 +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> The mptcp socket and its subflow sockets in accept queue can't be
> released after the process exit.
> 
> While the release of a mptcp socket in listening state, the
> corresponding tcp socket will be released too. Meanwhile, the tcp
> socket in the unaccept queue will be released too. However, only init
> subflow is in the unaccept queue, and the joined subflow is not in the
> unaccept queue, which makes the joined subflow won't be released, and
> therefore the corresponding unaccepted mptcp socket will not be released
> too.
> 
> This can be reproduced easily with following steps:
> 
> 1. create 2 namespace and veth:
>    $ ip netns add mptcp-client
>    $ ip netns add mptcp-server
>    $ sysctl -w net.ipv4.conf.all.rp_filter=0
>    $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
>    $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
>    $ ip link add red-client netns mptcp-client type veth peer red-server \
>      netns mptcp-server
>    $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
>    $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
>    $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
>    $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
>    $ ip -n mptcp-server link set red-server up
>    $ ip -n mptcp-client link set red-client up
> 
> 2. configure the endpoint and limit for client and server:
>    $ ip -n mptcp-server mptcp endpoint flush
>    $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
>    $ ip -n mptcp-client mptcp endpoint flush
>    $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
>    $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
>      1 subflow
> 
> 3. listen and accept on a port, such as 9999. The nc command we used
>    here is modified, which makes it uses mptcp protocol by default.
>    And the default backlog is 1:
>    ip netns exec mptcp-server nc -l -k -p 9999
> 
> 4. open another *two* terminal and connect to the server with the
>    following command:
>    $ ip netns exec mptcp-client nc 10.0.0.1 9999
>    input something after connect, to trigger the connection of the second
>    subflow
> 
> 5. exit all the nc command, and check the tcp socket in server namespace.
>    And you will find that there is one tcp socket in CLOSE_WAIT state
>    and can't release forever.

Thank you for the report! 

I have a doubt WRT the above scenario: AFAICS 'nc' will accept the
incoming sockets ASAP, so the unaccepted queue should be empty at
shutdown, but that does not fit with your description?!?

> There are some solutions that I thought:
> 
> 1. release all unaccepted mptcp socket with mptcp_close() while the
>    listening tcp socket release in mptcp_subflow_queue_clean(). This is
>    what we do in this commit.
> 2. release the mptcp socket with mptcp_close() in subflow_ulp_release().
> 3. etc
> 

Can you please point to a commit introducing the issue?

> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
>  net/mptcp/subflow.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index c7d49fb6e7bd..e39dff5d5d84 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1770,6 +1770,10 @@ void mptcp_subflow_queue_clean(struct sock *listener_ssk)
>  		msk->first = NULL;
>  		msk->dl_next = NULL;
>  		unlock_sock_fast(sk, slow);
> +
> +		/*  */
> +		sock_hold(sk);
> +		sk->sk_prot->close(sk);

You can call mptcp_close() directly here.

Perhaps we could as well drop the mptcp_sock_destruct() hack?

Perhaps even provide a __mptcp_close() variant that does not acquire
the socket lock, and move the close call inside the existing sk socket
lock section above?
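
The locked/unlocked split suggested here is a common pattern; a minimal
userspace sketch of it follows. All names are hypothetical:
toy_close_locked() plays the role of the proposed __mptcp_close(),
toy_close() the role of the regular locking entry point, and the boolean
"lock" only models ownership so the assertions can catch misuse. How the
real variant would interact with lock_sock_fast_nested() is not shown
and is left open by this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket with a toy, non-concurrent ownership flag. */
struct toy_sock {
	bool locked;
	bool closed;
};

static void toy_lock(struct toy_sock *sk)
{
	assert(!sk->locked);	/* taking the lock twice would deadlock */
	sk->locked = true;
}

static void toy_unlock(struct toy_sock *sk)
{
	assert(sk->locked);
	sk->locked = false;
}

/* Lockless variant: the caller must already hold the socket lock. */
static void toy_close_locked(struct toy_sock *sk)
{
	assert(sk->locked);
	sk->closed = true;
}

/* Regular entry point: takes the lock around the lockless variant. */
static void toy_close(struct toy_sock *sk)
{
	toy_lock(sk);
	toy_close_locked(sk);
	toy_unlock(sk);
}

/* Cleanup step: detach and close within one locked section. */
static void toy_queue_clean_one(struct toy_sock *sk)
{
	toy_lock(sk);
	/* ... detach the socket from the listener here ... */
	toy_close_locked(sk);
	toy_unlock(sk);
}
```

The point of the split is that the cleanup loop no longer has to drop the
lock and re-take it just to close the socket.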

Thanks,

Paolo

Menglong Dong Sept. 5, 2022, 9:03 a.m. UTC | #3
On Mon, Sep 5, 2022 at 4:26 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Mon, 2022-09-05 at 13:04 +0800, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > The mptcp socket and its subflow sockets in accept queue can't be
> > released after the process exit.
> >
> > While the release of a mptcp socket in listening state, the
> > corresponding tcp socket will be released too. Meanwhile, the tcp
> > socket in the unaccept queue will be released too. However, only init
> > subflow is in the unaccept queue, and the joined subflow is not in the
> > unaccept queue, which makes the joined subflow won't be released, and
> > therefore the corresponding unaccepted mptcp socket will not be released
> > too.
> >
> > This can be reproduced easily with following steps:
> >
> > 1. create 2 namespace and veth:
> >    $ ip netns add mptcp-client
> >    $ ip netns add mptcp-server
> >    $ sysctl -w net.ipv4.conf.all.rp_filter=0
> >    $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
> >    $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
> >    $ ip link add red-client netns mptcp-client type veth peer red-server \
> >      netns mptcp-server
> >    $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
> >    $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
> >    $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
> >    $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
> >    $ ip -n mptcp-server link set red-server up
> >    $ ip -n mptcp-client link set red-client up
> >
> > 2. configure the endpoint and limit for client and server:
> >    $ ip -n mptcp-server mptcp endpoint flush
> >    $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
> >    $ ip -n mptcp-client mptcp endpoint flush
> >    $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
> >    $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
> >      1 subflow
> >
> > 3. listen and accept on a port, such as 9999. The nc command we used
> >    here is modified, which makes it uses mptcp protocol by default.
> >    And the default backlog is 1:
> >    ip netns exec mptcp-server nc -l -k -p 9999
> >
> > 4. open another *two* terminal and connect to the server with the
> >    following command:
> >    $ ip netns exec mptcp-client nc 10.0.0.1 9999
> >    input something after connect, to trigger the connection of the second
> >    subflow
> >
> > 5. exit all the nc command, and check the tcp socket in server namespace.
> >    And you will find that there is one tcp socket in CLOSE_WAIT state
> >    and can't release forever.
>
> Thank you for the report!
>
> I have a doubt WRT the above scenario: AFAICS 'nc' will accept the
> incoming sockets ASAP, so the unaccepted queue should be empty at
> shutdown, but that does not fit with your description?!?
>

By default, at least in my case, nc won't accept a new connection
until the first connection closes when '-k' is set. Therefore, the
second connection will stay in the unaccepted queue.
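
For readers unfamiliar with the accept queue: a connection can be fully
established while the server has not yet called accept(). The sketch
below shows this with plain TCP on loopback, so that it runs without any
MPTCP setup; that substitution is mine. Creating the sockets with
IPPROTO_MPTCP instead would be needed to exercise MPTCP itself, and this
program does not reproduce the leak. The function name is hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if connect() completes although the server never calls
 * accept(), 0 if connect() fails, and -1 on a setup error.
 */
static int connect_without_accept(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd, cfd, ok;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
		close(lfd);
		return -1;
	}
	assert(len == sizeof(addr));

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0) {
		close(lfd);
		return -1;
	}

	/* The handshake completes in the kernel; accept() is never called. */
	ok = connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0;

	close(cfd);
	close(lfd);	/* the unaccepted connection is cleaned up here */
	return ok;
}
```

With plain TCP the queued connection is torn down when the listener is
closed; the report in this thread is that the MPTCP equivalent leaves the
joined subflow behind.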

> > There are some solutions that I thought:
> >
> > 1. release all unaccepted mptcp socket with mptcp_close() while the
> >    listening tcp socket release in mptcp_subflow_queue_clean(). This is
> >    what we do in this commit.
> > 2. release the mptcp socket with mptcp_close() in subflow_ulp_release().
> > 3. etc
> >
>
> Can you please point to a commit introducing the issue?
>

In fact, I'm not sure. In my case, I found this issue in kernel 5.10.
I looked for a fix upstream, but found that upstream has this issue
too.

Hmm... I wonder whether this issue has existed from the beginning? I
can't find any point at which unaccepted joined subflows could be
released.

> > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > ---
> >  net/mptcp/subflow.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > index c7d49fb6e7bd..e39dff5d5d84 100644
> > --- a/net/mptcp/subflow.c
> > +++ b/net/mptcp/subflow.c
> > @@ -1770,6 +1770,10 @@ void mptcp_subflow_queue_clean(struct sock *listener_ssk)
> >               msk->first = NULL;
> >               msk->dl_next = NULL;
> >               unlock_sock_fast(sk, slow);
> > +
> > +             /*  */
> > +             sock_hold(sk);
> > +             sk->sk_prot->close(sk);
>
> You can call mptcp_close() directly here.
>
> Perhaps we could as well drop the mptcp_sock_destruct() hack?

Do you mean to call mptcp_sock_destruct() directly here?

>
> Perhaps even providing a __mptcp_close() variant not acquiring the
> socket lock and move such close call inside the existing sk socket lock
> above?
>

Yeah, sounds nice.

Thanks!
Menglong Dong

> Thanks,
>
> Paolo
>

Paolo Abeni Sept. 6, 2022, 7:02 a.m. UTC | #4
On Mon, 2022-09-05 at 17:03 +0800, Menglong Dong wrote:
> On Mon, Sep 5, 2022 at 4:26 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > Hello,
> > 
> > On Mon, 2022-09-05 at 13:04 +0800, menglong8.dong@gmail.com wrote:
> > > From: Menglong Dong <imagedong@tencent.com>
> > > 
> > > The mptcp socket and its subflow sockets in accept queue can't be
> > > released after the process exit.
> > > 
> > > While the release of a mptcp socket in listening state, the
> > > corresponding tcp socket will be released too. Meanwhile, the tcp
> > > socket in the unaccept queue will be released too. However, only init
> > > subflow is in the unaccept queue, and the joined subflow is not in the
> > > unaccept queue, which makes the joined subflow won't be released, and
> > > therefore the corresponding unaccepted mptcp socket will not be released
> > > too.
> > > 
> > > This can be reproduced easily with following steps:
> > > 
> > > 1. create 2 namespace and veth:
> > >    $ ip netns add mptcp-client
> > >    $ ip netns add mptcp-server
> > >    $ sysctl -w net.ipv4.conf.all.rp_filter=0
> > >    $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
> > >    $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
> > >    $ ip link add red-client netns mptcp-client type veth peer red-server \
> > >      netns mptcp-server
> > >    $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
> > >    $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
> > >    $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
> > >    $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
> > >    $ ip -n mptcp-server link set red-server up
> > >    $ ip -n mptcp-client link set red-client up
> > > 
> > > 2. configure the endpoint and limit for client and server:
> > >    $ ip -n mptcp-server mptcp endpoint flush
> > >    $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
> > >    $ ip -n mptcp-client mptcp endpoint flush
> > >    $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
> > >    $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
> > >      1 subflow
> > > 
> > > 3. listen and accept on a port, such as 9999. The nc command we used
> > >    here is modified, which makes it uses mptcp protocol by default.
> > >    And the default backlog is 1:
> > >    ip netns exec mptcp-server nc -l -k -p 9999
> > > 
> > > 4. open another *two* terminal and connect to the server with the
> > >    following command:
> > >    $ ip netns exec mptcp-client nc 10.0.0.1 9999
> > >    input something after connect, to trigger the connection of the second
> > >    subflow
> > > 
> > > 5. exit all the nc command, and check the tcp socket in server namespace.
> > >    And you will find that there is one tcp socket in CLOSE_WAIT state
> > >    and can't release forever.
> > 
> > Thank you for the report!
> > 
> > I have a doubt WRT the above scenario: AFAICS 'nc' will accept the
> > incoming sockets ASAP, so the unaccepted queue should be empty at
> > shutdown, but that does not fit with your description?!?
> > 
> 
> By default, as far as in my case, nc won't accept the new connection
> until the first connection closes with the '-k' set. Therefore, the second
> connection will stay in the unaccepted queue.

I missed the fact that you opened 2 connections. I guess that is point 4
above. Please rephrase that sentence with something like:

---
4. open another *two* terminals and use each of them to connect to the
server with the following command:
...
So that there are two established mptcp connections, with the second
one still unaccepted.
---
> 
> > > There are some solutions that I thought:
> > > 
> > > 1. release all unaccepted mptcp socket with mptcp_close() while the
> > >    listening tcp socket release in mptcp_subflow_queue_clean(). This is
> > >    what we do in this commit.
> > > 2. release the mptcp socket with mptcp_close() in subflow_ulp_release().
> > > 3. etc
> > > 
> > 
> > Can you please point to a commit introducing the issue?
> > 
> 
> In fact, I'm not sure. In my case, I found this issue in kernel 5.10.
> And I wanted to find the solution in the upstream, but find that
> upstream has this issue too.
> 
> Hmm...I am curious if this issue exists in the beginning? I
> can't find the opportunity that the joined subflow which are
> unaccepted can be released.

It looks like the problem is there since MPJ support, commit
f296234c98a8fcec94eec80304a873f635d350ea

> 
> > > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > > ---
> > >  net/mptcp/subflow.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > > index c7d49fb6e7bd..e39dff5d5d84 100644
> > > --- a/net/mptcp/subflow.c
> > > +++ b/net/mptcp/subflow.c
> > > @@ -1770,6 +1770,10 @@ void mptcp_subflow_queue_clean(struct sock *listener_ssk)
> > >               msk->first = NULL;
> > >               msk->dl_next = NULL;
> > >               unlock_sock_fast(sk, slow);
> > > +
> > > +             /*  */
> > > +             sock_hold(sk);
> > > +             sk->sk_prot->close(sk);
> > 
> > You can call mptcp_close() directly here.
> > 
> > Perhaps we could as well drop the mptcp_sock_destruct() hack?
> 
> Do you mean to call mptcp_sock_destruct() directly here?

I suspect that with this change, setting msk->sk_destruct to
mptcp_sock_destruct in subflow_syn_recv_sock() is not needed anymore,
and the relevant initialization (and callback definition) could be
removed.

> 
Cheers,

Paolo

Menglong Dong Sept. 7, 2022, 7:12 a.m. UTC | #5
On Tue, Sep 6, 2022 at 3:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2022-09-05 at 17:03 +0800, Menglong Dong wrote:
> > On Mon, Sep 5, 2022 at 4:26 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > Hello,
> > >
> > > On Mon, 2022-09-05 at 13:04 +0800, menglong8.dong@gmail.com wrote:
> > > > From: Menglong Dong <imagedong@tencent.com>
> > > >
> > > > The mptcp socket and its subflow sockets in accept queue can't be
> > > > released after the process exit.
> > > >
> > > > While the release of a mptcp socket in listening state, the
> > > > corresponding tcp socket will be released too. Meanwhile, the tcp
> > > > socket in the unaccept queue will be released too. However, only init
> > > > subflow is in the unaccept queue, and the joined subflow is not in the
> > > > unaccept queue, which makes the joined subflow won't be released, and
> > > > therefore the corresponding unaccepted mptcp socket will not be released
> > > > too.
> > > >
> > > > This can be reproduced easily with following steps:
> > > >
> > > > 1. create 2 namespace and veth:
> > > >    $ ip netns add mptcp-client
> > > >    $ ip netns add mptcp-server
> > > >    $ sysctl -w net.ipv4.conf.all.rp_filter=0
> > > >    $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
> > > >    $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
> > > >    $ ip link add red-client netns mptcp-client type veth peer red-server \
> > > >      netns mptcp-server
> > > >    $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
> > > >    $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
> > > >    $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
> > > >    $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
> > > >    $ ip -n mptcp-server link set red-server up
> > > >    $ ip -n mptcp-client link set red-client up
> > > >
> > > > 2. configure the endpoint and limit for client and server:
> > > >    $ ip -n mptcp-server mptcp endpoint flush
> > > >    $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
> > > >    $ ip -n mptcp-client mptcp endpoint flush
> > > >    $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
> > > >    $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
> > > >      1 subflow
> > > >
> > > > 3. listen and accept on a port, such as 9999. The nc command we used
> > > >    here is modified, which makes it uses mptcp protocol by default.
> > > >    And the default backlog is 1:
> > > >    ip netns exec mptcp-server nc -l -k -p 9999
> > > >
> > > > 4. open another *two* terminal and connect to the server with the
> > > >    following command:
> > > >    $ ip netns exec mptcp-client nc 10.0.0.1 9999
> > > >    input something after connect, to trigger the connection of the second
> > > >    subflow
> > > >
> > > > 5. exit all the nc command, and check the tcp socket in server namespace.
> > > >    And you will find that there is one tcp socket in CLOSE_WAIT state
> > > >    and can't release forever.
> > >
> > > Thank you for the report!
> > >
> > > I have a doubt WRT the above scenario: AFAICS 'nc' will accept the
> > > incoming sockets ASAP, so the unaccepted queue should be empty at
> > > shutdown, but that does not fit with your description?!?
> > >
> >
> > By default, as far as in my case, nc won't accept the new connection
> > until the first connection closes with the '-k' set. Therefore, the second
> > connection will stay in the unaccepted queue.
>
> I missed the fact you opened 2 connections. I guess that is point 4
> above. Please rephrase that sentence with something alike:
>
> ---
> 4. open another *two* terminal and use each of them to connect to the
> server with the following command:
> ...
> So that there are two established mptcp connections, with the second
> one still unaccepted.
> ---

Sounds nice! Thanks~

> >
> > > > There are some solutions that I thought:
> > > >
> > > > 1. release all unaccepted mptcp socket with mptcp_close() while the
> > > >    listening tcp socket release in mptcp_subflow_queue_clean(). This is
> > > >    what we do in this commit.
> > > > 2. release the mptcp socket with mptcp_close() in subflow_ulp_release().
> > > > 3. etc
> > > >
> > >
> > > Can you please point to a commit introducing the issue?
> > >
> >
> > In fact, I'm not sure. In my case, I found this issue in kernel 5.10.
> > And I wanted to find the solution in the upstream, but find that
> > upstream has this issue too.
> >
> > Hmm...I am curious if this issue exists in the beginning? I
> > can't find the opportunity that the joined subflow which are
> > unaccepted can be released.
>
> It looks like the problem is there since MPJ support, commit
> f296234c98a8fcec94eec80304a873f635d350ea
>

Yeah, I'll add a Fixes tag for this commit.

> >
> > > > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > > > ---
> > > >  net/mptcp/subflow.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > > > index c7d49fb6e7bd..e39dff5d5d84 100644
> > > > --- a/net/mptcp/subflow.c
> > > > +++ b/net/mptcp/subflow.c
> > > > @@ -1770,6 +1770,10 @@ void mptcp_subflow_queue_clean(struct sock *listener_ssk)
> > > >               msk->first = NULL;
> > > >               msk->dl_next = NULL;
> > > >               unlock_sock_fast(sk, slow);
> > > > +
> > > > +             /*  */
> > > > +             sock_hold(sk);
> > > > +             sk->sk_prot->close(sk);
> > >
> > > You can call mptcp_close() directly here.
> > >
> > > Perhaps we could as well drop the mptcp_sock_destruct() hack?
> >
> > Do you mean to call mptcp_sock_destruct() directly here?
>
> I suspect that with this change setting msk->sk_destruct to
> mptcp_sock_destruct in subflow_syn_recv_sock() is not needed anymore,
> and the relevant initialization (and callback definition) could be
> removed.

Your suspicion should be right. mptcp_subflow_queue_clean() should
always be called before the unaccepted TCP socket and the
corresponding MPTCP socket are released.
Therefore this change ensures that the MPTCP socket is in CLOSE state
when it is released. I'll remove mptcp_sock_destruct(), BTW.

Thanks!
Menglong Dong

>
> >
> Cheers,
>
> Paolo
>

Patch

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index c7d49fb6e7bd..e39dff5d5d84 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1770,6 +1770,10 @@  void mptcp_subflow_queue_clean(struct sock *listener_ssk)
 		msk->first = NULL;
 		msk->dl_next = NULL;
 		unlock_sock_fast(sk, slow);
+
+		/*  */
+		sock_hold(sk);
+		sk->sk_prot->close(sk);
 	}
 
 	/* we are still under the listener msk socket lock */