
[net] rxrpc: Restore removed timer deletion

Message ID 164984498582.2000115.4023190177137486137.stgit@warthog.procyon.org.uk
State Accepted
Commit ee3b0826b4764f6c13ad6db67495c5a1c38e9025
Delegated to: Netdev Maintainers
Series [net] rxrpc: Restore removed timer deletion

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           fail     1 blamed authors not CCed: davem@davemloft.net; 3 maintainers not CCed: davem@davemloft.net kuba@kernel.org pabeni@redhat.com
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

David Howells April 13, 2022, 10:16 a.m. UTC
A recent patch[1] from Eric Dumazet flipped the order in which the
keepalive timer and the keepalive worker are cancelled, to fix a
syzbot-reported issue[2].  Unfortunately, this opens up the mirror-image
bug, in which the timer races with rxrpc_exit_net() and restarts the
worker after it has been cancelled:

	CPU 1		CPU 2
	===============	=====================
			if (rxnet->live)
			<INTERRUPT>
	rxnet->live = false;
	cancel_work_sync(&rxnet->peer_keepalive_work);
			rxrpc_queue_work(&rxnet->peer_keepalive_work);
	del_timer_sync(&rxnet->peer_keepalive_timer);

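Fix this by restoring the removed del_timer_sync(), so that the timer is
removed twice.  The first del_timer_sync() waits for a running timer
handler to finish, so any work it queued is then caught by
cancel_work_sync(); the second removes the timer again in case the worker
re-armed it.  If the timer runs after ->live has been cleared, it should
see ->live == false and not restart the worker.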

Fixes: 1946014ca3b1 ("rxrpc: fix a race in rxrpc_exit_net()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
---

 net/rxrpc/net_ns.c |    2 ++
 1 file changed, 2 insertions(+)

Comments

Eric Dumazet April 13, 2022, 5:14 p.m. UTC | #1
On Wed, Apr 13, 2022 at 3:16 AM David Howells <dhowells@redhat.com> wrote:
>
> [...]
>
> diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
> index f15d6942da45..cc7e30733feb 100644
> --- a/net/rxrpc/net_ns.c
> +++ b/net/rxrpc/net_ns.c
> @@ -113,7 +113,9 @@ static __net_exit void rxrpc_exit_net(struct net *net)
>         struct rxrpc_net *rxnet = rxrpc_net(net);
>
>         rxnet->live = false;
> +       del_timer_sync(&rxnet->peer_keepalive_timer);
>         cancel_work_sync(&rxnet->peer_keepalive_work);
> +       /* Remove the timer again as the worker may have restarted it. */
>         del_timer_sync(&rxnet->peer_keepalive_timer);
>         rxrpc_destroy_all_calls(rxnet);
>         rxrpc_destroy_all_connections(rxnet);
>
>

ok... so we have a timer and a work queue, both activating each other
in kind of a ping pong ?

Any particular reason not using delayed works ?

Thanks.
David Howells April 13, 2022, 5:41 p.m. UTC | #2
Eric Dumazet <edumazet@google.com> wrote:

> ok... so we have a timer and a work queue, both activating each other
> in kind of a ping pong ?

Yes.  I want to emit regular keepalive pokes.

> Any particular reason not using delayed works ?

Because there's a race between starting the keepalive timer when a new peer is
added and when the keepalive worker is resetting the timer for the next peer
in the list.  This is why I'm using timer_reduce().  delayed_work doesn't
currently have such a facility.  It's not simple to add because
try_to_grab_pending() as called from mod_delayed_work_on() cancels the timer -
which is not what I want it to do.

David
Eric Dumazet April 13, 2022, 5:53 p.m. UTC | #3
On Wed, Apr 13, 2022 at 10:41 AM David Howells <dhowells@redhat.com> wrote:
>
> [...]
>

SGTM, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>
patchwork-bot+netdevbpf@kernel.org April 15, 2022, 10 a.m. UTC | #4
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 13 Apr 2022 11:16:25 +0100 you wrote:
> [...]

Here is the summary with links:
  - [net] rxrpc: Restore removed timer deletion
    https://git.kernel.org/netdev/net/c/ee3b0826b476

You are awesome, thank you!

Patch

diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
index f15d6942da45..cc7e30733feb 100644
--- a/net/rxrpc/net_ns.c
+++ b/net/rxrpc/net_ns.c
@@ -113,7 +113,9 @@ static __net_exit void rxrpc_exit_net(struct net *net)
 	struct rxrpc_net *rxnet = rxrpc_net(net);
 
 	rxnet->live = false;
+	del_timer_sync(&rxnet->peer_keepalive_timer);
 	cancel_work_sync(&rxnet->peer_keepalive_work);
+	/* Remove the timer again as the worker may have restarted it. */
 	del_timer_sync(&rxnet->peer_keepalive_timer);
 	rxrpc_destroy_all_calls(rxnet);
 	rxrpc_destroy_all_connections(rxnet);