diff mbox series

[net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()

Message ID 20211108145340.17208-1-ap420073@gmail.com (mailing list archive)
State Accepted
Commit 43aa4937994f39a5ffc1a581b5ca382a1a2a8b1e
Delegated to: Netdev Maintainers
Headers show
Series [net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Taehee Yoo Nov. 8, 2021, 2:53 p.m. UTC
When the amt module is being removed, it calls flush_delayed_work() to exit
source_gc_wq. But it wouldn't be exited properly because the
amt_source_gc_work(), which is the callback function of source_gc_wq
internally calls mod_delayed_work() again.
So, amt_source_gc_work() would be called after the amt module is removed.
Therefore kernel panic would occur.
In order to avoid it, cancel_delayed_work() should be used instead of
flush_delayed_work().

Test commands:
   modprobe amt
   modprobe -rv amt

Splat looks like:
 BUG: unable to handle page fault for address: fffffbfff80f50db
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1237ee067 P4D 1237ee067 PUD 1237b2067 PMD 100c11067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0+ #27
 5a0ebebc29fe5c40c68bea90197606c3a832b09f
 RIP: 0010:run_timer_softirq+0x221/0xfc0
 Code: 00 00 4c 89 e1 4c 8b 30 48 c1 e9 03 80 3c 29 00 0f 85 ed 0b 00 00
 4d 89 34 24 4d 85 f6 74 19 49 8d 7e 08 48 89 f9 48 c1 e9 03 <80> 3c 29 00
 0f 85 fa 0b 00 00 4d 89 66 08 83 04 24 01 49 89 d4 48
 RSP: 0018:ffff888119009e50 EFLAGS: 00010806
 RAX: ffff8881191f8a80 RBX: 00000000007ffe2a RCX: 1ffffffff80f50db
 RDX: ffff888119009ed0 RSI: 0000000000000008 RDI: ffffffffc07a86d8
 RBP: dffffc0000000000 R08: ffff8881191f8280 R09: ffffed102323f061
 R10: ffff8881191f8307 R11: ffffed102323f060 R12: ffff888119009ec8
 R13: 00000000000000c0 R14: ffffffffc07a86d0 R15: ffff8881191f82e8
 FS:  0000000000000000(0000) GS:ffff888119000000(0000)
 knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: fffffbfff80f50db CR3: 00000001062dc002 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <IRQ>
  ? add_timer+0x650/0x650
  ? kvm_clock_read+0x14/0x30
  ? ktime_get+0xb9/0x180
  ? rcu_read_lock_held_common+0xe/0xa0
  ? rcu_read_lock_sched_held+0x56/0xc0
  ? rcu_read_lock_bh_held+0xa0/0xa0
  ? hrtimer_interrupt+0x271/0x790
  __do_softirq+0x1d0/0x88f
  irq_exit_rcu+0xe7/0x120
  sysvec_apic_timer_interrupt+0x8a/0xb0
  </IRQ>
  <TASK>
[ ... ]

Fixes: bc54e49c140b ("amt: add multicast(IGMP) report message handler")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
 drivers/net/amt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

patchwork-bot+netdevbpf@kernel.org Nov. 10, 2021, 3:20 a.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  8 Nov 2021 14:53:40 +0000 you wrote:
> When the amt module is being removed, it calls flush_delayed_work() to exit
> source_gc_wq. But it wouldn't be exited properly because the
> amt_source_gc_work(), which is the callback function of source_gc_wq
> internally calls mod_delayed_work() again.
> So, amt_source_gc_work() would be called after the amt module is removed.
> Therefore kernel panic would occur.
> In order to avoid it, cancel_delayed_work() should be used instead of
> flush_delayed_work().
> 
> [...]

Here is the summary with links:
  - [net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
    https://git.kernel.org/netdev/net/c/43aa4937994f

You are awesome, thank you!
Jakub Kicinski Nov. 11, 2021, 3:37 p.m. UTC | #2
On Mon,  8 Nov 2021 14:53:40 +0000 Taehee Yoo wrote:
> When the amt module is being removed, it calls flush_delayed_work() to exit
> source_gc_wq. But it wouldn't be exited properly because the
> amt_source_gc_work(), which is the callback function of source_gc_wq
> internally calls mod_delayed_work() again.
> So, amt_source_gc_work() would be called after the amt module is removed.
> Therefore kernel panic would occur.
> In order to avoid it, cancel_delayed_work() should be used instead of
> flush_delayed_work().

Somehow I convinced myself this is correct but now I'm not sure, again.

> diff --git a/drivers/net/amt.c b/drivers/net/amt.c
> index c384b2694f9e..47a04c330885 100644
> --- a/drivers/net/amt.c
> +++ b/drivers/net/amt.c
> @@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
>  {
>  	rtnl_link_unregister(&amt_link_ops);
>  	unregister_netdevice_notifier(&amt_notifier_block);
> -	flush_delayed_work(&source_gc_wq);
> +	cancel_delayed_work(&source_gc_wq);

This doesn't guarantee that the work is not running _right_ now and
will re-arm itself on the next clock cycle, so to speak.

 CPU 0                      CPU 1
 -----                      -----

 worker gets the work
 clears pending
 work starts running
                            cancel_work
                            grabs pending
                            clears pending
 mod_delayed_work()

You need cancel_delayed_work_sync(), right?

>  	__amt_source_gc_work();
>  	destroy_workqueue(amt_wq);
>  }
Taehee Yoo Nov. 12, 2021, 12:17 p.m. UTC | #3
Hi Jakub,
Thank you for your review!

On 11/12/21 12:37 AM, Jakub Kicinski wrote:
 > On Mon,  8 Nov 2021 14:53:40 +0000 Taehee Yoo wrote:
 >> When the amt module is being removed, it calls flush_delayed_work() 
to exit
 >> source_gc_wq. But it wouldn't be exited properly because the
 >> amt_source_gc_work(), which is the callback function of source_gc_wq
 >> internally calls mod_delayed_work() again.
 >> So, amt_source_gc_work() would be called after the amt module is 
removed.
 >> Therefore kernel panic would occur.
 >> In order to avoid it, cancel_delayed_work() should be used instead of
 >> flush_delayed_work().
 >
 > Somehow I convinced myself this is correct but now I'm not sure, again.
 >
 >> diff --git a/drivers/net/amt.c b/drivers/net/amt.c
 >> index c384b2694f9e..47a04c330885 100644
 >> --- a/drivers/net/amt.c
 >> +++ b/drivers/net/amt.c
 >> @@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
 >>   {
 >>   	rtnl_link_unregister(&amt_link_ops);
 >>   	unregister_netdevice_notifier(&amt_notifier_block);
 >> -	flush_delayed_work(&source_gc_wq);
 >> +	cancel_delayed_work(&source_gc_wq);
 >
 > This doesn't guarantee that the work is not running _right_ now and
 > will re-arm itself on the next clock cycle, so to speak.
 >
 >   CPU 0                      CPU 1
 >   -----                      -----
 >
 >   worker gets the work
 >   clears pending
 >   work starts running
 >                              cancel_work
 >                              grabs pending
 >                              clears pending
 >   mod_delayed_work()
 >
 > You need cancel_delayed_work_sync(), right?
 >

you're right!
I think cancel_delayed_work() is async so that it can't clearly fix this 
problem.
So, I will send a new patch after some tests.
Thank you so much for catching it!

Thanks a lot,
Taehee
diff mbox series

Patch

diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index c384b2694f9e..47a04c330885 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3286,7 +3286,7 @@  static void __exit amt_fini(void)
 {
 	rtnl_link_unregister(&amt_link_ops);
 	unregister_netdevice_notifier(&amt_notifier_block);
-	flush_delayed_work(&source_gc_wq);
+	cancel_delayed_work(&source_gc_wq);
 	__amt_source_gc_work();
 	destroy_workqueue(amt_wq);
 }