diff mbox series

[1/1] batman-adv: Broken sync while rescheduling delayed work

Message ID 20230607155515.548120-2-sw@simonwunderlich.de (mailing list archive)
State Accepted
Commit abac3ac97fe8734b620e7322a116450d7f90aa43
Delegated to: Netdev Maintainers
Headers show
Series [1/1] batman-adv: Broken sync while rescheduling delayed work | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers fail 1 blamed authors not CCed: a@unstable.cc; 4 maintainers not CCed: edumazet@google.com a@unstable.cc mareklindner@neomailbox.ch pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Simon Wunderlich June 7, 2023, 3:55 p.m. UTC
From: Vladislav Efanov <VEfanov@ispras.ru>

Syzkaller got a lot of crashes like:
KASAN: use-after-free Write in *_timers*

All of these crashes point to the same memory area:

The buggy address belongs to the object at ffff88801f870000
 which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 5320 bytes inside of
 8192-byte region [ffff88801f870000, ffff88801f872000)

This area belongs to :
        batadv_priv->batadv_priv_dat->delayed_work->timer_list

The reason for these issues is the lack of synchronization. Delayed
work (batadv_dat_purge) schedules new timer/work while the device
is being deleted. As the result new timer/delayed work is set after
cancel_delayed_work_sync() was called. So after the device is freed
the timer list contains pointer to already freed memory.

Found by Linux Verification Center (linuxtesting.org) with syzkaller.

Cc: stable@kernel.org
Fixes: 2f1dfbe18507 ("batman-adv: Distributed ARP Table - implement local storage")
Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
---
 net/batman-adv/distributed-arp-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

patchwork-bot+netdevbpf@kernel.org June 8, 2023, 5 a.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (main)
by Simon Wunderlich <sw@simonwunderlich.de>:

On Wed,  7 Jun 2023 17:55:15 +0200 you wrote:
> From: Vladislav Efanov <VEfanov@ispras.ru>
> 
> Syzkaller got a lot of crashes like:
> KASAN: use-after-free Write in *_timers*
> 
> All of these crashes point to the same memory area:
> 
> [...]

Here is the summary with links:
  - [1/1] batman-adv: Broken sync while rescheduling delayed work
    https://git.kernel.org/netdev/net/c/abac3ac97fe8

You are awesome, thank you!
Jakub Kicinski June 8, 2023, 5:01 a.m. UTC | #2
On Wed,  7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
> The reason for these issues is the lack of synchronization. Delayed
> work (batadv_dat_purge) schedules new timer/work while the device
> is being deleted. As the result new timer/delayed work is set after
> cancel_delayed_work_sync() was called. So after the device is freed
> the timer list contains pointer to already freed memory.

I guess this is better than status quo but is the fix really complete?
We're still not preventing the timer / work from getting scheduled
and staying alive after the netdev has been freed, right?
Jacob Keller June 8, 2023, 5:24 a.m. UTC | #3
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, June 7, 2023 10:01 PM
> To: Simon Wunderlich <sw@simonwunderlich.de>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; b.a.t.m.a.n@lists.open-
> mesh.org; Vladislav Efanov <VEfanov@ispras.ru>; stable@kernel.org; Sven
> Eckelmann <sven@narfation.org>
> Subject: Re: [PATCH 1/1] batman-adv: Broken sync while rescheduling delayed
> work
> 
> On Wed,  7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
> > The reason for these issues is the lack of synchronization. Delayed
> > work (batadv_dat_purge) schedules new timer/work while the device
> > is being deleted. As the result new timer/delayed work is set after
> > cancel_delayed_work_sync() was called. So after the device is freed
> > the timer list contains pointer to already freed memory.
> 
> I guess this is better than status quo but is the fix really complete?
> We're still not preventing the timer / work from getting scheduled
> and staying alive after the netdev has been freed, right?

Yea, I would expect some synchronization mechanism to ensure that after cancel_delayed_work_sync() you can't queue the work again.

I know for timers there is recently timer_shutdown_sync() which can be used to guarantee a timer can't re-arm at all, and its intended for some situations where there is a cyclic dependency...

Thanks,
Jake
Vladislav Efanov June 8, 2023, 9:01 a.m. UTC | #4
As far as I found the synchronization is provided by delayed work 
subsystem. It is based on the WORK_STRUCT_PENDING_BIT in work->data field.

The cancel_delayed_work_sync() atomically sets this bit and 
queue_delayed_work() checks it before scheduling new delayed work.


The problem is caused by the INIT_DELAYED_WORK() call inside 
batadv_dat_start_timer(). This call happens before the 
queue_delayed_work() call and clears this bit.


Best regards,

Vlad


On 08.06.2023 08:24, Keller, Jacob E wrote:
>
>> -----Original Message-----
>> From: Jakub Kicinski <kuba@kernel.org>
>> Sent: Wednesday, June 7, 2023 10:01 PM
>> To: Simon Wunderlich <sw@simonwunderlich.de>
>> Cc: davem@davemloft.net; netdev@vger.kernel.org; b.a.t.m.a.n@lists.open-
>> mesh.org; Vladislav Efanov <VEfanov@ispras.ru>; stable@kernel.org; Sven
>> Eckelmann <sven@narfation.org>
>> Subject: Re: [PATCH 1/1] batman-adv: Broken sync while rescheduling delayed
>> work
>>
>> On Wed,  7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
>>> The reason for these issues is the lack of synchronization. Delayed
>>> work (batadv_dat_purge) schedules new timer/work while the device
>>> is being deleted. As the result new timer/delayed work is set after
>>> cancel_delayed_work_sync() was called. So after the device is freed
>>> the timer list contains pointer to already freed memory.
>> I guess this is better than status quo but is the fix really complete?
>> We're still not preventing the timer / work from getting scheduled
>> and staying alive after the netdev has been freed, right?
> Yea, I would expect some synchronization mechanism to ensure that after cancel_delayed_work_sync() you can't queue the work again.
>
> I know for timers there is recently timer_shutdown_sync() which can be used to guarantee a timer can't re-arm at all, and its intended for some situations where there is a cyclic dependency...
>
> Thanks,
> Jake
Paolo Abeni June 8, 2023, 9:27 a.m. UTC | #5
On Wed, 2023-06-07 at 22:01 -0700, Jakub Kicinski wrote:
> On Wed,  7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
> > The reason for these issues is the lack of synchronization. Delayed
> > work (batadv_dat_purge) schedules new timer/work while the device
> > is being deleted. As the result new timer/delayed work is set after
> > cancel_delayed_work_sync() was called. So after the device is freed
> > the timer list contains pointer to already freed memory.
> 
> I guess this is better than status quo but is the fix really complete?
> We're still not preventing the timer / work from getting scheduled
> and staying alive after the netdev has been freed, right?

I *think* this specific use case does not expose such problem, as the
delayed work is (AFAICS) scheduled only at device creation time and by
the work itself, it should never be re-scheduled after
cancel_delayed_work_sync()

Cheers,

Paolo
Sven Eckelmann June 8, 2023, 4:57 p.m. UTC | #6
On Thursday, 8 June 2023 11:27:31 CEST Paolo Abeni wrote:
[...]
> > We're still not preventing the timer / work from getting scheduled
> > and staying alive after the netdev has been freed, right?
> 
> I *think* this specific use case does not expose such problem, as the
> delayed work is (AFAICS) scheduled only at device creation time and by
> the work itself, it should never be re-scheduled after
> cancel_delayed_work_sync()

Correct.

* batadv_dat_start_timer is the only thing scheduling it
* batadv_dat_start_timer is called by:

  - batadv_dat_purge (the worker rearming itself)
  - batadv_dat_init (when the interface is created)

Kind regards,
	Sven
diff mbox series

Patch

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index 6968e55eb971..28a939d56090 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -101,7 +101,6 @@  static void batadv_dat_purge(struct work_struct *work);
  */
 static void batadv_dat_start_timer(struct batadv_priv *bat_priv)
 {
-	INIT_DELAYED_WORK(&bat_priv->dat.work, batadv_dat_purge);
 	queue_delayed_work(batadv_event_workqueue, &bat_priv->dat.work,
 			   msecs_to_jiffies(10000));
 }
@@ -819,6 +818,7 @@  int batadv_dat_init(struct batadv_priv *bat_priv)
 	if (!bat_priv->dat.hash)
 		return -ENOMEM;
 
+	INIT_DELAYED_WORK(&bat_priv->dat.work, batadv_dat_purge);
 	batadv_dat_start_timer(bat_priv);
 
 	batadv_tvlv_handler_register(bat_priv, batadv_dat_tvlv_ogm_handler_v1,