Message ID | 20230607155515.548120-2-sw@simonwunderlich.de (mailing list archive) |
---|---|
State | Accepted |
Commit | abac3ac97fe8734b620e7322a116450d7f90aa43 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [1/1] batman-adv: Broken sync while rescheduling delayed work | expand |
Hello: This patch was applied to netdev/net.git (main) by Simon Wunderlich <sw@simonwunderlich.de>: On Wed, 7 Jun 2023 17:55:15 +0200 you wrote: > From: Vladislav Efanov <VEfanov@ispras.ru> > > Syzkaller got a lot of crashes like: > KASAN: use-after-free Write in *_timers* > > All of these crashes point to the same memory area: > > [...] Here is the summary with links: - [1/1] batman-adv: Broken sync while rescheduling delayed work https://git.kernel.org/netdev/net/c/abac3ac97fe8 You are awesome, thank you!
On Wed, 7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote: > The reason for these issues is the lack of synchronization. Delayed > work (batadv_dat_purge) schedules new timer/work while the device > is being deleted. As the result new timer/delayed work is set after > cancel_delayed_work_sync() was called. So after the device is freed > the timer list contains pointer to already freed memory. I guess this is better than status quo but is the fix really complete? We're still not preventing the timer / work from getting scheduled and staying alive after the netdev has been freed, right?
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Wednesday, June 7, 2023 10:01 PM > To: Simon Wunderlich <sw@simonwunderlich.de> > Cc: davem@davemloft.net; netdev@vger.kernel.org; b.a.t.m.a.n@lists.open- > mesh.org; Vladislav Efanov <VEfanov@ispras.ru>; stable@kernel.org; Sven > Eckelmann <sven@narfation.org> > Subject: Re: [PATCH 1/1] batman-adv: Broken sync while rescheduling delayed > work > > On Wed, 7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote: > > The reason for these issues is the lack of synchronization. Delayed > > work (batadv_dat_purge) schedules new timer/work while the device > > is being deleted. As the result new timer/delayed work is set after > > cancel_delayed_work_sync() was called. So after the device is freed > > the timer list contains pointer to already freed memory. > > I guess this is better than status quo but is the fix really complete? > We're still not preventing the timer / work from getting scheduled > and staying alive after the netdev has been freed, right? Yea, I would expect some synchronization mechanism to ensure that after cancel_delayed_work_sync() you can't queue the work again. I know for timers there is recently timer_shutdown_sync() which can be used to guarantee a timer can't re-arm at all, and its intended for some situations where there is a cyclic dependency... Thanks, Jake
As far as I found the synchronization is provided by delayed work subsystem. It is based on the WORK_STRUCT_PENDING_BIT in work->data field. The cancel_delayed_work_sync() atomically sets this bit and queue_delayed_work() checks it before scheduling new delayed work. The problem is caused by the INIT_DELAYED_WORK() call inside batadv_dat_start_timer(). This call happens before the queue_delayed_work() call and clears this bit. Best regards, Vlad On 08.06.2023 08:24, Keller, Jacob E wrote: > >> -----Original Message----- >> From: Jakub Kicinski <kuba@kernel.org> >> Sent: Wednesday, June 7, 2023 10:01 PM >> To: Simon Wunderlich <sw@simonwunderlich.de> >> Cc: davem@davemloft.net; netdev@vger.kernel.org; b.a.t.m.a.n@lists.open- >> mesh.org; Vladislav Efanov <VEfanov@ispras.ru>; stable@kernel.org; Sven >> Eckelmann <sven@narfation.org> >> Subject: Re: [PATCH 1/1] batman-adv: Broken sync while rescheduling delayed >> work >> >> On Wed, 7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote: >>> The reason for these issues is the lack of synchronization. Delayed >>> work (batadv_dat_purge) schedules new timer/work while the device >>> is being deleted. As the result new timer/delayed work is set after >>> cancel_delayed_work_sync() was called. So after the device is freed >>> the timer list contains pointer to already freed memory. >> I guess this is better than status quo but is the fix really complete? >> We're still not preventing the timer / work from getting scheduled >> and staying alive after the netdev has been freed, right? > Yea, I would expect some synchronization mechanism to ensure that after cancel_delayed_work_sync() you can't queue the work again. > > I know for timers there is recently timer_shutdown_sync() which can be used to guarantee a timer can't re-arm at all, and its intended for some situations where there is a cyclic dependency... > > Thanks, > Jake
On Wed, 2023-06-07 at 22:01 -0700, Jakub Kicinski wrote: > On Wed, 7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote: > > The reason for these issues is the lack of synchronization. Delayed > > work (batadv_dat_purge) schedules new timer/work while the device > > is being deleted. As the result new timer/delayed work is set after > > cancel_delayed_work_sync() was called. So after the device is freed > > the timer list contains pointer to already freed memory. > > I guess this is better than status quo but is the fix really complete? > We're still not preventing the timer / work from getting scheduled > and staying alive after the netdev has been freed, right? I *think* this specific use case does not expose such problem, as the delayed work is (AFAICS) scheduled only at device creation time and by the work itself, it should never be re-scheduled after cancel_delayed_work_sync() Cheers, Paolo
On Thursday, 8 June 2023 11:27:31 CEST Paolo Abeni wrote: [...] > > We're still not preventing the timer / work from getting scheduled > > and staying alive after the netdev has been freed, right? > > I *think* this specific use case does not expose such problem, as the > delayed work is (AFAICS) scheduled only at device creation time and by > the work itself, it should never be re-scheduled after > cancel_delayed_work_sync() Correct. * batadv_dat_start_timer is the only thing scheduling it * batadv_dat_start_timer is called by: - batadv_dat_purge (the worker rearming itself) - batadv_dat_init (when the interface is created) Kind regards, Sven
diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c index 6968e55eb971..28a939d56090 100644 --- a/net/batman-adv/distributed-arp-table.c +++ b/net/batman-adv/distributed-arp-table.c @@ -101,7 +101,6 @@ static void batadv_dat_purge(struct work_struct *work); */ static void batadv_dat_start_timer(struct batadv_priv *bat_priv) { - INIT_DELAYED_WORK(&bat_priv->dat.work, batadv_dat_purge); queue_delayed_work(batadv_event_workqueue, &bat_priv->dat.work, msecs_to_jiffies(10000)); } @@ -819,6 +818,7 @@ int batadv_dat_init(struct batadv_priv *bat_priv) if (!bat_priv->dat.hash) return -ENOMEM; + INIT_DELAYED_WORK(&bat_priv->dat.work, batadv_dat_purge); batadv_dat_start_timer(bat_priv); batadv_tvlv_handler_register(bat_priv, batadv_dat_tvlv_ogm_handler_v1,