Message ID | 20220425042400.66517-1-duoming@zju.edu.cn (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Series | [net] drivers: net: can: Fix deadlock in grcan_close() |

On 25.04.22 06:24, Duoming Zhou wrote:
> There are deadlocks caused by del_timer_sync(&priv->hang_timer)
> and del_timer_sync(&priv->rr_timer) in grcan_close(), one of
> the deadlocks are shown below:
>
>    (Thread 1)              |      (Thread 2)
>                            | grcan_reset_timer()
> grcan_close()              |  mod_timer()
>  spin_lock_irqsave() //(1) |  (wait a time)
>  ...                       | grcan_initiate_running_reset()
>  del_timer_sync()          |  spin_lock_irqsave() //(2)
>  (wait timer to stop)      |  ...
>
> We hold priv->lock in position (1) of thread 1 and use
> del_timer_sync() to wait timer to stop, but timer handler
> also need priv->lock in position (2) of thread 2.
> As a result, grcan_close() will block forever.
>
> This patch extracts del_timer_sync() from the protection of
> spin_lock_irqsave(), which could let timer handler to obtain
> the needed lock.
>
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  drivers/net/can/grcan.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/can/grcan.c b/drivers/net/can/grcan.c
> index d0c5a7a60da..1189057b5d6 100644
> --- a/drivers/net/can/grcan.c
> +++ b/drivers/net/can/grcan.c
> @@ -1102,8 +1102,10 @@ static int grcan_close(struct net_device *dev)
>
>  	priv->closing = true;
>  	if (priv->need_txbug_workaround) {
> +		spin_unlock_irqrestore(&priv->lock, flags);
>  		del_timer_sync(&priv->hang_timer);
>  		del_timer_sync(&priv->rr_timer);
> +		spin_lock_irqsave(&priv->lock, flags);

It looks weird to unlock and re-lock the operations like this. This
breaks the intended locking for the closing process.

Isn't there any possibility to e.g. move that entire if-section before
the lock?

>  	}
>  	netif_stop_queue(dev);
>  	grcan_stop_hardware(dev);

Regards,
Oliver

On 2022-04-26 21:12, Oliver Hartkopp wrote:
> On 25.04.22 06:24, Duoming Zhou wrote:
>> There are deadlocks caused by del_timer_sync(&priv->hang_timer)
>> and del_timer_sync(&priv->rr_timer) in grcan_close(), one of
>> the deadlocks are shown below:
>>
>>    (Thread 1)              |      (Thread 2)
>>                            | grcan_reset_timer()
>> grcan_close()              |  mod_timer()
>>  spin_lock_irqsave() //(1) |  (wait a time)
>>  ...                       | grcan_initiate_running_reset()
>>  del_timer_sync()          |  spin_lock_irqsave() //(2)
>>  (wait timer to stop)      |  ...
>>
>> We hold priv->lock in position (1) of thread 1 and use
>> del_timer_sync() to wait timer to stop, but timer handler
>> also need priv->lock in position (2) of thread 2.
>> As a result, grcan_close() will block forever.
>>
>> This patch extracts del_timer_sync() from the protection of
>> spin_lock_irqsave(), which could let timer handler to obtain
>> the needed lock.
>>
>> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
>> ---
>>  drivers/net/can/grcan.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/can/grcan.c b/drivers/net/can/grcan.c
>> index d0c5a7a60da..1189057b5d6 100644
>> --- a/drivers/net/can/grcan.c
>> +++ b/drivers/net/can/grcan.c
>> @@ -1102,8 +1102,10 @@ static int grcan_close(struct net_device *dev)
>>  	priv->closing = true;
>>  	if (priv->need_txbug_workaround) {
>> +		spin_unlock_irqrestore(&priv->lock, flags);
>>  		del_timer_sync(&priv->hang_timer);
>>  		del_timer_sync(&priv->rr_timer);
>> +		spin_lock_irqsave(&priv->lock, flags);
>
> It looks weird to unlock and re-lock the operations like this. This
> breaks the intended locking for the closing process.
>
> Isn't there any possibility to e.g. move that entire if-section before
> the lock?

All functions wishing to start the timers both check priv->closing and
then, if false, start the timer within the priv->lock spinlock. Given
that, it should be ok that del_timer_sync is not done within the
spinlock as therefore no one can restart any timers after priv->closing
has been set to true.

It looks a bit weird, but setting priv->closing to true needs to happen
within the priv->lock spinlock protection and needs to happen before
del_timer_sync to avoid a race between grcan_close and someone starting
the timer.

Reviewed-by: Andreas Larsson <andreas@gaisler.com>

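The invariant described in this reply (every path that starts a timer checks the closing flag and arms the timer inside one critical section, so nothing can re-arm once the flag is set) can be modelled in userspace. The following is only a minimal sketch under stated assumptions: it uses a pthread mutex in place of the priv->lock spinlock and an atomic flag in place of a pending kernel timer, and all `demo_*` names are hypothetical and do not appear in grcan.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the driver's private state. */
struct demo_priv {
	pthread_mutex_t lock;    /* plays the role of priv->lock */
	bool closing;            /* plays the role of priv->closing */
	atomic_bool timer_armed; /* stands in for a pending kernel timer */
};

void demo_init(struct demo_priv *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->closing = false;
	atomic_init(&p->timer_armed, false);
}

/*
 * Arm the timer only if close has not begun. The check of closing and
 * the arming happen in the same critical section, so a concurrent close
 * cannot slip in between them.
 */
bool demo_try_arm_timer(struct demo_priv *p)
{
	bool armed = false;

	pthread_mutex_lock(&p->lock);
	if (!p->closing) {
		atomic_store(&p->timer_armed, true);
		armed = true;
	}
	pthread_mutex_unlock(&p->lock);
	return armed;
}

/*
 * Close path: publish closing under the lock, then cancel the timer
 * without holding the lock. Clearing the flag here is only a stand-in
 * for del_timer_sync(); it does not wait for a running handler.
 */
void demo_close(struct demo_priv *p)
{
	pthread_mutex_lock(&p->lock);
	p->closing = true;
	pthread_mutex_unlock(&p->lock);

	atomic_store(&p->timer_armed, false);
}
```

In this model, a call to `demo_try_arm_timer()` succeeds before `demo_close()` and is refused afterwards, which is why the cancellation itself does not need to run under the lock.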
On 27.04.22 14:47, Andreas Larsson wrote:
> On 2022-04-26 21:12, Oliver Hartkopp wrote:
>> On 25.04.22 06:24, Duoming Zhou wrote:
>>> There are deadlocks caused by del_timer_sync(&priv->hang_timer)
>>> and del_timer_sync(&priv->rr_timer) in grcan_close(), one of
>>> the deadlocks are shown below:
>>>
>>>    (Thread 1)              |      (Thread 2)
>>>                            | grcan_reset_timer()
>>> grcan_close()              |  mod_timer()
>>>  spin_lock_irqsave() //(1) |  (wait a time)
>>>  ...                       | grcan_initiate_running_reset()
>>>  del_timer_sync()          |  spin_lock_irqsave() //(2)
>>>  (wait timer to stop)      |  ...
>>>
>>> We hold priv->lock in position (1) of thread 1 and use
>>> del_timer_sync() to wait timer to stop, but timer handler
>>> also need priv->lock in position (2) of thread 2.
>>> As a result, grcan_close() will block forever.
>>>
>>> This patch extracts del_timer_sync() from the protection of
>>> spin_lock_irqsave(), which could let timer handler to obtain
>>> the needed lock.
>>>
>>> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
>>> ---
>>>  drivers/net/can/grcan.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/can/grcan.c b/drivers/net/can/grcan.c
>>> index d0c5a7a60da..1189057b5d6 100644
>>> --- a/drivers/net/can/grcan.c
>>> +++ b/drivers/net/can/grcan.c
>>> @@ -1102,8 +1102,10 @@ static int grcan_close(struct net_device *dev)
>>>  	priv->closing = true;
>>>  	if (priv->need_txbug_workaround) {
>>> +		spin_unlock_irqrestore(&priv->lock, flags);
>>>  		del_timer_sync(&priv->hang_timer);
>>>  		del_timer_sync(&priv->rr_timer);
>>> +		spin_lock_irqsave(&priv->lock, flags);
>>
>> It looks weird to unlock and re-lock the operations like this. This
>> breaks the intended locking for the closing process.
>>
>> Isn't there any possibility to e.g. move that entire if-section before
>> the lock?
>
> All functions wishing to start the timers both check priv->closing and
> then, if false, start the timer within the priv->lock spinlock. Given
> that, it should be ok that del_timer_sync is not done within the
> spinlock as therefore no one can restart any timers after priv->closing
> has been set to true.
>
> It looks a bit weird, but setting priv->closing to true needs to happen
> within the priv->lock spinlock protection and needs to happen before
> del_timer_sync to avoid a race between grcan_close and someone starting
> the timer.
>
> Reviewed-by: Andreas Larsson <andreas@gaisler.com>
>

Thanks Andreas!

Best regards,
Oliver

diff --git a/drivers/net/can/grcan.c b/drivers/net/can/grcan.c
index d0c5a7a60da..1189057b5d6 100644
--- a/drivers/net/can/grcan.c
+++ b/drivers/net/can/grcan.c
@@ -1102,8 +1102,10 @@ static int grcan_close(struct net_device *dev)
 
 	priv->closing = true;
 	if (priv->need_txbug_workaround) {
+		spin_unlock_irqrestore(&priv->lock, flags);
 		del_timer_sync(&priv->hang_timer);
 		del_timer_sync(&priv->rr_timer);
+		spin_lock_irqsave(&priv->lock, flags);
 	}
 	netif_stop_queue(dev);
 	grcan_stop_hardware(dev);

There are deadlocks caused by del_timer_sync(&priv->hang_timer)
and del_timer_sync(&priv->rr_timer) in grcan_close(), one of
which is shown below:

   (Thread 1)              |      (Thread 2)
                           | grcan_reset_timer()
grcan_close()              |  mod_timer()
 spin_lock_irqsave() //(1) |  (wait a time)
 ...                       | grcan_initiate_running_reset()
 del_timer_sync()          |  spin_lock_irqsave() //(2)
 (wait timer to stop)      |  ...

We hold priv->lock at position (1) in thread 1 and use
del_timer_sync() to wait for the timer to stop, but the timer
handler also needs priv->lock at position (2) in thread 2.
As a result, grcan_close() will block forever.

This patch moves del_timer_sync() out of the region protected by
spin_lock_irqsave(), which lets the timer handler obtain the
needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/net/can/grcan.c | 2 ++
 1 file changed, 2 insertions(+)
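
The lock ordering that the patch establishes can be illustrated with a userspace analogue. This is a hedged sketch, not driver code: a pthread mutex stands in for priv->lock, a worker thread stands in for the timer handler, and `pthread_join()` stands in for the waiting part of del_timer_sync() (unlike del_timer_sync(), it cannot cancel a handler that has not yet run). All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state touched by both threads. */
struct demo_dev {
	pthread_mutex_t lock;   /* plays the role of priv->lock */
	bool closing;           /* plays the role of priv->closing */
	int handler_runs;       /* counts completed handler invocations */
	pthread_t timer_thread; /* stands in for the pending timer */
};

/* Timer handler analogue: it needs the lock, like position (2). */
static void *demo_timer_handler(void *arg)
{
	struct demo_dev *d = arg;

	pthread_mutex_lock(&d->lock);
	d->handler_runs++;
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/*
 * Close path with the patched ordering. Joining the handler while still
 * holding the lock would hang whenever the handler had not yet acquired
 * it, which is the deadlock described in the commit message.
 */
int demo_close_fixed(struct demo_dev *d)
{
	int runs;

	pthread_mutex_lock(&d->lock);        /* position (1) */
	d->closing = true;
	pthread_mutex_unlock(&d->lock);      /* drop the lock before waiting */

	pthread_join(d->timer_thread, NULL); /* wait for the handler to finish */

	pthread_mutex_lock(&d->lock);        /* re-take for remaining teardown */
	runs = d->handler_runs;
	pthread_mutex_unlock(&d->lock);
	return runs;
}

/* Start one handler, close the device, report how often the handler ran. */
int demo_run(void)
{
	struct demo_dev d = { .closing = false, .handler_runs = 0 };
	int runs;

	pthread_mutex_init(&d.lock, NULL);
	if (pthread_create(&d.timer_thread, NULL, demo_timer_handler, &d) != 0)
		return -1;
	runs = demo_close_fixed(&d);
	pthread_mutex_destroy(&d.lock);
	return runs;
}
```

Whichever thread takes the lock first, `demo_close_fixed()` returns, because the wait happens only while the lock is released. On toolchains where pthreads is a separate library, the sketch needs to be built with `-pthread`.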