[0/3] serial: 8250_dw: Fix clk-notifier/port suspend deadlock

Message ID 20200923161950.6237-1-Sergey.Semin@baikalelectronics.ru (mailing list archive)

Message

Serge Semin Sept. 23, 2020, 4:19 p.m. UTC
Hans has discovered that there is a potential deadlock between the ref
clock change notifier and the port suspension procedures {see the link at
the bottom of the letter}. Indeed the deadlock is possible if the port
suspension is initiated during the ref clock rate change:

    CPU0 (suspend CPU/UART)   CPU1 (update clock)
             ----                    ----
    lock(&port->mutex);
                              lock((work_completion)(&data->clk_work));
                              lock(&port->mutex);
    lock((work_completion)(&data->clk_work));

    *** DEADLOCK ***

So the CPU performing the UART port shutdown procedure will wait until the
ref clock change notifier is finished (worker is flushed), while the latter
will wait for the port mutex to be released.

A possible solution to bypass the deadlock is to move the worker flush out
of the critical section protected by the TTY port mutex. For instance we
can register and de-register the clock change notifier in the port probe
and remove methods instead of having them called from the port
startup/shutdown callbacks. But in order to do that we need to make sure
that the serial8250_update_uartclk() method is safe to be used while the
port is shut down. Alas, the current implementation doesn't provide that
safety. The solution described above is introduced in the framework of
this patchset. See individual patches for details.

Link: https://lore.kernel.org/linux-serial/f1cd5c75-9cda-6896-a4e2-42c5bfc3f5c3@redhat.com

Hans, could you test the patchset out on your Cherry Trail (x86)-based
devices? After that we can merge it into kernels 5.8 and 5.9 if
there are no objections to the fix.

Note that for the fix to work on the older kernel, all of the patches
need to be backported.

Fixes: cc816969d7b5 ("serial: 8250_dw: Fix common clocks usage race condition")
Fixes: 868f3ee6e452 ("serial: 8250: Add 8250 port clock update method")
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Serge Semin (3):
  serial: 8250: Discard RTS/DTS setting from clock update method
  serial: 8250: Skip uninitialized TTY port baud rate update
  serial: 8250_dw: Fix clk-notifier/port suspend deadlock

 drivers/tty/serial/8250/8250_dw.c   | 54 ++++++++++-------------------
 drivers/tty/serial/8250/8250_port.c |  5 ++-
 2 files changed, 23 insertions(+), 36 deletions(-)

Comments

Hans de Goede Sept. 27, 2020, 3:01 p.m. UTC | #1
Hi,

On 9/23/20 6:19 PM, Serge Semin wrote:
> Hans has discovered that there is a potential deadlock between the ref
> clock change notifier and the port suspension procedures {see the link at
> the bottom of the letter}. Indeed the deadlock is possible if the port
> suspension is initiated during the ref clock rate change:
> 
>      CPU0 (suspend CPU/UART)   CPU1 (update clock)
>               ----                    ----
>      lock(&port->mutex);
>                                lock((work_completion)(&data->clk_work));
>                                lock(&port->mutex);
>      lock((work_completion)(&data->clk_work));
> 
>      *** DEADLOCK ***
> 
> So the CPU performing the UART port shutdown procedure will wait until the
> ref clock change notifier is finished (worker is flushed), while the latter
> will wait for the port mutex to be released.
> 
> A possible solution to bypass the deadlock is to move the worker flush out
> of the critical section protected by the TTY port mutex. For instance we
> can register and de-register the clock change notifier in the port probe
> and remove methods instead of having them called from the port
> startup/shutdown callbacks. But in order to do that we need to make sure
> that the serial8250_update_uartclk() method is safe to be used while the
> port is shut down. Alas, the current implementation doesn't provide that
> safety. The solution described above is introduced in the framework of
> this patchset. See individual patches for details.
> 
> Link: https://lore.kernel.org/linux-serial/f1cd5c75-9cda-6896-a4e2-42c5bfc3f5c3@redhat.com
> 
> Hans, could you test the patchset out on your Cherry Trail (x86)-based
> devices? After that we can merge it into kernels 5.8 and 5.9 if
> there are no objections to the fix.

Done, I can confirm that this fixes the lockdep issue for me, so you
can add my:

Tested-by: Hans de Goede <hdegoede@redhat.com>

To the entire series.

Regards,

Hans

Serge Semin Sept. 29, 2020, 8:51 p.m. UTC | #2
Hello,

On Sun, Sep 27, 2020 at 05:01:52PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 9/23/20 6:19 PM, Serge Semin wrote:
> > Hans has discovered that there is a potential deadlock between the ref
> > clock change notifier and the port suspension procedures {see the link at
> > the bottom of the letter}. Indeed the deadlock is possible if the port
> > suspension is initiated during the ref clock rate change:
> > 
> >      CPU0 (suspend CPU/UART)   CPU1 (update clock)
> >               ----                    ----
> >      lock(&port->mutex);
> >                                lock((work_completion)(&data->clk_work));
> >                                lock(&port->mutex);
> >      lock((work_completion)(&data->clk_work));
> > 
> >      *** DEADLOCK ***
> > 
> > So the CPU performing the UART port shutdown procedure will wait until the
> > ref clock change notifier is finished (worker is flushed), while the latter
> > will wait for the port mutex to be released.
> > 
> > A possible solution to bypass the deadlock is to move the worker flush out
> > of the critical section protected by the TTY port mutex. For instance we
> > can register and de-register the clock change notifier in the port probe
> > and remove methods instead of having them called from the port
> > startup/shutdown callbacks. But in order to do that we need to make sure
> > that the serial8250_update_uartclk() method is safe to be used while the
> > port is shut down. Alas, the current implementation doesn't provide that
> > safety. The solution described above is introduced in the framework of
> > this patchset. See individual patches for details.
> > 
> > Link: https://lore.kernel.org/linux-serial/f1cd5c75-9cda-6896-a4e2-42c5bfc3f5c3@redhat.com
> > 
> > Hans, could you test the patchset out on your Cherry Trail (x86)-based
> > devices? After that we can merge it into kernels 5.8 and 5.9 if
> > there are no objections to the fix.
> 
> Done, I can confirm that this fixes the lockdep issue for me, so you
> can add my:
> 
> Tested-by: Hans de Goede <hdegoede@redhat.com>

Great! Thank you very much.

Greg, could you merge the series if you have no objections to the
solution design? Since the bug was introduced together with the original
series, which was integrated in kernel 5.9, the fix provided by this
patchset is only needed in 5.9.

-Sergey

> 
> To the entire series.
> 
> Regards,
> 
> Hans
>