mbox series

[0/2] serial: sh-sci: Fix .flush_buffer() issues

Message ID 20190624123540.20629-1-geert+renesas@glider.be (mailing list archive)
Headers show
Series serial: sh-sci: Fix .flush_buffer() issues | expand

Message

Geert Uytterhoeven June 24, 2019, 12:35 p.m. UTC
Hi Greg, Jiri,

This patch series attempts to fix the issues Eugeniu Rosca reported
seeing, where .flush_buffer() interfered with transmit DMA operation[*].

There's a third patch "dmaengine: rcar-dmac: Reject zero-length slave
DMA requests", which is related to the issue, but further independent,
hence submitted separately.

Eugeniu: does this fix the issues you were seeing?

Thanks for your comments!

[*] '[PATCH 2/6] Revert "arm64: dts: renesas: r8a7796: Enable DMA for SCIF2"'
(https://lore.kernel.org/lkml/20190504004258.23574-3-erosca@de.adit-jv.com/).

Geert Uytterhoeven (2):
  serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
  serial: sh-sci: Terminate TX DMA during buffer flushing

 drivers/tty/serial/sh-sci.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

Comments

Eugeniu Rosca June 26, 2019, 5:34 p.m. UTC | #1
Hi Geert, CC: George

On Mon, Jun 24, 2019 at 02:35:38PM +0200, Geert Uytterhoeven wrote:
> Hi Greg, Jiri,
> 
> This patch series attempts to fix the issues Eugeniu Rosca reported
> seeing, where .flush_buffer() interfered with transmit DMA operation[*].
> 
> There's a third patch "dmaengine: rcar-dmac: Reject zero-length slave
> DMA requests", which is related to the issue, but further independent,
> hence submitted separately.
> 
> Eugeniu: does this fix the issues you were seeing?

Many thanks for both sh-sci and the rcar-dmac patches.
The fixes are very much appreciated.

> Geert Uytterhoeven (2):
>   serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
>   serial: sh-sci: Terminate TX DMA during buffer flushing
> 
>  drivers/tty/serial/sh-sci.c | 33 ++++++++++++++++++++++++---------
>  1 file changed, 24 insertions(+), 9 deletions(-)

I reserved some time to get a feeling about how the patches behave on
a real system (H3-ES2.0-ULCB-KF-M06), so here come my observations.

First of all, the issue I have originally reported in [0] is only
reproducible in absence of [4]. So, one of my questions would be how
do you yourself see the relationship between [1-3] and [4]?

That said, all my testing assumes:
 - Vanilla tip v5.2-rc6-15-g249155c20f9b with [4] reverted.
 - DEBUG is undefined in {sh-sci.c,rcar-dmac.c}, since I've noticed
   new issues arising in the debug build, which are unrelated to [0].

Below is the summary of my findings:

 Version         IS [0]       Is console       Error message when
(vanilla+X)    reproduced?  usable after [0]   [0] is reproduced
                             is reproduced?
 ------------------------------------------------------------
 -[4]             Yes           No                [5]
 -[4]+[1]         Yes           No                -
 -[4]+[2]         Yes           Yes               [5]
 -[4]+[3]         Yes           Yes               [6]
 -[4]+[1]+[2]     No            -                 -
 -[4]+[1]+[2]+[3] No            -                 -
 pure vanilla     No            -                 -

This looks a little too verbose, but I thought it might be interesting.

The story which I see is that [1] does not fix [0] alone, but it seems
to depend on [2]. Furthermore, if cherry picked alone, [1] makes the
matters somewhat worse in the sense that it hides the error [5].

My only question is whether [1-3] are supposed to replace [4] or they
are supposed to happily coexist. Since I don't see [0] being reproduced
with [1-3], I personally prefer to re-enable DMA on SCIF (when the
latter is used as console) so that more features and code paths are
exercised to increase test coverage.

[0] https://lore.kernel.org/lkml/20190504004258.23574-3-erosca@de.adit-jv.com/
[1] https://patchwork.kernel.org/patch/11012983/
    ("serial: sh-sci: Fix TX DMA buffer flushing and workqueue races")
[2] https://patchwork.kernel.org/patch/11012987/
    ("serial: sh-sci: Terminate TX DMA during buffer flushing")
[3] https://patchwork.kernel.org/patch/11012991/
    ("dmaengine: rcar-dmac: Reject zero-length slave DMA requests")
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=099506cbbc79c0
    ("serial: sh-sci: disable DMA for uart_console")

[5] rcar-dmac e7300000.dma-controller: Channel Address Error
[6] rcar-dmac e7300000.dma-controller: rcar_dmac_prep_slave_sg: bad parameter: len=1, id=19
    sh-sci e6e88000.serial: Failed preparing Tx DMA descriptor

Thanks!
Geert Uytterhoeven June 28, 2019, 11:51 a.m. UTC | #2
Hi Eugeniu,

On Wed, Jun 26, 2019 at 7:34 PM Eugeniu Rosca <roscaeugeniu@gmail.com> wrote:
> On Mon, Jun 24, 2019 at 02:35:38PM +0200, Geert Uytterhoeven wrote:
> > This patch series attempts to fix the issues Eugeniu Rosca reported
> > seeing, where .flush_buffer() interfered with transmit DMA operation[*].
> >
> > There's a third patch "dmaengine: rcar-dmac: Reject zero-length slave
> > DMA requests", which is related to the issue, but further independent,
> > hence submitted separately.
> >
> > Eugeniu: does this fix the issues you were seeing?
>
> Many thanks for both sh-sci and the rcar-dmac patches.
> The fixes are very much appreciated.
>
> > Geert Uytterhoeven (2):
> >   serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
> >   serial: sh-sci: Terminate TX DMA during buffer flushing
> >
> >  drivers/tty/serial/sh-sci.c | 33 ++++++++++++++++++++++++---------
> >  1 file changed, 24 insertions(+), 9 deletions(-)
>
> I reserved some time to get a feeling about how the patches behave on
> a real system (H3-ES2.0-ULCB-KF-M06), so here come my observations.

Thanks for your extensive testing!

> First of all, the issue I have originally reported in [0] is only
> reproducible in absence of [4]. So, one of my questions would be how
> do you yourself see the relationship between [1-3] and [4]?

I consider them independent.
Just applying [4] would fix the issue for the console only, while the
race condition can still be triggered on other serial ports.

> That said, all my testing assumes:
>  - Vanilla tip v5.2-rc6-15-g249155c20f9b with [4] reverted.
>  - DEBUG is undefined in {sh-sci.c,rcar-dmac.c}, since I've noticed
>    new issues arising in the debug build, which are unrelated to [0].
>
> Below is the summary of my findings:
>
>  Version         IS [0]       Is console       Error message when
> (vanilla+X)    reproduced?  usable after [0]   [0] is reproduced
>                              is reproduced?
>  ------------------------------------------------------------
>  -[4]             Yes           No                [5]
>  -[4]+[1]         Yes           No                -
>  -[4]+[2]         Yes           Yes               [5]
>  -[4]+[3]         Yes           Yes               [6]
>  -[4]+[1]+[2]     No            -                 -
>  -[4]+[1]+[2]+[3] No            -                 -
>  pure vanilla     No            -                 -
>
> This looks a little too verbose, but I thought it might be interesting.

Thanks, it's very helpful to provide these results.

> The story which I see is that [1] does not fix [0] alone, but it seems
> to depend on [2]. Furthermore, if cherry picked alone, [1] makes the
> matters somewhat worse in the sense that it hides the error [5].

OK.

> My only question is whether [1-3] are supposed to replace [4] or they
> are supposed to happily coexist. Since I don't see [0] being reproduced

They are meant to coexist.

> with [1-3], I personally prefer to re-enable DMA on SCIF (when the
> latter is used as console) so that more features and code paths are
> exercised to increase test coverage.

If a serial port is used as a console, the port is used for both DMA
(normal use) and PIO (serial console output).  The latter can have a
negative impact on the former, aggravating existing bugs, or triggering
more races, even in the hardware.  So I think it's better to be more
cautious and keep DMA disabled for the console.

> [0] https://lore.kernel.org/lkml/20190504004258.23574-3-erosca@de.adit-jv.com/
> [1] https://patchwork.kernel.org/patch/11012983/
>     ("serial: sh-sci: Fix TX DMA buffer flushing and workqueue races")
> [2] https://patchwork.kernel.org/patch/11012987/
>     ("serial: sh-sci: Terminate TX DMA during buffer flushing")
> [3] https://patchwork.kernel.org/patch/11012991/
>     ("dmaengine: rcar-dmac: Reject zero-length slave DMA requests")
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=099506cbbc79c0
>     ("serial: sh-sci: disable DMA for uart_console")
>
> [5] rcar-dmac e7300000.dma-controller: Channel Address Error
> [6] rcar-dmac e7300000.dma-controller: rcar_dmac_prep_slave_sg: bad parameter: len=1, id=19
>     sh-sci e6e88000.serial: Failed preparing Tx DMA descriptor

Gr{oetje,eeting}s,

                        Geert
Eugeniu Rosca June 28, 2019, 12:39 p.m. UTC | #3
Hi Geert,

On Fri, Jun 28, 2019 at 01:51:25PM +0200, Geert Uytterhoeven wrote:

[..]

> If a serial port is used as a console, the port is used for both DMA
> (normal use) and PIO (serial console output).  The latter can have a
> negative impact on the former, aggravating existing bugs, or triggering
> more races, even in the hardware.  So I think it's better to be more
> cautious and keep DMA disabled for the console.

Thanks for the extensive and comprehensible replies.
No more questions from my end.
Looking forward to picking the patches from vanilla/stable trees.
Wolfram Sang June 28, 2019, 12:55 p.m. UTC | #4
> > If a serial port is used as a console, the port is used for both DMA
> > (normal use) and PIO (serial console output).  The latter can have a
> > negative impact on the former, aggravating existing bugs, or triggering
> > more races, even in the hardware.  So I think it's better to be more
> > cautious and keep DMA disabled for the console.
> 
> Thanks for the extensive and comprehensible replies.
> No more questions from my end.
> Looking forward to picking the patches from vanilla/stable trees.

If you could formally add such a tag:

Tested-by: <your email>

(maybe also Acked-by: or Reviewed-by:, dunno if you think it is
apropriate)

to the patches, this would be much appreciated and will usually speed up
the patches getting applied.

Thanks for your help!
Eugeniu Rosca June 28, 2019, 1:02 p.m. UTC | #5
Hi Wolfram,

On Fri, Jun 28, 2019 at 02:55:34PM +0200, Wolfram Sang wrote:
[..]
> If you could formally add such a tag:
> 
> Tested-by: <your email>
> 
> (maybe also Acked-by: or Reviewed-by:, dunno if you think it is
> apropriate)
> 
> to the patches, this would be much appreciated and will usually speed up
> the patches getting applied.
> 
> Thanks for your help!

I am doing this per-patch to allow patchwork to reflect the status of
each patch on the linux-renesas-soc front-page. AFAIK patchwork ignores
series-wide '*-by: Name <email>' signatures/tags.
Wolfram Sang June 28, 2019, 1:14 p.m. UTC | #6
> I am doing this per-patch to allow patchwork to reflect the status of
> each patch on the linux-renesas-soc front-page. AFAIK patchwork ignores
> series-wide '*-by: Name <email>' signatures/tags.

AFAIK, yes.

Thanks!
George G. Davis June 28, 2019, 4:29 p.m. UTC | #7
Hello All,

On Fri, Jun 28, 2019 at 01:51:25PM +0200, Geert Uytterhoeven wrote:
> Hi Eugeniu,
> 
> On Wed, Jun 26, 2019 at 7:34 PM Eugeniu Rosca <roscaeugeniu@gmail.com> wrote:
> > On Mon, Jun 24, 2019 at 02:35:38PM +0200, Geert Uytterhoeven wrote:
> > > This patch series attempts to fix the issues Eugeniu Rosca reported
> > > seeing, where .flush_buffer() interfered with transmit DMA operation[*].
> > >
> > > There's a third patch "dmaengine: rcar-dmac: Reject zero-length slave
> > > DMA requests", which is related to the issue, but further independent,
> > > hence submitted separately.
> > >
> > > Eugeniu: does this fix the issues you were seeing?
> >
> > Many thanks for both sh-sci and the rcar-dmac patches.
> > The fixes are very much appreciated.
> >
> > > Geert Uytterhoeven (2):
> > >   serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
> > >   serial: sh-sci: Terminate TX DMA during buffer flushing
> > >
> > >  drivers/tty/serial/sh-sci.c | 33 ++++++++++++++++++++++++---------
> > >  1 file changed, 24 insertions(+), 9 deletions(-)
> >
> > I reserved some time to get a feeling about how the patches behave on
> > a real system (H3-ES2.0-ULCB-KF-M06), so here come my observations.
> 
> Thanks for your extensive testing!
> 
> > First of all, the issue I have originally reported in [0] is only
> > reproducible in absence of [4]. So, one of my questions would be how
> > do you yourself see the relationship between [1-3] and [4]?
> 
> I consider them independent.
> Just applying [4] would fix the issue for the console only, while the
> race condition can still be triggered on other serial ports.
> 
> > That said, all my testing assumes:
> >  - Vanilla tip v5.2-rc6-15-g249155c20f9b with [4] reverted.
> >  - DEBUG is undefined in {sh-sci.c,rcar-dmac.c}, since I've noticed
> >    new issues arising in the debug build, which are unrelated to [0].
> >
> > Below is the summary of my findings:
> >
> >  Version         IS [0]       Is console       Error message when
> > (vanilla+X)    reproduced?  usable after [0]   [0] is reproduced
> >                              is reproduced?
> >  ------------------------------------------------------------
> >  -[4]             Yes           No                [5]
> >  -[4]+[1]         Yes           No                -
> >  -[4]+[2]         Yes           Yes               [5]
> >  -[4]+[3]         Yes           Yes               [6]
> >  -[4]+[1]+[2]     No            -                 -
> >  -[4]+[1]+[2]+[3] No            -                 -
> >  pure vanilla     No            -                 -
> >
> > This looks a little too verbose, but I thought it might be interesting.
> 
> Thanks, it's very helpful to provide these results.
> 
> > The story which I see is that [1] does not fix [0] alone, but it seems
> > to depend on [2]. Furthermore, if cherry picked alone, [1] makes the
> > matters somewhat worse in the sense that it hides the error [5].
> 
> OK.
> 
> > My only question is whether [1-3] are supposed to replace [4] or they
> > are supposed to happily coexist. Since I don't see [0] being reproduced
> 
> They are meant to coexist.
> 
> > with [1-3], I personally prefer to re-enable DMA on SCIF (when the
> > latter is used as console) so that more features and code paths are
> > exercised to increase test coverage.
> 
> If a serial port is used as a console, the port is used for both DMA
> (normal use) and PIO (serial console output).  The latter can have a
> negative impact on the former, aggravating existing bugs, or triggering
> more races, even in the hardware.  So I think it's better to be more
> cautious and keep DMA disabled for the console.

Agreed.

Just a note for the record that [4] was the easiest way to resolve the
reported problem [0] but an alternative solution would be to implement DMA
support for ttySC console ports which will be non-trivial to implement and test
due to the potential for deadlocks in console write critical paths where
various locks are held with interrupts disabled. I see only one tty serial
driver which implements console DMA support, drivers/tty/serial/mpsc.c,
and perhaps there is a good reason why there are no other examples?

> > [0] https://lore.kernel.org/lkml/20190504004258.23574-3-erosca@de.adit-jv.com/
> > [1] https://patchwork.kernel.org/patch/11012983/
> >     ("serial: sh-sci: Fix TX DMA buffer flushing and workqueue races")
> > [2] https://patchwork.kernel.org/patch/11012987/
> >     ("serial: sh-sci: Terminate TX DMA during buffer flushing")
> > [3] https://patchwork.kernel.org/patch/11012991/
> >     ("dmaengine: rcar-dmac: Reject zero-length slave DMA requests")
> > [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=099506cbbc79c0
> >     ("serial: sh-sci: disable DMA for uart_console")
> >
> > [5] rcar-dmac e7300000.dma-controller: Channel Address Error
> > [6] rcar-dmac e7300000.dma-controller: rcar_dmac_prep_slave_sg: bad parameter: len=1, id=19
> >     sh-sci e6e88000.serial: Failed preparing Tx DMA descriptor
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
Greg Kroah-Hartman July 3, 2019, 5:30 p.m. UTC | #8
On Fri, Jun 28, 2019 at 03:02:00PM +0200, Eugeniu Rosca wrote:
> Hi Wolfram,
> 
> On Fri, Jun 28, 2019 at 02:55:34PM +0200, Wolfram Sang wrote:
> [..]
> > If you could formally add such a tag:
> > 
> > Tested-by: <your email>
> > 
> > (maybe also Acked-by: or Reviewed-by:, dunno if you think it is
> > apropriate)
> > 
> > to the patches, this would be much appreciated and will usually speed up
> > the patches getting applied.
> > 
> > Thanks for your help!
> 
> I am doing this per-patch to allow patchwork to reflect the status of
> each patch on the linux-renesas-soc front-page. AFAIK patchwork ignores
> series-wide '*-by: Name <email>' signatures/tags.

I don't use patchwork :)
Eugeniu Rosca July 3, 2019, 6:15 p.m. UTC | #9
Hi Greg,

On Wed, Jul 03, 2019 at 07:30:50PM +0200, Greg Kroah-Hartman wrote:
> On Fri, Jun 28, 2019 at 03:02:00PM +0200, Eugeniu Rosca wrote:
[..]
> > I am doing this per-patch to allow patchwork to reflect the status of
> > each patch on the linux-renesas-soc front-page. AFAIK patchwork ignores
> > series-wide '*-by: Name <email>' signatures/tags.
> 
> I don't use patchwork :)

How do you then collect all the "{Reviewed,Tested,etc}-by:" signatures
(each of which means sometimes hours of effort) in the hairy ML threads?
Patchwork makes it a matter of one click.
Greg Kroah-Hartman July 3, 2019, 6:44 p.m. UTC | #10
On Wed, Jul 03, 2019 at 08:15:19PM +0200, Eugeniu Rosca wrote:
> Hi Greg,
> 
> On Wed, Jul 03, 2019 at 07:30:50PM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Jun 28, 2019 at 03:02:00PM +0200, Eugeniu Rosca wrote:
> [..]
> > > I am doing this per-patch to allow patchwork to reflect the status of
> > > each patch on the linux-renesas-soc front-page. AFAIK patchwork ignores
> > > series-wide '*-by: Name <email>' signatures/tags.
> > 
> > I don't use patchwork :)
> 
> How do you then collect all the "{Reviewed,Tested,etc}-by:" signatures
> (each of which means sometimes hours of effort) in the hairy ML threads?
> Patchwork makes it a matter of one click.

I've been doing this for a very long time now, before patchwork was even
around.  It's pretty trivial to collect them on my own.

thanks,

greg k-h
Wolfram Sang July 3, 2019, 9:08 p.m. UTC | #11
> I've been doing this for a very long time now, before patchwork was even
> around.  It's pretty trivial to collect them on my own.

It's a workflow thing. Mileages vary. I ask people to tag patches
individually for reasonable small series.