Message ID | 20190504004258.23574-3-erosca@de.adit-jv.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Simon Horman |
Series | Zap SCIF2 DMA configuration in R-Car Gen3 DTS |
Hi Eugeniu,

Thanks for your report!

On Sat, May 4, 2019 at 2:45 AM Eugeniu Rosca <roscaeugeniu@gmail.com> wrote:
> This reverts commit 97f26702bc95b5c3a72671d5c6675e4d6ee0a2f4.
>
> Here is the story behind this revert.
>
> Mainline commit [0] landed in the stable tree as commit [1], from where
> it reached us in the form of regular stable update. After that, Michael
> started to report occasional (30-50%) freezes of serial console on
> booting M3-ES1.1-Salvator-XS. Same happened on M3-ES1.1-Salvator-X.
>
> Every time the issue occurs, the serial console outputs below [2]
> before becoming totally unresponsive and printing nothing else:
> rcar-dmac e7300000.dma-controller: Channel Address Error
>
> Git bisecting shows that the problem is contributed by commits [0-1].
>
> While we can't be 100% certain (since we don't have the SCIF design docs
> revealing its internal implementation detail) we think there is plenty
> of evidence to assume that DMA is not supported on SCIF2, hence should
> stay disabled on this specific channel:
>
> - Excerpt from Chapter 17. Direct Memory Access Controller for System
>   (SYS-DMAC) of R19UH0105EJ0150 Rev.1.50:
> ---------8<---------
> [H3, H3-N, M3-W, V3M, V3H, D3, M3-N, E3]
> The following modules can issue on-chip peripheral module requests.
> [..] HSCIF0/1/2/3/4, [..] SCIF0/1/3/4/5,
> ---------8<---------
>
> - Excerpt from RENESAS_RCH3M3M3NE3_SCIF_UME_v2.00.pdf (Yocto v3.15.0):
> ---------8<---------
> DMA Transfer:
> - Support: SCIF0, SCIF1, SCIF3, SCIF4, SCIF5
> - Not support: SCIF2
> ---------8<---------
> - Disabled SCIF2 DMA in official Renesas v4.9/v4.14 kernels, e.g. see:
>   https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/commit/?id=e79c418fda8c

Table 17.5 ("Selecting On-Chip Peripheral Module Request Modes") of
"R-Car Series, 3rd Generation User’s Manual: Hardware" gained entries
for SCIF2 in Revision 1.50 of the document, but it seems 17.1.1
("Features") and Table 17.6 ("Data Length of DMA Transfer for Each of
the On-Chip Peripheral Modules") were forgotten to be updated.
The addition of the entry for SCIF2 is also mentioned in "Renesas
Technical Update TN-RCT-S019A/E / R-Car M3-W Additional Explanation for
Direct Memory Access Controller for System (SYS-DMAC)".
Unfortunately both documents report wrong MID/RID values, due to a
hexadecimal vs. decimal mistake, which were corrected in the Feb 12
errata for Rev. 1.50.

So in my understanding, and according to my testing, DMA has always
worked for SCIF2 on (at least) R-Car H3 ES1.0/2.0, M3-W, and M3-N.

However, early firmware versions (before IPL and Secure Monitor
Rev1.0.6, released on Feb 25, 2016) prohibited the use of SYS-DMAC2,
cfr. commit eb21089c32054ecd ("arm64: dts: renesas: r8a7795: Add missing
SYS-DMAC2 dmas").

Perhaps some firmware versions may impose additional restrictions?

> Based on the issues generated by [0-1] (reproduced on H3, M3 and M3N)
> and the doc statements presented above, we think it makes sense to
> disable DMA on SCIF2 for most/all R-Car3 SoCs.
>
> [0] v5.0-rc6 commit 97f26702bc95b5 ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> [1] v4.14.106 commit 703db5d1b1759f ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> [2] scif (DEBUG) and rcar-dmac logs:
>     https://gist.github.com/erosca/132cce76a619724a9e4fa61d1db88c66

I have checked my kernel logs, and found a few instances of "Channel
Address Error". In all cases, I had enabled/added extra debug prints in
the sh-sci driver, which may have had impact.
Last occurrence was in a kernel based on v4.18-rc2, which predates
several recent fixes for the sh-sci and rcar-dmac drivers.
Can the issue be reproduced on current mainline?

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Hi Geert,

On Mon, May 06, 2019 at 12:02:41PM +0200, Geert Uytterhoeven wrote:
> Hi Eugeniu,
>
> Thanks for your report!

Thanks for your feedback.

>
> On Sat, May 4, 2019 at 2:45 AM Eugeniu Rosca <roscaeugeniu@gmail.com> wrote:
> > This reverts commit 97f26702bc95b5c3a72671d5c6675e4d6ee0a2f4.
> >
> > Here is the story behind this revert.
> >
> > Mainline commit [0] landed in the stable tree as commit [1], from where
> > it reached us in the form of regular stable update. After that, Michael
> > started to report occasional (30-50%) freezes of serial console on
> > booting M3-ES1.1-Salvator-XS. Same happened on M3-ES1.1-Salvator-X.
> >
> > Every time the issue occurs, the serial console outputs below [2]
> > before becoming totally unresponsive and printing nothing else:
> > rcar-dmac e7300000.dma-controller: Channel Address Error
> >
> > Git bisecting shows that the problem is contributed by commits [0-1].
> >
> > While we can't be 100% certain (since we don't have the SCIF design docs
> > revealing its internal implementation detail) we think there is plenty
> > of evidence to assume that DMA is not supported on SCIF2, hence should
> > stay disabled on this specific channel:
> >
> > - Excerpt from Chapter 17. Direct Memory Access Controller for System
> >   (SYS-DMAC) of R19UH0105EJ0150 Rev.1.50:
> > ---------8<---------
> > [H3, H3-N, M3-W, V3M, V3H, D3, M3-N, E3]
> > The following modules can issue on-chip peripheral module requests.
> > [..] HSCIF0/1/2/3/4, [..] SCIF0/1/3/4/5,
> > ---------8<---------
> >
> > - Excerpt from RENESAS_RCH3M3M3NE3_SCIF_UME_v2.00.pdf (Yocto v3.15.0):
> > ---------8<---------
> > DMA Transfer:
> > - Support: SCIF0, SCIF1, SCIF3, SCIF4, SCIF5
> > - Not support: SCIF2
> > ---------8<---------
> > - Disabled SCIF2 DMA in official Renesas v4.9/v4.14 kernels, e.g. see:
> >   https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/commit/?id=e79c418fda8c
>
> Table 17.5 ("Selecting On-Chip Peripheral Module Request Modes") of
> "R-Car Series, 3rd Generation User’s Manual: Hardware" gained entries
> for SCIF2 in Revision 1.50 of the document, but it seems 17.1.1
> ("Features") and Table 17.6 ("Data Length of DMA Transfer for Each of
> the On-Chip Peripheral Modules") were forgotten to be updated.
> The addition of the entry for SCIF2 is also mentioned in "Renesas
> Technical Update TN-RCT-S019A/E / R-Car M3-W Additional Explanation for
> Direct Memory Access Controller for System (SYS-DMAC)".
> Unfortunately both documents report wrong MID/RID values, due to a
> hexadecimal vs. decimal mistake, which were corrected in the Feb 12
> errata for Rev. 1.50.

I do observe now that the most recent Rev. 1.50 of "R-Car Series, 3rd
Generation User’s Manual: Hardware" does update _some_ of its internal
chapters/tables to reflect the support of DMA on SCIF2. These SCIF2
changes look to be also tracked in the "Revision History" companion doc:

| Rev | Date | Page | Summary |
|---|---|---|---|
| 1.50 | Nov 30, 2018 | 17-86-87 | Table 17.5 Selecting On-Chip Peripheral Module Request Modes: DMA Transfer Request Source, changed. SCIF2 reception and SCIF2 transmission, added |
| | | 17-91 | Table 17.6 Data Length of DMA Transfer for Each of the On-Chip Peripheral Modules: SCIF2, added |

As you have already stated, it looks like certain chapters like "17.1.1
Features" didn't receive a proper update, generating confusion. I will
report this in parallel to Renesas Duesseldorf.

>
> So in my understanding, and according to my testing, DMA has always
> worked for SCIF2 on (at least) R-Car H3 ES1.0/2.0, M3-W, and M3-N.

Well, my testing shows different results. Using M3-W-ES1.1-Salvator-XS,
I can reproduce the issue since v4.17 (also reproduced on v4.18, v4.19
and v5.1 with cherry-picking 97f26702bc95b5 ("arm64: dts: renesas:
r8a7796: Enable DMA for SCIF2") where appropriate).

> However, early firmware versions (before IPL and Secure Monitor
> Rev1.0.6, released on Feb 25, 2016) prohibited the use of SYS-DMAC2,
> cfr. commit eb21089c32054ecd ("arm64: dts: renesas: r8a7795: Add missing
> SYS-DMAC2 dmas").

I use a very recent Rev2.0.2 of
https://github.com/renesas-rcar/arm-trusted-firmware .

>
> Perhaps some firmware versions may impose additional restrictions?

I would have some suspicions about ATF if the issue was consistent.
Since it is not, I believe there is a race going on in the kernel.

>
> > Based on the issues generated by [0-1] (reproduced on H3, M3 and M3N)
> > and the doc statements presented above, we think it makes sense to
> > disable DMA on SCIF2 for most/all R-Car3 SoCs.
> >
> > [0] v5.0-rc6 commit 97f26702bc95b5 ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > [1] v4.14.106 commit 703db5d1b1759f ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > [2] scif (DEBUG) and rcar-dmac logs:
> >     https://gist.github.com/erosca/132cce76a619724a9e4fa61d1db88c66
>
> I have checked my kernel logs, and found a few instances of "Channel
> Address Error". In all cases, I had enabled/added extra debug prints in
> the sh-sci driver, which may have had impact.
> Last occurrence was in a kernel based on v4.18-rc2, which predates
> several recent fixes for the sh-sci and rcar-dmac drivers.
> Can the issue be reproduced on current mainline?

With pure vanilla sources, arm64 defconfig and DTS (+97f26702bc95b5
where appropriate), the issue is seen on M3-W-ES1.1-Salvator-XS since
v4.17. Can you please confirm you are seeing it too?

Enabling DEBUG in drivers/dma/sh/rcar-dmac.c, I can notice that one of
the symptoms is a NULL dst_addr revealed by:

rcar-dmac e7300000.dma-controller: chan0: queue chunk (____ptrval____): 0@0xffff800639eb8090 -> 0x0000000000000000

In working scenarios, dst_addr is never zero. Does it give any hints?

>
> Thanks!

Likewise!
Hi George,

I am able to reproduce the SCIF2 console freeze described in the
referenced patchwork link using M3-ES1.1-Salvator-XS and recent
v5.1-9573-gb970afcfcabd kernel.

I confirm the behavior is healed with this patch. Thanks!
Hope to see it accepted soon, since it fixes a super annoying
console breakage every fourth boot or so on lots of R-Car3 targets.

Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com>

On Thu, May 09, 2019 at 10:43:30AM -0400, George G. Davis wrote:
> As noted in commit 84b40e3b57ee ("serial: 8250: omap: Disable DMA for
> console UART"), UART console lines use low-level PIO only access functions
> which will conflict with use of the line when DMA is enabled, e.g. when
> the console line is also used for systemd messages. So disable DMA
> support for UART console lines.
>
> Fixes: https://patchwork.kernel.org/patch/10929511/
> Reported-by: Michael Rodin <mrodin@de.adit-jv.com>
> Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
> Signed-off-by: George G. Davis <george_davis@mentor.com>
> ---
>  drivers/tty/serial/sh-sci.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
> index 3cd139752d3f..885b56b1d4e4 100644
> --- a/drivers/tty/serial/sh-sci.c
> +++ b/drivers/tty/serial/sh-sci.c
> @@ -1557,6 +1557,9 @@ static void sci_request_dma(struct uart_port *port)
>
>  	dev_dbg(port->dev, "%s: port %d\n", __func__, port->line);
>
> +	if (uart_console(port))
> +		return; /* Cannot use DMA on console */
> +
>  	if (!port->dev->of_node)
>  		return;
>
> --
> 2.7.4
>
Hello Eugeniu,

On Fri, May 10, 2019 at 07:10:21PM +0200, Eugeniu Rosca wrote:
> Hi George,
>
> I am able to reproduce the SCIF2 console freeze described in the
> referenced patchwork link using M3-ES1.1-Salvator-XS and recent
> v5.1-9573-gb970afcfcabd kernel.
>
> I confirm the behavior is healed with this patch. Thanks!
> Hope to see it accepted soon, since it fixes a super annoying
> console breakage every fourth boot or so on lots of R-Car3 targets.
>
> Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com>

Thanks for testing.

Also note, for the record, that the problem is not limited to SCIF2, e.g. try
setting console=ttySC<n> where <n> is not SCIF2 on any other board which
includes support for other serial ports, e.g. r8a7795-salvator-x, and you will
observe the same problem on other SCIF ports too. It's just a coincidence that
most boards use SCIF2 as the default serial console, where the console hangs
(resolved by this patch) have been observed on multiple boards.

>
> On Thu, May 09, 2019 at 10:43:30AM -0400, George G. Davis wrote:
> > As noted in commit 84b40e3b57ee ("serial: 8250: omap: Disable DMA for
> > console UART"), UART console lines use low-level PIO only access functions
> > which will conflict with use of the line when DMA is enabled, e.g. when
> > the console line is also used for systemd messages. So disable DMA
> > support for UART console lines.
> >
> > Fixes: https://patchwork.kernel.org/patch/10929511/
> > Reported-by: Michael Rodin <mrodin@de.adit-jv.com>
> > Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
> > Signed-off-by: George G. Davis <george_davis@mentor.com>
> > ---
> >  drivers/tty/serial/sh-sci.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
> > index 3cd139752d3f..885b56b1d4e4 100644
> > --- a/drivers/tty/serial/sh-sci.c
> > +++ b/drivers/tty/serial/sh-sci.c
> > @@ -1557,6 +1557,9 @@ static void sci_request_dma(struct uart_port *port)
> >
> >  	dev_dbg(port->dev, "%s: port %d\n", __func__, port->line);
> >
> > +	if (uart_console(port))
> > +		return; /* Cannot use DMA on console */
> > +
> >  	if (!port->dev->of_node)
> >  		return;
> >
> > --
> > 2.7.4
> >
>
> --
> Best Regards,
> Eugeniu.
Hi George,

On Fri, May 10, 2019 at 02:38:47PM -0400, George G. Davis wrote:
> Hello Eugeniu,
>
> On Fri, May 10, 2019 at 07:10:21PM +0200, Eugeniu Rosca wrote:
> > Hi George,
> >
> > I am able to reproduce the SCIF2 console freeze described in the
> > referenced patchwork link using M3-ES1.1-Salvator-XS and recent
> > v5.1-9573-gb970afcfcabd kernel.
> >
> > I confirm the behavior is healed with this patch. Thanks!
> > Hope to see it accepted soon, since it fixes a super annoying
> > console breakage every fourth boot or so on lots of R-Car3 targets.
> >
> > Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com>
>
> Thanks for testing.
>
> Also note, for the record, that the problem is not limited to SCIF2, e.g. try
> setting console=ttySC<n> where <n> is not SCIF2 on any other board which
> includes support for other serial ports, e.g. r8a7795-salvator-x, and you will
> observe the same problem on other SCIF ports too. It's just a coincidence that
> most boards use SCIF2 as the default serial console, where the console hangs
> (resolved by this patch) have been observed on multiple boards.

Thanks for the additional level of detail.

FTR, trying to track the origin of the problem, it looks to me that the
issue was _unmasked_ by v4.16-rc1 commit be7e251d20e6c8 ("tty: serial:
sh-sci: Hide DMA config question") which turned on DMA on SCIF by
default.

I wonder if it'd be helpful to resend the patch w/o using --in-reply-to,
so that it appears as a standalone entry in linux-renesas-soc patchwork.
Currently, assuming that the R-Car maintainers filter out any "Rejected"
patches (which is the default patchwork behavior), your patch would be
hidden from their eye.
Hi George,

On Thu, May 9, 2019 at 4:44 PM George G. Davis <ggdavisiv@gmail.com> wrote:
> As noted in commit 84b40e3b57ee ("serial: 8250: omap: Disable DMA for
> console UART"), UART console lines use low-level PIO only access functions
> which will conflict with use of the line when DMA is enabled, e.g. when
> the console line is also used for systemd messages. So disable DMA
> support for UART console lines.
>
> Fixes: https://patchwork.kernel.org/patch/10929511/
> Reported-by: Michael Rodin <mrodin@de.adit-jv.com>
> Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
> Signed-off-by: George G. Davis <george_davis@mentor.com>

I think this makes sense. In addition to OMAP 8250, the same approach
is used in the Mediatek 8250 and iMX serial drivers.

Regardless, this is definitely better than removing the "dmas" properties
from DT, as DT describes hardware, not usage policies.

Anyone else with a comment?

Gr{oetje,eeting}s,

Geert
On Mon, May 13, 2019 at 01:13:16PM +0200, Geert Uytterhoeven wrote:
> Hi George,
>
> On Thu, May 9, 2019 at 4:44 PM George G. Davis <ggdavisiv@gmail.com> wrote:
> > As noted in commit 84b40e3b57ee ("serial: 8250: omap: Disable DMA for
> > console UART"), UART console lines use low-level PIO only access functions
> > which will conflict with use of the line when DMA is enabled, e.g. when
> > the console line is also used for systemd messages. So disable DMA
> > support for UART console lines.
> >
> > Fixes: https://patchwork.kernel.org/patch/10929511/
> > Reported-by: Michael Rodin <mrodin@de.adit-jv.com>
> > Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
> > Signed-off-by: George G. Davis <george_davis@mentor.com>
>
> I think this makes sense. In addition to OMAP 8250, the same approach
> is used in the Mediatek 8250 and iMX serial drivers.
>
> Regardless, this is definitely better than removing the "dmas" properties
> from DT, as DT describes hardware, not usage policies.

+1

> Anyone else with a comment?

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
On Thu, May 09, 2019 at 10:43:30AM -0400, George G. Davis wrote:
> As noted in commit 84b40e3b57ee ("serial: 8250: omap: Disable DMA for
> console UART"), UART console lines use low-level PIO only access functions
> which will conflict with use of the line when DMA is enabled, e.g. when
> the console line is also used for systemd messages. So disable DMA
> support for UART console lines.
>
> Fixes: https://patchwork.kernel.org/patch/10929511/
> Reported-by: Michael Rodin <mrodin@de.adit-jv.com>
> Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
> Signed-off-by: George G. Davis <george_davis@mentor.com>
> ---
>  drivers/tty/serial/sh-sci.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
> index 3cd139752d3f..885b56b1d4e4 100644
> --- a/drivers/tty/serial/sh-sci.c
> +++ b/drivers/tty/serial/sh-sci.c
> @@ -1557,6 +1557,9 @@ static void sci_request_dma(struct uart_port *port)
>
>  	dev_dbg(port->dev, "%s: port %d\n", __func__, port->line);
>
> +	if (uart_console(port))
> +		return; /* Cannot use DMA on console */

Minor nit: maybe the comment can be made more specific?

	/*
	 * DMA on console may interfere with Kernel log messages which use
	 * plain putchar(). So, simply don't use it with a console.
	 */

Other than that:

Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Much better than dropping the properties, as Geert noted.
Hello Wolfram,

On Mon, May 13, 2019 at 03:51:14PM +0200, Wolfram Sang wrote:
> On Thu, May 09, 2019 at 10:43:30AM -0400, George G. Davis wrote:
> > As noted in commit 84b40e3b57ee ("serial: 8250: omap: Disable DMA for
> > console UART"), UART console lines use low-level PIO only access functions
> > which will conflict with use of the line when DMA is enabled, e.g. when
> > the console line is also used for systemd messages. So disable DMA
> > support for UART console lines.
> >
> > Fixes: https://patchwork.kernel.org/patch/10929511/
> > Reported-by: Michael Rodin <mrodin@de.adit-jv.com>
> > Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
> > Signed-off-by: George G. Davis <george_davis@mentor.com>
> > ---
> >  drivers/tty/serial/sh-sci.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
> > index 3cd139752d3f..885b56b1d4e4 100644
> > --- a/drivers/tty/serial/sh-sci.c
> > +++ b/drivers/tty/serial/sh-sci.c
> > @@ -1557,6 +1557,9 @@ static void sci_request_dma(struct uart_port *port)
> >
> >  	dev_dbg(port->dev, "%s: port %d\n", __func__, port->line);
> >
> > +	if (uart_console(port))
> > +		return; /* Cannot use DMA on console */
>
> Minor nit: maybe the comment can be made more specific?
>
> 	/*
> 	 * DMA on console may interfere with Kernel log messages which use
> 	 * plain putchar(). So, simply don't use it with a console.
> 	 */

I'll submit v2 with the above recommended change. Thanks!

> Other than that:
>
> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>
> Much better than dropping the properties, as Geert noted.
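To make the effect of the console check concrete, here is a small user-space model of the early exit in sci_request_dma(), carrying the comment Wolfram suggested. It is a sketch, not driver code: the cut-down structs, model_sci_request_dma() and the dma_requested flag are hypothetical, and the uart_console() macro is modeled on the one in include/linux/serial_core.h of that era as we understand it. Our assumption is that all ports of a driver share the same cons pointer, so the index comparison is what singles out the console line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down, hypothetical stand-ins for the kernel structures. */
struct console {
	int index;		/* line selected by console=ttySC<n> */
};

struct uart_port {
	struct console *cons;	/* the driver's console, if any */
	unsigned int line;	/* this port's line number */
	bool dma_requested;	/* model only: set when DMA would be used */
};

/* True when this port is the line the console is bound to. */
#define uart_console(port) \
	((port)->cons && (port)->cons->index == (int)(port)->line)

/* Model of sci_request_dma() with the more specific comment. */
static void model_sci_request_dma(struct uart_port *port)
{
	/*
	 * DMA on console may interfere with Kernel log messages which use
	 * plain putchar(). So, simply don't use it with a console.
	 */
	if (uart_console(port))
		return;

	/* The real driver would look up the "dmas" channels from DT here. */
	port->dma_requested = true;
}
```

Only the port whose line matches the console index falls back to PIO; every other SCIF port keeps DMA, which is why this approach leaves the "dmas" properties in DT untouched.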
Hi Eugeniu-san, Geert-san,

> From: Eugeniu Rosca, Sent: Tuesday, May 7, 2019 4:43 AM
<snip>
> > > [0] v5.0-rc6 commit 97f26702bc95b5 ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > > [1] v4.14.106 commit 703db5d1b1759f ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > > [2] scif (DEBUG) and rcar-dmac logs:
> > >     https://gist.github.com/erosca/132cce76a619724a9e4fa61d1db88c66
<snip>
> Enabling DEBUG in drivers/dma/sh/rcar-dmac.c, I can notice that one of
> the symptoms is a NULL dst_addr revealed by:
>
> rcar-dmac e7300000.dma-controller: chan0: queue chunk (____ptrval____): 0@0xffff800639eb8090 -> 0x0000000000000000
>
> In working scenarios, dst_addr is never zero. Does it give any hints?

Thank you for the report! It's very helpful to me.
I think we should fix the sh-sci driver at least.

According to the [2] log above,

[ 4.379716] sh-sci e6e88000.serial: sci_dma_tx_work_fn: ffff800639b55000: 0...0, cookie 126

This "0...0" means the s->tx_dma_len on the sci_dma_tx_work_fn will be zero. And,

> rcar-dmac e7300000.dma-controller: chan0: queue chunk (____ptrval____): 0@0xffff800639eb8090 -> 0x0000000000000000

This means the chunk->dst_addr is not set to the "dst_addr" for SCIF because the len on rcar_dmac_chan_prep_sg is zero.
So, I'm thinking:
 - we have to fix the sh_sci driver to avoid "tx_dma_len = 0" transferring.
and
 - also we have to fix the rcar-dmac driver to avoid this issue because the DMA Engine API
   guide doesn't prevent the len = 0.

Eugeniu-san, Geert-san, what do you think?

Best regards,
Yoshihiro Shimoda

>
> > Thanks!
>
> Likewise!
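The reading of the "0...0" debug line can be checked with plain circular-buffer arithmetic. This is a sketch under assumptions: we read the two numbers as tail...head, take a 4096-byte xmit buffer (UART_XMIT_SIZE is PAGE_SIZE, 4 KiB with 4K pages), and assume the TX DMA length is the number of queued bytes that fit before the buffer wraps, i.e. the CIRC_CNT_TO_END() arithmetic from <linux/circ_buf.h>. The function name tx_chunk_len() is ours.

```c
/*
 * Model of the TX DMA length computation. XMIT_SIZE must be a power
 * of two; 4096 is an assumption matching a 4 KiB page.
 */
#define XMIT_SIZE 4096u

/*
 * Number of queued bytes that can be sent as one contiguous chunk:
 * from tail up to head, or up to the end of the buffer, whichever
 * comes first. Same arithmetic as CIRC_CNT_TO_END().
 */
static unsigned int tx_chunk_len(unsigned int head, unsigned int tail)
{
	unsigned int end = XMIT_SIZE - tail;		/* bytes before wrap */
	unsigned int n = (head + end) & (XMIT_SIZE - 1);	/* bytes queued */

	return n < end ? n : end;
}
```

Applied to the three sci_dma_tx_work_fn lines of the log, this gives 43 and 48 bytes for "9...52" and "52...100", but 0 for "0...0": once head and tail have both been reset to 0, the computed length is exactly the zero-sized transfer that then reaches the DMA engine.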
Hi Shimoda-san,

Thanks for your analysis!

On Mon, May 20, 2019 at 4:18 AM Yoshihiro Shimoda
<yoshihiro.shimoda.uh@renesas.com> wrote:
> > From: Eugeniu Rosca, Sent: Tuesday, May 7, 2019 4:43 AM
> <snip>
> > > > [0] v5.0-rc6 commit 97f26702bc95b5 ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > > > [1] v4.14.106 commit 703db5d1b1759f ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > > > [2] scif (DEBUG) and rcar-dmac logs:
> > > >     https://gist.github.com/erosca/132cce76a619724a9e4fa61d1db88c66
> <snip>
> > Enabling DEBUG in drivers/dma/sh/rcar-dmac.c, I can notice that one of
> > the symptoms is a NULL dst_addr revealed by:
> >
> > rcar-dmac e7300000.dma-controller: chan0: queue chunk (____ptrval____): 0@0xffff800639eb8090 -> 0x0000000000000000
> >
> > In working scenarios, dst_addr is never zero. Does it give any hints?
>
> Thank you for the report! It's very helpful to me.
> I think we should fix the sh-sci driver at least.
>
> According to the [2] log above,
>
> [ 4.379716] sh-sci e6e88000.serial: sci_dma_tx_work_fn: ffff800639b55000: 0...0, cookie 126
>
> This "0...0" means the s->tx_dma_len on the sci_dma_tx_work_fn will be zero. And,

How can this happen? schedule_work(&s->work_tx) is called only if
!uart_circ_empty(), and while holding the port lock? So the circular
buffer must be made empty in between the call to schedule_work() and the
work function sci_dma_tx_work_fn() being called.

I think this can happen if uart_flush_buffer() is called at the right
moment?

> > rcar-dmac e7300000.dma-controller: chan0: queue chunk (____ptrval____): 0@0xffff800639eb8090 -> 0x0000000000000000
>
> This means the chunk->dst_addr is not set to the "dst_addr" for SCIF because the len on rcar_dmac_chan_prep_sg is zero.
> So, I'm thinking:
> - we have to fix the sh_sci driver to avoid "tx_dma_len = 0" transferring.

That sounds like just a simple check for !s->tx_dma_len in
sci_dma_tx_work_fn(), to return early, _and_ reset s->cookie_tx to
-EINVAL.

However, uart_flush_buffer() may still be called in between the check
and the calls to dmaengine_prep_slave_single() /
dma_sync_single_for_device(), clearing s->tx_dma_len again.
Unless something has changed recently, these two calls cannot be moved
inside the spinlock-protected section?

Using a cached value of s->tx_dma_len for the dmaengine calls might
work, though.

> and
>
> - also we have to fix the rcar-dmac driver to avoid this issue because the DMA Engine API
>   guide doesn't prevent the len = 0.

I guess returning an error makes most sense?
Else we have to fix it deeper into the driver, where handling becomes
more complex.

Gr{oetje,eeting}s,

Geert
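The interleaving Geert describes, and the two mitigations he mentions (an early return on a zero length that resets the cookie, and feeding the DMA calls a cached copy of the length), can be modeled in a few lines. This is a single-threaded sketch, not the sh-sci code: struct sci_model, the no-op lock helpers, model_flush(), prep_and_submit() and the race_hook used to inject a flush at the critical moment are all hypothetical. Only the names tx_dma_len and cookie_tx and the -EINVAL reset come from the thread, and the length calculation is simplified to head - tail.

```c
#include <errno.h>

/* Hypothetical model of the sh-sci TX DMA state; not the driver's struct. */
struct sci_model {
	unsigned int head, tail;	/* xmit circular buffer indices */
	unsigned int tx_dma_len;	/* length of the current DMA chunk */
	int cookie_tx;			/* DMA cookie, -EINVAL when idle */
	int next_cookie;		/* model only: cookie generator */
	unsigned int submitted_len;	/* model only: last length submitted */
	/* model only: runs right after the lock is dropped, to inject a race */
	void (*race_hook)(struct sci_model *s);
};

/* Placeholders for spin_lock_irq()/spin_unlock_irq() on the port lock. */
static void port_lock(struct sci_model *s) { (void)s; }
static void port_unlock(struct sci_model *s) { (void)s; }

/* What a buffer flush does here: empty the buffer, forget the length. */
static void model_flush(struct sci_model *s)
{
	port_lock(s);
	s->head = s->tail = 0;
	s->tx_dma_len = 0;
	port_unlock(s);
}

/* Stand-in for dmaengine_prep_slave_single() + dmaengine_submit(). */
static int prep_and_submit(struct sci_model *s, unsigned int len)
{
	s->submitted_len = len;
	return ++s->next_cookie;
}

/* Work function with the early return and the cached length. */
static void model_tx_work_fn(struct sci_model *s)
{
	unsigned int len;

	port_lock(s);
	s->tx_dma_len = s->head - s->tail;	/* simplified: assumes head >= tail */
	len = s->tx_dma_len;			/* cached copy for the DMA calls */
	if (!len) {
		/* Flushed after schedule_work(): nothing left to send. */
		s->cookie_tx = -EINVAL;
		port_unlock(s);
		return;
	}
	port_unlock(s);

	if (s->race_hook)
		s->race_hook(s);	/* e.g. a flush sneaking in here */

	/*
	 * Use the cached len: a concurrent reset of tx_dma_len cannot
	 * turn this into a zero-length transfer.
	 */
	s->cookie_tx = prep_and_submit(s, len);
}
```

The second scenario in the test shows the limit of the cached value: the transfer stays non-empty, but it still sends bytes the flush meant to discard. That is one reason to also consider holding the port lock across the DMA calls, as proposed in the reply below.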
Hi Geert-san,

Thank you for your reply!

> From: Geert Uytterhoeven, Sent: Monday, May 20, 2019 4:38 PM
>
> Hi Shimoda-san,
>
> Thanks for your analysis!
>
> On Mon, May 20, 2019 at 4:18 AM Yoshihiro Shimoda
> <yoshihiro.shimoda.uh@renesas.com> wrote:
> > > From: Eugeniu Rosca, Sent: Tuesday, May 7, 2019 4:43 AM
> > <snip>
> > > > > [0] v5.0-rc6 commit 97f26702bc95b5 ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > > > > [1] v4.14.106 commit 703db5d1b1759f ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
> > > > > [2] scif (DEBUG) and rcar-dmac logs:
> > > > >     https://gist.github.com/erosca/132cce76a619724a9e4fa61d1db88c66
> > <snip>
> > > Enabling DEBUG in drivers/dma/sh/rcar-dmac.c, I can notice that one of
> > > the symptoms is a NULL dst_addr revealed by:
> > >
> > > rcar-dmac e7300000.dma-controller: chan0: queue chunk (____ptrval____): 0@0xffff800639eb8090 -> 0x0000000000000000
> > >
> > > In working scenarios, dst_addr is never zero. Does it give any hints?
> >
> > Thank you for the report! It's very helpful to me.
> > I think we should fix the sh-sci driver at least.
> >
> > According to the [2] log above,
> >
> > [ 4.379716] sh-sci e6e88000.serial: sci_dma_tx_work_fn: ffff800639b55000: 0...0, cookie 126
> >
> > This "0...0" means the s->tx_dma_len on the sci_dma_tx_work_fn will be zero. And,
>
> How can this happen? schedule_work(&s->work_tx) is called only if
> !uart_circ_empty(), and while holding the port lock? So the circular
> buffer must be made empty in between the call to schedule_work() and the
> work function sci_dma_tx_work_fn() being called.
>
> I think this can happen if uart_flush_buffer() is called at the right
> moment?

I think so. According to the log [2], the xmit->head and tail are set to zero.

278 [ 4.331234] sh-sci e6e88000.serial: sci_dma_tx_work_fn: ffff800639b55000: 9...52, cookie 124
279 [ 4.334885] sh-sci e6e88000.serial: sci_dma_tx_complete(0)
280 [ 4.339992] sh-sci e6e88000.serial: sci_dma_tx_work_fn: ffff800639b55000: 52...100, cookie 125
281 [ 4.343340] sh-sci e6e88000.serial: sci_dma_tx_complete(0)
282 [ 4.379716] sh-sci e6e88000.serial: sci_dma_tx_work_fn: ffff800639b55000: 0...0, cookie 126

> > > rcar-dmac e7300000.dma-controller: chan0: queue chunk (____ptrval____): 0@0xffff800639eb8090 -> 0x0000000000000000
> >
> > This means the chunk->dst_addr is not set to the "dst_addr" for SCIF because the len on rcar_dmac_chan_prep_sg is zero.
> > So, I'm thinking:
> > - we have to fix the sh_sci driver to avoid "tx_dma_len = 0" transferring.
>
> That sounds like just a simple check for !s->tx_dma_len in
> sci_dma_tx_work_fn(), to return early, _and_ reset s->cookie_tx to
> -EINVAL.
>
> However, uart_flush_buffer() may still be called in between the check
> and the calls to dmaengine_prep_slave_single() /
> dma_sync_single_for_device(), clearing s->tx_dma_len again.
> Unless something has changed recently, these two calls cannot be moved
> inside the spinlock-protected section?

I also think these two calls (and dmaengine_submit() and
dma_async_issue_pending()) should be moved inside the spinlock-protected
section like sci_dma_rx_complete(). Also, sci_flush_buffer() should have
the spinlock-protected section and check the xmit and dma state somehow.

> Using a cached value of s->tx_dma_len for the dmaengine calls might
> work, though.
>
> > and
> >
> > - also we have to fix the rcar-dmac driver to avoid this issue because the DMA Engine API
> >   guide doesn't prevent the len = 0.
>
> I guess returning an error makes most sense?
> Else we have to fix it deeper into the driver, where handling becomes
> more complex.

I see. I think so. (We should avoid more complexity.)

Best regards,
Yoshihiro Shimoda
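On the DMA engine side, the option both agree on is to fail the prep call for a zero-length request instead of building a descriptor whose addresses are never filled in. The sketch below models only that decision: struct model_desc, the fixed descriptor pool and model_prep_mem_to_dev() are hypothetical, the addresses in the test are made up, and the real rcar-dmac prep path, which splits transfers into hardware chunks, is not modeled. Returning NULL follows the general dmaengine convention that a prep callback signals failure with a NULL descriptor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; not rcar-dmac's own descriptor type. */
struct model_desc {
	uint64_t src_addr;
	uint64_t dst_addr;
	size_t size;
};

static struct model_desc desc_pool[4];
static size_t desc_used;

/*
 * Model of a memory-to-device prep callback. Like dmaengine prep
 * callbacks, it reports failure by returning NULL.
 */
static struct model_desc *model_prep_mem_to_dev(uint64_t mem_addr,
						 uint64_t dev_addr,
						 size_t len)
{
	struct model_desc *d;

	/* Refuse empty transfers before any descriptor is set up. */
	if (len == 0)
		return NULL;

	/* Out of descriptors. */
	if (desc_used == sizeof(desc_pool) / sizeof(desc_pool[0]))
		return NULL;

	d = &desc_pool[desc_used++];
	d->src_addr = mem_addr;
	d->dst_addr = dev_addr;	/* always filled in for an accepted request */
	d->size = len;
	return d;
}
```

A client such as the sh-sci TX work function would then have to handle the NULL return, for instance by submitting nothing; how sh-sci reacts to a failed prep is outside the scope of this sketch.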
diff --git a/arch/arm64/boot/dts/renesas/r8a7796.dtsi b/arch/arm64/boot/dts/renesas/r8a7796.dtsi
index cdf784899cf8..23de63f3d6c3 100644
--- a/arch/arm64/boot/dts/renesas/r8a7796.dtsi
+++ b/arch/arm64/boot/dts/renesas/r8a7796.dtsi
@@ -1262,9 +1262,6 @@
 				 <&cpg CPG_CORE R8A7796_CLK_S3D1>,
 				 <&scif_clk>;
 			clock-names = "fck", "brg_int", "scif_clk";
-			dmas = <&dmac1 0x13>, <&dmac1 0x12>,
-			       <&dmac2 0x13>, <&dmac2 0x12>;
-			dma-names = "tx", "rx", "tx", "rx";
 			power-domains = <&sysc R8A7796_PD_ALWAYS_ON>;
 			resets = <&cpg 310>;
 			status = "disabled";
This reverts commit 97f26702bc95b5c3a72671d5c6675e4d6ee0a2f4.

Here is the story behind this revert.

Mainline commit [0] landed in the stable tree as commit [1], from where
it reached us in the form of regular stable update. After that, Michael
started to report occasional (30-50%) freezes of serial console on
booting M3-ES1.1-Salvator-XS. Same happened on M3-ES1.1-Salvator-X.

Every time the issue occurs, the serial console outputs below [2]
before becoming totally unresponsive and printing nothing else:
rcar-dmac e7300000.dma-controller: Channel Address Error

Git bisecting shows that the problem is contributed by commits [0-1].

While we can't be 100% certain (since we don't have the SCIF design docs
revealing its internal implementation detail) we think there is plenty
of evidence to assume that DMA is not supported on SCIF2, hence should
stay disabled on this specific channel:

- Excerpt from Chapter 17. Direct Memory Access Controller for System
  (SYS-DMAC) of R19UH0105EJ0150 Rev.1.50:
---------8<---------
[H3, H3-N, M3-W, V3M, V3H, D3, M3-N, E3]
The following modules can issue on-chip peripheral module requests.
[..] HSCIF0/1/2/3/4, [..] SCIF0/1/3/4/5,
---------8<---------

- Excerpt from RENESAS_RCH3M3M3NE3_SCIF_UME_v2.00.pdf (Yocto v3.15.0):
---------8<---------
DMA Transfer:
- Support: SCIF0, SCIF1, SCIF3, SCIF4, SCIF5
- Not support: SCIF2
---------8<---------

- Disabled SCIF2 DMA in official Renesas v4.9/v4.14 kernels, e.g. see:
  https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/commit/?id=e79c418fda8c

Based on the issues generated by [0-1] (reproduced on H3, M3 and M3N)
and the doc statements presented above, we think it makes sense to
disable DMA on SCIF2 for most/all R-Car3 SoCs.

[0] v5.0-rc6 commit 97f26702bc95b5 ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
[1] v4.14.106 commit 703db5d1b1759f ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
[2] scif (DEBUG) and rcar-dmac logs:
    https://gist.github.com/erosca/132cce76a619724a9e4fa61d1db88c66

Fixes: 97f26702bc95b5 ("arm64: dts: renesas: r8a7796: Enable DMA for SCIF2")
Reported-by: Michael Rodin <mrodin@de.adit-jv.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
---
 arch/arm64/boot/dts/renesas/r8a7796.dtsi | 3 ---
 1 file changed, 3 deletions(-)