diff mbox series

Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again"

Message ID 20181216200833.27928-1-paul.burton@mips.com (mailing list archive)
State Accepted
Commit 3c9dc275dba1124c1e16e7037226038451286813
Headers show
Series Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again" | expand

Commit Message

Paul Burton Dec. 16, 2018, 8:10 p.m. UTC
Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode
again") makes a change to FIFO clearing code which its commit message
suggests was intended to be specific to use with RS485 mode, however:

 1) The change made does not just affect __do_stop_tx_rs485(), it also
    affects other uses of serial8250_clear_fifos() including paths for
    starting up, shutting down or auto-configuring a port regardless of
    whether it's an RS485 port or not.

 2) It makes the assumption that resetting the FIFOs is a no-op when
    FIFOs are disabled, and as such it checks for this case & explicitly
    avoids setting the FIFO reset bits when the FIFO enable bit is
    clear. A reading of the PC16550D manual would suggest that this is
    OK since the FIFO should automatically be reset if it is later
    enabled, but we support many 16550-compatible devices and have never
    required this auto-reset behaviour for at least the whole git era.
    Starting to rely on it now seems risky, offers no benefit, and
    indeed breaks at least the Ingenic JZ4780's UARTs which reads
    garbage when the RX FIFO is enabled if we don't explicitly reset it.

 3) By only resetting the FIFOs if they're enabled, the behaviour of
    serial8250_do_startup() during boot now depends on what the value of
    FCR is before the 8250 driver is probed. This in itself seems
    questionable and leaves us with FCR=0 & no FIFO reset if the UART
    was used by 8250_early, otherwise it depends upon what the
    bootloader left behind.

 4) Although the naming of serial8250_clear_fifos() may be unclear, it
    is clear that callers of it expect that it will disable FIFOs. Both
    serial8250_do_startup() & serial8250_do_shutdown() contain comments
    to that effect, and other callers explicitly re-enable the FIFOs
    after calling serial8250_clear_fifos(). The premise of that patch
    that disabling the FIFOs is incorrect therefore seems wrong.

For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix
clearing FIFOs in RS485 mode again").

Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again").
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Jedrychowski <avistel@gmail.com>
Cc: Marek Vasut <marex@denx.de>
Cc: linux-mips@vger.kernel.org
Cc: linux-serial@vger.kernel.org
Cc: stable <stable@vger.kernel.org> # 4.10+
---
I did suggest an alternative approach which would rename
serial8250_clear_fifos() and split it into 2 variants - one that
disables FIFOs & one that does not, then use the latter in
__do_stop_tx_rs485():

https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/

However I have no access to the OMAP3 hardware that Marek's patch was
attempting to fix & have heard nothing back with regards to him testing
that approach, so here's a simple revert that fixes the Ingenic JZ4780.

I've marked for stable back to v4.10 presuming that this is how far the
broken patch may be backported, given that this is where commit
2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
transmits to break") that it tried to fix was introduced.
---
 drivers/tty/serial/8250/8250_port.c | 29 +++++------------------------
 1 file changed, 5 insertions(+), 24 deletions(-)

Comments

Marek Vasut Dec. 16, 2018, 8:32 p.m. UTC | #1
On 12/16/2018 09:10 PM, Paul Burton wrote:
> Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode
> again") makes a change to FIFO clearing code which its commit message
> suggests was intended to be specific to use with RS485 mode, however:
> 
>  1) The change made does not just affect __do_stop_tx_rs485(), it also
>     affects other uses of serial8250_clear_fifos() including paths for
>     starting up, shutting down or auto-configuring a port regardless of
>     whether it's an RS485 port or not.
> 
>  2) It makes the assumption that resetting the FIFOs is a no-op when
>     FIFOs are disabled, and as such it checks for this case & explicitly
>     avoids setting the FIFO reset bits when the FIFO enable bit is
>     clear. A reading of the PC16550D manual would suggest that this is
>     OK since the FIFO should automatically be reset if it is later
>     enabled, but we support many 16550-compatible devices and have never
>     required this auto-reset behaviour for at least the whole git era.
>     Starting to rely on it now seems risky, offers no benefit, and
>     indeed breaks at least the Ingenic JZ4780's UARTs which reads
>     garbage when the RX FIFO is enabled if we don't explicitly reset it.
> 
>  3) By only resetting the FIFOs if they're enabled, the behaviour of
>     serial8250_do_startup() during boot now depends on what the value of
>     FCR is before the 8250 driver is probed. This in itself seems
>     questionable and leaves us with FCR=0 & no FIFO reset if the UART
>     was used by 8250_early, otherwise it depends upon what the
>     bootloader left behind.
> 
>  4) Although the naming of serial8250_clear_fifos() may be unclear, it
>     is clear that callers of it expect that it will disable FIFOs. Both
>     serial8250_do_startup() & serial8250_do_shutdown() contain comments
>     to that effect, and other callers explicitly re-enable the FIFOs
>     after calling serial8250_clear_fifos(). The premise of that patch
>     that disabling the FIFOs is incorrect therefore seems wrong.
> 
> For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix
> clearing FIFOs in RS485 mode again").
> 
> Signed-off-by: Paul Burton <paul.burton@mips.com>
> Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again").
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Jedrychowski <avistel@gmail.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: linux-mips@vger.kernel.org
> Cc: linux-serial@vger.kernel.org
> Cc: stable <stable@vger.kernel.org> # 4.10+
I am unable to test it on such a short notice as I'm currently ill, so I
cannot tell if your change breaks the OMAP3/AM335x boards or not. Given
that there are very few CI20 boards in use, I'd like to ask you for some
extra time to investigate this on the OMAP3 too.

btw what strikes me as curious is that this patch emerged shortly after
Ezequiel re-posted the CI20 U-Boot patches after an year-long hiatus, is
it somehow related ?
Marek Vasut Dec. 16, 2018, 9:08 p.m. UTC | #2
On 12/16/2018 09:10 PM, Paul Burton wrote:
> Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode
> again") makes a change to FIFO clearing code which its commit message
> suggests was intended to be specific to use with RS485 mode, however:
> 
>  1) The change made does not just affect __do_stop_tx_rs485(), it also
>     affects other uses of serial8250_clear_fifos() including paths for
>     starting up, shutting down or auto-configuring a port regardless of
>     whether it's an RS485 port or not.
> 
>  2) It makes the assumption that resetting the FIFOs is a no-op when
>     FIFOs are disabled, and as such it checks for this case & explicitly
>     avoids setting the FIFO reset bits when the FIFO enable bit is
>     clear. A reading of the PC16550D manual would suggest that this is
>     OK since the FIFO should automatically be reset if it is later
>     enabled, but we support many 16550-compatible devices and have never
>     required this auto-reset behaviour for at least the whole git era.
>     Starting to rely on it now seems risky, offers no benefit, and
>     indeed breaks at least the Ingenic JZ4780's UARTs which reads
>     garbage when the RX FIFO is enabled if we don't explicitly reset it.
> 
>  3) By only resetting the FIFOs if they're enabled, the behaviour of
>     serial8250_do_startup() during boot now depends on what the value of
>     FCR is before the 8250 driver is probed. This in itself seems
>     questionable and leaves us with FCR=0 & no FIFO reset if the UART
>     was used by 8250_early, otherwise it depends upon what the
>     bootloader left behind.
> 
>  4) Although the naming of serial8250_clear_fifos() may be unclear, it
>     is clear that callers of it expect that it will disable FIFOs. Both
>     serial8250_do_startup() & serial8250_do_shutdown() contain comments
>     to that effect, and other callers explicitly re-enable the FIFOs
>     after calling serial8250_clear_fifos(). The premise of that patch
>     that disabling the FIFOs is incorrect therefore seems wrong.
> 
> For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix
> clearing FIFOs in RS485 mode again").
> 
> Signed-off-by: Paul Burton <paul.burton@mips.com>
> Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again").
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Jedrychowski <avistel@gmail.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: linux-mips@vger.kernel.org
> Cc: linux-serial@vger.kernel.org
> Cc: stable <stable@vger.kernel.org> # 4.10+
> ---
> I did suggest an alternative approach which would rename
> serial8250_clear_fifos() and split it into 2 variants - one that
> disables FIFOs & one that does not, then use the latter in
> __do_stop_tx_rs485():
> 
> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
> 
> However I have no access to the OMAP3 hardware that Marek's patch was
> attempting to fix & have heard nothing back with regards to him testing
> that approach, so here's a simple revert that fixes the Ingenic JZ4780.
> 
> I've marked for stable back to v4.10 presuming that this is how far the
> broken patch may be backported, given that this is where commit
> 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
> transmits to break") that it tried to fix was introduced.

OK, I tested this on AM335x / OMAP3 and the system is again broken, so
that's a NAK.

> ---
>  drivers/tty/serial/8250/8250_port.c | 29 +++++------------------------
>  1 file changed, 5 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index f776b3eafb96..3f779d25ec0c 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -552,30 +552,11 @@ static unsigned int serial_icr_read(struct uart_8250_port *up, int offset)
>   */
>  static void serial8250_clear_fifos(struct uart_8250_port *p)
>  {
> -	unsigned char fcr;
> -	unsigned char clr_mask = UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT;
> -
>  	if (p->capabilities & UART_CAP_FIFO) {
> -		/*
> -		 * Make sure to avoid changing FCR[7:3] and ENABLE_FIFO bits.
> -		 * In case ENABLE_FIFO is not set, there is nothing to flush
> -		 * so just return. Furthermore, on certain implementations of
> -		 * the 8250 core, the FCR[7:3] bits may only be changed under
> -		 * specific conditions and changing them if those conditions
> -		 * are not met can have nasty side effects. One such core is
> -		 * the 8250-omap present in TI AM335x.
> -		 */
> -		fcr = serial_in(p, UART_FCR);
> -
> -		/* FIFO is not enabled, there's nothing to clear. */
> -		if (!(fcr & UART_FCR_ENABLE_FIFO))
> -			return;
> -
> -		fcr |= clr_mask;
> -		serial_out(p, UART_FCR, fcr);
> -
> -		fcr &= ~clr_mask;
> -		serial_out(p, UART_FCR, fcr);
> +		serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO);
> +		serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO |
> +			       UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
> +		serial_out(p, UART_FCR, 0);
>  	}
>  }
>  
> @@ -1467,7 +1448,7 @@ static void __do_stop_tx_rs485(struct uart_8250_port *p)
>  	 * Enable previously disabled RX interrupts.
>  	 */
>  	if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
> -		serial8250_clear_fifos(p);
> +		serial8250_clear_and_reinit_fifos(p);
>  
>  		p->ier |= UART_IER_RLSI | UART_IER_RDI;
>  		serial_port_out(&p->port, UART_IER, p->ier);
>
Paul Burton Dec. 16, 2018, 9:31 p.m. UTC | #3
Hi Marek,

On Sun, Dec 16, 2018 at 09:32:19PM +0100, Marek Vasut wrote:
> I am unable to test it on such a short notice as I'm currently ill, so I
> cannot tell if your change breaks the OMAP3/AM335x boards or not. Given
> that there are very few CI20 boards in use, I'd like to ask you for some
> extra time to investigate this on the OMAP3 too.

I'm sorry to hear that you're ill, but your patch is getting awfully
close to becoming part of a stable kernel release & it causes
regressions. Even if it didn't break a board I use, I think the patch
would be broken & risky for the reasons I outlined in my revert's commit
message.

Ultimately it's Greg's decision but it sounds like you're asking me to
say it's OK to break the JZ4780 in a stable kernel with a patch that I
think would be risky anyway, and I won't do that.

> btw what strikes me as curious is that this patch emerged shortly after
> Ezequiel re-posted the CI20 U-Boot patches after an year-long hiatus, is
> it somehow related ?

Not at all - I regularly test on Ci20 & found this breakage whilst
testing Paul Cercueil's Ingenic TCU patchset v8 [1]. Using a Ci20 with
mainline kernels doesn't rely on Ezequiel's U-Boot work, and indeed I
generally boot using my ci20-usb-boot tool [2] so U-Boot isn't involved
at all.

Now my Ci20 testing isn't automated so it tends to happen mostly when
there are obvious changes to the board or SoC support, but it should
become more automated soon. Kevin Hilman just got a Ci40 working with
kernelci.org infrastructure & I hope we can get Ci20 & other boards
included soon too, either via existing kernelci.org labs or by setting
up a new one.

Thanks,
    Paul

[1] https://lore.kernel.org/linux-mips/20181212220922.18759-1-paul@crapouillou.net/T/#t
[2] https://github.com/paulburton/ci20-tools
Paul Burton Dec. 16, 2018, 9:39 p.m. UTC | #4
Hi Marek,

On Sun, Dec 16, 2018 at 10:08:48PM +0100, Marek Vasut wrote:
> > I did suggest an alternative approach which would rename
> > serial8250_clear_fifos() and split it into 2 variants - one that
> > disables FIFOs & one that does not, then use the latter in
> > __do_stop_tx_rs485():
> > 
> > https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
> > 
> > However I have no access to the OMAP3 hardware that Marek's patch was
> > attempting to fix & have heard nothing back with regards to him testing
> > that approach, so here's a simple revert that fixes the Ingenic JZ4780.
> > 
> > I've marked for stable back to v4.10 presuming that this is how far the
> > broken patch may be backported, given that this is where commit
> > 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
> > transmits to break") that it tried to fix was introduced.
> 
> OK, I tested this on AM335x / OMAP3 and the system is again broken, so
> that's a NAK.

To be clear - what did you test? This revert or the patch linked to
above?

This revert would of course reintroduce your RS485 issue because it just
undoes your change.

Either way, commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in
RS485 mode again") breaks systems that worked before it so at this late
stage in the 4.20 cycle a revert would make sense to me. If that breaks
RS85 on OMAP3 then my question would be how much can anyone really care
if nobody noticed since v4.10? And why should that lead to you breaking
the JZ4780 which has been discovered before a stable kernel release
includes the breakage?

Thanks,
    Paul
Marek Vasut Dec. 16, 2018, 9:45 p.m. UTC | #5
On 12/16/2018 10:31 PM, Paul Burton wrote:
> Hi Marek,

Hi,

> On Sun, Dec 16, 2018 at 09:32:19PM +0100, Marek Vasut wrote:
>> I am unable to test it on such a short notice as I'm currently ill, so I
>> cannot tell if your change breaks the OMAP3/AM335x boards or not. Given
>> that there are very few CI20 boards in use, I'd like to ask you for some
>> extra time to investigate this on the OMAP3 too.
> 
> I'm sorry to hear that you're ill, but your patch is getting awfully
> close to becoming part of a stable kernel release & it causes
> regressions. Even if it didn't break a board I use, I think the patch
> would be broken & risky for the reasons I outlined in my revert's commit
> message.

That's what the incremental releases are for, so that minor problems can
get fixed there. Sure, it's great to have things perfect in the first
release, but if that breaks other systems, too bad.

> Ultimately it's Greg's decision but it sounds like you're asking me to
> say it's OK to break the JZ4780 in a stable kernel with a patch that I
> think would be risky anyway, and I won't do that.

I am saying this revert breaks AM335x, so this is a stalemate. I had a
discussion with Ezequiel (on CC) and he seems to have a different
smaller patch coming for this problem.

[...]
Ezequiel Garcia Dec. 16, 2018, 9:52 p.m. UTC | #6
On Sun, 16 Dec 2018 at 18:45, Marek Vasut <marex@denx.de> wrote:
[skips discussion]
>
> > Ultimately it's Greg's decision but it sounds like you're asking me to
> > say it's OK to break the JZ4780 in a stable kernel with a patch that I
> > think would be risky anyway, and I won't do that.
>
> I am saying this revert breaks AM335x, so this is a stalemate. I had a
> discussion with Ezequiel (on CC) and he seems to have a different
> smaller patch coming for this problem.
>

Can you guys test this? Note that serial8250_do_startup has a comment
stating clearly that it has the intention of disabling the FIFOs,
so it seems this is the right thing to do.

Paul, this removes the garbage on my CI20 (rev.1)

diff --git a/drivers/tty/serial/8250/8250_port.c
b/drivers/tty/serial/8250/8250_port.c
index c39482b96111..fac19cbc51d1 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port)
        /*
         * Clear the FIFO buffers and disable them.
         * (they will be reenabled in set_termios())
         */
        serial8250_clear_fifos(up);
+       serial_out(up, UART_FCR, 0);

        /*
         * Clear the interrupt registers.
         */
        serial_port_in(port, UART_LSR);
Marek Vasut Dec. 16, 2018, 10:01 p.m. UTC | #7
On 12/16/2018 10:39 PM, Paul Burton wrote:
> Hi Marek,

Hi,

> On Sun, Dec 16, 2018 at 10:08:48PM +0100, Marek Vasut wrote:
>>> I did suggest an alternative approach which would rename
>>> serial8250_clear_fifos() and split it into 2 variants - one that
>>> disables FIFOs & one that does not, then use the latter in
>>> __do_stop_tx_rs485():
>>>
>>> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
>>>
>>> However I have no access to the OMAP3 hardware that Marek's patch was
>>> attempting to fix & have heard nothing back with regards to him testing
>>> that approach, so here's a simple revert that fixes the Ingenic JZ4780.
>>>
>>> I've marked for stable back to v4.10 presuming that this is how far the
>>> broken patch may be backported, given that this is where commit
>>> 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
>>> transmits to break") that it tried to fix was introduced.
>>
>> OK, I tested this on AM335x / OMAP3 and the system is again broken, so
>> that's a NAK.
> 
> To be clear - what did you test? This revert or the patch linked to
> above?
> 
> This revert would of course reintroduce your RS485 issue because it just
> undoes your change.

The revert. Which of the two patches do you need me to test.

> Either way, commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in
> RS485 mode again") breaks systems that worked before it so at this late
> stage in the 4.20 cycle a revert would make sense to me. If that breaks
> RS85 on OMAP3 then my question would be how much can anyone really care
> if nobody noticed since v4.10? And why should that lead to you breaking
> the JZ4780 which has been discovered before a stable kernel release
> includes the breakage?

There's always a .y release where this can be properly investigated and
solved, instead of breaking one platform or the other.

Then again, see the patch from Ezequiel that was just posted, I think it
might be a far better solution.
Marek Vasut Dec. 16, 2018, 10:11 p.m. UTC | #8
On 12/16/2018 10:52 PM, Ezequiel Garcia wrote:
> On Sun, 16 Dec 2018 at 18:45, Marek Vasut <marex@denx.de> wrote:
> [skips discussion]
>>
>>> Ultimately it's Greg's decision but it sounds like you're asking me to
>>> say it's OK to break the JZ4780 in a stable kernel with a patch that I
>>> think would be risky anyway, and I won't do that.
>>
>> I am saying this revert breaks AM335x, so this is a stalemate. I had a
>> discussion with Ezequiel (on CC) and he seems to have a different
>> smaller patch coming for this problem.
>>
> 
> Can you guys test this? Note that serial8250_do_startup has a comment
> stating clearly that it has the intention of disabling the FIFOs,
> so it seems this is the right thing to do.
> 
> Paul, this removes the garbage on my CI20 (rev.1)
> 
> diff --git a/drivers/tty/serial/8250/8250_port.c
> b/drivers/tty/serial/8250/8250_port.c
> index c39482b96111..fac19cbc51d1 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port)
>         /*
>          * Clear the FIFO buffers and disable them.
>          * (they will be reenabled in set_termios())
>          */
>         serial8250_clear_fifos(up);
> +       serial_out(up, UART_FCR, 0);
> 
>         /*
>          * Clear the interrupt registers.
>          */
>         serial_port_in(port, UART_LSR);
> 
> 

On AM335x pocketbeagle
Tested-by: Marek Vasut <marex@denx.de>
Paul Burton Dec. 16, 2018, 10:16 p.m. UTC | #9
On Sun, Dec 16, 2018 at 10:45:23PM +0100, Marek Vasut wrote:
> >> I am unable to test it on such a short notice as I'm currently ill, so I
> >> cannot tell if your change breaks the OMAP3/AM335x boards or not. Given
> >> that there are very few CI20 boards in use, I'd like to ask you for some
> >> extra time to investigate this on the OMAP3 too.
> > 
> > I'm sorry to hear that you're ill, but your patch is getting awfully
> > close to becoming part of a stable kernel release & it causes
> > regressions. Even if it didn't break a board I use, I think the patch
> > would be broken & risky for the reasons I outlined in my revert's commit
> > message.
> 
> That's what the incremental releases are for, so that minor problems can
> get fixed there. Sure, it's great to have things perfect in the first
> release, but if that breaks other systems, too bad.

I don't think the purpose of stable kernels is to intentionally break
systems in a stable release & backport fixes later...

> > Ultimately it's Greg's decision but it sounds like you're asking me to
> > say it's OK to break the JZ4780 in a stable kernel with a patch that I
> > think would be risky anyway, and I won't do that.
> 
> I am saying this revert breaks AM335x, so this is a stalemate. I had a
> discussion with Ezequiel (on CC) and he seems to have a different
> smaller patch coming for this problem.

Well, no. Your patch says commit 2bed8a8e7072 ("Clearing FIFOs in RS485
emulation mode causes subsequent transmits to break") broke RS485 on
AM33x in v4.10, and it took 10 release cycles for someone to notice &
fix it. You're asking to break a system that is used & working in order
to fix a problem that seemingly wasn't even noticed for nearly 2 years.

Your fix breaks my system, but I outlined 4 reasons that I believe your
patch to be wrong anyway - breaking my system is just one part of that.

I'll rely to Ezequiel's patch now, but it again addresses just one part
of one of the 4 points I listed in the reasoning for my revert.

Thanks,
    Paul
Paul Burton Dec. 16, 2018, 10:24 p.m. UTC | #10
Hi Ezequiel,

On Sun, Dec 16, 2018 at 06:52:53PM -0300, Ezequiel Garcia wrote:
> diff --git a/drivers/tty/serial/8250/8250_port.c
> b/drivers/tty/serial/8250/8250_port.c
> index c39482b96111..fac19cbc51d1 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port)
>         /*
>          * Clear the FIFO buffers and disable them.
>          * (they will be reenabled in set_termios())
>          */
>         serial8250_clear_fifos(up);
> +       serial_out(up, UART_FCR, 0);
> 
>         /*
>          * Clear the interrupt registers.
>          */
>         serial_port_in(port, UART_LSR);
> 

This helps, but it only addresses one part of one of the 4 reasons I
listed as motivation for my revert. For example serial8250_do_shutdown()
also clearly intends to disable the FIFOs.

Thanks,
    Paul
Paul Burton Dec. 16, 2018, 10:28 p.m. UTC | #11
Hi Marek,

On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
> >>> I did suggest an alternative approach which would rename
> >>> serial8250_clear_fifos() and split it into 2 variants - one that
> >>> disables FIFOs & one that does not, then use the latter in
> >>> __do_stop_tx_rs485():
> >>>
> >>> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
> >>>
> >>> However I have no access to the OMAP3 hardware that Marek's patch was
> >>> attempting to fix & have heard nothing back with regards to him testing
> >>> that approach, so here's a simple revert that fixes the Ingenic JZ4780.
> >>>
> >>> I've marked for stable back to v4.10 presuming that this is how far the
> >>> broken patch may be backported, given that this is where commit
> >>> 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
> >>> transmits to break") that it tried to fix was introduced.
> >>
> >> OK, I tested this on AM335x / OMAP3 and the system is again broken, so
> >> that's a NAK.
> > 
> > To be clear - what did you test? This revert or the patch linked to
> > above?
> > 
> > This revert would of course reintroduce your RS485 issue because it just
> > undoes your change.
> 
> The revert. Which of the two patches do you need me to test.

The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to
at lore.kernel.org in the quote right above:

https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/

You replied with comments on the patch, you just never tested it or
never told me if you did. The lack of response means I don't know
whether that potential patch even still works for your system, hence the
revert.

Thanks,
    Paul
Ezequiel Garcia Dec. 16, 2018, 10:28 p.m. UTC | #12
On Sun, 16 Dec 2018 at 19:24, Paul Burton <paul.burton@mips.com> wrote:
>
> Hi Ezequiel,
>
> On Sun, Dec 16, 2018 at 06:52:53PM -0300, Ezequiel Garcia wrote:
> > diff --git a/drivers/tty/serial/8250/8250_port.c
> > b/drivers/tty/serial/8250/8250_port.c
> > index c39482b96111..fac19cbc51d1 100644
> > --- a/drivers/tty/serial/8250/8250_port.c
> > +++ b/drivers/tty/serial/8250/8250_port.c
> > @@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port)
> >         /*
> >          * Clear the FIFO buffers and disable them.
> >          * (they will be reenabled in set_termios())
> >          */
> >         serial8250_clear_fifos(up);
> > +       serial_out(up, UART_FCR, 0);
> >
> >         /*
> >          * Clear the interrupt registers.
> >          */
> >         serial_port_in(port, UART_LSR);
> >
>
> This helps, but it only addresses one part of one of the 4 reasons I
> listed as motivation for my revert. For example serial8250_do_shutdown()
> also clearly intends to disable the FIFOs.
>

OK. So, let's fix that :-)

By all means, it would be really nice to push forward and fix the garbage
issue on JZ4780, as well as the transmission issue on AM335x.

AM335x is a wildly popular platform, and it's not funny to break it.
So, let's please stop discussing which board we'll break and just fix both.
Paul Burton Dec. 16, 2018, 10:35 p.m. UTC | #13
Hi Ezequiel,

On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
> On Sun, 16 Dec 2018 at 19:24, Paul Burton <paul.burton@mips.com> wrote:
> > This helps, but it only addresses one part of one of the 4 reasons I
> > listed as motivation for my revert. For example serial8250_do_shutdown()
> > also clearly intends to disable the FIFOs.
> >
> 
> OK. So, let's fix that :-)

I already did, or at least tried to, on Thursday [1].

> By all means, it would be really nice to push forward and fix the garbage
> issue on JZ4780, as well as the transmission issue on AM335x.
>
> AM335x is a wildly popular platform, and it's not funny to break it.

Well, clearly not if it was broken in v4.10 & only just fixed..? And
from Marek's commit message the patch in v4.10 doesn't break the whole
system just RS485.

> So, let's please stop discussing which board we'll break and just fix both.

I completely agree that would be ideal and I wrote a patch hoping to do
that on Thursday, but didn't get any response on testing. It's late in
the cycle hence a revert made sense. Simple as that.

Thanks,
    Paul

[1] https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
Greg KH Dec. 17, 2018, 3:18 p.m. UTC | #14
On Sun, Dec 16, 2018 at 10:35:12PM +0000, Paul Burton wrote:
> Hi Ezequiel,
> 
> On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
> > On Sun, 16 Dec 2018 at 19:24, Paul Burton <paul.burton@mips.com> wrote:
> > > This helps, but it only addresses one part of one of the 4 reasons I
> > > listed as motivation for my revert. For example serial8250_do_shutdown()
> > > also clearly intends to disable the FIFOs.
> > >
> > 
> > OK. So, let's fix that :-)
> 
> I already did, or at least tried to, on Thursday [1].
> 
> > By all means, it would be really nice to push forward and fix the garbage
> > issue on JZ4780, as well as the transmission issue on AM335x.
> >
> > AM335x is a wildly popular platform, and it's not funny to break it.
> 
> Well, clearly not if it was broken in v4.10 & only just fixed..? And
> from Marek's commit message the patch in v4.10 doesn't break the whole
> system just RS485.
> 
> > So, let's please stop discussing which board we'll break and just fix both.
> 
> I completely agree that would be ideal and I wrote a patch hoping to do
> that on Thursday, but didn't get any response on testing. It's late in
> the cycle hence a revert made sense. Simple as that.

A revert makes sense now, I'll go queue this up, thanks.

greg k-h
Marek Vasut Dec. 17, 2018, 3:43 p.m. UTC | #15
On 12/17/2018 04:18 PM, Greg Kroah-Hartman wrote:
> On Sun, Dec 16, 2018 at 10:35:12PM +0000, Paul Burton wrote:
>> Hi Ezequiel,
>>
>> On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
>>> On Sun, 16 Dec 2018 at 19:24, Paul Burton <paul.burton@mips.com> wrote:
>>>> This helps, but it only addresses one part of one of the 4 reasons I
>>>> listed as motivation for my revert. For example serial8250_do_shutdown()
>>>> also clearly intends to disable the FIFOs.
>>>>
>>>
>>> OK. So, let's fix that :-)
>>
>> I already did, or at least tried to, on Thursday [1].
>>
>>> By all means, it would be really nice to push forward and fix the garbage
>>> issue on JZ4780, as well as the transmission issue on AM335x.
>>>
>>> AM335x is a wildly popular platform, and it's not funny to break it.
>>
>> Well, clearly not if it was broken in v4.10 & only just fixed..? And
>> from Marek's commit message the patch in v4.10 doesn't break the whole
>> system just RS485.
>>
>>> So, let's please stop discussing which board we'll break and just fix both.
>>
>> I completely agree that would be ideal and I wrote a patch hoping to do
>> that on Thursday, but didn't get any response on testing. It's late in
>> the cycle hence a revert made sense. Simple as that.
> 
> A revert makes sense now, I'll go queue this up, thanks.

I don't like this for multiple reasons.
1) There is a better patch posted which doesn't break the AM335x and
   clearly identifies and fixes the problem on the JZ4780 / CI20
2) The JZ4780 8250 core is not a standard 8250 core, since it has extra
   bits in the FCR register (like the UME bit, which disables the whole
   UART block), so the revert IMO would break that core too, it just
   hides the breakage. I'm still trying to understand the implications
   of that in detail, but the discussion wasn't quite constructive.

I'd much rather see Ezequiel's patch applied, since that's far less
destructive approach to fixing the problem than the revert.
Ezequiel Garcia Dec. 17, 2018, 4:30 p.m. UTC | #16
On Sun, 16 Dec 2018 at 19:35, Paul Burton <paul.burton@mips.com> wrote:
>
> Hi Ezequiel,
>
> On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
> > On Sun, 16 Dec 2018 at 19:24, Paul Burton <paul.burton@mips.com> wrote:
> > > This helps, but it only addresses one part of one of the 4 reasons I
> > > listed as motivation for my revert. For example serial8250_do_shutdown()
> > > also clearly intends to disable the FIFOs.
> > >
> >
> > OK. So, let's fix that :-)
>
> I already did, or at least tried to, on Thursday [1].
>
> > By all means, it would be really nice to push forward and fix the garbage
> > issue on JZ4780, as well as the transmission issue on AM335x.
> >
> > AM335x is a wildly popular platform, and it's not funny to break it.
>
> Well, clearly not if it was broken in v4.10 & only just fixed..? And
> from Marek's commit message the patch in v4.10 doesn't break the whole
> system just RS485.
>

Careful here. It's naive to consider v4.10 as old in this context.

AM335x is used in hundreds of thousands of products, probably
"industrial" products.
Manufacturers don't always follow the kernel, and it's entirely
likely that they notice a regression only when developing a new product,
or when rebasing on top of a newer longterm kernel.

Then again, I don't have any details about what is Marek's original fix
actually fixing.

Marek: could you please post the test case that you used to validate your fix?
I can't find that anywhere.

> > So, let's please stop discussing which board we'll break and just fix both.
>
> I completely agree that would be ideal and I wrote a patch hoping to do
> that on Thursday, but didn't get any response on testing. It's late in
> the cycle hence a revert made sense. Simple as that.
>
> Thanks,
>     Paul
>
> [1] https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/

Well, I think this patch a lot of sense. But since Greg wants to
revert Marek's fix,
it would make sense to collate both Marek's and Paul's in a single commit.
Marek Vasut Dec. 17, 2018, 5:20 p.m. UTC | #17
On 12/17/2018 05:30 PM, Ezequiel Garcia wrote:
> On Sun, 16 Dec 2018 at 19:35, Paul Burton <paul.burton@mips.com> wrote:
>>
>> Hi Ezequiel,
>>
>> On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
>>> On Sun, 16 Dec 2018 at 19:24, Paul Burton <paul.burton@mips.com> wrote:
>>>> This helps, but it only addresses one part of one of the 4 reasons I
>>>> listed as motivation for my revert. For example serial8250_do_shutdown()
>>>> also clearly intends to disable the FIFOs.
>>>>
>>>
>>> OK. So, let's fix that :-)
>>
>> I already did, or at least tried to, on Thursday [1].
>>
>>> By all means, it would be really nice to push forward and fix the garbage
>>> issue on JZ4780, as well as the transmission issue on AM335x.
>>>
>>> AM335x is a wildly popular platform, and it's not funny to break it.
>>
>> Well, clearly not if it was broken in v4.10 & only just fixed..? And
>> from Marek's commit message the patch in v4.10 doesn't break the whole
>> system just RS485.
>>
> 
> Careful here. It's naive to consider v4.10 as old in this context.
> 
> AM335x is used in hundreds of thousands of products, probably
> "industrial" products.
> Manufacturers don't always follow the kernel, and it's entirely
> likely that they notice a regression only when developing a new product,
> or when rebasing on top of a newer longterm kernel.
> 
> Then again, I don't have any details about what is Marek's original fix
> actually fixing.
> 
> Marek: could you please post the test case that you used to validate your fix?
> I can't find that anywhere.

I can't share the testcase itself because it has no license and I didn't
write it, but I can tell you what it's doing. (I'll check if I can share
the testcase verbatim too, I just sent an email to the author)

The test spawns two threads, one sending , one receiving. The sending
thread sends 8 bytes of data from /dev/ttyS4 , the receiving thread
receives the data on /dev/ttyS2 and compares the pattern. The port
settings is B1000000 8N1 with rs485conf.flags set to SER_RS485_ENABLED
and SER_RS485_RTS_AFTER_SEND. Sometimes the received data do not match
the sent data, but rather look as if one character was left over from
the previous transmission in the FIFO.

Those two UARTs are connected together by two wires, without any real
RS485 hardware to minimize the hardware complexity (it's easy to
implement that on the pocketbeagle, which is the cheap option here).
Marek Vasut Dec. 19, 2018, 12:12 a.m. UTC | #18
On 12/17/2018 06:20 PM, Marek Vasut wrote:
> On 12/17/2018 05:30 PM, Ezequiel Garcia wrote:
>> On Sun, 16 Dec 2018 at 19:35, Paul Burton <paul.burton@mips.com> wrote:
>>>
>>> Hi Ezequiel,
>>>
>>> On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
>>>> On Sun, 16 Dec 2018 at 19:24, Paul Burton <paul.burton@mips.com> wrote:
>>>>> This helps, but it only addresses one part of one of the 4 reasons I
>>>>> listed as motivation for my revert. For example serial8250_do_shutdown()
>>>>> also clearly intends to disable the FIFOs.
>>>>>
>>>>
>>>> OK. So, let's fix that :-)
>>>
>>> I already did, or at least tried to, on Thursday [1].
>>>
>>>> By all means, it would be really nice to push forward and fix the garbage
>>>> issue on JZ4780, as well as the transmission issue on AM335x.
>>>>
>>>> AM335x is a wildly popular platform, and it's not funny to break it.
>>>
>>> Well, clearly not if it was broken in v4.10 & only just fixed..? And
>>> from Marek's commit message the patch in v4.10 doesn't break the whole
>>> system just RS485.
>>>
>>
>> Careful here. It's naive to consider v4.10 as old in this context.
>>
>> AM335x is used in hundreds of thousands of products, probably
>> "industrial" products.
>> Manufacturers don't always follow the kernel, and it's entirely
>> likely that they notice a regression only when developing a new product,
>> or when rebasing on top of a newer longterm kernel.
>>
>> Then again, I don't have any details about what is Marek's original fix
>> actually fixing.
>>
>> Marek: could you please post the test case that you used to validate your fix?
>> I can't find that anywhere.
> 
> I can't share the testcase itself because it has no license and I didn't
> write it, but I can tell you what it's doing. (I'll check if I can share
> the testcase verbatim too, I just sent an email to the author)
> 
> The test spawns two threads, one sending , one receiving. The sending
> thread sends 8 bytes of data from /dev/ttyS4 , the receiving thread
> receives the data on /dev/ttyS2 and compares the pattern. The port
> settings is B1000000 8N1 with rs485conf.flags set to SER_RS485_ENABLED
> and SER_RS485_RTS_AFTER_SEND. Sometimes the received data do not match
> the sent data, but rather look as if one character was left over from
> the previous transmission in the FIFO.
> 
> Those two UARTs are connected together by two wires, without any real
> RS485 hardware to minimize the hardware complexity (it's easy to
> implement that on the pocketbeagle, which is the cheap option here).

Test code is below . On the pocketbeagle, connect SPI0:CLK with U4:TX
and SPI0:MISO with U4:RX , apply the DT patch below and run the example.
It'll fail after a few iterations.

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

#include <iostream>
#include <iomanip>
#include <atomic>
#include <thread>

#include <linux/serial.h>
#include <sys/ioctl.h>

std::atomic<bool> running{true};

int set_interface_attribs (int fd, int speed, int parity)
{
        struct termios tty;
        memset (&tty, 0, sizeof tty);
        if (tcgetattr (fd, &tty) != 0)
        {
                std::cerr << "Error from tcgetattr" << std::endl;
                return -1;
        }

        cfsetospeed (&tty, speed);
        cfsetispeed (&tty, speed);

        tty.c_cflag = (tty.c_cflag & ~CSIZE) | CS8;
        tty.c_iflag &= ~IGNBRK;
        tty.c_lflag = 0;

        tty.c_oflag = 0;
        tty.c_cc[VMIN]  = 8;
        tty.c_cc[VTIME] = 0;

        tty.c_iflag &= ~(IXON | IXOFF | IXANY);

        tty.c_cflag |= (CLOCAL | CREAD);
        tty.c_cflag &= ~(PARENB | PARODD);
        tty.c_cflag |= parity;
        tty.c_cflag &= ~CSTOPB;
        tty.c_cflag &= ~CRTSCTS;

        if (tcsetattr (fd, TCSANOW, &tty) != 0)
        {
                std::cerr << "Error from tcsetattr" << std::endl;
                return -1;
        }
        return 0;
}

void enable_rts(int fd, int rts)
{
     struct serial_rs485 rs485conf;
      if (rts) {
              rs485conf.flags = SER_RS485_ENABLED ;
              rs485conf.flags |= SER_RS485_RTS_AFTER_SEND;
              rs485conf.flags &= ~(SER_RS485_RTS_ON_SEND);
              rs485conf.delay_rts_before_send = 0;
              rs485conf.delay_rts_after_send = 0;
      } else {
              rs485conf.flags = 0 ;
      }

      if (ioctl( fd, TIOCSRS485, &rs485conf) < 0){
          std::cerr << "Cannot set device in RS485 mode" << std::endl;
      }
}

void output_function(int fd)
{
   while(running) {
    write (fd, "asdfghjk", 8);
    usleep (20000);
   }
}

void input_function(int fd)
{
  char buf [100];
  size_t count=0;
  std::cout << std::unitbuf;
  while(running){
    int n = read (fd, buf, sizeof buf);
    if( (n >0) && (buf[0] != 'a') ){
        std::cout << "\nunexpected receive! " << std::string(buf,8) <<
std::endl;
        running = false;
    } else {
       ++count;
       if(count%100 == 0){
          std::cout << "\r" << std::setw(8) << count << " possibly ok";
       }
    }
  }
}

int main(int argc, char** argv)
{
  std::string in_port  = "/dev/ttyS2";
  std::string out_port = "/dev/ttyS4";

  int c, rts = 1;

  while ((c = getopt (argc, argv, "i:o:r")) != -1)
  {
    switch (c)
      {
      case 'i':
        in_port = std::string(optarg);
        break;
      case 'o':
        out_port = std::string(optarg);
        break;
      case 'r':
        rts = 0;
        break;
      case '?':
        return 1;
      default:
        break;
      }
    }

    std::cout << "opening output " << out_port << std::endl;
    int outfd = open ( out_port.c_str(), O_RDWR | O_NOCTTY | O_SYNC);
    if (outfd < 0)
    {
            std::cerr << "Could not open " << out_port << std::endl;
            return 1;
    }

    std::cout << "opening input " << in_port << std::endl;
    int infd = open ( in_port.c_str(), O_RDWR | O_NOCTTY | O_SYNC);
    if (infd < 0)
    {
            std::cerr << "Could not open " << in_port << std::endl;
            return 1;
    }

    set_interface_attribs (infd,  B1000000, 0);
    set_interface_attribs (outfd, B1000000, 0);

    enable_rts(outfd, rts);
    enable_rts(infd, rts);

    std::thread in(input_function, infd);
    std::thread out(output_function, outfd);

    in.join();
    out.join();

    close(infd);
    close(outfd);
}

diff --git a/arch/arm/boot/dts/am335x-pocketbeagle.dts
b/arch/arm/boot/dts/am335x-pocketbeagle.dts
index 62fe5cab9fae..d9b8f09920a6 100644
--- a/arch/arm/boot/dts/am335x-pocketbeagle.dts
+++ b/arch/arm/boot/dts/am335x-pocketbeagle.dts
@@ -92,15 +92,6 @@
                >;
        };

-       spi0_pins: pinmux-spi0-pins {
-               pinctrl-single,pins = <
-                       AM33XX_IOPAD(0x950, PIN_INPUT_PULLUP |
MUX_MODE0)       /* (A17) spi0_sclk.spi0_sclk */
-                       AM33XX_IOPAD(0x954, PIN_INPUT_PULLUP |
MUX_MODE0)       /* (B17) spi0_d0.spi0_d0 */
-                       AM33XX_IOPAD(0x958, PIN_INPUT_PULLUP |
MUX_MODE0)       /* (B16) spi0_d1.spi0_d1 */
-                       AM33XX_IOPAD(0x95c, PIN_INPUT_PULLUP |
MUX_MODE0)       /* (A16) spi0_cs0.spi0_cs0 */
-               >;
-       };
-
        spi1_pins: pinmux-spi1-pins {
                pinctrl-single,pins = <
                        AM33XX_IOPAD(0x964, PIN_INPUT_PULLUP |
MUX_MODE4)       /* (C18) eCAP0_in_PWM0_out.spi1_sclk */
@@ -126,6 +117,13 @@
                >;
        };

+       uart2_pins: pinmux-uart2-pins {
+               pinctrl-single,pins = <
+                       AM33XX_IOPAD(0x950, PIN_INPUT | MUX_MODE1)
       /* spi0_sclk.uart2_rxd */
+                       AM33XX_IOPAD(0x954, PIN_OUTPUT | MUX_MODE1)
       /* spi0_d0.uart2_txd */
+               >;
+       };
+
        uart4_pins: pinmux-uart4-pins {
                pinctrl-single,pins = <
                        AM33XX_IOPAD(0x870, PIN_INPUT_PULLUP |
MUX_MODE6)       /* (T17) gpmc_wait0.uart4_rxd */
@@ -199,6 +197,13 @@
        status = "okay";
 };

+&uart2 {
+       pinctrl-names = "default";
+       pinctrl-0 = <&uart2_pins>;
+
+       status = "okay";
+};
+
 &uart4 {
        pinctrl-names = "default";
        pinctrl-0 = <&uart4_pins>;
Marek Vasut Dec. 21, 2018, 8:08 p.m. UTC | #19
On 12/16/2018 11:28 PM, Paul Burton wrote:
> Hi Marek,
> 
> On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
>>>>> I did suggest an alternative approach which would rename
>>>>> serial8250_clear_fifos() and split it into 2 variants - one that
>>>>> disables FIFOs & one that does not, then use the latter in
>>>>> __do_stop_tx_rs485():
>>>>>
>>>>> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
>>>>>
>>>>> However I have no access to the OMAP3 hardware that Marek's patch was
>>>>> attempting to fix & have heard nothing back with regards to him testing
>>>>> that approach, so here's a simple revert that fixes the Ingenic JZ4780.
>>>>>
>>>>> I've marked for stable back to v4.10 presuming that this is how far the
>>>>> broken patch may be backported, given that this is where commit
>>>>> 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
>>>>> transmits to break") that it tried to fix was introduced.
>>>>
>>>> OK, I tested this on AM335x / OMAP3 and the system is again broken, so
>>>> that's a NAK.
>>>
>>> To be clear - what did you test? This revert or the patch linked to
>>> above?
>>>
>>> This revert would of course reintroduce your RS485 issue because it just
>>> undoes your change.
>>
>> The revert. Which of the two patches do you need me to test.
> 
> The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to
> at lore.kernel.org in the quote right above:
> 
> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
> 
> You replied with comments on the patch, you just never tested it or
> never told me if you did. The lack of response means I don't know
> whether that potential patch even still works for your system, hence the
> revert.

I shared the entire testcase, which now fails on AM335x due to this
revert. Is there any progress on a proper fix from your side which does
not break the AM335x ?
Paul Burton Dec. 21, 2018, 9:08 p.m. UTC | #20
Hi Marek,

On Fri, Dec 21, 2018 at 09:08:28PM +0100, Marek Vasut wrote:
> On 12/16/2018 11:28 PM, Paul Burton wrote:
> > On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
> >>>>> I did suggest an alternative approach which would rename
> >>>>> serial8250_clear_fifos() and split it into 2 variants - one that
> >>>>> disables FIFOs & one that does not, then use the latter in
> >>>>> __do_stop_tx_rs485():
> >>>>>
> >>>>> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
> >>>>>
> >>>>> However I have no access to the OMAP3 hardware that Marek's patch was
> >>>>> attempting to fix & have heard nothing back with regards to him testing
> >>>>> that approach, so here's a simple revert that fixes the Ingenic JZ4780.
> >>>>>
> >>>>> I've marked for stable back to v4.10 presuming that this is how far the
> >>>>> broken patch may be backported, given that this is where commit
> >>>>> 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
> >>>>> transmits to break") that it tried to fix was introduced.
> >>>>
> >>>> OK, I tested this on AM335x / OMAP3 and the system is again broken, so
> >>>> that's a NAK.
> >>>
> >>> To be clear - what did you test? This revert or the patch linked to
> >>> above?
> >>>
> >>> This revert would of course reintroduce your RS485 issue because it just
> >>> undoes your change.
> >>
> >> The revert. Which of the two patches do you need me to test.
> > 
> > The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to
> > at lore.kernel.org in the quote right above:
> > 
> > https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
> > 
> > You replied with comments on the patch, you just never tested it or
> > never told me if you did. The lack of response means I don't know
> > whether that potential patch even still works for your system, hence the
> > revert.
> 
> I shared the entire testcase, which now fails on AM335x due to this
> revert. Is there any progress on a proper fix from your side which does
> not break the AM335x ?

No.

Let's be clear - I didn't break your AM335x system, your broken patch
says that Daniel did with a commit applied back in v4.10. As such I
don't consider it my responsibility to fix your problem at all, I don't
have any access to the hardware anyway & I won't be buying hardware to
fix a bug for you.

Despite all that I did write a patch which I expect would have solved
the problem for both of us, which is linked *twice* in the quoted emails
above & which as far as I can tell you *still* have not tested. I can
only surmise that you're trying deliberately to be annoying at this
point.

If you want to try the patch I already wrote, and confirm whether it
actually works for you, then let's talk. Otherwise we're done here.

Thanks,
    Paul
Marek Vasut Dec. 21, 2018, 10:02 p.m. UTC | #21
On 12/21/2018 10:08 PM, Paul Burton wrote:
> Hi Marek,
> 
> On Fri, Dec 21, 2018 at 09:08:28PM +0100, Marek Vasut wrote:
>> On 12/16/2018 11:28 PM, Paul Burton wrote:
>>> On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
>>>>>>> I did suggest an alternative approach which would rename
>>>>>>> serial8250_clear_fifos() and split it into 2 variants - one that
>>>>>>> disables FIFOs & one that does not, then use the latter in
>>>>>>> __do_stop_tx_rs485():
>>>>>>>
>>>>>>> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
>>>>>>>
>>>>>>> However I have no access to the OMAP3 hardware that Marek's patch was
>>>>>>> attempting to fix & have heard nothing back with regards to him testing
>>>>>>> that approach, so here's a simple revert that fixes the Ingenic JZ4780.
>>>>>>>
>>>>>>> I've marked for stable back to v4.10 presuming that this is how far the
>>>>>>> broken patch may be backported, given that this is where commit
>>>>>>> 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
>>>>>>> transmits to break") that it tried to fix was introduced.
>>>>>>
>>>>>> OK, I tested this on AM335x / OMAP3 and the system is again broken, so
>>>>>> that's a NAK.
>>>>>
>>>>> To be clear - what did you test? This revert or the patch linked to
>>>>> above?
>>>>>
>>>>> This revert would of course reintroduce your RS485 issue because it just
>>>>> undoes your change.
>>>>
>>>> The revert. Which of the two patches do you need me to test.
>>>
>>> The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to
>>> at lore.kernel.org in the quote right above:
>>>
>>> https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
>>>
>>> You replied with comments on the patch, you just never tested it or
>>> never told me if you did. The lack of response means I don't know
>>> whether that potential patch even still works for your system, hence the
>>> revert.
>>
>> I shared the entire testcase, which now fails on AM335x due to this
>> revert. Is there any progress on a proper fix from your side which does
>> not break the AM335x ?
> 
> No.
> 
> Let's be clear - I didn't break your AM335x system, your broken patch
> says that Daniel did with a commit applied back in v4.10. As such I
> don't consider it my responsibility to fix your problem at all, I don't
> have any access to the hardware anyway & I won't be buying hardware to
> fix a bug for you.
> 
> Despite all that I did write a patch which I expect would have solved
> the problem for both of us, which is linked *twice* in the quoted emails
> above & which as far as I can tell you *still* have not tested. I can
> only surmise that you're trying deliberately to be annoying at this
> point.
> 
> If you want to try the patch I already wrote, and confirm whether it
> actually works for you, then let's talk. Otherwise we're done here.

Understood. However I did test your patch, but it was generating
spurious IRQs and overruns (ttyS ttyS2: 91 input overrun(s)) on the
AM335x, so that's not a proper solution.

I even brought the CI20 V2 I have back to life (yes, I bought one). The
patch Ezequiel posted did fix the problem on the CI20, and did not break
the AM335x, so I'd prefer if it was revisited instead of a heavy-handed
revert.

And I'd prefer to keep this thread alive, since I am still convinced
that the FIFO handling code is wrong. Moreover, considering the UME bit
on JZ4780 in FCR, the original code should confuse the UART on the
JZ4780 too, although this might be hidden by some other surrounding code
specific to the 8250 core on the JZ4780.

I am also on mipslinux IRC channel, in case you want to discuss this.
Paul Burton Dec. 24, 2018, 3:43 p.m. UTC | #22
Hi Marek,

On Fri, Dec 21, 2018 at 11:02:03PM +0100, Marek Vasut wrote:
> >> I shared the entire testcase, which now fails on AM335x due to this
> >> revert. Is there any progress on a proper fix from your side which does
> >> not break the AM335x ?
> > 
> > No.
> > 
> > Let's be clear - I didn't break your AM335x system, your broken patch
> > says that Daniel did with a commit applied back in v4.10. As such I
> > don't consider it my responsibility to fix your problem at all, I don't
> > have any access to the hardware anyway & I won't be buying hardware to
> > fix a bug for you.
> > 
> > Despite all that I did write a patch which I expect would have solved
> > the problem for both of us, which is linked *twice* in the quoted emails
> > above & which as far as I can tell you *still* have not tested. I can
> > only surmise that you're trying deliberately to be annoying at this
> > point.
> > 
> > If you want to try the patch I already wrote, and confirm whether it
> > actually works for you, then let's talk. Otherwise we're done here.
> 
> Understood. However I did test your patch, but it was generating
> spurious IRQs and overruns (ttyS ttyS2: 91 input overrun(s)) on the
> AM335x, so that's not a proper solution.

OK, thanks for testing, and let's figure out what's going on. Since the
revert is in now I'd suggest starting from there - ie. approximately
from the code we've had since v4.10.

> I even brought the CI20 V2 I have back to life (yes, I bought one). The
> patch Ezequiel posted did fix the problem on the CI20, and did not break
> the AM335x, so I'd prefer if it was revisited instead of a heavy-handed
> revert.

As I described in an earlier email & on IRC, Ezequiel's one-liner does
not address all of the problems listed in my revert's commit message.
But again, since the revert is now in I suggest starting from there.

As neither a maintainer or significant contributor to
drivers/tty/serial, nor the author of the patch applied in v4.10 which
you say broke AM335x, nor someone with access to the affected hardware,
I'm probably not the best placed person to help you here - all I can do
is offer general debug suggestions. With that in mind, here are some
questions & ideas I have:

 1) Your message for commit f6aa5beb45be ("serial: 8250: Fix clearing
    FIFOs in RS485 mode again") suggests that the problem you're seeing
    is specific to the __do_stop_tx_rs485() path. Does changing only
    __do_stop_tx_rs485() to clear the FIFOs without disabling them,
    without modifying other users of serial8250_clear_fifos(), fix your
    problem? ie. does the following work?

diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index f776b3eafb96..18d2a1a93f03 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1458,17 +1458,21 @@ static void serial8250_stop_rx(struct uart_port *port)
 }
 
 static void __do_stop_tx_rs485(struct uart_8250_port *p)
 {
+	unsigned char fcr;
+
 	serial8250_em485_rts_after_send(p);
 
 	/*
 	 * Empty the RX FIFO, we are not interested in anything
 	 * received during the half-duplex transmission.
 	 * Enable previously disabled RX interrupts.
 	 */
 	if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
-		serial8250_clear_fifos(p);
+		fcr = serial_port_in(&p->port, UART_FCR);
+		serial_port_out(&p->port, UART_FCR,
+				fcr | UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
 
 		p->ier |= UART_IER_RLSI | UART_IER_RDI;
 		serial_port_out(&p->port, UART_IER, p->ier);
 	}

    Based on the comment above maybe leaving UART_FCR_CLEAR_XMIT out of
    that might make sense too..?

    Having __do_stop_tx_rs485() not share the FIFO clearing code with
    other callers may make sense given that so far as I can see
    __do_stop_tx_rs485() runs whilst the port is in use & other callers
    of serial8250_clear_fifos() do not.

 2) In an earlier email you said "The problem the original patch fixed
    was that too many bits were cleared in the FCR on OMAP3/AM335x".
    Could you clarify which bits we're talking about? From the AM335x
    reference manual I can only see the DMA_MODE bit & the
    [RT]X_FIFO_TRIG fields. Do you know which are problematic & why? In
    your commit message you mention the AM335x UART swallowing &
    retransmitting bytes - which field changing causes that?

> And I'd prefer to keep this thread alive, since I am still convinced
> that the FIFO handling code is wrong. Moreover, considering the UME bit
> on JZ4780 in FCR, the original code should confuse the UART on the
> JZ4780 too, although this might be hidden by some other surrounding code
> specific to the 8250 core on the JZ4780.

For completeness, as I said in an earlier email in the thread and as
we've since discussed on IRC, this is incorrect because
ingenic_uart_serial_out() unconditionally sets the UME bit. As such the
UME bit is not a problem, and JZ4780 works fine both before your patch &
after the revert.

Thanks,
    Paul
diff mbox series

Patch

diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index f776b3eafb96..3f779d25ec0c 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -552,30 +552,11 @@  static unsigned int serial_icr_read(struct uart_8250_port *up, int offset)
  */
 static void serial8250_clear_fifos(struct uart_8250_port *p)
 {
-	unsigned char fcr;
-	unsigned char clr_mask = UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT;
-
 	if (p->capabilities & UART_CAP_FIFO) {
-		/*
-		 * Make sure to avoid changing FCR[7:3] and ENABLE_FIFO bits.
-		 * In case ENABLE_FIFO is not set, there is nothing to flush
-		 * so just return. Furthermore, on certain implementations of
-		 * the 8250 core, the FCR[7:3] bits may only be changed under
-		 * specific conditions and changing them if those conditions
-		 * are not met can have nasty side effects. One such core is
-		 * the 8250-omap present in TI AM335x.
-		 */
-		fcr = serial_in(p, UART_FCR);
-
-		/* FIFO is not enabled, there's nothing to clear. */
-		if (!(fcr & UART_FCR_ENABLE_FIFO))
-			return;
-
-		fcr |= clr_mask;
-		serial_out(p, UART_FCR, fcr);
-
-		fcr &= ~clr_mask;
-		serial_out(p, UART_FCR, fcr);
+		serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO);
+		serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO |
+			       UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
+		serial_out(p, UART_FCR, 0);
 	}
 }
 
@@ -1467,7 +1448,7 @@  static void __do_stop_tx_rs485(struct uart_8250_port *p)
 	 * Enable previously disabled RX interrupts.
 	 */
 	if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
-		serial8250_clear_fifos(p);
+		serial8250_clear_and_reinit_fifos(p);
 
 		p->ier |= UART_IER_RLSI | UART_IER_RDI;
 		serial_port_out(&p->port, UART_IER, p->ier);