4.16 OMAP serial transmit corruption?

Message ID da3be56b-29d3-72f3-5eba-d6a38ae500d0@ti.com
State New

Commit Message

Vignesh Raghavendra April 18, 2018, 9:11 a.m. UTC
On Tuesday 17 April 2018 02:50 PM, Vignesh R wrote:
> 
> 
> On Monday 16 April 2018 09:15 PM, Tony Lindgren wrote:
>> * Russell King - ARM Linux <linux@armlinux.org.uk> [180416 15:19]:
>>> Hi,
>>>
>>> I'm not entirely sure what's going on, but I see corrupted characters
>>> with the serial console on the OMAP4430 SDP board.  During boot,
>>> everything seems fine, the problem appears to be userspace output.
>>>
>>> For example, if I edit a file, then quit vi:
>>>
>>> :q■■%■■B■■Z■root@omap-4430sdp:~#
>>
>> I don't think I've seen that one. What I've seen few times is
>> typing a key on the serial console echoing back the previous
>> character typed while the new character won't get displayed
>> until hitting keyboard again. Only rebooting the device seems
>> to solve this. This is with 4430 ES2.3 revision.
>>
>> I wonder if we're missing some parts of errata i202 handling
>> in omap_8250_mdr1_errataset()?
>>

I wonder whether the extra read of the MDR1 register at the beginning of
omap_8250_mdr1_errataset(), which omap-serial does not have, is the issue.
Erratum i202 says that access to MDR1 can cause data corruption.
Assuming that both reads and writes can cause the glitch, that read
does not follow the advisory.

I don't have an SDP board, so could you verify whether the diff below helps:

Comments

Russell King - ARM Linux admin April 18, 2018, 9:59 a.m. UTC | #1
On Wed, Apr 18, 2018 at 02:41:43PM +0530, Vignesh R wrote:
> 
> I wonder whether the extra read of the MDR1 register at the beginning of
> omap_8250_mdr1_errataset(), which omap-serial does not have, is the issue.
> Erratum i202 says that access to MDR1 can cause data corruption.
> Assuming that both reads and writes can cause the glitch, that read
> does not follow the advisory.
> 
> I don't have an SDP board, so could you verify whether the diff below helps:
> 
> 
> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
> index 6aaa84355fd1..8ab9d0a1b1eb 100644
> --- a/drivers/tty/serial/8250/8250_omap.c
> +++ b/drivers/tty/serial/8250/8250_omap.c
> @@ -163,11 +163,6 @@ static void omap_8250_mdr1_errataset(struct uart_8250_port *up,
>                                      struct omap8250_priv *priv)
>  {
>         u8 timeout = 255;
> -       u8 old_mdr1;
> -
> -       old_mdr1 = serial_in(up, UART_OMAP_MDR1);
> -       if (old_mdr1 == priv->mdr1)
> -               return;
>  
>         serial_out(up, UART_OMAP_MDR1, priv->mdr1);
>         udelay(2);

That doesn't appear to help.

Looking at the bitstream and comparing what should have been sent with
what was sent, there appears to be some correlation between the two.
It looks like the FTDI is not properly synchronised to the bitstream
coming from the OMAP4430.

Setting two stop bits on both ends (OMAP4430 and FTDI) appears to
improve the issue, but not completely solve it.
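
For reference, the OMAP4430 side of the two-stop-bit experiment could be set up from userspace roughly as below. This is only a sketch: the helper names set_two_stop_bits() and apply_two_stop_bits() are made up for illustration, and the same setting has to be applied to the FTDI port on the host (for example with `stty cstopb`).

```c
#include <termios.h>

/* Hypothetical helper: request two stop bits in an already-fetched termios. */
void set_two_stop_bits(struct termios *tio)
{
	tio->c_cflag |= CSTOPB;	/* CSTOPB set = two stop bits, clear = one */
}

/* Hypothetical helper: apply the setting to an open tty; returns 0 or -1. */
int apply_two_stop_bits(int fd)
{
	struct termios tio;

	if (tcgetattr(fd, &tio) < 0)
		return -1;	/* not a terminal, or some other error */
	set_two_stop_bits(&tio);
	/* TCSADRAIN: let pending output finish before the framing changes. */
	return tcsetattr(fd, TCSADRAIN, &tio);
}
```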
Michael Nazzareno Trimarchi April 18, 2018, 10:27 a.m. UTC | #2
Hi

On Wed, Apr 18, 2018 at 11:59 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> That doesn't appear to help.
>
> Looking at the bitstream and comparing what should have been sent with
> what was sent, there appears to be some correlation between the two.
> It looks like the FTDI is not properly synchronised to the bitstream
> coming from the OMAP4430.
>
> Setting two stop bits on both ends (OMAP4430 and FTDI) appears to
> improve the issue, but not completely solve it.

Could this be a clock error above some tolerance?

Russell King - ARM Linux admin April 18, 2018, 11 a.m. UTC | #3
On Wed, Apr 18, 2018 at 12:27:02PM +0200, Michael Nazzareno Trimarchi wrote:
> Could this be a clock error above some tolerance?

No idea at the moment.  Looking at the bitstream with a scope is the
next step, but it's not easy to do that with just two hands.  I also
need to find some way to trigger it reliably.

Another cause could be that the UART pin is being held high/low for
some reason (maybe a pinmux problem.)

Another interesting observation is that if I login over the network and
then do:

	while :; do :; done &
	while :; do :; done &

to occupy both CPUs, and then do:

	dmesg | less

on the console, the problem goes away.  If I only do one while loop,
the problem is present, but the corruption looks like it happens at a
different point in the serial stream.

This would seem to point the blame away from clocks or pinmux, and back
to power management issues.

I've also tried mimicking the less output with a stand-alone program,
and that doesn't exhibit the problem - I've tried with various initial
delays between program start and first output, but this doesn't seem
to have much effect.  So it seems to need rather precise timing.

stracing less does change where the corruption happens in the output,
which also suggests a timing related cause.
Michael Nazzareno Trimarchi April 18, 2018, 11:45 a.m. UTC | #4
Hi

On Wed, Apr 18, 2018 at 1:00 PM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> No idea at the moment.  Looking at the bitstream with a scope is the
> next step, but it's not easy to do that with just two hands.  I also
> need to find some way to trigger it reliably.
>
> Another cause could be that the UART pin is being held high/low for
> some reason (maybe a pinmux problem.)
>
> Another interesting observation is that if I login over the network and
> then do:
>
>         while :; do :; done &
>         while :; do :; done &
>

You can disable it. In any case, when a TI UART goes into idle mode it can
lose the first character on receive.

> to occupy both CPUs, and then do:
>
>         dmesg | less
>
> on the console, the problem goes away.  If I only do one while loop,
> the problem is present, but the corruption looks like it happens at a
> different point in the serial stream.
>
> This would seem to point the blame away from clocks or pinmux, and back
> to power management issues.
>

Do you have statistics from the uart under proc?

Michael

> I've also tried mimicking the less output with a stand-alone program,
> and that doesn't exhibit the problem - I've tried with various initial
> delays between program start and first output, but this doesn't seem
> to have much effect.  So it seems to need rather precise timing.
>
> stracing less does change where the corruption happens in the output,
> which also suggests a timing related cause.
Russell King - ARM Linux admin April 18, 2018, 12:17 p.m. UTC | #5
On Wed, Apr 18, 2018 at 01:45:12PM +0200, Michael Nazzareno Trimarchi wrote:
> Hi
> 
> On Wed, Apr 18, 2018 at 1:00 PM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > No idea at the moment.  Looking at the bitstream with a scope is the
> > next step, but it's not easy to do that with just two hands.  I also
> > need to find some way to trigger it reliably.
> >
> > Another cause could be that the UART pin is being held high/low for
> > some reason (maybe a pinmux problem.)
> >
> > Another interesting observation is that if I login over the network and
> > then do:
> >
> >         while :; do :; done &
> >         while :; do :; done &
> >
> 
> You can disable it. In any case, when a TI UART goes into idle mode it can
> lose the first character on receive.

That may be, but what happens on the OMAP receive side is not relevant.
This issue is about the OMAP4430 transmit side.

> > to occupy both CPUs, and then do:
> >
> >         dmesg | less
> >
> > on the console, the problem goes away.  If I only do one while loop,
> > the problem is present, but the corruption looks like it happens at a
> > different point in the serial stream.
> >
> > This would seem to point the blame away from clocks or pinmux, and back
> > to power management issues.
> >
> 
> Do you have statistics from the uart under proc?

You mean on the OMAP4430?

# cat /proc/tty/driver/OMAP-SERIAL
serinfo:1.0 driver revision:
0: uart:OMAP UART0 mmio:0x4806A000 irq:32 tx:0 rx:0
1: uart:OMAP UART1 mmio:0x4806C000 irq:33 tx:0 rx:0
2: uart:OMAP UART2 mmio:0x48020000 irq:34 tx:638807 rx:5406 RTS|CTS|DTR
3: uart:OMAP UART3 mmio:0x4806E000 irq:35 tx:0 rx:0

Of course, there won't be anything of interest there because I'm
talking about the *transmit* side on the OMAP4430 and there's no
way to detect or monitor errors in the transmit side.

The ftdi-sio driver on the host machine, which would be involved in
the receive, doesn't keep statistics and make them available through
/proc.  (Another reason why I hate usb-serial based development
boards.)
Russell King - ARM Linux admin April 18, 2018, 12:47 p.m. UTC | #6
On Wed, Apr 18, 2018 at 12:00:33PM +0100, Russell King - ARM Linux wrote:
> No idea at the moment.  Looking at the bitstream with a scope is the
> next step, but it's not easy to do that with just two hands.  I also
> need to find some way to trigger it reliably.
> 
> Another cause could be that the UART pin is being held high/low for
> some reason (maybe a pinmux problem.)
> 
> Another interesting observation is that if I login over the network and
> then do:
> 
> 	while :; do :; done &
> 	while :; do :; done &
> 
> to occupy both CPUs, and then do:
> 
> 	dmesg | less
> 
> on the console, the problem goes away.  If I only do one while loop,
> the problem is present, but the corruption looks like it happens at a
> different point in the serial stream.
> 
> This would seem to point the blame away from clocks or pinmux, and back
> to power management issues.
> 
> I've also tried mimicking the less output with a stand-alone program,
> and that doesn't exhibit the problem - I've tried with various initial
> delays between program start and first output, but this doesn't seem
> to have much effect.  So it seems to need rather precise timing.
> 
> stracing less does change where the corruption happens in the output,
> which also suggests a timing related cause.

Okay, I think I'm getting somewhere...  `less' does an ioctl(, TCSETS, )
after outputting a screenful in order to change c_iflag and c_lflag.
The differences are:

	c_iflag 0x1500 -> 0x1000
	c_lflag 0x083b -> 0x0831

Other settings are kept the same.

The iflag changes are IXON | ICRNL, and the lflag changes are
ECHO | ICANON.  Reproducing those changes in my test program shows
the same corruption.
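
The arithmetic behind those flag names can be checked with a small sketch. It assumes the Linux termios bit values used on ARM and x86 (IXON 0x400, ICRNL 0x100, ECHO 0x8, ICANON 0x2); changed_bits() and print_changed() are illustrative names, not part of any existing program.

```c
#include <stdio.h>
#include <termios.h>

/* Bits that differ between two flag words. */
unsigned int changed_bits(unsigned int before, unsigned int after)
{
	return before ^ after;
}

/* Print which of the flags named above are among the changed bits. */
void print_changed(unsigned int iflag_diff, unsigned int lflag_diff)
{
	if (iflag_diff & IXON)
		puts("c_iflag: IXON");
	if (iflag_diff & ICRNL)
		puts("c_iflag: ICRNL");
	if (lflag_diff & ECHO)
		puts("c_lflag: ECHO");
	if (lflag_diff & ICANON)
		puts("c_lflag: ICANON");
}
```

With the values above, changed_bits(0x1500, 0x1000) is 0x0500 and changed_bits(0x083b, 0x0831) is 0x000a, which on Linux correspond to IXON | ICRNL and ECHO | ICANON respectively.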

Removing the lflag changes makes no difference.  Removing the ICRNL
also makes no difference - the problem is still there.  Removing
the IXON change and the problem vanishes.
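
The test program itself was not posted, so the following is only a guess at its core: write some output, then immediately clear IXON with an immediate-mode tcsetattr(), which is the combination that seemed to trigger the corruption. The function name write_then_clear_ixon() is hypothetical.

```c
#include <string.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical reproducer core: queue msg on fd, then change c_iflag
 * right away while the UART may still be shifting characters out.
 * Returns 0 on success, -1 on any error.
 */
int write_then_clear_ixon(int fd, const char *msg)
{
	struct termios tio;

	if (tcgetattr(fd, &tio) < 0)
		return -1;	/* fd is not a terminal */
	if (write(fd, msg, strlen(msg)) < 0)
		return -1;
	tio.c_iflag &= ~IXON;	/* the one change that mattered in testing */
	/* TCSANOW maps to the TCSETS ioctl: apply without waiting. */
	return tcsetattr(fd, TCSANOW, &tio);
}
```

Calling this on the console device with a screenful of text would be one way to try it; whether it reproduces the problem on other hardware is untested.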

Given that the serial driver rewrites the entire UART configuration
on a termios change that affects any hardware settings, this is
rather expected to happen.

So, the question becomes whether userspace is acting correctly - and
I'd say no.  Looking at _real_ `less' (iow, not the busybox version
that I seem to have on the OMAP4430) it doesn't do this fiddling with
termios settings just before waiting for input.  Moreover, I can't see
_any_ reason for `less' of any kind to be fiddling with IXON.

There is the remaining question about the proper behaviour of setting
termios modes while there is a transmit operation in progress - I know
of several programs that do this.  A TCSETS operation is defined to
occur "immediately" by the spec, but is it reasonable to change the
modes mid-transmission of a character (which _will_ corrupt the
character), or should they be changed at a character boundary (or at
whatever character boundary the hardware is capable of.)
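
POSIX does give userspace a way to ask for the change at a safe point: tcsetattr() with TCSADRAIN (the TCSETSW ioctl on Linux) applies the new settings only after all output written to the descriptor has been transmitted. A minimal sketch follows; clear_ixon_after_drain() is an illustrative name, and whether a given driver's idea of "transmitted" includes the UART shift register is not something this sketch establishes.

```c
#include <termios.h>

/*
 * Hypothetical helper: clear IXON, but let pending output drain first.
 * Returns 0 on success, -1 on error (for example if fd is not a tty).
 */
int clear_ixon_after_drain(int fd)
{
	struct termios tio;

	if (tcgetattr(fd, &tio) < 0)
		return -1;
	tio.c_iflag &= ~IXON;
	/* TCSADRAIN: change takes effect after queued output is sent. */
	return tcsetattr(fd, TCSADRAIN, &tio);
}
```

This does not answer what the kernel should do for TCSANOW, but it is the option available to a program that wants to avoid changing modes mid-character.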

I note that if DMA is enabled, 8250_omap delays a TCSETS operation
until DMA has completed, so I suspect that the problem I'm seeing
will go away if I enable DMA.

Patch

diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index 6aaa84355fd1..8ab9d0a1b1eb 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -163,11 +163,6 @@  static void omap_8250_mdr1_errataset(struct uart_8250_port *up,
                                     struct omap8250_priv *priv)
 {
        u8 timeout = 255;
-       u8 old_mdr1;
-
-       old_mdr1 = serial_in(up, UART_OMAP_MDR1);
-       if (old_mdr1 == priv->mdr1)
-               return;
 
        serial_out(up, UART_OMAP_MDR1, priv->mdr1);
        udelay(2);