
[V3,3/3] can: m_can: workaround for transmit data less than 4 bytes

Message ID 1415193393-30023-3-git-send-email-b29396@freescale.com (mailing list archive)
State New, archived

Commit Message

Aisheng Dong Nov. 5, 2014, 1:16 p.m. UTC
At least on the i.MX6SX TO1.2 with M_CAN IP version 3.0.1, an issue with
the Message RAM was discovered. Sending CAN frames with dlc less
than 4 bytes will lead to bit errors, when the first 8 bytes of
the Message RAM have not been initialized (i.e. written to).
To work around this issue, the first 8 bytes are initialized in open()
function.

Without the workaround, we can easily see the following errors:
root@imx6qdlsolo:~# ip link set can0 up type can bitrate 1000000
[   66.882520] IPv6: ADDRCONF(NETDEV_CHANGE): can0: link becomes ready
root@imx6qdlsolo:~# cansend can0 123#112233
[   66.935640] m_can 20e8000.can can0: Bit Error Uncorrected

Signed-off-by: Dong Aisheng <b29396@freescale.com>
---
ChangeLog
v2->v3:
 * add i.MX chip version in issue in commit message
v1->v2:
 * initialize the first 8 bytes of Tx Buffer of Message RAM in open()
   to work around the issue
---
 drivers/net/can/m_can/m_can.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Marc Kleine-Budde Nov. 5, 2014, 2:29 p.m. UTC | #1
On 11/05/2014 02:16 PM, Dong Aisheng wrote:
> At least on the i.MX6SX TO1.2 with M_CAN IP version 3.0.1, an issue with
> the Message RAM was discovered. Sending CAN frames with dlc less
> than 4 bytes will lead to bit errors, when the first 8 bytes of
> the Message RAM have not been initialized (i.e. written to).
> To work around this issue, the first 8 bytes are initialized in open()
> function.
> 
> Without the workaround, we can easily see the following errors:
> root@imx6qdlsolo:~# ip link set can0 up type can bitrate 1000000
> [   66.882520] IPv6: ADDRCONF(NETDEV_CHANGE): can0: link becomes ready
> root@imx6qdlsolo:~# cansend can0 123#112233
> [   66.935640] m_can 20e8000.can can0: Bit Error Uncorrected
> 
> Signed-off-by: Dong Aisheng <b29396@freescale.com>

Applied to can/master

Thanks,
Marc
Oliver Hartkopp Nov. 5, 2014, 6:15 p.m. UTC | #2
Hi all,

just to close this point, which is relevant for an application note ...

I got an answer from Florian Hartwich (Mr. CAN) from Bosch regarding the bit 
error detection found by Dong Aisheng.

The relevant interrupts IR.BEU or IR.BEC monitor the message RAM:

Bit 21 BEU: Bit Error Uncorrected
Message RAM bit error detected, uncorrected. Controlled by input signal 
m_can_aeim_berr[1] generated by an optional external parity / ECC logic 
attached to the Message RAM. An uncorrected Message RAM bit error sets 
CCCR.INIT to ‘1’. This is done to avoid transmission of corrupted data.

0= No bit error detected when reading from Message RAM
1= Bit error detected, uncorrected (e.g. parity logic)

Bit 20 BEC: Bit Error Corrected
Message RAM bit error detected and corrected. Controlled by input signal 
m_can_aeim_berr[0] generated by an optional external parity / ECC logic 
attached to the Message RAM.

0= No bit error detected when reading from Message RAM
1= Bit error detected and corrected (e.g. ECC)
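
A minimal sketch of how the two flags quoted above could be decoded from a
raw IR register value. Only the bit positions (20 and 21) come from the
register description; the macro names, the enum and classify_ir() are
hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Bit positions as quoted from the register description above. */
#define IR_BEC (UINT32_C(1) << 20)	/* Bit Error Corrected */
#define IR_BEU (UINT32_C(1) << 21)	/* Bit Error Uncorrected */

/* Hypothetical result type for this sketch. */
enum mram_err {
	MRAM_OK,
	MRAM_CORRECTED,
	MRAM_UNCORRECTED,
};

/* Classify a raw IR value; the uncorrected error takes precedence
 * because it is the one that sets CCCR.INIT and stops transmission.
 */
static enum mram_err classify_ir(uint32_t ir)
{
	if (ir & IR_BEU)
		return MRAM_UNCORRECTED;
	if (ir & IR_BEC)
		return MRAM_CORRECTED;
	return MRAM_OK;
}
```

With this, an IR value of 0x00200000 would be classified as an uncorrected
error, which matches the "Bit Error Uncorrected" message in the original
report.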

---

The Message RAM is usually equipped with a parity or ECC functionality.
But RAM cells suffer a hardware reset and can therefore hold arbitrary content 
at startup - including parity and/or ECC bits.

So when you write only the CAN ID and the first four bytes the last four bytes 
remain untouched. Then the M_CAN starts to read in 32bit words from the start 
of the Tx Message element. So it is very likely to trigger the message RAM 
error when reading the uninitialized 32bit word from the last four bytes.

Finally it turns out that an initial writing (with any kind of data) to the 
entire message RAM is mandatory to create valid parity/ECC checksums.
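
The effect can be illustrated with a small software model. This is only a
sketch under assumptions: it models one even-parity bit per 32-bit word,
whereas the real protection logic (parity or ECC, and its width) is
integration specific. All names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one protected RAM cell: 32 data bits plus one
 * stored parity bit. After power-up both fields hold arbitrary values.
 */
struct mram_word {
	uint32_t data;
	uint8_t parity;
};

/* Returns 1 if the number of set bits in v is odd, else 0. */
static uint8_t parity32(uint32_t v)
{
	v ^= v >> 16;
	v ^= v >> 8;
	v ^= v >> 4;
	v ^= v >> 2;
	v ^= v >> 1;
	return (uint8_t)(v & 1u);
}

/* A write stores the data together with a matching parity bit. */
static void mram_write(struct mram_word *w, uint32_t val)
{
	w->data = val;
	w->parity = parity32(val);
}

/* A read returns false when stored and recomputed parity disagree. */
static bool mram_read(const struct mram_word *w, uint32_t *val)
{
	*val = w->data;
	return parity32(w->data) == w->parity;
}
```

In this model a cell that was never written can fail mram_read() purely
because its stored parity bit is random, while any prior mram_write(), with
any value, makes the following read pass.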

That's it.

Regards,
Oliver
Aisheng Dong Nov. 6, 2014, 1:57 a.m. UTC | #3
On Wed, Nov 05, 2014 at 07:15:10PM +0100, Oliver Hartkopp wrote:
> Hi all,
> 
> just to close this application note relevant point ...
> 
> I got an answer from Florian Hartwich (Mr. CAN) from Bosch regarding
> the bit error detection found by Dong Aisheng.
> 
> The relevant interrupts IR.BEU or IR.BEC monitor the message RAM:
> 
> Bit 21 BEU: Bit Error Uncorrected
> Message RAM bit error detected, uncorrected. Controlled by input
> signal m_can_aeim_berr[1] generated by an optional external parity /
> ECC logic attached to the Message RAM. An uncorrected Message RAM
> bit error sets CCCR.INIT to ‘1’. This is done to avoid transmission
> of corrupted data.
> 
> 0= No bit error detected when reading from Message RAM
> 1= Bit error detected, uncorrected (e.g. parity logic)
> 
> Bit 20 BEC: Bit Error Corrected
> Message RAM bit error detected and corrected. Controlled by input
> signal m_can_aeim_berr[0] generated by an optional external parity /
> ECC logic attached to the Message RAM.
> 
> 0= No bit error detected when reading from Message RAM
> 1= Bit error detected and corrected (e.g. ECC)
> 
> ---
> 
> The Message RAM is usually equipped with a parity or ECC functionality.
> But RAM cells suffer a hardware reset and can therefore hold
> arbitrary content at startup - including parity and/or ECC bits.
> 
> So when you write only the CAN ID and the first four bytes the last
> four bytes remain untouched. Then the M_CAN starts to read in 32bit
> words from the start of the Tx Message element. So it is very likely
> to trigger the message RAM error when reading the uninitialized
> 32bit word from the last four bytes.
> 
> Finally it turns out that an initial writing (with any kind of data)
> to the entire message RAM is mandatory to create valid parity/ECC
> checksums.
> 
> That's it.
> 

Thanks for sharing this information.
Does it mean this issue is related to the nature of Message RAM and is
supposed to exist on all M_CAN IP versions?

> Regards,
> Oliver
> 

Regards
Dong Aisheng
Oliver Hartkopp Nov. 6, 2014, 7:04 a.m. UTC | #4
On 06.11.2014 02:57, Dong Aisheng wrote:
> On Wed, Nov 05, 2014 at 07:15:10PM +0100, Oliver Hartkopp wrote:

>> The Message RAM is usually equipped with a parity or ECC functionality.
>> But RAM cells suffer a hardware reset and can therefore hold
>> arbitrary content at startup - including parity and/or ECC bits.
>>
>> So when you write only the CAN ID and the first four bytes the last
>> four bytes remain untouched. Then the M_CAN starts to read in 32bit
>> words from the start of the Tx Message element. So it is very likely
>> to trigger the message RAM error when reading the uninitialized
>> 32bit word from the last four bytes.
>>
>> Finally it turns out that an initial writing (with any kind of data)
>> to the entire message RAM is mandatory to create valid parity/ECC
>> checksums.
>>
>> That's it.
>>
>
> Thanks for sharing this information.
> Does it mean this issue is related to the nature of Message RAM and is
> supposed to exist on all M_CAN IP versions?

From what I know of the 3.1.x revision, there's no change regarding IR.BEU 
and IR.BEC - so I would assume this applies to all M_CAN IP revisions.

But after some sleep I wonder if this patch [3/3] would need an update too.

Writing to the TX message RAM is obviously no workaround but a valid and 
needed initialization process.

I would tend to make this patch:

---

can: m_can: add missing TX message RAM initialization

The M_CAN message RAM is usually equipped with a parity or ECC functionality.
But RAM cells suffer a hardware reset and can therefore hold arbitrary content 
at startup - including parity and/or ECC bits.

To prevent the M_CAN controller detecting checksum errors when reading 
potentially uninitialized TX message RAM content to transmit CAN frames the TX 
message RAM has to be written with (any kind of) initial data.

---

Then the code should memset() the entire TX FIFO element - and not only the 8 
data bytes we are addressing now.

Maybe it makes sense to send the entire updated patch set (3) again ...

[1/3] can: add can_is_canfd_skb() API
[2/3] can: m_can: update to support CAN FD features
[3/3] can: m_can: add missing message RAM initialization

Are you ok with that?

Regards,
Oliver
Aisheng Dong Nov. 6, 2014, 8:09 a.m. UTC | #5
On Thu, Nov 06, 2014 at 08:04:17AM +0100, Oliver Hartkopp wrote:
> On 06.11.2014 02:57, Dong Aisheng wrote:
> >On Wed, Nov 05, 2014 at 07:15:10PM +0100, Oliver Hartkopp wrote:
> 
> >>The Message RAM is usually equipped with a parity or ECC functionality.
> >>But RAM cells suffer a hardware reset and can therefore hold
> >>arbitrary content at startup - including parity and/or ECC bits.
> >>
> >>So when you write only the CAN ID and the first four bytes the last
> >>four bytes remain untouched. Then the M_CAN starts to read in 32bit
> >>words from the start of the Tx Message element. So it is very likely
> >>to trigger the message RAM error when reading the uninitialized
> >>32bit word from the last four bytes.
> >>
> >>Finally it turns out that an initial writing (with any kind of data)
> >>to the entire message RAM is mandatory to create valid parity/ECC
> >>checksums.
> >>
> >>That's it.
> >>
> >
> >Thanks for sharing this information.
> >Does it mean this issue is related to the nature of Message RAM and is
> >supposed to exist on all M_CAN IP versions?
> 
> From what I know from the 3.1.x revision there's no change regarding
> IR.BEU and IR.BEC - so I would assume this to stay on all M_CAN IP
> revisions.
> 
> But after some sleep I wonder if this patch [3/3] would need an update too.
> 
> Writing to the TX message RAM is obviously no workaround but a valid
> and needed initialization process.
> 
> I would tend to make this patch:
> 
> ---
> 
> can: m_can: add missing TX message RAM initialization
> 
> The M_CAN message RAM is usually equipped with a parity or ECC functionality.
> But RAM cells suffer a hardware reset and can therefore hold
> arbitrary content at startup - including parity and/or ECC bits.
> 
> To prevent the M_CAN controller detecting checksum errors when
> reading potentially uninitialized TX message RAM content to transmit
> CAN frames the TX message RAM has to be written with (any kind of)
> initial data.
> 

The key question is why M_CAN reads potentially uninitialized
TX message RAM content at all, which should not happen.
E.g. in our case, if we send a frame with no data or with less
than 4 bytes, why does M_CAN read an extra 4 bytes of uninitialized/unset
data? That does not seem reasonable.

From the IP code logic, it reads the full 8 bytes of data no matter how much
data is actually to be transferred, which is strange.

When sending more than 4 bytes, the Message RAM content is filled
with the real data to be transferred, so there's no such issue.

> ---
> 
> Then the code should memset() the entire TX FIFO element - and not
> only the 8 data bytes we are addressing now.
> 

Our IC guy told me the issue only happens when transferring a data size
of less than 4 bytes, and my test also confirmed that.
So I'm not sure that memset() of the entire TX FIFO element is necessary...

Do you think we could keep the current solution for now and update it later
if needed?

> Maybe it makes sense to send the entire updated patch set (3) again ...
> 
> [1/3] can: add can_is_canfd_skb() API
> [2/3] can: m_can: update to support CAN FD features
> [3/3] can: m_can: add missing message RAM initialization
> 
> Are you ok with that?
> 
> Regards,
> Oliver
> 

Regards
Dong Aisheng
Marc Kleine-Budde Nov. 6, 2014, 9 a.m. UTC | #6
On 11/06/2014 08:04 AM, Oliver Hartkopp wrote:
> On 06.11.2014 02:57, Dong Aisheng wrote:
>> On Wed, Nov 05, 2014 at 07:15:10PM +0100, Oliver Hartkopp wrote:
> 
>>> The Message RAM is usually equipped with a parity or ECC functionality.
>>> But RAM cells suffer a hardware reset and can therefore hold
>>> arbitrary content at startup - including parity and/or ECC bits.
>>>
>>> So when you write only the CAN ID and the first four bytes the last
>>> four bytes remain untouched. Then the M_CAN starts to read in 32bit
>>> words from the start of the Tx Message element. So it is very likely
>>> to trigger the message RAM error when reading the uninitialized
>>> 32bit word from the last four bytes.
>>>
>>> Finally it turns out that an initial writing (with any kind of data)
>>> to the entire message RAM is mandatory to create valid parity/ECC
>>> checksums.
>>>
>>> That's it.
>>>
>>
>> Thanks for sharing this information.
>> Does it mean this issue is related to the nature of Message RAM and is
>> supposed to exist on all M_CAN IP versions?
> 
> From what I know from the 3.1.x revision there's no change regarding
> IR.BEU and IR.BEC - so I would assume this to stay on all M_CAN IP
> revisions.
> 
> But after some sleep I wonder if this patch [3/3] would need an update too.
> 
> Writing to the TX message RAM is obviously no workaround but a valid and
> needed initialization process.
> 
> I would tend to make this patch:
> 
> ---
> 
> can: m_can: add missing TX message RAM initialization
> 
> The M_CAN message RAM is usually equipped with a parity or ECC
> functionality.
> But RAM cells suffer a hardware reset and can therefore hold arbitrary
> content at startup - including parity and/or ECC bits.
> 
> To prevent the M_CAN controller detecting checksum errors when reading
> potentially uninitialized TX message RAM content to transmit CAN frames
> the TX message RAM has to be written with (any kind of) initial data.
> 
> ---
> 
> Then the code should memset() the entire TX FIFO element - and not only
> the 8 data bytes we are addressing now.

No literal memset() as this is iomem

Marc
Oliver Hartkopp Nov. 6, 2014, 12:33 p.m. UTC | #7
On 06.11.2014 09:09, Dong Aisheng wrote:
> On Thu, Nov 06, 2014 at 08:04:17AM +0100, Oliver Hartkopp wrote:


>> To prevent the M_CAN controller detecting checksum errors when
>> reading potentially uninitialized TX message RAM content to transmit
>> CAN frames the TX message RAM has to be written with (any kind of)
>> initial data.
>>
>
> The key point of the issue is that why M_CAN will read potentially uninitialized
> TX message RAM content which should not happen?
> e.g. for our case of the issue, if we sending a no data frame or a less
> than 4 bytes frame, why m_can will read extra 4 bytes uninitialized/unset
> data which seems not reasonable?
>
>  From IP code logic, it will read full 8 bytes of data no matter how many data
> actually to be transferred which is strange.

Yes.

>
> For sending data over 4 bytes, since the Message RAM content will be filled
> with the real data to be transferred so there's no such issue.

E.g. think about the transfer of a CAN FD frame with 32 bytes.
When you only fill in content up to byte 28, the last four bytes still remain 
uninitialized.

Did you try this 28-byte use case with an uninitialized TX message RAM?

cansend can0 123##1001122334566778899AABBCCDDEEFF001122334566778899AABB

To me it looks too risky if we only initialize the first 8 bytes.

>
>> ---
>>
>> Then the code should memset() the entire TX FIFO element - and not
>> only the 8 data bytes we are addressing now.
>>
>
> Our IC guy told me the issue only happened on transferring a data size
> of less than 4 bytes and my test also proved that.

'less than'?

So you might try to use 26 bytes too:

cansend can0 123##1001122334566778899AABBCCDDEEFF001122334566778899


> So i'm not sure memset() the entire TX FIFO element is necessary...

It's no big deal - so we should be defensive here.
And memset() does not work here, as Marc pointed out in another mail.

So we would need to loop with

	m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i), 0x0);

>
> Do you think we could keep the current solution firstly and updated later
> if needed?

No :-)

I would like to have all data bytes to be written at startup.

Regards,
Oliver
Marc Kleine-Budde Nov. 6, 2014, 12:47 p.m. UTC | #8
On 11/06/2014 01:33 PM, Oliver Hartkopp wrote:
>> So i'm not sure memset() the entire TX FIFO element is necessary...
> 
> It's no big deal - so we should be defensive here.
> And memset() is not working as Marc pointed out in another mail.
> 
> So we would need to loop with
> 
>     m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i), 0x0);
> 
>>
>> Do you think we could keep the current solution firstly and updated later
>> if needed?
> 
> No :-)
> 
> I would like to have all data bytes to be written at startup.

Me, too. As this happens only once during ifconfig up, it should not hurt
performance. Either send an incremental or a new patch; I'll sort it out.

Marc
Aisheng Dong Nov. 7, 2014, 8:34 a.m. UTC | #9
On Thu, Nov 06, 2014 at 01:33:56PM +0100, Oliver Hartkopp wrote:
> On 06.11.2014 09:09, Dong Aisheng wrote:
> >On Thu, Nov 06, 2014 at 08:04:17AM +0100, Oliver Hartkopp wrote:
> 
> 
> >>To prevent the M_CAN controller detecting checksum errors when
> >>reading potentially uninitialized TX message RAM content to transmit
> >>CAN frames the TX message RAM has to be written with (any kind of)
> >>initial data.
> >>
> >
> >The key point of the issue is that why M_CAN will read potentially uninitialized
> >TX message RAM content which should not happen?
> >e.g. for our case of the issue, if we sending a no data frame or a less
> >than 4 bytes frame, why m_can will read extra 4 bytes uninitialized/unset
> >data which seems not reasonable?
> >
> > From IP code logic, it will read full 8 bytes of data no matter how many data
> >actually to be transferred which is strange.
> 
> Yes.
> 
> >
> >For sending data over 4 bytes, since the Message RAM content will be filled
> >with the real data to be transferred so there's no such issue.
> 
> E.g. think about the transfer of a CAN FD frame with 32 byte.
> When you only fill up content until 28 byte the last four bytes
> still remain uninitialized.
> 
> Did you try this 28 byte use-case with an uninitialized TX message RAM ?
> 
> cansend can0 123##1001122334566778899AABBCCDDEEFF001122334566778899AABB
> 
> To me it looks too risky when we only initialize the first 8 byte.
> 

I tried the 28-byte case with two MX6SX SDB boards and it worked.
See below:
Tx side:
root@imx6sxsabresd:~# cansend can0 123##1001122334566778899AABBC566778899AABB334
Rx side:
root@imx6sxsabresd:~# candump -x can0
  can0  RX B -  123  [32]  00 11 22 33 45 66 77 88 99 AA BB CC DD EE FF 00 11 22 33 45 66 77 88 99 AA BB 00 00 00 00 00 00

I think this is mainly because the driver makes sure to write
the full 32 bytes to Message RAM even if we only fill in
28 bytes of content. The remaining 4 bytes written to M_RAM default to 0.
This seems to avoid the possibility of reading uninitialized TX message RAM
during transfer.

The code does this as follows:
for (i = 0; i < cf->len; i += 4)
        m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i / 4),
                         *(u32 *)(cf->data + i));
cansend rounds cf->len up to 32.
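
A minimal sketch of the arithmetic behind this, assuming the standard CAN FD
payload lengths (0..8, 12, 16, 20, 24, 32, 48, 64). Both function names are
hypothetical; tx_words_written() just counts the iterations of the loop
above.

```c
#include <stdint.h>

/* Round a payload length up to the next length a CAN FD frame can
 * carry: 0..8, 12, 16, 20, 24, 32, 48 or 64 bytes.
 */
static uint8_t canfd_round_len(uint8_t len)
{
	static const uint8_t valid[] = { 12, 16, 20, 24, 32, 48, 64 };
	unsigned int i;

	if (len <= 8)
		return len;

	for (i = 0; i < sizeof(valid) / sizeof(valid[0]); i++)
		if (len <= valid[i])
			return valid[i];

	return 64;	/* clamp anything larger to the maximum */
}

/* Number of 32-bit words written by
 * "for (i = 0; i < len; i += 4)", i.e. len rounded up to whole words.
 */
static unsigned int tx_words_written(uint8_t len)
{
	return (len + 3u) / 4u;
}
```

For 26 or 28 payload bytes the rounded length is 32, so the loop writes all
8 data words. For a 3-byte frame it writes only one word, so a second data
word read by the controller would never have been written.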

> >
> >>---
> >>
> >>Then the code should memset() the entire TX FIFO element - and not
> >>only the 8 data bytes we are addressing now.
> >>
> >
> >Our IC guy told me the issue only happened on transferring a data size
> >of less than 4 bytes and my test also proved that.
> 
> 'less than'?
> 

As I said before, from the IP code logic, M_CAN reads extra data bytes
from the TX buffer only when sending less than 4 bytes of data.
e.g.
cansend can0 123#
cansend can0 123#112233
Both cases read the full 8 bytes from the TX buffer, even though they send
no data and 3 bytes of data respectively.

But
cansend can0 123#1122334455
reads 8 bytes, and
cansend can0 123##1112233445566778899001122
reads 12 bytes.
No extra uninitialized data is read.

> So you might try to use 26 bytes too:
> 
> cansend can0 123##1001122334566778899AABBCCDDEEFF001122334566778899
> 
> 

It works too.

> >So i'm not sure memset() the entire TX FIFO element is necessary...
> 
> It's no big deal - so we should be defensive here.
> And memset() is not working as Marc pointed out in another mail.
> 
> So we would need to loop with
> 
> 	m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i), 0x0);
> 

This simple loop may not work.
m_can_fifo_write() only covers the Tx Buffer.

Since Message RAM may be shared, we may want to initialize each part of
Message RAM used by this M_CAN controller.
Something like the following in the probe() function:

/* initialize the entire Message RAM in use to avoid possible
 * ECC/parity checksum errors when reading an uninitialized buffer
 */
start = priv->mcfg[MRAM_SIDF].off;
end = priv->mcfg[MRAM_TXB].off +
      priv->mcfg[MRAM_TXB].num * TXB_ELEMENT_SIZE;
for (i = start; i < end; i += 4)
        writel(0x0, priv->mram_base + i);

I will send an updated patch for this.
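
The range computation and the word-wise loop can be tried outside the kernel
with a plain array standing in for the Message RAM. This is only a sketch:
the helper names are hypothetical, the element size is passed in as a
parameter rather than assumed, and a plain store replaces writel().

```c
#include <stddef.h>
#include <stdint.h>

/* End offset (exclusive, in bytes) of a section that starts at 'off'
 * and holds 'num' elements of 'elem_size' bytes each.
 */
static size_t mram_section_end(size_t off, size_t num, size_t elem_size)
{
	return off + num * elem_size;
}

/* Zero every 32-bit word in the byte range [start, end).
 * 'mram' stands in for the mapped Message RAM; in a driver each store
 * would be a writel() to the iomem region instead.
 */
static void mram_init_range(volatile uint32_t *mram, size_t start, size_t end)
{
	size_t i;

	for (i = start; i < end; i += 4)
		mram[i / 4] = 0x0;
}
```

Only the words inside [start, end) are touched, so parts of a shared Message
RAM that belong to another controller stay as they are.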

Regards
Dong Aisheng

> >
> >Do you think we could keep the current solution firstly and updated later
> >if needed?
> 
> No :-)
> 
> I would like to have all data bytes to be written at startup.
> 
> Regards,
> Oliver
>
Aisheng Dong Nov. 7, 2014, 8:40 a.m. UTC | #10
On Thu, Nov 06, 2014 at 01:47:20PM +0100, Marc Kleine-Budde wrote:
> On 11/06/2014 01:33 PM, Oliver Hartkopp wrote:
> >> So i'm not sure memset() the entire TX FIFO element is necessary...
> > 
> > It's no big deal - so we should be defensive here.
> > And memset() is not working as Marc pointed out in another mail.
> > 
> > So we would need to loop with
> > 
> >     m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i), 0x0);
> > 
> >>
> >> Do you think we could keep the current solution firstly and updated later
> >> if needed?
> > 
> > No :-)
> > 
> > I would like to have all data bytes to be written at startup.
> 
> Me, too. As this happens only once during ifconfig up it should not hurt
> performance, either send an incremental or new patch. I'll sort it out.
> 

I will send a new patch for this.

Regards
Dong Aisheng

> Marc
> 
> -- 
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
>

Patch

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index eee1533..567cd27 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -905,6 +905,16 @@  static void m_can_chip_config(struct net_device *dev)
 	/* set bittiming params */
 	m_can_set_bittiming(dev);
 
+	/* At least on the i.MX6SX TO1.2 with M_CAN IP version 3.0.1,
+	 * (CREL = 30130506) an issue with the Message RAM was discovered.
+	 * Sending CAN frames with dlc less than 4 bytes will lead to bit
+	 * errors, when the first 8 bytes of the Message RAM have not been
+	 * initialized (i.e. written to). To work around this issue, the
+	 * first 8 bytes are initialized here.
+	 */
+	m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(0), 0x0);
+	m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(1), 0x0);
+
 	m_can_config_endisable(priv, false);
 }