tg3: fix big endian MAC address collection failure

Message ID	1239636110.3278.29.camel@mulgrave.int.hansenpartnership.com (mailing list archive)
State	Not Applicable
Headers	show Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n3DFMMpF028678 for <patchwork-parisc@patchwork.kernel.org>; Mon, 13 Apr 2009 15:22:22 GMT Subject: [PATCH] tg3: fix big endian MAC address collection failure From: James Bottomley <James.Bottomley@HansenPartnership.com> To: linux-netdev@vger.kernel.org Cc: Parisc List <linux-parisc@vger.kernel.org>, Matt Carlson <mcarlson@broadcom.com>, Michael Chan <mchan@broadcom.com> Content-Type: text/plain Date: Mon, 13 Apr 2009 15:21:50 +0000 Message-Id: <1239636110.3278.29.camel@mulgrave.int.hansenpartnership.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-parisc-owner@vger.kernel.org Precedence: bulk

James Bottomley April 13, 2009, 3:21 p.m. UTC

We noticed on parisc that our broadcoms all swapped MAC addresses going
from 2.6.29 to 2.6.30-rc1:

Apr 11 07:48:24 ion kernel: eth0: Tigon3 [partno(BCM95700A6) rev 0105] (PCI:66MHz:64-bit) MAC address 00:30:6e:4b:15:59
Apr 13 07:34:34 ion kernel: eth0: Tigon3 [partno(BCM95700A6) rev 0105] (PCI:66MHz:64-bit) MAC address 00:00:59:15:4b:6e

The problem patch is:

commit 6d348f2c1e0bb1cf7a494b51fc921095ead3f6ae
Author: Matt Carlson <mcarlson@broadcom.com>
Date:   Wed Feb 25 14:25:52 2009 +0000

    tg3: Eliminate tg3_nvram_read_swab()

With the root cause being the use of memcpy to set the mac address:

   memcpy(&dev->dev_addr[0], ((char *)&hi) + 2, 2);
   memcpy(&dev->dev_addr[2], (char *)&lo, sizeof(lo));

This might work on little endian machines, but it can't on big endian
ones.  You have to use the original setting mechanism to be correct on
all architectures.

The attached patch fixes parisc.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

---




--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matt Carlson April 13, 2009, 6 p.m. UTC | #1

Hi James.  I'd like to think about this some more before we apply it.  I
investigated this problem using a 5703 on a SPARC Ultra 10, and all my
assumptions checked out.  I don't dispute that you and Robin are having
a problem.  I just don't understand where the root cause of the problem
is yet.  Stay tuned.

On Mon, Apr 13, 2009 at 08:21:50AM -0700, James Bottomley wrote:
> We noticed on parisc that our broadcoms all swapped MAC addresses going
> from 2.6.29 to 2.6.30-rc1:
> 
> Apr 11 07:48:24 ion kernel: eth0: Tigon3 [partno(BCM95700A6) rev 0105] (PCI:66MHz:64-bit) MAC address 00:30:6e:4b:15:59
> Apr 13 07:34:34 ion kernel: eth0: Tigon3 [partno(BCM95700A6) rev 0105] (PCI:66MHz:64-bit) MAC address 00:00:59:15:4b:6e
> 
> The problem patch is:
> 
> commit 6d348f2c1e0bb1cf7a494b51fc921095ead3f6ae
> Author: Matt Carlson <mcarlson@broadcom.com>
> Date:   Wed Feb 25 14:25:52 2009 +0000
> 
>     tg3: Eliminate tg3_nvram_read_swab()
> 
> With the root cause being the use of memcpy to set the mac address:
> 
>    memcpy(&dev->dev_addr[0], ((char *)&hi) + 2, 2);
>    memcpy(&dev->dev_addr[2], (char *)&lo, sizeof(lo));
> 
> This might work on little endian machines, but it can't on big endian
> ones.  You have to use the original setting mechanism to be correct on
> all architectures.
> 
> The attached patch fixes parisc.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> 
> ---
> 
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 6a736dd..7a837c4 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -12443,8 +12443,13 @@ static int __devinit tg3_get_device_address(struct tg3 *tp)
>  		/* Next, try NVRAM. */
>  		if (!tg3_nvram_read_be32(tp, mac_offset + 0, &hi) &&
>  		    !tg3_nvram_read_be32(tp, mac_offset + 4, &lo)) {
> -			memcpy(&dev->dev_addr[0], ((char *)&hi) + 2, 2);
> -			memcpy(&dev->dev_addr[2], (char *)&lo, sizeof(lo));
> +			dev->dev_addr[0] = ((hi >> 16) & 0xff);
> +			dev->dev_addr[1] = ((hi >> 24) & 0xff);
> +			dev->dev_addr[2] = ((lo >>  0) & 0xff);
> +			dev->dev_addr[3] = ((lo >>  8) & 0xff);
> +			dev->dev_addr[4] = ((lo >> 16) & 0xff);
> +			dev->dev_addr[5] = ((lo >> 24) & 0xff);
> +
>  		}
>  		/* Finally just fetch it out of the MAC control regs. */
>  		else {
> 
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kyle mcmartin April 13, 2009, 6:04 p.m. UTC | #2

On Mon, Apr 13, 2009 at 11:00:52AM -0700, Matt Carlson wrote:
> Hi James.  I'd like to think about this some more before we apply it.  I
> investigated this problem using a 5703 on a SPARC Ultra 10, and all my
> assumptions checked out.  I don't dispute that you and Robin are having
> a problem.  I just don't understand where the root cause of the problem
> is yet.  Stay tuned.
> 

Did your card in the sparc have an openfirmware ROM? If so, it probably
used the ofw tree to obtain parameters instead of the card nvram...

regards, Kyle
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matt Carlson April 13, 2009, 6:13 p.m. UTC | #3

On Mon, Apr 13, 2009 at 11:04:26AM -0700, Kyle McMartin wrote:
> On Mon, Apr 13, 2009 at 11:00:52AM -0700, Matt Carlson wrote:
> > Hi James.  I'd like to think about this some more before we apply it.  I
> > investigated this problem using a 5703 on a SPARC Ultra 10, and all my
> > assumptions checked out.  I don't dispute that you and Robin are having
> > a problem.  I just don't understand where the root cause of the problem
> > is yet.  Stay tuned.
> > 
> 
> Did your card in the sparc have an openfirmware ROM? If so, it probably
> used the ofw tree to obtain parameters instead of the card nvram...

No, the test was run using a NIC.

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

James Bottomley April 13, 2009, 6:15 p.m. UTC | #4

On Mon, 2009-04-13 at 11:00 -0700, Matt Carlson wrote:
> Hi James.  I'd like to think about this some more before we apply it.  I
> investigated this problem using a 5703 on a SPARC Ultra 10, and all my
> assumptions checked out.

Um, I think you'll find the SPARC has a special routine for extracting
the MAC address from Open Firmware, so it doesn't exercise the NVRAM
path.

>   I don't dispute that you and Robin are having
> a problem.  I just don't understand where the root cause of the problem
> is yet.  Stay tuned.

The root cause is basically that if you get the MAC from NVRAM into two
native 32 bit words, you have to be careful about how you convert them
to a byte stream for the MAC address.  You can't use a memcpy from the
native number to the MAC because it's not endian invariant (it copies
the numbers the opposite ways around on big and little endian boxes).

James

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

James Bottomley April 13, 2009, 6:18 p.m. UTC | #5

On Mon, 2009-04-13 at 11:13 -0700, Matt Carlson wrote:
> On Mon, Apr 13, 2009 at 11:04:26AM -0700, Kyle McMartin wrote:
> > On Mon, Apr 13, 2009 at 11:00:52AM -0700, Matt Carlson wrote:
> > > Hi James.  I'd like to think about this some more before we apply it.  I
> > > investigated this problem using a 5703 on a SPARC Ultra 10, and all my
> > > assumptions checked out.  I don't dispute that you and Robin are having
> > > a problem.  I just don't understand where the root cause of the problem
> > > is yet.  Stay tuned.
> > > 
> > 
> > Did your card in the sparc have an openfirmware ROM? If so, it probably
> > used the ofw tree to obtain parameters instead of the card nvram...
> 
> No, the test was run using a NIC.

Possibly, then, the MAC comes from the MBOX, which looks correct?

tg3_get_device_address() has four mechanisms to get the mac address

     1. A sparc specific open firmware boot mac
     2. The card mac mbox
     3. the NVRAM (this is the failing one)
     4. a sparc specific default fallback if none of the above worked

The mbox route doesn't have a memcpy, only the NVRAM one.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matt Carlson April 13, 2009, 6:24 p.m. UTC | #6

On Mon, Apr 13, 2009 at 11:18:37AM -0700, James Bottomley wrote:
> On Mon, 2009-04-13 at 11:13 -0700, Matt Carlson wrote:
> > On Mon, Apr 13, 2009 at 11:04:26AM -0700, Kyle McMartin wrote:
> > > On Mon, Apr 13, 2009 at 11:00:52AM -0700, Matt Carlson wrote:
> > > > Hi James.  I'd like to think about this some more before we apply it.  I
> > > > investigated this problem using a 5703 on a SPARC Ultra 10, and all my
> > > > assumptions checked out.  I don't dispute that you and Robin are having
> > > > a problem.  I just don't understand where the root cause of the problem
> > > > is yet.  Stay tuned.
> > > > 
> > > 
> > > Did your card in the sparc have an openfirmware ROM? If so, it probably
> > > used the ofw tree to obtain parameters instead of the card nvram...
> > 
> > No, the test was run using a NIC.
> 
> Possibly, then, the MAC comes from the MBOX, which looks correct?
> 
> tg3_get_device_address() has four mechanisms to get the mac address
> 
>      1. A sparc specific open firmware boot mac
>      2. The card mac mbox
>      3. the NVRAM (this is the failing one)
>      4. a sparc specific default fallback if none of the above worked
> 
> The mbox route doesn't have a memcpy, only the NVRAM one.
> 
> James

On this particular adapter, the MAC address would be obtained through
shared memory.  I had to modify the driver to force the code to obtain
the MAC address through NVRAM.  I printed both MAC addresses and they match.

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matt Carlson April 13, 2009, 6:37 p.m. UTC | #7

On Mon, Apr 13, 2009 at 11:15:16AM -0700, James Bottomley wrote:
> On Mon, 2009-04-13 at 11:00 -0700, Matt Carlson wrote:
> > Hi James.  I'd like to think about this some more before we apply it.  I
> > investigated this problem using a 5703 on a SPARC Ultra 10, and all my
> > assumptions checked out.
> 
> Um, I think you'll find the SPARC has a special routine for extracting
> the MAC address from Open Firmware, so it doesn't exercise the NVRAM
> path.

Yes, it does.  The driver moves beyond that point because the routine
fails on this system.  (It's really old and doesn't have a Broadcom LOM
on it.)

> >   I don't dispute that you and Robin are having
> > a problem.  I just don't understand where the root cause of the problem
> > is yet.  Stay tuned.
> 
> The root cause is basically that if you get the MAC from NVRAM into two
> native 32 bit words, you have to be careful about how you convert them
> to a byte stream for the MAC address.  You can't use a memcpy from the
> native number to the MAC because it's not endian invariant (it copies
> the numbers the opposite ways around on big and little endian boxes).
> 
> James

But that is exactly what the code is doing.  tg3_nvram_read_be32() will
return the data in bytestream format.  A memcpy() should be all that is
needed to transport the data to a different memory location.

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

James Bottomley April 13, 2009, 8:48 p.m. UTC | #8

On Mon, 2009-04-13 at 11:37 -0700, Matt Carlson wrote:
> But that is exactly what the code is doing.  tg3_nvram_read_be32() will
> return the data in bytestream format.  A memcpy() should be all that is
> needed to transport the data to a different memory location.

But not the one you've done.  cpu_to_be32 is a nop pass through on our
architecture, so tg3_nvram_read_be32 is equivalent to tg3_nvram_read on
our architecture (i.e. identical to the code that was doing the read in
2.6.29).  However, the memcpy is the wrong way around for us.  If you
look at an example, the original code said

dev_addr[0] = hi >> 16;
dev_addr[1] = hi >> 24

So MSB-1 and MSB.  However, on a BE machine these are at offset one and
zero from the start of the word.  The replacement memcopy is:

memcpy(&dev->dev_addr[0], ((char *)&hi) + 2, 2)

i.e. offset 3 and 4, which actually copies LSB-1 and LSB into there.
You can follow similar logic to show that the lo copy is wrong too.

Perhaps the fix is just to put the tg3_nvram_read() back as well as the
original by loads?

James

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matt Carlson April 14, 2009, 1:17 a.m. UTC | #9

Sorry for the delay.  I just ran the same experiment on an ia64 we have
in-house.  The results were the same.  The machine was a HP integrity
BL870C pre-production blade with dual 5704S LOMs.  More responses
inline.

On Mon, Apr 13, 2009 at 01:48:18PM -0700, James Bottomley wrote:
> On Mon, 2009-04-13 at 11:37 -0700, Matt Carlson wrote:
> > But that is exactly what the code is doing.  tg3_nvram_read_be32() will
> > return the data in bytestream format.  A memcpy() should be all that is
> > needed to transport the data to a different memory location.
> 
> But not the one you've done.  cpu_to_be32 is a nop pass through on our
> architecture, so tg3_nvram_read_be32 is equivalent to tg3_nvram_read on
> our architecture (i.e. identical to the code that was doing the read in
> 2.6.29).

tg3_nvram_read() returns a 32-bit word of NVRAM data, swapped according to
the byteswapping rules for all register accesses.  On big endian machines,
the data returned will be exactly as it would have been in NVRAM.  On
little endian machines, the data will be swapped when compared to a raw
read of the NVRAM.  To bring the data back to a bytestream format on LE
systems, a swap is needed.  After that, a memcpy should be all that is
required to extract the MAC address.

Code that operates on data in a bytestream format will necessarily look
different than code that operates on data as a 32-bit native endian
quantity.  The code changes can look a bit drastic and are easily
culpable at first glance.

> However, the memcpy is the wrong way around for us.

It shouldn't be, and that is the problem.  We are trying to determine
why the data isn't in the form we think it should be in.  The most
difficult part of this discussion is convincing ourselves that the end
result is the same, even though the path to get there changes in
significant ways.

The old code read the data from NVRAM, did a blind swap of the data and
then operated on that data as a 32-bit quantity.  The blind swap was
completely superfluous and only changed which byte the driver attempted
to get its data.  An important property of this version is that the
same code when working with machines of either endianness.

If you are with me this far, then hopefully you can see that, while the
two versions of the code appear different, they wind up with the same
result.  The change was made because the code became much more readable
when we describe the operations as a bytestream.

> If you look at an example, the original code said
> 
> dev_addr[0] = hi >> 16;
> dev_addr[1] = hi >> 24
> 
> So MSB-1 and MSB.  However, on a BE machine these are at offset one and
> zero from the start of the word.  The replacement memcopy is:
> 
> memcpy(&dev->dev_addr[0], ((char *)&hi) + 2, 2)
> 
> i.e. offset 3 and 4, which actually copies LSB-1 and LSB into there.
> You can follow similar logic to show that the lo copy is wrong too.

Yes.  The code will be different depending on whether you treat the data
as a native endian 32-bit word or as a 4 byte segment of a bytestream.

> Perhaps the fix is just to put the tg3_nvram_read() back as well as the
> original by loads?

The code is much more readable with the changes in place.  I'd rather
not back them out.  I'd like to find out why you are getting different
results than I am with my tests.

I just saw your NVRAM dump post.  Perhaps that'll shed some light on the
matter.

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Grant Grundler April 14, 2009, 3:39 p.m. UTC | #10

On Mon, Apr 13, 2009 at 06:17:10PM -0700, Matt Carlson wrote:
> Sorry for the delay.  I just ran the same experiment on an ia64 we have
> in-house.  The results were the same.  The machine was a HP integrity
> BL870C pre-production blade with dual 5704S LOMs.  More responses
> inline.

Matt,
sorry - why test this on ia64?
ia64 is also little endian.
parisc is big endian.

> On Mon, Apr 13, 2009 at 01:48:18PM -0700, James Bottomley wrote:
> > On Mon, 2009-04-13 at 11:37 -0700, Matt Carlson wrote:
> > > But that is exactly what the code is doing.  tg3_nvram_read_be32() will
> > > return the data in bytestream format.  A memcpy() should be all that is
> > > needed to transport the data to a different memory location.
> > 
> > But not the one you've done.  cpu_to_be32 is a nop pass through on our
> > architecture, so tg3_nvram_read_be32 is equivalent to tg3_nvram_read on
> > our architecture (i.e. identical to the code that was doing the read in
> > 2.6.29).
> 
> tg3_nvram_read() returns a 32-bit word of NVRAM data, swapped according to
> the byteswapping rules for all register accesses.

Lets review that calling order I found in 2.6.29-rc8:
	tg3_nvram_read -> tr32 -> tg3_read32 -> readl -> swab

o The swab is actually in tg3_nvram_read.
o one should be able to get rid of indirect function call in tr32.

>  On big endian machines,
> the data returned will be exactly as it would have been in NVRAM.  On
> little endian machines, the data will be swapped when compared to a raw
> read of the NVRAM.

Is this referring to the returned value from readl() or from tg3_nvram_read()?
The readl() return value will be the same in both LE and BE architectures.
readl() adjusts for endianess.
__raw_readl() does not.

> To bring the data back to a bytestream format on LE
> systems, a swap is needed. 

For the above code path, if swap is needed on LE, it's also needed on BE.

> After that, a memcpy should be all that is
> required to extract the MAC address.
> 
> Code that operates on data in a bytestream format will necessarily look
> different than code that operates on data as a 32-bit native endian
> quantity.  The code changes can look a bit drastic and are easily
> culpable at first glance.

This is true for code paths that deal with DMA data. It's not true
for readl() or writel() since those are already taking care of
the endianess.

> > However, the memcpy is the wrong way around for us.
> 
> It shouldn't be, and that is the problem.  We are trying to determine
> why the data isn't in the form we think it should be in.  The most
> difficult part of this discussion is convincing ourselves that the end
> result is the same, even though the path to get there changes in
> significant ways.

It's not that hard. readl() is already compensating for endianess.

> The old code read the data from NVRAM, did a blind swap of the data and
> then operated on that data as a 32-bit quantity.  The blind swap was
> completely superfluous and only changed which byte the driver attempted
> to get its data.

The swap is not superfluous if one is using memcpy.

>  An important property of this version is that the
> same code when working with machines of either endianness.
> 
> If you are with me this far, then hopefully you can see that, while the
> two versions of the code appear different, they wind up with the same
> result.  The change was made because the code became much more readable
> when we describe the operations as a bytestream.

I think james already demonstrated that the result is not the same.

I'll stop here since most of the discussion is based on not knowing
that readl() takes care of endianess.

hth,
grant

> > If you look at an example, the original code said
> > 
> > dev_addr[0] = hi >> 16;
> > dev_addr[1] = hi >> 24
> > 
> > So MSB-1 and MSB.  However, on a BE machine these are at offset one and
> > zero from the start of the word.  The replacement memcopy is:
> > 
> > memcpy(&dev->dev_addr[0], ((char *)&hi) + 2, 2)
> > 
> > i.e. offset 3 and 4, which actually copies LSB-1 and LSB into there.
> > You can follow similar logic to show that the lo copy is wrong too.
> 
> Yes.  The code will be different depending on whether you treat the data
> as a native endian 32-bit word or as a 4 byte segment of a bytestream.
> 
> > Perhaps the fix is just to put the tg3_nvram_read() back as well as the
> > original by loads?
> 
> The code is much more readable with the changes in place.  I'd rather
> not back them out.  I'd like to find out why you are getting different
> results than I am with my tests.
> 
> I just saw your NVRAM dump post.  Perhaps that'll shed some light on the
> matter.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matt Carlson April 14, 2009, 7:02 p.m. UTC | #11

On Tue, Apr 14, 2009 at 08:39:11AM -0700, Grant Grundler wrote:
> On Mon, Apr 13, 2009 at 06:17:10PM -0700, Matt Carlson wrote:
> > Sorry for the delay.  I just ran the same experiment on an ia64 we have
> > in-house.  The results were the same.  The machine was a HP integrity
> > BL870C pre-production blade with dual 5704S LOMs.  More responses
> > inline.
> 
> Matt,
> sorry - why test this on ia64?
> ia64 is also little endian.
> parisc is big endian.

Like Michael said in another post, Robin Holt experienced a similar (the
same?) problem on an IA64.  We were thinking that a problem reproduction
on one machine might imply the same problem on the other.

> > On Mon, Apr 13, 2009 at 01:48:18PM -0700, James Bottomley wrote:
> > > On Mon, 2009-04-13 at 11:37 -0700, Matt Carlson wrote:
> > > > But that is exactly what the code is doing.  tg3_nvram_read_be32() will
> > > > return the data in bytestream format.  A memcpy() should be all that is
> > > > needed to transport the data to a different memory location.
> > > 
> > > But not the one you've done.  cpu_to_be32 is a nop pass through on our
> > > architecture, so tg3_nvram_read_be32 is equivalent to tg3_nvram_read on
> > > our architecture (i.e. identical to the code that was doing the read in
> > > 2.6.29).
> > 
> > tg3_nvram_read() returns a 32-bit word of NVRAM data, swapped according to
> > the byteswapping rules for all register accesses.
> 
> Lets review that calling order I found in 2.6.29-rc8:
> 	tg3_nvram_read -> tr32 -> tg3_read32 -> readl -> swab
> 
> o The swab is actually in tg3_nvram_read.

Right.

> o one should be able to get rid of indirect function call in tr32.

Well, there are situations where the read operation needs to be done
using a different method.  So that the reader could concentrate on the
higher-level algorthm, we hide the implementation details of the read
through a function pointer.

> >  On big endian machines,
> > the data returned will be exactly as it would have been in NVRAM.  On
> > little endian machines, the data will be swapped when compared to a raw
> > read of the NVRAM.
> 
> Is this referring to the returned value from readl() or from tg3_nvram_read()?

It is referring to the value from readl().

> The readl() return value will be the same in both LE and BE architectures.
> readl() adjusts for endianess.
> __raw_readl() does not.
>
> > To bring the data back to a bytestream format on LE
> > systems, a swap is needed. 
> 
> For the above code path, if swap is needed on LE, it's also needed on BE.

Well, the 32-bit _value_ will be the same on architectures of either endianness.
But our goal is to deal with the data as a bytestream.  Consequently a
swap is needed.

For example, reading offset 0x0 of NVRAM (for legacy NVRAM formats)
readl will always produce a value of 0x669955aa.  On LE machines, this 32-bit
value will be laid out as :

  i             i + 3
  |               |
  V               V
---------------------
| aa | 55 | 99 | 66 |
---------------------

On BE machines, the byte ordering is obviously reversed.  To do a
memcpy, the bytes must be in the BE format.

> > After that, a memcpy should be all that is
> > required to extract the MAC address.
> > 
> > Code that operates on data in a bytestream format will necessarily look
> > different than code that operates on data as a 32-bit native endian
> > quantity.  The code changes can look a bit drastic and are easily
> > culpable at first glance.
> 
> This is true for code paths that deal with DMA data. It's not true
> for readl() or writel() since those are already taking care of
> the endianess.

Again, think bytestream.

> > > However, the memcpy is the wrong way around for us.
> > 
> > It shouldn't be, and that is the problem.  We are trying to determine
> > why the data isn't in the form we think it should be in.  The most
> > difficult part of this discussion is convincing ourselves that the end
> > result is the same, even though the path to get there changes in
> > significant ways.
> 
> It's not that hard. readl() is already compensating for endianess.
> 
> > The old code read the data from NVRAM, did a blind swap of the data and
> > then operated on that data as a 32-bit quantity.  The blind swap was
> > completely superfluous and only changed which byte the driver attempted
> > to get its data.
> 
> The swap is not superfluous if one is using memcpy.
>
> >  An important property of this version is that the
> > same code when working with machines of either endianness.
> > 
> > If you are with me this far, then hopefully you can see that, while the
> > two versions of the code appear different, they wind up with the same
> > result.  The change was made because the code became much more readable
> > when we describe the operations as a bytestream.
> 
> I think james already demonstrated that the result is not the same.
> 
> I'll stop here since most of the discussion is based on not knowing
> that readl() takes care of endianess.

Yes.  I think the discussion will be resolved once we distinguish
between operating on the data as a value and operating on the data as a
bytestream.

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Grant Grundler April 18, 2009, 11 p.m. UTC | #12

This is settled already I think...just following up to complete
the conversation. Was "Out of Office" this past week.

On Tue, Apr 14, 2009 at 12:02:32PM -0700, Matt Carlson wrote:
...
> > > tg3_nvram_read() returns a 32-bit word of NVRAM data, swapped according to
> > > the byteswapping rules for all register accesses.
> > 
> > Lets review that calling order I found in 2.6.29-rc8:
> > 	tg3_nvram_read -> tr32 -> tg3_read32 -> readl -> swab
> > 
> > o The swab is actually in tg3_nvram_read.
> 
> Right.
> 
> > o one should be able to get rid of indirect function call in tr32.
> 
> Well, there are situations where the read operation needs to be done
> using a different method.  So that the reader could concentrate on the
> higher-level algorthm, we hide the implementation details of the read
> through a function pointer.

I agree with "why" (problems demand it) but think "how" (implementation)
could be better. Indirect function calls are usually the next most
"expensive" operation after cache misses because they are not predictable
to the CPU (data dependency). I don't know as much about the compiler
behavior with respect to indirect function calls but expect the compiler
can't optimize well either.  So I usually prefer to only see indirect
function calls for "public" APIs and not internal functions.

So what are alternatives?
1) Embedding the choice in tr32() is the first obvious one. is use
   a switch statement and call the respective functions. The compiler
   can inline all the functions into one block since they are
   only being called from one place. Net cost is an extra branch.

2) Not as good as (1), but redefine tr32() to be a macro that is
   an indirect function call.  Maybe that's already happening via
   compiler inlining and I'm being obtuse. 

(1) would make code path explicit to HW/SW and work roughly the same
on all architectures.

> > >  On big endian machines,
> > > the data returned will be exactly as it would have been in NVRAM.  On
> > > little endian machines, the data will be swapped when compared to a raw
> > > read of the NVRAM.
> > 
> > Is this referring to the returned value from readl() or from tg3_nvram_read()?
> 
> It is referring to the value from readl().

Ok - than I hope it's clear the original assertion is not correct.

> 
> > The readl() return value will be the same in both LE and BE architectures.
> > readl() adjusts for endianess.
> > __raw_readl() does not.
> >
> > > To bring the data back to a bytestream format on LE
> > > systems, a swap is needed. 
> > 
> > For the above code path, if swap is needed on LE, it's also needed on BE.
> 
> Well, the 32-bit _value_ will be the same on architectures of either endianness.
> But our goal is to deal with the data as a bytestream.

Ok. This makes sense. But then maybe use __raw_read in this particular case?
Is it possible to read the bytes one at a time?

>  Consequently a > swap is needed.
> 
> For example, reading offset 0x0 of NVRAM (for legacy NVRAM formats)
> readl will always produce a value of 0x669955aa.  On LE machines, this 32-bit
> value will be laid out as :
> 
>   i             i + 3
>   |               |
>   V               V
> ---------------------
> | aa | 55 | 99 | 66 |
> ---------------------
> 
> On BE machines, the byte ordering is obviously reversed.  To do a
> memcpy, the bytes must be in the BE format.

memcpy() doesn't care which "order" bytes are in.
It treats the source/dest memory regions as byte streams as well.

> 
> > > After that, a memcpy should be all that is
> > > required to extract the MAC address.
> > > 
> > > Code that operates on data in a bytestream format will necessarily look
> > > different than code that operates on data as a 32-bit native endian
> > > quantity.  The code changes can look a bit drastic and are easily
> > > culpable at first glance.
> > 
> > This is true for code paths that deal with DMA data. It's not true
> > for readl() or writel() since those are already taking care of
> > the endianess.
> 
> Again, think bytestream.

The data is not a byte stream (yet) since we used readl() to retrieve
the value. Use cpu_to_le32() to convert what was originally an LE byte
stream back to a LE byte stream.

Using either __raw_read() or readb() would have avoided this confusion.

> > I'll stop here since most of the discussion is based on not knowing
> > that readl() takes care of endianess.
> 
> Yes.  I think the discussion will be resolved once we distinguish
> between operating on the data as a value and operating on the data as a
> bytestream.

Personally, it's easier for me compare "CPU register data" (aka "value")
vs "data in Memory" (which is inherently a byte stream). The CPU is the
one that is imposing an endianness to multi-byte data in memory.

thanks,
grant
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael Chan April 19, 2009, 7:30 a.m. UTC | #13

Grant Grundler wrote:

> This is settled already I think...just following up to complete
> the conversation. Was "Out of Office" this past week.
>
> On Tue, Apr 14, 2009 at 12:02:32PM -0700, Matt Carlson wrote:
> ...
> > > > tg3_nvram_read() returns a 32-bit word of NVRAM data,
> swapped according to
> > > > the byteswapping rules for all register accesses.
> > >
> > > Lets review that calling order I found in 2.6.29-rc8:
> > >   tg3_nvram_read -> tr32 -> tg3_read32 -> readl -> swab
> > >
> > > o The swab is actually in tg3_nvram_read.
> >
> > Right.
> >
> > > o one should be able to get rid of indirect function call in tr32.
> >
> > Well, there are situations where the read operation needs to be done
> > using a different method.  So that the reader could
> concentrate on the
> > higher-level algorthm, we hide the implementation details
> of the read
> > through a function pointer.
>
> I agree with "why" (problems demand it) but think "how"
> (implementation)
> could be better. Indirect function calls are usually the next most
> "expensive" operation after cache misses because they are not
> predictable
> to the CPU (data dependency). I don't know as much about the compiler
> behavior with respect to indirect function calls but expect
> the compiler
> can't optimize well either.  So I usually prefer to only see indirect
> function calls for "public" APIs and not internal functions.

We've had a lengthy discussion on this topic with DaveM and jgarzik many
years ago.  Marc is down and I cannot find the archive.  It was agreed
that the function pointer approach is better than a cascade of if -
else if - else if approach.  The Branch target buffers in modern CPUs
are supposed to be able to predict these function pointer branches
if we are using the same method in the hot code path.

>
> So what are alternatives?
> 1) Embedding the choice in tr32() is the first obvious one. is use
>    a switch statement and call the respective functions. The compiler
>    can inline all the functions into one block since they are
>    only being called from one place. Net cost is an extra branch.
>
> 2) Not as good as (1), but redefine tr32() to be a macro that is
>    an indirect function call.  Maybe that's already happening via
>    compiler inlining and I'm being obtuse.
>
> (1) would make code path explicit to HW/SW and work roughly the same
> on all architectures.
>
>
> > > >  On big endian machines,
> > > > the data returned will be exactly as it would have been
> in NVRAM.  On
> > > > little endian machines, the data will be swapped when
> compared to a raw
> > > > read of the NVRAM.
> > >
> > > Is this referring to the returned value from readl() or
> from tg3_nvram_read()?
> >
> > It is referring to the value from readl().
>
> Ok - than I hope it's clear the original assertion is not correct.
>
> >
> > > The readl() return value will be the same in both LE and
> BE architectures.
> > > readl() adjusts for endianess.
> > > __raw_readl() does not.
> > >
> > > > To bring the data back to a bytestream format on LE
> > > > systems, a swap is needed.
> > >
> > > For the above code path, if swap is needed on LE, it's
> also needed on BE.
> >
> > Well, the 32-bit _value_ will be the same on architectures
> of either endianness.
> > But our goal is to deal with the data as a bytestream.
>
> Ok. This makes sense. But then maybe use __raw_read in this
> particular case?
> Is it possible to read the bytes one at a time?
>
> >  Consequently a > swap is needed.
> >
> > For example, reading offset 0x0 of NVRAM (for legacy NVRAM formats)
> > readl will always produce a value of 0x669955aa.  On LE
> machines, this 32-bit
> > value will be laid out as :
> >
> >   i             i + 3
> >   |               |
> >   V               V
> > ---------------------
> > | aa | 55 | 99 | 66 |
> > ---------------------
> >
> > On BE machines, the byte ordering is obviously reversed.  To do a
> > memcpy, the bytes must be in the BE format.
>
> memcpy() doesn't care which "order" bytes are in.
> It treats the source/dest memory regions as byte streams as well.

Yeah, I think the point is that the source data needs to be in the
desired byte stream order.

>
> >
> > > > After that, a memcpy should be all that is
> > > > required to extract the MAC address.
> > > >
> > > > Code that operates on data in a bytestream format will
> necessarily look
> > > > different than code that operates on data as a 32-bit
> native endian
> > > > quantity.  The code changes can look a bit drastic and
> are easily
> > > > culpable at first glance.
> > >
> > > This is true for code paths that deal with DMA data. It's not true
> > > for readl() or writel() since those are already taking care of
> > > the endianess.
> >
> > Again, think bytestream.
>
> The data is not a byte stream (yet) since we used readl() to retrieve
> the value. Use cpu_to_le32() to convert what was originally an LE byte
> stream back to a LE byte stream.
>
> Using either __raw_read() or readb() would have avoided this
> confusion.

It's a 32-bit so we cannot use readb().  I don't think the chip will decode
the byte enables.  I don't like to use __raw_read() because it does not have
the same memory barrier as regular readl().

>
>
> > > I'll stop here since most of the discussion is based on
> not knowing
> > > that readl() takes care of endianess.
> >
> > Yes.  I think the discussion will be resolved once we distinguish
> > between operating on the data as a value and operating on
> the data as a
> > bytestream.
>
> Personally, it's easier for me compare "CPU register data"
> (aka "value")
> vs "data in Memory" (which is inherently a byte stream). The
> CPU is the
> one that is imposing an endianness to multi-byte data in memory.

Not sure if I understand your statement.  The point is that we need
to provide a consistent view of the NVRAM data regardless of the
host machine's endianess.  So we present the data as an array of
bytes.

Thanks.

>
> thanks,
> grant
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Grant Grundler April 19, 2009, 10:32 p.m. UTC | #14

On Sun, Apr 19, 2009 at 12:30:26AM -0700, Michael Chan wrote:
[ Deleted my comments about indirect function calls. ]
...
> We've had a lengthy discussion on this topic with DaveM and jgarzik many
> years ago.  Marc is down and I cannot find the archive.  It was agreed
> that the function pointer approach is better than a cascade of if -
> else if - else if approach.  The Branch target buffers in modern CPUs
> are supposed to be able to predict these function pointer branches
> if we are using the same method in the hot code path.

Ok - thanks. I can look for that.

...
> > The data is not a byte stream (yet) since we used readl() to retrieve
> > the value. Use cpu_to_le32() to convert what was originally an LE byte
> > stream back to a LE byte stream.
> >
> > Using either __raw_read() or readb() would have avoided this
> > confusion.
> 
> It's a 32-bit so we cannot use readb().  I don't think the chip will decode
> the byte enables.

Ok. Thanks for clarifying.

> I don't like to use __raw_read() because it does not have
> the same memory barrier as regular readl().

I don't think the memory barrier is necessary in the "get" case.
wmb() is probably needed in the "set" case.

I was thinking that avoiding several byte swaps would make the code
more maintainable and one wmb() could be explained with a comment.
Remember, we are having this conversation because the MAC address
was incorrectly ordered on BE machine. Seems that a "load from NVRAM"
and "Store to HostMem" preserves the byte stream order in the same
way memcpy would. And would be heck of a lot easier to understand.

thanks,
grant
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matt Carlson April 20, 2009, 8:46 p.m. UTC | #15

On Sun, Apr 19, 2009 at 03:32:02PM -0700, Grant Grundler wrote:
> > > The data is not a byte stream (yet) since we used readl() to retrieve
> > > the value. Use cpu_to_le32() to convert what was originally an LE byte
> > > stream back to a LE byte stream.
> > >
> > > Using either __raw_read() or readb() would have avoided this
> > > confusion.
> > 
> > It's a 32-bit so we cannot use readb().  I don't think the chip will decode
> > the byte enables.
> 
> Ok. Thanks for clarifying.
> 
> > I don't like to use __raw_read() because it does not have
> > the same memory barrier as regular readl().
> 
> I don't think the memory barrier is necessary in the "get" case.
> wmb() is probably needed in the "set" case.
> 
> I was thinking that avoiding several byte swaps would make the code
> more maintainable and one wmb() could be explained with a comment.
> Remember, we are having this conversation because the MAC address
> was incorrectly ordered on BE machine. Seems that a "load from NVRAM"
> and "Store to HostMem" preserves the byte stream order in the same
> way memcpy would. And would be heck of a lot easier to understand.

While we could reduce some of the byte swaps, I'm not sure this
problem will be any easier to grok for that effort.

The confusion in this case stems from the fact that the SEEPROM
interface deals with data in LE format.  This fact violates rule that
the chip always deals with its data in BE format.

We can move the discussion so that we are instead talking about values
as they appear on the bus, but then that just means we need to
highlight that this data comes across the bus in BE format, when every
other value (including the flash interface data) comes across in
LE format.  IOW, no matter where we choose to couch the discussion,
we will still have to somehow convey that the behaviors are backwards
of every other chip access.

The approach you are proposing here hides the problem in the behavioural
differences between __raw_read() and readl().  The problem with this is
that someone at some point in the future wants to make the same
conversion / optimization to the flash interface.  That person will then
be confronted with this exact same problem.

If we really want to make the code more maintainable, I might try
giving the EEPROM it's own "read as value" and "read as bytestream"
functions instead.  This would effectively hide the difference behind
an API.  The intentions would (still) be clear and we would avoid the
pointless byteswap in tg3_nvram_read_using_eeprom() and swapback in
tg3_nvram_read_be32() when attempting to read the data as a bytestream.
Swapping the readl() for __raw_read() then becomes an exercise in
optimization, and we could apply it equally to the flash interface.

The problem with this proposal is that we'd still have the ugly
swap32(be32_to_cpu(data)) wart in tg3_nvram_write_block_using_eeprom().
I'm not sure how I'd want to make that prettier.

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

tg3: fix big endian MAC address collection failure

Commit Message

Comments

Patch