diff mbox series

[v8,2/2] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Message ID 20240214-fix_sparse_errors_checksum_tests-v8-2-36b60e673593@rivosinc.com (mailing list archive)
State Superseded
Headers show
Series lib: checksum: Fix issues with checksum tests | expand

Commit Message

Charlie Jenkins Feb. 14, 2024, 9:41 p.m. UTC
The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
variety of architectures that are big endian or do not support
misalgined accesses. Both of these test cases are changed to support big
and little endian architectures.

The test for ip_fast_csum is changed to align the data along (14 +
NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
csum_ipv6_magic aligns the data using a struct. An extra padding field
is added to the struct to ensure that the size of the struct is the same
on all architectures (44 bytes).

The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
daddr. This would fail on parisc64 due to the following code snippet in
arch/parisc/include/asm/checksum.h:

add		%4, %0, %0\n"
ldd,ma		8(%1), %6\n"
ldd,ma		8(%2), %7\n"
add,dc		%5, %0, %0\n"

The second add is expecting carry flags from the first add. Normally,
a double word load (ldd) does not modify the carry flags. However,
because saddr and daddr may be misaligned, ldd triggers a misalignment
trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
many additional instructions to be executed between the two adds. This
can be easily solved by adding the carry into %0 before executing the
ldd.

However, that is not necessary since ipv6 headers should always be
aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to 2
and the ethernet header size is 14.

Architectures that set NET_IP_ALIGN to 0 must support misaligned saddr
and daddr, but that is not tested here.

Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 lib/checksum_kunit.c | 395 ++++++++++++++++++---------------------------------
 1 file changed, 136 insertions(+), 259 deletions(-)

Comments

Guenter Roeck Feb. 14, 2024, 11:03 p.m. UTC | #1
On 2/14/24 13:41, Charlie Jenkins wrote:
> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
> variety of architectures that are big endian or do not support
> misalgined accesses. Both of these test cases are changed to support big
> and little endian architectures.
> 
> The test for ip_fast_csum is changed to align the data along (14 +
> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
> csum_ipv6_magic aligns the data using a struct. An extra padding field
> is added to the struct to ensure that the size of the struct is the same
> on all architectures (44 bytes).
> 
> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
> daddr. This would fail on parisc64 due to the following code snippet in
> arch/parisc/include/asm/checksum.h:
> 
> add		%4, %0, %0\n"
> ldd,ma		8(%1), %6\n"
> ldd,ma		8(%2), %7\n"
> add,dc		%5, %0, %0\n"
> 
> The second add is expecting carry flags from the first add. Normally,
> a double word load (ldd) does not modify the carry flags. However,
> because saddr and daddr may be misaligned, ldd triggers a misalignment
> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
> many additional instructions to be executed between the two adds. This
> can be easily solved by adding the carry into %0 before executing the
> ldd.
> 

I really think this is a bug either in the trap handler or in the hppa64
qemu emulation. Only unaligned ldd instructions affect (actually,
unconditionally set) the carry flag. That doesn't happen with unaligned
ldw instructions. It would be worthwhile tracking this down since there are
lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
when running the kernel in 64-bit mode. On the other side, I guess this
is a different problem. Not sure though if that should even be mentioned
here since that makes it sound as if it would be expected that such
accesses impact the carry flag.

> However, that is not necessary since ipv6 headers should always be
> aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to 2
> and the ethernet header size is 14.
> 
> Architectures that set NET_IP_ALIGN to 0 must support misaligned saddr
> and daddr, but that is not tested here.
> 
> Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

I'll run this through my test system and let you know how it goes.
It should be done in a couple of hours.

Thanks,
Guenter
Guenter Roeck Feb. 15, 2024, 1:24 a.m. UTC | #2
On 2/14/24 13:41, Charlie Jenkins wrote:
> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
> variety of architectures that are big endian or do not support
> misalgined accesses. Both of these test cases are changed to support big
> and little endian architectures.
> 
> The test for ip_fast_csum is changed to align the data along (14 +
> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
> csum_ipv6_magic aligns the data using a struct. An extra padding field
> is added to the struct to ensure that the size of the struct is the same
> on all architectures (44 bytes).
> 
> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
> daddr. This would fail on parisc64 due to the following code snippet in
> arch/parisc/include/asm/checksum.h:
> 
> add		%4, %0, %0\n"
> ldd,ma		8(%1), %6\n"
> ldd,ma		8(%2), %7\n"
> add,dc		%5, %0, %0\n"
> 
> The second add is expecting carry flags from the first add. Normally,
> a double word load (ldd) does not modify the carry flags. However,
> because saddr and daddr may be misaligned, ldd triggers a misalignment
> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
> many additional instructions to be executed between the two adds. This
> can be easily solved by adding the carry into %0 before executing the
> ldd.
> 
> However, that is not necessary since ipv6 headers should always be
> aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to 2
> and the ethernet header size is 14.
> 
> Architectures that set NET_IP_ALIGN to 0 must support misaligned saddr
> and daddr, but that is not tested here.
> 
> Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

Reviewed-and-Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter
Charlie Jenkins Feb. 15, 2024, 1:30 a.m. UTC | #3
On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote:
> On 2/14/24 13:41, Charlie Jenkins wrote:
> > The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
> > variety of architectures that are big endian or do not support
> > misalgined accesses. Both of these test cases are changed to support big
> > and little endian architectures.
> > 
> > The test for ip_fast_csum is changed to align the data along (14 +
> > NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
> > csum_ipv6_magic aligns the data using a struct. An extra padding field
> > is added to the struct to ensure that the size of the struct is the same
> > on all architectures (44 bytes).
> > 
> > The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
> > daddr. This would fail on parisc64 due to the following code snippet in
> > arch/parisc/include/asm/checksum.h:
> > 
> > add		%4, %0, %0\n"
> > ldd,ma		8(%1), %6\n"
> > ldd,ma		8(%2), %7\n"
> > add,dc		%5, %0, %0\n"
> > 
> > The second add is expecting carry flags from the first add. Normally,
> > a double word load (ldd) does not modify the carry flags. However,
> > because saddr and daddr may be misaligned, ldd triggers a misalignment
> > trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
> > many additional instructions to be executed between the two adds. This
> > can be easily solved by adding the carry into %0 before executing the
> > ldd.
> > 
> 
> I really think this is a bug either in the trap handler or in the hppa64
> qemu emulation. Only unaligned ldd instructions affect (actually,
> unconditionally set) the carry flag. That doesn't happen with unaligned
> ldw instructions. It would be worthwhile tracking this down since there are
> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
> when running the kernel in 64-bit mode. On the other side, I guess this
> is a different problem. Not sure though if that should even be mentioned
> here since that makes it sound as if it would be expected that such
> accesses impact the carry flag.

I wasn't confident it was a bug somewhere so that's why I sent this patch.

However, I have just found the section of the processor manual [1] I was
looking for (Section Privileged Software-Accessible Registers subsection
Processor Status Word (PSW)):

"Processor state is encoded in a 64-bit register called the Processor
Status Word (PSW). When an interruption occurs, the current value of the
PSW is saved in the Interruption Processor Status Word (IPSW) and
usually all defined PSW bits are set to 0.

"The PSW is set to the contents of the IPSW by the RETURN FROM
INTERRUPTION instruction. The interruption handler may restore the
original PSW, modify selected bits, or may change the PSW to an entirely
new value."

Stored in the PSW register are the "Carry/borrow bits". This confirms
that the carry/borrow bits should be restored. The save is supposed to
automatically happen upon an interrupt and restored by the RETURN FROM
INTERRUPTION, thus this is a QEMU bug and not a Linux bug (please
correct me if I am wrong).

This v8 was not needed after-all it seems. It would be best to stick
with the v7.

[1] https://parisc.wiki.kernel.org/images-parisc/7/73/Parisc2.0.pdf

> 
> > However, that is not necessary since ipv6 headers should always be
> > aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to
> > 2 and the ethernet header size is 14.
> > 
> > Architectures that set NET_IP_ALIGN to 0 must support misaligned
> > saddr and daddr, but that is not tested here.
> > 
> > Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and
> > ip_fast_csum") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> 
> I'll run this through my test system and let you know how it goes.  It
> should be done in a couple of hours.
> 
> Thanks, Guenter
>
Guenter Roeck Feb. 15, 2024, 1:58 a.m. UTC | #4
Hi Charlie,

On 2/14/24 17:30, Charlie Jenkins wrote:
> On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote:
>> On 2/14/24 13:41, Charlie Jenkins wrote:
>>> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
>>> variety of architectures that are big endian or do not support
>>> misalgined accesses. Both of these test cases are changed to support big
>>> and little endian architectures.
>>>
>>> The test for ip_fast_csum is changed to align the data along (14 +
>>> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
>>> csum_ipv6_magic aligns the data using a struct. An extra padding field
>>> is added to the struct to ensure that the size of the struct is the same
>>> on all architectures (44 bytes).
>>>
>>> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
>>> daddr. This would fail on parisc64 due to the following code snippet in
>>> arch/parisc/include/asm/checksum.h:
>>>
>>> add		%4, %0, %0\n"
>>> ldd,ma		8(%1), %6\n"
>>> ldd,ma		8(%2), %7\n"
>>> add,dc		%5, %0, %0\n"
>>>
>>> The second add is expecting carry flags from the first add. Normally,
>>> a double word load (ldd) does not modify the carry flags. However,
>>> because saddr and daddr may be misaligned, ldd triggers a misalignment
>>> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
>>> many additional instructions to be executed between the two adds. This
>>> can be easily solved by adding the carry into %0 before executing the
>>> ldd.
>>>
>>
>> I really think this is a bug either in the trap handler or in the hppa64
>> qemu emulation. Only unaligned ldd instructions affect (actually,
>> unconditionally set) the carry flag. That doesn't happen with unaligned
>> ldw instructions. It would be worthwhile tracking this down since there are
>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>> when running the kernel in 64-bit mode. On the other side, I guess this
>> is a different problem. Not sure though if that should even be mentioned
>> here since that makes it sound as if it would be expected that such
>> accesses impact the carry flag.
> 
> I wasn't confident it was a bug somewhere so that's why I sent this patch.
> 
> However, I have just found the section of the processor manual [1] I was
> looking for (Section Privileged Software-Accessible Registers subsection
> Processor Status Word (PSW)):
> 
> "Processor state is encoded in a 64-bit register called the Processor
> Status Word (PSW). When an interruption occurs, the current value of the
> PSW is saved in the Interruption Processor Status Word (IPSW) and
> usually all defined PSW bits are set to 0.
> 
> "The PSW is set to the contents of the IPSW by the RETURN FROM
> INTERRUPTION instruction. The interruption handler may restore the
> original PSW, modify selected bits, or may change the PSW to an entirely
> new value."
> 
> Stored in the PSW register are the "Carry/borrow bits". This confirms
> that the carry/borrow bits should be restored. The save is supposed to
> automatically happen upon an interrupt and restored by the RETURN FROM
> INTERRUPTION, thus this is a QEMU bug and not a Linux bug (please
> correct me if I am wrong).
> 

I know that much (I looked into the manual as well), I just really don't
know if this is a Linux bug or a QEMU bug, and I have not been able to
nail it down. I think someone with access to hardware will need to confirm.

Specifically: Yes, the carry/borrow bits should be restored. Question is
if the Linux kernel's interrupt handler doesn't restore the carry bits
or if the problem is on the qemu side.

> This v8 was not needed after-all it seems. It would be best to stick
> with the v7.
> 
I tend to agree; after all, v7 exposes the problem, making it easier to
determine if the problem can be reproduced on real hardware.

FWIW,I wrote some test code which exposes the problem. It is quite
easy to show that carry is always set after executing ldd on an unaligned
address. That is also why I know for sure that the problem is not
seen with ldw on unaligned addresses.

Thanks,
Guenter
John David Anglin Feb. 15, 2024, 3 a.m. UTC | #5
On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
> Specifically: Yes, the carry/borrow bits should be restored. Question is
> if the Linux kernel's interrupt handler doesn't restore the carry bits
> or if the problem is on the qemu side.
The carry/borrow bits in the PSW should be saved and restored by the save_specials
and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.

However, it appears the tophys macro might clobber the carry bits before they
are saved in intr_save.

Dave
Charlie Jenkins Feb. 15, 2024, 3:35 a.m. UTC | #6
On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
> On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
> > Specifically: Yes, the carry/borrow bits should be restored. Question is
> > if the Linux kernel's interrupt handler doesn't restore the carry bits
> > or if the problem is on the qemu side.
> The carry/borrow bits in the PSW should be saved and restored by the save_specials
> and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.

Why would they be needed to be restored in linux? The manual says "The
PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
instruction". This means that the PSW must be restored by the hardware.

We can see the QEMU implementation in:

rfi:
https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93

handling interrupt:
https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109

However the implementation appears to be faulty. During an RFI, the PSW
is always set to 0x804000e (regardless of what the PSW was before the
interrupt).

- Charlie


> 
> However, it appears the tophys macro might clobber the carry bits before they
> are saved in intr_save.
> 
> Dave
> 
> -- 
> John David Anglin  dave.anglin@bell.net
>
Charlie Jenkins Feb. 15, 2024, 4:11 a.m. UTC | #7
On Wed, Feb 14, 2024 at 10:35:26PM -0500, Charlie Jenkins wrote:
> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
> > On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
> > > Specifically: Yes, the carry/borrow bits should be restored. Question is
> > > if the Linux kernel's interrupt handler doesn't restore the carry bits
> > > or if the problem is on the qemu side.
> > The carry/borrow bits in the PSW should be saved and restored by the save_specials
> > and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.

To clarify my previous point, even if rest_specials is restoring PSW,
the "restored" values are going to be overwritten during the rfi
instruction, since that instruction is defined to restore PSW. The
handshake here seems to be lost, with both the hardware and linux
messing with PSW and IPSW and both expecting the other to not mess with
the values.

> 
> Why would they be needed to be restored in linux? The manual says "The
> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
> instruction". This means that the PSW must be restored by the hardware.
> 
> We can see the QEMU implementation in:
> 
> rfi:
> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93
> 
> handling interrupt:
> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109
> 
> However the implementation appears to be faulty. During an RFI, the PSW
> is always set to 0x804000e (regardless of what the PSW was before the
> interrupt).
> 
> - Charlie
> 
> 
> > 
> > However, it appears the tophys macro might clobber the carry bits before they
> > are saved in intr_save.
> > 
> > Dave
> > 
> > -- 
> > John David Anglin  dave.anglin@bell.net
> >
Guenter Roeck Feb. 15, 2024, 8:56 a.m. UTC | #8
On 2/14/24 19:35, Charlie Jenkins wrote:
> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
>> On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
>>> Specifically: Yes, the carry/borrow bits should be restored. Question is
>>> if the Linux kernel's interrupt handler doesn't restore the carry bits
>>> or if the problem is on the qemu side.
>> The carry/borrow bits in the PSW should be saved and restored by the save_specials
>> and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.
> 
> Why would they be needed to be restored in linux? The manual says "The
> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
> instruction". This means that the PSW must be restored by the hardware.
> 
> We can see the QEMU implementation in:
> 
> rfi:
> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93
> 
> handling interrupt:
> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109
> 
> However the implementation appears to be faulty. During an RFI, the PSW
> is always set to 0x804000e (regardless of what the PSW was before the
> interrupt).
> 

Not sure if I agree. The interrupt handler in Linux is the one which needs to set
IPSW. Looking into the code, I agree with Dave that the tophys macro seems to
clobber the carry bits before psw is saved, so they can not really be restored.
The only issue with that idea is that I can only reproduce the problem with
an interrupted ldd instruction but not, for example, with ldw. This is why it
would be really important to have someone with real hardware test this.

Thanks,
Guenter
David Laight Feb. 15, 2024, 10:27 a.m. UTC | #9
...
> It would be worthwhile tracking this down since there are
> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
> when running the kernel in 64-bit mode.

Hmmm....
For performance reasons you really don't want any of them.
The misaligned 64bit fields need an __attribute((aligned(4)) marker.

If the checksum code can do them it really needs to detect
and handle the misalignment.

The misaligned trap handler probably ought to contain a
warn_on_once() to dump stack on the first such error.
They can then be fixed one at a time.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Guenter Roeck Feb. 15, 2024, 3:22 p.m. UTC | #10
On 2/14/24 19:00, John David Anglin wrote:
> On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
>> Specifically: Yes, the carry/borrow bits should be restored. Question is
>> if the Linux kernel's interrupt handler doesn't restore the carry bits
>> or if the problem is on the qemu side.
> The carry/borrow bits in the PSW should be saved and restored by the save_specials
> and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.
> 
> However, it appears the tophys macro might clobber the carry bits before they
> are saved in intr_save.
> 

Actually, I did some logging and found that the correct carry bits do find their way
into the IPSW register in the Linux kernel. Only when userspace is back after handling
the unaligned LDD, the carry flag is always set to 1, no matter how it was set prior
to the unaligned access.

Guenter
Charlie Jenkins Feb. 15, 2024, 3:36 p.m. UTC | #11
On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote:
> On 2/14/24 19:35, Charlie Jenkins wrote:
> > On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
> > > On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
> > > > Specifically: Yes, the carry/borrow bits should be restored. Question is
> > > > if the Linux kernel's interrupt handler doesn't restore the carry bits
> > > > or if the problem is on the qemu side.
> > > The carry/borrow bits in the PSW should be saved and restored by the save_specials
> > > and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.
> > 
> > Why would they be needed to be restored in linux? The manual says "The
> > PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
> > instruction". This means that the PSW must be restored by the hardware.
> > 
> > We can see the QEMU implementation in:
> > 
> > rfi:
> > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93
> > 
> > handling interrupt:
> > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109
> > 
> > However the implementation appears to be faulty. During an RFI, the PSW
> > is always set to 0x804000e (regardless of what the PSW was before the
> > interrupt).
> > 
> 
> Not sure if I agree. The interrupt handler in Linux is the one which needs to set
> IPSW. Looking into the code, I agree with Dave that the tophys macro seems to
> clobber the carry bits before psw is saved, so they can not really be restored.
> The only issue with that idea is that I can only reproduce the problem with
> an interrupted ldd instruction but not, for example, with ldw. This is why it
> would be really important to have someone with real hardware test this.
> 
> Thanks,
> Guenter

Yes, we definitely feedback from somebody with access to hardware, but I
do not understand how "The PSW is set to the contents of the IPSW by the
RETURN FROM INTERRUPTION" could be interpreted as anything except that
the hardware is expected to over-write the contents of the PSW during
the rfi.

- Charlie

>
Guenter Roeck Feb. 15, 2024, 3:44 p.m. UTC | #12
On 2/15/24 02:27, David Laight wrote:
> ...
>> It would be worthwhile tracking this down since there are
>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>> when running the kernel in 64-bit mode.
> 
> Hmmm....
> For performance reasons you really don't want any of them.
> The misaligned 64bit fields need an __attribute((aligned(4)) marker.
> 
> If the checksum code can do them it really needs to detect
> and handle the misalignment.
> 
> The misaligned trap handler probably ought to contain a
> warn_on_once() to dump stack on the first such error.
> They can then be fixed one at a time.
> 

Unaligned LDD at unwind_once+0x4a8/0x5e0

Decoded:

Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445)

Source:

static bool pc_is_kernel_fn(unsigned long pc, void *fn)
{
         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
}

Disassembled:

  c38:   50 3c 00 00     ldd 0(r1),ret0
  c3c:   08 1b 02 44     copy dp,r4
  c40:   0f 80 10 da     ldd 0(ret0),r26    <--
  c44:   37 dd 3f a1     ldo -30(sp),ret1
  c48:   51 02 00 20     ldd 10(r8),rp
  c4c:   e8 40 f0 00     bve,l (rp),rp
  c50:   51 1b 00 30     ldd 18(r8),dp

Hmm, interesting. See below for a possible fix. Should I submit a patch ?

Thanks,
Guenter

---
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index ab23e61a6f01..1d2aab619466 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -772,6 +772,7 @@ ENTRY_CFI(_switch_to)
         bv      %r0(%r2)
         mtctl   %r25,%cr30

+       .align  8
  ENTRY(_switch_to_ret)
         mtctl   %r0, %cr0               /* Needed for single stepping */
         callee_rest
@@ -1702,6 +1703,7 @@ ENTRY_CFI(sys_rt_sigreturn_wrapper)
         LDREG   PT_GR28(%r1),%r28  /* reload original r28 for syscall_exit */
  ENDPROC_CFI(sys_rt_sigreturn_wrapper)

+       .align 8
  ENTRY(syscall_exit)
         /* NOTE: Not all syscalls exit this way.  rt_sigreturn will exit
          * via syscall_exit_rfi if the signal was received while the process
Guenter Roeck Feb. 15, 2024, 4:30 p.m. UTC | #13
On 2/15/24 07:36, Charlie Jenkins wrote:
> On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote:
>> On 2/14/24 19:35, Charlie Jenkins wrote:
>>> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
>>>> On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
>>>>> Specifically: Yes, the carry/borrow bits should be restored. Question is
>>>>> if the Linux kernel's interrupt handler doesn't restore the carry bits
>>>>> or if the problem is on the qemu side.
>>>> The carry/borrow bits in the PSW should be saved and restored by the save_specials
>>>> and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.
>>>
>>> Why would they be needed to be restored in linux? The manual says "The
>>> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
>>> instruction". This means that the PSW must be restored by the hardware.
>>>
>>> We can see the QEMU implementation in:
>>>
>>> rfi:
>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93
>>>
>>> handling interrupt:
>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109
>>>
>>> However the implementation appears to be faulty. During an RFI, the PSW
>>> is always set to 0x804000e (regardless of what the PSW was before the
>>> interrupt).
>>>
>>
>> Not sure if I agree. The interrupt handler in Linux is the one which needs to set
>> IPSW. Looking into the code, I agree with Dave that the tophys macro seems to
>> clobber the carry bits before psw is saved, so they can not really be restored.
>> The only issue with that idea is that I can only reproduce the problem with
>> an interrupted ldd instruction but not, for example, with ldw. This is why it
>> would be really important to have someone with real hardware test this.
>>
>> Thanks,
>> Guenter
> 
> Yes, we definitely feedback from somebody with access to hardware, but I
> do not understand how "The PSW is set to the contents of the IPSW by the
> RETURN FROM INTERRUPTION" could be interpreted as anything except that
> the hardware is expected to over-write the contents of the PSW during
> the rfi.
> 

Sure, I absolutely agree. But that assumes that IPSW is set correctly
in the Linux interrupt handler. We do know that something odd happens
when an unaligned ldd is encountered. At least for my part I don't know
if the problem is in emulate_ldd() in the Linux kernel or in the ldd
implementation and trap handling in qemu. I do know (from my logs)
that qemu does see the correct PSW/IPSW values, because they do
show up correctly in the Linux kernel when running the qemu emulation.
Only it somehow gets lost when the Linux interrupt handler returns.

Thanks.
Guenter
Charlie Jenkins Feb. 15, 2024, 4:51 p.m. UTC | #14
On Thu, Feb 15, 2024 at 08:30:22AM -0800, Guenter Roeck wrote:
> On 2/15/24 07:36, Charlie Jenkins wrote:
> > On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote:
> > > On 2/14/24 19:35, Charlie Jenkins wrote:
> > > > On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
> > > > > On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
> > > > > > Specifically: Yes, the carry/borrow bits should be restored. Question is
> > > > > > if the Linux kernel's interrupt handler doesn't restore the carry bits
> > > > > > or if the problem is on the qemu side.
> > > > > The carry/borrow bits in the PSW should be saved and restored by the save_specials
> > > > > and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.
> > > > 
> > > > Why would they be needed to be restored in linux? The manual says "The
> > > > PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
> > > > instruction". This means that the PSW must be restored by the hardware.
> > > > 
> > > > We can see the QEMU implementation in:
> > > > 
> > > > rfi:
> > > > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93
> > > > 
> > > > handling interrupt:
> > > > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109
> > > > 
> > > > However the implementation appears to be faulty. During an RFI, the PSW
> > > > is always set to 0x804000e (regardless of what the PSW was before the
> > > > interrupt).
> > > > 
> > > 
> > > Not sure if I agree. The interrupt handler in Linux is the one which needs to set
> > > IPSW. Looking into the code, I agree with Dave that the tophys macro seems to
> > > clobber the carry bits before psw is saved, so they can not really be restored.
> > > The only issue with that idea is that I can only reproduce the problem with
> > > an interrupted ldd instruction but not, for example, with ldw. This is why it
> > > would be really important to have someone with real hardware test this.
> > > 
> > > Thanks,
> > > Guenter
> > 
> > Yes, we definitely feedback from somebody with access to hardware, but I
> > do not understand how "The PSW is set to the contents of the IPSW by the
> > RETURN FROM INTERRUPTION" could be interpreted as anything except that
> > the hardware is expected to over-write the contents of the PSW during
> > the rfi.
> > 
> 
> Sure, I absolutely agree. But that assumes that IPSW is set correctly
> in the Linux interrupt handler. We do know that something odd happens

The manual defines the saving of PSW as the responsibility of the
hardware as well: "When an interruption occurs, the current value of the
PSW is saved in the Interruption Processor Status Word (IPSW)". I don't
think this should be interpreted to mean that a software interrupt
handler is required to save the IPSW.

- Charlie

> when an unaligned ldd is encountered. At least for my part I don't know
> if the problem is in emulate_ldd() in the Linux kernel or in the ldd
> implementation and trap handling in qemu. I do know (from my logs)
> that qemu does see the correct PSW/IPSW values, because they do
> show up correctly in the Linux kernel when running the qemu emulation.
> Only it somehow gets lost when the Linux interrupt handler returns.
> 
> Thanks.
> Guenter
>
John David Anglin Feb. 15, 2024, 4:51 p.m. UTC | #15
On 2024-02-15 10:44 a.m., Guenter Roeck wrote:
> On 2/15/24 02:27, David Laight wrote:
>> ...
>>> It would be worthwhile tracking this down since there are
>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>>> when running the kernel in 64-bit mode.
>>
>> Hmmm....
>> For performance reasons you really don't want any of them.
>> The misaligned 64bit fields need an __attribute((aligned(4)) marker.
>>
>> If the checksum code can do them it really needs to detect
>> and handle the misalignment.
>>
>> The misaligned trap handler probably ought to contain a
>> warn_on_once() to dump stack on the first such error.
>> They can then be fixed one at a time.
>>
>
> Unaligned LDD at unwind_once+0x4a8/0x5e0
>
> Decoded:
>
> Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 
> arch/parisc/kernel/unwind.c:445)
>
> Source:
>
> static bool pc_is_kernel_fn(unsigned long pc, void *fn)
> {
>         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
This looks wrong to me.  Function descriptors should always be 8-byte aligned.  I think this
routine should return false if fn isn't 8-byte aligned.
> }
>
> Disassembled:
>
>  c38:   50 3c 00 00     ldd 0(r1),ret0
>  c3c:   08 1b 02 44     copy dp,r4
>  c40:   0f 80 10 da     ldd 0(ret0),r26    <--
>  c44:   37 dd 3f a1     ldo -30(sp),ret1
>  c48:   51 02 00 20     ldd 10(r8),rp
>  c4c:   e8 40 f0 00     bve,l (rp),rp
>  c50:   51 1b 00 30     ldd 18(r8),dp
>
> Hmm, interesting. See below for a possible fix. Should I submit a patch ?
>
> Thanks,
> Guenter
>
> ---
> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
> index ab23e61a6f01..1d2aab619466 100644
> --- a/arch/parisc/kernel/entry.S
> +++ b/arch/parisc/kernel/entry.S
> @@ -772,6 +772,7 @@ ENTRY_CFI(_switch_to)
>         bv      %r0(%r2)
>         mtctl   %r25,%cr30
>
> +       .align  8
Code entry points only need 4-byte alignment.
>  ENTRY(_switch_to_ret)
>         mtctl   %r0, %cr0               /* Needed for single stepping */
>         callee_rest
> @@ -1702,6 +1703,7 @@ ENTRY_CFI(sys_rt_sigreturn_wrapper)
>         LDREG   PT_GR28(%r1),%r28  /* reload original r28 for syscall_exit */
>  ENDPROC_CFI(sys_rt_sigreturn_wrapper)
>
> +       .align 8
Ditto.
>  ENTRY(syscall_exit)
>         /* NOTE: Not all syscalls exit this way.  rt_sigreturn will exit
>          * via syscall_exit_rfi if the signal was received while the process

Dave
Guenter Roeck Feb. 15, 2024, 5:06 p.m. UTC | #16
On 2/15/24 08:51, John David Anglin wrote:
> On 2024-02-15 10:44 a.m., Guenter Roeck wrote:
>> On 2/15/24 02:27, David Laight wrote:
>>> ...
>>>> It would be worthwhile tracking this down since there are
>>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>>>> when running the kernel in 64-bit mode.
>>>
>>> Hmmm....
>>> For performance reasons you really don't want any of them.
>>> The misaligned 64bit fields need an __attribute((aligned(4)) marker.
>>>
>>> If the checksum code can do them it really needs to detect
>>> and handle the misalignment.
>>>
>>> The misaligned trap handler probably ought to contain a
>>> warn_on_once() to dump stack on the first such error.
>>> They can then be fixed one at a time.
>>>
>>
>> Unaligned LDD at unwind_once+0x4a8/0x5e0
>>
>> Decoded:
>>
>> Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445)
>>
>> Source:
>>
>> static bool pc_is_kernel_fn(unsigned long pc, void *fn)
>> {
>>         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
> This looks wrong to me.  Function descriptors should always be 8-byte aligned.  I think this
> routine should return false if fn isn't 8-byte aligned.

Below you state "Code entry points only need 4-byte alignment."

I think that contradicts each other. Also, the calling code is,
for example,
	pc_is_kernel_fn(pc, syscall_exit)

I fail to see how this can be consolidated if it is ok
that syscall_exit is 4-byte aligned but, at the same time,
must be 8-byte aligned to be considered to be a kernel function.

Guenter

>> }
>>
>> Disassembled:
>>
>>  c38:   50 3c 00 00     ldd 0(r1),ret0
>>  c3c:   08 1b 02 44     copy dp,r4
>>  c40:   0f 80 10 da     ldd 0(ret0),r26    <--
>>  c44:   37 dd 3f a1     ldo -30(sp),ret1
>>  c48:   51 02 00 20     ldd 10(r8),rp
>>  c4c:   e8 40 f0 00     bve,l (rp),rp
>>  c50:   51 1b 00 30     ldd 18(r8),dp
>>
>> Hmm, interesting. See below for a possible fix. Should I submit a patch ?
>>
>> Thanks,
>> Guenter
>>
>> ---
>> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
>> index ab23e61a6f01..1d2aab619466 100644
>> --- a/arch/parisc/kernel/entry.S
>> +++ b/arch/parisc/kernel/entry.S
>> @@ -772,6 +772,7 @@ ENTRY_CFI(_switch_to)
>>         bv      %r0(%r2)
>>         mtctl   %r25,%cr30
>>
>> +       .align  8
> Code entry points only need 4-byte alignment.
>>  ENTRY(_switch_to_ret)
>>         mtctl   %r0, %cr0               /* Needed for single stepping */
>>         callee_rest
>> @@ -1702,6 +1703,7 @@ ENTRY_CFI(sys_rt_sigreturn_wrapper)
>>         LDREG   PT_GR28(%r1),%r28  /* reload original r28 for syscall_exit */
>>  ENDPROC_CFI(sys_rt_sigreturn_wrapper)
>>
>> +       .align 8
> Ditto.
>>  ENTRY(syscall_exit)
>>         /* NOTE: Not all syscalls exit this way.  rt_sigreturn will exit
>>          * via syscall_exit_rfi if the signal was received while the process
> 
> Dave
>
John David Anglin Feb. 15, 2024, 5:13 p.m. UTC | #17
On 2024-02-15 11:51 a.m., Charlie Jenkins wrote:
> On Thu, Feb 15, 2024 at 08:30:22AM -0800, Guenter Roeck wrote:
>> On 2/15/24 07:36, Charlie Jenkins wrote:
>>> On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote:
>>>> On 2/14/24 19:35, Charlie Jenkins wrote:
>>>>> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
>>>>>> On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
>>>>>>> Specifically: Yes, the carry/borrow bits should be restored. Question is
>>>>>>> if the Linux kernel's interrupt handler doesn't restore the carry bits
>>>>>>> or if the problem is on the qemu side.
>>>>>> The carry/borrow bits in the PSW should be saved and restored by the save_specials
>>>>>> and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.
>>>>> Why would they be needed to be restored in linux? The manual says "The
>>>>> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
>>>>> instruction". This means that the PSW must be restored by the hardware.
>>>>>
>>>>> We can see the QEMU implementation in:
>>>>>
>>>>> rfi:
>>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93
>>>>>
>>>>> handling interrupt:
>>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109
>>>>>
>>>>> However the implementation appears to be faulty. During an RFI, the PSW
>>>>> is always set to 0x804000e (regardless of what the PSW was before the
>>>>> interrupt).
>>>>>
>>>> Not sure if I agree. The interrupt handler in Linux is the one which needs to set
>>>> IPSW. Looking into the code, I agree with Dave that the tophys macro seems to
>>>> clobber the carry bits before psw is saved, so they can not really be restored.
>>>> The only issue with that idea is that I can only reproduce the problem with
>>>> an interrupted ldd instruction but not, for example, with ldw. This is why it
>>>> would be really important to have someone with real hardware test this.
>>>>
>>>> Thanks,
>>>> Guenter
>>> Yes, we definitely feedback from somebody with access to hardware, but I
>>> do not understand how "The PSW is set to the contents of the IPSW by the
>>> RETURN FROM INTERRUPTION" could be interpreted as anything except that
>>> the hardware is expected to over-write the contents of the PSW during
>>> the rfi.
>>>
>> Sure, I absolutely agree. But that assumes that IPSW is set correctly
>> in the Linux interrupt handler. We do know that something odd happens
> The manual defines the saving of PSW as the responsibility of the
> hardware as well: "When an interruption occurs, the current value of the
> PSW is saved in the Interruption Processor Status Word (IPSW)". I don't
> think this should be interpreted to mean that a software interrupt
> handler is required to save the IPSW.
The IPSW (cr22) is saved by save_specials to regs->gr[0].  It is modified in various
places when an interruption is handled.  In the case of emulate_ldd, we have

         /* else we handled it, let life go on. */
         regs->gr[0]|=PSW_N;

This is supposed to nullify the faulting ldd.  I've yet to find where the carry bit is getting
set in the PSW.

There is a gap between where the hardware sets IPSW and where it is saved to the stack
in save_specials.  tophys might clobber the IPSW is there is a double fault.  But it seems
more likely that regs->gr[0] is getting clobbered somehow.

Dave
>
> - Charlie
>
>> when an unaligned ldd is encountered. At least for my part I don't know
>> if the problem is in emulate_ldd() in the Linux kernel or in the ldd
>> implementation and trap handling in qemu. I do know (from my logs)
>> that qemu does see the correct PSW/IPSW values, because they do
>> show up correctly in the Linux kernel when running the qemu emulation.
>> Only it somehow gets lost when the Linux interrupt handler returns.
>>
>> Thanks.
>> Guenter
>>
John David Anglin Feb. 15, 2024, 5:25 p.m. UTC | #18
On 2024-02-15 12:06 p.m., Guenter Roeck wrote:
> On 2/15/24 08:51, John David Anglin wrote:
>> On 2024-02-15 10:44 a.m., Guenter Roeck wrote:
>>> On 2/15/24 02:27, David Laight wrote:
>>>> ...
>>>>> It would be worthwhile tracking this down since there are
>>>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>>>>> when running the kernel in 64-bit mode.
>>>>
>>>> Hmmm....
>>>> For performance reasons you really don't want any of them.
>>>> The misaligned 64bit fields need an __attribute((aligned(4)) marker.
>>>>
>>>> If the checksum code can do them it really needs to detect
>>>> and handle the misalignment.
>>>>
>>>> The misaligned trap handler probably ought to contain a
>>>> warn_on_once() to dump stack on the first such error.
>>>> They can then be fixed one at a time.
>>>>
>>>
>>> Unaligned LDD at unwind_once+0x4a8/0x5e0
>>>
>>> Decoded:
>>>
>>> Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 
>>> arch/parisc/kernel/unwind.c:445)
>>>
>>> Source:
>>>
>>> static bool pc_is_kernel_fn(unsigned long pc, void *fn)
>>> {
>>>         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
>> This looks wrong to me.  Function descriptors should always be 8-byte aligned.  I think this
>> routine should return false if fn isn't 8-byte aligned.
>
> Below you state "Code entry points only need 4-byte alignment."
>
> I think that contradicts each other. Also, the calling code is,
> for example,
>     pc_is_kernel_fn(pc, syscall_exit)
>
> I fail to see how this can be consolidated if it is ok
> that syscall_exit is 4-byte aligned but, at the same time,
> must be 8-byte aligned to be considered to be a kernel function.
In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned
function descriptor.  The descriptor holds the actual address of the function.  It only needs
4-byte alignment.

Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed
using ldd instructions.

Dave
Guenter Roeck Feb. 15, 2024, 5:29 p.m. UTC | #19
On 2/15/24 08:51, Charlie Jenkins wrote:
> On Thu, Feb 15, 2024 at 08:30:22AM -0800, Guenter Roeck wrote:
>> On 2/15/24 07:36, Charlie Jenkins wrote:
>>> On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote:
>>>> On 2/14/24 19:35, Charlie Jenkins wrote:
>>>>> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote:
>>>>>> On 2024-02-14 8:58 p.m., Guenter Roeck wrote:
>>>>>>> Specifically: Yes, the carry/borrow bits should be restored. Question is
>>>>>>> if the Linux kernel's interrupt handler doesn't restore the carry bits
>>>>>>> or if the problem is on the qemu side.
>>>>>> The carry/borrow bits in the PSW should be saved and restored by the save_specials
>>>>>> and rest_specials macros.  They are defined in arch/parisc/include/asm/assembly.h.
>>>>>
>>>>> Why would they be needed to be restored in linux? The manual says "The
>>>>> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION
>>>>> instruction". This means that the PSW must be restored by the hardware.
>>>>>
>>>>> We can see the QEMU implementation in:
>>>>>
>>>>> rfi:
>>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93
>>>>>
>>>>> handling interrupt:
>>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109
>>>>>
>>>>> However the implementation appears to be faulty. During an RFI, the PSW
>>>>> is always set to 0x804000e (regardless of what the PSW was before the
>>>>> interrupt).
>>>>>
>>>>
>>>> Not sure if I agree. The interrupt handler in Linux is the one which needs to set
>>>> IPSW. Looking into the code, I agree with Dave that the tophys macro seems to
>>>> clobber the carry bits before psw is saved, so they can not really be restored.
>>>> The only issue with that idea is that I can only reproduce the problem with
>>>> an interrupted ldd instruction but not, for example, with ldw. This is why it
>>>> would be really important to have someone with real hardware test this.
>>>>
>>>> Thanks,
>>>> Guenter
>>>
>>> Yes, we definitely feedback from somebody with access to hardware, but I
>>> do not understand how "The PSW is set to the contents of the IPSW by the
>>> RETURN FROM INTERRUPTION" could be interpreted as anything except that
>>> the hardware is expected to over-write the contents of the PSW during
>>> the rfi.
>>>
>>
>> Sure, I absolutely agree. But that assumes that IPSW is set correctly
>> in the Linux interrupt handler. We do know that something odd happens
> 
> The manual defines the saving of PSW as the responsibility of the
> hardware as well: "When an interruption occurs, the current value of the
> PSW is saved in the Interruption Processor Status Word (IPSW)". I don't
> think this should be interpreted to mean that a software interrupt
> handler is required to save the IPSW.
> 

Sorry, I meant the manipulation of ipsw by the linux interrupt handler.

Guenter

> - Charlie
> 
>> when an unaligned ldd is encountered. At least for my part I don't know
>> if the problem is in emulate_ldd() in the Linux kernel or in the ldd
>> implementation and trap handling in qemu. I do know (from my logs)
>> that qemu does see the correct PSW/IPSW values, because they do
>> show up correctly in the Linux kernel when running the qemu emulation.
>> Only it somehow gets lost when the Linux interrupt handler returns.
>>
>> Thanks.
>> Guenter
>>
Guenter Roeck Feb. 15, 2024, 6:17 p.m. UTC | #20
On 2/15/24 09:25, John David Anglin wrote:
[ ... ]
>>>> Source:
>>>>
>>>> static bool pc_is_kernel_fn(unsigned long pc, void *fn)
>>>> {
>>>>         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
>>> This looks wrong to me.  Function descriptors should always be 8-byte aligned.  I think this
>>> routine should return false if fn isn't 8-byte aligned.
>>
>> Below you state "Code entry points only need 4-byte alignment."
>>
>> I think that contradicts each other. Also, the calling code is,
>> for example,
>>     pc_is_kernel_fn(pc, syscall_exit)
>>
>> I fail to see how this can be consolidated if it is ok
>> that syscall_exit is 4-byte aligned but, at the same time,
>> must be 8-byte aligned to be considered to be a kernel function.
> In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned
> function descriptor.  The descriptor holds the actual address of the function.  It only needs
> 4-byte alignment.
> 
> Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed
> using ldd instructions.
> 

Maybe code such as
	pc_is_kernel_fn(pc, syscall_exit)
is wrong because syscall_exit doesn't point to a function descriptor
but to the actual address. The code and comments in arch/parisc/kernel/unwind.c
is for sure confusing because it talks about not using
dereference_kernel_function_descriptor() to keep things simple but then calls
dereference_kernel_function_descriptor() anyway. Maybe it should just be
	if (pc == syscall_exit)
instead.

The entire code is really odd anyway.

         ptr = dereference_kernel_function_descriptor(&handle_interruption);
         if (pc_is_kernel_fn(pc, ptr)) {

and then pc_is_kernel_fn() dereferences it again. Weird.

It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when
CONFIG_64BIT is enabled") might have messed this up. No idea how to fix
it properly, though.

Thanks,
Guenter
Guenter Roeck Feb. 15, 2024, 6:42 p.m. UTC | #21
On 2/15/24 09:25, John David Anglin wrote:
> On 2024-02-15 12:06 p.m., Guenter Roeck wrote:
>> On 2/15/24 08:51, John David Anglin wrote:
>>> On 2024-02-15 10:44 a.m., Guenter Roeck wrote:
>>>> On 2/15/24 02:27, David Laight wrote:
>>>>> ...
>>>>>> It would be worthwhile tracking this down since there are
>>>>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>>>>>> when running the kernel in 64-bit mode.
>>>>>
>>>>> Hmmm....
>>>>> For performance reasons you really don't want any of them.
>>>>> The misaligned 64bit fields need an __attribute((aligned(4)) marker.
>>>>>
>>>>> If the checksum code can do them it really needs to detect
>>>>> and handle the misalignment.
>>>>>
>>>>> The misaligned trap handler probably ought to contain a
>>>>> warn_on_once() to dump stack on the first such error.
>>>>> They can then be fixed one at a time.
>>>>>
>>>>
>>>> Unaligned LDD at unwind_once+0x4a8/0x5e0
>>>>
>>>> Decoded:
>>>>
>>>> Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445)
>>>>
>>>> Source:
>>>>
>>>> static bool pc_is_kernel_fn(unsigned long pc, void *fn)
>>>> {
>>>>         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
>>> This looks wrong to me.  Function descriptors should always be 8-byte aligned.  I think this
>>> routine should return false if fn isn't 8-byte aligned.
>>
>> Below you state "Code entry points only need 4-byte alignment."
>>
>> I think that contradicts each other. Also, the calling code is,
>> for example,
>>     pc_is_kernel_fn(pc, syscall_exit)
>>
>> I fail to see how this can be consolidated if it is ok
>> that syscall_exit is 4-byte aligned but, at the same time,
>> must be 8-byte aligned to be considered to be a kernel function.
> In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned
> function descriptor.  The descriptor holds the actual address of the function.  It only needs
> 4-byte alignment.
> 
> Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed
> using ldd instructions.
> 

How about the patch below ?

Guenter

---
diff --git a/arch/parisc/kernel/unwind.c b/arch/parisc/kernel/unwind.c
index 27ae40a443b8..c2b9e23cbc0a 100644
--- a/arch/parisc/kernel/unwind.c
+++ b/arch/parisc/kernel/unwind.c
@@ -214,24 +214,14 @@ static bool pc_is_kernel_fn(unsigned long pc, void *fn)

  static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int frame_size)
  {
-       /*
-        * We have to use void * instead of a function pointer, because
-        * function pointers aren't a pointer to the function on 64-bit.
-        * Make them const so the compiler knows they live in .text
-        * Note: We could use dereference_kernel_function_descriptor()
-        * instead but we want to keep it simple here.
-        */
-       extern void * const ret_from_kernel_thread;
-       extern void * const syscall_exit;
-       extern void * const intr_return;
-       extern void * const _switch_to_ret;
+       void (*ret_from_kernel_thread)(void);
+       void (*syscall_exit)(void);
+       void (*intr_return)(void);
+       void (*_switch_to_ret)(void);
  #ifdef CONFIG_IRQSTACKS
-       extern void * const _call_on_stack;
+       void (*_call_on_stack)(void);
  #endif /* CONFIG_IRQSTACKS */
-       void *ptr;
-
-       ptr = dereference_kernel_function_descriptor(&handle_interruption);
-       if (pc_is_kernel_fn(pc, ptr)) {
+       if (pc_is_kernel_fn(pc, handle_interruption)) {
                 struct pt_regs *regs = (struct pt_regs *)(info->sp - frame_size - PT_SZ_ALGN);
                 dbg("Unwinding through handle_interruption()\n");
                 info->prev_sp = regs->gr[30];
John David Anglin Feb. 15, 2024, 6:56 p.m. UTC | #22
On 2024-02-15 1:17 p.m., Guenter Roeck wrote:
> On 2/15/24 09:25, John David Anglin wrote:
> [ ... ]
>>>>> Source:
>>>>>
>>>>> static bool pc_is_kernel_fn(unsigned long pc, void *fn)
>>>>> {
>>>>>         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
>>>> This looks wrong to me.  Function descriptors should always be 8-byte aligned.  I think this
>>>> routine should return false if fn isn't 8-byte aligned.
>>>
>>> Below you state "Code entry points only need 4-byte alignment."
>>>
>>> I think that contradicts each other. Also, the calling code is,
>>> for example,
>>>     pc_is_kernel_fn(pc, syscall_exit)
>>>
>>> I fail to see how this can be consolidated if it is ok
>>> that syscall_exit is 4-byte aligned but, at the same time,
>>> must be 8-byte aligned to be considered to be a kernel function.
>> In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned
>> function descriptor.  The descriptor holds the actual address of the function.  It only needs
>> 4-byte alignment.
>>
>> Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed
>> using ldd instructions.
>>
>
> Maybe code such as
>     pc_is_kernel_fn(pc, syscall_exit)
> is wrong because syscall_exit doesn't point to a function descriptor
> but to the actual address. The code and comments in arch/parisc/kernel/unwind.c
It depends on how syscall_exit is declared.    unwind.c lies the type of handle_interruption, etc:

         extern void * const handle_interruption;
         extern void * const ret_from_kernel_thread;
         extern void * const syscall_exit;
         extern void * const intr_return;
         extern void * const _switch_to_ret;
#ifdef CONFIG_IRQSTACKS
         extern void * const _call_on_stack;
#endif /* CONFIG_IRQSTACKS */

This should yield actual addresses.
> is for sure confusing because it talks about not using
> dereference_kernel_function_descriptor() to keep things simple but then calls
> dereference_kernel_function_descriptor() anyway. Maybe it should just be
>     if (pc == syscall_exit)
> instead.
Looks like.
>
> The entire code is really odd anyway.
>
>         ptr = dereference_kernel_function_descriptor(&handle_interruption);
>         if (pc_is_kernel_fn(pc, ptr)) {
>
> and then pc_is_kernel_fn() dereferences it again. Weird.
>
> It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when
> CONFIG_64BIT is enabled") might have messed this up. No idea how to fix
> it properly, though.
This is Helge's code...  I'll let him fix it.

Dave
David Laight Feb. 15, 2024, 7:41 p.m. UTC | #23
...
> > Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed
> > using ldd instructions.
> >
> 
> How about the patch below ?

I think you still need something to avoid the misalignment trap
in the 'false' case.

If they only need to be aligned 'for efficiency' then I'd assume
the cpu (or whatever normally processes them) is ok with 4-byte
alignment even though 'ldd' (8 byte load?) faults.

Which would mean you need to read them with two 4-byte loads.

Especially if 'fn' isn't just one of a couple of specific values
that can be forced to be aligned.

But the code might just be completely broken.
(as suggested elsewhere).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Charlie Jenkins Feb. 15, 2024, 7:42 p.m. UTC | #24
On Thu, Feb 15, 2024 at 10:42:29AM -0800, Guenter Roeck wrote:
> On 2/15/24 09:25, John David Anglin wrote:
> > On 2024-02-15 12:06 p.m., Guenter Roeck wrote:
> > > On 2/15/24 08:51, John David Anglin wrote:
> > > > On 2024-02-15 10:44 a.m., Guenter Roeck wrote:
> > > > > On 2/15/24 02:27, David Laight wrote:
> > > > > > ...
> > > > > > > It would be worthwhile tracking this down since there are
> > > > > > > lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
> > > > > > > when running the kernel in 64-bit mode.
> > > > > > 
> > > > > > Hmmm....
> > > > > > For performance reasons you really don't want any of them.
> > > > > > The misaligned 64bit fields need an __attribute((aligned(4)) marker.
> > > > > > 
> > > > > > If the checksum code can do them it really needs to detect
> > > > > > and handle the misalignment.
> > > > > > 
> > > > > > The misaligned trap handler probably ought to contain a
> > > > > > warn_on_once() to dump stack on the first such error.
> > > > > > They can then be fixed one at a time.
> > > > > > 
> > > > > 
> > > > > Unaligned LDD at unwind_once+0x4a8/0x5e0
> > > > > 
> > > > > Decoded:
> > > > > 
> > > > > Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445)
> > > > > 
> > > > > Source:
> > > > > 
> > > > > static bool pc_is_kernel_fn(unsigned long pc, void *fn)
> > > > > {
> > > > >         return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
> > > > This looks wrong to me.  Function descriptors should always be 8-byte aligned.  I think this
> > > > routine should return false if fn isn't 8-byte aligned.
> > > 
> > > Below you state "Code entry points only need 4-byte alignment."
> > > 
> > > I think that contradicts each other. Also, the calling code is,
> > > for example,
> > >     pc_is_kernel_fn(pc, syscall_exit)
> > > 
> > > I fail to see how this can be consolidated if it is ok
> > > that syscall_exit is 4-byte aligned but, at the same time,
> > > must be 8-byte aligned to be considered to be a kernel function.
> > In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned
> > function descriptor.  The descriptor holds the actual address of the function.  It only needs
> > 4-byte alignment.
> > 
> > Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed
> > using ldd instructions.
> > 
> 
> How about the patch below ?
> 
> Guenter
> 
> ---
> diff --git a/arch/parisc/kernel/unwind.c b/arch/parisc/kernel/unwind.c
> index 27ae40a443b8..c2b9e23cbc0a 100644
> --- a/arch/parisc/kernel/unwind.c
> +++ b/arch/parisc/kernel/unwind.c
> @@ -214,24 +214,14 @@ static bool pc_is_kernel_fn(unsigned long pc, void *fn)
> 
>  static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int frame_size)
>  {
> -       /*
> -        * We have to use void * instead of a function pointer, because
> -        * function pointers aren't a pointer to the function on 64-bit.
> -        * Make them const so the compiler knows they live in .text
> -        * Note: We could use dereference_kernel_function_descriptor()
> -        * instead but we want to keep it simple here.
> -        */
> -       extern void * const ret_from_kernel_thread;
> -       extern void * const syscall_exit;
> -       extern void * const intr_return;
> -       extern void * const _switch_to_ret;
> +       void (*ret_from_kernel_thread)(void);
> +       void (*syscall_exit)(void);
> +       void (*intr_return)(void);
> +       void (*_switch_to_ret)(void);
>  #ifdef CONFIG_IRQSTACKS
> -       extern void * const _call_on_stack;
> +       void (*_call_on_stack)(void);
>  #endif /* CONFIG_IRQSTACKS */
> -       void *ptr;
> -
> -       ptr = dereference_kernel_function_descriptor(&handle_interruption);
> -       if (pc_is_kernel_fn(pc, ptr)) {
> +       if (pc_is_kernel_fn(pc, handle_interruption)) {
>                 struct pt_regs *regs = (struct pt_regs *)(info->sp - frame_size - PT_SZ_ALGN);
>                 dbg("Unwinding through handle_interruption()\n");
>                 info->prev_sp = regs->gr[30];
> 

Seems like a promising direction.

It feels like we have hit a point when we should "close" this thread and
start potentially a couple new ones to correct the behavior of
saving/restoring the PSW and this behavior with unwind.

I don't know what the proper etiquitte is for reverting back to a
previous patch, should I send a v9 that is just the same as the v7?

Thank you Guenter and John for looking into the parisc behavior!

- Charlie
Guenter Roeck Feb. 15, 2024, 9 p.m. UTC | #25
On 2/15/24 10:56, John David Anglin wrote:
[ ... ]
>> It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when
>> CONFIG_64BIT is enabled") might have messed this up. No idea how to fix
>> it properly, though.
> This is Helge's code...  I'll let him fix it.
> 

I need the code anyway to be able to debug the other problem, so I'll give it
a shot and send a patch. Helge can then go from there.

Thanks,
Guenter
John David Anglin Feb. 15, 2024, 9:08 p.m. UTC | #26
On 2024-02-15 4:00 p.m., Guenter Roeck wrote:
> On 2/15/24 10:56, John David Anglin wrote:
> [ ... ]
>>> It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when
>>> CONFIG_64BIT is enabled") might have messed this up. No idea how to fix
>>> it properly, though.
>> This is Helge's code...  I'll let him fix it.
>>
>
> I need the code anyway to be able to debug the other problem, so I'll give it
> a shot and send a patch. Helge can then go from there.
Your patch looked like a good start.

Dave
Guenter Roeck Feb. 16, 2024, 5 a.m. UTC | #27
On 2/15/24 11:42, Charlie Jenkins wrote:
[ ... ]
> Seems like a promising direction.
> 
> It feels like we have hit a point when we should "close" this thread and
> start potentially a couple new ones to correct the behavior of
> saving/restoring the PSW and this behavior with unwind.
> 

No need. Found it. It was a qemu problem. Kind of obvious, like hiding in plain sight.
  I'll send a patch shortly.

> I don't know what the proper etiquitte is for reverting back to a
> previous patch, should I send a v9 that is just the same as the v7?
> 

Not sure if there is one. Just go ahead and send v9, and tell the maintainer
to yell at me.

Guenter
Guenter Roeck Feb. 16, 2024, 5:25 a.m. UTC | #28
On Fri, Feb 16, 2024 at 06:54:55AM +0100, Helge Deller wrote:
> 
> Can you please give a pointer to this test code?
> I'm happy to try it on real hardware.
> 
See below.

Guenter

---
From 0478f35f02224994e1d81e614b66219ab7539f7f Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Wed, 14 Feb 2024 11:25:18 -0800
Subject: [PATCH] carry tests

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 lib/checksum_kunit.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 72c313ba4c78..8f7925396e53 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -546,12 +546,88 @@ static void test_csum_ipv6_magic(struct kunit *test)
 #endif /* !CONFIG_NET */
 }
 
+#ifdef CONFIG_64BIT
+
+static __inline__ int get_carry64(void *addr)
+{
+	int carry = 0;
+	unsigned long sum = 0xffffffff;
+	unsigned long tmp;
+
+	__asm__ __volatile__ (
+"	add	%0, %0, %0\n"	/* clear carry			*/
+"	ldd	0(%2), %3\n"	/* load from memory		*/
+"	add	%1, %3, %1\n"	/* optionally generate carry	*/
+"	ldd	0(%2), %3\n"	/* load from memory again	*/
+"	add,dc	%0, %0, %0\n"	/* return carry			*/
+	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
+	: "0" (carry), "1" (sum), "2" (addr)
+	: "memory");
+
+	return carry;
+}
+
+static __inline__ int get_carry32(void *addr)
+{
+	int carry = 0;
+	unsigned int sum = 0xffffffff;
+	unsigned int tmp;
+
+	__asm__ __volatile__ (
+"	add	%0, %0, %0\n"	/* clear carry			*/
+"	ldw	0(%2), %3\n"	/* load from memory		*/
+"	add	%1, %3, %1\n"	/* optionally generate carry	*/
+"	ldw	0(%2), %3\n"	/* load from memory again	*/
+"	addc	%0, %0, %0\n"	/* return carry			*/
+	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
+	: "0" (carry), "1" (sum), "2" (addr)
+	: "memory");
+
+	return carry;
+}
+
+static void test_bad_carry(struct kunit *test)
+{
+	int carry;
+
+	memset(tmp_buf, 0xff, sizeof(tmp_buf));
+	carry = get_carry64(&tmp_buf[0]);
+	pr_info("#### carry64 aligned, expect 1 -> %d\n", carry);
+	carry = get_carry64(&tmp_buf[4]);
+	pr_info("#### carry64 unaligned 4, expect 1 -> %d\n", carry);
+
+	carry = get_carry64(&tmp_buf[2]);
+	pr_info("#### carry64 unaligned 2, expect 1 -> %d\n", carry);
+
+	carry = get_carry32(&tmp_buf[0]);
+	pr_info("#### carry32 aligned, expect 1 -> %d\n", carry);
+	carry = get_carry32(&tmp_buf[2]);
+	pr_info("#### carry64 unaligned, expect 1 -> %d\n", carry);
+
+	memset(tmp_buf, 0, sizeof(tmp_buf));
+	carry = get_carry64(&tmp_buf[0]);
+	pr_info("#### carry64 aligned, expect 0 -> %d\n", carry);
+	carry = get_carry64(&tmp_buf[4]);
+	pr_info("#### carry64 unaligned 4, expect 0 -> %d\n", carry);
+	carry = get_carry64(&tmp_buf[2]);
+	pr_info("#### carry64 unaligned 2, expect 0 -> %d\n", carry);
+
+	carry = get_carry32(&tmp_buf[0]);
+	pr_info("#### carry32 aligned, expect 0 -> %d\n", carry);
+	carry = get_carry32(&tmp_buf[2]);
+	pr_info("#### carry32 unaligned, expect 0 -> %d\n", carry);
+}
+#else
+static void test_bad_carry(struct kunit *test) {}
+#endif /* CONFIG_64BIT */
+
 static struct kunit_case __refdata checksum_test_cases[] = {
 	KUNIT_CASE(test_csum_fixed_random_inputs),
 	KUNIT_CASE(test_csum_all_carry_inputs),
 	KUNIT_CASE(test_csum_no_carry_inputs),
 	KUNIT_CASE(test_ip_fast_csum),
 	KUNIT_CASE(test_csum_ipv6_magic),
+	KUNIT_CASE(test_bad_carry),
 	{}
 };
Helge Deller Feb. 16, 2024, 5:54 a.m. UTC | #29
On 2/15/24 02:58, Guenter Roeck wrote:
> Hi Charlie,
>
> On 2/14/24 17:30, Charlie Jenkins wrote:
>> On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote:
>>> On 2/14/24 13:41, Charlie Jenkins wrote:
>>>> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
>>>> variety of architectures that are big endian or do not support
>>>> misalgined accesses. Both of these test cases are changed to support big
>>>> and little endian architectures.
>>>>
>>>> The test for ip_fast_csum is changed to align the data along (14 +
>>>> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
>>>> csum_ipv6_magic aligns the data using a struct. An extra padding field
>>>> is added to the struct to ensure that the size of the struct is the same
>>>> on all architectures (44 bytes).
>>>>
>>>> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
>>>> daddr. This would fail on parisc64 due to the following code snippet in
>>>> arch/parisc/include/asm/checksum.h:
>>>>
>>>> add        %4, %0, %0\n"
>>>> ldd,ma        8(%1), %6\n"
>>>> ldd,ma        8(%2), %7\n"
>>>> add,dc        %5, %0, %0\n"
>>>>
>>>> The second add is expecting carry flags from the first add. Normally,
>>>> a double word load (ldd) does not modify the carry flags. However,
>>>> because saddr and daddr may be misaligned, ldd triggers a misalignment
>>>> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
>>>> many additional instructions to be executed between the two adds. This
>>>> can be easily solved by adding the carry into %0 before executing the
>>>> ldd.
>>>>
>>>
>>> I really think this is a bug either in the trap handler or in the hppa64
>>> qemu emulation. Only unaligned ldd instructions affect (actually,
>>> unconditionally set) the carry flag. That doesn't happen with unaligned
>>> ldw instructions. It would be worthwhile tracking this down since there are
>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>>> when running the kernel in 64-bit mode. On the other side, I guess this
>>> is a different problem. Not sure though if that should even be mentioned
>>> here since that makes it sound as if it would be expected that such
>>> accesses impact the carry flag.
>>
>> I wasn't confident it was a bug somewhere so that's why I sent this patch.
>>
>> However, I have just found the section of the processor manual [1] I was
>> looking for (Section Privileged Software-Accessible Registers subsection
>> Processor Status Word (PSW)):
>>
>> "Processor state is encoded in a 64-bit register called the Processor
>> Status Word (PSW). When an interruption occurs, the current value of the
>> PSW is saved in the Interruption Processor Status Word (IPSW) and
>> usually all defined PSW bits are set to 0.
>>
>> "The PSW is set to the contents of the IPSW by the RETURN FROM
>> INTERRUPTION instruction. The interruption handler may restore the
>> original PSW, modify selected bits, or may change the PSW to an entirely
>> new value."
>>
>> Stored in the PSW register are the "Carry/borrow bits". This confirms
>> that the carry/borrow bits should be restored. The save is supposed to
>> automatically happen upon an interrupt and restored by the RETURN FROM
>> INTERRUPTION, thus this is a QEMU bug and not a Linux bug (please
>> correct me if I am wrong).
>>
>
> I know that much (I looked into the manual as well), I just really don't
> know if this is a Linux bug or a QEMU bug, and I have not been able to
> nail it down. I think someone with access to hardware will need to confirm.
>
> Specifically: Yes, the carry/borrow bits should be restored. Question is
> if the Linux kernel's interrupt handler doesn't restore the carry bits
> or if the problem is on the qemu side.
>
>> This v8 was not needed after-all it seems. It would be best to stick
>> with the v7.
>>
> I tend to agree; after all, v7 exposes the problem, making it easier to
> determine if the problem can be reproduced on real hardware.
>
> FWIW,I wrote some test code which exposes the problem.

Can you please give a pointer to this test code?
I'm happy to try it on real hardware.

> It is quite easy to show that carry is always set after executing ldd
> on an unaligned address. That is also why I know for sure that the
> problem is not seen with ldw on unaligned addresses.
Interesting.
In general I think it's quite important to differentiate between
running on qemu or running on physical hardware.
Qemu just recently got 64-bit support, and it's not yet behaving
like real hardware. One thing I noticed is, that read hardware
does not seem to jump into the exception handler twice, while
qemu does. So, if you run into an exception (e.g. unaligned ldd)
then if a second exception happens in the fault handler (e.g. second
unaligned ldd to resolve wrongly-coded code lookup), you will
get different behaviour between hardware and emulation.
This is also the reason why qemu still fails to emulate newer
64-bit Linux kernels which uses kernel modules.

Helge
Guenter Roeck Feb. 16, 2024, 6:09 a.m. UTC | #30
On 2/15/24 21:54, Helge Deller wrote:
[ ... ]
> 
> Can you please give a pointer to this test code?
> I'm happy to try it on real hardware.
> 
You should also see the problem if you use v7 of Charlie's checksum
unit test fixes.

I submitted the qemu fix (or at least what I think the fix should be)
a couple of minutes ago.

https://patchwork.kernel.org/project/qemu-devel/patch/20240216053415.2163286-1-linux@roeck-us.net/

>> It is quite easy to show that carry is always set after executing ldd
>> on an unaligned address. That is also why I know for sure that the
>> problem is not seen with ldw on unaligned addresses.
> Interesting.

Ultimately it wasn't surprising, with the unusual carry bit
implementation on hppa. The upper 8 carry bits were not masked
correctly when returning from a trap or interrupt.

> In general I think it's quite important to differentiate between
> running on qemu or running on physical hardware.

I know, that makes testing always tricky (not just with this
architecture) because it is often not obvious if the problem
is a problem in the tested code or a problem in the emulation.

> Qemu just recently got 64-bit support, and it's not yet behaving
> like real hardware. One thing I noticed is, that read hardware
> does not seem to jump into the exception handler twice, while
> qemu does. So, if you run into an exception (e.g. unaligned ldd)
> then if a second exception happens in the fault handler (e.g. second
> unaligned ldd to resolve wrongly-coded code lookup), you will
> get different behaviour between hardware and emulation.

Hmm, interesting. Makes me wonder how the real hardware handles such
double traps.

> This is also the reason why qemu still fails to emulate newer
> 64-bit Linux kernels which uses kernel modules.
> 
I don't use modules in my testing, so I'll leave that alone for
anther day.

Cheers,
Guenter
Charlie Jenkins Feb. 16, 2024, 6:13 a.m. UTC | #31
On Thu, Feb 15, 2024 at 10:09:42PM -0800, Guenter Roeck wrote:
> On 2/15/24 21:54, Helge Deller wrote:
> [ ... ]
> > 
> > Can you please give a pointer to this test code?
> > I'm happy to try it on real hardware.
> > 
> You should also see the problem if you use v7 of Charlie's checksum
> unit test fixes.
> 
> I submitted the qemu fix (or at least what I think the fix should be)
> a couple of minutes ago.
> 
> https://patchwork.kernel.org/project/qemu-devel/patch/20240216053415.2163286-1-linux@roeck-us.net/
> 
> > > It is quite easy to show that carry is always set after executing ldd
> > > on an unaligned address. That is also why I know for sure that the
> > > problem is not seen with ldw on unaligned addresses.
> > Interesting.
> 
> Ultimately it wasn't surprising, with the unusual carry bit
> implementation on hppa. The upper 8 carry bits were not masked
> correctly when returning from a trap or interrupt.

Tangential question, but why does Linux need to save and restore the PSW
if that is already handled by the hardware? I am missing something.

- Charlie

> 
> > In general I think it's quite important to differentiate between
> > running on qemu or running on physical hardware.
> 
> I know, that makes testing always tricky (not just with this
> architecture) because it is often not obvious if the problem
> is a problem in the tested code or a problem in the emulation.
> 
> > Qemu just recently got 64-bit support, and it's not yet behaving
> > like real hardware. One thing I noticed is, that read hardware
> > does not seem to jump into the exception handler twice, while
> > qemu does. So, if you run into an exception (e.g. unaligned ldd)
> > then if a second exception happens in the fault handler (e.g. second
> > unaligned ldd to resolve wrongly-coded code lookup), you will
> > get different behaviour between hardware and emulation.
> 
> Hmm, interesting. Makes me wonder how the real hardware handles such
> double traps.
> 
> > This is also the reason why qemu still fails to emulate newer
> > 64-bit Linux kernels which uses kernel modules.
> > 
> I don't use modules in my testing, so I'll leave that alone for
> anther day.
> 
> Cheers,
> Guenter
>
Guenter Roeck Feb. 16, 2024, 6:58 a.m. UTC | #32
Hi Helge,

On 2/15/24 23:31, Helge Deller wrote:
> On 2/16/24 06:25, Guenter Roeck wrote:
>> On Fri, Feb 16, 2024 at 06:54:55AM +0100, Helge Deller wrote:
>>>
>>> Can you please give a pointer to this test code?
>>> I'm happy to try it on real hardware.
>>>
>> See below.
> 
> Testcase runs OK on physical machine:
> 
> #### carry64 aligned, expect 1 -> 1
> #### carry64 unaligned 4, expect 1 -> 1
> #### carry64 unaligned 2, expect 1 -> 1
> #### carry32 aligned, expect 1 -> 1
> #### carry64 unaligned, expect 1 -> 1
> #### carry64 aligned, expect 0 -> 0
> #### carry64 unaligned 4, expect 0 -> 0
> #### carry64 unaligned 2, expect 0 -> 0
> #### carry32 aligned, expect 0 -> 0
> #### carry32 unaligned, expect 0 -> 0
>      ok 6 test_bad_carry
> 

thanks a lot for the confirmation.

Guenter
Helge Deller Feb. 16, 2024, 7:31 a.m. UTC | #33
On 2/16/24 06:25, Guenter Roeck wrote:
> On Fri, Feb 16, 2024 at 06:54:55AM +0100, Helge Deller wrote:
>>
>> Can you please give a pointer to this test code?
>> I'm happy to try it on real hardware.
>>
> See below.

Testcase runs OK on physical machine:

#### carry64 aligned, expect 1 -> 1
#### carry64 unaligned 4, expect 1 -> 1
#### carry64 unaligned 2, expect 1 -> 1
#### carry32 aligned, expect 1 -> 1
#### carry64 unaligned, expect 1 -> 1
#### carry64 aligned, expect 0 -> 0
#### carry64 unaligned 4, expect 0 -> 0
#### carry64 unaligned 2, expect 0 -> 0
#### carry32 aligned, expect 0 -> 0
#### carry32 unaligned, expect 0 -> 0
     ok 6 test_bad_carry

Helge

> ---
>  From 0478f35f02224994e1d81e614b66219ab7539f7f Mon Sep 17 00:00:00 2001
> From: Guenter Roeck <linux@roeck-us.net>
> Date: Wed, 14 Feb 2024 11:25:18 -0800
> Subject: [PATCH] carry tests
>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>   lib/checksum_kunit.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
>
> diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
> index 72c313ba4c78..8f7925396e53 100644
> --- a/lib/checksum_kunit.c
> +++ b/lib/checksum_kunit.c
> @@ -546,12 +546,88 @@ static void test_csum_ipv6_magic(struct kunit *test)
>   #endif /* !CONFIG_NET */
>   }
>
> +#ifdef CONFIG_64BIT
> +
> +static __inline__ int get_carry64(void *addr)
> +{
> +	int carry = 0;
> +	unsigned long sum = 0xffffffff;
> +	unsigned long tmp;
> +
> +	__asm__ __volatile__ (
> +"	add	%0, %0, %0\n"	/* clear carry			*/
> +"	ldd	0(%2), %3\n"	/* load from memory		*/
> +"	add	%1, %3, %1\n"	/* optionally generate carry	*/
> +"	ldd	0(%2), %3\n"	/* load from memory again	*/
> +"	add,dc	%0, %0, %0\n"	/* return carry			*/
> +	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
> +	: "0" (carry), "1" (sum), "2" (addr)
> +	: "memory");
> +
> +	return carry;
> +}
> +
> +static __inline__ int get_carry32(void *addr)
> +{
> +	int carry = 0;
> +	unsigned int sum = 0xffffffff;
> +	unsigned int tmp;
> +
> +	__asm__ __volatile__ (
> +"	add	%0, %0, %0\n"	/* clear carry			*/
> +"	ldw	0(%2), %3\n"	/* load from memory		*/
> +"	add	%1, %3, %1\n"	/* optionally generate carry	*/
> +"	ldw	0(%2), %3\n"	/* load from memory again	*/
> +"	addc	%0, %0, %0\n"	/* return carry			*/
> +	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
> +	: "0" (carry), "1" (sum), "2" (addr)
> +	: "memory");
> +
> +	return carry;
> +}
> +
> +static void test_bad_carry(struct kunit *test)
> +{
> +	int carry;
> +
> +	memset(tmp_buf, 0xff, sizeof(tmp_buf));
> +	carry = get_carry64(&tmp_buf[0]);
> +	pr_info("#### carry64 aligned, expect 1 -> %d\n", carry);
> +	carry = get_carry64(&tmp_buf[4]);
> +	pr_info("#### carry64 unaligned 4, expect 1 -> %d\n", carry);
> +
> +	carry = get_carry64(&tmp_buf[2]);
> +	pr_info("#### carry64 unaligned 2, expect 1 -> %d\n", carry);
> +
> +	carry = get_carry32(&tmp_buf[0]);
> +	pr_info("#### carry32 aligned, expect 1 -> %d\n", carry);
> +	carry = get_carry32(&tmp_buf[2]);
> +	pr_info("#### carry64 unaligned, expect 1 -> %d\n", carry);
> +
> +	memset(tmp_buf, 0, sizeof(tmp_buf));
> +	carry = get_carry64(&tmp_buf[0]);
> +	pr_info("#### carry64 aligned, expect 0 -> %d\n", carry);
> +	carry = get_carry64(&tmp_buf[4]);
> +	pr_info("#### carry64 unaligned 4, expect 0 -> %d\n", carry);
> +	carry = get_carry64(&tmp_buf[2]);
> +	pr_info("#### carry64 unaligned 2, expect 0 -> %d\n", carry);
> +
> +	carry = get_carry32(&tmp_buf[0]);
> +	pr_info("#### carry32 aligned, expect 0 -> %d\n", carry);
> +	carry = get_carry32(&tmp_buf[2]);
> +	pr_info("#### carry32 unaligned, expect 0 -> %d\n", carry);
> +}
> +#else
> +static void test_bad_carry(struct kunit *test) {}
> +#endif /* CONFIG_64BIT */
> +
>   static struct kunit_case __refdata checksum_test_cases[] = {
>   	KUNIT_CASE(test_csum_fixed_random_inputs),
>   	KUNIT_CASE(test_csum_all_carry_inputs),
>   	KUNIT_CASE(test_csum_no_carry_inputs),
>   	KUNIT_CASE(test_ip_fast_csum),
>   	KUNIT_CASE(test_csum_ipv6_magic),
> +	KUNIT_CASE(test_bad_carry),
>   	{}
>   };
>
Helge Deller Feb. 16, 2024, 12:05 p.m. UTC | #34
On 2/16/24 07:13, Charlie Jenkins wrote:
> On Thu, Feb 15, 2024 at 10:09:42PM -0800, Guenter Roeck wrote:
>> On 2/15/24 21:54, Helge Deller wrote:
>> [ ... ]
>>>
>>> Can you please give a pointer to this test code?
>>> I'm happy to try it on real hardware.
>>>
>> You should also see the problem if you use v7 of Charlie's checksum
>> unit test fixes.
>>
>> I submitted the qemu fix (or at least what I think the fix should be)
>> a couple of minutes ago.
>>
>> https://patchwork.kernel.org/project/qemu-devel/patch/20240216053415.2163286-1-linux@roeck-us.net/
>>
>>>> It is quite easy to show that carry is always set after executing ldd
>>>> on an unaligned address. That is also why I know for sure that the
>>>> problem is not seen with ldw on unaligned addresses.
>>> Interesting.
>>
>> Ultimately it wasn't surprising, with the unusual carry bit
>> implementation on hppa. The upper 8 carry bits were not masked
>> correctly when returning from a trap or interrupt.
>
> Tangential question, but why does Linux need to save and restore the PSW
> if that is already handled by the hardware? I am missing something.

e.g. for task switching.

Helge
John David Anglin Feb. 16, 2024, 6:22 p.m. UTC | #35
On 2024-02-16 12:54 a.m., Helge Deller wrote:
> On 2/15/24 02:58, Guenter Roeck wrote:
>> Hi Charlie,
>>
>> On 2/14/24 17:30, Charlie Jenkins wrote:
>>> On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote:
>>>> On 2/14/24 13:41, Charlie Jenkins wrote:
>>>>> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
>>>>> variety of architectures that are big endian or do not support
>>>>> misalgined accesses. Both of these test cases are changed to support big
>>>>> and little endian architectures.
>>>>>
>>>>> The test for ip_fast_csum is changed to align the data along (14 +
>>>>> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
>>>>> csum_ipv6_magic aligns the data using a struct. An extra padding field
>>>>> is added to the struct to ensure that the size of the struct is the same
>>>>> on all architectures (44 bytes).
>>>>>
>>>>> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
>>>>> daddr. This would fail on parisc64 due to the following code snippet in
>>>>> arch/parisc/include/asm/checksum.h:
>>>>>
>>>>> add        %4, %0, %0\n"
>>>>> ldd,ma        8(%1), %6\n"
>>>>> ldd,ma        8(%2), %7\n"
>>>>> add,dc        %5, %0, %0\n"
>>>>>
>>>>> The second add is expecting carry flags from the first add. Normally,
>>>>> a double word load (ldd) does not modify the carry flags. However,
>>>>> because saddr and daddr may be misaligned, ldd triggers a misalignment
>>>>> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
>>>>> many additional instructions to be executed between the two adds. This
>>>>> can be easily solved by adding the carry into %0 before executing the
>>>>> ldd.
>>>>>
>>>>
>>>> I really think this is a bug either in the trap handler or in the hppa64
>>>> qemu emulation. Only unaligned ldd instructions affect (actually,
>>>> unconditionally set) the carry flag. That doesn't happen with unaligned
>>>> ldw instructions. It would be worthwhile tracking this down since there are
>>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
>>>> when running the kernel in 64-bit mode. On the other side, I guess this
>>>> is a different problem. Not sure though if that should even be mentioned
>>>> here since that makes it sound as if it would be expected that such
>>>> accesses impact the carry flag.
>>>
>>> I wasn't confident it was a bug somewhere so that's why I sent this patch.
>>>
>>> However, I have just found the section of the processor manual [1] I was
>>> looking for (Section Privileged Software-Accessible Registers subsection
>>> Processor Status Word (PSW)):
>>>
>>> "Processor state is encoded in a 64-bit register called the Processor
>>> Status Word (PSW). When an interruption occurs, the current value of the
>>> PSW is saved in the Interruption Processor Status Word (IPSW) and
>>> usually all defined PSW bits are set to 0.
>>>
>>> "The PSW is set to the contents of the IPSW by the RETURN FROM
>>> INTERRUPTION instruction. The interruption handler may restore the
>>> original PSW, modify selected bits, or may change the PSW to an entirely
>>> new value."
>>>
>>> Stored in the PSW register are the "Carry/borrow bits". This confirms
>>> that the carry/borrow bits should be restored. The save is supposed to
>>> automatically happen upon an interrupt and restored by the RETURN FROM
>>> INTERRUPTION, thus this is a QEMU bug and not a Linux bug (please
>>> correct me if I am wrong).
>>>
>>
>> I know that much (I looked into the manual as well), I just really don't
>> know if this is a Linux bug or a QEMU bug, and I have not been able to
>> nail it down. I think someone with access to hardware will need to confirm.
>>
>> Specifically: Yes, the carry/borrow bits should be restored. Question is
>> if the Linux kernel's interrupt handler doesn't restore the carry bits
>> or if the problem is on the qemu side.
>>
>>> This v8 was not needed after-all it seems. It would be best to stick
>>> with the v7.
>>>
>> I tend to agree; after all, v7 exposes the problem, making it easier to
>> determine if the problem can be reproduced on real hardware.
>>
>> FWIW,I wrote some test code which exposes the problem.
>
> Can you please give a pointer to this test code?
> I'm happy to try it on real hardware.
>
>> It is quite easy to show that carry is always set after executing ldd
>> on an unaligned address. That is also why I know for sure that the
>> problem is not seen with ldw on unaligned addresses.
> Interesting.
> In general I think it's quite important to differentiate between
> running on qemu or running on physical hardware.
> Qemu just recently got 64-bit support, and it's not yet behaving
> like real hardware. One thing I noticed is, that read hardware
> does not seem to jump into the exception handler twice, while
> qemu does. So, if you run into an exception (e.g. unaligned ldd)
See page 2-13 of arch.  An interruption sets the PSW Q bit to 0 freezing the IIA
queues.  If an interruption occurs with Q=0, the interruption parameter registers
are left unchanged, so I don't think there's a way to handle double exceptions
on real hardware (Q would have to be set back to 1 after the IPRs are read).

Dave
diff mbox series

Patch

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 776ad3d6d5a1..f0eb7005ff27 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -13,8 +13,9 @@ 
 
 #define IPv4_MIN_WORDS 5
 #define IPv4_MAX_WORDS 15
-#define NUM_IPv6_TESTS 200
-#define NUM_IP_FAST_CSUM_TESTS 181
+#define WORD_ALIGNMENT 4
+/* Ethernet headers are 14 bytes and NET_IP_ALIGN is used to align them */
+#define IP_ALIGNMENT (14 + NET_IP_ALIGN)
 
 /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
 static const u32 random_init_sum = 0x2847aab;
@@ -216,234 +217,106 @@  static const u32 init_sums_no_overflow[] = {
 };
 
 static const u16 expected_csum_ipv6_magic[] = {
-	0x18d4, 0x3085, 0x2e4b, 0xd9f4, 0xbdc8, 0x78f,	0x1034, 0x8422, 0x6fc0,
-	0xd2f6, 0xbeb5, 0x9d3,	0x7e2a, 0x312e, 0x778e, 0xc1bb, 0x7cf2, 0x9d1e,
-	0xca21, 0xf3ff, 0x7569, 0xb02e, 0xca86, 0x7e76, 0x4539, 0x45e3, 0xf28d,
-	0xdf81, 0x8fd5, 0x3b5d, 0x8324, 0xf471, 0x83be, 0x1daf, 0x8c46, 0xe682,
-	0xd1fb, 0x6b2e, 0xe687, 0x2a33, 0x4833, 0x2d67, 0x660f, 0x2e79, 0xd65e,
-	0x6b62, 0x6672, 0x5dbd, 0x8680, 0xbaa5, 0x2229, 0x2125, 0x2d01, 0x1cc0,
-	0x6d36, 0x33c0, 0xee36, 0xd832, 0x9820, 0x8a31, 0x53c5, 0x2e2,	0xdb0e,
-	0x49ed, 0x17a7, 0x77a0, 0xd72e, 0x3d72, 0x7dc8, 0x5b17, 0xf55d, 0xa4d9,
-	0x1446, 0x5d56, 0x6b2e, 0x69a5, 0xadb6, 0xff2a, 0x92e,	0xe044, 0x3402,
-	0xbb60, 0xec7f, 0xe7e6, 0x1986, 0x32f4, 0x8f8,	0x5e00, 0x47c6, 0x3059,
-	0x3969, 0xe957, 0x4388, 0x2854, 0x3334, 0xea71, 0xa6de, 0x33f9, 0x83fc,
-	0x37b4, 0x5531, 0x3404, 0x1010, 0xed30, 0x610a, 0xc95,	0x9aed, 0x6ff,
-	0x5136, 0x2741, 0x660e, 0x8b80, 0xf71,	0xa263, 0x88af, 0x7a73, 0x3c37,
-	0x1908, 0x6db5, 0x2e92, 0x1cd2, 0x70c8, 0xee16, 0xe80,	0xcd55, 0x6e6,
-	0x6434, 0x127,	0x655d, 0x2ea0, 0xb4f4, 0xdc20, 0x5671, 0xe462, 0xe52b,
-	0xdb44, 0x3589, 0xc48f, 0xe60b, 0xd2d2, 0x66ad, 0x498,	0x436,	0xb917,
-	0xf0ca, 0x1a6e, 0x1cb7, 0xbf61, 0x2870, 0xc7e8, 0x5b30, 0xe4a5, 0x168,
-	0xadfc, 0xd035, 0xe690, 0xe283, 0xfb27, 0xe4ad, 0xb1a5, 0xf2d5, 0xc4b6,
-	0x8a30, 0xd7d5, 0x7df9, 0x91d5, 0x63ed, 0x2d21, 0x312b, 0xab19, 0xa632,
-	0x8d2e, 0xef06, 0x57b9, 0xc373, 0xbd1f, 0xa41f, 0x8444, 0x9975, 0x90cb,
-	0xc49c, 0xe965, 0x4eff, 0x5a,	0xef6d, 0xe81a, 0xe260, 0x853a, 0xff7a,
-	0x99aa, 0xb06b, 0xee19, 0xcc2c, 0xf34c, 0x7c49, 0xdac3, 0xa71e, 0xc988,
-	0x3845, 0x1014
+	0x2fbd, 0xb5d0, 0xc16e, 0xe6c1, 0x412e, 0xa836, 0x433b, 0xe87c, 0x57ea,
+	0x5875, 0x5a21, 0xd361, 0xe422, 0xdd50, 0xa57f, 0xad6b, 0xecd1, 0x90b5,
+	0x7fda, 0x88db, 0x979e, 0x8916, 0x1df0, 0x7853, 0x473e, 0xd2b3, 0x4526,
+	0x8304, 0x4c34, 0x363d, 0x2dc1, 0xb66c, 0x9a28, 0xd4f2, 0x615d, 0x36dd,
+	0x3784, 0xbadd, 0xa2c6, 0xd293, 0x9830, 0xcea8, 0x1349, 0x886d, 0x20a3,
+	0x6001, 0x5607, 0x4a30, 0x9365, 0x893c, 0x7505, 0x50a1, 0xb200, 0xf3ad,
+	0x80bf, 0x1f48, 0xc1d9, 0xf55d, 0x7871, 0x262a, 0xe606, 0xb894, 0xd6cd,
+	0x39ed, 0x1817, 0x9d20, 0x6f93, 0x722d, 0x1116, 0xb3b4, 0x88ec, 0xdcb5,
+	0x1a46, 0xfa1d, 0x141e, 0xaef7, 0x7cee, 0xa583, 0x72ad, 0x201b, 0xdece,
+	0xa1ee, 0x92bd, 0xc7ba, 0x403e, 0x50a9, 0xcb5a, 0xff3b, 0x1b41, 0xa06b,
+	0xcf2d, 0xcc9a, 0xf42a, 0x0c61, 0x7654, 0xf3d4, 0xcc25, 0x4985, 0x7606,
+	0xc8a2, 0x6636, 0x610e, 0xc454, 0xefa4, 0xf3a6, 0x43a7, 0xd8e2, 0xe31e,
+	0x150b, 0x445d, 0x57d5, 0x253c, 0x6adf, 0x3c53, 0x502c, 0x9ee5, 0x8422,
 };
 
 static const u16 expected_fast_csum[] = {
-	0xda83, 0x45da, 0x4f46, 0x4e4f, 0x34e,	0xe902, 0xa5e9, 0x87a5, 0x7187,
-	0x5671, 0xf556, 0x6df5, 0x816d, 0x8f81, 0xbb8f, 0xfbba, 0x5afb, 0xbe5a,
-	0xedbe, 0xabee, 0x6aac, 0xe6b,	0xea0d, 0x67ea, 0x7e68, 0x8a7e, 0x6f8a,
-	0x3a70, 0x9f3a, 0xe89e, 0x75e8, 0x7976, 0xfa79, 0x2cfa, 0x3c2c, 0x463c,
-	0x7146, 0x7a71, 0x547a, 0xfd53, 0x99fc, 0xb699, 0x92b6, 0xdb91, 0xe8da,
-	0x5fe9, 0x1e60, 0xae1d, 0x39ae, 0xf439, 0xa1f4, 0xdda1, 0xede,	0x790f,
-	0x579,	0x1206, 0x9012, 0x2490, 0xd224, 0x5cd2, 0xa65d, 0xca7,	0x220d,
-	0xf922, 0xbf9,	0x920b, 0x1b92, 0x361c, 0x2e36, 0x4d2e, 0x24d,	0x2,
-	0xcfff, 0x90cf, 0xa591, 0x93a5, 0x7993, 0x9579, 0xc894, 0x50c8, 0x5f50,
-	0xd55e, 0xcad5, 0xf3c9, 0x8f4,	0x4409, 0x5043, 0x5b50, 0x55b,	0x2205,
-	0x1e22, 0x801e, 0x3780, 0xe137, 0x7ee0, 0xf67d, 0x3cf6, 0xa53c, 0x2ea5,
-	0x472e, 0x5147, 0xcf51, 0x1bcf, 0x951c, 0x1e95, 0xc71e, 0xe4c7, 0xc3e4,
-	0x3dc3, 0xee3d, 0xa4ed, 0xf9a4, 0xcbf8, 0x75cb, 0xb375, 0x50b4, 0x3551,
-	0xf835, 0x19f8, 0x8c1a, 0x538c, 0xad52, 0xa3ac, 0xb0a3, 0x5cb0, 0x6c5c,
-	0x5b6c, 0xc05a, 0x92c0, 0x4792, 0xbe47, 0x53be, 0x1554, 0x5715, 0x4b57,
-	0xe54a, 0x20e5, 0x21,	0xd500, 0xa1d4, 0xa8a1, 0x57a9, 0xca57, 0x5ca,
-	0x1c06, 0x4f1c, 0xe24e, 0xd9e2, 0xf0d9, 0x4af1, 0x474b, 0x8146, 0xe81,
-	0xfd0e, 0x84fd, 0x7c85, 0xba7c, 0x17ba, 0x4a17, 0x964a, 0xf595, 0xff5,
-	0x5310, 0x3253, 0x6432, 0x4263, 0x2242, 0xe121, 0x32e1, 0xf632, 0xc5f5,
-	0x21c6, 0x7d22, 0x8e7c, 0x418e, 0x5641, 0x3156, 0x7c31, 0x737c, 0x373,
-	0x2503, 0xc22a, 0x3c2,	0x4a04, 0x8549, 0x5285, 0xa352, 0xe8a3, 0x6fe8,
-	0x1a6f, 0x211a, 0xe021, 0x38e0, 0x7638, 0xf575, 0x9df5, 0x169e, 0xf116,
-	0x23f1, 0xcd23, 0xece,	0x660f, 0x4866, 0x6a48, 0x716a, 0xee71, 0xa2ee,
-	0xb8a2, 0x61b9, 0xa361, 0xf7a2, 0x26f7, 0x1127, 0x6611, 0xe065, 0x36e0,
-	0x1837, 0x3018, 0x1c30, 0x721b, 0x3e71, 0xe43d, 0x99e4, 0x9e9a, 0xb79d,
-	0xa9b7, 0xcaa,	0xeb0c, 0x4eb,	0x1305, 0x8813, 0xb687, 0xa9b6, 0xfba9,
-	0xd7fb, 0xccd8, 0x2ecd, 0x652f, 0xae65, 0x3fae, 0x3a40, 0x563a, 0x7556,
-	0x2776, 0x1228, 0xef12, 0xf9ee, 0xcef9, 0x56cf, 0xa956, 0x24a9, 0xba24,
-	0x5fba, 0x665f, 0xf465, 0x8ff4, 0x6d8f, 0x346d, 0x5f34, 0x385f, 0xd137,
-	0xb8d0, 0xacb8, 0x55ac, 0x7455, 0xe874, 0x89e8, 0xd189, 0xa0d1, 0xb2a0,
-	0xb8b2, 0x36b8, 0x5636, 0xd355, 0x8d3,	0x1908, 0x2118, 0xc21,	0x990c,
-	0x8b99, 0x158c, 0x7815, 0x9e78, 0x6f9e, 0x4470, 0x1d44, 0x341d, 0x2634,
-	0x3f26, 0x793e, 0xc79,	0xcc0b, 0x26cc, 0xd126, 0x1fd1, 0xb41f, 0xb6b4,
-	0x22b7, 0xa122, 0xa1,	0x7f01, 0x837e, 0x3b83, 0xaf3b, 0x6fae, 0x916f,
-	0xb490, 0xffb3, 0xceff, 0x50cf, 0x7550, 0x7275, 0x1272, 0x2613, 0xaa26,
-	0xd5aa, 0x7d5,	0x9607, 0x96,	0xb100, 0xf8b0, 0x4bf8, 0xdd4c, 0xeddd,
-	0x98ed, 0x2599, 0x9325, 0xeb92, 0x8feb, 0xcc8f, 0x2acd, 0x392b, 0x3b39,
-	0xcb3b, 0x6acb, 0xd46a, 0xb8d4, 0x6ab8, 0x106a, 0x2f10, 0x892f, 0x789,
-	0xc806, 0x45c8, 0x7445, 0x3c74, 0x3a3c, 0xcf39, 0xd7ce, 0x58d8, 0x6e58,
-	0x336e, 0x1034, 0xee10, 0xe9ed, 0xc2e9, 0x3fc2, 0xd53e, 0xd2d4, 0xead2,
-	0x8fea, 0x2190, 0x1162, 0xbe11, 0x8cbe, 0x6d8c, 0xfb6c, 0x6dfb, 0xd36e,
-	0x3ad3, 0xf3a,	0x870e, 0xc287, 0x53c3, 0xc54,	0x5b0c, 0x7d5a, 0x797d,
-	0xec79, 0x5dec, 0x4d5e, 0x184e, 0xd618, 0x60d6, 0xb360, 0x98b3, 0xf298,
-	0xb1f2, 0x69b1, 0xf969, 0xef9,	0xab0e, 0x21ab, 0xe321, 0x24e3, 0x8224,
-	0x5481, 0x5954, 0x7a59, 0xff7a, 0x7dff, 0x1a7d, 0xa51a, 0x46a5, 0x6b47,
-	0xe6b,	0x830e, 0xa083, 0xff9f, 0xd0ff, 0xffd0, 0xe6ff, 0x7de7, 0xc67d,
-	0xd0c6, 0x61d1, 0x3a62, 0xc3b,	0x150c, 0x1715, 0x4517, 0x5345, 0x3954,
-	0xdd39, 0xdadd, 0x32db, 0x6a33, 0xd169, 0x86d1, 0xb687, 0x3fb6, 0x883f,
-	0xa487, 0x39a4, 0x2139, 0xbe20, 0xffbe, 0xedfe, 0x8ded, 0x368e, 0xc335,
-	0x51c3, 0x9851, 0xf297, 0xd6f2, 0xb9d6, 0x95ba, 0x2096, 0xea1f, 0x76e9,
-	0x4e76, 0xe04d, 0xd0df, 0x80d0, 0xa280, 0xfca2, 0x75fc, 0xef75, 0x32ef,
-	0x6833, 0xdf68, 0xc4df, 0x76c4, 0xb77,	0xb10a, 0xbfb1, 0x58bf, 0x5258,
-	0x4d52, 0x6c4d, 0x7e6c, 0xb67e, 0xccb5, 0x8ccc, 0xbe8c, 0xc8bd, 0x9ac8,
-	0xa99b, 0x52a9, 0x2f53, 0xc30,	0x3e0c, 0xb83d, 0x83b7, 0x5383, 0x7e53,
-	0x4f7e, 0xe24e, 0xb3e1, 0x8db3, 0x618e, 0xc861, 0xfcc8, 0x34fc, 0x9b35,
-	0xaa9b, 0xb1aa, 0x5eb1, 0x395e, 0x8639, 0xd486, 0x8bd4, 0x558b, 0x2156,
-	0xf721, 0x4ef6, 0x14f,	0x7301, 0xdd72, 0x49de, 0x894a, 0x9889, 0x8898,
-	0x7788, 0x7b77, 0x637b, 0xb963, 0xabb9, 0x7cab, 0xc87b, 0x21c8, 0xcb21,
-	0xdfca, 0xbfdf, 0xf2bf, 0x6af2, 0x626b, 0xb261, 0x3cb2, 0xc63c, 0xc9c6,
-	0xc9c9, 0xb4c9, 0xf9b4, 0x91f9, 0x4091, 0x3a40, 0xcc39, 0xd1cb, 0x7ed1,
-	0x537f, 0x6753, 0xa167, 0xba49, 0x88ba, 0x7789, 0x3877, 0xf037, 0xd3ef,
-	0xb5d4, 0x55b6, 0xa555, 0xeca4, 0xa1ec, 0xb6a2, 0x7b7,	0x9507, 0xfd94,
-	0x82fd, 0x5c83, 0x765c, 0x9676, 0x3f97, 0xda3f, 0x6fda, 0x646f, 0x3064,
-	0x5e30, 0x655e, 0x6465, 0xcb64, 0xcdca, 0x4ccd, 0x3f4c, 0x243f, 0x6f24,
-	0x656f, 0x6065, 0x3560, 0x3b36, 0xac3b, 0x4aac, 0x714a, 0x7e71, 0xda7e,
-	0x7fda, 0xda7f, 0x6fda, 0xff6f, 0xc6ff, 0xedc6, 0xd4ed, 0x70d5, 0xeb70,
-	0xa3eb, 0x80a3, 0xca80, 0x3fcb, 0x2540, 0xf825, 0x7ef8, 0xf87e, 0x73f8,
-	0xb474, 0xb4b4, 0x92b5, 0x9293, 0x93,	0x3500, 0x7134, 0x9071, 0xfa8f,
-	0x51fa, 0x1452, 0xba13, 0x7ab9, 0x957a, 0x8a95, 0x6e8a, 0x6d6e, 0x7c6d,
-	0x447c, 0x9744, 0x4597, 0x8945, 0xef88, 0x8fee, 0x3190, 0x4831, 0x8447,
-	0xa183, 0x1da1, 0xd41d, 0x2dd4, 0x4f2e, 0xc94e, 0xcbc9, 0xc9cb, 0x9ec9,
-	0x319e, 0xd531, 0x20d5, 0x4021, 0xb23f, 0x29b2, 0xd828, 0xecd8, 0x5ded,
-	0xfc5d, 0x4dfc, 0xd24d, 0x6bd2, 0x5f6b, 0xb35e, 0x7fb3, 0xee7e, 0x56ee,
-	0xa657, 0x68a6, 0x8768, 0x7787, 0xb077, 0x4cb1, 0x764c, 0xb175, 0x7b1,
-	0x3d07, 0x603d, 0x3560, 0x3e35, 0xb03d, 0xd6b0, 0xc8d6, 0xd8c8, 0x8bd8,
-	0x3e8c, 0x303f, 0xd530, 0xf1d4, 0x42f1, 0xca42, 0xddca, 0x41dd, 0x3141,
-	0x132,	0xe901, 0x8e9,	0xbe09, 0xe0bd, 0x2ce0, 0x862d, 0x3986, 0x9139,
-	0x6d91, 0x6a6d, 0x8d6a, 0x1b8d, 0xac1b, 0xedab, 0x54ed, 0xc054, 0xcebf,
-	0xc1ce, 0x5c2,	0x3805, 0x6038, 0x5960, 0xd359, 0xdd3,	0xbe0d, 0xafbd,
-	0x6daf, 0x206d, 0x2c20, 0x862c, 0x8e86, 0xec8d, 0xa2ec, 0xa3a2, 0x51a3,
-	0x8051, 0xfd7f, 0x91fd, 0xa292, 0xaf14, 0xeeae, 0x59ef, 0x535a, 0x8653,
-	0x3986, 0x9539, 0xb895, 0xa0b8, 0x26a0, 0x2227, 0xc022, 0x77c0, 0xad77,
-	0x46ad, 0xaa46, 0x60aa, 0x8560, 0x4785, 0xd747, 0x45d7, 0x2346, 0x5f23,
-	0x25f,	0x1d02, 0x71d,	0x8206, 0xc82,	0x180c, 0x3018, 0x4b30, 0x4b,
-	0x3001, 0x1230, 0x2d12, 0x8c2d, 0x148d, 0x4015, 0x5f3f, 0x3d5f, 0x6b3d,
-	0x396b, 0x473a, 0xf746, 0x44f7, 0x8945, 0x3489, 0xcb34, 0x84ca, 0xd984,
-	0xf0d9, 0xbcf0, 0x63bd, 0x3264, 0xf332, 0x45f3, 0x7346, 0x5673, 0xb056,
-	0xd3b0, 0x4ad4, 0x184b, 0x7d18, 0x6c7d, 0xbb6c, 0xfeba, 0xe0fe, 0x10e1,
-	0x5410, 0x2954, 0x9f28, 0x3a9f, 0x5a3a, 0xdb59, 0xbdc,	0xb40b, 0x1ab4,
-	0x131b, 0x5d12, 0x6d5c, 0xe16c, 0xb0e0, 0x89b0, 0xba88, 0xbb,	0x3c01,
-	0xe13b, 0x6fe1, 0x446f, 0xa344, 0x81a3, 0xfe81, 0xc7fd, 0x38c8, 0xb38,
-	0x1a0b, 0x6d19, 0xf36c, 0x47f3, 0x6d48, 0xb76d, 0xd3b7, 0xd8d2, 0x52d9,
-	0x4b53, 0xa54a, 0x34a5, 0xc534, 0x9bc4, 0xed9b, 0xbeed, 0x3ebe, 0x233e,
-	0x9f22, 0x4a9f, 0x774b, 0x4577, 0xa545, 0x64a5, 0xb65,	0x870b, 0x487,
-	0x9204, 0x5f91, 0xd55f, 0x35d5, 0x1a35, 0x71a,	0x7a07, 0x4e7a, 0xfc4e,
-	0x1efc, 0x481f, 0x7448, 0xde74, 0xa7dd, 0x1ea7, 0xaa1e, 0xcfaa, 0xfbcf,
-	0xedfb, 0x6eee, 0x386f, 0x4538, 0x6e45, 0xd96d, 0x11d9, 0x7912, 0x4b79,
-	0x494b, 0x6049, 0xac5f, 0x65ac, 0x1366, 0x5913, 0xe458, 0x7ae4, 0x387a,
-	0x3c38, 0xb03c, 0x76b0, 0x9376, 0xe193, 0x42e1, 0x7742, 0x6476, 0x3564,
-	0x3c35, 0x6a3c, 0xcc69, 0x94cc, 0x5d95, 0xe5e,	0xee0d, 0x4ced, 0xce4c,
-	0x52ce, 0xaa52, 0xdaaa, 0xe4da, 0x1de5, 0x4530, 0x5445, 0x3954, 0xb639,
-	0x81b6, 0x7381, 0x1574, 0xc215, 0x10c2, 0x3f10, 0x6b3f, 0xe76b, 0x7be7,
-	0xbc7b, 0xf7bb, 0x41f7, 0xcc41, 0x38cc, 0x4239, 0xa942, 0x4a9,	0xc504,
-	0x7cc4, 0x437c, 0x6743, 0xea67, 0x8dea, 0xe88d, 0xd8e8, 0xdcd8, 0x17dd,
-	0x5718, 0x958,	0xa609, 0x41a5, 0x5842, 0x159,	0x9f01, 0x269f, 0x5a26,
-	0x405a, 0xc340, 0xb4c3, 0xd4b4, 0xf4d3, 0xf1f4, 0x39f2, 0xe439, 0x67e4,
-	0x4168, 0xa441, 0xdda3, 0xdedd, 0x9df,	0xab0a, 0xa5ab, 0x9a6,	0xba09,
-	0x9ab9, 0xad9a, 0x5ae,	0xe205, 0xece2, 0xecec, 0x14ed, 0xd614, 0x6bd5,
-	0x916c, 0x3391, 0x6f33, 0x206f, 0x8020, 0x780,	0x7207, 0x2472, 0x8a23,
-	0xb689, 0x3ab6, 0xf739, 0x97f6, 0xb097, 0xa4b0, 0xe6a4, 0x88e6, 0x2789,
-	0xb28,	0x350b, 0x1f35, 0x431e, 0x1043, 0xc30f, 0x79c3, 0x379,	0x5703,
-	0x3256, 0x4732, 0x7247, 0x9d72, 0x489d, 0xd348, 0xa4d3, 0x7ca4, 0xbf7b,
-	0x45c0, 0x7b45, 0x337b, 0x4034, 0x843f, 0xd083, 0x35d0, 0x6335, 0x4d63,
-	0xe14c, 0xcce0, 0xfecc, 0x35ff, 0x5636, 0xf856, 0xeef8, 0x2def, 0xfc2d,
-	0x4fc,	0x6e04, 0xb66d, 0x78b6, 0xbb78, 0x3dbb, 0x9a3d, 0x839a, 0x9283,
-	0x593,	0xd504, 0x23d5, 0x5424, 0xd054, 0x61d0, 0xdb61, 0x17db, 0x1f18,
-	0x381f, 0x9e37, 0x679e, 0x1d68, 0x381d, 0x8038, 0x917f, 0x491,	0xbb04,
-	0x23bb, 0x4124, 0xd41,	0xa30c, 0x8ba3, 0x8b8b, 0xc68b, 0xd2c6, 0xebd2,
-	0x93eb, 0xbd93, 0x99bd, 0x1a99, 0xea19, 0x58ea, 0xcf58, 0x73cf, 0x1073,
-	0x9e10, 0x139e, 0xea13, 0xcde9, 0x3ecd, 0x883f, 0xf89,	0x180f, 0x2a18,
-	0x212a, 0xce20, 0x73ce, 0xf373, 0x60f3, 0xad60, 0x4093, 0x8e40, 0xb98e,
-	0xbfb9, 0xf1bf, 0x8bf1, 0x5e8c, 0xe95e, 0x14e9, 0x4e14, 0x1c4e, 0x7f1c,
-	0xe77e, 0x6fe7, 0xf26f, 0x13f2, 0x8b13, 0xda8a, 0x5fda, 0xea5f, 0x4eea,
-	0xa84f, 0x88a8, 0x1f88, 0x2820, 0x9728, 0x5a97, 0x3f5b, 0xb23f, 0x70b2,
-	0x2c70, 0x232d, 0xf623, 0x4f6,	0x905,	0x7509, 0xd675, 0x28d7, 0x9428,
-	0x3794, 0xf036, 0x2bf0, 0xba2c, 0xedb9, 0xd7ed, 0x59d8, 0xed59, 0x4ed,
-	0xe304, 0x18e3, 0x5c19, 0x3d5c, 0x753d, 0x6d75, 0x956d, 0x7f95, 0xc47f,
-	0x83c4, 0xa84,	0x2e0a, 0x5f2e, 0xb95f, 0x77b9, 0x6d78, 0xf46d, 0x1bf4,
-	0xed1b, 0xd6ed, 0xe0d6, 0x5e1,	0x3905, 0x5638, 0xa355, 0x99a2, 0xbe99,
-	0xb4bd, 0x85b4, 0x2e86, 0x542e, 0x6654, 0xd765, 0x73d7, 0x3a74, 0x383a,
-	0x2638, 0x7826, 0x7677, 0x9a76, 0x7e99, 0x2e7e, 0xea2d, 0xa6ea, 0x8a7,
-	0x109,	0x3300, 0xad32, 0x5fad, 0x465f, 0x2f46, 0xc62f, 0xd4c5, 0xad5,
-	0xcb0a, 0x4cb,	0xb004, 0x7baf, 0xe47b, 0x92e4, 0x8e92, 0x638e, 0x1763,
-	0xc17,	0xf20b, 0x1ff2, 0x8920, 0x5889, 0xcb58, 0xf8cb, 0xcaf8, 0x84cb,
-	0x9f84, 0x8a9f, 0x918a, 0x4991, 0x8249, 0xff81, 0x46ff, 0x5046, 0x5f50,
-	0x725f, 0xf772, 0x8ef7, 0xe08f, 0xc1e0, 0x1fc2, 0x9e1f, 0x8b9d, 0x108b,
-	0x411,	0x2b04, 0xb02a, 0x1fb0, 0x1020, 0x7a0f, 0x587a, 0x8958, 0xb188,
-	0xb1b1, 0x49b2, 0xb949, 0x7ab9, 0x917a, 0xfc91, 0xe6fc, 0x47e7, 0xbc47,
-	0x8fbb, 0xea8e, 0x34ea, 0x2635, 0x1726, 0x9616, 0xc196, 0xa6c1, 0xf3a6,
-	0x11f3, 0x4811, 0x3e48, 0xeb3e, 0xf7ea, 0x1bf8, 0xdb1c, 0x8adb, 0xe18a,
-	0x42e1, 0x9d42, 0x5d9c, 0x6e5d, 0x286e, 0x4928, 0x9a49, 0xb09c, 0xa6b0,
-	0x2a7,	0xe702, 0xf5e6, 0x9af5, 0xf9b,	0x810f, 0x8080, 0x180,	0x1702,
-	0x5117, 0xa650, 0x11a6, 0x1011, 0x550f, 0xd554, 0xbdd5, 0x6bbe, 0xc66b,
-	0xfc7,	0x5510, 0x5555, 0x7655, 0x177,	0x2b02, 0x6f2a, 0xb70,	0x9f0b,
-	0xcf9e, 0xf3cf, 0x3ff4, 0xcb40, 0x8ecb, 0x768e, 0x5277, 0x8652, 0x9186,
-	0x9991, 0x5099, 0xd350, 0x93d3, 0x6d94, 0xe6d,	0x530e, 0x3153, 0xa531,
-	0x64a5, 0x7964, 0x7c79, 0x467c, 0x1746, 0x3017, 0x3730, 0x538,	0x5,
-	0x1e00, 0x5b1e, 0x955a, 0xae95, 0x3eaf, 0xff3e, 0xf8ff, 0xb2f9, 0xa1b3,
-	0xb2a1, 0x5b2,	0xad05, 0x7cac, 0x2d7c, 0xd32c, 0x80d2, 0x7280, 0x8d72,
-	0x1b8e, 0x831b, 0xac82, 0xfdac, 0xa7fd, 0x15a8, 0xd614, 0xe0d5, 0x7be0,
-	0xb37b, 0x61b3, 0x9661, 0x9d95, 0xc79d, 0x83c7, 0xd883, 0xead7, 0xceb,
-	0xf60c, 0xa9f5, 0x19a9, 0xa019, 0x8f9f, 0xd48f, 0x3ad5, 0x853a, 0x985,
-	0x5309, 0x6f52, 0x1370, 0x6e13, 0xa96d, 0x98a9, 0x5198, 0x9f51, 0xb69f,
-	0xa1b6, 0x2ea1, 0x672e, 0x2067, 0x6520, 0xaf65, 0x6eaf, 0x7e6f, 0xee7e,
-	0x17ef, 0xa917, 0xcea8, 0x9ace, 0xff99, 0x5dff, 0xdf5d, 0x38df, 0xa39,
-	0x1c0b, 0xe01b, 0x46e0, 0xcb46, 0x90cb, 0xba90, 0x4bb,	0x9104, 0x9d90,
-	0xc89c, 0xf6c8, 0x6cf6, 0x886c, 0x1789, 0xbd17, 0x70bc, 0x7e71, 0x17e,
-	0x1f01, 0xa01f, 0xbaa0, 0x14bb, 0xfc14, 0x7afb, 0xa07a, 0x3da0, 0xbf3d,
-	0x48bf, 0x8c48, 0x968b, 0x9d96, 0xfd9d, 0x96fd, 0x9796, 0x6b97, 0xd16b,
-	0xf4d1, 0x3bf4, 0x253c, 0x9125, 0x6691, 0xc166, 0x34c1, 0x5735, 0x1a57,
-	0xdc19, 0x77db, 0x8577, 0x4a85, 0x824a, 0x9182, 0x7f91, 0xfd7f, 0xb4c3,
-	0xb5b4, 0xb3b5, 0x7eb3, 0x617e, 0x4e61, 0xa4f,	0x530a, 0x3f52, 0xa33e,
-	0x34a3, 0x9234, 0xf091, 0xf4f0, 0x1bf5, 0x311b, 0x9631, 0x6a96, 0x386b,
-	0x1d39, 0xe91d, 0xe8e9, 0x69e8, 0x426a, 0xee42, 0x89ee, 0x368a, 0x2837,
-	0x7428, 0x5974, 0x6159, 0x1d62, 0x7b1d, 0xf77a, 0x7bf7, 0x6b7c, 0x696c,
-	0xf969, 0x4cf9, 0x714c, 0x4e71, 0x6b4e, 0x256c, 0x6e25, 0xe96d, 0x94e9,
-	0x8f94, 0x3e8f, 0x343e, 0x4634, 0xb646, 0x97b5, 0x8997, 0xe8a,	0x900e,
-	0x8090, 0xfd80, 0xa0fd, 0x16a1, 0xf416, 0xebf4, 0x95ec, 0x1196, 0x8911,
-	0x3d89, 0xda3c, 0x9fd9, 0xd79f, 0x4bd7, 0x214c, 0x3021, 0x4f30, 0x994e,
-	0x5c99, 0x6f5d, 0x326f, 0xab31, 0x6aab, 0xe969, 0x90e9, 0x1190, 0xff10,
-	0xa2fe, 0xe0a2, 0x66e1, 0x4067, 0x9e3f, 0x2d9e, 0x712d, 0x8170, 0xd180,
-	0xffd1, 0x25ff, 0x3826, 0x2538, 0x5f24, 0xc45e, 0x1cc4, 0xdf1c, 0x93df,
-	0xc793, 0x80c7, 0x2380, 0xd223, 0x7ed2, 0xfc7e, 0x22fd, 0x7422, 0x1474,
-	0xb714, 0x7db6, 0x857d, 0xa85,	0xa60a, 0x88a6, 0x4289, 0x7842, 0xc278,
-	0xf7c2, 0xcdf7, 0x84cd, 0xae84, 0x8cae, 0xb98c, 0x1aba, 0x4d1a, 0x884c,
-	0x4688, 0xcc46, 0xd8cb, 0x2bd9, 0xbe2b, 0xa2be, 0x72a2, 0xf772, 0xd2f6,
-	0x75d2, 0xc075, 0xa3c0, 0x63a3, 0xae63, 0x8fae, 0x2a90, 0x5f2a, 0xef5f,
-	0x5cef, 0xa05c, 0x89a0, 0x5e89, 0x6b5e, 0x736b, 0x773,	0x9d07, 0xe99c,
-	0x27ea, 0x2028, 0xc20,	0x980b, 0x4797, 0x2848, 0x9828, 0xc197, 0x48c2,
-	0x2449, 0x7024, 0x570,	0x3e05, 0xd3e,	0xf60c, 0xbbf5, 0x69bb, 0x3f6a,
-	0x740,	0xf006, 0xe0ef, 0xbbe0, 0xadbb, 0x56ad, 0xcf56, 0xbfce, 0xa9bf,
-	0x205b, 0x6920, 0xae69, 0x50ae, 0x2050, 0xf01f, 0x27f0, 0x9427, 0x8993,
-	0x8689, 0x4087, 0x6e40, 0xb16e, 0xa1b1, 0xe8a1, 0x87e8, 0x6f88, 0xfe6f,
-	0x4cfe, 0xe94d, 0xd5e9, 0x47d6, 0x3148, 0x5f31, 0xc35f, 0x13c4, 0xa413,
-	0x5a5,	0x2405, 0xc223, 0x66c2, 0x3667, 0x5e37, 0x5f5e, 0x2f5f, 0x8c2f,
-	0xe48c, 0xd0e4, 0x4d1,	0xd104, 0xe4d0, 0xcee4, 0xfcf,	0x480f, 0xa447,
-	0x5ea4, 0xff5e, 0xbefe, 0x8dbe, 0x1d8e, 0x411d, 0x1841, 0x6918, 0x5469,
-	0x1155, 0xc611, 0xaac6, 0x37ab, 0x2f37, 0xca2e, 0x87ca, 0xbd87, 0xabbd,
-	0xb3ab, 0xcb4,	0xce0c, 0xfccd, 0xa5fd, 0x72a5, 0xf072, 0x83f0, 0xfe83,
-	0x97fd, 0xc997, 0xb0c9, 0xadb0, 0xe6ac, 0x88e6, 0x1088, 0xbe10, 0x16be,
-	0xa916, 0xa3a8, 0x46a3, 0x5447, 0xe953, 0x84e8, 0x2085, 0xa11f, 0xfa1,
-	0xdd0f, 0xbedc, 0x5abe, 0x805a, 0xc97f, 0x6dc9, 0x826d, 0x4a82, 0x934a,
-	0x5293, 0xd852, 0xd3d8, 0xadd3, 0xf4ad, 0xf3f4, 0xfcf3, 0xfefc, 0xcafe,
-	0xb7ca, 0x3cb8, 0xa13c, 0x18a1, 0x1418, 0xea13, 0x91ea, 0xf891, 0x53f8,
-	0xa254, 0xe9a2, 0x87ea, 0x4188, 0x1c41, 0xdc1b, 0xf5db, 0xcaf5, 0x45ca,
-	0x6d45, 0x396d, 0xde39, 0x90dd, 0x1e91, 0x1e,	0x7b00, 0x6a7b, 0xa46a,
-	0xc9a3, 0x9bc9, 0x389b, 0x1139, 0x5211, 0x1f52, 0xeb1f, 0xabeb, 0x48ab,
-	0x9348, 0xb392, 0x17b3, 0x1618, 0x5b16, 0x175b, 0xdc17, 0xdedb, 0x1cdf,
-	0xeb1c, 0xd1ea, 0x4ad2, 0xd4b,	0xc20c, 0x24c2, 0x7b25, 0x137b, 0x8b13,
-	0x618b, 0xa061, 0xff9f, 0xfffe, 0x72ff, 0xf572, 0xe2f5, 0xcfe2, 0xd2cf,
-	0x75d3, 0x6a76, 0xc469, 0x1ec4, 0xfc1d, 0x59fb, 0x455a, 0x7a45, 0xa479,
-	0xb7a4
+	0x83da, 0x2ac2, 0x6211, 0x49ba, 0x14af, 0x3045, 0x9340, 0x9cb0, 0xc3b4,
+	0x5b20, 0x2cdf, 0x4e03, 0x8552, 0x6cfb, 0x37f0, 0x5386, 0xb681, 0xbff1,
+	0xe6f5, 0x7e61, 0x5020, 0x916a, 0x8771, 0x6f1a, 0x3a0f, 0x55a5, 0xb8a0,
+	0xc210, 0xe914, 0x8080, 0x523f, 0x9389, 0x704a, 0x6d81, 0x3876, 0x540c,
+	0xb707, 0xc077, 0xe77b, 0x7ee7, 0x50a6, 0x91f0, 0x6eb1, 0xc58a, 0xfb5a,
+	0x16f1, 0x79ec, 0x835c, 0xaa60, 0x41cc, 0x138b, 0x54d5, 0x3196, 0x886f,
+	0x545c, 0xac6a, 0x0f66, 0x18d6, 0x3fda, 0xd745, 0xa904, 0xea4e, 0xc70f,
+	0x1de9, 0xe9d5, 0x06ab, 0x687e, 0x71ee, 0x98f2, 0x305e, 0x021d, 0x4367,
+	0x2028, 0x7701, 0x42ee, 0x5fc3, 0x3c73, 0x3a9f, 0x61a3, 0xf90e, 0xcacd,
+	0x0c18, 0xe8d8, 0x3fb2, 0x0b9f, 0x2874, 0x0524, 0x1f07, 0x79fa, 0x1166,
+	0xe324, 0x246f, 0x0130, 0x5809, 0x23f6, 0x40cb, 0x1d7b, 0x375e, 0x57d9,
+	0x4671, 0x1830, 0x597a, 0x363b, 0x8d14, 0x5901, 0x75d6, 0x5286, 0x6c69,
+	0x8ce4, 0xec7a, 0xfc99, 0x3de4, 0x1aa5, 0x717e, 0x3d6b, 0x5a40, 0x36f0,
+	0x50d3, 0x714e, 0xd0e4, 0xaa9f, 0xdae8, 0xb7a9, 0x0e83, 0xda6f, 0xf744,
+	0xd3f4, 0xedd7, 0x0e53, 0x6de9, 0x47a4, 0xc7fd, 0xae39, 0x0513, 0xd0ff,
+	0xedd4, 0xca84, 0xe467, 0x04e3, 0x6479, 0x3e34, 0xbe8d, 0x3f6d, 0xde0e,
+	0xa9fb, 0xc6d0, 0xa380, 0xbd63, 0xddde, 0x3d75, 0x1730, 0x9789, 0x1869,
+	0x5e16, 0x1290, 0x2f65, 0x0c15, 0x25f8, 0x4673, 0xa609, 0x7fc4, 0x001e,
+	0x80fd, 0xc6aa, 0x9c46, 0x5da6, 0x3a56, 0x5439, 0x74b4, 0xd44a, 0xae05,
+	0x2e5f, 0xaf3e, 0xf4eb, 0xca87, 0xf556, 0xf90b, 0x12ef, 0x336a, 0x9300,
+	0x6cbb, 0xed14, 0x6df4, 0xb3a1, 0x893d, 0xb40c, 0x8233, 0x362e, 0x56a9,
+	0xb63f, 0x8ffa, 0x1054, 0x9133, 0xd6e0, 0xac7c, 0xd74b, 0xa572, 0x5471,
+	0xffcf, 0x5f66, 0x3921, 0xb97a, 0x3a5a, 0x8007, 0x55a3, 0x8072, 0x4e99,
+	0xfd97, 0x78d3, 0x9379, 0x6d34, 0xed8d, 0x6e6d, 0xb41a, 0x89b6, 0xb485,
+	0x82ac, 0x31ab, 0xace6, 0xdaeb, 0x505f, 0xd0b8, 0x5198, 0x9745, 0x6ce1,
+	0x97b0, 0x65d7, 0x14d6, 0x9011, 0xbe16, 0x2404, 0xf408, 0x74e8, 0xba95,
+	0x9031, 0xbb00, 0x8927, 0x3826, 0xb361, 0xe166, 0x4754, 0x3984, 0x5b05,
+	0xa0b2, 0x764e, 0xa11d, 0x6f44, 0x1e43, 0x997e, 0xc783, 0x2d71, 0x1fa1,
+	0xded5, 0x8037, 0x55d3, 0x80a2, 0x4ec9, 0xfdc7, 0x7903, 0xa708, 0x0cf6,
+	0xff25, 0xbe5a, 0xcd18, 0xf63c, 0x210c, 0xef32, 0x9e31, 0x196d, 0x4772,
+	0xad5f, 0x9f8f, 0x5ec4, 0x6d82, 0x2c93, 0x4751, 0x1578, 0xc476, 0x3fb2,
+	0x6db7, 0xd3a4, 0xc5d4, 0x8509, 0x93c7, 0x52d8, 0x754d, 0x951e, 0x441d,
+	0xbf58, 0xed5d, 0x534b, 0x457b, 0x04b0, 0x136e, 0xd27e, 0xf4f3, 0x4b97,
+	0xc33d, 0x3e79, 0x6c7e, 0xd26b, 0xc49b, 0x83d0, 0x928e, 0x519f, 0x7414,
+	0xcab7, 0x5dc1, 0xf8cb, 0x26d1, 0x8cbe, 0x7eee, 0x3e23, 0x4ce1, 0x0bf2,
+	0x2e67, 0x850a, 0x1814, 0xcdef, 0x5135, 0xb722, 0xa952, 0x6887, 0x7745,
+	0x3656, 0x58cb, 0xaf6e, 0x4278, 0xf853, 0xb310, 0x8c53, 0x7e83, 0x3db8,
+	0x4c76, 0x0b87, 0x2dfc, 0x849f, 0x17a9, 0xcd84, 0x8841, 0xc3f1, 0xb05c,
+	0x6f91, 0x7e4f, 0x3d60, 0x5fd5, 0xb678, 0x4982, 0xff5d, 0xba1a, 0xf5ca,
+	0xe8dc, 0xc092, 0xcf50, 0x8e61, 0xb0d6, 0x077a, 0x9a83, 0x505f, 0x0b1c,
+	0x46cc, 0x39de, 0x7bb6, 0x5415, 0x1326, 0x359b, 0x8c3e, 0x1f48, 0xd523,
+	0x8fe0, 0xcb90, 0xbea2, 0x007b, 0xf7a0, 0xe520, 0x0796, 0x5e39, 0xf142,
+	0xa71e, 0x61db, 0x9d8b, 0x909d, 0xd275, 0xc99b, 0xb80c, 0xa1a8, 0xf84b,
+	0x8b55, 0x4131, 0xfbed, 0x379e, 0x2ab0, 0x6c88, 0x63ae, 0x521f, 0x3ac3,
+	0x061c, 0x9925, 0x4f01, 0x09be, 0x456e, 0x3880, 0x7a58, 0x717e, 0x5fef,
+	0x4893, 0xd58f, 0xd9f0, 0x8fcc, 0x4a89, 0x8639, 0x794b, 0xbb23, 0xb249,
+	0xa0ba, 0x895e, 0x165b, 0xedda, 0x810e, 0x3bcb, 0x777b, 0x6a8d, 0xac65,
+	0xa38b, 0x91fc, 0x7aa0, 0x079d, 0xdf1c, 0x0a1e, 0x7cba, 0xb86a, 0xab7c,
+	0xed54, 0xe47a, 0xd2eb, 0xbb8f, 0x488c, 0x200c, 0x4b0d, 0x9d88, 0x95f5,
+	0x8907, 0xcadf, 0xc205, 0xb076, 0x991a, 0x2617, 0xfd96, 0x2898, 0x7b13,
+	0xf6a2, 0x3264, 0x743c, 0x6b62, 0x59d3, 0x4277, 0xcf73, 0xa6f3, 0xd1f4,
+	0x2470, 0x9fff, 0x88ec, 0xe132, 0xd858, 0xc6c9, 0xaf6d, 0x3c6a, 0x13ea,
+	0x3eeb, 0x9166, 0x0cf6, 0xf5e2, 0x2c46, 0x227d, 0x10ee, 0xf991, 0x868e,
+	0x5e0e, 0x890f, 0xdb8a, 0x571a, 0x4007, 0x766a, 0xb616, 0x5631, 0x3ed5,
+	0xcbd1, 0xa351, 0xce52, 0x20ce, 0x9c5d, 0x854a, 0xbbad, 0xfb59, 0xe067,
+	0x0325, 0x9021, 0x67a1, 0x92a2, 0xe51d, 0x60ad, 0x499a, 0x7ffd, 0xbfa9,
+	0xa4b7, 0x78d3, 0x9d0f, 0x748f, 0x9f90, 0xf20b, 0x6d9b, 0x5688, 0x8ceb,
+	0xcc97, 0xb1a5, 0x85c1, 0xee49, 0x32b7, 0x5db8, 0xb033, 0x2bc3, 0x14b0,
+	0x4b13, 0x8abf, 0x6fcd, 0x43e9, 0xac71, 0xe4d9, 0x6692, 0xb90d, 0x349d,
+	0x1d8a, 0x53ed, 0x9399, 0x78a7, 0x4cc3, 0xb54b, 0xedb3, 0x5a4e, 0xca9c,
+	0x462c, 0x2f19, 0x657c, 0xa528, 0x8a36, 0x5e52, 0xc6da, 0xff42, 0x6bdd,
+	0x4b71, 0x5d88, 0x4675, 0x7cd8, 0xbc84, 0xa192, 0x75ae, 0xde36, 0x169f,
+	0x8339, 0x62cd, 0x7d20, 0xb978, 0xefdb, 0x2f88, 0x1496, 0xe8b1, 0x513a,
+	0x89a2, 0xf63c, 0xd5d0, 0xf023, 0x7c35, 0x185c, 0x5808, 0x3d16, 0x1132,
+	0x79ba, 0xb222, 0x1ebd, 0xfe50, 0x18a4, 0xa4b5, 0x0dc3, 0x2d07, 0x1215,
+	0xe630, 0x4eb9, 0x8721, 0xf3bb, 0xd34f, 0xeda2, 0x79b4, 0xe2c1, 0xa671,
+	0xbf99, 0x93b5, 0xfc3d, 0x34a6, 0xa140, 0x80d4, 0x9b27, 0x2739, 0x9046,
+	0x53f6, 0xe623, 0x1826, 0x80ae, 0xb916, 0x25b1, 0x0545, 0x1f98, 0xaba9,
+	0x14b7, 0xd866, 0x6a94, 0xfea5, 0x97c1, 0xd029, 0x3cc4, 0x1c58, 0x36ab,
+	0xc2bc, 0x2bca, 0xef79, 0x81a7, 0x15b9, 0x50d9, 0x99c6, 0x0661, 0xe5f4,
+	0x0048, 0x8c59, 0xf566, 0xb916, 0x4b44, 0xdf55, 0x1a76, 0x70dd, 0xc6b4,
+	0xa648, 0xc09b, 0x4cad, 0xb5ba, 0x796a, 0x0b98, 0x9fa9, 0xdac9, 0x3131,
+	0x00cb, 0xc13a, 0xdb8d, 0x679f, 0xd0ac, 0x945c, 0x268a, 0xba9b, 0xf5bb,
+	0x4c23, 0x1bbd, 0xd992, 0x0772, 0x9383, 0xfc90, 0xc040, 0x526e, 0xe67f,
+	0x21a0, 0x7807, 0x47a1, 0x0577, 0xa4cd, 0x2afb, 0x9408, 0x57b8, 0xe9e5,
+	0x7df7, 0xb917, 0x0f7f, 0xdf18, 0x9cee, 0x3c45, 0x9aaf, 0x5ba0, 0x1f50,
+	0xb17d, 0x458f, 0x80af, 0xd716, 0xa6b0, 0x6486, 0x03dd, 0x6247, 0xce54,
+	0xb2b5, 0x44e3, 0xd8f4, 0x1415, 0x6a7c, 0x3a16, 0xf7eb, 0x9742, 0xf5ac,
+	0x61ba, 0xc44b, 0x654f, 0xf960, 0x3481, 0x8ae8, 0x5a82, 0x1858, 0xb7ae,
+	0x1619, 0x8226, 0xe4b7, 0x1920, 0xdf0d, 0x1a2e, 0x7095, 0x402f, 0xfe04,
+	0x9d5b, 0xfbc5, 0x67d3, 0xca64, 0xfecc, 0x68ea, 0x8e1c, 0xe483, 0xb41d,
+	0x71f3, 0x114a, 0x6fb4, 0xdbc1, 0x3e53, 0x72bb, 0xdcd8, 0x6b24, 0x7b76,
+	0x4b10, 0x08e6, 0xa83c, 0x06a7, 0x72b4, 0xd545, 0x09ae, 0x73cb, 0x0217,
+	0x9603,
 };
 
 static u8 tmp_buf[TEST_BUFLEN];
@@ -578,15 +451,19 @@  static void test_csum_no_carry_inputs(struct kunit *test)
 static void test_ip_fast_csum(struct kunit *test)
 {
 	__sum16 csum_result, expected;
-
-	for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
-		for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
-			csum_result = ip_fast_csum(random_buf + index, len);
-			expected = (__force __sum16)
-				expected_fast_csum[(len - IPv4_MIN_WORDS) *
-						   NUM_IP_FAST_CSUM_TESTS +
-						   index];
-			CHECK_EQ(expected, csum_result);
+	int num_tests = (MAX_LEN / WORD_ALIGNMENT - IPv4_MAX_WORDS * WORD_ALIGNMENT);
+
+	for (int i = 0; i < num_tests; i++) {
+		memcpy(&tmp_buf[IP_ALIGNMENT],
+		       random_buf + (i * WORD_ALIGNMENT),
+		       IPv4_MAX_WORDS * WORD_ALIGNMENT);
+
+		for (int len = IPv4_MIN_WORDS; len <= IPv4_MAX_WORDS; len++) {
+			int index = (len - IPv4_MIN_WORDS) +
+				 i * ((IPv4_MAX_WORDS - IPv4_MIN_WORDS) + 1);
+			csum_result = ip_fast_csum(tmp_buf + IP_ALIGNMENT, len);
+			expected = (__force __sum16)htons(expected_fast_csum[index]);
+			CHECK_EQ(csum_result, expected);
 		}
 	}
 }
@@ -594,29 +471,29 @@  static void test_ip_fast_csum(struct kunit *test)
 static void test_csum_ipv6_magic(struct kunit *test)
 {
 #if defined(CONFIG_NET)
-	const struct in6_addr *saddr;
-	const struct in6_addr *daddr;
-	unsigned int len;
-	unsigned char proto;
-	unsigned int csum;
-
-	const int daddr_offset = sizeof(struct in6_addr);
-	const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
-	const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
-			     sizeof(int);
-	const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
-			    sizeof(int) + sizeof(char);
-
-	for (int i = 0; i < NUM_IPv6_TESTS; i++) {
-		saddr = (const struct in6_addr *)(random_buf + i);
-		daddr = (const struct in6_addr *)(random_buf + i +
-						  daddr_offset);
-		len = *(unsigned int *)(random_buf + i + len_offset);
-		proto = *(random_buf + i + proto_offset);
-		csum = *(unsigned int *)(random_buf + i + csum_offset);
-		CHECK_EQ((__force __sum16)expected_csum_ipv6_magic[i],
-			 csum_ipv6_magic(saddr, daddr, len, proto,
-					 (__force __wsum)csum));
+	struct csum_ipv6_magic_data {
+		const struct in6_addr saddr;
+		const struct in6_addr daddr;
+		__be32 len;
+		__wsum csum;
+		unsigned char proto;
+		unsigned char pad[3];
+	} *data;
+	__sum16 csum_result, expected;
+	int ipv6_num_tests = ((MAX_LEN - sizeof(struct csum_ipv6_magic_data)) / WORD_ALIGNMENT);
+
+	for (int i = 0; i < ipv6_num_tests; i++) {
+		int index = i * WORD_ALIGNMENT;
+
+		data = kmalloc(sizeof(struct csum_ipv6_magic_data), GFP_KERNEL);
+
+		memcpy(data, random_buf + index, sizeof(struct csum_ipv6_magic_data));
+
+		csum_result = csum_ipv6_magic(&data->saddr, &data->daddr,
+					      ntohl(data->len), data->proto,
+					      data->csum);
+		expected = (__force __sum16)htons(expected_csum_ipv6_magic[i]);
+		CHECK_EQ(csum_result, expected);
 	}
 #endif /* !CONFIG_NET */
 }