| Message ID | 20240214-fix_sparse_errors_checksum_tests-v8-2-36b60e673593@rivosinc.com |
|---|---|
| State | Superseded |
| Series | lib: checksum: Fix issues with checksum tests |
On 2/14/24 13:41, Charlie Jenkins wrote: > The test cases for ip_fast_csum and csum_ipv6_magic were failing on a > variety of architectures that are big endian or do not support > misaligned accesses. Both of these test cases are changed to support big > and little endian architectures. > > The test for ip_fast_csum is changed to align the data along (14 + > NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for > csum_ipv6_magic aligns the data using a struct. An extra padding field > is added to the struct to ensure that the size of the struct is the same > on all architectures (44 bytes). > > The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and > daddr. This would fail on parisc64 due to the following code snippet in > arch/parisc/include/asm/checksum.h: > > add %4, %0, %0\n" > ldd,ma 8(%1), %6\n" > ldd,ma 8(%2), %7\n" > add,dc %5, %0, %0\n" > > The second add is expecting carry flags from the first add. Normally, > a double word load (ldd) does not modify the carry flags. However, > because saddr and daddr may be misaligned, ldd triggers a misalignment > trap that gets handled in arch/parisc/kernel/unaligned.c. This causes > many additional instructions to be executed between the two adds. This > can be easily solved by adding the carry into %0 before executing the > ldd. > I really think this is a bug either in the trap handler or in the hppa64 qemu emulation. Only unaligned ldd instructions affect (actually, unconditionally set) the carry flag. That doesn't happen with unaligned ldw instructions. It would be worthwhile tracking this down since there are lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) when running the kernel in 64-bit mode. On the other hand, I guess this is a different problem. Not sure though if that should even be mentioned here since that makes it sound as if it would be expected that such accesses impact the carry flag. > However, that is not necessary since ipv6 headers should always be > aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to 2 > and the ethernet header size is 14. > > Architectures that set NET_IP_ALIGN to 0 must support misaligned saddr > and daddr, but that is not tested here. > > Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> I'll run this through my test system and let you know how it goes. It should be done in a couple of hours. Thanks, Guenter
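For reference, the 44-byte layout described in the commit message works out as sketched below. This is illustrative only; the field names and ordering are assumptions, not the actual struct from the v8 patch:

#include <linux/types.h>
#include <linux/in6.h>

/* Hypothetical sketch of a 44-byte csum_ipv6_magic() test layout. */
struct csum_ipv6_magic_data {
	struct in6_addr saddr;	/* 16 bytes */
	struct in6_addr daddr;	/* 16 bytes */
	__wsum csum;		/*  4 bytes */
	__be32 len;		/*  4 bytes */
	__be32 pad;		/*  4 bytes of explicit padding, so that
				 * sizeof() is 44 on every architecture */
};

Because struct in6_addr only requires 4-byte alignment, the struct has no tail padding and the explicit pad field alone fixes the size.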
On 2/14/24 13:41, Charlie Jenkins wrote: > The test cases for ip_fast_csum and csum_ipv6_magic were failing on a > variety of architectures that are big endian or do not support > misaligned accesses. Both of these test cases are changed to support big > and little endian architectures. > > The test for ip_fast_csum is changed to align the data along (14 + > NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for > csum_ipv6_magic aligns the data using a struct. An extra padding field > is added to the struct to ensure that the size of the struct is the same > on all architectures (44 bytes). > > The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and > daddr. This would fail on parisc64 due to the following code snippet in > arch/parisc/include/asm/checksum.h: > > add %4, %0, %0\n" > ldd,ma 8(%1), %6\n" > ldd,ma 8(%2), %7\n" > add,dc %5, %0, %0\n" > > The second add is expecting carry flags from the first add. Normally, > a double word load (ldd) does not modify the carry flags. However, > because saddr and daddr may be misaligned, ldd triggers a misalignment > trap that gets handled in arch/parisc/kernel/unaligned.c. This causes > many additional instructions to be executed between the two adds. This > can be easily solved by adding the carry into %0 before executing the > ldd. > > However, that is not necessary since ipv6 headers should always be > aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to 2 > and the ethernet header size is 14. > > Architectures that set NET_IP_ALIGN to 0 must support misaligned saddr > and daddr, but that is not tested here. > > Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-and-Tested-by: Guenter Roeck <linux@roeck-us.net> Guenter
On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote: > On 2/14/24 13:41, Charlie Jenkins wrote: > > The test cases for ip_fast_csum and csum_ipv6_magic were failing on a > > variety of architectures that are big endian or do not support > > misaligned accesses. Both of these test cases are changed to support big > > and little endian architectures. > > > > The test for ip_fast_csum is changed to align the data along (14 + > > NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for > > csum_ipv6_magic aligns the data using a struct. An extra padding field > > is added to the struct to ensure that the size of the struct is the same > > on all architectures (44 bytes). > > > > The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and > > daddr. This would fail on parisc64 due to the following code snippet in > > arch/parisc/include/asm/checksum.h: > > > > add %4, %0, %0\n" > > ldd,ma 8(%1), %6\n" > > ldd,ma 8(%2), %7\n" > > add,dc %5, %0, %0\n" > > > > The second add is expecting carry flags from the first add. Normally, > > a double word load (ldd) does not modify the carry flags. However, > > because saddr and daddr may be misaligned, ldd triggers a misalignment > > trap that gets handled in arch/parisc/kernel/unaligned.c. This causes > > many additional instructions to be executed between the two adds. This > > can be easily solved by adding the carry into %0 before executing the > > ldd. > > > > I really think this is a bug either in the trap handler or in the hppa64 > qemu emulation. Only unaligned ldd instructions affect (actually, > unconditionally set) the carry flag. That doesn't happen with unaligned > ldw instructions. It would be worthwhile tracking this down since there are > lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) > when running the kernel in 64-bit mode. On the other hand, I guess this > is a different problem. Not sure though if that should even be mentioned > here since that makes it sound as if it would be expected that such > accesses impact the carry flag. I wasn't confident it was a bug somewhere so that's why I sent this patch. However, I have just found the section of the processor manual [1] I was looking for (Section Privileged Software-Accessible Registers subsection Processor Status Word (PSW)): "Processor state is encoded in a 64-bit register called the Processor Status Word (PSW). When an interruption occurs, the current value of the PSW is saved in the Interruption Processor Status Word (IPSW) and usually all defined PSW bits are set to 0. "The PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION instruction. The interruption handler may restore the original PSW, modify selected bits, or may change the PSW to an entirely new value." Stored in the PSW register are the "Carry/borrow bits". This confirms that the carry/borrow bits should be restored. The save is supposed to happen automatically upon an interrupt and the restore is done by the RETURN FROM INTERRUPTION; thus this is a QEMU bug and not a Linux bug (please correct me if I am wrong). This v8 was not needed after all, it seems. It would be best to stick with the v7. [1] https://parisc.wiki.kernel.org/images-parisc/7/73/Parisc2.0.pdf > > > However, that is not necessary since ipv6 headers should always be > > aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to > > 2 and the ethernet header size is 14.
> > > > Architectures that set NET_IP_ALIGN to 0 must support misaligned > > saddr and daddr, but that is not tested here. > > > > Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and > > ip_fast_csum") > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> > > I'll run this through my test system and let you know how it goes. It > should be done in a couple of hours. > > Thanks, Guenter >
Hi Charlie, On 2/14/24 17:30, Charlie Jenkins wrote: > On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote: >> On 2/14/24 13:41, Charlie Jenkins wrote: >>> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a >>> variety of architectures that are big endian or do not support >>> misaligned accesses. Both of these test cases are changed to support big >>> and little endian architectures. >>> >>> The test for ip_fast_csum is changed to align the data along (14 + >>> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for >>> csum_ipv6_magic aligns the data using a struct. An extra padding field >>> is added to the struct to ensure that the size of the struct is the same >>> on all architectures (44 bytes). >>> >>> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and >>> daddr. This would fail on parisc64 due to the following code snippet in >>> arch/parisc/include/asm/checksum.h: >>> >>> add %4, %0, %0\n" >>> ldd,ma 8(%1), %6\n" >>> ldd,ma 8(%2), %7\n" >>> add,dc %5, %0, %0\n" >>> >>> The second add is expecting carry flags from the first add. Normally, >>> a double word load (ldd) does not modify the carry flags. However, >>> because saddr and daddr may be misaligned, ldd triggers a misalignment >>> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes >>> many additional instructions to be executed between the two adds. This >>> can be easily solved by adding the carry into %0 before executing the >>> ldd. >>> >> >> I really think this is a bug either in the trap handler or in the hppa64 >> qemu emulation. Only unaligned ldd instructions affect (actually, >> unconditionally set) the carry flag. That doesn't happen with unaligned >> ldw instructions. It would be worthwhile tracking this down since there are >> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >> when running the kernel in 64-bit mode. On the other hand, I guess this >> is a different problem. Not sure though if that should even be mentioned >> here since that makes it sound as if it would be expected that such >> accesses impact the carry flag. > > I wasn't confident it was a bug somewhere so that's why I sent this patch. > > However, I have just found the section of the processor manual [1] I was > looking for (Section Privileged Software-Accessible Registers subsection > Processor Status Word (PSW)): > > "Processor state is encoded in a 64-bit register called the Processor > Status Word (PSW). When an interruption occurs, the current value of the > PSW is saved in the Interruption Processor Status Word (IPSW) and > usually all defined PSW bits are set to 0. > > "The PSW is set to the contents of the IPSW by the RETURN FROM > INTERRUPTION instruction. The interruption handler may restore the > original PSW, modify selected bits, or may change the PSW to an entirely > new value." > > Stored in the PSW register are the "Carry/borrow bits". This confirms > that the carry/borrow bits should be restored. The save is supposed to > happen automatically upon an interrupt and the restore is done by the RETURN FROM > INTERRUPTION; thus this is a QEMU bug and not a Linux bug (please > correct me if I am wrong). > I know that much (I looked into the manual as well), I just really don't know if this is a Linux bug or a QEMU bug, and I have not been able to nail it down. I think someone with access to hardware will need to confirm. Specifically: Yes, the carry/borrow bits should be restored.
Question is if the Linux kernel's interrupt handler doesn't restore the carry bits or if the problem is on the qemu side. > This v8 was not needed after all, it seems. It would be best to stick > with the v7. > I tend to agree; after all, v7 exposes the problem, making it easier to determine if the problem can be reproduced on real hardware. FWIW, I wrote some test code which exposes the problem. It is quite easy to show that carry is always set after executing ldd on an unaligned address. That is also why I know for sure that the problem is not seen with ldw on unaligned addresses. Thanks, Guenter
On 2024-02-14 8:58 p.m., Guenter Roeck wrote: > Specifically: Yes, the carry/borrow bits should be restored. Question is > if the Linux kernel's interrupt handler doesn't restore the carry bits > or if the problem is on the qemu side. The carry/borrow bits in the PSW should be saved and restored by the save_specials and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. However, it appears the tophys macro might clobber the carry bits before they are saved in intr_save. Dave
On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: > On 2024-02-14 8:58 p.m., Guenter Roeck wrote: > > Specifically: Yes, the carry/borrow bits should be restored. Question is > > if the Linux kernel's interrupt handler doesn't restore the carry bits > > or if the problem is on the qemu side. > The carry/borrow bits in the PSW should be saved and restored by the save_specials > and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. Why would they be needed to be restored in linux? The manual says "The PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION instruction". This means that the PSW must be restored by the hardware. We can see the QEMU implementation in: rfi: https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 handling interrupt: https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 However the implementation appears to be faulty. During an RFI, the PSW is always set to 0x804000e (regardless of what the PSW was before the interrupt). - Charlie > > However, it appears the tophys macro might clobber the carry bits before they > are saved in intr_save. > > Dave > > -- > John David Anglin dave.anglin@bell.net >
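To pin down what behavior the manual actually specifies, the quoted wording reduces to the following pseudocode sketch. This is an illustration of the manual's text only, not QEMU's or the kernel's actual implementation:

#include <linux/types.h>

struct cpu_state {
	u64 psw;	/* live Processor Status Word, carry/borrow bits included */
	u64 ipsw;	/* Interruption Processor Status Word (cr22) */
};

/* "When an interruption occurs, the current value of the PSW is saved
 * in the Interruption Processor Status Word (IPSW) and usually all
 * defined PSW bits are set to 0." */
static void take_interruption(struct cpu_state *cpu)
{
	cpu->ipsw = cpu->psw;
	cpu->psw = 0;
}

/* "The PSW is set to the contents of the IPSW by the RETURN FROM
 * INTERRUPTION instruction" -- whatever the handler left in IPSW,
 * carry bits included, must come back on rfi. */
static void return_from_interruption(struct cpu_state *cpu)
{
	cpu->psw = cpu->ipsw;
}

Under this reading, a fixed post-rfi PSW value would be wrong unless the software handler itself had already clobbered IPSW, which is exactly the question the thread is trying to settle.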
On Wed, Feb 14, 2024 at 10:35:26PM -0500, Charlie Jenkins wrote: > On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: > > On 2024-02-14 8:58 p.m., Guenter Roeck wrote: > > > Specifically: Yes, the carry/borrow bits should be restored. Question is > > > if the Linux kernel's interrupt handler doesn't restore the carry bits > > > or if the problem is on the qemu side. > > The carry/borrow bits in the PSW should be saved and restored by the save_specials > > and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. To clarify my previous point, even if rest_specials is restoring PSW, the "restored" values are going to be overwritten during the rfi instruction, since that instruction is defined to restore PSW. The handshake here seems to be lost, with both the hardware and linux messing with PSW and IPSW and both expecting the other to not mess with the values. > > Why would they be needed to be restored in linux? The manual says "The > PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION > instruction". This means that the PSW must be restored by the hardware. > > We can see the QEMU implementation in: > > rfi: > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 > > handling interrupt: > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 > > However the implementation appears to be faulty. During an RFI, the PSW > is always set to 0x804000e (regardless of what the PSW was before the > interrupt). > > - Charlie > > > > > > However, it appears the tophys macro might clobber the carry bits before they > > are saved in intr_save. > > > > Dave > > > > -- > > John David Anglin dave.anglin@bell.net > >
On 2/14/24 19:35, Charlie Jenkins wrote: > On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: >> On 2024-02-14 8:58 p.m., Guenter Roeck wrote: >>> Specifically: Yes, the carry/borrow bits should be restored. Question is >>> if the Linux kernel's interrupt handler doesn't restore the carry bits >>> or if the problem is on the qemu side. >> The carry/borrow bits in the PSW should be saved and restored by the save_specials >> and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. > > Why would they be needed to be restored in linux? The manual says "The > PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION > instruction". This means that the PSW must be restored by the hardware. > > We can see the QEMU implementation in: > > rfi: > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 > > handling interrupt: > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 > > However the implementation appears to be faulty. During an RFI, the PSW > is always set to 0x804000e (regardless of what the PSW was before the > interrupt). > Not sure if I agree. The interrupt handler in Linux is the one which needs to set IPSW. Looking into the code, I agree with Dave that the tophys macro seems to clobber the carry bits before psw is saved, so they can not really be restored. The only issue with that idea is that I can only reproduce the problem with an interrupted ldd instruction but not, for example, with ldw. This is why it would be really important to have someone with real hardware test this. Thanks, Guenter
... > It would be worthwhile tracking this down since there are > lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) > when running the kernel in 64-bit mode. Hmmm.... For performance reasons you really don't want any of them. The misaligned 64bit fields need an __attribute((aligned(4)) marker. If the checksum code can do them it really needs to detect and handle the misalignment. The misaligned trap handler probably ought to contain a warn_on_once() to dump stack on the first such error. They can then be fixed one at a time. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
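A minimal sketch of that warn_on_once() idea, assuming it sits somewhere in the parisc unaligned-trap path; the helper name and message are made up here, and arch/parisc/kernel/unaligned.c already has its own reporting (WARN_ONCE() is the standard kernel macro):

#include <linux/bug.h>
#include <asm/ptrace.h>

/* Hypothetical helper: dump a stack trace for the first misaligned
 * kernel access only, so offenders can be found and fixed one at a
 * time without flooding the log. */
static void report_first_unaligned(struct pt_regs *regs)
{
	WARN_ONCE(1, "unaligned kernel access to %#lx at %#lx\n",
		  regs->ior, regs->iaoq[0]);
}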
On 2/14/24 19:00, John David Anglin wrote: > On 2024-02-14 8:58 p.m., Guenter Roeck wrote: >> Specifically: Yes, the carry/borrow bits should be restored. Question is >> if the Linux kernel's interrupt handler doesn't restore the carry bits >> or if the problem is on the qemu side. > The carry/borrow bits in the PSW should be saved and restored by the save_specials > and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. > > However, it appears the tophys macro might clobber the carry bits before they > are saved in intr_save. > Actually, I did some logging and found that the correct carry bits do find their way into the IPSW register in the Linux kernel. Once userspace is back after the unaligned LDD has been handled, though, the carry flag is always set to 1, no matter how it was set prior to the unaligned access. Guenter
On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote: > On 2/14/24 19:35, Charlie Jenkins wrote: > > On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: > > > On 2024-02-14 8:58 p.m., Guenter Roeck wrote: > > > > Specifically: Yes, the carry/borrow bits should be restored. Question is > > > > if the Linux kernel's interrupt handler doesn't restore the carry bits > > > > or if the problem is on the qemu side. > > > The carry/borrow bits in the PSW should be saved and restored by the save_specials > > > and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. > > > > Why would they be needed to be restored in linux? The manual says "The > > PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION > > instruction". This means that the PSW must be restored by the hardware. > > > > We can see the QEMU implementation in: > > > > rfi: > > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 > > > > handling interrupt: > > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 > > > > However the implementation appears to be faulty. During an RFI, the PSW > > is always set to 0x804000e (regardless of what the PSW was before the > > interrupt). > > > > Not sure if I agree. The interrupt handler in Linux is the one which needs to set > IPSW. Looking into the code, I agree with Dave that the tophys macro seems to > clobber the carry bits before psw is saved, so they can not really be restored. > The only issue with that idea is that I can only reproduce the problem with > an interrupted ldd instruction but not, for example, with ldw. This is why it > would be really important to have someone with real hardware test this. > > Thanks, > Guenter Yes, we definitely need feedback from somebody with access to hardware, but I do not understand how "The PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION" could be interpreted as anything except that the hardware is expected to over-write the contents of the PSW during the rfi. - Charlie >
On 2/15/24 02:27, David Laight wrote: > ... >> It would be worthwhile tracking this down since there are >> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >> when running the kernel in 64-bit mode. > > Hmmm.... > For performance reasons you really don't want any of them. > The misaligned 64bit fields need an __attribute((aligned(4)) marker. > > If the checksum code can do them it really needs to detect > and handle the misalignment. > > The misaligned trap handler probably ought to contain a > warn_on_once() to dump stack on the first such error. > They can then be fixed one at a time. >

Unaligned LDD at unwind_once+0x4a8/0x5e0

Decoded:

Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445)

Source:

static bool pc_is_kernel_fn(unsigned long pc, void *fn)
{
	return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
}

Disassembled:

 c38:	50 3c 00 00	ldd 0(r1),ret0
 c3c:	08 1b 02 44	copy dp,r4
 c40:	0f 80 10 da	ldd 0(ret0),r26  <--
 c44:	37 dd 3f a1	ldo -30(sp),ret1
 c48:	51 02 00 20	ldd 10(r8),rp
 c4c:	e8 40 f0 00	bve,l (rp),rp
 c50:	51 1b 00 30	ldd 18(r8),dp

Hmm, interesting. See below for a possible fix. Should I submit a patch ?

Thanks,
Guenter

---
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index ab23e61a6f01..1d2aab619466 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -772,6 +772,7 @@ ENTRY_CFI(_switch_to)
 	bv	%r0(%r2)
 	mtctl	%r25,%cr30
 
+	.align 8
 ENTRY(_switch_to_ret)
 	mtctl	%r0, %cr0	/* Needed for single stepping */
 	callee_rest
@@ -1702,6 +1703,7 @@ ENTRY_CFI(sys_rt_sigreturn_wrapper)
 	LDREG	PT_GR28(%r1),%r28	/* reload original r28 for syscall_exit */
 ENDPROC_CFI(sys_rt_sigreturn_wrapper)
 
+	.align 8
 ENTRY(syscall_exit)
 	/* NOTE: Not all syscalls exit this way. rt_sigreturn will exit
 	 * via syscall_exit_rfi if the signal was received while the process
On 2/15/24 07:36, Charlie Jenkins wrote: > On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote: >> On 2/14/24 19:35, Charlie Jenkins wrote: >>> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: >>>> On 2024-02-14 8:58 p.m., Guenter Roeck wrote: >>>>> Specifically: Yes, the carry/borrow bits should be restored. Question is >>>>> if the Linux kernel's interrupt handler doesn't restore the carry bits >>>>> or if the problem is on the qemu side. >>>> The carry/borrow bits in the PSW should be saved and restored by the save_specials >>>> and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. >>> >>> Why would they be needed to be restored in linux? The manual says "The >>> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION >>> instruction". This means that the PSW must be restored by the hardware. >>> >>> We can see the QEMU implementation in: >>> >>> rfi: >>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 >>> >>> handling interrupt: >>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 >>> >>> However the implementation appears to be faulty. During an RFI, the PSW >>> is always set to 0x804000e (regardless of what the PSW was before the >>> interrupt). >>> >> >> Not sure if I agree. The interrupt handler in Linux is the one which needs to set >> IPSW. Looking into the code, I agree with Dave that the tophys macro seems to >> clobber the carry bits before psw is saved, so they can not really be restored. >> The only issue with that idea is that I can only reproduce the problem with >> an interrupted ldd instruction but not, for example, with ldw. This is why it >> would be really important to have someone with real hardware test this. >> >> Thanks, >> Guenter > > Yes, we definitely feedback from somebody with access to hardware, but I > do not understand how "The PSW is set to the contents of the IPSW by the > RETURN FROM INTERRUPTION" could be interpreted as anything except that > the hardware is expected to over-write the contents of the PSW during > the rfi. > Sure, I absolutely agree. But that assumes that IPSW is set correctly in the Linux interrupt handler. We do know that something odd happens when an unaligned ldd is encountered. At least for my part I don't know if the problem is in emulate_ldd() in the Linux kernel or in the ldd implementation and trap handling in qemu. I do know (from my logs) that qemu does see the correct PSW/IPSW values, because they do show up correctly in the Linux kernel when running the qemu emulation. Only it somehow gets lost when the Linux interrupt handler returns. Thanks. Guenter
On Thu, Feb 15, 2024 at 08:30:22AM -0800, Guenter Roeck wrote: > On 2/15/24 07:36, Charlie Jenkins wrote: > > On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote: > > > On 2/14/24 19:35, Charlie Jenkins wrote: > > > > On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: > > > > > On 2024-02-14 8:58 p.m., Guenter Roeck wrote: > > > > > > Specifically: Yes, the carry/borrow bits should be restored. Question is > > > > > > if the Linux kernel's interrupt handler doesn't restore the carry bits > > > > > > or if the problem is on the qemu side. > > > > > The carry/borrow bits in the PSW should be saved and restored by the save_specials > > > > > and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. > > > > > > > > Why would they be needed to be restored in linux? The manual says "The > > > > PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION > > > > instruction". This means that the PSW must be restored by the hardware. > > > > > > > > We can see the QEMU implementation in: > > > > > > > > rfi: > > > > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 > > > > > > > > handling interrupt: > > > > https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 > > > > > > > > However the implementation appears to be faulty. During an RFI, the PSW > > > > is always set to 0x804000e (regardless of what the PSW was before the > > > > interrupt). > > > > > > > > > > Not sure if I agree. The interrupt handler in Linux is the one which needs to set > > > IPSW. Looking into the code, I agree with Dave that the tophys macro seems to > > > clobber the carry bits before psw is saved, so they can not really be restored. > > > The only issue with that idea is that I can only reproduce the problem with > > > an interrupted ldd instruction but not, for example, with ldw. This is why it > > > would be really important to have someone with real hardware test this. > > > > > > Thanks, > > > Guenter > > > > Yes, we definitely feedback from somebody with access to hardware, but I > > do not understand how "The PSW is set to the contents of the IPSW by the > > RETURN FROM INTERRUPTION" could be interpreted as anything except that > > the hardware is expected to over-write the contents of the PSW during > > the rfi. > > > > Sure, I absolutely agree. But that assumes that IPSW is set correctly > in the Linux interrupt handler. We do know that something odd happens The manual defines the saving of PSW as the responsibility of the hardware as well: "When an interruption occurs, the current value of the PSW is saved in the Interruption Processor Status Word (IPSW)". I don't think this should be interpreted to mean that a software interrupt handler is required to save the IPSW. - Charlie > when an unaligned ldd is encountered. At least for my part I don't know > if the problem is in emulate_ldd() in the Linux kernel or in the ldd > implementation and trap handling in qemu. I do know (from my logs) > that qemu does see the correct PSW/IPSW values, because they do > show up correctly in the Linux kernel when running the qemu emulation. > Only it somehow gets lost when the Linux interrupt handler returns. > > Thanks. > Guenter >
On 2024-02-15 10:44 a.m., Guenter Roeck wrote: > On 2/15/24 02:27, David Laight wrote: >> ... >>> It would be worthwhile tracking this down since there are >>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >>> when running the kernel in 64-bit mode. >> >> Hmmm.... >> For performance reasons you really don't want any of them. >> The misaligned 64bit fields need an __attribute((aligned(4)) marker. >> >> If the checksum code can do them it really needs to detect >> and handle the misalignment. >> >> The misaligned trap handler probably ought to contain a >> warn_on_once() to dump stack on the first such error. >> They can then be fixed one at a time. >> > > Unaligned LDD at unwind_once+0x4a8/0x5e0 > > Decoded: > > Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 > arch/parisc/kernel/unwind.c:445) > > Source: > > static bool pc_is_kernel_fn(unsigned long pc, void *fn) > { > return (unsigned long)dereference_kernel_function_descriptor(fn) == pc; This looks wrong to me. Function descriptors should always be 8-byte aligned. I think this routine should return false if fn isn't 8-byte aligned. > } > > Disassembled: > > c38: 50 3c 00 00 ldd 0(r1),ret0 > c3c: 08 1b 02 44 copy dp,r4 > c40: 0f 80 10 da ldd 0(ret0),r26 <-- > c44: 37 dd 3f a1 ldo -30(sp),ret1 > c48: 51 02 00 20 ldd 10(r8),rp > c4c: e8 40 f0 00 bve,l (rp),rp > c50: 51 1b 00 30 ldd 18(r8),dp > > Hmm, interesting. See below for a possible fix. Should I submit a patch ? > > Thanks, > Guenter > > --- > diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S > index ab23e61a6f01..1d2aab619466 100644 > --- a/arch/parisc/kernel/entry.S > +++ b/arch/parisc/kernel/entry.S > @@ -772,6 +772,7 @@ ENTRY_CFI(_switch_to) > bv %r0(%r2) > mtctl %r25,%cr30 > > + .align 8 Code entry points only need 4-byte alignment. > ENTRY(_switch_to_ret) > mtctl %r0, %cr0 /* Needed for single stepping */ > callee_rest > @@ -1702,6 +1703,7 @@ ENTRY_CFI(sys_rt_sigreturn_wrapper) > LDREG PT_GR28(%r1),%r28 /* reload original r28 for syscall_exit */ > ENDPROC_CFI(sys_rt_sigreturn_wrapper) > > + .align 8 Ditto. > ENTRY(syscall_exit) > /* NOTE: Not all syscalls exit this way. rt_sigreturn will exit > * via syscall_exit_rfi if the signal was received while the process Dave
On 2/15/24 08:51, John David Anglin wrote: > On 2024-02-15 10:44 a.m., Guenter Roeck wrote: >> On 2/15/24 02:27, David Laight wrote: >>> ... >>>> It would be worthwhile tracking this down since there are >>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >>>> when running the kernel in 64-bit mode. >>> >>> Hmmm.... >>> For performance reasons you really don't want any of them. >>> The misaligned 64bit fields need an __attribute((aligned(4)) marker. >>> >>> If the checksum code can do them it really needs to detect >>> and handle the misalignment. >>> >>> The misaligned trap handler probably ought to contain a >>> warn_on_once() to dump stack on the first such error. >>> They can then be fixed one at a time. >>> >> >> Unaligned LDD at unwind_once+0x4a8/0x5e0 >> >> Decoded: >> >> Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445) >> >> Source: >> >> static bool pc_is_kernel_fn(unsigned long pc, void *fn) >> { >> return (unsigned long)dereference_kernel_function_descriptor(fn) == pc; > This looks wrong to me. Function descriptors should always be 8-byte aligned. I think this > routine should return false if fn isn't 8-byte aligned. Below you state "Code entry points only need 4-byte alignment." I think that contradicts each other. Also, the calling code is, for example, pc_is_kernel_fn(pc, syscall_exit) I fail to see how this can be consolidated if it is ok that syscall_exit is 4-byte aligned but, at the same time, must be 8-byte aligned to be considered to be a kernel function. Guenter >> } >> >> Disassembled: >> >> c38: 50 3c 00 00 ldd 0(r1),ret0 >> c3c: 08 1b 02 44 copy dp,r4 >> c40: 0f 80 10 da ldd 0(ret0),r26 <-- >> c44: 37 dd 3f a1 ldo -30(sp),ret1 >> c48: 51 02 00 20 ldd 10(r8),rp >> c4c: e8 40 f0 00 bve,l (rp),rp >> c50: 51 1b 00 30 ldd 18(r8),dp >> >> Hmm, interesting. See below for a possible fix. Should I submit a patch ? >> >> Thanks, >> Guenter >> >> --- >> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S >> index ab23e61a6f01..1d2aab619466 100644 >> --- a/arch/parisc/kernel/entry.S >> +++ b/arch/parisc/kernel/entry.S >> @@ -772,6 +772,7 @@ ENTRY_CFI(_switch_to) >> bv %r0(%r2) >> mtctl %r25,%cr30 >> >> + .align 8 > Code entry points only need 4-byte alignment. >> ENTRY(_switch_to_ret) >> mtctl %r0, %cr0 /* Needed for single stepping */ >> callee_rest >> @@ -1702,6 +1703,7 @@ ENTRY_CFI(sys_rt_sigreturn_wrapper) >> LDREG PT_GR28(%r1),%r28 /* reload original r28 for syscall_exit */ >> ENDPROC_CFI(sys_rt_sigreturn_wrapper) >> >> + .align 8 > Ditto. >> ENTRY(syscall_exit) >> /* NOTE: Not all syscalls exit this way. rt_sigreturn will exit >> * via syscall_exit_rfi if the signal was received while the process > > Dave >
On 2024-02-15 11:51 a.m., Charlie Jenkins wrote: > On Thu, Feb 15, 2024 at 08:30:22AM -0800, Guenter Roeck wrote: >> On 2/15/24 07:36, Charlie Jenkins wrote: >>> On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote: >>>> On 2/14/24 19:35, Charlie Jenkins wrote: >>>>> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: >>>>>> On 2024-02-14 8:58 p.m., Guenter Roeck wrote: >>>>>>> Specifically: Yes, the carry/borrow bits should be restored. Question is >>>>>>> if the Linux kernel's interrupt handler doesn't restore the carry bits >>>>>>> or if the problem is on the qemu side. >>>>>> The carry/borrow bits in the PSW should be saved and restored by the save_specials >>>>>> and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. >>>>> Why would they be needed to be restored in linux? The manual says "The >>>>> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION >>>>> instruction". This means that the PSW must be restored by the hardware. >>>>> >>>>> We can see the QEMU implementation in: >>>>> >>>>> rfi: >>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 >>>>> >>>>> handling interrupt: >>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 >>>>> >>>>> However the implementation appears to be faulty. During an RFI, the PSW >>>>> is always set to 0x804000e (regardless of what the PSW was before the >>>>> interrupt). >>>>> >>>> Not sure if I agree. The interrupt handler in Linux is the one which needs to set >>>> IPSW. Looking into the code, I agree with Dave that the tophys macro seems to >>>> clobber the carry bits before psw is saved, so they can not really be restored. >>>> The only issue with that idea is that I can only reproduce the problem with >>>> an interrupted ldd instruction but not, for example, with ldw. This is why it >>>> would be really important to have someone with real hardware test this. >>>> >>>> Thanks, >>>> Guenter >>> Yes, we definitely feedback from somebody with access to hardware, but I >>> do not understand how "The PSW is set to the contents of the IPSW by the >>> RETURN FROM INTERRUPTION" could be interpreted as anything except that >>> the hardware is expected to over-write the contents of the PSW during >>> the rfi. >>> >> Sure, I absolutely agree. But that assumes that IPSW is set correctly >> in the Linux interrupt handler. We do know that something odd happens > The manual defines the saving of PSW as the responsibility of the > hardware as well: "When an interruption occurs, the current value of the > PSW is saved in the Interruption Processor Status Word (IPSW)". I don't > think this should be interpreted to mean that a software interrupt > handler is required to save the IPSW. The IPSW (cr22) is saved by save_specials to regs->gr[0]. It is modified in various places when an interruption is handled. In the case of emulate_ldd, we have /* else we handled it, let life go on. */ regs->gr[0]|=PSW_N; This is supposed to nullify the faulting ldd. I've yet to find where the carry bit is getting set in the PSW. There is a gap between where the hardware sets IPSW and where it is saved to the stack in save_specials. tophys might clobber the IPSW if there is a double fault. But it seems more likely that regs->gr[0] is getting clobbered somehow. Dave > > - Charlie > >> when an unaligned ldd is encountered. At least for my part I don't know >> if the problem is in emulate_ldd() in the Linux kernel or in the ldd >> implementation and trap handling in qemu.
>> I do know (from my logs) >> that qemu does see the correct PSW/IPSW values, because they do >> show up correctly in the Linux kernel when running the qemu emulation. >> Only it somehow gets lost when the Linux interrupt handler returns. >> >> Thanks. >> Guenter
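A hypothetical debug aid for Dave's theory, sketching how the saved carry/borrow bits could be watched while the trap is handled. It assumes PSW_CB in asm/psw.h names the carry/borrow field of the PSW; the helper itself is not code from the tree:

#include <asm/psw.h>
#include <asm/ptrace.h>
#include <linux/printk.h>

/* Hypothetical: snapshot the saved IPSW (regs->gr[0]) at interesting
 * points, e.g. before and after emulate_ldd(), to see where the
 * carry/borrow bits change. */
static void trace_saved_cb(struct pt_regs *regs, const char *where)
{
	pr_info("%s: saved IPSW %#lx (carry/borrow bits %#lx)\n",
		where, regs->gr[0], regs->gr[0] & PSW_CB);
}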
On 2024-02-15 12:06 p.m., Guenter Roeck wrote: > On 2/15/24 08:51, John David Anglin wrote: >> On 2024-02-15 10:44 a.m., Guenter Roeck wrote: >>> On 2/15/24 02:27, David Laight wrote: >>>> ... >>>>> It would be worthwhile tracking this down since there are >>>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >>>>> when running the kernel in 64-bit mode. >>>> >>>> Hmmm.... >>>> For performance reasons you really don't want any of them. >>>> The misaligned 64bit fields need an __attribute((aligned(4)) marker. >>>> >>>> If the checksum code can do them it really needs to detect >>>> and handle the misalignment. >>>> >>>> The misaligned trap handler probably ought to contain a >>>> warn_on_once() to dump stack on the first such error. >>>> They can then be fixed one at a time. >>>> >>> >>> Unaligned LDD at unwind_once+0x4a8/0x5e0 >>> >>> Decoded: >>> >>> Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 >>> arch/parisc/kernel/unwind.c:445) >>> >>> Source: >>> >>> static bool pc_is_kernel_fn(unsigned long pc, void *fn) >>> { >>> return (unsigned long)dereference_kernel_function_descriptor(fn) == pc; >> This looks wrong to me. Function descriptors should always be 8-byte aligned. I think this >> routine should return false if fn isn't 8-byte aligned. > > Below you state "Code entry points only need 4-byte alignment." > > I think that contradicts each other. Also, the calling code is, > for example, > pc_is_kernel_fn(pc, syscall_exit) > > I fail to see how this can be consolidated if it is ok > that syscall_exit is 4-byte aligned but, at the same time, > must be 8-byte aligned to be considered to be a kernel function. In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned function descriptor. The descriptor holds the actual address of the function. It only needs 4-byte alignment. Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed using ldd instructions. Dave
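A simplified sketch of the descriptor layout Dave describes; the kernel's real definition is Elf64_Fdesc in arch/parisc/include/asm/elf.h, which may carry additional fields, so take the two-field struct here as an illustration only:

#include <linux/types.h>

struct fdesc {
	u64 addr;	/* actual code entry point; the code itself only
			 * needs 4-byte alignment */
	u64 gp;		/* global pointer loaded for the callee */
};

/* A 64-bit parisc function pointer points at the descriptor, not at
 * the code, so the descriptor must be 8-byte aligned for the two ldd
 * fetches of addr and gp to work without trapping. */
static unsigned long entry_point(const struct fdesc *fd)
{
	return fd->addr;
}

This is why the two statements are not a contradiction: the 8-byte requirement applies to the descriptor that a function pointer references, while the entry point stored inside it needs only 4-byte alignment.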
On 2/15/24 08:51, Charlie Jenkins wrote: > On Thu, Feb 15, 2024 at 08:30:22AM -0800, Guenter Roeck wrote: >> On 2/15/24 07:36, Charlie Jenkins wrote: >>> On Thu, Feb 15, 2024 at 12:56:13AM -0800, Guenter Roeck wrote: >>>> On 2/14/24 19:35, Charlie Jenkins wrote: >>>>> On Wed, Feb 14, 2024 at 10:00:37PM -0500, John David Anglin wrote: >>>>>> On 2024-02-14 8:58 p.m., Guenter Roeck wrote: >>>>>>> Specifically: Yes, the carry/borrow bits should be restored. Question is >>>>>>> if the Linux kernel's interrupt handler doesn't restore the carry bits >>>>>>> or if the problem is on the qemu side. >>>>>> The carry/borrow bits in the PSW should be saved and restored by the save_specials >>>>>> and rest_specials macros. They are defined in arch/parisc/include/asm/assembly.h. >>>>> >>>>> Why would they be needed to be restored in linux? The manual says "The >>>>> PSW is set to the contents of the IPSW by the RETURN FROM INTERRUPTION >>>>> instruction". This means that the PSW must be restored by the hardware. >>>>> >>>>> We can see the QEMU implementation in: >>>>> >>>>> rfi: >>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/sys_helper.c#L93 >>>>> >>>>> handling interrupt: >>>>> https://github.com/qemu/qemu/blob/v8.2.1/target/hppa/int_helper.c#L109 >>>>> >>>>> However the implementation appears to be faulty. During an RFI, the PSW >>>>> is always set to 0x804000e (regardless of what the PSW was before the >>>>> interrupt). >>>>> >>>> >>>> Not sure if I agree. The interrupt handler in Linux is the one which needs to set >>>> IPSW. Looking into the code, I agree with Dave that the tophys macro seems to >>>> clobber the carry bits before psw is saved, so they can not really be restored. >>>> The only issue with that idea is that I can only reproduce the problem with >>>> an interrupted ldd instruction but not, for example, with ldw. This is why it >>>> would be really important to have someone with real hardware test this. >>>> >>>> Thanks, >>>> Guenter >>> >>> Yes, we definitely feedback from somebody with access to hardware, but I >>> do not understand how "The PSW is set to the contents of the IPSW by the >>> RETURN FROM INTERRUPTION" could be interpreted as anything except that >>> the hardware is expected to over-write the contents of the PSW during >>> the rfi. >>> >> >> Sure, I absolutely agree. But that assumes that IPSW is set correctly >> in the Linux interrupt handler. We do know that something odd happens > > The manual defines the saving of PSW as the responsibility of the > hardware as well: "When an interruption occurs, the current value of the > PSW is saved in the Interruption Processor Status Word (IPSW)". I don't > think this should be interpreted to mean that a software interrupt > handler is required to save the IPSW. > Sorry, I meant the manipulation of ipsw by the linux interrupt handler. Guenter > - Charlie > >> when an unaligned ldd is encountered. At least for my part I don't know >> if the problem is in emulate_ldd() in the Linux kernel or in the ldd >> implementation and trap handling in qemu. I do know (from my logs) >> that qemu does see the correct PSW/IPSW values, because they do >> show up correctly in the Linux kernel when running the qemu emulation. >> Only it somehow gets lost when the Linux interrupt handler returns. >> >> Thanks. >> Guenter >>
On 2/15/24 09:25, John David Anglin wrote: [ ... ] >>>> Source: >>>> >>>> static bool pc_is_kernel_fn(unsigned long pc, void *fn) >>>> { >>>> return (unsigned long)dereference_kernel_function_descriptor(fn) == pc; >>> This looks wrong to me. Function descriptors should always be 8-byte aligned. I think this >>> routine should return false if fn isn't 8-byte aligned. >> >> Below you state "Code entry points only need 4-byte alignment." >> >> I think that contradicts each other. Also, the calling code is, >> for example, >> pc_is_kernel_fn(pc, syscall_exit) >> >> I fail to see how this can be consolidated if it is ok >> that syscall_exit is 4-byte aligned but, at the same time, >> must be 8-byte aligned to be considered to be a kernel function. > In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned > function descriptor. The descriptor holds the actual address of the function. It only needs > 4-byte alignment. > > Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed > using ldd instructions. > Maybe code such as pc_is_kernel_fn(pc, syscall_exit) is wrong because syscall_exit doesn't point to a function descriptor but to the actual address. The code and comments in arch/parisc/kernel/unwind.c is for sure confusing because it talks about not using dereference_kernel_function_descriptor() to keep things simple but then calls dereference_kernel_function_descriptor() anyway. Maybe it should just be if (pc == syscall_exit) instead. The entire code is really odd anyway. ptr = dereference_kernel_function_descriptor(&handle_interruption); if (pc_is_kernel_fn(pc, ptr)) { and then pc_is_kernel_fn() dereferences it again. Weird. It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when CONFIG_64BIT is enabled") might have messed this up. No idea how to fix it properly, though. Thanks, Guenter
On 2/15/24 09:25, John David Anglin wrote: > On 2024-02-15 12:06 p.m., Guenter Roeck wrote: >> On 2/15/24 08:51, John David Anglin wrote: >>> On 2024-02-15 10:44 a.m., Guenter Roeck wrote: >>>> On 2/15/24 02:27, David Laight wrote: >>>>> ... >>>>>> It would be worthwhile tracking this down since there are >>>>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >>>>>> when running the kernel in 64-bit mode. >>>>> >>>>> Hmmm.... >>>>> For performance reasons you really don't want any of them. >>>>> The misaligned 64bit fields need an __attribute((aligned(4)) marker. >>>>> >>>>> If the checksum code can do them it really needs to detect >>>>> and handle the misalignment. >>>>> >>>>> The misaligned trap handler probably ought to contain a >>>>> warn_on_once() to dump stack on the first such error. >>>>> They can then be fixed one at a time. >>>>> >>>> >>>> Unaligned LDD at unwind_once+0x4a8/0x5e0 >>>> >>>> Decoded: >>>> >>>> Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445) >>>> >>>> Source: >>>> >>>> static bool pc_is_kernel_fn(unsigned long pc, void *fn) >>>> { >>>> return (unsigned long)dereference_kernel_function_descriptor(fn) == pc; >>> This looks wrong to me. Function descriptors should always be 8-byte aligned. I think this >>> routine should return false if fn isn't 8-byte aligned. >> >> Below you state "Code entry points only need 4-byte alignment." >> >> I think that contradicts each other. Also, the calling code is, >> for example, >> pc_is_kernel_fn(pc, syscall_exit) >> >> I fail to see how this can be consolidated if it is ok >> that syscall_exit is 4-byte aligned but, at the same time, >> must be 8-byte aligned to be considered to be a kernel function. > In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned > function descriptor. The descriptor holds the actual address of the function. It only needs > 4-byte alignment. > > Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed > using ldd instructions. > How about the patch below ? Guenter

---
diff --git a/arch/parisc/kernel/unwind.c b/arch/parisc/kernel/unwind.c
index 27ae40a443b8..c2b9e23cbc0a 100644
--- a/arch/parisc/kernel/unwind.c
+++ b/arch/parisc/kernel/unwind.c
@@ -214,24 +214,14 @@ static bool pc_is_kernel_fn(unsigned long pc, void *fn)
 
 static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int frame_size)
 {
-	/*
-	 * We have to use void * instead of a function pointer, because
-	 * function pointers aren't a pointer to the function on 64-bit.
-	 * Make them const so the compiler knows they live in .text
-	 * Note: We could use dereference_kernel_function_descriptor()
-	 * instead but we want to keep it simple here.
-	 */
-	extern void * const ret_from_kernel_thread;
-	extern void * const syscall_exit;
-	extern void * const intr_return;
-	extern void * const _switch_to_ret;
+	void (*ret_from_kernel_thread)(void);
+	void (*syscall_exit)(void);
+	void (*intr_return)(void);
+	void (*_switch_to_ret)(void);
 #ifdef CONFIG_IRQSTACKS
-	extern void * const _call_on_stack;
+	void (*_call_on_stack)(void);
 #endif /* CONFIG_IRQSTACKS */
-	void *ptr;
-
-	ptr = dereference_kernel_function_descriptor(&handle_interruption);
-	if (pc_is_kernel_fn(pc, ptr)) {
+	if (pc_is_kernel_fn(pc, handle_interruption)) {
 		struct pt_regs *regs = (struct pt_regs *)(info->sp - frame_size - PT_SZ_ALGN);
 		dbg("Unwinding through handle_interruption()\n");
 		info->prev_sp = regs->gr[30];
On 2024-02-15 1:17 p.m., Guenter Roeck wrote: > On 2/15/24 09:25, John David Anglin wrote: > [ ... ] >>>>> Source: >>>>> >>>>> static bool pc_is_kernel_fn(unsigned long pc, void *fn) >>>>> { >>>>> return (unsigned long)dereference_kernel_function_descriptor(fn) == pc; >>>> This looks wrong to me. Function descriptors should always be 8-byte aligned. I think this >>>> routine should return false if fn isn't 8-byte aligned. >>> >>> Below you state "Code entry points only need 4-byte alignment." >>> >>> I think that contradicts each other. Also, the calling code is, >>> for example, >>> pc_is_kernel_fn(pc, syscall_exit) >>> >>> I fail to see how this can be consolidated if it is ok >>> that syscall_exit is 4-byte aligned but, at the same time, >>> must be 8-byte aligned to be considered to be a kernel function. >> In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned >> function descriptor. The descriptor holds the actual address of the function. It only needs >> 4-byte alignment. >> >> Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed >> using ldd instructions. >> > > Maybe code such as > pc_is_kernel_fn(pc, syscall_exit) > is wrong because syscall_exit doesn't point to a function descriptor > but to the actual address. The code and comments in arch/parisc/kernel/unwind.c It depends on how syscall_exit is declared. unwind.c lies about the type of handle_interruption, etc.: extern void * const handle_interruption; extern void * const ret_from_kernel_thread; extern void * const syscall_exit; extern void * const intr_return; extern void * const _switch_to_ret; #ifdef CONFIG_IRQSTACKS extern void * const _call_on_stack; #endif /* CONFIG_IRQSTACKS */ This should yield actual addresses. > is for sure confusing because it talks about not using > dereference_kernel_function_descriptor() to keep things simple but then calls > dereference_kernel_function_descriptor() anyway. Maybe it should just be > if (pc == syscall_exit) > instead. Looks like. > > The entire code is really odd anyway. > > ptr = dereference_kernel_function_descriptor(&handle_interruption); > if (pc_is_kernel_fn(pc, ptr)) { > > and then pc_is_kernel_fn() dereferences it again. Weird. > > It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when > CONFIG_64BIT is enabled") might have messed this up. No idea how to fix > it properly, though. This is Helge's code... I'll let him fix it. Dave
... > > Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed > > using ldd instructions. > > > > How about the patch below ? I think you still need something to avoid the misalignment trap in the 'false' case. If they only need to be aligned 'for efficiency' then I'd assume the cpu (or whatever normally processes them) is ok with 4-byte alignment even though 'ldd' (8 byte load?) faults. Which would mean you need to read them with two 4-byte loads. Especially if 'fn' isn't just one of a couple of specific values that can be forced to be aligned. But the code might just be completely broken (as suggested elsewhere). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
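A sketch of the two-load idea, using the kernel's generic get_unaligned() helper rather than hand-written paired 4-byte loads; whether any such read belongs in pc_is_kernel_fn() at all is exactly what is being debated:

#include <asm/unaligned.h>
#include <linux/types.h>

/* Read a 64-bit descriptor word without risking a misalignment trap:
 * one aligned load when possible, a piecewise access otherwise. */
static u64 load_fdesc_word(const void *p)
{
	if (((unsigned long)p & 7) == 0)
		return *(const u64 *)p;		/* aligned: a single ldd */
	return get_unaligned((const u64 *)p);	/* smaller loads, no trap */
}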
On Thu, Feb 15, 2024 at 10:42:29AM -0800, Guenter Roeck wrote: > On 2/15/24 09:25, John David Anglin wrote: > > On 2024-02-15 12:06 p.m., Guenter Roeck wrote: > > > On 2/15/24 08:51, John David Anglin wrote: > > > > On 2024-02-15 10:44 a.m., Guenter Roeck wrote: > > > > > On 2/15/24 02:27, David Laight wrote: > > > > > > ... > > > > > > > It would be worthwhile tracking this down since there are > > > > > > > lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) > > > > > > > when running the kernel in 64-bit mode. > > > > > > > > > > > > Hmmm.... > > > > > > For performance reasons you really don't want any of them. > > > > > > The misaligned 64bit fields need an __attribute((aligned(4)) marker. > > > > > > > > > > > > If the checksum code can do them it really needs to detect > > > > > > and handle the misalignment. > > > > > > > > > > > > The misaligned trap handler probably ought to contain a > > > > > > warn_on_once() to dump stack on the first such error. > > > > > > They can then be fixed one at a time. > > > > > > > > > > > > > > > > Unaligned LDD at unwind_once+0x4a8/0x5e0 > > > > > > > > > > Decoded: > > > > > > > > > > Unaligned LDD at unwind_once (arch/parisc/kernel/unwind.c:212 arch/parisc/kernel/unwind.c:243 arch/parisc/kernel/unwind.c:371 arch/parisc/kernel/unwind.c:445) > > > > > > > > > > Source: > > > > > > > > > > static bool pc_is_kernel_fn(unsigned long pc, void *fn) > > > > > { > > > > > return (unsigned long)dereference_kernel_function_descriptor(fn) == pc; > > > > This looks wrong to me. Function descriptors should always be 8-byte aligned. I think this > > > > routine should return false if fn isn't 8-byte aligned. > > > > > > Below you state "Code entry points only need 4-byte alignment." > > > > > > I think that contradicts each other. Also, the calling code is, > > > for example, > > > pc_is_kernel_fn(pc, syscall_exit) > > > > > > I fail to see how this can be consolidated if it is ok > > > that syscall_exit is 4-byte aligned but, at the same time, > > > must be 8-byte aligned to be considered to be a kernel function. > > In the above call, syscall_exit is treated as a function pointer. It points to an 8-byte aligned > > function descriptor. The descriptor holds the actual address of the function. It only needs > > 4-byte alignment. > > > > Descriptors need 8-byte alignment for efficiency on 64-bit parisc. The pc and gp are accessed > > using ldd instructions. > > > > How about the patch below ? > > Guenter > > --- > diff --git a/arch/parisc/kernel/unwind.c b/arch/parisc/kernel/unwind.c > index 27ae40a443b8..c2b9e23cbc0a 100644 > --- a/arch/parisc/kernel/unwind.c > +++ b/arch/parisc/kernel/unwind.c > @@ -214,24 +214,14 @@ static bool pc_is_kernel_fn(unsigned long pc, void *fn) > > static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int frame_size) > { > - /* > - * We have to use void * instead of a function pointer, because > - * function pointers aren't a pointer to the function on 64-bit. > - * Make them const so the compiler knows they live in .text > - * Note: We could use dereference_kernel_function_descriptor() > - * instead but we want to keep it simple here. 
> - */ > - extern void * const ret_from_kernel_thread; > - extern void * const syscall_exit; > - extern void * const intr_return; > - extern void * const _switch_to_ret; > + void (*ret_from_kernel_thread)(void); > + void (*syscall_exit)(void); > + void (*intr_return)(void); > + void (*_switch_to_ret)(void); > #ifdef CONFIG_IRQSTACKS > - extern void * const _call_on_stack; > + void (*_call_on_stack)(void); > #endif /* CONFIG_IRQSTACKS */ > - void *ptr; > - > - ptr = dereference_kernel_function_descriptor(&handle_interruption); > - if (pc_is_kernel_fn(pc, ptr)) { > + if (pc_is_kernel_fn(pc, handle_interruption)) { > struct pt_regs *regs = (struct pt_regs *)(info->sp - frame_size - PT_SZ_ALGN); > dbg("Unwinding through handle_interruption()\n"); > info->prev_sp = regs->gr[30]; > Seems like a promising direction. It feels like we have hit a point where we should "close" this thread and start potentially a couple new ones to correct the behavior of saving/restoring the PSW and this behavior with unwind. I don't know what the proper etiquette is for reverting back to a previous patch; should I send a v9 that is just the same as the v7? Thank you Guenter and John for looking into the parisc behavior! - Charlie
On 2/15/24 10:56, John David Anglin wrote: [ ... ] >> It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when >> CONFIG_64BIT is enabled") might have messed this up. No idea how to fix >> it properly, though. > This is Helge's code... I'll let him fix it. > I need the code anyway to be able to debug the other problem, so I'll give it a shot and send a patch. Helge can then go from there. Thanks, Guenter
On 2024-02-15 4:00 p.m., Guenter Roeck wrote:
> On 2/15/24 10:56, John David Anglin wrote:
> [ ... ]
>>> It looks like commit 8e0ba125c2bf ("parisc/unwind: fix unwinder when
>>> CONFIG_64BIT is enabled") might have messed this up. No idea how to fix
>>> it properly, though.
>> This is Helge's code... I'll let him fix it.
>>
>
> I need the code anyway to be able to debug the other problem, so I'll give it
> a shot and send a patch. Helge can then go from there.

Your patch looked like a good start.

Dave
On 2/15/24 11:42, Charlie Jenkins wrote:
[ ... ]
> Seems like a promising direction.
>
> It feels like we have reached a point where we should "close" this thread and
> potentially start a couple of new ones to correct the behavior of
> saving/restoring the PSW and the unwind behavior.
>

No need. Found it. It was a qemu problem. Kind of obvious, like hiding
in plain sight. I'll send a patch shortly.

> I don't know what the proper etiquette is for reverting back to a
> previous patch; should I send a v9 that is just the same as the v7?
>

Not sure if there is one. Just go ahead and send v9, and tell the
maintainer to yell at me.

Guenter
On Fri, Feb 16, 2024 at 06:54:55AM +0100, Helge Deller wrote:
>
> Can you please give a pointer to this test code?
> I'm happy to try it on real hardware.
>

See below.

Guenter

---
From 0478f35f02224994e1d81e614b66219ab7539f7f Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Wed, 14 Feb 2024 11:25:18 -0800
Subject: [PATCH] carry tests

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 lib/checksum_kunit.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 72c313ba4c78..8f7925396e53 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -546,12 +546,88 @@ static void test_csum_ipv6_magic(struct kunit *test)
 #endif /* !CONFIG_NET */
 }

+#ifdef CONFIG_64BIT
+
+static __inline__ int get_carry64(void *addr)
+{
+	int carry = 0;
+	unsigned long sum = 0xffffffff;
+	unsigned long tmp;
+
+	__asm__ __volatile__ (
+"	add %0, %0, %0\n"	/* clear carry */
+"	ldd 0(%2), %3\n"	/* load from memory */
+"	add %1, %3, %1\n"	/* optionally generate carry */
+"	ldd 0(%2), %3\n"	/* load from memory again */
+"	add,dc %0, %0, %0\n"	/* return carry */
+	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
+	: "0" (carry), "1" (sum), "2" (addr)
+	: "memory");
+
+	return carry;
+}
+
+static __inline__ int get_carry32(void *addr)
+{
+	int carry = 0;
+	unsigned int sum = 0xffffffff;
+	unsigned int tmp;
+
+	__asm__ __volatile__ (
+"	add %0, %0, %0\n"	/* clear carry */
+"	ldw 0(%2), %3\n"	/* load from memory */
+"	add %1, %3, %1\n"	/* optionally generate carry */
+"	ldw 0(%2), %3\n"	/* load from memory again */
+"	addc %0, %0, %0\n"	/* return carry */
+	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
+	: "0" (carry), "1" (sum), "2" (addr)
+	: "memory");
+
+	return carry;
+}
+
+static void test_bad_carry(struct kunit *test)
+{
+	int carry;
+
+	memset(tmp_buf, 0xff, sizeof(tmp_buf));
+	carry = get_carry64(&tmp_buf[0]);
+	pr_info("#### carry64 aligned, expect 1 -> %d\n", carry);
+	carry = get_carry64(&tmp_buf[4]);
+	pr_info("#### carry64 unaligned 4, expect 1 -> %d\n", carry);
+
+	carry = get_carry64(&tmp_buf[2]);
+	pr_info("#### carry64 unaligned 2, expect 1 -> %d\n", carry);
+
+	carry = get_carry32(&tmp_buf[0]);
+	pr_info("#### carry32 aligned, expect 1 -> %d\n", carry);
+	carry = get_carry32(&tmp_buf[2]);
+	pr_info("#### carry32 unaligned, expect 1 -> %d\n", carry);
+
+	memset(tmp_buf, 0, sizeof(tmp_buf));
+	carry = get_carry64(&tmp_buf[0]);
+	pr_info("#### carry64 aligned, expect 0 -> %d\n", carry);
+	carry = get_carry64(&tmp_buf[4]);
+	pr_info("#### carry64 unaligned 4, expect 0 -> %d\n", carry);
+	carry = get_carry64(&tmp_buf[2]);
+	pr_info("#### carry64 unaligned 2, expect 0 -> %d\n", carry);
+
+	carry = get_carry32(&tmp_buf[0]);
+	pr_info("#### carry32 aligned, expect 0 -> %d\n", carry);
+	carry = get_carry32(&tmp_buf[2]);
+	pr_info("#### carry32 unaligned, expect 0 -> %d\n", carry);
+}
+#else
+static void test_bad_carry(struct kunit *test) {}
+#endif /* CONFIG_64BIT */
+
 static struct kunit_case __refdata checksum_test_cases[] = {
 	KUNIT_CASE(test_csum_fixed_random_inputs),
 	KUNIT_CASE(test_csum_all_carry_inputs),
 	KUNIT_CASE(test_csum_no_carry_inputs),
 	KUNIT_CASE(test_ip_fast_csum),
 	KUNIT_CASE(test_csum_ipv6_magic),
+	KUNIT_CASE(test_bad_carry),
 	{}
 };
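A C-level restatement of what the inline asm above exercises may help. This is a loose model based on the asm comments, assuming a 64-bit unsigned long; it is not part of the posted patch:

/* Model of get_carry64(): the test only reports the expected 1/0 if
 * the carry produced by the first add survives the second (possibly
 * trapping) load and is still visible to the final add,dc. */
static int get_carry64_model(const void *addr)
{
	unsigned long sum = 0xffffffff;
	unsigned long tmp;
	int carry = 0;				/* "add %0, %0, %0" clears carry */

	memcpy(&tmp, addr, sizeof(tmp));	/* "ldd": may take the unaligned trap */
	carry = (sum + tmp) < sum;		/* "add": sets carry on overflow */
	memcpy(&tmp, addr, sizeof(tmp));	/* second "ldd": must not disturb carry */
	return carry;				/* "add,dc": returns the carry */
}

With the buffer memset to 0xff, tmp is ~0UL and the add overflows, so every case should print 1; with the buffer zeroed, no carry can occur and every case should print 0, which is exactly what the pr_info labels encode.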
On 2/15/24 02:58, Guenter Roeck wrote: > Hi Charlie, > > On 2/14/24 17:30, Charlie Jenkins wrote: >> On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote: >>> On 2/14/24 13:41, Charlie Jenkins wrote: >>>> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a >>>> variety of architectures that are big endian or do not support >>>> misalgined accesses. Both of these test cases are changed to support big >>>> and little endian architectures. >>>> >>>> The test for ip_fast_csum is changed to align the data along (14 + >>>> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for >>>> csum_ipv6_magic aligns the data using a struct. An extra padding field >>>> is added to the struct to ensure that the size of the struct is the same >>>> on all architectures (44 bytes). >>>> >>>> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and >>>> daddr. This would fail on parisc64 due to the following code snippet in >>>> arch/parisc/include/asm/checksum.h: >>>> >>>> add %4, %0, %0\n" >>>> ldd,ma 8(%1), %6\n" >>>> ldd,ma 8(%2), %7\n" >>>> add,dc %5, %0, %0\n" >>>> >>>> The second add is expecting carry flags from the first add. Normally, >>>> a double word load (ldd) does not modify the carry flags. However, >>>> because saddr and daddr may be misaligned, ldd triggers a misalignment >>>> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes >>>> many additional instructions to be executed between the two adds. This >>>> can be easily solved by adding the carry into %0 before executing the >>>> ldd. >>>> >>> >>> I really think this is a bug either in the trap handler or in the hppa64 >>> qemu emulation. Only unaligned ldd instructions affect (actually, >>> unconditionally set) the carry flag. That doesn't happen with unaligned >>> ldw instructions. It would be worthwhile tracking this down since there are >>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >>> when running the kernel in 64-bit mode. On the other side, I guess this >>> is a different problem. Not sure though if that should even be mentioned >>> here since that makes it sound as if it would be expected that such >>> accesses impact the carry flag. >> >> I wasn't confident it was a bug somewhere so that's why I sent this patch. >> >> However, I have just found the section of the processor manual [1] I was >> looking for (Section Privileged Software-Accessible Registers subsection >> Processor Status Word (PSW)): >> >> "Processor state is encoded in a 64-bit register called the Processor >> Status Word (PSW). When an interruption occurs, the current value of the >> PSW is saved in the Interruption Processor Status Word (IPSW) and >> usually all defined PSW bits are set to 0. >> >> "The PSW is set to the contents of the IPSW by the RETURN FROM >> INTERRUPTION instruction. The interruption handler may restore the >> original PSW, modify selected bits, or may change the PSW to an entirely >> new value." >> >> Stored in the PSW register are the "Carry/borrow bits". This confirms >> that the carry/borrow bits should be restored. The save is supposed to >> automatically happen upon an interrupt and restored by the RETURN FROM >> INTERRUPTION, thus this is a QEMU bug and not a Linux bug (please >> correct me if I am wrong). >> > > I know that much (I looked into the manual as well), I just really don't > know if this is a Linux bug or a QEMU bug, and I have not been able to > nail it down. I think someone with access to hardware will need to confirm. 
>
> Specifically: Yes, the carry/borrow bits should be restored. Question is
> if the Linux kernel's interrupt handler doesn't restore the carry bits
> or if the problem is on the qemu side.
>
>> This v8 was not needed after all, it seems. It would be best to stick
>> with the v7.
>>
> I tend to agree; after all, v7 exposes the problem, making it easier to
> determine if the problem can be reproduced on real hardware.
>
> FWIW, I wrote some test code which exposes the problem.

Can you please give a pointer to this test code?
I'm happy to try it on real hardware.

> It is quite easy to show that carry is always set after executing ldd
> on an unaligned address. That is also why I know for sure that the
> problem is not seen with ldw on unaligned addresses.

Interesting.

In general I think it's quite important to differentiate between
running on qemu and running on physical hardware.
Qemu just recently got 64-bit support, and it's not yet behaving
like real hardware. One thing I noticed is that real hardware
does not seem to jump into the exception handler twice, while
qemu does. So if you run into an exception (e.g. an unaligned ldd)
and a second exception then happens in the fault handler (e.g. a second
unaligned ldd to resolve a wrongly-coded code lookup), you will
get different behaviour between hardware and emulation.
This is also the reason why qemu still fails to emulate newer
64-bit Linux kernels which use kernel modules.

Helge
On 2/15/24 21:54, Helge Deller wrote:
[ ... ]
>
> Can you please give a pointer to this test code?
> I'm happy to try it on real hardware.
>

You should also see the problem if you use v7 of Charlie's checksum
unit test fixes.

I submitted the qemu fix (or at least what I think the fix should be)
a couple of minutes ago.

https://patchwork.kernel.org/project/qemu-devel/patch/20240216053415.2163286-1-linux@roeck-us.net/

>> It is quite easy to show that carry is always set after executing ldd
>> on an unaligned address. That is also why I know for sure that the
>> problem is not seen with ldw on unaligned addresses.
> Interesting.

Ultimately it wasn't surprising, given the unusual carry bit
implementation on hppa. The upper 8 carry bits were not masked
correctly when returning from a trap or interrupt.

> In general I think it's quite important to differentiate between
> running on qemu and running on physical hardware.

I know; that always makes testing tricky (not just with this
architecture) because it is often not obvious whether the problem
is in the tested code or in the emulation.

> Qemu just recently got 64-bit support, and it's not yet behaving
> like real hardware. One thing I noticed is that real hardware
> does not seem to jump into the exception handler twice, while
> qemu does. So if you run into an exception (e.g. an unaligned ldd)
> and a second exception then happens in the fault handler (e.g. a second
> unaligned ldd to resolve a wrongly-coded code lookup), you will
> get different behaviour between hardware and emulation.

Hmm, interesting. Makes me wonder how the real hardware handles such
double traps.

> This is also the reason why qemu still fails to emulate newer
> 64-bit Linux kernels which use kernel modules.
>

I don't use modules in my testing, so I'll leave that alone for
another day.

Cheers,
Guenter
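Guenter's one-line diagnosis can be restated as a sketch. The constants and helpers below are my assumptions about the PSW layout (as I understand it, PA-RISC keeps one carry/borrow bit per 4-bit digit, so 64-bit mode has 16 C/B bits rather than 8); the actual change is in the qemu patch linked above, not this code:

/* Hedged sketch of the failure mode: if the return from a trap or
 * interrupt restores only the low half of the PSW carry/borrow field,
 * stale upper bits leak through, consistent with the observation that
 * carry appeared unconditionally set after an unaligned ldd. */
#define PSW_CB_LOW	0x00ffULL	/* low 8 carry/borrow bits */
#define PSW_CB_ALL	0xffffULL	/* all 16 C/B bits in 64-bit mode */

static unsigned long rfi_psw_buggy(unsigned long psw, unsigned long ipsw)
{
	return (psw & ~PSW_CB_LOW) | (ipsw & PSW_CB_LOW);	/* upper bits leak */
}

static unsigned long rfi_psw_fixed(unsigned long psw, unsigned long ipsw)
{
	return (psw & ~PSW_CB_ALL) | (ipsw & PSW_CB_ALL);	/* full C/B restore */
}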
On Thu, Feb 15, 2024 at 10:09:42PM -0800, Guenter Roeck wrote:
> On 2/15/24 21:54, Helge Deller wrote:
> [ ... ]
>>
>> Can you please give a pointer to this test code?
>> I'm happy to try it on real hardware.
>>
> You should also see the problem if you use v7 of Charlie's checksum
> unit test fixes.
>
> I submitted the qemu fix (or at least what I think the fix should be)
> a couple of minutes ago.
>
> https://patchwork.kernel.org/project/qemu-devel/patch/20240216053415.2163286-1-linux@roeck-us.net/
>
>>> It is quite easy to show that carry is always set after executing ldd
>>> on an unaligned address. That is also why I know for sure that the
>>> problem is not seen with ldw on unaligned addresses.
>> Interesting.
>
> Ultimately it wasn't surprising, given the unusual carry bit
> implementation on hppa. The upper 8 carry bits were not masked
> correctly when returning from a trap or interrupt.

Tangential question, but why does Linux need to save and restore the PSW
if that is already handled by the hardware? I must be missing something.

- Charlie

>> In general I think it's quite important to differentiate between
>> running on qemu and running on physical hardware.
>
> I know; that always makes testing tricky (not just with this
> architecture) because it is often not obvious whether the problem
> is in the tested code or in the emulation.
>
>> Qemu just recently got 64-bit support, and it's not yet behaving
>> like real hardware. One thing I noticed is that real hardware
>> does not seem to jump into the exception handler twice, while
>> qemu does. So if you run into an exception (e.g. an unaligned ldd)
>> and a second exception then happens in the fault handler (e.g. a second
>> unaligned ldd to resolve a wrongly-coded code lookup), you will
>> get different behaviour between hardware and emulation.
>
> Hmm, interesting. Makes me wonder how the real hardware handles such
> double traps.
>
>> This is also the reason why qemu still fails to emulate newer
>> 64-bit Linux kernels which use kernel modules.
>>
> I don't use modules in my testing, so I'll leave that alone for
> another day.
>
> Cheers,
> Guenter
>
Hi Helge,

On 2/15/24 23:31, Helge Deller wrote:
> On 2/16/24 06:25, Guenter Roeck wrote:
>> On Fri, Feb 16, 2024 at 06:54:55AM +0100, Helge Deller wrote:
>>>
>>> Can you please give a pointer to this test code?
>>> I'm happy to try it on real hardware.
>>>
>> See below.
>
> Testcase runs OK on physical machine:
>
> #### carry64 aligned, expect 1 -> 1
> #### carry64 unaligned 4, expect 1 -> 1
> #### carry64 unaligned 2, expect 1 -> 1
> #### carry32 aligned, expect 1 -> 1
> #### carry32 unaligned, expect 1 -> 1
> #### carry64 aligned, expect 0 -> 0
> #### carry64 unaligned 4, expect 0 -> 0
> #### carry64 unaligned 2, expect 0 -> 0
> #### carry32 aligned, expect 0 -> 0
> #### carry32 unaligned, expect 0 -> 0
> ok 6 test_bad_carry
>

Thanks a lot for the confirmation.

Guenter
On 2/16/24 06:25, Guenter Roeck wrote:
> On Fri, Feb 16, 2024 at 06:54:55AM +0100, Helge Deller wrote:
>>
>> Can you please give a pointer to this test code?
>> I'm happy to try it on real hardware.
>>
> See below.

Testcase runs OK on physical machine:

#### carry64 aligned, expect 1 -> 1
#### carry64 unaligned 4, expect 1 -> 1
#### carry64 unaligned 2, expect 1 -> 1
#### carry32 aligned, expect 1 -> 1
#### carry32 unaligned, expect 1 -> 1
#### carry64 aligned, expect 0 -> 0
#### carry64 unaligned 4, expect 0 -> 0
#### carry64 unaligned 2, expect 0 -> 0
#### carry32 aligned, expect 0 -> 0
#### carry32 unaligned, expect 0 -> 0
ok 6 test_bad_carry

Helge

> ---
> From 0478f35f02224994e1d81e614b66219ab7539f7f Mon Sep 17 00:00:00 2001
> From: Guenter Roeck <linux@roeck-us.net>
> Date: Wed, 14 Feb 2024 11:25:18 -0800
> Subject: [PATCH] carry tests
>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  lib/checksum_kunit.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
>
> diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
> index 72c313ba4c78..8f7925396e53 100644
> --- a/lib/checksum_kunit.c
> +++ b/lib/checksum_kunit.c
> @@ -546,12 +546,88 @@ static void test_csum_ipv6_magic(struct kunit *test)
>  #endif /* !CONFIG_NET */
>  }
>
> +#ifdef CONFIG_64BIT
> +
> +static __inline__ int get_carry64(void *addr)
> +{
> +	int carry = 0;
> +	unsigned long sum = 0xffffffff;
> +	unsigned long tmp;
> +
> +	__asm__ __volatile__ (
> +"	add %0, %0, %0\n"	/* clear carry */
> +"	ldd 0(%2), %3\n"	/* load from memory */
> +"	add %1, %3, %1\n"	/* optionally generate carry */
> +"	ldd 0(%2), %3\n"	/* load from memory again */
> +"	add,dc %0, %0, %0\n"	/* return carry */
> +	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
> +	: "0" (carry), "1" (sum), "2" (addr)
> +	: "memory");
> +
> +	return carry;
> +}
> +
> +static __inline__ int get_carry32(void *addr)
> +{
> +	int carry = 0;
> +	unsigned int sum = 0xffffffff;
> +	unsigned int tmp;
> +
> +	__asm__ __volatile__ (
> +"	add %0, %0, %0\n"	/* clear carry */
> +"	ldw 0(%2), %3\n"	/* load from memory */
> +"	add %1, %3, %1\n"	/* optionally generate carry */
> +"	ldw 0(%2), %3\n"	/* load from memory again */
> +"	addc %0, %0, %0\n"	/* return carry */
> +	: "=r" (carry), "=r" (sum), "=r" (addr), "=r" (tmp)
> +	: "0" (carry), "1" (sum), "2" (addr)
> +	: "memory");
> +
> +	return carry;
> +}
> +
> +static void test_bad_carry(struct kunit *test)
> +{
> +	int carry;
> +
> +	memset(tmp_buf, 0xff, sizeof(tmp_buf));
> +	carry = get_carry64(&tmp_buf[0]);
> +	pr_info("#### carry64 aligned, expect 1 -> %d\n", carry);
> +	carry = get_carry64(&tmp_buf[4]);
> +	pr_info("#### carry64 unaligned 4, expect 1 -> %d\n", carry);
> +
> +	carry = get_carry64(&tmp_buf[2]);
> +	pr_info("#### carry64 unaligned 2, expect 1 -> %d\n", carry);
> +
> +	carry = get_carry32(&tmp_buf[0]);
> +	pr_info("#### carry32 aligned, expect 1 -> %d\n", carry);
> +	carry = get_carry32(&tmp_buf[2]);
> +	pr_info("#### carry32 unaligned, expect 1 -> %d\n", carry);
> +
> +	memset(tmp_buf, 0, sizeof(tmp_buf));
> +	carry = get_carry64(&tmp_buf[0]);
> +	pr_info("#### carry64 aligned, expect 0 -> %d\n", carry);
> +	carry = get_carry64(&tmp_buf[4]);
> +	pr_info("#### carry64 unaligned 4, expect 0 -> %d\n", carry);
> +	carry = get_carry64(&tmp_buf[2]);
> +	pr_info("#### carry64 unaligned 2, expect 0 -> %d\n", carry);
> +
> +	carry = get_carry32(&tmp_buf[0]);
> +	pr_info("#### carry32 aligned, expect 0 -> %d\n", carry);
> +	carry = get_carry32(&tmp_buf[2]);
> +	pr_info("#### carry32 unaligned, expect 0 -> %d\n", carry);
> +}
> +#else
> +static void test_bad_carry(struct kunit *test) {}
> +#endif /* CONFIG_64BIT */
> +
>  static struct kunit_case __refdata checksum_test_cases[] = {
>  	KUNIT_CASE(test_csum_fixed_random_inputs),
>  	KUNIT_CASE(test_csum_all_carry_inputs),
>  	KUNIT_CASE(test_csum_no_carry_inputs),
>  	KUNIT_CASE(test_ip_fast_csum),
>  	KUNIT_CASE(test_csum_ipv6_magic),
> +	KUNIT_CASE(test_bad_carry),
>  	{}
>  };
>
On 2/16/24 07:13, Charlie Jenkins wrote:
> On Thu, Feb 15, 2024 at 10:09:42PM -0800, Guenter Roeck wrote:
>> On 2/15/24 21:54, Helge Deller wrote:
>> [ ... ]
>>>
>>> Can you please give a pointer to this test code?
>>> I'm happy to try it on real hardware.
>>>
>> You should also see the problem if you use v7 of Charlie's checksum
>> unit test fixes.
>>
>> I submitted the qemu fix (or at least what I think the fix should be)
>> a couple of minutes ago.
>>
>> https://patchwork.kernel.org/project/qemu-devel/patch/20240216053415.2163286-1-linux@roeck-us.net/
>>
>>>> It is quite easy to show that carry is always set after executing ldd
>>>> on an unaligned address. That is also why I know for sure that the
>>>> problem is not seen with ldw on unaligned addresses.
>>> Interesting.
>>
>> Ultimately it wasn't surprising, given the unusual carry bit
>> implementation on hppa. The upper 8 carry bits were not masked
>> correctly when returning from a trap or interrupt.
>
> Tangential question, but why does Linux need to save and restore the PSW
> if that is already handled by the hardware? I must be missing something.

e.g. for task switching.

Helge
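Helge's one-liner deserves a word of unpacking. The sketch below is purely conceptual with assumed names (on parisc the saved state actually lives in pt_regs): the hardware only parks the PSW in IPSW for the duration of a single interruption, so the kernel must copy it into per-task storage if another task may run before the return:

/* Conceptual sketch: each task keeps its own saved PSW (carry/borrow
 * bits included) so that switching away and back restores the right
 * flags; the single hardware IPSW cannot hold more than one value. */
struct task_ctx_sketch {
	unsigned long saved_psw;	/* assumed field name */
};

static void switch_psw_sketch(struct task_ctx_sketch *prev,
			      struct task_ctx_sketch *next,
			      unsigned long *ipsw)
{
	prev->saved_psw = *ipsw;	/* preserve the outgoing task's PSW */
	*ipsw = next->saved_psw;	/* the eventual rfi restores this one */
}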
On 2024-02-16 12:54 a.m., Helge Deller wrote: > On 2/15/24 02:58, Guenter Roeck wrote: >> Hi Charlie, >> >> On 2/14/24 17:30, Charlie Jenkins wrote: >>> On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote: >>>> On 2/14/24 13:41, Charlie Jenkins wrote: >>>>> The test cases for ip_fast_csum and csum_ipv6_magic were failing on a >>>>> variety of architectures that are big endian or do not support >>>>> misalgined accesses. Both of these test cases are changed to support big >>>>> and little endian architectures. >>>>> >>>>> The test for ip_fast_csum is changed to align the data along (14 + >>>>> NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for >>>>> csum_ipv6_magic aligns the data using a struct. An extra padding field >>>>> is added to the struct to ensure that the size of the struct is the same >>>>> on all architectures (44 bytes). >>>>> >>>>> The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and >>>>> daddr. This would fail on parisc64 due to the following code snippet in >>>>> arch/parisc/include/asm/checksum.h: >>>>> >>>>> add %4, %0, %0\n" >>>>> ldd,ma 8(%1), %6\n" >>>>> ldd,ma 8(%2), %7\n" >>>>> add,dc %5, %0, %0\n" >>>>> >>>>> The second add is expecting carry flags from the first add. Normally, >>>>> a double word load (ldd) does not modify the carry flags. However, >>>>> because saddr and daddr may be misaligned, ldd triggers a misalignment >>>>> trap that gets handled in arch/parisc/kernel/unaligned.c. This causes >>>>> many additional instructions to be executed between the two adds. This >>>>> can be easily solved by adding the carry into %0 before executing the >>>>> ldd. >>>>> >>>> >>>> I really think this is a bug either in the trap handler or in the hppa64 >>>> qemu emulation. Only unaligned ldd instructions affect (actually, >>>> unconditionally set) the carry flag. That doesn't happen with unaligned >>>> ldw instructions. It would be worthwhile tracking this down since there are >>>> lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses) >>>> when running the kernel in 64-bit mode. On the other side, I guess this >>>> is a different problem. Not sure though if that should even be mentioned >>>> here since that makes it sound as if it would be expected that such >>>> accesses impact the carry flag. >>> >>> I wasn't confident it was a bug somewhere so that's why I sent this patch. >>> >>> However, I have just found the section of the processor manual [1] I was >>> looking for (Section Privileged Software-Accessible Registers subsection >>> Processor Status Word (PSW)): >>> >>> "Processor state is encoded in a 64-bit register called the Processor >>> Status Word (PSW). When an interruption occurs, the current value of the >>> PSW is saved in the Interruption Processor Status Word (IPSW) and >>> usually all defined PSW bits are set to 0. >>> >>> "The PSW is set to the contents of the IPSW by the RETURN FROM >>> INTERRUPTION instruction. The interruption handler may restore the >>> original PSW, modify selected bits, or may change the PSW to an entirely >>> new value." >>> >>> Stored in the PSW register are the "Carry/borrow bits". This confirms >>> that the carry/borrow bits should be restored. The save is supposed to >>> automatically happen upon an interrupt and restored by the RETURN FROM >>> INTERRUPTION, thus this is a QEMU bug and not a Linux bug (please >>> correct me if I am wrong). 
>>>
>> I know that much (I looked into the manual as well), I just really don't
>> know if this is a Linux bug or a QEMU bug, and I have not been able to
>> nail it down. I think someone with access to hardware will need to confirm.
>>
>> Specifically: Yes, the carry/borrow bits should be restored. Question is
>> if the Linux kernel's interrupt handler doesn't restore the carry bits
>> or if the problem is on the qemu side.
>>
>>> This v8 was not needed after all, it seems. It would be best to stick
>>> with the v7.
>>>
>> I tend to agree; after all, v7 exposes the problem, making it easier to
>> determine if the problem can be reproduced on real hardware.
>>
>> FWIW, I wrote some test code which exposes the problem.
>
> Can you please give a pointer to this test code?
> I'm happy to try it on real hardware.
>
>> It is quite easy to show that carry is always set after executing ldd
>> on an unaligned address. That is also why I know for sure that the
>> problem is not seen with ldw on unaligned addresses.
> Interesting.
> In general I think it's quite important to differentiate between
> running on qemu and running on physical hardware.
> Qemu just recently got 64-bit support, and it's not yet behaving
> like real hardware. One thing I noticed is that real hardware
> does not seem to jump into the exception handler twice, while
> qemu does. So if you run into an exception (e.g. an unaligned ldd)

See page 2-13 of the architecture manual. An interruption sets the PSW Q bit
to 0, freezing the IIA queues. If an interruption occurs with Q=0, the
interruption parameter registers are left unchanged, so I don't think there's
a way to handle double exceptions on real hardware (Q would have to be set
back to 1 after the IPRs are read).

Dave
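Dave's description of the interruption state collection can be modeled loosely as follows. This is a sketch of the architectural behavior he cites, with invented names; see the referenced manual page for the real definition:

/* PSW Q bit semantics per the description above: Q=1 means the CPU may
 * collect interruption state; taking an interruption clears Q, and any
 * further interruption while Q=0 leaves the IPRs untouched. */
struct pa_cpu_sketch {
	int psw_q;			/* interruption state collection enable */
	unsigned long iir, ior, isr;	/* interruption parameter registers */
};

static void take_interruption_sketch(struct pa_cpu_sketch *cpu,
				     unsigned long iir, unsigned long ior,
				     unsigned long isr)
{
	if (cpu->psw_q) {
		cpu->iir = iir;		/* record the faulting instruction etc. */
		cpu->ior = ior;
		cpu->isr = isr;
		cpu->psw_q = 0;		/* freeze the IIA queues and the IPRs */
	}
	/*
	 * With Q=0 the IPRs are left unchanged, so a second (double)
	 * exception taken before software reads the IPRs and sets Q
	 * back to 1 cannot be described to the handler.
	 */
}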
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c index 776ad3d6d5a1..f0eb7005ff27 100644 --- a/lib/checksum_kunit.c +++ b/lib/checksum_kunit.c @@ -13,8 +13,9 @@ #define IPv4_MIN_WORDS 5 #define IPv4_MAX_WORDS 15 -#define NUM_IPv6_TESTS 200 -#define NUM_IP_FAST_CSUM_TESTS 181 +#define WORD_ALIGNMENT 4 +/* Ethernet headers are 14 bytes and NET_IP_ALIGN is used to align them */ +#define IP_ALIGNMENT (14 + NET_IP_ALIGN) /* Values for a little endian CPU. Byte swap each half on big endian CPU. */ static const u32 random_init_sum = 0x2847aab; @@ -216,234 +217,106 @@ static const u32 init_sums_no_overflow[] = { }; static const u16 expected_csum_ipv6_magic[] = { - 0x18d4, 0x3085, 0x2e4b, 0xd9f4, 0xbdc8, 0x78f, 0x1034, 0x8422, 0x6fc0, - 0xd2f6, 0xbeb5, 0x9d3, 0x7e2a, 0x312e, 0x778e, 0xc1bb, 0x7cf2, 0x9d1e, - 0xca21, 0xf3ff, 0x7569, 0xb02e, 0xca86, 0x7e76, 0x4539, 0x45e3, 0xf28d, - 0xdf81, 0x8fd5, 0x3b5d, 0x8324, 0xf471, 0x83be, 0x1daf, 0x8c46, 0xe682, - 0xd1fb, 0x6b2e, 0xe687, 0x2a33, 0x4833, 0x2d67, 0x660f, 0x2e79, 0xd65e, - 0x6b62, 0x6672, 0x5dbd, 0x8680, 0xbaa5, 0x2229, 0x2125, 0x2d01, 0x1cc0, - 0x6d36, 0x33c0, 0xee36, 0xd832, 0x9820, 0x8a31, 0x53c5, 0x2e2, 0xdb0e, - 0x49ed, 0x17a7, 0x77a0, 0xd72e, 0x3d72, 0x7dc8, 0x5b17, 0xf55d, 0xa4d9, - 0x1446, 0x5d56, 0x6b2e, 0x69a5, 0xadb6, 0xff2a, 0x92e, 0xe044, 0x3402, - 0xbb60, 0xec7f, 0xe7e6, 0x1986, 0x32f4, 0x8f8, 0x5e00, 0x47c6, 0x3059, - 0x3969, 0xe957, 0x4388, 0x2854, 0x3334, 0xea71, 0xa6de, 0x33f9, 0x83fc, - 0x37b4, 0x5531, 0x3404, 0x1010, 0xed30, 0x610a, 0xc95, 0x9aed, 0x6ff, - 0x5136, 0x2741, 0x660e, 0x8b80, 0xf71, 0xa263, 0x88af, 0x7a73, 0x3c37, - 0x1908, 0x6db5, 0x2e92, 0x1cd2, 0x70c8, 0xee16, 0xe80, 0xcd55, 0x6e6, - 0x6434, 0x127, 0x655d, 0x2ea0, 0xb4f4, 0xdc20, 0x5671, 0xe462, 0xe52b, - 0xdb44, 0x3589, 0xc48f, 0xe60b, 0xd2d2, 0x66ad, 0x498, 0x436, 0xb917, - 0xf0ca, 0x1a6e, 0x1cb7, 0xbf61, 0x2870, 0xc7e8, 0x5b30, 0xe4a5, 0x168, - 0xadfc, 0xd035, 0xe690, 0xe283, 0xfb27, 0xe4ad, 0xb1a5, 0xf2d5, 0xc4b6, - 0x8a30, 0xd7d5, 0x7df9, 0x91d5, 0x63ed, 0x2d21, 0x312b, 0xab19, 0xa632, - 0x8d2e, 0xef06, 0x57b9, 0xc373, 0xbd1f, 0xa41f, 0x8444, 0x9975, 0x90cb, - 0xc49c, 0xe965, 0x4eff, 0x5a, 0xef6d, 0xe81a, 0xe260, 0x853a, 0xff7a, - 0x99aa, 0xb06b, 0xee19, 0xcc2c, 0xf34c, 0x7c49, 0xdac3, 0xa71e, 0xc988, - 0x3845, 0x1014 + 0x2fbd, 0xb5d0, 0xc16e, 0xe6c1, 0x412e, 0xa836, 0x433b, 0xe87c, 0x57ea, + 0x5875, 0x5a21, 0xd361, 0xe422, 0xdd50, 0xa57f, 0xad6b, 0xecd1, 0x90b5, + 0x7fda, 0x88db, 0x979e, 0x8916, 0x1df0, 0x7853, 0x473e, 0xd2b3, 0x4526, + 0x8304, 0x4c34, 0x363d, 0x2dc1, 0xb66c, 0x9a28, 0xd4f2, 0x615d, 0x36dd, + 0x3784, 0xbadd, 0xa2c6, 0xd293, 0x9830, 0xcea8, 0x1349, 0x886d, 0x20a3, + 0x6001, 0x5607, 0x4a30, 0x9365, 0x893c, 0x7505, 0x50a1, 0xb200, 0xf3ad, + 0x80bf, 0x1f48, 0xc1d9, 0xf55d, 0x7871, 0x262a, 0xe606, 0xb894, 0xd6cd, + 0x39ed, 0x1817, 0x9d20, 0x6f93, 0x722d, 0x1116, 0xb3b4, 0x88ec, 0xdcb5, + 0x1a46, 0xfa1d, 0x141e, 0xaef7, 0x7cee, 0xa583, 0x72ad, 0x201b, 0xdece, + 0xa1ee, 0x92bd, 0xc7ba, 0x403e, 0x50a9, 0xcb5a, 0xff3b, 0x1b41, 0xa06b, + 0xcf2d, 0xcc9a, 0xf42a, 0x0c61, 0x7654, 0xf3d4, 0xcc25, 0x4985, 0x7606, + 0xc8a2, 0x6636, 0x610e, 0xc454, 0xefa4, 0xf3a6, 0x43a7, 0xd8e2, 0xe31e, + 0x150b, 0x445d, 0x57d5, 0x253c, 0x6adf, 0x3c53, 0x502c, 0x9ee5, 0x8422, }; static const u16 expected_fast_csum[] = { - 0xda83, 0x45da, 0x4f46, 0x4e4f, 0x34e, 0xe902, 0xa5e9, 0x87a5, 0x7187, - 0x5671, 0xf556, 0x6df5, 0x816d, 0x8f81, 0xbb8f, 0xfbba, 0x5afb, 0xbe5a, - 0xedbe, 0xabee, 0x6aac, 0xe6b, 0xea0d, 0x67ea, 0x7e68, 0x8a7e, 0x6f8a, - 0x3a70, 0x9f3a, 0xe89e, 
0x75e8, 0x7976, 0xfa79, 0x2cfa, 0x3c2c, 0x463c, - 0x7146, 0x7a71, 0x547a, 0xfd53, 0x99fc, 0xb699, 0x92b6, 0xdb91, 0xe8da, - 0x5fe9, 0x1e60, 0xae1d, 0x39ae, 0xf439, 0xa1f4, 0xdda1, 0xede, 0x790f, - 0x579, 0x1206, 0x9012, 0x2490, 0xd224, 0x5cd2, 0xa65d, 0xca7, 0x220d, - 0xf922, 0xbf9, 0x920b, 0x1b92, 0x361c, 0x2e36, 0x4d2e, 0x24d, 0x2, - 0xcfff, 0x90cf, 0xa591, 0x93a5, 0x7993, 0x9579, 0xc894, 0x50c8, 0x5f50, - 0xd55e, 0xcad5, 0xf3c9, 0x8f4, 0x4409, 0x5043, 0x5b50, 0x55b, 0x2205, - 0x1e22, 0x801e, 0x3780, 0xe137, 0x7ee0, 0xf67d, 0x3cf6, 0xa53c, 0x2ea5, - 0x472e, 0x5147, 0xcf51, 0x1bcf, 0x951c, 0x1e95, 0xc71e, 0xe4c7, 0xc3e4, - 0x3dc3, 0xee3d, 0xa4ed, 0xf9a4, 0xcbf8, 0x75cb, 0xb375, 0x50b4, 0x3551, - 0xf835, 0x19f8, 0x8c1a, 0x538c, 0xad52, 0xa3ac, 0xb0a3, 0x5cb0, 0x6c5c, - 0x5b6c, 0xc05a, 0x92c0, 0x4792, 0xbe47, 0x53be, 0x1554, 0x5715, 0x4b57, - 0xe54a, 0x20e5, 0x21, 0xd500, 0xa1d4, 0xa8a1, 0x57a9, 0xca57, 0x5ca, - 0x1c06, 0x4f1c, 0xe24e, 0xd9e2, 0xf0d9, 0x4af1, 0x474b, 0x8146, 0xe81, - 0xfd0e, 0x84fd, 0x7c85, 0xba7c, 0x17ba, 0x4a17, 0x964a, 0xf595, 0xff5, - 0x5310, 0x3253, 0x6432, 0x4263, 0x2242, 0xe121, 0x32e1, 0xf632, 0xc5f5, - 0x21c6, 0x7d22, 0x8e7c, 0x418e, 0x5641, 0x3156, 0x7c31, 0x737c, 0x373, - 0x2503, 0xc22a, 0x3c2, 0x4a04, 0x8549, 0x5285, 0xa352, 0xe8a3, 0x6fe8, - 0x1a6f, 0x211a, 0xe021, 0x38e0, 0x7638, 0xf575, 0x9df5, 0x169e, 0xf116, - 0x23f1, 0xcd23, 0xece, 0x660f, 0x4866, 0x6a48, 0x716a, 0xee71, 0xa2ee, - 0xb8a2, 0x61b9, 0xa361, 0xf7a2, 0x26f7, 0x1127, 0x6611, 0xe065, 0x36e0, - 0x1837, 0x3018, 0x1c30, 0x721b, 0x3e71, 0xe43d, 0x99e4, 0x9e9a, 0xb79d, - 0xa9b7, 0xcaa, 0xeb0c, 0x4eb, 0x1305, 0x8813, 0xb687, 0xa9b6, 0xfba9, - 0xd7fb, 0xccd8, 0x2ecd, 0x652f, 0xae65, 0x3fae, 0x3a40, 0x563a, 0x7556, - 0x2776, 0x1228, 0xef12, 0xf9ee, 0xcef9, 0x56cf, 0xa956, 0x24a9, 0xba24, - 0x5fba, 0x665f, 0xf465, 0x8ff4, 0x6d8f, 0x346d, 0x5f34, 0x385f, 0xd137, - 0xb8d0, 0xacb8, 0x55ac, 0x7455, 0xe874, 0x89e8, 0xd189, 0xa0d1, 0xb2a0, - 0xb8b2, 0x36b8, 0x5636, 0xd355, 0x8d3, 0x1908, 0x2118, 0xc21, 0x990c, - 0x8b99, 0x158c, 0x7815, 0x9e78, 0x6f9e, 0x4470, 0x1d44, 0x341d, 0x2634, - 0x3f26, 0x793e, 0xc79, 0xcc0b, 0x26cc, 0xd126, 0x1fd1, 0xb41f, 0xb6b4, - 0x22b7, 0xa122, 0xa1, 0x7f01, 0x837e, 0x3b83, 0xaf3b, 0x6fae, 0x916f, - 0xb490, 0xffb3, 0xceff, 0x50cf, 0x7550, 0x7275, 0x1272, 0x2613, 0xaa26, - 0xd5aa, 0x7d5, 0x9607, 0x96, 0xb100, 0xf8b0, 0x4bf8, 0xdd4c, 0xeddd, - 0x98ed, 0x2599, 0x9325, 0xeb92, 0x8feb, 0xcc8f, 0x2acd, 0x392b, 0x3b39, - 0xcb3b, 0x6acb, 0xd46a, 0xb8d4, 0x6ab8, 0x106a, 0x2f10, 0x892f, 0x789, - 0xc806, 0x45c8, 0x7445, 0x3c74, 0x3a3c, 0xcf39, 0xd7ce, 0x58d8, 0x6e58, - 0x336e, 0x1034, 0xee10, 0xe9ed, 0xc2e9, 0x3fc2, 0xd53e, 0xd2d4, 0xead2, - 0x8fea, 0x2190, 0x1162, 0xbe11, 0x8cbe, 0x6d8c, 0xfb6c, 0x6dfb, 0xd36e, - 0x3ad3, 0xf3a, 0x870e, 0xc287, 0x53c3, 0xc54, 0x5b0c, 0x7d5a, 0x797d, - 0xec79, 0x5dec, 0x4d5e, 0x184e, 0xd618, 0x60d6, 0xb360, 0x98b3, 0xf298, - 0xb1f2, 0x69b1, 0xf969, 0xef9, 0xab0e, 0x21ab, 0xe321, 0x24e3, 0x8224, - 0x5481, 0x5954, 0x7a59, 0xff7a, 0x7dff, 0x1a7d, 0xa51a, 0x46a5, 0x6b47, - 0xe6b, 0x830e, 0xa083, 0xff9f, 0xd0ff, 0xffd0, 0xe6ff, 0x7de7, 0xc67d, - 0xd0c6, 0x61d1, 0x3a62, 0xc3b, 0x150c, 0x1715, 0x4517, 0x5345, 0x3954, - 0xdd39, 0xdadd, 0x32db, 0x6a33, 0xd169, 0x86d1, 0xb687, 0x3fb6, 0x883f, - 0xa487, 0x39a4, 0x2139, 0xbe20, 0xffbe, 0xedfe, 0x8ded, 0x368e, 0xc335, - 0x51c3, 0x9851, 0xf297, 0xd6f2, 0xb9d6, 0x95ba, 0x2096, 0xea1f, 0x76e9, - 0x4e76, 0xe04d, 0xd0df, 0x80d0, 0xa280, 0xfca2, 0x75fc, 0xef75, 0x32ef, - 0x6833, 0xdf68, 0xc4df, 0x76c4, 0xb77, 0xb10a, 0xbfb1, 
0x58bf, 0x5258, - 0x4d52, 0x6c4d, 0x7e6c, 0xb67e, 0xccb5, 0x8ccc, 0xbe8c, 0xc8bd, 0x9ac8, - 0xa99b, 0x52a9, 0x2f53, 0xc30, 0x3e0c, 0xb83d, 0x83b7, 0x5383, 0x7e53, - 0x4f7e, 0xe24e, 0xb3e1, 0x8db3, 0x618e, 0xc861, 0xfcc8, 0x34fc, 0x9b35, - 0xaa9b, 0xb1aa, 0x5eb1, 0x395e, 0x8639, 0xd486, 0x8bd4, 0x558b, 0x2156, - 0xf721, 0x4ef6, 0x14f, 0x7301, 0xdd72, 0x49de, 0x894a, 0x9889, 0x8898, - 0x7788, 0x7b77, 0x637b, 0xb963, 0xabb9, 0x7cab, 0xc87b, 0x21c8, 0xcb21, - 0xdfca, 0xbfdf, 0xf2bf, 0x6af2, 0x626b, 0xb261, 0x3cb2, 0xc63c, 0xc9c6, - 0xc9c9, 0xb4c9, 0xf9b4, 0x91f9, 0x4091, 0x3a40, 0xcc39, 0xd1cb, 0x7ed1, - 0x537f, 0x6753, 0xa167, 0xba49, 0x88ba, 0x7789, 0x3877, 0xf037, 0xd3ef, - 0xb5d4, 0x55b6, 0xa555, 0xeca4, 0xa1ec, 0xb6a2, 0x7b7, 0x9507, 0xfd94, - 0x82fd, 0x5c83, 0x765c, 0x9676, 0x3f97, 0xda3f, 0x6fda, 0x646f, 0x3064, - 0x5e30, 0x655e, 0x6465, 0xcb64, 0xcdca, 0x4ccd, 0x3f4c, 0x243f, 0x6f24, - 0x656f, 0x6065, 0x3560, 0x3b36, 0xac3b, 0x4aac, 0x714a, 0x7e71, 0xda7e, - 0x7fda, 0xda7f, 0x6fda, 0xff6f, 0xc6ff, 0xedc6, 0xd4ed, 0x70d5, 0xeb70, - 0xa3eb, 0x80a3, 0xca80, 0x3fcb, 0x2540, 0xf825, 0x7ef8, 0xf87e, 0x73f8, - 0xb474, 0xb4b4, 0x92b5, 0x9293, 0x93, 0x3500, 0x7134, 0x9071, 0xfa8f, - 0x51fa, 0x1452, 0xba13, 0x7ab9, 0x957a, 0x8a95, 0x6e8a, 0x6d6e, 0x7c6d, - 0x447c, 0x9744, 0x4597, 0x8945, 0xef88, 0x8fee, 0x3190, 0x4831, 0x8447, - 0xa183, 0x1da1, 0xd41d, 0x2dd4, 0x4f2e, 0xc94e, 0xcbc9, 0xc9cb, 0x9ec9, - 0x319e, 0xd531, 0x20d5, 0x4021, 0xb23f, 0x29b2, 0xd828, 0xecd8, 0x5ded, - 0xfc5d, 0x4dfc, 0xd24d, 0x6bd2, 0x5f6b, 0xb35e, 0x7fb3, 0xee7e, 0x56ee, - 0xa657, 0x68a6, 0x8768, 0x7787, 0xb077, 0x4cb1, 0x764c, 0xb175, 0x7b1, - 0x3d07, 0x603d, 0x3560, 0x3e35, 0xb03d, 0xd6b0, 0xc8d6, 0xd8c8, 0x8bd8, - 0x3e8c, 0x303f, 0xd530, 0xf1d4, 0x42f1, 0xca42, 0xddca, 0x41dd, 0x3141, - 0x132, 0xe901, 0x8e9, 0xbe09, 0xe0bd, 0x2ce0, 0x862d, 0x3986, 0x9139, - 0x6d91, 0x6a6d, 0x8d6a, 0x1b8d, 0xac1b, 0xedab, 0x54ed, 0xc054, 0xcebf, - 0xc1ce, 0x5c2, 0x3805, 0x6038, 0x5960, 0xd359, 0xdd3, 0xbe0d, 0xafbd, - 0x6daf, 0x206d, 0x2c20, 0x862c, 0x8e86, 0xec8d, 0xa2ec, 0xa3a2, 0x51a3, - 0x8051, 0xfd7f, 0x91fd, 0xa292, 0xaf14, 0xeeae, 0x59ef, 0x535a, 0x8653, - 0x3986, 0x9539, 0xb895, 0xa0b8, 0x26a0, 0x2227, 0xc022, 0x77c0, 0xad77, - 0x46ad, 0xaa46, 0x60aa, 0x8560, 0x4785, 0xd747, 0x45d7, 0x2346, 0x5f23, - 0x25f, 0x1d02, 0x71d, 0x8206, 0xc82, 0x180c, 0x3018, 0x4b30, 0x4b, - 0x3001, 0x1230, 0x2d12, 0x8c2d, 0x148d, 0x4015, 0x5f3f, 0x3d5f, 0x6b3d, - 0x396b, 0x473a, 0xf746, 0x44f7, 0x8945, 0x3489, 0xcb34, 0x84ca, 0xd984, - 0xf0d9, 0xbcf0, 0x63bd, 0x3264, 0xf332, 0x45f3, 0x7346, 0x5673, 0xb056, - 0xd3b0, 0x4ad4, 0x184b, 0x7d18, 0x6c7d, 0xbb6c, 0xfeba, 0xe0fe, 0x10e1, - 0x5410, 0x2954, 0x9f28, 0x3a9f, 0x5a3a, 0xdb59, 0xbdc, 0xb40b, 0x1ab4, - 0x131b, 0x5d12, 0x6d5c, 0xe16c, 0xb0e0, 0x89b0, 0xba88, 0xbb, 0x3c01, - 0xe13b, 0x6fe1, 0x446f, 0xa344, 0x81a3, 0xfe81, 0xc7fd, 0x38c8, 0xb38, - 0x1a0b, 0x6d19, 0xf36c, 0x47f3, 0x6d48, 0xb76d, 0xd3b7, 0xd8d2, 0x52d9, - 0x4b53, 0xa54a, 0x34a5, 0xc534, 0x9bc4, 0xed9b, 0xbeed, 0x3ebe, 0x233e, - 0x9f22, 0x4a9f, 0x774b, 0x4577, 0xa545, 0x64a5, 0xb65, 0x870b, 0x487, - 0x9204, 0x5f91, 0xd55f, 0x35d5, 0x1a35, 0x71a, 0x7a07, 0x4e7a, 0xfc4e, - 0x1efc, 0x481f, 0x7448, 0xde74, 0xa7dd, 0x1ea7, 0xaa1e, 0xcfaa, 0xfbcf, - 0xedfb, 0x6eee, 0x386f, 0x4538, 0x6e45, 0xd96d, 0x11d9, 0x7912, 0x4b79, - 0x494b, 0x6049, 0xac5f, 0x65ac, 0x1366, 0x5913, 0xe458, 0x7ae4, 0x387a, - 0x3c38, 0xb03c, 0x76b0, 0x9376, 0xe193, 0x42e1, 0x7742, 0x6476, 0x3564, - 0x3c35, 0x6a3c, 0xcc69, 0x94cc, 0x5d95, 0xe5e, 0xee0d, 0x4ced, 0xce4c, - 0x52ce, 
0xaa52, 0xdaaa, 0xe4da, 0x1de5, 0x4530, 0x5445, 0x3954, 0xb639, - 0x81b6, 0x7381, 0x1574, 0xc215, 0x10c2, 0x3f10, 0x6b3f, 0xe76b, 0x7be7, - 0xbc7b, 0xf7bb, 0x41f7, 0xcc41, 0x38cc, 0x4239, 0xa942, 0x4a9, 0xc504, - 0x7cc4, 0x437c, 0x6743, 0xea67, 0x8dea, 0xe88d, 0xd8e8, 0xdcd8, 0x17dd, - 0x5718, 0x958, 0xa609, 0x41a5, 0x5842, 0x159, 0x9f01, 0x269f, 0x5a26, - 0x405a, 0xc340, 0xb4c3, 0xd4b4, 0xf4d3, 0xf1f4, 0x39f2, 0xe439, 0x67e4, - 0x4168, 0xa441, 0xdda3, 0xdedd, 0x9df, 0xab0a, 0xa5ab, 0x9a6, 0xba09, - 0x9ab9, 0xad9a, 0x5ae, 0xe205, 0xece2, 0xecec, 0x14ed, 0xd614, 0x6bd5, - 0x916c, 0x3391, 0x6f33, 0x206f, 0x8020, 0x780, 0x7207, 0x2472, 0x8a23, - 0xb689, 0x3ab6, 0xf739, 0x97f6, 0xb097, 0xa4b0, 0xe6a4, 0x88e6, 0x2789, - 0xb28, 0x350b, 0x1f35, 0x431e, 0x1043, 0xc30f, 0x79c3, 0x379, 0x5703, - 0x3256, 0x4732, 0x7247, 0x9d72, 0x489d, 0xd348, 0xa4d3, 0x7ca4, 0xbf7b, - 0x45c0, 0x7b45, 0x337b, 0x4034, 0x843f, 0xd083, 0x35d0, 0x6335, 0x4d63, - 0xe14c, 0xcce0, 0xfecc, 0x35ff, 0x5636, 0xf856, 0xeef8, 0x2def, 0xfc2d, - 0x4fc, 0x6e04, 0xb66d, 0x78b6, 0xbb78, 0x3dbb, 0x9a3d, 0x839a, 0x9283, - 0x593, 0xd504, 0x23d5, 0x5424, 0xd054, 0x61d0, 0xdb61, 0x17db, 0x1f18, - 0x381f, 0x9e37, 0x679e, 0x1d68, 0x381d, 0x8038, 0x917f, 0x491, 0xbb04, - 0x23bb, 0x4124, 0xd41, 0xa30c, 0x8ba3, 0x8b8b, 0xc68b, 0xd2c6, 0xebd2, - 0x93eb, 0xbd93, 0x99bd, 0x1a99, 0xea19, 0x58ea, 0xcf58, 0x73cf, 0x1073, - 0x9e10, 0x139e, 0xea13, 0xcde9, 0x3ecd, 0x883f, 0xf89, 0x180f, 0x2a18, - 0x212a, 0xce20, 0x73ce, 0xf373, 0x60f3, 0xad60, 0x4093, 0x8e40, 0xb98e, - 0xbfb9, 0xf1bf, 0x8bf1, 0x5e8c, 0xe95e, 0x14e9, 0x4e14, 0x1c4e, 0x7f1c, - 0xe77e, 0x6fe7, 0xf26f, 0x13f2, 0x8b13, 0xda8a, 0x5fda, 0xea5f, 0x4eea, - 0xa84f, 0x88a8, 0x1f88, 0x2820, 0x9728, 0x5a97, 0x3f5b, 0xb23f, 0x70b2, - 0x2c70, 0x232d, 0xf623, 0x4f6, 0x905, 0x7509, 0xd675, 0x28d7, 0x9428, - 0x3794, 0xf036, 0x2bf0, 0xba2c, 0xedb9, 0xd7ed, 0x59d8, 0xed59, 0x4ed, - 0xe304, 0x18e3, 0x5c19, 0x3d5c, 0x753d, 0x6d75, 0x956d, 0x7f95, 0xc47f, - 0x83c4, 0xa84, 0x2e0a, 0x5f2e, 0xb95f, 0x77b9, 0x6d78, 0xf46d, 0x1bf4, - 0xed1b, 0xd6ed, 0xe0d6, 0x5e1, 0x3905, 0x5638, 0xa355, 0x99a2, 0xbe99, - 0xb4bd, 0x85b4, 0x2e86, 0x542e, 0x6654, 0xd765, 0x73d7, 0x3a74, 0x383a, - 0x2638, 0x7826, 0x7677, 0x9a76, 0x7e99, 0x2e7e, 0xea2d, 0xa6ea, 0x8a7, - 0x109, 0x3300, 0xad32, 0x5fad, 0x465f, 0x2f46, 0xc62f, 0xd4c5, 0xad5, - 0xcb0a, 0x4cb, 0xb004, 0x7baf, 0xe47b, 0x92e4, 0x8e92, 0x638e, 0x1763, - 0xc17, 0xf20b, 0x1ff2, 0x8920, 0x5889, 0xcb58, 0xf8cb, 0xcaf8, 0x84cb, - 0x9f84, 0x8a9f, 0x918a, 0x4991, 0x8249, 0xff81, 0x46ff, 0x5046, 0x5f50, - 0x725f, 0xf772, 0x8ef7, 0xe08f, 0xc1e0, 0x1fc2, 0x9e1f, 0x8b9d, 0x108b, - 0x411, 0x2b04, 0xb02a, 0x1fb0, 0x1020, 0x7a0f, 0x587a, 0x8958, 0xb188, - 0xb1b1, 0x49b2, 0xb949, 0x7ab9, 0x917a, 0xfc91, 0xe6fc, 0x47e7, 0xbc47, - 0x8fbb, 0xea8e, 0x34ea, 0x2635, 0x1726, 0x9616, 0xc196, 0xa6c1, 0xf3a6, - 0x11f3, 0x4811, 0x3e48, 0xeb3e, 0xf7ea, 0x1bf8, 0xdb1c, 0x8adb, 0xe18a, - 0x42e1, 0x9d42, 0x5d9c, 0x6e5d, 0x286e, 0x4928, 0x9a49, 0xb09c, 0xa6b0, - 0x2a7, 0xe702, 0xf5e6, 0x9af5, 0xf9b, 0x810f, 0x8080, 0x180, 0x1702, - 0x5117, 0xa650, 0x11a6, 0x1011, 0x550f, 0xd554, 0xbdd5, 0x6bbe, 0xc66b, - 0xfc7, 0x5510, 0x5555, 0x7655, 0x177, 0x2b02, 0x6f2a, 0xb70, 0x9f0b, - 0xcf9e, 0xf3cf, 0x3ff4, 0xcb40, 0x8ecb, 0x768e, 0x5277, 0x8652, 0x9186, - 0x9991, 0x5099, 0xd350, 0x93d3, 0x6d94, 0xe6d, 0x530e, 0x3153, 0xa531, - 0x64a5, 0x7964, 0x7c79, 0x467c, 0x1746, 0x3017, 0x3730, 0x538, 0x5, - 0x1e00, 0x5b1e, 0x955a, 0xae95, 0x3eaf, 0xff3e, 0xf8ff, 0xb2f9, 0xa1b3, - 0xb2a1, 0x5b2, 0xad05, 0x7cac, 0x2d7c, 0xd32c, 
0x80d2, 0x7280, 0x8d72, - 0x1b8e, 0x831b, 0xac82, 0xfdac, 0xa7fd, 0x15a8, 0xd614, 0xe0d5, 0x7be0, - 0xb37b, 0x61b3, 0x9661, 0x9d95, 0xc79d, 0x83c7, 0xd883, 0xead7, 0xceb, - 0xf60c, 0xa9f5, 0x19a9, 0xa019, 0x8f9f, 0xd48f, 0x3ad5, 0x853a, 0x985, - 0x5309, 0x6f52, 0x1370, 0x6e13, 0xa96d, 0x98a9, 0x5198, 0x9f51, 0xb69f, - 0xa1b6, 0x2ea1, 0x672e, 0x2067, 0x6520, 0xaf65, 0x6eaf, 0x7e6f, 0xee7e, - 0x17ef, 0xa917, 0xcea8, 0x9ace, 0xff99, 0x5dff, 0xdf5d, 0x38df, 0xa39, - 0x1c0b, 0xe01b, 0x46e0, 0xcb46, 0x90cb, 0xba90, 0x4bb, 0x9104, 0x9d90, - 0xc89c, 0xf6c8, 0x6cf6, 0x886c, 0x1789, 0xbd17, 0x70bc, 0x7e71, 0x17e, - 0x1f01, 0xa01f, 0xbaa0, 0x14bb, 0xfc14, 0x7afb, 0xa07a, 0x3da0, 0xbf3d, - 0x48bf, 0x8c48, 0x968b, 0x9d96, 0xfd9d, 0x96fd, 0x9796, 0x6b97, 0xd16b, - 0xf4d1, 0x3bf4, 0x253c, 0x9125, 0x6691, 0xc166, 0x34c1, 0x5735, 0x1a57, - 0xdc19, 0x77db, 0x8577, 0x4a85, 0x824a, 0x9182, 0x7f91, 0xfd7f, 0xb4c3, - 0xb5b4, 0xb3b5, 0x7eb3, 0x617e, 0x4e61, 0xa4f, 0x530a, 0x3f52, 0xa33e, - 0x34a3, 0x9234, 0xf091, 0xf4f0, 0x1bf5, 0x311b, 0x9631, 0x6a96, 0x386b, - 0x1d39, 0xe91d, 0xe8e9, 0x69e8, 0x426a, 0xee42, 0x89ee, 0x368a, 0x2837, - 0x7428, 0x5974, 0x6159, 0x1d62, 0x7b1d, 0xf77a, 0x7bf7, 0x6b7c, 0x696c, - 0xf969, 0x4cf9, 0x714c, 0x4e71, 0x6b4e, 0x256c, 0x6e25, 0xe96d, 0x94e9, - 0x8f94, 0x3e8f, 0x343e, 0x4634, 0xb646, 0x97b5, 0x8997, 0xe8a, 0x900e, - 0x8090, 0xfd80, 0xa0fd, 0x16a1, 0xf416, 0xebf4, 0x95ec, 0x1196, 0x8911, - 0x3d89, 0xda3c, 0x9fd9, 0xd79f, 0x4bd7, 0x214c, 0x3021, 0x4f30, 0x994e, - 0x5c99, 0x6f5d, 0x326f, 0xab31, 0x6aab, 0xe969, 0x90e9, 0x1190, 0xff10, - 0xa2fe, 0xe0a2, 0x66e1, 0x4067, 0x9e3f, 0x2d9e, 0x712d, 0x8170, 0xd180, - 0xffd1, 0x25ff, 0x3826, 0x2538, 0x5f24, 0xc45e, 0x1cc4, 0xdf1c, 0x93df, - 0xc793, 0x80c7, 0x2380, 0xd223, 0x7ed2, 0xfc7e, 0x22fd, 0x7422, 0x1474, - 0xb714, 0x7db6, 0x857d, 0xa85, 0xa60a, 0x88a6, 0x4289, 0x7842, 0xc278, - 0xf7c2, 0xcdf7, 0x84cd, 0xae84, 0x8cae, 0xb98c, 0x1aba, 0x4d1a, 0x884c, - 0x4688, 0xcc46, 0xd8cb, 0x2bd9, 0xbe2b, 0xa2be, 0x72a2, 0xf772, 0xd2f6, - 0x75d2, 0xc075, 0xa3c0, 0x63a3, 0xae63, 0x8fae, 0x2a90, 0x5f2a, 0xef5f, - 0x5cef, 0xa05c, 0x89a0, 0x5e89, 0x6b5e, 0x736b, 0x773, 0x9d07, 0xe99c, - 0x27ea, 0x2028, 0xc20, 0x980b, 0x4797, 0x2848, 0x9828, 0xc197, 0x48c2, - 0x2449, 0x7024, 0x570, 0x3e05, 0xd3e, 0xf60c, 0xbbf5, 0x69bb, 0x3f6a, - 0x740, 0xf006, 0xe0ef, 0xbbe0, 0xadbb, 0x56ad, 0xcf56, 0xbfce, 0xa9bf, - 0x205b, 0x6920, 0xae69, 0x50ae, 0x2050, 0xf01f, 0x27f0, 0x9427, 0x8993, - 0x8689, 0x4087, 0x6e40, 0xb16e, 0xa1b1, 0xe8a1, 0x87e8, 0x6f88, 0xfe6f, - 0x4cfe, 0xe94d, 0xd5e9, 0x47d6, 0x3148, 0x5f31, 0xc35f, 0x13c4, 0xa413, - 0x5a5, 0x2405, 0xc223, 0x66c2, 0x3667, 0x5e37, 0x5f5e, 0x2f5f, 0x8c2f, - 0xe48c, 0xd0e4, 0x4d1, 0xd104, 0xe4d0, 0xcee4, 0xfcf, 0x480f, 0xa447, - 0x5ea4, 0xff5e, 0xbefe, 0x8dbe, 0x1d8e, 0x411d, 0x1841, 0x6918, 0x5469, - 0x1155, 0xc611, 0xaac6, 0x37ab, 0x2f37, 0xca2e, 0x87ca, 0xbd87, 0xabbd, - 0xb3ab, 0xcb4, 0xce0c, 0xfccd, 0xa5fd, 0x72a5, 0xf072, 0x83f0, 0xfe83, - 0x97fd, 0xc997, 0xb0c9, 0xadb0, 0xe6ac, 0x88e6, 0x1088, 0xbe10, 0x16be, - 0xa916, 0xa3a8, 0x46a3, 0x5447, 0xe953, 0x84e8, 0x2085, 0xa11f, 0xfa1, - 0xdd0f, 0xbedc, 0x5abe, 0x805a, 0xc97f, 0x6dc9, 0x826d, 0x4a82, 0x934a, - 0x5293, 0xd852, 0xd3d8, 0xadd3, 0xf4ad, 0xf3f4, 0xfcf3, 0xfefc, 0xcafe, - 0xb7ca, 0x3cb8, 0xa13c, 0x18a1, 0x1418, 0xea13, 0x91ea, 0xf891, 0x53f8, - 0xa254, 0xe9a2, 0x87ea, 0x4188, 0x1c41, 0xdc1b, 0xf5db, 0xcaf5, 0x45ca, - 0x6d45, 0x396d, 0xde39, 0x90dd, 0x1e91, 0x1e, 0x7b00, 0x6a7b, 0xa46a, - 0xc9a3, 0x9bc9, 0x389b, 0x1139, 0x5211, 0x1f52, 0xeb1f, 0xabeb, 
0x48ab, - 0x9348, 0xb392, 0x17b3, 0x1618, 0x5b16, 0x175b, 0xdc17, 0xdedb, 0x1cdf, - 0xeb1c, 0xd1ea, 0x4ad2, 0xd4b, 0xc20c, 0x24c2, 0x7b25, 0x137b, 0x8b13, - 0x618b, 0xa061, 0xff9f, 0xfffe, 0x72ff, 0xf572, 0xe2f5, 0xcfe2, 0xd2cf, - 0x75d3, 0x6a76, 0xc469, 0x1ec4, 0xfc1d, 0x59fb, 0x455a, 0x7a45, 0xa479, - 0xb7a4 + 0x83da, 0x2ac2, 0x6211, 0x49ba, 0x14af, 0x3045, 0x9340, 0x9cb0, 0xc3b4, + 0x5b20, 0x2cdf, 0x4e03, 0x8552, 0x6cfb, 0x37f0, 0x5386, 0xb681, 0xbff1, + 0xe6f5, 0x7e61, 0x5020, 0x916a, 0x8771, 0x6f1a, 0x3a0f, 0x55a5, 0xb8a0, + 0xc210, 0xe914, 0x8080, 0x523f, 0x9389, 0x704a, 0x6d81, 0x3876, 0x540c, + 0xb707, 0xc077, 0xe77b, 0x7ee7, 0x50a6, 0x91f0, 0x6eb1, 0xc58a, 0xfb5a, + 0x16f1, 0x79ec, 0x835c, 0xaa60, 0x41cc, 0x138b, 0x54d5, 0x3196, 0x886f, + 0x545c, 0xac6a, 0x0f66, 0x18d6, 0x3fda, 0xd745, 0xa904, 0xea4e, 0xc70f, + 0x1de9, 0xe9d5, 0x06ab, 0x687e, 0x71ee, 0x98f2, 0x305e, 0x021d, 0x4367, + 0x2028, 0x7701, 0x42ee, 0x5fc3, 0x3c73, 0x3a9f, 0x61a3, 0xf90e, 0xcacd, + 0x0c18, 0xe8d8, 0x3fb2, 0x0b9f, 0x2874, 0x0524, 0x1f07, 0x79fa, 0x1166, + 0xe324, 0x246f, 0x0130, 0x5809, 0x23f6, 0x40cb, 0x1d7b, 0x375e, 0x57d9, + 0x4671, 0x1830, 0x597a, 0x363b, 0x8d14, 0x5901, 0x75d6, 0x5286, 0x6c69, + 0x8ce4, 0xec7a, 0xfc99, 0x3de4, 0x1aa5, 0x717e, 0x3d6b, 0x5a40, 0x36f0, + 0x50d3, 0x714e, 0xd0e4, 0xaa9f, 0xdae8, 0xb7a9, 0x0e83, 0xda6f, 0xf744, + 0xd3f4, 0xedd7, 0x0e53, 0x6de9, 0x47a4, 0xc7fd, 0xae39, 0x0513, 0xd0ff, + 0xedd4, 0xca84, 0xe467, 0x04e3, 0x6479, 0x3e34, 0xbe8d, 0x3f6d, 0xde0e, + 0xa9fb, 0xc6d0, 0xa380, 0xbd63, 0xddde, 0x3d75, 0x1730, 0x9789, 0x1869, + 0x5e16, 0x1290, 0x2f65, 0x0c15, 0x25f8, 0x4673, 0xa609, 0x7fc4, 0x001e, + 0x80fd, 0xc6aa, 0x9c46, 0x5da6, 0x3a56, 0x5439, 0x74b4, 0xd44a, 0xae05, + 0x2e5f, 0xaf3e, 0xf4eb, 0xca87, 0xf556, 0xf90b, 0x12ef, 0x336a, 0x9300, + 0x6cbb, 0xed14, 0x6df4, 0xb3a1, 0x893d, 0xb40c, 0x8233, 0x362e, 0x56a9, + 0xb63f, 0x8ffa, 0x1054, 0x9133, 0xd6e0, 0xac7c, 0xd74b, 0xa572, 0x5471, + 0xffcf, 0x5f66, 0x3921, 0xb97a, 0x3a5a, 0x8007, 0x55a3, 0x8072, 0x4e99, + 0xfd97, 0x78d3, 0x9379, 0x6d34, 0xed8d, 0x6e6d, 0xb41a, 0x89b6, 0xb485, + 0x82ac, 0x31ab, 0xace6, 0xdaeb, 0x505f, 0xd0b8, 0x5198, 0x9745, 0x6ce1, + 0x97b0, 0x65d7, 0x14d6, 0x9011, 0xbe16, 0x2404, 0xf408, 0x74e8, 0xba95, + 0x9031, 0xbb00, 0x8927, 0x3826, 0xb361, 0xe166, 0x4754, 0x3984, 0x5b05, + 0xa0b2, 0x764e, 0xa11d, 0x6f44, 0x1e43, 0x997e, 0xc783, 0x2d71, 0x1fa1, + 0xded5, 0x8037, 0x55d3, 0x80a2, 0x4ec9, 0xfdc7, 0x7903, 0xa708, 0x0cf6, + 0xff25, 0xbe5a, 0xcd18, 0xf63c, 0x210c, 0xef32, 0x9e31, 0x196d, 0x4772, + 0xad5f, 0x9f8f, 0x5ec4, 0x6d82, 0x2c93, 0x4751, 0x1578, 0xc476, 0x3fb2, + 0x6db7, 0xd3a4, 0xc5d4, 0x8509, 0x93c7, 0x52d8, 0x754d, 0x951e, 0x441d, + 0xbf58, 0xed5d, 0x534b, 0x457b, 0x04b0, 0x136e, 0xd27e, 0xf4f3, 0x4b97, + 0xc33d, 0x3e79, 0x6c7e, 0xd26b, 0xc49b, 0x83d0, 0x928e, 0x519f, 0x7414, + 0xcab7, 0x5dc1, 0xf8cb, 0x26d1, 0x8cbe, 0x7eee, 0x3e23, 0x4ce1, 0x0bf2, + 0x2e67, 0x850a, 0x1814, 0xcdef, 0x5135, 0xb722, 0xa952, 0x6887, 0x7745, + 0x3656, 0x58cb, 0xaf6e, 0x4278, 0xf853, 0xb310, 0x8c53, 0x7e83, 0x3db8, + 0x4c76, 0x0b87, 0x2dfc, 0x849f, 0x17a9, 0xcd84, 0x8841, 0xc3f1, 0xb05c, + 0x6f91, 0x7e4f, 0x3d60, 0x5fd5, 0xb678, 0x4982, 0xff5d, 0xba1a, 0xf5ca, + 0xe8dc, 0xc092, 0xcf50, 0x8e61, 0xb0d6, 0x077a, 0x9a83, 0x505f, 0x0b1c, + 0x46cc, 0x39de, 0x7bb6, 0x5415, 0x1326, 0x359b, 0x8c3e, 0x1f48, 0xd523, + 0x8fe0, 0xcb90, 0xbea2, 0x007b, 0xf7a0, 0xe520, 0x0796, 0x5e39, 0xf142, + 0xa71e, 0x61db, 0x9d8b, 0x909d, 0xd275, 0xc99b, 0xb80c, 0xa1a8, 0xf84b, + 0x8b55, 0x4131, 0xfbed, 0x379e, 0x2ab0, 0x6c88, 0x63ae, 
0x521f, 0x3ac3, + 0x061c, 0x9925, 0x4f01, 0x09be, 0x456e, 0x3880, 0x7a58, 0x717e, 0x5fef, + 0x4893, 0xd58f, 0xd9f0, 0x8fcc, 0x4a89, 0x8639, 0x794b, 0xbb23, 0xb249, + 0xa0ba, 0x895e, 0x165b, 0xedda, 0x810e, 0x3bcb, 0x777b, 0x6a8d, 0xac65, + 0xa38b, 0x91fc, 0x7aa0, 0x079d, 0xdf1c, 0x0a1e, 0x7cba, 0xb86a, 0xab7c, + 0xed54, 0xe47a, 0xd2eb, 0xbb8f, 0x488c, 0x200c, 0x4b0d, 0x9d88, 0x95f5, + 0x8907, 0xcadf, 0xc205, 0xb076, 0x991a, 0x2617, 0xfd96, 0x2898, 0x7b13, + 0xf6a2, 0x3264, 0x743c, 0x6b62, 0x59d3, 0x4277, 0xcf73, 0xa6f3, 0xd1f4, + 0x2470, 0x9fff, 0x88ec, 0xe132, 0xd858, 0xc6c9, 0xaf6d, 0x3c6a, 0x13ea, + 0x3eeb, 0x9166, 0x0cf6, 0xf5e2, 0x2c46, 0x227d, 0x10ee, 0xf991, 0x868e, + 0x5e0e, 0x890f, 0xdb8a, 0x571a, 0x4007, 0x766a, 0xb616, 0x5631, 0x3ed5, + 0xcbd1, 0xa351, 0xce52, 0x20ce, 0x9c5d, 0x854a, 0xbbad, 0xfb59, 0xe067, + 0x0325, 0x9021, 0x67a1, 0x92a2, 0xe51d, 0x60ad, 0x499a, 0x7ffd, 0xbfa9, + 0xa4b7, 0x78d3, 0x9d0f, 0x748f, 0x9f90, 0xf20b, 0x6d9b, 0x5688, 0x8ceb, + 0xcc97, 0xb1a5, 0x85c1, 0xee49, 0x32b7, 0x5db8, 0xb033, 0x2bc3, 0x14b0, + 0x4b13, 0x8abf, 0x6fcd, 0x43e9, 0xac71, 0xe4d9, 0x6692, 0xb90d, 0x349d, + 0x1d8a, 0x53ed, 0x9399, 0x78a7, 0x4cc3, 0xb54b, 0xedb3, 0x5a4e, 0xca9c, + 0x462c, 0x2f19, 0x657c, 0xa528, 0x8a36, 0x5e52, 0xc6da, 0xff42, 0x6bdd, + 0x4b71, 0x5d88, 0x4675, 0x7cd8, 0xbc84, 0xa192, 0x75ae, 0xde36, 0x169f, + 0x8339, 0x62cd, 0x7d20, 0xb978, 0xefdb, 0x2f88, 0x1496, 0xe8b1, 0x513a, + 0x89a2, 0xf63c, 0xd5d0, 0xf023, 0x7c35, 0x185c, 0x5808, 0x3d16, 0x1132, + 0x79ba, 0xb222, 0x1ebd, 0xfe50, 0x18a4, 0xa4b5, 0x0dc3, 0x2d07, 0x1215, + 0xe630, 0x4eb9, 0x8721, 0xf3bb, 0xd34f, 0xeda2, 0x79b4, 0xe2c1, 0xa671, + 0xbf99, 0x93b5, 0xfc3d, 0x34a6, 0xa140, 0x80d4, 0x9b27, 0x2739, 0x9046, + 0x53f6, 0xe623, 0x1826, 0x80ae, 0xb916, 0x25b1, 0x0545, 0x1f98, 0xaba9, + 0x14b7, 0xd866, 0x6a94, 0xfea5, 0x97c1, 0xd029, 0x3cc4, 0x1c58, 0x36ab, + 0xc2bc, 0x2bca, 0xef79, 0x81a7, 0x15b9, 0x50d9, 0x99c6, 0x0661, 0xe5f4, + 0x0048, 0x8c59, 0xf566, 0xb916, 0x4b44, 0xdf55, 0x1a76, 0x70dd, 0xc6b4, + 0xa648, 0xc09b, 0x4cad, 0xb5ba, 0x796a, 0x0b98, 0x9fa9, 0xdac9, 0x3131, + 0x00cb, 0xc13a, 0xdb8d, 0x679f, 0xd0ac, 0x945c, 0x268a, 0xba9b, 0xf5bb, + 0x4c23, 0x1bbd, 0xd992, 0x0772, 0x9383, 0xfc90, 0xc040, 0x526e, 0xe67f, + 0x21a0, 0x7807, 0x47a1, 0x0577, 0xa4cd, 0x2afb, 0x9408, 0x57b8, 0xe9e5, + 0x7df7, 0xb917, 0x0f7f, 0xdf18, 0x9cee, 0x3c45, 0x9aaf, 0x5ba0, 0x1f50, + 0xb17d, 0x458f, 0x80af, 0xd716, 0xa6b0, 0x6486, 0x03dd, 0x6247, 0xce54, + 0xb2b5, 0x44e3, 0xd8f4, 0x1415, 0x6a7c, 0x3a16, 0xf7eb, 0x9742, 0xf5ac, + 0x61ba, 0xc44b, 0x654f, 0xf960, 0x3481, 0x8ae8, 0x5a82, 0x1858, 0xb7ae, + 0x1619, 0x8226, 0xe4b7, 0x1920, 0xdf0d, 0x1a2e, 0x7095, 0x402f, 0xfe04, + 0x9d5b, 0xfbc5, 0x67d3, 0xca64, 0xfecc, 0x68ea, 0x8e1c, 0xe483, 0xb41d, + 0x71f3, 0x114a, 0x6fb4, 0xdbc1, 0x3e53, 0x72bb, 0xdcd8, 0x6b24, 0x7b76, + 0x4b10, 0x08e6, 0xa83c, 0x06a7, 0x72b4, 0xd545, 0x09ae, 0x73cb, 0x0217, + 0x9603, }; static u8 tmp_buf[TEST_BUFLEN]; @@ -578,15 +451,19 @@ static void test_csum_no_carry_inputs(struct kunit *test) static void test_ip_fast_csum(struct kunit *test) { __sum16 csum_result, expected; - - for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) { - for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) { - csum_result = ip_fast_csum(random_buf + index, len); - expected = (__force __sum16) - expected_fast_csum[(len - IPv4_MIN_WORDS) * - NUM_IP_FAST_CSUM_TESTS + - index]; - CHECK_EQ(expected, csum_result); + int num_tests = (MAX_LEN / WORD_ALIGNMENT - IPv4_MAX_WORDS * WORD_ALIGNMENT); + + for (int i = 0; i < 
num_tests; i++) { + memcpy(&tmp_buf[IP_ALIGNMENT], + random_buf + (i * WORD_ALIGNMENT), + IPv4_MAX_WORDS * WORD_ALIGNMENT); + + for (int len = IPv4_MIN_WORDS; len <= IPv4_MAX_WORDS; len++) { + int index = (len - IPv4_MIN_WORDS) + + i * ((IPv4_MAX_WORDS - IPv4_MIN_WORDS) + 1); + csum_result = ip_fast_csum(tmp_buf + IP_ALIGNMENT, len); + expected = (__force __sum16)htons(expected_fast_csum[index]); + CHECK_EQ(csum_result, expected); } } } @@ -594,29 +471,29 @@ static void test_ip_fast_csum(struct kunit *test) static void test_csum_ipv6_magic(struct kunit *test) { #if defined(CONFIG_NET) - const struct in6_addr *saddr; - const struct in6_addr *daddr; - unsigned int len; - unsigned char proto; - unsigned int csum; - - const int daddr_offset = sizeof(struct in6_addr); - const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr); - const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) + - sizeof(int); - const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) + - sizeof(int) + sizeof(char); - - for (int i = 0; i < NUM_IPv6_TESTS; i++) { - saddr = (const struct in6_addr *)(random_buf + i); - daddr = (const struct in6_addr *)(random_buf + i + - daddr_offset); - len = *(unsigned int *)(random_buf + i + len_offset); - proto = *(random_buf + i + proto_offset); - csum = *(unsigned int *)(random_buf + i + csum_offset); - CHECK_EQ((__force __sum16)expected_csum_ipv6_magic[i], - csum_ipv6_magic(saddr, daddr, len, proto, - (__force __wsum)csum)); + struct csum_ipv6_magic_data { + const struct in6_addr saddr; + const struct in6_addr daddr; + __be32 len; + __wsum csum; + unsigned char proto; + unsigned char pad[3]; + } *data; + __sum16 csum_result, expected; + int ipv6_num_tests = ((MAX_LEN - sizeof(struct csum_ipv6_magic_data)) / WORD_ALIGNMENT); + + for (int i = 0; i < ipv6_num_tests; i++) { + int index = i * WORD_ALIGNMENT; + + data = kmalloc(sizeof(struct csum_ipv6_magic_data), GFP_KERNEL); + + memcpy(data, random_buf + index, sizeof(struct csum_ipv6_magic_data)); + + csum_result = csum_ipv6_magic(&data->saddr, &data->daddr, + ntohl(data->len), data->proto, + data->csum); + expected = (__force __sum16)htons(expected_csum_ipv6_magic[i]); + CHECK_EQ(csum_result, expected); } #endif /* !CONFIG_NET */ }
The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
variety of architectures that are big endian or do not support
misaligned accesses. Both of these test cases are changed to support big
and little endian architectures.

The test for ip_fast_csum is changed to align the data along (14 +
NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
csum_ipv6_magic aligns the data using a struct. An extra padding field
is added to the struct to ensure that the size of the struct is the same
on all architectures (44 bytes).

The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
daddr. This would fail on parisc64 due to the following code snippet in
arch/parisc/include/asm/checksum.h:

	add		%4, %0, %0\n"
	ldd,ma		8(%1), %6\n"
	ldd,ma		8(%2), %7\n"
	add,dc		%5, %0, %0\n"

The second add is expecting carry flags from the first add. Normally,
a double word load (ldd) does not modify the carry flags. However,
because saddr and daddr may be misaligned, ldd triggers a misalignment
trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
many additional instructions to be executed between the two adds. This
can be easily solved by adding the carry into %0 before executing the
ldd.

However, that is not necessary since ipv6 headers should always be
aligned on a 16-byte boundary on parisc since NET_IP_ALIGN is set to 2
and the ethernet header size is 14.

Architectures that set NET_IP_ALIGN to 0 must support misaligned saddr
and daddr, but that is not tested here.

Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 lib/checksum_kunit.c | 395 ++++++++++++++++++---------------------------------
 1 file changed, 136 insertions(+), 259 deletions(-)
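As a quick cross-check of the 44-byte figure above, here is a standalone sketch mirroring the struct csum_ipv6_magic_data from the diff (static_assert is available in the kernel via <linux/build_bug.h>; the layout below restates the patch, it is not additional test code):

#include <linux/build_bug.h>
#include <linux/in6.h>
#include <linux/types.h>

struct csum_ipv6_magic_layout {
	struct in6_addr saddr;	/* 16 bytes */
	struct in6_addr daddr;	/* 16 bytes */
	__be32 len;		/*  4 bytes */
	__wsum csum;		/*  4 bytes */
	unsigned char proto;	/*  1 byte  */
	unsigned char pad[3];	/*  3 bytes: pads the tail explicitly */
};
static_assert(sizeof(struct csum_ipv6_magic_layout) == 44);

Since no member requires more than 4-byte alignment, the compiler inserts no hidden padding, and the explicit pad[3] keeps the total at 44 bytes across 32- and 64-bit, big- and little-endian builds.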