Message ID:    c2a3f87d97f0903fdef3bbcb84661f75619301bf.1644574987.git.christophe.leroy@csgroup.eu (mailing list archive)
State:         Changes Requested
Delegated to:  Netdev Maintainers
Series:        [1/2] net: Allow csum_sub() to be provided in arch
From: Christophe Leroy
> Sent: 11 February 2022 10:25
>
> When building kernel with CONFIG_CC_OPTIMISE_FOR_SIZE, several
> copies of csum_sub() are generated, with the following code:
>
> 00000170 <csum_sub>:
>  170:	7c 84 20 f8	not	r4,r4
>  174:	7c 63 20 14	addc	r3,r3,r4
>  178:	7c 63 01 94	addze	r3,r3
>  17c:	4e 80 00 20	blr
>
> Let's define a PPC32 version with subc/addme, and force its inlining.
>
> It will return 0 instead of 0xffffffff when subtracting 0x80000000 from
> itself; this is not an issue as 0 and ~0 are equivalent, refer to RFC 1624.

They are not always equivalent.
In particular, in the UDP checksum field one of them is (0?) 'checksum not calculated'.

I think all the Linux functions have to return a non-zero value (for non-zero input).

If the csum is going to be converted to 16 bits, inverted, and put into a packet,
the code usually has to have a check that changes 0 to 0xffff.
However, if the csum functions guarantee never to return zero, they can feed
an extra 1 into the first csum_partial() then just invert and add 1 at the end,
because (~csum_partial(buffer, 1) + 1) is the same as ~csum_partial(buffer, 0)
except when the buffer's csum is 0xffffffff.

I did do some experiments: the 64-bit value can be reduced directly to
16 bits using '% 0xffff'. This is different because it returns 0, not 0xffff.
However, gcc 'randomly' picks between the fast 'multiply by reciprocal'
and slow divide instruction paths. The former is (probably) faster than
reducing using shifts and adc; the latter is definitely slower.

	David
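[To make the 'feed an extra 1' trick above concrete, here is a minimal user-space sketch; toy_csum_partial(), fold16() and the sample buffer are illustrative stand-ins for the kernel's csum_partial()/csum_fold(), not actual kernel API.]

#include <stdint.h>
#include <stdio.h>

/* Fold a 32-bit one's-complement sum down to 16 bits.
 * (David's alternative: 'sum % 0xffff' gives the same residue,
 * except it yields 0 where this yields 0xffff.) */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return sum;
}

/* Toy stand-in for csum_partial(): sum 16-bit words onto a seed. */
static uint32_t toy_csum_partial(const uint16_t *buf, int n, uint32_t seed)
{
	uint32_t sum = seed;
	while (n--)
		sum += *buf++;
	return sum;
}

int main(void)
{
	uint16_t buf[] = { 0x1234, 0xabcd };

	/* Usual way: invert, then remap 0 to 0xffff for the UDP field. */
	uint16_t a = ~fold16(toy_csum_partial(buf, 2, 0));
	if (a == 0)
		a = 0xffff;

	/* David's variant: seed with 1, invert and add 1 at the end;
	 * equal to the above except when the buffer sums to 0xffffffff. */
	uint16_t b = ~fold16(toy_csum_partial(buf, 2, 1)) + 1;

	printf("%04x %04x\n", a, b);	/* prints the same value twice */
	return 0;
}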
On 13/02/2022 04:01, David Laight wrote:
> From: Christophe Leroy
>> Sent: 11 February 2022 10:25
>>
>> When building kernel with CONFIG_CC_OPTIMISE_FOR_SIZE, several
>> copies of csum_sub() are generated, with the following code:
>>
>> 00000170 <csum_sub>:
>>  170:	7c 84 20 f8	not	r4,r4
>>  174:	7c 63 20 14	addc	r3,r3,r4
>>  178:	7c 63 01 94	addze	r3,r3
>>  17c:	4e 80 00 20	blr
>>
>> Let's define a PPC32 version with subc/addme, and force its inlining.
>>
>> It will return 0 instead of 0xffffffff when subtracting 0x80000000 from
>> itself; this is not an issue as 0 and ~0 are equivalent, refer to RFC 1624.
>
> They are not always equivalent.
> In particular, in the UDP checksum field one of them is (0?) 'checksum not calculated'.
>
> I think all the Linux functions have to return a non-zero value (for non-zero input).
>
> If the csum is going to be converted to 16 bits, inverted, and put into a packet,
> the code usually has to have a check that changes 0 to 0xffff.
> However, if the csum functions guarantee never to return zero, they can feed
> an extra 1 into the first csum_partial() then just invert and add 1 at the end,
> because (~csum_partial(buffer, 1) + 1) is the same as ~csum_partial(buffer, 0)
> except when the buffer's csum is 0xffffffff.
>
> I did do some experiments: the 64-bit value can be reduced directly to
> 16 bits using '% 0xffff'. This is different because it returns 0, not 0xffff.
> However, gcc 'randomly' picks between the fast 'multiply by reciprocal'
> and slow divide instruction paths. The former is (probably) faster than
> reducing using shifts and adc; the latter is definitely slower.

Ok, I submitted a patch to force inlining of all checksum helpers in
net/checksum.h instead.

Christophe
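[For reference, forcing inlining of the generic helpers amounts to turning 'static inline' into 'static __always_inline'. A sketch of the kind of change meant here, using csum_add() from include/net/checksum.h as an example; the referenced patch itself is not part of this thread, so treat the exact hunk as illustrative.]

--- a/include/net/checksum.h
+++ b/include/net/checksum.h
-static inline __wsum csum_add(__wsum csum, __wsum addend)
+static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
 {
 	u32 res = (__force u32)csum;
 	res += (__force u32)addend;
 	return (__force __wsum)(res + (res < (__force u32)addend));
 }

With CONFIG_CC_OPTIMISE_FOR_SIZE, plain 'inline' is only a hint and small helpers like this get out-of-lined; __always_inline removes the compiler's discretion.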
diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 350de8f90250..3288a1bf5e8d 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -112,6 +112,22 @@ static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
 #endif
 }
 
+#ifdef CONFIG_PPC32
+#define HAVE_ARCH_CSUM_SUB
+static __always_inline __wsum csum_sub(__wsum csum, __wsum addend)
+{
+	if (__builtin_constant_p(csum) && (csum == 0 || csum == ~0))
+		return ~addend;
+	if (__builtin_constant_p(addend) && (addend == 0 || addend == ~0))
+		return csum;
+
+	asm("subc %0,%0,%1;"
+	    "addme %0,%0;"
+	    : "+r" (csum) : "r" (addend) : "xer");
+	return csum;
+}
+#endif
+
 /*
  * This is a version of ip_compute_csum() optimized for IP headers,
  * which always checksum on 4 octet boundaries. ihl is the number
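[For readers not fluent in PPC asm: subc computes csum - addend and sets CA when no borrow occurs, and addme then adds CA - 1, folding the borrow back in one's-complement style. Below is a hedged user-space model of the two sequences; csum_sub_model() and csum_sub_generic() are hypothetical names for illustration, not kernel code.]

#include <stdint.h>
#include <stdio.h>

/* Model of the patched sequence: subc (difference + carry), then addme. */
static uint32_t csum_sub_model(uint32_t csum, uint32_t addend)
{
	uint32_t diff = csum - addend;
	uint32_t ca = (csum >= addend);	/* PPC CA: set when no borrow */
	return diff + ca - 1;		/* addme: add CA, subtract one */
}

/* Model of the generic fallback: csum_sub(a, b) == csum_add(a, ~b). */
static uint32_t csum_sub_generic(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + ~addend;
	return res + (res < ~addend);	/* fold the end-around carry back in */
}

int main(void)
{
	/* The commit message's example: 0x80000000 minus itself.
	 * Generic yields 0xffffffff, subc/addme yields 0; both are
	 * one's-complement zero, hence equivalent per RFC 1624. */
	printf("%08x %08x\n",
	       csum_sub_generic(0x80000000, 0x80000000),
	       csum_sub_model(0x80000000, 0x80000000));
	return 0;
}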
When building kernel with CONFIG_CC_OPTIMISE_FOR_SIZE, several copies
of csum_sub() are generated, with the following code:

00000170 <csum_sub>:
 170:	7c 84 20 f8	not	r4,r4
 174:	7c 63 20 14	addc	r3,r3,r4
 178:	7c 63 01 94	addze	r3,r3
 17c:	4e 80 00 20	blr

Let's define a PPC32 version with subc/addme, and force its inlining.

It will return 0 instead of 0xffffffff when subtracting 0x80000000 from
itself; this is not an issue as 0 and ~0 are equivalent, refer to RFC 1624.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/checksum.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
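[As a concrete check of the RFC 1624 point that 0 and ~0 both represent one's-complement zero, here is a small user-space sketch; oc_add() and fold() are illustrative helpers, not kernel API. Adding either value to a checksum leaves the final folded result unchanged.]

#include <stdint.h>
#include <stdio.h>

/* One's-complement add with end-around carry, like the generic csum_add(). */
static uint32_t oc_add(uint32_t a, uint32_t b)
{
	uint32_t res = a + b;
	return res + (res < b);
}

/* Fold to 16 bits and invert, like csum_fold(). */
static uint16_t fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return ~sum;
}

int main(void)
{
	uint32_t x = 0x12345678;

	/* All three print the same value: 0 and ~0 are both additive
	 * identities in one's-complement arithmetic. */
	printf("%04x %04x %04x\n",
	       fold(x), fold(oc_add(x, 0)), fold(oc_add(x, ~0u)));
	return 0;
}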