Message ID | 20231027-optimize_checksum-v8-1-feb7101d128d@rivosinc.com (mailing list archive) |
---|---|
State | Superseded |
Series | riscv: Add fine-tuned checksum functions |

On Fri, Oct 27, 2023 at 03:43:51PM -0700, Charlie Jenkins wrote:
>  /*
>   * computes the checksum of a memory block at buff, length len,
>   * and adds in "sum" (32-bit)
> @@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
>  static inline __sum16 csum_fold(__wsum csum)
>  {
>  	u32 sum = (__force u32)csum;
> -	sum = (sum & 0xffff) + (sum >> 16);
> -	sum = (sum & 0xffff) + (sum >> 16);
> -	return (__force __sum16)~sum;
> +	return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
>  }

Will (~(sum + ror32(sum, 16))>>16 produce worse code than that?
Because at least with recent gcc this will generate the exact thing
you get from arm inline asm...

On Sat, Oct 28, 2023 at 12:10:36AM +0100, Al Viro wrote:
> On Fri, Oct 27, 2023 at 03:43:51PM -0700, Charlie Jenkins wrote:
> >  /*
> >   * computes the checksum of a memory block at buff, length len,
> >   * and adds in "sum" (32-bit)
> > @@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
> >  static inline __sum16 csum_fold(__wsum csum)
> >  {
> >  	u32 sum = (__force u32)csum;
> > -	sum = (sum & 0xffff) + (sum >> 16);
> > -	sum = (sum & 0xffff) + (sum >> 16);
> > -	return (__force __sum16)~sum;
> > +	return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
> >  }
>
> Will (~(sum + ror32(sum, 16))>>16 produce worse code than that?
> Because at least with recent gcc this will generate the exact thing
> you get from arm inline asm...

Yes that will produce worse code because an out-of-order processor will be
able to leverage that ~sum and ror32(sum, 16) can be computed independently of
each other. There are more strict data dependencies in
(~(sum + ror32(sum, 16))>>16.

- Charlie

diff --git a/include/asm-generic/checksum.h b/include/asm-generic/checksum.h
index 43e18db89c14..ad928cce268b 100644
--- a/include/asm-generic/checksum.h
+++ b/include/asm-generic/checksum.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_GENERIC_CHECKSUM_H
 #define __ASM_GENERIC_CHECKSUM_H
 
+#include <linux/bitops.h>
+
 /*
  * computes the checksum of a memory block at buff, length len,
  * and adds in "sum" (32-bit)
@@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
 static inline __sum16 csum_fold(__wsum csum)
 {
 	u32 sum = (__force u32)csum;
-	sum = (sum & 0xffff) + (sum >> 16);
-	sum = (sum & 0xffff) + (sum >> 16);
-	return (__force __sum16)~sum;
+	return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
 }
 
 #endif
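
Why the one-line replacement returns the same value as the two folds it removes can be sketched as follows. This derivation is not part of the thread; the symbols $H$, $L$, $S$, $s$ and $c$ are introduced here for illustration, and all arithmetic on 32-bit values is taken modulo $2^{32}$.

$$
\begin{aligned}
\mathrm{sum} &= 2^{16}H + L, \qquad \mathrm{ror32}(\mathrm{sum}, 16) = 2^{16}L + H, \qquad 0 \le H, L < 2^{16} \\
{\sim}x &\equiv -x - 1 \pmod{2^{32}} \;\Longrightarrow\; {\sim}\mathrm{sum} - \mathrm{ror32}(\mathrm{sum}, 16) \equiv {\sim}\bigl(\mathrm{sum} + \mathrm{ror32}(\mathrm{sum}, 16)\bigr) \\
\mathrm{sum} + \mathrm{ror32}(\mathrm{sum}, 16) &= (2^{16} + 1)\,S, \qquad S = H + L = 2^{16}c + s, \quad c \in \{0, 1\}, \quad 0 \le s < 2^{16} \\
(2^{16} + 1)\,S &\equiv 2^{16}(s + c) + s \pmod{2^{32}}, \qquad s + c \le 2^{16} - 1 \\
{\sim}\bigl(2^{16}(s + c) + s\bigr) &= 2^{16}\bigl(2^{16} - 1 - (s + c)\bigr) + \bigl(2^{16} - 1 - s\bigr)
\end{aligned}
$$

The bound $s + c \le 2^{16} - 1$ holds because $S \le 2^{17} - 2$, so $c = 1$ forces $s \le 2^{16} - 2$. The upper half of the sum is therefore exactly $s + c$, which is what the removed code computes: its first fold yields $S$ and its second yields $s + c$. Complementing and shifting right by 16 leaves $2^{16} - 1 - (s + c)$, the 16-bit complement of the folded value. The second line also shows that the subtract form in the patch and the add form raised in review are the same 32-bit value before the shift.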