Message ID | 1483690133-25104-1-git-send-email-lprosek@redhat.com (mailing list archive) |
---|---|
State | New, archived |
> On 6 Jan 2017, at 10:08 AM, Ladi Prosek <lprosek@redhat.com> wrote:
>
> Very simple loop optimization with a significant performance impact.
>
> Microbenchmark results, modern x86-64:
>
> buffer size | speed up
> ------------+---------
>        1500 | 1.7x
>          64 | 1.5x
>           8 | 1.15x
>
> Microbenchmark results, POWER7:
>
> buffer size | speed up
> ------------+---------
>        1500 | 5x
>          64 | 3.3x
>           8 | 1.13x
>
> There is a lot of room for further improvement at the expense of
> code complexity - aligned multibyte reads, LE/BE considerations,
> architecture-specific optimizations, etc. This patch still keeps
> things simple and readable.

Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>

>
> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> ---
>  net/checksum.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/net/checksum.c b/net/checksum.c
> index 23323b0..4da72a6 100644
> --- a/net/checksum.c
> +++ b/net/checksum.c
> @@ -22,17 +22,22 @@
>
>  uint32_t net_checksum_add_cont(int len, uint8_t *buf, int seq)
>  {
> -    uint32_t sum = 0;
> +    uint32_t sum1 = 0, sum2 = 0;
>      int i;
>
> -    for (i = seq; i < seq + len; i++) {
> -        if (i & 1) {
> -            sum += (uint32_t)buf[i - seq];
> -        } else {
> -            sum += (uint32_t)buf[i - seq] << 8;
> -        }
> +    for (i = 0; i < len - 1; i += 2) {
> +        sum1 += (uint32_t)buf[i];
> +        sum2 += (uint32_t)buf[i + 1];
> +    }
> +    if (i < len) {
> +        sum1 += (uint32_t)buf[i];
> +    }
> +
> +    if (seq & 1) {
> +        return sum1 + (sum2 << 8);
> +    } else {
> +        return sum2 + (sum1 << 8);
>      }
> -    return sum;
>  }
>
>  uint16_t net_checksum_finish(uint32_t sum)
> --
> 2.7.4
>
On 2017-01-08 17:03, Dmitry Fleytman wrote:
>> On 6 Jan 2017, at 10:08 AM, Ladi Prosek <lprosek@redhat.com> wrote:
>>
>> Very simple loop optimization with a significant performance impact.
>>
>> Microbenchmark results, modern x86-64:
>>
>> buffer size | speed up
>> ------------+---------
>>        1500 | 1.7x
>>          64 | 1.5x
>>           8 | 1.15x
>>
>> Microbenchmark results, POWER7:
>>
>> buffer size | speed up
>> ------------+---------
>>        1500 | 5x
>>          64 | 3.3x
>>           8 | 1.13x
>>
>> There is a lot of room for further improvement at the expense of
>> code complexity - aligned multibyte reads, LE/BE considerations,
>> architecture-specific optimizations, etc. This patch still keeps
>> things simple and readable.
> Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
>

Applied to -net. Thanks
diff --git a/net/checksum.c b/net/checksum.c
index 23323b0..4da72a6 100644
--- a/net/checksum.c
+++ b/net/checksum.c
@@ -22,17 +22,22 @@
 
 uint32_t net_checksum_add_cont(int len, uint8_t *buf, int seq)
 {
-    uint32_t sum = 0;
+    uint32_t sum1 = 0, sum2 = 0;
     int i;
 
-    for (i = seq; i < seq + len; i++) {
-        if (i & 1) {
-            sum += (uint32_t)buf[i - seq];
-        } else {
-            sum += (uint32_t)buf[i - seq] << 8;
-        }
+    for (i = 0; i < len - 1; i += 2) {
+        sum1 += (uint32_t)buf[i];
+        sum2 += (uint32_t)buf[i + 1];
+    }
+    if (i < len) {
+        sum1 += (uint32_t)buf[i];
+    }
+
+    if (seq & 1) {
+        return sum1 + (sum2 << 8);
+    } else {
+        return sum2 + (sum1 << 8);
     }
-    return sum;
 }
 
 uint16_t net_checksum_finish(uint32_t sum)
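
The change in the diff relies on the two loops producing the same 32-bit result: the old loop picks the shift per byte from the parity of `seq + offset`, while the new loop accumulates even-offset and odd-offset bytes separately and applies the shift once, based on the parity of `seq`. The following is a minimal sketch for checking that equivalence outside QEMU. The names `add_cont_old`, `add_cont_new` and `check_checksum_loops` are hypothetical, and the `const` qualifier on `buf` is an addition not present in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Pre-patch loop, transcribed from the removed lines of the diff. */
uint32_t add_cont_old(int len, const uint8_t *buf, int seq)
{
    uint32_t sum = 0;
    int i;

    for (i = seq; i < seq + len; i++) {
        if (i & 1) {
            sum += (uint32_t)buf[i - seq];
        } else {
            sum += (uint32_t)buf[i - seq] << 8;
        }
    }
    return sum;
}

/* Post-patch loop, transcribed from the added lines of the diff. */
uint32_t add_cont_new(int len, const uint8_t *buf, int seq)
{
    uint32_t sum1 = 0, sum2 = 0;   /* even-offset and odd-offset byte sums */
    int i;

    for (i = 0; i < len - 1; i += 2) {
        sum1 += (uint32_t)buf[i];
        sum2 += (uint32_t)buf[i + 1];
    }
    if (i < len) {                 /* trailing byte when len is odd */
        sum1 += (uint32_t)buf[i];
    }

    if (seq & 1) {
        return sum1 + (sum2 << 8);
    } else {
        return sum2 + (sum1 << 8);
    }
}

/*
 * Hypothetical helper: compare both loops on pseudo-random data for every
 * length from 0 to 1500 and for several even and odd values of seq.
 */
void check_checksum_loops(void)
{
    enum { MAX_LEN = 1500 };
    static uint8_t buf[MAX_LEN];
    int len, seq, i;

    srand(1);                      /* fixed seed so failures are repeatable */
    for (i = 0; i < MAX_LEN; i++) {
        buf[i] = (uint8_t)(rand() & 0xff);
    }

    for (len = 0; len <= MAX_LEN; len++) {
        for (seq = 0; seq < 4; seq++) {
            assert(add_cont_old(len, buf, seq) == add_cont_new(len, buf, seq));
        }
    }
}
```

Calling `check_checksum_loops()` from a small `main` aborts on the first mismatch. Because unsigned 32-bit addition wraps and `(a << 8) + (b << 8)` equals `(a + b) << 8` modulo 2^32, the two loops should agree even if the accumulators overflow.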
Very simple loop optimization with a significant performance impact.

Microbenchmark results, modern x86-64:

buffer size | speed up
------------+---------
       1500 | 1.7x
         64 | 1.5x
          8 | 1.15x

Microbenchmark results, POWER7:

buffer size | speed up
------------+---------
       1500 | 5x
         64 | 3.3x
          8 | 1.13x

There is a lot of room for further improvement at the expense of
code complexity - aligned multibyte reads, LE/BE considerations,
architecture-specific optimizations, etc. This patch still keeps
things simple and readable.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 net/checksum.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
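
The benchmark harness behind the tables above is not part of this thread. As a rough sketch of how such a comparison could be reproduced, the code below times either loop over a fixed-size buffer using `clock()`. Everything here is an assumption made for illustration: the names `sum_branchy`, `sum_split`, `checksum_fn` and `time_checksum`, the `const` qualifier on `buf`, the fill pattern, and the choice of CPU time as the metric.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical function-pointer type matching both loop variants. */
typedef uint32_t (*checksum_fn)(int len, const uint8_t *buf, int seq);

/* Pre-patch variant: one parity test per byte. */
uint32_t sum_branchy(int len, const uint8_t *buf, int seq)
{
    uint32_t sum = 0;
    int i;

    for (i = seq; i < seq + len; i++) {
        sum += (i & 1) ? (uint32_t)buf[i - seq]
                       : (uint32_t)buf[i - seq] << 8;
    }
    return sum;
}

/* Post-patch variant: two accumulators, one shift at the end. */
uint32_t sum_split(int len, const uint8_t *buf, int seq)
{
    uint32_t sum1 = 0, sum2 = 0;
    int i;

    for (i = 0; i < len - 1; i += 2) {
        sum1 += (uint32_t)buf[i];
        sum2 += (uint32_t)buf[i + 1];
    }
    if (i < len) {
        sum1 += (uint32_t)buf[i];
    }
    return (seq & 1) ? sum1 + (sum2 << 8) : sum2 + (sum1 << 8);
}

/*
 * Return the CPU seconds spent calling fn `iterations` times on a buffer
 * of `len` bytes (at most 1500, the largest size in the tables).
 */
double time_checksum(checksum_fn fn, int len, long iterations)
{
    static uint8_t buf[1500];
    volatile uint32_t sink = 0;    /* keeps the results observable */
    clock_t start;
    long n;

    assert(len >= 0 && len <= (int)sizeof(buf));
    memset(buf, 0xa5, sizeof(buf));

    start = clock();
    for (n = 0; n < iterations; n++) {
        buf[0] = (uint8_t)n;       /* vary the input between calls */
        sink += fn(len, buf, (int)(n & 1));
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A speed-up figure comparable to the tables would then be `time_checksum(sum_branchy, 1500, n) / time_checksum(sum_split, 1500, n)` for a large `n`. The result depends heavily on compiler, optimization level and CPU, so it may not match the numbers reported here.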