From patchwork Fri Jul 2 12:31:51 2021
X-Patchwork-Submitter: Matteo Croce
X-Patchwork-Id: 12356027
From: Matteo Croce
To: linux-kernel@vger.kernel.org, Nick Kossifidis, Guo Ren,
 Christoph Hellwig, David Laight, Palmer Dabbelt,
 Emil Renner Berthing, Drew Fustini
Cc: linux-arch@vger.kernel.org, Andrew Morton, Nick Desaulniers,
 linux-riscv@lists.infradead.org
Subject: [PATCH v2 1/3] lib/string: optimized memcpy
Date: Fri, 2 Jul 2021 14:31:51 +0200
Message-Id: <20210702123153.14093-2-mcroce@linux.microsoft.com>
In-Reply-To: <20210702123153.14093-1-mcroce@linux.microsoft.com>
References: <20210702123153.14093-1-mcroce@linux.microsoft.com>

From: Matteo Croce

Rewrite the generic memcpy() to copy a word at a time, without
generating unaligned accesses.

The procedure is made of three steps:
First, copy data one byte at a time until the destination buffer is
aligned to a long boundary.
Then copy the data one long at a time, shifting the current and the
next long to compose a long at every cycle.
Finally, copy the remainder one byte at a time.
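
The shift-and-merge in the second step can be sketched in userspace as
below. This is only an illustration under stated assumptions, not the
code from the patch: merge_words(), host_is_little_endian() and
merge_matches_byte_copy() are hypothetical names, and the byte order is
probed at run time here instead of with the compile-time __BIG_ENDIAN
check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORD_BYTES sizeof(unsigned long)

/* Hypothetical helper: 1 on little-endian hosts, 0 on big-endian ones. */
static int host_is_little_endian(void)
{
	const unsigned int probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/*
 * Build the word that starts `distance` bytes into `cur` and ends
 * `distance` bytes into `next`. distance must be in 1..WORD_BYTES-1,
 * otherwise one of the shifts would span the full word width.
 */
static unsigned long merge_words(unsigned long cur, unsigned long next,
				 size_t distance)
{
	const size_t lo = distance * 8;
	const size_t hi = (WORD_BYTES - distance) * 8;

	if (host_is_little_endian())
		return cur >> lo | next << hi;
	return cur << lo | next >> hi;
}

/* Return 1 if merging matches a plain byte copy for every valid distance. */
static int merge_matches_byte_copy(void)
{
	unsigned long words[2];
	unsigned char bytes[2 * sizeof(unsigned long)];
	size_t i, distance;

	for (i = 0; i < sizeof(bytes); i++)
		bytes[i] = (unsigned char)(0x10 + i);
	memcpy(words, bytes, sizeof(bytes));	/* two aligned source words */

	for (distance = 1; distance < WORD_BYTES; distance++) {
		unsigned long expected;

		memcpy(&expected, bytes + distance, WORD_BYTES);
		if (merge_words(words[0], words[1], distance) != expected)
			return 0;
	}
	return 1;
}
```

Calling merge_matches_byte_copy() compares the merged word against a
byte-wise copy for every offset from 1 to sizeof(long) - 1. An offset of
0 is excluded on purpose, since it would need a shift by the full word
width; that case corresponds to the plain word-by-word copy.
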

This is the improvement on RISC-V:

original aligned:	 75 Mb/s
original unaligned:	 75 Mb/s
new aligned:		114 Mb/s
new unaligned:		107 Mb/s

and this is the binary size increase according to bloat-o-meter:

Function     old     new   delta
memcpy        36     324    +288

Signed-off-by: Matteo Croce
---
 lib/string.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 3 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 546d59711a12..caeef4264c43 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -33,6 +33,23 @@
 #include
 #include
 
+#define BYTES_LONG	sizeof(long)
+#define WORD_MASK	(BYTES_LONG - 1)
+#define MIN_THRESHOLD	(BYTES_LONG * 2)
+
+/* convenience union to avoid cast between different pointer types */
+union types {
+	u8 *as_u8;
+	unsigned long *as_ulong;
+	uintptr_t as_uptr;
+};
+
+union const_types {
+	const u8 *as_u8;
+	const unsigned long *as_ulong;
+	uintptr_t as_uptr;
+};
+
 #ifndef __HAVE_ARCH_STRNCASECMP
 /**
  * strncasecmp - Case insensitive, length-limited string comparison
@@ -869,6 +886,13 @@ EXPORT_SYMBOL(memset64);
 #endif
 
 #ifndef __HAVE_ARCH_MEMCPY
+
+#ifdef __BIG_ENDIAN
+#define MERGE_UL(h, l, d) ((h) << ((d) * 8) | (l) >> ((BYTES_LONG - (d)) * 8))
+#else
+#define MERGE_UL(h, l, d) ((h) >> ((d) * 8) | (l) << ((BYTES_LONG - (d)) * 8))
+#endif
+
 /**
  * memcpy - Copy one area of memory to another
  * @dest: Where to copy to
@@ -880,14 +904,64 @@ EXPORT_SYMBOL(memset64);
  */
 void *memcpy(void *dest, const void *src, size_t count)
 {
-	char *tmp = dest;
-	const char *s = src;
+	union const_types s = { .as_u8 = src };
+	union types d = { .as_u8 = dest };
+	int distance = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
+		if (count < MIN_THRESHOLD)
+			goto copy_remainder;
+
+		/* Copy a byte at time until destination is aligned. */
+		for (; d.as_uptr & WORD_MASK; count--)
+			*d.as_u8++ = *s.as_u8++;
+
+		distance = s.as_uptr & WORD_MASK;
+	}
+
+	if (distance) {
+		unsigned long last, next;
 
+		/*
+		 * s is distance bytes ahead of d, and d just reached
+		 * the alignment boundary. Move s backward to word align it
+		 * and shift data to compensate for distance, in order to do
+		 * word-by-word copy.
+		 */
+		s.as_u8 -= distance;
+
+		next = s.as_ulong[0];
+		for (; count >= BYTES_LONG; count -= BYTES_LONG) {
+			last = next;
+			next = s.as_ulong[1];
+
+			d.as_ulong[0] = MERGE_UL(last, next, distance);
+
+			d.as_ulong++;
+			s.as_ulong++;
+		}
+
+		/* Restore s with the original offset. */
+		s.as_u8 += distance;
+	} else {
+		/*
+		 * If the source and dest lower bits are the same, do a simple
+		 * 32/64 bit wide copy.
+		 */
+		for (; count >= BYTES_LONG; count -= BYTES_LONG)
+			*d.as_ulong++ = *s.as_ulong++;
+	}
+
+copy_remainder:
 	while (count--)
-		*tmp++ = *s++;
+		*d.as_u8++ = *s.as_u8++;
+
 	return dest;
 }
 EXPORT_SYMBOL(memcpy);
+
+#undef MERGE_UL
+
 #endif
 
 #ifndef __HAVE_ARCH_MEMMOVE

From patchwork Fri Jul 2 12:31:52 2021
X-Patchwork-Submitter: Matteo Croce
X-Patchwork-Id: 12356029
From: Matteo Croce
To: linux-kernel@vger.kernel.org, Nick Kossifidis, Guo Ren,
 Christoph Hellwig, David Laight, Palmer Dabbelt,
 Emil Renner Berthing, Drew Fustini
Cc: linux-arch@vger.kernel.org, Andrew Morton, Nick Desaulniers,
 linux-riscv@lists.infradead.org
Subject: [PATCH v2 2/3] lib/string: optimized memmove
Date: Fri, 2 Jul 2021 14:31:52 +0200
Message-Id: <20210702123153.14093-3-mcroce@linux.microsoft.com>
In-Reply-To: <20210702123153.14093-1-mcroce@linux.microsoft.com>
References: <20210702123153.14093-1-mcroce@linux.microsoft.com>

From: Matteo Croce

When the destination buffer is before the source one, or when the
buffers don't overlap, it's safe to use memcpy() instead, which is
optimized to use the largest data size possible.

This "optimization" only covers a common case. In future, proper code
which does the same thing as memcpy() does but backwards can be done.

Signed-off-by: Matteo Croce
---
 lib/string.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index caeef4264c43..108b83c34cec 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -975,19 +975,13 @@ EXPORT_SYMBOL(memcpy);
  */
 void *memmove(void *dest, const void *src, size_t count)
 {
-	char *tmp;
-	const char *s;
+	if (dest < src || src + count <= dest)
+		return memcpy(dest, src, count);
+
+	if (dest > src) {
+		const char *s = src + count;
+		char *tmp = dest + count;
 
-	if (dest <= src) {
-		tmp = dest;
-		s = src;
-		while (count--)
-			*tmp++ = *s++;
-	} else {
-		tmp = dest;
-		tmp += count;
-		s = src;
-		s += count;
 		while (count--)
 			*--tmp = *--s;
 	}

From patchwork Fri Jul 2 12:31:53 2021
X-Patchwork-Submitter: Matteo Croce
X-Patchwork-Id: 12356031
From: Matteo Croce
To: linux-kernel@vger.kernel.org, Nick Kossifidis, Guo Ren,
 Christoph Hellwig, David Laight, Palmer Dabbelt,
 Emil Renner Berthing, Drew Fustini
Cc: linux-arch@vger.kernel.org, Andrew Morton, Nick Desaulniers,
 linux-riscv@lists.infradead.org
Subject: [PATCH v2 3/3] lib/string: optimized memset
Date: Fri, 2 Jul 2021 14:31:53 +0200
Message-Id: <20210702123153.14093-4-mcroce@linux.microsoft.com>
In-Reply-To: <20210702123153.14093-1-mcroce@linux.microsoft.com>
References: <20210702123153.14093-1-mcroce@linux.microsoft.com>

From: Matteo Croce

The generic memset is defined as a byte at a time write. This is always
safe, but it's slower than a 4 byte or even 8 byte write.

Write a generic memset which fills the data one byte at a time until the
destination is aligned, then fills using the largest size allowed,
and finally fills the remaining data one byte at a time.

On a RISC-V machine the speed goes from 140 Mb/s to 241 Mb/s,
and this is the binary size increase according to bloat-o-meter:

Function     old     new   delta
memset        32     148    +116

Signed-off-by: Matteo Croce
---
 lib/string.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 108b83c34cec..264821f0e795 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -810,10 +810,38 @@ EXPORT_SYMBOL(__sysfs_match_string);
  */
 void *memset(void *s, int c, size_t count)
 {
-	char *xs = s;
+	union types dest = { .as_u8 = s };
 
+	if (count >= MIN_THRESHOLD) {
+		unsigned long cu = (unsigned long)c;
+
+		/* Compose an ulong with 'c' repeated 4/8 times */
+#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
+		cu *= 0x0101010101010101UL;
+#else
+		cu |= cu << 8;
+		cu |= cu << 16;
+		/* Suppress warning on 32 bit machines */
+		cu |= (cu << 16) << 16;
+#endif
+		if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
+			/*
+			 * Fill the buffer one byte at time until
+			 * the destination is word aligned.
+			 */
+			for (; count && dest.as_uptr & WORD_MASK; count--)
+				*dest.as_u8++ = c;
+		}
+
+		/* Copy using the largest size allowed */
+		for (; count >= BYTES_LONG; count -= BYTES_LONG)
+			*dest.as_ulong++ = cu;
+	}
+
+	/* copy the remainder */
 	while (count--)
-		*xs++ = c;
+		*dest.as_u8++ = c;
+
 	return s;
 }
 EXPORT_SYMBOL(memset);