From patchwork Tue Jun 15 02:38:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matteo Croce
X-Patchwork-Id: 12322631
From: Matteo Croce
To: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
 Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
 Emil Renner Berthing, Akira Tsukamoto, Drew Fustini, Bin Meng
Subject: [PATCH 1/3] riscv: optimized memcpy
Date: Tue, 15 Jun 2021 04:38:10 +0200
Message-Id: <20210615023812.50885-2-mcroce@linux.microsoft.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210615023812.50885-1-mcroce@linux.microsoft.com>
References: <20210615023812.50885-1-mcroce@linux.microsoft.com>

From: Matteo Croce

Write a C
version of memcpy() which uses the biggest data size allowed, without
generating unaligned accesses.

The procedure is made of three steps:
First copy data one byte at a time until the destination buffer is aligned
to a long boundary.
Then copy the data one long at a time, shifting the current and the next
long read from the source to compose a long at every cycle.
Finally, copy the remainder one byte at a time.

On a BeagleV, the TCP RX throughput increased by 45%:

before:

$ iperf3 -c beaglev
Connecting to host beaglev, port 5201
[  5] local 192.168.85.6 port 44840 connected to 192.168.85.48 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  76.4 MBytes   641 Mbits/sec   27    624 KBytes
[  5]   1.00-2.00   sec  72.5 MBytes   608 Mbits/sec    0    708 KBytes
[  5]   2.00-3.00   sec  73.8 MBytes   619 Mbits/sec   10    451 KBytes
[  5]   3.00-4.00   sec  72.5 MBytes   608 Mbits/sec    0    564 KBytes
[  5]   4.00-5.00   sec  73.8 MBytes   619 Mbits/sec    0    658 KBytes
[  5]   5.00-6.00   sec  73.8 MBytes   619 Mbits/sec   14    522 KBytes
[  5]   6.00-7.00   sec  73.8 MBytes   619 Mbits/sec    0    621 KBytes
[  5]   7.00-8.00   sec  72.5 MBytes   608 Mbits/sec    0    706 KBytes
[  5]   8.00-9.00   sec  73.8 MBytes   619 Mbits/sec   20    580 KBytes
[  5]   9.00-10.00  sec  73.8 MBytes   619 Mbits/sec    0    672 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   736 MBytes   618 Mbits/sec   71             sender
[  5]   0.00-10.01  sec   733 MBytes   615 Mbits/sec                  receiver

after:

$ iperf3 -c beaglev
Connecting to host beaglev, port 5201
[  5] local 192.168.85.6 port 44864 connected to 192.168.85.48 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   109 MBytes   912 Mbits/sec   48    559 KBytes
[  5]   1.00-2.00   sec   108 MBytes   902 Mbits/sec    0    690 KBytes
[  5]   2.00-3.00   sec   106 MBytes   891 Mbits/sec   36    396 KBytes
[  5]   3.00-4.00   sec   108 MBytes   902 Mbits/sec    0    567 KBytes
[  5]   4.00-5.00   sec   106 MBytes   891 Mbits/sec    0    699 KBytes
[  5]   5.00-6.00   sec   106 MBytes   891 Mbits/sec   32    414 KBytes
[  5]   6.00-7.00   sec   106 MBytes   891 Mbits/sec    0    583 KBytes
[  5]   7.00-8.00   sec   106 MBytes   891 Mbits/sec    0    708 KBytes
[  5]   8.00-9.00   sec   106 MBytes   891 Mbits/sec   28    433 KBytes
[  5]   9.00-10.00  sec   108 MBytes   902 Mbits/sec    0    591 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.04 GBytes   897 Mbits/sec  144             sender
[  5]   0.00-10.01  sec  1.04 GBytes   894 Mbits/sec                  receiver

The decreased CPU time of memcpy() is also observable with perf top.
This is the `perf top -Ue task-clock` output when doing the test:

before:

Overhead  Shared O  Symbol
  42.22%  [kernel]  [k] memcpy
  35.00%  [kernel]  [k] __asm_copy_to_user
   3.50%  [kernel]  [k] sifive_l2_flush64_range
   2.30%  [kernel]  [k] stmmac_napi_poll_rx
   1.11%  [kernel]  [k] memset

after:

Overhead  Shared O  Symbol
  45.69%  [kernel]  [k] __asm_copy_to_user
  29.06%  [kernel]  [k] memcpy
   4.09%  [kernel]  [k] sifive_l2_flush64_range
   2.77%  [kernel]  [k] stmmac_napi_poll_rx
   1.24%  [kernel]  [k] memset

Signed-off-by: Matteo Croce
---
 arch/riscv/include/asm/string.h |   8 ++-
 arch/riscv/kernel/riscv_ksyms.c |   2 -
 arch/riscv/lib/Makefile         |   2 +-
 arch/riscv/lib/memcpy.S         | 108 --------------------------------
 arch/riscv/lib/string.c         |  94 +++++++++++++++++++++++++++
 5 files changed, 101 insertions(+), 113 deletions(-)
 delete mode 100644 arch/riscv/lib/memcpy.S
 create mode 100644 arch/riscv/lib/string.c

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 909049366555..6b5d6fc3eab4 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -12,9 +12,13 @@
 #define __HAVE_ARCH_MEMSET
 extern asmlinkage void *memset(void *, int, size_t);
 extern asmlinkage void *__memset(void *, int, size_t);
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
 #define __HAVE_ARCH_MEMCPY
-extern asmlinkage void *memcpy(void *, const void *, size_t);
-extern asmlinkage void *__memcpy(void *, const void *, size_t);
+extern void *memcpy(void *dest, const void *src, size_t count);
+extern void *__memcpy(void *dest, const void *src, size_t count);
+#endif
+
 #define __HAVE_ARCH_MEMMOVE
 extern asmlinkage void *memmove(void *, const void *, size_t);
 extern asmlinkage void *__memmove(void *, const void *, size_t);
diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
index 5ab1c7e1a6ed..3f6d512a5b97 100644
--- a/arch/riscv/kernel/riscv_ksyms.c
+++ b/arch/riscv/kernel/riscv_ksyms.c
@@ -10,8 +10,6 @@
  * Assembly functions that may be used (directly or indirectly) by modules
  */
 EXPORT_SYMBOL(memset);
-EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(memmove);
 EXPORT_SYMBOL(__memset);
-EXPORT_SYMBOL(__memcpy);
 EXPORT_SYMBOL(__memmove);
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 25d5c9664e57..2ffe85d4baee 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0-only
 lib-y += delay.o
-lib-y += memcpy.o
 lib-y += memset.o
 lib-y += memmove.o
 lib-$(CONFIG_MMU) += uaccess.o
 lib-$(CONFIG_64BIT) += tishift.o
+lib-$(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE) += string.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/riscv/lib/memcpy.S b/arch/riscv/lib/memcpy.S
deleted file mode 100644
index 51ab716253fa..000000000000
--- a/arch/riscv/lib/memcpy.S
+++ /dev/null
@@ -1,108 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Regents of the University of California
- */
-
-#include
-#include
-
-/* void *memcpy(void *, const void *, size_t) */
-ENTRY(__memcpy)
-WEAK(memcpy)
-	move t6, a0  /* Preserve return value */
-
-	/* Defer to byte-oriented copy for small sizes */
-	sltiu a3, a2, 128
-	bnez a3, 4f
-	/* Use word-oriented copy only if low-order bits match */
-	andi a3, t6, SZREG-1
-	andi a4, a1, SZREG-1
-	bne a3, a4, 4f
-
-	beqz a3, 2f  /* Skip if already aligned */
-	/*
-	 * Round to nearest double word-aligned address
-	 * greater than or equal to start address
-	 */
-	andi a3, a1, ~(SZREG-1)
-	addi a3, a3, SZREG
-	/* Handle initial misalignment */
-	sub a4, a3, a1
-1:
-	lb a5, 0(a1)
-	addi a1, a1, 1
-	sb a5, 0(t6)
-	addi t6, t6, 1
-	bltu a1, a3, 1b
-	sub a2, a2, a4  /* Update count */
-
-2:
-	andi a4, a2, ~((16*SZREG)-1)
-	beqz a4, 4f
-	add a3, a1, a4
-3:
-	REG_L a4, 0(a1)
-	REG_L a5, SZREG(a1)
-	REG_L a6, 2*SZREG(a1)
-	REG_L a7, 3*SZREG(a1)
-	REG_L t0, 4*SZREG(a1)
-	REG_L t1, 5*SZREG(a1)
-	REG_L t2, 6*SZREG(a1)
-	REG_L t3, 7*SZREG(a1)
-	REG_L t4, 8*SZREG(a1)
-	REG_L t5, 9*SZREG(a1)
-	REG_S a4, 0(t6)
-	REG_S a5, SZREG(t6)
-	REG_S a6, 2*SZREG(t6)
-	REG_S a7, 3*SZREG(t6)
-	REG_S t0, 4*SZREG(t6)
-	REG_S t1, 5*SZREG(t6)
-	REG_S t2, 6*SZREG(t6)
-	REG_S t3, 7*SZREG(t6)
-	REG_S t4, 8*SZREG(t6)
-	REG_S t5, 9*SZREG(t6)
-	REG_L a4, 10*SZREG(a1)
-	REG_L a5, 11*SZREG(a1)
-	REG_L a6, 12*SZREG(a1)
-	REG_L a7, 13*SZREG(a1)
-	REG_L t0, 14*SZREG(a1)
-	REG_L t1, 15*SZREG(a1)
-	addi a1, a1, 16*SZREG
-	REG_S a4, 10*SZREG(t6)
-	REG_S a5, 11*SZREG(t6)
-	REG_S a6, 12*SZREG(t6)
-	REG_S a7, 13*SZREG(t6)
-	REG_S t0, 14*SZREG(t6)
-	REG_S t1, 15*SZREG(t6)
-	addi t6, t6, 16*SZREG
-	bltu a1, a3, 3b
-	andi a2, a2, (16*SZREG)-1  /* Update count */
-
-4:
-	/* Handle trailing misalignment */
-	beqz a2, 6f
-	add a3, a1, a2
-
-	/* Use word-oriented copy if co-aligned to word boundary */
-	or a5, a1, t6
-	or a5, a5, a3
-	andi a5, a5, 3
-	bnez a5, 5f
-7:
-	lw a4, 0(a1)
-	addi a1, a1, 4
-	sw a4, 0(t6)
-	addi t6, t6, 4
-	bltu a1, a3, 7b
-
-	ret
-
-5:
-	lb a4, 0(a1)
-	addi a1, a1, 1
-	sb a4, 0(t6)
-	addi t6, t6, 1
-	bltu a1, a3, 5b
-6:
-	ret
-END(__memcpy)
diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
new file mode 100644
index 000000000000..525f9ee25a74
--- /dev/null
+++ b/arch/riscv/lib/string.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * String functions optimized for hardware which doesn't
+ * handle unaligned memory accesses efficiently.
+ *
+ * Copyright (C) 2021 Matteo Croce
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+
+/* size below which a classic byte at a time copy is done */
+#define MIN_THRESHOLD 64
+
+/* convenience types to avoid casts between different pointer types */
+union types {
+	u8 *u8;
+	unsigned long *ulong;
+	uintptr_t uptr;
+};
+
+union const_types {
+	const u8 *u8;
+	const unsigned long *ulong;
+};
+
+void *memcpy(void *dest, const void *src, size_t count)
+{
+	const int bytes_long = BITS_PER_LONG / 8;
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	const int mask = bytes_long - 1;
+	const int distance = (src - dest) & mask;
+#endif
+	union const_types s = { .u8 = src };
+	union types d = { .u8 = dest };
+
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	if (count <= MIN_THRESHOLD)
+		goto copy_remainder;
+
+	/* copy a byte at a time until the destination is aligned */
+	for (; count && d.uptr & mask; count--)
+		*d.u8++ = *s.u8++;
+
+	if (distance) {
+		unsigned long last, next;
+
+		/* move s backward to the previous alignment boundary */
+		s.u8 -= distance;
+
+		/* 32/64 bit wide copy from s to d.
+		 * d is aligned now but s is not, so read s alignment wise,
+		 * and do proper shift to get the right value.
+		 * Works only on Little Endian machines.
+		 */
+		for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {
+			last = next;
+			next = s.ulong[1];
+
+			d.ulong[0] = last >> (distance * 8) |
+				     next << ((bytes_long - distance) * 8);
+
+			d.ulong++;
+			s.ulong++;
+		}
+
+		/* restore s with the original offset */
+		s.u8 += distance;
+	} else
+#endif
+	{
+		/* if the source and dest lower bits are the same, do a simple
+		 * 32/64 bit wide copy.
+		 */
+		for (; count >= bytes_long; count -= bytes_long)
+			*d.ulong++ = *s.ulong++;
+	}
+
+	/* suppress warning when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y */
+	goto copy_remainder;
+
+copy_remainder:
+	while (count--)
+		*d.u8++ = *s.u8++;
+
+	return dest;
+}
+EXPORT_SYMBOL(memcpy);
+
+void *__memcpy(void *dest, const void *src, size_t count)
+{
+	return memcpy(dest, src, count);
+}
+EXPORT_SYMBOL(__memcpy);

From patchwork Tue Jun 15 02:38:11 2021
X-Patchwork-Submitter: Matteo Croce
X-Patchwork-Id: 12322381
From: Matteo Croce
To: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
 Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
 Emil Renner Berthing, Akira Tsukamoto, Drew Fustini, Bin Meng
Subject: [PATCH 2/3] riscv: optimized memmove
Date: Tue, 15 Jun 2021 04:38:11 +0200
Message-Id: <20210615023812.50885-3-mcroce@linux.microsoft.com>
In-Reply-To: <20210615023812.50885-1-mcroce@linux.microsoft.com>
References: <20210615023812.50885-1-mcroce@linux.microsoft.com>

From: Matteo Croce

When the destination buffer is before the source one, or when the
buffers don't overlap, it's safe to use memcpy() instead, which is
optimized to use the biggest data size possible.
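
The direction check described above can be sketched in userspace as
follows. This is only an illustration under stated assumptions:
sketch_memmove() is a hypothetical name, the forward path uses a plain
byte loop as a stand-in for the kernel memcpy() (calling the C library
memcpy() on overlapping buffers would be undefined), and pointers are
compared as uintptr_t values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the memmove() logic in this patch. */
static void *sketch_memmove(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	uintptr_t da = (uintptr_t)dest;
	uintptr_t sa = (uintptr_t)src;

	/*
	 * dest before src, or no overlap at all: a forward copy never
	 * overwrites a source byte before it has been read.
	 * (The patch calls its optimized memcpy() on this path.)
	 */
	if (da < sa || sa + count <= da) {
		for (size_t i = 0; i < count; i++)
			d[i] = s[i];
		return dest;
	}

	/* dest lies inside the source range: copy from the last byte down */
	while (count--)
		d[count] = s[count];

	return dest;
}
```

Moving "abcdef" two bytes forward inside "abcdefgh" takes the backward
path, while moving it two bytes back takes the forward one.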
Signed-off-by: Matteo Croce
---
 arch/riscv/include/asm/string.h |  6 ++--
 arch/riscv/kernel/riscv_ksyms.c |  2 --
 arch/riscv/lib/Makefile         |  1 -
 arch/riscv/lib/memmove.S        | 64 ---------------------------------
 arch/riscv/lib/string.c         | 26 ++++++++++++++
 5 files changed, 29 insertions(+), 70 deletions(-)
 delete mode 100644 arch/riscv/lib/memmove.S

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 6b5d6fc3eab4..25d9b9078569 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -17,11 +17,11 @@ extern asmlinkage void *__memset(void *, int, size_t);
 #define __HAVE_ARCH_MEMCPY
 extern void *memcpy(void *dest, const void *src, size_t count);
 extern void *__memcpy(void *dest, const void *src, size_t count);
+#define __HAVE_ARCH_MEMMOVE
+extern void *memmove(void *dest, const void *src, size_t count);
+extern void *__memmove(void *dest, const void *src, size_t count);
 #endif
 
-#define __HAVE_ARCH_MEMMOVE
-extern asmlinkage void *memmove(void *, const void *, size_t);
-extern asmlinkage void *__memmove(void *, const void *, size_t);
 /* For those files which don't want to check by kasan. */
 #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
index 3f6d512a5b97..361565c4db7e 100644
--- a/arch/riscv/kernel/riscv_ksyms.c
+++ b/arch/riscv/kernel/riscv_ksyms.c
@@ -10,6 +10,4 @@
  * Assembly functions that may be used (directly or indirectly) by modules
  */
 EXPORT_SYMBOL(memset);
-EXPORT_SYMBOL(memmove);
 EXPORT_SYMBOL(__memset);
-EXPORT_SYMBOL(__memmove);
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 2ffe85d4baee..484f5ff7b508 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -1,7 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 lib-y += delay.o
 lib-y += memset.o
-lib-y += memmove.o
 lib-$(CONFIG_MMU) += uaccess.o
 lib-$(CONFIG_64BIT) += tishift.o
 lib-$(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE) += string.o
diff --git a/arch/riscv/lib/memmove.S b/arch/riscv/lib/memmove.S
deleted file mode 100644
index 07d1d2152ba5..000000000000
--- a/arch/riscv/lib/memmove.S
+++ /dev/null
@@ -1,64 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-#include
-#include
-
-ENTRY(__memmove)
-WEAK(memmove)
-	move t0, a0
-	move t1, a1
-
-	beq a0, a1, exit_memcpy
-	beqz a2, exit_memcpy
-	srli t2, a2, 0x2
-
-	slt t3, a0, a1
-	beqz t3, do_reverse
-
-	andi a2, a2, 0x3
-	li t4, 1
-	beqz t2, byte_copy
-
-word_copy:
-	lw t3, 0(a1)
-	addi t2, t2, -1
-	addi a1, a1, 4
-	sw t3, 0(a0)
-	addi a0, a0, 4
-	bnez t2, word_copy
-	beqz a2, exit_memcpy
-	j byte_copy
-
-do_reverse:
-	add a0, a0, a2
-	add a1, a1, a2
-	andi a2, a2, 0x3
-	li t4, -1
-	beqz t2, reverse_byte_copy
-
-reverse_word_copy:
-	addi a1, a1, -4
-	addi t2, t2, -1
-	lw t3, 0(a1)
-	addi a0, a0, -4
-	sw t3, 0(a0)
-	bnez t2, reverse_word_copy
-	beqz a2, exit_memcpy
-
-reverse_byte_copy:
-	addi a0, a0, -1
-	addi a1, a1, -1
-
-byte_copy:
-	lb t3, 0(a1)
-	addi a2, a2, -1
-	sb t3, 0(a0)
-	add a1, a1, t4
-	add a0, a0, t4
-	bnez a2, byte_copy
-
-exit_memcpy:
-	move a0, t0
-	move a1, t1
-	ret
-END(__memmove)
diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
index 525f9ee25a74..bc006708f075 100644
--- a/arch/riscv/lib/string.c
+++ b/arch/riscv/lib/string.c
@@ -92,3 +92,29 @@ void *__memcpy(void *dest, const void *src, size_t count)
 	return memcpy(dest, src, count);
 }
 EXPORT_SYMBOL(__memcpy);
+
+/*
+ * Call memcpy() when dest is before src or the buffers don't overlap,
+ * otherwise do a simple one byte at a time backward copy.
+ */
+void *memmove(void *dest, const void *src, size_t count)
+{
+	if (dest < src || src + count <= dest)
+		return memcpy(dest, src, count);
+
+	if (dest > src) {
+		const char *s = src + count;
+		char *tmp = dest + count;
+
+		while (count--)
+			*--tmp = *--s;
+	}
+	return dest;
+}
+EXPORT_SYMBOL(memmove);
+
+void *__memmove(void *dest, const void *src, size_t count)
+{
+	return memmove(dest, src, count);
+}
+EXPORT_SYMBOL(__memmove);

From patchwork Tue Jun 15 02:38:12 2021
X-Patchwork-Submitter: Matteo Croce
X-Patchwork-Id: 12322395
From: Matteo Croce
To: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
 Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
 Emil Renner Berthing, Akira Tsukamoto, Drew Fustini, Bin Meng
Subject: [PATCH 3/3] riscv: optimized memset
Date: Tue, 15 Jun 2021 04:38:12 +0200
Message-Id: <20210615023812.50885-4-mcroce@linux.microsoft.com>
In-Reply-To: <20210615023812.50885-1-mcroce@linux.microsoft.com>
References: <20210615023812.50885-1-mcroce@linux.microsoft.com>

From: Matteo Croce

The generic memset is defined as a byte at a time write. This is always
safe, but it's slower than a 4 byte or even 8 byte write.

Write a generic memset which fills the data one byte at a time until the
destination is aligned, then fills using the largest size allowed,
and finally fills the remaining data one byte at a time.
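
The three phases can be sketched in userspace as follows. This is a
minimal illustration, not the code in this patch: sketch_memset() is a
hypothetical name, there is no small-size threshold, the fill byte is
broadcast with a multiplication instead of shifts, and the word store
goes through memcpy() to stay within the C aliasing rules (compilers
normally turn it into a single store).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace sketch of a head/body/tail memset. */
static void *sketch_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;
	const unsigned char byte = (unsigned char)c;
	const size_t word = sizeof(unsigned long);
	/* ~0UL / 0xff is 0x0101...01, so the product repeats 'byte' in every lane */
	const unsigned long pattern = (unsigned long)byte * (~0UL / 0xff);

	/* head: one byte at a time until p is aligned to a long boundary */
	while (count && ((uintptr_t)p & (word - 1))) {
		*p++ = byte;
		count--;
	}

	/* body: one long at a time */
	for (; count >= word; count -= word, p += word)
		memcpy(p, &pattern, word);

	/* tail: remaining bytes */
	while (count--)
		*p++ = byte;

	return s;
}
```

Only the low 8 bits of c are used, matching the standard memset()
contract, so passing 0x1ab fills with 0xab.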
Signed-off-by: Matteo Croce --- arch/riscv/include/asm/string.h | 10 +-- arch/riscv/kernel/Makefile | 1 - arch/riscv/kernel/riscv_ksyms.c | 13 ---- arch/riscv/lib/Makefile | 1 - arch/riscv/lib/memset.S | 113 -------------------------------- arch/riscv/lib/string.c | 42 ++++++++++++ 6 files changed, 45 insertions(+), 135 deletions(-) delete mode 100644 arch/riscv/kernel/riscv_ksyms.c delete mode 100644 arch/riscv/lib/memset.S diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h index 25d9b9078569..90500635035a 100644 --- a/arch/riscv/include/asm/string.h +++ b/arch/riscv/include/asm/string.h @@ -6,14 +6,10 @@ #ifndef _ASM_RISCV_STRING_H #define _ASM_RISCV_STRING_H -#include -#include - -#define __HAVE_ARCH_MEMSET -extern asmlinkage void *memset(void *, int, size_t); -extern asmlinkage void *__memset(void *, int, size_t); - #ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE +#define __HAVE_ARCH_MEMSET +extern void *memset(void *s, int c, size_t count); +extern void *__memset(void *s, int c, size_t count); #define __HAVE_ARCH_MEMCPY extern void *memcpy(void *dest, const void *src, size_t count); extern void *__memcpy(void *dest, const void *src, size_t count); diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index d3081e4d9600..e635ce1e5645 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -31,7 +31,6 @@ obj-y += syscall_table.o obj-y += sys_riscv.o obj-y += time.o obj-y += traps.o -obj-y += riscv_ksyms.o obj-y += stacktrace.o obj-y += cacheinfo.o obj-y += patch.o diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c deleted file mode 100644 index 361565c4db7e..000000000000 --- a/arch/riscv/kernel/riscv_ksyms.c +++ /dev/null @@ -1,13 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * Copyright (C) 2017 Zihao Yu - */ - -#include -#include - -/* - * Assembly functions that may be used (directly or indirectly) by modules - */ -EXPORT_SYMBOL(memset); -EXPORT_SYMBOL(__memset); 
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 484f5ff7b508..e33263cc622a 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -1,6 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 lib-y			+= delay.o
-lib-y			+= memset.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE) += string.o
diff --git a/arch/riscv/lib/memset.S b/arch/riscv/lib/memset.S
deleted file mode 100644
index 34c5360c6705..000000000000
--- a/arch/riscv/lib/memset.S
+++ /dev/null
@@ -1,113 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Regents of the University of California
- */
-
-
-#include <linux/linkage.h>
-#include <asm/asm.h>
-
-/* void *memset(void *, int, size_t) */
-ENTRY(__memset)
-WEAK(memset)
-	move t0, a0 /* Preserve return value */
-
-	/* Defer to byte-oriented fill for small sizes */
-	sltiu a3, a2, 16
-	bnez a3, 4f
-
-	/*
-	 * Round to nearest XLEN-aligned address
-	 * greater than or equal to start address
-	 */
-	addi a3, t0, SZREG-1
-	andi a3, a3, ~(SZREG-1)
-	beq a3, t0, 2f /* Skip if already aligned */
-	/* Handle initial misalignment */
-	sub a4, a3, t0
-1:
-	sb a1, 0(t0)
-	addi t0, t0, 1
-	bltu t0, a3, 1b
-	sub a2, a2, a4 /* Update count */
-
-2: /* Duff's device with 32 XLEN stores per iteration */
-	/* Broadcast value into all bytes */
-	andi a1, a1, 0xff
-	slli a3, a1, 8
-	or a1, a3, a1
-	slli a3, a1, 16
-	or a1, a3, a1
-#ifdef CONFIG_64BIT
-	slli a3, a1, 32
-	or a1, a3, a1
-#endif
-
-	/* Calculate end address */
-	andi a4, a2, ~(SZREG-1)
-	add a3, t0, a4
-
-	andi a4, a4, 31*SZREG /* Calculate remainder */
-	beqz a4, 3f /* Shortcut if no remainder */
-	neg a4, a4
-	addi a4, a4, 32*SZREG /* Calculate initial offset */
-
-	/* Adjust start address with offset */
-	sub t0, t0, a4
-
-	/* Jump into loop body */
-	/* Assumes 32-bit instruction lengths */
-	la a5, 3f
-#ifdef CONFIG_64BIT
-	srli a4, a4, 1
-#endif
-	add a5, a5, a4
-	jr a5
-3:
-	REG_S a1, 0(t0)
-	REG_S a1, SZREG(t0)
-	REG_S a1, 2*SZREG(t0)
-	REG_S a1, 3*SZREG(t0)
-	REG_S a1, 4*SZREG(t0)
-	REG_S a1, 5*SZREG(t0)
-	REG_S a1, 6*SZREG(t0)
-	REG_S a1, 7*SZREG(t0)
-	REG_S a1, 8*SZREG(t0)
-	REG_S a1, 9*SZREG(t0)
-	REG_S a1, 10*SZREG(t0)
-	REG_S a1, 11*SZREG(t0)
-	REG_S a1, 12*SZREG(t0)
-	REG_S a1, 13*SZREG(t0)
-	REG_S a1, 14*SZREG(t0)
-	REG_S a1, 15*SZREG(t0)
-	REG_S a1, 16*SZREG(t0)
-	REG_S a1, 17*SZREG(t0)
-	REG_S a1, 18*SZREG(t0)
-	REG_S a1, 19*SZREG(t0)
-	REG_S a1, 20*SZREG(t0)
-	REG_S a1, 21*SZREG(t0)
-	REG_S a1, 22*SZREG(t0)
-	REG_S a1, 23*SZREG(t0)
-	REG_S a1, 24*SZREG(t0)
-	REG_S a1, 25*SZREG(t0)
-	REG_S a1, 26*SZREG(t0)
-	REG_S a1, 27*SZREG(t0)
-	REG_S a1, 28*SZREG(t0)
-	REG_S a1, 29*SZREG(t0)
-	REG_S a1, 30*SZREG(t0)
-	REG_S a1, 31*SZREG(t0)
-	addi t0, t0, 32*SZREG
-	bltu t0, a3, 3b
-	andi a2, a2, SZREG-1 /* Update count */
-
-4:
-	/* Handle trailing misalignment */
-	beqz a2, 6f
-	add a3, t0, a2
-5:
-	sb a1, 0(t0)
-	addi t0, t0, 1
-	bltu t0, a3, 5b
-6:
-	ret
-END(__memset)
diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
index bc006708f075..62869627e139 100644
--- a/arch/riscv/lib/string.c
+++ b/arch/riscv/lib/string.c
@@ -118,3 +118,45 @@ void *__memmove(void *dest, const void *src, size_t count)
 	return memmove(dest, src, count);
 }
 EXPORT_SYMBOL(__memmove);
+
+void *memset(void *s, int c, size_t count)
+{
+	union types dest = { .u8 = s };
+
+	if (count > MIN_THRESHOLD) {
+		const int bytes_long = BITS_PER_LONG / 8;
+		unsigned long cu = (unsigned char)c; /* only the low byte of 'c' is used */
+
+		/* Compose an unsigned long with 'c' repeated 4/8 times */
+		cu =
+#if BITS_PER_LONG == 64
+			cu << 56 | cu << 48 | cu << 40 | cu << 32 |
+#endif
+			cu << 24 | cu << 16 | cu << 8 | cu;
+
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+		/* Fill the buffer one byte at a time until the destination
+		 * is aligned on a 32/64 bit boundary.
+		 */
+		for (; count && dest.uptr % bytes_long; count--)
+			*dest.u8++ = c;
+#endif
+
+		/* Fill using the largest size allowed */
+		for (; count >= bytes_long; count -= bytes_long)
+			*dest.ulong++ = cu;
+	}
+
+	/* Fill the remainder one byte at a time */
+	while (count--)
+		*dest.u8++ = c;
+
+	return s;
+}
+EXPORT_SYMBOL(memset);
+
+void *__memset(void *s, int c, size_t count)
+{
+	return memset(s, c, count);
+}
+EXPORT_SYMBOL(__memset);
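
For readers who want to try the approach outside the kernel, the
following is a minimal user-space sketch of the same three-phase fill:
byte stores until the destination is aligned, word stores for the bulk,
byte stores for the tail. The names sketch_memset and
sketch_memset_selfcheck are hypothetical, the MIN_THRESHOLD shortcut and
the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS case are left out, and the
fill value is reduced to unsigned char before it is replicated, as the C
standard specifies for memset. It is an illustration under those
assumptions, not the code this patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for the word-at-a-time fill. */
void *sketch_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;
	const unsigned char byte = (unsigned char)c;	/* low 8 bits only */
	unsigned long word = byte;
	size_t i;

	/* Replicate the byte into every byte of an unsigned long. */
	for (i = 1; i < sizeof(word); i <<= 1)
		word |= word << (i * 8);

	/* Head: byte stores until p is aligned for unsigned long. */
	while (count && (uintptr_t)p % sizeof(word)) {
		*p++ = byte;
		count--;
	}

	/*
	 * Body: one word per iteration. memcpy() is used for the store
	 * so the sketch does not depend on pointer-aliasing rules.
	 */
	while (count >= sizeof(word)) {
		memcpy(p, &word, sizeof(word));
		p += sizeof(word);
		count -= sizeof(word);
	}

	/* Tail: whatever is left, one byte at a time. */
	while (count--)
		*p++ = byte;

	return s;
}

/* Compare against the libc memset over a range of offsets and lengths. */
void sketch_memset_selfcheck(void)
{
	unsigned char got[64], want[64];
	const int values[] = { 0, 0x5a, 0x15a, -128 };
	size_t v, off, len;

	for (v = 0; v < sizeof(values) / sizeof(values[0]); v++)
		for (off = 0; off < 2 * sizeof(unsigned long); off++)
			for (len = 0; off + len <= 48; len++) {
				memset(got, 0xee, sizeof(got));
				memset(want, 0xee, sizeof(want));
				sketch_memset(got + off, values[v], len);
				memset(want + off, values[v], len);
				assert(memcmp(got, want, sizeof(got)) == 0);
			}
}
```

Calling sketch_memset_selfcheck() from a small main() exercises
misaligned starts, short lengths, and fill values outside the 0..255
range (0x15a and -128), the cases where a word-based fill is most likely
to diverge from the byte-based one.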