From patchwork Mon Mar 30 11:32:20 2020
X-Patchwork-Submitter: Mikulas Patocka
X-Patchwork-Id: 11465289
Date: Mon, 30 Mar 2020 07:32:20 -0400 (EDT)
From: Mikulas Patocka
To: Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny, Mike Snitzer
CC: linux-nvdimm@lists.01.org, dm-devel@redhat.com
Subject: [PATCH v2] memcpy_flushcache: use cache flushing for larger lengths

This is the second version of the patch. It adds a test for
boot_cpu_data.x86_clflush_size: there may be CPUs with a different cache
line size, and we don't want to run the 64-byte aligned loop on them.

Mikulas


From: Mikulas Patocka

memcpy_flushcache: use cache flushing for larger lengths

I tested dm-writecache performance on a machine with Optane nvdimm and it
turned out that for larger writes, cached stores + cache flushing perform
better than non-temporal stores. This is the throughput of dm-writecache
measured with this command:
dd if=/dev/zero of=/dev/mapper/wc bs=64 oflag=direct

block size    512         1024        2048        4096
movnti        496 MB/s    642 MB/s    725 MB/s    744 MB/s
clflushopt    373 MB/s    688 MB/s    1.1 GB/s    1.2 GB/s

We can see that for smaller blocks, movnti performs better, but for larger
blocks, clflushopt has better performance.

This patch changes the function __memcpy_flushcache accordingly, so that
with size >= 768 it performs cached stores and cache flushing. Note that
we must not use the new branch if the CPU doesn't have clflushopt - in
that case, the kernel would use the inefficient "clflush" instruction, which
has very bad performance.
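
The branch selection described above can be modelled in plain C. This is
only an illustrative userspace sketch, not the kernel code: struct copy_plan
and plan_copy() are hypothetical names, and it assumes dest is already
8-byte aligned when the new branch is reached (the 8-byte movnti stepping
in the patch implies this). The 768-byte threshold and the 64-byte line
size come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the patch; all names below are hypothetical. */
#define FLUSH_THRESHOLD 768
#define CACHE_LINE      64

struct copy_plan {
	bool   use_clflushopt; /* cached stores + clflushopt branch is taken */
	size_t head_bytes;     /* 8-byte movnti steps until dest is 64-byte aligned */
	size_t line_bytes;     /* whole 64-byte lines, each followed by a flush */
	size_t tail_bytes;     /* remainder left to the existing movnti loops */
};

static struct copy_plan plan_copy(uintptr_t dest, size_t size,
				  bool has_clflushopt, unsigned clflush_size)
{
	/* Default: everything goes through the existing movnti path. */
	struct copy_plan p = { false, 0, 0, size };

	/* Assumption: the caller has already aligned dest to 8 bytes. */
	assert(dest % 8 == 0);

	if (!has_clflushopt || size < FLUSH_THRESHOLD ||
	    clflush_size != CACHE_LINE)
		return p;

	p.use_clflushopt = true;
	/* Bytes needed to reach the next 64-byte boundary (0..56). */
	p.head_bytes = (CACHE_LINE - dest % CACHE_LINE) % CACHE_LINE;
	/* The do/while loop runs as long as at least 64 bytes remain. */
	p.line_bytes = (size - p.head_bytes) / CACHE_LINE * CACHE_LINE;
	p.tail_bytes = size - p.head_bytes - p.line_bytes;
	return p;
}
```

For example, a 1000-byte copy to an address 8 bytes past a cache line
boundary would be split into 56 head bytes, 896 bytes of flushed lines and
a 48-byte tail, while a 512-byte copy stays entirely on the movnti path.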

Signed-off-by: Mikulas Patocka

---
 arch/x86/lib/usercopy_64.c |   36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Index: linux-2.6/arch/x86/lib/usercopy_64.c
===================================================================
--- linux-2.6.orig/arch/x86/lib/usercopy_64.c	2020-03-24 15:15:36.644945091 -0400
+++ linux-2.6/arch/x86/lib/usercopy_64.c	2020-03-30 07:17:51.450290007 -0400
@@ -152,6 +152,42 @@ void __memcpy_flushcache(void *_dst, con
 		return;
 	}
 
+	if (static_cpu_has(X86_FEATURE_CLFLUSHOPT) && size >= 768 && likely(boot_cpu_data.x86_clflush_size == 64)) {
+		while (!IS_ALIGNED(dest, 64)) {
+			asm("movq (%0), %%r8\n"
+			    "movnti %%r8, (%1)\n"
+			    :: "r" (source), "r" (dest)
+			    : "memory", "r8");
+			dest += 8;
+			source += 8;
+			size -= 8;
+		}
+		do {
+			asm("movq (%0), %%r8\n"
+			    "movq 8(%0), %%r9\n"
+			    "movq 16(%0), %%r10\n"
+			    "movq 24(%0), %%r11\n"
+			    "movq %%r8, (%1)\n"
+			    "movq %%r9, 8(%1)\n"
+			    "movq %%r10, 16(%1)\n"
+			    "movq %%r11, 24(%1)\n"
+			    "movq 32(%0), %%r8\n"
+			    "movq 40(%0), %%r9\n"
+			    "movq 48(%0), %%r10\n"
+			    "movq 56(%0), %%r11\n"
+			    "movq %%r8, 32(%1)\n"
+			    "movq %%r9, 40(%1)\n"
+			    "movq %%r10, 48(%1)\n"
+			    "movq %%r11, 56(%1)\n"
+			    :: "r" (source), "r" (dest)
+			    : "memory", "r8", "r9", "r10", "r11");
+			clflushopt((void *)dest);
+			dest += 64;
+			source += 64;
+			size -= 64;
+		} while (size >= 64);
+	}
+
 	/* 4x8 movnti loop */
 	while (size >= 32) {
 		asm("movq (%0), %%r8\n"