From patchwork Wed Oct 5 18:03:37 2022
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 12999543
From: Yang Shi <shy828301@gmail.com>
To: mgorman@techsingularity.net, agk@redhat.com, snitzer@kernel.org,
    dm-devel@redhat.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/4] Introduce mempool pages bulk allocator and use it in dm-crypt
Date: Wed, 5 Oct 2022 11:03:37 -0700
Message-Id: <20221005180341.1738796-1-shy828301@gmail.com>
X-Mailer: git-send-email 2.26.3

We have full disk encryption enabled, and profiling shows that page
allocations may incur noticeable overhead when writing. For a write,
dm-crypt creates an "out" bio and fills it with the same number of pages
as the "in" bio, but the driver allocates one page at a time in a loop.
For a 1M bio this means the driver has to call the page allocator 256
times, which is not very efficient.

Since v5.13 the kernel has supported a page bulk allocator, so dm-crypt
could use it to do page allocations more efficiently. I could just call
the page bulk allocator in the dm-crypt driver before falling back to the
mempool allocator, but that seems ad hoc, and a quick search shows others
doing similar things, for example f2fs compression, block bounce, gfs2,
ufs, etc. So it seems neater to implement a bulk allocation API for
mempool.

So this series introduces the mempool page bulk allocator. The below APIs
are introduced:

  - mempool_init_pages_bulk()
  - mempool_create_pages_bulk()

They initialize the mempool for the page bulk allocator. The pool is
filled by alloc_page() in a loop.
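To make the intended usage concrete, here is a minimal creation sketch.
The actual prototypes are defined in patch 2/4 ("mm: mempool: introduce
page bulk allocator"), which is not quoted in this cover letter, so the
(min_nr, order) form below is an assumption modeled on the existing
mempool_create_page_pool(); the pool name is made up for illustration:

	#include <linux/mempool.h>

	/* Enough order-0 pages to back one 1M bio (256 pages). */
	#define CRYPT_POOL_PAGES	256

	static mempool_t *crypt_page_pool;

	static int crypt_pool_init(void)
	{
		/*
		 * Assumed constructor mirroring mempool_create_page_pool();
		 * the pool itself is pre-filled by alloc_page() in a loop,
		 * as described above.
		 */
		crypt_page_pool = mempool_create_pages_bulk(CRYPT_POOL_PAGES, 0);
		return crypt_page_pool ? 0 : -ENOMEM;
	}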
  - mempool_alloc_pages_bulk_list()
  - mempool_alloc_pages_bulk_array()

They do bulk allocation from the mempool. Conceptually they do the
following:

  1. Call the bulk page allocator.
  2. If the allocation is fulfilled, return; otherwise try to allocate
     the remaining pages from the mempool.
  3. If that is fulfilled, return; otherwise retry from #1 with a
     sleepable gfp.
  4. If it still fails, sleep for a while to wait for the mempool to be
     refilled, then retry from #1.

The populated pages stay on the list or in the array until the callers
consume or free them.

Since the mempool allocator is guaranteed to succeed in a sleepable
context, the two APIs return true for success and false for failure. It
is the caller's responsibility to handle the failure case (partial
allocation), just as with the page bulk allocator; see the caller-side
sketch below.

A mempool is typically an object-agnostic allocator, but bulk allocation
is only supported for pages, so the mempool bulk allocator is for page
allocation only as well.
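As a caller-side illustration of that contract (bool return, populated
pages left in place on partial failure), here is a hedged sketch. The
parameter order and the gfp argument are assumptions, since the real
prototype comes from patch 2/4; the helper name is made up:

	#include <linux/mempool.h>

	/*
	 * Assumed prototype:
	 *   bool mempool_alloc_pages_bulk_array(mempool_t *pool, gfp_t gfp,
	 *                                       unsigned int nr,
	 *                                       struct page **pages);
	 *
	 * The pages[] array must be zero-initialized by the caller.
	 */
	static int crypt_fill_pages(mempool_t *pool, struct page **pages,
				    unsigned int nr, gfp_t gfp)
	{
		unsigned int i;

		/*
		 * In a sleepable context steps #1-#4 above guarantee
		 * eventual success, so a false return is only expected
		 * for non-sleepable gfp flags.
		 */
		if (mempool_alloc_pages_bulk_array(pool, gfp, nr, pages))
			return 0;

		/* Partial allocation: release whatever was populated. */
		for (i = 0; i < nr; i++) {
			if (pages[i]) {
				mempool_free(pages[i], pool);
				pages[i] = NULL;
			}
		}
		return -ENOMEM;
	}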
With the mempool bulk allocator, the IOPS of dm-crypt with 1M I/O improves
by approximately 6%. The test was done on a VM with 80 vCPUs and 64GB of
memory, on an encrypted ram device (so that the impact of the storage
hardware is minimized and the dm-crypt layer can be benchmarked more
accurately).

Before the patch:

Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=402MiB/s][r=0,w=402 IOPS][eta 00m:00s]
crypt: (groupid=0, jobs=1): err= 0: pid=233950: Thu Sep 15 16:23:10 2022
  write: IOPS=402, BW=403MiB/s (423MB/s)(23.6GiB/60002msec)
    slat (usec): min=2425, max=3819, avg=2480.84, stdev=34.00
    clat (usec): min=7, max=165751, avg=156398.72, stdev=4691.03
     lat (msec): min=2, max=168, avg=158.88, stdev= 4.69
    clat percentiles (msec):
     |  1.00th=[  157],  5.00th=[  157], 10.00th=[  157], 20.00th=[  157],
     | 30.00th=[  157], 40.00th=[  157], 50.00th=[  157], 60.00th=[  157],
     | 70.00th=[  157], 80.00th=[  157], 90.00th=[  157], 95.00th=[  157],
     | 99.00th=[  159], 99.50th=[  159], 99.90th=[  165], 99.95th=[  165],
     | 99.99th=[  167]
   bw (  KiB/s): min=405504, max=413696, per=99.71%, avg=411845.53, stdev=1155.04, samples=120
   iops        : min=  396, max=  404, avg=402.17, stdev= 1.15, samples=120
  lat (usec)   : 10=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.02%, 50=0.05%, 100=0.08%
  lat (msec)   : 250=100.09%
  cpu          : usr=3.74%, sys=95.66%, ctx=27, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=103.1%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,24138,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=403MiB/s (423MB/s), 403MiB/s-403MiB/s (423MB/s-423MB/s), io=23.6GiB (25.4GB), run=60002-60002msec

After the patch:

Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=430MiB/s][r=0,w=430 IOPS][eta 00m:00s]
crypt: (groupid=0, jobs=1): err= 0: pid=288730: Thu Sep 15 16:25:39 2022
  write: IOPS=430, BW=431MiB/s (452MB/s)(25.3GiB/60002msec)
    slat (usec): min=2253, max=3213, avg=2319.49, stdev=34.29
    clat (usec): min=6, max=149337, avg=146257.68, stdev=4239.52
     lat (msec): min=2, max=151, avg=148.58, stdev= 4.24
    clat percentiles (msec):
     |  1.00th=[  146],  5.00th=[  146], 10.00th=[  146], 20.00th=[  146],
     | 30.00th=[  146], 40.00th=[  146], 50.00th=[  146], 60.00th=[  146],
     | 70.00th=[  146], 80.00th=[  146], 90.00th=[  148], 95.00th=[  148],
     | 99.00th=[  148], 99.50th=[  148], 99.90th=[  150], 99.95th=[  150],
     | 99.99th=[  150]
   bw (  KiB/s): min=438272, max=442368, per=99.73%, avg=440463.57, stdev=1305.60, samples=120
   iops        : min=  428, max=  432, avg=430.12, stdev= 1.28, samples=120
  lat (usec)   : 10=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.02%, 50=0.05%, 100=0.09%
  lat (msec)   : 250=100.07%
  cpu          : usr=3.78%, sys=95.37%, ctx=12778, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=103.1%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,25814,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=431MiB/s (452MB/s), 431MiB/s-431MiB/s (452MB/s-452MB/s), io=25.3GiB (27.1GB), run=60002-60002msec

Function tracing also shows that the time consumed by page allocations is
reduced significantly. The test allocated a 1M (256 pages) bio in the same
environment.

Before the patch it took approximately 600us, excluding the
bio_add_page() calls:

  2720.630754 |  56)  xfs_io-38859  |  2.571 us  |  mempool_alloc();
  2720.630757 |  56)  xfs_io-38859  |  0.937 us  |  bio_add_page();
  2720.630758 |  56)  xfs_io-38859  |  1.772 us  |  mempool_alloc();
  2720.630760 |  56)  xfs_io-38859  |  0.852 us  |  bio_add_page();
  ....
  2720.631559 |  56)  xfs_io-38859  |  2.058 us  |  mempool_alloc();
  2720.631561 |  56)  xfs_io-38859  |  0.717 us  |  bio_add_page();
  2720.631562 |  56)  xfs_io-38859  |  2.014 us  |  mempool_alloc();
  2720.631564 |  56)  xfs_io-38859  |  0.620 us  |  bio_add_page();

After the patch it took approximately 30us:

  11564.266385 |  22)  xfs_io-136183  | + 30.551 us  |  __alloc_pages_bulk();

Page allocation overhead is around 6% (600us/9853us) of the time spent in
the dm-crypt layer, as shown by the function trace. This matches the IOPS
improvement shown by fio. The benchmark with 4K I/O doesn't show a
measurable regression.

Yang Shi (4):
      mm: mempool: extract common initialization code
      mm: mempool: introduce page bulk allocator
      md: dm-crypt: move crypt_free_buffer_pages ahead
      md: dm-crypt: use mempool page bulk allocator

 drivers/md/dm-crypt.c   |  92 ++++++++++++++++------------
 include/linux/mempool.h |  19 ++++++
 mm/mempool.c            | 227 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 276 insertions(+), 62 deletions(-)