From patchwork Mon Apr 15 17:18:56 2024
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Matthew Wilcox, Andrew Morton, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 4/4] mm/filemap: optimize filemap folio adding
Date: Tue, 16 Apr 2024 01:18:56 +0800
Message-ID: <20240415171857.19244-5-ryncsn@gmail.com>
In-Reply-To: <20240415171857.19244-1-ryncsn@gmail.com>
References: <20240415171857.19244-1-ryncsn@gmail.com>
Reply-To: Kairui Song <ryncsn@gmail.com>
MIME-Version: 1.0

From: Kairui Song <ryncsn@gmail.com>

Instead of doing multiple tree walks, do one optimistic
range check while holding the lock, and exit if we raced with another
insertion. If a shadow exists, check it with the new xas_get_order
helper before releasing the lock, to avoid a redundant tree walk just
for retrieving its order. Drop the lock and do the allocation only if
a split is needed.

In the best case, it only needs to walk the tree once. If it needs to
alloc and split, three walks are issued (one for the initial ranged
conflict check and order retrieval, one for the re-check after the
allocation, and one for the insert after the split).

Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
  bw (  MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
  iops        : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691

After (+7.3%):
  bw (  MiB/s): min=  493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
  iops        : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651

Test result with THP (do a THP randread first, then switch to 4K reads,
in the hope of triggering a lot of splitting):

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap -thp=1 --readonly \
    --rw=randread --time_based --ramp_time=30s --runtime=10m \
    --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap \
    --rw=randread --time_based --runtime=5s --group_reporting

Before:
  bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
  iops       : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976
  READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec

After (+12.5%):
  bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
  iops       : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
  READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec

The performance is better for both 4K (+7.3%) and THP (+12.5%) cached
read.

Signed-off-by: Kairui Song <ryncsn@gmail.com>
---
 lib/test_xarray.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c      | 56 ++++++++++++++++++++++++++++++++------------
 2 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 0efde8f93490..8732a311f613 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -2017,6 +2017,64 @@ static noinline void check_xas_get_order(struct xarray *xa)
 	}
 }
 
+static noinline void check_xas_conflict_get_order(struct xarray *xa)
+{
+	XA_STATE(xas, xa, 0);
+
+	void *entry;
+	int only_once;
+	unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1;
+	unsigned int order;
+	unsigned long i, j, k;
+
+	for (order = 0; order < max_order; order++) {
+		for (i = 0; i < 10; i++) {
+			xas_set_order(&xas, i << order, order);
+			do {
+				xas_lock(&xas);
+				xas_store(&xas, xa_mk_value(i));
+				xas_unlock(&xas);
+			} while (xas_nomem(&xas, GFP_KERNEL));
+
+			/*
+			 * Ensure xas_get_order works with xas_for_each_conflict.
+			 */
+			j = i << order;
+			for (k = 0; k < order; k++) {
+				only_once = 0;
+				xas_set_order(&xas, j + (1 << k), k);
+				xas_lock(&xas);
+				xas_for_each_conflict(&xas, entry) {
+					XA_BUG_ON(xa, entry != xa_mk_value(i));
+					XA_BUG_ON(xa, xas_get_order(&xas) != order);
+					only_once++;
+				}
+				XA_BUG_ON(xa, only_once != 1);
+				xas_unlock(&xas);
+			}
+
+			if (order < max_order - 1) {
+				only_once = 0;
+				xas_set_order(&xas, (i & ~1UL) << order, order + 1);
+				xas_lock(&xas);
+				xas_for_each_conflict(&xas, entry) {
+					XA_BUG_ON(xa, entry != xa_mk_value(i));
+					XA_BUG_ON(xa, xas_get_order(&xas) != order);
+					only_once++;
+				}
+				XA_BUG_ON(xa, only_once != 1);
+				xas_unlock(&xas);
+			}
+
+			xas_set_order(&xas, i << order, order);
+			xas_lock(&xas);
+			xas_store(&xas, NULL);
+			xas_unlock(&xas);
+		}
+	}
+}
+
+
 static noinline void check_destroy(struct xarray *xa)
 {
 	unsigned long index;
@@ -2069,6 +2127,7 @@ static int xarray_checks(void)
 	check_multi_store_advanced(&array);
 	check_get_order(&array);
 	check_xas_get_order(&array);
+	check_xas_conflict_get_order(&array);
 	check_xa_alloc();
 	check_find(&array);
 	check_find_entry(&array);
diff --git a/mm/filemap.c b/mm/filemap.c
index 17a66ea544e7..7b0b2229d4ed 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -852,7 +852,9 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
-	bool huge = folio_test_hugetlb(folio);
+	void *alloced_shadow = NULL;
+	int alloced_order = 0;
+	bool huge;
 	long nr;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -861,6 +863,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
 
 	xas_set_order(&xas, index, folio_order(folio));
+	huge = folio_test_hugetlb(folio);
 	nr = folio_nr_pages(folio);
 
 	gfp &= GFP_RECLAIM_MASK;
@@ -868,16 +871,10 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	folio->mapping = mapping;
 	folio->index = xas.xa_index;
 
-	do {
-		unsigned int order = xa_get_order(xas.xa, xas.xa_index);
+	for (;;) {
+		int order = -1, split_order = 0;
 		void *entry, *old = NULL;
 
-		if (order > folio_order(folio)) {
-			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
-					order, gfp);
-			if (xas_error(&xas))
-				goto error;
-		}
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
 			old = entry;
@@ -885,19 +882,33 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 				xas_set_err(&xas, -EEXIST);
 				goto unlock;
 			}
+			/*
+			 * If a larger entry exists,
+			 * it will be the first and only entry iterated.
+			 */
+			if (order == -1)
+				order = xas_get_order(&xas);
+		}
+
+		/* entry may have changed before we re-acquire the lock */
+		if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
+			xas_destroy(&xas);
+			alloced_order = 0;
 		}
 
 		if (old) {
-			if (shadowp)
-				*shadowp = old;
-			/* entry may have been split before we acquired lock */
-			order = xa_get_order(xas.xa, xas.xa_index);
-			if (order > folio_order(folio)) {
+			if (order > 0 && order > folio_order(folio)) {
 				/* How to handle large swap entries? */
 				BUG_ON(shmem_mapping(mapping));
+				if (!alloced_order) {
+					split_order = order;
+					goto unlock;
+				}
 				xas_split(&xas, old, order);
 				xas_reset(&xas);
 			}
+			if (shadowp)
+				*shadowp = old;
 		}
 
 		xas_store(&xas, folio);
@@ -913,9 +924,24 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 			__lruvec_stat_mod_folio(folio, NR_FILE_THPS, nr);
 		}
 
+unlock:
 		xas_unlock_irq(&xas);
-	} while (xas_nomem(&xas, gfp));
+
+		/* split needed, alloc here and retry. */
+		if (split_order) {
+			xas_split_alloc(&xas, old, split_order, gfp);
+			if (xas_error(&xas))
+				goto error;
+			alloced_shadow = old;
+			alloced_order = split_order;
+			xas_reset(&xas);
+			continue;
+		}
+
+		if (!xas_nomem(&xas, gfp))
+			break;
+	}
 
 	if (xas_error(&xas))
 		goto error;