From patchwork Mon Mar 25 17:14:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13602535
From: Kairui Song
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Andrew Morton , linux-kernel@vger.kernel.org,
	Kairui Song
Subject: [PATCH v2 4/4] mm/filemap: optimize filemap folio adding
Date: Tue, 26 Mar 2024 01:14:05 +0800
Message-ID: <20240325171405.99971-5-ryncsn@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240325171405.99971-1-ryncsn@gmail.com>
References: <20240325171405.99971-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

Instead of doing multiple tree walks, do one optimistic range check
with the lock held, and exit if we raced with another insertion. If a
shadow entry exists, check it with the new xas_get_order helper before
releasing the lock, to avoid a redundant tree walk just to retrieve
its order.

Drop the lock and do the allocation only if a split is needed.

In the best case, the tree only needs to be walked once. If an
allocation and split are needed, three walks are issued (one for the
initial ranged conflict check and order retrieval, one for the
recheck after the allocation, and one for the insert after the
split).

Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
bw (  MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops        : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691

After (+7.3%):
bw (  MiB/s): min=  493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops        : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651

Test result with THP (do a THP randread first, then switch to 4K
pages, in the hope that this triggers a lot of splitting):

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap -thp=1 --readonly \
    --rw=randread --time_based --ramp_time=30s --runtime=10m \
    --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap \
    --rw=randread --time_based --runtime=5s --group_reporting

Before:
bw (  KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops        : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976
READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec

After (+12.5%):
bw (  KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops        : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec

The performance is better for both 4K (+7.3%) and THP (+12.5%)
cached read.
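
To make the flow easier to review, here is the new insertion loop in
condensed form (a sketch derived from the mm/filemap.c diff below,
assuming its surrounding declarations of xas, folio, gfp and
alloced_order; error handling, the alloced_shadow recheck and the
stats accounting are trimmed -- this is not the literal code):

	for (;;) {
		int order = -1, split_order = 0;
		void *entry, *old = NULL;

		xas_lock_irq(&xas);
		/* Walk 1 (or walk 2 on retry): conflict check and order
		 * retrieval in a single locked pass. */
		xas_for_each_conflict(&xas, entry) {
			old = entry;
			if (!xa_is_value(entry)) {
				/* Raced with another insertion, bail out. */
				xas_set_err(&xas, -EEXIST);
				goto unlock;
			}
			if (order == -1)
				order = xas_get_order(&xas);
		}

		if (old && order > folio_order(folio)) {
			if (!alloced_order) {
				/* Split memory needed: drop the lock first. */
				split_order = order;
				goto unlock;
			}
			/* Walk 3: split the larger shadow entry, then insert. */
			xas_split(&xas, old, order);
			xas_reset(&xas);
		}

		xas_store(&xas, folio);
unlock:
		xas_unlock_irq(&xas);

		/* Allocate the split memory unlocked, then retry. */
		if (split_order) {
			xas_split_alloc(&xas, old, split_order, gfp);
			alloced_order = split_order;
			xas_reset(&xas);
			continue;
		}
		if (!xas_nomem(&xas, gfp))
			break;
	}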
Signed-off-by: Kairui Song
---
 lib/test_xarray.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c      | 56 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 98 insertions(+), 15 deletions(-)

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 26e28b65d60a..f0e02a1ee3d5 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -2015,6 +2015,62 @@ static noinline void check_xas_get_order(struct xarray *xa)
 	}
 }
 
+static noinline void check_xas_conflict_get_order(struct xarray *xa)
+{
+	XA_STATE(xas, xa, 0);
+
+	void *entry;
+	int only_once;
+	unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1;
+	unsigned int order;
+	unsigned long i, j, k;
+
+	for (order = 0; order < max_order; order++) {
+		for (i = 0; i < 10; i++) {
+			xas_set_order(&xas, i << order, order);
+			do {
+				xas_lock(&xas);
+				xas_store(&xas, xa_mk_value(i));
+				xas_unlock(&xas);
+			} while (xas_nomem(&xas, GFP_KERNEL));
+
+			/*
+			 * Ensure xas_get_order works with xas_for_each_conflict.
+			 */
+			j = i << order;
+			for (k = 0; k < order; k++) {
+				only_once = 0;
+				xas_lock(&xas);
+				xas_set_order(&xas, j + (1 << k), k);
+				xas_for_each_conflict(&xas, entry) {
+					XA_BUG_ON(xa, entry != xa_mk_value(i));
+					XA_BUG_ON(xa, xas_get_order(&xas) != order);
+					only_once++;
+				}
+				XA_BUG_ON(xa, only_once != 1);
+				xas_unlock(&xas);
+			}
+
+			if (order < max_order - 1) {
+				only_once = 0;
+				xas_lock(&xas);
+				xas_set_order(&xas, (i & ~1UL) << order, order + 1);
+				xas_for_each_conflict(&xas, entry) {
+					XA_BUG_ON(xa, entry != xa_mk_value(i));
+					XA_BUG_ON(xa, xas_get_order(&xas) != order);
+					only_once++;
+				}
+				XA_BUG_ON(xa, only_once != 1);
+				xas_unlock(&xas);
+			}
+
+			xas_set_order(&xas, i << order, order);
+			xas_store(&xas, NULL);
+		}
+	}
+}
+
+
 static noinline void check_destroy(struct xarray *xa)
 {
 	unsigned long index;
@@ -2067,6 +2123,7 @@ static int xarray_checks(void)
 	check_multi_store_advanced(&array);
 	check_get_order(&array);
 	check_xas_get_order(&array);
+	check_xas_conflict_get_order(&array);
 	check_xa_alloc();
 	check_find(&array);
 	check_find_entry(&array);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6bbec8783793..90b86f22a9df 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -852,7 +852,9 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
-	bool huge = folio_test_hugetlb(folio);
+	void *alloced_shadow = NULL;
+	int alloced_order = 0;
+	bool huge;
 	long nr;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -861,6 +863,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 
 	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
 	xas_set_order(&xas, index, folio_order(folio));
+	huge = folio_test_hugetlb(folio);
 	nr = folio_nr_pages(folio);
 
 	gfp &= GFP_RECLAIM_MASK;
@@ -868,16 +871,10 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	folio->mapping = mapping;
 	folio->index = xas.xa_index;
 
-	do {
-		unsigned int order = xa_get_order(xas.xa, xas.xa_index);
+	for (;;) {
+		int order = -1, split_order = 0;
 		void *entry, *old = NULL;
 
-		if (order > folio_order(folio)) {
-			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
-					order, gfp);
-			if (xas_error(&xas))
-				goto error;
-		}
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
 			old = entry;
@@ -885,19 +882,33 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 				xas_set_err(&xas, -EEXIST);
 				goto unlock;
 			}
+			/*
+			 * If a larger entry exists,
+			 * it will be the first and only entry iterated.
+			 */
+			if (order == -1)
+				order = xas_get_order(&xas);
+		}
+
+		/* entry may have changed before we re-acquire the lock */
+		if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
+			xas_destroy(&xas);
+			alloced_order = 0;
 		}
 
 		if (old) {
-			if (shadowp)
-				*shadowp = old;
-			/* entry may have been split before we acquired lock */
-			order = xa_get_order(xas.xa, xas.xa_index);
-			if (order > folio_order(folio)) {
+			if (order > 0 && order > folio_order(folio)) {
 				/* How to handle large swap entries? */
 				BUG_ON(shmem_mapping(mapping));
+				if (!alloced_order) {
+					split_order = order;
+					goto unlock;
+				}
 				xas_split(&xas, old, order);
 				xas_reset(&xas);
 			}
+			if (shadowp)
+				*shadowp = old;
 		}
 
 		xas_store(&xas, folio);
@@ -913,9 +924,24 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 			__lruvec_stat_mod_folio(folio, NR_FILE_THPS, nr);
 		}
 
+unlock:
 		xas_unlock_irq(&xas);
-	} while (xas_nomem(&xas, gfp));
+
+		/* split needed, alloc here and retry. */
+		if (split_order) {
+			xas_split_alloc(&xas, old, split_order, gfp);
+			if (xas_error(&xas))
+				goto error;
+			alloced_shadow = old;
+			alloced_order = split_order;
+			xas_reset(&xas);
+			continue;
+		}
+
+		if (!xas_nomem(&xas, gfp))
+			break;
+	}
 
 	if (xas_error(&xas))
 		goto error;