From patchwork Wed Apr 17 16:08:34 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13633564
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, "Huang, Ying", Matthew Wilcox, Chris Li, Barry Song,
 Ryan Roberts, Neil Brown, Minchan Kim, Hugh Dickins, David Hildenbrand,
 Yosry Ahmed, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH 0/8] mm/swap: optimize swap cache search space
Date: Thu, 18 Apr 2024 00:08:34 +0800
Message-ID: <20240417160842.76665-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

Currently we use one swap_address_space for every 64M chunk to
reduce lock contention. This is like having a set of smaller swap files
inside one big swap file. But when doing a swap cache lookup or insert,
we still use the offset within the whole large swap file. This is OK for
correctness, as the offset (key) is unique.

But XArray is specially optimized for small indexes: it creates the
radix tree levels lazily, just enough to fit the largest key stored in
one XArray. So we are wasting tree nodes unnecessarily.

A 64M chunk should take at most 3 levels to contain everything. But we
are using the offset from the whole swap file, so the offset (key) value
will be way beyond 64M, and so will the tree level.

Optimize this by reducing the swap cache search space to the 64M scope.

Test with `time memhog 128G` inside an 8G memcg using 128G swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped, tested 3 times, results are stable. The
test result is similar but the improvement is smaller if
SWP_SYNCHRONOUS_IO is enabled, as the swap out path can never skip the
swap cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (+1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before:
94055.61 qps

After (+0.8% faster):
94834.91 qps

There is also a very slight drop in radix tree node slab usage:
Before: 303952K
After:  302224K

For this series:

There are multiple places that expect mixed types of pages (page cache
or swap cache), e.g. migration and huge memory split. There are four
helpers for that:

- page_index
- page_file_offset
- folio_index
- folio_file_pos

So this series first cleans up the usage of page_index and
page_file_offset, then converts folio_index and folio_file_pos to be
compatible with separate offsets.
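
To make the tree-level argument concrete, here is a minimal userspace
sketch of the arithmetic. It is not kernel code: the helper names are
hypothetical, and it assumes 4K pages and that each XArray level
resolves 6 bits of the key (the usual XA_CHUNK_SHIFT on default
configs).

```c
#include <assert.h>

/* Assumption: bits of the key resolved per XArray level. */
#define SKETCH_XA_CHUNK_SHIFT 6
/* Assumption: 4K pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: number of tree levels needed so that max_index
 * fits, consuming SKETCH_XA_CHUNK_SHIFT bits of the key per level.
 */
static unsigned int sketch_xa_levels(unsigned long long max_index)
{
	unsigned int levels = 0;

	while (max_index) {
		max_index >>= SKETCH_XA_CHUNK_SHIFT;
		levels++;
	}
	return levels;
}

/* Hypothetical helper: largest page index inside a region of `bytes` bytes. */
static unsigned long long sketch_max_index(unsigned long long bytes)
{
	assert(bytes >= SKETCH_PAGE_SIZE);
	return bytes / SKETCH_PAGE_SIZE - 1;
}
```

Under these assumptions, sketch_xa_levels(sketch_max_index(64ULL << 20))
gives 3 for a 64M chunk, while the same call with 128ULL << 30 gives 5
for a key taken from a whole 128G swap file. The real depth depends on
the largest key actually stored, so this is an upper bound per tree.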
It then introduces a new helper, swap_cache_index, for swap internal
usage, and replaces swp_offset with swap_cache_index where it is used to
retrieve a folio from the swap cache.

Ideally, we may want to reduce SWAP_ADDRESS_SPACE_SHIFT from 14 to 12:
the default XArray chunk shift is 6, so we have 3-level trees instead of
2-level trees just for 2 extra bits. But the swap cache is based on
struct address_space; with 4 times more metadata sparsely distributed in
memory it wastes more cachelines, and the performance gain from this
series is almost canceled. So for now, just have a cleaner separation of
offsets.

Patch 1/8 - 6/8: Clean up usage of page_index and page_file_offset
Patch 7/8: Convert folio_index and folio_file_pos to be compatible with
  separate offset.
Patch 8/8: Introduce swap_cache_index and use it when doing lookup in
  swap cache.

This series is part of the effort to reduce swap cache overhead, and
ultimately remove SWP_SYNCHRONOUS_IO and unify swap cache usage as
proposed before:
https://lore.kernel.org/lkml/20240326185032.72159-1-ryncsn@gmail.com/

Kairui Song (8):
  NFS: remove nfs_page_lengthg and usage of page_index
  nilfs2: drop usage of page_index
  f2fs: drop usage of page_index
  ceph: drop usage of page_index
  cifs: drop usage of page_file_offset
  mm/swap: get the swap file offset directly
  mm: drop page_index/page_file_offset and convert swap helpers to use
    folio
  mm/swap: reduce swap cache search space

 fs/ceph/dir.c           |  2 +-
 fs/ceph/inode.c         |  2 +-
 fs/f2fs/data.c          |  5 ++---
 fs/nfs/internal.h       | 19 -------------------
 fs/nilfs2/bmap.c        |  3 +--
 fs/smb/client/file.c    |  2 +-
 include/linux/mm.h      | 13 -------------
 include/linux/pagemap.h | 19 +++++++++----------
 mm/huge_memory.c        |  2 +-
 mm/memcontrol.c         |  2 +-
 mm/mincore.c            |  2 +-
 mm/page_io.c            |  6 +++---
 mm/shmem.c              |  2 +-
 mm/swap.h               | 12 ++++++++++++
 mm/swap_state.c         | 12 ++++++------
 mm/swapfile.c           | 17 +++++++++++------
 16 files changed, 51 insertions(+), 69 deletions(-)
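
As a rough illustration of the offset split behind swap_cache_index, the
userspace sketch below models one way a whole-file swap offset could be
divided into an address space number and an index local to that 64M
space. The sketch_* names and the masking approach are assumptions made
for illustration; only SWAP_ADDRESS_SPACE_SHIFT = 14 comes from this
letter, and the actual helper in patch 8/8 may differ.

```c
#include <assert.h>

/* From the cover letter: 2^14 pages per swap address space. */
#define SKETCH_SPACE_SHIFT 14
#define SKETCH_SPACE_PAGES (1UL << SKETCH_SPACE_SHIFT)
#define SKETCH_SPACE_MASK  (SKETCH_SPACE_PAGES - 1)

/* Hypothetical: which per-chunk address space a swap offset falls in. */
static inline unsigned long sketch_space_nr(unsigned long offset)
{
	return offset >> SKETCH_SPACE_SHIFT;
}

/* Hypothetical: the key used inside that address space's tree. */
static inline unsigned long sketch_cache_index(unsigned long offset)
{
	return offset & SKETCH_SPACE_MASK;
}

/* Hypothetical: rebuild the whole-file offset from the two parts. */
static inline unsigned long sketch_join(unsigned long nr, unsigned long index)
{
	assert(index < SKETCH_SPACE_PAGES);
	return (nr << SKETCH_SPACE_SHIFT) | index;
}
```

With this split every key handed to a single tree is below 2^14, however
large the swap file is, and the pair (space number, index) still
identifies the entry uniquely.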