X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 13158049
From: Luis Chamberlain
To: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org,
    brauner@kernel.org
Cc: linux-mm@kvack.org, p.raghav@samsung.com, da.gomez@samsung.com,
    a.manzanares@samsung.com, dave@stgolabs.net, yosryahmed@google.com,
    keescook@chromium.org, mcgrof@kernel.org, patches@lists.linux.dev,
    linux-kernel@vger.kernel.org
Subject: [PATCH 0/6] tmpfs: add the option to disable swap
Date: Thu, 2 Mar 2023 15:27:52 -0800
Message-Id: <20230302232758.888157-1-mcgrof@kernel.org>

After a couple of RFCs I think this is ready for PATCH form. Review is
appreciated. Below the changes I also list the series of tests I
performed to verify correctness.

In short, you either create a fs with swap or without, and you can't
change that option later. If we really wanted to, we could work on
accepting this change on reconfigure (remount), but it's not clear yet
that this is desirable, so for now keep things simple.

Changes since last RFCv2:

  o Added Christian Brauner's Acked-by for the noswap patch (the only
    change in that patch is the new shmem_show_options() change I
    describe below).
  o Embraced Yosry Ahmed's recommendation to use
    mapping_set_unevictable() to ensure the folios at least appear in
    the unevictable LRU. Since that is the goal, this accomplishes what
    we want and the VM takes care of things for us. The shmem
    writepage() still uses a stop-gap to ensure we don't get called for
    swap when shmem uses mapping_set_unevictable().
  o I had evaluated using shmem_lock() instead of calling
    mapping_set_unevictable(), but upon review this doesn't make much
    sense, as shmem_lock() was designed to make use of RLIMIT_MEMLOCK,
    and that was designed for files / IPC / unprivileged perf limits.
    If we were to use shmem_lock() we'd bump the count on each new
    inode. Using shmem_lock() would also complicate inode allocation on
    shmem, as we'd have to unwind on failure from user_shm_lock(). It
    would also raise the question of when to capture a ucount for an
    inode: should we just share one for the superblock at
    shmem_fill_super(), or do we really need to capture it at every
    single inode creation? In theory we could end up with different
    limits. The simple solution is to just use
    mapping_set_unevictable() upon inode creation and be done with it,
    as it cannot fail.
  o Updated the documentation for tmpfs before / after my patch to
    reflect use cases a bit more clearly between ramfs, tmpfs and brd
    ramdisks.
  o Updated shmem_show_options() to also reveal the noswap option when
    it's used.
  o Addressed a checkpatch style complaint about spaces before tabs in
    shmem_fs.h.

Changes since first RFC:

  o Matthew suggested BUG_ON(!folio_test_locked(folio)) is not needed
    on the writepage() callback for shmem, so just remove that.
  o Based on Matthew's feedback, the inode is set up early as it is not
    reset in case we split the folio. So now we move all the variables
    we can set up really early.
  o shmem writepage() should only be issued on reclaim, so just move
    the WARN_ON_ONCE(!wbc->for_reclaim) early so that the code and
    expectations are easier to read. This also avoids the folio
    splitting in that odd case.
  o There are a few cases where the shmem writepage() could possibly
    hit, but in the total_swap_pages case we just bail out. We
    shouldn't be splitting the folio then. Likewise for the VM_LOCKED
    case. But a writepage() in the VM_LOCKED case is not expected, so
    we want to learn about it; add a WARN_ON_ONCE() on that condition.
  o Based on Yosry Ahmed's feedback, the patch which allows tmpfs to
    disable swap now just uses mapping_set_unevictable() on inode
    creation. In that case writepage() should not be called, so we
    augment the WARN_ON_ONCE() for writepage() for that case to ensure
    that never happens.

To test I've used a kdevops [0] 8 vCPU 4 GiB libvirt guest on
linux-next.

I'm doing this work as part of future experimentation with tmpfs and
the page cache, but given that a common complaint about tmpfs is the
inability to work without the page cache, I figured this might be
useful to others. It turns out it is -- at least Christian Brauner
indicates systemd uses ramfs for a few use cases because they don't
want to use swap, and so having this option would let them move over to
using tmpfs for those small use cases, see systemd-creds(1).
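
Since shmem_show_options() now reveals noswap, a script can check
whether a given tmpfs mount was created with it. The following is only a
minimal sketch: it assumes the option shows up as a plain "noswap" entry
in the comma-separated options field of /proc/mounts, and
has_mount_option is a hypothetical helper name, not something added by
this series.

```shell
# has_mount_option MOUNTPOINT OPTION [MOUNTS_FILE]
# Hypothetical helper: succeeds if the tmpfs mounted at MOUNTPOINT lists
# OPTION in its options field. MOUNTS_FILE defaults to /proc/mounts and
# is only a parameter so the function can be tried on sample data.
# Mount points containing spaces (escaped as \040) are not handled.
has_mount_option() {
    mnt=$1
    opt=$2
    file=${3:-/proc/mounts}
    awk -v mnt="$mnt" -v opt="$opt" '
        # Fields: device, mount point, fs type, options, dump, pass.
        $2 == mnt && $3 == "tmpfs" {
            # If the mount point appears more than once, the last
            # (topmost) entry decides the result.
            found = 0
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == opt)
                    found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

For example, "has_mount_option /data-tmpfs noswap && echo noswap" would
print "noswap" only for a mount created with -o noswap, under the
assumption above.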

To see if you hit swap:

mkswap /dev/nvme2n1
swapon /dev/nvme2n1
free -h

With swap - what we see today
=============================
mount -t tmpfs -o size=5G tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.2Gi       2.2Gi       1.2Gi
Swap:           99Gi       2.8Gi        97Gi

Without swap
=============

free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       387Mi       3.4Gi       2.1Mi        57Mi       3.3Gi
Swap:           99Gi          0B        99Gi
mount -t tmpfs -o size=5G -o noswap tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.3Gi       2.3Gi       1.1Gi
Swap:           99Gi        21Mi        99Gi

The mix and match remount testing
=================================

# Cannot disable swap after it was first enabled:

mount -t tmpfs -o size=5G tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot disable swap on remount

# Remount with the same noswap option is OK:

mount -t tmpfs -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
dmesg -c

# Trying to enable swap with a remount after it was first disabled:

mount -t tmpfs -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot enable swap on remount if it was disabled on first mount

[0] https://github.com/linux-kdevops/kdevops

Luis Chamberlain (6):
  shmem: remove check for folio lock on writepage()
  shmem: set shmem_writepage() variables early
  shmem: move reclaim check early on writepages()
  shmem: skip page split if we're not reclaiming
  shmem: update documentation
  shmem: add support to ignore swap

 Documentation/filesystems/tmpfs.rst  | 36 +++++++++-----
 Documentation/mm/unevictable-lru.rst |  2 +
 include/linux/shmem_fs.h             |  1 +
 mm/shmem.c                           | 70 +++++++++++++++++++---------
 4 files changed, 75 insertions(+), 34 deletions(-)

Reviewed-by: Christian Brauner
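
The remount rules exercised in the mix and match testing can be
summarized as a small shell model. This is only an illustration of the
observed behavior, not the kernel logic: remount_check is a hypothetical
helper, and the swap -> swap case is assumed to be allowed since that is
what happens today.

```shell
# remount_check FIRST REMOUNT
# Hypothetical model of the remount rules. FIRST is how the tmpfs was
# first mounted and REMOUNT is what the remount asks for; each is either
# "swap" or "noswap". Returns 0 if the remount would be accepted,
# 1 if it would be rejected, 2 on bad arguments.
remount_check() {
    first=$1
    remount=$2
    case "$first:$remount" in
        swap:swap|noswap:noswap)
            # Same setting as the first mount: nothing changes.
            return 0
            ;;
        swap:noswap)
            echo "tmpfs: Cannot disable swap on remount" >&2
            return 1
            ;;
        noswap:swap)
            echo "tmpfs: Cannot enable swap on remount if it was disabled on first mount" >&2
            return 1
            ;;
        *)
            echo "usage: remount_check swap|noswap swap|noswap" >&2
            return 2
            ;;
    esac
}
```

The messages printed on rejection are the ones shown by dmesg in the
tests; everything else in the sketch is illustrative.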