From patchwork Fri Jul 30 07:25:37 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410565
Date: Fri, 30 Jul 2021 00:25:37 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Hugh Dickins, Shakeel Butt, "Kirill A. Shutemov", Yang Shi, Miaohe Lin,
    Mike Kravetz, Michal Hocko, Rik van Riel, Christoph Hellwig,
    Matthew Wilcox, "Eric W. Biederman", Alexey Gladkov, Chris Wilson,
    Matthew Auld, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-api@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 01/16] huge tmpfs: fix fallocate(vanilla) advance over huge pages
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

shmem_fallocate() goes to a lot of trouble to leave its newly allocated
pages !Uptodate, partly to identify and undo them on failure, partly to
leave the overhead of clearing them until later.  But the huge page case
did not skip to the end of the extent: it walked through the tail pages
one by one and appeared to work just fine; yet in doing so, it cleared
and Uptodated the huge page, so there was no way to undo it on failure.

Now advance immediately to the end of the huge extent, with a comment on
why this is more than just an optimization.  But although this speeds up
huge tmpfs fallocation, it does leave the clearing until first use, and
some users may have come to appreciate slow fallocate but fast first use:
if they complain, then we can consider adding a pass to clear at the end.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 mm/shmem.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 70d9ce294bb4..0cd5c9156457 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2736,7 +2736,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	inode->i_private = &shmem_falloc;
 	spin_unlock(&inode->i_lock);
 
-	for (index = start; index < end; index++) {
+	for (index = start; index < end; ) {
 		struct page *page;
 
 		/*
@@ -2759,13 +2759,26 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			goto undone;
 		}
 
+		index++;
+		/*
+		 * Here is a more important optimization than it appears:
+		 * a second SGP_FALLOC on the same huge page will clear it,
+		 * making it PageUptodate and un-undoable if we fail later.
+		 */
+		if (PageTransCompound(page)) {
+			index = round_up(index, HPAGE_PMD_NR);
+			/* Beware 32-bit wraparound */
+			if (!index)
+				index--;
+		}
+
 		/*
 		 * Inform shmem_writepage() how far we have reached.
 		 * No need for lock or barrier: we have the page lock.
 		 */
-		shmem_falloc.next++;
 		if (!PageUptodate(page))
-			shmem_falloc.nr_falloced++;
+			shmem_falloc.nr_falloced += index - shmem_falloc.next;
+		shmem_falloc.next = index;
 
 		/*
 		 * If !PageUptodate, leave it that way so that freeable pages
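
The 32-bit wraparound guard above is easy to miss.  For illustration, a
minimal user-space sketch (not kernel code) of the same advance logic: on
a 32-bit kernel pgoff_t is a 32-bit unsigned long, so rounding an index in
the topmost huge extent up to a multiple of HPAGE_PMD_NR wraps to 0, and
the guard clamps it back to the maximum index.  HPAGE_PMD_NR hard-coded to
512 (4kB base pages, 2MB huge pages) and the round_up_32() helper are
assumptions of this sketch, standing in for the kernel's round_up().

	#include <stdio.h>
	#include <stdint.h>

	#define HPAGE_PMD_NR 512	/* assumed: 2MB huge page / 4kB base page */

	/* stand-in for the kernel's round_up(), in 32 bits to model pgoff_t */
	static uint32_t round_up_32(uint32_t x, uint32_t a)
	{
		return (x + a - 1) & ~(a - 1);	/* power-of-two alignment only */
	}

	int main(void)
	{
		uint32_t index = 0xfffffffe;	/* inside the topmost huge extent */

		index++;			/* the small-page advance */
		index = round_up_32(index, HPAGE_PMD_NR);
		/* Beware 32-bit wraparound: rounding up past the top yields 0 */
		if (!index)
			index--;		/* clamp to the maximum index */

		printf("next index: 0x%x\n", (unsigned)index);	/* 0xffffffff */
		return 0;
	}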

From patchwork Fri Jul 30 07:28:22 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410567
Date: Fri, 30 Jul 2021 00:28:22 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 02/16] huge tmpfs: fix split_huge_page() after FALLOC_FL_KEEP_SIZE
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

A successful shmem_fallocate() guarantees that the extent has been
reserved, even beyond i_size when the FALLOC_FL_KEEP_SIZE flag was used.
But that guarantee is broken by shmem_unused_huge_shrink()'s attempts to
split huge pages and free their excess beyond i_size; and by other uses
of split_huge_page() near i_size.

It's sad to add a shmem inode field just for this, but I did not find a
better way to keep the guarantee.  A flag to say KEEP_SIZE has been used
would be cheaper, but I'm averse to unclearable flags.  The fallocend
field is not perfect either (many disjoint ranges might be fallocated),
but good enough; and gains another use later on.

Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 include/linux/shmem_fs.h | 13 +++++++++++++
 mm/huge_memory.c         |  6 ++++--
 mm/shmem.c               | 15 ++++++++++++++-
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 8e775ce517bb..9b7f7ac52351 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -18,6 +18,7 @@ struct shmem_inode_info {
 	unsigned long		flags;
 	unsigned long		alloced;	/* data pages alloced to file */
 	unsigned long		swapped;	/* subtotal assigned to swap */
+	pgoff_t			fallocend;	/* highest fallocate endindex */
 	struct list_head	shrinklist;	/* shrinkable hpage inodes */
 	struct list_head	swaplist;	/* chain of maybes on swap */
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
@@ -119,6 +120,18 @@ static inline bool shmem_file(struct file *file)
 	return shmem_mapping(file->f_mapping);
 }
 
+/*
+ * If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
+ * beyond i_size's notion of EOF, which fallocate has committed to reserving:
+ * which split_huge_page() must therefore not delete.  This use of a single
+ * "fallocend" per inode errs on the side of not deleting a reservation when
+ * in doubt: there are plenty of cases when it preserves unreserved pages.
+ */
+static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
+{
+	return max(eof, SHMEM_I(inode)->fallocend);
+}
+
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index afff3ac87067..890fb73ac89b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2454,11 +2454,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
-		/* Some pages can be beyond i_size: drop them from page cache */
+		/* Some pages can be beyond EOF: drop them from page cache */
 		if (head[i].index >= end) {
 			ClearPageDirty(head + i);
 			__delete_from_page_cache(head + i, NULL);
-			if (IS_ENABLED(CONFIG_SHMEM) && PageSwapBacked(head))
+			if (shmem_mapping(head->mapping))
 				shmem_uncharge(head->mapping->host, 1);
 			put_page(head + i);
 		} else if (!PageAnon(page)) {
@@ -2686,6 +2686,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		 * head page lock is good enough to serialize the trimming.
 		 */
 		end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
+		if (shmem_mapping(mapping))
+			end = shmem_fallocend(mapping->host, end);
 	}
 
 	/*
diff --git a/mm/shmem.c b/mm/shmem.c
index 0cd5c9156457..24c9da6b41c2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -905,6 +905,9 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	if (lend == -1)
 		end = -1;	/* unsigned, so actually very big */
 
+	if (info->fallocend > start && info->fallocend <= end && !unfalloc)
+		info->fallocend = start;
+
 	pagevec_init(&pvec);
 	index = start;
 	while (index < end && find_lock_entries(mapping, index, end - 1,
@@ -2667,7 +2670,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_falloc shmem_falloc;
-	pgoff_t start, index, end;
+	pgoff_t start, index, end, undo_fallocend;
 	int error;
 
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2736,6 +2739,15 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	inode->i_private = &shmem_falloc;
 	spin_unlock(&inode->i_lock);
 
+	/*
+	 * info->fallocend is only relevant when huge pages might be
+	 * involved: to prevent split_huge_page() freeing fallocated
+	 * pages when FALLOC_FL_KEEP_SIZE committed beyond i_size.
+	 */
+	undo_fallocend = info->fallocend;
+	if (info->fallocend < end)
+		info->fallocend = end;
+
 	for (index = start; index < end; ) {
 		struct page *page;
 
@@ -2750,6 +2762,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		else
 			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
 		if (error) {
+			info->fallocend = undo_fallocend;
 			/* Remove the !PageUptodate pages we added */
 			if (index > start) {
 				shmem_undo_range(inode,
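
As a reminder of the guarantee being protected here, a user-space sketch
of fallocate(2) with FALLOC_FL_KEEP_SIZE on tmpfs: st_size stays 0 while
st_blocks shows the reservation beyond EOF, which split_huge_page() must
now preserve.  The sketch assumes /dev/shm is a tmpfs mount, as it is on
most distributions.

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		char path[] = "/dev/shm/keepsize-XXXXXX";	/* assumed tmpfs */
		int fd = mkstemp(path);
		struct stat st;

		if (fd < 0) { perror("mkstemp"); return 1; }
		unlink(path);

		/* reserve 4MB beyond EOF without changing i_size */
		if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 4 << 20)) {
			perror("fallocate");
			return 1;
		}
		fstat(fd, &st);
		/* size stays 0; blocks (512-byte units) shows the reservation */
		printf("size=%lld blocks=%lld\n",
		       (long long)st.st_size, (long long)st.st_blocks);
		close(fd);
		return 0;
	}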

From patchwork Fri Jul 30 07:30:56 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410571
Date: Fri, 30 Jul 2021 00:30:56 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 03/16] huge tmpfs: remove shrinklist addition from shmem_setattr()
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <42353193-6896-aa85-9127-78881d5fef66@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

There's a block of code in shmem_setattr() to add the inode to
shmem_unused_huge_shrink()'s shrinklist when lowering i_size: it dates
from before 5.7 changed truncation to do split_huge_page() for itself,
and should have been removed at that time.

I am overstating that: split_huge_page() can fail (notably if there's
an extra reference to the page at that time), so there might be value in
retrying.  But there were already retries as truncation worked through
the tails, and this addition risks repeating unsuccessful retries
indefinitely: I'd rather remove it now, and work on reducing the chance
of split_huge_page() failures separately, if we need to.

Fixes: 71725ed10c40 ("mm: huge tmpfs: try to split_huge_page() when punching hole")
Signed-off-by: Hugh Dickins
---
 mm/shmem.c | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 24c9da6b41c2..ce3ccaac54d6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1061,7 +1061,6 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 {
 	struct inode *inode = d_inode(dentry);
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	int error;
 
 	error = setattr_prepare(&init_user_ns, dentry, attr);
@@ -1097,24 +1096,6 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 			if (oldsize > holebegin)
 				unmap_mapping_range(inode->i_mapping,
 							holebegin, 0, 1);
-
-			/*
-			 * Part of the huge page can be beyond i_size: subject
-			 * to shrink under memory pressure.
-			 */
-			if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-				spin_lock(&sbinfo->shrinklist_lock);
-				/*
-				 * _careful to defend against unlocked access to
-				 * ->shrink_list in shmem_unused_huge_shrink()
-				 */
-				if (list_empty_careful(&info->shrinklist)) {
-					list_add_tail(&info->shrinklist,
-							&sbinfo->shrinklist);
-					sbinfo->shrinklist_len++;
-				}
-				spin_unlock(&sbinfo->shrinklist_lock);
-			}
 		}
 	}

From patchwork Fri Jul 30 07:36:48 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410579
Date: Fri, 30 Jul 2021 00:36:48 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 04/16] huge tmpfs: revert shmem's use of transhuge_vma_enabled()
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

5.14 commit e6be37b2e7bd ("mm/huge_memory.c: add missing read-only THP
checking in transparent_hugepage_enabled()") added transhuge_vma_enabled()
as a wrapper for two very different checks: shmem_huge_enabled() prefers
to show those two checks explicitly, as before.

Signed-off-by: Hugh Dickins
---
 mm/shmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ce3ccaac54d6..c6fa6f4f2db8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4003,7 +4003,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
 	loff_t i_size;
 	pgoff_t off;
 
-	if (!transhuge_vma_enabled(vma, vma->vm_flags))
+	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
 		return false;
 	if (shmem_huge == SHMEM_HUGE_FORCE)
 		return true;
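
For reference, the MMF_DISABLE_THP half of the restored check is what
prctl(2)'s PR_SET_THP_DISABLE sets on a task's mm; a minimal user-space
sketch of how a task comes to fail that check:

	#include <stdio.h>
	#include <sys/prctl.h>

	int main(void)
	{
		/* sets MMF_DISABLE_THP on this mm; kept across fork() and execve() */
		if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0)) {
			perror("prctl");
			return 1;
		}
		/* shmem_huge_enabled() now returns false for this task's vmas */
		printf("THP disabled: %d\n", prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0));
		return 0;
	}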

From patchwork Fri Jul 30 07:39:24 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410581
Date: Fri, 30 Jul 2021 00:39:24 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 05/16] huge tmpfs: move shmem_huge_enabled() upwards
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

shmem_huge_enabled() is about to be enhanced into shmem_is_huge(), so that
it can be used more widely throughout: before making functional changes,
shift it to its final position (to avoid forward declaration).

Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 mm/shmem.c | 72 ++++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index c6fa6f4f2db8..740d48ef1eb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -476,6 +476,41 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly;
 
+bool shmem_huge_enabled(struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
+	loff_t i_size;
+	pgoff_t off;
+
+	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
+	if (shmem_huge == SHMEM_HUGE_FORCE)
+		return true;
+	if (shmem_huge == SHMEM_HUGE_DENY)
+		return false;
+	switch (sbinfo->huge) {
+	case SHMEM_HUGE_NEVER:
+		return false;
+	case SHMEM_HUGE_ALWAYS:
+		return true;
+	case SHMEM_HUGE_WITHIN_SIZE:
+		off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
+		i_size = round_up(i_size_read(inode), PAGE_SIZE);
+		if (i_size >= HPAGE_PMD_SIZE &&
+		    i_size >> PAGE_SHIFT >= off)
+			return true;
+		fallthrough;
+	case SHMEM_HUGE_ADVISE:
+		/* TODO: implement fadvise() hints */
+		return (vma->vm_flags & VM_HUGEPAGE);
+	default:
+		VM_BUG_ON(1);
+		return false;
+	}
+}
+
 #if defined(CONFIG_SYSFS)
 static int shmem_parse_huge(const char *str)
 {
@@ -3995,43 +4030,6 @@ struct kobj_attribute shmem_enabled_attr =
 	__ATTR(shmem_enabled, 0644, shmem_enabled_show, shmem_enabled_store);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-bool shmem_huge_enabled(struct vm_area_struct *vma)
-{
-	struct inode *inode = file_inode(vma->vm_file);
-	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-	loff_t i_size;
-	pgoff_t off;
-
-	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return false;
-	if (shmem_huge == SHMEM_HUGE_FORCE)
-		return true;
-	if (shmem_huge == SHMEM_HUGE_DENY)
-		return false;
-	switch (sbinfo->huge) {
-	case SHMEM_HUGE_NEVER:
-		return false;
-	case SHMEM_HUGE_ALWAYS:
-		return true;
-	case SHMEM_HUGE_WITHIN_SIZE:
-		off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
-		i_size = round_up(i_size_read(inode), PAGE_SIZE);
-		if (i_size >= HPAGE_PMD_SIZE &&
-		    i_size >> PAGE_SHIFT >= off)
-			return true;
-		fallthrough;
-	case SHMEM_HUGE_ADVISE:
-		/* TODO: implement fadvise() hints */
-		return (vma->vm_flags & VM_HUGEPAGE);
-	default:
-		VM_BUG_ON(1);
-		return false;
-	}
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
 #else /* !CONFIG_SHMEM */
 
 /*

From patchwork Fri Jul 30 07:42:16 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410589
Date: Fri, 30 Jul 2021 00:42:16 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 06/16] huge tmpfs: shmem_is_huge(vma, inode, index)
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

Extend shmem_huge_enabled(vma) to shmem_is_huge(vma, inode, index), so
that a consistent set of checks can be applied, even when the inode is
accessed through read/write syscalls (with NULL vma) instead of mmaps
(the index argument is seldom of interest, but required by mount option
"huge=within_size").  Clean up and rearrange the checks a little.

This then replaces the checks which shmem_fault() and shmem_getpage_gfp()
were making, and eliminates the SGP_HUGE and SGP_NOHUGE modes: while it's
still true that khugepaged's collapse_file() at that point wants a small
page, the race that might allocate it a huge page is too unlikely to be
worth optimizing against (we are there *because* there was at least one
small page in the way), and is handled by a later PageTransCompound check.

Replace a couple of 0s by explicit SHMEM_HUGE_NEVERs; and replace the
obscure !shmem_mapping() symlink check by explicit S_ISLNK() - nothing
else needs that symlink check, so leave it there in shmem_getpage_gfp().

Signed-off-by: Hugh Dickins
---
 include/linux/shmem_fs.h |  9 +++--
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 84 ++++++++++++----------------------------
 3 files changed, 32 insertions(+), 63 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 9b7f7ac52351..3b05a28e34c4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -86,7 +86,12 @@ extern void shmem_truncate_range(struct inode *inode,
 					loff_t start, loff_t end);
 extern int shmem_unuse(unsigned int type, bool frontswap,
 		       unsigned long *fs_pages_to_unuse);
-extern bool shmem_huge_enabled(struct vm_area_struct *vma);
+extern bool shmem_is_huge(struct vm_area_struct *vma,
+			  struct inode *inode, pgoff_t index);
+static inline bool shmem_huge_enabled(struct vm_area_struct *vma)
+{
+	return shmem_is_huge(vma, file_inode(vma->vm_file), vma->vm_pgoff);
+}
 extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 						pgoff_t start, pgoff_t end);
@@ -95,8 +100,6 @@ extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
 	SGP_CACHE,	/* don't exceed i_size, may allocate page */
-	SGP_NOHUGE,	/* like SGP_CACHE, but no huge pages */
-	SGP_HUGE,	/* like SGP_CACHE, huge pages preferred */
 	SGP_WRITE,	/* may exceed i_size, may allocate !Uptodate page */
 	SGP_FALLOC,	/* like SGP_WRITE, but make existing page Uptodate */
 };
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b0412be08fa2..cecb19c3e965 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1721,7 +1721,7 @@ static void collapse_file(struct mm_struct *mm,
 				xas_unlock_irq(&xas);
 				/* swap in or instantiate fallocated page */
 				if (shmem_getpage(mapping->host, index, &page,
-						  SGP_NOHUGE)) {
+						  SGP_CACHE)) {
 					result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
diff --git a/mm/shmem.c b/mm/shmem.c
index 740d48ef1eb5..6def7391084c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -474,39 +474,35 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /* ifdef here to avoid bloating shmem.o when not necessary */
 
-static int shmem_huge __read_mostly;
+static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
-bool shmem_huge_enabled(struct vm_area_struct *vma)
+bool shmem_is_huge(struct vm_area_struct *vma,
+		   struct inode *inode, pgoff_t index)
 {
-	struct inode *inode = file_inode(vma->vm_file);
-	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	loff_t i_size;
-	pgoff_t off;
 
-	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return false;
-	if (shmem_huge == SHMEM_HUGE_FORCE)
-		return true;
 	if (shmem_huge == SHMEM_HUGE_DENY)
 		return false;
-	switch (sbinfo->huge) {
-	case SHMEM_HUGE_NEVER:
+	if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
 		return false;
+	if (shmem_huge == SHMEM_HUGE_FORCE)
+		return true;
+
+	switch (SHMEM_SB(inode->i_sb)->huge) {
 	case SHMEM_HUGE_ALWAYS:
 		return true;
 	case SHMEM_HUGE_WITHIN_SIZE:
-		off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
+		index = round_up(index, HPAGE_PMD_NR);
 		i_size = round_up(i_size_read(inode), PAGE_SIZE);
-		if (i_size >= HPAGE_PMD_SIZE &&
-		    i_size >> PAGE_SHIFT >= off)
+		if (i_size >= HPAGE_PMD_SIZE && (i_size >> PAGE_SHIFT) >= index)
 			return true;
 		fallthrough;
 	case SHMEM_HUGE_ADVISE:
-		/* TODO: implement fadvise() hints */
-		return (vma->vm_flags & VM_HUGEPAGE);
+		if (vma && (vma->vm_flags & VM_HUGEPAGE))
+			return true;
+		fallthrough;
 	default:
-		VM_BUG_ON(1);
 		return false;
 	}
 }
@@ -680,6 +676,12 @@ static long shmem_unused_huge_count(struct super_block *sb,
 
 #define shmem_huge SHMEM_HUGE_DENY
 
+bool shmem_is_huge(struct vm_area_struct *vma,
+		   struct inode *inode, pgoff_t index)
+{
+	return false;
+}
+
 static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 		struct shrink_control *sc, unsigned long nr_to_split)
 {
@@ -1829,7 +1831,6 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	struct shmem_sb_info *sbinfo;
 	struct mm_struct *charge_mm;
 	struct page *page;
-	enum sgp_type sgp_huge = sgp;
 	pgoff_t hindex = index;
 	gfp_t huge_gfp;
 	int error;
@@ -1838,8 +1839,6 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
 	if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
 		return -EFBIG;
-	if (sgp == SGP_NOHUGE || sgp == SGP_HUGE)
-		sgp = SGP_CACHE;
 repeat:
 	if (sgp <= SGP_CACHE &&
 	    ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) {
@@ -1898,36 +1897,12 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		return 0;
 	}
 
-	/* shmem_symlink() */
-	if (!shmem_mapping(mapping))
-		goto alloc_nohuge;
-	if (shmem_huge == SHMEM_HUGE_DENY || sgp_huge == SGP_NOHUGE)
+	/* Never use a huge page for shmem_symlink() */
+	if (S_ISLNK(inode->i_mode))
 		goto alloc_nohuge;
-	if (shmem_huge == SHMEM_HUGE_FORCE)
-		goto alloc_huge;
-	switch (sbinfo->huge) {
-	case SHMEM_HUGE_NEVER:
+	if (!shmem_is_huge(vma, inode, index))
 		goto alloc_nohuge;
-	case SHMEM_HUGE_WITHIN_SIZE: {
-		loff_t i_size;
-		pgoff_t off;
-
-		off = round_up(index, HPAGE_PMD_NR);
-		i_size = round_up(i_size_read(inode), PAGE_SIZE);
-		if (i_size >= HPAGE_PMD_SIZE &&
-		    i_size >> PAGE_SHIFT >= off)
-			goto alloc_huge;
-		fallthrough;
-	}
-	case SHMEM_HUGE_ADVISE:
-		if (sgp_huge == SGP_HUGE)
-			goto alloc_huge;
-		/* TODO: implement fadvise() hints */
-		goto alloc_nohuge;
-	}
 
-alloc_huge:
 	huge_gfp = vma_thp_gfp_mask(vma);
 	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
@@ -2083,7 +2058,6 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	struct inode *inode = file_inode(vma->vm_file);
 	gfp_t gfp = mapping_gfp_mask(inode->i_mapping);
-	enum sgp_type sgp;
 	int err;
 	vm_fault_t ret = VM_FAULT_LOCKED;
 
@@ -2146,15 +2120,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 		spin_unlock(&inode->i_lock);
 	}
 
-	sgp = SGP_CACHE;
-
-	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		sgp = SGP_NOHUGE;
-	else if (vma->vm_flags & VM_HUGEPAGE)
-		sgp = SGP_HUGE;
-
-	err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, sgp,
+	err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
 				  gfp, vma, vmf, &ret);
 	if (err)
 		return vmf_error(err);
@@ -3961,7 +3927,7 @@ int __init shmem_init(void)
 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
 	else
-		shmem_huge = 0; /* just in case it was patched */
+		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
 #endif
 	return 0;
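
To see the SHMEM_HUGE_ADVISE case from user space: a MAP_SHARED|MAP_ANONYMOUS
mapping is backed by the internal shmem mount, and madvise(MADV_HUGEPAGE)
sets VM_HUGEPAGE on its vma, which shmem_is_huge() then honours.  A minimal
sketch (the 4MB size is an arbitrary choice for illustration):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 4 << 20;	/* room for two PMD-sized pages on x86_64 */

		/* MAP_SHARED|MAP_ANONYMOUS is backed by the internal shmem mount */
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) { perror("mmap"); return 1; }

		/* sets VM_HUGEPAGE on the vma: the SHMEM_HUGE_ADVISE case above */
		if (madvise(p, len, MADV_HUGEPAGE))
			perror("madvise");

		((char *)p)[0] = 1;	/* first fault may now fill a huge page */
		munmap(p, len);
		return 0;
	}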
INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9AA3C4338F for ; Fri, 30 Jul 2021 07:45:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5488160EBB for ; Fri, 30 Jul 2021 07:45:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5488160EBB Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id E393F6B0036; Fri, 30 Jul 2021 03:45:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DE99C6B005D; Fri, 30 Jul 2021 03:45:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CB2016B006C; Fri, 30 Jul 2021 03:45:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0026.hostedemail.com [216.40.44.26]) by kanga.kvack.org (Postfix) with ESMTP id AEA596B0036 for ; Fri, 30 Jul 2021 03:45:54 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 45292253D5 for ; Fri, 30 Jul 2021 07:45:54 +0000 (UTC) X-FDA: 78418470228.03.4CF972C Received: from mail-qk1-f177.google.com (mail-qk1-f177.google.com [209.85.222.177]) by imf18.hostedemail.com (Postfix) with ESMTP id F0EAE4001890 for ; Fri, 30 Jul 2021 07:45:53 +0000 (UTC) Received: by mail-qk1-f177.google.com with SMTP id x3so8605826qkl.6 for ; Fri, 30 Jul 2021 00:45:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=Hjb4Qvz+d4rYfp83KEA4cSUCQ4NRMEv6hD5clar2zEo=; b=a7HxZWdRuQp/zhxdj9ZcSffkcLO8HYoMF85EQHR69lnn6lWZFwTxyv1XpqKMeLtIqv XIXaGDHJwfWpAl+KwQV0h6XIqzYEOWE60HA9eqcxGD37L31oqLG3edyDXvQ8Retknk6K 0WuRqc35Hc3ncI3tEDPx4nIPfhoOoxRviPd0VD2XUuZ+hb4W1vHQugu6R4CpXCa/DO8C KSOXL82pzPwx3B5pU6GJNxVM/E22uC7bFDvuHFjKggCu97Z4dxW6o33YLWi4uEDKMi6K z/rRSowG6Za2DhA/BYRuReLIYJQ/tXN+EonrH/gzp+MW3wGfZKCQhOAp8C0SrnHPDetw UGeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=Hjb4Qvz+d4rYfp83KEA4cSUCQ4NRMEv6hD5clar2zEo=; b=kzKUtHt8X3NfSGIWtmswSEaALbG96LAKZKqKuKQJVQJ+nZqZtTPwWAYjmNb8gOhkUY hmLxNaXfo/AaDZQ79pDveN7cOZolbNNNv849Os8fkdsfhJoAhxkatjhj7lqozZyfz+7a bN7VnDG8EWMg5ek4vuinurwC0G+NZTE0+ZRy9xh9DB/5e6PhEe2q7tzluywGdjSBaRzi jOC/pCbtDVj7aiEQ2hKmCKa+oeF/9IDlvsmmOt0nlnAOyUzqRkIdAFNzo1nnpNsLCaXe 8ZyLd+aBK5IJcTUPKxSdFk7vlakJkPa/OZhywGpHobPIUmVPVkOfr8Qa3vCr4k3JXXsR oR6Q== X-Gm-Message-State: AOAM531ha3KefwnfMQm7X7JQuaDFe8MAthEwZNf5cGifNN3iYg1Zw27e B0H1QUIKWTPDqODLIM7kXa1X7g== X-Google-Smtp-Source: ABdhPJyxwl7K7cMuTTQwVGAyfcH23tbKgh3bhsC84QKSqBFc0wlrNsNL3TZXdmX5JSzubXT3+108+A== X-Received: by 2002:ae9:e901:: with SMTP id x1mr1079559qkf.360.1627631153052; Fri, 30 Jul 2021 00:45:53 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id x125sm539177qkd.8.2021.07.30.00.45.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 30 Jul 2021 00:45:52 -0700 (PDT) Date: Fri, 30 Jul 2021 00:45:49 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.anvils To: Andrew Morton cc: Hugh Dickins , Shakeel Butt , "Kirill A. Shutemov" , Yang Shi , Miaohe Lin , Mike Kravetz , Michal Hocko , Rik van Riel , Christoph Hellwig , Matthew Wilcox , "Eric W. Biederman" , Alexey Gladkov , Chris Wilson , Matthew Auld , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 07/16] memfd: memfd_create(name, MFD_HUGEPAGE) for shmem huge pages In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> Message-ID: References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: F0EAE4001890 Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=google.com header.s=20161025 header.b=a7HxZWdR; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf18.hostedemail.com: domain of hughd@google.com designates 209.85.222.177 as permitted sender) smtp.mailfrom=hughd@google.com X-Stat-Signature: mzdma15uzmsmjtri5d3yzz6u5kz31ieg X-HE-Tag: 1627631153-617003 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Commit 749df87bd7be ("mm/shmem: add hugetlbfs support to memfd_create()") in 4.14 added the MFD_HUGETLB flag to memfd_create(), to use hugetlbfs pages instead of tmpfs pages: now add the MFD_HUGEPAGE flag, to use tmpfs Transparent Huge Pages when they can be allocated (flag named to follow the precedent of madvise's MADV_HUGEPAGE for THPs). /sys/kernel/mm/transparent_hugepage/shmem_enabled "always" or "force" already made this possible: but that is much too blunt an instrument, affecting all the very different kinds of files on the internal shmem mount, and was intended just for ease of testing hugepage loads. MFD_HUGEPAGE is implemented internally by VM_HUGEPAGE in the shmem inode flags: do not permit a PR_SET_THP_DISABLE (MMF_DISABLE_THP) task to set this flag, and do not set it if THPs are not allowed at all; but let the memfd_create() succeed even in those cases - the caller wants to create a memfd, just hinting how it's best allocated if huge pages are available. shmem_is_huge() (at allocation time or khugepaged time) applies its SHMEM_HUGE_DENY and vma VM_NOHUGEPAGE and vm_mm MMF_DISABLE_THP checks first, and only then allows the memfd's MFD_HUGEPAGE to take effect. 
Signed-off-by: Hugh Dickins
Reported-by: kernel test robot
---
 include/uapi/linux/memfd.h |  3 ++-
 mm/memfd.c                 | 24 ++++++++++++++++++------
 mm/shmem.c                 | 33 +++++++++++++++++++++++++++++++--
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..8358a69e78cc 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -7,7 +7,8 @@
 /* flags for memfd_create(2) (unsigned int) */
 #define MFD_CLOEXEC		0x0001U
 #define MFD_ALLOW_SEALING	0x0002U
-#define MFD_HUGETLB		0x0004U
+#define MFD_HUGETLB		0x0004U	/* Use hugetlbfs */
+#define MFD_HUGEPAGE		0x0008U	/* Use huge tmpfs */
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 081dd33e6a61..0d1a504d2fc9 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -245,7 +245,10 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | \
+		       MFD_ALLOW_SEALING | \
+		       MFD_HUGETLB | \
+		       MFD_HUGEPAGE)
 
 SYSCALL_DEFINE2(memfd_create,
 		const char __user *, uname,
@@ -257,14 +260,17 @@ SYSCALL_DEFINE2(memfd_create,
 	char *name;
 	long len;
 
-	if (!(flags & MFD_HUGETLB)) {
-		if (flags & ~(unsigned int)MFD_ALL_FLAGS)
+	if (flags & MFD_HUGETLB) {
+		/* Disallow huge tmpfs when choosing hugetlbfs */
+		if (flags & MFD_HUGEPAGE)
 			return -EINVAL;
-	} else {
 		/* Allow huge page size encoding in flags. */
 		if (flags & ~(unsigned int)(MFD_ALL_FLAGS |
 				(MFD_HUGE_MASK << MFD_HUGE_SHIFT)))
 			return -EINVAL;
+	} else {
+		if (flags & ~(unsigned int)MFD_ALL_FLAGS)
+			return -EINVAL;
 	}
 
 	/* length includes terminating zero */
@@ -303,8 +309,14 @@ SYSCALL_DEFINE2(memfd_create,
 					HUGETLB_ANONHUGE_INODE,
 					(flags >> MFD_HUGE_SHIFT) &
 					MFD_HUGE_MASK);
-	} else
-		file = shmem_file_setup(name, 0, VM_NORESERVE);
+	} else {
+		unsigned long vm_flags = VM_NORESERVE;
+
+		if (flags & MFD_HUGEPAGE)
+			vm_flags |= VM_HUGEPAGE;
+		file = shmem_file_setup(name, 0, vm_flags);
+	}
+
 	if (IS_ERR(file)) {
 		error = PTR_ERR(file);
 		goto err_fd;
diff --git a/mm/shmem.c b/mm/shmem.c
index 6def7391084c..e2bcf3313686 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -476,6 +476,20 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
+/*
+ * Does either /sys/kernel/mm/transparent_hugepage/shmem_enabled or
+ * /sys/kernel/mm/transparent_hugepage/enabled allow transparent hugepages?
+ * (Can only return true when the machine has_transparent_hugepage() too.)
+ */
+static bool transparent_hugepage_allowed(void)
+{
+	return shmem_huge > SHMEM_HUGE_NEVER ||
+		test_bit(TRANSPARENT_HUGEPAGE_FLAG,
+			&transparent_hugepage_flags) ||
+		test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
+			&transparent_hugepage_flags);
+}
+
 bool shmem_is_huge(struct vm_area_struct *vma,
 		   struct inode *inode, pgoff_t index)
 {
@@ -486,6 +500,8 @@ bool shmem_is_huge(struct vm_area_struct *vma,
 	if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
 	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
 		return false;
+	if (SHMEM_I(inode)->flags & VM_HUGEPAGE)
+		return true;
 	if (shmem_huge == SHMEM_HUGE_FORCE)
 		return true;
@@ -676,6 +692,11 @@ static long shmem_unused_huge_count(struct super_block *sb,
 
 #define shmem_huge SHMEM_HUGE_DENY
 
+bool transparent_hugepage_allowed(void)
+{
+	return false;
+}
+
 bool shmem_is_huge(struct vm_area_struct *vma,
 		   struct inode *inode, pgoff_t index)
 {
@@ -2171,10 +2192,14 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 
 	if (shmem_huge != SHMEM_HUGE_FORCE) {
 		struct super_block *sb;
+		struct inode *inode;
 
 		if (file) {
 			VM_BUG_ON(file->f_op != &shmem_file_operations);
-			sb = file_inode(file)->i_sb;
+			inode = file_inode(file);
+			if (SHMEM_I(inode)->flags & VM_HUGEPAGE)
+				goto huge;
+			sb = inode->i_sb;
 		} else {
 			/*
 			 * Called directly from mm/mmap.c, or drivers/char/mem.c
@@ -2187,7 +2212,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 		if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
 			return addr;
 	}
-
+huge:
 	offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
 	if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
 		return addr;
@@ -2308,6 +2333,10 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		atomic_set(&info->stop_eviction, 0);
 		info->seals = F_SEAL_SEAL;
 		info->flags = flags & VM_NORESERVE;
+		if ((flags & VM_HUGEPAGE) &&
+		    transparent_hugepage_allowed() &&
+		    !test_bit(MMF_DISABLE_THP, &current->mm->flags))
+			info->flags |= VM_HUGEPAGE;
 		INIT_LIST_HEAD(&info->shrinklist);
 		INIT_LIST_HEAD(&info->swaplist);
 		simple_xattrs_init(&info->xattrs);

From patchwork Fri Jul 30 07:48:33 2021
Date: Fri, 30 Jul 2021 00:48:33 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 08/16] huge tmpfs: fcntl(fd, F_HUGEPAGE) and fcntl(fd, F_NOHUGEPAGE)
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <1c32c75b-095-22f0-aee3-30a44d4a4744@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

Add support for fcntl(fd, F_HUGEPAGE) and fcntl(fd, F_NOHUGEPAGE), to select hugeness per file: useful to override the default hugeness of the shmem mount, when occasionally needing to store a hugepage file in a smallpage mount or vice versa.

These fcntls just specify whether or not to try for huge pages when allocating to the object later: F_HUGEPAGE does not touch small pages already allocated (though khugepaged may do so when the file is mapped afterwards), F_NOHUGEPAGE does not split huge pages already allocated.

Why fcntl? Because it's already in use (for sealing) on memfds; and I'm anxious to keep this simple, just applying it to whole files: fallocate, madvise and posix_fadvise each involve a range, which would need a new kind of tree attached to the inode for proper support. Any application needing range support should be able to provide that from userspace, by issuing the respective fcntl prior to instantiating each range.

Do not allow it when the file is open read-only (EBADF). Do not permit a PR_SET_THP_DISABLE (MMF_DISABLE_THP) task to interfere with the flags, and do not let VM_HUGEPAGE be set if THPs are not allowed at all (EPERM).

Note that transparent_hugepage_allowed(), used to validate F_HUGEPAGE, accepts (anon) transparent_hugepage_flags in addition to mount option. This is to overcome the limitation of the "huge=advise" option, which applies hugepage alignment (reducing ASLR) to all mappings, because madvise(address,len,MADV_HUGEPAGE) needs address before it can be used. So mount option "huge=never" gives a default which can be overridden by fcntl(fd, F_HUGEPAGE) when /sys/kernel/mm/transparent_hugepage/enabled is not "never" too. (We could instead add a "huge=fcntl" mount option between "never" and "advise", but I lack the enthusiasm for that.)
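For illustration, a hedged userspace sketch of the new fcntls: the F_HUGEPAGE/F_NOHUGEPAGE values below are the ones added by this patch, with fallback defines only for building against older headers, and /dev/shm is assumed to be a tmpfs mount.

/* Hedged sketch: F_HUGEPAGE/F_NOHUGEPAGE exist only with this patch. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_HUGEPAGE
#define F_HUGEPAGE	(F_LINUX_SPECIFIC_BASE + 15)	/* from this patch */
#define F_NOHUGEPAGE	(F_LINUX_SPECIFIC_BASE + 16)	/* from this patch */
#endif

int main(void)
{
	/*
	 * Must be open for writing: F_HUGEPAGE on a read-only
	 * descriptor fails with EBADF, per the commit message.
	 * /dev/shm is assumed to be tmpfs; non-tmpfs gives EINVAL.
	 */
	int fd = open("/dev/shm/example", O_RDWR | O_CREAT, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (fcntl(fd, F_HUGEPAGE) < 0)
		perror("fcntl(F_HUGEPAGE)");	/* EPERM if THPs not allowed */
	/* ... later, stop trying for huge pages (no splitting happens): */
	if (fcntl(fd, F_NOHUGEPAGE) < 0)
		perror("fcntl(F_NOHUGEPAGE)");
	close(fd);
	return 0;
}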
Signed-off-by: Hugh Dickins
---
 fs/fcntl.c                 |  5 +++
 include/linux/shmem_fs.h   |  8 +++++
 include/uapi/linux/fcntl.h |  9 +++++
 mm/shmem.c                 | 70 ++++++++++++++++++++++++++++++++++----
 4 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f946bec8f1f1..9cfff87c3332 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include <linux/shmem_fs.h>
 #include
 #include
 #include
@@ -434,6 +435,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_SET_FILE_RW_HINT:
 		err = fcntl_rw_hint(filp, cmd, arg);
 		break;
+	case F_HUGEPAGE:
+	case F_NOHUGEPAGE:
+		err = shmem_fcntl(filp, cmd, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 3b05a28e34c4..51b75d74ce89 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -67,6 +67,14 @@ extern int shmem_zero_setup(struct vm_area_struct *);
 extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts);
+#ifdef CONFIG_TMPFS
+extern long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg);
+#else
+static inline long shmem_fcntl(struct file *f, unsigned int c, unsigned long a)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_TMPFS */
 #ifdef CONFIG_SHMEM
 extern const struct address_space_operations shmem_aops;
 static inline bool shmem_mapping(struct address_space *mapping)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 2f86b2ad6d7e..10f82b223642 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -73,6 +73,15 @@
  */
 #define RWF_WRITE_LIFE_NOT_SET	RWH_WRITE_LIFE_NOT_SET
 
+/*
+ * Allocate hugepages when available: useful on a tmpfs which was not mounted
+ * with the "huge=always" option, as for memfds. And, do not allocate hugepages
+ * even when available: useful to cancel the above request, or make an exception
+ * on a tmpfs mounted with "huge=always" (without splitting existing hugepages).
+ */
+#define F_HUGEPAGE	(F_LINUX_SPECIFIC_BASE + 15)
+#define F_NOHUGEPAGE	(F_LINUX_SPECIFIC_BASE + 16)
+
 /*
  * Types of directory notifications that may be requested.
  */
diff --git a/mm/shmem.c b/mm/shmem.c
index e2bcf3313686..67a4b7a4849b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -448,9 +448,9 @@ static bool shmem_confirm_swap(struct address_space *mapping,
  *	enables huge pages for the mount;
  * SHMEM_HUGE_WITHIN_SIZE:
  *	only allocate huge pages if the page will be fully within i_size,
- *	also respect fadvise()/madvise() hints;
+ *	also respect fcntl()/madvise() hints;
  * SHMEM_HUGE_ADVISE:
- *	only allocate huge pages if requested with fadvise()/madvise();
+ *	only allocate huge pages if requested with fcntl()/madvise().
  */
 
 #define SHMEM_HUGE_NEVER	0
@@ -477,13 +477,13 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
 /*
- * Does either /sys/kernel/mm/transparent_hugepage/shmem_enabled or
+ * Does either tmpfs mount option (or transparent_hugepage/shmem_enabled) or
  * /sys/kernel/mm/transparent_hugepage/enabled allow transparent hugepages?
  * (Can only return true when the machine has_transparent_hugepage() too.)
  */
-static bool transparent_hugepage_allowed(void)
+static bool transparent_hugepage_allowed(struct shmem_sb_info *sbinfo)
 {
-	return shmem_huge > SHMEM_HUGE_NEVER ||
+	return sbinfo->huge > SHMEM_HUGE_NEVER ||
 		test_bit(TRANSPARENT_HUGEPAGE_FLAG,
 			&transparent_hugepage_flags) ||
 		test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
@@ -500,6 +500,8 @@ bool shmem_is_huge(struct vm_area_struct *vma,
 	if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
 	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
 		return false;
+	if (SHMEM_I(inode)->flags & VM_NOHUGEPAGE)
+		return false;
 	if (SHMEM_I(inode)->flags & VM_HUGEPAGE)
 		return true;
 	if (shmem_huge == SHMEM_HUGE_FORCE)
@@ -692,7 +694,7 @@ static long shmem_unused_huge_count(struct super_block *sb,
 
 #define shmem_huge SHMEM_HUGE_DENY
 
-bool transparent_hugepage_allowed(void)
+bool transparent_hugepage_allowed(struct shmem_sb_info *sbinfo)
 {
 	return false;
 }
@@ -2197,6 +2199,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 		if (file) {
 			VM_BUG_ON(file->f_op != &shmem_file_operations);
 			inode = file_inode(file);
+			if (SHMEM_I(inode)->flags & VM_NOHUGEPAGE)
+				return addr;
 			if (SHMEM_I(inode)->flags & VM_HUGEPAGE)
 				goto huge;
 			sb = inode->i_sb;
@@ -2211,6 +2215,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 		}
 		if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
 			return addr;
+		/*
+		 * Note that SHMEM_HUGE_ADVISE has to give out huge-aligned
+		 * addresses to everyone, because madvise(,,MADV_HUGEPAGE)
+		 * needs the address-chicken on which to advise if huge-egg.
+		 */
 	}
 huge:
 	offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
@@ -2334,7 +2343,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		info->seals = F_SEAL_SEAL;
 		info->flags = flags & VM_NORESERVE;
 		if ((flags & VM_HUGEPAGE) &&
-		    transparent_hugepage_allowed() &&
+		    transparent_hugepage_allowed(sbinfo) &&
 		    !test_bit(MMF_DISABLE_THP, &current->mm->flags))
 			info->flags |= VM_HUGEPAGE;
 		INIT_LIST_HEAD(&info->shrinklist);
@@ -2674,6 +2683,53 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
 	return offset;
 }
 
+static int shmem_huge_fcntl(struct file *file, unsigned int cmd)
+{
+	struct inode *inode = file_inode(file);
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	if (!(file->f_mode & FMODE_WRITE))
+		return -EBADF;
+	if (test_bit(MMF_DISABLE_THP, &current->mm->flags))
+		return -EPERM;
+	if (cmd == F_HUGEPAGE &&
+	    !transparent_hugepage_allowed(SHMEM_SB(inode->i_sb)))
+		return -EPERM;
+
+	inode_lock(inode);
+	if (cmd == F_HUGEPAGE) {
+		info->flags &= ~VM_NOHUGEPAGE;
+		info->flags |= VM_HUGEPAGE;
+	} else {
+		info->flags &= ~VM_HUGEPAGE;
+		info->flags |= VM_NOHUGEPAGE;
+	}
+	inode_unlock(inode);
+	return 0;
+}
+
+long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	long error = -EINVAL;
+
+	if (file->f_op != &shmem_file_operations)
+		return error;
+
+	switch (cmd) {
+	/*
+	 * case F_ADD_SEALS:
+	 * case F_GET_SEALS:
+	 *	are handled by memfd_fcntl().
+	 */
+	case F_HUGEPAGE:
+	case F_NOHUGEPAGE:
+		error = shmem_huge_fcntl(file, cmd);
+		break;
+	}
+
+	return error;
+}
+
 static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 							 loff_t len)
 {

From patchwork Fri Jul 30 07:51:00 2021
Date: Fri, 30 Jul 2021 00:51:00 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 09/16] huge tmpfs: decide stat.st_blksize by shmem_is_huge()
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

4.18 commit 89fdcd262fd4 ("mm: shmem: make stat.st_blksize return huge page size if THP is on") added is_huge_enabled() to decide st_blksize: now that hugeness can be defined per file, that too needs to be replaced by shmem_is_huge().

Unless they have been fcntl'ed F_HUGEPAGE, this does give a different answer (No) for small files on a "huge=within_size" mount: but that can be considered a minor bugfix. And a different answer (No) for unfcntl'ed files on a "huge=advise" mount: I'm reluctant to complicate it, just to reproduce the same debatable answer as before.

Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 mm/shmem.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 67a4b7a4849b..f50f2ede71da 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -712,15 +712,6 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline bool is_huge_enabled(struct shmem_sb_info *sbinfo)
-{
-	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-	    (shmem_huge == SHMEM_HUGE_FORCE || sbinfo->huge) &&
-	    shmem_huge != SHMEM_HUGE_DENY)
-		return true;
-	return false;
-}
-
 /*
  * Like add_to_page_cache_locked, but error if expected item has gone.
  */
@@ -1101,7 +1092,6 @@ static int shmem_getattr(struct user_namespace *mnt_userns,
 {
 	struct inode *inode = path->dentry->d_inode;
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_sb_info *sb_info = SHMEM_SB(inode->i_sb);
 
 	if (info->alloced - info->swapped != inode->i_mapping->nrpages) {
 		spin_lock_irq(&info->lock);
@@ -1110,7 +1100,7 @@ static int shmem_getattr(struct user_namespace *mnt_userns,
 	}
 	generic_fillattr(&init_user_ns, inode, stat);
 
-	if (is_huge_enabled(sb_info))
+	if (shmem_is_huge(NULL, inode, 0))
 		stat->blksize = HPAGE_PMD_SIZE;
 
 	return 0;
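A hedged sketch of the user-visible effect, using nothing beyond standard stat(2): on a file where shmem_is_huge() answers yes, st_blksize reports HPAGE_PMD_SIZE (2MB on x86_64) instead of the usual PAGE_SIZE.

#include <sys/stat.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
	struct stat st;

	if (argc < 2 || stat(argv[1], &st) != 0)
		return 1;
	/* e.g. 2097152 for a huge tmpfs file on x86_64, else 4096 */
	printf("st_blksize = %ld\n", (long)st.st_blksize);
	return 0;
}

From patchwork Fri Jul 30 07:55:22 2021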
Date: Fri, 30 Jul 2021 00:55:22 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 10/16] tmpfs: fcntl(fd, F_MEM_LOCK) to memlock a tmpfs file
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <54e03798-d836-ae64-f41-4a1d46bc115b@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

From: Shakeel Butt

Add a new uapi to lock tmpfs files in memory, to protect against swap without mapping the files. This commit introduces two new commands to fcntl and shmem: F_MEM_LOCK and F_MEM_UNLOCK. The locking is charged against RLIMIT_MEMLOCK of the uid in the namespace of the caller.

This feature is implemented mostly by re-using shmctl's SHM_LOCK mechanism (System V IPC shared memory). The API follows the design choices of shmctl's SHM_LOCK and of the mlock2() syscall, in that pages on swap are not populated at syscall time: the pages will be brought to memory on first access. As with System V shared memory, these pages are counted as Unevictable in /proc/meminfo (when they are allocated, or when page reclaim finds any allocated earlier), but they are not counted as Mlocked there.

For simplicity, locked files are forbidden to grow or shrink, to keep the user accounting simple. This design decision will be revisited once such a use-case arises.

The permissions to lock and unlock differ slightly from other similar interfaces.
Anyone having CAP_IPC_LOCK, or remaining RLIMIT_MEMLOCK headroom, can lock the file; but the unlocker must either have CAP_IPC_LOCK or be the locker itself.

This commit does not make the locked status of a tmpfs file visible. We can add an F_MEM_LOCKED fcntl later, to query that status if required; but it's not yet clear how best to make it visible.

Signed-off-by: Shakeel Butt
Signed-off-by: Hugh Dickins
---
 fs/fcntl.c                 |  2 ++
 include/linux/shmem_fs.h   |  1 +
 include/uapi/linux/fcntl.h |  7 +++++
 mm/shmem.c                 | 59 ++++++++++++++++++++++++++++++++++++--
 4 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 9cfff87c3332..a3534764b50e 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -437,6 +437,8 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		break;
 	case F_HUGEPAGE:
 	case F_NOHUGEPAGE:
+	case F_MEM_LOCK:
+	case F_MEM_UNLOCK:
 		err = shmem_fcntl(filp, cmd, arg);
 		break;
 	default:
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 51b75d74ce89..ffdd0da816e5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -24,6 +24,7 @@ struct shmem_inode_info {
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct simple_xattrs	xattrs;		/* list of xattrs */
 	atomic_t		stop_eviction;	/* hold when working on inode */
+	struct ucounts		*mlock_ucounts;	/* user memlocked tmpfs file */
 	struct inode		vfs_inode;
 };
 
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 10f82b223642..21dc969df0fd 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -82,6 +82,13 @@
 #define F_HUGEPAGE	(F_LINUX_SPECIFIC_BASE + 15)
 #define F_NOHUGEPAGE	(F_LINUX_SPECIFIC_BASE + 16)
 
+/*
+ * Lock all pages of file into memory, as they are allocated; or unlock them.
+ * Currently supported only on tmpfs, and on its memfd_created files.
+ */
+#define F_MEM_LOCK	(F_LINUX_SPECIFIC_BASE + 17)
+#define F_MEM_UNLOCK	(F_LINUX_SPECIFIC_BASE + 18)
+
 /*
  * Types of directory notifications that may be requested.
  */
diff --git a/mm/shmem.c b/mm/shmem.c
index f50f2ede71da..ba9b9900287b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -888,7 +888,7 @@ unsigned long shmem_swap_usage(struct vm_area_struct *vma)
 }
 
 /*
- * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
+ * SHM_UNLOCK or F_MEM_UNLOCK restore Unevictable pages to their evictable list.
 */
 void shmem_unlock_mapping(struct address_space *mapping)
 {
@@ -897,7 +897,7 @@ void shmem_unlock_mapping(struct address_space *mapping)
 	pagevec_init(&pvec);
 
 	/*
-	 * Minor point, but we might as well stop if someone else SHM_LOCKs it.
+	 * Minor point, but we might as well stop if someone else memlocks it.
	 */
 	while (!mapping_unevictable(mapping)) {
 		if (!pagevec_lookup(&pvec, mapping, &index))
@@ -1123,7 +1123,8 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 
 		/* protected by i_mutex */
 		if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
-		    (newsize > oldsize && (info->seals & F_SEAL_GROW)))
+		    (newsize > oldsize && (info->seals & F_SEAL_GROW)) ||
+		    (newsize != oldsize && info->mlock_ucounts))
 			return -EPERM;
 
 		if (newsize != oldsize) {
@@ -1161,6 +1162,10 @@ static void shmem_evict_inode(struct inode *inode)
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 
 	if (shmem_mapping(inode->i_mapping)) {
+		if (info->mlock_ucounts) {
+			user_shm_unlock(inode->i_size, info->mlock_ucounts);
+			info->mlock_ucounts = NULL;
+		}
 		shmem_unacct_size(info->flags, inode->i_size);
 		inode->i_size = 0;
 		shmem_truncate_range(inode, 0, (loff_t)-1);
@@ -2266,6 +2271,7 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
 
 	/*
 	 * What serializes the accesses to info->flags?
+	 * inode_lock() when called from shmem_memlock_fcntl(),
 	 * ipc_lock_object() when called from shmctl_do_lock(),
 	 * no serialization needed when called from shm_destroy().
 	 */
@@ -2286,6 +2292,43 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
 	return retval;
 }
 
+static int shmem_memlock_fcntl(struct file *file, unsigned int cmd)
+{
+	struct inode *inode = file_inode(file);
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	bool cleanup_mapping = false;
+	int retval = 0;
+
+	inode_lock(inode);
+	if (cmd == F_MEM_LOCK) {
+		if (!info->mlock_ucounts) {
+			struct ucounts *ucounts = current_ucounts();
+			/* capability/rlimit check is down in user_shm_lock */
+			retval = shmem_lock(file, 1, ucounts);
+			if (!retval)
+				info->mlock_ucounts = ucounts;
+			else if (!rlimit(RLIMIT_MEMLOCK))
+				retval = -EPERM;
+			/* else retval == -ENOMEM */
+		}
+	} else { /* F_MEM_UNLOCK */
+		if (info->mlock_ucounts) {
+			if (info->mlock_ucounts == current_ucounts() ||
+			    capable(CAP_IPC_LOCK)) {
+				shmem_lock(file, 0, info->mlock_ucounts);
+				info->mlock_ucounts = NULL;
+				cleanup_mapping = true;
+			} else
+				retval = -EPERM;
+		}
+	}
+	inode_unlock(inode);
+
+	if (cleanup_mapping)
+		shmem_unlock_mapping(file->f_mapping);
+	return retval;
+}
+
 static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct shmem_inode_info *info = SHMEM_I(file_inode(file));
@@ -2503,6 +2546,8 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 		if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size)
 			return -EPERM;
 	}
+	if (unlikely(info->mlock_ucounts) && pos + len > inode->i_size)
+		return -EPERM;
 
 	return shmem_getpage(inode, index, pagep, SGP_WRITE);
 }
@@ -2715,6 +2760,10 @@ long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 	case F_NOHUGEPAGE:
 		error = shmem_huge_fcntl(file, cmd);
 		break;
+	case F_MEM_LOCK:
+	case F_MEM_UNLOCK:
+		error = shmem_memlock_fcntl(file, cmd);
+		break;
 	}
 
 	return error;
@@ -2778,6 +2827,10 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		error = -EPERM;
 		goto out;
 	}
+	if (info->mlock_ucounts && offset + len > inode->i_size) {
+		error = -EPERM;
+		goto out;
+	}
 
 	start = offset >> PAGE_SHIFT;
 	end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
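For illustration, a hedged userspace sketch of the lock/unlock cycle: F_MEM_LOCK/F_MEM_UNLOCK exist only with this patch (fallback defines mirror the values above), and the memfd name and size are arbitrary.

/* Hedged sketch: F_MEM_LOCK/F_MEM_UNLOCK are added by this patch. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>	/* memfd_create(), glibc >= 2.27 */
#include <stdio.h>
#include <unistd.h>

#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_MEM_LOCK
#define F_MEM_LOCK	(F_LINUX_SPECIFIC_BASE + 17)	/* from this patch */
#define F_MEM_UNLOCK	(F_LINUX_SPECIFIC_BASE + 18)	/* from this patch */
#endif

int main(void)
{
	int fd = memfd_create("locked-buf", MFD_CLOEXEC);

	if (fd < 0) {
		perror("memfd_create");
		return 1;
	}
	/* Size first: a locked file may not grow or shrink afterwards */
	if (ftruncate(fd, 1 << 20) < 0) {
		perror("ftruncate");
		return 1;
	}
	/*
	 * Charged against RLIMIT_MEMLOCK of the caller's uid; pages in swap
	 * are not populated now, but become Unevictable as they come back.
	 */
	if (fcntl(fd, F_MEM_LOCK) < 0)
		perror("fcntl(F_MEM_LOCK)");	/* EPERM or ENOMEM on failure */
	/* ... use the file ... then unlock (locker or CAP_IPC_LOCK only) */
	if (fcntl(fd, F_MEM_UNLOCK) < 0)
		perror("fcntl(F_MEM_UNLOCK)");
	close(fd);
	return 0;
}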
From patchwork Fri Jul 30 07:57:56 2021
Date: Fri, 30 Jul 2021 00:57:56 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 11/16] tmpfs: fcntl(fd, F_MEM_LOCKED) to test if memlocked
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

Though we have not yet found a compelling need to make the locked status of a tmpfs file visible, and offer no tool to show it, the kernel ought to be able to support such a tool: add the F_MEM_LOCKED fcntl, returning -1 on failure (not tmpfs), 0 when not F_MEM_LOCKED, 1 when F_MEM_LOCKED.

Signed-off-by: Hugh Dickins
---
 fs/fcntl.c                 | 1 +
 include/uapi/linux/fcntl.h | 1 +
 mm/shmem.c                 | 4 ++++
 3 files changed, 6 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index a3534764b50e..0d8dc723732d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -439,6 +439,7 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_NOHUGEPAGE:
 	case F_MEM_LOCK:
 	case F_MEM_UNLOCK:
+	case F_MEM_LOCKED:
 		err = shmem_fcntl(filp, cmd, arg);
 		break;
 	default:
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 21dc969df0fd..012585e8c9ab 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -88,6 +88,7 @@
  */
 #define F_MEM_LOCK	(F_LINUX_SPECIFIC_BASE + 17)
 #define F_MEM_UNLOCK	(F_LINUX_SPECIFIC_BASE + 18)
+#define F_MEM_LOCKED	(F_LINUX_SPECIFIC_BASE + 19)
 
 /*
  * Types of directory notifications that may be requested.
diff --git a/mm/shmem.c b/mm/shmem.c
index ba9b9900287b..6e53dabe658b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2299,6 +2299,9 @@ static int shmem_memlock_fcntl(struct file *file, unsigned int cmd)
 	bool cleanup_mapping = false;
 	int retval = 0;
 
+	if (cmd == F_MEM_LOCKED)
+		return !!info->mlock_ucounts;
+
 	inode_lock(inode);
 	if (cmd == F_MEM_LOCK) {
 		if (!info->mlock_ucounts) {
@@ -2762,6 +2765,7 @@ long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 		break;
 	case F_MEM_LOCK:
 	case F_MEM_UNLOCK:
+	case F_MEM_LOCKED:
 		error = shmem_memlock_fcntl(file, cmd);
 		break;
 	}
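A hedged sketch of querying that status (F_MEM_LOCKED is defined only by this patch; the fallback define mirrors the value above):

/* Hedged sketch: F_MEM_LOCKED is added by this patch. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_MEM_LOCKED
#define F_MEM_LOCKED	(F_LINUX_SPECIFIC_BASE + 19)	/* from this patch */
#endif

int main(int argc, char *argv[])
{
	int fd, locked;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;
	locked = fcntl(fd, F_MEM_LOCKED);
	if (locked < 0)
		perror("fcntl(F_MEM_LOCKED)");	/* -1: not a tmpfs file */
	else
		printf("%s: %smemlocked\n", argv[1], locked ? "" : "not ");
	return 0;
}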
From patchwork Fri Jul 30 08:00:16 2021
Date: Fri, 30 Jul 2021 01:00:16 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 12/16] tmpfs: refuse memlock when fallocated beyond i_size
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <3e5b2999-a27d-3590-46d9-80841b9427a9@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

F_MEM_LOCK is accounted by i_size, but fallocate(,FALLOC_FL_KEEP_SIZE,,) could have added many pages beyond i_size, which would also be held as Unevictable from memory. The mlock_ucounts check in shmem_fallocate() is fine, but shmem_memlock_fcntl() needs to check fallocend too.

We could change F_MEM_LOCK accounting to use the max of i_size and fallocend, but fallocend is obscure: I think it's better just to refuse the F_MEM_LOCK (with EPERM) if fallocend exceeds (page-rounded) i_size.
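To make the refusal concrete, a hedged sketch (assumes this series applied; sizes are arbitrary): fallocate beyond i_size with FALLOC_FL_KEEP_SIZE, then watch F_MEM_LOCK fail with EPERM.

/* Hedged sketch: F_MEM_LOCK is added earlier in this series. */
#define _GNU_SOURCE
#include <fcntl.h>	/* fallocate(), FALLOC_FL_KEEP_SIZE with _GNU_SOURCE */
#include <sys/mman.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_MEM_LOCK
#define F_MEM_LOCK	(F_LINUX_SPECIFIC_BASE + 17)	/* from this series */
#endif

int main(void)
{
	int fd = memfd_create("fallocated", MFD_CLOEXEC);

	if (fd < 0)
		return 1;
	ftruncate(fd, 1 << 20);		/* i_size = 1MB */
	/* allocate 2MB beyond i_size without changing i_size */
	fallocate(fd, FALLOC_FL_KEEP_SIZE, 1 << 20, 2 << 20);
	/* fallocend now exceeds page-rounded i_size, so this is refused */
	if (fcntl(fd, F_MEM_LOCK) < 0 && errno == EPERM)
		fprintf(stderr, "F_MEM_LOCK refused: fallocend > i_size\n");
	close(fd);
	return 0;
}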
Signed-off-by: Hugh Dickins
---
 mm/shmem.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 6e53dabe658b..35c0f5c7120e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2304,7 +2304,10 @@ static int shmem_memlock_fcntl(struct file *file, unsigned int cmd)
 
 	inode_lock(inode);
 	if (cmd == F_MEM_LOCK) {
-		if (!info->mlock_ucounts) {
+		if (info->fallocend > DIV_ROUND_UP(inode->i_size, PAGE_SIZE)) {
+			/* locking is accounted by i_size: disallow excess */
+			retval = -EPERM;
+		} else if (!info->mlock_ucounts) {
 			struct ucounts *ucounts = current_ucounts();
 			/* capability/rlimit check is down in user_shm_lock */
 			retval = shmem_lock(file, 1, ucounts);
@@ -2854,9 +2857,10 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	spin_unlock(&inode->i_lock);
 
 	/*
-	 * info->fallocend is only relevant when huge pages might be
+	 * info->fallocend is mostly relevant when huge pages might be
 	 * involved: to prevent split_huge_page() freeing fallocated
 	 * pages when FALLOC_FL_KEEP_SIZE committed beyond i_size.
+	 * But it is also checked in F_MEM_LOCK validation.
	 */
 	undo_fallocend = info->fallocend;
 	if (info->fallocend < end)

From patchwork Fri Jul 30 08:03:15 2021
Date: Fri, 30 Jul 2021 01:03:15 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 13/16] mm: bool user_shm_lock(loff_t size, struct ucounts *)
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

user_shm_lock()'s size_t size was big enough for SysV SHM locking, but not quite big enough for O_LARGEFILE on 32-bit: change to loff_t size. And while changing the prototype, let's use bool rather than int here.
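To make the 32-bit concern concrete, a small illustration (plain arithmetic only, simulating a 32-bit size_t with uint32_t): a 5GB O_LARGEFILE length wraps to 1GB before the page-rounding, under-accounting the lock.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	long long size = 5LL << 30;		/* 5GB file length */
	uint32_t as_size_t = (uint32_t)size;	/* what a 32-bit size_t keeps: 1GB */

	/* 1310720 pages with loff_t, but only 262144 with 32-bit size_t */
	printf("loff_t pages:  %lld\n", (size + 4095) >> 12);
	printf("size_t pages:  %lld\n", ((long long)as_size_t + 4095) >> 12);
	return 0;
}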
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/mm.h |  4 ++--
 mm/mlock.c         | 14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ca22e6e694a..f1be2221512b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1713,8 +1713,8 @@ extern bool can_do_mlock(void);
 #else
 static inline bool can_do_mlock(void) { return false; }
 #endif
-extern int user_shm_lock(size_t, struct ucounts *);
-extern void user_shm_unlock(size_t, struct ucounts *);
+extern bool user_shm_lock(loff_t size, struct ucounts *ucounts);
+extern void user_shm_unlock(loff_t size, struct ucounts *ucounts);
 
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
diff --git a/mm/mlock.c b/mm/mlock.c
index 16d2ee160d43..7df88fce0fc9 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -813,21 +813,21 @@ SYSCALL_DEFINE0(munlockall)
 }
 
 /*
- * Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB
- * shm segments) get accounted against the user_struct instead.
+ * Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB shm
+ * segments and F_MEM_LOCK tmpfs) get accounted to the user_namespace instead.
  */
 static DEFINE_SPINLOCK(shmlock_user_lock);
 
-int user_shm_lock(size_t size, struct ucounts *ucounts)
+bool user_shm_lock(loff_t size, struct ucounts *ucounts)
 {
 	unsigned long lock_limit, locked;
 	long memlock;
-	int allowed = 0;
+	bool allowed = false;
 
 	locked = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	if (lock_limit == RLIM_INFINITY)
-		allowed = 1;
+		allowed = true;
 	lock_limit >>= PAGE_SHIFT;
 	spin_lock(&shmlock_user_lock);
 	memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
@@ -840,13 +840,13 @@ int user_shm_lock(size_t size, struct ucounts *ucounts)
 		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
 		goto out;
 	}
-	allowed = 1;
+	allowed = true;
 out:
 	spin_unlock(&shmlock_user_lock);
 	return allowed;
 }
 
-void user_shm_unlock(size_t size, struct ucounts *ucounts)
+void user_shm_unlock(loff_t size, struct ucounts *ucounts)
 {
 	spin_lock(&shmlock_user_lock);
 	dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
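A userland sketch of the truncation the changelog describes (my
illustration, not from the patch; PAGE_SIZE and the 6GB size are example
values): on 32-bit, passing a greater-than-4GB i_size through a size_t
parameter silently drops the high bits, so the memlock accounting would
be computed from a truncated size.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL

int main(void)
{
	unsigned long long size = 6ULL << 30;	/* a 6GB O_LARGEFILE tmpfs file */
	uint32_t trunc = (uint32_t)size;	/* what a 32-bit size_t receives */

	printf("pages accounted via loff_t: %llu\n",
	       (size + PAGE_SIZE - 1) / PAGE_SIZE);	/* 1572864 */
	printf("pages accounted via size_t: %llu\n",
	       (trunc + PAGE_SIZE - 1) / PAGE_SIZE);	/* 524288 */
	return 0;
}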
From patchwork Fri Jul 30 08:06:35 2021
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 12410627
Date: Fri, 30 Jul 2021 01:06:35 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Biederman" , Alexey Gladkov , Chris Wilson , Matthew Auld , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 14/16] mm: user_shm_lock(,,getuc) and user_shm_unlock(,,putuc) In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> Message-ID: <4bd4072-7eb0-d1a5-ce49-82f4b24bd070@google.com> References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 8DF5F50321C9 Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20161025 header.b=CGPy5pxv; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf05.hostedemail.com: domain of hughd@google.com designates 209.85.219.48 as permitted sender) smtp.mailfrom=hughd@google.com X-Stat-Signature: n8oynkp6s3cy47y5idon4r6krxxqa9qw X-HE-Tag: 1627632399-792209 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: user_shm_lock() and user_shm_unlock() have to get and put a reference on the ucounts structure, and get fails at overflow. That will be awkward for the next commit (shrinking ought not to fail), so add an argument (always true in this commit) to condition that get and put. It would be even easier to do the put_ucounts() separately when unlocking, but messy for the get_ucounts() when locking: better to keep them symmetric. Signed-off-by: Hugh Dickins --- fs/hugetlbfs/inode.c | 4 ++-- include/linux/mm.h | 4 ++-- ipc/shm.c | 4 ++-- mm/mlock.c | 9 +++++---- mm/shmem.c | 6 +++--- 5 files changed, 14 insertions(+), 13 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index cdfb1ae78a3f..381902288f4d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1465,7 +1465,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size, if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) { *ucounts = current_ucounts(); - if (user_shm_lock(size, *ucounts)) { + if (user_shm_lock(size, *ucounts, true)) { task_lock(current); pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n", current->comm, current->pid); @@ -1499,7 +1499,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size, iput(inode); out: if (*ucounts) { - user_shm_unlock(size, *ucounts); + user_shm_unlock(size, *ucounts, true); *ucounts = NULL; } return file; diff --git a/include/linux/mm.h b/include/linux/mm.h index f1be2221512b..43cb5a6f97ff 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1713,8 +1713,8 @@ extern bool can_do_mlock(void); #else static inline bool can_do_mlock(void) { return false; } #endif -extern bool user_shm_lock(loff_t size, struct ucounts *ucounts); -extern void user_shm_unlock(loff_t size, struct ucounts *ucounts); +extern bool user_shm_lock(loff_t size, struct ucounts *ucounts, bool getuc); +extern void user_shm_unlock(loff_t size, struct ucounts *ucounts, bool putuc); /* * Parameter block passed down to zap_pte_range in exceptional cases. 
diff --git a/ipc/shm.c b/ipc/shm.c
index 748933e376ca..3e63809d38b7 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -289,7 +289,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
 		shmem_lock(shm_file, 0, shp->mlock_ucounts);
 	else if (shp->mlock_ucounts)
 		user_shm_unlock(i_size_read(file_inode(shm_file)),
-				shp->mlock_ucounts);
+				shp->mlock_ucounts, true);
 	fput(shm_file);
 	ipc_update_pid(&shp->shm_cprid, NULL);
 	ipc_update_pid(&shp->shm_lprid, NULL);
@@ -699,7 +699,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	ipc_update_pid(&shp->shm_cprid, NULL);
 	ipc_update_pid(&shp->shm_lprid, NULL);
 	if (is_file_hugepages(file) && shp->mlock_ucounts)
-		user_shm_unlock(size, shp->mlock_ucounts);
+		user_shm_unlock(size, shp->mlock_ucounts, true);
 	fput(file);
 	ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
 	return error;
diff --git a/mm/mlock.c b/mm/mlock.c
index 7df88fce0fc9..5afa3eba9a13 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -818,7 +818,7 @@ SYSCALL_DEFINE0(munlockall)
  */
 static DEFINE_SPINLOCK(shmlock_user_lock);
 
-bool user_shm_lock(loff_t size, struct ucounts *ucounts)
+bool user_shm_lock(loff_t size, struct ucounts *ucounts, bool getuc)
 {
 	unsigned long lock_limit, locked;
 	long memlock;
@@ -836,7 +836,7 @@ bool user_shm_lock(loff_t size, struct ucounts *ucounts)
 		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
 		goto out;
 	}
-	if (!get_ucounts(ucounts)) {
+	if (getuc && !get_ucounts(ucounts)) {
 		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
 		goto out;
 	}
@@ -846,10 +846,11 @@ bool user_shm_lock(loff_t size, struct ucounts *ucounts)
 	return allowed;
 }
 
-void user_shm_unlock(loff_t size, struct ucounts *ucounts)
+void user_shm_unlock(loff_t size, struct ucounts *ucounts, bool putuc)
 {
 	spin_lock(&shmlock_user_lock);
 	dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
 	spin_unlock(&shmlock_user_lock);
-	put_ucounts(ucounts);
+	if (putuc)
+		put_ucounts(ucounts);
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 35c0f5c7120e..1ddb910e976c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1163,7 +1163,7 @@ static void shmem_evict_inode(struct inode *inode)
 	if (shmem_mapping(inode->i_mapping)) {
 		if (info->mlock_ucounts) {
-			user_shm_unlock(inode->i_size, info->mlock_ucounts);
+			user_shm_unlock(inode->i_size, info->mlock_ucounts, true);
 			info->mlock_ucounts = NULL;
 		}
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -2276,13 +2276,13 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
 	 * no serialization needed when called from shm_destroy().
 	 */
 	if (lock && !(info->flags & VM_LOCKED)) {
-		if (!user_shm_lock(inode->i_size, ucounts))
+		if (!user_shm_lock(inode->i_size, ucounts, true))
 			goto out_nomem;
 		info->flags |= VM_LOCKED;
 		mapping_set_unevictable(file->f_mapping);
 	}
 	if (!lock && (info->flags & VM_LOCKED) && ucounts) {
-		user_shm_unlock(inode->i_size, ucounts);
+		user_shm_unlock(inode->i_size, ucounts, true);
 		info->flags &= ~VM_LOCKED;
 		mapping_clear_unevictable(file->f_mapping);
 	}
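The asymmetry this avoids is easiest to see in the caller the next patch
adds. Below is a simplified sketch of that call pattern (my condensation;
shmem_resize_memlock is a hypothetical name, the real code lives inline in
shmem_setattr()): the inode's mlock_ucounts already holds its ucounts
reference for the lifetime of the memlock, so a resize adjusts only the
accounted byte count, and the shrink leg cannot fail.

/*
 * Hypothetical helper condensing the next patch's shmem_setattr() logic:
 * resize the locked extent of an already-memlocked tmpfs file.  Neither
 * leg takes or drops a ucounts reference (getuc/putuc false), so
 * shrinking never fails on get_ucounts() overflow.
 */
static int shmem_resize_memlock(struct ucounts *ucounts,
				loff_t oldsize, loff_t newsize)
{
	loff_t mlock = round_up(newsize, PAGE_SIZE) -
		       round_up(oldsize, PAGE_SIZE);

	if (mlock < 0)
		user_shm_unlock(-mlock, ucounts, false);
	else if (mlock > 0 && !user_shm_lock(mlock, ucounts, false))
		return -EPERM;	/* would exceed RLIMIT_MEMLOCK */
	return 0;
}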
From patchwork Fri Jul 30 08:09:56 2021
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 12410629
Date: Fri, 30 Jul 2021 01:09:56 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Subject: [PATCH 15/16] tmpfs: permit changing size of memlocked file
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

We have users who change the size of their memlocked file by
F_MEM_UNLOCK, ftruncate, F_MEM_LOCK. That risks swapout in between, and
is distasteful: particularly if the file is very large (when
shmem_unlock_mapping() has a lot of work to move pages off the
Unevictable list, only for them to be moved back there later on).

Modify shmem_setattr() to grow or shrink, and shmem_fallocate() to grow,
the locked extent. But forbid (EPERM) both if current_ucounts() differs
from the locker's mlock_ucounts (without even a CAP_IPC_LOCK override).
They could be permitted (the caller already has unsealed write access),
but it's probably less confusing to restrict size change to the locker.

But leave shmem_write_begin() as is, preventing the memlocked file from
being extended implicitly by writes beyond EOF: I think that it's best
to demand an explicit size change, by truncate or fallocate, when
memlocked. (But notice in testing "echo x >memlockedfile" how the
O_TRUNC succeeds but the write fails: would F_MEM_UNLOCK on truncation
to 0 be better?)
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 1ddb910e976c..fa4a264453bf 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1123,15 +1123,30 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 
 		/* protected by i_mutex */
 		if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
-		    (newsize > oldsize && (info->seals & F_SEAL_GROW)) ||
-		    (newsize != oldsize && info->mlock_ucounts))
+		    (newsize > oldsize && (info->seals & F_SEAL_GROW)))
 			return -EPERM;
 
 		if (newsize != oldsize) {
-			error = shmem_reacct_size(SHMEM_I(inode)->flags,
-					oldsize, newsize);
+			struct ucounts *ucounts = info->mlock_ucounts;
+
+			if (ucounts && ucounts != current_ucounts())
+				return -EPERM;
+			error = shmem_reacct_size(info->flags,
+					oldsize, newsize);
 			if (error)
 				return error;
+			if (ucounts) {
+				loff_t mlock = round_up(newsize, PAGE_SIZE) -
+					round_up(oldsize, PAGE_SIZE);
+				if (mlock < 0) {
+					user_shm_unlock(-mlock, ucounts, false);
+				} else if (mlock > 0 &&
+					!user_shm_lock(mlock, ucounts, false)) {
+					shmem_reacct_size(info->flags,
+							newsize, oldsize);
+					return -EPERM;
+				}
+			}
 			i_size_write(inode, newsize);
 			inode->i_ctime = inode->i_mtime = current_time(inode);
 		}
@@ -2784,6 +2799,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_falloc shmem_falloc;
 	pgoff_t start, index, end, undo_fallocend;
+	loff_t mlock = 0;
 	int error;
 
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2830,13 +2846,23 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	if (error)
 		goto out;
 
-	if ((info->seals & F_SEAL_GROW) && offset + len > inode->i_size) {
-		error = -EPERM;
-		goto out;
-	}
-	if (info->mlock_ucounts && offset + len > inode->i_size) {
+	if (offset + len > inode->i_size) {
 		error = -EPERM;
-		goto out;
+		if (info->seals & F_SEAL_GROW)
+			goto out;
+		if (info->mlock_ucounts) {
+			if (info->mlock_ucounts != current_ucounts() ||
+			    (mode & FALLOC_FL_KEEP_SIZE))
+				goto out;
+			mlock = round_up(offset + len, PAGE_SIZE) -
+				round_up(inode->i_size, PAGE_SIZE);
+			if (mlock > 0 &&
+			    !user_shm_lock(mlock, info->mlock_ucounts, false)) {
+				mlock = 0;
+				goto out;
+			}
+		}
+		error = 0;
 	}
 
 	start = offset >> PAGE_SHIFT;
@@ -2932,6 +2958,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	inode->i_private = NULL;
 	spin_unlock(&inode->i_lock);
 out:
+	if (error && mlock > 0)
+		user_shm_unlock(mlock, info->mlock_ucounts, false);
 	inode_unlock(inode);
 	return error;
 }
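With this patch, the unlock/truncate/relock dance from the changelog
collapses to a single ftruncate() by the locker. A hypothetical userspace
sketch (my illustration; F_MEM_UNLOCK and F_MEM_LOCK are the fcntls from
earlier in this series, referenced only in comments here):

#include <unistd.h>
#include <sys/types.h>

/* Hypothetical sketch: resizing an already-memlocked tmpfs file. */
int resize_memlocked(int fd, off_t newsize)
{
	/*
	 * Previously required, risking swapout in between:
	 *	fcntl(fd, F_MEM_UNLOCK);
	 *	ftruncate(fd, newsize);
	 *	fcntl(fd, F_MEM_LOCK);
	 * With this patch, the locker may simply:
	 */
	return ftruncate(fd, newsize);	/* locked extent grows or shrinks */
}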
From patchwork Fri Jul 30 08:13:00 2021
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 12410633
Date: Fri, 30 Jul 2021 01:13:00 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Biederman" , Alexey Gladkov , Chris Wilson , Matthew Auld , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 16/16] memfd: memfd_create(name, MFD_MEM_LOCK) for memlocked shmem In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> Message-ID: References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 8F87EB002444 Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=google.com header.s=20161025 header.b=Kac6JXUK; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf19.hostedemail.com: domain of hughd@google.com designates 209.85.160.181 as permitted sender) smtp.mailfrom=hughd@google.com X-Stat-Signature: axe4bkn19h853qoct3pkk9h3md31nsee X-HE-Tag: 1627632784-481629 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that the size of a memlocked file can be changed, memfd_create() can accept an MFD_MEM_LOCK flag to request memlocking, even though the initial size is of course 0. Signed-off-by: Hugh Dickins Reported-by: kernel test robot --- include/uapi/linux/memfd.h | 1 + mm/memfd.c | 7 +++++-- mm/shmem.c | 13 ++++++++++++- 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h index 8358a69e78cc..9113b5aa1763 100644 --- a/include/uapi/linux/memfd.h +++ b/include/uapi/linux/memfd.h @@ -9,6 +9,7 @@ #define MFD_ALLOW_SEALING 0x0002U #define MFD_HUGETLB 0x0004U /* Use hugetlbfs */ #define MFD_HUGEPAGE 0x0008U /* Use huge tmpfs */ +#define MFD_MEM_LOCK 0x0010U /* Memlock tmpfs */ /* * Huge page size encoding when MFD_HUGETLB is specified, and a huge page diff --git a/mm/memfd.c b/mm/memfd.c index 0d1a504d2fc9..e39f9eed55d2 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -248,7 +248,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg) #define MFD_ALL_FLAGS (MFD_CLOEXEC | \ MFD_ALLOW_SEALING | \ MFD_HUGETLB | \ - MFD_HUGEPAGE) + MFD_HUGEPAGE | \ + MFD_MEM_LOCK) SYSCALL_DEFINE2(memfd_create, const char __user *, uname, @@ -262,7 +263,7 @@ SYSCALL_DEFINE2(memfd_create, if (flags & MFD_HUGETLB) { /* Disallow huge tmpfs when choosing hugetlbfs */ - if (flags & MFD_HUGEPAGE) + if (flags & (MFD_HUGEPAGE | MFD_MEM_LOCK)) return -EINVAL; /* Allow huge page size encoding in flags. 
 		 */
 		if (flags & ~(unsigned int)(MFD_ALL_FLAGS |
@@ -314,6 +315,8 @@ SYSCALL_DEFINE2(memfd_create,
 
 	if (flags & MFD_HUGEPAGE)
 		vm_flags |= VM_HUGEPAGE;
+	if (flags & MFD_MEM_LOCK)
+		vm_flags |= VM_LOCKED;
 	file = shmem_file_setup(name, 0, vm_flags);
 	}
diff --git a/mm/shmem.c b/mm/shmem.c
index fa4a264453bf..a0a83e59ae07 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2395,7 +2395,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		spin_lock_init(&info->lock);
 		atomic_set(&info->stop_eviction, 0);
 		info->seals = F_SEAL_SEAL;
-		info->flags = flags & VM_NORESERVE;
+		info->flags = flags & (VM_NORESERVE | VM_LOCKED);
 		if ((flags & VM_HUGEPAGE) &&
 		    transparent_hugepage_allowed(sbinfo) &&
 		    !test_bit(MMF_DISABLE_THP, &current->mm->flags))
@@ -4254,6 +4254,17 @@ static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name, l
 	inode->i_size = size;
 	clear_nlink(inode);	/* It is unlinked */
 	res = ERR_PTR(ramfs_nommu_expand_for_mapping(inode, size));
+	if (!IS_ERR(res) && (flags & VM_LOCKED)) {
+		struct ucounts *ucounts = current_ucounts();
+		/*
+		 * Only memfd_create() may pass VM_LOCKED, and it passes
+		 * size 0; but avoid that assumption in case it changes.
+		 */
+		if (user_shm_lock(size, ucounts, true))
+			SHMEM_I(inode)->mlock_ucounts = ucounts;
+		else
+			res = ERR_PTR(-EPERM);
+	}
 	if (!IS_ERR(res))
 		res = alloc_file_pseudo(inode, mnt, name, O_RDWR,
 				&shmem_file_operations);
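A hypothetical sketch of the intended usage (my illustration, not from the
patch): create the memfd locked while it is still empty, then size it with
ftruncate(), which the previous patch permits for the locker. MFD_MEM_LOCK
is the flag added above, so it is not in today's <linux/memfd.h>; its value
here is copied from this patch, and the name and 16MB size are examples.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_MEM_LOCK
#define MFD_MEM_LOCK 0x0010U	/* from this patch's uapi addition */
#endif

int main(void)
{
	int fd = memfd_create("locked-buffer", MFD_MEM_LOCK);

	if (fd < 0 || ftruncate(fd, 16 << 20) < 0) {	/* grow to 16MB, memlocked */
		perror("memfd");
		return 1;
	}
	/* pages of this file will now stay off swap until fd is closed */
	return 0;
}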