From patchwork Fri Jul 30 07:25:37 2021
From: Hugh Dickins
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410561
Date: Fri, 30 Jul 2021 00:25:37 -0700 (PDT)
To: Andrew Morton
Cc: Hugh Dickins, Shakeel Butt, "Kirill A. Shutemov", Yang Shi, Miaohe Lin,
 Mike Kravetz, Michal Hocko, Rik van Riel, Christoph Hellwig, Matthew Wilcox,
 "Eric W. Biederman", Alexey Gladkov, Chris Wilson, Matthew Auld,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 01/16] huge tmpfs: fix fallocate(vanilla) advance over huge pages
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

shmem_fallocate() goes to a lot of trouble to leave its newly allocated
pages !Uptodate, partly to identify and undo them on failure, partly to
leave the overhead of clearing them until later.  But the huge page case
did not skip to the end of the extent: it walked through the tail pages
one by one, and appeared to work just fine; yet in doing so it cleared
and Uptodated the huge page, so there was no way to undo it on failure.

Now advance immediately to the end of the huge extent, with a comment on
why this is more than just an optimization.  But although this speeds up
huge tmpfs fallocation, it does leave the clearing until first use, and
some users may have come to appreciate slow fallocate but fast first use:
if they complain, then we can consider adding a pass to clear at the end.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 mm/shmem.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 70d9ce294bb4..0cd5c9156457 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2736,7 +2736,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
     inode->i_private = &shmem_falloc;
     spin_unlock(&inode->i_lock);
 
-    for (index = start; index < end; index++) {
+    for (index = start; index < end; ) {
         struct page *page;
 
         /*
@@ -2759,13 +2759,26 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
             goto undone;
         }
 
+        index++;
+        /*
+         * Here is a more important optimization than it appears:
+         * a second SGP_FALLOC on the same huge page will clear it,
+         * making it PageUptodate and un-undoable if we fail later.
+         */
+        if (PageTransCompound(page)) {
+            index = round_up(index, HPAGE_PMD_NR);
+            /* Beware 32-bit wraparound */
+            if (!index)
+                index--;
+        }
+
         /*
          * Inform shmem_writepage() how far we have reached.
          * No need for lock or barrier: we have the page lock.
          */
-        shmem_falloc.next++;
         if (!PageUptodate(page))
-            shmem_falloc.nr_falloced++;
+            shmem_falloc.nr_falloced += index - shmem_falloc.next;
+        shmem_falloc.next = index;
 
         /*
          * If !PageUptodate, leave it that way so that freeable pages
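
As a worked illustration, not part of Hugh's patch: a minimal userspace model
of the index-advance arithmetic above.  HPAGE_PMD_NR = 512 is an assumption
(x86_64's 2MB huge page over 4kB base pages), and advance_over_huge() is a
hypothetical name for what the loop does when PageTransCompound(page) is true.

/* cc -o advance advance.c && ./advance */
#include <assert.h>
#include <stdint.h>

#define HPAGE_PMD_NR 512u

static uint32_t advance_over_huge(uint32_t index)
{
    index++;
    /* round_up to the next huge-page boundary, as in the patch */
    index = (index + HPAGE_PMD_NR - 1) & ~(HPAGE_PMD_NR - 1);
    if (!index)     /* Beware 32-bit wraparound */
        index--;    /* clamp to UINT32_MAX rather than restarting at 0 */
    return index;
}

int main(void)
{
    /* from the head of a huge extent, skip straight past all its tails */
    assert(advance_over_huge(0) == HPAGE_PMD_NR);
    assert(advance_over_huge(513) == 1024);
    /* last huge extent of a 32-bit index space: round_up wraps to 0 */
    assert(advance_over_huge(UINT32_MAX - 1) == UINT32_MAX);
    return 0;
}

Without the wraparound guard, the final case would reset index to 0 and the
for (index = start; index < end; ) loop would start fallocating from scratch.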
From patchwork Fri Jul 30 07:28:22 2021
From: Hugh Dickins
X-Patchwork-Id: 12410563
Date: Fri, 30 Jul 2021 00:28:22 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 02/16] huge tmpfs: fix split_huge_page() after FALLOC_FL_KEEP_SIZE
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

A successful shmem_fallocate() guarantees that the extent has been
reserved, even beyond i_size when the FALLOC_FL_KEEP_SIZE flag was used.
But that guarantee is broken by shmem_unused_huge_shrink()'s attempts to
split huge pages and free their excess beyond i_size; and by other uses
of split_huge_page() near i_size.

It's sad to add a shmem inode field just for this, but I did not find a
better way to keep the guarantee.  A flag to say KEEP_SIZE has been used
would be cheaper, but I'm averse to unclearable flags.  The fallocend
field is not perfect either (many disjoint ranges might be fallocated),
but good enough; and gains another use later on.

Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 include/linux/shmem_fs.h | 13 +++++++++++++
 mm/huge_memory.c         |  6 ++++--
 mm/shmem.c               | 15 ++++++++++++++-
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 8e775ce517bb..9b7f7ac52351 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -18,6 +18,7 @@ struct shmem_inode_info {
     unsigned long       flags;
     unsigned long       alloced;    /* data pages alloced to file */
     unsigned long       swapped;    /* subtotal assigned to swap */
+    pgoff_t             fallocend;  /* highest fallocate endindex */
     struct list_head    shrinklist; /* shrinkable hpage inodes */
     struct list_head    swaplist;   /* chain of maybes on swap */
     struct shared_policy policy;    /* NUMA memory alloc policy */
@@ -119,6 +120,18 @@ static inline bool shmem_file(struct file *file)
     return shmem_mapping(file->f_mapping);
 }
 
+/*
+ * If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
+ * beyond i_size's notion of EOF, which fallocate has committed to reserving:
+ * which split_huge_page() must therefore not delete.  This use of a single
+ * "fallocend" per inode errs on the side of not deleting a reservation when
+ * in doubt: there are plenty of cases when it preserves unreserved pages.
+ */
+static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
+{
+    return max(eof, SHMEM_I(inode)->fallocend);
+}
+
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index afff3ac87067..890fb73ac89b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2454,11 +2454,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
     for (i = nr - 1; i >= 1; i--) {
         __split_huge_page_tail(head, i, lruvec, list);
-        /* Some pages can be beyond i_size: drop them from page cache */
+        /* Some pages can be beyond EOF: drop them from page cache */
         if (head[i].index >= end) {
             ClearPageDirty(head + i);
             __delete_from_page_cache(head + i, NULL);
-            if (IS_ENABLED(CONFIG_SHMEM) && PageSwapBacked(head))
+            if (shmem_mapping(head->mapping))
                 shmem_uncharge(head->mapping->host, 1);
             put_page(head + i);
         } else if (!PageAnon(page)) {
@@ -2686,6 +2686,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
          * head page lock is good enough to serialize the trimming.
          */
         end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
+        if (shmem_mapping(mapping))
+            end = shmem_fallocend(mapping->host, end);
     }
 
     /*
diff --git a/mm/shmem.c b/mm/shmem.c
index 0cd5c9156457..24c9da6b41c2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -905,6 +905,9 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
     if (lend == -1)
         end = -1;   /* unsigned, so actually very big */
 
+    if (info->fallocend > start && info->fallocend <= end && !unfalloc)
+        info->fallocend = start;
+
     pagevec_init(&pvec);
     index = start;
     while (index < end && find_lock_entries(mapping, index, end - 1,
@@ -2667,7 +2670,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
     struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
     struct shmem_inode_info *info = SHMEM_I(inode);
     struct shmem_falloc shmem_falloc;
-    pgoff_t start, index, end;
+    pgoff_t start, index, end, undo_fallocend;
     int error;
 
     if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2736,6 +2739,15 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
     inode->i_private = &shmem_falloc;
     spin_unlock(&inode->i_lock);
 
+    /*
+     * info->fallocend is only relevant when huge pages might be
+     * involved: to prevent split_huge_page() freeing fallocated
+     * pages when FALLOC_FL_KEEP_SIZE committed beyond i_size.
+     */
+    undo_fallocend = info->fallocend;
+    if (info->fallocend < end)
+        info->fallocend = end;
+
     for (index = start; index < end; ) {
         struct page *page;
 
@@ -2750,6 +2762,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
         else
             error = shmem_getpage(inode, index, &page, SGP_FALLOC);
         if (error) {
+            info->fallocend = undo_fallocend;
             /* Remove the !PageUptodate pages we added */
             if (index > start) {
                 shmem_undo_range(inode,
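
A small userspace check, not from the patch, of the guarantee being defended:
fallocate(FALLOC_FL_KEEP_SIZE) reserves blocks beyond EOF without moving
i_size, and it is that reservation which split_huge_page() must not free.
The path /dev/shm/keepsize-demo is an assumption; any tmpfs mount will do.

/* cc -o keepsize keepsize.c && ./keepsize */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    int fd = open("/dev/shm/keepsize-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0 || ftruncate(fd, 4096))
        exit(1);
    /* reserve 4MB beyond the 4kB i_size, without moving EOF */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 4 << 20))
        exit(1);
    fstat(fd, &st);
    /* i_size still 4kB, but st_blocks covers the whole reserved extent */
    printf("size=%lld blocks=%lld\n",
           (long long)st.st_size, (long long)st.st_blocks);
    unlink("/dev/shm/keepsize-demo");
    return 0;
}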
From patchwork Fri Jul 30 07:30:56 2021
From: Hugh Dickins
X-Patchwork-Id: 12410569
Date: Fri, 30 Jul 2021 00:30:56 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 03/16] huge tmpfs: remove shrinklist addition from shmem_setattr()
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <42353193-6896-aa85-9127-78881d5fef66@google.com>

There's a block of code in shmem_setattr() to add the inode to
shmem_unused_huge_shrink()'s shrinklist when lowering i_size: it dates
from before 5.7 changed truncation to do split_huge_page() for itself,
and should have been removed at that time.

I am over-stating that: split_huge_page() can fail (notably if there's
an extra reference to the page at that time), so there might be value in
retrying.  But there were already retries as truncation worked through
the tails, and this addition risks repeating unsuccessful retries
indefinitely: I'd rather remove it now, and work on reducing the chance
of split_huge_page() failures separately, if we need to.

Fixes: 71725ed10c40 ("mm: huge tmpfs: try to split_huge_page() when punching hole")
Signed-off-by: Hugh Dickins
---
 mm/shmem.c | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 24c9da6b41c2..ce3ccaac54d6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1061,7 +1061,6 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 {
     struct inode *inode = d_inode(dentry);
     struct shmem_inode_info *info = SHMEM_I(inode);
-    struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
     int error;
 
     error = setattr_prepare(&init_user_ns, dentry, attr);
@@ -1097,24 +1096,6 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
             if (oldsize > holebegin)
                 unmap_mapping_range(inode->i_mapping,
                             holebegin, 0, 1);
-
-            /*
-             * Part of the huge page can be beyond i_size: subject
-             * to shrink under memory pressure.
-             */
-            if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-                spin_lock(&sbinfo->shrinklist_lock);
-                /*
-                 * _careful to defend against unlocked access to
-                 * ->shrink_list in shmem_unused_huge_shrink()
-                 */
-                if (list_empty_careful(&info->shrinklist)) {
-                    list_add_tail(&info->shrinklist,
-                            &sbinfo->shrinklist);
-                    sbinfo->shrinklist_len++;
-                }
-                spin_unlock(&sbinfo->shrinklist_lock);
-            }
         }
     }
 
From patchwork Fri Jul 30 07:36:48 2021
From: Hugh Dickins
X-Patchwork-Id: 12410573
Date: Fri, 30 Jul 2021 00:36:48 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 04/16] huge tmpfs: revert shmem's use of transhuge_vma_enabled()
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

5.14 commit e6be37b2e7bd ("mm/huge_memory.c: add missing read-only THP
checking in transparent_hugepage_enabled()") added transhuge_vma_enabled()
as a wrapper for two very different checks: shmem_huge_enabled() prefers
to show those two checks explicitly, as before.

Signed-off-by: Hugh Dickins
---
 mm/shmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ce3ccaac54d6..c6fa6f4f2db8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4003,7 +4003,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
     loff_t i_size;
     pgoff_t off;
 
-    if (!transhuge_vma_enabled(vma, vma->vm_flags))
+    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
         return false;
     if (shmem_huge == SHMEM_HUGE_FORCE)
         return true;
From patchwork Fri Jul 30 07:39:24 2021
From: Hugh Dickins
X-Patchwork-Id: 12410575
Date: Fri, 30 Jul 2021 00:39:24 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 05/16] huge tmpfs: move shmem_huge_enabled() upwards
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

shmem_huge_enabled() is about to be enhanced into shmem_is_huge(), so
that it can be used more widely throughout: before making functional
changes, shift it to its final position (to avoid forward declaration).

Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 mm/shmem.c | 72 ++++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index c6fa6f4f2db8..740d48ef1eb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -476,6 +476,41 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly;
 
+bool shmem_huge_enabled(struct vm_area_struct *vma)
+{
+    struct inode *inode = file_inode(vma->vm_file);
+    struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
+    loff_t i_size;
+    pgoff_t off;
+
+    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+        return false;
+    if (shmem_huge == SHMEM_HUGE_FORCE)
+        return true;
+    if (shmem_huge == SHMEM_HUGE_DENY)
+        return false;
+    switch (sbinfo->huge) {
+    case SHMEM_HUGE_NEVER:
+        return false;
+    case SHMEM_HUGE_ALWAYS:
+        return true;
+    case SHMEM_HUGE_WITHIN_SIZE:
+        off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
+        i_size = round_up(i_size_read(inode), PAGE_SIZE);
+        if (i_size >= HPAGE_PMD_SIZE &&
+            i_size >> PAGE_SHIFT >= off)
+            return true;
+        fallthrough;
+    case SHMEM_HUGE_ADVISE:
+        /* TODO: implement fadvise() hints */
+        return (vma->vm_flags & VM_HUGEPAGE);
+    default:
+        VM_BUG_ON(1);
+        return false;
+    }
+}
+
 #if defined(CONFIG_SYSFS)
 static int shmem_parse_huge(const char *str)
 {
@@ -3995,43 +4030,6 @@ struct kobj_attribute shmem_enabled_attr =
     __ATTR(shmem_enabled, 0644, shmem_enabled_show, shmem_enabled_store);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-bool shmem_huge_enabled(struct vm_area_struct *vma)
-{
-    struct inode *inode = file_inode(vma->vm_file);
-    struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-    loff_t i_size;
-    pgoff_t off;
-
-    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
-        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-        return false;
-    if (shmem_huge == SHMEM_HUGE_FORCE)
-        return true;
-    if (shmem_huge == SHMEM_HUGE_DENY)
-        return false;
-    switch (sbinfo->huge) {
-    case SHMEM_HUGE_NEVER:
-        return false;
-    case SHMEM_HUGE_ALWAYS:
-        return true;
-    case SHMEM_HUGE_WITHIN_SIZE:
-        off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
-        i_size = round_up(i_size_read(inode), PAGE_SIZE);
-        if (i_size >= HPAGE_PMD_SIZE &&
-            i_size >> PAGE_SHIFT >= off)
-            return true;
-        fallthrough;
-    case SHMEM_HUGE_ADVISE:
-        /* TODO: implement fadvise() hints */
-        return (vma->vm_flags & VM_HUGEPAGE);
-    default:
-        VM_BUG_ON(1);
-        return false;
-    }
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
 #else /* !CONFIG_SHMEM */
 
 /*
From patchwork Fri Jul 30 07:42:16 2021
From: Hugh Dickins
X-Patchwork-Id: 12410587
Date: Fri, 30 Jul 2021 00:42:16 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 06/16] huge tmpfs: shmem_is_huge(vma, inode, index)
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

Extend shmem_huge_enabled(vma) to shmem_is_huge(vma, inode, index), so
that a consistent set of checks can be applied, even when the inode is
accessed through read/write syscalls (with NULL vma) instead of mmaps
(the index argument is seldom of interest, but required by mount option
"huge=within_size").  Clean up and rearrange the checks a little.

This then replaces the checks which shmem_fault() and shmem_getpage_gfp()
were making, and eliminates the SGP_HUGE and SGP_NOHUGE modes: while it's
still true that khugepaged's collapse_file() at that point wants a small
page, the race that might allocate it a huge page is too unlikely to be
worth optimizing against (we are there *because* there was at least one
small page in the way), and it is handled by a later PageTransCompound
check.

Replace a couple of 0s by explicit SHMEM_HUGE_NEVERs; and replace the
obscure !shmem_mapping() symlink check by explicit S_ISLNK() - nothing
else needs that symlink check, so leave it there in shmem_getpage_gfp().

Signed-off-by: Hugh Dickins
---
 include/linux/shmem_fs.h |  9 +++--
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 84 ++++++++++++----------------------------
 3 files changed, 32 insertions(+), 63 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 9b7f7ac52351..3b05a28e34c4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -86,7 +86,12 @@ extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(unsigned int type, bool frontswap,
                unsigned long *fs_pages_to_unuse);
 
-extern bool shmem_huge_enabled(struct vm_area_struct *vma);
+extern bool shmem_is_huge(struct vm_area_struct *vma,
+              struct inode *inode, pgoff_t index);
+static inline bool shmem_huge_enabled(struct vm_area_struct *vma)
+{
+    return shmem_is_huge(vma, file_inode(vma->vm_file), vma->vm_pgoff);
+}
 extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
                         pgoff_t start, pgoff_t end);
@@ -95,8 +100,6 @@ extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 enum sgp_type {
     SGP_READ,   /* don't exceed i_size, don't allocate page */
     SGP_CACHE,  /* don't exceed i_size, may allocate page */
-    SGP_NOHUGE, /* like SGP_CACHE, but no huge pages */
-    SGP_HUGE,   /* like SGP_CACHE, huge pages preferred */
     SGP_WRITE,  /* may exceed i_size, may allocate !Uptodate page */
     SGP_FALLOC, /* like SGP_WRITE, but make existing page Uptodate */
 };
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b0412be08fa2..cecb19c3e965 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1721,7 +1721,7 @@ static void collapse_file(struct mm_struct *mm,
                 xas_unlock_irq(&xas);
                 /* swap in or instantiate fallocated page */
                 if (shmem_getpage(mapping->host, index, &page,
-                          SGP_NOHUGE)) {
+                          SGP_CACHE)) {
                     result = SCAN_FAIL;
                     goto xa_unlocked;
                 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 740d48ef1eb5..6def7391084c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -474,39 +474,35 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /* ifdef here to avoid bloating shmem.o when not necessary */
 
-static int shmem_huge __read_mostly;
+static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
-bool shmem_huge_enabled(struct vm_area_struct *vma)
+bool shmem_is_huge(struct vm_area_struct *vma,
+           struct inode *inode, pgoff_t index)
 {
-    struct inode *inode = file_inode(vma->vm_file);
-    struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
     loff_t i_size;
-    pgoff_t off;
 
-    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
-        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-        return false;
-    if (shmem_huge == SHMEM_HUGE_FORCE)
-        return true;
     if (shmem_huge == SHMEM_HUGE_DENY)
         return false;
-    switch (sbinfo->huge) {
-    case SHMEM_HUGE_NEVER:
+    if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
+        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
         return false;
+    if (shmem_huge == SHMEM_HUGE_FORCE)
+        return true;
+
+    switch (SHMEM_SB(inode->i_sb)->huge) {
     case SHMEM_HUGE_ALWAYS:
         return true;
     case SHMEM_HUGE_WITHIN_SIZE:
-        off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
+        index = round_up(index, HPAGE_PMD_NR);
         i_size = round_up(i_size_read(inode), PAGE_SIZE);
-        if (i_size >= HPAGE_PMD_SIZE &&
-            i_size >> PAGE_SHIFT >= off)
+        if (i_size >= HPAGE_PMD_SIZE && (i_size >> PAGE_SHIFT) >= index)
             return true;
         fallthrough;
     case SHMEM_HUGE_ADVISE:
-        /* TODO: implement fadvise() hints */
-        return (vma->vm_flags & VM_HUGEPAGE);
+        if (vma && (vma->vm_flags & VM_HUGEPAGE))
+            return true;
+        fallthrough;
     default:
-        VM_BUG_ON(1);
         return false;
     }
 }
@@ -680,6 +676,12 @@ static long shmem_unused_huge_count(struct super_block *sb,
 
 #define shmem_huge SHMEM_HUGE_DENY
 
+bool shmem_is_huge(struct vm_area_struct *vma,
+           struct inode *inode, pgoff_t index)
+{
+    return false;
+}
+
 static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
         struct shrink_control *sc, unsigned long nr_to_split)
 {
@@ -1829,7 +1831,6 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
     struct shmem_sb_info *sbinfo;
     struct mm_struct *charge_mm;
     struct page *page;
-    enum sgp_type sgp_huge = sgp;
     pgoff_t hindex = index;
     gfp_t huge_gfp;
     int error;
@@ -1838,8 +1839,6 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
     if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
         return -EFBIG;
-    if (sgp == SGP_NOHUGE || sgp == SGP_HUGE)
-        sgp = SGP_CACHE;
 repeat:
     if (sgp <= SGP_CACHE &&
         ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) {
@@ -1898,36 +1897,12 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
         return 0;
     }
 
-    /* shmem_symlink() */
-    if (!shmem_mapping(mapping))
-        goto alloc_nohuge;
-    if (shmem_huge == SHMEM_HUGE_DENY || sgp_huge == SGP_NOHUGE)
+    /* Never use a huge page for shmem_symlink() */
+    if (S_ISLNK(inode->i_mode))
         goto alloc_nohuge;
-    if (shmem_huge == SHMEM_HUGE_FORCE)
-        goto alloc_huge;
-    switch (sbinfo->huge) {
-    case SHMEM_HUGE_NEVER:
+    if (!shmem_is_huge(vma, inode, index))
         goto alloc_nohuge;
-    case SHMEM_HUGE_WITHIN_SIZE: {
-        loff_t i_size;
-        pgoff_t off;
-
-        off = round_up(index, HPAGE_PMD_NR);
-        i_size = round_up(i_size_read(inode), PAGE_SIZE);
-        if (i_size >= HPAGE_PMD_SIZE &&
-            i_size >> PAGE_SHIFT >= off)
-            goto alloc_huge;
-        fallthrough;
-    }
-    case SHMEM_HUGE_ADVISE:
-        if (sgp_huge == SGP_HUGE)
-            goto alloc_huge;
-        /* TODO: implement fadvise() hints */
-        goto alloc_nohuge;
-    }
 
-alloc_huge:
     huge_gfp = vma_thp_gfp_mask(vma);
     huge_gfp = limit_gfp_mask(huge_gfp, gfp);
     page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
@@ -2083,7 +2058,6 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
     struct vm_area_struct *vma = vmf->vma;
     struct inode *inode = file_inode(vma->vm_file);
     gfp_t gfp = mapping_gfp_mask(inode->i_mapping);
-    enum sgp_type sgp;
     int err;
     vm_fault_t ret = VM_FAULT_LOCKED;
 
@@ -2146,15 +2120,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
         spin_unlock(&inode->i_lock);
     }
 
-    sgp = SGP_CACHE;
-
-    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
-        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-        sgp = SGP_NOHUGE;
-    else if (vma->vm_flags & VM_HUGEPAGE)
-        sgp = SGP_HUGE;
-
-    err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, sgp,
+    err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
                 gfp, vma, vmf, &ret);
     if (err)
         return vmf_error(err);
@@ -3961,7 +3927,7 @@ int __init shmem_init(void)
     if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
         SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
     else
-        shmem_huge = 0; /* just in case it was patched */
+        shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
 #endif
 
     return 0;
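
To make the rearranged checks concrete, a userspace model, not from the
patch, of the SHMEM_HUGE_WITHIN_SIZE case of shmem_is_huge() above;
within_size_grants_huge() is a hypothetical name, and 4kB pages with 2MB
huge pages are assumptions.

/* cc -o within_size within_size.c && ./within_size */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE       4096ull
#define PAGE_SHIFT      12
#define HPAGE_PMD_NR    512ull
#define HPAGE_PMD_SIZE  (HPAGE_PMD_NR * PAGE_SIZE)      /* 2MB */

static uint64_t round_up64(uint64_t x, uint64_t n)
{
    return (x + n - 1) / n * n;
}

/* mirrors: index = round_up(index, HPAGE_PMD_NR);
 *          i_size = round_up(i_size_read(inode), PAGE_SIZE);
 *          i_size >= HPAGE_PMD_SIZE && (i_size >> PAGE_SHIFT) >= index */
static bool within_size_grants_huge(uint64_t i_size, uint64_t index)
{
    index = round_up64(index, HPAGE_PMD_NR);
    i_size = round_up64(i_size, PAGE_SIZE);
    return i_size >= HPAGE_PMD_SIZE && (i_size >> PAGE_SHIFT) >= index;
}

int main(void)
{
    assert(!within_size_grants_huge(1 << 20, 0));    /* 1MB file: never */
    assert(within_size_grants_huge(2 << 20, 0));     /* exactly 2MB: yes */
    assert(!within_size_grants_huge(3 << 20, 1023)); /* extent past EOF: no */
    assert(within_size_grants_huge(4 << 20, 1023));  /* 4MB covers [512,1024) */
    return 0;
}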
From patchwork Fri Jul 30 07:45:49 2021
From: Hugh Dickins
X-Patchwork-Id: 12410593
Date: Fri, 30 Jul 2021 00:45:49 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH 07/16] memfd: memfd_create(name, MFD_HUGEPAGE) for shmem huge pages
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

Commit 749df87bd7be ("mm/shmem: add hugetlbfs support to memfd_create()")
in 4.14 added the MFD_HUGETLB flag to memfd_create(), to use hugetlbfs
pages instead of tmpfs pages: now add the MFD_HUGEPAGE flag, to use tmpfs
Transparent Huge Pages when they can be allocated (flag named to follow
the precedent of madvise's MADV_HUGEPAGE for THPs).

/sys/kernel/mm/transparent_hugepage/shmem_enabled "always" or "force"
already made this possible: but that is much too blunt an instrument,
affecting all the very different kinds of files on the internal shmem
mount, and was intended just for ease of testing hugepage loads.

MFD_HUGEPAGE is implemented internally by VM_HUGEPAGE in the shmem inode
flags: do not permit a PR_SET_THP_DISABLE (MMF_DISABLE_THP) task to set
this flag, and do not set it if THPs are not allowed at all; but let the
memfd_create() succeed even in those cases - the caller wants to create
a memfd, just hinting how it's best allocated if huge pages are available.

shmem_is_huge() (at allocation time or khugepaged time) applies its
SHMEM_HUGE_DENY and vma VM_NOHUGEPAGE and vm_mm MMF_DISABLE_THP checks
first, and only then allows the memfd's MFD_HUGEPAGE to take effect.

Signed-off-by: Hugh Dickins
---
 include/uapi/linux/memfd.h |  3 ++-
 mm/memfd.c                 | 24 ++++++++++++++++++------
 mm/shmem.c                 | 33 +++++++++++++++++++++++++++++--
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..8358a69e78cc 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -7,7 +7,8 @@
 /* flags for memfd_create(2) (unsigned int) */
 #define MFD_CLOEXEC         0x0001U
 #define MFD_ALLOW_SEALING   0x0002U
-#define MFD_HUGETLB         0x0004U
+#define MFD_HUGETLB         0x0004U /* Use hugetlbfs */
+#define MFD_HUGEPAGE        0x0008U /* Use huge tmpfs */
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 081dd33e6a61..0d1a504d2fc9 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -245,7 +245,10 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | \
+               MFD_ALLOW_SEALING | \
+               MFD_HUGETLB | \
+               MFD_HUGEPAGE)
 
 SYSCALL_DEFINE2(memfd_create,
         const char __user *, uname,
@@ -257,14 +260,17 @@ SYSCALL_DEFINE2(memfd_create,
     char *name;
     long len;
 
-    if (!(flags & MFD_HUGETLB)) {
-        if (flags & ~(unsigned int)MFD_ALL_FLAGS)
+    if (flags & MFD_HUGETLB) {
+        /* Disallow huge tmpfs when choosing hugetlbfs */
+        if (flags & MFD_HUGEPAGE)
             return -EINVAL;
-    } else {
         /* Allow huge page size encoding in flags. */
         if (flags & ~(unsigned int)(MFD_ALL_FLAGS |
                 (MFD_HUGE_MASK << MFD_HUGE_SHIFT)))
             return -EINVAL;
+    } else {
+        if (flags & ~(unsigned int)MFD_ALL_FLAGS)
+            return -EINVAL;
     }
 
     /* length includes terminating zero */
@@ -303,8 +309,14 @@ SYSCALL_DEFINE2(memfd_create,
                     HUGETLB_ANONHUGE_INODE,
                     (flags >> MFD_HUGE_SHIFT) &
                     MFD_HUGE_MASK);
-    } else
-        file = shmem_file_setup(name, 0, VM_NORESERVE);
+    } else {
+        unsigned long vm_flags = VM_NORESERVE;
+
+        if (flags & MFD_HUGEPAGE)
+            vm_flags |= VM_HUGEPAGE;
+        file = shmem_file_setup(name, 0, vm_flags);
+    }
+
     if (IS_ERR(file)) {
         error = PTR_ERR(file);
         goto err_fd;
diff --git a/mm/shmem.c b/mm/shmem.c
index 6def7391084c..e2bcf3313686 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -476,6 +476,20 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
+/*
+ * Does either /sys/kernel/mm/transparent_hugepage/shmem_enabled or
+ * /sys/kernel/mm/transparent_hugepage/enabled allow transparent hugepages?
+ * (Can only return true when the machine has_transparent_hugepage() too.)
+ */
+static bool transparent_hugepage_allowed(void)
+{
+    return shmem_huge > SHMEM_HUGE_NEVER ||
+        test_bit(TRANSPARENT_HUGEPAGE_FLAG,
+             &transparent_hugepage_flags) ||
+        test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
+             &transparent_hugepage_flags);
+}
+
 bool shmem_is_huge(struct vm_area_struct *vma,
            struct inode *inode, pgoff_t index)
 {
@@ -486,6 +500,8 @@ bool shmem_is_huge(struct vm_area_struct *vma,
     if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
         test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
         return false;
+    if (SHMEM_I(inode)->flags & VM_HUGEPAGE)
+        return true;
     if (shmem_huge == SHMEM_HUGE_FORCE)
         return true;
 
@@ -676,6 +692,11 @@ static long shmem_unused_huge_count(struct super_block *sb,
 
 #define shmem_huge SHMEM_HUGE_DENY
 
+bool transparent_hugepage_allowed(void)
+{
+    return false;
+}
+
 bool shmem_is_huge(struct vm_area_struct *vma,
            struct inode *inode, pgoff_t index)
 {
@@ -2171,10 +2192,14 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 
     if (shmem_huge != SHMEM_HUGE_FORCE) {
         struct super_block *sb;
+        struct inode *inode;
 
         if (file) {
             VM_BUG_ON(file->f_op != &shmem_file_operations);
-            sb = file_inode(file)->i_sb;
+            inode = file_inode(file);
+            if (SHMEM_I(inode)->flags & VM_HUGEPAGE)
+                goto huge;
+            sb = inode->i_sb;
         } else {
             /*
              * Called directly from mm/mmap.c, or drivers/char/mem.c
@@ -2187,7 +2212,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
         if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
             return addr;
     }
-
+huge:
     offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
     if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
         return addr;
@@ -2308,6 +2333,10 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
         atomic_set(&info->stop_eviction, 0);
         info->seals = F_SEAL_SEAL;
         info->flags = flags & VM_NORESERVE;
+        if ((flags & VM_HUGEPAGE) &&
+            transparent_hugepage_allowed() &&
+            !test_bit(MMF_DISABLE_THP, &current->mm->flags))
+            info->flags |= VM_HUGEPAGE;
         INIT_LIST_HEAD(&info->shrinklist);
         INIT_LIST_HEAD(&info->swaplist);
         simple_xattrs_init(&info->xattrs);
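
A userspace sketch, not part of the patch, of the proposed flag in use; the
fallback definition of MFD_HUGEPAGE just mirrors this patch's memfd.h hunk,
for building against headers that predate it.

/* cc -o memfd_thp memfd_thp.c && ./memfd_thp */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_HUGEPAGE
#define MFD_HUGEPAGE 0x0008U    /* value proposed by this patch */
#endif

int main(void)
{
    /* hint huge tmpfs pages; creation still succeeds if THP is unavailable */
    int fd = memfd_create("thp-hinted", MFD_CLOEXEC | MFD_HUGEPAGE);

    if (fd < 0) {
        perror("memfd_create");
        return 1;
    }
    if (ftruncate(fd, 4 << 20))     /* 4MB: two PMD-sized extents */
        return 1;
    char *p = mmap(NULL, 4 << 20, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return 1;
    memset(p, 0, 4 << 20);  /* faulting allocates huge pages when allowed */
    return 0;
}

Note that, unlike MFD_HUGETLB, this needs no hugetlbfs pool reservation:
MFD_HUGEPAGE is only a hint that shmem_is_huge() consults later.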
mail-qt1-x82b.google.com with SMTP id x9so5803124qtw.13 for ; Fri, 30 Jul 2021 00:48:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=EDpc0/d9Oj/hIds1HKJj7zYeYPkeXWmfcCU40dY3+sw=; b=mulBDyHU6yp0zs2CFm2s/qF8ZQrZf+J2EM92FJ7UD86JWzy6+B+y02mYtGpoe/bymr FulZ9jgWv8TSoZG2zmk+sh3BcGjARG5wTNH9RMKlhHCPgTdQ9t44NZmIsEgGqSX7T2eD ZPciAH+CbJ+bJ8Qkv/+dK/eJ6h2fJkahljPWaWbVMl0NaQvNAZyCdL6QRkftIXIxCfru FCfAjzrflmygFr09VLHPRLI4xj5kDYcsZ0yTrpmH6LR/6BSPrUqUw76Pt5G0B1yewiOt i3ppecxRxcCkJ1xMFXsBjQ7uC+2vG3bqwdqVS5Kk5oCB3p3WZ+BiUnt7eoup6ZDTLsQC H9JQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=EDpc0/d9Oj/hIds1HKJj7zYeYPkeXWmfcCU40dY3+sw=; b=JaiJbsHs3/C2o66nEn2dXRYbwq9zQVzuOCknggEGSt51cwkGqezAxGj0BIq8U6Nszb Uc4Zw3seu3h2VgNZYL49gWZZWqW+uVD4gs+AK8iWRAniGD+zKR6ofwd/dIdKIhTBrMw8 4V3M3LrbFYboupFUIMhH5d1QNaxDj3LYRf5c7C+I2ceAuyEcsfKrodyJf/Idm7fRuLiv NEDXMF2UpM72YYdh7MeWK7NT/bScaSPsd9RcZwGWi5BrTJbSx5tg1kZhzIYPYKuTFffD xIfnTOgmV8cDHZl/jeQ1jZIl7FUM5oJTHv6RkycKBN6YlOb4QztwFG/2IFaEItEW3pfx D5IA== X-Gm-Message-State: AOAM532B2XxMxo3SXZVed2nFynm8kS0Z/HPoByK4NX8g7AHN3qa4h8LE 9DLK4G670FE0k0EB0u2yJFB5ug== X-Google-Smtp-Source: ABdhPJxpjf8WwHJfSjqtA9nxku8loLDko6+6MLVcrUnY5FWupM5kOcvLXGpW+9Cz0tLmMtD8aDa0vA== X-Received: by 2002:ac8:a84:: with SMTP id d4mr1172505qti.109.1627631316774; Fri, 30 Jul 2021 00:48:36 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id l4sm304571qtr.62.2021.07.30.00.48.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 30 Jul 2021 00:48:35 -0700 (PDT) Date: Fri, 30 Jul 2021 00:48:33 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.anvils To: Andrew Morton cc: Hugh Dickins , Shakeel Butt , "Kirill A. Shutemov" , Yang Shi , Miaohe Lin , Mike Kravetz , Michal Hocko , Rik van Riel , Christoph Hellwig , Matthew Wilcox , "Eric W. Biederman" , Alexey Gladkov , Chris Wilson , Matthew Auld , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 08/16] huge tmpfs: fcntl(fd, F_HUGEPAGE) and fcntl(fd, F_NOHUGEPAGE) In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> Message-ID: <1c32c75b-095-22f0-aee3-30a44d4a4744@google.com> References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add support for fcntl(fd, F_HUGEPAGE) and fcntl(fd, F_NOHUGEPAGE), to select hugeness per file: useful to override the default hugeness of the shmem mount, when occasionally needing to store a hugepage file in a smallpage mount or vice versa. These fcntls just specify whether or not to try for huge pages when allocating to the object later: F_HUGEPAGE does not touch small pages already allocated (though khugepaged may do so when the file is mapped afterwards), F_NOHUGEPAGE does not split huge pages already allocated. Why fcntl? Because it's already in use (for sealing) on memfds; and I'm anxious to keep this simple, just applying it to whole files: fallocate, madvise and posix_fadvise each involve a range, which would need a new kind of tree attached to the inode for proper support. 
Any application needing range support should be able to provide that from userspace, by issuing the respective fcntl prior to instantiating each range.

Do not allow it when the file is open read-only (EBADF). Do not permit a PR_SET_THP_DISABLE (MMF_DISABLE_THP) task to interfere with the flags, and do not let VM_HUGEPAGE be set if THPs are not allowed at all (EPERM).

Note that transparent_hugepage_allowed(), used to validate F_HUGEPAGE, accepts (anon) transparent_hugepage_flags in addition to the mount option. This is to overcome the limitation of the "huge=advise" option, which applies hugepage alignment (reducing ASLR) to all mappings, because madvise(address,len,MADV_HUGEPAGE) needs an address before it can be used. So mount option "huge=never" gives a default which can be overridden by fcntl(fd, F_HUGEPAGE) when /sys/kernel/mm/transparent_hugepage/enabled is not "never" too. (We could instead add a "huge=fcntl" mount option between "never" and "advise", but I lack the enthusiasm for that.)

Signed-off-by: Hugh Dickins
---
 fs/fcntl.c                 |  5 +++
 include/linux/shmem_fs.h   |  8 +++++
 include/uapi/linux/fcntl.h |  9 +++++
 mm/shmem.c                 | 70 ++++++++++++++++++++++++++++++++++----
 4 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c index f946bec8f1f1..9cfff87c3332 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -23,6 +23,7 @@ #include #include #include +#include <linux/shmem_fs.h> #include #include #include @@ -434,6 +435,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, case F_SET_FILE_RW_HINT: err = fcntl_rw_hint(filp, cmd, arg); break; + case F_HUGEPAGE: + case F_NOHUGEPAGE: + err = shmem_fcntl(filp, cmd, arg); + break; default: break; }
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 3b05a28e34c4..51b75d74ce89 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -67,6 +67,14 @@ extern int shmem_zero_setup(struct vm_area_struct *); extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts); +#ifdef CONFIG_TMPFS +extern long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg); +#else +static inline long shmem_fcntl(struct file *f, unsigned int c, unsigned long a) +{ + return -EINVAL; +} +#endif /* CONFIG_TMPFS */ #ifdef CONFIG_SHMEM extern const struct address_space_operations shmem_aops; static inline bool shmem_mapping(struct address_space *mapping)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 2f86b2ad6d7e..10f82b223642 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -73,6 +73,15 @@ */ #define RWF_WRITE_LIFE_NOT_SET RWH_WRITE_LIFE_NOT_SET +/* + * Allocate hugepages when available: useful on a tmpfs which was not mounted + * with the "huge=always" option, as for memfds. And, do not allocate hugepages + * even when available: useful to cancel the above request, or make an exception + * on a tmpfs mounted with "huge=always" (without splitting existing hugepages). + */ +#define F_HUGEPAGE (F_LINUX_SPECIFIC_BASE + 15) +#define F_NOHUGEPAGE (F_LINUX_SPECIFIC_BASE + 16) + /* * Types of directory notifications that may be requested.
*/ diff --git a/mm/shmem.c b/mm/shmem.c index e2bcf3313686..67a4b7a4849b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -448,9 +448,9 @@ static bool shmem_confirm_swap(struct address_space *mapping, * enables huge pages for the mount; * SHMEM_HUGE_WITHIN_SIZE: * only allocate huge pages if the page will be fully within i_size, - * also respect fadvise()/madvise() hints; + * also respect fcntl()/madvise() hints; * SHMEM_HUGE_ADVISE: - * only allocate huge pages if requested with fadvise()/madvise(); + * only allocate huge pages if requested with fcntl()/madvise(). */ #define SHMEM_HUGE_NEVER 0 @@ -477,13 +477,13 @@ static bool shmem_confirm_swap(struct address_space *mapping, static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER; /* - * Does either /sys/kernel/mm/transparent_hugepage/shmem_enabled or + * Does either tmpfs mount option (or transparent_hugepage/shmem_enabled) or * /sys/kernel/mm/transparent_hugepage/enabled allow transparent hugepages? * (Can only return true when the machine has_transparent_hugepage() too.) */ -static bool transparent_hugepage_allowed(void) +static bool transparent_hugepage_allowed(struct shmem_sb_info *sbinfo) { - return shmem_huge > SHMEM_HUGE_NEVER || + return sbinfo->huge > SHMEM_HUGE_NEVER || test_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags) || test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, @@ -500,6 +500,8 @@ bool shmem_is_huge(struct vm_area_struct *vma, if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) || test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))) return false; + if (SHMEM_I(inode)->flags & VM_NOHUGEPAGE) + return false; if (SHMEM_I(inode)->flags & VM_HUGEPAGE) return true; if (shmem_huge == SHMEM_HUGE_FORCE) @@ -692,7 +694,7 @@ static long shmem_unused_huge_count(struct super_block *sb, #define shmem_huge SHMEM_HUGE_DENY -bool transparent_hugepage_allowed(void) +bool transparent_hugepage_allowed(struct shmem_sb_info *sbinfo) { return false; } @@ -2197,6 +2199,8 @@ unsigned long shmem_get_unmapped_area(struct file *file, if (file) { VM_BUG_ON(file->f_op != &shmem_file_operations); inode = file_inode(file); + if (SHMEM_I(inode)->flags & VM_NOHUGEPAGE) + return addr; if (SHMEM_I(inode)->flags & VM_HUGEPAGE) goto huge; sb = inode->i_sb; @@ -2211,6 +2215,11 @@ unsigned long shmem_get_unmapped_area(struct file *file, } if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER) return addr; + /* + * Note that SHMEM_HUGE_ADVISE has to give out huge-aligned + * addresses to everyone, because madvise(,,MADV_HUGEPAGE) + * needs the address-chicken on which to advise if huge-egg. 
+ */ } huge: offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1); @@ -2334,7 +2343,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode info->seals = F_SEAL_SEAL; info->flags = flags & VM_NORESERVE; if ((flags & VM_HUGEPAGE) && - transparent_hugepage_allowed() && + transparent_hugepage_allowed(sbinfo) && !test_bit(MMF_DISABLE_THP, &current->mm->flags)) info->flags |= VM_HUGEPAGE; INIT_LIST_HEAD(&info->shrinklist); @@ -2674,6 +2683,53 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence) return offset; } +static int shmem_huge_fcntl(struct file *file, unsigned int cmd) +{ + struct inode *inode = file_inode(file); + struct shmem_inode_info *info = SHMEM_I(inode); + + if (!(file->f_mode & FMODE_WRITE)) + return -EBADF; + if (test_bit(MMF_DISABLE_THP, &current->mm->flags)) + return -EPERM; + if (cmd == F_HUGEPAGE && + !transparent_hugepage_allowed(SHMEM_SB(inode->i_sb))) + return -EPERM; + + inode_lock(inode); + if (cmd == F_HUGEPAGE) { + info->flags &= ~VM_NOHUGEPAGE; + info->flags |= VM_HUGEPAGE; + } else { + info->flags &= ~VM_HUGEPAGE; + info->flags |= VM_NOHUGEPAGE; + } + inode_unlock(inode); + return 0; +} + +long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg) +{ + long error = -EINVAL; + + if (file->f_op != &shmem_file_operations) + return error; + + switch (cmd) { + /* + * case F_ADD_SEALS: + * case F_GET_SEALS: + * are handled by memfd_fcntl(). + */ + case F_HUGEPAGE: + case F_NOHUGEPAGE: + error = shmem_huge_fcntl(file, cmd); + break; + } + + return error; +} + static long shmem_fallocate(struct file *file, int mode, loff_t offset, loff_t len) {
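For illustration, and not part of the series: a minimal userspace sketch of the new per-file hugeness fcntls, assuming a kernel with this patch applied. The fallback numeric values below simply mirror the F_LINUX_SPECIFIC_BASE (1024) + 15/16 definitions above.

/* Request huge pages on a memfd before populating it (illustrative only) */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_HUGEPAGE
#define F_HUGEPAGE   (1024 + 15)   /* F_LINUX_SPECIFIC_BASE + 15, per this patch */
#define F_NOHUGEPAGE (1024 + 16)   /* F_LINUX_SPECIFIC_BASE + 16 */
#endif

int main(void)
{
    int fd = memfd_create("thp-demo", 0);

    if (fd < 0) {
        perror("memfd_create");
        return 1;
    }
    /* Ask for huge pages before any pages are allocated to the file */
    if (fcntl(fd, F_HUGEPAGE) < 0)
        perror("fcntl(F_HUGEPAGE)");   /* EPERM if THPs are not allowed */
    if (ftruncate(fd, 4 << 20) < 0)    /* 4MB: two PMD-sized extents on x86 */
        perror("ftruncate");
    close(fd);
    return 0;
}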
From patchwork Fri Jul 30 07:51:00 2021
X-Patchwork-Id: 12410601
Date: Fri, 30 Jul 2021 00:51:00 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 09/16] huge tmpfs: decide stat.st_blksize by shmem_is_huge()
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

4.18 commit 89fdcd262fd4 ("mm: shmem: make stat.st_blksize return huge page size if THP is on") added is_huge_enabled() to decide st_blksize: now that hugeness can be defined per file, that too needs to be replaced by shmem_is_huge().

Unless they have been fcntl'ed F_HUGEPAGE, this does give a different answer (No) for small files on a "huge=within_size" mount: but that can be considered a minor bugfix. And a different answer (No) for unfcntl'ed files on a "huge=advise" mount: I'm reluctant to complicate it, just to reproduce the same debatable answer as before.

Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 mm/shmem.c | 12 +----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c index 67a4b7a4849b..f50f2ede71da 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -712,15 +712,6 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -static inline bool is_huge_enabled(struct shmem_sb_info *sbinfo) -{ - if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && - (shmem_huge == SHMEM_HUGE_FORCE || sbinfo->huge) && - shmem_huge != SHMEM_HUGE_DENY) - return true; - return false; -} - /* * Like add_to_page_cache_locked, but error if expected item has gone.
*/ @@ -1101,7 +1092,6 @@ static int shmem_getattr(struct user_namespace *mnt_userns, { struct inode *inode = path->dentry->d_inode; struct shmem_inode_info *info = SHMEM_I(inode); - struct shmem_sb_info *sb_info = SHMEM_SB(inode->i_sb); if (info->alloced - info->swapped != inode->i_mapping->nrpages) { spin_lock_irq(&info->lock); @@ -1110,7 +1100,7 @@ static int shmem_getattr(struct user_namespace *mnt_userns, } generic_fillattr(&init_user_ns, inode, stat); - if (is_huge_enabled(sb_info)) + if (shmem_is_huge(NULL, inode, 0)) stat->blksize = HPAGE_PMD_SIZE; return 0;
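A hedged illustration, not from the patch: how userspace might read the per-file decision back through stat(2) after this change.

#include <stdio.h>
#include <sys/stat.h>

/* Returns the tmpfs file's preferred I/O block size, or -1 on error */
static long shmem_blksize(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    /* 2097152 (HPAGE_PMD_SIZE) on x86_64 when shmem_is_huge() says yes,
       4096 otherwise */
    return (long)st.st_blksize;
}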
From patchwork Fri Jul 30 07:55:22 2021
X-Patchwork-Id: 12410607
Date: Fri, 30 Jul 2021 00:55:22 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 10/16] tmpfs: fcntl(fd, F_MEM_LOCK) to memlock a tmpfs file
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <54e03798-d836-ae64-f41-4a1d46bc115b@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

From: Shakeel Butt

A new uapi to lock files on tmpfs in memory, to protect against swap without mapping the files. This commit introduces two new commands to fcntl and shmem: F_MEM_LOCK and F_MEM_UNLOCK. The locking will be charged against RLIMIT_MEMLOCK of the uid in the namespace of the caller.

This feature is implemented by mostly re-using shmctl's SHM_LOCK mechanism (System V IPC shared memory). This api follows the design choices of shmctl's SHM_LOCK and also of the mlock2 syscall, where pages on swap are not populated on the syscall. The pages will be brought to memory on first access.

As with System V shared memory, these pages are counted as Unevictable in /proc/meminfo (when they are allocated, or when page reclaim finds any allocated earlier), but they are not counted as Mlocked there.

For simplicity the locked files are forbidden to grow or shrink, to keep the user accounting simple. This design decision will be revisited once such a use case arises.

The permissions to lock and unlock differ slightly from other similar interfaces. Anyone having CAP_IPC_LOCK or remaining rlimit can lock the file, but the unlocker has to have either CAP_IPC_LOCK or it should be the locker itself.

This commit does not make the locked status of a tmpfs file visible. We can add an F_MEM_LOCKED fcntl later, to query that status if required; but it's not yet clear how best to make it visible.
Signed-off-by: Shakeel Butt Signed-off-by: Hugh Dickins --- fs/fcntl.c | 2 ++ include/linux/shmem_fs.h | 1 + include/uapi/linux/fcntl.h | 7 +++++ mm/shmem.c | 59 ++++++++++++++++++++++++++++++++++++-- 4 files changed, 66 insertions(+), 3 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index 9cfff87c3332..a3534764b50e 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -437,6 +437,8 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, break; case F_HUGEPAGE: case F_NOHUGEPAGE: + case F_MEM_LOCK: + case F_MEM_UNLOCK: err = shmem_fcntl(filp, cmd, arg); break; default: diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 51b75d74ce89..ffdd0da816e5 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -24,6 +24,7 @@ struct shmem_inode_info { struct shared_policy policy; /* NUMA memory alloc policy */ struct simple_xattrs xattrs; /* list of xattrs */ atomic_t stop_eviction; /* hold when working on inode */ + struct ucounts *mlock_ucounts; /* user memlocked tmpfs file */ struct inode vfs_inode; }; diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 10f82b223642..21dc969df0fd 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -82,6 +82,13 @@ #define F_HUGEPAGE (F_LINUX_SPECIFIC_BASE + 15) #define F_NOHUGEPAGE (F_LINUX_SPECIFIC_BASE + 16) +/* + * Lock all pages of file into memory, as they are allocated; or unlock them. + * Currently supported only on tmpfs, and on its memfd_created files. + */ +#define F_MEM_LOCK (F_LINUX_SPECIFIC_BASE + 17) +#define F_MEM_UNLOCK (F_LINUX_SPECIFIC_BASE + 18) + /* * Types of directory notifications that may be requested. */ diff --git a/mm/shmem.c b/mm/shmem.c index f50f2ede71da..ba9b9900287b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -888,7 +888,7 @@ unsigned long shmem_swap_usage(struct vm_area_struct *vma) } /* - * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists. + * SHM_UNLOCK or F_MEM_UNLOCK restore Unevictable pages to their evictable list. */ void shmem_unlock_mapping(struct address_space *mapping) { @@ -897,7 +897,7 @@ void shmem_unlock_mapping(struct address_space *mapping) pagevec_init(&pvec); /* - * Minor point, but we might as well stop if someone else SHM_LOCKs it. + * Minor point, but we might as well stop if someone else memlocks it. */ while (!mapping_unevictable(mapping)) { if (!pagevec_lookup(&pvec, mapping, &index)) @@ -1123,7 +1123,8 @@ static int shmem_setattr(struct user_namespace *mnt_userns, /* protected by i_mutex */ if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) || - (newsize > oldsize && (info->seals & F_SEAL_GROW))) + (newsize > oldsize && (info->seals & F_SEAL_GROW)) || + (newsize != oldsize && info->mlock_ucounts)) return -EPERM; if (newsize != oldsize) { @@ -1161,6 +1162,10 @@ static void shmem_evict_inode(struct inode *inode) struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb); if (shmem_mapping(inode->i_mapping)) { + if (info->mlock_ucounts) { + user_shm_unlock(inode->i_size, info->mlock_ucounts); + info->mlock_ucounts = NULL; + } shmem_unacct_size(info->flags, inode->i_size); inode->i_size = 0; shmem_truncate_range(inode, 0, (loff_t)-1); @@ -2266,6 +2271,7 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts) /* * What serializes the accesses to info->flags? + * inode_lock() when called from shmem_memlock_fcntl(), * ipc_lock_object() when called from shmctl_do_lock(), * no serialization needed when called from shm_destroy(). 
*/ @@ -2286,6 +2292,43 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts) return retval; } +static int shmem_memlock_fcntl(struct file *file, unsigned int cmd) +{ + struct inode *inode = file_inode(file); + struct shmem_inode_info *info = SHMEM_I(inode); + bool cleanup_mapping = false; + int retval = 0; + + inode_lock(inode); + if (cmd == F_MEM_LOCK) { + if (!info->mlock_ucounts) { + struct ucounts *ucounts = current_ucounts(); + /* capability/rlimit check is down in user_shm_lock */ + retval = shmem_lock(file, 1, ucounts); + if (!retval) + info->mlock_ucounts = ucounts; + else if (!rlimit(RLIMIT_MEMLOCK)) + retval = -EPERM; + /* else retval == -ENOMEM */ + } + } else { /* F_MEM_UNLOCK */ + if (info->mlock_ucounts) { + if (info->mlock_ucounts == current_ucounts() || + capable(CAP_IPC_LOCK)) { + shmem_lock(file, 0, info->mlock_ucounts); + info->mlock_ucounts = NULL; + cleanup_mapping = true; + } else + retval = -EPERM; + } + } + inode_unlock(inode); + + if (cleanup_mapping) + shmem_unlock_mapping(file->f_mapping); + return retval; +} + static int shmem_mmap(struct file *file, struct vm_area_struct *vma) { struct shmem_inode_info *info = SHMEM_I(file_inode(file)); @@ -2503,6 +2546,8 @@ shmem_write_begin(struct file *file, struct address_space *mapping, if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size) return -EPERM; } + if (unlikely(info->mlock_ucounts) && pos + len > inode->i_size) + return -EPERM; return shmem_getpage(inode, index, pagep, SGP_WRITE); } @@ -2715,6 +2760,10 @@ long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg) case F_NOHUGEPAGE: error = shmem_huge_fcntl(file, cmd); break; + case F_MEM_LOCK: + case F_MEM_UNLOCK: + error = shmem_memlock_fcntl(file, cmd); + break; } return error; @@ -2778,6 +2827,10 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, error = -EPERM; goto out; } + if (info->mlock_ucounts && offset + len > inode->i_size) { + error = -EPERM; + goto out; + } start = offset >> PAGE_SHIFT; end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
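An illustrative sketch (not part of the patch) of the intended calling sequence: since locked files are forbidden to grow or shrink, the size is set before F_MEM_LOCK. The fallback values mirror the F_LINUX_SPECIFIC_BASE + 17/18 definitions above.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_MEM_LOCK
#define F_MEM_LOCK   (1024 + 17)   /* F_LINUX_SPECIFIC_BASE + 17, per this patch */
#define F_MEM_UNLOCK (1024 + 18)   /* F_LINUX_SPECIFIC_BASE + 18 */
#endif

int main(void)
{
    int fd = memfd_create("locked-demo", 0);

    if (fd < 0 || ftruncate(fd, 1 << 20) < 0)
        return 1;
    /* Charged against RLIMIT_MEMLOCK of the caller's ucounts */
    if (fcntl(fd, F_MEM_LOCK) < 0)
        perror("F_MEM_LOCK");      /* ENOMEM over rlimit, EPERM if rlimit is 0 */
    /* ... use the file: its pages stay unevictable, no mmap required ... */
    if (fcntl(fd, F_MEM_UNLOCK) < 0)   /* locker or CAP_IPC_LOCK only */
        perror("F_MEM_UNLOCK");
    close(fd);
    return 0;
}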
From patchwork Fri Jul 30 07:57:56 2021
X-Patchwork-Id: 12410609
Date: Fri, 30 Jul 2021 00:57:56 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 11/16] tmpfs: fcntl(fd, F_MEM_LOCKED) to test if memlocked
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

Though we have not yet found a compelling need to make the locked status of a tmpfs file visible, and offer no tool to show it, the kernel ought to be able to support such a tool: add the F_MEM_LOCKED fcntl, returning -1 on failure (not tmpfs), 0 when not F_MEM_LOCKED, 1 when F_MEM_LOCKED.
Signed-off-by: Hugh Dickins
---
 fs/fcntl.c                 | 1 +
 include/uapi/linux/fcntl.h | 1 +
 mm/shmem.c                 | 4 ++++
 3 files changed, 6 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c index a3534764b50e..0d8dc723732d 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -439,6 +439,7 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, case F_NOHUGEPAGE: case F_MEM_LOCK: case F_MEM_UNLOCK: + case F_MEM_LOCKED: err = shmem_fcntl(filp, cmd, arg); break; default:
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 21dc969df0fd..012585e8c9ab 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -88,6 +88,7 @@ */ #define F_MEM_LOCK (F_LINUX_SPECIFIC_BASE + 17) #define F_MEM_UNLOCK (F_LINUX_SPECIFIC_BASE + 18) +#define F_MEM_LOCKED (F_LINUX_SPECIFIC_BASE + 19) /* * Types of directory notifications that may be requested.
diff --git a/mm/shmem.c b/mm/shmem.c index ba9b9900287b..6e53dabe658b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2299,6 +2299,9 @@ static int shmem_memlock_fcntl(struct file *file, unsigned int cmd) bool cleanup_mapping = false; int retval = 0; + if (cmd == F_MEM_LOCKED) + return !!info->mlock_ucounts; + inode_lock(inode); if (cmd == F_MEM_LOCK) { if (!info->mlock_ucounts) { @@ -2762,6 +2765,7 @@ long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg) break; case F_MEM_LOCK: case F_MEM_UNLOCK: + case F_MEM_LOCKED: error = shmem_memlock_fcntl(file, cmd); break; }
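A small hedged sketch of the query, assuming the patched <linux/fcntl.h>; the fallback value mirrors the F_LINUX_SPECIFIC_BASE + 19 definition above.

#include <fcntl.h>

#ifndef F_MEM_LOCKED
#define F_MEM_LOCKED (1024 + 19)   /* F_LINUX_SPECIFIC_BASE + 19, per this patch */
#endif

/* 1 if memlocked, 0 if not, -1 (errno EINVAL) if fd is not on tmpfs */
static int mem_locked(int fd)
{
    return fcntl(fd, F_MEM_LOCKED);
}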
From patchwork Fri Jul 30 08:00:16 2021
X-Patchwork-Id: 12410615
Date: Fri, 30 Jul 2021 01:00:16 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 12/16] tmpfs: refuse memlock when fallocated beyond i_size
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <3e5b2999-a27d-3590-46d9-80841b9427a9@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

F_MEM_LOCK is accounted by i_size, but fallocate(,FALLOC_FL_KEEP_SIZE,,) could have added many pages beyond i_size, which would also be held as Unevictable from memory. The mlock_ucounts check in shmem_fallocate() is fine, but shmem_memlock_fcntl() needs to check fallocend too.

We could change F_MEM_LOCK accounting to use the max of i_size and fallocend, but fallocend is obscure: I think it's better just to refuse the F_MEM_LOCK (with EPERM) if fallocend exceeds (page-rounded) i_size.
Signed-off-by: Hugh Dickins
---
 mm/shmem.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c index 6e53dabe658b..35c0f5c7120e 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2304,7 +2304,10 @@ static int shmem_memlock_fcntl(struct file *file, unsigned int cmd) inode_lock(inode); if (cmd == F_MEM_LOCK) { - if (!info->mlock_ucounts) { + if (info->fallocend > DIV_ROUND_UP(inode->i_size, PAGE_SIZE)) { + /* locking is accounted by i_size: disallow excess */ + retval = -EPERM; + } else if (!info->mlock_ucounts) { struct ucounts *ucounts = current_ucounts(); /* capability/rlimit check is down in user_shm_lock */ retval = shmem_lock(file, 1, ucounts); @@ -2854,9 +2857,10 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, spin_unlock(&inode->i_lock); /* - * info->fallocend is only relevant when huge pages might be + * info->fallocend is mostly relevant when huge pages might be * involved: to prevent split_huge_page() freeing fallocated * pages when FALLOC_FL_KEEP_SIZE committed beyond i_size. + * But it is also checked in F_MEM_LOCK validation. */ undo_fallocend = info->fallocend; if (info->fallocend < end)
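To make the refused case concrete, a hedged test sketch (not from the patch): pages fallocated beyond i_size with FALLOC_FL_KEEP_SIZE are outside the i_size-based accounting, so F_MEM_LOCK now fails with EPERM rather than undercounting.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_MEM_LOCK
#define F_MEM_LOCK (1024 + 17)   /* F_LINUX_SPECIFIC_BASE + 17, per patch 10 */
#endif

int main(void)
{
    int fd = memfd_create("fallocated", 0);

    if (fd < 0 || ftruncate(fd, 1 << 20) < 0)
        return 1;
    /* Allocate 8MB of pages beyond the 1MB i_size */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 8 << 20) < 0)
        perror("fallocate");
    if (fcntl(fd, F_MEM_LOCK) < 0 && errno == EPERM)
        printf("refused: fallocend exceeds page-rounded i_size\n");
    close(fd);
    return 0;
}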
From patchwork Fri Jul 30 08:03:15 2021
X-Patchwork-Id: 12410617
Date: Fri, 30 Jul 2021 01:03:15 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 13/16] mm: bool user_shm_lock(loff_t size, struct ucounts *)
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

user_shm_lock()'s size_t size was big enough for SysV SHM locking, but not quite big enough for O_LARGEFILE on 32-bit: change to loff_t size. And while changing the prototype, let's use bool rather than int here.

Signed-off-by: Hugh Dickins
---
 include/linux/mm.h |  4 ++--
 mm/mlock.c         | 14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h index 7ca22e6e694a..f1be2221512b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1713,8 +1713,8 @@ extern bool can_do_mlock(void); #else static inline bool can_do_mlock(void) { return false; } #endif -extern int user_shm_lock(size_t, struct ucounts *); -extern void user_shm_unlock(size_t, struct ucounts *); +extern bool user_shm_lock(loff_t size, struct ucounts *ucounts); +extern void user_shm_unlock(loff_t size, struct ucounts *ucounts); /* * Parameter block passed down to zap_pte_range in exceptional cases.
diff --git a/mm/mlock.c b/mm/mlock.c index 16d2ee160d43..7df88fce0fc9 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -813,21 +813,21 @@ SYSCALL_DEFINE0(munlockall) } /* - * Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB - * shm segments) get accounted against the user_struct instead. + * Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB shm + * segments and F_MEM_LOCK tmpfs) get accounted to the user_namespace instead.
*/ static DEFINE_SPINLOCK(shmlock_user_lock); -int user_shm_lock(size_t size, struct ucounts *ucounts) +bool user_shm_lock(loff_t size, struct ucounts *ucounts) { unsigned long lock_limit, locked; long memlock; - int allowed = 0; + bool allowed = false; locked = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; lock_limit = rlimit(RLIMIT_MEMLOCK); if (lock_limit == RLIM_INFINITY) - allowed = 1; + allowed = true; lock_limit >>= PAGE_SHIFT; spin_lock(&shmlock_user_lock); memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); @@ -840,13 +840,13 @@ int user_shm_lock(size_t size, struct ucounts *ucounts) dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); goto out; } - allowed = 1; + allowed = true; out: spin_unlock(&shmlock_user_lock); return allowed; } -void user_shm_unlock(size_t size, struct ucounts *ucounts) +void user_shm_unlock(loff_t size, struct ucounts *ucounts) { spin_lock(&shmlock_user_lock); dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
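A brief editorial illustration, not from the patch, of the 32-bit truncation being fixed by the loff_t change:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    unsigned long long size = 5ULL << 30;   /* a 5GiB O_LARGEFILE file */

    /* What the old and new prototypes would each see before page-rounding */
    printf("as size_t: %zu\n", (size_t)size);              /* 1GiB on 32-bit */
    printf("as loff_t: %lld\n", (long long)(loff_t)size);  /* 5GiB always */
    return 0;
}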
From patchwork Fri Jul 30 08:06:35 2021
X-Patchwork-Id: 12410623
Date: Fri, 30 Jul 2021 01:06:35 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 14/16] mm: user_shm_lock(,,getuc) and user_shm_unlock(,,putuc)
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID: <4bd4072-7eb0-d1a5-ce49-82f4b24bd070@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

user_shm_lock() and user_shm_unlock() have to get and put a reference on the ucounts structure, and get fails at overflow. That will be awkward for the next commit (shrinking ought not to fail), so add an argument (always true in this commit) to condition that get and put.

It would be even easier to do the put_ucounts() separately when unlocking, but messy for the get_ucounts() when locking: better to keep them symmetric.
Signed-off-by: Hugh Dickins --- fs/hugetlbfs/inode.c | 4 ++-- include/linux/mm.h | 4 ++-- ipc/shm.c | 4 ++-- mm/mlock.c | 9 +++++---- mm/shmem.c | 6 +++--- 5 files changed, 14 insertions(+), 13 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index cdfb1ae78a3f..381902288f4d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1465,7 +1465,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size, if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) { *ucounts = current_ucounts(); - if (user_shm_lock(size, *ucounts)) { + if (user_shm_lock(size, *ucounts, true)) { task_lock(current); pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n", current->comm, current->pid); @@ -1499,7 +1499,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size, iput(inode); out: if (*ucounts) { - user_shm_unlock(size, *ucounts); + user_shm_unlock(size, *ucounts, true); *ucounts = NULL; } return file; diff --git a/include/linux/mm.h b/include/linux/mm.h index f1be2221512b..43cb5a6f97ff 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1713,8 +1713,8 @@ extern bool can_do_mlock(void); #else static inline bool can_do_mlock(void) { return false; } #endif -extern bool user_shm_lock(loff_t size, struct ucounts *ucounts); -extern void user_shm_unlock(loff_t size, struct ucounts *ucounts); +extern bool user_shm_lock(loff_t size, struct ucounts *ucounts, bool getuc); +extern void user_shm_unlock(loff_t size, struct ucounts *ucounts, bool putuc); /* * Parameter block passed down to zap_pte_range in exceptional cases. diff --git a/ipc/shm.c b/ipc/shm.c index 748933e376ca..3e63809d38b7 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -289,7 +289,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp) shmem_lock(shm_file, 0, shp->mlock_ucounts); else if (shp->mlock_ucounts) user_shm_unlock(i_size_read(file_inode(shm_file)), - shp->mlock_ucounts); + shp->mlock_ucounts, true); fput(shm_file); ipc_update_pid(&shp->shm_cprid, NULL); ipc_update_pid(&shp->shm_lprid, NULL); @@ -699,7 +699,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params) ipc_update_pid(&shp->shm_cprid, NULL); ipc_update_pid(&shp->shm_lprid, NULL); if (is_file_hugepages(file) && shp->mlock_ucounts) - user_shm_unlock(size, shp->mlock_ucounts); + user_shm_unlock(size, shp->mlock_ucounts, true); fput(file); ipc_rcu_putref(&shp->shm_perm, shm_rcu_free); return error; diff --git a/mm/mlock.c b/mm/mlock.c index 7df88fce0fc9..5afa3eba9a13 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -818,7 +818,7 @@ SYSCALL_DEFINE0(munlockall) */ static DEFINE_SPINLOCK(shmlock_user_lock); -bool user_shm_lock(loff_t size, struct ucounts *ucounts) +bool user_shm_lock(loff_t size, struct ucounts *ucounts, bool getuc) { unsigned long lock_limit, locked; long memlock; @@ -836,7 +836,7 @@ bool user_shm_lock(loff_t size, struct ucounts *ucounts) dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); goto out; } - if (!get_ucounts(ucounts)) { + if (getuc && !get_ucounts(ucounts)) { dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); goto out; } @@ -846,10 +846,11 @@ bool user_shm_lock(loff_t size, struct ucounts *ucounts) return allowed; } -void user_shm_unlock(loff_t size, struct ucounts *ucounts) +void user_shm_unlock(loff_t size, struct ucounts *ucounts, bool putuc) { spin_lock(&shmlock_user_lock); dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT); spin_unlock(&shmlock_user_lock); - put_ucounts(ucounts); 
+ if (putuc) + put_ucounts(ucounts); }
diff --git a/mm/shmem.c b/mm/shmem.c index 35c0f5c7120e..1ddb910e976c 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1163,7 +1163,7 @@ static void shmem_evict_inode(struct inode *inode) if (shmem_mapping(inode->i_mapping)) { if (info->mlock_ucounts) { - user_shm_unlock(inode->i_size, info->mlock_ucounts); + user_shm_unlock(inode->i_size, info->mlock_ucounts, true); info->mlock_ucounts = NULL; } shmem_unacct_size(info->flags, inode->i_size); @@ -2276,13 +2276,13 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts) * no serialization needed when called from shm_destroy(). */ if (lock && !(info->flags & VM_LOCKED)) { - if (!user_shm_lock(inode->i_size, ucounts)) + if (!user_shm_lock(inode->i_size, ucounts, true)) goto out_nomem; info->flags |= VM_LOCKED; mapping_set_unevictable(file->f_mapping); } if (!lock && (info->flags & VM_LOCKED) && ucounts) { - user_shm_unlock(inode->i_size, ucounts); + user_shm_unlock(inode->i_size, ucounts, true); info->flags &= ~VM_LOCKED; mapping_clear_unevictable(file->f_mapping); }
From patchwork Fri Jul 30 08:09:56 2021
X-Patchwork-Id: 12410625
Date: Fri, 30 Jul 2021 01:09:56 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH 15/16] tmpfs: permit changing size of memlocked file
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>

We have users who change the size of their memlocked file by F_MEM_UNLOCK, ftruncate, F_MEM_LOCK. That risks swapout in between, and is distasteful: particularly if the file is very large (when shmem_unlock_mapping() has a lot of work to move pages off the Unevictable list, only for them to be moved back there later on).

Modify shmem_setattr() to grow or shrink, and shmem_fallocate() to grow, the locked extent. But forbid (EPERM) both if current_ucounts() differs from the locker's mlock_ucounts (without even a CAP_IPC_LOCK override). They could be permitted (the caller already has unsealed write access), but it's probably less confusing to restrict size change to the locker.

But leave shmem_write_begin() as is, preventing the memlocked file from being extended implicitly by writes beyond EOF: I think that it's best to demand an explicit size change, by truncate or fallocate, when memlocked. (But notice in testing "echo x >memlockedfile" how the O_TRUNC succeeds but the write fails: would F_MEM_UNLOCK on truncation to 0 be better?)
Signed-off-by: Hugh Dickins
---
 mm/shmem.c | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 1ddb910e976c..fa4a264453bf 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1123,15 +1123,30 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 
 		/* protected by i_mutex */
 		if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
-		    (newsize > oldsize && (info->seals & F_SEAL_GROW)) ||
-		    (newsize != oldsize && info->mlock_ucounts))
+		    (newsize > oldsize && (info->seals & F_SEAL_GROW)))
 			return -EPERM;
 
 		if (newsize != oldsize) {
-			error = shmem_reacct_size(SHMEM_I(inode)->flags,
-					oldsize, newsize);
+			struct ucounts *ucounts = info->mlock_ucounts;
+
+			if (ucounts && ucounts != current_ucounts())
+				return -EPERM;
+			error = shmem_reacct_size(info->flags,
+						  oldsize, newsize);
 			if (error)
 				return error;
+			if (ucounts) {
+				loff_t mlock = round_up(newsize, PAGE_SIZE) -
+					       round_up(oldsize, PAGE_SIZE);
+				if (mlock < 0) {
+					user_shm_unlock(-mlock, ucounts, false);
+				} else if (mlock > 0 &&
+					   !user_shm_lock(mlock, ucounts, false)) {
+					shmem_reacct_size(info->flags,
+							  newsize, oldsize);
+					return -EPERM;
+				}
+			}
 			i_size_write(inode, newsize);
 			inode->i_ctime = inode->i_mtime = current_time(inode);
 		}
@@ -2784,6 +2799,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_falloc shmem_falloc;
 	pgoff_t start, index, end, undo_fallocend;
+	loff_t mlock = 0;
 	int error;
 
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2830,13 +2846,23 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	if (error)
 		goto out;
 
-	if ((info->seals & F_SEAL_GROW) && offset + len > inode->i_size) {
-		error = -EPERM;
-		goto out;
-	}
-	if (info->mlock_ucounts && offset + len > inode->i_size) {
+	if (offset + len > inode->i_size) {
 		error = -EPERM;
-		goto out;
+		if (info->seals & F_SEAL_GROW)
+			goto out;
+		if (info->mlock_ucounts) {
+			if (info->mlock_ucounts != current_ucounts() ||
+			    (mode & FALLOC_FL_KEEP_SIZE))
+				goto out;
+			mlock = round_up(offset + len, PAGE_SIZE) -
+				round_up(inode->i_size, PAGE_SIZE);
+			if (mlock > 0 &&
+			    !user_shm_lock(mlock, info->mlock_ucounts, false)) {
+				mlock = 0;
+				goto out;
+			}
+		}
+		error = 0;
 	}
 
 	start = offset >> PAGE_SHIFT;
@@ -2932,6 +2958,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	inode->i_private = NULL;
 	spin_unlock(&inode->i_lock);
 out:
+	if (error && mlock > 0)
+		user_shm_unlock(mlock, info->mlock_ucounts, false);
 	inode_unlock(inode);
 	return error;
 }
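[Illustration, not part of the patch: the mlock delta above is computed on
page-rounded sizes, so only whole-page changes are charged or returned.  A
standalone userspace sketch of that arithmetic, with PAGE_SIZE hardcoded
to 4096 purely for illustration:]

#include <stdio.h>

/* Same arithmetic as the kernel's round_up() for a power-of-two size. */
static long round_up_page(long x)
{
	const long page = 4096;	/* illustrative PAGE_SIZE */
	return (x + page - 1) & ~(page - 1);
}

int main(void)
{
	long oldsize = 5000, newsize = 12000;
	/*
	 * 12288 - 8192 = 4096: one extra page charged by user_shm_lock();
	 * a negative delta would instead be returned to the ucounts via
	 * user_shm_unlock(-mlock, ...).
	 */
	printf("mlock delta = %ld\n",
	       round_up_page(newsize) - round_up_page(oldsize));
	return 0;
}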
From patchwork Fri Jul 30 08:13:00 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12410631
Date: Fri, 30 Jul 2021 01:13:00 -0700 (PDT)
From: Hugh Dickins
X-X-Sender: hugh@ripple.anvils
To: Andrew Morton
cc: Hugh Dickins, Shakeel Butt, "Kirill A. Shutemov", Yang Shi,
    Miaohe Lin, Mike Kravetz, Michal Hocko, Rik van Riel,
    Christoph Hellwig, Matthew Wilcox, "Eric W. Biederman",
    Alexey Gladkov, Chris Wilson, Matthew Auld,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-api@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 16/16] memfd: memfd_create(name, MFD_MEM_LOCK) for memlocked shmem
In-Reply-To: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
Message-ID:
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Now that the size of a memlocked file can be changed, memfd_create() can
accept an MFD_MEM_LOCK flag to request memlocking, even though the
initial size is of course 0.
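[Illustration, not part of the patch: a minimal usage sketch.  The value
0x0010U is taken from the uapi addition below; it is defined locally only
because current libc headers do not know the flag.  The 2MB size is an
arbitrary example.]

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_MEM_LOCK
#define MFD_MEM_LOCK 0x0010U	/* from the uapi change below */
#endif

int main(void)
{
	int fd = memfd_create("locked-buf", MFD_MEM_LOCK);

	if (fd < 0) {
		perror("memfd_create");
		return 1;
	}
	/*
	 * The memfd starts at size 0: growing it extends the memlocked
	 * extent, charged against RLIMIT_MEMLOCK (EPERM if that fails).
	 */
	if (ftruncate(fd, 2 << 20) != 0) {
		perror("ftruncate");
		return 1;
	}
	return 0;
}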
Signed-off-by: Hugh Dickins
---
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c                 |  7 +++++--
 mm/shmem.c                 | 13 ++++++++++++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 8358a69e78cc..9113b5aa1763 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -9,6 +9,7 @@
 #define MFD_ALLOW_SEALING	0x0002U
 #define MFD_HUGETLB		0x0004U	/* Use hugetlbfs */
 #define MFD_HUGEPAGE		0x0008U	/* Use huge tmpfs */
+#define MFD_MEM_LOCK		0x0010U	/* Memlock tmpfs */
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 0d1a504d2fc9..e39f9eed55d2 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -248,7 +248,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 #define MFD_ALL_FLAGS (MFD_CLOEXEC | \
 		       MFD_ALLOW_SEALING | \
 		       MFD_HUGETLB | \
-		       MFD_HUGEPAGE)
+		       MFD_HUGEPAGE | \
+		       MFD_MEM_LOCK)
 
 SYSCALL_DEFINE2(memfd_create,
 		const char __user *, uname,
@@ -262,7 +263,7 @@ SYSCALL_DEFINE2(memfd_create,
 
 	if (flags & MFD_HUGETLB) {
 		/* Disallow huge tmpfs when choosing hugetlbfs */
-		if (flags & MFD_HUGEPAGE)
+		if (flags & (MFD_HUGEPAGE | MFD_MEM_LOCK))
 			return -EINVAL;
 		/* Allow huge page size encoding in flags. */
 		if (flags & ~(unsigned int)(MFD_ALL_FLAGS |
@@ -314,6 +315,8 @@ SYSCALL_DEFINE2(memfd_create,
 
 		if (flags & MFD_HUGEPAGE)
 			vm_flags |= VM_HUGEPAGE;
+		if (flags & MFD_MEM_LOCK)
+			vm_flags |= VM_LOCKED;
 		file = shmem_file_setup(name, 0, vm_flags);
 	}
diff --git a/mm/shmem.c b/mm/shmem.c
index fa4a264453bf..a0a83e59ae07 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2395,7 +2395,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	spin_lock_init(&info->lock);
 	atomic_set(&info->stop_eviction, 0);
 	info->seals = F_SEAL_SEAL;
-	info->flags = flags & VM_NORESERVE;
+	info->flags = flags & (VM_NORESERVE | VM_LOCKED);
 	if ((flags & VM_HUGEPAGE) &&
 	    transparent_hugepage_allowed(sbinfo) &&
 	    !test_bit(MMF_DISABLE_THP, &current->mm->flags))
@@ -4254,6 +4254,17 @@ static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name, l
 	inode->i_size = size;
 	clear_nlink(inode);	/* It is unlinked */
 	res = ERR_PTR(ramfs_nommu_expand_for_mapping(inode, size));
+	if (!IS_ERR(res) && (flags & VM_LOCKED)) {
+		struct ucounts *ucounts = current_ucounts();
+		/*
+		 * Only memfd_create() may pass VM_LOCKED, and it passes
+		 * size 0; but avoid that assumption in case it changes.
+		 */
+		if (user_shm_lock(size, ucounts, true))
+			SHMEM_I(inode)->mlock_ucounts = ucounts;
+		else
+			res = ERR_PTR(-EPERM);
+	}
 	if (!IS_ERR(res))
 		res = alloc_file_pseudo(inode, mnt, name, O_RDWR,
 				&shmem_file_operations);
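[Illustration, not part of the patch: a hedged test sketch of the flag
validation above, against a kernel with this series applied.  Per the
memfd.c hunk, MFD_HUGETLB together with MFD_MEM_LOCK is rejected with
EINVAL; and growing a memlocked memfd past RLIMIT_MEMLOCK is expected to
fail with EPERM for callers without CAP_IPC_LOCK.]

#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

#ifndef MFD_MEM_LOCK
#define MFD_MEM_LOCK 0x0010U	/* from the uapi change above */
#endif

int main(void)
{
	struct rlimit rl = { .rlim_cur = 1 << 20, .rlim_max = 1 << 20 };
	int fd;

	/* hugetlbfs and memlocked tmpfs cannot be combined */
	fd = memfd_create("clash", MFD_HUGETLB | MFD_MEM_LOCK);
	assert(fd < 0 && errno == EINVAL);

	fd = memfd_create("locked", MFD_MEM_LOCK);
	assert(fd >= 0);

	/* lower RLIMIT_MEMLOCK, then try to grow past it */
	if (setrlimit(RLIMIT_MEMLOCK, &rl) == 0 &&
	    ftruncate(fd, 4 << 20) != 0)
		perror("ftruncate");	/* expected: EPERM */
	return 0;
}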