From patchwork Thu Sep 5 18:23:46 2019
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 11133755
From: Matthew Wilcox
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", Kirill Shutemov, Song Liu,
    William Kucharski, Johannes Weiner
Subject: [PATCH 1/3] mm: Add __page_cache_alloc_order
Date: Thu, 5 Sep 2019 11:23:46 -0700
Message-Id: <20190905182348.5319-2-willy@infradead.org>
In-Reply-To: <20190905182348.5319-1-willy@infradead.org>
References: <20190905182348.5319-1-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

This new function allows page cache pages to be allocated that are
larger than an order-0 page.
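As an illustration of the intended use, a caller might opportunistically
try for a PMD-sized page and fall back to a single page (a sketch only,
not part of this patch; the fallback policy and the extra GFP flags are
assumptions about a future caller, and HPAGE_PMD_ORDER assumes
CONFIG_TRANSPARENT_HUGEPAGE):

	/* Illustrative caller, not part of this patch.  __GFP_COMP is
	 * added internally by __page_cache_alloc_order() for order > 0.
	 */
	struct page *page;

	page = __page_cache_alloc_order(gfp | __GFP_NORETRY | __GFP_NOWARN,
					HPAGE_PMD_ORDER);
	if (!page)
		page = __page_cache_alloc(gfp);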
Signed-off-by: Matthew Wilcox (Oracle)
Acked-by: Song Liu
---
 include/linux/pagemap.h | 14 +++++++++++---
 mm/filemap.c            | 11 +++++++----
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 103205494ea0..d2147215d415 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -208,14 +208,22 @@ static inline int page_cache_add_speculative(struct page *page, int count)
 }
 
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
-	return alloc_pages(gfp, 0);
+	if (order > 0)
+		gfp |= __GFP_COMP;
+	return alloc_pages(gfp, order);
 }
 #endif
 
+static inline struct page *__page_cache_alloc(gfp_t gfp)
+{
+	return __page_cache_alloc_order(gfp, 0);
+}
+
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
 	return __page_cache_alloc(mapping_gfp_mask(x));
diff --git a/mm/filemap.c b/mm/filemap.c
index 05a5aa82cd32..041c77c4ca56 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -957,24 +957,27 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
 	int n;
 	struct page *page;
 
+	if (order > 0)
+		gfp |= __GFP_COMP;
+
 	if (cpuset_do_page_mem_spread()) {
 		unsigned int cpuset_mems_cookie;
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = __alloc_pages_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, order);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
 	}
-	return alloc_pages(gfp, 0);
+	return alloc_pages(gfp, order);
 }
-EXPORT_SYMBOL(__page_cache_alloc);
+EXPORT_SYMBOL(__page_cache_alloc_order);
 #endif
 
 /*

From patchwork Thu Sep 5 18:23:47 2019
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 11133751
From: Matthew Wilcox
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", Kirill Shutemov, Song Liu,
    William Kucharski, Johannes Weiner
Subject: [PATCH 2/3] mm: Allow large pages to be added to the page cache
Date: Thu, 5 Sep 2019 11:23:47 -0700
Message-Id: <20190905182348.5319-3-willy@infradead.org>
In-Reply-To: <20190905182348.5319-1-willy@infradead.org>
References: <20190905182348.5319-1-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

We return -EEXIST if there are any non-shadow entries in the page cache
in the range covered by the large page.  If there are multiple shadow
entries in the range, we set *shadowp to one of them (currently the one
at the highest index).  If that turns out to be the wrong answer, we can
implement something more complex.  This is mostly modelled after the
equivalent function in the shmem code.
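For illustration, a caller adding a PMD-sized compound page could look
roughly like this (a sketch only, not part of this patch; the helper
name, the index alignment and the reaction to -EEXIST are assumptions
about how a caller might use the new behaviour):

/* Hypothetical helper, not part of this patch: try to add one PMD-sized
 * page at @index.  -EEXIST now means a non-shadow entry already covers
 * part of the range, so the caller should retry with order-0 pages.
 */
static int add_huge_page(struct address_space *mapping, pgoff_t index,
			 gfp_t gfp)
{
	unsigned long hindex = index & ~(HPAGE_PMD_NR - 1UL);
	struct page *page;
	int error;

	page = __page_cache_alloc_order(gfp, HPAGE_PMD_ORDER);
	if (!page)
		return -ENOMEM;
	__SetPageLocked(page);
	error = add_to_page_cache_lru(page, mapping, hindex, gfp);
	if (error) {
		__ClearPageLocked(page);
		put_page(page);
		return error;
	}
	unlock_page(page);
	put_page(page);
	return 0;
}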
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/filemap.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 041c77c4ca56..ae3c0a70a8e9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -850,6 +850,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	int huge = PageHuge(page);
 	struct mem_cgroup *memcg;
 	int error;
+	unsigned int nr = 1;
 	void *old;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -861,31 +862,47 @@ static int __add_to_page_cache_locked(struct page *page,
 					      gfp_mask, &memcg, false);
 		if (error)
 			return error;
+		xas_set_order(&xas, offset, compound_order(page));
+		nr = compound_nr(page);
 	}
 
-	get_page(page);
+	page_ref_add(page, nr);
 	page->mapping = mapping;
 	page->index = offset;
 
 	do {
+		unsigned long exceptional = 0;
+		unsigned int i = 0;
+
 		xas_lock_irq(&xas);
-		old = xas_load(&xas);
-		if (old && !xa_is_value(old))
+		xas_for_each_conflict(&xas, old) {
+			if (!xa_is_value(old))
+				break;
+			exceptional++;
+			if (shadowp)
+				*shadowp = old;
+		}
+		if (old) {
 			xas_set_err(&xas, -EEXIST);
-		xas_store(&xas, page);
+			break;
+		}
+		xas_create_range(&xas);
 		if (xas_error(&xas))
 			goto unlock;
 
-		if (xa_is_value(old)) {
-			mapping->nrexceptional--;
-			if (shadowp)
-				*shadowp = old;
+next:
+		xas_store(&xas, page);
+		if (++i < nr) {
+			xas_next(&xas);
+			goto next;
 		}
-		mapping->nrpages++;
+		mapping->nrexceptional -= exceptional;
+		mapping->nrpages += nr;
 
 		/* hugetlb pages do not participate in page cache accounting */
 		if (!huge)
-			__inc_node_page_state(page, NR_FILE_PAGES);
+			__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES,
+						nr);
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
@@ -902,7 +919,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	/* Leave page->index set: truncation relies upon it */
 	if (!huge)
 		mem_cgroup_cancel_charge(page, memcg, false);
-	put_page(page);
+	page_ref_sub(page, nr);
 	return xas_error(&xas);
 }
 ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);

From patchwork Thu Sep 5 18:23:48 2019
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 11133753
From: Matthew Wilcox
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", Kirill Shutemov, Song Liu,
    William Kucharski, Johannes Weiner
Subject: [PATCH 3/3] mm: Allow find_get_page to be used for large pages
Date: Thu, 5 Sep 2019 11:23:48 -0700
Message-Id: <20190905182348.5319-4-willy@infradead.org>
In-Reply-To: <20190905182348.5319-1-willy@infradead.org>
References: <20190905182348.5319-1-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

Add FGP_PMD to indicate that we're trying to find-or-create a page that
is at least PMD_ORDER in size.  The internal 'conflict' entry usage is
modelled after that in DAX, but the implementations are different due to
DAX using multi-order entries and the page cache using multiple order-0
entries.
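For illustration, a find-or-create caller could ask for a PMD-sized page
like this (a sketch only, not part of this patch; the index alignment and
the fallback to an order-0 lookup are assumptions about caller policy):

	/* Hypothetical caller: prefer a PMD-sized page, fall back to a
	 * normal page if a smaller page already overlaps the PMD range
	 * (in which case the FGP_PMD lookup returns NULL).
	 */
	struct page *page;

	page = pagecache_get_page(mapping, index & ~(HPAGE_PMD_NR - 1UL),
				  FGP_CREAT | FGP_LOCK | FGP_PMD,
				  mapping_gfp_mask(mapping));
	if (!page)
		page = pagecache_get_page(mapping, index,
					  FGP_CREAT | FGP_LOCK,
					  mapping_gfp_mask(mapping));

With FGP_ORDER_SHIFT of 7, fgp_order(FGP_PMD) evaluates to
PMD_SHIFT - PAGE_SHIFT (9 on x86-64 with 4KiB pages), which is the order
that ends up being passed to __page_cache_alloc_order().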
Signed-off-by: Matthew Wilcox (Oracle)
---
 include/linux/pagemap.h |  9 +++++
 mm/filemap.c            | 82 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d2147215d415..72101811524c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -248,6 +248,15 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_NOFS		0x00000010
 #define FGP_NOWAIT		0x00000020
 #define FGP_FOR_MMAP		0x00000040
+/*
+ * If you add more flags, increment FGP_ORDER_SHIFT (no further than 25).
+ * Do not insert flags above the FGP order bits.
+ */
+#define FGP_ORDER_SHIFT		7
+#define FGP_PMD		((PMD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+#define FGP_PUD		((PUD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+
+#define fgp_order(fgp)	((fgp) >> FGP_ORDER_SHIFT)
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		int fgp_flags, gfp_t cache_gfp_mask);
diff --git a/mm/filemap.c b/mm/filemap.c
index ae3c0a70a8e9..904dfabbea52 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1572,7 +1572,71 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 
 	return page;
 }
-EXPORT_SYMBOL(find_get_entry);
+
+static bool pagecache_is_conflict(struct page *page)
+{
+	return page == XA_RETRY_ENTRY;
+}
+
+/**
+ * __find_get_page - Find and get a page cache entry.
+ * @mapping: The address_space to search.
+ * @offset: The page cache index.
+ * @order: The minimum order of the entry to return.
+ *
+ * Looks up the page cache entries at @mapping between @offset and
+ * @offset + 2^@order.  If there is a page cache page, it is returned with
+ * an increased refcount unless it is smaller than @order.
+ *
+ * If the slot holds a shadow entry of a previously evicted page, or a
+ * swap entry from shmem/tmpfs, it is returned.
+ *
+ * Return: the found page, a value indicating a conflicting page or %NULL if
+ * there are no pages in this range.
+ */
+static struct page *__find_get_page(struct address_space *mapping,
+		unsigned long offset, unsigned int order)
+{
+	XA_STATE(xas, &mapping->i_pages, offset);
+	struct page *page;
+
+	rcu_read_lock();
+repeat:
+	xas_reset(&xas);
+	page = xas_find(&xas, offset | ((1UL << order) - 1));
+	if (xas_retry(&xas, page))
+		goto repeat;
+	/*
+	 * A shadow entry of a recently evicted page, or a swap entry from
+	 * shmem/tmpfs.  Skip it; keep looking for pages.
+	 */
+	if (xa_is_value(page))
+		goto repeat;
+	if (!page)
+		goto out;
+	if (compound_order(page) < order) {
+		page = XA_RETRY_ENTRY;
+		goto out;
+	}
+
+	if (!page_cache_get_speculative(page))
+		goto repeat;
+
+	/*
+	 * Has the page moved or been split?
+	 * This is part of the lockless pagecache protocol. See
+	 * include/linux/pagemap.h for details.
+	 */
+	if (unlikely(page != xas_reload(&xas))) {
+		put_page(page);
+		goto repeat;
+	}
+	page = find_subpage(page, offset);
+out:
+	rcu_read_unlock();
+
+	return page;
+}
 
 /**
  * find_lock_entry - locate, pin and lock a page cache entry
@@ -1614,12 +1678,12 @@ EXPORT_SYMBOL(find_lock_entry);
  * pagecache_get_page - find and get a page reference
  * @mapping: the address_space to search
  * @offset: the page index
- * @fgp_flags: PCG flags
+ * @fgp_flags: FGP flags
  * @gfp_mask: gfp mask to use for the page cache data page allocation
  *
  * Looks up the page cache slot at @mapping & @offset.
  *
- * PCG flags modify how the page is returned.
+ * FGP flags modify how the page is returned.
  *
  * @fgp_flags can be:
  *
@@ -1632,6 +1696,10 @@ EXPORT_SYMBOL(find_lock_entry);
  * - FGP_FOR_MMAP: Similar to FGP_CREAT, only we want to allow the caller to do
  *   its own locking dance if the page is already in cache, or unlock the page
  *   before returning if we had to add the page to pagecache.
+ * - FGP_PMD: We're only interested in pages at PMD granularity.  If there
+ *   is no page here (and FGP_CREAT is set), we'll create one large enough.
+ *   If there is a smaller page in the cache that overlaps the PMD page, we
+ *   return %NULL and do not attempt to create a page.
  *
  * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even
  * if the GFP flags specified for FGP_CREAT are atomic.
@@ -1646,9 +1714,9 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 	struct page *page;
 
 repeat:
-	page = find_get_entry(mapping, offset);
-	if (xa_is_value(page))
-		page = NULL;
+	page = __find_get_page(mapping, offset, fgp_order(fgp_flags));
+	if (pagecache_is_conflict(page))
+		return NULL;
 	if (!page)
 		goto no_page;
 
@@ -1682,7 +1750,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		if (fgp_flags & FGP_NOFS)
 			gfp_mask &= ~__GFP_FS;
 
-		page = __page_cache_alloc(gfp_mask);
+		page = __page_cache_alloc_order(gfp_mask, fgp_order(fgp_flags));
 		if (!page)
 			return NULL;