From patchwork Wed Apr 6 22:51:10 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 8766591 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id C4E6D9F36E for ; Wed, 6 Apr 2016 22:53:44 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 821E820160 for ; Wed, 6 Apr 2016 22:53:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1EFB120131 for ; Wed, 6 Apr 2016 22:53:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754674AbcDFWvm (ORCPT ); Wed, 6 Apr 2016 18:51:42 -0400 Received: from mga02.intel.com ([134.134.136.20]:59061 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754599AbcDFWvh (ORCPT ); Wed, 6 Apr 2016 18:51:37 -0400 Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP; 06 Apr 2016 15:51:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,448,1455004800"; d="scan'208";a="779594908" Received: from black.fi.intel.com ([10.237.72.93]) by orsmga003.jf.intel.com with ESMTP; 06 Apr 2016 15:51:32 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 9D59B931; Thu, 7 Apr 2016 01:51:22 +0300 (EEST) From: "Kirill A. Shutemov" To: Hugh Dickins , Andrea Arcangeli , Andrew Morton Cc: Dave Hansen , Vlastimil Babka , Christoph Lameter , Naoya Horiguchi , Jerome Marchand , Yang Shi , Sasha Levin , Andres Lagar-Cavilla , Ning Qu , linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv6 20/30] filemap: prepare find and delete operations for huge pages Date: Thu, 7 Apr 2016 01:51:10 +0300 Message-Id: <1459983080-106718-21-git-send-email-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.8.0.rc3 In-Reply-To: <1459983080-106718-1-git-send-email-kirill.shutemov@linux.intel.com> References: <1459983080-106718-1-git-send-email-kirill.shutemov@linux.intel.com> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP For now, we would have HPAGE_PMD_NR entries in radix tree for every huge page. That's suboptimal and it will be changed to use Matthew's multi-order entries later. 'add' operation is not changed, because we don't need it to implement hugetmpfs: shmem uses its own implementation. Signed-off-by: Kirill A. Shutemov --- mm/filemap.c | 187 ++++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 134 insertions(+), 53 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 7e982835d4ec..bf29ab4f87dc 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -110,43 +110,18 @@ * ->tasklist_lock (memory_failure, collect_procs_ao) */ -static void page_cache_tree_delete(struct address_space *mapping, - struct page *page, void *shadow) +static void __page_cache_tree_delete(struct address_space *mapping, + struct radix_tree_node *node, void **slot, unsigned long index, + void *shadow) { - struct radix_tree_node *node; - unsigned long index; - unsigned int offset; unsigned int tag; - void **slot; - - VM_BUG_ON(!PageLocked(page)); - __radix_tree_lookup(&mapping->page_tree, page->index, &node, &slot); - - if (shadow) { - mapping->nrexceptional++; - /* - * Make sure the nrexceptional update is committed before - * the nrpages update so that final truncate racing - * with reclaim does not see both counters 0 at the - * same time and miss a shadow entry. - */ - smp_wmb(); - } - mapping->nrpages--; - - if (!node) { - /* Clear direct pointer tags in root node */ - mapping->page_tree.gfp_mask &= __GFP_BITS_MASK; - radix_tree_replace_slot(slot, shadow); - return; - } + VM_BUG_ON(node == NULL); + VM_BUG_ON(*slot == NULL); /* Clear tree tags for the removed page */ - index = page->index; - offset = index & RADIX_TREE_MAP_MASK; for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) { - if (test_bit(offset, node->tags[tag])) + if (test_bit(index & RADIX_TREE_MAP_MASK, node->tags[tag])) radix_tree_tag_clear(&mapping->page_tree, index, tag); } @@ -173,6 +148,54 @@ static void page_cache_tree_delete(struct address_space *mapping, } } +static void page_cache_tree_delete(struct address_space *mapping, + struct page *page, void *shadow) +{ + struct radix_tree_node *node; + unsigned long index; + void **slot; + int i, nr = PageHuge(page) ? 1 : hpage_nr_pages(page); + + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(PageTail(page), page); + + __radix_tree_lookup(&mapping->page_tree, page->index, &node, &slot); + + if (shadow) { + mapping->nrexceptional += nr; + /* + * Make sure the nrexceptional update is committed before + * the nrpages update so that final truncate racing + * with reclaim does not see both counters 0 at the + * same time and miss a shadow entry. + */ + smp_wmb(); + } + mapping->nrpages -= nr; + + if (!node) { + /* Clear direct pointer tags in root node */ + mapping->page_tree.gfp_mask &= __GFP_BITS_MASK; + VM_BUG_ON(nr != 1); + radix_tree_replace_slot(slot, shadow); + return; + } + + index = page->index; + VM_BUG_ON_PAGE(index & (nr - 1), page); + for (i = 0; i < nr; i++) { + /* Cross node border */ + if (i && ((index + i) & RADIX_TREE_MAP_MASK) == 0) { + __radix_tree_lookup(&mapping->page_tree, + page->index + i, &node, &slot); + } + + __page_cache_tree_delete(mapping, node, + slot + (i & RADIX_TREE_MAP_MASK), index + i, + shadow); + } +} + /* * Delete a page from the page cache and free it. Caller has to make * sure the page is locked and that nobody else uses it - or that usage @@ -181,6 +204,7 @@ static void page_cache_tree_delete(struct address_space *mapping, void __delete_from_page_cache(struct page *page, void *shadow) { struct address_space *mapping = page->mapping; + int nr = hpage_nr_pages(page); trace_mm_filemap_delete_from_page_cache(page); /* @@ -193,6 +217,7 @@ void __delete_from_page_cache(struct page *page, void *shadow) else cleancache_invalidate_page(mapping, page); + VM_BUG_ON_PAGE(PageTail(page), page); VM_BUG_ON_PAGE(page_mapped(page), page); if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(page_mapped(page))) { int mapcount; @@ -224,9 +249,9 @@ void __delete_from_page_cache(struct page *page, void *shadow) /* hugetlb pages do not participate in page cache accounting. */ if (!PageHuge(page)) - __dec_zone_page_state(page, NR_FILE_PAGES); + __mod_zone_page_state(page_zone(page), NR_FILE_PAGES, -nr); if (PageSwapBacked(page)) - __dec_zone_page_state(page, NR_SHMEM); + __mod_zone_page_state(page_zone(page), NR_SHMEM, -nr); /* * At this point page must be either written or cleaned by truncate. @@ -250,9 +275,8 @@ void __delete_from_page_cache(struct page *page, void *shadow) */ void delete_from_page_cache(struct page *page) { - struct address_space *mapping = page->mapping; + struct address_space *mapping = page_mapping(page); unsigned long flags; - void (*freepage)(struct page *); BUG_ON(!PageLocked(page)); @@ -265,7 +289,13 @@ void delete_from_page_cache(struct page *page) if (freepage) freepage(page); - put_page(page); + + if (PageTransHuge(page) && !PageHuge(page)) { + page_ref_sub(page, HPAGE_PMD_NR); + VM_BUG_ON_PAGE(page_count(page) <= 0, page); + } else { + put_page(page); + } } EXPORT_SYMBOL(delete_from_page_cache); @@ -1054,7 +1084,7 @@ EXPORT_SYMBOL(page_cache_prev_hole); struct page *find_get_entry(struct address_space *mapping, pgoff_t offset) { void **pagep; - struct page *page; + struct page *head, *page; rcu_read_lock(); repeat: @@ -1074,9 +1104,17 @@ repeat: */ goto out; } - if (!page_cache_get_speculative(page)) + + head = compound_head(page); + if (!page_cache_get_speculative(head)) goto repeat; + /* The page was split under us? */ + if (compound_head(page) != head) { + put_page(page); + goto repeat; + } + /* * Has the page moved? * This is part of the lockless pagecache protocol. See @@ -1119,12 +1157,12 @@ repeat: if (page && !radix_tree_exception(page)) { lock_page(page); /* Has the page been truncated? */ - if (unlikely(page->mapping != mapping)) { + if (unlikely(page_mapping(page) != mapping)) { unlock_page(page); put_page(page); goto repeat; } - VM_BUG_ON_PAGE(page->index != offset, page); + VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page); } return page; } @@ -1256,7 +1294,7 @@ unsigned find_get_entries(struct address_space *mapping, rcu_read_lock(); radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { - struct page *page; + struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) @@ -1273,8 +1311,16 @@ repeat: */ goto export; } - if (!page_cache_get_speculative(page)) + + head = compound_head(page); + if (!page_cache_get_speculative(head)) + goto repeat; + + /* The page was split under us? */ + if (compound_head(page) != head) { + put_page(page); goto repeat; + } /* Has the page moved? */ if (unlikely(page != *slot)) { @@ -1319,7 +1365,7 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start, rcu_read_lock(); radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { - struct page *page; + struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) @@ -1338,9 +1384,16 @@ repeat: continue; } - if (!page_cache_get_speculative(page)) + head = compound_head(page); + if (!page_cache_get_speculative(head)) goto repeat; + /* The page was split under us? */ + if (compound_head(page) != head) { + put_page(page); + goto repeat; + } + /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(page); @@ -1380,7 +1433,7 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index, rcu_read_lock(); radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) { - struct page *page; + struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); /* The hole, there no reason to continue */ @@ -1400,8 +1453,14 @@ repeat: break; } - if (!page_cache_get_speculative(page)) + head = compound_head(page); + if (!page_cache_get_speculative(head)) goto repeat; + /* The page was split under us? */ + if (compound_head(page) != head) { + put_page(page); + goto repeat; + } /* Has the page moved? */ if (unlikely(page != *slot)) { @@ -1414,7 +1473,7 @@ repeat: * otherwise we can get both false positives and false * negatives, which is just confusing to the caller. */ - if (page->mapping == NULL || page->index != iter.index) { + if (page->mapping == NULL || page_to_pgoff(page) != iter.index) { put_page(page); break; } @@ -1452,7 +1511,7 @@ unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index, rcu_read_lock(); radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, *index, tag) { - struct page *page; + struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) @@ -1477,8 +1536,15 @@ repeat: continue; } - if (!page_cache_get_speculative(page)) + head = compound_head(page); + if (!page_cache_get_speculative(head)) + goto repeat; + + /* The page was split under us? */ + if (compound_head(page) != head) { + put_page(page); goto repeat; + } /* Has the page moved? */ if (unlikely(page != *slot)) { @@ -1526,7 +1592,7 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start, rcu_read_lock(); radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start, tag) { - struct page *page; + struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) @@ -1544,9 +1610,17 @@ repeat: */ goto export; } - if (!page_cache_get_speculative(page)) + + head = compound_head(page); + if (!page_cache_get_speculative(head)) goto repeat; + /* The page was split under us? */ + if (compound_head(page) != head) { + put_page(page); + goto repeat; + } + /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(page); @@ -2140,7 +2214,7 @@ void filemap_map_pages(struct fault_env *fe, struct address_space *mapping = file->f_mapping; pgoff_t last_pgoff = start_pgoff; loff_t size; - struct page *page; + struct page *head, *page; rcu_read_lock(); radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, @@ -2159,8 +2233,15 @@ repeat: goto next; } - if (!page_cache_get_speculative(page)) + head = compound_head(page); + if (!page_cache_get_speculative(head)) + goto repeat; + + /* The page was split under us? */ + if (compound_head(page) != head) { + put_page(page); goto repeat; + } /* Has the page moved? */ if (unlikely(page != *slot)) {