From patchwork Thu Sep 15 11:55:10 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 9333425 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id CD9D16077F for ; Thu, 15 Sep 2016 12:08:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BA2BF2969E for ; Thu, 15 Sep 2016 12:08:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AE904296A4; Thu, 15 Sep 2016 12:08:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0E4F52969E for ; Thu, 15 Sep 2016 12:08:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934823AbcIOMHk (ORCPT ); Thu, 15 Sep 2016 08:07:40 -0400 Received: from mga07.intel.com ([134.134.136.100]:41643 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756136AbcIOLzr (ORCPT ); Thu, 15 Sep 2016 07:55:47 -0400 Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga105.jf.intel.com with ESMTP; 15 Sep 2016 04:55:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.30,339,1470726000"; d="scan'208";a="879188979" Received: from black.fi.intel.com ([10.237.72.56]) by orsmga003.jf.intel.com with ESMTP; 15 Sep 2016 04:55:41 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 490A4A51; Thu, 15 Sep 2016 14:55:27 +0300 (EEST) From: "Kirill A. Shutemov" To: "Theodore Ts'o" , Andreas Dilger , Jan Kara , Andrew Morton Cc: Alexander Viro , Hugh Dickins , Andrea Arcangeli , Dave Hansen , Vlastimil Babka , Matthew Wilcox , Ross Zwisler , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, Naoya Horiguchi , "Kirill A . Shutemov" Subject: [PATCHv3 28/41] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries Date: Thu, 15 Sep 2016 14:55:10 +0300 Message-Id: <20160915115523.29737-29-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160915115523.29737-1-kirill.shutemov@linux.intel.com> References: <20160915115523.29737-1-kirill.shutemov@linux.intel.com> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Naoya Horiguchi Currently, hugetlb pages are linked to page cache on the basis of hugepage offset (derived from vma_hugecache_offset()) for historical reason, which doesn't match to the generic usage of page cache and requires some routines to covert page offset <=> hugepage offset in common path. This patch adjusts code for multi-order radix-tree to avoid the situation. Main change is on the behavior of page->index for hugetlbfs. Before this patch, it represented hugepage offset, but with this patch it represents page offset. So index-related code have to be updated. Note that hugetlb_fault_mutex_hash() and reservation region handling are still working with hugepage offset. Signed-off-by: Naoya Horiguchi [kirill.shutemov@linux.intel.com: reject fixed] Signed-off-by: Kirill A. Shutemov --- fs/hugetlbfs/inode.c | 22 ++++++++++------------ include/linux/pagemap.h | 10 +--------- mm/filemap.c | 30 ++++++++++++++++++------------ mm/hugetlb.c | 19 ++++++------------- 4 files changed, 35 insertions(+), 46 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 4ea71eba40a5..fc918c0e33e9 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -388,8 +388,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, { struct hstate *h = hstate_inode(inode); struct address_space *mapping = &inode->i_data; - const pgoff_t start = lstart >> huge_page_shift(h); - const pgoff_t end = lend >> huge_page_shift(h); + const pgoff_t start = lstart >> PAGE_SHIFT; + const pgoff_t end = lend >> PAGE_SHIFT; struct vm_area_struct pseudo_vma; struct pagevec pvec; pgoff_t next; @@ -447,8 +447,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, i_mmap_lock_write(mapping); hugetlb_vmdelete_list(&mapping->i_mmap, - next * pages_per_huge_page(h), - (next + 1) * pages_per_huge_page(h)); + next, next + 1); i_mmap_unlock_write(mapping); } @@ -467,7 +466,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, freed++; if (!truncate_op) { if (unlikely(hugetlb_unreserve_pages(inode, - next, next + 1, 1))) + (next) << huge_page_order(h), + (next + 1) << huge_page_order(h), 1))) hugetlb_fix_reserve_counts(inode, rsv_on_error); } @@ -552,8 +552,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, struct hstate *h = hstate_inode(inode); struct vm_area_struct pseudo_vma; struct mm_struct *mm = current->mm; - loff_t hpage_size = huge_page_size(h); - unsigned long hpage_shift = huge_page_shift(h); pgoff_t start, index, end; int error; u32 hash; @@ -569,8 +567,8 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, * For this range, start is rounded down and end is rounded up * as well as being converted to page offsets. */ - start = offset >> hpage_shift; - end = (offset + len + hpage_size - 1) >> hpage_shift; + start = offset >> PAGE_SHIFT; + end = (offset + len + huge_page_size(h) - 1) >> PAGE_SHIFT; inode_lock(inode); @@ -588,7 +586,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED); pseudo_vma.vm_file = file; - for (index = start; index < end; index++) { + for (index = start; index < end; index += pages_per_huge_page(h)) { /* * This is supposed to be the vaddr where the page is being * faulted in, but we have no vaddr here. @@ -609,10 +607,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, } /* Set numa allocation policy based on index */ - hugetlb_set_vma_policy(&pseudo_vma, inode, index); + hugetlb_set_vma_policy(&pseudo_vma, inode, index >> huge_page_order(h)); /* addr is the offset within the file (zero based) */ - addr = index * hpage_size; + addr = index << PAGE_SHIFT & ~huge_page_mask(h); /* mutex taken here, fault path and hole punch */ hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping, diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 4d6e9aec2d1f..645dcb72ca0f 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -380,15 +380,11 @@ static inline struct page *read_mapping_page(struct address_space *mapping, /* * Get the offset in PAGE_SIZE. - * (TODO: hugepage should have ->index in PAGE_SIZE) */ static inline pgoff_t page_to_pgoff(struct page *page) { pgoff_t pgoff; - if (unlikely(PageHeadHuge(page))) - return page->index << compound_order(page); - if (likely(!PageTransTail(page))) return page->index; @@ -414,15 +410,11 @@ static inline loff_t page_file_offset(struct page *page) return ((loff_t)page_file_index(page)) << PAGE_SHIFT; } -extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma, - unsigned long address); - static inline pgoff_t linear_page_index(struct vm_area_struct *vma, unsigned long address) { pgoff_t pgoff; - if (unlikely(is_vm_hugetlb_page(vma))) - return linear_hugepage_index(vma, address); + pgoff = (address - vma->vm_start) >> PAGE_SHIFT; pgoff += vma->vm_pgoff; return pgoff; diff --git a/mm/filemap.c b/mm/filemap.c index 53da93156e60..61d375d7f93e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -114,7 +114,7 @@ static void page_cache_tree_delete(struct address_space *mapping, struct page *page, void *shadow) { struct radix_tree_node *node; - int nr = PageHuge(page) ? 1 : hpage_nr_pages(page); + int nr = hpage_nr_pages(page); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageTail(page), page); @@ -668,7 +668,7 @@ static int __add_to_page_cache_locked(struct page *page, page->index = offset; spin_lock_irq(&mapping->tree_lock); - if (PageTransHuge(page)) { + if (PageCompound(page)) { struct radix_tree_iter iter; void **slot; void *p; @@ -677,7 +677,7 @@ static int __add_to_page_cache_locked(struct page *page, /* Wipe shadow entires */ radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, offset) { - if (iter.index >= offset + HPAGE_PMD_NR) + if (iter.index >= offset + hpage_nr_pages(page)) break; p = radix_tree_deref_slot_protected(slot, @@ -699,10 +699,15 @@ static int __add_to_page_cache_locked(struct page *page, compound_order(page), page); if (!error) { - count_vm_event(THP_FILE_ALLOC); - mapping->nrpages += HPAGE_PMD_NR; - *shadowp = NULL; - __inc_node_page_state(page, NR_FILE_THPS); + if (hugetlb) { + mapping->nrpages += 1 << compound_order(page); + } else if (PageTransHuge(page)) { + count_vm_event(THP_FILE_ALLOC); + mapping->nrpages += HPAGE_PMD_NR; + *shadowp = NULL; + __inc_node_page_state(page, NR_FILE_THPS); + } else + BUG(); } } else { error = page_cache_tree_insert(mapping, page, shadowp); @@ -1144,9 +1149,9 @@ repeat: } /* For multi-order entries, find relevant subpage */ - if (PageTransHuge(page)) { + if (PageCompound(page)) { VM_BUG_ON(offset - page->index < 0); - VM_BUG_ON(offset - page->index >= HPAGE_PMD_NR); + VM_BUG_ON(offset - page->index >= 1 << compound_order(page)); page += offset - page->index; } } @@ -1514,16 +1519,17 @@ repeat: } /* For multi-order entries, find relevant subpage */ - if (PageTransHuge(page)) { + if (PageCompound(page)) { VM_BUG_ON(index - page->index < 0); - VM_BUG_ON(index - page->index >= HPAGE_PMD_NR); + VM_BUG_ON(index - page->index >= + 1 << compound_order(page)); page += index - page->index; } pages[ret] = page; if (++ret == nr_pages) break; - if (!PageTransCompound(page)) + if (PageHuge(page) || !PageTransCompound(page)) continue; for (refs = 0; ret < nr_pages && (index + 1) % HPAGE_PMD_NR; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 87e11d8ad536..a50e20553265 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -622,13 +622,6 @@ static pgoff_t vma_hugecache_offset(struct hstate *h, (vma->vm_pgoff >> huge_page_order(h)); } -pgoff_t linear_hugepage_index(struct vm_area_struct *vma, - unsigned long address) -{ - return vma_hugecache_offset(hstate_vma(vma), vma, address); -} -EXPORT_SYMBOL_GPL(linear_hugepage_index); - /* * Return the size of the pages allocated when backing a VMA. In the majority * cases this will be same size as used by the page table entries. @@ -3491,7 +3484,7 @@ static struct page *hugetlbfs_pagecache_page(struct hstate *h, pgoff_t idx; mapping = vma->vm_file->f_mapping; - idx = vma_hugecache_offset(h, vma, address); + idx = linear_page_index(vma, address); return find_lock_page(mapping, idx); } @@ -3508,7 +3501,7 @@ static bool hugetlbfs_pagecache_present(struct hstate *h, struct page *page; mapping = vma->vm_file->f_mapping; - idx = vma_hugecache_offset(h, vma, address); + idx = linear_page_index(vma, address); page = find_get_page(mapping, idx); if (page) @@ -3563,7 +3556,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, retry: page = find_lock_page(mapping, idx); if (!page) { - size = i_size_read(mapping->host) >> huge_page_shift(h); + size = i_size_read(mapping->host) >> PAGE_SHIFT; if (idx >= size) goto out; page = alloc_huge_page(vma, address, 0); @@ -3625,7 +3618,7 @@ retry: ptl = huge_pte_lockptr(h, mm, ptep); spin_lock(ptl); - size = i_size_read(mapping->host) >> huge_page_shift(h); + size = i_size_read(mapping->host) >> PAGE_SHIFT; if (idx >= size) goto backout; @@ -3672,7 +3665,7 @@ u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, if (vma->vm_flags & VM_SHARED) { key[0] = (unsigned long) mapping; - key[1] = idx; + key[1] = idx >> huge_page_order(h); } else { key[0] = (unsigned long) mm; key[1] = address >> huge_page_shift(h); @@ -3728,7 +3721,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } mapping = vma->vm_file->f_mapping; - idx = vma_hugecache_offset(h, vma, address); + idx = linear_page_index(vma, address); /* * Serialize hugepage allocation and instantiation, so that we don't