From patchwork Tue Nov 29 11:22:48 2016
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 9451575
From: "Kirill A. Shutemov"
To: "Theodore Ts'o", Andreas Dilger, Jan Kara, Andrew Morton
Cc: Alexander Viro, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
    Vlastimil Babka, Matthew Wilcox, Ross Zwisler,
    linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-block@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv5 20/36] truncate: make truncate_inode_pages_range() aware about huge pages
Date: Tue, 29 Nov 2016 14:22:48 +0300
Message-Id: <20161129112304.90056-21-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.10.2
In-Reply-To: <20161129112304.90056-1-kirill.shutemov@linux.intel.com>
References: <20161129112304.90056-1-kirill.shutemov@linux.intel.com>

As with shmem_undo_range(), truncate_inode_pages_range() removes a huge
page if it lies fully within the truncated range. A partial truncate of a
huge page zeroes out the affected part of the THP. Unlike with shmem, this
does not prevent us from having holes in the middle of a huge page: we can
still skip writeback for buffers that were never touched. With
memory-mapped I/O we would lose such holes in some cases when a THP is in
the page cache, since we cannot track accesses at the 4k level in that
case.

Signed-off-by: Kirill A. Shutemov
---
 fs/buffer.c        |  2 +-
 include/linux/mm.h |  9 +++++-
 mm/truncate.c      | 86 ++++++++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 80 insertions(+), 17 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8e000021513c..24daf7b9bdb0 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1534,7 +1534,7 @@ void block_invalidatepage(struct page *page, unsigned int offset,
 	/*
 	 * Check for overflow
 	 */
-	BUG_ON(stop > PAGE_SIZE || stop < length);
+	BUG_ON(stop > hpage_size(page) || stop < length);
 
 	head = page_buffers(page);
 	bh = head;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 582844ca0b23..59e74dc57359 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1328,8 +1328,15 @@ int get_kernel_page(unsigned long start, int write, struct page **pages);
 struct page *get_dump_page(unsigned long addr);
 
 extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
-extern void do_invalidatepage(struct page *page, unsigned int offset,
+extern void __do_invalidatepage(struct page *page, unsigned int offset,
 				  unsigned int length);
+static inline void do_invalidatepage(struct page *page, unsigned int offset,
+		unsigned int length)
+{
+	if (page_has_private(page))
+		__do_invalidatepage(page, offset, length);
+}
+
 
 int __set_page_dirty_nobuffers(struct page *page);
 int __set_page_dirty_no_writeback(struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index eb3a3a45feb6..d2d95f283ec3 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -70,12 +70,12 @@ static void clear_exceptional_entry(struct address_space *mapping,
  * point.  Because the caller is about to free (and possibly reuse) those
  * blocks on-disk.
  */
-void do_invalidatepage(struct page *page, unsigned int offset,
+void __do_invalidatepage(struct page *page, unsigned int offset,
 		       unsigned int length)
 {
 	void (*invalidatepage)(struct page *, unsigned int, unsigned int);
 
-	invalidatepage = page->mapping->a_ops->invalidatepage;
+	invalidatepage = page_mapping(page)->a_ops->invalidatepage;
 #ifdef CONFIG_BLOCK
 	if (!invalidatepage)
 		invalidatepage = block_invalidatepage;
@@ -100,8 +100,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
 	if (page->mapping != mapping)
 		return -EIO;
 
-	if (page_has_private(page))
-		do_invalidatepage(page, 0, PAGE_SIZE);
+	do_invalidatepage(page, 0, hpage_size(page));
 
 	/*
 	 * Some filesystems seem to re-dirty the page even after
@@ -273,13 +272,35 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				unlock_page(page);
 				continue;
 			}
+
+			if (PageTransHuge(page)) {
+				int j, first = 0, last = HPAGE_PMD_NR - 1;
+
+				if (start > page->index)
+					first = start & (HPAGE_PMD_NR - 1);
+				if (index == round_down(end, HPAGE_PMD_NR))
+					last = (end - 1) & (HPAGE_PMD_NR - 1);
+
+				/* Range starts or ends in the middle of THP */
+				if (first != 0 || last != HPAGE_PMD_NR - 1) {
+					int off, len;
+					for (j = first; j <= last; j++)
+						clear_highpage(page + j);
+					off = first * PAGE_SIZE;
+					len = (last + 1) * PAGE_SIZE - off;
+					do_invalidatepage(page, off, len);
+					unlock_page(page);
+					continue;
+				}
+			}
+
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
 		pagevec_remove_exceptionals(&pvec);
+		index += pvec.nr ? hpage_nr_pages(pvec.pages[pvec.nr - 1]) : 1;
 		pagevec_release(&pvec);
 		cond_resched();
-		index++;
 	}
 
 	if (partial_start) {
@@ -294,9 +315,12 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			wait_on_page_writeback(page);
 			zero_user_segment(page, partial_start, top);
 			cleancache_invalidate_page(mapping, page);
-			if (page_has_private(page))
-				do_invalidatepage(page, partial_start,
-						  top - partial_start);
+			if (page_has_private(page)) {
+				int off = page - compound_head(page);
+				do_invalidatepage(compound_head(page),
+						off * PAGE_SIZE + partial_start,
+						top - partial_start);
+			}
 			unlock_page(page);
 			put_page(page);
 		}
@@ -307,9 +331,12 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			wait_on_page_writeback(page);
 			zero_user_segment(page, 0, partial_end);
 			cleancache_invalidate_page(mapping, page);
-			if (page_has_private(page))
-				do_invalidatepage(page, 0,
-						  partial_end);
+			if (page_has_private(page)) {
+				int off = page - compound_head(page);
+				do_invalidatepage(compound_head(page),
+						off * PAGE_SIZE,
+						partial_end);
+			}
 			unlock_page(page);
 			put_page(page);
 		}
@@ -323,7 +350,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 
 	index = start;
 	for ( ; ; ) {
-		cond_resched();
+restart:	cond_resched();
 		if (!pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
 			/* If all gone from start onwards, we're done */
@@ -346,8 +373,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			index = indices[i];
 			if (index >= end) {
 				/* Restart punch to make sure all gone */
-				index = start - 1;
-				break;
+				index = start;
+				goto restart;
 			}
 
 			if (radix_tree_exceptional_entry(page)) {
@@ -358,12 +385,41 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			lock_page(page);
 			WARN_ON(page_to_index(page) != index);
 			wait_on_page_writeback(page);
+
+			if (PageTransHuge(page)) {
+				int j, first = 0, last = HPAGE_PMD_NR - 1;
+
+				if (start > page->index)
+					first = start & (HPAGE_PMD_NR - 1);
+				if (index == round_down(end, HPAGE_PMD_NR))
+					last = (end - 1) & (HPAGE_PMD_NR - 1);
+
+				/*
+				 * On Partial thp truncate due 'start' in
+				 * middle of THP: don't need to look on these
+				 * pages again on !pvec.nr restart.
+				 */
+				start = page->index + HPAGE_PMD_NR;
+
+				/* Range starts or ends in the middle of THP */
+				if (first != 0 || last != HPAGE_PMD_NR - 1) {
+					int off, len;
+					for (j = first; j <= last; j++)
+						clear_highpage(page + j);
+					off = first * PAGE_SIZE;
+					len = (last + 1) * PAGE_SIZE - off;
+					do_invalidatepage(page, off, len);
+					unlock_page(page);
+					continue;
+				}
+			}
+
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
 		pagevec_remove_exceptionals(&pvec);
+		index += pvec.nr ? hpage_nr_pages(pvec.pages[pvec.nr - 1]) : 1;
 		pagevec_release(&pvec);
-		index++;
 	}
 	cleancache_invalidate_inode(mapping);
 }
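
For reference, here is a minimal userspace sketch (not kernel code) of the
subpage arithmetic the patch adds for a THP that is only partially covered
by the truncated range: it computes the first/last 4k subpages that would be
cleared and the byte offset/length that would be handed to
do_invalidatepage(). PAGE_SIZE, HPAGE_PMD_NR, and the helper names below are
local to the example and assume x86-64 2M THP with 4k base pages.

/*
 * Illustrative sketch only; constants and helpers are redefined locally
 * and are not the kernel's definitions.
 */
#include <stdio.h>

#define PAGE_SIZE	4096UL
#define HPAGE_PMD_NR	512UL

static unsigned long round_down_pmd(unsigned long idx)
{
	return idx & ~(HPAGE_PMD_NR - 1);
}

/*
 * start/end: truncated range in base-page indices, end exclusive.
 * thp_index: page->index of the huge page (HPAGE_PMD_NR-aligned).
 */
static void partial_thp_truncate(unsigned long start, unsigned long end,
				 unsigned long thp_index)
{
	unsigned long first = 0, last = HPAGE_PMD_NR - 1;
	unsigned long off, len;

	/* Range begins inside this THP: keep subpages below 'first'. */
	if (start > thp_index)
		first = start & (HPAGE_PMD_NR - 1);
	/* Range ends inside this THP: keep subpages above 'last'. */
	if (thp_index == round_down_pmd(end))
		last = (end - 1) & (HPAGE_PMD_NR - 1);

	if (first == 0 && last == HPAGE_PMD_NR - 1) {
		printf("THP at %lu fully covered: truncate the whole page\n",
		       thp_index);
		return;
	}

	/* Subpages first..last would be cleared with clear_highpage(). */
	off = first * PAGE_SIZE;
	len = (last + 1) * PAGE_SIZE - off;
	printf("THP at %lu: clear subpages %lu..%lu, invalidate off=%lu len=%lu\n",
	       thp_index, first, last, off, len);
}

int main(void)
{
	/* Truncate page indices [700, 1200) against the THP at index 512. */
	partial_thp_truncate(700, 1200, 512);
	/* Same range against the THP at index 1024, which ends inside it. */
	partial_thp_truncate(700, 1200, 1024);
	return 0;
}

With these example numbers the first THP keeps subpages 0..187 and clears
188..511, while the second keeps 176..511 and clears 0..175, matching the
commit message's point that a partial truncate zeroes the affected part of
the THP instead of removing it.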