From patchwork Wed Sep 25 00:52:00 2019
From: Matthew Wilcox
To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)"
Subject: [PATCH 01/15] mm: Use vm_fault error code directly
Date: Tue, 24 Sep 2019 17:52:00 -0700
Message-Id: <20190925005214.27240-2-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

Use VM_FAULT_OOM instead of indirecting through vmf_error(-ENOMEM).

Signed-off-by: Matthew Wilcox (Oracle)
Acked-by: Kirill A. Shutemov
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1146fcfa3215..625ef3ef19f3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2533,7 +2533,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		if (!page) {
 			if (fpin)
 				goto out_retry;
-			return vmf_error(-ENOMEM);
+			return VM_FAULT_OOM;
 		}
 	}

From patchwork Wed Sep 25 00:52:01 2019
From: Matthew Wilcox
Subject: [PATCH 02/15] fs: Introduce i_blocks_per_page
Date: Tue, 24 Sep 2019 17:52:01 -0700
Message-Id: <20190925005214.27240-3-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

This helper is useful for both large pages in the page cache and for
supporting block size larger than page size.  Convert some example
users (we have a few different ways of writing this idiom).
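As an illustration of the idiom being unified, consider this hedged
sketch; example_walk_blocks() is a made-up function, and only
i_blocks_per_page() comes from this patch:

	/* Illustration only: the per-block loop this helper cleans up. */
	static void example_walk_blocks(struct inode *inode, struct page *page)
	{
		unsigned int i;

		/*
		 * Both old spellings assumed an order-0 page:
		 *	PAGE_SIZE / i_blocksize(inode)
		 *	PAGE_SIZE >> inode->i_blkbits
		 * The helper is correct for any (compound) page size:
		 */
		for (i = 0; i < i_blocks_per_page(inode, page); i++) {
			/* ... per-block work ... */
		}
	}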
Signed-off-by: Matthew Wilcox (Oracle)
---
 fs/iomap/buffered-io.c  |  4 ++--
 fs/jfs/jfs_metapage.c   |  2 +-
 fs/xfs/xfs_aops.c       |  8 ++++----
 include/linux/pagemap.h | 13 +++++++++++++
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e25901ae3ff4..0e76a4b6d98a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -24,7 +24,7 @@ iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
-	if (iop || i_blocksize(inode) == PAGE_SIZE)
+	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
 	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
@@ -128,7 +128,7 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	bool uptodate = true;
 
 	if (iop) {
-		for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+		for (i = 0; i < i_blocks_per_page(inode, page); i++) {
 			if (i >= first && i <= last)
 				set_bit(i, iop->uptodate);
 			else if (!test_bit(i, iop->uptodate))
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index a2f5338a5ea1..176580f54af9 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
 	struct inode *inode = page->mapping->host;
 	struct bio *bio = NULL;
 	int block_offset;
-	int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
+	int blocks_per_page = i_blocks_per_page(inode, page);
 	sector_t page_start;	/* address of page in fs blocks */
 	sector_t pblock;
 	int xlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f16d5f196c6b..102cfd8a97d6 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -68,7 +68,7 @@ xfs_finish_page_writeback(
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+	ASSERT(iop || i_blocks_per_page(inode, bvec->bv_page) <= 1);
 	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -839,7 +839,7 @@ xfs_aops_discard_page(
 			page, ip->i_ino, offset);
 
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-			PAGE_SIZE / i_blocksize(inode));
+			i_blocks_per_page(inode, page));
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
@@ -877,7 +877,7 @@ xfs_writepage_map(
 	uint64_t		file_offset;	/* file offset of page */
 	int			error = 0, count = 0, i;
 
-	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+	ASSERT(iop || i_blocks_per_page(inode, page) <= 1);
 	ASSERT(!iop || atomic_read(&iop->write_count) == 0);
 
 	/*
@@ -886,7 +886,7 @@ xfs_writepage_map(
 	 * one.
 	 */
 	for (i = 0, file_offset = page_offset(page);
-	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
+	     i < i_blocks_per_page(inode, page) && file_offset < end_offset;
 	     i++, file_offset += len) {
 		if (iop && !test_bit(i, iop->uptodate))
 			continue;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 37a4d9e32cd3..750770a2c685 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -636,4 +636,17 @@ static inline unsigned long dir_pages(struct inode *inode)
 							PAGE_SHIFT;
 }
 
+/**
+ * i_blocks_per_page - How many blocks fit in this page.
+ * @inode: The inode which contains the blocks.
+ * @page: The (potentially large) page.
+ *
+ * Context: Any context.
+ * Return: The number of filesystem blocks covered by this page.
+ */
+static inline
+unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
+{
+	return page_size(page) >> inode->i_blkbits;
+}
 #endif /* _LINUX_PAGEMAP_H */

From patchwork Wed Sep 25 00:52:02 2019
From: Matthew Wilcox
Subject: [PATCH 03/15] mm: Add file_offset_of_ helpers
Date: Tue, 24 Sep 2019 17:52:02 -0700
Message-Id: <20190925005214.27240-4-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

The page_offset function is badly named for people reading the
functions which call it.  The natural meaning of a function with this
name would be 'offset within a page', not 'page offset in bytes within
a file'.  Dave Chinner suggests file_offset_of_page() as a replacement
function name and I'm also adding file_offset_of_next_page() as a
helper for the large page work.  Also add kernel-doc for these
functions so they show up in the kernel API book.

page_offset() is retained as a compatibility define for now.
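A worked example of the two helpers (values are illustrative only):
with 4KiB base pages and an order-2 compound page cached at
page->index == 8, so compound_nr(page) == 4,

	file_offset_of_page(page)      == (loff_t)8 << PAGE_SHIFT       == 32768
	file_offset_of_next_page(page) == (loff_t)(8 + 4) << PAGE_SHIFT == 49152

i.e. the page covers file bytes [32768, 49152).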
---
 drivers/net/ethernet/ibm/ibmveth.c |  2 --
 include/linux/pagemap.h            | 25 ++++++++++++++++++++++---
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index c5be4ebd8437..bf98aeaf9a45 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -978,8 +978,6 @@ static int ibmveth_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	return -EOPNOTSUPP;
 }
 
-#define page_offset(v) ((unsigned long)(v) & ((1 << 12) - 1))
-
 static int ibmveth_send(struct ibmveth_adapter *adapter,
 			union ibmveth_buf_desc *descs, unsigned long mss)
 {
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 750770a2c685..103205494ea0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -428,14 +428,33 @@ static inline pgoff_t page_to_pgoff(struct page *page)
 	return page_to_index(page);
 }
 
-/*
- * Return byte-offset into filesystem object for page.
+/**
+ * file_offset_of_page - File offset of this page.
+ * @page: Page cache page.
+ *
+ * Context: Any context.
+ * Return: The offset of the first byte of this page.
  */
-static inline loff_t page_offset(struct page *page)
+static inline loff_t file_offset_of_page(struct page *page)
 {
 	return ((loff_t)page->index) << PAGE_SHIFT;
 }
 
+/* Legacy; please convert callers */
+#define page_offset(page)	file_offset_of_page(page)
+
+/**
+ * file_offset_of_next_page - File offset of the next page.
+ * @page: Page cache page.
+ *
+ * Context: Any context.
+ * Return: The offset of the first byte after this page.
+ */
+static inline loff_t file_offset_of_next_page(struct page *page)
+{
+	return ((loff_t)page->index + compound_nr(page)) << PAGE_SHIFT;
+}
+
 static inline loff_t page_file_offset(struct page *page)
 {
 	return ((loff_t)page_index(page)) << PAGE_SHIFT;
From patchwork Wed Sep 25 00:52:03 2019
From: Matthew Wilcox
Subject: [PATCH 04/15] iomap: Support large pages
Date: Tue, 24 Sep 2019 17:52:03 -0700
Message-Id: <20190925005214.27240-5-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

Change iomap_page from a statically sized uptodate bitmap to a
dynamically allocated uptodate bitmap, allowing an arbitrarily large
page.

The only remaining places where iomap assumes an order-0 page are for
files with inline data, where there's no sense in allocating a larger
page.

Signed-off-by: Matthew Wilcox (Oracle)
---
 fs/iomap/buffered-io.c | 119 ++++++++++++++++++++++++++---------------
 include/linux/iomap.h  |   2 +-
 include/linux/mm.h     |   2 +
 3 files changed, 80 insertions(+), 43 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0e76a4b6d98a..15d844a88439 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -23,14 +23,14 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
+	unsigned int n;
 
 	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
-	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-	atomic_set(&iop->read_count, 0);
-	atomic_set(&iop->write_count, 0);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+	n = BITS_TO_LONGS(i_blocks_per_page(inode, page));
+	iop = kmalloc(struct_size(iop, uptodate, n),
+			GFP_NOFS | __GFP_NOFAIL | __GFP_ZERO);
 
 	/*
 	 * migrate_page_move_mapping() assumes that pages with private data have
@@ -61,15 +61,16 @@ iomap_page_release(struct page *page)
  * Calculate the range inside the page that we actually need to read.
  */
 static void
-iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
+iomap_adjust_read_range(struct inode *inode, struct page *page,
 		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
 {
+	struct iomap_page *iop = to_iomap_page(page);
 	loff_t orig_pos = *pos;
 	loff_t isize = i_size_read(inode);
 	unsigned block_bits = inode->i_blkbits;
 	unsigned block_size = (1 << block_bits);
-	unsigned poff = offset_in_page(*pos);
-	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	unsigned poff = offset_in_this_page(page, *pos);
+	unsigned plen = min_t(loff_t, page_size(page) - poff, length);
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
 
@@ -107,7 +108,8 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
 	if (orig_pos <= isize && orig_pos + length > isize) {
-		unsigned end = offset_in_page(isize - 1) >> block_bits;
+		unsigned end = offset_in_this_page(page, isize - 1) >>
+				block_bits;
 
 		if (first <= end && last > end)
 			plen -= (last - end) * block_size;
@@ -121,19 +123,16 @@ static void
 iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 {
 	struct iomap_page *iop = to_iomap_page(page);
-	struct inode *inode = page->mapping->host;
-	unsigned first = off >> inode->i_blkbits;
-	unsigned last = (off + len - 1) >> inode->i_blkbits;
-	unsigned int i;
 	bool uptodate = true;
 
 	if (iop) {
-		for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-			if (i >= first && i <= last)
-				set_bit(i, iop->uptodate);
-			else if (!test_bit(i, iop->uptodate))
-				uptodate = false;
-		}
+		struct inode *inode = page->mapping->host;
+		unsigned first = off >> inode->i_blkbits;
+		unsigned count = len >> inode->i_blkbits;
+
+		bitmap_set(iop->uptodate, first, count);
+		if (!bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
+			uptodate = false;
 	}
 
 	if (uptodate && !PageError(page))
@@ -194,6 +193,7 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 		return;
 
 	BUG_ON(page->index);
+	BUG_ON(PageCompound(page));
 	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
@@ -203,6 +203,16 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 	SetPageUptodate(page);
 }
 
+/*
+ * Estimate the number of vectors we need based on the current page size;
+ * if we're wrong we'll end up doing an overly large allocation or needing
+ * to do a second allocation, neither of which is a big deal.
+ */
+static unsigned int iomap_nr_vecs(struct page *page, loff_t length)
+{
+	return (length + page_size(page) - 1) >> page_shift(page);
+}
+
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
@@ -222,7 +232,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
-	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	iomap_adjust_read_range(inode, page, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
@@ -258,7 +268,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
-		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		int nr_vecs = iomap_nr_vecs(page, length);
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
@@ -293,9 +303,9 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 	unsigned poff;
 	loff_t ret;
 
-	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
-		ret = iomap_apply(inode, page_offset(page) + poff,
-				PAGE_SIZE - poff, 0, ops, &ctx,
+	for (poff = 0; poff < page_size(page); poff += ret) {
+		ret = iomap_apply(inode, file_offset_of_page(page) + poff,
+				page_size(page) - poff, 0, ops, &ctx,
 				iomap_readpage_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
@@ -328,7 +338,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 	while (!list_empty(pages)) {
 		struct page *page = lru_to_page(pages);
 
-		if (page_offset(page) >= (u64)pos + length)
+		if (file_offset_of_page(page) >= (u64)pos + length)
 			break;
 
 		list_del(&page->lru);
@@ -342,7 +352,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 		 * readpages call itself as every page gets checked again once
 		 * actually needed.
 		 */
-		*done += PAGE_SIZE;
+		*done += page_size(page);
 		put_page(page);
 	}
 
@@ -355,9 +365,14 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 {
 	struct iomap_readpage_ctx *ctx = data;
 	loff_t done, ret;
+	size_t left = 0;
+
+	if (ctx->cur_page)
+		left = page_size(ctx->cur_page) -
+			offset_in_this_page(ctx->cur_page, pos);
 
 	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
+		if (ctx->cur_page && left == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
 			put_page(ctx->cur_page);
@@ -369,14 +384,27 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 			if (!ctx->cur_page)
 				break;
 			ctx->cur_page_in_bio = false;
+			left = page_size(ctx->cur_page);
 		}
 		ret = iomap_readpage_actor(inode, pos + done, length - done,
 				ctx, iomap);
+		left -= ret;
 	}
 
 	return done;
 }
 
+/* move to fs.h? */
+static inline struct page *readahead_first_page(struct list_head *head)
+{
+	return list_entry(head->prev, struct page, lru);
+}
+
+static inline struct page *readahead_last_page(struct list_head *head)
+{
+	return list_entry(head->next, struct page, lru);
+}
+
 int
 iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops)
@@ -385,9 +413,10 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		.pages		= pages,
 		.is_readahead	= true,
 	};
-	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
-	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
-	loff_t length = last - pos + PAGE_SIZE, ret = 0;
+	loff_t pos = file_offset_of_page(readahead_first_page(pages));
+	loff_t end = file_offset_of_next_page(readahead_last_page(pages));
+	loff_t length = end - pos;
+	loff_t ret = 0;
 
 	while (length > 0) {
 		ret = iomap_apply(mapping->host, pos, length, 0, ops,
@@ -410,7 +439,7 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
 	}
 
 	/*
-	 * Check that we didn't lose a page due to the arcance calling
+	 * Check that we didn't lose a page due to the arcane calling
 	 * conventions..
 	 */
 	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
@@ -435,7 +464,7 @@ iomap_is_partially_uptodate(struct page *page, unsigned long from,
 	unsigned i;
 
 	/* Limit range to one page */
-	len = min_t(unsigned, PAGE_SIZE - from, count);
+	len = min_t(unsigned, page_size(page) - from, count);
 
 	/* First and last blocks in range within page */
 	first = from >> inode->i_blkbits;
@@ -474,7 +503,7 @@ iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
 	 * If we are invalidating the entire page, clear the dirty state from it
 	 * and release it to avoid unnecessary buildup of the LRU.
 	 */
-	if (offset == 0 && len == PAGE_SIZE) {
+	if (offset == 0 && len == page_size(page)) {
 		WARN_ON_ONCE(PageWriteback(page));
 		cancel_dirty_page(page);
 		iomap_page_release(page);
@@ -550,18 +579,20 @@ static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 		struct page *page, struct iomap *iomap)
 {
-	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
+	unsigned from = offset_in_this_page(page, pos);
+	unsigned to = from + len;
+	unsigned poff, plen;
 	int status = 0;
 
 	if (PageUptodate(page))
 		return 0;
+	iomap_page_create(inode, page);
 
 	do {
-		iomap_adjust_read_range(inode, iop, &block_start,
+		iomap_adjust_read_range(inode, page, &block_start,
 				block_end - block_start, &poff, &plen);
 		if (plen == 0)
 			break;
@@ -673,7 +704,7 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	 */
 	if (unlikely(copied < len && !PageUptodate(page)))
 		return 0;
-	iomap_set_range_uptodate(page, offset_in_page(pos), len);
+	iomap_set_range_uptodate(page, offset_in_this_page(page, pos), len);
 	iomap_set_page_dirty(page);
 	return copied;
 }
@@ -685,6 +716,7 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
+	BUG_ON(PageCompound(page));
 	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
@@ -749,6 +781,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		unsigned long bytes;	/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
 
+		/*
+		 * XXX: We don't know what size page we'll find in the
+		 * page cache, so only copy up to a regular page boundary.
+		 */
 		offset = offset_in_page(pos);
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
 						iov_iter_count(i));
@@ -1041,19 +1077,18 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 	lock_page(page);
 	size = i_size_read(inode);
 	if ((page->mapping != inode->i_mapping) ||
-	    (page_offset(page) > size)) {
+	    (file_offset_of_page(page) > size)) {
 		/* We overload EFAULT to mean page got truncated */
 		ret = -EFAULT;
 		goto out_unlock;
 	}
 
-	/* page is wholly or partially inside EOF */
-	if (((page->index + 1) << PAGE_SHIFT) > size)
-		length = offset_in_page(size);
+	offset = file_offset_of_page(page);
+	if (size - offset < page_size(page))
+		length = offset_in_this_page(page, size);
 	else
-		length = PAGE_SIZE;
+		length = page_size(page);
 
-	offset = page_offset(page);
 	while (length > 0) {
 		ret = iomap_apply(inode, offset, length,
 				IOMAP_WRITE | IOMAP_FAULT, ops, page,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index bc499ceae392..86be24a8259b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -139,7 +139,7 @@ loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	unsigned long		uptodate[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 294a67b94147..04bea9f9282c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1415,6 +1415,8 @@ static inline void clear_page_pfmemalloc(struct page *page)
 extern void pagefault_out_of_memory(void);
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
+#define offset_in_this_page(page, p)	\
+	((unsigned long)(p) & (page_size(page) - 1))
 
 /*
  * Flags passed to show_mem() and show_free_areas() to suppress output in
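The key change above is the move from a fixed DECLARE_BITMAP to a
flexible array member sized at allocation time with struct_size().  A
minimal, self-contained sketch of that pattern follows; it is
illustrative only (example_iop and example_iop_alloc are made-up names,
and plain calloc() stands in for kmalloc(... | __GFP_ZERO)):

	#include <stdlib.h>

	#define EXAMPLE_BITS_PER_LONG	(8 * sizeof(unsigned long))
	#define EXAMPLE_BITS_TO_LONGS(n) \
		(((n) + EXAMPLE_BITS_PER_LONG - 1) / EXAMPLE_BITS_PER_LONG)

	/* Mirrors the new struct iomap_page: bitmap length chosen at runtime. */
	struct example_iop {
		int read_count;
		int write_count;
		unsigned long uptodate[];	/* one bit per fs block in the page */
	};

	static struct example_iop *example_iop_alloc(unsigned int blocks_per_page)
	{
		size_t n = EXAMPLE_BITS_TO_LONGS(blocks_per_page);

		/* struct_size(iop, uptodate, n) in the kernel; zeroed allocation. */
		return calloc(1, sizeof(struct example_iop) +
				 n * sizeof(unsigned long));
	}

With 4KiB blocks in a 2MiB page that is 512 bits of uptodate state, far
more than the old fixed PAGE_SIZE/512 bitmap could describe.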
From patchwork Wed Sep 25 00:52:04 2019
From: Matthew Wilcox
Subject: [PATCH 05/15] xfs: Support large pages
Date: Tue, 24 Sep 2019 17:52:04 -0700
Message-Id: <20190925005214.27240-6-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

Mostly this is just checking the page size of each page instead of
assuming PAGE_SIZE.  Clean up the logic in writepage a little.

Signed-off-by: Matthew Wilcox (Oracle)
---
 fs/xfs/xfs_aops.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 102cfd8a97d6..1a26e9ca626b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -765,7 +765,7 @@ xfs_add_to_ioend(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
 	unsigned		len = i_blocksize(inode);
-	unsigned		poff = offset & (PAGE_SIZE - 1);
+	unsigned		poff = offset & (page_size(page) - 1);
 	bool			merged, same_page = false;
 	sector_t		sector;
 
@@ -843,7 +843,7 @@ xfs_aops_discard_page(
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
-	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
+	xfs_vm_invalidatepage(page, 0, page_size(page));
 }
 
 /*
@@ -984,8 +984,7 @@ xfs_do_writepage(
 	struct xfs_writepage_ctx *wpc = data;
 	struct inode		*inode = page->mapping->host;
 	loff_t			offset;
-	uint64_t		end_offset;
-	pgoff_t			end_index;
+	uint64_t		end_offset;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -1024,10 +1023,9 @@ xfs_do_writepage(
 	 * ---------------------------------^------------------|
 	 */
 	offset = i_size_read(inode);
-	end_index = offset >> PAGE_SHIFT;
-	if (page->index < end_index)
-		end_offset = (xfs_off_t)(page->index + 1) << PAGE_SHIFT;
-	else {
+	end_offset = file_offset_of_next_page(page);
+
+	if (end_offset > offset) {
 		/*
 		 * Check whether the page to write out is beyond or straddles
 		 * i_size or not.
@@ -1039,7 +1037,8 @@ xfs_do_writepage(
 		 * |				|      Straddles	|
 		 * ---------------------------------^-----------|--------|
 		 */
-		unsigned offset_into_page = offset & (PAGE_SIZE - 1);
+		unsigned offset_into_page = offset_in_this_page(page, offset);
+		pgoff_t end_index = offset >> PAGE_SHIFT;
 
 		/*
 		 * Skip the page if it is fully outside i_size, e.g. due to a
@@ -1070,7 +1069,7 @@ xfs_do_writepage(
 		 * memory is zeroed when mapped, and writes to that region are
 		 * not written out to the file."
		 */
-		zero_user_segment(page, offset_into_page, PAGE_SIZE);
+		zero_user_segment(page, offset_into_page, page_size(page));
 
 		/* Adjust the end_offset to the end of file */
 		end_offset = offset;

From patchwork Wed Sep 25 00:52:05 2019
From: Matthew Wilcox
Subject: [PATCH 06/15] xfs: Pass a page to xfs_finish_page_writeback
Date: Tue, 24 Sep 2019 17:52:05 -0700
Message-Id: <20190925005214.27240-7-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

The only part of the bvec we were accessing was the bv_page, so just
pass that instead of the whole bvec.
Signed-off-by: Matthew Wilcox (Oracle)
---
 fs/xfs/xfs_aops.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1a26e9ca626b..edcb4797fcc2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -58,21 +58,21 @@ xfs_find_daxdev_for_inode(
 static void
 xfs_finish_page_writeback(
 	struct inode		*inode,
-	struct bio_vec		*bvec,
+	struct page		*page,
 	int			error)
 {
-	struct iomap_page	*iop = to_iomap_page(bvec->bv_page);
+	struct iomap_page	*iop = to_iomap_page(page);
 
 	if (error) {
-		SetPageError(bvec->bv_page);
+		SetPageError(page);
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	ASSERT(iop || i_blocks_per_page(inode, bvec->bv_page) <= 1);
+	ASSERT(iop || i_blocks_per_page(inode, page) <= 1);
 	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
-		end_page_writeback(bvec->bv_page);
+		end_page_writeback(page);
 }
 
 /*
@@ -106,7 +106,7 @@ xfs_destroy_ioend(
 
 		/* walk each page on bio, ending page IO on them */
 		bio_for_each_segment_all(bvec, bio, iter_all)
-			xfs_finish_page_writeback(inode, bvec, error);
+			xfs_finish_page_writeback(inode, bvec->bv_page, error);
 		bio_put(bio);
 	}

From patchwork Wed Sep 25 00:52:06 2019
From: Matthew Wilcox
Subject: [PATCH 07/15] mm: Make prep_transhuge_page tail-callable
Date: Tue, 24 Sep 2019 17:52:06 -0700
Message-Id: <20190925005214.27240-8-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

By permitting NULL or order-0 pages as an argument, and returning the
argument, callers can write:

	return prep_transhuge_page(alloc_pages(...));

instead of assigning the result to a temporary variable and
conditionally passing that to prep_transhuge_page().

Signed-off-by: Matthew Wilcox (Oracle)
Acked-by: Kirill A. Shutemov
---
 include/linux/huge_mm.h | 7 +++++--
 mm/huge_memory.c        | 9 +++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 61c9ffd89b05..779e83800a77 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -153,7 +153,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp,
 		unsigned long addr, unsigned long len, unsigned long pgoff,
 		unsigned long flags);
 
-extern void prep_transhuge_page(struct page *page);
+extern struct page *prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
 bool can_split_huge_page(struct page *page, int *pextra_pins);
@@ -303,7 +303,10 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 	return false;
 }
 
-static inline void prep_transhuge_page(struct page *page) {}
+static inline struct page *prep_transhuge_page(struct page *page)
+{
+	return page;
+}
 
 #define transparent_hugepage_flags 0UL
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 73fc517c08d2..cbe7d0619439 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -516,15 +516,20 @@ static inline struct deferred_split *get_deferred_split_queue(struct page *page)
 }
 #endif
 
-void prep_transhuge_page(struct page *page)
+struct page *prep_transhuge_page(struct page *page)
 {
+	if (!page || compound_order(page) == 0)
+		return page;
 	/*
-	 * we use page->mapping and page->indexlru in second tail page
+	 * we use page->mapping and page->index in second tail page
 	 * as list_head: assuming THP order >= 2
 	 */
+	BUG_ON(compound_order(page) == 1);
 
 	INIT_LIST_HEAD(page_deferred_list(page));
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
+
+	return page;
 }
 
 static unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
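A hedged sketch of the calling convention this enables;
example_alloc_huge() is hypothetical, not code from this series (the
next patch uses the same pattern in __page_cache_alloc_order()):

	/*
	 * Because prep_transhuge_page() now accepts NULL or an order-0
	 * page and returns its argument, the allocation and the THP
	 * preparation collapse into a tail call.
	 */
	static struct page *example_alloc_huge(gfp_t gfp, unsigned int order)
	{
		return prep_transhuge_page(alloc_pages(gfp | __GFP_COMP, order));
	}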
From patchwork Wed Sep 25 00:52:07 2019
From: Matthew Wilcox
Subject: [PATCH 08/15] mm: Add __page_cache_alloc_order
Date: Tue, 24 Sep 2019 17:52:07 -0700
Message-Id: <20190925005214.27240-9-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

This new function allows page cache pages to be allocated that are
larger than an order-0 page.

Signed-off-by: Matthew Wilcox (Oracle)
Acked-by: Kirill A. Shutemov
---
 include/linux/pagemap.h | 14 +++++++++++---
 mm/filemap.c            | 12 ++++++++----
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 103205494ea0..d610a49be571 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -208,14 +208,22 @@ static inline int page_cache_add_speculative(struct page *page, int count)
 }
 
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
-	return alloc_pages(gfp, 0);
+	if (order == 0)
+		return alloc_pages(gfp, 0);
+	return prep_transhuge_page(alloc_pages(gfp | __GFP_COMP, order));
 }
 #endif
 
+static inline struct page *__page_cache_alloc(gfp_t gfp)
+{
+	return __page_cache_alloc_order(gfp, 0);
+}
+
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
 	return __page_cache_alloc(mapping_gfp_mask(x));
diff --git a/mm/filemap.c b/mm/filemap.c
index 625ef3ef19f3..bab97addbb1d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -962,24 +962,28 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
 	int n;
 	struct page *page;
 
+	if (order > 0)
+		gfp |= __GFP_COMP;
+
 	if (cpuset_do_page_mem_spread()) {
 		unsigned int cpuset_mems_cookie;
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = __alloc_pages_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, order);
+			prep_transhuge_page(page);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
 	}
-	return alloc_pages(gfp, 0);
+	return prep_transhuge_page(alloc_pages(gfp, order));
 }
-EXPORT_SYMBOL(__page_cache_alloc);
+EXPORT_SYMBOL(__page_cache_alloc_order);
 #endif
 
 /*

From patchwork Wed Sep 25 00:52:08 2019
From: Matthew Wilcox
Subject: [PATCH 09/15] mm: Allow large pages to be added to the page cache
Date: Tue, 24 Sep 2019 17:52:08 -0700
Message-Id: <20190925005214.27240-10-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

We return -EEXIST if there are any non-shadow entries in the page
cache in the range covered by the large page.  If there are multiple
shadow entries in the range, we set *shadowp to one of them (currently
the one at the highest index).  If that turns out to be the wrong
answer, we can implement something more complex.  This is mostly
modelled after the equivalent function in the shmem code.
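A short worked example of those rules (hypothetical page cache state,
not code from this series):

	/*
	 * Adding an order-2 compound page (nr = compound_nr() = 4) at
	 * index 8, i.e. covering slots 8..11 of the XArray:
	 *
	 *   all four slots empty        -> page stored in slots 8..11, returns 0
	 *   slot 9 holds a shadow entry -> shadow reported via *shadowp,
	 *                                  nrexceptional decremented, page stored
	 *   slot 10 holds a real page   -> returns -EEXIST, nothing stored
	 */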
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/filemap.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bab97addbb1d..afe8f5d95810 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -855,6 +855,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	int huge = PageHuge(page);
 	struct mem_cgroup *memcg;
 	int error;
+	unsigned int nr = 1;
 	void *old;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -866,31 +867,45 @@ static int __add_to_page_cache_locked(struct page *page,
 					      gfp_mask, &memcg, false);
 		if (error)
 			return error;
+		xas_set_order(&xas, offset, compound_order(page));
+		nr = compound_nr(page);
 	}
 
-	get_page(page);
+	page_ref_add(page, nr);
 	page->mapping = mapping;
 	page->index = offset;
 
 	do {
+		unsigned long exceptional = 0;
+		unsigned int i = 0;
+
 		xas_lock_irq(&xas);
-		old = xas_load(&xas);
-		if (old && !xa_is_value(old))
+		xas_for_each_conflict(&xas, old) {
+			if (!xa_is_value(old))
+				break;
+			exceptional++;
+			if (shadowp)
+				*shadowp = old;
+		}
+		if (old)
 			xas_set_err(&xas, -EEXIST);
-		xas_store(&xas, page);
+		xas_create_range(&xas);
 		if (xas_error(&xas))
 			goto unlock;
 
-		if (xa_is_value(old)) {
-			mapping->nrexceptional--;
-			if (shadowp)
-				*shadowp = old;
+next:
+		xas_store(&xas, page);
+		if (++i < nr) {
+			xas_next(&xas);
+			goto next;
 		}
-		mapping->nrpages++;
+		mapping->nrexceptional -= exceptional;
+		mapping->nrpages += nr;
 
 		/* hugetlb pages do not participate in page cache accounting */
 		if (!huge)
-			__inc_node_page_state(page, NR_FILE_PAGES);
+			__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES,
+						nr);
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
@@ -907,7 +922,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	/* Leave page->index set: truncation relies upon it */
 	if (!huge)
 		mem_cgroup_cancel_charge(page, memcg, false);
-	put_page(page);
+	page_ref_sub(page, nr);
 	return xas_error(&xas);
 }
 ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);

From patchwork Wed Sep 25 00:52:09 2019
From: Matthew Wilcox
Subject: [PATCH 10/15] mm: Allow find_get_page to be used for large pages
Date: Tue, 24 Sep 2019 17:52:09 -0700
Message-Id: <20190925005214.27240-11-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

Add FGP_PMD to indicate that we're trying to find-or-create a page
that is at least PMD_ORDER in size.  The internal 'conflict' entry
usage is modelled after that in DAX, but the implementations are
different due to DAX using multi-order entries and the page cache
using multiple order-0 entries.

Signed-off-by: Matthew Wilcox (Oracle)
---
 include/linux/pagemap.h | 13 ++++++
 mm/filemap.c            | 99 +++++++++++++++++++++++++++++++++++------
 2 files changed, 99 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d610a49be571..d6d97f9fb762 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -248,6 +248,19 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_NOFS		0x00000010
 #define FGP_NOWAIT		0x00000020
 #define FGP_FOR_MMAP		0x00000040
+/*
+ * If you add more flags, increment FGP_ORDER_SHIFT (no further than 25).
+ * Do not insert flags above the FGP order bits.
+ */
+#define FGP_ORDER_SHIFT		7
+#define FGP_PMD		((PMD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+#define FGP_PUD		((PUD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define fgp_order(fgp)		((fgp) >> FGP_ORDER_SHIFT)
+#else
+#define fgp_order(fgp)		0
+#endif
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		int fgp_flags, gfp_t cache_gfp_mask);
diff --git a/mm/filemap.c b/mm/filemap.c
index afe8f5d95810..8eca91547e40 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1576,7 +1576,71 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 
 	return page;
 }
-EXPORT_SYMBOL(find_get_entry);
+
+static bool pagecache_is_conflict(struct page *page)
+{
+	return page == XA_RETRY_ENTRY;
+}
+
+/**
+ * __find_get_page - Find and get a page cache entry.
+ * @mapping: The address_space to search.
+ * @offset: The page cache index.
+ * @order: The minimum order of the entry to return.
+ *
+ * Looks up the page cache entries at @mapping between @offset and
+ * @offset + 2^@order.  If there is a page cache page, it is returned with
+ * an increased refcount unless it is smaller than @order.
+ *
+ * If the slot holds a shadow entry of a previously evicted page, or a
+ * swap entry from shmem/tmpfs, it is returned.
+ *
+ * Return: the found page, a value indicating a conflicting page or %NULL if
+ * there are no pages in this range.
+ */ +static struct page *__find_get_page(struct address_space *mapping, + unsigned long offset, unsigned int order) +{ + XA_STATE(xas, &mapping->i_pages, offset); + struct page *page; + + rcu_read_lock(); +repeat: + xas_reset(&xas); + page = xas_find(&xas, offset | ((1UL << order) - 1)); + if (xas_retry(&xas, page)) + goto repeat; + /* + * A shadow entry of a recently evicted page, or a swap entry from + * shmem/tmpfs. Skip it; keep looking for pages. + */ + if (xa_is_value(page)) + goto repeat; + if (!page) + goto out; + if (compound_order(page) < order) { + page = XA_RETRY_ENTRY; + goto out; + } + + if (!page_cache_get_speculative(page)) + goto repeat; + + /* + * Has the page moved or been split? + * This is part of the lockless pagecache protocol. See + * include/linux/pagemap.h for details. + */ + if (unlikely(page != xas_reload(&xas))) { + put_page(page); + goto repeat; + } + page = find_subpage(page, offset); +out: + rcu_read_unlock(); + + return page; +} /** * find_lock_entry - locate, pin and lock a page cache entry @@ -1618,12 +1682,12 @@ EXPORT_SYMBOL(find_lock_entry); * pagecache_get_page - find and get a page reference * @mapping: the address_space to search * @offset: the page index - * @fgp_flags: PCG flags + * @fgp_flags: FGP flags * @gfp_mask: gfp mask to use for the page cache data page allocation * * Looks up the page cache slot at @mapping & @offset. * - * PCG flags modify how the page is returned. + * FGP flags modify how the page is returned. * * @fgp_flags can be: * @@ -1636,6 +1700,10 @@ EXPORT_SYMBOL(find_lock_entry); * - FGP_FOR_MMAP: Similar to FGP_CREAT, only we want to allow the caller to do * its own locking dance if the page is already in cache, or unlock the page * before returning if we had to add the page to pagecache. + * - FGP_PMD: We're only interested in pages at PMD granularity. If there + * is no page here (and FGP_CREAT is set), we'll create one large enough. + * If there is a smaller page in the cache that overlaps the PMD page, we + * return %NULL and do not attempt to create a page. * * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even * if the GFP flags specified for FGP_CREAT are atomic. @@ -1649,10 +1717,11 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, { struct page *page; + BUILD_BUG_ON(((63 << FGP_ORDER_SHIFT) >> FGP_ORDER_SHIFT) != 63); repeat: - page = find_get_entry(mapping, offset); - if (xa_is_value(page)) - page = NULL; + page = __find_get_page(mapping, offset, fgp_order(fgp_flags)); + if (pagecache_is_conflict(page)) + return NULL; if (!page) goto no_page; @@ -1686,7 +1755,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, if (fgp_flags & FGP_NOFS) gfp_mask &= ~__GFP_FS; - page = __page_cache_alloc(gfp_mask); + page = __page_cache_alloc_order(gfp_mask, fgp_order(fgp_flags)); if (!page) return NULL; @@ -1704,13 +1773,17 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, if (err == -EEXIST) goto repeat; } + if (page) { + if (fgp_order(fgp_flags)) + count_vm_event(THP_FILE_ALLOC); - /* - * add_to_page_cache_lru locks the page, and for mmap we expect - * an unlocked page. - */ - if (page && (fgp_flags & FGP_FOR_MMAP)) - unlock_page(page); + /* + * add_to_page_cache_lru locks the page, and + * for mmap we expect an unlocked page. 
+ */ + if (fgp_flags & FGP_FOR_MMAP) + unlock_page(page); + } } return page; From patchwork Wed Sep 25 00:52:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 11159875 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 961BF1709 for ; Wed, 25 Sep 2019 00:53:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 60AB921655 for ; Wed, 25 Sep 2019 00:53:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="PzRlI0jU" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391842AbfIYAwS (ORCPT ); Tue, 24 Sep 2019 20:52:18 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:56876 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391536AbfIYAwS (ORCPT ); Tue, 24 Sep 2019 20:52:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From :Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help: List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=0cjGwEA4PqaAXASo8dxcqkZcnZ5NT/5g/ODeUF4SeXg=; b=PzRlI0jUb52P6YMDT6xr4CtRV9 Epx4aVhJjHCShqUftshMX21nvkY5k+8iRR3PEPOy7gsp4+C/Aj/jLNlUBx8E2OYwX4df1eaqFw6RR nHNJ3rRa1jEtQh31NVdrK6O/MhmVg6uAfFBgksHPUmDXZrcV/NI7bMiSwlp48c3Q0QKN8eEgBwVx8 +4lRwTmFLJdKK7x5MGmpVj9A4my2Wc+Xv9BM0T2yG8nDgiiZJd+j0sr4Vw1UF6lRlholMWGz896fS yrnF5BCeMf/CR2AgLLSzG0b/G7LKi/XQE+lXnAbriFY8u9hVH++Qcv59FbzdN7D0HhjRvS3JXWqrC nPhQs1Cg==; Received: from willy by bombadil.infradead.org with local (Exim 4.92.2 #3 (Red Hat Linux)) id 1iCvXV-00076v-Lz; Wed, 25 Sep 2019 00:52:17 +0000 From: Matthew Wilcox To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" Subject: [PATCH 11/15] mm: Remove hpage_nr_pages Date: Tue, 24 Sep 2019 17:52:10 -0700 Message-Id: <20190925005214.27240-12-willy@infradead.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190925005214.27240-1-willy@infradead.org> References: <20190925005214.27240-1-willy@infradead.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: "Matthew Wilcox (Oracle)" This function assumed that compound pages were necessarily PMD sized. While that may be true for some users, it's not going to be true for all users forever, so it's better to remove it and avoid the confusion by just using compound_nr() or page_size(). Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Kirill A. 
Shutemov --- drivers/nvdimm/btt.c | 4 +--- drivers/nvdimm/pmem.c | 3 +-- include/linux/huge_mm.h | 8 -------- include/linux/mm_inline.h | 6 +++--- mm/filemap.c | 2 +- mm/gup.c | 2 +- mm/internal.h | 4 ++-- mm/memcontrol.c | 14 +++++++------- mm/memory_hotplug.c | 4 ++-- mm/mempolicy.c | 2 +- mm/migrate.c | 19 ++++++++++--------- mm/mlock.c | 9 ++++----- mm/page_io.c | 4 ++-- mm/page_vma_mapped.c | 6 +++--- mm/rmap.c | 8 ++++---- mm/swap.c | 4 ++-- mm/swap_state.c | 4 ++-- mm/swapfile.c | 2 +- mm/vmscan.c | 4 ++-- 19 files changed, 49 insertions(+), 60 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index a8d56887ec88..2aac2bf10a37 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -1488,10 +1488,8 @@ static int btt_rw_page(struct block_device *bdev, sector_t sector, { struct btt *btt = bdev->bd_disk->private_data; int rc; - unsigned int len; - len = hpage_nr_pages(page) * PAGE_SIZE; - rc = btt_do_bvec(btt, NULL, page, len, 0, op, sector); + rc = btt_do_bvec(btt, NULL, page, page_size(page), 0, op, sector); if (rc == 0) page_endio(page, op_is_write(op), 0); diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index f9f76f6ba07b..778c73fd10d6 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -224,8 +224,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector, struct pmem_device *pmem = bdev->bd_queue->queuedata; blk_status_t rc; - rc = pmem_do_bvec(pmem, page, hpage_nr_pages(page) * PAGE_SIZE, - 0, op, sector); + rc = pmem_do_bvec(pmem, page, page_size(page), 0, op, sector); /* * The ->rw_page interface is subtle and tricky. The core diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 779e83800a77..6018d31549c3 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,12 +226,6 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, else return NULL; } -static inline int hpage_nr_pages(struct page *page) -{ - if (unlikely(PageTransHuge(page))) - return HPAGE_PMD_NR; - return 1; -} struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap); @@ -285,8 +279,6 @@ static inline struct list_head *page_deferred_list(struct page *page) #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; }) #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; }) -#define hpage_nr_pages(x) 1 - static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) { return false; diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 6f2fef7b0784..3bd675ce6ba8 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -47,14 +47,14 @@ static __always_inline void update_lru_size(struct lruvec *lruvec, static __always_inline void add_page_to_lru_list(struct page *page, struct lruvec *lruvec, enum lru_list lru) { - update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page)); + update_lru_size(lruvec, lru, page_zonenum(page), compound_nr(page)); list_add(&page->lru, &lruvec->lists[lru]); } static __always_inline void add_page_to_lru_list_tail(struct page *page, struct lruvec *lruvec, enum lru_list lru) { - update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page)); + update_lru_size(lruvec, lru, page_zonenum(page), compound_nr(page)); list_add_tail(&page->lru, &lruvec->lists[lru]); } @@ -62,7 +62,7 @@ static __always_inline void del_page_from_lru_list(struct page *page, struct lruvec *lruvec, enum lru_list lru) { list_del(&page->lru); - update_lru_size(lruvec, lru, page_zonenum(page), 
-hpage_nr_pages(page)); + update_lru_size(lruvec, lru, page_zonenum(page), -compound_nr(page)); } /** diff --git a/mm/filemap.c b/mm/filemap.c index 8eca91547e40..b07ef9469861 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -196,7 +196,7 @@ static void unaccount_page_cache_page(struct address_space *mapping, if (PageHuge(page)) return; - nr = hpage_nr_pages(page); + nr = compound_nr(page); __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr); if (PageSwapBacked(page)) { diff --git a/mm/gup.c b/mm/gup.c index 60c3915c8ee6..579dc9426b87 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1469,7 +1469,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, mod_node_page_state(page_pgdat(head), NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); + compound_nr(head)); } } } diff --git a/mm/internal.h b/mm/internal.h index e32390802fd3..abe3a15b456c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -327,7 +327,7 @@ extern void clear_page_mlock(struct page *page); static inline void mlock_migrate_page(struct page *newpage, struct page *page) { if (TestClearPageMlocked(page)) { - int nr_pages = hpage_nr_pages(page); + int nr_pages = compound_nr(page); /* Holding pmd lock, no change in irq context: __mod is safe */ __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); @@ -354,7 +354,7 @@ vma_address(struct page *page, struct vm_area_struct *vma) unsigned long start, end; start = __vma_address(page, vma); - end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1); + end = start + page_size(page) - 1; /* page should be within @vma mapping range */ VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2156ef775d04..9d457684a731 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5406,7 +5406,7 @@ static int mem_cgroup_move_account(struct page *page, struct mem_cgroup *to) { unsigned long flags; - unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1; + unsigned int nr_pages = compound ? compound_nr(page) : 1; int ret; bool anon; @@ -6447,7 +6447,7 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm, bool compound) { struct mem_cgroup *memcg = NULL; - unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1; + unsigned int nr_pages = compound ? compound_nr(page) : 1; int ret = 0; if (mem_cgroup_disabled()) @@ -6521,7 +6521,7 @@ int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm, void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, bool lrucare, bool compound) { - unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1; + unsigned int nr_pages = compound ? compound_nr(page) : 1; VM_BUG_ON_PAGE(!page->mapping, page); VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page); @@ -6565,7 +6565,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg, bool compound) { - unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1; + unsigned int nr_pages = compound ? compound_nr(page) : 1; if (mem_cgroup_disabled()) return; @@ -6772,7 +6772,7 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage) /* Force-charge the new page. The old one will be freed soon */ compound = PageTransHuge(newpage); - nr_pages = compound ? hpage_nr_pages(newpage) : 1; + nr_pages = compound ? 
compound_nr(newpage) : 1; page_counter_charge(&memcg->memory, nr_pages); if (do_memsw_account()) @@ -6995,7 +6995,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) * ancestor for the swap instead and transfer the memory+swap charge. */ swap_memcg = mem_cgroup_id_get_online(memcg); - nr_entries = hpage_nr_pages(page); + nr_entries = compound_nr(page); /* Get references for the tail pages, too */ if (nr_entries > 1) mem_cgroup_id_get_many(swap_memcg, nr_entries - 1); @@ -7041,7 +7041,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) */ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) { - unsigned int nr_pages = hpage_nr_pages(page); + unsigned int nr_pages = compound_nr(page); struct page_counter *counter; struct mem_cgroup *memcg; unsigned short oldid; diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index b1be791f772d..317478203d20 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1344,8 +1344,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) isolate_huge_page(head, &source); continue; } else if (PageTransHuge(page)) - pfn = page_to_pfn(compound_head(page)) - + hpage_nr_pages(page) - 1; + pfn = page_to_pfn(compound_head(page)) + + compound_nr(page) - 1; /* * HWPoison pages have elevated reference counts so the migration would diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 464406e8da91..586ba2adbfd2 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -978,7 +978,7 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist, list_add_tail(&head->lru, pagelist); mod_node_page_state(page_pgdat(head), NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); + compound_nr(head)); } else if (flags & MPOL_MF_STRICT) { /* * Non-movable page may reach here. And, there may be diff --git a/mm/migrate.c b/mm/migrate.c index 73d476d690b1..c3c9a3e70f07 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -191,8 +191,9 @@ void putback_movable_pages(struct list_head *l) unlock_page(page); put_page(page); } else { - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + page_is_file_cache(page), + -compound_nr(page)); putback_lru_page(page); } } @@ -381,7 +382,7 @@ static int expected_page_refs(struct address_space *mapping, struct page *page) */ expected_count += is_device_private_page(page); if (mapping) - expected_count += hpage_nr_pages(page) + page_has_private(page); + expected_count += compound_nr(page) + page_has_private(page); return expected_count; } @@ -436,7 +437,7 @@ int migrate_page_move_mapping(struct address_space *mapping, */ newpage->index = page->index; newpage->mapping = page->mapping; - page_ref_add(newpage, hpage_nr_pages(page)); /* add cache reference */ + page_ref_add(newpage, compound_nr(page)); /* add cache reference */ if (PageSwapBacked(page)) { __SetPageSwapBacked(newpage); if (PageSwapCache(page)) { @@ -469,7 +470,7 @@ int migrate_page_move_mapping(struct address_space *mapping, * to one less reference. * We know this isn't the last reference. 
*/ - page_ref_unfreeze(page, expected_count - hpage_nr_pages(page)); + page_ref_unfreeze(page, expected_count - compound_nr(page)); xas_unlock(&xas); /* Leave irq disabled to prevent preemption while updating stats */ @@ -579,7 +580,7 @@ static void copy_huge_page(struct page *dst, struct page *src) } else { /* thp page */ BUG_ON(!PageTransHuge(src)); - nr_pages = hpage_nr_pages(src); + nr_pages = compound_nr(src); } for (i = 0; i < nr_pages; i++) { @@ -1215,7 +1216,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, */ if (likely(!__PageMovable(page))) mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); + page_is_file_cache(page), -compound_nr(page)); } /* @@ -1571,7 +1572,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, list_add_tail(&head->lru, pagelist); mod_node_page_state(page_pgdat(head), NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); + compound_nr(head)); } out_putpage: /* @@ -1912,7 +1913,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) page_lru = page_is_file_cache(page); mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru, - hpage_nr_pages(page)); + compound_nr(page)); /* * Isolating the page has taken another reference, so the diff --git a/mm/mlock.c b/mm/mlock.c index a90099da4fb4..5567d55bf5e1 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -61,8 +61,7 @@ void clear_page_mlock(struct page *page) if (!TestClearPageMlocked(page)) return; - mod_zone_page_state(page_zone(page), NR_MLOCK, - -hpage_nr_pages(page)); + mod_zone_page_state(page_zone(page), NR_MLOCK, -compound_nr(page)); count_vm_event(UNEVICTABLE_PGCLEARED); /* * The previous TestClearPageMlocked() corresponds to the smp_mb() @@ -95,7 +94,7 @@ void mlock_vma_page(struct page *page) if (!TestSetPageMlocked(page)) { mod_zone_page_state(page_zone(page), NR_MLOCK, - hpage_nr_pages(page)); + compound_nr(page)); count_vm_event(UNEVICTABLE_PGMLOCKED); if (!isolate_lru_page(page)) putback_lru_page(page); @@ -192,7 +191,7 @@ unsigned int munlock_vma_page(struct page *page) /* * Serialize with any parallel __split_huge_page_refcount() which * might otherwise copy PageMlocked to part of the tail pages before - * we clear it in the head page. It also stabilizes hpage_nr_pages(). + * we clear it in the head page. It also stabilizes compound_nr(). 
*/ spin_lock_irq(&pgdat->lru_lock); @@ -202,7 +201,7 @@ unsigned int munlock_vma_page(struct page *page) goto unlock_out; } - nr_pages = hpage_nr_pages(page); + nr_pages = compound_nr(page); __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); if (__munlock_isolate_lru_page(page, true)) { diff --git a/mm/page_io.c b/mm/page_io.c index 24ee600f9131..965fcc5701f8 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -40,7 +40,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags, bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9; bio->bi_end_io = end_io; - bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0); + bio_add_page(bio, page, page_size(page), 0); } return bio; } @@ -271,7 +271,7 @@ static inline void count_swpout_vm_event(struct page *page) if (unlikely(PageTransHuge(page))) count_vm_event(THP_SWPOUT); #endif - count_vm_events(PSWPOUT, hpage_nr_pages(page)); + count_vm_events(PSWPOUT, compound_nr(page)); } int __swap_writepage(struct page *page, struct writeback_control *wbc, diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index eff4b4520c8d..dfca512c7b50 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -57,7 +57,7 @@ static inline bool pfn_in_hpage(struct page *hpage, unsigned long pfn) unsigned long hpage_pfn = page_to_pfn(hpage); /* THP can be referenced by any subpage */ - return pfn >= hpage_pfn && pfn - hpage_pfn < hpage_nr_pages(hpage); + return (pfn - hpage_pfn) < compound_nr(hpage); } /** @@ -223,7 +223,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->address >= pvmw->vma->vm_end || pvmw->address >= __vma_address(pvmw->page, pvmw->vma) + - hpage_nr_pages(pvmw->page) * PAGE_SIZE) + page_size(pvmw->page)) return not_found(pvmw); /* Did we cross page table boundary? */ if (pvmw->address % PMD_SIZE == 0) { @@ -264,7 +264,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma) unsigned long start, end; start = __vma_address(page, vma); - end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1); + end = start + page_size(page) - 1; if (unlikely(end < vma->vm_start || start >= vma->vm_end)) return 0; diff --git a/mm/rmap.c b/mm/rmap.c index d9a23bb773bf..2d857283fb41 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1112,7 +1112,7 @@ void do_page_add_anon_rmap(struct page *page, } if (first) { - int nr = compound ? hpage_nr_pages(page) : 1; + int nr = compound ? compound_nr(page) : 1; /* * We use the irq-unsafe __{inc|mod}_zone_page_stat because * these counters are not modified in interrupt context, and @@ -1150,7 +1150,7 @@ void do_page_add_anon_rmap(struct page *page, void page_add_new_anon_rmap(struct page *page, struct vm_area_struct *vma, unsigned long address, bool compound) { - int nr = compound ? hpage_nr_pages(page) : 1; + int nr = compound ? 
compound_nr(page) : 1; VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); __SetPageSwapBacked(page); @@ -1826,7 +1826,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, return; pgoff_start = page_to_pgoff(page); - pgoff_end = pgoff_start + hpage_nr_pages(page) - 1; + pgoff_end = pgoff_start + compound_nr(page) - 1; anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff_start, pgoff_end) { struct vm_area_struct *vma = avc->vma; @@ -1879,7 +1879,7 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, return; pgoff_start = page_to_pgoff(page); - pgoff_end = pgoff_start + hpage_nr_pages(page) - 1; + pgoff_end = pgoff_start + compound_nr(page) - 1; if (!locked) i_mmap_lock_read(mapping); vma_interval_tree_foreach(vma, &mapping->i_mmap, diff --git a/mm/swap.c b/mm/swap.c index 784dc1620620..25d8c43035a4 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -465,7 +465,7 @@ void lru_cache_add_active_or_unevictable(struct page *page, * lock is held(spinlock), which implies preemption disabled. */ __mod_zone_page_state(page_zone(page), NR_MLOCK, - hpage_nr_pages(page)); + compound_nr(page)); count_vm_event(UNEVICTABLE_PGMLOCKED); } lru_cache_add(page); @@ -558,7 +558,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, ClearPageSwapBacked(page); add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE); - __count_vm_events(PGLAZYFREE, hpage_nr_pages(page)); + __count_vm_events(PGLAZYFREE, compound_nr(page)); count_memcg_page_event(page, PGLAZYFREE); update_page_reclaim_stat(lruvec, 1, 0); } diff --git a/mm/swap_state.c b/mm/swap_state.c index 8e7ce9a9bc5e..51d8884a693a 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -158,7 +158,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp) void __delete_from_swap_cache(struct page *page, swp_entry_t entry) { struct address_space *address_space = swap_address_space(entry); - int i, nr = hpage_nr_pages(page); + int i, nr = compound_nr(page); pgoff_t idx = swp_offset(entry); XA_STATE(xas, &address_space->i_pages, idx); @@ -251,7 +251,7 @@ void delete_from_swap_cache(struct page *page) xa_unlock_irq(&address_space->i_pages); put_swap_page(page, entry); - page_ref_sub(page, hpage_nr_pages(page)); + page_ref_sub(page, compound_nr(page)); } /* diff --git a/mm/swapfile.c b/mm/swapfile.c index dab43523afdd..2dc7fbde7d9b 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1331,7 +1331,7 @@ void put_swap_page(struct page *page, swp_entry_t entry) unsigned char *map; unsigned int i, free_entries = 0; unsigned char val; - int size = swap_entry_size(hpage_nr_pages(page)); + int size = swap_entry_size(compound_nr(page)); si = _swap_info_get(entry); if (!si) diff --git a/mm/vmscan.c b/mm/vmscan.c index 4911754c93b7..a7f9f379e523 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1901,7 +1901,7 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, SetPageLRU(page); lru = page_lru(page); - nr_pages = hpage_nr_pages(page); + nr_pages = compound_nr(page); update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); list_move(&page->lru, &lruvec->lists[lru]); @@ -2095,7 +2095,7 @@ static void shrink_active_list(unsigned long nr_to_scan, if (page_referenced(page, 0, sc->target_mem_cgroup, &vm_flags)) { - nr_rotated += hpage_nr_pages(page); + nr_rotated += compound_nr(page); /* * Identify referenced, file-backed active pages and * give them one more trip around the active list. 
So From patchwork Wed Sep 25 00:52:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 11159869 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 64152912 for ; Wed, 25 Sep 2019 00:53:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 36FDF21655 for ; Wed, 25 Sep 2019 00:53:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="KBSXnigF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2403905AbfIYAwT (ORCPT ); Tue, 24 Sep 2019 20:52:19 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:56880 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391712AbfIYAwS (ORCPT ); Tue, 24 Sep 2019 20:52:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From :Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help: List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=42ILn7VMcsrJTz7O5GIB2CEUGt2gPkfqKrZQV8C0eas=; b=KBSXnigFVablb8KQbZq/Yf0TKY A8I7AtrNOd1ysd54G0tMobsxGWAqgz4kRXOciO7kJgO1ccogPjiHPU2VUJu1bEVmw0BL9mTF9GAYI dop6v15ZkD9BQD5mF1RWDh62GDWB/rz9lsS0rxjw97BaE+0YlTIqNznzf19FhudztuATJyttfvoJP wJQTLB+YhNze9mVsWig0hXJCk914UIj/2mzniOFv4OCttoi8RdnnPQ7KJ9eeQfhu5zO9A9sTBRpCF GvLPTgpX3i0DBp4/aNjAgHuxrglwKrasFt7x+z18I4EftTVwJYEh3wE3q+hWX6BMZ4U0ffoGQVvNG OAkd0bFg==; Received: from willy by bombadil.infradead.org with local (Exim 4.92.2 #3 (Red Hat Linux)) id 1iCvXV-00077C-Nq; Wed, 25 Sep 2019 00:52:17 +0000 From: Matthew Wilcox To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: William Kucharski , Matthew Wilcox Subject: [PATCH 12/15] mm: Support removing arbitrary sized pages from mapping Date: Tue, 24 Sep 2019 17:52:11 -0700 Message-Id: <20190925005214.27240-13-willy@infradead.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190925005214.27240-1-willy@infradead.org> References: <20190925005214.27240-1-willy@infradead.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: William Kucharski __remove_mapping() assumes that pages can only be either base pages or HPAGE_PMD_SIZE. Ask the page what size it is. Signed-off-by: William Kucharski Signed-off-by: Matthew Wilcox (Oracle) --- mm/vmscan.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index a7f9f379e523..9f44868e640b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -932,10 +932,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page, * Note that if SetPageDirty is always performed via set_page_dirty, * and thus under the i_pages lock, then this ordering is not required. 
*/ - if (unlikely(PageTransHuge(page)) && PageSwapCache(page)) - refcount = 1 + HPAGE_PMD_NR; - else - refcount = 2; + refcount = 1 + compound_nr(page); if (!page_ref_freeze(page, refcount)) goto cannot_free; /* note: atomic_cmpxchg in page_ref_freeze provides the smp_rmb */ From patchwork Wed Sep 25 00:52:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 11159865 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DA7CC1709 for ; Wed, 25 Sep 2019 00:53:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A694221655 for ; Wed, 25 Sep 2019 00:53:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Lkn3NuqI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2411259AbfIYAwW (ORCPT ); Tue, 24 Sep 2019 20:52:22 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:56904 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404204AbfIYAwV (ORCPT ); Tue, 24 Sep 2019 20:52:21 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From :Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help: List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=0lJN/0sM0LXHeuSCNK/VyRj6hx2s9f2LfDmenH+FoVc=; b=Lkn3NuqIyAirOrFb1og7lnpDKD 1C4EQL9P7Um54NNjITxEbJxxHma2imx4dglv7/9NfZ/Iorpdu9NczTtVPc6w+cgFu94qryssi3zI2 dw5WRFPoq/+M0aV4JmiYi6ltKgI9izTyR7qF2hOScD4x5vPGaoI8G+r4c4lAEFkqDGITEm3DZY+8j IvjywdVjiiuoIhs+odCb67vKhZgVhvItF0FsvrR4zKN6m5bWkoKc/M4iBrvShbLNvlMdrF6C/sWys StvA7qAZFkgW21pirNSms0aUO7suhb8qsqcEHAIq5tbO7KVkWXDsXkJOzqmH3yXCvH0Ng1mevtv3W zGLAJK8w==; Received: from willy by bombadil.infradead.org with local (Exim 4.92.2 #3 (Red Hat Linux)) id 1iCvXV-00077J-RN; Wed, 25 Sep 2019 00:52:17 +0000 From: Matthew Wilcox To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: William Kucharski , Matthew Wilcox Subject: [PATCH 13/15] mm: Add a huge page fault handler for files Date: Tue, 24 Sep 2019 17:52:12 -0700 Message-Id: <20190925005214.27240-14-willy@infradead.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190925005214.27240-1-willy@infradead.org> References: <20190925005214.27240-1-willy@infradead.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: William Kucharski Add filemap_huge_fault() to attempt to satisfy page faults on memory-mapped read-only text pages using THP when possible. 
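For context, a filesystem that wants these faults would hook the new handler into its vm_operations_struct at mmap time, next to the existing generic handlers. A minimal sketch (the ops structure name here is illustrative, not part of this series):

	static const struct vm_operations_struct example_file_vm_ops = {
		.fault		= filemap_fault,
		.huge_fault	= filemap_huge_fault,	/* added by this patch */
		.map_pages	= filemap_map_pages,
		.page_mkwrite	= filemap_page_mkwrite,
	};

The last patch in this series does the equivalent for XFS inside __xfs_filemap_fault() rather than through a static ops table.
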
Signed-off-by: William Kucharski [rebased on top of mm prep patches -- Matthew] Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 10 +++ include/linux/pagemap.h | 8 ++ mm/filemap.c | 165 ++++++++++++++++++++++++++++++++++++++-- 3 files changed, 178 insertions(+), 5 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 04bea9f9282c..623878f11eaf 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2414,6 +2414,16 @@ extern void truncate_inode_pages_final(struct address_space *); /* generic vm_area_ops exported for stackable file systems */ extern vm_fault_t filemap_fault(struct vm_fault *vmf); +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +extern vm_fault_t filemap_huge_fault(struct vm_fault *vmf, + enum page_entry_size pe_size); +#else +static inline vm_fault_t filemap_huge_fault(struct vm_fault *vmf, + enum page_entry_size pe_size) +{ + return VM_FAULT_FALLBACK; +} +#endif extern void filemap_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff); extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index d6d97f9fb762..ae09788f5345 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -354,6 +354,14 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping, mapping_gfp_mask(mapping)); } +/* This (head) page should be found at this offset in the page cache */ +static inline void page_cache_assert(struct page *page, pgoff_t offset) +{ + VM_BUG_ON_PAGE(PageTail(page), page); + VM_BUG_ON_PAGE(page->index != (offset & ~(compound_nr(page) - 1)), + page); +} + static inline struct page *find_subpage(struct page *page, pgoff_t offset) { if (PageHuge(page)) diff --git a/mm/filemap.c b/mm/filemap.c index b07ef9469861..8017e905df7a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1590,7 +1590,8 @@ static bool pagecache_is_conflict(struct page *page) * * Looks up the page cache entries at @mapping between @offset and * @offset + 2^@order. If there is a page cache page, it is returned with - * an increased refcount unless it is smaller than @order. + * an increased refcount unless it is smaller than @order. This function + * returns the head page, not a tail page. * * If the slot holds a shadow entry of a previously evicted page, or a * swap entry from shmem/tmpfs, it is returned. 
@@ -1601,7 +1602,7 @@ static bool pagecache_is_conflict(struct page *page) static struct page *__find_get_page(struct address_space *mapping, unsigned long offset, unsigned int order) { - XA_STATE(xas, &mapping->i_pages, offset); + XA_STATE(xas, &mapping->i_pages, offset & ~((1UL << order) - 1)); struct page *page; rcu_read_lock(); @@ -1635,7 +1636,6 @@ static struct page *__find_get_page(struct address_space *mapping, put_page(page); goto repeat; } - page = find_subpage(page, offset); out: rcu_read_unlock(); @@ -1741,11 +1741,12 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, put_page(page); goto repeat; } - VM_BUG_ON_PAGE(page->index != offset, page); + page_cache_assert(page, offset); } if (fgp_flags & FGP_ACCESSED) mark_page_accessed(page); + page = find_subpage(page, offset); no_page: if (!page && (fgp_flags & FGP_CREAT)) { @@ -2638,7 +2639,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) put_page(page); goto retry_find; } - VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page); + page_cache_assert(page, offset); /* * We have a locked page in the page cache, now we need to check @@ -2711,6 +2712,160 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) } EXPORT_SYMBOL(filemap_fault); +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +/** + * filemap_huge_fault - Read in file data for page fault handling. + * @vmf: struct vm_fault containing details of the fault. + * @pe_size: Page entry size. + * + * filemap_huge_fault() is invoked via the vma operations vector for a + * mapped memory region to read in file data during a page fault. + * + * The goto's are kind of ugly, but this streamlines the normal case of having + * it in the page cache, and handles the special cases reasonably without + * having a lot of duplicated code. + * + * vma->vm_mm->mmap_sem must be held on entry. + * + * If our return value has VM_FAULT_RETRY set, it's because the mmap_sem + * may be dropped before doing I/O or by lock_page_maybe_drop_mmap(). + * + * If our return value does not have VM_FAULT_RETRY set, the mmap_sem + * has not been released. + * + * We never return with VM_FAULT_RETRY and a bit from VM_FAULT_ERROR set. + * + * Return: bitwise-OR of %VM_FAULT_ codes. + */ +vm_fault_t filemap_huge_fault(struct vm_fault *vmf, + enum page_entry_size pe_size) +{ + int error; + struct vm_area_struct *vma = vmf->vma; + struct file *file = vma->vm_file; + struct file *fpin = NULL; + struct address_space *mapping = file->f_mapping; + struct inode *inode = mapping->host; + pgoff_t offset = vmf->pgoff; + pgoff_t max_off; + struct page *page; + vm_fault_t ret = 0; + + if (pe_size != PE_SIZE_PMD) + return VM_FAULT_FALLBACK; + /* Read-only mappings for now */ + if (vmf->flags & FAULT_FLAG_WRITE) + return VM_FAULT_FALLBACK; + if (vma->vm_start & ~HPAGE_PMD_MASK) + return VM_FAULT_FALLBACK; + /* Don't allocate a huge page for the tail of the file (?) */ + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); + if (unlikely((offset | (HPAGE_PMD_NR - 1)) >= max_off)) + return VM_FAULT_FALLBACK; + + /* + * Do we have something in the page cache already? + */ + page = __find_get_page(mapping, offset, HPAGE_PMD_ORDER); + if (likely(page)) { + if (pagecache_is_conflict(page)) + return VM_FAULT_FALLBACK; + /* Readahead the next huge page here? 
*/ + page = find_subpage(page, offset & ~(HPAGE_PMD_NR - 1)); + } else { + /* No page in the page cache at all */ + count_vm_event(PGMAJFAULT); + count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); + ret = VM_FAULT_MAJOR; +retry_find: + page = pagecache_get_page(mapping, offset, + FGP_CREAT | FGP_FOR_MMAP | FGP_PMD, + vmf->gfp_mask | + __GFP_NOWARN | __GFP_NORETRY); + if (!page) + return VM_FAULT_FALLBACK; + } + + if (!lock_page_maybe_drop_mmap(vmf, page, &fpin)) + goto out_retry; + + /* Did it get truncated? */ + if (unlikely(page->mapping != mapping)) { + unlock_page(page); + put_page(page); + goto retry_find; + } + VM_BUG_ON_PAGE(page_to_index(page) != offset, page); + + /* + * We have a locked page in the page cache, now we need to check + * that it's up-to-date. Because we don't readahead in huge_fault, + * this may or may not be due to an error. + */ + if (!PageUptodate(page)) + goto page_not_uptodate; + + /* + * We've made it this far and we had to drop our mmap_sem, now is the + * time to return to the upper layer and have it re-find the vma and + * redo the fault. + */ + if (fpin) { + unlock_page(page); + goto out_retry; + } + + /* + * Found the page and have a reference on it. + * We must recheck i_size under page lock. + */ + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); + if (unlikely(offset >= max_off)) { + unlock_page(page); + put_page(page); + return VM_FAULT_SIGBUS; + } + + ret |= alloc_set_pte(vmf, NULL, page); + unlock_page(page); + if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY))) + put_page(page); + return ret; + +page_not_uptodate: + ClearPageError(page); + fpin = maybe_unlock_mmap_for_io(vmf, fpin); + error = mapping->a_ops->readpage(file, page); + if (!error) { + wait_on_page_locked(page); + if (!PageUptodate(page)) + error = -EIO; + } + if (fpin) + goto out_retry; + put_page(page); + + if (!error || error == AOP_TRUNCATED_PAGE) + goto retry_find; + + /* Things didn't work out */ + return VM_FAULT_SIGBUS; + +out_retry: + /* + * We dropped the mmap_sem, we need to return to the fault handler to + * re-find the vma and come back and find our hopefully still populated + * page. 
+ */ + if (page) + put_page(page); + if (fpin) + fput(fpin); + return ret | VM_FAULT_RETRY; +} +EXPORT_SYMBOL(filemap_huge_fault); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + void filemap_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff) { From patchwork Wed Sep 25 00:52:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 11159819 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D428617EE for ; Wed, 25 Sep 2019 00:52:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B211220874 for ; Wed, 25 Sep 2019 00:52:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="b1WCC3Fi" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2403842AbfIYAwT (ORCPT ); Tue, 24 Sep 2019 20:52:19 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:56884 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391736AbfIYAwS (ORCPT ); Tue, 24 Sep 2019 20:52:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From :Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help: List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=gfc79DzYRzAqx05Od1Rvplr/Hq8No1idlzxjKEIY6m4=; b=b1WCC3Fic73cDDRDtJp3WWeVnN 97AuG5Jt9tVQDpbibgk2a9TQRVt2ouezcRrIJX2ZCAqd3ziWD/FjFZAXCQDiSxslqN0yHGMaaUxiv RWklDZNuzNQN9LRk8bpbSeuNDy/pLNKD4pULWXu8xQvbe6sN3YmVhKNhctDwiiAXezlrHBufNBXlG iC6BiWo3AWcYPKyjKz6x8Nh2d47BuZAr8/YyS2HSqrWUfSAzGMTgxxiTVDAqta7T7LDMXTWonX2CP fOKvyYwjlEStSmDVxuFiZdzjp1Pg5Hx9+21Gqnt/Omnnt7pn2AzYsMg8fniitqbf4N6F98oLX+Egb xW0lI/6w==; Received: from willy by bombadil.infradead.org with local (Exim 4.92.2 #3 (Red Hat Linux)) id 1iCvXV-00077N-Va; Wed, 25 Sep 2019 00:52:17 +0000 From: Matthew Wilcox To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: William Kucharski , Matthew Wilcox Subject: [PATCH 14/15] mm: Align THP mappings for non-DAX Date: Tue, 24 Sep 2019 17:52:13 -0700 Message-Id: <20190925005214.27240-15-willy@infradead.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190925005214.27240-1-willy@infradead.org> References: <20190925005214.27240-1-willy@infradead.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: William Kucharski When we have the opportunity to use transparent huge pages to map a file, we want to follow the same rules as DAX. 
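The alignment matters because a PMD mapping is only possible when the virtual address and the file offset agree modulo PMD_SIZE. A rough sketch of the arithmetic __thp_get_unmapped_area() performs once this check is gone (overflow and error handling omitted; not a verbatim copy of that function):

	/*
	 * Pad the search window by one PMD, then slide the returned
	 * address forward so that addr and the file offset are
	 * congruent modulo PMD_SIZE.
	 */
	addr = current->mm->get_unmapped_area(filp, 0, len + PMD_SIZE,
					      off >> PAGE_SHIFT, flags);
	addr += (off - addr) & (PMD_SIZE - 1);
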
Signed-off-by: William Kucharski Signed-off-by: Matthew Wilcox (Oracle) --- mm/huge_memory.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index cbe7d0619439..670a1780bd2f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -563,8 +563,6 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, if (addr) goto out; - if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) - goto out; addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE); if (addr) From patchwork Wed Sep 25 00:52:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 11159873 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 47BC01709 for ; Wed, 25 Sep 2019 00:53:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1D01C21783 for ; Wed, 25 Sep 2019 00:53:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="bQLt4tYK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392556AbfIYAwT (ORCPT ); Tue, 24 Sep 2019 20:52:19 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:56888 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391738AbfIYAwS (ORCPT ); Tue, 24 Sep 2019 20:52:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From :Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help: List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=5PLjU7Lobfecd7nUe18LQeptBQVP1uzg6vaKwytBhAY=; b=bQLt4tYKFfbdCslX+iJsC68nvC hqFYBs6pZfNJEIfkUqkUd9cB+0IO+KHSyaEWSPs+DOCDiI31/qEtQ5QH4N+XYvdujhobGoR8jPrjS rIVmX0rCcroUfXV2BYhTnf2hjOe0IHpAozn56Pl4dVXUeOpNrbYYawdiSHTC0SPLTPrHXfvmbmt/g p+/rUPCwPwGncy+1fYolF97BZqaSEViNQQtj+qSPuOGzERdSnkC0ThVBCCYv3sFPsWP13SaMU/AXO 2wpKx035Y45cSpEy0ijiT5k4yOI4uesHAxZZE0kejzkFTFKmrF3tLLauy9RtJ7GzgRaVV1eyHAA5U KNopRiXA==; Received: from willy by bombadil.infradead.org with local (Exim 4.92.2 #3 (Red Hat Linux)) id 1iCvXW-00077V-1g; Wed, 25 Sep 2019 00:52:18 +0000 From: Matthew Wilcox To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" Subject: [PATCH 15/15] xfs: Use filemap_huge_fault Date: Tue, 24 Sep 2019 17:52:14 -0700 Message-Id: <20190925005214.27240-16-willy@infradead.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190925005214.27240-1-willy@infradead.org> References: <20190925005214.27240-1-willy@infradead.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: "Matthew Wilcox (Oracle)" Signed-off-by: Matthew Wilcox (Oracle) --- fs/xfs/xfs_file.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index d952d5962e93..9445196f8056 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1156,6 +1156,8 @@ __xfs_filemap_fault( } else { if (write_fault) ret = 
iomap_page_mkwrite(vmf, &xfs_iomap_ops); + else if (pe_size) + ret = filemap_huge_fault(vmf, pe_size); else ret = filemap_fault(vmf); } @@ -1181,9 +1183,6 @@ xfs_filemap_huge_fault( struct vm_fault *vmf, enum page_entry_size pe_size) { - if (!IS_DAX(file_inode(vmf->vma->vm_file))) - return VM_FAULT_FALLBACK; - /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, pe_size, (vmf->flags & FAULT_FLAG_WRITE));
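

With the whole series applied, a caller could in principle also ask the page cache for a PMD-sized page directly. A hypothetical use of the FGP_PMD interface from patch 10 (the helper name and gfp choice are illustrative only):

	/* Illustrative only: find or create the PMD-sized page covering @index. */
	static struct page *example_get_pmd_page(struct address_space *mapping,
			pgoff_t index)
	{
		return pagecache_get_page(mapping, index & ~(HPAGE_PMD_NR - 1),
				FGP_CREAT | FGP_LOCK | FGP_PMD,
				mapping_gfp_mask(mapping) | __GFP_NOWARN);
	}

This returns NULL both on allocation failure and when a smaller page already occupies part of the range, per the FGP_PMD semantics described in patch 10.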