From patchwork Fri Nov 4 17:08:11 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 9412935 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1049B60585 for ; Fri, 4 Nov 2016 17:08:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F04A42B129 for ; Fri, 4 Nov 2016 17:08:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E52FC2B143; Fri, 4 Nov 2016 17:08:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4B2982B129 for ; Fri, 4 Nov 2016 17:08:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756811AbcKDRIl (ORCPT ); Fri, 4 Nov 2016 13:08:41 -0400 Received: from mx2.suse.de ([195.135.220.15]:35642 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S936491AbcKDRI1 (ORCPT ); Fri, 4 Nov 2016 13:08:27 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id AFD2AAD70; Fri, 4 Nov 2016 17:08:25 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 521991E079C; Fri, 4 Nov 2016 18:08:23 +0100 (CET) From: Jan Kara To: Jens Axboe Cc: Christoph Hellwig , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, ocfs2-devel@oss.oracle.com, Evgeniy Dushistov , linux-ntfs-dev@lists.sourceforge.net, Jan Kara Subject: [PATCH 1/6] fs: Provide function to unmap metadata for a range of blocks Date: Fri, 4 Nov 2016 18:08:11 +0100 Message-Id: <1478279296-6575-2-git-send-email-jack@suse.cz> X-Mailer: git-send-email 2.6.6 In-Reply-To: <1478279296-6575-1-git-send-email-jack@suse.cz> References: <1478279296-6575-1-git-send-email-jack@suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Provide function equivalent to unmap_underlying_metadata() for a range of blocks. We somewhat optimize the function to use pagevec lookups instead of looking up buffer heads one by one and use page lock to pin buffer heads instead of mapping's private_lock to improve scalability. Signed-off-by: Jan Kara --- fs/buffer.c | 76 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/buffer_head.h | 2 ++ 2 files changed, 78 insertions(+) diff --git a/fs/buffer.c b/fs/buffer.c index b205a629001d..05f30838cec3 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -43,6 +43,7 @@ #include #include #include +#include #include static int fsync_buffers_list(spinlock_t *lock, struct list_head *list); @@ -1636,6 +1637,81 @@ void unmap_underlying_metadata(struct block_device *bdev, sector_t block) } EXPORT_SYMBOL(unmap_underlying_metadata); +/** + * clean_bdev_aliases: clean a range of buffers in block device + * @bdev: Block device to clean buffers in + * @block: Start of a range of blocks to clean + * @len: Number of blocks to clean + * + * We are taking a range of blocks for data and we don't want writeback of any + * buffer-cache aliases starting from return from this function and until the + * moment when something will explicitly mark the buffer dirty (hopefully that + * will not happen until we will free that block ;-) We don't even need to mark + * it not-uptodate - nobody can expect anything from a newly allocated buffer + * anyway. We used to used unmap_buffer() for such invalidation, but that was + * wrong. We definitely don't want to mark the alias unmapped, for example - it + * would confuse anyone who might pick it with bread() afterwards... + * + * Also.. Note that bforget() doesn't lock the buffer. So there can be + * writeout I/O going on against recently-freed buffers. We don't wait on that + * I/O in bforget() - it's more efficient to wait on the I/O only if we really + * need to. That happens here. + */ +void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len) +{ + struct inode *bd_inode = bdev->bd_inode; + struct address_space *bd_mapping = bd_inode->i_mapping; + struct pagevec pvec; + pgoff_t index = block >> (PAGE_SHIFT - bd_inode->i_blkbits); + pgoff_t end; + int i; + struct buffer_head *bh; + struct buffer_head *head; + + end = (block + len - 1) >> (PAGE_SHIFT - bd_inode->i_blkbits); + pagevec_init(&pvec, 0); + while (index <= end && pagevec_lookup(&pvec, bd_mapping, index, + min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) { + for (i = 0; i < pagevec_count(&pvec); i++) { + struct page *page = pvec.pages[i]; + + index = page->index; + if (index > end) + break; + if (!page_has_buffers(page)) + continue; + /* + * We use page lock instead of bd_mapping->private_lock + * to pin buffers here since we can afford to sleep and + * it scales better than a global spinlock lock. + */ + lock_page(page); + /* Recheck when the page is locked which pins bhs */ + if (!page_has_buffers(page)) + goto unlock_page; + head = page_buffers(page); + bh = head; + do { + if (!buffer_mapped(bh)) + goto next; + if (bh->b_blocknr >= block + len) + break; + clear_buffer_dirty(bh); + wait_on_buffer(bh); + clear_buffer_req(bh); +next: + bh = bh->b_this_page; + } while (bh != head); +unlock_page: + unlock_page(page); + } + pagevec_release(&pvec); + cond_resched(); + index++; + } +} +EXPORT_SYMBOL(clean_bdev_aliases); + /* * Size is a power-of-two in the range 512..PAGE_SIZE, * and the case we care about most is PAGE_SIZE. diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index ebbacd14d450..9c9c73ce7d4f 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -169,6 +169,8 @@ void invalidate_inode_buffers(struct inode *); int remove_inode_buffers(struct inode *inode); int sync_mapping_buffers(struct address_space *mapping); void unmap_underlying_metadata(struct block_device *bdev, sector_t block); +void clean_bdev_aliases(struct block_device *bdev, sector_t block, + sector_t len); void mark_buffer_async_write(struct buffer_head *bh); void __wait_on_buffer(struct buffer_head *);