From patchwork Mon Aug 24 07:59:57 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11732297
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 1/3] btrfs: change how we calculate the nrptrs for btrfs_buffered_write()
Date: Mon, 24 Aug 2020 15:59:57 +0800
Message-Id: <20200824075959.85212-2-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200824075959.85212-1-wqu@suse.com>
References: <20200824075959.85212-1-wqu@suse.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

@nrptrs of btrfs_buffered_write() determines the upper limit of pages we
can process in one batch. Normally we want it to be as large as possible,
as btrfs metadata/data reservation and release can cost quite a lot of
time.

Commit 142349f541d0 ("btrfs: lower the dirty balance poll interval")
introduced two extra limitations which look suspicious now:

- Limit the page number to nr_dirtied_pause - nr_dirtied
  I can't find any mainline fs nor iomap utilizing these two members.
  Page writeback still uses them, but since no other fs relies on them
  for this purpose, their usefulness here is doubtful.

- Ensure we always have at least 8 pages allocated
  The lower limit of 8 looks pretty strange: even if we're only writing
  4K, we will allocate page pointers for 8 pages no matter what. To me,
  8 pages looks more like an upper limit.

This patch changes the calculation by:

- Extracting it into a separate function
  This allows us to add comments explaining each step.

- Doing a proper page alignment calculation
  The old calculation, DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
  doesn't take @pos into consideration. We can easily have
  iov_iter_count(i) == 2 but still cross two pages
  (pos == page_offset() + PAGE_SIZE - 1).

- Removing the useless max(8)

- Using a PAGE_SIZE independent upper limit
  We now use 64K as the nr_pages limit, so we should get similar
  buffered write performance between different arches.
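To illustrate the alignment point, here is a minimal userspace sketch (an
illustration only, not kernel code; PAGE_SIZE, round_up(), round_down()
and DIV_ROUND_UP() below are simplified stand-ins for the kernel macros,
and the values are chosen to hit the corner case described above):

/* Userspace illustration of the page-count math (not kernel code). */
#include <stdio.h>

#define PAGE_SIZE	4096UL
#define round_down(x, a)	((x) & ~((a) - 1))
#define round_up(x, a)		round_down((x) + (a) - 1, (a))
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long pos = PAGE_SIZE - 1;	/* last byte of page 0 */
	unsigned long count = 2;		/* write crosses into page 1 */

	/* Old formula: only looks at the byte count, reports 1 page. */
	unsigned long old_nr = DIV_ROUND_UP(count, PAGE_SIZE);

	/* Pos-aware formula: reports the 2 pages actually touched. */
	unsigned long new_nr = (round_up(pos + count, PAGE_SIZE) -
				round_down(pos, PAGE_SIZE)) / PAGE_SIZE;

	printf("old: %lu page(s), new: %lu page(s)\n", old_nr, new_nr);
	return 0;
}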
Signed-off-by: Qu Wenruo
---
 fs/btrfs/file.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5a818ebcb01f..c592350a5a82 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1620,6 +1620,29 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
 	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
 }
 
+/* Helper to get how many pages we should alloc for the batch */
+static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos,
+			struct iov_iter *iov)
+{
+	int nr_pages;
+
+	/*
+	 * Try to cover the full iov range, as btrfs metadata/data reserve
+	 * and release can be pretty slow, thus the more pages we process in
+	 * one batch the better.
+	 */
+	nr_pages = (round_up(pos + iov_iter_count(iov), PAGE_SIZE) -
+		    round_down(pos, PAGE_SIZE)) / PAGE_SIZE;
+
+	/*
+	 * But still limit it to 64KiB, so we can still get a similar
+	 * buffered write performance between different page sizes
+	 */
+	nr_pages = min_t(int, nr_pages, SZ_64K / PAGE_SIZE);
+
+	return nr_pages;
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					     struct iov_iter *i)
 {
@@ -1638,10 +1661,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	bool only_release_metadata = false;
 	bool force_page_uptodate = false;
 
-	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
-		     PAGE_SIZE / (sizeof(struct page *)));
-	nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
-	nrptrs = max(nrptrs, 8);
+	nrptrs = get_nr_pages(fs_info, pos, i);
 	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
 	if (!pages)
 		return -ENOMEM;

From patchwork Mon Aug 24 07:59:58 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11732299
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 2/3] btrfs: refactor btrfs_buffered_write() into process_one_batch()
Date: Mon, 24 Aug 2020 15:59:58 +0800
Message-Id: <20200824075959.85212-3-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200824075959.85212-1-wqu@suse.com>
References: <20200824075959.85212-1-wqu@suse.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

Inside btrfs_buffered_write() we have a big chunk of code in the while()
loop. Refactor this big chunk into process_one_batch(), which does the
main job.

This refactor doesn't touch the functionality at all, it just makes life
easier by giving the code more headroom.
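As a rough illustration of the resulting shape (a simplified userspace
model, not the kernel code; one_batch_model(), BATCH_PAGES and the
buffers are made up for this example, and the real process_one_batch()
also handles reservation, extent locking and the short-copy fallback
shown in the diff below):

/* Userspace model of the loop structure after the refactor. */
#include <stdio.h>
#include <string.h>

#define BATCH_PAGES	16
#define PAGE_SZ		4096UL

/* Stand-in for process_one_batch(): copy up to nr_pages worth of data. */
static long one_batch_model(char *dst, const char *src, size_t left,
			    int nr_pages)
{
	size_t want = nr_pages * PAGE_SZ;
	size_t copy = left < want ? left : want;

	memcpy(dst, src, copy);
	return (long)copy;	/* 0 would signal a short copy */
}

int main(void)
{
	static char src[3 * PAGE_SZ + 123], dst[sizeof(src)];
	size_t left = sizeof(src), done = 0;
	int nr_pages = BATCH_PAGES;

	while (left > 0) {
		long copied = one_batch_model(dst + done, src + done, left,
					      nr_pages);
		if (copied < 0)
			break;			/* error path */
		if (copied == 0)
			nr_pages = 1;		/* fall back to page-by-page */
		else
			nr_pages = BATCH_PAGES;	/* back to full batches */
		done += copied;
		left -= copied;
	}
	printf("wrote %zu bytes in batches of up to %d pages\n",
	       done, BATCH_PAGES);
	return 0;
}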
Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 409 ++++++++++++++++++++++++++---------------------- 1 file changed, 218 insertions(+), 191 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index c592350a5a82..81d480b5218d 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1402,7 +1402,10 @@ static int prepare_uptodate_page(struct inode *inode, } /* - * this just gets pages into the page cache and locks them down. + * This just gets pages into the page cache and locks them down. + * + * @force_uptodate: Whether to force the first page uptodate. + * This happens after a short copy on non-uptodate page. */ static noinline int prepare_pages(struct inode *inode, struct page **pages, size_t num_pages, loff_t pos, @@ -1620,7 +1623,7 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode) btrfs_drew_write_unlock(&inode->root->snapshot_lock); } -/* Helper to get how many pages we should alloc for the batch */ +/* The helper to get how many pages we should alloc for the batch */ static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos, struct iov_iter *iov) { @@ -1643,226 +1646,193 @@ static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos, return nr_pages; } -static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, - struct iov_iter *i) +/* + * The helper to copy one batch of data from iov to pages. + * + * @pages: The page pointer array we're going to use. + * @nrptrs: The number of page pointers we can utilize + * @pos: The position of this write, in bytes + * @iov: The source of data. + * @force_uptodate: Do we need to force the first page uptodate. + * true if previous run we hit a short copy. + * + * Return >0 as the bytes we copied, with iov advanced. + * Return 0 if we hit a short copy, and should force the next run to have the + * first page uptodate. + * Return <0 for error. 
+ */ +static ssize_t process_one_batch(struct inode *inode, struct page **pages, + int nrptrs, loff_t pos, struct iov_iter *iov, + bool force_uptodate) { - struct file *file = iocb->ki_filp; - loff_t pos = iocb->ki_pos; - struct inode *inode = file_inode(file); - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct page **pages = NULL; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; + struct extent_state *cached_state = NULL; struct extent_changeset *data_reserved = NULL; + size_t offset = offset_in_page(pos); + size_t sector_offset; + size_t write_bytes = min(iov_iter_count(iov), + nrptrs * (size_t)PAGE_SIZE - offset); + size_t num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE); + size_t reserve_bytes; + size_t dirty_pages; + size_t copied = 0; + size_t dirty_sectors; + size_t num_sectors; + bool only_release_metadata = false; u64 release_bytes = 0; u64 lockstart; u64 lockend; - size_t num_written = 0; - int nrptrs; - int ret = 0; - bool only_release_metadata = false; - bool force_page_uptodate = false; + int extents_locked; + int ret; - nrptrs = get_nr_pages(fs_info, pos, i); - pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL); - if (!pages) - return -ENOMEM; + WARN_ON(num_pages > nrptrs); - while (iov_iter_count(i) > 0) { - struct extent_state *cached_state = NULL; - size_t offset = offset_in_page(pos); - size_t sector_offset; - size_t write_bytes = min(iov_iter_count(i), - nrptrs * (size_t)PAGE_SIZE - - offset); - size_t num_pages = DIV_ROUND_UP(write_bytes + offset, - PAGE_SIZE); - size_t reserve_bytes; - size_t dirty_pages; - size_t copied; - size_t dirty_sectors; - size_t num_sectors; - int extents_locked; - - WARN_ON(num_pages > nrptrs); - - /* - * Fault pages before locking them in prepare_pages - * to avoid recursive lock - */ - if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) { - ret = -EFAULT; - break; - } + /* + * Fault pages before locking them in prepare_pages to avoid recursive + * lock. + */ + if (unlikely(iov_iter_fault_in_readable(iov, write_bytes))) { + ret = -EFAULT; + goto out; + } - only_release_metadata = false; - sector_offset = pos & (fs_info->sectorsize - 1); - reserve_bytes = round_up(write_bytes + sector_offset, - fs_info->sectorsize); + sector_offset = pos & (fs_info->sectorsize - 1); + reserve_bytes = round_up(write_bytes + sector_offset, + fs_info->sectorsize); - extent_changeset_release(data_reserved); - ret = btrfs_check_data_free_space(BTRFS_I(inode), - &data_reserved, pos, - write_bytes); - if (ret < 0) { - if (btrfs_check_nocow_lock(BTRFS_I(inode), pos, - &write_bytes) > 0) { - /* - * For nodata cow case, no need to reserve - * data space. - */ - only_release_metadata = true; - /* - * our prealloc extent may be smaller than - * write_bytes, so scale down. - */ - num_pages = DIV_ROUND_UP(write_bytes + offset, - PAGE_SIZE); - reserve_bytes = round_up(write_bytes + - sector_offset, - fs_info->sectorsize); - } else { - break; - } + /* Reserve space for data and metadata */ + ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, pos, + write_bytes); + if (ret < 0) { + if (btrfs_check_nocow_lock(BTRFS_I(inode), pos, + &write_bytes) > 0) { + /* Nodata cow case, no need to reserve data space. */ + only_release_metadata = true; + /* + * our prealloc extent may be smaller than + * write_bytes, so scale down. 
+ */ + num_pages = DIV_ROUND_UP(write_bytes + offset, + PAGE_SIZE); + reserve_bytes = round_up(write_bytes + sector_offset, + fs_info->sectorsize); + } else { + goto out; } + } - WARN_ON(reserve_bytes == 0); - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), - reserve_bytes); - if (ret) { - if (!only_release_metadata) - btrfs_free_reserved_data_space(BTRFS_I(inode), - data_reserved, pos, - write_bytes); - else - btrfs_check_nocow_unlock(BTRFS_I(inode)); - break; - } + WARN_ON(reserve_bytes == 0); + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), reserve_bytes); + if (ret) { + if (!only_release_metadata) + btrfs_free_reserved_data_space(BTRFS_I(inode), + data_reserved, pos, write_bytes); + else + btrfs_check_nocow_unlock(BTRFS_I(inode)); + goto out; + } - release_bytes = reserve_bytes; -again: - /* - * This is going to setup the pages array with the number of - * pages we want, so we don't really need to worry about the - * contents of pages from loop to loop - */ - ret = prepare_pages(inode, pages, num_pages, - pos, write_bytes, - force_page_uptodate); - if (ret) { - btrfs_delalloc_release_extents(BTRFS_I(inode), - reserve_bytes); - break; - } + release_bytes = reserve_bytes; - extents_locked = lock_and_cleanup_extent_if_need( - BTRFS_I(inode), pages, - num_pages, pos, write_bytes, &lockstart, - &lockend, &cached_state); - if (extents_locked < 0) { - if (extents_locked == -EAGAIN) - goto again; - btrfs_delalloc_release_extents(BTRFS_I(inode), - reserve_bytes); - ret = extents_locked; - break; - } + /* Lock the pages and extent range */ +again: + /* + * This is going to setup the pages array with the number of pages we + * want, so we don't really need to worry about the contents of pages + * from loop to loop. + */ + ret = prepare_pages(inode, pages, num_pages, pos, write_bytes, + force_uptodate); + if (ret) { + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); + goto out; + } - copied = btrfs_copy_from_user(pos, write_bytes, pages, i); + extents_locked = lock_and_cleanup_extent_if_need(BTRFS_I(inode), pages, + num_pages, pos, write_bytes, &lockstart, + &lockend, &cached_state); + if (extents_locked < 0) { + if (extents_locked == -EAGAIN) + goto again; + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); + ret = extents_locked; + goto out; + } - num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes); - dirty_sectors = round_up(copied + sector_offset, - fs_info->sectorsize); - dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors); + /* Copy the data from iov to pages */ + copied = btrfs_copy_from_user(pos, write_bytes, pages, iov); - /* - * if we have trouble faulting in the pages, fall - * back to one page at a time - */ - if (copied < write_bytes) - nrptrs = 1; + num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes); + dirty_sectors = round_up(copied + sector_offset, fs_info->sectorsize); + dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors); - if (copied == 0) { - force_page_uptodate = true; - dirty_sectors = 0; - dirty_pages = 0; - } else { - force_page_uptodate = false; - dirty_pages = DIV_ROUND_UP(copied + offset, - PAGE_SIZE); - } - - if (num_sectors > dirty_sectors) { - /* release everything except the sectors we dirtied */ - release_bytes -= dirty_sectors << - fs_info->sb->s_blocksize_bits; - if (only_release_metadata) { - btrfs_delalloc_release_metadata(BTRFS_I(inode), - release_bytes, true); - } else { - u64 __pos; + if (copied == 0) { + dirty_sectors = 0; + dirty_pages = 0; + } else { + dirty_pages = DIV_ROUND_UP(copied + 
offset, PAGE_SIZE); + } - __pos = round_down(pos, - fs_info->sectorsize) + - (dirty_pages << PAGE_SHIFT); - btrfs_delalloc_release_space(BTRFS_I(inode), - data_reserved, __pos, + if (num_sectors > dirty_sectors) { + /* release everything except the sectors we dirtied */ + release_bytes -= dirty_sectors << fs_info->sb->s_blocksize_bits; + if (only_release_metadata) { + btrfs_delalloc_release_metadata(BTRFS_I(inode), release_bytes, true); - } - } - - release_bytes = round_up(copied + sector_offset, - fs_info->sectorsize); - - if (copied > 0) - ret = btrfs_dirty_pages(BTRFS_I(inode), pages, - dirty_pages, pos, copied, - &cached_state); - - /* - * If we have not locked the extent range, because the range's - * start offset is >= i_size, we might still have a non-NULL - * cached extent state, acquired while marking the extent range - * as delalloc through btrfs_dirty_pages(). Therefore free any - * possible cached extent state to avoid a memory leak. - */ - if (extents_locked) - unlock_extent_cached(&BTRFS_I(inode)->io_tree, - lockstart, lockend, &cached_state); - else - free_extent_state(cached_state); + } else { + u64 __pos; - btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); - if (ret) { - btrfs_drop_pages(pages, num_pages); - break; + __pos = round_down(pos, fs_info->sectorsize) + + (dirty_pages << PAGE_SHIFT); + btrfs_delalloc_release_space(BTRFS_I(inode), + data_reserved, __pos, + release_bytes, true); } + } - release_bytes = 0; - if (only_release_metadata) - btrfs_check_nocow_unlock(BTRFS_I(inode)); + release_bytes = round_up(copied + sector_offset, fs_info->sectorsize); - if (only_release_metadata && copied > 0) { - lockstart = round_down(pos, - fs_info->sectorsize); - lockend = round_up(pos + copied, - fs_info->sectorsize) - 1; + if (copied > 0) + ret = btrfs_dirty_pages(BTRFS_I(inode), pages, dirty_pages, pos, + copied, &cached_state); - set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, - lockend, EXTENT_NORESERVE, NULL, - NULL, GFP_NOFS); - } + /* + * If we have not locked the extent range, because the range's + * start offset is >= i_size, we might still have a non-NULL + * cached extent state, acquired while marking the extent range + * as delalloc through btrfs_dirty_pages(). Therefore free any + * possible cached extent state to avoid a memory leak. 
+ */ + if (extents_locked) + unlock_extent_cached(&BTRFS_I(inode)->io_tree, + lockstart, lockend, &cached_state); + else + free_extent_state(cached_state); + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); + if (ret) { btrfs_drop_pages(pages, num_pages); + goto out; + } - cond_resched(); + release_bytes = 0; + if (only_release_metadata) + btrfs_check_nocow_unlock(BTRFS_I(inode)); - balance_dirty_pages_ratelimited(inode->i_mapping); + if (only_release_metadata && copied > 0) { + lockstart = round_down(pos, fs_info->sectorsize); + lockend = round_up(pos + copied, fs_info->sectorsize) - 1; - pos += copied; - num_written += copied; + set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend, + EXTENT_NORESERVE, NULL, NULL, GFP_NOFS); } - kfree(pages); + btrfs_drop_pages(pages, num_pages); + + cond_resched(); + balance_dirty_pages_ratelimited(inode->i_mapping); +out: if (release_bytes) { if (only_release_metadata) { btrfs_check_nocow_unlock(BTRFS_I(inode)); @@ -1876,7 +1846,64 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, } } - extent_changeset_free(data_reserved); + if (!copied && ret < 0) + return ret; + return copied; +} + +static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, + struct iov_iter *i) +{ + struct file *file = iocb->ki_filp; + loff_t pos = iocb->ki_pos; + struct inode *inode = file_inode(file); + struct page **pages = NULL; + size_t num_written = 0; + int nrptrs; + int orig_nrptrs; + int ret = 0; + bool force_page_uptodate = false; + + orig_nrptrs = get_nr_pages(BTRFS_I(inode)->root->fs_info, pos, i); + nrptrs = orig_nrptrs; + pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL); + if (!pages) + return -ENOMEM; + + while (iov_iter_count(i) > 0) { + ssize_t copied; + + copied = process_one_batch(inode, pages, nrptrs, pos, i, + force_page_uptodate); + if (copied < 0) { + ret = copied; + break; + } + + if (copied == 0) { + /* + * We had a short copy on even the first page, need to + * force the next page uptodate and fall back to page + * by page pace. + */ + nrptrs = 1; + force_page_uptodate = true; + } else { + /* + * Copy finished without problem. No longer need to + * force next page uptodate, and revert to regular + * multi-page pace. + */ + nrptrs = orig_nrptrs; + force_page_uptodate = false; + } + + pos += copied; + num_written += copied; + } + + kfree(pages); + return num_written ? 
num_written : ret; }

From patchwork Mon Aug 24 07:59:59 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11732301
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 3/3] btrfs: remove the again: tag in process_one_batch()
Date: Mon, 24 Aug 2020 15:59:59 +0800
Message-Id: <20200824075959.85212-4-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200824075959.85212-1-wqu@suse.com>
References: <20200824075959.85212-1-wqu@suse.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

The again: tag here lets us retry when we fail to lock the extent range,
and it is used only once.

Instead of keeping the open-coded tag, integrate prepare_pages() and
lock_and_cleanup_extent_if_need() into lock_pages_and_extent(), and do
the retry inside that function.

Signed-off-by: Qu Wenruo
Reviewed-by: Josef Bacik
---
 fs/btrfs/file.c | 183 +++++++++++++++++++++++-------------------------
 1 file changed, 86 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 81d480b5218d..de829e42410b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1456,83 +1456,6 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 }
 
-/*
- * This function locks the extent and properly waits for data=ordered extents
- * to finish before allowing the pages to be modified if need.
- * - * The return value: - * 1 - the extent is locked - * 0 - the extent is not locked, and everything is OK - * -EAGAIN - need re-prepare the pages - * the other < 0 number - Something wrong happens - */ -static noinline int -lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages, - size_t num_pages, loff_t pos, - size_t write_bytes, - u64 *lockstart, u64 *lockend, - struct extent_state **cached_state) -{ - struct btrfs_fs_info *fs_info = inode->root->fs_info; - u64 start_pos; - u64 last_pos; - int i; - int ret = 0; - - start_pos = round_down(pos, fs_info->sectorsize); - last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1; - - if (start_pos < inode->vfs_inode.i_size) { - struct btrfs_ordered_extent *ordered; - - lock_extent_bits(&inode->io_tree, start_pos, last_pos, - cached_state); - ordered = btrfs_lookup_ordered_range(inode, start_pos, - last_pos - start_pos + 1); - if (ordered && - ordered->file_offset + ordered->num_bytes > start_pos && - ordered->file_offset <= last_pos) { - unlock_extent_cached(&inode->io_tree, start_pos, - last_pos, cached_state); - for (i = 0; i < num_pages; i++) { - unlock_page(pages[i]); - put_page(pages[i]); - } - btrfs_start_ordered_extent(&inode->vfs_inode, - ordered, 1); - btrfs_put_ordered_extent(ordered); - return -EAGAIN; - } - if (ordered) - btrfs_put_ordered_extent(ordered); - - *lockstart = start_pos; - *lockend = last_pos; - ret = 1; - } - - /* - * It's possible the pages are dirty right now, but we don't want - * to clean them yet because copy_from_user may catch a page fault - * and we might have to fall back to one page at a time. If that - * happens, we'll unlock these pages and we'd have a window where - * reclaim could sneak in and drop the once-dirty page on the floor - * without writing it. - * - * We have the pages locked and the extent range locked, so there's - * no way someone can start IO on any dirty pages in this range. - * - * We'll call btrfs_dirty_pages() later on, and that will flip around - * delalloc bits and dirty the pages as required. - */ - for (i = 0; i < num_pages; i++) { - set_page_extent_mapped(pages[i]); - WARN_ON(!PageLocked(pages[i])); - } - - return ret; -} - static int check_can_nocow(struct btrfs_inode *inode, loff_t pos, size_t *write_bytes, bool nowait) { @@ -1646,6 +1569,87 @@ static int get_nr_pages(struct btrfs_fs_info *fs_info, loff_t pos, return nr_pages; } +/* + * The helper to lock both pages and extent bits + * + * Return 0 if the extent is not locked and everything is OK. + * Return 1 if the extent is locked and everything is OK. + * Return <0 for error. 
+ */ +static int lock_pages_and_extent(struct btrfs_inode *inode, struct page **pages, + int num_pages, loff_t pos, size_t write_bytes, + u64 *lockstart, u64 *lockend, + struct extent_state **cached_state, + bool force_uptodate) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + u64 start_pos; + u64 last_pos; + int ret; + int i; + +again: + /* Lock the pages */ + ret = prepare_pages(&inode->vfs_inode, pages, num_pages, pos, + write_bytes, force_uptodate); + if (ret < 0) + return ret; + + start_pos = round_down(pos, fs_info->sectorsize); + last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1; + + if (start_pos < inode->vfs_inode.i_size) { + struct btrfs_ordered_extent *ordered; + + /* Lock the extent range */ + lock_extent_bits(&inode->io_tree, start_pos, last_pos, + cached_state); + ordered = btrfs_lookup_ordered_range(inode, start_pos, + last_pos - start_pos + 1); + if (ordered && + ordered->file_offset + ordered->num_bytes > start_pos && + ordered->file_offset <= last_pos) { + unlock_extent_cached(&inode->io_tree, start_pos, + last_pos, cached_state); + for (i = 0; i < num_pages; i++) { + unlock_page(pages[i]); + put_page(pages[i]); + } + btrfs_start_ordered_extent(&inode->vfs_inode, + ordered, 1); + btrfs_put_ordered_extent(ordered); + goto again; + } + if (ordered) + btrfs_put_ordered_extent(ordered); + + *lockstart = start_pos; + *lockend = last_pos; + ret = 1; + } + + /* + * It's possible the pages are dirty right now, but we don't want + * to clean them yet because copy_from_user may catch a page fault + * and we might have to fall back to one page at a time. If that + * happens, we'll unlock these pages and we'd have a window where + * reclaim could sneak in and drop the once-dirty page on the floor + * without writing it. + * + * We have the pages locked and the extent range locked, so there's + * no way someone can start IO on any dirty pages in this range. + * + * We'll call btrfs_dirty_pages() later on, and that will flip around + * delalloc bits and dirty the pages as required. + */ + for (i = 0; i < num_pages; i++) { + set_page_extent_mapped(pages[i]); + WARN_ON(!PageLocked(pages[i])); + } + + return ret; +} + /* * The helper to copy one batch of data from iov to pages. * @@ -1735,29 +1739,14 @@ static ssize_t process_one_batch(struct inode *inode, struct page **pages, release_bytes = reserve_bytes; /* Lock the pages and extent range */ -again: - /* - * This is going to setup the pages array with the number of pages we - * want, so we don't really need to worry about the contents of pages - * from loop to loop. - */ - ret = prepare_pages(inode, pages, num_pages, pos, write_bytes, - force_uptodate); - if (ret) { - btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); - goto out; - } - - extents_locked = lock_and_cleanup_extent_if_need(BTRFS_I(inode), pages, - num_pages, pos, write_bytes, &lockstart, - &lockend, &cached_state); - if (extents_locked < 0) { - if (extents_locked == -EAGAIN) - goto again; + ret = lock_pages_and_extent(BTRFS_I(inode), pages, num_pages, pos, + write_bytes, &lockstart, &lockend, &cached_state, + force_uptodate); + if (ret < 0) { btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); - ret = extents_locked; goto out; } + extents_locked = ret; /* Copy the data from iov to pages */ copied = btrfs_copy_from_user(pos, write_bytes, pages, iov);