From patchwork Tue Aug 25 05:48:05 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11734959
Return-Path: 
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 43E8D722
	for ; Tue, 25 Aug 2020 05:48:27 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 3643E20737
	for ; Tue, 25 Aug 2020 05:48:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728882AbgHYFs0 (ORCPT ); Tue, 25 Aug 2020 01:48:26 -0400
Received: from mx2.suse.de ([195.135.220.15]:50540 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1728145AbgHYFs0 (ORCPT ); Tue, 25 Aug 2020 01:48:26 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id B42BFACCF
	for ; Tue, 25 Aug 2020 05:48:55 +0000 (UTC)
From: Qu Wenruo 
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 1/4] btrfs: refactor @nrptrs calculation of btrfs_buffered_write()
Date: Tue, 25 Aug 2020 13:48:05 +0800
Message-Id: <20200825054808.16241-2-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200825054808.16241-1-wqu@suse.com>
References: <20200825054808.16241-1-wqu@suse.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID: 
X-Mailing-List: linux-btrfs@vger.kernel.org

@nrptrs of btrfs_buffered_write() determines the upper limit of pages we
can process in one batch.

Refactor that calculation into its own function, and fix a small page
alignment bug where the result can be one page less than ideal.

Apart from that fix, there is no functional change.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/file.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5a818ebcb01f..64f744989697 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1620,6 +1620,25 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
 	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
 }
 
+/* Helper to get how many pages we should alloc for the batch */
+static int calc_nr_pages(loff_t pos, struct iov_iter *iov)
+{
+	int nr_pages;
+
+	/*
+	 * Try to cover the full iov range, as btrfs metadata/data reserve
+	 * and release can be pretty slow, thus the more pages we process in
+	 * one batch the better.
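+	 *
+	 * Note that both ends are aligned against @pos: e.g. an 8K write
+	 * starting at pos = 4094 touches 3 pages, while a plain
+	 * DIV_ROUND_UP() of the length alone would only account for 2.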
+	 */
+	nr_pages = (round_up(pos + iov_iter_count(iov), PAGE_SIZE) -
+		    round_down(pos, PAGE_SIZE)) / PAGE_SIZE;
+
+	nr_pages = min(nr_pages, current->nr_dirtied_pause -
+			current->nr_dirtied);
+	nr_pages = max(nr_pages, 8);
+	return nr_pages;
+}
+
 static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 					     struct iov_iter *i)
 {
@@ -1638,10 +1657,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	bool only_release_metadata = false;
 	bool force_page_uptodate = false;
 
-	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
-		     PAGE_SIZE / (sizeof(struct page *)));
-	nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
-	nrptrs = max(nrptrs, 8);
+	nrptrs = calc_nr_pages(pos, i);
 	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
 	if (!pages)
 		return -ENOMEM;

From patchwork Tue Aug 25 05:48:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11734961
Return-Path: 
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 46FA7722
	for ; Tue, 25 Aug 2020 05:48:36 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 300352071E
	for ; Tue, 25 Aug 2020 05:48:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728903AbgHYFse (ORCPT ); Tue, 25 Aug 2020 01:48:34 -0400
Received: from mx2.suse.de ([195.135.220.15]:50570 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1728145AbgHYFsb (ORCPT ); Tue, 25 Aug 2020 01:48:31 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id BF873ACCF;
	Tue, 25 Aug 2020 05:48:58 +0000 (UTC)
From: Qu Wenruo 
To: linux-btrfs@vger.kernel.org
Cc: Josef Bacik 
Subject: [PATCH v3 2/4] btrfs: refactor btrfs_buffered_write() into process_one_batch()
Date: Tue, 25 Aug 2020 13:48:06 +0800
Message-Id: <20200825054808.16241-3-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200825054808.16241-1-wqu@suse.com>
References: <20200825054808.16241-1-wqu@suse.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID: 
X-Mailing-List: linux-btrfs@vger.kernel.org

Inside btrfs_buffered_write() we had a big chunk of code in the while()
loop. Refactor this big chunk into process_one_batch(), which will do
the main job.

This refactor doesn't change the functionality at all, it just makes
life easier by giving the code more headroom.

Signed-off-by: Qu Wenruo 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/file.c | 406 ++++++++++++++++++++++++++----------------------
 1 file changed, 216 insertions(+), 190 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 64f744989697..f906f0910310 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1402,7 +1402,10 @@ static int prepare_uptodate_page(struct inode *inode,
 }
 
 /*
- * this just gets pages into the page cache and locks them down.
+ * This just gets pages into the page cache and locks them down.
+ *
+ * @force_uptodate:	Whether to force the first page uptodate.
+ *			This happens after a short copy on a non-uptodate page.
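+ *			Reading the page in first means that even a partial
+ *			copy leaves no uninitialized data exposed in it.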
*/ static noinline int prepare_pages(struct inode *inode, struct page **pages, size_t num_pages, loff_t pos, @@ -1639,226 +1642,192 @@ static int calc_nr_pages(loff_t pos, struct iov_iter *iov) return nr_pages; } -static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, - struct iov_iter *i) +/* + * The helper to copy one batch of data from iov to pages. + * + * @pages: The page pointer array we're going to use. + * @nrptrs: The number of page pointers we can utilize + * @pos: The position of this write, in bytes + * @iov: The source of data. + * @force_uptodate: Do we need to force the first page uptodate. + * true if previous run we hit a short copy. + * + * Return >0 as the bytes we copied, with iov advanced. + * Return 0 if we hit a short copy, and should force the next run to have the + * first page uptodate. + * Return <0 for error. + */ +static ssize_t process_one_batch(struct inode *inode, struct page **pages, + int nrptrs, loff_t pos, struct iov_iter *iov, + bool force_uptodate) { - struct file *file = iocb->ki_filp; - loff_t pos = iocb->ki_pos; - struct inode *inode = file_inode(file); - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct page **pages = NULL; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; + struct extent_state *cached_state = NULL; struct extent_changeset *data_reserved = NULL; + size_t offset = offset_in_page(pos); + size_t sector_offset; + size_t write_bytes = min(iov_iter_count(iov), + nrptrs * (size_t)PAGE_SIZE - offset); + size_t num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE); + size_t reserve_bytes; + size_t dirty_pages; + size_t copied = 0; + size_t dirty_sectors; + size_t num_sectors; + bool only_release_metadata = false; u64 release_bytes = 0; u64 lockstart; u64 lockend; - size_t num_written = 0; - int nrptrs; - int ret = 0; - bool only_release_metadata = false; - bool force_page_uptodate = false; + int extents_locked; + int ret; - nrptrs = calc_nr_pages(pos, i); - pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL); - if (!pages) - return -ENOMEM; - while (iov_iter_count(i) > 0) { - struct extent_state *cached_state = NULL; - size_t offset = offset_in_page(pos); - size_t sector_offset; - size_t write_bytes = min(iov_iter_count(i), - nrptrs * (size_t)PAGE_SIZE - - offset); - size_t num_pages = DIV_ROUND_UP(write_bytes + offset, - PAGE_SIZE); - size_t reserve_bytes; - size_t dirty_pages; - size_t copied; - size_t dirty_sectors; - size_t num_sectors; - int extents_locked; - - WARN_ON(num_pages > nrptrs); - - /* - * Fault pages before locking them in prepare_pages - * to avoid recursive lock - */ - if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) { - ret = -EFAULT; - break; - } + /* + * Fault pages before locking them in prepare_pages to avoid recursive + * lock. + */ + if (unlikely(iov_iter_fault_in_readable(iov, write_bytes))) { + ret = -EFAULT; + goto out; + } - only_release_metadata = false; - sector_offset = pos & (fs_info->sectorsize - 1); - reserve_bytes = round_up(write_bytes + sector_offset, - fs_info->sectorsize); + sector_offset = pos & (fs_info->sectorsize - 1); + reserve_bytes = round_up(write_bytes + sector_offset, + fs_info->sectorsize); - extent_changeset_release(data_reserved); - ret = btrfs_check_data_free_space(BTRFS_I(inode), - &data_reserved, pos, - write_bytes); - if (ret < 0) { - if (btrfs_check_nocow_lock(BTRFS_I(inode), pos, - &write_bytes) > 0) { - /* - * For nodata cow case, no need to reserve - * data space. 
- */ - only_release_metadata = true; - /* - * our prealloc extent may be smaller than - * write_bytes, so scale down. - */ - num_pages = DIV_ROUND_UP(write_bytes + offset, - PAGE_SIZE); - reserve_bytes = round_up(write_bytes + - sector_offset, - fs_info->sectorsize); - } else { - break; - } + /* Reserve space for data and metadata */ + ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, pos, + write_bytes); + if (ret < 0) { + if (btrfs_check_nocow_lock(BTRFS_I(inode), pos, + &write_bytes) > 0) { + /* Nodata cow case, no need to reserve data space. */ + only_release_metadata = true; + /* + * our prealloc extent may be smaller than + * write_bytes, so scale down. + */ + num_pages = DIV_ROUND_UP(write_bytes + offset, + PAGE_SIZE); + reserve_bytes = round_up(write_bytes + sector_offset, + fs_info->sectorsize); + } else { + goto out; } + } - WARN_ON(reserve_bytes == 0); - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), - reserve_bytes); - if (ret) { - if (!only_release_metadata) - btrfs_free_reserved_data_space(BTRFS_I(inode), - data_reserved, pos, - write_bytes); - else - btrfs_check_nocow_unlock(BTRFS_I(inode)); - break; - } + WARN_ON(reserve_bytes == 0); + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), reserve_bytes); + if (ret) { + if (!only_release_metadata) + btrfs_free_reserved_data_space(BTRFS_I(inode), + data_reserved, pos, write_bytes); + else + btrfs_check_nocow_unlock(BTRFS_I(inode)); + goto out; + } - release_bytes = reserve_bytes; -again: - /* - * This is going to setup the pages array with the number of - * pages we want, so we don't really need to worry about the - * contents of pages from loop to loop - */ - ret = prepare_pages(inode, pages, num_pages, - pos, write_bytes, - force_page_uptodate); - if (ret) { - btrfs_delalloc_release_extents(BTRFS_I(inode), - reserve_bytes); - break; - } + release_bytes = reserve_bytes; - extents_locked = lock_and_cleanup_extent_if_need( - BTRFS_I(inode), pages, - num_pages, pos, write_bytes, &lockstart, - &lockend, &cached_state); - if (extents_locked < 0) { - if (extents_locked == -EAGAIN) - goto again; - btrfs_delalloc_release_extents(BTRFS_I(inode), - reserve_bytes); - ret = extents_locked; - break; - } + /* Lock the pages and extent range */ +again: + /* + * This is going to setup the pages array with the number of pages we + * want, so we don't really need to worry about the contents of pages + * from loop to loop. 
+ */ + ret = prepare_pages(inode, pages, num_pages, pos, write_bytes, + force_uptodate); + if (ret) { + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); + goto out; + } - copied = btrfs_copy_from_user(pos, write_bytes, pages, i); + extents_locked = lock_and_cleanup_extent_if_need(BTRFS_I(inode), pages, + num_pages, pos, write_bytes, &lockstart, + &lockend, &cached_state); + if (extents_locked < 0) { + if (extents_locked == -EAGAIN) + goto again; + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); + ret = extents_locked; + goto out; + } - num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes); - dirty_sectors = round_up(copied + sector_offset, - fs_info->sectorsize); - dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors); + /* Copy the data from iov to pages */ + copied = btrfs_copy_from_user(pos, write_bytes, pages, iov); - /* - * if we have trouble faulting in the pages, fall - * back to one page at a time - */ - if (copied < write_bytes) - nrptrs = 1; + num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes); + dirty_sectors = round_up(copied + sector_offset, fs_info->sectorsize); + dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors); - if (copied == 0) { - force_page_uptodate = true; - dirty_sectors = 0; - dirty_pages = 0; - } else { - force_page_uptodate = false; - dirty_pages = DIV_ROUND_UP(copied + offset, - PAGE_SIZE); - } - - if (num_sectors > dirty_sectors) { - /* release everything except the sectors we dirtied */ - release_bytes -= dirty_sectors << - fs_info->sb->s_blocksize_bits; - if (only_release_metadata) { - btrfs_delalloc_release_metadata(BTRFS_I(inode), - release_bytes, true); - } else { - u64 __pos; + if (copied == 0) { + dirty_sectors = 0; + dirty_pages = 0; + } else { + dirty_pages = DIV_ROUND_UP(copied + offset, PAGE_SIZE); + } - __pos = round_down(pos, - fs_info->sectorsize) + - (dirty_pages << PAGE_SHIFT); - btrfs_delalloc_release_space(BTRFS_I(inode), - data_reserved, __pos, + if (num_sectors > dirty_sectors) { + /* release everything except the sectors we dirtied */ + release_bytes -= dirty_sectors << fs_info->sb->s_blocksize_bits; + if (only_release_metadata) { + btrfs_delalloc_release_metadata(BTRFS_I(inode), release_bytes, true); - } - } - - release_bytes = round_up(copied + sector_offset, - fs_info->sectorsize); - - if (copied > 0) - ret = btrfs_dirty_pages(BTRFS_I(inode), pages, - dirty_pages, pos, copied, - &cached_state); - - /* - * If we have not locked the extent range, because the range's - * start offset is >= i_size, we might still have a non-NULL - * cached extent state, acquired while marking the extent range - * as delalloc through btrfs_dirty_pages(). Therefore free any - * possible cached extent state to avoid a memory leak. 
- */ - if (extents_locked) - unlock_extent_cached(&BTRFS_I(inode)->io_tree, - lockstart, lockend, &cached_state); - else - free_extent_state(cached_state); + } else { + u64 __pos; - btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); - if (ret) { - btrfs_drop_pages(pages, num_pages); - break; + __pos = round_down(pos, fs_info->sectorsize) + + (dirty_pages << PAGE_SHIFT); + btrfs_delalloc_release_space(BTRFS_I(inode), + data_reserved, __pos, + release_bytes, true); } + } - release_bytes = 0; - if (only_release_metadata) - btrfs_check_nocow_unlock(BTRFS_I(inode)); + release_bytes = round_up(copied + sector_offset, fs_info->sectorsize); - if (only_release_metadata && copied > 0) { - lockstart = round_down(pos, - fs_info->sectorsize); - lockend = round_up(pos + copied, - fs_info->sectorsize) - 1; + if (copied > 0) + ret = btrfs_dirty_pages(BTRFS_I(inode), pages, dirty_pages, pos, + copied, &cached_state); - set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, - lockend, EXTENT_NORESERVE, NULL, - NULL, GFP_NOFS); - } + /* + * If we have not locked the extent range, because the range's + * start offset is >= i_size, we might still have a non-NULL + * cached extent state, acquired while marking the extent range + * as delalloc through btrfs_dirty_pages(). Therefore free any + * possible cached extent state to avoid a memory leak. + */ + if (extents_locked) + unlock_extent_cached(&BTRFS_I(inode)->io_tree, + lockstart, lockend, &cached_state); + else + free_extent_state(cached_state); + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes); + if (ret) { btrfs_drop_pages(pages, num_pages); + goto out; + } - cond_resched(); + release_bytes = 0; + if (only_release_metadata) + btrfs_check_nocow_unlock(BTRFS_I(inode)); - balance_dirty_pages_ratelimited(inode->i_mapping); + if (only_release_metadata && copied > 0) { + lockstart = round_down(pos, fs_info->sectorsize); + lockend = round_up(pos + copied, fs_info->sectorsize) - 1; - pos += copied; - num_written += copied; + set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend, + EXTENT_NORESERVE, NULL, NULL, GFP_NOFS); } - kfree(pages); + btrfs_drop_pages(pages, num_pages); + + cond_resched(); + balance_dirty_pages_ratelimited(inode->i_mapping); +out: if (release_bytes) { if (only_release_metadata) { btrfs_check_nocow_unlock(BTRFS_I(inode)); @@ -1872,7 +1841,64 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, } } - extent_changeset_free(data_reserved); + if (!copied && ret < 0) + return ret; + return copied; +} + +static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, + struct iov_iter *i) +{ + struct file *file = iocb->ki_filp; + loff_t pos = iocb->ki_pos; + struct inode *inode = file_inode(file); + struct page **pages = NULL; + size_t num_written = 0; + int nrptrs; + int orig_nrptrs; + int ret = 0; + bool force_page_uptodate = false; + + orig_nrptrs = calc_nr_pages(pos, i); + nrptrs = orig_nrptrs; + pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL); + if (!pages) + return -ENOMEM; + + while (iov_iter_count(i) > 0) { + ssize_t copied; + + copied = process_one_batch(inode, pages, nrptrs, pos, i, + force_page_uptodate); + if (copied < 0) { + ret = copied; + break; + } + + if (copied == 0) { + /* + * We had a short copy on even the first page, need to + * force the next page uptodate and fall back to page + * by page pace. + */ + nrptrs = 1; + force_page_uptodate = true; + } else { + /* + * Copy finished without problem. 
No longer need to + * force next page uptodate, and revert to regular + * multi-page pace. + */ + nrptrs = orig_nrptrs; + force_page_uptodate = false; + } + + pos += copied; + num_written += copied; + } + + kfree(pages); + return num_written ? num_written : ret; } From patchwork Tue Aug 25 05:48:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11734965 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E46A3138A for ; Tue, 25 Aug 2020 05:48:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D6A6720737 for ; Tue, 25 Aug 2020 05:48:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728919AbgHYFsj (ORCPT ); Tue, 25 Aug 2020 01:48:39 -0400 Received: from mx2.suse.de ([195.135.220.15]:50582 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728890AbgHYFsc (ORCPT ); Tue, 25 Aug 2020 01:48:32 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 7F2C9AD77; Tue, 25 Aug 2020 05:49:01 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v3 3/4] btrfs: remove the again: tag in process_one_batch() Date: Tue, 25 Aug 2020 13:48:07 +0800 Message-Id: <20200825054808.16241-4-wqu@suse.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200825054808.16241-1-wqu@suse.com> References: <20200825054808.16241-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The again: tag here is for us to retry when we failed to lock extent range, which is only used once. Instead of the open tag, integrate prepare_pages() and lock_and_cleanup_extent_if_need() into lock_pages_and_extent(), and do the retry inside the function. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 183 +++++++++++++++++++++++------------------------- 1 file changed, 86 insertions(+), 97 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index f906f0910310..67d2368a8fa6 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1456,83 +1456,6 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages, } -/* - * This function locks the extent and properly waits for data=ordered extents - * to finish before allowing the pages to be modified if need. 
- * - * The return value: - * 1 - the extent is locked - * 0 - the extent is not locked, and everything is OK - * -EAGAIN - need re-prepare the pages - * the other < 0 number - Something wrong happens - */ -static noinline int -lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages, - size_t num_pages, loff_t pos, - size_t write_bytes, - u64 *lockstart, u64 *lockend, - struct extent_state **cached_state) -{ - struct btrfs_fs_info *fs_info = inode->root->fs_info; - u64 start_pos; - u64 last_pos; - int i; - int ret = 0; - - start_pos = round_down(pos, fs_info->sectorsize); - last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1; - - if (start_pos < inode->vfs_inode.i_size) { - struct btrfs_ordered_extent *ordered; - - lock_extent_bits(&inode->io_tree, start_pos, last_pos, - cached_state); - ordered = btrfs_lookup_ordered_range(inode, start_pos, - last_pos - start_pos + 1); - if (ordered && - ordered->file_offset + ordered->num_bytes > start_pos && - ordered->file_offset <= last_pos) { - unlock_extent_cached(&inode->io_tree, start_pos, - last_pos, cached_state); - for (i = 0; i < num_pages; i++) { - unlock_page(pages[i]); - put_page(pages[i]); - } - btrfs_start_ordered_extent(&inode->vfs_inode, - ordered, 1); - btrfs_put_ordered_extent(ordered); - return -EAGAIN; - } - if (ordered) - btrfs_put_ordered_extent(ordered); - - *lockstart = start_pos; - *lockend = last_pos; - ret = 1; - } - - /* - * It's possible the pages are dirty right now, but we don't want - * to clean them yet because copy_from_user may catch a page fault - * and we might have to fall back to one page at a time. If that - * happens, we'll unlock these pages and we'd have a window where - * reclaim could sneak in and drop the once-dirty page on the floor - * without writing it. - * - * We have the pages locked and the extent range locked, so there's - * no way someone can start IO on any dirty pages in this range. - * - * We'll call btrfs_dirty_pages() later on, and that will flip around - * delalloc bits and dirty the pages as required. - */ - for (i = 0; i < num_pages; i++) { - set_page_extent_mapped(pages[i]); - WARN_ON(!PageLocked(pages[i])); - } - - return ret; -} - static int check_can_nocow(struct btrfs_inode *inode, loff_t pos, size_t *write_bytes, bool nowait) { @@ -1642,6 +1565,87 @@ static int calc_nr_pages(loff_t pos, struct iov_iter *iov) return nr_pages; } +/* + * The helper to lock both pages and extent bits + * + * Return 0 if the extent is not locked and everything is OK. + * Return 1 if the extent is locked and everything is OK. + * Return <0 for error. 
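+ * (A return of 0 means the write starts at or beyond i_size, so there was
+ * no extent range to lock.)
+ *
+ * Unlike the old lock_and_cleanup_extent_if_need(), hitting a running
+ * ordered extent no longer returns -EAGAIN to the caller: the pages are
+ * released, the ordered extent is waited on, and the whole prepare/lock
+ * sequence is retried inside this helper.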
+ */ +static int lock_pages_and_extent(struct btrfs_inode *inode, struct page **pages, + int num_pages, loff_t pos, size_t write_bytes, + u64 *lockstart, u64 *lockend, + struct extent_state **cached_state, + bool force_uptodate) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + u64 start_pos; + u64 last_pos; + int ret; + int i; + +again: + /* Lock the pages */ + ret = prepare_pages(&inode->vfs_inode, pages, num_pages, pos, + write_bytes, force_uptodate); + if (ret < 0) + return ret; + + start_pos = round_down(pos, fs_info->sectorsize); + last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1; + + if (start_pos < inode->vfs_inode.i_size) { + struct btrfs_ordered_extent *ordered; + + /* Lock the extent range */ + lock_extent_bits(&inode->io_tree, start_pos, last_pos, + cached_state); + ordered = btrfs_lookup_ordered_range(inode, start_pos, + last_pos - start_pos + 1); + if (ordered && + ordered->file_offset + ordered->num_bytes > start_pos && + ordered->file_offset <= last_pos) { + unlock_extent_cached(&inode->io_tree, start_pos, + last_pos, cached_state); + for (i = 0; i < num_pages; i++) { + unlock_page(pages[i]); + put_page(pages[i]); + } + btrfs_start_ordered_extent(&inode->vfs_inode, + ordered, 1); + btrfs_put_ordered_extent(ordered); + goto again; + } + if (ordered) + btrfs_put_ordered_extent(ordered); + + *lockstart = start_pos; + *lockend = last_pos; + ret = 1; + } + + /* + * It's possible the pages are dirty right now, but we don't want + * to clean them yet because copy_from_user may catch a page fault + * and we might have to fall back to one page at a time. If that + * happens, we'll unlock these pages and we'd have a window where + * reclaim could sneak in and drop the once-dirty page on the floor + * without writing it. + * + * We have the pages locked and the extent range locked, so there's + * no way someone can start IO on any dirty pages in this range. + * + * We'll call btrfs_dirty_pages() later on, and that will flip around + * delalloc bits and dirty the pages as required. + */ + for (i = 0; i < num_pages; i++) { + set_page_extent_mapped(pages[i]); + WARN_ON(!PageLocked(pages[i])); + } + + return ret; +} + /* * The helper to copy one batch of data from iov to pages. * @@ -1730,29 +1734,14 @@ static ssize_t process_one_batch(struct inode *inode, struct page **pages, release_bytes = reserve_bytes; /* Lock the pages and extent range */ -again: - /* - * This is going to setup the pages array with the number of pages we - * want, so we don't really need to worry about the contents of pages - * from loop to loop. 
- */
-	ret = prepare_pages(inode, pages, num_pages, pos, write_bytes,
-			    force_uptodate);
-	if (ret) {
-		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
-		goto out;
-	}
-
-	extents_locked = lock_and_cleanup_extent_if_need(BTRFS_I(inode), pages,
-			num_pages, pos, write_bytes, &lockstart,
-			&lockend, &cached_state);
-	if (extents_locked < 0) {
-		if (extents_locked == -EAGAIN)
-			goto again;
+	ret = lock_pages_and_extent(BTRFS_I(inode), pages, num_pages, pos,
+			write_bytes, &lockstart, &lockend, &cached_state,
+			force_uptodate);
+	if (ret < 0) {
 		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
-		ret = extents_locked;
 		goto out;
 	}
+	extents_locked = ret;
 
 	/* Copy the data from iov to pages */
 	copied = btrfs_copy_from_user(pos, write_bytes, pages, iov);

From patchwork Tue Aug 25 05:48:08 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11734963
Return-Path: 
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C52AC138A
	for ; Tue, 25 Aug 2020 05:48:37 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id B65DC2071E
	for ; Tue, 25 Aug 2020 05:48:37 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728913AbgHYFsf (ORCPT ); Tue, 25 Aug 2020 01:48:35 -0400
Received: from mx2.suse.de ([195.135.220.15]:50594 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1728861AbgHYFsf (ORCPT ); Tue, 25 Aug 2020 01:48:35 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id 5C5C9ACCF
	for ; Tue, 25 Aug 2020 05:49:04 +0000 (UTC)
From: Qu Wenruo 
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 4/4] btrfs: avoid allocating unnecessary page pointers
Date: Tue, 25 Aug 2020 13:48:08 +0800
Message-Id: <20200825054808.16241-5-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200825054808.16241-1-wqu@suse.com>
References: <20200825054808.16241-1-wqu@suse.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID: 
X-Mailing-List: linux-btrfs@vger.kernel.org

Commit 142349f541d0 ("btrfs: lower the dirty balance poll interval")
introduced one limit which is definitely suspicious:

- ensure we always have 8 pages allocated

The lower limit of 8 looks pretty strange: it means that even if we're
just writing 4K, we will allocate page pointers for 8 pages no matter
what.

That 8 looks much more like an upper limit; in fact it looks so much
like one that my eyes always want to correct the "max(, 8)" into
"min(, 8)".

This patch uses a fixed size (SZ_64K) rather than a page count to set
up the upper limit, and adds a comment explaining why we need an upper
limit at all.
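With the common 4K page size, SZ_64K / PAGE_SIZE caps one batch at 16 page
pointers: a plain 4K write now allocates a single page pointer instead of
the 8 the old lower limit forced, while e.g. a 1MiB write is simply batched
64K at a time.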
Signed-off-by: Qu Wenruo 
Reported-by: kernel test robot 
Reported-by: kernel test robot 
Reported-by: kernel test robot 
---
 fs/btrfs/file.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 67d2368a8fa6..de6d1c313042 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1561,7 +1561,14 @@ static int calc_nr_pages(loff_t pos, struct iov_iter *iov)
 
 	nr_pages = min(nr_pages, current->nr_dirtied_pause -
 			current->nr_dirtied);
-	nr_pages = max(nr_pages, 8);
+
+	/*
+	 * Limit the batch to 64K. Too large a batch may lead to higher
+	 * memory pressure and increase the possibility of short copies.
+	 * With more and more short copies, the benefit of batched copying
+	 * is hugely reduced, as we will fall back to page-by-page copying.
+	 */
+	nr_pages = min(nr_pages, SZ_64K / PAGE_SIZE);
 	return nr_pages;
 }
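As a quick illustration of the combined effect of patch 1 (cover the full,
page-aligned range of the iov) and patch 4 (cap one batch at 64K), here is a
rough, stand-alone userspace sketch of the same arithmetic. It is not kernel
code: PAGE_SIZE, the round_up()/round_down()/min() helpers and the
dirty-throttling headroom below are hard-coded assumptions for illustration
only.

/*
 * calc_nr_pages_demo.c - userspace sketch of the batch size math used by
 * the final calc_nr_pages() in this series.
 */
#include <stdio.h>

#define PAGE_SIZE	4096UL
#define SZ_64K		(64 * 1024UL)

/* Power-of-two rounding helpers, stand-ins for the kernel macros */
#define round_up(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define round_down(x, a)	((x) & ~((a) - 1))
#define min(a, b)		((a) < (b) ? (a) : (b))

static long calc_nr_pages(unsigned long pos, unsigned long count,
			  long dirty_headroom)
{
	long nr_pages;

	/* Patch 1: cover the full, page-aligned [pos, pos + count) range */
	nr_pages = (round_up(pos + count, PAGE_SIZE) -
		    round_down(pos, PAGE_SIZE)) / PAGE_SIZE;

	/* The dirty-throttling cap (nr_dirtied_pause - nr_dirtied) */
	nr_pages = min(nr_pages, dirty_headroom);

	/* Patch 4: cap one batch at 64K instead of forcing at least 8 pages */
	nr_pages = min(nr_pages, (long)(SZ_64K / PAGE_SIZE));
	return nr_pages;
}

int main(void)
{
	/* Unaligned 8K write starting near the end of page 0 */
	printf("pos=4094, count=8K -> %ld pages\n",
	       calc_nr_pages(4094, 8192, 32));
	/* Large 1M write, capped at SZ_64K / PAGE_SIZE = 16 pages */
	printf("pos=0,    count=1M -> %ld pages\n",
	       calc_nr_pages(0, 1024 * 1024, 32));
	return 0;
}

With those assumptions it prints 3 pages for the unaligned 8K case (where a
plain DIV_ROUND_UP() of the length alone would give 2) and 16 pages for the
1M case.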