From patchwork Sun Aug 29 05:24:33 2021
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 01/26] btrfs: remove unused parameter @nr_pages in add_ra_bio_pages()
Date: Sun, 29 Aug 2021 13:24:33 +0800
Message-Id: <20210829052458.15454-2-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

Variable @nr_pages only gets increased but is never used. Remove it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 7869ad12bc6e..a26ce27ad005 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -549,7 +549,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 	u64 isize = i_size_read(inode);
 	int ret;
 	struct page *page;
-	unsigned long nr_pages = 0;
 	struct extent_map *em;
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_map_tree *em_tree;
@@ -645,7 +644,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 				   PAGE_SIZE, 0);
 
 		if (ret == PAGE_SIZE) {
-			nr_pages++;
 			put_page(page);
 		} else {
 			unlock_extent(tree, last_offset, end);
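Such write-only counters are exactly what -Wunused-but-set-variable
diagnoses; a minimal standalone illustration (plain C, not btrfs code;
the warning flag assumes a gcc/clang build):

	/* gcc -Wall -Wunused-but-set-variable -c demo.c */
	static int count_even(const int *vals, int n)
	{
		int evens = 0;
		int odds = 0;	/* warning: variable 'odds' set but not used */
		int i;

		for (i = 0; i < n; i++) {
			if (vals[i] % 2 == 0)
				evens++;
			else
				odds++;	/* only ever written, never read */
		}
		return evens;
	}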
From patchwork Sun Aug 29 05:24:34 2021
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 02/26] btrfs: remove unnecessary parameter @delalloc_start for writepage_delalloc()
Date: Sun, 29 Aug 2021 13:24:34 +0800
Message-Id: <20210829052458.15454-3-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

In __extent_writepage() we always pass the page start as @delalloc_start
to writepage_delalloc(), so the @delalloc_start parameter is not really
needed: we can extract it from @page instead.

Remove the @delalloc_start parameter and make __extent_writepage()
declare @page_start and @page_end as const.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5ad749e19ff3..2806949c6243 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3777,10 +3777,11 @@ static void update_nr_written(struct writeback_control *wbc,
  */
 static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		struct page *page, struct writeback_control *wbc,
-		u64 delalloc_start, unsigned long *nr_written)
+		unsigned long *nr_written)
 {
-	u64 page_end = delalloc_start + PAGE_SIZE - 1;
+	u64 page_end = page_offset(page) + PAGE_SIZE - 1;
 	bool found;
+	u64 delalloc_start = page_offset(page);
 	u64 delalloc_to_write = 0;
 	u64 delalloc_end = 0;
 	int ret;
@@ -4058,8 +4059,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 			      struct extent_page_data *epd)
 {
 	struct inode *inode = page->mapping->host;
-	u64 start = page_offset(page);
-	u64 page_end = start + PAGE_SIZE - 1;
+	const u64 page_start = page_offset(page);
+	const u64 page_end = page_start + PAGE_SIZE - 1;
 	int ret;
 	int nr = 0;
 	size_t pg_offset;
@@ -4094,7 +4095,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	}
 
 	if (!epd->extent_locked) {
-		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
+		ret = writepage_delalloc(BTRFS_I(inode), page, wbc,
 					 &nr_written);
 		if (ret == 1)
 			return 0;
@@ -4145,7 +4146,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	 * capable of that.
 	 */
 	if (PageError(page))
-		end_extent_writepage(page, ret, start, page_end);
+		end_extent_writepage(page, ret, page_start, page_end);
 	unlock_page(page);
 	ASSERT(ret <= 0);
 	return ret;
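The refactor rests on a simple identity: every caller was forwarding a
value that is already derivable from @page via page_offset(). A minimal
sketch of the same parameter-removal pattern with illustrative stub
names (not btrfs code):

	#include <stdint.h>

	#define PAGE_SHIFT	12
	#define PAGE_SIZE	(1UL << PAGE_SHIFT)

	struct page_stub { uint64_t index; };

	/* Stand-in for page_offset(): file position of the page. */
	static uint64_t page_offset_stub(const struct page_stub *page)
	{
		return page->index << PAGE_SHIFT;
	}

	/* Before: caller must pass @start, always equal to page_offset(page). */
	static void process_before(struct page_stub *page, uint64_t start);

	/* After: derive it locally; one fewer way for callers to get it wrong. */
	static void process_after(struct page_stub *page)
	{
		const uint64_t start = page_offset_stub(page);
		const uint64_t end = start + PAGE_SIZE - 1;

		/* ... operate on the inclusive range [start, end] ... */
		(void)end;
	}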
From patchwork Sun Aug 29 05:24:35 2021
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 03/26] btrfs: use async_chunk::async_cow to replace the confusing pending pointer
Date: Sun, 29 Aug 2021 13:24:35 +0800
Message-Id: <20210829052458.15454-4-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

For structure async_chunk, we use a very strange member layout to grab
the async_cow structure that owns this async_chunk.
At initialization, it goes like this:

	async_chunk[i].pending = &ctx->num_chunks;

Then at async_cow_free() we do a very odd freeing:

	/*
	 * Since the pointer to 'pending' is at the beginning of the array of
	 * async_chunk's, freeing it ensures the whole array has been freed.
	 */
	if (atomic_dec_and_test(async_chunk->pending))
		kvfree(async_chunk->pending);

This is absolutely an abuse of kvfree(): it only works because 'pending'
points at the first member of the allocation.

Replace async_chunk::pending with async_chunk::async_cow, so that we can
grab the async_cow structure directly, without this strange dance. With
this change, there is no longer any requirement on a specific member
location.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0517f31a3bed..02448857b72f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -455,11 +455,10 @@ struct async_chunk {
 	struct list_head extents;
 	struct cgroup_subsys_state *blkcg_css;
 	struct btrfs_work work;
-	atomic_t *pending;
+	struct async_cow *async_cow;
 };
 
 struct async_cow {
-	/* Number of chunks in flight; must be first in the structure */
 	atomic_t num_chunks;
 	struct async_chunk chunks[];
 };
@@ -1325,18 +1324,17 @@ static noinline void async_cow_submit(struct btrfs_work *work)
 static noinline void async_cow_free(struct btrfs_work *work)
 {
 	struct async_chunk *async_chunk;
+	struct async_cow *async_cow;
 
 	async_chunk = container_of(work, struct async_chunk, work);
 	if (async_chunk->inode)
 		btrfs_add_delayed_iput(async_chunk->inode);
 	if (async_chunk->blkcg_css)
 		css_put(async_chunk->blkcg_css);
-	/*
-	 * Since the pointer to 'pending' is at the beginning of the array of
-	 * async_chunk's, freeing it ensures the whole array has been freed.
-	 */
-	if (atomic_dec_and_test(async_chunk->pending))
-		kvfree(async_chunk->pending);
+
+	async_cow = async_chunk->async_cow;
+	if (atomic_dec_and_test(&async_cow->num_chunks))
+		kvfree(async_cow);
 }
 
 static int cow_file_range_async(struct btrfs_inode *inode,
@@ -1397,7 +1395,7 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		 * lightweight reference for the callback lifetime
		 */
 		ihold(&inode->vfs_inode);
-		async_chunk[i].pending = &ctx->num_chunks;
+		async_chunk[i].async_cow = ctx;
 		async_chunk[i].inode = &inode->vfs_inode;
 		async_chunk[i].start = start;
 		async_chunk[i].end = cur_end;
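The pattern being replaced is a general one: an element of a trailing
array needs to locate its owning allocation. A self-contained sketch of
the back-pointer approach with illustrative names (plain C and
single-threaded, so a plain int stands in for the kernel's atomic_t):

	#include <stdlib.h>

	struct owner;

	struct chunk {
		struct owner *owner;	/* explicit, position-independent */
		int work_done;
	};

	struct owner {
		int num_chunks;
		struct chunk chunks[];	/* flexible array member */
	};

	static struct owner *owner_alloc(int n)
	{
		struct owner *o = calloc(1, sizeof(*o) + n * sizeof(struct chunk));

		if (!o)
			return NULL;
		o->num_chunks = n;
		for (int i = 0; i < n; i++)
			o->chunks[i].owner = o;	/* each element can reach its owner */
		return o;
	}

	/* Last chunk to finish frees the whole allocation via the back-pointer. */
	static void chunk_put(struct chunk *c)
	{
		struct owner *o = c->owner;

		if (--o->num_chunks == 0)
			free(o);	/* no "must be first member" trick required */
	}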
From patchwork Sun Aug 29 05:24:36 2021
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 04/26] btrfs: don't pass compressed pages to btrfs_writepage_endio_finish_ordered()
Date: Sun, 29 Aug 2021 13:24:36 +0800
Message-Id: <20210829052458.15454-5-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

Since async_extent holds the compressed pages, passing one of them would
trigger the new ASSERT() in btrfs_mark_ordered_io_finished(), which
checks that the range is inside the page.

As btrfs_writepage_endio_finish_ordered() now accepts @page == NULL,
just pass NULL to it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 02448857b72f..6760ec78df6b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -973,15 +973,12 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				    async_extent->nr_pages,
 				    async_chunk->write_flags,
 				    async_chunk->blkcg_css)) {
-			struct page *p = async_extent->pages[0];
 			const u64 start = async_extent->start;
 			const u64 end = start + async_extent->ram_size - 1;
 
-			p->mapping = inode->vfs_inode.i_mapping;
-			btrfs_writepage_endio_finish_ordered(inode, p, start,
+			btrfs_writepage_endio_finish_ordered(inode, NULL, start,
 							     end, false);
-			p->mapping = NULL;
 
 			extent_clear_unlock_delalloc(inode, start, end, NULL, 0,
 						     PAGE_END_WRITEBACK |
 						     PAGE_SET_ERROR);
From patchwork Sun Aug 29 05:24:37 2021
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 05/26] btrfs: make add_ra_bio_pages() subpage compatible
Date: Sun, 29 Aug 2021 13:24:37 +0800
Message-Id: <20210829052458.15454-6-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

[BUG]
If we remove the subpage limitation in add_ra_bio_pages(), then read a
compressed extent which has part of its range in the next page, like the
following inode layout:

	0       32K     64K     96K     128K
	|<--------------|-------------->|

Btrfs will trigger the ASSERT() in the endio function:

	assertion failed: atomic_read(&subpage->readers) >= nbits
	------------[ cut here ]------------
	kernel BUG at fs/btrfs/ctree.h:3431!
	Internal error: Oops - BUG: 0 [#1] SMP
	Workqueue: btrfs-endio btrfs_work_helper [btrfs]
	Call trace:
	 assertfail.constprop.0+0x28/0x2c [btrfs]
	 btrfs_subpage_end_reader+0x148/0x14c [btrfs]
	 end_page_read+0x8c/0x100 [btrfs]
	 end_bio_extent_readpage+0x320/0x6b0 [btrfs]
	 bio_endio+0x15c/0x1dc
	 end_workqueue_fn+0x44/0x64 [btrfs]
	 btrfs_work_helper+0x74/0x250 [btrfs]
	 process_one_work+0x1d4/0x47c
	 worker_thread+0x180/0x400
	 kthread+0x11c/0x120
	 ret_from_fork+0x10/0x30
	---[ end trace c8b7b552d3bb408c ]---

[CAUSE]
When we read the page range [0, 64K), we find it's a compressed extent,
and we will try to add extra pages in add_ra_bio_pages() to avoid
reading the same compressed extent again.

But when we add such a page into the read bio, it doesn't follow the
behavior of btrfs_do_readpage() and properly set subpage::readers.
This means that for page [64K, 128K), its subpage::readers is still 0.

When endio is executed on both pages, since page [64K, 128K) has 0
subpage::readers, it triggers the above ASSERT().

[FIX]
Function add_ra_bio_pages() is far from subpage compatible: it always
assumes PAGE_SIZE == sectorsize, thus when it skips to the next range it
always just skips PAGE_SIZE bytes.

Make it subpage compatible by:

- Skipping to the next page properly when needed
  If we find a page already in the page cache, we need to skip to the
  next page.  For that case, we shouldn't just skip PAGE_SIZE bytes, but
  use @pg_index to calculate the next bytenr and continue.

- Only adding the page range covered by the current extent map
  We need to calculate which range is covered by the current extent map
  and only add that part into the read bio.

- Updating subpage::readers before submitting the bio

- Using a proper cursor instead of the confusing @last_offset

- Calculating the missed threshold based on sector size
  It no longer counts missed pages, as with a 64K page size we have at
  most 3 pages to skip (only 2 if aligned).

- Adding an ASSERT() to make sure our bytenr is always aligned

- Adding a comment for the function
  Including a special note for the subpage case, as the function won't
  really work well for subpage.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 89 +++++++++++++++++++++++++++---------------
 fs/btrfs/extent_io.c   |  1 +
 2 files changed, 59 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index a26ce27ad005..09171e557ba8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -539,13 +539,24 @@ static u64 bio_end_offset(struct bio *bio)
 	return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
 }
 
+/*
+ * Add extra pages in the same compressed file extent so that we don't
+ * need to re-read the same extent again and again.
+ *
+ * NOTE: this won't work well for subpage, as for subpage read, we lock the
+ * full page then submit bio for each compressed/regular extents.
+ *
+ * This means that if several sectors in the same page point to the same
+ * on-disk compressed data, we will re-read the same extent many times,
+ * and this function can only help for the next page.
+ */
 static noinline int add_ra_bio_pages(struct inode *inode,
 				     u64 compressed_end,
 				     struct compressed_bio *cb)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	unsigned long end_index;
-	unsigned long pg_index;
-	u64 last_offset;
+	u64 cur = bio_end_offset(cb->orig_bio);
 	u64 isize = i_size_read(inode);
 	int ret;
 	struct page *page;
@@ -553,10 +564,8 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_map_tree *em_tree;
 	struct extent_io_tree *tree;
-	u64 end;
-	int misses = 0;
+	int sectors_missed = 0;
 
-	last_offset = bio_end_offset(cb->orig_bio);
 	em_tree = &BTRFS_I(inode)->extent_tree;
 	tree = &BTRFS_I(inode)->io_tree;
 
@@ -575,18 +584,29 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 
 	end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
 
-	while (last_offset < compressed_end) {
-		pg_index = last_offset >> PAGE_SHIFT;
+	while (cur < compressed_end) {
+		u64 page_end;
+		u64 pg_index = cur >> PAGE_SHIFT;
+		u32 add_size;
 
 		if (pg_index > end_index)
 			break;
 
 		page = xa_load(&mapping->i_pages, pg_index);
 		if (page && !xa_is_value(page)) {
-			misses++;
-			if (misses > 4)
+			sectors_missed += (PAGE_SIZE - offset_in_page(cur)) >>
+					  fs_info->sectorsize_bits;
+
+			/* Beyond threshold, no need to continue */
+			if (sectors_missed > 4)
 				break;
-			goto next;
+
+			/*
+			 * Jump to next page start as we already have page for
+			 * current offset.
			 */
+			cur = (pg_index << PAGE_SHIFT) + PAGE_SIZE;
+			continue;
 		}
 
 		page = __page_cache_alloc(mapping_gfp_constraint(mapping,
@@ -596,14 +616,11 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 
 		if (add_to_page_cache_lru(page, mapping, pg_index, GFP_NOFS)) {
 			put_page(page);
-			goto next;
+			/* There is already a page, skip to page end */
+			cur = (pg_index << PAGE_SHIFT) + PAGE_SIZE;
+			continue;
 		}
 
-		/*
-		 * at this point, we have a locked page in the page cache
-		 * for these bytes in the file.  But, we have to make
-		 * sure they map to this compressed extent on disk.
-		 */
 		ret = set_page_extent_mapped(page);
 		if (ret < 0) {
 			unlock_page(page);
@@ -611,18 +628,22 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			break;
 		}
 
-		end = last_offset + PAGE_SIZE - 1;
-		lock_extent(tree, last_offset, end);
+		page_end = (pg_index << PAGE_SHIFT) + PAGE_SIZE - 1;
+		lock_extent(tree, cur, page_end);
 		read_lock(&em_tree->lock);
-		em = lookup_extent_mapping(em_tree, last_offset,
-					   PAGE_SIZE);
+		em = lookup_extent_mapping(em_tree, cur, page_end + 1 - cur);
 		read_unlock(&em_tree->lock);
 
-		if (!em || last_offset < em->start ||
-		    (last_offset + PAGE_SIZE > extent_map_end(em)) ||
+		/*
+		 * At this point, we have a locked page in the page cache
+		 * for these bytes in the file.  But, we have to make
+		 * sure they map to this compressed extent on disk.
+		 */
+		if (!em || cur < em->start ||
+		    (cur + fs_info->sectorsize > extent_map_end(em)) ||
 		    (em->block_start >> 9) != cb->orig_bio->bi_iter.bi_sector) {
 			free_extent_map(em);
-			unlock_extent(tree, last_offset, end);
+			unlock_extent(tree, cur, page_end);
 			unlock_page(page);
 			put_page(page);
 			break;
 		}
@@ -640,19 +661,25 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			}
 		}
 
+		add_size = min(em->start + em->len, page_end + 1) - cur;
 		ret = bio_add_page(cb->orig_bio, page,
-				   PAGE_SIZE, 0);
-
-		if (ret == PAGE_SIZE) {
-			put_page(page);
-		} else {
-			unlock_extent(tree, last_offset, end);
+				   add_size, offset_in_page(cur));
+		if (ret != add_size) {
+			unlock_extent(tree, cur, page_end);
 			unlock_page(page);
 			put_page(page);
 			break;
 		}
-next:
-		last_offset += PAGE_SIZE;
+
+		/*
+		 * If it's subpage, we also need to increase its
+		 * subpage::readers number, as at endio we will decrease
+		 * subpage::readers and unlock the page.
		 */
+		if (fs_info->sectorsize < PAGE_SIZE)
+			btrfs_subpage_start_reader(fs_info, page,
+						   cur, add_size);
+		put_page(page);
+		cur += add_size;
 	}
 	return 0;
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2806949c6243..3214ecb36cf3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3599,6 +3599,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	bool force_bio_submit = false;
 	u64 disk_bytenr;
 
+	ASSERT(IS_ALIGNED(cur, fs_info->sectorsize));
 	if (cur >= last_byte) {
 		struct extent_state *cached = NULL;
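To see the cursor arithmetic of the reworked loop in isolation: @cur
advances in sector-aligned steps, each clamped to the end of the page
that contains it. A standalone sketch with assumed sizes (64K pages, 4K
sectors; extent-map clamping omitted, not btrfs code):

	#include <stdint.h>
	#include <stdio.h>

	#define PAGE_SZ		65536u	/* e.g. 64K pages on arm64/ppc64 */
	#define SECTORSIZE	4096u	/* sectorsize smaller than page size */

	int main(void)
	{
		uint64_t cur = 10 * SECTORSIZE;	/* sector-aligned file offset */
		uint64_t end = 3 * PAGE_SZ;	/* exclusive end of the range */

		while (cur < end) {
			/* Inclusive end of the page containing @cur. */
			uint64_t page_end = (cur & ~(uint64_t)(PAGE_SZ - 1)) +
					    PAGE_SZ - 1;
			/* Add at most up to the page boundary. */
			uint64_t add_size = page_end + 1 - cur;

			if (add_size > end - cur)
				add_size = end - cur;
			printf("add [%llu, %llu) (%llu sectors)\n",
			       (unsigned long long)cur,
			       (unsigned long long)(cur + add_size),
			       (unsigned long long)(add_size / SECTORSIZE));
			cur += add_size;
		}
		return 0;
	}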
From patchwork Sun Aug 29 05:24:38 2021
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 06/26] btrfs: introduce compressed_bio::pending_sectors to trace compressed bio more elegantly
Date: Sun, 29 Aug 2021 13:24:38 +0800
Message-Id: <20210829052458.15454-7-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

For btrfs_submit_compressed_read() and btrfs_submit_compressed_write(),
we have a pretty weird dance around compressed_bio::pending_bios:

	btrfs_submit_compressed_read/write()
	{
		cb = kmalloc()
		refcount_set(&cb->pending_bios, 0);
		bio = btrfs_alloc_bio();

		/* NOTE here, we haven't yet submitted any bio */
		refcount_set(&cb->pending_bios, 1);

		for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
			if (submit) {
				/*
				 * Here we submit bio, but we always have one
				 * extra pending_bios
				 */
				refcount_inc(&cb->pending_bios);
				ret = btrfs_map_bio();
			}
		}

		/* Submit the last bio */
		ret = btrfs_map_bio();
	}

There are two reasons why we do this:

- compressed_bio::pending_bios is a refcount
  Thus if it's reduced to 0, it can not be increased again.

- To ensure the compressed_bio is not freed by some submitted bios
  If a submitted bio finishes before the next bio is submitted, it could
  otherwise free the compressed_bio completely.

But the above code is sometimes confusing, and we can do better by just
introducing a new member, compressed_bio::pending_sectors.

Now we use compressed_bio::pending_sectors to indicate whether we have
any pending sectors under IO or not yet submitted.  If pending_sectors
reaches 0, we're definitely handling the last bio of the compressed_bio,
and it is OK to release the compressed bio.

Now the workflow looks like this:

	btrfs_submit_compressed_read/write()
	{
		cb = kmalloc()
		atomic_set(&cb->pending_bios, 0);
		refcount_set(&cb->pending_sectors,
			     compressed_len >> sectorsize_bits);
		bio = btrfs_alloc_bio();

		for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
			if (submit) {
				atomic_inc(&cb->pending_bios);
				ret = btrfs_map_bio();
			}
		}

		/* Submit the last bio */
		atomic_inc(&cb->pending_bios);
		ret = btrfs_map_bio();
	}

For now we still need pending_bios for later error handling, but it will
be removed eventually once errors are handled properly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 78 ++++++++++++++++++++++++------------------
 fs/btrfs/compression.h |  5 ++-
 2 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 09171e557ba8..ad5064fe81a5 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -192,6 +192,39 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 	return 0;
 }
 
+/*
+ * Reduce bio and io accounting for a compressed_bio with its corresponding bio.
+ *
+ * Return true if there is no pending bio nor io.
+ * Return false otherwise.
+ */
+static bool dec_and_test_compressed_bio(struct compressed_bio *cb,
+					struct bio *bio)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+	unsigned int bi_size = 0;
+	bool last_io = false;
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+
+	/*
+	 * At endio time, bi_iter.bi_size doesn't represent the real bio size.
+	 * Thus here we have to iterate through all segments to grab correct
+	 * bio size.
	 */
+	bio_for_each_segment_all(bvec, bio, iter_all)
+		bi_size += bvec->bv_len;
+
+	if (bio->bi_status)
+		cb->errors = 1;
+
+	ASSERT(bi_size && bi_size <= cb->compressed_len);
+	last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits,
+					&cb->pending_sectors);
+	atomic_dec(&cb->pending_bios);
+	return last_io;
+}
+
 /* when we finish reading compressed pages from the disk, we
  * decompress them and then run the bio end_io routines on the
  * decompressed pages (in the inode address space).
@@ -211,13 +244,7 @@ static void end_compressed_bio_read(struct bio *bio)
 	unsigned int mirror = btrfs_io_bio(bio)->mirror_num;
 	int ret = 0;
 
-	if (bio->bi_status)
-		cb->errors = 1;
-
-	/* if there are more bios still pending for this compressed
-	 * extent, just exit
-	 */
-	if (!refcount_dec_and_test(&cb->pending_bios))
+	if (!dec_and_test_compressed_bio(cb, bio))
 		goto out;
 
 	/*
@@ -335,13 +362,7 @@ static void end_compressed_bio_write(struct bio *bio)
 	struct page *page;
 	unsigned int index;
 
-	if (bio->bi_status)
-		cb->errors = 1;
-
-	/* if there are more bios still pending for this compressed
-	 * extent, just exit
-	 */
-	if (!refcount_dec_and_test(&cb->pending_bios))
+	if (!dec_and_test_compressed_bio(cb, bio))
 		goto out;
 
 	/* ok, we're the last bio for this extent, step one is to
@@ -407,7 +428,9 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
 	if (!cb)
 		return BLK_STS_RESOURCE;
-	refcount_set(&cb->pending_bios, 0);
+	atomic_set(&cb->pending_bios, 0);
+	refcount_set(&cb->pending_sectors,
+		     compressed_len >> fs_info->sectorsize_bits);
 	cb->errors = 0;
 	cb->inode = &inode->vfs_inode;
 	cb->start = start;
@@ -440,7 +463,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		bio->bi_opf |= REQ_CGROUP_PUNT;
 		kthread_associate_blkcg(blkcg_css);
 	}
-	refcount_set(&cb->pending_bios, 1);
 
 	/* create and submit bios for the compressed pages */
 	bytes_left = compressed_len;
@@ -468,13 +490,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		page->mapping = NULL;
 		if (submit || len < PAGE_SIZE) {
-			/*
-			 * inc the count before we submit the bio so
-			 * we know the end IO handler won't happen before
-			 * we inc the count.  Otherwise, the cb might get
-			 * freed before we're done setting it up
-			 */
-			refcount_inc(&cb->pending_bios);
+			atomic_inc(&cb->pending_bios);
 			ret = btrfs_bio_wq_end_io(fs_info, bio,
 						  BTRFS_WQ_ENDIO_DATA);
 			BUG_ON(ret); /* -ENOMEM */
@@ -512,6 +528,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		cond_resched();
 	}
 
+	atomic_inc(&cb->pending_bios);
 	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
 	BUG_ON(ret); /* -ENOMEM */
 
@@ -733,7 +750,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	if (!cb)
 		goto out;
 
-	refcount_set(&cb->pending_bios, 0);
+	atomic_set(&cb->pending_bios, 0);
+	refcount_set(&cb->pending_sectors,
+		     compressed_len >> fs_info->sectorsize_bits);
 	cb->errors = 0;
 	cb->inode = inode;
 	cb->mirror_num = mirror_num;
@@ -777,7 +796,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	comp_bio->bi_opf = REQ_OP_READ;
 	comp_bio->bi_private = cb;
 	comp_bio->bi_end_io = end_compressed_bio_read;
-	refcount_set(&cb->pending_bios, 1);
 
 	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
 		u32 pg_len = PAGE_SIZE;
@@ -806,18 +824,11 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		if (submit || bio_add_page(comp_bio, page, pg_len, 0) < pg_len) {
 			unsigned int nr_sectors;
 
+			atomic_inc(&cb->pending_bios);
 			ret = btrfs_bio_wq_end_io(fs_info, comp_bio,
 						  BTRFS_WQ_ENDIO_DATA);
 			BUG_ON(ret); /* -ENOMEM */
 
-			/*
-			 * inc the count before we submit the bio so
-			 * we know the end IO handler won't happen before
-			 * we inc the count.  Otherwise, the cb might get
-			 * freed before we're done setting it up
-			 */
-			refcount_inc(&cb->pending_bios);
-
 			ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 			BUG_ON(ret); /* -ENOMEM */
 
@@ -841,6 +852,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		cur_disk_byte += pg_len;
 	}
 
+	atomic_inc(&cb->pending_bios);
 	ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA);
 	BUG_ON(ret); /* -ENOMEM */
 
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 399be0b435bf..61955581e34f 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -29,7 +29,10 @@ struct btrfs_inode;
 
 struct compressed_bio {
 	/* number of bios pending for this compressed extent */
-	refcount_t pending_bios;
+	atomic_t pending_bios;
+
+	/* Number of sectors with unfinished IO (unsubmitted or unfinished) */
+	refcount_t pending_sectors;
 
 	/* Number of compressed pages in the array */
 	unsigned int nr_pages;
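The accounting rule introduced here can be illustrated on its own:
pending_sectors is set once to the total sector count before any
submission, each finished bio subtracts its own sector count, and only
the subtraction that reaches zero may free the structure. A minimal
userspace sketch (plain integers stand in for refcount_t, so no
concurrency here; not the kernel API):

	#include <stdbool.h>
	#include <stdio.h>

	#define SECTORSIZE_BITS 12	/* 4K sectors */

	struct cb_stub {
		unsigned int pending_sectors;
	};

	/* Subtract the finished bio's sectors; true means this was the last IO. */
	static bool dec_and_test(struct cb_stub *cb, unsigned int bio_bytes)
	{
		cb->pending_sectors -= bio_bytes >> SECTORSIZE_BITS;
		return cb->pending_sectors == 0;
	}

	int main(void)
	{
		/* 128K of compressed data = 32 sectors, set before any submit. */
		struct cb_stub cb = {
			.pending_sectors = (128 * 1024) >> SECTORSIZE_BITS,
		};

		/* Two bios finish: 96K then 32K; only the second is "last". */
		printf("%d\n", dec_and_test(&cb, 96 * 1024));	/* 0: not last */
		printf("%d\n", dec_and_test(&cb, 32 * 1024));	/* 1: last, free cb */
		return 0;
	}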
From patchwork Sun Aug 29 05:24:39 2021
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 07/26] btrfs: add subpage checked bitmap to make the PageChecked flag subpage compatible
Date: Sun, 29 Aug 2021 13:24:39 +0800
Message-Id: <20210829052458.15454-8-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

Although btrfs makes very limited use of the PageChecked flag, it is
still a page flag that is not yet subpage compatible.

Fix it by introducing btrfs_subpage::checked_offset to do the
conversion.

Most call sites, especially the free-space cache, COW fixup and
btrfs_invalidatepage(), work in full page mode anyway; the other call
sites work in subpage compatible mode.

Some call sites need extra modification:

- btrfs_drop_pages()
  Needs extra parameters to get the real range we need to clear the
  checked flag on.
  Also, since btrfs_drop_pages() will accept pages beyond the dirtied
  range, update btrfs_subpage_clamp_range() to handle such a case by
  setting @len to 0 if the page is beyond the target range.

- btrfs_invalidatepage()
  We need to call the subpage helper before calling
  __btrfs_releasepage(), or it will trigger an ASSERT() as
  page->private will already be cleared.

- btrfs_verify_data_csum()
  In theory we don't need the io_bio->csum check anymore, but it won't
  hurt.  Just change the comment.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c      | 11 +++++++--
 fs/btrfs/file.c             | 20 ++++++++++++----
 fs/btrfs/free-space-cache.c |  6 ++++-
 fs/btrfs/inode.c            | 30 +++++++++++------------
 fs/btrfs/reflink.c          |  2 +-
 fs/btrfs/subpage.c          | 48 +++++++++++++++++++++++++++++++++++--
 fs/btrfs/subpage.h          |  2 ++
 7 files changed, 92 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ad5064fe81a5..79379fa741b8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -29,6 +29,7 @@
 #include "extent_io.h"
 #include "extent_map.h"
 #include "zoned.h"
+#include "subpage.h"
 
 static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
 
@@ -296,8 +297,14 @@ static void end_compressed_bio_read(struct bio *bio)
 	 * checked so the end_io handlers know about it
	 */
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, cb->orig_bio, iter_all)
-		SetPageChecked(bvec->bv_page);
+	bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) {
+		u64 bvec_start = page_offset(bvec->bv_page) +
+				 bvec->bv_offset;
+
+		btrfs_page_set_checked(btrfs_sb(cb->inode->i_sb),
+				       bvec->bv_page, bvec_start,
+				       bvec->bv_len);
+	}
 
 	bio_endio(cb->orig_bio);
 }
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7ff577005d0f..03986354a0cc 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -437,9 +437,16 @@ static noinline int btrfs_copy_from_user(loff_t pos, size_t write_bytes,
 /*
  * unlocks pages after btrfs_file_write is done with them
  */
-static void btrfs_drop_pages(struct page **pages, size_t num_pages)
+static void btrfs_drop_pages(struct btrfs_fs_info *fs_info,
+			     struct page **pages, size_t num_pages,
+			     u64 pos, u64 copied)
 {
 	size_t i;
+	u64 block_start = round_down(pos, fs_info->sectorsize);
+	u64 block_len = round_up(pos + copied, fs_info->sectorsize) -
+			block_start;
+
+	ASSERT(block_len <= U32_MAX);
 	for (i = 0; i < num_pages; i++) {
 		/* page checked is some magic around finding pages that
 		 * have been modified without going through btrfs_set_page_dirty
@@ -447,7 +454,8 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 		 * accessed as prepare_pages should have marked them accessed
 		 * in prepare_pages via find_or_create_page()
		 */
-		ClearPageChecked(pages[i]);
+		btrfs_page_clamp_clear_checked(fs_info, pages[i], block_start,
+					       block_len);
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
@@ -504,7 +512,8 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		struct page *p = pages[i];
 
 		btrfs_page_clamp_set_uptodate(fs_info, p, start_pos, num_bytes);
-		ClearPageChecked(p);
+		btrfs_page_clamp_clear_checked(fs_info, p, start_pos,
+					       num_bytes);
 		btrfs_page_clamp_set_dirty(fs_info, p, start_pos, num_bytes);
 	}
 
@@ -1845,7 +1854,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
 		if (ret) {
-			btrfs_drop_pages(pages, num_pages);
+			btrfs_drop_pages(fs_info, pages, num_pages, pos,
+					 copied);
 			break;
 		}
 
@@ -1853,7 +1863,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		if (only_release_metadata)
 			btrfs_check_nocow_unlock(BTRFS_I(inode));
 
-		btrfs_drop_pages(pages, num_pages);
+		btrfs_drop_pages(fs_info, pages, num_pages, pos, copied);
 
 		cond_resched();
 
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0d26819b1cf6..f3fee88c8ee0 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -22,6 +22,7 @@
 #include "delalloc-space.h"
 #include "block-group.h"
 #include "discard.h"
+#include "subpage.h"
 
 #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
 #define MAX_CACHE_BYTES_PER_GIG	SZ_64K
@@ -411,7 +412,10 @@ static void io_ctl_drop_pages(struct btrfs_io_ctl *io_ctl)
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
 		if (io_ctl->pages[i]) {
-			ClearPageChecked(io_ctl->pages[i]);
+			btrfs_page_clear_checked(io_ctl->fs_info,
+						 io_ctl->pages[i],
+						 page_offset(io_ctl->pages[i]),
+						 PAGE_SIZE);
 			unlock_page(io_ctl->pages[i]);
 			put_page(io_ctl->pages[i]);
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6760ec78df6b..fdcb1b0fabb8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2759,7 +2759,8 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		clear_page_dirty_for_io(page);
 		SetPageError(page);
 	}
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(inode->root->fs_info, page,
+				 page_start, PAGE_SIZE);
 	unlock_page(page);
 	put_page(page);
 	kfree(fixup);
@@ -2814,7 +2815,7 @@ int btrfs_writepage_cow_fixup(struct page *page)
 	 * page->mapping outside of the page lock.
	 */
 	ihold(inode);
-	SetPageChecked(page);
+	btrfs_page_set_checked(fs_info, page, page_offset(page), PAGE_SIZE);
 	get_page(page);
 	btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
 	fixup->page = page;
@@ -3263,27 +3264,23 @@ unsigned int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
 			   struct page *page, u64 start, u64 end)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	const u32 sectorsize = root->fs_info->sectorsize;
 	u32 pg_off;
 	unsigned int result = 0;
 
-	if (PageChecked(page)) {
-		ClearPageChecked(page);
+	if (btrfs_page_test_checked(fs_info, page, start, end + 1 - start)) {
+		btrfs_page_clear_checked(fs_info, page, start, end + 1 - start);
 		return 0;
 	}
 
 	/*
-	 * For subpage case, above PageChecked is not safe as it's not subpage
-	 * compatible.
-	 * But for now only cow fixup and compressed read utilize PageChecked
-	 * flag, while in this context we can easily use io_bio->csum to
-	 * determine if we really need to do csum verification.
-	 *
-	 * So for now, just exit if io_bio->csum is NULL, as it means it's
-	 * compressed read, and its compressed data csum has already been
-	 * verified.
+	 * This only happens for NODATASUM or compressed read.
+	 * Normally this should be covered by above check for compressed read
+	 * or the next check for NODATASUM.  Just do a quicker exit here.
	 */
 	if (io_bio->csum == NULL)
 		return 0;
@@ -5104,7 +5101,8 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 				len);
 		flush_dcache_page(page);
 	}
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(fs_info, page, block_start,
+				 block_end + 1 - block_start);
 	btrfs_page_set_dirty(fs_info, page, block_start,
 			     block_end + 1 - block_start);
 	unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
@@ -8694,9 +8692,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	 * did something wrong.
	 */
 	ASSERT(!PageOrdered(page));
+	btrfs_page_clear_checked(fs_info, page, page_offset(page), PAGE_SIZE);
 	if (!inode_evicting)
 		__btrfs_releasepage(page, GFP_NOFS);
-	ClearPageChecked(page);
 	clear_page_extent_mapped(page);
 }
 
@@ -8840,7 +8838,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 		memzero_page(page, zero_start, PAGE_SIZE - zero_start);
 		flush_dcache_page(page);
 	}
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(fs_info, page, page_start, PAGE_SIZE);
 	btrfs_page_set_dirty(fs_info, page, page_start, end + 1 - page_start);
 	btrfs_page_set_uptodate(fs_info, page, page_start,
 				end + 1 - page_start);
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index c71e49782e86..e0f93b357548 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -138,7 +138,7 @@ static int copy_inline_to_page(struct btrfs_inode *inode,
 	}
 
 	btrfs_page_set_uptodate(fs_info, page, file_offset, block_size);
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(fs_info, page, file_offset, block_size);
 	btrfs_page_set_dirty(fs_info, page, file_offset, block_size);
 
 out_unlock:
 	if (page) {
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 51f873a680ea..2bea6766d84e 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -88,6 +88,9 @@ void btrfs_init_subpage_info(struct btrfs_subpage_info *subpage_info, u32 sectorsize)
 	subpage_info->ordered_offset = cur;
 	cur += nr_bits;
 
+	subpage_info->checked_offset = cur;
+	cur += nr_bits;
+
 	subpage_info->total_nr_bits = cur;
 }
 
@@ -255,8 +258,16 @@ static void btrfs_subpage_clamp_range(struct page *page, u64 *start, u32 *len)
 	u32 orig_len = *len;
 
 	*start = max_t(u64, page_offset(page), orig_start);
-	*len = min_t(u64, page_offset(page) + PAGE_SIZE,
-		     orig_start + orig_len) - *start;
+	/*
+	 * For certain call sites like btrfs_drop_pages(), we may have pages
+	 * beyond the target range. In that case, just set @len to 0, subpage
+	 * helpers can handle @len == 0 without any problem.
	 */
+	if (page_offset(page) >= orig_start + orig_len)
+		*len = 0;
+	else
+		*len = min_t(u64, page_offset(page) + PAGE_SIZE,
+			     orig_start + orig_len) - *start;
 }
 
 void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info,
@@ -532,6 +543,36 @@ void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info,
 	ClearPageOrdered(page);
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
+
+void btrfs_subpage_set_checked(const struct btrfs_fs_info *fs_info,
+			       struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
+							checked, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
+	if (subpage_test_bitmap_all_set(fs_info, subpage, checked))
+		SetPageChecked(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+void btrfs_subpage_clear_checked(const struct btrfs_fs_info *fs_info,
+				 struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
+							checked, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
+	ClearPageChecked(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -557,6 +598,7 @@ IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered); +IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(checked); /* * Note that, in selftests (extent-io-tests), we can have empty fs_info passed @@ -627,6 +669,8 @@ IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback, PageWriteback); IMPLEMENT_BTRFS_PAGE_OPS(ordered, SetPageOrdered, ClearPageOrdered, PageOrdered); +IMPLEMENT_BTRFS_PAGE_OPS(checked, SetPageChecked, ClearPageChecked, + PageChecked); /* * Make sure not only the page dirty bit is cleared, but also subpage dirty bit diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index ac4dd64ed257..46224f959c34 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -36,6 +36,7 @@ struct btrfs_subpage_info { unsigned int dirty_offset; unsigned int writeback_offset; unsigned int ordered_offset; + unsigned int checked_offset; }; /* @@ -142,6 +143,7 @@ DECLARE_BTRFS_SUBPAGE_OPS(error); DECLARE_BTRFS_SUBPAGE_OPS(dirty); DECLARE_BTRFS_SUBPAGE_OPS(writeback); DECLARE_BTRFS_SUBPAGE_OPS(ordered); +DECLARE_BTRFS_SUBPAGE_OPS(checked); bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info, struct page *page, u64 start, u32 len); From patchwork Sun Aug 29 05:24:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463707 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C19AC4320E for ; Sun, 29 Aug 2021 05:25:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EAF7C606A5 for ; Sun, 29 Aug 2021 05:25:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234699AbhH2F0G (ORCPT ); Sun, 29 Aug 2021 01:26:06 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45776 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234575AbhH2F0D (ORCPT ); Sun, 29 Aug 2021 01:26:03 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 946571FD50 for ; Sun, 29 Aug 2021 05:25:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214711; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZNG3ycTpxf7rBVI9XbsCzEdlEEFUlF8M6UOoiNGIvNw=; b=JlGBmK5POef9uFfm0WK8BG5aRT3VLehfOw8viAU6nefUwNjcoAC70Nia5JdtsXNQRFxf1f IsQF8xZaWR4NeW3R1OxS+Pr27/ZyQuvj2K0azwnSyifi/33V7827Ye5awG8IgjvlicutWa aIQDGuYxmD0HitFJhjmovFFOUCxCW+E= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) 
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 08/26] btrfs: handle errors properly inside btrfs_submit_compressed_read()
Date: Sun, 29 Aug 2021 13:24:40 +0800
Message-Id: <20210829052458.15454-9-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

There are quite a few BUG_ON()s inside btrfs_submit_compressed_read(); namely, all errors inside the for() loop rely on BUG_ON() to handle -ENOMEM.

Handle these errors properly by:

- Waiting for submitted bios to finish first
  Use the wait_var_event()/wake_up_var() APIs to wait without introducing
  extra memory overhead inside compressed_bio. This allows us to wait for
  any submitted bio to finish, while still keeping the compressed_bio from
  being freed.

- Introducing finish_compressed_bio_read() to finish the compressed_bio

- Properly ending the bio and finishing the compressed_bio when an error
  happens

Now in btrfs_submit_compressed_read(), even when bio submission fails, we can handle the error properly without triggering BUG_ON().

Signed-off-by: Qu Wenruo
---
 fs/btrfs/compression.c | 133 +++++++++++++++++++++++++----------------
 1 file changed, 83 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 79379fa741b8..d4b46d46b249 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -223,9 +223,60 @@ static bool dec_and_test_compressed_bio(struct compressed_bio *cb,
 	last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits,
 					&cb->pending_sectors);
 	atomic_dec(&cb->pending_bios);
+	/*
+	 * Here we must wake up the possible error handler after all other
+	 * operations on @cb finished, or we can race with
+	 * finish_compressed_bio_*() which may free @cb.
+	 */
+	wake_up_var(cb);
+
 	return last_io;
 }
 
+static void finish_compressed_bio_read(struct compressed_bio *cb,
+				       struct bio *bio)
+{
+	unsigned int index;
+	struct page *page;
+
+	/* Release the compressed pages */
+	for (index = 0; index < cb->nr_pages; index++) {
+		page = cb->compressed_pages[index];
+		page->mapping = NULL;
+		put_page(page);
+	}
+
+	/* Do io completion on the original bio */
+	if (cb->errors) {
+		bio_io_error(cb->orig_bio);
+	} else {
+		struct bio_vec *bvec;
+		struct bvec_iter_all iter_all;
+
+		ASSERT(bio);
+		ASSERT(!bio->bi_status);
+		/*
+		 * We have verified the checksum already, set page
+		 * checked so the end_io handlers know about it
+		 */
+		ASSERT(!bio_flagged(bio, BIO_CLONED));
+		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) {
+			u64 bvec_start = page_offset(bvec->bv_page) +
+					 bvec->bv_offset;
+
+			btrfs_page_set_checked(btrfs_sb(cb->inode->i_sb),
+					       bvec->bv_page, bvec_start,
+					       bvec->bv_len);
+		}
+
+		bio_endio(cb->orig_bio);
+	}
+
+	/* Finally free the cb struct */
+	kfree(cb->compressed_pages);
+	kfree(cb);
+}
+
 /* when we finish reading compressed pages from the disk, we
  * decompress them and then run the bio end_io routines on the
  * decompressed pages (in the inode address space).
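The synchronization that makes the new error path safe pairs the wake_up_var() added above with a wait_var_event() in the submission path. Condensed from the hunks of this patch, the two sides look like this:

	/* Completion side, runs once per finished bio */
	last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits,
					&cb->pending_sectors);
	atomic_dec(&cb->pending_bios);
	wake_up_var(cb);	/* Must be the last access to @cb here */

	/* Error side, after ending the bio that failed to submit */
	wait_var_event(cb, atomic_read(&cb->pending_bios) == 0);
	finish_compressed_bio_read(cb, NULL);	/* We hold the only reference */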
@@ -240,8 +291,6 @@ static void end_compressed_bio_read(struct bio *bio) { struct compressed_bio *cb = bio->bi_private; struct inode *inode; - struct page *page; - unsigned int index; unsigned int mirror = btrfs_io_bio(bio)->mirror_num; int ret = 0; @@ -276,42 +325,7 @@ static void end_compressed_bio_read(struct bio *bio) csum_failed: if (ret) cb->errors = 1; - - /* release the compressed pages */ - index = 0; - for (index = 0; index < cb->nr_pages; index++) { - page = cb->compressed_pages[index]; - page->mapping = NULL; - put_page(page); - } - - /* do io completion on the original bio */ - if (cb->errors) { - bio_io_error(cb->orig_bio); - } else { - struct bio_vec *bvec; - struct bvec_iter_all iter_all; - - /* - * we have verified the checksum already, set page - * checked so the end_io handlers know about it - */ - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) { - u64 bvec_start = page_offset(bvec->bv_page) + - bvec->bv_offset; - - btrfs_page_set_checked(btrfs_sb(cb->inode->i_sb), - bvec->bv_page, bvec_start, - bvec->bv_len); - } - - bio_endio(cb->orig_bio); - } - - /* finally free the cb struct */ - kfree(cb->compressed_pages); - kfree(cb); + finish_compressed_bio_read(cb, bio); out: bio_put(bio); } @@ -834,20 +848,20 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, atomic_inc(&cb->pending_bios); ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto finish_cb; ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto finish_cb; nr_sectors = DIV_ROUND_UP(comp_bio->bi_iter.bi_size, fs_info->sectorsize); sums += fs_info->csum_size * nr_sectors; ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); - if (ret) { - comp_bio->bi_status = ret; - bio_endio(comp_bio); - } + if (ret) + goto finish_cb; comp_bio = btrfs_bio_alloc(cur_disk_byte); comp_bio->bi_opf = REQ_OP_READ; @@ -861,16 +875,16 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, atomic_inc(&cb->pending_bios); ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto last_bio; ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto last_bio; ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); - if (ret) { - comp_bio->bi_status = ret; - bio_endio(comp_bio); - } + if (ret) + goto last_bio; return 0; @@ -886,6 +900,25 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, out: free_extent_map(em); return ret; +last_bio: + comp_bio->bi_status = ret; + /* This is the last bio, endio functions will free @cb */ + bio_endio(comp_bio); + return ret; +finish_cb: + if (comp_bio) { + comp_bio->bi_status = ret; + bio_endio(comp_bio); + } + wait_var_event(cb, atomic_read(&cb->pending_bios) == 0); + /* + * Even with previous bio ended, we should still have io not yet + * submitted, thus need to finish @cb manually. + */ + ASSERT(refcount_read(&cb->pending_sectors)); + /* Now we are the only one referring @cb, can finish it safely. 
*/ + finish_compressed_bio_read(cb, NULL); + return ret; } /* From patchwork Sun Aug 29 05:24:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463709 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 356CDC432BE for ; Sun, 29 Aug 2021 05:25:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D5D560F3A for ; Sun, 29 Aug 2021 05:25:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234701AbhH2F0G (ORCPT ); Sun, 29 Aug 2021 01:26:06 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45782 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234689AbhH2F0F (ORCPT ); Sun, 29 Aug 2021 01:26:05 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C66AF20017 for ; Sun, 29 Aug 2021 05:25:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214712; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f47k+XJMMarp5qbcN7fJZw5wlCEvjLZiAXuVW250Vrk=; b=sj3HwCSAdue6V/G52deMxxGY/fASZ2UFj+unvIwDwePBK4nS8jJZk5LZcycb7nmOz2G30s N08WC/9UZVWCg28Pot6oMZUKdPoer2dbUe8IiGILgIwV7leMi17qcxQZeHvoADWy8kDKwe Coxp/ZDTOABXK32oEyMmhiHLQUBcerU= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 1159E13964 for ; Sun, 29 Aug 2021 05:25:11 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id YGZnMTcaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:11 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 09/26] btrfs: handle errors properly inside btrfs_submit_compressed_write() Date: Sun, 29 Aug 2021 13:24:41 +0800 Message-Id: <20210829052458.15454-10-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Just like btrfs_submit_compressed_read(), there are quite some BUG_ON()s inside btrfs_submit_compressed_write() for the bio submission path. 
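For reference, the pattern being removed is the one visible in the hunks below, where every step of bio submission asserts success instead of unwinding:

	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
	BUG_ON(ret); /* -ENOMEM */

	if (!skip_sum) {
		ret = btrfs_csum_one_bio(inode, bio, start, 1);
		BUG_ON(ret); /* -ENOMEM */
	}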
Fix them using the same method:

- For the last bio, just end it
  As in that case, one of the endio functions of the submitted bios will
  be able to free the compressed_bio.

- For a half-submitted bio, wait and finish the compressed_bio manually
  In this case, as long as all other bios finish, we're the only one
  referencing the compressed_bio, and can manually finish it.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/compression.c | 99 +++++++++++++++++++++++++---------------
 1 file changed, 63 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d4b46d46b249..ece4d9ab3c65 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -368,50 +368,56 @@ static noinline void end_compressed_writeback(struct inode *inode,
 	/* the inode may be gone now */
 }
 
-/*
- * do the cleanup once all the compressed pages hit the disk.
- * This will clear writeback on the file pages and free the compressed
- * pages.
- *
- * This also calls the writeback end hooks for the file pages so that
- * metadata and checksums can be updated in the file.
- */
-static void end_compressed_bio_write(struct bio *bio)
+static void finish_compressed_bio_write(struct compressed_bio *cb)
 {
-	struct compressed_bio *cb = bio->bi_private;
-	struct inode *inode;
-	struct page *page;
+	struct inode *inode = cb->inode;
 	unsigned int index;
 
-	if (!dec_and_test_compressed_bio(cb, bio))
-		goto out;
-
-	/* ok, we're the last bio for this extent, step one is to
-	 * call back into the FS and do all the end_io operations
+	/*
+	 * Ok, we're the last bio for this extent, step one is to
+	 * call back into the FS and do all the end_io operations.
 	 */
-	inode = cb->inode;
-	btrfs_record_physical_zoned(inode, cb->start, bio);
 	btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL,
 			cb->start, cb->start + cb->len - 1,
 			!cb->errors);
 
 	end_compressed_writeback(inode, cb);
-	/* note, our inode could be gone now */
+	/* Note, our inode could be gone now */
 
 	/*
-	 * release the compressed pages, these came from alloc_page and
+	 * Release the compressed pages, these came from alloc_page and
 	 * are not attached to the inode at all
 	 */
-	index = 0;
 	for (index = 0; index < cb->nr_pages; index++) {
-		page = cb->compressed_pages[index];
+		struct page *page = cb->compressed_pages[index];
+
 		page->mapping = NULL;
 		put_page(page);
 	}
 
-	/* finally free the cb struct */
+	/* Finally free the cb struct */
 	kfree(cb->compressed_pages);
 	kfree(cb);
+}
+
+/*
+ * do the cleanup once all the compressed pages hit the disk.
+ * This will clear writeback on the file pages and free the compressed
+ * pages.
+ *
+ * This also calls the writeback end hooks for the file pages so that
+ * metadata and checksums can be updated in the file.
+ */ +static void end_compressed_bio_write(struct bio *bio) +{ + struct compressed_bio *cb = bio->bi_private; + + if (!dec_and_test_compressed_bio(cb, bio)) + goto out; + + btrfs_record_physical_zoned(cb->inode, cb->start, bio); + + finish_compressed_bio_write(cb); out: bio_put(bio); } @@ -514,18 +520,18 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, atomic_inc(&cb->pending_bios); ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto finish_cb; if (!skip_sum) { ret = btrfs_csum_one_bio(inode, bio, start, 1); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto finish_cb; } ret = btrfs_map_bio(fs_info, bio, 0); - if (ret) { - bio->bi_status = ret; - bio_endio(bio); - } + if (ret) + goto finish_cb; bio = btrfs_bio_alloc(first_byte); bio->bi_opf = bio_op | write_flags; @@ -551,23 +557,44 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, atomic_inc(&cb->pending_bios); ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto last_bio; if (!skip_sum) { ret = btrfs_csum_one_bio(inode, bio, start, 1); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + goto last_bio; } ret = btrfs_map_bio(fs_info, bio, 0); - if (ret) { - bio->bi_status = ret; - bio_endio(bio); - } + if (ret) + goto last_bio; if (blkcg_css) kthread_associate_blkcg(NULL); return 0; +last_bio: + bio->bi_status = ret; + /* One of the bios' endio function will free @cb. */ + bio_endio(bio); + return ret; + +finish_cb: + if (bio) { + bio->bi_status = ret; + bio_endio(bio); + } + + wait_var_event(cb, atomic_read(&cb->pending_bios) == 0); + /* + * Even with previous bio ended, we should still have io not yet + * submitted, thus need to finish manually. + */ + ASSERT(refcount_read(&cb->pending_sectors)); + /* Now we are the only one referring @cb, can finish it safely. 
*/ + finish_compressed_bio_write(cb); + return ret; } static u64 bio_end_offset(struct bio *bio) From patchwork Sun Aug 29 05:24:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463711 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EF38C432BE for ; Sun, 29 Aug 2021 05:25:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0EA8160C3F for ; Sun, 29 Aug 2021 05:25:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233763AbhH2F0I (ORCPT ); Sun, 29 Aug 2021 01:26:08 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45788 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234697AbhH2F0G (ORCPT ); Sun, 29 Aug 2021 01:26:06 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 05C8020016 for ; Sun, 29 Aug 2021 05:25:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214714; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KKRGU4zwkseWVE58aAkNIbrQKxjsQrWwkgjnLw/e8ec=; b=M2eVpF0VMr63MkXlMyLbrdCRv2d1Ohfm3ih9Tanxbr5qXpHX3tPUlejNMy7AtNjUG3q7Nk Qj15HhsGiMKWZyhz2Ow3O3P1h6ctfPjznEyIwb57IXJVQLPeqi188rbjL7HZc/YCyWw2ef yPJMFjO0RE5CTGkskCo2mHvpaVCyUu4= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 4437813964 for ; Sun, 29 Aug 2021 05:25:13 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id GChAAjkaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:13 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 10/26] btrfs: introduce submit_compressed_bio() for compression Date: Sun, 29 Aug 2021 13:24:42 +0800 Message-Id: <20210829052458.15454-11-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The new helper, submit_compressed_bio(), will aggregate the following work: - Increase compressed_bio::pending_bios - Remap the endio function - Map and submit the bio Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 45 ++++++++++++++++++------------------------ 1 file changed, 19 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/compression.c 
b/fs/btrfs/compression.c index ece4d9ab3c65..033ca1060567 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -422,6 +422,21 @@ static void end_compressed_bio_write(struct bio *bio) bio_put(bio); } +static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info, + struct compressed_bio *cb, + struct bio *bio, int mirror_num) +{ + blk_status_t ret; + + ASSERT(bio->bi_iter.bi_size); + atomic_inc(&cb->pending_bios); + ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA); + if (ret) + return ret; + ret = btrfs_map_bio(fs_info, bio, mirror_num); + return ret; +} + /* * worker function to build and submit bios for previously compressed pages. * The corresponding pages in the inode should be marked for writeback @@ -517,19 +532,13 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, page->mapping = NULL; if (submit || len < PAGE_SIZE) { - atomic_inc(&cb->pending_bios); - ret = btrfs_bio_wq_end_io(fs_info, bio, - BTRFS_WQ_ENDIO_DATA); - if (ret) - goto finish_cb; - if (!skip_sum) { ret = btrfs_csum_one_bio(inode, bio, start, 1); if (ret) goto finish_cb; } - ret = btrfs_map_bio(fs_info, bio, 0); + ret = submit_compressed_bio(fs_info, cb, bio, 0); if (ret) goto finish_cb; @@ -555,18 +564,13 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, cond_resched(); } - atomic_inc(&cb->pending_bios); - ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA); - if (ret) - goto last_bio; - if (!skip_sum) { ret = btrfs_csum_one_bio(inode, bio, start, 1); if (ret) goto last_bio; } - ret = btrfs_map_bio(fs_info, bio, 0); + ret = submit_compressed_bio(fs_info, cb, bio, 0); if (ret) goto last_bio; @@ -872,12 +876,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, if (submit || bio_add_page(comp_bio, page, pg_len, 0) < pg_len) { unsigned int nr_sectors; - atomic_inc(&cb->pending_bios); - ret = btrfs_bio_wq_end_io(fs_info, comp_bio, - BTRFS_WQ_ENDIO_DATA); - if (ret) - goto finish_cb; - ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); if (ret) goto finish_cb; @@ -886,7 +884,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, fs_info->sectorsize); sums += fs_info->csum_size * nr_sectors; - ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); + ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num); if (ret) goto finish_cb; @@ -900,16 +898,11 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, cur_disk_byte += pg_len; } - atomic_inc(&cb->pending_bios); - ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA); - if (ret) - goto last_bio; - ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); if (ret) goto last_bio; - ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); + ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num); if (ret) goto last_bio; From patchwork Sun Aug 29 05:24:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463713 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by 
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 11/26] btrfs: introduce alloc_compressed_bio() for compression
Date: Sun, 29 Aug 2021 13:24:43 +0800
Message-Id: <20210829052458.15454-12-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

Just aggregate the bio allocation code into one helper, so that we can replace 4 call sites.

There is one special note for zoned writes: currently btrfs_submit_compressed_write() will only allocate the first bio using ZONE_APPEND. If we have to submit the current bio due to a stripe boundary, the newly allocated bio will not use ZONE_APPEND. In theory this would be a bug, but since zoned mode currently only supports the SINGLE profile, which doesn't have any stripe boundary limit, it should never be a problem in practice.

This helper will provide a good entry point for any work that needs to be done at bio allocation time, like determining the stripe boundary.
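With the helper in place, each call site collapses to the shape below (condensed from the hunks that follow; the cleanup on failure differs slightly per caller):

	bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags,
				   end_compressed_bio_write);
	if (IS_ERR(bio)) {
		kfree(cb);
		return errno_to_blk_status(PTR_ERR(bio));
	}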
Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 85 ++++++++++++++++++++++++++++-------------- 1 file changed, 57 insertions(+), 28 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 033ca1060567..7fdc9c131ac5 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -437,6 +437,35 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info, return ret; } +/* + * To allocate a compressed_bio, which will be used to read/write on-disk data. + */ +static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr, + unsigned int opf, bio_end_io_t endio_func) +{ + struct bio *bio; + + bio = btrfs_bio_alloc(disk_bytenr); + + bio->bi_opf = opf; + bio->bi_private = cb; + bio->bi_end_io = endio_func; + + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); + struct btrfs_device *device; + + device = btrfs_zoned_get_device(fs_info, disk_bytenr, + fs_info->sectorsize); + if (IS_ERR(device)) { + bio_put(bio); + return ERR_CAST(device); + } + bio_set_dev(bio, device->bdev); + } + return bio; +} + /* * worker function to build and submit bios for previously compressed pages. * The corresponding pages in the inode should be marked for writeback @@ -483,22 +512,11 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, cb->orig_bio = NULL; cb->nr_pages = nr_pages; - bio = btrfs_bio_alloc(first_byte); - bio->bi_opf = bio_op | write_flags; - bio->bi_private = cb; - bio->bi_end_io = end_compressed_bio_write; - - if (use_append) { - struct btrfs_device *device; - - device = btrfs_zoned_get_device(fs_info, disk_start, PAGE_SIZE); - if (IS_ERR(device)) { - kfree(cb); - bio_put(bio); - return BLK_STS_NOTSUPP; - } - - bio_set_dev(bio, device->bdev); + bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags, + end_compressed_bio_write); + if (IS_ERR(bio)) { + kfree(cb); + return errno_to_blk_status(PTR_ERR(bio)); } if (blkcg_css) { @@ -542,10 +560,14 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, if (ret) goto finish_cb; - bio = btrfs_bio_alloc(first_byte); - bio->bi_opf = bio_op | write_flags; - bio->bi_private = cb; - bio->bi_end_io = end_compressed_bio_write; + bio = alloc_compressed_bio(cb, first_byte, + bio_op | write_flags, + end_compressed_bio_write); + if (IS_ERR(bio)) { + ret = errno_to_blk_status(PTR_ERR(bio)); + bio = NULL; + goto finish_cb; + } if (blkcg_css) bio->bi_opf |= REQ_CGROUP_PUNT; /* @@ -844,10 +866,13 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, /* include any pages we added in add_ra-bio_pages */ cb->len = bio->bi_iter.bi_size; - comp_bio = btrfs_bio_alloc(cur_disk_byte); - comp_bio->bi_opf = REQ_OP_READ; - comp_bio->bi_private = cb; - comp_bio->bi_end_io = end_compressed_bio_read; + comp_bio = alloc_compressed_bio(cb, cur_disk_byte, REQ_OP_READ, + end_compressed_bio_read); + if (IS_ERR(comp_bio)) { + ret = errno_to_blk_status(PTR_ERR(comp_bio)); + comp_bio = NULL; + goto fail2; + } for (pg_index = 0; pg_index < nr_pages; pg_index++) { u32 pg_len = PAGE_SIZE; @@ -888,10 +913,14 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, if (ret) goto finish_cb; - comp_bio = btrfs_bio_alloc(cur_disk_byte); - comp_bio->bi_opf = REQ_OP_READ; - comp_bio->bi_private = cb; - comp_bio->bi_end_io = end_compressed_bio_read; + comp_bio = alloc_compressed_bio(cb, cur_disk_byte, + REQ_OP_READ, + end_compressed_bio_read); + if (IS_ERR(comp_bio)) { + 
ret = errno_to_blk_status(PTR_ERR(comp_bio));
+				comp_bio = NULL;
+				goto finish_cb;
+			}
 
 			bio_add_page(comp_bio, page, pg_len, 0);
 		}

From patchwork Sun Aug 29 05:24:44 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12463715
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 12/26] btrfs: make btrfs_submit_compressed_read() to determine stripe boundary at bio allocation time
Date: Sun, 29 Aug 2021 13:24:44 +0800
Message-Id: <20210829052458.15454-13-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

Currently btrfs_submit_compressed_read() will check btrfs_bio_fits_in_stripe() each time a new page is going to be added. Even if the compressed extent is small, we don't really need to do that for every page.
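The per-page check in question is the one removed in the hunks below:

	if (comp_bio->bi_iter.bi_size)
		submit = btrfs_bio_fits_in_stripe(page, pg_len, comp_bio, 0);

	page->mapping = NULL;
	if (submit || bio_add_page(comp_bio, page, pg_len, 0) < pg_len) {
		/* Submit the current bio and allocate a new one */
		...
	}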
This patch will align the behavior to extent_io.c, by determining the stripe boundary when allocating a bio. Unlike extent_io.c, in compressed.c we don't need to bother things like different bio flags, thus no need to re-use bio_ctrl. Here we just manually introduce new local variable, next_stripe_start, and teach alloc_compressed_bio() to calculate the stripe boundary. Then each time we add some page range into the bio, we check if we reached the boundary. And if reached, submit it. Also, since we have @cur_disk_byte to determine whether we're the last bio, we don't need a explicit last_bio: tag for error handling any more. And we can use @cur_disk_byte to track which range has been added to bio, we can also use @cur_disk_byte to calculate the wait condition, no need for @pending_bios. Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 164 +++++++++++++++++++++++------------------ 1 file changed, 93 insertions(+), 71 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 7fdc9c131ac5..ac50972e4920 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -439,11 +439,18 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info, /* * To allocate a compressed_bio, which will be used to read/write on-disk data. + * + * @next_stripe_start: Disk bytenr of next stripe start */ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr, - unsigned int opf, bio_end_io_t endio_func) + unsigned int opf, bio_end_io_t endio_func, + u64 *next_stripe_start) { + struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); + struct btrfs_io_geometry geom; + struct extent_map *em; struct bio *bio; + int ret; bio = btrfs_bio_alloc(disk_bytenr); @@ -451,18 +458,24 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte bio->bi_private = cb; bio->bi_end_io = endio_func; - if (bio_op(bio) == REQ_OP_ZONE_APPEND) { - struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); - struct btrfs_device *device; + em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize); + if (IS_ERR(em)) { + bio_put(bio); + return ERR_CAST(em); + } - device = btrfs_zoned_get_device(fs_info, disk_bytenr, - fs_info->sectorsize); - if (IS_ERR(device)) { - bio_put(bio); - return ERR_CAST(device); - } - bio_set_dev(bio, device->bdev); + if (bio_op(bio) == REQ_OP_ZONE_APPEND) + bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev); + + ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr, + &geom); + free_extent_map(em); + if (ret < 0) { + bio_put(bio); + return ERR_PTR(ret); } + *next_stripe_start = disk_bytenr + geom.len; + return bio; } @@ -490,6 +503,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, int pg_index = 0; struct page *page; u64 first_byte = disk_start; + u64 next_stripe_start; blk_status_t ret; int skip_sum = inode->flags & BTRFS_INODE_NODATASUM; const bool use_append = btrfs_use_zone_append(inode, disk_start); @@ -513,7 +527,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, cb->nr_pages = nr_pages; bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags, - end_compressed_bio_write); + end_compressed_bio_write, + &next_stripe_start); if (IS_ERR(bio)) { kfree(cb); return errno_to_blk_status(PTR_ERR(bio)); @@ -562,7 +577,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags, - end_compressed_bio_write); + 
end_compressed_bio_write, + &next_stripe_start); if (IS_ERR(bio)) { ret = errno_to_blk_status(PTR_ERR(bio)); bio = NULL; @@ -795,9 +811,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, unsigned int compressed_len; unsigned int nr_pages; unsigned int pg_index; - struct page *page; - struct bio *comp_bio; - u64 cur_disk_byte = bio->bi_iter.bi_sector << 9; + struct bio *comp_bio = NULL; + const u64 disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT; + u64 cur_disk_byte = disk_bytenr; + u64 next_stripe_start; u64 file_offset; u64 em_len; u64 em_start; @@ -866,39 +883,62 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, /* include any pages we added in add_ra-bio_pages */ cb->len = bio->bi_iter.bi_size; - comp_bio = alloc_compressed_bio(cb, cur_disk_byte, REQ_OP_READ, - end_compressed_bio_read); - if (IS_ERR(comp_bio)) { - ret = errno_to_blk_status(PTR_ERR(comp_bio)); - comp_bio = NULL; - goto fail2; - } - - for (pg_index = 0; pg_index < nr_pages; pg_index++) { - u32 pg_len = PAGE_SIZE; - int submit = 0; + while (cur_disk_byte < disk_bytenr + compressed_len) { + u64 offset = cur_disk_byte - disk_bytenr; + unsigned int index = offset >> PAGE_SHIFT; + unsigned int real_size; + unsigned int added; + struct page *page = cb->compressed_pages[index]; + bool submit = false; + /* Allocate new bio if submitted or not yet allocated */ + if (!comp_bio) { + comp_bio = alloc_compressed_bio(cb, cur_disk_byte, + REQ_OP_READ, end_compressed_bio_read, + &next_stripe_start); + if (IS_ERR(comp_bio)) { + ret = errno_to_blk_status(PTR_ERR(comp_bio)); + comp_bio = NULL; + goto finish_cb; + } + } /* - * To handle subpage case, we need to make sure the bio only - * covers the range we need. - * - * If we're at the last page, truncate the length to only cover - * the remaining part. + * We should never reach next_stripe_start start as we will + * submit comp_bio when reach the boundary immediately. + */ + ASSERT(cur_disk_byte != next_stripe_start); + /* + * We have various limit on the real read size: + * - stripe boundary + * - page boundary + * - compressed length boundary + */ + real_size = min_t(u64, U32_MAX, + next_stripe_start - cur_disk_byte); + real_size = min_t(u64, real_size, + PAGE_SIZE - offset_in_page(offset)); + real_size = min_t(u64, real_size, + compressed_len - offset); + ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize)); + + added = bio_add_page(comp_bio, page, real_size, + offset_in_page(offset)); + /* + * Maximum compressed extent is smaller than bio size limit, + * thus bio_add_page() should always success. 
*/ - if (pg_index == nr_pages - 1) - pg_len = min_t(u32, PAGE_SIZE, - compressed_len - pg_index * PAGE_SIZE); + ASSERT(added == real_size); + cur_disk_byte += added; - page = cb->compressed_pages[pg_index]; - page->mapping = inode->i_mapping; - page->index = em_start >> PAGE_SHIFT; + /* Reached stripe boundary, need to submit */ + if (cur_disk_byte == next_stripe_start) + submit = true; - if (comp_bio->bi_iter.bi_size) - submit = btrfs_bio_fits_in_stripe(page, pg_len, - comp_bio, 0); + /* Has finished the range, need to submit */ + if (cur_disk_byte == disk_bytenr + compressed_len) + submit = true; - page->mapping = NULL; - if (submit || bio_add_page(comp_bio, page, pg_len, 0) < pg_len) { + if (submit) { unsigned int nr_sectors; ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); @@ -909,32 +949,13 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, fs_info->sectorsize); sums += fs_info->csum_size * nr_sectors; - ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num); + ret = submit_compressed_bio(fs_info, cb, comp_bio, + mirror_num); if (ret) goto finish_cb; - - comp_bio = alloc_compressed_bio(cb, cur_disk_byte, - REQ_OP_READ, - end_compressed_bio_read); - if (IS_ERR(comp_bio)) { - ret = errno_to_blk_status(PTR_ERR(comp_bio)); - comp_bio = NULL; - goto finish_cb; - } - - bio_add_page(comp_bio, page, pg_len, 0); + comp_bio = NULL; } - cur_disk_byte += pg_len; } - - ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); - if (ret) - goto last_bio; - - ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num); - if (ret) - goto last_bio; - return 0; fail2: @@ -949,17 +970,18 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, out: free_extent_map(em); return ret; -last_bio: - comp_bio->bi_status = ret; - /* This is the last bio, endio functions will free @cb */ - bio_endio(comp_bio); - return ret; finish_cb: if (comp_bio) { comp_bio->bi_status = ret; bio_endio(comp_bio); } - wait_var_event(cb, atomic_read(&cb->pending_bios) == 0); + /* All bytes of @cb is submitted, endio will free @cb */ + if (cur_disk_byte == disk_bytenr + compressed_len) + return ret; + + wait_var_event(cb, refcount_read(&cb->pending_sectors) == + (disk_bytenr + compressed_len - cur_disk_byte) >> + fs_info->sectorsize_bits); /* * Even with previous bio ended, we should still have io not yet * submitted, thus need to finish @cb manually. 
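To see why the new wait condition is correct, note that pending_sectors starts at compressed_len >> sectorsize_bits and every finished bio subtracts its own sector count. With illustrative numbers (not taken from the patch): for compressed_len = 128K and a 4K sector size, pending_sectors starts at 32; if submission fails after 80K (20 sectors) has been submitted, the 12 sectors that were never submitted are never subtracted, so

	wait_var_event(cb, refcount_read(&cb->pending_sectors) ==
			(disk_bytenr + compressed_len - cur_disk_byte) >>
			fs_info->sectorsize_bits);

returns exactly when pending_sectors has dropped to 12, i.e. when every submitted bio has ended and finish_compressed_bio_read() can safely be called by hand.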
From patchwork Sun Aug 29 05:24:45 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12463719
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 13/26] btrfs: make btrfs_submit_compressed_write() to determine stripe boundary at bio allocation time
Date: Sun, 29 Aug 2021 13:24:45 +0800
Message-Id: <20210829052458.15454-14-wqu@suse.com>
In-Reply-To: <20210829052458.15454-1-wqu@suse.com>
References: <20210829052458.15454-1-wqu@suse.com>

Currently btrfs_submit_compressed_write() will check btrfs_bio_fits_in_stripe() each time a new page is going to be added. Even if the compressed extent is small, we don't really need to do that for every page.

This patch aligns the behavior with extent_io.c by determining the stripe boundary when allocating a bio.
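The boundary itself comes from the io geometry lookup added to alloc_compressed_bio() in the previous patch (condensed from that patch's hunks):

	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr,
				    &geom);
	free_extent_map(em);
	if (ret < 0) {
		bio_put(bio);
		return ERR_PTR(ret);
	}
	*next_stripe_start = disk_bytenr + geom.len;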
Unlike extent_io.c, in compressed.c we don't need to bother things like different bio flags, thus no need to re-use bio_ctrl. Here we just manually introduce new local variable, next_stripe_start, and use that value returned from alloc_compressed_bio() to calculate the stripe boundary. Then each time we add some page range into the bio, we check if we reached the boundary. And if reached, submit it. Also, since we have @cur_disk_bytenr to determine whether we're the last bio, we don't need a explicit last_bio: tag for error handling any more. And since we use @cur_disk_bytenr to wait, there is no need for pending_bios, also remove it to save some memory of compressed_bio. Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 145 ++++++++++++++++++----------------------- fs/btrfs/compression.h | 3 - 2 files changed, 62 insertions(+), 86 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index ac50972e4920..6e389f1c6091 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -222,7 +222,6 @@ static bool dec_and_test_compressed_bio(struct compressed_bio *cb, ASSERT(bi_size && bi_size <= cb->compressed_len); last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits, &cb->pending_sectors); - atomic_dec(&cb->pending_bios); /* * Here we must wake up the possible error handler after all other * operations on @cb finished, or we can race with @@ -429,7 +428,6 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info, blk_status_t ret; ASSERT(bio->bi_iter.bi_size); - atomic_inc(&cb->pending_bios); ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA); if (ret) return ret; @@ -499,10 +497,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, struct btrfs_fs_info *fs_info = inode->root->fs_info; struct bio *bio = NULL; struct compressed_bio *cb; - unsigned long bytes_left; - int pg_index = 0; - struct page *page; - u64 first_byte = disk_start; + u64 cur_disk_bytenr = disk_start; u64 next_stripe_start; blk_status_t ret; int skip_sum = inode->flags & BTRFS_INODE_NODATASUM; @@ -513,7 +508,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS); if (!cb) return BLK_STS_RESOURCE; - atomic_set(&cb->pending_bios, 0); refcount_set(&cb->pending_sectors, compressed_len >> fs_info->sectorsize_bits); cb->errors = 0; @@ -526,45 +520,65 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, cb->orig_bio = NULL; cb->nr_pages = nr_pages; - bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags, - end_compressed_bio_write, - &next_stripe_start); - if (IS_ERR(bio)) { - kfree(cb); - return errno_to_blk_status(PTR_ERR(bio)); - } - - if (blkcg_css) { - bio->bi_opf |= REQ_CGROUP_PUNT; - kthread_associate_blkcg(blkcg_css); - } - - /* create and submit bios for the compressed pages */ - bytes_left = compressed_len; - for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) { - int submit = 0; - int len = 0; + while (cur_disk_bytenr < disk_start + compressed_len) { + u64 offset = cur_disk_bytenr - disk_start; + unsigned int index = offset >> PAGE_SHIFT; + unsigned int real_size; + unsigned int added; + struct page *page = compressed_pages[index]; + bool submit = false; - page = compressed_pages[pg_index]; - page->mapping = inode->vfs_inode.i_mapping; - if (bio->bi_iter.bi_size) - submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, - 0); + /* Allocate new bio if submitted or not yet 
allocated */ + if (!bio) { + bio = alloc_compressed_bio(cb, cur_disk_bytenr, + bio_op | write_flags, end_compressed_bio_write, + &next_stripe_start); + if (IS_ERR(bio)) { + ret = errno_to_blk_status(PTR_ERR(bio)); + bio = NULL; + goto finish_cb; + } + } + /* + * We should never reach next_stripe_start start as we will + * submit comp_bio when reach the boundary immediately. + */ + ASSERT(cur_disk_bytenr != next_stripe_start); /* - * Page can only be added to bio if the current bio fits in - * stripe. + * We have various limit on the real read size: + * - stripe boundary + * - page boundary + * - compressed length boundary */ - if (!submit) { - if (pg_index == 0 && use_append) - len = bio_add_zone_append_page(bio, page, - PAGE_SIZE, 0); - else - len = bio_add_page(bio, page, PAGE_SIZE, 0); - } + real_size = min_t(u64, U32_MAX, + next_stripe_start - cur_disk_bytenr); + real_size = min_t(u64, real_size, + PAGE_SIZE - offset_in_page(offset)); + real_size = min_t(u64, real_size, + compressed_len - offset); + ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize)); - page->mapping = NULL; - if (submit || len < PAGE_SIZE) { + if (use_append) + added = bio_add_zone_append_page(bio, page, real_size, + offset_in_page(offset)); + else + added = bio_add_page(bio, page, real_size, + offset_in_page(offset)); + /* Reached zoned boundary */ + if (added == 0) + submit = true; + + cur_disk_bytenr += added; + /* Reached stripe boundary */ + if (cur_disk_bytenr == next_stripe_start) + submit = true; + + /* Finished the range */ + if (cur_disk_bytenr == disk_start + compressed_len) + submit = true; + + if (submit) { if (!skip_sum) { ret = btrfs_csum_one_bio(inode, bio, start, 1); if (ret) @@ -574,61 +588,27 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, ret = submit_compressed_bio(fs_info, cb, bio, 0); if (ret) goto finish_cb; - - bio = alloc_compressed_bio(cb, first_byte, - bio_op | write_flags, - end_compressed_bio_write, - &next_stripe_start); - if (IS_ERR(bio)) { - ret = errno_to_blk_status(PTR_ERR(bio)); - bio = NULL; - goto finish_cb; - } - if (blkcg_css) - bio->bi_opf |= REQ_CGROUP_PUNT; - /* - * Use bio_add_page() to ensure the bio has at least one - * page. - */ - bio_add_page(bio, page, PAGE_SIZE, 0); - } - if (bytes_left < PAGE_SIZE) { - btrfs_info(fs_info, - "bytes left %lu compress len %u nr %u", - bytes_left, cb->compressed_len, cb->nr_pages); + bio = NULL; } - bytes_left -= PAGE_SIZE; - first_byte += PAGE_SIZE; cond_resched(); } - - if (!skip_sum) { - ret = btrfs_csum_one_bio(inode, bio, start, 1); - if (ret) - goto last_bio; - } - - ret = submit_compressed_bio(fs_info, cb, bio, 0); - if (ret) - goto last_bio; - if (blkcg_css) kthread_associate_blkcg(NULL); return 0; -last_bio: - bio->bi_status = ret; - /* One of the bios' endio function will free @cb. */ - bio_endio(bio); - return ret; finish_cb: if (bio) { bio->bi_status = ret; bio_endio(bio); } + /* Last byte of @cb is submitted, endio will free @cb */ + if (cur_disk_bytenr == disk_start + compressed_len) + return ret; - wait_var_event(cb, atomic_read(&cb->pending_bios) == 0); + wait_var_event(cb, refcount_read(&cb->pending_sectors) == + (disk_start + compressed_len - cur_disk_bytenr) >> + fs_info->sectorsize_bits); /* * Even with previous bio ended, we should still have io not yet * submitted, thus need to finish manually. 
@@ -841,7 +821,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, if (!cb) goto out; - atomic_set(&cb->pending_bios, 0); refcount_set(&cb->pending_sectors, compressed_len >> fs_info->sectorsize_bits); cb->errors = 0; diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h index 61955581e34f..56eef0821e3e 100644 --- a/fs/btrfs/compression.h +++ b/fs/btrfs/compression.h @@ -28,9 +28,6 @@ struct btrfs_inode; #define BTRFS_ZLIB_DEFAULT_LEVEL 3 struct compressed_bio { - /* number of bios pending for this compressed extent */ - atomic_t pending_bios; - /* Number of sectors with unfinished IO (unsubmitted or unfinished) */ refcount_t pending_sectors; From patchwork Sun Aug 29 05:24:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463717 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38E93C4320A for ; Sun, 29 Aug 2021 05:25:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 17275606A5 for ; Sun, 29 Aug 2021 05:25:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234739AbhH2F0N (ORCPT ); Sun, 29 Aug 2021 01:26:13 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45800 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233496AbhH2F0L (ORCPT ); Sun, 29 Aug 2021 01:26:11 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id D904820017 for ; Sun, 29 Aug 2021 05:25:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214718; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nrXuuN+OW63XeWcDPoXycWuZrrrGbjIHkEe5fbjpj/Y=; b=oDYLo4+osqxU/BkGjognnJpH+TbvNxHOkYuGlMgX8/79r5v472aR415De/U4jT34M1bEHN KVhJyoGlr4eq7yGwBZGWSx+QThhvvTx+oyL4v/eCxYut/QRAU+ARy4+fvO5PXeizgfsF0L y1lqD+5mSrL0q5Z01FWLFekYJgcZuBw= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 2337813964 for ; Sun, 29 Aug 2021 05:25:17 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id OEnLNT0aK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:17 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 14/26] btrfs: remove unused function btrfs_bio_fits_in_stripe() Date: Sun, 29 Aug 2021 13:24:46 +0800 Message-Id: 
<20210829052458.15454-15-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org As the last caller in compression.c is removed, we don't need that function anymore. Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 2 -- fs/btrfs/inode.c | 42 ------------------------------------------ 2 files changed, 44 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 38870ae46cbb..ae7bc867f50e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3178,8 +3178,6 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new, struct extent_state *other); void btrfs_split_delalloc_extent(struct inode *inode, struct extent_state *orig, u64 split); -int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio, - unsigned long bio_flags); void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end); vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf); int btrfs_readpage(struct file *file, struct page *page); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index fdcb1b0fabb8..e8de19c7ad45 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2228,48 +2228,6 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, } } -/* - * btrfs_bio_fits_in_stripe - Checks whether the size of the given bio will fit - * in a chunk's stripe. This function ensures that bios do not span a - * stripe/chunk - * - * @page - The page we are about to add to the bio - * @size - size we want to add to the bio - * @bio - bio we want to ensure is smaller than a stripe - * @bio_flags - flags of the bio - * - * return 1 if page cannot be added to the bio - * return 0 if page can be added to the bio - * return error otherwise - */ -int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio, - unsigned long bio_flags) -{ - struct inode *inode = page->mapping->host; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - u64 logical = bio->bi_iter.bi_sector << 9; - u32 bio_len = bio->bi_iter.bi_size; - struct extent_map *em; - int ret = 0; - struct btrfs_io_geometry geom; - - if (bio_flags & EXTENT_BIO_COMPRESSED) - return 0; - - em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize); - if (IS_ERR(em)) - return PTR_ERR(em); - ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), logical, &geom); - if (ret < 0) - goto out; - - if (geom.len < bio_len + size) - ret = 1; -out: - free_extent_map(em); - return ret; -} - /* * in order to insert checksums into the metadata in large chunks, * we wait until bio submission time. 
All the pages in the bio are From patchwork Sun Aug 29 05:24:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463721 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6CFEC432BE for ; Sun, 29 Aug 2021 05:25:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D05FA60F39 for ; Sun, 29 Aug 2021 05:25:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234758AbhH2F0O (ORCPT ); Sun, 29 Aug 2021 01:26:14 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59378 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234705AbhH2F0M (ORCPT ); Sun, 29 Aug 2021 01:26:12 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 19FF721D95 for ; Sun, 29 Aug 2021 05:25:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214720; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZNfufLuvQDN9HEJP+SucIeZF7OLkQv/9UuPVwPdSdp0=; b=nAP81svrxifxvNkfnC3zXk4o/f5To2YhZ8B+Vq1E5ijese2rwhgfa53VAx8s/MkX5ZhukD edFJc54ayVak70iXDp2TgWA+gtUJm6gUexvj0xQHi8z+HEoP2/L04dUFDKraaD8CHVjx2S o64twTB8X5Yhrf6LLw9Y32xGykdHag0= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 5774E13964 for ; Sun, 29 Aug 2021 05:25:19 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id GKLkBj8aK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:19 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 15/26] btrfs: refactor submit_compressed_extents() Date: Sun, 29 Aug 2021 13:24:47 +0800 Message-Id: <20210829052458.15454-16-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org We have a big hunk of code inside a while() loop, with tons of strange jumps for error handling. It's definitely not up to today's coding standards. Move the code into a new function, submit_one_async_extent().
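Roughly, the refactor collapses the loop body into one helper call per extent. A simplified sketch of the resulting shape (the exact signature and error paths are in the diff below):

	while (!list_empty(&async_chunk->extents)) {
		async_extent = list_entry(async_chunk->extents.next,
					  struct async_extent, list);
		list_del(&async_extent->list);
		/* One helper, one exit path, no more retry:/again: labels */
		ret = submit_one_async_extent(inode, async_chunk,
					      async_extent, &alloc_hint);
	}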
Since we're here, also do the following modifications: - Comment style change To follow the current scheme - Don't fallback to non-compressed write then hitting ENOSPC If we hit ENOSPC for compressed write, how could we reserve more space for non-compressed write? Thus we go error path directly. This removes the retry: label. - Add more comment for super long parameter list Explain which parameter is for, so we don't need to check the prototype. - Move the error handling to submit_one_async_extent() Thus no strange code like: out_free: ... goto again; Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 278 +++++++++++++++++++++++------------------------ 1 file changed, 137 insertions(+), 141 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e8de19c7ad45..2098cc2e6db2 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -839,163 +839,129 @@ static void free_async_extent_pages(struct async_extent *async_extent) async_extent->pages = NULL; } -/* - * phase two of compressed writeback. This is the ordered portion - * of the code, which only gets called in the order the work was - * queued. We walk all the async extents created by compress_file_range - * and send them down to the disk. - */ -static noinline void submit_compressed_extents(struct async_chunk *async_chunk) +static int submit_one_async_extent(struct btrfs_inode *inode, + struct async_chunk *async_chunk, + struct async_extent *async_extent, + u64 *alloc_hint) { - struct btrfs_inode *inode = BTRFS_I(async_chunk->inode); - struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct async_extent *async_extent; - u64 alloc_hint = 0; + struct extent_io_tree *io_tree = &inode->io_tree; + struct btrfs_root *root = inode->root; + struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_key ins; struct extent_map *em; - struct btrfs_root *root = inode->root; - struct extent_io_tree *io_tree = &inode->io_tree; int ret = 0; + u64 start = async_extent->start; + u64 end = async_extent->start + async_extent->ram_size - 1; -again: - while (!list_empty(&async_chunk->extents)) { - async_extent = list_entry(async_chunk->extents.next, - struct async_extent, list); - list_del(&async_extent->list); - -retry: - lock_extent(io_tree, async_extent->start, - async_extent->start + async_extent->ram_size - 1); - /* did the compression code fall back to uncompressed IO? */ - if (!async_extent->pages) { - int page_started = 0; - unsigned long nr_written = 0; - - /* allocate blocks */ - ret = cow_file_range(inode, async_chunk->locked_page, - async_extent->start, - async_extent->start + - async_extent->ram_size - 1, - &page_started, &nr_written, 0); - - /* JDM XXX */ + lock_extent(io_tree, start, end); - /* - * if page_started, cow_file_range inserted an - * inline extent and took care of all the unlocking - * and IO for us. Otherwise, we need to submit - * all those pages down to the drive. 
- */ - if (!page_started && !ret) - extent_write_locked_range(&inode->vfs_inode, - async_extent->start, - async_extent->start + - async_extent->ram_size - 1, - WB_SYNC_ALL); - else if (ret && async_chunk->locked_page) - unlock_page(async_chunk->locked_page); - kfree(async_extent); - cond_resched(); - continue; - } - - ret = btrfs_reserve_extent(root, async_extent->ram_size, - async_extent->compressed_size, - async_extent->compressed_size, - 0, alloc_hint, &ins, 1, 1); - if (ret) { - free_async_extent_pages(async_extent); - - if (ret == -ENOSPC) { - unlock_extent(io_tree, async_extent->start, - async_extent->start + - async_extent->ram_size - 1); - - /* - * we need to redirty the pages if we decide to - * fallback to uncompressed IO, otherwise we - * will not submit these pages down to lower - * layers. - */ - extent_range_redirty_for_io(&inode->vfs_inode, - async_extent->start, - async_extent->start + - async_extent->ram_size - 1); + /* We have fall back to uncompressed write */ + if (!async_extent->pages) { + int page_started = 0; + unsigned long nr_written = 0; - goto retry; - } - goto out_free; - } /* - * here we're doing allocation and writeback of the - * compressed pages + * Call cow_file_range() to run the delalloc range directly, + * since we won't go to nocow or async path again. */ - em = create_io_em(inode, async_extent->start, - async_extent->ram_size, /* len */ - async_extent->start, /* orig_start */ - ins.objectid, /* block_start */ - ins.offset, /* block_len */ - ins.offset, /* orig_block_len */ - async_extent->ram_size, /* ram_bytes */ - async_extent->compress_type, - BTRFS_ORDERED_COMPRESSED); - if (IS_ERR(em)) - /* ret value is not necessary due to void function */ - goto out_free_reserve; - free_extent_map(em); - - ret = btrfs_add_ordered_extent_compress(inode, - async_extent->start, - ins.objectid, - async_extent->ram_size, - ins.offset, - async_extent->compress_type); - if (ret) { - btrfs_drop_extent_cache(inode, async_extent->start, - async_extent->start + - async_extent->ram_size - 1, 0); - goto out_free_reserve; - } - btrfs_dec_block_group_reservations(fs_info, ins.objectid); - + ret = cow_file_range(inode, async_chunk->locked_page, + start, end, &page_started, &nr_written, 0); /* - * clear dirty, set writeback and unlock the pages. + * If @page_started, cow_file_range() inserted an + * inline extent and took care of all the unlocking + * and IO for us. Otherwise, we need to submit + * all those pages down to the drive. 
*/ - extent_clear_unlock_delalloc(inode, async_extent->start, - async_extent->start + - async_extent->ram_size - 1, - NULL, EXTENT_LOCKED | EXTENT_DELALLOC, - PAGE_UNLOCK | PAGE_START_WRITEBACK); - if (btrfs_submit_compressed_write(inode, async_extent->start, - async_extent->ram_size, - ins.objectid, - ins.offset, async_extent->pages, - async_extent->nr_pages, - async_chunk->write_flags, - async_chunk->blkcg_css)) { - const u64 start = async_extent->start; - const u64 end = start + async_extent->ram_size - 1; - - btrfs_writepage_endio_finish_ordered(inode, NULL, start, - end, false); - - extent_clear_unlock_delalloc(inode, start, end, NULL, 0, - PAGE_END_WRITEBACK | - PAGE_SET_ERROR); - free_async_extent_pages(async_extent); - } - alloc_hint = ins.objectid + ins.offset; + if (!page_started && !ret) + extent_write_locked_range(&inode->vfs_inode, start, + end, WB_SYNC_ALL); + else if (ret && async_chunk->locked_page) + unlock_page(async_chunk->locked_page); kfree(async_extent); - cond_resched(); + return ret; + } + + ret = btrfs_reserve_extent(root, async_extent->ram_size, + async_extent->compressed_size, + async_extent->compressed_size, + 0, *alloc_hint, &ins, 1, 1); + if (ret) { + free_async_extent_pages(async_extent); + /* + * Here we used to try again by going back to non-compressed + * path for ENOSPC. + * But we can't reserve space even for compressed size, how + * could it work for uncompressed size which requires larger + * size? + * So here we directly go error path. + */ + goto out_free; + } + + /* + * Here we're doing allocation and writeback of the + * compressed pages + */ + em = create_io_em(inode, start, + async_extent->ram_size, /* len */ + start, /* orig_start */ + ins.objectid, /* block_start */ + ins.offset, /* block_len */ + ins.offset, /* orig_block_len */ + async_extent->ram_size, /* ram_bytes */ + async_extent->compress_type, + BTRFS_ORDERED_COMPRESSED); + if (IS_ERR(em)) { + ret = PTR_ERR(em); + goto out_free_reserve; + } + free_extent_map(em); + + ret = btrfs_add_ordered_extent_compress(inode, start, /* file_offset */ + ins.objectid, /* disk_bytenr */ + async_extent->ram_size, /* num_bytes */ + ins.offset, /* disk_num_bytes */ + async_extent->compress_type); + if (ret) { + btrfs_drop_extent_cache(inode, start, end, 0); + goto out_free_reserve; } - return; + btrfs_dec_block_group_reservations(fs_info, ins.objectid); + + /* + * clear dirty, set writeback and unlock the pages. 
+ */ + extent_clear_unlock_delalloc(inode, start, end, + NULL, EXTENT_LOCKED | EXTENT_DELALLOC, + PAGE_UNLOCK | PAGE_START_WRITEBACK); + if (btrfs_submit_compressed_write(inode, start, /* file_offset */ + async_extent->ram_size, /* num_bytes */ + ins.objectid, /* disk_bytenr */ + ins.offset, /* compressed_len */ + async_extent->pages, /* compressed_pages */ + async_extent->nr_pages, + async_chunk->write_flags, + async_chunk->blkcg_css)) { + const u64 start = async_extent->start; + const u64 end = start + async_extent->ram_size - 1; + + btrfs_writepage_endio_finish_ordered(inode, NULL, start, + end, 0); + + extent_clear_unlock_delalloc(inode, start, end, NULL, 0, + PAGE_END_WRITEBACK | + PAGE_SET_ERROR); + free_async_extent_pages(async_extent); + } + *alloc_hint = ins.objectid + ins.offset; + kfree(async_extent); + return ret; + out_free_reserve: btrfs_dec_block_group_reservations(fs_info, ins.objectid); btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1); out_free: - extent_clear_unlock_delalloc(inode, async_extent->start, - async_extent->start + - async_extent->ram_size - 1, + extent_clear_unlock_delalloc(inode, start, end, NULL, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING, @@ -1003,7 +969,37 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk) PAGE_END_WRITEBACK | PAGE_SET_ERROR); free_async_extent_pages(async_extent); kfree(async_extent); - goto again; + return ret; +} + +/* + * phase two of compressed writeback. This is the ordered portion + * of the code, which only gets called in the order the work was + * queued. We walk all the async extents created by compress_file_range + * and send them down to the disk. + */ +static noinline void submit_compressed_extents(struct async_chunk *async_chunk) +{ + struct btrfs_inode *inode = BTRFS_I(async_chunk->inode); + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct async_extent *async_extent; + u64 alloc_hint = 0; + int ret = 0; + + while (!list_empty(&async_chunk->extents)) { + async_extent = list_entry(async_chunk->extents.next, + struct async_extent, list); + list_del(&async_extent->list); + + ret = submit_one_async_extent(inode, async_chunk, async_extent, + &alloc_hint); + /* Just for developer */ + btrfs_debug(fs_info, +"async extent submission failed root=%lld inode=%llu start=%llu len=%llu ret=%d", + inode->root->root_key.objectid, + btrfs_ino(inode), async_extent->start, + async_extent->ram_size, ret); + } } static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start, From patchwork Sun Aug 29 05:24:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463723 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 55C34C4320A for ; Sun, 29 Aug 2021 05:25:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 40B3460F3A for ; Sun, 29 Aug 2021 05:25:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S234761AbhH2F0P (ORCPT ); Sun, 29 Aug 2021 01:26:15 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59386 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234745AbhH2F0N (ORCPT ); Sun, 29 Aug 2021 01:26:13 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 4C15B21DA9 for ; Sun, 29 Aug 2021 05:25:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214721; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nVW8obWKFOmwEu0EJxL7ZyUThnwOAeI9Dg92+VTQDls=; b=pdIH1OiN2WIInkcQ1bSxwMhnOVxTyedJVol9R5+NWZBTZF+Wnn5bJCmzV8qDGYFjywlKAp OmzhspVkDxhQ07aIXXwG+QiT9sNoZFD/RnjhsEy4XRb8i3/8fDLfeB7+IBC5qXLQHG4HE5 OHPK2DuGmb4epJt4re4IXkyAty7znug= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 8A63C13964 for ; Sun, 29 Aug 2021 05:25:20 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id wM91E0AaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:20 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 16/26] btrfs: cleanup for extent_write_locked_range() Date: Sun, 29 Aug 2021 13:24:48 +0800 Message-Id: <20210829052458.15454-17-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There are several cleanups for extent_write_locked_range(); most of them are pure cleanups, with some preparation for future subpage support. - Add a proper comment about which call sites are suitable Unlike regular synchronized extent writeback, if async cow or zoned cow happens, all pages in the range are still locked. Thus for those (only) two call sites, we need this function to submit the page content into bios and submit them. - Remove the @mode parameter Both existing call sites pass WB_SYNC_ALL, so there is no need for the @mode parameter. - Better error handling (see the sketch below) Currently, if we hit an error during the page iteration loop, we overwrite @ret, so only the last error gets recorded. Here we add the @found_error and @first_error variables to record whether we hit any error, and the first error we hit, so the first error won't get lost. - Don't reuse @start as the cursor We reuse the parameter @start as the cursor to iterate the range. Not a big problem, but since we're here, introduce a proper @cur as the cursor. - Remove an impossible branch Since all pages are still locked after the ordered extent is inserted, there is no way a page can get its dirty bit cleared. Remove the branch where the page is not dirty and replace it with an ASSERT().
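The error-handling change boils down to a first-error-wins pattern. A minimal sketch of the intent (simplified, not the literal hunk; the guard on the assignment is what preserves the earliest error):

	bool found_error = false;
	int first_error = 0;

	while (cur <= end) {
		ret = __extent_writepage(page, &wbc_writepages, &epd);
		if (ret < 0 && !found_error) {
			/* Record the first failure, but keep iterating */
			found_error = true;
			first_error = ret;
		}
		cur += PAGE_SIZE;
	}
	if (found_error)
		return first_error;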
Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 45 ++++++++++++++++++++++++++++---------------- fs/btrfs/extent_io.h | 3 +-- fs/btrfs/inode.c | 4 ++-- 3 files changed, 32 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 3214ecb36cf3..f6f4c069cf06 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5083,23 +5083,29 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc) return ret; } -int extent_write_locked_range(struct inode *inode, u64 start, u64 end, - int mode) +/* + * Submit the pages in the range to bio for call sites which delalloc range + * has already be ran (aka, ordered extent inserted) and all pages are still + * locked. + */ +int extent_write_locked_range(struct inode *inode, u64 start, u64 end) { + bool found_error = false; + int first_error = 0; int ret = 0; struct address_space *mapping = inode->i_mapping; struct page *page; + u64 cur = start; unsigned long nr_pages = (end - start + PAGE_SIZE) >> PAGE_SHIFT; - struct extent_page_data epd = { .bio_ctrl = { 0 }, .extent_locked = 1, - .sync_io = mode == WB_SYNC_ALL, + .sync_io = 1, }; struct writeback_control wbc_writepages = { - .sync_mode = mode, .nr_to_write = nr_pages * 2, + .sync_mode = WB_SYNC_ALL, .range_start = start, .range_end = end + 1, /* We're called from an async helper function */ @@ -5108,26 +5114,33 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end, }; wbc_attach_fdatawrite_inode(&wbc_writepages, inode); - while (start <= end) { - page = find_get_page(mapping, start >> PAGE_SHIFT); - if (clear_page_dirty_for_io(page)) - ret = __extent_writepage(page, &wbc_writepages, &epd); - else { - btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), - page, start, start + PAGE_SIZE - 1, true); - unlock_page(page); + while (cur <= end) { + page = find_get_page(mapping, cur >> PAGE_SHIFT); + /* + * All pages in the range are locked since + * btrfs_run_delalloc_range(), thus there is no way to clear + * the page dirty flag. 
+ */ + ASSERT(PageDirty(page)); + clear_page_dirty_for_io(page); + ret = __extent_writepage(page, &wbc_writepages, &epd); + ASSERT(ret <= 0); + if (ret < 0) { + found_error = true; + first_error = ret; } put_page(page); - start += PAGE_SIZE; + cur += PAGE_SIZE; } - ASSERT(ret <= 0); - if (ret == 0) + if (!found_error) ret = flush_write_bio(&epd); else end_write_bio(&epd, ret); wbc_detach_inode(&wbc_writepages); + if (found_error) + return first_error; return ret; } diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 9f3e0a45a5e4..9a8e5ed8b01b 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -184,8 +184,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, struct btrfs_bio_ctrl *bio_ctrl, unsigned int read_flags, u64 *prev_em_start); int extent_write_full_page(struct page *page, struct writeback_control *wbc); -int extent_write_locked_range(struct inode *inode, u64 start, u64 end, - int mode); +int extent_write_locked_range(struct inode *inode, u64 start, u64 end); int extent_writepages(struct address_space *mapping, struct writeback_control *wbc); int btree_write_cache_pages(struct address_space *mapping, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2098cc2e6db2..18d99f5dd380 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -874,7 +874,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode, */ if (!page_started && !ret) extent_write_locked_range(&inode->vfs_inode, start, - end, WB_SYNC_ALL); + end); else if (ret && async_chunk->locked_page) unlock_page(async_chunk->locked_page); kfree(async_extent); @@ -1461,7 +1461,7 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode, __set_page_dirty_nobuffers(locked_page); account_page_redirty(locked_page); - extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL); + extent_write_locked_range(&inode->vfs_inode, start, end); *page_started = 1; return 0; From patchwork Sun Aug 29 05:24:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12D4FC43216 for ; Sun, 29 Aug 2021 05:25:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EEC0A606A5 for ; Sun, 29 Aug 2021 05:25:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234767AbhH2F0Q (ORCPT ); Sun, 29 Aug 2021 01:26:16 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59392 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234713AbhH2F0O (ORCPT ); Sun, 29 Aug 2021 01:26:14 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 8BF0E21DAB for ; Sun, 29 Aug 2021 05:25:22 +0000 (UTC) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214722; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oZcuZEUnXtJ4oXuw+raoc3NAiYzDAdxYokCQsFtbGhA=; b=imCR6EcbwkoCFUP5SgXhymSZdjB14ZnT2ejcStDb7sYGSf7tLOmNhRTTWNvIcPKCjie+GP NpqnUA1D75NcQXFO0a5IufaBn16iuETMeI4pSIopY7OYyHUHN69HxFP7Iql8qzKj4Ocxb3 8tOW0wOQ6LbOefir/KMd0XZiHsVMfjQ= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id BE77C13964 for ; Sun, 29 Aug 2021 05:25:21 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id oAz8H0EaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:21 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 17/26] btrfs: make compress_file_range() to be subpage compatible Date: Sun, 29 Aug 2021 13:24:49 +0800 Message-Id: <20210829052458.15454-18-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In compress_file_range(), when the compression is finished, the function just rounds @total_in up to PAGE_SIZE. This is fine for the regular sectorsize == PAGE_SIZE case, but not for subpage. Just change the ALIGN(, PAGE_SIZE) to round_up(, sectorsize) so that both regular sectorsize and subpage sectorsize will be happy.
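To see the difference, plug in some hypothetical numbers for a 64K page size system with a 4K sectorsize:

	/* Suppose total_in = 6000 bytes of uncompressed input */
	ALIGN(6000, PAGE_SIZE);			/* = 65536, inflated to the 64K page */
	round_up(6000, fs_info->sectorsize);	/* =  8192, the sectors actually used */

With the page-aligned value, the later "total_compressed + blocksize <= total_in" check is compared against a wildly inflated input size, so almost any result looks like a win; with the sector-aligned value, compression really has to free at least one sector.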
Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 18d99f5dd380..09639887b65f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -758,7 +758,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk) * win, compare the page count read with the blocks on disk, * compression must free at least one sector size */ - total_in = ALIGN(total_in, PAGE_SIZE); + total_in = round_up(total_in, fs_info->sectorsize); if (total_compressed + blocksize <= total_in) { compressed_extents++; From patchwork Sun Aug 29 05:24:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463727 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9ADA8C4320E for ; Sun, 29 Aug 2021 05:25:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 86A8460F39 for ; Sun, 29 Aug 2021 05:25:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234770AbhH2F0Q (ORCPT ); Sun, 29 Aug 2021 01:26:16 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45812 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234705AbhH2F0P (ORCPT ); Sun, 29 Aug 2021 01:26:15 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id B61D020016 for ; Sun, 29 Aug 2021 05:25:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214723; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VKeLbzx48Ff/UHwCOf/9ZOq/1RL0f88Tc4YllAOU2Ec=; b=bdzX0GU8IdRRpFad6B+/WyBUK/bZM6w+snPV15BQ3BFEMrO6PRu8sv0sHzr+0KRbCnpCFc IQkjWSGz5T/gWoM6zRLv2HaFHnmvDmYbSVffDliq2NsqygNZLUzjPcQtdwBm7Kk6KBuf2W 5D+QI3JKF9aR/m5uuPdIcpSAVj3QkYA= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id F30FF13964 for ; Sun, 29 Aug 2021 05:25:22 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id UKbcLEIaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:22 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 18/26] btrfs: make btrfs_submit_compressed_write() to be subpage compatible Date: Sun, 29 Aug 2021 13:24:50 +0800 Message-Id: <20210829052458.15454-19-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: 
<20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There is a WARN_ON() checking if @start is aligned to PAGE_SIZE, not sectorsize, which will cause false alert for subpage. Fix it to check against sectorsize. Furthermore: - Use ASSERT() to do the check So that in the future we may skip the check for production build - Also check alignment for @len Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 6e389f1c6091..6bfa626bbd0b 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -504,7 +504,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, const bool use_append = btrfs_use_zone_append(inode, disk_start); const unsigned int bio_op = use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE; - WARN_ON(!PAGE_ALIGNED(start)); + ASSERT(IS_ALIGNED(start, fs_info->sectorsize) && + IS_ALIGNED(len, fs_info->sectorsize)); cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS); if (!cb) return BLK_STS_RESOURCE; From patchwork Sun Aug 29 05:24:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463729 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EDACC432BE for ; Sun, 29 Aug 2021 05:25:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 106AB606A5 for ; Sun, 29 Aug 2021 05:25:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234772AbhH2F0R (ORCPT ); Sun, 29 Aug 2021 01:26:17 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59398 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234713AbhH2F0R (ORCPT ); Sun, 29 Aug 2021 01:26:17 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E9F5F21D6A for ; Sun, 29 Aug 2021 05:25:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214724; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OawqWwuUDuWuzEgwCMslzCgEH3puAhpaawAMB3aNKsM=; b=a8DUaBKo68TtJqpX8o+qTHhP56eSrDwIopVhoG5AA7vVCF7k64r90kN2gC1WZkkTg0qXCh wTilVVpqdl9SWo0xNXkL8bx2rMvWqCnEiC+qHvfCzVB3+SJo6IWuOMbDGlC8hZ/PIUETbW zWI/LT7nIyIZzwe1pMExVlV/lXFkqQE= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client 
certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 3313513964 for ; Sun, 29 Aug 2021 05:25:23 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id 8MW1OUMaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:23 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 19/26] btrfs: make end_compressed_bio_writeback() to be subpage compatble Date: Sun, 29 Aug 2021 13:24:51 +0800 Message-Id: <20210829052458.15454-20-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In end_compressed_writeback() we just clear the full page writeback. For subpage case, if there are two delalloc range in the same page, the 2nd range will trigger a BUG_ON() as the page writeback is already cleared by previous range. Fix it by using btrfs_page_clamp_clear_writeback() helper. Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 6bfa626bbd0b..ef3d409a1bc3 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -336,6 +336,7 @@ static void end_compressed_bio_read(struct bio *bio) static noinline void end_compressed_writeback(struct inode *inode, const struct compressed_bio *cb) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); unsigned long index = cb->start >> PAGE_SHIFT; unsigned long end_index = (cb->start + cb->len - 1) >> PAGE_SHIFT; struct page *pages[16]; @@ -358,7 +359,8 @@ static noinline void end_compressed_writeback(struct inode *inode, for (i = 0; i < ret; i++) { if (cb->errors) SetPageError(pages[i]); - end_page_writeback(pages[i]); + btrfs_page_clamp_clear_writeback(fs_info, pages[i], + cb->start, cb->len); put_page(pages[i]); } nr_pages -= ret; From patchwork Sun Aug 29 05:24:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463731 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1638FC4320E for ; Sun, 29 Aug 2021 05:25:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F1A3360F39 for ; Sun, 29 Aug 2021 05:25:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234780AbhH2F0T (ORCPT ); Sun, 29 Aug 2021 01:26:19 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59404 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234777AbhH2F0S (ORCPT ); Sun, 29 Aug 2021 01:26:18 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 2C39921D6A 
for ; Sun, 29 Aug 2021 05:25:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214726; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MAzEEau2QuPlVvWVwq4O02VoroBNDUtp1UyW7MY7MWI=; b=oWtGMq9kEDczMSb22amu0JQjLjLhahHkVQsah2kmYrTUPkU82EgqzxkFrF4f5ow1+SsRbp LVGXGaO5ypwTi9R52G33sZ7c9/s9pmP2Ksqw5Q2XAIM4j8Rx8aDzUhqDHdua2hkPKLrYAx /H7kVCK3vtKuNKoqYc25EcP1tyefsmQ= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 6771213964 for ; Sun, 29 Aug 2021 05:25:25 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id oAm/CkUaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:25 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 20/26] btrfs: make extent_write_locked_range() to be subpage compatible Date: Sun, 29 Aug 2021 13:24:52 +0800 Message-Id: <20210829052458.15454-21-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There are two spots in extent_write_locked_range() that are not yet subpage compatible: - How @nr_pages is calculated For subpage we can have the following range with 64K page size: 0 32K 64K 96K 128K | |////|/////| | In that case, although 96K - 32K == 64K and it looks like one page is enough, the range actually spans two pages, not one. Fix it by doing proper round_up() and round_down() to calculate @nr_pages (a worked example follows below). Also add some extra ASSERT()s to ensure the range passed in is already aligned. - How the page end is calculated Currently we just use cur + PAGE_SIZE - 1 to calculate the page end. This can't handle the above range layout, and will trigger the ASSERT() in btrfs_writepage_endio_finish_ordered(), as the range is no longer covered by the page range. Fix it by taking the page end into consideration.
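A worked example for the @nr_pages fix, using the 64K page size layout above (range [32K, 96K), so start = 32K and end = 96K - 1):

	/*
	 * Old: (end - start + PAGE_SIZE) >> PAGE_SHIFT
	 *    = (64K - 1 + 64K) >> 16 = 1	-> wrong, the range touches 2 pages
	 *
	 * New: (round_up(end, PAGE_SIZE) - round_down(start, PAGE_SIZE)) >> PAGE_SHIFT
	 *    = (128K - 0K) >> 16 = 2		-> correct
	 */

Likewise, the per-page end becomes min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end) instead of the unconditional cur + PAGE_SIZE - 1, so the last iteration never runs past @end.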
Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index f6f4c069cf06..077856bf15d0 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5096,15 +5096,14 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end) struct address_space *mapping = inode->i_mapping; struct page *page; u64 cur = start; - unsigned long nr_pages = (end - start + PAGE_SIZE) >> - PAGE_SHIFT; + unsigned long nr_pages; + const u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize; struct extent_page_data epd = { .bio_ctrl = { 0 }, .extent_locked = 1, .sync_io = 1, }; struct writeback_control wbc_writepages = { - .nr_to_write = nr_pages * 2, .sync_mode = WB_SYNC_ALL, .range_start = start, .range_end = end + 1, @@ -5113,14 +5112,24 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end) .no_cgroup_owner = 1, }; + ASSERT(IS_ALIGNED(start, sectorsize) && + IS_ALIGNED(end + 1, sectorsize)); + nr_pages = (round_up(end, PAGE_SIZE) - round_down(start, PAGE_SIZE)) >> + PAGE_SHIFT; + wbc_writepages.nr_to_write = nr_pages * 2; + wbc_attach_fdatawrite_inode(&wbc_writepages, inode); while (cur <= end) { + u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, + end); + page = find_get_page(mapping, cur >> PAGE_SHIFT); /* * All pages in the range are locked since * btrfs_run_delalloc_range(), thus there is no way to clear * the page dirty flag. */ + ASSERT(PageLocked(page)); ASSERT(PageDirty(page)); clear_page_dirty_for_io(page); ret = __extent_writepage(page, &wbc_writepages, &epd); @@ -5130,7 +5139,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end) first_error = ret; } put_page(page); - cur += PAGE_SIZE; + cur = cur_end + 1; } if (!found_error) From patchwork Sun Aug 29 05:24:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463733 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1D59C432BE for ; Sun, 29 Aug 2021 05:25:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CDEEB60C3F for ; Sun, 29 Aug 2021 05:25:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234783AbhH2F0U (ORCPT ); Sun, 29 Aug 2021 01:26:20 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59410 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234781AbhH2F0T (ORCPT ); Sun, 29 Aug 2021 01:26:19 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 623C121DAB for ; Sun, 29 Aug 2021 05:25:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214727; 
h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yvq0Aiw0n0HwHkh1h90Iclh8weakBxcHJbysTTQJOlo=; b=FK+Ud94e+A1QgalcN0m7xstJgrN8RGGq5RZxv47+7pV7rGUyPCyyHBx3IWuNe/woupZ5w3 ErIKJwKJjPLK8Q8WtXdstLoKGdvuMn1p+3j/hPuH/zIZAEjua3/c0FYQ22nxh2C1qez9qx KVfOybIh5VqdWsVeEGpm3Lg2bP4fYlQ= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 9DBDD13964 for ; Sun, 29 Aug 2021 05:25:26 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id SM8HGEYaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:26 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 21/26] btrfs: extract uncompressed async extent submission code into a new helper Date: Sun, 29 Aug 2021 13:24:53 +0800 Message-Id: <20210829052458.15454-22-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Introduce a new helper, submit_uncompressed_range(), for async cow cases where we fallback to cow. There are some new modification introduced to the helper: - Proper locked_page detection It's possible that the async_extent range doesn't cover the locked page. In that case we shouldn't unlock the locked page. In the new helper, we will ensure that we only unlock the locked page when: * The locked page covers part of the async_extent range * The locked page is not unlocked by cow_file_range() nor extent_write_locked_range() This also means extra comments are added focusing on the page locking. - Add extra comment on some rare parameter used. We use @unlock_page = 0 for cow_file_range(), where only two call sites doing the same thing, including the new helper. It's definitely worthy some comments. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 76 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 52 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 09639887b65f..aa5c764fd1d4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -839,6 +839,43 @@ static void free_async_extent_pages(struct async_extent *async_extent) async_extent->pages = NULL; } +static int submit_uncompressed_range(struct btrfs_inode *inode, + struct async_extent *async_extent, + struct page *locked_page) +{ + u64 start = async_extent->start; + u64 end = async_extent->start + async_extent->ram_size - 1; + unsigned long nr_written = 0; + int page_started = 0; + int ret; + + /* + * Call cow_file_range() to run the delalloc range directly, + * since we won't go to nocow or async path again. + * + * Also we call cow_file_range() with @unlock_page == 0, so that we + * can directly submit them without interruption. 
+ */ + ret = cow_file_range(inode, locked_page, start, end, &page_started, + &nr_written, 0); + /* Inline extent inserted, page get unlocked and everything is done */ + if (ret > 0) { + ret = 0; + goto out; + } + if (ret < 0) { + if (locked_page) + unlock_page(locked_page); + goto out; + } + + ret = extent_write_locked_range(&inode->vfs_inode, start, end); + /* All pages will be unlocked, including @locked_page */ +out: + kfree(async_extent); + return ret; +} + static int submit_one_async_extent(struct btrfs_inode *inode, struct async_chunk *async_chunk, struct async_extent *async_extent, @@ -848,38 +885,29 @@ static int submit_one_async_extent(struct btrfs_inode *inode, struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_key ins; + struct page *locked_page = NULL; struct extent_map *em; int ret = 0; u64 start = async_extent->start; u64 end = async_extent->start + async_extent->ram_size - 1; + /* + * If async_chunk->locked_page is in the async_extent range, we + * need to handle it. + */ + if (async_chunk->locked_page) { + u64 locked_page_start = page_offset(async_chunk->locked_page); + u64 locked_page_end = locked_page_start + PAGE_SIZE - 1; + + if (!(start >= locked_page_end || end <= locked_page_start)) + locked_page = async_chunk->locked_page; + } lock_extent(io_tree, start, end); /* We have fall back to uncompressed write */ - if (!async_extent->pages) { - int page_started = 0; - unsigned long nr_written = 0; - - /* - * Call cow_file_range() to run the delalloc range directly, - * since we won't go to nocow or async path again. - */ - ret = cow_file_range(inode, async_chunk->locked_page, - start, end, &page_started, &nr_written, 0); - /* - * If @page_started, cow_file_range() inserted an - * inline extent and took care of all the unlocking - * and IO for us. Otherwise, we need to submit - * all those pages down to the drive. 
- */ - if (!page_started && !ret) - extent_write_locked_range(&inode->vfs_inode, start, - end); - else if (ret && async_chunk->locked_page) - unlock_page(async_chunk->locked_page); - kfree(async_extent); - return ret; - } + if (!async_extent->pages) + return submit_uncompressed_range(inode, async_extent, + locked_page); ret = btrfs_reserve_extent(root, async_extent->ram_size, async_extent->compressed_size, From patchwork Sun Aug 29 05:24:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463735 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20007C4320A for ; Sun, 29 Aug 2021 05:25:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 09E0C60C3F for ; Sun, 29 Aug 2021 05:25:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234786AbhH2F0V (ORCPT ); Sun, 29 Aug 2021 01:26:21 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45818 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234784AbhH2F0U (ORCPT ); Sun, 29 Aug 2021 01:26:20 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 9703220016 for ; Sun, 29 Aug 2021 05:25:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214728; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cu3HWTpscKImVMOpDz4cF1F9lUWaMYx3WEGOjAVwXu4=; b=Q2w5+3IVgFDypBWvbVaKeOtw3GYkb+Lwli5ZmsSZfzmQu5I1qaZ6hkDgZFpGewejO68JUu 0JzITO4hzAz/E15Q+uXh7jb/zRswx81lsbVLEiLHmGBl9R0Xgnsvpxx1Ebf9Y8aUEo38fj BASjU5L9qSYB2zJP8izRzGkFFjZ8OMc= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id D51E713964 for ; Sun, 29 Aug 2021 05:25:27 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id ENBoJUcaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:27 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 22/26] btrfs: rework lzo_compress_pages() to make it subpage compatible Date: Sun, 29 Aug 2021 13:24:54 +0800 Message-Id: <20210829052458.15454-23-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There are several problems in 
lzo_compress_pages() preventing it from being subpage compatible: - No page offset is calculated when reading from inode pages For the subpage case, we could have @start which is not aligned to PAGE_SIZE. Thus the location we read data from must take the offset within the page into consideration. - The padding for the segment header is bound to PAGE_SIZE This means, for the subpage case we can skip several corner cases where on x86 machines we need to add padding zeros. The rework will: - Update the comment to replace "page" with "sector" - Introduce a new helper, copy_compressed_data_to_page(), to do the copy So that we don't need to bother with page switches for both input and output. Now in lzo_compress_pages() we only care about page switching for the input, while in copy_compressed_data_to_page() we only care about page switching for the output. - Only one main cursor For lzo_compress_pages() we use @cur_in as the main cursor. It will be the file offset we are currently at. All other helper variables will only be declared inside the loop. For copy_compressed_data_to_page() it's similar; we will have @cur_out as the main cursor, which records how many bytes are in the output. - Get rid of kmap()/kunmap() Instead of using __GFP_HIGHMEM and needing to do kmap()/kunmap(), just get rid of that GFP flag, so we can use page_address() and never bother with the kmap()/kunmap() thing. Signed-off-by: Qu Wenruo --- fs/btrfs/lzo.c | 280 +++++++++++++++++++++++------------------------ 1 file changed, 144 insertions(+), 136 deletions(-) diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c index c25dfd1a8a54..ee2b055a8069 100644 --- a/fs/btrfs/lzo.c +++ b/fs/btrfs/lzo.c @@ -32,19 +32,19 @@ * payload. * One regular LZO compressed extent can have one or more segments. * For inlined LZO compressed extent, only one segment is allowed. - * One segment represents at most one page of uncompressed data. + * One segment represents at most one sector of uncompressed data. * * 2.1 Segment header * Fixed size. LZO_LEN (4) bytes long, LE32. * Records the total size of the segment (not including the header). - * Segment header never crosses page boundary, thus it's possible to - * have at most 3 padding zeros at the end of the page. + * Segment header never crosses sector boundary, thus it's possible to + * have at most 3 padding zeros at the end of the sector. * * 2.2 Data Payload - * Variable size. Size up limit should be lzo1x_worst_compress(PAGE_SIZE) - * which is 4419 for a 4KiB page. + * Variable size. Size up limit should be lzo1x_worst_compress(sectorsize) + * which is 4419 for a 4KiB sectorsize. * - * Example: + * Example with 4K sectorsize: * Page 1: * 0 0x2 0x4 0x6 0x8 0xa 0xc 0xe 0x10 * 0x0000 | Header | SegHdr 01 | Data payload 01 ... | @@ -112,163 +112,171 @@ static inline size_t read_compress_length(const char *buf) return le32_to_cpu(dlen); } +/* + * Will do: + * + * - Write a segment header into the destination + * - Copy the compressed buffer into the destination + * - Make sure we have enough space in the last sector to fit a segment header + * If not, we will pad at most (LZO_LEN (4)) - 1 bytes of zeros. + * + * Will allocate new pages when needed. + */ +static int copy_compressed_data_to_page(char *compressed_data, + size_t compressed_size, + struct page **out_pages, + u32 *cur_out, + const u32 sectorsize) +{ + u32 sector_bytes_left; + u32 orig_out; + struct page *cur_page; + + /* + * We never allow a segment header crossing sector boundary, previous + * run should ensure we have enough space left inside the sector.
+ */ + ASSERT((*cur_out / sectorsize) == + (*cur_out + LZO_LEN - 1) / sectorsize); + + cur_page = out_pages[*cur_out / PAGE_SIZE]; + /* Allocate a new page */ + if (!cur_page) { + cur_page = alloc_page(GFP_NOFS); + if (!cur_page) + return -ENOMEM; + out_pages[*cur_out / PAGE_SIZE] = cur_page; + } + + write_compress_length(page_address(cur_page) + offset_in_page(*cur_out), + compressed_size); + *cur_out += LZO_LEN; + + orig_out = *cur_out; + /* *cur_out is increased, let the main loop to grab a proper page */ + cur_page = NULL; + + /* Copy compressed data */ + while (*cur_out - orig_out < compressed_size) { + u32 copy_len = min_t(u32, sectorsize - *cur_out % sectorsize, + orig_out + compressed_size - *cur_out); + + /* Grab a page or allocate a new one */ + if (!cur_page) { + cur_page = out_pages[*cur_out / PAGE_SIZE]; + if (!cur_page) { + cur_page = alloc_page(GFP_NOFS); + if (!cur_page) + return -ENOMEM; + out_pages[*cur_out / PAGE_SIZE] = cur_page; + } + } + + memcpy(page_address(cur_page) + offset_in_page(*cur_out), + compressed_data + *cur_out - orig_out, copy_len); + + *cur_out += copy_len; + + /* If we reached page boudnary, go to next page */ + if (IS_ALIGNED(*cur_out, PAGE_SIZE)) { + /* Let next iteration to grab a page */ + cur_page = NULL; + } + } + + /* + * Check if we can fit the next segment header into the remaining space + * of the sector. + */ + sector_bytes_left = round_up(*cur_out, sectorsize) - *cur_out; + if (sector_bytes_left >= LZO_LEN) + return 0; + + /* The remaining size is not enough, pad it with zeros */ + memset(page_address(cur_page) + offset_in_page(*cur_out), 0, + sector_bytes_left); + *cur_out += sector_bytes_left; + return 0; +} + int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, u64 start, struct page **pages, unsigned long *out_pages, unsigned long *total_in, unsigned long *total_out) { struct workspace *workspace = list_entry(ws, struct workspace, list); + const u32 sectorsize = btrfs_sb(mapping->host->i_sb)->sectorsize; + struct page *page_in = NULL; int ret = 0; - char *data_in; - char *cpage_out, *sizes_ptr; - int nr_pages = 0; - struct page *in_page = NULL; - struct page *out_page = NULL; - unsigned long bytes_left; - unsigned long len = *total_out; - unsigned long nr_dest_pages = *out_pages; - const unsigned long max_out = nr_dest_pages * PAGE_SIZE; - size_t in_len; - size_t out_len; - char *buf; - unsigned long tot_in = 0; - unsigned long tot_out = 0; - unsigned long pg_bytes_left; - unsigned long out_offset; - unsigned long bytes; + u64 cur_in = start; /* Points to the file offset of input data */ + u32 cur_out = 0; /* Points to the current output byte */ + u32 len = *total_out; *out_pages = 0; *total_out = 0; *total_in = 0; - in_page = find_get_page(mapping, start >> PAGE_SHIFT); - data_in = page_address(in_page); - /* - * store the size of all chunks of compressed data in - * the first 4 bytes + * Skip the header for now, we will later come back and write the total + * compressed size */ - out_page = alloc_page(GFP_NOFS); - if (out_page == NULL) { - ret = -ENOMEM; - goto out; - } - cpage_out = page_address(out_page); - out_offset = LZO_LEN; - tot_out = LZO_LEN; - pages[0] = out_page; - nr_pages = 1; - pg_bytes_left = PAGE_SIZE - LZO_LEN; - - /* compress at most one page of data each time */ - in_len = min(len, PAGE_SIZE); - while (tot_in < len) { - ret = lzo1x_1_compress(data_in, in_len, workspace->cbuf, - &out_len, workspace->mem); - if (ret != LZO_E_OK) { - pr_debug("BTRFS: lzo in loop returned %d\n", - ret); + cur_out 
+= LZO_LEN; + while (cur_in < start + len) { + u32 sector_off = (cur_in - start) % sectorsize; + u32 in_len; + size_t out_len; + + /* Get the input page first */ + if (!page_in) { + page_in = find_get_page(mapping, cur_in >> PAGE_SHIFT); + ASSERT(page_in); + } + + /* Compress at most one sector of data each time */ + in_len = min_t(u32, start + len - cur_in, + sectorsize - sector_off); + ASSERT(in_len); + ret = lzo1x_1_compress(page_address(page_in) + + offset_in_page(cur_in), in_len, + workspace->cbuf, &out_len, + workspace->mem); + if (ret < 0) { + pr_debug("BTRFS: lzo in loop returned %d\n", ret); ret = -EIO; goto out; } - /* store the size of this chunk of compressed data */ - write_compress_length(cpage_out + out_offset, out_len); - tot_out += LZO_LEN; - out_offset += LZO_LEN; - pg_bytes_left -= LZO_LEN; - - tot_in += in_len; - tot_out += out_len; - - /* copy bytes from the working buffer into the pages */ - buf = workspace->cbuf; - while (out_len) { - bytes = min_t(unsigned long, pg_bytes_left, out_len); - - memcpy(cpage_out + out_offset, buf, bytes); - - out_len -= bytes; - pg_bytes_left -= bytes; - buf += bytes; - out_offset += bytes; - - /* - * we need another page for writing out. - * - * Note if there's less than 4 bytes left, we just - * skip to a new page. - */ - if ((out_len == 0 && pg_bytes_left < LZO_LEN) || - pg_bytes_left == 0) { - if (pg_bytes_left) { - memset(cpage_out + out_offset, 0, - pg_bytes_left); - tot_out += pg_bytes_left; - } - - /* we're done, don't allocate new page */ - if (out_len == 0 && tot_in >= len) - break; - - if (nr_pages == nr_dest_pages) { - out_page = NULL; - ret = -E2BIG; - goto out; - } - - out_page = alloc_page(GFP_NOFS); - if (out_page == NULL) { - ret = -ENOMEM; - goto out; - } - cpage_out = page_address(out_page); - pages[nr_pages++] = out_page; - - pg_bytes_left = PAGE_SIZE; - out_offset = 0; - } - } + ret = copy_compressed_data_to_page(workspace->cbuf, out_len, + pages, &cur_out, sectorsize); + if (ret < 0) + goto out; + + cur_in += in_len; - /* we're making it bigger, give up */ - if (tot_in > 8192 && tot_in < tot_out) { + /* + * Check if the compressed data is getting bigger after the first + * two sectors of input. If so, give up.
+ */ + if (cur_in - start > sectorsize * 2 && + cur_in - start < cur_out) { ret = -E2BIG; goto out; } - /* we're all done */ - if (tot_in >= len) - break; - - if (tot_out > max_out) - break; - - bytes_left = len - tot_in; - put_page(in_page); - - start += PAGE_SIZE; - in_page = find_get_page(mapping, start >> PAGE_SHIFT); - data_in = page_address(in_page); - in_len = min(bytes_left, PAGE_SIZE); - } - - if (tot_out >= tot_in) { - ret = -E2BIG; - goto out; + /* Check if we have reached page boundary */ + if (IS_ALIGNED(cur_in, PAGE_SIZE)) { + put_page(page_in); + page_in = NULL; + } } - /* store the size of all chunks of compressed data */ - sizes_ptr = page_address(pages[0]); - write_compress_length(sizes_ptr, tot_out); + /* Store the size of all chunks of compressed data */ + write_compress_length(page_address(pages[0]), cur_out); ret = 0; - *total_out = tot_out; - *total_in = tot_in; + *total_out = cur_out; + *total_in = cur_in - start; out: - *out_pages = nr_pages; - - if (in_page) - put_page(in_page); - + *out_pages = DIV_ROUND_UP(cur_out, PAGE_SIZE); return ret; } From patchwork Sun Aug 29 05:24:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463737 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DE12C432BE for ; Sun, 29 Aug 2021 05:25:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 64B94606A5 for ; Sun, 29 Aug 2021 05:25:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234793AbhH2F0Y (ORCPT ); Sun, 29 Aug 2021 01:26:24 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45824 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234789AbhH2F0W (ORCPT ); Sun, 29 Aug 2021 01:26:22 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id CC1EB20017 for ; Sun, 29 Aug 2021 05:25:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214729; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AW3gPR+JId+hxx3CfxLPnDZi43ldGI6XtAzOP8bz178=; b=SI1YBI6UnPmvTqXX84U6pTUwKK6fosUAWXGmq5x6PNF+rmjvjx2mBy/6Tmnd8CpsneaYJQ 9cEW2oEuV7CFNA6WXkhVY5IqMqhEI7wHL9h8kAKN9bwVU+GeNF1cMSD1MU2q4r7I8t9Pvc XGnTWMrAYZva0ooK//+QfxjXOEd6OSc= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 14E7013964 for ; Sun, 29 Aug 2021 05:25:28 +0000 (UTC) 
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id 0JBJMkgaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:28 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 23/26] btrfs: teach __extent_writepage() to handle locked page differently Date: Sun, 29 Aug 2021 13:24:55 +0800 Message-Id: <20210829052458.15454-24-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Pages passed to __extent_writepage() are always locked, but they may be locked by different functions. There are two types of locked pages for __extent_writepage(): - Page locked by plain lock_page() It should not have any subpage::writers count. Can be unlocked by unlock_page(). This is the most common locked page for __extent_writepage() called inside extent_write_cache_pages() or extent_write_full_page(). Rarer cases include the @locked_page from extent_write_locked_range(). - Page locked by lock_delalloc_pages() There is only one caller, all pages except @locked_page for extent_write_locked_range(). In this case, we have to call the subpage helper to handle it. So here we introduce a helper, btrfs_page_unlock_writer(), to allow __extent_writepage() to unlock different locked pages. And since the pages of all other callers of __extent_writepage() are guaranteed to be locked by plain lock_page(), also add an extra check for epd::extent_locked so that such pages are unlocked directly. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 15 ++++++++++++++- fs/btrfs/subpage.c | 43 +++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/subpage.h | 2 ++ 3 files changed, 59 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 077856bf15d0..44df7e142fb7 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4060,6 +4060,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc, struct extent_page_data *epd) { struct inode *inode = page->mapping->host; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); const u64 page_start = page_offset(page); const u64 page_end = page_start + PAGE_SIZE - 1; int ret; @@ -4148,7 +4149,19 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc, */ if (PageError(page)) end_extent_writepage(page, ret, page_start, page_end); - unlock_page(page); + if (epd->extent_locked) { + /* + * If epd->extent_locked is set, it's from extent_write_locked_range(); + * the page can be locked either by lock_page() or + * process_one_page(). + * Let btrfs_page_unlock_writer() handle both cases. + */ + ASSERT(wbc); + btrfs_page_unlock_writer(fs_info, page, wbc->range_start, + wbc->range_end + 1 - wbc->range_start); + } else { + unlock_page(page); + } ASSERT(ret <= 0); return ret; } diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 2bea6766d84e..50b3d96289ca 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -691,3 +691,46 @@ void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info, ASSERT(PagePrivate(page) && page->private); ASSERT(subpage_test_bitmap_all_zero(fs_info, subpage, dirty)); } + +/* + * Helper to handle differently locked pages with different page sizes + * - Page locked by plain lock_page() + * It should not have any subpage::writers count. + * Can be unlocked by unlock_page().
+ * This is the most common locked page for __extent_writepage() called + * inside extent_write_cache_pages() or extent_write_full_page(). + * Rarer cases include the @locked_page from extent_write_locked_range(). + * + * - Page locked by lock_delalloc_pages() + * There is only one caller, all pages except @locked_page for + * extent_write_locked_range(). + * In this case, we have to call the subpage helper to handle it. + */ +void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page, + u64 start, u32 len) +{ + struct btrfs_subpage *subpage; + + ASSERT(PageLocked(page)); + /* For regular page size case, we just unlock the page */ + if (fs_info->sectorsize == PAGE_SIZE) + return unlock_page(page); + + ASSERT(PagePrivate(page) && page->private); + subpage = (struct btrfs_subpage *)page->private; + + /* + * For the subpage case, there are two types of locked pages: + * with or without a writers count. + * + * Since we own the page lock, no one else could touch + * subpage::writers and we are safe to do several atomic operations + * without a spinlock. + */ + if (atomic_read(&subpage->writers) == 0) + /* No writers, locked by plain lock_page() */ + return unlock_page(page); + + /* Have writers, use proper subpage helper to end it */ + btrfs_page_end_writer_lock(fs_info, page, start, len); +} diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 46224f959c34..7accb5c40d33 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -150,5 +150,7 @@ bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info, void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info, struct page *page); +void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page, + u64 start, u32 len); #endif From patchwork Sun Aug 29 05:24:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463739 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A03ECC4320E for ; Sun, 29 Aug 2021 05:25:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 89B7B60F39 for ; Sun, 29 Aug 2021 05:25:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234800AbhH2F0Z (ORCPT ); Sun, 29 Aug 2021 01:26:25 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59418 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234784AbhH2F0X (ORCPT ); Sun, 29 Aug 2021 01:26:23 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 0C0B021D6A for ; Sun, 29 Aug 2021 05:25:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214731; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xnPty9NpjNaK8w9BEXEdx93WEuA4DkXxoQo+onTUaC0=; b=uhQxK3jshxzseqWFPFXYXowycVTVY6BPIAqOyhQPP+wIaONnYeTniszdjbmos3DDIfPVwE B28YkPdC+cL9UQGNP6v1pAz2hZrr3LFCvrqpNzNlZE5kRBEZpbK8Fo9PGomol8c/+MAEG9 uN5WjuVXH/rmHgf2vaCZZ8l5c7THF7c= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 4910713964 for ; Sun, 29 Aug 2021 05:25:30 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id AMVfA0oaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:30 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 24/26] btrfs: allow page to be unlocked by btrfs_page_end_writer_lock() even if it's locked by plain page_lock() Date: Sun, 29 Aug 2021 13:24:56 +0800 Message-Id: <20210829052458.15454-25-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There are several call sites of extent_clear_unlock_delalloc() which get @locked_page = NULL. In that case extent_clear_unlock_delalloc() will call process_one_page() to unlock every page, even if the first page is not locked by btrfs_page_start_writer_lock(). This will trigger an ASSERT() in btrfs_subpage_end_and_test_writer(), as previously we required every page passed to btrfs_subpage_end_and_test_writer() to be locked by btrfs_page_start_writer_lock(). But the compression path doesn't go that way. Thankfully it's not hard to distinguish pages locked by lock_page() from those locked by btrfs_page_start_writer_lock(). So do the check in btrfs_subpage_end_and_test_writer() so that it can handle both cases well. Signed-off-by: Qu Wenruo --- fs/btrfs/subpage.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 50b3d96289ca..39d63a21dcb9 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -292,6 +292,17 @@ bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info, btrfs_subpage_assert(fs_info, page, start, len); + /* + * We have call sites passing @locked_page into + * extent_clear_unlock_delalloc() for the compression path. + * + * Those pages are locked by plain lock_page(), thus their + * subpage::writers is 0. + * Handle them specially.
+ */ + if (atomic_read(&subpage->writers) == 0) + return true; + ASSERT(atomic_read(&subpage->writers) >= nbits); return atomic_sub_and_test(nbits, &subpage->writers); } From patchwork Sun Aug 29 05:24:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463741 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E083C43214 for ; Sun, 29 Aug 2021 05:25:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 48456606A5 for ; Sun, 29 Aug 2021 05:25:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234805AbhH2F00 (ORCPT ); Sun, 29 Aug 2021 01:26:26 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:45830 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234790AbhH2F0Y (ORCPT ); Sun, 29 Aug 2021 01:26:24 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 3FCE220016 for ; Sun, 29 Aug 2021 05:25:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214732; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9Hdm/u/MTeB74yfqNW4Eo+/HCwu5Jw1ZKjHxGPSUjes=; b=DLrCMSk0CEJVPebOqHomomY53pEc+bIFsCFi/QV+wQEF2/jxvS7m0qeg+R2/nYjP8q9AjK NQVIZhZYjhAEDgKpdl1OogJcxDnyiwROpAihsYHAhu0sk/kFS771UurqojL36bHF3m0t0N IJxhXH7/uh5/n9ouqRg1nsskqvnhtgY= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 7D9B613964 for ; Sun, 29 Aug 2021 05:25:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id IM9GEEsaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:31 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 25/26] btrfs: don't run delalloc range which is beyond the locked_page to prevent deadlock for subpage compression Date: Sun, 29 Aug 2021 13:24:57 +0800 Message-Id: <20210829052458.15454-26-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] With experimental subpage compression enabled, a simple fsstress can lead to self deadlock on page 720896: mkfs.btrfs -f -s 4k $dev > /dev/null mount $dev -o compress $mnt $fsstress -p 1 -n 100 -w -d $mnt -v -s 1625511156 [CAUSE] If we 
have a file layout like the one below: 0 32K 64K 96K 128K |//| |///////////////| 4K Then we run the delalloc ranges for the inode; it will: - Call find_lock_delalloc_range() with @delalloc_start = 0 Then we get a delalloc range [0, 4K). This range will be CoWed. - Call find_lock_delalloc_range() again with @delalloc_start = 4K Since find_lock_delalloc_range() never cares whether the range is still inside page range [0, 64K), it will return range [64K, 128K). This range meets the condition for subpage compression and will go through the async cow path. And the async cow path will return @page_started. But that @page_started is now for range [64K, 128K), not for range [0, 64K). - writepage_delalloc() returned 1 for page [0, 64K) Thus page [0, 64K) will not be unlocked, nor will its page dirty status be cleared. Next time when we try to lock page [0, 64K) we will deadlock, as there is no one to release page [0, 64K). This problem will never happen for regular page size as one page only contains one sector. After the first find_lock_delalloc_range() call, the @delalloc_end will go beyond @page_end no matter if we found a delalloc range or not. Thus this bug only happens for subpage, as now we need multiple runs to exhaust the delalloc range of a page. [FIX] Fix the problem by ensuring that the delalloc range we run at least starts inside @locked_page. So that we will never get an incorrect @page_started. And to prevent such a problem from happening again: - Make find_lock_delalloc_range() return false if the found range is beyond the @end value passed in. Since @end will be utilized now, add an ASSERT() to ensure we pass a correct @end into find_lock_delalloc_range(). This also means that for selftests we need to populate @end before calling find_lock_delalloc_range(). - New ASSERT() in find_lock_delalloc_range() Now we will make sure the @start/@end passed in at least cover part of the page. - New ASSERT() in run_delalloc_range() To make sure the range at least starts inside @locked_page. - Use @delalloc_start as the proper cursor, while @delalloc_end is always reset to @page_end. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 36 ++++++++++++++++++++++++-------- fs/btrfs/inode.c | 7 +++++++ fs/btrfs/tests/extent-io-tests.c | 12 +++++------ 3 files changed, 40 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 44df7e142fb7..ac5fb3cd9922 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1975,10 +1975,18 @@ static noinline int lock_delalloc_pages(struct inode *inode, /* * Find and lock a contiguous range of bytes in the file marked as delalloc, no - * more than @max_bytes. @Start and @end are used to return the range, * - * Return: true if we find something - * false if nothing was in the tree + * @start: The original start bytenr to search. + * Will store the extent range start bytenr. + * @end: The original end bytenr of the search range + * Will store the extent range end bytenr. + * + * Return true if we find a delalloc range which starts inside the original + * range, and @start/@end will store the delalloc range start/end. + * + * Return false if we can't find any delalloc range which starts inside the + * original range, and @start/@end will be the non-delalloc range start/end.
*/ EXPORT_FOR_TESTS noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, @@ -1986,6 +1994,8 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, u64 *end) { struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; + const u64 orig_start = *start; + const u64 orig_end = *end; u64 max_bytes = BTRFS_MAX_EXTENT_SIZE; u64 delalloc_start; u64 delalloc_end; @@ -1994,15 +2004,23 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, int ret; int loops = 0; + /* Caller should pass a valid @end to indicate the search range end */ + ASSERT(orig_end > orig_start); + + /* The range should at least cover part of the page */ + ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE || + orig_end <= page_offset(locked_page))); again: /* step one, find a bunch of delalloc bytes starting at start */ delalloc_start = *start; delalloc_end = 0; found = btrfs_find_delalloc_range(tree, &delalloc_start, &delalloc_end, max_bytes, &cached_state); - if (!found || delalloc_end <= *start) { + if (!found || delalloc_end <= *start || delalloc_start > orig_end) { *start = delalloc_start; - *end = delalloc_end; + + /* @delalloc_end can be -1, never go beyond @orig_end */ + *end = min(delalloc_end, orig_end); free_extent_state(cached_state); return false; } @@ -3780,16 +3798,16 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, struct page *page, struct writeback_control *wbc, unsigned long *nr_written) { - u64 page_end = page_offset(page) + PAGE_SIZE - 1; - bool found; + const u64 page_end = page_offset(page) + PAGE_SIZE - 1; u64 delalloc_start = page_offset(page); u64 delalloc_to_write = 0; - u64 delalloc_end = 0; int ret; int page_started = 0; + while (delalloc_start < page_end) { + u64 delalloc_end = page_end; + bool found; - while (delalloc_end < page_end) { found = find_lock_delalloc_range(&inode->vfs_inode, page, &delalloc_start, &delalloc_end); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index aa5c764fd1d4..241081e2cd5a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1965,6 +1965,13 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page int ret; const bool zoned = btrfs_is_zoned(inode->root->fs_info); + /* + * The range must cover part of the @locked_page, or the returned + * @page_started can confuse the caller. 
+ */ + ASSERT(!(end <= page_offset(locked_page) || + start >= page_offset(locked_page) + PAGE_SIZE)); + if (should_nocow(inode, start, end)) { ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c index 73e96d505f4f..c2e72e7a8ff0 100644 --- a/fs/btrfs/tests/extent-io-tests.c +++ b/fs/btrfs/tests/extent-io-tests.c @@ -112,7 +112,7 @@ static int test_find_delalloc(u32 sectorsize) */ set_extent_delalloc(tmp, 0, sectorsize - 1, 0, NULL); start = 0; - end = 0; + end = start + PAGE_SIZE - 1; found = find_lock_delalloc_range(inode, locked_page, &start, &end); if (!found) { @@ -143,7 +143,7 @@ static int test_find_delalloc(u32 sectorsize) } set_extent_delalloc(tmp, sectorsize, max_bytes - 1, 0, NULL); start = test_start; - end = 0; + end = start + PAGE_SIZE - 1; found = find_lock_delalloc_range(inode, locked_page, &start, &end); if (!found) { @@ -177,14 +177,14 @@ static int test_find_delalloc(u32 sectorsize) goto out_bits; } start = test_start; - end = 0; + end = start + PAGE_SIZE - 1; found = find_lock_delalloc_range(inode, locked_page, &start, &end); if (found) { test_err("found range when we shouldn't have"); goto out_bits; } - if (end != (u64)-1) { + if (end != test_start + PAGE_SIZE - 1) { test_err("did not return the proper end offset"); goto out_bits; } @@ -198,7 +198,7 @@ static int test_find_delalloc(u32 sectorsize) */ set_extent_delalloc(tmp, max_bytes, total_dirty - 1, 0, NULL); start = test_start; - end = 0; + end = start + PAGE_SIZE - 1; found = find_lock_delalloc_range(inode, locked_page, &start, &end); if (!found) { @@ -233,7 +233,7 @@ static int test_find_delalloc(u32 sectorsize) /* We unlocked it in the previous test */ lock_page(locked_page); start = test_start; - end = 0; + end = start + PAGE_SIZE - 1; /* * Currently if we fail to find dirty pages in the delalloc range we * will adjust max_bytes down to PAGE_SIZE and then re-search. 
If From patchwork Sun Aug 29 05:24:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12463743 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 230DAC432BE for ; Sun, 29 Aug 2021 05:25:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0A80F60F39 for ; Sun, 29 Aug 2021 05:25:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234551AbhH2F01 (ORCPT ); Sun, 29 Aug 2021 01:26:27 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:59424 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234789AbhH2F0Z (ORCPT ); Sun, 29 Aug 2021 01:26:25 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 75AE021D6A for ; Sun, 29 Aug 2021 05:25:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1630214733; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FF/VedHCH9UHqBHtPk+5tw1ePogOIgLOfLJFcDgaZu4=; b=XdI3uQx6YcOUSsu1YHQuwM9bS7Rjd8R5xfZuQTJ4QBRCUtfb+XrfbJhNqm4/TcSDyHfY2o x0/NRqEfchD392zlUv8vTNz1hUzr2q4DlzJyfAlwMjvZY3raoSdYJHJqQQRyrnAF9HWFZo aeemnj3ndrqSGvjRQG2WbvBvRExuhoI= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id B20D613964 for ; Sun, 29 Aug 2021 05:25:32 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id II0DHUwaK2HnPAAAGKfGzw (envelope-from ) for ; Sun, 29 Aug 2021 05:25:32 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 26/26] btrfs: only allow subpage compression if the range is fully page aligned Date: Sun, 29 Aug 2021 13:24:58 +0800 Message-Id: <20210829052458.15454-27-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210829052458.15454-1-wqu@suse.com> References: <20210829052458.15454-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For btrfs compressed write, we use a mechanism called async cow, which, unlike regular run_delalloc_cow() or cow_file_range(), also unlocks the first page. This mechanism allows btrfs to continue handling the next ranges without waiting for the time-consuming compression.
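To make that hand-off concrete, here is a minimal sketch of the idea, not the actual fs/btrfs code: sketch_queue_async_cow() is a hypothetical stand-in for the real async cow submission helpers in fs/btrfs/inode.c, and only the first-page locking behaviour is modeled.

#include <linux/types.h>
#include <linux/pagemap.h>

/*
 * Hypothetical stand-in: pretend this defers compression and bio
 * submission of [start, end] to a background workqueue.
 */
static void sketch_queue_async_cow(struct page *locked_page, u64 start, u64 end)
{
}

static int sketch_run_delalloc_compressed(struct page *locked_page, u64 start, u64 end)
{
	/* Hand the range to the background compression machinery. */
	sketch_queue_async_cow(locked_page, start, end);

	/*
	 * Unlike cow_file_range(), the first page is unlocked right away,
	 * so the writeback loop can move on to later dirty ranges while
	 * compression runs in the background.
	 */
	unlock_page(locked_page);
	return 0;
}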
But this has a problem for the subpage case, as we could have the following delalloc ranges for a page: 0 32K 64K | |///////| |///////| \- A \- B In the above case, if we pass both ranges to cow_file_range_async(), both range A and range B will try to unlock the full page [0, 64K). And whichever range finishes later will try to do page operations like end_page_writeback() on an already unlocked page, triggering a VM layer BUG_ON(). To make subpage compression work at least partially, here we add another restriction: only allow compression if the delalloc range is fully page aligned. By that, an async extent is always ensured to unlock the first page exclusively, just like it used to be for regular sectorsize. In theory, we only need to make sure the delalloc range fully covers its first page, but the tailing page will be locked anyway, blocking later writeback until the compression finishes. Thus here we choose to make sure the range is fully page aligned before doing the compression. In the future, we could optimize the situation by properly increasing the subpage::writers number for the locked page, but that also means we need to change how we run the delalloc ranges of a page. (Instead of running each delalloc range we hit, we need to find and lock all delalloc ranges covering the page, then run each of them.) Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 44 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 37 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 241081e2cd5a..64e35224175d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -489,9 +489,6 @@ static noinline int add_async_extent(struct async_chunk *cow, */ static inline bool inode_can_compress(struct btrfs_inode *inode) { - /* Subpage doesn't support compression yet */ - if (inode->root->fs_info->sectorsize < PAGE_SIZE) - return false; if (inode->flags & BTRFS_INODE_NODATACOW || inode->flags & BTRFS_INODE_NODATASUM) return false; @@ -513,6 +510,38 @@ static inline int inode_need_compress(struct btrfs_inode *inode, u64 start, btrfs_ino(inode)); return 0; } + /* + * Special check for subpage. + * + * We lock the full page then run each delalloc range in the page, thus + * for the following case, we will hit a subpage specific corner case: + * + * 0 32K 64K + * | |///////| |///////| + * \- A \- B + * + * In the above case, both range A and range B will try to unlock the full + * page [0, 64K), causing the one finished later to find the page + * unlocked already, triggering various page lock requirement BUG_ON()s. + * + * So here we add an artificial limit that subpage compression can only + * happen if the range is fully page aligned. + * + * In theory we only need to ensure the first page is fully covered, but + * the tailing partial page will be locked until the full compression + * finishes, delaying the write of other ranges. + * + * TODO: Make btrfs_run_delalloc_range() lock all delalloc ranges + * first to prevent any submitted async extent from unlocking the full page. + * By this, we can ensure for the subpage case that only the last async_cow + * will unlock the full page.
+ */ + if (fs_info->sectorsize < PAGE_SIZE) { + if (!IS_ALIGNED(start, PAGE_SIZE) || + !IS_ALIGNED(end + 1, PAGE_SIZE)) + return 0; + } + /* force compress */ if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) return 1; @@ -614,11 +643,12 @@ static noinline int compress_file_range(struct async_chunk *async_chunk) total_compressed = actual_end - start; /* - * skip compression for a small file range(<=blocksize) that - * isn't an inline extent, since it doesn't save disk space at all. + * For the subpage case, we require full page alignment for the sector + * aligned range. + * Thus we must also check against @actual_end, not just @end. */ - if (total_compressed <= blocksize && - (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size)) + if (blocksize < PAGE_SIZE && + !IS_ALIGNED(round_up(actual_end, blocksize), PAGE_SIZE)) goto cleanup_and_bail_uncompressed; total_compressed = min_t(unsigned long, total_compressed,