From patchwork Tue Sep 27 03:10:16 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 9351459 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 8F5026077A for ; Tue, 27 Sep 2016 03:10:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7E28D28E70 for ; Tue, 27 Sep 2016 03:10:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7235428FDC; Tue, 27 Sep 2016 03:10:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D3E8228FD0 for ; Tue, 27 Sep 2016 03:10:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752075AbcI0DKg (ORCPT ); Mon, 26 Sep 2016 23:10:36 -0400 Received: from cn.fujitsu.com ([59.151.112.132]:46246 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S933600AbcI0DK1 (ORCPT ); Mon, 26 Sep 2016 23:10:27 -0400 X-IronPort-AV: E=Sophos;i="5.22,518,1449504000"; d="scan'208";a="11404817" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Sep 2016 11:10:20 +0800 Received: from G08CNEXCHPEKD02.g08.fujitsu.local (unknown [10.167.33.83]) by cn.fujitsu.com (Postfix) with ESMTP id A7578466F727; Tue, 27 Sep 2016 11:10:17 +0800 (CST) Received: from [172.16.0.100] (10.167.226.34) by G08CNEXCHPEKD02.g08.fujitsu.local (10.167.33.89) with Microsoft SMTP Server (TLS) id 14.3.279.2; Tue, 27 Sep 2016 11:10:16 +0800 Subject: Re: [PATCH] qgroup: Prevent qgroup->reserved from going subzero To: Goldwyn Rodrigues , Goldwyn Rodrigues , References: <1474570050-4715-1-git-send-email-rgoldwyn@suse.de> <628e1dcd-ddc3-3f37-8a47-c01c912db970@suse.com> <65e6515c-a9f3-d5f9-db6b-3ffd8d97f90e@cn.fujitsu.com> <0e892f0a-22b3-0fd7-ac8f-d52b76b4d4af@suse.de> From: Qu Wenruo Message-ID: <1274b3b2-d196-ee9e-2261-d2f5cb906477@cn.fujitsu.com> Date: Tue, 27 Sep 2016 11:10:16 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0 MIME-Version: 1.0 In-Reply-To: <0e892f0a-22b3-0fd7-ac8f-d52b76b4d4af@suse.de> X-Originating-IP: [10.167.226.34] X-yoursite-MailScanner-ID: A7578466F727.AF26A X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: quwenruo@cn.fujitsu.com Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP At 09/26/2016 10:31 PM, Goldwyn Rodrigues wrote: > > > On 09/25/2016 09:33 PM, Qu Wenruo wrote: >> >> >> At 09/23/2016 09:43 PM, Goldwyn Rodrigues wrote: >>> >>> [snipped] >> >> Sorry I still don't get the point. >> Would you please give a call flow of the timing dirtying page and >> calling btrfs_qgroup_reserve/free/release_data()? >> >> Like: >> __btrfs_buffered_write() >> |- btrfs_check_data_free_space() >> | |- btrfs_qgroup_reserve_data() <- Mark QGROUP_RESERVED bit >> |- btrfs_dirty_pages() <- Mark page dirty >> >> >> [[Timing of btrfs_invalidatepage()]] >> About your commit message "once while invalidating the page, and the >> next time while free'ing delalloc." >> "Free'ing delalloc" did you mean btrfs_qgroup_free_delayed_ref(). >> >> If so, it means one extent goes through full write back, and long before >> calling btrfs_qgroup_free_delayed_ref(), it will call >> btrfs_qgroup_release_data() to clear QGROUP_RESERVED. >> >> So the call will be: >> __btrfs_buffered_write() >> |- btrfs_check_data_free_space() >> | |- btrfs_qgroup_reserve_data() <- Mark QGROUP_RESERVED bit >> |- btrfs_dirty_pages() <- Mark page dirty >> >> >> run_delalloc_range() >> |- cow_file_range() >> |- extent_clear_unlock_delalloc() <- Clear page dirty >> >> >> >> btrfs_finish_ordered_io() >> |- insert_reserved_file_extent() >> |- btrfs_qgroup_release_data() <- Clear QGROUP_RESERVED bit >> but not decrease reserved space >> >> >> run_one_delyaed_refs() >> |- btrfs_qgroup_free_delayed_ref() <- Directly decrease reserved space >> >> >> So the problem seems to be, btrfs_invalidatepage() is called after >> run_delalloc_range() but before btrfs_finish_ordered_io(). >> >> Did you mean that? > > This happens event before a writeback happens. So, here is what is > happening. This is in reference, and specific to the test case in the > description. The flowchart helps a lot, thanks. But still some flow is still strange. > > Process: dd - first time > __btrfs_buffered_write() > |- btrfs_check_data_free_space() > | |- btrfs_qgroup_reserve_data() <- Mark QGROUP_RESERVED bit > |- btrfs_dirty_pages() <- Mark page dirty > > Please note data writeback does _not_ happen/complete. This part is completely fine. > > Process: dd - next time, new process > sys_open O_TRUNC > . > |-btrfs_setattr() > |-truncate_pagecache() > |-truncate_inode_pages_range() > |-truncate_inode_page() - Page is cleared of Dirty flag. > |-btrfs_invalidatepage(page) > |-__btrfs_qgroup_release_data() > |-btrfs_qgroup_free_refroot() - qgroup->reserved is > reduced by PAGESIZE. That's OK too. No problem. Although the __btrfs_qgroup_release_data() is called by btrfs_qgroup_free_data(). Which cleared QGROUP_RESERVED flag. So the call flow should be |-btrfs_setattr() |-truncate_pagecache() |-truncate_inode_pages_range() |-truncate_inode_page() <- Clear page dirty |-btrfs_invalidatepage(page) |-btrfs_qgroup_free_data() <- reduce qgroup->reserved and clear QGROUP_RESERVED After btrfs_setattr(), the new dd write data. So __btrfs_buffered_write() happens again, dirtying pages and mark QGROUP_RESERVED and increase qgroup->reserved. So here I don't see any problem buffered_write: Mark dirty Mark QGROUP_RESERVED Increase qgroup->reserved btrfs_setattr: Clear dirty Clear QGROUP_RESERVED Decrease qgroup->reserved All pairs with each other, no leak no underflow. Until the last buffered_write(), which doesn't got truncated. > > > Process: sync > btrfs_sync_fs() > |-btrfs_commit_transaction() > |-btrfs_run_delayed_refs() > |- qgroup_free_refroot() - Reduces reserved by the size of the > extent (in this case, the filesize of dd (first time)), even though some > of the PAGESIZEs have been reduced while performing the truncate on the > file. The delayed_ref belongs to the last extent, which doesn't got truncated. And in that case, that last extent got into disk. run_dealloc_range() <- Write data into disk finish_ordered_io() <- Update metadata |- insert_reserved_file_extents() |- btrfs_alloc_reserved_file_extent() | |- btrfs_add_delayed_data_ref | |- btrfs_qgroup_release_data() Then goes to your btrfs_sync_fs() => qgroup_free_refroot() [[I think I find the problem now]] Between btrfs_alloc_reserved_file_extent() and btrfs_qgroup_release_data(), there is a window that delayed_refs can be executed. Since that d*mn delayed_refs can be run at any time asynchronously. In that case, it will call qgroup_free_refroot() first at another process context, and later btrfs_qgroup_release_data() get schedualed, which will clear QGROUP_RESERVED again, causing the underflow. Although it's not the original cause you reported, would you mind to try the following quick-dirty fix without your fix, to see if it fixes the problem? ------- ------- > > I hope that makes it clear. > >> >> [[About test case]] >> And for the test case, I can't reproduce the problem no matter if I >> apply the fix or not. >> >> Either way it just fails after 3 loops of dd, and later dd will all fail. >> But I can still remove the file and write new data into the fs. >> > > Strange? I can reproduce at every instance of running it. Even on 4.8.0-rc7 Maybe related to the memory size. Since you're also using VM, maybe your VM RAM is too small, so dirty pages pressure triggers a commit halfway and cause the race? Thanks, Qu > >> >> [[Extra protect about qgroup->reserved]] >> And for the underflowed qgroup reserve space, would you mind to add >> warning for that case? >> Just like what we did in qgroup excl/rfer values, so at least it won't >> make qgroup blocking any write. >> > > Oh yes, I wonder why that is not placed there when it is placed in all > other location where qgroup->reserved is reduced. > > Also, this has nothing to do with the comment of two ways of free'ing > the qgroups, as suggested in the commit statement. Thats my bad. > Reviewed-by: Qu Wenruo --- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e6811c4..70efa14 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2168,15 +2168,15 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, ins.objectid = disk_bytenr; ins.offset = disk_num_bytes; ins.type = BTRFS_EXTENT_ITEM_KEY; - ret = btrfs_alloc_reserved_file_extent(trans, root, - root->root_key.objectid, - btrfs_ino(inode), file_pos, - ram_bytes, &ins); /* * Release the reserved range from inode dirty range map, as it is * already moved into delayed_ref_head */ btrfs_qgroup_release_data(inode, file_pos, ram_bytes); + ret = btrfs_alloc_reserved_file_extent(trans, root, + root->root_key.objectid, + btrfs_ino(inode), file_pos, + ram_bytes, &ins); out: btrfs_free_path(path);