From patchwork Mon Sep 16 12:02:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11146921 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6B6FB1708 for ; Mon, 16 Sep 2019 12:02:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 53DDE2067B for ; Mon, 16 Sep 2019 12:02:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732298AbfIPMCq (ORCPT ); Mon, 16 Sep 2019 08:02:46 -0400 Received: from mx2.suse.de ([195.135.220.15]:55128 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727795AbfIPMCq (ORCPT ); Mon, 16 Sep 2019 08:02:46 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 77AC0AFF3; Mon, 16 Sep 2019 12:02:44 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v2 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space Date: Mon, 16 Sep 2019 20:02:38 +0800 Message-Id: <20190916120239.12570-1-wqu@suse.com> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] Under the follow case with qgroup enabled, if some error happened after we have reserved delalloc space, then in error handling path, we could cause qgroup data space leakage: From btrfs_truncate_block() in inode.c: ret = btrfs_delalloc_reserve_space(inode, &data_reserved, block_start, blocksize); if (ret) goto out; again: page = find_or_create_page(mapping, index, mask); if (!page) { btrfs_delalloc_release_space(inode, data_reserved, block_start, blocksize, true); btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true); ret = -ENOMEM; goto out; } [CAUSE] In above case, btrfs_delalloc_reserve_space() will call btrfs_qgroup_reserve_data() and mark the io_tree range with EXTENT_QGROUP_RESERVED flag. In the error handling path, we have the following call stack: btrfs_delalloc_release_space() |- btrfs_free_reserved_data_space() |- btrsf_qgroup_free_data() |- __btrfs_qgroup_release_data(reserved=@reserved, free=1) |- qgroup_free_reserved_data(reserved=@reserved) |- clear_record_extent_bits(); |- freed += changeset.bytes_changed; However due to a completion bug, qgroup_free_reserved_data() will clear EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, other than the correct BTRFS_I(inode)->io_tree. Since io_failure_tree is never marked with that flag, btrfs_qgroup_free_data() will not free any data reserved space at all, causing a leakage. This type of error handling can only be triggered by errors outside of qgroup code. So EDQUOT error from qgroup can't trigger it. [FIX] Fix the wrong target io_tree. Reported-by: Josef Bacik Fixes: bc42bda22345 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges") Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- Changelog: v2: - Commit message polishment Use proper call chain to describe the error, as it's pretty deep. And rephrase how to trigger the bug. --- fs/btrfs/qgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 2891b57b9e1e..64bdc3e3652d 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3492,7 +3492,7 @@ static int qgroup_free_reserved_data(struct inode *inode, * EXTENT_QGROUP_RESERVED, we won't double free. * So not need to rush. */ - ret = clear_record_extent_bits(&BTRFS_I(inode)->io_failure_tree, + ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, free_start, free_start + free_len - 1, EXTENT_QGROUP_RESERVED, &changeset); if (ret < 0) From patchwork Mon Sep 16 12:02:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11146923 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E897313BD for ; Mon, 16 Sep 2019 12:02:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D1DB72067B for ; Mon, 16 Sep 2019 12:02:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732310AbfIPMCr (ORCPT ); Mon, 16 Sep 2019 08:02:47 -0400 Received: from mx2.suse.de ([195.135.220.15]:55136 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731501AbfIPMCq (ORCPT ); Mon, 16 Sep 2019 08:02:46 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 1EC0EAF7C for ; Mon, 16 Sep 2019 12:02:45 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 2/2] btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls Date: Mon, 16 Sep 2019 20:02:39 +0800 Message-Id: <20190916120239.12570-2-wqu@suse.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190916120239.12570-1-wqu@suse.com> References: <20190916120239.12570-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] The following script can cause btrfs qgroup data space leak: mkfs.btrfs -f $dev mount $dev -o nospace_cache $mnt btrfs subv create $mnt/subv btrfs quota en $mnt btrfs quota rescan -w $mnt btrfs qgroup limit 128m $mnt/subv for (( i = 0; i < 3; i++)); do # Create 3 64M holes for latter fallocate to fail truncate -s 192m $mnt/subv/file xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null sync # it's supposed to fail, and each failure will leak at least 64M # data space xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null rm $mnt/subv/file sync done # Shouldn't fail after we removed the file xfs_io -f -c "falloc 0 64m" $mnt/subv/file [CAUSE] Btrfs qgroup data reserve code allow multiple reservations to happen on a single extent_changeset: E.g: btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M); btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M); btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M); Btrfs qgroup code has its internal tracking to make sure we don't double-reserve in above example. The only pattern utlizing this feature is in the main while loop of btrfs_fallocate() function. However btrfs_qgroup_reserve_data()'s error handling has a bug in that on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED flag but doesn't free previously reserved bytes. This bug has a two fold effect: - Clearing EXTENT_QGROUP_RESERVED ranges This is the correct behavior, but it prevents btrfs_qgroup_check_reserved_leak() to catch the leakage as the detector is purely EXTENT_QGROUP_RESERVED flag based. - Leak the previously reserved data bytes. The bug manifests when N calls to btrfs_qgroup_reserve_data are made and the last one fails, leaking space reserved in the previous ones. [FIX] Also free previously reserved data bytes when btrfs_qgroup_reserve_data fails. Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function") Signed-off-by: Qu Wenruo --- Changelog: v2: - Commit message polishment Rephrase the multiple btrfs_qgroup_reserve_data() calls ability. Rephrase the effect of the bug. --- fs/btrfs/qgroup.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 64bdc3e3652d..59f6a9981087 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3448,6 +3448,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode, while ((unode = ulist_next(&reserved->range_changed, &uiter))) clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val, unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL); + /* Also free data bytes of already reserved one */ + btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid, + orig_reserved, BTRFS_QGROUP_RSV_DATA); extent_changeset_release(reserved); return ret; }