@@ -3091,7 +3091,22 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	 * chunk tree entries
 	 */
 	ret = btrfs_remove_chunk(trans, chunk_offset);
-	btrfs_end_transaction(trans);
+	if (ret < 0) {
+		btrfs_abort_transaction(trans, ret);
+		btrfs_end_transaction(trans);
+		return ret;
+	}
+
+	/*
+	 * If the block group has a lot of extents (common for data block
+	 * groups) we may have too many block rsv for delayed_refs, which
+	 * may cause ENOSPC for the next balance.
+	 *
+	 * The root fix is to make delayed_refs estimation more accurate,
+	 * but here we can commit transaction to run delayed refs so that
+	 * delayed_refs_rsv will be reset to regular level.
+	 */
+	ret = btrfs_commit_transaction(trans);
 	return ret;
 }
[BUG]
A full balance on certain filesystems can fail with an unexpected ENOSPC:

  BTRFS info (device loop0p3): balance: start -d -m -s
  BTRFS info (device loop0p3): relocating block group 1104150528 flags data
  BTRFS info (device loop0p3): found 14659 extents
  BTRFS info (device loop0p3): found 14659 extents
  BTRFS info (device loop0p3): unable to make block group 30408704 ro
  BTRFS info (device loop0p3): sinfo_used=2298347520 bg_num_bytes=1046872064 min_allocable=1048576
  BTRFS info (device loop0p3): space_info 4 has 18446744072492285952 free, is not full
  BTRFS info (device loop0p3): space_info total=1073741824, used=24281088, pinned=1277952, reserved=1245184, may_use=2264137728, readonly=65536
  BTRFS info (device loop0p3): global_block_rsv: size 3407872 reserved 3407872
  BTRFS info (device loop0p3): trans_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): chunk_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): delayed_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): delayed_refs_rsv: size 2260205568 reserved 2260205568
  BTRFS info (device loop0p3): unable to make block group 30408704 ro
  BTRFS info (device loop0p3): sinfo_used=2289958912 bg_num_bytes=1046872064 min_allocable=1048576
  BTRFS info (device loop0p3): space_info 4 has 18446744072792424448 free, is not full
  BTRFS info (device loop0p3): space_info total=1342177280, used=24281088, pinned=1277952, reserved=1261568, may_use=2232418304, readonly=65536
  BTRFS info (device loop0p3): global_block_rsv: size 3407872 reserved 3407872
  BTRFS info (device loop0p3): trans_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): chunk_block_rsv: size 393216 reserved 393216
  BTRFS info (device loop0p3): delayed_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): delayed_refs_rsv: size 2228486144 reserved 2228486144
  BTRFS info (device loop0p3): unable to make block group 22020096 ro
  BTRFS info (device loop0p3): sinfo_used=32768 bg_num_bytes=8355840 min_allocable=1048576
  BTRFS info (device loop0p3): space_info 2 has 8355840 free, is not full
  BTRFS info (device loop0p3): space_info total=8388608, used=16384, pinned=0, reserved=16384, may_use=0, readonly=0
  BTRFS info (device loop0p3): global_block_rsv: size 3407872 reserved 3407872
  BTRFS info (device loop0p3): trans_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): chunk_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): delayed_block_rsv: size 0 reserved 0
  BTRFS info (device loop0p3): delayed_refs_rsv: size 2093481984 reserved 2093481984

[CAUSE]
Data block group 1104150528 had 14659 extents relocated, so its data
inode (the relocation inode, which records all newly relocated data) can
be pretty big, with exactly 14659 non-hole data extents.

That causes a lot of space to be reserved for delayed refs, which is
more or less acceptable for regular inodes.

Unfortunately, we currently over-estimate to ensure we have enough space
for delayed refs updates, so we reserved around 2.2G just to delete that
data inode.

Then we go on to relocate the next block group. Our metadata block group
is only 1G but already has 2.2G reserved, so it is no wonder we fail with
ENOSPC.

[WORKAROUND]
The real fix needs to rework how we calculate the reserved space for
delayed_refs_rsv.

But at least we can work around this false ENOSPC by committing the
transaction immediately after putting that data inode.

There will still be a window where our metadata space is exhausted, but
that is still better than returning ENOSPC.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Reason for RFC:
Obviously, this is a pretty bad workaround, just trying to make it work
for balance.

In fact, just deleting a super fragmented file could cause the same
problem, and this patch can't address that.

I'm still looking into the delayed_refs_rsv part for btrfs_evict_inode(),
so this patch is definitely not a good solution.
But this RFC itself may inspire us to find a better solution.
---
 fs/btrfs/volumes.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)