Message ID | cover.1706177914.git.fdmanana@suse.com (mailing list archive) |
---|---|
Headers | show |
Series | btrfs: some fixes around unused block deletion | expand |
On 25.01.24 11:28, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > These fix a couple issues regarding block group deletion, either deleting > an unused block group when it shouldn't be deleted due to outstanding > reservations relying on the block group, or unused block groups never > getting deleted since they were created due to pessimistic space > reservation and ended up never being used. More details on the change > logs of each patch. > > Filipe Manana (5): > btrfs: add and use helper to check if block group is used > btrfs: do not delete unused block group if it may be used soon > btrfs: add new unused block groups to the list of unused block groups > btrfs: document what the spinlock unused_bgs_lock protects > btrfs: add comment about list_is_singular() use at btrfs_delete_unused_bgs() > > fs/btrfs/block-group.c | 87 +++++++++++++++++++++++++++++++++++++++++- > fs/btrfs/block-group.h | 7 ++++ > fs/btrfs/fs.h | 3 ++ > 3 files changed, 95 insertions(+), 2 deletions(-) > Looks good to me, Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On Thu, Jan 25, 2024 at 10:26:23AM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > These fix a couple issues regarding block group deletion, either deleting > an unused block group when it shouldn't be deleted due to outstanding > reservations relying on the block group, or unused block groups never > getting deleted since they were created due to pessimistic space > reservation and ended up never being used. More details on the change > logs of each patch. > Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Thu, Jan 25, 2024 at 10:26:23AM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > These fix a couple issues regarding block group deletion, either deleting > an unused block group when it shouldn't be deleted due to outstanding > reservations relying on the block group, or unused block groups never > getting deleted since they were created due to pessimistic space > reservation and ended up never being used. More details on the change > logs of each patch. > > Filipe Manana (5): > btrfs: add and use helper to check if block group is used > btrfs: do not delete unused block group if it may be used soon > btrfs: add new unused block groups to the list of unused block groups > btrfs: document what the spinlock unused_bgs_lock protects > btrfs: add comment about list_is_singular() use at btrfs_delete_unused_bgs() > > fs/btrfs/block-group.c | 87 +++++++++++++++++++++++++++++++++++++++++- > fs/btrfs/block-group.h | 7 ++++ > fs/btrfs/fs.h | 3 ++ > 3 files changed, 95 insertions(+), 2 deletions(-) > > -- > 2.40.1 > Reviewed-by: Boris Burkov <boris@bur.io>
On 2024-01-25 at 10:26 +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > These fix a couple issues regarding block group deletion, either > deleting > an unused block group when it shouldn't be deleted due to outstanding > reservations relying on the block group, or unused block groups never > getting deleted since they were created due to pessimistic space > reservation and ended up never being used. More details on the change > logs of each patch. > > Filipe Manana (5): > btrfs: add and use helper to check if block group is used > btrfs: do not delete unused block group if it may be used soon > btrfs: add new unused block groups to the list of unused block > groups > btrfs: document what the spinlock unused_bgs_lock protects > btrfs: add comment about list_is_singular() use at > btrfs_delete_unused_bgs() > > fs/btrfs/block-group.c | 87 > +++++++++++++++++++++++++++++++++++++++++- > fs/btrfs/block-group.h | 7 ++++ > fs/btrfs/fs.h | 3 ++ > 3 files changed, 95 insertions(+), 2 deletions(-) > Still broken for me, unfortunately.
On Sat, Jan 27, 2024 at 9:39 PM Ivan Shapovalov <intelfx@intelfx.name> wrote: > > On 2024-01-25 at 10:26 +0000, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > These fix a couple issues regarding block group deletion, either > > deleting > > an unused block group when it shouldn't be deleted due to outstanding > > reservations relying on the block group, or unused block groups never > > getting deleted since they were created due to pessimistic space > > reservation and ended up never being used. More details on the change > > logs of each patch. > > > > Filipe Manana (5): > > btrfs: add and use helper to check if block group is used > > btrfs: do not delete unused block group if it may be used soon > > btrfs: add new unused block groups to the list of unused block > > groups > > btrfs: document what the spinlock unused_bgs_lock protects > > btrfs: add comment about list_is_singular() use at > > btrfs_delete_unused_bgs() > > > > fs/btrfs/block-group.c | 87 > > +++++++++++++++++++++++++++++++++++++++++- > > fs/btrfs/block-group.h | 7 ++++ > > fs/btrfs/fs.h | 3 ++ > > 3 files changed, 95 insertions(+), 2 deletions(-) > > > > Still broken for me, unfortunately. I'm curious about your workload. Is it something like continuous, non-stop deduplication or cloning for example? Did you actually experience -ENOSPC errors? Also, if you unmount and then mount again the fs, any unused block groups should be scheduled for deletion once the cleaner thread runs, at least if there's not a huge workload for a minute or two. On top of this patchset, can you try the following patch? https://pastebin.com/raw/U7b0e03g If that still doesn't help, try the following the following patch on top of this patchset: https://pastebin.com/raw/rKiSmG5w Thanks. > > -- > Ivan Shapovalov / intelfx /
On 2024-01-29 at 12:49 +0000, Filipe Manana wrote: > On Sat, Jan 27, 2024 at 9:39 PM Ivan Shapovalov > <intelfx@intelfx.name> wrote: > > > > On 2024-01-25 at 10:26 +0000, fdmanana@kernel.org wrote: > > > From: Filipe Manana <fdmanana@suse.com> > > > > > > These fix a couple issues regarding block group deletion, either > > > deleting > > > an unused block group when it shouldn't be deleted due to > > > outstanding > > > reservations relying on the block group, or unused block groups > > > never > > > getting deleted since they were created due to pessimistic space > > > reservation and ended up never being used. More details on the > > > change > > > logs of each patch. > > > > > > Filipe Manana (5): > > > btrfs: add and use helper to check if block group is used > > > btrfs: do not delete unused block group if it may be used soon > > > btrfs: add new unused block groups to the list of unused block > > > groups > > > btrfs: document what the spinlock unused_bgs_lock protects > > > btrfs: add comment about list_is_singular() use at > > > btrfs_delete_unused_bgs() > > > > > > fs/btrfs/block-group.c | 87 > > > +++++++++++++++++++++++++++++++++++++++++- > > > fs/btrfs/block-group.h | 7 ++++ > > > fs/btrfs/fs.h | 3 ++ > > > 3 files changed, 95 insertions(+), 2 deletions(-) > > > > > > > Still broken for me, unfortunately. > > I'm curious about your workload. Is it something like continuous, > non-stop deduplication or cloning for example? > > Did you actually experience -ENOSPC errors? > > Also, if you unmount and then mount again the fs, any unused block > groups should be scheduled for deletion once the cleaner thread runs, > at least if there's not a huge workload for a minute or two. Apologies, I must have clarified the workload. The easiest way to reproduce the original problem was to run a metadata balance, e. g. `btrfs balance start -m /` or `for u in {0..75..5}; do btrfs balance start -musage=$u /`. I originally spotted this regression by observing an abnormally low metadata space utilization and trying to run the above balance, only to discover that it makes overallocation even worse. So, with the patchset applied, the _metadata balance_ is still broken. I cannot say anything about normal usage, but any attempt to balance metadata immediately explodes the metadata allocation. I did not experience a -ENOSPC per se, but the balance (that I used as a reproducer) eventually consumed *all* unallocated space on the volume and stopped making progress. Attempting to cancel that balance RO'ed the filesystem and I had to reboot. > > On top of this patchset, can you try the following patch? > > https://pastebin.com/raw/U7b0e03g > > If that still doesn't help, try the following the following patch on > top of this patchset: > > https://pastebin.com/raw/rKiSmG5w Should I still try these patches to see if they help with metadata balance, or is that a different problem then? Thanks,
On Mon, Jan 29, 2024 at 5:56 PM Ivan Shapovalov <intelfx@intelfx.name> wrote: > > On 2024-01-29 at 12:49 +0000, Filipe Manana wrote: > > On Sat, Jan 27, 2024 at 9:39 PM Ivan Shapovalov > > <intelfx@intelfx.name> wrote: > > > > > > On 2024-01-25 at 10:26 +0000, fdmanana@kernel.org wrote: > > > > From: Filipe Manana <fdmanana@suse.com> > > > > > > > > These fix a couple issues regarding block group deletion, either > > > > deleting > > > > an unused block group when it shouldn't be deleted due to > > > > outstanding > > > > reservations relying on the block group, or unused block groups > > > > never > > > > getting deleted since they were created due to pessimistic space > > > > reservation and ended up never being used. More details on the > > > > change > > > > logs of each patch. > > > > > > > > Filipe Manana (5): > > > > btrfs: add and use helper to check if block group is used > > > > btrfs: do not delete unused block group if it may be used soon > > > > btrfs: add new unused block groups to the list of unused block > > > > groups > > > > btrfs: document what the spinlock unused_bgs_lock protects > > > > btrfs: add comment about list_is_singular() use at > > > > btrfs_delete_unused_bgs() > > > > > > > > fs/btrfs/block-group.c | 87 > > > > +++++++++++++++++++++++++++++++++++++++++- > > > > fs/btrfs/block-group.h | 7 ++++ > > > > fs/btrfs/fs.h | 3 ++ > > > > 3 files changed, 95 insertions(+), 2 deletions(-) > > > > > > > > > > Still broken for me, unfortunately. > > > > I'm curious about your workload. Is it something like continuous, > > non-stop deduplication or cloning for example? > > > > Did you actually experience -ENOSPC errors? > > > > Also, if you unmount and then mount again the fs, any unused block > > groups should be scheduled for deletion once the cleaner thread runs, > > at least if there's not a huge workload for a minute or two. > > Apologies, I must have clarified the workload. > > The easiest way to reproduce the original problem was to run a metadata > balance, e. g. `btrfs balance start -m /` or > `for u in {0..75..5}; do btrfs balance start -musage=$u /`. > > I originally spotted this regression by observing an abnormally low > metadata space utilization and trying to run the above balance, only to > discover that it makes overallocation even worse. > > So, with the patchset applied, the _metadata balance_ is still broken. > I cannot say anything about normal usage, but any attempt to balance > metadata immediately explodes the metadata allocation. I did not > experience a -ENOSPC per se, but the balance (that I used as a > reproducer) eventually consumed *all* unallocated space on the volume > and stopped making progress. Attempting to cancel that balance RO'ed > the filesystem and I had to reboot. I tried that but I didn't see any problem. Here's a sample test script: https://pastebin.com/raw/vc70BqY5 > > > > > On top of this patchset, can you try the following patch? > > > > https://pastebin.com/raw/U7b0e03g > > > > If that still doesn't help, try the following the following patch on > > top of this patchset: > > > > https://pastebin.com/raw/rKiSmG5w > > Should I still try these patches to see if they help with metadata > balance, or is that a different problem then? Yes. Please try them, given that I can't reproduce your issue with the balance. I could trigger getting 1 or 2 at most unused metadata block groups after having a bunch of tasks doing metadata operations, and patch 3/5 is to address that case. So one thing I'm curious about, and I asked before, is that if you unmount and mount again the fs, after a few minutes (without running that balance), you should get space reclaimed, as any empty block groups are scheduled for deletion at mount time, and the next time the cleaner kthread runs, it deletes them. > > Thanks, > -- > Ivan Shapovalov / intelfx /
On 2024-01-29 at 20:28 +0000, Filipe Manana wrote: > On Mon, Jan 29, 2024 at 5:56 PM Ivan Shapovalov > <intelfx@intelfx.name> wrote: > > > > On 2024-01-29 at 12:49 +0000, Filipe Manana wrote: > > > On Sat, Jan 27, 2024 at 9:39 PM Ivan Shapovalov > > > <intelfx@intelfx.name> wrote: > > > > > > > > On 2024-01-25 at 10:26 +0000, fdmanana@kernel.org wrote: > > > > > From: Filipe Manana <fdmanana@suse.com> > > > > > > > > > > These fix a couple issues regarding block group deletion, > > > > > either > > > > > deleting > > > > > an unused block group when it shouldn't be deleted due to > > > > > outstanding > > > > > reservations relying on the block group, or unused block > > > > > groups > > > > > never > > > > > getting deleted since they were created due to pessimistic > > > > > space > > > > > reservation and ended up never being used. More details on > > > > > the > > > > > change > > > > > logs of each patch. > > > > > > > > > > Filipe Manana (5): > > > > > btrfs: add and use helper to check if block group is used > > > > > btrfs: do not delete unused block group if it may be used > > > > > soon > > > > > btrfs: add new unused block groups to the list of unused > > > > > block > > > > > groups > > > > > btrfs: document what the spinlock unused_bgs_lock protects > > > > > btrfs: add comment about list_is_singular() use at > > > > > btrfs_delete_unused_bgs() > > > > > > > > > > fs/btrfs/block-group.c | 87 > > > > > +++++++++++++++++++++++++++++++++++++++++- > > > > > fs/btrfs/block-group.h | 7 ++++ > > > > > fs/btrfs/fs.h | 3 ++ > > > > > 3 files changed, 95 insertions(+), 2 deletions(-) > > > > > > > > > > > > > Still broken for me, unfortunately. > > > > > > I'm curious about your workload. Is it something like continuous, > > > non-stop deduplication or cloning for example? > > > > > > Did you actually experience -ENOSPC errors? > > > > > > Also, if you unmount and then mount again the fs, any unused > > > block > > > groups should be scheduled for deletion once the cleaner thread > > > runs, > > > at least if there's not a huge workload for a minute or two. > > > > Apologies, I must have clarified the workload. > > > > The easiest way to reproduce the original problem was to run a > > metadata > > balance, e. g. `btrfs balance start -m /` or > > `for u in {0..75..5}; do btrfs balance start -musage=$u /`. > > > > I originally spotted this regression by observing an abnormally low > > metadata space utilization and trying to run the above balance, > > only to > > discover that it makes overallocation even worse. > > > > So, with the patchset applied, the _metadata balance_ is still > > broken. > > I cannot say anything about normal usage, but any attempt to > > balance > > metadata immediately explodes the metadata allocation. I did not > > experience a -ENOSPC per se, but the balance (that I used as a > > reproducer) eventually consumed *all* unallocated space on the > > volume > > and stopped making progress. Attempting to cancel that balance > > RO'ed > > the filesystem and I had to reboot. > > I tried that but I didn't see any problem. > Here's a sample test script: > > https://pastebin.com/raw/vc70BqY5 > > > > > > > > > On top of this patchset, can you try the following patch? > > > > > > https://pastebin.com/raw/U7b0e03g > > > > > > If that still doesn't help, try the following the following patch > > > on > > > top of this patchset: > > > > > > https://pastebin.com/raw/rKiSmG5w > > > > Should I still try these patches to see if they help with metadata > > balance, or is that a different problem then? > > Yes. Please try them, given that I can't reproduce your issue with > the balance. Hi, The second pastebinned patch (applied on top of the patchset and the first pastebinned patch) passes my balance smoke test. As for reproducing the issue: it's a long shot, but could you try on a 4K-native block device (i. e. where `blockdev --getpbsz` tells `4096`)? > > I could trigger getting 1 or 2 at most unused metadata block groups > after having a bunch of tasks doing metadata operations, and patch > 3/5 > is to address that case. > > So one thing I'm curious about, and I asked before, is that if you > unmount and mount again the fs, after a few minutes (without running > that balance), you should get space reclaimed, > as any empty block groups are scheduled for deletion at mount time, > and the next time the cleaner kthread runs, it deletes them. Yes, that indeed does happen. Cancelling the balance and letting the system idle for a while reclaims all (or almost all) overallocated space. However, that only happens _after_ I cancel the balance. If I do not cancel it, it ends up consuming all unallocated space on the volume and ceases to make progress. So, basically, at least the balance operation becomes completely unusable.
On Fri, Feb 2, 2024 at 12:52 AM Ivan Shapovalov <intelfx@intelfx.name> wrote: > > On 2024-01-29 at 20:28 +0000, Filipe Manana wrote: > > On Mon, Jan 29, 2024 at 5:56 PM Ivan Shapovalov > > <intelfx@intelfx.name> wrote: > > > > > > On 2024-01-29 at 12:49 +0000, Filipe Manana wrote: > > > > On Sat, Jan 27, 2024 at 9:39 PM Ivan Shapovalov > > > > <intelfx@intelfx.name> wrote: > > > > > > > > > > On 2024-01-25 at 10:26 +0000, fdmanana@kernel.org wrote: > > > > > > From: Filipe Manana <fdmanana@suse.com> > > > > > > > > > > > > These fix a couple issues regarding block group deletion, > > > > > > either > > > > > > deleting > > > > > > an unused block group when it shouldn't be deleted due to > > > > > > outstanding > > > > > > reservations relying on the block group, or unused block > > > > > > groups > > > > > > never > > > > > > getting deleted since they were created due to pessimistic > > > > > > space > > > > > > reservation and ended up never being used. More details on > > > > > > the > > > > > > change > > > > > > logs of each patch. > > > > > > > > > > > > Filipe Manana (5): > > > > > > btrfs: add and use helper to check if block group is used > > > > > > btrfs: do not delete unused block group if it may be used > > > > > > soon > > > > > > btrfs: add new unused block groups to the list of unused > > > > > > block > > > > > > groups > > > > > > btrfs: document what the spinlock unused_bgs_lock protects > > > > > > btrfs: add comment about list_is_singular() use at > > > > > > btrfs_delete_unused_bgs() > > > > > > > > > > > > fs/btrfs/block-group.c | 87 > > > > > > +++++++++++++++++++++++++++++++++++++++++- > > > > > > fs/btrfs/block-group.h | 7 ++++ > > > > > > fs/btrfs/fs.h | 3 ++ > > > > > > 3 files changed, 95 insertions(+), 2 deletions(-) > > > > > > > > > > > > > > > > Still broken for me, unfortunately. > > > > > > > > I'm curious about your workload. Is it something like continuous, > > > > non-stop deduplication or cloning for example? > > > > > > > > Did you actually experience -ENOSPC errors? > > > > > > > > Also, if you unmount and then mount again the fs, any unused > > > > block > > > > groups should be scheduled for deletion once the cleaner thread > > > > runs, > > > > at least if there's not a huge workload for a minute or two. > > > > > > Apologies, I must have clarified the workload. > > > > > > The easiest way to reproduce the original problem was to run a > > > metadata > > > balance, e. g. `btrfs balance start -m /` or > > > `for u in {0..75..5}; do btrfs balance start -musage=$u /`. > > > > > > I originally spotted this regression by observing an abnormally low > > > metadata space utilization and trying to run the above balance, > > > only to > > > discover that it makes overallocation even worse. > > > > > > So, with the patchset applied, the _metadata balance_ is still > > > broken. > > > I cannot say anything about normal usage, but any attempt to > > > balance > > > metadata immediately explodes the metadata allocation. I did not > > > experience a -ENOSPC per se, but the balance (that I used as a > > > reproducer) eventually consumed *all* unallocated space on the > > > volume > > > and stopped making progress. Attempting to cancel that balance > > > RO'ed > > > the filesystem and I had to reboot. > > > > I tried that but I didn't see any problem. > > Here's a sample test script: > > > > https://pastebin.com/raw/vc70BqY5 > > > > > > > > > > > > > On top of this patchset, can you try the following patch? > > > > > > > > https://pastebin.com/raw/U7b0e03g > > > > > > > > If that still doesn't help, try the following the following patch > > > > on > > > > top of this patchset: > > > > > > > > https://pastebin.com/raw/rKiSmG5w > > > > > > Should I still try these patches to see if they help with metadata > > > balance, or is that a different problem then? > > > > Yes. Please try them, given that I can't reproduce your issue with > > the balance. > > Hi, > > The second pastebinned patch (applied on top of the patchset and the > first pastebinned patch) passes my balance smoke test. Ok, thanks, I've just sent it to the list a few minutes ago: https://lore.kernel.org/linux-btrfs/eba624e8cef9a1e84c9e1ba0c8f32347aa487e63.1706892030.git.fdmanana@suse.com/ > > As for reproducing the issue: it's a long shot, but could you try on a > 4K-native block device (i. e. where `blockdev --getpbsz` tells `4096`)? Well, that's irrelevant. It doesn't change anything on btrfs (node sizes, sector sizes, etc). > > > > > I could trigger getting 1 or 2 at most unused metadata block groups > > after having a bunch of tasks doing metadata operations, and patch > > 3/5 > > is to address that case. > > > > So one thing I'm curious about, and I asked before, is that if you > > unmount and mount again the fs, after a few minutes (without running > > that balance), you should get space reclaimed, > > as any empty block groups are scheduled for deletion at mount time, > > and the next time the cleaner kthread runs, it deletes them. > > Yes, that indeed does happen. Cancelling the balance and letting the > system idle for a while reclaims all (or almost all) overallocated > space. However, that only happens _after_ I cancel the balance. If I do > not cancel it, it ends up consuming all unallocated space on the volume > and ceases to make progress. So, basically, at least the balance > operation becomes completely unusable. Ok, I see. I think relocation of a block group creates another block group when reserving space somewhere, and that block group gets caught next for relocation, which generates another one, and so forth. Thanks for the testing and report. > > -- > Ivan Shapovalov / intelfx /
From: Filipe Manana <fdmanana@suse.com> These fix a couple issues regarding block group deletion, either deleting an unused block group when it shouldn't be deleted due to outstanding reservations relying on the block group, or unused block groups never getting deleted since they were created due to pessimistic space reservation and ended up never being used. More details on the change logs of each patch. Filipe Manana (5): btrfs: add and use helper to check if block group is used btrfs: do not delete unused block group if it may be used soon btrfs: add new unused block groups to the list of unused block groups btrfs: document what the spinlock unused_bgs_lock protects btrfs: add comment about list_is_singular() use at btrfs_delete_unused_bgs() fs/btrfs/block-group.c | 87 +++++++++++++++++++++++++++++++++++++++++- fs/btrfs/block-group.h | 7 ++++ fs/btrfs/fs.h | 3 ++ 3 files changed, 95 insertions(+), 2 deletions(-)