Message ID | 77ec19769e75c704cb260b98b41e33340a51c40c.1692181669.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: retry flushing for del_balance_item() if the transaction is interrupted | expand |
On Wed, Aug 16, 2023 at 06:28:16PM +0800, Qu Wenruo wrote: > [BUG] > > There is an internal bug report that there are only 3 lines of btrfs > errors, then btrfs falls read-only: > > [358958.022131] BTRFS info (device dm-9): balance: canceled > [358958.022148] BTRFS: error (device dm-9) in __cancel_balance:4014: errno=-4 unknown > [358958.022150] BTRFS info (device dm-9): forced readonly > > [CAUSE] > The error number -4 is -EINTR, and according to the code line (although > backported kernel, the code is still relevant upstream), it's the > btrfs_handle_fs_error() call inside reset_balance_state(). > > This can happen when we try to start a transaction which requires > metadata flushing. > > This metadata flushing can be interrupted by signal, thus it can return > -EINTR. > > For our case, the -EINTR is deadly because we don't handle the error at > all, and immediately mark the fs read-only in the following call chain: > > reset_balance_state() > |- del_balance_item() > | `- btrfs_start_transation_fallback_global_rsv() > | `- start_transaction() > | `- btrfs_block_rsv_add() > | `- __reserve_bytes() > | `- handle_reserve_ticket() > | `- wait_reserve_ticket() > | `- prepare_to_wait_event() > | This wait has TASK_KILLABLE, thus can be > | interrupted. > | Thus we return -EINTR. > | > |- IS_ERR(trans) triggered > |- btrfs_handle_fs_error() > The fs is marked read-only. > > [FIX] > For this particular call site, we can not afford just erroring out with > -EINTR. > > This patch would fix the error by retry until either we got a valid > transaction handle, or got an error other than -EINTR. > > Since we're here, also enhance the error message a little to make it > more readable. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/volumes.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 189da583bb67..e83711fe31bb 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -3507,7 +3507,15 @@ static int del_balance_item(struct btrfs_fs_info *fs_info) > if (!path) > return -ENOMEM; > > - trans = btrfs_start_transaction_fallback_global_rsv(root, 0); > + do { > + /* > + * The transaction starting here can be interrupted, but if we > + * just error out we would mark the fs read-only. > + * Thus here we try to start the transaction again if it's > + * interrupted. > + */ > + trans = btrfs_start_transaction_fallback_global_rsv(root, 0); > + } while (IS_ERR(trans) && PTR_ERR(trans) == -EINTR); This condition can be simply: trans == ERR_PTR(-EINTR) My only concern is if this can turn into an infinite loop due to a high enough rate of signals being sent to the process... Instead of this I would make reset_balance_state() just print a warning, and not call btrfs_handle_fs_error() and then change insert_balance_item() to not fail in case the item already exists - instead just overwrite it. Thanks. > if (IS_ERR(trans)) { > btrfs_free_path(path); > return PTR_ERR(trans); > @@ -3594,7 +3602,7 @@ static void reset_balance_state(struct btrfs_fs_info *fs_info) > kfree(bctl); > ret = del_balance_item(fs_info); > if (ret) > - btrfs_handle_fs_error(fs_info, ret, NULL); > + btrfs_handle_fs_error(fs_info, ret, "failed to delete balance item"); > } > > /* > -- > 2.41.0 >
On 2023/8/16 20:45, Filipe Manana wrote: > On Wed, Aug 16, 2023 at 06:28:16PM +0800, Qu Wenruo wrote: >> [BUG] >> >> There is an internal bug report that there are only 3 lines of btrfs >> errors, then btrfs falls read-only: >> >> [358958.022131] BTRFS info (device dm-9): balance: canceled >> [358958.022148] BTRFS: error (device dm-9) in __cancel_balance:4014: errno=-4 unknown >> [358958.022150] BTRFS info (device dm-9): forced readonly >> >> [CAUSE] >> The error number -4 is -EINTR, and according to the code line (although >> backported kernel, the code is still relevant upstream), it's the >> btrfs_handle_fs_error() call inside reset_balance_state(). >> >> This can happen when we try to start a transaction which requires >> metadata flushing. >> >> This metadata flushing can be interrupted by signal, thus it can return >> -EINTR. >> >> For our case, the -EINTR is deadly because we don't handle the error at >> all, and immediately mark the fs read-only in the following call chain: >> >> reset_balance_state() >> |- del_balance_item() >> | `- btrfs_start_transation_fallback_global_rsv() >> | `- start_transaction() >> | `- btrfs_block_rsv_add() >> | `- __reserve_bytes() >> | `- handle_reserve_ticket() >> | `- wait_reserve_ticket() >> | `- prepare_to_wait_event() >> | This wait has TASK_KILLABLE, thus can be >> | interrupted. >> | Thus we return -EINTR. >> | >> |- IS_ERR(trans) triggered >> |- btrfs_handle_fs_error() >> The fs is marked read-only. >> >> [FIX] >> For this particular call site, we can not afford just erroring out with >> -EINTR. >> >> This patch would fix the error by retry until either we got a valid >> transaction handle, or got an error other than -EINTR. >> >> Since we're here, also enhance the error message a little to make it >> more readable. >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/volumes.c | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 189da583bb67..e83711fe31bb 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -3507,7 +3507,15 @@ static int del_balance_item(struct btrfs_fs_info *fs_info) >> if (!path) >> return -ENOMEM; >> >> - trans = btrfs_start_transaction_fallback_global_rsv(root, 0); >> + do { >> + /* >> + * The transaction starting here can be interrupted, but if we >> + * just error out we would mark the fs read-only. >> + * Thus here we try to start the transaction again if it's >> + * interrupted. >> + */ >> + trans = btrfs_start_transaction_fallback_global_rsv(root, 0); >> + } while (IS_ERR(trans) && PTR_ERR(trans) == -EINTR); > > This condition can be simply: trans == ERR_PTR(-EINTR) > > My only concern is if this can turn into an infinite loop due to a high enough rate of > signals being sent to the process... Yep, that's indeed a concern. The other solution is to introduce a flag to disallow signal for the ticket system (aka non-killable wait), which can get rid of the frequent signal problems. In fact, we may not want certain reclaim to be interrupted at all, especially for BTRFS_RESERVE_FLUSH_ALL_STEAL, which are only utilized for very critical operations like unlink and other deletion operations. > > Instead of this I would make reset_balance_state() just print a warning, and not > call btrfs_handle_fs_error() and then change insert_balance_item() to not fail in > case the item already exists - instead just overwrite it. This means, if a unlucky interruption happened, the left balance item can cause us to resume a balance on the next mount, which can be unexpected for the end user. Thanks, Qu > > Thanks. > > >> if (IS_ERR(trans)) { >> btrfs_free_path(path); >> return PTR_ERR(trans); >> @@ -3594,7 +3602,7 @@ static void reset_balance_state(struct btrfs_fs_info *fs_info) >> kfree(bctl); >> ret = del_balance_item(fs_info); >> if (ret) >> - btrfs_handle_fs_error(fs_info, ret, NULL); >> + btrfs_handle_fs_error(fs_info, ret, "failed to delete balance item"); >> } >> >> /* >> -- >> 2.41.0 >>
On Wed, Aug 16, 2023 at 10:54 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2023/8/16 20:45, Filipe Manana wrote: > > On Wed, Aug 16, 2023 at 06:28:16PM +0800, Qu Wenruo wrote: > >> [BUG] > >> > >> There is an internal bug report that there are only 3 lines of btrfs > >> errors, then btrfs falls read-only: > >> > >> [358958.022131] BTRFS info (device dm-9): balance: canceled > >> [358958.022148] BTRFS: error (device dm-9) in __cancel_balance:4014: errno=-4 unknown > >> [358958.022150] BTRFS info (device dm-9): forced readonly > >> > >> [CAUSE] > >> The error number -4 is -EINTR, and according to the code line (although > >> backported kernel, the code is still relevant upstream), it's the > >> btrfs_handle_fs_error() call inside reset_balance_state(). > >> > >> This can happen when we try to start a transaction which requires > >> metadata flushing. > >> > >> This metadata flushing can be interrupted by signal, thus it can return > >> -EINTR. > >> > >> For our case, the -EINTR is deadly because we don't handle the error at > >> all, and immediately mark the fs read-only in the following call chain: > >> > >> reset_balance_state() > >> |- del_balance_item() > >> | `- btrfs_start_transation_fallback_global_rsv() > >> | `- start_transaction() > >> | `- btrfs_block_rsv_add() > >> | `- __reserve_bytes() > >> | `- handle_reserve_ticket() > >> | `- wait_reserve_ticket() > >> | `- prepare_to_wait_event() > >> | This wait has TASK_KILLABLE, thus can be > >> | interrupted. > >> | Thus we return -EINTR. > >> | > >> |- IS_ERR(trans) triggered > >> |- btrfs_handle_fs_error() > >> The fs is marked read-only. > >> > >> [FIX] > >> For this particular call site, we can not afford just erroring out with > >> -EINTR. > >> > >> This patch would fix the error by retry until either we got a valid > >> transaction handle, or got an error other than -EINTR. > >> > >> Since we're here, also enhance the error message a little to make it > >> more readable. > >> > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > >> --- > >> fs/btrfs/volumes.c | 12 ++++++++++-- > >> 1 file changed, 10 insertions(+), 2 deletions(-) > >> > >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > >> index 189da583bb67..e83711fe31bb 100644 > >> --- a/fs/btrfs/volumes.c > >> +++ b/fs/btrfs/volumes.c > >> @@ -3507,7 +3507,15 @@ static int del_balance_item(struct btrfs_fs_info *fs_info) > >> if (!path) > >> return -ENOMEM; > >> > >> - trans = btrfs_start_transaction_fallback_global_rsv(root, 0); > >> + do { > >> + /* > >> + * The transaction starting here can be interrupted, but if we > >> + * just error out we would mark the fs read-only. > >> + * Thus here we try to start the transaction again if it's > >> + * interrupted. > >> + */ > >> + trans = btrfs_start_transaction_fallback_global_rsv(root, 0); > >> + } while (IS_ERR(trans) && PTR_ERR(trans) == -EINTR); > > > > This condition can be simply: trans == ERR_PTR(-EINTR) > > > > My only concern is if this can turn into an infinite loop due to a high enough rate of > > signals being sent to the process... > > Yep, that's indeed a concern. > > The other solution is to introduce a flag to disallow signal for the > ticket system (aka non-killable wait), which can get rid of the frequent > signal problems. > > In fact, we may not want certain reclaim to be interrupted at all, > especially for BTRFS_RESERVE_FLUSH_ALL_STEAL, which are only utilized > for very critical operations like unlink and other deletion operations. We shouldn't need to call btrfs_start_transaction_fallback_global_rsv() because we don't need to reserve space. A join transaction would be enough, because: 1) We pass 0 items to the start transaction call; 2) More importantly, we are updating the root tree and the root tree uses the global block reserve, see btrfs_init_root_block_rsv(). So the start transaction call should not be reserving any space because "num_items == 0" and "flush != BTRFS_RESERVE_FLUSH_ALL". Are you sure the -EINTR is coming from the btrfs_start_transaction_fallback_global_rsv() and not from the btrfs_search_slot() call at del_balance_item() for example? Nothing in the partial log you pasted can tell the -EINTR comes from btrfs_start_transaction_fallback_global_rsv(), which would be very surprising because it's not reserving any space for those reasons mentioned above. > > > > > Instead of this I would make reset_balance_state() just print a warning, and not > > call btrfs_handle_fs_error() and then change insert_balance_item() to not fail in > > case the item already exists - instead just overwrite it. > > This means, if a unlucky interruption happened, the left balance item > can cause us to resume a balance on the next mount, which can be > unexpected for the end user. Yes, but maybe not much work is done unless after that some block groups got fragmented enough to trigger the stored balance filters in the item. The worst case is without filters, where all block groups are always relocated. Not ideal, yes. But having had a closer look, my concern is that I don't see how btrfs_start_transaction_fallback_global_rsv() can return -EINTR, and I suspect it comes from somewhere else. Thanks. > > Thanks, > Qu > > > > Thanks. > > > > > >> if (IS_ERR(trans)) { > >> btrfs_free_path(path); > >> return PTR_ERR(trans); > >> @@ -3594,7 +3602,7 @@ static void reset_balance_state(struct btrfs_fs_info *fs_info) > >> kfree(bctl); > >> ret = del_balance_item(fs_info); > >> if (ret) > >> - btrfs_handle_fs_error(fs_info, ret, NULL); > >> + btrfs_handle_fs_error(fs_info, ret, "failed to delete balance item"); > >> } > >> > >> /* > >> -- > >> 2.41.0 > >>
On 2023/8/17 15:39, Filipe Manana wrote: > On Wed, Aug 16, 2023 at 10:54 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: [...] >>> >>> My only concern is if this can turn into an infinite loop due to a high enough rate of >>> signals being sent to the process... >> >> Yep, that's indeed a concern. >> >> The other solution is to introduce a flag to disallow signal for the >> ticket system (aka non-killable wait), which can get rid of the frequent >> signal problems. >> >> In fact, we may not want certain reclaim to be interrupted at all, >> especially for BTRFS_RESERVE_FLUSH_ALL_STEAL, which are only utilized >> for very critical operations like unlink and other deletion operations. > > We shouldn't need to call > btrfs_start_transaction_fallback_global_rsv() because we don't need to > reserve space. > A join transaction would be enough, because: > > 1) We pass 0 items to the start transaction call; > > 2) More importantly, we are updating the root tree and the root tree > uses the global block reserve, see btrfs_init_root_block_rsv(). > > So the start transaction call should not be reserving any space > because "num_items == 0" and "flush != BTRFS_RESERVE_FLUSH_ALL". My bad, this is caused by the difference between SLE and the upstream. For upstream, we go BTRFS_RESERVE_FLUSH_ALL_STEAL for this call site (btrfs_start_transaction_fallback_global_rsv()), but for the SLE kernel it's just a plain btrfs_start_transaction() which is going to need to do block rsv. I should just backport the patch changing the start transaction helper. And indeed your analyse is correct, for upstream kernel we won't flush for btrfs_start_transaction_fallback_global_rsv(), thus we don't need to handle it at all. > Are you sure the -EINTR is coming from the > btrfs_start_transaction_fallback_global_rsv() and not from the > btrfs_search_slot() call at del_balance_item() for example? Just to fill my curiosity, I checked the tree read code which waits using UNINTERRUPTIBLE, but it's no longer relevant anymore... Thanks for pointing out the hole! Qu > > Nothing in the partial log you pasted can tell the -EINTR comes from > btrfs_start_transaction_fallback_global_rsv(), which would be very > surprising > because it's not reserving any space for those reasons mentioned above. > >> >>> >>> Instead of this I would make reset_balance_state() just print a warning, and not >>> call btrfs_handle_fs_error() and then change insert_balance_item() to not fail in >>> case the item already exists - instead just overwrite it. >> >> This means, if a unlucky interruption happened, the left balance item >> can cause us to resume a balance on the next mount, which can be >> unexpected for the end user. > > Yes, but maybe not much work is done unless after that some block > groups got fragmented enough to trigger the stored balance filters in > the item. > The worst case is without filters, where all block groups are always relocated. > Not ideal, yes. > > But having had a closer look, my concern is that I don't see how > btrfs_start_transaction_fallback_global_rsv() can return -EINTR, and I > suspect > it comes from somewhere else. > > Thanks. > >> >> Thanks, >> Qu >>> >>> Thanks. >>> >>> >>>> if (IS_ERR(trans)) { >>>> btrfs_free_path(path); >>>> return PTR_ERR(trans); >>>> @@ -3594,7 +3602,7 @@ static void reset_balance_state(struct btrfs_fs_info *fs_info) >>>> kfree(bctl); >>>> ret = del_balance_item(fs_info); >>>> if (ret) >>>> - btrfs_handle_fs_error(fs_info, ret, NULL); >>>> + btrfs_handle_fs_error(fs_info, ret, "failed to delete balance item"); >>>> } >>>> >>>> /* >>>> -- >>>> 2.41.0 >>>>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 189da583bb67..e83711fe31bb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3507,7 +3507,15 @@ static int del_balance_item(struct btrfs_fs_info *fs_info) if (!path) return -ENOMEM; - trans = btrfs_start_transaction_fallback_global_rsv(root, 0); + do { + /* + * The transaction starting here can be interrupted, but if we + * just error out we would mark the fs read-only. + * Thus here we try to start the transaction again if it's + * interrupted. + */ + trans = btrfs_start_transaction_fallback_global_rsv(root, 0); + } while (IS_ERR(trans) && PTR_ERR(trans) == -EINTR); if (IS_ERR(trans)) { btrfs_free_path(path); return PTR_ERR(trans); @@ -3594,7 +3602,7 @@ static void reset_balance_state(struct btrfs_fs_info *fs_info) kfree(bctl); ret = del_balance_item(fs_info); if (ret) - btrfs_handle_fs_error(fs_info, ret, NULL); + btrfs_handle_fs_error(fs_info, ret, "failed to delete balance item"); } /*
[BUG] There is an internal bug report that there are only 3 lines of btrfs errors, then btrfs falls read-only: [358958.022131] BTRFS info (device dm-9): balance: canceled [358958.022148] BTRFS: error (device dm-9) in __cancel_balance:4014: errno=-4 unknown [358958.022150] BTRFS info (device dm-9): forced readonly [CAUSE] The error number -4 is -EINTR, and according to the code line (although backported kernel, the code is still relevant upstream), it's the btrfs_handle_fs_error() call inside reset_balance_state(). This can happen when we try to start a transaction which requires metadata flushing. This metadata flushing can be interrupted by signal, thus it can return -EINTR. For our case, the -EINTR is deadly because we don't handle the error at all, and immediately mark the fs read-only in the following call chain: reset_balance_state() |- del_balance_item() | `- btrfs_start_transation_fallback_global_rsv() | `- start_transaction() | `- btrfs_block_rsv_add() | `- __reserve_bytes() | `- handle_reserve_ticket() | `- wait_reserve_ticket() | `- prepare_to_wait_event() | This wait has TASK_KILLABLE, thus can be | interrupted. | Thus we return -EINTR. | |- IS_ERR(trans) triggered |- btrfs_handle_fs_error() The fs is marked read-only. [FIX] For this particular call site, we can not afford just erroring out with -EINTR. This patch would fix the error by retry until either we got a valid transaction handle, or got an error other than -EINTR. Since we're here, also enhance the error message a little to make it more readable. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/volumes.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)