[0/5,v2] Deal with a few ENOSPC corner cases

Message ID 20200313195809.141753-1-josef@toxicpanda.com (mailing list archive)

Message

Josef Bacik March 13, 2020, 7:58 p.m. UTC
v1->v2:
- Dropped "btrfs: only take normal tickets into account in
  may_commit_transaction" because "btrfs: only check priority tickets for
  priority flushing" should actually fix the problem, and Nikolay pointed out
  that evict uses the priority list but is allowed to commit, so we need to take
  into account priority tickets sometimes.
- Added "btrfs: allow us to use up to 90% of the global rsv for" so that the
  global rsv change was separate from the serialization patch.
- Fixed up some changelogs.
- Dropped an extra trace_printk that made it into v2.

----------------------- Original email --------------------------------------

Nikolay has been digging into a failure of generic/320 on ppc64.  This has
shaken out a variety of issues, and he's done a good job at running all of the
weird corners down and then testing my ideas to get them all fixed.  This is the
series that has survived the longest, so we're declaring victory.

First there is the global reserve stealing logic.  The way unlink works is it
attempts to start a transaction with a normal reservation amount, and if this
fails with ENOSPC we fall back to stealing from the global reserve.  This is
problematic for the same reason previous iterations of the ENOSPC handling
were: thundering herd.  We get a bunch of failures all at once, everybody tries
to allocate from the global reserve, some win and some lose, and the losers get
an ENOSPC.

To fix this we need to integrate this logic into the normal ENOSPC
infrastructure.  The idea is simple: we add a new flushing state that indicates
we are allowed to steal from the global reserve.  We still go through all of the
normal flushing work, and at the moment we would begin to fail all the tickets,
we first try to satisfy any tickets that are allowed to steal by stealing from
the global reserve.  If this works we start the flushing system over again just
like we would with a normal ticket satisfaction.  This serializes our global
reserve stealing, so we don't have the thundering herd problem.
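
As a rough illustration of the serialized stealing described above, here is a
simplified userspace sketch.  It is not the code from the patches: the structs,
the field names (can_steal, min_keep) and the array-based ticket queue are
assumptions made for the example.  The function names loosely follow the
wording of this email.  The point it shows is that only the single flusher
walks the tickets, so at most one steal is attempted at a time, and a
successful steal restarts flushing instead of failing the rest of the queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct ticket {
	uint64_t bytes;   /* bytes this reservation still needs */
	bool can_steal;   /* reserved with the "allowed to steal" flush state */
	int error;        /* 0 on success, -ENOSPC on failure */
	bool done;        /* ticket has been satisfied or failed */
};

struct global_rsv {
	uint64_t reserved; /* bytes currently held by the global reserve */
	uint64_t min_keep; /* bytes the reserve must retain (assumed limit) */
};

/* Try to satisfy one ticket from the global reserve. */
static bool steal_from_global_rsv(struct global_rsv *rsv, struct ticket *t)
{
	if (!t->can_steal)
		return false;
	if (rsv->reserved < t->bytes ||
	    rsv->reserved - t->bytes < rsv->min_keep)
		return false;

	rsv->reserved -= t->bytes;
	t->bytes = 0;
	t->error = 0;
	t->done = true;
	return true;
}

/*
 * Called by the one flusher after normal flushing found nothing more to
 * reclaim.  Tickets are visited in queue order.  Returns true as soon as a
 * steal succeeds, which tells the caller to restart the flushing states;
 * returns false once every remaining ticket has been failed with -ENOSPC.
 */
static bool fail_all_tickets(struct global_rsv *rsv,
			     struct ticket *tickets, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct ticket *t = &tickets[i];

		if (t->done)
			continue;
		if (steal_from_global_rsv(rsv, t))
			return true;
		t->error = -ENOSPC;
		t->done = true;
	}
	return false;
}
```

Because the decision is made in one place, callers never race each other for
the reserve; they just wait on their ticket and see either success or ENOSPC.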

This isn't the only problem however.  Nikolay also noticed that we would
sometimes have huge amounts of space in the trans block rsv and we would ENOSPC
out.  This is because the may_commit_transaction() logic didn't take into
account the space that would be reclaimed by all of the outstanding trans
handles being required to stop in order to commit the transaction.
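
To make the accounting gap concrete, here is a minimal sketch of the decision,
assuming a much simplified model.  The struct, its fields and the function name
commit_would_help() are hypothetical and exist only for this example; the real
may_commit_transaction() considers more than this.  The sketch only shows the
idea that space held by the trans block rsv counts as reclaimable, because a
commit forces the outstanding handles to stop and give it back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what a commit could give back. */
struct commit_estimate {
	uint64_t pinned_bytes;       /* freed extents released by a commit */
	uint64_t trans_rsv_reserved; /* bytes held by open trans handles */
};

/*
 * Returns true when committing is expected to free at least bytes_needed.
 * The trans rsv space is subtracted from the requirement first, which also
 * avoids overflowing a sum of the two counters.
 */
static bool commit_would_help(const struct commit_estimate *e,
			      uint64_t bytes_needed)
{
	if (e->trans_rsv_reserved >= bytes_needed)
		return true;

	return e->pinned_bytes >= bytes_needed - e->trans_rsv_reserved;
}
```

With trans_rsv_reserved left out of the estimate, a request for 50 bytes
against 10 pinned bytes would be refused even though 100 bytes sit in the
trans block rsv, which is the spurious ENOSPC described above.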

Another corner here was that priority tickets could race in and make
may_commit_transaction() think that it had no work left to do, and thus not
commit the transaction.

Those fixes all address the failures that Nikolay was seeing.  The last two
patches are just cleanups around how we handle priority tickets.  We shouldn't
even be serializing priority tickets behind normal tickets, only behind other
priority tickets.  And finally there would be a small window where priority
tickets would fail out if there were multiple priority tickets and one of them
failed.  This is addressed by the previous patch.
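
The queueing rule for priority tickets can be sketched as follows.  This is an
illustrative model, not the patch itself: the enum is a reduced local copy
(only the two "normal" states are taken from this thread, the third value
stands in for any priority flusher such as evict), and has_pending_tickets()
with its counter arguments is a hypothetical helper.  The behaviour assumed for
normal flushers, waiting behind both lists, is an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced local copy of the flush states, for illustration only. */
enum btrfs_reserve_flush_enum {
	BTRFS_RESERVE_FLUSH_EVICT,     /* stands in for a priority flusher */
	BTRFS_RESERVE_FLUSH_ALL,
	BTRFS_RESERVE_FLUSH_ALL_STEAL,
};

static inline bool is_normal_flushing(enum btrfs_reserve_flush_enum flush)
{
	return (flush == BTRFS_RESERVE_FLUSH_ALL) ||
	       (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
}

/*
 * Decide whether a new reservation has to queue behind existing tickets.
 * Normal flushers wait behind everything (assumed); priority flushers only
 * wait behind other priority tickets.
 */
static bool has_pending_tickets(enum btrfs_reserve_flush_enum flush,
				size_t normal_tickets,
				size_t priority_tickets)
{
	if (is_normal_flushing(flush))
		return normal_tickets > 0 || priority_tickets > 0;

	return priority_tickets > 0;
}
```

Under this rule a priority reservation with three normal tickets queued ahead
of it still gets to try its reservation immediately.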

Nikolay has put these through many iterations of generic/320, and so far it
hasn't failed.  Thanks,

Josef

Comments

Nikolay Borisov March 17, 2020, 3:46 p.m. UTC | #1
On 13.03.20 г. 21:58 ч., Josef Bacik wrote:
> v1->v2:
> - Dropped "btrfs: only take normal tickets into account in
>   may_commit_transaction" because "btrfs: only check priority tickets for
>   priority flushing" should actually fix the problem, and Nikolay pointed out
>   that evict uses the priority list but is allowed to commit, so we need to take
>   into account priority tickets sometimes.
> - Added "btrfs: allow us to use up to 90% of the global rsv for" so that the
>   global rsv change was separate from the serialization patch.
> - Fixed up some changelogs.
> - Dropped an extra trace_printk that made it into v2.
> 
> ----------------------- Original email --------------------------------------
> 
> Nikolay has been digging into a failure of generic/320 on ppc64.  This has
> shaken out a variety of issues, and he's done a good job at running all of the
> weird corners down and then testing my ideas to get them all fixed.  This is the
> series that has survived the longest, so we're declaring victory.
> 
> First there is the global reserve stealing logic.  The way unlink works is it
> attempts to start a transaction with a normal reservation amount, and if this
> fails with ENOSPC we fall back to stealing from the global reserve.  This is
> problematic because of all the same reasons we had with previous iterations of
> the ENOSPC handling, thundering herd.  We get a bunch of failures all at once,
> everybody tries to allocate from the global reserve, some win and some lose, we
> get an ENOSPC.
> 
> To fix this we need to integrate this logic into the normal ENOSPC
> infrastructure.  The idea is simple, we add a new flushing state that indicates
> we are allowed to steal from the global reserve.  We still go through all of the
> normal flushing work, and at the moment we begin to fail all the tickets we try
> to satisfy any tickets that are allowed to steal by stealing from the global
> reserve.  If this works we start the flushing system over again just like we
> would with a normal ticket satisfaction.  This serializes our global reserve
> stealing, so we don't have the thundering herd problem
> 
> This isn't the only problem however.  Nikolay also noticed that we would
> sometimes have huge amounts of space in the trans block rsv and we would ENOSPC
> out.  This is because the may_commit_transaction() logic didn't take into
> account the space that would be reclaimed by all of the outstanding trans
> handles being required to stop in order to commit the transaction.
> 
> Another corner here was that priority tickets could race in and make
> may_commit_transaction() think that it had no work left to do, and thus not
> commit the transaction.
> 
> Those fixes all address the failures that Nikolay was seeing.  The last two
> patches are just cleanups around how we handle priority tickets.  We shouldn't
> even be serializing priority tickets behind normal tickets, only behind other
> priority tickets.  And finally there would be a small window where priority
> tickets would fail out if there were multiple priority tickets and one of them
> failed.  This is addressed by the previous patch.
> 
> Nikolay has put these through many iterations of generic/320, and so far it
> hasn't failed.  Thanks,
> 
> Josef
> 


I tested this on PPC64LE and didn't observe any regressions (apart from
the one fixed by [PATCH] btrfs: force chunk allocation if our global rsv
is larger than metadata), so:

Tested-by: Nikolay Borisov <nborisov@suse.com>

David Sterba March 25, 2020, 3:50 p.m. UTC | #2
On Fri, Mar 13, 2020 at 03:58:04PM -0400, Josef Bacik wrote:
> v1->v2:
> - Dropped "btrfs: only take normal tickets into account in
>   may_commit_transaction" because "btrfs: only check priority tickets for
>   priority flushing" should actually fix the problem, and Nikolay pointed out
>   that evict uses the priority list but is allowed to commit, so we need to take
>   into account priority tickets sometimes.
> - Added "btrfs: allow us to use up to 90% of the global rsv for" so that the
>   global rsv change was separate from the serialization patch.
> - Fixed up some changelogs.
> - Dropped an extra trace_printk that made it into v2.

The patchset seems to be based on some other code, I think it's the
tickets for data chunks. The compilation fails because
BTRFS_RESERVE_FLUSH_DATA is not defined, but it's mentioned in several
patches.

If the base patchset is a hard requirement then both would need to go in
at the same time, otherwise if it's possible to refresh this branch I
could add it to for-next now.

Nikolay Borisov March 25, 2020, 3:52 p.m. UTC | #3
On 25.03.20 г. 17:50 ч., David Sterba wrote:
> On Fri, Mar 13, 2020 at 03:58:04PM -0400, Josef Bacik wrote:
>> v1->v2:
>> - Dropped "btrfs: only take normal tickets into account in
>>   may_commit_transaction" because "btrfs: only check priority tickets for
>>   priority flushing" should actually fix the problem, and Nikolay pointed out
>>   that evict uses the priority list but is allowed to commit, so we need to take
>>   into account priority tickets sometimes.
>> - Added "btrfs: allow us to use up to 90% of the global rsv for" so that the
>>   global rsv change was separate from the serialization patch.
>> - Fixed up some changelogs.
>> - Dropped an extra trace_printk that made it into v2.
> 
> The patchset seems to be based on some other code, I think it's the
> tickets for data chunks. The compilation fails because
> BTRFS_RESERVE_FLUSH_DATA is not defined, but it's mentioned in several
> patches.
> 
> If the base patchset is a hard requirement then both would need to go in
> at the same time, otherwise if it's possible to refresh this branch I
> could add it to for-next now.
> 

No, the data ticket is not a hard requirement. I've tested this branch
on our SLE kernels without it. So the conflict resolution is really minor
- simply removing the conditions involving BTRFS_RESERVE_FLUSH_DATA.

David Sterba March 25, 2020, 6:33 p.m. UTC | #4
On Wed, Mar 25, 2020 at 05:52:38PM +0200, Nikolay Borisov wrote:
> 
> 
> On 25.03.20 г. 17:50 ч., David Sterba wrote:
> > On Fri, Mar 13, 2020 at 03:58:04PM -0400, Josef Bacik wrote:
> >> v1->v2:
> >> - Dropped "btrfs: only take normal tickets into account in
> >>   may_commit_transaction" because "btrfs: only check priority tickets for
> >>   priority flushing" should actually fix the problem, and Nikolay pointed out
> >>   that evict uses the priority list but is allowed to commit, so we need to take
> >>   into account priority tickets sometimes.
> >> - Added "btrfs: allow us to use up to 90% of the global rsv for" so that the
> >>   global rsv change was separate from the serialization patch.
> >> - Fixed up some changelogs.
> >> - Dropped an extra trace_printk that made it into v2.
> > 
> > The patchset seems to be based on some other code, I think it's the
> > tickets for data chunks. The compilation fails because
> > BTRFS_RESERVE_FLUSH_DATA is not defined, but it's mentioned in several
> > patches.
> > 
> > If the base patchset is a hard requirement then both would need to go in
> > at the same time, otherwise if it's possible to refresh this branch I
> > could add it to for-next now.
> > 
> 
> No, the data ticket is not a hard requirement. I've tested this branch
> on our SLE kernels without it. So the conflict resolution is really minor
> - simply removing the conditions involving BTRFS_RESERVE_FLUSH_DATA.

Ok, thanks. With this diff applied, I'll add the branch to for-next and
then to misc-next once some tests finish.

--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1188,8 +1188,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
  */
 static inline bool is_normal_flushing(enum btrfs_reserve_flush_enum flush)
 {
-       return (flush == BTRFS_RESERVE_FLUSH_DATA) ||
-               (flush == BTRFS_RESERVE_FLUSH_ALL) ||
+       return  (flush == BTRFS_RESERVE_FLUSH_ALL) ||
                (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
 }

David Sterba April 3, 2020, 3:46 p.m. UTC | #5
On Fri, Mar 13, 2020 at 03:58:04PM -0400, Josef Bacik wrote:
> v1->v2:
> - Dropped "btrfs: only take normal tickets into account in
>   may_commit_transaction" because "btrfs: only check priority tickets for
>   priority flushing" should actually fix the problem, and Nikolay pointed out
>   that evict uses the priority list but is allowed to commit, so we need to take
>   into account priority tickets sometimes.
> - Added "btrfs: allow us to use up to 90% of the global rsv for" so that the
>   global rsv change was separate from the serialization patch.
> - Fixed up some changelogs.
> - Dropped an extra trace_printk that made it into v2.

Patchset moved to misc-next, thanks.