[0/2] btrfs: balance root leak and runaway balance fix

Message ID 20200520065851.12689-1-wqu@suse.com (mailing list archive)

Message

Qu Wenruo May 20, 2020, 6:58 a.m. UTC
This patchset will fix the most wanted balance bug, runaway balance.
All my fault, and all small fixes.

The first patch fixes the root leakage and NULL pointer dereference
caused by it.

The second patch fixes the runaway balance and adds an alerting system to
prevent such problems from happening again.
The runaway fix depends on the root leakage fix, so they are sent together
as one patchset.

The first patch is just resent without any modification.

For backports to older kernels, the first patch needs a small modification
to use atomic_t rather than refcount_t.

Qu Wenruo (2):
  btrfs: relocation: Fix reloc root leakage and the NULL pointer
    reference caused by the leakage
  btrfs: relocation: Clear the DEAD_RELOC_TREE bit for orphan roots to
    prevent runaway balance

 fs/btrfs/disk-io.c    |  1 +
 fs/btrfs/relocation.c | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

Comments

David Sterba May 22, 2020, 11:13 a.m. UTC | #1
On Wed, May 20, 2020 at 02:58:49PM +0800, Qu Wenruo wrote:
> This patchset will fix the most wanted balance bug, runaway balance.
> All my fault, and all small fixes.

Well, that happens.

d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")

is the most broken patch in recent history (5.1+), there were so many
fixups but hopefully this is the last one. I've tagged the patches for
5.1+ stable but we'll need manual backports due to the root refcount
changes in 5.7.

I reproduced the umount crash and verified the fix, the runaway balance
does not happen anymore in the test so I guess we have all the needed
fixes in place to allow the fast balance cancel. Thanks.

Zygo Blaxell July 23, 2020, 9:54 p.m. UTC | #2
On Fri, May 22, 2020 at 01:13:47PM +0200, David Sterba wrote:
> On Wed, May 20, 2020 at 02:58:49PM +0800, Qu Wenruo wrote:
> > This patchset will fix the most wanted balance bug, runaway balance.
> > All my fault, and all small fixes.
> 
> Well, that happens.
> 
> d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
> 
> is the most broken patch in recent history (5.1+), there were so many
> fixups but hopefully this is the last one. I've tagged the patches for
> 5.1+ stable but we'll need manual backports due to the root refcount
> changes in 5.7.

The patch 1dae7e0e58b4 "btrfs: reloc: clear DEAD_RELOC_TREE bit for
orphan roots to prevent runaway balance" does apply to 5.7 itself, but
it is not present in 5.7.10.  I've been running it in test (and even a
few pre-prod) systems since May.

We still get someone in IRC with a runaway balance every week or so.
Currently we can only tell them to wait for 5.8, or roll all the way
back to 4.19.

> I reproduced the umount crash and verified the fix, the runaway balance
> does not happen anymore in the test so I guess we have all the needed
> fixes in place to allow the fast balance cancel. Thanks.

Qu Wenruo July 24, 2020, 12:05 a.m. UTC | #3
On 2020/7/24 5:54 AM, Zygo Blaxell wrote:
> On Fri, May 22, 2020 at 01:13:47PM +0200, David Sterba wrote:
>> On Wed, May 20, 2020 at 02:58:49PM +0800, Qu Wenruo wrote:
>>> This patchset will fix the most wanted balance bug, runaway balance.
>>> All my fault, and all small fixes.
>>
>> Well, that happens.
>>
>> d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
>>
>> is the most broken patch in recent history (5.1+), there were so many
>> fixups but hopefully this is the last one. I've tagged the patches for
>> 5.1+ stable but we'll need manual backports due to the root refcount
>> changes in 5.7.
> 
> The patch 1dae7e0e58b4 "btrfs: reloc: clear DEAD_RELOC_TREE bit for
> orphan roots to prevent runaway balance" does apply to 5.7 itself, but
> it is not present in 5.7.10.  I've been running it in test (and even a
> few pre-prod) systems since May.

Strange, I see no mail about merge failure nor merge success.

I'll send the backport manually to all older branches.

BTW, what's the proper tag for stable branch ranges?

Thanks,
Qu

> 
> We still get someone in IRC with a runaway balance every week or so.
> Currently we can only tell them to wait for 5.8, or roll all the way
> back to 4.19.
> 
>> I reproduced the umount crash and verified the fix, the runaway balance
>> does not happen anymore in the test so I guess we have all the needed
>> fixes in place to allow the fast balance cancel. Thanks.

David Sterba July 24, 2020, 9:33 a.m. UTC | #4
On Fri, Jul 24, 2020 at 08:05:16AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/7/24 5:54 AM, Zygo Blaxell wrote:
> > On Fri, May 22, 2020 at 01:13:47PM +0200, David Sterba wrote:
> >> On Wed, May 20, 2020 at 02:58:49PM +0800, Qu Wenruo wrote:
> >>> This patchset will fix the most wanted balance bug, runaway balance.
> >>> All my fault, and all small fixes.
> >>
> >> Well, that happens.
> >>
> >> d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
> >>
> >> is the most broken patch in recent history (5.1+), there were so many
> >> fixups but hopefully this is the last one. I've tagged the patches for
> >> 5.1+ stable but we'll need manual backports due to the root refcount
> >> changes in 5.7.
> > 
> > The patch 1dae7e0e58b4 "btrfs: reloc: clear DEAD_RELOC_TREE bit for
> > orphan roots to prevent runaway balance" does apply to 5.7 itself, but
> > it is not present in 5.7.10.  I've been running it in test (and even a
> > few pre-prod) systems since May.
> 
> Strange, I see no mail about merge failure nor merge success.
> 
> I'll send the backport manually to all older branches.
> 
> BTW, what's the proper tag for stable branch ranges?

For inspiration, look at the subjects at https://lore.kernel.org/stable/ .
The target version needs to be visible without looking at the patch,
something like:

"[PATCH for 5.4] btrfs: ...."

You can send it as a thread with various versions in case the patches
differ, or use [PATCH for 5.4+].
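
To illustrate the subject convention described above, here is a minimal
sketch that builds such subject lines. The helper name stable_subject and
its parameters are hypothetical and only for illustration; this is not an
existing kernel or git tool, just string formatting that follows the two
forms quoted in the message.

```python
def stable_subject(title, version, and_later=False):
    """Build a stable backport subject line (hypothetical helper).

    title:     the original patch subject, e.g. "btrfs: ..."
    version:   the target stable branch, e.g. "5.4"
    and_later: append "+" when the same patch applies to newer branches too
    """
    suffix = "+" if and_later else ""
    return f"[PATCH for {version}{suffix}] {title}"


if __name__ == "__main__":
    title = ("btrfs: relocation: Clear the DEAD_RELOC_TREE bit for "
             "orphan roots to prevent runaway balance")
    # One subject per branch when the backports differ between versions.
    print(stable_subject(title, "5.4"))
    # A single subject when one patch covers 5.4 and everything newer.
    print(stable_subject(title, "5.4", and_later=True))
```

When the backports differ per branch, each version gets its own subject in
the thread; otherwise the "+" form covers the whole range.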