[v2] xfs: extend the freelist before available space check

There is a long-standing issue which can cause a filesystem shutdown due
to an inode extent-to-btree conversion failure right after an extent
allocation in the same AG. This is absolutely unexpected, given the
proper minleft reservation in the previous allocation.  Brian once
addressed one of the root causes [1]; however, the symptom can still
occur after that commit was merged, as reported in [2], and our cloud
environment is also suffering from this issue.

From the description of commit [1], I found that Zirong has an
in-house stress test reproducer for this issue, so I asked him
to reproduce it again, and he confirmed that the issue is still
reproducible on RHEL 9.

Thanks to him, after dumping the transaction log items, I think
the root cause is as follows:
 1. Allocate space under the following conditions:
    freeblks: 18304 pagf_flcount: 6
    reservation: 18276 need (min_free): 6
    args->minleft: 1
    available = freeblks + agflcount - reservation - need - minleft
              = 18304 + min(6, 6) - 18276 - 6 - 1 = 27
    (where agflcount = min(pagf_flcount, need))

    The first allocation check itself is ok;

 2. At that time, the AG state is
    AGF Buffer: (XAGF)
        ver:1  seq#:3  len:2621424
        root BNO:9  CNT:7
        level BNO:2  CNT:2
        1st:64  last:69  cnt:6  freeblks:18277  longest:6395

    agfl (flfirst = 64, fllast = 69, flcount = 6)
    64:547 65:167 66:1651 67:2040807 68:783 69:604

 3. Then the cntbt needs a new btree block, so one block is taken
    from the agfl, and the log records a new AGF:
    blkno 62914177, len 1, map_size 1
    00000000: 58 41 47 46 00 00 00 01 00 00 00 03 00 27 ff f0  XAGF.........'..
    00000010: 00 00 00 09 00 00 00 07 00 00 00 00 00 00 00 02  ................
    00000020: 00 00 00 02 00 00 00 00 00 00 00 41 00 00 00 45  ...........A...E
    00000030: 00 00 00 05 00 00 47 65 00 00 18 fb 00 00 00 09  ......Ge........
    00000040: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66  u....E@*.Pr.Ynbf
    agf 3  flfirst: 65 (0x41) fllast: 69 (0x45) cnt: 5
    freeblks 18277

 4. agfl 64 (daddr 62918552) was then written as a cntbt block
    log item:
      type  = 0x123c
      flags = 0x8
    blkno 62918552, len 8, map_size 1
    00000000: 41 42 33 43 00 00 00 fd 00 1f 23 e4 ff ff ff ff  AB3C......#.....
    00000010: 00 00 00 00 03 c0 0f 98 00 00 00 00 00 00 00 00  ................
    00000020: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66  u....E@*.Pr.Ynbf

 5. Finally, the following inode extent to btree allocation fails:
    Nov  1 07:56:09 dell-per750-08 kernel: ------------[ cut here ]------------
    Nov  1 07:56:09 dell-per750-08 kernel: WARNING: CPU: 15 PID: 49290 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0xc51/0x1050 [xfs]
    ...
    Nov  1 07:56:10 dell-per750-08 kernel: XFS (sda2): agno 3 agflcount 5 freeblks 18277 reservation 18276 6

    since

    available = freeblks + agflcount - reservation - need - minleft
              = 18277 + min(5, 6) - 18276 - 6 - 0 = 0   < 1
    kaboom.

To fix the issue above, the freelist needs to be filled to at least its
minimal number of blocks before the available space check, so that we
also know the freelist has enough blocks for the subsequent emergency
btree allocations.

[1] 1ca89fbc48e1 ("xfs: don't account extra agfl blocks as available")
    https://lore.kernel.org/r/20190327145000.10756-1-bfoster@redhat.com
[2] https://lore.kernel.org/r/20220105071052.GD20464@templeofstupid.com
Reported-by: Zirong Lang <zlang@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
changes since v1:
  - refine commit message for better understanding;
  - add a "Reported-by" tag to thank Zirong for reproducing.

 fs/xfs/libxfs/xfs_alloc.c | 186 ++++++++++++++++++++++++--------------
 1 file changed, 120 insertions(+), 66 deletions(-)

Message-Id: <20221103131025.40064-1-hsiangkao@linux.alibaba.com>
State: New, archived
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: linux-xfs@vger.kernel.org, "Darrick J. Wong" <djwong@kernel.org>, Dave Chinner <dchinner@redhat.com>, Brian Foster <bfoster@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>, Zirong Lang <zlang@redhat.com>, Gao Xiang <hsiangkao@linux.alibaba.com>
Subject: [PATCH v2] xfs: extend the freelist before available space check
Date: Thu, 3 Nov 2022 21:10:25 +0800
In-Reply-To: <20221103094639.39984-1-hsiangkao@linux.alibaba.com>
References: <20221103094639.39984-1-hsiangkao@linux.alibaba.com>