diff mbox series

xfs: extend the freelist before available space check

Message ID 20221103094639.39984-1-hsiangkao@linux.alibaba.com (mailing list archive)
State Superseded
Headers show
Series xfs: extend the freelist before available space check | expand

Commit Message

Gao Xiang Nov. 3, 2022, 9:46 a.m. UTC
There is a long standing issue which could cause fs shutdown due to
inode extent to btree conversion failure right after an extent
allocation in the same AG, which is absolutely unexpected due to the
proper minleft reservation in the previous allocation.  Brian once
addressed one of the root cause [1], however, such symptom can still
occur after the commit is merged as reported [2], and our cloud
environment is also suffering from this issue.

From the description of the commit [1], I found that Zirong has an
in-house stress test reproducer for this issue, therefore I asked him
to reproduce again and he confirmed that such issue can still be
reproducable on RHEL 9.

Thanks to him, after dumping the transaction log items, I think
the root cause is as below:
 1. BUF:  ... start blkno:0x3bffe81  len:1  bmap size:1  flags:0x2800
    AGF Buffer: (XAGF)
        ver:1  seq#:3  len:2621424
        root BNO:9  CNT:7
        level BNO:2  CNT:2
        1st:64  last:69  cnt:6  freeblks:18277  longest:6395

 2. agfl (flfirst = 64, fllast = 69, flcount = 6)
    64:547 65:167 66:1651 67:2040807 68:783 69:604

 3. The log records a new AGF
    blkno 62914177, len 1, map_size 1
    00000000: 58 41 47 46 00 00 00 01 00 00 00 03 00 27 ff f0  XAGF.........'..
    00000010: 00 00 00 09 00 00 00 07 00 00 00 00 00 00 00 02  ................
    00000020: 00 00 00 02 00 00 00 00 00 00 00 41 00 00 00 45  ...........A...E
    00000030: 00 00 00 05 00 00 47 65 00 00 18 fb 00 00 00 09  ......Ge........
    00000040: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66  u....E@*.Pr.Ynbf
    agf 3  flfirst: 65 (0x41) fllast: 69 (0x45) cnt: 5

 4. agfl 64 (daddr 62918552) was then written as a cntbt block
    log item:
      type#011= 0x123c
      flags#011= 0x8
    blkno 62918552, len 8, map_size 1
    00000000: 41 42 33 43 00 00 00 fd 00 1f 23 e4 ff ff ff ff  AB3C......#.....
    00000010: 00 00 00 00 03 c0 0f 98 00 00 00 00 00 00 00 00  ................
    00000020: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66  u....E@*.Pr.Ynbf

 5. Finally, kaboom.
    Nov  1 07:56:09 dell-per750-08 kernel: ------------[ cut here ]------------
    Nov  1 07:56:09 dell-per750-08 kernel: WARNING: CPU: 15 PID: 49290 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0xc51/0x1050 [xfs]
    ...
    Nov  1 07:56:10 dell-per750-08 kernel: XFS (sda2): agno 3 agflcount 5 freeblks 18277 reservation 18276 6

In order to fix the issue above, freelist needs to be filled with the
minimal blocks at least before available space check, and then we also
know the freelist has enough blocks for the following emergency btree
allocations.

[1] commit 1ca89fbc48e1 ("xfs: don't account extra agfl blocks as available")
    https://lore.kernel.org/r/20190327145000.10756-1-bfoster@redhat.com
[2] https://lore.kernel.org/r/20220105071052.GD20464@templeofstupid.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 187 ++++++++++++++++++++++++--------------
 1 file changed, 121 insertions(+), 66 deletions(-)

Comments

kernel test robot Nov. 3, 2022, 4:51 p.m. UTC | #1
Hi Gao,

I love your patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on linus/master v6.1-rc3 next-20221103]
[cannot apply to djwong-xfs/djwong-devel]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Gao-Xiang/xfs-extend-the-freelist-before-available-space-check/20221103-174840
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link:    https://lore.kernel.org/r/20221103094639.39984-1-hsiangkao%40linux.alibaba.com
patch subject: [PATCH] xfs: extend the freelist before available space check
config: hexagon-randconfig-r004-20221102
compiler: clang version 15.0.4 (https://github.com/llvm/llvm-project 5c68a1cb123161b54b72ce90e7975d95a8eaf2a4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/427b4db33a93f832a6da62b2838bd5a558d024d9
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Gao-Xiang/xfs-extend-the-freelist-before-available-space-check/20221103-174840
        git checkout 427b4db33a93f832a6da62b2838bd5a558d024d9
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash fs/xfs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from fs/xfs/libxfs/xfs_alloc.c:6:
   In file included from fs/xfs/xfs.h:22:
   In file included from fs/xfs/xfs_linux.h:31:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from fs/xfs/libxfs/xfs_alloc.c:6:
   In file included from fs/xfs/xfs.h:22:
   In file included from fs/xfs/xfs_linux.h:31:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from fs/xfs/libxfs/xfs_alloc.c:6:
   In file included from fs/xfs/xfs.h:22:
   In file included from fs/xfs/xfs_linux.h:31:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
>> fs/xfs/libxfs/xfs_alloc.c:2597:1: warning: no previous prototype for function 'xfs_fill_agfl' [-Wmissing-prototypes]
   xfs_fill_agfl(
   ^
   fs/xfs/libxfs/xfs_alloc.c:2596:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int
   ^
   static 
   7 warnings generated.


vim +/xfs_fill_agfl +2597 fs/xfs/libxfs/xfs_alloc.c

  2585	
  2586	/*
  2587	 * The freelist has to be in a good state before the available space check
  2588	 * since multiple allocations could be performed from a single AG and
  2589	 * transaction under certain conditions.  For example, A bmbt allocation
  2590	 * request made for inode extent to bmap format conversion after an extent
  2591	 * allocation is expected to be satisfied by the same AG.  Such bmap conversion
  2592	 * allocation can fail due to the available space check if allocbt required an
  2593	 * extra btree block from the freelist in the previous allocation but without
  2594	 * make the freelist longer.
  2595	 */
  2596	int
> 2597	xfs_fill_agfl(
  2598		struct xfs_alloc_arg    *args,
  2599		int			flags,
  2600		xfs_extlen_t		need,
  2601		struct xfs_buf          *agbp)
  2602	{
  2603		struct xfs_trans	*tp = args->tp;
  2604		struct xfs_perag	*pag = agbp->b_pag;
  2605		struct xfs_alloc_arg	targs = {
  2606			.tp		= tp,
  2607			.mp		= tp->t_mountp,
  2608			.agbp		= agbp,
  2609			.agno		= args->agno,
  2610			.alignment	= 1,
  2611			.minlen		= 1,
  2612			.prod		= 1,
  2613			.type		= XFS_ALLOCTYPE_THIS_AG,
  2614			.pag		= pag,
  2615		};
  2616		struct xfs_buf          *agflbp = NULL;
  2617		xfs_agblock_t		bno;
  2618		int error;
  2619	
  2620		if (flags & XFS_ALLOC_FLAG_NORMAP)
  2621			targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
  2622		else
  2623			targs.oinfo = XFS_RMAP_OINFO_AG;
  2624	
  2625		error = xfs_alloc_read_agfl(pag, tp, &agflbp);
  2626		if (error)
  2627			return error;
  2628	
  2629		/* Make the freelist longer if it's too short. */
  2630		while (pag->pagf_flcount < need) {
  2631			targs.agbno = 0;
  2632			targs.maxlen = need - pag->pagf_flcount;
  2633			targs.resv = XFS_AG_RESV_AGFL;
  2634	
  2635			/* Allocate as many blocks as possible at once. */
  2636			error = xfs_alloc_ag_vextent(&targs);
  2637			if (error)
  2638				goto out_agflbp_relse;
  2639	
  2640			/*
  2641			 * Stop if we run out.  Won't happen if callers are obeying
  2642			 * the restrictions correctly.  Can happen for free calls
  2643			 * on a completely full ag.
  2644			 */
  2645			if (targs.agbno == NULLAGBLOCK) {
  2646				if (flags & XFS_ALLOC_FLAG_FREEING)
  2647					break;
  2648				error = -ENOSPC;
  2649				goto out_agflbp_relse;
  2650			}
  2651			/*
  2652			 * Put each allocated block on the list.
  2653			 */
  2654			for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
  2655				error = xfs_alloc_put_freelist(pag, tp, agbp,
  2656								agflbp, bno, 0);
  2657				if (error)
  2658					goto out_agflbp_relse;
  2659			}
  2660		}
  2661	out_agflbp_relse:
  2662		xfs_trans_brelse(tp, agflbp);
  2663		return error;
  2664	}
  2665
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6261599bb389..5d5e0e0e5227 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2587,6 +2587,86 @@  xfs_exact_minlen_extent_available(
 }
 #endif
 
+/*
+ * The freelist has to be in a good state before the available space check
+ * since multiple allocations could be performed from a single AG and
+ * transaction under certain conditions.  For example, A bmbt allocation
+ * request made for inode extent to bmap format conversion after an extent
+ * allocation is expected to be satisfied by the same AG.  Such bmap conversion
+ * allocation can fail due to the available space check if allocbt required an
+ * extra btree block from the freelist in the previous allocation but without
+ * make the freelist longer.
+ */
+int
+xfs_fill_agfl(
+	struct xfs_alloc_arg    *args,
+	int			flags,
+	xfs_extlen_t		need,
+	struct xfs_buf          *agbp)
+{
+	struct xfs_trans	*tp = args->tp;
+	struct xfs_perag	*pag = agbp->b_pag;
+	struct xfs_alloc_arg	targs = {
+		.tp		= tp,
+		.mp		= tp->t_mountp,
+		.agbp		= agbp,
+		.agno		= args->agno,
+		.alignment	= 1,
+		.minlen		= 1,
+		.prod		= 1,
+		.type		= XFS_ALLOCTYPE_THIS_AG,
+		.pag		= pag,
+	};
+	struct xfs_buf          *agflbp = NULL;
+	xfs_agblock_t		bno;
+	int error;
+
+	if (flags & XFS_ALLOC_FLAG_NORMAP)
+		targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
+	else
+		targs.oinfo = XFS_RMAP_OINFO_AG;
+
+	error = xfs_alloc_read_agfl(pag, tp, &agflbp);
+	if (error)
+		return error;
+
+	/* Make the freelist longer if it's too short. */
+	while (pag->pagf_flcount < need) {
+		targs.agbno = 0;
+		targs.maxlen = need - pag->pagf_flcount;
+		targs.resv = XFS_AG_RESV_AGFL;
+
+		/* Allocate as many blocks as possible at once. */
+		error = xfs_alloc_ag_vextent(&targs);
+		if (error)
+			goto out_agflbp_relse;
+
+		/*
+		 * Stop if we run out.  Won't happen if callers are obeying
+		 * the restrictions correctly.  Can happen for free calls
+		 * on a completely full ag.
+		 */
+		if (targs.agbno == NULLAGBLOCK) {
+			if (flags & XFS_ALLOC_FLAG_FREEING)
+				break;
+			error = -ENOSPC;
+			goto out_agflbp_relse;
+		}
+		/*
+		 * Put each allocated block on the list.
+		 */
+		for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
+			error = xfs_alloc_put_freelist(pag, tp, agbp,
+							agflbp, bno, 0);
+			if (error)
+				goto out_agflbp_relse;
+		}
+	}
+out_agflbp_relse:
+	xfs_trans_brelse(tp, agflbp);
+	return error;
+}
+
 /*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
@@ -2600,8 +2680,7 @@  xfs_alloc_fix_freelist(
 	struct xfs_perag	*pag = args->pag;
 	struct xfs_trans	*tp = args->tp;
 	struct xfs_buf		*agbp = NULL;
-	struct xfs_buf		*agflbp = NULL;
-	struct xfs_alloc_arg	targs;	/* local allocation arguments */
+	struct xfs_owner_info	oinfo;
 	xfs_agblock_t		bno;	/* freelist block */
 	xfs_extlen_t		need;	/* total blocks needed in freelist */
 	int			error = 0;
@@ -2630,22 +2709,45 @@  xfs_alloc_fix_freelist(
 		goto out_agbp_relse;
 	}
 
-	need = xfs_alloc_min_freelist(mp, pag);
-	if (!xfs_alloc_space_available(args, need, flags |
-			XFS_ALLOC_FLAG_CHECK))
-		goto out_agbp_relse;
-
 	/*
-	 * Get the a.g. freespace buffer.
-	 * Can fail if we're not blocking on locks, and it's held.
+	 * See the comment above xfs_fill_agfl() for the reason why we need to
+	 * make freelist longer here.  Assumed that such case is quite rare, so
+	 * in order to simplify the code, let's take agbp unconditionally.
 	 */
-	if (!agbp) {
-		error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
-		if (error) {
-			/* Couldn't lock the AGF so skip this AG. */
-			if (error == -EAGAIN)
-				error = 0;
-			goto out_no_agbp;
+	need = xfs_alloc_min_freelist(mp, pag);
+	if (pag->pagf_flcount < need) {
+		/*
+		 * Get the a.g. freespace buffer.
+		 * Can fail if we're not blocking on locks, and it's held.
+		 */
+		if (!agbp) {
+			error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
+			if (error) {
+				/* Couldn't lock the AGF so skip this AG. */
+				if (error == -EAGAIN)
+					error = 0;
+				goto out_no_agbp;
+			}
+		}
+
+		need = xfs_alloc_min_freelist(mp, pag);
+		error = xfs_fill_agfl(args, flags, need, agbp);
+		if (error)
+			goto out_agbp_relse;
+	} else {
+		if (!xfs_alloc_space_available(args, need, flags |
+				XFS_ALLOC_FLAG_CHECK))
+			goto out_agbp_relse;
+
+		/* Get the a.g. freespace buffer. */
+		if (!agbp) {
+			error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
+			if (error) {
+				/* Couldn't lock the AGF so skip this AG. */
+				if (error == -EAGAIN)
+					error = 0;
+				goto out_no_agbp;
+			}
 		}
 	}
 
@@ -2691,69 +2793,22 @@  xfs_alloc_fix_freelist(
 	 * regenerated AGFL, bnobt, and cntbt.  See repair/phase5.c and
 	 * repair/rmap.c in xfsprogs for details.
 	 */
-	memset(&targs, 0, sizeof(targs));
-	/* struct copy below */
 	if (flags & XFS_ALLOC_FLAG_NORMAP)
-		targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
+		oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
 	else
-		targs.oinfo = XFS_RMAP_OINFO_AG;
+		oinfo = XFS_RMAP_OINFO_AG;
 	while (!(flags & XFS_ALLOC_FLAG_NOSHRINK) && pag->pagf_flcount > need) {
 		error = xfs_alloc_get_freelist(pag, tp, agbp, &bno, 0);
 		if (error)
 			goto out_agbp_relse;
 
 		/* defer agfl frees */
-		xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+		xfs_defer_agfl_block(tp, args->agno, bno, &oinfo);
 	}
 
-	targs.tp = tp;
-	targs.mp = mp;
-	targs.agbp = agbp;
-	targs.agno = args->agno;
-	targs.alignment = targs.minlen = targs.prod = 1;
-	targs.type = XFS_ALLOCTYPE_THIS_AG;
-	targs.pag = pag;
-	error = xfs_alloc_read_agfl(pag, tp, &agflbp);
-	if (error)
-		goto out_agbp_relse;
-
-	/* Make the freelist longer if it's too short. */
-	while (pag->pagf_flcount < need) {
-		targs.agbno = 0;
-		targs.maxlen = need - pag->pagf_flcount;
-		targs.resv = XFS_AG_RESV_AGFL;
-
-		/* Allocate as many blocks as possible at once. */
-		error = xfs_alloc_ag_vextent(&targs);
-		if (error)
-			goto out_agflbp_relse;
-
-		/*
-		 * Stop if we run out.  Won't happen if callers are obeying
-		 * the restrictions correctly.  Can happen for free calls
-		 * on a completely full ag.
-		 */
-		if (targs.agbno == NULLAGBLOCK) {
-			if (flags & XFS_ALLOC_FLAG_FREEING)
-				break;
-			goto out_agflbp_relse;
-		}
-		/*
-		 * Put each allocated block on the list.
-		 */
-		for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
-			error = xfs_alloc_put_freelist(pag, tp, agbp,
-							agflbp, bno, 0);
-			if (error)
-				goto out_agflbp_relse;
-		}
-	}
-	xfs_trans_brelse(tp, agflbp);
 	args->agbp = agbp;
 	return 0;
 
-out_agflbp_relse:
-	xfs_trans_brelse(tp, agflbp);
 out_agbp_relse:
 	if (agbp)
 		xfs_trans_brelse(tp, agbp);