diff mbox series

xfs: fix uaf when leaf dir bestcount not match with dir data blocks

Message ID 20220829070212.2540615-1-guoxuenan@huawei.com (mailing list archive)
State Superseded, archived
Headers show
Series xfs: fix uaf when leaf dir bestcount not match with dir data blocks | expand

Commit Message

Guo Xuenan Aug. 29, 2022, 7:02 a.m. UTC
For leaf dir, there should be as many bestfree slots as there are dir data
blocks that can fit under i_size. Othrewise, which may cause UAF or
slab-out-of bound etc.

Root cause is we don't examin the number bestfree slots, when the slots
number less than dir data blocks, if we need to allocate new dir data block
and update the bestfree array, we will use the dir block number as index to
assign bestfree array, which may cause UAF or other memory access problem.
This issue can also triggered with test cases xfs/473 from fstests.

Simplify the testcase xfs/473 with commands below:
DEV=/dev/sdb
MP=/mnt/sdb
WORKDIR=/mnt/sdb/341 #1. mkfs create new xfs image
mkfs.xfs -f ${DEV}
mount ${DEV} ${MP}
mkdir -p ${WORKDIR}
for i in `seq 1 341` #2. create leaf dir with 341 entries file name len 8
do
    touch ${WORKDIR}/$(printf "%08d" ${i})
done
inode=$(ls -i ${MP} | cut -d' ' -f1)
umount ${MP}         #3. xfs_db set bestcount to 0
xfs_db -x ${DEV} -c "inode ${inode}" -c "dblock 8388608" \
-c "write ltail.bestcount 0"
mount ${DEV} ${MP}
touch ${WORKDIR}/{1..100}.txt #4. touch new file, reproduce

The error log is shown as follows:
==================================================================
BUG: KASAN: use-after-free in xfs_dir2_leaf_addname+0x1995/0x1ac0
Write of size 2 at addr ffff88810168b000 by task touch/1552
CPU: 5 PID: 1552 Comm: touch Not tainted 6.0.0-rc3+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 print_report.cold+0xf6/0x691
 kasan_report+0xa8/0x120
 xfs_dir2_leaf_addname+0x1995/0x1ac0
 xfs_dir_createname+0x58c/0x7f0
 xfs_create+0x7af/0x1010
 xfs_generic_create+0x270/0x5e0
 path_openat+0x270b/0x3450
 do_filp_open+0x1cf/0x2b0
 do_sys_openat2+0x46b/0x7a0
 do_sys_open+0xb7/0x130
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f33 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f33 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f33 R15: 0000000000000000
 </TASK>

The buggy address belongs to the physical page:
page:ffffea000405a2c0 refcount:0 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x10168b
flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 002fffff80000000 ffffea0004057788 ffffea000402dbc8 0000000000000000
raw: 0000000000000000 0000000000170000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88810168af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88810168af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88810168b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff88810168b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88810168b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
00000000: 58 44 44 33 5b 53 35 c2 00 00 00 00 00 00 00 78
XDD3[S5........x
XFS (sdb): Internal error xfs_dir2_data_use_free at line 1200 of file
fs/xfs/libxfs/xfs_dir2_data.c.  Caller
xfs_dir2_data_use_free+0x28a/0xeb0
CPU: 5 PID: 1552 Comm: touch Tainted: G    B              6.0.0-rc3+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 xfs_corruption_error+0x132/0x150
 xfs_dir2_data_use_free+0x198/0xeb0
 xfs_dir2_leaf_addname+0xa59/0x1ac0
 xfs_dir_createname+0x58c/0x7f0
 xfs_create+0x7af/0x1010
 xfs_generic_create+0x270/0x5e0
 path_openat+0x270b/0x3450
 do_filp_open+0x1cf/0x2b0
 do_sys_openat2+0x46b/0x7a0
 do_sys_open+0xb7/0x130
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f46 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f46 R08: 0000000000000000 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f46 R15: 0000000000000000
 </TASK>
XFS (sdb): Corruption detected. Unmount and run xfs_repair

Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
---
 fs/xfs/libxfs/xfs_dir2_leaf.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Dave Chinner Aug. 29, 2022, 8:12 a.m. UTC | #1
On Mon, Aug 29, 2022 at 03:02:12PM +0800, Guo Xuenan wrote:
> For leaf dir, there should be as many bestfree slots as there are dir data
> blocks that can fit under i_size. Othrewise, which may cause UAF or
> slab-out-of bound etc.

Nice find, and thanks for the comprehensive description of the
problem!

> Root cause is we don't examin the number bestfree slots, when the slots
> number less than dir data blocks, if we need to allocate new dir data block
> and update the bestfree array, we will use the dir block number as index to
> assign bestfree array, which may cause UAF or other memory access problem.
> This issue can also triggered with test cases xfs/473 from fstests.
> 
> Simplify the testcase xfs/473 with commands below:
> DEV=/dev/sdb
> MP=/mnt/sdb
> WORKDIR=/mnt/sdb/341 #1. mkfs create new xfs image
> mkfs.xfs -f ${DEV}
> mount ${DEV} ${MP}
> mkdir -p ${WORKDIR}
> for i in `seq 1 341` #2. create leaf dir with 341 entries file name len 8
> do
>     touch ${WORKDIR}/$(printf "%08d" ${i})
> done
> inode=$(ls -i ${MP} | cut -d' ' -f1)
> umount ${MP}         #3. xfs_db set bestcount to 0
> xfs_db -x ${DEV} -c "inode ${inode}" -c "dblock 8388608" \
> -c "write ltail.bestcount 0"
> mount ${DEV} ${MP}
> touch ${WORKDIR}/{1..100}.txt #4. touch new file, reproduce

Ah, so it's verifier deficiency...

Can you turn this reproducer back into a new fstest, please?

[....]

> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
> ---
>  fs/xfs/libxfs/xfs_dir2_leaf.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
> index d9b66306a9a7..09414651ac48 100644
> --- a/fs/xfs/libxfs/xfs_dir2_leaf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
> @@ -659,6 +659,20 @@ xfs_dir2_leaf_addname(
>  	bestsp = xfs_dir2_leaf_bests_p(ltp);
>  	length = xfs_dir2_data_entsize(dp->i_mount, args->namelen);
>  
> +	/*
> +	 * There should be as many bestfree slots as there are dir data
> +	 * blocks that can fit under i_size. Othrewise, which may cause
> +	 * serious problems eg. UAF or slab-out-of bound etc.
> +	 */
> +	if (be32_to_cpu(ltp->bestcount) !=
> +			xfs_dir2_byte_to_db(args->geo, dp->i_d.di_size)) {
> +		xfs_buf_ioerror_alert(lbp, __return_address);
> +		if (tp->t_flags & XFS_TRANS_DIRTY)
> +			xfs_force_shutdown(tp->t_mountp,
> +				SHUTDOWN_CORRUPT_INCORE);
> +		return -EFSCORRUPTED;
> +	}
> +

Yeah, that needs to go in xfs_dir3_leaf_check_int() so we catch the
corruption as it comes in off disk (and before an in-memory
corruption might get written back to disk) - we don't need to check
it every time we add an entry to a leaf block.

Indeed, I left a comment in xfs_dir3_leaf_check_int() indicating
that the checking wasn't restrictive enough:

        /*
         * XXX (dgc): This value is not restrictive enough.
         * Should factor in the size of the bests table as well.
         * We can deduce a value for that from i_disk_size.
         */
        if (hdr->count > geo->leaf_max_ents)
                return __this_address;

IOWs, we need update xfs_dir3_leaf_check_int() to check the
bestcount against the size of the bests table as you've done above,
and then also use that table size info to reduce the bound that we
validate the leaf block entry count against, too.

Can you update your fix to validate both these fields correctly
against the size of the bests table in xfs_dir3_leaf_check_int()?

Cheers,

Dave.
kernel test robot Aug. 29, 2022, 9:16 a.m. UTC | #2
Hi Guo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v6.0-rc3]
[also build test ERROR on linus/master next-20220829]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Guo-Xuenan/xfs-fix-uaf-when-leaf-dir-bestcount-not-match-with-dir-data-blocks/20220829-144530
base:    b90cb1053190353cc30f0fef0ef1f378ccc063c5
config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20220829/202208291703.BcRRyCDy-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/26c85e96017e84257ac452f142a123bfd7dad776
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Guo-Xuenan/xfs-fix-uaf-when-leaf-dir-bestcount-not-match-with-dir-data-blocks/20220829-144530
        git checkout 26c85e96017e84257ac452f142a123bfd7dad776
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash fs/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/xfs/libxfs/xfs_dir2_leaf.c: In function 'xfs_dir2_leaf_addname':
>> fs/xfs/libxfs/xfs_dir2_leaf.c:668:60: error: 'struct xfs_inode' has no member named 'i_d'; did you mean 'i_df'?
     668 |                         xfs_dir2_byte_to_db(args->geo, dp->i_d.di_size)) {
         |                                                            ^~~
         |                                                            i_df


vim +668 fs/xfs/libxfs/xfs_dir2_leaf.c

   604	
   605	/*
   606	 * Add an entry to a leaf form directory.
   607	 */
   608	int						/* error */
   609	xfs_dir2_leaf_addname(
   610		struct xfs_da_args	*args)		/* operation arguments */
   611	{
   612		struct xfs_dir3_icleaf_hdr leafhdr;
   613		struct xfs_trans	*tp = args->trans;
   614		__be16			*bestsp;	/* freespace table in leaf */
   615		__be16			*tagp;		/* end of data entry */
   616		struct xfs_buf		*dbp;		/* data block buffer */
   617		struct xfs_buf		*lbp;		/* leaf's buffer */
   618		struct xfs_dir2_leaf	*leaf;		/* leaf structure */
   619		struct xfs_inode	*dp = args->dp;	/* incore directory inode */
   620		struct xfs_dir2_data_hdr *hdr;		/* data block header */
   621		struct xfs_dir2_data_entry *dep;	/* data block entry */
   622		struct xfs_dir2_leaf_entry *lep;	/* leaf entry table pointer */
   623		struct xfs_dir2_leaf_entry *ents;
   624		struct xfs_dir2_data_unused *dup;	/* data unused entry */
   625		struct xfs_dir2_leaf_tail *ltp;		/* leaf tail pointer */
   626		struct xfs_dir2_data_free *bf;		/* bestfree table */
   627		int			compact;	/* need to compact leaves */
   628		int			error;		/* error return value */
   629		int			grown;		/* allocated new data block */
   630		int			highstale = 0;	/* index of next stale leaf */
   631		int			i;		/* temporary, index */
   632		int			index;		/* leaf table position */
   633		int			length;		/* length of new entry */
   634		int			lfloglow;	/* low leaf logging index */
   635		int			lfloghigh;	/* high leaf logging index */
   636		int			lowstale = 0;	/* index of prev stale leaf */
   637		int			needbytes;	/* leaf block bytes needed */
   638		int			needlog;	/* need to log data header */
   639		int			needscan;	/* need to rescan data free */
   640		xfs_dir2_db_t		use_block;	/* data block number */
   641	
   642		trace_xfs_dir2_leaf_addname(args);
   643	
   644		error = xfs_dir3_leaf_read(tp, dp, args->geo->leafblk, &lbp);
   645		if (error)
   646			return error;
   647	
   648		/*
   649		 * Look up the entry by hash value and name.
   650		 * We know it's not there, our caller has already done a lookup.
   651		 * So the index is of the entry to insert in front of.
   652		 * But if there are dup hash values the index is of the first of those.
   653		 */
   654		index = xfs_dir2_leaf_search_hash(args, lbp);
   655		leaf = lbp->b_addr;
   656		ltp = xfs_dir2_leaf_tail_p(args->geo, leaf);
   657		xfs_dir2_leaf_hdr_from_disk(dp->i_mount, &leafhdr, leaf);
   658		ents = leafhdr.ents;
   659		bestsp = xfs_dir2_leaf_bests_p(ltp);
   660		length = xfs_dir2_data_entsize(dp->i_mount, args->namelen);
   661	
   662		/*
   663		 * There should be as many bestfree slots as there are dir data
   664		 * blocks that can fit under i_size. Othrewise, which may cause
   665		 * serious problems eg. UAF or slab-out-of bound etc.
   666		 */
   667		if (be32_to_cpu(ltp->bestcount) !=
 > 668				xfs_dir2_byte_to_db(args->geo, dp->i_d.di_size)) {
   669			xfs_buf_ioerror_alert(lbp, __return_address);
   670			if (tp->t_flags & XFS_TRANS_DIRTY)
   671				xfs_force_shutdown(tp->t_mountp,
   672					SHUTDOWN_CORRUPT_INCORE);
   673			return -EFSCORRUPTED;
   674		}
   675	
   676		/*
   677		 * See if there are any entries with the same hash value
   678		 * and space in their block for the new entry.
   679		 * This is good because it puts multiple same-hash value entries
   680		 * in a data block, improving the lookup of those entries.
   681		 */
   682		for (use_block = -1, lep = &ents[index];
   683		     index < leafhdr.count && be32_to_cpu(lep->hashval) == args->hashval;
   684		     index++, lep++) {
   685			if (be32_to_cpu(lep->address) == XFS_DIR2_NULL_DATAPTR)
   686				continue;
   687			i = xfs_dir2_dataptr_to_db(args->geo, be32_to_cpu(lep->address));
   688			ASSERT(i < be32_to_cpu(ltp->bestcount));
   689			ASSERT(bestsp[i] != cpu_to_be16(NULLDATAOFF));
   690			if (be16_to_cpu(bestsp[i]) >= length) {
   691				use_block = i;
   692				break;
   693			}
   694		}
   695		/*
   696		 * Didn't find a block yet, linear search all the data blocks.
   697		 */
   698		if (use_block == -1) {
   699			for (i = 0; i < be32_to_cpu(ltp->bestcount); i++) {
   700				/*
   701				 * Remember a block we see that's missing.
   702				 */
   703				if (bestsp[i] == cpu_to_be16(NULLDATAOFF) &&
   704				    use_block == -1)
   705					use_block = i;
   706				else if (be16_to_cpu(bestsp[i]) >= length) {
   707					use_block = i;
   708					break;
   709				}
   710			}
   711		}
   712		/*
   713		 * How many bytes do we need in the leaf block?
   714		 */
   715		needbytes = 0;
   716		if (!leafhdr.stale)
   717			needbytes += sizeof(xfs_dir2_leaf_entry_t);
   718		if (use_block == -1)
   719			needbytes += sizeof(xfs_dir2_data_off_t);
   720	
   721		/*
   722		 * Now kill use_block if it refers to a missing block, so we
   723		 * can use it as an indication of allocation needed.
   724		 */
   725		if (use_block != -1 && bestsp[use_block] == cpu_to_be16(NULLDATAOFF))
   726			use_block = -1;
   727		/*
   728		 * If we don't have enough free bytes but we can make enough
   729		 * by compacting out stale entries, we'll do that.
   730		 */
   731		if ((char *)bestsp - (char *)&ents[leafhdr.count] < needbytes &&
   732		    leafhdr.stale > 1)
   733			compact = 1;
   734	
   735		/*
   736		 * Otherwise if we don't have enough free bytes we need to
   737		 * convert to node form.
   738		 */
   739		else if ((char *)bestsp - (char *)&ents[leafhdr.count] < needbytes) {
   740			/*
   741			 * Just checking or no space reservation, give up.
   742			 */
   743			if ((args->op_flags & XFS_DA_OP_JUSTCHECK) ||
   744								args->total == 0) {
   745				xfs_trans_brelse(tp, lbp);
   746				return -ENOSPC;
   747			}
   748			/*
   749			 * Convert to node form.
   750			 */
   751			error = xfs_dir2_leaf_to_node(args, lbp);
   752			if (error)
   753				return error;
   754			/*
   755			 * Then add the new entry.
   756			 */
   757			return xfs_dir2_node_addname(args);
   758		}
   759		/*
   760		 * Otherwise it will fit without compaction.
   761		 */
   762		else
   763			compact = 0;
   764		/*
   765		 * If just checking, then it will fit unless we needed to allocate
   766		 * a new data block.
   767		 */
   768		if (args->op_flags & XFS_DA_OP_JUSTCHECK) {
   769			xfs_trans_brelse(tp, lbp);
   770			return use_block == -1 ? -ENOSPC : 0;
   771		}
   772		/*
   773		 * If no allocations are allowed, return now before we've
   774		 * changed anything.
   775		 */
   776		if (args->total == 0 && use_block == -1) {
   777			xfs_trans_brelse(tp, lbp);
   778			return -ENOSPC;
   779		}
   780		/*
   781		 * Need to compact the leaf entries, removing stale ones.
   782		 * Leave one stale entry behind - the one closest to our
   783		 * insertion index - and we'll shift that one to our insertion
   784		 * point later.
   785		 */
   786		if (compact) {
   787			xfs_dir3_leaf_compact_x1(&leafhdr, ents, &index, &lowstale,
   788				&highstale, &lfloglow, &lfloghigh);
   789		}
   790		/*
   791		 * There are stale entries, so we'll need log-low and log-high
   792		 * impossibly bad values later.
   793		 */
   794		else if (leafhdr.stale) {
   795			lfloglow = leafhdr.count;
   796			lfloghigh = -1;
   797		}
   798		/*
   799		 * If there was no data block space found, we need to allocate
   800		 * a new one.
   801		 */
   802		if (use_block == -1) {
   803			/*
   804			 * Add the new data block.
   805			 */
   806			if ((error = xfs_dir2_grow_inode(args, XFS_DIR2_DATA_SPACE,
   807					&use_block))) {
   808				xfs_trans_brelse(tp, lbp);
   809				return error;
   810			}
   811			/*
   812			 * Initialize the block.
   813			 */
   814			if ((error = xfs_dir3_data_init(args, use_block, &dbp))) {
   815				xfs_trans_brelse(tp, lbp);
   816				return error;
   817			}
   818			/*
   819			 * If we're adding a new data block on the end we need to
   820			 * extend the bests table.  Copy it up one entry.
   821			 */
   822			if (use_block >= be32_to_cpu(ltp->bestcount)) {
   823				bestsp--;
   824				memmove(&bestsp[0], &bestsp[1],
   825					be32_to_cpu(ltp->bestcount) * sizeof(bestsp[0]));
   826				be32_add_cpu(&ltp->bestcount, 1);
   827				xfs_dir3_leaf_log_tail(args, lbp);
   828				xfs_dir3_leaf_log_bests(args, lbp, 0,
   829							be32_to_cpu(ltp->bestcount) - 1);
   830			}
   831			/*
   832			 * If we're filling in a previously empty block just log it.
   833			 */
   834			else
   835				xfs_dir3_leaf_log_bests(args, lbp, use_block, use_block);
   836			hdr = dbp->b_addr;
   837			bf = xfs_dir2_data_bestfree_p(dp->i_mount, hdr);
   838			bestsp[use_block] = bf[0].length;
   839			grown = 1;
   840		} else {
   841			/*
   842			 * Already had space in some data block.
   843			 * Just read that one in.
   844			 */
   845			error = xfs_dir3_data_read(tp, dp,
   846					   xfs_dir2_db_to_da(args->geo, use_block),
   847					   0, &dbp);
   848			if (error) {
   849				xfs_trans_brelse(tp, lbp);
   850				return error;
   851			}
   852			hdr = dbp->b_addr;
   853			bf = xfs_dir2_data_bestfree_p(dp->i_mount, hdr);
   854			grown = 0;
   855		}
   856		/*
   857		 * Point to the biggest freespace in our data block.
   858		 */
   859		dup = (xfs_dir2_data_unused_t *)
   860		      ((char *)hdr + be16_to_cpu(bf[0].offset));
   861		needscan = needlog = 0;
   862		/*
   863		 * Mark the initial part of our freespace in use for the new entry.
   864		 */
   865		error = xfs_dir2_data_use_free(args, dbp, dup,
   866				(xfs_dir2_data_aoff_t)((char *)dup - (char *)hdr),
   867				length, &needlog, &needscan);
   868		if (error) {
   869			xfs_trans_brelse(tp, lbp);
   870			return error;
   871		}
   872		/*
   873		 * Initialize our new entry (at last).
   874		 */
   875		dep = (xfs_dir2_data_entry_t *)dup;
   876		dep->inumber = cpu_to_be64(args->inumber);
   877		dep->namelen = args->namelen;
   878		memcpy(dep->name, args->name, dep->namelen);
   879		xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
   880		tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
   881		*tagp = cpu_to_be16((char *)dep - (char *)hdr);
   882		/*
   883		 * Need to scan fix up the bestfree table.
   884		 */
   885		if (needscan)
   886			xfs_dir2_data_freescan(dp->i_mount, hdr, &needlog);
   887		/*
   888		 * Need to log the data block's header.
   889		 */
   890		if (needlog)
   891			xfs_dir2_data_log_header(args, dbp);
   892		xfs_dir2_data_log_entry(args, dbp, dep);
   893		/*
   894		 * If the bests table needs to be changed, do it.
   895		 * Log the change unless we've already done that.
   896		 */
   897		if (be16_to_cpu(bestsp[use_block]) != be16_to_cpu(bf[0].length)) {
   898			bestsp[use_block] = bf[0].length;
   899			if (!grown)
   900				xfs_dir3_leaf_log_bests(args, lbp, use_block, use_block);
   901		}
   902	
   903		lep = xfs_dir3_leaf_find_entry(&leafhdr, ents, index, compact, lowstale,
   904					       highstale, &lfloglow, &lfloghigh);
   905	
   906		/*
   907		 * Fill in the new leaf entry.
   908		 */
   909		lep->hashval = cpu_to_be32(args->hashval);
   910		lep->address = cpu_to_be32(
   911					xfs_dir2_db_off_to_dataptr(args->geo, use_block,
   912					be16_to_cpu(*tagp)));
   913		/*
   914		 * Log the leaf fields and give up the buffers.
   915		 */
   916		xfs_dir2_leaf_hdr_to_disk(dp->i_mount, leaf, &leafhdr);
   917		xfs_dir3_leaf_log_header(args, lbp);
   918		xfs_dir3_leaf_log_ents(args, &leafhdr, lbp, lfloglow, lfloghigh);
   919		xfs_dir3_leaf_check(dp, lbp);
   920		xfs_dir3_data_check(dp, dbp);
   921		return 0;
   922	}
   923
Darrick J. Wong Aug. 29, 2022, 2:47 p.m. UTC | #3
On Mon, Aug 29, 2022 at 03:02:12PM +0800, Guo Xuenan wrote:
> For leaf dir, there should be as many bestfree slots as there are dir data
> blocks that can fit under i_size. Othrewise, which may cause UAF or
> slab-out-of bound etc.
> 
> Root cause is we don't examin the number bestfree slots, when the slots
> number less than dir data blocks, if we need to allocate new dir data block
> and update the bestfree array, we will use the dir block number as index to
> assign bestfree array, which may cause UAF or other memory access problem.
> This issue can also triggered with test cases xfs/473 from fstests.
> 
> Simplify the testcase xfs/473 with commands below:
> DEV=/dev/sdb
> MP=/mnt/sdb
> WORKDIR=/mnt/sdb/341 #1. mkfs create new xfs image
> mkfs.xfs -f ${DEV}
> mount ${DEV} ${MP}
> mkdir -p ${WORKDIR}
> for i in `seq 1 341` #2. create leaf dir with 341 entries file name len 8
> do
>     touch ${WORKDIR}/$(printf "%08d" ${i})
> done
> inode=$(ls -i ${MP} | cut -d' ' -f1)
> umount ${MP}         #3. xfs_db set bestcount to 0
> xfs_db -x ${DEV} -c "inode ${inode}" -c "dblock 8388608" \
> -c "write ltail.bestcount 0"
> mount ${DEV} ${MP}
> touch ${WORKDIR}/{1..100}.txt #4. touch new file, reproduce
> 
> The error log is shown as follows:
> ==================================================================
> BUG: KASAN: use-after-free in xfs_dir2_leaf_addname+0x1995/0x1ac0
> Write of size 2 at addr ffff88810168b000 by task touch/1552
> CPU: 5 PID: 1552 Comm: touch Not tainted 6.0.0-rc3+ #101
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x4d/0x66
>  print_report.cold+0xf6/0x691
>  kasan_report+0xa8/0x120
>  xfs_dir2_leaf_addname+0x1995/0x1ac0
>  xfs_dir_createname+0x58c/0x7f0
>  xfs_create+0x7af/0x1010
>  xfs_generic_create+0x270/0x5e0
>  path_openat+0x270b/0x3450
>  do_filp_open+0x1cf/0x2b0
>  do_sys_openat2+0x46b/0x7a0
>  do_sys_open+0xb7/0x130
>  do_syscall_64+0x35/0x80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fe4d9e9312b
> Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
> 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
> f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
> RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
> RDX: 0000000000000941 RSI: 00007ffda4c17f33 RDI: 00000000ffffff9c
> RBP: 00007ffda4c17f33 R08: 0000000000000000 R09: 0000000000000000
> R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
> R13: 00007fe4d9f631a4 R14: 00007ffda4c17f33 R15: 0000000000000000
>  </TASK>
> 
> The buggy address belongs to the physical page:
> page:ffffea000405a2c0 refcount:0 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0x10168b
> flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
> raw: 002fffff80000000 ffffea0004057788 ffffea000402dbc8 0000000000000000
> raw: 0000000000000000 0000000000170000 00000000ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff88810168af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  ffff88810168af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >ffff88810168b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>                    ^
>  ffff88810168b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>  ffff88810168b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==================================================================
> Disabling lock debugging due to kernel taint
> 00000000: 58 44 44 33 5b 53 35 c2 00 00 00 00 00 00 00 78
> XDD3[S5........x
> XFS (sdb): Internal error xfs_dir2_data_use_free at line 1200 of file
> fs/xfs/libxfs/xfs_dir2_data.c.  Caller
> xfs_dir2_data_use_free+0x28a/0xeb0
> CPU: 5 PID: 1552 Comm: touch Tainted: G    B              6.0.0-rc3+
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x4d/0x66
>  xfs_corruption_error+0x132/0x150
>  xfs_dir2_data_use_free+0x198/0xeb0
>  xfs_dir2_leaf_addname+0xa59/0x1ac0
>  xfs_dir_createname+0x58c/0x7f0
>  xfs_create+0x7af/0x1010
>  xfs_generic_create+0x270/0x5e0
>  path_openat+0x270b/0x3450
>  do_filp_open+0x1cf/0x2b0
>  do_sys_openat2+0x46b/0x7a0
>  do_sys_open+0xb7/0x130
>  do_syscall_64+0x35/0x80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fe4d9e9312b
> Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
> 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
> f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
> RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
> RDX: 0000000000000941 RSI: 00007ffda4c17f46 RDI: 00000000ffffff9c
> RBP: 00007ffda4c17f46 R08: 0000000000000000 R09: 0000000000000001
> R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
> R13: 00007fe4d9f631a4 R14: 00007ffda4c17f46 R15: 0000000000000000
>  </TASK>
> XFS (sdb): Corruption detected. Unmount and run xfs_repair
> 
> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
> ---
>  fs/xfs/libxfs/xfs_dir2_leaf.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
> index d9b66306a9a7..09414651ac48 100644
> --- a/fs/xfs/libxfs/xfs_dir2_leaf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
> @@ -659,6 +659,20 @@ xfs_dir2_leaf_addname(
>  	bestsp = xfs_dir2_leaf_bests_p(ltp);
>  	length = xfs_dir2_data_entsize(dp->i_mount, args->namelen);
>  
> +	/*
> +	 * There should be as many bestfree slots as there are dir data
> +	 * blocks that can fit under i_size. Othrewise, which may cause

Typo: "Otherwise..."

> +	 * serious problems eg. UAF or slab-out-of bound etc.

What about commit e5d1802c70f5 ("xfs: fix a bug in the online fsck
directory leaf1 bestcount check")?  Online fsck used to have this check
in it, until I discovered that the kernel actually /can/ remove blocks
from the end of the data section of a directory and forget to decrease
i_disk_size.  xfs_repair hasn't ever complained about that, which means
it's part of the disk format.

> +	 */
> +	if (be32_to_cpu(ltp->bestcount) !=
> +			xfs_dir2_byte_to_db(args->geo, dp->i_d.di_size)) {

What is i_d.di_size?  We changed that to i_disk_size ages ago...

--D

> +		xfs_buf_ioerror_alert(lbp, __return_address);
> +		if (tp->t_flags & XFS_TRANS_DIRTY)
> +			xfs_force_shutdown(tp->t_mountp,
> +				SHUTDOWN_CORRUPT_INCORE);
> +		return -EFSCORRUPTED;
> +	}
> +
>  	/*
>  	 * See if there are any entries with the same hash value
>  	 * and space in their block for the new entry.
> -- 
> 2.25.1
>
Guo Xuenan Sept. 2, 2022, 11:24 a.m. UTC | #4
Hi Dave & Darrick:

To reproduce this problem add xfs/554 [1].
And, I resend a patch v2 [2], considering the situation mentioned by Darrick,
Meanwhile, check the i_disk_size or get dir blocks at xfs_dir3_leaf_check_int
looks a little strange. So in my opinion, we can just add judgment in that
situation, avoiding UAF and return EFSCORRUPTED, xfs_repair will help us fix it.

[1] https://lore.kernel.org/all/20220902094046.3891252-1-guoxuenan@huawei.com/
[2] https://lore.kernel.org/all/20220831121639.3060527-1-guoxuenan@huawei.com/



thanks
Xueanan
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index d9b66306a9a7..09414651ac48 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -659,6 +659,20 @@  xfs_dir2_leaf_addname(
 	bestsp = xfs_dir2_leaf_bests_p(ltp);
 	length = xfs_dir2_data_entsize(dp->i_mount, args->namelen);
 
+	/*
+	 * There should be as many bestfree slots as there are dir data
+	 * blocks that can fit under i_size. Othrewise, which may cause
+	 * serious problems eg. UAF or slab-out-of bound etc.
+	 */
+	if (be32_to_cpu(ltp->bestcount) !=
+			xfs_dir2_byte_to_db(args->geo, dp->i_d.di_size)) {
+		xfs_buf_ioerror_alert(lbp, __return_address);
+		if (tp->t_flags & XFS_TRANS_DIRTY)
+			xfs_force_shutdown(tp->t_mountp,
+				SHUTDOWN_CORRUPT_INCORE);
+		return -EFSCORRUPTED;
+	}
+
 	/*
 	 * See if there are any entries with the same hash value
 	 * and space in their block for the new entry.