diff mbox series

[v2,3/3] xfs: make sure done item committed before cancel intents

Message ID 20230715063647.2094989-4-leo.lilong@huawei.com (mailing list archive)
State New, archived
Headers show
Series xfs: fix two problem when recovery intents fails | expand

Commit Message

Long Li July 15, 2023, 6:36 a.m. UTC
KASAN report a uaf when recover intents fails:

 ==================================================================
 BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0
 Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103
 CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166
 Workqueue: xfs-cil/sda xlog_cil_push_work
 Call Trace:
  <TASK>
  dump_stack_lvl+0x50/0x70
  print_report+0xc2/0x600
  kasan_report+0xb6/0xe0
  xfs_cui_release+0xb7/0xc0
  xfs_cud_item_release+0x3c/0x90
  xfs_trans_committed_bulk+0x2d5/0x7f0
  xlog_cil_committed+0xaba/0xf20
  xlog_cil_push_work+0x1a60/0x2360
  process_one_work+0x78e/0x1140
  worker_thread+0x58b/0xf60
  kthread+0x2cd/0x3c0
  ret_from_fork+0x1f/0x30
  </TASK>

 Allocated by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  __kasan_slab_alloc+0x55/0x60
  kmem_cache_alloc+0x195/0x5f0
  xfs_cui_init+0x198/0x1d0
  xlog_recover_cui_commit_pass2+0x133/0x5f0
  xlog_recover_items_pass2+0x107/0x230
  xlog_recover_commit_trans+0x3e7/0x9c0
  xlog_recovery_process_trans+0x140/0x1d0
  xlog_recover_process_ophdr+0x1a0/0x3d0
  xlog_recover_process_data+0x108/0x2d0
  xlog_recover_process+0x1f6/0x280
  xlog_do_recovery_pass+0x609/0xdb0
  xlog_do_log_recovery+0x84/0xe0
  xlog_do_recover+0x7d/0x470
  xlog_recover+0x25f/0x490
  xfs_log_mount+0x2dd/0x6f0
  xfs_mountfs+0x11ce/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 Freed by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  kasan_save_free_info+0x2b/0x40
  __kasan_slab_free+0x114/0x1b0
  kmem_cache_free+0xf8/0x510
  xfs_cui_item_free+0x95/0xb0
  xfs_cui_release+0x86/0xc0
  xlog_recover_cancel_intents.isra.0+0xf8/0x210
  xlog_recover_finish+0x7e7/0x980
  xfs_log_mount_finish+0x2bb/0x4a0
  xfs_mountfs+0x14bf/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 The buggy address belongs to the object at ffff888012575dc8
  which belongs to the cache xfs_cui_item of size 432
 The buggy address is located 152 bytes inside of
  freed 432-byte region [ffff888012575dc8, ffff888012575f78)

 The buggy address belongs to the physical page:
 page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574
 head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
 page_type: 0xffffffff()
 raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150
 raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
  ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
 >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
 ==================================================================

If process intents fails, intent items left in AIL will be delete
from AIL and freed in error handling, even intent items that have been
recovered and created done items. After this, uaf will be triggered when
done item commited, because at this point the released intent item will
be accessed.

xlog_recover_finish                     xlog_cil_push_work
----------------------------            ---------------------------
xlog_recover_process_intents
  xfs_cui_item_recover//cui_refcount == 1
    xfs_trans_get_cud
    xfs_trans_commit
      <add cud item to cil>
  xfs_cui_item_recover
    <error occurred and return>
xlog_recover_cancel_intents
  xfs_cui_release     //cui_refcount == 0
    xfs_cui_item_free //free cui
  <release other intent items>
xlog_force_shutdown   //shutdown
                               <...>
                                        <push items in cil>
                                        xlog_cil_committed
                                          xfs_cud_item_release
                                            xfs_cui_release // UAF

Fix it by move log force forward to make sure done items committed before
cancel intents.

Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails")
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_log_recover.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

Comments

Darrick J. Wong July 19, 2023, 2:19 a.m. UTC | #1
On Sat, Jul 15, 2023 at 02:36:47PM +0800, Long Li wrote:
> KASAN report a uaf when recover intents fails:
> 
>  ==================================================================
>  BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0
>  Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103
>  CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166
>  Workqueue: xfs-cil/sda xlog_cil_push_work
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x50/0x70
>   print_report+0xc2/0x600
>   kasan_report+0xb6/0xe0
>   xfs_cui_release+0xb7/0xc0
>   xfs_cud_item_release+0x3c/0x90
>   xfs_trans_committed_bulk+0x2d5/0x7f0
>   xlog_cil_committed+0xaba/0xf20
>   xlog_cil_push_work+0x1a60/0x2360
>   process_one_work+0x78e/0x1140
>   worker_thread+0x58b/0xf60
>   kthread+0x2cd/0x3c0
>   ret_from_fork+0x1f/0x30
>   </TASK>
> 
>  Allocated by task 531:
>   kasan_save_stack+0x22/0x40
>   kasan_set_track+0x25/0x30
>   __kasan_slab_alloc+0x55/0x60
>   kmem_cache_alloc+0x195/0x5f0
>   xfs_cui_init+0x198/0x1d0
>   xlog_recover_cui_commit_pass2+0x133/0x5f0
>   xlog_recover_items_pass2+0x107/0x230
>   xlog_recover_commit_trans+0x3e7/0x9c0
>   xlog_recovery_process_trans+0x140/0x1d0
>   xlog_recover_process_ophdr+0x1a0/0x3d0
>   xlog_recover_process_data+0x108/0x2d0
>   xlog_recover_process+0x1f6/0x280
>   xlog_do_recovery_pass+0x609/0xdb0
>   xlog_do_log_recovery+0x84/0xe0
>   xlog_do_recover+0x7d/0x470
>   xlog_recover+0x25f/0x490
>   xfs_log_mount+0x2dd/0x6f0
>   xfs_mountfs+0x11ce/0x1e70
>   xfs_fs_fill_super+0x10ec/0x1b20
>   get_tree_bdev+0x3c8/0x730
>   vfs_get_tree+0x89/0x2c0
>   path_mount+0xecf/0x1800
>   do_mount+0xf3/0x110
>   __x64_sys_mount+0x154/0x1f0
>   do_syscall_64+0x39/0x80
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
>  Freed by task 531:
>   kasan_save_stack+0x22/0x40
>   kasan_set_track+0x25/0x30
>   kasan_save_free_info+0x2b/0x40
>   __kasan_slab_free+0x114/0x1b0
>   kmem_cache_free+0xf8/0x510
>   xfs_cui_item_free+0x95/0xb0
>   xfs_cui_release+0x86/0xc0
>   xlog_recover_cancel_intents.isra.0+0xf8/0x210
>   xlog_recover_finish+0x7e7/0x980
>   xfs_log_mount_finish+0x2bb/0x4a0
>   xfs_mountfs+0x14bf/0x1e70
>   xfs_fs_fill_super+0x10ec/0x1b20
>   get_tree_bdev+0x3c8/0x730
>   vfs_get_tree+0x89/0x2c0
>   path_mount+0xecf/0x1800
>   do_mount+0xf3/0x110
>   __x64_sys_mount+0x154/0x1f0
>   do_syscall_64+0x39/0x80
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
>  The buggy address belongs to the object at ffff888012575dc8
>   which belongs to the cache xfs_cui_item of size 432
>  The buggy address is located 152 bytes inside of
>   freed 432-byte region [ffff888012575dc8, ffff888012575f78)
> 
>  The buggy address belongs to the physical page:
>  page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574
>  head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
>  page_type: 0xffffffff()
>  raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150
>  raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000
>  page dumped because: kasan: bad access detected
> 
>  Memory state around the buggy address:
>   ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
>   ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>  >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                         ^
>   ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
>  ==================================================================
> 
> If process intents fails, intent items left in AIL will be delete
> from AIL and freed in error handling, even intent items that have been
> recovered and created done items. After this, uaf will be triggered when
> done item commited, because at this point the released intent item will
> be accessed.
> 
> xlog_recover_finish                     xlog_cil_push_work
> ----------------------------            ---------------------------
> xlog_recover_process_intents
>   xfs_cui_item_recover//cui_refcount == 1
>     xfs_trans_get_cud
>     xfs_trans_commit
>       <add cud item to cil>
>   xfs_cui_item_recover
>     <error occurred and return>
> xlog_recover_cancel_intents
>   xfs_cui_release     //cui_refcount == 0
>     xfs_cui_item_free //free cui
>   <release other intent items>
> xlog_force_shutdown   //shutdown
>                                <...>
>                                         <push items in cil>
>                                         xlog_cil_committed
>                                           xfs_cud_item_release
>                                             xfs_cui_release // UAF
> 
> Fix it by move log force forward to make sure done items committed before
> cancel intents.
> 
> Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails")
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/xfs_log_recover.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index fdaa0ffe029b..c37031e64db5 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3444,6 +3444,13 @@ xlog_recover_finish(
>  	int	error;
>  
>  	error = xlog_recover_process_intents(log);
> +	/*
> +	 * Sync the log to get all the intents that have done item out of

"...that have intent done items out of the AIL."

> +	 * the AIL.  This isn't absolutely necessary, but it helps in case
> +	 * the unlink transactions would have problems pushing the intents
> +	 * out of the way.
> +	 */
> +	xfs_log_force(log->l_mp, XFS_LOG_SYNC);

Hmm.  Why doesn't the shutdown force the (now dead) dead log items out
of memory?

	/*
	 * Flush all the completed transactions to disk before marking the log
	 * being shut down. We need to do this first as shutting down the log
	 * before the force will prevent the log force from flushing the iclogs
	 * to disk.
	 *
	 * When we are in recovery, there are no transactions to flush, and
	 * we don't want to touch the log because we don't want to perturb the
	 * current head/tail for future recovery attempts. Hence we need to
	 * avoid a log force in this case.
	 *
	 * If we are shutting down due to a log IO error, then we must avoid
	 * trying to write the log as that may just result in more IO errors and
	 * an endless shutdown/force loop.
	 */
	if (!log_error && !xlog_in_recovery(log))
		xfs_log_force(log->l_mp, XFS_LOG_SYNC);

Oh.  We're in recovery, but we've passed the part where we've finished
with the log/inode/dquot items that we read from the disk.  IOWs, now
we've gotten to the part of recovery where we're actually standing up
new transactions to finish the work that was in the intents ... with new
incore intents.

I wonder, does the problem go away if you (hackishly)
clear_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate); before the
xlog_force_shutdown?

--D

>  	if (error) {
>  		/*
>  		 * Cancel all the unprocessed intent items now so that we don't
> @@ -3458,13 +3465,6 @@ xlog_recover_finish(
>  		return error;
>  	}
>  
> -	/*
> -	 * Sync the log to get all the intents out of the AIL.  This isn't
> -	 * absolutely necessary, but it helps in case the unlink transactions
> -	 * would have problems pushing the intents out of the way.
> -	 */
> -	xfs_log_force(log->l_mp, XFS_LOG_SYNC);
> -
>  	/*
>  	 * Now that we've recovered the log and all the intents, we can clear
>  	 * the log incompat feature bits in the superblock because there's no
> -- 
> 2.31.1
>
Dave Chinner July 19, 2023, 6:41 a.m. UTC | #2
On Sat, Jul 15, 2023 at 02:36:47PM +0800, Long Li wrote:
> KASAN report a uaf when recover intents fails:
....
> 
> If process intents fails, intent items left in AIL will be delete
> from AIL and freed in error handling, even intent items that have been
> recovered and created done items. After this, uaf will be triggered when
> done item commited, because at this point the released intent item will
> be accessed.
> 
> xlog_recover_finish                     xlog_cil_push_work
> ----------------------------            ---------------------------
> xlog_recover_process_intents
>   xfs_cui_item_recover//cui_refcount == 1
>     xfs_trans_get_cud
>     xfs_trans_commit
>       <add cud item to cil>
>   xfs_cui_item_recover
>     <error occurred and return>
> xlog_recover_cancel_intents
>   xfs_cui_release     //cui_refcount == 0
>     xfs_cui_item_free //free cui
>   <release other intent items>
> xlog_force_shutdown   //shutdown
>                                <...>
>                                         <push items in cil>
>                                         xlog_cil_committed
>                                           xfs_cud_item_release
>                                             xfs_cui_release // UAF

Huh. The log stores items in the AIL without holding a reference to
them, then on shutdown takes the intent done reference away because
it assumes the intent has not been processed as it is still in the
AIL.

Ok, that's broken.

> Fix it by move log force forward to make sure done items committed before
> cancel intents.

That doesn't fix the fact we have a reference counted object that is
being accessed by code that doesn't actually own a reference to the
object.  Intent log items are created with a reference count of 2 -
one for the creator, and one for the intent done object.

Look at xlog_recover_cui_commit_pass2():

        /*
         * Insert the intent into the AIL directly and drop one reference so
         * that finishing or canceling the work will drop the other.
         */
        xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
        xfs_cui_release(cuip);
        return 0;
}

Log recovery explicitly drops the creator reference after it is
inserted into the AIL, but it then processes the log item as if it
also owns the intent-done reference. The moment we call
->iop_recover(), the intent-done reference should be owned by the
log item.

The recovery of the BUI, RUI and EFI all do the same thing. I
suspect that these references should actually be held by log
recovery until it is done processing the item, at which point it
should be removed from the AIL by xlog_recover_process_intents().

The code in ->iop_recovery should assume that it passes the
reference to the done intent, but if that code fails before creating
the done-intent then it needs to release the intent reference
itself.

That way when we go to cancel the intent, the only intents we find
in the AIL are the ones we know have not been processed yet and
hence we can safely drop both the creator and the intent done
reference from xlog_recover_cancel_intents().

Cheers,

Dave.
Long Li July 22, 2023, 1:19 a.m. UTC | #3
On Wed, Jul 19, 2023 at 04:41:41PM +1000, Dave Chinner wrote:
> On Sat, Jul 15, 2023 at 02:36:47PM +0800, Long Li wrote:
> > KASAN report a uaf when recover intents fails:
> ....
> > 
> > If process intents fails, intent items left in AIL will be delete
> > from AIL and freed in error handling, even intent items that have been
> > recovered and created done items. After this, uaf will be triggered when
> > done item commited, because at this point the released intent item will
> > be accessed.
> > 
> > xlog_recover_finish                     xlog_cil_push_work
> > ----------------------------            ---------------------------
> > xlog_recover_process_intents
> >   xfs_cui_item_recover//cui_refcount == 1
> >     xfs_trans_get_cud
> >     xfs_trans_commit
> >       <add cud item to cil>
> >   xfs_cui_item_recover
> >     <error occurred and return>
> > xlog_recover_cancel_intents
> >   xfs_cui_release     //cui_refcount == 0
> >     xfs_cui_item_free //free cui
> >   <release other intent items>
> > xlog_force_shutdown   //shutdown
> >                                <...>
> >                                         <push items in cil>
> >                                         xlog_cil_committed
> >                                           xfs_cud_item_release
> >                                             xfs_cui_release // UAF
> 
> Huh. The log stores items in the AIL without holding a reference to
> them, then on shutdown takes the intent done reference away because
> it assumes the intent has not been processed as it is still in the
> AIL.
> 
> Ok, that's broken.
> 
> > Fix it by move log force forward to make sure done items committed before
> > cancel intents.
> 
> That doesn't fix the fact we have a reference counted object that is
> being accessed by code that doesn't actually own a reference to the
> object.  Intent log items are created with a reference count of 2 -
> one for the creator, and one for the intent done object.
> 
> Look at xlog_recover_cui_commit_pass2():
> 
>         /*
>          * Insert the intent into the AIL directly and drop one reference so
>          * that finishing or canceling the work will drop the other.
>          */
>         xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
>         xfs_cui_release(cuip);
>         return 0;
> }
> 
> Log recovery explicitly drops the creator reference after it is
> inserted into the AIL, but it then processes the log item as if it
> also owns the intent-done reference. The moment we call
> ->iop_recover(), the intent-done reference should be owned by the
> log item.

Hi, Dave

Thanks for the reply. Yes, your analysis seems reasonable, it helped me a
lot to understand the intent lifecycle.

> 
> The recovery of the BUI, RUI and EFI all do the same thing. I
> suspect that these references should actually be held by log
> recovery until it is done processing the item, at which point it
> should be removed from the AIL by xlog_recover_process_intents().

Why do we need to remove the intent from the AIL at this point, shouldn't
it be removed from the AIL when the done intent is committed? Or is there
any way to ensure that the intents are removed from the AIL when they are
processed.

> 
> The code in ->iop_recovery should assume that it passes the
> reference to the done intent, but if that code fails before creating
> the done-intent then it needs to release the intent reference
> itself.
> 
> That way when we go to cancel the intent, the only intents we find
> in the AIL are the ones we know have not been processed yet and
> hence we can safely drop both the creator and the intent done
> reference from xlog_recover_cancel_intents().

Yes, if the processed intent has been removed from the AIL, then only the
unprocessed intent remains in the current AIL, and it is safe to cancel
the intent.

Best regards
Long Li
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Long Li July 22, 2023, 2:16 a.m. UTC | #4
On Tue, Jul 18, 2023 at 07:19:14PM -0700, Darrick J. Wong wrote:
> On Sat, Jul 15, 2023 at 02:36:47PM +0800, Long Li wrote:
> > KASAN report a uaf when recover intents fails:
> > 
> >  ==================================================================
> >  BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0
> >  Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103
> >  CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166
> >  Workqueue: xfs-cil/sda xlog_cil_push_work
> >  Call Trace:
> >   <TASK>
> >   dump_stack_lvl+0x50/0x70
> >   print_report+0xc2/0x600
> >   kasan_report+0xb6/0xe0
> >   xfs_cui_release+0xb7/0xc0
> >   xfs_cud_item_release+0x3c/0x90
> >   xfs_trans_committed_bulk+0x2d5/0x7f0
> >   xlog_cil_committed+0xaba/0xf20
> >   xlog_cil_push_work+0x1a60/0x2360
> >   process_one_work+0x78e/0x1140
> >   worker_thread+0x58b/0xf60
> >   kthread+0x2cd/0x3c0
> >   ret_from_fork+0x1f/0x30
> >   </TASK>
> > 
> >  Allocated by task 531:
> >   kasan_save_stack+0x22/0x40
> >   kasan_set_track+0x25/0x30
> >   __kasan_slab_alloc+0x55/0x60
> >   kmem_cache_alloc+0x195/0x5f0
> >   xfs_cui_init+0x198/0x1d0
> >   xlog_recover_cui_commit_pass2+0x133/0x5f0
> >   xlog_recover_items_pass2+0x107/0x230
> >   xlog_recover_commit_trans+0x3e7/0x9c0
> >   xlog_recovery_process_trans+0x140/0x1d0
> >   xlog_recover_process_ophdr+0x1a0/0x3d0
> >   xlog_recover_process_data+0x108/0x2d0
> >   xlog_recover_process+0x1f6/0x280
> >   xlog_do_recovery_pass+0x609/0xdb0
> >   xlog_do_log_recovery+0x84/0xe0
> >   xlog_do_recover+0x7d/0x470
> >   xlog_recover+0x25f/0x490
> >   xfs_log_mount+0x2dd/0x6f0
> >   xfs_mountfs+0x11ce/0x1e70
> >   xfs_fs_fill_super+0x10ec/0x1b20
> >   get_tree_bdev+0x3c8/0x730
> >   vfs_get_tree+0x89/0x2c0
> >   path_mount+0xecf/0x1800
> >   do_mount+0xf3/0x110
> >   __x64_sys_mount+0x154/0x1f0
> >   do_syscall_64+0x39/0x80
> >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > 
> >  Freed by task 531:
> >   kasan_save_stack+0x22/0x40
> >   kasan_set_track+0x25/0x30
> >   kasan_save_free_info+0x2b/0x40
> >   __kasan_slab_free+0x114/0x1b0
> >   kmem_cache_free+0xf8/0x510
> >   xfs_cui_item_free+0x95/0xb0
> >   xfs_cui_release+0x86/0xc0
> >   xlog_recover_cancel_intents.isra.0+0xf8/0x210
> >   xlog_recover_finish+0x7e7/0x980
> >   xfs_log_mount_finish+0x2bb/0x4a0
> >   xfs_mountfs+0x14bf/0x1e70
> >   xfs_fs_fill_super+0x10ec/0x1b20
> >   get_tree_bdev+0x3c8/0x730
> >   vfs_get_tree+0x89/0x2c0
> >   path_mount+0xecf/0x1800
> >   do_mount+0xf3/0x110
> >   __x64_sys_mount+0x154/0x1f0
> >   do_syscall_64+0x39/0x80
> >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > 
> >  The buggy address belongs to the object at ffff888012575dc8
> >   which belongs to the cache xfs_cui_item of size 432
> >  The buggy address is located 152 bytes inside of
> >   freed 432-byte region [ffff888012575dc8, ffff888012575f78)
> > 
> >  The buggy address belongs to the physical page:
> >  page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574
> >  head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> >  flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
> >  page_type: 0xffffffff()
> >  raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150
> >  raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000
> >  page dumped because: kasan: bad access detected
> > 
> >  Memory state around the buggy address:
> >   ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
> >   ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
> >  >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >                                                         ^
> >   ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >   ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
> >  ==================================================================
> > 
> > If process intents fails, intent items left in AIL will be delete
> > from AIL and freed in error handling, even intent items that have been
> > recovered and created done items. After this, uaf will be triggered when
> > done item commited, because at this point the released intent item will
> > be accessed.
> > 
> > xlog_recover_finish                     xlog_cil_push_work
> > ----------------------------            ---------------------------
> > xlog_recover_process_intents
> >   xfs_cui_item_recover//cui_refcount == 1
> >     xfs_trans_get_cud
> >     xfs_trans_commit
> >       <add cud item to cil>
> >   xfs_cui_item_recover
> >     <error occurred and return>
> > xlog_recover_cancel_intents
> >   xfs_cui_release     //cui_refcount == 0
> >     xfs_cui_item_free //free cui
> >   <release other intent items>
> > xlog_force_shutdown   //shutdown
> >                                <...>
> >                                         <push items in cil>
> >                                         xlog_cil_committed
> >                                           xfs_cud_item_release
> >                                             xfs_cui_release // UAF
> > 
> > Fix it by move log force forward to make sure done items committed before
> > cancel intents.
> > 
> > Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails")
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> >  fs/xfs/xfs_log_recover.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index fdaa0ffe029b..c37031e64db5 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -3444,6 +3444,13 @@ xlog_recover_finish(
> >  	int	error;
> >  
> >  	error = xlog_recover_process_intents(log);
> > +	/*
> > +	 * Sync the log to get all the intents that have done item out of
> 
> "...that have intent done items out of the AIL."
> 
> > +	 * the AIL.  This isn't absolutely necessary, but it helps in case
> > +	 * the unlink transactions would have problems pushing the intents
> > +	 * out of the way.
> > +	 */
> > +	xfs_log_force(log->l_mp, XFS_LOG_SYNC);
> 
> Hmm.  Why doesn't the shutdown force the (now dead) dead log items out
> of memory?
> 
> 	/*
> 	 * Flush all the completed transactions to disk before marking the log
> 	 * being shut down. We need to do this first as shutting down the log
> 	 * before the force will prevent the log force from flushing the iclogs
> 	 * to disk.
> 	 *
> 	 * When we are in recovery, there are no transactions to flush, and
> 	 * we don't want to touch the log because we don't want to perturb the
> 	 * current head/tail for future recovery attempts. Hence we need to
> 	 * avoid a log force in this case.
> 	 *
> 	 * If we are shutting down due to a log IO error, then we must avoid
> 	 * trying to write the log as that may just result in more IO errors and
> 	 * an endless shutdown/force loop.
> 	 */
> 	if (!log_error && !xlog_in_recovery(log))
> 		xfs_log_force(log->l_mp, XFS_LOG_SYNC);
> 
> Oh.  We're in recovery, but we've passed the part where we've finished
> with the log/inode/dquot items that we read from the disk.  IOWs, now
> we've gotten to the part of recovery where we're actually standing up
> new transactions to finish the work that was in the intents ... with new
> incore intents.
> 
> I wonder, does the problem go away if you (hackishly)
> clear_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate); before the
> xlog_force_shutdown?

I used the patch below and tested it to confirm that the uaf problem
goes away.

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 82c81d20459d..5d9f84ce2b9b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3452,9 +3452,10 @@ xlog_recover_finish(
 		 * (inode reclaim does this) before we get around to
 		 * xfs_log_mount_cancel.
 		 */
+		clear_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate);
+		xlog_force_shutdown(log, SHUTDOWN_CORRUPT_INCORE);
 		xlog_recover_cancel_intents(log);
 		xfs_alert(log->l_mp, "Failed to recover intents");
-		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
 		return error;
 	}
 
I'm wondering if a shutdown with SHUTDOWN_CORRUPT_INCORE is appropriate,
even though there is no log io error at this point.

Best regards
Long Li

> 
> --D
> 
> >  	if (error) {
> >  		/*
> >  		 * Cancel all the unprocessed intent items now so that we don't
> > @@ -3458,13 +3465,6 @@ xlog_recover_finish(
> >  		return error;
> >  	}
> >  
> > -	/*
> > -	 * Sync the log to get all the intents out of the AIL.  This isn't
> > -	 * absolutely necessary, but it helps in case the unlink transactions
> > -	 * would have problems pushing the intents out of the way.
> > -	 */
> > -	xfs_log_force(log->l_mp, XFS_LOG_SYNC);
> > -
> >  	/*
> >  	 * Now that we've recovered the log and all the intents, we can clear
> >  	 * the log incompat feature bits in the superblock because there's no
> > -- 
> > 2.31.1
> >
Dave Chinner July 24, 2023, 1:05 a.m. UTC | #5
On Sat, Jul 22, 2023 at 09:19:09AM +0800, Long Li wrote:
> On Wed, Jul 19, 2023 at 04:41:41PM +1000, Dave Chinner wrote:
> > On Sat, Jul 15, 2023 at 02:36:47PM +0800, Long Li wrote:
> > > KASAN report a uaf when recover intents fails:
> > ....
> > > 
> > > If process intents fails, intent items left in AIL will be delete
> > > from AIL and freed in error handling, even intent items that have been
> > > recovered and created done items. After this, uaf will be triggered when
> > > done item commited, because at this point the released intent item will
> > > be accessed.
> > > 
> > > xlog_recover_finish                     xlog_cil_push_work
> > > ----------------------------            ---------------------------
> > > xlog_recover_process_intents
> > >   xfs_cui_item_recover//cui_refcount == 1
> > >     xfs_trans_get_cud
> > >     xfs_trans_commit
> > >       <add cud item to cil>
> > >   xfs_cui_item_recover
> > >     <error occurred and return>
> > > xlog_recover_cancel_intents
> > >   xfs_cui_release     //cui_refcount == 0
> > >     xfs_cui_item_free //free cui
> > >   <release other intent items>
> > > xlog_force_shutdown   //shutdown
> > >                                <...>
> > >                                         <push items in cil>
> > >                                         xlog_cil_committed
> > >                                           xfs_cud_item_release
> > >                                             xfs_cui_release // UAF
> > 
> > Huh. The log stores items in the AIL without holding a reference to
> > them, then on shutdown takes the intent done reference away because
> > it assumes the intent has not been processed as it is still in the
> > AIL.
> > 
> > Ok, that's broken.
> > 
> > > Fix it by move log force forward to make sure done items committed before
> > > cancel intents.
> > 
> > That doesn't fix the fact we have a reference counted object that is
> > being accessed by code that doesn't actually own a reference to the
> > object.  Intent log items are created with a reference count of 2 -
> > one for the creator, and one for the intent done object.
> > 
> > Look at xlog_recover_cui_commit_pass2():
> > 
> >         /*
> >          * Insert the intent into the AIL directly and drop one reference so
> >          * that finishing or canceling the work will drop the other.
> >          */
> >         xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
> >         xfs_cui_release(cuip);
> >         return 0;
> > }
> > 
> > Log recovery explicitly drops the creator reference after it is
> > inserted into the AIL, but it then processes the log item as if it
> > also owns the intent-done reference. The moment we call
> > ->iop_recover(), the intent-done reference should be owned by the
> > log item.
> 
> Hi, Dave
> 
> Thanks for the reply. Yes, your analysis seems reasonable, it helped me a
> lot to understand the intent lifecycle.
> 
> > 
> > The recovery of the BUI, RUI and EFI all do the same thing. I
> > suspect that these references should actually be held by log
> > recovery until it is done processing the item, at which point it
> > should be removed from the AIL by xlog_recover_process_intents().
> 
> Why do we need to remove the intent from the AIL at this point,

Because we've processed the recovery of it - it is either completely
done or we have a new intent in the CIL ready to continue operation.
Either way, the next write to the journal will remove the item from
the AIL when it completes.

Intents don't need to be in the AIL, though - we can cancel them
in memory (see the intent whiteout code) and so when we process the
done item from journal IO completion the last reference goes away
and they won't be in the AIL at this point in time.

IOWs, the intent freeing code doesn't care if the intent is in the
AIL or not, it does the right thing either way.

Hence if we remove the intent from the list of intents that need to
be recovered after we have done the initial recovery, we acheive two
things:

1. the tail of the log can be moved forward with the commit of the
done intent or new intent to continue the operation, and

2. We avoid the problem of trying to determine how many reference
counts we need to drop from intent recovery cancelling because we
never come across intents we've actually attempted recovery on.

> shouldn't
> it be removed from the AIL when the done intent is committed? Or is there
> any way to ensure that the intents are removed from the AIL when they are
> processed.

THe reference counting ensures the right thing is done when the last
reference goes away. If it is in the AIL, it will get removed, if it
is not in the AIL, then AIL removal is a no-op and nothign bad
happens.

Cheers,

Dave.
Long Li July 25, 2023, 12:25 p.m. UTC | #6
On Mon, Jul 24, 2023 at 11:05:48AM +1000, Dave Chinner wrote:
> On Sat, Jul 22, 2023 at 09:19:09AM +0800, Long Li wrote:
> > On Wed, Jul 19, 2023 at 04:41:41PM +1000, Dave Chinner wrote:
> > > On Sat, Jul 15, 2023 at 02:36:47PM +0800, Long Li wrote:
> > > > KASAN report a uaf when recover intents fails:
> > > ....
> > > > 
> > > > If process intents fails, intent items left in AIL will be delete
> > > > from AIL and freed in error handling, even intent items that have been
> > > > recovered and created done items. After this, uaf will be triggered when
> > > > done item commited, because at this point the released intent item will
> > > > be accessed.
> > > > 
> > > > xlog_recover_finish                     xlog_cil_push_work
> > > > ----------------------------            ---------------------------
> > > > xlog_recover_process_intents
> > > >   xfs_cui_item_recover//cui_refcount == 1
> > > >     xfs_trans_get_cud
> > > >     xfs_trans_commit
> > > >       <add cud item to cil>
> > > >   xfs_cui_item_recover
> > > >     <error occurred and return>
> > > > xlog_recover_cancel_intents
> > > >   xfs_cui_release     //cui_refcount == 0
> > > >     xfs_cui_item_free //free cui
> > > >   <release other intent items>
> > > > xlog_force_shutdown   //shutdown
> > > >                                <...>
> > > >                                         <push items in cil>
> > > >                                         xlog_cil_committed
> > > >                                           xfs_cud_item_release
> > > >                                             xfs_cui_release // UAF
> > > 
> > > Huh. The log stores items in the AIL without holding a reference to
> > > them, then on shutdown takes the intent done reference away because
> > > it assumes the intent has not been processed as it is still in the
> > > AIL.
> > > 
> > > Ok, that's broken.
> > > 
> > > > Fix it by move log force forward to make sure done items committed before
> > > > cancel intents.
> > > 
> > > That doesn't fix the fact we have a reference counted object that is
> > > being accessed by code that doesn't actually own a reference to the
> > > object.  Intent log items are created with a reference count of 2 -
> > > one for the creator, and one for the intent done object.
> > > 
> > > Look at xlog_recover_cui_commit_pass2():
> > > 
> > >         /*
> > >          * Insert the intent into the AIL directly and drop one reference so
> > >          * that finishing or canceling the work will drop the other.
> > >          */
> > >         xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
> > >         xfs_cui_release(cuip);
> > >         return 0;
> > > }
> > > 
> > > Log recovery explicitly drops the creator reference after it is
> > > inserted into the AIL, but it then processes the log item as if it
> > > also owns the intent-done reference. The moment we call
> > > ->iop_recover(), the intent-done reference should be owned by the
> > > log item.
> > 
> > Hi, Dave
> > 
> > Thanks for the reply. Yes, your analysis seems reasonable, it helped me a
> > lot to understand the intent lifecycle.
> > 
> > > 
> > > The recovery of the BUI, RUI and EFI all do the same thing. I
> > > suspect that these references should actually be held by log
> > > recovery until it is done processing the item, at which point it
> > > should be removed from the AIL by xlog_recover_process_intents().
> > 
> > Why do we need to remove the intent from the AIL at this point,
> 
> Because we've processed the recovery of it - it is either completely
> done or we have a new intent in the CIL ready to continue operation.
> Either way, the next write to the journal will remove the item from
> the AIL when it completes.
> 
> Intents don't need to be in the AIL, though - we can cancel them
> in memory (see the intent whiteout code) and so when we process the
> done item from journal IO completion the last reference goes away
> and they won't be in the AIL at this point in time.
> 
> IOWs, the intent freeing code doesn't care if the intent is in the
> AIL or not, it does the right thing either way.
> 
> Hence if we remove the intent from the list of intents that need to
> be recovered after we have done the initial recovery, we acheive two
> things:
> 
> 1. the tail of the log can be moved forward with the commit of the
> done intent or new intent to continue the operation, and
> 
> 2. We avoid the problem of trying to determine how many reference
> counts we need to drop from intent recovery cancelling because we
> never come across intents we've actually attempted recovery on.
> 
> > shouldn't
> > it be removed from the AIL when the done intent is committed? Or is there
> > any way to ensure that the intents are removed from the AIL when they are
> > processed.
> 
> THe reference counting ensures the right thing is done when the last
> reference goes away. If it is in the AIL, it will get removed, if it
> is not in the AIL, then AIL removal is a no-op and nothign bad
> happens.

Thank you for your detailed answer, it was great, I understand
what you mean.

The intent item stays in the AIL until the done item is committed,
which prevents the tail lsn from moving forward. After the transaction
is alloced in ->iop_recover(), we don't need to stop tail lsn from
moving forward because there is no other thread committing the
transaction at this point, the intent item will not be overwritten
until the done intent is written to the log, so we can just remove
the intent item from the AIL.

I think we can remove the intent item from the AIL as soon as the
done item is created, since the reference count has already been
passed to the done item. Eventually the problem can be fixed.

This is my understanding.

Thanks
Long Li
diff mbox series

Patch

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index fdaa0ffe029b..c37031e64db5 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3444,6 +3444,13 @@  xlog_recover_finish(
 	int	error;
 
 	error = xlog_recover_process_intents(log);
+	/*
+	 * Sync the log to get all the intents that have done item out of
+	 * the AIL.  This isn't absolutely necessary, but it helps in case
+	 * the unlink transactions would have problems pushing the intents
+	 * out of the way.
+	 */
+	xfs_log_force(log->l_mp, XFS_LOG_SYNC);
 	if (error) {
 		/*
 		 * Cancel all the unprocessed intent items now so that we don't
@@ -3458,13 +3465,6 @@  xlog_recover_finish(
 		return error;
 	}
 
-	/*
-	 * Sync the log to get all the intents out of the AIL.  This isn't
-	 * absolutely necessary, but it helps in case the unlink transactions
-	 * would have problems pushing the intents out of the way.
-	 */
-	xfs_log_force(log->l_mp, XFS_LOG_SYNC);
-
 	/*
 	 * Now that we've recovered the log and all the intents, we can clear
 	 * the log incompat feature bits in the superblock because there's no