diff mbox series

[RFC,4/4] btrfs: scrub: make sctx->stripes[] array work as a ring buffer

Message ID 04b653322f5d8b32c22083a4df9baf6d4c9097ea.1689744163.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: scrub: make sctx->stripes[] a ring buffer | expand

Commit Message

Qu Wenruo July 19, 2023, 5:30 a.m. UTC
Since all scrub works (repair using other mirrors, writeback to the
original mirror, writeback to the dev-replace target) all happens inside
a btrfs workqueue, there is no need to manually flush all the stripes
inside queue_scrub_stripe().

We only need to wait for the current slot to finish its work, then queue
the stripe, and finally increase the cur_stripe number (and mod it).

This should allow scrub to increase its queue depth when submitting read
requests.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

Comments

kernel test robot July 28, 2023, 3:15 a.m. UTC | #1
Hello,

kernel test robot noticed "BUG:KASAN:use-after-free_in__list_del_entry_valid" on:

commit: 70c50070fcdecf82cb3ca4dc718eab1098f3c8ad ("[PATCH RFC 4/4] btrfs: scrub: make sctx->stripes[] array work as a ring buffer")
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-move-write-back-of-repaired-sectors-into-scrub_stripe_read_repair_worker/20230719-133412
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/all/04b653322f5d8b32c22083a4df9baf6d4c9097ea.1689744163.git.wqu@suse.com/
patch subject: [PATCH RFC 4/4] btrfs: scrub: make sctx->stripes[] array work as a ring buffer

in testcase: xfstests
version: xfstests-x86_64-bb8af9c-1_20230724
with following parameters:

	disk: 6HDD
	fs: btrfs
	test: btrfs-group-07



compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 8G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202307281035.31af197-oliver.sang@intel.com


[ 72.139677][ T65] BUG: KASAN: use-after-free in __list_del_entry_valid (lib/list_debug.c:62) 
[   72.147444][   T65] Read of size 8 at addr ffff888205850178 by task kworker/u16:3/65
[   72.155213][   T65]
[   72.157407][   T65] CPU: 1 PID: 65 Comm: kworker/u16:3 Not tainted 6.5.0-rc2-00048-g70c50070fcde #1
[   72.166478][   T65] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
[   72.174417][   T65] Workqueue:  0x0 (btrfs-btrfs-scrub)
[   72.179666][   T65] Call Trace:
[   72.182819][   T65]  <TASK>
[ 72.185620][ T65] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) 
[ 72.189996][ T65] print_address_description+0x2c/0x3a0 
[ 72.196462][ T65] ? __list_del_entry_valid (lib/list_debug.c:62) 
[ 72.201880][ T65] print_report (mm/kasan/report.c:476) 
[ 72.206170][ T65] ? kasan_addr_to_slab (mm/kasan/common.c:35) 
[ 72.210979][ T65] ? __list_del_entry_valid (lib/list_debug.c:62) 
[ 72.216394][ T65] kasan_report (mm/kasan/report.c:590) 
[ 72.220680][ T65] ? __list_del_entry_valid (lib/list_debug.c:62) 
[ 72.226098][ T65] __list_del_entry_valid (lib/list_debug.c:62) 
[ 72.231342][ T65] move_linked_works (include/linux/list.h:134 include/linux/list.h:229 kernel/workqueue.c:1284) 
[ 72.236064][ T65] ? process_one_work (kernel/workqueue.c:938 kernel/workqueue.c:2633) 
[ 72.241050][ T65] pwq_activate_inactive_work (include/linux/instrumented.h:40 include/asm-generic/bitops/instrumented-non-atomic.h:44 kernel/workqueue.c:1364) 
[ 72.246554][ T65] pwq_dec_nr_in_flight (kernel/workqueue.c:1374 kernel/workqueue.c:1396) 
[ 72.251623][ T65] worker_thread (include/linux/list.h:292 kernel/workqueue.c:2749) 
[ 72.256172][ T65] ? rescuer_thread (kernel/workqueue.c:2691) 
[ 72.260894][ T65] kthread (kernel/kthread.c:389) 
[ 72.264833][ T65] ? kthread_complete_and_exit (kernel/kthread.c:342) 
[ 72.270336][ T65] ret_from_fork (arch/x86/kernel/process.c:151) 
[ 72.274623][ T65] ? kthread_complete_and_exit (kernel/kthread.c:342) 
[ 72.280128][ T65] ret_from_fork_asm (arch/x86/entry/entry_64.S:298) 
[   72.284763][   T65] RIP: 0000:0x0
[ 72.288091][ T65] Code: Unable to access opcode bytes at 0xffffffffffffffd6.

Code starting with the faulting instruction
===========================================
[   72.295335][   T65] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[   72.303626][   T65] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   72.311477][   T65] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   72.319330][   T65] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   72.327183][   T65] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   72.335038][   T65] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   72.342895][   T65]  </TASK>
[   72.345784][   T65]
[   72.347976][   T65] The buggy address belongs to the physical page:
[   72.354262][   T65] page:00000000cb5c2985 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x205850
[   72.364640][   T65] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[   72.371887][   T65] page_type: 0xffffff7f(buddy)
[   72.376520][   T65] raw: 0017ffffc0000000 ffffea0008153008 ffff88821edd6020 0000000000000000
[   72.384980][   T65] raw: 0000000000000000 0000000000000004 00000000ffffff7f 0000000000000000
[   72.393440][   T65] page dumped because: kasan: bad access detected
[   72.399725][   T65]
[   72.401915][   T65] Memory state around the buggy address:
[   72.407417][   T65]  ffff888205850000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   72.415354][   T65]  ffff888205850080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   72.423291][   T65] >ffff888205850100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   72.431228][   T65]                                                                 ^
[   72.439080][   T65]  ffff888205850180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   72.447018][   T65]  ffff888205850200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   72.454953][   T65] ==================================================================
[   72.462889][   T65] Disabling lock debugging due to kernel taint
[   72.469218][   T64] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
[   72.481165][   T64] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[   72.489451][   T64] CPU: 3 PID: 64 Comm: kworker/u16:2 Tainted: G    B              6.5.0-rc2-00048-g70c50070fcde #1
[   72.499997][   T64] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
[   72.507935][   T64] Workqueue: btrfs-btrfs-scrub btrfs_work_helper [btrfs]
[ 72.514927][ T64] RIP: 0010:scrub_verify_one_sector (fs/btrfs/scrub.c:712) btrfs
[ 72.521821][ T64] Code: 82 98 00 00 00 4c 8b 74 24 08 48 89 d8 4c 8b 54 24 10 48 c1 e0 04 49 01 c6 48 b8 00 00 00 00 00 fc ff df 4c 89 f1 48 c1 e9 03 <0f> b6 04 01 84 c0 74 06 0f 8e 1a 03 00 00 41 80 3e 00 0f 84 a2 00
All code
========
   0:	82                   	(bad)  
   1:	98                   	cwtl   
   2:	00 00                	add    %al,(%rax)
   4:	00 4c 8b 74          	add    %cl,0x74(%rbx,%rcx,4)
   8:	24 08                	and    $0x8,%al
   a:	48 89 d8             	mov    %rbx,%rax
   d:	4c 8b 54 24 10       	mov    0x10(%rsp),%r10
  12:	48 c1 e0 04          	shl    $0x4,%rax
  16:	49 01 c6             	add    %rax,%r14
  19:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  20:	fc ff df 
  23:	4c 89 f1             	mov    %r14,%rcx
  26:	48 c1 e9 03          	shr    $0x3,%rcx
  2a:*	0f b6 04 01          	movzbl (%rcx,%rax,1),%eax		<-- trapping instruction
  2e:	84 c0                	test   %al,%al
  30:	74 06                	je     0x38
  32:	0f 8e 1a 03 00 00    	jle    0x352
  38:	41 80 3e 00          	cmpb   $0x0,(%r14)
  3c:	0f                   	.byte 0xf
  3d:	84                   	.byte 0x84
  3e:	a2                   	.byte 0xa2
	...

Code starting with the faulting instruction
===========================================
   0:	0f b6 04 01          	movzbl (%rcx,%rax,1),%eax
   4:	84 c0                	test   %al,%al
   6:	74 06                	je     0xe
   8:	0f 8e 1a 03 00 00    	jle    0x328
   e:	41 80 3e 00          	cmpb   $0x0,(%r14)
  12:	0f                   	.byte 0xf
  13:	84                   	.byte 0x84
  14:	a2                   	.byte 0xa2
	...
[   72.530498][ T2521] BTRFS info (device sda2): scrub: finished on devid 2 with status: 0
[   72.541321][   T64] RSP: 0018:ffffc900004cfab8 EFLAGS: 00010246
[   72.541325][   T64] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   72.541328][   T64] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffc900004cfa60
[   72.570983][   T64] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffed1040b0a0c3
[   72.578832][   T64] R10: ffff88813b476000 R11: 0000000000000001 R12: 000000000000000c
[   72.586683][   T64] R13: ffffc900004cfb10 R14: 0000000000000000 R15: ffff8882058504f8
[   72.594534][   T64] FS:  0000000000000000(0000) GS:ffff8881b5980000(0000) knlGS:0000000000000000
[   72.603348][   T64] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.609810][   T64] CR2: 00007f5a3bba8610 CR3: 000000011981e003 CR4: 00000000001706e0
[   72.617665][   T64] Call Trace:
[   72.620819][   T64]  <TASK>
[ 72.623622][ T64] ? die_addr (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:460) 
[ 72.627649][ T64] ? exc_general_protection (arch/x86/kernel/traps.c:786 arch/x86/kernel/traps.c:728) 
[ 72.633072][ T64] ? asm_exc_general_protection (arch/x86/include/asm/idtentry.h:564) 
[ 72.638672][ T64] ? scrub_verify_one_sector (fs/btrfs/scrub.c:712) btrfs
[ 72.644974][ T64] ? scrub_verify_one_sector (arch/x86/include/asm/bitops.h:228 (discriminator 8) arch/x86/include/asm/bitops.h:240 (discriminator 8) include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 8) fs/btrfs/scrub.c:708 (discriminator 8)) btrfs
[ 72.651266][ T64] ? scrub_verify_one_metadata (fs/btrfs/scrub.c:692) btrfs
[ 72.657906][ T64] scrub_verify_one_stripe (fs/btrfs/scrub.c:761) btrfs
[ 72.664021][ T64] ? scrub_verify_one_sector (fs/btrfs/scrub.c:754) btrfs
[ 72.670305][ T64] ? scrub_write_sectors (fs/btrfs/scrub.c:1656) btrfs
[ 72.676236][ T64] scrub_stripe_read_repair_worker (fs/btrfs/scrub.c:1125) btrfs
[ 72.683040][ T64] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) 
[ 72.688720][ T64] ? kasan_report (mm/kasan/report.c:572) 
[ 72.693178][ T64] ? scrub_stripe_report_errors (fs/btrfs/scrub.c:1111) btrfs
[ 72.699726][ T64] ? scrub_write_sectors (fs/btrfs/scrub.c:1656) btrfs
[ 72.705660][ T64] btrfs_work_helper (fs/btrfs/async-thread.c:315) btrfs
[ 72.711245][ T64] process_one_work (kernel/workqueue.c:2602) 
[ 72.716054][ T64] worker_thread (include/linux/list.h:292 kernel/workqueue.c:2749) 
[ 72.720600][ T64] ? rescuer_thread (kernel/workqueue.c:2691) 
[ 72.725318][ T64] kthread (kernel/kthread.c:389) 
[ 72.729254][ T64] ? kthread_complete_and_exit (kernel/kthread.c:342) 
[ 72.734755][ T64] ret_from_fork (arch/x86/kernel/process.c:151) 
[ 72.739039][ T64] ? kthread_complete_and_exit (kernel/kthread.c:342) 
[ 72.744541][ T64] ret_from_fork_asm (arch/x86/entry/entry_64.S:298) 
[   72.749176][   T64] RIP: 0000:0x0
[ 72.752505][ T64] Code: Unable to access opcode bytes at 0xffffffffffffffd6.

Code starting with the faulting instruction
===========================================


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
diff mbox series

Patch

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 784fbe2341d4..e71ee8d2fbf7 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1770,7 +1770,7 @@  static int flush_scrub_stripes(struct scrub_ctx *sctx)
 	blk_finish_plug(&plug);
 
 	/* Wait for all stripes to finish. */
-	for (int i = 0; i < nr_stripes; i++) {
+	for (int i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) {
 		stripe = &sctx->stripes[i];
 
 		wait_event(stripe->repair_wait, stripe->state == 0);
@@ -1794,40 +1794,46 @@  static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
 	int ret;
 
 	/*
-	 * We should always have at least one slot, when full the last one who
-	 * queued a slot should handle the flush.
+	 * We should always have at least one slot, as the stripes[] array
+	 * is used as a ring buffer.
 	 */
 	ASSERT(sctx->cur_stripe < SCRUB_STRIPES_PER_SCTX);
 	stripe = &sctx->stripes[sctx->cur_stripe];
-	scrub_reset_stripe(stripe);
+
+	/* Wait if the stripe is still under usage. */
+	wait_event(stripe->repair_wait, stripe->state == 0);
+
 	ret = scrub_find_fill_first_stripe(bg, dev, physical, mirror_num,
 					   logical, length, stripe);
 	/* Either >0 as no more extents or <0 for error. */
 	if (ret)
 		return ret;
+
 	*found_stripe_start_ret = stripe->logical;
 
-	sctx->cur_stripe++;
-
-	/* Last slot used, flush them all. */
-	if (sctx->cur_stripe == SCRUB_STRIPES_PER_SCTX)
-		return flush_scrub_stripes(sctx);
-
-	/* We have filled one group, submit them now. */
-	if (sctx->cur_stripe % SCRUB_STRIPES_PER_GROUP == 0) {
+	/* Last slot in the group, submit the group.*/
+	if ((sctx->cur_stripe + 1) % SCRUB_STRIPES_PER_GROUP == 0) {
+		const int first_stripe = sctx->cur_stripe + 1 -
+					 SCRUB_STRIPES_PER_GROUP;
 		struct blk_plug plug;
 
-		scrub_throttle_dev_io(sctx, sctx->stripes[0].dev,
-			      btrfs_stripe_nr_to_offset(SCRUB_STRIPES_PER_GROUP));
+		scrub_throttle_dev_io(sctx, dev,
+			btrfs_stripe_nr_to_offset(SCRUB_STRIPES_PER_GROUP));
 
 		blk_start_plug(&plug);
-		for (int i = sctx->cur_stripe - SCRUB_STRIPES_PER_GROUP;
-		     i < sctx->cur_stripe; i++) {
+		for (int i = first_stripe;
+		     i < first_stripe + SCRUB_STRIPES_PER_GROUP; i++) {
 			stripe = &sctx->stripes[i];
+
+			/* All stripes in the group should haven been queued. */
+			ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED,
+					&stripe->state));
 			scrub_submit_initial_read(sctx, stripe);
 		}
 		blk_finish_plug(&plug);
 	}
+
+	sctx->cur_stripe = (sctx->cur_stripe + 1) % SCRUB_STRIPES_PER_SCTX;
 	return 0;
 }