Message ID | 171892418834.3183906.376857417040987772.stgit@frogsfrogsfrogs (mailing list archive) |
---|---|
State | Accepted, archived |
Series | [1/9] xfs: clean up extent free log intent item tracepoint callsites |

Hello,

kernel test robot noticed "Assertion_failed" on:

commit: f53305b8c490815f244c0d44b096abd4f2a63aeb ("[PATCH 8/9] xfs: remove xfs_defer_agfl_block")
url: https://github.com/intel-lab-lkp/linux/commits/Darrick-J-Wong/xfs-convert-skip_discard-to-a-proper-flags-bitset/20240625-204930
base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/all/171892418834.3183906.376857417040987772.stgit@frogsfrogsfrogs/
patch subject: [PATCH 8/9] xfs: remove xfs_defer_agfl_block

in testcase: stress-ng
version: stress-ng-x86_64-ecd3fe291-1_20240612
with following parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: xfs
	test: copy-file
	cpufreq_governor: performance

compiler: gcc-13
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407031556.d271bd4c-oliver.sang@intel.com

user :err : [ 88.899876] [ perf record: Woken up 5 times to write data ]
user :err : [ 88.979592] [ perf record: Captured and wrote 9.304 MB /tmp/lkp/perf_c2c.data (5470 samples) ]
kern :warn : [ 101.832173] XFS: Assertion failed: type != XFS_AG_RESV_AGFL, file: fs/xfs/libxfs/xfs_alloc.c, line: 2558
kern :warn : [ 101.842834] ------------[ cut here ]------------
kern :warn : [ 101.848538] WARNING: CPU: 22 PID: 536 at fs/xfs/xfs_message.c:89 asswarn (kbuild/src/consumer/fs/xfs/xfs_message.c:89 (discriminator 1)) xfs
kern :warn : [ 101.857842] Modules linked in: xfs intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal coretemp btrfs blake2b_generic kvm_intel ipmi_ssif xor raid6_pq libcrc32c kvm crct10dif_pclmul crc32_pclmul crc32c_intel sd_mod ghash_clmulni_intel sg sha512_ssse3 nvme ahci rapl libahci ast nvme_core binfmt_misc t10_pi intel_cstate mei_me drm_shmem_helper acpi_power_meter intel_th_gth crc64_rocksoft_generic i2c_i801 crc64_rocksoft ioatdma intel_th_pci libata intel_uncore drm_kms_helper megaraid_sas i2c_smbus ipmi_si mei intel_pch_thermal acpi_ipmi dax_hmem crc64 intel_th dca wmi ipmi_devintf ipmi_msghandler joydev drm fuse loop dm_mod ip_tables
user :notice: [ 101.860115] stress-ng: metrc: [2914] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
kern :warn : [ 101.914497] CPU: 22 PID: 536 Comm: kworker/22:1 Not tainted 6.10.0-rc4-00009-gf53305b8c490 #1
kern :warn : [ 101.929361] Hardware name: Inspur NF5180M6/NF5180M6, BIOS 06.00.04 04/12/2022
user :notice: [ 101.940509] stress-ng: metrc: [2914] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
kern :warn : [ 101.940764] Workqueue: xfs-inodegc/sdb1 xfs_inodegc_worker [xfs]
kern :warn : [ 101.962311] RIP: 0010:asswarn (kbuild/src/consumer/fs/xfs/xfs_message.c:89 (discriminator 1)) xfs
user :notice: [ 101.970762] stress-ng: metrc: [2914] copy-file 10938 60.17 0.14 4.61 181.79 2300.90 0.12 3244
kern :warn : [ 101.971200] Code: 90 90 66 0f 1f 00 0f 1f 44 00 00 49 89 d0 41 89 c9 48 c7 c2 90 ed 01 c1 48 89 f1 48 89 fe 48 c7 c7 20 07 01 c1 e8 18 fd ff ff <0f> 0b c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
All code
========
   0:	90                   	nop
   1:	90                   	nop
   2:	66 0f 1f 00          	nopw   (%rax)
   6:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   b:	49 89 d0             	mov    %rdx,%r8
   e:	41 89 c9             	mov    %ecx,%r9d
  11:	48 c7 c2 90 ed 01 c1 	mov    $0xffffffffc101ed90,%rdx
  18:	48 89 f1             	mov    %rsi,%rcx
  1b:	48 89 fe             	mov    %rdi,%rsi
  1e:	48 c7 c7 20 07 01 c1 	mov    $0xffffffffc1010720,%rdi
  25:	e8 18 fd ff ff       	callq  0xfffffffffffffd42
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	c3                   	retq
  2d:	cc                   	int3
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	cc                   	int3
  31:	90                   	nop
  32:	90                   	nop
  33:	90                   	nop
  34:	90                   	nop
  35:	90                   	nop
  36:	90                   	nop
  37:	90                   	nop
  38:	90                   	nop
  39:	90                   	nop
  3a:	90                   	nop
  3b:	90                   	nop
  3c:	90                   	nop
  3d:	90                   	nop
  3e:	90                   	nop
  3f:	90                   	nop

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	c3                   	retq
   3:	cc                   	int3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	cc                   	int3
   7:	90                   	nop
   8:	90                   	nop
   9:	90                   	nop
   a:	90                   	nop
   b:	90                   	nop
   c:	90                   	nop
   d:	90                   	nop
   e:	90                   	nop
   f:	90                   	nop
  10:	90                   	nop
  11:	90                   	nop
  12:	90                   	nop
  13:	90                   	nop
  14:	90                   	nop
  15:	90                   	nop
user :notice: [ 101.974008] stress-ng: metrc: [2914] miscellaneous metrics:
kern :warn : [ 101.978448] RSP: 0018:ffa000000db6f9b8 EFLAGS: 00010246
user :notice: [ 101.993576] stress-ng: metrc: [2914] copy-file 2629.63 MB per sec copy rate (harmonic mean of 64 instances)
kern :warn : [ 102.020066] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000007fffffff
kern :warn : [ 102.020067] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc1010720
user :notice: [ 102.026590] stress-ng: info: [2914] for a 60.26s run time:
kern :warn : [ 102.028177] RBP: ffa000000db6f9f8 R08: 0000000000000000 R09: 000000000000000a
kern :warn : [ 102.028178] R10: 000000000000000a R11: 0fffffffffffffff R12: ffa000000db6faa0
kern :warn : [ 102.028179] R13: ff11004060da7790 R14: 0000000000000000 R15: 0000000000000001
user :notice: [ 102.040178] stress-ng: info: [2914] 3856.62s available CPU time
kern :warn : [ 102.041662] FS: 0000000000000000(0000) GS:ff11003fc0900000(0000) knlGS:0000000000000000
user :notice: [ 102.044574] stress-ng: info: [2914] 0.14s user time ( 0.00%)
kern :warn : [ 102.051679] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 102.051681] CR2: 000056061f02f700 CR3: 000000407de1c002 CR4: 0000000000771ef0
user :notice: [ 102.060248] stress-ng: info: [2914] 4.63s system time ( 0.12%)
kern :warn : [ 102.065773] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kern :warn : [ 102.081424] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kern :warn : [ 102.081426] PKRU: 55555554
user :notice: [ 102.089981] stress-ng: info: [2914] 4.77s total time ( 0.12%)
kern :warn : [ 102.091444] Call Trace:
kern :warn : [ 102.091446] <TASK>
user :notice: [ 102.099108] stress-ng: info: [2914] load average: 42.09 12.22 4.21
kern :warn : [ 102.107184] ? __warn (kbuild/src/consumer/kernel/panic.c:693)

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240703/202407031556.d271bd4c-oliver.sang@intel.com

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 03a0a4289d943..1da3b1f741300 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2535,48 +2535,6 @@ xfs_agfl_reset(
 	clear_bit(XFS_AGSTATE_AGFL_NEEDS_RESET, &pag->pag_opstate);
 }
 
-/*
- * Defer an AGFL block free. This is effectively equivalent to
- * xfs_free_extent_later() with some special handling particular to AGFL blocks.
- *
- * Deferring AGFL frees helps prevent log reservation overruns due to too many
- * allocation operations in a transaction. AGFL frees are prone to this problem
- * because for one they are always freed one at a time. Further, an immediate
- * AGFL block free can cause a btree join and require another block free before
- * the real allocation can proceed. Deferring the free disconnects freeing up
- * the AGFL slot from freeing the block.
- */
-static int
-xfs_defer_agfl_block(
-	struct xfs_trans		*tp,
-	xfs_agnumber_t			agno,
-	xfs_agblock_t			agbno,
-	struct xfs_owner_info		*oinfo)
-{
-	struct xfs_mount		*mp = tp->t_mountp;
-	struct xfs_extent_free_item	*xefi;
-	xfs_fsblock_t			fsbno = XFS_AGB_TO_FSB(mp, agno, agbno);
-
-	ASSERT(xfs_extfree_item_cache != NULL);
-	ASSERT(oinfo != NULL);
-
-	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, fsbno)))
-		return -EFSCORRUPTED;
-
-	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
-			       GFP_KERNEL | __GFP_NOFAIL);
-	xefi->xefi_startblock = fsbno;
-	xefi->xefi_blockcount = 1;
-	xefi->xefi_owner = oinfo->oi_owner;
-	xefi->xefi_agresv = XFS_AG_RESV_AGFL;
-
-	trace_xfs_agfl_free_defer(mp, xefi);
-
-	xfs_extent_free_get_group(mp, xefi);
-	xfs_defer_add(tp, &xefi->xefi_list, &xfs_agfl_free_defer_type);
-	return 0;
-}
-
 /*
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
@@ -2624,7 +2582,13 @@ xfs_defer_extent_free(
 	trace_xfs_extent_free_defer(mp, xefi);
 
 	xfs_extent_free_get_group(mp, xefi);
-	*dfpp = xfs_defer_add(tp, &xefi->xefi_list, &xfs_extent_free_defer_type);
+
+	if (xefi->xefi_agresv == XFS_AG_RESV_AGFL)
+		*dfpp = xfs_defer_add(tp, &xefi->xefi_list,
+				&xfs_agfl_free_defer_type);
+	else
+		*dfpp = xfs_defer_add(tp, &xefi->xefi_list,
+				&xfs_extent_free_defer_type);
 	return 0;
 }
 
@@ -2882,8 +2846,21 @@ xfs_alloc_fix_freelist(
 		if (error)
 			goto out_agbp_relse;
 
-		/* defer agfl frees */
-		error = xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+		/*
+		 * Defer the AGFL block free.
+		 *
+		 * This helps to prevent log reservation overruns due to too
+		 * many allocation operations in a transaction. AGFL frees are
+		 * prone to this problem because for one they are always freed
+		 * one at a time. Further, an immediate AGFL block free can
+		 * cause a btree join and require another block free before the
+		 * real allocation can proceed.
+		 * Deferring the free disconnects freeing up the AGFL slot from
+		 * freeing the block.
+		 */
+		error = xfs_free_extent_later(tp,
+				XFS_AGB_TO_FSB(mp, args->agno, bno), 1,
+				&targs.oinfo, XFS_AG_RESV_AGFL, 0);
 		if (error)
 			goto out_agbp_relse;
 	}
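
The reported assertion names the condition type != XFS_AG_RESV_AGFL, and this patch starts sending AGFL frees through xfs_free_extent_later() with XFS_AG_RESV_AGFL. One plausible reading, which is an assumption here and not something the report confirms, is that the shared deferral path still carries a precondition written for the time when AGFL frees had their own helper. The following is a minimal userspace sketch of that interaction, not kernel code: every name in it (resv_type, defer_op_type, pick_defer_type, precondition_holds) is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's AG reservation types. */
enum resv_type { RESV_NONE, RESV_AGFL, RESV_METADATA };

/* Hypothetical stand-in for a deferred-operation type descriptor. */
struct defer_op_type {
	const char *name;
};

static const struct defer_op_type extent_free_type = { "extent_free" };
static const struct defer_op_type agfl_free_type = { "agfl_free" };

/*
 * Mirrors the shape of the if/else this patch adds to the shared path:
 * AGFL reservations get the AGFL defer type, everything else gets the
 * regular extent-free defer type.
 */
static const struct defer_op_type *pick_defer_type(enum resv_type resv)
{
	if (resv == RESV_AGFL)
		return &agfl_free_type;
	return &extent_free_type;
}

/*
 * Models a precondition in the shared path. With forbid_agfl set, the
 * check rejects AGFL reservations, which is the ASSUMED state of the
 * shared path when the robot hit the assertion. With it cleared, AGFL
 * reservations pass and reach pick_defer_type().
 */
static bool precondition_holds(enum resv_type resv, bool forbid_agfl)
{
	if (forbid_agfl && resv == RESV_AGFL)
		return false;
	return true;
}
```

In this model, precondition_holds(RESV_AGFL, true) returning false corresponds to the assertion firing before the new dispatch is ever reached. Whether line 2558 actually sits in that shared path should be checked against the tree at the reported commit.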