Message ID | 20250228203722.CAEB63AC@davehans-spike.ostc.intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] filemap: Move prefaulting out of hot write path | expand |
On Fri, 28 Feb 2025 12:37:22 -0800 Dave Hansen <dave.hansen@linux.intel.com> wrote: > There is a generic anti-pattern that shows up in the VFS and several > filesystems where the hot write paths touch userspace twice when they > could get away with doing it once. > > Dave Chinner suggested that they should all be fixed up[1]. I agree[2]. > But, the series to do that fixup spans a bunch of filesystems and a lot > of people. This patch fixes common code that absolutely everyone uses. > It has measurable performance benefits[3]. > > I think this patch can go in and not be held up by the others. I'll queue this in mm-hotfixes with a view to sneaking it into 6.14-rcX as a basis for others to work against.
Hello, kernel test robot noticed a 3.6% improvement of will-it-scale.per_thread_ops on: commit: 391ab5826c820c58d180534a7a727ff5668d4d61 ("[PATCH] [v2] filemap: Move prefaulting out of hot write path") url: https://github.com/intel-lab-lkp/linux/commits/Dave-Hansen/filemap-Move-prefaulting-out-of-hot-write-path/20250301-043921 base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/all/20250228203722.CAEB63AC@davehans-spike.ostc.intel.com/ patch subject: [PATCH] [v2] filemap: Move prefaulting out of hot write path testcase: will-it-scale config: x86_64-rhel-9.4 compiler: gcc-12 test machine: 104 threads 2 sockets (Skylake) with 192G memory parameters: nr_task: 100% mode: thread test: pwrite1 cpufreq_governor: performance Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20250310/202503101621.e0858506-lkp@intel.com ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pwrite1/will-it-scale commit: 3dec9c0e67 ("foo") 391ab5826c ("filemap: Move prefaulting out of hot write path") 3dec9c0e67aaf496 391ab5826c820c58d180534a7a7 ---------------- --------------------------- %stddev %change %stddev \ | \ 182266 ± 3% +9.4% 199333 ± 4% meminfo.DirectMap4k 765.67 ± 8% -22.1% 596.83 ± 9% perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter 17510 ± 6% +19.3% 20889 ± 8% sched_debug.cpu.nr_switches.max 3219 ± 5% +11.2% 3578 ± 2% sched_debug.cpu.nr_switches.stddev 54561715 +3.6% 56543708 will-it-scale.104.threads 524631 +3.6% 543689 will-it-scale.per_thread_ops 54561715 +3.6% 56543708 will-it-scale.workload 1.752e+10 -1.2% 1.731e+10 perf-stat.i.branch-instructions 1.59 +0.0 1.63 perf-stat.i.branch-miss-rate% 3.25 +1.8% 3.31 perf-stat.i.cpi 8.828e+10 -1.5% 8.699e+10 perf-stat.i.instructions 0.31 -1.8% 0.30 perf-stat.i.ipc 1.58 +0.0 1.62 perf-stat.overall.branch-miss-rate% 3.25 +1.8% 3.31 perf-stat.overall.cpi 0.31 -1.7% 0.30 perf-stat.overall.ipc 487316 -4.8% 464012 perf-stat.overall.path-length 1.746e+10 -1.2% 1.725e+10 perf-stat.ps.branch-instructions 8.798e+10 -1.5% 8.67e+10 perf-stat.ps.instructions 2.659e+13 -1.3% 2.624e+13 perf-stat.total.instructions 34.45 -5.9 28.57 perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64 48.17 -5.3 42.87 ± 2% perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite 42.56 -5.3 37.29 ± 2% perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe 51.38 -4.9 46.46 perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite 54.26 -4.5 49.76 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite 62.52 -3.9 58.65 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite 13.30 ± 2% -2.3 10.96 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64 10.17 ± 2% -2.0 8.18 perf-profile.calltrace.cycles-pp.rep_movs_alternative.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write 0.59 ± 4% -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter 0.88 ± 3% -0.1 0.74 ± 4% perf-profile.calltrace.cycles-pp.folio_mark_accessed.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter 0.67 ± 2% -0.1 0.56 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64 99.51 +0.1 99.56 perf-profile.calltrace.cycles-pp.__libc_pwrite 0.64 ± 5% +0.1 0.71 ± 2% perf-profile.calltrace.cycles-pp.fput.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite 0.93 ± 2% +0.1 1.02 perf-profile.calltrace.cycles-pp.folio_unlock.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write 0.72 +0.1 0.82 ± 2% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64_mg.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter 1.20 +0.2 1.38 ± 2% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite 0.84 ± 2% +0.2 1.04 ± 2% perf-profile.calltrace.cycles-pp.noop_dirty_folio.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write 1.48 +0.2 1.72 ± 2% perf-profile.calltrace.cycles-pp.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write 1.00 ± 2% +0.3 1.26 ± 3% perf-profile.calltrace.cycles-pp.folio_mark_dirty.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write 1.99 ± 3% +0.3 2.27 ± 4% perf-profile.calltrace.cycles-pp.fdget.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite 6.69 +0.3 7.02 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__libc_pwrite 2.24 +0.4 2.60 ± 3% perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64 2.78 ± 2% +0.4 3.21 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64 4.34 +0.6 4.92 perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64 12.01 +0.9 12.92 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__libc_pwrite 2.89 ± 6% +1.4 4.26 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_pwrite 15.09 +1.5 16.61 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pwrite 34.58 -5.9 28.70 perf-profile.children.cycles-pp.generic_perform_write 48.24 -5.3 42.94 ± 2% perf-profile.children.cycles-pp.vfs_write 42.95 -5.3 37.66 ± 2% perf-profile.children.cycles-pp.shmem_file_write_iter 51.54 -4.9 46.63 perf-profile.children.cycles-pp.__x64_sys_pwrite64 54.37 -4.5 49.86 perf-profile.children.cycles-pp.do_syscall_64 62.76 -3.9 58.90 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 10.62 ± 2% -2.3 8.30 perf-profile.children.cycles-pp.rep_movs_alternative 13.47 ± 2% -2.3 11.16 perf-profile.children.cycles-pp.copy_page_from_iter_atomic 0.90 ± 3% -0.1 0.76 ± 4% perf-profile.children.cycles-pp.folio_mark_accessed 0.69 ± 2% -0.1 0.60 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags 0.29 -0.0 0.26 ± 2% perf-profile.children.cycles-pp.testcase 0.31 ± 3% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.update_process_times 0.50 -0.0 0.48 perf-profile.children.cycles-pp.rcu_all_qs 99.67 +0.0 99.70 perf-profile.children.cycles-pp.__libc_pwrite 0.64 ± 5% +0.1 0.71 ± 2% perf-profile.children.cycles-pp.fput 0.43 ± 3% +0.1 0.50 ± 2% perf-profile.children.cycles-pp.folio_mapping 0.94 ± 2% +0.1 1.02 perf-profile.children.cycles-pp.folio_unlock 0.74 ± 2% +0.1 0.85 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64_mg 1.23 +0.2 1.41 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode 0.89 +0.2 1.10 ± 2% perf-profile.children.cycles-pp.noop_dirty_folio 1.54 +0.2 1.79 ± 2% perf-profile.children.cycles-pp.current_time 1.99 ± 3% +0.3 2.27 ± 4% perf-profile.children.cycles-pp.fdget 1.08 ± 3% +0.3 1.36 ± 3% perf-profile.children.cycles-pp.folio_mark_dirty 2.30 +0.4 2.66 ± 3% perf-profile.children.cycles-pp.inode_needs_update_time 2.86 ± 2% +0.4 3.30 ± 2% perf-profile.children.cycles-pp.file_update_time 4.58 +0.6 5.17 perf-profile.children.cycles-pp.shmem_write_end 1.73 ± 5% +0.7 2.43 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 12.88 +1.0 13.84 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 6.99 +1.0 7.95 perf-profile.children.cycles-pp.entry_SYSCALL_64 15.22 +1.5 16.75 perf-profile.children.cycles-pp.syscall_return_via_sysret 10.43 ± 2% -2.3 8.08 perf-profile.self.cycles-pp.rep_movs_alternative 3.26 -0.4 2.86 perf-profile.self.cycles-pp.generic_perform_write 2.02 -0.2 1.78 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp 0.87 ± 3% -0.1 0.74 ± 4% perf-profile.self.cycles-pp.folio_mark_accessed 0.53 ± 2% -0.1 0.43 ± 2% perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags 0.25 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.testcase 0.54 +0.0 0.59 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack 0.79 ± 4% +0.0 0.84 ± 2% perf-profile.self.cycles-pp.__x64_sys_pwrite64 0.51 ± 2% +0.1 0.58 ± 2% perf-profile.self.cycles-pp.fput 0.38 ± 3% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.folio_mapping 0.74 ± 2% +0.1 0.82 perf-profile.self.cycles-pp.folio_unlock 0.73 ± 4% +0.1 0.82 ± 4% perf-profile.self.cycles-pp.inode_needs_update_time 0.54 ± 6% +0.1 0.64 ± 3% perf-profile.self.cycles-pp.file_update_time 0.72 ± 2% +0.1 0.82 ± 2% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64_mg 0.79 ± 2% +0.1 0.94 ± 3% perf-profile.self.cycles-pp.current_time 0.98 ± 2% +0.2 1.14 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode 0.85 +0.2 1.04 ± 2% perf-profile.self.cycles-pp.noop_dirty_folio 0.65 ± 3% +0.2 0.85 ± 4% perf-profile.self.cycles-pp.folio_mark_dirty 1.11 ± 6% +0.2 1.32 ± 2% perf-profile.self.cycles-pp.do_syscall_64 1.98 ± 3% +0.3 2.26 ± 4% perf-profile.self.cycles-pp.fdget 2.14 ± 2% +0.4 2.55 ± 3% perf-profile.self.cycles-pp.__libc_pwrite 1.63 +0.6 2.23 ± 3% perf-profile.self.cycles-pp.shmem_write_begin 8.54 +0.7 9.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 6.09 +0.9 7.03 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64 12.75 +1.0 13.70 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 15.20 +1.5 16.72 perf-profile.self.cycles-pp.syscall_return_via_sysret Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
diff -puN mm/filemap.c~generic_perform_write-1 mm/filemap.c --- a/mm/filemap.c~generic_perform_write-1 2025-02-28 11:58:36.499615962 -0800 +++ b/mm/filemap.c 2025-02-28 12:13:27.845549365 -0800 @@ -4170,17 +4170,6 @@ retry: bytes = min(chunk - offset, bytes); balance_dirty_pages_ratelimited(mapping); - /* - * Bring in the user page that we will copy from _first_. - * Otherwise there's a nasty deadlock on copying from the - * same page as we're writing to, without it being marked - * up-to-date. - */ - if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) { - status = -EFAULT; - break; - } - if (fatal_signal_pending(current)) { status = -EINTR; break; @@ -4198,6 +4187,12 @@ retry: if (mapping_writably_mapped(mapping)) flush_dcache_folio(folio); + /* + * Faults here on mmap()s can recurse into arbitrary + * filesystem code. Lots of locks are held that can + * deadlock. Use an atomic copy to avoid deadlocking + * in page fault handling. + */ copied = copy_folio_from_iter_atomic(folio, offset, bytes, i); flush_dcache_folio(folio); @@ -4223,6 +4218,16 @@ retry: bytes = copied; goto retry; } + + /* + * 'folio' is now unlocked and faults on it can be + * handled. Ensure forward progress by trying to + * fault it in now. + */ + if (fault_in_iov_iter_readable(i, bytes) == bytes) { + status = -EFAULT; + break; + } } else { pos += status; written += status;