Message ID | 20250303230409.452687-4-mjguzik@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | some pipe + wait stuff | expand |
On Tue, Mar 04, 2025 at 12:04:09AM +0100, Mateusz Guzik wrote: > In vast majority of cases the condition determining whether the thread > can proceed is true after the first wake up. > > However, even in that case the thread ends up calling into > prepare_to_wait_event() again, suffering a spurious irq + lock trip. > > Then it calls into finish_wait() to unlink itself. > > Note that in case of a pending signal the work done by > prepare_to_wait_event() gets ignored even without the change. > > pre-check the condition after waking up instead. > > Stats gathared during a kernel build: > bpftrace -e 'kprobe:prepare_to_wait_event,kprobe:finish_wait \ > { @[probe] = count(); }' > > @[kprobe:finish_wait]: 392483 > @[kprobe:prepare_to_wait_event]: 778690 > > As in calls to prepare_to_wait_event() almost double calls to > finish_wait(). This evens out with the patch. > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > --- > > One may worry about using "condition" twice. However, macros leading up > to this one already do it, so it should be fine. > > Also one may wonder about fences -- to my understanding going off and on > CPU guarantees a full fence, so the now avoided lock trip changes > nothing. so it always helps if you provide actual numbers. Supposedly this makes it go faster? Also, how much bytes does it add to the image? Anyway, no real objection, but it would be good to have better substantiation etc. > include/linux/wait.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/linux/wait.h b/include/linux/wait.h > index 2bdc8f47963b..965a19809c7e 100644 > --- a/include/linux/wait.h > +++ b/include/linux/wait.h > @@ -316,6 +316,9 @@ extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags); > } \ > \ > cmd; \ > + \ > + if (condition) \ > + break; \ > } \ > finish_wait(&wq_head, &__wq_entry); \ > __out: __ret; \ > -- > 2.43.0 >
On Tue, Mar 4, 2025 at 3:19 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Mar 04, 2025 at 12:04:09AM +0100, Mateusz Guzik wrote: > > In vast majority of cases the condition determining whether the thread > > can proceed is true after the first wake up. > > > > However, even in that case the thread ends up calling into > > prepare_to_wait_event() again, suffering a spurious irq + lock trip. > > > > Then it calls into finish_wait() to unlink itself. > > > > Note that in case of a pending signal the work done by > > prepare_to_wait_event() gets ignored even without the change. > > > > pre-check the condition after waking up instead. > > > > Stats gathared during a kernel build: > > bpftrace -e 'kprobe:prepare_to_wait_event,kprobe:finish_wait \ > > { @[probe] = count(); }' > > > > @[kprobe:finish_wait]: 392483 > > @[kprobe:prepare_to_wait_event]: 778690 > > > > As in calls to prepare_to_wait_event() almost double calls to > > finish_wait(). This evens out with the patch. > > > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > > --- > > > > One may worry about using "condition" twice. However, macros leading up > > to this one already do it, so it should be fine. > > > > Also one may wonder about fences -- to my understanding going off and on > > CPU guarantees a full fence, so the now avoided lock trip changes > > nothing. > > so it always helps if you provide actual numbers. Supposedly this makes > it go faster? > > Also, how much bytes does it add to the image? > > Anyway, no real objection, but it would be good to have better > substantiation etc. > I did not bother benchmarking this per se, but I did demonstrate with bpftrace above that this does avoid the irq and lock usage, which is normally the desired outcome. It is a fair point that in this case this is a tradeoff with i-cache footprint. bloat-o-meter claims: add/remove: 2/2 grow/shrink: 159/5 up/down: 3865/-250 (3615) [..] Total: Before=29946099, After=29949714, chg +0.01% The biggest growth, by a large margin: will_read_block.part - 183 +183 fuse_get_req 800 924 +124 load_module 9137 9255 +118 fuse_do_readfolio 415 528 +113 fuse_page_mkwrite 157 259 +102 On the other hand the shrinkage: port_fops_read 504 503 -1 __ps2_command 1402 1401 -1 fcntl_setlk 961 947 -14 __pfx_fuse_wait_on_folio_writeback 16 - -16 load_compressed_image 4007 3974 -33 uprobe_notify_resume 3384 3323 -61 fuse_wait_on_folio_writeback 124 - -124 The entire thing is attached. -- Mateusz Guzik <mjguzik gmail.com> add/remove: 2/2 grow/shrink: 159/5 up/down: 3865/-250 (3615) Function old new delta will_read_block.part - 183 +183 fuse_get_req 800 924 +124 load_module 9137 9255 +118 fuse_do_readfolio 415 528 +113 fuse_page_mkwrite 157 259 +102 wait_port_writable 545 642 +97 ksm_scan_thread 8189 8270 +81 sg_ioctl 2839 2918 +79 fuse_wait_on_folio_writeback.part 202 280 +78 ring_buffer_wait 431 500 +69 __bio_queue_enter 551 617 +66 drm_wait_vblank_ioctl 1692 1756 +64 blk_queue_enter 490 554 +64 add_transaction_credits 700 762 +62 khugepaged 2197 2256 +59 sg_open 1663 1721 +58 sg_read 1523 1579 +56 fuse_dev_do_read 2405 2459 +54 joydev_read 1106 1158 +52 btrfs_wait_block_group_cache_progress 370 420 +50 syslog_print 709 757 +48 usb_wait_anchor_empty_timeout 235 280 +45 mmu_interval_notifier_remove 412 456 +44 ec_guard 510 554 +44 kcompactd 961 1004 +43 wait_for_tpm_stat 669 711 +42 psi_rtpoll_worker 906 948 +42 anon_pipe_write 1581 1621 +40 ksgxd 357 394 +37 btrfs_commit_transaction 3598 3634 +36 scsi_block_when_processing_errors 200 234 +34 save_compressed_image 2695 2728 +33 virtio_gpu_init 2249 2281 +32 regmap_async_complete.part 413 444 +31 fuse_wait_on_page_writeback 185 216 +31 fanotify_handle_event 6056 6087 +31 crtc_crc_read 718 749 +31 acpi_ec_stop 418 448 +30 fuse_readahead 930 959 +29 pipe_wait_writable 234 262 +28 evdev_read 643 671 +28 ps2_drain 322 349 +27 __tpm_tis_request_locality 466 492 +26 throttle_direct_reclaim 626 651 +25 ps2_do_sendbyte 539 563 +24 drm_atomic_helper_wait_for_vblanks.part 538 562 +24 devkmsg_read 467 491 +24 set_termios.part 637 660 +23 fuse_wait_aborted 252 275 +23 btrfs_caching_ctl_wait_done 177 200 +23 fuse_file_cached_io_open 377 399 +22 btrfs_scrub_pause 233 255 +22 virtballoon_free_page_report 219 240 +21 locks_lock_inode_wait 357 378 +21 usbhid_wait_io 238 258 +20 serio_raw_read 521 541 +20 request_wait_answer 519 539 +20 btrfs_pause_balance 264 284 +20 blk_mq_freeze_queue_wait_timeout 213 233 +20 anon_pipe_read 970 990 +20 shpc_write_cmd 656 675 +19 mousedev_read 457 476 +19 btrfs_commit_transaction_async 253 272 +19 tell_host 226 244 +18 oom_killer_disable 281 299 +18 virtgpu_virtio_get_uuid 200 217 +17 pipe_wait_readable 228 245 +17 async_synchronize_cookie_domain 285 302 +17 jbd2_journal_destroy 768 784 +16 decompress_threadfn 323 339 +16 crc32_threadfn 317 333 +16 compress_threadfn 344 360 +16 __pfx_will_read_block.part - 16 +16 wait_current_trans 296 311 +15 pciehp_sysfs_enable_slot 403 418 +15 pciehp_sysfs_disable_slot 392 407 +15 module_patient_check_exists.isra 313 328 +15 btrfs_balance_delayed_items 436 450 +14 genl_unregister_family 480 493 +13 btrfs_rm_dev_replace_blocked 159 172 +13 pci_dpc_recovered 212 224 +12 jbd2_log_wait_commit 269 281 +12 fuse_sync_fs 569 581 +12 fuse_set_nowrite 212 224 +12 flush_scrub_stripes 615 627 +12 drm_vblank_work_flush_all 217 229 +12 drm_read 678 690 +12 blk_mq_freeze_queue_wait 150 162 +12 rmw_rbio 2673 2684 +11 pseudo_lock_measure_trigger 740 751 +11 percpu_ref_switch_to_atomic_sync 186 197 +11 flush_space 1395 1406 +11 btrfs_bio_counter_inc_blocked 256 267 +11 __usermodehelper_disable 308 319 +11 wait_scrub_stripe_io 138 148 +10 wait_for_device_probe 148 158 +10 usbdev_ioctl 5231 5241 +10 synchronize_rcu_expedited 459 469 +10 serial_core_unregister_port 739 749 +10 pci_doe_wait.constprop 193 203 +10 oom_reaper 973 983 +10 kernfs_drain 305 315 +10 kauditd_thread 677 687 +10 jbd2_journal_lock_updates 210 220 +10 jbd2_journal_load 891 901 +10 iosf_mbi_punit_acquire 184 194 +10 iosf_mbi_block_punit_i2c_access 818 828 +10 fsnotify_destroy_group 213 223 +10 expand_files 667 677 +10 exp_funnel_lock 503 513 +10 dm_wait_event 157 167 +10 cpuset_handle_hotplug 3241 3251 +10 btrfs_wait_on_delayed_iputs 160 170 +10 btrfs_scrub_cancel_dev 255 265 +10 btrfs_scrub_cancel 252 262 +10 btrfs_remount_begin 221 231 +10 btrfs_drew_write_lock 199 209 +10 __synchronize_irq 148 158 +10 __scrub_blocked_if_needed 185 195 +10 virtio_gpu_vram_mmap 484 493 +9 virtio_gpu_get_caps_ioctl 555 564 +9 try_flush_qgroup 275 284 +9 start_this_handle 1243 1252 +9 scrub_raid56_parity_stripe 1328 1337 +9 pcie_wait_cmd 568 577 +9 pci_wait_cfg 170 179 +9 ns_revision_read 394 403 +9 mmu_interval_read_begin 190 199 +9 drm_vblank_work_flush 195 204 +9 dquot_disable 1596 1605 +9 dm_kcopyd_client_destroy 353 362 +9 cpufreq_freq_transition_begin 333 342 +9 cppc_set_perf 885 894 +9 btrfs_cancel_balance 339 348 +9 wake_up_and_wait_for_irq_thread_ready 156 164 +8 squashfs_cache_get 799 807 +8 scrub_rbio_work_locked 2250 2258 +8 rdtgroup_pseudo_lock_create 1725 1733 +8 drm_wait_one_vblank 473 481 +8 close_ctree 1258 1266 +8 btrfs_write_dirty_block_groups 910 918 +8 btrfs_start_ordered_extent_nowriteback 379 387 +8 btrfs_cleanup_transaction.isra 1373 1381 +8 autofs_wait 1763 1771 +8 wait_for_random_bytes 235 242 +7 virtio_gpu_cursor_ping 647 654 +7 usb_poison_urb 176 183 +7 usb_kill_urb.part 149 156 +7 rcu_sync_enter 251 258 +7 io_sqd_handle_event 305 312 +7 btrfs_drew_read_lock 136 143 +7 __vt_event_wait.part 122 129 +7 __percpu_ref_switch_mode 458 465 +7 wb_wait_for_completion 130 136 +6 hib_wait_io 131 137 +6 do_coredump 6667 6673 +6 xfs_pwork_poll 186 191 +5 virtio_gpu_queue_ctrl_sgs 648 653 +5 tty_wait_until_sent 429 434 +5 submit_read_wait_bio_list 373 377 +4 wait_for_commit 401 402 +1 port_fops_read 504 503 -1 __ps2_command 1402 1401 -1 fcntl_setlk 961 947 -14 __pfx_fuse_wait_on_folio_writeback 16 - -16 load_compressed_image 4007 3974 -33 uprobe_notify_resume 3384 3323 -61 fuse_wait_on_folio_writeback 124 - -124 Total: Before=29946099, After=29949714, chg +0.01%
diff --git a/include/linux/wait.h b/include/linux/wait.h index 2bdc8f47963b..965a19809c7e 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -316,6 +316,9 @@ extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags); } \ \ cmd; \ + \ + if (condition) \ + break; \ } \ finish_wait(&wq_head, &__wq_entry); \ __out: __ret; \
In vast majority of cases the condition determining whether the thread can proceed is true after the first wake up. However, even in that case the thread ends up calling into prepare_to_wait_event() again, suffering a spurious irq + lock trip. Then it calls into finish_wait() to unlink itself. Note that in case of a pending signal the work done by prepare_to_wait_event() gets ignored even without the change. pre-check the condition after waking up instead. Stats gathared during a kernel build: bpftrace -e 'kprobe:prepare_to_wait_event,kprobe:finish_wait \ { @[probe] = count(); }' @[kprobe:finish_wait]: 392483 @[kprobe:prepare_to_wait_event]: 778690 As in calls to prepare_to_wait_event() almost double calls to finish_wait(). This evens out with the patch. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- One may worry about using "condition" twice. However, macros leading up to this one already do it, so it should be fine. Also one may wonder about fences -- to my understanding going off and on CPU guarantees a full fence, so the now avoided lock trip changes nothing. include/linux/wait.h | 3 +++ 1 file changed, 3 insertions(+)