diff mbox series

[3/3] wait: avoid spurious calls to prepare_to_wait_event() in ___wait_event()

Message ID 20250303230409.452687-4-mjguzik@gmail.com (mailing list archive)
State New
Headers show
Series some pipe + wait stuff | expand

Commit Message

Mateusz Guzik March 3, 2025, 11:04 p.m. UTC
In vast majority of cases the condition determining whether the thread
can proceed is true after the first wake up.

However, even in that case the thread ends up calling into
prepare_to_wait_event() again, suffering a spurious irq + lock trip.

Then it calls into finish_wait() to unlink itself.

Note that in case of a pending signal the work done by
prepare_to_wait_event() gets ignored even without the change.

pre-check the condition after waking up instead.

Stats gathared during a kernel build:
bpftrace -e 'kprobe:prepare_to_wait_event,kprobe:finish_wait \
		 { @[probe] = count(); }'

@[kprobe:finish_wait]: 392483
@[kprobe:prepare_to_wait_event]: 778690

As in calls to prepare_to_wait_event() almost double calls to
finish_wait(). This evens out with the patch.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

One may worry about using "condition" twice. However, macros leading up
to this one already do it, so it should be fine.

Also one may wonder about fences -- to my understanding going off and on
CPU guarantees a full fence, so the now avoided lock trip changes
nothing.

 include/linux/wait.h | 3 +++
 1 file changed, 3 insertions(+)

Comments

Peter Zijlstra March 4, 2025, 2:19 p.m. UTC | #1
On Tue, Mar 04, 2025 at 12:04:09AM +0100, Mateusz Guzik wrote:
> In vast majority of cases the condition determining whether the thread
> can proceed is true after the first wake up.
> 
> However, even in that case the thread ends up calling into
> prepare_to_wait_event() again, suffering a spurious irq + lock trip.
> 
> Then it calls into finish_wait() to unlink itself.
> 
> Note that in case of a pending signal the work done by
> prepare_to_wait_event() gets ignored even without the change.
> 
> pre-check the condition after waking up instead.
> 
> Stats gathared during a kernel build:
> bpftrace -e 'kprobe:prepare_to_wait_event,kprobe:finish_wait \
> 		 { @[probe] = count(); }'
> 
> @[kprobe:finish_wait]: 392483
> @[kprobe:prepare_to_wait_event]: 778690
> 
> As in calls to prepare_to_wait_event() almost double calls to
> finish_wait(). This evens out with the patch.
> 
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> ---
> 
> One may worry about using "condition" twice. However, macros leading up
> to this one already do it, so it should be fine.
> 
> Also one may wonder about fences -- to my understanding going off and on
> CPU guarantees a full fence, so the now avoided lock trip changes
> nothing.

so it always helps if you provide actual numbers. Supposedly this makes
it go faster?

Also, how much bytes does it add to the image?

Anyway, no real objection, but it would be good to have better
substantiation etc.

>  include/linux/wait.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index 2bdc8f47963b..965a19809c7e 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -316,6 +316,9 @@ extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags);
>  		}								\
>  										\
>  		cmd;								\
> +										\
> +		if (condition)							\
> +			break;							\
>  	}									\
>  	finish_wait(&wq_head, &__wq_entry);					\
>  __out:	__ret;									\
> -- 
> 2.43.0
>
Mateusz Guzik March 4, 2025, 3:25 p.m. UTC | #2
On Tue, Mar 4, 2025 at 3:19 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 04, 2025 at 12:04:09AM +0100, Mateusz Guzik wrote:
> > In vast majority of cases the condition determining whether the thread
> > can proceed is true after the first wake up.
> >
> > However, even in that case the thread ends up calling into
> > prepare_to_wait_event() again, suffering a spurious irq + lock trip.
> >
> > Then it calls into finish_wait() to unlink itself.
> >
> > Note that in case of a pending signal the work done by
> > prepare_to_wait_event() gets ignored even without the change.
> >
> > pre-check the condition after waking up instead.
> >
> > Stats gathared during a kernel build:
> > bpftrace -e 'kprobe:prepare_to_wait_event,kprobe:finish_wait \
> >                { @[probe] = count(); }'
> >
> > @[kprobe:finish_wait]: 392483
> > @[kprobe:prepare_to_wait_event]: 778690
> >
> > As in calls to prepare_to_wait_event() almost double calls to
> > finish_wait(). This evens out with the patch.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> > ---
> >
> > One may worry about using "condition" twice. However, macros leading up
> > to this one already do it, so it should be fine.
> >
> > Also one may wonder about fences -- to my understanding going off and on
> > CPU guarantees a full fence, so the now avoided lock trip changes
> > nothing.
>
> so it always helps if you provide actual numbers. Supposedly this makes
> it go faster?
>
> Also, how much bytes does it add to the image?
>
> Anyway, no real objection, but it would be good to have better
> substantiation etc.
>

I did not bother benchmarking this per se, but I did demonstrate with
bpftrace above that this does avoid the irq and lock usage, which is
normally the desired outcome.

It is a fair point that in this case this is a tradeoff with i-cache footprint.

bloat-o-meter claims:
add/remove: 2/2 grow/shrink: 159/5 up/down: 3865/-250 (3615)
[..]
Total: Before=29946099, After=29949714, chg +0.01%

The biggest growth, by a large margin:
will_read_block.part                           -     183    +183
fuse_get_req                                 800     924    +124
load_module                                 9137    9255    +118
fuse_do_readfolio                            415     528    +113
fuse_page_mkwrite                            157     259    +102

On the other hand the shrinkage:
port_fops_read                               504     503      -1
__ps2_command                               1402    1401      -1
fcntl_setlk                                  961     947     -14
__pfx_fuse_wait_on_folio_writeback            16       -     -16
load_compressed_image                       4007    3974     -33
uprobe_notify_resume                        3384    3323     -61
fuse_wait_on_folio_writeback                 124       -    -124

The entire thing is attached.


--
Mateusz Guzik <mjguzik gmail.com>
add/remove: 2/2 grow/shrink: 159/5 up/down: 3865/-250 (3615)
Function                                     old     new   delta
will_read_block.part                           -     183    +183
fuse_get_req                                 800     924    +124
load_module                                 9137    9255    +118
fuse_do_readfolio                            415     528    +113
fuse_page_mkwrite                            157     259    +102
wait_port_writable                           545     642     +97
ksm_scan_thread                             8189    8270     +81
sg_ioctl                                    2839    2918     +79
fuse_wait_on_folio_writeback.part            202     280     +78
ring_buffer_wait                             431     500     +69
__bio_queue_enter                            551     617     +66
drm_wait_vblank_ioctl                       1692    1756     +64
blk_queue_enter                              490     554     +64
add_transaction_credits                      700     762     +62
khugepaged                                  2197    2256     +59
sg_open                                     1663    1721     +58
sg_read                                     1523    1579     +56
fuse_dev_do_read                            2405    2459     +54
joydev_read                                 1106    1158     +52
btrfs_wait_block_group_cache_progress        370     420     +50
syslog_print                                 709     757     +48
usb_wait_anchor_empty_timeout                235     280     +45
mmu_interval_notifier_remove                 412     456     +44
ec_guard                                     510     554     +44
kcompactd                                    961    1004     +43
wait_for_tpm_stat                            669     711     +42
psi_rtpoll_worker                            906     948     +42
anon_pipe_write                             1581    1621     +40
ksgxd                                        357     394     +37
btrfs_commit_transaction                    3598    3634     +36
scsi_block_when_processing_errors            200     234     +34
save_compressed_image                       2695    2728     +33
virtio_gpu_init                             2249    2281     +32
regmap_async_complete.part                   413     444     +31
fuse_wait_on_page_writeback                  185     216     +31
fanotify_handle_event                       6056    6087     +31
crtc_crc_read                                718     749     +31
acpi_ec_stop                                 418     448     +30
fuse_readahead                               930     959     +29
pipe_wait_writable                           234     262     +28
evdev_read                                   643     671     +28
ps2_drain                                    322     349     +27
__tpm_tis_request_locality                   466     492     +26
throttle_direct_reclaim                      626     651     +25
ps2_do_sendbyte                              539     563     +24
drm_atomic_helper_wait_for_vblanks.part      538     562     +24
devkmsg_read                                 467     491     +24
set_termios.part                             637     660     +23
fuse_wait_aborted                            252     275     +23
btrfs_caching_ctl_wait_done                  177     200     +23
fuse_file_cached_io_open                     377     399     +22
btrfs_scrub_pause                            233     255     +22
virtballoon_free_page_report                 219     240     +21
locks_lock_inode_wait                        357     378     +21
usbhid_wait_io                               238     258     +20
serio_raw_read                               521     541     +20
request_wait_answer                          519     539     +20
btrfs_pause_balance                          264     284     +20
blk_mq_freeze_queue_wait_timeout             213     233     +20
anon_pipe_read                               970     990     +20
shpc_write_cmd                               656     675     +19
mousedev_read                                457     476     +19
btrfs_commit_transaction_async               253     272     +19
tell_host                                    226     244     +18
oom_killer_disable                           281     299     +18
virtgpu_virtio_get_uuid                      200     217     +17
pipe_wait_readable                           228     245     +17
async_synchronize_cookie_domain              285     302     +17
jbd2_journal_destroy                         768     784     +16
decompress_threadfn                          323     339     +16
crc32_threadfn                               317     333     +16
compress_threadfn                            344     360     +16
__pfx_will_read_block.part                     -      16     +16
wait_current_trans                           296     311     +15
pciehp_sysfs_enable_slot                     403     418     +15
pciehp_sysfs_disable_slot                    392     407     +15
module_patient_check_exists.isra             313     328     +15
btrfs_balance_delayed_items                  436     450     +14
genl_unregister_family                       480     493     +13
btrfs_rm_dev_replace_blocked                 159     172     +13
pci_dpc_recovered                            212     224     +12
jbd2_log_wait_commit                         269     281     +12
fuse_sync_fs                                 569     581     +12
fuse_set_nowrite                             212     224     +12
flush_scrub_stripes                          615     627     +12
drm_vblank_work_flush_all                    217     229     +12
drm_read                                     678     690     +12
blk_mq_freeze_queue_wait                     150     162     +12
rmw_rbio                                    2673    2684     +11
pseudo_lock_measure_trigger                  740     751     +11
percpu_ref_switch_to_atomic_sync             186     197     +11
flush_space                                 1395    1406     +11
btrfs_bio_counter_inc_blocked                256     267     +11
__usermodehelper_disable                     308     319     +11
wait_scrub_stripe_io                         138     148     +10
wait_for_device_probe                        148     158     +10
usbdev_ioctl                                5231    5241     +10
synchronize_rcu_expedited                    459     469     +10
serial_core_unregister_port                  739     749     +10
pci_doe_wait.constprop                       193     203     +10
oom_reaper                                   973     983     +10
kernfs_drain                                 305     315     +10
kauditd_thread                               677     687     +10
jbd2_journal_lock_updates                    210     220     +10
jbd2_journal_load                            891     901     +10
iosf_mbi_punit_acquire                       184     194     +10
iosf_mbi_block_punit_i2c_access              818     828     +10
fsnotify_destroy_group                       213     223     +10
expand_files                                 667     677     +10
exp_funnel_lock                              503     513     +10
dm_wait_event                                157     167     +10
cpuset_handle_hotplug                       3241    3251     +10
btrfs_wait_on_delayed_iputs                  160     170     +10
btrfs_scrub_cancel_dev                       255     265     +10
btrfs_scrub_cancel                           252     262     +10
btrfs_remount_begin                          221     231     +10
btrfs_drew_write_lock                        199     209     +10
__synchronize_irq                            148     158     +10
__scrub_blocked_if_needed                    185     195     +10
virtio_gpu_vram_mmap                         484     493      +9
virtio_gpu_get_caps_ioctl                    555     564      +9
try_flush_qgroup                             275     284      +9
start_this_handle                           1243    1252      +9
scrub_raid56_parity_stripe                  1328    1337      +9
pcie_wait_cmd                                568     577      +9
pci_wait_cfg                                 170     179      +9
ns_revision_read                             394     403      +9
mmu_interval_read_begin                      190     199      +9
drm_vblank_work_flush                        195     204      +9
dquot_disable                               1596    1605      +9
dm_kcopyd_client_destroy                     353     362      +9
cpufreq_freq_transition_begin                333     342      +9
cppc_set_perf                                885     894      +9
btrfs_cancel_balance                         339     348      +9
wake_up_and_wait_for_irq_thread_ready        156     164      +8
squashfs_cache_get                           799     807      +8
scrub_rbio_work_locked                      2250    2258      +8
rdtgroup_pseudo_lock_create                 1725    1733      +8
drm_wait_one_vblank                          473     481      +8
close_ctree                                 1258    1266      +8
btrfs_write_dirty_block_groups               910     918      +8
btrfs_start_ordered_extent_nowriteback       379     387      +8
btrfs_cleanup_transaction.isra              1373    1381      +8
autofs_wait                                 1763    1771      +8
wait_for_random_bytes                        235     242      +7
virtio_gpu_cursor_ping                       647     654      +7
usb_poison_urb                               176     183      +7
usb_kill_urb.part                            149     156      +7
rcu_sync_enter                               251     258      +7
io_sqd_handle_event                          305     312      +7
btrfs_drew_read_lock                         136     143      +7
__vt_event_wait.part                         122     129      +7
__percpu_ref_switch_mode                     458     465      +7
wb_wait_for_completion                       130     136      +6
hib_wait_io                                  131     137      +6
do_coredump                                 6667    6673      +6
xfs_pwork_poll                               186     191      +5
virtio_gpu_queue_ctrl_sgs                    648     653      +5
tty_wait_until_sent                          429     434      +5
submit_read_wait_bio_list                    373     377      +4
wait_for_commit                              401     402      +1
port_fops_read                               504     503      -1
__ps2_command                               1402    1401      -1
fcntl_setlk                                  961     947     -14
__pfx_fuse_wait_on_folio_writeback            16       -     -16
load_compressed_image                       4007    3974     -33
uprobe_notify_resume                        3384    3323     -61
fuse_wait_on_folio_writeback                 124       -    -124
Total: Before=29946099, After=29949714, chg +0.01%
diff mbox series

Patch

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 2bdc8f47963b..965a19809c7e 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -316,6 +316,9 @@  extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags);
 		}								\
 										\
 		cmd;								\
+										\
+		if (condition)							\
+			break;							\
 	}									\
 	finish_wait(&wq_head, &__wq_entry);					\
 __out:	__ret;									\