Message ID | 202003281643.02SGhBrh000992@sdf.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | None | expand |
On Sun, Mar 29, 2020 at 09:52:23AM +0200, Takashi Iwai wrote: > On Thu, 22 Aug 2019 02:25:10 +0200, George Spelvin wrote: >> Likewise, "msecs_to_jiffies(seconds * 1000)" is more >> conveniently written "seconds * HZ". > > I thought the compiler already optimizes to the constant calculation > for the above case? It optimizes that if the entire argument, including "seconds", is a compile-time constant. However, given "msecs_to_jiffies(hdev->rpa_timeout * 1000);", the computatin is non-trivial.
On Sun, 29 Mar 2020 14:11:29 +0200, George Spelvin wrote: > > On Sun, Mar 29, 2020 at 09:52:23AM +0200, Takashi Iwai wrote: > > On Thu, 22 Aug 2019 02:25:10 +0200, George Spelvin wrote: > >> Likewise, "msecs_to_jiffies(seconds * 1000)" is more > >> conveniently written "seconds * HZ". > > > > I thought the compiler already optimizes to the constant calculation > > for the above case? > > It optimizes that if the entire argument, including "seconds", is > a compile-time constant. > > However, given "msecs_to_jiffies(hdev->rpa_timeout * 1000);", > the computatin is non-trivial. Fair enough. But it's still a question whether an open code X * HZ is good at all... thanks, Takashi
On Sun, Mar 29, 2020 at 07:13:33PM +0200, Takashi Iwai wrote: > On Sun, 29 Mar 2020 14:11:29 +0200, George Spelvin wrote: >> On Sun, Mar 29, 2020 at 09:52:23AM +0200, Takashi Iwai wrote: >>> I thought the compiler already optimizes to the constant calculation >>> for the above case? >> >> It optimizes that if the entire argument, including "seconds", is >> a compile-time constant. >> >> However, given "msecs_to_jiffies(hdev->rpa_timeout * 1000);", >> the computatin is non-trivial. > > Fair enough. But it's still a question whether an open code X * HZ is > good at all... I'm sorry, I don't understand what you mean by "good at all" here. The value computed is exactly the same. msecs_to_jiffies(x) is basically (x * HZ + 999) / 1000, so msecs_to_jiffies(s * 1000) = (s * 1000 * HZ + 999) / 1000 = s * HZ + 999/1000 = s * HZ.
On Sun, 2020-03-29 at 17:50 +0000, George Spelvin wrote: > On Sun, Mar 29, 2020 at 07:13:33PM +0200, Takashi Iwai wrote: > > On Sun, 29 Mar 2020 14:11:29 +0200, George Spelvin wrote: > > > On Sun, Mar 29, 2020 at 09:52:23AM +0200, Takashi Iwai wrote: > > > > I thought the compiler already optimizes to the constant > > > > calculation > > > > for the above case? > > > > > > It optimizes that if the entire argument, including "seconds", is > > > a compile-time constant. > > > > > > However, given "msecs_to_jiffies(hdev->rpa_timeout * 1000);", > > > the computatin is non-trivial. > > > > Fair enough. But it's still a question whether an open code X * HZ > > is > > good at all... > > I'm sorry, I don't understand what you mean by "good at all" here. > The value computed is exactly the same. I think he means what the compiler does with it. We all assume that msecs_to_jiffies is properly optimized so there should be no need to open code it like you're proposing. So firstly can you produce the assembly that shows the worse output from msecs_to_jiffies? If there is a problem, then we should be fixing it in msecs_to_jiffies, not adding open coding. James
On Sun, Mar 29, 2020 at 11:16:47AM -0700, James Bottomley wrote: > On Sun, 2020-03-29 at 17:50 +0000, George Spelvin wrote: >> On Sun, Mar 29, 2020 at 07:13:33PM +0200, Takashi Iwai wrote: >>> Fair enough. But it's still a question whether an open code X * HZ >>> is good at all... >> >> I'm sorry, I don't understand what you mean by "good at all" here. >> The value computed is exactly the same. > > I think he means what the compiler does with it. > > We all assume that msecs_to_jiffies is properly optimized so there > should be no need to open code it like you're proposing. So firstly > can you produce the assembly that shows the worse output from > msecs_to_jiffies? If there is a problem, then we should be fixing it > in msecs_to_jiffies, not adding open coding. Fair enough. For the two alternative functions: #include <linux/jiffies.h> unsigned long timeout1(unsigned seconds) { return jiffies + msecs_to_jiffies(seconds * 1000); } unsigned long timeout2(unsigned seconds) { return jiffies + seconds * HZ; } compiled with Kbuild, the object code produced is: Disassembly of section .text: 0000000000000000 <timeout1>: 0: 69 ff e8 03 00 00 imul $0x3e8,%edi,%edi 6: e8 00 00 00 00 callq b <timeout1+0xb> 7: R_X86_64_PLT32 __msecs_to_jiffies-0x4 b: 49 89 c0 mov %rax,%r8 e: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 15 <timeout1+0x15> 11: R_X86_64_PC32 jiffies-0x4 15: 4c 01 c0 add %r8,%rax 18: c3 retq 19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 0000000000000020 <timeout2>: 20: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 27 <timeout2+0x7> 23: R_X86_64_PC32 jiffies-0x4 27: 69 c7 2c 01 00 00 imul $0x12c,%edi,%eax 2d: 48 01 d0 add %rdx,%rax 30: c3 retq This is the type of code I replaced: code where the number of seconds is not known at compile time. Notice how the first multiplies by 1000 and then calls __msecs_to_jiffies. The second multiplies by 300 and makes no function call. __msecs_to_jiffies (from kernel/time/time.o) is: 0000000000000100 <__msecs_to_jiffies>: 100: 48 b8 fe ff ff ff ff movabs $0x3ffffffffffffffe,%rax 107: ff ff 3f 10a: 85 ff test %edi,%edi 10c: 78 1c js 12a <__msecs_to_jiffies+0x2a> 10e: b8 9a 99 99 99 mov $0x9999999a,%eax 113: 89 ff mov %edi,%edi 115: 48 0f af f8 imul %rax,%rdi 119: 48 b8 cc cc cc cc 01 movabs $0x1cccccccc,%rax 120: 00 00 00 123: 48 01 f8 add %rdi,%rax 126: 48 c1 e8 21 shr $0x21,%rax 12a: c3 retq 12b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) I didn't try to replace code that uses compile-time constant arguments such as include/linux/ceph/libceph.h.
On Sun, 29 Mar 2020 20:16:47 +0200, James Bottomley wrote: > > On Sun, 2020-03-29 at 17:50 +0000, George Spelvin wrote: > > On Sun, Mar 29, 2020 at 07:13:33PM +0200, Takashi Iwai wrote: > > > On Sun, 29 Mar 2020 14:11:29 +0200, George Spelvin wrote: > > > > On Sun, Mar 29, 2020 at 09:52:23AM +0200, Takashi Iwai wrote: > > > > > I thought the compiler already optimizes to the constant > > > > > calculation > > > > > for the above case? > > > > > > > > It optimizes that if the entire argument, including "seconds", is > > > > a compile-time constant. > > > > > > > > However, given "msecs_to_jiffies(hdev->rpa_timeout * 1000);", > > > > the computatin is non-trivial. > > > > > > Fair enough. But it's still a question whether an open code X * HZ > > > is > > > good at all... > > > > I'm sorry, I don't understand what you mean by "good at all" here. > > The value computed is exactly the same. > > I think he means what the compiler does with it. > > We all assume that msecs_to_jiffies is properly optimized so there > should be no need to open code it like you're proposing. Yes, it'd be best if the compiler can handle it properly. But also I meant to keep using the macro for consistency reason. IIRC, we wanted to eliminate the explicit use of HZ in the past, and it's how many lines have been converted with *_to_jiffies() calls. I don't know whether the eliminate of HZ is still wished, but reverting to the open code is a step backward for that. thanks, Takashi
On Mon, Mar 30, 2020 at 08:27:01AM +0200, Takashi Iwai wrote: > On Sun, 29 Mar 2020 20:16:47 +0200, James Bottomley wrote: >> We all assume that msecs_to_jiffies is properly optimized so there >> should be no need to open code it like you're proposing. > > Yes, it'd be best if the compiler can handle it properly. I've tried, and can't figure out how to get the compiler to detect this special case and not invoke the general code. In particular, for a variable x, __builtin_constant_p(x * 1000 % 1000) is false. Even if x is signed and ANSI lets the compiler assume that overflow doesn't happen. If you can do it, I'm most curious how! > But also I meant to keep using the macro for consistency reason. > IIRC, we wanted to eliminate the explicit use of HZ in the past, and > it's how many lines have been converted with *_to_jiffies() calls. > I don't know whether the eliminate of HZ is still wished, but > reverting to the open code is a step backward for that. Well, you could always add a secs_to_jiffies(x) wrapper. But given that it expands to basically x * HZ, some people might wonder why you're bothering. I assumed that open-coding x * HZ was the preferred style, so that's what I did.
On Mon, 30 Mar 2020 08:51:05 +0200, George Spelvin wrote: > > On Mon, Mar 30, 2020 at 08:27:01AM +0200, Takashi Iwai wrote: > > On Sun, 29 Mar 2020 20:16:47 +0200, James Bottomley wrote: > >> We all assume that msecs_to_jiffies is properly optimized so there > >> should be no need to open code it like you're proposing. > > > > Yes, it'd be best if the compiler can handle it properly. > > I've tried, and can't figure out how to get the compiler to detect this > special case and not invoke the general code. In particular, for a > variable x, __builtin_constant_p(x * 1000 % 1000) is false. Even if x is > signed and ANSI lets the compiler assume that overflow doesn't happen. > > If you can do it, I'm most curious how! Actually in the very early version of msecs_to_jiffies() was all inlined, so the compiler could optimize such a case, I guess. Now it was factored out to an external function in commit ca42aaf0c861, so it became difficult. > > But also I meant to keep using the macro for consistency reason. > > IIRC, we wanted to eliminate the explicit use of HZ in the past, and > > it's how many lines have been converted with *_to_jiffies() calls. > > I don't know whether the eliminate of HZ is still wished, but > > reverting to the open code is a step backward for that. > > Well, you could always add a secs_to_jiffies(x) wrapper. But given > that it expands to basically x * HZ, some people might wonder why > you're bothering. Well, comparing with the expanded result doesn't make always sense. With such a logic, you can argue why BIT(x) macro is needed, too. After all, it's a matter of semantics. > I assumed that open-coding x * HZ was the preferred style, so that's > what I did. That's my question, too -- whether the open code is preferred for this particular purpose. thanks, Takashi
diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c index 1791a393795da..9c530f8827518 100644 --- a/drivers/scsi/fcoe/fcoe_ctlr.c +++ b/drivers/scsi/fcoe/fcoe_ctlr.c @@ -2238,10 +2238,10 @@ static void fcoe_ctlr_vn_restart(struct fcoe_ctlr *fip) if (fip->probe_tries < FIP_VN_RLIM_COUNT) { fip->probe_tries++; - wait = prandom_u32() % FIP_VN_PROBE_WAIT; + wait = prandom_u32_max(msecs_to_jiffies(FIP_VN_PROBE_WAIT)); } else - wait = FIP_VN_RLIM_INT; - mod_timer(&fip->timer, jiffies + msecs_to_jiffies(wait)); + wait = msecs_to_jiffies(FIP_VN_RLIM_INT); + mod_timer(&fip->timer, jiffies + wait); fcoe_ctlr_set_state(fip, FIP_ST_VNMP_START); } @@ -3132,8 +3132,8 @@ static void fcoe_ctlr_vn_timeout(struct fcoe_ctlr *fip) fcoe_ctlr_vn_send(fip, FIP_SC_VN_BEACON, fcoe_all_vn2vn, 0); fip->port_ka_time = jiffies + - msecs_to_jiffies(FIP_VN_BEACON_INT + - (prandom_u32() % FIP_VN_BEACON_FUZZ)); + msecs_to_jiffies(FIP_VN_BEACON_INT) + + prandom_u32_max(msecs_to_jiffies(FIP_VN_BEACON_FUZZ)); } if (time_before(fip->port_ka_time, next_time)) next_time = fip->port_ka_time; diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 46c5cd9f8019e..9efd96e91d53e 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -288,7 +288,7 @@ batadv_iv_ogm_emit_send_time(const struct batadv_priv *bat_priv) /* when do we schedule a ogm packet to be sent */ static unsigned long batadv_iv_ogm_fwd_send_time(void) { - return jiffies + msecs_to_jiffies(prandom_u32() % (BATADV_JITTER / 2)); + return jiffies + prandom_u32_max(msecs_to_jiffies(BATADV_JITTER / 2)); } /* apply hop penalty for a normal link */ diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c index 411ce5fc6492f..61fa742ff5501 100644 --- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -85,12 +85,12 @@ struct batadv_orig_node *batadv_v_ogm_orig_get(struct batadv_priv *bat_priv, */ static void batadv_v_ogm_start_queue_timer(struct batadv_hard_iface *hard_iface) { - unsigned int msecs = BATADV_MAX_AGGREGATION_MS * 1000; + unsigned int delay = msecs_to_jiffies(BATADV_MAX_AGGREGATION_MS); - /* msecs * [0.9, 1.1] */ - msecs += prandom_u32() % (msecs / 5) - (msecs / 10); + /* delay * [0.9, 1.1] */ + delay += prandom_u32_max(delay / 5) - (delay / 10); queue_delayed_work(batadv_event_workqueue, &hard_iface->bat_v.aggr_wq, - msecs_to_jiffies(msecs / 1000)); + delay); } /** diff --git a/net/bluetooth/hci_request.c b/net/bluetooth/hci_request.c index 2a1b64dbf76e4..8b46e23b4abe7 100644 --- a/net/bluetooth/hci_request.c +++ b/net/bluetooth/hci_request.c @@ -1505,7 +1505,7 @@ int hci_get_random_address(struct hci_dev *hdev, bool require_privacy, bacpy(rand_addr, &hdev->rpa); - to = msecs_to_jiffies(hdev->rpa_timeout * 1000); + to = hdev->rpa_timeout * HZ; if (adv_instance) queue_delayed_work(hdev->workqueue, &adv_instance->rpa_expired_cb, to); diff --git a/net/wireless/scan.c b/net/wireless/scan.c index aef240fdf8df6..b6856cbb68d3b 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -700,7 +700,7 @@ void cfg80211_bss_age(struct cfg80211_registered_device *rdev, unsigned long age_secs) { struct cfg80211_internal_bss *bss; - unsigned long age_jiffies = msecs_to_jiffies(age_secs * MSEC_PER_SEC); + unsigned long age_jiffies = age_secs * HZ; spin_lock_bh(&rdev->bss_lock); list_for_each_entry(bss, &rdev->bss_list, list) diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c index 2236b5e0c1f25..8a2bf333200c1 100644 --- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -1839,7 +1839,7 @@ static int wait_for_avail(struct snd_pcm_substream *substream, runtime->rate; wait_time = max(t, wait_time); } - wait_time = msecs_to_jiffies(wait_time * 1000); + wait_time *= HZ; } } diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index df40d38f6e290..1ea763f9f956d 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -1937,7 +1937,7 @@ static int snd_pcm_drain(struct snd_pcm_substream *substream, long t = runtime->period_size * 2 / runtime->rate; tout = max(t, tout); } - tout = msecs_to_jiffies(tout * 1000); + tout *= HZ; } tout = schedule_timeout(tout);
Rather than generating a random number of milliseconds in a constant range and then converting to jiffies, convert the range to jiffies (evaluated at compile time) and choose a random number of jiffies in that range. Likewise, "msecs_to_jiffies(seconds * 1000)" is more conveniently written "seconds * HZ". Signed-off-by: George Spelvin <lkml@sdf.org> Cc: Hannes Reinecke <hare@suse.de> Cc: linux-scsi@vger.kernel.org Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: Sven Eckelmann <sven@narfation.org> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-wireless@vger.kernel.org Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: alsa-devel@alsa-project.org --- drivers/scsi/fcoe/fcoe_ctlr.c | 10 +++++----- net/batman-adv/bat_iv_ogm.c | 2 +- net/batman-adv/bat_v_ogm.c | 8 ++++---- net/bluetooth/hci_request.c | 2 +- net/wireless/scan.c | 2 +- sound/core/pcm_lib.c | 2 +- sound/core/pcm_native.c | 2 +- 7 files changed, 14 insertions(+), 14 deletions(-)