Message ID | cover.1675586554.git.wqu@suse.com |
---|---|
Series | btrfs: reduce div64 calls for __btrfs_map_block() and its variants |
On Sun, Feb 05, 2023 at 04:53:40PM +0800, Qu Wenruo wrote:
> Div64 is much slower than 32-bit division, and has only improved in
> the most recent CPUs; that's why we have dedicated div64* helpers.

I think that's a little too simple. All 64-bit CPUs can do 64-bit divisions, of course. 32-bit CPUs usually can't do it in a single instruction, and the helpers exist to avoid the libgcc calls.

That should not stand against any cleanups and optimizations, of course.
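To make the libgcc point concrete, here is a minimal userspace sketch; it is not taken from the series. It assumes a fixed 64 KiB stripe length, and `STRIPE_LEN_SHIFT`, `stripe_nr_div()` and `stripe_nr_shift()` are hypothetical names. When the divisor is only known at run time, `/` and `%` on `uint64_t` force a real 64-bit division, which a 32-bit target typically lowers to libgcc calls such as `__udivdi3`/`__umoddi3`; that is what the kernel's div64 helpers exist to avoid. When the divisor is a known power of two, a shift and a mask produce the same quotient and remainder with no division at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: assume a fixed 64 KiB stripe length, i.e. 1 << 16. */
#define STRIPE_LEN_SHIFT 16
#define STRIPE_LEN ((uint64_t)1 << STRIPE_LEN_SHIFT)

/*
 * Generic version: stripe_len is only known at run time, so the compiler
 * must emit a real 64-bit divide and modulo. On a 32-bit target these
 * typically become libgcc calls (__udivdi3 / __umoddi3).
 */
static uint64_t stripe_nr_div(uint64_t offset, uint64_t stripe_len,
			      uint64_t *stripe_offset)
{
	assert(stripe_len != 0);
	*stripe_offset = offset % stripe_len;
	return offset / stripe_len;
}

/*
 * Power-of-two version: the same quotient and remainder via a shift and
 * a mask, with no division instruction or helper call on any target.
 */
static uint64_t stripe_nr_shift(uint64_t offset, uint64_t *stripe_offset)
{
	*stripe_offset = offset & (STRIPE_LEN - 1);
	return offset >> STRIPE_LEN_SHIFT;
}
```

For example, an offset of 65537 yields stripe number 1 and in-stripe offset 1 from either function; only the first one needs a division.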
On 2023/2/6 14:38, Christoph Hellwig wrote:
> On Sun, Feb 05, 2023 at 04:53:40PM +0800, Qu Wenruo wrote:
>> Div64 is much slower than 32-bit division, and has only improved in
>> the most recent CPUs; that's why we have dedicated div64* helpers.
>
> I think that's a little too simple. All 64-bit CPUs can do 64-bit
> divisions, of course. 32-bit CPUs usually can't do it in a single
> instruction, and the helpers exist to avoid the libgcc calls.

Right, I focused too much on the perf part, but sometimes the perf part can sound even scarier:

For example, DIVQ on Skylake has a latency of 42-95 cycles [1] (and a reciprocal throughput of 24-90) for 64-bit inputs.

I'm not sure what's the best way to benchmark such a thing.
Maybe just do such divisions on random numbers at module load time to verify the perf?

Thanks,
Qu

>
> That should not stand against any cleanups and optimizations, of course.
>
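Along the lines of the random-number idea in the mail, a rough microbenchmark could look like the sketch below. It is a userspace stand-in, not the module-load-time test proposed there, and the xorshift generator, `bench_div64()`, `bench_div32()` and the divisor widths are all hypothetical choices. Both loops pay the same PRNG cost, so only the difference between the two timings means anything, and since DIV latency depends on the operand values, results will shift with the input distribution.

```c
#define _POSIX_C_SOURCE 200112L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* xorshift64 PRNG: cheap, and keeps operands unknown at compile time. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Time n 64-bit divisions of random dividends by random divisors of up to
 * 32 bits (so quotients stay large). The quotient sum is stored in *sink so
 * the compiler cannot drop the loop. Returns elapsed nanoseconds.
 */
static uint64_t bench_div64(size_t n, uint64_t seed, uint64_t *sink)
{
	uint64_t state = seed ? seed : 1; /* xorshift state must be non-zero */
	uint64_t acc = 0;
	uint64_t start;

	assert(sink != NULL);
	start = now_ns();
	for (size_t i = 0; i < n; i++) {
		uint64_t a = xorshift64(&state);
		uint64_t b = (xorshift64(&state) >> 32) | 1; /* never zero */

		acc += a / b;
	}
	*sink = acc;
	return now_ns() - start;
}

/* Same loop with 32-bit operands (divisors up to 16 bits), as a baseline. */
static uint64_t bench_div32(size_t n, uint64_t seed, uint64_t *sink)
{
	uint64_t state = seed ? seed : 1;
	uint64_t acc = 0;
	uint64_t start;

	assert(sink != NULL);
	start = now_ns();
	for (size_t i = 0; i < n; i++) {
		uint32_t a = (uint32_t)xorshift64(&state);
		uint32_t b = (uint32_t)(xorshift64(&state) >> 48) | 1; /* never zero */

		acc += a / b;
	}
	*sink = acc;
	return now_ns() - start;
}
```

A caller would run both with the same `n` and seed and compare the returned times. On a 32-bit build, the 64-bit loop would also include the libgcc call overhead mentioned earlier in the thread.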
On Mon, Feb 06, 2023 at 02:58:55PM +0800, Qu Wenruo wrote:
> Right, I focused too much on the perf part, but sometimes the perf part can
> sound even scarier:
>
> For example, DIVQ on Skylake has a latency of 42-95 cycles [1] (and a
> reciprocal throughput of 24-90) for 64-bit inputs.
>
> I'm not sure what's the best way to benchmark such a thing.
> Maybe just do such divisions on random numbers at module load
> time to verify the perf?

Honestly, getting rid of the ugly division calls is probably reason enough, as the series looks like a nice cleanup. I just had to nitpick the sentence.