diff mbox series

[wireless-next] wifi: iwlwifi: mvm: Avoid 64-bit division in iwl_mvm_get_crosstimestamp_fw()

Message ID 20230329-iwlwifi-ptp-avoid-64-bit-div-v1-1-ad8db8d66bc2@kernel.org (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [wireless-next] wifi: iwlwifi: mvm: Avoid 64-bit division in iwl_mvm_get_crosstimestamp_fw() | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Nathan Chancellor March 29, 2023, 5:05 p.m. UTC
There is a 64-bit division in iwl_mvm_get_crosstimestamp_fw(), which
results in a link failure when building 32-bit architectures with clang:

  ld.lld: error: undefined symbol: __udivdi3
  >>> referenced by ptp.c
  >>>               drivers/net/wireless/intel/iwlwifi/mvm/ptp.o:(iwl_mvm_phc_get_crosstimestamp) in archive vmlinux.a

GCC has optimizations for division by a constant that clang does not
implement, so this issue is not visible when building with GCC.

Using div_u64() would resolve this issue, but Arnd points out that this
can be quite expensive and the timestamp is being read at nanosecond
granularity. Nick pointed out that the result of this division is being
stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
division, which elides the need for libcalls.

Fixes: 21fb8da6ebe4 ("wifi: iwlwifi: mvm: read synced time from firmware if supported")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Link: https://github.com/ClangBuiltLinux/linux/issues/1826
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Link: https://lore.kernel.org/6423173a.620a0220.3d5cc.6358@mx.google.com/
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 drivers/net/wireless/intel/iwlwifi/mvm/ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


---
base-commit: 2af3b2a631b194a43551ce119cb71559d8f6b54b
change-id: 20230329-iwlwifi-ptp-avoid-64-bit-div-1c4717f73f8a

Best regards,

Comments

Johannes Berg March 29, 2023, 5:20 p.m. UTC | #1
On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote:
> 
> GCC has optimizations for division by a constant that clang does not
> implement, so this issue is not visible when building with GCC.

Huh yeah, we did 32-bit builds with gcc ...

> Using div_u64() would resolve this issue, but Arnd points out that this
> can be quite expensive and the timestamp is being read at nanosecond
> granularity. 

Doesn't matter though, all the calculations are based on just the
command response from the firmware, which (tries to) take it in a
synchronised fashion.

So taking more time here would be fine, as far as I can tell.

> Nick pointed out that the result of this division is being
> stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
> division, which elides the need for libcalls.

That loses ~7 top bits though, no? I'd be more worried about that, than
the time div_u64() takes.

johannes
Nathan Chancellor March 29, 2023, 5:24 p.m. UTC | #2
On Wed, Mar 29, 2023 at 07:20:43PM +0200, Johannes Berg wrote:
> On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote:
> > 
> > GCC has optimizations for division by a constant that clang does not
> > implement, so this issue is not visible when building with GCC.
> 
> Huh yeah, we did 32-bit builds with gcc ...
> 
> > Using div_u64() would resolve this issue, but Arnd points out that this
> > can be quite expensive and the timestamp is being read at nanosecond
> > granularity. 
> 
> Doesn't matter though, all the calculations are based on just the
> command response from the firmware, which (tries to) take it in a
> synchronised fashion.

Okay, that is good information, thanks for providing it!

> So taking more time here would be fine, as far as I can tell.
> 
> > Nick pointed out that the result of this division is being
> > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
> > division, which elides the need for libcalls.
> 
> That loses ~7 top bits though, no? I'd be more worried about that, than
> the time div_u64() takes.

Right, I sent this version of the fix to spur discussion around whether
or not this was an acceptable approach, rather than having the question
sit unanswered in our issue tracker :) I have no problems sending a v2
to use div_u64() and be done with it, which I will do shortly.

Thanks for the quick input, cheers!
Nathan
Nick Desaulniers March 29, 2023, 5:30 p.m. UTC | #3
On Wed, Mar 29, 2023 at 10:20 AM Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote:
> >
> > GCC has optimizations for division by a constant that clang does not
> > implement, so this issue is not visible when building with GCC.
>
> Huh yeah, we did 32-bit builds with gcc ...

Right, GCC is better about turning division by double-word constant
into multiplication by reciprocal. Craig has been improving LLVM, but
it seems that some divisors still aren't supported (in this case 100).

>
> > Using div_u64() would resolve this issue, but Arnd points out that this
> > can be quite expensive and the timestamp is being read at nanosecond
> > granularity.
>
> Doesn't matter though, all the calculations are based on just the
> command response from the firmware, which (tries to) take it in a
> synchronised fashion.
>
> So taking more time here would be fine, as far as I can tell.

div_u64() it is then.

>
> > Nick pointed out that the result of this division is being
> > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
> > division, which elides the need for libcalls.
>
> That loses ~7 top bits though, no? I'd be more worried about that, than
> the time div_u64() takes.

The result is still stored in a u32; there is a loss of precision
regardless of use of div_u64 or open coded binary operator /.  So is
the loss of precision before the division as tolerable as after the
division?
Johannes Berg March 29, 2023, 7 p.m. UTC | #4
On Wed, 2023-03-29 at 10:30 -0700, Nick Desaulniers wrote:
> > 
> > > Nick pointed out that the result of this division is being
> > > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
> > > division, which elides the need for libcalls.
> > 
> > That loses ~7 top bits though, no? I'd be more worried about that, than
> > the time div_u64() takes.
> 
> The result is still stored in a u32; there is a loss of precision
> regardless of use of div_u64 or open coded binary operator /.  
> 

Right, obviously.

> So is
> the loss of precision before the division as tolerable as after the
> division?

For all I can tell this is meant to be 'gp2' with an additional lower
bits to reach a unit/granularity of 10ns, basically in FW something like

  gp2_10ns = gp2 * 100 + subsampling_10ns_unit

(and gp2 in FW is a 32-bit value, so it rolls over eventually).

But I _think_ we want to make a proper division by 100 to obtain back
the original 'gp2' value here.

johannes
diff mbox series

Patch

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c b/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c
index 5c2bfc8ed88d..cdd6d69c5b68 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c
@@ -116,7 +116,7 @@  iwl_mvm_get_crosstimestamp_fw(struct iwl_mvm *mvm, u32 *gp2, u64 *sys_time)
 
 	gp2_10ns = (u64)le32_to_cpu(resp->gp2_timestamp_hi) << 32 |
 		le32_to_cpu(resp->gp2_timestamp_lo);
-	*gp2 = gp2_10ns / 100;
+	*gp2 = (u32)gp2_10ns / 100;
 
 	*sys_time = (u64)le32_to_cpu(resp->platform_timestamp_hi) << 32 |
 		le32_to_cpu(resp->platform_timestamp_lo);