diff mbox series

[net-next,v2,1/2] bnxt_en: cache only 24 bits of hw counter

Message ID 20241025194753.3070604-1-vadfed@meta.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2,1/2] bnxt_en: cache only 24 bits of hw counter | expand

Commit Message

Vadim Fedorenko Oct. 25, 2024, 7:47 p.m. UTC
This hardware can provide only 48 bits of cycle counter. We can leave
only 24 bits in the cache to extend RX timestamps from 32 bits to 48
bits. This make cache writes atomic even on 32 bit platforms and we can
simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
configuration structure will be also reduced by 4 bytes.

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c |  8 ++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h | 16 +---------------
 2 files changed, 5 insertions(+), 19 deletions(-)

Comments

Michael Chan Oct. 25, 2024, 9:31 p.m. UTC | #1
On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>
> This hardware can provide only 48 bits of cycle counter. We can leave
> only 24 bits in the cache to extend RX timestamps from 32 bits to 48
> bits. This make cache writes atomic even on 32 bit platforms and we can
> simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
> configuration structure will be also reduced by 4 bytes.

ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW
counter and for rollover check.  With this patch reducing
ptp->old_time to 24 bits, we now use the upper 16 bits for the cache
and the next 8 bits for the rollover check.  I think this will work.
But since the field is now 32-bit, why not use the full 32 bits
instead of 24 bits?  Thanks.
Vadim Fedorenko Oct. 25, 2024, 9:53 p.m. UTC | #2
On 25/10/2024 22:31, Michael Chan wrote:
> On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>
>> This hardware can provide only 48 bits of cycle counter. We can leave
>> only 24 bits in the cache to extend RX timestamps from 32 bits to 48
>> bits. This make cache writes atomic even on 32 bit platforms and we can
>> simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
>> configuration structure will be also reduced by 4 bytes.
> 
> ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW
> counter and for rollover check.  With this patch reducing
> ptp->old_time to 24 bits, we now use the upper 16 bits for the cache
> and the next 8 bits for the rollover check.  I think this will work.
> But since the field is now 32-bit, why not use the full 32 bits
> instead of 24 bits?  Thanks.

As you confirmed that the HW has 48 bits of cycle counter, we have to
cache 16 bits only. The other 8 bits are used for the rollover check. We 
can even use less bits for it. What is the use for another 8 bits? With
this patch the rollover will happen every ~16 ms, but will be caught
even till ~4s. If we will cache upper 32 bits, the rollover will happen
every 64us and we will have to update cache value more often. But at the
same time it will not bring any value, because the upper limit will 
still be ~4s. That's why I don't see any benefits of using all 32 bits.
Michael Chan Oct. 25, 2024, 10:13 p.m. UTC | #3
On Fri, Oct 25, 2024 at 2:53 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 25/10/2024 22:31, Michael Chan wrote:
> > On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> >>
> >> This hardware can provide only 48 bits of cycle counter. We can leave
> >> only 24 bits in the cache to extend RX timestamps from 32 bits to 48
> >> bits. This make cache writes atomic even on 32 bit platforms and we can
> >> simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
> >> configuration structure will be also reduced by 4 bytes.
> >
> > ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW
> > counter and for rollover check.  With this patch reducing
> > ptp->old_time to 24 bits, we now use the upper 16 bits for the cache
> > and the next 8 bits for the rollover check.  I think this will work.
> > But since the field is now 32-bit, why not use the full 32 bits
> > instead of 24 bits?  Thanks.
>
> As you confirmed that the HW has 48 bits of cycle counter, we have to
> cache 16 bits only. The other 8 bits are used for the rollover check. We
> can even use less bits for it. What is the use for another 8 bits? With
> this patch the rollover will happen every ~16 ms, but will be caught
> even till ~4s. If we will cache upper 32 bits, the rollover will happen
> every 64us and we will have to update cache value more often. But at the
> same time it will not bring any value, because the upper limit will
> still be ~4s. That's why I don't see any benefits of using all 32 bits.

I agree the extra bits have no value other than to fill the 32-bit
field.  But it should not affect the frequency of caching
ptp->old_time.  It should be updated in bnxt_ptp_ts_aux_work() at a
fixed interval (1 second).
Vadim Fedorenko Oct. 26, 2024, 10:20 p.m. UTC | #4
On 25/10/2024 23:13, Michael Chan wrote:
> On Fri, Oct 25, 2024 at 2:53 PM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
>>
>> On 25/10/2024 22:31, Michael Chan wrote:
>>> On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>>>
>>>> This hardware can provide only 48 bits of cycle counter. We can leave
>>>> only 24 bits in the cache to extend RX timestamps from 32 bits to 48
>>>> bits. This make cache writes atomic even on 32 bit platforms and we can
>>>> simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
>>>> configuration structure will be also reduced by 4 bytes.
>>>
>>> ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW
>>> counter and for rollover check.  With this patch reducing
>>> ptp->old_time to 24 bits, we now use the upper 16 bits for the cache
>>> and the next 8 bits for the rollover check.  I think this will work.
>>> But since the field is now 32-bit, why not use the full 32 bits
>>> instead of 24 bits?  Thanks.
>>
>> As you confirmed that the HW has 48 bits of cycle counter, we have to
>> cache 16 bits only. The other 8 bits are used for the rollover check. We
>> can even use less bits for it. What is the use for another 8 bits? With
>> this patch the rollover will happen every ~16 ms, but will be caught
>> even till ~4s. If we will cache upper 32 bits, the rollover will happen
>> every 64us and we will have to update cache value more often. But at the
>> same time it will not bring any value, because the upper limit will
>> still be ~4s. That's why I don't see any benefits of using all 32 bits.
> 
> I agree the extra bits have no value other than to fill the 32-bit
> field.  But it should not affect the frequency of caching
> ptp->old_time.  It should be updated in bnxt_ptp_ts_aux_work() at a
> fixed interval (1 second).

Ok, BNXT_LO_TIMER_MASK/BNXT_HI_TIMER_MASK use 24 bits only. I don't see
any reason to keep more bits out of 48 bit of counter and I tried to be
consistent across the code. It doesn't matter in terms of performance
should we shift 16 or 24 bits. But because there will be another version
(I forgot to remove one variable), I can change the code to fill 32-bit
field if you insist.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
index fa514be87650..c7e626b9098a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
@@ -106,7 +106,7 @@  static void bnxt_ptp_get_current_time(struct bnxt *bp)
 	if (!ptp)
 		return;
 	spin_lock_irqsave(&ptp->ptp_lock, flags);
-	WRITE_ONCE(ptp->old_time, ptp->current_time);
+	WRITE_ONCE(ptp->old_time, (u32)(ptp->current_time >> 24));
 	bnxt_refclk_read(bp, NULL, &ptp->current_time);
 	spin_unlock_irqrestore(&ptp->ptp_lock, flags);
 }
@@ -174,7 +174,7 @@  void bnxt_ptp_update_current_time(struct bnxt *bp)
 	struct bnxt_ptp_cfg *ptp = bp->ptp_cfg;
 
 	bnxt_refclk_read(ptp->bp, NULL, &ptp->current_time);
-	WRITE_ONCE(ptp->old_time, ptp->current_time);
+	WRITE_ONCE(ptp->old_time, (u32)(ptp->current_time >> 24));
 }
 
 static int bnxt_ptp_adjphc(struct bnxt_ptp_cfg *ptp, s64 delta)
@@ -813,7 +813,7 @@  int bnxt_get_rx_ts_p5(struct bnxt *bp, u64 *ts, u32 pkt_ts)
 	if (!ptp)
 		return -ENODEV;
 
-	BNXT_READ_TIME64(ptp, time, ptp->old_time);
+	time = (u64)(READ_ONCE(ptp->old_time) << 24);
 	*ts = (time & BNXT_HI_TIMER_MASK) | pkt_ts;
 	if (pkt_ts < (time & BNXT_LO_TIMER_MASK))
 		*ts += BNXT_LO_TIMER_MASK + 1;
@@ -1079,7 +1079,7 @@  int bnxt_ptp_init(struct bnxt *bp, bool phc_cfg)
 
 		spin_lock_irqsave(&ptp->ptp_lock, flags);
 		bnxt_refclk_read(bp, NULL, &ptp->current_time);
-		WRITE_ONCE(ptp->old_time, ptp->current_time);
+		WRITE_ONCE(ptp->old_time, (u32)(ptp->current_time >> 24));
 		spin_unlock_irqrestore(&ptp->ptp_lock, flags);
 		ptp_schedule_worker(ptp->ptp_clock, 0);
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
index f322466ecad3..80046bd314db 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
@@ -106,10 +106,10 @@  struct bnxt_ptp_cfg {
 	/* serialize ts tx request queuing */
 	spinlock_t		ptp_tx_lock;
 	u64			current_time;
-	u64			old_time;
 	unsigned long		next_period;
 	unsigned long		next_overflow_check;
 	u32			cmult;
+	u32			old_time;
 	/* a 23b shift cyclecounter will overflow in ~36 mins.  Check overflow every 18 mins. */
 	#define BNXT_PHC_OVERFLOW_PERIOD	(18 * 60 * HZ)
 
@@ -145,20 +145,6 @@  struct bnxt_ptp_cfg {
 	struct bnxt_ptp_stats	stats;
 };
 
-#if BITS_PER_LONG == 32
-#define BNXT_READ_TIME64(ptp, dst, src)				\
-do {								\
-	unsigned long flags;					\
-								\
-	spin_lock_irqsave(&(ptp)->ptp_lock, flags);		\
-	(dst) = (src);						\
-	spin_unlock_irqrestore(&(ptp)->ptp_lock, flags);	\
-} while (0)
-#else
-#define BNXT_READ_TIME64(ptp, dst, src)		\
-	((dst) = READ_ONCE(src))
-#endif
-
 #define BNXT_PTP_INC_TX_AVAIL(ptp)		\
 do {						\
 	spin_lock_bh(&(ptp)->ptp_tx_lock);	\