diff mbox series

[net-next,v2,4/4] net: dqs: make struct dql more cache efficient

Message ID 20240408172605.635508-5-leitao@debian.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net : dqs: optimize if stall threshold is not set | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Breno Leitao April 8, 2024, 5:25 p.m. UTC
With the previous change, struct dqs->stall_thrs will be in the hot path
(at queue side), even if DQS is disabled.

The other fields accessed in this function (last_obj_cnt and num_queued)
are in the first cache line, let's move this field  (stall_thrs) to the
very first cache line, since there is a hole there.

This does not change the structure size, since it moves an short (2
bytes) to 4-bytes whole in the first cache line.

This is the new structure format now:

struct dql {
	unsigned int    num_queued;     /* Total ever queued */
	unsigned int    last_obj_cnt;       /* Count at last queuing */
...
	short unsigned int         stall_thrs;           /*    12     2 */
	/* XXX 2 bytes hole, try to pack */
...
	/* --- cacheline 1 boundary (64 bytes) --- */
...
 	/* Longest stall detected, reported to user */
	short unsigned int         stall_max;            /*   100     2 */
	/* XXX 2 bytes hole, try to pack */
};

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/dynamic_queue_limits.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Jakub Kicinski April 10, 2024, 12:21 a.m. UTC | #1
On Mon,  8 Apr 2024 10:25:56 -0700 Breno Leitao wrote:
> With the previous change, struct dqs->stall_thrs will be in the hot path
> (at queue side), even if DQS is disabled.
> 
> The other fields accessed in this function (last_obj_cnt and num_queued)
> are in the first cache line, let's move this field  (stall_thrs) to the
> very first cache line, since there is a hole there.
> 
> This does not change the structure size, since it moves an short (2
> bytes) to 4-bytes whole in the first cache line.

Doesn't this move the cache line bouncing problem to the other side?
Eric said "copy" I read that as "have two fields with the same value".
I think it's single digit number of alu instructions we'd be saving
here, not super convinced patch 3 is the right trade off...
Breno Leitao April 10, 2024, 1:52 p.m. UTC | #2
On Tue, Apr 09, 2024 at 05:21:49PM -0700, Jakub Kicinski wrote:
> On Mon,  8 Apr 2024 10:25:56 -0700 Breno Leitao wrote:
> > With the previous change, struct dqs->stall_thrs will be in the hot path
> > (at queue side), even if DQS is disabled.
> > 
> > The other fields accessed in this function (last_obj_cnt and num_queued)
> > are in the first cache line, let's move this field  (stall_thrs) to the
> > very first cache line, since there is a hole there.
> > 
> > This does not change the structure size, since it moves an short (2
> > bytes) to 4-bytes whole in the first cache line.
> 
> Doesn't this move the cache line bouncing problem to the other side?

I think so. Looking at dql_check_stall(), it only uses fields in the
second cache line, except now 'dql->stall_thrs' that is in the first
cache line.

> Eric said "copy" I read that as "have two fields with the same value".

Sorry, I misunderstood it. I can create two fields, and update them
together at the only place where they will be updated
(bql_set_stall_thrs).

> I think it's single digit number of alu instructions we'd be saving
> here, not super convinced patch 3 is the right trade off...

Right. I was more concerned about the write barriers (smp_wmb()) inside
the loop which happen quite frequently.

But, if you think that the this is not the right approach, I can drop
this whole patchset. Do you think a profiler will us here?

Thanks!
Jakub Kicinski April 11, 2024, 1:45 a.m. UTC | #3
On Wed, 10 Apr 2024 06:52:56 -0700 Breno Leitao wrote:
> > Doesn't this move the cache line bouncing problem to the other side?  
> 
> I think so. Looking at dql_check_stall(), it only uses fields in the
> second cache line, except now 'dql->stall_thrs' that is in the first
> cache line.

We do read num_queued at the beginning of dql_completed().
So maybe we we move the read of the threshold there we will be fine.
diff mbox series

Patch

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 869afb800ea1..281298e77a15 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -50,6 +50,9 @@  struct dql {
 	unsigned int	adj_limit;		/* limit + num_completed */
 	unsigned int	last_obj_cnt;		/* Count at last queuing */
 
+	/* Stall threshold (in jiffies), defined by user */
+	unsigned short	stall_thrs;
+
 	unsigned long	history_head;		/* top 58 bits of jiffies */
 	/* stall entries, a bit per entry */
 	unsigned long	history[DQL_HIST_LEN];
@@ -71,8 +74,6 @@  struct dql {
 	unsigned int	min_limit;		/* Minimum limit */
 	unsigned int	slack_hold_time;	/* Time to measure slack */
 
-	/* Stall threshold (in jiffies), defined by user */
-	unsigned short	stall_thrs;
 	/* Longest stall detected, reported to user */
 	unsigned short	stall_max;
 	unsigned long	last_reap;		/* Last reap (in jiffies) */